Why, in a world where the only constant is change, we need a Continual Learning approach to AI models.
Imagine you have a small robot that is designed to walk around your garden and water your plants. Initially, you spend a few weeks collecting data to train and test the robot, investing considerable time and resources. The robot learns to navigate the garden efficiently when the ground is covered with grass and bare soil.
However, as the weeks go by, flowers begin to bloom and the appearance of the garden changes significantly. The robot, trained on data from a different season, now fails to recognise its surroundings accurately and struggles to complete its tasks. To fix this, you need to update the model with new examples of the blooming garden.
Your first thought is to add new data examples to the training and retrain the model from scratch. But this is expensive and you do not want to do this every time the environment changes. In addition, you have just realised that you do not have all the historical training data available.
Now you consider just fine-tuning the model with new samples. But this is risky: the model may suffer catastrophic forgetting, a situation where it loses previously acquired knowledge and skills as it learns new information.
…so is there an alternative? Yes: Continual Learning!
Of course, the robot watering plants in a garden is only an illustrative example of the problem. In the later parts of the text you will see more realistic applications.
Learn adaptively with Continual Learning (CL)
It is not possible to foresee and prepare for all the possible scenarios that a model may be confronted with in the future. Therefore, in many cases, adaptive training of the model as new samples arrive can be a good option.
In CL we want to find a balance between the stability of a model and its plasticity. Stability is the ability of a model to retain previously learned information, and plasticity is its ability to adapt to new information as new tasks are introduced.
“(…) in the Continual Learning scenario, a learning model is required to incrementally build and dynamically update internal representations as the distribution of tasks dynamically changes across its lifetime.” [2]
But how can we control stability and plasticity?
Researchers have identified a number of ways to build adaptive models. In [3] the following categories have been established:
1. Regularisation-based approach
- In this approach we add a regularisation term that should balance the effects of old and new tasks on the model structure.
- For example, weight regularisation controls parameter drift by adding a penalty term to the loss function that penalises changes to each parameter in proportion to how much it contributed to previous tasks.
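To make the idea concrete, here is a minimal NumPy sketch of such a penalty, in the spirit of Elastic Weight Consolidation. The function names and the origin of the importance weights (in practice often an estimate of the Fisher information) are illustrative assumptions, not a specific paper's implementation:

```python
import numpy as np

def weight_reg_penalty(params, old_params, importance, lam=1.0):
    """Quadratic penalty that discourages moving parameters that were
    important for previous tasks (EWC-style sketch).

    params, old_params: current and previously learned parameter vectors
    importance: per-parameter importance weights (e.g. estimated Fisher
                information; assumed to be given here)
    lam: strength of the regularisation term
    """
    return lam / 2.0 * np.sum(importance * (params - old_params) ** 2)

def total_loss(new_task_loss, params, old_params, importance, lam=1.0):
    # Loss on the new task plus a penalty for drifting away from
    # parameters that mattered for the old tasks.
    return new_task_loss + weight_reg_penalty(params, old_params, importance, lam)
```

A parameter with high importance for previous tasks is thus "anchored" near its old value, while unimportant parameters remain free to adapt to the new task.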
2. Replay-based approach
- This group of methods focuses on recovering some of the historical data so that the model can still reliably solve previous tasks. One of the limitations of this approach is that we need access to historical data, which is not always possible.
- For example, experience replay, where we preserve and replay a sample of old training data. When training a new task, some examples from previous tasks are added to expose the model to a mixture of old and new task types, thereby limiting catastrophic forgetting.
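A minimal sketch of experience replay might look as follows; the buffer uses reservoir sampling to keep a bounded, unbiased memory of past examples (the class and function names are illustrative assumptions):

```python
import random

class ReplayBuffer:
    """Fixed-size memory of examples from previous tasks (reservoir sampling)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
        else:
            # Replace a random slot with decreasing probability, so every
            # example seen so far is retained with equal probability.
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.memory[idx] = example

    def sample(self, k):
        return random.sample(self.memory, min(k, len(self.memory)))

def make_batch(new_examples, buffer, replay_k=8):
    # Mix the new task's batch with a few replayed old examples,
    # so gradient updates see both old and new task types.
    return list(new_examples) + buffer.sample(replay_k)
```

Training then proceeds on these mixed batches, which is what limits catastrophic forgetting at the cost of storing (some) historical data.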
3. Optimisation-based approach
- Here we want to manipulate the optimisation methods to maintain performance for all tasks, while reducing the effects of catastrophic forgetting.
- For example, gradient projection is a method where gradients computed for new tasks are projected so that the resulting update does not interfere with the gradient directions of previous tasks.
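As a rough sketch of the projection step: given stored gradient directions from old tasks, the new gradient is projected onto the subspace orthogonal to them. For simplicity this sketch assumes the stored directions are (roughly) orthogonal to each other; real methods maintain an orthogonal basis of the old-task gradient subspace:

```python
import numpy as np

def project_gradient(grad, old_task_dirs):
    """Remove from `grad` its components along the stored gradient
    directions of previous tasks, so the update (approximately) does
    not move the model along directions that mattered for old tasks.

    Assumes `old_task_dirs` are roughly orthogonal to each other
    (illustrative simplification).
    """
    g = np.asarray(grad, dtype=float).copy()
    for d in old_task_dirs:
        d = np.asarray(d, dtype=float)
        norm_sq = d @ d
        if norm_sq > 0:
            g -= (g @ d) / norm_sq * d  # subtract the component along d
    return g
```

The parameter update then uses the projected gradient in place of the raw one.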
4. Representation-based approach
- This group of methods focuses on obtaining and using robust feature representations to avoid catastrophic forgetting.
- For example, self-supervised learning, where a model can learn a robust representation of the data before being trained on specific tasks. The idea is to learn high-quality features that reflect good generalisation across different tasks that a model may encounter in the future.
5. Architecture-based approach
- The previous methods assume a single model with a single parameter space, but there are also a number of techniques in CL that exploit the model's architecture.
- For example, parameter allocation, where, during training, each new task is given a dedicated subspace in the network, which removes destructive interference between parameters. However, if the network size is not fixed, it will grow with the number of new tasks.
And how do we evaluate the performance of CL models?
The basic performance of CL models can be measured from a number of angles [3]:
- Overall performance evaluation: average performance across all tasks
- Memory stability evaluation: the difference between a task's best performance achieved during training and its performance after subsequent continual training
- Learning plasticity evaluation: measuring the difference between joint training performance (if trained on all data) and performance when trained using CL
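The first two measures above can be sketched with a small accuracy matrix, a common bookkeeping device in CL evaluation (the function names here are illustrative assumptions):

```python
import numpy as np

# acc[i, j] = accuracy on task j after finishing training on task i.

def average_accuracy(acc):
    """Overall performance: mean accuracy over all tasks after
    training on the last task."""
    return float(acc[-1].mean())

def average_forgetting(acc):
    """Memory stability: for each earlier task, the drop from its best
    accuracy during training to its final accuracy, averaged."""
    T = acc.shape[0]
    drops = [acc[:T - 1, j].max() - acc[-1, j] for j in range(T - 1)]
    return float(np.mean(drops))
```

Learning plasticity would additionally compare these numbers against a joint-training upper bound, i.e. the accuracy obtained when training on all tasks' data at once.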
So why don’t all AI researchers switch to Continual Learning right away?
If you have access to the historical training data and are not worried about the computational cost, it may seem easier to just train from scratch.
One of the reasons for this is that the interpretability of what happens in the model during continual training is still limited. If training from scratch gives the same or better results than continual training, then people may prefer the easier approach, i.e. retraining from scratch, rather than spending time trying to understand the performance problems of CL methods.
In addition, current research tends to focus on the evaluation of models and frameworks, which may not reflect the real use cases that businesses have. As mentioned in [6], there are many synthetic incremental benchmarks that do not reflect real-world situations well, where tasks evolve naturally over time.
Finally, as noted in [4], many papers on the topic of CL focus on storage rather than computational costs, when in reality storing historical data is much less costly and energy-consuming than retraining the model.
If there were more focus on the inclusion of computational and environmental costs in model retraining, more people might be interested in improving the current state of the art in CL methods as they would see measurable benefits. For example, as mentioned in [4], model re-training can exceed 10 000 GPU days of training for recent large models.
Why should we work on improving CL models?
Continual learning seeks to address one of the most challenging bottlenecks of current AI models — the fact that data distribution changes over time. Retraining is expensive and requires large amounts of computation, which is not a very sustainable approach from both an economic and environmental perspective. Therefore, in the future, well-developed CL methods may allow for models that are more accessible and reusable by a larger community of people.
As found and summarised in [4], there is a list of applications that inherently require, or could benefit from, well-developed CL methods:
1. Model editing
- Selective editing of an error-prone part of a model without damaging other parts of the model. Continual Learning techniques could help to continuously correct model errors at much lower computational cost.
2. Personalisation and specialisation
- General purpose models sometimes need to be adapted to be more personalised for specific users. With Continual Learning, we could update only a small set of parameters without introducing catastrophic forgetting into the model.
3. On-device learning
- Small devices have limited memory and computational resources, so methods that can efficiently train the model in real time as new data arrives, without having to start from scratch, could be useful in this area.
4. Faster retraining with warm start
- Models need to be updated when new samples become available or when the distribution shifts significantly. With Continual Learning, this process can be made more efficient by updating only the parts affected by new samples, rather than retraining from scratch.
5. Reinforcement learning
- Reinforcement learning involves agents interacting with an environment that is often non-stationary. Therefore, efficient Continual Learning methods and approaches could be potentially useful for this use case.
Learn more
As you can see, there is still a lot of room for improvement in the area of Continual Learning methods. If you are interested, you can start with the materials below:
- Introduction course: [Continual Learning Course] Lecture #1: Introduction and Motivation from ContinualAI on YouTube https://youtu.be/z9DDg2CJjeE?si=j57_qLNmpRWcmXtP
- Paper about the motivation for Continual Learning: Continual Learning: Applications and the Road Forward [4]
- Paper about the state-of-the-art techniques in Continual Learning: A Comprehensive Survey of Continual Learning: Theory, Method and Application [3]
If you have any questions or comments, please feel free to share them in the comments section.
Cheers!
[1] Awasthi, A., & Sarawagi, S. (2019). Continual Learning with Neural Networks: A Review. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 362–365). Association for Computing Machinery.
[2] Continual AI Wiki Introduction to Continual Learning https://wiki.continualai.org/the-continualai-wiki/introduction-to-continual-learning
[3] Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5362–5383.
[4] Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke Hüllermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, & Gido M. van de Ven. (2024). Continual Learning: Applications and the Road Forward https://arxiv.org/abs/2311.11908
[6] Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, & Fartash Faghri. (2024). TiC-CLIP: Continual Training of CLIP Models.