Riccardo Volpi | 2020
Modern machine learning has been successfully applied to many problems, recently bringing huge improvements to fields as diverse as object localization and recognition in images [4], game playing [12] and autonomous driving [1], to name but a few. Unfortunately, such performance is achieved at the expense of having a single, very specialized model per task, i.e. a model trained to distinguish between cats and dogs has no idea how to distinguish apples from pears. Updating the recognition model to take these two new classes into account works to some extent, but the underlying technology that makes these models perform so well (deep learning) has a major drawback: it has a terrible memory! After a few model updates, each requiring retraining ("fine-tuning") on the newly provided example images as sets of new classes come along, the initial model will have forgotten what differentiates a dog from a cat. In jargon, we call this issue catastrophic forgetting [8, 2]. The line of research that tries to mitigate this issue, and which we discuss here, is called lifelong learning.
How big the memory problem is depends very much on the application. In some contexts, we might be happy with a model that can do a single task very well, without the need to learn new concepts during its lifespan. One example would be warehouse robots that need to perform repetitive actions in environments that rarely change. In other contexts, catastrophic forgetting represents a real issue. Think about a computer vision module for a social network app, such as a filter that detects whether images violate a code of conduct before they're posted. The module may initially be trained to handle realistic images (DSLR/phone photos). After some time, you may want it to also handle sketches or artwork, and update the model using relevant samples. However, unless you take the necessary precautions, your module is likely to forget how to handle the camera images.
In this blog post, we take you on a journey through the current state of lifelong learning research. Among the points we cover are the solutions proposed so far, what researchers are currently focused on and the benchmarks they use.
Throughout the lifetime of a model, different factors can vary over time: it can be exposed to new domains, to new tasks, to new classes, or to a combination of them. These different factors are illustrated in Figure 1.
New domains, where a domain is characterized by its image statistics. One example is the one used earlier, where we've trained a model on camera images and later want to include sketches or artwork in the "comfort zone" of the model. In Figure 1 (left), in a simplified triangles vs. circles task, we have a domain where shapes are blue and a domain where shapes are purple.
New tasks. We have a model trained on one (or more) task(s), and we wish to include more. Using the same example as above, we may have an original filter that classifies whether an image contains violence, and in a second phase we may also want to be able to tell whether it contains hateful content. In Figure 1 (middle) the first task is again triangles vs. circles, and the second task is squares vs. crosses.
New classes. We have a model trained on a certain number of classes, and we want to include other classes. For an intuitive example, consider a house robot that needs to recognize an increasing number of different concepts. In Figure 1 (right), we're again initially interested in the triangles vs. circles task, but later we also want to classify squares vs. crosses. The sketch below illustrates how the data a model sees over time differs across the three scenarios.
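To make the distinction concrete, here is a purely schematic sketch in Python of what the stream of training data looks like in each of the three scenarios of Figure 1. The string/label pairs are hypothetical placeholders standing in for real images, not data from any of the benchmarks discussed below.

```python
# A schematic sketch of the three incremental scenarios; the model is exposed
# to the phases one after another, never to all of them at once.

# New domains: same triangles vs. circles task, but the image statistics change.
domain_phases = [
    [("blue_triangle", 0), ("blue_circle", 1)],      # phase 1: blue shapes
    [("purple_triangle", 0), ("purple_circle", 1)],  # phase 2: purple shapes
]

# New tasks: a second, separate task arrives; each task keeps its own label
# space (e.g. its own output head), and the model is told which task it solves.
task_phases = [
    [("triangle", 0), ("circle", 1)],  # task 1
    [("square", 0), ("cross", 1)],     # task 2, labels reused within the task
]

# New classes: later classes extend the same label space, and at test time the
# model must choose among all the classes it has seen so far.
class_phases = [
    [("triangle", 0), ("circle", 1)],  # classes 0 and 1
    [("square", 2), ("cross", 3)],     # classes 2 and 3
]
```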
Deep learning models are generally trained by being repeatedly exposed to the image samples they need to learn from. An effective solution to the issue of forgetting (at least in terms of model accuracy) is to keep a record of the data used to train the original model, so that every time new samples from new domains/tasks/classes arrive, one can also re-use the original ones, helping the model to remember.
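As a point of reference, here is a minimal sketch in PyTorch of this "keep everything and retrain" approach. `model`, `old_dataset` and `new_dataset` are hypothetical placeholders for your own classifier and labelled image datasets.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader

def update_with_full_rehearsal(model, old_dataset, new_dataset, epochs=5, lr=0.01):
    """Retrain on the union of everything seen so far whenever new data arrives."""
    joint = ConcatDataset([old_dataset, new_dataset])      # old + new samples
    loader = DataLoader(joint, batch_size=64, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)  # standard supervised loss
            loss.backward()
            optimizer.step()
    return model
```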
This strategy requires a steadily increasing amount of memory and hence a steadily increasing use of energy and, in some cases, it's not even possible to keep old data around (for example, because of privacy constraints such as expiring retention periods). Lifelong learning research tries to find more effective and efficient solutions to the problem. We cover the main ones here, largely drawing from Parisi et al. [9] and Maltoni and Lomonaco [7] to categorize them (see also Table 1).
Rehearsal-based methods. These methods keep a memory buffer with samples from the older domains/tasks/classes that the model needs to remember. The main research directions are (i) preventing new knowledge from interfering with existing knowledge and (ii) reducing memory requirements. The main advantage of these methods is their effectiveness in retaining prior knowledge. The main disadvantage is the need to store old data. This can be problematic for several reasons: apart from memory requirements, sometimes it's just not possible to keep track of old data, e.g., for privacy-related issues (especially in the medical field) or if the original model comes from a third party.
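As an illustration, here is a minimal sketch of a rehearsal step with a bounded memory, assuming a generic PyTorch classifier and a stream of labelled image batches. Reservoir sampling is used here simply as one common way to keep a uniform subset of past data under a fixed budget; published methods differ in how they select and replay samples.

```python
import random
import torch
import torch.nn.functional as F

class ReservoirBuffer:
    """Fixed-size memory holding a uniform sample of everything seen so far."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []   # list of (image, label) pairs
        self.seen = 0    # number of examples observed so far

    def add(self, image, label):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((image, label))
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = (image, label)  # replace with decreasing probability

    def sample(self, n):
        batch = random.sample(self.data, min(n, len(self.data)))
        images, labels = zip(*batch)
        return torch.stack(images), torch.stack(labels)

def rehearsal_step(model, optimizer, buffer, new_images, new_labels, replay_size=32):
    """One training step on a new batch, mixed with replayed old samples."""
    images, labels = new_images, new_labels
    if buffer.data:
        old_images, old_labels = buffer.sample(replay_size)
        images = torch.cat([new_images, old_images])
        labels = torch.cat([new_labels, old_labels])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    for image, label in zip(new_images, new_labels):  # only new samples enter memory
        buffer.add(image, label)
```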
Architecture growing. These methods help a model to remember old information by increasing the number of parameters throughout its lifespan. The idea is to protect old patterns by freezing the parts of the model that were previously trained. This is effective in avoiding catastrophic forgetting, but a major drawback is that the model's memory footprint grows throughout its lifespan. This can be critical in embedded systems (for example, mobile applications) where the model needs to fit specific constraints such as limited memory.
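The sketch below illustrates the general idea with a simplified multi-head classifier in PyTorch: everything trained so far is frozen, and a fresh output head is added for each new task. It assumes MNIST-sized grayscale inputs and is not a faithful implementation of any particular published method.

```python
import torch
import torch.nn as nn

class GrowingClassifier(nn.Module):
    """Simplified architecture-growing model: frozen past, one head per task."""
    def __init__(self, feature_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(               # shared feature extractor
            nn.Flatten(), nn.Linear(28 * 28, feature_dim), nn.ReLU()
        )
        self.heads = nn.ModuleList()                 # one output head per task
        self.feature_dim = feature_dim

    def add_task(self, num_classes):
        """Freeze everything trained so far, then attach a fresh head."""
        if self.heads:                                # nothing to protect before task 0
            for param in self.parameters():
                param.requires_grad = False           # protect old knowledge
        self.heads.append(nn.Linear(self.feature_dim, num_classes))
        return self

    def forward(self, x, task_id):
        return self.heads[task_id](self.backbone(x))
```

At each phase, only the parameters that still have `requires_grad=True` are passed to the optimizer, so earlier tasks remain untouched while the parameter count keeps growing.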
Regularization strategies. Regularization strategies approach the lifelong learning problem from an optimization perspective. These methods generally constrain the loss that the model optimizes in a way that penalizes it for forgetting earlier concepts. A notable example is penalizing severe changes to the weights that were important for previous tasks. The huge advantage of this class of methods is that it overcomes the weaknesses of the other two families: there's no need to store old data points nor to increase the model capacity throughout its lifespan. However, it does become increasingly difficult to retain good performance on older tasks.
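The sketch below illustrates this idea in PyTorch, in the spirit of elastic weight consolidation: a quadratic penalty discourages changes to weights that mattered for earlier tasks. It is a simplified illustration under assumed helpers (`old_params` holds a copy of the weights after the previous phase, and the importance estimate is a rough diagonal approximation), not the exact formulation of any single paper.

```python
import torch
import torch.nn.functional as F

def estimate_importance(model, old_loader):
    """Rough per-weight importance: average squared gradient on the old data."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for images, labels in old_loader:
        model.zero_grad()
        F.cross_entropy(model(images), labels).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.detach() ** 2
    return {n: v / max(len(old_loader), 1) for n, v in importance.items()}

def regularized_loss(model, images, labels, old_params, importance, strength=100.0):
    """New-task loss plus a penalty on moving weights that were important before."""
    loss = F.cross_entropy(model(images), labels)
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in old_params:
            penalty = penalty + (importance[name] * (param - old_params[name]) ** 2).sum()
    return loss + strength * penalty
```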
One could naturally take the best of all worlds and define a hybrid strategy. See Figure 2 for an overview of the various possibilities.
Defining realistic protocols to assess the performance of lifelong learning models is a vibrant research area in itself. The use of unrealistic protocols has lately been a source of debate: some widely adopted evaluation benchmarks are indeed far less realistic than the ones typically used in "standard" machine learning. For example, the most widely adopted benchmark consists of learning from different versions of the MNIST dataset in which the pixels are (unrealistically) randomly permuted. It's true that even in such simplistic settings neural networks generally forget the past as new information comes in, but these scenarios are extremely different from any realistic application.
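For concreteness, here is a minimal sketch of how such a permuted-MNIST benchmark is typically built (assuming torchvision is available): each "task" applies its own fixed random shuffling of the 784 pixel positions to every MNIST image.

```python
import torch
from torchvision import datasets, transforms

def make_permuted_mnist_tasks(num_tasks=5, seed=0):
    """Each task is a copy of MNIST with its own fixed pixel permutation."""
    g = torch.Generator().manual_seed(seed)
    perms = [torch.randperm(28 * 28, generator=g) for _ in range(num_tasks)]
    tasks = []
    for perm in perms:
        transform = transforms.Compose([
            transforms.ToTensor(),
            # flatten, shuffle pixels with this task's permutation, reshape
            transforms.Lambda(lambda x, p=perm: x.view(-1)[p].view(1, 28, 28)),
        ])
        tasks.append(datasets.MNIST("data", train=True, download=True,
                                    transform=transform))
    return tasks
```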
Among the different attempts to propose new protocols, a notable one is the CORe50 dataset [6], where lifelong learning performance can be evaluated along different axes (varying classes, varying domains or both). Furthermore, it allows learning from temporally coherent streams of data, which is consistent with the way humans are exposed to visual information. This direction is also pursued in a very recent work [11], which introduces a new dataset that allows learning from streams of data recorded in the wild.
Practitioners have also started using ImageNet, with the goal of sequentially learning samples from its 1,000 classes. Intriguing results were obtained with the REMIND algorithm [3], which reaches competitive performance with just a single pass over ImageNet's samples.
Of course we can’t predict the future, but we strongly believe that lifelong learning will play a crucial role in democratizing AI applications. This is because real, ambient AI should be able to adapt to an evolving environment. This will only be possible if models can enrich their capabilities as they’re exposed to new problems they need to solve, instead of drifting away from their initial purpose.
Our bet is that, although rehearsal approaches might be the only solution in some cases, we'll see the gap narrow between them and methods that are less data-demanding (i.e. regularization strategies). Some meta-learning approaches have started to appear, providing alternatives with very promising performance. The idea here is to learn the learning algorithms themselves, in order to accommodate specific needs (for instance, avoiding catastrophic forgetting). In contexts where rehearsal is the only option, a natural direction is to reduce the storage requirements. Recently, different pieces of work have independently explored the space of "feature replay" [5, 10, 3], where the information related to previous tasks is stored in a more compressed fashion, namely as feature embeddings.
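To give a flavour of the feature-replay idea, here is a minimal sketch in PyTorch, with hypothetical `backbone` and `head` modules: the memory stores compact embeddings produced by a frozen backbone, and only the classifier head rehearses on them. It illustrates the general principle, not the specific algorithms of the papers cited above.

```python
import random
import torch
import torch.nn.functional as F

def store_features(backbone, images, labels, memory):
    """Compress a batch into embeddings and store those instead of raw images."""
    with torch.no_grad():
        feats = backbone(images)
    memory.append((feats.cpu(), labels.cpu()))

def head_replay_step(head, optimizer, memory, new_feats, new_labels, replay_size=32):
    """Train the classifier head on new embeddings mixed with stored ones."""
    feats, labels = new_feats, new_labels
    if memory:
        old_feats, old_labels = random.choice(memory)   # one stored batch at random
        feats = torch.cat([new_feats, old_feats[:replay_size]])
        labels = torch.cat([new_labels, old_labels[:replay_size]])
    optimizer.zero_grad()
    loss = F.cross_entropy(head(feats), labels)
    loss.backward()
    optimizer.step()
```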
Apart from the methods themselves, significant effort has been devoted to the design of more realistic protocols that more closely mimic the conditions in which a human learns. For instance, there's a surge of interest in benchmarks where the goal is to learn from a data stream, without allowing the learner to perform multiple passes over the data. This is very exciting and challenges the classical setting in which neural networks perform so well (multiple passes over a training set). The applications that could arise from a learning system able to efficiently learn from a stream of data are countless! We passionately look forward to the next few years of lifelong learning research, and to contributing to future progress with our own work.
Acknowledgements: Other contributors to this post are Diane Larlus and Gregory Rogez.
[1] End to End Learning for Self-Driving Cars. Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao and Karol Zieba. arXiv:1604.07316 [cs.CV], 2016.
[2] Catastrophic Interference in Connectionist Networks: Can It Be Predicted, Can It Be Prevented? Robert M. French. Proceedings of Advances in Neural Information Processing Systems 6 (NIPS), 1993.
[3] REMIND Your Neural Network to Prevent Catastrophic Forgetting. Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya and Christopher Kanan. Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[4] ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever and Geoffrey E Hinton. Proceedings of Advances in Neural Information Processing Systems (NIPS), 2012.
[5] Continuous Domain Adaptation with Variational Domain-Agnostic Feature Replay. Qicheng Lao, Xiang Jiang, Mohammad Havaei, and Yoshua Bengio. arXiv:2003.04382 [cs.LG], 2020.
[6] CORe50: a New Dataset and Benchmark for Continuous Object Recognition. Vincenzo Lomonaco and Davide Maltoni. Proceedings of the Conference on Robot Learning (CoRL), pp. 17 – 26, 2017.
[7] Continuous learning in single-incremental-task scenarios. Davide Maltoni and Vincenzo Lomonaco. Neural Networks, 116: 56–73, 2019. DOI: 10.1016/j.neunet.2019.03.010
[8] Catastrophic interference in connectionist networks: The sequential learning problem. Michael McCloskey and Neil J. Cohen. The Psychology of Learning and Motivation, 24: 109–165, 1989. DOI: 10.1016/S0079-7421(08)60536-8
[9] Continual Lifelong Learning with Neural Networks: A Review. German I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan and Stefan Wermter. Neural Networks, 113: 54–71, 2019. DOI: 10.1016/j.neunet.2019.01.012
[10] Latent Replay for Real-Time Continual Learning. Lorenzo Pellegrini, Gabriele Graffieti, Vincenzo Lomonaco and Davide Maltoni. arXiv:1912.01100 [cs.LG], 2019.
[11] Stream-51: Streaming Classification and Novelty Detection from Videos. Ryne Roady, Tyler L. Hayes, Hitesh Vaidya and Christopher Kanan. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020.
[12] Mastering the game of Go with deep neural networks and tree search. David Silver, Aja Huang, Chris J. Maddison et al. Nature, 529: 484–489, 2016. DOI: 10.1038/nature16961
This blog post was written as part of the Lifelong Representation Learning Chair of the MIAI Institute.