|Riccardo Volpi, Cesar de Souza, Yannis Kalantidis, Diane Larlus, Gregory Rogez|
|Findings of the Workshop on Continual Learning in Computer Vision (CLVISION) at the Conference on Computer Vision and Pattern Recognition (CVPR), virtual event, 25 June 2021|
Most learning algorithms rely on certain assumptions to work properly, namely that a) all training data is available before training starts, and b) all training data is drawn from the same distribution. However, these assumptions do not always hold in practice. A robot interacting with its environment may obtain new data samples as time goes by; a SaaS operator may want to improve models that have already been trained and deployed without restarting training from scratch or revisiting every sample the model has already seen; or one may want to learn a model from a potentially infinite source of data. In this work, we explore memory-based methods to efficiently train neural networks when training samples arrive as a data stream whose underlying distribution shifts across well-distinct domains, on all of which we desire to perform uniformly well. We show that different memory-update strategies have a profound impact on the efficacy of learning, addressing the catastrophic forgetting phenomenon often associated with a shift in the input domain. We provide a protocol for assessing the characteristics of different strategies, and show how choosing them correctly can yield models that are less sensitive to a particular choice of hyper-parameters, such as the learning rate.
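The abstract does not spell out which memory-update strategies are compared, so the following is a minimal sketch of the general setup it describes: a fixed-size memory replayed alongside the incoming stream, with two common baseline update rules (reservoir sampling and a FIFO ring buffer) standing in as assumptions rather than the paper's actual methods. The `train_step` callback and batch structure are likewise hypothetical.

```python
import random


class ReservoirMemory:
    """Fixed-size memory updated with reservoir sampling (Vitter's
    Algorithm R): after n stream samples, each has an equal chance
    of being stored, regardless of arrival order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.num_seen = 0

    def update(self, sample):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            # Replace a uniformly chosen slot with prob. capacity / num_seen.
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = sample

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


class FIFOMemory:
    """Ring buffer: keeps only the most recent `capacity` samples, so
    earlier domains are evicted first (another common baseline)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def update(self, sample):
        self.buffer.append(sample)
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def train_on_stream(stream, memory, train_step, replay_size=32):
    """Experience replay: mix each incoming batch (a list of samples)
    with samples drawn from memory, take one gradient step on the
    mixed batch, then update the memory with the new samples."""
    for batch in stream:
        replay = memory.sample(replay_size)
        train_step(batch + replay)  # one SGD step on the mixed batch
        for sample in batch:
            memory.update(sample)
```

The two update rules illustrate why the choice matters under domain shift: reservoir sampling keeps an approximately uniform sample over the entire stream, so all domains seen so far stay represented in replay, whereas a FIFO buffer is biased toward the most recent domain and thus tends to forget earlier ones.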