
SuperLoss: Robust curriculum learning helps machines to learn like humans

Published by Jérome Revaud on 9 December 2020


Our novel framework, SuperLoss—which uses individual sample losses as error measures to determine the relative difficulty of samples in a dataset—can be plugged on top of existing neural network models to implement curriculum learning for any task, even with noisy datasets.

Generally, humans (and animals) learn concepts by mastering a series of progressively challenging problems. One example of this process is the way in which schoolchildren learn to solve increasingly advanced mathematical problems over several years, as illustrated in Figure 1. By learning simpler concepts first, children are better equipped to solve more difficult problems later. Overcoming simple challenges in this way provides them with a baseline of knowledge that can be built upon iteratively.

Curriculum learning takes inspiration from this natural style of learning and applies it to machines (1). In standard training, a neural network is presented with samples drawn at random from the entire training set. In curriculum learning, the network is instead presented with the easier samples first. This approach has been shown to outperform standard training with randomly ordered samples (2, 3, 4), even for small datasets (5).

For curriculum learning in computers to be a success, some prior knowledge about the task at hand is normally required. This is because the relative difficulty of each sample in a given dataset must first be estimated to enable the network to tackle the samples in curriculum order.

Estimating the difficulty of samples in a dataset

In early work on curriculum learning (2), experiments were carried out on toy datasets in which the separation between easy and hard samples was clear and predefined by the dataset construction. More recent approaches (6) have shown that the per-sample loss (i.e. the prediction error) during training can be used to identify difficult samples, as these usually keep a higher loss throughout training than easy samples. To apply curriculum learning effectively, the contribution of difficult samples to the training objective is reduced (downweighted) early in training. At later stages of the training process, the model learns to tackle more difficult samples, which then contribute more to the training objective.
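The loss-based weighting described above can be sketched in a few lines. This is a minimal illustration in the spirit of self-paced learning (7), not the procedure of any specific paper: the function names, the binary weights and the threshold schedule are all assumptions made for the example, and the loss values are made up.

```python
from typing import List


def self_paced_weights(losses: List[float], threshold: float) -> List[float]:
    """Binary weights: 1.0 for samples whose loss is below the threshold, else 0.0."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def weighted_mean_loss(losses: List[float], weights: List[float]) -> float:
    """Training objective: weighted average of the per-sample losses."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / total


# Illustrative (made-up) losses for an easy, a medium and a hard sample.
losses = [0.2, 0.9, 2.5]

# Hypothetical schedule: the threshold grows, so harder samples join later.
for epoch, threshold in enumerate([0.5, 1.0, 3.0]):
    weights = self_paced_weights(losses, threshold)
    print(epoch, weights, weighted_mean_loss(losses, weights))
```

As the threshold grows, samples with a higher loss start to contribute to the objective, which is the curriculum effect the paragraph describes.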

However, this approach is challenging to implement. Even ‘self-learning’ models that are capable of estimating difficulty themselves call for significant changes to the training procedure to work properly. They may require, for example, multistage training (7), extra parameters or layers (8, 9), and ad hoc adaptations specific to each task. For these reasons, such methods are generally specialized to specific tasks, like image classification (10, 11).

All in all, current curriculum-learning approaches demand significant adaptation of the training procedure for a given task and, for this reason, generally require dedicated training schemes. Such schemes are time-consuming to implement and computationally expensive in practice, as well as being restrictive in terms of application. Additionally, they often require clean labelled datasets for training, which places further limits on their applicability.

SuperLoss: a straightforward framework for implementing curriculum learning for any task

We have developed an easy-to-use framework, called SuperLoss, which makes curriculum learning applicable to any task (12). Our SuperLoss module can in fact be plugged on top of an existing loss function during training, as shown in Figure 2. SuperLoss automatically downweights the contribution of hard samples while upweighting easy samples, effectively implementing the core principle of curriculum learning.

To distinguish easy samples from hard ones, the current loss of a sample is compared with an exponential moving average of the losses over all samples. A direct benefit of this approach is that nothing changes at test time. It also adds very little computational overhead during training.
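A minimal sketch of this comparison is given below. It assumes the average is updated once per batch from the batch mean; the class name, the helper function and the momentum value of 0.9 are illustrative choices, not taken from the post.

```python
from typing import List, Optional, Tuple


class RunningLossAverage:
    """Exponential moving average of the mean batch loss."""

    def __init__(self, momentum: float = 0.9) -> None:
        self.momentum = momentum  # assumed smoothing factor
        self.value: Optional[float] = None

    def update(self, batch_losses: List[float]) -> float:
        batch_mean = sum(batch_losses) / len(batch_losses)
        if self.value is None:
            # First batch: initialise the average with the batch mean.
            self.value = batch_mean
        else:
            self.value = (self.momentum * self.value
                          + (1.0 - self.momentum) * batch_mean)
        return self.value


def split_by_difficulty(batch_losses: List[float],
                        threshold: float) -> Tuple[List[int], List[int]]:
    """Return (easy, hard) sample indices relative to the running average."""
    easy = [i for i, loss in enumerate(batch_losses) if loss < threshold]
    hard = [i for i, loss in enumerate(batch_losses) if loss >= threshold]
    return easy, hard
```

Because the average is only a running scalar, keeping it up to date costs one multiply-add per batch and it plays no role at test time.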

Using confidence estimates to increase the reliability of network predictions

Our work is inspired by a family of recently proposed loss functions referred to as ‘confidence-aware’. Such functions incorporate confidence estimates, which increase the reliability of predictions made by a neural network without adding a great deal of computational cost for training. Additionally, a confidence-aware loss allows curriculum learning to be performed automatically. However, existing confidence-aware loss functions are each specialized to a specific task and do not generalize easily, which limits their application.

Somewhat surprisingly, however, we’ve discovered that confidence-aware loss functions for different tasks share striking similarities.

Three recently designed confidence-aware loss functions are shown in Figure 3. Each was designed for a different task and was independently proposed (10, 13, 14). In each plot, the region that corresponds to low confidence values is almost flat, whereas the higher-confidence region contains standard or emphasized gradients. In other words, the gradient of the loss with respect to the network parameters increases with the confidence when all other parameters are fixed.

Based on these similarities, we propose a novel way to transform any loss function into a confidence-aware version. Our solution is a task-agnostic, interpretable, confidence-aware loss function that receives the standard loss and an additional confidence parameter. For it to comply with any type of loss, we design our function such that it is translation-invariant and homogeneous with respect to the input loss and that it generalizes the input loss.
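For reference, the SuperLoss paper (12) writes this transform as follows, where \ell is the input loss, \sigma the confidence, \tau a threshold separating easy from hard samples, and \lambda a regularization hyperparameter. These symbols follow the paper and are not used elsewhere in this post.

```latex
L_\lambda(\ell, \sigma) = (\ell - \tau)\,\sigma + \lambda\,(\log \sigma)^2
```

Setting \sigma = 1 gives \ell - \tau, i.e. the input loss up to a constant, which is the sense in which the transform generalizes the input loss.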

The formulation of our confidence-aware transform admits an optimal confidence value given the input loss (as this specific confidence value has a closed-form solution). We can therefore define the SuperLoss as the value of our confidence-aware loss for the optimal confidence. The SuperLoss has a single input: the original loss value. Therefore, it can simply be placed on top of any loss function (hence the name!) and does not require any change in the training procedure, nor any extra parameters.
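The closed-form solution can be sketched in plain Python. The sketch assumes the formulation from the SuperLoss paper (12): the transform (loss - tau) * sigma + lam * log(sigma)^2 and the optimal confidence sigma* = exp(-W(max(-2/e, beta) / 2)), with beta = (loss - tau) / lam and W the Lambert W function. The function names and the bisection-based Lambert W are illustrative choices for a dependency-free example, not the authors' implementation.

```python
import math


def lambert_w0(x: float) -> float:
    """Principal branch of the Lambert W function, solved by bisection.

    w * exp(w) is increasing for w >= -1, and W0(x) <= log(1 + x) for x > 0,
    so the root always lies in [-1, max(0, log(1 + x))].
    """
    if x < -1.0 / math.e - 1e-12:
        raise ValueError("W0 is only real-valued for x >= -1/e")
    lo, hi = -1.0, (math.log1p(x) if x > 0.0 else 0.0)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_confidence(loss: float, tau: float, lam: float) -> float:
    """Confidence that minimises the transform for a given input loss."""
    beta = (loss - tau) / lam
    # Clamping at -2/e keeps the argument inside the domain of W0.
    return math.exp(-lambert_w0(0.5 * max(-2.0 / math.e, beta)))


def super_loss(loss: float, tau: float, lam: float) -> float:
    """Value of the confidence-aware transform at the optimal confidence."""
    sigma = optimal_confidence(loss, tau, lam)
    return (loss - tau) * sigma + lam * math.log(sigma) ** 2


# Samples easier than the threshold get a confidence above 1 (upweighted),
# harder ones get a confidence below 1 (downweighted).
for sample_loss in (0.2, 1.0, 3.0):
    print(sample_loss, optimal_confidence(sample_loss, tau=1.0, lam=1.0))
```

In this sketch the only inputs besides the loss are tau and lam, which are fixed or tracked outside the network, consistent with the statement that no extra learnable parameters are required.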

High robustness to noise with SuperLoss enables learning from automatically collected web data

To evaluate SuperLoss, we carried out extensive experiments on various computer vision tasks (image classification, deep regression, object detection and image retrieval). Overall, our results show that SuperLoss gives small, consistent improvements when training on clean data. More significantly, we found that when the labels contain noise (e.g. in data automatically collected from the web), training with SuperLoss leads to significantly higher performance.

The improvement in accuracy for noisy data arises because noisy samples remain difficult (i.e. keep a high loss) even after numerous passes over the data (epochs). As a result, the SuperLoss module downweights the contributions of these samples during training. This is illustrated in Figure 4, which plots the evolution of the loss for easy, hard and noisy samples. Here, easy and hard samples are clean samples that have a low and a high loss, respectively, after the first few epochs of training.

We compared SuperLoss to a number of other models on datasets with varying noise (see Figure 5). Our results show that the addition of noise, in the form of false labels, causes the performance of all models to decrease significantly when using standard training. However, the results obtained with SuperLoss show that accuracy is less impacted by noise, even at high percentages (80%). This is true even compared to most state-of-the-art models, which have more limited applicability (e.g. are specialized to image classification, require the addition of novel network parameters and/or necessitate a change in the procedure to work properly).
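To make the noise setting concrete, here is a sketch of how false labels can be injected into a clean classification dataset. It assumes one common protocol, in which a fixed fraction of samples is reassigned uniformly to a different class; the post does not specify the exact protocol used in Figure 5, and the function name and arguments are hypothetical.

```python
import random
from typing import List


def corrupt_labels(labels: List[int], num_classes: int,
                   noise_rate: float, seed: int = 0) -> List[int]:
    """Replace a fraction `noise_rate` of the labels with a wrong class."""
    rng = random.Random(seed)
    n_noisy = round(noise_rate * len(labels))
    noisy_indices = set(rng.sample(range(len(labels)), n_noisy))
    corrupted = []
    for i, label in enumerate(labels):
        if i in noisy_indices:
            # Assumption: the new label is always different from the true one.
            wrong_classes = [c for c in range(num_classes) if c != label]
            corrupted.append(rng.choice(wrong_classes))
        else:
            corrupted.append(label)
    return corrupted


clean = [0, 1, 2, 3] * 25  # 100 illustrative labels over 4 classes
noisy = corrupt_labels(clean, num_classes=4, noise_rate=0.8)
print(sum(a != b for a, b in zip(clean, noisy)), "labels changed")
```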

Another strong result that we obtain is on image retrieval when the model is trained on a dataset automatically collected from the web. Researchers have previously had to apply geometric verification to clean such data, and then use the subset of clean data to obtain a reasonable performance in this area (15). However, by plugging SuperLoss on top of their method, we obtain a better performance when training on the full, noisy dataset rather than the subset of clean data. This highlights the ability of SuperLoss to enable learning from large, automatically collected noisy datasets instead of clean, manually curated datasets.

Summarizing SuperLoss and future work

In summary, we have developed a novel, easy-to-use framework that enables curriculum learning to be applied to any task, even with noisy data. SuperLoss works as a module that can be plugged on top of an existing loss function to improve the accuracy of a model by upweighting easy samples and downweighting hard ones, thus creating a curriculum from which the network is able to learn. This approach is computationally less expensive than state-of-the-art models and, moreover, does not require specialization for a specific task. Additionally, SuperLoss is adept at learning from large, noisy datasets, such as those collected automatically from the web. In this respect, we plan to investigate how SuperLoss could help in the context of semi-supervised learning where some labels are missing, which in a sense is another form of noise.

References

1. Learning and development in neural networks: the importance of starting small. Jeffrey L. Elman. Cognition, vol. 48, no. 1, 1993, pp. 71–99.
2. Curriculum learning. Yoshua Bengio, Jérôme Louradour, Ronan Collobert, Jason Weston. Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09), Montreal, Quebec, Canada, 14–18 June 2009.
3. Training agent for first-person shooter game with actor-critic curriculum learning. Yuxin Wu, Yuandong Tian. 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, 24–29 April 2017.
4. Curriculum learning for multi-task classification of visual attributes. Nikolaos Sarafianos, Theodore Giannakopoulos, Christophoros Nikou, Ioannis A. Kakadiaris. Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017.
5. Multi-task curriculum transfer deep learning of clothing attributes. Qi Dong, Shaogang Gong, Xiatian Zhu. IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, California, USA, 24–31 March 2017.
6. SELF: learning to filter noisy labels with self-ensembling. Duc Tam Nguyen, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Laura Beggel, Thomas Brox. 8th International Conference on Learning Representations (ICLR), virtual event, 26 April–1 May
7. Self-paced learning for latent variable models. M. Pawan Kumar, Benjamin Packer, Daphne Koller. Proceedings of the 23rd International Conference on Neural Information Processing Systems (NIPS’10), Vancouver, Canada, 6–11 December 2010.
8. O2U-Net: a simple noisy label detection approach for deep neural networks. Jinchi Huang, Lie Qu, Rongfei Jia, Binqian Zhao. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
9. MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels. Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei. 2018 International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018.
10. Data parameters: a new family of parameters for learning a differentiable curriculum. Shreyas Saxena, Oncel Tuzel, Dennis DeCoste. Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8–14 December 2019.
11. Dynamic curriculum learning for imbalanced data classification. Yiru Wang, Weihao Gan, Jie Yang, Wei Wu, Junjie Yan. IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
12. SuperLoss: a generic loss for robust curriculum learning. Thibault Castells, Philippe Weinzaepfel, Jerome Revaud. Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), virtual event, 6–12 December 2020.
13. Self-supervised learning of geometrically stable features through probabilistic introspection. David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA, 18–22 June 2018.
14. R2D2: reliable and repeatable detector and descriptor. Jerome Revaud, César De Souza, Martin Humenberger, Philippe Weinzaepfel. Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 8 December–14 December 2019.
15. Deep image retrieval: learning global representations for image search. Albert Gordo, Jon Almazán, Jérome Revaud, Diane Larlus. 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016.
