Christopher Dance, Arnaud Sors, Quentin Grail |
2019 |
A bit of background:
The 36th International Conference on Machine Learning (ICML) was held at Long Beach Convention Center in California. ICML continues to grow rapidly: in 2019 there were 774 papers accepted out of 3424 submissions, compared with 621 acceptances for 2473 submissions in 2018.
The conference had (up to) nine parallel streams, which made it hard to decide what to attend – but it may have been slightly easier than 2018 when there were 10 parallel streams. Each stream had three sessions (11am, 2pm and 4pm), and each session had one 20-minute presentation followed by nine 5-minute spotlight presentations (versus 10-minute spotlights in 2018). For presenters having only 5 minutes, this was often not enough time to get across their main idea.
The institution contributing the most articles (measuring the ratio of the number of authors from this institution to the total number of authors for each paper and using data from this article) was Google. Indeed, according to this measure, Google, Google Brain and DeepMind together produced slightly more articles than the top three academic institutions together (Stanford, Berkeley and MIT), who in turn produced more than twice as many articles as the top three European institutions together (University of Oxford, ETH Zurich and EPFL)!
Following is a recap and analysis of some of the sessions that particularly caught our interest and/or are relevant to what we’re working on in the lab.
Part 1: Chris Dance, Research Fellow
Multi-task RL with PEARL
In the paper “Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables”, Rakelly et al. present a multi-task RL method called PEARL (probabilistic embeddings for actor-critic RL). The authors are motivated by the fact that many of the problems we would like autonomous agents to solve share common structure: e.g. screwing a cap on a bottle and turning a doorknob both involve grasping an object in the hand and rotating the wrist.
This paper attracted my attention as the authors use posterior sampling (Strens et al. 2000; Osband et al., NIPS 2013), which is an elegant and powerful classical RL method, but they do so in a deep RL context. Now, in classical RL, posterior sampling maintains a posterior over possible Markov Decision Processes (MDPs), samples an MDP from this posterior, then acts according to an optimal policy for this MDP for some period of time. By analogy, PEARL keeps track of the “posterior” using a trained “inference network” which outputs a distribution over some “probabilistic context variables” that capture the current uncertainty over the task. Also, while this probabilistic context model could readily be combined with on-policy RL methods (like PPO), the authors instead use the SOTA off-policy RL algorithm SAC (Haarnoja et al. 2018), thereby minimising the number of samples required both for meta-training and adaptation. Indeed, as shown in the following Figure, PEARL requires 20-100x fewer samples in the domains tested when compared with other multi-task RL methods like MAML, and it often attains substantially better final returns! The authors also describe PEARL in a rather good blog article just in case anyone finds the full paper hard to follow.
Efficient visually-guided RL with SOLAR
While model-free RL algorithms have been used in most of the highly-visible success stories of RL today, they are too sample inefficient to be practical in applications like vision-based robot control. For instance, the SOTA model-free RL algorithm SAC (Haarnoja et al., 2018) takes four hours of interaction to learn the simple visually-guided task of stacking one block on another, even when starting from just a single initial configuration. In contrast, the model-based RL method described in “SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning” by Zhang et al. can learn this task in about one hour, when starting from a wide range of initial configurations.
While previous model-based RL algorithms for vision-based control rely on the ability to accurately predict videos into the future, an ability which requires many samples to learn, the algorithm used in this work relies on structured representations which can be learned more efficiently. This algorithm is called stochastic optimal control with latent representations (SOLAR) and it relies on the linear quadratic regulator with fitted linear models (LQR-FLM) algorithm (Levine and Abbeel, 2014), which fits local linear Gaussian transition models and local quadratic cost models. Now LQR-FLM works well if the state is low-dimensional and linearity is a good approximation, but not if the state is high-dimensional and linearity is a poor approximation, as is the case if images are used for the state. Therefore, SOLAR treats image observations as if they were generated from a low-dimensional latent space by a deconvolutional neural network that is learned with amortized variational inference. The authors claim that this makes SOLAR the most efficient method to date for image-based robotic task learning.
Understanding how deep learning generalizes
In “Stable Rank Normalization for Improved Generalization in Neural Networks”, by Sanyal, Torr, Dokania, which appeared in the Workshop on Understanding and Improving Generalization in Deep Learning, the authors are motivated by recent generalisation bounds for deep networks. Several such bounds (Neyshabur et al. 2018; Bartlett et al. 2017) involve a factor called the stable rank of a matrix which is defined by
where are the singular values of and is the rank of . The authors conduct experiments to understand the impact of directly controlling the stable rank on the generalization behaviour of NNs. In particular, they conduct gradient descent on stable-rank-normalised versions of the weight matrices. They show that this gives consistently smaller generalisation gaps on CIFAR-10/-100 than when using vanilla gradient descent or spectral normalisation. Of course, this lower generalisation gap comes at the expense of a poorer fit to the training set, so the authors do not get a lower test error than the vanilla algorithm, and the performance of their vanilla algorithm is in turn very far from the current SOTA as there is no data augmentation.
The authors also kindly pointed out the paper Li, Soltanolkotabi and Omyak, “Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks”, arXiv, April 2019. This paper extends recent work that explains the good generalization of overparameterized networks by arguing that SGD results in trained weights that are close to randomly-initialized weights and this proximity implies a low Rademacher complexity, such as the work of Li and Liang (2018) and of Neyshabur et al. (2019). While that recent work assumed the training data was correctly labelled, Li et al. (2019) instead consider the case where a fraction of the training data has the wrong labels, and they prove two main results.
Part 2: Arnaud Sors, Researcher, Machine Learning & Optimization
Population-based training for deep neural nets and meta-learning (Tutorials)
As the machine learning community today needs learning methods that go beyond classical stochastic gradient descent on one loss function and one dataset and because of the relevance to robotics, Arnaud Sors selected the tutorials on population-based training and on meta learning.
A session on recent advances in Population-based training for deep neural nets was held by Jeff Clune, Joel Lehman and Kenneth Stanley. They introduced novelty search algorithms, quality-diversity algorithms, the challenges of open-ended learning and the automatic design of training environments so it was an extensive introduction to the field.
The tutorial revolved around the idea that, training towards a predefined cost can have severe limitations in the context of RL but also in standard supervised learning. Novelty search consists of encouraging diversity as a goal and follows from the idea that the success of a model in a seemingly unrelated, secondary objective can often help in the initial objective. This is related to the work we’re doing in curiosity-driven exploration in robotics, for example in active localization. In extending novelty search there are quality diversity algorithms such as MAP elites[arXiv], where a search archive of local champions over predefined parameter configurations is maintained and jointly evolved. Maintaining an archive of possible best solutions is also interesting for generalization and for adaptability. The tutorial continued with challenges and preliminary experiments related to open-ended learning, i.e. continuous learning over populations [video] and the problem of designing training environments while simultaneously training an agent to succeed in these environments [arXiv]. The tutorial finished with work on indirect encoding, i.e. a network encoding how to construct a larger network.
Another session on meta-learning with Chelsea Finn and Sergey Levine started with a general and classic overview of the field. The speakers then went over various possible optimization methods such as black box optimization, optimization-based inference, non-parametric methods and extensions to Bayesian meta-learning. A second part was on meta-RL, i.e. learning to adapt to new tasks very quickly, and possible implementation choices. A last part was on common challenges coming with the use of meta-RL, mostly how to avoid meta-overfitting and how to propose new tasks automatically.
If you’re especially interested in this topic I recommend you check the slides for the session as they’re a compact and efficient introduction to the field. A complete reading list on the topic by subtopics can also found at [tinyurl.com/meta-reading]
Following are some pieces of work that particularly caught my interest either because of their broad applicability or because they address a common recurring problem.
Cheap orthonormal constraints in neural networks
I somewhat stumbled upon the work of Mario Lezcano-Casado and David Martinez-Rubio [arXiv] on cheap orthonormal constraints for weight matrices and it struck me how interesting it is for many ML applications. Below is a high-level overview where I try to show the main ideas.
The optimization problem
The authors consider optimization problems involving an orthogonality or unitary constraint on parameters: \(\min_{B \in G} f(B)\) where \(G\) can be \(SO(n)\) (i.e. special orthogonal group, the group of rotation matrices) or \(U(n)\) (i.e. the unitary group, similar notion for complex matrices). We describe the problem for SO(n). It’s almost identical for U(n).
To optimize over such a space, we would ideally like a method that:
A very quick look at Riemanian geometry and how it is used to transform the problem
A Riemanian manifold is a smooth manifold that has an inner product on the tangent space. \(SO(n)\) is a Riemanian manifold of \(\mathcal R^n\) (and it’s also a compact and connected Lie group so this means that in this group, multiplication is smooth). The Lie algebra of a Lie group is its tangent space at the identity. A classic result of Riemanian geometry is that the Lie algebra \(so(n)\) of \(SO(n)\) is the group of skew-symmetric matrices:
\(so(n) = \{A \in \mathcal R^{n\times n} | A + A^T = 0\}\)The exponential map is a function that (by definition) maps the tangent space (Lie algebra) to the manifold (Lie group) locally. Another classic result from Riemanian geometry is that the Lie exponential map is equivalent to the matrix exponential defined as the Taylor expansion \(exp(A) = I + A + \frac{1}{2} A^2 + …\)
Therefore, the constrained optimization problem initially introduced can be cast to:
\(\min_{A \in g} f(exp(A)\) where \(g\) is the group of skew symmetric matrices, canonically parameterized with a triangular parameter matrix. The optimization problem is now unconstrained
Approximation of the exponential map and its gradient
Now, although the problem is unconstrained, the matrix exponential appears in it. The authors propose to approximate it in the forward pass using a combination of Padé approximants and a scaling squaring trick, up to machine precision. They also derive an expression for the gradient of \(f \circ exp\), including an approximation of the differential of the exponential of matrices using the exponential of a \(2n \times 2n\) matrix. Finally, they show that this reparameterization, along with approximations, is cheap in the context of minibatch stochastic optimization because the part involving exponential differentiation has to be computed only once per batch. They also provide Pytorch code to try out orthogonal constraints in your weight matrices – including a generalization to non-square matrices.
Applications
I was interested in this work because I believe it has a broad range of applications. Orthogonal constraints are useful in dimensionality reduction, clustering, feature selection or even dictionary learning. Using parameterized and differentiable orthogonal transformations, such techniques could be integrated into the forward pass of a neural net. Orthogonal constraint can also be useful to stabilize recurrent neural networks (with the unitary group), or Riemanian gradient descent (cf. Arjovsky et al.). Orthogonal constraints are also important in invertible neural networks and normalizing flows, and a paper using this way of constraining a transformation was actually presented in the workshop on normalizing flow.
Learning from noisy labels
I noticed at least 4 papers on how to deal with noisy labels which is a problem that almost every ML practitioner has to deal with at some point… Below is a brief overview of some of the approaches that were presented that I thought could be useful.
Part 3: Quentin Grail, PhD student, Machine Learning & Optimization
Never-Ending Learning (Tutorial)
As humans, we can learn from multiple and diverse experiences throughout our lives but current machine learning models tend to learn a single objective function from a unique set of data. Never-Ending Learning was the topic of the tutorial of Tom Mitchell, one of the co-founders of ICML with his colleague Partha Talukdar.
They presented the Never-Ending Learning paradigm in several case studies, including their Never-Ending Language Learner (NELL).
NELL has been designed to read the web 24 hours/day and extract relational predicates between entities. Since January 2010 it’s accumulated more than 50 million candidate beliefs such as the ones you can see below.
Neural approaches to conversational AI (Tutorial)
Working on question-answering (QA) myself at NAVER, I was particularly interested in Michel Galley and Jianfeng Gao’s tutorial. They presented a large panel of the recent advances in Deep Learning for conversational AI. Starting with a description of the latest progress related to QA, they described several models and applications of QA from a single piece of text to Open Domain QA, which corresponds to the general case where a user wants to query a large dataset such as Wikipedia, Freebase…They also introduced the recent multi-step reasoning question-answering task that requires gathering multiple pieces of information spread across multiple documents.
The last part of the tutorial dealt with task-oriented dialogue models that aim to help users to accomplish specific tasks. Here they presented standard dialog system architectures and more recent End-to-End trainable neural architectures and Multi-Domain Dialogue State Tracking.They also reviewed several reinforcement learning approaches for conversational agents.
The session concluded with a brief description of conversation models and chatbots which mainly focused on social bots, like Amazon Alexa and Google Home. One of the major differences between chatbots and task-oriented dialog agents is their objective. Task-oriented dialog agents prioritize accomplishing specific tasks while social bots like Xiaolce, (one of the most popular Chinese chatbots from Microsoft) tend to maximize user engagement with personalized and enjoyable conversations.
Adversarial examples session
Adversarial Attacks on Node Embeddings via Graph Poisoning was about poisoning network representations. The main idea of network representation is to learn node embeddings, represented in a graph structure and which is useful for further tasks such as node classification, node clustering or edge prediction etc. The authors Bojchevski and Günnemann studied the robustness of such representations. They demonstrated that a small, well-chosen adversarial perturbation i.e. adding/removing an edge, can be sufficient to poison the graph and have significant consequences on the downstream task. They also showed that such an approach can be easily transferred across other graph structures.
Most of the work in adversarial attacks has focused on modifying the input example and in Adversarial camera stickers: A physical camera-based attack on deep learning systems the authors wonder if it’s possible to fool a deep classifier by physically manipulating the input device (in this case the camera). They modeled the effect of putting a dot sticker on the camera and jointly optimized the adversarial and physical ‘realizability’ of the solution and they were able to fool ImageNet Classifiers in approximately 50% of the cases.
More on Understanding and improving generalization in deep learning
Aleksander Madry, Associate Professor at MIT, gave a talk on Are All Features Created Equal? He’s interested in how to create meaningful and explainable representations of images in the face of how fragile current networks are when it comes to adversarial examples. It’s true that even small invisible to the human eye perturbations on input data can lead to a serious failure in the model.
Another well-known related problem of neural networks is misaligned representations which is when images that are very different from a human point of view are actually represented as similar by the network.
Madry argues that meaningless perturbations from a human point of view can be meaningful from the model point of view and that there’s no reason a model should restrict its learning to robust representations. They call robust features those that humans would use to classify images, and non-robust features all the others. Because these non-robust features are present in the dataset, learned models tend to use them because they are incapable of distinguishing them from the robust features regarding the distribution of the samples of the dataset.
In this work, the authors create an augmented dataset containing adversarial examples to force the classifier to use only the robust features to take a decision.They show that doing so allows the training of a more robust network with better example representations. It also makes it easier to manipulate the features of a trained model (i.e. change the color of the object, add stripes etc.) Although it slightly decreases the accuracy, removing non-robust features does really help to train a more robust neural network.
And to finish – one (of the two) best paper awards!
The contribution of the paper Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations by Locatello et al. was twofold. First, they theoretically show that learning disentangled representations in an unsupervised way is impossible without bias on the model and the data. Secondly, they did a survey on most of the recent algorithms and evaluation metrics by training more than 12000 models on multiple datasets.
Their conclusions were that, while recent methods successfully perform on the proposed losses, no good disentanglement models could be identified without explicit supervision. They also showed that it’s not clear for the moment that unsupervised learning of such disentanglement decreases the learning complexity of the downstream tasks.
So, that’s it from us for this year. Reach out to us if one of the topics we’ve covered has piqued your interest and we hope to see you at SIGIR, ACL, ICCV, EMNLP or NeurIPS before the year is out!
NAVER LABS Europe 6-8 chemin de Maupertuis 38240 Meylan France Contact
To make robots autonomous in real-world everyday spaces, they should be able to learn from their interactions within these spaces, how to best execute tasks specified by non-expert users in a safe and reliable way. To do so requires sequential decision-making skills that combine machine learning, adaptive planning and control in uncertain environments as well as solving hard combinatorial optimization problems. Our research combines expertise in reinforcement learning, computer vision, robotic control, sim2real transfer, large multimodal foundation models and neural combinatorial optimization to build AI-based architectures and algorithms to improve robot autonomy and robustness when completing everyday complex tasks in constantly changing environments. More details on our research can be found in the Explore section below.
For a robot to be useful it must be able to represent its knowledge of the world, share what it learns and interact with other agents, in particular humans. Our research combines expertise in human-robot interaction, natural language processing, speech, information retrieval, data management and low code/no code programming to build AI components that will help next-generation robots perform complex real-world tasks. These components will help robots interact safely with humans and their physical environment, other robots and systems, represent and update their world knowledge and share it with the rest of the fleet. More details on our research can be found in the Explore section below.
Visual perception is a necessary part of any intelligent system that is meant to interact with the world. Robots need to perceive the structure, the objects, and people in their environment to better understand the world and perform the tasks they are assigned. Our research combines expertise in visual representation learning, self-supervised learning and human behaviour understanding to build AI components that help robots understand and navigate in their 3D environment, detect and interact with surrounding objects and people and continuously adapt themselves when deployed in new environments. More details on our research can be found in the Explore section below.
Details on the gender equality index score 2024 (related to year 2023) for NAVER France of 87/100.
The NAVER France targets set in 2022 (Indicator n°1: +2 points in 2024 and Indicator n°4: +5 points in 2025) have been achieved.
—————
Index NAVER France de l’égalité professionnelle entre les femmes et les hommes pour l’année 2024 au titre des données 2023 : 87/100
Détail des indicateurs :
Les objectifs de progression de l’Index définis en 2022 (Indicateur n°1 : +2 points en 2024 et Indicateur n°4 : +5 points en 2025) ont été atteints.
Details on the gender equality index score 2024 (related to year 2023) for NAVER France of 87/100.
1. Difference in female/male salary: 34/40 points
2. Difference in salary increases female/male: 35/35 points
3. Salary increases upon return from maternity leave: Non calculable
4. Number of employees in under-represented gender in 10 highest salaries: 5/10 points
The NAVER France targets set in 2022 (Indicator n°1: +2 points in 2024 and Indicator n°4: +5 points in 2025) have been achieved.
——————-
Index NAVER France de l’égalité professionnelle entre les femmes et les hommes pour l’année 2024 au titre des données 2023 : 87/100
Détail des indicateurs :
1. Les écarts de salaire entre les femmes et les hommes: 34 sur 40 points
2. Les écarts des augmentations individuelles entre les femmes et les hommes : 35 sur 35 points
3. Toutes les salariées augmentées revenant de congé maternité : Incalculable
4. Le nombre de salarié du sexe sous-représenté parmi les 10 plus hautes rémunérations : 5 sur 10 points
Les objectifs de progression de l’Index définis en 2022 (Indicateur n°1 : +2 points en 2024 et Indicateur n°4 : +5 points en 2025) ont été atteints.
To make robots autonomous in real-world everyday spaces, they should be able to learn from their interactions within these spaces, how to best execute tasks specified by non-expert users in a safe and reliable way. To do so requires sequential decision-making skills that combine machine learning, adaptive planning and control in uncertain environments as well as solving hard combinatorial optimisation problems. Our research combines expertise in reinforcement learning, computer vision, robotic control, sim2real transfer, large multimodal foundation models and neural combinatorial optimisation to build AI-based architectures and algorithms to improve robot autonomy and robustness when completing everyday complex tasks in constantly changing environments.
The research we conduct on expressive visual representations is applicable to visual search, object detection, image classification and the automatic extraction of 3D human poses and shapes that can be used for human behavior understanding and prediction, human-robot interaction or even avatar animation. We also extract 3D information from images that can be used for intelligent robot navigation, augmented reality and the 3D reconstruction of objects, buildings or even entire cities.
Our work covers the spectrum from unsupervised to supervised approaches, and from very deep architectures to very compact ones. We’re excited about the promise of big data to bring big performance gains to our algorithms but also passionate about the challenge of working in data-scarce and low-power scenarios.
Furthermore, we believe that a modern computer vision system needs to be able to continuously adapt itself to its environment and to improve itself via lifelong learning. Our driving goal is to use our research to deliver embodied intelligence to our users in robotics, autonomous driving, via phone cameras and any other visual means to reach people wherever they may be.
This web site uses cookies for the site search, to display videos and for aggregate site analytics.
Learn more about these cookies in our privacy notice.
You may choose which kind of cookies you allow when visiting this website. Click on "Save cookie settings" to apply your choice.
FunctionalThis website uses functional cookies which are required for the search function to work and to apply for jobs and internships.
AnalyticalOur website uses analytical cookies to make it possible to analyse our website and optimize its usability.
Social mediaOur website places social media cookies to show YouTube and Vimeo videos. Cookies placed by these sites may track your personal data.
This content is currently blocked. To view the content please either 'Accept social media cookies' or 'Accept all cookies'.
For more information on cookies see our privacy notice.