The Thirty-second Annual Conference on Neural Information Processing Systems

Highlights of what we saw at this year’s conference

Robotics and Optimization

### Robotics (C. Dance)

On Wednesday, Kumar *et al.* from Berkeley, in “Visual Memory for Robust Path Following”, investigated how a robot could retrace a path in a novel environment, both forwards and backwards, given a single demonstration of that path, in spite of changes in the environment and uncertainty in the robot’s motion. Humans regularly accomplish this task, just as we each managed to find our way back to our hotels in snowy Montreal. The classical approach to this problem, which we’ve become familiar with over the last year at NAVER LABS Europe, is visual SLAM (VSLAM): one builds a 3D model of the world and localises with respect to it. Such approaches often encounter difficulties in highly symmetric man-made environments, such as a long corridor where all the doors look the same. However, precisely reconstructing and localising in such an environment is less important than recognising “this is the corner where I should turn left.”

The proposed approach converts a sequence of images along an input path into an abstract representation. A learned controller then implicitly localises the agent along this abstracted path using the current observation, and outputs actions that bring the agent to the desired goal location. The proposed model learns to do things humans (sometimes) do too, like counting doors and looking on the right for a table that was previously on the left. Importantly, the approach generalises much better than VSLAM-based approaches when images are acquired close to obstacles or in a changing environment, as shown in the following figure.

I was intrigued by the work of Kurutach *et al.* on “Learning Plannable Representations with Causal InfoGAN” (CIGAN). Extensions of this work also appeared in several workshops, such as the presentation by Wang *et al.* on “Learning Robotic Manipulation through Visual Planning and Acting” in the “Modeling the Physical World: Perception, Learning, and Control” workshop. The authors consider manipulating soft and deformable objects to achieve a human-specified goal state. For instance, they present videos of a real robot moving a rope around obstacles based on visual feedback.

The authors’ overall approach is to learn a *meaningful* and *causal* latent space for encoding the state of the environment, thereby ensuring that the latent space is useful for planning. Now, InfoGANs (Chen *et al.*, NIPS, 2016) achieve meaningfulness by separating the generator’s input noise vector into two parts: a random noise part and a structure-encoding part. An InfoGAN’s loss then has a term that encourages high mutual information between observations and this structure-encoding part. CIGAN extends the InfoGAN by also enforcing *causality*. In particular, CIGAN works with *pairs* of latent representations at a time, along with the corresponding pairs of observations. It tries to generate only observation pairs such that a *short* action sequence generates the second observation from the first.
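
For readers who prefer a formula, the InfoGAN objective described above can be written roughly as follows. This is a sketch in the notation of Chen *et al.*, where $c$ is the structure-encoding part, $z$ the random noise part, $Q$ an auxiliary network approximating the posterior over $c$, and $\lambda$ a weighting coefficient. These symbols do not appear in the talks summarised here.

```latex
\min_{G,\,Q}\ \max_{D}\quad V(D, G) \;-\; \lambda\, L_I(G, Q),
\qquad
L_I(G, Q) \;=\; \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c)
\;\le\; I\big(c;\, G(z, c)\big)
```

Here $V(D, G)$ is the usual GAN value function and $L_I$ is a variational lower bound on the mutual information. As I understand it, CIGAN applies the same idea to pairs: $c$ is replaced by a pair of consecutive latent states and $x$ by the corresponding pair of observations.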

Kurutach *et al.* and Wang *et al.*’s system embeds the start and goal states in the CIGAN’s latent space and imagines a plan for manipulating the world from start to goal. Thanks to causality, planning then consists of simple linear interpolation between these states if there are no obstacles, and of running Dijkstra’s algorithm otherwise. This plan in the latent space is then decoded into an observation sequence. Finally, the system uses this imagined observation sequence as a reference trajectory for a visual servoing controller, which relies on a learned inverse dynamics model.
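
To make the two planning modes concrete, here is a minimal sketch in plain Python. It is not the authors’ code: the function names, the representation of latent states as lists of floats, and the hand-built graph of latent samples (`nodes`, `edges`) are all assumptions made for illustration.

```python
import heapq
import math


def interpolate(start, goal, steps):
    """Linear interpolation between two latent vectors, endpoints included."""
    return [
        [a + (b - a) * t / steps for a, b in zip(start, goal)]
        for t in range(steps + 1)
    ]


def dijkstra(nodes, edges, source, target):
    """Shortest path over latent samples.

    nodes: list of latent vectors (hypothetical samples of feasible states).
    edges: dict mapping a node index to the indices reachable from it.
    Returns the list of node indices from source to target, or None.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v in edges.get(u, ()):
            nd = d + math.dist(nodes[u], nodes[v])  # Euclidean edge cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy usage: four latent samples on a square; the direct 0 -> 3 move is
# assumed to be blocked by an obstacle, so it is left out of the graph.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
edges = {0: [1], 1: [2], 2: [3]}
plan = dijkstra(nodes, edges, 0, 3)
```

In the system described above, each latent state along such a path would then be decoded into an observation to form the reference trajectory.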

### Optimization (S. Michel)

The Wednesday sessions I attended focused on the optimization problem underlying the training of neural networks, namely that it is generally highly non-convex. Several research directions try to cope with this issue by proposing new non-convex optimization methods that aim to avoid bad local minima, or by designing new objective functions with good landscape properties. Three papers at NeurIPS took an alternative approach: they explored how the design of the neural network itself can make the loss function easier to optimize.

“Adding One Neuron Can Eliminate All Bad Local Minima” by Liang *et al.* considers deep non-linear neural networks for binary classification with a smooth loss function. The authors show that adding a single exponential neuron, together with a quadratic regularizer on that neuron, makes every local minimum a global minimum. This additional neuron can be viewed as a skip connection from the input to the output, but the result also holds when adding a similar neuron to each layer of a fully-connected feed-forward neural network.
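
As a rough illustration of the construction, the sketch below adds one exponential unit, fed directly by the input, to the output of an arbitrary base network and penalises its output weight quadratically. The parametrisation `a * exp(w·x + b)`, the penalty `lam * a**2 / 2` and the logistic loss are assumptions chosen for the example; the exact form and loss conditions in the paper may differ.

```python
import math


def augmented_output(f_x, x, a, w, b):
    """Base network output f_x plus one exponential neuron on the raw input x.

    a, w, b are the extra neuron's output weight, input weights and bias
    (hypothetical names).
    """
    return f_x + a * math.exp(sum(wi * xi for wi, xi in zip(w, x)) + b)


def augmented_loss(outputs, labels, a, lam):
    """Logistic loss over labels in {-1, +1} plus the penalty lam * a**2 / 2.

    The logistic loss is only a stand-in for "a smooth loss".
    """
    data_term = sum(math.log1p(math.exp(-y * o)) for o, y in zip(outputs, labels))
    return data_term + 0.5 * lam * a * a
```

With `a = 0` the augmented network reduces to the original one and the penalty vanishes, so the extra neuron does not change what the base network can represent.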

“Porcupine Neural Networks: Approximating Neural Network Landscapes” by Feizi *et al.* starts from the observation that the optimization landscape of neural networks with suitably constrained weights has good theoretical properties. The main question the paper addresses is whether any unconstrained neural net can be approximated by such a constrained one. The authors show that in the case of two-layer neural nets the answer is yes: they introduce a family of “Porcupine Neural Networks” (PNNs) that has the right properties and show empirically that any fully connected two-layer neural net can be approximated by a PNN. The authors also discuss the use of PNNs for general neural nets.

“Visualizing the Loss Landscape of Neural Nets” by Li *et al.* aims to understand why certain architectures are easier to train than others, and why certain choices of training parameters lead to models that generalize better than others. The authors present a new visualization technique that shows the effects of different architecture choices on the loss function landscape, and therefore provides useful insights into the trainability and generalization of neural nets.
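
The general idea behind such plots can be sketched as evaluating the loss on a two-dimensional slice through parameter space around the trained weights. The code below is a generic illustration, not the authors’ method: `loss_fn`, `theta` and the helper names are hypothetical, and the whole-vector rescaling in `random_direction` is a crude stand-in for the more careful normalisation used in the paper.

```python
import math
import random


def random_direction(theta, rng):
    """Gaussian direction rescaled to have the same Euclidean norm as theta."""
    d = [rng.gauss(0.0, 1.0) for _ in theta]
    scale = math.sqrt(sum(t * t for t in theta)) / math.sqrt(sum(v * v for v in d))
    return [v * scale for v in d]


def loss_surface(loss_fn, theta, d1, d2, coords):
    """Grid of loss_fn(theta + a * d1 + b * d2) for every a, b in coords.

    Rows are indexed by a, columns by b.
    """
    return [
        [loss_fn([t + a * u + b * v for t, u, v in zip(theta, d1, d2)]) for b in coords]
        for a in coords
    ]


# Toy usage with a quadratic "loss" standing in for a network's training loss.
rng = random.Random(0)
theta = [1.0, -2.0, 0.5]
grid = loss_surface(
    lambda p: sum(v * v for v in p),
    theta,
    random_direction(theta, rng),
    random_direction(theta, rng),
    [-1.0, -0.5, 0.0, 0.5, 1.0],
)
```

The resulting grid can be passed to any contour or surface plotting routine; its centre entry is the loss at the trained weights.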

The next day, in “Realistic Evaluation of Deep Semi-Supervised Learning Algorithms”, Oliver *et al.* took a critical view of existing semi-supervised approaches. In a first set of results, they gave semi-supervised methods and purely supervised baselines the same hyper-parameter search budget (a point that Joelle Pineau also stressed in her keynote the day before) and showed that the gain from semi-supervised approaches is much smaller than previously reported. In addition, a simple transfer learning technique (fine-tuning) outperformed all the other methods compared. Finally, they showed that the performance of semi-supervised techniques degrades quickly if the unlabeled data has a very different class distribution from the labeled data.
