COMPUTER VISION
Automatically extracting information from images and videos for real-life applications in visual search, 3D vision, human sensing, visual reasoning, camera pose estimation and lifelong learning.
Highlights
2023
- Paper at the 2nd Conference on Lifelong Learning Agents (CoLLAs 2023)
- Romain Brégier and Jérome Revaud: outstanding reviewers at CVPR 2023
- Diane Larlus: Doctoral Consortium Chair and Area Chair at ICCV 2023
- Tutorial: Visual Recognition Beyond the Comfort Zone at ICCV 2023
- 4 papers and 2 invited workshop talks at CVPR 2023
- Spotlight paper at ICLR 2023
2022
- Rafael Sampaio de Rezende and Mert Bulent Sariyildiz: outstanding reviewers at ECCV 2022
- We have 2 papers at NeurIPS 2022
- We have a paper at IROS 2022
- We have a paper at 3DV 2022
- Keynote at Deep Learning Indaba 2022 by Gabriela Csurka
- We have 3 papers accepted at ECCV 2022
- Gabriela Csurka: outstanding reviewer at CVPR 2022
- We have 3 papers at CVPR 2022
- We have a paper at ICRA 2022
- We have 2 papers at ICLR 2022
- Paper at AAAI 2022
- The team has a paper at WACV 2022.
2021
- The team has 2 papers at 3DV 2021
- Co-organizing the ‘ImageNet: past, present and future’ workshop at NeurIPS 2021
- 2 papers at ICCV 2021, including Concept Generalization in Visual Representation Learning
- The team has 4 papers accepted at CVPR 2021 including an oral, and a findings paper in the Continual Learning workshop.
- Paper accepted at ICLR 2021 on progressive skeletonization (network pruning at initialization), available on OpenReview
- Paper with IRI and Aalto University on multi-finger grasping accepted at ICRA 2021 (arXiv preprint)
- Philippe Weinzaepfel and Grégory Rogez have a paper on understanding human action out of context, introducing the Mimetics dataset, published in IJCV. See the blog post.
- Co-organizers of the PAISS Summer School 2021
- Diane Larlus and Yannis Kalantidis are serving as Area Chairs for CVPR 2021 and ICCV 2021.
- We have 2 papers at WACV 2021.
- Co-organizing the 4th Workshop on Long-Term Visual Localization under Changing Conditions at ICCV 2021
- Invited talk on 'Modern methods for visual localization' at the 3D-DLAD workshop of the IEEE IV symposium
- Release of the world's biggest indoor localization dataset and a new version of kapture, our unified data format

The research we conduct on expressive visual representations is applicable to visual search, object detection, image classification and the automatic extraction of 3D human poses and shapes that can be used for human behavior understanding and prediction, human-robot interaction or even avatar animation. We also extract 3D information from images that can be used for intelligent robot navigation, augmented reality and the 3D reconstruction of objects, buildings or even entire cities.
Our work covers the spectrum from unsupervised to supervised approaches, and from very deep architectures to very compact ones. We’re excited about the promise of big data to bring big performance gains to our algorithms but also passionate about the challenge of working in data-scarce and low-power scenarios.
Furthermore, we believe that a modern computer vision system needs to adapt continuously to its environment and improve itself through lifelong learning. Our driving goal is to use our research to deliver embodied intelligence to our users, whether through robotics, autonomous driving, phone cameras or any other visual means, reaching people wherever they may be.
We have 4 research groups in vision: Spatial AI, Deep Geometric Learning, Visual Representation Learning and 3D Humans. Our research combines expertise in machine learning, pattern recognition and 3D vision, and focuses on long-term problems relevant to current and future NAVER services. We're very active in the computer vision community and often pursue our research in collaboration with external academic partners.
PROJECTS
- CroCo: Self-Supervised Pretraining for 3D Vision Tasks by Cross-View Completion
- Improving the generalization of supervised models (t-ReX)
- On the Road to Online Adaptation for Semantic Image Segmentation (OASIS)
- ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (ARTEMIS)
- StacMR: Scene-Text Aware Cross-Modal Retrieval (StacMR)
- Kapture: a unified data format to facilitate visual localization and SfM (kapture)
- Continual adaptation of visual representations via domain randomization and meta-learning
- Hard Negative Mixing for Contrastive Learning (MoCHi)
- Concept generalization in visual representation learning
- Proxy Virtual Worlds & Virtual KITTI 2
- Lifelong Representation Learning (Research Chair – MIAI Institute)
- Learning Visual Representations with Caption Annotations (ICMLM)
- SMPLy Benchmarking 3D Human Pose in-the-Wild
- Deep Image Retrieval

Recent Publications:
- CroCo: cross-view completion for self-supervised pretraining of geometric vision models, Leonid Antsfeld, Yohann Cabon, Boris Chidlovskii, Gabriela Csurka Khedari, Jérome Revaud, Philippe Weinzaepfel, Romain Brégier, Vincent Leroy, Thomas Lucas, Vaibhav Arora, NeurIPS, New Orleans, Louisiana USA, 28 November – 9 December, 2022
- On the road to online adaptation for semantic image segmentation, Riccardo Volpi, Pau De Jorge, Gabriela Csurka Khedari, Diane Larlus, CVPR, New Orleans, Louisiana USA, 19-24 June, 2022
- Deep visual geo-localization benchmark (oral), Gabriele Berton, Riccardo Mereu, Gabriele Trivigno, Carlo Masone, Gabriela Csurka Khedari, Torsten Sattler, Barbara Caputo, CVPR, New Orleans, Louisiana USA, 19-24 June, 2022
- PUMP: pyramidal and uniqueness matching priors for unsupervised learning of local features, Jérome Revaud, Vincent Leroy, Philippe Weinzaepfel, Boris Chidlovskii, CVPR, New Orleans, Louisiana USA, 19-24 June, 2022
- An in-depth experimental study of sensor usage and visual reasoning of robots navigating in real environments, Assem Sadek, Guillaume Bono, Boris Chidlovskii, Christian Wolf, ICRA, Philadelphia, USA, 23-27 May, 2022
- ARTEMIS: attention-based retrieval with text-explicit matching and implicit similarity, Ginger Delmas, Rafael Sampaio de Rezende, Gabriela Csurka, Diane Larlus, ICLR, virtual-only event, 25-29 April, 2022
- Learning super-features for image retrieval, Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis, ICLR, virtual-only event, 25-29 April, 2022
- Learning with label noise for image retrieval by selecting interactions, Sarah Ibrahimi, Arnaud Sors, Rafael Sampaio de Rezende, Stéphane Clinchant, WACV, Waikoloa, Hawaii, 4-8 January, 2022
- Concept generalization in visual representation learning, Mert Bulent Sariyildiz, Yannis Kalantidis, Diane Larlus, Karteek Alahari, International Conference on Computer Vision (ICCV), virtual-only conference, 11-17 October, 2021
- Probabilistic embeddings for cross-modal retrieval, Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus, Conference on Computer Vision and Pattern Recognition (CVPR), virtual-only conference, 19-25 June, 2021
- Continual adaptation of visual representations via domain randomization and meta-learning. Oral. Riccardo Volpi, Diane Larlus, Gregory Rogez, Conference on Computer Vision and Pattern Recognition (CVPR), virtual-only conference, 19-25 June, 2021
- Large-scale localization datasets in crowded indoor spaces, Donghwan Lee, Soohyun Ryu, Suyong Yeon, Yonghan Lee, Deokhwa Kim, Cheolho Han, Yohann Cabon, Philippe Weinzaepfel, Nicolas Guérin, Gabriela Csurka Khedari, Martin Humenberger, Conference on Computer Vision and Pattern Recognition (CVPR), virtual-only conference, 19-25 June, 2021
- Multi-FinGAN: generative coarse-to-fine sampling of multi-finger grasps, Jens Lundell, Enric Corona, Tran Nguyen Le, Francesco Verdoja, Philippe Weinzaepfel, Gregory Rogez, Francesc Moreno-Noguer, Ville Kyrki, IEEE International Conference on Robotics and Automation (ICRA), hybrid conference, Xi’an, China, 30 May-5 June, 2021
- Progressive skeletonization: trimming more fat from a network at initialization, Pau de Jorge, Amartya Sanyal, Harkirat Behl, Philip Torr, Gregory Rogez, Puneet Dokania, Ninth International Conference on Learning Representations (ICLR), virtual-only conference, 3-7 May, 2021
- Mimetics: Towards understanding human action out-of-context, Philippe Weinzaepfel and Grégory Rogez, International Journal of Computer Vision, volume 129, pages 1675–1690, 2021.
- Robust Image Retrieval-based Visual Localization using kapture, Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Jérome Revaud, Philippe Rerole, Noé Pion, Cesar Roberto De Souza, Vincent Leroy, Gabriela Csurka, arXiv preprint