The computer vision team conducts research in a wide range of areas, including visual search, scene parsing, human sensing, action recognition, pose estimation and lifelong learning.



  • Congrats to Gabriele Csurka – outstanding reviewer at CVPR 2022!
  • We have 3 papers at CVPR 2022
  • We have a paper at ICRA 2022
  • We have 2 papers at ICLR 2022
  • We have a paper at AAAI 2022
  • We have a paper at WACV 2022



Computer Vision

Our work covers the spectrum from unsupervised to supervised approaches, and from very deep architectures to very compact ones. We’re excited about the promise of big data to bring big performance gains to our algorithms, but we’re also passionate about the challenge of working in data-scarce and low-power scenarios. Our driving goal is to use our research to deliver ambient visual intelligence to our users in autonomous driving, in robotics, via phone cameras and through any other visual means of reaching people wherever they may be.

Our research combines skills in machine learning, pattern recognition and computer vision, and we work on multi-disciplinary problems with teams specialised in natural language processing, user experience, ethnography, design and more. Our research efforts may be long-term in focus or may tackle problems with concrete and immediate relevance to NAVER products and services. We’re very active in the computer vision community, and our research is often pursued in collaboration with external partners from government and academia.

A novel plug-and-play model for 3D human shape estimation of the body or hands in videos, trained by mimicking the BERT algorithm from the natural language processing community. Blog article by Fabien Baradel, Philippe Weinzaepfel, Romain Brégier, Yannis Kalantidis and Grégory Rogez
Continual Learning of visual representations without catastrophic forgetting
Using domain randomization and meta-learning, computer vision models forget less when exposed to training samples from new domains. Blog article by Riccardo Volpi, Diane Larlus and Grégory Rogez
Learning Visual Representations with Caption Annotations
A new modeling task masks tokens in image captions to enable mid-sized sets of captioned images to rival large-scale labelled image sets for learning generic visual representations. Blog article by Diane Larlus
A novel, efficient model for whole-body 3D pose estimation (including bodies, hands and faces) that is trained by mimicking the output of hand-, body- and face-pose experts. Blog article by Philippe Weinzaepfel
The short memory of artificial neural networks
A research overview of current work in lifelong learning. Blog article by Riccardo Volpi
A first-of-its-kind architecture that, from a single image, predicts how a robot can pick up objects from within any scene. It could revolutionize applications in AR/VR and robotics. Blog article by Grégory Rogez
Naver Labs Europe is leading a chair on Lifelong Representation Learning as part of the MIAI institute (Multidisciplinary Institute in Artificial Intelligence)
Learning Visual Representations with Caption Annotations (European Conference on Computer Vision, ECCV 2020 paper)
