Code and data from NAVER LABS Europe

Multi-HMR + Anny

A Multi-HMR checkpoint trained with the Anny body model.

Monocular multi-person Human Mesh Recovery in a unified, open and interpretable representation. Anny is our open-source (Apache 2.0) parametric 3D human model.

3D vision, Computer vision, Human understanding

Github link

Web page link

Anny-One

A large-scale synthetic dataset for 3D human understanding

Multi-person configurations; multi-view setups with extreme camera positions; diverse cameras, viewpoints and FOVs; indoor 3D scenes; and full Anny annotations. Designed for Human Mesh Recovery in challenging settings.

3D vision, Computer vision, Human understanding, Synthetic data

Web page link

Anny

3D human parametric model

An open-source (Apache 2.0) parametric 3D human model that covers a wide spectrum of humanity, from infants to elders, with interpretable and easy-to-use controls.

3D vision, Computer vision, Human understanding, Synthetic data

Github link

Blog link

Web page link

LUDVIG

Learning-free Uplifting of 2D Visual features to Gaussian splatting scenes

A simple yet effective aggregation technique applied to semantic masks from Segment Anything (SAM) and to generic DINOv2 features, integrating 3D scene geometry through graph diffusion (ICCV 2025 paper).

3D vision, Computer vision

Github link

Blog link

Web page link
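
To make the "aggregation" idea in the LUDVIG entry above more concrete, here is a minimal sketch of one way to lift 2D pixel features onto 3D Gaussians: a weighted average of the features of every pixel a Gaussian contributes to. This is an illustration under stated assumptions, not the LUDVIG code. The function name `uplift_features`, the `(gaussian_id, weight, pixel_feature)` input format and the use of a per-pixel contribution weight are all hypothetical, and the graph diffusion step is not covered.

```python
def uplift_features(contributions, num_gaussians, feat_dim):
    """Average 2D pixel features into one feature per 3D Gaussian.

    contributions: iterable of (gaussian_id, weight, pixel_feature) tuples,
    where weight is an assumed measure of how much the Gaussian contributed
    to that pixel when the view was rendered (hypothetical input format).
    Returns a list with one feature per Gaussian, or None if the Gaussian
    was never observed.
    """
    sums = [[0.0] * feat_dim for _ in range(num_gaussians)]
    totals = [0.0] * num_gaussians
    for gid, weight, feat in contributions:
        totals[gid] += weight
        for d in range(feat_dim):
            sums[gid][d] += weight * feat[d]
    # Normalize by the total weight so each feature is a weighted mean.
    return [
        [s / total for s in row] if total > 0 else None
        for row, total in zip(sums, totals)
    ]


# Toy example: Gaussian 0 is seen by two pixels, Gaussian 1 by one, Gaussian 2 by none.
toy_contributions = [
    (0, 0.75, [1.0, 0.0]),
    (0, 0.25, [0.0, 1.0]),
    (1, 0.5, [2.0, 2.0]),
]
uplifted = uplift_features(toy_contributions, num_gaussians=3, feat_dim=2)
```

Because the procedure is a plain weighted average, it needs no training, which matches the "learning-free" framing of the entry.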

LPOSS

Label Propagation over patches and pixels for Open-vocabulary Semantic Segmentation

A training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs).

Computer vision, Data, LLM

DUNE

Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers

A unified encoder of different foundation models excelling in 2D vision, 3D understanding, and 3D human perception. Code accompanies the CVPR 2025 paper.

3D vision, Computer vision, Data, Foundation models, Human understanding

Github link

Blog link

Web page link

Geo4D

Leveraging video generators for Geometric 4D scene reconstruction

Geo4D repurposes video diffusion models for monocular 3D reconstruction of dynamic scenes, including long videos.

3D vision, Computer vision

Github link

Blog link

Web page link

PoseEmbroider

Towards 3D, visual, semantic-aware human pose representation

A transformer-based multi-modal alignment and retrieval model that processes 3D poses, pictures of people and textual pose descriptions to produce an enhanced 3D-, visual- and semantic-aware human pose representation that can handle partial information (e.g. an image with the lower body occluded). Code accompanies the ECCV 2024 paper.

3D vision, Computer vision, Human understanding, NLP

Github link

Blog link

Web page link

UNIC

Universal Classification Models via Multi-teacher Distillation

General encoder for classification. Accompanies ECCV’24 paper.

Computer vision, Visual representation learning

Github link

Blog link
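
As a rough illustration of what multi-teacher distillation means in the UNIC entry above (and in DUNE), the sketch below computes a loss that pulls a student's outputs toward those of several teachers at once. It is not the UNIC training code. The cosine-distance objective, the uniform default weights, the per-teacher student outputs and the teacher names `"teacher_a"` and `"teacher_b"` are assumptions chosen for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def multi_teacher_loss(student_outputs, teacher_outputs, weights=None):
    """Weighted sum of per-teacher cosine distances.

    student_outputs / teacher_outputs: dicts mapping a teacher name to a
    feature vector. The student is assumed to produce one output per teacher
    (for example through a separate projection head); that is a hypothetical
    design choice for this sketch.
    """
    names = sorted(teacher_outputs)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    return sum(
        weights[name] * cosine_distance(student_outputs[name], teacher_outputs[name])
        for name in names
    )


# Toy example: the student matches teacher_a's direction and is orthogonal to teacher_b.
student = {"teacher_a": [1.0, 0.0], "teacher_b": [0.0, 1.0]}
teachers = {"teacher_a": [2.0, 0.0], "teacher_b": [1.0, 0.0]}
loss = multi_teacher_loss(student, teachers)
```

In a real setup the loss would be minimized by gradient descent over the student's parameters. This sketch only shows how the per-teacher terms combine.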

DEBiT (Dual Encoder Binocular Transformer)

Correspondence Pretext Tasks for Goal-oriented Visual Navigation

An end-to-end trained agent for image goal navigation. Accompanies the ICLR 2024 paper "End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon".

Computer vision, Foundation models

Github link

Blog link

SHiNe

Semantic Hierarchy Nexus for Open-vocabulary Object Detection

A novel classifier that uses semantic knowledge from class hierarchies. Can be seamlessly integrated with any off-the-shelf OvOD detector, with no additional computational overhead during inference.

Computer vision

Github link

Blog link

ZLaP

A classification method based on label propagation (LP) that utilizes geodesic distances.

Code that accompanies the CVPR 24 paper, Label Propagation for Zero-shot Classification with Vision-Language Models.

Computer vision

Github link

Blog link
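
As a rough illustration of the label propagation idea named in the ZLaP entry above (and in LPOSS), here is a minimal sketch of the classic iterative scheme on a small weighted graph. It is not the ZLaP code. The toy graph, the `alpha` value and the function names `normalize`, `propagate` and `predict` are assumptions made for this example, and the geodesic-distance component mentioned in the entry is not reproduced.

```python
import math


def normalize(W):
    """Symmetrically normalize an affinity matrix: S = D^-1/2 W D^-1/2."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def propagate(W, Y, alpha=0.9, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y, starting from F = Y."""
    S = normalize(W)
    n, c = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [
            [
                alpha * sum(S[i][j] * F[j][k] for j in range(n)) + (1 - alpha) * Y[i][k]
                for k in range(c)
            ]
            for i in range(n)
        ]
    return F


def predict(F):
    """Pick the highest-scoring class for each node."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]


# Toy graph: two chains (0-1-2 and 3-4-5) joined by one weak edge (2-3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0), (4, 5, 1.0)]
W = [[0.0] * 6 for _ in range(6)]
for i, j, w in edges:
    W[i][j] = W[j][i] = w

# Only node 0 (class 0) and node 5 (class 1) carry an initial label.
Y = [[0.0, 0.0] for _ in range(6)]
Y[0][0] = 1.0
Y[5][1] = 1.0

labels = predict(propagate(W, Y))
```

In this toy graph the two seed labels spread along the strong edges, so each chain ends up with the class of its labeled endpoint. In a zero-shot setting, the seeds would come from class text embeddings rather than hand-set labels.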

POC

Placing Objects in Context

Code for the paper “Placing Objects in Context via Inpainting for Out-of-distribution Segmentation”, ECCV 2024

3D vision, Computer vision

Github link

Blog link

Multi-HMR

Whole-body human mesh recovery of multiple persons from a single image.

A simple yet effective single-shot method to detect multiple people in an image and estimate their pose, body shape and expression. Training and demo code.

Computer vision, Human understanding

Github link

Blog link

Web page link

SHOWMe

Benchmarking Object-agnostic Hand-Object 3D Reconstruction

The SHOWMe dataset comprises 96 videos with their associated high-quality textured meshes of a hand holding an object.

Computer vision, Human understanding

Github link

Blog link

Web page link

4DHumanOutfit

A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements.

Collaboration with INRIA.

Computer vision, Human understanding

Blog link

Web page link

Transferable representations

Fake it till you make it: Learning transferable representations from synthetic ImageNet clones

Models trained on synthetic images exhibit strong generalization properties and perform on par with models trained on real data.

Computer vision, Data

PoseFix

Correcting 3D human poses with natural language.

The PoseFix dataset consists of several thousand paired 3D poses and corresponding text feedback that describes how the source pose needs to be modified to obtain the target pose.

Computer vision, Human understanding

Github link

Blog link

Web page link

SLACK

Stable Learning of Augmentations with Cold-start and KL regularization.

Learning augmentation policies without prior knowledge.

Computer vision, Visual representation learning

Github link

Blog link

Web page link

RELIS semantic segmentation

Reliability in semantic segmentation: are we on the right track?

A codebase to evaluate the robustness and uncertainty properties of semantic segmentation models as implemented in the CVPR 2024 paper.

Computer vision, Data, Visual representation learning

Github link

Blog link

T-REX

No reason for no supervision: improved generalization in supervised models.

Model for transfer learning.

Computer vision, Visual representation learning

Github link

Blog link

Web page link

Synthetic ImageNet clones

Fake it till you make it: learning transferable representations from synthetic ImageNet clones.

Two ResNet50 models pretrained on our synthetic ImageNet clones: ImageNet-100-SD or ImageNet-1K-SD.

Computer vision, Visual representation learning

Blog link

Web page link

ARTEMIS

Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity.

An Explicit Matching module for compatibility and an Implicit Similarity module for relevance.

Computer vision, Visual representation learning

Github link

Blog link

Web page link

Learning super-features for image retrieval

A novel architecture for deep image retrieval

Code for running our FIRe model, based solely on mid-level features that we call super-features.

Computer vision, Visual representation learning

Github link

Blog link

Contributing to the open science community


