A simple yet effective single-shot method to detect multiple people in an image and estimate their pose, body shape and expression. Training and demo code.
The SHOWMe dataset comprises 96 videos of a hand holding an object, each paired with a high-quality textured mesh.
Collaboration with INRIA.
The PoseFix dataset consists of several thousand pairs of 3D poses, each with text feedback describing how the source pose should be modified to obtain the target pose.
A model trained by adapting the BERT masked-prediction scheme from the natural language processing community.
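As an illustration of this masked-modelling idea applied to pose sequences, here is a minimal PyTorch sketch; all module names, dimensions and the masking ratio are assumptions for illustration, not the released code.

```python
import torch
import torch.nn as nn

class MaskedPoseModel(nn.Module):
    """BERT-style sketch: hide some frames of a pose sequence and reconstruct them."""
    def __init__(self, pose_dim=72, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, poses, mask):
        # poses: (B, T, pose_dim); mask: (B, T) boolean, True where the frame is hidden
        x = self.embed(poses)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        return self.head(self.encoder(x))

model = MaskedPoseModel()
poses = torch.randn(2, 16, 72)                 # dummy per-frame pose parameters
mask = torch.zeros(2, 16, dtype=torch.bool)
mask[:, ::4] = True                            # hide every fourth frame
recon = model(poses, mask)
loss = ((recon - poses)[mask] ** 2).mean()     # reconstruct only the masked frames
loss.backward()
```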
An auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences.
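A minimal sketch of the two ingredients named above, quantizing continuous motion latents against a codebook and modelling the resulting index sequence with a causal transformer; all names, sizes and interfaces are assumptions, not the released model (in practice the codebook would itself be learned, e.g. with a VQ-VAE-style reconstruction stage).

```python
import torch
import torch.nn as nn

class MotionPrior(nn.Module):
    def __init__(self, codebook_size=512, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, d_model)   # quantized motion latents
        self.tok_embed = nn.Embedding(codebook_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, codebook_size)

    def quantize(self, z):
        # z: (B, T, d_model) continuous per-frame latents -> nearest codebook index
        dist = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)   # (B, T, K)
        return dist.argmin(-1)                                           # (B, T)

    def forward(self, indices):
        # causal mask so each position only attends to earlier latents
        T = indices.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        x = self.transformer(self.tok_embed(indices), mask=causal)
        return self.head(x)                                              # next-index logits

model = MotionPrior()
z = torch.randn(2, 32, 256)                    # dummy continuous motion latents
idx = model.quantize(z)                        # discrete latent sequence
logits = model(idx[:, :-1])                    # predict index t from indices < t
loss = nn.functional.cross_entropy(logits.reshape(-1, 512), idx[:, 1:].reshape(-1))
loss.backward()
```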
A dataset pairing 3D human poses with both automatically generated and human-written descriptions.
A novel, efficient model for whole-body 3D pose estimation (including bodies, hands and faces), trained by mimicking the output of hand-, body- and face-pose experts.
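The sketch below illustrates this kind of expert distillation: a single whole-body student regresses the keypoints predicted by frozen part experts on the same images. The keypoint counts, architecture and loss are assumptions for illustration only.

```python
import torch
import torch.nn as nn

PARTS = {"body": 13, "hand": 2 * 21, "face": 84}   # assumed keypoint counts per part

class WholeBodyStudent(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heads = nn.ModuleDict(
            {part: nn.Linear(feat_dim, n_kpts * 3) for part, n_kpts in PARTS.items()})

    def forward(self, images):
        feat = self.backbone(images)
        return {part: head(feat) for part, head in self.heads.items()}

def distillation_loss(student_out, expert_out):
    # regress each part towards the corresponding expert's pseudo ground truth
    return sum(nn.functional.l1_loss(student_out[p], expert_out[p]) for p in PARTS)

student = WholeBodyStudent()
images = torch.randn(4, 3, 128, 128)
# stand-ins for the predictions of frozen body-, hand- and face-pose experts
expert_out = {p: torch.randn(4, n * 3) for p, n in PARTS.items()}
loss = distillation_loss(student(images), expert_out)
loss.backward()
```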
Improved integration of pose proposals for multi-person 2D and 3D pose detection in natural images.
Benchmark associated with the 3DV2020 paper of the same name.
A strategy for training a stream that takes only RGB frames as input yet leverages both their appearance and motion information.
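One common way to realise this, shown in the hedged sketch below, is to train the RGB stream both to classify the clip and to mimic the features of a frozen network trained on optical flow, so motion information is absorbed while only RGB frames are needed at test time. The tiny 3D-conv backbone, feature dimension and loss weighting are assumptions, not the released code.

```python
import torch
import torch.nn as nn

class ClipNet(nn.Module):
    """Tiny 3D-conv clip classifier standing in for a real video backbone."""
    def __init__(self, in_channels, n_classes=400, feat_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, clip):
        feat = self.features(clip)
        return feat, self.classifier(feat)

rgb_net = ClipNet(in_channels=3)                  # trained stream (RGB only at test time)
flow_net = ClipNet(in_channels=2).eval()          # frozen teacher trained on optical flow
for p in flow_net.parameters():
    p.requires_grad_(False)

rgb = torch.randn(2, 3, 8, 56, 56)                # dummy RGB clips
flow = torch.randn(2, 2, 8, 56, 56)               # corresponding optical flow
labels = torch.randint(0, 400, (2,))

rgb_feat, logits = rgb_net(rgb)
flow_feat, _ = flow_net(flow)
loss = nn.functional.cross_entropy(logits, labels) \
     + nn.functional.mse_loss(rgb_feat, flow_feat)  # feature-mimicry term
loss.backward()
```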
713 YouTube video clips of mimed actions covering a subset of 50 classes from the Kinetics400 dataset.
Contains 39,982 videos, with more than 1,000 examples for each of 35 action categories.