Data, code and models released by NAVER LABS Europe


Towards 3D, visual, semantic-aware human pose representation

A transformer-based multi-modal alignment retrieval model that processes 3D poses, pictures of people and textual pose descriptions to produce an enhanced 3D-, visual- and semantic-aware human pose representation that can cope with partial information (e.g. an image with the lower body occluded). Code accompanies the ECCV 2024 paper.


A Generalist Combinatorial Optimization Agent Learning

Code to learn to solve 16 combinatorial optimization problems with strong transfer learning capabilities.

Retrieval-augmented generation in multilingual settings

A study of retrieval-augmented generation (RAG) in the multilingual setting (mRAG). Our findings highlight that, despite the availability of high-quality off-the-shelf multilingual retrievers and generators, task-specific prompt engineering is needed to enable generation in user languages. Moreover, current evaluation metrics need adjusting for the multilingual setting to account for variations in the spelling of named entities.


A multilingual Spoken Language Understanding (SLU) dataset

Covers 12 languages from different families and inherits the annotations for the intent prediction and slot filling tasks from the original MASSIVE dataset. See also the Interspeech 2024 paper.


Universal Classification Models via Multi-teacher Distillation

General encoder for classification. Accompanies ECCV’24 paper.

DEBiT (Dual Encoder Binocular Transformer)

Correspondence Pretext Tasks for Goal-oriented Visual Navigation

An end-to-end trained agent for image goal navigation. Accompanies the ICLR 2024 paper “End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon”.


BERGEN: benchmarking RAG

A Benchmarking Library for Retrieval-Augmented Generation

Designed to ease the reproducibility and integration of new datasets and models and identify strong baselines.


A benchmark for the evaluation of long-context LLMs on meeting transcripts.

The meeting data used in this benchmark originally comes from the ELITR dataset. This dataset and experiments are described in the paper and are an output of the EU UTTER project.


Lightweight PyTorch framework for training and running text generation models.

Can be used for machine translation, speech translation, language modeling and dialogue, and supports a number of popular pre-trained models.


Semantic Hierarchy Nexus for Open-vocabulary Object Detection

A novel classifier that uses semantic knowledge from class hierarchies. Can be seamlessly integrated with any off-the-shelf OvOD detector, with no additional computational overhead during inference.


A classification method based on label propagation (LP) that utilizes geodesic distances.

Code that accompanies the CVPR 24 paper, Label Propagation for Zero-shot Classification with Vision-Language Models.

DUSt3R: Dense and Unconstrained Stereo 3D Reconstruction

3D reconstruction models made easy

3D reconstruction and visual localization with no user intervention and no priors using only a few images.


Placing Objects in Context

Code for the paper “Placing Objects in Context via Inpainting for Out-of-distribution Segmentation”, ECCV 2024


A sparse bi-encoder BERT-based model for effective and efficient first-stage ranking.

Several releases: SPLADE V-2, SPLADE V-3, CoSPLADE etc.


Efficient distillation of multi-task speech models via language-specific experts.

A multitask and multilingual speech model covering 99 languages.


Whole-body human mesh recovery of multiple persons from a single image.

A simple yet effective single-shot method to detect multiple people in an image and estimate their pose, body shape and expression. Training and demo code.


Bisimulation Quotienting for Efficient Neural Combinatorial Optimization

Code to learn to solve 4 standard combinatorial optimization problems: TSPs, CVRP, OP and KP. Accompanies the NeurIPS 2023 paper.


Benchmarking Object-agnostic Hand-Object 3D Reconstruction

The SHOWMe dataset comprises 96 videos with their associated high-quality textured meshes of a hand holding an object.


A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements.

Collaboration with INRIA.


Contextualizing SPLADE for conversational information retrieval.

SPLADE is a sparse bi-encoder BERT-based model for effective and efficient first-stage ranking.


Correcting 3D human poses with natural language.

The PoseFix dataset consists of several thousand paired 3D poses and corresponding text feedback that describes how the source pose needs to be modified to obtain the target pose.


Stable Learning of Augmentations with Cold-start and KL regularization.

Learning augmentation policies without prior knowledge.

RELIS semantic segmentation

Reliability in semantic segmentation: are we on the right track?

A codebase to evaluate the robustness and uncertainty properties of semantic segmentation models as implemented in the CVPR 2024 paper.


No reason for no supervision: improved generalization in supervised models.

Model for transfer learning.
