MASt3R
Built on the DUSt3R framework, MASt3R provides metric 3D reconstruction and dense local feature maps, and can handle thousands of images.
mHuBERT-147
A promising compact model for speech processing pipelines, offering an unprecedented balance between high performance and parameter efficiency. Developed within the EU UTTER project.
PoseEmbroider
A transformer-based multi-modal alignment and retrieval model that processes 3D poses, pictures of people and textual pose descriptions to produce an enhanced 3D-, visual- and semantic-aware human pose representation that can handle partial information (e.g. an image with the lower body occluded). Code accompanying the ECCV 2024 paper.
Retrieval-augmented generation in multilingual settings
A study of retrieval-augmented generation (RAG) in the multilingual setting (mRAG). Our findings highlight that, despite the availability of high-quality off-the-shelf multilingual retrievers and generators, task-specific prompt engineering is needed to enable generation in user languages. Moreover, current evaluation metrics need adjusting for the multilingual setting to account for variations in the spelling of named entities.
Speech-MASSIVE
Covers 12 languages from different families and inherits the annotations for the intent prediction and slot filling tasks from the original MASSIVE dataset. See also the Interspeech 2024 paper.
UNIC
General encoder for classification. Accompanies ECCV’24 paper.
DEBiT (Dual Encoder Binocular Transformer)
An end-to-end trained agent for image goal navigation. Accompanies the ICLR 2024 paper “End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon”.
BERGEN: benchmarking RAG
Designed to ease reproducibility and the integration of new datasets and models, and to identify strong baselines.
Can be used for machine translation, speech translation, language modeling and dialogue, and supports a number of popular pre-trained models.
SHiNe
A novel classifier that uses semantic knowledge from class hierarchies. Can be seamlessly integrated with any off-the-shelf OvOD detector, with no additional computational overhead during inference.
ZLaP
Code that accompanies the CVPR 2024 paper “Label Propagation for Zero-shot Classification with Vision-Language Models”.
3D reconstruction and visual localization with no user intervention and no priors using only a few images.
POC
Code for the ECCV 2024 paper “Placing Objects in Context via Inpainting for Out-of-distribution Segmentation”.
Several releases: SPLADE V-2, SPLADE V-3, CoSPLADE etc.
A multitask and multilingual speech model covering 99 languages.
A simple yet effective single-shot method to detect multiple people in an image and estimate their pose, body shape and expression. Training and demo code.
Code to learn to solve four standard combinatorial optimization problems: TSP, CVRP, OP and KP. Accompanies the NeurIPS 2023 paper.
The SHOWMe dataset comprises 96 videos with their associated high-quality textured meshes of a hand holding an object.
Collaboration with INRIA.
SPLADE is a sparse, BERT-based bi-encoder model for effective and efficient first-stage ranking.
The PoseFix dataset consists of several thousand paired 3D poses and corresponding text feedback that describes how the source pose needs to be modified to obtain the target pose.
Learning augmentation policies without prior knowledge.
A codebase to evaluate the robustness and uncertainty properties of semantic segmentation models as implemented in the CVPR 2024 paper.
Model for transfer learning.
Two ResNet50 models pretrained on our synthetic ImageNet clones: ImageNet-100-SD and ImageNet-1K-SD.