PoseEmbroider
A transformer-based model for multi-modal alignment and retrieval that processes 3D poses, pictures of people and textual pose descriptions to produce an enhanced 3D-, visual- and semantic-aware human pose representation, able to cope with partial information (e.g. an image with the lower body occluded). Code accompanying the ECCV 2024 paper.
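The released code covers the full training and retrieval pipeline; purely as an illustration of the fusion idea described above (a transformer attending over whichever modality embeddings happen to be available), here is a minimal PyTorch sketch. The module names, dimensions and fusion-token design are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal sketch (assumed design, not the released code): fuse whichever of
# the three modality embeddings is available into one pose representation.
import torch
import torch.nn as nn

class MultiModalPoseEncoder(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_layers=2):
        super().__init__()
        # Learned query token that aggregates the available modalities.
        self.fusion_token = nn.Parameter(torch.randn(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, n_layers)

    def forward(self, pose_emb=None, image_emb=None, text_emb=None):
        # Each argument is an optional (batch, dim) embedding produced by a
        # modality-specific encoder; missing modalities are simply dropped.
        available = [e for e in (pose_emb, image_emb, text_emb) if e is not None]
        batch = available[0].shape[0]
        tokens = [self.fusion_token.expand(batch, -1, -1)]
        tokens += [e.unsqueeze(1) for e in available]
        fused = self.fuser(torch.cat(tokens, dim=1))
        return fused[:, 0]  # the fusion token carries the joint representation

# Image-only query, e.g. a picture with the lower body occluded.
model = MultiModalPoseEncoder()
image_emb = torch.randn(8, 256)
representation = model(image_emb=image_emb)  # shape (8, 256)
```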
Speech-MASSIVE
Covers 12 languages from different families and inherits the intent prediction and slot-filling annotations from the original MASSIVE dataset. See also the Interspeech 2024 paper.
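As a quick-start illustration, the corpus can presumably be loaded through the Hugging Face `datasets` library; the hub id, configuration name and field names below are assumptions, so check the dataset card for the exact identifiers.

```python
# Hedged example: load one language of Speech-MASSIVE with `datasets`.
# Hub id, config and field names are assumptions; see the dataset card.
from datasets import load_dataset

ds = load_dataset("FBK-MT/Speech-MASSIVE", "fr-FR", split="validation")
example = ds[0]
print(example["utt"])     # utterance transcript (field name may differ)
print(example["intent"])  # intent label inherited from MASSIVE
```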
BERGEN: benchmarking RAG
Designed to ease reproducibility, simplify the integration of new datasets and models, and identify strong baselines.
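BERGEN itself is configuration-driven; the snippet below is not its API but a sketch of the kind of RAG baseline it benchmarks: retrieve top-k passages, build a prompt, generate, and score against the gold answer. All function names are illustrative.

```python
# Illustrative RAG baseline of the kind BERGEN evaluates (not BERGEN's API).
from typing import Callable, List

def rag_answer(question: str,
               retrieve: Callable[[str, int], List[str]],
               generate: Callable[[str], str],
               k: int = 5) -> str:
    # Retrieve supporting passages and prepend them to the prompt.
    passages = retrieve(question, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

def exact_match(prediction: str, gold: str) -> bool:
    # One of the simple answer-level metrics a RAG benchmark can report.
    return prediction.strip().lower() == gold.strip().lower()
```

The point of a benchmark like BERGEN is that the retriever, generator and dataset in such a loop become interchangeable, comparable components.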
Can be used for machine translation, speech translation, language modeling and dialogue, and supports a number of popular pre-trained models.
mHuBERT-147
A promising compact model for speech processing pipelines, offering an unprecedented balance between high performance and parameter efficiency. Developed within the EU UTTER project.
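As a hedged usage sketch, a HuBERT-style checkpoint can typically serve as a frozen feature extractor through Hugging Face transformers; the hub id below is assumed to be the released one, so check the model card before copying it.

```python
# Hedged example: extract multilingual speech features with a HuBERT-style
# checkpoint via transformers. The hub id is an assumption; see the model card.
import torch
from transformers import AutoFeatureExtractor, HubertModel

model_id = "utter-project/mHuBERT-147"  # assumed identifier
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)

waveform = torch.randn(16000)  # 1 second of dummy 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # (1, n_frames, hidden_dim)
```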
A multitask and multilingual speech model covering 99 languages.
Code repository for the paper: What do compressed multilingual machine translation models forget?
Covers more than 10K language pairs and achieves results competitive with M2M-100 while being much smaller and faster.
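For context, the many-to-many translation interface it is compared against looks like the documented M2M-100 usage in transformers below; the project's own checkpoint name and loading code may differ.

```python
# Reference point: many-to-many translation with M2M-100 via transformers
# (the baseline mentioned above); the compact model's own checkpoint differs.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "fr"
encoded = tokenizer("La vie est belle.", return_tensors="pt")
generated = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```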
Publications cover efficient inference, continual learning, unsupervised NMT and domain adaptation.
A method to predict the drop in accuracy of a trained model.
585 samples (1,006 sentences), randomly selected and annotated following the SemEval-2016 annotation guidelines for the restaurant domain.