Spire
Spire transcribes English speech input and translates it into 10 other languages, and also translates text input in both language directions. Spire is an output of the EU project UTTER (Unified Transcription and Translation for Extended Reality).
LPOSS
A training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs).
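The sketch below is not LPOSS itself but a minimal illustration of the training-free recipe such methods build on: score dense patch features from a VLM against class-name text embeddings and upsample the resulting patch-level labels to pixel resolution. The feature tensors, class names, and image size here are placeholders; in practice the features would come from a model such as CLIP, and LPOSS further refines these raw assignments.

```python
import torch
import torch.nn.functional as F

# Placeholder VLM outputs (assumed shapes): patch features from the image encoder,
# class-name embeddings from the text encoder of an open-vocabulary VLM.
num_patches_h, num_patches_w, dim = 14, 14, 512
class_names = ["cat", "dog", "grass", "sky"]
patch_feats = torch.randn(num_patches_h * num_patches_w, dim)
text_feats = torch.randn(len(class_names), dim)

# Score every patch against every class name with cosine similarity -- no training involved.
patch_feats = F.normalize(patch_feats, dim=-1)
text_feats = F.normalize(text_feats, dim=-1)
similarity = patch_feats @ text_feats.T            # (num_patches, num_classes)
patch_labels = similarity.argmax(dim=-1)           # one class index per patch

# Upsample the coarse patch-level label map to pixel resolution.
coarse = patch_labels.float().view(1, 1, num_patches_h, num_patches_w)
pixel_map = F.interpolate(coarse, size=(224, 224), mode="nearest").long()
print(pixel_map.shape)  # torch.Size([1, 1, 224, 224])
```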
LLM-as-a-qualitative-judge
LLM-as-a-qualitative-judge correctly recognizes instance-specific issues in 2/3 of cases and can produce error type reports resembling those composed by human annotators.
GUARD
A principled approach to enforcing strict guarantees for LLMs without compromising their generative capabilities, combining an autoregressive proposal distribution with rejection sampling.
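A minimal sketch of the sample-and-filter loop that rejection sampling with an autoregressive proposal implies: draw a candidate from a proposal language model and keep it only if it satisfies the constraint, so every accepted output is guaranteed to comply. The model name, constraint, and budget below are illustrative placeholders, not the GUARD implementation.

```python
from typing import Optional

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in proposal model

def satisfies_constraint(text: str) -> bool:
    # Example of a strict, checkable constraint: the output must contain a keyword.
    return "Paris" in text

def guarded_sample(prompt: str, max_tries: int = 32) -> Optional[str]:
    """Sample from the proposal and keep only outputs satisfying the constraint.

    Accepted samples respect the constraint by construction; the closer the
    proposal is to the constrained target distribution, the fewer rejections.
    """
    for _ in range(max_tries):
        out = generator(prompt, max_new_tokens=30, do_sample=True)[0]["generated_text"]
        if satisfies_constraint(out):
            return out
    return None  # constraint not met within the sampling budget

print(guarded_sample("The capital of France is"))
```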
OSCAR
A novel query-dependent online soft compression method for RAG that reduces computational overhead while preserving performance. Unlike traditional hard compression methods, which shorten retrieved texts, or soft compression approaches, which map documents to continuous embeddings offline, OSCAR dynamically compresses retrieved information at inference time, eliminating storage overhead and enabling higher compression rates.
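To make "online soft compression" concrete, here is a toy sketch of the general idea: at inference time, a small module attends over the query and the retrieved passage and squeezes the passage into a handful of continuous vectors, which would then stand in for the full passage tokens on the generator side. The architecture, dimensions, and slot count are assumptions for illustration, not the actual OSCAR model.

```python
import torch
import torch.nn as nn

class QueryConditionedCompressor(nn.Module):
    """Compress a retrieved passage into k continuous vectors, conditioned on the query."""

    def __init__(self, d_model: int = 256, k: int = 4, n_heads: int = 4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(k, d_model))   # learned compression "slots"
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, query_emb: torch.Tensor, passage_emb: torch.Tensor) -> torch.Tensor:
        # query_emb: (B, Lq, D), passage_emb: (B, Lp, D)
        context = torch.cat([query_emb, passage_emb], dim=1)  # condition on the query
        slots = self.memory.unsqueeze(0).expand(context.size(0), -1, -1)
        compressed, _ = self.attn(slots, context, context)    # (B, k, D)
        return compressed  # fed to the generator in place of the full passage tokens

compressor = QueryConditionedCompressor()
q = torch.randn(1, 8, 256)    # query token embeddings
p = torch.randn(1, 200, 256)  # retrieved passage token embeddings
print(compressor(q, p).shape)  # torch.Size([1, 4, 256]) -> ~50x fewer vectors than tokens
```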
Speech-MASSIVE
Covers 12 languages from different families and inherits from the original MASSIVE dataset the annotations for the intent prediction and slot filling tasks. See also the Interspeech 2024 paper.
Can be used for machine translation, speech translation, language modeling and dialogue, and supports a number of popular pre-trained models.
A toolkit for controlling language models and other generative models.
Prompted datasets for benchmarking the ability of a model to perform completely unseen tasks specified in natural language.
A general framework for imposing constraints on samples of pretrained language models.



