CODE & DATA

Data, code and models released by NAVER LABS Europe

GUARD

Guaranteed Generation from Large Language Models

A principled approach to enforcing strict guarantees for LLMs without compromising their generative capabilities, combining an autoregressive proposal distribution with rejection sampling.
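As a rough illustration of the idea (a minimal sketch, not the released implementation; `sample_from_proposal` and `satisfies_constraint` are hypothetical placeholders), rejection sampling keeps only proposal samples that satisfy the constraint, so every returned output is guaranteed to comply:

```python
import random

# Minimal sketch of constrained generation via rejection sampling.
# The function names are illustrative placeholders, not the GUARD API.

def sample_from_proposal() -> str:
    """Draw one sequence from the autoregressive proposal distribution."""
    # Toy proposal: random strings over {a, b}; a real proposal would be
    # an LLM tuned or prompted to make the constraint likely.
    return "".join(random.choice("ab") for _ in range(5))

def satisfies_constraint(text: str) -> bool:
    """Strict, binary guarantee; here: the output must contain 'aa'."""
    return "aa" in text

def guaranteed_sample(max_tries: int = 1000) -> str:
    """Every returned sample satisfies the constraint; accepted samples
    follow the proposal distribution restricted to the constraint set."""
    for _ in range(max_tries):
        candidate = sample_from_proposal()
        if satisfies_constraint(candidate):
            return candidate
    raise RuntimeError("acceptance rate too low; improve the proposal")

print(guaranteed_sample())
```

The acceptance rate depends on how much probability mass the proposal already puts on the constraint set, which is why pairing rejection sampling with a good autoregressive proposal matters.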

OSCAR

Online Soft Compression And Reranking

A novel query-dependent online soft compression method for RAG that reduces computational overhead while preserving performance. Unlike traditional hard compression methods, which shorten retrieved texts, or soft compression approaches, which map documents to continuous embeddings offline, OSCAR dynamically compresses retrieved information at inference time, eliminating storage overhead and enabling higher compression rates.
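A toy sketch of the mechanism (PyTorch; all module and variable names are hypothetical, not the OSCAR code): memory slots conditioned on the query cross-attend to the retrieved document's token states, pooling them into a handful of continuous vectors at inference time:

```python
import torch
import torch.nn as nn

# Toy sketch of query-dependent online soft compression.
# A compressor maps a retrieved document, conditioned on the query, to a
# few continuous "memory" embeddings that replace the document tokens.

class OnlineCompressor(nn.Module):
    def __init__(self, d_model: int = 64, n_memory: int = 4):
        super().__init__()
        self.memory_queries = nn.Parameter(torch.randn(n_memory, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, query_states: torch.Tensor, doc_states: torch.Tensor) -> torch.Tensor:
        # Condition learned memory slots on the user query, then cross-attend
        # to the document token states to pool them into n_memory vectors.
        batch = doc_states.size(0)
        slots = self.memory_queries.unsqueeze(0).expand(batch, -1, -1)
        slots = slots + query_states.mean(dim=1, keepdim=True)  # query conditioning
        compressed, _ = self.attn(slots, doc_states, doc_states)
        return compressed  # (batch, n_memory, d_model), fed to the generator

# 100 document token states compressed to 4 vectors (25x compression).
compressor = OnlineCompressor()
q = torch.randn(1, 8, 64)    # query token states
d = torch.randn(1, 100, 64)  # retrieved document token states
print(compressor(q, d).shape)  # torch.Size([1, 4, 64])
```

Because compression happens at inference time, nothing needs to be precomputed or stored per document, unlike offline soft compression.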

Speech-MASSIVE

A multilingual Spoken Language Understanding (SLU) dataset

Covers 12 languages from different families and inherits the intent prediction and slot filling annotations from the original MASSIVE dataset. See also the Interspeech 2024 paper.
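Assuming the dataset is hosted on the Hugging Face Hub, loading one language split might look like the following (the repo id, config name, and field names are assumptions based on the MASSIVE schema):

```python
from datasets import load_dataset

# Sketch only: "FBK-MT/Speech-MASSIVE", the "fr-FR" config, and the field
# names below are assumptions about the Hub release.
ds = load_dataset("FBK-MT/Speech-MASSIVE", "fr-FR", split="validation")

example = ds[0]
# MASSIVE-style annotations: an utterance with intent and slot labels.
print(example["utt"])     # transcription
print(example["intent"])  # intent prediction label
```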

ELITR-Bench

A benchmark for the evaluation of long-context LLMs on meeting transcripts.

The meeting data used in this benchmark originally comes from the ELITR dataset. The benchmark and the accompanying experiments are described in the paper and are an output of the EU UTTER project.
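To make the setup concrete, here is a minimal sketch of the kind of long-context QA loop such a benchmark implies (illustrative only, not the ELITR-Bench harness; `generate_answer` stands in for the evaluated LLM):

```python
# Sketch of long-context QA over meeting transcripts.

def build_prompt(transcript: str, question: str) -> str:
    # The whole meeting transcript goes in-context, which is what
    # stresses the long-context capability of the evaluated model.
    return (
        "You are given a meeting transcript.\n\n"
        f"{transcript}\n\n"
        f"Question: {question}\nAnswer:"
    )

def evaluate(examples, generate_answer):
    """examples: iterable of (transcript, question, reference) triples;
    generate_answer: callable wrapping the long-context LLM under test."""
    predictions = []
    for transcript, question, reference in examples:
        answer = generate_answer(build_prompt(transcript, question))
        predictions.append((answer, reference))
    return predictions  # to be scored, e.g. by comparison to the references
```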

Pasero

Lightweight PyTorch framework for training and running text generation models.

Can be used for machine translation, speech translation, language modeling and dialogue, and supports a number of popular pre-trained models.

DISCo

DIStributional Control of LLMs

A toolkit for controlling language models and other generative models.

Zero-shot task generalization (models)

Multitask prompted training: models (BLOOM BigScience).

These prompted datasets are used to benchmark the ability of a model to perform completely unseen tasks specified in natural language.
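A minimal zero-shot inference example with the Hugging Face transformers API, assuming one of the released checkpoints (bigscience/T0_3B); note that the task is specified entirely in natural language:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Zero-shot inference with a released multitask-prompted checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

prompt = ("Is this review positive or negative? "
          "Review: this is the best cast iron skillet you will ever buy")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```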

Zero-shot task generalization (prompts)

Multitask prompted training: prompts (BLOOM BigScience).

These prompted datasets are used to benchmark the ability of a model to perform completely unseen tasks specified in natural language.
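The prompts can be applied to raw dataset examples with the promptsource library; a short sketch (the template choice is arbitrary, and `apply` is assumed here to return an input/target pair for templates that define a target):

```python
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

# Turn a raw classification example into a natural-language prompt.
dataset = load_dataset("ag_news", split="train")
templates = DatasetTemplates("ag_news")

name = templates.all_template_names[0]   # pick any available template
prompt = templates[name]
input_text, target_text = prompt.apply(dataset[0])
print(input_text)
print("->", target_text)
```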

Generative Distribution Control (GDC)

Debiasing large pretrained language models using distributional control.

A general framework for imposing constraints on samples of pretrained language models.
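At its core, the framework defines an energy-based model P(x) ∝ a(x) · exp(Σ_i λ_i φ_i(x)), where a is the pretrained LM and the φ_i are features whose expectations are constrained. A minimal sketch with illustrative (not fitted) coefficients:

```python
# Sketch of the EBM at the heart of distributional control:
# P(x) proportional to a(x) * exp(sum_i lambda_i * phi_i(x)).
# The lambda value below is illustrative, not a fitted coefficient.

def ebm_score(base_logprob: float, features: dict, lambdas: dict) -> float:
    """Unnormalized log-score of a sample under the controlled distribution."""
    return base_logprob + sum(lambdas[k] * v for k, v in features.items())

# Example: upweight samples exhibiting a target attribute (phi = 1 if present),
# as in the debiasing experiments.
sample_logprob = -42.0                       # log a(x) from the pretrained LM
features = {"mentions_female_pronoun": 1.0}
lambdas = {"mentions_female_pronoun": 2.3}   # chosen so E_P[phi] hits a target moment
print(ebm_score(sample_logprob, features, lambdas))  # -39.7
```

Since the EBM cannot be sampled from directly, a policy is then trained to approximate it (the papers use a distributional policy gradient approach).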
