CODE & DATA

Data, code and models released by NAVER LABS Europe

Speech-MASSIVE

A multilingual Spoken Language Understanding (SLU) dataset

Covers 12 languages from different families and inherits the intent prediction and slot filling annotations from the original MASSIVE dataset. See also the Interspeech 2024 paper.

ELITR-Bench

A benchmark for the evaluation of long-context LLMs on meeting transcripts.

The meeting data used in this benchmark comes from the ELITR dataset. The benchmark and its experiments are described in the paper and are an output of the EU UTTER project.

mHuBERT-147

The first general-purpose massively multilingual HuBERT speech representation model.

A promising compact model for speech processing pipelines, offering an unprecedented balance between high performance and parameter efficiency. Developed within the EU UTTER project.

DistilWhisper

Efficient distillation of multi-task speech models via language-specific experts.

A multitask and multilingual speech model covering 99 languages.
