Speech-MASSIVE
Covers 12 languages from different families and inherits the intent prediction and slot filling annotations from the original MASSIVE dataset. See also the Interspeech 2024 paper.
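A minimal sketch of inspecting one language split with the Hugging Face datasets library; the repository id, config name and split below are assumptions to verify against the dataset card.

```python
# Hypothetical identifiers: check the Speech-MASSIVE dataset card for the exact
# Hub repository id, language config and split names.
from datasets import load_dataset

ds = load_dataset("FBK-MT/Speech-MASSIVE", "fr-FR", split="validation")

example = ds[0]
print(ds.features)             # audio plus the MASSIVE-style intent/slot fields
print(sorted(example.keys()))  # inspect the available columns before indexing them
```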
BERGEN: benchmarking RAG
Designed to ease reproducibility, simplify the integration of new datasets and models, and help identify strong baselines.
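As an illustration of the kind of retrieve-then-generate baseline such a benchmark compares (a toy pipeline, not the library's API), the sketch below retrieves a context with a simple term-overlap scorer and assembles the prompt that would be passed to the generator under evaluation.

```python
from collections import Counter

docs = [
    "BERGEN is a library for benchmarking retrieval-augmented generation.",
    "mHuBERT-147 is a compact multilingual speech representation model.",
]

def score(query: str, doc: str) -> float:
    # Term-overlap score: how many query terms (with multiplicity) occur in the document.
    q_terms = Counter(query.lower().split())
    d_terms = set(doc.lower().split())
    return float(sum(c for t, c in q_terms.items() if t in d_terms))

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by the toy score and keep the top-k as context.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

query = "what does bergen benchmark"
context = "\n".join(retrieve(query))
prompt = f"Answer using the context.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # this prompt would be fed to whichever generator is being evaluated
```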
Can be used for machine translation, speech translation, language modeling and dialogue, and supports a number of popular pre-trained models.
mHuBERT-147
A promising compact model for speech processing pipelines, offering an unprecedented balance between high performance and parameter efficiency. Developed within the EU UTTER project.
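A minimal sketch of extracting frame-level representations from mHuBERT-147 with Hugging Face transformers; the checkpoint identifier and preprocessing defaults are assumptions, to be checked against the model card.

```python
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

ckpt = "utter-project/mHuBERT-147"  # assumed Hub identifier; verify on the model card
model = HubertModel.from_pretrained(ckpt)
extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000)

waveform = torch.zeros(16000).numpy()  # placeholder: 1 second of silence at 16 kHz
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    frames = model(**inputs).last_hidden_state  # (batch, n_frames, hidden_dim)
print(frames.shape)
```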
A multitask and multilingual speech model covering 99 languages.
Code repository for the paper: What do compressed multilingual machine translation models forget?
Covers more than 10K language pairs and achieves results competitive with M2M-100 while being much smaller and faster.
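For reference, the sketch below shows the standard many-to-many translation pipeline with the M2M-100 baseline mentioned above, via Hugging Face transformers; the compact model itself may ship its own checkpoint and tokenizer, so consult the repository for its exact usage.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "fr"  # source language: French
encoded = tokenizer("La vie est belle.", return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("en"),  # force English as the target
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```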
Publications concern efficient inference, continual learning, unsupervised NMT and domain adaptation.
A method to predict the drop in accuracy of a trained model.
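One generic way to approach this, sketched below, is a confidence-based proxy: compare the model's average maximum softmax probability on held-out source data against new data and read the gap as the expected accuracy drop. This is a common baseline shown for illustration only, not necessarily the listed method.

```python
import numpy as np

def mean_confidence(probs: np.ndarray) -> float:
    """probs: (n_samples, n_classes) softmax outputs; mean of the max class probability."""
    return float(probs.max(axis=1).mean())

def predicted_accuracy_drop(source_probs: np.ndarray, target_probs: np.ndarray) -> float:
    """Estimated drop = confidence on held-out source data minus confidence on new data."""
    return mean_confidence(source_probs) - mean_confidence(target_probs)

rng = np.random.default_rng(0)
src = rng.dirichlet([5.0, 1.0, 1.0], size=200)  # confident predictions on source-like data
tgt = rng.dirichlet([2.0, 2.0, 2.0], size=200)  # flatter predictions on shifted data
print(f"predicted accuracy drop: {predicted_accuracy_drop(src, tgt):.3f}")
```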
585 samples (1,006 sentences) randomly selected and annotated following the SemEval-2016 annotation guidelines for the restaurant domain.
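To make the annotation scheme concrete, the snippet below mimics the unit defined by the SemEval-2016 ABSA guidelines for the restaurant domain: each opinion pairs a target span with an ENTITY#ATTRIBUTE category and a polarity. The field names are illustrative only, not the dataset's actual file schema.

```python
sentence = {
    "text": "The pizza was great but the service was painfully slow.",
    "opinions": [
        {"target": "pizza",   "category": "FOOD#QUALITY",    "polarity": "positive", "from": 4,  "to": 9},
        {"target": "service", "category": "SERVICE#GENERAL", "polarity": "negative", "from": 28, "to": 35},
    ],
}

for op in sentence["opinions"]:
    span = sentence["text"][op["from"]:op["to"]]  # recover the target span from offsets
    print(f'{op["category"]:>16}  {op["polarity"]:>8}  "{span}"')
```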