PoseEmbroider
A transformer-based model for multi-modal alignment and retrieval that processes 3D poses, pictures of people and textual pose descriptions to produce an enhanced 3D-, visual- and semantic-aware human pose representation, able to cope with partial information (e.g. an image with the lower body occluded). Code accompanying the ECCV 2024 paper.
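The released code covers the full training and retrieval pipeline; purely as an illustration of the fusion idea described above (a transformer attending over whichever modality embeddings happen to be available), here is a minimal PyTorch sketch. The module names, dimensions and fusion-token design are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal sketch (assumed design, not the released code): fuse whichever of
# the three modality embeddings is available into one pose representation.
import torch
import torch.nn as nn

class MultiModalPoseEncoder(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_layers=2):
        super().__init__()
        # Learned query token that aggregates the available modalities.
        self.fusion_token = nn.Parameter(torch.randn(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, n_layers)

    def forward(self, pose_emb=None, image_emb=None, text_emb=None):
        # Each argument is an optional (batch, dim) embedding produced by a
        # modality-specific encoder; missing modalities are simply dropped.
        available = [e for e in (pose_emb, image_emb, text_emb) if e is not None]
        batch = available[0].shape[0]
        tokens = [self.fusion_token.expand(batch, -1, -1)]
        tokens += [e.unsqueeze(1) for e in available]
        fused = self.fuser(torch.cat(tokens, dim=1))
        return fused[:, 0]  # the fusion token carries the joint representation

# Image-only query, e.g. a picture with the lower body occluded.
model = MultiModalPoseEncoder()
image_emb = torch.randn(8, 256)
representation = model(image_emb=image_emb)  # shape (8, 256)
```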
Speech-MASSIVE
Covers 12 languages from different families and inherits the intent prediction and slot-filling annotations from the original MASSIVE dataset. See also the Interspeech 2024 paper.
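As a quick-start illustration, the corpus can presumably be loaded through the Hugging Face `datasets` library; the hub id, configuration name and field names below are assumptions, so check the dataset card for the exact identifiers.

```python
# Hedged example: load one language of Speech-MASSIVE with `datasets`.
# Hub id, config and field names are assumptions; see the dataset card.
from datasets import load_dataset

ds = load_dataset("FBK-MT/Speech-MASSIVE", "fr-FR", split="validation")
example = ds[0]
print(example["utt"])     # utterance transcript (field name may differ)
print(example["intent"])  # intent label inherited from MASSIVE
```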
BERGEN: benchmarking RAG
Designed to ease reproducibility, simplify the integration of new datasets and models, and identify strong baselines.
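BERGEN itself is configuration-driven; the snippet below is not its API but a sketch of the kind of RAG baseline it benchmarks: retrieve top-k passages, build a prompt, generate, and score against the gold answer. All function names are illustrative.

```python
# Illustrative RAG baseline of the kind BERGEN evaluates (not BERGEN's API).
from typing import Callable, List

def rag_answer(question: str,
               retrieve: Callable[[str, int], List[str]],
               generate: Callable[[str], str],
               k: int = 5) -> str:
    # Retrieve supporting passages and prepend them to the prompt.
    passages = retrieve(question, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

def exact_match(prediction: str, gold: str) -> bool:
    # One of the simple answer-level metrics a RAG benchmark can report.
    return prediction.strip().lower() == gold.strip().lower()
```

The point of a benchmark like BERGEN is that the retriever, generator and dataset in such a loop become interchangeable, comparable components.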
Can be used for machine translation, speech translation, language modeling and dialogue, and supports a number of popular pre-trained models.
mHuBERT-147
A promising compact model for speech processing pipelines, offering an unprecedented balance between high performance and parameter efficiency. Developed within the EU UTTER project.
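As a hedged usage sketch, a HuBERT-style checkpoint can typically serve as a frozen feature extractor through Hugging Face transformers; the hub id below is assumed to be the released one, so check the model card before copying it.

```python
# Hedged example: extract multilingual speech features with a HuBERT-style
# checkpoint via transformers. The hub id is an assumption; see the model card.
import torch
from transformers import AutoFeatureExtractor, HubertModel

model_id = "utter-project/mHuBERT-147"  # assumed identifier
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)

waveform = torch.randn(16000)  # 1 second of dummy 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # (1, n_frames, hidden_dim)
```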
A multitask and multilingual speech model covering 99 languages.
Code repository for the paper: What do compressed multilingual machine translation models forget?
Covers more than 10K language pairs and achieves results competitive with M2M-100 while being much smaller and faster.
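For context, the many-to-many translation interface it is compared against looks like the documented M2M-100 usage in transformers below; the project's own checkpoint name and loading code may differ.

```python
# Reference point: many-to-many translation with M2M-100 via transformers
# (the baseline mentioned above); the compact model's own checkpoint differs.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "fr"
encoded = tokenizer("La vie est belle.", return_tensors="pt")
generated = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```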
Publications cover efficient inference, continual learning, unsupervised NMT and domain adaptation.
A method to predict the drop in accuracy of a trained model.
585 samples (1,006 sentences), randomly selected and annotated following the SemEval-2016 annotation guidelines for the restaurant domain.