NAVER LABS Europe submission to the Instruction-following 2026 Short Track

Published by Ioan Calapodescu on 3 July 2026

Beomseok Lee, Marcely Zanon Boito, Laurent Besacier, Ioan Calapodescu

The International Conference on Spoken Language Translation (IWSLT), San Diego, CA, USA, 3-4 July, 2026

In this paper, we describe NAVER LABS Europe’s submission to the instruction-following speech processing short track at IWSLT 2026. We participate again in the constrained setting, developing systems capable of jointly performing ASR, ST, and SQA from English speech into Chinese, Italian, and German. Building on our previous submission, ranked first in last year’s short track, we update our multi-stage training pipeline by replacing the speech projector with SpeechMapper, a method for learning a speech-to-LLM embedding projector using ASR-only data. In addition, we introduce a synthetic SQA dataset, fakACL, composed of artificially generated scientific presentations. This dataset is built by prompting the LLM backbone, segmenting the generated talks, and synthesizing speech with Seamless. The combination of an improved speech projection mechanism and domain-specific synthetic data allows our model to outperform last year’s best short-track system on the MCIF dataset, while being considerably more compact and relying on a weaker LLM backbone.
