Efficient inference for multilingual neural machine translation

Published by Alexandre Berard on 7 November 2021

Alexandre Berard, Dain Lee, Stéphane Clinchant, Kweonwoo Jung, Vassilina Nikoulina

Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, Dominican Republic (hybrid event), 7-11 November 2021


Abstract

Multilingual NMT has become an attractive solution for MT deployment in production. However, matching bilingual quality comes at the cost of larger and slower models. In this work, we consider several ways to make multilingual NMT faster at inference without degrading its quality. We experiment with several “light decoder” architectures in two 20-language multi-parallel settings: small-scale on TED Talks and large-scale on ParaCrawl. Our experiments demonstrate that combining a shallow decoder with vocabulary filtering makes inference more than twice as fast with no loss in translation quality. We validate our findings with BLEU and chrF (on 380 language pairs), robustness evaluation, and human evaluation.
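The abstract names vocabulary filtering without detailing it. The following is a minimal sketch, assuming filtering means restricting the decoder's output layer to the token ids that occur on the target side of one language's training data (plus special tokens). The function names (`build_language_vocab`, `logits_over`, `greedy_step`) and the toy numbers are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import Iterable, List, Sequence

# Hypothetical sketch of per-language vocabulary filtering at the output layer.
# Assumption: the filtered vocabulary is the set of token ids seen on the
# target side of one language's training data, plus special tokens.


def build_language_vocab(
    target_corpus: Iterable[Sequence[int]], special_ids: Sequence[int]
) -> List[int]:
    """Collect the sorted token ids that occur in one target language."""
    seen = set(special_ids)
    for sentence in target_corpus:
        seen.update(sentence)
    return sorted(seen)


def logits_over(
    hidden: Sequence[float],
    output_embeddings: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> List[float]:
    """Dot product of the decoder state with each selected output embedding.

    The cost is proportional to len(token_ids), so a smaller vocabulary
    means fewer multiply-adds per decoding step.
    """
    return [
        sum(h * w for h, w in zip(hidden, output_embeddings[i])) for i in token_ids
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax, normalized over the given logits only."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_step(
    hidden: Sequence[float],
    output_embeddings: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> int:
    """Return the highest-scoring token id among token_ids."""
    scores = logits_over(hidden, output_embeddings, token_ids)
    best = max(range(len(scores)), key=scores.__getitem__)
    return token_ids[best]


if __name__ == "__main__":
    # Toy shared vocabulary of 4 tokens with 2-dimensional output embeddings.
    output_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    hidden = [0.5, 2.0]

    full_vocab = list(range(len(output_embeddings)))
    # Token 2 never appears in this toy target-language corpus.
    lang_vocab = build_language_vocab([[0, 1], [1, 3]], special_ids=[0])

    print(greedy_step(hidden, output_embeddings, full_vocab))  # 2
    print(lang_vocab)  # [0, 1, 3]
    print(greedy_step(hidden, output_embeddings, lang_vocab))  # 1
```

In the toy example, the unfiltered step picks token 2, which never occurs in the target language, while the filtered step can only choose among ids seen in that language. In a real system the same idea would more likely be applied by slicing rows of the output embedding matrix before the final matrix multiplication; the pure-Python loops here only make the per-token cost explicit.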

