Efficient multilingual machine translation
This page contains resources related to our EMNLP and WMT 2021 publications about multilingual machine translation. We release model checkpoints, fairseq modules to decode from those models, the test splits we used in the papers, and translation outputs by our models.
Read this blog article for an overview of our work on multilingual machine translation.
Download our fairseq modules
The archive contains a README with installation and usage instructions. Note that most of our checkpoints cannot be used without these modules.
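As a rough illustration, decoding from one of the released checkpoints with the custom modules loaded through fairseq's `--user-dir` mechanism might look like the sketch below. All paths, directory names, and language codes here are placeholders, not the actual release layout; the bundled README gives the exact commands.

```shell
# Hypothetical sketch (placeholder names; see the archive's README for the real ones):
#   ./user_dir     -> the downloaded fairseq modules
#   checkpoint.pt  -> one of the released model checkpoints
#   data-bin/      -> a fairseq-preprocessed data directory with the model's dictionaries
fairseq-interactive data-bin/ \
    --user-dir ./user_dir \
    --path checkpoint.pt \
    --source-lang en --target-lang fr \
    --beam 5
```

`--user-dir` is how fairseq discovers custom tasks, architectures, and modules at runtime, which is why the checkpoints cannot be loaded without the released code.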
Efficient Inference for MNMT
Checkpoints: ParaCrawl models and TED Talks models
Test sets: TED2020 splits
Translation outputs: FLORES
Citation:
@inproceedings{berard2021_efficient,
  title = {Efficient Inference for Multilingual Neural Machine Translation},
  author = {B\'erard, Alexandre and Lee, Dain and Clinchant, St\'ephane and Jung, Kweonwoo and Nikoulina, Vassilina},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  month = nov,
  year = {2021},
  address = {Punta Cana, Dominican Republic},
  publisher = {Association for Computational Linguistics},
  url = {https://arxiv.org/abs/2109.06679}
}
Continual Learning in MNMT via Language-Specific Embeddings
Checkpoints: ParaCrawl models and TED Talks models
Test sets: TED2020 splits
Translation outputs: FLORES
Citation:
@inproceedings{berard2021_continual,
  title = {Continual Learning in Multilingual NMT via Language-Specific Embeddings},
  author = {B\'erard, Alexandre},
  booktitle = {Proceedings of the Sixth Conference on Machine Translation (WMT)},
  month = nov,
  year = {2021},
  publisher = {Association for Computational Linguistics},
  url = {https://arxiv.org/abs/2110.10478}
}
Multilingual Unsupervised Neural Machine Translation with Denoising Adapters
Checkpoints: mBART full fine-tuning, task adapters and denoising adapters
Citation:
@inproceedings{ustun2021_denoising,
  title = {Multilingual Unsupervised Neural Machine Translation with Denoising Adapters},
  author = {\"Ust\"un, Ahmet and B\'erard, Alexandre and Besacier, Laurent and Gall\'e, Matthias},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  month = nov,
  year = {2021},
  address = {Punta Cana, Dominican Republic},
  publisher = {Association for Computational Linguistics},
  url = {https://arxiv.org/abs/2110.10472}
}
Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters
Test sets: Koran, Medical and IT splits
Translation outputs: Koran
Citation:
@inproceedings{stickland2021_multilingual,
  title = {Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters},
  author = {Cooper Stickland, Asa and B\'erard, Alexandre and Nikoulina, Vassilina},
  booktitle = {Proceedings of the Sixth Conference on Machine Translation (WMT)},
  month = nov,
  year = {2021},
  publisher = {Association for Computational Linguistics},
  url = {https://arxiv.org/abs/2110.09574}
}