NAVER Corporation is Korea’s premier Internet company and a global leader in online services such as NAVER search (30M DAU), LINE messaging (164M MAU), and WEBTOON (62M MAU). NAVER is continuously focussed on the future and on seamlessly connecting the physical and digital worlds through advanced technology. Its AI and robotics research in Asia and Europe is fundamental to creating this future. NAVER invests over 25 percent of annual revenue in R&D, yet innovation is but one core value. NAVER promotes diversity on the internet, and respects and connects people, helping them to share knowledge, create communities and preserve culture.
Kang Min Yoo, Research Scientist
Ji-Hoon Kim, Research Scientist
Matthias Gallé, Lab Manager
Laurent Besacier, Research Scientist / NLP group leader
Minjoon Seo, Software Engineer
Gyuwan Kim, Software Engineer
Hady Elsahar, Research Scientist
Kweonwoo Jung, Software Engineer
Hyunchang Cho, Research Scientist
Alexandre Berard, Research Scientist
Kyoungduk Kim, Software Engineer
Seonhoon Kim, Software Engineer
German Kruszewski, Research Scientist
Nov 17, 10:00 – 11:00 CET (UTC/GMT+1): 6C
Context-aware answer extraction in question answering
EMNLP 2020 | Yeon Seonwoo, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh
Extractive QA models have shown very promising performance in predicting the correct answer to a question for a given passage. However, they sometimes predict the correct answer text but in a context irrelevant to the given question. This discrepancy becomes especially important as the number of occurrences of the answer text in a passage increases. To resolve this issue, we propose BLANC (BLock AttentioN for Context prediction) based on two main ideas: context prediction as an auxiliary task in a multi-task learning manner, and a block attention method that learns the context prediction task. With experiments on reading comprehension, we show that BLANC outperforms the state-of-the-art QA models, and the performance gap increases as the number of answer text occurrences increases. We also conduct an experiment in which the models are trained on SQuAD and predict the supporting facts on HotpotQA, and show that BLANC outperforms all baseline models in this zero-shot setting.
Nov 17, 11:00 – 13:00 CET (UTC/GMT+1): 2B
Variational hierarchical dialog autoencoder for dialog state tracking data augmentation
EMNLP 2020 | Kang Min Yoo, Hanbit Lee, Franck Dernoncourt, Trung Bui, Walter Chang, Sang-goo Lee
Recent works have shown that generative data augmentation, where synthetic samples generated from deep generative models complement the training dataset, benefits NLP tasks. In this work, we extend this approach to the task of dialog state tracking for goal-oriented dialogs. Due to the inherent hierarchical structure of goal-oriented dialogs over utterances and related annotations, the deep generative model must be capable of capturing the coherence among different hierarchies and types of dialog features. We propose the Variational Hierarchical Dialog Autoencoder (VHDA) for modeling the complete aspects of goal-oriented dialogs, including linguistic features and underlying structured annotations, namely speaker information, dialog acts, and goals. The proposed architecture is designed to model each aspect of goal-oriented dialogs using inter-connected latent variables and learns to generate coherent goal-oriented dialogs from the latent spaces. To overcome training issues that arise from training complex variational models, we propose appropriate training strategies. Experiments on various dialog datasets show that our model improves the downstream dialog trackers’ robustness via generative data augmentation. We also discover additional benefits of our unified approach to modeling goal-oriented dialogs: dialog response generation and user simulation, where our model outperforms previous strong baselines.
Nov 17, 19:00 – 21:00 CET (UTC/GMT+1): 3A
Monolingual adapters for zero-shot neural machine translation
EMNLP 2020 | Jerin Philip, Alexandre Berard, Matthias Gallé, Laurent Besacier
We propose a novel adapter layer formalism for adapting multilingual models. These layers are more parameter-efficient than existing adapter layers while obtaining as good or better performance. The layers are specific to one language (as opposed to bilingual adapters), which allows them to be composed and to generalize to unseen language pairs. In this zero-shot setting, they obtain a median improvement of +2.77 BLEU points over a strong 20-language multilingual Transformer baseline trained on TED talks.
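To make the idea of composing per-language layers concrete, here is a minimal sketch. It uses a generic residual bottleneck adapter as a stand-in, not the formalism proposed in the paper, and it assumes that the source-language adapter is applied on the encoder side and the target-language adapter on the decoder side. All names, dimensions and weights (`Adapter`, `make_adapter`, `apply_pair`) are hypothetical and chosen only for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class Adapter:
    """Generic residual bottleneck adapter: x + up(relu(down(x)))."""

    def __init__(self, down, up):
        self.down = down  # projects to the small bottleneck dimension
        self.up = up      # projects back to the model dimension

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.down, x)]
        return [xi + ui for xi, ui in zip(x, matvec(self.up, hidden))]


def make_adapter(scale, dim=2, bottleneck=1):
    """Build an adapter with toy constant weights (illustration only)."""
    down = [[scale] * dim for _ in range(bottleneck)]
    up = [[scale] * bottleneck for _ in range(dim)]
    return Adapter(down, up)


# One adapter per language and per side, rather than one per language pair.
encoder_adapters = {"fr": make_adapter(0.1), "de": make_adapter(0.2)}
decoder_adapters = {"en": make_adapter(0.3), "ko": make_adapter(0.4)}


def apply_pair(x, src, tgt):
    """Compose the source-side and target-side adapters for any pair."""
    return decoder_adapters[tgt](encoder_adapters[src](x))


# "de" -> "ko" can be composed even if that pair was never trained together.
print(apply_pair([1.0, 2.0], "de", "ko"))
```

Because each adapter depends on a single language, supporting a new pair only requires looking up two existing adapters rather than training a pair-specific one.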
Participatory research for low-resourced machine translation: a case study in African languages
Findings of EMNLP | Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon KABONGO KABENAMUALU, Salomey Osei, Freshia Sackey, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa Berhe, Mofetoluwa Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkadir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, ghollah kioko, Murhabazi Espoir, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Chinenye Emezue, Bonaventure F. P. Dossou, Blessing Sibanda, Blessing Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Alp Öktem, Adewale Akinfaderin, Abdallah Bashir
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. “Low-resourced”-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.
Large product key memory for pre-trained language models
Findings of EMNLP | Gyuwan Kim, Tae-Hwan Jung
Product key memory (PKM), proposed by Lample et al. (2019), improves prediction accuracy by increasing model capacity efficiently with insignificant computational overhead. However, its empirical application has been limited to causal language modeling. Motivated by the recent success of pre-trained language models (PLMs), we investigate how to incorporate large PKM into PLMs that can be finetuned for a wide variety of downstream NLP tasks. We define a new memory usage metric, and careful observation using this metric reveals that most memory slots remain outdated during the training of PKM-augmented models. To train better PLMs by tackling this issue, we propose simple but effective solutions: (1) initialization from the model weights pre-trained without memory and (2) augmenting PKM by addition rather than replacing a feed-forward network. We verify that both of them are crucial for the pre-training of PKM-augmented PLMs, enhancing memory utilization and downstream performance. Code and pre-trained weights are available.
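The following toy sketch illustrates a product-key lookup and the "addition rather than replacement" idea from solution (2). It is a simplified illustration under stated assumptions, not the authors' implementation: the input vector is used directly as the query (a real model would compute the query with a learned network), the feed-forward block is a plain ReLU stand-in, and all function names are hypothetical.

```python
import math
from itertools import product


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Return (index, score) pairs of the k highest scores."""
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)[:k]


def pkm_lookup(query, sub_keys_1, sub_keys_2, values, k=2):
    """Split the query in two, search each half against its own sub-keys,
    then combine the two short candidate lists into product keys."""
    half = len(query) // 2
    q1, q2 = query[:half], query[half:]
    best_1 = top_k([dot(q1, key) for key in sub_keys_1], k)
    best_2 = top_k([dot(q2, key) for key in sub_keys_2], k)
    # The score of product key (i, j) is the sum of the two half scores.
    candidates = [((i, j), s1 + s2)
                  for (i, s1), (j, s2) in product(best_1, best_2)]
    candidates.sort(key=lambda c: c[1], reverse=True)
    selected = candidates[:k]
    # Softmax over the selected scores gives the mixing weights.
    max_score = max(score for _, score in selected)
    exps = [math.exp(score - max_score) for _, score in selected]
    total = sum(exps)
    out = [0.0] * len(values[0])
    for ((i, j), _), e in zip(selected, exps):
        slot = i * len(sub_keys_2) + j  # flat index into the value table
        for d, v in enumerate(values[slot]):
            out[d] += (e / total) * v
    return out


def feed_forward(x):
    """Stand-in for a Transformer feed-forward block (toy: ReLU only)."""
    return [max(0.0, v) for v in x]


def layer_with_added_memory(x, sub_keys_1, sub_keys_2, values):
    """Add the memory output to the feed-forward output, keeping both."""
    ffn_out = feed_forward(x)
    mem_out = pkm_lookup(x, sub_keys_1, sub_keys_2, values)
    return [f + m for f, m in zip(ffn_out, mem_out)]


# Toy memory: 2 x 2 sub-keys address 4 value slots.
keys = [[1, 0], [0, 1]]
slots = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(layer_with_added_memory([1.0, 0.0, 0.0, 1.0], keys, keys, slots))
```

With two sets of n sub-keys the memory addresses n × n value slots while scoring only 2n sub-keys per query, which is where the low overhead comes from.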
Nov 20: Live Session 2, 16:00 – 22:00 CET (UTC/GMT+1)
16:00-16:15 A multi-lingual neural machine translation model for biomedical data
NLP COVID-19 workshop, EMNLP | Alexandre Berard, Zae Myung Kim, Vassilina Nikoulina, Eunjeong Lucy Park, Matthias Gallé
We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain. The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English. It is trained with large amounts of generic and biomedical data, using domain tags. Our benchmarks show that it performs near state-of-the-art both on news (generic domain) and biomedical test sets, and that it outperforms the existing publicly released models. We believe that this release will help the large-scale multilingual analysis of the digital content of the COVID-19 crisis and of its effects on society, economy, and healthcare policies. We also release a test set of biomedical text for Korean-English. It consists of 758 sentences from official guidelines and recent papers, all about COVID-19.
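As a rough illustration of training "using domain tags", the sketch below prepends a tag token to each source sentence so that a single model can be steered towards a domain. The tag strings, their position at the start of the source sentence, and the helper name `add_domain_tag` are assumptions made for this example; the released model's actual tag vocabulary and preprocessing may differ.

```python
# Hypothetical tag tokens; the released model may use different ones.
DOMAIN_TAGS = {"generic": "<generic>", "biomedical": "<medical>"}


def add_domain_tag(sentence: str, domain: str) -> str:
    """Prepend a domain tag token to a source sentence."""
    if domain not in DOMAIN_TAGS:
        raise ValueError(f"unknown domain: {domain!r}")
    return f"{DOMAIN_TAGS[domain]} {sentence.strip()}"


print(add_domain_tag("Le patient présente une fièvre.", "biomedical"))
```

The same tagging step would be applied at training time (with the known domain of each corpus) and at inference time (with the domain the user wants).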
Nov 19: Live Session 1, 10:45 – 21:00 CET (UTC/GMT+1)
12:00 – 13:30 CET (UTC/GMT+1) NAVER LABS Europe’s participation to the robustness, chat and biomedical tasks at WMT 2020
Alexandre Berard, Vassilina Nikoulina, Ioan Calapodescu
This paper describes Naver Labs Europe’s participation in the Robustness, Chat and Biomedical Translation tasks at WMT 2020. We propose a bidirectional German-English model that is multi-domain, robust to noise and able to translate entire documents (or bilingual dialogues) at once. We use the same ensemble of such models as our primary submission to all three tasks, and achieve competitive results. We also experiment with language model pre-training techniques and evaluate their impact on robustness to noise and out-of-domain translation. For German, Spanish, Italian and French to English translation in the Biomedical Task, we also submit our recently released multilingual Covid19NMT model.
Nov 20: Live Session 2, 10:00 – 21:15 CET (UTC/GMT+1)
12:00 – 13:30 CET (UTC/GMT+1) PATQUEST: Papago Translation Quality Estimation
WMT20 at EMNLP | Yujin Baek, Zae Myung Kim, Jihyung Moon, Hyunjoong Kim, Eunjeong Park
This paper is a system description paper for NAVER Papago’s submission to the WMT20 Quality Estimation Task. It proposes two key strategies for quality estimation: (1) a task-specific pre-training scheme, and (2) task-specific data augmentation. The former focuses on devising learning signals for pre-training that are closely related to the downstream task. We also present data augmentation techniques that simulate the varying levels of errors that the downstream dataset may contain. Thus, our PATQUEST models are exposed to erroneous translations in both stages of task-specific pre-training and fine-tuning, effectively enhancing their generalization capability. Our submitted models achieve significant improvement over the baselines for Task 1 (Sentence-Level Direct Assessment; EN-DE only), and Task 3 (Document Level Score).
Creating new connections by advancing technology
If you enjoy a challenge, are passionate, talented and embrace diversity, then right here may be the perfect place for you!
NAVER was recognised as a top employer and as a company that university students in South Korea would like to work for, for 3 consecutive years (2016 – 2019).