NAVER Corporation is Korea’s premier Internet company and a global leader in online services such as NAVER search (30M DAU), LINE messaging (164M MAU) and WEBTOON (62M MAU). NAVER is continuously focused on the future and on seamlessly connecting the physical and digital worlds through advanced technology. Its AI and robotics research in Asia and Europe is fundamental to creating this future. NAVER invests over 25 percent of its annual revenue in R&D, yet innovation is only one of its core values. NAVER promotes diversity on the internet, and respects and connects people, helping them to share knowledge, create communities and preserve culture.
NAVER LABS Europe
NAVER, NLP and machine learning
NAVER LABS Korea
Contact details will be shared here ahead of the conference.
10am – 10:15am CET (UTC/GMT+1)
Hyunchang Cho and Kweonwoo Jung (Papago, NAVER)
Papago is an online translation service provided by NAVER.
The MT team within Papago focuses on advancing machine translation quality, mostly for East Asian languages such as Korean, Japanese and Chinese.
In this session, we will discuss some of the team’s major research topics and the methods we use to enhance the user experience (e.g. honorific translation and quality estimation).
10:15am – 10:30am CET (UTC/GMT+1)
Minjoon Seo (Clova, NAVER)
Clova is the AI-first organization within NAVER & LINE, conducting high-impact research in a wide range of domains to empower AI-driven products both inside and outside the company. In this session, I will discuss our team’s recent work and ongoing research on end-to-end document information extraction for diverse semi-structured documents, including name cards, receipts and invoices.
10:30am – 10:45am CET (UTC/GMT+1)
Kyungduk Kim and Hyeon-gu Lee (NLP, NAVER)
NAVER focuses both on researching NLP technologies and on delivering AI-powered products to customers. In this session, we will share our experience in deploying conversational AI technologies in commercialised products such as smart speakers, set-top boxes and vehicle infotainment systems. We’ll also briefly introduce our ongoing work on answer snippet extraction.
5pm – 5:30pm CET (UTC/GMT+1)
NLP research and openings at NAVER LABS Europe
Laurent Besacier, LIG and NLP group lead at NAVER LABS Europe
This presentation is intended for potential academic collaborators and PhD or internship candidates. It gives a short overview of NLP activities and recent highlights at NAVER LABS Europe, ending with the positions currently open in the group in France.
Enter the virtual venue and meet us on the Gather.town platform (you need to be a registered EMNLP attendee and logged in). Check out the NAVER booth on Rocket.Chat.
Kang Min Yoo, Research Scientist
Ji-Hoon Kim, Research Scientist
Matthias Gallé, Lab Manager
Laurent Besacier, Research Scientist / NLP group leader
Minjoon Seo, Software Engineer
Gyuwan Kim, Software Engineer
Hady Elsahar, Research Scientist
Kweonwoo Jung, Software Engineer
Hyunchang Cho, Research Scientist
Matthias Gallé, Lab Manager
Alexandre Berard, Research Scientist
Kyungduk Kim, Software Engineer
Seonhoon Kim, Software Engineer
German Kruszewski, Research Scientist
Nov 17, 10:00 – 11:00 CET (UTC/GMT+1): 6C
Context-aware answer extraction in question answering
EMNLP 2020 | Yeon Seonwoo, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh
Extractive QA models have shown very promising performance in predicting the correct answer to a question for a given passage. However, they sometimes predict the correct answer text in a context irrelevant to the given question. This discrepancy becomes especially important as the number of occurrences of the answer text in a passage increases. To resolve this issue, we propose BLANC (BLock AttentioN for Context prediction), based on two main ideas: context prediction as an auxiliary task in a multi-task learning setup, and a block attention method that learns the context prediction task. With experiments on reading comprehension, we show that BLANC outperforms state-of-the-art QA models, and that the performance gap increases with the number of answer text occurrences. We also train the models on SQuAD and predict the supporting facts on HotpotQA, showing that BLANC outperforms all baseline models in this zero-shot setting.
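To make the multi-task idea concrete, here is a minimal PyTorch sketch of a QA head that adds a token-level context-prediction objective next to the usual span-extraction objective. All names and the loss weight are illustrative, this is not the paper’s released implementation, and the block attention mechanism itself is omitted:

```python
import torch
import torch.nn as nn

class MultiTaskQAHead(nn.Module):
    """Span-extraction head plus an auxiliary context-prediction head,
    trained jointly. Illustrative sketch only, not the BLANC code."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.span_head = nn.Linear(hidden_size, 2)     # start/end logits
        self.context_head = nn.Linear(hidden_size, 2)  # per token: in relevant context or not

    def forward(self, hidden_states, start_pos, end_pos, context_labels, lam=0.5):
        # hidden_states: (batch, seq_len, hidden) from any encoder, e.g. BERT
        start_logits, end_logits = self.span_head(hidden_states).split(1, dim=-1)
        ce = nn.CrossEntropyLoss()
        span_loss = ce(start_logits.squeeze(-1), start_pos) + ce(end_logits.squeeze(-1), end_pos)
        # Auxiliary task: predict, per token, whether it belongs to the
        # block of context that actually supports the answer.
        ctx_logits = self.context_head(hidden_states)
        ctx_loss = ce(ctx_logits.reshape(-1, 2), context_labels.reshape(-1))
        return span_loss + lam * ctx_loss
```

The auxiliary loss pushes the encoder to distinguish answer occurrences supported by question-relevant context from spurious ones, which is exactly the failure mode described above.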
Nov 17, 11:00 – 13:00 CET (UTC/GMT+1): 2B
Variational hierarchical dialog autoencoder for dialog state tracking data augmentation
EMNLP 2020 | Kang Min Yoo, Hanbit Lee, Franck Dernoncourt, Trung Bui, Walter Chang, Sang-goo Lee
Recent works have shown that generative data augmentation, where synthetic samples generated from deep generative models complement the training dataset, benefits NLP tasks. In this work, we extend this approach to the task of dialog state tracking for goal-oriented dialogs. Due to the inherent hierarchical structure of goal-oriented dialogs over utterances and related annotations, the deep generative model must be capable of capturing the coherence among different hierarchies and types of dialog features. We propose the Variational Hierarchical Dialog Autoencoder (VHDA) for modeling the complete aspects of goal-oriented dialogs, including linguistic features and underlying structured annotations, namely speaker information, dialog acts, and goals. The proposed architecture is designed to model each aspect of goal-oriented dialogs using inter-connected latent variables and learns to generate coherent goal-oriented dialogs from the latent spaces. To overcome the training issues that arise with such complex variational models, we propose appropriate training strategies. Experiments on various dialog datasets show that our model improves the downstream dialog trackers’ robustness via generative data augmentation. We also discover additional benefits of our unified approach to modeling goal-oriented dialogs – dialog response generation and user simulation, where our model outperforms previous strong baselines.
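As a rough illustration of the inter-connected latent variables, the toy sketch below conditions per-utterance latents on a dialog-level latent via the standard reparameterization trick. The module and names are hypothetical; the actual VHDA factorizes speakers, dialog acts, goals and utterances into separate latents:

```python
import torch
import torch.nn as nn

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I)
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

class HierarchicalDialogLatents(nn.Module):
    """Dialog-level latent conditioning utterance-level latents, so that
    generated turns stay coherent with the overall dialog. Toy sketch."""

    def __init__(self, h=128, z=32):
        super().__init__()
        self.dialog_posterior = nn.Linear(h, 2 * z)   # q(z_dialog | dialog encoding)
        self.utt_posterior = nn.Linear(h + z, 2 * z)  # q(z_utt | utt encoding, z_dialog)

    def forward(self, dialog_enc, utt_encs):
        # dialog_enc: (batch, h); utt_encs: (batch, turns, h)
        mu_d, logvar_d = self.dialog_posterior(dialog_enc).chunk(2, dim=-1)
        z_d = reparameterize(mu_d, logvar_d)
        # Broadcast the dialog latent to every turn before inferring
        # the utterance-level latents.
        z_d_rep = z_d.unsqueeze(1).expand(-1, utt_encs.size(1), -1)
        mu_u, logvar_u = self.utt_posterior(
            torch.cat([utt_encs, z_d_rep], dim=-1)).chunk(2, dim=-1)
        return z_d, reparameterize(mu_u, logvar_u)
```

In VHDA, this kind of conditioning is what lets synthetic dialogs keep speaker information, dialog acts and goals consistent across turns.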
Nov 17, 19:00 – 21:00 CET (UTC/GMT+1): 3A
Monolingual adapters for zero-shot neural machine translation
EMNLP 2020 | Jerin Philip, Alexandre Berard, Matthias Gallé, Laurent Besacier
We propose a novel adapter layer formalism for adapting multilingual models. These layers are more parameter-efficient than existing adapter layers while achieving as good or better performance. Each layer is specific to a single language (as opposed to bilingual adapters), which allows layers to be composed and to generalize to unseen language pairs. In this zero-shot setting, they obtain a median improvement of +2.77 BLEU points over a strong 20-language multilingual Transformer baseline trained on TED talks.
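For intuition, a monolingual adapter is essentially a residual bottleneck module keyed by language rather than by language pair. A minimal sketch, with illustrative hyper-parameters and names rather than the paper’s code:

```python
import torch
import torch.nn as nn

class LanguageAdapter(nn.Module):
    """Residual bottleneck adapter: x + up(relu(down(norm(x)))).
    One instance per language, inserted into each Transformer layer."""

    def __init__(self, d_model=512, bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(self.norm(x))))

# One adapter per language instead of one per language pair:
adapters = nn.ModuleDict({lang: LanguageAdapter() for lang in ["en", "fr", "ko"]})
```

Because each adapter is tied to a single language, a source-side adapter (e.g. adapters["fr"] in the encoder) can be composed with any target-side adapter (e.g. adapters["ko"] in the decoder), including for pairs never seen during training, which is what enables the zero-shot setting.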
Participatory research for low-resourced machine translation: a case study in African languages
Findings of EMNLP | Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon KABONGO KABENAMUALU, Salomey Osei, Freshia Sackey, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa Berhe, Mofetoluwa Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkadir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, ghollah kioko, Murhabazi Espoir, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Chinenye Emezue, Bonaventure F. P. Dossou, Blessing Sibanda, Blessing Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Alp Öktem, Adewale Akinfaderin, Abdallah Bashir
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. “Low-resourced”-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt
14th April 2021: This paper was awarded the Wikimedia Research Award at The Web Conference 2021
Large product key memory for pre-trained language models
Findings of EMNLP | Gyuwan Kim, Tae-Hwan Jung
Product key memory (PKM), proposed by Lample et al. (2019), makes it possible to improve prediction accuracy by efficiently increasing model capacity with insignificant computational overhead. However, its empirical application has been limited to causal language modeling. Motivated by the recent success of pre-trained language models (PLMs), we investigate how to incorporate large PKM into PLMs that can be fine-tuned for a wide variety of downstream NLP tasks. We define a new memory usage metric, and careful observation using this metric reveals that most memory slots remain outdated during the training of PKM-augmented models. To train better PLMs by tackling this issue, we propose simple but effective solutions: (1) initializing from model weights pre-trained without memory and (2) augmenting a feed-forward network with PKM by addition rather than replacement. We verify that both are crucial for the pre-training of PKM-augmented PLMs, enhancing memory utilization and downstream performance. Code and pre-trained weights are available.
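For readers unfamiliar with PKM, the core trick is that two sub-key tables of size n jointly index an n×n value table, so exact top-k search over n² memory slots only costs O(n). A minimal lookup sketch follows; it is illustrative only and omits the multi-head queries and query batch normalization of the full method:

```python
import torch
import torch.nn as nn

class ProductKeyMemory(nn.Module):
    """Minimal product-key memory lookup in the style of Lample et al. (2019)."""

    def __init__(self, dim=256, n=64, k=8, v_dim=256):
        super().__init__()
        half = dim // 2
        self.keys1 = nn.Parameter(torch.randn(n, half))
        self.keys2 = nn.Parameter(torch.randn(n, half))
        self.values = nn.Embedding(n * n, v_dim)  # n*n addressable memory slots
        self.k, self.n = k, n

    def forward(self, q):                                    # q: (batch, dim)
        q1, q2 = q.chunk(2, dim=-1)
        s1, i1 = (q1 @ self.keys1.t()).topk(self.k, dim=-1)  # (batch, k)
        s2, i2 = (q2 @ self.keys2.t()).topk(self.k, dim=-1)
        # Score the k*k candidate key pairs, keep the best k overall.
        scores = (s1.unsqueeze(-1) + s2.unsqueeze(-2)).reshape(q.size(0), -1)
        best, flat = scores.topk(self.k, dim=-1)
        slot = i1.gather(-1, flat // self.k) * self.n + i2.gather(-1, flat % self.k)
        w = best.softmax(dim=-1).unsqueeze(-1)
        return (w * self.values(slot)).sum(dim=1)            # (batch, v_dim)
```

The paper’s contribution concerns how such a memory is added to a PLM: initializing the rest of the network from memory-free pre-trained weights, and adding the memory alongside a feed-forward block rather than replacing it.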
Nov 20: Live Session 2, 16:00 – 22:00 CET (UTC/GMT+1)
16:00-16:15 A multi-lingual neural machine translation model for biomedical data
NLP COVID-19 workshop, EMNLP | Alexandre Berard, Zae Myung Kim, Vassilina Nikoulina, Eunjeong Lucy Park, Matthias Gallé
We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain. The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English. It is trained with large amounts of generic and biomedical data, using domain tags. Our benchmarks show that it performs near state-of-the-art both on news (generic domain) and biomedical test sets, and that it outperforms the existing publicly released models. We believe that this release will help the large-scale multilingual analysis of the digital content of the COVID-19 crisis and of its effects on society, economy, and healthcare policies. We also release a test set of biomedical text for Korean-English. It consists of 758 sentences from official guidelines and recent papers, all about COVID-19.
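The domain tags are simply special tokens prepended to the source sentence that steer a single multilingual model toward a given language and domain. A hypothetical illustration (the released model’s actual tag vocabulary may differ):

```python
# Hypothetical tagged input for a many-to-English model steered by special tokens.
def tag_source(sentence: str, src_lang: str, domain: str) -> str:
    """Prepend language and domain tags, e.g. '<fr> <bio> ...'."""
    return f"<{src_lang}> <{domain}> {sentence}"

print(tag_source("Les symptômes du COVID-19 incluent la fièvre.", "fr", "bio"))
# -> "<fr> <bio> Les symptômes du COVID-19 incluent la fièvre."
```

Training on a mix of generic and biomedical data with such tags lets a single model serve both domains and switch between them at inference time.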
Nov 19: Live Session 1, 10:45 – 21:00 CET (UTC/GMT+1)
12:00 – 13:30 CET (UTC/GMT+1) NAVER LABS Europe’s participation in the robustness, chat and biomedical tasks at WMT 2020
Alexandre Berard, Vassilina Nikoulina, Ioan Calapodescu
This paper describes NAVER LABS Europe’s participation in the Robustness, Chat and Biomedical Translation tasks at WMT 2020. We propose a bidirectional German-English model that is multi-domain, robust to noise and able to translate entire documents (or bilingual dialogues) at once. We use the same ensemble of such models as our primary submission to all three tasks, and achieve competitive results. We also experiment with language model pre-training techniques and evaluate their impact on robustness to noise and out-of-domain translation. For German, Spanish, Italian and French to English translation in the Biomedical Task, we also submit our recently released multilingual Covid19NMT model.
Nov 20: Live Session 2, 10:00 – 21:15 CET (UTC/GMT+1)
12:00 – 13:30 CET (UTC/GMT+1) PATQUEST: Papago Translation Quality Estimation
WMT20 at EMNLP | Yujin Baek, Zae Myung Kim, Jihyung Moon, Hyunjoong Kim, Eunjeong Park
This paper is a system description paper for NAVER Papago’s submission to the WMT20 Quality Estimation Task. It proposes two key strategies for quality estimation: (1) a task-specific pre-training scheme and (2) task-specific data augmentation. The former focuses on devising learning signals for pre-training that are closely related to the downstream task. The latter comprises data augmentation techniques that simulate the varying levels of errors that the downstream dataset may contain. Our PATQUEST models are thus exposed to erroneous translations during both task-specific pre-training and fine-tuning, effectively enhancing their generalization capability. Our submitted models achieve significant improvements over the baselines for Task 1 (Sentence-Level Direct Assessment; EN-DE only) and Task 3 (Document-Level Score).
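To give a flavor of what error-simulating augmentation can look like for quality estimation, here is a toy corruption function that injects controllable amounts of noise into a translation. It is purely illustrative; the paper’s augmentation scheme is more involved:

```python
import random

def corrupt_translation(tokens, p_drop=0.1, p_swap=0.1, seed=None):
    """Randomly drop tokens and swap adjacent ones so a QE model
    sees translations with varying error levels during training."""
    rng = random.Random(seed)
    out = [t for t in tokens if rng.random() > p_drop]
    for i in range(len(out) - 1):
        if rng.random() < p_swap:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

print(corrupt_translation("the model estimates translation quality".split(), seed=0))
```

One would pair such corrupted outputs with correspondingly lower quality scores, giving the model exposure to error levels that real shared-task data covers only sparsely.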
Creating new connections by advancing technology
If you enjoy a challenge, are passionate, talented and embrace diversity, then right here may be the perfect place for you!
A culture that recognizes ‘capability’ regardless of age or seniority
Autonomous decision-making and a culture of choice
Diversity is the reason NAVER came into existence in 1999. The need to provide alternatives is a fundamental core value for a healthy society.
We value different ways of thinking about the world and different perceptions of the world.
We try to create an inclusive workplace where respect reigns. A place where everyone can be themselves.
NAVER was recognised as a top employer and the company university students would like to work for in South Korea for three consecutive years (2016 – 2019).
To make robots autonomous in real-world everyday spaces, they should be able to learn, from their interactions within these spaces, how best to execute tasks specified by non-expert users in a safe and reliable way. Doing so requires sequential decision-making skills that combine machine learning, adaptive planning and control in uncertain environments, as well as solving hard combinatorial optimization problems. Our research combines expertise in reinforcement learning, computer vision, robotic control, sim2real transfer, large multimodal foundation models and neural combinatorial optimization to build AI-based architectures and algorithms that improve robot autonomy and robustness when completing everyday complex tasks in constantly changing environments. More details on our research can be found in the Explore section below.
For a robot to be useful it must be able to represent its knowledge of the world, share what it learns and interact with other agents, in particular humans. Our research combines expertise in human-robot interaction, natural language processing, speech, information retrieval, data management and low code/no code programming to build AI components that will help next-generation robots perform complex real-world tasks. These components will help robots interact safely with humans and their physical environment, other robots and systems, represent and update their world knowledge and share it with the rest of the fleet. More details on our research can be found in the Explore section below.
Visual perception is a necessary part of any intelligent system that is meant to interact with the world. Robots need to perceive the structure, the objects, and people in their environment to better understand the world and perform the tasks they are assigned. Our research combines expertise in visual representation learning, self-supervised learning and human behaviour understanding to build AI components that help robots understand and navigate in their 3D environment, detect and interact with surrounding objects and people and continuously adapt themselves when deployed in new environments. More details on our research can be found in the Explore section below.
Details on the gender equality index score 2024 (related to year 2023) for NAVER France of 87/100.
1. Difference in female/male salary: 34/40 points
2. Difference in salary increases female/male: 35/35 points
3. Salary increases upon return from maternity leave: Not calculable
4. Number of employees in under-represented gender in 10 highest salaries: 5/10 points
The NAVER France targets set in 2022 (Indicator n°1: +2 points in 2024 and Indicator n°4: +5 points in 2025) have been achieved.
The research we conduct on expressive visual representations is applicable to visual search, object detection, image classification and the automatic extraction of 3D human poses and shapes that can be used for human behavior understanding and prediction, human-robot interaction or even avatar animation. We also extract 3D information from images that can be used for intelligent robot navigation, augmented reality and the 3D reconstruction of objects, buildings or even entire cities.
Our work covers the spectrum from unsupervised to supervised approaches, and from very deep architectures to very compact ones. We’re excited about the promise of big data to bring big performance gains to our algorithms but also passionate about the challenge of working in data-scarce and low-power scenarios.
Furthermore, we believe that a modern computer vision system needs to be able to continuously adapt itself to its environment and to improve itself via lifelong learning. Our driving goal is to use our research to deliver embodied intelligence to our users in robotics, autonomous driving, via phone cameras and any other visual means to reach people wherever they may be.