Dual speech-text encoding for spoken language understanding - Internship - Naver Labs Europe
13 October 2020
Meylan, Grenoble, France
Start date: December 2020
Duration: 5-6 months


Speech systems (spoken language understanding, spoken question answering, speech translation) can either (a) include an explicit automatic speech recognition (ASR) module (cascade approach) or (b) rely on an end-to-end architecture in which the system takes speech as input and directly produces a decision from it. While these two approaches (cascade versus end-to-end) have often been contrasted and compared in the past, fewer works have tried to take advantage of both modalities: the speech input and the text input (ASR transcript).


This project aims to propose a model that jointly learns from streamed audio and its noisy text transcription, and to apply it to challenging tasks such as spoken language understanding or spoken question answering. In particular, we believe that this approach should (a) allow acoustic and semantic information to be integrated jointly for downstream tasks, (b) facilitate knowledge transfer between text and speech tasks by minimizing the representation difference between text and speech input, and (c) bring additional paralinguistic information (speaker gender, prosody, speaker emotion) to the overall model. A starting point could be two different encoders (speech and text) whose states synchronize at the utterance level, but we could also imagine more advanced architectures with cross-modality attention, possibly at different layers. We would work on a recently introduced dataset called EMOTyDA (https://github.com/sahatulika15/EMOTyDA), collected from open-source dialogue datasets, which contains speech, transcripts, videos and semantic annotations.
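To make the starting point concrete, here is a minimal, framework-free sketch of the two ideas mentioned above: utterance-level synchronization of two encoders, and cross-modality attention. It is only an illustration under stated assumptions, not the project's actual model. The encoder outputs are stand-in lists of vectors, both modalities are assumed to share the same hidden dimension, and all function names (`mean_pool`, `cross_attend`, `fuse_utterance`, `representation_gap`) are hypothetical. A real implementation would more likely use learned layers in a toolkit such as PyTorch.

```python
import math

# Assumption: each encoder has already produced a sequence of state vectors
# (speech frames or text tokens) with the same hidden dimension.

def mean_pool(states):
    """Average a sequence of state vectors into one utterance-level vector."""
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Scaled dot-product attention without learned projections:
    each query state (one modality) attends over the other modality."""
    scale = math.sqrt(len(keys_values[0]))
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        attended.append([
            sum(w * k[i] for w, k in zip(weights, keys_values))
            for i in range(len(q))
        ])
    return attended

def fuse_utterance(speech_states, text_states):
    """Utterance-level synchronization: pool each modality, then concatenate."""
    return mean_pool(speech_states) + mean_pool(text_states)

def representation_gap(speech_states, text_states):
    """Squared L2 distance between pooled vectors. This is one hypothetical
    proxy for the 'representation difference' that could be minimized."""
    s, t = mean_pool(speech_states), mean_pool(text_states)
    return sum((a - b) ** 2 for a, b in zip(s, t))

# Toy inputs: 3 speech frames and 2 text tokens, hidden dimension 2.
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.5, 0.5], [1.0, 0.0]]

joint = fuse_utterance(speech, text)       # one vector for a downstream classifier
text_in_context = cross_attend(text, speech)  # text tokens enriched with acoustics
gap = representation_gap(speech, text)     # auxiliary alignment term
```

In this sketch, `joint` would feed a downstream task head (for example, dialogue act or emotion classification), while `gap` could serve as an auxiliary term encouraging the two encoders to agree.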

Required skills

The student has to be currently enrolled in a university, either in a research-oriented Master's, an engineering school or at PhD level.
• Knowledge of deep learning as applied to NLP and/or speech;
• Good coding skills, including at least one of the major deep learning toolkits (preferably PyTorch);
• Data manipulation (textual data) and Python programming.

Application instructions

You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.


NAVER LABS Europe has full-time positions, PhD and PostDoc opportunities throughout the year which are advertised here and on international conference sites that we sponsor such as CVPR, ICCV, ICML, NeurIPS, EMNLP etc.

NAVER LABS Europe is an equal opportunity employer.

NAVER LABS Europe is located in Grenoble in the French Alps. We have a multi- and interdisciplinary approach to research, with scientists in machine learning, computer vision, artificial intelligence, natural language processing, ethnography and UX working together to create next-generation ambient intelligence technology and services that deeply understand users and their contexts.

