Dual speech-text encoding for spoken language understanding - Internship - Naver Labs Europe
NAVER LABS Europe
Published
13 October 2020
Location
Meylan, Grenoble, France
Category
Start date
December 2020
Duration
5-6 months

Description

Speech systems (spoken language understanding, spoken question answering, speech translation) can either (a) include an explicit automatic speech recognition (ASR) module (cascade approach) or (b) rely on an end-to-end architecture, where the system takes speech as input and directly produces a decision from it. While these two approaches (cascade versus end-to-end) have often been opposed and compared in the past, fewer works have tried to take advantage of the two modalities represented by the speech input and the text input (ASR transcript).

This project aims to propose a model that learns jointly from streamed audio and its noisy text transcription, and to apply it to challenging tasks such as spoken language understanding or spoken question answering. In particular, we believe that this approach should (a) allow acoustic and semantic information to be integrated jointly for downstream tasks, (b) facilitate knowledge transfer between text and speech tasks by minimizing the difference between the representations of text and speech input, and (c) bring additional paralinguistic information (speaker gender, prosody, speaker emotion) to the overall model. A starting point could be two separate encoders (speech and text) whose states are synchronized at the utterance level, but we could also imagine more advanced architectures with cross-modality attention, possibly at different layers. We would work on a recently introduced dataset called EMOTyDA (https://github.com/sahatulika15/EMOTyDA), which was collected from open-source dialogue datasets and contains speech, transcripts, videos and semantic annotations.
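To make the two architectural ideas above more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the model the project will build: the function names (`mean_pool`, `sync_loss`, `cross_attention`), the choice of mean pooling and a squared L2 distance for utterance-level synchronization, and the toy encoder outputs are all hypothetical. A real implementation would use learned encoders and a deep learning toolkit such as PyTorch.

```python
import math

def mean_pool(states):
    """Average a sequence of state vectors into one utterance-level vector."""
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]

def sync_loss(speech_states, text_states):
    """Squared L2 distance between pooled speech and text representations.

    Assumption: this is one simple way to 'synchronize' two encoders at the
    utterance level; the posting does not prescribe a specific loss.
    """
    s, t = mean_pool(speech_states), mean_pool(text_states)
    return sum((a - b) ** 2 for a, b in zip(s, t))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Scaled dot-product attention from one modality over the other.

    Each query (e.g. a text token state) is replaced by a weighted average
    of the other modality's states (e.g. speech frame states).
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * k[i] for w, k in zip(weights, keys_values))
                         for i in range(len(q))])
    return attended

# Hypothetical encoder outputs: 4 speech frames and 2 text tokens, dimension 2.
speech_states = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
text_states = [[1.0, 0.0], [0.0, 1.0]]

loss = sync_loss(speech_states, text_states)
text_with_audio = cross_attention(text_states, speech_states)
```

In this sketch, `loss` would be minimized during training to pull the two utterance-level representations together, while `text_with_audio` gives each text token a view of the speech frames it is most similar to. Applying the same attention in the other direction, or at several encoder layers, corresponds to the more advanced variants mentioned above.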

Required skills

The student has to be currently enrolled in a university, either in a research-oriented Master's, an engineering school or at PhD level.
• Knowledge of deep learning as applied to NLP and/or speech;
• Good coding skills, including at least one of the major deep learning toolkits (preferably PyTorch);
• Data manipulation (textual data) and Python programming.

Application instructions

You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.

About NAVER LABS

NAVER LABS Europe has full-time positions, PhD and PostDoc opportunities throughout the year, which are advertised here and on the sites of international conferences that we sponsor, such as CVPR, ICCV, ICML, NeurIPS and EMNLP.

NAVER LABS Europe is an equal opportunity employer.

NAVER LABS Europe is located in Grenoble, in the French Alps. We take a multi- and interdisciplinary approach to research, with scientists in machine learning, computer vision, artificial intelligence, natural language processing, ethnography and UX working together to create next-generation ambient intelligence technology and services that deeply understand users and their contexts.

