Controllable Prosodic Features for End-to-End Neural Speech Synthesis Models - Naver Labs Europe
4 October 2019
Meylan, France
Job Type: Internship
Start date: February 2020
Duration: 5-6 months


Current text-to-speech (TTS) systems generally produce averaged prosodic features, for multi-speaker and single-speaker corpora alike. Recent works such as [1], [2] and [3] propose solutions for varied intonation, controllable sentence-level prosodic features and prosody transfer. However, the control mechanisms these approaches provide for end-to-end TTS architectures are not easily interpretable by an end user. The goal of this internship is to study how to control prosodic features in the TTS output in a way that remains interpretable (e.g. word-level stress, declarative versus interrogative intonation).


The intern will work in a team with expertise in phonetics, ASR/TTS systems and NLG models. The work will consist of automating prosodic feature extraction, then training and evaluating a deep neural network that can modify the prosodic features of the output at runtime, in an interpretable way.


Publication of results in major conferences/journals will be strongly encouraged.

Required skills

- Student at Master's (research-oriented) or PhD level
- Knowledge of deep learning as applied to NLP (NLG models)
- Good coding skills, including at least one of the major deep learning toolkits (preferably PyTorch)


[1] CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark, 2019

[2] Using generative modelling to produce varied intonation for speech synthesis, Zack Hodari, Oliver Watts, Simon King, 2019

[3] Fine-grained robust prosody transfer for single-speaker neural text-to-speech, Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman, 2019

Application instructions

Please note that applicants must be registered students at a university or other academic institution and that this establishment will need to sign an 'Internship Convention' with NAVER LABS Europe before the student is accepted.

You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.


NAVER LABS is a world-class team of self-motivated and highly engaged researchers, engineers and interface designers collaborating to create next-generation ambient intelligence technology and services that are rich with an organic understanding of users, their contexts and situations.

Since 2013, LABS has led NAVER’s innovation in technology through products such as the AI-based translation app ‘Papago’, the omni-tasking web browser ‘Whale’, the virtual AI assistant ‘WAVE’, the in-vehicle infotainment system ‘AWAY’ and M1, the 3D indoor mapping robot.

The team in Europe is multidisciplinary and extremely multicultural, specializing in artificial intelligence, machine learning, computer vision, natural language processing, UX and ethnography. We collaborate with many partners in the European scientific community on R&D projects.

NAVER LABS Europe is located in Grenoble, in the south-east of France. Grenoble is renowned for its exceptional natural environment and its scientific ecosystem, with 21,000 jobs in public and private research. It is home to MIAI (Multidisciplinary Institute in Artificial Intelligence), one of the four French national AI institutes. It has a large student community (over 62,000 students) and is a lively and cosmopolitan place, offering a host of leisure opportunities. Grenoble is close to both the Swiss and Italian borders and is the ideal place for skiing, hiking, climbing, hang gliding and all types of mountain sports.
