Current text-to-speech (TTS) systems generally produce averaged prosodic features, for multi-speaker and single-speaker corpora alike. Recent work, such as Wan et al. (2019), Hodari et al. (2019) or Klimkov et al. (2019), proposes solutions for varied intonation, controllable sentence-level prosodic features or prosody transfer. However, the controls these end-to-end TTS architectures expose are not easily interpretable by an end user. The goal of this internship is to study how to control prosodic features in the TTS output in a way that remains interpretable (e.g. word-level stress, declarative versus interrogative intonation, etc.).
The intern will work in a team with expertise in phonetics, ASR/TTS systems and NLG models. The work will consist of automating prosodic feature extraction, then training and evaluating a deep neural network that can change the prosodic features of the output at runtime in an interpretable way.
Publication of results in major conferences/journals will be strongly encouraged.
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark, 2019
Using generative modelling to produce varied intonation for speech synthesis, Zack Hodari, Oliver Watts, Simon King, 2019
Fine-grained robust prosody transfer for single-speaker neural text-to-speech, Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman, 2019
NAVER LABS is a world-class team of self-motivated and highly engaged researchers, engineers and interface designers who collaborate to create next-generation ambient intelligence technology and services that are rich with the organic understanding they have of users, their contexts and situations.
Since 2013, LABS has led NAVER’s innovation in technology through products such as the AI-based translation app ‘Papago’, the omni-tasking web browser ‘Whale’, the virtual AI assistant ‘WAVE’, the in-vehicle infotainment system ‘AWAY’ and M1, the 3D indoor mapping robot.
The team in Europe is multidisciplinary and extremely multicultural, specializing in artificial intelligence, machine learning, computer vision, natural language processing, UX and ethnography. We collaborate with many partners in the European scientific community on R&D projects.
NAVER LABS Europe is located in Grenoble, in the southeast of France. Grenoble is known for its exceptional natural environment and its scientific ecosystem, with 21,000 jobs in public and private research. It is home to MIAI (Multidisciplinary Institute in Artificial Intelligence), one of the four French national institutes in AI. It has a large student community (over 62,000 students) and is a lively and cosmopolitan place, offering a host of leisure opportunities. Grenoble is close to both the Swiss and Italian borders and is the ideal place for skiing, hiking, climbing, hang gliding and all types of mountain sports.