Development of a Robust Model for Accented Speech Recognition – Internship

Published by Aurelia Cascarano on 16 April 2024

Location

Meylan, Grenoble, France

Description

Automatic speech recognition (ASR) systems have improved substantially over the past decade, in particular with the advent of self-supervised speech models; however, they do not recognize everyone’s speech equally well. Recent research shows that state-of-the-art ASR systems are biased against several types of speech, including non-native and regional accents [5], [6]. Bias mitigation is therefore necessary to attain robust speech recognition regardless of speakers’ accents [8], [9].

The goal of this internship is twofold: (1) quantifying bias in the recognition of accented English speech for widely used pre-trained speech models across different model sizes; (2) exploring methods for mitigating accent bias and implementing models that yield improved performance on accented speech recognition.

The intern for this position is expected to perform the following tasks:

- Identify a relevant set of datasets for studying bias in accented speech, covering different speech styles and rich speaker variability. The study will focus on accented English.
- Evaluate pre-trained speech models of different sizes (e.g., wav2vec 2.0 [1], HuBERT [2], Whisper [3]) using appropriate metrics that measure bias in accented speech.
- Implement methods to reduce the bias of pre-trained models on accented speech. Different approaches will be explored, including data selection techniques, contextualized representations and decoding methods.
- Conduct a comprehensive analysis of the results.
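
As an illustration of the evaluation task, the sketch below shows one possible way to measure accent bias: compute the word error rate (WER) per accent group, then report each group's relative WER increase over a reference group. The posting does not specify which metrics will be used, so the relative-gap measure, the function names and the accent labels in the usage note are assumptions made for this example only.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Compute WER per accent group.

    samples: iterable of (group, reference, hypothesis) strings.
    Errors and reference words are pooled within each group.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        errors[group] += edit_distance(ref_tokens, hyp_tokens)
        words[group] += len(ref_tokens)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


def relative_gap(wers, baseline):
    """Relative WER increase of each group over the baseline group
    (an assumed bias measure, not one prescribed by the posting)."""
    base = wers[baseline]
    if base == 0:
        raise ValueError("baseline WER is zero; relative gap is undefined")
    return {g: (w - base) / base for g, w in wers.items() if g != baseline}
```

For usage, pass the transcripts produced by each evaluated model as `(group, reference, hypothesis)` triples, for example `wer_by_group([("us", "turn on the light", "turn on the light"), ("indian", "turn on the light", "turn on the night")])`, and then call `relative_gap` with the chosen reference group. Pooling errors over all reference words in a group, rather than averaging per-utterance WER, keeps short utterances from dominating the score.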

This internship is part of an ANR project called DIKÉ (https://www.anr-dike.fr/), which aims to study the bias, fairness and ethics of compressed NLP models. Results are expected to be reported in a paper by the end of the internship (or soon after). Improved models are also expected to be shared with the scientific community through the Hugging Face model hub.

Supervisors: Caroline Brun, Salah Ait-Mokhtar and Nikolaos Lagos.

Required skills

- PhD student or final-year MSc student in NLP or speech processing
- Solid deep learning and NLP/speech processing background
- Advanced expertise in neural network architectures
- Excellent programming skills in Python and proficiency in PyTorch

Bibliography

[1] Baevski, Alexei, et al. "wav2vec 2.0: A framework for self-supervised learning of speech representations." Advances in Neural Information Processing Systems 33 (2020): 12449-12460.
[2] Hsu, Wei-Ning, et al. "HuBERT: Self-supervised speech representation learning by masked prediction of hidden units." IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 3451-3460.
[3] Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. "Robust speech recognition via large-scale weak supervision." arXiv preprint arXiv:2212.04356 (2022).
[4] Siyin Wang, Chao-Han Huck Yang, Ji Wu, and Chao Zhang. "Can Whisper perform speech-based in-context learning?" Proceedings of ICASSP 2024, Seoul, Korea, 2024.
[5] Siyuan Feng, Bence Mark Halpern, Olya Kudina, and Odette Scharenborg. "Towards inclusive automatic speech recognition." Computer Speech & Language, vol. 84, 2024.
[6] Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J. R., Jurafsky, D., and Goel, S. "Racial disparities in automated speech recognition." Proceedings of the National Academy of Sciences 117 (14) (2020): 7684-7689.
[7] Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, and Odette Scharenborg. "Exploring data augmentation in bias mitigation against non-native-accented speech." arXiv preprint arXiv:2312.15499.
[8] Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, and Vinit Unni. "Accented Speech Recognition With Accent-specific Codebooks." Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7175–7188, Singapore. Association for Computational Linguistics, 2023.
[9] Juan Zuluaga-Gomez, Sara Ahmed, Danielius Visockas, and Cem Subakan. "CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice." arXiv preprint arXiv:2305.18283.

Application instructions

Please note that applicants must be registered students at a university or other academic institution and that this establishment will need to sign an 'Internship Convention' with NAVER LABS Europe before the student is accepted.

You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.

About NAVER LABS

NAVER is the #1 Internet portal in Korea with activities that span a wide range of businesses including search, commerce, content, financial and cloud platforms.

NAVER LABS, co-located in Korea and France, is the organization dedicated to preparing NAVER’s future. NAVER LABS Europe is located in a spectacular setting in Grenoble, in the heart of the French Alps. Scientists at NAVER LABS Europe are empowered to pursue long-term research problems that, if successful, can have significant impact and transform NAVER. We take our ideas as far as research can to create the best technology of its kind. Active participation in the academic community and collaborations with world-class public research groups are, among others, important tools to achieve these goals. Teamwork, focus and persistence are important values for us.

NAVER LABS Europe is an equal opportunity employer.
