NAVER LABS Europe virtual seminars are open to the public. Please register here to attend this Zoom event.
Date: 18th February 2025, 11:00 am (CET)
Examining modularity in multilingual LMs via language-specialized subnetworks
About the speaker: Rochelle Choenni is a postdoctoral researcher in Natural Language Processing (NLP) working with Ivan Titov. Her main research interests include multilingual and cross-cultural NLP, modular deep learning, interpretability, and social biases in language models. She obtained her PhD in NLP at the University of Amsterdam (UvA) under the supervision of Prof. Ekaterina Shutova and Dr. Dan Garrette (Google Research). Before her PhD, she completed bachelor’s and master’s degrees in Artificial Intelligence at the UvA.
Abstract: Multilingual language models (MLMs) are jointly trained on data from many different languages, so that the representation of each individual language can benefit from the data of other languages. Impressive performance in zero-shot cross-lingual transfer shows that these models are able to exploit this property. Yet it remains unclear to what extent, and under which conditions, languages rely on each other’s data. To answer this question, we developed an approach to measure cross-language influence using a training data attribution method. Specifically, we test how much influence training examples from particular training languages exert cross-lingually on the predictions for individual test languages. This allows us to analyse the cross-lingual sharing mechanisms of MLMs from a new perspective. We find that MLMs rely on data from multiple languages, and that this reliance increases as fine-tuning progresses. Moreover, we use the proposed measure of cross-language influence to examine modularity in MLMs. Specifically, we study the emergence of language-specialized subnetworks in pretrained MLMs and the effect that sparse fine-tuning (SFT) has on the degree of language specialization of these subnetworks. Interestingly, our results suggest that the success of SFT cannot be attributed to stronger modularity in the form of language-specialized subnetworks.
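To make the notion of cross-language influence concrete, the sketch below illustrates one common family of training data attribution methods, a TracIn-style gradient dot product between a training example and a test example, averaged per source/target language pair. It is a minimal illustration under our own assumptions, not the speaker's implementation; the model, loss function, and data containers (train_by_lang, test_by_lang) are hypothetical placeholders.

# Minimal sketch of a gradient-dot-product training data attribution measure.
# Not the method presented in the talk; all names are illustrative placeholders.
import torch

def grad_vector(model, loss_fn, x, y):
    # Flattened gradient of the loss on a single example w.r.t. trainable parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence(model, loss_fn, train_example, test_example):
    # Approximate influence of one training example on one test prediction
    # as the dot product of their loss gradients (TracIn-style, single checkpoint).
    g_train = grad_vector(model, loss_fn, *train_example)
    g_test = grad_vector(model, loss_fn, *test_example)
    return torch.dot(g_train, g_test).item()

def cross_language_influence(model, loss_fn, train_by_lang, test_by_lang, src, tgt):
    # Average influence that training examples of language `src` exert on
    # test examples of language `tgt` (hypothetical dicts: lang -> list of (x, y)).
    scores = [influence(model, loss_fn, tr, te)
              for tr in train_by_lang[src]
              for te in test_by_lang[tgt]]
    return sum(scores) / len(scores)

Averaging such scores over all source languages for a fixed target language gives one way to quantify how much a test language draws on other languages' training data, which is the kind of cross-lingual sharing the abstract discusses.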