Internship: Retrieval Augmented Generation

Published by Aurelia Cascarano on 12 November 2024

This post has expired. Please check the Careers Board for all current job openings and internships.

Location

Meylan, France

Description

We're looking for students interested in improving retrieval augmented generation (RAG) in multilingual or multi-domain scenarios.

Based on the simple idea of augmenting user requests with relevant passages retrieved from the Internet or a given datastore, RAG has recently emerged as a promising solution to improve LLM factuality and grounded attribution. Despite the considerable attention this research topic has received in recent years, the vast majority of work focuses only on Wikipedia-based English settings in its experiments, trained models, and collected datasets. At the same time, initial efforts have been made to evaluate or extend RAG to multi-domain [1, 2] or multilingual [3-6] settings.
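The core idea described above, retrieving passages from a datastore and prepending them to the user request, can be sketched in a few lines of Python. This is a minimal illustration only: the function names (`retrieve`, `build_prompt`), the toy datastore, and the prompt wording are all hypothetical, and plain word overlap stands in for a real neural retriever. No LLM is called; the sketch stops at the augmented prompt.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on word characters; \w is Unicode-aware,
    # so this also handles non-English text to a first approximation.
    return re.findall(r"\w+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    # Toy relevance score: number of query tokens shared with the passage.
    # Assumption: a stand-in for a learned sparse or dense retriever.
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def retrieve(query: str, datastore: list[str], k: int = 2) -> list[str]:
    # Rank all passages by score and keep the top k.
    ranked = sorted(datastore, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    # Augment the user request with numbered passages so that the
    # generator can ground its answer and attribute it to a source.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using the passages below and cite them.\n"
        f"{context}\n"
        f"Question: {query}"
    )


# Hypothetical three-passage datastore, for illustration only.
DATASTORE = [
    "Grenoble is a city in the French Alps.",
    "PyTorch is a deep learning toolkit.",
    "Meylan is a town near Grenoble in France.",
]

if __name__ == "__main__":
    question = "Where is Meylan?"
    print(build_prompt(question, retrieve(question, DATASTORE, k=1)))
```

In a multilingual or multi-domain setting, the weak points of such a pipeline are the retriever's scoring function and the generator's handling of context written in a different language or domain from the query.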

This internship will focus on further improving advanced RAG pipelines so that they better support queries and contexts that are in languages other than English or come from various domains.

Internship supervisors: Nadezhda Chirkova, Thibault Formal, Vassilina Nikoulina

The intern will collaborate with a team of researchers with backgrounds in RAG [5-6], natural language generation [7-9] and information retrieval [10-12].

Required skills

- PhD or final year MSc student in NLP-related domains
- Solid deep learning and NLP background
- Experience with the PyTorch toolkit

References

1: RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation. Dongyu Ru et al., NeurIPS 2024

2: RAFT: Adapting Language Model to Domain Specific RAG. Tianjun Zhang et al., COLM 2024

3: MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems. Nandan Thakur et al., 2024

4: Not All Languages are Equal: Insights into Multilingual Retrieval-Augmented Generation. Suhang Wu et al., 2024

5: BERGEN: A Benchmarking Library for Retrieval-Augmented Generation. David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina, Stéphane Clinchant. Findings of EMNLP 2024

6: Retrieval-augmented generation in multilingual settings. Nadezhda Chirkova, David Rau, Hervé Déjean, Thibault Formal, Stéphane Clinchant, Vassilina Nikoulina. Knowledgeable LLMs workshop @ ACL 2024

7: Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks. Nadezhda Chirkova and Vassilina Nikoulina, NAACL 2024

8: Zero-shot cross-lingual transfer in instruction tuning of large language models. Nadezhda Chirkova and Vassilina Nikoulina, INLG 2024

9: BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting. Zheng Xin Yong et al., ACL 2023

10: SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant, SIGIR 2021

11: MS-Shift: An Analysis of MS MARCO Distribution Shifts on Neural Retrieval. Simon Lupart, Thibault Formal, Stéphane Clinchant, ECIR 2023

12: SPLATE: Sparse Late Interaction Retrieval. Thibault Formal, Stéphane Clinchant, Hervé Déjean, Carlos Lassance, SIGIR 2024

Application instructions

Please note that applicants must be registered students at a university or other academic institution and that this establishment will need to sign an 'Internship Convention' with NAVER LABS Europe before the student is accepted.

You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.

About NAVER LABS

NAVER is the #1 Internet portal in Korea with activities that span a wide range of businesses including search, commerce, content, financial and cloud platforms.

NAVER LABS, co-located in Korea and France, is the organization dedicated to preparing NAVER’s future. NAVER LABS Europe is located in a spectacular setting in Grenoble, in the heart of the French Alps. Scientists at NAVER LABS Europe are empowered to pursue long-term research problems that, if successful, can have significant impact and transform NAVER. We take our ideas as far as research can to create the best technology of its kind. Active participation in the academic community and collaborations with world-class public research groups are, among others, important tools to achieve these goals. Teamwork, focus and persistence are important values for us.

NAVER LABS Europe is an equal opportunity employer.

Related Jobs

- Neural approaches to stochastic combinatorial optimisation (6 Chem. de Maupertuis, 38240 Meylan, France), 30 January 2025
- Internship: Advanced Constraint Processing in LLMs (6 Chem. de Maupertuis, 38240 Meylan, France), 20 January 2025
- Research Scientist in Human-Centric Computer Vision (Meylan, Grenoble, France), 14 June 2024
