Offensive Content Mitigation Research – Internship

Published by Irene Maxwell on 18 October 2024

Location

Meylan, Grenoble, France

Description

Modern LLMs acquire impressive language understanding from training on massive amounts of data, but they may still struggle to generate responses that align with user preferences and expectations for a given request. In deployed systems such as chatbots and conversational agents, a crucial task is to ensure that generated content is free of offensive expressions and patronizing language. Although considerable effort has been invested in aligning LLMs [1,2,3,4,5], the safety risk persists, especially for non-English content [6,7,8,9]. Moreover, many aligned models overreact to certain “trigger patterns” (e.g., swear words or mentions of protected attributes) and may wrongly refuse to answer inoffensive questions, which creates a tension between “helpfulness” and “safety”. Models’ over-reliance on such patterns also makes the detection of implicit hate speech more challenging [10,11,12,13].

The goal of this internship is to investigate strategies for reducing the generation of offensive content, with a focus on implicit offensive speech in multilingual settings.

This internship is part of an ANR project called DIKÉ, which aims at studying bias, fairness and ethics of compressed NLP models. Results are expected to be reported in a paper by the end of the internship (or soon after). The internship will be hosted at NAVER LABS Europe and co-supervised by NAVER LABS and Lyon 2 University researchers.

Supervisors: Caroline Brun and Vassilina Nikoulina

Required skills

- PhD student or final-year MSc student in an NLP-related field
- Solid deep learning and NLP background
- Strong programming skills, with knowledge of PyTorch, NumPy and Hugging Face Transformers
- Familiarity with recent preference optimization techniques, such as DPO, is a plus
- Ability to communicate in English; knowledge of French is an advantage

References

[1] Compositional Preference Models for Aligning LMs, Go et al., ICLR 2024

[2] Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, Ahmadian et al., ACL 2024

[3] Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2, Ivison et al., arXiv:2311.10702

[4] Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafailov et al., NeurIPS 2023

[5] Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models, Pozzobon et al., EMNLP Findings 2023

[6] Preference Tuning for Toxicity Mitigation Generalizes Across Languages, Li et al., arXiv:2406.16235

[7] From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models, Ermis et al., ACL Findings 2024

[8] PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models, Jain et al., arXiv:2405.09373

[9] FrenchToxicityPrompts: A Large Benchmark for Evaluating and Mitigating Toxicity in French Texts, Brun and Nikoulina, TRAC workshop (LREC-COLING) 2024

[10] Playing the Part of the Sharp Bully: Generating Adversarial Examples for Implicit Hate Speech Detection, Ocampo et al., ACL Findings 2023

[11] An In-depth Analysis of Implicit and Subtle Hate Speech Messages, Ocampo et al., EACL 2023

[12] Latent Hatred: A Benchmark for Understanding Implicit Hate Speech, ElSherief et al., EMNLP 2021

[13] Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection, Zang et al., ACL 2024

Application instructions

Please note that applicants must be registered students at a university or other academic institution and that this establishment will need to sign an 'Internship Convention' with NAVER LABS Europe before the student is accepted.

You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.

About NAVER LABS

NAVER is the #1 Internet portal in Korea with activities that span a wide range of businesses including search, commerce, content, financial and cloud platforms.

NAVER LABS, co-located in Korea and France, is the organization dedicated to preparing NAVER’s future. NAVER LABS Europe is located in a spectacular setting in Grenoble, in the heart of the French Alps. Scientists at NAVER LABS Europe are empowered to pursue long-term research problems that, if successful, can have significant impact and transform NAVER. We take our ideas as far as research can to create the best technology of its kind. Active participation in the academic community and collaborations with world-class public research groups are, among others, important tools to achieve these goals. Teamwork, focus and persistence are important values for us.

NAVER LABS Europe is an equal opportunity employer.

Apply Online

Please note that by submitting your application you will be sharing personal data such as your name and email address. NAVER LABS Europe will not use this personal data for any purpose other than processing your application.
NAVER LABS Europe is committed to protecting the rights and interests of our users by strictly adhering to the regulations in place for the protection of personal data.
Read the NAVER LABS Europe privacy notice for more information.
