Neural Indexing for Deep Information Retrieval - Internship - Naver Labs Europe
NAVER LABS Europe
Published
22 October 2020
Location
Meylan, Grenoble, France
Start date
Mid-March at the latest
Duration
5-6 months

Description

Pretrained Language Models (LMs) such as ELMo and BERT (Peters et al., 2018; Devlin et al., 2018 [1]) have significantly improved the quality of several Natural Language Processing (NLP) tasks by transferring prior knowledge learned from data-rich monolingual corpora to data-poor NLP tasks such as question answering or biomedical information extraction. In Information Retrieval (IR), BERT-based models have recently overtaken traditional learning-to-rank models, which had led the field for many years. This is an exciting time to work on this subject, as new possibilities and research questions emerge.

In Information Retrieval, the ranking pipeline is generally decomposed into two stages: the first stage retrieves a candidate set from the whole collection, providing documents for the second stage, which re-ranks the candidates using more complex techniques. Because of the size of web-scale corpora, the first stage is heavily constrained by efficiency, and therefore generally relies on an inverted index and the BM25 algorithm. It must also optimize for recall, passing as many relevant documents as possible to the re-ranking stage. In contrast, since it only has to rank a reduced candidate set, the second stage has relied heavily on machine learning, ranging from learning to rank on handcrafted features and neural ranking architectures to BERT-based rankers that achieve state-of-the-art results on several benchmarks (https://microsoft.github.io/msmarco/).
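As a rough illustration of this two-stage setup, the following sketch retrieves candidates with BM25 and re-ranks them with a BERT-based cross-encoder. The library choices (rank_bm25, sentence-transformers), the model name and the toy corpus are assumptions made for the example, not part of the internship project.

# Minimal two-stage ranking sketch (illustrative only).
# Stage 1: cheap, recall-oriented BM25 retrieval over the whole collection.
# Stage 2: expensive, precision-oriented re-ranking of the small candidate set.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "BERT improves ranking quality on MS MARCO.",
    "Inverted indexes make bag-of-words retrieval fast.",
    "Quantization compresses dense vectors for efficient search.",
]
query = "efficient first-stage retrieval"

# Stage 1: BM25 over a tokenized corpus, keep the top-k candidates.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
scores = bm25.get_scores(query.lower().split())
top_k = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]

# Stage 2: a cross-encoder scores (query, candidate) pairs jointly.
# The model name below is an assumption; any BERT-style re-ranker would do.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, corpus[i]) for i in top_k]
rerank_scores = reranker.predict(pairs)
ranking = [i for _, i in sorted(zip(rerank_scores, top_k), reverse=True)]
print(ranking)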

While many works have improved the latter stage, the first one still relies on bag-of-words matching and hence remains a bottleneck of the pipeline. SNRM [2] was the first model to directly tackle the first stage of the ranking pipeline with neural networks.

This year, several alternative first-stage rankers have been proposed, based on BERT and quantization libraries such as FAISS [5]. In this internship, we propose to explore and benchmark several indexing methods for deep information retrieval.
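The sketch below only illustrates the kind of quantized index such a first-stage ranker might build with FAISS [5]: random vectors stand in for BERT-based document embeddings, and the dimensions and index parameters are arbitrary choices for the example.

# Illustrative first-stage dense retrieval with FAISS (not the project's method).
import numpy as np
import faiss

d = 128                                   # embedding dimension (arbitrary)
n_docs = 10000
doc_vecs = np.random.rand(n_docs, d).astype("float32")  # stand-in embeddings

# IVF + product quantization: coarse clustering into nlist cells, plus
# compressed codes (m sub-vectors of nbits each) for memory efficiency.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, 100, 16, 8)  # nlist=100, m=16, nbits=8
index.train(doc_vecs)
index.add(doc_vecs)

# Query time: encode the query and retrieve candidates for the re-ranking stage.
query_vec = np.random.rand(1, d).astype("float32")
index.nprobe = 10                          # number of inverted lists to visit
distances, doc_ids = index.search(query_vec, 100)
print(doc_ids[0][:10])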

We are looking for someone with good coding skills, strong scientific rigor and creativity.

You will join a team of people working on this topic, learn about deep information retrieval, have access to many GPUs and experiment with novel ideas.

Required skills

- Fundamental knowledge of deep learning
- Familiarity with hashing and quantization is a plus
- PyTorch

References

- [1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805

- [2] From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing, https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1302

- [3] ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, https://arxiv.org/abs/2004.12832

- [4] Efficient Document Re-Ranking for Transformers by Precomputing Term Representations, https://arxiv.org/abs/2004.14255

- [5] FAISS: https://github.com/facebookresearch/faiss

Application instructions

You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.
Due to the changing travel restrictions related to COVID-19, it may not be possible to host candidates from certain regions. This will depend on the conditions at the specific starting date of the internship.

About NAVER LABS

NAVER LABS Europe has full-time positions, PhD and PostDoc opportunities throughout the year, which are advertised here and on the sites of international conferences that we sponsor, such as CVPR, ICCV, ICML, NeurIPS and EMNLP.

NAVER LABS Europe is an equal opportunity employer.

NAVER LABS Europe is located in Grenoble, in the French Alps. We take a multi- and interdisciplinary approach to research, with scientists in machine learning, computer vision, artificial intelligence, natural language processing, ethnography and UX working together to create next-generation ambient intelligence technology and services that deeply understand users and their contexts.

