Carlos Lassance, Hervé Déjean, Stéphane Clinchant |
45th European Conference on Information Retrieval (ECIR), Dublin, Ireland, 2–6 April, 2023 |
Finetuning Pretrained Language Models (PLM) for IR has been de facto the standard practice since their breakthrough effectiveness few years ago. But, is this approach well understood and well founded?
In this paper, we study the impact of the pretraining collection on the final IR effectiveness. In particular, we challenge the current hypothesis that PLM shall be trained on a large enough generic collection and we show that pretraining from scratch on the collection of interest is surprisingly competitive with the current approach. We benchmark first-stage ranking rankers and cross-encoders for reranking on the task of general passage retrieval on MSMARCO, Mr-Tydi for Arabic, Japanese and Russian, and TripClick for specific domain. Contrary to popular belief, we show that, for finetuning first-stage rankers, models pretrained solely on their collection have equivalent or better effectiveness compared to more general models. However, there is a slight effectiveness drop for rerankers pretrained only on the target collection. Overall, our study sheds a new light on the role of the pretraining collection and should make our community ponder on building specialized models by pretraining from scratch. Last but not least, doing so could enable better control of efficiency, data bias and replicability, which are key research questions for the IR community.
1. Difference in female/male salary: 33/40 points
2. Difference in salary increases female/male: 35/35 points
3. Salary increases upon return from maternity leave: uncalculable
4. Number of employees in under-represented gender in 10 highest salaries: 0/10 points
NAVER France targets (with respect to the 2022 index) are as follows:
En 2022, NAVER France a obtenu les notes suivantes pour chacun des indicateurs :
1. Les écarts de salaire entre les femmes et les hommes: 33 sur 40 points
2. Les écarts des augmentations individuelles entre les femmes et les hommes : 35 sur 35 points
3. Toutes les salariées augmentées revenant de congé maternité : non calculable
4. Le nombre de salarié du sexe sous-représenté parmi les 10 plus hautes rémunérations : 0 sur 10 points
Les objectifs de progression pour l’index 2022 de NAVER France sont :
NAVER LABS Europe 6-8 chemin de Maupertuis 38240 Meylan France Contact
This web site uses cookies for the site search, to display videos and for aggregate site analytics.
Learn more about these cookies in our privacy notice.
You may choose which kind of cookies you allow when visiting this website. Click on "Save cookie settings" to apply your choice.
FunctionalThis website uses functional cookies which are required for the search function to work and to apply for jobs and internships.
AnalyticalOur website uses analytical cookies to make it possible to analyse our website and optimize its usability.
Social mediaOur website places social media cookies to show YouTube and Vimeo videos. Cookies placed by these sites may track your personal data.
This content is currently blocked. To view the content please either 'Accept social media cookies' or 'Accept all cookies'.
For more information on cookies see our privacy notice.