
Publications
- RANa: Retrieval-Augmented Navigation
 Transactions on Machine Learning Research (TMLR), January 2026
 
- PISCO: Pretty simple compression for retrieval-augmented generation
 The 63rd Annual Meeting of the Association for Computational Linguistics (ACL), Vienna, Austria, 27 July - 1 August, 2025
 
- Reranking with compressed document representation
 arXiv:2505.15394
 
- OSCAR: Online Soft Compression And Reranking
 arXiv:2504.07109
 
- Context Embeddings for Efficient Answer Generation in RAG
 The 18th ACM International Conference on Web Search and Data Mining (WSDM), Hannover, Germany, 10-14 March, 2025
 
- Let your LLM generate a few tokens and you will reduce the need for retrieval
 arXiv:2412.11536
 
- BERGEN: A Benchmarking Library for Retrieval-Augmented Generation
 Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, Florida, 12-16 November, 2024
 
- Retrieval-augmented generation in multilingual settings
 Proceedings of the 1st Workshop on Towards Knowledgeable Language Models (KnowLLM), in conjunction with the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Bangkok, Thailand, 16 August, 2024
 
- SPLATE: Sparse Late Interaction Retrieval
 The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington D.C., USA, 14-18 July, 2024
 
- Two-step SPLADE: simple, efficient and effective approximation of SPLADE
 Findings of the 46th European Conference on Information Retrieval (ECIR), Glasgow, Scotland, 24-28 March, 2024
 
- A thorough comparison of cross-encoders and LLMs for reranking SPLADE
 arXiv:2403.10407
 
- A static pruning study on sparse neural retrievers
 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), Taipei, Taiwan, 23-27 July, 2023
 
- Benchmarking middle-trained language models for neural search
 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), Taipei, Taiwan, 23-27 July, 2023
 
- Parameter-efficient sparse retrievers and rerankers using adapters
 45th European Conference on Information Retrieval (ECIR), Dublin, Ireland, 2–6 April, 2023
 
- An experimental study on pretraining transformers from scratch for IR
 45th European Conference on Information Retrieval (ECIR), Dublin, Ireland, 2–6 April, 2023
 
- Vital records: uncover the past from historical handwritten records
 The 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, held in conjunction with COLING 2020, Barcelona, Spain (virtual event), 12 December, 2020
 
- Transforming scholarship in the archives through handwritten text recognition
 Journal of Documentation 75(5): 954-976 (2019)
 
- ICDAR 2019 competition on Table Detection and Recognition (cTDaR)
 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25 September, 2019
 
- Table rows segmentation
 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25 September, 2019
 
- Versatile layout understanding via conjugate graph
 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25 September, 2019
 
- Matching table structures of historical register books using association graphs
 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA, 5-8 August, 2018
 
- Comparing machine learning approaches for table recognition in historical register books
 13th IAPR International Workshop on Document Analysis Systems, Vienna, Austria, 24-27 April, 2018
 
- Transkribus Python toolkit
 International Workshop on Open Services and Tools for Document Analysis (ICDAR-OST), Kyoto, Japan, 10-12 November, 2017
 
- Extracting Structured Data from Unstructured Document with Incomplete Resources
 ICDAR, Gammarth, Tunisia, August 23-26, 2015
 
- Document Structure Analysis: Modelling Unseeable Patterns
 Workshop Machines and Manuscripts, Karlsruhe, Germany, 19-20 February, 2015.
 
- Using Ancestral Layout Models for Document Digitization
 DATeCH conference on Digital Access to Textual Cultural Heritage, Madrid, Spain, 19th - 20th May 2014.
 
- Using Page Breaks for Book Structuring
 Will be published in INEX 2011 Proceedings.
 
- Document: a Useful Level for Facing Noisy Data
 AND 2010 Fourth Workshop on Analytics for Noisy Unstructured Text Data, October 26th, 2010, Toronto, Canada
 
- XML Processing in the Cloud: Large-Scale Digital Preservation in Small Institutions
 IPDPS 2011 - 25th IEEE International Parallel & Distributed Processing Symposium, Anchorage (Alaska) USA, May 16-20, 2011
 
- Reflexions on the INEX Structure Extraction Competition
 DAS (Document Analysis System), Boston, MA, USA, 9-11 June, 2010
 
- Xeproc: A Model-Based Approach towards Document Process Preservation
 ECDL (European Conference on Digital Libraries), Glasgow, UK, 6-10 September 2010
 
- Unsupervised Method to Generate Page Template
 DRR (Document Recognition and Retrieval)- San Francisco, CA, USA, 23-27 January 2011
 
- Numbered Sequence Detection in Documents
 DRR 2009 (Document Recognition and Retrieval), San Jose, CA, USA, 20-22 January 2010
 
- XRCE Participation to the Book Structure Task (INEX 2008)
 To appear in the proceedings of INEX 2008, Dagstuhl, Germany
 
- Combining Multiple Methods for Book Indexing
 8th International Workshop on Document Analysis Systems 2008 (DAS 2008), Nara, Japan, Sep 17-18, 2008
 
- About Tables of Contents and How to Recognize Them
 To appear in International Journal of Document Analysis and Recognition (IJDAR). Full paper available on Springer Website
 
- Versatile Page Numbering Analysis
 Document Recognition and Retrieval XV, part of the IS&T/SPIE International Symposium on Electronic Imaging, San Jose, California, USA, 26-31 January 2008.
 
- Logical Document Conversion: Combining Functional and Formal Knowledge
 Symposium on Document Engineering, Winnipeg, Canada, August 28-31, 2007.
 
- System for Converting PDF Documents into Structured XML Format
 7th IAPR Workshop on Document Analysis Systems, Nelson, New Zealand, 13-15 February 2006.
 
- From legacy Document to XML: A conversion Framework
 9th European Conference on Research and Advanced Technology for Digital Libraries, Vienna, Austria, September 18-23, 2005.
 
- Structuring Documents According to their Table of Contents
 DocEng 05, Bristol, UK, November 2-4, 2005.
 
- A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora
 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, July 25-26, 2004.
 
- Generative vs discriminative approaches to entity recognition from label deficient data
 JADT 2004, 7èmes journées internationales analyse statistique des données textuelles, Louvain-la-Neuve, Belgium, 10-12 March, 2004
 
- Report on CLEF-2003 Experiments: Two Ways of Extracting Multilingual Resources from Corpora
 CLEF 2003, Trondheim, Norway, August 21-22, 2003.
 
- Assessing Automatically Extracted Bilingual Lexicons for CLIR in Vertical Domains
 To appear in "Lecture Notes in Computer Science"
 
- Reducing Parameter Space for Word Alignment
 NAACL/HLT Workshop Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, Edmonton, Canada, May 31, 2003. https://www.cs.unt.edu/~rada/wpt/
 
- Automatic Processing of Multilingual Medical Terminology: Applications to Thesaurus Enrichment and Cross-language Information Retrieval
 Artif Intell Med. 2005 Feb;33(2):111-24. PMID: 15811780
 
- Bilingual Lexicon Extraction: Using and Enriching Multilingual Thesauri
 Proc. of Terminology Knowledge Extraction, Nancy, France, August 25-30, 2002.
 
- Bilingual Terminology Extraction: an Approach based on a Multilingual Thesaurus Applicable to Comparable Corpora
 Proc. of COLING, Taipei, Taiwan, 24-30 August, 2002.
 
- Combining Labelled and Unlabelled Data: a Case Study on Fisher Kernels and Transductive Inference for Biological Entity Recognition
 Proc. of Sixth Conference on Natural Language Learning (CoNLL-2002), Taipei, Taiwan, 24-25 August, 2002.
 
