- BERGEN: A Benchmarking Library for Retrieval-Augmented Generation
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, Florida, 12-16 November, 2024
- Retrieval-augmented generation in multilingual settings
Proceedings of the 1st Workshop on Towards Knowledgeable Language Models (KnowLLM 2024), Bangkok, Thailand, 16 August, 2024
- Context Embeddings for Efficient Answer Generation in RAG
Published on arXiv.org
- Two-step SPLADE: simple, efficient and effective approximation of SPLADE
Findings of the 46th European Conference on Information Retrieval (ECIR), Glasgow, Scotland, 24-28 March, 2024
- A thorough comparison of cross-encoders and LLMs for reranking SPLADE
Published on arXiv.org
- A static pruning study on sparse neural retrievers
46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), Taipei, Taiwan, 23-27 July, 2023
- Benchmarking middle-trained language models for neural search
46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), Taipei, Taiwan, 23-27 July, 2023
- Parameter-efficient sparse retrievers and rerankers using adapters
45th European Conference on Information Retrieval (ECIR), Dublin, Ireland, 2–6 April, 2023
- An experimental study on pretraining transformers from scratch for IR
45th European Conference on Information Retrieval (ECIR), Dublin, Ireland, 2–6 April, 2023
- Vital records: uncover the past from historical handwritten records
The 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, held in conjunction with COLING 2020, Barcelona, Spain (virtual event), 12 December, 2020
- Transforming scholarship in the archives through handwritten text recognition
Journal of Documentation 75(5): 954-976 (2019)
- ICDAR 2019 competition on Table Detection and Recognition (cTDaR)
International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25 September, 2019
- Table rows segmentation
International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25 September, 2019
- Versatile layout understanding via conjugate graph
International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20-25 September, 2019
- Matching table structures of historical register books using association graphs
16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA, 5-8 August, 2018
- Comparing machine learning approaches for table recognition in historical register books
13th IAPR International Workshop on Document Analysis Systems, Vienna, Austria, 24-27 April, 2018
- Transkribus Python toolkit
International Workshop on Open Services and Tools for Document Analysis (ICDAR-OST), Kyoto, Japan, 10-12 November, 2017
- Extracting Structured Data from Unstructured Document with Incomplete Resources
ICDAR, Gammarth, Tunisia, August 23-26, 2015
- Document Structure Analysis: Modelling Unseeable Patterns
Workshop Machines and Manuscripts, Karlsruhe, Germany, 19-20 February, 2015.
- Using Ancestral Layout Models for Document Digitization
DATeCH conference on Digital Access to Textual Cultural Heritage, Madrid, Spain, 19th - 20th May 2014.
- Using Page Breaks for Book Structuring
Will be published in INEX 2011 Proceedings.
- Document: a Useful Level for Facing Noisy Data
AND 2010 Fourth Workshop on Analytics for Noisy Unstructured Text Data, October 26th, 2010, Toronto, Canada
- XML Processing in the Cloud: Large-Scale Digital Preservation in Small Institutions
IPDPS 2011 - 25th IEEE International Parallel & Distributed Processing Symposium, Anchorage (Alaska) USA, May 16-20, 2011
- Reflexions on the INEX Structure Extraction Competition
DAS (Document Analysis Systems), Boston, MA, USA, 9-11 June, 2010
- Xeproc: A Model-Based Approach towards Document Process Preservation
ECDL (European Conference on Digital Libraries), Glasgow, UK, 6-10 September 2010
- Unsupervised Method to Generate Page Template
DRR (Document Recognition and Retrieval), San Francisco, CA, USA, 23-27 January 2011
- Numbered Sequence Detection in Documents
DRR 2009 (Document Recognition and Retrieval), San Jose, CA, USA, 20-22 January 2010
- XRCE Participation to the Book Structure Task (INEX 2008)
To appear in the proceedings of INEX 2008, Dagstuhl, Germany
- Combining Multiple Methods for Book Indexing
8th International Workshop on Document Analysis Systems 2008 (DAS 2008), Nara, Japan, Sep 17-18, 2008
- About Tables of Contents and How to Recognize Them
To appear in International Journal of Document Analysis and Recognition (IJDAR). Full paper available on Springer Website
- Versatile Page Numbering Analysis
Document Recognition and Retrieval XV, part of the IS&T/SPIE International Symposium on Electronic Imaging, San Jose, California, USA, 26-31 January 2008.
- Logical Document Conversion: Combining Functional and Formal Knowledge
Symposium on Document Engineering, Winnipeg, Canada, August 28-31, 2007.
- System for Converting PDF Documents into Structured XML Format
7th IAPR Workshop on Document Analysis Systems, Nelson, New Zealand, 13-15 February 2006.
- From legacy Document to XML: A conversion Framework
9th European Conference on Research and Advanced Technology for Digital Libraries, Vienna, Austria, September 18-23, 2005.
- Structuring Documents According to their Table of Contents
DocEng 05, Bristol, UK, November 2-4, 2005.
- A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora
42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, July 25-26, 2004.
- Generative vs discriminative approaches to entity recognition from label deficient data
JADT 2004, 7èmes journées internationales analyse statistique des données textuelles, Louvain-la-Neuve, Belgium, 10-12 March, 2004
- Report on CLEF-2003 Experiments: Two Ways of Extracting Multilingual Resources from Corpora
CLEF 2003, Trondheim, Norway, August 21-22, 2003.
- Assessing Automatically Extracted Bilingual Lexicons for CLIR in Vertical Domains
To appear in "Lecture Notes in Computer Science"
- Reducing Parameter Space for Word Alignment
NAACL/HLT Workshop Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, Edmonton, Canada, May 31, 2003. https://www.cs.unt.edu/~rada/wpt/
- Automatic Processing of Multilingual Medical Terminology: Applications to Thesaurus Enrichment and Cross-language Information Retrieval
Artif Intell Med. 2005 Feb;33(2):111-24. PMID: 15811780 [PubMed - indexed for MEDLINE]
- Bilingual Lexicon Extraction: Using and Enriching Multilingual Thesauri
Proc. of Terminology Knowledge Extraction, Nancy, France, August 25-30, 2002.
- Bilingual Terminology Extraction: an Approach based on a Multilingual Thesaurus Applicable to Comparable Corpora
Proc. of COLING, Taipei, Taiwan, 24-30 August, 2002.
- Combining Labelled and Unlabelled Data: a Case Study on Fisher Kernels and Transductive Inference for Biological Entity Recognition
Proc. of Sixth Conference on Natural Language Learning (CoNLL-2002), Taipei, Taiwan, 24-25 August, 2002.