6th November 2025, 19:00:
Stéphane Clinchant: Efficient online text compression for RAG.
Abstract: Retrieval-Augmented Generation (RAG) significantly improves LLM accuracy by grounding responses in external documents. However, this accuracy often comes at the cost of speed, as longer contexts increase processing latency. This talk will share how to apply novel compression techniques to achieve faster RAG—dramatically reducing context length and latency—while maintaining response quality.
The talk is based on two recent publications:
Provence: efficient and robust context pruning for retrieval-augmented generation, ICLR 2025
PISCO: Pretty simple compression for retrieval-augmented generation, ACL 2025

