Grenoble Data Science meetup: Efficient online text compression for RAG

Published by Claudia Heyer on 6 November 2025

6th November 2025, 19:00:

Stéphane Clinchant: Efficient online text compression for RAG.
Abstract: Retrieval-Augmented Generation (RAG) significantly improves LLM accuracy by grounding responses in external documents. However, this accuracy often comes at the cost of speed, as longer contexts increase processing latency. This talk will share how to apply novel compression techniques to achieve faster RAG—dramatically reducing context length and latency—while maintaining response quality.

The talk will be based on recent publications:

Provence: efficient and robust context pruning for retrieval-augmented generation, ICLR 2025

PISCO: Pretty simple compression for retrieval-augmented generation, ACL 2025

OSCAR: Online Soft Compression And Reranking, arXiv

