StarDrinks: An English and Korean test set for evaluating LLMs in a coffee ordering deployment scenario

Published by Marcely Zanon Boito on 26 February 2026

Marcely Zanon Boito, Caroline Brun, Inyoung Kim, Denys Proux, Salah Aït-Mokhtar, Nikolaos Lagos, Jean-Luc Meunier, Ioan Calapodescu

International Conference on Language Resources and Evaluation (LREC), Palma de Mallorca, Spain, 11-16 May, 2026

LLMs and speech assistants are increasingly used for task-oriented interactions, yet their evaluation often relies on controlled scenarios that fail to capture the variability and complexity of real user requests. Coffee ordering, for example, involves diverse named entities, drink types, sizes, customizations, and brand-specific terminology, as well as spontaneous speech phenomena such as hesitations and self-corrections. To address this gap, we introduce ‘StarDrinks’, a test set in English and Korean containing speech utterances, transcriptions, and annotated slots. Our dataset supports speech-to-slots SLU, transcription-to-slots NLU, and speech-to-transcription ASR evaluation, providing a realistic benchmark for model robustness and generalization in a linguistically rich, real-world task.
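As a rough illustration of the three evaluation settings named above, the sketch below scores predicted slots against reference slots (the SLU and NLU settings) and a predicted transcription against a reference transcription (the ASR setting). The record layout, field names, slot labels, and the example utterance are all invented for illustration and are not the published StarDrinks schema. Exact-match slot F1 and word error rate are common choices for these tasks, assumed here rather than taken from the paper.

```python
from collections import Counter

# Hypothetical record: field names, slot labels, and the utterance itself
# are illustrative assumptions, not the actual StarDrinks format.
example = {
    "language": "en",
    "audio_path": "audio/example.wav",
    "transcription": "uh can I get a large iced latte no wait a medium one with oat milk",
    "slots": [("drink", "iced latte"), ("size", "medium"), ("milk", "oat milk")],
}


def slot_f1(reference, predicted):
    """Micro-averaged F1 over exact-match (slot_type, value) pairs."""
    ref, pred = Counter(reference), Counter(predicted)
    true_pos = sum((ref & pred).values())  # multiset intersection
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(pred.values())
    recall = true_pos / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


# A system that misses the self-correction ("large" instead of "medium").
predicted_slots = [("drink", "iced latte"), ("size", "large"), ("milk", "oat milk")]
f1 = slot_f1(example["slots"], predicted_slots)
wer = word_error_rate("a medium iced latte", "a medium ice latte")
```

In this made-up case the system keeps the size the speaker retracted, so it gets two of three slots right. Self-corrections of this kind are one of the spontaneous speech phenomena the abstract mentions.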
