NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.
Date: 27th November 2023, 4:00 pm (GMT +1)
Discriminator-Guided Chain-of-Thought Reasoning
About the speaker: Muhammad Khalifa is a third-year Ph.D. candidate at the University of Michigan in Ann Arbor, advised by Lu Wang and Honglak Lee. His research interests revolve around large language models, complex reasoning, and controlled generation. Muhammad has previously done internships at NAVER Labs Europe and Amazon AWS, and is currently an intern at the Allen Institute for AI.
Abstract: During this talk, we’ll explore the challenges Large Language Models (LLMs) face with chain-of-thought (multi-step) reasoning, which often lead them to invalid solutions under standard decoding techniques. Because LLMs can assign high probability to incorrect reasoning steps and low probability to correct ones, decoding techniques that optimize for sequence probability can easily produce incorrect reasoning. I will begin the talk by discussing the issues with standard decoding techniques in reasoning, as well as the limitations of post hoc approaches such as self-consistency and verifiers. Then I will introduce GRACE—a guided decoding method that leverages a specially trained discriminator to guide LLM decoding toward correct reasoning steps. I will show that GRACE can boost the reasoning of LLMs on mathematical and symbolic tasks, producing not just correct final answers but also reliable reasoning chains, while outperforming standard decoding and post hoc techniques. The talk will conclude with a discussion of the limitations and future directions for inference-time methods aimed at advancing LLM reasoning.
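The core idea of discriminator-guided decoding can be illustrated with a minimal sketch. This is not the GRACE algorithm itself (the talk covers the actual method); it only shows the general pattern of re-scoring candidate reasoning steps with a discriminator, with toy stand-in functions (`lm_logprob`, `discriminator_score`, `select_step`) that are assumptions for illustration:

```python
# Illustrative sketch of discriminator-guided step selection, NOT the
# exact GRACE method: candidate next steps proposed by a language model
# are re-scored by a discriminator trained to judge step correctness,
# and decoding is steered by the combined score.

def lm_logprob(prefix, step):
    """Stand-in for the LM's log-probability of `step` given `prefix`."""
    return -0.1 * len(step)  # toy: shorter steps are more probable

def discriminator_score(prefix, step):
    """Stand-in for a trained discriminator's correctness score."""
    return 1.0 if "correct" in step else 0.0  # toy heuristic

def select_step(prefix, candidates, beta=2.0):
    # Combine the LM likelihood with the discriminator's judgment;
    # beta controls how strongly the discriminator steers decoding.
    return max(
        candidates,
        key=lambda s: lm_logprob(prefix, s) + beta * discriminator_score(prefix, s),
    )

candidates = ["a long but correct step", "short wrong step"]
print(select_step("Q: ...", candidates))  # the discriminator outweighs raw likelihood
```

Here the discriminator term overrides the LM's preference for the higher-likelihood (shorter) step, which is exactly the failure mode the abstract describes: sequence probability alone is a poor proxy for reasoning correctness.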