NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.
Date: 4th September 2024, 3:00 pm (CEST)
Rethinking approximate value iteration algorithms
About the speaker: Théo Vincent is a Ph.D. student at the Technical University of Darmstadt and at DFKI, the German Research Center for Artificial Intelligence. He is currently working on off-policy Reinforcement Learning methods under the supervision of Boris Belousov and Jan Peters. Before his Ph.D., Théo graduated from the MVA master's program at ENS Paris-Saclay. He also worked on Computer Vision problems at the Saint-Venant lab in Paris and at Signality, a Swedish start-up, and completed an internship in biostatistics at Harvard Medical School.
Abstract: Approximate value iteration (AVI) is a family of algorithms for Reinforcement Learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterative procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step onto a chosen function space. The Bellman operator leverages transition samples, which strongly determine its behavior: uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. In this talk, I will present two ideas for addressing these issues. One idea is to learn an approximate version of the Bellman operator to mitigate the dependency on the data. The other idea is to consider multiple iterations of the Bellman operator at once to improve the learning of the projection step.
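To make the two-step structure concrete, here is a minimal sketch of approximate value iteration with a linear least-squares projection, run on a randomly generated toy MDP. The MDP, the feature map, and all variable names are illustrative assumptions for this sketch, not the speaker's implementation:

```python
# Minimal sketch of approximate value iteration (AVI) on a toy MDP.
# Everything here (the random MDP, the linear feature space) is an
# illustrative assumption, not the method presented in the talk.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 4, 0.95

# Random tabular MDP: P[s, a] is a distribution over next states,
# R[s, a] is the expected reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

# Low-dimensional random features define the function space that the
# projection step maps into.
n_features = 8
phi = rng.normal(size=(n_states, n_features))

theta = np.zeros((n_features, n_actions))
for _ in range(200):
    Q = phi @ theta                          # current approximation, (S, A)
    # (i) Bellman optimality operator applied to the current Q-values.
    target = R + gamma * P @ Q.max(axis=1)   # (S, A)
    # (ii) Projection step: least-squares fit of the target in feature space.
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)

greedy_policy = (phi @ theta).argmax(axis=1)
print("Greedy policy:", greedy_policy)
```

In deep RL instantiations such as DQN, the closed-form least-squares projection above is replaced by gradient steps on a neural network fitted to sampled transitions, which is where uninformative samples make the projection step costly, as the abstract notes.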