2025 Academic Thesis Prize: Victor BOONE

Victor BOONE received the 2025 Academic Thesis Prize, awarded among PhDs who graduated in 2024, for his research work.

Thesis title: Decision-making in multi-agent systems: delays, adaptability and learning in games (original French title: Prise de décision dans les systèmes multi-agents : délais, adaptabilité et apprentissage dans les jeux)

Victor BOONE, winner of the 2025 Academic Thesis Prize

In this thesis, we investigate the problem of regret minimization in Markov decision processes under the average gain criterion.
Markov decision processes are a natural way to model the interaction between an agent and their environment. This interaction can be thought of as a game in which the agent is a player that observes the environment in its entirety and takes decisions accordingly. These decisions are choices of “actions”. Playing an action has two notable effects: it influences how the environment evolves (by changing state) and the reward that the agent collects. The goal of the agent is to play the right actions so as to earn as much reward as possible in the long run.
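For readers who want the long-run objective in symbols, it is usually formalized as the average gain of a policy. The definition below is the standard textbook one; the notation is ours and is not quoted from the thesis.

g^{\pi} = \liminf_{T \to \infty} \frac{1}{T} \, \mathbb{E}^{\pi}\!\left[ \sum_{t=1}^{T} r_t \right], \qquad g^{*} = \max_{\pi} g^{\pi}

Here r_t denotes the reward collected at step t while following the policy \pi, and g^{*} is the best average gain achievable by any policy.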

In this thesis, we are interested in the setting where the agent observes the system in its entirety but does not know its underlying mechanics. More precisely, the agent must learn the effect of every action as a function of the current state of the environment. Should one play the action that has seemed best so far? Or should one play another action that has performed poorly so far, but perhaps only because of bad luck? This is the exploration-exploitation dilemma, and finding the right balance between the two lies at the heart of such learning problems.
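To make the dilemma concrete, here is a minimal Python sketch of one classical way to balance the two options, on a simplified stateless problem (a multi-armed bandit): add an optimism bonus to the empirical mean reward of each action. All names are illustrative, and this is not the algorithm designed in the thesis.

import math

def choose_action(counts, mean_rewards, t, explore=True):
    """Pick an action from empirical statistics.

    counts[a]       -- number of times action a has been played so far
    mean_rewards[a] -- empirical mean reward of action a
    t               -- current time step (t >= 1)
    """
    # Any action never tried yet is played at least once.
    for a in range(len(counts)):
        if counts[a] == 0:
            return a
    scores = []
    for a in range(len(counts)):
        # Optimism in the face of uncertainty: rarely tried actions get a
        # larger bonus, so they are occasionally explored again.
        bonus = math.sqrt(2.0 * math.log(t) / counts[a]) if explore else 0.0
        scores.append(mean_rewards[a] + bonus)
    return max(range(len(scores)), key=scores.__getitem__)

With explore=False the agent is purely greedy and may lock onto a suboptimal action forever; with the bonus, exploration fades out naturally as each action is tried more often.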
The goal of the thesis is to design algorithms that manage the exploration-exploitation dilemma optimally. Our approach is mostly theoretical and systematic: we first show that no agent can learn “too fast”, by establishing a lower bound on the achievable performance. Then, inspired by this lower bound, we design a learning algorithm that reaches it. This approach is carried out for the two mainstream variants of the learning problem (the instance-dependent analysis and the worst-case analysis). Beyond average performance, we further investigate the temporally local behavior of classical learning algorithms, and we introduce new learning metrics for that purpose.
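The performance measure behind both the lower bound and the matching algorithm is the regret, i.e. the reward lost relative to an optimal policy. The definition below is the standard one (notation ours, not quoted from the thesis):

\mathrm{Regret}(T) = T \, g^{*} - \mathbb{E}\!\left[ \sum_{t=1}^{T} r_t \right]

where g^{*} is the optimal average gain defined above. A lower bound states that no learner can keep this quantity small on every problem, and an optimal algorithm is one whose regret grows at the same rate as that bound.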

Keywords: Artificial intelligence, Markov decision processes, Reinforcement learning, Regret

Doctoral School: ED MSTII – Informatics and Mathematics
Research laboratory: Laboratoire d'informatique de Grenoble (LIG - CNRS/Inria/UGA - Grenoble INP-UGA)
Thesis supervision: Bruno GAUJAL

> To find out about all the 2025 thesis prizes
Updated on May 27, 2025