Victor BOONE received the 2025 Academic Thesis Prize, awarded among PhDs who graduated in 2024, for his research work.
Thesis Title: Decision-making in multi-agent systems: delays, adaptability and learning in games (original French title: "Prise de décision dans les systèmes multi-agents : délais, adaptabilité et apprentissage dans les jeux")
In this thesis, we investigate the problem of regret minimization in Markov decision processes under the average gain criterion.
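To make the objective concrete, regret in this setting is commonly defined as follows (this is the standard definition from the average-reward literature, not a formula quoted from the thesis):

```latex
% Regret after T steps: the gap between the optimal long-run average
% gain g^*(M) of the (unknown) MDP M and the rewards actually collected.
\mathrm{Regret}(T) \;=\; T \, g^*(M) \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} r_t\Big]
```

Minimizing regret thus means collecting rewards almost as fast as an agent that knew the environment from the start.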
Markov decision processes are a natural way to model the interaction between an agent and their environment. This interaction can be thought of as a game, during which the agent is a player that observes their environment in its entirety and makes decisions accordingly. These decisions are choices of “actions”. Playing an action has two notable effects: it influences the way the environment evolves (by changing its state) and the reward that the agent receives. The goal of the agent is to play the right actions so as to earn as much reward as possible in the long run.
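The interaction loop described above can be sketched in a few lines. The two-state, two-action environment below is a hypothetical toy example (its rewards and transition probabilities are illustrative, not from the thesis):

```python
import random

# A toy Markov decision process: two states, two actions. Each
# (state, action) pair determines the reward earned and the probability
# of moving to state 1. These numbers are purely illustrative.
REWARD = {(0, 0): 0.1, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 1.0}
P_TO_STATE_1 = {(0, 0): 0.2, (0, 1): 0.9, (1, 0): 0.3, (1, 1): 0.7}

def step(state, action, rng):
    """Play `action` in `state`: return the reward and the next state."""
    reward = REWARD[(state, action)]
    next_state = 1 if rng.random() < P_TO_STATE_1[(state, action)] else 0
    return reward, next_state

# Interaction loop: the agent observes the state, picks an action,
# collects a reward, and the environment moves to a new state.
rng = random.Random(0)
state, total_reward = 0, 0.0
for t in range(1000):
    action = rng.choice([0, 1])          # a naive policy: play at random
    reward, state = step(state, action, rng)
    total_reward += reward
print(total_reward / 1000)               # average gain of the random policy
```

A learning agent would replace the random choice with a rule that improves as it observes more transitions; the long-run average of `total_reward / t` is precisely the “average gain” criterion mentioned above.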
In this thesis, we are interested in the problem where the agent observes the system in its entirety but does not know its underlying mechanics. More precisely, the agent must learn the effects of every action depending on the current state of the environment. Should one play the action that seems best so far? Or should one play another action that has performed badly so far, but that may have done so only through bad luck? This dilemma is called the exploration-exploitation dilemma, and finding the right balance between the two lies at the heart of such learning problems.
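One of the simplest ways to see the exploration-exploitation dilemma in action is the epsilon-greedy rule in a two-armed bandit (a one-state environment). The sketch below is a hypothetical illustration, not an algorithm from the thesis; the arm means are made-up values:

```python
import random

def epsilon_greedy(means, horizon=10_000, epsilon=0.1, seed=0):
    """With probability epsilon, explore a random arm; otherwise exploit
    the arm with the best empirical mean observed so far.
    `means` are the (unknown to the agent) Bernoulli success probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(means)            # pulls of each arm
    sums = [0.0] * len(means)            # rewards collected on each arm
    total = 0.0
    for t in range(horizon):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(len(means))                              # explore
        else:
            arm = max(range(len(means)), key=lambda a: sums[a] / counts[a])  # exploit
        reward = 1.0 if rng.random() < means[arm] else 0.0   # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total / horizon

# Average reward approaches the best arm's mean (0.7 here), minus the
# price paid for continued exploration.
print(epsilon_greedy([0.3, 0.7]))
```

Exploring too little risks locking onto a suboptimal arm after early bad luck; exploring too much wastes plays on arms already known to be worse. The algorithms studied in the thesis tune this balance so that the resulting regret is provably as small as possible.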
The goal of the thesis is the design of algorithms that manage the exploration-exploitation dilemma optimally. Our perspective is mostly theoretical and our approach systematic: we first show that no agent can learn “too fast”, by establishing a lower bound on achievable performance. Then, guided by this lower bound, we design a learning algorithm that reaches it. This approach is applied to the two mainstream variants of the learning problem: the instance-dependent analysis and the worst-case analysis. Beyond average performance, we further investigate the temporally local behavior of classical learning algorithms, and to that end we introduce new learning metrics.
Doctoral School: ED MSTII – Informatics and Mathematics
Research laboratory: Laboratoire d'informatique de Grenoble (LIG – CNRS/Inria/UGA – Grenoble INP-UGA)
Thesis supervision: Bruno GAUJAL