Bianca Marin Moreno received the 2026 Academic Thesis Prize for her research work, awarded among PhDs who graduated in 2025.
Thesis title: Online convex reinforcement learning and applications to energy management problems
Scientific Context and Objectives
The massive integration of intermittent renewable energy requires increased demand-side flexibility, particularly through the control of thermal appliances (water heaters). The scientific challenge lies in controlling hundreds of thousands of agents under partial observation constraints to ensure data privacy. This real-time decision-making problem must operate within an uncertain and non-stationary environment, where consumer behavior and power production evolve unpredictably, rendering conventional control approaches ineffective at this scale.
Algorithmic Advances: Convergence and Robustness
The core of Bianca’s work is the introduction of new reinforcement learning and online optimization algorithms, specifically designed to steer the average consumption of a vast population of agents toward a target profile. Her thesis demonstrates that "no-regret" learning algorithms, such as Hedge or Online Mirror Descent, make it possible to reach a Nash equilibrium within the framework of Mean Field Games. A major theoretical result proves that the aggregated consumption converges toward the set objective. To address the non-stationarity inherent in human behavior and power production, the MetaCURL algorithm was developed. Based on an expert aggregation strategy, it dynamically adapts the control without requiring prior knowledge of the magnitude of environmental variations, while maintaining very low algorithmic complexity.
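To give a concrete idea of what a "no-regret" algorithm does, here is a minimal sketch of the textbook Hedge (exponential weights) rule in Python. It is an illustration only, not the algorithms developed in the thesis: it ignores the mean-field and control aspects, and the function names, the number of experts and the synthetic loss data are all assumptions made for the example.

```python
import math
import random


def hedge_distribution(cum_losses, eta):
    """Hedge weights: p_i is proportional to exp(-eta * cumulative loss of expert i)."""
    shift = min(cum_losses)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (c - shift)) for c in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]


def hedge(loss_rounds, eta):
    """Run Hedge on a sequence of loss vectors with entries in [0, 1].

    Returns (cumulative expected loss of the learner, cumulative loss per expert).
    """
    n_experts = len(loss_rounds[0])
    cum_losses = [0.0] * n_experts
    learner_loss = 0.0
    for losses in loss_rounds:
        # Play the current distribution, then observe the losses of all experts.
        probs = hedge_distribution(cum_losses, eta)
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cum_losses = [c + l for c, l in zip(cum_losses, losses)]
    return learner_loss, cum_losses


random.seed(0)
T, N = 2000, 3
# Hypothetical synthetic data: expert 0 has the smallest loss on average.
means = [0.3, 0.5, 0.7]
loss_rounds = [
    [min(1.0, max(0.0, random.gauss(m, 0.1))) for m in means] for _ in range(T)
]

# Standard tuning when the horizon T is known in advance.
eta = math.sqrt(8 * math.log(N) / T)
learner_loss, cum_losses = hedge(loss_rounds, eta)

# Regret: learner's loss minus the loss of the best single expert in hindsight.
regret = learner_loss - min(cum_losses)
bound = math.sqrt(T * math.log(N) / 2)
print(f"regret = {regret:.2f}, worst-case bound = {bound:.2f}")
```

With this tuning, the regret of Hedge is known to stay below sqrt(T ln(N) / 2) for any sequence of losses in [0, 1], so the average regret per round vanishes as T grows. This is the "no-regret" property in its simplest form.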
Toward Real-World Deployment: Privacy and Stability
Beyond theoretical guarantees, Bianca’s thesis addresses practical hurdles related to grid deployment:
- Privacy Preservation: Methods for reducing the amount of observed information were published (ICML 2025) to limit controller intrusiveness.
- Long-term Stability: To avoid unrealistic daily resets of agents, a solution integrating a constraint on the terminal state distribution was proposed (ALT 2026), ensuring the system's operational continuity.
Conclusion and Perspectives
Bianca’s work serves as a bridge between sequential learning theory and the critical needs of the energy transition. By providing robust, scalable, and privacy-preserving algorithms, her research lays the groundwork for low-carbon electrical systems capable of autonomous and reliable self-adjustment.
In October 2025, Bianca had already received the L’Oréal-UNESCO For Women in Science Young Talents France 2025 Award from the L’Oréal Foundation, in partnership with the French Academy of Sciences and the French National Commission for UNESCO. Among the 34 doctoral and postdoctoral researchers honoured, she stood out in the category ‘AI and Modelling: anticipating and shaping the future’.
Key words: mean-field, machine learning, power consumption control, optimisation
Doctoral School: ED MSTII – Mathematics, Information Science and Technology, Computer Science
Research laboratory: Inria center at Université Grenoble Alpes
Company: EDF
Thesis supervision: Pierre Gaillard, Nadia Oudjane and Margaux Brégère