Pierre Clavier

Hello! I am a Research Scientist at Cohere, working on Reinforcement Learning applied to Large Language Models. Before this, I was a PhD candidate in Machine Learning at CMAP, École Polytechnique, supervised by Stéphanie Allassonnière and Erwan Le Pennec, and working closely with Matthieu Geist. I was also part of HeKA at Inria Paris. I hold a Master's degree in Mathematics and Machine Learning from the MVA program at ENS Paris-Saclay. From January to March 2024, I was a visiting researcher at Caltech in the Computing + Mathematical Sciences Department, supervised by Adam Wierman and Eric Mazumdar.

My PhD thesis (available here) focused on making Reinforcement Learning more robust. I am also currently interested in:

Reinforcement Learning with Human Feedback
Reinforcement Learning with Verifiable Reward
Bandit theory
Sampling in general

News:

06/2025: I presented ShiQ at the UCL AI Center in London. Thanks for the invitation!

05/2025: New preprint: ShiQ is a new RL algorithm inspired by Q-Learning but adapted to LLMs!

03/12: I am starting a new position at Cohere as a Research Scientist in RL!

01/2025: New AI model: Command A, fine-tuned with beautiful and theoretically grounded RLHF algorithms, CoPG and SRPO.

12/2024: I will attend NeurIPS 2024 in Vancouver. Feel free to contact me to grab a coffee and talk about RL/ML!

11/2024: I successfully defended my PhD thesis, Robust reinforcement learning, Theory and Practice. Thanks to Shie Mannor, Rémi Munos, Eric Moulines, Michal Valko, Aurélien Garivier, and Ana Busic for taking the time to be part of my jury!

10/2024: Two papers accepted at NeurIPS 2024: TC-MDP, a new algorithm for Robust Reinforcement Learning, and a theoretical paper on the optimal sample complexity of Robust MDPs, written at Caltech with Laixi (arXiv coming soon)!

28/10/2024: I will be at the 17th European Workshop on Reinforcement Learning (EWRL 2024) in Toulouse to present my two recent works on Deep Robust Reinforcement Learning: the ExpectRL and TC-MDP algorithms.

09/2024: I am starting a PhD internship at Cohere. I will work on RLHF (Reinforcement Learning with Human Feedback).

05/2024: Three new preprints out: a new benchmark for Robust Reinforcement Learning, RRLS (Robust Reinforcement Learning Suite), and two new Deep Robust Reinforcement Learning algorithms, ExpectRL and TC-MDP!

04/2024: Two papers accepted: one at ICML 2024 on bandits with variational inference, and one at UAI 2024 (oral) on the theory of Robust MDPs!

01/2024: I will be a visiting researcher at the California Institute of Technology, in the group of Professors Adam Wierman and Eric Mazumdar, working on Robust RL!

11/2023: I am organizing NeurIPS in Paris 2023 with Linus Bleistein, among other colleagues, on the 6th and 7th of December 2023 at Sorbonne Université (Paris 5). A great opportunity to meet and discuss recent advances in ML in central Paris!