Yaodong is a machine learning researcher with ten years of working experience in both academia and industry. Currently, he is an assistant professor at Peking University. His research focuses on reinforcement learning and multi-agent systems. He has maintained a track record of more than forty publications at top conferences and journals, along with the best system paper award at CoRL 2020 (first author) and the best blue-sky paper award at AAMAS 2021 (first author). Before joining Peking University, he was an assistant professor at King's College London. Before KCL, he was a principal research scientist at Huawei U.K., where he headed the multi-agent system team in London. Before Huawei, he was a senior research manager at AIG, working on AI applications in finance. He holds a Ph.D. degree from University College London, an M.Sc. degree from Imperial College London and a Bachelor's degree from the University of Science and Technology of China.
I have multiple PhD studentship openings; feel free to contact me if you are interested in my research direction.
Our group has multiple PhD positions open for Fall 2023 admission; students interested in reinforcement learning or multi-agent systems are welcome to get in touch!
One paper got accepted at IJCAI 2022.
We extend the convergence guarantee for the well-known fictitious play method.
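For readers unfamiliar with the method, the following is a minimal sketch of classical fictitious play on a two-player zero-sum matrix game. It illustrates only the textbook procedure, not the extended guarantee from the paper; the function name `fictitious_play`, the choice of matching pennies, and the initial actions are assumptions made for illustration.

```python
def fictitious_play(payoff, steps=20000):
    """Simultaneous fictitious play on a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player gets its negative.
    Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions (an assumption of this sketch).
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(steps):
        # Row player's (unnormalised) value of each action against
        # the column player's empirical play so far.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Row player's payoff for each column action against
        # the row player's empirical play; the column player minimises it.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        best_row = max(range(n_rows), key=lambda i: row_values[i])
        best_col = min(range(n_cols), key=lambda j: col_values[j])
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = steps + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


# Matching pennies: the unique Nash equilibrium mixes both actions equally.
matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies)
print(row_freq, col_freq)
```

In this game the actual play keeps cycling, while the empirical frequencies drift towards the uniform mixed strategy, which is the sense in which fictitious play is said to converge.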
We open-source two reinforcement learning projects:
We develop an optimisation tool in PyTorch where meta-gradients can be computed easily.
With TorchOpt, you can implement Meta-RL algorithms easily; try our code!
We develop an RL/MARL environment for bimanual dexterous hand manipulation.
BiDexhands is very fast: you can reach 40,000 FPS on a single GPU.
Two papers got accepted at ICLR 2022.
We show how to conduct trust-region updates in MARL settings.
This is the SOTA algorithm in the cooperative MARL space; try our code!
The paper improves coordination in the MARL setting by learning intrinsic rewards that motivate exploration and coordination.
Invited talk at DAI 2021 on the topic of Training A Population of Reinforcement Learning Agents.
Three papers got accepted at NeurIPS 2021:
We analysed the variance of gradient norm for multi-agent reinforcement learning and developed a minimal-variance policy gradient estimator.
We developed a rigorous way to generate diverse policies in population-based training and demonstrated impressive results on Google football.
We show it is entirely possible to make AI learn to learn how to solve zero-sum games without even telling it what a Nash equilibrium is.
Invited tutorial talk at RLChina on Multi-Agent Learning.
Invited talk hosted by 机器之心 (Synced) on my recent work on dealing with non-transitivity in two-player zero-sum games.
We open-source MALib: a bespoke high-performance framework for population-based multi-agent reinforcement learning.
Two papers got accepted at ICML 2021.
Modelling Behavioural Diversity for Learning in Open-Ended Games. This paper studies how to measure and promote behavioural diversity in solving games in a mathematically rigorous way. It was awarded a long talk (top 3%) at ICML 2021.
Learning in Nonzero-Sum Stochastic Games with Potentials. This paper studies a generalised class of fully cooperative games, named stochastic potential games, and proposes a MARL solution to find a Nash equilibrium in such games.
Check out my recent talk on the topic of:
Update: Our paper wins the Best Paper Award at the Blue Sky Idea track!!!
One paper gets accepted at AAMAS 2021.
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems. I express some of my recent thoughts on why behavioural diversity in the policy space is an important factor for applying MARL techniques to real-world problems, beyond purely video games.
Check out my latest work on:
An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective. I hope this work can offer a nice summary of game theory basics for MARL researchers, in addition to the deep RL hype :)
Update: SMARTS won the Best Paper Award at CoRL 2020!
We release SMARTS: a multi-agent reinforcement learning enabled autonomous driving platform.
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Today we are excited to introduce a dedicated platform, SMARTS, that supports Scalable Multi-Agent Reinforcement Learning Training for autonomous driving. With SMARTS, ML researchers can now evaluate their new algorithms in self-driving scenarios, in addition to traditional video games. In turn, SMARTS can enrich social vehicle behaviours and create increasingly realistic and diverse interactions, powered by RL techniques, for autonomous driving researchers. Check our code on GitHub, and our paper at the Conference on Robot Learning 2020.
One paper gets accepted at NeurIPS 2020!
Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets. We introduce a new HMC sampler for large-scale Bayesian deep learning that suits multi-mode sampling; the noise from mini-batches is absorbed by a special design of the Nosé-Hoover dynamics.
One paper gets accepted at CIKM 2020!
A lecture was given at RL China Summer School.
A talk was given at ISTBI, Fudan University.
Many-agent Reinforcement Learning.
One paper gets accepted at ICML 2020.
Multi-agent Determinantal Q-learning. We introduce a new function approximator called the Q-determinantal point process (Q-DPP) for multi-agent reinforcement learning problems. It can help learn the Q-function factorisation with no need for a priori structural constraints such as those imposed by QMIX, VDN, etc.
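As background on the determinantal point process that the approximator is named after, here is a minimal sketch of the standard L-ensemble definition, where a subset Y has probability det(L_Y) / det(L + I). It is not the Q-DPP from the paper; the kernel values and the helper names `det` and `dpp_probability` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0  # the determinant of the empty matrix is 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def dpp_probability(kernel, subset):
    """P(Y = subset) = det(L_subset) / det(L + I) for an L-ensemble DPP."""
    n = len(kernel)
    sub = [[kernel[i][j] for j in subset] for i in subset]
    normaliser = [
        [kernel[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return det(sub) / det(normaliser)


# Hypothetical similarity kernel: items 0 and 1 are near-duplicates,
# item 2 is unrelated to both.
kernel = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

p_similar = dpp_probability(kernel, (0, 1))
p_diverse = dpp_probability(kernel, (0, 2))
total = sum(
    dpp_probability(kernel, subset)
    for size in range(len(kernel) + 1)
    for subset in combinations(range(len(kernel)), size)
)
print(p_similar, p_diverse, total)
```

The pair of near-duplicate items receives a much lower probability than the pair of unrelated items, which is the diversity-favouring property that makes determinantal point processes attractive for modelling sets.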
One paper gets accepted at IJCAI 2020.
Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning. We use a probabilistic graphical model to describe the recursive reasoning process of "I believe you believe I believe..." in multi-agent systems.
One paper gets accepted at AAMAS 2020.
Alpha^Alpha-Rank: Practically Scaling Alpha-Rank through Stochastic Optimisation. Alpha-Rank is a replacement for Nash equilibrium in general-sum N-player games; importantly, its solution is P-complete. In this paper, we further enhance its tractability by several orders of magnitude through a stochastic optimisation formulation.