Yaodong is an assistant professor at the Institute for AI, Peking University. Before joining Peking University, he was an assistant professor at King's College London. He studies game theory, reinforcement learning, and multi-agent systems, aiming to achieve decision-making, interaction, and alignment for artificial general intelligence. He has maintained a track record of more than sixty publications at top conferences (NeurIPS, ICML, ICLR, etc.) and top journals (Artificial Intelligence, National Science Review, etc.), along with the Best System Paper Award at CoRL 2020 and the Best Blue-Sky Paper Award at AAMAS 2021. He was awarded the ACM SIGAI China Rising Star and the World AI Conference (WAIC'22) Rising Star. He holds a Ph.D. from University College London (nominated by UCL for the Joint AAAI/ACM SIGAI Doctoral Dissertation Award), an M.Sc. from Imperial College London, and a Bachelor's degree from the University of Science and Technology of China.
Dr. Yaodong Yang is an assistant professor (doctoral advisor) at the Institute for AI, Peking University, and a visiting assistant professor at King's College London. He is a recipient of China's High-Level Returned Overseas Talent Program and the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology. His research focuses on game intelligence and decision-making, spanning reinforcement learning, game theory, and multi-agent systems. He received his Bachelor's degree from the University of Science and Technology of China, his Master's degree from Imperial College London, and his Ph.D. from University College London (his thesis received the sole nomination for the AAAI/ACM SIGAI Doctoral Dissertation Award). Before returning to China, he was an assistant professor at the Department of Informatics, King's College London, where he was selected for the UK Home Office Exceptional Talent programme. He has published more than 70 papers at top AI conferences (including 20+ at NeurIPS/ICML/ICLR as first or corresponding author, plus journal papers in AIJ, NSR, JAAMAS, and TMLR), with over 2,000 Google Scholar citations. His awards include the Best System Paper Award at CoRL'20, the Best Blue-Sky Paper Award at AAMAS'21, the WAIC 2022 Yunfan Award (Rising Star), the ACM SIGAI China Rising Star Award, and the Best Technical Breakthrough Award from Huawei's London Research Centre. His work was featured on CCTV-1's Focus Talk (焦点访谈) in an episode on talent support for revitalising China through science and education.
1. A general solution framework for cooperative games (most popular speaker, TechBeat 2022-23)
2. A general solution framework for zero-sum games (most popular speaker, TechBeat 2021-22)
3. My viewpoint on Safe Alignment for LLMs (latest!).
PKU-Beaver for Safe RLHF technique on LLM
TorchOpt for high-order differentiation in PyTorch
OmniSafe for safe reinforcement learning
HARL for heterogeneous-agent reinforcement learning
MARLlib for cooperative multi-agent reinforcement learning
MAlib for competitive multi-agent reinforcement learning
Four papers get accepted at ICML 2023.
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
Invited talk given:
Slides: Aligning safe decision in open-ended world.
One paper gets accepted at Artificial Intelligence Journal
Safe Multi-Agent Reinforcement Learning for Multi-Robot Control
We propose the first safe cooperative MARL method.
Two ICRA papers and one ICLR paper got accepted.
ICRA'23: End-to-End Affordance Learning for Robotic Manipulation
We take advantage of visual affordance by using the contact information generated during the RL training process to predict contact maps of interest.
ICRA'23: GenDexGrasp: Generalizable Dexterous Grasping
A versatile dexterous grasping method that can generalize to unseen hands.
ICLR'23: Quality-Similar Diversity via Population Based Reinforcement Learning
A new policy diversity measure is proposed that suits game AI settings.
One paper gets accepted at Autonomous Agents and Multi-Agent Systems (Springer)
Online Markov Decision Processes with Non-oblivious Strategic Adversary
We study the online MDP setting with a strategic adversary, i.e., one smart enough to adapt its policy to the learning agent's behavior.
One paper gets accepted at AAMAS 2023
Is Nash Equilibrium Approximator Learnable?
We prove that the Nash equilibrium approximator is agnostically PAC learnable.
We have won the 1st place at NeurIPS 2022 MyoChallenge!
This competition is about learning contact-rich manipulation using a musculoskeletal hand, e.g., Die Rotation.
Our paper gets accepted at National Science Review (impact factor 23)
On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games
We prove that computing a Markov perfect equilibrium in general-sum stochastic games is PPAD-complete.
Three multi-agent RL papers get accepted at AAAI 2023.
Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks
Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency
Talk is given at AIRS in the AIR.
Game Theoretical Multi-Agent Reinforcement Learning.
Talk is given at Techbeat.com 2022.
A General Solution Framework to Cooperative MARL.
Seven papers got accepted at NeurIPS 2022.
Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning
A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
Constrained Update Projection Approach to Safe Policy Optimization
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
A Unified Diversity Measure for Multiagent Reinforcement Learning
New RL environments:
Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control
Tutorial on Conference on Games 2022
Solving two-player zero-sum games through reinforcement learning
Two Invited Talks were given during the summer holidays
CSML China 08/22:
A continuum of solutions to cooperative MARL.
CCDM China 07/22:
Training a Population of Agents.
One paper got accepted at IROS 2022.
Fully Decentralized Model-based Policy Optimization for Networked Systems
We figured out how to do model-based MARL in networked systems.
One paper got accepted at IJCAI 2022.
On the Convergence of Fictitious Play: A Decomposition Approach
We extend the convergence guarantee for the well-known fictitious play method.
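For readers unfamiliar with the method, here is a minimal sketch of classical fictitious play on matching pennies (an illustration of the baseline algorithm only, not the paper's decomposition approach): each player repeatedly best-responds to the opponent's empirical action frequencies, and in zero-sum games those frequencies converge to a Nash equilibrium.

```python
# Fictitious play on matching pennies (row player's payoffs; zero-sum).
A = [[1.0, -1.0],
     [-1.0, 1.0]]

def fictitious_play(A, steps=20000):
    n, m = len(A), len(A[0])
    counts1 = [1.0] * n  # empirical action counts (ones break initial ties)
    counts2 = [1.0] * m
    for _ in range(steps):
        t1, t2 = sum(counts1), sum(counts2)
        # Row player best-responds to the column player's empirical mixture.
        payoffs1 = [sum(A[i][j] * counts2[j] / t2 for j in range(m))
                    for i in range(n)]
        a1 = max(range(n), key=payoffs1.__getitem__)
        # Column player best-responds (minimises, since payoffs are zero-sum).
        payoffs2 = [sum(counts1[i] / t1 * A[i][j] for i in range(n))
                    for j in range(m)]
        a2 = min(range(m), key=payoffs2.__getitem__)
        counts1[a1] += 1.0
        counts2[a2] += 1.0
    return ([c / sum(counts1) for c in counts1],
            [c / sum(counts2) for c in counts2])

p1, p2 = fictitious_play(A)
# Empirical frequencies approach the unique mixed Nash equilibrium (1/2, 1/2).
```

The decomposition result in the paper extends the convergence guarantee of exactly this iterative process to broader game classes.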
We open source two reinforcement learning projects:
We develop an optimisation tool in PyTorch where meta-gradients can be computed easily.
With TorchOpt, you can implement meta-RL algorithms easily; try our code!
We develop an RL/MARL environment for bimanual dexterous hand manipulation.
BiDexHands is super fast: you can reach 40,000 FPS with only one GPU.
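The meta-gradient idea behind TorchOpt, differentiating an outer objective through an inner optimisation step, can be illustrated without the library. The toy losses below are hypothetical, chosen so the chain-rule result can be checked against finite differences:

```python
# Toy meta-gradient: inner loss (theta - lam)^2, outer loss theta_new^2.
# We differentiate the outer loss w.r.t. lam THROUGH one inner SGD step.
alpha = 0.1  # inner-loop learning rate

def inner_step(theta, lam):
    # One SGD step on the inner loss L_in = (theta - lam)^2.
    grad = 2.0 * (theta - lam)
    return theta - alpha * grad

def outer_loss(theta_new):
    return theta_new ** 2

def meta_grad_analytic(theta, lam):
    # Chain rule: d(theta_new)/d(lam) = 2 * alpha,
    # so dL_out/d(lam) = 2 * theta_new * 2 * alpha.
    theta_new = inner_step(theta, lam)
    return 2.0 * theta_new * 2.0 * alpha

def meta_grad_fd(theta, lam, eps=1e-6):
    # Finite-difference check of the same derivative.
    hi = outer_loss(inner_step(theta, lam + eps))
    lo = outer_loss(inner_step(theta, lam - eps))
    return (hi - lo) / (2.0 * eps)
```

Libraries like TorchOpt automate exactly this differentiation through optimiser steps, which is what makes meta-RL algorithms such as MAML-style updates straightforward to implement.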
Two papers got accepted at ICLR 2022.
1. Multi-Agent TRPO Methods
We show how to conduct trust-region updates in MARL settings.
This is the SOTA algorithm in the cooperative MARL space, try our code!
[English Blog] [Chinese Blog] [Code]
2. LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning
The paper improves coordination in the MARL setting by learning intrinsic rewards that motivate exploration and coordination.
Invited talk at DAI 2021 on the topic of Training A Population of Reinforcement Learning Agents.
Three papers get accepted at NeurIPS 2021:
We analysed the variance of gradient norm for multi-agent reinforcement learning and developed a minimal-variance policy gradient estimator.
We developed a rigorous way to generate diverse policies in population-based training and demonstrated impressive results on Google football.
We show it is entirely possible to make AI learn how to solve zero-sum games without even telling it what a Nash equilibrium is.
Invited talk at RLChina on the tutorial of Multi-Agent Learning.
Invited talk hosted by 机器之心 (Synced) on my recent work on how to deal with non-transitivity in two-player zero-sum games.
We open-source MALib: a bespoke high-performance framework for population-based multi-agent reinforcement learning.
Two papers get accepted in ICML 2021.
Modelling Behavioural Diversity for Learning in Open-Ended Games. This paper studies how to measure and promote behavioural diversity in solving games in a mathematically rigorous way. It is awarded a long talk (top 3%) at ICML 2021.
Learning in Nonzero-Sum Stochastic Games with Potentials. This paper studies a generalised class of fully cooperative games, named stochastic potential games, and proposes a MARL solution to find the Nash equilibrium in such games.
Check out my recent talk on the topic of:
A general framework for solving two-player zero-sum games.
Update: Our paper wins the Best Paper Award in the Blue Sky Ideas track!
One paper gets accepted in AAMAS 2021.
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems. I express some of my recent thoughts on why behavioural diversity in the policy space is an important factor for MARL techniques to be applied to real-world problems beyond video games.
Check out my latest work on:
An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective. I hope this work offers a nice summary of game theory basics for MARL researchers, in addition to the deep RL hype :)
Update: SMARTS won the Best System Paper Award at CoRL 2020!
We release SMARTS: a multi-agent reinforcement learning enabled autonomous driving platform.
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Today we are excited to introduce a dedicated platform, SMARTS, that supports Scalable Multi-Agent Reinforcement Learning Training for autonomous driving. With SMARTS, ML researchers can now evaluate their new algorithms in self-driving scenarios, in addition to traditional video games. In turn, SMARTS can enrich social-vehicle behaviours and create increasingly realistic and diverse interactions, powered by RL techniques, for autonomous driving researchers. Check out our code on GitHub and our paper at the Conference on Robot Learning 2020.
One paper gets accepted at NeurIPS 2020!
Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets. We introduce a new HMC sampler for large-scale Bayesian deep learning that suits multi-modal sampling; the noise from mini-batches is absorbed by a special design of the Nosé-Hoover dynamics.
One paper gets accepted at CIKM 2020 !
Learning to infer user hidden states for online sequential advertising.
A lecture was given at RL China Summer School.
Advances of Multi-agent Learning in Gaming AI.
A talk was given at ISTBI, Fudan University.
Many-agent Reinforcement Learning.
One paper gets accepted at ICML 2020
Multi-agent Determinantal Q-learning. We introduce a new function approximator, Q-DPP (a determinantal point process over Q-functions), for multi-agent reinforcement learning problems. It helps learn the Q-function factorisation with no need for a priori structural constraints such as those in QMIX, VDN, etc.
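To give a flavour of why a determinantal point process favours diverse selections (a toy illustration with a hypothetical kernel, not the paper's Q-DPP construction): in an L-ensemble, the probability of selecting a subset is proportional to the determinant of the corresponding kernel submatrix, which shrinks when the selected items are similar.

```python
# Toy L-ensemble DPP over three items with a hypothetical kernel
# (diagonal = item quality, off-diagonal = pairwise similarity).
L = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]

def pair_weight(L, i, j):
    # Unnormalised probability of the pair {i, j}:
    # the determinant of the 2x2 kernel submatrix.
    return L[i][i] * L[j][j] - L[i][j] * L[j][i]

w_similar = pair_weight(L, 0, 1)  # items 0 and 1 are near-duplicates
w_diverse = pair_weight(L, 0, 2)  # items 0 and 2 are dissimilar
# The diverse pair gets weight 0.99 vs 0.19 for the similar pair,
# so the DPP strongly prefers selecting dissimilar items together.
```

This repulsion between similar items is the property the paper exploits to factorise joint Q-functions without hand-designed structural constraints.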
One paper gets accepted at IJCAI 2020
Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning. We use a probabilistic graphical model to describe the recursive reasoning process of "I believe you believe I believe..." in multi-agent systems.
One paper gets accepted at AAMAS 2020
Alpha^Alpha-Rank: Practically Scaling Alpha-Rank through Stochastic Optimisation. Alpha-Rank is a replacement for Nash equilibrium in general-sum N-player games; importantly, its solution is P-complete. In this paper, we further improve its tractability by several orders of magnitude through a stochastic optimisation formulation.