Model-based reinforcement learning has long lagged behind model-free RL on Atari, especially among single-GPU algorithms. This collaboration between Google AI, DeepMind, and the University of Toronto (UofT) pushes world models to the next level. The main contribution is a learned latent state consisting of a deterministic part and a stochastic part, where the stochastic part is a set of 32 categorical variables, each with 32 possible classes. The world model is free to decide how to use these variables to represent the input, but is tasked with predicting future observations and rewards. This gives rise to an informative latent representation, and in a second step, actor-critic reinforcement learning can be carried out purely, and very efficiently, on the basis of the world model's latent states. No observations needed! The paper combines this with straight-through estimators, KL balancing, and many other tricks to achieve state-of-the-art single-GPU performance on Atari.
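To make the latent layout concrete, here is a minimal NumPy sketch of the stochastic part of the state: 32 categorical variables with 32 classes each, sampled as one-hot vectors, with the straight-through trick written out in comment form. The encoder is faked with random logits; everything here is illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes from the paper: 32 categorical variables, 32 classes each.
NUM_VARS, NUM_CLASSES = 32, 32

# Logits as an encoder network would produce them (random stand-in here).
logits = rng.normal(size=(NUM_VARS, NUM_CLASSES))

# Softmax over classes gives one categorical distribution per variable.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# Sample one class per variable and one-hot encode the result.
samples = np.array([rng.choice(NUM_CLASSES, p=p) for p in probs])
one_hot = np.eye(NUM_CLASSES)[samples]            # shape (32, 32)

# Straight-through estimator: the forward pass uses the hard one-hot
# sample, the backward pass uses the gradient of `probs`. In an autograd
# framework this is written as
#     latent = one_hot + probs - stop_gradient(probs)
# Numerically the two `probs` terms cancel, so the value is the sample.
latent = one_hot + probs - probs

# The flattened latent (32 * 32 = 1024 values) is what the actor-critic
# consumes instead of raw observations.
flat_latent = latent.reshape(-1)
print(flat_latent.shape)
```

Sampling (rather than taking the argmax) is what makes the predicted stochastic state able to represent multi-modal futures, one of the points discussed in the video.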
0:00 - Intro & Overview
4:50 - Short Recap of Reinforcement Learning
6:05 - Problems with Model-Free Reinforcement Learning
10:40 - How World Models Help
12:05 - World Model Learner Architecture
16:50 - Deterministic & Stochastic Hidden States
18:50 - Latent Categorical Variables
22:00 - Categorical Variables and Multi-Modality
23:20 - Sampling & Stochastic State Prediction
30:55 - Actor-Critic Learning in Dream Space
32:05 - The Incompleteness of Learned World Models
34:15 - How General is this Algorithm?
37:25 - World Model Loss Function
39:20 - KL Balancing
40:35 - Actor-Critic Loss Function
41:45 - Straight-Through Estimators for Sampling Backpropagation
46:25 - Experimental Results
52:00 - Where Does It Fail?
54:25 - Conclusion
Paper: [2010.02193] Mastering Atari with Discrete World Models
Code: GitHub - danijar/dreamerv2: Mastering Atari with Discrete World Models
Author Blog: Mastering Atari with Discrete World Models
Google AI Blog: Mastering Atari with Discrete World Models