Imagination-Augmented Agents for Deep Reinforcement Learning

Introduction

An interesting research area in reinforcement learning is building AI agents that play video games. Before deep learning, game-playing AI was typically coded with pre-set rules or search techniques such as Monte-Carlo Tree Search. More recently, deep reinforcement learning has shown success in playing video games such as Atari 2600 games. In particular, Deep Q-Learning (DQN) learns the optimal actions directly from the current observations (raw pixels). However, there are complex games where DQN fails to learn: some require solving sub-problems without an explicit reward, or contain irreversible situations where a single action can be catastrophic. A typical example is Sokoban, where even human players need planning and inference. Games of this kind pose a real challenge for RL.
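
To make the DQN idea above concrete, here is a minimal sketch of the one-step temporal-difference objective such an agent optimizes; the network sizes, discount factor, and PyTorch formulation are illustrative assumptions for this summary, not details taken from the paper.

```python
# Minimal sketch of a DQN-style TD loss: Q(s, a) is regressed toward
# r + gamma * max_a' Q_target(s', a'). Layer sizes are illustrative.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))       # online Q(s, .)
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # periodically synced copy
target_net.load_state_dict(q_net.state_dict())
gamma = 0.99  # discount factor (assumed value)

def dqn_loss(s, a, r, s_next, done):
    """One-step TD loss for a batch of transitions (s, a, r, s', done)."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)        # Q-value of the taken action
    with torch.no_grad():
        best_next = target_net(s_next).max(dim=1).values        # greedy value at next state
        target = r + gamma * best_next * (1.0 - done)           # bootstrap unless terminal
    return nn.functional.mse_loss(q_sa, target)
```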

Reinforcement learning algorithms can be divided into two categories: model-free and model-based. DQN, mentioned above, is a model-free method: it takes raw pixels as input and maps them directly to values or actions. As a drawback, large amounts of training data are required, and the learned policies do not generalize to new tasks in the same environment. A model-based method, in contrast, tries to build a model of the environment. By querying the model, the agent can avoid irreversible, poor decisions, and because the model approximates the environment, it can enable better generalization across states. However, model-based methods have so far only shown success in limited settings, where an exact transition model is given or the domain is simple. In complex environments, they suffer from model errors introduced by function approximation, and no existing model-based method is robust against such imperfections.
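
As a rough illustration of what "querying the model" means, the sketch below scores candidate actions by imagining their one-step outcomes with a learned model before acting; env_model and value_fn are hypothetical placeholders for learned components, not an interface defined in the paper.

```python
def plan_one_step(obs, env_model, value_fn, num_actions, gamma=0.99):
    """Pick the action whose imagined one-step outcome looks best.

    env_model(obs, a) -> (predicted next observation, predicted reward)
    value_fn(obs)     -> estimated value of an observation
    Both are assumed to be learned components (placeholders here).
    """
    def imagined_return(a):
        pred_obs, pred_reward = env_model(obs, a)      # imagined transition, never executed
        return pred_reward + gamma * value_fn(pred_obs)
    return max(range(num_actions), key=imagined_return)
```

The point of such a lookahead is that a catastrophic action (e.g. an irreversible box push in Sokoban) can be rejected in imagination instead of being experienced in the real environment.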

In this paper, the authors introduce a novel deep reinforcement learning architecture called Imagination-Augmented Agents (I2As). As the name suggests, the method enables agents to learn to interpret predictions from a learned environment model and use them to construct implicit plans, combining model-free and model-based aspects. As shown in the results, I2A outperforms standard model-free baselines on the games Sokoban and MiniPacman, and the experiments show that I2A is able to successfully use imperfect models.
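
A simplified sketch of how such an architecture can combine the two paths is given below: one imagined rollout per initial action is encoded with a recurrent encoder, and the rollout codes are concatenated with a model-free feature path before the policy head. The tensor shapes, rollout policy, and environment-model interface are assumptions for illustration and do not reproduce the exact I2A architecture from the paper.

```python
# Simplified I2A-style forward pass (illustrative, not the paper's exact model).
import torch
import torch.nn as nn

class I2ASketch(nn.Module):
    def __init__(self, obs_dim, num_actions, rollout_len=3, hidden=64):
        super().__init__()
        self.num_actions = num_actions
        self.rollout_len = rollout_len
        self.model_free = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())  # model-free path
        self.encoder = nn.LSTM(obs_dim + 1, hidden, batch_first=True)           # rollout encoder
        self.policy = nn.Linear(hidden + num_actions * hidden, num_actions)     # aggregator + head

    def forward(self, obs, env_model, rollout_policy):
        # env_model(o, a) -> (next obs of shape (batch, obs_dim), reward of shape (batch, 1))
        # rollout_policy(o) -> next imagined action; both are assumed learned components.
        codes = []
        for first_action in range(self.num_actions):        # one imagined rollout per initial action
            o, a, steps = obs, first_action, []
            for _ in range(self.rollout_len):
                o, r = env_model(o, a)                       # imagined transition
                steps.append(torch.cat([o, r], dim=-1))      # keep predicted obs + reward
                a = rollout_policy(o)                        # continue rollout with a cheap policy
            seq = torch.stack(steps, dim=1)                  # (batch, rollout_len, obs_dim + 1)
            _, (h, _) = self.encoder(seq)
            codes.append(h[-1])                              # summary code of this rollout
        features = torch.cat([self.model_free(obs)] + codes, dim=-1)
        return torch.softmax(self.policy(features), dim=-1)  # policy over real actions
```

The key design choice this sketch tries to convey is that the agent does not trust the model's rollouts directly; it learns to encode and interpret them, which is why the approach can tolerate imperfect models.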

Motivation

Related Work

Approach

Experiment

Conclusion

Insight

Reference