Deep Exploration via Bootstrapped DQN

== Gist ==


Efficient exploration remains a major challenge for reinforcement learning. Common dithering strategies for exploration, such as $\epsilon$-greedy and Boltzmann exploration, do not carry out temporally-extended (deep) exploration, so an agent can need exponentially more data to learn. Moreover, most algorithms for statistically efficient reinforcement learning are not computationally tractable in complex environments. A list of exploration strategies is available on [https://web.stanford.edu/class/msande338/lec9.pdf this page].
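For concreteness, the two dithering rules mentioned above can be sketched as follows (a minimal Python/NumPy sketch; the function names and default parameters are illustrative, not taken from the paper). Both inject noise independently at every single step, which is exactly why they cannot commit to a temporally extended exploration plan.

<pre>
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon take a uniformly random action,
    # otherwise act greedily on the current Q estimates.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature=1.0):
    # Sample an action with probability proportional to exp(Q / temperature).
    logits = np.asarray(q_values, dtype=float) / temperature
    probs = np.exp(logits - logits.max())  # shift by the max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))
</pre>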
 
Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions. The authors propose '''bootstrapped DQN''' as a first step towards addressing such contexts. They demonstrate that bootstrapped DQN can combine deep exploration with deep neural networks for exponentially faster learning than ''any dithering strategy'', and that in the Arcade Learning Environment it substantially improves learning speed and cumulative performance across most games.
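To make the deep-exploration idea concrete, here is a rough sketch of the data-collection loop, under assumptions not spelled out in this summary: a simplified environment whose step() returns (next_state, reward, done), K value heads represented as opaque Python callables rather than heads of a shared network, and a Bernoulli bootstrap mask per transition (the masking distribution is a design choice). One head is sampled per episode and followed greedily to the end; the masks decide which heads each transition later trains on.

<pre>
import numpy as np

rng = np.random.default_rng(0)

def run_episode(env, q_heads, mask_prob=0.5):
    # q_heads: K callables, each mapping a state to a vector of Q-values
    # (in the paper these are K heads on one shared network; here they are
    # opaque functions so the sketch stays framework-free).
    k = rng.integers(len(q_heads))  # sample ONE head for the whole episode
    state, done, transitions = env.reset(), False, []
    while not done:
        # Act greedily w.r.t. the sampled head for the entire episode:
        # this commitment is what makes exploration "deep" rather than dithering.
        action = int(np.argmax(q_heads[k](state)))
        next_state, reward, done = env.step(action)  # simplified step signature
        # Bootstrap mask: which heads are allowed to train on this transition.
        mask = rng.binomial(1, mask_prob, size=len(q_heads))
        transitions.append((state, action, reward, next_state, mask))
        state = next_state
    return transitions  # added to a replay buffer for masked Q-learning updates
</pre>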
