Deep Exploration via Bootstrapped DQN
Revision as of 11:54, 26 October 2017
Gist
Efficient exploration remains a major challenge for reinforcement learning. Common dithering strategies for exploration, such as $\epsilon$-greedy and Boltzmann exploration, do not carry out deep exploration, so an agent can require exponentially more data to learn. Moreover, most algorithms for statistically efficient reinforcement learning are not computationally tractable in complex environments. A list of available exploration strategies can be found on [https://web.stanford.edu/class/msande338/lec9.pdf this page].
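For reference, the two dithering strategies named above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a plain vector of Q-values for a single state, and the function names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """Dithering: with probability epsilon, take a uniformly random action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature=1.0):
    """Dithering: sample actions with probability proportional to exp(Q/T)."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                          # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=p))
```

Both rules inject independent per-step noise around the current Q-estimate, which is exactly why they cannot perform deep exploration: reaching a distant rewarding state may require a long, coordinated sequence of "non-greedy" actions, and the probability of dithering into such a sequence shrinks exponentially with its length.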
Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions. The authors propose '''bootstrapped DQN''' as a first step towards addressing such contexts. They go on to demonstrate that bootstrapped DQN can combine deep exploration with deep neural networks for exponentially faster learning than ''any dithering strategy''.
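The core mechanism can be sketched in a few lines. This is a simplified stand-in, assuming each head is just a Q-value vector for one state (in the paper, the heads are branches of a single shared deep network), and all names and constants here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 10          # number of bootstrap heads (illustrative value)
n_actions = 4

# Hypothetical stand-in: one row of Q-values per head for a single state.
heads = rng.normal(size=(K, n_actions))

# Deep exploration: sample ONE head at the start of an episode and act
# greedily with it for the whole episode -- no per-step random dithering.
k = int(rng.integers(K))             # head index, fixed for the episode
action = int(np.argmax(heads[k]))    # greedy w.r.t. the sampled head

# Training uses a bootstrap mask per transition: head i learns from a
# transition only if mask[i] == 1 (e.g., Bernoulli masks).
mask = rng.binomial(1, 0.5, size=K)
```

Committing to a single sampled head for an entire episode is what makes the exploration temporally extended: each head is a plausible Q-function drawn from an approximate posterior, so the agent follows one coherent hypothesis about value for many steps rather than perturbing each action independently.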