Learn What Not to Learn
Learning how to act when the action space is large is challenging for reinforcement learning. In the specific case where many actions are irrelevant, it is sometimes easier for the algorithm to learn which actions not to take. The paper proposes a new reinforcement learning approach for dealing with large action spaces by restricting the available actions in each state to a subset of the most likely ones. More specifically, it proposes a system that learns an approximation of the Q-function and concurrently learns to eliminate actions. The method relies on an additional elimination signal that incorporates domain-specific prior knowledge. For example, in parser-based text games, the parser gives feedback about irrelevant actions after the action is played (e.g., Player: "Climb the tree." Parser: "There are no trees to climb."). A machine learning model can then be trained on this signal to generalize to unseen states.
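To make the idea of restricting the action set concrete, here is a minimal sketch of greedy action selection over only those actions that have not been eliminated. It is an illustration under assumptions, not the paper's implementation: the function name `select_action`, the 0.5 threshold, and the fallback to the full action set are all hypothetical choices made for this example.

```python
import numpy as np

def select_action(q_values, elimination_probs, threshold=0.5):
    """Pick the highest-Q action among actions not predicted to be eliminated.

    q_values:          estimated Q(s, a) for every action in the current state
    elimination_probs: predicted probability that each action is invalid
    threshold:         hypothetical cutoff above which an action is dropped
    """
    q_values = np.asarray(q_values, dtype=float)
    elimination_probs = np.asarray(elimination_probs, dtype=float)

    # Keep only actions whose predicted elimination probability is low.
    valid = elimination_probs < threshold

    # Assumed fallback: if every action was eliminated, use the full set.
    if not valid.any():
        valid = np.ones_like(valid, dtype=bool)

    # Eliminated actions get -inf so argmax can never choose them.
    masked_q = np.where(valid, q_values, -np.inf)
    return int(np.argmax(masked_q))

# Action 1 has the highest Q-value but is predicted invalid,
# so the best remaining action (index 2) is chosen instead.
best = select_action([1.0, 5.0, 2.0, 0.5], [0.1, 0.9, 0.2, 0.3])
```

Masking before the argmax means the Q-function only has to rank the actions that survive elimination, which is the practical benefit the summary describes.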
The paper focuses mainly on tasks where both the states and the actions are natural language. It introduces a novel deep reinforcement learning approach that combines a DQN with an Action Elimination Network (AEN), both using a CNN architecture suited to NLP tasks. The AEN is trained to predict invalid actions, supervised by the elimination signal from the environment. Note that the core assumption is that it is easy to predict which actions are invalid or inferior in each state and to leverage that information for control.
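The supervised part of this setup, predicting invalid actions from a binary elimination signal, can be sketched with a much simpler model than the paper's CNN. The example below swaps in plain logistic regression trained by gradient descent. The two hand-made features (whether a tree is present, whether the action is "climb") and all function names are hypothetical and exist only to mirror the parser example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(features):
    """Append a constant 1 column so the model can learn an intercept."""
    X = np.asarray(features, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_elimination_model(features, eliminated, lr=1.0, epochs=2000):
    """Fit logistic regression on (state, action) features.

    eliminated[i] is 1 if the environment signalled that the action
    was invalid in that state, else 0.
    """
    X = add_bias(features)
    y = np.asarray(eliminated, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        # Gradient of the mean binary cross-entropy loss.
        w -= lr * (X.T @ (p - y) / len(y))
    return w

def predict_elimination(w, features):
    """Return the predicted probability that each action is invalid."""
    return sigmoid(add_bias(features) @ w)

# Hypothetical features: [tree_present, action_is_climb].
# "Climb the tree" is eliminated only when there is no tree.
X_train = [[0, 1], [1, 1], [0, 0], [1, 0]]
y_train = [1, 0, 0, 0]

w = train_elimination_model(X_train, y_train)
probs = predict_elimination(w, X_train)
```

In the approach described above, the hand-made features would be replaced by a learned representation of the natural-language state and action, and the predicted probabilities would be used to filter the actions considered by the DQN.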