STAT946F17/ Teaching Machines to Describe Images via Natural Language Feedback

Introduction

In the era of Artificial Intelligence, one should ideally be able to educate a robot about its mistakes, possibly without needing to dig into the underlying software. Reinforcement learning has become a standard way of training artificial agents that interact with an environment. Several works have explored the idea of incorporating humans in the learning process in order to help the reinforcement learning agent learn faster. In most cases, the guidance comes in the form of a simple numerical (or “good”/“bad”) reward. In this work, natural language is used to guide an RL agent. The authors argue that a sentence provides a much stronger learning signal than a numeric reward, since it can easily point to where the mistakes occur and suggest how to correct them.

Here the goal is to allow a non-expert human teacher to give feedback to an RL agent in the form of natural language, just as one would to a learning child. The authors focus on image captioning, a problem in which the quality of the output can easily be judged by non-experts.

Related Works

Several works incorporate human feedback to help an RL agent learn faster.

  1. Thomaz et al. [2006] put humans in the loop to teach an agent to cook in a virtual kitchen. The users watch the agent learn and may intervene at any time to give a scalar reward. Reward shaping (Ng et al. [1999]) is used to incorporate this information into the Markov Decision Process (MDP); a minimal sketch of this idea is given after the list.
  2. Judah et al. [2010] iterate between “practice”, during which the agent interacts with the real environment, and a critique session in which a human labels any subset of the chosen actions as good or bad.
  3. Griffith et al. [2013] propose policy shaping, which incorporates right/wrong feedback by using it as direct policy labels (sketched further below).
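
To make the reward-shaping idea in item 1 concrete, the sketch below combines a potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s) (Ng et al. [1999]) with a scalar reward supplied by a human observer. This is a minimal illustrative sketch, not the cited authors' implementation; the potential function and the human-feedback value are assumed placeholders.

    # Minimal sketch (not the cited implementation) of potential-based reward
    # shaping combined with an optional scalar reward from a human observer.
    GAMMA = 0.99

    def potential(state):
        # Hypothetical potential Phi(s); in practice it would encode domain
        # knowledge such as progress toward completing the task.
        return 0.0

    def shaped_reward(env_reward, state, next_state, human_reward=0.0):
        # F(s, s') = GAMMA * Phi(s') - Phi(s) leaves the optimal policy
        # unchanged; the human's scalar feedback is added as an extra term.
        shaping = GAMMA * potential(next_state) - potential(state)
        return env_reward + shaping + human_reward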

The above approaches mostly assume that humans provide a numeric reward. Only a few attempts have been made to advise an RL agent using language.
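
Before turning to the language-based setting, it is worth sketching how the non-numeric right/wrong feedback of policy shaping (item 3 above) enters the policy. Assuming the human's labels are consistent with the optimal policy with probability C, Griffith et al. [2013] estimate the probability that action a is optimal as C^d_a / (C^d_a + (1 - C)^d_a), where d_a is the number of "right" minus "wrong" labels received for a, and multiply this distribution into the agent's own action distribution. The snippet below is a hedged sketch of that rule, not the original implementation; the variable names and the normalization step are illustrative.

    # Sketch of the policy-shaping combination rule (Griffith et al. [2013]).
    # `delta` holds, per action, the count of "right" minus "wrong" labels;
    # C is the assumed probability that the human's feedback is consistent
    # with the optimal policy.  Names and normalization are illustrative.

    def feedback_distribution(delta, C=0.9):
        # P(a is optimal | feedback) = C^d / (C^d + (1 - C)^d) per action,
        # then normalized over actions.
        probs = [C ** d / (C ** d + (1 - C) ** d) for d in delta]
        total = sum(probs)
        return [p / total for p in probs]

    def shaped_policy(agent_policy, fb_dist):
        # Multiply the agent's action distribution by the feedback
        # distribution and renormalize, so consistent human labels bias
        # action selection toward approved actions.
        mixed = [p * q for p, q in zip(agent_policy, fb_dist)]
        total = sum(mixed)
        return [m / total for m in mixed]

    # Example: three actions, with the second receiving two net "right" labels.
    print(shaped_policy([0.4, 0.3, 0.3], feedback_distribution([0, 2, -1])))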

Methodology

Phrase-based Image Captioning

Crowd-sourcing Human Feedback

Feedback Network

Policy Gradient Optimization using Natural Language Feedback

Experimental Results

Conclusion