Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

= Introduction =

= Motivation =

= Contributions =

* Two attention-based image caption generators under a common framework: a "soft" deterministic attention mechanism and a "hard" stochastic attention mechanism (both are summarized below, after this list).
* Show how to gain insight into, and interpret the results of, this framework by visualizing "where" and "what" the attention focuses on.
* Quantitatively validate the usefulness of attention in caption generation with state-of-the-art performance on three benchmark datasets (Flickr8k, Flickr30k, and MS COCO).
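
Roughly, in the paper's notation (with <math>a_1, \dots, a_L</math> the annotation vectors taken from a convolutional feature map and <math>\alpha_{t,i}</math> the attention weight given to location <math>i</math> at decoding step <math>t</math>), the two mechanisms build the context vector <math>\hat{z}_t</math> as follows; the notation here is a paraphrase rather than a quote from the paper.

* Soft (deterministic): <math>\hat{z}_t = \sum_{i=1}^{L} \alpha_{t,i} a_i</math>, the expected annotation vector; the whole model stays differentiable and can be trained with standard backpropagation.
* Hard (stochastic): <math>s_t \sim \mathrm{Multinoulli}(\{\alpha_{t,i}\})</math> and <math>\hat{z}_t = a_{s_t}</math>, a single sampled location; training maximizes a variational lower bound using REINFORCE-style gradient estimates.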

= Previous Work =

= Model =

The attention framework learns latent alignments from scratch instead of explicitly using object detectors. This allows the model to go beyond "objectness" and learn to attend to abstract concepts.
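
To make this concrete, below is a minimal sketch (not the authors' implementation) of one step of the "soft" deterministic attention: annotation vectors from a convolutional feature map are scored against the decoder's previous hidden state, and the softmax-normalized weights give an expected (weighted-average) context vector. The scoring MLP, the names, and the shapes are illustrative assumptions.

<pre>
import numpy as np

def soft_attention_step(annotations, h_prev, W_a, W_h, w):
    """One step of soft (deterministic) attention -- an illustrative sketch.

    annotations : (L, D) annotation vectors a_i from a conv feature map
    h_prev      : (H,)   decoder hidden state from the previous step
    W_a, W_h, w : parameters of a small scoring MLP (hypothetical shapes)
    Returns (alpha, z): attention weights over the L locations and the
    context vector z = sum_i alpha_i * a_i.
    """
    # Score each of the L locations, conditioned on the previous hidden state.
    scores = np.tanh(annotations @ W_a + h_prev @ W_h) @ w   # (L,)
    # Softmax over locations gives the attention weights alpha.
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    # Soft attention takes the expectation over locations instead of sampling one.
    z = alpha @ annotations                                   # (D,)
    return alpha, z

# Toy usage: a 14x14 conv map with 512 channels gives L = 196 annotation vectors.
rng = np.random.default_rng(0)
L, D, H, K = 196, 512, 256, 128
a = rng.standard_normal((L, D))
h = rng.standard_normal(H)
W_a, W_h, w = rng.standard_normal((D, K)), rng.standard_normal((H, K)), rng.standard_normal(K)
alpha, z = soft_attention_step(a, h, W_a, W_h, w)
print(alpha.shape, z.shape)  # (196,) (512,)
</pre>

The hard variant would instead sample a single location index from <math>\alpha</math> and return that one annotation vector, which is why it requires a sampling-based (REINFORCE-style) training procedure rather than plain backpropagation.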

= Results =

= References =

<references />