Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Introduction
Motivation
Contributions
- Two attention-based image caption generators under a common framework: a "soft" deterministic attention mechanism and a "hard" stochastic attention mechanism.
- Show how to gain insight into, and interpret the results of, this framework by visualizing "where" and "what" the attention focuses on.
- Quantitatively validate the usefulness of attention in caption generation with state-of-the-art performance on three benchmark datasets (Flickr8k, Flickr30k, and MS COCO).
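The two mechanisms differ mainly in how the context vector is formed from the attention weights: soft attention takes the expectation over image locations (differentiable, trainable by standard backpropagation), while hard attention samples a single location (requiring stochastic gradient estimates). A minimal NumPy sketch of that distinction, with hypothetical variable names not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# L annotation vectors of dimension D (e.g., one per conv feature-map location)
L, D = 4, 3
a = rng.standard_normal((L, D))                 # annotation vectors a_i
scores = rng.standard_normal(L)                 # unnormalized attention scores
alpha = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights

# Soft (deterministic): context is the expectation over locations
z_soft = alpha @ a

# Hard (stochastic): sample one location, use its annotation vector
i = rng.choice(L, p=alpha)
z_hard = a[i]

assert np.isclose(alpha.sum(), 1.0)
assert z_soft.shape == (D,) and z_hard.shape == (D,)
```

Because the hard variant involves a discrete sample, its gradients must be estimated (e.g., with REINFORCE-style methods), whereas the soft variant trains end-to-end directly.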
Previous Work
Model
The attention framework learns latent alignments from scratch rather than relying on explicit object detectors. This lets the model go beyond "objectness" and learn to attend to abstract concepts.
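One common way such alignments are learned end-to-end (a hedged illustration of additive attention in general, not necessarily the paper's exact parameterization) is to score each annotation vector against the decoder's hidden state with a small MLP and normalize with a softmax; the weight matrices below are trained jointly with the rest of the network:

```python
import numpy as np

def attention_weights(a, h, W_a, W_h, v):
    """Additive attention: score_i = v^T tanh(W_a a_i + W_h h)."""
    scores = np.tanh(a @ W_a + h @ W_h) @ v    # one score per location, shape (L,)
    e = np.exp(scores - scores.max())          # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
L, D, H, K = 5, 4, 6, 8     # locations, feature dim, hidden dim, attention dim
a = rng.standard_normal((L, D))                 # annotation vectors
h = rng.standard_normal(H)                      # decoder hidden state
alpha = attention_weights(a, h,
                          rng.standard_normal((D, K)),
                          rng.standard_normal((H, K)),
                          rng.standard_normal(K))

assert np.isclose(alpha.sum(), 1.0) and (alpha >= 0).all()
```

Visualizing `alpha` over the spatial grid at each decoding step is exactly what makes "where" the model attends interpretable.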
Results
References
<references />