Hierarchical Question-Image Co-Attention for Visual Question Answering

Revision as of 23:01, 20 November 2017 by S6kalra (talk | contribs) (Introduction)

Introduction

Visual Question Answering (VQA) is a recent problem at the intersection of computer vision and natural language processing that has attracted considerable interest from both fields and from the deep learning community. In VQA, an algorithm must answer a text-based question about an image, giving its answer in natural language, as illustrated in Figure <xr="fig:vqa-overview"/>.

<figure id="fig:vqa-overview">

Figure 1: Illustration of a VQA system. The AI system takes an image and a text-based question about that image as input and outputs the answer to the question in natural language.

</figure>
