Hierarchical Question-Image Co-Attention for Visual Question Answering
[[File:vqa-overview.png|thumb|800px|center|Figure 1: A VQA system takes an image and a text-based question about the image as input and outputs the answer in natural language.]]
Revision as of 00:04, 21 November 2017
= Introduction =
Visual Question Answering (VQA) is a recent problem at the intersection of computer vision and natural language processing that has attracted considerable interest from the deep learning, computer vision, and natural language processing communities. In VQA, an algorithm must answer a text-based question about an image, giving its answer in natural language, as illustrated in Figure 1.