stat441w18/Image Question Answering using CNN with Dynamic Parameter Prediction: Difference between revisions

(modified past and related works section)


== Previous and Related Works ==
As mentioned in the earlier section, one of the major goals in computer vision is to achieve holistic understanding. Although it is a relatively new interest in the computer vision community, Image Question Answering already has a growing number of researchers working on it. There have been many past and recent efforts on this front, for instance this non-exhaustive list of papers published between 2015 and 2016 ([https://arxiv.org/pdf/1505.05612.pdf NIPS 2015 paper], [https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Malinowski_Ask_Your_Neurons_ICCV_2015_paper.pdf ICCV 2015 paper], [https://arxiv.org/pdf/1506.00333.pdf AAAI 2016 paper]). One key commonality in these papers is that most, if not all, of the recognition problems are defined in a simple, controlled environment with a finite set of objectives. While the question-handling strategies differ from paper to paper, a common strategy is to use CNNs to extract features from images before handling the question sentences.
 
In contrast, there have been fewer efforts to solve various recognition problems simultaneously, which is what researchers in Image Question Answering are trying to achieve. As mentioned previously, other than one paper ([https://arxiv.org/pdf/1410.0210.pdf NIPS 2014 paper]) that uses a Bayesian framework, the majority of the papers listed above propose an overall deep learning network structure, which performs very well on public benchmarks but tends to fall apart when question complexity increases. The reason lies fundamentally in the complexity of the English language:
 


== Problem Setup (in mathematical terms) ==

Revision as of 01:54, 15 March 2018

Image Question Answering using CNN with Dynamic Parameter Prediction

Presented by

Rosie Zou, Kye Wei, Glen Chalatov, Ameer Dharamshi

Introduction

Problem Setup (in words)

Previous and Related Works

As mentioned in the earlier section, one of the major goals in computer vision is to achieve holistic understanding. Although it is a relatively new interest in the computer vision community, Image Question Answering already has a growing number of researchers working on it. There have been many past and recent efforts on this front, for instance this non-exhaustive list of papers published between 2015 and 2016 (NIPS 2015 paper, ICCV 2015 paper, AAAI 2016 paper). One key commonality in these papers is that most, if not all, of the recognition problems are defined in a simple, controlled environment with a finite set of objectives. While the question-handling strategies differ from paper to paper, a common strategy is to use CNNs to extract features from images before handling the question sentences.

In contrast, there have been fewer efforts to solve various recognition problems simultaneously, which is what researchers in Image Question Answering are trying to achieve. As mentioned previously, other than one paper (NIPS 2014 paper) that uses a Bayesian framework, the majority of the papers listed above propose an overall deep learning network structure, which performs very well on public benchmarks but tends to fall apart when question complexity increases. The reason lies fundamentally in the complexity of the English language:


Problem Setup (in mathematical terms)

Mathematical Background

CNNs

RNNs and GRUs

Model

VGGNet

Parameter Prediction Network

Hashing

Model Summary

Training and Results

Training

Error Reduction

Pre-trained GRUs

Fine-tuning

Results

Critique