stat441w18/Convolutional Neural Networks for Sentence Classification: Difference between revisions

From statwiki
= Model =


=== Model Settings ===
=== Theory of Convolutional Neural Networks ===


Consider a sentence of length <math> n </math>, represented by <math> \boldsymbol{x}_{1:n} </math>. Let <math> \boldsymbol{x}_i \in \mathbb{R}^k </math> be the <math> i</math>-th word in the sentence and <math> \oplus </math> be the concatenation operator, where <math> \boldsymbol{x}_{1:n} = \boldsymbol{x}_{1} \oplus \boldsymbol{x}_2 \oplus \dots \oplus \boldsymbol{x}_n </math>. In general, let <math> \boldsymbol{x}_{i:i+j} </math> represent the concatenation of words <math> \boldsymbol{x}_{i}, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+j} </math>.
Let <math> \boldsymbol{x}_i \in \mathbb{R}^k </math> be the <math> k </math>-dimensional word vector of the <math> i </math>-th word in a sentence of length <math> n </math>. The sentence is represented as <math> \boldsymbol{x}_{1:n} = \boldsymbol{x}_1 \oplus \boldsymbol{x}_2 \oplus \dots \oplus \boldsymbol{x}_{n} </math>, where <math> \oplus </math> is the concatenation operator.


We also consider a filter <math> w \in \mathbb{R}^{hk} </math>.
A Convolutional Neural Network (CNN) applies a nonlinear function <math> \boldsymbol{f}: \mathbb{R}^{hk} \to \mathbb{R} </math> to each window of <math> h </math> consecutive words <math> \boldsymbol{x}_i, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+h-1} </math>, whose concatenation is denoted <math> \boldsymbol{x}_{i:i+h-1} </math>, producing a series of outputs <math> c_i </math>.


=== Model Regularization ===

Revision as of 17:26, 4 March 2018

Presented by

1. Ben Schwarz

2. Cameron Miller

3. Hamza Mirza

4. Pavle Mihajlovic

5. Terry Shi

6. Yitian Wu

7. Zekai Shao

Introduction

Model

Theory of Convolutional Neural Networks

Let <math> \boldsymbol{x}_i \in \mathbb{R}^k </math> be the <math> k </math>-dimensional word vector of the <math> i </math>-th word in a sentence of length <math> n </math>. The sentence is represented as <math> \boldsymbol{x}_{1:n} = \boldsymbol{x}_1 \oplus \boldsymbol{x}_2 \oplus \dots \oplus \boldsymbol{x}_{n} </math>, where <math> \oplus </math> is the concatenation operator.

A Convolutional Neural Network (CNN) applies a nonlinear function <math> \boldsymbol{f}: \mathbb{R}^{hk} \to \mathbb{R} </math> to each window of <math> h </math> consecutive words <math> \boldsymbol{x}_i, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+h-1} </math>, whose concatenation is denoted <math> \boldsymbol{x}_{i:i+h-1} </math>, producing a series of outputs <math> c_i </math>.
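
The windowed computation described above can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions, not the presenters' implementation: the function name `feature_map`, the bias `b`, the choice of tanh as the nonlinearity, and the toy sizes are all hypothetical and do not appear on this page. Each window of h consecutive word vectors is flattened into a vector in R^{hk}, combined with a filter w of the same length, and passed through the nonlinearity to give one output c_i.

```python
import numpy as np


def feature_map(x, w, b=0.0, f=np.tanh):
    """Apply one filter to every window of h consecutive words.

    x : array of shape (n, k), one k-dimensional word vector per row
    w : array of shape (h * k,), the filter
    b : scalar bias (an assumption; the page does not define one)
    f : elementwise nonlinearity (tanh is an assumed choice)
    """
    n, k = x.shape
    h = w.shape[0] // k  # window size implied by the filter length
    if h < 1 or w.shape[0] != h * k or h > n:
        raise ValueError("filter length must be h*k with 1 <= h <= n")
    # x[i:i+h].reshape(-1) is the concatenation of words i, ..., i+h-1
    windows = np.stack([x[i:i + h].reshape(-1) for i in range(n - h + 1)])
    return f(windows @ w + b)  # one output c_i per window


rng = np.random.default_rng(0)
n, k, h = 7, 4, 3            # toy sizes, chosen only for illustration
x = rng.normal(size=(n, k))  # random stand-in for real word vectors
w = rng.normal(size=h * k)
c = feature_map(x, w)
print(c.shape)  # (5,) because there are n - h + 1 = 5 windows
```

A sentence of length n yields n - h + 1 outputs for a window of size h, so the length of the output series depends on the sentence length.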

Model Regularization

Datasets and Experimental Setup

Hyperparameters and Training

MR:

SST-1:

SST-2:

Subj:

TREC:

CR:

MPQA:

Pre-trained Word Vectors

Model Variations

CNN-rand:

CNN-static:

CNN-non-static:

CNN-multichannel:

Training and Results

Criticisms

More Formulations/New Concepts

Conclusion

Source