Revision as of 21:44, 4 March 2018

Presented by

1. Ben Schwarz

2. Cameron Miller

3. Hamza Mirza

4. Pavle Mihajlovic

5. Terry Shi

6. Yitian Wu

7. Zekai Shao

Introduction

Model

Theory of Convolutional Neural Networks

Let [math]\displaystyle{ \boldsymbol{x}_i \in \mathbb{R}^k }[/math] denote the [math]\displaystyle{ i }[/math]-th word in a sentence of length [math]\displaystyle{ n }[/math], [math]\displaystyle{ i \in \{ 1, \dots, n \} }[/math]. Let [math]\displaystyle{ \boldsymbol{x}_{i:i+j} }[/math] be the concatenation of the words [math]\displaystyle{ \boldsymbol{x}_i, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+j} }[/math], that is, [math]\displaystyle{ \boldsymbol{x}_{i:i+j} = \boldsymbol{x}_i \oplus \boldsymbol{x}_{i+1} \oplus \dots \oplus \boldsymbol{x}_{i+j} }[/math], where [math]\displaystyle{ \oplus }[/math] is the concatenation operation. The whole sentence is then the concatenation of its [math]\displaystyle{ n }[/math] words, [math]\displaystyle{ \boldsymbol{x}_{1:n} = \boldsymbol{x}_1 \oplus \boldsymbol{x}_2 \oplus \dots \oplus \boldsymbol{x}_n }[/math].
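As a small illustration of this notation, the sketch below builds [math]\displaystyle{ \boldsymbol{x}_{i:i+j} }[/math] by flattening consecutive word vectors into one list. The function name `concat_window`, the toy vectors, and the use of plain Python lists are assumptions made for this example only; indices are 1-based to match the text.

```python
def concat_window(words, i, j):
    """Return x_{i:i+j}: the concatenation of words i..i+j (1-based, inclusive)."""
    if i < 1 or j < 0 or i + j > len(words):
        raise ValueError("window falls outside the sentence")
    window = []
    for word in words[i - 1 : i + j]:
        window.extend(word)  # concatenation: append the k entries of each word
    return window


# Toy sentence: n = 3 words, each a k = 2 dimensional vector (values are made up).
sentence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x_1_3 = concat_window(sentence, 1, 2)  # the whole sentence, x_{1:n}
x_2_3 = concat_window(sentence, 2, 1)  # words 2 and 3 only
```

A window of [math]\displaystyle{ j+1 }[/math] words therefore has [math]\displaystyle{ (j+1)k }[/math] entries, which is why a filter over [math]\displaystyle{ h }[/math] words lives in [math]\displaystyle{ \mathbb{R}^{hk} }[/math].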

A Convolutional Neural Network (CNN) applies a nonlinear function [math]\displaystyle{ f: \mathbb{R} \to \mathbb{R} }[/math] to compute a series of outputs [math]\displaystyle{ c_i = f \left( \boldsymbol{w} \cdot \boldsymbol{x}_{i:i+h-1} + b \right) }[/math] from windows of [math]\displaystyle{ h }[/math] words [math]\displaystyle{ \boldsymbol{x}_{i:i+h-1} }[/math] in the sentence, where [math]\displaystyle{ \boldsymbol{w} \in \mathbb{R}^{hk} }[/math] is called a filter, [math]\displaystyle{ b \in \mathbb{R} }[/math] is a bias term, and [math]\displaystyle{ i \in \{ 1, \dots, n-h+1 \} }[/math]. The outputs form a [math]\displaystyle{ (n-h+1) }[/math]-dimensional vector [math]\displaystyle{ \boldsymbol{c} = \left[ c_1, c_2, \dots, c_{n-h+1} \right] }[/math] called a feature map.
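The feature map computation can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function name `feature_map`, the toy numbers, and the choice of `math.tanh` for [math]\displaystyle{ f }[/math] are assumptions (the text above does not fix a particular nonlinearity).

```python
import math


def feature_map(words, w, b, h, f=math.tanh):
    """Compute c = [c_1, ..., c_{n-h+1}] with c_i = f(w . x_{i:i+h-1} + b)."""
    n = len(words)
    k = len(words[0])
    if len(w) != h * k:
        raise ValueError("filter must have h * k entries")
    c = []
    for i in range(n - h + 1):  # 0-based start index of each window
        # Flatten the h words of this window into one vector of length h * k.
        window = [value for word in words[i : i + h] for value in word]
        pre_activation = sum(wi * xi for wi, xi in zip(w, window)) + b
        c.append(f(pre_activation))
    return c


# Toy example: n = 4 words, k = 2, window size h = 2 (all values are made up).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w = [0.5, -0.5, 0.25, 0.25]
c = feature_map(sentence, w, b=0.0, h=2)
```

With [math]\displaystyle{ n = 4 }[/math] and [math]\displaystyle{ h = 2 }[/math] the filter slides over [math]\displaystyle{ n-h+1 = 3 }[/math] windows, so `c` has three entries.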

To capture the most important feature from a feature map, we take its maximum value [math]\displaystyle{ \hat{c} = \max \{ \boldsymbol{c} \} }[/math].
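A minimal sketch of this pooling step follows. The function name `max_over_time` and the example feature maps are hypothetical; the point is only that each feature map, whatever its length, is reduced to a single number per filter.

```python
def max_over_time(c):
    """Return c_hat = max{c}, the largest entry of a feature map."""
    if not c:
        raise ValueError("feature map is empty (sentence shorter than the window?)")
    return max(c)


# Feature maps from sentences of different lengths have different lengths,
# but pooling reduces each one to a single value.
c_short = [0.1, -0.3]
c_long = [0.2, 0.9, -0.5, 0.4]
features = [max_over_time(c_short), max_over_time(c_long)]
```

Because the pooled value does not depend on where in the sentence the maximum occurred, sentences of different lengths yield outputs of the same size.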

Model Regularization

Datasets and Experimental Setup

Hyperparameters and Training

MR:

SST-1:

SST-2:

Subj:

TREC:

CR:

MPQA:

Pre-trained Word Vectors

Model Variations

CNN-rand:

CNN-static:

CNN-non-static:

CNN-multichannel:

Training and Results

Criticisms

More Formulations/New Concepts

@@ Line 21: / Line 21: @@
 === Theory of Convolutional Neural Networks ===
-Let <math> \boldsymbol{x}_{i:i+j} </math> be the concatenation of words <math> \boldsymbol{x}_i, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+j} </math>, <math> \boldsymbol{x}_{i:i+j} = \boldsymbol{x}_i \oplus \boldsymbol{x}_{i+1} \oplus \dots \oplus \boldsymbol{x}_{i+j} </math>, where <math> \oplus </math> is the concatenation operation. Then, a sentence of length <math> n </math> is the concatenation of <math> n </math> words, denoted as <math> \boldsymbol{x}_{1:n} </math>, <math> \boldsymbol{x}_{1:n} = \boldsymbol{x}_1 \oplus \boldsymbol{x}_2 \oplus \dots \oplus \boldsymbol{x}_n </math>. Let <math> \boldsymbol{x}_i \in \mathbb{R}^k </math> denote the <math> i </math>-th word in the sentence, <math> i \in \left{ 1, \dots, n \right} </math>.
+Let <math> \boldsymbol{x}_{i:i+j} </math> be the concatenation of words <math> \boldsymbol{x}_i, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+j} </math> with the concatenation operation <math> \oplus </math>. Then, <math> \boldsymbol{x}_{i:i+j} = \boldsymbol{x}_i \oplus \boldsymbol{x}_{i+1} \oplus \dots \oplus \boldsymbol{x}_{i+j} </math>. Thus, a sentence of length <math> n </math> is the concatenation of <math> n </math> words, denoted as <math> \boldsymbol{x}_{1:n} </math>, <math> \boldsymbol{x}_{1:n} = \boldsymbol{x}_1 \oplus \boldsymbol{x}_2 \oplus \dots \oplus \boldsymbol{x}_n </math>. Let <math> \boldsymbol{x}_i \in \mathbb{R}^k </math> denote the <math> i </math>-th word in the sentence, <math> i \in \{ 1, \dots, n \} </math>.
-A Convolutional Neural Network (CNN) is a nonlinear function <math> f: \mathbb{R}^{hk} \to \mathbb{R} </math> that computes a series of outputs <math> c_i = f \left( \boldsymbol{w} \cdot \boldsymbol{x}_{i:i+h-1} + b \right) </math> from windows of <math> h </math> words <math> \boldsymbol{x}_{i:i+h-1} </math> in the sentence, where <math> \boldsymbol{w} \in \mathbb{R}^{hk} </math> is call a ''filter'' and  <math> i \in \left{ 1, \dots, n-h+1 \right} </math>. The outputs form a <math> (n-h+1) </math>-dimensional vector <math> \boldsymbol{c} = \left[ c_1, c_2, \dots, c_{n-h+1} \right] </math> called a ''feature map''.
+A Convolutional Neural Network (CNN) is a nonlinear function <math> f: \mathbb{R}^{hk} \to \mathbb{R} </math> that computes a series of outputs <math> c_i = f \left( \boldsymbol{w} \cdot \boldsymbol{x}_{i:i+h-1} + b \right) </math> from windows of <math> h </math> words <math> \boldsymbol{x}_{i:i+h-1} </math> in the sentence, where <math> \boldsymbol{w} \in \mathbb{R}^{hk} </math> is call a ''filter'' and  <math> i \in \{ 1, \dots, n-h+1 \} </math>. The outputs form a <math> (n-h+1) </math>-dimensional vector <math> \boldsymbol{c} = \left[ c_1, c_2, \dots, c_{n-h+1} \right] </math> called a ''feature map''.
-To capture the most important feature from a feature map, we take the maximum value <math> \hat{c} = max \left{ \boldsymbol{c} \right} </math>.
+To capture the most important feature from a feature map, we take the maximum value <math> \hat{c} = max \{ \boldsymbol{c} \} </math>.
 === Model Regularization ===

stat441w18/Convolutional Neural Networks for Sentence Classification: Difference between revisions

Contents

Presented by

Introduction

Model

Theory of Convolutional Neural Networks

Model Regularization

Datasets and Experimental Setup

Hyperparameters and Training

Pre-trained Word Vectors

Model Variations

Training and Results

Criticisms

More Formulations/New Concepts

Conclusion

Source
