stat441w18/Convolutional Neural Networks for Sentence Classification
Revision as of 17:26, 4 March 2018
Presented by
1. Ben Schwarz
2. Cameron Miller
3. Hamza Mirza
4. Pavle Mihajlovic
5. Terry Shi
6. Yitian Wu
7. Zekai Shao
Introduction
Model
Theory of Convolutional Neural Networks
Let [math]\displaystyle{ \boldsymbol{x}_i \in \mathbb{R}^k }[/math] be the [math]\displaystyle{ k }[/math]-dimensional vector representing the [math]\displaystyle{ i }[/math]-th word of a sentence of length [math]\displaystyle{ n }[/math]. The sentence is written as [math]\displaystyle{ \boldsymbol{x}_{1:n} = \boldsymbol{x}_1 \oplus \boldsymbol{x}_2 \oplus \dots \oplus \boldsymbol{x}_{n} }[/math], where [math]\displaystyle{ \oplus }[/math] is the concatenation operator.
A Convolutional Neural Network (CNN) applies a nonlinear function [math]\displaystyle{ \boldsymbol{f}: \mathbb{R}^{hk} \to \mathbb{R} }[/math] to each window of [math]\displaystyle{ h }[/math] consecutive words [math]\displaystyle{ \boldsymbol{x}_i, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+h-1} }[/math], represented by their concatenation [math]\displaystyle{ \boldsymbol{x}_{i:i+h-1} }[/math]. Sliding the window over the sentence yields a series of outputs [math]\displaystyle{ c_i }[/math], one for each [math]\displaystyle{ i = 1, \dots, n-h+1 }[/math].
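The sliding-window computation described above can be illustrated with a minimal pure-Python sketch. The text only states that the function is nonlinear and maps a window in R^{hk} to a scalar. The specific form used here (a dot product with a weight vector `w`, plus a bias `b`, passed through `tanh`) is an assumption made for illustration, as are the function names and the toy values.

```python
import math

def concat(vectors):
    """Concatenate a list of k-dimensional word vectors into one flat list."""
    return [value for vec in vectors for value in vec]

def window_features(sentence, w, b, h):
    """Return one output c_i per window of h consecutive words.

    sentence: list of n word vectors, each of length k
    w: assumed filter weights, length h * k (hypothetical parameter)
    b: assumed scalar bias (hypothetical parameter)
    """
    n = len(sentence)
    features = []
    for i in range(n - h + 1):
        # x_{i:i+h-1}: concatenation of h word vectors, a point in R^{hk}
        window = concat(sentence[i:i + h])
        pre_activation = sum(wj * xj for wj, xj in zip(w, window)) + b
        # tanh is an assumed choice of nonlinearity
        features.append(math.tanh(pre_activation))
    return features

# Toy sentence with n = 4 words and k = 2 dimensions per word.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
w = [0.5, -0.5, 0.25, 0.25]  # length h * k = 4
b = 0.0
features = window_features(sentence, w, b, h=2)
```

With n = 4 and h = 2, the sketch produces n - h + 1 = 3 outputs, one per window position.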
Model Regularization
Datasets and Experimental Setup
Hyperparameters and Training
MR:
SST-1:
SST-2:
Subj:
TREC:
CR:
MPQA:
Pre-trained Word Vectors
Model Variations
CNN-rand:
CNN-static:
CNN-non-static:
CNN-multichannel: