stat441w18/Convolutional Neural Networks for Sentence Classification
Presented by
1. Ben Schwarz
2. Cameron Miller
3. Hamza Mirza
4. Pavle Mihajlovic
5. Terry Shi
6. Yitian Wu
7. Zekai Shao
Introduction
Model
Theory of Convolutional Neural Networks
Let [math]\displaystyle{ \boldsymbol{x}_i \in \mathbb{R}^k }[/math] denote the [math]\displaystyle{ k }[/math]-dimensional word vector of the [math]\displaystyle{ i }[/math]-th word in a sentence. A sentence of length [math]\displaystyle{ n }[/math] is then represented as the concatenation of its word vectors, [math]\displaystyle{ \boldsymbol{x}_{1:n} = \boldsymbol{x}_1 \oplus \boldsymbol{x}_2 \oplus \dots \oplus \boldsymbol{x}_n }[/math], where [math]\displaystyle{ \oplus }[/math] is the concatenation operator. More generally, [math]\displaystyle{ \boldsymbol{x}_{i:i+j} }[/math] denotes the concatenation of the word vectors [math]\displaystyle{ \boldsymbol{x}_i, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+j} }[/math].
A Convolutional Neural Network (CNN) applies a nonlinear function [math]\displaystyle{ \boldsymbol{f}: \mathbb{R}^{hk} \to \mathbb{R} }[/math] to each window of [math]\displaystyle{ h }[/math] consecutive words [math]\displaystyle{ \boldsymbol{x}_i, \boldsymbol{x}_{i+1}, \dots, \boldsymbol{x}_{i+h-1} }[/math], represented by [math]\displaystyle{ \boldsymbol{x}_{i:i+h-1} }[/math]. Sliding this window over the sentence produces a series of outputs [math]\displaystyle{ c_i }[/math], one per window.
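The window notation above can be illustrated with a minimal sketch in plain Python. It assumes, for illustration only, that the nonlinear function takes the form of a hyperbolic tangent applied to a weighted sum of the window plus a bias. The names `concat_window`, `feature_map`, `w`, and `b` are hypothetical and do not come from the text above, and the word vectors are random placeholders rather than real embeddings.

```python
import math
import random


def concat_window(sentence, i, h):
    """Concatenate h consecutive word vectors starting at 0-based position i."""
    window = []
    for vec in sentence[i:i + h]:
        window.extend(vec)
    return window  # a flat list of length h * k


def feature_map(sentence, w, b, h):
    """Apply one filter to every window of h words, giving one output per window."""
    n = len(sentence)
    features = []
    for i in range(n - h + 1):
        x = concat_window(sentence, i, h)
        # Assumed form of f: tanh of a weighted sum plus a bias.
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        features.append(math.tanh(z))
    return features


random.seed(0)
n, k, h = 7, 4, 3  # sentence length, word-vector dimension, window size
sentence = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
w = [random.uniform(-1, 1) for _ in range(h * k)]  # hypothetical filter weights
b = 0.1  # hypothetical bias

c = feature_map(sentence, w, b, h)
print(len(c))  # prints 5, i.e. n - h + 1 windows
```

A sentence of [math]\displaystyle{ n }[/math] words yields [math]\displaystyle{ n - h + 1 }[/math] windows, so the filter produces that many outputs [math]\displaystyle{ c_i }[/math]. Note that the code indexes words from 0, whereas the notation above indexes them from 1.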
Model Regularization
Datasets and Experimental Setup
Hyperparameters and Training
MR:
SST-1:
SST-2:
Subj:
TREC:
CR:
MPQA:
Pre-trained Word Vectors
Model Variations
CNN-rand:
CNN-static:
CNN-non-static:
CNN-multichannel: