stat441w18/A New Method of Region Embedding for Text Classification

Revision as of 21:10, 7 March 2018

Method

This paper focuses on representing small text regions in a way that preserves local internal structural information for text classification. It defines [math] region\left ( i,c\right ) [/math] as the region of length [math]2c+1[/math] centred on the middle word [math] \omega_i [/math], the i-th word of the document. Word embeddings and local context units are then combined to produce the region embedding. In the following, we first introduce the local context unit, then two architectures that generate the region embedding, and finally how the text is classified.
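As an illustrative sketch (not code from the paper), the region [math] region\left ( i,c\right ) [/math] is simply the window of [math]2c+1[/math] words centred on the i-th word; the function name and the omission of boundary padding are our simplifying assumptions:

```python
def region(words, i, c):
    """Return region(i, c): the 2c+1 words centred on the i-th word.

    Assumes c <= i < len(words) - c; padding at document
    boundaries is omitted in this sketch.
    """
    return words[i - c : i + c + 1]

words = ["the", "food", "is", "not", "very", "good"]
print(region(words, 3, 2))  # ['food', 'is', 'not', 'very', 'good']
```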

Local context unit

The vocabulary is represented by a matrix [math] \mathbf{E} \in \mathbb{R}^{h \times v} [/math] with a lookup layer; the embedding of word [math] \omega [/math] is denoted by [math] \mathbf{e}_\omega [/math]. The i-th column of [math] \mathbf{E} [/math] represents the embedding of [math] \omega_i [/math], denoted by [math] \mathbf{e}_{\omega_i} [/math].

For each word [math] \omega_i [/math], we define the local context unit [math] \mathbf{K}_{\omega_i} \in \mathbb{R}^{h\times\left (2c+1\right )} [/math]. Let [math] \mathbf{K}_{\omega_i,t} [/math] be the (c+t)-th column of [math] \mathbf{K}_{\omega_i} [/math] for [math] t \in \left [ -c,c\right ] [/math], representing a distinctive linear projection function on [math] \mathbf{e}_{\omega_{i+t}} [/math] in the local context [math] region\left (i,c\right ) [/math]. Thus, we can utilize local ordered word information for each word.

Define [math] \mathbf{p}_{\omega_{i+t}}^i [/math] as the projected word embedding of [math] \omega_{i+t} [/math] in the i-th word's view, computed by [math] \mathbf{p}_{\omega_{i+t}}^i = \mathbf{K}_{\omega_i,t} \odot \mathbf{e}_{\omega_{i+t}} [/math], where [math] \odot [/math] denotes element-wise multiplication.
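A minimal NumPy sketch of this projection follows. The shapes [math]h=4[/math], [math]v=10[/math], [math]c=2[/math] are toy values, and the random tensors stand in for the learned parameters [math] \mathbf{E} [/math] and [math] \mathbf{K} [/math]; the function name is our own:

```python
import numpy as np

h, v, c = 4, 10, 2  # embedding size, vocabulary size, context radius (toy values)
rng = np.random.default_rng(0)

E = rng.standard_normal((h, v))             # embedding matrix: one column per word
K = rng.standard_normal((v, h, 2 * c + 1))  # one h x (2c+1) local context unit per word

def projected_embedding(word_ids, i, t):
    """p^i_{w_{i+t}} = K_{w_i, t} (element-wise *) e_{w_{i+t}}."""
    column = K[word_ids[i], :, c + t]    # (c+t)-th column of K_{w_i}, t in [-c, c]
    return column * E[:, word_ids[i + t]]

word_ids = [3, 1, 4, 1, 5]
p = projected_embedding(word_ids, i=2, t=-1)  # word at position 1, seen from position 2
print(p.shape)  # (4,)
```

The element-wise product lets each word reweight, dimension by dimension, the embeddings of its neighbours, which is how the unit encodes position-dependent influence.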

Note that the local context units and the word embeddings are learned as model parameters. The local context units can be trained to capture the semantic and syntactic influence of each word on its context.