stat441w18/A New Method of Region Embedding for Text Classification

From statwiki
Revision as of 21:04, 7 March 2018 by Y2434liu (talk | contribs) (A New Method of Region Embedding for Text Classification)

Method

This paper focuses on representing small text regions in a way that preserves local internal structural information for a specific text classification task. It defines [math]\displaystyle{ region\left ( i,c\right ) }[/math] as the region of length [math]\displaystyle{ 2\times c+1 }[/math] centered on the middle word [math]\displaystyle{ \omega_i }[/math], the i-th word of the document. Word embeddings and local context units are then combined to produce the region embedding. In the following, we first introduce the local context unit, then two architectures for generating the region embedding, and finally how the text is classified.
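As an illustration of the region definition, the following minimal sketch extracts the 2c+1 tokens centered on position i. The function name `region`, the 0-based indexing, and the `<pad>` token used at document boundaries are assumptions made for this example; the summary does not say how boundaries are handled.

```python
def region(tokens, i, c, pad="<pad>"):
    """Return the 2*c + 1 tokens centered on position i (0-based).

    Positions that fall outside the document are filled with `pad`,
    which is an assumption of this sketch, not something stated in the text.
    """
    return [tokens[j] if 0 <= j < len(tokens) else pad
            for j in range(i - c, i + c + 1)]


tokens = ["the", "food", "is", "not", "very", "good"]
window = region(tokens, i=3, c=2)
print(window)  # ['food', 'is', 'not', 'very', 'good']
```

With c = 2 every region has 5 tokens, so `region(tokens, 0, 2)` starts with two padding tokens.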

Local context unit

Each word in the vocabulary is represented by a column of a matrix [math]\displaystyle{ \mathbf{E} \isin \mathbb{R}^{h \times v} }[/math], accessed through a look-up layer, where h is the embedding size and v is the vocabulary size. The column corresponding to the word [math]\displaystyle{ \omega_i }[/math] is its embedding, denoted by [math]\displaystyle{ \mathbf{e}_{\omega_i} }[/math].
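The look-up can be sketched with NumPy as plain column indexing. The toy sizes, the random initialization, and the names `vocab` and `embed` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
h, v = 4, 6                       # embedding size and vocabulary size (toy values)
E = rng.normal(size=(h, v))       # embedding matrix: one column per vocabulary word

# Hypothetical vocabulary mapping each word to its column index in E.
vocab = {"the": 0, "food": 1, "is": 2, "not": 3, "very": 4, "good": 5}


def embed(word):
    """Look up the embedding of `word`, i.e. the matching column of E."""
    return E[:, vocab[word]]


# The look-up is equivalent to multiplying E by a one-hot vector.
one_hot = np.zeros(v)
one_hot[vocab["not"]] = 1.0
print(np.allclose(E @ one_hot, embed("not")))  # True
```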

For each word [math]\displaystyle{ \omega_i }[/math], we define the local context unit [math]\displaystyle{ \mathbf{K}_{\omega_i}\isin \mathbb{R}^{h\times\left (2c+1\right )} }[/math]. Let [math]\displaystyle{ \mathbf{K}_{\omega_i,t} }[/math] be the (c+t)-th column of [math]\displaystyle{ \mathbf{K}_{\omega_i} }[/math], with [math]\displaystyle{ t\isin\left [ -c,c\right ] }[/math]. This column acts as a distinctive linear projection function on [math]\displaystyle{ \mathbf{e}_{\omega_{i+t}} }[/math], the embedding of the word at relative position t in the local context [math]\displaystyle{ region\left (i,c\right ) }[/math]. In this way, each word can make use of the ordered word information in its local context.

Define [math]\displaystyle{ \mathbf{p}_{\omega_{i+t}}^i }[/math] as the projected word embedding of [math]\displaystyle{ \omega_{i+t} }[/math] from the i-th word's point of view, computed by: [math]\displaystyle{ \mathbf{p}_{\omega_{i+t}}^i = \mathbf{K}_{\omega_i,t} \odot \mathbf{e}_{\omega_{i+t}} }[/math] where [math]\displaystyle{ \odot }[/math] denotes element-wise multiplication.
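The projection can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the local context units of all words are stored in one hypothetical array `K` of shape (v, h, 2c+1), parameters are random rather than learned, and the window is assumed to lie fully inside the document.

```python
import numpy as np

rng = np.random.default_rng(0)
h, v, c = 4, 6, 2                         # toy embedding size, vocabulary size, half-width
E = rng.normal(size=(h, v))               # word embeddings, one column per word
K = rng.normal(size=(v, h, 2 * c + 1))    # K[w] is the local context unit of word w


def projected_embeddings(word_ids, i, c):
    """Return an h x (2c+1) matrix whose column c+t (0-based) is
    K_{w_i,t} * e_{w_{i+t}} (element-wise) for t in [-c, c]."""
    if i - c < 0 or i + c >= len(word_ids):
        raise ValueError("window must lie inside the document in this sketch")
    center = word_ids[i]
    window = word_ids[i - c : i + c + 1]  # ids of w_{i-c}, ..., w_{i+c}
    context = E[:, window]                # embeddings of the context words, h x (2c+1)
    return K[center] * context            # element-wise product, column by column


word_ids = [0, 1, 2, 3, 4, 5]             # a toy document given as vocabulary indices
P = projected_embeddings(word_ids, i=3, c=c)
print(P.shape)  # (4, 5)
```

Because the columns of the middle word's local context unit are aligned with the relative positions in the window, one element-wise product of two h x (2c+1) matrices yields all projected embeddings of the region at once.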

Note that both the local context units and the word embeddings are learned as model parameters. The local context units can thus learn to capture the semantic and syntactic influence of each word on its context.