stat441w18/A New Method of Region Embedding for Text Classification

From statwiki
Revision as of 22:10, 7 March 2018 by Y2434liu (talk | contribs)

Method

This paper focuses on representing small text regions in a way that preserves their local internal structure for a specific text classification task. It defines [math]\displaystyle{ \text{region}\left(i,c\right) }[/math] as the region of length [math]\displaystyle{ 2c+1 }[/math] centered on the word [math]\displaystyle{ \omega_i }[/math], the i-th word of the document. Word embeddings and local context units are then combined to produce the region embedding. In the following, we first introduce the local context unit, then two architectures for generating the region embedding, and finally how the text is classified.
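As a rough illustration of the region definition, the sketch below extracts the [math]\displaystyle{ 2c+1 }[/math] words centered on position i. The function name `region`, the `pad` token, and the padding behaviour at document boundaries are assumptions made for this example; the text above does not say how boundaries are handled.

```python
def region(words, i, c, pad="<pad>"):
    """Return the 2*c + 1 words centred on words[i].

    Positions that fall outside the document are filled with `pad`
    (an assumption of this sketch, not something the summary specifies).
    """
    return [
        words[j] if 0 <= j < len(words) else pad
        for j in range(i - c, i + c + 1)
    ]


doc = "the food is not very good".split()
print(region(doc, 3, 2))  # ['food', 'is', 'not', 'very', 'good']
print(region(doc, 0, 1))  # ['<pad>', 'the', 'food']
```

With c = 2 the region around "not" (position 3, counting from 0) covers five words, matching the [math]\displaystyle{ 2c+1 }[/math] length.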

Local context unit

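The vocabulary is represented by an embedding matrix [math]\displaystyle{ \mathbf{E}\isin \mathbb{R}^{h \times v} }[/math], where [math]\displaystyle{ h }[/math] is the embedding size and [math]\displaystyle{ v }[/math] is the vocabulary size. The embedding [math]\displaystyle{ \mathbf{e}_\omega }[/math] of a word [math]\displaystyle{ \omega }[/math] is read from this matrix by a lookup layer: the column of [math]\displaystyle{ \mathbf{E} }[/math] indexed by [math]\displaystyle{ \omega_i }[/math] is the embedding of [math]\displaystyle{ \omega_i }[/math], denoted by [math]\displaystyle{ \mathbf{e}_{\omega_i} }[/math].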

For each word [math]\displaystyle{ \omega_i }[/math], we define the local context unit [math]\displaystyle{ \mathbf{K}_{\omega_i}\isin \mathbb{R}^{h\times\left(2c+1\right)} }[/math]. Let [math]\displaystyle{ \mathbf{K}_{\omega_i,t} }[/math] be the (c+t)-th column of [math]\displaystyle{ \mathbf{K}_{\omega_i} }[/math], with [math]\displaystyle{ t\isin\left[-c,c\right] }[/math] and columns counted from 0. It represents a distinct linear projection applied to [math]\displaystyle{ \mathbf{e}_{\omega_{i+t}} }[/math], the embedding of the word at relative position t in the local context [math]\displaystyle{ \text{region}\left(i,c\right) }[/math]. In this way, each word has its own position-specific view of the ordered words around it.

Define [math]\displaystyle{ \mathbf{p}_{\omega_{i+t}}^i }[/math] as the projected word embedding of [math]\displaystyle{ \omega_{i+t} }[/math] from the i-th word's point of view, computed by [math]\displaystyle{ \mathbf{p}_{\omega_{i+t}}^i = \mathbf{K}_{\omega_i,t} \odot \mathbf{e}_{\omega_{i+t}} }[/math], where [math]\displaystyle{ \odot }[/math] denotes element-wise multiplication.
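The projection defined above can be sketched in plain Python as follows. This is a minimal illustration on toy numbers, not the authors' implementation. The function name `project_region`, the nested-list layout (`E[w]` holds the embedding of word id `w`, `K[w][c + t]` holds the column of that word's context unit for offset t), and the assumption that the whole region lies inside the document are all choices made for this example.

```python
def project_region(E, K, word_ids, i, c):
    """Return the projected embeddings for offsets t = -c..c around position i.

    E[w]        : embedding of word id w, a list of h floats.
    K[w][c + t] : column of word w's local context unit for offset t, h floats.
    Assumes the whole region lies inside the document (no padding).
    """
    centre = word_ids[i]
    projected = []
    for t in range(-c, c + 1):
        unit_col = K[centre][c + t]   # column of the centre word's unit for offset t
        emb = E[word_ids[i + t]]      # embedding of the word at position i + t
        projected.append([k * e for k, e in zip(unit_col, emb)])  # element-wise product
    return projected


# Toy setup: h = 2, v = 3, c = 1 (all values are made up for illustration).
E = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
K = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
K[1] = [[0.5, 0.5], [1.0, 1.0], [2.0, 0.0]]
word_ids = [0, 1, 2]

print(project_region(E, K, word_ids, i=1, c=1))
# [[0.5, 1.0], [3.0, 4.0], [10.0, 0.0]]
```

Because the context unit is looked up by the centre word, the same neighbouring word is projected differently depending on which word it surrounds and at which offset.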

Note that both the local context units and the word embeddings are learned as model parameters. The local context units can therefore learn to capture the semantic and syntactic influence of each word on its context.