stat441w18/A New Method of Region Embedding for Text Classification
Method
This paper focuses on representing small text regions which can preserve local internal structural information for specific text classification. It defines [math]\displaystyle{ region\left ( i,c\right ) }[/math] as the [math]\displaystyle{ 2\times c+1 }[/math] length region with middle word [math]\displaystyle{ \omega_i }[/math] which is the i-th word of the document. And then it uses word embeddings and the local context units to produce region embedding. In the following, we first introduce local context unit, then two architectures to generate the region embedding, and how to classify text.
Local context unit
The vocabulary is represented by a matrix [math]\displaystyle{ \mathbf{E}\in \mathbb{R}^{h \times v} }[/math] with a look up layer, denoted by the embedding [math]\displaystyle{ e_\omega }[/math]. The i-th column represents the embedding of [math]\displaystyle{ \omega_i }[/math], denoted by [math]\displaystyle{ \mathbf{e}_\omega_i }[/math].
For each word [math]\displaystyle{ \omega_i }[/math], we define the local context unit [math]\displaystyle{ \mathbf{K}_\omega_i\in \mathbb{R}^{h\times\left (2c+1\right )} }[/math]. Let [math]\displaystyle{ \mathbf{K}_\{omega_i,t} }[/math] be the (c+t)-th column in [math]\displaystyle{ \mathbf{K}_\omega_i \left (t\in\left [ -c,c\right ] }[/math], representing a distinctive linear projection function on [math]\displaystyle{ \mathbf(e)_{c+t} }[/math] in the local context [math]\displaystyle{ r\left (i,c\right ) }[/math]. Thus, we can utilize local ordered word information in terms of each word.
Define [math]\displaystyle{ \mathbf{p}_{\omega_i+t}^i }[/math] as the projected word embedding of [math]\displaystyle{ \omega_i+t }[/math] in i-th word’s view, computed by: [math]\displaystyle{ \mathbf{p}_{\omega_i+t}^i = \mathbf{K}_\{omega_i,t} \odot \mathbf{\omega_{i+t}} }[/math] where [math]\displaystyle{ \odot }[/math] denotes an element-wise multiplication.
Note local context units and embedding are learned as model parameters. Local context units can be learned to capture the semantic and syntactic influence of each word to its context.
Word-context region embedding
We proposed two architectures to perform the region embedding from different perspectives. The first one is word-context region embedding, and the second one is context-word region embedding. In this paper, we consider middle words of the regions, hence we can compose the semantics of a give region only by the middle words influences on the context words, or the context words’ influences on the middle word. The most important thing from these two different perspectives in this model is the interaction between words.
In the first proposed architecture, we focus on addressing the middle word's influences on the context words. For example, in the sentence The food is not very good in this hotel, the occurrence of word not might bring a semantic reversal to the local region.