stat441F18/TCNLM

From statwiki
Revision as of 13:12, 5 November 2018 by B257zhan

The Topic Compositional Neural Language Model (TCNLM) simultaneously captures the global semantic meaning and the local word-ordering structure of a document. A TCNLM combines the core components of a neural topic model (NTM) and a Mixture-of-Experts (MoE) language model: the latent topics, learned within a variational autoencoder framework, are passed together with their topic-usage probabilities to the MoE model for further training. (Insert figure here)

TCNLMs are well suited to topic classification and to generating sentences on a given topic. Combining the latent topics, weighted by their topic-usage probabilities, yields an effective prediction of the sentences. TCNLMs were also developed to address the inability of RNN-based neural language models to capture broad document context. After the global semantics have been learned, the probability of each learned latent topic is used to learn the local structure of a word sequence.

Presented by

  • Yan Yu Chen
  • Qisi Deng
  • Hengxin Li
  • Bochao Zhang

Topic Model

A topic model is a probabilistic model that unveils the hidden semantic structures of a document. Topic modelling follows the philosophy that particular words will appear more frequently than others in certain topics.

LDA

A common example of a topic model is latent Dirichlet allocation (LDA), which assumes that each document contains various topics in different proportions. LDA parameterizes the topic distribution with a Dirichlet distribution and calculates the marginal likelihood as (insert formula).
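For reference, the marginal likelihood of a document under standard LDA is usually written as below. The notation is the conventional one and is an assumption here, not taken from this page: $\theta$ denotes the topic proportions, $\alpha$ the Dirichlet parameter, $\beta$ the topic-word distributions, and $w_n$, $z_n$ the $n$-th of $N$ words and its topic assignment.

```latex
p(\mathbf{w} \mid \alpha, \beta)
  = \int p(\theta \mid \alpha)
    \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta
```

The integral over $\theta$ has no closed form, which is one reason approximate inference is needed.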

Neural Topic Model

The neural topic model takes a bag-of-words representation of a document as input and predicts the document's topic distribution, in order to identify the global semantic meaning of the document.

The variables are defined as follows:

  • $d$ is a document with a vocabulary of $D$ distinct words
  • $\boldsymbol{d}$ is the bag-of-words representation of document $d$ (each element of $\boldsymbol{d}$ is the number of times the corresponding word appears in $d$)
  • $t$ is the topic proportion for document $d$
  • $T$ is the number of topics
  • $z_n$ is the topic assignment for word $w_n$
  • $\beta = \{\beta_1, \dots, \beta_T\}$ is the transition matrix from the topic distribution, trained in the decoder, where $\beta_i \in \mathbb{R}^D$ is the $i$-th topic's distribution over the words
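As a small illustration of the bag-of-words representation, the sketch below counts how often each vocabulary word occurs in a document. The vocabulary, the document, and the helper name `bag_of_words` are made up for illustration.

```python
from collections import Counter

def bag_of_words(tokens, vocabulary):
    """Return the count vector of `tokens` over a fixed, ordered `vocabulary`."""
    counts = Counter(tokens)
    # One entry per vocabulary word; words outside the vocabulary are ignored.
    return [counts[word] for word in vocabulary]

# Hypothetical toy vocabulary (D = 4) and document.
vocabulary = ["topic", "model", "neural", "language"]
document = "neural topic model and neural language model".split()
bow = bag_of_words(document, vocabulary)
print(bow)
```

Word order is discarded in this representation, which is why the topic model captures only global semantics and leaves word ordering to the language model.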

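Similar to LDA, the neural topic model parameterizes the multinomial document-topic distribution. However, instead of a Dirichlet prior, it draws a Gaussian random vector and passes it through a softmax function. The generative process is the following:

$$\theta \sim \mathcal{N}(\mu_0, \sigma_0^2), \qquad t = g(\theta) = \mathrm{softmax}(\hat{W}\theta + \hat{b}), \qquad z_n \sim \mathrm{Discrete}(t), \qquad w_n \sim \mathrm{Discrete}(\beta_{z_n})$$

where $\hat{W}$ and $\hat{b}$ are trainable parameters, and $\mu_0$ and $\sigma_0^2$ are the mean and variance of the Gaussian prior.

The marginal likelihood for document $d$ is then calculated as the following:

$$p(\boldsymbol{d} \mid \mu_0, \sigma_0, \beta) = \int_t p(t \mid \mu_0, \sigma_0^2) \prod_n \sum_{z_n} p(w_n \mid \beta_{z_n})\, p(z_n \mid t)\, dt$$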
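To make the generative story concrete, here is a minimal sampling sketch: a Gaussian vector is drawn, passed through a softmax to obtain topic proportions, and each word is then drawn from the word distribution of a sampled topic. The names `W`, `b`, `mu0`, `sigma0` and all toy values are assumptions for illustration, not parameters from the presented model.

```python
import math
import random

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def sample_document(beta, W, b, mu0, sigma0, num_words, rng):
    """Sample topic proportions and word indices from the generative process."""
    latent_dim = len(W[0])
    # theta ~ N(mu0, sigma0^2), drawn independently in each dimension.
    theta = [rng.gauss(mu0, sigma0) for _ in range(latent_dim)]
    # t = softmax(W theta + b): topic proportions over the T topics.
    logits = [sum(w * x for w, x in zip(row, theta)) + bias
              for row, bias in zip(W, b)]
    t = softmax(logits)
    words = []
    for _ in range(num_words):
        # z_n ~ Discrete(t), then w_n ~ Discrete(beta[z_n]).
        z = rng.choices(range(len(t)), weights=t)[0]
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]
        words.append(w)
    return t, words

rng = random.Random(0)
# Hypothetical toy setting: T = 2 topics, D = 3 vocabulary words, 2 latent dimensions.
beta = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
t, words = sample_document(beta, W, b, mu0=0.0, sigma0=1.0, num_words=5, rng=rng)
print(t, words)
```

In training the process runs in the other direction: an encoder infers the Gaussian vector from the bag-of-words input, and the decoder reconstructs the word counts.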

Re-Parameterization Trick

In order to build an unbiased and low-variance gradient estimator for the variational distribution, TCNLM uses the re-parameterization trick. The parameter updates, which are derived from the variational lower bound, will be discussed in the model inference section.
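The trick can be sketched as follows: rather than sampling the latent vector directly from a Gaussian with learned mean and variance, one samples standard normal noise and applies a deterministic shift and scale, so gradients can flow through the mean and variance. Parameterizing the variance by its logarithm, and the names `mu` and `log_sigma2`, are assumptions made for this illustration.

```python
import math
import random

def reparameterize(mu, log_sigma2, rng):
    """Draw theta = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    theta = []
    for m, ls2 in zip(mu, log_sigma2):
        sigma = math.exp(0.5 * ls2)
        eps = rng.gauss(0.0, 1.0)  # the only source of randomness
        theta.append(m + sigma * eps)
    return theta

rng = random.Random(0)
mu = [0.5, -1.0]           # hypothetical encoder output (mean)
log_sigma2 = [0.0, -2.0]   # hypothetical encoder output (log-variance)
samples = [reparameterize(mu, log_sigma2, rng) for _ in range(20000)]
mean_0 = sum(s[0] for s in samples) / len(samples)
print(round(mean_0, 2))
```

Because `theta` is a differentiable function of `mu` and `log_sigma2` once `eps` is fixed, an automatic-differentiation framework can back-propagate through the sampling step.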

Diversity Regularizer

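One of the problems that many topic models encounter is redundancy in the inferred topics. The TCNLM therefore uses a diversity regularizer to reduce it. The idea is to regularize the row-wise distance between each pair of topics. First, we measure the distance between a pair of topics with $a(\beta_i, \beta_j) = \arccos\left(\frac{|\beta_i \cdot \beta_j|}{\|\beta_i\|_2 \|\beta_j\|_2}\right)$. Then, the mean angle over all pairs of the $T$ topics is $\phi = \frac{1}{T^2}\sum_i \sum_j a(\beta_i, \beta_j)$, and the variance is $\nu = \frac{1}{T^2}\sum_i \sum_j \left(a(\beta_i, \beta_j) - \phi\right)^2$. Finally, we define the topic diversity regularizer as $R = \phi - \nu$, which will be used in the model inference.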
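A minimal sketch of such a regularizer is given below. It assumes the distance between two topics is the angle given by the arccosine of their absolute cosine similarity, and that the regularizer is the mean angle minus the variance of the angles over all ordered pairs (including a topic paired with itself). The function names and the toy topic matrices are hypothetical.

```python
import math

def angle(u, v):
    """Angle arccos(|u . v| / (||u|| ||v||)) between two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly above 1.
    cosine = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.acos(cosine)

def diversity_regularizer(beta):
    """Mean pairwise angle minus its variance, over all T*T ordered pairs."""
    angles = [angle(bi, bj) for bi in beta for bj in beta]
    mean = sum(angles) / len(angles)
    variance = sum((a - mean) ** 2 for a in angles) / len(angles)
    return mean - variance

redundant = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]   # identical topics
diverse = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]     # non-overlapping topics
print(diversity_regularizer(redundant), diversity_regularizer(diverse))
```

A larger value indicates topics that point in more different directions, so a term of this kind would be maximized alongside the main objective.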

Language Model

A typical language model aims to define the conditional probability of each word $y_m$ given all the preceding inputs $y_1, \dots, y_{m-1}$, connected through the hidden state $h_m$.
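The sketch below illustrates this factorization with a tiny tanh RNN: the log-probability of a sequence is the sum of the log conditional probabilities of each word, where each conditional depends on the preceding words only through the hidden state. The matrices `E`, `U`, `V`, the initial state `h0`, and the three-word vocabulary are hypothetical toy values, not part of the presented model.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def sequence_log_prob(tokens, E, U, V, start_state):
    """Sum of log p(y_m | y_1..y_{m-1}) under a tiny tanh RNN.

    E: embeddings (vocab x hidden), U: recurrent weights (hidden x hidden),
    V: output weights (vocab x hidden).
    """
    h = start_state
    log_prob = 0.0
    for token in tokens:
        # p(y_m | h_m): the hidden state summarizes all preceding words.
        probs = softmax(matvec(V, h))
        log_prob += math.log(probs[token])
        # Update the hidden state with the word just observed.
        pre = [u + e for u, e in zip(matvec(U, h), E[token])]
        h = [math.tanh(p) for p in pre]
    return log_prob

# Hypothetical toy parameters: vocabulary of 3 words, hidden size 2.
E = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.1]]
U = [[0.5, 0.0], [0.0, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
h0 = [0.0, 0.0]
log_p = sequence_log_prob([0, 2, 1], E, U, V, h0)
print(log_p)
```

Since every conditional is a proper distribution over the vocabulary, the probabilities of all sequences of a fixed length sum to one.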

RNN (LSTM)

Recurrent Neural Networks (RNNs) capture the temporal relationship among input information and output a sequence of input-dependent data. Compared to traditional feedforward neural networks, RNNs maintain internal memory by looping over previous information inside each network. Because of this design, RNNs struggle to learn long-term dependencies: gradients vanish during back-propagation, which prevents states distant in time from contributing to the output of the current state. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are variations of RNNs that were designed to address the vanishing gradient issue.
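The vanishing gradient effect can be seen in a one-dimensional toy example. The sketch below assumes a scalar recurrence $h_t = \tanh(w\, h_{t-1})$ with no input, which is a simplification chosen for illustration rather than the architecture used in TCNLM, and tracks how the derivative of the final state with respect to the initial state shrinks with the number of steps.

```python
import math

def gradient_through_time(weight, steps, h0=0.5):
    """|d h_T / d h_0| for the scalar RNN h_t = tanh(weight * h_{t-1})."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(weight * h)
        # Chain rule: d tanh(w h)/d h = w * (1 - tanh(w h)^2).
        grad *= weight * (1.0 - h * h)
    return abs(grad)

for steps in (1, 10, 50):
    print(steps, gradient_through_time(weight=0.9, steps=steps))
```

With a recurrent weight below one, each step multiplies the gradient by a factor smaller than one, so the contribution of distant states decays geometrically. Gated units such as LSTM and GRU add paths along which this product does not have to shrink.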

Neural Language Model

Model Evaluation

Extensions