stat441F18/TCNLM: Difference between revisions
Line 23: | Line 23: | ||
The variables are defined as the following: | The variables are defined as the following: | ||
d be document with D distinct vocabulary | *d be document with D distinct vocabulary | ||
d be the bag-of-words representation of document d (each element of d is the count of the number of times the corresponding word appears in d), | *<math>\bold{d} \in \mathbb{Z}_+^D</math> be the bag-of-words representation of document d (each element of d is the count of the number of times the corresponding word appears in d), | ||
t be the topic proportion for document d | t be the topic proportion for document d | ||
T be the number of topics | T be the number of topics |
Revision as of 12:15, 5 November 2018
Topic Compositional Neural Language Model (TCNLM) simultaneously captures both the global semantic meaning and the local word-ordering structure in a document. A common TCNLM incorporates fundamental components of both a neural topic model (NTM) and a Mixture-of-Experts (MoE) language model. The latent topics learned within a variational autoencoder framework, coupled with the probability of topic usage, are further trained in a MoE model. (Insert figure here)
TCNLM networks are well-suited for topic classification and sentence generation on a given topic. The combination of latent topics, weighted by the topic-usage probabilities, yields an effective prediction for the sentences. TCNLMs were also developed to address the incapability of RNN-based neural language models in capturing broad document context. After learning the global semantic, the probability of each learned latent topic is used to learn the local structure of a word sequence.
Presented by
- Yan Yu Chen
- Qisi Deng
- Hengxin Li
- Bochao Zhang
Topic Model
A topic model is a probabilistic model that unveils the hidden semantic structures of a document. Topic modelling follows the philosophy that particular words will appear more frequently than others in certain topics.
LDA
A common example of a topic model would be latent Dirichlet allocation (LDA), which assumes each document contains various topics but with different proportion. LDA parameterizes topic distribution by the Dirichlet distribution and calculates the marginal likelihood as (insert formula).
Neural Topic Model
The neural topic model takes in a bag-of-words representation of a document to predict the topic distribution of the document in wish to identify the global semantic meaning of documents.
The variables are defined as the following:
- d be document with D distinct vocabulary
- [math]\displaystyle{ \bold{d} \in \mathbb{Z}_+^D }[/math] be the bag-of-words representation of document d (each element of d is the count of the number of times the corresponding word appears in d),
t be the topic proportion for document d T be the number of topics zn be the topic assignment for word wn beta be the transition matrix from the topic distribution trained in the decoder where is the topic distribution over the i-th word in the corresponding d.
Similar to LDA, the neural topic model parameterized the multinational document topic distribution. However, it uses a Gaussian random vector by passing it through a softmax function. The generative process in the following:
Where are trainable parameters.
The marginal likelihood for document d is then calculated as the following:
Re-Parameritization Trick
In order to build an unbiased and low-variance gradient estimator for the variational distribution, TCNLM uses the re-parameterization trick. The update for the parameters is derived from variational lower bound will be discussed in the section model inference.
Diversity Regularizer
One of the problems that many topic models encounter is the redundancy in the inferred topics. Therefore, The TCNLM uses a diversity regularizer to reduce it. The idea is to regularize the row-wise distance between each paired topics. First, we measure the distance between pair of topics with . Then, mean angle of all pairs of T topics is , and variance is . Finally, we identify the topic diversity regularization as which will be used in the model inference.
Language Model
A typical Language Model aims to define the conditional probability of each word \y_{m} given all the preceding input \y_{1},...,\y_{m-1}, connected through the hidden state hm.
RNN (LSTM)
Recurrent Neural Networks (RNNs) capture the temporal relationship among input information and output a sequence of input-dependent data. Comparing to traditional feedforward neural networks, RNNs maintains internal memory by looping over previous information inside each network. For its distinctive design, RNNs have shortcomings when learning from long-term memory as a result of the zero gradients in back-propagation, which prohibits states distant in time from contributing to the output of current state. Long short-term Memory (LSTM) or Gated Recurrent Unit (GRU) are variations of RNNs that were designed to address the vanishing gradient issue.
Neural Language Model
Model Comparison and Evaluation
Model Comparison
In this paper, TCNLM incorporates fundamental components of both an NTM and a MoE Language Model. Compared with other topic models and language models: 1. The recent work of Miao et al. (2017) employs variational inference to train topic models. In contrast, TCNLM enforces the neural network not only modeling documents as bag-of-words but also transferring the inferred topic knowledge to a language model for word-sequence generation. 2. Dieng et al. (2016) and Ahn et al. (2016) use a hybrid model combining the predicted word distribution given by both a topic model and a standard RNNLM. Distinct from this approach, TCNLM learns the topic model and the language model jointly under the VAE framework, allowing an efficient end-to-end training process. Further, the topic information is used as guidance for a MoE model design. Under TCNLM's factorization method, the model can yield boosted performance efficiently. 3. TCNLM is similar to Gan et al. (2016). However, Gan et al. (2016) use a two-step pipline, first learning a multi-label classifier on a group of pre-defined image tags, and then generating image captions conditioned on them. In comparison, TCNLM jointly learns a topic model and a language model and focuses on the language modeling task.