Hierarchical Dirichlet Processes
It is common practice for frequentists to tune the latent dimension K to get the best performance out of a model. One weakness of this practice is that it assumes the corpus is fixed, which generally makes inference on new, unseen data points difficult. In that case, we either retrain the model on the whole corpus, including the unseen data points, or fall back on some algebraic or heuristic fold-in technique. If we can instead come up with a prior on the latent dimension and a likelihood for the data points, we can learn the latent dimension K on the fly from the corpus within the Bayesian framework. This is an important property for online data stream mining. In this paper, the authors focus mainly on the clustering problem using infinite linear mixture models. In the Bayesian approach, we can put a prior on random partitions, which is the main idea behind Dirichlet processes. When clustering discrete data such as text documents, we usually assume that some discrete information is shared among the documents, which a Dirichlet process on its own usually cannot model. The authors therefore propose Hierarchical Dirichlet Processes to address this issue.
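To make the idea of a prior on random partitions more concrete, here is a minimal sketch of the Chinese restaurant process, a standard way to describe the partition that a Dirichlet process induces. It is only an illustration of how the number of clusters can grow with the data instead of being fixed in advance; it is not the inference procedure from the paper. The function name `sample_crp_partition` and the concentration parameter name `alpha` are assumptions chosen for this example.

```python
import random


def sample_crp_partition(n_points, alpha, seed=None):
    """Draw a random partition of n_points items from a Chinese restaurant process.

    alpha is the (assumed) concentration parameter; larger values tend to
    produce more clusters. Returns (assignments, cluster_sizes).
    """
    rng = random.Random(seed)
    assignments = []    # assignments[i] = cluster index of item i
    cluster_sizes = []  # cluster_sizes[k] = number of items in cluster k

    for _ in range(n_points):
        # With i items already seated, an existing cluster k is chosen with
        # probability size_k / (i + alpha), and a new cluster is opened with
        # probability alpha / (i + alpha). rng.choices normalizes the weights.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1    # join an existing cluster
        assignments.append(k)

    return assignments, cluster_sizes


if __name__ == "__main__":
    # The number of clusters is not set beforehand; it emerges from the draw.
    for n in (10, 100, 1000):
        _, sizes = sample_crp_partition(n, alpha=1.0, seed=0)
        print(n, "points ->", len(sizes), "clusters")
```

Running the script prints how many clusters appear for each corpus size. Each document here would get its own independent partition, which hints at why a single Dirichlet process per document does not by itself share clusters across documents.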