# Hierarchical Dirichlet Processes

It is common practice for frequentists to tune the latent dimension K via cross-validation in order to get the best performance out of a model. One weakness of this practice is that it assumes the corpus is fixed, which makes inference on new, unseen data points difficult in general. In that case, we must either re-train the model on the whole corpus, including the unseen data points, or use some algebraic or heuristic fold-in technique to do inference. If we can instead come up with a prior on the latent dimension and a likelihood for the data points, then within the Bayesian framework we can let the data decide the latent dimension K on the fly.

In this paper, the authors mainly focus on the clustering problem using infinite mixture models. In the Bayesian approach, we can put a prior on random partitions of the data, which is the idea behind Dirichlet processes. When it comes to clustering discrete data such as text documents, we usually assume that some discrete information is shared among the documents, which a Dirichlet process on its own generally cannot model. Therefore, the authors propose Hierarchical Dirichlet Processes to address this issue.
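To make the idea of a prior on random partitions concrete, here is a minimal sketch of the Chinese restaurant process, the partition distribution induced by a Dirichlet process. It is not code from the paper: the function name `sample_crp_partition` and the parameters `n_points`, `alpha`, and `seed` are hypothetical names chosen for illustration. Each new point joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter `alpha`, so the number of clusters is not fixed in advance.

```python
import random


def sample_crp_partition(n_points, alpha, seed=None):
    """Draw a random partition of n_points items from a Chinese restaurant process.

    Illustrative sketch only; names and parameters are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    assignments = []    # cluster index assigned to each point, in arrival order
    cluster_sizes = []  # number of points currently in each cluster

    for _ in range(n_points):
        # Existing cluster k has weight equal to its size;
        # the extra last entry (weight alpha) stands for opening a new cluster.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1    # join an existing cluster
        assignments.append(k)

    return assignments, cluster_sizes


if __name__ == "__main__":
    # The number of clusters is an outcome of the draw, not an input.
    for n in (10, 100, 1000):
        _, sizes = sample_crp_partition(n, alpha=1.0, seed=0)
        print(f"n={n}: {len(sizes)} clusters")
```

Running the script prints the number of clusters obtained for a few sample sizes. Under this prior the expected number of clusters grows slowly (roughly logarithmically) with the number of points, which is one sense in which the data, rather than a cross-validated K, determines the latent dimension.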