Hierarchical Dirichlet Processes

If we put a prior on the random partition of the data and a likelihood distribution on the data points, we can use the Bayesian framework to learn the latent dimension; this is the main idea of the Dirichlet process mixture model. When it comes to hierarchical clustering problems, we usually assume that some information is shared between groups. One natural proposal is to model each group i with its own Dirichlet process mixture model DP(G_0(i)), where all the base measures G_0(i) come from a common parametric form G_0(). However, if G_0() is continuous, this proposal generally cannot model shared information between groups, because two groups then share atoms with probability zero. One idea is to make G_0() discrete by limiting its choice. The main idea of this paper is instead to take an arbitrary base measure H and let G_0() = G_0, where G_0 is itself drawn from another Dirichlet process DP(H); G_0 is then discrete with probability one, a basic property of the Dirichlet process, so the groups can share atoms.
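To make the shared-atoms idea concrete, here is a minimal sketch (not the paper's algorithm) of the two-level construction G_0 ~ DP(gamma, H) and G_j ~ DP(alpha_0, G_0), approximated by truncated stick-breaking with a standard normal base measure H. The truncation level, concentration values, and function names are illustrative assumptions. Because G_0 is discrete, the group-level draws reuse its atoms, so different groups end up sharing mixture components.

<pre>
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, truncation):
    """Truncated stick-breaking weights for a Dirichlet process."""
    betas = rng.beta(1.0, alpha, size=truncation)
    weights = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return weights / weights.sum()

# Global measure G0 ~ DP(gamma, H), with H = N(0, 1) as the continuous base measure.
gamma, alpha0, T = 5.0, 1.0, 50
g0_weights = stick_breaking(gamma, T)
g0_atoms = rng.normal(0.0, 1.0, size=T)          # atoms drawn from H

def group_measure(alpha, truncation):
    """Group-level draw G_j ~ DP(alpha, G0): each stick picks an atom of G0.

    Because G0 is discrete, different groups pick from the same finite set
    of atoms, which is how components get shared across groups."""
    weights = stick_breaking(alpha, truncation)
    atoms = g0_atoms[rng.choice(T, size=truncation, p=g0_weights)]
    return atoms, weights

atoms1, _ = group_measure(alpha0, T)
atoms2, _ = group_measure(alpha0, T)
print("atoms shared by the two groups:", len(np.intersect1d(atoms1, atoms2)))
</pre>

If H were used directly as the base measure of each group (the continuous case mentioned above), the two groups would share no atoms at all; drawing G_0 from DP(gamma, H) first is what makes the overlap non-trivial.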

==1. Introduction==
It is common practice, particularly for frequentists, to tune the latent dimension K in order to get the best performance from a model. One weakness of this practice is that the corpus is treated as static and unchanged, which makes it difficult to do inference on new, unseen data points: we must either re-train the model on the whole corpus including those unseen points, or use some algebraic/heuristic fold-in technique. If instead we specify a prior on the latent dimension together with a likelihood distribution on the data points, we can learn the latent dimension K on the fly from the corpus within the Bayesian framework, which is an important property for online data-stream mining. The Bayesian way to do this is to put a prior on the random partition of the data, which is the main idea of the Dirichlet process.

When it comes to clustering discrete data such as text documents, we usually also assume that some discrete information is shared among the documents, which a single Dirichlet process cannot model. The authors therefore propose the Hierarchical Dirichlet Process to address this issue. In this paper, the authors mainly focus on the clustering problem using an infinite mixture model.
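
As a toy illustration of learning K on the fly, the following sketch samples a random partition from the Chinese restaurant process, the predictive form of the Dirichlet process prior on partitions. It only shows how the number of clusters grows with the data under the prior, not the posterior inference used in the paper, and the concentration value alpha = 2.0 is an arbitrary assumption.

<pre>
import numpy as np

rng = np.random.default_rng(1)

def crp_partition(n, alpha):
    """Sample a partition of n points from the Chinese restaurant process.

    Point i joins an existing cluster with probability proportional to its
    size, or opens a new cluster with probability proportional to alpha.
    The number of clusters K is not fixed in advance; it grows with the data."""
    assignments = []
    counts = []                      # current cluster sizes
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):         # open a new cluster
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, len(counts)

for n in (10, 100, 1000):
    _, k = crp_partition(n, alpha=2.0)
    print(f"n = {n:4d}  ->  K = {k} clusters")
</pre>
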
==2. Dirichlet process==

==3. Hierarchical Dirichlet process==

==4. Inference==
