the Indian Buffet Process: An Introduction and Review

From statwiki
Jump to navigation Jump to search

The Indian Buffet Process (IBP) is one of Bayesian nonparametric models, which is a prior measure on an infinite binary matrix. Unlike the Dirichlet process(DP), where each atom has negative correlation, IBP assumes each atom is independent.


Introduction

IBP is often used in factor analysis as a prior of infinite factors.

IBP can be viewed as an extension of DP, where we drop the constraint [math]\displaystyle{ \sum_{i=1}^{\infty}{\pi_i}=1 }[/math].

Because we drop the constraint, we cannot naturally use IBP as a prior of mixture models.

Representations

Like DP, IBP has several representations.

the limiting of finite distribution on sparse binary feature matrices

We have N data points and K features and the possession of feature k by data point i is indicated by a binary variable [math]\displaystyle{ z_{ik} }[/math]

The generative process of the binary feature matrix is defined as below:

  • for each feature k
    • for each data point i
      • [math]\displaystyle{ \pi_k }[/math] ~ [math]\displaystyle{ Beta(\frac{\alpha}{K},1) }[/math]
      • [math]\displaystyle{ z_{ik} }[/math] ~ [math]\displaystyle{ Bernoulli(\pi_k) }[/math]

where [math]\displaystyle{ \alpha }[/math] is a hyper-parameter, which is similar to the parameter defined in DP.

When K goes into infinite, such generative process will become IBP.

stick breaking construction

  • For each feature k
    • [math]\displaystyle{ \mu_k }[/math] ~ [math]\displaystyle{ Beta(\alpha,1) }[/math]
    • [math]\displaystyle{ \pi_{k}=\prod_{l=1}^{k}(\mu_l) }[/math]
    • For each data point i
      • [math]\displaystyle{ z_{ik} }[/math] ~ [math]\displaystyle{ Bernoulli(\pi_k) }[/math]

the Indian buffet metaphor

N customers enter a restaurant one after another. Each customer encounters a buffet consisting of infinitely many dishes arranged in a line. The first customer starts at the left of the buffet and takes a serving from each dishes, stopping after a [math]\displaystyle{ Poisson(\alpha) }[/math] number of dishes as his plate becomes overburdened.

The ith customer moves along the buffet,sampling dishes in proportion to their popularity, serving himself with probability [math]\displaystyle{ \frac{m_k}{i} }[/math], where m_k is the number of previous customers who have sampled a dish. Having reached the end of all previous sampled dishes, the ith customer then tries a [math]\displaystyle{ Poisson(\frac{\alpha}{i}) }[/math] number of new dishes.

Properties of the distribution

  • The effective dimension of the distribution, which is the number of columns with at least one non-zero component, follow a [math]\displaystyle{ Poisson(\alpha H_N) }[/math], where HN is the Nth harmonic number, i.e. [math]\displaystyle{ H_N=\sum_{j=1}^N \frac{1}{j} }[/math].
  • The number of feature possessed by each object follow a [math]\displaystyle{ Poisson(\alpha) }[/math] distribution. This follows from the exchangeable property of IBP.
  • The binary matrix generated from IBP remains sparse as [math]\displaystyle{ K\rightarrow \infty }[/math]. Actually, [math]\displaystyle{ lim_{K\rightarrow \infty}E[1^T Z1]=N\alpha }[/math].

Inference

Like DP, we can use MCMC sampling framework based on sticking breaking construction and the Indian buffet metaphor to stimulate random samples from IBP.

We can use Gibbs sampling to generate sample from IBP.

Choosing an ordering on data points such as the ith data point corresponds to the last customer to visit the buffet, we obtain: [math]\displaystyle{ p(z_{ik}=1|z_{-ik})=\frac{m_{-i,k}}{N} }[/math] for any feature k such that [math]\displaystyle{ m_{-ik}\gt 0 }[/math] Similarly the number of new features associated with data point i should be drawn from a [math]\displaystyle{ Poisson(\frac{\alpha}{N}) }[/math] distribution. where [math]\displaystyle{ z_{-ik} }[/math] denotes the set of assignments of other data points, not including data point i for feature k and [math]\displaystyle{ m_{-i,k} }[/math] is the number of data points possessing feature k, not including i.

A binary matrix generated by the Indian buffet process with [math]\displaystyle{ \alpha = 10 }[/math].

A binary matrix generated by the Indian buffet process with [math]\displaystyle{ \alpha = 10 }[/math].

Application

Due to the fact of IBP that each atom in IBP is independent, IBP can be used as a building block to construct a hierarchical mixture model, with the help of additional normalized measure such as DP.

In the Indian Buffet Process compound Dirichlet process model (IBPCDP), the authors proposed to use IBP to select independent weight and then to use DP to normalize these weights. The model is closed related to hierarchical Dirichlet process(HDP), where the top level and down level are both drawn from DP.

Conclusion

IBP is often used as a sparse prior to model things that can be represented in a matrix, such as factor analysis.

IBP can be extended to a Beta process, which can capture the power-law phenomenon.

Unlike other finite model, IBP can dynamically fit data when input data are never-ending.

Discussion

Based on the Bayesian non-parametric framework, IBP can be used as a new building block of latent feature modeling.

However, whether Bayesian non-parametric model is superior to large parametric model is an open question.

Although IBP can not naturally use in probabilistic mixture model, IBP can be introduced into probabilistic mixture model to combine different models(features) with the help of extra normalized measure such as DP.

References

Williamson, Sinead, et al. "The IBP compound Dirichlet process and its application to focused topic modeling." (2010).

Griffiths, Thomas L., and Zoubin Ghahramani. "The Indian Buffet Process: An Introduction and Review." Journal of Machine Learning Research 12 (2011): 1185-1224.