Adaptive Dimension Reduction for Clustering High Dimensional Data


1. Introduction

Clustering methods such as K-means and EM suffer from the local minima problem. In high dimensional space, the cost function surface is very rugged, and the search easily gets trapped close to the initial configuration. The conventional remedy is to try a number of initial values and pick the best of the resulting solutions; a minimal sketch of this restart strategy is given below.
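For concreteness, here is a minimal sketch of the conventional restart strategy (plain Lloyd-style K-means on toy data, not the paper's ADR algorithm; all function names and data below are illustrative):

<pre>
import numpy as np

def kmeans_once(X, K, rng, n_iter=100):
    # One run of Lloyd's algorithm from a random initialization.
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Recompute each center as the mean of its assigned points.
        new_centers = np.array([X[labels == k].mean(0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and within-cluster sum of squares (the K-means cost).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(1)
    return centers, labels, d2.min(1).sum()

def kmeans_restarts(X, K, n_restarts=10, seed=0):
    # Run K-means several times and keep the lowest-cost solution.
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, K, rng) for _ in range(n_restarts)),
               key=lambda result: result[2])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three well-separated Gaussian blobs in 2-D as toy data.
    X = np.vstack([rng.normal(m, 0.3, size=(50, 2))
                   for m in ([0, 0], [3, 0], [0, 3])])
    centers, labels, cost = kmeans_restarts(X, K=3)
    print("best cost:", round(cost, 2))
</pre>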

The paper proposes an approach to tackle this problem that utilizes the idea of dimension reduction. In the paper, (i) dimension reduction is treated as a dynamic process that is adaptively adjusted and integrated with the clustering process, and (ii) cluster membership is used effectively to connect the reduced-dimensional subspace and the full-dimensional space.

The paper focuses on the K-means and EM algorithms using a mixture model of spherical Gaussian components, which retains identical model parameters in the reduced low-dimensional subspace as in the high dimensional space.

2. Effective Dimension for Clustering

The idea is to perform clustering in low dimensional subspaces. A low dimensional subspace is interpreted as containing the relevant attributes (linear combinations of the original coordinates). Based on linear discriminant analysis, the dimensionality r of the subspace should satisfy [math]\displaystyle{ r \leq K-1 }[/math]. Thus the effective clustering dimensions for the K spherical Gaussians are spanned by the K centers [math]\displaystyle{ \mu_1, ...,\mu_K }[/math], for which r = K-1. We call the r-dimensional subspace passing through all K centers the relevant subspace. The effective dimensionality of the relevant subspace could be less than K-1; this happens when the K cluster centers themselves lie in a subspace with dimension [math]\displaystyle{ r\lt K-1 }[/math]. A quick numerical check of this dimension bound is sketched below.
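As a quick numerical check (an illustration added for this summary, not taken from the paper): the K centers, once centered at their mean, span a subspace of dimension at most K-1, no matter how large the ambient dimension d is.

<pre>
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 50                       # 4 clusters in a 50-dimensional space
mu = rng.normal(size=(K, d))       # random cluster centers

# Center the K means and check the rank of the subspace they span.
centered = mu - mu.mean(axis=0)
r = np.linalg.matrix_rank(centered)
print(r)                           # at most K-1 = 3 (equals 3 for generic centers)
</pre>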

3. EM in the Relevant Subspace

The idea is that the irrelevant dimensions can be integrated out, and the resulting marginal distribution follows the same Gaussian mixture functional form. One can therefore move freely between the subspace and the original space: knowing the centers in the subspace, together with the cluster memberships, we can directly infer the centers in the original space. Assume the following mixture model:

[math]\displaystyle{ p(x)=\pi_1g_1^d(x)+...+\pi_Kg_K^d(x) }[/math]

where each component is a spherical Gaussian distribution,

[math]\displaystyle{ g_k^d(x)=\frac{1}{(\sqrt{2\pi}\sigma_k)^d} \exp\left(-\frac{||x-\mu_k||^2}{2\sigma_k^2}\right) }[/math] where x and [math]\displaystyle{ \mu_k }[/math] are vectors in d-dimensional space; this component can be denoted [math]\displaystyle{ N^{(d)}(\mu_k,\sigma_k) }[/math]. Spherical Gaussian functions have two invariance properties: (i) they remain invariant under any orthogonal coordinate rotation operation [math]\displaystyle{ R: x\leftarrow Rx }[/math], that is, [math]\displaystyle{ g_k^d(Rx|R\theta)=g_k^d(x|\theta) }[/math]; and (ii) they remain invariant under any coordinate translation (shift) operation [math]\displaystyle{ L }[/math]: [math]\displaystyle{ g_k^d(Lx|L\theta)=g_k^d(x|\theta) }[/math]. These properties are what allow the mixture to retain the same parameters after projection onto the relevant subspace; a small numerical illustration follows.
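The rotation invariance and the marginalization argument can be checked numerically. This is a small sketch under the definitions above (the function name spherical_gauss and the toy numbers are chosen here for illustration only):

<pre>
import numpy as np
from scipy.stats import ortho_group
from scipy.integrate import quad

def spherical_gauss(x, mu, sigma):
    # Density of a spherical Gaussian N(mu, sigma^2 I) in len(x) dimensions.
    x, mu = np.atleast_1d(x), np.atleast_1d(mu)
    d = len(x)
    return float(np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2))
                 / (np.sqrt(2 * np.pi) * sigma) ** d)

sigma = 0.7
mu = np.array([1.0, -0.5, 2.0])     # a center in d = 3 dimensions
x  = np.array([0.8,  0.1, 1.5])

# (i) Rotation invariance: g(Rx | R*mu) equals g(x | mu).
R = ortho_group.rvs(3, random_state=0)
print(np.isclose(spherical_gauss(R @ x, R @ mu, sigma),
                 spherical_gauss(x, mu, sigma)))                      # True

# (ii) Integrating out the last (irrelevant) coordinate leaves a spherical
#      Gaussian with the same sigma on the first two coordinates.
marginal, _ = quad(lambda t: spherical_gauss(np.r_[x[:2], t], mu, sigma),
                   -np.inf, np.inf)
print(np.isclose(marginal, spherical_gauss(x[:2], mu[:2], sigma)))    # True
</pre>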

4. Adaptive Dimension Reduction for EM

The centers (or centroids in K-means) obtained by clustering in the r-dimensional subspace can be uniquely traced back to the original d-dimensional space by using the cluster membership of each data point. This observation is the basis of ADR-EM clustering.

The cluster membership information is contained in the posterior probability [math]\displaystyle{ h_i^k }[/math],

[math]\displaystyle{ h_i^k = Pr(c_i =k | y_i, \theta) }[/math]

This measures the probability that point [math]\displaystyle{ i }[/math] belongs to cluster [math]\displaystyle{ c_k }[/math] given the current model (parameters) and the evidence (the value of [math]\displaystyle{ y_i }[/math]).

The ADR-EM iteration in the relevant subspace proceeds as follows:

1. Initialize the model parameters [math]\displaystyle{ \{ \pi_k, \upsilon_k, \sigma_k\} }[/math].
2. Compute the posteriors [math]\displaystyle{ \{h_i^k \} }[/math]: [math]\displaystyle{ h_i^k=\frac{\pi_k g_k^r(y_i)}{\sum_k \pi_k g_k^r(y_i)} }[/math]
3. Update:
* compute the effective number of points belonging to cluster [math]\displaystyle{ c_k }[/math]: [math]\displaystyle{ n_k=\sum_i h_i^k }[/math]
* update priors: [math]\displaystyle{ \pi_k=n_k/N }[/math]
* update centers: [math]\displaystyle{ \upsilon_k=\sum_i h_i^k y_i/n_k }[/math]
* update covariances: [math]\displaystyle{ \sigma_k^2=\sum_i h_i^k||y_i-\upsilon_k||^2/(r n_k) }[/math]
4. Repeat steps 2 and 3 until convergence.
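A minimal sketch of these updates in code (an illustration only, assuming the data have already been projected onto the current r-dimensional relevant subspace as an N x r matrix Y; the adaptive re-estimation of the subspace and the back-projection of centers to the full d-dimensional space are not shown):

<pre>
import numpy as np

def em_spherical(Y, K, n_iter=100, seed=0):
    # EM for a mixture of K spherical Gaussians on N x r data Y.
    N, r = Y.shape
    rng = np.random.default_rng(seed)
    # Step 1: initialize priors, centers (upsilon_k), and variances (sigma_k^2).
    pi = np.full(K, 1.0 / K)
    centers = Y[rng.choice(N, K, replace=False)]
    var = np.full(K, Y.var())

    for _ in range(n_iter):
        # Step 2: E-step, posterior h_i^k proportional to pi_k * g_k^r(y_i).
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # ||y_i - upsilon_k||^2
        log_g = -0.5 * d2 / var - 0.5 * r * np.log(2 * np.pi * var)
        log_h = np.log(pi) + log_g
        log_h -= log_h.max(1, keepdims=True)        # for numerical stability
        h = np.exp(log_h)
        h /= h.sum(1, keepdims=True)

        # Step 3: M-step updates.
        n_k = h.sum(0)                              # n_k = sum_i h_i^k
        pi = n_k / N                                # priors
        centers = (h.T @ Y) / n_k[:, None]          # centers upsilon_k
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        var = (h * d2).sum(0) / (r * n_k)           # variances sigma_k^2

    return pi, centers, var, h
</pre>

The name em_spherical and its arguments are chosen for this sketch; in the full ADR loop this step would alternate with re-estimating the relevant subspace from the current centers and tracing the centers back to the d-dimensional space via the memberships [math]\displaystyle{ h_i^k }[/math].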

5. Adaptive Dimension Reduction for K-means