adaptive dimension reduction for clustering high dimensional data

From statwiki
Revision as of 20:41, 22 July 2013 by Hcheng1118 (talk | contribs) (2. Effective Dimension for Clustering)
Jump to: navigation, search

1. Introduction

Clustering methods such as the K-means and EM suffer from local minima problems. In high dimensional space, the cost function surface is very rugged and it is easy to get trapped somewhere close to the initial configurations. The conventional method is to try a number of initial values, and pick up the best one of the results.

The Paper proposes a approach to tackle this problem. The approach utilizes the idea of dimension reduction. In the paper, (i) they approach dimension reduction as a dynamic process that is adaptively adjusted and integrated with clustering process (ii) make effective use of cluster membership to connect reduced dimensional space and full dimension space.

The paper focuses on K-means and EM algorithms using mixture model of spherical Gaussian components, which retain identical model parameters in reduced low-dimensional subspace as in high dimensional space.

2. Effective Dimension for Clustering

Low dimensional subspace is interpreted as containing relevant attributes( linear combinations of coordinates). The dimensionality r of subspace should satisfy: [math]Insert y \less K-1[/math]

3. EM in relevant subspace

4. Adaptive Dimension Reduction for EM

5. Adaptive Dimension Reduction for K-means