Learning Spectral Clustering, With Application To Speech Separation

=Introduction=
The paper <ref>Francis R. Bach and Michael I. Jordan, Learning Spectral Clustering, With Application To Speech Separation, Journal of Machine Learning Research 7 (2006) 1963-2001.</ref> presented here studies spectral clustering, which makes use of dimensionality reduction, and proposes a method for learning a similarity matrix that generalizes to unseen datasets when spectral clustering is applied to them.
 
=Clustering=
Clustering refers to partitioning a given dataset into clusters such that data points in the same cluster are similar and data points in different clusters are dissimilar. Similarity is usually measured by the distance between data points.


Formally stated, given a set of data points <math>X=\{{{\mathbf x}}_1,{{\mathbf x}}_2,\dots ,{{\mathbf x}}_P\}</math>, we would like to find <math>K</math> disjoint clusters <math>{C=\{C_k\}}_{k\in \{1,\dots ,K\}}</math> such that <math>\bigcup_k{C_k}=X</math>, optimizing a certain objective function. The dimensionality of the data points is <math>D</math>, so <math>X</math> can be represented as a matrix <math>{{\mathbf X}}_{D\times P}</math>. The similarity matrix that measures the similarity between each pair of points is denoted by <math>{{\mathbf W}}_{P\times P}</math>. A classical similarity matrix for clustering is the diagonally-scaled Gaussian similarity, defined as
<br><math>\mathbf W(i,j)= \exp (-(\mathbf{x}_i-\mathbf{x}_j)^{\rm T}{\rm Diag}(\boldsymbol{\alpha})(\mathbf{x}_i-\mathbf{x}_j) ) </math>
<br>where <math>\boldsymbol{\alpha}\in {{\mathbb R}}^D</math> is a vector of positive parameters, and <math>{\rm Diag}(\boldsymbol{\alpha})</math> denotes the <math>D\times D</math> diagonal matrix with diagonal <math>\boldsymbol{\alpha}</math>.
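As a concrete illustration (not from the paper), the following NumPy sketch computes this similarity matrix from a <math>D\times P</math> data matrix; the function name <code>gaussian_similarity</code> and the toy data are assumptions made for illustration.
<pre>
import numpy as np

def gaussian_similarity(X, alpha):
    # X: D x P data matrix; alpha: length-D vector of positive scales.
    # W(i, j) = exp(-(x_i - x_j)^T Diag(alpha) (x_i - x_j))
    Xw = X * np.sqrt(alpha)[:, None]           # scale dimension d by sqrt(alpha_d)
    sq_norms = np.sum(Xw ** 2, axis=0)         # squared norms of the scaled points
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Xw.T @ Xw
    return np.exp(-np.maximum(sq_dists, 0.0))  # clip tiny negatives from rounding

# Toy usage: P = 5 points in D = 3 dimensions, unit scaling
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
W = gaussian_similarity(X, np.ones(3))
print(W.shape)  # (5, 5); W is symmetric with ones on the diagonal
</pre>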
 
=Objective functions=
==Objective function for K-means clustering==
Given the number of clusters <math>K</math>, K-means aims to minimize an objective function (the sum of within-cluster distances) over all clustering schemes <math>C</math>:
<br> <math>\mathop{\min_C} J=\sum^K_{k=1}\sum_{\mathbf x \in C_k}\|\mathbf x - \boldsymbol{\mu}_k\|^2</math>
<br>where <math>\boldsymbol{\mu}_k</math> is the mean (centroid) of the points in <math>C_k</math>.
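This objective is typically minimized heuristically by alternating an assignment step and a centroid-update step (Lloyd's algorithm). Below is a minimal NumPy sketch of that procedure; the function name <code>kmeans</code>, the initialization, and the fixed iteration count are illustrative assumptions, not details from the paper.
<pre>
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    # X: D x P data matrix. Alternate the two steps of Lloyd's algorithm:
    # assign each point to its nearest centroid, then move each centroid
    # to the mean of its assigned points.
    rng = np.random.default_rng(seed)
    P = X.shape[1]
    centroids = X[:, rng.choice(P, size=K, replace=False)]  # init from the data
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid
        dists = np.linalg.norm(X[:, :, None] - centroids[:, None, :], axis=0)
        labels = np.argmin(dists, axis=1)
        # Update step: recompute each non-empty cluster's mean
        for k in range(K):
            if np.any(labels == k):
                centroids[:, k] = X[:, labels == k].mean(axis=1)
    J = np.sum((X - centroids[:, labels]) ** 2)  # within-cluster objective
    return labels, centroids, J
</pre>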
