User:Dgroot/Oct.1 lecture

''Temporary page to avoid edit conflicts''

==Oct. 1 lecture==
===Maximum variance unfolding===
In maximum variance unfolding (MVU), as in isomap and LLE, we operate under the assumption that the data is sampled from a low-dimensional [http://en.wikipedia.org/wiki/Manifold manifold] embedded within a higher-dimensional Euclidean space. The aim is to "unfold" this manifold in a lower-dimensional Euclidean space such that the local structure around each point is preserved with respect to its <math>k</math> nearest neighbours, while the pairwise variance of all non-neighbouring points is maximized. The MVU algorithm uses semidefinite programming to find a kernel that satisfies this aim.


====Optimization constraints====
 
For a mapping <math>\Phi: x \mapsto \phi(x)</math> and <math>i, j</math> such that <math>x_i, x_j</math> are neighbours, we require that <math>\Phi</math> preserve the pairwise distance, i.e.
:<math>\| x_i - x_j \|^2 = \| \phi(x_i) - \phi(x_j) \|^2</math>
:<math>= [\phi(x_i) - \phi(x_j)]^T [\phi(x_i) - \phi(x_j)]</math>
:<math>= \phi(x_i)^T \phi(x_i) - \phi(x_i)^T \phi(x_j) - \phi(x_j)^T \phi(x_i) + \phi(x_j)^T \phi(x_j)</math>
:<math>= k_{ii} - k_{ij} - k_{ji} + k_{jj}</math>
:<math>= k_{ii} - 2k_{ij} + k_{jj}</math>
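This identity can be checked numerically for the linear kernel <math>k_{ij} = x_i^T x_j</math> (a minimal sketch; the points below are arbitrary):

```python
import numpy as np

# Three arbitrary points in R^2
X = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 0.0]])

# Linear kernel (Gram) matrix: k_ij = x_i . x_j
K = X @ X.T

i, j = 0, 1
lhs = np.sum((X[i] - X[j]) ** 2)           # ||x_i - x_j||^2
rhs = K[i, i] - 2 * K[i, j] + K[j, j]      # k_ii - 2 k_ij + k_jj
print(lhs, rhs)  # both equal 8.0
```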


where <math>K = \Phi^T \Phi</math> is the <math>n \times n</math> kernel matrix, with entries <math>k_{ij} = \phi(x_i)^T \phi(x_j)</math>. We can summarize this constraint as
 
:<math>n_{ij} \| x_i - x_j \|^2 = n_{ij} \| \phi(x_i) - \phi(x_j) \|^2</math>
 
where <math>n_{ij}</math> is the neighbourhood indicator, that is, <math>n_{ij} = 1</math> if <math>x_i, x_j</math> are neighbours and <math>0</math> otherwise.
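The indicator <math>n_{ij}</math> can be built with a <math>k</math>-nearest-neighbour search. A minimal numpy sketch (the symmetrization step, treating neighbourhood as an undirected relation, is one common convention and is assumed here):

```python
import numpy as np

def neighbourhood_indicator(X, k):
    """n_ij = 1 if x_j is among the k nearest neighbours of x_i
    (or vice versa), 0 otherwise."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    N = np.zeros((n, n), dtype=int)
    for i in range(n):
        # Indices of the k nearest points to x_i (excluding x_i itself)
        nearest = np.argsort(D[i])[1:k + 1]
        N[i, nearest] = 1
    # Symmetrize: keep the edge if either point chose the other
    return np.maximum(N, N.T)

# Toy 1-D example: three clustered points and one far away
X = np.array([[0.0], [1.0], [2.0], [10.0]])
N = neighbourhood_indicator(X, k=1)
```

With <math>k = 1</math> this links the chain of close points and connects the outlier only to its single nearest neighbour.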
 
We also have a constraint that the mapped data is centred, i.e. <math>\sum\limits_i \phi(x_i) = 0</math>, which is equivalent to <math>\sum\limits_{i,j} k_{ij} = 0</math>.
 
Furthermore, <math>K</math> must be positive semi-definite, i.e. <math>K \succeq 0</math>, since it is a Gram matrix.
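For the linear kernel of mean-centred data, both conditions hold automatically; a quick numpy check on arbitrary random data (assumed here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
X = X - X.mean(axis=0)         # centre the data so sum_i phi(x_i) = 0
K = X @ X.T                    # Gram matrix K = Phi^T Phi

total = K.sum()                # centring constraint: sum_ij k_ij = 0
eigvals = np.linalg.eigvalsh(K)
print(abs(total) < 1e-9, eigvals.min() >= -1e-9)  # True True
```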
 
====Optimization problem====
The ideal optimization problem would be to minimize the rank of the matrix <math>K</math>, giving the lowest-dimensional faithful embedding, but unfortunately rank minimization is not a convex problem, so we cannot solve it directly. Instead, note that for centred data the total variance of the mapped points is <math>\sum\limits_i \| \phi(x_i) \|^2 = \text{tr}(K)</math>; since we would like to maximize the variance of non-neighbouring points, we can use the objective function
 
:<math>\max \, \text{tr}(K)</math>
 
and thus, together with the constraints above, the problem can be solved using semidefinite programming.
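Once the SDP returns an optimal <math>K</math>, the low-dimensional embedding is read off from its top eigenvectors, as in kernel PCA. A sketch of that final recovery step (using an ordinary centred Gram matrix as a stand-in for an SDP solution, since actually solving the SDP requires an external solver):

```python
import numpy as np

def embedding_from_kernel(K, d):
    """Given a centred PSD kernel matrix K, return the d-dimensional
    embedding Y (rows y_i) such that y_i . y_j approximates k_ij."""
    eigvals, eigvecs = np.linalg.eigh(K)       # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:d]        # indices of top-d eigenvalues
    # Scale each top eigenvector by the square root of its eigenvalue
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))

# Stand-in for an SDP solution: centred linear kernel of toy data
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))
X = X - X.mean(axis=0)
K = X @ X.T

Y = embedding_from_kernel(K, d=2)  # 8 points embedded in 2 dimensions
```

Since the eigenvectors are orthonormal, <math>Y^T Y</math> is diagonal with the top-<math>d</math> eigenvalues on the diagonal, so <math>Y Y^T</math> is the best rank-<math>d</math> approximation of <math>K</math>.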

Latest revision as of 17:54, 3 October 2014
