==Oct. 1 lecture==

===Maximum variance unfolding===
In maximum variance unfolding (MVU), as in isomap and LLE, we operate under the assumption that the data is sampled from a low-dimensional [http://en.wikipedia.org/wiki/Manifold manifold] embedded within a higher-dimensional Euclidean space. The aim is to "unfold" this manifold into a lower-dimensional Euclidean space so that local structure is preserved: the distances from each point to its <math>k</math> nearest neighbours are kept fixed, while the pairwise variance of all non-neighbouring points is maximized. The MVU algorithm uses semidefinite programming to find a kernel that achieves this.

====Optimization constraints====
 
Consider a mapping <math>\Phi</math> such that <math>x \mapsto \phi(x)</math>. Then for <math>i, j</math> such that <math>x_i, x_j</math> are neighbours, we require that <math>\Phi</math> satisfies
:<math>\| x_i - x_j \|^2 = \| \phi(x_i) - \phi(x_j) \|^2</math>
:<math>= [\phi(x_i) - \phi(x_j)]^T [\phi(x_i) - \phi(x_j)]</math>
:<math>= \phi(x_i)^T \phi(x_i) - \phi(x_i)^T \phi(x_j) - \phi(x_j)^T \phi(x_i) + \phi(x_j)^T \phi(x_j)</math>
:<math>= k_{ii} - k_{ij} - k_{ji} + k_{jj}</math>
:<math>= k_{ii} - 2k_{ij} + k_{jj}</math>


where <math>K = \phi^T \phi</math> is the <math>n \times n</math> kernel matrix, with entries <math>k_{ij} = \phi(x_i)^T \phi(x_j)</math>. We can summarize this constraint as
 
:<math>n_{ij} \| x_i - x_j \|^2 = n_{ij} \| \phi(x_i) - \phi(x_j) \|^2</math>
 
where <math>n_{ij}</math> is the neighbourhood indicator, that is, <math>n_{ij} = 1</math> if <math>x_i, x_j</math> are neighbours and <math>0</math> otherwise.
 
We also have the constraint that the embedding is centred, i.e. <math>\sum\limits_i \phi(x_i) = 0</math>, which in terms of the kernel matrix means <math>\sum\limits_{i,j} k_{ij} = 0</math>.
 
Furthermore, <math>K</math> must be positive semi-definite, i.e. <math>K \succeq 0</math>, since it is a Gram matrix.
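
To make the constraint set concrete, here is a minimal sketch in Python (our choice here, not anything from the lecture) that computes the neighbourhood indicator <math>n_{ij}</math> and the squared pairwise distances the embedding must preserve. It assumes numpy and scikit-learn are available, and the function name <code>mvu_constraints</code> is illustrative.

<pre>
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mvu_constraints(X, k=4):
    """Return the neighbourhood indicator n_ij and the squared
    pairwise distances that the embedding must preserve."""
    n = X.shape[0]
    # Ask for k+1 neighbours because each point is its own nearest neighbour.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)

    N = np.zeros((n, n), dtype=bool)   # n_ij = 1 if x_i, x_j are neighbours
    for i in range(n):
        N[i, idx[i, 1:]] = True
    N = N | N.T                        # make the neighbour relation symmetric

    # D2[i, j] = ||x_i - x_j||^2 for every pair of points
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return N, D2
</pre>

Symmetrizing <code>N</code> matches the constraint above, where <math>n_{ij}</math> applies to unordered pairs of points.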
 
====Optimization problem====
Ideally we would minimize the rank of the matrix <math>K</math>, since a low-rank kernel corresponds to a low-dimensional embedding, but rank minimization is not a convex problem, so we cannot solve it directly. Since we would like to maximize the variance of the non-neighbouring points, and for a centred embedding <math>\operatorname{tr}(K) = \sum_i \|\phi(x_i)\|^2</math> is proportional to the total variance, we can instead use the objective function
 
:<math>\max \operatorname{tr}(K)</math>
 
and thus, since the objective is linear in <math>K</math> and the constraints are linear equalities together with <math>K \succeq 0</math>, the problem can be solved using semidefinite programming. A low-dimensional embedding is then recovered from the top eigenvectors of the optimal <math>K</math>, as in kernel PCA.
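
As one possible illustration (not the lecture's implementation), the semidefinite program can be posed with the cvxpy library, assuming cvxpy and an SDP-capable solver are installed. The name <code>mvu_embed</code> is hypothetical, and the sketch reuses <code>mvu_constraints</code> from above.

<pre>
import numpy as np
import cvxpy as cp

def mvu_embed(X, k=4, d=2):
    """Solve the MVU semidefinite program and return an n x d embedding."""
    n = X.shape[0]
    N, D2 = mvu_constraints(X, k)          # from the previous sketch

    K = cp.Variable((n, n), PSD=True)      # K symmetric positive semi-definite
    constraints = [cp.sum(K) == 0]         # centring: sum_ij k_ij = 0
    for i in range(n):
        for j in range(i + 1, n):
            if N[i, j]:
                # preserve neighbour distances:
                # k_ii - 2 k_ij + k_jj = ||x_i - x_j||^2
                constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == D2[i, j])

    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

    # Top-d eigenvectors of the optimal K give the embedding, as in kernel PCA.
    w, V = np.linalg.eigh(K.value)
    top = np.argsort(w)[::-1][:d]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
</pre>

Note that the number of distance constraints grows with <math>nk</math>, so this direct formulation is only practical for fairly small data sets.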
