User:Dgroot/Oct.1 lecture

From statwiki
Jump to navigation Jump to search

Temporary page to avoid edit conflicts

Oct. 1 lecture

Maximum variance unfolding

In maximum variance unfolding (MVU), as in isomap and LLE, we operate under the assumption that the data is sampled from a low-dimensional manifold that is embedded within a higher-dimensional Euclidean space. The aim is to "unfold" this manifold in a lower-dimensional Euclidean space such that the local structure of each point is preserved for each point's [math]\displaystyle{ k }[/math] nearest neighbours, while maximizing the pairwise variance for all points which are not neighbours. The MVU algorithm uses semidefinite programming to find a kernel that satisfies this aim.

Optimization constraints

For a kernel [math]\displaystyle{ \Phi }[/math] such that [math]\displaystyle{ x \mapsto \phi(x) }[/math], we have the constraint that for [math]\displaystyle{ i, j }[/math] such that [math]\displaystyle{ x_i, x_j }[/math] are neighbours, we require that [math]\displaystyle{ \Phi }[/math] satisfies

[math]\displaystyle{ | x_i - x_j |^2 = |\phi(x_i) - \phi(x_j)|^2 }[/math]
[math]\displaystyle{ = [\phi(x_i) - \phi(x_j)]^T [\phi(x_i) - \phi(x_j)] }[/math]
[math]\displaystyle{ =\phi(x_i)^T \phi(x_i) - \phi(x_i)^T \phi(x_j) - \phi(x_j)^T \phi(x_i) + \phi(x_j)^T \phi(x_j) }[/math]
[math]\displaystyle{ =k_{ii} - k_{ij} - k_{ji} + k_{jj} }[/math]
[math]\displaystyle{ =k_{ii} -2k_{ij} + k_{jj} }[/math]

where [math]\displaystyle{ K=\phi^T \phi }[/math] is the [math]\displaystyle{ n \times n }[/math] kernel matrix. We can summarize this constraint as

[math]\displaystyle{ n_{ij} ||x_i - x_j||^2 = n_{ij} || \phi(x_i) - \phi(x_J) ||^2 }[/math]

where [math]\displaystyle{ n_{ij} }[/math] is the neighbourhood indicator, that is, [math]\displaystyle{ n_{ij} = 1 }[/math] if [math]\displaystyle{ x_i, x_j }[/math] are neighbours and [math]\displaystyle{ 0 }[/math] otherwise.

We also have a constraint that [math]\displaystyle{ \Phi }[/math] is centred, i.e. [math]\displaystyle{ \sum\limits_{i,j} k_{ij} = 0 }[/math].

Furthermore, [math]\displaystyle{ k }[/math] should be positive semi-definite, i.e. [math]\displaystyle{ k \succeq 0 }[/math].

Optimization problem

The ideal optimization problem would be to minimize the rank of the matrix K, but unfortunately that is not a convex problem so we cannot solve it. Since we would like to maximize the variance of non-neighbour points, we can use the objective function

[math]\displaystyle{ \text{max tr}(K) }[/math]

and thus the problem can be solved using semidefinite programming.