stat946f10: Difference between revisions

From statwiki
Jump to navigation Jump to search
Line 43: Line 43:


MVU is based on maximizing the overall variance while the local distances between neighbor points are preserved and it uses only one source of information. Colored MVU uses more than one source of information, i.e it reducing the dimension satisfying a combination of to goals//
MVU is based on maximizing the overall variance while the local distances between neighbor points are preserved and it uses only one source of information. Colored MVU uses more than one source of information, i.e it reducing the dimension satisfying a combination of to goals//
1- preserving the local distance (as first information)//
/1- preserving the local distance (as first information)//
2- optimum alignment with second information (side information)//
\2- optimum alignment with second information (side information)\\
Colored MVU seems to be reasonable since we can not merge all kind of information in one distance metric, because on source of information may be a feature of similarity rather than difference.
Colored MVU seems to be reasonable since we can not merge all kind of information in one distance metric, because on source of information may be a feature of similarity rather than difference.



Revision as of 11:26, 4 June 2009

June 2nd Maximum Variance Unfolding (Semidefinite Embedding)

Maximum Variance Unfolding (MFU) is a variation of Kernel PCA. The main proposal of this technique is to learn a suitable kernel [math]\displaystyle{ K }[/math] with several constraints when the data set is given.

First, we give the constraints for the kernel.

Constraints

1. Semipositive definiteness
Kernel PCA is a kind of spectral decomposition in Hilbert space. The semipositive definiteness interprets the kernel matrix as storing the inner products of vectors in a Hilbert space. Furthermore, the semipositive definiteness also means all eigenvalues are non-negative.

2. Centering
Considering the centering process in Kernel PCA, it is also required here. The condition is given by
[math]\displaystyle{ \sum_i \Phi\left(x_i\right) =0 . }[/math]
Equivalently,
[math]\displaystyle{ 0 = \left|\sum_i \Phi(x_i)\right|^2 = \sum_{ij}\Phi(x_i)\Phi(x_j)=\sum_{ij}K_{ij}. }[/math]

3. Isometry
The local distance between a pairwise of data [math]\displaystyle{ x_i, x_j }[/math], under neighbourhood relation [math]\displaystyle{ \eta }[/math], should be preserved in new space after mapping [math]\displaystyle{ \Phi(\cdot) }[/math]. In other words, for all [math]\displaystyle{ \eta_{ij}\gt 0 }[/math],
[math]\displaystyle{ \left|\Phi(x_i) - \Phi(x_j)\right|^2 = \left|x_i - x_j\right|^2. }[/math]
Additonally, for the consider of conformal map, the neighbourhood relation can be changed to [math]\displaystyle{ [\eta^T\eta]_{ij}\gt 0. }[/math] This ensures that if two points have a common neighbour, we preserve their pairwise distance.

Objective Functions

Given the conditions, the objective functions should be considered. The aim of dimensional reduction is to map high dimension data into a low dimension space with the minimum information losing cost. Recall the fact that the dimension of new space depends on the rank of the kernel. Hence, the best ideal kernel is the one which has minimum rank. So the ideal objective function should be
[math]\displaystyle{ \min\quad rank(K). }[/math]
However, to minimize the rank of one matrix is a hard problem. So we look at the question in another way. When doing dimensional reduction, we try to maximize the distance between non-neighbour data. In other words, we want to maximize the variance between non-neighbour datas. In such sense, we can change the objective function to
[math]\displaystyle{ \max \quad Trace(K) }[/math] .

Note that it is an interesting question that whether these two objective functions can be equivalent to each other.

Algorithm for Optimization Problem

The objective function with linear constraints form a typical semidefinite programming problem. The optimization is convex and globally. We already have methods to slove such kind of optimization problem.

Colored Maximum Variance Unfolding

MVU is based on maximizing the overall variance while the local distances between neighbor points are preserved and it uses only one source of information. Colored MVU uses more than one source of information, i.e it reducing the dimension satisfying a combination of to goals// /1- preserving the local distance (as first information)// \2- optimum alignment with second information (side information)\\ Colored MVU seems to be reasonable since we can not merge all kind of information in one distance metric, because on source of information may be a feature of similarity rather than difference.

Algorithmic Modification

In Colored MVU, [math]\displaystyle{ KL }[/math] is maximized instead of [math]\displaystyle{ K }[/math], where [math]\displaystyle{ L }[/math] is the matrix of covariance of first and side information.

Application

One of the drawback of MVU is that its statistical interpretation is not always clear. However one of the application of Colored MVU, which has great statistical interpretation is to be used as a criterion to for measuring the Hilbert-Schmidt Independence.