a Direct Formulation For Sparse PCA Using Semidefinite Programming: Difference between revisions

From statwiki
Jump to navigation Jump to search
No edit summary
Line 25: Line 25:
<math>\textbf{Card}(x)\leq k</math>
<math>\textbf{Card}(x)\leq k</math>


====Semidefinite relaxation====
===Semidefinite relaxation===


Rewriting the above formulas, we will have:
Rewriting the above formulas, we will have:

Revision as of 13:16, 10 November 2010

Still under construction

Introduction

Principle Component Analysis is a popular technique which finds a transformation from correlated variables to uncorrelated ones which correspond to the direction of maximal variance in the data. The Principal component can be representative of the whole data with minimum information loss. In sparse PCA, the embedded variables are linear combination of the input variables subject to the constraint that the number of non-zero elements in this combination is limited. Sparse PCA decomposition interpretation is facilitated since the number of non-zero elements are not all variables if the coordinate axes have a physical meaning. It is also helpful in expressing a space of a set of low-dimensional vectors with minimum loss of information.

Semidefinite programs are convex optimization. It optimize a linear function by constraining it to an affine combination of symmetric matrices is positive semidefinite. Semidefinite programs can be solved in polynomial time by interior-point solvers like SEDUMI or SDPT3. Unfortunately it is not viable and practical for high dimensional data sets.

This paper suggests a direct approach for formulation of sparse PCA via semidefinite programming which is convex. Since interior-point solvers cannot handle large data sets, Nesterov’s smoothing technique is used efficiently to help solving large dimensional problems. First order methods require less memory which is an important issue in interior-point solvers. On the other hand their convergence is slower but it is not a major concern in the certain case.So The optimal first-order minimization algorithm is going to be applied for solving the optimization problem.

The content of the paper could be summarized as below:

First they tried to maximize the variance projection by limiting the number of non-zero elements via semidefinite programming. then they show the robustness of their method and how this method could be used for decomposing a matrix in to limited number of variables. As their problem size is large and can not be solved by common techniques which are used for solving the convex optimization problems; they show Nesterov's smoothing method is helping to achieve the solution.

Semidefinite relaxation

Sparse variance maximization

A is assumed to be a symmetric matrix and WLOG, A is a covariance matrix and we are going to maximize the variance of vector [math]\displaystyle{ x \in \textbf{R}^n }[/math] while it is sparse.

maximize [math]\displaystyle{ x^{T}{A}x }[/math]

subject to [math]\displaystyle{ \|x\|_2=1 }[/math]

[math]\displaystyle{ \textbf{Card}(x)\leq k }[/math]

Semidefinite relaxation

Rewriting the above formulas, we will have:

maximize [math]\displaystyle{ \textbf{Tr}({A}{X}) }[/math]

subject to [math]\displaystyle{ \textbf{Tr}({X})=1 }[/math],

[math]\displaystyle{ \textbf{Card}({X})\leq k^2 }[/math],

[math]\displaystyle{ {X}\geq 0 }[/math]

[math]\displaystyle{ \textbf{Rank}({X})=1 }[/math]

where [math]\displaystyle{ X=xx^T }[/math] and it is the solution to the above problem. Finally by replacing the only non-convex constraint to a weaker but convex one, we come to a semidefinite relaxation of the sparse PCA for square matrices.

maximize [math]\displaystyle{ \textbf{Tr}({A}{X}) }[/math]

subject to [math]\displaystyle{ \textbf{Tr}({X})=1 }[/math],

[math]\displaystyle{ \textbf{1}^{T}|{X}|\textbf{1}\leq k }[/math],

[math]\displaystyle{ {X}\geq 0 }[/math]



maximize [math]\displaystyle{ u^{T}{A}v }[/math]

subject to [math]\displaystyle{ \|u\|_{2}=\|v\|_{2}=1 }[/math],

[math]\displaystyle{ \textbf{Card}(u)\leq k_{1},\textbf{Card}(u)\leq k_{1} }[/math],


maximize [math]\displaystyle{ \textbf{Tr}({A}^{T}{X}^{12}) }[/math]

subject to [math]\displaystyle{ {X}\leq0, \textbf{Tr}({X}^{ii})=1 }[/math],

[math]\displaystyle{ \textbf{1}^{T}|{X}^{ii}|\textbf{1}\leq k_{i}, i=1,2 }[/math],

[math]\displaystyle{ \textbf{1}^{T}|{X}^{12}|\textbf{1}\leq \sqrt{k_1k_2} }[/math]

Robustness interpretation

maximize [math]\displaystyle{ x^{T}{A}x }[/math]

subject to [math]\displaystyle{ \|x\|_2=1 }[/math]

[math]\displaystyle{ \textbf{Card}(x)\leq k }[/math]


maximize [math]\displaystyle{ x^{T}{A}x-\textbf{Card}^{2}(x) \rho }[/math]

subject to [math]\displaystyle{ \|x\|_2=1 }[/math]


maximize [math]\displaystyle{ \textbf{Tr}({A}{X})-\textbf{Card}({X})\rho }[/math]

subject to [math]\displaystyle{ \textbf{Tr}({X})=1 }[/math],

[math]\displaystyle{ {X}\geq 0 }[/math]

[math]\displaystyle{ \textbf{Rank}({X})=1 }[/math]


maximize [math]\displaystyle{ \textbf{Tr}({A}{X})-\rho\textbf{1}^{T}|{X}|\textbf{1} }[/math]

subject to [math]\displaystyle{ \textbf{Tr}({X})=1 }[/math],

[math]\displaystyle{ {X}\geq 0 }[/math]


[math]\displaystyle{ max _{{X}\geq 0, \textbf{Tr}({X})=1} min _{|{U}_{ij}\leq \rho|}\textbf{Tr}({X}({A}+{U})) }[/math]


minimize [math]\displaystyle{ \lambda^{max}({A}+{U}) }[/math]

subject to [math]\displaystyle{ |{U}_{ij}|\leq \rho, i,j=1,...,n }[/math]


[math]\displaystyle{ ({A}+{U}){X}=\lambda^{max}({A}+{U}){X} }[/math]

Sparse decomposition

maximize [math]\displaystyle{ \textbf{Tr}({A}{X}) }[/math]

subject to [math]\displaystyle{ \textbf{Tr}({X})=1 }[/math],

[math]\displaystyle{ \textbf{1}^{T}|{X}|\textbf{1}\leq k }[/math],

[math]\displaystyle{ {X}\geq 0 }[/math]

[math]\displaystyle{ {A}_2={A}_1-(x_{1}^{T}{A}_1x_1)x_1x_1^T }[/math]

Algorithm

Smoothing technique

[math]\displaystyle{ f(x)=\hat{f}(x)+max_{u}\{\lt \textbf{T}x,u\gt -\hat{\phi}(u) : u \in \textbf{Q}_2\} }[/math]

[math]\displaystyle{ min _{x\in \textbf{Q}_1}f(x) }[/math]

Application o sparse PCA

maximize [math]\displaystyle{ \textbf{Tr}({A}{X})-\textbf{1}^{T}|{X}|\textbf{1} }[/math]

subject to [math]\displaystyle{ \textbf{Tr}({X})=1 }[/math],

[math]\displaystyle{ {X}\geq 0 }[/math]


[math]\displaystyle{ min_{{U}\in{Q}_1}f({U}) }[/math]

[math]\displaystyle{ {Q}_1=\{{U}\in \textbf{S}^{n}:|{U}_{ij}|\leq 1,i,j=1,...,n\},{Q}_2=\{{X}\in\textbf{S}^n:\textbf{Tr} X=1,X\geq0\} }[/math]

[math]\displaystyle{ f(U):=max_{X \in Q_2}\lt TU,X\gt -\hat{\phi}(X) }[/math] with [math]\displaystyle{ T=I_{n^2}, \hat{\phi}(X)=-\textbf{Tr}(AX) }[/math]


[math]\displaystyle{ d_1(U)=\frac{1}{2}U^T U }[/math]

[math]\displaystyle{ U_0:=arg min_{U\in Q_{1}}d_1(U) }[/math]


[math]\displaystyle{ D_1:=max_{U \in Q_1}d_1(U)=n^2/2 }[/math]

[math]\displaystyle{ d_2(X)=\textbf{Tr}(XlogX)+log(n) }[/math]

[math]\displaystyle{ max_{X\in Q_2}d_2(X)\leq log n:=D_2 }[/math]

[math]\displaystyle{ \| T\| _{1,2}:= max_{X,U}\lt TX,U\gt :\| X\| _{F}=1,\|U\|^*_2=1 }[/math]

[math]\displaystyle{ =max_X\|X\|_2:\|X\|_F\leq 1 }[/math]

[math]\displaystyle{ =1 }[/math]

[math]\displaystyle{ D_1=n^2/2,\sigma _1=1,D_2=log(n),\sigma_2=1/2,\|T\|_{1,2}=1 }[/math]

Regularization

[math]\displaystyle{ \mu:=\frac{\epsilon}{2D_2} }[/math]

[math]\displaystyle{ N=\frac{4\|T\|_{1,2}}{\epsilon}\sqrt{\frac{D_1D_2}{\sigma_1\sigma_2}} }[/math]

[math]\displaystyle{ min_{U \in Q_1}f_{\mu}(U) }[/math]

[math]\displaystyle{ f_{\mu}(U):=max_{X \in Q_2}\lt TU,X\gt -\hat{\phi}(X)-\mu d_2(X) }[/math]

[math]\displaystyle{ L:=\frac{D_2\|T\|_{1,2}^2}{\epsilon 2\sigma_2} }[/math]

[math]\displaystyle{ f_\mu(U)=\mu log(\textbf{Tr}exp((A+U)/\mu))-\mu log n }[/math]

First order minimization

References

<references />