sparse PCA: Difference between revisions

From statwiki
Jump to navigation Jump to search
(Created page with '==Introduction== Given <math>n</math> observations on <math>d</math> variables (or in other words <math>n</math> <math>d</math>-dimensional data points), in PCA the goal is to fi...')
 
Line 30: Line 30:
</td><td valign=top> <br> (2) </td></tr></table>
</td><td valign=top> <br> (2) </td></tr></table>


The conditions <math>X\succeq 0</math> and <math>\textbf{Rank}(X)=1</math> in formula 2 guarantees that <math>X</math> can be written as <math>x^Tx</math>, for some <math>x</math>. But this formulation should be relaxed before it can be solved efficiently, because the constraint <math>\textbf{Card}(x)\leq k^2</math> is not convex. So we replace it with a weaker one: <math>\textbf{1}^T|X|\textbf 1\leq k</math>. We also drop the rank constraint. So we get:
 
The conditions <math>X\succeq 0</math> and <math>\textbf{Rank}(X)=1</math> in formula 2 guarantees that <math>X</math> can be written as <math>x^Tx</math>, for some <math>x</math>. But this formulation should be relaxed before it can be solved efficiently, because the constraintS <math>\textbf{Card}(x)\leq k^2</math> and <math>\textbf{Rank}(X)=1</math> are not convex. So we replace the cardinality constraint with a weaker one: <math>\textbf{1}^T|X|\textbf 1\leq k</math>. We also drop the rank constraint. So we get:


<table width=95%><tr><td width=45%>&nbsp;</td><td width=45%>
<table width=95%><tr><td width=45%>&nbsp;</td><td width=45%>
Line 38: Line 39:
\textrm{subject\ to}&\textbf{Tr}(X)=1\\
\textrm{subject\ to}&\textbf{Tr}(X)=1\\
&\textbf{1}^T|X|\textbf 1\leq k\\
&\textbf{1}^T|X|\textbf 1\leq k\\
&X\succeq 0\\
\end{array}
</math>
</td><td valign=top> &nbsp; </td></tr></table>
We then change the modified cardinality constraint to a penalty term in the goal function with some positive factio <math>\rho</math>. So we get:
<table width=95%><tr><td width=45%>&nbsp;</td><td width=45%>
<math>
\begin{array}{ll}
\textrm{maximize}& \textbf{Tr}(AX)-\rho\textbf{1}^T|X|\textbf 1\\
\textrm{subject\ to}&\textbf{Tr}(X)=1\\
&X\succeq 0\\
&X\succeq 0\\
\end{array}
\end{array}
</math>
</math>
</td><td valign=top> <br> (3) </td></tr></table>
</td><td valign=top> <br> (3) </td></tr></table>

Revision as of 11:27, 27 June 2009

Introduction

Given [math]\displaystyle{ n }[/math] observations on [math]\displaystyle{ d }[/math] variables (or in other words [math]\displaystyle{ n }[/math] [math]\displaystyle{ d }[/math]-dimensional data points), in PCA the goal is to find directions in the data set space that correspond to the directions with biggest variance in the input data. In the practice each of the [math]\displaystyle{ d }[/math] variables has its own special meaning and it may be desirable to come up with some directions, as principal components, each of which is a combination of just a few of these variables. This makes the directions more interpretable and meaningful. But this is not something that usually happens in the oridinal result of PCA method. Each of resulting directions from PCA in most cases is rather a linear combination of all variable with no zero coefficients.

The address the above concern we add a sparsity constraint to the PCA problem, which makes the PCA problem much harder to solve. That's because we have just added a combinatorial constraint to out optimization problem. This paper is about how to find directions in the data space with maximum variance that have a limited number of non-zero elements.

Problem Formulation

Given the covariance matrix [math]\displaystyle{ A }[/math], the problem can be written as:

 

[math]\displaystyle{ \begin{array}{ll} \textrm{maximize}& x^TAx\\ \textrm{subject\ to}& ||x||_2=1\\ &\textbf{Card}(x)\leq k \end{array} }[/math]


(1)

Defining [math]\displaystyle{ X=x^Tx }[/math], the above formula can be rewritten as

 

[math]\displaystyle{ \begin{array}{ll} \textrm{maximize}& \textbf{Tr}(AX)\\ \textrm{subject\ to}&\textbf{Tr}(X)=1\\ &\textbf{Card}(x)\leq k^2\\ &X\succeq 0, \textbf{Rank}(X)=1\\ \end{array} }[/math]


(2)


The conditions [math]\displaystyle{ X\succeq 0 }[/math] and [math]\displaystyle{ \textbf{Rank}(X)=1 }[/math] in formula 2 guarantees that [math]\displaystyle{ X }[/math] can be written as [math]\displaystyle{ x^Tx }[/math], for some [math]\displaystyle{ x }[/math]. But this formulation should be relaxed before it can be solved efficiently, because the constraintS [math]\displaystyle{ \textbf{Card}(x)\leq k^2 }[/math] and [math]\displaystyle{ \textbf{Rank}(X)=1 }[/math] are not convex. So we replace the cardinality constraint with a weaker one: [math]\displaystyle{ \textbf{1}^T|X|\textbf 1\leq k }[/math]. We also drop the rank constraint. So we get:

 

[math]\displaystyle{ \begin{array}{ll} \textrm{maximize}& \textbf{Tr}(AX)\\ \textrm{subject\ to}&\textbf{Tr}(X)=1\\ &\textbf{1}^T|X|\textbf 1\leq k\\ &X\succeq 0\\ \end{array} }[/math]

 

We then change the modified cardinality constraint to a penalty term in the goal function with some positive factio [math]\displaystyle{ \rho }[/math]. So we get:

 

[math]\displaystyle{ \begin{array}{ll} \textrm{maximize}& \textbf{Tr}(AX)-\rho\textbf{1}^T|X|\textbf 1\\ \textrm{subject\ to}&\textbf{Tr}(X)=1\\ &X\succeq 0\\ \end{array} }[/math]


(3)