Optimal Solutions for Sparse Principal Component Analysis
Revision as of 01:26, 2 December 2010
Under construction
Introduction
Principal component analysis (PCA) is a method for finding linear combinations of the variables, called principal components, corresponding to orthogonal directions maximizing the variance in the data. PCA facilitates the interpretation of the data when each factor is a combination of only a few variables, rather than all or most of them. Constraining the number of nonzero coefficients in PCA is known as sparse PCA. Sparse PCA has many applications in biology, finance, and many machine learning problems. Sparse principal components, like principal components, are vectors spanning a lower-dimensional space that explains most of the variance in the original data. However, finding sparse principal components using sparse PCA requires some sacrifices:
- There is a reduction in the explained variance in the original data captured by the sparse principal components as compared to PCA.
- There is a reduction in the orthogonality (uncorrelatedness) of the resulting variables (sparse principal components) as compared to PCA.
In this paper we focus on the sparse PCA problem, which can be written as:
[math]\displaystyle{ \max_{z} \; z^T\Sigma z-\rho\,\textbf{Card}(z) }[/math]
[math]\displaystyle{ \textrm{subject} \; \textrm{to} \; \|z\|_2=1 }[/math]
where [math]\displaystyle{ z\in \textbf{R}^n }[/math], [math]\displaystyle{ \Sigma \in \textbf{S}_n }[/math] is the symmetric positive semidefinite sample covariance matrix, [math]\displaystyle{ \rho }[/math] is a parameter which controls the sparsity, and [math]\displaystyle{ \textbf{Card}(z) }[/math] denotes the cardinality (number of nonzero coefficients) of [math]\displaystyle{ z }[/math]. Note that solving the PCA problem itself is straightforward, while solving sparse PCA is NP-hard.
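As a concrete illustration (not from the paper; the toy data and function names below are ours), the penalized objective [math]\displaystyle{ z^T\Sigma z-\rho\,\textbf{Card}(z) }[/math] can be evaluated in NumPy for a dense direction versus a maximally sparse one:

```python
import numpy as np

def sparse_pca_objective(Sigma, z, rho):
    """Penalized sparse PCA objective: z^T Sigma z - rho * Card(z)."""
    return z @ Sigma @ z - rho * np.count_nonzero(z)

# Toy sample covariance (values are illustrative only).
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 4))
Sigma = data.T @ data / 50

# Dense direction: the ordinary leading eigenvector (unit norm).
z_dense = np.linalg.eigh(Sigma)[1][:, -1]
# A maximally sparse unit-norm competitor: a single nonzero coordinate.
z_sparse = np.zeros(4)
z_sparse[np.argmax(np.diag(Sigma))] = 1.0

# As rho grows, the sparse direction becomes relatively more attractive.
for rho in (0.0, 0.1, 0.5):
    print(rho, sparse_pca_objective(Sigma, z_dense, rho),
          sparse_pca_objective(Sigma, z_sparse, rho))
```

With [math]\displaystyle{ \rho=0 }[/math] this reduces to ordinary PCA, where the dense eigenvector is optimal; the penalty term is what makes sparse supports competitive.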
This paper first formulates the sparse PCA problem and then derives a greedy algorithm for computing a full set of good solutions with total complexity [math]\displaystyle{ O(n^3) }[/math]. It also formulates a convex relaxation of sparse PCA and uses it to derive tractable sufficient conditions for a vector [math]\displaystyle{ z }[/math] to be a global optimum of the above problem.
Notation
- For a vector [math]\displaystyle{ z }[/math], [math]\displaystyle{ \|z\|_1=\sum_{i=1}^n |z_i| }[/math] and [math]\displaystyle{ \textbf{Card}(z) }[/math] is the number of nonzero coefficients of [math]\displaystyle{ z }[/math], while the support [math]\displaystyle{ I }[/math] of [math]\displaystyle{ z }[/math] is the set [math]\displaystyle{ \{i: z_i \neq 0\} }[/math] and [math]\displaystyle{ I^c }[/math] denotes its complement. [math]\displaystyle{ \beta_{+} }[/math] denotes the maximum of [math]\displaystyle{ \{\beta , 0\} }[/math].
- For a symmetric matrix [math]\displaystyle{ X }[/math] with eigenvalues [math]\displaystyle{ \lambda_i }[/math],[math]\displaystyle{ \textbf{Tr}(X)_{+}=\sum_{i=1}^{n}\max\{\lambda_i,0\} }[/math].
- The vector of all ones is written [math]\displaystyle{ \textbf{1} }[/math]. The diagonal matrix with the vector [math]\displaystyle{ u }[/math] on the diagonal is written [math]\displaystyle{ \textbf{diag}(u) }[/math].
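A minimal NumPy sketch of this notation (the example vector and matrix are arbitrary):

```python
import numpy as np

z = np.array([0.0, -3.0, 4.0, 0.0])
l1_norm = np.abs(z).sum()           # ||z||_1 = |−3| + |4| = 7
card = np.count_nonzero(z)          # Card(z) = 2
support = np.flatnonzero(z)         # support I (zero-based indices)

X = np.array([[2.0, 0.0],
              [0.0, -1.0]])
# Tr(X)_+ = sum of max(lambda_i, 0) over the eigenvalues of X.
tr_plus = np.maximum(np.linalg.eigvalsh(X), 0.0).sum()
print(l1_norm, card, support, tr_plus)
```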
Sparse PCA
The sparse PCA problem can be written as:
[math]\displaystyle{ \phi(\rho)=\max_{x} \; x^T\Sigma x-\rho\,\textbf{Card}(x) }[/math]
[math]\displaystyle{ \textrm{subject} \; \textrm{to} \; \|x\|_2=1 }[/math]
The above problem is directly related to solving the following:
[math]\displaystyle{ \max_{x} \; x^T\Sigma x }[/math]
[math]\displaystyle{ \textrm{subject} \; \textrm{to} \; \|x\|_2=1, }[/math]
[math]\displaystyle{ \textbf{Card}(x)\leq k. }[/math]
By duality, an upper bound on the optimal value of the above problem is given by:
[math]\displaystyle{ \min_{\rho \in P} \; \phi(\rho)+\rho k }[/math]
where [math]\displaystyle{ P }[/math] is the set of penalty values for which [math]\displaystyle{ \phi(\rho) }[/math] has been computed. Since [math]\displaystyle{ z^T\Sigma z\leq \Sigma_{11}(\sum_{i=1}^n|z_i|)^2 }[/math] and [math]\displaystyle{ (\sum_{i=1}^n|z_i|)^2\leq \|z\|^2\textbf{Card}(z) }[/math], for [math]\displaystyle{ \|z\|_2=1 }[/math] and [math]\displaystyle{ \rho \geq \Sigma_{11} }[/math] (the largest diagonal element of [math]\displaystyle{ \Sigma }[/math]) we obtain:
[math]\displaystyle{ z^T\Sigma z-\rho\,\textbf{Card}(z)\leq (\Sigma_{11}-\rho)\textbf{Card}(z)\leq 0 }[/math]
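A quick numerical sanity check of this bound (our own synthetic data, assuming NumPy): whenever [math]\displaystyle{ \rho }[/math] exceeds the largest diagonal element of [math]\displaystyle{ \Sigma }[/math], the penalized objective is non-positive for every unit-norm [math]\displaystyle{ z }[/math].

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((30, 6))
Sigma = B.T @ B / 30                   # toy sample covariance
sigma11 = float(np.diag(Sigma).max())  # largest diagonal entry
rho = sigma11 + 0.1                    # take rho >= Sigma_11

# For random unit-norm z, the penalized objective never exceeds zero.
worst = -np.inf
for _ in range(200):
    z = rng.standard_normal(6)
    z /= np.linalg.norm(z)
    obj = z @ Sigma @ z - rho * np.count_nonzero(z)
    worst = max(worst, obj)
print(worst)
```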
Rewriting the preliminary formula in the equivalent form, we have:
[math]\displaystyle{ \phi(\rho)=\max_{\|x\|_2=1} \sum_{i=1}^n ((a_i^Tx)^2-\rho)_{+} }[/math]
where [math]\displaystyle{ \Sigma=A^TA }[/math] and [math]\displaystyle{ a_i }[/math] denotes the i-th column of [math]\displaystyle{ A }[/math].
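The identity behind this rewriting is that, for a fixed direction [math]\displaystyle{ x }[/math], the best 0/1 selection of variables keeps exactly those with [math]\displaystyle{ (a_i^Tx)^2\gt \rho }[/math], yielding the sum of positive parts. This can be checked by brute force over all selections (a sketch with synthetic data, assuming NumPy):

```python
import numpy as np
from itertools import product

# Columns a_i of A, with Sigma = A^T A (toy data).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
rho = 0.3
x = rng.standard_normal(4)
x /= np.linalg.norm(x)

scores = (x @ A) ** 2                      # (a_i^T x)^2 for each column
hinge = np.maximum(scores - rho, 0.0).sum()

# Brute force over all 0/1 selections u of which variables to keep.
brute = max(float(np.dot(u, scores - rho))
            for u in product([0, 1], repeat=4))
print(hinge, brute)
```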
For more detail refer to <ref name="afl"> Alexandre d'Aspremont, Francis Bach, and Laurent El Ghaoui. 2008. Optimal Solutions for Sparse Principal Component Analysis. J. Mach. Learn. Res. 9 (June 2008), 1269-1294. </ref>.
Greedy Solution
An approximate algorithm computes a full path of solutions as follows. As estimating [math]\displaystyle{ n-k }[/math] eigenvalues at each iteration is costly, we use the fact that [math]\displaystyle{ uu^T }[/math] is a subgradient of [math]\displaystyle{ \lambda_{max} }[/math] at [math]\displaystyle{ X }[/math] if [math]\displaystyle{ u }[/math] is a leading eigenvector of [math]\displaystyle{ X }[/math], to get:
[math]\displaystyle{ \lambda_{max}(\sum_{j\in I_k\cup \{i\}}a_ja_j^T)\geq \lambda_{max}(\sum_{j\in I_k}a_ja_j^T)+(x_k^Ta_i)^2 }[/math]
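This lower bound can be verified numerically (a sketch with synthetic data; the support chosen here is arbitrary). It holds because the largest eigenvalue of the augmented matrix is at least the Rayleigh quotient at the current leading eigenvector:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))            # columns a_j (toy data)
I_k = [0, 2]                               # hypothetical current support
M = A[:, I_k] @ A[:, I_k].T                # sum_{j in I_k} a_j a_j^T
lam_k = np.linalg.eigvalsh(M)[-1]          # lambda_max over the support
x_k = np.linalg.eigh(M)[1][:, -1]          # leading eigenvector

# Check the bound for every candidate variable outside the support.
violations = 0
for i in range(6):
    if i in I_k:
        continue
    lhs = np.linalg.eigvalsh(M + np.outer(A[:, i], A[:, i]))[-1]
    if lhs < lam_k + (x_k @ A[:, i]) ** 2 - 1e-10:
        violations += 1
print(violations)
```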
We then derive the following algorithm:
______________________________________________
Approximate Greedy Search Algorithm
Input: [math]\displaystyle{ \Sigma \in \textbf{R}^{n\times n} }[/math]
Algorithm:
1. Preprocessing: sort the variables in decreasing order of their diagonal elements and permute the elements of [math]\displaystyle{ \Sigma }[/math] accordingly. Compute the Cholesky decomposition [math]\displaystyle{ \Sigma =A^TA }[/math].
2. Initialization: [math]\displaystyle{ I_1=\{\},\; x_1=a_1/\|a_1\| }[/math]
3. Compute [math]\displaystyle{ i_k= \mathrm{argmax}_{i\notin I_k}(x_k^Ta_i)^2 }[/math]
4. Set [math]\displaystyle{ I_{k+1}=I_k\cup\{i_k\} }[/math] and compute [math]\displaystyle{ x_{k+1} }[/math] as the leading eigenvector of [math]\displaystyle{ \sum_{j\in I_{k+1}}a_j a_j^T }[/math]
5. Set [math]\displaystyle{ k=k+1 }[/math]; if [math]\displaystyle{ k\lt n }[/math], go back to step 3.
Output: sparsity patterns [math]\displaystyle{ I_k }[/math]
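The steps above can be sketched in NumPy as follows (our own variable names; [math]\displaystyle{ \Sigma }[/math] is assumed positive definite so that the Cholesky factorization exists):

```python
import numpy as np

def approximate_greedy_sparse_pca(Sigma):
    """Approximate greedy search: returns the full path of sparsity
    patterns I_1, ..., I_n, reported in the original variable ordering."""
    n = Sigma.shape[0]
    # 1. Sort variables by decreasing diagonal, permute Sigma accordingly.
    order = np.argsort(-np.diag(Sigma))
    Sigma_p = Sigma[np.ix_(order, order)]
    A = np.linalg.cholesky(Sigma_p).T        # Sigma_p = A^T A, columns a_i
    # 2. Initialization: empty support, x_1 = a_1 / ||a_1||.
    I, patterns = [], []
    x = A[:, 0] / np.linalg.norm(A[:, 0])
    for _ in range(n):
        # 3. Greedy pick: largest (x^T a_i)^2 outside the current support.
        scores = (x @ A) ** 2
        scores[I] = -np.inf
        I.append(int(np.argmax(scores)))
        # 4. Recompute x as leading eigenvector of sum_{j in I} a_j a_j^T.
        M = A[:, I] @ A[:, I].T
        x = np.linalg.eigh(M)[1][:, -1]
        # Record the pattern in terms of the original (unsorted) indices.
        patterns.append(sorted(int(order[j]) for j in I))
    return patterns

# Example: path of supports for a random 5x5 sample covariance.
rng = np.random.default_rng(3)
B = rng.standard_normal((40, 5))
Sigma = B.T @ B / 40
path = approximate_greedy_sparse_pca(Sigma)
print(path)
```

Each iteration costs one rank-one update plus a leading-eigenvector computation on the current support, which is what makes the full path cheap compared with re-estimating all [math]\displaystyle{ n-k }[/math] eigenvalues at every step.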