Optimal Solutions for Sparse Principal Component Analysis




Under construction

Introduction

Principal component analysis (PCA) is a method for finding linear combinations of the variables, called principal components, which correspond to orthogonal directions maximizing variance in the data. PCA facilitates the interpretation of the data when each factor is a combination of only a few variables, rather than all or most of them. Constraining the number of nonzero coefficients in PCA is known as sparse PCA. Sparse PCA has many applications in biology, finance and many machine learning problems. Sparse principal components, like principal components, are vectors spanning a lower-dimensional space that explains most of the variance in the original data. However, finding the sparse principal components requires accepting some trade-offs (see the sketch after the following list):

  • The sparse principal components capture less of the explained variance in the original data than the principal components do.
  • The sparse principal components lose some of the orthogonality (uncorrelatedness) that the principal components have.
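A minimal numpy sketch (illustrative only, not from the paper) of the baseline being traded against: the leading principal component of a sample covariance matrix attains the maximal explained variance among all unit vectors, but is generally dense in all of the variables.

<pre>
# Standard PCA via eigendecomposition of the sample covariance matrix.
# The leading eigenvector typically has all entries nonzero, which is
# the interpretability problem that sparse PCA addresses.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))        # 100 samples, 6 variables
S = np.cov(X, rowvar=False)              # 6 x 6 sample covariance matrix

evals, evecs = np.linalg.eigh(S)         # eigh returns ascending eigenvalues
pc1 = evecs[:, -1]                       # leading principal component
print("loadings:", np.round(pc1, 3))     # typically all 6 entries nonzero
print("explained variance:", evals[-1])  # maximal over all unit vectors
</pre>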

This paper focuses on the sparse PCA problem, which can be written as:

[math]\displaystyle{ \textrm{maximize} \; z^{T}\Sigma z-\rho\,\textbf{Card}^{2}(z) }[/math]
[math]\displaystyle{ \textrm{subject} \; \textrm{to} \; \|z\|_2=1 }[/math]

where [math]\displaystyle{ z\in \textbf{R}^n }[/math], [math]\displaystyle{ \Sigma \in \textbf{S}_n }[/math] is the symmetric positive semidefinite sample covariance matrix, [math]\displaystyle{ \rho }[/math] is a parameter which controls the sparsity, and [math]\displaystyle{ Card(z) }[/math] denotes the cardinality (number of nonzero coefficients) of [math]\displaystyle{ z }[/math]. Note that while the PCA problem itself reduces to an eigenvalue decomposition and is easy to solve, sparse PCA is NP-hard.
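To make the combinatorial nature of this problem concrete, the following hedged sketch (not the paper's algorithm; plain numpy, names are illustrative) solves the penalized problem exactly by enumeration. Once the support I of z is fixed, the best unit vector supported on I is the leading eigenvector of the submatrix S[I, I], so the problem reduces to a search over all nonempty supports.

<pre>
# Brute-force sketch: for a fixed support I, the maximum of z'Sz over
# unit vectors supported on I equals lambda_max(S[I, I]), so the
# penalized objective becomes lambda_max(S[I, I]) - rho * |I|^2.
from itertools import combinations
import numpy as np

def sparse_pca_brute_force(S, rho):
    n = S.shape[0]
    best_val, best_support = -np.inf, None
    for k in range(1, n + 1):                    # candidate cardinalities
        for I in combinations(range(n), k):      # all supports of size k
            lam = np.linalg.eigvalsh(S[np.ix_(I, I)])[-1]  # lambda_max
            val = lam - rho * k ** 2             # penalized objective
            if val > best_val:
                best_val, best_support = val, I
    return best_val, best_support

rng = np.random.default_rng(1)
S = np.cov(rng.standard_normal((50, 5)), rowvar=False)
print(sparse_pca_brute_force(S, rho=0.1))
</pre>

Each support of size k costs one k-by-k eigenvalue computation, but the number of supports grows as [math]\displaystyle{ 2^n }[/math], which is the blow-up the greedy algorithm and convex relaxation below are designed to avoid.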

This paper first formulates the sparse PCA problem and then derives a greedy algorithm for computing a full set of good candidate solutions, one for each sparsity level, with total complexity [math]\displaystyle{ O(n^3) }[/math]. It also formulates a convex relaxation for sparse PCA and uses it to derive tractable sufficient conditions for a vector [math]\displaystyle{ z }[/math] to be a global optimum of the above formula.
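As a rough illustration of the greedy idea, a naive sketch under stated assumptions: it grows the support one variable at a time, always adding the variable that most increases the leading eigenvalue of the restricted covariance matrix. Unlike the paper's method, this version recomputes a full eigendecomposition at every step, so it does not achieve the [math]\displaystyle{ O(n^3) }[/math] total cost the paper obtains with cheaper eigenvalue updates.

<pre>
# Naive greedy forward selection: returns one nested support per target
# cardinality k = 1..n.  The paper's algorithm follows this pattern but
# uses approximate eigenvalue updates to keep the total cost at O(n^3).
import numpy as np

def greedy_sparse_pca_path(S):
    n = S.shape[0]
    I = [int(np.argmax(np.diag(S)))]     # best single variable
    path = [tuple(I)]
    while len(I) < n:
        rest = [j for j in range(n) if j not in I]
        # score each candidate by the leading eigenvalue after adding it
        scores = [np.linalg.eigvalsh(S[np.ix_(I + [j], I + [j])])[-1]
                  for j in rest]
        I.append(rest[int(np.argmax(scores))])
        path.append(tuple(sorted(I)))
    return path
</pre>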

Notation

  • For a vector [math]\displaystyle{ z }[/math], [math]\displaystyle{ \|z\|_1=\sum_{i=1}^n |z_i| }[/math] and [math]\displaystyle{ Card(z) }[/math] is the number of nonzero coefficients of [math]\displaystyle{ z }[/math], while the support [math]\displaystyle{ I }[/math] of [math]\displaystyle{ z }[/math] is the set [math]\displaystyle{ \{i: z_i \neq 0\} }[/math] and [math]\displaystyle{ I^c }[/math] denotes its complement. [math]\displaystyle{ \beta_{+} }[/math] denotes the maximum of [math]\displaystyle{ \{\beta , 0\} }[/math].
  • For a symmetric matrix [math]\displaystyle{ X }[/math] with eigenvalues [math]\displaystyle{ \lambda_i }[/math], [math]\displaystyle{ \textbf{Tr}(X)_{+}=\sum_{i=1}^{n}\max\{\lambda_i,0\} }[/math].
  • The vector of all ones is written [math]\displaystyle{ \textbf{1} }[/math]. The diagonal matrix with the vector [math]\displaystyle{ u }[/math] on the diagonal is written [math]\displaystyle{ \textbf{diag}(u) }[/math].
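For concreteness, a small numpy rendering of this notation (illustrative only):

<pre>
# numpy equivalents of the notation above.
import numpy as np

z = np.array([0.5, 0.0, -1.5, 0.0])
X = np.array([[2.0, -1.0],
              [-1.0, -3.0]])              # symmetric matrix

l1_norm   = np.abs(z).sum()               # ||z||_1
card      = np.count_nonzero(z)           # Card(z)
support   = np.flatnonzero(z)             # I = {i : z_i != 0}
beta_plus = max(-0.7, 0.0)                # beta_+ = max{beta, 0}
tr_plus   = np.maximum(np.linalg.eigvalsh(X), 0).sum()  # Tr(X)_+
</pre>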