# Large-Scale Supervised Sparse Principal Component Analysis

## 1. Introduction

**1. Drawbacks of existing techniques**

Existing techniques include ad-hoc methods (e.g., factor rotation techniques and simple thresholding), greedy algorithms, SCoTLASS, the regularized SVD method, SPCA, and the generalized power method. These methods are based on non-convex optimization and do not guarantee a global optimum.

A semidefinite relaxation method called DSPCA guarantees a global optimum and performs better than the algorithms above; however, it is computationally expensive.

**2. Contribution of this paper**

This paper solves DSPCA in a computationally cheaper way, making it suitable for large-scale data sets. It applies a block coordinate ascent algorithm with computational complexity $O(\hat{n}^3)$, where $\hat{n}$ is the intrinsic dimension of the data. Since $\hat{n}$ can be much smaller than the dimension $n$ of the data, the algorithm is efficient in practice.

## 2. Primal problem

The sparse PCA problem can be formulated as $\max_x \ x^T \Sigma x - \lambda \| x \|_0 : \| x \|_2 = 1$.
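For small $n$, this $\ell_0$-penalized problem can be solved exactly by enumerating supports: once the support $S$ of $x$ is fixed, the best objective value is the top eigenvalue of the principal submatrix $\Sigma_{S,S}$ minus $\lambda |S|$. A minimal numpy sketch on toy data (the random matrix `A`, the penalty value `lam`, and the generic assumption that the top eigenvector of each submatrix is dense on its support are illustrative choices, not from the paper):

```python
import itertools
import numpy as np

# Toy covariance matrix (assumed data for illustration; any PSD matrix works).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
Sigma = A.T @ A / 20.0
lam = 0.5  # sparsity penalty lambda (assumed value)

best_val, best_support = -np.inf, None
n = Sigma.shape[0]
# Enumerate every nonempty support S; for a fixed support the problem
# reduces to the top eigenvalue of the principal submatrix Sigma[S, S].
for k in range(1, n + 1):
    for S in itertools.combinations(range(n), k):
        idx = list(S)
        top = np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]
        val = top - lam * k  # penalized objective for this support
        if val > best_val:
            best_val, best_support = val, idx

print(best_support, best_val)
```

This brute force is exponential in $n$ and only serves to make the objective concrete; the convex relaxation below is what scales.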

Writing $Z = xx^T$, this is equivalent to $\max_Z \ \mathrm{Tr}(\Sigma Z) - \lambda \sqrt{\| Z \|_0} : Z \succeq 0, \ \mathrm{Tr}(Z) = 1, \ \mathrm{Rank}(Z) = 1$.
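The lifting $Z = xx^T$ behind this equivalence can be checked numerically. This toy snippet (the random covariance and the hand-picked sparse unit vector are assumptions for illustration) verifies that $\mathrm{Tr}(\Sigma Z) = x^T \Sigma x$ and $\| Z \|_0 = \| x \|_0^2$, so $\sqrt{\| Z \|_0} = \| x \|_0$:

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = rng.standard_normal((6, 6))
Sigma = Sigma @ Sigma.T  # a PSD covariance (assumed toy data)

x = np.array([0.0, 0.6, 0.0, 0.8, 0.0, 0.0])  # unit-norm, 2 nonzeros
Z = np.outer(x, x)  # rank-one lifting Z = x x^T

# Tr(Sigma Z) equals the quadratic form x^T Sigma x ...
assert np.isclose(np.trace(Sigma @ Z), x @ Sigma @ x)
# ... and ||Z||_0 = ||x||_0^2, so sqrt(||Z||_0) = ||x||_0.
assert np.count_nonzero(Z) == np.count_nonzero(x) ** 2
```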

Replacing the $\sqrt{\| Z \|_0}$ with $\| Z \|_1$ and dropping the rank constraint gives a relaxation of the original non-convex problem:

$\max_Z \ \mathrm{Tr}(\Sigma Z) - \lambda \| Z \|_1 : Z \succeq 0, \ \mathrm{Tr}(Z) = 1 \qquad (3)$

This relaxation turns the original non-convex problem into a convex one (a semidefinite program), which can be solved to global optimality.
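One generic way to approximately solve a problem of the form (3) — not the paper's block coordinate ascent method, just a textbook sketch — is projected subgradient ascent over the spectahedron $\{Z \succeq 0, \ \mathrm{Tr}(Z) = 1\}$, projecting the eigenvalues onto the probability simplex after each step. All data, step sizes, and iteration counts below are illustrative assumptions:

```python
import numpy as np

def project_spectahedron(Z):
    """Euclidean projection onto {Z : Z PSD, Tr(Z) = 1}."""
    w, V = np.linalg.eigh((Z + Z.T) / 2)
    # Project the eigenvalues onto the probability simplex.
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(w) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return (V * np.maximum(w - theta, 0.0)) @ V.T

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 8))
Sigma = A.T @ A / 30.0  # toy covariance (assumed data)
lam = 0.3               # penalty lambda (assumed value)

Z = np.eye(8) / 8.0  # feasible starting point
for t in range(1, 501):
    G = Sigma - lam * np.sign(Z)  # a subgradient of Tr(Sigma Z) - lam*||Z||_1
    Z = project_spectahedron(Z + (1.0 / np.sqrt(t)) * G)

obj = np.trace(Sigma @ Z) - lam * np.abs(Z).sum()
print(obj)
```

A generic subgradient scheme like this converges slowly; the point of the paper's algorithm is precisely to exploit the problem's structure instead.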

Here is an important theorem used by this paper:

Theorem 2.1. Let $\Sigma = A^T A$, where $A = (a_1, a_2, \ldots, a_n) \in {\mathbb R}^{m \times n}$. Then the optimal value $\psi$ of (3) satisfies $\psi = \max_{\| \xi \|_2 = 1} \sum_{i=1}^{n} \left( (a_i^T \xi)^2 - \lambda \right)_+$. An optimal non-zero pattern corresponds to the indices $i$ with $(a_i^T \xi)^2 > \lambda$ at the optimal $\xi$.
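The quantity in Theorem 2.1 can be evaluated at any candidate unit vector $\xi$, and the candidate support read off from the indices with $(a_i^T \xi)^2 > \lambda$. The sketch below uses the top left singular vector of $A$ as a heuristic candidate; this choice, the toy data, and the value of $\lambda$ are assumptions, and evaluating a single $\xi$ only gives a lower bound on $\psi$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 10))  # columns a_i; Sigma = A^T A (toy data)
lam = 4.0                          # penalty lambda (assumed value)

# Heuristic candidate for xi: the top left singular vector of A.
# The theorem's psi is a maximum over all unit xi; this evaluates one candidate.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
xi = U[:, 0]

scores = (A.T @ xi) ** 2                  # (a_i^T xi)^2 for each feature i
psi_at_xi = np.maximum(scores - lam, 0.0).sum()
support = np.nonzero(scores > lam)[0]     # candidate non-zero pattern

print(psi_at_xi, support)
```

This screening view is what makes the theorem useful in practice: features with $(a_i^T \xi)^2 \le \lambda$ contribute nothing to the sum and can be dropped.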

## 3. Block Coordinate Ascent Algorithm

There is a row-by-row algorithm that applies to problems of the form $\min_X \ f(X) - \beta \log\det X : \ L \leq X \leq U, \ X \succ 0$, where the bounds $L \leq X \leq U$ are understood entry-wise.