# proof of Lemma 1

We seek $\textbf{u}$ that minimizes

$-\textbf{u}^T\textbf{a} \; \textrm{ subject } \; \textrm{ to } \; \|\textbf{u}\|^2_2 \leq 1, \; \|\textbf{u}\|_1 \leq c_1$

First we rewrite the criterion using Lagrange multipliers

$-\textbf{u}^T\textbf{a} + \lambda\|\textbf{u}\|^2_2 + \Delta\|\textbf{u}\|_1,$

and we differentiate, set the derivative to 0 and solve for $\textbf{u}$:

$0 = -\textbf{a} + 2\lambda\textbf{u} + \Delta\Gamma,$

where $\ \Gamma_i = \textrm{sgn}(u_i)$ if $u_i \neq 0$; otherwise $\Gamma_i \in [-1, 1]$. The Karush-Kuhn-Tucker conditions for optimality consist of the previous equation, along with $\lambda(\|\textbf{u}\|^2_2 - 1) = 0$ and $\Delta(\|\textbf{u}\|_1 - c_1) = 0$. Now if $\ \lambda \gt 0$, we have

$\textbf{u} = \frac{S(\textbf{a},\Delta)}{2\lambda}$

In general, we have either $\ \lambda = 0$ (if this results in a feasible solution) or $\ \lambda$ must be chosen such that $\|\textbf{u}\|^2_2 = 1$. So we see that

$\textbf{u} = \frac{S(\textbf{a},\Delta)}{\|S(\textbf{a},\Delta) \|_2}$

Again by the Karush-Kuhn-Tucker conditions, either $\ \Delta = 0$ (if this results in a feasible solution) or $\ \Delta$ must be chosen such that $\|\textbf{u}\|_1 = c_1$. So, $\ \Delta = 0$ if this results in $\|\textbf{u}\|_1 \leq c_1$; otherwise we choose $\ \Delta$ such that $\|\textbf{u}\|_1 = c_1$. This completes the proof of the Lemma.

The above proof is proof of Lemma 2.2 in <ref name="WTH2009">Daniela M. Witten, Robert Tibshirani, and Trevor Hastie. (2009) "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis". Biostatistics, 10(3):515–534.</ref>