Proof of Lemma 1


We seek [math]\textbf{u}[/math] that minimizes

[math] -\textbf{u}^T\textbf{a} \; \textrm{ subject } \; \textrm{ to } \; \|\textbf{u}\|^2_2 \leq 1, \; \|\textbf{u}\|_1 \leq c_1 [/math]

First, we rewrite the criterion using Lagrange multipliers:

[math] -\textbf{u}^T\textbf{a} + \lambda\|\textbf{u}\|^2_2 + \Delta\|\textbf{u}\|_1, [/math]

and we differentiate, set the derivative to 0 and solve for [math]\textbf{u}[/math]:

[math] 0 = -\textbf{a} + 2\lambda\textbf{u} + \Delta\Gamma, [/math]


where [math]\ \Gamma_i = \textrm{sgn}(u_i)[/math] if [math]u_i \neq 0[/math]; otherwise [math]\Gamma_i \in [-1, 1][/math]. The Karush-Kuhn-Tucker conditions for optimality consist of the previous equation, along with [math]\lambda, \Delta \geq 0[/math], [math]\lambda(\|\textbf{u}\|^2_2 - 1) = 0[/math], and [math]\Delta(\|\textbf{u}\|_1 - c_1) = 0[/math]. Now if [math]\ \lambda \gt 0[/math], we have

[math] \textbf{u} = \frac{S(\textbf{a},\Delta)}{2\lambda}, [/math]

where [math]\ S[/math] is the soft-thresholding operator, applied componentwise: [math]\ S(a_i,\Delta) = \textrm{sgn}(a_i)(|a_i| - \Delta)_+[/math].
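Expanding the stationarity condition componentwise makes the soft-thresholding form explicit (a worked step using the subgradient [math]\Gamma[/math] defined above):

```latex
% From 0 = -a_i + 2\lambda u_i + \Delta\Gamma_i, with \lambda > 0:
\begin{align*}
u_i > 0 &\implies \Gamma_i = 1,\quad u_i = \tfrac{a_i - \Delta}{2\lambda}
  \quad (\text{requires } a_i > \Delta),\\
u_i < 0 &\implies \Gamma_i = -1,\quad u_i = \tfrac{a_i + \Delta}{2\lambda}
  \quad (\text{requires } a_i < -\Delta),\\
u_i = 0 &\implies a_i = \Delta\Gamma_i \text{ for some } \Gamma_i \in [-1,1],
  \quad \text{i.e. } |a_i| \leq \Delta.
\end{align*}
% The three cases combine into
% u_i = \mathrm{sgn}(a_i)\,(|a_i| - \Delta)_+ \,/\, (2\lambda),
% which is the soft-thresholding solution above.
```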

In general, we have either [math]\ \lambda = 0[/math] (if this results in a feasible solution) or [math]\ \lambda[/math] must be chosen such that [math]\|\textbf{u}\|^2_2 = 1[/math]; in the latter case [math]\ 2\lambda = \|S(\textbf{a},\Delta)\|_2[/math]. So we see that

[math] \textbf{u} = \frac{S(\textbf{a},\Delta)}{\|S(\textbf{a},\Delta)\|_2} [/math]

Again by the Karush-Kuhn-Tucker conditions, either [math]\ \Delta = 0[/math] (if this results in a feasible solution) or [math]\ \Delta[/math] must be chosen such that [math]\|\textbf{u}\|_1 = c_1[/math]. That is, we take [math]\ \Delta = 0[/math] if this results in [math]\|\textbf{u}\|_1 \leq c_1[/math]; otherwise we choose [math]\ \Delta \gt 0[/math] such that [math]\|\textbf{u}\|_1 = c_1[/math], which in practice can be found by binary search since [math]\|\textbf{u}\|_1[/math] is decreasing in [math]\ \Delta[/math]. This completes the proof of the lemma.
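The solution the lemma prescribes can be sketched numerically: soft-threshold [math]\textbf{a}[/math], normalize to unit [math]\ell_2[/math] norm, and binary-search for [math]\Delta[/math] when [math]\Delta = 0[/math] is infeasible. The function names and tolerances below are our own choices for illustration, not from the paper.

```python
import numpy as np

def soft_threshold(a, delta):
    # S(a, Delta)_i = sgn(a_i) * (|a_i| - Delta)_+
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def solve_u(a, c1, tol=1e-8, max_iter=100):
    """Minimize -u^T a subject to ||u||_2^2 <= 1 and ||u||_1 <= c1,
    following the lemma: u = S(a, Delta) / ||S(a, Delta)||_2, with
    Delta = 0 if feasible, else Delta chosen so that ||u||_1 = c1."""
    u = a / np.linalg.norm(a)           # the Delta = 0 candidate
    if np.sum(np.abs(u)) <= c1:
        return u
    # ||u||_1 is decreasing in Delta, so binary-search on [0, max|a_i|).
    lo, hi = 0.0, np.max(np.abs(a))
    for _ in range(max_iter):
        delta = (lo + hi) / 2.0
        su = soft_threshold(a, delta)   # delta < max|a_i|, so su != 0
        u = su / np.linalg.norm(su)
        if np.sum(np.abs(u)) > c1:
            lo = delta                  # still infeasible: threshold harder
        else:
            hi = delta                  # feasible: try a smaller Delta
        if hi - lo < tol:
            break
    return u
```

For a loose constraint (large [math]\ c_1[/math]) the routine returns [math]\textbf{a}/\|\textbf{a}\|_2[/math] unchanged; as [math]\ c_1[/math] shrinks toward 1, the returned vector concentrates its mass on the largest entries of [math]\textbf{a}[/math].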

The above is a proof of Lemma 2.2 in <ref name="WTH2009">Daniela M. Witten, Robert Tibshirani, and Trevor Hastie. (2009) "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis". Biostatistics, 10(3):515–534.</ref>

References

<references />