A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis

Still under construction
==Introduction==
Matrix decompositions or factorizations are a useful tool for identifying the underlying structure of a matrix and the data it represents. However, many of these decompositions produce dense factors which are hard to interpret. Enforcing sparse factors gives factorizations that are more amenable to interpretation. In their paper, Witten, Tibshirani, and Hastie<ref name="WTH2009">Daniela M. Witten, Robert Tibshirani, and Trevor Hastie. (2009) "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis". ''Biostatistics'', 10(3):515–534.</ref> develop a penalized matrix decomposition (PMD) that uses penalty functions on the factors to ensure sparsity and ease of interpretation. They divide their paper into three major components. They begin by presenting their algorithm for the PMD and derive efficient versions for two sets of common penalty functions. In addition, they use a particular form of their algorithm to derive a sparse version of principal component analysis (PCA). Comparing this version to two other sparse PCA methods, by Jolliffe ''et al.''<ref name="JTU2003">Ian T. Jolliffe, Nickolay T. Trendafilov, and Mudassir Uddin. (2003) "A modified principal component technique based on the lasso". ''Journal of Computational and Graphical Statistics'', 12(3):531–547.</ref> and Zou ''et al.''<ref name="ZHT2006">Hui Zou, Trevor Hastie, and Robert Tibshirani. (2006) "Sparse Principal Component Analysis". ''Journal of Computational and Graphical Statistics'', 15(2):265–286.</ref>, they show how the three methods are related. In particular, they show how their sparse PCA algorithm can be used to efficiently solve the SCoTLASS problem proposed by Jolliffe ''et al.''<ref name="JTU2003"/>, which is computationally hard to solve in its original form. Finally, they use the PMD to derive a new method for penalized canonical correlation analysis (CCA), whose main application is to genomic data. They argue that, since it is becoming increasingly common for biologists to perform multiple assays on the same set of samples, there is a growing need for methods that perform inference across data sets. To this end they demonstrate their penalized CCA method on a genomic data set consisting of gene expression and DNA copy number measurements on the same set of patient samples. Using penalized CCA they can identify sets of genes whose expression is correlated with regions of DNA copy number change.
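To make the PMD concrete, the following is a minimal illustrative sketch (not the authors' code) of the rank-1 decomposition with L1 (lasso) penalties on both factors, the single-factor criterion that underlies the sparse PCA and penalized CCA methods described above. Each factor is updated in turn by soft-thresholding and renormalizing, with the threshold found by bisection so that the L1 constraint is met. The function names, iteration counts, and the random test matrix below are illustrative assumptions rather than details taken from the paper.

<pre>
import numpy as np

def soft(a, delta):
    # Soft-thresholding operator: sign(a) * (|a| - delta)_+
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def l1_unit(a, c, n_bisect=50):
    # Return u = soft(a, delta) / ||soft(a, delta)||_2 with delta >= 0
    # chosen by bisection so that ||u||_1 <= c (delta = 0 if the
    # constraint already holds).  Assumes 1 <= c <= sqrt(len(a)).
    u = a / np.linalg.norm(a)
    if np.sum(np.abs(u)) <= c:
        return u
    lo, hi = 0.0, np.max(np.abs(a))
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        su = soft(a, mid)
        nrm = np.linalg.norm(su)
        if nrm > 0 and np.sum(np.abs(su)) / nrm <= c:
            hi = mid      # constraint satisfied: try a smaller threshold
        else:
            lo = mid      # constraint violated: threshold must grow
    su = soft(a, hi)
    return su / np.linalg.norm(su)

def rank_one_pmd(X, c1, c2, n_iter=50):
    # Rank-1 penalized matrix decomposition with L1 penalties on both
    # factors: maximize u' X v subject to ||u||_2 <= 1, ||v||_2 <= 1,
    # ||u||_1 <= c1, ||v||_1 <= c2, by alternating updates of u and v.
    v = np.linalg.svd(X, full_matrices=False)[2][0]  # start at leading right singular vector
    for _ in range(n_iter):
        u = l1_unit(X @ v, c1)
        v = l1_unit(X.T @ u, c2)
    d = float(u @ X @ v)   # analogue of the leading singular value
    return u, v, d

# Small usage example on a random, column-centered matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 10))
X -= X.mean(axis=0)
u, v, d = rank_one_pmd(X, c1=2.0, c2=2.0)
print(np.round(v, 3))   # some loadings are shrunk exactly to zero
</pre>

When c1 and c2 are set to their largest admissible values (the square roots of the corresponding dimensions), the L1 constraints become inactive and the result approaches the ordinary leading singular vectors of X; smaller values give progressively sparser factors.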


==References==
<references />
