Data Visualization (Fall 2014)
Principal Components Analysis (PCA) (Lecture: Sep. 10, 2014)
Introduction
Principal Component Analysis (PCA), invented by Karl Pearson in 1901, is a statistical technique for data analysis. Its main purpose is to reduce the dimensionality of the data.
Suppose there is a set of data points in a p-dimensional space. PCA's goal is to find a linear subspace of lower dimensionality q (q [math]\displaystyle{ \leq }[/math] p) that retains as much of the variation in the data as possible. In other words, PCA aims to reduce the dimensionality of the data while preserving its information (or minimizing the loss of information). Information comes from variation. For example, if all data points have the same value along one dimension, that dimension does not carry any information. So, to preserve information, the subspace needs to contain the components (or dimensions) along which the data has the most variability.
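To make the "information comes from variation" idea concrete, here is a small NumPy sketch with made-up data: one dimension is constant and therefore carries no information.

```python
import numpy as np

# Toy sketch (hypothetical data): 3-D points whose third coordinate is
# constant, so that dimension carries no information and can be dropped.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=100),        # varies, so it carries information
    2 * rng.normal(size=100),    # varies more, so it carries more information
    np.full(100, 5.0),           # constant along this dimension: no information
])

print(X.var(axis=0))  # roughly [1, 4, 0]; PCA would discard the last dimension
```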
PCA applications
As mentioned, PCA is a method for reducing the dimensionality of data: it projects the data onto principal components (PCs) chosen to cover as much of the data's variation as possible.
This technique is useful in many types of applications that involve high-dimensional data, such as data pre-processing, neuroscience, computer graphics, meteorology, oceanography, gene expression analysis, economics, and finance.
Data preprocessing: data are usually represented by many variables, and PCA reduces these to a smaller set of derived variables in order to find the best model for the data. In neuroscience, PCA is used to identify the specific properties of a stimulus that increase a neuron's probability of generating an action potential.
Formulation
PCA reduces a data set of dimension D to a smaller set of new variables of dimension q. Each new variable (principal component) is a linear combination of the original variables:
[math]\displaystyle{ u=w_1x_1+w_2x_2+w_3x_3+...+w_Dx_D = W^TX }[/math]
[math]\displaystyle{ W=\begin{bmatrix} w_1\\w_2\\w_3\\\vdots\\w_D \end{bmatrix} }[/math], [math]\displaystyle{ X=\begin{bmatrix} x_1\\x_2\\x_3\\\vdots\\x_D \end{bmatrix} }[/math]
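For a single data point, this projection can be computed directly. Below is a minimal NumPy sketch of the step; the vectors are randomly generated placeholders, not from any particular data set.

```python
import numpy as np

# Minimal sketch of the projection u = w_1 x_1 + ... + w_D x_D = W^T X.
D = 4
rng = np.random.default_rng(1)
W = rng.normal(size=D)
W /= np.linalg.norm(W)   # unit-length weights, matching the constraint W^T W = 1
X = rng.normal(size=D)

u = W @ X                # the scalar score of this point on the component W
print(u)
```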
The variance of the projection is [math]\displaystyle{ Var(W^T X) = W^T S W }[/math], where [math]\displaystyle{ S }[/math] is the covariance matrix of [math]\displaystyle{ X }[/math].
We want to maximize this variance subject to the constraint [math]\displaystyle{ W^T W = 1 }[/math], which gives the Lagrangian

[math]\displaystyle{ L(W,\lambda) = W^T S W - \lambda (W^T W -1 ) }[/math]
[math]\displaystyle{ \frac{\partial L}{\partial W}= 2SW -2\lambda W = 0 }[/math]
[math]\displaystyle{ SW= \lambda W }[/math]
so [math]\displaystyle{ W }[/math] is an eigenvector of [math]\displaystyle{ S }[/math] and [math]\displaystyle{ \lambda }[/math] is the corresponding eigenvalue of [math]\displaystyle{ S }[/math].
Substituting back, the variance of the projection is

[math]\displaystyle{ W^T S W = W^T \lambda W = \lambda W^T W = \lambda }[/math]

so the variance captured along an eigenvector is exactly its eigenvalue.
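As a quick numerical check of this identity, the sketch below (hypothetical data, NumPy's eigendecomposition) confirms that the projected variance onto a unit eigenvector of [math]\displaystyle{ S }[/math] equals the corresponding eigenvalue.

```python
import numpy as np

# Check: when W is a unit-norm eigenvector of the covariance matrix S,
# the projected variance W^T S W equals the eigenvalue lambda.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.5])  # hypothetical data
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)  # eigh, since S is symmetric
W = eigvecs[:, -1]                    # eigenvector with the largest eigenvalue

print(W @ S @ W)    # W^T S W ...
print(eigvals[-1])  # ... equals lambda, up to floating-point error
```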
Sorting the eigenvalues of [math]\displaystyle{ S }[/math] in decreasing order,

[math]\displaystyle{ \lambda_1 \geq \lambda_2 \geq \lambda_3 \geq ... \geq \lambda_D }[/math]

the corresponding eigenvectors [math]\displaystyle{ w_1, w_2, w_3, ..., w_D }[/math] give the principal components in order of decreasing variance: [math]\displaystyle{ w_1 }[/math] is the first principal component, [math]\displaystyle{ w_2 }[/math] the second, and so on.
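Putting the whole derivation together, a minimal sketch of PCA via the eigendecomposition of the covariance matrix might look as follows; the function name and interface are illustrative, not a standard API.

```python
import numpy as np

# Sketch of PCA: centre the data, eigendecompose the covariance matrix S,
# sort the eigenvalues in decreasing order, and keep the top q eigenvectors.
def pca(X, q):
    Xc = X - X.mean(axis=0)            # centre each variable
    S = np.cov(Xc, rowvar=False)       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]  # lambda_1 >= lambda_2 >= ... >= lambda_D
    W = eigvecs[:, order[:q]]          # D x q matrix of leading eigenvectors
    return Xc @ W                      # n x q matrix of projected data

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
U = pca(X, q=2)
print(U.shape)  # (200, 2)
```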