
Data Visualization (Fall 2014)

Principal Components Analysis (PCA) (Lecture: Sep. 10, 2014)

Introduction

Principal Component Analysis (PCA), first introduced by Karl Pearson in 1901, is a statistical technique for data analysis. Its main purpose is to reduce the dimensionality of the data.

Suppose there is a set of data points in a p-dimensional space. PCA's goal is to find a linear subspace with lower dimensionality q (q [math]\displaystyle{ \leq }[/math] p), such that it contains as many of the data points as possible. In other words, PCA aims to reduce the dimensionality of the data while preserving its information (or minimizing the loss of information). Information comes from variation. For example, if all data points have the same value along one dimension, that dimension does not carry any information. So, to preserve information, the subspace needs to contain the components (or dimensions) along which the data has most of its variability. However, finding a lower-dimensional linear subspace that includes all of the data points is not possible in practical problems, and some loss of information is inevitable. Instead, we try to reduce this loss and capture most of the features of the data.
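As a small illustration of the idea that information comes from variation (not part of the original notes; the data and variable names are made up), the following sketch builds a synthetic dataset in which one coordinate is constant. That coordinate has zero variance and can be dropped without losing any information.

<pre>
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 points in 3 dimensions.
# The third coordinate is constant, so it carries no information.
X = np.column_stack([
    rng.normal(0.0, 3.0, size=100),   # large variation
    rng.normal(0.0, 1.0, size=100),   # smaller variation
    np.full(100, 5.0),                # constant -> zero variance
])

# Variance along each original coordinate
print(X.var(axis=0))   # roughly [9, 1, 0]: the last dimension can be discarded
</pre>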


PCA applications

As mentioned, PCA is a method for reducing the dimension of the data, where possible, to a set of principal components (PCs) chosen so that these PCs cover as much of the variation in the data as possible.

This technique is useful in many types of applications that involve high-dimensional data, such as data pre-processing, neuroscience, computer graphics, meteorology, oceanography, gene expression, economics, and finance, among many other applications.

Data preprocessing: Data are usually represented by a large number of variables. PCA is a technique for selecting a subset of variables in order to figure out the best model for the data. In neuroscience, PCA is used to identify the specific properties of a stimulus that increase a neuron's probability of generating an action potential.
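A minimal pre-processing sketch (not from the lecture), assuming scikit-learn is available; it reduces a hypothetical feature matrix X to its two leading principal components before any further modelling.

<pre>
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical feature matrix: 200 samples, 10 correlated variables
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))

# Keep the 2 directions of largest variance as new features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (200, 2)
print(pca.explained_variance_ratio_)   # fraction of variance captured by each PC
</pre>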


Mathematical Details

PCA is a transformation from the original space to a linear subspace with a new coordinate system. Each coordinate of this subspace is called a Principal Component. The first principal component is the coordinate of this system along which the data points have the maximum variation. That is, if we project the data points onto this coordinate, the maximum variance of the data is obtained (compared to projection onto any other vector in the original space). The second principal component is the coordinate in the direction of the second greatest variance of the data, and so on.

Let's denote the basis of the original space by [math]\displaystyle{ \mathbf{v_1} }[/math], [math]\displaystyle{ \mathbf{v_2} }[/math], ... , [math]\displaystyle{ \mathbf{v_p} }[/math]. Our goal is to find the principal components (the coordinates of the linear subspace), denoted by [math]\displaystyle{ \mathbf{u_1} }[/math], [math]\displaystyle{ \mathbf{u_2} }[/math], ... , [math]\displaystyle{ \mathbf{u_q} }[/math], in the hope that [math]\displaystyle{ q \leq p }[/math]. First we would like to obtain the first principal component [math]\displaystyle{ \mathbf{u_1} }[/math]. It can be treated as a vector in the original space and so can be written as a linear combination of the basis of the original space.

[math]\displaystyle{ \mathbf{u_1}=w_1\mathbf{v_1}+w_2\mathbf{v_2}+...+w_p\mathbf{v_p} }[/math]

The projection of a data point [math]\displaystyle{ X }[/math] onto this direction is the scalar [math]\displaystyle{ W^TX }[/math], where

[math]\displaystyle{ W=\begin{bmatrix} w_1\\w_2\\\vdots\\w_p \end{bmatrix} }[/math], [math]\displaystyle{ X=\begin{bmatrix} x_1\\x_2\\\vdots\\x_p \end{bmatrix} }[/math]

The variance of the projected data is [math]\displaystyle{ Var(W^T X) = W^T S W }[/math], where [math]\displaystyle{ S }[/math] is the covariance matrix of the data. We want to choose [math]\displaystyle{ W }[/math] to maximize this variance subject to the constraint [math]\displaystyle{ W^T W = 1 }[/math] (otherwise the variance could be made arbitrarily large by scaling [math]\displaystyle{ W }[/math]). This leads to the Lagrangian

[math]\displaystyle{ L(W,\lambda) = W^T S W - \lambda (W^T W -1 ) }[/math]

[math]\displaystyle{ \frac{\partial L}{\partial W}= 2SW -2\lambda W = 0 }[/math]

[math]\displaystyle{ SW= \lambda W }[/math]

Therefore, [math]\displaystyle{ W }[/math] is an eigenvector of [math]\displaystyle{ S }[/math] and [math]\displaystyle{ \lambda }[/math] is the corresponding eigenvalue of [math]\displaystyle{ S }[/math]. The variance of the projection is then

[math]\displaystyle{ W^T S W = W^T \lambda W = \lambda W^T W = \lambda }[/math]

So the variance of the projection equals the eigenvalue itself, and it is maximized by taking the largest eigenvalue. Ordering the eigenvalues of [math]\displaystyle{ S }[/math] as

[math]\displaystyle{ \lambda_1 \geq \lambda_2 \geq \lambda_3 \geq ... \geq \lambda_p }[/math]

the eigenvector corresponding to the largest eigenvalue [math]\displaystyle{ \lambda_1 }[/math] gives the first principal component, the eigenvector corresponding to [math]\displaystyle{ \lambda_2 }[/math] gives the second principal component, and so on.
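A short numerical sketch of this derivation (not part of the original notes; the data are synthetic): it computes the covariance matrix, takes its eigendecomposition, sorts the eigenvalues in decreasing order, and checks that the variance of the data projected onto the top eigenvector equals the largest eigenvalue.

<pre>
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: 500 points in 3 dimensions with correlated coordinates
A = rng.normal(size=(3, 3))
X = rng.normal(size=(500, 3)) @ A.T

# Covariance matrix S of the centered data
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# Eigendecomposition of S (S is symmetric, so eigh applies)
eigvals, eigvecs = np.linalg.eigh(S)

# Sort eigenvalues in decreasing order: lambda_1 >= lambda_2 >= ... >= lambda_p
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# First principal component: eigenvector with the largest eigenvalue
W1 = eigvecs[:, 0]

# Variance of the projection W^T X equals lambda_1
proj = Xc @ W1
print(proj.var(ddof=1), eigvals[0])   # the two numbers agree
</pre>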