Residual Component Analysis: Generalizing PCA for more flexible inference in linear-Gaussian models

==Introduction==
Probabilistic principal component analysis (PPCA) decomposes the covariance of a data vector <math> y</math> in <math>\mathbb{R}^p</math> into a low-rank term and a spherical noise term: <center><math>y \sim \mathcal{N} (0, WW^T+\sigma^2 I )</math></center> Here <math>W \in \mathbb{R}^{p \times q}</math> with <math>q < p-1</math> imposes a reduced-rank structure on the covariance. The log-likelihood of the centered dataset <math>Y</math> in <math>\mathbb{R}^{n \times p}</math>, with <math>n</math> data points and <math>p</math> features, <center><math> \ln p(Y) = \sum_{i=1}^n \ln \mathcal{N} (y_{i,:}\mid 0, WW^T+\sigma^2 I)</math></center> can be maximized with respect to <math>W</math>, with the result <center><math>W_{ML} = U_qL_qR^T</math></center>
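As a concrete check of the expression above, the following is a minimal NumPy sketch (not from the paper; the function name and arguments are illustrative) that evaluates <math>\ln p(Y)</math> for a given <math>W</math> and <math>\sigma^2</math>, using the identity <math>\sum_i y_{i,:} C^{-1} y_{i,:}^T = \operatorname{tr}(C^{-1}Y^TY)</math> with <math>C = WW^T+\sigma^2 I</math>.
<pre>
import numpy as np

def ppca_loglik(Y, W, sigma2):
    # ln p(Y) = sum_i ln N(y_i | 0, C) with C = W W^T + sigma^2 I; Y assumed centered, n x p
    n, p = Y.shape
    C = W @ W.T + sigma2 * np.eye(p)
    _, logdet = np.linalg.slogdet(C)               # log-determinant of the model covariance
    quad = np.trace(np.linalg.solve(C, Y.T @ Y))   # sum_i y_i C^{-1} y_i^T
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)
</pre>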


where <math>U_q</math> contains the <math>q</math> principal eigenvectors of the sample covariance <math>\tilde S = n^{-1}Y^TY</math>, and <math>L_q</math> is a diagonal matrix with elements <math>l_{i,i} = (\lambda_i - \sigma^2)^{1/2}</math>, where <math>\lambda_i</math> is the <math>i</math>th eigenvalue of the sample covariance and <math>\sigma^2</math> is the noise variance. The maximum-likelihood solution is determined only up to rotation: <math>R</math> is an arbitrary <math>q \times q</math> rotation matrix. The matrix <math>W</math> spans the principal subspace of the data, and the model is known as probabilistic PCA.
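The closed-form solution can be implemented directly from an eigendecomposition of the sample covariance. The sketch below is an illustration rather than the paper's code; it fixes <math>R = I</math> and uses the standard maximum-likelihood noise estimate, the mean of the discarded eigenvalues.
<pre>
import numpy as np

def ppca_ml(Y, q):
    # Maximum-likelihood PPCA fit; Y is assumed centered (n x p), with q < p
    n, p = Y.shape
    S = Y.T @ Y / n                          # sample covariance
    lam, U = np.linalg.eigh(S)               # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]           # reorder to descending
    sigma2 = lam[q:].mean()                  # ML noise variance: mean of discarded eigenvalues
    L_q = np.diag(np.sqrt(lam[:q] - sigma2))
    W_ml = U[:, :q] @ L_q                    # W_ML = U_q L_q R^T with R = I
    return W_ml, sigma2
</pre>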


The underlying assumption of the model is that the data set can be represented as <math>Y = XW^T+E</math>, where <math>X</math> in <math>\mathbb{R}^{n \times q}</math> is a matrix of <math>q</math>-dimensional latent variables and <math>E</math> is a matrix of noise variables <math> e_{ij} \sim \mathcal{N} (0,\sigma^2)</math>. The marginal log-likelihood above is obtained by placing an isotropic prior independently on the elements of <math>X</math>, with <math>x_{ij} \sim \mathcal{N}(0,1)</math>, and integrating out <math>X</math>.
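To make the generative assumption concrete, the following sketch samples from <math>Y = XW^T + E</math> and compares the resulting sample covariance with the implied marginal covariance <math>WW^T + \sigma^2 I</math>; the dimensions and noise level are arbitrary illustrative choices, not values from the paper.
<pre>
import numpy as np

rng = np.random.default_rng(0)
n, p, q, sigma = 2000, 5, 2, 0.3             # illustrative sizes
W = rng.standard_normal((p, q))
X = rng.standard_normal((n, q))              # isotropic prior x_ij ~ N(0, 1)
E = sigma * rng.standard_normal((n, p))      # spherical noise e_ij ~ N(0, sigma^2)
Y = X @ W.T + E

model_cov = W @ W.T + sigma**2 * np.eye(p)   # marginal covariance after integrating out X
sample_cov = Y.T @ Y / n
print(np.abs(model_cov - sample_cov).max())  # small for large n
</pre>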
