Residual Component Analysis: Generalizing PCA for more flexible inference in linear-Gaussian models

==Introduction==
Probabilistic principal component analysis (PPCA) decomposes the covariance of a data vector <math>y</math> in <math>\mathbb{R}^p</math> into a low-rank term and a spherical noise term: <center><math>y \sim \mathcal{N} (0, WW^T+\sigma^2 I )</math></center> Here <math>W \in \mathbb{R}^{p \times q}</math> with <math>q < p-1</math> imposes a reduced-rank structure on the covariance. The log-likelihood of the centered dataset <math>Y</math> in <math>\mathbb{R}^{n \times p}</math>, with <math>n</math> data points and <math>p</math> features, is maximized by <center><math>W_{ML} = U_qL_qR^T</math></center>
 
where the columns of <math>U_q</math> are the <math>q</math> principal eigenvectors of the sample covariance <math>\tilde S = n^{-1}Y^TY</math>, <math>L_q</math> is a diagonal matrix with elements <math>l_{i,i} = (\lambda_i - \sigma^2)^{1/2}</math>, <math>\lambda_i</math> is the <math>i</math>th eigenvalue of the sample covariance, <math>\sigma^2</math> is the noise variance, and <math>R</math> is an arbitrary <math>q \times q</math> orthogonal rotation matrix.
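
A minimal NumPy sketch of this result, assuming data drawn from the PPCA model above: it forms the sample covariance, assembles <math>W_{ML}</math> from the top <math>q</math> eigenpairs with <math>R = I</math>, and estimates <math>\sigma^2</math> by the standard maximum-likelihood formula (the mean of the <math>p-q</math> discarded eigenvalues). All sizes and variable names here are illustrative, not from the paper.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 5000, 10, 3          # illustrative sizes: n samples, p features, q latent dims
sigma2_true = 0.1              # illustrative true noise variance

# Draw data from the PPCA model y ~ N(0, W W^T + sigma^2 I).
W_true = rng.standard_normal((p, q))
C_true = W_true @ W_true.T + sigma2_true * np.eye(p)
Y = rng.multivariate_normal(np.zeros(p), C_true, size=n)
Y -= Y.mean(axis=0)            # center the data

# Sample covariance S = n^{-1} Y^T Y and its eigendecomposition.
S = Y.T @ Y / n
lam, U = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1] # reorder so lam[0] >= lam[1] >= ...

# ML noise variance: mean of the p - q discarded eigenvalues.
sigma2_ml = lam[q:].mean()

# W_ML = U_q L_q R^T with l_ii = (lambda_i - sigma^2)^{1/2}; take R = I.
L_q = np.diag(np.sqrt(lam[:q] - sigma2_ml))
W_ml = U[:, :q] @ L_q

# W is identified only up to the rotation R, but the implied covariance is unique.
C_ml = W_ml @ W_ml.T + sigma2_ml * np.eye(p)
print(np.abs(C_ml - C_true).max())   # small for large n
</syntaxhighlight>

Note that any orthogonal <math>R</math> gives the same likelihood, since <math>W_{ML}W_{ML}^T</math> is invariant to it; the check on the last line therefore compares covariances rather than <math>W</math> itself.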
