Residual Component Analysis: Generalizing PCA for more flexible inference in linear-Gaussian models


Introduction

Probabilistic principal component analysis (PPCA) decomposes the covariance of a data vector [math]\displaystyle{ y }[/math] in [math]\displaystyle{ \mathbb{R}^p }[/math] into a low-rank term plus a spherical (isotropic) noise term:

[math]\displaystyle{ y \sim \mathcal{N} (0, WW^T+\sigma^2 I ) }[/math]

The loading matrix [math]\displaystyle{ W \in \mathbb{R}^{p \times q} }[/math], with [math]\displaystyle{ q \lt p-1 }[/math], imposes a reduced-rank structure on the covariance. The log-likelihood of a centered dataset [math]\displaystyle{ Y \in \mathbb{R}^{n \times p} }[/math], with [math]\displaystyle{ n }[/math] data points and [math]\displaystyle{ p }[/math] features,

[math]\displaystyle{ \ln p(Y) = \sum_{i=1}^n \ln \mathcal{N} (y_{i,:}|0, WW^T+\sigma^2 I) }[/math]

can be maximized<ref name="tipping1999">

Tipping, M. E. and Bishop, C. M. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611–622, 1999.

</ref> with the result

[math]\displaystyle{ W_{ML} = U_qL_qR^T }[/math]

where [math]\displaystyle{ U_q }[/math] contains the [math]\displaystyle{ q }[/math] principal eigenvectors of the sample covariance [math]\displaystyle{ \tilde S = n^{-1}Y^TY }[/math], and [math]\displaystyle{ L_q }[/math] is a diagonal matrix with elements [math]\displaystyle{ l_{i,i} = (\lambda_i - \sigma^2)^{1/2} }[/math], where [math]\displaystyle{ \lambda_i }[/math] is the ith eigenvalue of the sample covariance and [math]\displaystyle{ \sigma^2 }[/math] is the noise variance. This maximum-likelihood solution is rotation invariant: [math]\displaystyle{ R }[/math] is an arbitrary rotation matrix. The matrix [math]\displaystyle{ W }[/math] spans the principal subspace of the data, and the model is known as probabilistic PCA.
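The following NumPy sketch illustrates this closed-form solution (the function name, synthetic data, and the choice [math]\displaystyle{ R = I }[/math] are illustrative assumptions, not part of the paper): the noise variance is estimated as the mean of the discarded eigenvalues and [math]\displaystyle{ W_{ML} }[/math] is assembled from the top [math]\displaystyle{ q }[/math] eigenvectors of the sample covariance.

<pre>
import numpy as np

def ppca_ml(Y, q):
    """Closed-form ML estimate of the PPCA loadings (Tipping & Bishop, 1999).
    Y: centered data matrix of shape (n, p); q: latent dimension (q < p)."""
    n, p = Y.shape
    S = Y.T @ Y / n                                  # sample covariance, p x p
    eigval, eigvec = np.linalg.eigh(S)               # ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # reorder to descending
    sigma2 = eigval[q:].mean()                       # noise variance: mean of discarded eigenvalues
    L_q = np.diag(np.sqrt(eigval[:q] - sigma2))      # l_ii = (lambda_i - sigma^2)^{1/2}
    W_ml = eigvec[:, :q] @ L_q                       # take R = I (solution is rotation invariant)
    return W_ml, sigma2

# illustrative usage on synthetic centered data
rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
Y = Y - Y.mean(axis=0)
W_ml, sigma2 = ppca_ml(Y, q=2)
</pre>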

The underlying assumption of the model is that the dataset can be represented as [math]\displaystyle{ Y = XW^T+E }[/math], where [math]\displaystyle{ X \in \mathbb{R}^{n \times q} }[/math] is a matrix of [math]\displaystyle{ q }[/math]-dimensional latent variables and [math]\displaystyle{ E }[/math] is a matrix of noise variables with [math]\displaystyle{ e_{ij} \sim \mathcal{N} (0,\sigma^2) }[/math]. The marginal log-likelihood above is obtained by placing an independent isotropic prior on the elements of [math]\displaystyle{ X }[/math], with [math]\displaystyle{ x_{ij} \sim \mathcal{N}(0,1) }[/math]; a generative sketch of this assumption follows.
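As a concrete illustration of the generative assumption (the dimensions, seed, and noise level below are arbitrary choices, not values from the paper), one can sample data from the model and check that the sample covariance approaches [math]\displaystyle{ WW^T + \sigma^2 I }[/math]:

<pre>
import numpy as np

rng = np.random.default_rng(1)
n, p, q, sigma2 = 5000, 6, 2, 0.1                    # arbitrary illustrative sizes
W = rng.standard_normal((p, q))                      # loadings
X = rng.standard_normal((n, q))                      # latent variables, x_ij ~ N(0, 1)
E = np.sqrt(sigma2) * rng.standard_normal((n, p))    # noise, e_ij ~ N(0, sigma^2)
Y = X @ W.T + E                                      # generative model Y = X W^T + E

# the sample covariance should approach W W^T + sigma^2 I as n grows
print(np.abs(Y.T @ Y / n - (W @ W.T + sigma2 * np.eye(p))).max())
</pre>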

It is shown<ref name="lawerence2005"> Lawrence, N. D. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783–1816, 2005.

</ref> that the PCA solution is also obtained for log-likelihoods of the form

[math]\displaystyle{ \ln p(Y) = \sum_{j=1}^p \ln \mathcal{N} (y_{:,j}|0, XX^T+\sigma^2 I) }[/math]

This form is recovered when we marginalize the loadings [math]\displaystyle{ W }[/math], instead of the latent variables [math]\displaystyle{ X }[/math], under a Gaussian isotropic prior; it is the dual form of probabilistic PCA, analogous to dual PCA. As in the primal form, the maximum-likelihood solution [math]\displaystyle{ X_{ML} = U^'_q L_qR^T }[/math] solves for the latent coordinates rather than for the principal subspace basis. Here, [math]\displaystyle{ U^'_q }[/math] are the first [math]\displaystyle{ q }[/math] principal eigenvectors of the inner product matrix [math]\displaystyle{ p^{-1}YY^T }[/math], with [math]\displaystyle{ L_q }[/math] defined as before. Both the primal and dual scenarios maximize likelihoods with the same covariance structure, namely a low-rank term plus a spherical term. This paper considers the more general form
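A matching NumPy sketch of the dual solution (again an illustrative sketch with hypothetical names, taking [math]\displaystyle{ R = I }[/math] and estimating the noise variance analogously to the primal case) eigendecomposes the [math]\displaystyle{ n \times n }[/math] inner product matrix instead of the [math]\displaystyle{ p \times p }[/math] sample covariance:

<pre>
import numpy as np

def dual_ppca_ml(Y, q):
    """ML latent coordinates for dual PPCA.
    Y: centered data matrix of shape (n, p); q: latent dimension."""
    n, p = Y.shape
    K = Y @ Y.T / p                                  # inner product matrix, n x n
    eigval, eigvec = np.linalg.eigh(K)               # ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # reorder to descending
    sigma2 = eigval[q:].mean()                       # noise variance from discarded eigenvalues (assumed analogous to primal)
    L_q = np.diag(np.sqrt(eigval[:q] - sigma2))      # L_q defined as before
    X_ml = eigvec[:, :q] @ L_q                       # X_ML = U'_q L_q, taking R = I
    return X_ml, sigma2
</pre>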

[math]\displaystyle{ XX^T+\Sigma }[/math]

where [math]\displaystyle{ \Sigma }[/math] is a general positive-definite matrix.
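As a small illustration of this generalized covariance structure (this is only a likelihood evaluation under an assumed [math]\displaystyle{ X }[/math] and [math]\displaystyle{ \Sigma }[/math], not the paper's estimation procedure), each column of the data is modelled as a zero-mean Gaussian with covariance [math]\displaystyle{ XX^T+\Sigma }[/math]:

<pre>
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
n, p, q = 8, 20, 2                         # arbitrary illustrative sizes
X = rng.standard_normal((n, q))            # latent coordinates (low-rank factor)
A = rng.standard_normal((n, n))
Sigma = A @ A.T + np.eye(n)                # a general positive-definite matrix
C = X @ X.T + Sigma                        # generalized covariance: low-rank plus full-rank term

Y = rng.standard_normal((n, p))            # placeholder centered data; columns are y_{:,j}
mvn = multivariate_normal(mean=np.zeros(n), cov=C)
loglik = sum(mvn.logpdf(Y[:, j]) for j in range(p))   # sum of column log-densities
print(loglik)
</pre>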