Kernel Dimension Reduction in Regression

From statwiki
Revision as of 19:14, 16 July 2013

The problem of Sufficient Dimension Reduction (SDR) for regression is to find a subspace such that the response is conditionally independent of the covariates given the projection onto that subspace. Classical SDR regression methods need the marginal distribution of the explanatory variables to compute the independence measure. This paper proposes that conditional independence can be characterized in terms of conditional covariance operators on Reproducing Kernel Hilbert Spaces (RKHS). It is among the first papers on independence measures in RKHS (other recent methods include HSIC and dCor).


Sufficient Dimension Reduction (SDR)

The problem of SDR for regression is that of finding a subspace S such that the projection of the covariate vector X onto S captures the statistical dependency of the response Y on X [ref]. That is, [math]\displaystyle{ Y\perp X | \Pi_S X }[/math] where [math]\displaystyle{ \Pi_S X }[/math] denotes the orthogonal projection of X onto S. Performing a regression of X on Y generally requires making assumptions about the probability distribution of X, which can be hard to justify [ref]. Most previous regression methods assume a linear relationship between the projected X and Y; this assumption only holds when the distribution of X is elliptical.
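The SDR condition above can be made concrete with a toy model. The following is a minimal sketch (the direction B, the response function, and the noise level are hypothetical choices, not taken from the paper): Y depends on a 3-dimensional X only through the one-dimensional projection [math]\displaystyle{ B^TX }[/math], so directions orthogonal to B carry essentially no information about Y.

```python
import numpy as np

# Toy illustration of sufficient dimension reduction: Y depends on the
# 3-dimensional covariate X only through the 1-d projection U = B^T X.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
B = np.array([1.0, 0.0, 0.0])        # (hypothetical) true projection direction
U = X @ B                            # U = B^T X, the sufficient predictor
Y = U + 0.1 * rng.normal(size=n)     # Y | X depends only on U

# Correlation of Y with the sufficient direction is strong, while a
# direction orthogonal to B shows only sampling noise.
corr_signal = np.corrcoef(Y, X[:, 0])[0, 1]
corr_noise = np.corrcoef(Y, X[:, 1])[0, 1]
print(abs(corr_signal) > 0.9, abs(corr_noise) < 0.15)
```

Here linear correlation suffices because the toy response is linear in U; the point of the kernel approach below is to capture such dependence without that linearity assumption.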


Cross-covariance operator

The cross-covariance operator was first proposed by (Baker, 1973). It can be used to measure the relation between probability measures on two RKHSs. Define two RKHSs [math]\displaystyle{ H_1 }[/math] and [math]\displaystyle{ H_2 }[/math] with inner products [math]\displaystyle{ \langle\cdot,\cdot\rangle_1 }[/math] and [math]\displaystyle{ \langle\cdot,\cdot\rangle_2 }[/math]. A probability measure [math]\displaystyle{ \mu_i }[/math] on [math]\displaystyle{ H_i, i=1,2 }[/math] that satisfies

[math]\displaystyle{ \int_{H_i}||x||_i^2d\mu_i(x)\lt \infty }[/math]

defines an operator [math]\displaystyle{ R_i }[/math] in [math]\displaystyle{ H_i }[/math] by

[math]\displaystyle{ \langle R_iu,v\rangle =\int_{H_i}\langle x-m_i,u\rangle_i\langle x-m_i,v\rangle_i\,d\mu_i(x) }[/math]

where [math]\displaystyle{ m_i }[/math] is the mean element of [math]\displaystyle{ \mu_i }[/math]. [math]\displaystyle{ R_i }[/math] is called a covariance operator; if u and v lie in different RKHSs, the corresponding operator is called a cross-covariance operator.
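An empirical version of the cross-covariance operator can be computed from samples via centered Gram matrices; its squared Hilbert-Schmidt norm is the HSIC statistic mentioned in the introduction. The sketch below assumes Gaussian kernels with a hypothetical bandwidth and shows that the statistic is larger for dependent pairs than for independent ones.

```python
import numpy as np

def gram(x, sigma=1.0):
    # Gaussian-kernel Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def center(K):
    # H K H with H = I - (1/n) 11^T: mirrors subtracting E[f(X)], E[g(Y)]
    # in the definition of the cross-covariance operator.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def hs_norm2(Kx, Ky):
    # (1/n^2) Tr[K~x K~y]: squared Hilbert-Schmidt norm of the empirical
    # cross-covariance operator (the HSIC statistic).
    n = Kx.shape[0]
    return np.trace(center(Kx) @ center(Ky)) / n ** 2

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 1))
Y_dep = np.sin(X) + 0.1 * rng.normal(size=(n, 1))  # depends on X (nonlinearly)
Y_ind = rng.normal(size=(n, 1))                    # independent of X

dep = hs_norm2(gram(X), gram(Y_dep))
ind = hs_norm2(gram(X), gram(Y_ind))
print(dep > ind)
```

The statistic is zero in expectation (up to bias of order 1/n) when the variables are independent, which is what makes it usable as an independence measure.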


Kernel Dimension Reduction for Regression

This paper computes conditional independence measures using conditional covariance operators on RKHSs. No strong assumptions are needed on [math]\displaystyle{ P(Y|\Pi_S X) }[/math] or [math]\displaystyle{ P(X) }[/math]. Let [math]\displaystyle{ (H_x,k_x) }[/math] and [math]\displaystyle{ (H_y,k_y) }[/math] be RKHSs of functions on X and Y. The cross-covariance operator of (X,Y) from [math]\displaystyle{ H_x }[/math] to [math]\displaystyle{ H_y }[/math] is defined by [math]\displaystyle{ \langle \Sigma_{YX}f,g\rangle_{H_y} = E_{XY}[(f(X)-E_X[f(X)])(g(Y)-E_Y[g(Y)])] }[/math] which holds for all f in [math]\displaystyle{ H_x }[/math] and g in [math]\displaystyle{ H_y }[/math]. [math]\displaystyle{ \Sigma_{XX} }[/math] and [math]\displaystyle{ \Sigma_{YY} }[/math] can be defined likewise. An important theorem is that the conditional covariance operator satisfies

[math]\displaystyle{ \Sigma_{YY|X}=\Sigma_{YY}-\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY} }[/math]

Now assume an unknown projection matrix B that projects the n-dimensional X into a d-dimensional subspace by [math]\displaystyle{ U=B^TX }[/math]. The SDR criterion can be rewritten as [math]\displaystyle{ B = \arg \min_B \Sigma_{YY|U} }[/math]. Intuitively, when U captures most of the relationship between X and Y, Y is largely determined given U, so the conditional covariance operator should be small. It can be proved that [math]\displaystyle{ \Sigma_{YY|U}\geq\Sigma_{YY|X} }[/math] always holds in the partial order of self-adjoint operators, with equality [math]\displaystyle{ \Sigma_{YY|U}=\Sigma_{YY|X} }[/math] if and only if X and Y are conditionally independent given U.
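The criterion can be checked numerically. In the sketch below (toy data, Gaussian kernels, and the regularizer eps are all hypothetical choices), the trace of a regularized empirical [math]\displaystyle{ \Sigma_{YY|U} }[/math] is computed in Gram-matrix form; it is smaller when U uses the informative direction than when it uses an orthogonal, uninformative one.

```python
import numpy as np

def gram(x, sigma=1.0):
    # Gaussian-kernel Gram matrix.
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cond_cov_trace(U, Y, eps=1e-3):
    # Regularized empirical Tr[Sigma_YY - Sigma_YU (Sigma_UU + eps I)^{-1} Sigma_UY],
    # which reduces to eps * Tr[(G_U + n*eps*I)^{-1} G_Y] with centered Grams.
    n = U.shape[0]
    GU, GY = center(gram(U)), center(gram(Y))
    return eps * np.trace(np.linalg.solve(GU + n * eps * np.eye(n), GY))

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 2))
B = np.array([[1.0], [0.0]])                      # (hypothetical) true direction
Y = np.sin(X @ B) + 0.1 * rng.normal(size=(n, 1))

good = cond_cov_trace(X @ B, Y)                   # U along the true direction
bad = cond_cov_trace(X @ np.array([[0.0], [1.0]]), Y)  # orthogonal direction
print(good < bad)
```

This matches the intuition above: conditioning on an informative U leaves little residual covariance in Y, while an uninformative U leaves almost all of it.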

Define [math]\displaystyle{ \tilde{k}_X^{(i)} = k_x(\cdot,X_i)-\frac{1}{n}\sum_{j=1}^{n}k_x(\cdot,X_j) }[/math], the centered feature of sample [math]\displaystyle{ X_i }[/math], and let [math]\displaystyle{ G_X }[/math] be the centered Gram matrix with entries [math]\displaystyle{ (G_X)_{ij}=\langle\tilde{k}_X^{(i)},\tilde{k}_X^{(j)}\rangle }[/math]; define [math]\displaystyle{ G_Y }[/math] and [math]\displaystyle{ G_U }[/math] likewise.

Using empirical cross-covariance operators, which can be computed from samples, the problem can be written as

[math]\displaystyle{ \min_B \mathrm{Tr}\left[G_Y\left(G_U+n\varepsilon_n I_n\right)^{-1}\right] }[/math]

where [math]\displaystyle{ \varepsilon_n \gt 0 }[/math] is a regularization coefficient.
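As a minimal sketch of optimizing this empirical objective, the code below searches for a one-dimensional projection of two-dimensional data by brute force over the angle of B (a gradient method on the constraint set is used in practice; the grid search, toy data, kernel widths, and eps are simplifications assumed here) and recovers a direction close to the true one.

```python
import numpy as np

def gram(x, sigma=1.0):
    # Gaussian-kernel Gram matrix.
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def objective(B, X, GY, eps=1e-3):
    # Empirical KDR-style objective Tr[G_Y (G_U + n*eps*I)^{-1}] for U = X B.
    n = X.shape[0]
    GU = center(gram(X @ B))
    return np.trace(np.linalg.solve(GU + n * eps * np.eye(n), GY))

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 2))
Y = np.sin(X[:, [0]]) + 0.1 * rng.normal(size=(n, 1))  # true B = (1, 0)

GY = center(gram(Y))
# Brute-force search over the angle of the unit vector B.
angles = np.linspace(0.0, np.pi, 60, endpoint=False)
scores = [objective(np.array([[np.cos(t)], [np.sin(t)]]), X, GY)
          for t in angles]
best = angles[int(np.argmin(scores))]
print(round(best, 2))  # angle of the estimated direction (0 or pi = x-axis)
```

Since B and -B span the same subspace (and the Gaussian kernel is invariant to the sign flip), an estimate near 0 or near pi both mean the x-axis was recovered.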