Kernel Dimension Reduction in Regression
Introduction
The problem of Sufficient Dimension Reduction (SDR) for regression is to find a subspace such that the response is conditionally independent of the covariates given the projection of the covariates onto that subspace. Classical SDR methods need the marginal distribution of the explanatory variables in order to compute their dependence measures. This paper proposes that conditional independence can be characterized in terms of conditional covariance operators on Reproducing Kernel Hilbert Spaces (RKHS). It is among the first papers to measure independence in an RKHS (other recent methods include HSIC and dCor). No linearity or ellipticity conditions on the variables are needed.
Sufficient Dimension Reduction (SDR)
The problem of SDR for regression is that of finding a subspace S such that the projection of the covariate vector X onto S captures the statistical dependency of the response Y on X[ref]. That is, [math]\displaystyle{ Y\perp X \mid \Pi_S X, }[/math] where [math]\displaystyle{ \Pi_S X }[/math] denotes the orthogonal projection of X onto S. Performing a regression of X on Y generally requires making assumptions about the probability distribution of X, which can be hard to justify[ref]. Most previous SDR methods assume a linear relation between the projected X and Y, an assumption that holds when the distribution of X is elliptic. A toy example of the SDR setting is sketched below.
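As a concrete illustration, the following toy example (a minimal sketch in Python; the dimension, direction b, link function, and noise level are arbitrary choices for illustration, not from the paper) generates data in which Y depends on X only through a one-dimensional projection, so that [math]\displaystyle{ Y\perp X \mid \Pi_S X }[/math] holds by construction:
<pre>
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 4))                     # 4-dimensional covariates
b = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2.0)  # S = span{b}, a 1-d subspace
U = X @ b                                           # projection of X onto S
Y = np.sin(U) + 0.1 * rng.standard_normal(n)        # Y depends on X only through U
</pre>
An SDR method receives only (X, Y) and must recover the subspace S.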
Cross-covariance operator
The cross-covariance operator was first proposed by Baker (1973). It can be used to measure the relation between probability measures on two RKHSs. Define two RKHSs [math]\displaystyle{ H_1 }[/math] and [math]\displaystyle{ H_2 }[/math] with inner products [math]\displaystyle{ \langle\cdot,\cdot\rangle_1 }[/math] and [math]\displaystyle{ \langle\cdot,\cdot\rangle_2 }[/math]. A probability measure [math]\displaystyle{ \mu_i }[/math] on [math]\displaystyle{ H_i,\ i=1,2, }[/math] that satisfies [math]\displaystyle{ \int_{H_i}\|x\|_i^2\,d\mu_i(x)\lt \infty }[/math] defines an operator [math]\displaystyle{ R_i }[/math] on [math]\displaystyle{ H_i }[/math] by [math]\displaystyle{ \langle R_i u,v\rangle_i=\int_{H_i}\langle x-m_i,u\rangle_i\langle x-m_i,v\rangle_i\,d\mu_i(x), }[/math] where [math]\displaystyle{ m_i }[/math] is the mean of [math]\displaystyle{ \mu_i }[/math]. [math]\displaystyle{ R_i }[/math] is called a covariance operator; if u and v lie in different RKHSs, the corresponding operator is called a cross-covariance operator.
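To make the definition concrete, here is a minimal sketch of the empirical counterpart (the helper names and the Gaussian RBF kernel with bandwidth sigma are assumptions for illustration, not from the paper): for samples [math]\displaystyle{ (x_i,y_i)_{i=1}^n }[/math] and functions f, g expanded over the samples, the empirical cross-covariance is just the sample covariance of f(X) and g(Y).
<pre>
import numpy as np

def rbf_gram(Z, sigma=1.0):
    """Gram matrix of the Gaussian RBF kernel exp(-||z - z'||^2 / (2 sigma^2))."""
    Z = np.asarray(Z, dtype=float).reshape(len(Z), -1)
    sq = np.sum(Z * Z, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def empirical_cross_cov(Kx, Ky, alpha, beta):
    """Empirical <g, Sigma_YX f> for f = sum_i alpha_i k_x(., x_i) and
    g = sum_j beta_j k_y(., y_j): the sample covariance of f(X) and g(Y)."""
    n = Kx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    f_vals, g_vals = Kx @ alpha, Ky @ beta    # f and g evaluated at the samples
    return (H @ f_vals) @ (H @ g_vals) / n
</pre>
Centering with H plays the role of subtracting the means [math]\displaystyle{ m_i }[/math] in the population definition.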
Kernel Dimension Reduction for Regression
This paper computes conditional independence measures using conditional covariance operators on RKHSs. No strong assumptions are needed on [math]\displaystyle{ P(Y|\Pi_S X) }[/math] or P(X). Let [math]\displaystyle{ (H_x,k_x) }[/math] and [math]\displaystyle{ (H_y,k_y) }[/math] be RKHSs of functions on X and Y. The cross-covariance operator of (X,Y) from [math]\displaystyle{ H_x }[/math] to [math]\displaystyle{ H_y }[/math] is defined by [math]\displaystyle{ \langle g,\Sigma_{YX}f\rangle_{H_y}=E_{XY}\big[(f(X)-E_X[f(X)])(g(Y)-E_Y[g(Y)])\big] }[/math] for all f in [math]\displaystyle{ H_x }[/math] and g in [math]\displaystyle{ H_y }[/math]. The covariance operators [math]\displaystyle{ \Sigma_{XX},\Sigma_{YY} }[/math] are defined likewise. An important result is that the conditional covariance operator can be written as [math]\displaystyle{ \Sigma_{YY|X}=\Sigma_{YY}-\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}. }[/math]
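In practice the operators are estimated from centered Gram matrices, and the size of [math]\displaystyle{ \Sigma_{YY|U} }[/math] can be summarized by a regularized trace. The sketch below (reusing the hypothetical rbf_gram helper above; the ridge term eps is an assumption needed to invert the finite-sample operator, and the normalization follows one common empirical form rather than the paper's exact notation) computes such a trace criterion, which is small when U explains Y well:
<pre>
def cond_cov_trace(Ku, Ky, eps=1e-3):
    """Tr[ Gy (Gu + n*eps*I)^{-1} ]: a regularized trace summary of the
    empirical conditional covariance of Y given U (small = U explains Y)."""
    n = Ku.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Gu, Gy = H @ Ku @ H, H @ Ky @ H                      # centered Gram matrices
    return np.trace(np.linalg.solve(Gu + n * eps * np.eye(n), Gy))
</pre>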
Now assume an unknown projection matrix B that maps the n-dimensional X to a d-dimensional vector via [math]\displaystyle{ U=B^TX }[/math]. The SDR criterion can then be rewritten as [math]\displaystyle{ B = \arg \min_B \Sigma_{YY|U}, }[/math] where the minimization is with respect to the partial order of self-adjoint operators. Intuitively, when U captures most of the relationship between X and Y, Y is largely determined given U, so the conditional covariance operator should be small. It can be proved that [math]\displaystyle{ \Sigma_{YY|U}\geq\Sigma_{YY|X} }[/math] always holds, and that equality [math]\displaystyle{ \Sigma_{YY|U}=\Sigma_{YY|X} }[/math] holds if and only if Y and X are conditionally independent given U.
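Because [math]\displaystyle{ \Sigma_{YY|U} }[/math] depends on B only through the Gram matrix of [math]\displaystyle{ U=B^TX }[/math], B can be sought by numerically minimizing the trace criterion above. The following sketch uses finite-difference gradients and a QR step to keep B column-orthonormal; these are illustration choices, not necessarily the paper's optimization procedure:
<pre>
def kdr_fit(X, Y, d, sigma=1.0, eps=1e-3, lr=0.05, iters=100, seed=0):
    """Sketch of B = argmin_B of the trace criterion, with U = B^T X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape                                    # n samples, p ambient dims
    B, _ = np.linalg.qr(rng.standard_normal((p, d)))  # random orthonormal start
    Ky = rbf_gram(Y, sigma)
    obj = lambda M: cond_cov_trace(rbf_gram(X @ M, sigma), Ky, eps)
    h = 1e-4
    for _ in range(iters):
        G = np.zeros_like(B)
        for i in range(p):                            # finite-difference gradient
            for j in range(d):
                E = np.zeros_like(B); E[i, j] = h
                G[i, j] = (obj(B + E) - obj(B - E)) / (2.0 * h)
        B, _ = np.linalg.qr(B - lr * G)               # step, then re-orthonormalize
    return B

# On the toy data above, kdr_fit(X, Y, d=1) should recover span{b} (up to sign).
</pre>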