Measuring and testing dependence by correlation of distances
== Background ==
dCov (distance covariance) is a measure of dependence between two random vectors. For all random variables with finite first moments, the dCov coefficient generalizes the idea of correlation in two ways. First, dCov can be applied to any two random vectors, even when the two vectors have different dimensions. Second, dCov is equal to zero if and only if the two variables are independent. These properties are very similar to those of HSIC (the Hilbert-Schmidt Independence Criterion). The relationship between the two coefficients was an open question for some time; recent research [2] shows that HSIC is an extension of dCov.
== Definition ==
The dCov coefficient is defined as a weighted <math>L_2</math> distance between the joint characteristic function and the product of the marginal characteristic functions of the random vectors. The choice of the weights is crucial and is what ensures the independence property.
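For reference, the explicit weighted-<math>L_2</math> form (taken from Székely et al. [1]; it is not spelled out in the summary above) is, for <math>X\in\mathbb{R}^p</math> and <math>Y\in\mathbb{R}^q</math> with characteristic functions <math>f_X, f_Y</math> and joint characteristic function <math>f_{X,Y}</math>,
<math>
\mathcal{V}^2(X,Y)=\int_{\mathbb{R}^{p+q}}\frac{|f_{X,Y}(t,s)-f_X(t)f_Y(s)|^2}{c_p\,c_q\,|t|_p^{1+p}\,|s|_q^{1+q}}\,dt\,ds
</math>
where <math>c_p</math> and <math>c_q</math> are constants that depend only on the dimensions <math>p</math> and <math>q</math>, and <math>|t|_p</math> denotes the Euclidean norm in <math>\mathbb{R}^p</math>. This particular weight is what makes <math>\mathcal{V}^2(X,Y)=0</math> hold if and only if <math>X</math> and <math>Y</math> are independent. The dCov coefficient can also be written in terms of the expectations of Euclidean distances, which is easier to interpret: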
<math>
\mathcal{V}^2=E[|X_1-X_2||Y_1-Y_2|]+E[|X_1-X_2|]\,E[|Y_1-Y_2|]-2E[|X_1-X_2||Y_1-Y_3|]
</math>
in which <math>(X_1,Y_1), (X_2,Y_2), (X_3,Y_3)</math> are independent copies of the pair of random variables <math>(X,Y)</math>, and <math>|X_1-X_2|</math> denotes the Euclidean distance between the two copies. A straightforward empirical estimate <math>\mathcal{V}_n^2</math>, computed from <math>n</math> paired samples, is known as <math>dCov_n^2(X,Y)</math>:
<math>
\begin{align}
dCov_n^2(X,Y)&=\frac{1}{n^2}\sum_{i,j=1}^n d_{ij}^X d_{ij}^Y+d_{..}^X d_{..}^Y-\frac{2}{n}\sum_{i=1}^n d_{i.}^X d_{i.}^Y\\
&=\frac{1}{n^2}\sum_{i,j=1}^n (d_{ij}^X-d_{i.}^X-d_{.j}^X+d_{..}^X)(d_{ij}^Y-d_{i.}^Y-d_{.j}^Y+d_{..}^Y)
\end{align}
</math>
Here <math>d_{ij}^X=|X_i-X_j|</math> is the pairwise Euclidean distance between samples, <math>d_{i.}^X</math> and <math>d_{.j}^X</math> are the row and column means of the distance matrix, and <math>d_{..}^X</math> is its grand mean; the quantities for <math>Y</math> are defined in the same way.
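The double-centering form of the estimator translates directly into a few lines of NumPy. The following is a minimal sketch (not code from the paper; the function name <code>dcov_sq</code> and all variable names are our own), assuming the samples are passed as arrays with one row per observation:
<pre>
import numpy as np

def dcov_sq(X, Y):
    """Empirical squared distance covariance dCov_n^2(X, Y).

    X and Y hold n paired observations; their dimensions p and q
    may differ, as in the population definition above.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    n = X.shape[0]

    # Pairwise Euclidean distance matrices d_ij^X and d_ij^Y.
    dX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)

    # Double centering: d_ij - d_i. - d_.j + d_.. for each matrix.
    A = dX - dX.mean(axis=1, keepdims=True) - dX.mean(axis=0, keepdims=True) + dX.mean()
    B = dY - dY.mean(axis=1, keepdims=True) - dY.mean(axis=0, keepdims=True) + dY.mean()

    # Second form of the estimator: (1/n^2) * sum of elementwise products.
    return (A * B).sum() / n**2
</pre>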
Once <math>dCov_n^2(X,Y)</math> is defined, the corresponding distance correlation coefficient can be defined:
<math>
dCor_n^2(X,Y)=\frac{\langle C\Delta_XC,\,C\Delta_YC\rangle}{||C\Delta_XC||\,||C\Delta_YC||}
</math>
where <math>C=I-\frac{1}{n}\mathbf{1}\mathbf{1}^T</math> is the centering matrix, <math>\Delta_X</math> and <math>\Delta_Y</math> are the <math>n\times n</math> pairwise Euclidean distance matrices of the samples, and <math>\langle\cdot,\cdot\rangle</math> and <math>||\cdot||</math> are the Frobenius inner product and norm; note that <math>C\Delta_XC</math> is exactly the double-centered matrix appearing in the second form of <math>dCov_n^2</math>. Because dCor is built from Euclidean distances rather than squared Euclidean distances, it is able to detect non-linear dependence, not just linear correlation.
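Building on the <code>dcov_sq</code> sketch above, <math>dCor_n</math> is just a normalized ratio. The toy example below (our own illustration, not an experiment from the paper) shows the point about non-linear dependence:
<pre>
def dcor(X, Y):
    """Empirical distance correlation dCor_n(X, Y), a value in [0, 1]."""
    vxy = dcov_sq(X, Y)
    denom = np.sqrt(dcov_sq(X, X) * dcov_sq(Y, Y))
    return np.sqrt(vxy / denom) if denom > 0 else 0.0

# Y = X^2 on a symmetric interval: the Pearson correlation is near 0,
# but X and Y are clearly dependent, and dCor reflects that.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = x ** 2
print(np.corrcoef(x, y)[0, 1])  # close to 0
print(dcor(x, y))               # well above 0
</pre>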
== Experiment ==
[[File:Dcor-1.png]]
[[File:Dcor-2.png]]
== References ==
[1] Székely, Gábor J., Maria L. Rizzo, and Nail K. Bakirov. "Measuring and testing dependence by correlation of distances." The Annals of Statistics 35.6 (2007): 2769-2794.

[2] Sejdinovic, Dino, et al. "Equivalence of distance-based and RKHS-based statistics in hypothesis testing." arXiv preprint arXiv:1207.6076 (2012).

[3] Fukumizu, Kenji, Francis R. Bach, and Michael I. Jordan. "Kernel dimension reduction in regression." The Annals of Statistics 37.4 (2009): 1871-1905.
