Measuring and testing dependence by correlation of distances

Background

Distance covariance (dCov) is another measure of dependence between two random vectors. For random vectors with finite first moments, the dCov coefficient generalizes the idea of correlation in two ways. First, dCov can be applied to two random vectors of arbitrary dimensions, which need not be equal. Second, dCov is equal to zero if and only if the two vectors are independent. These features are very similar to those of HSIC (the Hilbert-Schmidt Independence Criterion). The relationship between the two coefficients had been an open question; recent work [2] shows that HSIC is an extension of dCov.

Definition

The dCov coefficient is defined as a weighted L2 distance between the joint characteristic function and the product of the marginal characteristic functions of the random vectors. The choice of the weight function is crucial and ensures the independence property. The dCov coefficient can also be written in terms of expectations of Euclidean distances, which is easier to interpret:

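$\mathcal{V}^2=E[|X_1-X_2||Y_1-Y_2|]+E[|X_1-X_2|]E[|Y_1-Y_2|]-2E[|X_1-X_2||Y_1-Y_3|]$, in which $(X_1,Y_1),(X_2,Y_2),(X_3,Y_3)$ are independent copies of the pair of random variables $(X,Y)$ and $|X_1-X_2|$ is the Euclidean distance. A straightforward empirical estimate $\mathcal{V}_n^2$ is known as $dCov_n^2(X,Y)$. Write $d_{ij}^X=|x_i-x_j|$ for the pairwise distances in a sample of size $n$, $d_{i.}^X$ and $d_{.j}^X$ for the row and column means of this distance matrix, and $d_{..}^X$ for its overall mean (and likewise for $Y$). Then: $dCov_n^2(X,Y)=\frac{1}{n^2}\sum_{i,j=1}^n d_{ij}^Xd_{ij}^Y+d_{..}^Xd_{..}^Y-2\frac{1}{n}\sum_{i=1}^n d_{i.}^Xd_{i.}^Y$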

$=\frac{1}{n^2}\sum_{i,j=1}^n (d_{ij}^X-d_{i.}^X-d_{.j}^X+d_{..}^X)(d_{ij}^Y-d_{i.}^Y-d_{.j}^Y+d_{..}^Y)$
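The two expressions for $dCov_n^2(X,Y)$ can be checked against each other numerically. The following is a minimal sketch using NumPy, not a reference implementation; the function names (`pairwise_distances`, `dcov2_sums`, `dcov2_centered`) are made up for illustration, and samples are assumed to be stored one observation per row.

```python
import numpy as np

def pairwise_distances(a):
    """Return the n x n matrix of Euclidean distances between rows of a."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:                      # treat a 1-D sample as n points in R^1
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def dcov2_sums(x, y):
    """First form: mean of products + product of grand means - 2 * mean of row-mean products."""
    dx, dy = pairwise_distances(x), pairwise_distances(y)
    term1 = (dx * dy).mean()                             # (1/n^2) sum_ij d_ij^X d_ij^Y
    term2 = dx.mean() * dy.mean()                        # d_..^X d_..^Y
    term3 = (dx.mean(axis=1) * dy.mean(axis=1)).mean()   # (1/n) sum_i d_i.^X d_i.^Y
    return term1 + term2 - 2.0 * term3

def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return d - d.mean(axis=1, keepdims=True) - d.mean(axis=0, keepdims=True) + d.mean()

def dcov2_centered(x, y):
    """Second form: mean of the entrywise product of the double-centered distance matrices."""
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    return (a * b).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 3))         # sample in R^3
    y = rng.normal(size=(50, 2))         # sample in R^2: dimensions may differ
    print(dcov2_sums(x, y), dcov2_centered(x, y))
```

Both functions build the full $n\times n$ distance matrices, so this sketch needs $O(n^2)$ memory and is only meant for small samples.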

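Once $dCov_n^2(X,Y)$ is defined, the distance correlation coefficient can be defined: $dCor_n^2(X,Y)=\frac{\langle C\Delta_XC,C\Delta_YC\rangle }{\|C\Delta_XC\| \|C\Delta_YC\|}$, where $\Delta_X$ and $\Delta_Y$ are the $n\times n$ matrices of pairwise Euclidean distances, $C=I-\frac{1}{n}\mathbf{1}\mathbf{1}^T$ is the centering matrix, $\langle A,B\rangle=\mathrm{tr}(A^TB)$, and $\|\cdot\|$ is the corresponding (Frobenius) norm. Because the distance used in dCor is the Euclidean distance rather than the squared Euclidean distance, dCor is able to detect nonlinear dependence.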
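To illustrate the nonlinear case, the sketch below computes $dCor_n^2$ from centered distance matrices and compares it with the Pearson correlation on a parabola. It is an illustration under stated assumptions rather than a definitive implementation: it assumes $C$ is the usual centering matrix $I-\frac{1}{n}\mathbf{1}\mathbf{1}^T$ and the inner product is the Frobenius one, and the names `distance_matrix` and `dcor2` are hypothetical.

```python
import numpy as np

def distance_matrix(a):
    """Return the n x n matrix of Euclidean distances between rows of a."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:                      # treat a 1-D sample as n points in R^1
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def dcor2(x, y):
    """Squared sample distance correlation via centered distance matrices."""
    dx, dy = distance_matrix(x), distance_matrix(y)
    n = dx.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n  # assumed centering matrix C = I - (1/n) 1 1^T
    a, b = c @ dx @ c, c @ dy @ c
    denom = np.linalg.norm(a) * np.linalg.norm(b)   # Frobenius norms
    if denom == 0:                       # a constant sample has no distance variance
        return 0.0
    return float((a * b).sum() / denom)  # Frobenius inner product over norms

if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 201)
    y = x ** 2                           # y is a deterministic, nonlinear function of x
    print("Pearson correlation:", np.corrcoef(x, y)[0, 1])
    print("dCor^2:", dcor2(x, y))
```

On this symmetric grid the Pearson correlation is zero up to rounding error, while $dCor_n^2$ is clearly positive, which is the behaviour described above.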

References

[1] Székely, Gábor J., Maria L. Rizzo, and Nail K. Bakirov. "Measuring and testing dependence by correlation of distances." The Annals of Statistics 35.6 (2007): 2769-2794.

[2] Sejdinovic, Dino, et al. "Equivalence of distance-based and RKHS-based statistics in hypothesis testing." arXiv preprint arXiv:1207.6076 (2012).

[3] Fukumizu, Kenji, Francis R. Bach, and Michael I. Jordan. "Kernel dimension reduction in regression." The Annals of Statistics 37.4 (2009): 1871-1905.