Measuring statistical dependence with Hilbert-Schmidt norms

This is another popular kernel-based approach for detecting dependence, called HSIC (Hilbert-Schmidt Independence Criterion). It is based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs). The empirical estimate is simple to compute and needs no user-defined regularisation. It approaches a well-defined population quantity in the large-sample limit, with exponential convergence guaranteed between the two, so independence tests based on HSIC do not suffer from slow learning rates.

Background

Before HSIC was proposed, several kernel-based methods for detecting independence already existed. Bach and Jordan [3] proposed a regularised correlation operator derived from the covariance and cross-covariance operators, and used its largest singular value as a statistic to test independence. Gretton et al. [] used the largest singular value of the cross-covariance operator, which resulted in the constrained covariance (COCO). HSIC extends COCO by using the entire spectrum of the cross-covariance operator to determine when all its singular values are zero, rather than looking only at the largest singular value.

Cross-Covariance Operators

Hilbert-Schmidt Norm. Denote by $\mathit{C}:\mathcal{G}\to\mathcal{F}$ a linear operator. Provided the sum converges, the HS norm of $\mathit{C}$ is defined as

$\|\mathit{C}\|^2_{HS}:=\sum_{i,j}\langle \mathit{C}v_i,u_j\rangle_\mathcal{F}^2$

where $v_i$ and $u_j$ are orthonormal bases of $\mathcal{G}$ and $\mathcal{F}$ respectively.
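In finite dimensions the definition above can be checked numerically. The following is a minimal sketch, assuming $\mathcal{G}=\mathbb{R}^4$ and $\mathcal{F}=\mathbb{R}^3$ so that $\mathit{C}$ is a $3\times 4$ matrix; the helper name `hs_norm_squared` is hypothetical. It illustrates that the sum does not depend on the choice of orthonormal bases and coincides with the squared Frobenius norm.

```python
import numpy as np

def hs_norm_squared(C, U, V):
    """Sum over i, j of <C v_i, u_j>^2.

    C is a matrix mapping R^n (standing in for G) to R^m (standing in for F).
    The columns of V form an orthonormal basis of R^n, the columns of U an
    orthonormal basis of R^m.
    """
    coeffs = U.T @ C @ V  # entry (j, i) equals <C v_i, u_j>
    return float(np.sum(coeffs ** 2))

rng = np.random.default_rng(0)
C = rng.standard_normal((3, 4))

# Standard bases of R^3 and R^4.
hs_standard = hs_norm_squared(C, np.eye(3), np.eye(4))

# Random orthonormal bases obtained from QR decompositions.
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
hs_rotated = hs_norm_squared(C, U, V)

# In finite dimensions the HS norm is the Frobenius norm.
frobenius_squared = float(np.linalg.norm(C, "fro") ** 2)

print(hs_standard, hs_rotated, frobenius_squared)
```

All three printed values should agree up to floating-point error.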

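Hilbert-Schmidt Operator. A linear operator $\mathit{C}:\mathcal{G}\to\mathcal{F}$ is called a Hilbert-Schmidt operator if its HS norm exists. The Hilbert-Schmidt operators form a Hilbert space whose inner product is

$\langle \mathit{C},\mathit{D}\rangle_{HS}:=\sum_{i,j}\langle \mathit{C}v_i,u_j\rangle_\mathcal{F}\langle \mathit{D}v_i,u_j\rangle_\mathcal{F}$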
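As a finite-dimensional illustration of this inner product, the sketch below assumes both operators are $3\times 4$ matrices expressed in the standard bases, so the coefficients $\langle \mathit{C}v_i,u_j\rangle$ are just the matrix entries. The function name `hs_inner` is hypothetical.

```python
import numpy as np

def hs_inner(C, D):
    """HS inner product of two matrices in the standard bases.

    With standard bases, <C v_i, u_j> is the entry C[j, i], so the double
    sum reduces to the sum of elementwise products.
    """
    return float(np.sum(C * D))

rng = np.random.default_rng(1)
C = rng.standard_normal((3, 4))
D = rng.standard_normal((3, 4))

inner_cd = hs_inner(C, D)
trace_form = float(np.trace(C.T @ D))  # equivalent trace expression
norm_from_inner = hs_inner(C, C)       # <C, C>_HS recovers the squared HS norm

print(inner_cd, trace_form, norm_from_inner)
```

The first two values should match, and the third should equal the squared Frobenius norm of `C`.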

Tensor Product. Let $f\in \mathcal{F}$ and $g\in \mathcal{G}$. The tensor product operator $f\otimes g:\mathcal{G}\to \mathcal{F}$ is defined as

$(f\otimes g)h:=f\langle g,h\rangle_\mathcal{G}$ for all $h\in \mathcal{G}$
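For vectors in Euclidean space the tensor product operator is the outer product matrix. The sketch below assumes $\mathcal{F}=\mathbb{R}^2$ and $\mathcal{G}=\mathbb{R}^3$ (the name `tensor_product` is hypothetical) and checks the defining property, together with the standard consequence $\|f\otimes g\|_{HS}=\|f\|\,\|g\|$.

```python
import numpy as np

def tensor_product(f, g):
    """Matrix of the operator f (x) g : G -> F, i.e. the outer product f g^T."""
    return np.outer(f, g)

f = np.array([1.0, 2.0])        # element of F = R^2
g = np.array([3.0, 0.0, 4.0])   # element of G = R^3
h = np.array([1.0, 1.0, 1.0])   # element of G = R^3

applied = tensor_product(f, g) @ h   # (f (x) g) h
expected = f * np.dot(g, h)          # f <g, h>, here [1, 2] * 7 = [7, 14]

hs_norm = np.linalg.norm(tensor_product(f, g), "fro")
product_of_norms = np.linalg.norm(f) * np.linalg.norm(g)  # sqrt(5) * 5

print(applied, expected, hs_norm, product_of_norms)
```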

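Cross-Covariance Operator. Let $\theta:\mathscr{X}\to\mathcal{F}$ and $\psi:\mathscr{Y}\to\mathcal{G}$ be the feature maps of the two RKHSs, with mean elements $\mu_x:=E_x[\theta(x)]$ and $\mu_y:=E_y[\psi(y)]$. The cross-covariance operator associated with the joint measure $p_{x,y}$ on $(\mathscr{X}\times\mathscr{Y},\mathscr{\Gamma}\times\mathscr{\Lambda})$ is a linear operator $C_{xy}:\mathcal{G}\to \mathcal{F}$ defined as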

$C_{xy}:=E_{x,y}[(\theta (x)-\mu_x)\otimes (\psi (y)-\mu_y)]=E_{x,y}[\theta (x)\otimes \psi (y)]-\mu_x\otimes\mu_y$
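The two expressions for $C_{xy}$ can be compared on a sample by replacing expectations with sample averages. The sketch below is only an illustration under strong assumptions: it uses a hypothetical explicit feature map $z\mapsto(z,z^2)$ for both variables instead of a genuine RKHS feature map, and the names `feature_map` and `empirical_cross_covariance` are made up for this example. It also computes the squared HS norm of the resulting matrix for a dependent pair and an independent pair.

```python
import numpy as np

def feature_map(z):
    """Hypothetical finite-dimensional feature map z -> (z, z^2)."""
    return np.stack([z, z ** 2], axis=1)  # shape (m, 2)

def empirical_cross_covariance(x, y):
    """Sample version of C_xy, computed in both forms of the definition."""
    theta = feature_map(x)   # rows are theta(x_k)
    psi = feature_map(y)     # rows are psi(y_k)
    m = len(x)
    mu_x = theta.mean(axis=0)
    mu_y = psi.mean(axis=0)
    # Average of (theta(x) - mu_x) (x) (psi(y) - mu_y)
    centred = (theta - mu_x).T @ (psi - mu_y) / m
    # Average of theta(x) (x) psi(y), minus mu_x (x) mu_y
    uncentred = theta.T @ psi / m - np.outer(mu_x, mu_y)
    return centred, uncentred

rng = np.random.default_rng(0)
m = 2000
x = rng.standard_normal(m)
y_dependent = x + 0.1 * rng.standard_normal(m)   # strongly dependent on x
y_independent = rng.standard_normal(m)           # drawn independently of x

C_dep, C_dep_alt = empirical_cross_covariance(x, y_dependent)
C_ind, _ = empirical_cross_covariance(x, y_independent)

hs_dep = float(np.sum(C_dep ** 2))   # squared HS norm, dependent pair
hs_ind = float(np.sum(C_ind ** 2))   # squared HS norm, independent pair

print(np.allclose(C_dep, C_dep_alt), hs_dep, hs_ind)
```

With this polynomial feature map the operator only captures dependence visible in first and second moments; the squared HS norm should be close to zero for the independent pair and clearly larger for the dependent one.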

References

[1] Gretton, Arthur, et al. "Measuring statistical dependence with Hilbert-Schmidt norms." Algorithmic learning theory. Springer Berlin Heidelberg, 2005.

[2] Fukumizu, Kenji, Francis R. Bach, and Michael I. Jordan. "Kernel dimension reduction in regression." The Annals of Statistics 37.4 (2009): 1871-1905.

[3] Bach, Francis R., and Michael I. Jordan. "Kernel independent component analysis." The Journal of Machine Learning Research 3 (2003): 1-48.

[4] Baker, Charles R. "Joint measures and cross-covariance operators." Transactions of the American Mathematical Society 186 (1973): 273-289.