measuring statistical dependence with Hilbert-Schmidt norms

From statwiki
Revision as of 18:10, 14 August 2013 by L274wang (talk | contribs)

HSIC (the Hilbert-Schmidt Independence Criterion) is another popular kernel-based approach for detecting dependence. It is based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs). The empirical estimate is simple and requires no user-defined regularisation. Convergence of the empirical estimate to its population value is guaranteed to be exponential, so it is fast.

Background

Before HSIC was proposed, there were already a few kernel-based methods for detecting independence. Bach and Jordan [3] proposed a regularised correlation operator derived from the covariance and cross-covariance operators, and used its largest singular value as a statistic to test independence. Gretton et al.[] used the largest singular value of the cross-covariance operator itself, which gives the constrained covariance (COCO). HSIC extends COCO by using the entire spectrum of the cross-covariance operator: it determines when all of its singular values are zero, rather than looking only at the largest one.

Cross-Covariance Operators

Hilbert-Schmidt Norm. Denote by [math]\displaystyle{ \mathit{C}:\mathcal{G}\to\mathcal{F} }[/math] a linear operator. Provided the sum converges, the HS norm of [math]\displaystyle{ \mathit{C} }[/math] is defined as

[math]\displaystyle{ \|\mathit{C}\|^2_{HS}:=\sum_{i,j}\langle \mathit{C}v_i,u_j\rangle_\mathcal{F}^2 }[/math]

where [math]\displaystyle{ v_i }[/math] and [math]\displaystyle{ u_j }[/math] are orthonormal bases of [math]\displaystyle{ \mathcal{G} }[/math] and [math]\displaystyle{ \mathcal{F} }[/math] respectively.
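As an illustration, the definition can be checked numerically in finite dimensions. The sketch below assumes that [math]\displaystyle{ \mathcal{G}=\mathbb{R}^4 }[/math] and [math]\displaystyle{ \mathcal{F}=\mathbb{R}^3 }[/math], so that the operator is an ordinary matrix. The function name hs_norm_squared and the random test data are hypothetical and are not part of the original article. Under these assumptions the HS norm does not depend on the choice of orthonormal bases, and it coincides with the Frobenius norm of the matrix.

```python
import numpy as np

def hs_norm_squared(C, V, U):
    """Sum of <C v_i, u_j>^2 over two orthonormal bases.

    C: (p, q) matrix representing an operator from R^q to R^p.
    V: (q, q) matrix whose columns v_i are an orthonormal basis of R^q.
    U: (p, p) matrix whose columns u_j are an orthonormal basis of R^p.
    """
    coeffs = U.T @ C @ V  # entry (j, i) equals <C v_i, u_j>
    return float(np.sum(coeffs ** 2))

rng = np.random.default_rng(0)
C = rng.standard_normal((3, 4))

# Random orthonormal bases obtained from QR factorisations (illustrative choice).
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))

standard = hs_norm_squared(C, np.eye(4), np.eye(3))  # standard bases
rotated = hs_norm_squared(C, V, U)                   # rotated bases
frobenius = np.linalg.norm(C, "fro") ** 2

print(standard, rotated, frobenius)  # the three values agree up to rounding
```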

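Hilbert-Schmidt Operator. A linear operator [math]\displaystyle{ \mathit{C}:\mathcal{G}\to\mathcal{F} }[/math] is called a Hilbert-Schmidt operator if its HS norm exists. The set of Hilbert-Schmidt operators from [math]\displaystyle{ \mathcal{G} }[/math] to [math]\displaystyle{ \mathcal{F} }[/math] is itself a Hilbert space, with inner product

[math]\displaystyle{ \langle \mathit{C},\mathit{D}\rangle_{HS}:=\sum_{i,j}\langle \mathit{C}v_i,u_j\rangle_\mathcal{F}\langle \mathit{D}v_i,u_j\rangle_\mathcal{F} }[/math]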
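The pairing above can also be sketched in finite dimensions. Assuming again that the operators are 3 by 4 matrices (an assumption made only for this illustration, with the hypothetical helper hs_inner), the double sum reduces to the trace of [math]\displaystyle{ \mathit{C}^\top \mathit{D} }[/math], and pairing an operator with itself recovers its squared HS norm.

```python
import numpy as np

def hs_inner(C, D, V, U):
    """Sum over i, j of <C v_i, u_j> <D v_i, u_j> for orthonormal bases V, U."""
    coeffs_C = U.T @ C @ V  # entry (j, i) equals <C v_i, u_j>
    coeffs_D = U.T @ D @ V  # entry (j, i) equals <D v_i, u_j>
    return float(np.sum(coeffs_C * coeffs_D))

rng = np.random.default_rng(1)
C = rng.standard_normal((3, 4))
D = rng.standard_normal((3, 4))

# Arbitrary orthonormal bases of R^4 and R^3 (illustrative choice).
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))

via_bases = hs_inner(C, D, V, U)
via_trace = float(np.trace(C.T @ D))  # basis-free expression for matrices
norm_sq = hs_inner(C, C, V, U)        # <C, C>_HS is the squared HS norm

print(via_bases, via_trace)
```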

Tensor Product. Let [math]\displaystyle{ f\in \mathcal{F} }[/math] and [math]\displaystyle{ g\in \mathcal{G} }[/math]. The tensor product operator [math]\displaystyle{ f\otimes g:\mathcal{G}\to \mathcal{F} }[/math] is defined as

[math]\displaystyle{ (f\otimes g)h:=f\langle g,h\rangle_\mathcal{G} }[/math] for all [math]\displaystyle{ h\in \mathcal{G} }[/math]
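For finite-dimensional vectors the tensor product operator is simply the outer product matrix. The sketch below assumes [math]\displaystyle{ \mathcal{F}=\mathbb{R}^3 }[/math] and [math]\displaystyle{ \mathcal{G}=\mathbb{R}^2 }[/math], and the helper tensor_product is a hypothetical name. It checks the defining identity and also that the squared HS norm of [math]\displaystyle{ f\otimes g }[/math] equals [math]\displaystyle{ \|f\|^2_\mathcal{F}\|g\|^2_\mathcal{G} }[/math].

```python
import numpy as np

def tensor_product(f, g):
    """Matrix of the operator f (x) g from G to F, acting as h -> f * <g, h>."""
    return np.outer(f, g)

f = np.array([1.0, 2.0, 3.0])  # element of F = R^3
g = np.array([0.5, -1.0])      # element of G = R^2
h = np.array([4.0, 1.0])       # element of G = R^2

lhs = tensor_product(f, g) @ h  # apply the operator to h
rhs = f * np.dot(g, h)          # f scaled by the inner product <g, h>

# Squared HS (Frobenius) norm of the rank-one operator.
hs_sq = float(np.sum(tensor_product(f, g) ** 2))
product_of_norms = float(np.dot(f, f) * np.dot(g, g))

print(lhs, rhs)
print(hs_sq, product_of_norms)
```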

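Cross-Covariance Operator. The cross-covariance operator associated with the joint measure [math]\displaystyle{ p_{x,y} }[/math] on [math]\displaystyle{ (\mathscr{X}\times\mathscr{Y},\mathscr{\Gamma}\times\mathscr{\Lambda}) }[/math] is a linear operator [math]\displaystyle{ C_{xy}:\mathcal{G}\to \mathcal{F} }[/math] defined as

[math]\displaystyle{ C_{xy}:=E_{x,y}[(\theta (x)-\mu_x)\otimes (\psi (y)-\mu_y)]=E_{x,y}[\theta (x)\otimes \psi (y)]-\mu_x\otimes\mu_y }[/math]

where [math]\displaystyle{ \theta:\mathscr{X}\to\mathcal{F} }[/math] and [math]\displaystyle{ \psi:\mathscr{Y}\to\mathcal{G} }[/math] are the feature maps of the two RKHSs, and [math]\displaystyle{ \mu_x=E_x[\theta(x)] }[/math] and [math]\displaystyle{ \mu_y=E_y[\psi(y)] }[/math] are the corresponding mean elements.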
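A plug-in estimate of this operator can be sketched with explicit, finite-dimensional feature maps. Everything specific in the example below is an assumption made for illustration and does not come from the original article: the feature map [math]\displaystyle{ t\mapsto(t,t^2) }[/math] used for both variables, the sample size, the simulated data, and the function names. Expectations are replaced by sample means. The example checks that the centred and uncentred forms of the definition agree, and that the squared HS norm of the estimate equals the sum of all its squared singular values, in contrast to using only the largest one.

```python
import numpy as np

def empirical_cross_covariance(Theta, Psi):
    """Sample version of E[theta(x) (x) psi(y)] - mu_x (x) mu_y.

    Theta: (m, p) array, row i is theta(x_i) in F = R^p.
    Psi:   (m, q) array, row i is psi(y_i) in G = R^q.
    Returns a (p, q) matrix, i.e. an operator from G to F.
    """
    m = Theta.shape[0]
    mu_x = Theta.mean(axis=0)
    mu_y = Psi.mean(axis=0)
    return Theta.T @ Psi / m - np.outer(mu_x, mu_y)

def features(t):
    """Hypothetical feature map t -> (t, t^2), chosen only for illustration."""
    return np.column_stack([t, t ** 2])

rng = np.random.default_rng(0)
m = 500
x = rng.standard_normal(m)
y_dep = x ** 2 + 0.1 * rng.standard_normal(m)  # depends on x
y_ind = rng.standard_normal(m)                 # drawn independently of x

C_dep = empirical_cross_covariance(features(x), features(y_dep))
C_ind = empirical_cross_covariance(features(x), features(y_ind))

# Centred form of the same estimate, matching the first expression in the definition.
Th, Ps = features(x), features(y_dep)
C_centred = (Th - Th.mean(axis=0)).T @ (Ps - Ps.mean(axis=0)) / m

# Squared HS norm (whole spectrum) versus the largest singular value only.
hs_dep = float(np.sum(C_dep ** 2))
hs_ind = float(np.sum(C_ind ** 2))
singular_values = np.linalg.svd(C_dep, compute_uv=False)

print(hs_dep, hs_ind)
print(singular_values.max(), np.sum(singular_values ** 2))
```

In this simulated setting the squared HS norm is much larger for the dependent pair than for the independent pair, although with finite samples the independent value is not exactly zero.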


References

[1] Gretton, Arthur, et al. "Measuring statistical dependence with Hilbert-Schmidt norms." Algorithmic learning theory. Springer Berlin Heidelberg, 2005.

[2] Fukumizu, Kenji, Francis R. Bach, and Michael I. Jordan. "Kernel dimension reduction in regression." The Annals of Statistics 37.4 (2009): 1871-1905.

[3] Bach, Francis R., and Michael I. Jordan. "Kernel independent component analysis." The Journal of Machine Learning Research 3 (2003): 1-48.

[4] Baker, Charles R. "Joint measures and cross-covariance operators." Transactions of the American Mathematical Society 186 (1973): 273-289.