Measuring statistical dependence with Hilbert-Schmidt norms

Revision as of 17:10, 14 August 2013

This is another very popular kernel-based approach for detecting dependence, called HSIC (Hilbert-Schmidt Independence Criterion). It is based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs). The approach is simple and requires no user-defined regularisation. Its empirical estimate is guaranteed to converge exponentially fast to the population quantity.

Background

Before HSIC was proposed, several kernel-based methods for detecting independence already existed. Bach[] proposed a regularised correlation operator derived from the covariance and cross-covariance operators, and used its largest singular value as a statistic to test independence. Gretton et al.[] used the largest singular value of the cross-covariance operator itself, which resulted in the constrained covariance (COCO). HSIC extends COCO by using the entire spectrum of the cross-covariance operator: it determines when all of the singular values are zero rather than looking only at the largest one.
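The difference between looking at the largest singular value and looking at the whole spectrum can be illustrated in finite dimensions. The sketch below is only an analogy: an ordinary 2x2 matrix `A` stands in for the cross-covariance operator, and the helper `singular_values_2x2` is a hypothetical name introduced here, not part of any library. It shows that the sum of all squared singular values equals the sum of squared matrix entries, so this quantity is zero exactly when every singular value is zero.

```python
import math

def singular_values_2x2(A):
    """Singular values of a 2x2 matrix, from the eigenvalues of A^T A."""
    (a, b), (c, d) = A
    # Entries of the symmetric 2x2 matrix A^T A = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    trace, det = p + r, p * r - q * q
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eig_hi = (trace + disc) / 2.0
    eig_lo = max((trace - disc) / 2.0, 0.0)
    return math.sqrt(eig_hi), math.sqrt(eig_lo)

# Illustrative matrix standing in for a cross-covariance operator (assumption).
A = [[1.0, 2.0], [3.0, 4.0]]

s_max, s_min = singular_values_2x2(A)
largest = s_max                              # summary that uses one singular value
whole_spectrum = s_max ** 2 + s_min ** 2     # summary that uses the entire spectrum
squared_entries = sum(x * x for row in A for x in row)

print(largest, whole_spectrum, squared_entries)
```

The whole-spectrum quantity needs no singular value decomposition at all, since it can be read off the entries directly.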

Cross-Covariance Operators

Hilbert-Schmidt Norm. Denote by [math]\displaystyle{ \mathit{C}:\mathcal{G}\to\mathcal{F} }[/math] a linear operator. Provided the sum converges, the HS norm of [math]\displaystyle{ \mathit{C} }[/math] is defined as

[math]\displaystyle{ \|\mathit{C}\|^2_{HS}:=\sum_{i,j}\langle \mathit{C}v_i,u_j\rangle_\mathcal{F}^2 }[/math]

where [math]\displaystyle{ v_i }[/math] and [math]\displaystyle{ u_j }[/math] are orthonormal bases of [math]\displaystyle{ \mathcal{G} }[/math] and [math]\displaystyle{ \mathcal{F} }[/math] respectively.
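As a sanity check on this definition, the following minimal sketch evaluates the double sum for a small matrix, assuming finite-dimensional spaces G = R^3 and F = R^2. The matrix `C`, the rotation angle `t` and all helper names are illustrative choices, not taken from the source. In this setting the value does not depend on which orthonormal bases are used, and it equals the sum of squared matrix entries.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(C, v):
    return [dot(row, v) for row in C]

def hs_norm_sq(C, basis_G, basis_F):
    """Sum over i, j of <C v_i, u_j>^2 for orthonormal bases v_i of G, u_j of F."""
    return sum(dot(matvec(C, v), u) ** 2 for v in basis_G for u in basis_F)

# Illustrative operator C: R^3 -> R^2, written as a 2x3 matrix.
C = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

# Standard orthonormal bases.
std_G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
std_F = [[1.0, 0.0], [0.0, 1.0]]

# Rotated orthonormal bases (rotation by an arbitrary angle t).
t = 0.7
c, s = math.cos(t), math.sin(t)
rot_G = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
rot_F = [[c, s], [-s, c]]

squared_entries = sum(x * x for row in C for x in row)  # 1 + 4 + 1 + 9 = 15

print(hs_norm_sq(C, std_G, std_F), hs_norm_sq(C, rot_G, rot_F), squared_entries)
```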

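Hilbert-Schmidt Operator. A linear operator whose Hilbert-Schmidt norm is finite is called a Hilbert-Schmidt operator. The set of such operators from [math]\displaystyle{ \mathcal{G} }[/math] to [math]\displaystyle{ \mathcal{F} }[/math] forms a Hilbert space with inner product

[math]\displaystyle{ \langle \mathit{C},\mathit{D}\rangle_{HS}:=\sum_{i,j}\langle \mathit{C}v_i,u_j\rangle_\mathcal{F}\langle \mathit{D}v_i,u_j\rangle_\mathcal{F} }[/math]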
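The inner product can be checked in the same finite-dimensional setting. This is a sketch under the assumption that G = R^3 and F = R^2; the matrices `C` and `D` and the helper names are made up for illustration. With the standard bases the double sum reduces to the sum of entrywise products, and taking D = C recovers the squared Hilbert-Schmidt norm.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def hs_inner(C, D, basis_G, basis_F):
    """Sum over i, j of <C v_i, u_j> * <D v_i, u_j>."""
    return sum(dot(matvec(C, v), u) * dot(matvec(D, v), u)
               for v in basis_G for u in basis_F)

# Illustrative operators R^3 -> R^2.
C = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
D = [[2.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]

std_G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
std_F = [[1.0, 0.0], [0.0, 1.0]]

# A second pair of orthonormal bases, to show the value is basis independent.
t = 1.1
c, s = math.cos(t), math.sin(t)
rot_G = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
rot_F = [[c, s], [-s, c]]

# Sum of entrywise products: 1*2 + (-1)*1 = 1.
entrywise = sum(x * y for rc, rd in zip(C, D) for x, y in zip(rc, rd))

print(hs_inner(C, D, std_G, std_F), hs_inner(C, D, rot_G, rot_F), entrywise)
print(hs_inner(C, C, std_G, std_F))  # squared HS norm of C
```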

Tensor Product. Let [math]\displaystyle{ f\in \mathcal{F} }[/math] and [math]\displaystyle{ g\in \mathcal{G} }[/math]. The tensor product operator [math]\displaystyle{ f\otimes g:\mathcal{G}\to \mathcal{F} }[/math] is defined as

[math]\displaystyle{ (f\otimes g)h:=f\langle g,h\rangle_\mathcal{G} }[/math] for all [math]\displaystyle{ h\in \mathcal{G} }[/math]
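For finite-dimensional vectors the tensor product operator is the familiar outer product. The sketch below assumes F = R^2 and G = R^3, with made-up vectors `f`, `g` and `h`. It checks that applying the definition directly agrees with multiplying by the outer-product matrix, and that the squared Hilbert-Schmidt norm of the rank-one operator factorises into the product of the squared norms of `f` and `g`.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def apply_tensor(f, g, h):
    """(f tensor g) h := f * <g, h>, following the definition directly."""
    scale = dot(g, h)
    return [fa * scale for fa in f]

def outer(f, g):
    """Matrix of the operator f tensor g: rows index F, columns index G."""
    return [[fa * gb for gb in g] for fa in f]

def matvec(M, v):
    return [dot(row, v) for row in M]

# Illustrative vectors: f in F = R^2, g and h in G = R^3.
f = [1.0, 2.0]
g = [3.0, 0.0, -1.0]
h = [1.0, 1.0, 1.0]

by_definition = apply_tensor(f, g, h)   # f * <g, h> = [1, 2] * 2
by_matrix = matvec(outer(f, g), h)

# Squared HS norm of the rank-one operator versus |f|^2 * |g|^2.
hs_sq = sum(x * x for row in outer(f, g) for x in row)
product_of_norms = dot(f, f) * dot(g, g)

print(by_definition, by_matrix, hs_sq, product_of_norms)
```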

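Cross-Covariance Operator. Let [math]\displaystyle{ \theta:\mathscr{X}\to\mathcal{F} }[/math] and [math]\displaystyle{ \psi:\mathscr{Y}\to\mathcal{G} }[/math] denote the feature maps of the two RKHSs, with mean elements [math]\displaystyle{ \mu_x:=E_x[\theta(x)] }[/math] and [math]\displaystyle{ \mu_y:=E_y[\psi(y)] }[/math]. The cross-covariance operator associated with the joint measure [math]\displaystyle{ p_{x,y} }[/math] on [math]\displaystyle{ (\mathscr{X}\times\mathscr{Y},\mathscr{\Gamma}\times\mathscr{\Lambda}) }[/math] is the linear operator [math]\displaystyle{ C_{xy}:\mathcal{G}\to \mathcal{F} }[/math] defined as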

[math]\displaystyle{ C_{xy}:=E_{x,y}[(\theta (x)-\mu_x)\otimes (\psi (y)-\mu_y)]=E_{x,y}[\theta (x)\otimes \psi (y)]-\mu_x\otimes\mu_y }[/math]
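The two forms of this definition can be compared on a toy sample. The following is a minimal sketch with several assumptions: expectations are replaced by sample means, the feature maps `theta` and `psi` are hypothetical two-dimensional maps chosen for illustration (genuine RKHS feature maps are typically infinite-dimensional), and the data are a made-up sample with y = x^2. The example also shows why nonlinear features matter: the ordinary covariance between x and y, which is the top-left entry, is zero, yet the operator as a whole is not.

```python
def outer(f, g):
    return [[fa * gb for gb in g] for fa in f]

def mean_vectors(vs):
    n = len(vs)
    return [sum(v[k] for v in vs) / n for k in range(len(vs[0]))]

def mean_matrices(ms):
    n = len(ms)
    rows, cols = len(ms[0]), len(ms[0][0])
    return [[sum(m[a][b] for m in ms) / n for b in range(cols)]
            for a in range(rows)]

def theta(x):
    """Hypothetical feature map into F = R^2 (an assumption for illustration)."""
    return [x, x * x]

def psi(y):
    """Hypothetical feature map into G = R^2 (an assumption for illustration)."""
    return [y, y * y]

# Toy sample: y depends on x, but x and y are linearly uncorrelated.
xs = [-1.0, 0.0, 1.0]
ys = [x * x for x in xs]

tx = [theta(x) for x in xs]
py = [psi(y) for y in ys]
mu_x = mean_vectors(tx)
mu_y = mean_vectors(py)

# Centred form: mean of (theta(x) - mu_x) tensor (psi(y) - mu_y).
centred = mean_matrices([
    outer([a - m for a, m in zip(t, mu_x)], [b - m for b, m in zip(p, mu_y)])
    for t, p in zip(tx, py)
])

# Uncentred form: mean of theta(x) tensor psi(y), minus mu_x tensor mu_y.
raw = mean_matrices([outer(t, p) for t, p in zip(tx, py)])
mm = outer(mu_x, mu_y)
uncentred = [[raw[a][b] - mm[a][b] for b in range(2)] for a in range(2)]

# Squared Hilbert-Schmidt norm of the empirical operator.
hs_sq = sum(c * c for row in centred for c in row)

print(centred)
print(uncentred)
print(hs_sq)
```

The squared Hilbert-Schmidt norm computed at the end is the kind of quantity on which HSIC is built: it is nonzero here even though the linear covariance entry vanishes.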


References

[1] Gretton, Arthur, et al. "Measuring statistical dependence with Hilbert-Schmidt norms." Algorithmic learning theory. Springer Berlin Heidelberg, 2005.

[2] Fukumizu, Kenji, Francis R. Bach, and Michael I. Jordan. "Kernel dimension reduction in regression." The Annals of Statistics 37.4 (2009): 1871-1905.

[3] Bach, Francis R., and Michael I. Jordan. "Kernel independent component analysis." The Journal of Machine Learning Research 3 (2003): 1-48.

[4] Baker, Charles R. "Joint measures and cross-covariance operators." Transactions of the American Mathematical Society 186 (1973): 273-289.