measuring Statistical Dependence with Hilbert-Schmidt Norm: Difference between revisions

From statwiki
Jump to navigation Jump to search
Line 42: Line 42:


==Empirical Criterion==
==Empirical Criterion==
===definition===
===Definition (Empirical HSIC)===
Let <math>Z:=\{(x_1,y_1),\cdots,(x_m,y_m)\}\subseteq \mathcal{X}\times\mathcal{Y}</math> be a series of <math>m</math> independent observations drwn from <math>p_{x,y}</math>. An estimator of HSIC, written<br>
<math>\text{HSIC}(Z,\mathcal{F},\mathcal{G}):=(m-1)^{-2}\textbf{tr}(KHLH)</math><br>
where <math>H, K, L\in\mathbb{R}^{m\times m}, K_{ij}:=k(x_i,x_j), L_{ij}:=k(y_i,y_j) \,\, \text{and} \, \, H_{ij}:=\delta_{ij}-m^{-1}</math>.
 
===Bias of Estimator===
===Bias of Estimator===
It may bee shown that the bias of the above empirical estimation is of the order <math>O(m^{-1})</math>:<br>
'''Theorem:''' Let <math>\mathbf{E}_Z</math> denote the expectation taken over <math>m</math> independent copies <math>(x_i,y_i)</math> drawn from <math>p_{xy}</math>. Then<br>
<math>\text{HSIC}(p_{xy},\mathcal{F},\mathcal{G})=\mathbf{E}_Z[\text{HSIC}(Z,\mathcal{F},\mathcal{G})]+O(m^{-1})</math>.
==Large Deviation Bound==
==Large Deviation Bound==
===Deviation Bound for U-statistics===
===Deviation Bound for U-statistics===

Revision as of 17:16, 24 June 2009

"Hilbert-Schmist Norm of the Cross-Covariance operator" is proposed as an independence criterion in reproducing kernel Hilbert spaces (RKHSs). The measure is refereed to as Hilbert-Schmidt Independence Criterion, or HSIC. An empirical estimate of this measure is introduced which may be well used in many practical application such as independent Component Analysis (ICA), Maximum Variance Unfolding (MVU), feature extraction, feature selection, ... .

RKHS Theory

Let [math]\displaystyle{ \mathcal{F} }[/math] be a Hilbert space from [math]\displaystyle{ \mathcal{X} }[/math] to [math]\displaystyle{ \mathbb{R} }[/math]. We assume [math]\displaystyle{ \mathcal{F} }[/math] is a Reproducing Kernel Hilbert Space,i.e., for all [math]\displaystyle{ x\in \mathcal{X} }[/math], the corresponding Dirac evaluation operator [math]\displaystyle{ \delta_x:\mathcal{F} \rightarrow \mathbb{R} }[/math] is a bounded (or equivalently continuous) linear functional. We denote the kernel of this operator by [math]\displaystyle{ k(x,x')=\langle \phi(x)\phi(x') \rangle_{\mathcal{F}} }[/math] where [math]\displaystyle{ k:\mathcal{X}\times \mathcal{X}\rightarrow \mathbb{R} }[/math] is a positive definite function and [math]\displaystyle{ \phi }[/math] is the feature map of [math]\displaystyle{ \mathcal{F} }[/math]. Similarly, we consider another RKHS named [math]\displaystyle{ \mathcal{G} }[/math] with Domain [math]\displaystyle{ \mathcal{Y} }[/math], kernel [math]\displaystyle{ l(\cdot,\cdot) }[/math] and feature map [math]\displaystyle{ \psi }[/math]. We assume both [math]\displaystyle{ \mathcal{F} }[/math] and [math]\displaystyle{ \mathcal{G} }[/math] are separable, i.e., they have a complete orthogonal bases.

Hilbert-Schmidt Norm

For a linear operator [math]\displaystyle{ C:\mathcal{G}\rightarrow \mathcal{F} }[/math], provided the sum converges, the Hilbert-Schmidt (HS) norm is defined as:
[math]\displaystyle{ \|C\|^2_{HS}:=\sum_{i,j}\langle Cv_i,u_j \rangle^2_{\mathcal{F}} }[/math]
where [math]\displaystyle{ u_j }[/math] and [math]\displaystyle{ v_i }[/math] are orthogonal bases of [math]\displaystyle{ \mathcal{F} }[/math] and [math]\displaystyle{ \mathcal{G} }[/math], respectively. It is easy to see that the Frobenius norm on matrices may be considered a spacial case of this norm.

Hilbert-Schmidt Operator

A Hilbert-Schmidt Operator is a linear operator for which the Hilbert-Schmidt norm (introduced above) exists.

Tensor Product Operator

We may employ any [math]\displaystyle{ f\in \mathcal{F} }[/math] and [math]\displaystyle{ g\in \mathcal{G} }[/math] to define a tensor product operator [math]\displaystyle{ f\otimes g:\mathcal{G}\rightarrow\mathcal{F} }[/math] as follows:
[math]\displaystyle{ (f\otimes g)h:=f\langle g,h\rangle_{\mathcal{G}} \quad }[/math] for all [math]\displaystyle{ h\in\mathcal{G} }[/math]
Using the definition of HS norm introduced above, we can simply show the norm of [math]\displaystyle{ f\otimes g }[/math] equals
[math]\displaystyle{ \|f\otimes g\|^2_{HS}=\|f\|^2_{\mathcal{F}}\, \|g\|^2_{\mathcal{G}} }[/math]

Cross-Covariance Operator

Mean

Mean elements of [math]\displaystyle{ \mathcal{F} }[/math] and [math]\displaystyle{ \mathcal{G} }[/math] are defined as those elements of these spaces for which

[math]\displaystyle{ \langle\mu_x,f \rangle_{\mathcal{F}}=\mathbf{E}_x[\langle\phi(x),f \rangle_{\mathcal{F}}]=\mathbf{E}_x[f(x)] }[/math]
[math]\displaystyle{ \langle\mu_y,g \rangle_{\mathcal{G}}=\mathbf{E}_y[\langle\psi(y),g \rangle_{\mathcal{G}}]=\mathbf{E}_y[g(x)] }[/math]
Based on this, [math]\displaystyle{ \|\mu_x\|^2_{\mathcal{F}} }[/math] may be calculated by applying expectation twice as follows:
[math]\displaystyle{ \|\mu_x\|^2_{\mathcal{F}}=\mathbf{E}_{x,x'}[\langle \phi(x),\phi(x')\rangle_{\mathcal{F}}]=\mathbf{E}_{x,x'}[k(x,x')] }[/math]

Cross-covariance Operator

Now we are in aposition to define the cross-covariance operator as follows
[math]\displaystyle{ C_{xy}:=\underbrace{\mathbf{E}_{x,y}[\phi(x)\otimes\psi(y)]}_{:=\tilde{C}_{xy}}-\underbrace{\mu_x\otimes\mu_y}_{:=M_{xy}} }[/math]
We will use [math]\displaystyle{ \tilde{C}_{xy} }[/math] and [math]\displaystyle{ {M_{xy}} }[/math] as the basis of our measure of dependence.

Hilbert-Schmidt Independence Criterion

Definition (HSIC)

Given separable RKHSs [math]\displaystyle{ \mathcal{F} }[/math], [math]\displaystyle{ \mathcal{G} }[/math] and a joint probability [math]\displaystyle{ p_{xy} }[/math], we define the Hilbert-Schmidt Independence Criterion(HSIC) as the squared HS-norm of the associated cross-covariance operator [math]\displaystyle{ C_{xy} }[/math]
[math]\displaystyle{ \text{HSIC}(p_{xy},\mathcal{F},\mathcal{G}):=\|C_{xy}\|^2_{HS} }[/math]

HSIC in terms of kernels

To compute HSIC we need to express it in terms of kernel functions. It can be shown that this can be achieved via the following identity:
[math]\displaystyle{ \text{HSIC}(p_{xy},\mathcal{F},\mathcal{G})=\mathbf{E}_{x,x',y,y'}[k(x,x')l(y,y')]+\mathbf{E}_{x,x'}[k(x,x')]\mathbf{E}_{y,y'}[l(y,y')]-2\mathbf{E}_{x,y}[\mathbf{E}_{x'}[k(x,x')]\mathbf{E}_{y'}[l(y,y')]] }[/math]

Empirical Criterion

Definition (Empirical HSIC)

Let [math]\displaystyle{ Z:=\{(x_1,y_1),\cdots,(x_m,y_m)\}\subseteq \mathcal{X}\times\mathcal{Y} }[/math] be a series of [math]\displaystyle{ m }[/math] independent observations drwn from [math]\displaystyle{ p_{x,y} }[/math]. An estimator of HSIC, written
[math]\displaystyle{ \text{HSIC}(Z,\mathcal{F},\mathcal{G}):=(m-1)^{-2}\textbf{tr}(KHLH) }[/math]
where [math]\displaystyle{ H, K, L\in\mathbb{R}^{m\times m}, K_{ij}:=k(x_i,x_j), L_{ij}:=k(y_i,y_j) \,\, \text{and} \, \, H_{ij}:=\delta_{ij}-m^{-1} }[/math].

Bias of Estimator

It may bee shown that the bias of the above empirical estimation is of the order [math]\displaystyle{ O(m^{-1}) }[/math]:
Theorem: Let [math]\displaystyle{ \mathbf{E}_Z }[/math] denote the expectation taken over [math]\displaystyle{ m }[/math] independent copies [math]\displaystyle{ (x_i,y_i) }[/math] drawn from [math]\displaystyle{ p_{xy} }[/math]. Then
[math]\displaystyle{ \text{HSIC}(p_{xy},\mathcal{F},\mathcal{G})=\mathbf{E}_Z[\text{HSIC}(Z,\mathcal{F},\mathcal{G})]+O(m^{-1}) }[/math].

Large Deviation Bound

Deviation Bound for U-statistics

Bound on Empirical HSIC

Independence Test using HSIC

Experimental Results