Metric and Kernel Learning Using a Linear Transformation


== Introduction ==

The motivation of the paper is to combine the advantages of Mahalanobis distance learning and kernel learning. Mahalanobis distance learning can handle out-of-sample points and avoids overfitting, but it is only a linear transformation and its number of parameters grows quadratically with the dimensionality of the data. Kernel learning with non-linear kernels enables implicit non-linear transformations, but it is not easy to apply to out-of-sample points. The paper shows that metric learning with linear transformations is equivalent to learning a linear transformation kernel. The authors use the LogDet divergence to formulate the metric learning problem; this divergence keeps the optimization simple and has precedent in optimization and statistics. They claim that their algorithm scales to large data sets and allows learning of a linear transformation in the kernel space, so that it can handle out-of-sample extensions. Moreover, even though the matrix to be learned may be infinite-dimensional, it can be fully represented in terms of the constrained data points.


== LogDet Divergence Based Metric Learning ==

The Mahalanobis distance, parameterized by a positive semidefinite matrix W, is defined as follows:

[math]\displaystyle{ d_W(x_i,x_j)=(x_i-x_j)^TW(x_i-x_j) }[/math]
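
As an illustration, here is a minimal numerical sketch of this distance (assuming NumPy; the function and variable names are ours, not from the paper). The same computation applies in the kernelized form below, with [math]\phi(x_i)[/math] in place of [math]x_i[/math].

<pre>
import numpy as np

def mahalanobis_dist(x_i, x_j, W):
    # Squared Mahalanobis distance (x_i - x_j)^T W (x_i - x_j) for a PSD matrix W.
    diff = x_i - x_j
    return float(diff @ W @ diff)

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, 1.0])
W = np.eye(2)                          # with W = I this is the squared Euclidean distance
print(mahalanobis_dist(x_i, x_j, W))   # 5.0
</pre>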

In order to deal with the nonlinear case, we apply a nonlinear mapping [math]\phi[/math] to the data points:

[math]\displaystyle{ d_W(\phi(x_i),\phi(x_j))=(\phi(x_i)-\phi(x_j))^TW(\phi(x_i)-\phi(x_j)) }[/math]

The main challenge is the computation of W. To regularize W, the paper uses the LogDet divergence. The LogDet divergence between two positive semidefinite matrices [math]W, W_0 \in \mathbb{R}^{d \times d}[/math] is defined as:

[math]\displaystyle{ D_{\ell d}(W,W_0)=\operatorname{tr}(WW_0^{-1})-\log\det(WW_0^{-1})-d }[/math]
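
A minimal sketch of evaluating this divergence numerically (assuming NumPy; the helper name is ours):

<pre>
import numpy as np

def logdet_divergence(W, W0):
    # D_ld(W, W0) = tr(W W0^{-1}) - log det(W W0^{-1}) - d
    # for positive definite d x d matrices W and W0.
    d = W.shape[0]
    M = W @ np.linalg.inv(W0)
    _, logdet_M = np.linalg.slogdet(M)
    return np.trace(M) - logdet_M - d

W = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(logdet_divergence(W, W))          # ~0.0 (divergence of a matrix to itself)
print(logdet_divergence(W, np.eye(2)))  # > 0
</pre>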

Given the similarity constraints S and dissimilarity constraints D, the following problem is proposed in [1]:

[[File:itml_formulation.png]]
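
As a sketch, the problem in [1] is commonly stated along the following lines, with u and l denoting upper and lower distance thresholds (our notation; the exact presentation in the image may differ):

[math]\displaystyle{ \min_{W \succeq 0} \; D_{\ell d}(W,W_0) \quad \text{s.t.} \quad d_W(x_i,x_j)\le u \;\; \forall (i,j)\in S, \qquad d_W(x_i,x_j)\ge l \;\; \forall (i,j)\in D }[/math]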

We can also obtain the kernelized form of the above formulation [2]:

[[File:itml_kernelized.png]]
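
Again as a sketch (our notation; the exact presentation in the image may differ): with [math]K_0 = \Phi^T\Phi[/math] the input kernel matrix on the n training points and [math]e_i[/math] the i-th standard basis vector, the kernelized problem takes roughly the form

[math]\displaystyle{ \min_{K \succeq 0} \; D_{\ell d}(K,K_0) \quad \text{s.t.} \quad (e_i-e_j)^T K (e_i-e_j)\le u \;\; \forall (i,j)\in S, \qquad (e_i-e_j)^T K (e_i-e_j)\ge l \;\; \forall (i,j)\in D }[/math]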

The paper proves a connection between the optimal solutions of the above two problems: if the optimal learned kernel is K<sup>*</sup> and the optimal learned Mahalanobis matrix is W<sup>*</sup>, they are related as follows:

[[File:W_K_relation.png]]
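
A sketch of this relation, based on the paper's main theorem (our notation, which may differ slightly from the image): with [math]\Phi[/math] the matrix of mapped training points and [math]K_0 = \Phi^T\Phi[/math],

[math]\displaystyle{ W^* = I + \Phi S \Phi^T, \qquad S = K_0^{-1}(K^*-K_0)K_0^{-1}, \qquad K^* = \Phi^T W^* \Phi }[/math]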

The identity matrix I can be generalized to an arbitrary matrix W<sub>0</sub> and the theorem still holds. To compute the out-of-sample extension, we can represent the inner product of the points after the linear transformation in terms of the inner product of the points before the transformation:
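
A sketch of how this works out, following the relation above (our notation, not copied from the paper): for two arbitrary points [math]z_1, z_2[/math] and input kernel function [math]\kappa[/math],

[math]\displaystyle{ \phi(z_1)^T W^* \phi(z_2) = \kappa(z_1,z_2) + k_1^T S\, k_2, \qquad k_i = (\kappa(z_i,x_1),\ldots,\kappa(z_i,x_n))^T }[/math]

so the learned kernel between out-of-sample points only requires input kernel evaluations against the n training points. A minimal code sketch under the same assumptions (NumPy; function names are ours):

<pre>
import numpy as np

def learned_kernel(z1, z2, X, S, kappa):
    # Learned kernel value between (possibly out-of-sample) points z1 and z2,
    # given the n training points X (rows), the n x n matrix S from the relation
    # above, and the input kernel function kappa.
    k1 = np.array([kappa(z1, x) for x in X])
    k2 = np.array([kappa(z2, x) for x in X])
    return kappa(z1, z2) + k1 @ S @ k2
</pre>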