
Dimensionality Reduction by Learning an Invariant Mapping


1. Intention

The drawbacks of most existing techniques:

1. Most of them depend on a meaningful and computable distance metric in the input space (e.g., LLE and Isomap rely on a computable distance).

2. They do not compute a "function" that can accurately map new input samples whose relationship to the training data is unknown.

To overcome these drawbacks, this paper introduces a technique called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM). The learning relies solely on neighborhood relationships and does not require any distance measure in the input space.
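To make this concrete, below is a minimal sketch (not from the paper) of how such prior-knowledge neighborhood relationships can be turned into labeled training pairs for the model in the next section; the neighbors structure and all names are illustrative assumptions.

import numpy as np

def make_pairs(X, neighbors):
    # X         : (n, D) array of input vectors.
    # neighbors : dict mapping index i to the set of indices regarded as
    #             neighbors of x_i by prior knowledge (e.g., temporal
    #             adjacency or known labels) -- no input-space metric used.
    # Returns a list of (i, j, Y): Y = 0 for similar pairs, Y = 1 otherwise,
    # matching the labeling convention defined in Section 2.
    pairs = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            Y = 0 if j in neighbors.get(i, set()) else 1
            pairs.append((i, j, Y))
    return pairs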

2. Mathematical Model

Input: A set of vectors [math]\displaystyle{ I=\{x_1,x_2,\ldots,x_p\} }[/math], where [math]\displaystyle{ x_i\in \mathbb{R}^D, \forall i=1,2,\ldots,p. }[/math]

Output: A parametric function [math]\displaystyle{ G_W:\mathbb{R}^D \rightarrow \mathbb{R}^d }[/math] with [math]\displaystyle{ d \ll D }[/math] such that it satisfies the following three properties:

1. Simple distance measures in the output space should approximate the neighborhood relationships in the input space.

2. The mapping should not be constrained to implementing simple distance measures in the input space and should be able to learn invariances to complex transformations.

3. It should be faithful even for samples whose neighborhood relationships are unknown.

Solution: Build clusters according to the similarities of the vectors, where similarity is measured using prior knowledge. Set Y=0 if [math]\displaystyle{ x_1 }[/math] and [math]\displaystyle{ x_2 }[/math] are in the same cluster; otherwise set Y=1. Define the following functions:

[math]\displaystyle{ D_W^i(x_1,x_2)=\left \| G_W(x_1)-G_W(x_2)\right \|_2 \qquad (1) }[/math]

[math]\displaystyle{ L(W,(Y,x_1,x_2)^i)=(1-Y)L_S(D_W^i)+YL_D(D_W^i) \qquad (2) }[/math]

[math]\displaystyle{ \mathbb{L} (W)= \sum^P_{i=1} L(W,(Y,x_1,x_2)^i) \qquad (3) }[/math]

[math]\displaystyle{ L_S }[/math] and [math]\displaystyle{ L_D }[/math] are loss functions of our own choosing, but they must be designed such that minimizing [math]\displaystyle{ \mathbb{L} }[/math] with respect to W results in low values of [math]\displaystyle{ D_W }[/math] for similar pairs and high values of [math]\displaystyle{ D_W }[/math] for dissimilar pairs.
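As a concrete illustration, the following sketch implements equations (1)-(3) with one standard choice of losses, the ones used in the DrLIM paper: [math]\displaystyle{ L_S(D)=\frac{1}{2}D^2 }[/math] and [math]\displaystyle{ L_D(D)=\frac{1}{2}\left(\max(0,m-D)\right)^2 }[/math] for a margin m. The linear stand-in for [math]\displaystyle{ G_W }[/math] and all names are assumptions for illustration; in the paper, [math]\displaystyle{ G_W }[/math] is a convolutional network trained by gradient descent on this loss.

import numpy as np

def contrastive_loss(G_W, pairs, X, m=1.0):
    # Total loss L(W) of Eq. (3), summed over labeled pairs (i, j, Y).
    # G_W   : callable mapping a (D,) vector to a (d,) vector.
    # pairs : list of (i, j, Y) with Y = 0 for similar, Y = 1 for dissimilar.
    # X     : (n, D) array of input vectors.
    # m     : margin in the dissimilar-pair term.
    total = 0.0
    for i, j, Y in pairs:
        D = np.linalg.norm(G_W(X[i]) - G_W(X[j]))   # Eq. (1): D_W(x_i, x_j)
        L_S = 0.5 * D ** 2                          # pulls similar pairs together
        L_D = 0.5 * max(0.0, m - D) ** 2            # pushes dissimilar pairs apart, up to margin m
        total += (1 - Y) * L_S + Y * L_D            # Eq. (2)
    return total

# Illustrative usage with a linear stand-in for G_W; W would be learned
# by gradient descent on this loss.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10))                    # four vectors in R^10 (D = 10)
pairs = [(0, 1, 0), (0, 2, 1), (1, 3, 1)]           # Y = 0 similar, Y = 1 dissimilar
W = rng.standard_normal((2, 10))                    # maps R^10 -> R^2 (d = 2)
print(contrastive_loss(lambda x: W @ x, pairs, X))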