summary: Difference between revisions
No edit summary |
|||
Line 21: | Line 21: | ||
Solution: | Solution: | ||
The high dimensional data points could be mapped to lower dimension in several clusters. We could build up clusters according to similarities of the vectors. The similarities are measured by prior knowledge. Set Y=0 if <math> x_1 </math> and <math> x_2 </math> are in the same cluster; otherwise set Y=1. This clustering problem can be reduced to an optimization problem. Define following functions: | |||
<center> | <center> | ||
<math> D_W^i(x_1,x_2)=\left \| G_W(x_1)-G_W(x_2)\right \|_2 \qquad (1) </math> | <math> D_W^i(x_1,x_2)=\left \| G_W(x_1)-G_W(x_2)\right \|_2 \qquad (1) </math> | ||
<math> | <math> l(W,(Y,x_1,x_2)^i))=(1-Y)L_S(D_W^i)+YL_D(D_W^i) \qquad (2) </math> | ||
<math> | <math> L(W)= \sum^P_{i=1} l(W,(Y,x_1,x_2)^i) \qquad (3) </math> | ||
</center> | </center> | ||
where <math>(Y,x_1,x_2)^i</math> is the <math>i</math>-th labeled sample pair, <math>L_S</math> is the partial loss function for the points in the same cluster, <math>L_D</math> is the partial loss function for points in different clusters. | |||
<math>L_S</math> and <math>L_D</math> are determined by ourselves and must be designed such that minimizing <math>L</math> respect to W results in low values of <math>D_W</math> for similar pairs and high values of <math>D_W</math> for | <math>L_S</math> and <math>L_D</math> are determined by ourselves and must be designed such that minimizing <math>L</math> respect to W results in low values of <math>D_W </math> for similar pairs and high values of <math>D_W </math> for disimilar pairs. The intuition is, minimizing the function <math>L(\ )</math> could build up clusters over input data set. |
Revision as of 21:20, 12 July 2013
Dimensionality Reduction by Learning an Invariant Mapping
1. Intention
The drawbacks of most existing technique:
1 Most of them depend on a meaningful and computable distance metric in input space. (eg. LLE, Isomap relies on computable distance)
2 They do not compute a “function” that can accurately map new input samples whose relationship to the training data is unknown.
To overcome these drawbacks, this paper introduces a technique called DrLIM. The learning relies solely on neighborhood relationships and does not require any distance measure in the input space.
2. Mathematical Model
Input: A set of vectors [math]\displaystyle{ I=\{x_1,x_2,......,x_p\} }[/math], where [math]\displaystyle{ x_i\in \mathbb{R}^D, \forall i=1,2,3......,n. }[/math]
Output: A parametric function [math]\displaystyle{ G_W:\mathbb{R}^D \rightarrow \mathbb{R}^d }[/math] with [math]\displaystyle{ d\lt \lt D }[/math] such that it satisfies the following 3 properties:1.Simple distance measures in the output space should approximate the neighborhood relationships in the input space; 2.The mapping should not be constrained to implementing simple distance measures in the input space and should be able to learn invariances to complex transformations; 3.It should faithful even for samples whose neighborhood relationship are unknown.
Solution: The high dimensional data points could be mapped to lower dimension in several clusters. We could build up clusters according to similarities of the vectors. The similarities are measured by prior knowledge. Set Y=0 if [math]\displaystyle{ x_1 }[/math] and [math]\displaystyle{ x_2 }[/math] are in the same cluster; otherwise set Y=1. This clustering problem can be reduced to an optimization problem. Define following functions:
[math]\displaystyle{ D_W^i(x_1,x_2)=\left \| G_W(x_1)-G_W(x_2)\right \|_2 \qquad (1) }[/math]
[math]\displaystyle{ l(W,(Y,x_1,x_2)^i))=(1-Y)L_S(D_W^i)+YL_D(D_W^i) \qquad (2) }[/math]
[math]\displaystyle{ L(W)= \sum^P_{i=1} l(W,(Y,x_1,x_2)^i) \qquad (3) }[/math]
where [math]\displaystyle{ (Y,x_1,x_2)^i }[/math] is the [math]\displaystyle{ i }[/math]-th labeled sample pair, [math]\displaystyle{ L_S }[/math] is the partial loss function for the points in the same cluster, [math]\displaystyle{ L_D }[/math] is the partial loss function for points in different clusters. [math]\displaystyle{ L_S }[/math] and [math]\displaystyle{ L_D }[/math] are determined by ourselves and must be designed such that minimizing [math]\displaystyle{ L }[/math] respect to W results in low values of [math]\displaystyle{ D_W }[/math] for similar pairs and high values of [math]\displaystyle{ D_W }[/math] for disimilar pairs. The intuition is, minimizing the function [math]\displaystyle{ L(\ ) }[/math] could build up clusters over input data set.