hamming Distance Metric Learning: Difference between revisions

From statwiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 3: Line 3:
In a previous paper, the authors tried to used a loss function which bears some similarity to the hinge function used in SVM. It includes a hyper-parameter which is a threshold in Hamming space that differentiates neighbors from non-neighbors. such that similar points are mapped to binary codes that do differ in more than P bits and disimilar points should map to points closer no more than P bits. For two binary codes <math>h</math> and <math>g</math> with hamming distance <math>||h-g||_H</math> and a similarity label <math>s \in {0,1}</math> the pairwise hinge loss function is defined as:
In a previous paper, the authors tried to used a loss function which bears some similarity to the hinge function used in SVM. It includes a hyper-parameter which is a threshold in Hamming space that differentiates neighbors from non-neighbors. such that similar points are mapped to binary codes that do differ in more than P bits and disimilar points should map to points closer no more than P bits. For two binary codes <math>h</math> and <math>g</math> with hamming distance <math>||h-g||_H</math> and a similarity label <math>s \in {0,1}</math> the pairwise hinge loss function is defined as:


However in practice finding value of P is not easy
However in practice finding value of P is not easy. Moreover in some datasets the relative pairwise distance is important not the precise numerical value. As a results in this paper authors define the loss function in terms of the relative similarity <center><math>l_triple(h,h^+,h^-)=[||h-h^+||-||h-h^-||+1]_+</math></center>

Revision as of 14:45, 8 July 2013

This paper tries to propose a method to learn mappings from high dimensional data to binary codes. One of the main advantages of using binary space is one can do exact KNN classification in sublinear time. Like other metric learning method this paper also tries to optimize some cost function which is based one a similarity measure between data points. One choice of similarity measure in binary space is Euclidean distance which produces unsatisfactory results. Another choice is Hamming distance, which is the total number of positions at which the corresponding bits are different.

The task is to learn a mapping from b(x) that project p-dimensional real valued input x onto a q dimensional binary code while preserving some notion of similarity. This mapping, which is called hash function is parameterized by a matrix w such that:

[math]\displaystyle{ b(x,w)=sign(f(w,x)) }[/math]

In a previous paper, the authors tried to used a loss function which bears some similarity to the hinge function used in SVM. It includes a hyper-parameter which is a threshold in Hamming space that differentiates neighbors from non-neighbors. such that similar points are mapped to binary codes that do differ in more than P bits and disimilar points should map to points closer no more than P bits. For two binary codes [math]\displaystyle{ h }[/math] and [math]\displaystyle{ g }[/math] with hamming distance [math]\displaystyle{ ||h-g||_H }[/math] and a similarity label [math]\displaystyle{ s \in {0,1} }[/math] the pairwise hinge loss function is defined as:

However in practice finding value of P is not easy. Moreover in some datasets the relative pairwise distance is important not the precise numerical value. As a results in this paper authors define the loss function in terms of the relative similarity

[math]\displaystyle{ l_triple(h,h^+,h^-)=[||h-h^+||-||h-h^-||+1]_+ }[/math]