Parametric Local Metric Learning for Nearest Neighbor Classification
Overview
This paper addresses supervised local metric learning for nearest neighbor (more generally, K-NN) classification. Here, "local" means that a separate metric is learned for each data point. Each local metric is formed as a linear combination of a set of "basis metrics", and the algorithm learns both the basis metrics and the per-point combination weights. The basis metrics are learned in a supervised way, while the local weights are learned in an unsupervised manner, which reduces the risk of overfitting. The authors further guard against overfitting through manifold regularization, and they learn the basis metrics under large margin triplet constraints, which further improves the results.
Introduction
The squared Mahalanobis distance between two instances [math]\displaystyle{ x_i }[/math] and [math]\displaystyle{ x_j }[/math], parameterized by a positive semidefinite matrix [math]\displaystyle{ M }[/math], is given by
[math]\displaystyle{ d^{2}_{M}(x_{i}, x_{j}) = (x_{i} - x_{j})^{T}M(x_{i} - x_{j}) }[/math]
One way to obtain local metrics is to define a different distance metric for each training sample:
[math]\displaystyle{ d^{2}_{M_i}(x_{i}, x_{j}) = (x_{i} - x_{j})^{T}M_i(x_{i} - x_{j}) }[/math]
Strictly speaking, this way of defining local distances does not yield a metric, since symmetry fails: the distance from [math]\displaystyle{ x_i }[/math] to [math]\displaystyle{ x_j }[/math] is measured with [math]\displaystyle{ M_i }[/math], while the distance from [math]\displaystyle{ x_j }[/math] to [math]\displaystyle{ x_i }[/math] is measured with [math]\displaystyle{ M_j }[/math]. The paper nonetheless refers to this as metric learning; this abuse of terminology is harmless and keeps the intuition clear.
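The asymmetry is easy to see numerically. The following sketch uses hypothetical values for two points and their local metrics (diagonal matrices chosen purely for illustration):

```python
import numpy as np

# Two points with their own local metrics (hypothetical values for illustration)
x_i = np.array([0.0, 1.0])
x_j = np.array([2.0, 0.0])
M_i = np.diag([1.0, 4.0])   # local metric at x_i
M_j = np.diag([3.0, 1.0])   # local metric at x_j

def sq_dist(M, a, b):
    """Squared Mahalanobis distance (a - b)^T M (a - b)."""
    diff = a - b
    return diff @ M @ diff

# Symmetry fails: measuring with x_i's metric vs. x_j's metric disagrees
print(sq_dist(M_i, x_i, x_j))  # 8.0
print(sq_dist(M_j, x_j, x_i))  # 13.0
```

A true metric would require these two values to coincide; here they differ whenever [math]\displaystyle{ M_i \neq M_j }[/math].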
We assume that each local metric [math]\displaystyle{ M_i }[/math] is a linear combination of the basis metrics, [math]\displaystyle{ M_{b_k} }[/math]
[math]\displaystyle{ M_i = \sum_{b_k}W_{i,b_k}M_{b_k}, \quad W_{i,b_k} \geq 0, \quad \sum_{b_k}W_{i,b_k}=1 }[/math]
As can be seen, both the basis metrics [math]\displaystyle{ M_{b_k} }[/math] and the weights [math]\displaystyle{ W_{i,b_k} }[/math] must be learned; once they are known, the local (pseudo-)metric of each point is fully determined.
Furthermore, each basis metric corresponds to an anchor point in the input space. These anchor points are found using k-means clustering (other methods could also be used).
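The construction above can be sketched as follows. This is a minimal illustration, not the paper's learning procedure: the anchors come from k-means as described, but the basis metrics and weights are placeholders (identity matrices and uniform weights) standing in for the learned quantities.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: n points in 2-D
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))

# Anchor points via k-means; each anchor gets one basis metric
K = 3
anchors = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).cluster_centers_

# Placeholder basis metrics (identity); in the paper these are learned
basis = np.stack([np.eye(2) for _ in range(K)])

# Placeholder weights: uniform over bases, each row on the simplex
W = np.full((X.shape[0], K), 1.0 / K)
assert np.allclose(W.sum(axis=1), 1.0)  # simplex constraint on each row

# Local metric of point i is the weighted sum of the basis metrics
M_0 = np.einsum('k,kab->ab', W[0], basis)
```

Because every row of `W` lies on the simplex and each basis metric is positive semidefinite, each resulting `M_i` is positive semidefinite as well.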
Learning the Weights
The weights are learned in an unsupervised way. In fact, the weights do not even depend on the basis metrics [math]\displaystyle{ M_{b_k} }[/math]. The weights are computed so that:
- Each training point is approximated as a weighted combination of the anchor points.
- The weights should be local; if a training sample is far from an anchor point, the corresponding weight should be small.
- The weights of nearby training points should be similar (i.e., manifold regularization)
Therefore, the optimization problem is defined as follows:
[math]\displaystyle{ \min_{W} g(W) = \|X-WU\|^2_F + \lambda_{1}\,\mathrm{tr}(WG) + \lambda_{2}\,\mathrm{tr}(W^{T}LW) }[/math]
[math]\displaystyle{ \text{s.t. } W_{i,b_k} \geq 0, \quad \sum_{b_k}W_{i,b_k}=1 }[/math]
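The three terms of the objective match the three requirements above: reconstruction from anchors, locality, and manifold smoothness. The sketch below evaluates the objective on toy data, assuming (as illustration, not from the source) that [math]\displaystyle{ G }[/math] holds squared distances between anchors and points and that [math]\displaystyle{ L }[/math] is the Laplacian of a similarity graph over the training points:

```python
import numpy as np

# Toy setup (hypothetical sizes): n points, K anchors, dimension d
rng = np.random.default_rng(1)
n, K, d = 6, 3, 2
X = rng.standard_normal((n, d))   # data matrix, rows are points
U = rng.standard_normal((K, d))   # anchor matrix, rows are anchors
W = np.full((n, K), 1.0 / K)      # feasible weights (rows on the simplex)

# Assumed locality matrix: G[k, i] = ||x_i - u_k||^2, so tr(WG) sums
# W[i, k] * G[k, i] and penalizes weight placed on distant anchors
G = ((U[:, None, :] - X[None, :, :]) ** 2).sum(-1)

# Graph Laplacian of a dense Gaussian-similarity graph, standing in for
# the nearest-neighbor graph used for manifold regularization
S = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
L = np.diag(S.sum(axis=1)) - S

def g(W, lam1=0.1, lam2=0.1):
    """Reconstruction + locality + manifold-smoothness objective."""
    recon = np.linalg.norm(X - W @ U, 'fro') ** 2
    return recon + lam1 * np.trace(W @ G) + lam2 * np.trace(W.T @ L @ W)

val = g(W)
```

In the paper this constrained problem is solved with a dedicated projected-gradient-style scheme; the sketch only evaluates the objective at a feasible point.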