Positive Semidefinite Metric Learning Using Boosting-like Algorithms
The BoostMetric algorithm proposed in the paper learns a Mahalanobis distance metric. It is claimed to be faster than previous metric learning algorithms because, instead of relying on a generic solver for the underlying semidefinite programming (SDP) problem, it solves the SDP with a boosting-like procedure built from rank-one matrices.
1. Mathematical model for the problem
The optimization problem of BoostMetric is almost the same as that of the large margin nearest neighbor algorithm (LMNN [1]). In a preprocessing step, the labeled training samples are transformed into triplets (a_i, a_j, a_k), where a_i and a_j belong to the same class while a_i and a_k belong to different classes. Let dist_{ij} and dist_{ik} denote the distance between a_i and a_j and the distance between a_i and a_k, respectively. The goal is to make dist_{ik} larger than dist_{ij}, i.e., to maximize the difference between these two distances.
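As a concrete illustration, here is a minimal sketch of how such triplets could be constructed from labeled data; the function and parameter names (`build_triplets`, `n_pairs`) are our own and not from the paper.

```python
import numpy as np

def build_triplets(labels, n_pairs=3, seed=0):
    """Form index triplets (i, j, k): a_i and a_j share a label, a_i and a_k do not.

    labels : (n,) array of class labels, one per sample a_i.
    n_pairs: how many same-class / different-class partners to sample per point.
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    triplets = []
    for i in range(n):
        same = np.where((labels == labels[i]) & (np.arange(n) != i))[0]
        diff = np.where(labels != labels[i])[0]
        if same.size == 0 or diff.size == 0:
            continue
        for j in rng.choice(same, size=min(n_pairs, same.size), replace=False):
            for k in rng.choice(diff, size=min(n_pairs, diff.size), replace=False):
                triplets.append((i, int(j), int(k)))
    return triplets
```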
Here the distance is the Mahalanobis distance, parameterized by a matrix X, as follows:
[math]\displaystyle{ dist_{ij}^{2}=\left \| L^Ta_i-L^Ta_j \right \|_2^2=(a_i-a_j)^TLL^T(a_i-a_j)=(a_i-a_j)^TX(a_i-a_j). }[/math]
where X = LL^T is, by construction, a positive semidefinite (p.s.d.) matrix.
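In code, the two equivalent forms of this squared distance (via the projection L or via X = LL^T) could look like the following sketch; the names are illustrative only.

```python
import numpy as np

def mahalanobis_sq(a_i, a_j, X):
    """Squared Mahalanobis distance (a_i - a_j)^T X (a_i - a_j)."""
    d = a_i - a_j
    return float(d @ X @ d)

# Check the equivalence with the projection form ||L^T a_i - L^T a_j||_2^2.
rng = np.random.default_rng(0)
L = rng.standard_normal((5, 3))
X = L @ L.T                      # X is p.s.d. by construction
a_i, a_j = rng.standard_normal(5), rng.standard_normal(5)
assert np.isclose(mahalanobis_sq(a_i, a_j, X),
                  np.sum((L.T @ a_i - L.T @ a_j) ** 2))
```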
We try to maximize the difference of the two squared distances:
[math]\displaystyle{ dist_{ik}^2-dist_{ij}^2=\left \langle A_r,X \right \rangle }[/math]
where
[math]\displaystyle{ A_r=(a_i-a_k)(a_i-a_k)^T-(a_i-a_j)(a_i-a_j)^T }[/math]
for all triplets (i,j,k).
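A small sketch of how A_r and the margin ⟨A_r, X⟩ could be computed for each triplet (assuming the index triplets built above; names are illustrative):

```python
import numpy as np

def A_r(A, i, j, k):
    """A_r = (a_i - a_k)(a_i - a_k)^T - (a_i - a_j)(a_i - a_j)^T, with rows of A as samples."""
    u = A[i] - A[k]
    v = A[i] - A[j]
    return np.outer(u, u) - np.outer(v, v)

def margin(Ar, X):
    """<A_r, X> = trace(A_r^T X) = dist_ik^2 - dist_ij^2."""
    return float(np.sum(Ar * X))
```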
Here X is the unknown. To find the optimal X, we need to solve a semidefinite program: minimize [math]\displaystyle{ \sum_{r}F(\left \langle A_r,X \right \rangle)+vTr(X) }[/math] over [math]\displaystyle{ X \succeq 0 }[/math], where [math]\displaystyle{ F(\cdot ) }[/math] is a convex loss function and v > 0 is a regularization parameter. The paper considers three loss functions: the hinge loss, the exponential loss, and the logistic loss. The trace-norm regularization can also be replaced by the Frobenius norm.
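For concreteness, the regularized objective could be evaluated as in the sketch below; one standard form of each loss, applied element-wise to the margins, is used here for illustration and is not necessarily the exact parameterization used in the paper.

```python
import numpy as np

hinge    = lambda z: np.maximum(0.0, 1.0 - z)   # hinge loss on the margin (one common form)
exp_loss = lambda z: np.exp(-z)                 # exponential loss
logistic = lambda z: np.log1p(np.exp(-z))       # logistic loss

def objective(X, A_list, loss, v):
    """sum_r F(<A_r, X>) + v * Tr(X)."""
    margins = np.array([np.sum(Ar * X) for Ar in A_list])
    return loss(margins).sum() + v * np.trace(X)
```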
2. Optimization problems for the three loss functions
For the hinge loss function, the optimization problem is formulated as:
[math]\displaystyle{ max_{\rho ,X, \xi }\rho-v\sum_{r=1}^{\left | I \right |}\xi_r, s.t.: \left \langle A_r, X \right \rangle \geq \rho-\xi_r, \forall r; X \succeq 0, Tr(X)=1; \xi \geq 0. }[/math]
where the constraint [math]\displaystyle{ Tr(X)=1 }[/math] removes the scale ambiguity (scaling X simply scales all distances by the same factor).
Any p.s.d. matrix with unit trace can be written as a convex combination of rank-one matrices whose traces are also one (its eigendecomposition provides one such combination), so X can be decomposed as:
[math]\displaystyle{ X=\sum_{j=1}^{J}w_jZ_j }[/math], with [math]\displaystyle{ w_j\gt 0 }[/math], [math]\displaystyle{ Rank(Z_j)=1 }[/math] and [math]\displaystyle{ Tr(Z_j)=1, \forall j. }[/math]
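This decomposition can be made concrete through the eigendecomposition of X: for a p.s.d. matrix with Tr(X) = 1, the eigenvalues are nonnegative and sum to one, so X = Σ_j λ_j u_j u_j^T is exactly such a convex combination. A small sketch verifying this numerically (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((4, 4))
X = L @ L.T
X /= np.trace(X)                     # enforce Tr(X) = 1

w, U = np.linalg.eigh(X)             # eigenvalues w >= 0 with sum(w) == 1
Z = [np.outer(U[:, j], U[:, j]) for j in range(len(w))]   # rank-one, unit-trace bases
X_rebuilt = sum(wj * Zj for wj, Zj in zip(w, Z))

assert np.allclose(X, X_rebuilt)
assert np.isclose(w.sum(), 1.0)
```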
Substituting this decomposition for X, we get
[math]\displaystyle{ \left \langle A_r,X\right \rangle=\left \langle A_r,\sum_{j=1}^{J}w_jZ_j\right \rangle=\sum_{j=1}^{J}w_j\left \langle A_r,Z_j\right \rangle=\sum_{j=1}^Jw_jH_{rj}=H_{r:}w, \forall r. }[/math]
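The matrix H with entries H_{rj} = ⟨A_r, Z_j⟩ can be assembled once and then reused whenever the margins H_{r:}ω are needed; a short sketch (names are illustrative):

```python
import numpy as np

def build_H(A_list, Z_list):
    """H[r, j] = <A_r, Z_j>, so that <A_r, X> = H[r, :] @ w for X = sum_j w_j Z_j."""
    return np.array([[np.sum(Ar * Zj) for Zj in Z_list] for Ar in A_list])
```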
Then the optimization problem can be reformulated as:
[math]\displaystyle{ max_{\rho ,\omega, \xi }\rho-v\sum_{r=1}^{\left | I \right |}\xi_r, s.t.: H_{r:}\omega\geq \rho-\xi_r, (r=1,...,\left | I \right |); \omega \geq 0, 1^T\omega=1;\xi \geq 0. }[/math]
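For a fixed set of rank-one bases Z_j, this reformulated problem is a linear program in (ρ, ω, ξ), so it can be checked with an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog purely for illustration; the paper itself grows the basis set with a boosting-style column-generation procedure rather than solving one fixed LP.

```python
import numpy as np
from scipy.optimize import linprog

def solve_hinge_lp(H, v):
    """Solve  max rho - v*sum(xi)  s.t.  H[r,:] @ w >= rho - xi_r,
    w >= 0, sum(w) = 1, xi >= 0, as a linear program.

    Variable order: x = [rho, w_1..w_J, xi_1..xi_R].
    """
    R, J = H.shape
    # minimize -rho + v * sum(xi)  (equivalent to the maximization above)
    c = np.concatenate(([-1.0], np.zeros(J), v * np.ones(R)))
    # rho - H[r, :] @ w - xi_r <= 0 for every triplet r
    A_ub = np.hstack([np.ones((R, 1)), -H, -np.eye(R)])
    b_ub = np.zeros(R)
    # sum(w) = 1
    A_eq = np.concatenate(([0.0], np.ones(J), np.zeros(R)))[None, :]
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0, None)] * (J + R)   # rho free, w >= 0, xi >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    rho, w, xi = res.x[0], res.x[1:1 + J], res.x[1 + J:]
    return rho, w, xi
```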
For the exponential loss function, the optimization problem after substituting X is:
[math]\displaystyle{ min_{\rho, \omega}log (\sum_{r=1}^{\left | I \right |}exp(-\rho_r))+v\mathbf{1}^T\mathbf{\omega} }[/math]
[math]\displaystyle{ s.t.: \rho_r=H_{r:}\omega, r=1,...,|I|; \omega \geq 0. }[/math]
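A sketch of evaluating this log-sum-exp objective for a candidate ω, using a numerically stable logsumexp (names are illustrative):

```python
import numpy as np
from scipy.special import logsumexp

def exp_loss_objective(w, H, v):
    """log( sum_r exp(-rho_r) ) + v * 1^T w,  with rho_r = H[r, :] @ w."""
    rho = H @ w
    return logsumexp(-rho) + v * np.sum(w)
```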
For the logistic loss function, with logit(z) = log(1 + exp(-z)), the optimization problem is:
[math]\displaystyle{ min_{\rho,\omega}\sum_{r=1}^{|I|}logit(\rho_r)+v\mathbf{1}^T\mathbf{\omega} }[/math]
[math]\displaystyle{ s.t.: \rho_r=H_{r:}\omega, r=1,...,|I|; \omega \geq 0. }[/math]
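Analogously, this logistic-loss objective could be evaluated as in the following sketch (np.logaddexp is used for numerical stability; names are illustrative):

```python
import numpy as np

def logistic_loss_objective(w, H, v):
    """sum_r log(1 + exp(-rho_r)) + v * 1^T w,  with rho_r = H[r, :] @ w."""
    rho = H @ w
    return np.sum(np.logaddexp(0.0, -rho)) + v * np.sum(w)
```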