positive Semidefinite Metric Learning Using Boosting-like Algorithms: Difference between revisions

From statwiki
Jump to navigation Jump to search
No edit summary
Line 13: Line 13:
where we can see that the matrix X is a positive semi definite (p.s.d.) matrix.
where we can see that the matrix X is a positive semi definite (p.s.d.) matrix.


To make calculation easier, we maximize </math>dist_{ij}^2-dist_{ik}^2</math> instead. Then we denote
To make calculation easier, we maximize <math>dist_{ij}^2-dist_{ik}^2</math> instead. Then we denote


<math>dist_{ij}^2-dist_{ik}^2=<A_r,X > </math>
<math>dist_{ij}^2-dist_{ik}^2=<A_r,X > </math>
Line 24: Line 24:


Here X is unknown. To find the optimal  X, we need to solve a semi-definite programming problem: to minimize <math>\sum_{r}F(<A_r,X>)+vTr(X)</math> where <math>F(\cdot )</math> is a convex loss function. The paper provides three loss functions for modeling: hinge loss, exponential loss and logistic loss. Also the trace norm regularization can be replaced by the Frobenius norm.
Here X is unknown. To find the optimal  X, we need to solve a semi-definite programming problem: to minimize <math>\sum_{r}F(<A_r,X>)+vTr(X)</math> where <math>F(\cdot )</math> is a convex loss function. The paper provides three loss functions for modeling: hinge loss, exponential loss and logistic loss. Also the trace norm regularization can be replaced by the Frobenius norm.


== 2. Optimization problems for three loss functions ==
== 2. Optimization problems for three loss functions ==

Revision as of 18:57, 4 July 2013

The BoostMetric algorithm proposed in the paper aims to learn a Mahlanobis distance. It is claimed to be faster than the previous metric learning algorithms because it uses a special optimization technique to solve the semi-definite programming (SDP) problem.


1. Mathematical model for the problem

The optimization problem of BoostMetric is almost the same as the large margin nearest neighbor algorithm (LMNN [1]). In the preprocessing step, the labeled training samples are required to be transformed into "triplets" (a i, a j, a k), where a i and a j are in the same class, but a i and a k are in different classes. Let us denote dist i,j and disti,k as the distance between ai and aj and the distance between ai and ak separately. The goal is to maximize the difference between these two distances.

Here the distance is Mahlanobis matrix represented as follows:

[math]\displaystyle{ dist_{ij}^{2}=\left \| L^Ta_i-L^Ta_j \right \|_2^2=(a_i-a_j)^TLL^T(a_i-a_j)=(a_i-a_j)^TX(a_i-a_j). }[/math]

where we can see that the matrix X is a positive semi definite (p.s.d.) matrix.

To make calculation easier, we maximize [math]\displaystyle{ dist_{ij}^2-dist_{ik}^2 }[/math] instead. Then we denote

[math]\displaystyle{ dist_{ij}^2-dist_{ik}^2=\lt A_r,X \gt }[/math]

where

[math]\displaystyle{ A_r=(a_i-a_k)(a_i-a_k)^T-(a_i-a_j)(a_i-a_j)^T }[/math]

for all triplets (i,j,k).

Here X is unknown. To find the optimal X, we need to solve a semi-definite programming problem: to minimize [math]\displaystyle{ \sum_{r}F(\lt A_r,X\gt )+vTr(X) }[/math] where [math]\displaystyle{ F(\cdot ) }[/math] is a convex loss function. The paper provides three loss functions for modeling: hinge loss, exponential loss and logistic loss. Also the trace norm regularization can be replaced by the Frobenius norm.

2. Optimization problems for three loss functions

For the hinge loss function, the optimization problem is formulated as:

[math]\displaystyle{ max_{\rho ,\omega, \xi }\rho-v\sum_{r=1}^{\left | I \right |}\xi_r, s.t.: \left \langle A_r, X \right \rangle \geq \rho-\xi_r, \forall r; X \succeq 0, Tr(X)=1; \xi \geq 0. }[/math]

where the constraint [math]\displaystyle{ Tr(X)=1 }[/math] is to remove the "scale ambiguity".

Based on some theories, X can be decomposed into a convex combination of rank-one matrices. Also their traces are constrained to one:

[math]\displaystyle{ X=\sum_{j=1}^{J}w_jZ_j }[/math], with [math]\displaystyle{ w_j\gt 0 }[/math], [math]\displaystyle{ Rank(Z_j)=1 }[/math] and [math]\displaystyle{ Tr(Z_j)=1, \forall j. }[/math]

Substituting X, we can get

[math]\displaystyle{ \left \langle A_r,X\right \rangle=\left \langle A_r,\sum_{j=1}^{J}w_jZ_j\right \rangle=\sum_{j=1}^{J}w_j\left \langle A_r,Z_j\right \rangle=\sum_{j=1}^Jw_jH_{rj}=H_{r:}w, \forall r. }[/math]

Then the optimization problem can be reformulated as:

[math]\displaystyle{ max_{\rho ,\omega, \xi }\rho-v\sum_{r=1}^{\left | I \right |}\xi_r, s.t.: H_{r:}\omega\geq \rho-\xi_r, (r=1,...,\left | I \right |); \omega \geq 0, 1^T\omega=1;\xi \geq 0. }[/math]

Its Lagrange dual problem is

[math]\displaystyle{ min_{\pi, u}\pi s.t.: \sum_{r=1}^{\left | I \right |}u_rH_{r:} \leq \pi\mathbf{1}^T; \mathbf{1}^Tu=1, 0 \leq u \leq v\mathbf{1}. }[/math]