graph Laplacian Regularization for Larg-Scale Semidefinite Programming: Difference between revisions

From statwiki
Jump to navigation Jump to search
(Created page with '==Introduction== This paper is about a new approach for the discovery of low dimensional representations of high-dimensional data where, in many cases, local proximity measuremen...')
 
Line 31: Line 31:
casting the required optimization as SDP over small matrices with few constraints will obtain the following SDP:<br />
casting the required optimization as SDP over small matrices with few constraints will obtain the following SDP:<br />
Maximize:  <math>b^Ty - l</math> (9)<br />
Maximize:  <math>b^Ty - l</math> (9)<br />
subject to: (i) <math>Y \succeq 0 </math> and (ii)  <math>\begin{matrix} I & A^{1/2}y <br /> (A^{1/2}y)^T & l \end{bmatrix}</math><br />
subject to: (i) <math>Y \succeq 0 </math> and (ii)  <math>\begin{bmatrix} I & A^{1/2}y <br /> (A^{1/2}y)^T & l \end{bmatrix}</math><br />


By puting in this form, The only variables of the SDP are the <math>m(m+1)/2</math> elements of Y and the unknown scalar l. Constraints decrease to the positive semidifinteness of Y and linear matrix inequality of <math>m^2*m^2</math>. It is worth noting that the complexity of the SDP does not depend on the number of nodes (n) or edges in the network.<br />
By puting in this form, The only variables of the SDP are the <math>m(m+1)/2</math> elements of Y and the unknown scalar l. Constraints decrease to the positive semidifinteness of Y and linear matrix inequality of <math>m^2*m^2</math>. It is worth noting that the complexity of the SDP does not depend on the number of nodes (n) or edges in the network.<br />

Revision as of 19:27, 7 July 2009

Introduction

This paper is about a new approach for the discovery of low dimensional representations of high-dimensional data where, in many cases, local proximity measurements also available. Sensor localization is an example. Existing approaches use semidefinite programs (SDPs) with low rank solutions from convex optimization methods. However, SDPs approach doesn’t scale well for large inputs. The main contribution of this paper is introducing a method called matrix factorization for solving very large problems of the above type that leads to much smaller and faster SDPs than the previuos ones.

Sensor localization

Assuming only nearby sensors can estimate their local pairwise distances via radio transmitters, the problem is to identify the whole network topology.In other words, knowing that we have n sensors with [math]\displaystyle{ d_{ij} }[/math] as an estimate of local distance between adjacent sensors i and j, the desired output would be [math]\displaystyle{ x_1, x_2, ..., x_n \in R_2 }[/math] as the planar coordinates of sensors.

Work on this issue so far

work on this issue so far starts with minimizing sum-of-squares loss function as
[math]\displaystyle{ \min_{x_1,...,x_n}\Sigma_{i\sim j}{(\|x_i-x_j\|^2-d_{ij}^2)^2} }[/math] (1)
and adding a centering Constraint (assuming no sensor location is known in advance) as
[math]\displaystyle{ \|\Sigma_i{x_i}\|^2 = 0 }[/math] (2). The problem here is that the optimization is not convex and is more likely to be trapped by local minima. For solving this problem, an [math]\displaystyle{ n*n }[/math] inner product matrix X is defined as [math]\displaystyle{ X_{ij} = x_i*x_j }[/math] and by relaxing the constraint that sensor locations [math]\displaystyle{ x_i }[/math] lie in the [math]\displaystyle{ R^2 }[/math] plane , the following convex notation will be obtained:
Minimize: [math]\displaystyle{ \Sigma_{i\sim j}{(X_{ii}-2X_{ij}+X_{jj}-d_{ij}^2)^2} }[/math] (3)
subject to: (i) [math]\displaystyle{ \Sigma_{ij}{X_{ij}=0} }[/math] and (ii) [math]\displaystyle{ X \succeq 0 }[/math]
The vectors [math]\displaystyle{ x_i }[/math] will lie in a subspace with dimension equal to the rank of the solution X. Projecting [math]\displaystyle{ x_i }[/math]s into their 2D subspace of maximum variance, obtaining from the top 2 eigenvectors of X, will get planar coordinates. Howevere, the higher the rank of X, the greater the information loss after projection. Growing the error of projection with increasing rank leads us to add a low rank, or equivalently, high trace constraint. Therefore, an extra term is added to favor solutions with high variance(high trace):
Maximize: [math]\displaystyle{ tr(X)-v\Sigma_{i\sim j}{(X_{ii}-2X_{ij}+X_{jj}-d_{ij}^2)^2} }[/math] (4)
subject to: (i) [math]\displaystyle{ \Sigma_{ij}{X_{ij}=0} }[/math] and (ii) [math]\displaystyle{ X \succeq 0 }[/math]
where the parameter [math]\displaystyle{ v\gt 0 }[/math] balances the trade-off between maximizing variance and preserving local distances (MVU)

Matrix factorization

Assume G is a neighborhodd graph defined by the sensor network and location of sensors is a function defined over the nodes of this graph. Functions on a graph can be approximated using eigenvectors of graph’s Laplacian matrix as basis functions (spectral graph theory)
graph Laplacian is defined ....

Sensor's location can be approximated using the m bottom eigenvecotrs of the Laplacian matrix of G. Expanding these locations yields a matrix factorization for X so that : [math]\displaystyle{ x_i\approx\Sigma_{\alpha = 1}^m Q_{i\alpha}y_{\alpha} }[/math]
where Q is the [math]\displaystyle{ n*m }[/math] matrix with m bottom eigenvecors of Laplacian matrix and [math]\displaystyle{ y_{\alpha} }[/math] is unknown and depends on [math]\displaystyle{ d_{ij} }[/math]. Now, if we define the inner product of theses vectors as [math]\displaystyle{ Y_{\alpha\beta} = y_{\alpha}y_{\beta} }[/math] we will get the factorized matrix [math]\displaystyle{ X\approx QYQ^T }[/math] (6)
Using this approximation, we can solve an optimization for Y that is much smaller than X. Since Q stores mutulaly orthogonal eigenvectors we can imply [math]\displaystyle{ tr(Y)=tr(X) }[/math]. In addition, [math]\displaystyle{ QYQ^T }[/math] satisfies centering constraint because uniform eigenvectors are not included. Therefore, the optimization would change to to folloing equations:
Maximize: [math]\displaystyle{ tr(Y)-v\Sigma_{i\sim j}{((QYQ^T)_{ii}-2(QYQ^T)_{ij}+(QYQ^T)_{jj}-d_{ij}^2)^2} }[/math] (7)
subject to: [math]\displaystyle{ Y \succeq 0 }[/math]

Formulation as SDP

casting the required optimization as SDP over small matrices with few constraints will obtain the following SDP:
Maximize: [math]\displaystyle{ b^Ty - l }[/math] (9)
subject to: (i) [math]\displaystyle{ Y \succeq 0 }[/math] and (ii) [math]\displaystyle{ \begin{bmatrix} I & A^{1/2}y \lt br /\gt (A^{1/2}y)^T & l \end{bmatrix} }[/math]

By puting in this form, The only variables of the SDP are the [math]\displaystyle{ m(m+1)/2 }[/math] elements of Y and the unknown scalar l. Constraints decrease to the positive semidifinteness of Y and linear matrix inequality of [math]\displaystyle{ m^2*m^2 }[/math]. It is worth noting that the complexity of the SDP does not depend on the number of nodes (n) or edges in the network.

Gradient based improvement

As the matrix factorization only provides an approximation to the global minimum, a refinemnetis needed by using it as initial starting point for gradient descent in the first equation. It can be done in 2 steps:
First, starting from the m-dimensional solution of eq. (6), use conjugate gradient methods to maximize the objective function in eq. (4)
Second, Project the results from the previous step into the R2 plane and use conjugate gradient methods to minimize the loss function in eq. (1) where conjugate gradient method is an iterative method for minimizing a quadratic function where its Hessian matrix (matrix of second partial derivatives) is positive

Results

The figure below shows sensor locations inferred for n = 1055 largest cities in continental US.Local distances were estimated up to 18 neighbors within radius r = 0.09. Local measurements were corrupted by 10% Gaussian noise over the true local distance. Using m = 10 bottom eigenvectors of graph Laplacian, the solution provides a good initial starting point(left picture) for gradient-based improvement. The right picture is senssor locations after the improvement.

The second simulated network, the figure below, placed nodes at n=20,000 uniformly sampled points inside the unit square. local distances were estimated up to 20 other nodes within radius r = 0.06. Using m = 10 bottom eigenvectors of graph Laplacian, 19s was taken to construct and solve the SDP and 52s was taken for 100 iterations in conjugate gradient descent.

For the simulated networks with nodes at US cities, the figure below plots the loss function in eq. (1) vs. number of eigenvectors. It also plots computation time vs. number of eigenvectors. It can be inferred from the figure that there is a trade-off between getting better solution and increasing computation time. Also, we can see that [math]\displaystyle{ m\approx 10 }[/math] best manages this trade-off.

Conclusion

An approach for inferring low dimensional representations from local distance constraints using MVU was proposed.Using matrix factorization computed from the bottom eigenvectors of the graph Laplacian was the main idea of the approach. The initial solution can be refined by local search methods. This approach is suitable for large input and its complexity does not depend on the input.