Visualizing Similarity Data with a Mixture of Maps

Introduction

The main idea of this paper is to show how several different two-dimensional maps can be used to visualize a set of pairwise similarities. Aspect maps resemble both clustering (in modelling pairwise similarities as a mixture of different types of similarity) and multi-dimensional scaling (in modelling each type of similarity by a two-dimensional map). While methods such as PCA and MDS (Metric Multi-dimensional Scaling) are simple and fast, their main drawback is that they minimize a cost function focused mainly on modelling large dissimilarities rather than small ones. As a result, they do not provide good visualizations of data that lies on a curved low-dimensional manifold in a high-dimensional space. Conversely, methods such as Local MDS, LLE, Maximum Variance Unfolding, and Stochastic Neighbour Embedding (SNE) model local distances accurately in the two-dimensional visualization, but model larger distances inaccurately.

SNE has two advantages over methods such as LLE: although the SNE objective function is difficult to optimize, it leads to much better solutions; and since SNE is based on a probabilistic model, it is much more effective at producing good visualizations. In the next section, we explain how SNE works.

Stochastic Neighbour Embedding

The core of the SNE method lies in converting high-dimensional distance or similarity data into a set of conditional probabilities <math>p_{j|i}</math>, each of which represents the probability that object <math>i</math> would pick object <math>j</math> as its neighbour if it were only allowed to pick one neighbour. For objects in a high-dimensional Euclidean space, where our data points consist of the coordinates of the objects, we can find <math>p_{j|i}</math> for each object <math>i</math> by using a spherical Gaussian distribution centered at the high-dimensional position of <math>i</math>, <math>x_i</math>. We set <math>p_{i|i} = 0</math> and, for <math>j \neq i</math>,

<center> <math> p_{j|i} = \frac{\exp(-||x_i-x_j||^2/ \sigma_i^2)}{\sum_{k \neq i} \exp(-||x_i-x_k||^2/ \sigma_i^2)} </math> </center>
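
To make this concrete, here is a minimal NumPy sketch of the computation, assuming the data points are stored as the rows of an <math>N \times d</math> array and that each point has its own width <math>\sigma_i</math>; the function name conditional_probabilities is illustrative, not from the paper.

<pre>
import numpy as np

def conditional_probabilities(X, sigma):
    # X: (N, d) array of data points; sigma: length-N array of Gaussian widths.
    # Squared Euclidean distances ||x_i - x_j||^2 between all pairs of points.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Unnormalized affinities exp(-||x_i - x_j||^2 / sigma_i^2); row i uses sigma_i.
    P = np.exp(-D / (sigma[:, None] ** 2))
    # An object never picks itself as a neighbour: p_{i|i} = 0.
    np.fill_diagonal(P, 0.0)
    # Normalize each row so that sum_j p_{j|i} = 1.
    return P / P.sum(axis=1, keepdims=True)
</pre>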

Note that given a set of pairwise distances between objects, <math>||x_i - x_j||</math>, we can use the above equation to derive the same probabilities. In practice, given a set of <math>N</math> points, we set the variance of the Gaussian <math>\sigma_i^2</math> either by hand, or we find it by a binary search for the value of <math>\sigma_i</math> that makes the entropy of the distribution over neighbours equal to <math>\log_2 M</math>. (Recall that the entropy of the distribution <math>P_i</math> over neighbours is <math>H(P_i) = \sum_{j} p_{j|i} \log_2 (1/p_{j|i})</math>, where <math>p \log_2(1/p)</math> is understood to be zero when <math>p = 0</math>.) This is done by starting from a number <math>M \ll N</math> and performing the binary search until the entropy of <math>P_i</math> is within some predetermined small tolerance of <math>\log_2 M</math>.
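
The binary search itself can be sketched as follows. The entropy of <math>P_i</math> increases monotonically with <math>\sigma_i</math> (a wider Gaussian spreads the probability more evenly over neighbours), which is what makes bisection applicable; the helper names, bracketing interval, and tolerance below are illustrative assumptions, not taken from the paper.

<pre>
import numpy as np

def row_entropy(x_i, X_rest, sigma):
    # Entropy (in bits) of the distribution P_i induced by width sigma.
    d = np.sum((X_rest - x_i) ** 2, axis=1)   # ||x_i - x_k||^2 for all k != i
    p = np.exp(-d / sigma ** 2)
    p = p / p.sum()                           # p_{j|i}
    p = p[p > 0]                              # p log2(1/p) is taken as 0 when p = 0
    return -np.sum(p * np.log2(p))

def binary_search_sigma(x_i, X_rest, M, tol=1e-4, max_iter=50):
    # Bisect for sigma_i such that H(P_i) is within tol of log2(M).
    target = np.log2(M)
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        sigma = 0.5 * (lo + hi)
        h = row_entropy(x_i, X_rest, sigma)
        if abs(h - target) < tol:
            break
        if h > target:
            hi = sigma   # distribution too flat: shrink sigma
        else:
            lo = sigma   # distribution too peaked: grow sigma
    return sigma
</pre>

Running this search once per point then yields the per-point variances <math>\sigma_i^2</math> used in the equation above.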