Visualizing Data using t-SNE: Difference between revisions

From statwiki
Line 1: Line 1:
=Introduction=
=Introduction=
The paper <ref>Laurens van der Maaten and Geoffrey Hinton, 2008. Visualizing Data using t-SNE.</ref> introduced a new nonlinear dimensionality reduction technique that visualizes high-dimensional data based on the pairwise similarities between the datapoints. This technique is a variation of Stochastic Neighbor Embedding, which was proposed by Hinton and Roweis <ref>G.E. Hinton and S.T. Roweis, 2002. Stochastic Neighbor Embedding.</ref>.
The paper <ref>Laurens van der Maaten and Geoffrey Hinton, 2008. Visualizing Data using t-SNE.</ref> introduced a new nonlinear dimensionality reduction technique that "embeds" high-dimensional data into a low-dimensional space. The technique is a variation of Stochastic Neighbor Embedding (SNE), proposed by Hinton and Roweis in 2002 <ref>G.E. Hinton and S.T. Roweis, 2002. Stochastic Neighbor Embedding.</ref>, in which the high-dimensional Euclidean distances between datapoints are converted into conditional probabilities that describe their similarities. t-SNE is based on the same idea but aims to be easier to optimize and to alleviate the "crowding problem". In addition, the authors showed that t-SNE can be applied to large data sets by using random walks on neighborhood graphs. The performance of t-SNE is demonstrated on a wide variety of data sets and compared with many other visualization techniques.


=Stochastic Neighbor Embedding=
=Stochastic Neighbor Embedding=

Revision as of 17:14, 12 July 2009

Introduction

The paper <ref>Laurens van der Maaten and Geoffrey Hinton, 2008. Visualizing Data using t-SNE.</ref> introduced a new nonlinear dimensionality reduction technique that "embeds" high-dimensional data into a low-dimensional space. The technique is a variation of Stochastic Neighbor Embedding (SNE), proposed by Hinton and Roweis in 2002 <ref>G.E. Hinton and S.T. Roweis, 2002. Stochastic Neighbor Embedding.</ref>, in which the high-dimensional Euclidean distances between datapoints are converted into conditional probabilities that describe their similarities. t-SNE is based on the same idea but aims to be easier to optimize and to alleviate the "crowding problem". In addition, the authors showed that t-SNE can be applied to large data sets by using random walks on neighborhood graphs. The performance of t-SNE is demonstrated on a wide variety of data sets and compared with many other visualization techniques.
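
The conversion of Euclidean distances into conditional probabilities mentioned above can be illustrated with a minimal sketch. It assumes the usual SNE form, in which the similarity of point j to point i is a Gaussian of their squared distance, normalized over all other points. The function name <code>conditional_probabilities</code>, the toy data, and the single fixed bandwidth <code>sigma</code> are assumptions made for illustration only; SNE normally selects a separate bandwidth for each point from a user-chosen perplexity, which is not shown here.

```python
import math

def conditional_probabilities(points, sigma=1.0):
    """Return P where P[i][j] approximates p_{j|i}, the similarity of j to i.

    Simplifying assumption: one fixed bandwidth `sigma` for every point,
    rather than a per-point bandwidth tuned to a target perplexity.
    """
    n = len(points)
    P = []
    for i in range(n):
        weights = []
        for j in range(n):
            if j == i:
                # A point is not treated as its own neighbor.
                weights.append(0.0)
                continue
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        total = sum(weights)
        # Normalize so that each row is a probability distribution over neighbors.
        P.append([w / total for w in weights])
    return P

# Hypothetical toy data: two nearby points and one distant point.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
P = conditional_probabilities(points)
```

In this toy example each row of <code>P</code> sums to one, and the nearby point receives almost all of the probability mass in the first row, which is the sense in which the probabilities "describe similarities".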

Stochastic Neighbor Embedding

t-Distributed Stochastic Neighbor Embedding

Symmetric SNE

The Crowding Problem

Compensating for Mismatched Dimensionality by Mismatched Tails

Optimization Methods for t-SNE

Experiments with Different Data Sets

t-SNE for Large Data Sets

Weaknesses of t-SNE

Summary

References