Learning a Nonlinear Embedding by Preserving Class Neighborhood Structure


=Introduction=

The paper <ref>Salakhutdinov, R., & Hinton, G. E. (2007). Learning a nonlinear embedding by preserving class neighbourhood structure. AI and Statistics.</ref> presented here describes a method for learning a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour (KNN) classification performs well. Since the performance of distance-based algorithms such as KNN depends heavily on the metric used to compare examples, the main objective of the proposed algorithm is to learn a good similarity measure, one that also provides insight into how the high-dimensional data is organized. The nonlinear transformation is learned by pre-training and then fine-tuning a multilayer neural network. The authors also show how the learned transformation can be improved further using unlabeled data. Experimental results on a widely used version of the MNIST handwritten digit recognition task show that the proposed algorithm achieves a much lower error rate than support vector machines or standard backpropagation.
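To make the overall pipeline concrete, the following is a minimal sketch (not code from the paper) of how a learned nonlinear embedding is used at test time: every input is mapped through the encoder to a low-dimensional code, and test points are then classified by KNN in code space. The two-layer architecture, the layer sizes, and the random weights below are placeholder assumptions standing in for whatever network pre-training and fine-tuning would actually produce.

<pre>
import numpy as np

def encode(X, W1, b1, W2, b2):
    """Map inputs to low-dimensional codes with a small feed-forward net.
    The architecture and weights here are illustrative placeholders."""
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # logistic hidden layer
    return h @ W2 + b2                          # linear code layer

def knn_predict(codes_train, y_train, codes_test, k=5):
    """Classify each test code by majority vote among its k nearest training codes."""
    preds = []
    for c in codes_test:
        d = np.sum((codes_train - c) ** 2, axis=1)   # squared Euclidean distances in code space
        nearest = np.argsort(d)[:k]                  # indices of the k closest training codes
        preds.append(np.bincount(y_train[nearest]).argmax())
    return np.array(preds)

# Toy data standing in for MNIST (784-dimensional inputs, 10 classes).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 784)), rng.integers(0, 10, size=100)
X_test = rng.normal(size=(10, 784))

# Placeholder weights; a real run would use the pre-trained, fine-tuned network.
W1, b1 = rng.normal(scale=0.01, size=(784, 500)), np.zeros(500)
W2, b2 = rng.normal(scale=0.01, size=(500, 30)), np.zeros(30)

y_pred = knn_predict(encode(X_train, W1, b1, W2, b2), y_train,
                     encode(X_test, W1, b1, W2, b2))
</pre>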

=Background and Related Work=

Learning a similarity measure (or distance metric) over the input space <math> {\mathbf X} </math> is an important task in machine learning, and is closely related to the feature extraction problem. A distance metric <math> \mathbf D </math> (e.g. Euclidean) measures the similarity between two input vectors <math> {\mathbf x}^a, {\mathbf x}^b \in {\mathbf X} </math> by computing <math> \mathbf D[{\mathbf f}(x^a|W),{\mathbf f}(x^b|W)]</math>, where <math> {\mathbf f}(x|W)</math> is the mapping from the input space <math> {\mathbf X} </math> to the feature space <math> {\mathbf Y} </math>, parametrized by <math> {\mathbf W} </math>. Previous work studied this problem in the setting where <math> \mathbf D </math> is the Euclidean distance and <math> {\mathbf f} </math> is a simple linear projection, i.e. <math> {\mathbf f}(x|W)=Wx </math>. For example, linear discriminant analysis (LDA) learns the matrix <math> W </math> that minimizes the ratio of within-class to between-class distances.
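As a small illustration of this linear setting (a sketch under the assumptions above, not taken from any of the cited methods), the distance between two inputs under a learned projection <math> W </math> is simply the Euclidean distance between their projections:

<pre>
import numpy as np

def linear_metric_distance(xa, xb, W):
    """Squared Euclidean distance D[f(xa|W), f(xb|W)] with the linear map f(x|W) = W x."""
    return np.sum((W @ xa - W @ xb) ** 2)

rng = np.random.default_rng(1)
W = rng.normal(size=(30, 784))           # a learned projection would replace this random W
xa, xb = rng.normal(size=784), rng.normal(size=784)
print(linear_metric_distance(xa, xb, W))
</pre>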

Globerson and Roweis <ref> A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In NIPS, 2005. </ref> proposed a method for learning the matrix <math> W </math> such that input vectors from the same class are mapped to a tight cluster. Weinberger et al. <ref> K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2005. </ref> also learned <math> W </math>, with the twin goals of making the K nearest neighbours of each example belong to its class and separating examples from different classes by a large margin. All of these methods rely on a linear transformation, which has a limited number of parameters and therefore cannot model higher-order correlations between the original data dimensions.

== Neighborhood Component Analysis ==

=Nonlinear NCA=

== Pre-training step ==

== Fine-tuning step ==

=Regularized Nonlinear NCA=

== Splitting codes into class-relevant and class-irrelevant parts ==

=Experiments=

=References=

<references/>