Difference between revisions of "learning a Nonlinear Embedding by Preserving Class Neighborhood Structure"

From statwiki
Jump to: navigation, search
Line 3: Line 3:
 
feature space in which K-nearest neighbour classification performs well. As the performance of algorithms like K-nearest neighbours (KNN) that are based on computing distances, the main objective of the proposed algorithm is to learn a good similarity measure that can provide insight into how high-dimensional data is organized. The nonlinear transformation is learned by pre-training and fine-tuning a multilayer neural network. The authors also show how to enhance the performance of non-linear transformation further using unlabeled data. Experimental results on a widely used version of the MNIST handwritten digit recognition task show that proposed algorithm achieves a much lower error rate than SVM or standard backpropagation.
 
feature space in which K-nearest neighbour classification performs well. As the performance of algorithms like K-nearest neighbours (KNN) that are based on computing distances, the main objective of the proposed algorithm is to learn a good similarity measure that can provide insight into how high-dimensional data is organized. The nonlinear transformation is learned by pre-training and fine-tuning a multilayer neural network. The authors also show how to enhance the performance of non-linear transformation further using unlabeled data. Experimental results on a widely used version of the MNIST handwritten digit recognition task show that proposed algorithm achieves a much lower error rate than SVM or standard backpropagation.
  
=Related work=
+
=Background and Related Work=
 +
Learning a similarity measure (or distance metric) over the input space <math> {\mathbf X} </math> is an important task in machine learning, and is closely related to the feature extraction problem.
 +
A distance metric <math> \mathbf D </math> (e. g. Euclidean) measures the similarity between two input vectors <math> {\mathbf x}^a, {\mathbf x}^b \in {\mathbf X} </math> by computing <math> \mathbf D[{\mathbf f}(x^a|W),{\mathbf f}(x^b|W)]</math>, where <math> {\mathbf f}(x|W)</math> represents the mapping function from input vector <math> {\mathbf X} </math> to feature space <math> {\mathbf Y} </math> parametrized by <math> {\mathbf W} </math>.
 +
Previous work studied this problem where <math> \mathbf D </math> is the Euclidean distance and
 +
<math> {\mathbf f} </math> is simple linear projection, i.e. <math> {\mathbf f}(x|W)=Wx </math>.
 +
For example Linear discriminant analysis (LDA) learns the matrix <math> W </math> that minimizes the within-class distances to between-class distances ratio. 
 +
 
 +
Globerson and Roweis <ref> A. Globerson and S. T. Roweis. Metric learning by collapsing
 +
classes. In NIPS, 2005 </ref> proposed a method for learning the matrix <math> W </math> such that the input vectors from the same class are mapped to a tight cluster. Also, Weinberger et.al. <ref> K. Q.Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2005. </ref> also learned <math> W </math> with the goals of both making the K-nearest neighbours belong to the same class and making examples from different classes be separated by a large margin. All these methods rely on linear transformation, which has a limited number of parameters and thus cannot model higher-order correlations between the
 +
original data dimensions.
  
 
== Neighborhood Component Analysis ==
 
== Neighborhood Component Analysis ==
Line 11: Line 20:
 
== Pre-training step ==
 
== Pre-training step ==
  
== Fine-tuning ==
+
== Fine-tuning step ==
  
 
=Regularized Nonlinear NCA=
 
=Regularized Nonlinear NCA=

Revision as of 22:28, 30 June 2009

Introduction

The paper <ref>Salakhutdinov, R., & Hinton, G. E. (2007). Learning a nonlinear embedding by preserving class neighbourhood structure. AI and Statistics.</ref> presented here describes a method to learn a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour classification performs well. As the performance of algorithms like K-nearest neighbours (KNN) that are based on computing distances, the main objective of the proposed algorithm is to learn a good similarity measure that can provide insight into how high-dimensional data is organized. The nonlinear transformation is learned by pre-training and fine-tuning a multilayer neural network. The authors also show how to enhance the performance of non-linear transformation further using unlabeled data. Experimental results on a widely used version of the MNIST handwritten digit recognition task show that proposed algorithm achieves a much lower error rate than SVM or standard backpropagation.

Background and Related Work

Learning a similarity measure (or distance metric) over the input space [math] {\mathbf X} [/math] is an important task in machine learning, and is closely related to the feature extraction problem. A distance metric [math] \mathbf D [/math] (e. g. Euclidean) measures the similarity between two input vectors [math] {\mathbf x}^a, {\mathbf x}^b \in {\mathbf X} [/math] by computing [math] \mathbf D[{\mathbf f}(x^a|W),{\mathbf f}(x^b|W)][/math], where [math] {\mathbf f}(x|W)[/math] represents the mapping function from input vector [math] {\mathbf X} [/math] to feature space [math] {\mathbf Y} [/math] parametrized by [math] {\mathbf W} [/math]. Previous work studied this problem where [math] \mathbf D [/math] is the Euclidean distance and [math] {\mathbf f} [/math] is simple linear projection, i.e. [math] {\mathbf f}(x|W)=Wx [/math]. For example Linear discriminant analysis (LDA) learns the matrix [math] W [/math] that minimizes the within-class distances to between-class distances ratio.

Globerson and Roweis <ref> A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In NIPS, 2005 </ref> proposed a method for learning the matrix [math] W [/math] such that the input vectors from the same class are mapped to a tight cluster. Also, Weinberger et.al. <ref> K. Q.Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2005. </ref> also learned [math] W [/math] with the goals of both making the K-nearest neighbours belong to the same class and making examples from different classes be separated by a large margin. All these methods rely on linear transformation, which has a limited number of parameters and thus cannot model higher-order correlations between the original data dimensions.

Neighborhood Component Analysis

Nonlinear NCA

Pre-training step

Fine-tuning step

Regularized Nonlinear NCA

Splitting codes into class-relevant and class-irrelevant parts

Experiments

References

<references/>