Learning a Nonlinear Embedding by Preserving Class Neighborhood Structure


=Introduction=

The paper <ref>Salakhutdinov, R., & Hinton, G. E. (2007). Learning a nonlinear embedding by preserving class neighbourhood structure. AI and Statistics.</ref> presented here describes a method for learning a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour (KNN) classification performs well. Since the performance of distance-based algorithms such as KNN depends heavily on the metric used to compare examples, the main objective of the proposed algorithm is to learn a good similarity measure, one that also provides insight into how the high-dimensional data is organized. The nonlinear transformation is learned by pre-training and then fine-tuning a multilayer neural network. The authors also show how the learned transformation can be improved further using unlabeled data. Experimental results on a widely used version of the MNIST handwritten digit recognition task show that the proposed algorithm achieves a much lower error rate than support vector machines or standard backpropagation.
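To make the overall pipeline concrete, the following is a minimal sketch (not code from the paper) of how a learned nonlinear embedding is used at test time: every input is mapped through the encoder to a low-dimensional code, and test points are then classified by KNN in code space. The two-layer architecture, the layer sizes, and the random weights below are placeholder assumptions standing in for whatever network pre-training and fine-tuning would actually produce.

<pre>
import numpy as np

def encode(X, W1, b1, W2, b2):
    """Map inputs to low-dimensional codes with a small feed-forward net.
    The architecture and weights here are illustrative placeholders."""
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # logistic hidden layer
    return h @ W2 + b2                          # linear code layer

def knn_predict(codes_train, y_train, codes_test, k=5):
    """Classify each test code by majority vote among its k nearest training codes."""
    preds = []
    for c in codes_test:
        d = np.sum((codes_train - c) ** 2, axis=1)   # squared Euclidean distances in code space
        nearest = np.argsort(d)[:k]                  # indices of the k closest training codes
        preds.append(np.bincount(y_train[nearest]).argmax())
    return np.array(preds)

# Toy data standing in for MNIST (784-dimensional inputs, 10 classes).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 784)), rng.integers(0, 10, size=100)
X_test = rng.normal(size=(10, 784))

# Placeholder weights; a real run would use the pre-trained, fine-tuned network.
W1, b1 = rng.normal(scale=0.01, size=(784, 500)), np.zeros(500)
W2, b2 = rng.normal(scale=0.01, size=(500, 30)), np.zeros(30)

y_pred = knn_predict(encode(X_train, W1, b1, W2, b2), y_train,
                     encode(X_test, W1, b1, W2, b2))
</pre>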

=Background and Related Work=

Learning a similarity measure (or distance metric) over the input space <math> {\mathbf X} </math> is an important task in machine learning, and is closely related to the feature extraction problem. A distance metric <math> \mathbf D </math> (e.g. Euclidean) measures the similarity between two input vectors <math> {\mathbf x}^a, {\mathbf x}^b \in {\mathbf X} </math> by computing <math> \mathbf D[{\mathbf f}(x^a|W),{\mathbf f}(x^b|W)]</math>, where <math> {\mathbf f}(x|W)</math> is the mapping from the input space <math> {\mathbf X} </math> to the feature space <math> {\mathbf Y} </math>, parametrized by <math> {\mathbf W} </math>. Previous work studied this problem in the setting where <math> \mathbf D </math> is the Euclidean distance and <math> {\mathbf f} </math> is a simple linear projection, i.e. <math> {\mathbf f}(x|W)=Wx </math>. For example, linear discriminant analysis (LDA) learns the matrix <math> W </math> that minimizes the ratio of within-class to between-class distances.
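As a small illustration of this linear setting (a sketch under the assumptions above, not taken from any of the cited methods), the distance between two inputs under a learned projection <math> W </math> is simply the Euclidean distance between their projections:

<pre>
import numpy as np

def linear_metric_distance(xa, xb, W):
    """Squared Euclidean distance D[f(xa|W), f(xb|W)] with the linear map f(x|W) = W x."""
    return np.sum((W @ xa - W @ xb) ** 2)

rng = np.random.default_rng(1)
W = rng.normal(size=(30, 784))           # a learned projection would replace this random W
xa, xb = rng.normal(size=784), rng.normal(size=784)
print(linear_metric_distance(xa, xb, W))
</pre>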

Globerson and Roweis <ref> A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In NIPS, 2005. </ref> proposed a method for learning the matrix <math> W </math> such that input vectors from the same class are mapped to a tight cluster. Weinberger et al. <ref> K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2005. </ref> also learned <math> W </math>, with the twin goals of making the K nearest neighbours of each example belong to its class and separating examples from different classes by a large margin. All of these methods rely on a linear transformation, which has a limited number of parameters and therefore cannot model higher-order correlations between the original data dimensions.

== Neighborhood Component Analysis ==

=Nonlinear NCA=

== Pre-training step ==

== Fine-tuning step ==

=Regularized Nonlinear NCA=

== Splitting codes into class-relevant and class-irrelevant parts ==

=Experiments=

=References=

<references/>