learning a Nonlinear Embedding by Preserving Class Neighborhood Structure

From statwiki
Revision as of 22:47, 30 June 2009 by Bkhalegh (talk | contribs)
Jump to: navigation, search


The paper <ref>Salakhutdinov, R., & Hinton, G. E. (2007). Learning a nonlinear embedding by preserving class neighbourhood structure. AI and Statistics.</ref> presented here describes a method to learn a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour classification performs well. As the performance of algorithms like K-nearest neighbours (KNN) that are based on computing distances, the main objective of the proposed algorithm is to learn a good similarity measure that can provide insight into how high-dimensional data is organized. The nonlinear transformation is learned by pre-training and fine-tuning a multilayer neural network. The authors also show how to enhance the performance of non-linear transformation further using unlabeled data. Experimental results on a widely used version of the MNIST handwritten digit recognition task show that proposed algorithm achieves a much lower error rate than SVM or standard backpropagation.

Background and Related Work

Learning a similarity measure (or distance metric) over the input space [math] {\mathbf X} [/math] is an important task in machine learning, and is closely related to the feature extraction problem. A distance metric [math] \mathbf D [/math] (e. g. Euclidean) measures the similarity between two input vectors [math] {\mathbf x}^a, {\mathbf x}^b \in {\mathbf X} [/math] by computing [math] \mathbf D[{\mathbf f}(x^a|W),{\mathbf f}(x^b|W)][/math], where [math] {\mathbf f}(x|W)[/math] represents the mapping function from input vector [math] {\mathbf X} [/math] to feature space [math] {\mathbf Y} [/math] parametrized by [math] {\mathbf W} [/math]. Previous work studied this problem where [math] \mathbf D [/math] is the Euclidean distance and [math] {\mathbf f} [/math] is simple linear projection, i.e. [math] {\mathbf f}(x|W)=Wx [/math]. For example Linear discriminant analysis (LDA) learns the matrix [math] W [/math] that minimizes the within-class distances to between-class distances ratio.

Globerson and Roweis <ref> A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In NIPS, 2005 </ref> proposed a method for learning the matrix [math] W [/math] such that the input vectors from the same class are mapped to a tight cluster. Also, Weinberger et.al. <ref> K. Q.Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2005. </ref> also learned [math] W [/math] with the goals of both making the K-nearest neighbours belong to the same class and making examples from different classes be separated by a large margin. All these methods rely on linear transformation, which has a limited number of parameters and thus cannot model higher-order correlations between the original data dimensions.

Proposed Method

Neighborhood Component Analysis

Pre-training step

Fine-tuning step

Regularized Nonlinear NCA

Splitting codes into class-relevant and class-irrelevant parts