Neighbourhood Components Analysis: Difference between revisions

Revision as of 23:32, 28 June 2009

Introduction

Neighbourhood Components Analysis (NCA) is a method for learning a Mahalanobis distance measure for k-nearest neighbours (KNN). In particular, it finds a distance metric that maximises the leave-one-out (LOO) classification accuracy on the training set for a stochastic variant of KNN. NCA can also learn a low-dimensional linear embedding of labelled data, which is useful for data visualisation and for faster KNN classification.

k-Nearest Neighbours

k-Nearest neighbours is a simple classification technique that determines a test point's label by looking at the labels of the [math]\displaystyle{ k }[/math] training points that are nearest to the test point. This is a surprisingly effective method: it produces a non-linear decision surface and is non-parametric, except for the parameter [math]\displaystyle{ k }[/math].
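As a rough illustration of the rule described above, here is a minimal sketch in plain Python. The function names (`squared_euclidean`, `knn_predict`), the majority-vote rule, the choice of Euclidean distance, and the toy data are all assumptions made for this example; they are not taken from the NCA paper.

```python
from collections import Counter


def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def knn_predict(train_points, train_labels, test_point, k=3):
    """Return the majority label among the k training points nearest to test_point."""
    # Rank every training index by its distance to the test point.
    # Note that each training point is compared against the test point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: squared_euclidean(train_points[i], test_point),
    )
    # Count the labels of the k closest points and take the most common one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters with labels "a" and "b".
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

label_near_origin = knn_predict(train_points, train_labels, (0.05, 0.05), k=3)
label_near_ones = knn_predict(train_points, train_labels, (1.0, 0.95), k=3)
```

The only tunable quantity in this sketch is `k`; the distance function is fixed by hand, which is exactly the choice that a learned metric would replace.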

However, KNN suffers from two problems. First, it can be computationally expensive to classify points, as each test point must be compared to the entire training set. Second, there is the problem of choosing which distance metric to use to define the "nearest" points.

NCA and Stochastic Nearest Neighbours

NCA addresses both of these problems. It learns a distance metric that defines which points are nearest. It can also restrict this distance metric to be low rank, which reduces the dimensionality of the data and thus reduces storage requirements and search time.

NCA finds the matrix [math]\displaystyle{ A }[/math], where [math]\displaystyle{ Q=A^TA }[/math] and the distance between two points [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] is defined as: [math]\displaystyle{ d(x,y) = (x - y)^TQ(x-y) = (Ax - Ay)^T(Ax - Ay) }[/math]
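The identity in the distance definition can be checked numerically. The sketch below, in plain Python, computes the distance both as the quadratic form in Q and as the squared Euclidean distance after projecting with A. The helper names (`matvec`, `gram`, `distance_via_A`, `distance_via_Q`) and the example matrices are hypothetical, chosen only for illustration; the matrix A here is fixed by hand, not learned by NCA.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gram(A):
    """Return Q = A^T A for a matrix A given as a list of rows."""
    columns = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def distance_via_A(A, x, y):
    """(Ax - Ay)^T (Ax - Ay): squared Euclidean distance after projecting by A."""
    diff = [a - b for a, b in zip(x, y)]
    projected = matvec(A, diff)
    return sum(p * p for p in projected)


def distance_via_Q(Q, x, y):
    """(x - y)^T Q (x - y): the same distance written as a quadratic form."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(d * q for d, q in zip(diff, matvec(Q, diff)))


x, y = [1.0, 2.0], [0.0, 0.0]

# Square (full-rank) example matrix, assumed for illustration.
A_full = [[2.0, 0.0], [1.0, 1.0]]
d_full_A = distance_via_A(A_full, x, y)
d_full_Q = distance_via_Q(gram(A_full), x, y)

# Nonsquare 1x2 matrix: points are projected to one dimension,
# so Q = A^T A has rank one.
A_low = [[1.0, 1.0]]
d_low_A = distance_via_A(A_low, x, y)
d_low_Q = distance_via_Q(gram(A_low), x, y)
```

With a nonsquare A, only the projected points Ax need to be stored and searched, which is where the storage and search-time savings of a low-rank metric come from.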


Contents

Introduction

k-Nearest Neighbours

NCA and Stochastic Nearest Neighbours

Low Rank Distance Metrics and Nonsquare Projection

Experimental Results

Extensions to Continuous Labels and Semi-Supervised Learning

Relationship to Other Methods
