Self-Taught Learning


Introduction

Self-taught learning is a new paradigm in machine learning that builds on ideas from existing supervised, semi-supervised and transfer learning algorithms. The differences in these methods depend on the availability of labeled and unlabeled data:

  • Supervised learning - All data is labeled and of the same type (i.e., it shares the class labels of the classification task).
  • Semi-supervised learning - Some of the data is labeled, but all of it is of the same type.
  • Transfer learning - All data is labeled, but some of it is of a different type (i.e., it has class labels that do not apply to the data set we wish to classify).

Self-taught learning combines the latter two ideas. It uses labeled data with labels from the desired classes, together with unlabeled data from other, but somehow similar, classes. It is important to emphasize that the unlabeled data need not belong to the classes we wish to assign.

The additional unlabeled data can then be used to learn a "higher-level feature representation on the inputs", which makes classification simpler. Once this representation is found, it is applied to the labeled data, and classification is subsequently done in the new representation.

Formally, suppose we have <math>m</math> labeled training points <math>\{(x_{\ell}^{(i)}, y^{(i)})\},\ i = 1,\dots,m</math>, where <math>x_{\ell}^{(i)} \in \mathbb{R}^n</math> and <math>y^{(i)}</math> is a class label. Further assume that these points are drawn independently and identically from some distribution. In addition, we also have a set of unlabeled data <math>\{x_u^{(i)}\}</math>, with <math>x_u^{(i)} \in \mathbb{R}^n</math>. It is not required that the unlabeled data come from the same distribution (which differentiates this method from semi-supervised learning); however, the data should somehow be relevant (as in transfer learning).
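The pipeline can be made concrete with a short sketch. The following Python snippet is not from the original paper; it uses scikit-learn's DictionaryLearning as a stand-in for the sparse-coding step and random arrays in place of real data. It learns a set of basis vectors from the unlabeled set <math>\{x_u^{(i)}\}</math> alone, re-represents the labeled points <math>x_{\ell}^{(i)}</math> as activations of those bases, and then trains an ordinary supervised classifier on the new features.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

# Placeholder data: x_u is the unlabeled set, (x_l, y) the labeled set,
# both n-dimensional; the unlabeled points need not share the labels in y.
rng = np.random.default_rng(0)
n = 64
x_u = rng.normal(size=(300, n))      # unlabeled examples
x_l = rng.normal(size=(100, n))      # labeled examples
y = rng.integers(0, 2, size=100)     # binary class labels

# Step 1: learn a higher-level feature representation (here, a dictionary of
# basis vectors found by sparse coding) from the unlabeled data only.
dico = DictionaryLearning(n_components=32, transform_algorithm='lasso_lars',
                          random_state=0).fit(x_u)

# Step 2: re-represent the labeled inputs as sparse activations of those bases.
a_l = dico.transform(x_l)

# Step 3: train an ordinary supervised classifier in the new representation.
clf = LogisticRegression(max_iter=1000).fit(a_l, y)
print(clf.score(a_l, y))
</syntaxhighlight>

The particular representation learner and final classifier are interchangeable; the essential point is that the feature-learning step never sees any labels.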


The main advantage of this approach is that similar unlabeled data is often easier and cheaper to obtain than labeled data belonging to our classes.


This method of learning applies best to the classification of images, text, and sound, so most of the discussion will focus on classification in these domains.