Extracting and Composing Robust Features with Denoising Autoencoders


Introduction

This paper explores a new training principle for unsupervised learning of a representation, based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective.

Motivation

The approach is based on the use of an unsupervised training criterion to perform a layer-by-layer initialization. The procedure is as follows: each layer is first trained to produce a higher-level (hidden) representation of the observed patterns, based on the representation it receives as input from the layer below, by optimizing a local unsupervised criterion. Each level produces a representation of the input pattern that is more abstract than the previous level’s, because it is obtained by composing more operations. This initialization yields a starting point, from which a global fine-tuning of the model’s parameters is then performed using another training criterion appropriate for the task at hand.

This process gives better solutions than the ones obtained by random initialization.

The Denoising Autoencoder

A denoising autoencoder reconstructs a clean “repaired” input from a corrupted, partially destroyed one. This is done by first corrupting the initial input x into a partially destroyed version x̃ by means of a stochastic mapping. In this paper, the corruption consists of choosing a fixed number νd of components at random and forcing their value to 0, while leaving the rest untouched. Thus the objective function can be described as File:q1.png

The objective function minimized by stochastic gradient descent becomes: File:q3.png
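Since the equation images above may not render, the objective can be written out explicitly. This is a reconstruction from the original paper's notation (x^(i) a training example, q_D the corruption distribution, f_θ the encoder, g_θ' the decoder); the exact form shown in the missing images may differ slightly:

\theta^{*}, \theta'^{*} \;=\; \arg\min_{\theta,\,\theta'} \; \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{\tilde{x} \sim q_{\mathcal{D}}(\tilde{x} \mid x^{(i)})} \Big[ L_{H}\big(x^{(i)},\, g_{\theta'}(f_{\theta}(\tilde{x}))\big) \Big]

with encoder f_\theta(\tilde{x}) = s(W\tilde{x} + b), decoder g_{\theta'}(y) = s(W'y + b'), and reconstruction cross-entropy

L_{H}(x, z) \;=\; -\sum_{k=1}^{d} \big[ x_{k} \log z_{k} + (1 - x_{k}) \log (1 - z_{k}) \big]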

where the loss function L_H is the reconstruction cross-entropy. The architecture of the denoising autoencoder is illustrated in the figure below:

File:q2.png
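To make this concrete, here is a minimal NumPy sketch of a single denoising autoencoder layer trained by stochastic gradient descent. The layer sizes, learning rate, destruction fraction, and the use of tied weights (W' = Wᵀ, mentioned in the paper as an optional constraint) are illustrative choices, not the paper's settings.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def corrupt(x, nu, rng):
    """Zero a fixed fraction nu of the input components, chosen at random."""
    x_tilde = x.copy()
    idx = rng.choice(x.size, size=int(nu * x.size), replace=False)
    x_tilde[idx] = 0.0
    return x_tilde

# Illustrative sizes: d visible units, d_h hidden units.
d, d_h, lr, nu = 784, 500, 0.1, 0.25
W = rng.normal(0.0, 0.01, size=(d_h, d))   # encoder weights (decoder uses W.T, i.e. tied weights)
b = np.zeros(d_h)                          # encoder bias
b_prime = np.zeros(d)                      # decoder bias

def sgd_step(x):
    """One stochastic gradient step on the denoising reconstruction cross-entropy."""
    global W, b, b_prime
    x_tilde = corrupt(x, nu, rng)
    y = sigmoid(W @ x_tilde + b)           # hidden representation y = f_theta(x_tilde)
    z = sigmoid(W.T @ y + b_prime)         # reconstruction z = g_theta'(y)
    loss = -np.sum(x * np.log(z + 1e-10) + (1 - x) * np.log(1 - z + 1e-10))
    dz = z - x                             # gradient w.r.t. decoder pre-activation
    dy = (W @ dz) * y * (1 - y)            # back-propagated to encoder pre-activation
    W -= lr * (np.outer(dy, x_tilde) + np.outer(y, dz))   # tied-weight gradient: encoder + decoder parts
    b -= lr * dy
    b_prime -= lr * dz
    return loss

# Toy usage on a random binary "image" (placeholder data, not MNIST).
x = (rng.random(d) > 0.5).astype(float)
for _ in range(5):
    print(sgd_step(x))

Note that the clean input x, not the corrupted x̃, appears in the loss: the network is trained to reconstruct the uncorrupted example from its corrupted version.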

Layer-wise Initialization and Fine Tuning

During training, the representation learned by the k-th layer's denoising autoencoder is used as input for the (k+1)-th, and the (k+1)-th layer is trained only after the k-th has been trained. After a few layers have been trained in this way, their parameters are used as the initialization of a network that is then optimized with respect to a supervised training criterion. This greedy layer-wise procedure has been shown to yield significantly better local minima than random initialization of deep networks, achieving better generalization on a number of tasks.
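The greedy stacking can be sketched as follows, again in NumPy with tied weights and illustrative sizes (not the paper's settings). Each layer's denoising autoencoder is trained on the representation produced by the layers below it, and the encoder parameters are kept to initialize a deep network.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def pretrain_layer(H, n_hidden, nu=0.25, lr=0.1, epochs=1):
    """Unsupervised pretraining of one denoising-autoencoder layer.
    H: (n_examples, n_in) array of representations from the layer below."""
    n_in = H.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
    b = np.zeros(n_hidden)
    b_prime = np.zeros(n_in)
    for _ in range(epochs):
        for h in H:
            h_tilde = h * (rng.random(n_in) > nu)   # zero roughly a nu fraction of the inputs
            y = sigmoid(h_tilde @ W + b)            # encode the corrupted input
            z = sigmoid(y @ W.T + b_prime)          # reconstruct (tied weights, for brevity)
            dz = z - h                              # cross-entropy gradient at decoder pre-activation
            dy = (dz @ W) * y * (1 - y)             # back-propagated to encoder pre-activation
            W -= lr * (np.outer(h_tilde, dy) + np.outer(dz, y))
            b -= lr * dy
            b_prime -= lr * dz
    return W, b

# Greedy stacking: each layer's autoencoder is trained on the output of the previous one.
X = (rng.random((100, 784)) > 0.5).astype(float)    # placeholder data, not the MNIST benchmark
layer_sizes, H, stack = [500, 500, 500], X, []
for n_hidden in layer_sizes:
    W, b = pretrain_layer(H, n_hidden)
    stack.append((W, b))                            # keep parameters to initialize the deep net
    H = sigmoid(H @ W + b)                          # propagate the data to feed the next layer

# The (W, b) pairs in `stack` now initialize a 3-hidden-layer network, which would then be
# fine-tuned end-to-end on the supervised classification criterion (omitted in this sketch).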

Analysis of the Denoising Autoencoder

Manifold Learning Perspective

The process of mapping a corrupted example back to an uncorrupted one can be visualized in the figure below (Figure 2 of the paper): the data concentrate near a low-dimensional manifold, corruption moves examples away from it, and training pulls them back. The denoising autoencoder thus learns a stochastic operator p(X | X̃) that maps a corrupted X̃ back towards an X on or near the manifold.


File:q4.png


The denoising autoencoder can thus be seen as a way to define and learn a manifold. The intermediate representation Y = f(X) can be interpreted as a coordinate system for points on the manifold (this is most clear if we force the dimension of Y to be smaller than the dimension of X). More generally, one can think of Y = f(X) as a representation of X which is well suited to capture the main variations in the data, i.e., on the manifold. When additional criteria (such as sparsity) are introduced in the learning model, one can no longer directly view Y = f(X) as an explicit low-dimensional coordinate system for points on the manifold, but it retains the property of capturing the main factors of variation in the data.

Experiments

The input data consists of different variations of the MNIST digit classification problem, with added factors of variation such as rotation (rot), addition of a background composed of random pixels (bg-rand) or made from patches extracted from a set of images (bg-img), or combinations of these factors (rot-bg-img). These variations render the problems particularly challenging for current generic learning algorithms. Each problem is divided into a training, validation, and test set (10,000, 2,000, and 50,000 examples respectively). A subset of the original MNIST problem is also included with the same example set sizes (problem basic). The benchmark also contains additional binary classification problems: discriminating between convex and non-convex shapes (convex), and between wide and long rectangles (rect, rect-img).

Neural networks with 3 hidden layers initialized by stacking denoising autoencoders (SdA-3), and fine-tuned on the classification tasks, were evaluated on all the problems in this benchmark. Model selection was conducted following a procedure similar to that of Larochelle et al. (2007). Several values of the hyperparameters (destruction fraction ν, layer sizes, number of unsupervised training epochs) were tried, combined with early stopping in the fine-tuning phase. For each task, the best model was selected based on its classification performance on the validation set; a schematic sketch of this selection loop is given at the end of this section. The results are reported in the following table.

File:Qq1.png

The filters obtained by training are shown in the figure below.

File:Qq3.png
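For concreteness, the model-selection loop described above might look roughly like the following. This is a hypothetical sketch: train_sda3 and validation_error are stub functions standing in for the actual SdA-3 training and evaluation code, and the grid values are illustrative, since the paper does not list the exact hyperparameter values tried.

import itertools
import random

def train_sda3(train_set, nu, layer_size, pretrain_epochs):
    # Stand-in: in practice, greedy layer-wise denoising-autoencoder pretraining
    # followed by supervised fine-tuning with early stopping on the validation set.
    return {"nu": nu, "layer_size": layer_size, "pretrain_epochs": pretrain_epochs}

def validation_error(model, valid_set):
    # Stand-in: in practice, classification error of the fine-tuned network on valid_set.
    return random.random()

train_set, valid_set = None, None   # placeholders for one benchmark problem's splits

# Illustrative hyperparameter grid over the quantities mentioned above.
grid = itertools.product(
    [0.10, 0.25, 0.40],   # destruction fraction nu
    [1000, 2000],         # hidden layer size
    [10, 30, 50],         # number of unsupervised training epochs
)

best_model, best_err = None, float("inf")
for nu, layer_size, epochs in grid:
    model = train_sda3(train_set, nu, layer_size, epochs)
    err = validation_error(model, valid_set)
    if err < best_err:    # keep the configuration with the lowest validation error
        best_model, best_err = model, err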