Deep Sparse Rectifier Neural Networks
Introduction
Two trends can be seen in deep learning architecture improvements. The first is increasing sparsity (for example, convolutional neural nets); the second is increasing biological plausibility (biologically plausible sigmoid neurons performing better than tanh neurons). Rectified linear neurons promote both sparsity and biological plausibility, and are therefore expected to improve performance.
Biological Plausibility and Sparsity
In the brain, neurons rarely fire at the same time, which balances quality of representation against energy conservation. This is in stark contrast to sigmoid neurons, which fire at 1/2 of their maximum rate when their input is zero. A solution to this problem is to use a rectifier neuron, which does not fire when its input is zero.
Figures: sigmoid and tanh neuron; leaky integrate-and-fire neuron; rectified linear neuron.
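The contrast between the activation functions at zero input can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names `sigmoid` and `rectifier` are illustrative and do not come from the paper.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: output lies in (0, 1) and equals 1/2 at x = 0.
    return 1.0 / (1.0 + np.exp(-x))

def rectifier(x):
    # Rectified linear activation: max(0, x), exactly zero for x <= 0.
    return np.maximum(0.0, x)

# Evaluate each activation at a negative, a zero, and a positive input.
x = np.array([-2.0, 0.0, 2.0])
sigmoid_out = sigmoid(x)
tanh_out = np.tanh(x)
rectifier_out = rectifier(x)

print("sigmoid:  ", sigmoid_out)
print("tanh:     ", tanh_out)
print("rectifier:", rectifier_out)
```

At zero input the sigmoid is at half of its maximum output, while the rectifier is silent for all non-positive inputs.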
Because the rectifier neuron maps a larger range of inputs to zero, its representation is more sparse. The paper highlights two salient advantages of sparsity:
- Information Disentangling: In a dense representation, every slight input change results in a considerable output change. In contrast, the non-zero items of a sparse representation remain almost constant under slight input changes.
- Variable Dimensionality: A sparse representation can effectively choose how many dimensions to use to represent a variable, since it chooses how many non-zero elements contribute. Thus, the precision is variable, allowing for a more efficient representation of complex items.
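Both points can be illustrated with a small experiment. The sketch below assumes a single random layer with Gaussian weights and inputs; the sizes, the random seed, and the perturbation scale are arbitrary choices for illustration, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 100, 1000  # hypothetical layer sizes

# Random layer with weights scaled so pre-activations have roughly unit variance.
W = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), size=(n_units, n_inputs))
x = rng.normal(size=n_inputs)

pre_activation = W @ x
rectifier_code = np.maximum(0.0, pre_activation)
sigmoid_code = 1.0 / (1.0 + np.exp(-pre_activation))

def fraction_zero(h):
    # Share of units whose output is exactly zero.
    return float(np.mean(h == 0.0))

rectifier_sparsity = fraction_zero(rectifier_code)
sigmoid_sparsity = fraction_zero(sigmoid_code)

# Apply a slight input change and compare which rectifier units are active.
x_perturbed = x + 1e-3 * rng.normal(size=n_inputs)
perturbed_code = np.maximum(0.0, W @ x_perturbed)
same_active_set = float(np.mean((rectifier_code > 0) == (perturbed_code > 0)))

print("rectifier sparsity:", rectifier_sparsity)
print("sigmoid sparsity:  ", sigmoid_sparsity)
print("units keeping their on/off state:", same_active_set)
```

With symmetric random pre-activations, roughly half of the rectifier units are exactly zero, whereas no sigmoid unit is. The set of active rectifier units is almost unchanged by the small perturbation.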
Further benefits of a sparse representation, and of rectified linear neurons in particular, are better linear separability and lower computational complexity (most units are off, and for units that are on only a linear function has to be computed).
However, it should also be noted that sparsity reduces the capacity of the model because each unit takes part in the representation of fewer values.
Potential Problems of Rectified Linear Units
The zero derivative of the rectifier for inputs below zero blocks the back-propagation of the gradient during learning. This effect was investigated using a smooth variant of the rectification non-linearity (the softplus activation). Surprisingly, the results suggest that the hard rectification performs better. The authors hypothesize that the hard rectification is not a problem as long as the gradient can be propagated along some paths through the network, and that the complete shut-off with the hard rectification sharpens the credit attribution to neurons in the learning phase.
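The difference between the two non-linearities shows up in their derivatives. The following is a minimal sketch, assuming NumPy and the standard definitions of the rectifier, max(0, x), and the softplus, log(1 + exp(x)); the function names are illustrative.

```python
import numpy as np

def rectifier(x):
    return np.maximum(0.0, x)

def rectifier_grad(x):
    # Derivative is 1 for x > 0 and 0 otherwise: inactive units pass no gradient.
    return (np.asarray(x) > 0).astype(float)

def softplus(x):
    # log(1 + exp(x)), computed stably as logaddexp(0, x).
    return np.logaddexp(0.0, x)

def softplus_grad(x):
    # Derivative of softplus is the logistic sigmoid, which is never exactly zero here.
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

x = np.array([-3.0, -0.5, 0.5, 3.0])
print("rectifier gradient:", rectifier_grad(x))
print("softplus gradient: ", softplus_grad(x))
```

The rectifier gradient is exactly zero for negative inputs, so only the paths through active units carry the error signal, while the softplus lets a small gradient through everywhere.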
Furthermore, the unbounded nature of the rectification non-linearity can lead to numerical instabilities if activations grow too large. To circumvent this, an [math]\displaystyle{ L_1 }[/math] regularizer on the activation values is used.
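Such a regularizer can be sketched as follows, assuming the penalty is the sum of absolute hidden activations added to the training loss. The helper names and the `strength` value are hypothetical and are not taken from the paper.

```python
import numpy as np

def l1_activation_penalty(activations, strength):
    # Sum of absolute activation values over all hidden layers, scaled by strength.
    return strength * sum(float(np.abs(h).sum()) for h in activations)

def regularized_loss(task_loss, activations, strength=1e-3):
    # Total objective: task loss plus the L1 penalty (strength is an assumed value).
    return task_loss + l1_activation_penalty(activations, strength)

# Example with two hypothetical hidden layers of rectifier outputs.
hidden = [np.array([1.0, 0.0, 2.0]), np.array([0.0, 3.0])]
print(regularized_loss(1.0, hidden, strength=0.5))
```

Because the penalty grows with the magnitude of the activations, it discourages very large values and also pushes more units towards zero.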
Experiments
Networks with rectifier neurons were applied to the domains of image recognition and sentiment analysis. The datasets for image recognition included black and white (MNIST, NISTP), colour (CIFAR10) and stereo (NORB) images.
The datasets for sentiment analysis were taken from opentable.com and Amazon. In both cases, the task was to predict the star rating from the text of the review.
Results
Results from image classification (figure: rectifier res 1.png)
Results from sentiment classification (figure: rectifier res 2.png)
In the NORB and sentiment analysis cases, the network benefited greatly from pre-training. However, the benefit in NORB diminished as the training set size grew.
On the Amazon dataset, the rectifier network reached an accuracy of 78.95%, while the state of the art was 73.72%.
The sparsity achieved with the rectified linear neurons helps to diminish the gap between networks with and without unsupervised pre-training.
Criticism
Rectifier neurons are not biologically plausible, for a variety of reasons. For example, neurons in the cortex do not have tuning curves resembling the rectifier. Additionally, the ideal sparsity of the rectifier networks was between 50% and 80%, while the brain is estimated to have a sparsity of around 95% to 99%.