Learning Hierarchical Features for Scene Labeling

From statwiki
Revision as of 14:29, 2 November 2015

Introduction

Test input: The input into the network was a static image such as the one below:

File:cows in field.png

Training data and desired result: The desired result (which is the same format as the training data given to the network for supervised learning) is an image with large features labelled.

Methodology

Below we can see a flow of the overall approach.

File:yann flow.png

Pre-processing

Before being fed into the Convolutional Neural Network (CNN), the image is first passed through a Laplacian image pyramid to obtain maps at different scales. Each level of a Laplacian pyramid is the difference between successive levels of a Gaussian pyramid, which approximates the Laplacian of a Gaussian. Three scaled versions of the image were created, in a manner similar to that shown in the picture below.

File:Image_pyramid.svg
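The pyramid construction can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the 5-tap binomial blur, the nearest-neighbour upsampling, and the function names (blur, downsample, upsample, laplacian_pyramid) are all choices made here for the example.

```python
import numpy as np

def blur(img):
    """Separable 5-tap binomial blur, used here as a cheap Gaussian approximation."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    padded = np.pad(img, 2, mode="reflect")
    # Filter along the columns axis first, then along the rows axis.
    rows = sum(k[i] * padded[:, i:i + img.shape[1]] for i in range(5))
    return sum(k[i] * rows[i:i + img.shape[0], :] for i in range(5))

def downsample(img):
    """Blur, then keep every second pixel in each direction."""
    return blur(img)[::2, ::2]

def upsample(img, shape):
    """Nearest-neighbour upsampling to `shape`, followed by a blur."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(up[:shape[0], :shape[1]])

def laplacian_pyramid(img, levels=3):
    """Return `levels` band-pass images, finest first; the last is the low-pass residual."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels - 1):
        smaller = downsample(current)
        # Detail lost by going one level down the Gaussian pyramid.
        pyramid.append(current - upsample(smaller, current.shape))
        current = smaller
    pyramid.append(current)
    return pyramid
```

Calling laplacian_pyramid(image, levels=3) on a 2-D greyscale array gives three maps, each half the size of the previous one, matching the three scales described above.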

Network Architecture

A typical three-layer CNN architecture (convolution of kernels with the feature maps, non-linearity, pooling) was used. The tanh function served as the non-linearity. The kernels used were 7x7 Toeplitz matrices. Pooling was performed by a 2x2 max-pool operator.
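A minimal NumPy sketch of one such layer (7x7 filtering, tanh, 2x2 max-pooling) is given here for illustration. It handles a single input map and a single kernel, whereas a real network uses banks of kernels; the function names are hypothetical. The helper toeplitz_1d shows, in one dimension, one reading of the Toeplitz remark: that a convolution can be written as multiplication by a Toeplitz matrix. That reading is an assumption of this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation of a single map with a single kernel."""
    windows = sliding_window_view(x, kernel.shape)  # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling; an odd trailing row or column is dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def stage(x, kernel, bias=0.0):
    """One layer: filtering, tanh non-linearity, then pooling."""
    return max_pool_2x2(np.tanh(conv2d_valid(x, kernel) + bias))

def toeplitz_1d(kernel, n):
    """Matrix T such that T @ x is the valid cross-correlation of a length-n signal x with kernel."""
    k = len(kernel)
    T = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        T[i, i:i + k] = kernel  # each row is the kernel shifted by one position
    return T
```

With a 20x20 input and a 7x7 kernel, the filtering step yields a 14x14 map and the pooling step reduces it to 7x7.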

The same network was applied to all three differently sized images. Because the parameters were shared across the scales, the same connection weights were applied to every image, which allows the detection of scale-invariant features.
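Weight sharing across scales can be illustrated as follows. In this sketch a single convolution followed by tanh stands in for the whole network, and the names extract_features and multiscale_features are hypothetical; the point is only that one set of parameters is reused for every scale.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_features(image, kernel):
    """One filtering step + tanh, standing in for the full network."""
    windows = sliding_window_view(image, kernel.shape)
    return np.tanh(np.einsum("ijkl,kl->ij", windows, kernel))

def multiscale_features(images, kernel):
    """Run the same extractor, with the same kernel, on every scale."""
    return [extract_features(image, kernel) for image in images]

rng = np.random.default_rng(0)
shared_kernel = rng.normal(scale=0.1, size=(7, 7))  # the only parameters
scales = [rng.random((64, 64)), rng.random((32, 32)), rng.random((16, 16))]
feature_maps = multiscale_features(scales, shared_kernel)
```

Because the kernel is identical at every scale, a pattern that fills a 7x7 window in the small image and the same pattern appearing at 7x7 in a larger image produce the same response, and the number of parameters does not grow with the number of scales.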

For training, stochastic gradient descent was used. To avoid over-fitting, jitter, horizontal flipping, rotations between -8 and +8 degrees, and rescaling between 90% and 110% were used.
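The augmentations listed above could be combined into a single random warp, sketched here with NumPy. Several details are assumptions of this example rather than facts from the source: "jitter" is read as a small random translation (max_shift pixels), sampling is nearest-neighbour so that label values stay intact, borders are clamped, and the function name random_augment is hypothetical.

```python
import numpy as np

def random_augment(image, labels, rng, max_shift=2):
    """Apply one random flip, rotation, rescale and shift to an image and its label map."""
    h, w = labels.shape
    angle = np.deg2rad(rng.uniform(-8.0, 8.0))          # rotation in [-8, +8] degrees
    scale = rng.uniform(0.9, 1.1)                       # rescaling in [90%, 110%]
    shift = rng.uniform(-max_shift, max_shift, size=2)  # jitter as translation (assumption)
    flip = rng.random() < 0.5                           # horizontal flip half of the time

    # Output pixel grid, expressed relative to the image centre.
    ys, xs = np.mgrid[0:h, 0:w]
    yc, xc = ys - (h - 1) / 2.0, xs - (w - 1) / 2.0
    # Inverse mapping: for every output pixel, locate the source pixel.
    cos, sin = np.cos(angle), np.sin(angle)
    src_y = (cos * yc + sin * xc) / scale + (h - 1) / 2.0 + shift[0]
    src_x = (-sin * yc + cos * xc) / scale + (w - 1) / 2.0 + shift[1]
    if flip:
        src_x = (w - 1) - src_x
    # Nearest-neighbour sampling with clamped borders.
    iy = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    ix = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    return image[iy, ix], labels[iy, ix]
```

The image and its label map are warped with the same index arrays, so each pixel keeps its ground-truth label after augmentation. The image may be greyscale (H, W) or colour (H, W, 3).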

Post-Processing

Unlike previous approaches, this scene-labelling method emphasised a highly accurate pixel-labelling system. As a result, although a variety of post-processing approaches were tried, including superpixels, Conditional Random Fields and gPb, the simple superpixel approach often yielded state-of-the-art results.
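One simple way to use superpixels for post-processing is to give every pixel in a superpixel the class with the highest total score inside it. The sketch below assumes this pooling rule and assumes that a superpixel segmentation (an integer id per pixel, starting at 0) is already available from some other tool; label_superpixels is a hypothetical name.

```python
import numpy as np

def label_superpixels(pixel_scores, segments):
    """Assign one class per superpixel.

    pixel_scores: (H, W, C) per-pixel class scores from the network.
    segments:     (H, W) integer superpixel ids in 0..S-1.
    Returns an (H, W) label map that is constant inside each superpixel.
    """
    n_classes = pixel_scores.shape[-1]
    flat_ids = segments.ravel()
    flat_scores = pixel_scores.reshape(-1, n_classes)
    sums = np.zeros((flat_ids.max() + 1, n_classes))
    # Accumulate the class scores of all pixels belonging to each superpixel.
    np.add.at(sums, flat_ids, flat_scores)
    winners = sums.argmax(axis=1)  # argmax of the sum equals argmax of the mean
    return winners[segments]
```

If the scores are one-hot encodings of per-pixel predictions, this reduces to a majority vote within each superpixel, which removes isolated mislabelled pixels.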

Results

Future Work