Understanding the Effective Receptive Field in Deep Convolutional Neural Networks

From statwiki
Revision as of 19:15, 30 October 2017 by Sosadatr

Introduction

What is the Receptive Field (RF) of a unit?

Why is RF important?

The concept of the receptive field is important for understanding and diagnosing how deep convolutional neural networks (CNNs) work. Because no part of the input image outside a unit's receptive field can affect the value of that unit, the receptive field must be controlled carefully so that it covers the entire relevant image region. In many tasks, especially dense prediction tasks such as semantic image segmentation, stereo, and optical flow estimation, a prediction is made for every pixel of the input image, so each output pixel needs a large receptive field to ensure that no important information is left out when making the prediction.

How to Increase RF size?

Make the network deeper by stacking more layers. In theory this increases the receptive field size linearly, because each extra stride-1 layer with a kernel of size k enlarges the receptive field by k - 1 pixels (roughly the kernel size).

Add sub-sampling layers, which increase the receptive field size multiplicatively.

Modern deep CNN architectures like the VGG networks and Residual Networks use a combination of these techniques.
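The two growth modes above can be made concrete with the standard recurrence for the theoretical receptive field. The following is a minimal sketch, not code from the paper; the function name `theoretical_rf` and the `(kernel_size, stride)` layer description are assumptions made for illustration.

```python
def theoretical_rf(layers):
    """Theoretical receptive field (in input pixels, along one axis).

    `layers` is a list of (kernel_size, stride) pairs ordered from the
    input to the output. This layer description is a hypothetical
    convention chosen for this sketch.
    """
    rf = 1    # a single output unit sees one "pixel" of its own layer
    jump = 1  # spacing, in input pixels, between neighbouring units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each layer adds (k - 1) steps of size jump
        jump *= stride                  # sub-sampling multiplies the step size
    return rf


# Stacking stride-1 layers: linear growth, 2 pixels per 3-tap layer.
print(theoretical_rf([(3, 1)] * 5))   # 11
print(theoretical_rf([(3, 1)] * 15))  # 31

# Stride-2 layers: each layer doubles the step size of all later layers.
print(theoretical_rf([(3, 2)] * 5))   # 63
```

With stride 1 the receptive field grows by k - 1 per layer, while every stride-2 layer doubles the contribution of all layers that follow it.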

Intuition behind Effective Receptive Fields

The pixels at the center of a receptive field have a much larger impact on an output:

  • In the forward pass, central pixels can propagate information to the output through many different paths, while pixels in the outer area of the receptive field have very few paths through which to propagate their impact.
  • In the backward pass, gradients from an output unit are propagated across all of these paths, so the central pixels receive a much larger gradient magnitude from that output [Do more paths always mean a larger gradient?].

The authors prove that in many cases the impact within a receptive field follows a Gaussian distribution. Since Gaussian distributions generally decay quickly from the center, the effective receptive field (ERF) occupies only a fraction of the theoretical receptive field.
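The path-counting intuition can be checked numerically in one dimension. The sketch below assumes a linear network of stride-1 layers in which every kernel weight equals 1, so the number of paths from each input pixel to a single output unit is obtained by repeatedly convolving with a kernel of ones. The function name `path_counts` is hypothetical.

```python
import numpy as np


def path_counts(num_layers, kernel_size=3):
    """Number of distinct paths from each input pixel to one output unit.

    Assumes a 1D stack of stride-1 convolutions; each layer lets a unit
    read `kernel_size` neighbouring units of the layer below.
    """
    counts = np.array([1], dtype=np.int64)
    ones = np.ones(kernel_size, dtype=np.int64)
    for _ in range(num_layers):
        counts = np.convolve(counts, ones)  # full convolution widens the field
    return counts


counts = path_counts(10)
center = counts.size // 2
print("receptive field width:", counts.size)
print("paths from an edge pixel:", counts[0])
print("paths from the center pixel:", counts[center])
print("share of all paths within 5 pixels of the center:",
      counts[center - 5:center + 6].sum() / counts.sum())
```

An edge pixel reaches the output through exactly one path, whereas the center pixel has thousands, and the resulting profile is bell-shaped. This only illustrates the uniform-weight case; the paper's analysis covers more general settings.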


Experiments

Verifying Theoretical Results

ERFs are Gaussian distributed: We observe perfect Gaussian shapes for uniformly and randomly weighted convolution kernels without nonlinear activations, and near-Gaussian shapes for randomly weighted kernels with a nonlinearity. Adding the ReLU nonlinearity makes the distribution slightly less Gaussian, because the ERF distribution then depends on the input as well. Another reason is that ReLU units output exactly zero for half of their inputs, so it is easy to get a zero output for the center pixel on the output plane. In that case no path from the receptive field reaches the output, and the gradient is zero everywhere. Here the ERFs are averaged over 20 runs with different random seeds.

The figures on the right show the ERF for networks with 20 layers of random weights and different nonlinearities. Here the results are averaged over 100 runs with different random weights as well as different random inputs. In this setting the receptive fields are much more Gaussian-like.

$\sqrt{n}$ absolute growth and $1/\sqrt{n}$ relative shrinkage: Figure 2 shows how the ERF size, and the ratio of ERF size to theoretical RF size, change with the number of convolution layers $n$. The fitted line for ERF size has a slope of 0.56 in the log domain, while the line for the ERF ratio has a slope of -0.43. This indicates that the ERF size grows linearly w.r.t. $\sqrt{n}$ and the ERF ratio shrinks linearly w.r.t. $1/\sqrt{n}$.
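A back-of-the-envelope calculation shows why log-domain slopes near 0.5 and -0.5 are expected. The derivation below is a sketch for the special case of $n$ stacked stride-1 layers with uniform one-dimensional kernels of size $k$; it treats the normalized ERF profile as the distribution of a sum of $n$ independent uniform offsets, which is an assumption of this illustration and not a restatement of the paper's general proof.

```latex
% Each layer contributes an offset uniform on k consecutive positions,
% whose variance is (k^2 - 1)/12. Variances of independent offsets add:
\sigma_{\mathrm{ERF}}^2 = \frac{n\,(k^2 - 1)}{12}
\quad\Longrightarrow\quad
\sigma_{\mathrm{ERF}} = \sqrt{\frac{n\,(k^2 - 1)}{12}} = O(\sqrt{n}).

% The theoretical receptive field grows linearly in n:
\mathrm{RF}_{\mathrm{theory}} = n\,(k - 1) + 1 = O(n).

% Hence the ratio shrinks like 1/sqrt(n):
\frac{\sigma_{\mathrm{ERF}}}{\mathrm{RF}_{\mathrm{theory}}}
\approx \sqrt{\frac{k + 1}{12\,n\,(k - 1)}} = O\!\left(\frac{1}{\sqrt{n}}\right).
```

On a log-log plot these scalings correspond to slopes of 0.5 and -0.5, which is close to the fitted values of 0.56 and -0.43.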

Here 2 standard deviations are used as the measure of ERF size, i.e. any pixel whose value is greater than 1 - 95.45% (that is, 4.55%) of the center value is considered to be inside the ERF. The ERF size is represented by the square root of the number of pixels within the ERF, while the theoretical RF size is the side length of the square in which every pixel has a non-zero impact on the output pixel, no matter how small. All experiments here are averaged over 20 runs.
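The measurement rule described above can be written in a few lines. This is a minimal sketch, assuming the ERF is available as a 2D array of non-negative impact (gradient) values with odd side lengths so that a unique center pixel exists; the name `erf_size` and the `coverage` parameter are hypothetical.

```python
import numpy as np


def erf_size(impact_map, coverage=0.9545):
    """ERF size as the square root of the number of pixels inside the ERF.

    A pixel counts as inside the ERF when its value exceeds
    (1 - coverage) times the value at the center pixel.
    Assumes a 2D array with odd height and width.
    """
    impact_map = np.asarray(impact_map, dtype=float)
    height, width = impact_map.shape
    center_value = impact_map[height // 2, width // 2]
    inside = impact_map > (1.0 - coverage) * center_value
    return float(np.sqrt(inside.sum()))


# Usage on a synthetic Gaussian-shaped impact map (illustrative data only).
ys, xs = np.mgrid[-50:51, -50:51]
gaussian_map = np.exp(-(xs ** 2 + ys ** 2) / (2 * 5.0 ** 2))
print(erf_size(gaussian_map))
```

Taking the square root of the pixel count makes the number comparable to the theoretical RF size, which is a side length.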

Subsampling and dilated convolution increase the receptive field: The figure on the right shows the effect of subsampling and dilated convolution. The reference baseline is a CNN with 15 dense convolution layers; its ERF is shown in the left-most figure. Replacing 3 of the 15 convolutional layers with stride-2 convolutions results in the ERF in the ‘Subsample’ figure. Finally, replacing those 3 convolutional layers with dilated convolutions with factors 2, 4 and 8 gives the ‘Dilation’ figure. Both are able to increase the effective receptive field significantly. Note that the ‘Dilation’ figure shows a rectangular ERF shape typical for dilated convolutions (why?).
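The qualitative effect of these two modifications can be reproduced with a small one-dimensional model. The sketch below assumes a linear network with uniform 3-tap kernels and picks layers 5, 10 and 15 as the modified ones; the source states neither the kernel size nor which layers were replaced, so both choices are assumptions, as are all function names. For a single output unit, a layer placed after stride-2 layers acts like a kernel whose taps are spread apart by the accumulated stride, so the ERF profile is a convolution of dilated kernels.

```python
import numpy as np


def dilated_uniform_kernel(kernel_size, spacing):
    """Uniform kernel with taps `spacing` pixels apart, normalised to sum to 1."""
    kernel = np.zeros((kernel_size - 1) * spacing + 1)
    kernel[::spacing] = 1.0 / kernel_size
    return kernel


def erf_profile(layers, kernel_size=3):
    """1D ERF of one output unit; `layers` lists (stride, dilation), input first."""
    profile = np.array([1.0])
    jump = 1  # spacing between neighbouring units, measured in input pixels
    for stride, dilation in layers:
        kernel = dilated_uniform_kernel(kernel_size, jump * dilation)
        profile = np.convolve(profile, kernel)
        jump *= stride
    return profile


def spread(profile):
    """Standard deviation of the normalised, symmetric profile."""
    offsets = np.arange(profile.size) - (profile.size - 1) / 2
    return float(np.sqrt(np.sum(profile * offsets ** 2)))


dense = (1, 1)
baseline = [dense] * 15
subsample = ([dense] * 4 + [(2, 1)]) * 3  # stride 2 at layers 5, 10, 15
dilation = [dense] * 4 + [(1, 2)] + [dense] * 4 + [(1, 4)] + [dense] * 4 + [(1, 8)]

for name, layers in [("baseline", baseline), ("subsample", subsample),
                     ("dilation", dilation)]:
    print(name, round(spread(erf_profile(layers)), 3))
```

Because the variances of the individual kernels add, the spreads in this toy model are sqrt(10) for the baseline, sqrt(70) for the strided variant and 8 for the dilated variant, so both modifications widen the ERF by a factor of roughly 2.5. The model says nothing about the two-dimensional shape of the ERF.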

Discussion

Conclusion

The authors showed, theoretically and experimentally, that the distribution of impact within the receptive field is asymptotically Gaussian, and that the effective receptive field takes up only a fraction of the full theoretical receptive field. They also studied

References