Wavelet Pooling CNN
It is generally the case that Convolution Neural Networks (CNNs) out perform vector-based deep learning techniques. As such, the fundamentals of CNNs are good candidates to be innovated in order to improve said performance. The pooling layer is one of these fundamentals, and although various methods exist ranging from deterministic and simple: max pooling and average pooling, to probabilistic: mixed pooling and stochastic pooling, all these methods employ a neighborhood approach to the sub-sampling which, albeit fast and simple, can produce artifacts such as blurring, aliasing, and edge halos (Parker et al., 1983).
This paper introduces a novel pooling method based on the discrete wavelet transform. Specifically, it uses a second-level wavelet decomposition for the sub-sampling. This method, instead of nearest neighbor interpolation, uses a sub-band method that the authors claim produces less artifacts and represents the underlying features more accurately. Therefore, if pooling is viewed as a lossy process, the reason for employing a wavelet approach is to try to minimize this loss.
Pooling essentially means sub-sampling. After the pooling layer, the spatial dimensions of the data is reduced to some degree, with the goal being to compress the data rather than discard some of it. Typical approaches to pooling reduce the dimensionality by using some method to combine a region of values into one. For max pooling, this can be represented by the equation (EQUATION) where akij is the output activation of the kth feature map at (i,j), akpq is input activation at (p,q) within Rij, and Rij is the size of the pooling region. Mean pooling can be represented by the equation (EQUATION) with everything defined as before. Figure 1 provides a numerical example that can be followed.
[insert figure 1]
The paper mentions that these pooling methods, although simple and effective, have shortcomings. Max pooling can omit details from an image if the important features have less intensity than the insignificant ones, and also commonly overfits. On the other hand, average pooling can dilute important features if the data is averaged with values of significantly lower intensities. Figure 2 displays an image of this.
[insert figure 2]