Summary of A Probabilistic Approach to Neural Network Pruning


Presented by

Stefan Vladusic, Waqas Hamed, Justin D'Astous, Ethan O'Farrell


Background

High-performance neural networks often have a very large number of trainable parameters. [Frankle and Carbin] propose the Lottery Ticket Hypothesis: a trained neural network contains many smaller subnetworks that capture most of its important structure. When these subnetworks are trained in isolation starting from the original initialization, they can achieve accuracy similar to that of the full network. [Ramanujan et al.] further propose that such subnetworks can achieve similar accuracy without any further training. However, finding these lottery tickets inside a large neural network is computationally expensive (NP-hard in general).

There has been little research on the theoretical guarantees of pruning. This study, A Probabilistic Approach to Neural Network Pruning by Xin Qian and Diego Klabjan, focuses on theoretical results for subnetworks created by pruning weights. It covers pruning random weights in fully connected networks (FCNs) and convolutional neural networks (CNNs), as well as pruning weights by magnitude in FCNs. It does not cover the theoretical implications of pruning entire neurons.


Literature Review

Empirical Network Pruning: Work on neural network pruning has progressed from pruning individual connections based on the loss function [LeCun et al.] to pruning connections based on their magnitude [Han et al.]. Later papers focused on pruning entire neurons [Hu et al.] or convolutional channels [Li et al.] that may be redundant. Recently, the focus has shifted to finding a smaller subnetwork inside an overparameterized network with comparable predictive power and pruning the rest.

Theoretical Study of Network Pruning: Some research has looked into pruning FCNs with ReLU activations, using the idea that a single ReLU neuron can be approximated by a two-hidden-layer neural network of constant width [Pensia et al., Orseau et al.]. Other pruning methods currently being researched include a greedy-optimization-based pruning method [Ye et al.] and sampling-based pruning algorithms built on sensitivity scores [Baykal et al., Liebenwein et al.].


Pruning

Deep neural networks can be both computationally and memory intensive. To address this, neural network pruning has been widely employed in recent years. Pruning removes weights from a network's layers. As explained in detail later, pruning can reduce the complexity of the network while keeping the difference in error between the original network and the pruned network small. Pruning is performed either on individual weights or on entire neurons (whole columns of a weight matrix). The paper discusses only the pruning of individual weights.

Pruning is a three-step process that can be summarized as follows (a toy code sketch of the full pipeline appears after the list):

1. Train the network.

2. Prune the network.

3. Re-train the network.
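The following toy NumPy sketch (not taken from the paper; a linear model trained by gradient descent stands in for the network, and the 50% magnitude criterion is an arbitrary choice for illustration) walks through the three steps end to end:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # toy inputs
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 0.5]                  # only a few informative weights
y = X @ w_true + 0.01 * rng.normal(size=200)   # toy targets

def train(w, mask, steps=500, lr=0.1):
    """Gradient descent on squared error; masked (pruned) weights stay at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = (w - lr * grad) * mask
    return w

# 1. Train the (dense) network.
w = train(np.zeros(10), np.ones(10))

# 2. Prune: zero out the half of the weights with the smallest magnitude.
mask = np.ones_like(w)
mask[np.argsort(np.abs(w))[:w.size // 2]] = 0.0

# 3. Re-train the surviving weights with the mask held fixed.
w_pruned = train(w * mask, mask)

print("dense model error: ", np.mean((X @ w - y) ** 2))
print("pruned model error:", np.mean((X @ w_pruned - y) ** 2))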

The figure below [Han et al., 2015] demonstrates both pruning individual weights (synapses) and pruning entire neurons.

[Figure: pruning synapses and pruning neurons, from Han et al., 2015]

Magnitude-based pruning reduces the number of weights in a layer based on each weight's absolute value. The intuition is that weights with values close to zero do not contribute significantly to the predictive ability of the network. To perform this, a mask matrix M with the same size as the weight matrix is used. M has all entries equal to 1, except for the entries corresponding to the weights to be pruned, which are set to 0; these weights are selected based on their magnitude. In the pruned network, the mask is multiplied element-wise with the weight matrix before re-training.
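As a small illustration of this masking scheme (a NumPy sketch, not the authors' implementation; the matrix shape and the 50% sparsity level are arbitrary choices), the following builds a magnitude-based mask M and applies it element-wise:

import numpy as np

def magnitude_prune_mask(W, sparsity):
    """0/1 mask that zeroes the `sparsity` fraction of entries of W
    with the smallest absolute value."""
    k = int(sparsity * W.size)                     # number of weights to prune
    mask = np.ones_like(W)
    if k > 0:
        smallest = np.argsort(np.abs(W), axis=None)[:k]
        mask.flat[smallest] = 0.0
    return mask

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))                        # toy weight matrix
M = magnitude_prune_mask(W, sparsity=0.5)
W_pruned = M * W                                   # element-wise product, as described above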

Random pruning also uses a mask matrix M, but the weights to be pruned are selected at random.
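For comparison, a random-pruning mask chooses the zeroed entries uniformly at random rather than by magnitude (continuing the sketch above, so np, W, and rng are reused):

def random_prune_mask(W, sparsity, rng):
    """0/1 mask that zeroes a uniformly random `sparsity` fraction of W."""
    k = int(sparsity * W.size)
    mask = np.ones_like(W)
    pruned = rng.choice(W.size, size=k, replace=False)   # random flat indices
    mask.flat[pruned] = 0.0
    return mask

M_rand = random_prune_mask(W, sparsity=0.5, rng=rng)
W_pruned_random = M_rand * W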


Pruning Fully Connected Networks

Future Works

References

[1] Baykal, C., Liebenwein, L., Gilitschenski, I., Feldman, D., and Rus, D. Sipping neural networks: Sensitivity informed provable pruning of neural networks. arXiv preprint arXiv:1910.05422, 2019b.

[2] Frankle, J. and Carbin, M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. International Conference on Learning Representations, 2018.

[3] Han, S., Mao, H., and Dally, W. J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. International Conference on Learning Representations, 2015a.

[4] Hu, H., Peng, R., Tai, Y.-W., and Tang, C.-K. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.

[5] LeCun, Y., Denker, J. S., Solla, S. A., Howard, R. E., and Jackel, L. D. Optimal brain damage. Advances in Neural Information Processing Systems, volume 2, pp. 598–605, 1989.

[6] Li, H., Kadav, A., Durdanovic, I., Samet, H., and Graf, H. P. Pruning filters for efficient convnets. International Conference on Learning Representations, 2016.

[7] Liebenwein, L., Baykal, C., Lang, H., Feldman, D., and Rus, D. Provable filter pruning for efficient neural networks. International Conference on Learning Representations, 2020.

[8] Pensia, A., Rajput, S., Nagle, A., Vishwakarma, H., and Papailiopoulos, D. Optimal lottery tickets via subset sum: Logarithmic over-parameterization is sufficient. Advances in Neural Information Processing Systems, 33, 2020.

[9] Orseau, L., Hutter, M., and Rivasplata, O. Logarithmic pruning is all you need. Advances in Neural Information Processing Systems, 33, 2020.

[10] Ramanujan, V., Wortsman, M., Kembhavi, A., Farhadi, A., and Rastegari, M. What's hidden in a randomly weighted neural network? Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11893–11902, 2020.

[11] Ye, M., Gong, C., Nie, L., Zhou, D., Klivans, A., and Liu, Q. Good subnetworks provably exist: Pruning via greedy forward selection. In International Conference on Machine Learning, pp. 10820–10830, 2020a.

[12] Ye, M., Wu, L., and Liu, Q. Greedy optimization provably wins the lottery: Logarithmic number of winning tickets is enough. arXiv preprint arXiv:2010.15969, 2020b.