Learning the Number of Neurons in Deep Networks

From statwiki

Introduction

Thanks to the availability of large-scale datasets and powerful computation, Deep Learning has made huge breakthroughs in many areas, such as Language Models and Computer Vision. In spite of this, building a very deep model is still challenging, especially for very large datasets. In a deep neural network, we need to determine the number of layers and the number of neurons in each layer, i.e., the number of parameters, or the complexity of the model. Typically, this is determined manually by trial and error.

Recent research tends to build very deep networks. A deeper network has more parameters to learn, which significantly increases memory use and reduces speed. Automatic model selection has been developed in past years through constructive and destructive approaches, but both have drawbacks. A constructive method starts from a very shallow architecture and then adds parameters [Bello, 1992] or extra layers [Simonyan and Zisserman, 2014] during learning. Its drawback is that shallow networks have fewer parameters and are therefore less expressive, and they may provide a poor initialization for the later stages. A destructive method starts from a deep network and removes a significant number of redundant parameters [Denil et al., 2013, Cheng et al., 2015]. Although this line of work has shown that removing redundant parameters [LeCun et al., 1990, Hassibi et al., 1993] or neurons [Mozer and Smolensky, 1988, Ji et al., 1990, Reed, 1993] has little influence on the output, it requires analyzing each parameter and neuron via the network Hessian, which does not scale well to large architectures.

In this paper, we use an approach that automatically chooses the number of neurons in each layer while the network is being learned. The approach introduces a group sparsity regularizer on the parameters of the network, where each group acts on the parameters of one neuron. This avoids training an initial network as a pre-processing step, as is done when shallow or thin networks are trained to mimic the behaviour of deep ones [Hinton et al., 2014, Romero et al., 2015]. Parameters that turn out to be useless are set to zero, which cancels out the effect of the corresponding neuron. Therefore, our approach does not need to first learn a redundant network successfully and then reduce its parameters; instead, it learns the number of relevant neurons in each layer and the parameters of those neurons simultaneously.
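To make the idea of "one group per neuron" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names are hypothetical, each row of `weights` is assumed to hold all parameters of one neuron, and the group soft-thresholding step is one standard way of driving whole groups to zero, chosen here as an assumption for illustration.

```python
import math


def group_sparsity_penalty(weights, lam):
    """Return lam times the sum of the L2 norms of each neuron's parameters.

    weights[i] is assumed to be the list of parameters of neuron i.
    """
    return lam * sum(math.sqrt(sum(w * w for w in row)) for row in weights)


def group_soft_threshold(weights, threshold):
    """Shrink each neuron's parameter vector toward zero (illustrative step).

    A neuron whose parameter norm is at most `threshold` is zeroed entirely,
    which cancels its effect on the layer output.
    """
    result = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        # Guard against division by zero for already-zeroed neurons.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        result.append([scale * w for w in row])
    return result


def count_active_neurons(weights):
    """Count neurons that still have at least one nonzero parameter."""
    return sum(1 for row in weights if any(w != 0.0 for w in row))


if __name__ == "__main__":
    layer = [[3.0, 4.0], [0.1, 0.0], [0.0, 0.0]]  # toy layer with 3 neurons
    shrunk = group_soft_threshold(layer, 1.0)
    print(group_sparsity_penalty(layer, 1.0))
    print(count_active_neurons(shrunk))
```

In this toy layer, the second neuron has a small parameter norm and is removed as a whole by the thresholding step, while the first neuron is only shrunk. Counting the surviving rows then gives the learned number of neurons for that layer.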

Previous Work

Model Training and Model Selection

Experiment

Set Up

Results

Analysis on Testing

Conclusion

References