Multiobjective Pruning for CNNs using Genetic Algorithm
Abstract
In this work, we propose a heuristic genetic algorithm (GA) for pruning convolutional neural networks (CNNs) according to the multiobjective trade-off among error, computation and sparsity. In our experiments, we apply our approach to prune a pretrained LeNet on the MNIST dataset. By emphasizing sparsity and computation, respectively, it reduces the parameter size by 95.42% and achieves a 16× speedup of convolutional layer computation, with a small accuracy loss. Our empirical study suggests that GA is an alternative pruning approach that obtains competitive compression performance. Additionally, compared with state-of-the-art approaches, GA can automatically prune CNNs according to the importance of each objective, which is set through a predefined fitness function.
Keywords: Genetic algorithm · Convolutional neural networks · Multiobjective pruning
1 Introduction
Vision application scenarios often place different importance on error, computational cost and storage for convolutional neural networks (CNNs), but state-of-the-art pruning approaches do not take this into account. Thus, we develop a genetic algorithm (GA) that iteratively prunes redundant parameters according to the multiobjective trade-off, using a two-step procedure. First, we prune the network by taking advantage of swarm intelligence. Next, we retrain the elite network and reinitialize the population from the trained elite. Compared with state-of-the-art approaches, our approach obtains comparable sparsity and a significant improvement in computation reduction. In addition, we detail how to adjust the fitness function to obtain diverse compression performances in practical applications.
2 Proposed Approach
2.1 Evaluation Regulation
Similar to general evolutionary algorithms, we design a fitness function to evaluate the overall performance of a genome. In our method, the fitness is defined as the weighted average of the error rate, the remaining computation rate and the sparsity, and our target is to minimize the fitness function as follows:
(1)  
Three coefficients adjust the importance of the three objectives. The error rate, the remaining computation rate and the sparsity denote the percentage of misclassified samples, of remaining FLOPs (the number of multiplication-addition operations) and of zeroed-out parameters, respectively. The experimental analyses in Section 3 show that treating the multiobjective nature of the problem by linear combination and scalarization is effective and consistent with our expectation, although a more sophisticated fitness function may further improve the result.
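The scalarized fitness described above can be sketched as follows. This is a minimal illustration, not the paper's exact formula: the names `alpha`, `beta` and `gamma` are assumed names for the three coefficients, and the sparsity term is assumed to enter as the remaining parameter fraction `1 - sparsity`, so that minimizing the fitness rewards sparser networks.

```python
def fitness(error, computation, sparsity, alpha, beta, gamma):
    """Scalarized fitness to be minimized (lower is better).

    error:       fraction of misclassified samples, in [0, 1]
    computation: fraction of FLOPs remaining after pruning, in [0, 1]
    sparsity:    fraction of parameters zeroed out, in [0, 1]
    alpha, beta, gamma: assumed names for the three importance coefficients
    """
    # Assumption: sparsity is something to maximize, so it contributes
    # through the remaining parameter fraction (1 - sparsity).
    return alpha * error + beta * computation + gamma * (1.0 - sparsity)


# Two hypothetical candidates with equal error: the sparser, cheaper one
# should receive the lower (better) fitness.
dense = fitness(error=0.01, computation=0.50, sparsity=0.20,
                alpha=1.0, beta=1.0, gamma=1.0)
pruned = fitness(error=0.01, computation=0.10, sparsity=0.90,
                 alpha=1.0, beta=1.0, gamma=1.0)
```

Raising one coefficient relative to the others shifts the search toward the corresponding objective, which is the mechanism the experiments in Section 3 rely on.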
2.2 Heuristic Pruning Procedure
2.2.1 Genetic Encoding and Initialization.
A CNN is encoded as a genome with one parameter gene per layer, so the number of genes equals the depth of the CNN. Each gene holds the parameters of one layer: a 4D tensor in a convolutional (CONV) layer, whose dimensions are the number of filters, the number of input channels, and the height and width of the kernels, or a 2D tensor in a fully connected (FC) layer, whose dimensions are the sizes of the output and input features. We repeatedly apply mutation to a pretrained CNN to generate the genomes of the initial population.
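As an illustration of this encoding, the sketch below stores one gene per layer and measures sparsity over the whole genome. The layer shapes are hypothetical and chosen only for the example (they are not claimed to match the LeNet used in the experiments), and each tensor is flattened to a plain list to keep the example dependency-free.

```python
from math import prod

# Hypothetical layer shapes, for illustration only.
LAYER_SHAPES = [
    (6, 1, 5, 5),    # CONV: filters, input channels, kernel height, kernel width
    (16, 6, 5, 5),   # CONV
    (120, 400),      # FC: output features, input features
    (10, 120),       # FC
]


def make_genome(shapes, value=1.0):
    """Build a genome: one gene per layer, each a flat list of weights."""
    return [[value] * prod(shape) for shape in shapes]


def sparsity(genome):
    """Fraction of parameters in the genome that are exactly zero."""
    total = sum(len(gene) for gene in genome)
    zeros = sum(1 for gene in genome for weight in gene if weight == 0.0)
    return zeros / total


genome = make_genome(LAYER_SHAPES)
# Zero out the whole last FC layer to see how sparsity responds.
genome[3] = [0.0] * len(genome[3])
```

With these shapes the FC layers hold most of the parameters, which mirrors the remark in Section 3 that FC layers dominate parameter size.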
2.2.2 Selection.
We straightforwardly select the top genomes with the lowest fitness to reproduce the next generation. It is worth mentioning that we have tried a variety of selection operations, such as tournament selection, roulette-wheel selection and truncation selection. Our empirical results indicate that the different selection operations eventually reach similar performance, but the vanilla selection that we adopt converges fastest.
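The vanilla selection described above amounts to sorting by fitness and keeping the best few genomes. A minimal sketch, where `fitness_of` and `k` are assumed names for the fitness callable and the number of selected genomes:

```python
def select_top(population, fitness_of, k):
    """Return the k genomes with the lowest fitness (minimization)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # sorted() is stable, so genomes with equal fitness keep their order.
    return sorted(population, key=fitness_of)[:k]


# Toy example: genomes are short lists and fitness is their sum.
toy_population = [[3.0], [1.0], [2.0]]
parents = select_top(toy_population, fitness_of=sum, k=2)
```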
2.2.3 Crossover.
Crossover occurs among the selected genomes according to the crossover rate. We employ the classical microbial crossover, which was first proposed in [1] and is inspired by bacterial conjugation. For each crossover, we choose two genomes at random; the one with lower fitness is called the Winner genome, and the other is called the Loser genome. Then, each gene in the Loser genome is overwritten by the corresponding gene of the Winner genome with 50% probability. Thus, the Winner genome remains unchanged to preserve its good performance, while the Loser genome is modified by the infection from the Winner genome and may reach better performance. One potential strength of microbial crossover is that it implicitly carries the elite genome over to the next generation, since the fittest genome wins every tournament against any other genome.
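One crossover step as described above can be sketched as follows. The function and parameter names are assumptions made for the example; the only behavior taken from the text is that the genome with lower fitness wins, stays unchanged, and infects each gene of the loser with 50% probability.

```python
import copy
import random


def microbial_crossover(genome_a, genome_b, fitness_of, copy_prob=0.5, rng=random):
    """Run one microbial crossover between two genomes, in place.

    The genome with the lower fitness is the winner and is left untouched.
    Each gene of the loser is overwritten by the winner's gene with
    probability copy_prob. Returns the pair (winner, loser).
    """
    if fitness_of(genome_a) <= fitness_of(genome_b):
        winner, loser = genome_a, genome_b
    else:
        winner, loser = genome_b, genome_a
    for i in range(len(loser)):
        if rng.random() < copy_prob:
            # Deep copy so later changes to the loser never touch the winner.
            loser[i] = copy.deepcopy(winner[i])
    return winner, loser
```

Because the winner is never modified, the fittest genome in the selected set survives every crossover it takes part in, which is the implicit elitism noted in the text.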
2.2.4 Mutation.
2.2.5 Main Procedure.
After each heuristic pruning process, which runs selection, crossover and mutation for a fixed number of iterations, we retrain the elite genome so that the remaining weights can compensate for the loss of accuracy, and then reinitialize the population from the trained elite genome. These procedures are repeated until the fitness of the elite genome converges. Algorithm 1 illustrates the whole procedure of multiobjective pruning by GA.
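The two-step loop described above can be sketched as below. This is an outline under stated assumptions rather than a transcription of Algorithm 1: `mutate`, `crossover` and `retrain` are hypothetical callables supplied by the caller, the way the population is refilled after selection is an assumption, and convergence is approximated by a tolerance `tol` on the improvement of the elite fitness.

```python
import copy


def ga_prune(pretrained, fitness_of, mutate, crossover, retrain,
             pop_size, num_selected, iterations, max_rounds, tol=1e-6):
    """Alternate heuristic pruning and retraining until the elite stops improving.

    mutate(genome) -> genome    hypothetical pruning mutation
    crossover(selected) -> None hypothetical in-place crossover among parents
    retrain(genome) -> genome   hypothetical fine-tuning of the elite
    """
    elite = pretrained
    best = fitness_of(elite)
    for _ in range(max_rounds):
        # (Re)initialize the population by mutating copies of the current elite.
        population = [mutate(copy.deepcopy(elite)) for _ in range(pop_size)]
        for _ in range(iterations):
            # Selection: keep the genomes with the lowest fitness.
            selected = sorted(population, key=fitness_of)[:num_selected]
            crossover(selected)
            # Assumption: parents survive and mutated copies fill the rest.
            children = [mutate(copy.deepcopy(selected[i % num_selected]))
                        for i in range(pop_size - num_selected)]
            population = selected + children
        # Step two: retrain the elite so remaining weights recover accuracy.
        candidate = retrain(min(population, key=fitness_of))
        score = fitness_of(candidate)
        if best - score < tol:
            break  # no meaningful improvement: treat as converged
        elite, best = candidate, score
    return elite
```

Copying the elite before mutation keeps the pretrained network intact, so a failed round can never degrade the genome that is finally returned.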
3 Experimental Results and Analyses
The hyperparameters of GA are the population size, the number of selected genomes, the crossover rate, the two mutation rates and the iteration number. We find that further hyperparameter tuning, such as increasing the population size or decreasing the mutation rate, can obtain better results, but at a higher time cost.
A comprehensive comparison with state-of-the-art approaches is summarized in Table 1. We highlight in particular that different pruning performances can be obtained by adjusting the coefficients of the fitness function. We also empirically analyze the effect of custom coefficient settings, with the corresponding curves shown in Fig. 2. Note that CONV layers and FC layers are the main sources of computation and parameter size, respectively. The coefficient of the error term cannot be set too small, in order to keep the error low.

We take comparable weights for the three coefficients as our baseline, which reaches the overall best compression performance but with a relatively higher error rate.

This setting aims at high-speed inference for the CNN. In this case, computation achieves the maximum reduction, but sparsity is hard to optimize because GA pays less attention to pruning FC layers, which are not the main source of computation.

This setting aims at a CNN with low storage. In this case, we obtain the highest sparsity and a high level of computation reduction simultaneously. Although CONV layers account for only a small part of the overall parameter size, they also reach high sparsity because they are tractable for coarse-grained pruning. Thus, emphasizing sparsity can also indirectly facilitate computation reduction.

This setting aims at minimal performance loss. In this case, the error curve always stays at a low level, so GA is conservative in pruning both CONV and FC layers. Hence, the parameter and FLOPs curves fall more slowly than in the baseline.
Approach                 Error    Computation   Sparsity   Accuracy change
LeNet Baseline [4, 5]    0.8%     100%          0%         -
LNA [6]                  0.7%     -             90.5%      +0.1%
SSL [7]                  0.9%     25.64%        75.1%      -0.1%
TSNN [8]                 0.79%    13%           95.84%     0.01%
SparseVD [9]             0.75%    45.66%        92.58%     +0.05%
StructuredBP [10]        0.86%    9.53%         79.8%      -0.06%
L0 Regularization [11]   1.0%     23.22%        99.14%     -0.2%
RA20.1 [12]              0.9%     -             97.7%      -0.1%
Ours:
GA()                     0.93%    6.22%         94.30%     -0.13%
GA()                     0.84%    6.10%         71.63%     -0.04%
GA()                     0.89%    9.00%         95.42%     -0.09%
GA()                     0.81%    8.16%         91.00%     -0.01%
Compared with other approaches, although we do not obtain the highest sparsity, our approach achieves an outstanding computation reduction because of coarse-grained pruning. Some approaches with higher sparsity employ fine-grained pruning, which readily increases sparsity but does not substantially reduce the FLOPs of sparse weight tensors. Furthermore, our approach can perform a multiobjective trade-off according to actual requirements, which the compared approaches do not support.
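To relate the computation column of Table 1 to a speedup factor, the sketch below assumes that run time scales linearly with the remaining FLOPs, so the theoretical speedup is the reciprocal of the remaining computation rate. This is an idealization; measured speedups depend on hardware and on how well the pruned structure maps to dense kernels.

```python
def speedup(remaining_percent):
    """Theoretical speedup if run time scales with the remaining FLOPs."""
    if remaining_percent <= 0:
        raise ValueError("remaining computation must be positive")
    return 100.0 / remaining_percent


# Remaining computation of the four GA rows in Table 1, in percent.
ga_remaining = [6.22, 6.10, 9.00, 8.16]
ga_speedups = [round(speedup(c), 1) for c in ga_remaining]
```

Under this assumption, a remaining computation of 6.10% corresponds to a speedup of about 16.4, in line with the 16× figure quoted in the abstract.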
4 Conclusion
We propose a heuristic GA to prune CNNs according to the multiobjective trade-off, which can obtain a variety of desirable compression performances. Moreover, we develop a two-step pruning framework for evolutionary algorithms, which may open the door to introducing biologically inspired methodology to the field of CNN pruning. In future work, GA will be further investigated and improved to prune larger-scale CNNs.
References
[1] Harvey, I.: The microbial genetic algorithm. In: Proceedings of the 10th European Conference on Advances in Artificial Life, pp. 126–133 (2009)
[2] Mao, H., Han, S., Pool, J., Li, W., Liu, X., Wang, Y., Dally, W.J.: Exploring the granularity of sparsity in convolutional neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops, pp. 1927–1934 (2017)
[3] MNIST dataset, http://yann.lecun.com/exdb/mnist/. Last accessed 23 Mar 2019
[4] LeNet implementation by TensorFlow, https://github.com/tensorflow/models. Last accessed 23 Mar 2019
[5] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
[6] Srinivas, S., Babu, R.V.: Learning neural network architectures using backpropagation. In: Proceedings of the British Machine Vision Conference. BMVA Press (2016)
[7] Wen, W., Wu, C., Wang, Y., Chen, Y., Li, H.: Learning structured sparsity in deep neural networks. In: Advances in Neural Information Processing Systems, pp. 2074–2082 (2016)
[8] Srinivas, S., Subramanya, A., Babu, R.V.: Training sparse neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops, pp. 455–462 (2017)
[9] Molchanov, D., Ashukha, A., Vetrov, D.: Variational dropout sparsifies deep neural networks. In: 34th International Conference on Machine Learning, pp. 2498–2507 (2017)
[10] Neklyudov, K., Molchanov, D., Ashukha, A., Vetrov, D.: Structured Bayesian pruning via log-normal multiplicative noise. In: Advances in Neural Information Processing Systems, pp. 6775–6784 (2017)
[11] Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through L0 regularization. In: Proceedings of the International Conference on Learning Representations, ICLR (2018)
[12] Dong, X., Liu, L., Li, G., Zhao, P., Feng, X.: Fast CNN pruning via redundancy-aware training. In: 27th International Conference on Artificial Neural Networks, pp. 3–13 (2018). https://doi.org/10.1007/978-3-030-01418-6_1