Towards Evolutional Compression

Yunhe Wang, Chang Xu, Jiayan Qiu, Chao Xu, Dacheng Tao
Key Laboratory of Machine Perception (MOE), Cooperative Medianet Innovation Center, School of EECS, Peking University
UBTech Sydney AI Centre, School of IT, FEIT, The University of Sydney, Australia
wangyunhe@pku.edu.cn, c.xu@sydney.edu.au, jqiu3225@uni.sydney.edu.au,
xuchao@cis.pku.edu.cn, dacheng.tao@sydney.edu.au

Abstract

Compressing convolutional neural networks (CNNs) is essential for transferring their success on a wide variety of applications to mobile devices. In contrast to directly labeling subtle weights or filters as redundant in a given CNN, this paper presents an evolutionary method that automatically eliminates redundant convolution filters. We represent each compressed network as a binary individual with a specific fitness. The population is then upgraded at each evolutionary iteration using genetic operations, and an extremely compact CNN is finally generated from the fittest individual. In this approach, either large or small convolution filters can be identified as redundant, and the filters retained in the compressed network are more distinct from one another. In addition, since the number of filters in each convolutional layer is reduced, the number of filter channels and the size of the feature maps also decrease, naturally improving both the compression and speed-up ratios. Experiments on benchmark deep CNN models suggest the superiority of the proposed algorithm over state-of-the-art compression methods.

1 Introduction

Large-scale deep convolutional neural networks (CNNs) have been successfully applied to a wide variety of applications such as image classification [11, 22, 15], object detection [8, 20], and visual enhancement [6]. To strengthen representation capability and improve performance, CNNs are typically constructed by stacking a large number of convolutional layers. Given their complex architectures and numerous parameters, most CNNs place excessive demands on storage and computational resources, thus restricting them to high-performance servers.

We are now in an era of intelligent mobile devices. Deep learning, one of the most promising artificial intelligence techniques, is expected to reduce reliance on servers and to bring advanced algorithms to smartphones, tablets, and wearable computers. Nevertheless, it remains challenging for mobile devices, which lack GPUs and sufficient memory, to run the CNNs that are usually deployed on servers. For instance, more than 232MB of memory and a huge number of floating-point multiplications would be consumed by AlexNet [15] or VGGNet [22] to process a single normal-sized input image. Hence, special developments are required to translate CNNs to smartphones and other portable devices.

To overcome this conflict between the reduced hardware configurations of mobile devices and the high resource demands of CNNs, several attempts have been made to compress and speed up well-trained CNN models. Liu et al. [17] learned CNNs with sparse architectures, thereby reducing model complexity compared to ordinary CNNs, while Han et al. [10] directly discarded subtle weights in pre-trained CNNs to obtain sparse CNNs. Figurnov et al. [7] reduced the computational cost of CNNs by masking the input data of convolutional layers. Wen et al. [25] explored subtle connections from the perspective of channels and filters. In the DCT frequency domain, Wang et al. [24] excavated redundancy among all weights and their underlying connections to deliver higher compression and speed-up ratios. In addition, a number of techniques exist to compress convolution filters in CNNs, including weight pruning [10, 9], quantization and binarization [13, 3, 1, 2], and matrix decomposition [5].

While these methods have reduced the storage and computational burdens of CNNs, research on deep model compression is still in its infancy. Existing solutions are typically grounded in different, albeit intuitive, assumptions about network redundancy, e.g. redundancy of weights or filters with small absolute values, low-rank filter redundancy, and within-weight redundancy. Although these redundancy assumptions are valid, we hypothesize that all possible types of redundancy have yet to be identified and validated. We postulate that an approach can be developed to autonomously investigate the diverse and volatile redundancies of a network and to constantly upgrade the solution to cater for changes in the environment.

In this paper, we develop an evolutionary strategy to excavate and eliminate redundancy in deep neural networks. The network compression task can be formulated as a binary programming problem, where a binary variable is attached to each convolution filter to indicate whether or not the filter takes effect. Inspired by studies in evolutionary computation [4, 19, 14], we treat each binary encoding of a compressed network as an individual and stack these individuals to constitute the population. A series of evolutionary operators (e.g. crossover and mutation) allow the population to constantly evolve towards the most competitive, i.e. compact yet accurate, network architecture. When evaluating an individual, we use a relatively small subset of the original training dataset to fine-tune the compressed network, which quickly reveals its potential performance; the overall running time of compressing CNNs is therefore acceptable. Experiments conducted on several benchmark CNNs demonstrate that the compressed networks are much more lightweight yet have accuracies comparable to their original counterparts. Beyond conventional network redundancies, our results suggest that convolution filters with either large or small weights can be redundant, that discrepancy between the retained filters is desirable, and that high-frequency coefficients of convolution filters are unnecessary (i.e. smooth filters are adequate).

Figure 1: An illustration of the evolution of LeNet on the MNIST dataset. Each dot represents an individual in the population, and the thirty best individuals are shown at each evolutionary iteration. The fitness of individuals gradually improves as the number of iterations increases, implying that the network becomes more compact while maintaining the same accuracy. The size of the original network is about 1.5MB.

2 An Evolutionary Method for Compressing CNNs

Most existing CNN compression methods are based on the consensus that weights or filters with subtle values have limited influence on the performance of the original network. In this section, we introduce an evolutionary algorithm that more thoroughly excavates redundancy in CNNs, and devise a novel compression method based on it.

2.1 Modeling Redundancy in Convolution Filters

Consider a convolutional neural network with L convolutional layers, and denote by {F_1, ..., F_L} the sets of convolution filters of these layers. For the i-th convolutional layer, its filters are denoted as F_i ∈ R^{H_i × W_i × C_i × N_i}, where H_i and W_i are the height and width of the filters, respectively, C_i is the channel size, and N_i is the number of filters in this layer. Given a training sample x and the corresponding ground truth y, the error of the network can be defined as E(N(x; F), y), which could be, for example, a softmax or Euclidean loss. The conventional weight pruning algorithm can be formulated as

min_M  E(N(x; M ⊙ F), y) + λ ‖M‖_1,   s.t.  M ∈ {0, 1}^{|F|},    (1)

where M is a binary tensor for removing redundant weights in F, ‖M‖_1 is the ℓ1-norm accumulating the absolute values in M, i.e. the number of 1s in M, ⊙ is the element-wise product, and λ is the tradeoff parameter. A larger λ will make M more sparse, and so a network parameterized with M ⊙ F will have fewer weights.

In general, Fcn. 1 is easy to solve if N(·) is a linear mapping of F. However, neural networks are composed of a series of complex operations, such as pooling and ReLU, which increase the difficulty of solving Fcn. 1. Therefore, a greedy strategy [9, 25] has been introduced to obtain a feasible solution that removes weights with small absolute values:

M_j = 0  if |F_j| < ε,   and   M_j = 1  otherwise,    (2)

where ε is a threshold and F_j denotes the j-th weight in F. This strategy is based on the intuitive assumption that small weights make subtle contributions to the calculation of the convolution response.
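
As a concrete illustration of the thresholding rule in Fcn. 2, the following sketch builds a binary mask over a weight tensor with NumPy; the tensor shape and threshold are hypothetical and only mirror the greedy pruning idea, not the implementation used in [9, 25].

import numpy as np

def magnitude_prune_mask(weights, eps):
    # Fcn. 2: keep a weight only when its absolute value reaches the threshold eps.
    return (np.abs(weights) >= eps).astype(weights.dtype)

F = np.random.randn(3, 3, 4, 8).astype(np.float32)   # hypothetical 3x3 filter bank, 4 -> 8 channels
M = magnitude_prune_mask(F, eps=0.5)
F_sparse = M * F                                      # element-wise product M ⊙ F
print("kept weights:", int(M.sum()), "of", M.size)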

Although the sparse filters learned by Fcn. 1 demand less storage and fewer computational resources, the size of the feature maps produced by these filters does not change. For example, a convolutional layer with 10 filters will generate 10 feature maps for each input image both before and after compression, and these maps account for a large proportion of online memory usage. Moreover, sparse filters usually need additional techniques to realize the promised compression and speed-up in practice, such as the cuSPARSE kernel, the CSR format, or fixed-point multipliers [9]. Therefore, more flexible approaches [12, 25, 7] have been developed to directly discard redundant filters:

M_{i,j} = 0  if ‖F_{i,j}‖_F < ε,   and   M_{i,j} = 1  otherwise,    (3)

where F_{i,j} denotes the j-th filter in the i-th convolutional layer and ‖·‖_F is the Frobenius norm. By directly removing convolution filters, network complexity can be significantly decreased. However, Fcn. 3 is also biased, since the Frobenius norm of a filter is not a reasonable indicator of redundancy. For example, most of the weights in a filter for extracting edge information are very small. Thus, a more accurate approach for identifying redundancy in CNNs is urgently required.
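
For comparison, a filter-level variant of the same greedy rule (Fcn. 3) can be sketched as follows; it scores each filter by its Frobenius norm and drops whole filters, which also shrinks the output feature maps. The shapes and threshold are again purely illustrative.

import numpy as np

def filter_prune(filters, eps):
    # Fcn. 3: drop the j-th filter when its Frobenius norm is below eps.
    # `filters` has shape (H, W, C, N); only the surviving filters are returned,
    # so the layer emits fewer feature maps after pruning.
    norms = np.sqrt((filters ** 2).sum(axis=(0, 1, 2)))   # one norm per filter
    return filters[..., norms >= eps]

F = np.random.randn(3, 3, 4, 8).astype(np.float32)
F_compact = filter_prune(F, eps=2.0)
print("remaining filters:", F_compact.shape[-1], "of", F.shape[-1])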

2.2 Modeling Redundancy by Exploiting Evolutionary Algorithms

Instead of using the greedy strategies shown in Fcns. 2 and 3, we note that evolutionary algorithms such as the genetic algorithm (GA) [4, 19] and simulated annealing (SA) [14] have been widely applied to NP-hard binary programming problems. A series of bit (0 or 1) strings (individuals) are used to represent possible solutions of the binary programming problem, and these individuals evolve under some pre-designed operations so as to maximize their fitness.

A binary variable could be attached to each weight in the CNN to indicate whether the weight takes effect or not, but such a large number of binary variables would significantly slow down the CNN compression process, especially for sophisticated CNNs learned over large-scale datasets (e.g. ImageNet [21]). For instance, AlexNet [15] has eight convolutional layers with more than 60 million 32-bit floating-point weights in total, so it is infeasible to generate a population with hundreds of individuals of that dimensionality. In addition, as mentioned above, excavating redundancy at the level of entire convolution filters produces a regular CNN model with less computational complexity and memory usage, which is more suitable for practical applications. Therefore, we propose to assign a binary bit to each convolution filter in a CNN, and these binary bits form an individual p representing a compressed version of the network. By doing so, the dimensionality of p is tolerable, e.g. only a few thousand bits for AlexNet (excluding the last 1000 convolution filters corresponding to the 1000 classes in the ILSVRC 2012 dataset).
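
The encoding can be sketched as follows: one bit per convolution filter, concatenated layer by layer. The filter counts below are a rough AlexNet-like example (convolutional layers only), chosen purely to show how short the resulting individual is compared with a per-weight encoding.

import numpy as np

# Hypothetical filter counts per layer (roughly AlexNet-like, classifier layer excluded).
filters_per_layer = [96, 256, 384, 384, 256]
K = sum(filters_per_layer)                    # length of one individual

def random_individual(k, keep_prob=0.5):
    # One individual p in {0,1}^K; p[j] = 1 means the j-th filter is retained.
    return (np.random.rand(k) < keep_prob).astype(np.uint8)

population = [random_individual(K) for _ in range(1000)]   # population scale S = 1000
print("bits per individual:", K, "| population size:", len(population))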

During evolution, we use GA to constantly generate individuals of greater fitness; other evolutionary algorithms could be applied in a similar fashion. The compression task has two objectives: preserving the performance and removing the redundancy of the original network. The fitness of a specific individual can therefore be defined as

f(p) = C − E(p) + (λ / K) Σ_{i=1}^{L} Σ_{j=1}^{N_i} (1 − p_{i,j}),    (4)

where p_{i,j} ∈ {0, 1} denotes the binary bit for the j-th convolution filter in the i-th layer of the given network, and K is the number of all convolution filters in the network. E(p) calculates the classification loss of the network using the compressed filters F̂, which is assumed to be a general loss taking values from 0 to 1. In addition, we include a constant C in Fcn. 4, which ensures f(p) ≥ 0 for the convenience of calculating the probability of each individual in the evolutionary algorithm. λ is the tradeoff parameter, and

F̂ = { F_{i,j}  |  p_{i,j} = 1 },    (5)

where p_{i,j} = 0 implies that the j-th filter in the i-th layer has been discarded, and p_{i,j} = 1 that it has been retained.

Input: An image dataset D for evaluating individuals, a pre-trained network N, and parameters: the scale of the population S, the maximum iteration number T, the tradeoff parameter λ, the constant C, and the operation probabilities p_s, p_c, and p_m.
1:  Randomly initialize the population P_0 = {p_1, ..., p_S}, where each individual is represented as a binary vector w.r.t. the convolution filters in the given network N;
2:  for t = 1 to T do
3:     Calculate the fitness of each individual in P_{t-1} using Fcn. 7;
4:     Calculate the probability of each individual being selected according to Fcn. 8;
5:     for k = 1 to S do
6:        p_1^{(t)} ← the best individual in P_{t-1};
7:        Generate a random value r following a uniform distribution U(0, 1);
8:        if r < p_s then
9:           p_k^{(t)} ← a randomly selected parent according to Fcn. 8;
10:       else if r < p_s + p_c then
11:          Randomly select two parents and generate two offspring by crossover;
12:          Calculate the fitnesses of the offspring; p_k^{(t)} ← the better offspring;
13:       else
14:          p_k^{(t)} ← a randomly selected parent after applying XOR on a random fragment;
15:       end if
16:    end for
17: end for
18: Use the optimal individual in P_T to construct a compact neural network N̂;
Output: The compressed network N̂ after fine-tuning.
Algorithm 1 ECS: Evolutionary method for compressing CNNs.

In addition, the last term in Fcn. 4 implicitly assumes that discarding any convolution filter makes an equivalent contribution to compression. However, the memory utilization of filters differs between convolutional layers, being related to the height H_i, the width W_i, and the channel number C_i. Therefore, filter size must be taken into account, and Fcn. 4 can be reformulated as:

f(p) = C − E(p) + (λ / Q) Σ_{i=1}^{L} Σ_{j=1}^{N_i} H_i W_i C_i (1 − p_{i,j}),    (6)

where H_i, W_i, and C_i are the height, width, and channel number of the filters in the i-th convolutional layer, respectively, and Q = Σ_{i=1}^{L} H_i W_i C_i N_i is the total number of weights in the network, which scales the last term in Fcn. 6 to [0, 1].

In addition, the number of channels C_i of the i-th layer is usually set to the number of convolution filters in the (i−1)-th layer (C_1 = 3 for RGB color images) to make two consecutive network layers compatible. Instead of fixing C_i as in Fcn. 6, it should vary with the number of filters retained in the (i−1)-th layer. Thus, we reformulate the calculation of fitness as follows:

f(p) = C − E(p) + (λ / Q) ( Q − Σ_{i=1}^{L} H_i W_i Ĉ_i N̂_i ),    (7)

where N̂_i = Σ_{j=1}^{N_i} p_{i,j} is the number of retained filters in the i-th layer, Ĉ_1 = 3, Ĉ_i = N̂_{i−1} for i > 1, and N̂_L = N_L for the last layer, which consists of the nodes corresponding to the different classes of a particular dataset. The second objective in Fcn. 7 accumulates the discarded weights of the compressed network. Since the error rate of a network tends to be affected by adjusting the network architecture, we use a subset of the training data (10k images randomly extracted from the training set) to fine-tune the network weights and then re-calculate E(p) to provide a more reasonable evaluation. This fine-tuning is fast, since compressed networks with fewer filters require much less computation, e.g. fine-tuning over 10k images costs about 2 seconds for LeNet and about 30 seconds for AlexNet, which is tolerable. GA is then deployed to discover the fittest individual through the evolutionary procedure detailed in the next section.
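
A minimal sketch of the fitness in Fcn. 7 is given below, assuming per-layer filter shapes (H_i, W_i, C_i, N_i) and an external routine that fine-tunes the compressed network on the small subset and returns its classification error in [0, 1]; that routine (error_fn) and all names are placeholders rather than the authors' code.

import numpy as np

def fitness(p_bits, shapes, error_fn, C=1.0, lam=0.5):
    # Fcn. 7 (sketch): f(p) = C - E(p) + (lam / Q) * (Q - sum_i H_i W_i Chat_i Nhat_i).
    # p_bits: one 0/1 vector per layer; shapes: (H_i, W_i, C_i, N_i) of the original layers;
    # error_fn: returns the classification error in [0, 1] after a quick fine-tuning pass.
    Q = sum(H * W * Cin * N for (H, W, Cin, N) in shapes)   # total weights of the original network
    kept = 0
    prev_n = shapes[0][2]                                   # Chat_1 = C_1, the input channel number
    for i, (bits, (H, W, _, N)) in enumerate(zip(p_bits, shapes)):
        n_hat = N if i == len(shapes) - 1 else int(bits.sum())   # Nhat_L = N_L: classifier layer not pruned
        kept += H * W * prev_n * n_hat                            # Chat_i = Nhat_{i-1}
        prev_n = n_hat
    return C - error_fn(p_bits) + lam * (Q - kept) / Q

shapes = [(5, 5, 1, 20), (5, 5, 20, 50)]                    # toy two-layer example
p = [np.ones(20, np.uint8), np.ones(50, np.uint8)]
print(fitness(p, shapes, error_fn=lambda _: 0.01))          # all filters kept -> no compression reward, f = 0.99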

2.3 Genetic Algorithm for Optimization

GA can automatically search for compact neural networks by alternately evaluating the fitness of each individual in the whole population and executing genetic operations on individuals. The population of the current iteration is regarded as the parents, which breed another population of offspring using some representative operations, namely selection, crossover, and mutation, with the expectation that the offspring are fitter than the preceding parents. First, each individual is assigned a probability by comparing its fitness against those of the other individuals in the current population:

P(p_k) = f(p_k) / Σ_{k'=1}^{S} f(p_{k'}),    (8)

where S is the number of individuals in the population. Then, one of the following three operations is randomly applied to generate each offspring:

Selection. Given a probability parameter p_s, an individual is selected according to Fcn. 8 and then directly duplicated as an offspring. Clearly, compressed networks with higher accuracy and compression ratios are more likely to be preserved. The best individual in the parent population is usually inherited directly to preserve the optimal solution.

Crossover. Given a probability parameter p_c, two parents selected according to Fcn. 8 are crossed to generate two offspring, i.e. a fragment of the two parents' binary strings is exchanged.

The objective of the crossover operation is to integrate excellent information from both parents. The fitnesses of the two offspring are different, and we discard the weaker one.

Mutation. To promote population diversity, mutation randomly changes a fragment of the parent and produces an offspring. The conventional mutation operation for binary encodings is an XOR operation that flips the bits of a randomly chosen fragment.

The parent is also selected according to Fcn. 8, and the mutation operation is performed with a probability parameter p_m. Since the scale of the offspring population is the same as that of the parents, we have p_s + p_c + p_m = 1.
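
The three operators can be sketched roughly as below, assuming the fitness values have already been normalized into selection probabilities via Fcn. 8; single-point crossover and a fixed-length XOR fragment are common GA choices used here for illustration and may differ from the exact variants in our implementation.

import numpy as np

def select(population, probs):
    # Roulette-wheel selection with the probabilities of Fcn. 8.
    return population[np.random.choice(len(population), p=probs)].copy()

def crossover(parent_a, parent_b):
    # Single-point crossover: exchange the tails of two parents to get two offspring.
    cut = np.random.randint(1, len(parent_a))
    return (np.concatenate([parent_a[:cut], parent_b[cut:]]),
            np.concatenate([parent_b[:cut], parent_a[cut:]]))

def mutate(parent, frag_len=8):
    # XOR a random contiguous fragment with 1, i.e. flip those bits.
    child = parent.copy()
    start = np.random.randint(0, len(child) - frag_len + 1)
    child[start:start + frag_len] ^= 1
    return child

Keeping only the better of the two crossover children corresponds to line 12 of Alg. 1.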

By iteratively applying these three genetic operations, the initial population is efficiently updated until the maximum iteration number is reached. After obtaining the individual with the best fitness, we can reconstruct a compact CNN, and the fine-tuning strategy is then adopted to enhance the performance of the compressed network. Alg. 1 summarizes the proposed evolutionary method for compression; a compact sketch of this loop is given below.
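
Putting the pieces together, a compact version of the loop in Alg. 1 might look like the following; evaluate_fitness stands for the Fcn. 7 computation (including the quick fine-tuning pass), the operator sketches above are reused, and the thresholds p_s and p_c are illustrative values rather than the ones used in the experiments.

import numpy as np

def evolve(population, evaluate_fitness, T=100, p_s=0.3, p_c=0.5):
    # Sketch of Alg. 1: evolve for T iterations and return the fittest individual.
    for _ in range(T):
        fit = np.array([evaluate_fitness(p) for p in population])
        probs = fit / fit.sum()                               # Fcn. 8 (fitness is non-negative thanks to C)
        offspring = [population[int(fit.argmax())].copy()]    # elitism: keep the best parent
        while len(offspring) < len(population):
            r = np.random.rand()
            if r < p_s:                                       # selection
                offspring.append(select(population, probs))
            elif r < p_s + p_c:                               # crossover: keep the better child
                c1, c2 = crossover(select(population, probs), select(population, probs))
                offspring.append(c1 if evaluate_fitness(c1) >= evaluate_fitness(c2) else c2)
            else:                                             # mutation, with probability p_m = 1 - p_s - p_c
                offspring.append(mutate(select(population, probs)))
        population = offspring
    fit = np.array([evaluate_fitness(p) for p in population])
    return population[int(fit.argmax())]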

3 Analysis of Compression and Speed-up Improvements

In the previous section, we presented the evolutionary method for compressing pre-trained CNN models. Since at least one convolution filter is retained in each layer, the compressed network N̂ has the same depth as the original network N but fewer filters per layer. Here we further analyze the memory usage and computational cost of CNNs compressed using Alg. 1.

Speed-up ratio. For a given image, the i-th convolutional layer produces N_i feature maps of size H'_i × W'_i through the set of convolution filters F_i ∈ R^{H_i × W_i × C_i × N_i}, where H'_i and W'_i are the height and width of the feature maps, respectively, and the number of floating-point multiplications in this layer is H'_i W'_i H_i W_i C_i N_i. Since multiplications of 32-bit floating-point values are much more expensive than additions, and there is comparatively little computation in other auxiliary layers (e.g. pooling, ReLU, and batch normalization), speed-up ratios are usually calculated on these floating-point multiplications [18, 24]. Considering only this major computational cost, the speed-up ratio of the compressed network for this layer compared to the original network is

r_s^{(i)} = (H'_i W'_i · H_i W_i C_i N_i) / (H'_i W'_i · H_i W_i Ĉ_i N̂_i) = (C_i N_i) / (Ĉ_i N̂_i),    (9)

where N̂_i is the number of filters in the i-th convolutional layer of the compressed network and Ĉ_i is its channel number, as defined in Fcn. 7. Besides the filter number of a layer, the channel number Ĉ_i also has a great impact on r_s^{(i)}, suggesting that it is very difficult to directly hand-pick an optimal compact architecture for the original network. Moreover, excavating redundancy among the filters themselves is a promising way to speed networks up, e.g. if we discard half of the filters in every layer, the speed-up ratio of the proposed method is about 4×.
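
To make the 4× figure explicit, here is a short worked instance of Fcn. 9 under the stated assumption that half of the filters are kept in every layer, so both the filter number and the channel number of each layer are halved:

r_s^{(i)} = (C_i N_i) / (Ĉ_i N̂_i) = (C_i N_i) / ((C_i / 2)(N_i / 2)) = 4.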

Compression ratio. The compression ratio on convolution filters is easy to calculate and is equal to the last term in Fcn. 7. Specifically, for the -th convolutional layer, the compression ratio of the proposed method is

r_c^{(i)} = (H_i W_i C_i N_i) / (H_i W_i Ĉ_i N̂_i) = (C_i N_i) / (Ĉ_i N̂_i).    (10)

However, besides the convolution filters, there are other memory costs that are often ignored by existing methods. In fact, the feature maps of the different layers account for a large proportion of online memory. In some implementations [23], the feature maps of a layer are released once they have been used to calculate the following layer in order to reduce online memory usage; however, memory allocation and release are time consuming. In addition, shortcut connections are widely used in recent CNN models [11], in which earlier feature maps are preserved for combination with later layers. Discarding redundant convolution filters therefore significantly reduces the memory usage of feature maps. For a given convolutional layer, the compression ratio of the proposed method on feature maps is

r_f^{(i)} = (H'_i W'_i N_i) / (H'_i W'_i N̂_i) = N_i / N̂_i,    (11)

which is directly determined by the sparsity of p. Accordingly, the memory needed to store the feature maps of other layers (e.g. pooling and ReLU) is reduced simultaneously. The experimental results and a discussion of compression and speed-up ratios are presented in the following section.
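
Assuming per-layer shapes as before, Fcns. 10 and 11 reduce to simple ratios of retained channel and filter counts; the small helper below (illustrative names and shapes only) computes both for one layer.

def layer_ratios(H, W, C, N, C_hat, N_hat):
    # r_c (Fcn. 10): weight compression of one layer; r_f (Fcn. 11): feature-map compression.
    r_c = (H * W * C * N) / (H * W * C_hat * N_hat)
    r_f = N / N_hat
    return r_c, r_f

# Hypothetical layer: 3x3 filters mapping 256 -> 384 channels, compressed to 128 -> 192.
print(layer_ratios(3, 3, 256, 384, 128, 192))   # -> (4.0, 2.0)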

4 Experiments

Baseline methods and datasets. The proposed method was evaluated on four baseline CNN models, LeNet [16], AlexNet [15], VGGNet-16 [22], and ResNet-50 [11], using the MNIST handwritten digit dataset and the ILSVRC 2012 dataset. We used MatConvNet [23] and NVIDIA Titan X graphics cards to implement the proposed method. In addition, several state-of-the-art approaches were selected for comparison, including P+QH (pruning + quantization and Huffman encoding) [9], SVD [5], XNOR-Net [18], and CNNpack [24].

LeNet on MNIST. The performance of the proposed method was first evaluated on a small network to study some of its characteristics. The network has four convolutional layers and was trained with batch normalization layers; its accuracy was 99.20%.

The proposed method has several parameters, as shown in Alg. 1. The population scale S was set to 1000 to ensure a sufficiently large search space, and the maximum iteration number T was set to 100. The three probability parameters for selection, crossover, and mutation were set empirically following [4]. A larger λ makes the compressed network more compact, but the accuracy of the original network may not be retained in the compressed counterpart. We tuned this parameter and selected the value giving the best trade-off between network accuracy and compression ratio; since the accuracy of the original network was very high, each individual could still maintain considerable accuracy.

The evolutionary process for compressing the network is shown in Fig. 1. Individuals in the population are replaced by individuals of higher fitness after each iteration. As a result, we obtained a 103KB compressed network that consists of four convolutional layers. The model accuracy after fine-tuning was 99.20%, i.e. there was no decrease in accuracy, while the model size was reduced from about 1.5MB to 103KB.

Furthermore, for a fair comparison, we directly initialized a network with the same architecture and trained it on MNIST. Unfortunately, the accuracy of this network was significantly lower than that of the original network, since it could not inherit the pre-trained convolution filters of the original network. Moreover, the accuracy of a network randomly generated with a similar number of filters was lower still, demonstrating that the proposed method both provides an effective architecture for constructing a portable network and inherits useful information from the original network.

Figure 2: Convolution filters learned on MNIST. From top to bottom: the original convolution filters, filters after applying the proposed method, and filters after fine-tuning.

Filter visualization. The proposed method excavates redundant convolution filters using an evolutionary algorithm, which is different from existing weight or filter pruning approaches. It is therefore worth exploring which filters are recognized as redundant and what kind of convolution filters are preferable for CNNs. To this end, we visualized the LeNet filters on MNIST before and after applying the proposed method, as shown in Fig. 2.

The result shown in the second row of Fig. 2 is particularly interesting: our method not only discards small filters but also removes some filters with large weights. Of note, the remaining filters after fine-tuning are even more distinct from one another. The average Euclidean distance between filters in the third row is 0.2428, while those between filters in the first and second rows are 0.1927 and 0.1789, respectively. This observation shows that redundancy can exist in either large or small convolution filters and that similar filters may be redundant, contributing little to the entire CNN, which provides a strong a priori rationale for designing and learning CNN architectures. In addition, the filters shown in the third row are obviously smoother than those in the original network, demonstrating the feasibility of compressing CNNs in the frequency domain, as discussed in [24].

Compressing convolutional networks on ImageNet. We next applied the proposed evolutionary method (namely ECS) to ImageNet ILSVRC 2012 [21], which contains 1.2 million images for training and 50k images for testing. We examined three mainstream CNN architectures: AlexNet [15], VGGNet-16 [22], and ResNet-50 [11]. There are over 61M parameters in AlexNet and over 138M weights in VGGNet-16, while ResNet-50 has about 25M parameters, making it more compact than the other two CNNs. The top-5 accuracies of these three models were 80.8%, 90.1%, and 92.9%, respectively.

Since the accuracy of convolutional networks on ImageNet is much harder to maintain, we adjusted λ to encourage the evolution of individuals with higher accuracy. The compression and speed-up ratios of the proposed method on the three CNNs are shown in Table 1. In addition, the architectures of the compressed AlexNet and VGGNet-16 are shown in Tab. 2 and Tab. 3, and the detailed compression and speed-up results for ResNet-50 are shown in Fig. 3.

Model          Eval.       Orig.    P+QH [9]   SVD [5]   Perfor. [7]   CNNpack [24]   ECS
AlexNet [15]   r_c         1        35         5         1.7           39             5.00
               r_s         1        -          2         2             25             3.34
               top-1 err   41.8%    42.7%      44.0%     44.7%         41.6%          41.9%
               top-5 err   19.2%    19.7%      20.5%     -             19.2%          19.2%
VGGNet [22]    r_c         1        49         -         1.7           46             8.81
               r_s         1        3.5        -         1.9           9.4            5.88
               top-1 err   28.5%    31.1%      -         31.0%         29.7%          29.5%
               top-5 err   9.9%     10.9%      -         -             10.4%          10.2%
ResNet [11]    r_c         1        -          -         -             12.2           4.10
               r_s         1        -          -         -             4              3.83
               top-1 err   24.6%    -          -         -             -              25.1%
               top-5 err   7.7%     -          -         -             7.8%           8.0%
Table 1: An overall comparison between state-of-the-art CNN compression methods and the proposed evolutionary compression scheme (ECS) on the ILSVRC 2012 dataset. The overall compression and speed-up ratios are denoted r_c and r_s, respectively.
Layer    Num of Weights    Memory      Num of New Weights    New Memory    r_c
conv1    -                 1.24MB      -                     0.08MB        -
conv2    -                 1.88MB      -                     0.32MB        -
conv3    -                 3.62MB      -                     0.78MB        -
conv4    -                 2.78MB      -                     0.61MB        -
conv5    -                 1.85MB      -                     0.46MB        -
fc6      -                 144MB       -                     27.41MB       5.25
fc7      -                 64MB        -                     9.77MB        6.55
fc8      -                 15.62MB     -                     7.05MB        2.22
Total    60954656          232.52MB    12186444              46.48MB       5.00
Table 2: Compression statistics for AlexNet.
Layer      Num of Weights    Memory      Num of New Weights    New Memory    r_c
conv1_1    -                 12.26MB     -                     0.001MB       5.33
conv1_2    -                 12.39MB     -                     0.01MB        12.19
conv2_1    -                 6.41MB      -                     0.05MB        5.13
conv2_2    -                 6.69MB      -                     0.12MB        4.71
conv3_1    -                 4.19MB      -                     0.28MB        4.04
conv3_2    -                 5.31MB      -                     0.58MB        3.88
conv3_3    -                 5.31MB      -                     0.60MB        3.77
conv4_1    -                 6.03MB      -                     0.91MB        4.93
conv4_2    -                 10.53MB     -                     0.79MB        11.36
conv4_3    -                 10.53MB     -                     1.31MB        6.88
conv5_1    -                 9.38MB      -                     0.79MB        11.38
conv5_2    -                 9.38MB      -                     0.15MB        58.72
conv5_3    -                 9.38MB      -                     0.26MB        34.66
fc6        -                 392MB       -                     52.45MB       7.47
fc7        -                 64MB        -                     1.09MB        58.36
fc8        -                 15.62MB     -                     0.48MB        32.77
Total      -                 579.46MB    -                     59.88MB       8.81
Table 3: Compression statistics for VGGNet-16.

(a) Compression ratios of all convolution filters.

(b) Speed-up ratios of all convolutional layers.

Figure 3: Compression statistics for ResNet-50.

A detailed comparison between the proposed method and state-of-the-art methods on the three benchmark deep CNN models can also be found in Table 1. Since the number of convolution filters retained by ECS in the compressed networks is significantly reduced and the number of channels per layer decreases simultaneously, we obtained higher speed-up ratios on these CNNs according to Fcn. 9. The compression ratios of the proposed method were lower than those of some other approaches, because the weights of the compressed networks are still stored as 32-bit floating-point numbers, which can be directly handled by most existing devices. For a fair comparison, we further quantized the weights to 8 bits, followed by a fine-tuning process to enhance the compression performance, as suggested by [24]. The results are encouraging: the compressed networks with 8-bit weights have the same performance as their 32-bit versions. Therefore, the compression ratios of the proposed method should be multiplied by at least a factor of 4, e.g. we can obtain a compression ratio of about 16× on ResNet-50. This ratio can be further increased by exploiting sparse shrinkage and Huffman encoding; however, these strategies do not contribute to online inference, so we did not apply them here. When considering the online inference of deep CNNs, the proposed method is clearly the best approach for compressing convolutional networks.
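
The 8-bit step mentioned above can be realized with a simple affine quantizer; the sketch below is a generic illustration of mapping 32-bit weights to 8-bit codes plus a scale and an offset, not the exact quantization scheme of [24] or of our implementation.

import numpy as np

def quantize_8bit(weights):
    # Affine quantization of float32 weights to uint8 codes plus a scale and an offset.
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 or 1.0          # guard against a constant tensor
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize_8bit(codes, scale, w_min):
    return codes.astype(np.float32) * scale + w_min

W = np.random.randn(3, 3, 128, 192).astype(np.float32)
codes, s, m = quantize_8bit(W)
print("max abs error:", float(np.abs(dequantize_8bit(codes, s, m) - W).max()))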

Compression ratios on feature maps. As discussed for Fcn. 11, the compression ratio on CNN feature maps is also an important metric for evaluating compression methods, yet it is ignored by most existing approaches; the feature-map compression ratios of methods such as [5, 24, 9] are therefore equal to 1.

Model LeNet AlexNet VGGNet-16 ResNet-50
Original memory 0.07 MB 5.49 MB 109.26 MB 137.25 MB
Compression ratio 2.42 1.88 2.54 1.86
Table 4: Compression ratios of feature maps on different CNN models.

The compression ratios on feature maps of the proposed method on different CNNs are shown in Table 4. It is clear that the compressed networks are more portable and can be directly used for online inference without any additional technical support since they are still regular CNN models.

5 Discussion and Conclusions

CNNs with high performance and portable architectures are urgently required for mobile devices. This paper presents an effective CNN compression technique based on an evolutionary algorithm. In contrast to state-of-the-art methods, we do not directly label particular weights or filters as redundant. Instead, the proposed method identifies redundant convolution filters by iteratively evolving a population of candidate compressed networks before constructing a compressed network with significantly fewer parameters. Our experiments show that the proposed method achieves significant compression and speed-up ratios while retaining the classification accuracy of the original neural network. Moreover, the network compressed by the proposed approach is still a regular CNN that can be directly used for online inference without any decoding.

References

  • [1] Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. Provable bounds for learning some deep representations. ICML, 2014.
  • [2] Matthieu Courbariaux and Yoshua Bengio. Binarynet: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016.
  • [3] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. arXiv preprint arXiv:1511.00363, 2015.
  • [4] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. A fast and elitist multiobjective genetic algorithm: Nsga-ii. IEEE transactions on evolutionary computation, 6(2):182–197, 2002.
  • [5] Emily L Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In NIPS, 2014.
  • [6] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. IEEE TPAMI, 38(2):295–307, 2016.
  • [7] Michael Figurnov, Dmitry Vetrov, and Pushmeet Kohli. Perforatedcnns: Acceleration through elimination of redundant convolutions. NIPS, 2016.
  • [8] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  • [9] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. In ICLR, 2016.
  • [10] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In NIPS, 2015.
  • [11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
  • [12] Hengyuan Hu, Rui Peng, Yu-Wing Tai, and Chi-Keung Tang. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
  • [13] Kyuyeon Hwang and Wonyong Sung. Fixed-point feedforward deep neural network design using weights +1, 0, and -1. In IEEE Workshop on Signal Processing Systems, 2014.
  • [14] Scott Kirkpatrick, C Daniel Gelatt, Mario P Vecchi, et al. Optimization by simulated annealing. science, 220(4598):671–680, 1983.
  • [15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
  • [16] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [17] Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Pensky. Sparse convolutional neural networks. In CVPR, 2015.
  • [18] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. arXiv preprint arXiv:1603.05279, 2016.
  • [19] Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Quoc Le, and Alex Kurakin. Large-scale evolution of image classifiers. arXiv preprint arXiv:1703.01041, 2017.
  • [20] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS, 2015.
  • [21] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
  • [22] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR, 2015.
  • [23] Andrea Vedaldi and Karel Lenc. Matconvnet: Convolutional neural networks for matlab. In Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, 2015.
  • [24] Yunhe Wang, Chang Xu, Shan You, Dacheng Tao, and Chao Xu. Cnnpack: Packing convolutional neural networks in the frequency domain. In NIPS, 2016.
  • [25] Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Learning structured sparsity in deep neural networks. In NIPS, 2016.