S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search
Recently, dynamic inference has emerged as a promising way to reduce the computational cost of deep convolutional neural networks (CNNs). In contrast to static methods (e.g., weight pruning), dynamic inference adaptively adjusts the inference process according to each input sample, which can considerably reduce the computational cost on “easy” samples while maintaining the overall model performance.
In this paper, we introduce a general framework, S2DNAS, which can transform various static CNN models to support dynamic inference via neural architecture search. To this end, based on a given CNN model, we first generate a CNN architecture space in which each architecture is a multi-stage CNN generated from the given model using some predefined transformations. Then, we propose a reinforcement learning based approach to automatically search for the optimal CNN architecture in the generated space. Finally, with the searched multi-stage network, we can perform dynamic inference by adaptively choosing a stage to evaluate for each sample. Unlike previous works that introduce irregular computations or complex controllers into the inference process, or re-design a CNN model from scratch, our method generalizes to most popular CNN architectures, and the searched dynamic network can be directly deployed using existing deep learning frameworks on various hardware devices.
1 Introduction

In the past years, deep convolutional neural networks (CNNs) have achieved great success in many computer vision tasks, such as image classification [21, 13, 18], object detection [35, 33, 30], and image segmentation [12, 4]. However, the remarkable performance of CNNs always comes with huge computational cost, which impedes their deployment on resource-constrained hardware devices. Thus, various methods have been proposed to improve the computational efficiency of CNN inference, including network pruning [22, 11, 23] and weight quantization [9, 45, 32]. Most of these are static approaches, which use a fixed computation graph for all test samples.
Recently, dynamic inference has emerged as a promising alternative that speeds up CNN inference by dynamically changing the computation graph according to each input sample [40, 2, 6, 5, 46, 17, 29, 8]. The basic idea is to allocate less computation to “easy” samples and more computation to “hard” ones. As a result, dynamic inference can considerably reduce the computational cost of “easy” samples without sacrificing the overall model performance. Moreover, dynamic inference can naturally exploit the trade-off between accuracy and computational cost to meet varying requirements (e.g., computational budget) in real-world scenarios.
To enable the dynamic inference of a CNN model, most previous works aim to develop dedicated strategies that dynamically skip some computation operations during CNN inference according to different input samples. To achieve this goal, these works add extra controllers in-between the original model to select which computations are executed. For example, well-designed gate functions were proposed as controllers to select a subset of channels or pixels for the subsequent computation of the convolution layer [8, 5, 15]. However, these methods lead to irregular computation at the channel or spatial level, which is not efficiently supported by existing software and hardware devices [43, 47, 16]. To address this issue, a more aggressive strategy that dynamically skips whole layers was proposed for efficient inference [46, 42, 41]. Unfortunately, this strategy can only be applied to CNN models with residual connections. Moreover, the controllers of some methods have considerably complex structures, which increase the overall computational cost of inference (see experimental results in Section 4).
To mitigate these problems, researchers have proposed exiting “easy” input samples early at inference time [31, 40, 17, 1]. A typical solution is to add intermediate prediction layers at multiple layers of a normal CNN model, and then exit the inference when the confidence score of an intermediate classifier is higher than a given threshold. Figure 1a shows the paradigm of these early exiting methods [31, 1]. In this paradigm, prediction layers are directly added in-between the original network, and the network is split into multiple stages along the layer depth. However, these solutions face the challenge that early classifiers are unable to leverage the semantic-level features produced by the deeper layers, which may cause a significant accuracy drop.
As illustrated in Figure 1a, three prediction layers are added at different depths of the network. Thus, the classifiers at shallow depths can only use low-level features for prediction.
Huang et al. proposed a novel CNN model, called MSDNet, to solve this issue. The core design of MSDNet is a two-dimensional multi-scale architecture that maintains both coarse- and fine-level features in every layer, as shown in Figure 1b. Based on this design, MSDNet can leverage semantic-level features in every prediction layer and achieves the best results among early exiting methods. However, MSDNet relies on a specially designed network architecture, which cannot generalize to other CNN models and requires massive expertise in architecture design.
To solve the aforementioned issue without designing CNNs from scratch, we propose to transform a given CNN model into a channel-wise multi-stage network, which has the advantage that the classifiers in the early stages can leverage semantic-level features. Figure 1c intuitively demonstrates the idea behind our method. Different from the normal paradigm in Figure 1a, our method splits the original network into multiple stages along the channel width. The prediction layers are added only after the last convolutional layer, so all classifiers can leverage semantic-level features. To reduce the computational cost of the classifiers in the early stages, we cut down the number of channels of each layer in different stages (more details can be found in Section 3).
Based on the high-level idea introduced above, we present a general framework called S2DNAS. Given a specific CNN model, the framework can automatically generate a dynamic model following the paradigm shown in Figure 1c. S2DNAS consists of two components: S2D and NAS. First, S2D, which stands for “static to dynamic”, generates a CNN model space based on the given model. This space comprises different multi-stage CNNs generated from the given model using the predefined transformations. Then, NAS searches for the optimal model in the generated space with the help of reinforcement learning. Specifically, we devise an RNN to decide the setting of each transformation for generating the model. To exploit the trade-off between accuracy and computational cost, we design a reward function that reflects both the classification accuracy and the computational cost, inspired by prior works [39, 14, 44]. We then use a policy-gradient-based algorithm to train the RNN. As training proceeds, the RNN generates better CNN models, and the searched model can be used directly for dynamic inference.
To verify the effectiveness of S2DNAS, we perform extensive experiments applying our method to various CNN models. With comparable model accuracy, our method achieves greater computation reduction than previous works on dynamic inference.
2 Related Work
Static Methods for Efficient CNN Inference. Numerous methods have been proposed to improve the efficiency of CNN inference. Two representative research directions are network pruning [22, 11, 10, 23] and quantization [9, 45, 26, 32]. Specifically, network pruning aims to remove redundant weights in a well-trained CNN without sacrificing model accuracy, while network quantization aims to reduce the bit-width of both activations and weights. Most works in these two directions are static, i.e., they use the same computation graph for all test samples. Next, we introduce an emerging direction that utilizes dynamic inference to improve the efficiency of CNN inference.
Dynamic Inference. Dynamic inference is also referred to as adaptive inference in previous works [41, 24]. Most previous works aim to develop dedicated strategies that dynamically skip some computation during inference, adding extra controllers to select which computations are executed [5, 3, 34, 25, 15, 8, 46, 41, 42]. Dong et al. proposed to compute spatial attention using extra convolutional layers and then skip the computation of inactive pixels. Gao et al. proposed to compute the importance of each channel and then skip the computation of unimportant channels. However, these methods lead to irregular computation at the channel or spatial level, which is not efficiently supported by existing deep learning frameworks and hardware devices. To address this issue, a more aggressive strategy that dynamically skips whole layers or blocks was proposed [46, 41, 42]. For example, BlockDrop introduced a policy network to decide which layers should be skipped. Unfortunately, this strategy can only be applied to CNN models with residual connections. Moreover, since these methods introduce extra controllers into the computation graph, the computational cost may remain the same or even increase in some cases. On the other hand, early exiting methods divide a CNN model into multiple stages and exit the inference of “easy” samples in the early stages [31, 40, 17, 1]. The state-of-the-art is MSDNet, in which the authors manually design a novel multi-stage network architecture to serve the purpose of dynamic inference.
Neural Architecture Search. Recently, neural architecture search (NAS) has emerged as a promising direction for automatically designing network architectures to meet the varying requirements of different tasks [48, 49, 14, 27, 44, 28]. There are two typical types of works in this direction: RL-based search algorithms and differentiable search algorithms. In this paper, according to the formulation of our specific problem, we choose an RL-based search algorithm to search for the optimal model in the design space.
3 Our Approach
3.1 Overview of S2DNAS
The overview of S2DNAS is depicted in Figure 2. At a high level, S2DNAS can be divided into two components, namely, S2D and NAS. Here, S2D means “static-to-dynamic” and is used to generate a search space comprising dynamic models based on a given static CNN model. Specifically, we define two transformations and apply them to the original model to generate different dynamic models. Each of these dynamic models is a multi-stage CNN that can be directly used for dynamic inference, and together the generated models form the search space. Once the search space is generated, NAS searches for the optimal model in the space. In what follows, we give the details of these two components.
3.2 The Details of S2D
Given a CNN model $\mathcal{M}$, the goal of S2D is to generate the search space $\mathcal{A}$, which consists of different dynamic models transformed from $\mathcal{M}$. Each network in $\mathcal{A}$ is a multi-stage CNN model in which each stage contains one classifier. These multi-stage CNNs can be generated from $\mathcal{M}$ using two transformations, namely, split and concat. First, we propose split to split the original model along the channel width, as Figure 3 shows. Specifically, we divide the input channels in each layer of the original model into different subsets, and each classifier uses features from different subsets for prediction. The prediction is done by adding a prediction layer (shown as yellow squares in Figure 3). Moreover, to enhance the feature interactions between different stages for a further performance boost, we propose concat to enforce the classifier in the current stage to reuse the features from previous stages. Next, we present the details of these two transformations; before that, we first introduce some basic notations.
Notation. We start with the notation of a normal convolutional layer. Taking the $l$-th layer of a deep CNN as an example, the input of the $l$-th layer is denoted as $X^{(l)} = \{x_1, \dots, x_{C_l}\}$, where $C_l$ is the number of input channels and $x_i$ is the $i$-th feature map with a resolution of $H_l \times W_l$. We denote the weights as $W^{(l)} \in \mathbb{R}^{C_{l+1} \times C_l \times k \times k}$, where $C_{l+1}$ is the number of output channels and $k$ is the kernel size. In the following parts, we present two transformations that can be applied to the original model. The goal of the transformations is to transform a static CNN model into a multi-stage model, which can be represented as $\{f_1, f_2, \dots, f_K\}$, where $f_i$ is the classifier in the $i$-th stage. Next, we introduce the details of the proposed two transformations.
Split. The split transformation is responsible for assigning different subsets of the input channels to the classifiers in different stages. We denote the number of stages as $K$. A direct way is to split the input channels into $K$ subsets and allocate the $i$-th subset to the classifier in the $i$-th stage. However, this splitting method results in a considerably large search space, which poses an obstacle to the subsequent search process (i.e., NAS). To reduce the search space generated by this transformation, we propose to first divide the input channels into $G$ groups and then assign these groups to different classifiers.
Specifically, we first evenly divide the input channels into $G$ groups, and then choose $K-1$ splitting points that assign consecutive groups of channels to the $K$ stages.
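The group-then-assign scheme above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the paper's implementation: the function names, the representation of splitting points as cumulative group indices, and the example sizes are all our assumptions.

```python
def split_channels(num_channels, num_groups, split_points):
    """Assign channel groups to stages (a sketch of the `split` transformation).

    `split_points` are cumulative group counts: with num_groups=4 and
    split_points=[1, 2], stage 1 gets group 0, stage 2 gets group 1,
    and stage 3 gets groups 2-3.
    """
    group_size = num_channels // num_groups
    groups = [list(range(g * group_size, (g + 1) * group_size))
              for g in range(num_groups)]
    bounds = [0] + list(split_points) + [num_groups]
    # Concatenate the consecutive groups assigned to each stage.
    return [sum(groups[bounds[i]:bounds[i + 1]], [])
            for i in range(len(bounds) - 1)]

# Example: 8 channels, 4 groups, 3 stages.
stages = split_channels(8, 4, [1, 2])
# stage 1 -> [0, 1]; stage 2 -> [2, 3]; stage 3 -> [4, 5, 6, 7]
```

Moving a splitting point left or right shifts channels between adjacent stages, which is exactly the knob the search later tunes.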
Concat. The concat transformation is used for enhancing the interaction between different stages.
The basic idea is to enable the classifiers in later stages to reuse the features from previous stages. Formally, we use indicator matrices $\{I^{(l)}\}_{l=1}^{D}$ to indicate whether to enable feature reuse at different positions. Here $l$ denotes the $l$-th layer and $D$ is the depth of the network.
Architecture Search Space. Based on the above two transformations, we can generate the search space by transforming the original CNN model. Specifically, there are two adjustable settings for the two transformations: the splitting points and the indicator matrices. Adjusting the splitting points changes the way the feature groups are assigned, which trades off the accuracy and computational cost of different classifiers; for example, we can assign more features to the early stages to improve the model performance on “easy” samples. Adjusting the indicator matrices changes the feature reuse strategy. To reduce the size of the search space, we restrict feature layers with the same resolution to use the same split and concat settings in our experiments. By changing these two settings, we generate the search space consisting of different multi-stage models. In the following section, we demonstrate how to search for the optimal model in the generated space.
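To get a feel for the size of this space, one can count the per-layer split choices. Under the contiguous group-assignment scheme described above, choosing a split setting amounts to picking $K-1$ cut points among the $G-1$ gaps between groups. The sketch below is illustrative only: it counts the split-only space and ignores the concat indicators and the resolution-tying restriction, so it is a lower bound under our assumptions.

```python
from math import comb

def split_settings_per_layer(num_groups, num_stages):
    # Contiguously assigning `num_groups` ordered channel groups to
    # `num_stages` stages = choosing num_stages-1 cut points among the
    # num_groups-1 gaps between groups.
    return comb(num_groups - 1, num_stages - 1)

def split_space_size(num_layers, num_groups, num_stages):
    # Split-only space when every layer is set independently; the paper
    # additionally ties layers of equal resolution and adds concat
    # choices, so this is only an illustrative count.
    return split_settings_per_layer(num_groups, num_stages) ** num_layers
```

Even this restricted count grows exponentially with depth, which motivates the learned search in the next section.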
3.3 The Details of NAS
Once we obtain the search space from the above procedure of S2D, the goal of NAS is to find the optimal model with high accuracy and low computational cost. Note that the model is jointly determined by the settings of the two transformations, i.e., the splitting points and the indicator matrices. With a slight abuse of notation, we also refer to an architecture $a$ as these two settings and denote $\mathcal{A}$ as the space consisting of all such settings. Thus the optimization goal reduces to searching for the optimal settings of the proposed transformations which maximize our predefined metric (see details in the following section).
However, searching for the optimal setting is nontrivial due to the huge search space $\mathcal{A}$. For example, in our experiment on MobileNetV2, the search space is far too large to enumerate exhaustively. Motivated by the recent progress in neural architecture search (NAS) [48, 49, 14, 39], we propose to use a policy-gradient-based reinforcement learning algorithm for searching. The goal of the algorithm is to optimize the policy that, in turn, produces the optimal model. This process can be formulated as a nested optimization problem:
$$\max_{\pi} \; \mathbb{E}_{a \sim \pi}\left[R(a, w_a^*; D_{val})\right] \quad \text{s.t.} \quad w_a^* = \operatorname*{arg\,min}_{w} \mathcal{L}(a, w; D_{train}) \quad (1)$$

where $w_a^*$ denotes the corresponding weights of the model $a$ and $\pi$ is the policy which generates the settings of the transformations. $D_{val}$ and $D_{train}$ denote the validation and training datasets, respectively, and $R$ is the reward function for evaluating the quality of the multi-stage model.
To solve the nested optimization problem in Equation 1, we need to solve two sub-problems, namely, optimizing the policy $\pi$ when $w_a^*$ is given, and optimizing $w$ when the architecture $a$ is given. We first present how to optimize the policy $\pi$ when $w_a^*$ is given.
Optimization of the Transformation Settings. Similar to previous works [48, 49], we use a customized recurrent neural network (RNN) to generate the distribution over different transformation settings for each layer of the CNN model. Then a policy-gradient-based algorithm is used to optimize the parameters of the RNN to maximize the expected reward, which is defined in Equation 2. Specifically, the reward in our paper is defined as a weighted product considering both the accuracy and the computational cost:
$$R(a, w; D) = \mathrm{Acc}(a, w; D) \times \mathrm{Cost}(a, w; D)^{\beta} \quad (2)$$

where $\mathrm{Acc}(a, w; D)$ is the accuracy of the multi-stage model on the dataset $D$, and $\mathrm{Cost}(a, w; D)$ is the average computational cost over the samples of the dataset using dynamic inference. The exponent $\beta < 0$ controls the trade-off between accuracy and cost.
For a fair comparison with other works on dynamic inference, we use FLOPs as the metric of computational cost.
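A weighted-product reward of this kind can be sketched in one line. This is a sketch in the spirit of Equation 2 and of the weighted-product rewards in prior NAS work [39]; the exponent value, and the normalization of FLOPs by a target budget, are our illustrative assumptions rather than the paper's hyper-parameters.

```python
def reward(accuracy, mean_flops, target_flops, beta=-0.07):
    # Weighted product of accuracy and normalized average FLOPs.
    # beta < 0 penalizes expensive models, so cheaper dynamic models
    # that keep accuracy earn a higher reward.
    return accuracy * (mean_flops / target_flops) ** beta
```

Halving the average FLOPs at equal accuracy raises the reward slightly; the magnitude of `beta` sets how aggressively the search trades accuracy for cost.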
Optimization of the Multi-stage CNN. The inner optimization problem (i.e., solving for $w_a^*$) can be solved using the gradient descent algorithm. Specifically, we modify the normal classification loss function (i.e., the cross-entropy function) for the case of training multi-stage models. Formally, the loss function is defined as:
$$\mathcal{L}(a, w; D) = \sum_{(x, y) \in D} \sum_{i=1}^{K} \lambda_i \, \mathrm{CE}(f_i(x; w), y) \quad (3)$$

Here, CE denotes the cross-entropy function and $\lambda_i$ is the weight of the classifier in the $i$-th stage. The optimization of the above equation can be regarded as jointly optimizing all the classifiers in different stages, and can be implemented using stochastic gradient descent (SGD) and its variants. We use the optimized $w_a^*$ to assess the quality of the model generated by the RNN, which is further used for optimizing the RNN. In practice, to reduce the search time, following the previous work, we approximate $w_a^*$ by updating it for only a few training epochs, without solving the inner optimization problem completely by training the network until convergence.
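The joint loss over all stage classifiers can be written as a small pure-Python function. This is a sketch of the weighted sum of per-stage cross-entropies for a single sample; the function and argument names are illustrative, and a real implementation would use a deep learning framework's batched loss.

```python
from math import exp, log

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def multi_stage_loss(stage_logits, label, weights=None):
    """Weighted sum of per-stage cross-entropies for one sample.

    stage_logits: list of K logit vectors, one per stage classifier.
    weights: per-stage coefficients lambda_i (uniform if omitted).
    """
    if weights is None:
        weights = [1.0] * len(stage_logits)
    return sum(w * -log(softmax(z)[label])
               for w, z in zip(weights, stage_logits))
```

Because every classifier contributes a term, gradients flow to all stages in one backward pass, which is what “jointly optimizing all the classifiers” amounts to.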
Dynamic Inference of the Searched CNN.
Once the optimal multi-stage model is found, we can directly perform dynamic inference with it. Specifically, we set a predefined threshold for each stage; formally, the threshold of the $i$-th stage is set to $t_i$. Then, we use these thresholds to decide at which stage the inference should stop: given an input sample $x$, the inference stops at the $i$-th stage when the $i$-th classifier outputs a top-1 confidence score no less than $t_i$. Samples that reach the last stage always exit there.
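The early-exit rule above is a simple loop over stages. The sketch below assumes, for illustration, that each `classifiers[i]` returns a list of class probabilities for the input; the names and the returned `(prediction, exit_stage)` pair are our conventions, not the paper's API.

```python
def dynamic_predict(classifiers, thresholds, x):
    """Early-exit inference over K stages.

    Evaluates stages in order and stops as soon as the top-1
    confidence reaches the stage's threshold (the last stage
    always exits).
    """
    for i, clf in enumerate(classifiers):
        probs = clf(x)
        top1 = max(probs)
        if top1 >= thresholds[i] or i == len(classifiers) - 1:
            return probs.index(top1), i + 1  # prediction, exit stage
```

Note that each later stage repeats the earlier stages' cost, so the accumulated FLOPs grow with the exit stage; the savings come from “easy” samples exiting early.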
4 Experiments

To verify the effectiveness of S2DNAS, we compare it with different dynamic inference methods on different CNN models. Our experiments cover a wide range of previous methods of dynamic inference [5, 46, 31, 40]. We also evaluate different aspects of S2DNAS, which are presented in the discussion part.
4.1 Experiment Settings
Model Setup. We conduct experiments on three CNN architectures: ResNet, VGG, and MobileNetV2.
Training Details. The CIFAR dataset contains 50k training images and 10k test images. We randomly choose 5k images from the training set as the validation dataset and use the remaining 45k images as the training dataset. We use the same input preprocessing for both CIFAR-10 and CIFAR-100: the training images are zero-padded with 4 pixels and then randomly cropped to 32x32 resolution, and random horizontal flipping is used for data augmentation.
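The pad-and-crop augmentation described above can be sketched without any framework. The sketch represents an image as a list of rows (single channel, for illustration); a real pipeline would use a library transform, and the function name is ours.

```python
import random

def pad_and_random_crop(img, pad=4, size=32, rng=random):
    """Zero-pad a HxW image by `pad` pixels on every side, then randomly
    crop back to `size` x `size` (the CIFAR augmentation described above)."""
    h, w = len(img), len(img[0])
    padded = ([[0] * (w + 2 * pad) for _ in range(pad)]
              + [[0] * pad + list(row) + [0] * pad for row in img]
              + [[0] * (w + 2 * pad) for _ in range(pad)])
    top = rng.randrange(h + 2 * pad - size + 1)
    left = rng.randrange(w + 2 * pad - size + 1)
    return [row[left:left + size] for row in padded[top:top + size]]
```

The random offsets shift the image by up to `pad` pixels in each direction, which is the usual translation augmentation for 32x32 CIFAR inputs.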
For the training of the RNN, the PPO algorithm is used, and we use Adam as the optimizer to update the parameters of the RNN; the details of the hyper-parameter settings can be found in the appendix. For the training of the multi-stage model, we use SGD with momentum as the optimizer, and the learning rate is decayed by a fixed factor at set fractions of the total epochs. More details of the training settings of different models can be found in the appendix.
For the hyper-parameters of S2DNAS, we use the same group number $G$ for every layer and set the number of stages to $K=3$. For comparison with MSDNet, which contains 5 stages, we set $K=5$ when performing S2DNAS on the devised model. The exponent $\beta$ in Equation 2 and the weights $\lambda_i$ in Equation 3 are kept fixed across all experiments.
4.2 Classification Results
In this part, we compare our method with other methods of dynamic inference. To give a comprehensive study of our method, we cover a wide range of methods, including LCCL, BlockDrop, Naive, and BranchyNet. We conduct experiments on two widely used image classification benchmarks, CIFAR-10 and CIFAR-100. To show the effectiveness of S2DNAS in reducing the computational cost of CNN models with different architectures, we apply S2DNAS to five typical CNNs with various depths, widths, and sub-structures.
The overall results are shown in Table 1. Note that different thresholds ($t_i$, defined in the previous section) lead to different trade-offs between model accuracy and computational cost. In our experiments, we chose the thresholds that lead to the highest reward on the validation dataset. We also provide further results with different thresholds in the discussion subsection.
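Choosing the reward-maximizing thresholds on the validation set can be done with a simple grid search. This is a sketch under our assumptions: `evaluate` stands in for running dynamic inference on the validation set and computing the reward for one threshold setting, and forcing the last threshold to 0 encodes that the final stage always exits.

```python
from itertools import product

def best_thresholds(evaluate, candidates, num_stages):
    """Grid-search threshold tuples for the first K-1 stages.

    `evaluate(thresholds)` is assumed to return the validation reward
    (combining accuracy and mean FLOPs) for one full threshold setting.
    """
    grid = product(candidates, repeat=num_stages - 1)
    # Append 0.0 for the last stage, which always exits.
    return max(grid, key=lambda t: evaluate(t + (0.0,)))
```

For a small number of stages and a coarse candidate list this exhaustive sweep is cheap, since each evaluation only re-runs the already-trained classifiers.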
As shown in Table 1, for most of the architectures and tasks, our method (denoted as S2DNAS in Table 1) significantly reduces the computational cost while achieving accuracy comparable to the original CNN model. As mentioned above, we use the average FLOPs over the whole test dataset as the metric to measure the computational cost of a given CNN model. For ResNet-20 on CIFAR-10, S2DNAS substantially reduces the computation cost of the original network without an accuracy drop (even with a slight increase, as shown in Table 1).
Our method also shows improvements over other dynamic inference methods in terms of computational cost reduction. We have reproduced the previous works on these CNN models for comparison, and have also implemented a naive early exiting solution (marked as Naive in Table 1), i.e., directly adding prediction layers (i.e., global average pooling and fully-connected layers) at the intermediate layers of the original models. For example, for ResNet-20 on CIFAR-10, compared with BranchyNet, our method achieves a slight accuracy improvement together with a larger computational cost reduction.
One interesting observation is that some methods even cause an increase in computational cost. For example, BlockDrop increases the FLOPs of the original network. We infer that this is caused by the computationally expensive controller that BlockDrop introduces into the inference process. We also notice that some previous works cannot be applied to networks without residual connections; for instance, BlockDrop cannot be applied to VGG16-BN. In contrast, our method generalizes to CNNs without residual connections: from Table 1, our method reduces the computational cost of the original VGG16-BN network with only a slight accuracy drop.
Comparison to MSDNet. As mentioned in the introduction, a recent work proposed a specialized CNN named MSDNet for dynamic inference. Since the method cannot be directly applied to general CNN models, for comparison with MSDNet we design a DenseNet-like model based on the prior work, which has a similar structure to MSDNet. More details of the devised model can be found in the appendix. We then apply S2DNAS to it and generate the dynamic models. The results are plotted in Figure 5; the varying FLOPs on the x-axis are obtained by adjusting the thresholds of each classifier of the dynamic CNN models. As Figure 5 shows, in most cases our method achieves similar accuracy-computation trade-offs. On CIFAR-10, MSDNet outperforms our method when the FLOPs budget is around 15M. However, the superiority of MSDNet comes at the cost of manually designing the CNN architecture, whereas, as Table 1 shows, our method can be applied to various general CNN models.
4.3 Discussion

Here, we present some discussions of our method to provide further insights.
Trade-off of Accuracy and Computational Cost. A key hyper-parameter of dynamic inference is the threshold setting $\{t_1, \dots, t_K\}$, where $K$ is the number of stages. Once the model is trained, different threshold settings lead to different trade-offs between accuracy and computational cost. To demonstrate how the thresholds affect the final model performance, we conduct experiments with different thresholds and plot the results in Figure 6. All these results show the trend that an increase in computational cost leads to a performance boost. Thus, for practical use, we can set the thresholds based on the computational budget of the given hardware device. Moreover, this property also helps to solve the anytime prediction task proposed in the prior work.
Difficulty Distribution of Test Dataset. The basic idea of our method is to exit “easy” samples at the early stages. In this part, we give the exit statistics of all the samples in the test dataset ($K=3$, i.e., there are three stages in the trained model). As shown in Table 2, for ResNet-20 on CIFAR-10/100, the inference of a large portion of the test samples exits from the first two stages. As a result, S2DNAS can considerably reduce the average computation cost. Further, we observe that the accuracy of the classifier in the first stage is high on the samples that exit there, which supports the premise that early-exiting samples are indeed “easy”.
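Statistics of this kind reduce to counting exit stages and averaging the accumulated per-stage costs. The sketch below is illustrative: the inputs (a list of per-sample exit stages and a map from stage to accumulated FLOPs) and the output format are our assumptions, not the paper's tooling.

```python
from collections import Counter

def exit_statistics(exit_stages, flops_per_stage):
    """Exit-stage distribution and resulting mean FLOPs.

    exit_stages: for each test sample, the stage it exited from.
    flops_per_stage[k]: accumulated cost of evaluating up to stage k.
    """
    counts = Counter(exit_stages)
    n = len(exit_stages)
    fractions = {k: counts[k] / n for k in sorted(counts)}
    mean_flops = sum(counts[k] * flops_per_stage[k] for k in counts) / n
    return fractions, mean_flops
```

When most samples exit at stage 1 or 2, the mean FLOPs sits far below the full-model cost, which is precisely the saving reported in Table 2.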
5 Conclusion

In this paper, we present a general framework called S2DNAS for transforming various static CNN models into multi-stage models that support dynamic inference. Empirically, our method can be applied to various CNN models to reduce the computational cost without sacrificing model performance. In contrast to previous methods for dynamic inference, our method has two advantages: (1) we obtain a dynamic model generated from an existing CNN model instead of manually re-designing a new CNN architecture, and (2) the inference of the generated dynamic model does not introduce irregular computations or complex controllers. Thus the generated model can be easily deployed on various hardware devices using existing deep learning frameworks.
These advantages are appealing for deploying a given CNN model on hardware devices with limited computational resources. To be specific, we can first use S2DNAS to transform the given model into a dynamic one and then deploy it on the hardware device. Moreover, our method is orthogonal to previous pruning/quantization methods, which can further reduce the computational cost of the given CNN model. All these properties imply a wide range of application scenarios where efficient CNN inference is desired.
Appendix A Details of RNN Model and its Optimization
The RNN model contains a GRU layer with 64 hidden units, predictors, and an embedding layer. The predictors output the probabilities of different transformation settings (split settings and concat settings) for different layers. The number of different split settings in a layer is the combination number $\binom{G-1}{K-1}$, in which $G$ is the number of groups and $K$ is the number of stages; a predictor (containing a fully-connected layer and a softmax function) is used to predict the probabilities of selecting among these settings. For the concat settings, we use a predictor (containing a fully-connected layer and a logistic function) to predict the reuse probability for each concat location. Finally, the embedding layer turns the sampled settings of the previous step into dense vectors of fixed size as the input of the GRU layer.
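To make the controller's output concrete, the sketch below samples one transformation setting per layer. It is a deliberately simplified stand-in: uniform random sampling replaces the learned GRU policy, and the `(split, concat)` tuple representation is our assumption about what the controller emits.

```python
import random

def sample_architecture(num_layers, split_options, num_stages, rng=random):
    """Sample one transformation setting per layer (controller stand-in).

    split_options: the candidate split settings for a layer.
    Returns, per layer, a split setting plus one binary reuse
    decision for each later stage.
    """
    arch = []
    for _ in range(num_layers):
        split = rng.choice(split_options)           # one split setting
        concat = [rng.random() < 0.5                # one reuse bit per
                  for _ in range(num_stages - 1)]   # later stage
        arch.append((split, concat))
    return arch
```

In the actual search, the GRU conditions each layer's distribution on the settings sampled for previous layers (via the embedding layer), which uniform sampling cannot capture.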
We employ Proximal Policy Optimization (PPO) to optimize the parameters of the RNN. Adam is used for optimizing the parameters of the RNN model, with a learning rate of 0.001. The number of epochs for PPO is set to 4, the clip parameter is set to 0.1, the mini-batch size is set to 4, the coefficient of value function loss is set to 0.5 and the entropy coefficient is set to 0.01.
Appendix B Details of DenseNet-like Model
To compare with MSDNet, a DenseNet-like model is devised. Specifically, we modify DenseNet-BC (k=8, depth=100) by doubling the growth rate after each transition layer and halving the number of output channels of the convolution in the bottleneck layers. We denote it as DenseNet*.
Appendix C Details of Training Settings
During the network architecture search, we optimize each multi-stage model for 6 epochs on the training dataset to approximate $w_a^*$. 10k models are sampled from the architecture search space for each experiment. Then the models with the top-10 rewards receive full training. Table 3 lists the hyper-parameters of the full training for different architectures. Learning rate warm-up is used for the first 100 iterations.
| Model | Datasets | Batch size | Training epochs | Weight decay |
Appendix D Demonstration of a Searched Multi-stage Model
Table 4 demonstrates the structure of the searched multi-stage ResNet-56 model on CIFAR-10. From this table, we can see that each layer is split into three stages and each stage contains a subset of the original channels. Different stages are concatenated at different layers, so the feature maps generated by previous layers are reused in later stages. The accumulated FLOPs increase from 21M to 90M over the three stages. As a result, we can save substantial computation cost when “easy” input samples stop at stage 1 or stage 2.
| layer name | output size | stage 1 | stage 2 | stage 3 | concat settings |
| conv1 | 32×32 | 3×3, 6 | 3×3, 2 | 3×3, 8 | |
- In this paper, the classifier refers to the whole sub-network in the current stage.
- We do not split the input layer.
- We omit the batch normalization and pooling layers in the table.
- Refer to the last layer of the classifier for prediction.
- Here, we regard one multiply-accumulate (MAC) as one floating-point operation (FLOP).
- We use the batch normalization after each convolution layer in VGG and change the stride of the first convolution layer in MobileNetV2 from 2 to 1 for CIFAR.
- Here, we only consider samples that exit from this stage.
- (2019) Dynamically sacrificing accuracy for reduced computation: cascaded inference based on softmax confidence. In ICANN 2019, Munich, Germany, Proceedings, Part II, pp. 306–320.
- (2017) Adaptive neural networks for efficient inference. In ICML 2017, Sydney, NSW, Australia, pp. 527–536.
- (2019) SeerNet: predicting convolutional neural network feature-map sparsity through low-bit quantization. In CVPR 2019, Long Beach, CA, USA, pp. 11216–11225.
- (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), pp. 834–848.
- (2017) More is less: a more complicated network with less inference complexity. In CVPR 2017, Honolulu, HI, USA, pp. 1895–1903.
- (2017) Spatially adaptive computation time for residual networks. In CVPR 2017, Honolulu, HI, USA, pp. 1790–1799.
- (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
- (2019) Dynamic channel pruning: feature boosting and suppression. In ICLR 2019, New Orleans, LA, USA.
- (2015) Deep learning with limited numerical precision. In ICML 2015, Lille, France, pp. 1737–1746.
- (2016) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In ICLR 2016, San Juan, Puerto Rico.
- (2015) Learning both weights and connections for efficient neural networks. In NeurIPS 2015, Montreal, Quebec, Canada, pp. 1135–1143.
- (2017) Mask R-CNN. In ICCV 2017, Venice, Italy, pp. 2980–2988.
- (2016) Deep residual learning for image recognition. In CVPR 2016, Las Vegas, NV, USA, pp. 770–778.
- (2018) AMC: AutoML for model compression and acceleration on mobile devices. In ECCV 2018, Munich, Germany, Proceedings, Part VII, pp. 815–832.
- (2018) Channel gating neural networks. CoRR abs/1805.12549.
- (2019) Boosting the performance of CNN accelerators with dynamic fine-grained channel gating. In MICRO 2019, Columbus, OH, USA, pp. 139–150.
- (2018) Multi-scale dense networks for resource efficient image classification. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, External Links: Cited by: §1, §1, §1, §2, §4.2, §4.3, §4.3.
- (2017) Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. External Links: Cited by: §1, §4.2.
- (2015) Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, External Links: Cited by: §4.1.
- (2009) Learning multiple layers of features from tiny images. Technical report Citeseer. Cited by: §4.1.
- (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States., pp. 1106–1114. External Links: Cited by: §1.
- (1989) Optimal brain damage. In Advances in Neural Information Processing Systems 2, [NIPS Conference, Denver, Colorado, USA, November 27-30, 1989], pp. 598–605. External Links: Cited by: §1, §2.
- (2017) Pruning filters for efficient convnets. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, External Links: Cited by: §1, §2.
- (2019) Improved techniques for training adaptive deep networks. CoRR abs/1908.06294. External Links: Cited by: §2.
- (2017) Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 6459–6468. External Links: Cited by: §2.
- (2016) Fixed point quantization of deep convolutional networks. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pp. 2849–2858. External Links: Cited by: §2.
- (2019) Auto-deeplab: hierarchical neural architecture search for semantic image segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 82–92. External Links: Cited by: §2.
- (2019) DARTS: differentiable architecture search. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, External Links: Cited by: §2.
- (2018) Dynamic deep neural networks: optimizing accuracy-efficiency trade-offs by selective execution. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pp. 3675–3682. External Links: Cited by: §1.
- (2016) SSD: single shot multibox detector. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I, pp. 21–37. External Links: Cited by: §1.
- (2016) Conditional deep learning for energy-efficient and enhanced pattern recognition. In 2016 Design, Automation & Test in Europe Conference & Exhibition, DATE 2016, Dresden, Germany, March 14-18, 2016, pp. 475–480. External Links: Cited by: §1, §2, §4.2, §4.
- (2016) XNOR-net: imagenet classification using binary convolutional neural networks. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, pp. 525–542. External Links: Cited by: §1, §2.
- (2016) You only look once: unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 779–788. External Links: Cited by: §1.
- (2018) SBNet: sparse blocks network for fast inference. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 8711–8720. External Links: Cited by: §2.
- (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pp. 91–99. External Links: Cited by: §1.
- (2018) MobileNetV2: inverted residuals and linear bottlenecks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 4510–4520. External Links: Cited by: §4.1.
- (2017) Proximal policy optimization algorithms. CoRR abs/1707.06347. External Links: Cited by: §1, §3.3, §4.1.
- (2015) Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, External Links: Cited by: §4.1.
- (2018) MnasNet: platform-aware neural architecture search for mobile. CoRR abs/1807.11626. External Links: Cited by: §1, §3.3.
- (2016) BranchyNet: fast inference via early exiting from deep neural networks. In 23rd International Conference on Pattern Recognition, ICPR 2016, Cancún, Mexico, December 4-8, 2016, pp. 2464–2469. External Links: Cited by: §1, §1, §2, §4.2, §4.2, §4.
- (2018) Convolutional networks with adaptive inference graphs. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part I, pp. 3–18. External Links: Cited by: §1, §2.
- (2018) SkipNet: learning dynamic routing in convolutional networks. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XIII, pp. 420–436. External Links: Cited by: §1, §2.
- (2016) Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pp. 2074–2082. External Links: Cited by: §1.
- (2019) FBNet: hardware-aware efficient convnet design via differentiable neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 10734–10742. External Links: Cited by: §1, §2.
- (2016) Quantized convolutional neural networks for mobile devices. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 4820–4828. External Links: Cited by: §1, §2.
- (2018) BlockDrop: dynamic inference paths in residual networks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 8817–8826. External Links: Cited by: §1, §1, §2, §4.2, §4.2, §4.
- (2017) Scalpel: customizing DNN pruning to the underlying hardware parallelism. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA 2017, Toronto, ON, Canada, June 24-28, 2017, pp. 548–560. External Links: Cited by: §1.
- (2017) Neural architecture search with reinforcement learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, External Links: Cited by: §2, §3.3, §3.3.
- (2018) Learning transferable architectures for scalable image recognition. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 8697–8710. External Links: Cited by: §2, §3.3, §3.3, §3.3.