Selective Deep Convolutional Neural Network for Low Cost Distorted Image Classification


Minho Ha, Younghoon Byeon, Youngjoo Lee, and Sunggu Lee
Department of Electrical Engineering
POSTECH, Republic of Korea
{mh0205,byh1321,youngjoo.lee,slee}@postech.ac.kr
Abstract

Deep convolutional neural networks have proven to be well suited for image classification applications. However, if there is distortion in the image, the classification accuracy can be significantly degraded, even with state-of-the-art neural networks. The accuracy cannot be significantly improved by simply training with distorted images. Instead, this paper proposes a multiple neural network topology referred to as a selective deep convolutional neural network. By modifying existing state-of-the-art neural networks in the proposed manner, it is shown that a similar level of classification accuracy can be achieved, but at a significantly lower cost. The cost reduction is obtained primarily through the use of fewer weight parameters. Using fewer weights reduces the number of multiply-accumulate operations and also reduces the energy required for data accesses. Finally, it is shown that the effectiveness of the proposed selective deep convolutional neural network can be further improved by combining it with previously proposed network cost reduction methods.

 

Preprint. Work in progress.

1 Introduction

In recent years, deep convolutional neural networks (DCNNs) have been shown to be exceptionally effective for image classification (krizhevsky2012imagenet, ; simonyan2014very, ; Szegedy_2015_CVPR, ; he2016deep, ; huang2017densely, ). However, the datasets used in these studies consist entirely of clean images. If distorted images are used, the classification accuracy degrades significantly even if the networks are trained using distorted images. When performing image classification with DCNNs in the real world, distorted images (due to noise, blurring, occlusion, etc.) can be quite common. Therefore, in order to utilize DCNN technology for image classification in realistic situations, an effective method for dealing with distorted images is essential.

A DCNN typically requires an extremely large number of weight parameters, and thus, an extremely large number of multiply-and-add computations. Therefore, it is very difficult to use a DCNN in mobile or embedded environments. To address this problem, it is necessary to reduce the size of the DCNN. Also, in the case of image classification in mobile or embedded environments, the actual sensed image may be distorted or have low resolution due to demanding environmental conditions. Therefore, a low cost distorted image classification method that can be used effectively in mobile or embedded environments needs to be developed.

In this paper, a new selective DCNN architecture for low cost distorted image classification is proposed. The selective DCNN consists of one tiny CNN and two dedicated DCNNs and operates in two stages. First, the tiny CNN distinguishes between clean images and distorted images. Classification is then performed by one of the two dedicated DCNNs, depending on whether the input is a clean image or a distorted image. Each dedicated DCNN is an input/output-channel-reduced version of an existing state-of-the-art DCNN. Using the proposed selective DCNN, similar or better performance is achieved with fewer weight parameters than the baseline DCNN.

The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the proposed selective DCNN method. Section 4 shows experimental results. Section 5 shows the effect of combining a selective DCNN with other techniques. Section 6 discusses future work. Finally, Section 7 concludes the paper.

2 Related work

2.1 Distorted image classification using DCNN

Dodge et al. (dodge2016understanding, ) analyzed the effect of image quality on the performance of DCNNs. They analyzed how various distortions (blur, noise, contrast, and JPEG compression) affect various DCNN models. Zhou et al. (zhou2017classification, ) also analyzed the effect of a different set of distortions (motion blur, defocus blur, Gaussian noise, and combined noise) on DCNN performance. Diamond et al. (diamond2017dirty, ) proposed a denoising/deblurring method to improve the performance of distorted image classification. In Section 4, as in (dodge2016understanding, ) and (zhou2017classification, ), the effect of various distortions on DCNN-based image classification is analyzed.

2.2 Network complexity reduction methods

Network pruning is one of the typical ways to reduce the size of neural networks. Optimal Brain Damage (lecun1990optimal, ) and Optimal Brain Surgeon (hassibi1993second, ), which were proposed in the 1990s, prune network connections based on the Hessian of the loss function. Although these methods are more accurate than magnitude-based pruning, they require second order derivatives, which require additional computation. Han et al. (han2015learning, ) proposed pruning unimportant connections and fine tuning by retraining the pruned network. The pruning technique proposed in (han2015learning, ) achieved the same level of performance as the original network with far fewer weight parameters through this additional retraining. In Section 5, the effect of adding the pruning technique proposed in (han2015learning, ) to the proposed selective DCNN will be shown.

By reducing the bit-precision of the parameters used in a neural network, the amount of data stored in memory can be reduced. Reduced bit-precision also reduces the power consumption of multiply-accumulate (MAC) operations. BinaryConnect (courbariaux2015binaryconnect, ) and Binary Neural Network (BNN) (hubara2016binarized, ) constrain the parameters that make up a neural network to +1 or -1. Their accuracy is comparable to a full-precision DCNN, but they have the disadvantage that more parameters are needed, which weakens the benefit. Gupta et al. (gupta2015deep, ) replaced floating-point weights and activations with a fixed-point representation; the gradients generated during training were also limited to fixed-point values. Lin et al. (lin2016fixed, ) transformed pre-trained DCNNs into fixed-point DCNNs. Ding et al. (ding2018quantized, ) proposed a cost-effective, highly accurate quantization method using k-ones approximation and stochastic rounding. In Section 5, the effect of applying quantization to the selective DCNN proposed in this paper will be shown.

3 Proposed method: distorted image classification using selective DCNN

3.1 Image distortion

This paper presents the results of a case study conducted using three types of distortion: Gaussian noise, Gaussian blurring, and occlusion. These distortion types were selected because they are the main types of distortion that occur in embedded and mobile environments. Gaussian noise models the use of a low quality camera sensor, Gaussian blurring models a camera that is not focused properly, and occlusion models part of the object being covered by other objects. Experiments were carried out assuming that the standard deviation for Gaussian noise is 0.25, the standard deviation for Gaussian blurring is 1.00, and the block size for occlusion is 16×16.
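To make the distortion model concrete, the following PyTorch-style sketch shows one way these three distortions could be applied to a normalized (C, H, W) image tensor in [0, 1]. The blur kernel size, the random placement of the occluded block, and the zero fill value are assumptions made for illustration; the paper only fixes the noise standard deviation (0.25), the blur standard deviation (1.00), and the occlusion block size (16×16).

```python
import torch
import torchvision.transforms as T

def add_gaussian_noise(img, std=0.25):
    # img: float tensor in [0, 1], shape (C, H, W)
    noisy = img + torch.randn_like(img) * std
    return noisy.clamp(0.0, 1.0)

def add_gaussian_blur(img, sigma=1.00, kernel_size=5):
    # GaussianBlur is available in recent torchvision releases;
    # the kernel size is an assumption (the paper only fixes sigma).
    return T.GaussianBlur(kernel_size=kernel_size, sigma=sigma)(img)

def add_occlusion(img, block=16):
    # Zero out a randomly placed block x block square (assumes H, W >= block).
    _, h, w = img.shape
    top = torch.randint(0, h - block + 1, (1,)).item()
    left = torch.randint(0, w - block + 1, (1,)).item()
    out = img.clone()
    out[:, top:top + block, left:left + block] = 0.0
    return out
```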

3.2 Selective DCNN

Howard et al. (howard2017mobilenets, ) showed how to reduce the number of input/output channels in a network while maintaining the same number of layers, thereby reducing the number of weight parameters at the possible expense of classification accuracy. However, when done at an appropriate level, reducing the number of input/output channels can effectively reduce the number of weight parameters without compromising classification accuracy. Using this property, the selective DCNN, which uses one of two dedicated networks for clean and distorted images, reduces the number of weight parameters without sacrificing classification accuracy. In the selective DCNN, the number of input/output channels in each dedicated network is smaller than in the baseline DCNN. The proposed selective DCNN operates in two stages. In the first stage, a tiny CNN determines whether the input image is clean or distorted. Based on this judgment, a dedicated DCNN that is thinner than the baseline DCNN is loaded, and the input image is then classified by that dedicated DCNN. The architecture of the selective DCNN is shown in Figure 1. The tiny CNN in Figure 1 is able to categorize clean and distorted images almost perfectly; it is obtained by repeatedly shrinking a larger network until the desired performance level is achieved. The same sized tiny CNN is used for all three distortion types (Gaussian noise, blurring, and occlusion), although the weight parameter values differ for each case. The tiny CNN is very small and does not have a significant impact on the total number of weight parameters when used with the dedicated DCNNs.
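The routing logic of Figure 1 can be summarized by the following PyTorch-style sketch. It is a minimal illustration, not the authors' exact implementation: the module is assumed to receive an already trained tiny CNN (a binary clean/distorted classifier) and two already trained dedicated DCNNs, and in a single-image embedded setting only the selected branch's weights would actually be loaded.

```python
import torch
import torch.nn as nn

class SelectiveDCNN(nn.Module):
    """Two-stage selective classifier (module and class names are illustrative)."""

    def __init__(self, tiny_cnn: nn.Module, clean_dcnn: nn.Module, distorted_dcnn: nn.Module):
        super().__init__()
        self.tiny_cnn = tiny_cnn              # Stage 1: clean (0) vs. distorted (1)
        self.clean_dcnn = clean_dcnn          # Stage 2: thin DCNN trained on clean images
        self.distorted_dcnn = distorted_dcnn  # Stage 2: thin DCNN trained on distorted images

    @torch.no_grad()
    def forward(self, x):
        # Stage 1: decide, per image, which dedicated network to use.
        route = self.tiny_cnn(x).argmax(dim=1)
        clean_idx = (route == 0).nonzero(as_tuple=True)[0]
        dist_idx = (route == 1).nonzero(as_tuple=True)[0]

        # Stage 2: run only the selected dedicated DCNN on each subset.
        chunks, order = [], []
        if clean_idx.numel() > 0:
            chunks.append(self.clean_dcnn(x[clean_idx]))
            order.append(clean_idx)
        if dist_idx.numel() > 0:
            chunks.append(self.distorted_dcnn(x[dist_idx]))
            order.append(dist_idx)

        # Restore the original batch order before returning the logits.
        logits = torch.cat(chunks, dim=0)
        perm = torch.cat(order, dim=0)
        return logits[perm.argsort()]
```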

In the case study considered in this paper, each type of distortion is applied to the 50,000 training images and 10,000 test images of CIFAR-100 (krizhevsky2009learning, ). The baseline DCNNs used are two representative DCNN architectures, VGG16 (simonyan2014very, ) and ResNet50 (he2016deep, ), and the dedicated DCNNs are thinner versions of these architectures. Table 1 summarizes the classification accuracies of these dedicated DCNNs. In Table 1, the accuracy in the "no distortion" case is the classification accuracy on clean images only, and the accuracies in the other cases are the classification accuracies on the corresponding distorted images only. The classification accuracy of the proposed selective DCNN for each type of distortion is compared with the values shown in Table 1. For Gaussian noise, the standard deviation is assumed to be 0.25; for Gaussian blurring, the standard deviation used is 1.00; and for occlusion, the block size is set to 16×16. Variations with different degrees of distortion were also considered, and these results are presented in Section 4.3.

Figure 1: The proposed selective DCNN architecture (shown with, for example, 0.50-VGG16 used as the baseline dedicated DCNN).
| DCNN Architecture | No Distortion | Gaussian noise | Gaussian blurring | Occlusion |
| 0.75-VGG16        | 70.96         | 59.08          | 65.80             | 62.86     |
| 0.50-VGG16        | 69.22         | 57.89          | 63.79             | 61.02     |
| 0.25-VGG16        | 60.83         | 52.20          | 55.43             | 54.18     |
| 0.75-ResNet50     | 73.73         | 62.85          | 71.02             | 69.60     |
| 0.50-ResNet50     | 71.68         | 62.61          | 68.20             | 66.16     |
| 0.25-ResNet50     | 70.42         | 61.83          | 65.25             | 64.07     |
Table 1: Top-1 classification accuracies (%) of the dedicated DCNNs

3.3 Cost estimation methods

The cost of a DCNN can be measured in various ways. In this paper, the number of weight parameters, the number of MAC operations, and the data access energy per input image (which, in this paper, covers only accesses to weight parameters) are considered as the main components of cost. These costs are selected because of their importance in mobile and embedded environments, which must use low-power hardware with limited memory.

3.3.1 Number of weight parameters and MAC operations

The number of weight parameters and the number of MAC operations are computed in the same manner as in (howard2017mobilenets, ). The number of weight parameters and number of MAC operations for a given convolutional layer are calculated as follows.

$N_{weights} = D_K \cdot D_K \cdot M \cdot N$   (1)

$N_{MAC} = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$   (2)

In (1) and (2), $D_K$ is the height and width of a square shaped convolution kernel, $M$ is the number of input channels of the convolution filter, $N$ is the number of output channels, and $D_F$ is the height and width of a square shaped feature map. The corresponding counts for a fully connected layer can be calculated in a similar manner. As shown in (1) and (2), the number of weight parameters and the number of MAC operations are closely related; the number of weights directly affects the number of MAC operations required.
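As a sanity check on (1) and (2), the two counts can be computed with a few lines of Python; the helper names and the example layer are illustrative only.

```python
def conv_weight_count(d_k, m, n):
    # Eq. (1): a square d_k x d_k kernel with m input and n output channels
    # (bias terms are ignored).
    return d_k * d_k * m * n

def conv_mac_count(d_k, m, n, d_f):
    # Eq. (2): every weight is applied at each position of the d_f x d_f feature map.
    return conv_weight_count(d_k, m, n) * d_f * d_f

# Example: the first 3x3 convolution of VGG16 on a 32x32 CIFAR-100 image
# (3 input channels, 64 output channels, padding 1) has 3*3*3*64 = 1,728
# weights and 1,728 * 32 * 32 = 1,769,472 MAC operations.
```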

3.3.2 Data access energy

DCNN accelerators, which are hardware devices specialized for DCNN applications, use limited-capability processing elements that primarily perform MAC operations and employ energy efficient methods for reading and writing weights and intermediate feature map data to and from memory. In these systems, the energy required to access memory constitutes the major portion of the total energy consumption (chen2016eyeriss, ). Data access energy can vary depending on hardware characteristics. In this paper, data access energy is analyzed based on the well known Eyeriss architecture (chen2016eyeriss, ). The memory hierarchy of Eyeriss consists of external DRAM, a global buffer (GLB), the processing element array, and register files (RF). Eyeriss maximizes the reuse of data held in the RF, which has the lowest data access energy, and minimizes accesses to DRAM, which has the highest data access energy. By considering data reuse and using the normalized energy cost calculation method presented in (chen2016eyeriss, ), the total data access energy can be computed as follows.

$E_{data} = a_{DRAM} \cdot \varepsilon_{DRAM} + a_{GLB} \cdot \varepsilon_{GLB} + a_{array} \cdot \varepsilon_{array} + a_{RF} \cdot \varepsilon_{RF}$   (3)

In (3), $a_{DRAM}$, $a_{GLB}$, $a_{array}$, and $a_{RF}$ are the numbers of reuses (accesses) for DRAM, GLB, array, and RF, respectively, and $\varepsilon_{m}$ is the normalized energy cost for memory level $m$. The normalized energy costs used in (3) are summarized in Table 2. The values in Table 2 assume 16-bit fixed point values. As shown in Table 2, the energy cost varies greatly depending on the memory level used. In this paper, the data access energy is calculated based on the row stationary model used in (chen2016eyeriss, ).

| Type of Memory         | DRAM | Global Buffer | Array (inter-PE) | Register File |
| Normalized Energy Cost | 200  | 6             | 2                | 1             |
Table 2: Normalized energy cost relative to a single MAC operation (chen2016eyeriss, )
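A small sketch of how (3) is evaluated with the costs of Table 2 is given below; the access counts in the example are purely hypothetical, since obtaining real counts requires a dataflow model such as Eyeriss's row-stationary mapping.

```python
# Normalized energy cost per access, relative to one MAC operation (Table 2).
ENERGY_COST = {"DRAM": 200.0, "GLB": 6.0, "array": 2.0, "RF": 1.0}

def data_access_energy(accesses):
    """Eq. (3): sum of (number of accesses) x (normalized cost) over memory levels."""
    return sum(accesses[level] * ENERGY_COST[level] for level in ENERGY_COST)

# Hypothetical access counts for one image: the DRAM term dominates even
# though DRAM is accessed far less often than the register file.
example = {"DRAM": 1e6, "GLB": 1e7, "array": 2e7, "RF": 1e8}
print(data_access_energy(example))  # 2.0e8 + 0.6e8 + 0.4e8 + 1.0e8 = 4.0e8
```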

4 Experimental results

The proposed selective DCNN was implemented in the PyTorch framework (paszke2017automatic, ). The experiments were carried out using NVIDIA Titan Xp GPUs. All experiments were run on 8 GPUs and fine-tuned with batch sizes up to 1024 and learning rates ranging from 0.8 to 0.0008.

CIFAR-100 (krizhevsky2009learning, ) image classification using VGG16 (simonyan2014very, ) and ResNet50 (he2016deep, ) is used to show the performance of the proposed selective DCNN. The modified CIFAR-100 dataset used in this paper consists of a clean dataset and a distorted dataset with three types of distortion (Gaussian noise, blurring, and occlusion). As mentioned in Section 3, the experiments were conducted assuming the lowest degree of distortion. For experimental convenience, the distorted datasets were generated in advance and reused. The dedicated DCNNs used in the second stage of the selective DCNN have input/output channel depths scaled by 0.75, 0.50, and 0.25, as shown in Table 1.
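For reference, a width-scaled dedicated network can be built by multiplying every channel count of the baseline by the desired factor; the sketch below shows this for the convolutional part of VGG16. It is an illustrative construction (the use of batch normalization and the handling of the classifier head are assumptions, not details given in the paper).

```python
import torch.nn as nn

# Standard VGG16 convolutional configuration; 'M' denotes 2x2 max pooling.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def make_thin_vgg16_features(width_mult=0.50, in_channels=3):
    """Scale every intermediate channel count by `width_mult` (e.g., 0.75, 0.50, 0.25)."""
    layers, c_in = [], in_channels
    for v in VGG16_CFG:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            c_out = max(1, int(v * width_mult))
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.BatchNorm2d(c_out),
                       nn.ReLU(inplace=True)]
            c_in = c_out
    return nn.Sequential(*layers)
```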

4.1 CIFAR-100 classification using VGG16

4.1.1 Weight parameters, MAC operations, and classification accuracy

First, image classification on the modified CIFAR-100 dataset (krizhevsky2009learning, ) was implemented using VGG16 (simonyan2014very, ). For Gaussian noise, Gaussian blurring, and occlusion, the classification accuracy, number of weight parameters, and number of MAC operations are the primary metrics used to evaluate the selective DCNN architecture. Since the target application environment is a mobile or embedded system, a slight reduction in classification accuracy, of less than 1%, is permitted if a significant energy reduction can be achieved. The number of weight parameters, number of MAC operations, and classification accuracy achieved when using the proposed selective DCNN architecture, which includes the tiny CNN and both dedicated DCNNs, are shown in Table 3. As can be seen from Table 3, both the selective 0.75-VGG16 and 0.50-VGG16 architectures can be used for images with Gaussian noise, while only the selective 0.75-VGG16 architecture can be used for images with Gaussian blurring (these architectures result in classification accuracy within 1% of the levels achieved for clean images).

4.1.2 Data access energy per image

Figure 2 shows how much the normalized data access energy per image is reduced when using the selective DCNN. The data access energy calculations are based on (3) and Table 3. As shown in Figure 2 (a), which is for the VGG16 case, the energy consumed by the tiny CNN is negligible (about 1.8% of that of 1.00-VGG16). For classification of images with Gaussian noise using the selective 0.50-VGG16, the normalized data access energy per image is reduced by about 3.7 times, while classification of images with Gaussian blurring or occlusion using the selective 0.75-VGG16 results in an energy reduction of about 1.7 times.

| Distortion        | DCNN architecture    | Weights | MACs   | Top-1 (%) |
| Gaussian noise    | 1.00-VGG16           | 15.3M   | 313.8M | 63.64     |
|                   | Selective 0.75-VGG16 | 8.9M    | 178.1M | 65.02     |
|                   | Selective 0.50-VGG16 | 4.1M    | 80.2M  | 63.55     |
|                   | Selective 0.25-VGG16 | 1.2M    | 21.2M  | 56.52     |
| Gaussian blurring | 1.00-VGG16           | 15.3M   | 313.8M | 67.53     |
|                   | Selective 0.75-VGG16 | 8.9M    | 178.1M | 68.38     |
|                   | Selective 0.50-VGG16 | 4.1M    | 80.2M  | 66.51     |
|                   | Selective 0.25-VGG16 | 1.2M    | 21.2M  | 58.13     |
| Occlusion         | 1.00-VGG16           | 15.3M   | 313.8M | 67.55     |
|                   | Selective 0.75-VGG16 | 8.9M    | 178.1M | 66.91     |
|                   | Selective 0.50-VGG16 | 4.1M    | 80.2M  | 65.12     |
|                   | Selective 0.25-VGG16 | 1.2M    | 21.2M  | 57.51     |
Table 3: Number of weight parameters, number of MAC operations, and classification accuracy of the selective VGG16
Figure 2: Data access energy of baseline DCNN and selective DCNNs: (a) VGG16, (b) ResNet50.

4.2 CIFAR-100 classification using ResNet50

4.2.1 Weight parameters, MAC operations, and classification accuracy

In a manner similar to the experiments of the previous subsection, classification of the modified CIFAR-100 dataset (krizhevsky2009learning, ) was carried out using ResNet50 (he2016deep, ). As in the VGG16 case, a reduction in classification accuracy of less than 1% is allowed. The number of weight parameters, number of MAC operations, and classification accuracy when using the proposed selective DCNN, which includes the tiny CNN and both dedicated DCNNs, are shown in Table 4. As can be seen from Table 4, the selective 0.25-ResNet50 can be used for classification of images with Gaussian noise, and the selective 0.75-ResNet50 can be used for images with Gaussian blurring or occlusion.

4.2.2 Data access energy per image

As shown in Figure 2 (b), which is for the ResNet50 case, the energy consumed by the tiny CNN is negligible (about 1.1% of that of 1.00-ResNet50). For classification of images with Gaussian noise using the selective 0.25-ResNet50, the normalized data access energy per image is reduced by about 13.3 times, while classification of images with Gaussian blurring or occlusion using the selective 0.75-ResNet50 results in an energy reduction of about 1.7 times.

| Distortion        | DCNN architecture       | Weights | MACs   | Top-1 (%) |
| Gaussian noise    | 1.00-ResNet50           | 23.7M   | 2.1G   | 64.73     |
|                   | Selective 0.75-ResNet50 | 13.6M   | 1.2G   | 68.29     |
|                   | Selective 0.50-ResNet50 | 6.2M    | 524.5M | 67.15     |
|                   | Selective 0.25-ResNet50 | 1.8M    | 132.3M | 66.13     |
| Gaussian blurring | 1.00-ResNet50           | 23.7M   | 2.1G   | 72.51     |
|                   | Selective 0.75-ResNet50 | 13.6M   | 1.2G   | 72.38     |
|                   | Selective 0.50-ResNet50 | 6.2M    | 524.5M | 69.94     |
|                   | Selective 0.25-ResNet50 | 1.8M    | 132.3M | 67.84     |
| Occlusion         | 1.00-ResNet50           | 23.7M   | 2.1G   | 70.87     |
|                   | Selective 0.75-ResNet50 | 13.6M   | 1.2G   | 71.67     |
|                   | Selective 0.50-ResNet50 | 6.2M    | 524.5M | 68.92     |
|                   | Selective 0.25-ResNet50 | 1.8M    | 132.3M | 67.25     |
Table 4: Number of weight parameters, number of MAC operations, and classification accuracy of the selective ResNet50

4.3 Distortion ratio and degree of distortion

The previous experiments all assume a one-to-one ratio of clean images to distorted images. It is also assumed that the distortion degree is fixed to one value (for example, a standard deviation of 0.25 for Gaussian noise). In real situations, however, the ratio of clean images to distorted images is unknown, and the distortion degree will vary from situation to situation. Therefore, it is necessary to analyze how the classification accuracy changes with the ratio of clean to distorted images and with the degree of distortion. Figure 3 shows the classification accuracy as the distortion ratio and the degree of distortion are varied. As expected, the classification accuracy decreases as the distortion ratio or the degree of distortion increases. The classification accuracy decreases from 68.09% to 35.86%, from 68.68% to 46.74%, and from 68.40% to 45.15% for images with Gaussian noise, Gaussian blurring, and occlusion, respectively. When comparing corresponding cases for the proposed selective DCNN and the baseline DCNN, the classification accuracy difference is within about 1% in almost all cases, except in a few extreme cases where the distortion degree and the ratio of clean to distorted images are very low.

Figure 3: Classification accuracy according to the ratio of the clean dataset to the distorted dataset and the distortion degree: (a) Gaussian noise, (b) Gaussian blurring, (c) occlusion.
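The mixed evaluation sets used in this analysis can be generated with a simple helper like the one below; the function name and the use of a fixed random seed are illustrative assumptions.

```python
import random

def mix_clean_and_distorted(clean_images, distort_fn, distorted_fraction=0.5, seed=0):
    """Return a list in which roughly `distorted_fraction` of the images are
    distorted by `distort_fn` and the rest are left clean (0.5 corresponds to
    the one-to-one split used in Sections 4.1 and 4.2)."""
    rng = random.Random(seed)
    return [distort_fn(img) if rng.random() < distorted_fraction else img
            for img in clean_images]
```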

5 Combination with other techniques

Additional experiments were carried out in which weight pruning and quantization techniques were applied to the proposed selective DCNN in order to examine the synergy with other techniques. The tiny CNN is neither pruned nor quantized because it must be guaranteed to operate with close to 100% accuracy.

5.1 Weight pruning

Han et al. (han2015learning, ) proposed a weight pruning technique that can effectively reduce DCNN size. It was shown that the size of the network can be reduced while maintaining classification accuracy by removing unimportant connections and retraining the resulting sparse network. By combining the proposed selective DCNN with weight pruning, the DCNN cost can be reduced even more effectively. The combined effect of the selective DCNN and weight pruning is summarized in Table 5. Classification results for the Gaussian noise dataset are shown in Table 5 as representative of the results obtained with the distorted datasets. Pruning was performed at an 80% pruning rate, and the accuracy was recovered through retraining. As can be seen from Table 5, the same or slightly better classification accuracy can be achieved using only about 10% of the MAC operations required without pruning.

| Distortion     | DCNN architecture             | Weights | MACs   | Top-1 (%) |
| Gaussian noise | Unpruned-1.00-VGG16           | 15.3M   | 313.8M | 63.64     |
|                | Pruned80-1.00-VGG16           | 3.1M    | 110.4M | 63.44     |
|                | Selective unpruned-0.50-VGG16 | 4.1M    | 80.2M  | 63.55     |
|                | Selective pruned80-0.50-VGG16 | 1.1M    | 31.6M  | 64.69     |
Table 5: Effect of applying the selective DCNN and weight pruning together
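A minimal sketch of the magnitude-based pruning step is shown below, assuming per-layer pruning of convolutional and fully connected weights; the fine-tuning loop that restores accuracy (and re-applies the masks after each update) is omitted.

```python
import torch
import torch.nn as nn

def magnitude_prune(model, sparsity=0.80):
    """Zero out the smallest-magnitude weights in every Conv2d/Linear layer."""
    masks = {}
    with torch.no_grad():
        for name, module in model.named_modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                w = module.weight.data
                k = int(sparsity * w.numel())
                if k == 0:
                    continue
                # Threshold at the k-th smallest absolute weight value.
                threshold = w.abs().flatten().kthvalue(k).values
                mask = (w.abs() > threshold).float()
                module.weight.data.mul_(mask)
                masks[name] = mask  # keep masks to re-apply during retraining
    return masks
```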

5.2 Weight pruning and quantization

Quantization is an effective way to reduce data access energy (courbariaux2015binaryconnect, ; hubara2016binarized, ; gupta2015deep, ; lin2016fixed, ; ding2018quantized, ). Among the various quantization methods, this paper applies a method that quantizes the weights of a pre-trained full-precision network, similar to (lin2016fixed, ), to the selective DCNN with weight pruning. Starting from the 16-bit fixed point data used by the Eyeriss architecture (chen2016eyeriss, ), the change in classification accuracy as the bit width is decreased was examined. The results confirm that even when quantization is performed down to 12-bit fixed point, there is no significant loss in classification accuracy. Data access energy values were also evaluated for these cases; Yang et al. (cvpr_2017_yang_energy, ) showed how data access energy changes with bit width. The results for data access energy are shown in Figure 4. As can be seen in Figure 4, applying the selective DCNN with weight pruning and quantization to distorted image classification achieves similar or better classification accuracy while using only about 5.6% of the data access energy of the baseline architecture.

Figure 4: Data access energy with weight pruning and quantization.
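A post-training fixed-point quantization step in the spirit of (lin2016fixed, ) can be sketched as follows; the split between integer and fractional bits is fixed here for simplicity, whereas in practice it would be chosen per layer from the weight range.

```python
import torch

def quantize_fixed_point(weights, total_bits=12, frac_bits=10):
    """Round a float weight tensor onto a signed fixed-point grid and return
    the de-quantized values used for inference."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = torch.clamp(torch.round(weights * scale), qmin, qmax)
    return q / scale
```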

6 Future work

In this paper, energy estimation based on the Eyeriss architecture (chen2016eyeriss, ) was carried out. However, in a mobile or embedded environment using a selective DCNN, it is reasonable to assume that input images are processed one at a time rather than in batches. This means that the Eyeriss architecture (chen2016eyeriss, ), which exploits input image reuse, may not be ideally suited for the proposed selective DCNN. Therefore, an accelerator tailored to the proposed selective DCNN needs to be designed; this will be addressed in future work.

7 Conclusion

In this paper, a low cost distorted image classification method using a selective DCNN is proposed for mobile or embedded systems. Using the proposed selective DCNN, the classification accuracy remains at about the same level on distorted image datasets, which may contain Gaussian noise, Gaussian blurring, or occlusion, while the cost (the number of weights, the number of MAC operations, and the data access energy) is significantly lower than that of existing state-of-the-art DCNNs. In addition, by using well known weight pruning and quantization techniques together with the proposed selective DCNN, the data access energy per input image can be reduced by up to about 18 times. The proposed selective DCNN showed better performance in almost all cases even when the distortion degree or the ratio of the clean dataset to the distorted dataset was changed. The proposed selective DCNN can be used in mobile or embedded environments, where low cost computation is essential and distorted images may need to be processed.

References

  • [1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS), pages 1097–1105, 2012.
  • [2] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [3] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • [4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE conference on computer vision and pattern recognition (CVPR), pages 770–778, 2016.
  • [5] Gao Huang, Zhuang Liu, Kilian Q Weinberger, and Laurens van der Maaten. Densely connected convolutional networks. In IEEE conference on computer vision and pattern recognition (CVPR), 2017.
  • [6] Samuel Dodge and Lina Karam. Understanding how image quality affects deep neural networks. In International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6, 2016.
  • [7] Yiren Zhou, Sibo Song, and Ngai-Man Cheung. On classification of distorted images with deep convolutional neural networks. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1213–1217, 2017.
  • [8] Steven Diamond, Vincent Sitzmann, Stephen Boyd, Gordon Wetzstein, and Felix Heide. Dirty pixels: Optimizing image classification architectures for raw sensor data. arXiv preprint arXiv:1701.06487, 2017.
  • [9] Yann LeCun, John S Denker, and Sara A Solla. Optimal brain damage. In Advances in neural information processing systems (NIPS), pages 598–605, 1990.
  • [10] Babak Hassibi and David G Stork. Second order derivatives for network pruning: Optimal brain surgeon. In Advances in neural information processing systems (NIPS), pages 164–171, 1993.
  • [11] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems (NIPS), pages 1135–1143, 2015.
  • [12] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in neural information processing systems (NIPS), pages 3123–3131, 2015.
  • [13] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Advances in neural information processing systems (NIPS), pages 4107–4115, 2016.
  • [14] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep learning with limited numerical precision. In International Conference on Machine Learning (ICML), pages 1737–1746, 2015.
  • [15] Darryl Lin, Sachin Talathi, and Sreekanth Annapureddy. Fixed point quantization of deep convolutional networks. In International Conference on Machine Learning (ICML), pages 2849–2858, 2016.
  • [16] Ruizhou Ding, Zeye Liu, RD Blanton, and Diana Marculescu. Quantized deep neural networks for energy efficient hardware-based inference. In Asia and South Pacific Design Automation Conference (ASP-DAC), pages 1–8, 2018.
  • [17] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • [18] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
  • [19] Yu-Hsin Chen, Joel Emer, and Vivienne Sze. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 367–379, 2016.
  • [20] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.
  • [21] Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. Designing energy-efficient convolutional neural networks using energy-aware pruning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.