SGAD: Soft-Guided Adaptively-Dropped Neural Network


Zhisheng Wang, Fangxuan Sun, Jun Lin, Zhongfeng Wang and Bo Yuan
School of Electronic Science and Engineering, Nanjing University, P.R. China
Department of Electrical Engineering, City University of New York, City College
{zswang, fxsun}@smail.nju.edu.cn
{jlin, zfwang}@nju.edu.cn, byuan@ccny.cuny.edu
Abstract

Deep neural networks (DNNs) have been shown to contain considerable redundancy, and many efforts have been made to compress them. However, existing model compression methods treat all input samples equally, ignoring the fact that different input samples differ in how difficult they are to classify correctly. To address this problem, DNNs with an adaptive dropping mechanism are explored in this work. To inform the DNN how difficult an input sample is to classify, a guideline that carries information about the input samples is introduced to improve performance. Based on the developed guideline and the adaptive dropping mechanism, an innovative soft-guided adaptively-dropped (SGAD) neural network is proposed in this paper. Compared with the 32-layer residual network, the presented SGAD can substantially reduce the FLOPs with only a minor drop in accuracy on CIFAR-10.

Preprint. Work in progress.
Authors contributed equally.

1 Introduction

Deep neural networks (DNNs) have achieved state-of-the-art accuracy and gained wide adoption in various artificial intelligence (AI) fields, such as computer vision, speech recognition and natural language processing He et al. (2016); Krizhevsky et al. (2012); Simonyan and Zisserman (2014); Amodei et al. (2015). However, the remarkable accuracy of DNNs comes at the expense of huge computational cost, which has already posed severe challenges to existing DNN computing hardware in terms of processing time and power consumption. Even worse, it is widely acknowledged that the computational cost of modern DNNs will continue to increase rapidly due to the ever-growing demand for improved accuracy in AI applications. Considering the limited progress of hardware technology, the huge computational cost of DNNs, if not properly addressed, would largely prevent the large-scale deployment of DNNs on various resource-constrained platforms, such as mobile devices and Internet-of-Things (IoT) equipment.

To address this challenge, several computation-reducing approaches have been proposed in Wen et al. (2016, 2017); Han et al. (2016); Sun et al. (2016); Garipov et al. (2016). To date, most of the existing works focus on modifying popular DNN architectures via different techniques (such as pruning and decomposition). In these model-pruning/decomposition works, all input samples are treated equally and processed by all layers of the DNN. Considering that shallow models with relatively poor model capacity can also correctly classify some input samples, different samples in the same dataset exhibit different levels of difficulty in being accurately classified. By leveraging this characteristic, an input-specific adaptive computation approach can be exploited to avoid unnecessary computation.

A natural way to skip a layer is to add a bypass that directly outputs its input. Among various DNNs, residual networks He et al. (2016) (ResNets) exhibit a unique architecture which is friendly to such an adaptive computation approach. Hence, this paper focuses on the adaptive computation of ResNets. There are two more reasons for choosing ResNets: 1) ResNet is currently the most popular and widely deployed DNN architecture, especially in the computer vision field; 2) previous work Veit et al. (2016) showed that ResNets can be seen as ensembles of many shallow blocks with weak dependencies, which can be exploited for adaptive computation.

Figure 1: Overview of the adaptive dropping mechanism. Blocks denote residual blocks that consist of several convolutional layers. (a) ResNet without any modifications. (b) ResNet with the adaptive dropping mechanism. The dropped blocks are denoted by the striated blocks.

In this paper, we propose a novel end-to-end trainable soft-guided adaptively-dropped neural network (SGAD) to reduce input-specific redundant computation while retaining high accuracy. As shown in Fig. 1, all blocks in the original ResNet are always busy. In SGAD, however, each block can be adaptively dropped according to the input sample. To smartly and efficiently decide which blocks should be dropped, a soft guideline is developed to generate a group of discrete masks. Experimental results show that the proposed SGAD can reduce floating-point operations (FLOPs) with only a minor accuracy loss compared with ResNet-32 on CIFAR-10. On CIFAR-100, SGAD can improve the accuracy with fewer FLOPs compared with ResNet-32. The contributions of this paper are summarized as follows:

  • A novel soft information-based guideline is proposed to quantify how difficult each input sample is to classify correctly. This guideline is then used to direct the expected drop ratio of residual blocks during training via an efficient mapping strategy. At the inference stage, the guideline can be removed without incurring additional overhead.

  • We introduce a small but efficient model with binary outputs, which determines the positions of the layers to be skipped for the current input sample under the direction of the proposed guideline. The straight-through estimator (STE) Bengio et al. (2013) is introduced to approximate the gradient of the non-differentiable rounding function during the training phase.

  • The learned dropping behavior of SGAD is explored. Our experiments show that layers of the original network (e.g., ResNet-32) that contribute less to the model capacity are likely to be dropped in the SGAD-based model (e.g., SGAD-32).

2 Related Works

The proposed SGAD is motivated by recent studies exploring the behavior of residual networks. Veit et al. found that residual networks can be seen as ensembles of many weakly-dependent paths with varying lengths, where only the short paths are needed during training Veit et al. (2016). Besides, removing individual layers from a trained residual network at test time only leads to misclassification of a few borderline samples with minor accuracy drop Greff et al. (2016). These observations indicate that most input samples may be easily classified with a limited number of layers, so we can adaptively allocate different computation budgets between "easy" and "hard" samples. Several approaches have been proposed based on this concept.

The early-termination approaches Bolukbasi et al. (2017); Teerapittayanon et al. (2016); Panda et al. (2016) add additional side-branch classifiers inside a deep neural network. Input samples that are judged as classifiable by a certain side-branch classifier can then exit the network immediately without executing the whole model. In contrast, the proposed SGAD enables adaptive computation by utilizing the ensemble nature of residual networks. Neither hand-crafted network architectures nor extra side-branch classifiers are needed, thereby making our approach simpler and more effective.

The adaptive computation approaches are closely related to our work. Spatially Adaptive Computation Time (SACT) Figurnov et al. (2016) dynamically decides the number of executed layers inside a set of residual units. SkipNet Wang et al. (2017) and BlockDrop Wu et al. (2018) utilize reinforcement learning to dynamically choose the executed residual units in a pretrained ResNet for different input samples. Adanets Andreas and Serge (2017) enable adaptive computation graphs by adding layer-wise gating functions that decide whether to skip the computation of a certain layer. Different from these approaches, the proposed SGAD uses a shallow network, whose behavior is guided by an extra guideline during training, to generate binary vectors that adaptively mask the unused residual units. Compared to the above-mentioned approaches, SGAD achieves higher savings in computational cost with no accuracy loss in most cases.

3 Soft-Guided Adaptively-Dropped Approach

In this section, we present the soft-guided adaptively-dropped neural network. First, we introduce a binary mask network (BMNet) to decide which blocks should be used for a specific input. The size of the BMNet is quite small, so it introduces very little computation overhead. Then, in order to solve the non-differentiability problem incurred by using discrete binary masks in the training phase, the straight-through estimator (STE) Bengio et al. (2013) is introduced to approximate the gradient of the non-differentiable rounding function during back-propagation. Finally, we propose a soft guideline network (SGNet) to improve the overall classification accuracy. The SGNet extracts soft information from different inputs during the training phase, thereby aiding the training of the BMNet through a regularization term that forces the BMNet to drop blocks dynamically. At the inference phase, the regularization term is no longer used, so the SGNet can be removed.

Figure 2: Architecture of the proposed SGAD. (a) Overview of the presented algorithm. The activations after the first single convolutional layer of the ResNet are denoted by green blocks. The red dotted lines indicate that the SGNet is active only in the training phase. (b) Architecture of the BMNet. Binary rounding and a sinc function-based estimator are used in the forward and backward propagation phases, respectively. (c) Architecture of the SGNet. The block named Conv layers denotes a group of convolutional layers that can be flexibly adjusted.

3.1 Binary Mask

Generally, the main part of a ResNet consists of several blocks. Let $x_l$ and $x_{l+1}$ be the input and output of the $l$-th block, respectively. The computation of $x_{l+1}$ is shown below:

$x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l)$   (1)

where the details of $\mathcal{F}(\cdot)$ can be found in He et al. (2016). Besides the blocks, a ResNet usually starts with a single convolutional layer and ends with a fully-connected layer.

Binary Mask: As indicated in the first paragraph of this section, we introduce a binary mask to determine whether each block should be skipped in the inference of a specific input sample. Specifically, for the $l$-th block, its output determined by the binary mask is as follows:

$x_{l+1} = \begin{cases} x_l + \mathcal{F}(x_l, \mathcal{W}_l), & h_l(x_0; \mathcal{W}_h) = 1 \\ x_l, & h_l(x_0; \mathcal{W}_h) = 0 \end{cases}$   (2)

where $x_0$ denotes the input of the first block, namely the output of the single convolutional layer of the ResNet, and $h_l(\cdot; \mathcal{W}_h)$ is a hypothesis with weights $\mathcal{W}_h$ which decides whether or not this block should be dropped. To simplify the derivation of the gradient, Eq. (2) can be rewritten in the following form:

$x_{l+1} = x_l + h_l(x_0; \mathcal{W}_h)\, \mathcal{F}(x_l, \mathcal{W}_l)$   (3)
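To make Eqs. (2)-(3) concrete, the following minimal PyTorch sketch gates the residual branch of a block with a per-sample binary mask. The `MaskedResidualBlock` here is a simplified stand-in of our own, not the exact block implementation used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedResidualBlock(nn.Module):
    """Residual block whose branch F(x) is gated by a per-sample binary mask (Eq. (3))."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x, mask):
        # mask: shape (N,), entries in {0, 1}; 0 means this block is dropped for that sample.
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Broadcast the per-sample mask over channel and spatial dimensions.
        out = mask.view(-1, 1, 1, 1) * out
        return F.relu(x + out)


# Usage: drop the block for the first sample, keep it for the second.
block = MaskedResidualBlock(16)
x = torch.randn(2, 16, 32, 32)
mask = torch.tensor([0.0, 1.0])
y = block(x, mask)
```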

Assume that a batch of data contains $N$ pairs $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$ during the training phase. The weights of the $l$-th block of the ResNet are denoted by $\mathcal{W}_l$. The update of $\mathcal{W}_l$ at the $(t+1)$-th iteration can be written as below:

$\mathcal{W}_l^{t+1} = \mathcal{W}_l^{t} - \eta\, \frac{\partial \mathcal{L}^{t}}{\partial \mathcal{W}_l^{t}}$   (4)

where $\eta$ and $\mathcal{L}^{t}$ denote the learning rate used in the training phase and the training loss at the $t$-th iteration, respectively.

For the original ResNet, the gradient can be represented as follows:

$\frac{\partial \mathcal{L}}{\partial \mathcal{W}_l} = \frac{\partial \mathcal{L}}{\partial x_L} \prod_{i=l+1}^{L-1} \left( 1 + \frac{\partial \mathcal{F}(x_i, \mathcal{W}_i)}{\partial x_i} \right) \frac{\partial x_{l+1}}{\partial \mathcal{W}_l}$   (5)

where $L$ denotes the number of blocks in the ResNet, $\frac{\partial \mathcal{F}}{\partial x}$ denotes the differential of $\mathcal{F}$ with respect to $x$, and $\frac{\partial x_{l+1}}{\partial \mathcal{W}_l}$ denotes the differential of $x_{l+1}$ with respect to $\mathcal{W}_l$. Taking the binary mask into consideration, the gradient becomes:

$\frac{\partial \mathcal{L}}{\partial \mathcal{W}_l} = \frac{\partial \mathcal{L}}{\partial x_L} \prod_{i=l+1}^{L-1} \left( 1 + h_i(x_0; \mathcal{W}_h)\, \frac{\partial \mathcal{F}(x_i, \mathcal{W}_i)}{\partial x_i} \right) h_l(x_0; \mathcal{W}_h)\, \frac{\partial \mathcal{F}(x_l, \mathcal{W}_l)}{\partial \mathcal{W}_l}$   (6)

Generally, the gradients calculated in the training phase are much smaller than 1 in magnitude (see Section 4.1). Hence, the term $\left( 1 + \frac{\partial \mathcal{F}(x_i, \mathcal{W}_i)}{\partial x_i} \right)$ in Eq. (5) can be approximated as 1, and Eq. (5) can thus be simplified to:

$\frac{\partial \mathcal{L}}{\partial \mathcal{W}_l} \approx \frac{\partial \mathcal{L}}{\partial x_L}\, \frac{\partial \mathcal{F}(x_l, \mathcal{W}_l)}{\partial \mathcal{W}_l}$   (7)

Let $p_l$ be the ratio of samples in a batch for which the $l$-th block is not dropped. Combining Eqs. (5)-(7) and the definition of $p_l$, the update of the weights of the ResNet with binary mask can be approximated as follows:

$\mathcal{W}_l^{t+1} \approx \mathcal{W}_l^{t} - \eta\, p_l\, \frac{\partial \mathcal{L}^{t}}{\partial \mathcal{W}_l^{t}}$   (8)

where $\eta_l = \eta\, p_l$ is the actual learning rate. Since $p_l$ differs across blocks, each block has a unique learning rate. Hence, the proposed binary mask can adaptively adjust the learning rate of different blocks according to their level of contribution to the model capacity (LCMC) Veit et al. (2016). To explore which blocks contribute more to the model capacity, we study the dropping behavior of SGAD in Section 4.1.
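As a rough illustration of Eq. (8), the keep ratio of a block can be estimated from the masks produced for one batch, and it scales that block's effective learning rate; the numbers below are made up for illustration.

```python
import torch

eta = 0.1                                                 # base learning rate
masks = torch.tensor([1., 0., 1., 1., 0., 1., 1., 1.])    # decisions for one block over a batch

p_l = masks.mean().item()                                 # keep ratio of the l-th block (0.75 here)
eta_l = eta * p_l                                         # approximate effective learning rate, Eq. (8)
print(f"p_l = {p_l:.2f}, effective lr = {eta_l:.3f}")
```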

The design of the binary mask network, called BMNet, is shown in Fig. 2 (b). Note that small perturbations can result in quite different binary masks if the output of the sigmoid unit is near the rounding threshold (0.5), thereby making the BMNet unstable. Inspired by Salakhutdinov and Hinton (2009), additive noise is injected before the sigmoid unit. The magnitude of the noise is increased over time so that the magnitude of the inputs is also increased to alleviate the impact of the noise. With this method, the sigmoid unit can be trained to saturate at nearly 0 or 1 for all input samples. Hence, more stable and confident decisions are generated during both the training and the inference phases.
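A minimal sketch of a BMNet-style gating head is given below, assuming a small convolutional trunk of our own choosing (the paper's exact BMNet architecture is not restated here). Logits pass through a sigmoid with additive noise whose magnitude can be grown over training; the rounding step uses the plain identity straight-through estimator as a simplification, whereas Fig. 2 mentions a sinc function-based estimator for the backward pass.

```python
import torch
import torch.nn as nn


class RoundSTE(torch.autograd.Function):
    """Binary rounding in the forward pass, straight-through gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity (straight-through) approximation of the non-differentiable rounding.
        return grad_output


class BMNet(nn.Module):
    """Tiny gating network producing one binary decision per maskable residual block."""

    def __init__(self, in_channels, num_blocks):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, num_blocks)

    def forward(self, x, noise_scale=0.0):
        logits = self.fc(self.features(x).flatten(1))
        if self.training and noise_scale > 0:
            # Additive noise before the sigmoid; its scale is increased over training
            # so the sigmoid is pushed towards saturation (confident 0/1 decisions).
            logits = logits + noise_scale * torch.randn_like(logits)
        probs = torch.sigmoid(logits)
        return RoundSTE.apply(probs)        # (N, num_blocks) binary mask


# Usage: masks for 14 maskable blocks (the last block of SGAD is always kept),
# computed from the activations after the first convolutional layer.
bm = BMNet(in_channels=16, num_blocks=14)
acts = torch.randn(4, 16, 32, 32)
masks = bm(acts, noise_scale=0.5)
```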

3.2 Soft Guideline

With the proposed BMNet, an adaptively-dropped ResNet can be realized. However, how the BMNet decides the dropping ratio remains unknown. Considering that our goal is to make the network adaptively adjust its computational complexity according to the classification difficulty of input samples, information on whether an input sample is easy to classify should be generated and sent to the BMNet to improve the correctness of its decisions. Based on this concept, an additional network, called the soft guideline network (SGNet), is proposed to produce the required information and guide the dropping behavior of the BMNet.

Soft Guideline: Generally, each input sample is coupled with a hard target which only contains the information of the ground-truth class. Whether an input sample is easy to classify cannot be inferred from the hard target. Inspired by Hinton et al. (2015), the soft target, namely the class probabilities produced by the softmax layer, can provide much more information than the hard target. In this paper, the soft target of the SGNet is used to obtain information that indicates the classification difficulty. More specifically, the variance of the soft target is used as the guideline. For input sample $x^{(i)}$, the corresponding variance $v^{(i)}$ can be written as follows:

$v^{(i)} = \frac{1}{C} \sum_{j=1}^{C} \left( s_j^{(i)} - \frac{1}{C} \right)^2$   (9)

where $C$ is the number of classes and $s_j^{(i)}$ are the elements of the softmax output for $x^{(i)}$ (their mean is $1/C$ since they sum to one). Intuitively, a smaller value of $v^{(i)}$ indicates that the SGNet is less confident about its classification result; thus, $x^{(i)}$ tends to be harder to classify correctly.
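A short sketch of the guideline in Eq. (9), computed directly as the per-sample variance of the softmax output:

```python
import torch

def soft_guideline(logits):
    """Per-sample variance of the softmax output (Eq. (9)); larger means more confident."""
    probs = torch.softmax(logits, dim=1)          # (N, C)
    return probs.var(dim=1, unbiased=False)       # (N,)

# A confident prediction yields a larger variance than a near-uniform one.
confident = torch.tensor([[8.0, 0.0, 0.0]])
uncertain = torch.tensor([[0.1, 0.0, 0.1]])
print(soft_guideline(confident), soft_guideline(uncertain))
```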

In order to make the BMNet learn to adaptively drop more (fewer) residual blocks for easily (hardly) classified input samples, the guideline $v^{(i)}$ is first transformed into an expected drop ratio $r_e^{(i)}$. Easily classified samples have a higher $r_e^{(i)}$. Then, the L1-norm between $r_e^{(i)}$ and the measured drop ratio over all input samples in a batch, denoted as $\mathcal{L}_{reg}$, is added to the loss function as a regularizer to push the BMNet to allocate the desired drop ratio for different input samples, where

$\mathcal{L}_{reg} = \frac{1}{N} \sum_{i=1}^{N} \left| r_e^{(i)} - r_m^{(i)} \right|$   (10)

where $r_m^{(i)}$ is the measured drop ratio computed by the BMNet. The application of this regularizer pushes the actual drop ratio and the desired drop ratio closer.
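A sketch of the regularizer in Eq. (10), under the assumption that the measured drop ratio of a sample is simply the fraction of dropped blocks in its mask:

```python
import torch

def drop_ratio_regularizer(masks, expected_drop_ratio):
    """L1 distance between expected and measured drop ratios, averaged over the batch (Eq. (10)).

    masks: (N, num_blocks) binary mask produced by the BMNet (1 = keep, 0 = drop).
    expected_drop_ratio: (N,) target drop ratio derived from the soft guideline.
    """
    measured_drop_ratio = 1.0 - masks.mean(dim=1)   # fraction of dropped blocks per sample
    return (expected_drop_ratio - measured_drop_ratio).abs().mean()
```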

Based on the above discussion, a proper transformation is needed to map $v^{(i)}$ to $r_e^{(i)}$. The details of this transformation are given in the following part.

Mapping Strategy: One simple intuition is to map a larger $v^{(i)}$ to a larger $r_e^{(i)}$, since input samples judged as easy to classify by the SGNet are expected to bypass more blocks. Generally, a relatively shallow network can correctly classify a large proportion of input samples, indicating that most input samples are "easy" and only a few are hard to classify correctly. Based on this observation, an exponent function-based mapping strategy is proposed and can be expressed as follows:

(11)

where $r_{max}$ denotes the allowed maximum drop ratio and the exponent transforms $v^{(i)}$ into the level of difficulty. Considering the constraint on $r_{max}$ (to avoid a model with too little complexity) and the range of $v^{(i)}$, the resulting $r_e^{(i)}$ is bounded. The proposed mapping strategy tends to map most values of $v^{(i)}$ to a large $r_e^{(i)}$, which is consistent with the distribution of "easy" and "hard" samples discussed above.
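Since the exact form of Eq. (11) is not reproduced above, the sketch below only illustrates the idea with a hypothetical exponential mapping: samples with a large guideline variance are pushed towards the maximum allowed drop ratio. Both the functional form and the parameter `alpha` are assumptions for illustration, not values from the paper.

```python
import torch

def expected_drop_ratio(v, r_max=0.8, alpha=5.0):
    """Hypothetical exponent-based mapping from guideline variance to expected drop ratio.

    v: (N,) per-sample variance of the SGNet softmax output.
    r_max: allowed maximum drop ratio (kept below 1 to preserve enough model capacity).
    alpha: illustrative sharpness parameter, not taken from the paper.
    """
    v_max = v.max().clamp(min=1e-8)
    # Samples with v close to v_max ("easy" samples) get a drop ratio close to r_max.
    return r_max * torch.exp(-alpha * (1.0 - v / v_max))
```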

At the training phase, SGAD, which includes the BMNet, the SGNet and the ResNet, can be trained end-to-end from scratch. As shown in Fig. 2, input samples are fed to the ResNet and the SGNet simultaneously. The SGNet outputs its own classification results as well as the guideline. The BMNet takes the output of the first layer of the ResNet to produce the binary mask. The ResNet learns to adaptively drop the remaining residual blocks based on the binary mask and also produces its own classification results. Then, all the weights in SGAD are updated based on the regularization loss $\mathcal{L}_{reg}$, the classification loss of the SGNet $\mathcal{L}_{SG}$ and that of the ResNet $\mathcal{L}_{res}$. The final loss function of SGAD can be expressed as:

$\mathcal{L} = \alpha\, \mathcal{L}_{res} + \beta\, \mathcal{L}_{reg} + \gamma\, \mathcal{L}_{SG}$   (12)

where $\alpha$, $\beta$ and $\gamma$ denote the weighting factors for the ResNet, the BMNet and the SGNet, respectively. During inference, the regularization term is no longer needed. Thus, the SGNet can be removed, and only the BMNet and the ResNet are needed after training.
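Putting the pieces together, a sketch of the training-time loss of Eq. (12), using cross-entropy losses for the ResNet and the SGNet and the drop-ratio regularizer for the BMNet (the regularizer function is the one sketched earlier; the weighting factors here are placeholders):

```python
import torch.nn.functional as F

def sgad_loss(resnet_logits, sgnet_logits, targets,
              masks, expected_ratio,
              alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the ResNet loss, the BMNet regularizer and the SGNet loss (Eq. (12)).

    The weighting factors are placeholders; the paper's defaults are 1.0, 1.0 and 0.3,
    but their exact assignment to alpha/beta/gamma is not restated here.
    """
    loss_res = F.cross_entropy(resnet_logits, targets)
    loss_reg = drop_ratio_regularizer(masks, expected_ratio)   # defined in the earlier sketch
    loss_sg = F.cross_entropy(sgnet_logits, targets)
    return alpha * loss_res + beta * loss_reg + gamma * loss_sg
```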

4 Experiments

We evaluate the performance of the proposed SGAD on two datasets: CIFAR-10 and CIFAR-100. The influence of the guideline is investigated. In addition, we also explore different blocks' contributions to the model capacity and the dropping behavior of SGAD.

Model Size: Both ResNet-32 and ResNet-110 are adopted as baselines in our experiments. The details of their structures can be found in He et al. (2016). The design of the BMNet is crucial to the overall complexity of SGAD. On CIFAR-10, the use of BMNets introduces only small computation overheads compared with ResNet-32 and ResNet-110, and the corresponding memory overheads are likewise small. Hence, using the proposed BMNet renders only very minor overheads.

Training Details: PyTorch is used to implement SGAD. Stochastic gradient descent with momentum 0.9 is used as the optimizer. The learning rate is initialized at 0.1 and decayed after the 128th, 160th and 192nd epochs. SGAD is trained for 220 epochs with a batch size of 128. The weighting factors are set to 1.0, 1.0 and 0.3 by default. In our experiments, adjusting only one of these hyper-parameters while leaving the others at their defaults is sufficient to control the dropping behavior and works in most cases. The last block in SGAD is always executed for all inputs in order to ensure more robust output predictions.
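For reference, the optimizer and schedule described above can be set up as follows; the decay factor is not stated in the text, so the 0.1 used here is an assumption, and `sgad_model` is a placeholder for the combined SGAD network.

```python
import torch
import torch.nn as nn

sgad_model = nn.Linear(10, 10)   # placeholder for the full ResNet + BMNet + SGNet model

optimizer = torch.optim.SGD(sgad_model.parameters(), lr=0.1, momentum=0.9)
# Learning-rate decay after epochs 128, 160 and 192; the decay factor 0.1 is an assumption,
# since the text does not state it explicitly.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[128, 160, 192], gamma=0.1)

for epoch in range(220):          # 220 epochs; CIFAR data loaders would use a batch size of 128
    # ... one training epoch over the CIFAR training set goes here ...
    scheduler.step()
```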

4.1 Comparisons and Discussion

We train the SGAD model under two typical settings: 1) a setting that drops fewer blocks, resulting in a model (MF-SGAD) with more FLOPs; 2) a setting that drops more blocks, which produces a model (LF-SGAD) with fewer FLOPs. For the latter case, we fine-tune the model from a pre-trained MF-SGAD to obtain faster convergence instead of training from random initialization. The performances of MF-SGAD and LF-SGAD are shown in Table 1. For comparison, we also provide the training results of the original ResNets. In most cases, under the first setting, SGAD achieves comparable or even better accuracy with fewer FLOPs than the original ResNets, which indicates the effectiveness of the proposed SGAD. A more aggressive reduction in FLOPs can also be obtained under the second setting at the cost of a small accuracy loss. For example, the FLOPs can be reduced by 77% with only a 0.87% loss in accuracy (CIFAR-10, 110 layers).

Dataset     Layers   ResNet accuracy   MF-SGAD accuracy   MF-SGAD n-FLOPs   LF-SGAD accuracy   LF-SGAD n-FLOPs
CIFAR-10      32          93.02             93.11              0.86              92.18              0.47
CIFAR-10     110          94.57             94.20              0.86              93.70              0.23
CIFAR-100     32          70.38             70.85              0.77              70.09              0.71
CIFAR-100    110          73.94             73.94              0.94              73.94              0.75
Table 1: Results of SGADs and ResNets (accuracy in %). For the original ResNets, n-FLOPs = 1.0.

Comparisons with Existing Works: In this subsection, we compare the proposed SGAD with previous works. The results are shown in Fig. 3, which contains the results of SACT Figurnov et al. (2016), ACT Figurnov et al. (2016), SkipNet Wang et al. (2017), and BlockDrop Wu et al. (2018). The proposed SGAD outperforms the existing networks in most cases.

Figure 3: Comparisons with the state-of-the-art. (a) Results on CIFAR-10. (b) Results on CIFAR-100. The solid lines and dashed lines have different baselines.
Footnote: SACT code available at https://github.com/mfigurnov/sact

Since results on CIFAR are not reported for ACT and SACT, we conduct the experiments using the code provided by the authors of SACT. Compared with SACT, the FLOPs of SGAD can be reduced with even higher accuracy on CIFAR-10. On CIFAR-100, the accuracy can be enhanced with fewer FLOPs. The proposed SGAD also outperforms other algorithms such as ACT and SkipNet. Compared with BlockDrop, which currently achieves state-of-the-art results, SGAD can also improve the accuracy with less computational complexity on CIFAR-10.

Figure 4: Comparison of the magnitude of gradients and the normalized FLOPs of each block. n-FLOPs-MF-SGAD and n-FLOPs-LF-SGAD denote the n-FLOPs of MF-SGAD and LF-SGAD, respectively. The magnitudes of gradients are given by the mean L1 norm of the gradients after a given epoch.

Discussion: The dropping behavior of SGAD is explored here. The experiments are conducted using ResNet-32 and SGAD-32 on CIFAR-10. Fig. 4 compares the magnitude of gradients and the normalized FLOPs of each block. Since the last block is always executed in SGAD, its n-FLOPs is always 1 and is not listed. It is worth noting that in ResNet-32, every 5 blocks share the same number of output channels; a C-block is used here to denote a cluster of 5 such blocks. From Fig. 4 we can observe the following:

  1. In the original ResNets, different blocks usually have different magnitudes of gradients (MGs). In each C-block, the MGs decrease gradually from the first block to the fifth block. This phenomenon shows that the first several blocks in a C-block have a relatively higher LCMC than the others. This observation is consistent with the reports of previous works Veit et al. (2016); Jastrzebski et al. (2017).

  2. According to our experiments, in each C-block, the dropping behavior is closely related to the MGs. Blocks with higher MGs usually have higher n-FLOPs. Combining the analysis in Section 3.1 and the experimental results, blocks with smaller MGs tend to be skipped; thus, the updates of these blocks are further reduced in SGAD. To reduce the FLOPs while maintaining the performance, SGAD tries to keep the blocks with higher LCMC.

  3. As shown in Fig. 4, some blocks are skipped by all the input samples after training, which leads to zero n-FLOPs for these blocks. As an additional benefit, these dead blocks can be removed during inference to reduce the memory storage requirement (see the sketch below).
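A short sketch of how such dead blocks could be identified from masks collected over a validation set (a block whose mask is zero for every sample can be removed at inference); the data here is random and purely illustrative.

```python
import torch

# all_masks: (num_samples, num_blocks) binary masks collected over a validation set.
all_masks = torch.randint(0, 2, (10000, 14)).float()     # illustrative random data

per_block_usage = all_masks.mean(dim=0)                   # execution ratio (n-FLOPs) of each maskable block
dead_blocks = (per_block_usage == 0).nonzero().flatten()  # blocks never executed -> removable
print("per-block usage:", per_block_usage)
print("removable blocks:", dead_blocks.tolist())
```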

5 Conclusion and Future Work

SGAD is proposed to exploit an adaptive processing pattern for different input samples. To enable the propagation of gradients, STE is introduced to approximate the non-differentiable rounding function during the training phase. The information contained in the softmax layer is exploited to inform SGAD of how difficult each input sample is to classify correctly. In addition, a dedicated mapping strategy is introduced to connect the difficulty with the dropping ratio. The experiments demonstrate that the proposed SGAD outperforms previous works under the same baselines. Since the reduction in FLOPs may not accurately reflect the real running latency on different hardware devices (e.g., CPUs, GPUs), real speedup measurements will be conducted in future work.

References

  • Amodei et al. [2015] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, et al. Deep speech 2: End-to-end speech recognition in english and mandarin. arXiv preprint arXiv:1512.02595, 2015.
  • Andreas and Serge [2017] V. Andreas and B. Serge. Convolutional networks with adaptive computation graphs. arXiv preprint arXiv:1711.11503, 2017.
  • Bengio et al. [2013] Y. Bengio, N. Léonard, and A. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
  • Bolukbasi et al. [2017] T. Bolukbasi, J. Wang, O. Dekel, and V. Saligrama. Adaptive neural networks for fast test-time prediction. arXiv preprint arXiv:1702.07811, 2017.
  • Figurnov et al. [2016] M. Figurnov, M. D. Collins, Y. Zhu, L. Zhang, J. Huang, D. P. Vetrov, and R. Salakhutdinov. Spatially adaptive computation time for residual networks. arXiv preprint arXiv:1612.02297, 2016.
  • Garipov et al. [2016] T. Garipov, D. Podoprikhin, A. Novikov, and D. Vetrov. Ultimate tensorization: compressing convolutional and fc layers alike. arXiv preprint arXiv:1611.03214, 2016.
  • Greff et al. [2016] K. Greff, R. K. Srivastava, and J. Schmidhuber. Highway and residual networks learn unrolled iterative estimation. arXiv preprint arXiv:1612.07771, 2016.
  • Han et al. [2016] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations (ICLR), 2016.
  • He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • Hinton et al. [2015] G. E. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • Jastrzebski et al. [2017] S. Jastrzebski, D. Arpit, N. Ballas, V. Verma, T. Che, and Y. Bengio. Residual connections encourage iterative inference. arXiv preprint arXiv:1710.04773, 2017.
  • Krizhevsky et al. [2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • Panda et al. [2016] P. Panda, A. Sengupta, and K. Roy. Conditional deep learning for energy-efficient and enhanced pattern recognition. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016, pages 475–480. IEEE, 2016.
  • Salakhutdinov and Hinton [2009] R. Salakhutdinov and G. Hinton. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969–978, 2009.
  • Simonyan and Zisserman [2014] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • Sun et al. [2016] F. Sun, J. Lin, and Z. Wang. Intra-layer nonuniform quantization of convolutional neural network. In Wireless Communications & Signal Processing (WCSP), 2016 8th International Conference on, pages 1–5. IEEE, 2016.
  • Teerapittayanon et al. [2016] S. Teerapittayanon, B. McDanel, and H. Kung. Branchynet: Fast inference via early exiting from deep neural networks. In Pattern Recognition (ICPR), 2016 23rd International Conference on, pages 2464–2469. IEEE, 2016.
  • Veit et al. [2016] A. Veit, M. J. Wilber, and S. J. Belongie. Residual networks behave like ensembles of relatively shallow networks. In Advances in Neural Information Processing Systems, pages 550–558, 2016.
  • Wang et al. [2017] X. Wang, F. Yu, Z. Dou, and J. E. Gonzalez. Skipnet: Learning dynamic routing in convolutional networks. arXiv preprint arXiv:1711.09485, 2017.
  • Wen et al. [2016] W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems, pages 2074–2082, 2016.
  • Wen et al. [2017] W. Wen, C. Xu, C. Wu, Y. Wang, Y. Chen, and H. Li. Coordinating filters for faster deep neural networks. arXiv preprint arXiv:1703.09746, 2017.
  • Wu et al. [2018] Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis, K. Grauman, and R. Feris. Blockdrop: Dynamic inference paths in residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8817–8826, 2018.