Understanding and Controlling Memory in Recurrent Neural Networks


Doron Haviv    Alexander Rivkind    Omri Barak

Abstract

To be effective in sequential data processing, Recurrent Neural Networks (RNNs) are required to keep track of past events by creating memories. While the relation between memories and the network’s hidden state dynamics was established over the last decade, previous works in this direction were of a predominantly descriptive nature focusing mainly on locating the dynamical objects of interest. In particular, it remained unclear how dynamical observables affect the performance, how they form and whether they can be manipulated. Here, we utilize different training protocols, datasets and architectures to obtain a range of networks solving a delayed classification task with similar performance, alongside substantial differences in their ability to extrapolate for longer delays. We analyze the dynamics of the network’s hidden state, and uncover the reasons for this difference. Each memory is found to be associated with a nearly steady state of the dynamics which we refer to as a ’slow point’. Slow point speeds predict extrapolation performance across all datasets, protocols and architectures tested. Furthermore, by tracking the formation of the slow points we are able to understand the origin of differences between training protocols. Finally, we propose a novel regularization technique that is based on the relation between hidden state speeds and memory longevity. Our technique manipulates these speeds, thereby leading to a dramatic improvement in memory robustness over time, and could pave the way for a new class of regularization methods.

Keywords: Recurrent Neural Networks, Dynamics

1 Introduction

Code available at: https://github.com/sashkarivkind/MemoryRNN

Recurrent Neural Networks (RNN) are the key tool currently used in machine learning when dealing with sequential data (Sutskever et al., 2014), and in many tasks requiring a memory of past events (Oh et al., 2016). This is due to the dependency of the network on its past states, and through them on the entire input history. This ability comes with a cost - RNNs are known to be hard to train (Pascanu et al., 2013a). This difficulty is commonly associated with the vanishing gradient that appears when trying to propagate errors over long times (Hochreiter, 1998). When training is successful, the network’s hidden state represents these memories. Understanding how such representation forms throughout training can open new avenues for improving learning of memory-related tasks.

Linking hidden state dynamics with task-related memories requires some form of reverse engineering. This can be done by focusing on individual recurrent units (Karpathy et al., 2015; Oh et al., 2016), or by analyzing global network properties. We opt for the latter, analyzing the RNN’s hidden states as a discrete-time dynamical system.

In this framework, memories might be associated with a wide range of dynamical objects. On one extreme, transient dynamics can be harnessed for memory operations (Manjunath & Jaeger, 2013; Maass, 2011; Maass et al., 2002). On the other extreme, there are memory networks (Sukhbaatar et al., 2015) that memorize everything and later use only the relevant memories while ignoring all the rest. The idealized dynamical scenario where each memory is associated with a fixed point in the RNN state space (Hopfield, 1982; Sussillo, 2014; Barak, 2017; Amit, 1989) was refined in (Durstewitz, 2003; Sussillo & Barak, 2013; Mante et al., 2013) where points which are only approximately fixed (slow points), with a drift that is slower than the task duration were shown to represent memory.

To allow in-depth analysis of the formation of memories throughout training, we analyze a simple delayed classification task. While simple enough to analyze, the task requires both memory and processing - two key operations in many RNN tasks. We train this task using different datasets, unit-types and training protocols, to obtain a range of networks. We find that different protocols lead to comparable performance on the task. A more careful analysis, however, reveals that the resulting networks differ in their extrapolation abilities and reflect their training histories.

To uncover the underlying reasons for such differences, we extend tools used in continuous-time systems in neuroscience (Sussillo & Barak, 2013). We find that memories of the different classes are represented by slow points of varying slowness. We show that the speed of the points predicts the extrapolation properties of their associated class. We establish such a correlation in a large variety of settings.

Having established the importance of slow points as a predictor, we obtain an instructive insight on how they evolve along the training course. Detailed analysis of individual training trajectories makes it possible to monitor the formation of slow points under a specific training protocol. This technique uncovers an interplay between newly recruited and functional slow points – decreasing the stability of the latter in a systematic manner. This provides a link between training curriculum, dynamical objects, memory and performance.

Ultimately, we take a step from merely predicting performance to improving it. To this extent we modify the loss function to penalize hidden state speed in relevant points, and report a dramatic improvement for extremely long delays.

2 Task Definition

Figure 1: A Task. The network is presented an MNIST or CIFAR-10 image amidst noisy images and has to report its label at a later time, as requested by a separate input channel (to the right of the images). The output should be null at all times except the reporting time. The precise times vary from trial to trial. B Architecture. For the MNIST dataset, both the image and the trigger signal are fed directly into the recurrent layer. For the CIFAR-10 task, a convolutional feed-forward network is added in front of the recurrent layer, while the trigger signal is connected directly to the RNN.

Inspired by real-world applications of Recurrent Neural Networks (Oh et al., 2016), we designed a task where the RNN has to combine stimulus processing and memorization (Figure 1). The network is presented with a series of noisy images, among which a single target image (from MNIST or CIFAR-10) appears at a random time. At a later time point, the network receives a response trigger in a separate input channel, prompting it to output the label of the image. At all other times, the network should report a null label.

The stimulus and reporting times are chosen randomly at each trial from a uniform distribution over the trial duration, subject to the constraint that the stimulus precedes the response trigger. The total trial duration was fixed, and the network was requested to distinguish between the ten classes of MNIST (LeCun et al., 2010) or CIFAR-10 (Krizhevsky et al., 2009).

Each pixel of the noise mask was sampled from a Gaussian distribution with mean and variance matching its counterpart in the image corpus. We tested the RNN's ability to extrapolate from this task to longer durations by increasing the delay between stimulus and response trigger.
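As a concrete illustration, the trial structure described above can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: the trial length `T`, the corpus-wide noise statistics `pixel_mean`/`pixel_std` (the paper matches them per pixel), the null-label index and the helper name `make_trial` are all illustrative assumptions.

```python
# Minimal NumPy sketch of a single trial of the delayed classification task.
# Assumptions (not the authors' code): trial length T, corpus-wide noise
# statistics pixel_mean/pixel_std, and the use of index 10 for the 'null' label.
import numpy as np

def make_trial(image, label, T=20, pixel_mean=0.13, pixel_std=0.3,
               rng=np.random.default_rng(0)):
    """Build one input sequence: noise frames, one target image, one trigger."""
    n_pix = image.size
    # Stimulus and response times, stimulus strictly before the trigger.
    t_stim, t_resp = sorted(rng.choice(np.arange(1, T), size=2, replace=False))
    # Noise frames drawn with statistics matching the image corpus.
    frames = rng.normal(pixel_mean, pixel_std, size=(T, n_pix))
    frames[t_stim] = image.ravel()
    trigger = np.zeros((T, 1))
    trigger[t_resp] = 1.0                         # response trigger channel
    inputs = np.concatenate([frames, trigger], axis=1)   # shape (T, n_pix + 1)
    targets = np.full(T, 10)                      # 'null' label everywhere ...
    targets[t_resp] = label                       # ... except at the report time
    return inputs, targets

# Example with a random stand-in image:
x, y = make_trial(np.random.rand(28, 28), label=3)
print(x.shape, y[y != 10])
```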

The motivation behind this task is three-fold. First, as explained, this task is comparable to real-world scenarios where RNNs are used, combining the need for both stimulus memorization and feature extraction. Second, the task lends itself to parametric variations, allowing us to compare both different training protocols and generalization abilities. Third, since we wish to understand the dynamical nature of memorization in discrete gated RNNs, the delay between stimulus and response trigger allows the RNN hidden state (HS) to evolve, and this evolution can be reliably analyzed using well-known methods from dynamical systems (Sussillo & Barak, 2013), which we adapt to our discrete setting.

3 Model

For MNIST, the network consists of a single recurrent layer of gated recurrent units and an output layer of eleven neurons: one for each of the ten classes, plus an additional neuron for the null indicator. The input layer has one neuron per image pixel, together with an extra binary input channel for the response trigger, defined by:

$$
x_{\mathrm{trigger}}(t) =
\begin{cases}
1, & t = \text{reporting time}\\
0, & \text{otherwise}
\end{cases}
\tag{1}
$$

For CIFAR-10, the recurrent layer was expanded, and a convolutional front-end composed of three convolutional layers and two dense layers was added. Because the convolutional front-end is translation invariant, feeding the response trigger through it would be problematic; instead, the trigger was added as an extra channel to the final dense layer, right before the recurrent units (Figure 1).
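A PyTorch sketch of this architecture is given below (the released code is referenced in the Introduction; this is not that implementation). All layer widths, kernel sizes and feature dimensions are illustrative assumptions; only the overall structure — three convolutional layers, two dense layers, the trigger concatenated just before the recurrent layer, and an output with an extra null unit — follows the description above.

```python
# PyTorch sketch of the CIFAR-10 variant (not the authors' implementation).
import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.dense = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )

    def forward(self, frames):                    # frames: (B, T, 3, 32, 32)
        B, T = frames.shape[:2]
        feats = self.dense(self.conv(frames.flatten(0, 1)))
        return feats.view(B, T, -1)

class DelayedClassifier(nn.Module):
    def __init__(self, feat_dim=128, hidden=512, n_classes=10):
        super().__init__()
        self.front = ConvFrontEnd(feat_dim)
        self.rnn = nn.GRU(feat_dim + 1, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, n_classes + 1)   # +1 for the null unit

    def forward(self, frames, trigger):           # trigger: (B, T, 1)
        feats = torch.cat([self.front(frames), trigger], dim=-1)
        states, _ = self.rnn(feats)
        return self.readout(states)               # logits at every time step
```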

The gated units are either GRU:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z)\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r)\\
\tilde{h}_t &= \tanh\!\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big)\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
\tag{2}
$$

or LSTM:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\tag{3}
$$

For the analysis of the network's phase space, we denote the state of the recurrent layer at time $t$ by $h_t$, which for LSTM comprises both the hidden and cell states, and for GRU is the hidden state alone.

The network was trained using the 'Adam' optimizer (Kingma & Ba, 2014) with a soft-max cross-entropy loss function, with the loss on reporting at the response time increased in proportion to the trial length. A full description of each protocol, including schedules and other hyper-parameters, is given in the supplemental code.
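One way to realize such a re-weighted cross-entropy is sketched below in PyTorch. The weighting factor `report_weight` is an assumption; the authors' exact loss weighting and schedules are given in their supplemental code.

```python
# PyTorch sketch of a time-weighted softmax cross-entropy: the report step is
# weighted more heavily than the many 'null' steps.
import torch
import torch.nn.functional as F

def weighted_ce(logits, targets, trigger, report_weight=10.0):
    # logits: (B, T, C+1), targets: (B, T), trigger: (B, T, 1), 1 at the report step
    per_step = F.cross_entropy(logits.transpose(1, 2), targets, reduction='none')  # (B, T)
    weights = 1.0 + (report_weight - 1.0) * trigger.squeeze(-1)
    return (weights * per_step).mean()
```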

4 Training Protocol: Two Types of Curricula

We found that training failed when using straightforward stochastic gradient (SG) optimization on the full task. The network converged to a state where it consistently reports 'null', disregarding both the response trigger and the images it has received as inputs. This suboptimal behavior did not improve upon further training. On the other hand, we observed that simpler versions of the task are learnable: if the maximal delay between stimulus and reporting time was short, or when we introduced only a limited number of different digits, the network was able to perform the task. This led us to try two different protocols of curriculum learning (Bengio et al., 2009) in order to teach the network the full required task:

1. Vocabulary curriculum (VoCu) - here we started from two classes and then increased the vocabulary gradually until reaching the full class capacity. This protocol is similar to the original concept of (Bengio et al., 2009), except that in our vocabulary all classes occur with the same frequency, and the selected order of class introduction is in fact arbitrary.

2. Delay curriculum (DeCu) - starting from short delays between stimulus and reporting times, we progressively extended the delay toward the desired values. Implicit in (Hochreiter, 1998), this regime is expected to mitigate the vanishing gradient problem, at least during the initial phase of training. A schematic of both schedules is sketched below.
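The two curricula can be summarized schematically as follows. This Python sketch is illustrative only: `train_epoch(classes, delay)` is a hypothetical placeholder that runs one round of SG updates on trials drawn with the given class set and maximal delay and returns validation accuracy; the thresholds and starting values are assumptions.

```python
# Schematic of the two curricula; not the authors' training loop.
def vocu_schedule(train_epoch, all_classes, max_delay, acc_threshold=0.95):
    """Vocabulary curriculum: enlarge the class set once the current one is mastered."""
    classes = list(all_classes[:2])               # start with two classes
    for new_class in all_classes[2:]:
        while train_epoch(classes, max_delay) < acc_threshold:
            continue                              # keep training the current vocabulary
        classes.append(new_class)                 # then introduce one more class
    while train_epoch(classes, max_delay) < acc_threshold:
        continue

def decu_schedule(train_epoch, all_classes, max_delay, start_delay=3, acc_threshold=0.95):
    """Delay curriculum: train on the full vocabulary while extending the delay."""
    for delay in range(start_delay, max_delay + 1):
        while train_epoch(list(all_classes), delay) < acc_threshold:
            continue
```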

5 Extrapolation Ability Depends on Training Protocol

We found that, in good accordance with existing literature (Bengio et al., 2009; Jozefowicz et al., 2015), results for the nominal test-set were fairly indifferent to the training protocol (Supplementary material). Once we evaluated the ability of each setting to extrapolate to longer delays, however, the similarity ended and differences emerged.

Figure 2: Retrieval accuracy when increasing the delay between stimulus and response trigger beyond the trained range. Despite similar performance initially, the ability to generalize for greater delays than those trained for (dashed vertical lines) varies with protocol. DeCu was superior to VoCu in both LSTM and GRU architectures for both MNIST and CIFAR-10 datasets.

We observed how each setting performs when the delay between stimulus and response trigger is extended beyond the maximal trained delay. If robust fixed-point attractors have formed, retrieval accuracy should not be affected by the growing delay. If the computation is based on transients, then all class information is expected to eventually vanish.

Experiments revealed that neither of these extreme options was the case - performance deteriorated with increasing delay, but did not reach chance levels (Figure 2). This deterioration implies that not every memorized digit corresponds to a stable fixed point attractor, but some do. Furthermore, the deterioration was curriculum-dependent, with DeCu outperforming VoCu in all cases.
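The extrapolation test itself is straightforward; a hedged outline is sketched below, where `model` and `make_trial_with_delay` (a fixed-delay variant of the `make_trial` sketch above) are hypothetical placeholders.

```python
# Outline of the extrapolation test: retrieval accuracy at the report step as the
# stimulus-trigger delay grows beyond the trained range.
import numpy as np

def extrapolation_curve(model, dataset, delays):
    accs = []
    for d in delays:
        hits = []
        for image, label in dataset:
            x, y, t_resp = make_trial_with_delay(image, label, delay=d)
            logits = model(x)                     # (T, n_classes + 1) per-step logits
            hits.append(int(logits[t_resp].argmax() == label))
        accs.append(np.mean(hits))
    return np.array(accs)
```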

6 Dynamics of Hidden Representation

Figure 3: Hidden state projected on the leading principal components of a GRU-RNN on MNIST, shown for the nominal delay and for a substantially longer delay from stimulus. States are color coded by their prediction. For the nominal delay, ten distinct regions are observed in the state space, corresponding to each of the classes. Examination of the larger delay reveals that some clouds collapse into a single point at their center, while others vanish completely. The larger spread of samples at the nominal delay in VoCu, along with its smaller number of distinct fixed points at the longer delay, aligns with our findings of faster and less stable dynamics compared to DeCu.
Figure 4: Extrapolation accuracy predicted by slow point speeds. The accuracy at long delays is shown as a function of the slow point speed of the associated class (green; error bars denote standard error of the mean). In all datasets, unit types and training methods, slower speeds correlate with increased accuracy. The red and blue dots show the mean speed and accuracy for each training protocol (standard error of the mean is smaller than marker size). The difference in speeds between the protocols underlies the different extrapolation performance shown in Figure 2. Ten networks were used for MNIST, and five for CIFAR-10.

The relevant phase space of this dynamical system is the state of the recurrent layer. We thus begin by visually inspecting (in the first 3 PCA components) the activity of the network at the maximal training delay. We show here results for the MNIST dataset with a GRU architecture, but similar behavior is seen for the other conditions, and the statistics of all conditions are analyzed below. The left panels of Figure 3 show that different trials of each digit are well separated into regions with a one-to-one correspondence to data classes. Following these trajectories for a longer delay shows that some regions converge into what appear to be fixed points, while others vanish (right panels). These figures also clearly show the difference between the two protocols. While both achieve a good separation at the nominal delay (left), it is already apparent that VoCu leads to clouds of points with a larger spread, possibly indicating weaker attraction.

To verify the existence or absence of the fixed points hinted at by the above visualization, we apply an algorithm developed for continuous-time vanilla RNNs (Sussillo & Barak, 2013) to our setting. Briefly, fixed points (stable or unstable) are local minima of the (scalar) speed of the hidden state,

$$
S(h_t) = \left\lVert h_{t+1} - h_t \right\rVert
\tag{4}
$$

where the evolution of state,

$$
h_{t+1} = F(h_t, x_t)
\tag{5}
$$

is given by equations 2 or 3 for GRU or LSTM respectively. It is now possible to use gradient descent on the speed $S$ with respect to the state $h$, namely following $-\nabla_h S$, to locate such minima.

The initial conditions for this gradient descent were obtained by running the network with the mean delay value, and using the average hidden state of each class as an initial condition. The external input during gradient descent was the average of the noise images, thus effectively making our system time-invariant so that such points and their stability are well defined. We verified that using different fixed external inputs did not qualitatively alter the results (not shown). We repeated the procedure for several realizations, and it always resulted in a local minimum of speed for each class (which we call a slow point), with a readout that matches the class label.
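A minimal PyTorch sketch of this slow-point search is given below. It minimizes the squared one-step displacement of the state under a fixed input (the average noise image), starting from a class-averaged hidden state; `gru_cell` stands for one recurrent update (Eq. 2), and the learning rate and number of descent steps are assumptions rather than the authors' settings.

```python
# PyTorch sketch of the slow-point search via gradient descent on the speed.
import torch

def find_slow_point(gru_cell, x_noise, h_init, n_steps=2000, lr=1e-2):
    h = h_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([h], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        speed = ((gru_cell(x_noise, h) - h) ** 2).sum()   # squared speed of Eq. 4
        speed.backward()                                  # gradient w.r.t. the state h
        opt.step()
    with torch.no_grad():
        final_speed = ((gru_cell(x_noise, h) - h) ** 2).sum().item()
    return h.detach(), final_speed

# Usage: one slow point per class, seeded by that class's mean hidden state.
# cell = torch.nn.GRUCell(input_dim, hidden_dim)
# h_star, s = find_slow_point(cell, x_noise_mean, h_class_mean.unsqueeze(0))
```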

To look for a quantitative relation between slow point speed and memory robustness, we located a slow point for every class, computed its speed and the prediction accuracy of its associated class after a long delay. Figure 4 shows that the speed of the slow point associated with a certain class predicts how members of that class react in extrapolation experiments. This trend holds for all architectures, unit types and datasets tested.

The colored dots in Figure 4 denote the mean of the speeds obtained by the two different protocols. The picture here is consistent with that observed in Figure 2, with DeCu outperforming VoCu for all cases. Our results suggest that this difference is mediated by a difference in speeds of the associated slow points.

We also trained networks on an additional task of delayed matching (as opposed to classification), and observed the same speed-accuracy anticorrelation (Supplementary material).

7 Formation of Slow Points - Why Protocols Differ

We saw that the two training protocols lead to different representations of the stimulus memory by the network, and hence to different dynamical objects. How does training give rise to these differences? To answer this question, we analyze in detail two settings - GRU and LSTM architectures trained on the MNIST dataset. We follow the slow points backwards in training time to learn how they emerge and change throughout training, and correlate these events with network performance.

Figure 5: A,B Speeds of each slow point through training (only five are shown for clarity), obtained by iteratively tracking them back in training time. The specific schedule of each curriculum is marked on the time axis (class introductions for VoCu, delay extensions for DeCu). VoCu shows sharp jumps in the speed of all points at each class introduction, in contrast to DeCu, which exhibits a gradual slowing down along the whole course of training. C A branching diagram for VoCu, obtained by briefly backtracking every class around the time of its appearance. The change in readout suggests which slow point gave rise to the new point (e.g., 8 from 5 at time 78; 7 from 3 or 4 at time 62). D The accuracies of each individual class for the same VoCu realization when class '8' was introduced. All previously existing classes show a degradation in accuracy, but the class from which class '8' emerged (class '5') exhibited a stronger decline. Noisy images with double amplitude were used to amplify the effect. E Verifying the statistical significance of the result in (D). For each branching event, we compared the accuracy drop of the class that gave rise to the new class with the drop for the remaining classes. The histogram pools all events in three VoCu realizations. It shows that the accuracy of spawning classes indeed decreases more following a branching event.

We located slow points as described in Section 6, and then used them as initial conditions for gradient descent on the network defined by the previous training step. We then repeated this procedure iteratively for all training steps. The assumption is that the change in network parameters at each training step does not induce a very large shift in the locations of the relevant slow points, and thus our continuation procedure can track them. This is not obvious in the case of VoCu, where one might expect rapid changes whenever a new class is added. Looking at the speeds of the tracked slow points shows that this is indeed the case for VoCu (Figure 5 A,B), with all tracked speeds exhibiting such jumps. DeCu, on the other hand, shows a gradual slowing down of the slow points throughout training.
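Schematically, the continuation procedure amounts to the following loop over saved checkpoints (a sketch under the assumption that checkpoints of the recurrent step are stored during training; `load_cell` is a hypothetical loader and `find_slow_point` refers to the earlier sketch).

```python
# Sketch of the continuation/backtracking loop: walk backwards through saved
# checkpoints, warm-starting the slow-point search from the point found at the
# later checkpoint.
def backtrack_slow_point(checkpoint_paths, x_noise, h_final):
    """checkpoint_paths: checkpoints ordered from early to late training steps."""
    h, history = h_final, []
    for path in reversed(checkpoint_paths):       # late -> early in training time
        cell = load_cell(path)
        h, speed = find_slow_point(cell, x_noise, h, n_steps=200)
        history.append((path, speed, h))
    return history[::-1]                          # re-order to follow training time
```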

By observing the gradual slowing down in DeCu, it is easy to understand why this protocol improves performance - slow points become slower. But the situation for VoCu is more complicated because introducing new classes qualitatively changes the dynamics. A natural expectation might be that the classes that were presented first will have more time to stabilize, and thus will be the slowest. We saw that this is not the case (not shown), and thus proceeded to look deeper into the class introduction events along VoCu training.

Consider such an event - the introduction of a new class at some training step (we use 'training step' to refer to the step of the SG optimizer, as opposed to the time within each trial). We perform a short backtracking procedure - starting just before the succeeding class is introduced, and following the new class's slow point back until slightly before the class appeared. We verified that we can follow the point despite the jumps mentioned above (shown in supplementary material).

We discovered that the new class is assigned to an existing slow point, one that was previously classified as one of the existing classes. By stitching together all such backtracking procedures, we obtain a diagram indicating where each new class originated (Figure 5C).

Does this history affect performance? To answer this question, we checked the difference in performance of the various classes following the introduction of a new one. For instance, the diagram shows that class 8 originated from class 5. Figure 5D shows that the performance of class 5 was impaired more than other existing classes following the introduction of class 8.

To evaluate the statistics of this phenomenon, we repeated this procedure for many networks and all class introduction events. Figure 5E shows that the decline in accuracy of the classes that spawned the new class is significantly larger than that of the other classes (the LSTM histogram is shown in the supplementary material). Specifically, for a class newly introduced at a given training step, the accuracy of every existing class is evaluated shortly before and shortly after that step, and the accuracy change of the class that branches into the new class is compared to the average change of the remaining classes.
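The statistic behind Figure 5E can be expressed compactly; the NumPy sketch below assumes per-class accuracies measured just before and just after a class-introduction event, with all names illustrative.

```python
# NumPy sketch: compare the accuracy drop of the spawning class with the mean
# drop of the other pre-existing classes.
import numpy as np

def branching_effect(acc_before, acc_after, spawning_class, existing_classes):
    """acc_before/acc_after: dicts class -> accuracy just before/after the event."""
    drop = {c: acc_before[c] - acc_after[c] for c in existing_classes}
    others = [drop[c] for c in existing_classes if c != spawning_class]
    return drop[spawning_class] - float(np.mean(others))   # > 0: spawning class hit harder
```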

8 Improving Long Term Memory

Can the aforementioned insights help improve performance? Can we obtain memory for delays that are substantially longer than those used for training? To answer this question, we first trained the network as before using one of the protocols discussed above. We then trained for an additional period of gradient steps (fewer than in the initial training) while regularizing our standard cross-entropy loss function with a term accounting for hidden state speed. The new loss:

$$
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda \sum_{c} S(h_c)
\tag{6}
$$

penalizes high speed (equation 4) at a representative point $h_c$ associated with each class $c$, with $\lambda$ setting the strength of the regularization.

The natural candidates for these representative points are the slow points discussed above, and indeed Figure 6 shows that dramatic improvements were achieved compared to the same training without the added regularization. Using the slow points as regularization targets is still somewhat costly, as their detection requires a gradient descent. Furthermore, as training proceeds, the locations of the slow points can move, rendering the original regularization targets less effective. We therefore used a proxy for the slow points, taking $h_c$ to be the centre of mass of each class. This simpler procedure achieved results comparable to using the slow points themselves. In both cases, we verified that slow point speed was indeed manipulated to achieve the improved performance (Figure 6 legend), and that the accuracy for nominal delays remained virtually intact. In principle, one could train for such long delays by straightforward backpropagation through time, but this would be orders of magnitude more time consuming than our method, if not impossible.
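A sketch of the resulting objective is given below in PyTorch, using the centre-of-mass proxy as regularization targets. The weight `lambda_speed`, the `GRUCell`-style step and the squared-speed penalty are illustrative assumptions; the authors' exact implementation is in their supplemental code.

```python
# PyTorch sketch of the speed-regularized objective (Eq. 6) with per-class
# centre-of-mass targets.
import torch

def speed_regularized_loss(ce_loss, gru_cell, x_noise, class_targets, lambda_speed=1.0):
    """class_targets: (n_classes, hidden_dim) representative points h_c;
    x_noise: (1, input_dim) average noise frame."""
    next_states = gru_cell(x_noise.expand(class_targets.size(0), -1), class_targets)
    speed_penalty = ((next_states - class_targets) ** 2).sum(dim=1).mean()
    return ce_loss + lambda_speed * speed_penalty
```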

Figure 6: The effect of speed regularization on performance, demonstrated for DeCu training of the GRU architecture on MNIST; results for all other settings are in the Supplementary material. As a control, we trained the network for the same number of additional gradient descent steps without any regularization (solid). Regularization targets were either the slow points (dotted) or the center of mass of each class (dashed). Both regularization methods resulted in dramatic improvements compared to the control, which is also reflected in a smaller speed of the slow points after the additional training (shown in legend brackets).

9 Discussion

Training RNNs to perform memory-related tasks is difficult (Pascanu et al., 2013b), and as a consequence many suggestions were made on how to alleviate this difficulty. Changing network architecture, unit-types or training protocols might be expected to generate different solutions to the same task.

Here we showed that different training protocols can lead to different locally optimal solutions. Although these solutions perform similarly under nominal conditions, challenging the networks with unforeseen settings reveals their differences.

An RNN is a dynamical system, and as such its operation can be understood in the language of fixed points and other dynamical objects. By analyzing the phase space of the network’s hidden states, we showed that the memory of each class was associated with a slow point of the dynamics. Such slow points were shown to assist network functionality (Machens et al., 2005), and arise through training (Mante et al., 2013; Durstewitz, 2003). The speeds of these slow points were highly correlated with the functional characteristic of memory longevity, and thus provide a dynamical explanation of the idiosyncrasies observed between curricula and architectures. Our results proved valid across architectures, datasets and unit types.

In specific cases, we were able to follow the recruitment of slow points to the representation of memories during the training process. Such recruited slow points reside in an area of phase space that belongs to a specific existing class. We showed that this process is associated with a decrease in network accuracy that is specific to the class that contributed the slow point. Things could have been different - slow points could have emerged in an area of phase space distant from existing ones, and the introduction of a new class could have had a uniform effect on all existing classes. To uncover this process, we introduced a backtracking methodology that could be relevant to any case in which learning modifies the dynamics of the network. Possible applications include studying the success and failure in creating memories (Beer & Barak, 2018), preventing catastrophic forgetting, understanding memory capacity, and more.

The setting studied in this work is a particular case of the more general problem of solution equivalence in gradient-based optimization. On the one hand, theoretical and numerical evidence does exist for the indifference of training outcomes to protocol details; specifically for RNNs, it is shown in (Cirik et al., 2016) that, at least for language modeling tasks, performance does not depend heavily on the training protocol. On the other hand, such indifference is far from being fully established. In particular, stochastic gradient optimization suffers from known drawbacks (Dauphin et al., 2014; Martens & Sutskever, 2011) and might prove dependent on initialization (Sutskever et al., 2013), and in particular on pre-training. Our results show that networks with the same initialization can reach different solutions, and begin to uncover the dynamics underlying the route to these solutions.

Ultimately, at the engineering end of this study, the aforementioned insights allowed us to come up with a novel regularization technique which, by penalizing the hidden state speeds, enables a dramatic improvement of performance for extremely long times, while keeping nominal performance intact. Having noticed that slowly increasing the delay (DeCu) resulted in better stability and increased extrapolation ability for all the conditions tested, it might have been possible to reach similar performance by extending the DeCu protocol to very large delays. Such long backpropagation through time, however, is extremely costly, if not impossible, and avoided by our new method. Notably, our method preserves linear time complexity of plain RNN, as opposed to attention mechanisms whose complexity is quadratic in time (Ke et al. (2018) and references therein).

Importantly, our regularization procedure relies on an informed guess of what information the system needs to memorize to improve performance, and of where such information may be stored. In a delayed classification task, the straightforward guess is that remembering the class improves performance, with the relevant information being represented by the mean hidden state of the class's samples. Future work may generalize our methodology to more complex tasks, such as natural language processing, where long term memorization amid a continuous stream of potentially useful input remains an open challenge (Paperno et al., 2016). Doing so will require identifying the appropriate entities to memorize and the locations in hidden space representing these memories (Bau et al., 2018), for use as regularization targets.

Acknowledgements

OB is supported by Israel Science Foundation (346/16). The Titan Xp used for this research was donated by the NVIDIA Corporation.

References

Appendix A Delayed Template Matching Task

To examine the relevance of our results to other tasks, we considered a delayed template matching setup (also known as delayed match-to-sample). Here, instead of introducing a trigger via an extra input channel, we introduce a second stimulus. When the second stimulus appears, the system has to report whether the two stimuli belong to the same class or not. At all other times, the system should report a null output. We trained both architectures on the MNIST dataset, with the same minimal and maximal delays as the original task and an equal probability of the two stimuli belonging or not belonging to the same class. Because the template matching task is simpler than the classification task, both architectures were able to perform well for nominal delays without curriculum training. We then repeated the analysis reported in Figure 4 of the main text. Figure 7 demonstrates that, similarly to the original task, slow-point speeds are predictive: the faster the slow point, the less likely the memory of the corresponding class is to be sustained over long delays.

Figure 7: Speed of the slow point and the accuracy of the corresponding class for a long delay on the delayed template matching task. Similarly to Figure 4 of the main text, both architectures exhibit a clear negative correlation between slow-point speed and accuracy at long delays. Note that the delayed template matching task was learnable without any curriculum, further generalizing our findings to naive training. Ten networks for each architecture were used.

Appendix B LSTM Branching

We performed the bifurcation analysis of Section 7 for the LSTM architecture instead of GRU. Just as for GRU, Figure 8 shows that the details of slow-point bifurcations affect performance. In the VoCu protocol, the introduction of new classes is accompanied by a bifurcation of an existing slow point. The classes associated with this spawning slow point are significantly adversely affected by this event, compared to other classes.

Figure 8: Same as Figure 5E of the main text, but for LSTM instead of GRU. Here as well, classes whose slow points bifurcated to spawn new classes were more adversely affected than classes that did not experience a bifurcation.

Appendix C Regularization Figures

Results of Section 8 for all the datasets, training curricula and unit types are reported in Figure 9. Following the trend of Section 8, regularizing the speed of either the center of mass of each class or of the appropriate slow point yielded superior results for extended delays compared to applying no regularization.

Figure 9: Effect of speed regularization on the performance is demonstrated for all the test cases from Figure 6.

Appendix D Validity of backtracking procedure during introductions of new classes in VoCu

Backtracking slow point speeds around an event of the introduction of a new class reveals abrupt changes in these speeds. We validated that even at these training epochs the tracking follows specific slow points, and does not merely capture random slow points in the hidden representation space. Figure 10 depicts the displacements of the slow points associated with existing classes, tracked across the step at which a new class is introduced, and compares them to the distances to the other slow points being tracked. It confirms that the displacements of the points claimed to be tracked are indeed systematically smaller than the distances to the other tracked slow points. Importantly, this is also true for the slow point that is assigned to the newly introduced class.
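The consistency check of Figure 10 can be phrased as the following NumPy sketch, where `points_before` and `points_after` hold the tracked slow points on either side of a class-introduction step (names and array shapes are illustrative).

```python
# NumPy sketch: each tracked slow point's displacement across the step should be
# smaller than its distance to every other tracked slow point.
import numpy as np

def tracking_is_consistent(points_before, points_after):
    """points_before/points_after: (n_points, dim) slow points on either side of the step."""
    displacement = np.linalg.norm(points_after - points_before, axis=1)
    cross = np.linalg.norm(points_after[:, None, :] - points_before[None, :, :], axis=2)
    np.fill_diagonal(cross, np.inf)               # exclude each point's own displacement
    return bool(np.all(displacement < cross.min(axis=1)))
```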

Figure 10: Displacement of slow points across the training steps at which new classes are introduced, shown for all such steps of a VoCu training example. The newly introduced class is always at the bottom-right corner.

Appendix E Accuracies on Test and Train sets for nominal task

Accuracies on the MNIST and CIFAR-10 training and test sets for the nominal task, for both GRU and LSTM architectures and both curricula, are reported in Tables 1-4.

DeCu VoCu
Training Set Null 100% 100%
Digits % %
Test Set Null 100% 100%
Digits % %
Table 1: MNIST - GRU
DeCu VoCu
Training Set Null 100% 100%
Digits % %
Test Set Null 100% 100%
Digits % %
Table 2: MNIST - LSTM
DeCu VoCu
Training Set Null 100% 100%
Digits % %
Test Set Null 100% 100%
Digits % %
Table 3: CIFAR - GRU
DeCu VoCu
Training Set Null 100% 100%
Digits % %
Test Set Null 100% 100%
Digits % %
Table 4: CIFAR - LSTM