Learning with Long-term Remembering: Following the Lead of Mixed Stochastic Gradient

Yunhui Guo  Mingrui Liu  Tianbao Yang  Tajana Rosing
University of California, San Diego  The University of Iowa
yug185@eng.ucsd.edu, mingrui-liu@uiowa.edu, tianbao-yang@uiowa.edu, tajana@ucsd.edu
Abstract

Current deep neural networks can achieve remarkable performance on a single task. However, when a deep neural network is continually trained on a sequence of tasks, it tends to gradually forget the previously learned knowledge. This phenomenon is referred to as catastrophic forgetting and motivates the field called lifelong learning. The central question in lifelong learning is how to enable deep neural networks to maintain performance on old tasks while learning a new task. In this paper, we introduce a novel and effective lifelong learning algorithm, called MixEd stochastic GrAdient (MEGA), which allows deep neural networks to acquire the ability of retaining performance on old tasks while learning new tasks. MEGA modulates the balance between old tasks and the new task by integrating the current gradient with the gradient computed on a small reference episodic memory. Extensive experimental results show that the proposed MEGA algorithm significantly advances the state-of-the-art on all four commonly used lifelong learning benchmarks, reducing the error by up to 18%.

Equal contribution.

1 Introduction

A significant step towards artificial general intelligence (AGI) is to enable the learning agent to acquire the ability of remembering past experiences while being trained on a continuum of tasks. Current deep neural networks are capable of achieving remarkable performance on a single task (Goodfellow et al., 2016). However, when the network is retrained on a new task, its performance drops drastically on previously trained tasks, a phenomenon referred to as catastrophic forgetting (Ratcliff, 1990; Robins, 1995; French, 1999; Kirkpatrick et al., 2017). In stark contrast, the human cognitive system is capable of acquiring new knowledge without damaging previously learned experiences. It is thus of great importance to develop algorithms that allow deep neural networks to achieve continual learning capability (i.e., to avoid catastrophic forgetting).

The problem of catastrophic forgetting motivates the field called lifelong learning (Thrun and Mitchell, 1995; Kirkpatrick et al., 2017; Parisi et al., 2019). A central dilemma in lifelong learning is how to achieve a balance between the performance on old tasks and the new task (Robins, 1995; Kirkpatrick et al., 2017). During the process of learning the new task, the originally learned knowledge will typically be disrupted, which leads to catastrophic forgetting. On the other hand, a learning algorithm biased towards old tasks will interfere with the learning of the new task. Several lines of methods have been proposed recently to address this issue. Examples include regularization based methods (Kirkpatrick et al., 2017; Zenke et al., 2017), knowledge transfer based methods (Rusu et al., 2016), and episodic memory based methods (Lopez-Paz and others, 2017; Chaudhry et al., 2018b; Riemer et al., 2018). However, the existing methods require over-parameterized neural networks (Kirkpatrick et al., 2017; Chaudhry et al., 2018a) or are not flexible enough to handle the stochastic nature of the learning process (Lopez-Paz and others, 2017; Chaudhry et al., 2018b).

In this paper, we propose a novel and effective lifelong learning algorithm, called MixEd stochastic GrAdient (MEGA), to address the catastrophic forgetting problem. We cast the problem of balancing the performance on old tasks and the new task as an optimization problem with composite objective. Our formulation is general and closely related to several recently proposed lifelong learning algorithms (Lopez-Paz and others, 2017; Chaudhry et al., 2018b; Riemer et al., 2018). We approximately solve the optimization problem using one-step stochastic gradient descent with the standard gradient replaced by the proposed mixed stochastic gradient. The mixed stochastic gradient is derived from the gradients computed on the data of the current task and an episodic memory which stores a small subset of observed examples from old tasks (Lopez-Paz and others, 2017; Chaudhry et al., 2018b; Riemer et al., 2018). Based on our derivation, the direction of the mixed stochastic gradient balances the loss on old tasks and the new task in an adaptive manner. Therefore, the proposed MEGA algorithm allows deep neural networks to learn new tasks while avoiding catastrophic forgetting.

Our contributions are as follows. (1) We propose a novel and effective algorithm, called MEGA, for lifelong learning problems. (2) We extensively evaluate our algorithm on conventional lifelong learning benchmark datasets, and the results show that the proposed MEGA algorithm significantly advances the state-of-the-art performance across all the datasets. MEGA achieves an average accuracy of 91.21 ± 0.10% on Permuted MNIST, which is 2% better than the previous state-of-the-art model. On Split CIFAR, our proposed MEGA achieves an average accuracy of 66.12 ± 1.93%, which is about 5% better than the state-of-the-art method. Notably, on the Split CUB dataset, MEGA achieves an average accuracy of 80.58 ± 1.94%, surpassing the multi-task baseline, which was previously believed to be an upper bound on the performance of lifelong learning algorithms (Chaudhry et al., 2018b). (3) Finally, we also show that the proposed MEGA algorithm can handle increasingly non-stationary settings when the number of tasks becomes significantly larger.

2 Lifelong Learning

2.1 Problem Statement

Lifelong learning (LLL) (Rusu et al., 2016; Kirkpatrick et al., 2017; Lopez-Paz and others, 2017; Chaudhry et al., 2018b) considers the problem of learning a new task without degrading performance on old tasks, i.e., to avoid catastrophic forgetting (French, 1999; Kirkpatrick et al., 2017). Suppose there are T tasks characterized by datasets {D_1, …, D_T}. Each dataset D_t consists of a list of triplets (x_i, t_i, y_i), where y_i is the label of the i-th example x_i, and t_i is a task descriptor that indicates which task the example comes from. Similar to supervised learning, each dataset D_t is split into a training set D_t^{tr} and a test set D_t^{te}.

In the learning protocol introduced in Chaudhry et al. (2018b), the tasks are separated into a cross-validation stream D^{CV} = {D_1, …, D_K} and an evaluation stream D^{EV} = {D_{K+1}, …, D_T}. D^{CV} is used for cross-validation to search for hyperparameters; D^{EV} is used for actual training and evaluation. As pointed out in Chaudhry et al. (2018b), some regularization-based lifelong learning algorithms, e.g., Elastic Weight Consolidation (Kirkpatrick et al., 2017), are sensitive to the choice of the regularization parameters. Introducing D^{CV} helps find the best regularization parameter without exposing the actual training and evaluation data. While searching for the hyperparameters, we can make multiple passes over the examples in D^{CV}; the training on D^{EV} is performed with only a single pass over the examples (Lopez-Paz and others, 2017; Chaudhry et al., 2018b).

In lifelong learning, a given model is trained sequentially on the series of tasks {D_{K+1}, …, D_T}. When the model is trained on task D_t, the goal is to predict the labels of the examples in D_t by minimizing the empirical loss on D_t^{tr} in an online fashion, without suffering an accuracy drop on {D_{K+1}, …, D_{t−1}}.

2.2 Evaluation Metrics

Average Accuracy and Forgetting Measure (Chaudhry et al., 2018a) are commonly used metrics for evaluating the performance of lifelong learning algorithms. In Chaudhry et al. (2018b), the authors introduce another metric, called Learning Curve Area (LCA), to assess the learning speed of different lifelong learning algorithms. In this paper, we further introduce a new evaluation metric, called Long-term Remembering (LTR), to characterize the ability of lifelong learning algorithms to retain the performance on tasks trained in the far past.

Suppose there are B_t mini-batches in the training set of task t. Similar to Chaudhry et al. (2018b), we define a_{k,i,j} as the accuracy on the test set of task j after the model is trained on the i-th mini-batch of task k. Generally, suppose the model is trained on a sequence of tasks {D_1, …, D_k}. Average Accuracy (A_k) and Forgetting Measure (F_k) after the model is trained on task k are defined as

A_k = (1/k) Σ_{j=1}^{k} a_{k,B_k,j},    F_k = (1/(k−1)) Σ_{j=1}^{k−1} f_j^k    (1)

where f_j^k = max_{l∈{1,…,k−1}} a_{l,B_l,j} − a_{k,B_k,j}. Clearly, A_k is the average test accuracy, and F_k assesses the degree of accuracy drop on old tasks after the model is trained on all k tasks. Learning Curve Area (LCA) (Chaudhry et al., 2018b) at β is defined as,

LCA_β = (1/(β+1)) Σ_{b=0}^{β} Z_b    (2)

where Z_b = (1/k) Σ_{j=1}^{k} a_{j,b,j}. Intuitively, LCA measures the learning speed of different lifelong learning algorithms. A higher value of LCA indicates that the model learns quickly. We refer the readers to Chaudhry et al. (2018b) for more details about LCA.
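To make the definitions concrete, the sketch below computes Average Accuracy, Forgetting Measure, and LCA from accuracy matrices. The array layout (acc[k, j] is the accuracy on task j after finishing task k; a_batch[k, b] is the accuracy on task k right after its b-th mini-batch) is our convention for illustration, not the paper's.

```python
import numpy as np

def average_accuracy(acc):
    """A_k: mean test accuracy over all tasks seen so far, evaluated after
    training on the last task. acc[k, j] = accuracy on task j after task k."""
    k = acc.shape[0] - 1
    return float(acc[k, : k + 1].mean())

def forgetting(acc):
    """F_k: average drop from each old task's best past accuracy to its
    final accuracy."""
    k = acc.shape[0] - 1
    drops = [acc[:k, j].max() - acc[k, j] for j in range(k)]
    return float(np.mean(drops))

def lca(a_batch, beta):
    """LCA_beta: average, over the first beta+1 mini-batches, of the mean
    accuracy a_batch[k, b] on task k right after its b-th mini-batch."""
    return float(a_batch[:, : beta + 1].mean())
```

Indices are zero-based here, so the "first k−1 tasks" in the Forgetting Measure become rows 0 to k−1.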

All the metrics introduced above fail to capture one important aspect of lifelong learning algorithms, that is, the ability to retain performance on the tasks trained in the far past. In this paper we introduce a new metric, called Long-Term Remembering (LTR), which is defined as

LTR_k = Σ_{j=1}^{k−1} γ_{k,j} (a_{j,B_j,j} − a_{k,B_k,j}),  where γ_{k,j} = (k − j)/k    (3)

After the model is trained on all k tasks, LTR_k quantifies the accuracy drop on each task j relative to a_{j,B_j,j}, the accuracy attained right after task j was learned. The coefficient γ_{k,j} puts more emphasis on the tasks trained earlier. Different algorithms can have the same average accuracy but very different LTR, depending on their ability to maintain the performance on the past tasks (i.e., long-term remembering).
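A minimal implementation of LTR under the matrix convention acc[k, j] = accuracy on task j after finishing task k. The decaying weight γ used below is one plausible choice consistent with the description (earlier tasks weighted more); it is not necessarily the paper's exact coefficient.

```python
import numpy as np

def long_term_remembering(acc):
    """LTR: weighted sum of each old task's drop from the accuracy it had
    right after being learned (acc[j, j]) to its final accuracy (acc[k, j]).
    gamma = (k - j) / k is an assumed form that weights earlier tasks more."""
    k = acc.shape[0] - 1
    total = 0.0
    for j in range(k):
        gamma = (k - j) / k  # larger for tasks trained in the far past
        total += gamma * (acc[j, j] - acc[k, j])
    return total
```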

3 Mixed Stochastic Gradient

In this section, we introduce the proposed MixEd stochastic GrAdient (MEGA) algorithm. Following previous works (Lopez-Paz and others, 2017; Chaudhry et al., 2018b), when the model is trained on the t-th task, an episodic memory M is used for storing a subset of the examples from all the old tasks {D_1, …, D_{t−1}}. The main idea of MEGA is to minimize the loss on the episodic memory and on the t-th task by iteratively moving in the direction of the proposed mixed stochastic gradient.

In the lifelong learning setting, the learning of task t is conducted over a single pass of the training examples in an online fashion. To establish the tradeoff between the performance on old tasks and the t-th task, we consider the following optimization problem with composite objective:

min_w  α₁ ℓ_t(w) + α₂ ℓ_ref(w),  where ℓ_t(w) = E_ξ[ℓ(w; ξ)] and ℓ_ref(w) = E_ζ[ℓ(w; ζ)]    (4)

where w is the parameter of the model, ξ and ζ are random variables with finite support, ℓ_t(w) is the expected training loss of the t-th task, ℓ_ref(w) is the expected loss calculated on the data stored in the episodic memory, and α₁, α₂ are hyperparameters which control the relative importance of ℓ_t(w) and ℓ_ref(w). Intuitively, a larger ℓ_ref(w) signifies more severe catastrophic forgetting. Note that during the learning process of each task, every data sample is i.i.d., and hence the current example of task t can be viewed as a random sample due to the online-to-batch conversion argument (Cesa-Bianchi et al., 2004). In the lifelong learning setting, ℓ_t(w) is the training loss calculated on the current mini-batch (controlled by ξ) of task t, and ℓ_ref(w) is the loss calculated on a random mini-batch (controlled by ζ) sampled from the episodic memory. In traditional online learning, the loss is calculated only on the currently received example and not on historical samples, and hence α₁ = 1, α₂ = 0. In this case, the weights are optimized only for the current task while ignoring previous tasks, which leads to catastrophic forgetting. If α₂ > 0, this formulation naturally involves examples from old tasks. In lifelong learning, since we need to consider performance on both old tasks and the current task, we typically do not consider the degenerate case α₂ = 0.
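As a toy instance of the composite objective, the sketch below uses a least-squares loss as a stand-in for the network loss, with fixed weights α₁, α₂ (the adaptive, parameter-dependent choice is introduced next). All function names here are ours.

```python
import numpy as np

def mixed_objective(w, x_cur, y_cur, x_mem, y_mem, a1, a2):
    """a1 * (loss on current mini-batch) + a2 * (loss on memory mini-batch),
    with a least-squares loss standing in for the network loss."""
    l_cur = np.mean((x_cur @ w - y_cur) ** 2)
    l_ref = np.mean((x_mem @ w - y_mem) ** 2)
    return a1 * l_cur + a2 * l_ref

def mixed_gradient(w, x_cur, y_cur, x_mem, y_mem, a1, a2):
    """The corresponding mixed stochastic gradient: the same weighted
    combination applied to the two per-batch gradients."""
    g = 2 * x_cur.T @ (x_cur @ w - y_cur) / len(y_cur)
    g_ref = 2 * x_mem.T @ (x_mem @ w - y_mem) / len(y_mem)
    return a1 * g + a2 * g_ref
```

With a1 = 1 and a2 = 0 this degenerates to plain online learning on the current task, which is exactly the case the text warns against.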

We use w_{t,k} to denote the weights when the model is being trained on the k-th mini-batch of task t. Clearly, both ℓ_t and ℓ_ref are determined by w during training. This implies that the relative value of ℓ_t(w_{t,k}) and ℓ_ref(w_{t,k}) changes between mini-batches. Therefore, α₁ and α₂ should be adjusted adaptively based on w_{t,k} in order to achieve a good balance between old tasks and the current task. To this end, with a slight abuse of notation, we define two parameter-dependent functions α₁(w) and α₂(w) to characterize the relative importance of the current task and old tasks. Mathematically, we propose to use the following update:

w_{t,k+1} = argmin_w  α₁(w_{t,k}) ℓ_t(w) + α₂(w_{t,k}) ℓ_ref(w)    (5)

where α₁(·) and α₂(·) are real-valued mappings.

In practice, solving (5) is NP-hard in general if ℓ_t or ℓ_ref does not have favorable properties (e.g., convexity), and hence first-order methods (e.g., stochastic gradient descent) are usually employed to approximately solve the optimization problem (5). This naturally motivates us to design the MixEd stochastic GrAdient (MEGA) algorithm. MEGA performs the update (6) to approximately solve (5), where one-step stochastic gradient descent is performed with the initial point set to w_{t,k}:

w_{t,k+1} = w_{t,k} − η [α₁(w_{t,k}) g(w_{t,k}; ξ_{t,k}) + α₂(w_{t,k}) g_ref(w_{t,k}; ζ_{t,k})]    (6)

where η is the learning rate, ξ_{t,k} and ζ_{t,k} are random variables with finite support, g(w_{t,k}; ξ_{t,k}) and g_ref(w_{t,k}; ζ_{t,k}) are unbiased estimators of ∇ℓ_t(w_{t,k}) and ∇ℓ_ref(w_{t,k}), respectively, and α₁(w_{t,k}) g(w_{t,k}; ξ_{t,k}) + α₂(w_{t,k}) g_ref(w_{t,k}; ζ_{t,k}) is referred to as the mixed stochastic gradient.

The main difficulty in the update (6) is to define well-behaved mappings α₁(·) and α₂(·) which are consistent with the goal of lifelong learning. To this end, we introduce two approaches, an angle-based approach (Section 3.1) and a direct approach (Section 3.2), to address this difficulty. It is worth mentioning that several recent advances in lifelong learning (Lopez-Paz and others, 2017; Chaudhry et al., 2018b; Riemer et al., 2018) are closely related to the angle-based approach in our MEGA framework, as illustrated in Section 3.1.

1:Initialize the episodic memory M ← ∅
2:for t = 1 to T do
3:     for k = 1 to B_t do
4:         if t > 1 then
5:              Compute the loss ℓ on (x_{t,k}, y_{t,k}) and the reference loss ℓ_ref on a random mini-batch (x_M, y_M) from M
6:              Compute the gradients g and g_ref of ℓ and ℓ_ref at w_{t,k}
7:              Solve the optimization problem in Eq. (9) to obtain θ*.
8:              Obtain α₁(w_{t,k}) and α₂(w_{t,k}) as in Appendix A.1.
9:         else
10:              Set α₁(w_{t,k}) = 1 and α₂(w_{t,k}) = 0.
11:         end if
12:         Update w_{t,k} using Eq. 6.
13:         Update the episodic memory M with (x_{t,k}, y_{t,k})
14:     end for
15:end for
Algorithm 1 MEGA, the proposed algorithm for lifelong learning. T is the number of tasks. B_t is the number of mini-batches of task t. M is the episodic memory. x_{t,k} is the k-th mini-batch of task t and y_{t,k} is the corresponding label. (x_M, y_M) is a random mini-batch from the episodic memory. w_{t,k} stands for the parameter after the k-th mini-batch during the training of the t-th task. ℓ is the training loss calculated on (x_{t,k}, y_{t,k}). ℓ_ref is the reference loss calculated on (x_M, y_M). α₁(w) and α₂(w) are defined in Eq. 5.
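Algorithm 1 maps onto a short training loop. The sketch below streams each task once, mixes the two gradients with a pluggable α rule, and fills the episodic memory uniformly at random. The interface (loss_grad, alphas) and the one-example "mini-batches" are our simplifying assumptions.

```python
import numpy as np

def mega_train(tasks, loss_grad, alphas, w0, lr=0.1, mem_per_task=10, seed=0):
    """tasks: list of (X, y) arrays, each streamed in a single pass.
    loss_grad(w, X, y) -> (loss, grad); alphas(l, l_ref, g, g_ref) -> (a1, a2).
    Returns the final weights and the episodic memory."""
    rng = np.random.default_rng(seed)
    w, memory = w0.copy(), []
    for X, y in tasks:
        for k in range(len(y)):  # one example per "mini-batch", single pass
            xb, yb = X[k : k + 1], y[k : k + 1]
            l, g = loss_grad(w, xb, yb)
            if memory:  # corresponds to t > 1 in Algorithm 1
                Xm, ym = memory[rng.integers(len(memory))]
                l_ref, g_ref = loss_grad(w, Xm, ym)
                a1, a2 = alphas(l, l_ref, g, g_ref)
            else:
                g_ref = 0.0
                a1, a2 = 1.0, 0.0
            w = w - lr * (a1 * g + a2 * g_ref)  # mixed stochastic gradient step
        # keep a few uniformly chosen examples of this task in memory
        idx = rng.choice(len(y), size=min(mem_per_task, len(y)), replace=False)
        memory.append((X[idx], y[idx]))
    return w, memory
```

Any rule for alphas can be plugged in here, e.g., the angle-based rule of Section 3.1 or the direct rule of Section 3.2.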

3.1 Angle-Based Approach

Note that the mixed stochastic gradient is a linear combination of g(w_{t,k}; ξ_{t,k}) and g_ref(w_{t,k}; ζ_{t,k}). While keeping the magnitude the same as g(w_{t,k}; ξ_{t,k}), geometrically the mixed stochastic gradient can be viewed as an appropriate rotation of g(w_{t,k}; ξ_{t,k}) by a desired angle θ. This perspective leads to the angle-based approach in our framework. The key idea of the angle-based approach is to first appropriately rotate the stochastic gradient calculated on the current task by an angle θ, and then use the rotated vector as the mixed stochastic gradient to conduct the update (6) in each mini-batch. For simplicity, we omit the subscripts and arguments later on unless otherwise specified, and simply write g and g_ref.

We use g̃ to denote the desired mixed stochastic gradient, which has the same magnitude as g. Specifically, we look for the mixed stochastic gradient whose direction aligns well with both g and g_ref. Mathematically, we want to maximize

max_{g̃: ‖g̃‖ = ‖g‖}  ⟨g̃, g⟩/‖g‖ + ⟨g̃, g_ref⟩/‖g_ref‖    (7)

which is equivalent to finding an angle θ such that

max_{θ ∈ [0, θ̃]}  cos θ + cos(θ̃ − θ)    (8)

where θ̃ is the angle between g and g_ref. To capture the relative importance of the current task and old tasks, which is crucial for lifelong learning, we introduce ℓ(w) and ℓ_ref(w) into (8),

max_{θ ∈ [0, θ̃]}  ℓ(w) cos θ + ℓ_ref(w) cos(θ̃ − θ)    (9)

Here we discuss several special cases of Eq. (9):

  • When ℓ_ref(w) = 0, then θ* = 0, and in this case α₁(w) = 1, α₂(w) = 0 in (6), which means the mixed stochastic gradient reduces to g. In the lifelong learning setting, ℓ_ref(w) = 0 implies that there is almost no catastrophic forgetting, and hence we can update the model parameters exclusively for the current task by moving in the direction of g.

  • When ℓ(w) = 0, then θ* = θ̃, and in this case α₁(w) = 0, α₂(w) = ‖g‖/‖g_ref‖, provided that ‖g_ref‖ ≠ 0 (define 0/0 = 0). This means the direction of the mixed stochastic gradient is the same as that of the stochastic gradient calculated on the data in the episodic memory (i.e., g_ref). In the lifelong learning setting, this update helps improve the performance on old tasks, i.e., it avoids catastrophic forgetting.

In the general case, we assume ℓ(w) and ℓ_ref(w) are both positive (the edge cases are covered in the discussion above). Since the optimization problem (9) is possibly nonconvex, we propose to use projected gradient ascent to approximately solve it. Mathematically, we perform multiple updates using the following formula,

θ_{s+1} = Π_{[0, θ̃]}(θ_s + β h′(θ_s))    (10)

where Π_{[0, θ̃]} is the projection operator onto the interval [0, θ̃], h(θ) = ℓ(w) cos θ + ℓ_ref(w) cos(θ̃ − θ), and β is the step size. It is not difficult to show that the smoothness parameter of the function h is ℓ(w) + ℓ_ref(w), and hence projected gradient ascent can converge to a stationary point (Nesterov, 1998). To avoid getting stuck at saddle points or suboptimal local maxima, we use multiple random starting points and select the one which achieves the largest function value. This strategy proves successful in our experiments.
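The projected gradient ascent with random restarts can be sketched as follows, assuming the objective h(θ) = ℓ(w) cos θ + ℓ_ref(w) cos(θ̃ − θ) of Section 3.1. Setting the step size to the inverse of the smoothness constant ℓ(w) + ℓ_ref(w) is a standard (assumed) choice; the function name is ours.

```python
import numpy as np

def solve_theta(l_cur, l_ref, theta_tilde, n_starts=3, n_steps=10, seed=0):
    """Maximize h(theta) = l_cur*cos(theta) + l_ref*cos(theta_tilde - theta)
    over [0, theta_tilde] via projected gradient ascent with restarts."""
    rng = np.random.default_rng(seed)
    step = 1.0 / max(l_cur + l_ref, 1e-12)  # inverse smoothness constant
    best_theta, best_val = 0.0, -np.inf
    for _ in range(n_starts):
        theta = rng.uniform(0.0, theta_tilde)
        for _ in range(n_steps):
            # h'(theta) = -l_cur*sin(theta) + l_ref*sin(theta_tilde - theta)
            grad = -l_cur * np.sin(theta) + l_ref * np.sin(theta_tilde - theta)
            theta = np.clip(theta + step * grad, 0.0, theta_tilde)
        val = l_cur * np.cos(theta) + l_ref * np.cos(theta_tilde - theta)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta
```

For equal losses the maximizer sits at θ̃/2, and for a vanishing memory loss it collapses to 0, matching the special cases discussed above.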

After we find the desired angle θ*, it is easy to obtain α₁(w) and α₂(w) in Eq. (6); for details, please refer to Appendix A.1. Algorithm 1 summarizes the full procedure.
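We do not reproduce Appendix A.1 here, but the step is geometrically well-defined: rotate g by θ toward g_ref inside the plane spanned by the two gradients while preserving ‖g‖, then read off the coefficients of the result in the (g, g_ref) basis. The construction below is our own derivation of that geometry, not the paper's exact formulas.

```python
import numpy as np

def mix_coefficients(g, g_ref, theta):
    """Return (a1, a2) such that a1*g + a2*g_ref equals g rotated by theta
    toward g_ref within span{g, g_ref}, with its norm kept at ||g||."""
    u = g / np.linalg.norm(g)
    r = g_ref - (g_ref @ u) * u          # component of g_ref orthogonal to g
    r_norm = np.linalg.norm(r)
    if r_norm < 1e-12:                   # g and g_ref are (anti-)parallel
        return float(np.cos(theta)), 0.0
    a2 = np.linalg.norm(g) * np.sin(theta) / r_norm
    a1 = np.cos(theta) - a2 * (g_ref @ u) / np.linalg.norm(g)
    return a1, a2
```

One can verify that a1*g + a2*g_ref has norm ‖g‖ and makes exactly the angle θ with g, which is what the update (6) needs.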

Comparison with Existing Works

There are several existing works setting θ in different manners. In Lopez-Paz and others (2017) and Chaudhry et al. (2018b), θ = 0 if ⟨g, g_ref⟩ ≥ 0, and θ = θ̃ − π/2 if ⟨g, g_ref⟩ < 0. Note that g_ref is defined differently in Lopez-Paz and others (2017) and Chaudhry et al. (2018b). In Lopez-Paz and others (2017), g_ref is calculated on the data of each task stored separately in the episodic memory, while in Chaudhry et al. (2018b), g_ref is computed on a random mini-batch sampled from the episodic memory. These approaches cannot capture the dynamics of the lifelong learning process, while our approach provides a more thorough treatment of this aspect by introducing the optimization problem (9). Given θ̃, our approach can dynamically set an appropriate angle θ based on ℓ(w) and ℓ_ref(w), while in the previous approaches θ depends only on θ̃. This makes our approach more general and flexible for handling lifelong learning problems.

3.2 Direct Approach

We also introduce MEGA-D, a direct approach for implementing MEGA. In MEGA-D, instead of rotating the stochastic gradient computed on the data in the current mini-batch as in the angle-based approach, we define α₁(w) and α₂(w) in the definition of the mixed stochastic gradient in a direct manner. Specifically, in the update of (6), we set α₁(w) = 1 and α₂(w) = ℓ_ref(w)/ℓ(w) if ℓ(w) > ε, and α₁(w) = 0 and α₂(w) = 1 if ℓ(w) ≤ ε, where ε is a small threshold. Intuitively, if ℓ(w) ≤ ε, the model performs well on the current task and can focus on improving performance on the data stored in the episodic memory, hence α₁(w) = 0 and α₂(w) = 1. Otherwise, we keep the balance of the two terms of the mixed stochastic gradient according to ℓ(w) and ℓ_ref(w).
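The direct rule amounts to a few lines, as we read it: if the current-task loss is essentially zero, put all weight on the memory term; otherwise weight the memory term by the loss ratio. The threshold value below is our own illustrative choice.

```python
def mega_d_alphas(l_cur, l_ref, eps=1e-3):
    """Direct (MEGA-D) weighting sketch: all weight on the episodic memory
    when the current-task loss is below eps; otherwise weight the memory
    term by the ratio l_ref / l_cur."""
    if l_cur <= eps:
        return 0.0, 1.0
    return 1.0, l_ref / l_cur
```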

4 Experiments

4.1 Datasets

In the experiments, we consider the following four conventional lifelong learning benchmarks,


  • Permuted MNIST (Kirkpatrick et al., 2017): this is a variant of the standard MNIST dataset (LeCun et al., 1998) of handwritten digits with 20 tasks. Each task has a fixed random permutation of the input pixels, which is applied to all the images of that task.

  • Split CIFAR (Zenke et al., 2017): this dataset consists of 20 disjoint subsets of the CIFAR-100 dataset (Krizhevsky and others, 2009), where each subset is formed by randomly sampling 5 classes without replacement from the original 100 classes.

  • Split CUB (Chaudhry et al., 2018b): the CUB dataset (Wah et al., 2011) is split into 20 disjoint subsets by randomly sampling 10 classes without replacement from the original 200 classes.

  • Split AWA (Chaudhry et al., 2018b): this dataset consists of 20 subsets of the AWA dataset (Lampert et al., 2009). Each subset is constructed by sampling 5 classes with replacement from a total of 50 classes, so the same class can appear in different subsets. As in Chaudhry et al. (2018b), in order to guarantee that each training example appears only once in the learning process, the training data of each class is split into disjoint sets according to its occurrences in the different subsets.

We also include Many Permutations, a variant of Permuted MNIST that introduces more non-stationarity into the learning process. In Many Permutations, there are a total of 100 tasks with 200 examples per task. The tasks are generated in the same way as in Permuted MNIST, that is, a fixed random permutation of the input pixels is applied to all the examples of a particular task.
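Task construction for Permuted MNIST and Many Permutations is the same operation at different scales: one fixed pixel permutation per task. A sketch follows; the flattened-image layout and the function name are our assumptions.

```python
import numpy as np

def make_permuted_tasks(images, labels, n_tasks, examples_per_task, seed=0):
    """Each task applies one fixed random pixel permutation to all of its
    images; labels are unchanged. images: (N, D) array of flattened pixels."""
    rng = np.random.default_rng(seed)
    tasks = []
    for _ in range(n_tasks):
        perm = rng.permutation(images.shape[1])  # fixed for the whole task
        idx = rng.choice(len(images), size=examples_per_task, replace=False)
        tasks.append((images[idx][:, perm], labels[idx]))
    return tasks
```

With n_tasks=20 and the full training set this mirrors Permuted MNIST; with n_tasks=100 and examples_per_task=200 it mirrors Many Permutations.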

4.2 Network Architectures

To be consistent with the previous works (Lopez-Paz and others, 2017; Chaudhry et al., 2018b), for Permuted MNIST we adopt a standard fully-connected network with two hidden layers. Each layer has 256 units with ReLU activation. For Split CIFAR we use a reduced ResNet18. For Split CUB and Split AWA, we use a standard ResNet18 (He et al., 2016).

4.3 Baselines and Experimental Settings

We compare the proposed MEGA with several state-of-the-art lifelong learning methods,


  • VAN: in VAN, a single network is trained continuously on a sequence of tasks in a standard supervised learning manner.

  • MULTI-TASK: in MULTI-TASK, a single network is trained on the shuffled data from all the tasks with a single pass.

  • Episodic memory based approaches: GEM (Lopez-Paz and others, 2017) and AGEM (Chaudhry et al., 2018b) are episodic memory based approaches which modify the current gradient when the angle between it and the gradient computed on the episodic memory is obtuse. MER (Riemer et al., 2018) is another recently proposed episodic memory based approach which maintains an experience-replay-style memory with reservoir sampling and employs a meta-learning-style training strategy.

  • Regularization-based approaches: EWC (Kirkpatrick et al., 2017), PI (Zenke et al., 2017), RWALK (Chaudhry et al., 2018a) and MAS (Aljundi et al., 2018) are regularization-based approaches which prevent the important weights of the old tasks from changing too much.

  • Knowledge transfer based approach: in PROG-NN (Rusu et al., 2016), a new “column” with lateral connections with previous hidden layers is added for each new task. This allows knowledge transfer between old tasks and the new task.

To be consistent with Chaudhry et al. (2018b), for episodic memory based approaches, the episodic memory size for each task is 250, 65, 50, and 100, and the batch size for computing the gradients on the episodic memory (if needed) is 256, 1300, 128, and 128 for MNIST, CIFAR, CUB, and AWA, respectively. To fill the episodic memory, the examples are chosen uniformly at random for each task, as in Chaudhry et al. (2018b). For each dataset, 17 tasks are used for training and 3 tasks are used for hyperparameter search. For MEGA, we do not conduct an exhaustive hyperparameter search and reuse the hyperparameters of AGEM (Chaudhry et al., 2018b). For the other baselines, we use the best hyperparameters found by Chaudhry et al. (2018b); for the detailed hyperparameters, please see Appendix G of Chaudhry et al. (2018b). In MEGA, we solve Eq. (9) three times with different random starting points, and the update in Eq. (10) is repeated for ten iterations.
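Filling the episodic memory by choosing examples uniformly at random per task, as described above, can be sketched as follows (names are ours):

```python
import numpy as np

def fill_memory(task_data, mem_per_task, seed=0):
    """task_data: list of (X, y) pairs, one per task. Returns a dict mapping
    task id -> (X_subset, y_subset) with mem_per_task examples chosen
    uniformly at random without replacement."""
    rng = np.random.default_rng(seed)
    memory = {}
    for t, (X, y) in enumerate(task_data):
        idx = rng.choice(len(y), size=min(mem_per_task, len(y)), replace=False)
        memory[t] = (X[idx], y[idx])
    return memory
```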

(a) Permuted MNIST
(b) Split CIFAR
(c) Split CUB
(d) Split AWA
Figure 1: Performance of lifelong learning models across different measures on Permuted MNIST, Split CIFAR, Split CUB and Split AWA.

5 Results

5.1 MEGA vs. Baselines

In Fig. 1 we show the results across different measures on all the benchmark datasets. We have the following observations. First, the proposed MEGA outperforms all baselines across the benchmarks, except that PROG-NN achieves a slightly higher accuracy on Permuted MNIST. As we can see from the memory comparison, PROG-NN is very memory-inefficient since it allocates a new network for each task, so the number of parameters grows super-linearly with the number of tasks. This becomes problematic when large networks are used: PROG-NN runs out of memory on Split CUB and Split AWA, which prevents it from scaling up to real-life problems. On the other datasets, MEGA consistently performs better than all the baselines. From Fig. 2 we can see that on Split CUB, MEGA even surpasses the multi-task baseline, which was previously believed to be an upper bound on the performance of lifelong learning algorithms (Chaudhry et al., 2018b). Second, MEGA achieves the lowest Forgetting Measure across all the datasets, which indicates its ability to overcome catastrophic forgetting. Third, the proposed MEGA also obtains a high LCA across all the datasets, which shows that MEGA also learns quickly. The evolution of LCA in the first ten mini-batches across all the datasets is shown in Fig. 3. Last, compared with AGEM (Chaudhry et al., 2018b), which is the state-of-the-art method for lifelong learning, MEGA has the same memory cost and similar time complexity. For detailed results, please refer to Table 4 and Table 5 in Appendix A.2.

In Fig. 2 we show the evolution of average accuracy during the lifelong learning process. As more tasks are added, the average accuracy of the baselines generally drops due to catastrophic forgetting, while MEGA maintains and even improves its performance. This shows that MEGA has a clear advantage over the state-of-the-art lifelong learning methods.

(a) Permuted MNIST
(b) Split CIFAR
(c) Split CUB
(d) Split AWA
Figure 2: Evolution of average accuracy during the lifelong learning process.

5.2 Long-term Remembering

In Table 1 we show the Long-term Remembering (LTR) results of some representative lifelong learning methods on different datasets. As stated before, a low LTR indicates that an algorithm can maintain the performance on the tasks trained at the beginning. From Table 1 we can see that the proposed MEGA algorithm achieves the lowest LTR across all the datasets. This demonstrates that MEGA can learn tasks in succession without forgetting the initial tasks, which is crucial for real-world lifelong learning applications.

Methods Permuted MNIST Split CIFAR Split CUB Split AWA
MEGA 0.524 ± 0.017 0.356 ± 0.114 0.002 ± 0.002 0.070 ± 0.114
AGEM 0.716 ± 0.048 0.643 ± 0.124 0.456 ± 0.174 0.178 ± 0.082
EWC 3.292 ± 0.135 2.493 ± 0.427 1.021 ± 0.210 0.675 ± 0.214
VAN 5.375 ± 0.194 2.613 ± 0.174 0.976 ± 0.215 0.202 ± 0.090
Table 1: Results of Long-term Remembering (LTR).
(a) Permuted MNIST
(b) Split CIFAR
(c) Split CUB
(d) Split AWA
Figure 3: LCA of first ten mini-batches on different datasets.

5.3 Direct Approach

We show the comparison of MEGA and MEGA-D in Table 2. The performance of MEGA-D is on par with MEGA across all the datasets. This shows that it is important to explicitly consider the loss on the episodic memory in order to overcome catastrophic forgetting.

Methods Permuted MNIST Split CIFAR Split CUB Split AWA
 Avg. Acc. (%) / Forgetting for each dataset
MEGA 91.21 ± 0.10 / 0.05 ± 0.01 66.12 ± 1.93 / 0.05 ± 0.02 80.58 ± 1.94 / 0.01 ± 0.01 54.28 ± 4.84 / 0.04 ± 0.04
MEGA-D 91.14 ± 0.16 / 0.05 ± 0.02 66.72 ± 1.50 / 0.04 ± 0.01 79.68 ± 2.37 / 0.01 ± 0.02 54.67 ± 4.69 / 0.04 ± 0.03
Table 2: Comparison of MEGA and MEGA-D (Average Accuracy ± std / Forgetting Measure ± std).

5.4 Many Permutations

We show the results on Many Permutations in Table 3. Compared with Permuted MNIST, Many Permutations has 5 times more tasks (100) and much fewer examples per task (200). This introduces more non-stationarity into the learning process. Nevertheless, the proposed MEGA achieves competitive results in this setting. While MER achieves results similar to MEGA's, MEGA is much more time-efficient since it does not rely on a meta-learning procedure.

Methods VAN EWC GEM AGEM MER MEGA
Average Accuracy (%) 32.62 ± 0.43 33.46 ± 0.46 56.76 ± 0.29 34.15 ± 0.55 62.52 ± 0.32 62.48 ± 0.51
Table 3: Results on Many Permutations.

6 Related Work

Improving the continual learning ability of neural networks is a prerequisite for extending them to more practical vision tasks. Several lines of lifelong learning methods have been proposed recently; we categorize them into different types based on their methodology.

Regularization based approaches: EWC (Kirkpatrick et al., 2017) adopts the Fisher information matrix to prevent important weights for old tasks from changing drastically. In PI (Zenke et al., 2017), the authors introduce intelligent synapses which endow each individual synapse with a local measure of "importance" to keep old memories from being overwritten. RWALK (Chaudhry et al., 2018a) utilizes a KL-divergence based regularization for preserving knowledge of old tasks, while in MAS (Aljundi et al., 2018) the importance measure for each parameter of the network is computed based on how sensitive the predicted output function is to a change in that parameter.

Knowledge transfer based methods: PROG-NN (Rusu et al., 2016) is a representative knowledge transfer based lifelong method. In PROG-NN, a new “column” with lateral connections to previous hidden layers is added for each new task. The lateral connections allow the new task to leverage the knowledge extracted from the old tasks.

Episodic memory based approaches: In episodic memory based lifelong learning methods, a small episodic memory is used for storing a subset of the examples from old tasks. Different episodic memory based approaches differ in the way the reference gradients on the episodic memory are computed. GEM (Lopez-Paz and others, 2017) computes the reference gradients using each individual previous task, while in AGEM (Chaudhry et al., 2018b) the reference gradient is computed on a batch of examples randomly sampled from the episodic memory. MER (Riemer et al., 2018) is a recently proposed lifelong learning algorithm which maintains an experience-replay-style memory with reservoir sampling and employs a meta-learning-style training strategy. Our proposed method is closely related to Chaudhry et al. (2018b) and is more general and flexible than previous approaches.
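For contrast with the mixed-gradient view, the AGEM-style update described above can be written in a few lines: keep the current gradient when it agrees with the memory gradient, otherwise remove its conflicting component (our sketch of the published rule):

```python
import numpy as np

def agem_update_direction(g, g_ref):
    """If <g, g_ref> >= 0, keep g unchanged; otherwise project out the
    component of g along g_ref, so the update no longer increases the
    memory loss to first order."""
    dot = g @ g_ref
    if dot >= 0:
        return g
    return g - (dot / (g_ref @ g_ref)) * g_ref
```

Unlike MEGA's angle chosen from both losses, this rule depends only on the sign of the inner product between the two gradients.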

7 Conclusion

In this paper, we present a lifelong learning algorithm called MEGA which achieves the state-of-the-art performance across all the benchmark datasets. In MEGA, we cast the lifelong learning problem as an optimization problem with composite objective and solve it with the proposed mixed stochastic gradient. We also propose an important lifelong learning metric called LTR to characterize the ability of lifelong learning algorithms to maintain performance on the tasks trained in the far past. Extensive experimental results show that the proposed MEGA achieves superior results across all the considered metrics and establishes the new state-of-the-art on all the datasets.

References

  • R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, and T. Tuytelaars (2018) Memory aware synapses: learning what (not) to forget. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 139–154.
  • N. Cesa-Bianchi, A. Conconi, and C. Gentile (2004) On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory 50 (9), pp. 2050–2057.
  • A. Chaudhry, P. K. Dokania, T. Ajanthan, and P. H. Torr (2018a) Riemannian walk for incremental learning: understanding forgetting and intransigence. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 532–547.
  • A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny (2018b) Efficient lifelong learning with A-GEM. arXiv preprint arXiv:1812.00420.
  • R. M. French (1999) Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences 3 (4), pp. 128–135.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep Learning. Vol. 1, MIT Press.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. (2017) Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114 (13), pp. 3521–3526.
  • A. Krizhevsky et al. (2009) Learning multiple layers of features from tiny images. Technical report.
  • C. H. Lampert, H. Nickisch, and S. Harmeling (2009) Learning to detect unseen object classes by between-class attribute transfer. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 951–958.
  • Y. LeCun, C. Cortes, and C. J. Burges (1998) The MNIST database. URL http://yann.lecun.com/exdb/mnist.
  • D. Lopez-Paz and M. Ranzato (2017) Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems, pp. 6467–6476.
  • Y. Nesterov (1998) Introductory lectures on convex programming, Volume I: Basic course. Lecture notes 3 (4), pp. 5.
  • G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter (2019) Continual lifelong learning with neural networks: a review. Neural Networks.
  • R. Ratcliff (1990) Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychological Review 97 (2), pp. 285.
  • M. Riemer, I. Cases, R. Ajemian, M. Liu, I. Rish, Y. Tu, and G. Tesauro (2018) Learning to learn without forgetting by maximizing transfer and minimizing interference. arXiv preprint arXiv:1810.11910.
  • A. Robins (1995) Catastrophic forgetting, rehearsal and pseudorehearsal. Connection Science 7 (2), pp. 123–146.
  • A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell (2016) Progressive neural networks. arXiv preprint arXiv:1606.04671.
  • S. Thrun and T. M. Mitchell (1995) Lifelong robot learning. Robotics and Autonomous Systems 15 (1-2), pp. 25–46.
  • C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie (2011) The Caltech-UCSD Birds-200-2011 dataset.
  • F. Zenke, B. Poole, and S. Ganguli (2017) Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3987–3995.

Appendix A

A.1 Some Derivations

For notation simplicity, we use , , , to replace , , , respectively. If , then , . Otherwise, the goal is to solve

(11)

The solution of (11) is

(12)

A.2 Result Tables

Methods    Permuted MNIST                              Split CIFAR
           Accuracy     Forgetting   LCA              Accuracy     Forgetting   LCA
VAN        47.55±2.37   0.52±0.026   0.259±0.005      40.44±1.02   0.27±0.006   0.309±0.011
EWC        68.68±0.98   0.28±0.010   0.276±0.002      42.67±4.24   0.26±0.039   0.336±0.010
MAS        70.30±1.67   0.26±0.018   0.298±0.006      42.35±3.52   0.26±0.030   0.332±0.010
RWALK      85.60±0.71   0.08±0.007   0.319±0.003      42.11±3.69   0.27±0.032   0.334±0.012
MER        -            -            -                37.27±1.68   0.03±0.030   0.051±0.101
PROG-NN    93.55±0.06   0.0±0.000    0.198±0.006      59.79±1.23   0.0±0.000    0.208±0.002
GEM        89.50±0.48   0.06±0.004   0.230±0.005      61.20±0.78   0.06±0.007   0.360±0.007
AGEM       89.32±0.46   0.07±0.004   0.277±0.008      61.28±1.88   0.09±0.018   0.350±0.013
MEGA       91.21±0.10   0.05±0.001   0.283±0.004      66.12±1.94   0.06±0.015   0.375±0.012
Table 4: Average Accuracy, Forgetting Measure, and LCA of different methods on Permuted MNIST and Split CIFAR. The results are averaged across 5 runs with different random seeds.
Methods    Split CUB                                   Split AWA
           Accuracy     Forgetting   LCA              Accuracy     Forgetting   LCA
VAN        53.89±2.00   0.13±0.020   0.292±0.008      30.35±2.81   0.04±0.013   0.214±0.008
EWC        53.56±1.67   0.14±0.024   0.292±0.009      33.43±3.07   0.08±0.021   0.257±0.011
MAS        54.12±1.72   0.13±0.013   0.293±0.008      33.83±2.99   0.08±0.022   0.257±0.011
RWALK      54.11±1.71   0.13±0.013   0.293±0.009      33.63±2.64   0.08±0.023   0.258±0.011
PI         55.04±3.05   0.12±0.026   0.292±0.010      33.86±2.77   0.08±0.022   0.259±0.011
AGEM       61.82±3.72   0.08±0.021   0.302±0.011      44.95±2.97   0.05±0.014   0.287±0.012
MEGA       80.58±1.94   0.01±0.017   0.311±0.010      54.28±4.84   0.05±0.040   0.305±0.015
Table 5: Average Accuracy, Forgetting Measure, and LCA of different methods on Split CUB and Split AWA. The results are averaged across 10 runs with different random seeds.

A.3 Learning Process

In this section we report the result matrices for MEGA and AGEM (Chaudhry et al., 2018b) on each dataset. The entry (i, j) of the matrix is the test accuracy on the j-th task after the model is trained on the i-th task.
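The summary metrics reported in Tables 4 and 5 can be recovered from such a result matrix. The sketch below assumes the standard definitions used in this line of work: average accuracy is the mean of the last row, and forgetting averages, over all but the last task, the drop from a task's best earlier accuracy to its final accuracy. The function names are ours.

```python
import numpy as np

def average_accuracy(R):
    """Mean test accuracy over all tasks after training on the last task.
    R[i, j] = accuracy on task j after training on task i."""
    return float(R[-1].mean())

def forgetting(R):
    """Average drop from the best accuracy each task ever reached to its
    accuracy after the final task (the last task is excluded, since it
    has had no opportunity to be forgotten)."""
    T = R.shape[0]
    drops = [R[:T - 1, j].max() - R[-1, j] for j in range(T - 1)]
    return float(np.mean(drops))
```

For a tiny two-task matrix `R = [[0.9, 0.0], [0.8, 0.95]]`, the average accuracy is 0.875 and the forgetting is 0.1: task 1 peaked at 0.9 and ended at 0.8.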

A.3.1 Permuted MNIST

MEGA

0.9613 0.1091 0.1229 0.0832 0.1374 0.0708 0.0907 0.1017 0.1165 0.1286 0.0979 0.1182 0.1188 0.0886 0.0968 0.0854 0.0928
0.9535 0.9645 0.0895 0.0997 0.1191 0.0685 0.0803 0.1022 0.1165 0.1472 0.1054 0.1112 0.1264 0.1027 0.0872 0.0979 0.0993
0.9391 0.9556 0.9596 0.0996 0.1020 0.0900 0.0807 0.1083 0.0959 0.1400 0.1001 0.1012 0.1096 0.1085 0.0977 0.0716 0.0768
0.9295 0.9473 0.9527 0.9477 0.1113 0.0725 0.0856 0.1033 0.0884 0.1209 0.0847 0.1149 0.1285 0.0939 0.1193 0.0867 0.0824
0.9206 0.9405 0.9437 0.9569 0.9611 0.0785 0.0884 0.1113 0.0926 0.1189 0.0936 0.1337 0.1544 0.1154 0.1282 0.1010 0.0994
0.9119 0.9347 0.9378 0.9481 0.9547 0.9594 0.0916 0.1200 0.1013 0.1042 0.0908 0.1380 0.1415 0.1199 0.1210 0.0908 0.0847
0.9083 0.9261 0.9348 0.9419 0.9478 0.9556 0.9575 0.1092 0.1040 0.1083 0.0863 0.1202 0.1286 0.1177 0.1250 0.0801 0.0931
0.9100 0.9192 0.9291 0.9332 0.9419 0.9462 0.9527 0.9598 0.1152 0.1132 0.0945 0.1054 0.1248 0.1228 0.1187 0.0945 0.0934
0.9022 0.9133 0.9215 0.9271 0.9344 0.9381 0.9430 0.9476 0.9551 0.1187 0.1162 0.1119 0.1364 0.1249 0.1108 0.1012 0.1059
0.8974 0.9074 0.9147 0.9242 0.9289 0.9348 0.9367 0.9404 0.9519 0.9571 0.1102 0.1032 0.1536 0.1261 0.1122 0.1036 0.1093
0.8957 0.9042 0.9146 0.9193 0.9229 0.9313 0.9321 0.9318 0.9409 0.9545 0.9591 0.1144 0.1368 0.1373 0.1143 0.1092 0.1169
0.8863 0.8981 0.9056 0.9127 0.9148 0.9220 0.9270 0.9249 0.9335 0.9444 0.9528 0.9564 0.1517 0.1103 0.1051 0.1002 0.1390
0.8840 0.8992 0.9054 0.9085 0.9149 0.9179 0.9238 0.9198 0.9304 0.9419 0.9441 0.9523 0.9570 0.1292 0.1083 0.0926 0.1301
0.8808 0.8901 0.8994 0.8986 0.9084 0.9113 0.9168 0.9185 0.9239 0.9360 0.9382 0.9453 0.9522 0.9589 0.1017 0.0941 0.1277
0.8770 0.8850 0.8957 0.8926 0.9012 0.9101 0.9090 0.9132 0.9188 0.9301 0.9358 0.9368 0.9430 0.9508 0.9521 0.0946 0.1334
0.8752 0.8806 0.8911 0.8854 0.8965 0.9070 0.9062 0.9059 0.9145 0.9265 0.9286 0.9338 0.9374 0.9434 0.9462 0.9601 0.1291
0.8732 0.8765 0.8824 0.8809 0.8945 0.9024 0.9016 0.9007 0.9088 0.9202 0.9228 0.9276 0.9279 0.9376 0.9408 0.9521 0.9556

AGEM

0.9613 0.1091 0.1229 0.0832 0.1374 0.0708 0.0907 0.1017 0.1165 0.1286 0.0979 0.1182 0.1188 0.0886 0.0968 0.0854 0.0928
0.9509 0.9645 0.0956 0.0991 0.1304 0.0696 0.0840 0.1033 0.1219 0.1454 0.1064 0.1133 0.1314 0.1043 0.0883 0.0979 0.0973
0.9410 0.9545 0.9615 0.0995 0.0964 0.0921 0.0710 0.1126 0.1176 0.1402 0.1112 0.1026 0.1185 0.1204 0.1077 0.0779 0.0761
0.9299 0.9450 0.9540 0.9546 0.1046 0.0788 0.0959 0.1033 0.1096 0.1266 0.1015 0.1152 0.1476 0.0885 0.1375 0.0984 0.0831
0.9151 0.9361 0.9425 0.9551 0.9588 0.0803 0.0809 0.1143 0.1063 0.1227 0.1066 0.1253 0.1436 0.1154 0.1131 0.1079 0.0915
0.9068 0.9312 0.9401 0.9450 0.9566 0.9590 0.0892 0.1189 0.1285 0.1086 0.1007 0.1433 0.1279 0.1179 0.1097 0.0892 0.0865
0.9015 0.9228 0.9339 0.9385 0.9473 0.9548 0.9586 0.1063 0.1073 0.1102 0.1048 0.1164 0.1291 0.1284 0.1341 0.0854 0.1024
0.8980 0.9155 0.9248 0.9280 0.9356 0.9403 0.9539 0.9580 0.1015 0.1231 0.1129 0.1125 0.1267 0.1133 0.1220 0.0921 0.0985
0.8952 0.9055 0.9201 0.9182 0.9273 0.9310 0.9447 0.9435 0.9512 0.1098 0.1374 0.1166 0.1264 0.1064 0.1183 0.0986 0.1048
0.8846 0.8996 0.9083 0.9154 0.9189 0.9267 0.9363 0.9339 0.9513 0.9558 0.1243 0.1095 0.1179 0.1137 0.1126 0.0945 0.0979
0.8764 0.8977 0.9011 0.9073 0.9086 0.9167 0.9292 0.9274 0.9386 0.9481 0.9631 0.1116 0.1099 0.1417 0.1100 0.0975 0.1166
0.8710 0.8882 0.8937 0.8922 0.9043 0.9077 0.9151 0.9174 0.9279 0.9324 0.9518 0.9572 0.1346 0.1240 0.0964 0.0930 0.1283
0.8625 0.8822 0.8847 0.8855 0.9013 0.8990 0.9093 0.9088 0.9189 0.9295 0.9411 0.9458 0.9533 0.1309 0.1059 0.0987 0.1139
0.8581 0.8784 0.8774 0.8817 0.8954 0.8938 0.8986 0.9003 0.9082 0.9225 0.9307 0.9350 0.9435 0.9603 0.1048 0.1023 0.1048
0.8492 0.8674 0.8732 0.8697 0.8828 0.8826 0.8930 0.8898 0.8962 0.9098 0.9184 0.9267 0.9290 0.9425 0.9542 0.1012 0.1070
0.8322 0.8700 0.8644 0.8493 0.8765 0.8787 0.8904 0.8848 0.8883 0.8979 0.9110 0.9158 0.9177 0.9299 0.9403 0.9609 0.1179
0.8438 0.8603 0.8555 0.8488 0.8864 0.8785 0.8798 0.8702 0.8916 0.8968 0.9076 0.9094 0.9092 0.9228 0.9228 0.9463 0.9551

A.3.2 Split CIFAR

MEGA

0.6472 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6260 0.5824 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6324 0.5700 0.6300 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6236 0.5496 0.5624 0.6452 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6132 0.5612 0.6048 0.6736 0.6960 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6140 0.5628 0.5692 0.6632 0.6792 0.7688 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5780 0.5500 0.5864 0.6364 0.6792 0.7420 0.6868 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5756 0.5308 0.5764 0.6292 0.6500 0.6820 0.6580 0.6580 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6212 0.5704 0.5876 0.6376 0.6600 0.7056 0.6636 0.6916 0.7376 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5992 0.5580 0.5828 0.6212 0.6528 0.6512 0.6496 0.6700 0.7276 0.6732 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6104 0.5552 0.5804 0.6396 0.6700 0.6960 0.6568 0.6752 0.7412 0.6752 0.7432 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5880 0.5516 0.5816 0.6248 0.6552 0.6568 0.6412 0.6284 0.7044 0.6304 0.7196 0.6660 0.0000 0.0000 0.0000 0.0000 0.0000
0.6020 0.5628 0.5792 0.6164 0.6444 0.6636 0.6356 0.6536 0.7008 0.6132 0.7000 0.6336 0.7108 0.0000 0.0000 0.0000 0.0000
0.6124 0.5692 0.5924 0.6420 0.6516 0.6912 0.6352 0.6492 0.6848 0.6400 0.6872 0.6312 0.7280 0.7596 0.0000 0.0000 0.0000
0.6012 0.5468 0.5908 0.6128 0.6552 0.6852 0.6288 0.6428 0.6704 0.6176 0.6988 0.6268 0.7088 0.7348 0.6324 0.0000 0.0000
0.6244 0.5588 0.5960 0.6432 0.6448 0.6868 0.6388 0.6540 0.6896 0.6056 0.6920 0.6192 0.7124 0.7344 0.6560 0.7604 0.0000
0.6088 0.5896 0.5840 0.6552 0.6716 0.6904 0.6584 0.6372 0.7032 0.6300 0.6900 0.5864 0.6832 0.7140 0.6348 0.7264 0.7780

AGEM

0.6772 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5948 0.5764 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6324 0.5828 0.6432 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5980 0.5384 0.5396 0.6456 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5864 0.5404 0.5576 0.6436 0.7004 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5728 0.5392 0.5068 0.5940 0.6344 0.7180 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5572 0.5404 0.5308 0.6224 0.6116 0.6520 0.6688 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6064 0.5356 0.5492 0.5872 0.6164 0.6532 0.6296 0.6724 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6060 0.5472 0.5528 0.6236 0.5920 0.6348 0.6076 0.6348 0.6972 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.6004 0.5080 0.4960 0.6128 0.5656 0.6356 0.5752 0.6140 0.6580 0.6792 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5992 0.5408 0.5332 0.5964 0.5928 0.6520 0.5928 0.6304 0.6764 0.5916 0.7364 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.5748 0.5020 0.5104 0.5936 0.6016 0.6184 0.5772 0.6164 0.6408 0.5688 0.6776 0.6436 0.0000 0.0000 0.0000 0.0000 0.0000
0.6056 0.5100 0.5200 0.5916 0.6012 0.6056 0.5816 0.6060 0.6236 0.5808 0.6288 0.5768 0.7332 0.0000 0.0000 0.0000 0.0000
0.6184 0.5344 0.5308 0.5888 0.6116 0.6188 0.6012 0.6248 0.6136 0.5836 0.6428 0.5688 0.6524 0.7392 0.0000 0.0000 0.0000
0.6012 0.5220 0.5488 0.6008 0.5828 0.6048 0.5728 0.5884 0.6356 0.5740 0.6476 0.5540 0.6520 0.6804 0.6460 0.0000 0.0000
0.5984 0.5360 0.5520 0.5808 0.5704 0.6184 0.6068 0.6108 0.6452 0.5404 0.6520 0.5256 0.6624 0.6512 0.5864 0.7388 0.0000
0.6232 0.5356 0.5412 0.6104 0.6080 0.6248 0.5944 0.5900 0.6492 0.5872 0.6468 0.5352 0.6336 0.6520 0.5908 0.6444 0.7508

A.3.3 Split CUB


A.3.4 Split AWA

