Learning Interpretable Shapelets for Time Series Classification through Adversarial Regularization
Time series classification can be successfully tackled by jointly learning a shapelet-based representation of the series in the dataset and classifying the series according to this representation. However, although the learned shapelets are discriminative, they are not always similar to pieces of a real series in the dataset. This makes it difficult to interpret the decision, i.e. difficult to analyze if there are particular behaviors in a series that triggered the decision. In this paper, we make use of a simple convolutional network to tackle the time series classification task and we introduce an adversarial regularization to constrain the model to learn more interpretable shapelets. Our classification results on all the usual time series benchmarks are comparable with the results obtained by similar state-of-the-art algorithms but our adversarially regularized method learns shapelets that are, by design, interpretable.
A time series (TS) is a series of time-ordered values, each of which is a feature vector describing one data point. The number of values is the length of the time series. If the feature vector has a single dimension, the series is said to be univariate; otherwise it is said to be multivariate. In this paper, we are interested in the Time Series Classification (TSC) task. We are given a training set composed of time series and their associated labels (target variable). Our aim is to learn a function that maps each time series to its label, in order to predict the labels of new incoming time series. The time series classification problem has been studied in countless applications, ranging from stock exchange evolution and daily energy consumption to medical sensors and videos.
Many methods have been developed to tackle this problem (see [2] for a review). One very successful category of methods consists in “finding” discriminative phase-independent subsequences, called shapelets, that can be used to classify the series. In the first papers about shapelet-based time series classification [26, 18], the shapelets were directly extracted from the training set and the selected shapelets could be used a posteriori to explain the classifier’s decision. However, either the shapelet enumeration and selection processes were very costly, or the selection was fast but did not yield good performance (as discussed in Section 2). Jointly learning a shapelet-based representation of the series in the dataset and classifying the series according to this representation [15, 11] made it possible to obtain discriminative shapelets in a much more efficient way. An example of such a learned shapelet, obtained with the method from [11], is given in Figure 1 (top). However, while the learned shapelets are definitely discriminative, they are often different from actual pieces of a real series in the dataset. As such, the classification decision is difficult to interpret, i.e. it is difficult to determine what particular behavior in a time series triggered the classification decision. Note that the same interpretability issue arises with ensemble classifiers such as [3, 16], where one decision depends on the presence of multiple shapelets. One of the main challenges nowadays is to enrich Machine Learning (ML) systems, and in particular black box models such as neural networks, so that they have the ability to explain their outputs to human users. In many scenarios, it may be risky, unacceptable, or simply illegal to let artificial intelligent systems make decisions without any human supervision. Hence, it is necessary for ML systems to provide an explanation of their decisions to all the humans concerned.
In this paper, we make use of a simple convolutional network to classify time series and we show how one can use adversarial techniques to regularize the parameters of this network such that it learns shapelets that could be more useful to interpret the classifier’s decision. Section 2 presents the related work on time series classification, interpretability of models and adversarial training. We present our adversarial parameter regularization method in Section 3. In Section 4, we show quantitative and qualitative results on the usual time series benchmarks  that are both on par with state-of-the-art methods and very interesting to interpret the neural network predictions.
2 Related Work
In this section we review the literature on Time Series Classification (TSC), on tools for understanding black box model predictions and on adversarial training.
2.1 Time Series Classification
In the TSC literature, two main families of approaches have been designed. In the first family, a dedicated metric is used to compare the time series, and the decision is based on the resulting similarities. For example, [21] uses Dynamic Time Warping (DTW) to find an optimal alignment between time series and provides an alignment cost that can be used to assess the similarity. The second family of methods is based on the extraction of features in the time series. Among these works, shapelet-based classifiers have attracted a lot of attention from the research community.
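To make the alignment cost mentioned above concrete, here is a minimal sketch of the classical DTW dynamic program for univariate series. The function name `dtw_cost` and the squared-difference local cost are assumptions made for illustration; this is not the implementation used by any of the cited works.

```python
import numpy as np

def dtw_cost(a, b):
    """Cumulative cost of the best monotone alignment between 1-D series a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = cost of the best alignment of a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost
            # extend the cheapest of the three admissible predecessor alignments
            acc[i, j] = local + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]

# A series and a time-stretched copy of it align perfectly.
print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
```

The double loop makes the cost quadratic in the series length, which is one reason why metric-based classifiers can be slow on long series.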
Shapelets are discriminative subseries that can either be extracted from a set of time series or learned so as to minimize an objective function. They were introduced in [26], in which a binary decision tree is built whose nodes are shapelets and whose subtrees contain the subsets of time series that do or do not contain that shapelet. In this work, shapelets are extracted from a training set of time series, and building the decision tree requires testing all possible subseries from the training set. This makes the method intractable for large-scale learning, since its overall time complexity grows polynomially with both the number of training time series and the average length of the time series in the training set. This high time complexity has led to the use of heuristics in order to select the shapelets more efficiently. In [18] (Fast Shapelets, FS), the authors rely on quantized time series and random projections in order to speed up the shapelet search. Note however that these improvements in time complexity are obtained at the cost of a lower classification accuracy, as reported in [2]. The Shapelet Transform (ST) [15] consists in transforming time series into a feature vector whose coordinates represent distances between the time series and the shapelets selected beforehand. It hence needs to select a shapelet set before transforming the time series. The resulting vectors are then given to a classifier in order to build the decision function. ST has the same training time complexity, which makes it unfit for large-scale learning.
To cope with the high complexity of search-based methods, other strategies have been designed for shapelet selection. On the one hand, some attention has been paid to random sampling of shapelets from the training set [14]. On the other hand, Grabocka et al. [11] showed that shapelets could be learned using a gradient-descent-based optimization algorithm. The method, referred to as Learning Shapelets (LS) in the following, jointly learns the shapelets and the parameters of a logistic regression classifier. This makes the method very similar in spirit to a neural network with a single convolutional layer followed by a fully connected classification layer, where the convolution operation is replaced by a sliding-window local distance computation and a min-pooling aggregator is used for temporal aggregation.
Closely related to shapelet-based methods (as stated above), variants of Convolutional Neural Networks (CNN) have been introduced for the TSC task [25]. These are mostly one-dimensional variants of CNN models developed in the Computer Vision field. Note however that most models are rather shallow, which is likely to be related to the moderate sizes of the benchmark datasets present in the UCR/UEA archive [4]. A review of these models can be found in [8].
Finally, ensemble-based methods, such as COTE [3] or HIVE-COTE [16], that rely on several of the above-presented standalone classifiers are now considered state-of-the-art for the TSC task. Note however that these methods tend to be computationally expensive, memory-hungry, and difficult to interpret (as stated in Section 1) due to the combination of many different core classifiers.
In this paper, we propose a method that is scalable (compared to methods such as Shapelets  or ST ), yields interpretable results which can be used to explain the classifier’s decision (compared to ensemble approaches or unconstrained approaches such as  or ), and exhibits good classification accuracy (compared to FS ).
2.2 Model Interpretability
Among the vast number of existing classifiers, some are easily interpretable (e.g. decision trees, classification rules), while others are difficult to interpret (e.g. ensemble methods, neural networks that can be considered as black-boxes). Interpretation of black box classifiers usually consists in designing an interpretation layer between the classifier and the human level. Two criteria refine the category of methods to interpret classifiers: global versus local explanations, and black-box dependent versus agnostic. In this category, state-of-the-art methods are Local Interpretable Model-agnostic Explanations (LIME and Anchors) [19, 20] and SHapley Additive exPlanations (SHAP) . SHAP values come with the black-box local estimation advantages of LIME, but also with theoretical guarantees. A higher absolute SHAP value of an attribute compared to another means that it has a higher predictive or discriminative power.
In this paper, we are interested in making the decision of a neural network understandable. We follow the concept of interpretable shapelet as in : for a TSC model, a simple explanation should not directly come from the vector of attributes describing each point of each time series but rather from some discriminative shapelets internally learned to produce an intermediate representation to classify the series. Solutions such as LIME, Anchors and SHAP which are not designed to inspect the internal representation of a model are thus not well suited for our problem.
Fang et al.  have a similar goal as ours (to produce interpretable discriminative shapelets) and build on both the works from  (in this case the candidate shapelets are extracted with a piecewise aggregate approximation) and from  to automatically refine the “handcrafted” shapelets. Contrarily to our method, there is no explicit constraint on the learning process that ensures the interpretability of the shapelets. Besides, their experimental validation makes it hard to fully grasp the benefits and limitations of the proposed method since the algorithm is evaluated on a small subset of UCR/UEA datasets  and they provide visualizations for only a couple of the learned shapelets.
2.3 Adversarial Training
Adversarial training of neural networks has been popularized by Generative Adversarial Networks (GANs) [10] and their numerous variants (see https://github.com/hindupuravinash/the-gan-zoo for a list). A GAN is a combination of two neural networks: a generator and a discriminator which compete against each other during the training process to reach an equilibrium where the discriminator cannot distinguish between the generator outputs and real training data. In a GAN, the adversarial network is used to push the generator towards producing data as similar to real data as possible. Other (non-generative) adversarial training settings have been studied, for example in the context of domain adaptation [24]. In this case, the adversarial network is used to regularize the latent representation learned by the classifier such that it becomes domain-independent. The recent work from [27] also uses adversarial regularization to constrain the latent representation of an autoencoder to follow a given distribution.
In this paper, we propose an adversarial regularization approach which is unique as 1) we use a non-generative adversarial approach, 2) we do not work on a latent representation or on the output of a generator but rather on the CNN convolution filters (i.e., it is used as a parameter regularization), and, 3) we leverage this regularization to encourage interpretability, by making the convolution filters similar to real subseries from the training data.
3 Learning Interpretable Shapelets
In this section, we present our approach to learn interpretable discriminative shapelets for time series classification.
Both LS and CNN slide the shapelets on the series to compute local (dis)similarities. The main difference between the classifier of LS and that of our method is the (dis)similarity between a shapelet and a series. LS uses a squared Euclidean distance between a portion of the time series $x$ starting at index $i$ and a shapelet $s$ of length $L$:
$$D(x_{i:i+L-1}, s) = \sum_{l=1}^{L} \left( x_{i+l-1} - s_l \right)^2$$
The smaller this distance, the closer the shapelet is to the considered subseries. In a CNN, the feature map is obtained from a convolution, and hence encodes cross-correlation between a series and a shapelet:
$$X(x_{i:i+L-1}, s) = \sum_{l=1}^{L} x_{i+l-1} \, s_l$$
Note that here, the higher $X(x_{i:i+L-1}, s)$, the more similar the shapelet is to the subseries.
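The contrast between the two (dis)similarities can be illustrated with a short NumPy sketch. The helper names (`sliding_windows`, `ls_feature`, `cnn_feature`) and the toy series are hypothetical, and the bias and ReLU of an actual convolutional layer are deliberately omitted.

```python
import numpy as np

def sliding_windows(series, length):
    """All contiguous subseries of the given length, one per row."""
    series = np.asarray(series, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(series, length)

def ls_feature(series, shapelet):
    """LS-style feature: smallest squared Euclidean distance (min-pooling)."""
    shapelet = np.asarray(shapelet, dtype=float)
    dists = ((sliding_windows(series, len(shapelet)) - shapelet) ** 2).sum(axis=1)
    return float(dists.min()), int(dists.argmin())

def cnn_feature(series, shapelet):
    """CNN-style feature: largest cross-correlation (max-pooling)."""
    shapelet = np.asarray(shapelet, dtype=float)
    corr = sliding_windows(series, len(shapelet)) @ shapelet
    return float(corr.max()), int(corr.argmax())

series = [0, 0, 1, 2, 1, 0, 0]
shapelet = [1, 2, 1]
print(ls_feature(series, shapelet))   # (0.0, 2): exact match at index 2
print(cnn_feature(series, shapelet))  # (6.0, 2): strongest response at index 2
```

On this toy input both features point to the same location, but they move in opposite directions: a good match gives a small distance for LS and a large activation for the CNN, which is why one is aggregated with a minimum and the other with a maximum.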
As shown in Figure 2 (bottom), the convolutional layer of this classifier is made of three parallel convolutional blocks with shapelets of different lengths (red, green, blue) to be comparable with the structure proposed in LS. We will loosely refer to the convolution filters of our classifier as Shapelets in the following.
Inspired by previous works on adversarial training (see e.g. Section 2), in addition to our CNN classifier, we make use of an adversarial neural network (the discriminator at the top of Figure 2) to regularize the convolution parameters of our classifier. This regularization acts as a soft constraint for the classifier to learn shapelets as similar to real pieces of the training time series as possible.
This novel regularization strategy is referred to, in the following, as Adversarial Input-Parameter Regularization (AIPR) and the corresponding model is named AIPR-CNN.
Contrary to GANs, our adversarial architecture does not rely on a generator to produce fake samples from a latent space. The AIPR strategy iteratively modifies the shapelets (i.e. the convolution filters of the classifier) such that they become close to subseries from the training set. To execute this strategy, the discriminator is trained to distinguish between real subseries from the training set and the shapelets. During the regularization phase, the shapelets are updated against the discriminator so that they become more and more similar to real subseries.
To obtain the best trade-off between the discriminative power of the shapelets (i.e. the final classification performance) and their interpretability, our training procedure alternates between training the discriminator and the classifier.
The type of data given as input to the discriminator is another major difference between a GAN and AIPR-CNN: in a GAN, the discriminator is fed with complete instances, while in AIPR-CNN, the discriminator takes subseries as input. These subseries can either be shapelets from the classifier model (denoted as in Figure 2), portions of training time series (denoted as ) or interpolations between shapelets and training time series portions (, see the following section for more details on those), as illustrated in Figure 3. This process allows the discriminator to alter the shapelets for better interpretability.
3.1 Loss Function
As for GANs, our optimization process alternates between losses attached to the subparts of our AIPR-CNN model. Here, each training epoch consists of three main steps that are (i) optimizing the classifier parameters for correct classification, (ii) optimizing the discriminator parameters to better distinguish between real subseries and shapelets and (iii) optimizing shapelets to fool the discriminator. Each of these steps is attached to a loss function that we describe in the following.
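Firstly, a multi-class cross entropy loss is used for the classifier. It is denoted by $\mathcal{L}_c(\theta_c)$ where $\theta_c$ is the set of all classifier parameters.
Secondly, our discriminator $D$ is trained using a loss function derived from the Wasserstein GANs with Gradient Penalty (WGAN-GP) [13]:
$$\mathcal{L}_d(\theta_d) = \mathbb{E}_{s \sim \mathbb{P}_S}\left[ D(s) \right] - \mathbb{E}_{x \sim \mathbb{P}_x}\left[ D(x) \right] + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right]$$
where $\theta_d$ is the set of discriminator parameters, $\lambda$ weights the gradient penalty, $\mathbb{P}_S$ is the empirical distribution over the shapelets, $\mathbb{P}_x$ is the empirical distribution over the training subseries, and $\mathbb{P}_{\hat{x}}$ is the distribution of the interpolations
$$\hat{x} = \epsilon \, x + (1 - \epsilon) \, s$$
where $\epsilon$ is drawn uniformly at random from the interval $[0, 1]$ (cf. Figure 3).
Thirdly, shapelets are updated to fool the discriminator by optimizing on the loss $\mathcal{L}_r(\theta_s)$ where $\theta_s$ is the set of shapelet coefficients:
$$\mathcal{L}_r(\theta_s) = -\mathbb{E}_{s \sim \mathbb{P}_S}\left[ D(s) \right]$$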
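As a rough numerical illustration of the discriminator and regularizer losses, the sketch below evaluates WGAN-GP style terms with a toy linear critic, for which the gradient with respect to the input is known in closed form. Everything here is an assumption made for illustration: the linear critic, the penalty weight `lam=10.0` (the default suggested in the WGAN-GP paper), and all function names. The actual discriminator is a CNN and its gradient would be obtained by automatic differentiation.

```python
import numpy as np

def critic(x, w, b):
    """Toy linear critic D(x) = w.x + b; its gradient w.r.t. x is simply w."""
    return x @ w + b

def discriminator_loss(shapelets, subseries, w, b, lam=10.0, rng=None):
    """Wasserstein critic loss with gradient penalty (assumed form)."""
    rng = np.random.default_rng() if rng is None else rng
    # One interpolation coefficient per (subseries, shapelet) pair, drawn in [0, 1].
    eps = rng.uniform(0.0, 1.0, size=(len(shapelets), 1))
    interpolated = eps * subseries + (1.0 - eps) * shapelets
    # For a linear critic the gradient at every interpolated point equals w.
    grad_norm = np.full(len(interpolated), np.linalg.norm(w))
    penalty = lam * np.mean((grad_norm - 1.0) ** 2)
    return critic(shapelets, w, b).mean() - critic(subseries, w, b).mean() + penalty

def regularizer_loss(shapelets, w, b):
    """Loss minimized w.r.t. the shapelets so that the critic scores them as real."""
    return -critic(shapelets, w, b).mean()

shapelets = np.zeros((4, 3))   # batch of 4 shapelets of length 3
subseries = np.ones((4, 3))    # batch of 4 real subseries of length 3
w, b = np.array([1.0, 0.0, 0.0]), 0.0
print(discriminator_loss(shapelets, subseries, w, b, rng=np.random.default_rng(0)))  # -1.0
```

With a unit-norm `w` the gradient penalty vanishes, so the loss reduces to the difference between the mean critic scores of shapelets and real subseries. Minimizing `regularizer_loss` pushes the shapelets toward regions that the critic scores as real.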
3.2 Learning Algorithm
Algorithm 1 presents the whole training procedure to update the parameters of our AIPR-CNN model. At each epoch of this algorithm, the three steps presented above are executed sequentially. Note that in the second step (lines 10–17), sampling classifier shapelets, as well as sampling subseries from the training set, is performed uniformly at random.
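The alternation between the three steps, together with the uniform sampling of training subseries, can be sketched as follows. This is only a schematic under stated assumptions: the step functions are placeholders supplied by the caller, the names `sample_subseries` and `train_aipr` and the per-epoch counts `n_c`, `n_d`, `n_r` are hypothetical, and all series are assumed to have the same length.

```python
import numpy as np

def sample_subseries(dataset, length, batch_size, rng):
    """Draw subseries uniformly at random: pick a series, then a start index."""
    dataset = np.asarray(dataset, dtype=float)
    rows = rng.integers(0, dataset.shape[0], size=batch_size)
    starts = rng.integers(0, dataset.shape[1] - length + 1, size=batch_size)
    return np.stack([dataset[r, s:s + length] for r, s in zip(rows, starts)])

def train_aipr(n_epochs, classifier_step, discriminator_step, regularizer_step,
               n_c=1, n_d=1, n_r=1):
    """Run the three steps sequentially at each epoch."""
    for _ in range(n_epochs):
        for _ in range(n_c):
            classifier_step()      # (i) update classifier for correct classification
        for _ in range(n_d):
            discriminator_step()   # (ii) update discriminator: real subseries vs shapelets
        for _ in range(n_r):
            regularizer_step()     # (iii) update shapelets to fool the discriminator

# Usage with placeholder steps that only record the order of the updates.
calls = []
train_aipr(2, lambda: calls.append("c"), lambda: calls.append("d"),
           lambda: calls.append("r"), n_d=2)
print(calls)
```

When all series share the same length, choosing the series and then the start index uniformly gives every subseries of the requested length the same probability of being drawn.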
In this section, we will detail the training procedure for the AIPR-CNN and present both quantitative and qualitative experimental results.
4.1 Experimental Setting
As explained in Section 2, our most relevant competitor is Learning Shapelets (LS) [11] as it also describes a shapelet-based model where the shapelets are learned and where a single model is used for classification. In the following sections, all the results presented for LS are retrieved from the UCR/UEA repository [4] and the shapelets presented for LS are obtained using the tslearn implementation [23].
To compare our proposed method with [26, 18, 11], we use the 85 univariate time series datasets from the UCR/UEA repository [4] for which all the baselines are available (see http://www.timeseriesclassification.com/singleTrainTest.csv for all used datasets and baseline results). Note that our CNN-based method is not, by design, limited to univariate time series. However, for a fair comparison, we limited ourselves to these datasets for this study. The datasets differ significantly from one another, covering seven types of data with various numbers of instances, lengths, and classes. The splits between training and test sets are provided in the repository.
4.1.2 Architecture details and parameter setting
We have implemented the AIPR-CNN model using TensorFlow  following the general architecture illustrated in Figure 2. The classifier is composed of one 1D convolution layer with ReLU activation, followed by a maxpooling layer along the temporal dimension and a fully connected layer with a softmax activation. The shapelets use a Glorot uniform initializer  while the other weights are initialized uniformly (using a fixed range). For each dataset, three different shapelet lengths are considered, inspired by the heuristic from  but without resorting to hyper-parameter search: we consider 3 groups of shapelets of length , and , where is the number of classes in the dataset and is the length of the time series at stake.
The convolution filters of the classifier, i.e. the shapelets, are given as input to the discriminator which has the same structure as the classifier, but with shorter convolution filters (100 filters of size , and ) and a single-neuron activation instead of the softmax in the last layer. For optimization, we use Adam optimizer with a standard parametrization (, and ) and each epoch consists in (resp. and ) mini-batches of optimization for the classifier loss (resp. discriminator and regularizer losses).
Experimental results are reported in terms of test accuracy and aggregated over five random initializations. All experiments are run for 8000 training epochs. The authors are devoted to the reproducibility of the results.
4.2 Qualitative Results
Our method aims at producing interpretable results in the sense that shapelets should be similar to sub-parts of some series from the dataset. We first validate that our AIPR scheme actually ensures that shapelets are similar to the training data. Then we show how shapelets that look like subseries are helpful to make the decision process interpretable.
We first illustrate our training process and its impact on a single shapelet in Figure 10. In this figure, we show the evolution of a given shapelet for the Wine dataset at epochs 20, 200, 800 and 8,000. One can see from the loss values reported in Figures (a) and (d) that these correspond to different stages in our learning process. At epoch 20, the Wasserstein loss is far from 0 (a value of 0 corresponds to a case where the discriminator cannot distinguish between shapelets and real subseries), and this indeed corresponds to a shapelet that looks very different from an actual subseries. As training progresses, both the Wasserstein loss and the cross-entropy loss get closer to 0, leading to shapelets that are both realistic and discriminative.
To further check the effect of our regularization, we focus on the most discriminative shapelets for several datasets, as it would be misleading to look at a random shapelet: a shapelet might well be similar to a series but useless for the classification. The discriminative power, for class , of the shapelet at index with respect to the -th time series in the training set is evaluated as:
where is the -th component (i.e. the one that corresponds to shapelet ) of the activation map for the time series and is the weight connecting that -th component to the -th output in the logistic layer of our classifier. As we aim at evaluating the overall discriminative power of a shapelet in a multi-class setting, and given that we use a softmax activation at the input of our logistic layer, we can define the cross-class discriminative power of a shapelet as:
This is the criterion that we use to rank our shapelets in terms of discriminative power and to select the three most discriminative shapelets in Figure 13. This figure shows that the shapelets match the training time series significantly better when using our AIPR-CNN model in place of a standard LS one. Examples of the shapelets learned using only the classifier part of our neural network architecture (a simple CNN) are shown in Figure 17. This figure reveals that an unregularized network fails to generate interpretable shapelets, just as LS does. This shows that the actual benefit in interpretability indeed comes from our AIPR scheme. Our regularization strategy makes it possible to generate shapelets that are both discriminative and representative of the training data.
Another important aspect, in terms of interpretability, is the explanation that can be provided to an end-user to explain a classification decision. For a given test time series, we produce two representations that help the user understand and trust the decision of a classifier. First, in Figure 1 and Figure 16a, we present the shapelets that were the most important to make a classification decision (according to Equation 7). One can notice that in both cases, shapelets extracted by AIPR-CNN better fit the time series at stake and hence help the end-user focus on the particular pattern in the time series that leads to the decision (e.g. the series of three peaks for HandOutlines or the overall shape of the central hump for Herring). Next, in Figure 16b, we present a 2D embedding of all the time series of the dataset, using the two most important shapelets for the considered time series. One can see that the considered time series (circled in red) lies in a part of the space where there are only “red class” time series. With these two representations for a test time series, the end-user (i) knows which shapelets (and which of their locations) the model relied on most for its decision, and (ii) can be convinced that these shapelets are good or sufficient to isolate the time series into a given class. When the shapelets correspond to actual subseries, as with our method, this allows the end-user to better understand the decision process.
4.3 Quantitative Results
Our AIPR is able to recover shapelets that are discriminative and similar to the input, as expected. We want to quantify whether this is achieved at the expense of classification accuracy and/or computation time. Our goal is to be much faster than exhaustive shapelet search methods (our baseline is Shapelets [26]), much more accurate than very fast random shapelet selection-based methods (our baseline is FS [18]) and as accurate and as fast as single-model shapelet learning methods (our baseline is LS [11]).
We analyze the accuracies obtained by FS, LS and our AIPR-CNN method on the 85 datasets using scatter plots (see Appendix A for detailed dataset information and accuracy). The results of the shapelet-based baselines used in this section come from [4] (the results for Shapelets [26] are not available because the method already does not scale on small datasets). We compare FS versus AIPR-CNN in Figure 18 and LS versus AIPR-CNN in Figure 19. We also show how a simple CNN (without the adversarial regularization) compares against LS in Figure 20. We indicate the number of wins/ties/losses for our method and we provide a Wilcoxon significance test [6] with the resulting p-value (a large p-value means that neither of the two methods is significantly better than the other). The points on the diagonal are datasets for which the accuracy is identical for both competitors. Figure 18 shows that, as expected, our method yields significantly better performance than FS. Compared to LS, for most datasets, the difference in accuracy is low, with a small (significant) edge for LS: on average over the 85 datasets, LS obtains an accuracy of 0.77 whereas AIPR-CNN obtains an accuracy of 0.76. On three datasets (namely HandOutlines, NonInvasiveFetalECGThorax1 and OliveOil), our AIPR-CNN method and its regularization seem to be strongly beneficial in terms of generalization (and they are detrimental on one dataset). The simple CNN seems to give slightly (not significantly) better results than LS (and thus than our AIPR-CNN): on average over the 85 datasets, the simple CNN obtains an accuracy of 0.8. This means that our backbone neural network architecture is a good candidate to jointly learn interpretable shapelets and classify time series with little loss in accuracy.
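The win/tie/loss counts reported alongside such scatter plots can be computed from two arrays of per-dataset accuracies, as in the sketch below. The function name `win_tie_loss`, the tolerance `tol` and the accuracy values are illustrative assumptions and are not taken from the experiments.

```python
import numpy as np

def win_tie_loss(acc_ours, acc_baseline, tol=1e-9):
    """Count datasets where our accuracy is higher, equal (within tol) or lower."""
    diff = np.asarray(acc_ours, dtype=float) - np.asarray(acc_baseline, dtype=float)
    wins = int(np.sum(diff > tol))
    ties = int(np.sum(np.abs(diff) <= tol))
    losses = int(np.sum(diff < -tol))
    return wins, ties, losses

# Made-up accuracies on three datasets, for illustration only.
ours = [0.90, 0.80, 0.70]
baseline = [0.80, 0.80, 0.75]
print(win_tie_loss(ours, baseline))  # (1, 1, 1)
```

If SciPy is available, `scipy.stats.wilcoxon(ours, baseline)` runs the paired signed-rank test on the same two arrays and returns the statistic together with the p-value.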
4.3.2 Training Time
We provide both a theoretical complexity study (see Table 1) of all the baselines and of our AIPR-CNN method. Some complexities were already given in Section 2. Our method is based on a classifier and a discriminator, and both of them are simple CNNs. So the complexity of our algorithm () is related to training a CNN and should depend mainly on the number of examples (), the average length of the time series (), and the number of classes (, since the latter is used to decide the number of shapelets to be learned). Note that for both LS and AIPR-CNN, the parameter could be considered as a (quite big) constant since the number of epochs (i.e. the number of times the algorithm ”sees” the entire dataset) is fixed in the experiments. However, in LS, this number still depends on whereas it is fixed once and for all (to ) in AIPR-CNN. This difference is in favor of LS for small datasets and in favor of AIPR-CNN for larger ones.
To have a better grasp of the actual training time of all methods, we ran the methods on a single dataset (ElectricDevices) and recorded the CPU time. The experiments were conducted on a Debian cluster using an Intel(R) Xeon(R) CPU E5-2650 v4 processor (12-core 2.20 GHz CPU) with 32 GB of memory. The results are averaged over five runs. The implementation code of our baselines is taken from [4] (as for the accuracy results). As expected, the original Shapelets [26] method does not finish in 48 hours for this medium-size dataset. FS finishes in 12.1 minutes, LS finishes in 2323 minutes, and our method takes 142 minutes. The theoretical complexity of LS and AIPR-CNN is identical so these results were surprising. We suspected that the Java implementation of LS was not well optimized and we re-implemented the LS method with Keras (https://keras.io/). With this new implementation, the training phase took only minutes for LS on this dataset (compared to 142 for AIPR-CNN), which shows that the time difference between the two algorithms is mainly related to the implementation (and the hyper-parameters related to the number of epochs).
We have presented a new shapelet-based time series classification method that produces interpretable shapelets. The shapelets are deemed interpretable because they are similar to pieces of a real series and can thus be used to explain a particular model prediction. The method is based on a novel adversarial architecture where one convolutional neural network is used to classify the series and another one is used to constrain the first network to learn interpretable shapelets. Our results show that the expected trade-off between accuracy and interpretability is satisfactory: our classification results are comparable with similar state-of-the-art methods while our shapelets are interpretable.
We believe that the proposed adversarial regularization method could be used in many more applications where the regularization should be put on the parameters instead of the latent representation of the networks as done, for example, with Generative Adversarial Networks.
In future work, we would like to first investigate the use of an additional regularization term, based on the group lasso , to be able to determine automatically a minimal set of necessary interpretable shapelets. We also want to use our regularization on other types of data (such as multivariate time series, spatial data, graphs) and in a deep(er) CNN. Furthermore, we would like to adapt this architecture for unsupervised anomaly detection in time series with interpretable clues using neural network architectures such as convolutional auto-encoders or generative networks.
[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.
[2] A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 31(3):606–660, May 2017.
[3] A. Bagnall, J. Lines, J. Hills, and A. Bostrom. Time-series classification with COTE: the collective of transformation-based ensembles. IEEE Transactions on Knowledge and Data Engineering, 27(9):2522–2535, 2015.
[4] A. Bagnall, J. Lines, W. Vickers, and E. Keogh. The UEA & UCR time series classification repository. www.timeseriesclassification.com.
[5] K. Bascol, R. Emonet, E. Fromont, and J.-M. Odobez. Unsupervised Interpretable Pattern Discovery in Time Series Using Autoencoders. In A. Robles-Kelly, M. Loog, B. Biggio, F. Escolano, and R. Wilson, editors, Structural, Syntactic, and Statistical Pattern Recognition, volume 10029, pages 427–438. Springer International Publishing.
[6] J. Demšar. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7:1–30, Dec. 2006.
[7] Z. Fang, P. Wang, and W. Wang. Efficient learning interpretable shapelets for accurate time series classification. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), pages 497–508. IEEE, 2018.
[8] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P. Muller. Deep learning for time series classification: a review. ArXiv, abs/1809.04356, 2018.
[9] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Y. W. Teh and M. Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 249–256, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. PMLR.
[10] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
[11] J. Grabocka, N. Schilling, M. Wistuba, and L. Schmidt-Thieme. Learning time-series shapelets. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 392–401, 2014.
[12] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi. A survey of methods for explaining black box models. ACM Comput. Survey, 51(5), 2018.
[13] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems (NIPS), 2017.
[14] I. Karlsson, P. Papapetrou, and H. Bostrom. Generalized random shapelet forests. Data Mining and Knowledge Discovery, 30(5):1053–1085, Sep 2016.
[15] J. Lines, L. M. Davis, J. Hills, and A. Bagnall. A shapelet transform for time series classification. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 289–297, 2012.
[16] J. Lines, S. Taylor, and A. Bagnall. Time series classification with HIVE-COTE: The hierarchical vote collective of transformation-based ensembles. ACM Transactions on Knowledge Discovery from Data (TKDD), 12(5):52, 2018.
[17] S. M. Lundberg and S. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems (NIPS), pages 4768–4777, 2017.
[18] T. Rakthanmanon and E. Keogh. Fast shapelets: A scalable algorithm for discovering time series shapelets. pages 668–676, 05 2013.
[19] M. T. Ribeiro, S. Singh, and C. Guestrin. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
[20] M. T. Ribeiro, S. Singh, and C. Guestrin. Anchors: High-precision model-agnostic explanations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 1527–1535, 2018.
[21] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.
[22] R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications (Springer Texts in Statistics). Springer-Verlag, Berlin, Heidelberg, 2005.
[23] R. Tavenard. tslearn: A machine learning toolkit dedicated to time-series data, 2017. https://github.com/rtavenar/tslearn.
[24] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pages 2962–2971, 2017.
[25] Z. Wang, W. Yan, and T. Oates. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the International Joint Conference on Neural Networks, pages 1578–1585, 2017.
[26] L. Ye and E. Keogh. Time series shapelets: a new primitive for data mining. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 947–956, 2009.
[27] J. Zhao, Y. Kim, K. Zhang, A. Rush, and Y. LeCun. Adversarially regularized autoencoders. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5902–5911, 2018.
Appendix A Dataset information and accuracy comparison