Multimodal Probabilistic Prediction of Interactive Behavior via an Interpretable Model
Abstract
For autonomous agents to operate successfully in the real world, the ability to anticipate the future motions of surrounding entities in the scene can greatly enhance their safety, since potentially dangerous situations can be avoided in advance. While impressive results have been shown on predicting each agent’s behavior independently, we argue that it is not valid to consider road entities individually since transitions of vehicle states are highly coupled. Moreover, as the prediction horizon becomes longer, modeling prediction uncertainties and multimodal distributions over future sequences becomes more challenging. In this paper, we address this challenge by presenting a multimodal probabilistic prediction approach. The proposed method is based on a generative model and is capable of jointly predicting sequential motions of each pair of interacting agents. Most importantly, our model is interpretable: it can explain the underlying logic and is therefore more reliable to use in real applications. A complicated real-world roundabout scenario is used to implement and examine the proposed method.
I Introduction
The idea of predicting the future behavior of statistical time series has a wide range of applications in economics, weather forecasting, intelligent agent systems, etc. The autonomous vehicle is one of the well-known intelligent agents, and it is expected to predict the behaviors of other road entities. Accurate and reasonable prediction is a prerequisite for reliably performing tasks involving motion planning, decision making, and control.
Numerous researchers have worked on behavior prediction problems, and many approaches have been explored to solve such problems in the autonomous driving area. Some of these approaches only perform point estimation by assuming that the environment is deterministic. However, they fail to take into account the uncertainty of future outcomes caused by partial observation or stochastic dynamics, which leads to a loss of information that captures the real physical interactions. Therefore, in this paper, we take into account the uncertainty of drivers as well as the evolution of the traffic situation and try to predict possible behaviors of multiple traffic participants several steps into the future.
When dealing with uncertainties in sequential prediction problems, two aspects need to be further discussed: the estimation of the multimodal output distribution and the interpretability of the output samples. The multimodal property of a method can be regarded as having different motion patterns in the outputs, as illustrated in Fig. 1. As reviewed and summarized in [1], motion patterns can be categorized hierarchically into route, pass-yield, and subtle patterns in various kinds of scenarios. For routing information, motion patterns can be regarded as discrete. However, as we want to output a sequence of future motions, the agent’s speed could increase, decrease, or change randomly in the next few time steps. Therefore, the sampled trajectories are expected to have continuous motion patterns.
Also, the output uncertainty is usually obtained by sampling data points from some learned distribution, and it is necessary to reason about what causes the motion pattern to vary among samples. However, most of the commonly used approaches, especially learning-based methods, cannot provide much insight into the structure of the function being approximated.
The contributions of this paper are fourfold. First, we propose a multimodal probabilistic prediction structure for autonomous vehicles using only a single learning-based model. Second, we consider the sequential motion prediction of each pair of interacting vehicles. Third, the proposed model is interpretable, as we are able to explain each sampled data point and relate it to the underlying motion pattern. Last but not least, we train and test our method on a complicated roundabout scenario, which adds more difficulty to the behavior prediction problem.
The remainder of the paper is organized as follows: Section II provides a brief overview of works related to interpretable models, trajectory prediction, and multimodality. Section III provides the detailed explanation of the proposed approach; Section IV discusses an exemplar scenario to apply our method; evaluations and results are provided in Section V; and Section VI concludes the paper.
II Related Works
Interpretable Models
To solve prediction problems for autonomous vehicles, many researchers have utilized traditional methods such as constant velocity (CV), constant acceleration (CA), the Intelligent Driver Model (IDM), and the Kalman Filter (KF) [2]. However, these methods work well only in simple driving scenarios, and their performance degrades for long-term prediction because they ignore the surrounding context. As these manually designed models fail to scale to complicated traffic scenes, classical machine learning models have been used, such as the Hidden Markov Model (HMM) [3], Bayesian Networks (BN) [4], and Gaussian Processes [5]. The aforementioned methods either utilize known transition models or incorporate hand-designed domain knowledge to enhance the prediction performance. Although these works rely on interpretable methodology, their performance is usually worse than that of most learning-based methods, which lack interpretability.
The success of deep learning in many real-life applications has motivated research on its use for motion prediction; such methods include Deep Neural Networks (DNN) [6], Recurrent Neural Networks (RNN) [7], and Convolutional Neural Networks (CNN) [8]. Deep learning models can achieve high accuracy, but at the expense of a high level of abstraction that makes them difficult to trust.
Recently, numerous researchers have tried to solve this problem by utilizing the variational autoencoder (VAE) [9], a latent variable model that is able to learn a factored, low-dimensional representation of data. [10] developed a framework for incorporating structured graphical models in the encoders of VAEs, which allows interpretable representations to be induced through approximate variational inference. [11] proposed a novel factorized hierarchical VAE to learn disentangled and interpretable latent representations from sequential data. Our goal is to develop an interpretable architecture for behavior prediction based on the latent variable model such that features involved in modeling can be described through latent codes and explainable future motions can be generated.
Trajectory Prediction
There are a variety of works dealing with trajectory prediction problems for road entities such as vehicles and pedestrians. [12] proposed a Long Short-Term Memory (LSTM) encoder-decoder model to predict a multimodal predictive distribution over future trajectories based on maneuver classes. [13] applied the Hidden Markov Model (HMM) to predict trajectories for individual drivers. [14] combined CNN and LSTM to predict multimodal trajectories for an agent on a bird's-eye-view image. The main limitation of these works, however, is that they only predict motions for one selected agent without considering the influence of other agents with potential interactions.
Since the motion of an agent can be largely influenced by other surrounding agents, some researchers began to tackle the scene prediction problem. Modeling future distributions over the entire traffic scene is a challenging task, given the high-dimensional feature space and complex dynamics of the environment. [15] and [16] brought forward an LSTM-based structure to predict the most probable trajectory candidates for every vehicle over an occupancy grid map. [17] utilized the Dynamic Bayesian Network (DBN) for behavior and trajectory prediction. In [18], the authors proposed a hierarchical scene prediction framework, where the Conditional Variational Autoencoder (CVAE) was used in the lower module to predict continuous motions for multiple interacting road participants.
Multi-Modality
There exist a number of studies addressing the problem of modeling multimodality. A feedforward network with a Gaussian mixture output [6][7] is usually applied to solve multimodal regression tasks, but it is often difficult to train in practice due to numerical instabilities when operating in high-dimensional spaces, such as when predicting future sequences.
Other works solved this problem by utilizing different regression models for different possible discrete intentions of road entities [5][18]. However, when the intention space is large or data is insufficient, such a method becomes inefficient and the model may even suffer from overfitting. Alternatively, [8][19] treated the discrete intention as one of the state inputs to the proposed structure to generate different types of output.
III Approach
In this section, we first introduce the main algorithm of the proposed behavior prediction approach. Then the details of the intention prediction method are illustrated.
III-A Interactive Behavior Prediction
Our proposed method is based on the CVAE, which has an encoder-decoder structure similar to that of the typical VAE. Two types of conditional input are considered in our model structure: historical scene information and inferred driving intention.
We focus on predicting human drivers’ interactive behaviors between two vehicles: vehicle , (denoted by ), and vehicle , (denoted by ). Both vehicles are regarded as predicted vehicles, and we are interested in jointly predicting their behaviors while taking into account any internal correlations. Note that it is trivial to convert the output joint distribution to a conditional distribution by treating one of the predicted vehicles as the ego vehicle and obtaining the behavior prediction of the other. However, we do not address this problem setting further in this paper.
For a given vehicle, we use to represent its actual trajectory and as the trajectory we predict. At the current time step , the vehicle’s historical trajectory is denoted as and its future trajectory is described as , where and represent the number of time steps into the past and future, respectively. Moreover, we denote as the discrete intention of a vehicle and as the environment information.
Given the historical trajectory and driving intention of two interactive vehicles, along with the environment information, the objective of estimating probabilistic joint trajectories can be expressed as:
(1) 
To formulate the problem in the CVAE structure, we let the encoder, , take the input as a learned embedded space of historical trajectory and environment information, as the intention vector , and as the actual trajectory , to “encode” them into a latent space. Then the decoder, , takes and as input, to “decode” the sampled values from space back to the output , which corresponds to the predicted future trajectories . To enable backpropagation, the network is trained using the reparameterization trick [9] such that the latent variables can be expressed as:
(2) 
Here, and are the parameters of the encoder and decoder network, respectively. To learn these parameters, we can optimize the variational lower bound:
(3) 
where the model is forced to perform a tradeoff between a good estimation of the data log-likelihood and the divergence between the approximated posterior and the assumed prior which, in our case, is the unit Gaussian. We also use a hyperparameter to weight the two losses against each other during training for better performance.
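As a concrete illustration of the sampling step and the tradeoff described above, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes a diagonal-Gaussian encoder output (`mu`, `log_var`), a squared-error reconstruction term standing in for the data log-likelihood, and a weight `beta` applied to the divergence term; all function and variable names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), so z is differentiable in mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_unit_gaussian(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def weighted_vae_loss(y_true, y_pred, mu, log_var, beta):
    """Squared-error reconstruction plus a beta-weighted KL term (assumed weighting)."""
    recon = np.sum((y_true - y_pred) ** 2, axis=-1)
    return float(np.mean(recon + beta * kl_to_unit_gaussian(mu, log_var)))
```

A small `beta` lets the reconstruction term dominate, while a large `beta` pulls the approximate posterior towards the unit Gaussian prior.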
At test time, we can directly sample the latent variable from the unit Gaussian prior and use only the decoder to obtain the predicted joint distribution. Notice that among the three inputs of the decoder network, only the embedded historical scene information is fixed at a given time step, while both the sampled latent variable and the inferred intention are nondeterministic factors. Therefore, in the following sections, we analyze the effects of these factors on the output trajectories, demonstrate how the proposed model can estimate multimodal distributions over future sequences, and explore the interpretability of the model.
III-B Intention Prediction
III-B1 Bayesian Approach
During the scene evolution, we use a Bayesian approach to predict each vehicle’s intention , based on history observation . In this problem, we consider the vehicle’s past trajectory as the observation since an agent’s potential intent can largely influence its trajectory. Therefore, the Bayesian equation can be written as:
(4) 
where represents the probability density function. The term is the prior belief of the intention and is initialized with a known distribution according to initial observation; is the likelihood of observing for a given intention ; and the denominator is a normalization term.
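The Bayesian update described above can be sketched for a discrete set of intentions as follows. This is a minimal illustration that assumes the prior and the likelihood are given as arrays with one entry per candidate intention; the function name and the zero-likelihood fallback are hypothetical.

```python
import numpy as np

def update_intention_belief(prior, likelihood):
    """Posterior over discrete intentions: normalize(likelihood * prior)."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    unnormalized = likelihood * prior
    total = unnormalized.sum()  # the normalization term (denominator)
    if total == 0.0:
        # Hypothetical fallback: keep the prior if every intention has zero likelihood.
        return prior
    return unnormalized / total

# Four candidate intentions with a uniform prior belief.
belief = update_intention_belief([0.25, 0.25, 0.25, 0.25], [0.4, 0.4, 0.15, 0.05])
```

Feeding the returned belief back in as the prior at the next update step gives the recursive form of the estimate.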
III-B2 Dynamic Time Warping (DTW)
The dynamic time warping (DTW) distance as proposed in [20] is a trajectory measure that can be used on general time series. DTW does not require both trajectories to have the same length. Instead, DTW measures the temporal changes that are necessary in order to warp one trajectory into another.
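For reference, a minimal sketch of the classic dynamic-programming formulation of DTW is given below. It assumes that trajectories are sequences of 2-D points compared with the Euclidean distance, and it does not include the speed-ups discussed in [20]; `dtw_cost` is a hypothetical name.

```python
import numpy as np

def dtw_cost(traj_a, traj_b):
    """Classic O(len_a * len_b) DTW cost between two sequences of points."""
    a = np.asarray(traj_a, dtype=float)
    b = np.asarray(traj_b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])
```

Because one point may be matched to several points of the other sequence, two trajectories that follow the same path at different speeds, and therefore have different lengths, can still have zero cost.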
If we consider a driving intention as pursuing some goal location, such as one of the exit branches in a roundabout scenario or a left/right turn at an intersection, we can obtain a reference driving path for each intention. Therefore, we can use the DTW algorithm to determine the likelihood of an observed trajectory given a reference path, assuming the agent has the corresponding intention:
(5) 
where is the cost calculated by the DTW algorithm. The smaller the cost is, the closer the observed trajectory is to the reference path, and thus the higher the probability is for intention . In fact, we are interested in the DTW value between the observed trajectory and a segment of the reference path closer to the trajectory instead of the full reference path.
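One simple way to turn DTW costs into likelihoods with the monotonic property described above is an exponential mapping. The sketch below is an assumption for illustration only: the exponential form, the `scale` parameter, and the normalization across reference paths are not taken from Eq. (5).

```python
import numpy as np

def likelihood_from_dtw(costs, scale=1.0):
    """Map DTW costs to normalized likelihoods; a smaller cost gives a larger likelihood."""
    costs = np.asarray(costs, dtype=float)
    # Subtracting the minimum cost keeps the exponentials numerically stable.
    weights = np.exp(-(costs - costs.min()) / scale)
    return weights / weights.sum()
```

Here `costs` would hold one DTW cost per candidate reference path, each computed against the path segment closest to the observed trajectory.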
III-B3 Select Interacting Pairs
After obtaining the intention probabilities of each vehicle in the scene at a given time step, we can determine whether any two vehicles potentially interact according to their corresponding reference paths. If none of the potential reference paths of two vehicles share a conflict point, the vehicles’ future trajectories will be independent of each other, and thus there is no need to further predict their joint motions. In this work, we assume that interaction happens only between two agents, while there can be multiple interacting pairs in the scene.
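The pair-selection rule above can be sketched as a geometric test on the candidate reference paths. The code below is a hypothetical illustration: it treats two paths as having a conflict point when any of their sampled points lie within a distance `threshold` of each other, which is an assumed criterion and not necessarily the one used in this work.

```python
import numpy as np

def paths_conflict(path_a, path_b, threshold=1.0):
    """True if any point on path_a lies within `threshold` of a point on path_b."""
    a = np.asarray(path_a, dtype=float)[:, None, :]
    b = np.asarray(path_b, dtype=float)[None, :, :]
    return bool(np.min(np.linalg.norm(a - b, axis=-1)) <= threshold)

def may_interact(paths_a, paths_b, threshold=1.0):
    """Two vehicles may interact if any pair of their candidate paths conflicts."""
    return any(paths_conflict(pa, pb, threshold) for pa in paths_a for pb in paths_b)
```

Only the pairs for which `may_interact` returns `True` would then be passed on to the joint trajectory predictor.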
III-B4 Avoid Constantly-Changed Prediction Results
To avoid rapid fluctuation of the likelihood distribution, we perform the aforementioned intention prediction algorithm at intervals of at least 0.4 s. Here we assume that a driving intention will not change within a short period of time, especially in the roundabout scenario, where the driver already knows his/her intended exit branch.
IV Experiments
In this section, we use an exemplar roundabout scenario to apply our proposed behavior prediction method. The data source and details of the problem formulation are presented.
IV-A Real Driving Scenario
IV-A1 Dataset
The driving data we used was collected by our Mechanical Systems Control Lab at a single-lane roundabout in Berkeley, CA. The roundabout, shown in Fig. 2(b), is connected to 8 branches, each of which has one entry lane and one exit lane.
We used a data collection vehicle equipped with a 64-layer LiDAR, omnidirectional cameras, a differential GPS, and an inertial measurement unit (IMU). The test vehicle was parked on the side of the circular roadway, and the collected sensor measurements were processed to obtain the states of every detected vehicle. To smooth the noisy data, we downsampled it from 10 Hz to 5 Hz and applied an Extended Kalman Filter (EKF). We manually picked 1534 driving segments from the collected data, of which 80% were randomly selected for training and 20% for testing.
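The downsampling and the random 80/20 split described above can be sketched as follows. This is an illustrative NumPy snippet under simple assumptions: downsampling is done by keeping every second sample, the EKF smoothing step is not shown, and the seed and function names are hypothetical.

```python
import numpy as np

def downsample(states, factor=2):
    """Keep every `factor`-th sample, e.g. 10 Hz -> 5 Hz for factor=2."""
    return states[::factor]

def split_indices(num_segments, train_fraction=0.8, seed=0):
    """Randomly partition segment indices into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_segments)
    n_train = int(round(train_fraction * num_segments))
    return order[:n_train], order[n_train:]

# 1534 driving segments split into 1227 training and 307 test segments.
train_idx, test_idx = split_indices(1534)
```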
IV-A2 Reference Path
According to the dataset, a total of 19 reference paths were considered, which can be seen in Fig. 2(b). Each reference path corresponds to a route from one lane to another and is generated by finding the best-fitting path among all vehicle trajectories in the data that share the same entry and exit lanes.
IV-B Problem Description and Feature Selection
IV-B1 Problem Description
For the roundabout scenario, one of the most typical interaction scenarios happens when one vehicle (car ) is waiting behind the stop/yield sign and trying to enter the circular roadway, while another vehicle (car ) is driving on the circular roadway towards car . Under such circumstances, the potential exit lane for car will largely influence the driving behavior of both vehicles. For example, if car exits the circular roadway before reaching the current lane of car , the driving trajectories of the two vehicles will not influence each other; conversely, if car keeps driving on the circular roadway, the two cars will begin negotiating with each other to decide who should go first, which will affect their future trajectories.
As most of the selected cases in our dataset belong to the aforementioned situation, we only consider the driving intent of car as the intention input in our proposed prediction model. Moreover, the front vehicles of car and car are regarded as environment information in each driving case, which are essential influence factors of vehicle behaviors.
IV-B2 Feature Selection
The input of past joint trajectories contains four features at each time step: . The environment input contains the surrounding vehicles’ information at the current time step , which is the state of each interactive car’s front vehicle: . Here, and represent the vehicle’s location in Euclidean coordinates, while denotes the vehicle speed. The driving intention is converted to a one-hot vector, which denotes the intended exit branch of the selected vehicle out of all 8 possible branches.
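The one-hot encoding of the intended exit branch can be sketched as follows. The 0-based branch indexing and the function name are assumptions made for illustration.

```python
import numpy as np

NUM_BRANCHES = 8  # the roundabout has 8 branches

def one_hot_intention(branch_index, num_branches=NUM_BRANCHES):
    """Encode an intended exit branch (0-based index, an assumed convention)."""
    if not 0 <= branch_index < num_branches:
        raise ValueError("branch_index out of range")
    vec = np.zeros(num_branches)
    vec[branch_index] = 1.0
    return vec
```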
IV-C Implementation Details
As shown in Fig. 2, the environment information passes through a fully connected network with 16 neurons, and the sequence of past joint trajectories is fed into one LSTM cell with 16 neurons. Both the encoder and decoder contain three fully connected layers of 64 neurons, each followed by a nonlinear activation function. The latent space dimension is set to 2, and a randomly sampled vector from a unit Gaussian is used as one of the inputs of the decoder. In this problem, the numbers of past and future time steps are both set to 5, i.e., we predict 1 s into the future using the past 1 s of information. According to the experiments, the method performs best when the loss-balancing hyperparameter is set to 0.005.
V Results and Evaluations
In this section, we first visually illustrate the result of the proposed model through several selected cases. Then we introduce the quantitative evaluation metric and present the comparison result with other baseline methods.
V-A Visual Illustration
V-A1 Intention Prediction
We selected a representative case to demonstrate how intention is predicted using Bayesian update and DTW, shown in the first row of Fig. 3. The top four reference paths that have the highest intention probability were plotted in blue and we further selected a path segment (dark blue) from each reference path to calculate the DTW value with the observed trajectory (red).
At the beginning, the vehicle’s intention is ambiguous and it has similar probabilities of exiting at branches 1, 2, and 3. However, the reference path of exiting at the 4th branch has the least similarity (large DTW value) to the vehicle’s observed trajectory, and thus its corresponding intention probability is the smallest among the four listed reference paths. As the vehicle continues to move, it drives closer to the roundabout center and shows less intent to exit. Such behavior is well captured by the intention prediction module, as shown in the second and third figures, where the probabilities of the vehicle exiting at branches 1 and 2 are continually increasing while the likelihood of exiting at the 3rd branch is constantly decreasing.
V-A2 Trajectory Prediction
We tested our proposed method and compared its prediction results with those of the original CVAE approach, as shown in the bottom two rows of Fig. 3. To make a fair comparison, we fixed the 10 randomly sampled latent values in both methods and used them to generate 10 future joint trajectories of the two interacting vehicles.
According to the result, our proposed method successfully generates different motion patterns which are consistent with the intention prediction result. In contrast, the traditional CVAE method only predicts one motion pattern and it fails to consider the possibility that car might exit the roundabout at the 3rd branch.
We argue that although two vehicles interact and this may influence their future trajectories, the intention of which road to exit for car is not influenced by the other vehicle, . Therefore, if we predict the joint trajectories of two cars using learning-based methods like CVAE, each vehicle’s trajectory will be largely influenced by the data distribution and will not exhibit the multimodal property if the amount of data is not large enough to cover every possible case. Even if we have sufficient data, the CVAE network will most likely learn to closely relate the future motions of the two vehicles instead of learning to infer the future joint trajectories from the historical trajectory of each individual vehicle. In other words, we do not want the network to learn only the motion correlations between the two vehicles without treating each of them individually. Hence, the intention factor we added in the proposed method mitigates this problem by encouraging the network to relate each vehicle’s intention to its own past trajectory and then generate its future motions while taking the other vehicle’s historical motions into consideration.
V-A3 Interpretability
The learned latent space is plotted in Fig. 4(a), where we assigned different colors to different interaction cases. Although the pass/yield information of two interacting vehicles is not used during training, our proposed method successfully distinguished such motion patterns in the latent space. To illustrate the influence of the sampled latent vector on the predicted trajectories, we fixed the intention input and only changed the latent value along its two dimensions.
As we fix and decrease (figure b1 to b5), car increases its speed and shifts from yielding car to passing car while the speed of does not change much. As we fix and increase (figure c1 to c5), car always passes car but the speed of car gradually decreases. Therefore, we can conclude that if we move from the 2nd to the 4th quadrant of the 2D unit Gaussian, there will be an obvious change of interaction patterns between the two cars. Hence, the proposed method is interpretable, as the sampled output can be well explained by the location of the latent sample. Moreover, being able to generate various motion patterns from different sampled values can also be regarded as having the multimodal property.
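The latent traversal used in this analysis, where one latent dimension is held fixed and the other is swept, can be sketched as below. The sweep range of [-2, 2], the five steps (matching the five panels per row), and the function name are assumptions for illustration.

```python
import numpy as np

def latent_traversal(fixed_value, fixed_dim, low=-2.0, high=2.0, steps=5):
    """Return `steps` 2-D latent vectors with one dimension fixed and the other swept."""
    z = np.empty((steps, 2))
    z[:, fixed_dim] = fixed_value
    z[:, 1 - fixed_dim] = np.linspace(low, high, steps)
    return z
```

Each row of the returned array would be passed to the decoder together with the same historical scene embedding and intention input, so that any change in the output can be attributed to the latent value alone.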
V-B Metric
V-B1 Mean Squared Error (MSE)
MSE is commonly used to evaluate the prediction performance and the equation can be written as:
(6) 
where is the th sampled prediction result out of output samples and is the groundtruth.
While MSE provides a tangible measure of the predictive accuracy of models, it has limitations when evaluating multimodal predictions. MSE is skewed in favor of models that average different output modes, and this average may not represent a good prediction. We therefore also use a second evaluation metric.
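The bias of MSE towards mode averaging can be seen in a toy example. The sketch below assumes the sample-averaged squared-error form of the metric described above and a hypothetical 1-D setting in which the true outcome is +1 or -1 with equal probability; the names are illustrative.

```python
import numpy as np

def sample_mse(samples, ground_truth):
    """Mean over samples of the squared Euclidean error to the ground truth."""
    samples = np.asarray(samples, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    errors = samples - ground_truth
    return float(np.mean(np.sum(errors.reshape(len(samples), -1) ** 2, axis=1)))

bimodal_samples = np.array([[1.0], [-1.0]])   # one sample on each mode
averaged_samples = np.array([[0.0], [0.0]])   # both samples collapsed to the mean

# Average the metric over the two equally likely ground-truth outcomes.
mse_bimodal = np.mean([sample_mse(bimodal_samples, [gt]) for gt in (1.0, -1.0)])
mse_averaged = np.mean([sample_mse(averaged_samples, [gt]) for gt in (1.0, -1.0)])
```

In this toy case the averaged predictor scores 1.0 while the predictor that covers both modes scores 2.0, even though the averaged prediction never matches an outcome that actually occurs.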
V-B2 Negative Log Likelihood (NLL)
NLL is a proper metric for evaluating predictive uncertainty [21] and it can be calculated as:
(7) 
where and represent the mean and variance of output samples respectively.
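A common way to compute such a metric from sampled outputs is the Gaussian negative log-likelihood evaluated with the sample mean and variance. The sketch below assumes an independent Gaussian per output dimension and averages over dimensions; the exact form of Eq. (7), the variance floor `eps`, and the function names are assumptions.

```python
import numpy as np

def gaussian_nll(ground_truth, mean, variance, eps=1e-6):
    """Average per-dimension Gaussian negative log-likelihood."""
    y = np.asarray(ground_truth, dtype=float)
    mu = np.asarray(mean, dtype=float)
    var = np.maximum(np.asarray(variance, dtype=float), eps)  # guard against zero variance
    return float(np.mean(0.5 * np.log(2.0 * np.pi * var) + (y - mu) ** 2 / (2.0 * var)))

def nll_from_samples(samples, ground_truth):
    """Estimate mean and variance across samples (axis 0), then score the ground truth."""
    samples = np.asarray(samples, dtype=float)
    return gaussian_nll(ground_truth, samples.mean(axis=0), samples.var(axis=0))
```

Unlike MSE, this score penalizes a predictor that is confidently wrong: for the same error, a very small variance gives a much larger NLL than a variance that matches the error.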
V-C Quantitative Performance Evaluation
We compared our method with the following three approaches, where all of them take historical trajectories as input and generate a sequence of future trajectories as output.

Conditional Variational Autoencoder (CVAE): We compared the proposed method with the traditional CVAE approach where no intention prediction module is used.

Multilayer Perceptron Ensemble (MLP-ensemble): The MLP is designed to have the same network structure as the decoder in our proposed model. To incorporate uncertainty, we applied the bagging strategy to combine predictions of models built on subsets created by bootstrapping.

Monte Carlo dropout (MC-dropout): We also implemented the MC-dropout method [22] to estimate the prediction uncertainty by using dropout at both training and test time. The mean and variance can be obtained by performing stochastic forward passes and averaging over the outputs.
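The idea of obtaining a mean and variance from repeated stochastic forward passes can be sketched with a toy model. The code below uses a single linear layer with dropout on its inputs as a stand-in for the baseline network; the layer, the dropout rate, and all names are hypothetical and are not the configuration used in the experiments.

```python
import numpy as np

def mc_dropout_predict(x, weights, drop_prob=0.5, num_passes=100, seed=0):
    """Mean and variance of a dropout-perturbed linear model over repeated passes."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    outputs = []
    for _ in range(num_passes):
        # Inverted dropout: zero each input with probability drop_prob, rescale the rest.
        mask = rng.random(x.shape) >= drop_prob
        outputs.append((x * mask / (1.0 - drop_prob)) @ weights)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.var(axis=0)
```

With `drop_prob=0.0` every pass is identical and the variance is zero; a nonzero rate yields a spread of outputs whose variance serves as the uncertainty estimate.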
The MSE and NLL values are calculated for all four methods and the results are shown in Table I. It is apparent from the table that our method has the largest MSE but the smallest NLL. These results indicate that the proposed method generates output trajectories with the largest uncertainty owing to its multimodal predictions, at the expense of a slightly larger MSE. The other methods have smaller MSE values because they can only approximate a single motion model, so all of their output samples are distributed around the ground truth. However, a small MSE is not favorable when the output is supposed to have multiple motion models.
Moreover, we notice that CVAE has results comparable to those of MC dropout and MLP ensemble in terms of the two evaluation metrics. Therefore, the proposed method, based on the CVAE methodology, is not only able to achieve desirable performance but is also capable of estimating explainable multimodal distributions.
Method  Proposed     CVAE         MC Dropout   MLP Ensemble
MSE     0.45 ± 0.01  0.16 ± 0.01  0.15 ± 0.01  0.20 ± 0.01
NLL     0.83 ± 0.26  2.57 ± 1.02  2.10 ± 0.63  3.05 ± 0.97
VI Conclusions
In this paper, a multimodal probabilistic prediction method is proposed, which is interpretable and can predict interactive behavior for traffic participants. An exemplar roundabout scenario with real-world data that we collected was used to demonstrate the performance of our method. First, the prediction results for the intention and motion of selected vehicles were visually illustrated through a representative driving case. Then, we plotted the learned latent space to demonstrate the interpretability. Finally, we quantitatively compared the proposed method with three different models: CVAE, MLP ensemble, and MC dropout. Our method outperforms these methods in terms of the negative log likelihood metric, which shows its advantages for learning conditional models on multimodal distributions.
VII Acknowledgment
We thank Liting Sun for generating the reference paths, and Junming Chen and Di Wang for their data processing work.
References
 [1] W. Zhan, L. Sun, Y. Hu, J. Li, and M. Tomizuka, “Towards a fatality-aware benchmark of probabilistic reaction prediction in highly interactive driving scenarios,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 3274–3280.
 [2] S. Hoermann, D. Stumper, and K. Dietmayer, “Probabilistic long-term prediction for autonomous vehicles,” in Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE, 2017, pp. 237–243.
 [3] C. Laugier, I. Paromtchik, M. Perrollaz, Y. Mao, J.-D. Yoder, C. Tay, K. Mekhnacha, and A. Nègre, “Probabilistic analysis of dynamic scenes and collision risk assessment to improve driving safety,” Its Journal, vol. 3, no. 4, pp. 4–19, 2011.
 [4] M. Liebner, M. Baumann, F. Klanner, and C. Stiller, “Driver intent inference at urban intersections using the intelligent driver model,” in Intelligent Vehicles Symposium (IV), 2012 IEEE. IEEE, 2012, pp. 1162–1167.
 [5] Q. Tran and J. Firl, “Online maneuver recognition and multimodal trajectory prediction for intersection assistance using nonparametric regression,” in Intelligent Vehicles Symposium Proceedings, 2014 IEEE. IEEE, 2014, pp. 918–923.
 [6] Y. Hu, W. Zhan, and M. Tomizuka, “Probabilistic prediction of vehicle semantic intention and motion,” in Intelligent Vehicles Symposium (IV), 2018 IEEE. IEEE, 2018, pp. 307–313.
 [7] J. Morton, T. A. Wheeler, and M. J. Kochenderfer, “Analysis of recurrent neural networks for probabilistic modeling of driver behavior.” IEEE Trans. Intelligent Transportation Systems, vol. 18, no. 5, pp. 1289–1298, 2017.
 [8] H. Cui, V. Radosavljevic, F.-C. Chou, T.-H. Lin, T. Nguyen, T.-K. Huang, J. Schneider, and N. Djuric, “Multimodal trajectory predictions for autonomous driving using deep convolutional networks,” arXiv preprint arXiv:1809.10732, 2018.
 [9] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
 [10] N. Siddharth, B. Paige, A. Desmaison, V. de Meent, F. Wood, N. D. Goodman, P. Kohli, P. H. Torr et al., “Inducing interpretable representations with variational autoencoders,” arXiv preprint arXiv:1611.07492, 2016.
 [11] W.-N. Hsu, Y. Zhang, and J. Glass, “Unsupervised learning of disentangled and interpretable representations from sequential data,” in Advances in Neural Information Processing Systems, 2017, pp. 1878–1889.
 [12] N. Deo and M. M. Trivedi, “Convolutional social pooling for vehicle trajectory prediction,” arXiv preprint arXiv:1805.06771, 2018.
 [13] Y. Nishiwaki, C. Miyajima, N. Kitaoka, R. Terashima, T. Wakita, and K. Takeda, “Generating lane-change trajectories of individual drivers,” in Vehicular Electronics and Safety, 2008. ICVES 2008. IEEE International Conference on. IEEE, 2008, pp. 271–275.
 [14] A. Bhattacharyya, B. Schiele, and M. Fritz, “Accurate and diverse sampling of sequences based on a “best of many” sample objective,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8485–8493.
 [15] S. Park, B. Kim, C. M. Kang, C. C. Chung, and J. W. Choi, “Sequence-to-sequence prediction of vehicle trajectory via LSTM encoder-decoder architecture,” arXiv preprint arXiv:1802.06338, 2018.
 [16] B. Kim, C. M. Kang, S. H. Lee, H. Chae, J. Kim, C. C. Chung, and J. W. Choi, “Probabilistic vehicle trajectory prediction over occupancy grid map via recurrent neural network,” arXiv preprint arXiv:1704.07049, 2017.
 [17] T. Gindele, S. Brechtel, and R. Dillmann, “A probabilistic model for estimating driver behaviors and vehicle trajectories in traffic environments,” in Intelligent Transportation Systems (ITSC), 2010 13th International IEEE Conference on. IEEE, 2010, pp. 1625–1631.
 [18] Y. Hu, W. Zhan, and M. Tomizuka, “A framework for probabilistic generic traffic scene prediction,” in Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 2790–2796.
 [19] M. Babaeizadeh, C. Finn, D. Erhan, R. H. Campbell, and S. Levine, “Stochastic variational video prediction,” arXiv preprint arXiv:1710.11252, 2017.
 [20] E. J. Keogh and M. J. Pazzani, “Scaling up dynamic time warping for datamining applications,” in Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2000, pp. 285–289.
 [21] J. Quiñonero-Candela, C. E. Rasmussen, F. Sinz, O. Bousquet, and B. Schölkopf, “Evaluating predictive uncertainty challenge,” in Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment. Springer, 2006, pp. 1–27.
 [22] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in International Conference on Machine Learning, 2016, pp. 1050–1059.