High-fidelity behavior prediction of intelligent agents is critical in many applications. However, a prediction model trained on the training set may not generalize to the test set due to domain shift and time variance. This challenge motivates online adaptation algorithms that update prediction models in real time to improve prediction performance. Inspired by the extended Kalman filter (EKF), this paper introduces a series of online adaptation methods applicable to neural network-based models. A base adaptation algorithm, the modified EKF with forgetting factor (MEKF), is introduced first, followed by exponential moving average (EMA) filtering techniques. This paper then introduces a dynamic multi-epoch update strategy to effectively utilize samples received in real time. Combining all these extensions, we propose a robust online adaptation algorithm: MEKF with exponential moving average and dynamic multi-epoch update strategy. Experiments demonstrate that the proposed algorithm outperforms existing methods.
Robust Online Model Adaptation by Extended Kalman Filter with Exponential Moving Average and Dynamic Multi-Epoch Strategy
Keywords: online adaptation, extended Kalman filter, exponential moving average, optimization
Supervised learning has been widely used to obtain models to predict the behaviors of intelligent agents [Rudenko et al.(2019)Rudenko, Palmieri, Herman, Kitani, Gavrila, and Arras]. Behavior prediction is a sub-topic of time series prediction [Weigend(2018)], which includes but is not limited to vehicle trajectory prediction during autonomous driving [Lefèvre et al.(2014)Lefèvre, Vasquez, and Laugier] and human-motion prediction during human-robot collaboration [Cheng et al.(2019)Cheng, Zhao, Liu, and Tomizuka]. Although a trained model typically performs well on the training set, performance can significantly drop in a slightly different test domain or under a slightly different data distribution [Si et al.(2019)Si, Wei, and Liu, Callison-Burch et al.(2010)Callison-Burch, Koehn, Monz, Peterson, Przybocki, and Zaidan]. For tasks without annotated corpora from the test domain, adaptation techniques are required to deal with the lack of domain-specific data. This paper studies robust online adaptation algorithms for behavior prediction.
In online adaptation, a prediction model observes instances sequentially over time. After every observation, the model outputs a prediction and receives the ground truth. Then the online adaptation algorithm updates the prediction model according to the error measured between the prediction and the ground truth. The goal of adaptation is to improve the prediction accuracy in subsequent rounds.
For prediction models encoded in neural networks, most existing online adaptation approaches are based on stochastic gradients [Kivinen et al.(2004)Kivinen, Smola, and Williamson]. For example, the identification-based approach uses stochastic gradient descent (SGD) to adapt the model online [Bhasin et al.(2012)Bhasin, Kamalapurkar, Dinh, and Dixon]. However, these methods may be sub-optimal in minimizing the local prediction errors. Another solution is the recursive least squares parameter adaptation algorithm (RLS-PAA) [Ljung and Priouret(2010)], which has been applied to adapt the last layer of a feedforward neural network [Cheng et al.(2019)Cheng, Zhao, Liu, and Tomizuka] or the last layer of a recurrent neural network [Si et al.(2019)Si, Wei, and Liu]. RLS-PAA can adapt only the last layer of a neural network since it applies only to linear models. When other layers are adapted, the problem becomes nonlinear, which requires the development of robust optimal nonlinear adaptation algorithms [Jazwinski(2007), Cooper et al.(2014)Cooper, Che, and Cao, Abuduweili et al.(2019)Abuduweili, Li, and Liu].
Since a neural network parameterizes a nonlinear system with a layered structure, learning or adaptation of the neural network is equivalent to parameter estimation of the nonlinear system. The extended Kalman filter (EKF) is one of the best methods for nonlinear parameter estimation [Jazwinski(2007)]; it is derived by linearizing the system equations at each time step and applying the Kalman filter (an optimal filter that minimizes the tracking error) to the linearized system. The EKF approach has been demonstrated to be superior to SGD-based algorithms in training feedforward neural networks [Iiguni et al.(1992)Iiguni, Sakai, and Tokumaru, Ruck et al.(1992)Ruck, Rogers, Kabrisky, Maybeck, and Oxley]. Nonetheless, in online adaptation, more recent data is more important [Fink et al.(2001)Fink, Nelles, Fischer, and Isermann]. Similar to adaptive EKF methods [Yang et al.(2006)Yang, Lin, Huang, and Zhou, Ozbek and Efe(2004), Anderson and Moore(2012)] that discount old measurements, this paper considers the modified extended Kalman filter with forgetting factor, MEKF, as the base adaptation algorithm.
On top of the base adaptation algorithm, the following modifications are made. Generally, the step size of the parameter update in EKF-based approaches may not be optimal, due to the error introduced during linearization. Inspired by exponential moving average (EMA) methods, this paper applies EMA filtering to the base MEKF in order to increase the convergence rate; the result is the MEKF with EMA filtering. Then, in order to effectively utilize the samples in online adaptation, this paper proposes a dynamic multi-epoch update strategy that discriminates “hard” samples from “easy” samples and assigns them different weights. The dynamic multi-epoch update strategy can improve the effectiveness of online adaptation with any base optimizer, e.g., SGD or MEKF. By combining MEKF with EMA filtering and the dynamic multi-epoch update strategy, we obtain the proposed algorithm, MEKF with exponential moving average and dynamic multi-epoch update strategy.
The remainder of the paper first formulates the online adaptation problem, then discusses the proposed algorithm, and finally validates the effectiveness and flexibility of the proposed algorithms in experiments.
2 Online adaptation framework
The behavior prediction problem is to infer the future behavior of the target agent given the past and current measurements of the target agent and its surrounding environment. The transition model for the behavior prediction problem is formulated as
where the input vector denotes the stack of -step current and past measurements (e.g. trajectory or extracted features) at time steps . The output vector denotes the stack of the -step future behavior (e.g. future trajectory) at time steps . The function is the prediction model that maps the measurements to the future behavior. denotes the (ground truth) parameter of the model. It is assumed that there are recurrent structures in such that the prediction of is made by rolling out the one-step predictions . The function is the one-step prediction function and is a recurrent part of the overall prediction model.
Online adaptation explores local overfitting to minimize the prediction error. At time step , the following prediction error is to be minimized
where is the ground truth trajectory (to be observed in the future) and is the predicted trajectory using the estimated model parameter . The adaptation objective can be in any norm. This paper considers norm. Assume that the true model parameter changes slowly during adaptation, i.e., . Then the estimated model parameter that minimizes the prediction error in the future can be approximated by the estimated parameter that minimizes the fitting error in the past. Solving for the estimated parameter that minimizes the fitting error corresponds to a nonlinear least square (NLS) problem.
[Problem NLS] Given a dataset , find that minimizes , where error term is defined as .
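With symbols assumed from the standard formulation (here we write the input as x_τ, the ground truth as y_τ, the prediction model as F, and the parameter as θ; the paper's own notation may differ), Problem NLS can be written as:

```latex
\hat{\theta}_t = \arg\min_{\theta} \sum_{\tau=1}^{t} \left\| e_\tau(\theta) \right\|^2,
\qquad e_\tau(\theta) = y_\tau - F(x_\tau; \theta).
```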
In online adaptation, the estimate of the model parameter is updated iteratively when new data is received. Then a new prediction is made using the new estimate. In the next time step, the estimate is updated again given the new observation, and the process repeats. It is worth noting that the observation we receive at time is . The other terms in remain unknown. This paper focuses on adaptation methods using only one-step observations. It is possible to adapt with multi-step observations, which will be studied in the future. The process for online adaptation is summarized in algorithm 1. is the estimate of the model parameter at time .
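The loop of algorithm 1 can be sketched as follows; `model` and `adapt_step` are placeholder names for the prediction model and the base optimizer (e.g., SGD or MEKF), not functions defined in the paper.

```python
def online_adaptation(model, adapt_step, theta, data_stream):
    """Generic single-epoch online adaptation loop (sketch of algorithm 1).

    `model(x, theta)` returns a prediction; `adapt_step(theta, x, y)` returns
    the updated parameter estimate after the ground truth `y` is observed.
    """
    predictions = []
    for x_t, y_t in data_stream:             # observations arrive sequentially
        y_hat = model(x_t, theta)            # predict with the current estimate
        predictions.append(y_hat)
        theta = adapt_step(theta, x_t, y_t)  # update once the ground truth arrives
    return predictions, theta
```

Any optimizer with this one-step-update interface, including the MEKF variants below, can be plugged in as `adapt_step`.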
3 Robust nonlinear adaptation algorithms
3.1 Modified EKF with forgetting factor
Our base adaptation algorithm is inspired by the recursive EKF method [Moriyama et al.(2003)Moriyama, Yamashita, and Fukushima, Alessandri et al.(2007)Alessandri, Cuneo, Pagnan, and Sanguineti]. In EKF, the object being estimated is the state of a dynamic system, while in adaptable prediction, the objects to be adapted are the parameters that describe the system dynamics. Nonetheless, we can apply the EKF approach to adapt model parameters by regarding the model parameters as system states. Assuming that the ground truth changes very slowly, we can pose the parameter adaptation problem as a static state estimation problem [Ruck et al.(1992)Ruck, Rogers, Kabrisky, Maybeck, and Oxley, Nelson(2000)] with the following dynamics,
where is an estimate of the (ground truth) model parameter ; is the observation at time ; and is the prediction for time step made at time . is the one-step prediction function. The injected process noise and the injected measurement noise are assumed to be zero-mean Gaussian white noise, independently and identically distributed. The symbol represents the Gaussian distribution. and represent the covariance matrices of the process noise and the measurement noise, respectively. For simplicity, we assume and for and . The only requirement on these two terms is that they be positive semidefinite. If there is no knowledge regarding the cross-correlation of noise in the outputs, it is reasonable to assume that the final output nodes are independent of each other, and to set and proportional to the identity matrix.
In online adaptation, we assume that data in the distant past is no longer relevant for modeling the current dynamics, i.e., more recent data is more important. Hence, we consider a weighted version of the nonlinear least squares (NLS) problem, solved recursively:
where is the “forgetting factor”, which exponentially decays the weight of older samples. The forgetting factor prevents the EKF from saturating and increases the algorithm’s ability to track a changing system. Algorithm 2 summarizes the modified extended Kalman filter with forgetting factor (MEKF).
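Under the same assumed notation as Problem NLS, with λ denoting the forgetting factor, the weighted objective takes the standard exponentially weighted form:

```latex
\hat{\theta}_t = \arg\min_{\theta} \sum_{\tau=1}^{t} \lambda^{t-\tau}
\left\| y_\tau - F(x_\tau; \theta) \right\|^2, \qquad 0 < \lambda \le 1.
```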
In algorithm 2, is the Kalman gain. is a matrix representing the uncertainty in the estimates of the model parameter . is the gradient matrix by linearizing the network. In online adaptation, is initialized by the offline trained parameter of the model. For , due to the absence of any a priori information, the matrix can be set to be proportional to the identity matrix, i.e., for .
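A minimal NumPy sketch of one MEKF update is given below, assuming the standard EKF-for-parameters recursion with the forgetting factor dividing the covariance update; the exact placement of the forgetting factor and the noise terms in algorithm 2 may differ.

```python
import numpy as np

def mekf_step(theta, P, x, y, f, jac, lam=0.99, Q=None, R=None):
    """One MEKF update with forgetting factor `lam` (hedged sketch).

    theta : (n,) parameter estimate      P : (n, n) uncertainty matrix
    f(x, theta) -> (m,) prediction       jac(x, theta) -> (m, n) Jacobian
    """
    n = theta.size
    m = np.atleast_1d(f(x, theta)).size
    Q = np.zeros((n, n)) if Q is None else Q   # process-noise covariance
    R = 1e-2 * np.eye(m) if R is None else R   # measurement-noise covariance
    H = jac(x, theta)                          # linearize the network at theta
    S = H @ P @ H.T + R                        # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    theta = theta + K @ (y - f(x, theta))      # parameter update
    P = (P - K @ H @ P) / lam + Q              # discount old information
    return theta, P
```

Per the text, `theta` is initialized with the offline-trained parameters and `P` with a scaled identity matrix.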
3.2 Extensions with exponential moving average filtering
In the following discussion, for simplicity, an optimizer (e.g. MEKF) that solves the adaptation problem will be denoted as with internal state matrix . The optimization process for adaptation can be compactly written as
where is the step size of the parameter update at time step .
In SGD-based methods, numerous variants of the exponential moving average (EMA) have been successfully used to speed up convergence, including Polyak averaging [Polyak(1964)] and momentum [Qian(1999)]. We can likewise use EMA filtering in the MEKF optimization process: applying EMA to the step size , discussed below as EMA-V or momentum, and applying EMA to the optimizer’s inner state , discussed below as EMA-P.
EMA-V or momentum is widely used in SGD-based optimization algorithms [Qian(1999)]; it helps accelerate gradient-based optimizers in relevant directions and dampens oscillations [Qian(1999)]. Momentum can be regarded as an EMA filter on the step size of the parameter update: it computes the step size by exponentially decaying the older step sizes with a factor , i.e. .
As mentioned earlier, in MEKF, is a matrix representing the uncertainty in the parameter estimates. In order to attenuate instability during adaptation caused by anomalous data, we can smooth the inner state of the optimizer by pre-filtering . The principle of pre-filtering the inner state (e.g., the gradient or the adaptive learning rate) before using it in optimization is applicable to many optimization algorithms. For example, in Adam [Kingma and Ba(2014)], the estimates of the first and second moments are filtered at every step using EMA. Similarly, we can apply EMA to the inner state matrix of .
By combining EMA-V and EMA-P, we propose the modified extended Kalman filter with exponential moving average (MEKF) algorithm, as shown in algorithm 3, where is a momentum factor and is a decay factor for the EMA filtering of .
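The two filters can be sketched on top of the base update as follows; the placement of the momentum factor `mu` (EMA-V) and the decay factor `gamma` (EMA-P) follows our reading of the text and is an assumption, not a transcription of the paper's algorithm 3.

```python
import numpy as np

def mekf_ema_step(theta, P, v, x, y, f, jac,
                  lam=0.99, mu=0.9, gamma=0.9, R=None):
    """MEKF step with EMA-V (momentum on the step size) and EMA-P
    (EMA smoothing of the inner state P). Hedged sketch only.
    """
    m = np.atleast_1d(f(x, theta)).size
    R = 1e-2 * np.eye(m) if R is None else R
    H = jac(x, theta)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    step = K @ (y - f(x, theta))            # raw MEKF step size
    v = mu * v + step                       # EMA-V: momentum on the step
    theta = theta + v
    P_new = (P - K @ H @ P) / lam
    P = gamma * P + (1.0 - gamma) * P_new   # EMA-P: smooth the inner state
    return theta, P, v
```

With `mu = 0` and `gamma = 0` the update reduces to the base MEKF step.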
3.3 Dynamic multi-epoch update strategy
In generic online adaptation, all data are treated equally. We run the adaptation algorithm chronologically from the first data sample to the last data sample . Every data sample is used only once, as shown in algorithm 1. The method that uses every data sample only once in the adaptation is called the single-epoch online update strategy.
Inspired by curriculum learning [Bengio et al.(2009)Bengio, Louradour, Collobert, and Weston] in offline training, we introduce a more effective way to determine the number of adaptation epochs for every data sample during online adaptation. A curriculum can be viewed as a sequence of training criteria, where each criterion in the sequence is associated with a different set of weights on the training examples. In particular, it is practically useful to differentiate “easy” samples from “hard” samples. In the online adaptation scenario, we introduce the following dynamic multi-epoch strategy to mimic curriculum learning.
[Dynamic multi-epoch online update strategy] In online adaptation, the predicted output generated by the estimated parameter is . Suppose there is a criterion to determine the number of epochs to adapt the parameter with the current sample, i.e., . In other words, we reuse the input-output pair times to adapt the parameter . This approach is called the dynamic multi-epoch online update strategy or dynamic multi-epoch update.
We propose a straightforward criterion to determine the number of epochs for every sample, as shown in algorithm 4. Two thresholds and are used to discriminate “easy”, “hard”, and “anomaly” samples. Before updating the parameter, we calculate the prediction error at the current step. If the error satisfies , the sample is considered an “easy” sample, and we run a single-epoch update for it. If the error satisfies , the sample is considered a “hard” sample; we then reuse this sample and run the adaptation twice. The rationale is that for a “hard” sample, an adaptation optimizer may not learn enough under a single-epoch update. If the error satisfies , the sample is considered an “anomaly” sample, and we skip the update of . The rationale is that if the cost is too high, the sample is likely an anomalous point in the data distribution, which may destabilize the model adaptation process if learned. It is crucial to identify and learn more from the “hard” samples without losing the generalizability gained from learning the other samples.
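The criterion can be sketched as a small function mapping the current prediction error to an epoch count, with `c1` and `c2` standing in for the two thresholds (names are ours, not the paper's):

```python
def num_epochs(error, c1, c2):
    """Epoch count n in {0, 1, 2} for a sample (sketch of algorithm 4):
    "easy" samples get one update, "hard" samples two, "anomaly" samples none.
    """
    if error <= c1:
        return 1      # easy: single-epoch update
    elif error <= c2:
        return 2      # hard: reuse the sample once more
    return 0          # anomaly: skip the update

def dme_adapt(theta, x, y, error, adapt_step, c1, c2):
    """Apply the base optimizer `adapt_step` n times on the same sample."""
    for _ in range(num_epochs(error, c1, c2)):
        theta = adapt_step(theta, x, y)
    return theta
```

Because the base optimizer is called unchanged, this wrapper works with SGD, Adam, or the MEKF variants alike.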
The thresholds and can be determined empirically on the validation set. If the dataset is noise-free, there is no need to identify “anomaly” samples and we set . In general, we recommend the following method to find the desired and . First, run single-epoch adaptation on the validation set and record each sample’s prediction error . Second, set as the quantile value of the errors and as the quantile value of the errors. That is, we regard of the samples as “easy” samples, of the samples as “hard” samples, and of the samples as “anomaly” samples.
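Threshold selection then reduces to taking quantiles of the validation-set errors. The quantile levels below (80% easy, 19% hard, 1% anomaly) are illustrative placeholders; the paper leaves the actual levels to empirical tuning on the validation set.

```python
import numpy as np

def dme_thresholds(val_errors, easy_q=0.8, hard_q=0.99):
    """Pick (c1, c2) from validation errors: samples below the easy_q
    quantile are "easy", between easy_q and hard_q are "hard", and above
    hard_q are "anomaly".
    """
    c1 = float(np.quantile(val_errors, easy_q))
    c2 = float(np.quantile(val_errors, hard_q))
    return c1, c2
```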
We use MEKF to denote MEKF with the dynamic multi-epoch update strategy.
4 Experiments
4.1 Experimental design
In the experiments, we consider multi-task prediction problems for simultaneous intention and trajectory prediction of either humans or vehicles. We construct Recurrent Neural Network [Salehinejad et al.(2017)Salehinejad, Sankar, Barfett, Colak, and Valaee] (RNN) based architectures to conduct experiments on Mocap dataset (human) and NGSIM dataset (vehicle) [Colyar and Halkias(2007)]. Details of the experiments are shown in section A.1. Before online adaptation, the prediction models are trained offline. In the following discussion, we studied the performance of various adaptation algorithms on these offline-trained models (with online adaptation on the test set). In particular, we evaluate the accuracy (0-1) for intention prediction, and the mean squared error (MSE) for trajectory prediction.
4.2 Experimental result
Comparison among different optimizers
This paragraph compares the proposed algorithm MEKF with the base algorithm MEKF and other commonly used optimizers such as SGD, Adam, and Amsgrad. For a fair comparison, we apply the dynamic multi-epoch update strategy to SGD (with momentum), Adam, and Amsgrad as well.
Table 1 shows the prediction performance of online-adapted models using different optimizers on the Mocap dataset and the NGSIM dataset. Compared to the stochastic gradient-based algorithms, the EKF-based methods perform better. In addition, the full MEKF has the best performance among all, due to the extensions inspired by EMA and the dynamic multi-epoch update. On the CMU Mocap dataset, Adam reduces the trajectory MSE by 3.73%, the base MEKF reduces it by 14.77%, and the full MEKF reduces it by 16.05%.
Effectiveness of extensions
| Dataset | Metrics | MEKF | MEKF + EMA-V | MEKF + EMA-P | MEKF + DME |
EMA-V or momentum rarely improves the performance. Two potential reasons are: 1) momentum does not help EKF-based optimizers. At every optimization step, an EKF-based optimizer has already incorporated the historical data, so its step size is already closer to the optimum than that of SGD, whose learning gain is manually defined rather than based on historical data. 2) A moving average on the parameter or the step size is more applicable to offline training than to online adaptation. The inapplicability is due to the fact that online adaptation can only process data sequentially in time, which differs significantly from the shuffled, repetitive, and batched process of offline training.
EMA-P slightly improves the performance of MEKF. Filtering of can smooth the inner state and improve convergence.
Dynamic multi-epoch update improves the performance of MEKF, and it has the best performance among all the proposed extensions.
Experiments in section A.2 show the effectiveness of the proposed discrimination criterion in the dynamic multi-epoch update strategy.
This paper studied online adaptation of neural network-based prediction models for behavior prediction. An EKF-based adaptation algorithm, MEKF, was introduced as an effective base algorithm for online adaptation. In order to improve the performance and convergence of MEKF, exponential moving average filtering was investigated, including momentum (EMA-V) and EMA-P. This paper then introduced a dynamic multi-epoch update strategy, which is compatible with any optimizer. By combining all extensions with the base MEKF algorithm, we obtained the robust online adaptation algorithm MEKF. In the experiments, we demonstrated the effectiveness of the proposed adaptation algorithms.
In the future, mathematical analysis of the proposed online adaptation algorithm MEKF will be performed in order to provide theoretical guarantees on stability, convergence, and boundedness. In addition, we will apply the proposed algorithm on a wider range of problems, which may not be limited to behavior prediction problems.
Appendix A Detailed Experiments
A.1 Experimental design
In the experiments, we consider a multi-task prediction problem for simultaneous intention and trajectory prediction. Intentions are discrete representations of future trajectories. For example, in vehicle behavior prediction, an intention can be acceleration or deceleration within a certain time window in the future.
The transition models for trajectory and intention prediction of the target agent are formulated as
where the input vector denotes the stack of -step current and past measurements at time steps . The measurements can include the position and velocity of the target agent as well as the state of the environment. For human behavior prediction, this paper uses raw position and velocity measurements of the human. For vehicle behavior prediction, this paper additionally uses environment features. The output vector denotes the stack of the -step future trajectory at time steps . Another output vector is a probability distribution over different intentions at time step . The function maps current and past measurements to the future trajectory, while the function maps current and past measurements to the current intention.
One possible design of the multi-task prediction model is to use an encoder-decoder-classifier architecture. The encoder serves as a common part for all sub-tasks, which maps the input vector to a hidden representation . The decoder works for trajectory prediction, which maps the hidden representation to the predicted future trajectory . The classifier aims to predict the intention from the hidden representation . Mathematically, the relationships among the encoder, the decoder, and the classifier are:
where is the parameter for the encoder, which affects all sub-tasks, is the parameter for the decoder, and is the parameter for the classifier. The total (ground truth) parameter of the model is .
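The encoder-decoder-classifier relationship can be sketched with linear maps standing in for the GRU encoder/decoder and the FC classifier; the weight matrices `W_enc`, `W_dec`, and `W_cls` (names ours) play the roles of the encoder, decoder, and classifier parameters, respectively.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(x, W_enc, W_dec, W_cls):
    """Encoder-decoder-classifier sketch. The shared encoder maps the
    measurements to a hidden representation; the decoder and classifier
    branch off it for trajectory and intention prediction.
    """
    h = np.tanh(W_enc @ x)       # encoder: measurements -> hidden state
    y_traj = W_dec @ h           # decoder: hidden state -> future trajectory
    p_int = softmax(W_cls @ h)   # classifier: hidden state -> intention dist.
    return y_traj, p_int
```

Because the encoder is shared, adapting only `W_enc` online (as in Fig. 1) affects both sub-tasks at once.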
In online adaptation of multi-task learning, the adaptation algorithm updates the prediction model considering only the error measured between the predicted trajectory and the ground truth trajectory. The ground truth intention is not available for online adaptation since it is not directly observable. Figure 1 illustrates an online adaptation scheme that adapts only the encoder’s parameter.
Neural network architecture
We construct Recurrent Neural Network [Salehinejad et al.(2017)Salehinejad, Sankar, Barfett, Colak, and Valaee] (RNN) based architectures in our experiments to evaluate the effectiveness of MEKF and MEKF, as shown in Fig. 2. The neural networks follow the encoder-decoder-classifier structure for simultaneous intention and trajectory prediction shown in Fig. 1. Trajectory prediction is based on an encoder-decoder structure [Sutskever et al.(2014)Sutskever, Vinyals, and Le] and intention prediction is based on an encoder-classifier structure. Both the encoder and the decoder are composed of single-layer Gated Recurrent Units (GRUs) [Cho et al.(2014)Cho, van Merrienboer, Bahdanau, and Bengio], and the classifier is a two-layer fully connected (FC) neural network. In order to improve the performance of the offline-trained model, an attention mechanism [Bahdanau et al.(2014)Bahdanau, Cho, and Bengio] is applied to the output vectors of the encoder.
We used the Mocap dataset and the NGSIM dataset in our experiments. Each dataset was randomly split by trial into 80% offline training, 10% offline validation, and 10% testing.
Mocap dataset. This is a human-motion capture dataset collected by researchers from CMU. We chose the wrist trajectories of three actions (walking, running, and jumping) of all subjects in the Mocap dataset. The intentions are identified with the labeled actions. There are 543 trials across the three actions.
US 101 human driving data from the Next Generation SIMulation (NGSIM) dataset. This is a widely used benchmark dataset for autonomous driving [Colyar and Halkias(2007)]. We extract three actions from the dataset: driving with constant speed, acceleration, and deceleration. At time step , if the vehicle will accelerate (or decelerate) in the next three seconds, we label the intention as acceleration (or deceleration) at time step . Otherwise, we label it as constant speed. In our experiment, we used a subset of the dataset containing 100 trials across the three actions.
We used accuracy to evaluate the intention prediction and average mean square error (MSE) for the trajectory prediction. The average MSE is computed as,
where is the total number of time steps in the testing set. To maintain similar orders of magnitude on the different datasets, we used decimeters (dm) for the trajectories in the Mocap dataset and meters (m) for the trajectories in the NGSIM dataset.
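Under the usual reading of this metric (the squared trajectory error averaged over all predicted entries across the testing time steps), it can be sketched as:

```python
import numpy as np

def average_mse(y_true, y_pred):
    """Average MSE over the testing set: mean of squared errors across all
    time steps and trajectory dimensions (hedged reconstruction of the
    metric described in the text).
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))
```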
Before online adaptation, the prediction models are trained offline. We used an Adam optimizer with a batch size of 128 and a learning rate of 0.01. For the Mocap dataset, the past steps of input information were used to predict the trajectories of the future steps and the intention. We used a concatenation of raw trajectory and speed as the input information. For the NGSIM dataset, the past steps of input information were used to predict the trajectories of the future steps and the intention. We used a concatenation of raw trajectories and extracted features as the input information. The extracted features were similar to those used in parameter-sharing generative adversarial imitation learning [Bhattacharyya et al.(2018)Bhattacharyya, Phillips, Wulfe, Morton, Kuefler, and Kochenderfer]. Table 3 shows the prediction performance after offline learning. In the adaptation experiments, we studied the performance of various adaptation algorithms on the hidden weights of the encoder of these offline-trained models (with online adaptation on the test set).
| Metrics | CMU Mocap dataset | NGSIM dataset |
| MSE | 3.271 (dm) | 2.559 (m) |
A.2 Additional experiment
Effectiveness of proposed criterion in dynamic multi-epoch update strategy
In order to demonstrate the effectiveness of the proposed discrimination criterion in the dynamic multi-epoch update strategy, we designed the following experiment on the NGSIM dataset, comparing three different criteria for DME.
- Proposed criterion: the criterion discussed in section 3.3. In particular, we set as the quantile value of the errors and as the quantile value of the errors. Under the error spectrum, the first are “easy” samples, the middle to are “hard” samples, and the last are “anomaly” samples.
- Fixed criterion: we set for all samples. That is, we run a fixed 2-epoch update strategy and use each sample twice.
- Random criterion: for each sample, we set with probability , with probability , and with probability . That is, the random criterion has the same “easy” and “hard” ratios as the proposed criterion, but distinguishes “easy”, “hard”, and “anomaly” samples randomly.
| w/o DME | fixed criterion | random criterion | proposed criterion |
The results in table 4 show that the proposed criterion outperforms the other criteria, which justifies the effectiveness of the proposed error-based criterion. Nonetheless, we will investigate more principled and effective criteria for the dynamic multi-epoch update in the future.
- We didn’t perform full body motion prediction, since it requires special design of the neural network model to encode the geometric constraints, which is out of the scope of this paper.
- In our experiment, value of acceleration was denoted as acceleration, and value of acceleration was denoted as deceleration.
- Abulikemu Abuduweili, Siyan Li, and Changliu Liu. Adaptable human intention and trajectory prediction for human-robot collaboration. arXiv preprint arXiv:1909.05089, 2019.
- Angelo Alessandri, Marta Cuneo, S Pagnan, and Marcello Sanguineti. A recursive algorithm for nonlinear least-squares problems. Computational Optimization and Applications, 38(2):195–216, 2007.
- Brian DO Anderson and John B Moore. Optimal filtering. Courier Corporation, 2012.
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48. ACM, 2009.
- Shubhendu Bhasin, Rushikesh Kamalapurkar, Huyen T Dinh, and Warren E Dixon. Robust identification-based state derivative estimation for nonlinear systems. IEEE Transactions on Automatic Control, 58(1):187–192, 2012.
- Raunak P Bhattacharyya, Derek J Phillips, Blake Wulfe, Jeremy Morton, Alex Kuefler, and Mykel J Kochenderfer. Multi-agent imitation learning for driving simulation. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1534–1539. IEEE, 2018.
- Chris Callison-Burch, Philipp Koehn, Christof Monz, Kay Peterson, Mark Przybocki, and Omar F Zaidan. Findings of the 2010 joint workshop on statistical machine translation and metrics for machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 17–53. Association for Computational Linguistics, 2010.
- Yujiao Cheng, Weiye Zhao, Changliu Liu, and Masayoshi Tomizuka. Human motion prediction using semi-adaptable neural networks. In 2019 American Control Conference (ACC), pages 4884–4890. IEEE, 2019.
- KyungHyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. CoRR, abs/1409.1259, 2014. URL http://arxiv.org/abs/1409.1259.
- James Colyar and John Halkias. Us highway 101 dataset. Federal Highway Administration (FHWA), Tech. Rep. FHWA-HRT-07-030, 2007.
- John Cooper, Jiaxing Che, and Chengyu Cao. The use of learning in fast adaptation algorithms. International Journal of Adaptive Control and Signal Processing, 28(3-5):325–340, 2014.
- Alexander Fink, Oliver Nelles, Martin Fischer, and Rolf Isermann. Nonlinear adaptive control of a heat exchanger. International Journal of Adaptive Control & Signal Processing, 15(8):883–906, 2001.
- Youji Iiguni, Hideaki Sakai, and Hidekatsu Tokumaru. A real-time learning algorithm for a multilayered neural network based on the extended kalman filter. IEEE Transactions on Signal processing, 40(4):959–966, 1992.
- Andrew H Jazwinski. Stochastic processes and filtering theory. Courier Corporation, 2007.
- Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Jyrki Kivinen, Alexander J Smola, and Robert C Williamson. Online learning with kernels. IEEE transactions on signal processing, 52(8):2165–2176, 2004.
- Stéphanie Lefèvre, Dizan Vasquez, and Christian Laugier. A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH journal, 1(1):1, 2014.
- Lennart Ljung and Pierre Priouret. A result on mean square error obtained using general tracking algorithms. International Journal of Adaptive Control & Signal Processing, 5(4):231–248, 2010.
- Hiroyuki Moriyama, Nobuo Yamashita, and Masao Fukushima. The incremental gauss-newton algorithm with adaptive stepsize rule. Computational Optimization and Applications, 26(2):107–141, 2003.
- Alex T Nelson. Nonlinear estimation and modeling of noisy time-series by dual kalman filtering methods. Doctor of Philosopy, Oregon Graduate Institute of Science and Technology, 2000.
- Levent Ozbek and Murat Efe. An adaptive extended kalman filter with application to compartment models. Communications in Statistics-Simulation and Computation, 33(1):145–158, 2004.
- Boris T Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
- Ning Qian. On the momentum term in gradient descent learning algorithms. Neural networks, 12(1):145–151, 1999.
- Dennis W. Ruck, Steven K. Rogers, Matthew Kabrisky, Peter S. Maybeck, and Mark E. Oxley. Comparative analysis of backpropagation and the extended kalman filter for training multilayer perceptrons. IEEE Transactions on Pattern Analysis & Machine Intelligence, (6):686–691, 1992.
- Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M Kitani, Dariu M Gavrila, and Kai O Arras. Human motion trajectory prediction: A survey. arXiv preprint arXiv:1905.06113, 2019.
- Hojjat Salehinejad, Sharan Sankar, Joseph Barfett, Errol Colak, and Shahrokh Valaee. Recent advances in recurrent neural networks. arXiv preprint arXiv:1801.01078, 2017.
- Wenwen Si, Tianhao Wei, and Changliu Liu. Agen: Adaptable generative prediction networks for autonomous driving. In 2019 IEEE Intelligent Vehicles Symposium (IV), pages 281–286. IEEE, 2019.
- Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112, 2014.
- Andreas S Weigend. Time series prediction: forecasting the future and understanding the past. Routledge, 2018.
- Jann N Yang, Silian Lin, Hongwei Huang, and Li Zhou. An adaptive extended kalman filter for structural damage identification. Structural Control and Health Monitoring: The Official Journal of the International Association for Structural Control and Monitoring and of the European Association for the Control of Structures, 13(4):849–867, 2006.