Closed-Loop Statistical Verification of Stochastic Nonlinear Systems Subject to Parametric Uncertainties

Closed-Loop Statistical Verification of Stochastic Nonlinear Systems Subject to Parametric Uncertainties

John F. Quindlen Ph.D. Candidate, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology (MIT).    Ufuk Topcu Assistant Professor, Department of Aerospace Engineering and Engineering Mechanics, University of Texas at Austin    Girish Chowdhary Assistant Professor, Departments of Agricultural & Biological Engineering and Aerospace Engineering, University of Illinois Urbana-Champaign    Jonathan P. How Richard C. Maclaurin Professor of Aeronautics and Astronautics, Laboratory for Information and Decision Systems (LIDS), MIT

This paper proposes a statistical verification framework using Gaussian processes (GPs) for simulation-based verification of stochastic nonlinear systems with parametric uncertainties. Given a small number of stochastic simulations, the proposed framework constructs a GP regression model and predicts the system’s performance over the entire set of possible uncertainties. Included in the framework is a new metric to estimate the confidence in those predictions based on the variance of the GP’s cumulative distribution function. This variance-based metric forms the basis of active sampling algorithms that aim to minimize prediction error through careful selection of simulations. In three case studies, the new active sampling algorithms demonstrate up to a 35% improvement in prediction error over other approaches and are able to correctly identify regions with low prediction confidence through the variance metric.

I Introduction

Ever increasing demands for high performance necessitates the adoption of advanced, nonlinear control techniques in a wide variety of engineering systems. The complexity of these controllers, coupled with the increasing complexity of the open-loop systems themselves, complicates verification that the closed-loop system actually satisfies the performance requirements at off-nominal conditions, called parametric uncertainties. Control system verification analyzes the closed-loop system and attempts to identify at which parametric uncertainties the resulting trajectory fails to meet the requirements.

Traditional analytical and numerical verification techniques[1, 2] provably verify the system’s performance at various uncertainties, but require modeling assumptions that restrict their suitability to complex systems with highly nonlinear dynamics and hybrid controllers. Recent work[3, 4, 5] addressed some of these concerns, but are limited by the size of the state and parameter spaces. Stochastic dynamics further challenge these approaches and restrict their applicability. For comparison, statistical verification techniques[6, 7] replace provable guarantees with looser probabilistic bounds, but are capable of handling a much broader range of problems with fewer restrictions. Statistical verification methods rely upon extensive simulations of the system to test the likelihood of requirement satisfaction at various conditions and are well-suited to stochastic systems.

Recent work developed data-driven statistical verification techniques[8, 9] to efficiently verify deterministic closed-loop systems without relying upon exhaustive simulations. These data-driven methods construct machine learning models from small sets of simulation trajectories and predict the response at unobserved parameter settings. The latest work[10] introduced Gaussian process (GP) prediction models and novel active sampling procedures to improve the accuracy of predictions given limited amounts of simulation data. The deterministic GP-based approach also quantifies the prediction confidence online without the need for external validation datasets and carefully selects simulations to maximize the prediction confidence.

This paper extends data-driven methods to stochastic verification problems and provides three contributions: a GP-based statistical verification framework for stochastic systems, a metric to estimate the accuracy of the framework’s predictions, and active sampling algorithms specifically developed for closed-loop statistical verification. Given a small set of stochastic simulations, the first two contributions will predict the probability of requirement satisfaction at all possible parametric uncertainties and compute the confidence in those predictions. The third contribution exploits this prediction confidence to select subsequent simulations and minimize prediction error. Three case studies are provided to demonstrate the effectiveness of the approach to maximize accuracy given a constraint on the number of simulations.

Ii Problem Description

Consider the following nonlinear system


where is the state vector, is the control input vector, represents the parametric uncertainties, and is the stochastic noise. A candidate controller generates control inputs according to some predetermined function . Given this controller, the resulting closed-loop dynamics reduce to


and the trajectory is a function of nominal initial state , parametric uncertainties , and stochastic noise . We assume the nominal initial state is known and fixed.

The parametric uncertainties can arise from a variety of sources. For instance, these parameters may include system properties such as vehicle mass and inertia that will typically vary across different operational scenarios. The main impact of the parametric uncertainties is as variable initial conditions that determine the evolution of the subsequent state trajectory . These parametric uncertainties may not be known during real-world execution of the system, but are assumed to fall within some bounded, compact set that is known in advance.

Assumption 1

The set of all possible parametric uncertainties is a known, compact set .

Assumption 1 ensures the system has known feasible bounds on the uncertain conditions for the verification procedure to examine. Most physical systems will naturally have bounds on the set of allowable operating conditions. For instance, aircraft will typically fly within a maximum take-off weight set by structural limitations and a minimum (empty) weight.

The major difference between Equation 2 and the deterministic systems found in earlier problems[9, 10, 8] is the addition of stochastic noise in the dynamics. This noise term encapsulates any source of randomness, such as process and/or measurement noise and ultimately breaks the previous works’ fundamental assumption of deterministic (noise-free) dynamics. The randomness will cause multiple trajectories starting from the same and to obtain different results. The noise could have a large effect on the satisfaction of the performance requirements at a given and .

Ii-a Measurements of Trajectory Performance

A trajectory must meet certain performance requirements in order for it to be considered “satisfactory.” As before[10], these requirements are supplied by relevant certification authorities and may include a wide range of potential specifications with varying complexity. Commonly, the requirements can be written as signal temporal logic (STL)[11] specifications, which also offers a convenient method to quantify the satisfaction of the requirements. Central to STL is the computation of a scalar robustness degree that quantifies the minimum robustness of the trajectory with respect to the requirement specification . Therefore, the satisfaction of the requirements along the entire trajectory is captured with a single scalar variable. Building upon that concept, this work assumes the robustness of a trajectory to a given performance requirement is measured with a single measurement variable , where in problems with STL specifications. Other applications will specify the performance requirements and quantify their satisfaction through different methods, as will be seen in the task allocation example from Section V-B.

Assumption 2

The simulation model provides a single, scalar output that quantifies the minimum robustness of trajectory to a specified performance requirement. The sign of indicates satisfaction of the requirements, where signifies “satisfactory” performance and signifies “unsatisfactory” performance.

Since the closed-loop trajectory is a function of nominal (fixed) and , we write the trajectory robustness measurements as an explicit function of to emphasize the effect of the parametric uncertainties upon the robustness of the trajectory. As changes, so will the satisfaction of the performance requirement.

Ii-B Probabilistic Satisfaction of Requirements

Although a closed-loop trajectory and its corresponding robustness measurement are functions of parametric uncertainties, stochasticity will cause multiple trajectories at the same setting to obtain different . If a large number of simulations are performed at this setting, then the corresponding measurements will produce a continuous distribution of possible robustness values. The underlying true distribution of may take a variety of different forms, but this work addresses the case with a Gaussian distribution.

Assumption 3

The distribution of robustness measurements at every follows a Gaussian distribution with spatially-varying mean and constant standard deviation .

While this assumption does restrict the class of problems, it simplifies the implementation details to improve reader clarity and better highlight the contributions of the approach. This paper is intended to introduce the statistical verification framework and discuss the challenges and contributions on clearer, and still relevant, verification problems. Upcoming extensions will relax Assumption 3 to consider spatially-varying [12] and non-Gaussian distributions[13], but also require significantly more complex and sensitive statistical inference techniques to model those distributions. The implementation details will change in these extensions, but the fundamental approach remains the same.

From the verification perspective, where each trajectory either satisfies the requirement or does not, the Gaussian distribution of creates a Bernoulli distribution for the likelihood an arbitrary trajectory at will satisfy the requirement.

Definition 1

The satisfaction probability function defines the probability an arbitrary trajectory initialized with will satisfy the performance requirement.

This probability of satisfaction is the cumulative distribution of positive measurements,


While defines the probabilistic satisfaction of the requirements, the challenge for stochastic verification is will generally be unknown in advance. Instead, the statistical verification goal is to compute an estimated for all possible conditions in and minimize the difference between the predictions and .

Simultaneously, statistical verification must also contend with computational costs. In most applications, external constraints restrict the computational budget allocated to the statistical verification process. For instance, verification typically relies upon high-fidelity models of complex systems and subsystems that drive up the computational time for a single trajectory. It is infeasible to simply saturate with multiple simulations at every because each simulation trajectory requires a certain amount of time and computational resources. Regardless of the source, each application has some feasible limit on the amount of time, resources, or money allocated to the statistical verification process. This work models the feasible limit as an upper bound on the number of simulation trajectories.

Iii Statistical Verification Framework

Given the verification objective, there are multiple possible approaches to compute from a finite set of simulation-based observations. The most basic form is to treat each trajectory as a Bernoulli trial with binary evaluations {“satisfactory”, “unsatisfactory”} from the sign of . While this approach can address all types of systems, including those that do not meet Assumptions 2 and 3, a binary-based approach requires multiple simulations at every to construct a binomial distribution. Each repetition counts against the sampling budget and means one few simulation at another location. Although the cost of each simulation is less of a problem for applications with simple simulation models, the cost of training a prediction model given a large number of simulations at each location is non-negligible. Therefore, the large number of simulations required for practical predictions with binomial distributions restricts the tractability of those approaches.

Instead, this work entirely avoids the need for expensive binomial distributions in the relevant class of systems through the direct use of and its Gaussian probability density function (PDF). The main idea is to exploit a single stochastic trajectory at each training location and its noisy measurement to infer the underlying PDF, which defines the cumulative distribution . Although various computational methods are possible[14, 15], this problem is particularly well-suited to Gaussian process regression[16] due to the Gaussian distribution of .

Iii-a GP-based Prediction Model

The Gaussian process regression model follows a similar format to the earlier work in verification of deterministic systems[10]. A finite collection of total simulation trajectories forms a training dataset consisting of their parameter settings and corresponding robustness values . Unlike the deterministic approach, these stochastic measurements require the introduction of a Gaussian likelihood model, where the GP does not model directly, but infers the latent mean .

The training process constructs the GP regression model from the information provided by the training dataset . More details are found in [16], but fundamentally the training procedure uses Bayesian inference to place a posterior probability distribution on latent mean given . Assuming a zero-mean prior and likelihood model , the posterior predictive distribution at an arbitrary location follows a Gaussian distribution . The posterior predictive mean and covariance are given by


where is the scalar kernel function, is the vector of and is the matrix for . Different choices are possible, but this work uses the common squared exponential kernel with automatic relevance determination (SE-ARD)[16] for the kernel function.

The choice in GP hyperparameters may drastically change the predictive distribution, even with the same training set , since these hyperparameters control the kernel function and the likelihood model. Unfortunately, the optimal choice of hyperparameters that perfectly replicates will not be known in advance and must be estimated online. This work uses maximum likelihood estimation (MLE)[16] to optimize the kernel hyperparameters given only the information provided by . The use of the SE-ARD kernel also enables the MLE procedure to adjust the sensitivity of the kernel to each element of as may have a higher sensitivity to certain elements than others. Additionally, the likelihood model contains a hyperparameter to estimate standard deviation as this term may also be completely unknown. If is unknown, then an estimate may be computed in the same MLE optimization process as the kernel hyperparameters or separately by repeated sampling at a training locations in .

Iii-A1 Expected Probability of Satisfaction

Although and define the predictive PDF for , Equation 4 only completes half the goal. Definition 1 requires the cumulative distribution function (CDF) for , not the PDF. The CDF in Equation 3 defined the true , but this computation requires perfect knowledge of and . Instead, the predicted satisfaction probability function marginalizes the CDF over the posterior predictive distribution of ,


Iii-B Prediction Confidence

While Equation 5 computes the expected probability of requirement satisfaction, will likely fail to perfectly model . Therefore, the statistical verification framework must not only provide , but also indicate where its confidence in the accuracy of these predictions is low. Unsurprisingly, the true prediction error is unknown, but offline validation methods using external validation datasets[14] can estimate by comparing the predictions against the validation set’s known, true values. However, these external validation methods are wasteful as they generally require the removal of valuable training data for the independent validation set.

For comparison, probabilistic inequalities provide theoretically-justified bounds on without the use of validation sets. In particular, Chebyshev’s inequality bounds the prediction error by the variance of the CDF,


where . Lower CDF variance will translate into lower probabilistic bounds on and thus higher confidence in the accuracy of . The primary challenge with Equation 6 is the variance lacks an analytical closed-form solution. Fortunately, the variance can be approximated using a 1st or 2nd order Taylor series expansion of the nonlinear random function[17]. The 1st order approximation is


which also happens to conservatively upper bound the 2nd order approximation. For simplicity, we subsequently refer to the approximate CDF variance in Equation 7 as .

Despite the convenience of Equation 7, the accuracy of an approximation of a nonlinear random function is limited. This inaccuracy makes it inadvisable to blindly substitute Equation 7 into Equation 6 without careful consideration. However, the approximation in Equation 7 does still provide a perfect metric to identify regions of where the confidence in is low and provide intuition on the sensitivity of . The power of Equation 7 is its value as a computationally-efficient, online validation tool for signifying prediction confidence. For instance, approximate variance is highest in regions with small and large , meaning the confidence in the predictions should be quite low. The next section will exploit the approximate CDF variance to derive a closed-loop verification process that seeks to minimize prediction errors.

Iv Closed-Loop Statistical Verification

As discussed in Section II, statistical verification will have some feasible limit on the amount of time or computational resources allocated to the overall process. This limit is approximated as a cap on the number of simulation trajectories, . Given this restriction on the size of training dataset , the ideal scenario would only perform informative simulations and carefully select all to minimize the prediction error over all . However, the information-maximizing training dataset is not apparent until after all the trajectories have been obtained. To address that issue, this work applies active learning[18] to iteratively select informative settings for future simulations and minimize the expected prediction error. To emphasize the iterative, feedback-based nature of the active learning procedure, we label the process closed-loop statistical verification.

Active learning describes a wide variety of different procedures[18, 19, 20, 21, 22, 23], each with their own definition for the “best” sample to run next. Most of these procedures focus on a particular aspect of the Gaussian PDF. For instance, procedures focused on the PDF mean[19, 9] favor points with near zero, meaning the best location is . Although originally intended for binary classification with support vector machines, such an approach does correctly label points with low as informative since the CDF variance Equation 7 is high there. Likewise, PDF variance-based approaches and extensions[20, 21, 22, 23] are significantly more common for GP methods and aim to reduce the PDF variance . These approaches favor points with high variance, , which also correctly emphasizes points with comparatively high CDF variance since will be large. Although they both correctly emphasize certain aspects, neither of those two approaches explicitly minimizes the posterior CDF variance.

Iv-a Reduction in CDF Variance

Earlier work in deterministic closed-loop verification[10] specifically developed new sample selection metrics to maximize prediction confidence. Even though the implementation details have changed, this work utilizes the same underlying motivation and attempts to minimize the approximate CDF variance Equation 7 in order to maximize the confidence in . The ideal selection criterion would minimize the maximum posterior CDF variance after the new sample data at was added, but this requires the posterior training set to be known apriori. Prior knowledge of is an impossible proposition since cannot be known before a simulation has actually been performed there. Even if the expected posterior set replaces infeasible , the high computational cost of retraining the GP at every prospective sample location, nominally an operation and at best, renders the approach computationally intractable.

A more computationally tractable approach maximizes the local improvement in posterior CDF variance, effectively selecting the setting which will experience the greatest reduction in CDF variance if a simulation is performed at that location. The local change in CDF variance is labeled by . Although the change requires expected posterior information , it does not require the GP model to be retrained, which was the source of the previous computational intractability. The local posterior CDF variance at location after a measurement there can be written purely in terms of the current information. After the Woodbury matrix identity, the expected posterior covariance reduces to


while mean . Ultimately, the local change in CDF variance is given by


Note that the CDF variance reduction criterion in Equation 9 favors points with low and high , just as in Equation 7.

Iv-B Sampling Algorithms

The section criterion from Equation 9 forms the basis of the closed-loop stochastic verification framework. Since the actual set is infinite, the framework constructs a high resolution lattice to replicate and then selects all training locations from rather than directly . Set contains all remaining points not in training set available for future simulations. This paper presents two versions of the closed-loop verification framework.

Iv-B1 Sequential Sampling

The most straightforward implementation of closed-loop verification is sequential sampling, described in Algorithm 1. The process starts with an initial training set of passively-selected samples, usually obtained through random sampling or another open-loop design of experiments approach. This initial training set of size produces the initial GP so active sampling can be performed. The sequential procedure then selects one sample from according to Equation 9 before it performs one simulation at the selected vector, obtains , and updates the GP prediction model. This iterative procedure repeats until the number of remaining samples has been reached.

1:  Input: initial training set , available sample locations , max # of additional samples
2:  Initialize: train GP regression model
3:  for i=1:T do
4:     Select
5:     Perform simulation at , obtain measurement
6:     Add to training set , remove from
7:     Retrain GP model with updated
8:  end for
9:  Return: expected and variance
Algorithm 1 Sequential closed-loop stochastic verification

Iv-B2 Batch Sampling

In comparison to Algorithm 1, batch sampling[18] offers further computational efficiency by selecting multiple samples between retraining steps. Assuming the same limit , batch selection reduces the number of GP retraining processes and can also fully exploit any parallel computing capabilities of the simulation environment. Batch approaches select samples at once and perform their simulations in parallel. While this provides computational savings, it also introduces the possibility of redundant samples if the batch sample set does not possess adequate diversity.

First proposed for deterministic closed-loop verification in [10], determinantal point processes (DPPs)[24] present efficient probabilistic methods for encouraging diversity within without severe computational overhead. The batch framework first converts into a probability distribution , where is the normalization constant. A relatively small number of samples are obtained from according to and used to construct a DPP. This paper uses , but these are simply samples of locations from the distribution and not actual simulations are performed. The DPP randomly selects locations from for based upon a modified version of that penalizes similarities in and subsequently spreads the datapoints out across high-valued regions in with significantly less redundancy. Algorithm 2 details the batch closed-loop verification framework. Rather than number of additional simulations, Algorithm 2 operates in batches of simulations, assuming .

1:  Input: initial training set , available sample locations , # of iterations , batch size
2:  Initialize: train GP regression model
3:  for i=1:T do
4:     Initialize:
5:     Transform into distribution
6:     Form k-DPP from random samples of
7:     Generate random samples from DPP, add to
8:     Run simulation , obtain measurements
9:     Add to training set , remove from
10:     Retrain GP model with updated
11:  end for
12:  Return: expected and variance
Algorithm 2 Batch closed-loop stochastic verification framework using determinantal point processes

V Examples

This section examines statistical verification applied to three stochastic systems and demonstrates the improved performance of Algorithms 1 and 2 over existing techniques.

V-a Example 1: Model Reference Adaptive Control System

The first example considers a stochastic version of the concurrent learning model reference adaptive control (CL-MRAC) system from earlier deterministic work[9, 10]. The CL-MRAC example examines a second order linear system with two uncertain parameters . The adaptive control system estimates these parameters online and attempts to track a desirable reference trajectory. The nonlinearities associated with adaptation result in a highly nonlinear closed-loop system.

The verification goal is to predict the probability the actual trajectory will remain within 1 unit of the reference trajectory at all times between seconds, given in STL format as


where tracking error is the difference between actual position and reference position . The STL operator refers to “for all times between and seconds.” Measurement is the STL robustness degree for the trajectory initialized with uncertainty setting . The sampling lattice covers and with a grid of 40,401 points. In this example, the open-loop dynamics are subject to additive process noise with Gaussian distribution . The underlying true distribution was obtained by repeated sampling at each of the points in . To avoid any potential issues, Gaussian distributions were fit to the raw data and the variance was averaged across to return .

Figure 2 compares the performance of Algorithm 2 against similar procedures using the existing selection metrics discussed in Section IV as well as an open-loop, random sampling procedure. These procedures all start with an initial training set of 50 simulations and select batches of points until a total of 450 simulations has been reached. Neither the kernel hyperparameters nor are assumed to be known so the procedures estimate these online using maximum likelihood estimation. In order to fairly compare each procedure, the algorithms all start from the same 100 randomly-chosen initial training sets with the same random seed. At the conclusion of the process, Algorithm 2 demonstrates a 29%, 31%, and 35% improvement in average mean absolute error (MAE) over the PDF mean, PDF variance, and random sampling approaches. Additionally, the performance of the algorithms in each of the 100 test cases can be directly compared since they all start with the same and random seed. At this level, Figure 2 illustrates that Algorithm 2 will either eventually match or outperform the existing sampling strategies nearly 100% of the time. Although the exact numbers will change for different distributions, these results highlight the value of the selection criteria from Equation 9 to further reduce prediction error given a fixed number of samples.

Fig. 1: (Example 1) Comparison of mean absolute error (MAE) convergence for the four different sampling strategies. The standard deviation intervals around the mean (solid lines) are given by the 0.5 bound.
Fig. 2: (Example 1) Ratio of cases where Algorithm 2 directly outperforms or matches the MAE of the indicated strategies.
Fig. 1: (Example 1) Comparison of mean absolute error (MAE) convergence for the four different sampling strategies. The standard deviation intervals around the mean (solid lines) are given by the 0.5 bound.

Figure 3 demonstrates the utility of CDF variance to identify regions of large prediction errors, regardless of the particular sampling strategy. As discussed in Section III-B, the true CDF variance is unavailable, but the approximate CDF variance Equation 7 still provides a meaningful metric to compare prediction confidence across . One of the best uses of Equation 7 is to rank points in according to their CDF variance in order to identify which predictions to trust the least. Figure 3 displays the reduction in prediction error when the points with the top 5% of CDF variance are removed. The 12-22% improvement in MAE proves that the CDF variance did indeed correctly identify and remove regions with large . When the top 10% is removed, the improvement jumps to 40%.

Fig. 3: (Example 1) Reduction in MAE after all points with the top 5% of CDF variance are removed.

V-B Example 2: Robust Multi-Agent Task Allocation

The second example addresses the robust multi-agent task allocation problem from [25] with added stochasticity. The task allocation problem attempts to assign a team of four UAVs to complete fire surveillance tasks that will take longer or shorter depending on the wind speed and direction . Task durations are also corrupted by zero-mean Gaussian noise. Unforeseen time delays will compound and may potentially lead the UAVs to miss the completion of tasks within their assigned window, thus lowering the overall realized mission score. The verification goal is to determine whether the team of UAV agents will sufficiently complete the ordered tasking and achieve a minimum mission score at different wind settings. Sampling grid spans the set of feasible wind conditions km/hr with 16,641 possible trajectory settings.

Figures 5 and 5 compare the performance of Algorithm 1 against the competing sampling strategies for 250 randomly-initialized test cases. Ultimately, the MAE performance is consistent with the last example. Algorithm 1 demonstrates a 10-20% improvement in average MAE over the existing strategies and either matches or exceeds the MAE of the competing approaches when directly compared against one another.

Fig. 4: (Example 2) Comparison of MAE convergence for the four different sampling strategies over 250 randomly selected initializations.
Fig. 5: (Example 2) Ratio of cases where Algorithm 1 directly outperforms or matches the MAE of the indicated strategies.
Fig. 4: (Example 2) Comparison of MAE convergence for the four different sampling strategies over 250 randomly selected initializations.

Figure 6 demonstrates a similar reduction in MAE as witnessed in Figure 3. Once the points with the top 5% of CDF variance are removed, the MAE of the predictions in the remaining 95% of the data reduces by up to 12-16%. If the top 10% are removed, the average reduction reaches at least 28% for all the sampling strategies. This supports the conclusion that CDF variance identifies points with low prediction confidence where may be large.

Fig. 6: (Example 2) Reduction in MAE after all points with the top 5% of CDF variance are removed.

V-C Example 3: Lateral-Directional Autopilot

Lastly, the third example considers the altitude-hold requirement of a lateral-directional autopilot[26]. In this problem, a Dryden wind turbulence model[27] augments the original nonlinear aircraft dynamics model to introduce stochasticity into the closed-loop dynamics. The performance requirement expects the autopilot to maintain altitude within a set threshold of the initial altitude when the autopilot is engaged. This work explores a relaxed version of the original requirement[26] in which the aircraft must remain within a 55 foot window around the initial altitude ,


The performance measurement is the STL robustness degree . This example tests the satisfaction of the requirement against initial Euler angles for roll , pitch , and yaw and longitudinal inertia , with a constant reference heading of 112. The 4D grid consists of 937,692 possible parameter settings for the simulations.

Figures 8 and 8 illustrate the MAE performance of the four different sampling strategies given a batch size of . The equivalent of Figures 3 and 6 is skipped but displays the same exact behavior. This example begins with 100 simulations in the initial training set and compares the performance over 120 random initializations. As before, Algorithm 2 outperforms the PDF variance-focused and random sampling procedures with a 27% reduction in average MAE. Unlike the previous two examples, Algorithm 2 only slightly edges out the competing mean-focused strategy. This new result is due to the comparatively low for noisy from the low-altitude turbulence model. The resulting tighter distribution only experiences changes in near , which favors the mean-focused selection criteria. If the turbulence is increased then the distribution widens dramatically and the performance of the PDF mean-focused metric degrades significantly like the previous two examples. These case studies all highlight the improved performance of the novel CDF variance algorithms compared to the existing sampling strategies. Just as important, Algorithms 1 and 2 will remain the best-performing sampling strategy even as the distribution changes, whereas the performance of the other strategies will vary according to changes in .

Fig. 7: (Example 3) Comparison of MAE convergence for the four different sampling strategies over 120 random initializations.
Fig. 8: (Example 3) Ratio of cases where Algorithm 2 directly outperforms or matches the MAE of the indicated strategies.
Fig. 7: (Example 3) Comparison of MAE convergence for the four different sampling strategies over 120 random initializations.

Vi Conclusion

This work introduced machine learning methods for simulation-based statistical verification of stochastic nonlinear systems. In particular, the paper presented a GP-based statistical verification framework to efficiently predict the probability of requirement satisfaction over the full space of possible parametric uncertainties given a limited amount of simulation data. Additionally, Section III-B developed new criteria based upon the variance of the cumulative distribution function to qualify confidence in the predictions. This metric provides a simple validation tool for online identification of regions where the prediction confidence is low. Section IV builds upon this metric and introduces sequential and batch sampling algorithms for efficient closed-loop verification. These new verification procedures demonstrate up to a 35% improvement in prediction error over competing approaches in the three examples in Section V. The examples also serve to highlight the utility of the CDF variance to correctly identify low-confidence regions in the parameter space without external validation datasets.

While the paper only considers Gaussian distributions for the performance measurements, this work lays the foundation for more complex statistical verification frameworks capable of handing a wider range of distributions. Upcoming work will adapt the CDF variance metric and closed-loop verification procedures to recent developments in modeling of spatially-varying standard deviations[12] and non-Gaussian distributions[13]. Additionally, this work can be extended to high-dimensional GP representations[28]. While the implementation considers Gaussian distributions, the fundamental concepts have broader utility.


This work is supported by the Office of Naval Research and Air Force Office of Scientific Research under grants ONR MURI N00014-11-1-0688 and AFOSR FA9550-15-1-0146 .


  • [1] E. M. Clarke, O. Grumberg, and D. A. Peled, Model Checking.   MIT Press, 1999.
  • [2] S. Prajna, “Barrier certificates for nonlinear model validation,” Automatica, vol. 42, no. 1, pp. 117–126, 2006.
  • [3] U. Topcu, “Quantitative local analysis of nonlinear systems,” Ph.D. dissertation, University of California, Berkeley, 2008.
  • [4] A. Majumdar, A. A. Ahmadi, and R. Tedrake, “Control and verification of high-dimensional systems with DSOS and SDSOS programming,” in IEEE Conference on Decision and Control, 2014.
  • [5] J. Kapinski, J. Deshmukh, S. Sankaranarayanan, and N. Arechiga, “Simulation-guided Lyapunov analysis for hybrid dynamical systems,” in Hybrid Systems: Computation and Control, 2014.
  • [6] P. Zuliani, C. Baier, and E. M. Clarke, “Rare-event verification for stochastic hybrid systems,” in Hybrid Systems: Computation and Control, 2012.
  • [7] E. M. Clarke and P. Zuliani, “Statistical model checking for cyber-physical systems,” in International Symposium for Automated Technology for Verification and Analysis, 2011.
  • [8] A. Kozarev, J. F. Quindlen, J. How, and U. Topcu, “Case studies in data-driven verification of dynamical systems,” in Hybrid Systems: Computation and Control, 2016.
  • [9] J. F. Quindlen, U. Topcu, G. Chowdhary, and J. P. How, “Active sampling-based binary verification of dynamical systems,” in AIAA Guidance, Navigation, and Control Conference, 2018. [Online]. Available:
  • [10] ——. (2017) Active sampling for closed-loop statistical verification of uncertain nonlinear systems. Preprint. [Online]. Available:
  • [11] O. Maler and D. Nickovic, Monitoring Temporal Properties of Continuous Signals.   Springer Berlin Heidelberg, 2004, pp. 152–166.
  • [12] M. Lázaro-Gredilla and M. Titsias, “Variational heteroscedastic gaussian process regression,” in The 28th International Conference on Machine Learning, Bellevue, Washington, July 2011.
  • [13] D. Seiferth, G. Chowdhary, M. Muhlegg, and F. Holzapfel, “Online gaussian process regression with non-gaussian likelihood,” in American Control Conference, 2017.
  • [14] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 1st ed.   Springer, 2007.
  • [15] M. E. Tipping, “Sparse bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, June 2001.
  • [16] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, T. Dietterich, Ed.   MIT Press, 2006.
  • [17] A. H.-S. Ang and W. H. Tang, Probability Concepts in Engineering, 2nd ed.   Wiley, 2007.
  • [18] B. Settles, Active Learning, R. J. Brachman, W. W. Cohen, and T. G. Dietterich, Eds.   Morgan and Claypool, 2012.
  • [19] J. Kremer, K. S. Pedersen, and C. Igel, “Active learning with support vector machines,” Data Mining and Knowledge Discovery, vol. 4, no. 4, pp. 313–326, July 2014.
  • [20] G. Chen, Z. Sabato, and Z. Kong, “Active learning based requirement mining for cyber-physical systems,” in IEEE Conference on Decision and Control, 2016.
  • [21] T. Desautels, A. Krause, and J. W. Burdick, “Parallelizing exploration-exploitation tradeoffs in gaussian process bandit optimization,” Journal of Machine Learning Research, vol. 15, pp. 4053–4103, 2014.
  • [22] A. Gotovos, N. Casati, and G. Hitz, “Active learning for level set estimation,” in International Joint Conference on Artificial Intelligence, 2013, pp. 1344–1350.
  • [23] Y. Zhang, T. N. Hoang, K. H. Low, and M. Kankanhalli, “Near-optimal active learning of multi-output gaussian processes,” in AAAI Conference on Artificial Intelligence, 2016.
  • [24] A. Kulesza and B. Taskar, “k-DPPs: Fixed-size determinantal point processes,” in International Conference on Machine Learning, 2011.
  • [25] J. F. Quindlen and J. P. How, “Machine Learning for Efficient Sampling-based Algorithms in Robust Multi-Agent Planning under Uncertainty,” in AIAA SciTech Conference, 2017.
  • [26] C. Elliott, G. Tallant, and P. Stanfill, “An example set of cyber-physical V&V challenges for S5,” in Air Force Research Laboratory Safe and Secure Systems and Software Symposium (S5) Conference, Dayton, OH, July 2016.
  • [27] Department of Defense MIL-HDBK-1797, “Flying qualities of piloted aircraft.”
  • [28] K. Kandasamy, J. Schneider, and B. Poczos, “High dimensional bayesian optimization and bandits via additive models,” in International Conference on Machine Learning, 2015.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description