Towards Verified Artificial Intelligence
Verified artificial intelligence (AI) is the goal of designing AI-based systems that are provably correct with respect to mathematically-specified requirements. This paper considers Verified AI from a formal methods perspective. We describe five challenges for achieving Verified AI, and five corresponding principles for addressing these challenges.
1 Introduction
Artificial intelligence (AI) is a term used for computational systems that attempt to mimic aspects of human intelligence. Russell and Norvig describe AI as the study of general principles of rational agents and components for constructing these agents. More broadly, the field of AI involves building intelligent entities that mimic ‘cognitive’ functions we intuitively associate with human minds, such as ‘learning’ and ‘problem solving.’ We interpret the term AI broadly to include closely-related areas such as machine learning. Systems that heavily use AI, henceforth referred to as AI-based systems, have had a significant impact on society in domains that include healthcare, transportation, social networking, e-commerce, and education. This growing societal-scale impact has brought with it a set of risks and concerns including errors in AI software, cyber-attacks, and safety of AI-based systems [55, 20, 3]. Therefore, the question of verification and validation of AI-based systems has begun to demand the attention of the research community. We define “Verified AI” as the goal of designing AI-based systems that have strong, ideally provable, assurances of correctness with respect to mathematically-specified requirements. How can we achieve this goal?
A natural starting point is to consider formal methods, a field of computer science and engineering concerned with the rigorous mathematical specification, design, and verification of systems [72, 16]. At its core, formal methods is about proof: formulating specifications that form proof obligations, designing systems to meet those obligations, and verifying, via algorithmic proof search, that the systems indeed meet their specifications. Verification techniques such as model checking [14, 53, 15] and theorem proving (see, e.g., [49, 34, 30]) are used routinely in the computer-aided design of integrated circuits and have been widely applied to find bugs in software, analyze embedded systems, and find security vulnerabilities. At the heart of these advances are computational proof engines such as Boolean satisfiability (SAT) solvers, Boolean reasoning and manipulation routines based on Binary Decision Diagrams (BDDs), and satisfiability modulo theories (SMT) solvers.
In this paper, we consider the challenge of Verified AI from a formal methods perspective. That is, we review the manner in which formal methods have traditionally been applied, analyze the challenges this approach may face for AI-based systems, and propose techniques to overcome these challenges. To begin with, consider the typical formal verification process as shown in Figure 1, which begins with the following three inputs:
A model of the system to be verified, S;
A model of the environment, E, and
The property to be verified, Φ.
The verifier generates as output a YES/NO answer, indicating whether or not S satisfies the property Φ in environment E. Typically, a NO output is accompanied by a counterexample, also called an error trace, which is an execution of the system that indicates how Φ is violated. Some formal verification tools also include a proof or certificate of correctness with a YES answer.
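As a rough illustration of this input/output contract, the following sketch checks a safety property on a toy finite-state system by exhaustive search. It is only a minimal example under stated assumptions: the system model is a step function, the environment model is a finite set of inputs chosen non-deterministically, and the property is a predicate on states. The names `verify`, `step`, and `env_inputs` are hypothetical and do not come from any tool discussed in this paper.

```python
from collections import deque

def verify(init, step, env_inputs, prop):
    """Breadth-first search over the states reachable under any environment input.

    Returns (True, None) if every reachable state satisfies prop, otherwise
    (False, trace), where trace is a shortest path to a violating state.
    """
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not prop(state):
            # Rebuild the error trace by following parent links back to init.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return False, trace[::-1]
        for e in env_inputs:
            nxt = step(state, e)
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, None

# Toy system: a counter modulo 8 that adds the current environment input.
def step(state, e):
    return (state + e) % 8

never_five = lambda s: s != 5

# The environment can only add 0 or 2, so odd values are unreachable: YES.
ok, no_trace = verify(0, step, env_inputs=(0, 2), prop=never_five)

# A more permissive environment can add 1, which yields a counterexample: NO.
bad, cex = verify(0, step, env_inputs=(0, 1), prop=never_five)
```

The second call shows how the answer depends on the environment model: the same system and property produce a counterexample once the environment is allowed more behaviors.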
In this paper, we use the term “formal verification” to apply to any verification technique that uses some aspect of formal methods. For instance, we include simulation-based hardware verification methods that, while based on formal specifications (assertions), employ best-effort heuristics to find violations of those specifications. (The term “semi-formal verification” is sometimes used for such methods.) Such simulation-based verification methods have also found practical use in industrial verification of cyber-physical systems, e.g., for automotive systems [33, 25, 73].
In order to apply formal verification to AI-based systems, at a minimum, one must be able to generate the three inputs S, E, and Φ (the system model, the environment model, and the property) in formalisms for which (ideally) there exist decision procedures to answer the YES/NO question as described above. Additionally, these decision procedures must be efficient. Meeting these requirements, however, is not straightforward. Indeed, in our view, the challenges for Verified AI stem directly from these requirements. We outline these challenges in Section 2 below, and describe ideas to address each of these challenges in Section 3. (Footnote 1: The first version of this paper was published in July 2016 in response to the call for white papers for the CMU Exploratory Workshop on Safety and Control for AI held in June 2016. This is the second version reflecting the evolution of the authors’ understanding of the challenges for Verified AI, along with selected new results.)
2 Challenges for Verified AI
We identify five major challenges to achieving formally-verified AI-based systems. In this section, we sketch out these challenges, illustrating them with examples from the domain of (semi-)autonomous driving.
2.1 Environment Modeling
In the traditional success stories for formal verification, such as verifying cache coherence protocols or device drivers, the interface between the system and its environment is well defined. Moreover, while the environment itself may not be known, it is usually acceptable to model it as a non-deterministic process subject to constraints specified in a suitable logic or automata-based formalism. Typically such an environment model is “over-approximate”, meaning that it may include more environment behaviors than are possible.
We see systems based on AI or machine learning (ML) as being quite different. Consider an autonomous vehicle operating in rush-hour traffic in an urban environment. It may be impossible even to precisely define all the variables (features) of the environment that must be modeled, let alone to model all possible behaviors of the environment. Even if these variables are known, non-deterministic or over-approximate modeling is likely to produce too many spurious bug reports, rendering the verification process useless in practice.
Similarly, for systems involving joint human-machine control, such as semi-autonomous vehicles, human agents are a key part of the environment and/or system. Researchers have attempted modeling humans as non-deterministic or stochastic processes with the goal of verifying the correctness of the overall system [54, 57]. However, such approaches must deal with the variability and uncertainty in human behavior. One could take a data-driven approach based on machine learning, but such an approach is sensitive to the quality of data. For example, the technique of inverse reinforcement learning can be used for learning the reward function of human agents [1, 74]. However, the accuracy of the learned reward function depends on the expressivity of the features hand-coded by the designer and on the amount and quality of the data collected. In order to achieve Verified AI for such human-in-the-loop systems, we need to address the limitations of the current human modeling techniques and provide guarantees about their prediction accuracy and convergence. When learned models are used, one must represent any uncertainty in the learned parameters as a first-class entity in the model, and take that into account in verification and control.
The first challenge, then, is to come up with a method of environment modeling that allows one to provide provable guarantees on the system’s behavior even when there is considerable uncertainty about the environment.
2.2 Formal Specification
Formal verification critically relies on having a formal specification: a precise, mathematical statement of what the system is supposed to do. However, the challenge of coming up with a high-quality formal specification is well known, even in application domains in which formal verification has found considerable success.
This challenge is only exacerbated in AI-based systems. Consider a module in an autonomous vehicle that performs object recognition, distinguishing humans from other objects. What is the specification for such a module? How might it differ from the specifications used in traditional applications of formal methods? What should the specification language be, and what tools can one use to construct a specification?
Thus, the second challenge is to find an effective method to specify desired and undesired properties of systems that use AI- or ML-based components.
2.3 Modeling Systems that Learn
In most traditional applications of formal verification, the system is precisely known: it is a C program, or a circuit described in a hardware description language. The system modeling problem is primarily concerned with reducing the size of the system model S to a more tractable representation by abstracting away irrelevant details.
AI-based systems lead to a very different challenge for system modeling. A major challenge is the use of machine learning, where the system evolves as it encounters new data and new situations. Modeling a deep neural network that has been trained on millions of data points, has non-linear components, stochastic behavior, and thousands of parameters can be challenging enough even if one “freezes” the training process. New abstraction techniques will be necessary. Additionally, verification must account for future changes in the learner as new data arrives, or else be performed incrementally, online as the learning-based system changes. In short, we must devise new techniques to formally model components based on machine learning.
2.4 Computational Engines for Training, Testing, and Verification
The effectiveness of formal methods in the domains of hardware and software has been driven by advances in underlying “computational engines” — e.g., SAT, SMT, simulation, and model checking. Such computational engines are needed for intelligent and scalable training, testing, and verification of AI-based systems. However, several challenges must be overcome to achieve this.
Training and Testing: Formal methods has proved effective for the systematic generation of test data in various settings, including simulation-based verification of circuits and finding security exploits in commodity software. In these cases, even though the end result is not a proof of correctness of the system S, the generated tests raise the level of assurance in the system’s correctness.
AI-based systems benefit from testing not just by gaining a higher level of assurance, but also by leveraging the generated data for retraining. Moreover, recent efforts have shown that various machine learning algorithms can fail under small adversarial perturbations (e.g., [47, 26, 45, 29, 50]). Learning algorithms promise to generalize from data, but such simple perturbations that fool the algorithms create concerns regarding their use in safety-critical applications such as autonomous driving. Such small perturbations might be even unrecognizable to humans, but drive the algorithm to misclassify the perturbed data. Further, we need to generate not just single data items, but an ensemble that is “realistic” and satisfies distributional constraints.
Thus, the question is: can we devise techniques based on formal methods to systematically generate training and testing data for ML-based components?
Quantitative Verification: Several safety-critical applications of AI-based systems are in robotics and cyber-physical systems. In such systems, the scalability challenge for verification can be very high. In addition to the scale of systems as measured by traditional metrics (dimension of state space, number of components, etc.), the types of components can be much more complex. For instance, in (semi-)autonomous driving, autonomous vehicles and their controllers need to be modeled as hybrid systems combining both discrete and continuous dynamics. Moreover, agents in the environment (humans, other vehicles) may need to be modeled as probabilistic processes. Finally, the requirements may involve not only traditional Boolean specifications on safety and liveness, but also quantitative requirements on system robustness and performance. Yet, most of the existing verification methods are targeted towards answering Boolean verification questions. To address this gap, new scalable engines for quantitative verification must be developed.
2.5 Correct-by-Construction Intelligent Systems
In an ideal world, verification should be integrated with the design process so that the system is “correct-by-construction.” Such an approach could either interleave verification steps with compilation/synthesis steps, such as in the register-transfer-level (RTL) design flow common in integrated circuits, or modify the synthesis algorithms themselves so as to ensure that the implementation satisfies the specification, such as in reactive synthesis from temporal logic. Can we devise a suitable correct-by-construction design flow for AI-based systems?
One challenge is to design machine learning components that satisfy desired properties (assuming we solve the formal specification challenge described above in Sec. 2.2). For this, we need techniques that can synthesize a suitable training set, and update it as needed. We should synthesize the structure of the learning model, and potentially also a good set of features. Finally, we should have a principled approach to training that leverages the specification and environment models.
Another challenge is to design the overall system comprising both learning and non-learning components. While theories of compositional design have been developed for digital circuits and embedded systems (e.g. [61, 69]), we do not as yet have such theories for AI-based systems. Moreover, there is not yet a systematic understanding of what can be achieved at design time, how the design process can contribute to safe and correct operation of the intelligent system at run time, and how the design-time and run-time techniques can interoperate effectively.
3 Principles for Verified AI
For each of the challenges described in the preceding section, we suggest a corresponding set of principles to follow in the design/verification process to address that challenge. These five principles are:
Introspect on the system and actively gather data to model the environment;
Formally specify end-to-end behavior of the AI-based system, and develop new quantitative formalisms to specify learning components;
Develop abstractions for and explanations from ML components;
Create a new class of randomized and quantitative formal methods for data generation, testing, and verification, and
Develop techniques for formal inductive synthesis of AI-based systems, supported by an integrated design methodology combining design-time and run-time verification.
We believe these techniques are just a starting point. Our formal methods perspective on the problem complements other perspectives that have been expressed. Taken together with other ideas, we believe that the principles we suggest can point a way towards the goal of Verified AI.
3.1 Introspective Environment Modeling
Recall from Sec. 2.1 the challenge of modeling the environment E of an AI-based system S. We believe that a promising strategy to meet this challenge is to develop design and verification methods that are introspective, i.e., they identify assumptions A that system S makes about the environment E that are sufficient to guarantee the satisfaction of the specification Φ. The assumptions A must be such that, at run time, S can efficiently monitor A so as to ensure that they always hold. Moreover, if there is a human operator involved, one might want A to be translatable into an explanation that is human understandable, so that S can “explain” to the human why it may not be able to satisfy the specification Φ.
Ideally, the assumptions A must be the weakest set of such assumptions that S makes about its environment. However, given the other requirements for A to be efficiently monitorable and human understandable, one may need to settle for a stronger assumption.
As an example, consider an autonomous vehicle S that is trying to maintain a minimum distance from any other object while being in motion; this forms the specification Φ. Note that Φ defines an interface, including a set of sensors that the vehicle must use to check for itself that Φ is satisfied. On top of this interface, suppose that S tracks other features of the environment such as the state of traffic lights, the number of vehicles in its vicinity, their state such as their velocity, whether they are human-driven, an estimate of those human drivers’ intent and driving style, etc. It will then need to generate assumptions A to monitor over this expanded interface (as well as its internal state) so as to ensure that when A is satisfied, so is Φ.
Extracting good assumptions may be easier during the design process, e.g., while synthesizing a controller for S. Preliminary work by the authors has shown that such extraction of monitorable assumptions is feasible in simple cases [37, 39, 28], although more research is required to make this practical.
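To make the idea of a run-time-monitorable assumption concrete, here is a minimal sketch of a monitor for the driving example. The particular assumption (working sensors and a bounded closing speed), the field names, and the threshold are purely illustrative assumptions on our part; they are not the assumptions extracted in the cited work.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    gap_m: float          # distance to the nearest object ahead, in metres
    closing_speed: float  # rate at which that gap shrinks, in m/s
    sensor_ok: bool       # self-reported health of the distance sensor

def assumption_holds(obs, max_closing=10.0):
    """Hypothetical environment assumption: sensors work and nothing closes in too fast."""
    return obs.sensor_ok and obs.closing_speed <= max_closing

def explain(obs, max_closing=10.0):
    """Turn a violated assumption into a short human-readable explanation."""
    if not obs.sensor_ok:
        return "distance sensor reported a fault"
    if obs.closing_speed > max_closing:
        return f"object closing at {obs.closing_speed} m/s, above the assumed {max_closing} m/s"
    return "assumption holds"

def monitor(observations):
    """Return (index, explanation) for the first violation, or None if there is none."""
    for i, obs in enumerate(observations):
        if not assumption_holds(obs):
            return i, explain(obs)
    return None

log = [
    Observation(gap_m=40.0, closing_speed=2.0, sensor_ok=True),
    Observation(gap_m=35.0, closing_speed=12.5, sensor_ok=True),
    Observation(gap_m=20.0, closing_speed=3.0, sensor_ok=False),
]
violation = monitor(log)
```

The check is a constant-time predicate per observation, which is the kind of efficiency one would want from an assumption that must be monitored continuously, and the explanation string hints at how a violation could be reported to a human operator.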
In addition, we need to actively gather data about real and simulated environments and use those to learn and update our environment models. Put another way, we must monitor and interact with the environment, both offline and online, in order to model it. Initial work by the authors [57, 60, 59] has shown how data gathered from driving simulators via human subject experiments can be used to generate models of human driver behavior that are useful for verification and control of autonomous vehicles.
3.2 End-to-End Specifications, Quantitative Specifications, and Specification Mining
Writing formal specifications for AI/ML components is hard, perhaps even impossible if the task involves a version of the Turing test. How can we address this challenge described in Sec. 2.2?
As researchers often say: when the problem is too hard, perhaps we should change the problem! We believe that formally specifying the behavior of an AI/ML component may be unnecessary. Instead, one should focus on precisely specifying the end-to-end behavior of the entire AI-based system. By “end-to-end” we mean the specification on the entire system or at least a precisely-specifiable sub-system containing the AI/ML component, not on the AI/ML component alone. Such a specification is also referred to as a “system-level” specification. We believe that this latter task, in many if not most cases, can be done more easily.
Consider again our autonomous vehicle scenario from the previous section. It should be straightforward to specify the property corresponding to maintaining a minimum distance from any object during motion. This property says nothing about any component that uses machine learning.
Of course, in order to test the ML-based component, it is useful to have a formal specification on its interface. However, we believe this specification does not need to be exact: a “likely specification” could suffice. For this purpose, we suggest the use of techniques for inferring specifications from behaviors and other artifacts, so-called specification mining techniques (e.g., [24, 36, 33]).
In addition, we should address the mismatch between how design objectives are expressed in formal methods (typically using Boolean specifications given in logic or as automata) and how these are expressed in machine learning (typically as cost or reward functions). One approach to bridge this gap is to move to quantitative specification languages, such as logics with quantitative semantics or notions of weighted automata.
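A small sketch may help show what a quantitative semantics looks like. Assuming a finite, sampled trace of distances and the minimum-distance property from the driving example, the Boolean question "is the distance always at least d_min?" can be replaced by a real-valued margin whose sign gives the Boolean answer. The function names and the sample trace are illustrative assumptions.

```python
def robustness_always_at_least(signal, threshold):
    """Quantitative value of 'always (signal >= threshold)' on a finite trace.

    The value is the worst-case margin over time: positive means the property
    holds with that much slack, negative means it is violated by that much.
    """
    return min(value - threshold for value in signal)

def satisfies(signal, threshold):
    """Boolean verdict recovered from the sign of the quantitative value."""
    return robustness_always_at_least(signal, threshold) >= 0

distances = [12.0, 9.5, 7.0, 8.0]  # sampled gap to the nearest object, in metres

margin_ok = robustness_always_at_least(distances, 5.0)   # 2.0: holds with 2 m of slack
margin_bad = robustness_always_at_least(distances, 8.0)  # -1.0: violated by 1 m
```

Because the margin is a number rather than a truth value, it can serve directly as a cost or reward signal, which is one way to connect logical specifications with learning objectives.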
3.3 Formal Abstractions and Explanations for Machine Learning
Let us now consider the challenge, described in Sec. 2.3, of modeling systems that learn from experience. We believe a combination of automated abstraction and explanation-based learning will be needed to model such systems for purposes of formal verification.
First, effective techniques need to be developed to abstract ML components into a formalism for which efficient verification techniques exist or can be developed. Since the guarantees many ML algorithms give are probabilistic, this will require the development of probabilistic logics and similar formalisms that can capture these guarantees. Additionally, if the output of a learning algorithm is accompanied by a measure of uncertainty about its correctness, then that uncertainty must be propagated to the rest of the system and represented in the model of the overall system. For example, the formalism of convex Markov decision processes (convex MDPs) [48, 52, 57] provides a way of representing uncertainty in the values of learned transition probabilities. Algorithms for verification and control may then need to be extended to handle these new abstractions.
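As one possible illustration of carrying uncertainty in learned transition probabilities into verification, the sketch below computes a worst-case probability of reaching a goal state when each transition probability is only known to lie in an interval. Interval bounds are just one simple instance of a convex uncertainty set, and the single-action model, the state names, and the numbers are all assumptions made for this example rather than anything taken from the cited work.

```python
def worst_case_expectation(successors, values):
    """Minimise sum(p[s] * values[s]) subject to lo <= p[s] <= hi and sum(p) == 1.

    successors is a list of (state, lo, hi) triples. The intervals are assumed
    to be consistent, i.e. sum(lo) <= 1 <= sum(hi).
    """
    probs = {s: lo for s, lo, _ in successors}
    remaining = 1.0 - sum(probs.values())
    # Give the leftover probability mass to the lowest-value successors first.
    for s, lo, hi in sorted(successors, key=lambda t: values[t[0]]):
        extra = min(hi - lo, remaining)
        probs[s] += extra
        remaining -= extra
    return sum(probs[s] * values[s] for s in probs)

def robust_reachability(transitions, goal, iterations=50):
    """Pessimistic probability of eventually reaching goal from each state."""
    values = {s: (1.0 if s == goal else 0.0) for s in transitions}
    for _ in range(iterations):
        values = {
            s: worst_case_expectation(succ, values) if succ and s != goal else values[s]
            for s, succ in transitions.items()
        }
    return values

# Learned model: from "start" the goal is reached with probability in [0.7, 0.9].
chain = {
    "start": [("goal", 0.7, 0.9), ("crash", 0.1, 0.3)],
    "goal": [],   # absorbing
    "crash": [],  # absorbing
}
values = robust_reachability(chain, goal="goal")
```

Here the guarantee that can be stated about "start" is the lower end of the learned interval, which reflects the point that the model's uncertainty should be visible in the verification result rather than hidden behind a point estimate.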
The task of modeling a learning system can be made easier if the learner accompanies its predictions with explanations of how those predictions result from the data and background knowledge. In fact, this idea is not new; it has long been investigated by the ML community under terms such as explanation-based generalization. Recently, there has been a renewal of interest in using logic to explain the output of learning systems. Such approaches to generating explanations that are compatible with the modeling languages used in formal methods can make the task of system modeling for verification considerably easier.
The literature in formal methods on explaining failures or counterexamples may also be relevant. For example, assume that a misclassification by an ML component causes a failure of an end-to-end specification. If we can apply techniques from the formal methods literature to localize that failure (e.g., [38, 36]), then we could identify whether the ML component was responsible. Counterfactual reasoning has been used in the formal methods literature for explaining failures, and we believe such an approach will also be useful in the context of AI-based systems.
3.4 Randomized and Quantitative Formal Methods for Training, Testing, and Verification
Consider the challenge, described in Sec. 2.4, of devising computational engines for scalable training, testing, and verification of AI-based systems. We see three promising directions to tackle this challenge.
Randomized Formal Methods: Consider the problem of systematically generating training data for an ML component in an AI-based system. More concretely, suppose we wish to systematically test a classifier f that, given a set of data points X (images, audio, etc.), assigns a real-valued label f(X) to them. One testing problem is to find a perturbation δ such that the algorithm flips the label it assigns to (many) examples upon perturbation, i.e., f(X + δ) ≠ f(X).
Such perturbations cannot be done arbitrarily. One challenge is to define the space of “legal” perturbations so that the resulting examples are still legal inputs that look “realistic”. Additionally, one might need to impose constraints on the distribution of the generated examples in order to obtain guarantees about convergence of the learning algorithm to the true concept. How do we meet all these requirements?
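The label-flipping test can be sketched for the simplest possible case. The example below assumes a linear two-class classifier and takes the space of "legal" perturbations to be those whose coordinates each change by at most eps; both choices are simplifying assumptions for illustration, and realistic notions of legality would be far richer.

```python
def classify(weights, bias, x):
    """Linear classifier: label +1 if the score is non-negative, else -1."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

def sign(value):
    return (value > 0) - (value < 0)

def find_flip(weights, bias, x, eps):
    """Look for a perturbation with every |delta_i| <= eps that flips the label.

    For a linear score, shifting each coordinate by eps against the current
    label is the strongest change allowed by the bound, so checking this one
    candidate is enough. Returns the perturbation, or None if no legal one exists.
    """
    label = classify(weights, bias, x)
    delta = [-label * eps * sign(w) for w in weights]
    perturbed = [xi + di for xi, di in zip(x, delta)]
    return delta if classify(weights, bias, perturbed) != label else None

weights, bias = [1.0, -2.0], 0.0
x = [1.0, 0.25]                                  # score 0.5, so the label is +1

flip = find_flip(weights, bias, x, eps=0.25)     # [-0.25, 0.25] flips the label
no_flip = find_flip(weights, bias, x, eps=0.1)   # too small a budget: None
```

For non-linear models such as deep networks no closed-form worst case is available, which is part of why systematic, constraint-aware generation methods are of interest.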
We believe that the answer may lie in a new class of randomized formal methods: randomized algorithms for generating test inputs subject to formal constraints and distribution requirements. Specifically, a recently defined class of techniques, termed control improvisation, holds promise. An improviser is a generator of random strings (examples) x that satisfy three constraints: (i) a hard constraint that defines the space of legal x; (ii) a soft constraint defining how the generated x must be similar to real-world examples, and (iii) a randomness requirement defining a constraint on the output distribution. The theory of control improvisation is still in its infancy, and we are just starting to understand the computational complexity and to devise efficient algorithms. Improvisation, in turn, relies on recent progress on computational problems such as constrained random sampling and model counting (e.g., [42, 9, 10]). Much more remains to be done.
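The three kinds of constraint can be illustrated with a deliberately small sketch. It assumes strings of a fixed length over a tiny alphabet so that the legal set can be enumerated, interprets the soft constraint as "holds with probability at least 1 - eps", and reports the largest probability of any single string as a measure of randomness. This brute-force scheme and all of its names are assumptions for illustration only; it is not the algorithm from the control improvisation literature.

```python
import itertools
import random

def build_improviser(alphabet, length, hard, soft, eps):
    """Build a distribution over strings of the given length.

    Every string in the support satisfies the hard constraint, and the total
    probability of strings satisfying the soft constraint is at least 1 - eps.
    """
    words = ("".join(t) for t in itertools.product(alphabet, repeat=length))
    legal = [w for w in words if hard(w)]
    good = [w for w in legal if soft(w)]
    rest = [w for w in legal if not soft(w)]
    if not good:
        raise ValueError("no legal string satisfies the soft constraint")
    if not rest:
        return {w: 1.0 / len(good) for w in good}
    # Spread mass 1 - eps uniformly over 'good' strings and eps over the rest.
    dist = {w: (1.0 - eps) / len(good) for w in good}
    dist.update({w: eps / len(rest) for w in rest})
    return dist

def improvise(dist, rng):
    """Draw one string according to the distribution."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]

dist = build_improviser(
    "ab", 3,
    hard=lambda w: "bb" not in w,       # legal strings never contain "bb"
    soft=lambda w: w.startswith("a"),   # "realistic" strings start with "a"
    eps=0.2,
)
rho = max(dist.values())                # largest probability of any one string
sample = improvise(dist, random.Random(0))
```

Enumeration is only feasible for toy alphabets and lengths; for realistic input spaces one would need the constrained sampling and counting techniques mentioned above.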
Quantitative Verification: Recall the challenge to develop techniques for verification of quantitative requirements – where the output of the verifier is not just YES/NO but a numeric value. We believe such techniques can be useful not only in verifying quantitative requirements but also in generating relevant training and test data for ML components.
The complexity and heterogeneity of AI-based systems mean that, in general, even formal verification of Boolean specifications is likely to be undecidable. (For example, even deciding whether a state of a linear hybrid system is reachable is undecidable.) To overcome this obstacle posed by computational complexity, one must either (i) find tractable but realistic problem classes, or (ii) settle for incomplete or unsound formal verification methods (i.e., semi-decision procedures). Techniques for simulation-based verification (also termed falsification or run-time verification) can prove very fruitful in this regard, as has been recently demonstrated for industrial automotive systems (e.g., [21, 25, 19, 73]). A key element of these techniques is the formulation of verification as optimization, i.e., a quantitative approach.
Such falsification techniques can also be used for the systematic, adversarial generation of training data for ML components. Recent work has shown how a falsifier can be combined with a systematic generator of adversarial inputs for a deep neural network so as to find violations of a system-level specification [22, 23].
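The view of falsification as optimization can be sketched as follows: simulate the system for a candidate input, score the trace by how far it is from violating the specification, and search for the input with the lowest score. The braking model, the parameter values, and the exhaustive search over a small grid are all assumptions chosen to keep the example short; practical falsifiers use more capable optimizers and far richer models.

```python
def simulate_gap(v0, gap0=50.0, decel=5.0, dt=0.1, steps=100):
    """Gap to a stationary obstacle while braking from initial speed v0 (m/s)."""
    gaps, gap, v = [], gap0, v0
    for _ in range(steps):
        gaps.append(gap)
        gap -= v * dt                      # simple Euler integration
        v = max(0.0, v - decel * dt)
    return gaps

def robustness(v0, d_min=5.0):
    """Worst-case margin of 'the gap always stays at least d_min'."""
    return min(simulate_gap(v0)) - d_min

def falsify(candidates):
    """Minimise robustness over the candidates; a negative value is a violation."""
    worst = min(candidates, key=robustness)
    score = robustness(worst)
    return (worst, score) if score < 0 else (None, score)

speeds = [10.0 + i for i in range(21)]   # candidate initial speeds, 10 to 30 m/s
counterexample, score = falsify(speeds)
```

A counterexample found this way could be kept as a test case or added to a training set, while a positive minimum only shows that no violation was found among the candidates tried, not that none exists.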
Safe Exploration for Learning: There has been considerable recent work on safe learning-based control (e.g., [12, 13, 2]). In this approach, a safety envelope is pre-computed and a learning algorithm is used to tune a controller within that envelope. Techniques for efficiently computing such safety envelopes, based for example on reachability analysis, are needed. Reachable sets will need to be encoded in a representation suitable for efficient use during exploration. Additionally, how does one generate “interesting” trajectories within the safety envelope to train the learning system well? We believe the theory of control improvisation could be useful here.
3.5 Formal Inductive Synthesis and Integrating Design Time with Run Time
Developing a correct-by-construction design methodology for AI-based systems, with associated tools, is perhaps the toughest challenge of all. For this to be fully solved, the preceding four challenges must be successfully addressed. However, we do not need to wait until we solve those problems in order to start working on this one. Indeed, a methodology to “design for verification” may well ease the task on the other four challenges.
First consider the problem of synthesizing learning components correct by construction. The emerging theory of formal inductive synthesis [31, 32] addresses this problem. Formal inductive synthesis is the synthesis from examples of programs that satisfy formal specifications. In machine learning terms, it is the synthesis of models/classifiers that additionally satisfy a formal specification. The most common approach to solving a formal inductive synthesis problem is to use an oracle-guided approach. In oracle-guided synthesis, a learner is paired with an oracle who answers queries. The set of query-response types is defined by an oracle interface. In many cases, the oracle is a falsifier that can generate counterexamples showing how the learned component violates its specification. This approach, also known as counterexample-guided inductive synthesis, has proved effective in many scenarios. In general, oracle-guided inductive synthesis techniques show much promise for the synthesis of learned components by blending expert human insight, inductive learning, and deductive reasoning [63, 64].
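The counterexample-guided loop described above can be sketched on a toy synthesis problem. The example assumes a candidate space of linear functions with small integer coefficients, a specification relating input and output, and a verifier that simply checks every point of a finite domain; these are all illustrative assumptions, and in practice the oracle would be a falsifier or a solver rather than exhaustive enumeration.

```python
import itertools

def spec(x, y):
    """Specification on input/output pairs: the output exceeds x by 1 or 2."""
    return x < y <= x + 2

def verifier(candidate, domain):
    """Oracle: return an input on which the candidate violates spec, or None."""
    a, b = candidate
    for x in domain:
        if not spec(x, a * x + b):
            return x
    return None

def learner(examples, space):
    """Inductive step: first candidate consistent with all examples seen so far."""
    for a, b in space:
        if all(spec(x, a * x + b) for x in examples):
            return (a, b)
    return None

def cegis(space, domain):
    """Alternate between the learner and the verifier until one of them gives up."""
    examples = []
    while True:
        candidate = learner(examples, space)
        if candidate is None:
            return None, examples          # no candidate in the space works
        counterexample = verifier(candidate, domain)
        if counterexample is None:
            return candidate, examples     # verified on the whole domain
        examples.append(counterexample)

space = list(itertools.product(range(-3, 4), repeat=2))  # coefficients (a, b)
domain = range(-10, 11)
solution, used_examples = cegis(space, domain)
```

In this run a single counterexample is enough to rule out almost the entire candidate space, which gives a small-scale picture of why counterexample-guided learning can need far fewer examples than the size of the input domain might suggest.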
However, due to the undecidability of verification in most instances and the challenge of environment modeling, we believe it will be difficult, if not impossible, to synthesize correct-by-construction AI-based systems in many scenarios. Autonomous driving is one such application domain. In such cases, we believe design-time verification must be combined with run-time verification and recovery techniques. For example, the Simplex technique provides one approach to combining a complex, but error-prone module with a safe, formally-verified backup module. Recent techniques for combining design-time and run-time assurance methods (e.g., [62, 18]) can be leveraged to design a robust system.
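The idea of pairing an untrusted module with a verified backup can be sketched with a small switching rule. The example assumes a one-dimensional vehicle with integer position and speed approaching a wall, an "advanced" controller that always accelerates (standing in for an unverified learned component), and a backup controller that always brakes; the model and the recoverability check are our own simplifications, not the design from the Simplex literature.

```python
WALL = 10  # positions beyond this value are unsafe

def dynamics(state, accel):
    """One time step: update speed (never negative), then position."""
    pos, vel = state
    vel = max(0, vel + accel)
    return (pos + vel, vel)

def is_recoverable(state):
    """True if braking on every future step still stops at or before the wall."""
    pos, vel = state
    return pos + vel * (vel - 1) // 2 <= WALL

def advanced(state):
    return +1   # untrusted controller: always accelerate

def backup(state):
    return -1   # trusted controller: always brake

def simplex_step(state):
    """Use the advanced action only if the resulting state remains recoverable."""
    proposed = dynamics(state, advanced(state))
    if is_recoverable(proposed):
        return proposed, "advanced"
    return dynamics(state, backup(state)), "backup"

def run(state, steps):
    history = [(state, "start")]
    for _ in range(steps):
        state, mode = simplex_step(state)
        history.append((state, mode))
    return history

history = run((0, 0), steps=20)
```

The design-time obligation is confined to the backup controller and the recoverability check, while the advanced controller can be arbitrarily complex; the run-time switch is what keeps the combined system inside the safe set.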
4 Conclusion
Taking a formal methods perspective, we have analyzed the challenge of formally verifying systems that use artificial intelligence or machine learning. We identified five main challenges: environment modeling, formal specification, system modeling, computational engines, and correct-by-construction design. For each of these five challenges, we have identified corresponding principles for design and verification that hold promise for addressing that challenge. We are currently engaged in developing the theory behind these principles, and applying them to the design of human cyber-physical systems and learning-based cyber-physical systems, with a special focus on autonomous and semi-autonomous vehicles. We expect to report on the results in the years to come.
Acknowledgments
The authors’ work is supported in part by NSF grants CCF-1139138, CCF-1116993, CNS-1545126, and CNS-1646208, by an NDSEG Fellowship, and by the TerraSwarm Research Center, one of six centers supported by the STARnet phase of the Focus Center Research Program (FCRP), a Semiconductor Research Corporation program sponsored by MARCO and DARPA. We gratefully acknowledge the many colleagues with whom our conversations and collaborations have helped shape this document.
-  Pieter Abbeel and Andrew Y Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning, page 1. ACM, 2004.
-  Anayo Akametalu. Reachability-based safe learning with gaussian processes. Master’s thesis, EECS Department, University of California, Berkeley, Aug 2015.
-  Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
-  Thanassis Avgerinos, Sang Kil Cha, Alexandre Rebert, Edward J. Schwartz, Maverick Woo, and David Brumley. Automatic exploit generation. Commun. ACM, 57(2):74–84, 2014.
-  Clark Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. Satisfiability modulo theories. In Armin Biere, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability, volume 4, chapter 8. IOS Press, 2009.
-  I. Beer, S. Ben-David, C. Eisner, and Y. Rodeh. Efficient detection of vacuity in ACTL formulas. Formal Methods in System Design, 18(2):141–162, 2001.
-  Nikolaj Bjørner, Anh-Dung Phan, and Lars Fleckenstein. z-an optimizing SMT solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 194–199. Springer, 2015.
-  Randal E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, August 1986.
-  Supratik Chakraborty, Daniel J. Fremont, Kuldeep S. Meel, Sanjit A. Seshia, and Moshe Y. Vardi. Distribution-aware sampling and weighted model counting for SAT. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), pages 1722–1730, July 2014.
-  Supratik Chakraborty, Daniel J. Fremont, Kuldeep S. Meel, Sanjit A. Seshia, and Moshe Y. Vardi. On parallel scalable uniform sat witness generation. In Proceedings of the 21st International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 304–319, April 2015.
-  Krishnendu Chatterjee, Laurent Doyen, and Thomas A Henzinger. Quantitative languages. ACM Transactions on Computational Logic (TOCL), 11(4):23, 2010.
-  M. Chen, J. F. Fisac, S. Sastry, and C. J. Tomlin. Safe sequential path planning of multi-vehicle systems via double-obstacle Hamilton-Jacobi-Isaacs variational inequality. In 14th European Control Conference (ECC), 2015.
-  M. Chen, Q. Hu, C. Mackin, J. F. Fisac, and C. J. Tomlin. Safe platooning of unmanned aerial vehicles via reachability. In 54th IEEE Conference on Decision and Control (CDC), 2015.
-  Edmund M. Clarke and E. Allen Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logic. In Logic of Programs, pages 52–71, 1981.
-  Edmund M. Clarke, Orna Grumberg, and Doron A. Peled. Model Checking. MIT Press, 2000.
-  Edmund M Clarke and Jeannette M Wing. Formal methods: State of the art and future directions. ACM Computing Surveys (CSUR), 28(4):626–643, 1996.
-  Committee on Information Technology, Automation, and the U.S. Workforce. Information technology and the U.S. workforce: Where are we and where do we go from here? http://www.nap.edu/24649.
-  Ankush Desai, Tommaso Dreossi, and Sanjit A. Seshia. Combining model checking and runtime verification for safe robotics. In Runtime Verification - 17th International Conference, RV 2017, Seattle, WA, USA, September 13-16, 2017, Proceedings, pages 172–189, 2017.
-  Jyotirmoy Deshmukh, Alexandre Donzé, Shromona Ghosh, Xiaoqing Jin, Garvit Juniwal, and Sanjit A. Seshia. Robust online monitoring of signal temporal logic. In Proceedings of the International Conference on Runtime Verification (RV), volume 9333 of Lecture Notes in Computer Science, pages 55–70. Springer, September 2015.
-  Thomas G Dietterich and Eric J Horvitz. Rise of concerns about AI: reflections and directions. Communications of the ACM, 58(10):38–40, 2015.
-  Alexandre Donzé, Xiaoqing Jin, Jyotirmoy V. Deshmukh, and Sanjit A. Seshia. Automotive systems requirement mining using Breach. In American Control Conference (ACC), page 4097, 2015.
-  Tommaso Dreossi, Alexandre Donzé, and Sanjit A. Seshia. Compositional falsification of cyber-physical systems with machine learning components. In Proceedings of the NASA Formal Methods Conference (NFM), May 2017.
-  Tommaso Dreossi, Shromona Ghosh, Alberto L. Sangiovanni-Vincentelli, and Sanjit A. Seshia. Systematic testing of convolutional neural networks for autonomous driving. CoRR, abs/1708.03309, 2017. Appeared at the ICML 2017 Workshop on Reliable Machine Learning in the Wild.
-  Michael Ernst. Dynamically Discovering Likely Program Invariants. PhD thesis, University of Washington, Seattle, 2000.
-  Georgios E. Fainekos. Automotive control design bug-finding with the S-TaLiRo tool. In American Control Conference (ACC), page 4096, 2015.
-  Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Analysis of classifiers’ robustness to adversarial perturbations. arXiv preprint arXiv:1502.02590, 2015.
-  Daniel J. Fremont, Alexandre Donzé, Sanjit A. Seshia, and David Wessel. Control improvisation. In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015), pages 463–474, 2015.
-  Shromona Ghosh, Dorsa Sadigh, Pierluigi Nuzzo, Vasumathi Raman, Alexandre Donzé, Alberto L. Sangiovanni-Vincentelli, S. Shankar Sastry, and Sanjit A. Seshia. Diagnosis and repair for synthesis from signal temporal logic specifications. In Proceedings of the 9th International Conference on Hybrid Systems: Computation and Control (HSCC), April 2016.
-  Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
-  M. J. C. Gordon and T. F. Melham. Introduction to HOL: A Theorem Proving Environment for Higher-Order Logic. Cambridge University Press, 1993.
-  S. Jha and S. A. Seshia. A Theory of Formal Synthesis via Inductive Learning. ArXiv e-prints, May 2015.
-  Susmit Jha and Sanjit A. Seshia. A Theory of Formal Synthesis via Inductive Learning. Acta Informatica, 2017.
-  Xiaoqing Jin, Alexandre Donzé, Jyotirmoy Deshmukh, and Sanjit A. Seshia. Mining requirements from closed-loop control models. In Proceedings of the International Conference on Hybrid Systems: Computation and Control (HSCC), April 2013.
-  Matt Kaufmann, Panagiotis Manolios, and J. Strother Moore. Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers, 2000.
-  Nathan Kitchen and Andreas Kuehlmann. Stimulus generation for constrained random simulation. In Proceedings of the 2007 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 258–265. IEEE Press, 2007.
-  Wenchao Li. Specification Mining: New Formalisms, Algorithms and Applications. PhD thesis, EECS Department, University of California, Berkeley, Mar 2014.
-  Wenchao Li, Lili Dworkin, and Sanjit A. Seshia. Mining assumptions for synthesis. In Proceedings of the Ninth ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), pages 43–50, July 2011.
-  Wenchao Li, Alessandro Forin, and Sanjit A. Seshia. Scalable specification mining for verification and diagnosis. In Proceedings of the Design Automation Conference (DAC), pages 755–760, June 2010.
-  Wenchao Li, Dorsa Sadigh, S. Shankar Sastry, and Sanjit A. Seshia. Synthesis for human-in-the-loop control systems. In Proceedings of the 20th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 470–484, April 2014.
-  Oded Maler and Dejan Nickovic. Monitoring temporal properties of continuous signals. In FORMATS/FTRTFT, pages 152–166, 2004.
-  Sharad Malik and Lintao Zhang. Boolean satisfiability: From theoretical hardness to practical success. Communications of the ACM (CACM), 52(8):76–82, 2009.
-  Kuldeep S. Meel, Moshe Y. Vardi, Supratik Chakraborty, Daniel J. Fremont, Sanjit A. Seshia, Dror Fried, Alexander Ivrii, and Sharad Malik. Constrained sampling and counting: Universal hashing meets SAT solving. In Beyond NP, Papers from the 2016 AAAI Workshop, Phoenix, Arizona, USA, February 12, 2016.
-  Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.
-  Tom M Mitchell, Richard M Keller, and Smadar T Kedar-Cabelli. Explanation-based generalization: A unifying view. Machine Learning, 1(1):47–80, 1986.
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. arXiv preprint arXiv:1511.04599, 2015.
-  Andrew Y. Ng and Stuart J. Russell. Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 663–670, 2000.
-  Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 427–436. IEEE, 2015.
-  A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, 2005.
-  S. Owre, J. M. Rushby, and N. Shankar. PVS: A prototype verification system. In Deepak Kapur, editor, 11th International Conference on Automated Deduction (CADE), volume 607 of Lecture Notes in Artificial Intelligence, pages 748–752. Springer-Verlag, June 1992.
-  Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, EuroS&P 2016, Saarbrücken, Germany, March 21-24, 2016, pages 372–387, 2016.
-  Amir Pnueli and Roni Rosner. On the synthesis of a reactive module. In Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, USA, January 11-13, 1989, pages 179–190, 1989.
-  Alberto Puggelli, Wenchao Li, Alberto Sangiovanni-Vincentelli, and Sanjit A. Seshia. Polynomial-time verification of PCTL properties of MDPs with convex uncertainties. In Proceedings of the 25th International Conference on Computer-Aided Verification (CAV), July 2013.
-  Jean-Pierre Queille and Joseph Sifakis. Specification and verification of concurrent systems in CESAR. In Symposium on Programming, number 137 in LNCS, pages 337–351, 1982.
-  John Rushby. Using model checking to help discover mode confusions and other automation surprises. Reliability Engineering & System Safety, 75(2):167–177, 2002.
-  Stuart Russell, Tom Dietterich, Eric Horvitz, Bart Selman, Francesca Rossi, Demis Hassabis, Shane Legg, Mustafa Suleyman, Dileep George, and Scott Phoenix. Letter to the editor: Research priorities for robust and beneficial artificial intelligence: An open letter. AI Magazine, 36(4), 2015.
-  Stuart Jonathan Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2010.
-  Dorsa Sadigh, Katherine Driggs-Campbell, Alberto Puggelli, Wenchao Li, Victor Shia, Ruzena Bajcsy, Alberto L. Sangiovanni-Vincentelli, S. Shankar Sastry, and Sanjit A. Seshia. Data-driven probabilistic modeling and verification of human driver behavior. In Formal Verification and Modeling in Human-Machine Systems, AAAI Spring Symposium, March 2014.
-  Dorsa Sadigh and Ashish Kapoor. Safe control under uncertainty with probabilistic signal temporal logic. In Proceedings of Robotics: Science and Systems, Ann Arbor, Michigan, June 2016.
-  Dorsa Sadigh, Shankar Sastry, Sanjit A. Seshia, and Anca D. Dragan. Information gathering actions over human internal state. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2016.
-  Dorsa Sadigh, Shankar Sastry, Sanjit A. Seshia, and Anca D. Dragan. Planning for autonomous cars that leverages effects on human actions. In Proceedings of the Robotics: Science and Systems Conference (RSS), June 2016.
-  Alberto Sangiovanni-Vincentelli, Werner Damm, and Roberto Passerone. Taming Dr. Frankenstein: Contract-based design for cyber-physical systems. European Journal of Control, 18(3):217–238, 2012.
-  John D Schierman, Michael D DeVore, Nathan D Richards, Neha Gandhi, Jared K Cooper, Kenneth R Horneman, Scott Stoller, and Scott Smolka. Runtime assurance framework development for highly adaptive flight control systems. Technical report, Barron Associates, Inc. Charlottesville, 2015.
-  Sanjit A. Seshia. Sciduction: Combining induction, deduction, and structure for verification and synthesis. In Proceedings of the Design Automation Conference (DAC), pages 356–365, June 2012.
-  Sanjit A. Seshia. Combining induction, deduction, and structure for verification and synthesis. Proceedings of the IEEE, 103(11):2036–2051, 2015.
-  Sanjit A. Seshia, Dorsa Sadigh, and S. Shankar Sastry. Formal methods for semi-autonomous driving. In Proceedings of the Design Automation Conference (DAC), pages 148:1–148:5, June 2015.
-  Lui Sha. Using simplicity to control complexity. IEEE Software, 18(4):20–28, 2001.
-  Yasser Shoukry, Pierluigi Nuzzo, Alberto Puggelli, Alberto L. Sangiovanni-Vincentelli, Sanjit A. Seshia, Mani Srivastava, and Paulo Tabuada. Imhotep-SMT: A satisfiability modulo theory solver for secure state estimation. In 13th International Workshop on Satisfiability Modulo Theories (SMT), July 2015.
-  Yasser Shoukry, Pierluigi Nuzzo, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia, George J. Pappas, and Paulo Tabuada. SMC: Satisfiability modulo convex optimization. In Proceedings of the 10th International Conference on Hybrid Systems: Computation and Control (HSCC), April 2017.
-  Joseph Sifakis. System design automation: Challenges and limitations. Proceedings of the IEEE, 103(11):2093–2103, 2015.
-  Claire Tomlin, Ian Mitchell, Alexandre M. Bayen, and Meeko Oishi. Computational techniques for the verification of hybrid systems. Proceedings of the IEEE, 91(7):986–1001, 2003.
-  Marcell Vazquez-Chanlatte, Jyotirmoy V. Deshmukh, Xiaoqing Jin, and Sanjit A. Seshia. Logical clustering and learning for time-series data. In 29th International Conference on Computer Aided Verification (CAV), pages 305–325, 2017.
-  Jeannette M Wing. A specifier’s introduction to formal methods. IEEE Computer, 23(9):8–24, September 1990.
-  Tomoya Yamaguchi, Tomoyuki Kaga, Alexandre Donzé, and Sanjit A. Seshia. Combining requirement mining, software model checking, and simulation-based verification for industrial automotive systems. Technical Report UCB/EECS-2016-124, EECS Department, University of California, Berkeley, June 2016.
-  Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, and Anind K Dey. Maximum entropy inverse reinforcement learning. In AAAI, pages 1433–1438, 2008.