Conformance Testing as Falsification for Cyber-Physical Systems
Abstract
In Model-Based Design of Cyber-Physical Systems (CPS), it is often desirable to develop several models of varying fidelity. Models of different fidelity levels can enable mathematical analysis of the model, control synthesis, faster simulation, etc. Furthermore, when (automatically or manually) transitioning from a model to its implementation on an actual computational platform, again two different versions of the same system are being developed. In all these cases, it is necessary to define a rigorous notion of conformance between different models and between models and their implementations. This paper argues that conformance should be a measure of distance between systems. Although a range of theoretical distance notions exists, a way to compute such distances for industrial-size systems and models has not yet been proposed. This paper addresses exactly this problem. A universal notion of conformance as closeness between systems is rigorously defined, and evidence is presented that it implies a number of other application-dependent conformance notions. An algorithm for detecting that two systems are not conformant is then proposed, which uses existing proven tools. A method is also proposed to measure the degree of conformance between two systems. The results are demonstrated on a range of models.
Houssam Abbas, Bardh Hoxha, and Georgios Fainekos 
CPS Lab, Arizona State University, Tempe, AZ, USA 
{hyabbas, fainekos, bhoxha}@asu.edu and 
Jyotirmoy V. Deshmukh, James Kapinski, and Koichi Ueda, 
Toyota Technical Center, Gardena, CA, USA 
{jyotirmoy.deshmukh, jim.kapinski, koichi.ueda}@tema.toyota.com 
In a typical Model-Based Design (MBD) process for Cyber-Physical Systems (see Fig. ?), a series of models and implementations are iteratively developed such that the end product satisfies a set of functional requirements. Ideally, the initial (simpler) model should have structural properties that make it amenable to formal synthesis and verification methods [?, ?] (cycle 1 in Fig. ?) through software tools like [?, ?, ?, ?, ?, ?]. Then, the fidelity of the models is increased by modeling more complex physical phenomena that were ignored initially and by introducing inaccuracies due to the computational platform, such as lookup-tables, time delays, partly black-box components, etc.
The development of a higher-fidelity model raises the obvious question of the relationship between the “simple" and “complex" models (cycle 2 in Fig. ?). If the simpler model were a nondeterministic model whose structure was fully known, then the question could be answered through behavioral inclusion [?], i.e., is it true that every behavior of the simple model can be exhibited by the complex model in response to the same stimulus?
However, in practice, nondeterministic models are rarely utilized or supported by industry tools for MBD such as LabView or Simulink/Stateflow. Instead, a hierarchy of deterministic models is developed, each capturing a more accurate representation of the final system, and it is important to know how ‘close’ two successive models are to each other. While the higher-fidelity model introduces new, more realistic behavior, it should still roughly follow the behavior of the simpler model. Thus, in lieu of behavioral inclusion, an appropriate notion of distance between the models is required. This we call conformance between the simple and complex models. Such distance notions (note that we do not use the word ‘distance’ in the strict mathematical sense) have been developed for various classes of systems over the years [?, ?, ?, ?, ?]. Even though works such as [?, ?] treat systems with hybrid dynamics directly, they apply only to certain classes of hybrid systems and, most importantly, they rely on full knowledge of the mathematical model of both systems. For industrial-size CPS models, such knowledge is not always available. Another limitation is that existing distance measures for systems consider only distances in time, e.g., [?], or only in space [?, ?, ?]. For CPS, both are extremely important, especially if the end goal is to verify that the deployed system satisfies formal specifications that involve timing requirements [?, ?].
The same observations hold for the important problem of verifying whether a system that implements a model behaves approximately like its model (arrows labeled 3 in Fig. ?). Irrespective of whether the automatic code generation process has formal guarantees, rarely does the model accurately capture all physical phenomena. Thus, the prototype system will be manually modified and calibrated into a final deployment. The deployment should then have a bounded, computable distance from the model under an appropriate metric, and it should satisfy the set of specifications.
In this paper, a framework is provided to address the aforementioned gaps in MBD for CPS, i.e., arrows 2 and 3 in the V process in Fig. ?. The framework is agnostic as to whether the systems studied are both models, or a model and its implementation; thus we will generically refer to one system as the Model and to the other as its Implementation.
More specifically, we utilize hybrid distance measures similar to [?, ?, ?] in order to define distances between system behaviors. Given two system behaviors (or trajectories), we compute a distance between them that captures both their distance in time and their distance in space. Then, given a bound on this distance, we consider the problem of whether the Implementation conforms to its Model with the corresponding degree. We pose this problem as an optimization problem, which we solve using our tool S-TaLiRo [?, ?]. Our solution is a best-effort framework, and the guarantees provided are of a probabilistic nature, as described for instance in [?, ?, ?].
One question naturally arises at this point: why not simply verify that the Implementation satisfies the same specification that the Model has been verified to satisfy? The reasoning behind the question is that if the specification is all that matters, it should be sufficient that the Implementation also satisfies it. We answer this question as follows:

It is not always possible to verify formally that the Implementation satisfies the formal specification: for example, a component purchased from a third party might allow only limited observability and not lend itself to formal methods.

Parts of the specification are not formally expressed, for example because the available formal tools cannot handle the size of the design (e.g., reachability tools for nonlinear systems). Rather, the specification exists in plain-language Test Plan documents [?] or implicitly in test suites. (Note the release of industrial tools that induce requirements from simulation traces, such as [?], in an effort to formalize requirements currently implicit in tests.)

For a real-life CPS, much of the behavior is de facto left unspecified because of its complexity. Once triggered, a particular behavior may exhibit unspecified but undesired characteristics, even though it possesses the specified, desired characteristics (and none of the specified, undesired ones).
Therefore, once we have an Implementation, it is not sufficient to check that it too conforms to the specification (if that is even possible). It is important to make sure that behaviors exhibited by Model and Implementation are close (in a sense to be defined). This then is conformance testing. This way, both Model and Implementation display similar unspecified characteristics, and our level of confidence in the Implementation derives from our confidence in the Model.
In the previous sections, we have argued that current ways of thinking about the relation between a Model and its Implementation are not sufficient for the verification of complex CPS. In the remainder of this paper,

We propose a universal definition of conformance between CPSs as a quantifiable closeness measure between the output behaviors of the two systems.

We argue that this universal notion implies most custom conformance notions which depend on the application.

We pose conformance testing as a logic property falsification problem. We then apply existing tools to this problem and show that they successfully find non-conformant behavior.

We show that conformance satisfies a monotonicity property which allows us to search efficiently for the best conformance degree between two systems.
In Section ?, it was argued that the verification of a CPS implementation in an MBD process requires conformance testing. The latter was described as checking that Model and Implementation display ‘similar’ behaviors, where ‘similar’ will be made precise. Because the objective is to detect bugs caused by the implementation process, the Model and Implementation should be tested with the same inputs and starting from the same initial conditions. Our high-level goal is then to determine whether there exists a pair of (initial conditions, input signal) that causes the Model and its Implementation to produce significantly different outputs; and if such a pair exists, to find it and present it to the user as a debugging guide.
To make this goal precise, this section starts by presenting the class of systems that we study. This class is illustrated with a running example of a fuel control system for an automotive application. Then, the conformance testing problem is formally stated as a search problem over the set of initial conditions and input signals. Finally, the constraints under which we seek to solve this problem are presented. This lays the groundwork for Section ?, where we will mathematically define what it means for two CPSs to be conformant.
Notation. Given two sets A and B, B^A denotes the set of all functions from A to B. That is, for any f ∈ B^A we have f : A → B. Given a Cartesian set product X_1 × X_2, proj_i is the projection onto X_i, i.e., proj_i(x_1, x_2) = x_i for all (x_1, x_2) ∈ X_1 × X_2.
At its most general, a CPS may be thought of as an input-output map. Specifically, let N be a finite set of integers, T be a positive real, H_0 be a set of initial operating conditions of the system, U be a compact set of input values, and let Y be a set of output values.
Definition 2.1
A real-timed state sequence (real-TSS) is a pair (t, y) where t ∈ ℝ_+^N and y ∈ Y^N.
A hybrid-timed state sequence (hybrid-TSS) is a pair (λ, y) where λ ∈ (ℝ_+ × ℕ)^N and y ∈ Y^N.
When a statement applies to both real-timed and hybrid-timed state sequences, we will simply say ‘timed state sequence’ (TSS). A TSS can be the result of a sampling process or of a numerical integration; then the vector of ‘timestamps’ represents the sequence of sampling times, or the times at which a numerical solution is computed. A timed state sequence will also be referred to as a signal, and a state-valued timed state sequence will also be referred to as a trajectory; the latter is standard dynamical systems theory terminology. Note that a real-TSS may be viewed as a special hybrid-TSS whose jump counter is identically 0.
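As a concrete illustration of these definitions, a real-TSS and a hybrid-TSS might be represented as follows. This is a minimal Python sketch; the class and field names are our own, not notation from the paper:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RealTSS:
    """A real-timed state sequence: timestamps t and sampled outputs y."""
    t: List[float]   # non-decreasing sampling times
    y: List[float]   # y[i] is the output sample at time t[i]

@dataclass
class HybridTSS:
    """A hybrid-timed state sequence: each sample carries a time t[i]
    and a jump counter j[i]."""
    t: List[float]
    j: List[int]     # non-decreasing jump counter
    y: List[float]

def as_hybrid(r: RealTSS) -> HybridTSS:
    # A real-TSS is the special hybrid-TSS whose jump counter is identically 0.
    return HybridTSS(t=list(r.t), j=[0] * len(r.t), y=list(r.y))
```

The conversion `as_hybrid` mirrors the remark that a real-TSS is a special case of a hybrid-TSS.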
A CPS is modeled as a map taking initial conditions and input timed state sequences to output timed state sequences, where the time domain 𝕋 is either ℝ_+ (for real-timed) or ℝ_+ × ℕ (for hybrid-timed). Note that input and output signals must either both be real-timed or both be hybrid-timed. We model discrete states as integers, so U and Y could be hybrid spaces of the form ℝ^n × D with n ∈ ℕ and D ⊂ ℤ finite. The system can then be viewed as a map:
(1)  Σ : H_0 × (𝕋^N × U^N) → (𝕋^N × Y^N), where 𝕋 is ℝ_+ (real-timed) or ℝ_+ × ℕ (hybrid-timed)
We impose the following restrictions on the systems that we consider:

The output space Y must be equipped with a generalized metric d. See [?] for implications.

For every initial condition and input signal, the system produces an output signal. This is imposed to avoid modeling issues where the Model’s and/or Implementation’s equations have no solutions.
Further details on the necessity and implications of the aforementioned assumptions can be found in [?].
As is standard in systems theory, the system’s output can be expressed as a function of the system’s internal state x, which takes its values in the state-space X of the system. We do not always assume that the internal state is observable. Given a real-timed state sequence (t, y), its i-th element is denoted (t_i, y(i)). Similarly, given a hybrid-timed state sequence (λ, y), the i-th element is denoted (λ_i, y(i)), with λ_i = (t_i, j_i).
Example 1
We consider a fuel control (FC) system for an automotive application. Environmental concerns and government legislation require that fuel economy be maximized and the rate of emissions (e.g., hydrocarbons, carbon monoxide, and nitrogen oxides) be minimized. Control of the engine air-to-fuel (A/F) ratio is crucial to optimizing fuel economy and minimizing emissions. The ideal A/F level is given by the stoichiometric value, the optimal A/F ratio for minimizing both fuel consumption and emission of pollutants. The purpose of the FC system is to maintain the A/F ratio within a given range of the stoichiometric value.
The scenario that we model involves an engine connected to a dynamometer, which is a device that can control the speed of the engine and measure the output torque. For our experiment, the dynamometer maintains the engine at a constant rotational velocity, as the engine is tested. There is only one input to the model: the throttle position command from the driver.
The conformance testing scenario for this example is unusual, in that the Model was derived from the Implementation, for reasons on which we now elaborate. The Implementation was derived from a textbook model of an engine control system [?], and contains implementation details such as lookup-tables (LUTs). The Model was then abstracted from this Implementation for the purposes of formal analysis [?].
Despite the counterintuitive relationship between the Model and Implementation for this case, the conformance task remains: to verify that these two versions satisfy some similarity criterion.
The discussion and results in this paper apply to this input-output map model of a CPS. To define some of the conformance notions in this paper, it will sometimes be useful to work with the more specialized hybrid automaton model of a CPS [?]. Broadly speaking, a hybrid automaton has countably many modes m, with possibly different dynamics active in each mode: ẋ = f_m(x, u). The automaton switches (or ‘jumps’) between modes whenever the internal state enters specific subsets of the state space, called switching guards. In general, a switching guard might depend on time and on the current state, and different jumps will have different guards: G_{m,m′} ⊆ ℝ_+ × X for the jump from mode m to mode m′. Finally, when the system switches modes, the internal state might be reset to a switch-specific value: x′ = R_{m,m′}(x) if (t, x) ∈ G_{m,m′}. If we explicitly model the system mode as part of the internal state x, we may write the automaton’s equations as [?]
(2)  ẋ = f(x, u) if x ∈ C,    x⁺ = g(x, u) if x ∈ D
where C is the ‘flow set’ of continuous evolution and D is the jump set, which equals the union of all guard sets. Apart from the requirement that the dynamics have at least one solution for every initial condition and input, they are arbitrary.
Remark 2.1
The notion of a system mode applies to the general input-output model of a system, so in what follows we will often refer to the ‘mode’ of the CPS without necessarily requiring that it be modeled as a hybrid automaton. For example, a powertrain Implementation might output the current gear, or the mode of operation, e.g., Economy vs. Sport.
The trajectories (or ‘solutions’) of purely continuous dynamical systems (with only one mode) are parameterized by the time variable t, and those of purely discrete dynamical systems (with no continuous evolution) are parameterized by the number of discrete jumps j. Following Goebel and Teel [?], the trajectories of hybrid automata are parameterized by both t and j, to reflect that both evolution mechanisms are present. So we write x(t, j) for the state and y(t, j) for the output of the automaton at time t and after j jumps, or mode switches. Because jumps take 0 time, it is possible for the automaton to go through several states in 0 time: x(t, j), x(t, j + 1), x(t, j + 2), etc. This cannot happen in a physical Implementation, but it may be allowed in the Model. We refer the reader to [?] for exact definitions of discrete and hybrid time domains, arcs, and trajectories.
We now introduce the behavior of a system, a notion applicable to both input-output maps and hybrid automata.
Definition 2.2 (Behavior)
Take a system Σ, an initial point h_0, and an input signal u. The behavior of the CPS from h_0 and u, denoted B_Σ(h_0, u), consists of

the input signal u, and

the output trajectory y generated by Σ in response to u from h_0.
The behavior of Σ is then the union of the behaviors B_Σ(h_0, u) over all initial conditions and input signals.
Example 2 (Example 1 Continued)
For the FC, the outputs consist of the normalized air-to-fuel ratio and the fuel commanded into the Cylinder-and-Exhaust; thus the output space is a subset of ℝ². The presence of a switch in the Throttle block and an LUT in the Cylinder-and-Exhaust block induces 8 modes. The outputs of the FC are sampled at a fixed rate, so the output signals can be modeled as real-TSS. If we can observe the mode changes during a simulation, then we can use a counter to count the mode switches, or ‘jumps’, and model the output as a hybrid-TSS: each sampled output is paired with the number of jumps observed so far. Note that the counter j counts the jumps so far, but does not indicate which mode the system is in.
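The jump-counting construction just described can be sketched as follows (Python; `count_jumps` is an illustrative helper of our own, not part of any tool mentioned in the paper):

```python
def count_jumps(modes):
    """Convert an observed sequence of system modes into the jump
    counter j of a hybrid-TSS: j[i] is the number of mode switches
    observed up to and including sample i.  Note that j records how
    many switches occurred, not which mode is active."""
    j, jumps = [], 0
    for i, m in enumerate(modes):
        if i > 0 and m != modes[i - 1]:
            jumps += 1
        j.append(jumps)
    return j
```

Pairing the resulting counter with the sampling times and output samples yields a hybrid-TSS.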
Based on the preceding discussion, we adopt the premise that conformance is a relation of similarity between the behaviors of two systems when subjected to the same stimulus; behavior is defined in Def. 2.2. So we may speak of conformant (i.e., similar) behaviors, or conformant output trajectories. Conformance testing can then be formulated as a search problem: find a pair of trajectories, generated by the two CPSs in response to the same initial condition and input, that are not conformant. The search for a non-conformant pair of trajectories is called falsification.
Problem 1 (Conformance testing)
Let M and I be a Model and an Implementation of a CPS, respectively. Find a pair (h_0, u) such that the behaviors B_M(h_0, u) and B_I(h_0, u) are non-conformant.
Because Implementations typically have limited observability, we assume testing happens under the following restriction:
Assumption 2.1 (Black box testing)
The behaviors of the Model and the Implementation are observable: i.e., it is always possible, for either system, to obtain an element of the behavior by executing the system. Only the behavior of the Implementation is observable, i.e., we know nothing else about it.
In particular, the sequence of modes that the Model and Implementation go through can be an important variable to track when deciding whether the two systems conform. As an example, a silicon microchip has ‘scan chains’, which are chains of buffers that pass the values of internal registers to the outside world. These are only used during testing and are burnt before customer delivery. In control systems, mode estimation [?] could be used when applicable. While it is possible that more is known about the Model, we will not need to know more to apply the methods of this paper. More knowledge of the Model would make applicable grey-box testing methods such as [?, ?].
In general, conformance is an application-dependent notion that helps determine that the implementation process does not use components or methods that alter the functionality (or safety or performance) of the final product in any significant manner. What ‘significant’ means will naturally depend on the application, which makes conformance testing itself application-dependent. Our first contribution is made in this section: we present a notion of distance between output trajectories, called (τ, ε)-closeness, and argue that it is an appropriate universal notion of conformance; that is, it is generally applicable regardless of the underlying application. The price we pay for this universality is that this notion is stronger than the application-dependent ones: two systems may not be conformant according to (τ, ε)-closeness, yet conformant according to a weaker custom notion which is sufficient for the task at hand. In the second part of this section, we give real-life examples where the application-dependent conformance turns out to be implied by (τ, ε)-closeness.
Thus we may develop a general theory of conformance based on (τ, ε)-closeness, and ‘generic’ conformance-deciding algorithms that do not depend on the application. This is advantageous for two reasons: first, one of the challenges today for testing of hybrid systems (and CPS in general) is to define conformance in a rigorous manner, and (τ, ε)-closeness provides an answer. Secondly, generic conformance tools can be used early in the design cycle, before all the instrumentation is there for a deeper analysis of the difference between Model and Implementation. Moreover, a feature of the universal notion is that it uses only the outputs of the system (and possibly the mode sequence, if available). Thus, the analysis and methods herein are applicable to potentially complicated systems with very general system models, including the input-output map model in Section ?.
The proposed universal notion of conformance is (τ, ε)-closeness. It expresses proximity between the outputs, their time sequences (real-TSS and hybrid-TSS), and their modes, if applicable. It is derived from [?].
Definition 3.1 ((T, J, (τ, ε))-closeness)
Take a test duration T > 0, a maximum number of jumps J ∈ ℕ, and parameters τ, ε > 0.
Two timed state sequences, or trajectories, (λ_1, y_1) and (λ_2, y_2) are (T, J, (τ, ε))-close if
(a) for all i such that λ_1(i) = (t_i, j_i) satisfies t_i ≤ T and j_i ≤ J, there exists k such that λ_2(k) = (s_k, j_k) with |t_i − s_k| ≤ τ, j_i = j_k, and |y_1(i) − y_2(k)| ≤ ε, and
(b) for all k such that λ_2(k) = (s_k, j_k) satisfies s_k ≤ T and j_k ≤ J, there exists i such that λ_1(i) = (t_i, j_i) with |s_k − t_i| ≤ τ, j_k = j_i, and |y_2(k) − y_1(i)| ≤ ε.
We will also say that (λ_1, y_1) and (λ_2, y_2) are conformant with degree (τ, ε).
When T and J are clear from the context, we simply say (τ, ε)-close. Because a real-TSS is a special case of a hybrid-TSS, the above definition applies to both.
(τ, ε)-closeness may be thought of as giving a proximity measure between the two hybrid arcs, both in time and in space. The definition says that within any time window of size τ, there must be a time when the trajectories are within ε or less of each other. Allowing some ‘wiggle room’ in both time and space is important for conformance testing: when implementing a Model, there are inevitable errors. These are due to differences in computation precision, clock drift in the implementation, the use of inexpensive components, unmodeled environmental conditions, etc., causing the Implementation’s output to differ in value from the Model’s output and to have different timing characteristics. Thus (τ, ε)-closeness nicely captures the intuitive notion that ‘the outputs should still look alike’. Our definition differs slightly from the original definition in [?] in that we use two ‘precision’ parameters τ and ε instead of one. In practice, using only one precision parameter is too restrictive, since the outputs can have a different order of magnitude from the time variable. It can be verified that the hioco relation of Van Osch [?] is an exact version of (τ, ε)-closeness (τ = ε = 0), with the role of inputs and outputs explicitly differentiated.
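For real-TSS with scalar outputs and the jump counter ignored (i.e., the one-mode case), the definition can be checked directly by brute force over sample pairs. The sketch below is illustrative (the function name is our own) and does not scale to long traces:

```python
def tau_eps_close(t1, y1, t2, y2, T, tau, eps):
    """Brute-force check of (tau, eps)-closeness for two real-TSS with
    scalar outputs, ignoring the jump counter (single-mode case).
    Condition (a): every sample of (t1, y1) with timestamp <= T must
    have a sample of (t2, y2) within tau in time and eps in value.
    Condition (b) is the symmetric check with the roles swapped."""
    def one_sided(ta, ya, tb, yb):
        return all(
            any(abs(ti - tk) <= tau and abs(yi - yk) <= eps
                for tk, yk in zip(tb, yb))
            for ti, yi in zip(ta, ya) if ti <= T)
    return one_sided(t1, y1, t2, y2) and one_sided(t2, y2, t1, y1)
```

A `False` return value witnesses a non-conformant trajectory pair for the given degree.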
Remark 3.1
If it is not possible to observe the number of jumps, then we simplify the above definition by assuming that the jump counter is constant. In other words, we interpret the definition over real-TSS and assume the system only has one mode.
Definition 3.2
Take a test duration T > 0, a maximum number of jumps J, and parameters τ, ε > 0. Two CPSs Σ_1 and Σ_2 are said to be (τ, ε)-close if for any initial condition h_0 and input signal u, the trajectories Σ_1(h_0, u) and Σ_2(h_0, u) are (τ, ε)-close. The two systems are also said to be conformant with degree (τ, ε).
Remark 3.2
A Model and Implementation generally will not have the same state-space, and so will not accept the same initial conditions. So when we provide the same initial condition h_0 to both, one of them might use a projection of h_0, or a more general mapping, to obtain its appropriate initial conditions.
From a conformance perspective, it is preferable to have a smaller τ and a smaller ε. We use this to define a partial order on the (τ, ε) pairs.
Definition 3.3
The partial order relation ⪯ over (τ, ε) pairs is given by
(τ_1, ε_1) ⪯ (τ_2, ε_2) if and only if τ_1 ≤ τ_2 and ε_1 ≤ ε_2.
The inequality is strict if and only if at least one of the component-wise inequalities is strict.
Remark 3.3
(τ, ε)-closeness has the valuable advantage of being monotonic: if two trajectories are (τ, ε)-close, then they are (τ′, ε′)-close for any (τ, ε) ⪯ (τ′, ε′). This allows us to use a simple binary search for a smallest pair (τ, ε) such that the trajectories, and the systems, are close. We make use of this property in the experiments.
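Monotonicity can be exploited as follows: if closeness at a degree is monotone in a scale factor applied to a reference (τ, ε) pair, a binary search pins down the tightest degree. A minimal sketch (the `is_close` callback is an assumed user-supplied predicate, e.g. a closeness check on a fixed trajectory pair):

```python
def tightest_degree(is_close, lo=0.0, hi=1.0, iters=20):
    """Binary search for (approximately) the smallest scale s in [lo, hi]
    such that is_close(s) holds.  `is_close` must be monotone: once it
    holds for some s, it holds for every larger s -- which is exactly
    the monotonicity property of closeness."""
    assert is_close(hi), "not conformant even at the loosest degree"
    for _ in range(iters):
        mid = (lo + hi) / 2
        if is_close(mid):
            hi = mid      # conformant at mid: tighten the upper bound
        else:
            lo = mid      # non-conformant at mid: raise the lower bound
    return hi
```

For example, `is_close` could be `lambda s: tau_eps_close(..., tau=s * tau_max, eps=s * eps_max)` for some chosen reference pair (tau_max, eps_max).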
We conclude this section with examples where application-specific notions of conformance are implied by (τ, ε)-closeness. Thus if a trajectory pair violates the former, it automatically violates the latter.
Example 3 (Example 1 continued)
Because the lookup-tables (LUTs) in the Implementation are replaced by polynomials in the Model, some error is expected between the outputs of the two systems. The designer hopes, however, that the error at the output of the Implementation is of the same order of magnitude as the error between the outputs of the LUTs and the outputs of the corresponding polynomials. If not, then more entries are needed in the LUT. Moreover, because LUT lookups are typically faster than polynomial computations, some delay between the two outputs is expected to be observed. The designer has a pre-specified maximum acceptable delay. In this case, conformance imposes upper bounds on the spatial and temporal differences between the outputs of the Model and the Implementation.
Conformance testing is applicable to domains other than the automotive industry, e.g., the microchip design cycle, as the following example shows.
Example 4 (State retention)
The Model is an RTL description of an electrical circuit, and the Implementation equals the Model with power gating and state retention added to some of its subsystems. With state retention, the contents of certain critical memory elements of the power-gated subsystem are retained in ‘shadow’ registers prior to power-down, and restored after power-up. This creates a temporary difference between the state of the non-state-retained circuit (the Model) and the state-retained circuit (the Implementation). This difference lasts until the reset sequence is completed. Thus, in this case, conformance means that a temporary difference in modes between the two systems is allowed, but they must re-converge after a predefined amount of time.
In this section, we present a general method for determining whether two systems are conformant or not. We also provide a way to quantify the degree of conformance between them.
Our approach is based on the observation that (τ, ε)-closeness can be expressed as a formal logical property defined over the output timed state sequences of the parallel interconnection of the two systems; see Fig. ?. A TSS of the interconnection system is just the concatenation of the TSSs of the component systems. If we can find a (parallel) TSS (or ‘trajectory’) which falsifies the (τ, ε)-closeness property, then by definition the component trajectories are non-conformant, and by extension the systems are non-conformant. In what follows, we will use the terms ‘falsifying trajectory pairs’ and ‘non-conformant trajectory pairs’ interchangeably.
The logic we use to express (τ, ε)-closeness is Metric Temporal Logic (MTL) [?] (see the Appendix for a review of MTL). We first present the construction for real-TSS, then generalize it to hybrid-TSS.
Real-TSS: Fix T > 0 and τ, ε > 0. Our goal is to express (τ, ε)-closeness as an MTL formula. Let y_M and y_I be the outputs of the Model and Implementation CPSs, respectively, in response to the same initial conditions and input signal. Because (τ, ε)-closeness requires comparing the current value of y_M to current, past, and future values of y_I (over a window of width τ), we will create shifted versions of y_I. Given the symmetry of (τ, ε)-closeness, we will also define shifted versions of y_M. The amount of the shift will depend on the sampling times: how many samples of y_I (resp. y_M) fit within a window of width τ?
The shifted versions are now defined. Recall that (t_i, y_M(i)) is the i-th sample in the TSS y_M, and similarly for y_I. Consider the Model’s output: for each i, compute the largest k such that t_{i+k} − t_i < τ. Define N_i = k: this is the number of samples in the largest window of duration less than τ starting at t_i. Similarly, we compute N′_i for the Implementation TSS for every i. The numbers N_i and N′_i could in general vary with i due to an adaptive sampling period. Define K_M = min_i N_i: K_M is the smallest number of samples in a window of size less than τ anywhere in y_M. Assuming that the trajectory contains at least two distinct timestamps, it follows that K_M ≥ 1. (The degenerate case occurs when the Model trajectory is Zeno, i.e., when it contains an infinite number of samples without advancing time; this can result from a modeling artifact [?]. The assumption effectively says that we have at least two different timesteps, and so the trajectory is not initially Zeno.) K_M constitutes the size of the shift (forward and backward) to apply to y_M. Similarly, define K_I for the Implementation. We may now define shifted versions of the output trajectories via the discrete shift operator: for k ∈ {−K_M, ..., K_M}, the shifted sequence y_M^{(k)} is given by
(3)  y_M^{(k)}(i) = y_M(i + k) if i + k ≤ N, and y_M^{(k)}(i) = y_M(N) otherwise,
when k ≥ 0, and
(4)  y_M^{(k)}(i) = y_M(i + k) if i + k ≥ 0, and y_M^{(k)}(i) = y_M(0) otherwise,
when k < 0, where N is the index of the last sample (and analogously for y_I^{(k)}, k ∈ {−K_I, ..., K_I}). Note that the filler values at both ends of the shifted sequences (3), (4) are obtained by constant interpolation.
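A discrete shift with constant interpolation at the ends can be sketched in a few lines (Python; illustrative):

```python
def shift(y, k):
    """Discrete shift operator with constant interpolation:
    shift(y, k)[i] equals y[i + k] where that index exists; indices
    past either end are filled with the first/last sample."""
    n = len(y)
    return [y[min(max(i + k, 0), n - 1)] for i in range(n)]
```

Here `shift(y, 1)` looks one sample into the future and `shift(y, -1)` one sample into the past, with the boundary sample repeated as filler.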
Recall Def. 3.1(a). This condition can be captured by saying that at all sample indices i, there exists a shift k such that |y_M(i) − y_I^{(k)}(i)| ≤ ε. Analogously for Def. 3.1(b).
Now (τ, ε)-closeness may be expressed as the following MTL formula (∨ is the logical OR operator, ∧ is the logical AND operator, and □_{[0,T]} is the temporal ‘Always over the time interval [0, T]’ operator; see the Appendix):
(5)  φ_a = □_{[0,T]} ⋁_{k=−K_I}^{K_I} ( |y_M − y_I^{(k)}| ≤ ε )
(6)  φ_b = □_{[0,T]} ⋁_{k=−K_M}^{K_M} ( |y_I − y_M^{(k)}| ≤ ε )
(7)  φ = φ_a ∧ φ_b
Because (τ, ε)-closeness only requires that the two signals be within ε of each other at least once in a window of size τ, (5) and (6) use disjunction: it is sufficient for one shifted comparison to be less than ε.
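A quantitative reading of this ‘always a disjunction of shifted comparisons’ construction can be sketched as follows for real-TSS sharing a sampling grid: a negative return value witnesses non-conformance. This is an illustrative stand-in for the robust semantics computed by tools such as S-TaLiRo, not their implementation:

```python
def closeness_robustness(y1, y2, K, eps):
    """At each step, take the best (smallest) mismatch over all shifts
    in {-K, ..., K} (the disjunction); the overall value is the worst
    step (the 'Always').  Positive: the shifted comparisons all have
    slack; negative: the closeness property is falsified."""
    def shift(y, k):
        n = len(y)
        return [y[min(max(i + k, 0), n - 1)] for i in range(n)]
    worst = float("inf")
    for i in range(len(y1)):
        best = min(abs(shift(y1, k)[i] - y2[i]) for k in range(-K, K + 1))
        worst = min(worst, eps - best)
    return worst
```

For symmetry with (5) and (6), the same function would be applied a second time with the roles of the two signals swapped.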
Hybrid-TSS: To define the MTL formula over hybrid-TSS, we must break up each trajectory into segments such that there are no jumps within a segment. Specifically, consider the hybrid-TSS (λ, y), with λ(i) = (t_i, j_i).
Assume that only p unique values of the jump counter appear in λ, corresponding to p − 1 jumps. We divide the hybrid-TSS into p segments z_1, ..., z_p, such that the jump counter is constant over a segment. Each segment can be viewed as a real-TSS. If we apply this procedure to y_M and y_I, we get Model segments z_1^M, ..., z_p^M and Implementation segments z_1^I, ..., z_q^I. Let r = min(p, q). We can now apply the above procedure to every pair (z_s^M, z_s^I), s ≤ r, with the important difference that the shifted sequences (3), (4) are filled with an arbitrarily large value, and not by constant interpolation. This is to reflect that a comparison past the jump point is not valid. This results in r formulae φ_1, ..., φ_r obtained via (7). The complete formula can now be written
(8)  φ_hyb = ⋀_{s=1}^{r} φ_s
Note that there are other ways of defining the MTL formula for hybrid-TSS that directly incorporate the jump counter in the formula. Comparing these different methods is outside the scope of this paper. Unless otherwise indicated, all the discussion that follows applies equally to the formula obtained via (7) (for real-TSS) or (8) (for hybrid-TSS).
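The segmentation step can be sketched as follows (Python; an illustrative helper of our own, assuming the jump counter is non-decreasing):

```python
from itertools import groupby

def segments(t, j, y):
    """Split a hybrid-TSS into maximal segments over which the jump
    counter j is constant.  Each returned (times, values) pair can be
    treated as a real-TSS and compared segment-by-segment against the
    corresponding segment of the other system."""
    out, idx = [], 0
    for _, grp in groupby(j):
        n = len(list(grp))                      # length of this constant-j run
        out.append((t[idx:idx + n], y[idx:idx + n]))
        idx += n
    return out
```

The number of segments is the number of unique jump-counter values, matching the construction above.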
We can now use existing tools, like S-TaLiRo [?, ?], to find a pair of trajectories (equivalently, a trajectory of the parallel interconnection) which falsifies φ. S-TaLiRo uses, among other optimizers, Simulated Annealing (SA) to find falsifying trajectories. If such a trajectory is not found, convergence properties of SA imply that, with probability approaching 1, the property is satisfied by the systems; equivalently, that the two systems are indeed conformant.
We should stress at this point that the proposed method is not specific to (τ, ε)-closeness. It is more widely applicable to any application-dependent conformance notion that can be expressed as an MTL formula, including those from the examples in Section ?. For example, for the case when mode sequences are allowed to diverge for at most a predefined duration δ (Example 4), the conformance relation is expressed as: “For every initial condition and every input signal, whenever the two systems are in different modes, they will be back in the same mode within δ sec”. This can now be written as the MTL formula:
(9)  □ ( (mode_M ≠ mode_I) → ◇_{[0,δ]} (mode_M = mode_I) )
where ◇_{[0,δ]} is the temporal ‘Eventually within [0, δ]’ operator.
We conclude this section with a word on how to practically falsify the conformance formula (or any of the other application-dependent notions). A method that has proved efficient is to minimize the robustness of the trajectories w.r.t. the MTL property. In this work, we use spatial robustness [?, ?] and time robustness [?]. Spatial robustness measures how far in the output space a given trajectory is from the nearest trajectory with the opposite truth value for the formula. (If the mode is observable, spatial robustness also computes the (quasi-)distance between the modes of the two trajectories [?], but we do not make use of this here.) The spatial robustness of a trajectory, evaluated from a given start time w.r.t. a formula, is computed on the output trajectory alone, without any reference to the system that generated it.
Time robustness measures by how much the given trajectory must be shifted in time to change its truth value w.r.t. the formula. Two time robustness values may be measured for each trajectory: the future robustness and the past robustness, depending on whether the signal is shifted left (so future values are introduced) or right (so past values are introduced). In this work we denote time robustness explicitly, to distinguish it from spatial robustness.
The spatial [?] and temporal [?] robust semantics of MTL formulae are reviewed in the appendix.
Both types of robustness (spatial and temporal) satisfy the fundamental theorem that a negative robustness value indicates falsification, a positive value indicates satisfaction, and a value of 0 indicates that an infinitesimal change in the trajectory (in space or in time) will change its truth value. Therefore, the search for a falsifying trajectory can be recast as the problem of minimizing the robustness over the initial conditions and input signals. To make this a finite-dimensional optimization, the input signals are parameterized with a finite number of parameters. (This parametrization effectively limits the search space, and the global minimum returned by falsification is a minimum over this limited space. But the parametrization can typically be made as precise as desired, e.g., to within the approximation error of the minimization algorithm.) As our objective is to find falsifying trajectories, we stop the search as soon as it encounters a trajectory with negative robustness.
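The falsification loop just described can be sketched as follows. This is a minimal Python illustration, not S-TaLiRo: the toy Model and Implementation (a first-order system driven by an exact vs. quantized input, standing in for an LUT approximation), the piecewise-constant input parametrization, and the plain random sampling used in place of Simulated Annealing are all our own assumptions.

```python
import random

def robustness_conformance(y_model, y_impl, eps):
    # Spatial robustness of "always |y_M - y_I| <= eps" on sampled traces:
    # the min over time of eps - |y_M(t) - y_I(t)|; negative means violation.
    return min(eps - abs(a - b) for a, b in zip(y_model, y_impl))

def simulate(params, quantize, dt=0.01, T=2.0):
    # Euler-simulate y' = -y + u(t), with u piecewise constant from `params`.
    # When `quantize` is set, the input is snapped to a 0.25 grid, mimicking
    # a coarse lookup table in the Implementation (a toy discrepancy).
    n = int(T / dt)
    seg = T / len(params)
    y, trace = 0.0, []
    for k in range(n):
        u = params[min(int(k * dt / seg), len(params) - 1)]
        if quantize:
            u = round(u * 4) / 4.0
        y += dt * (-y + u)
        trace.append(y)
    return trace

def falsify(eps, n_tests=200, seed=0):
    # Stochastic search for a falsifying input: sample input parameters,
    # simulate Model and Implementation on the same input, and stop as soon
    # as a trajectory pair with negative robustness is found.
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_tests):
        params = [rng.uniform(0.0, 1.0) for _ in range(2)]
        r = robustness_conformance(
            simulate(params, quantize=False), simulate(params, quantize=True), eps)
        best = min(best, r)
        if best < 0:
            return best, params   # falsifying trajectory found
    return best, None             # conformance not falsified
```

For this toy pair, a tight bound (small eps) is falsified quickly, while a loose one survives all tests, mirroring the satisfied/falsified dichotomy of the robustness theorem.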
It is possible to construct an example displaying a (graphically) convergent sequence of trajectories whose robustness values do not converge to the robustness of the limit trajectory. This holds true for both spatial and temporal robustness. So even if Model and Implementation are not conformant (for a given value of the parameters), local optimization algorithms can get trapped in local minima with positive robustness. On the other hand, non-conformant trajectory pairs necessarily have negative robustness, so that if a Model/Implementation pair is non-conformant, all global minima of the robustness are negative and correspond to non-conformant pairs of trajectories. Thus we need to use global optimizers, like Simulated Annealing, Cross-Entropy [?] or other methods supported by [?].
In addition to verifying whether two systems are close for a given pair (τ, ε), we may find a smallest such pair with the order defined in Def. 3.3. Recall now that closeness is monotonic in (τ, ε) (Remark 3.3). The following theorem shows that the robustness values are also monotonic in the parameters τ and ε. The proof is in the appendix.
Theorem 4.1
Take two TSS, a test duration T, a number of jumps J, and a time t. Consider the parallel concatenation of the two systems.
(i) Fix τ. If ε ≤ ε′, then the robustness of the conformance formula with parameters (τ, ε) is no larger than its robustness with parameters (τ, ε′).
(ii) Now fix ε. If τ ≤ τ′, then the robustness with parameters (τ, ε) is no larger than that with parameters (τ′, ε).
Therefore, we can combine S-TaLiRo with a binary search over the values of τ and ε to find a smallest pair such that the conformance formula is satisfied. Because the order on pairs is only partial, binary search is applied to each component while fixing the other, thus exploring the Pareto-optimal front (e.g., [?]). Algorithm 1 shows the binary search for the smallest ε given a τ. A search over τ can be done similarly with the obvious modifications. The initial upper value of ε can be found by an initial search that doubles a starting guess until the formula is satisfied.
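The search in Algorithm 1 can be sketched as follows, assuming a hypothetical `is_conformant(eps)` predicate that wraps a falsification run and returns True when no violating trajectory is found. The bisection is justified by the monotonicity of Theorem 4.1: if the systems are conformant at some eps, they remain conformant at any larger eps.

```python
def smallest_eps(is_conformant, eps_start=1.0, tol=1e-3):
    """Binary-search the smallest eps for which the conformance check passes.
    `is_conformant(eps)` is assumed to wrap a falsification run (e.g. an
    S-TaLiRo call) returning True when no violation was found."""
    lo, hi = 0.0, eps_start
    # Initial doubling search: grow hi until the check passes.
    while not is_conformant(hi):
        lo, hi = hi, 2.0 * hi
    # Bisection: lo stays falsified, hi stays conformant.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_conformant(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

A search over τ given ε would follow the same pattern, per point (ii) of Theorem 4.1.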
The value returned by this procedure gives a quantitative measure of conformance between the two systems, and allows the designer to make informed trade-offs between, say, the output accuracy of the Implementation and its timing characteristics.
Remark 4.1
For a given ε, the smallest τ such that two trajectories are (τ, ε)-close can be calculated as
(10)  
(11)  
(12) 
Similar definitions hold for the smallest ε given a τ. We can minimize over the space of TSS to determine a smallest value such that the two systems are (τ, ε)-close. The approach in Algorithm 1 has the advantage of working not just for (τ, ε)-closeness, but for any other, application-dependent, notion of conformance.
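For two fixed, equally sampled trajectories, a smallest ε for a given time tolerance can be computed directly, as in the following sketch. The symmetric "every sample within ε of some sample of the other trace at most τ away" formulation is our reading of the closeness notion, not the paper's exact equations (10)-(12).

```python
def smallest_eps_for_traces(y1, y2, tau_steps):
    """Smallest eps making two equal-length, uniformly sampled traces
    (tau, eps)-close, with tau expressed as `tau_steps` samples: every
    sample of one trace must be within eps of some sample of the other
    trace at most tau_steps away, in both directions."""
    def directed(a, b):
        n = len(a)
        return max(
            min(abs(a[t] - b[s])
                for s in range(max(0, t - tau_steps),
                               min(n, t + tau_steps + 1)))
            for t in range(n))
    # Symmetrize: closeness must hold from each trace to the other.
    return max(directed(y1, y2), directed(y2, y1))
```

With tau_steps = 0 this degenerates to the sup-norm distance between the two traces; increasing the time tolerance can only shrink the required eps, consistent with the monotonicity used above.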
We illustrate the proposed approach on three systems, including a commercial high-fidelity engine model. In all experiments, we did not restrict the maximum number of jumps in a given trajectory; rather, the simulation ended only when the simulation time reached the test duration T. So below, we set the number of jumps J equal to some appropriately large value.
Example 5 (Example 1 continued)
We use the FC Model and Implementation from Example 1 to illustrate the application of Algorithm 1 to find the tightest values of τ and ε such that (τ, ε)-closeness holds. Because the pairs are partially ordered, we are looking for the Pareto-optimal front. We decided to fix τ at 0.01 and do a search over ε. To determine which value of ε to start the search from, we computed the maximum relative error between the outputs of the LUTs and the outputs of the corresponding polynomials over a window of 85 seconds, using randomly generated inputs. The maximum relative error was 0.4091. Obviously, because the LUTs are deep in the system, we do not expect the same relative error at their outputs as at the output of the entire system. However, this duplicates the typical procedure for deciding how many entries to have in an LUT: fewer entries consume less memory and make for a faster computation, but cause greater error. So the designer starts from a few entries and observes the output of the system. If the error in the output is not acceptable, entries are added to the LUT to provide a better approximation. And so on.
Figure 5 shows a close-up of the output trajectories from the Model and Implementation. Note that, as shown in Fig. 5 for the Fuel output, the two trajectories do not simply diverge and maintain a fixed distance from each other; rather, they diverge for a period only to meet up again. This interplay between time difference and space difference is well captured by (τ, ε)-closeness.
S-TaLiRo [?] was run at each iteration of the binary search to falsify the conformance formula. Algorithm 1 found a tight interval for ε over which the robustness changes sign; that is, the two systems are (τ, ε)-close with ε in that interval.
Example 6 (High-fidelity engine model)
Our second experiment was performed on a Model and Implementation of an automatic transmission. The transmission has one input (throttle angle) and two outputs: the speed of the engine (RPM) and the speed of the vehicle (MPH). Here too, the goal is to find a smallest ε such that the two systems are (τ, ε)-close. The Model is a slightly modified version of the Automatic Transmission model provided by Mathworks as a Simulink demo (available at http://www.mathworks.com/products/simulink/demos.html), shown in the corresponding figure (right). It contains 69 blocks, including 2 integrators, 3 lookup tables, 3 2D lookup tables and a Stateflow chart. The Stateflow chart contains two concurrently executing Finite State Machines with 4 and 3 states, respectively.
The Implementation is the Enginuity model of a Port Fuel Injected spark ignition engine from SimuQuest [?], with 56 states and a large number of black-box components. An overview of the components of the model is shown in the corresponding figure (left). It is significantly more complex than the Model, as it models the effects of combustion from first physical principles on a cylinder-by-cylinder basis, while also including regression models for particularly complex physical phenomena.
The initial conditions are the initial RPM and the initial vehicle speed, both of which must be 0; the set of initial conditions is therefore a singleton. This means the output trajectories depend only on the input signal. The throttle at each point in time can take any value between 0 (fully closed) and 100 (fully open). We remark that the system is deterministic, i.e., under the same input we will always observe the same output. The test duration T is fixed across all tests.
In 31 iterations, the binary search found an interval of [4.8833, 4.8834] for ε, over which the spatial robustness varies between 0.00013 and 0.03. Thus the Model and Implementation are (τ, ε)-close with ε in that interval. In the corresponding figure we present two output trajectories that fail the closeness specification given the same input sequence.
Example 7
To illustrate the falsification of application-dependent notions, we choose the formula given by (9) and apply it to the navigation benchmark Nav0 from [?]. Nav0 is a 4D hybrid automaton with 16 modes. Its guard sets are categorized as either 'horizontal' or 'vertical'. Fifteen implementations are generated by varying the continuous dynamics in each mode (resulting in the Dyn Implementations), the horizontal guards (resulting in the HG Implementations) and the vertical guards (resulting in the VG Implementations). The variations are ordered: the difference between Nav0 and each successive Dyn variant increases. Similarly, the difference between Nav0 and each successive HG variant increases, and is comparable to that between Nav0 and the corresponding VG variant.
We ran S-TaLiRo to minimize the temporal robustness of the formula. Simulated Annealing (SA) was used as the optimizer. Since SA is a stochastic algorithm, we collected statistics over 20 runs of 500 tests each, with each test simulating the systems for the full test duration. The parameter τ was set to 0.5. The results are presented in Table 7. 12 out of the 15 implementations were falsified, i.e., found to be non-conformant to the Model. The HG Implementations are robustly conformant to the Model, as their robustness was infinite: this means that modifying the horizontal guards within the amounts prescribed by HG cannot affect PWC conformance. On the other hand, only one test was sufficient to falsify with the vertical guard modifications. This shows a high sensitivity of the system to the vertical guard conditions. This is useful design input, as it tells the designers that they can trade off horizontal guard implementation accuracy for greater accuracy in implementing the vertical guards.
Implementations  Nb falsifying runs (out of 20)  Avg nb of tests required for falsification  Avg robustness  Avg falsification time
17  181.47  153.12  
13  119.3  98.08  
18  141.77  117.71  
20  41.45  33.12  
20  31.65  24.82  
20  27.55  21.71  
20  11.6  8.60  
20  2.15  0.081  1.59  
20  1.15  0.90  
0  N/A  N/A  
0  N/A  N/A  
0  N/A  N/A  
20  1  0.46  
20  1  0.47  
20  1  0.48 
Tretmans [?] defined Input-Output conformance (ioco) as requiring that the Implementation never produces an output that cannot be produced by the specification, and that it is never the case that the Implementation fails to produce an output when the specification requires one. Both Implementation and specification are modeled as (discrete) labeled transition systems. Van Osch [?] later extended ioco to hybrid transition systems (HTS) by incorporating continuous-time inputs. This hybrid ioco is not testable in practice, because the state space and transition relations of an HTS are uncountable, and the test generation algorithm proposed in [?] does not contain a mechanism for judiciously choosing tests from the infinite set of possible tests.
Later work [?] also extends [?] by treating the Implementation as a black box that generates timed traces, and representing the specification as a timed automaton. The objective is to verify, for each trace generated by the Implementation, whether it satisfies the invariants of the specification automaton. As such, this conformance notion does not address this paper's goal of verifying 'similarity' between an Implementation and its Model, which is a more comprehensive problem. The work by Brandl et al. [?] utilizes (discrete) action systems [?] to provide a discrete view of hybrid systems (a modeling formalism for CPS), so that Tretmans' ioco can be applied to the now-discrete system. This method requires knowledge of the internal system structure, which we do not assume in our work.
In [?], a distance between systems is also defined via a distance between trajectories. The closeness notion used there can be shown to be weaker than (τ, ε)-closeness, so that proving two systems to be (τ, ε)-close implies they are close in the sense of [?]. In fact, (τ, ε)-closeness provides a continuum of closeness degrees between the two extremes presented in [?].
In this paper, we have defined conformance between a Model and its Implementation as a degree of closeness between the outputs of the two systems. This notion is quantifiable, thus allowing us to speak of degrees of conformance, giving a richer picture of the relation between the two systems. It is also applicable to very general system models, which allows us to study the conformance of Models to complex Implementations. This conformance was then expressed as an MTL formula, allowing us to use existing falsification tools to find nonconformant behavior of Model and Implementation, if it exists.
Because a CPS will usually have several operating modes with different dynamics, it will be interesting in future work to explicitly incorporate the mode switching into the MTL formulae. Finally, a more complete theory of conformance should also account for different time domains between the Model’s trajectories and the Implementation’s trajectories.
The work presented here benefited from the input of Raymond Turin, Founder and CTO at SimuQuest, who provided assistance in working with the SimuQuest Enginuity model.
This work was partially funded under NSF awards CNS 1116136, CNS 1319560, IIP0856090 and the NSF I/UCRC Center for Embedded Systems.
APPENDIX
In this section, we review the robust semantics of MTL formulas. Details on the theory and algorithms are available in our previous work [?, ?].
Definition A.1 (MTL Syntax)
Let AP be the set of atomic propositions and I be any nonempty interval of the nonnegative reals. The set of all well-formed MTL formulas is inductively defined as φ ::= ⊤ | p | ¬φ | φ₁ ∨ φ₂ | φ₁ U_I φ₂, where p ∈ AP and ⊤ is true.
We provide semantics that map an MTL formula φ and an output trajectory y of the system to a value drawn from the extended reals. For an atomic proposition p, the semantics evaluated on y at time t consists of the distance between the point y(t) and the set labeling p. Intuitively, this distance represents how robustly the point lies within (or is outside) the set. If this distance is zero, then the smallest perturbation of the point can affect the outcome of p. We denote the spatial robust valuation of the formula φ over the trajectory y at time t by ρ_φ(y, t). The trajectory always starts from time 0, and when the start time is omitted it is taken to be 0.
Definition A.2 (Robust Semantics)
Let y be a real-TSS output of the system under study, let t be a time instant in its domain, and let I be a nonempty interval on the real line. Then the robust semantics of any formula φ with respect to y is defined inductively on the structure of φ:
where the signed distance of a point from a set is the distance to the set's boundary, taken positive when the point lies inside the set and negative when it lies outside,
where ⊔ and ⊓ stand for the supremum and infimum, respectively. The semantics of the other operators can be defined using the above basic operators, e.g., ◇_I φ ≡ ⊤ U_I φ and □_I φ ≡ ¬◇_I ¬φ.
It can be shown [?] that if the signal satisfies the property, then its robustness is nonnegative, and if the signal does not satisfy the property, then its robustness is nonpositive.
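As an illustration of these semantics on uniformly sampled traces, the following sketch evaluates the spatial robustness of a small MTL fragment recursively. The tuple encoding of formulas and the step-indexed intervals are our own simplifications for illustration, not the algorithms implemented in the tools cited above.

```python
# Minimal discrete-time spatial robust semantics for an MTL fragment:
# predicates (as signed-distance functions), negation, conjunction, and
# step-bounded eventually/always. Intervals are given in sample steps.

def rob(phi, trace, t=0):
    op = phi[0]
    if op == "pred":          # ("pred", f): f maps a sample to a signed distance
        return phi[1](trace[t])
    if op == "not":           # negation flips the sign of the robustness
        return -rob(phi[1], trace, t)
    if op == "and":           # conjunction takes the infimum (min)
        return min(rob(phi[1], trace, t), rob(phi[2], trace, t))
    if op in ("ev", "alw"):   # ("ev"/"alw", (a, b), sub): timed modality
        a, b = phi[1]
        window = range(min(t + a, len(trace) - 1),
                       min(t + b, len(trace) - 1) + 1)
        agg = max if op == "ev" else min   # sup for eventually, inf for always
        return agg(rob(phi[2], trace, k) for k in window)
    raise ValueError("unknown operator: %r" % op)
```

Consistent with the fundamental theorem above, a positive value indicates satisfaction with room to spare, and a negative value indicates a violation by that margin.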
The time robust semantics differ from the above only in the definition of the base case: for an atomic proposition evaluated at a given time, the time robustness is, up to a sign given by the proposition's current truth value, the largest amount by which the trajectory can be shifted in time without changing that truth value. The rest of the equations above follow through unchanged.
We start by proving the result for real TSS; the extension to hybrid TSS will then follow immediately. So start by considering that the Model and Implementation trajectories are real TSS and, for convenience, denote their parallel concatenation by a single trajectory. Recall (5), (6), and the robust semantics of MTL from the appendix above.
(i) Define , and the atomic proposition . Equation (5) can be written as
So it holds that
Thus with , . By the robust semantics,
Similarly, we can show . Thus