A Robust Genetic Algorithm for Learning Temporal Specifications from Data
Abstract
We consider the problem of mining signal temporal logic requirements from a dataset of regular (good) and anomalous (bad) trajectories of a dynamical system. We assume the training set to be labeled by human experts and that we have access only to a limited amount of data, which is typically noisy.
We provide a systematic approach to synthesize both the syntactical structure and the parameters of the temporal logic formula using a two-step procedure: first, we leverage a novel evolutionary algorithm for learning the structure of the formula; second, we perform the parameter synthesis by operating on a statistical emulation of the average robustness of a candidate formula w.r.t. its parameters.
We test our algorithm on an anomalous trajectory detection problem of a naval surveillance system, and we compare our results with our previous work [1] and with a recently proposed decision-tree-based method [2]. Our experiments indicate that the proposed approach outperforms our previous work w.r.t. accuracy, and show that it generally produces smaller and more compact temporal logic specifications than the decision-tree-based approach, with comparable speed and accuracy.
I Introduction
Learning temporal logic requirements from data is an emergent research field gaining momentum in the rigorous engineering of cyber-physical systems. Classical machine learning methods typically generate very powerful black-box (statistical) models. However, these models often do not help in the comprehension of the phenomenon they capture. Temporal logic provides a precise formal specification language that can be easily interpreted by humans. The possibility to describe datasets in a concise way using temporal logic formulas can thus help to better clarify and comprehend the emergent patterns of the system at hand. A clear-cut example is the problem of anomaly detection, where the input is a set of trajectories describing regular or anomalous behaviors, and the goal is to learn a classifier that not only can be used to detect anomalous behaviors at runtime, but also gives insights on what characterizes an anomalous behavior. Learning temporal properties is also relevant in combination with state-of-the-art techniques for search-based falsification of complex closed-loop systems [3, 4, 5, 6], as it can provide an automatic way to describe unwanted (or desired) behaviors that the system needs to avoid (or satisfy).
In this paper, we consider the problem of learning a temporal logic specification from a set of trajectories which are labeled by human experts (or by any other method which is not usable for real-time monitoring) as good for the normal expected behaviors and bad for the anomalous ones. The goal is to automatically synthesize both the structure of the formula and its parameters, providing a temporal logic classifier that discriminates as well as possible between the bad and the good behaviors. This specification can be turned into a monitor that outputs a positive verdict for good behaviors and a negative verdict for bad ones.
Related Work
Mining temporal logic requirements is an emerging field of research in the analysis of cyber-physical systems [7, 8, 9, 10, 11, 12, 13, 2, 1]. Most of the literature focuses in particular on the problem of learning the optimal parameters given a specific template formula [7, 8, 9, 10, 11, 12]. In [7], Asarin et al. extend Signal Temporal Logic (STL) [14] with the possibility to express the time bounds of the temporal operators and the constants of the inequalities as parameters. They also provide a geometric approach to identify the subset of the parameter space that makes a particular signal satisfy an STL specification. In [8] the authors use the same parametric STL extension in combination with the quantitative semantics of STL to perform a counterexample-guided inductive parameter synthesis. This approach consists in iteratively generating a counterexample by executing a falsification tool on a template formula. The counterexample found at each step is then used to further refine the parameter set, and the procedure terminates when no other counterexamples are found. In general, all these methods, when working directly with raw data, are potentially vulnerable to measurement noise and are limited by the amount of data available.
Learning both the structure and the parameters of a formula from a dataset poses even more challenges [13, 2, 9, 1]. This problem is usually addressed in two steps: learning the structure of the formula and synthesizing its parameters. In particular, in [13] the structure of the formula is learned by exploring a directed acyclic graph, and the method exploits Support Vector Machines (SVM) for the parameter optimization. In [2] the authors instead use a decision-tree-based approach for learning the formula, while optimality is evaluated using heuristic impurity measures.
In our previous works [9, 1] we also addressed the problem of learning both the structure and the parameters of a temporal logic specification from data. In [9] the structure of the formula is learned using a heuristic algorithm, while in [1] a genetic algorithm is used. In both cases, the synthesis of the parameters is performed using the Gaussian Process Upper Confidence Bound (GP-UCB) [15] algorithm. GP-UCB provides a statistical emulation of the satisfaction probability (and not of the average robustness, as in this paper) of a formula for a given set of parameters. Both these methods require first learning a statistical model from the training set of trajectories.
Our contribution
The approach presented in this paper, instead, does not require learning a statistical model of the training set of trajectories. Furthermore, we introduce a number of techniques to improve the performance and to deal with the noise in the data, which is even more important here, in the absence of an underlying model.
First, instead of using the satisfaction probability as the evaluator of the best formula, we consider the quantitative semantics (or robustness) of STL, and in particular the average robustness introduced in [5]. The average robustness enables us to differentiate among STL classifiers that have similar discriminating performance w.r.t. the data by choosing the most robust one. This gives us more information than just the probability of satisfaction: for each trajectory, we can evaluate how close it is to the “boundary” of the classifier, instead of only checking whether it satisfies a formula or not. We then modify the discrimination function and the GP-UCB algorithm used in [9] and [1] to better deal with the noise of the data and to use the quantitative semantics to emulate the average robustness distribution w.r.t. the parameter space of the formula.
Second, we reduce the computational cost of the Evolutionary Algorithm (EA) with respect to [1] by using a lightweight configuration (i.e., a low threshold on the maximum number of iterations) of the GP-UCB optimization algorithm to estimate the parameters of the formulas at each generation. As a final result, the EA generates an STL formula tailored for classification purposes.
Paper structure
The rest of the paper is organized as follows: in the next section we present Signal Temporal Logic and its robust semantics. We then introduce the problem considered in Section III. In Section IV, we describe our approach. The results are presented in Section V. Finally, we conclude the paper in Section VI by discussing the implications of our contribution, from both the practical and the methodological point of view, and some possible future work.
II Signal Temporal Logic
STL
Signal Temporal Logic (STL) [14] is a formal specification language to express temporal properties over real-valued trajectories defined on a dense time domain. For the rest of the paper, let $x : \mathbb{T} \to \mathbb{R}^n$ be a trace/trajectory, where $\mathbb{T} \subseteq \mathbb{R}_{\geq 0}$ is the time domain, $x_i(t)$ is the value at time $t$ of the projection on the $i$-th coordinate and, with an abuse of notation, $x_i$ is also used to indicate the corresponding variable of the trace considered in the formulae.
Definition II.1 (STL syntax)
The syntax of STL is given by

$$\varphi := \top \;\mid\; \mu \;\mid\; \neg\varphi \;\mid\; \varphi_1 \wedge \varphi_2 \;\mid\; \varphi_1 \,\mathbf{U}_{[a,b]}\, \varphi_2$$

where $\top$ is the Boolean true constant, $\mu$ is an atomic proposition, i.e. an inequality of the form $x_i \geq k$ ($k \in \mathbb{R}$), negation $\neg$ and conjunction $\wedge$ are the standard Boolean connectives, and $\mathbf{U}_{[a,b]}$ is the Until temporal modality, where $[a,b]$ is a real positive interval. As customary, we can derive the disjunction operator $\vee$ and the future eventually $\mathbf{F}_{[a,b]}$ and always $\mathbf{G}_{[a,b]}$ operators from the until temporal modality: $\varphi_1 \vee \varphi_2 := \neg(\neg\varphi_1 \wedge \neg\varphi_2)$, $\mathbf{F}_{[a,b]}\varphi := \top\,\mathbf{U}_{[a,b]}\,\varphi$ and $\mathbf{G}_{[a,b]}\varphi := \neg\mathbf{F}_{[a,b]}\neg\varphi$.
STL can be interpreted over a trajectory using a qualitative (yes/no) or a quantitative (real-valued) semantics [14, 16]. We report here only the quantitative semantics and refer the reader to [17, 14, 16] for more details.
Definition II.2 (STL Quantitative Semantics)
The quantitative satisfaction function $\rho(\varphi, x, t)$ returns a value in $\mathbb{R} \cup \{-\infty, +\infty\}$ quantifying the robustness degree (or satisfaction degree) of the property $\varphi$ by the trajectory $x$ at time $t$. It is defined recursively as follows:

$$\begin{aligned}
\rho(\top, x, t) &= +\infty\\
\rho(x_i \geq k, x, t) &= x_i(t) - k\\
\rho(\neg\varphi, x, t) &= -\rho(\varphi, x, t)\\
\rho(\varphi_1 \wedge \varphi_2, x, t) &= \min\big(\rho(\varphi_1, x, t),\, \rho(\varphi_2, x, t)\big)\\
\rho(\varphi_1\,\mathbf{U}_{[a,b]}\,\varphi_2, x, t) &= \sup_{t' \in [t+a,\, t+b]} \min\Big(\rho(\varphi_2, x, t'),\; \inf_{t'' \in [t,\, t']} \rho(\varphi_1, x, t'')\Big)
\end{aligned}$$

Moreover, we let $\rho(\varphi, x) := \rho(\varphi, x, 0)$.
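The recursive definitions above can be sketched for discretely sampled traces as follows. This is a discrete-time simplification with unit sampling steps, for illustration only, not the monitoring implementation used in the paper.

```python
import math

# Robustness of the core STL operators on a discretely sampled trace
# (a list of floats, one sample per unit time step).

def rho_atom(x, t, k):
    """rho(x >= k, x, t) = x(t) - k."""
    return x[t] - k

def rho_not(r):
    return -r

def rho_and(r1, r2):
    return min(r1, r2)

def rho_until(x, t, a, b, rho1, rho2):
    """rho(phi1 U_[a,b] phi2, x, t); rho1/rho2 are robustness callbacks."""
    best = -math.inf
    for tp in range(t + a, min(t + b, len(x) - 1) + 1):
        r2 = rho2(x, tp)                                   # phi2 holds at tp
        r1 = min(rho1(x, tq) for tq in range(t, tp + 1))   # phi1 up to tp
        best = max(best, min(r1, r2))
    return best

# Derived operators: F_[a,b] phi = true U_[a,b] phi, and G = not F not.
def rho_eventually(x, t, a, b, rho):
    return rho_until(x, t, a, b, lambda x, tq: math.inf, rho)

def rho_globally(x, t, a, b, rho):
    return -rho_eventually(x, t, a, b, lambda x, tp: -rho(x, tp))
```

On the trace `[0, 1, 2, 3, 4]`, for instance, the eventually operator picks the best (maximum) margin in the window, while globally picks the worst.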
The sign of $\rho(\varphi, x)$ provides the link with the standard Boolean semantics of [14]: $x \models \varphi$ only if $\rho(\varphi, x) \geq 0$, while $x \not\models \varphi$ only if $\rho(\varphi, x) \leq 0$ (the case $\rho(\varphi, x) = 0$ is a borderline one, and the truth of $\varphi$ cannot be assessed from the robustness degree alone). The absolute value of $\rho(\varphi, x)$, instead, can be interpreted as a measure of the robustness of the satisfaction with respect to noise in the signal $x$, measured in terms of the induced perturbation in the secondary signals, i.e. the atomic predicates evaluated along $x$. This means that if $\rho(\varphi, x) = r$, then every signal $x'$ whose secondary signals are perturbed by less than $|r|$ satisfies $x' \models \varphi$ if and only if $x \models \varphi$ (correctness property).
PSTL
Parametric Signal Temporal Logic (PSTL) [7] is an extension of STL that parametrizes the formulas. There are two types of formula parameters: temporal parameters, corresponding to the time bounds of the time intervals associated with the temporal operators (e.g., $\mathbf{U}_{[t_1,t_2]}$, with $t_1, t_2 \in \mathbb{R}_{\geq 0}$ s.t. $t_1 < t_2$), and threshold parameters, corresponding to the constants of the inequality predicates (e.g., $x_i \geq k$, where $x_i$ is a variable of the trajectory). In this paper, we allow only atomic propositions of the form $x_i \geq k$ with $k \in \mathbb{R}$. Given an STL formula $\varphi$, let $\mathcal{K} = \mathcal{K}_T \times \mathcal{K}_C$ be the parameter space, where $\mathcal{K}_T$ is the temporal parameter space ($n_T$ being the number of temporal parameters) and $\mathcal{K}_C$ is the threshold parameter space ($n_C$ being the number of threshold parameters). A $k \in \mathcal{K}$ is a parameter configuration that induces a corresponding STL formula $\varphi_k$; e.g., for the template $\mathbf{F}_{[t_1,t_2]}(x \geq k)$, the configuration $(t_1, t_2, k) = (0, 2, 3)$ induces $\varphi_k = \mathbf{F}_{[0,2]}(x \geq 3)$.
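A PSTL template can be viewed as a function from a parameter configuration in $\mathcal{K} = \mathcal{K}_T \times \mathcal{K}_C$ to a concrete STL formula. A minimal sketch for the eventually template (the string rendering and function name are illustrative, not the paper's tooling):

```python
# Instantiate the parametric template F_[t1,t2](x >= k): a point
# (t1, t2, k) of the parameter space induces a concrete STL formula.

def instantiate_eventually(t1, t2, k):
    """Instantiate the template F_[t1,t2](x >= k)."""
    assert 0 <= t1 < t2, "temporal parameters must satisfy 0 <= t1 < t2"
    return f"F_[{t1},{t2}](x >= {k})"

phi = instantiate_eventually(0, 2, 3)   # one point of the parameter space
```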
Stochastic Robustness
Let us consider an unknown stochastic process $\{X(t)\}_{t \in \mathbb{T}}$, where each vector $X(t) \in \mathbb{R}^n$ corresponds to the state of the system at time $t$. For simplicity, we indicate the stochastic model with $\mathcal{M}$. $\mathcal{M}$ can also be seen as a random variable on the space of $\mathbb{R}^n$-valued càdlàg functions, here denoted by $\mathcal{D}$, assuming the time domain $\mathbb{T}$ to be fixed. It means that the set of trajectories of the stochastic process is the set $\mathcal{D}$. Let us now consider an STL formula $\varphi$, with predicates interpreted over the state variables of $\mathcal{M}$. Given a trajectory of the stochastic system, its robustness $\rho(\varphi, \cdot)$ can be seen as a functional from the trajectories in $\mathcal{D}$ to $\mathbb{R}$. It can be proved that $\rho(\varphi, \cdot)$ is measurable (with respect to the Borel $\sigma$-algebra of the standard topology of $\mathbb{R}$) [5]. From this, we can define the real-valued random variable $R_\varphi = \rho(\varphi, X)$ with probability distribution

$$P(R_\varphi \in I) = P\big(\{x \in \mathcal{D} \mid \rho(\varphi, x) \in I\}\big), \quad I \subseteq \mathbb{R} \text{ a Borel set.}$$
Applying this definition to a stochastic model, we obtain a distribution of robustness degrees, which can be summarised by the average robustness degree $E[R_\varphi \mid \mathcal{M}]$, i.e. the expectation of this distribution conditionally on the stochastic process $\mathcal{M}$, giving a measure of how strongly a formula is satisfied: the satisfaction is more robust when this value is higher. In this paper, we approximate this expectation by Monte Carlo sampling, i.e., using Statistical Model Checking.
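The Monte Carlo approximation of the average robustness amounts to sampling trajectories and averaging their robustness degrees. A sketch on a toy stochastic model and a simple globally formula (both chosen for illustration only):

```python
import random
import statistics

# Statistical-model-checking estimate of E[R_phi | M]: sample trajectories
# from a toy stochastic model M and average the robustness of each sample.

random.seed(1)

def sample_trajectory(n=50):
    """Toy stochastic model: a random walk started at 1.0."""
    x, traj = 1.0, []
    for _ in range(n):
        x += random.uniform(-0.1, 0.1)
        traj.append(x)
    return traj

def rho_globally_nonneg(traj):
    """Robustness of G(x >= 0) on a sampled trace: the minimum margin."""
    return min(traj)

samples = [rho_globally_nonneg(sample_trajectory()) for _ in range(500)]
avg_rob = statistics.mean(samples)   # estimate of E[R_phi | M]
std_rob = statistics.stdev(samples)  # spread of the robustness distribution
```

Here a positive `avg_rob` indicates that the model robustly satisfies the property on average, with `std_rob` quantifying the variability across runs.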
III Problem Formulation
In this paper, we focus our attention on learning the best property (or set of properties) that discriminates trajectories belonging to two different classes, say good and bad, starting from a labeled dataset of observed trajectories. Essentially, we want to tackle a supervised two-class classification problem over trajectories by learning a temporal logic discriminant describing the temporal patterns that best separate the two sets of observed trajectories.
The idea behind this approach is that there exists an unknown procedure that, given a temporal trajectory, is able to decide if the signal is good or bad. This procedure can correspond to many different things, e.g., to the reason of failure of a sensor that breaks when it receives certain inputs. Our task is to approximate this unknown procedure with an STL monitoring algorithm. In general, as there may not be an STL formula that perfectly explains/mimics the unknown procedure, our task is to find the most effective one.
The approach we present here works directly with observed data and avoids the reconstruction of an intermediate generative model of trajectories conditioned on their class, as in [1, 9]. The reason is that such models can be hard to construct, even if they provide a powerful regularization, as they enable the generation of an arbitrary number of samples to train the logic classifier.
In a purely data-driven setting, to build an effective classifier, we need to deal with the fact that training data is available in limited amounts and is typically noisy. This translates into the need for methods that guarantee good generalization performance and avoid overfitting. In our context, overfitting can result in overly complex formulae, de facto encoding the training set itself rather than capturing the underlying patterns that separate the trajectories. This can be faced by using a score function based on the robustness of temporal properties, combined with suitably designed regularizing terms.
We want to stress that the approach we present here, due to the use of the average robustness of STL properties, can be easily tailored to different problems, like finding the property that best characterises a single set of observations.
Iv Methodology
Learning an STL formula can be separated into two main optimization problems: learning the formula structure and synthesizing the formula parameters. The structural learning is treated as a discrete optimization problem using an Evolutionary Algorithm (EA); the parameter learning, instead, considers a continuous parameter space and exploits an active learning algorithm called the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm. The two techniques are not used separately but are combined in a compositional way. Each iteration of the EA produces a set (generation) of formulas. Then, for each formula of this generation, the GP-UCB algorithm finds the best parameter configuration, i.e. the one for which the formula best discriminates between the two data sets. To apply this second optimization we need to define a score function to optimize, encoding the criterion to discriminate between the two data sets. We describe below, in order: the discrimination function, the learning of the parameters of a formula, and the learning of the formula structure.
Discrimination Function
We have two data sets, $D_g$ of good trajectories and $D_b$ of bad ones, and we search for the formula that best separates these two classes. We define a function able to discriminate between these two data sets, such that maximising this discrimination function corresponds to finding the best formula classifier. In this paper, we use the quantitative semantics of STL to evaluate the satisfaction of each formula. As described in Section II, this semantics returns a real value of satisfaction instead of only a yes/no answer.
Given a dataset $D$, we assume that the data comes from an unknown stochastic process $\mathcal{M}$. The process in this case is like a black box for which we observe only a subset of trajectories, the dataset $D$. We can then evaluate the average robustness $E[R_\varphi \mid D]$ and its variance $\sigma^2[R_\varphi \mid D]$, averaging over $D$.
To discriminate between the two data sets $D_g$ (good) and $D_b$ (bad), we search for the formula $\varphi$ that maximizes the difference between the average robustness on $D_g$ and the average robustness on $D_b$, divided by the sum of the respective standard deviations:

$$G(\varphi) = \frac{E[R_\varphi \mid D_g] - E[R_\varphi \mid D_b]}{\sigma[R_\varphi \mid D_g] + \sigma[R_\varphi \mid D_b]} \qquad (1)$$
This score is proportional to the probability that a new trajectory, sampled from the same distribution generating $D_g$ or $D_b$, will belong to one set and not to the other. In fact, a higher value of $G(\varphi)$ implies that the two average robustness values are sufficiently far apart, compared to their length scale, given by the standard deviations.
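Given the robustness samples of a candidate formula on the two datasets, the discrimination score is a few lines of code. A sketch, where the input lists are hypothetical robustness values, one per trajectory:

```python
import statistics

# Discrimination function (1): the gap between the average robustness on the
# good and bad datasets, normalised by the sum of their standard deviations.

def discrimination(rob_good, rob_bad):
    num = statistics.mean(rob_good) - statistics.mean(rob_bad)
    den = statistics.stdev(rob_good) + statistics.stdev(rob_bad)
    return num / den

# A formula whose robustness is high on good traces and low on bad ones
# gets a large score; overlapping robustness distributions score near zero.
well_separated = discrimination([2.0, 2.1, 1.9], [-1.0, -1.2, -0.8])
overlapping = discrimination([0.1, -0.1, 0.05], [0.05, -0.05, 0.1])
```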
As said above, we can evaluate only a statistical approximation of $G(\varphi)$, because we work with finitely many noisy evaluations of the unknown quantities $E[R_\varphi \mid D]$ and $\sigma[R_\varphi \mid D]$. We will see in the next paragraph how we tackle this problem.
Learning the Parameters of a Formula
Given a formula $\varphi$ and a parameter space $\mathcal{K}$, we want to find the parameter configuration $k^* \in \mathcal{K}$ that maximises the score function $G(\varphi_k)$, considering that we have only noisy and costly evaluations of it. The question is therefore how to best estimate (and optimize) an unknown function from observations of its value at a finite set of input points. This is a classic non-linear, non-convex optimization problem that we tackle by means of the GP-UCB algorithm [15]. This algorithm interpolates the noisy observations with a stochastic process (a procedure called emulation in statistics) and optimizes the emulated function, using the uncertainty of the fit to determine the regions where the true maximum can lie.
More specifically, GP-UCB bases its emulation phase on Gaussian Processes, a Bayesian non-parametric regression approach [18], adding candidate maximum points to the training set of the GP in an iterative fashion, and terminating when no improvement is possible (see [15] for more details).
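The GP-UCB loop can be sketched with a hand-rolled Gaussian process emulator (RBF kernel with fixed hyperparameters). The 1-D score function, the kernel length scale, and the exploration coefficient beta = 2 are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Minimal GP-UCB loop on a 1-D parameter space: emulate the noisy score with
# a Gaussian process, then query the point maximising mean + beta * std.

rng = np.random.default_rng(0)

def noisy_score(k):
    """Hypothetical score, peaked at k = 0.7, observed with small noise."""
    return -(k - 0.7) ** 2 + rng.normal(scale=0.01)

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls ** 2))

grid = np.linspace(0.0, 1.0, 101)           # candidate parameter values
X = np.array([0.0, 1.0])                    # initial design points
y = np.array([noisy_score(x) for x in X])

for _ in range(15):
    K = rbf(X, X) + 1e-4 * np.eye(len(X))   # kernel matrix + noise term
    Ks = rbf(grid, X)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha                       # GP posterior mean on the grid
    var = 1.0 - np.einsum("ij,ij->i", Ks @ np.linalg.inv(K), Ks)
    ucb = mean + 2.0 * np.sqrt(np.maximum(var, 0.0))   # beta = 2.0
    k_next = grid[int(np.argmax(ucb))]      # most promising next query
    X = np.append(X, k_next)
    y = np.append(y, noisy_score(k_next))

best_k = X[int(np.argmax(y))]
```

The exploration bonus `2.0 * std` makes the loop sample unexplored regions first and then concentrate near the emerging maximum.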
After this optimization, we have found a formula that separates the two data sets, not from the point of view of the satisfaction (yes/no) of the formula, but from the point of view of the robustness value. In other words, there is a threshold value $\tau$ such that $E[R_\varphi \mid D_g] > \tau$ and $E[R_\varphi \mid D_b] < \tau$. Now, we consider the new STL formula $\varphi_\tau$ obtained by translating the atomic predicates of $\varphi$ by $\tau$ (e.g., $x_i \geq k$ becomes $x_i \geq k + \tau$). Taking into account that the quantitative robustness is obtained by combining $\min$, $\max$, $\inf$ and $\sup$, which are linear algebraic operators w.r.t. translations (e.g., $\min(x + \tau, y + \tau) = \min(x, y) + \tau$), we easily obtain that $E[R_{\varphi_\tau} \mid D_g] = E[R_\varphi \mid D_g] - \tau > 0$ and $E[R_{\varphi_\tau} \mid D_b] = E[R_\varphi \mid D_b] - \tau < 0$. Therefore, $\varphi_\tau$ divides the two data sets also from the point of view of the satisfaction.
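The translation argument can be checked numerically on a toy case: shifting the predicate threshold by tau shifts the robustness by exactly -tau, because the min/max combinators preserve the shift. A sketch with a discrete-time globally formula (not the paper's monitor):

```python
# For phi = G(x >= k), translating the predicate to x >= k + tau shifts the
# robustness of every trace by -tau: rho(phi_tau, x) = rho(phi, x) - tau.

def rho_globally(trace, k):
    """Robustness of G(x >= k): the worst (minimum) margin along the trace."""
    return min(v - k for v in trace)

trace = [3.0, 2.5, 4.0]
tau = 0.7
shifted = rho_globally(trace, 1.0 + tau)    # predicate translated by tau
assert shifted == rho_globally(trace, 1.0) - tau
```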
Learning the Structure of the Formula
To learn the formula structure we exploit a modified version of the Evolutionary Algorithm (EA) presented in [1]. EAs are search and optimization algorithms inspired by natural selection in genetics. Our implementation, called RObustness GEnetic (ROGE), is described in Algorithm 1. The algorithm takes as input the data sets $D_g$ (good) and $D_b$ (bad), the parameter space, the size of the initial set of formulae and the number of iterations. It starts by generating an initial set of STL formulae, called a generation (line 2). At each iteration, the EA considers the current generation, optimizes the parameters of each formula with respect to the discrimination function (line 4), and then evaluates a fitness function to sample a subset of formulae (line 5). The fitness combines the discrimination value with a term that penalizes formulae with large sizes. From this subset, a new temporary generation is created by means of two genetic operators (line 6): the recombination operator, which generates a new formula by combining two previously selected formulas, and the mutation operator, which makes small changes in the nodes of the selected formula to introduce innovation in the population. The temporary generation and the old generation are then merged together, and the new generation is sampled from this set according to the fitness function (line 7). The algorithm returns the last generation of STL formulae; the best formula is the one with the highest value of the discrimination function.
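The control flow of Algorithm 1 can be sketched as follows, with the formula space mocked by integer "sizes" and a toy score, so that only the EA skeleton (parameter step, fitness-based selection, genetic operators, merge) is visible. All names, the toy score, and the stub parameter step are illustrative, not the paper's implementation:

```python
import random

# Schematic ROGE loop: generation -> GP-UCB parameter step (stubbed) ->
# fitness-based selection -> mutation/recombination -> merge and resample.

random.seed(42)

def optimize_params(formula):
    return formula                       # stand-in for the GP-UCB step

def discrimination(formula):
    return -abs(formula - 3)             # toy score: best "formula" is 3

def fitness(formula):
    return discrimination(formula) - 0.1 * formula   # toy size penalty

def mutate(formula):
    return max(1, formula + random.choice([-1, 1]))  # small local change

def recombine(f1, f2):
    return max(1, (f1 + f2) // 2)        # combine two parents

def sample_by_fitness(population, k):
    return sorted(population, key=fitness, reverse=True)[:k]

def roge(n_init=8, n_iter=10):
    gen = [random.randint(1, 10) for _ in range(n_init)]   # initial generation
    for _ in range(n_iter):
        gen = [optimize_params(f) for f in gen]            # line 4
        parents = sample_by_fitness(gen, 4)                # line 5
        offspring = [mutate(f) for f in parents]           # line 6
        offspring.append(recombine(parents[0], parents[1]))
        gen = sample_by_fitness(gen + offspring, n_init)   # line 7: merge
    return max(gen, key=discrimination)

best = roge()
```

In the real algorithm, the integers are STL syntax trees, `optimize_params` runs a lightweight GP-UCB, and the genetic operators act on formula nodes.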
Regularization
In the evolutionary algorithm, we use two strategies to penalize complex formulas and bias the algorithm towards simple ones. The first strategy is to use a size penalty in the fitness function. The penalty coefficient is heuristically set such that formulae of size 5 get a 50% penalty, while its scale is adaptively computed as the average discrimination in the current generation. An alternative choice of the penalty coefficient can be made by cross-validation. The second strategy consists in selecting the initial population biased towards simple formulae. In particular, this set is constructed from (a) a subset of random formulae and (b) a subset of logically non-equivalent temporal properties of size 1, of the form $\mathbf{F}_{[a,b]}\mu$, $\mathbf{G}_{[a,b]}\mu$ and $\mu_1 \mathbf{U}_{[a,b]} \mu_2$, where the atomic predicates are sampled from the set of all atomic predicates (with threshold zero) or simple boolean combinations of them.
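One functional form consistent with the calibration above ("formulae of size 5 get a 50% penalty") is an exponential decay with rate ln(2)/5; the exact form used in the paper is not reproduced here, so this multiplicative shape is an assumption:

```python
import math

# Hypothetical size penalty: exponential decay calibrated so that a formula
# of size 5 keeps exactly 50% of its score (an assumed functional form).

LAMBDA = math.log(2) / 5

def size_penalty(size):
    """Multiplicative penalty factor in (0, 1]; halves every 5 size units."""
    return math.exp(-LAMBDA * size)
```

With this choice, a formula of size 10 would keep only 25% of its score, so doubling the size of an already-large formula must buy a substantial gain in discrimination to be worthwhile.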
V Maritime Surveillance Case Study
We consider the maritime surveillance case study presented in [2] to compare our framework with their Decision Tree (DTL4STL) approach. The experiments with the DTL4STL approach were implemented in Matlab with the code available at [19]. We also test our previous implementation presented in [1] on the same benchmark. Both the new and the previous learning procedures were implemented in Java (JDK 1.8.0) and run on a Dell XPS, Windows 10 Pro, Intel Core i7-7700HQ 2.8 GHz, 8 GB 1600 MHz memory.
The synthetic data set of naval surveillance reported in [2] consists of 2-dimensional coordinate traces of vessel behaviours. It considers two kinds of anomalous trajectories and regular trajectories, as illustrated in Fig. 1. The dataset contains 2000 trajectories in total (1000 normal and 1000 anomalous), with 61 sample points per trace. We fix the training set as 80% of the entire dataset and the validation set as the remaining 20% of the traces.
We run the experiments using a 10-fold cross-validation in order to collect the mean and variance of the misclassified trajectories of the validation set. Our previous implementation [1] performs so poorly on the chosen benchmark that it is not meaningful to report it: its misclassification rate on the validation set is around 50%.
The other performances obtained are presented in Table I, together with the average and variance of the execution time. We also report in Table I the DTL4STL performance declared in [2], which we were not able to reproduce in our setting.
Table I: Misclassification rate and computation time of ROGE, of DTL4STL as run in our setting, and of DTL4STL as reported in [2].
An example of a formula that we learn with ROGE is
(2) 
DTL4STL, instead, does not consider the until operator in its set of primitives (see [2] for more details); the formula found by the tool on the same dataset is the following:
The formula generated using ROGE is simpler than the formula generated using DTL4STL and is indeed easier to understand. More specifically, formula (2) identifies all the regular trajectories, remarkably resulting in a misclassification error equal to zero, as reported in Table I. The red anomalous trajectories falsify the first predicate without in the meantime verifying the second one; on the contrary, the blue anomalous trajectories globally satisfy the first predicate but never verify the second one (consider that all the vessels start from the top right part of the graphic in Figure 1). Both these conditions result in the falsification of formula (2), which on the contrary is satisfied by all the regular trajectories. In Figure 1, we report the corresponding thresholds. In terms of accuracy, our approach is comparable with the performance of the DTL4STL approach shown in [2]. Similarly to other EA-based approaches, ROGE needs some of its parameters to be configured (such as the mutation and recombination probabilities, the fitness function and the initial population), meaning that its performance can be influenced by the specific setting.
VI Conclusion
We present a framework to learn, from a labeled dataset of regular and anomalous trajectories, the Signal Temporal Logic (STL) specification that best discriminates the two subsets. In particular, we design a Robust Genetic algorithm (ROGE) that combines an Evolutionary Algorithm (EA) for learning the structure of the formula with the Gaussian Process Upper Confidence Bound algorithm (GP-UCB) for the synthesis of the formula parameters. We compare ROGE with our previous work [1] and with the Decision Tree approach presented in [2] on an anomalous trajectory detection problem of a naval surveillance system.
With respect to our previous work [9, 1], we avoid the reconstruction of a generative statistical model for the dataset and present a procedure that works well directly on the data. Furthermore, to improve the quality of the learning procedure, we modify both the structure and parameter optimization algorithms. Concerning the learning of the structure, we modify the EA, drastically improving its performance; in particular, we change the policy for choosing the initial formula generation and simplify the genetic operators of mutation and recombination. Concerning the learning of the parameters, we leverage the average robustness (using the STL quantitative semantics) to discriminate also between two trajectories satisfying the same STL specification but with two distinct robustness values that can be separated by a threshold. This provides more information that can be exploited in the optimization process.
We observe that the application of our previous approach [1] directly on the dataset of the naval surveillance system considered here performs very poorly. We also compare our method with the Decision Tree (DTL4STL) approach of [2], showing that we achieve a comparable accuracy while producing smaller and easier-to-understand STL specifications. Furthermore, we do not restrict the class of temporal formulae to only eventually and globally, and we do not impose a single level of temporal nesting. On the other hand, the genetic algorithm can get completely wrong results if the initial formula generation is chosen completely at random. Note that our choice of initial formulae is a way to bias the search towards simple properties, i.e. it is a form of regularization, and resembles the choice of the set of primitives in the DTL4STL approach.
As future work we aim to improve the genetic algorithm by avoiding the optimisation of similar formulas. We plan to exploit the Bagging (Bootstrap aggregating) technique to face the overfitting problem and guarantee good generalization performance. The idea is to divide our training set into a number of subsets and apply the genetic algorithm to each subset. This will decrease the effect of possible biases in the initial distribution of the data. Finally, we also plan to test our procedure on more challenging case studies, particularly in the presence of strong noise in the data sets, where we believe our method can exhibit good results.
Acknowledgment
E.B. and L.N. acknowledge the partial support of the Austrian National Research Network S 11405N23 (RiSE/SHiNE) of the Austrian Science Fund (FWF). E.B., L.N. and S.S. acknowledge the partial support of the ICT COST Action IC1402 (ARVI).
References
 [1] S. Bufo, E. Bartocci, G. Sanguinetti, M. Borelli, U. Lucangelo, and L. Bortolussi, “Temporal logic based monitoring of assisted ventilation in intensive care patients,” in Proc. of ISoLA, 2014, pp. 391–403.
 [2] G. Bombara, C.-I. Vasile, F. Penedo, H. Yasuoka, and C. Belta, “A Decision Tree Approach to Data Classification Using Signal Temporal Logic,” in Proc. of HSCC, 2016, pp. 1–10.
 [3] A. Zutshi, S. Sankaranarayanan, J. V. Deshmukh, J. Kapinski, and X. Jin, “Falsification of safety properties for closed loop control systems,” in Proc. of HSCC, 2015, pp. 299–300.
 [4] S. Sankaranarayanan, S. A. Kumar, F. Cameron, B. W. Bequette, G. E. Fainekos, and D. M. Maahs, “Model-based falsification of an artificial pancreas control system,” SIGBED Review, vol. 14, no. 2, pp. 24–33, 2017.
 [5] E. Bartocci, L. Bortolussi, L. Nenzi, and G. Sanguinetti, “System design of stochastic models using robustness of temporal properties,” Theor. Comput. Sci., vol. 587, pp. 3–25, 2015.
 [6] S. Silvetti, A. Policriti, and L. Bortolussi, “An active learning approach to the falsification of black box cyber-physical systems,” in Proc. of IFM, 2017, pp. 3–17.
 [7] E. Asarin, A. Donzé, O. Maler, and D. Nickovic, “Parametric identification of temporal properties,” in Proc. of RV, 2012, pp. 147–160.
 [8] H. Yang, B. Hoxha, and G. E. Fainekos, “Querying parametric temporal logic properties on embedded systems,” in Proc. of ICTSS, 2012, pp. 136–151.
 [9] E. Bartocci, L. Bortolussi, and G. Sanguinetti, “Data-driven statistical learning of temporal logic properties,” in Proc. of FORMATS, 2014, pp. 23–37.
 [10] X. Jin, A. Donzé, J. V. Deshmukh, and S. A. Seshia, “Mining requirements from closed-loop control models,” IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 34, no. 11, pp. 1704–1717, 2015.
 [11] L. V. Nguyen, J. Kapinski, X. Jin, J. V. Deshmukh, K. Butts, and T. T. Johnson, “Abnormal Data Classification Using Time-Frequency Temporal Logic,” in Proc. of HSCC, 2017, pp. 237–242.
 [12] J. Zhou, R. Ramanathan, W. Wong, and P. S. Thiagarajan, “Automated property synthesis of odes based biopathways models,” in Proc. of CMSB, 2017, pp. 265–282.
 [13] Z. Kong, A. Jones, and C. Belta, “Temporal Logics for Learning and Detection of Anomalous Behavior,” IEEE Transactions on Automatic Control, vol. 62, no. 3, pp. 1210–1222, Mar. 2017.
 [14] O. Maler and D. Nickovic, “Monitoring temporal properties of continuous signals,” in Proc. of FORMATS, ser. LNCS, vol. 3253, 2004, pp. 152–166.
 [15] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger, “Information-theoretic regret bounds for Gaussian process optimization in the bandit setting,” IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3250–3265, 2012.
 [16] A. Donzé, T. Ferrère, and O. Maler, “Efficient robust monitoring for STL,” in Proc. of CAV, 2013, pp. 264–279.
 [17] O. Maler and D. Nickovic, “Monitoring properties of analog and mixedsignal circuits,” STTT, vol. 15, no. 3, pp. 247–268, 2013.
 [18] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.
 [19] “DTL4STL,” http://sites.bu.edu/hyness/dtl4stl/, 2016.