Observation-Enhanced QoS Analysis of Component-Based Systems

Colin Paterson, Radu Calinescu

C. Paterson and R. Calinescu are with the Department of Computer Science at the University of York, UK.

Abstract

We present a new method for the accurate analysis of the quality-of-service (QoS) properties of component-based systems. Our method takes as input a QoS property of interest and a high-level continuous-time Markov chain (CTMC) model of the analysed system, and refines this CTMC based on observations of the execution times of the system components. The refined CTMC can then be analysed with existing probabilistic model checkers to accurately predict the value of the QoS property. The paper describes the theoretical foundation underlying this model refinement, the tool we developed to automate it, and two case studies that apply our QoS analysis method to a service-based system implemented using public web services and to an IT support system at a large university, respectively. Our experiments show that traditional CTMC-based QoS analysis can produce highly inaccurate results and may lead to invalid engineering and business decisions. In contrast, our new method reduced QoS analysis errors by 84.4–89.6% for the service-based system and by 94.7–97% for the IT support system, significantly lowering the risk of such invalid decisions.

Quality of service, component-based systems, Markov models, probabilistic model checking.

1 Introduction

Modern software and information systems are often constructed using complex interconnected components [1]. The performance, cost, resource use and other quality-of-service (QoS) properties of these systems underpin important engineering and business decisions. As such, the QoS analysis of component-based systems has been the subject of intense research [2, 3, 4, 5]. The solutions devised by this research can analyse a broad range of QoS properties by using performance models such as Petri Nets [6, 7], layered queuing networks [8], Markov chains [9, 10] and timed automata [11], together with tools for their simulation (e.g. Palladio [12] and GreatSPN [13]) and formal verification (e.g. PRISM [14] and UPPAAL [15]).

These advances enable the effective analysis of many types of performance models. However, they cannot support the design and verification of real systems unless the analysed models are accurate representations of the system behaviour, and ensuring the accuracy of performance models remains a major challenge. Our paper addresses this challenge for continuous-time Markov chains (CTMCs), a type of stochastic state transition model used for QoS analysis at both design time [10, 16, 17] and runtime [18, 19]. To this end, we present a tool-supported method for Observation-based Markov chaiN refInement (OMNI) and accurate QoS analysis of component-based systems.

The OMNI method comprises the five activities shown in Fig. 1. The key characteristic of OMNI is its use of observed execution times for the components of the analysed system to refine a high-level abstract CTMC whose states correspond to the operations executed by these components. As such, the first OMNI activity is the collection of these execution time observations, which can come from unit testing the components prior to system integration, from logs of other systems that use the same components, or from the log of the analysed system. The second OMNI activity involves the development of a high-level CTMC model of the system under analysis. This model can be generated from more general software models such as annotated UML activity diagrams as in [16, 20], or can be provided by the system developers. The next OMNI activity requires the formalisation of the QoS properties of interest as continuous stochastic logic formulae.

Fig. 1: OMNI workflow for the QoS analysis of component-based systems

The fourth activity of our OMNI method is the refinement of the high-level model. OMNI avoids the synthesis of unnecessarily large and inefficient-to-analyse models by generating a different refined CTMC for each QoS property of interest. This generation of property-specific CTMCs is fully automated and comprises two steps. The first step, called component classification, determines the effect of every system component on the analysed QoS property. The second step, called selective refinement, produces the property-specific CTMC by using phase-type distributions [21] to refine only those parts of the high-level CTMC that correspond to components which influence the QoS property being analysed. As such, OMNI-refined CTMCs model component executions with much greater accuracy than traditional CTMC modelling, whose exponential distributions match only the first moment of the unknown distributions of the observed execution times.

In the last activity of our method, the refined CTMC models generated by OMNI are analysed with the established probabilistic model checker PRISM [14]. As illustrated by the two case studies presented in the paper, these models support the accurate and efficient analysis of a broad spectrum of QoS properties specified in continuous stochastic logic [22]. As such, OMNI’s observation-enhanced QoS analysis can prevent many invalid engineering and business decisions associated with traditional CTMC-based QoS analysis.

The OMNI activities 2, 3 and 5 correspond to the traditional method for QoS property analysis through probabilistic model checking. Detailed descriptions of these activities are available (e.g., in [23, 24, 25, 26, 27]) and therefore we do not focus on them in this paper. Activities 1 and 4 are specific to OMNI. However, the tasks from activity 1 are standard software engineering practices, so the focus of our paper is on the observation-based refinement techniques used in the fourth activity of the OMNI workflow.

Like most methods for software performance engineering [28, 29, 30], OMNI supports both the design of new systems and the verification of existing systems. Using OMNI to assess whether a system under design meets its QoS requirements or to decide a feasible service-level agreement for a system being developed requires the collection of component observations by unit testing the intended system components, or by monitoring other systems that use these components. In contrast, for the verification of the QoS properties of an existing system, component observations can be collected using any of the techniques listed under the first activity from Fig. 1, or a combination thereof.

A preliminary version of OMNI that did not include the component classification step was introduced in [31]. This paper extends the theoretical foundation from [31] with key results that enable component classification, and therefore the synthesis of much smaller and faster to analyse refined CTMCs than those generated by our preliminary OMNI version. This extension is presented in Section 4.2, implemented by our new OMNI tool described in Section 5, and shown to reduce verification times by 54–74% (compared to the preliminary OMNI version) in Section 6.5. This is a particularly significant improvement because the same QoS property is often verified many times, to identify suitable values for the parameters of the modelled system (e.g., see the case studies from [32, 33, 34, 35, 36]). Additionally, we considerably extended and improved the validation of OMNI by evaluating it for the following two systems:

  • A service-based system that we implemented using six real-world web services — two commercial web services provided by Thales Group, three free Bing web services provided by Microsoft, and a free WebserviceX.Net web service. The evaluation of OMNI for this system was based on lab experiments.

  • The IT support system at the Federal Institute of Education, Science and Technology of Rio Grande do Norte (IFRN), Brazil. This system has over 44,000 users — students and IFRN employees (including IT support staff). The evaluation of OMNI for this system was based on real datasets obtained from the system logs.

The rest of the paper is structured as follows. Section 2 introduces the notation, terminology and theoretical background for our work. Section 3 describes the service-based system used to evaluate OMNI, as well as to motivate and illustrate our QoS analysis method throughout the paper. The assumptions and theoretical results underlying the component classification and selective refinement steps of OMNI are presented in Section 4, and the tool that automates their application is described in Section 5. Section 6 evaluates the effectiveness of OMNI for the two systems mentioned above. This evaluation shows that, compared to traditional CTMC-based QoS analysis, our method (a) reduces analysis errors by 84.4–89.6% for the service-based system and by 94.7–97% for the IT support system; and (b) lowers the risk of invalid engineering and business decisions. The experimental results also show a decrease of up to 71.4% in QoS analysis time compared to our preliminary OMNI version from [31]. Section 7 discusses the threats to the validity of our results. The paper concludes with an overview of related work in Section 8 and a brief summary in Section 9.

2 Preliminaries

2.1 Continuous-time Markov chains

Continuous-time Markov chains [37] are mathematical models for continuous-time stochastic processes over countable state spaces. To support the presentation of OMNI, we will use the following formal definition adapted from [32, 38].

Definition 2.1.

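A continuous-time Markov chain (CTMC) is a tuple

$$\mathcal{M} = (S, \pi, R), \tag{1}$$

where $S$ is a finite set of states, $\pi : S \to [0,1]$ is an initial-state probability vector such that the probability that the CTMC is initially in state $s_i \in S$ is given by $\pi(s_i)$ and $\sum_{s_i \in S} \pi(s_i) = 1$, and $R : S \times S \to \mathbb{R}$ is a transition rate matrix such that, for any states $s_i \neq s_j$ from $S$, $R(s_i, s_j) \geq 0$ specifies the rate with which the CTMC transitions from state $s_i$ to state $s_j$, and $R(s_i, s_i) = -\sum_{s_j \in S \setminus \{s_i\}} R(s_i, s_j)$.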

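We will use the notation $\mathrm{CTMC}(S, \pi, R)$ for the continuous-time Markov chain $\mathcal{M}$ from (1). The probability that this CTMC will transition from state $s_i$ to another state within $t$ time units is

$$1 - e^{-t \sum_{s_k \in S \setminus \{s_i\}} R(s_i, s_k)}$$

and the probability that the new state is $s_j \in S \setminus \{s_i\}$ is

$$p_{ij} = \frac{R(s_i, s_j)}{\sum_{s_k \in S \setminus \{s_i\}} R(s_i, s_k)}. \tag{2}$$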

A state $s$ is an absorbing state if $R(s, s') = 0$ for all $s' \in S$, and a transient state otherwise.
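The quantities defined above can be illustrated with a minimal Python sketch. It assumes a CTMC stored as a nested dictionary of off-diagonal rates (the diagonal entries are implied by the row-sum-to-zero convention), and the three-state chain and its rates are hypothetical values chosen only for illustration.

```python
import math

def exit_rate(R, s):
    """Total rate of leaving state s: the sum of its off-diagonal rates."""
    return sum(rate for t, rate in R[s].items() if t != s)

def prob_leave_within(R, s, t):
    """Probability that the CTMC leaves state s within t time units."""
    return 1.0 - math.exp(-exit_rate(R, s) * t)

def jump_probability(R, s, target):
    """Probability that the state entered after leaving s is `target`."""
    total = exit_rate(R, s)
    if total == 0.0:
        raise ValueError(f"state {s!r} is absorbing")
    return R[s].get(target, 0.0) / total

def is_absorbing(R, s):
    """A state is absorbing when all its outgoing rates are zero."""
    return exit_rate(R, s) == 0.0

# Hypothetical three-state chain (rates are illustrative assumptions).
R = {"a": {"b": 2.0, "c": 6.0}, "b": {"c": 1.0}, "c": {}}

print(jump_probability(R, "a", "b"))   # share of the exit rate of "a" going to "b"
print(prob_leave_within(R, "a", 0.5))  # exponential holding time with rate 8
print(is_absorbing(R, "c"))
```

Storing only the off-diagonal rates is a design choice of this sketch; a full rate matrix would carry the negated exit rate on its diagonal.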

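The properties of a CTMC $\mathcal{M}$ are analysed over its set of finite and infinite paths $\mathit{Paths}^{\mathcal{M}}$. A finite path is a sequence $\omega = s_1 t_1 s_2 t_2 \ldots s_{k-1} t_{k-1} s_k$, where $s_1, s_2, \ldots, s_k \in S$, $\pi(s_1) > 0$, $s_k$ is an absorbing state, and, for all $i = 1, 2, \ldots, k-1$, $R(s_i, s_{i+1}) > 0$ and $t_i > 0$ is the time spent in state $s_i$. An infinite path from $\mathit{Paths}^{\mathcal{M}}$ is an infinite sequence $\omega = s_1 t_1 s_2 t_2 \ldots$ where $\pi(s_1) > 0$, and, for all $i \geq 1$, $R(s_i, s_{i+1}) > 0$, and the time spent in state $s_i$ is $t_i > 0$. For any path $\omega$, the state occupied by the path at time $t \geq 0$ is denoted $\omega@t$. For infinite paths, $\omega@t = s_i$, where $i$ is the smallest index for which $t \leq \sum_{j=1}^{i} t_j$. For finite paths, $\omega@t$ is defined similarly if $t \leq \sum_{j=1}^{k-1} t_j$, and $\omega@t = s_k$ otherwise. Finally, the $i$-th state on the path $\omega$ is denoted $\omega[i]$, where $i \geq 1$ for infinite paths and $i \in \{1, 2, \ldots, k\}$ for finite paths.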

Continuous-time Markov chains are widely used for the modelling and analysis of stochastic systems and processes from domains as diverse as engineering, biology and economics [39, 40]. In this paper, we focus on the use of CTMCs for the modelling and QoS analysis of component-based software and IT systems. These systems are increasingly important for numerous practical applications, and advanced probabilistic model checkers such as PRISM [14], MRMC [41] and Storm [42] are available for the efficient analysis of their CTMC models.

2.2 Continuous stochastic logic

CTMCs support the analysis of QoS properties expressed in continuous stochastic logic (CSL) [22], which is a temporal logic with the syntax defined below.

Definition 2.2.

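Let $AP$ be a set of atomic propositions, $a \in AP$, $p \in [0,1]$, $I$ an interval in $\mathbb{R}_{\geq 0}$, and $\bowtie \in \{\geq, >, <, \leq\}$. Then a state formula $\Phi$ and a path formula $\Psi$ in continuous stochastic logic are defined by the following grammar:

$$\begin{array}{l} \Phi ::= \mathit{true} \mid a \mid \Phi \wedge \Phi \mid \neg\Phi \mid P_{\bowtie p}[\Psi] \mid S_{\bowtie p}[\Phi] \\ \Psi ::= X\,\Phi \mid \Phi \; U^{I} \, \Phi \end{array} \tag{3}$$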

CSL formulae are interpreted over a CTMC whose states are labelled with atomic propositions from $AP$ by a function $L : S \to 2^{AP}$. The (transient-state) probabilistic operator $P$ and the steady-state operator $S$ define bounds on the probability of system evolution. Next path formulae $X\,\Phi$ and until path formulae $\Phi_1 \, U^{I} \, \Phi_2$ can occur only inside the probabilistic operator $P$.

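The semantics of CSL is defined with a satisfaction relation $\models$ over the states and the paths of a CTMC [38]. OMNI improves the analysis of QoS properties expressed in the transient fragment of CSL (see footnote 1), with semantics defined recursively by:

$$\begin{array}{lll} s \models \mathit{true} & & \forall s \in S \\ s \models a & \text{iff} & a \in L(s) \\ s \models \neg\Phi & \text{iff} & \neg (s \models \Phi) \\ s \models \Phi_1 \wedge \Phi_2 & \text{iff} & s \models \Phi_1 \wedge s \models \Phi_2 \\ s \models P_{\bowtie p}[X\,\Phi] & \text{iff} & Pr_s\{\omega \in \mathit{Paths}^{\mathcal{M}} \mid \omega[2] \models \Phi\} \bowtie p \\ s \models P_{\bowtie p}[\Phi_1 \, U^{I} \, \Phi_2] & \text{iff} & Pr_s\{\omega \in \mathit{Paths}^{\mathcal{M}} \mid \exists t \in I . (\omega@t \models \Phi_2 \wedge \forall t' \in [0, t) . \omega@t' \models \Phi_1)\} \bowtie p \end{array}$$

where a formal definition for the probability measure $Pr_s$ on paths starting in state $s$ is available in [32, 38]. Note how according to these semantics [38], until path formulae $\Phi_1 \, U^{I} \, \Phi_2$ are satisfied by a path $\omega$ if and only if $\Phi_2$ is satisfied at some time instant $t$ in the interval $I$ and $\Phi_1$ holds at all previous time instants $t' < t$, i.e., for all $t' \in [0, t)$. Finally, a state $s$ satisfies a steady-state formula $S_{\bowtie p}[\Phi]$ iff, having started in state $s$, the probability of the CTMC being in a state where $\Phi$ holds in the long run satisfies the bound ‘$\bowtie p$’.

Footnote 1: Steady-state properties only depend on the first moment of the distributions of the times spent in the CTMC states, so they are already computed accurately by existing CTMC analysis techniques.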

The shorthand notation $\Phi_1 \, U \, \Phi_2$ and $F^{I} \, \Phi$ is used when $I = [0, \infty)$ in an until formula and when the first part of an until formula is $\mathit{true}$, respectively. Probabilistic model checkers also support CSL formulae in which the bound ‘$\bowtie p$’ from $P_{\bowtie p}[\Psi]$ is replaced with ‘$=?$’, to indicate that the computation of the actual bound is required. We distinguish between the probability $Pr_s(\Psi)$ that $\Psi$ is satisfied by the paths starting in a state $s$, and the probability

$$Pr(\Psi) = \sum_{s \in S} \pi(s) \, Pr_s(\Psi)$$

that $\Psi$ is satisfied by the CTMC. In the analysis of system-level QoS properties, we are interested in computing the latter probability.

2.3 Phase-type distributions

OMNI uses phase-type distributions (PHDs) to refine the relevant elements of the analysed high-level abstract CTMC. PHDs model stochastic processes where the event of interest is the time to reach a specific state, and are widely used in the performance modelling of systems from domains ranging from call centres to healthcare [43, 44, 45]. PHDs support efficient numerical and analytical evaluation [21], and can approximate arbitrarily closely any continuous distribution with a strictly positive density in $(0, \infty)$ [46], although PHD fitting of distributions with deterministic delays requires extremely large numbers of states.

A PHD is defined as the distribution of the time to absorption in a CTMC with one absorbing state [21]. The $n$ transient states of the CTMC are called the phases of the PHD. With the possible reordering of states, the transition rate matrix of this CTMC can be expressed as:

$$R = \begin{bmatrix} D_0 & d_1 \\ \mathbf{0} & 0 \end{bmatrix}, \tag{4}$$

where the $n \times n$ sub-matrix $D_0$ specifies only transition rates between transient states, $\mathbf{0}$ is a $1 \times n$ row vector of zeros, and $d_1$ is an $n \times 1$ vector whose elements specify the transition rates from the transient states to the absorbing state. The elements from each row of $R$ add up to zero (cf. Definition 2.1), so we additionally have $D_0 \mathbf{1} + d_1 = \mathbf{0}$, where $\mathbf{1}$ and $\mathbf{0}$ are column vectors of $n$ ones and $n$ zeros, respectively. Thus, $d_1 = -D_0 \mathbf{1}$ and the PHD associated with this CTMC is fully defined by the sub-matrix $D_0$ and the row vector $\pi_0$ containing the first $n$ elements of the initial probability vector $\pi$ (as in most practical applications, we are only interested in PHDs that are acyclic and that cannot start in the absorbing state). We use the notation $\mathrm{PHD}(\pi_0, D_0)$ for this PHD.
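As a small illustration of why the sub-matrix of rates between transient states determines the rest of the rate matrix, the following Python sketch recovers the exit-rate vector as the negated row sums of that sub-matrix and assembles the full matrix. The two-phase matrix used here is a hypothetical example, and the function names are assumptions of this sketch rather than part of OMNI.

```python
def exit_rate_vector(D0):
    """Rates from each transient phase to the absorbing state.

    Each row of the full rate matrix sums to zero, so the exit rate of a
    phase is the negated sum of its row in D0.
    """
    return [-sum(row) for row in D0]

def full_rate_matrix(D0):
    """Assemble the (n+1) x (n+1) matrix with block rows [D0 d1] and [0 0]."""
    d1 = exit_rate_vector(D0)
    n = len(D0)
    rows = [list(row) + [d1[i]] for i, row in enumerate(D0)]
    rows.append([0.0] * (n + 1))  # the absorbing state has no outgoing rates
    return rows

# Hypothetical acyclic two-phase PHD that always starts in phase 1.
pi0 = [1.0, 0.0]
D0 = [[-3.0, 1.0],
      [0.0, -2.0]]

print(exit_rate_vector(D0))
for row in full_rate_matrix(D0):
    print(row)
```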

2.4 Erlang distributions

The Erlang distribution [47] is a form of PHD in which $k$ exponential phases, each with the same rate parameter $\lambda$, are placed in series. The Erlang distribution has a $k$-element initial probability vector $\pi_0 = [1 \; 0 \; \ldots \; 0]$, such that the system always starts in an initial state $s_1$ and successively traverses states $s_2, s_3, \ldots, s_k$ until it reaches an absorbing state $s_{k+1}$. The distribution represents the time to reach the absorbing state, and has the cumulative distribution function

$$F(k, \lambda, x) = 1 - \sum_{i=0}^{k-1} \frac{1}{i!} e^{-\lambda x} (\lambda x)^i \tag{5}$$

for $x \geq 0$, the mean $m = k/\lambda$, and the variance $k/\lambda^2 = m^2/k$ (which approaches zero as $k \to \infty$ for a fixed mean $m$).
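The behaviour of the Erlang distribution can be explored with a short Python sketch. It uses the standard closed form $F(x) = 1 - \sum_{i=0}^{k-1} e^{-\lambda x} (\lambda x)^i / i!$ with mean $k/\lambda$ and variance $k/\lambda^2$. The fixed mean of 0.5 s and the values of $k$ are illustrative assumptions.

```python
import math

def erlang_cdf(k, lam, x):
    """CDF of an Erlang distribution with k phases of rate lam, at x."""
    if x < 0:
        return 0.0
    return 1.0 - sum(
        math.exp(-lam * x) * (lam * x) ** i / math.factorial(i)
        for i in range(k)
    )

def erlang_mean(k, lam):
    return k / lam

def erlang_variance(k, lam):
    return k / lam ** 2

# Keep the mean fixed (assumed 0.5 s) and increase the number of phases:
# the variance shrinks, so the CDF approaches a step at the mean.
mean = 0.5
for k in (1, 2, 10, 50):
    lam = k / mean
    print(k, erlang_variance(k, lam),
          erlang_cdf(k, lam, 0.4), erlang_cdf(k, lam, 0.6))
```

With the mean held constant, larger values of $k$ give a sharper transition around the mean, which is why Erlang distributions are a natural fit for near-deterministic delays, at the cost of more states.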

3 Motivating example: QoS analysis of a web application

Fig. 2: High-level abstract CTMC modelling the handling of a request by the web application

To illustrate the limitations of traditional CTMC-based QoS analysis, we consider a travel web application that handles two types of requests:

  • Requests from users who plan to meet and entertain a visitor arriving by train.

  • Requests from users looking for a possible destination for a day trip by train.

The handling of these requests by the application is modelled by the high-level abstract CTMC from Fig. 2, which can be obtained from a UML activity diagram of the application. The method for obtaining a Markov chain from an activity diagram is described in detail in [16, 20, 48, 49]. This method requires annotating the outgoing edges of decision nodes from the diagram with the probabilities with which these edges are taken during the execution of the modelled application. Markov model states are then created for each of the activities, decision and start/end nodes in the diagram, and state transitions are added for each edge between these nodes; the transitions corresponding to outgoing edges of decision nodes “inherit” the probabilities that annotate these edges, while all other transition probabilities have a value of 1.0.

The initial state of the CTMC from Fig. 2 corresponds to finding the location of the train station. For the first request type, which is expected to occur with probability , this is followed by finding the train arrival time (state ), identifying suitable restaurants in the area (state ), obtaining a traffic report for the route from the user’s location to the station (state ), and returning the response to the user (state ).

For the second request type, which occurs with probability , state is followed by finding a possible destination (state ), and obtaining a weather forecast for this destination (state ). With a probability of the weather is unsuitable and a new destination is selected (back to state ). Once a suitable destination is selected, the traffic report is obtained for travel to the station (state ) and the response is returned to the user (state ).

Label       Third-party service               URL                                             Rate (s^-1)
location    Bing location service             http://dev.virtualearth.net/REST/v1/Locations   9.62
arrivals    Thales rail arrival board         http://www.livedepartureboards.co.uk/ldbws/     19.88
departures  Thales rail departures board      http://www.livedepartureboards.co.uk/ldbws/     19.46
search      Bing web search                   https://api.datamarket.azure.com/Bing/Search    1.85
weather     WebserviceX.net weather service   http://www.webservicex.net/globalweather.asmx   1.11
traffic     Bing traffic service              http://dev.virtualearth.net/REST/v1/Traffic     2.51

TABLE I: Web services considered for the web application

Fig. 3: Predicted (dashed lines) versus actual (continuous lines) property values

The component execution rates $\lambda_1$ to $\lambda_6$ depend on the implementations used for these components, and we consider that a team of software engineers wants to decide if the real web services from Table I are suitable for building the application. If they are suitable, the engineers need:

  • To select appropriate request-handling times to be specified in the application service-level agreement (SLA);

  • To choose a pricing scheme for the application.

Accordingly, the engineers want to assess several QoS properties of the travel application variant built using these publicly available web services:

P1 The probability of successfully handling user requests in under seconds, for .
P2 The probability of successfully handling “day trip” requests in under seconds, for .
P3 The expected profit per request handled, assuming that 1 cent is charged for requests handled within seconds and a 2-cent penalty is paid for requests not handled within 3 seconds, for .

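Service response times are assumed exponentially distributed in QoS analysis based on CTMC (as well as queueing network) models. Therefore, the engineers use $n_i$ observed service execution times $\tau_{i1}, \tau_{i2}, \ldots, \tau_{in_i}$ for service $i$ to estimate the service rate $\lambda_i$ as

$$\lambda_i = \left( \frac{\sum_{j=1}^{n_i} \tau_{ij}}{n_i} \right)^{-1}. \tag{6}$$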
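In code, this estimate is simply the reciprocal of the sample mean of the observed execution times. The sketch below is a minimal illustration in Python; the observation values are hypothetical and are not taken from the experiments reported in this paper.

```python
import math

def estimate_rate(times):
    """Exponential rate estimate: reciprocal of the mean observed time."""
    if not times:
        raise ValueError("at least one observation is required")
    return len(times) / sum(times)

def exponential_cdf(rate, t):
    """Probability of completion within t under the exponential model."""
    return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0

# Hypothetical execution times (in seconds) for one service.
observed = [0.5, 1.0, 1.5]
rate = estimate_rate(observed)
print(rate)                        # estimated service rate per second
print(exponential_cdf(rate, 1.0))  # predicted probability of finishing within 1 s
```

The estimate matches only the first moment of the observations: two datasets with the same mean but very different shapes yield the same rate.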

These execution times can be taken from existing logs (e.g. of other applications that use the same services) or can be obtained through testing the web services individually. Finally, a probabilistic model checker is used to analyse properties P1–P3 of the resulting CTMC. For this purpose, the three properties are first formalised as transient-state CSL formulae:

(7)

The value of to be specified in the SLA is unknown a priori and hence we evaluate each property for a range of values where for P1 and P2, and for P3.

To replicate this process, we implemented a prototype version of the application and we used it to handle randomly generated requests for and . Obtaining transition probabilities for Markov chains from real-world systems, and the effects of transition probabilities on system performance, have previously been considered [50, 51]. To decouple these effects from those due to the temporal characteristics of component behaviours, we utilise fixed probabilities for our motivating example. However, for the second system used to evaluate OMNI (Section 6.1) we extract the transition probabilities from system logs, showing that OMNI also provides significant improvements in verification accuracy in this setting. We obtained sample execution times for each web service (between for arrivals and search and for location and traffic), and we applied (6) to these observations, calculating the estimated service rates from Table I. Note that these observations are equivalent to observations obtained from unit testing the six services separately. This is due to the statistical independence of the execution times of different services, which we confirmed by calculating the Pearson correlation coefficient of the observations for every pair of services – the obtained coefficient values, between and , indicate lack of correlation. We then used the model checker PRISM [14] to analyse the CTMC for these rates, and thus to predict the values of properties (7).

To assess the accuracy of the predictions, we also calculated the actual values of these properties at each time value $T$ using detailed timing information logged by our application. The error associated with a single property evaluation may be quantified as the absolute difference between actual and predicted values:

$$\mathit{error}(T) = |\mathit{actual}(T) - \mathit{predicted}(T)|. \tag{8}$$

The predictions obtained through CTMC analysis and the actual property values across the range of values are compared in Fig. 3. The errors reported in the figure are calculated using the distance measure recommended for assessing the overall error of CTMC/PHD model fitting in  [52, 21, 53, 54], i.e., the area difference between the actual and the predicted property values:

(9)

where for properties P1 and P2, and for property P3 (see footnote 2). Later in the paper, we will use this error measure to assess the improvements in accuracy due to the OMNI model refinement. In this section we focus on the limitations of CTMC-based transient analysis. Therefore, recall that the software engineers must make their decisions based only on the predicted property values from Fig. 3; two of these decisions and their associated scenarios are described below.

Footnote 2: Both underestimation and overestimation of QoS property values contribute to the error because both can lead to undesirable false positives or false negatives when assessing whether QoS requirements are met. For example, overestimates of the overall success probability of a system can falsely indicate that a requirement that places a lower bound on this probability is met and the system is safe to use (false negative), while underestimates of the same property can falsely indicate that the requirement is violated and the system should not be used (false positive).
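The area-difference error can be approximated numerically when the actual and predicted property values are available at a grid of time points. The Python sketch below applies the trapezoidal rule to the absolute difference between the two curves. The grid and the values are hypothetical, and the choice of the trapezoidal rule is an assumption of this sketch rather than a description of how the errors in Fig. 3 were computed.

```python
def area_error(ts, actual, predicted):
    """Approximate the area between two curves sampled at times ts.

    Uses the trapezoidal rule on |actual - predicted|, so overestimates
    and underestimates both add to the error instead of cancelling out.
    """
    if not (len(ts) == len(actual) == len(predicted)):
        raise ValueError("ts, actual and predicted must have equal length")
    diffs = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(
        (ts[i + 1] - ts[i]) * (diffs[i] + diffs[i + 1]) / 2.0
        for i in range(len(ts) - 1)
    )

# Hypothetical property values at three time points.
ts = [0.0, 1.0, 2.0]
actual = [0.0, 0.5, 1.0]
predicted = [0.0, 1.0, 1.0]
print(area_error(ts, actual, predicted))
```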

Scenario 1. The engineers note that:

  • the predicted overall success probability (property P1) at s is (marked 1a in Fig. 3), i.e., slightly over 40% of the requests are predicted to be handled within 1s;

  • the predicted day-trip success probability (property P2) at s is (1b in Fig. 3), i.e., over 36% of the day-trip requests are predicted to be handled within 1s;

  • the expected profit (property P3) at s, i.e., when charging 1 cent for requests handled within 1s, is  cents (1c in Fig. 3).

Accordingly, the engineers decide to use the services from Table I to implement the travel web application, with an SLA “promising” that requests will be handled within 1s with 0.4 success probability, “day trip” requests will be handled within 1s with 0.35 success probability, and charging 1 cent for requests handled within 1s. As shown in Fig. 3, the actual property values at s are for P1 (marked 1a in Fig. 3), for P2 (1b in Fig. 3) and  cents for P3 (1c in Fig. 3), so this decision would be wrong – both promises would be violated by a wide margin, and the actual profit would be under a third of the predicted profit.

Scenario 2. The engineers observe that the success probabilities of handling requests or “day trip” requests within 2s are below 0.8 – the predicted values for properties P1 and P2 at s are (2a in Fig. 3) and (2b in Fig. 3), respectively; and/or that the expected profit is below 0.7 cents per request when charging 1 cent for each request handled within 2s (2c in Fig. 3). As such, they decide to look for alternative services for the application. As shown by points 2a’–2c’ in Fig. 3, all the constraints underpinning this decision are actually satisfied, so the decision would also be wrong.

We chose the times and constraints in the two hypothetical decisions to show how the current use of idealised CTMC models in QoS analysis may yield invalid decisions. The fact that choosing different times and constraints could produce valid decisions is not enough: engineering decisions are meant to be consistently valid, not down to chance. It is this major limitation of traditional CTMC-based QoS analysis that our CTMC refinement method addresses, as described in the next section.

(a) Empirical CDF for the service execution times (continuous lines) versus exponential models with rates computed from observed data (dashed lines)
(b) Empirical CDF for the service holding times (continuous lines) versus exponential models with rates computed from observed holding times (long dashed lines); for all services except Arrivals the difference between the two (short dashed lines) exceeds 20% for multiple values of
Fig. 4: The services from the motivating example have non-zero delays and non-exponentially distributed holding times

4 The OMNI method for CTMC refinement

4.1 Overview

OMNI addresses the refinement of high-level CTMC models $\mathrm{CTMC}(S, \pi, R)$ of software systems that satisfy the following assumptions:

  • Each state $s_i \in S$ corresponds to a component $i$ of the system, and $\pi(s_i)$ is the probability that $i$ is the initial component executed by the system;

  • For any distinct states $s_i \neq s_j$ from $S$, the transition rate $R(s_i, s_j) = p_{ij} \lambda_i$, where $p_{ij}$ represents the (known or estimated) probability (2) that component $i$ is followed by component $j$ and $\lambda_i$ is obtained by applying (6) to $n_i$ observed execution times $\tau_{i1}, \tau_{i2}, \ldots, \tau_{in_i}$ of component $i$;

  • Each state $s_i$ is labelled with the name of its corresponding component, which we will call “component $i$” for simplicity.

This CTMC model makes the standard assumption that component execution times are exponentially distributed. However, this assumption is typically invalid for two reasons. First, each component $i$ has a delay (i.e. minimum execution time) approximated by

$$d_i = \min_{1 \leq j \leq n_i} \tau_{ij}, \tag{10}$$

such that its probability of completion within $d_i$ time units is zero. In contrast, modelling the execution time of the component as exponentially distributed with rate $\lambda_i$ yields a non-zero probability of completion within $d_i$ time units. Second, even the holding times

$$\tau'_{ij} = \tau_{ij} - d_i, \quad 1 \leq j \leq n_i, \tag{11}$$

of the component are rarely exponentially distributed.
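The split of observed execution times into a delay and holding times can be sketched in a few lines of Python. The sketch takes the delay to be the minimum observed execution time and the holding times to be the observations with that delay subtracted; it then shows that an exponential model fitted to the raw observations assigns a non-zero probability to completing before the delay has elapsed. The observation values are hypothetical.

```python
import math

def delay_and_holding_times(times):
    """Return (delay, holding_times) for a list of observed execution times.

    The delay is approximated by the minimum observation; each holding
    time is the observation minus that delay.
    """
    if not times:
        raise ValueError("at least one observation is required")
    delay = min(times)
    return delay, [t - delay for t in times]

# Hypothetical execution times (in seconds) for one component.
observed = [0.2, 0.5, 0.9]
delay, holding = delay_and_holding_times(observed)

# Exponential model fitted to the raw observations (rate = 1 / sample mean).
rate = len(observed) / sum(observed)
p_before_delay = 1.0 - math.exp(-rate * delay)

print(delay, holding)
print(p_before_delay)  # non-zero, although no observation finished this early
```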

Example 1.

Fig. 4a shows the empirical cumulative distribution functions (CDFs) for the execution times of the six services from our motivating example (cf. Table I), and the associated exponential models with rates given by (6). The six services have minimum observed execution times $d_1$ to $d_6$ between 45ms and 0.71s (due to network latency and request processing time), and their exponential model is a poor representation of the observed temporal behaviour. Furthermore, the best-fit exponential model of the observed holding times for these services (shown in Fig. 4b) is also inaccurate.

OMNI overcomes these significant problems by generating a refined CTMC for each QoS property of interest in two steps, and uses standard probabilistic model checking to analyse the refined CTMC. As shown in Fig. 5, the first OMNI step, called component classification, partitions the states of the high-level CTMC into subsets that require different types of refinement because of the different impact of their associated system components on the analysed property. For instance, components unused on an execution path have no effect on QoS properties (e.g. response time) associated solely with that path, and therefore their corresponding states from the high-level CTMC need not be refined. The second OMNI step, called selective refinement, replaces the states which correspond to components that impact the analysed property with new states and transitions that model the delays and holding times of these components by means of Erlang distributions [47] and phase-type distributions (PHDs) [21], respectively.

Fig. 5: OMNI CTMC refinement and verification

As shown by our experimental results from Section 6, the two-step OMNI process produces refined CTMCs that are often much smaller and faster to analyse than the CTMCs obtained by obliviously refining every state of the high-level CTMC, e.g. as done in our preliminary work from [31]. These benefits dominate the slight disadvantage of having to refine the high-level CTMC for each analysed property, which is further mitigated by our OMNI tool by caching and reusing refinement results across successive refinements of the same high-level CTMC, as described in Section 5. Likewise, modelling the delay and holding time of system components separately (rather than using single-PHD fitting) yields smaller and more accurate refined models, in line with existing theory [46] and our preliminary results from [31].

Several factors can impede or impact the success of our OMNI method:

  • Components with execution times that are not statistically independent. Markov models assume that the transition rates associated with different states are statistically independent. If the execution times of different components are not independent (e.g., because the components are running on the same server), then this premise is not satisfied, and OMNI cannot be applied.

  • Changing component behaviour. If the system components change their behaviour significantly over time, then OMNI cannot predict the changed behaviour. This is a more general difficulty with model-based prediction.

  • Insufficient observations of component execution times. The accuracy of OMNI-refined models decreases when fewer observations of the system components are available. We provide details about the impact of the training dataset size on the OMNI accuracy in Section 6.4.

The component classification and selective refinement steps of OMNI are presented in the rest of this section.

4.2 Component classification

Given a high-level CTMC model $\mathrm{CTMC}(S, \pi, R)$ of a system, and a QoS property encoded by the transient CSL formula $P_{=?}[\Phi_1 \, U^{I} \, \Phi_2]$, this OMNI step builds a partition

$$S = X \cup O \cup T_1 \cup T_2 \cup \cdots \cup T_m \tag{12}$$

of the state set $S$. Intuitively, the “eXclude-from-refinement” set $X$ will contain states with zero probability of occurring on paths that satisfy $\Phi_1 \, U^{I} \, \Phi_2$; the “Once-only” set $O$ will contain states with probability 1 of appearing once and only once on every path that satisfies $\Phi_1 \, U^{I} \, \Phi_2$; and each “together” set $T_i$ will contain states that can only appear as a sequence on paths that satisfy $\Phi_1 \, U^{I} \, \Phi_2$. Formal definitions of the disjoint sets $X$, $O$, and $T_1$ to $T_m$ and descriptions of their roles in OMNI are provided in Sections 4.2.1–4.2.3.

4.2.1 Exclude-from-refinement state sets

Definition 4.1.

The exclude-from-refinement state set associated with an until path formula $\Phi_1 \, U^{I} \, \Phi_2$ over the continuous-time Markov chain $\mathcal{M} = \mathrm{CTMC}(S, \pi, R)$ is the set of CTMC states

$$X = \{ s \in S \mid Pr((\Phi_1 \wedge \neg s) \, U \, \Phi_2) = Pr(\Phi_1 \, U \, \Phi_2) \}, \tag{13}$$

where, for each state $s \in S$, $AP$ is extended with an atomic proposition also named ‘$s$’ that is true in state $s$ and false in every other state. Thus, $X$ comprises all states $s$ for which the probability of reaching a state satisfying $\Phi_2$ along paths that do not contain state $s$ and on which $\Phi_1$ holds in all preceding states is the same as the probability of reaching a state that satisfies $\Phi_2$ along paths on which $\Phi_1$ holds in all preceding states.

Theorem 1.

Let be the exclude-from-refinement state set associated with the until path formula over the continuous-time Markov chain with atomic proposition set . Then, for any , the probability does not depend on the transition times from states in .

Proof.

The proof is by contradiction. Consider a generic state and the following sets of paths:

As and , we have . However, according to (13), , so .

Assume now that the time spent by the CTMC in state has an impact on the value of over for an interval . This requires that, at least for some (possibly very small) values of the time spent in , appears on paths from a set

such that ; otherwise, varying cannot have any impact on

However, since we must have , which contradicts our earlier finding that , completing the proof. ∎

Theorem 1 allows OMNI to leave the states from unrefined with no loss of accuracy in the QoS analysis results. The theorem also provides a method for obtaining by computing the until formula for each state of the high-level CTMC (i.e. for each system component) and comparing the result with the value of the CSL formula , which is only computed once. Existing probabilistic model checkers compute these unbounded until formulae very efficiently, as they only depend on the probabilities (2) of transition between CTMC states and not on the state transition rates [32, 38]. (Footnote 3: To assess the time taken by model checking, an experiment was carried out to evaluate each state from the motivating example for inclusion in . This experiment was repeated 30 times, and the average time taken to model check each state was 1.6 ms.)
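The comparison suggested by Theorem 1 can be illustrated with a short Python sketch. It is not the implementation used by the OMNI tool: it assumes that the embedded transition probabilities are available as a dictionary `P`, it uses plain value iteration in place of a probabilistic model checker, and the function names, the tolerance and the five-state example chain are all hypothetical.

```python
def until_probability(P, init, phi1, phi2, avoid=None, tol=1e-12, max_iter=10_000):
    """Probability of the unbounded until (phi1 U phi2) from `init`.

    P maps each state to a dict {successor: probability}; phi1 and phi2 are
    sets of states. If `avoid` is given, that state is treated as violating
    the left operand of the until (unless it already satisfies phi2).
    """
    x = {s: 1.0 if s in phi2 else 0.0 for s in P}
    for _ in range(max_iter):
        delta = 0.0
        for s in P:
            # Goal states stay at 1; blocked states stay at 0.
            if s in phi2 or s not in phi1 or s == avoid:
                continue
            new = sum(p * x[t] for t, p in P[s].items())
            delta = max(delta, abs(new - x[s]))
            x[s] = new
        if delta < tol:
            break
    return x[init]


def exclude_set(P, init, phi1, phi2, eps=1e-9):
    """States whose removal leaves the until probability unchanged."""
    base = until_probability(P, init, phi1, phi2)  # computed only once
    return {
        z for z in P
        if abs(until_probability(P, init, phi1, phi2, avoid=z) - base) <= eps
    }


# Hypothetical chain: 0 branches to 1 or 2; 1 leads to goal 3; 2 leads to sink 4.
P = {
    0: {1: 0.5, 2: 0.5},
    1: {3: 1.0},
    2: {4: 1.0},
    3: {3: 1.0},
    4: {4: 1.0},
}
ALL = set(P)
base = until_probability(P, 0, ALL, {3})
excluded = exclude_set(P, 0, ALL, {3})
print(base, sorted(excluded))
```

In this toy chain, states 2 and 4 lie only on paths that never reach the goal, and state 3 is the goal itself, so removing any of them leaves the until probability unchanged.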

Example 2.

Consider the QoS properties (7) of the web application from our motivating example. For property P2 and the high-level CTMC model from Fig. 2, we have

(and for any other state ), so for P2. Applying Theorem 1 to the other two properties from (7) yields .

4.2.2 Once-only state sets

Definition 4.2.

The once-only state set associated with an until path formula over the continuous-time Markov chain is the set

(14)

where the until formula holds for paths that reach state without going through any states from (which corresponds to labelling the states from with the atomic proposition ‘’).

The next theorem asserts that for every state from , can be calculated by applying the probability measure to the set of paths which, in addition to satisfying the clause specified by the CSL semantics (i.e., ), contain once and only once before time instant . Using the unique existential quantifier , the last clause can be formalised as , where is the time spent in the -th state on the path (cf. Section 2.1).

Theorem 2.

Let be the once-only state set associated with the until path formula over the continuous-time Markov chain . Then, for any state and interval ,

(15)
Proof.

Let denote the subset of from (15). According to CSL semantics, where

Since , we have , so to prove the theorem we must show that . To this end, we partition into two disjoint subsets: , comprising the paths that do not contain state before time from the first line of (15), and , comprising the paths that contain state before time more than once. Since holds (according to the definition of ), . Similarly, since holds, the set of paths satisfying and containing twice (without reaching states in ) occurs with probability zero. As is included in this set, we necessarily have . We conclude that , which completes the proof. ∎

OMNI exploits Theorem 2 in two ways. First, since states correspond to system components always executed before becomes true, for any interval , where is the delay (10) of the component associated with state . Therefore, OMNI returns a zero probability in this scenario without performing probabilistic model checking. Second, because the components associated with states are executed precisely once on relevant CTMC paths, no modelling of their delays is required, and OMNI only needs to model the holding times of these states. Importantly, obtaining to enable these simplifications only requires the probabilities of unbounded until and next path formulae (cf. (14)), which probabilistic model checkers can compute efficiently for the reasons we explained earlier in this section.
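The "once and only once" property that Theorem 2 relies on can be sanity-checked on small models by brute force. The Python sketch below does not evaluate definition (14), which uses until and next path formulae. Instead, it enumerates the positive-probability paths of a hypothetical embedded chain up to a bounded length and reports the states that occur exactly once before the goal on every enumerated path. The function names, the `max_len` bound and the example chains are assumptions made for illustration.

```python
def satisfying_prefixes(P, init, phi1, phi2, max_len):
    """Yield the states visited before phi2 is first reached.

    Only paths with at most `max_len` states are explored, so this is a
    bounded approximation for chains that contain cycles.
    """
    stack = [(init,)]
    while stack:
        path = stack.pop()
        s = path[-1]
        if s in phi2:
            yield path[:-1]  # states strictly before the goal state
            continue
        if s not in phi1 or len(path) >= max_len:
            continue
        for t, p in P[s].items():
            if p > 0:
                stack.append(path + (t,))


def appears_exactly_once(P, init, phi1, phi2, max_len=10):
    """States that occur exactly once on every enumerated satisfying path."""
    prefixes = list(satisfying_prefixes(P, init, phi1, phi2, max_len))
    return {o for o in P if prefixes and all(pre.count(o) == 1 for pre in prefixes)}


# Hypothetical chain in which state 1 may be retried (self-loop) before goal 3.
P_retry = {
    0: {1: 0.5, 2: 0.5},
    1: {1: 0.2, 3: 0.8},
    2: {4: 1.0},
    3: {3: 1.0},
    4: {4: 1.0},
}
# The same chain without the retry loop.
P_plain = dict(P_retry)
P_plain[1] = {3: 1.0}

once_retry = appears_exactly_once(P_retry, 0, set(P_retry), {3})
once_plain = appears_exactly_once(P_plain, 0, set(P_plain), {3})
print(sorted(once_retry), sorted(once_plain))
```

With the retry loop, state 1 can occur several times on a satisfying path and is therefore rejected, whereas state 0 qualifies in both chains. This enumeration grows exponentially in the worst case, which is one reason to prefer the model-checking-based definition on realistic models.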

Example 3.

Consider property P1 from the QoS properties (7) in our motivating example: . In line with definition (14), we obtain the set for this property by first evaluating the following CSL formulae for the high-level CTMC from Fig. 2:

  • which holds as

  • , which holds only for states and .

The constraint is then checked only for the -candidate states and , taking into account the fact that (cf. Example 2). For instance, since only for , and , we conclude that . Similarly, only if and , so , giving . It is easy to show that the same “once-only” state set is obtained for the other two properties from (7).

4.2.3 Together state sets

1:function TogetherSeqs(, , )
2:    ,
3:    while  do
4:        
5:        ,
6:        
7:        while  do
8:           if  then
9:               
10:               if  NIL then
11:                  ,
12:               else
13:                  
14:               end if
15:           end if
16:           if  then
17:               
18:               if  NIL then
19:                  ,
20:               else
21:                  
22:               end if
23:           end if
24:        end while
25:        
26:    end while
27:    return
28:end function
29:
30:function Pred()
31:    if then return NIL end if
32:    for  do
33:        if
34:                       then
35:              return
36:        end if
37:    end for
38:    return NIL
39:end function
40:
41:function Succ()
42:    for  do
43:        if
44:                       then
45:              return
46:        end if
47:    end for
48:    return NIL
49:end function
Algorithm 1 Generation of “together” state sequences

Finally, the result in this section supports the calculation and exploitation of the “together” state sets from (12).

Definition 4.3.

The together state sets for an until path formula over the Markov chain are the state sets comprising the same elements as the state sequences returned by function TogetherSeqs(, , ) from Algorithm 1, where and are the exclude-from-refinement and once-only state sets for the formula.

The function TogetherSeqs builds the state sequences in successive iterations of its outer while loop (lines 3–26). The set maintains the states yet to be allocated to sequences (initially , cf. line 2), and each new sequence starts with a single element picked randomly from (line 4). The inner while loop in lines 7–24 “grows” this sequence. First, the if statement in lines 8–15 tries to grow the sequence to the left with a state that “precedes” the sequence, in the sense that the only outgoing CTMC transition from is to the sequence head, and the only way of reaching the sequence head is through an incoming CTMC transition from . Analogously, the if statement in lines 16–23 grows the sequence to the right, by appending to it the state that “succeeds” the state at the tail of the sequence, if such a “successor” state exists. The predecessor and successor states of a state are computed by the functions Pred and Succ, respectively, where these functions return NIL if the states they attempt to find do not exist. The inner while loop terminates when the set becomes empty or the sequence has no more predecessors or successors, so the flags and are set to in lines 13 and 21, respectively. On exit from this while loop, the sequence is added to the set of sequences , which is returned (line 27) after the outer while loop also terminates when becomes empty. Termination is guaranteed since at least one element is removed from in each iteration of this while loop (in line 5).
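The procedure described above can be sketched in Python as follows. This is a reading of the prose description rather than a transcription of Algorithm 1. It assumes that the candidate set initially holds all states outside the exclude-from-refinement and once-only sets, that Pred returns no predecessor for the initial state, and that the smallest remaining state is picked instead of a random one so that the output is reproducible. All identifiers and the example chain are hypothetical.

```python
from collections import deque


def successors(P, s):
    """States reachable from s in one step with positive probability."""
    return {t for t, p in P[s].items() if p > 0}


def predecessors(P, s):
    """States with a positive-probability transition into s."""
    return {u for u in P if P[u].get(s, 0) > 0}


def pred(P, init, head, candidates):
    """Candidate whose only transition is to `head`, and which is the only way into `head`."""
    if head == init:  # assumption: the initial state has no usable predecessor
        return None
    for c in sorted(candidates):
        if successors(P, c) == {head} and predecessors(P, head) == {c}:
            return c
    return None


def succ(P, tail, candidates):
    """Candidate that is the only successor of `tail` and is only reachable from `tail`."""
    for c in sorted(candidates):
        if successors(P, tail) == {c} and predecessors(P, c) == {tail}:
            return c
    return None


def together_seqs(P, init, exclude, once_only):
    remaining = set(P) - set(exclude) - set(once_only)
    seqs = []
    while remaining:
        s = min(remaining)  # deterministic stand-in for a random pick
        remaining.discard(s)
        seq = deque([s])
        grow_left = grow_right = True
        while remaining and (grow_left or grow_right):
            if grow_left:
                p = pred(P, init, seq[0], remaining)
                if p is not None:
                    seq.appendleft(p)
                    remaining.discard(p)
                else:
                    grow_left = False
            if grow_right:
                n = succ(P, seq[-1], remaining)
                if n is not None:
                    seq.append(n)
                    remaining.discard(n)
                else:
                    grow_right = False
        seqs.append(list(seq))
    return seqs


# Hypothetical chain: 0 -> 1 -> 2, then a branch to 3 or 4, both leading to 5.
P = {
    0: {1: 1.0},
    1: {2: 1.0},
    2: {3: 0.5, 4: 0.5},
    3: {5: 1.0},
    4: {5: 1.0},
    5: {5: 1.0},
}
seqs = together_seqs(P, init=0, exclude={5}, once_only=set())
print(seqs)
```

In this example, states 0, 1 and 2 form a single sequence because each is the sole successor of the previous one, while the branch at state 2 leaves states 3 and 4 in singleton sequences.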

To analyse the complexity of TogetherSeqs, we note that the worst case scenario corresponds to and to the function returning only sequences of length , in which case the outer while loop is executed times with both Pred and Succ invoked once in each iteration. The if statements from Pred and Succ perform comparisons, and are executed within for loops with iterations, yielding an complexity for each function, and an overall complexity for the algorithm.

Theorem 3.

If is one of the sequences returned by TogetherSeqs, a path that satisfies for an interval , and the earliest time when (with for all ), then up to time the states from can only appear on as complete sequences .

Proof.

The case is trivial, so we assume in the rest of the proof. We have two cases: either contains no states from , or it contains at least one state from . In the former case, the theorem is proven. In the latter case, consider any state that occurs on , . The states , , …, must also occur on , in this order and just before , as transitioning through each of these states is the only way to reach in the CTMC. Moreover, , , …, must immediately follow on (in this order) because is not an absorbing state and its only outgoing transition is to , etc. Hence, the path is of the form