# Compositional Bisimulation Metric Reasoning with Probabilistic Process Calculi

## Abstract

We study which standard operators of probabilistic process calculi allow for compositional reasoning with respect to bisimulation metric semantics. We argue that uniform continuity (generalizing the earlier proposed property of non-expansiveness) captures the essential nature of compositional reasoning and now also allows us to reason compositionally about recursive processes. We characterize the distance between probabilistic processes composed by standard process algebra operators. Combining these results, we demonstrate how compositional reasoning about systems specified by continuous process algebra operators allows for metric assume-guarantee-like performance validation.

kgl@cs.aau.dk This research is partially supported by the European FET projects SENSATION and CASSTING and the Sino-Danish Center IDEA4CPS.

## 1 Introduction

Probabilistic process algebras, such as probabilistic CCS [33], CSP [33] and ACP [3], are languages that are employed to describe probabilistic concurrent communicating systems, or probabilistic processes for short. Nondeterministic probabilistic transition systems [43] combine labeled transition systems [34] and discrete time Markov chains [45]. They allow us to model separately the reactive system behavior, nondeterministic choices and probabilistic choices.

Behavioral semantics provide formal notions to compare systems. Behavioral equivalences are behavioral semantics that allow us to determine the observational equivalence of systems by abstracting from behavioral details that may be not relevant in a given application context. In essence, behavioral equivalences equate processes that are indistinguishable to any external observer. The most prominent example is bisimulation equivalence [36], which provides a well-established theory of the behavior of probabilistic nondeterministic transition systems.

Recently it became clear that the notion of behavioral equivalence is too strict in the context of probabilistic models. The probability values in those models originate either from observations (statistical sampling) or from requirements (probabilistic specification). Behavioral equivalences such as bisimulation equivalence are binary notions that can only answer the question whether two systems behave precisely the same way or not. However, a tiny variation of the probabilities, which may be due to a measurement error or to limitations on how precisely a specified probabilistic choice can be realized in a concrete system, will make these systems behaviorally inequivalent without providing any further information. In practice, many systems are approximately correct. This leads immediately to the question of what is an appropriate notion to measure the quality of the approximation. The most prominent notion is behavioral metric semantics [16], which provides a behavioral distance that characterizes how far the behavior of two systems is apart. Bisimulation metrics are the quantitative analogue of bisimulation equivalences and assign to each pair of processes a distance which measures the proximity of their quantitative properties. The distances form a pseudometric^{1}

In order to specify and verify systems in a compositional manner, it is necessary that the behavioral semantics is compatible with all operators of the language that describe these systems. For behavioral equivalence semantics there is common agreement that compositional reasoning requires the considered behavioral equivalence to be a congruence with respect to all language operators. For example, consider a term which describes a system consisting of two subcomponents composed by a binary operator. When each subcomponent is replaced by a behaviorally equivalent one, congruence of the operator guarantees that the composed system is behaviorally equivalent to the resulting replacement system. This implies that equivalent systems are inter-substitutable: whenever a system in a language context is replaced by an equivalent system, the obtained context is equivalent to the original one. The congruence property is important since it is usually much easier to model and study (a set of) small systems and then combine them together rather than to work with a large monolithic system.

However, for behavioral metric semantics there is no satisfactory understanding of which property an operator should satisfy in order to facilitate compositional reasoning. Intuitively, what is needed is a formalization of the idea that systems close to each other should be approximately inter-substitutable: whenever a system in a language context is replaced by a close system, the obtained context should be close to the original one. In other words, there should be some relation between the behavioral distance of the replaced systems and the behavioral distance of the resulting contexts. This ensures that any limited change in the behavior of a subcomponent implies a smooth and limited change in the behavior of the composed system (absence of chaotic behavior when system components and parameters are modified in a controlled manner). Earlier proposals such as non-expansiveness [16] and non-extensiveness [6] are only partially satisfactory for non-recursive operators and, even worse, they do not allow compositional reasoning over recursive processes at all. More fundamentally, those proposals are somewhat ad hoc and do not systematically capture the essential nature of compositional metric reasoning.

In this paper we consider uniform continuity as a property that generalizes non-extensiveness and non-expansiveness and captures the essential nature of compositional reasoning w.r.t. behavioral metric semantics. A uniformly continuous binary process operator ensures that for any non-zero bisimulation distance ε (understood as the admissible tolerance on the operational behavior of the composed process) there are non-zero bisimulation distances δ1 and δ2 (understood as the admissible tolerances on the operational behavior of the component processes) such that the distance between the composed processes is at most ε whenever the first component is within distance δ1, and the second within distance δ2, of its replacement. Uniform continuity ensures that a small variance in the behavior of the parts leads to a bounded small variance in the behavior of the composed processes. Since uniformly continuous operators preserve the convergence of sequences, this allows us to approximate composed systems by approximating their subsystems. In summary, uniform continuity allows us to investigate the behavior of systems by disassembling them into their components, analyzing them at the component level, and then deriving properties of the composed system. We consider the uniform notion of continuity (technically, the tolerances δ1 and δ2 depend only on ε and are independent of the concrete systems) because we aim at universal compositionality guarantees. As an important instance of uniform continuity we consider Lipschitz continuity, which ensures that the ratio between the distance of the composed processes and the distance between their parts is bounded.

Our main contributions are as follows:

We develop tight upper bounds on the distance between processes combined by many non-recursive and recursive process operators used in various probabilistic process algebras (Sections 3.2 and 4.2).

We show that non-recursive process operators, in particular (nondeterministic and probabilistic variants of) sequential, alternative and parallel composition, allow for compositional reasoning w.r.t. the compositionality criterion of non-expansiveness and hence also w.r.t. both Lipschitz and uniform continuity (Section 3).

We show that recursive process operators, e.g. (nondeterministic and probabilistic variants of) Kleene-star iteration and π-calculus bang replication, allow for compositional reasoning w.r.t. the compositionality criterion of Lipschitz continuity and hence also w.r.t. uniform continuity, but not w.r.t. non-expansiveness and non-extensiveness (Section 4).

We discuss the copy operator proposed in [7] to specify the fork operation of operating systems as an example of an operator that allows for compositional reasoning w.r.t. the compositionality criterion of uniform continuity, but not w.r.t. Lipschitz continuity.

We demonstrate the practical relevance of our methods by reasoning compositionally over a network protocol built from uniformly continuous operators. In detail, we show how to derive performance guarantees for the entire system from performance assumptions about its individual components. Conversely, we also show how to derive performance requirements on individual components from performance requirements on the complete system (Section 5).

## 2 Preliminaries

### 2.1 Probabilistic Transition Systems

We consider transition systems with process terms as states and labeled transitions taking states to distributions over states. Process terms are inductively defined by process combinators.

The rank function gives the arity of each operator. We call operators with arity 0 *constants*. If the rank of an operator is clear from the context we will use the operator symbol alone. We may write as shorthand for with .

Terms are defined by structural recursion over the signature. We assume an infinite set of *state variables* disjoint from .

We write for if is a constant. The set of *closed state terms* is abbreviated as . The set of *open state terms* is abbreviated as . We may refer to operators in as *process combinators*, to state variables in as *process variables*, and to closed state terms in as *processes*.

A probability distribution over the set of closed state terms is a mapping, summing to 1 over all closed terms, that assigns to each closed term its respective probability. The probability mass of a set of closed terms in some probability distribution is given by the sum of the probabilities of its elements. We denote by the set of all probability distributions over . We let range over .
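
As a minimal sketch (not taken from the paper), finite-support distributions over closed terms can be represented as dictionaries mapping term strings to probabilities; the helper names `mass` and `is_distribution` are hypothetical:

```python
def mass(pi, terms):
    """Probability mass pi(T) of a set of closed terms T."""
    return sum(p for t, p in pi.items() if t in terms)

def is_distribution(pi, tol=1e-9):
    """Check that pi maps terms to [0, 1] and sums to 1 (up to float tolerance)."""
    return all(0.0 <= p <= 1.0 for p in pi.values()) and \
        abs(sum(pi.values()) - 1.0) < tol

# Illustrative distribution over three closed terms.
pi = {"a.stop": 0.5, "b.stop": 0.3, "stop": 0.2}
assert is_distribution(pi)
assert abs(mass(pi, {"a.stop", "b.stop"}) - 0.8) < 1e-9
```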

Next, we introduce a language to describe probability distributions. We assume an infinite set of *distribution variables* and let range over . We denote by the set of state and distribution variables and let range over .

Distribution terms have the following meaning. A *distribution variable* is a variable that takes values from . An *instantiable Dirac distribution* is an expression that takes as value the Dirac distribution when state variables in are substituted such that becomes the closed term . Case ? allows us to construct convex combinations of distributions. Case ? lifts structural recursion from state terms to distribution terms.

The set of *closed distribution terms* is abbreviated as . The set of *open distribution terms* is abbreviated as . We write for with and . Furthermore, for binary operators we may use the infix notation and write for .

A substitution is *closed* if for all and for all . Notice that closed distribution terms denote distributions in .

Probabilistic nondeterministic labelled transition systems [43], PTSs for short, extend labelled transition systems by allowing for probabilistic choices in the transitions. As state space we will take the set of all closed terms .

We call a *transition* from state to distribution labelled by action . We write for . Moreover, we write if there exists some distribution with , and if there is no distribution with . For a closed term and an action , let denote the set of all distributions reachable from by performing an -labeled transition. We call also the *-derivatives* of .

We say that a PTS is *image-finite* if is finite for each closed term and action . In the rest of the paper we assume that all PTSs are image-finite.
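
The transition relation and the notion of -derivatives can be sketched as follows; this is a hypothetical encoding (triples of state, action, and a tuple-encoded distribution), not the paper's formalization:

```python
def der(trans, t, a):
    """The a-derivatives of t: all distributions reachable from t
    by an a-labeled transition."""
    return {pi for (s, act, pi) in trans if s == t and act == a}

# Transitions as (state, action, distribution) triples; distributions are
# encoded as tuples of (term, probability) pairs so that they are hashable.
trans = {
    ("s", "a", (("t", 0.5), ("u", 0.5))),
    ("s", "a", (("t", 1.0),)),
}

assert len(der(trans, "s", "a")) == 2   # two a-derivatives of s
assert der(trans, "s", "b") == set()    # s cannot perform b
```

Image-finiteness then amounts to `der(trans, t, a)` being a finite set for every state and action, which is automatic in this finite encoding.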

### 2.2 Bisimulation metric

Bisimulation metric^{2}

We will later define bisimulation metrics as 1-bounded pseudometrics that measure how much two states disagree on their reactive behavior and their probabilistic choices. Note that a pseudometric permits distance 0 even between different terms (in contrast to a metric). This will allow us to assign distance 0 to different bisimilar states. We will provide two (equivalent) characterizations of bisimulation metrics, one in terms of a coinductive definition pattern and one in terms of fixed points.

Both characterizations require the following lattice structure. Let be the complete lattice of functions ordered by iff for all . Then for each the supremum and infimum are and for all . The bottom element is the constant zero function given by , and the top element is the constant one function given by , for all .
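
This lattice can be illustrated on a finite carrier of state pairs, where suprema and infima are computed pointwise; the following is a sketch under that finiteness assumption:

```python
# Finite carrier of state pairs (illustrative).
pairs = [("s", "t"), ("s", "u")]

def sup(fs):
    """Pointwise supremum of a family of [0,1]-valued functions."""
    return {p: max(f[p] for f in fs) for p in pairs}

def inf(fs):
    """Pointwise infimum of a family of [0,1]-valued functions."""
    return {p: min(f[p] for f in fs) for p in pairs}

bottom = {p: 0.0 for p in pairs}   # the constant zero function
top = {p: 1.0 for p in pairs}      # the constant one function

d1 = {("s", "t"): 0.2, ("s", "u"): 0.7}
d2 = {("s", "t"): 0.5, ("s", "u"): 0.1}
assert sup([d1, d2]) == {("s", "t"): 0.5, ("s", "u"): 0.7}
assert inf([d1, d2]) == {("s", "t"): 0.2, ("s", "u"): 0.1}
```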

#### Metrical lifting

Bisimulation metric is characterized using the quantitative analogue of the bisimulation game, meaning that two states at some given distance can mimic each other’s transitions and evolve to distributions that are at distance not greater than the distance between the source states. Technically, we need a notion that lifts pseudometrics from states to distributions (to capture probabilistic choices).

A 1-bounded pseudometric on terms is lifted to a 1-bounded pseudometric on distributions by means of the Kantorovich pseudometric [15]. This lifting is the quantitative analogue of the lifting of bisimulation equivalence relations on terms to bisimulation equivalence relations on distributions [49].

A *matching* for a pair of distributions is a distribution over the product state space with left marginal , i.e. for all , and right marginal , i.e. for all . Let denote the set of all matchings for . Intuitively, a matching may be understood as a transportation schedule that describes the shipment of probability mass from to . Historically this motivation dates back to the Monge-Kantorovich optimal transport problem [51].
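
In the special case where the ground distance on terms is the discrete metric (distance 1 between any two distinct terms), the Kantorovich lifting coincides with the total variation distance, which has a simple closed form; the sketch below illustrates the transport intuition under that assumption:

```python
def tv_distance(mu, nu):
    """Kantorovich distance for the discrete ground metric:
    1 minus the probability mass that can stay in place
    (the sum of pointwise minima)."""
    support = set(mu) | set(nu)
    return 1.0 - sum(min(mu.get(t, 0.0), nu.get(t, 0.0)) for t in support)

mu = {"s": 0.5, "t": 0.5}
nu = {"s": 0.7, "u": 0.3}
# An optimal transportation schedule keeps 0.5 mass on s
# and must move the remaining 0.5.
assert abs(tv_distance(mu, nu) - 0.5) < 1e-9
```

For a general ground pseudometric the optimal matching is a linear program over the transportation polytope; the discrete case above is just the simplest instance.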

In order to capture nondeterministic choices, we need to lift pseudometrics on distributions to pseudometrics on sets of distributions.

#### Coinductive characterization

A 1-bounded pseudometric is a bisimulation metric if for all pairs of terms and each transition of can be mimicked by a transition of with the same label and the distance between the accessible distributions does not exceed the distance between and . By means of a *discount factor* , we can specify how much the behavioral distance of future transitions is taken into account [11]. A discount factor of 1 expresses no discount, meaning that the differences in the behavior between and are considered irrespective of after how many steps they can be observed.

We refer to as the bisimulation transfer condition. We call the smallest (w.r.t. ) -bisimulation metric *-bisimilarity metric* [13] and denote it by the symbol . We mean by *-bisimulation distance* between and the distance . If is clear from the context, we may refer by bisimulation metric, bisimilarity metric and bisimulation distance to -bisimulation metric, -bisimilarity metric and -bisimulation distance. Moreover, we may call the -bisimilarity metric also non-discounting bisimilarity metric. Bisimilarity equivalence is the kernel of the -bisimilarity metric [16], namely iff and are bisimilar.

#### Fixed point characterization

We now provide an alternative characterization of bisimulation metrics in terms of prefixed points of an appropriate monotone bisimulation functional [13]. The bisimilarity metric is then the least fixed point of this functional. Moreover, the fixed point approach also allows us to express up-to- bisimulation metrics, which measure the bisimulation distance for only the first transition steps.

It is easy to show that is a monotone function on . The following proposition characterizes bisimulation metrics as prefixed points of .

Proposition ? provides the fixed point characterization of bisimulation metrics and shows that it coincides with the coinductive characterization of Definition ?. Since is a monotone function on the complete lattice , we can characterize the bisimilarity metric as its least fixed point.

Moreover, the fixed point approach allows us to define a notion of bisimulation distance that considers only the first transition steps.

We call the up-to- bisimulation distance between and .

If the PTS is image-finite and, moreover, the target distribution of each transition has finite support, then is monotone and continuous, which ensures that the closure ordinal of is [48, Section 3]. As a consequence, the up-to- bisimulation distances converge to the bisimulation distances in the limit, which opens the door to showing properties of the bisimulation metric by a simple inductive argument [48].
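
The fixed-point iteration can be made concrete on a tiny finite PTS. The sketch below (all names illustrative) restricts to Dirac transitions so that the Kantorovich lifting of a distance between Dirac distributions is just the ground distance; iterating the functional from the bottom element computes the up-to-k distances, which stabilize at the bisimilarity metric:

```python
LAMBDA = 0.5  # discount factor (illustrative choice)

# Dirac-only PTS: trans[state][action] = set of successor states.
trans = {
    "s":  {"a": {"s1"}},
    "s'": {"a": {"s2"}},
    "s1": {"b": {"stop"}},   # s1 can do b ...
    "s2": {},                # ... but s2 cannot
    "stop": {},
}
states = list(trans)
actions = {"a", "b"}

def step(d):
    """One application of the bisimulation functional to distance d."""
    new = {}
    for t in states:
        for u in states:
            vals = []
            for a in actions:
                dt, du = trans[t].get(a, set()), trans[u].get(a, set())
                if bool(dt) != bool(du):
                    vals.append(1.0)  # one side cannot mimic the a-transition
                elif dt and du:
                    # Dirac targets: lifted distance = discounted ground distance.
                    h1 = max(min(LAMBDA * d[(x, y)] for y in du) for x in dt)
                    h2 = max(min(LAMBDA * d[(x, y)] for x in dt) for y in du)
                    vals.append(max(h1, h2))
            new[(t, u)] = max(vals, default=0.0)
    return new

d = {(t, u): 0.0 for t in states for u in states}  # bottom element
for _ in range(10):  # up-to-k distances; converged here after two steps
    d = step(d)

assert d[("s1", "s2")] == 1.0          # s1 can do b, s2 cannot
assert d[("s", "s'")] == LAMBDA * 1.0  # difference one step away, discounted
```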

#### Properties of bisimulation metrics

We now give an important property of bisimulation metrics that will be essential for the argumentation later in the technical sections.

The bisimulation distance between states and measures the difference of the reactive behavior of and (i.e. which actions can or cannot be performed) along their evolution. An important distinction is whether two states can perform the same initial actions. In this case, the behavioral distance is given by the bisimulation game on the derivatives. Otherwise, the two states are assigned the maximal distance of 1, since there is a transition by one of these states that cannot be mimicked by the other state.

We say that states and *do not totally disagree* if their bisimulation distance is strictly less than 1. If states do not totally disagree, then they agree on which actions they can perform immediately.

We start with Proposition ?. ? and reason as follows.

Now we show Proposition ?. ?. By Proposition ? we get that implies . The thesis now follows from Proposition ?. ?.

Moreover, the implications in both cases also hold in the other direction.

#### Properties of the Kantorovich lifting

The Kantorovich pseudometric satisfies important properties that will be essential to prove our technical results. In detail, the Kantorovich lifting functional is monotone, the Dirac operator is an isometric embedding of the metric space of states into the metric space of distributions, and probabilistic choice distributes over the Kantorovich lifting.

We now show an important new result stating that the Kantorovich lifting preserves concave moduli of continuity of language operators. In other words, moduli of continuity of language operators distribute over probabilistic choices.

We assume an optimal matching such that , i.e. a matching between and which yields the Kantorovich distance . We define a new distribution over the product space by

for all . First, we show that is a joint probability distribution with left marginal and right marginal . The left marginal is

where the equality follows by induction over , with induction step

The right marginal is computed analogously. Hence, , i.e. is a matching for distributions and .

The proof obligation can be derived now by

whereby the reasoning steps are derived as follows: step 1 from the fact that is a matching for distributions and , step 2 by the definition of , step 3 by the assumption , step 4 by using Jensen’s inequality for the concave function , step 7 by , and step 8 by the definition of .
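
The central step is Jensen's inequality for the concave modulus: applied to a convex combination, a concave function dominates the convex combination of its values. A small numeric sanity check of this inequality (with an illustrative concave modulus, not the paper's):

```python
import math

def z(x):
    """An illustrative concave modulus of continuity."""
    return math.sqrt(x)

# Jensen for concave z: z(sum_i p_i * x_i) >= sum_i p_i * z(x_i).
p = [0.2, 0.3, 0.5]           # convex weights
x = [0.0, 0.25, 1.0]          # distances in [0, 1]
lhs = z(sum(pi * xi for pi, xi in zip(p, x)))
rhs = sum(pi * z(xi) for pi, xi in zip(p, x))
assert lhs >= rhs
```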

### 2.3 PGSOS Specifications

We will specify the operational semantics of operators by SOS rules in the probabilistic GSOS format [5]. The probabilistic GSOS format, PGSOS format for short, is the quantitative generalization of the classical nondeterministic GSOS format [7]. It is more general than earlier formats [37], which consider transitions modeling that a term reaches, through an action, another term with some probability. The probabilistic GSOS format allows us to specify probabilistic nondeterministic process algebras, such as probabilistic CCS [33], probabilistic CSP [33] and probabilistic ACP [3].

The PGSOS constraints ?– ? are precisely the constraints of the nondeterministic GSOS format [7] where the variables in the right-hand side of the literals are replaced by distribution variables.

The last property ensures that the supported model (Definition ?) is image-finite, such that the fixed point characterization of bisimulation metrics coincides with the coinductive characterization (Proposition ?).

The operational semantics of terms is given by inductively applying the respective PGSOS rules. Then, a supported model of a PTSS describes the operational semantics of all terms. In other words, a supported model of a PGSOS specification is a PTS whose transition relation contains all and only those transitions for which the rules offer a justification.

The supported transitions of a PTSS form the supported model of .

Each PTSS in PGSOS format has a supported model, which is moreover unique [7]. We also call the unique supported PTS of a PTSS its *induced model*.

Intuitively, a term represents the composition of terms by operator . A rule specifies some transition that represents the evolution of the composed term by action to the distribution .

## 3 Non-recursive processes

We start by discussing compositional reasoning over probabilistic processes that are composed by non-recursive process combinators. First we introduce the most common non-recursive process combinators, then we study the distance between processes composed by these combinators, and finally we analyze their compositionality properties. Our study of compositionality properties generalizes earlier results of [16], which considered only a small set of process combinators and only the compositionality property of non-expansiveness. The development of tight bounds on the distance between composed processes (necessary for effective metric assume-guarantee performance validation) is novel.

### 3.1 Non-recursive process combinators

We now introduce a probabilistic process algebra that comprises many of the probabilistic process combinators from CCS [33] and CSP [33]. Assume a set of actions , with denoting the successful termination action. Let be the signature with the following operators:

constants (stop process) and (skip process);

a family of -ary probabilistic prefix operators with , , and ;

binary operators

(sequential composition),

(alternative composition),

(probabilistic alternative composition), with ,

(synchronous parallel composition),

(asynchronous parallel composition),

(probabilistic parallel composition), with , and

for each (CSP-like parallel composition).

The PTSS is given by the set of PGSOS rules in Table ? and Table ?.

The probabilistic prefix operator expresses that the process can perform action and evolves to process with probability . Sometimes we write for and for (deterministic prefix operator). The sequential composition and the alternative composition are as usual. The synchronous parallel composition describes the simultaneous evolution of processes and , while the asynchronous parallel composition describes the interleaving of and where both processes can progress by alternating at any rate the execution of their actions. The CSP-like parallel composition describes multi-party synchronization where and synchronize on actions in and evolve independently for all other actions.

The probabilistic variants of the alternative composition and the asynchronous parallel composition replace the nondeterministic choice of their non-probabilistic variant by a probabilistic choice. The probabilistic alternative composition evolves to the probabilistic choice between a distribution reached by (with probability ) and a distribution reached by (with probability ) for actions which can be performed by both processes. For actions that can be performed by either only or only , the probabilistic alternative composition behaves just like the nondeterministic alternative composition . Similarly, the probabilistic parallel composition evolves to a probabilistic choice (with respectively the probability and ) between the two nondeterministic choices of the nondeterministic parallel composition for actions which can be performed by both and . For actions that can be performed by either only or only , the probabilistic parallel composition behaves just like the nondeterministic parallel composition .
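
The one-step behavior of the probabilistic alternative composition described above can be sketched as follows. This is an illustrative encoding (the names `der_P`, `der_Q`, `mix` and `prob_alt` are hypothetical), where each process is given by a map from actions to its sets of derivative distributions:

```python
def mix(p, mu, nu):
    """Convex combination p*mu + (1-p)*nu of two distributions."""
    out = {}
    for t, q in mu.items():
        out[t] = out.get(t, 0.0) + p * q
    for t, q in nu.items():
        out[t] = out.get(t, 0.0) + (1 - p) * q
    return out

def prob_alt(p, der_P, der_Q):
    """One-step semantics of the probabilistic alternative composition."""
    res = {}
    for a in set(der_P) | set(der_Q):
        if a in der_P and a in der_Q:
            # Both can do a: probabilistic choice between their derivatives.
            res[a] = [mix(p, mu, nu) for mu in der_P[a] for nu in der_Q[a]]
        else:
            # Only one can do a: behaves like the nondeterministic
            # alternative composition.
            res[a] = list(der_P.get(a, der_Q.get(a)))
    return res

der_P = {"a": [{"s": 1.0}], "b": [{"t": 1.0}]}
der_Q = {"a": [{"u": 1.0}]}
r = prob_alt(0.25, der_P, der_Q)
assert r["a"] == [{"s": 0.25, "u": 0.75}]  # shared action: mixed distribution
assert r["b"] == [{"t": 1.0}]              # exclusive action: unchanged
```

The probabilistic parallel composition follows the same pattern, mixing the two nondeterministic interleaving choices instead of the two derivative distributions.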

### 3.2 Distance between processes combined by non-recursive process combinators

We now develop tight bounds on the distance between processes combined by the non-recursive process combinators presented in Table ? and Table ?. This will allow us to derive the compositionality properties of those operators. As we will discuss two different compositionality properties for non-recursive process combinators (non-extensiveness, Definition ?, and non-expansiveness, Definition ?), in this section we split the discussion of the distance bounds accordingly. We use disjoint extensions of the specification of the process combinators in order to reason over the composition of arbitrary processes.

We will express the bound on the distance between composed processes and in terms of the distance between their respective components and . Intuitively, given a probabilistic process we provide a bound on the distance to the respective probabilistic process where each component is replaced by the component .

We start with those process combinators that satisfy the later discussed compositionality property of non-extensiveness (Definition ?).

First we consider the probabilistic prefix operator (Proposition ?. ?). The only transitions from and are and . Hence we need to show that . This property can be derived by Proposition ? as follows:

We proceed with the alternative composition operator (Proposition ?. ?). If either or then the statement is trivial since we deal with 1-bounded pseudometrics. Hence, we assume and . We consider now the two rules specifying the alternative composition operator and show that in each case, whenever a transition is derivable by one of the rules, there is a transition derivable by the same rule s.t. .

Assume that is derived from . Since and satisfies the transfer condition of the bisimulation metrics, there exists a transition for a distribution with . Finally, from we derive .

Assume that is derived from . The argument is the same as in the previous case.

We conclude with the probabilistic alternative composition operator (Proposition ?. ?). If either or then the statement is trivial since we deal with 1-bounded pseudometrics. Hence, we assume and . We consider now the three rules specifying the probabilistic alternative composition operator and show that in each case, whenever a transition is derivable by one of the rules, there is a transition derivable by the same rule s.t. .

Assume that is derived from and . Since and satisfies the transfer condition of the bisimulation metrics, there exists a transition with . Since , by Proposition ?. ? the processes and agree on the actions they can perform immediately. Thus . Hence we can derive the transition .

Assume that is derived from and