Probabilistic Rewriting: Relations between Normalization, Termination, and Unique Normal Forms



We investigate how techniques from Rewrite Theory can help us to study calculi whose evaluation is both probabilistic and non-deterministic (think of the untyped probabilistic λ-calculus, in which non-determinism arises from choosing between different redexes). We are interested in the relations between weak and strong normalization, and in when the result is unique. We also investigate methods to compare strategies.

1 Introduction

Rewriting Theory [34] is a foundational theory of computing. Its impact extends to both the theoretical side of computer science and the development of programming languages. A clear example of both these influences is the paradigmatic term rewriting system, the λ-calculus, which can be seen as the foundation of functional programming.

Abstract Rewriting Systems (ARS) constitute the general theory which captures the common substratum of rewriting theory, independently of the particular structure of the objects. It studies properties of term transformations, such as normalization, termination, confluence, unique normal forms, and the relations among them. Such results are a powerful set of tools which can be used when we study the computational and operational properties of any calculus or programming language. Furthermore, the theory provides tools to study and compare strategies, which become extremely important when a term may have reductions leading to a normal form, but is not necessarily terminating. For such a system, we need to know: is there a strategy which is guaranteed to lead to a normal form, if one exists (normalizing strategies)? Which strategies diverge whenever at all possible (perpetual strategies)?

Probabilistic Computation models uncertainty. It has long been central to several areas of theoretical computer science (computational complexity, cryptography, randomized computation) which have developed probabilistic computational models such as automata [29], Turing machines [32], and the λ-calculus [31]. The pervasive role it is assuming in areas as diverse as robotics, machine learning, and natural language processing has stimulated research on probabilistic programming languages [30, 20, 26], whose development is increasingly active. A typical programming language supports at least discrete distributions, by providing a probabilistic construct which models sampling from a distribution. This is also the most concrete way to endow the λ-calculus with probabilistic choice [27, 24, 13].

Within the considerable research on models of probabilistic systems, we wish to mention that probabilistic rewriting is the explicit base of PMaude [1], a language for specifying probabilistic concurrent systems which has both a rigorous formal basis and the characteristics of a high-level programming language. In PMaude, rewriting is both probabilistic and non-deterministic.

Probabilistic Rewriting. Somewhat surprisingly, while a large and mature body of work supports the study of rewriting systems – even infinitary ones [10, 17] – work on the abstract theory of probabilistic rewriting systems is sparse. The notion of Probabilistic Abstract Reduction Systems (PARS) has been introduced by Bournez and Kirchner in [6], and then extended in [5] to account for non-determinism. Recent work [22, 12, 18, 2] shows an increased attention.

[Tree diagram: from a, each step reaches either a again or the terminal tt, the probability halving along each branch (1/2, 1/4, 1/8, …).]

Figure 1: Reduction tree of the rule a → {a^(1/2), tt^(1/2)}

The key element in probabilistic rewriting is that even when the probability that a term leads to a normal form is 1 (almost sure termination), that degree of certitude is typically not reached in any finite number of steps, but appears as a limit. Think of a rewrite rule (as in Fig. 1) which rewrites a to either a itself or the value tt, with equal probability 1/2. We write this a → {a^(1/2), tt^(1/2)}.
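The evolution of this rule on distributions can be simulated directly; here is a minimal Python sketch (the dictionary encoding of distributions is our choice, not the paper's formalism; the element names a and tt follow Fig. 1):

```python
from fractions import Fraction

def step(dist):
    """One global rewrite step of the rule a -> {a: 1/2, tt: 1/2};
    tt is terminal, so its mass is carried over unchanged."""
    out = {}
    for x, p in dist.items():
        targets = {'a': p / 2, 'tt': p / 2} if x == 'a' else {x: p}
        for y, q in targets.items():
            out[y] = out.get(y, Fraction(0)) + q
    return out

d = {'a': Fraction(1)}          # Dirac distribution on a
for _ in range(4):
    d = step(d)
# after n steps the mass on the normal form tt is 1 - (1/2)^n
```

The mass on tt grows towards 1 but never reaches it in finitely many steps, which is exactly the "limit" phenomenon described above.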

The most well-developed literature on PARS is that concerned with methods to prove almost sure termination [5, 14, 2] (indeed, the work presented in this paper was triggered by [22], where – out of need – the authors had to develop basic ARS-like results for PARS). Despite the lack of a substantial PARS theory, considering rewrite rules subject to probabilities opens numerous questions, both on the abstract properties of PARS and on the proof techniques, which motivate the investigation in this paper.

We consider a rewrite relation on distributions. A “limit normal form” is a distribution over all the possible values of the program. This is coherent with the point of view that the meaning of a program is the value it evaluates to. The intuition is as follows (see [20]). Imagine an experiment in which the program is executed, and the random choices are made by sampling. This process defines a distribution over the various outputs that the program can produce. Observe that the limit distribution need not have total measure 1, because some runs may diverge.

What happens if the evaluation of a term is non-deterministic? Non-determinism arises naturally in the λ-calculus, because a term may have several redexes, giving rise to different possible reductions. Below (Sec. 1.1) we discuss this by looking at an example of a probabilistic λ-term which reaches different (and incomparable) limit distributions.

The following questions on PARS are then natural:

  • the possibility to reach a limit distribution of measure 1 (analogous to weak normalization);

  • the necessity to reach a limit distribution of measure 1 (analogous to strong normalization);

  • if two rewrite sequences from the same term converge to limit distributions α and β, is α = β? (analogous to the unique normal form property).

Content and contributions. After introducing and motivating our formalism (Sec. 3), we investigate how to generalize the notions of weak normalization (WN), strong normalization (SN) and unique normal form (UN) to the probabilistic setting (Sec. 4). We then investigate if and how techniques from ARS can help us to understand such properties, and uncover relations between them.

We always pay close attention to local conditions. In particular, we develop a family of methods which exploit and generalize a proposal by van Oostrom [35], and which are based on Newman’s property of Random Descent [25] (see Sec. 1.1). This turns out to provide fruitful proof techniques which are well suited to PARS.

The Random Descent property expresses the fact that non-determinism in the evaluation of a term is irrelevant w.r.t. properties of interest, such as the fact that a reduction leads to normal form, and its length. We investigate a similar property in the probabilistic setting (Sec. 5), and characterize it by a local condition, which we then extend (Sec. 7) to a method to compare strategies (“one strategy is always better than another”).

A contribution and application of our results is the introduction of weak Λ⊕ (Sec. 6), a well-behaved probabilistic λ-calculus whose evaluation is non-deterministic, but which has unique normal forms.

1.1 A motivating example, and some background

Λ⊕. Let us consider a paradigmatic calculus, Λ⊕ [27, 24, 13]: the untyped λ-calculus extended with a binary operator ⊕ which models probabilistic choice. Let t ⊕ s here just flip a fair coin: it reduces to either t or s with equal probability 1/2; we write this as t ⊕ s → {t^(1/2), s^(1/2)}. Consider

where tt and ff are boolean constants, Ω is a divergent term, and XOR is a binary operator computing the exclusive or. Depending on which redex we fire first, the term may evaluate to different limit distributions (see also Ex. 37). The two distributions are not even comparable.

The fact that the λ-calculus is confluent implies that, even if the reduction of a term is possibly nonterminating, if a normal form exists, it is unique. This is not the case for Λ⊕. The typical way out in probabilistic λ-calculi (e.g. [27, 24]) is to fix a deterministic reduction strategy. But is this satisfactory? There are at least two reasons to say it is not.

  • Deterministic strategy, parallel implementation. We can impose a deterministic strategy, such as “leftmost-outermost”. However, even then, it may still be desirable to have a parallel implementation, which necessarily introduces non-determinism in the evaluation. This is exactly what happens in [22], where our probabilistic calculus is deterministic; however, its parallel implementation (into probabilistic nets and a probabilistic token machine) admits non-deterministic reductions.

  • Determinism of the evaluation can be relaxed without damage. In Sec. 6 we introduce weak Λ⊕, a probabilistic λ-calculus whose evaluation is non-deterministic, but which has unique normal forms; indeed, it is as well behaved as the standard weak call-by-value λ-calculus.

Weak call-by-value λ-calculus. The λ-calculus can be seen both as an equational theory on λ-terms and as an abstract model of computation. From the latter point of view, the meaning of any λ-term is the value it evaluates to. In this setting, abstractions are seen as values, and it is therefore natural to consider weak evaluation [15, 8] (i.e. reduction does not evaluate function bodies, aka the scope of λ-abstractions). In practical implementations, weak evaluation is more realistic than full beta reduction (see [4, 3]); it corresponds to runtime systems in functional languages, since runtimes pass arguments to functions and never compute function bodies. Functional programming languages are indeed based on a simplified form of λ-calculus, with two crucial restrictions: evaluation is weak, and terms are closed (that is, they have no free variables). In particular, weak call-by-value is the basis of all the ML/CAML family of functional languages; we will study it in the probabilistic setting.

The weak call-by-value λ-calculus has a striking property. First, if a term has a normal form, then any sequence of reductions will find it; second, the number of steps to reach it is always the same (see e.g. [23] for an account). This means that any choice of reduction is normalizing, and none is better than another. The reason for this is an ARS property called Random Descent (RD), of which the diamond property is an instance.

Random Descent. Newman’s Random Descent property (RD) [25] is an ARS property which guarantees that it suffices to prove normalization to establish both termination and uniqueness of normal forms. In fact, it does more than that. If a system has random descent, reductions to normal form need not be unique, but they have unique length. In essence, it says:

(RD) “if a normal form exists, all reductions reach it, and all have the same length”,

or, in the original Newman’s terminology: the end-form is reached by random descent. (More formally, whenever a term reduces to a normal form in n steps, all maximal reductions from that term have length n and end in that same normal form.)

In [35], van Oostrom gives a characterization of RD by a local property1 and proposes Random Descent as a uniform method to (locally) compare strategies for normalization and minimality (resp. perpetuality and maximality). Recent work [36] extends the method and abstracts the notion of length into a notion of measure.

In Sections 5 and 7 we investigate an analogue of the RD method. This turns out to provide a family of fruitful proof techniques which are well suited to PARS.

On proof techniques for probabilistic rewriting. In the probabilistic setting, termination involves limits. One should therefore not assume that standard properties of ARS hold for PARS. To investigate ARS-like properties, there are two issues: we need to find the right formulation and the right proof technique. The game-changer for the proof techniques is that the rewriting relation is not well-founded. A typical example is Newman’s Lemma (see Sec. 4.2 for a more technical discussion), which is known not to hold in general for infinitary rewriting [16, 19]. All proofs of Newman’s Lemma rely on a fundamental fact of ARS: termination implies that reduction is a well-founded relation. This is not the case in infinitary rewriting.

The beauty of local. To work locally means to reduce a test problem which is global (quantifying over all reductions from a term) to local properties (quantifying only over one-step reductions from the term). This dramatically reduces the search space when testing a property.

The privileged way to prove that an ARS has unique normal forms is by proving a stronger property, confluence (CR). Confluence is however not a necessary condition; moreover, its global nature makes it a difficult condition to establish. There is a standard way to factorize the problem: (1) prove termination and (2) prove a local property, local confluence (WCR). This is exactly Newman’s lemma: Termination + WCR implies CR. Other local properties (in particular the diamond property) imply confluence. Let us recall the definitions (→ indicates a one-step reduction, →* multiple steps; see Sec. 2.1).

  • Confluence (CR): u *← t →* v implies that there exists s s.t. u →* s *← v.

  • Local confluence (WCR): u ← t → v implies that there exists s such that u →* s *← v.

  • Diamond property: u ← t → v implies u = v, or there exists s s.t. u → s ← v.

The beauty of Newman’s lemma is that a global property (CR) is guaranteed by a local property (WCR). In the rest of the paper we will always aim at local conditions.

2 Basics

We assume the reader familiar with the basic notions of rewrite theory (Ch. 1 of [34]). We assume only some familiarity with discrete probability theory (see e.g. [7]). Let us revise the basics of both languages.

2.1 Basics on ARS

An abstract rewrite system (ARS) is a pair (A, →) consisting of a set A and a binary relation → on A. We write →* for the transitive reflexive closure of →. An element a is terminal or in normal form if there is no b with a → b; NF denotes the set of the terminal elements of A. If a →* u and u ∈ NF, we say a has a normal form u. → has the normal form property (NFP) if c *← a →* u with u ∈ NF implies c →* u. → has the property of unique normal form (UN) if u *← a →* v with u, v ∈ NF implies u = v. NFP implies UN.

An important property of an ARS is whether sequences of reductions may or must lead to a normal form. An ARS is said to be strongly normalizing (SN, aka terminating)2 if there is no infinite sequence a0 → a1 → a2 → ⋯; it is said to be weakly normalizing (WN, aka normalizing) if each element has a normal form.

2.2 Basics on Probabilities

The intuition is that random phenomena are observed by means of experiments (running a probabilistic program is such an experiment). Each experiment results in an outcome. The collection of all possible outcomes is represented by a set, called the sample space . Any subset of the sample space is called an event (a subset of possible outcomes).

Example 1.

Consider the experiment of tossing a die once. The sample space is the set Ω = {1, 2, 3, 4, 5, 6}. The probability measure of each outcome is 1/6. The event “the result is odd” is the subset {1, 3, 5}, whose probability measure is 1/2.
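The example can be checked with a few lines; a minimal Python sketch (the dictionary encoding of the distribution and the helper name prob are ours):

```python
from fractions import Fraction

# Sample space and uniform probability distribution of a fair die.
omega = {1, 2, 3, 4, 5, 6}
mu = {o: Fraction(1, 6) for o in omega}

def prob(event):
    """Probability measure of an event, i.e. a subset of omega."""
    return sum(mu[o] for o in event)

odd = {o for o in omega if o % 2 == 1}
# prob(odd) == 1/2, prob(omega) == 1
```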

Formally, a probability space is a triple (Ω, F, μ) given by a sample space Ω, a collection F of events (which satisfies the σ-algebra axioms), and a function μ which assigns a probability measure to events.

When the set Ω is countable, the theory becomes very simple. The key notion here is that of probability distribution. A discrete probability distribution on a countable set Ω is a (possibly partial) function μ from Ω to [0, 1] such that Σ_{ω∈Ω} μ(ω) = 1. A discrete probability space is given by a pair (Ω, μ), where the sample space Ω is countable, and μ is a probability distribution on Ω. A probability measure is assigned to any event E ⊆ Ω by μ(E) = Σ_{ω∈E} μ(ω).

A discrete random variable on (Ω, μ) is a function X from Ω to another countable set B. X induces a probability distribution on B by composition, i.e. μ_X(b) = μ({ω : X(ω) = b}). Thus (B, μ_X) is also a probability space.

The expected value (also called the expectation or the mean) of a random variable X is the weighted (in proportion to probability) average of the possible values of X. Assume X is discrete and a non-negative function; then E[X] = Σ_{b∈B} b · μ(X = b).
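The expectation is straightforward to compute for the die of Example 1; a Python sketch (the helper name expectation is ours):

```python
from fractions import Fraction

mu = {o: Fraction(1, 6) for o in range(1, 7)}   # fair die

def expectation(X, mu):
    """E[X]: each value X(o) weighted by the probability mu(o)."""
    return sum(X(o) * p for o, p in mu.items())

mean = expectation(lambda o: o, mu)   # the identity random variable
# mean == 7/2
```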

2.3 Distributions: notations and general operations.

Let A be a countable set. We write D(A) for the set of probability subdistributions, i.e. functions α : A → [0, 1] such that the total measure ‖α‖ = Σ_{a∈A} α(a) is at most 1. We need subdistributions to account for nontermination; in the rest of this paper, with a slight abuse of language, we often use the term distribution also for probability subdistributions. We indicate with supp(α) the support of α, i.e. {a ∈ A : α(a) > 0}.

Comparing. On distributions, we define the relation ≤ pointwise: α ≤ β if α(a) ≤ β(a) for each a ∈ A. Sum and multiplication. Multiplication by a scalar q (q · α) is defined, if q times the total measure of α is at most 1, by (q · α)(a) = q · α(a). The sum (α + β) is defined if the total measures of α and β sum to at most 1; in this case (α + β)(a) = α(a) + β(a). Restriction. Assume U ⊆ A; we write α↾U for the subdistribution which is obtained by restriction of α to U, in the following sense: α↾U(a) = α(a) if a ∈ U, and 0 otherwise.

We adopt the following convention. If B ⊆ A and α is a distribution on B, then we also regard α as a distribution on A, with the implicit assumption that this function behaves as α on B, and is 0 otherwise.

Notation 2.

We represent a distribution by explicitly indicating the support, and (as superscript) the probability assigned to each element by α. We therefore write α = {a1^(p1), …, an^(pn)} if α(ai) = pi and α(a) = 0 otherwise. δ_a is the Dirac distribution which concentrates all probability in a; when clear from the context, we write it simply a.

By using the definition of multiplication and sum, a distribution such as {a^(1/2), b^(1/2)} can also be written (1/2) · δ_a + (1/2) · δ_b.
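The operations of this section can be sketched concretely; a minimal Python sketch (the dictionary encoding of subdistributions and the helper names mass, scale, add, restrict are ours):

```python
from fractions import Fraction

def mass(alpha):
    """Total measure of a subdistribution."""
    return sum(alpha.values())

def scale(q, alpha):
    """Scalar multiplication q * alpha, defined when q * mass(alpha) <= 1."""
    assert q * mass(alpha) <= 1
    return {a: q * p for a, p in alpha.items()}

def add(alpha, beta):
    """Sum alpha + beta, defined when mass(alpha) + mass(beta) <= 1."""
    assert mass(alpha) + mass(beta) <= 1
    out = dict(alpha)
    for a, p in beta.items():
        out[a] = out.get(a, Fraction(0)) + p
    return out

def restrict(alpha, U):
    """Restriction of alpha to the subset U (0 outside U)."""
    return {a: p for a, p in alpha.items() if a in U}

dirac = lambda a: {a: Fraction(1)}
alpha = add(scale(Fraction(1, 2), dirac('a')), scale(Fraction(1, 2), dirac('b')))
# alpha == {'a': 1/2, 'b': 1/2}, i.e. (1/2)·delta_a + (1/2)·delta_b
```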

3 Pars

[Two tree diagrams rooted at the element 2: on the left, the simple random walk (each n > 0 moves to n−1 or n+1, with probability 1/2 each); on the right, the same walk where the gambler may also stop.]

Figure 2: Simple random walk on the natural numbers
Figure 3: Example 5

We recall the basic definitions of PARS from [6, 5], and then investigate how such a system evolves.

Pars. A probabilistic abstract rewrite system (PARS) is a pair (A, →) consisting of a countable set A and a relation → ⊆ A × D(A) such that for each a → β, β has total measure 1 and finite support. We write a → β for (a, β) ∈ →, and we call it a rewrite step, or a reduction. An element a ∈ A is terminal or in normal form if there is no β with a → β. We denote by NF the set of the terminal elements of A. A PARS is deterministic if, for all a, there is at most one β with a → β.

Remark 3.

The intuition behind a → β is that the rewrite step from a reaches b with probability β(b).

The event NF. Let α ∈ D(A), with (A, →) a PARS. The key event of interest for us is the set NF of normal forms. Observe that α(NF) = Σ_{a∈NF} α(a) is the probability that the outcome of the reduction is a normal form (similar to Example 1).

Probabilistic vs Non-deterministic. It is important to be clear about the distinction between probabilistic choice (which globally happens with certitude) and non-deterministic choice (which leads to potentially different distributions of outcomes). Let us discuss some examples.

Example 4 (A deterministic PARS).

Fig. 3 describes a simple random walk over the natural numbers, which can be encoded by the following PARS on ℕ: for each n > 0, n → {(n−1)^(1/2), (n+1)^(1/2)}.

Observe that this PARS is deterministic, because for every element, at most one choice applies. The element 0 is terminal.
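The rules of this example can be encoded directly; a minimal Python sketch (the function name walk and the use of None for "no rule applies" are our encoding choices):

```python
from fractions import Fraction

def walk(n):
    """The random-walk PARS on the natural numbers:
    0 is terminal; each n > 0 moves to n-1 or n+1 with probability 1/2.
    Returning None encodes that no rule applies (normal form)."""
    if n == 0:
        return None
    return {n - 1: Fraction(1, 2), n + 1: Fraction(1, 2)}
```

Since walk is a function (at most one distribution per element), the determinism of the PARS is built into the encoding.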

Example 5 (A non-deterministic PARS).

Let us assume (Fig. 3) that the random walk describes a gambler starting with 2 points and playing a game where each time he can gain 1 point with probability 1/2 or lose 1 point with probability 1/2. We now assume that he is also given the possibility to stop at any time. The two choices are encoded as follows.

Remark 6.

The form of non-determinism which we had in mind when discussing Λ⊕ is more subtle than in the last example. This is because a term rewriting system (see [34], Ch. 2) has more structure than an ARS. The rewrite steps on terms are defined by a set of rules, which are closed under context. Even if the set of rules has no overlap, the closure under context introduces non-determinism, because in general a term has several redexes. A typical example of this is the λ-calculus with beta reduction. The case of Λ⊕ (as well as weak Λ⊕, see Sec. 6) is similar.

3.1 Evolution of a system described by a PARS: discussion and examples

We have defined a PARS; now we need to explain how such a system evolves. The evolution of a system which is subject to a probabilistic transition is described by a stochastic sequence, called a stochastic process. One possible way to describe the evolution of a PARS is to follow the stochastic evolution of a single run, one sampling at a time, as we have done in Figs. 1, 2 and 3. In each single run, there are two sources of choice: probabilistic (solved by sampling), and non-deterministic. This is the approach in [5]. Here we follow a different way. We describe a state of the system globally, as a distribution on the space of all terms. A possible evolution of the system is then a sequence of distributions. Since the probabilistic choices are taken in their totality, the only source of choice in the evolution is non-determinism (which is what produces different sequences of distributions). This global approach allows us to deal with non-determinism by using techniques which have been developed in Rewrite Theory. Before introducing the formal definitions, let us motivate them by informally examining some examples, and by pointing out why some care is needed.

[Two tree diagrams of sequences of distributions, illustrating the possible evolutions of the non-deterministic systems of Examples 8 and 9: different rule choices produce different sequences of distributions.]

Figure 4: Non-deterministic PARS
Figure 5: Non-deterministic PARS
Example 7 (Fig. 1 continued).

With our approach, the PARS described by the rule a → {a^(1/2), tt^(1/2)} (in Fig. 1) evolves as follows: δ_a ⇒ {a^(1/2), tt^(1/2)} ⇒ {a^(1/4), tt^(3/4)} ⇒ {a^(1/8), tt^(7/8)} ⇒ ⋯.

Example 8 (Fig. 5).

Fig. 5 illustrates the possible evolutions of a non-deterministic system which has two rules. We annotate each arrow with the chosen rule. Observe that the top blue branch corresponds to the deterministic strategy of always choosing the same rule (exactly as happens in Fig. 1).

Example 9 (Fig. 5).

Fig. 5 illustrates the possible evolutions of a system with two rules.

This approach demands some care. If we look at Fig. 3, we observe that after two steps, there are two distinct occurrences of the element 2, which live in two different runs of the program: the run 2.1.2, and the run 2.3.2. In both cases, the next transition only depends on 2, not on the history (as we expect from a Markovian process). The same fact is even more striking in the system described in Fig. 3, because now there are two possible transitions for each nonterminal element. Again, the possible reductions do not depend on the specific run in which 2 occurs: its history is only a way to distinguish the occurrence. For this reason, given a PARS (A, →), we keep track of different occurrences of an element a ∈ A, but not of their history. We formalize these ideas in the next section.

Remark 10 (Markov Decision Processes).

To understand our distinction between occurrences of states in different paths, it is helpful to think of how a system is described in the framework of Markov Decision Processes (MDP) [28]. Indeed, in the same way as ARS correspond to transition systems, PARS correspond to probabilistic transition systems. Let us regard a PARS step as a probabilistic transition, the rule applied acting as a name for the transition. Let us assume a0 is an initial state. In the setting of MDP, a typical element (called a sample path) of the sample space is an alternating sequence of rules and elements starting from a0. The index n is interpreted as time. On this sample space various random variables are defined; for example, Xn, which represents the state at time n. The sequence of the Xn is an example of a stochastic process.

3.2 A rewrite system on distributions

From now on, we fix A to be a countable set on which a PARS (A, →) is defined. We also fix a countable index set L (whose elements we call labels), and define the labelled set as the disjoint union of copies of A indexed by L; we write its elements as a^i, with i ∈ L. We consider distributions over the labelled set with finite support. Observe that if B ⊆ A then a distribution on B is also a distribution on A (by the convention in Sec. 2.3). Letters convention: we use α, β for distributions on A, and ν, μ for distributions on the labelled set.

Remark 11.

Because of our previous discussion, the labelled set, and not A, is the sample space in which we describe the evolution of the system. Observe that we work with two families of probability spaces: distributions on A, and distributions on the labelled set, which are related by natural operations of embedding and flattening.

We move between and in the natural way, as follows.

Each label i defines a labelling injection from A into the labelled set (mapping a to a^i), which also induces a map on distributions, in the obvious way. We assume as given a canonical labelling injection.

If a distribution on A has finite support, then so does its canonical labelling. Conversely, every distribution on the labelled set induces a distribution on A, as follows.

Disjoint sum. On distributions over the labelled set, we define an operation of disjoint sum: the disjoint sum of ν1 and ν2 is their sum, taken after renaming labels so that the supports of ν1 and ν2 are disjoint.

The distribution on A. Given a distribution ν on the labelled set, its flattening is the distribution on A obtained by summing, for each element a, the probabilities of all its labelled occurrences a^i. Please observe that the domain of the flattening is A.

Remark 12.

With the language introduced in Sec. 2.2, the labelling and the flattening are random variables, which respectively induce the distributions just described.

Equivalences. Two distributions on the labelled set are equivalent in A, if they have the same flattening. They are equivalent on the normal forms, if their restrictions to the normal forms have the same flattening.

Example 13.

. Assuming is terminal and are nonterminal, then .
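The flattening and the equivalence in A can be sketched as follows (a Python sketch; labelled elements are encoded as (element, label) pairs, and the concrete labels are arbitrary):

```python
from fractions import Fraction

def flatten(nu):
    """Forget labels: map a distribution on labelled elements
    (element, label) to a distribution on elements, summing the
    probabilities of the different occurrences of each element."""
    out = {}
    for (a, _), p in nu.items():
        out[a] = out.get(a, Fraction(0)) + p
    return out

# Two distinct occurrences of 2 (as after two steps of the random walk)
nu1 = {(0, 'i'): Fraction(1, 4), (2, 'j'): Fraction(1, 4),
       (2, 'k'): Fraction(1, 4), (4, 'l'): Fraction(1, 4)}
# A single occurrence of 2 carrying the same total mass
nu2 = {(0, 'x'): Fraction(1, 4), (2, 'y'): Fraction(1, 2),
       (4, 'z'): Fraction(1, 4)}
# nu1 and nu2 are equivalent in A: they have the same flattening
```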

The relation ⇒. We now define a binary relation ⇒ on distributions over the labelled set, which is obtained by lifting the relation →. Intuitively, ⇒ brings the key notion of one-step reduction to distributions (see discussion in Sec. 8). Let us first observe that given ν, we can always partition it into the part supported on normal forms and its complement; ν is the sum of these two parts, the complement being the subdistribution on the elements for which a reduction exists.

Given a relation , its lifting to a relation is defined by the rule OneStep below


  • in the premiss: for each element a in the support of the reducible part, we choose a PARS rule a → β.

  • in the conclusion: the injection used is the canonical labelling injection (see above). Since the disjoint sum renames the labels, we assume that the labels in each component are distinct from the labels in each other component.

Observe also that always hold.
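The OneStep rule can be sketched operationally: keep the mass on normal forms, and rewrite every non-terminal occurrence, with fresh labels keeping the new occurrences distinct. A Python sketch (the rules/fresh interface is our encoding; generating a globally fresh label per rewritten target is one simple way to keep the components' labels distinct):

```python
from fractions import Fraction

def one_step(nu, rules, fresh):
    """OneStep lifting of a PARS to labelled distributions:
    the part of nu on normal forms is kept unchanged, and EVERY
    non-terminal occurrence is rewritten by a chosen rule.
    rules(a) returns a distribution on A, or None if a is terminal;
    fresh() generates a label not used elsewhere."""
    out = {}
    for (a, i), p in nu.items():
        beta = rules(a)
        if beta is None:                    # terminal part: carried over
            out[(a, i)] = out.get((a, i), Fraction(0)) + p
        else:                               # rewrite this occurrence
            for b, q in beta.items():
                out[(b, fresh())] = p * q   # fresh label per new occurrence
    return out

def walk(n):   # the random walk of Fig. 2, used as illustration
    return None if n == 0 else {n - 1: Fraction(1, 2), n + 1: Fraction(1, 2)}

counter = iter(range(10**6))
fresh = lambda: next(counter)
nu = one_step({(2, 'root'): Fraction(1)}, walk, fresh)
nu = one_step(nu, walk, fresh)   # two distinct occurrences of 2 appear
```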

Example 14.

Let us derive the reduction in Fig. 3. Let δ_2 be the initial state; the superscripts are labels, whose role is simply to distinguish occurrences of the same element (here, 2).

Remark 15 (Index Set).

A natural choice for the index set is the set of natural numbers. Another natural choice for the index set is the set of finite sequences of elements of A, i.e. occurrences are labelled by their path: the label of an occurrence is the sequence of elements crossed to reach it.

Rewrite sequences. A finite rewrite sequence is a non-empty sequence ν0, ν1, …, νn such that νi ⇒ νi+1 for all i. We write ν ⇒* μ to indicate that there is a finite sequence from ν to μ, and ν ⇒^n μ to specify its length n. We write ⟨νn⟩ for n ∈ ℕ to indicate an infinite rewrite sequence.

We will extensively use the following results.

Lemma 16.
  1. If ν ⇒ μ, then the restriction of ν to the normal forms is ≤ the restriction of μ to the normal forms (≤ as defined in Sec. 2.3).

  2. If ν and ν' are equivalent in A, then for each rewrite sequence from ν there exists a rewrite sequence from ν' of the same length, whose elements are pairwise equivalent in A.


(1.) Immediate. (2.) By an easy induction. It is enough, at each step, to choose in the premiss of the rule OneStep the same reduction for all labelled occurrences of the same element. ∎

Intuitively, in this section we have defined a rewrite system on distributions. What makes it not an instance of ARS is the definition of termination as a limit, which we formalize in the next section.

Remark 17 (A general rewrite relation on distributions).

We observe that there are more options for lifting a PARS relation to a rewrite relation on distributions – each may better suit different situations. In this paper we focus on the relation OneStep, because the results we study in Sections 5, 6 and 7 rely on exact one-step rewriting (see Discussion in Sec. 8). In general, however, the reduction of a distribution could fire exactly one occurrence in the distribution, all non-terminal elements (as is the case for OneStep), or anything in between. It is the way we partition ν which makes the difference.

Given a PARS relation , its lifting to a binary relation on distributions is defined by the following rule:


  • the premiss partitions ν into a distribution containing the occurrences to be rewritten, and its complement; i.e., the former is a subdistribution on elements for which a reduction exists, not necessarily all of them;

  • in the premiss: for each in , we choose a PARS rule ;

  • the labelling injection and the disjoint sum are as above for OneStep.

4 Normalization, Termination, Unique Normal Forms

In this section, we study normal forms “at the limit”. The intuition is that the final result (i.e. the limit) of a sequence of rewriting steps should be a distribution on the possible values of the program. A PARS is defined in [5] to be almost surely terminating if the probability that a term leads to a normal form is 1 whatever the rewriting strategy is. We expand and refine this idea, and make a distinction between weak and strong normalization. Why do we not limit ourselves to probability 1? A natural example is the term in Sec. 1.1. This term has a rewrite sequence which leads to a normal form with probability 1, while another rewrite sequence leads to a normal form with smaller probability.

The following observations are useful to help intuition.

Remark 18.

A distribution ν such that δ_t ⇒* ν represents a state in the evolution of the system with initial state t. We are interested in two particular events: the event of reaching a normal form, whose probability is the measure of ν restricted to the normal forms, and the events of reaching a specific normal form b, whose probability is the total mass that ν assigns to the occurrences of b.

Existence of limits. Let ⟨νn⟩ be a rewrite sequence. We observe that if νn ⇒ νn+1 for each n, the sequence of the measures of the νn restricted to the normal forms is nondecreasing (Lemma 16.1) and bounded by 1, and the same holds for the probability of reaching each specific normal form. In both cases, the sequence has a limit, which is the supremum.
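For the system of Fig. 1, this monotone convergence can be observed directly; a Python sketch (we track the probability mass on the normal form tt along the rewrite sequence; the dictionary encoding is ours):

```python
from fractions import Fraction

def step(dist):
    """Fig. 1 again: a -> {a: 1/2, tt: 1/2}; tt is terminal."""
    out = {}
    for x, p in dist.items():
        targets = {'a': p / 2, 'tt': p / 2} if x == 'a' else {x: p}
        for y, q in targets.items():
            out[y] = out.get(y, Fraction(0)) + q
    return out

d, masses = {'a': Fraction(1)}, []
for _ in range(20):
    d = step(d)
    masses.append(d.get('tt', Fraction(0)))   # mass on the normal forms

# masses is nondecreasing and bounded by 1, hence convergent (here to 1)
```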

Limit distributions
Definition 19.

Let be a rewrite sequence starting in . We say that converges to (and to ), written


if .
We call a
limit distribution of and define .

To focus on (resp. ), we may omit to annotate (resp. ), writing (). We write if a converging sequence exists.

Remark 20.

We stress that a limit distribution is a distribution on the normal forms, not on the whole set of elements.

Weak and Strong Normalization. Similarly to the case of an ARS, we need to distinguish between weak and strong normalization (see Sec. 2.1 ).

Definition 21.

Let ,

  • t is q-WN (t weakly normalizes with probability q) if there exists a rewrite sequence starting in δ_t whose limit distribution has measure q.

  • t is q-SN (t strongly normalizes with probability q) if every rewrite sequence starting in δ_t converges to a limit distribution of measure at least q.

  • t is Almost Surely Terminating (AST) if it is 1-SN.

A PARS is q-WN, q-SN, or AST if each of its elements satisfies the corresponding property.

As is common usage (see Sec. 2.1, and footnote), from now on we use the term termination for strong normalization and refer to weak normalization simply as normalization. We speak of q-termination and q-normalization.

Example 22.

The system in Fig. 5 is 1-WN, but not 1-SN. The top rewrite sequence (in blue) and the bottom rewrite sequence (in red) converge to different limit distributions; in between, we have all dyadic possibilities. In contrast, the system in Fig. 5 is AST, even if there is no unique limit distribution. Indeed, there is a continuum of different limits, but all have measure 1.

PAST and Mean Time. Assume a PARS is AST. In the classical case, if a maximal rewrite sequence terminates, its length (number of steps) is finite; we interpret this number as the time to termination. In the probabilistic case, a rewrite sequence from an initial state is (in general) infinite, even if the PARS is AST. However, the expected number of steps (i.e. the average weighted w.r.t. probability, see Sec. 2.2) may be finite. In this case, the PARS is not only AST; it is said to be PAST (Positively AST) (see [5]).

Example 23.

Let us very informally give an idea. Consider the PARS in Fig. 1. Let the sample space be the set of paths ending in a terminal element, and let μ be the probability distribution on it: the path terminating after n steps has probability (1/2)^n. The expected value of the random variable “length of the path” is then Σ_{n≥1} n · (1/2)^n = 2.

The paper [2] makes a nice observation: this expected value admits a very simple formulation in terms of rewrite sequences, when the reduction captures one-step reduction on distributions, as is the case for our lifting ⇒. By using the formulation in [2] (to which we refer for the details), we define the mean number of steps of a rewrite sequence as follows


MeanTime(s) = Σ_{n≥0} ‖μₙ^¬NF‖, where μₙ^¬NF is the restriction of μₙ to the elements which are not in normal form.

Intuitively, each tick in time is weighted with the probability that the computation is still running (essentially, we count the individual steps and use [7], Th. 2.1.24). We calculate separately the expected number of steps performed by nonterminal elements and the expected number of steps performed by terminal elements; the latter contribute nothing to the count, and the total weight of the terminal elements at the limit is 1, because the system is AST.
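For instance (a sketch on a hypothetical one-rule system a ⇒ {a: p, t: 1 − p}, with t terminal; not the paper's example), the mean number of steps can be computed by summing, over n, the nonterminal mass of μₙ:

```python
# MeanTime as the sum over n of the nonterminal mass of mu_n:
# each tick is weighted by the probability that the computation is still running.
# Toy system (hypothetical): a => {a: p_stay, t: 1 - p_stay}, with 't' terminal.
def mean_time(p_stay, tol=1e-12):
    nonterminal = 1.0          # nonterminal mass of mu_0 = {a: 1}
    total = 0.0
    while nonterminal > tol:
        total += nonterminal   # weight step n by P(still running at step n)
        nonterminal *= p_stay  # mass that remains nonterminal after the step
    return total

print(round(mean_time(0.5), 6))  # -> 2.0: geometric distribution, expected 1/(1 - 1/2) steps
```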

Example 24.

In Example 7, the (only) rewrite sequence from has MeanTime .

4.1 Uniqueness of limit distributions

Assume that μ represents a probabilistic term (a program), and that a rewrite sequence from μ converges to β. We regard β as the result of a computation, i.e. as a distribution on the possible values (the normal forms) of μ. If another rewrite sequence from μ converges to β′, it is natural to wonder how β and β′ relate, and to investigate under which conditions β = β′.

Normalization and termination are concerned with quantitative properties: we are only interested in the measure ‖β‖, for β a limit distribution. In this section, we turn our attention to finer-grained properties of the limit distributions. Consider again Fig. 5. All possible rewrite sequences are AST; however, the limit distributions are very different, and it is immediate to check that they span a whole continuum of distributions of measure 1. In Sec. 2.1 we reviewed the ARS notions of UN and NFP. The situation now is more subtle and delicate, because it may well be that ‖β‖ = ‖β′‖ with β ≠ β′, as in Fig. 5.
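To see how a continuum of limit distributions can arise, here is a sketch of our own (a hypothetical two-rule system, in the spirit of Fig. 5): the nonterminal c has rules c ⇒ {a: 1/2, c: 1/2} and c ⇒ {b: 1/2, c: 1/2}. Every strategy terminates with probability 1, but the strategy decides how the unit mass is split between the normal forms a and b:

```python
# Hypothetical PARS with nondeterminism: 'c' has two rules,
#   rule 0: c => {a: 1/2, c: 1/2}    rule 1: c => {b: 1/2, c: 1/2}
# All limit distributions have measure 1, but they differ as distributions.
RULES = {"c": [{"a": 0.5, "c": 0.5}, {"b": 0.5, "c": 0.5}]}

def limit(strategy, steps=60):
    """Approximate the limit distribution under 'strategy' (a map: step index -> rule index)."""
    mu = {"c": 1.0}
    for n in range(steps):
        new = {}
        for elem, p in mu.items():
            targets = RULES[elem][strategy(n)] if elem in RULES else {elem: 1.0}
            for tgt, q in targets.items():
                new[tgt] = new.get(tgt, 0.0) + p * q
        mu = new
    return {e: round(p, 6) for e, p in mu.items() if e not in RULES}

print(limit(lambda n: 0))      # all mass on 'a'
print(limit(lambda n: 1))      # all mass on 'b'
print(limit(lambda n: n % 2))  # alternating: 2/3 on 'a', 1/3 on 'b' -- same measure, different limit
```

Varying how often each rule is chosen yields all dyadic splits between a and b, hence a continuum of limits in the closure, all of measure 1.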

Our goal in the next sections will be to compare limit distributions qualitatively once they are equal quantitatively. For this reason, we will focus on the following property of ⇒:

UN: if β ∈ Lim(μ) and β′ ∈ Lim(μ), then β = β′.

On uniqueness of limit distributions and confluence

Figure 6: Confluence implies NFP.
Figure 7: Sub-Confluence implies NFP.

Let us make some basic observations. We need to point out that, while simple to formulate, UN does not tell the complete story (again, what about a system as in Fig. 5?). It is adequate for our needs because in the next sections we concentrate on systems in which all β ∈ Lim(μ) have the same measure, as is the case if the PARS is AST or (more generally) q-SN. We first observe that, in such a case, the following two properties are equivalent to UN (see Proposition 26):

  • gUN: Lim(μ) has a greatest element.

  • NFP (Normal Form Property): if β is maximal in Lim(μ), then μ ⇒* ρ implies β ∈ Lim(ρ).

gUN is subtly more general than UN. It says that if β ∈ Lim(μ) and β′ ∈ Lim(μ), with β maximal, then β′ ≤ β. gUN appears as an appropriate generalization of the unique normal form property, and indeed it satisfies analogues of two standard facts of ARS: "the Normal Form Property implies UN" (see Proposition 25) and "Confluence implies UN" (see Proposition 27), as we show next.

Proposition 25.

For each PARS, NFP implies gUN.


Let β be maximal in Lim(μ). If β′ ∈ Lim(μ), there is a sequence ⟨μₙ⟩ from μ whose limit is β′. NFP implies that β ∈ Lim(μₙ) for each n, hence μₙ^NF ≤ β, and therefore β′ ≤ β. We conclude that β is the greatest element of Lim(μ). ∎

The following is immediate to verify.

Proposition 26.

If each μ is q-SN, then the three properties (NFP, gUN, UN) are equivalent.

What about confluence? Confluence of the relation ⇒ (in the standard sense)3 does imply gUN. In fact, we show that a weaker notion of confluence suffices. We define the following two properties:

  • ⇒ is confluent in normal form: ρ₁ ⇐* μ ⇒* ρ₂ implies that there exist σ₁, σ₂ such that ρ₁ ⇒* σ₁, ρ₂ ⇒* σ₂, and σ₁^NF = σ₂^NF (i.e. σ₁ and σ₂ agree on normal forms).

  • ⇒ is sub-confluent: ρ₁ ⇐* μ ⇒* ρ₂ implies that there exists σ such that ρ₁ ⇒* σ and ρ₂^NF ≤ σ^NF.

Proposition 27.

For each PARS

  1. Confluence in Normal Form implies gUN;

  2. Sub-Confluence implies gUN.

  3. Confluence in Normal Form implies Sub-Confluence.


(1.) and (2.). We prove that Confluence in Normal Form implies NFP (the proof really only uses Sub-Confluence, see Fig. 7). Let β be maximal in Lim(μ), let ⟨μₙ⟩ be a sequence which converges to β, and assume μ ⇒* ρ₀. As illustrated in Fig. 7, starting from ρ₀, we build a sequence ⟨ρₙ⟩, where ρₙ₊₁ is given by Confluence in Normal Form, by closing μₙ₊₁ and ρₙ. Observe that, by construction, (*) ρₙ^NF ≥ μₙ^NF. Let β″ be the limit of this sequence; since ρ₀ ⇒* ρ₁ ⇒* ⋯, we have β″ ∈ Lim(ρ₀). From (*) we have β″ ≥ β, and therefore, from maximality of β, we conclude β″ = β. Hence β ∈ Lim(ρ₀). (3.) Straightforward. ∎

4.2 More discussion on proof techniques

Observe that in Propositions 25 and 27 the statements have the same flavour as similar ones for ARS, but the notions are not the same. Limits (and therefore notions such as Lim(μ) and UN) do not belong to ARS. For this reason, the rewrite system which we are studying is not simply an ARS, and one should not assume that standard properties of ARS hold. As we pointed out in the Introduction, there are two issues: we need to find the right formulation and the right proof technique. An illustration of both issues is Newman's Lemma, which raises a sensible question in a probabilistic setting, because we do have methods to establish AST [5, 14, 2]. Let us assume AST, and observe that in this case confluence "at the limit" can be identified with UN. A wrong answer: AST + local confluence implies UN, where local confluence means: if μ ⇒ ρ₁ and μ ⇒ ρ₂, then there exists σ with ρ₁ ⇒* σ and ρ₂ ⇒* σ. This does not hold: a counterexample is the PARS in Fig. 5, which does satisfy local confluence, but not UN. Observe that this only refutes our attempted formulation, while leaving open the question "Can we uncover properties similar to Newman's Lemma?" or, better, "Is there a local property which guarantees UN?"

In the rest of the paper, we develop proof techniques which give us results on normalization, termination, and UN, and on their relations.

5 Balance: Normalization implies Termination, and UN

Figure 8: Balance
Figure 9: Proof of Theorem 31

In this section we study a condition which guarantees p-termination (p-SN) as soon as we have p-normalization (p-WN); in appropriate form, it also guarantees UN. More precisely, we capture the following property: "for each μ and each n, all rewrite sequences from μ have the same probability of reaching a normal form after n steps". This can be seen as a generalization of the ARS notion of Random Descent; it expresses the fact that non-determinism is "irrelevant" modulo a chosen equivalence.

Our property, ∼-balance, is parametric on an equivalence relation ∼ on distributions. In this section we assume the equivalence relation ∼ to be either =_NF (α =_NF β if α^NF = β^NF) or ≃_NF (α ≃_NF β if ‖α^NF‖ = ‖β^NF‖).

  • ≃_NF-balance implies that if one sequence converges with probability 1 (resp. p), then the system is AST (resp. p-SN).

  • =_NF-balance allows us to also prove that UN holds.

Remark 28.

Observe that if =_NF-balance holds, then ≃_NF-balance also holds!

Example 29.

The system in Fig. 5 satisfies ∼-balance for ∼ = ≃_NF, but not for ∼ = =_NF.

Definition 30 (Balance).

Let the equivalence ∼ be as stipulated above. We define the following properties of a PARS (illustrated in Fig. 8).

  • ∼-balance (B): for each μ and each pair of rewrite sequences ⟨μₙ⟩, ⟨ρₙ⟩ starting in μ, μₙ ∼ ρₙ holds for each n.

  • local ∼-balance (LB): for each μ, if μ ⇒ ρ₁ and μ ⇒ ρ₂, then there exist σ₁, σ₂ with ρ₁ ⇒ σ₁, ρ₂ ⇒ σ₂, and σ₁ ∼ σ₂.

Observe that B is a property which is universally quantified over all sequences from μ. The property LB is instead local. We prove that this local property provides a characterization of B.
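As an illustration of the two equivalences (on a hypothetical two-rule toy system of our own, with rules c ⇒ {a: 1/2, c: 1/2} and c ⇒ {b: 1/2, c: 1/2}): the two one-step reducts of {c: 1} assign the same probability to normal forms, though not to the same normal forms, so this local diagram is balanced for the coarser equivalence but not for the finer one. A sketch of the check:

```python
# Check one local-balance diagram on a hypothetical two-rule system:
#   rule 0: c => {a: 1/2, c: 1/2}    rule 1: c => {b: 1/2, c: 1/2}
RULES = {"c": [{"a": 0.5, "c": 0.5}, {"b": 0.5, "c": 0.5}]}

def apply_rule(mu, rule_index):
    """One step where every nonterminal element uses the same rule index."""
    new = {}
    for elem, p in mu.items():
        targets = RULES[elem][rule_index] if elem in RULES else {elem: 1.0}
        for tgt, q in targets.items():
            new[tgt] = new.get(tgt, 0.0) + p * q
    return new

def nf_mass(mu):
    return sum(p for e, p in mu.items() if e not in RULES)

mu = {"c": 1.0}
rho0 = apply_rule(mu, 0)  # {'a': 0.5, 'c': 0.5}
rho1 = apply_rule(mu, 1)  # {'b': 0.5, 'c': 0.5}
print(nf_mass(rho0) == nf_mass(rho1))  # True: same probability of normal forms
print(rho0 == rho1)                    # False: not the same distribution
```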

Theorem 31 (Characterization).

The following properties are equivalent.

  1. LB