Linear temporal logic for regular cost functions

Denis Kuperberg Liafa/CNRS/Université Paris 7, Denis Diderot, France
Abstract

Regular cost functions have been introduced recently as an extension to the notion of regular languages with counting capabilities, which retains strong closure, equivalence, and decidability properties. The specificity of cost functions is that exact values are not considered, but only estimated.

In this paper, we define an extension of Linear Temporal Logic (LTL) over finite words to describe cost functions. We give an explicit translation from this new logic to two dual forms of cost automata, and we show that the natural decision problems for this logic are PSPACE-complete, as is the case in the classical setting. We then algebraically characterize the expressive power of this logic, using a new syntactic congruence for cost functions introduced in this paper.

LTL, cost functions, cost automata, stabilization semigroup, aperiodic, syntactic congruence
Subject classification: F.1.1, F.4.3

Updated version 08/02/2017

1 Introduction

Since the seminal works of Kleene and of Rabin and Scott, the theory of regular languages has been one of the cornerstones of computer science. Regular languages have many good properties (closure, equivalent characterizations, decidability), which make them central in many situations.

Recently, the notion of regular cost function for words has been presented as a candidate quantitative extension of the notion of regular languages, retaining most of the fundamental properties of the original theory, such as the closure properties, the various equivalent characterizations, and decidability [Col09]. A cost function is an equivalence class of the functions from the domain (words in our case) to $\mathbb{N}\cup\{\infty\}$, modulo an equivalence relation which allows some distortion, but preserves the boundedness property over each subset of the domain. The model extends the notion of languages in the following sense: one can identify a language with the function mapping each word inside the language to $0$, and each word outside the language to $\infty$. It is a strict extension since regular cost functions have counting capabilities, e.g., counting the number of occurrences of letters, measuring the length of intervals, etc.

This theory grew out of two main lines of work: research by Hashiguchi [Hashiguchi82], Kirsten [Kirsten05], and others who were studying problems which could be reduced to whether or not some function was bounded over its domain (the most famous of these problems being the star height problem); and research by Bojańczyk and Colcombet [Bojanczyk04, BojanczykC06] on extensions of monadic second-order logic (MSO) with a quantifier which can assert properties related to boundedness.

Linear Temporal Logic (LTL), which is a natural way to describe logical constraints over a linear structure, has also been a fertile subject of study, particularly in the context of regular languages and automata [VarWol]. Moreover, quantitative extensions of LTL have recently been successfully introduced. For instance, the model Prompt-LTL introduced in [PromptLTL] aims at bounding the waiting time of all requests of a formula, and in this sense is quite close to the aim of cost functions.

In this paper, we extend LTL (over finite words) into a new logic with quantitative features ($LTL^{\le}$), in order to describe cost functions over finite words with logical formulae. We do this by adding a new operator $U^{\le N}$: a formula $\varphi U^{\le N}\psi$ means that $\psi$ holds somewhere in the future, and $\varphi$ has to hold until that point, except at most $N$ times (we allow at most $N$ "mistakes" of the Until formula). The variable $N$ is unique in the formula, and the semantics of the formula is the least value of $N$ which makes the statement true.

Related works and motivating examples

Regular cost functions continue a sequence of works that aim to solve difficult questions in language theory. Among several other decision problems, the most prominent example is the star-height problem: given a regular language $L$ and an integer $k$, decide whether $L$ can be expressed by a regular expression using at most $k$ nestings of Kleene stars. The problem was resolved by Hashiguchi [Hashiguchi88] using a very intricate proof, and later by Kirsten [Kirsten05] using an automaton that has counting features.

Finally, also using ideas inspired from [BojanczykC06], the theory of those automata over words has been unified in [Col09], in which cost functions are introduced, and suitable models of automata, algebra, and logic for defining them are presented and shown equivalent. Corresponding decidability results are provided. The resulting theory is a neat extension of the standard theory of regular languages to a quantitative setting.

On the logic side, Prompt-LTL, introduced in [PromptLTL], and PLTL [AETP01], which are similar, show an interesting way to extend LTL in order to look at boundedness issues, and have already given interesting decidability and complexity results. In [DJP04], the logic TL was introduced, which uses an explicit bound to express some desired boundedness properties.

These logics are only interested in bounding the wait time, i.e. consecutive events. In the framework of regular cost functions, this corresponds to the subclass of temporal cost functions introduced in [CKL].

We introduce here a logic with a more general purpose: it can bound the wait time before an event, but also count non-consecutive events, like the number of occurrences of a letter in a word.

These quantitative issues are a quite natural preoccupation in the context of verification: for instance one would expect that a system can react in a bounded time. The new features of our logic could possibly be used to allow some mistakes in the behaviour of the program, while guaranteeing a global bound on the number of mistakes. Another issue is the consumption of resources: for instance it is interesting to know whether we can bound the number of times a program stores something in the memory.

Contributions

It is known from [Col09] that regular cost functions are the ones recognizable by stabilization semigroups (or equivalently, stabilization monoids), and from [CKL] that there is an effective quotient-wise minimal stabilization semigroup for each regular cost function. This model of semigroups extends the standard approach for languages.

We introduce a quantitative version of LTL in order to describe cost functions by means of logical formulas. The idea of this new logic is to bound the number of "mistakes" of Until operators, by adding a new operator $U^{\le N}$. The first contribution of this paper is to give a direct translation from $LTL^{\le}$-formulae to $B$-automata, which extends the classic translation from LTL to Büchi automata for languages. This translation preserves exact values (i.e. not only cost function equivalence), which could be interesting in terms of future applications. We also use dual forms of logic and cost automata to describe a similar translation, and show that the boundedness problem for $LTL^{\le}$-formulae is PSPACE-complete (as was the case in the classical setting). Therefore, we do not lose anything in terms of computational complexity when generalizing from LTL to $LTL^{\le}$.

We then show that regular cost functions described by LTL formulae are the same as the ones computed by aperiodic stabilization semigroups, and this characterization is effective. The proof uses a syntactic congruence for cost functions, introduced in this paper, which generalizes the Myhill-Nerode equivalence for regular languages. This congruence is of general interest beyond this particular context, since it can be used for any regular cost function.

This work validates the algebraic approach for studying cost functions, since it shows that the generalization from regular languages extends also to syntactic congruence. It also allows a more user-friendly way to describe cost functions, since temporal logic is often more intuitive than automata or stabilization semigroups to describe a given cost function.

As it was the case in [CKL] for temporal cost functions, the characterization result obtained here for -definable cost functions follows the spirit of Schützenberger’s theorem, which links star-free languages with aperiodic monoids [Schutz65].

Organisation of the paper

After some notations, and reminders on cost functions and stabilization semigroups, we introduce $LTL^{\le}$ in Section 4 as a quantitative extension of LTL, and give an explicit translation from $LTL^{\le}$-formulae to $B$- and $S$-automata in Sections 5 and 6. We then present in Section 7 a syntactic congruence for cost functions, and show that it indeed computes the minimal stabilization semigroup of any regular cost function. We finally use this new tool to show that $LTL^{\le}$ has the same expressive power as aperiodic stabilization semigroups.

Notations

We will note the set of non-negative integers and the set , ordered by . We will say that a set is bounded if there is a number such that for all , we have . In particular, if contains then is unbounded. If is a set, is the set of infinite sequences of elements of  (we will not use here the notion of infinite word). Such sequences will be denoted by bold letters (a, b,…). We will work with a fixed finite alphabet . The set of words over  is and the empty word will be noted . The concatenation of words and  is . The length of  is . The number of occurrences of letter  in is . We will use (resp. ) to note the function (resp. ). Functions will be denoted by letters , and will be extended to by . Such functions will be called correction functions.

2 Regular Cost functions

2.1 Cost functions and equivalence

Let be the set of functions from to . If , we will note the function of defined by if , if . For , we say that if for every set , if is bounded then is bounded. We define the equivalence relation on by if and . Notice that means that and are bounded on the same sets of words, i.e. for all , we have is bounded if and only if is bounded. This equivalence relation does not pay attention to exact values, but preserves the existence of bounds.

We also introduce another relation, which is parametrized by a correction function. If is a correction function (see Notations), we say that if , and if and . Intuitively, means that one can be obtained from the other by “distorting” the value according to the correction function . In particular, if and only if (where is the identity function).
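As a small worked illustration of these two relations, here is a sketch under explicit assumptions: the symbols $\preccurlyeq$, $\approx$, $\preccurlyeq_\alpha$ and $\approx_\alpha$ are assumed names for the relations just defined, $f \preccurlyeq g$ is read as "whenever $g$ is bounded on a set, so is $f$", $f \preccurlyeq_\alpha g$ is read as $f \le \alpha\circ g$ pointwise, and the alphabet $\{a,b\}$ and the functions $f$, $g$, $h$ are chosen only for the example.

```latex
% Assumed alphabet {a,b}; alpha(n) = 2n + 5 is a hypothetical correction function.
\[
  f(u) = |u|_a, \qquad g(u) = 2|u|_a + 5, \qquad \alpha(n) = 2n + 5 .
\]
% f and g distort each other by at most alpha, hence they are equivalent.
\[
  f(u) \le \alpha(g(u)) \quad\text{and}\quad g(u) = \alpha(f(u)),
  \qquad\text{so } f \approx_\alpha g \text{ and therefore } f \approx g .
\]
% In contrast, take h(u) = |u|. Since f <= h pointwise, f is bounded wherever h is.
% On X = b^*, f is bounded by 0 while h is unbounded, so the converse fails.
\[
  f \preccurlyeq h \quad\text{but}\quad h \not\preccurlyeq f .
\]
```

The second part shows why exact values do not matter but boundedness does: no correction function can repair the gap between $f$ and $h$ on $b^*$.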

{lem}

[Col09] Let . We have (resp. ) if and only if there exists a correction function such that (resp. ).

{proof}

Assume for some . If is bounded by for some set , then is bounded by , so we get .

Conversely, if , we want to build such that . For each , we define . We define if , and otherwise. As always, . Notice that because , for every we have , since is bounded on . Let . If is finite, then let , we have , so . If , then we always have .

We showed that if and only if there exists a correction function such that . It directly follows that if , then and , thus . Conversely, if , then there are correction functions such that and . We get . Notice that saying is more precise than saying : in addition to preserving the qualitative information on bounds, the correction function gives quantitative information on the distortion of bounds.

A cost function is an equivalence class of . In practice, cost functions will always be represented by one of their elements in . If is a function in , we will note the cost function containing . We will say that an object (automaton, logical formula) recognizes a cost function, when it defines a function in , but the notion of equivalence we are mostly interested in is the -equivalence instead of the equality of functions.

Notice that the value is considered unbounded, so if and are languages of , then if and only if . This shows that considering languages as cost functions does not lose any information on these languages, and therefore cost function theory properly extends language theory.

{rem}

There are uncountably many cost functions in , and each cost function contains uncountably many functions. Therefore it is hard to give an explicit description of all the functions in a -class, other than all the functions equivalent to a particular representative.

We will now introduce two models of cost automata recognizing cost functions. These definitions are from [Col09]; the reader can refer to it for more details. In both cases, we define the semantics of an automaton as a function in , which we will mainly view as a representative of the cost function .

2.2 $B$-automata

A -automaton is a tuple where is the set of states, the alphabet, and the sets of initial and final states, the set of counters, and is the set of transitions.

Counters have integer values starting at $0$, and an atomic action updates the value of every counter in the following way: $ic$ increments by $1$, $r$ resets to $0$, and $\varepsilon$ leaves the counter value unchanged. If $\rho$ is a run, let $C(\rho)$ be the set of values reached during $\rho$, at any point of the run and on any counter. The notation “$ic$” stands for “increment check”, meaning that as soon as we increment a counter, we put its value in $C(\rho)$.

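A $B$-automaton $\mathcal{A}$ recognizes a cost function via the following semantics:

\[ [\![\mathcal{A}]\!]_B(u) = \inf\{\sup C(\rho) : \rho \text{ is an accepting run of } \mathcal{A} \text{ over } u\}, \]

with the usual conventions that $\sup\emptyset = 0$ and $\inf\emptyset = \infty$. It means that the value of a run is the maximal value reached by a counter, and the nondeterminism resolves in taking the run with the least value. If there is no accepting run on a word $u$, then $[\![\mathcal{A}]\!]_B(u) = \infty$.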

Notice that in particular, if the automaton does not have any counter, then it is a classical automaton recognizing a language $L$, and its semantics is $\chi_L$, with $\chi_L(u) = 0$ if $u \in L$ and $\chi_L(u) = \infty$ if $u \notin L$.
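The semantics above can be sketched in a few lines of Python. This is only an illustrative simulation, not part of the original construction: the function name `b_value`, the encoding of transitions as tuples `(source, letter, actions, target)`, and the action names `"ic"`, `"r"`, `"e"` (for increment-check, reset, and no action) are assumptions made for the example.

```python
import math

def b_value(word, transitions, initial, final, n_counters):
    """Least, over accepting runs, of the greatest counter value reached."""
    # Map each configuration (state, counter values) to the least
    # "maximal value seen so far" among runs reaching it.
    configs = {(q, (0,) * n_counters): 0 for q in initial}
    for letter in word:
        successors = {}
        for (state, counters), worst in configs.items():
            for src, label, actions, dst in transitions:
                if src != state or label != letter:
                    continue
                new_counters = list(counters)
                new_worst = worst
                for c, atom in enumerate(actions):
                    if atom == "ic":      # increment, then record the value
                        new_counters[c] += 1
                        new_worst = max(new_worst, new_counters[c])
                    elif atom == "r":     # reset to 0
                        new_counters[c] = 0
                    # "e": leave the counter unchanged
                key = (dst, tuple(new_counters))
                if new_worst < successors.get(key, math.inf):
                    successors[key] = new_worst
        configs = successors
    # No accepting run: the infimum of the empty set is infinity.
    return min((w for (q, _), w in configs.items() if q in final),
               default=math.inf)

# Hypothetical one-counter automaton counting the letter "a".
count_a = [("q", "a", ("ic",), "q"), ("q", "b", ("e",), "q")]
print(b_value("abaab", count_a, {"q"}, {"q"}, 1))
```

Keeping only the best value per configuration is enough here because the rest of a run depends only on the current state and counter values.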

{exa}

Let . The functions and represent the same cost function, which is recognized by the following one-counter -automaton on the left-hand side. The cost function containing is recognized by the nondeterministic one-counter -automaton on the right-hand side.

[Figure: the two one-counter automata of the example.]

2.3 $S$-automata

The model of $S$-automaton is dual to that of $B$-automaton. The aim of this model is to mimic complementation: as we cannot complement a function, we get around it by reversing the semantics of the automata defining it.

An -automaton is a tuple where is the set of states, the alphabet, and the sets of initial and final states, the set of counters, and is the set of transitions.

Counters have integer values starting at $0$, and an action performs a sequence of atomic actions on each counter, where atomic actions are either $i$ (increment by $1$), $r$ (reset to $0$), $\varepsilon$ (do nothing on the counter), or $cr$ (check the counter value and reset it). If $\rho$ is a run, let $C(\rho)$ be the set of values checked during $\rho$ on all counters. This means that this time, contrary to what happened in $B$-automata, we only put in $C(\rho)$ the values witnessed during an operation $cr$. This is because we will be interested in the minimum of these values, and therefore we do not want to observe all intermediate values.

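An $S$-automaton $\mathcal{A}$ computes a cost function via the following semantics:

\[ [\![\mathcal{A}]\!]_S(u) = \sup\{\inf C(\rho) : \rho \text{ is an accepting run of } \mathcal{A} \text{ over } u\}. \]

Notice that $\inf$ and $\sup$ have been switched, compared to the definition of the $B$-semantics. It means that the value of a run of an $S$-automaton is the minimal checked value, and the automaton tries to maximize its value among all runs.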

In particular, if $\mathcal{A}$ is a classical automaton for $L$, then its $S$-semantics is $\chi_{\overline{L}}$, where $\overline{L}$ is the complement of $L$. This supports the intuition that switching between $B$- and $S$-automata corresponds to complementation.
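The dual semantics can be simulated in the same way. The sketch below is illustrative only: the name `s_value`, the tuple encoding of transitions, and the atomic action names `"i"`, `"r"`, `"cr"` are assumptions, and the example automaton is a hypothetical one that counts the letter `a` and checks the counter while reading the last letter.

```python
import math

def s_value(word, transitions, initial, final, n_counters):
    """Greatest, over accepting runs, of the least checked counter value."""
    # Map each configuration to the greatest "minimal checked value so far".
    # A run without any check has value infinity (infimum of the empty set).
    configs = {(q, (0,) * n_counters): math.inf for q in initial}
    for letter in word:
        successors = {}
        for (state, counters), low in configs.items():
            for src, label, actions, dst in transitions:
                if src != state or label != letter:
                    continue
                new_counters = list(counters)
                new_low = low
                for c, sequence in enumerate(actions):
                    for atom in sequence:   # a sequence of atomic actions
                        if atom == "i":
                            new_counters[c] += 1
                        elif atom == "r":
                            new_counters[c] = 0
                        elif atom == "cr":  # check the value, then reset
                            new_low = min(new_low, new_counters[c])
                            new_counters[c] = 0
                key = (dst, tuple(new_counters))
                if new_low > successors.get(key, -1):
                    successors[key] = new_low
        configs = successors
    # No accepting run: the supremum of the empty set is 0.
    return max((v for (q, _), v in configs.items() if q in final), default=0)

# Hypothetical automaton: count the a's in state "p", guess the last letter,
# check the counter on it and move to the final state "q".
count_a_s = [
    ("p", "a", (("i",),), "p"),
    ("p", "b", ((),), "p"),
    ("p", "a", (("i", "cr"),), "q"),
    ("p", "b", (("cr",),), "q"),
]
print(s_value("abaab", count_a_s, {"p"}, {"q"}, 1))
```

A run that guesses the last letter too early gets stuck in `q`, so only the correct guess contributes to the supremum.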

{exa}

We will redefine the two cost functions from example 2.2, this time with -automata. The first one counts the number of , and guesses the last letter to check the value. Notice that the exact function it computes is between and , so is equivalent to up to , with . The second automaton counts all blocks of , and also needs to guess the last letter, in order to count the last block (-1 if the last letter is ).

{thm}

[Col09] If $F$ is a cost function, there is a $B$-automaton for $F$ if and only if there is an $S$-automaton for $F$. That is to say, $B$- and $S$-automata have the same expressive power (up to $\approx$) in terms of recognized cost functions.

3 Stabilization semigroups

3.1 Classical ordered semigroups, and regular languages

An ordered semigroup is a tuple , where is a product , and is a partial order compatible with , i.e. implies and . We will always write for the whole structure, and for the underlying set.

An ideal of is a set which is -closed, i.e. such that for all and , we have .

We recall how a classical semigroup can recognize a regular language . The order is not necessary here. Let be a function, canonically extended to a morphism . Let be a subset of , called accepting subset.

Then the language recognized by is . It is well-known that a language is regular if and only if it can be recognized by a finite semigroup.
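As a minimal sketch of this classical recognition mechanism, the following Python snippet evaluates the image of a non-empty word in a finite semigroup given by its multiplication table. The names `recognizes`, `h`, `product` and `accepting`, as well as the two-element example (addition modulo 2, recognizing non-empty words with an even number of `a`), are hypothetical choices made for the illustration.

```python
def recognizes(word, h, product, accepting):
    """Tell whether the image of a non-empty word lies in the accepting subset."""
    if not word:
        raise ValueError("a semigroup morphism is defined on non-empty words")
    image = h[word[0]]
    for letter in word[1:]:
        # Extend the letter-to-element map h to a morphism on words.
        image = product[(image, h[letter])]
    return image in accepting

# Hypothetical example: the semigroup {0, 1} with addition modulo 2.
product = {(x, y): (x + y) % 2 for x in (0, 1) for y in (0, 1)}
h = {"a": 1, "b": 0}
accepting = {0}

print(recognizes("abab", h, product, accepting))
print(recognizes("ab", h, product, accepting))
```

The order on the semigroup plays no role in this classical setting, which is why the sketch only needs the product table.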

This section explains how to generalize this to the cost functions setting, as it was done in [Col09].

3.2 Cost sequences

The aim is to give a semantic to stabilization semigroups. Some mathematical preliminaries are required.

Let  be an ordered set, a function from to , and  two infinite sequences. We define the relation by ab if :

A sequence a is said to be -non-decreasing if . We define  as , and ab (resp. ab) if (resp. ) for some .

Remarks:

  • if  then  implies ,

  • if a is -non-decreasing, then it is -equivalent to a non-decreasing sequence,

  • a is -non-decreasing iff it is non-decreasing,

  • let  be two non-decreasing sequences, then iff .

The -non-decreasing sequences ordered by  can be seen as a weakening of the  case. We will identify the elements  with the constant sequence of value .

The relations  and  are not transitive, but the following property guarantees a certain kind of transitivity. {fact} implies and implies .

The function  is used as a “precision” parameter for  and . Fact 3.2 shows that a transitivity step costs some precision. For any , the relation  coincides over constant sequences with order (up to identification of constant sequences with their constant value). Consequently, the infinite sequences in ordered by form an extension of .

In the following, while using relations and , we may forget the subscript and verify instead that the proof has a bounded number of transitivity steps.

{defi}

Let be an ordered semigroup and be an ideal of .

  • If is an -non-decreasing sequence of elements of , we note

    In other words, is the first position where a gets out of .

  • If and , we define the cost sequence by .

3.3 Stabilization semigroups

The notion of stabilization semigroup is introduced in [Col09], in order to extend the classic notion of semigroups, and recognize cost functions instead of languages. If is a semigroup (possibly with other operations), we will note the set of idempotent elements of , i.e. elements such that .

{defi}

A stabilization semigroup is an ordered semigroup together with an operator  (called stabilization) such that:

  • for all with  and , ;

  • for all , ;

  • for all  in , ;

  • if is a monoid, , we say then that is a stabilization monoid

In this paper, we only consider finite stabilization semigroups. The intuition of the operator is that means " repeated many times", which appears in the following properties, consequences of the definition above :

3.4 Factorization trees and compatible function

Let be a stabilization semigroup, and . A -tree over is a -labelled tree such that is the leaf word of , and for each node of , we are in one of these cases:

Leaf

is a leaf,

Binary :

has only children , and ,

Idempotent :

has children with , and there is such that ,

Stabilization :

has children with , and there is such that , and .

The root of is called its value and is noted .

{exa}

Let , , and with , , and . The following tree is an -tree over :

Notice that the number of children of the root is . Two cases are possible :

  • : the root is an idempotent node, and .

  • : the root is a stabilization node, and .

This gives an intuition of how these factorization trees can be used to associate a value to a word, here its number of occurrences of .

In the following we will establish formally how we can use factorization trees to give a semantic to stabilization semigroups.

The following theorem is the cornerstone of this process. It is a deep combinatorial result that generalizes Simon’s factorization forests theorem. It can be considered as a Ramsey-like theorem, because it provides the existence of big well-behaved structures (the factorization tree, and in particular the idempotent nodes) if the input word is big enough.

{thm}

[Col09] For all , there exists such that for all and , there is a -tree over of height at most .

This allows us to define by

The function is called compatible with . It depends on so there may be several compatible functions, however we will see that they are equivalent in some sense.

If is a function , we associate to it a function by . We will also identify elements of with their canonical image in (i.e. view a word of sequences as a sequence of words of the same length).

{thm}

[Col09] If is a compatible function of , then there exists  such that :

Letter.

for all , ,

Product.

for all , ,

Stabilization.

for all , , ,

Substitution.

for all , , (we identify sequence of words and word of sequences)

{exa}

Let  be the stabilization semigroup with elements , with product defined by :  ( neutral element), and stabilization by  and . Let , we define by:

Then  is compatible with . This is proved by building a factorization tree of height , with idempotent (or stabilization) -nodes at level , binary nodes at level , and one idempotent/stabilization node at level .

3.5 Recognized cost functions

We now have all the mathematical tools to define how stabilization semigroups can recognize cost functions.

Let be a stabilization semigroup. Let be a morphism, canonically extended to , and  an ideal. Let be a compatible function associated with . We say that the quadruple recognizes the function defined by

We say that is the accepting ideal of , it generalizes the accepting subset used in the classical setting.

Indeed, if is a classical semigroup recognizing with an accepting subset , we can take to be the normal product , to be the identity function on idempotents, and to be the complement of . Then recognizes .

{thm}

[Col09] If satisfies all the properties given in Theorem 3.4, then . In other words, is unique up to (and in particular the choice of is not important). Moreover, if is given, and are two compatible functions for , then the functions defined by relatively to and are equivalent up to . This allows us to uniquely define the cost function recognized by the triple , without ambiguity.

{exa}

Let , the cost function is recognizable. We take the stabilization semigroup from Example 3.4, defined by , and . We have then for all .

The following theorem links cost automata with stabilization semigroups, and allows us to define the class of regular cost functions.

{thm}

[Col09] Let be a cost function, the following assertions are equivalent:

  • is recognized by a -automaton,

  • is recognized by an -automaton,

  • is recognized by a finite stabilization semigroup.

Such a cost function will be called regular by generalization of this notion from language theory.

Notice that if is a language, then is a regular cost function if and only if is a regular language. This shows that the notion of regularity for cost function is a proper extension of the one from language theory. That is to say, restricting cost functions theory to [regular] cost functions of the form , one exactly gets [regular] language theory.

4 Quantitative LTL

We will now use an extension of LTL to describe some regular cost functions. This has been done successfully with regular languages, so we aim to obtain the same kind of results. Can we still go efficiently from an LTL-formula to an automaton?

4.1 Definition

The first thing to do is to extend LTL so that it can describe cost functions instead of languages. We must add quantitative features, and this will be done by a new operator , required to appear positively in the formula. Unlike in most uses of LTL, we work here over finite words. This is to avoid additional technical considerations due to new formalisms suited to infinite words, which would make all the proofs heavier without adding any new ideas.

Formulas of (on finite words on an alphabet ) are defined by the following grammar :

where $N$ is a unique free variable, common to all occurrences of the $U^{\le N}$ operator. This is in the same spirit as in [PromptLTL], where the bound is global for the whole formula.

  • means that the current letter is , and are the classical conjunction and disjunction;

  • means that is true at the next letter;

  • means that is true somewhere in the future, and holds until that point;

  • means that is true somewhere in the future, and can be false at most times before .

  • means that we are at the end of the word.

Notice the absence of negation in the syntax of . However, we can still consider that generalizes classical LTL (with negation), because an LTL formula can be turned into an formula by pushing negations to the leaves. That is why we need operators in dual forms in the syntax. Remark that we do not need a dual operator for , because we can use to negate it: . Moreover we can also express negations of atomic letters: for all we can define to signify that the current letter is not .

We can then choose any particular , and define and , meaning respectively true and false.

We also define connectors “eventually” : and “globally” : .

4.2 Semantics

We want to associate a function to any -formula . As usual, we will often be more interested in the cost function recognized by .

We will say that ( is a model of ) if is true on with as valuation for , i.e. as number of errors for all the ’s in the formula . We finally define

We can remark that if , then for all , since the operators appear always positively in the formula.
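To make this semantics concrete, here is a small Python evaluator computing the least number of allowed mistakes that makes a formula true on a finite word. It is a sketch under stated assumptions, not the paper's construction: formulas are encoded as nested tuples with the hypothetical tags `"letter"`, `"end"`, `"and"`, `"or"`, `"next"`, `"until"` and `"until_le"`, and the witness position of an Until is assumed to range over all suffixes of the word, including the empty one.

```python
import math
from functools import lru_cache

def value(formula, word):
    """Least N such that (word, N) is a model of the formula, or infinity."""
    n = len(word)

    @lru_cache(maxsize=None)
    def val(f, i):
        # Value of subformula f on the suffix of the word starting at i.
        op = f[0]
        if op == "letter":
            return 0 if i < n and word[i] == f[1] else math.inf
        if op == "end":
            return 0 if i == n else math.inf
        if op == "and":
            return max(val(f[1], i), val(f[2], i))
        if op == "or":
            return min(val(f[1], i), val(f[2], i))
        if op == "next":
            return val(f[1], i + 1) if i < n else math.inf
        if op == "until":
            best, prefix = math.inf, 0
            for j in range(i, n + 1):
                # prefix is the cost of the left formula on positions i..j-1.
                best = min(best, max(prefix, val(f[2], j)))
                prefix = max(prefix, val(f[1], j))
            return best
        if op == "until_le":
            best = math.inf
            for j in range(i, n + 1):
                bound = val(f[2], j)
                if bound == math.inf:
                    continue
                costs = [val(f[1], k) for k in range(i, j)]
                # Positions whose cost exceeds the bound count as mistakes;
                # raise the bound until it also covers the mistakes.
                while sum(c > bound for c in costs) > bound:
                    bound += 1
                best = min(best, bound)
            return best
        raise ValueError(f"unknown operator: {op!r}")

    return val(formula, 0)

# Over the alphabet {a, b}: "b holds until the end, up to N mistakes".
count_a = ("until_le", ("letter", "b"), ("end",))
print(value(count_a, "abaab"))
```

On this example every occurrence of `a` is a mistake of the Until, so the computed value is the number of occurrences of `a` in the word. The evaluator relies on the monotonicity remark above: a subformula fails under valuation N exactly when its least valid value exceeds N.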

{prop}
  • if , and otherwise

  • if , and otherwise

  • , and

  • ,

  • , and

{exa}

Let , then

We use -formulae in order to describe cost functions, so we will often work modulo cost function equivalence . However, we will sometimes be interested in the exact function described by .

{rem}

If does not contain any operator , is a classical LTL-formula computing a language , and .

5 From to -Automata

5.1 Description of the automaton

We will now give a direct translation from -formulae to -automata, i.e. given an -formula on a finite alphabet , we want to build a -automaton recognizing . We will also show that a slight change in the model of -automaton (namely allowing sequences of counter actions on transitions) allows us to design a -automaton with : the functions recognized by and are equal and not just equivalent up to . This construction is adapted from the classic translation from LTL-formula to Büchi automata [DG].

Let be an -formula. We define to be the set of subformulae of , and to be the set of subsets of .

We want to define a -automaton such that .

We set the initial states to be and the final ones to be . We choose as set of counters where is the number of occurrences of the operators in , labeled from to .

A state is basically the set of constraints we have to verify before the end of the word, so the only two accepting states are the one with no constraint, and the one whose only constraint is to be at the end of the word.

The following definitions are the same as for the classical case (LTL to Büchi automata):

{defi}
  • An atomic formula is either a letter or

  • A set of formulae is consistent if there is at most one atomic formula in it.

  • A reduced formula is either an atomic formula or a Next formula (of the form ).

  • A set is reduced if all its elements are reduced formulae.

  • If is consistent and reduced, we define .
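These definitions translate directly into code. The sketch below is illustrative: it represents formulas as tuples with the hypothetical tags `"letter"`, `"end"` and `"next"`, and it assumes that the set defined in the last item consists of the formulae appearing under a Next operator in the given set.

```python
def is_atomic(formula):
    # An atomic formula is a letter or the end-of-word marker.
    return formula[0] in ("letter", "end")

def is_consistent(formulas):
    # At most one atomic formula may be required at the current position.
    return sum(1 for f in formulas if is_atomic(f)) <= 1

def is_reduced(formulas):
    # Every element is either atomic or a Next formula.
    return all(is_atomic(f) or f[0] == "next" for f in formulas)

def next_set(formulas):
    """Constraints left for the rest of the word (assumed definition)."""
    if not (is_consistent(formulas) and is_reduced(formulas)):
        raise ValueError("the set must be consistent and reduced")
    return frozenset(f[1] for f in formulas if f[0] == "next")

state = frozenset({("letter", "a"), ("next", ("end",))})
print(next_set(state))
```

In the example, the state requires the current letter to be `a` and the word to end right after it, so the only remaining constraint after reading `a` is the end-of-word marker.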

{lem}

[Next Step] If is consistent and reduced, for all and ,

{proof}

If , then the only atomic formula that can contain is , and therefore . Moreover, for every formula of the form in , we have . By definition of the semantic of the operator, this means . This is true for every in , so we obtain . The converse is similar.

We would like to define with as transitions.

The problem is that is not consistent and reduced in general. If is inconsistent we remove it from the automaton. If it is consistent, we need to apply some reduction rules to get a reduced set of formulae. This consists in adding -transitions (but with possible actions on the counter) towards intermediate sets which are not actual states of the automaton (we will call them "pseudo-states"), until we reach a reduced set.

Let be maximal (in size) not reduced in , we add the following transitions

  • If :

  • If :

  • If :

  • If :

    where action (resp. ) performs (resp. ) on counter and on the other counters.

The pseudo-states do not (a priori) belong to because we add formulae for , so if is a reduced pseudo-state, will be in again since we remove the new next operators.

The transitions of automaton will be defined as follows:

where means that there is a sequence of -transitions from to with as combined action on counters.

5.2 Correctness of

We will now prove that is correct, i.e. computes the same cost function as .

{defi}

If is a sequence of actions on counters, we will call the maximal value checked on a counter during with as starting value of the counters, and if there is no check in . It corresponds to the value of a run of a -automaton with as combined action of the counter.
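The value of a sequence of actions can be computed by a direct simulation. The following sketch assumes that counters start at 0 (the `start` parameter is only there to make that assumption explicit) and uses the hypothetical action names `"ic"`, `"r"` and `"e"` for increment-check, reset and no action; each element of the sequence gives one atomic action per counter.

```python
def val(sigma, n_counters, start=0):
    """Maximal value checked on a counter during sigma, 0 if nothing is checked."""
    counters = [start] * n_counters
    best = 0
    for action in sigma:
        for c, atom in enumerate(action):
            if atom == "ic":
                # Increment, and check the new value.
                counters[c] += 1
                best = max(best, counters[c])
            elif atom == "r":
                counters[c] = 0
            elif atom != "e":
                raise ValueError(f"unknown atomic action: {atom!r}")
    return best

# One counter: two increments, a reset, then one more increment.
print(val([("ic",), ("ic",), ("r",), ("ic",)], 1))
```

Prepending a single atomic action to a sequence can raise this value by at most one, which is the observation used in the proof of the lemma below.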

{lem}

Let be a word on and an accepting run of .

Then for all , for all , for all , verifying (if ) consistent and reduced, and

where .

{proof}

We do a reverse induction on .

If , is a final state so or . If , then (no outgoing -transitions defined from or ). Then if , the only possibility is , but , and , hence the result is true for .

Let , we assume the result is true for , and we take same notations as in the lemma, with . By definition of , there exists a transition in .

We do an induction on the length of the path .

If , then is consistent and reduced, so is either atomic or a Next formula.

If is atomic, the only way can be consistent is if . In which case we obtain without difficulty.

If with , then it corresponds to the case . By induction hypothesis (on ), ( does not change because is empty). Hence which shows the result.

If , we assume the result is true for , and we show it for . We have with , and for all with .

We now look at the different possibilities for the -transition . Let us first notice that either or : since , adding it at the beginning of a sequence can only increment its value by one, or leave it unchanged.

Let . If , then , but so .

We just need to examine the cases where :

  • If , , and ,

    then and , hence .

  • Other classical cases where are similar and come directly from the definition of LTL operators.

  • If , and ,

    then and , hence

  • If , and ,

    then .

    If reaches before its first reset in , then , and we can conclude .

    On the contrary, if and there are strictly fewer than mistakes on before the next occurrence of , we can allow one more while still respecting the constraint with respect to , so .

  • If , and then , and , hence .

Hence we can conclude that for all , , which concludes the proof of the lemma.

Lemma 5.2 implies the correctness of the automaton :
Let be a valid run of on of value , applying Lemma 5.2 with and gives us . Hence .

Conversely, let , then so by definition of , it is straightforward to verify that there exists an accepting run of over of value (each counter doing at most mistakes relative to operator ). Hence .

We finally get , the automaton computes indeed the exact value of function (and so we have obviously ).

Contraction of actions

If we want to obtain a -automaton as defined in Section 2.2, with atomic actions on transitions, we can proceed as follows.

We replace every action by the maximal atomic action occurring in it, with respect to the order . For instance will be changed into . Let be the maximal number of consecutive increments in such an action