Regular cost functions, Part I: logic and algebra over words
The theory of regular cost functions is a quantitative extension to the classical notion of regularity. A cost function associates to each input a non-negative integer value (or infinity), as opposed to languages which only associate to each input the two values “inside” and “outside”. This theory is a continuation of the works on distance automata and similar models. These models of automata have been successfully used for solving the star-height problem, the finite power property, the finite substitution problem, the relative inclusion star-height problem and the boundedness problem for monadic second-order logic over words. Our notion of regularity can be – as in the classical theory of regular languages – equivalently defined in terms of automata, expressions, algebraic recognisability, and by a variant of the monadic second-order logic. These equivalences are strict extensions of the corresponding classical results.
The present paper introduces the cost monadic logic, the quantitative extension to the notion of monadic second-order logic we use, and shows that some problems of existence of bounds are decidable for this logic. This is achieved by introducing the corresponding algebraic formalism: stabilisation monoids.
Supported by the ANR project Jade: ‘Jeux et Automates, Décidabilité et Extensions’. The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 259454.
This paper introduces and studies a quantitative extension to the standard theory of regular languages of words. It is the only quantitative extension (in which quantitative means that the function described can take infinitely many values) known to the author in which the milestone equivalence for regular languages:
accepted by automata = recognisable by monoids
= definable in monadic second-order logic = definable by regular expressions
can be faithfully extended.
This theory is developed in several papers. The objective of the present one is the introduction of the logical formalism, and its resolution using algebraic tools. However, in this introduction, we try to give a broader panorama.
The theory of regular cost functions involves the use of automata (called $B$- and $S$-automata), algebraic structures (called stabilisation monoids), a logic (called cost monadic logic), and suitable regular expressions (called $B$- and $S$-regular expressions). All these models happen to be of the same expressiveness. Though most of these concepts are new, some are very close to objects known from the literature. As such, the present work is the continuation of several branches of research.
The general idea behind these works is that we want to represent functions, i.e., quantitative variants of languages, and that, ideally we want to keep strong decision results. Works related to cost functions go in this direction, where the quantitative notion is the ability to count, and the decidability results are concerned with the existence/non-existence of bounds.
A prominent question in this theory is the star-height problem. This story begins in 1963 when Eggan formulates the star-height decision problem:
Input: A regular language $L$ of words and a non-negative integer $k$.
Output: Yes, if there exists a regular expression using at most $k$ nestings of Kleene stars which defines $L$. No, otherwise.
Eggan proved that the hierarchy induced by $k$ does not collapse, but the decision problem itself was quickly considered as central in language theory, and as the most difficult problem in the area.
Though some partial results were obtained by McNaughton, Dejean and Schützenberger, it took twenty-five years before Hashiguchi came up with a proof of decidability spread over four papers. This proof is notoriously difficult, and no clean exposition of it has ever been presented.
Hashiguchi used in his proof the model of distance automata. A distance automaton is a finite state non-deterministic automaton running over words which can count the number of occurrences of some “special” states. Such an automaton associates to each word a natural number, which is the least number of occurrences of special states among all the accepting runs (or nothing if there is no accepting run over this input). The proof of Hashiguchi relies on a very difficult reduction to the following limitedness problem:
Input: A distance automaton.
Output: Yes, if the automaton is limited, i.e., if the function it computes is bounded over its domain. No, otherwise.
Hashiguchi established the decidability of this problem. The notion of distance automata and its relationship with the tropical semiring (distance automata can be seen as automata over the tropical semiring, i.e., the semiring $(\mathbb{N} \cup \{\infty\}, \min, +)$) has been the source of many investigations.
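To make the semantics of distance automata concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a construction from the literature cited above: costs are placed on transitions (a cost of 1 standing for "a special state is entered"), and the names `distance_value` and `MIN_AB` are hypothetical. The update rule is exactly a vector-matrix product over the semiring $(\mathbb{N} \cup \{\infty\}, \min, +)$.

```python
import math

def distance_value(word, transitions, initial, final):
    """Least total cost among accepting runs over `word`, or None if no run accepts.

    transitions maps (state, letter) to a list of (target, cost) pairs;
    placing costs on transitions is an assumption made for this sketch.
    """
    # best[q] = least cost of a run over the prefix read so far that ends in q
    best = {q: 0 for q in initial}
    for letter in word:
        nxt = {}
        for q, cost in best.items():
            for target, weight in transitions.get((q, letter), []):
                candidate = cost + weight  # "+" plays the role of the semiring product
                if candidate < nxt.get(target, math.inf):
                    nxt[target] = candidate  # "min" plays the role of the semiring sum
        best = nxt
    accepting = [cost for q, cost in best.items() if q in final]
    return min(accepting) if accepting else None

# Hypothetical example: a run staying in "A" counts the letters a, a run
# staying in "B" counts the letters b; the automaton picks the cheaper one.
MIN_AB = {
    ("A", "a"): [("A", 1)], ("A", "b"): [("A", 0)],
    ("B", "a"): [("B", 0)], ("B", "b"): [("B", 1)],
}

print(distance_value("aabab", MIN_AB, {"A", "B"}, {"A", "B"}))
```

This example automaton is not limited: on the words $(ab)^n$ its value is $n$, so no single bound works over its whole domain.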
Despite this research, the star-height problem itself remained not so well understood for seventeen more years. In 2005, Kirsten gave a much simpler and self-contained proof. The principle is to use a reduction to the limitedness problem for a form of automata more general than distance automata, called nested distance desert automata. To understand this extension, let us first look again at distance automata: we can see a distance automaton as an automaton that has a counter which is incremented each time a “special” state is encountered. The value attached to a word by such an automaton is the minimum over all accepting runs of the maximal value assumed by the counter. Presented like this, a nested distance desert automaton is nothing but a distance automaton in which multiple counters and resets of the counters are allowed (with a certain constraint of nesting of counters). Kirsten performed a reduction of the star-height problem to the limitedness of nested distance desert automata which is much easier than the reduction of Hashiguchi. He also proved that the limitedness problem of nested distance desert automata is decidable. For this, he generalised the proof methods developed previously by Hashiguchi, Simon and Leung for distance automata. This work closes the story of the star-height problem itself.
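The "minimum over all accepting runs of the maximal value assumed by the counter" can be sketched for a single counter with increments and resets. The encoding below is an assumption made for illustration (actions `"i"` for increment, `"r"` for reset, `"e"` for no action, attached to transitions), and it does not model the nesting constraint on several counters; the names `counter_value` and `BLOCKS` are hypothetical.

```python
import math

def counter_value(word, transitions, initial, final):
    """Minimum over accepting runs of the peak counter value, or None if no run accepts.

    transitions maps (state, letter) to a list of (target, action) pairs,
    where action is "i" (increment), "r" (reset) or "e" (leave unchanged).
    """
    # best[(state, counter)] = least peak value among runs reaching this configuration
    best = {(q, 0): 0 for q in initial}
    for letter in word:
        nxt = {}
        for (q, counter), peak in best.items():
            for target, action in transitions.get((q, letter), []):
                if action == "i":
                    new_counter = counter + 1
                elif action == "r":
                    new_counter = 0
                else:
                    new_counter = counter
                new_peak = max(peak, new_counter)
                key = (target, new_counter)
                if new_peak < nxt.get(key, math.inf):
                    nxt[key] = new_peak
        best = nxt
    peaks = [peak for (q, _), peak in best.items() if q in final]
    return min(peaks) if peaks else None

# Hypothetical example: a run in "A" measures the longest block of a's,
# a run in "B" the longest block of b's; the value is the smaller of the two.
BLOCKS = {
    ("A", "a"): [("A", "i")], ("A", "b"): [("A", "r")],
    ("B", "a"): [("B", "r")], ("B", "b"): [("B", "i")],
}

print(counter_value("aaabba", BLOCKS, {"A", "B"}, {"A", "B"}))
```

Without resets, a block-length measure of this kind cannot be expressed by simply counting occurrences, which is the extra power the reset action brings in this toy setting.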
The star-height problem is the king among the problems solved using this method. But there are many other (difficult) questions that can be reduced to the limitedness of distance automata and variants. Some of the solutions to these problems paved the way to the solution of the star-height problem.
The finite power property takes as input a regular language $L$ and asks whether there exists some positive integer $n$ such that $(L + \varepsilon)^n = L^*$. It was raised by Brzozowski in 1966, and it took twelve years before being independently solved by Simon and Hashiguchi. This problem is easily reduced to the limitedness problem for distance automata.
The finite substitution problem takes as input two regular languages $L, K$ and asks whether it is possible to find a finite substitution $\sigma$ (i.e., a morphism mapping each letter of the alphabet of $L$ to a finite language over the alphabet of $K$) such that $\sigma(L) = K$. This problem was shown decidable independently by Bala and Kirsten by a reduction to the limitedness of desert automata (a form of automata weaker than nested distance desert automata, but incomparable to distance automata), and a proof of decidability of this latter problem.
The relative inclusion star-height problem is an extension of the star-height problem introduced and shown decidable by Hashiguchi using his techniques. Still using nested distance desert automata, Kirsten gave another, more elegant proof of this result.
The boundedness problem is a problem of model theory. It consists of deciding if there exists a bound on the number of iterations that are necessary for the fixpoint of a logical formula to be reached. The existence of a bound means that the fixpoint can be eliminated by unfolding its definition sufficiently many times. The boundedness problem is usually parameterised by the logic chosen and by the class of models over which the formula is studied. The boundedness problem for monadic second-order formulae over the class of finite words was solved by a reduction to the limitedness problem of distance automata by Blumensath, Otto and Weyer.
One can also cite applications of distance automata in speech recognition, databases, and image compression. In the context of verification, Abdulla, Krcàl and Yi have introduced $R$-automata, which correspond to nested distance desert automata in which the nesting of counters is not required anymore. They prove the decidability of the limitedness problem for this model of automata.
Finally, Löding and the author have also pursued this branch of researches in the direction of extended models. In , the star-height problem over trees has been solved, by a reduction to the limitedness problem of nested distance desert automata over trees. The latter problem was shown decidable in the more general case of alternating automata. In  a similar attempt has been tried for deciding the Mostowski hierarchy of non-deterministic automata over infinite trees (the hierarchy induced by the alternation of fixpoints). The authors show that it is possible to reduce this problem to the limitedness problem for a form of automata that unifies nested distance desert automata and parity tree automata. The latter problem is an important open question.
Bojańczyk and the author have introduced the notion of $B$-automata in , a model which resembles much (and is prior to) $R$-automata. The context was to show the decidability of some fragments of the logic MSO+$\mathbb{U}$ over infinite words, in which MSO+$\mathbb{U}$ is the extension of the monadic second-order logic with the quantifier $\mathbb{U}X.\varphi$ meaning “for all integers $n$, there exists a set $X$ of cardinality at least $n$ such that $\varphi$ holds”. From the decidability results in this work, it is possible to derive all the other limitedness results over finite words. However, the constructions are complicated and of non-elementary complexity. Nevertheless, the new notion of $S$-automata was introduced, a model dual to $B$-automata. Recall that the semantics of distance automata and their variants can be expressed as a minimum over all runs of the maximum of the value taken by counters. The semantics of $S$-automata is dual: it is defined as the maximum over all runs of the minimum of the value taken by the counters at the moment of their reset. Unfortunately, it is quite hard to compare in detail this work with all others. Indeed, since it was oriented toward the study of a logic over infinite words, the central automata are in fact $\omega B$- and $\omega S$-automata: automata accepting languages of infinite words that have an infinitary accepting condition constraining the asymptotic behaviour of the counters along the run. This makes these automata very different
The proof methods for showing the decidability of the limitedness problem of distance automata and their variants, are also of much interest by themselves. While the original proof of Hashiguchi is quite complex, a major advance has been achieved by Leung who introduced the notion of stabilisation  (see also  for an early overview). The principle is to abstract the behaviour of the distance automaton in a monoid, and further describe the semantics of the counter using an operator of stabilisation, i.e., an operator which describes, given an element of the monoid, what would be the effect of iterating it a “lot of times”. This key idea was further used and refined by Simon, Leung, Kirsten, Abdulla, Krcàl and Yi. This idea was not present in , and this is one explanation for the bad complexity of the constructions.
Another theory related to cost functions is the one developed by Szymon Toruńczyk in his thesis . The author proposes a notion of recognisable languages of profinite words which happen to be equivalent to cost functions. Indeed, profinite words are infinite sequences of finite words (which are convergent in a precise topology, the profinite topology). As such, a single profinite word can be used as a witness that a function is not bounded. Following the principle of this correspondence, one can see a cost function as a set of profinite words: the profinite words corresponding to infinite sequences of words over which the function is bounded. This correspondence makes Toruńczyk’s approach equi-expressive with cost functions over finite words as far as decision questions are concerned. Seen like this, this approach can be regarded as the theory of cost functions presented in a more abstract setting. Still, some differences have to be underlined. On one side, the profinite approach, being more abstract, loses some precision. For instance in the present work, we have a good understanding of the precision of the constructions: namely each operation can be performed doing an at most “polynomial approximation
1.2 Survey of the theory.
The theory of regular cost functions gives a unified and general framework for explaining all objects, results and constructions presented above (apart from the results in  that are of a slightly different nature). It also allows to derive new results.
Let us describe the contributions in more detail.
Cost functions. The standard notion of language is replaced by the new notion of cost function. For this, we consider mappings from a set $E$ to $\mathbb{N} \cup \{\infty\}$ (in practice $E$ is the set of finite words over some finite alphabet) and the equivalence relation $\approx$ defined by $f \approx g$ if:
for all $X \subseteq E$, $f$ restricted to $X$ is bounded iff $g$ restricted to $X$ is bounded.
Hence two functions are equivalent if it is not possible to distinguish them using arguments of existence of bounds. A cost function is an equivalence class for $\approx$. The notion of cost functions is what we use as a quantitative extension to languages. Indeed, every language $L$ can be identified with (the equivalence class of) the function mapping words in $L$ to the value $0$, and words outside $L$ to $\infty$. All the theory is presented in terms of cost functions. This means that all equivalences are considered modulo the relation $\approx$.
Cost automata. A first way to define regular cost functions is to use cost automata, which come in two flavours, $B$- and $S$-automata. The $B$-automata correspond in their simple form to $R$-automata  and in their simple and hierarchical form to nested distance desert automata in . Those are also very close to $B$-automata in . Following the ideas in , we also use the dual variant of $S$-automata. The two forms of automata, $B$-automata and $S$-automata, are equi-expressive in all their variants, an equivalence that we call the duality theorem. Automata are not introduced in this paper.
Stabilisation monoids. The corresponding algebraic characterisation makes use of the new notion of stabilisation monoids. A stabilisation monoid is a finite ordered monoid together with a stabilisation operation. This stabilisation operation expresses what it means to iterate “a lot of times” some element. The operator of stabilisation was introduced by Leung  and used also by Simon, Kirsten, Abdulla, Krcàl and Yi as a tool for analysing the behaviour of distance automata and their variants. The novelty here lies in the fact that in our case, stabilisation is now part of the definition of a stabilisation monoid. We prove that it is possible to associate unique semantics to all stabilisation monoids. These semantics are represented by means of computations. A computation is an object describing how a word consisting of elements of the stabilisation monoid can be evaluated into a value in the stabilisation monoid. This key result shows that the notion of stabilisation monoid has a “meaning” independent from the existence of cost automata (in the same way a monoid can be used for recognising a language, independently from the fact that it comes from a finite state automaton). This notion of computations is easier to handle than the notion of compatible mappings used in the conference version of this work .
Recognisable cost functions. We use stabilisation monoids for defining the new notion of recognisable cost functions. We show the closure of recognisable cost functions under min, max, and new operations called inf-projection and sup-projection (which are counterparts to projection in the theory of regular languages). We also prove that the relation $\approx$ (in fact the corresponding preorder $\preccurlyeq$) is decidable over recognisable cost functions. This decidability result subsumes many limitedness results from the literature. This notion of recognisability for cost functions is equivalent to being accepted by the cost automata introduced above.
Extension of regular expressions. It is possible to define two forms of expressions, $B$- and $S$-regular expressions, and show that these are equivalent to cost automata. These expressions were already introduced in  in which a similar result was established.
Cost monadic logic. The cost monadic (second-order) logic is a quantitative extension to monadic (second-order) logic. It is for instance possible to define the diameter of a graph in cost monadic logic. The cost functions over words definable in this logic coincide with the regular cost functions presented above. This equivalence is essentially the consequence of the closure properties of regular cost functions (as in the case of regular languages), and no new ideas are required here. The interest lies in the logic itself. Of course, the decision procedure for recognisable cost function entails decidability results for cost monadic logic. In this paper, cost monadic logic is the starting point of our presentation, and our central decidability result is Theorem ? stating the decidability of this logic.
1.3 Content of this paper.
This paper does not cover the whole theory of regular cost functions over words. The line followed in this paper is to start from the logic “cost monadic logic”, and to introduce the necessary material for “solving it over words”. This requires the complete development of the algebraic formalism.
In Section 2, we introduce the new formalism of cost monadic logic, and show what is required to solve it. In particular, we introduce the notion of cost function, and advocate that it is useful to consider the logic under this view. We state there our main decision result, Theorem ?.
In Section 3 we present the underlying algebraic structure: stabilisation monoids. We then introduce computations, and establish the key results of existence (Theorem ?) and uniqueness (Theorem ?) of the value computed by computations.
In Section 4, we use stabilisation monoids for defining recognisable cost functions. We show various closure results for recognisable cost functions as well as decision procedures. Those results happen to fulfill the conditions required in Section 2 for showing the decidability of cost monadic logic over words.
In Section 5 some arguments are given on the relationship with the models of automata, which are not described in this document, and on how these different notions interact in the big picture.
2.1 Cost monadic logic
Let us recall that monadic second-order logic (monadic logic for short) is the extension of first-order logic with the ability to quantify over sets (i.e., monadic relations). Formally monadic formulae use first-order variables ($x, y, \dots$), and monadic variables ($X, Y, \dots$), and it is allowed in such formulae to quantify existentially and universally over both first-order and monadic variables, to use every boolean connective, to use the membership predicate ($x \in X$), and every predicate of the relational structure. We expect from the reader basic knowledge concerning monadic logic.
In cost monadic logic, one uses a single extra variable $N$ of a new kind, called the bound variable. It ranges over non-negative integers. Cost monadic logic is obtained from monadic logic by allowing the extra predicate $|X| \le N$ (in which $X$ is some monadic variable and $N$ the bound variable) if and only if it appears positively in the formula (i.e., under the scope of an even number of negations). The semantics of $|X| \le N$ is, as one may expect, to be satisfied if (the valuation of) $X$ has cardinality at most (the valuation of) $N$. Given a formula $\varphi$, we denote by $FV(\varphi)$ its free variables, the bound variable excluded. A formula that has no free variables (it may still use the bound variable) is called a sentence.
We now have to provide a meaning to the formulae of cost monadic logic. We assume some familiarity of the reader with logic terminology. A signature consists of a set of symbols $R, R', \dots$. To each symbol is attached a non-negative integer called its arity. A (relational) structure $\mathfrak{A}$ (over the above signature) consists of a set $U_{\mathfrak{A}}$ called the universe, and for each symbol $R$ of arity $n$ of a relation $R^{\mathfrak{A}} \subseteq U_{\mathfrak{A}}^n$. Given a set of variables $F$, a valuation of $F$ (over $\mathfrak{A}$) is a mapping $v$ which to each monadic variable $X \in F$ associates a set $v(X) \subseteq U_{\mathfrak{A}}$, and to each first-order variable $x \in F$ associates an element $v(x) \in U_{\mathfrak{A}}$. We denote by $v, X = E$ the valuation $v$ in which $X$ is further mapped to $E$. Given a cost monadic formula $\varphi$, a valuation $v$ of its free variables over a structure $\mathfrak{A}$ and a non-negative integer $n$, we express by $\mathfrak{A}, v, n \models \varphi$ the fact that the formula $\varphi$ is satisfied over the structure $\mathfrak{A}$ with valuation $v$ when the variable $N$ takes the value $n$. Of course, if $\varphi$ is simply a sentence, we just write $\mathfrak{A}, n \models \varphi$. We also omit the parameter $n$ when $\varphi$ is a monadic formula.
The positivity assumption required when using the predicate $|X| \le N$ has straightforward consequences. Namely, for all cost monadic formulae $\varphi$, all relational structures $\mathfrak{A}$, and all valuations $v$, $\mathfrak{A}, v, n \models \varphi$ implies $\mathfrak{A}, v, m \models \varphi$ for all $m \ge n$.
Instead of evaluating as true or false as done above, we see a formula $\varphi$ of cost monadic logic of free variables $F$ as associating to each relational structure $\mathfrak{A}$ and each valuation $v$ of the free variables a value in $\mathbb{N} \cup \{\infty\}$ defined by:
$$[\![\varphi]\!](\mathfrak{A}, v) = \inf\{n : \mathfrak{A}, v, n \models \varphi\}.$$
This value can be either a non-negative integer, or $\infty$ if no valuation of $N$ makes the formula true. In case of a sentence $\varphi$, we omit the valuation and simply write $[\![\varphi]\!](\mathfrak{A})$. Let us stress the link with standard monadic logic in the following fact: for every monadic formula $\varphi$, $[\![\varphi]\!](\mathfrak{A}, v) = 0$ if $\mathfrak{A}, v \models \varphi$, and $[\![\varphi]\!](\mathfrak{A}, v) = \infty$ otherwise.
From now on, to avoid some irrelevant considerations, we will consider the variant of cost monadic logic in which a) only monadic variables are allowed, b) the inclusion relation $X \subseteq Y$ is allowed, and c) each relation over elements is raised to a relation over singleton sets. Keeping in mind that each element can be identified with the unique singleton set containing it, it is easy to translate cost monadic logic into this variant. In this presentation, it is also natural to see the inclusion relation as any other relation. We will also assume that the negations are pushed to the leaves of formulae as is usual. Overall a formula can be of one of the following forms:
$$R(X_1, \dots, X_n) \;\mid\; \neg R(X_1, \dots, X_n) \;\mid\; |X| \le N \;\mid\; \varphi \wedge \psi \;\mid\; \varphi \vee \psi \;\mid\; \exists X.\varphi \;\mid\; \forall X.\varphi$$
in which $\varphi$ and $\psi$ are formulas, $R$ is some symbol of arity $n$ which can possibly be $\subseteq$ (of arity $2$), and $X, X_1, \dots, X_n$ are monadic variables.
So far, we have described the semantics of cost monadic logic from the standard notion of model. There is another equivalent way to describe the meaning of formulae, by induction on the structure. The equations are disclosed in the following fact.
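The inductive reading of the semantics can be sketched as a brute-force evaluator over finite words. The equations used below are the natural ones and are an assumption of this sketch, since the fact itself is not reproduced here: a (negated) relation evaluates to 0 when it holds and to infinity otherwise, the cardinality predicate evaluates to the size of the set, disjunction and conjunction become min and max, and the two quantifiers become inf and sup over all subsets of positions. The tuple encoding of formulae and the names `value` and `COUNT_A` are hypothetical.

```python
import math
from itertools import combinations

def subsets(n):
    """All subsets of the positions 0..n-1, as frozensets."""
    for size in range(n + 1):
        for combo in combinations(range(n), size):
            yield frozenset(combo)

def value(phi, word, val):
    """Value of formula `phi` on `word` under valuation `val` (variable name -> set of positions)."""
    kind = phi[0]
    if kind == "card":                      # |X| <= N evaluates to the size of X
        return len(val[phi[1]])
    if kind in ("sub", "nsub"):             # X included in Y, or its negation
        holds = val[phi[1]] <= val[phi[2]]
        return 0 if holds == (kind == "sub") else math.inf
    if kind in ("lab", "nlab"):             # X is a singleton {i} and word[i] is the letter
        xs = val[phi[2]]
        holds = len(xs) == 1 and all(word[i] == phi[1] for i in xs)
        return 0 if holds == (kind == "lab") else math.inf
    if kind == "or":
        return min(value(phi[1], word, val), value(phi[2], word, val))
    if kind == "and":
        return max(value(phi[1], word, val), value(phi[2], word, val))
    if kind in ("ex", "all"):               # inf, resp. sup, over all sets of positions
        results = [value(phi[2], word, {**val, phi[1]: s}) for s in subsets(len(word))]
        return min(results) if kind == "ex" else max(results)
    raise ValueError(f"unknown connective: {kind}")

# Hypothetical sentence: "there is X with |X| <= N containing every position labelled a".
COUNT_A = ("ex", "X", ("and", ("card", "X"),
           ("all", "Y", ("or", ("nlab", "a", "Y"), ("sub", "Y", "X")))))

print(value(COUNT_A, "abaab", {}))
```

On this sentence the evaluator returns the number of occurrences of the letter a, which is the least bound on the size of a set covering all of them. The enumeration is exponential in the length of the word, so this is only meant to make the equations tangible.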
For monadic logic in general, no property (if not trivial) is decidable. Since cost monadic logic is an extension of monadic logic, one cannot expect anything to be better in this framework. However we are interested, as in the standard setting, in deciding properties over a restricted class $\mathcal{C}$ of structures. The class $\mathcal{C}$ can typically be the class of finite words, of finite trees, of infinite words (of length $\omega$, or beyond) or of infinite trees. The subject of this paper is to consider the case of finite words over a fixed finite alphabet.
We are interested in deciding properties concerning the function described by cost monadic formulae over such a class $\mathcal{C}$. But what kind of properties? It is quite easy to see that, given a cost monadic sentence $\varphi$ and $n \in \mathbb{N}$, one can effectively produce a monadic formula $\varphi_n$ such that for all structures $\mathfrak{A}$, $\mathfrak{A} \models \varphi_n$ iff $[\![\varphi]\!](\mathfrak{A}) \le n$ (such a translation would be possible even without assuming the positivity requirement in the use of the predicates $|X| \le N$). Hence, deciding questions of the form “$[\![\varphi]\!] \le n$” can be reduced to the standard theory.
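Properties that cannot be reduced to the standard theory, and that we are interested in, involve the existence of bounds. One says below that a function $f$ is bounded over some set $X$ if there is some integer $n$ such that $f(x) \le n$ for all $x \in X$. We are interested in the following generic problems, for cost monadic sentences $\varphi, \psi$ and a class $\mathcal{C}$ of structures:
Boundedness: Is the function $[\![\varphi]\!]$ bounded over $\mathcal{C}$?
Or (variant), is $[\![\varphi]\!]$ bounded over a regular subset of $\mathcal{C}$?
Or (limitedness), is $[\![\varphi]\!]$ bounded over $\{\mathfrak{A} \in \mathcal{C} : [\![\varphi]\!](\mathfrak{A}) \neq \infty\}$?
Divergence: For all $n$, do only finitely many $\mathfrak{A} \in \mathcal{C}$ satisfy $[\![\varphi]\!](\mathfrak{A}) \le n$?
Said differently, are all sets over which $[\![\varphi]\!]$ is bounded of finite cardinality?
Domination: For all $E \subseteq \mathcal{C}$, does $[\![\varphi]\!]$ bounded over $E$ imply that $[\![\psi]\!]$ is also bounded over $E$?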
None of these questions can be reduced (at least simply) to questions in the standard theory. Furthermore, all these questions become undecidable for very standard reasons as soon as the requirement of positivity in the use of the new predicate $|X| \le N$ is removed. In this paper, we introduce suitable material for proving their decidability over the class of words.
One easily sees that the domination question is in fact a joint extension of the boundedness question (if one sets $\varphi$ to be always true, i.e., to compute the constant function $0$), and the divergence question (if one sets $\psi$ to be measuring the size of the structure, i.e., $\psi = \forall X.\,|X| \le N$). Let us remark finally that if $\varphi$ is a formula of monadic logic, then the boundedness question corresponds to deciding if $\varphi$ is a tautology. If furthermore $\psi$ is also monadic, then the domination consists of deciding whether $\varphi$ implies $\psi$.
In the following section, we introduce the notion of cost functions, i.e., equivalence classes over functions allowing to omit discrepancies of the function described, while preserving sufficient information for working with the above questions.
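In this section, we introduce the equivalence relation $\approx$ over functions, and the central notion of cost function.
A correction function $\alpha$ is a non-decreasing mapping from $\mathbb{N}$ to $\mathbb{N}$ such that $\alpha(n) \ge n$ for all $n$. From now on, the symbols $\alpha, \alpha', \dots$ implicitly designate correction functions. Given $x, y$ in $\mathbb{N} \cup \{\infty\}$, $x \preccurlyeq_\alpha y$ holds if $x \le \alpha(y)$ in which $\alpha$ is the extension of $\alpha$ with $\alpha(\infty) = \infty$. For every set $E$, $\preccurlyeq_\alpha$ is extended to $(\mathbb{N} \cup \{\infty\})^E$ in a natural way by $f \preccurlyeq_\alpha g$ if $f(x) \preccurlyeq_\alpha g(x)$ for all $x \in E$, or equivalently $f \le \alpha \circ g$. Intuitively, $f$ is dominated by $g$ after it has been “stretched” by $\alpha$. One also writes $f \approx_\alpha g$ if $f \preccurlyeq_\alpha g$ and $g \preccurlyeq_\alpha f$. Finally, one writes $f \preccurlyeq g$ (resp. $f \approx g$) if $f \preccurlyeq_\alpha g$ (resp. $f \approx_\alpha g$) for some $\alpha$. A cost function (over a set $E$) is an equivalence class of $\approx$ (i.e., a set of mappings from $E$ to $\mathbb{N} \cup \{\infty\}$).
Some elementary properties of $\preccurlyeq_\alpha$ are: if $\alpha \le \alpha'$ and $f \preccurlyeq_\alpha g$, then $f \preccurlyeq_{\alpha'} g$; and if $f \preccurlyeq_\alpha g \preccurlyeq_\alpha h$, then $f \preccurlyeq_{\alpha \circ \alpha} h$.
The above fact allows one to work with a single correction function at a time. Indeed, as soon as two correction functions $\alpha$ and $\beta$ are involved in the same proof, we can consider the correction function $\alpha' = \max(\alpha, \beta)$. By the above fact, it satisfies that $f \preccurlyeq_\alpha g$ implies $f \preccurlyeq_{\alpha'} g$, and $f \preccurlyeq_\beta g$ implies $f \preccurlyeq_{\alpha'} g$.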
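As a small sanity check of these definitions, the following Python sketch tests domination up to a correction function on a finite sample of words. A finite sample can only refute or illustrate a claim, never prove it; the sample, the two functions and all names (`dominated`, `count_a`, `square_a`) are hypothetical choices made for the illustration.

```python
import math
from itertools import product

def extend(alpha):
    """Extend a correction function to infinity by mapping infinity to itself."""
    return lambda n: math.inf if n == math.inf else alpha(n)

def dominated(f, g, alpha, sample):
    """Check f(x) <= alpha(g(x)) for every x in the finite sample."""
    ext = extend(alpha)
    return all(f(x) <= ext(g(x)) for x in sample)

# All words over {a, b} of length at most 4.
SAMPLE = ["".join(w) for k in range(5) for w in product("ab", repeat=k)]

def count_a(u):
    return u.count("a")

def square_a(u):
    return u.count("a") ** 2

def identity(n):
    return n

def square(n):
    return n * n

def combined(n):
    # Pointwise maximum of two correction functions: it dominates both.
    return max(identity(n), square(n))

print(dominated(square_a, count_a, square, SAMPLE))    # stretched by squaring
print(dominated(square_a, count_a, identity, SAMPLE))  # the identity is too weak
print(dominated(count_a, square_a, combined, SAMPLE))
```

The two functions are bounded over exactly the same sets of words, which is why a single correction function (here the pointwise maximum) witnesses domination in both directions.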
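The relation $\preccurlyeq$ has other characterisations: for all $f, g$ from $E$ to $\mathbb{N} \cup \{\infty\}$, the following statements are equivalent:
(1) $f \preccurlyeq g$,
(2) for all $n \in \mathbb{N}$ there exists $m \in \mathbb{N}$ such that for all $x \in E$, $g(x) \le n$ implies $f(x) \le m$,
(3) for all $X \subseteq E$, if $g$ restricted to $X$ is bounded, then $f$ restricted to $X$ is bounded.
From (1) to (2).
Let us assume $f \preccurlyeq g$, i.e., $f \preccurlyeq_\alpha g$ for some $\alpha$. Let $n$ be some non-negative integer, and $m = \alpha(n)$. We have for all $x \in E$, that $g(x) \le n$ implies $f(x) \le \alpha(g(x)) \le \alpha(n) = m$, thus establishing the second statement.
From (2) to (3).
Let $X \subseteq E$ be such that $g$ restricted to $X$ is bounded. Let $n$ be a bound of $g$ over $X$. Item (2) states the existence of $m$ such that $g(x) \le n$ implies $f(x) \le m$ for all $x \in E$. In particular, for all $x \in X$, we have $g(x) \le n$ by choice of $n$, and hence $f(x) \le m$. Hence $f$ restricted to $X$ is bounded by $m$.
From (3) to (1).
Let $n \in \mathbb{N}$, consider the set $X_n = \{x \in E : g(x) \le n\}$. The mapping $g$ is bounded over $X_n$ (by $n$), and hence by (3), $f$ is also bounded over $X_n$. We set $\alpha(n) = \max(n, \sup f(X_n))$. Since $X_n \subseteq X_{n+1}$, the function $\alpha$ is non-decreasing. Since furthermore $\alpha(n) \ge n$, $\alpha$ is a correction function. Let now $x \in E$. If $g(x) < \infty$, we have that $f(x) \le \alpha(g(x))$ by definition of the $X_n$’s, since $x \in X_{g(x)}$. Hence $f(x) \preccurlyeq_\alpha g(x)$. Otherwise $g(x) = \infty$, and we have $f(x) \le \infty = \alpha(g(x))$. Hence $f \preccurlyeq_\alpha g$.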
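The last step of the proof is constructive once the domain is finite. The sketch below builds a correction function from two functions on a finite sample, taking at each threshold n the larger of n and the greatest value of the first function on the elements where the second one is at most n. Restricting to a finite sample, and the names `build_correction`, `count_a` and `square_a`, are assumptions of this illustration.

```python
from itertools import product

def build_correction(f, g, sample):
    """Return alpha with alpha(n) = max(n, max of f over {x in sample : g(x) <= n})."""
    def alpha(n):
        values = [f(x) for x in sample if g(x) <= n]
        return max([n] + values)
    return alpha

# All words over {a, b} of length at most 4.
SAMPLE = ["".join(w) for k in range(5) for w in product("ab", repeat=k)]

def count_a(u):
    return u.count("a")

def square_a(u):
    return u.count("a") ** 2

alpha = build_correction(square_a, count_a, SAMPLE)
print([alpha(n) for n in range(6)])
```

By construction the resulting function is non-decreasing, satisfies alpha(n) >= n, and stretches the second function enough to dominate the first one on the whole sample.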
The last characterisation shows that the relation $\approx$ is an equivalence relation that preserves the existence of bounds. Indeed, all this theory can be seen as a method for proving the existence/non-existence of bounds. One can also remark that the questions of boundedness, divergence, and domination presented in the previous section, are preserved under replacing the semantics of a formula by an $\approx$-equivalent function. Furthermore, the domination question can be simply reformulated as $[\![\psi]\!] \preccurlyeq [\![\varphi]\!]$.
We conclude this section by some remarks on the structure of the $\preccurlyeq$ relation. Cost functions over some set $E$ ordered by $\preccurlyeq$ form a lattice. Let us show how this lattice refines the lattice of subsets of $E$ ordered by inclusion. The following elementary fact shows that we can identify a subset of $E$ with the cost function of its characteristic function (given a subset $X \subseteq E$, one denotes by $\chi_X$ its characteristic mapping defined by $\chi_X(x) = 0$ if $x \in X$, and $\infty$ otherwise): for all $X, Y \subseteq E$, $\chi_X \preccurlyeq \chi_Y$ iff $Y \subseteq X$.
In this respect, the lattice of cost functions is a refinement of the lattice of subsets of $E$ equipped with the superset ordering. Let us show that this refinement is strict. Indeed, there is only one language $X$ such that $\chi_X$ does not have $\infty$ in its range, namely $X = E$. However, we will show in Proposition ? that, as soon as $E$ is infinite, there are uncountably many cost functions which have this property of not using the value $\infty$.
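Without loss of generality, we can assume $E$ countable, and even, up to bijection, that $E = \mathbb{N} \setminus \{0\}$. Let $p_0, p_1, \dots$ be the sequence of all prime numbers. Every $n \in E$ is decomposed in a unique way as $n = p_0^{n_0} p_1^{n_1} \cdots$ in which all $n_i$’s are null but finitely many (with an obvious meaning of the infinite product). For all $I \subseteq \mathbb{N}$, one defines the function $f_I$ from $E$ to $\mathbb{N}$ for all $n \in E$ by:
$$f_I(n) = \max\{n_i : i \in I\}, \quad\text{with } \max\emptyset = 0.$$
Consider now two different sets $I, J \subseteq \mathbb{N}$. This means (up to a possible exchange of the roles of $I$ and $J$) that there exists $i \in I \setminus J$. Consider now the set $X = \{p_i^k : k \in \mathbb{N}\}$. Then, by construction, $f_I(p_i^k) = k$ and hence $f_I$ is not bounded over $X$. However, $f_J(p_i^k) = 0$ and hence $f_J$ is bounded over $X$. It follows by Proposition ? that $f_I$ and $f_J$ are not equivalent for $\approx$. We can finally conclude that, since there exist continuum many subsets of $\mathbb{N}$, there are at least continuum many cost functions over $E$ which do not use the value $\infty$.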
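The separation argument can be observed numerically. The sketch below assumes one concrete choice for the functions indexed by sets of primes, namely the largest exponent, in the prime decomposition of the argument, among the primes whose index belongs to the set; this choice is an assumption compatible with the argument above, and the names `PRIMES`, `exponent` and `f` are hypothetical. Only the first few primes are listed, so index sets must stay within that range.

```python
PRIMES = [2, 3, 5, 7, 11, 13]  # the first primes, indexed from 0

def exponent(n, p):
    """Exponent of the prime p in the decomposition of n (n must be at least 1)."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def f(index_set, n):
    """Largest exponent of a prime whose index lies in index_set (0 if there is none)."""
    return max((exponent(n, PRIMES[i]) for i in index_set), default=0)

I, J = {0, 2}, {1}          # index 0 belongs to I but not to J
powers = [PRIMES[0] ** k for k in range(6)]
print([f(I, x) for x in powers])  # grows with the exponent
print([f(J, x) for x in powers])  # stays at zero
```

Over the powers of the distinguishing prime, the function indexed by I is unbounded while the one indexed by J is constant, so no correction function can relate them in both directions.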
2.3 Solving cost monadic logic over words using cost functions
As usual, we see a word as a structure, the universe of which is the set of positions in the word (numbered from $1$), equipped with the ordering relation $\le$, and with a unary relation for each letter $a$ of the alphabet that we interpret as the set of positions at which the letter $a$ occurs. Given a set of monadic variables $F$, and a valuation $v$ of $F$ over a word $u = a_1 \dots a_k \in \mathbb{A}^*$, we denote by $\langle u, v \rangle$ the word $c_1 \dots c_k$ over the alphabet $\mathbb{A}_F = \mathbb{A} \times \{0,1\}^F$ such that for all positions $i$, $c_i = (a_i, \delta_i)$ in which $\delta_i$ maps $X \in F$ to $1$ if $i \in v(X)$, and to $0$ otherwise.
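The encoding of a word together with a valuation can be sketched in a few lines. The choices below are assumptions of the illustration: positions are numbered from 0 as is idiomatic in Python, the bits attached to a letter are listed in a fixed order of the variables, and the name `encode` is hypothetical.

```python
def encode(word, valuation, variables):
    """Pair each letter with one bit per variable: 1 if the position belongs to its set."""
    return [
        (letter, tuple(1 if i in valuation[x] else 0 for x in variables))
        for i, letter in enumerate(word)
    ]

encoded = encode("aba", {"X": {0, 2}, "Y": {1}}, ["X", "Y"])
print(encoded)
```

Each letter of the enriched word records, besides the original letter, which of the sets the position belongs to, so that properties of the pair (word, valuation) become properties of a single word over a larger alphabet.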
It is classical that given a monadic formula $\varphi$ with free variables $F$, the language
$$L_\varphi = \{\langle u, v \rangle : u, v \models \varphi\} \subseteq \mathbb{A}_F^*$$
is regular. The proof is done by induction on the formula. It amounts to remarking that the constructions of the logic, namely disjunction, conjunction, negation and existential quantification, correspond naturally to language theoretic operations, namely union, intersection, complementation and projection. The base cases are obtained by remarking that the relations of ordering, inclusion, and letter, also correspond to regular languages.
We use a similar approach. To each cost monadic formula $\varphi$ with free variables $F$ over the signature of words over $\mathbb{A}$, we associate the cost function $f_\varphi$ over $\mathbb{A}_F^*$ defined by
$$f_\varphi(\langle u, v \rangle) = [\![\varphi]\!](u, v).$$
We aim at solving cost monadic logic by providing an explicit representation to the cost functions $f_\varphi$. For reaching this goal, we need to define a family $\mathcal{F}$ of cost functions that contains suitable constants, has effective closure properties and decision procedures.
The first assumption we make is the closure under composition with a morphism. That is, let $f$ be a cost function in $\mathcal{F}$ over $\mathbb{A}^*$ and $h$ be a morphism from $\mathbb{B}^*$ ($\mathbb{B}$ being another alphabet) to $\mathbb{A}^*$; we require $f \circ h$ to also belong to $\mathcal{F}$. In particular, this operation allows us to change the alphabet, and hence to add new variables when required. It corresponds to the closure under inverse morphism for regular languages.
Fact ? gives us a very precise idea of the constants we need. The constants correspond to the formulae of the form $R(X_1, \dots, X_n)$ as well as their negation. As mentioned above, for such a formula $\varphi$, $L_\varphi$ is regular. Hence, it is sufficient for us to require that the characteristic function $\chi_L$ belongs to $\mathcal{F}$ for each regular language $L$. The remaining constants correspond to the formula $|X| \le N$. We have that $f_{|X| \le N}(\langle u, v \rangle) = |v(X)|$. This corresponds to counting the number of occurrences of letters from $\mathbb{A} \times \{1\}$ in a word over $\mathbb{A} \times \{0, 1\}$. Up to a change of alphabet (thanks to the closure under composition with a morphism) it will be sufficient for us that $\mathcal{F}$ contains the function “$|\cdot|_a$” which maps each word $u \in \{a, b\}^*$ to the number $|u|_a$ of occurrences of the letter $a$ in $u$.
Fact ? also gives us a very precise idea of the closure properties we need. We need the closure under and for disjunctions and conjunctions. For dealing with existential and universal quantification, we need the new operations of -projection and -projection. Given a mapping from to and a mapping from to that we extend into a morphism from to ( being another alphabet) the inf-projection of with respect to is the mapping from to defined for all by:
Similarly, the sup-projection of with respect to is the mapping from to defined for all by:
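The two projections can be sketched by brute force in Python, under the reading that the inf-projection (resp. sup-projection) of a word is the infimum (resp. supremum) of the cost over all words that the letter-to-letter morphism maps onto it. The function names and the toy alphabets below are hypothetical, and the enumeration is exponential in the length of the word, so this is an illustration and not a decision procedure.

```python
import math
from itertools import product

def preimages(word, h):
    """All words u over the source alphabet whose letterwise image is `word`."""
    choices = [[a for a, b in h.items() if b == letter] for letter in word]
    return ("".join(u) for u in product(*choices))

def inf_projection(f, h):
    """Infimum of f over preimages (infinity when there is no preimage)."""
    return lambda word: min((f(u) for u in preimages(word, h)), default=math.inf)

def sup_projection(f, h):
    """Supremum of f over preimages (0 when there is no preimage)."""
    return lambda word: max((f(u) for u in preimages(word, h)), default=0)

# Toy letter-to-letter map from {a, b, c} to {x, y}.
h = {"a": "x", "b": "x", "c": "y"}
count_a = lambda u: u.count("a")

f_inf = inf_projection(count_a, h)
f_sup = sup_projection(count_a, h)
```

On the word `xxy`, the preimages are `aac`, `abc`, `bac` and `bbc`, so the inf-projection picks the preimage without any `a` and the sup-projection the one with two.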
We summarise all the requirements in the following fact.
The remainder of the paper is devoted to the introduction of the class of recognisable cost functions, and showing that this class satisfies all the assumptions of Fact ?. In particular, Item ? is established as Example ?. Item ? is achieved in Example ?. Item ? is the subject of Fact ?, Corollary ? and Theorems ? and ?. Finally, Item ? is established in Theorem ?.
Thus we deduce our main result.
3 The algebraic model: stabilisation monoids
The purpose of this section is to describe the algebraic model of stabilisation monoids. This model has, a priori, no relation with the previous section. However, in Section 4, in which we define the notion of a recognisable cost function, we will use this model for describing cost functions.
The key idea, directly inspired by the work of Leung, Simon and Kirsten, is to develop an algebraic notion (the stabilisation monoid) in which a special operator (called the stabilisation, ) makes it possible to express what happens when we iterate some element “a lot of times”. In particular, it says whether or not we should count the number of iterations of this element. The terminology “a lot of times” is very vague, and for this reason such a formalism cannot describe functions precisely. However, it is perfectly suitable for describing cost functions.
The remaining part of the section is organised as follows. We first introduce the notion of stabilisation monoids in Section 3.1, paying special attention to giving it an intuitive meaning. In Section 3.2, we introduce the key notions of computations, under-computations and over-computations, as well as the two central results of existence of computations (Theorem ?) and “unicity” of their values (Theorem ?). These notions and results form the main technical core of this work. Then Section 3.4 is devoted to the proof of Theorem ?, and Section 3.5 to the proof of Theorem ?.
A semigroup is a set equipped with an associative operation ’’. A monoid is a semigroup such that the product has a neutral element , i.e., such that for all . Given a semigroup , we extend the product to products of arbitrary length by defining from to by and . If the semigroup is a monoid of neutral element , we further set . All monoids are semigroups, and conversely it is sometimes convenient to transform a semigroup into a monoid simply by the adjunction of a new neutral element .
An idempotent in is an element such that . We denote by the set of idempotents in . An ordered semigroup is a semigroup together with an order over such that the product is compatible with ; i.e., and implies . An ordered monoid is an ordered semigroup, the underlying semigroup of which is a monoid.
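These basic notions are easy to experiment with on finite examples. The Python sketch below checks associativity, computes the idempotents, adjoins a fresh neutral element and tests compatibility of an order with the product. The helper names, the three-element example (a sub-semigroup of the integers modulo 8 under multiplication) and the chosen order are all hypothetical.

```python
from itertools import product

def is_associative(elements, mul):
    """Check (a.b).c == a.(b.c) for all triples."""
    return all(mul(mul(a, b), c) == mul(a, mul(b, c))
               for a, b, c in product(elements, repeat=3))

def idempotents(elements, mul):
    """Elements e such that e.e == e."""
    return {e for e in elements if mul(e, e) == e}

def adjoin_unit(elements, mul, unit="1"):
    """Turn a semigroup into a monoid by adding a new neutral element."""
    def mul1(a, b):
        if a == unit:
            return b
        if b == unit:
            return a
        return mul(a, b)
    return set(elements) | {unit}, mul1

def is_compatible(elements, mul, leq):
    """Check that a <= c and b <= d imply a.b <= c.d."""
    return all(leq(mul(a, b), mul(c, d))
               for a, b, c, d in product(elements, repeat=4)
               if leq(a, c) and leq(b, d))

# Toy semigroup without neutral element: {0, 2, 4} under multiplication mod 8.
S = {0, 2, 4}
mul = lambda a, b: (a * b) % 8
# Toy order: 0 is below every element, other elements are incomparable.
leq = lambda a, b: a == b or a == 0

S1, mul1 = adjoin_unit(S, mul)
```

Here the only idempotent of the semigroup is 0, and the adjoined unit becomes a second idempotent of the resulting monoid.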
We are now ready to introduce the new notions of stabilisation semigroups and stabilisation monoids.
The intuition is that represents what is the value of when becomes “very large”. Some consequences of the definitions, namely
make perfect sense in this respect: repeating “a lot of ’s” is equivalent to seeing one followed by “a lot of ’s”, etc. This meaning of is in some sense a limit behaviour. This is an intuitive reason why is not used for non-idempotent elements. Consider for instance the element in . Then iterating it yields at even iterations, and at odd ones. This alternation prevents giving a clear meaning to the result of “iterating a lot of times” .
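The alternation phenomenon can be observed directly. The sketch below assumes that the example refers to the element 1 of the integers modulo 2 under addition. The helper `powers` is hypothetical and simply lists the successive iterates of an element.

```python
def powers(element, mul, count):
    """Return [x, x.x, x.x.x, ...] with `count` entries."""
    result, current = [], element
    for _ in range(count):
        result.append(current)
        current = mul(current, element)
    return result

add_mod2 = lambda a, b: (a + b) % 2

alternating = powers(1, add_mod2, 6)  # non-idempotent element: no limit
constant = powers(0, add_mod2, 6)     # idempotent element: constant iterates
```

The iterates of the non-idempotent element never settle, whereas those of an idempotent are constant, which is why a limit-like operator is only given a meaning on idempotents.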
However, this view is incompatible with the classical view on monoids, in which by induction, if , then for all . The idea in stabilisation monoids is that the product is something that cannot be iterated “a lot of times”. For this reason, considering that for all , is correct for “small values of ”, but becomes “incorrect” for “large values of ”. The value of is if is “small”, and it is if is “big”. Most of the remainder of the section is devoted to the formalisation of this intuition, via the use of the notion of computations.
Even if the material necessary for working with stabilisation monoids has not yet been provided, it is already possible to give some examples of stabilisation monoids that are constructed from an informal idea of their intended meaning.
We have seen through the above examples how easy it is to work with stabilisation monoids at an informal level. An important part of the rest of the section is dedicated to providing formal definitions for this informal reasoning. In the above explanations, we worked with the imprecise terminology “a few” and “a lot of”. Of course, the value (what we referred to as “the kind” in the examples) of a word depends on what is the frontier we fix for separating “a few” from “a lot”.
We continue the description of stabilisation monoids by introducing the key notion of computations. These objects describe how to evaluate a long “product” in a stabilisation monoid.
3.2 Computations, under-computations and over-computations
Our goal is now to provide a formal meaning for the notion of stabilisation semigroups and stabilisation monoids, allowing us to avoid terms such as “a lot” or “a few”. More precisely, we develop in this section the notion of computations. A computation is a tree which is used as a witness that a word evaluates to a given value.
We fix for the rest of the section a stabilisation semigroup . We develop first the notion for semigroups, and then see how to use it for monoids in Section 3.3 (we will see that the notions are in close correspondence).
Let us consider a word (it is a word over , seen as an alphabet). Our objective is to define a “value” for this word. In standard semigroups, the “value” of is simply , the product of the elements appearing in the word. But, what should the “value” be for a stabilisation semigroup? All the informal semantics we have seen so far were based on the distinction between “a few” and “a lot”. This means that the value the word has depends on what is considered as “a few”, and what is considered as “a lot”. This is captured by the fact that the value is parameterised by a positive integer which can be understood as a threshold separating what is considered as “a few” from what is considered as “a lot”. For each choice of , the word may have a different value in the stabilisation semigroup.
Let us assume a threshold value is fixed. We still lack a general mechanism for associating to each word over a value in . This is the purpose of computations. Computations are proofs (taking the form of a tree) that a word should evaluate to a given value. Indeed, in the case of usual semigroups, the fact that a word evaluates to can be witnessed by a binary tree, the leaves of which, read from left to right, yield the word , and such that each inner node is labelled by the product of the label of its children. Clearly, the root of such a tree is labelled by , and the tree can be seen as a proof of correctness for this value.
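For a usual semigroup, such a witness tree is simple to represent and to check. The following Python sketch uses a hypothetical `Node` class and, as a toy semigroup, multiplication of integers modulo 7. It verifies that every inner node is binary and labelled by the product of its children.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    value: int
    children: tuple = ()

def leaf_word(node):
    """Word obtained by reading the leaves from left to right."""
    if not node.children:
        return [node.value]
    return [a for child in node.children for a in leaf_word(child)]

def is_binary_witness(node, mul):
    """True if every inner node is binary and carries the product of its children."""
    if not node.children:
        return True
    if len(node.children) != 2:
        return False
    left, right = node.children
    return (node.value == mul(left.value, right.value)
            and is_binary_witness(left, mul)
            and is_binary_witness(right, mul))

mul_mod7 = lambda a, b: (a * b) % 7

# Witness that the word 2, 3, 5 evaluates to 2 modulo 7: (2.3).5 = 6.5 = 30 = 2.
good = Node(2, (Node(6, (Node(2), Node(3))), Node(5)))
# A tree with a wrong label at the root.
bad = Node(3, (Node(2), Node(3)))
```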
The notion of -computation that we define now is a variation around this principle. For more ease in its use, it comes in three variants: under-computations, over-computations and computations.
It should be immediately clear that these notions have to be manipulated with care, as shown by the following example.
There is another problem. Indeed, it is straightforward to construct an -computation for some word, simply by constructing a computation which is a binary tree, and would use no idempotent nodes nor stabilisation nodes. However, such a computation would of course not be satisfactory since every word would be evaluated in this way as . We do not want that. This would mean that the quantitative aspect contained in the stabilisation has been lost. We need to determine what is a relevant computation in order to rule out such computations.
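The phenomenon can be made concrete with a small checker. The sketch below assumes the following reading of the definition: a binary node carries the product of its two children, a node of degree between 3 and the threshold has equal idempotent children and carries their common value, and a node of degree above the threshold has equal idempotent children and carries the stabilisation of their common value. The three-element monoid, the `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stabilisation monoid: "b" is the unit, "a" is idempotent
# with stabilisation "0", and "0" is absorbing.
def mul(x, y):
    if "0" in (x, y):
        return "0"
    if x == "b":
        return y
    if y == "b":
        return x
    return "a"

STAB = {"b": "b", "a": "0", "0": "0"}

@dataclass(frozen=True)
class Node:
    value: str
    children: tuple = ()

def is_computation(node, n):
    """Check the node conditions of an n-computation (assumed definition)."""
    kids = node.children
    if not kids:
        return True  # a leaf carries its own letter
    if len(kids) == 1 or not all(is_computation(k, n) for k in kids):
        return False
    if len(kids) == 2:  # binary node
        return node.value == mul(kids[0].value, kids[1].value)
    e = kids[0].value
    if any(k.value != e for k in kids) or mul(e, e) != e:
        return False
    # idempotent node below the threshold, stabilisation node above it
    return node.value == (e if len(kids) <= n else STAB[e])

def comb(word):
    """Binary tree multiplying the letters from left to right."""
    tree = Node(word[0])
    for letter in word[1:]:
        tree = Node(mul(tree.value, letter), (tree, Node(letter)))
    return tree

def flat(word, n):
    """Single node over a word made of at least 3 copies of one idempotent."""
    e = word[0]
    value = e if len(word) <= n else STAB[e]
    return Node(value, tuple(Node(c) for c in word))

word = "a" * 10
```

With threshold 4, both `comb(word)` and `flat(word, 4)` pass the check, yet the first has value `a` and the second has value `0`. Only the second, of height 1, reflects that ten occurrences count as “a lot” for this threshold.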
Thus we need to answer the following questions:
What are the relevant computations?
Can we construct a relevant -computation for all words and all ?
How do we relate the different values that -computations may have on the same word?
The answer to the first question is that we are only interested in computations of small height, meaning of height bounded by some function of the semigroup. With such a restriction, it is not possible to use binary trees as computations. However, this choice makes the answer to the second question less obvious: does there always exist a computation?
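The incompatibility between bounded height and binary trees is a simple counting argument: a binary tree of height h has at most 2^h leaves. The short sketch below, with hypothetical function names, uses the convention that leaves do not count in the height, so that a single-node tree has height 0.

```python
def min_binary_height(length):
    """Least height of a binary tree with `length` leaves (leaves not counted)."""
    return (length - 1).bit_length()

def max_length_for_height(height):
    """Longest word that a binary tree of the given height can cover."""
    return 2 ** height
```

Any word longer than `max_length_for_height(h)` therefore forces a node of degree at least 3 in every tree of height at most h, which is where idempotent and stabilisation nodes become unavoidable.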
This result is an extension of the factorisation forest theorem of Simon (which corresponds to the case of a semigroup). Its proof, which is independent from the rest of this work, is presented in Section 3.4.
The third question remains: how to compare the values of different computations over the same word? An answer to this question in its full generality makes use of under- and over-computations.
Remark first that since computations are special instances of under- and over-computations, Theorem ? holds in particular for comparing the values of computations. The proof of Theorem ? is the subject of Section 3.5.
We have illustrated the above results in Figure ?. It depicts the relationship between computations in some idealised stabilisation monoid . In this drawing, assume some word over some stabilisation semigroup is fixed, as well as some integer . We aim at representing for each the possible values of an -computation, an -under-computation or an -over-computation for this word of height at most . In all the explanations below, all computations are supposed to not exceed height .
The horizontal axis represents the -coordinate. The values in the stabilisation semigroup being ordered, the vertical axis represents the values in the stabilisation semigroup (for the picture, we assume the values in the stabilisation semigroup totally ordered). Thus an -computation (or -under or -over-computation) is placed at a point of horizontal coordinate and vertical coordinate the value of the computation.
We can now interpret the properties of the computations in terms of this figure. First of all, under-computations as well as over-computations, and as opposed to computations, enjoy certain forms of monotonicity as shown by the fact below.
Fact ? is illustrated by Figure ?. It means that over-computations define a left and upward-closed area, while the under-computations define a right and downward-closed area. Hence, in particular, the delimiting lines are non-decreasing. Furthermore, since computations are at the same time over-computations and under-computations, the area of computations lies inside the intersection of under-computations and over-computations. Since the height is chosen to be at least , Theorem ? provides even more information. Namely, for each value of , there exists an -computation. This means in the picture that the area of computations crosses every column. However, since computations do not enjoy monotonicity properties, the shape of the area of computations can be quite complicated. Finally, Theorem ? states that the frontier of under-computations and the frontier of over-computations are not far from each other. More precisely, if we choose an element of the stabilisation semigroup, and we draw a horizontal line at altitude , if the frontier of under-computations is above or at for threshold , then the frontier of over-computations is also above or at at threshold . Hence the frontier of over-computations is always below the one of under-computations, but it essentially grows at the same speed, with a delay of at most .
Let us finally remark that Theorem ?, which is a consequence of the axioms of stabilisation semigroups, is also sufficient for deducing them. This is formalised by the following proposition.
Let us first prove that is compatible with . Assume and , then is a -under-computation over and is an -over-computation over the same word . It follows that .
Let us now prove that is associative. Let in . Then is a -computation for the word , and is an -computation for the same word. It follows that . The other inequality is symmetric.
Let be an idempotent. The tree is both a and -computation over the word . Furthermore, the tree is also both a and an -computation for the same word. It follows that , i.e., that maps idempotents to idempotents.
Let us show that for all idempotents . The tree is a -computation over the word , and is a -computation over the same word. It follows that .
Let us show that stabilisation is compatible with the order. Let be idempotents. Then and are respectively a -computation for the word and an -over-computation for the same word. It follows that .
Let us prove that stabilisation is idempotent. Let be an idempotent. We already know that (this makes sense since we have seen that is idempotent). Let us prove the opposite inequality. Consider the -computation for the word , and the -computation for the same word. It follows that .
Let us finally prove the consistency of stabilisation. Assume that both and are idempotents. Let be (and similarly for ), i.e., computations for and respectively. Define now:
Then both and are at the same time and -computations over the same word of height at most . Since their respective values are and , it follows by our assumption that .
This result is particularly useful. Indeed, when constructing a new stabilisation semigroup, we usually aim at proving that it “recognises” some function (to be defined in the next section). It involves proving the hypothesis of Proposition ?. Thanks to Proposition ?, the syntactic correctness then comes for free. This situation occurs in particular in Section 4.5 and Section 4.6 when the closure of recognisable cost-functions under inf-projection and sup-projection is established.
3.3 Specificities of stabilisation monoids
We have presented so far the notion of computations in the case of stabilisation semigroups. We are in fact interested in the study of stabilisation monoids. Monoids differ from semigroups by the presence of a unit element . This element is used for modelling the empty word. We present in this section the natural variant of the notions of computations for the case of stabilisation monoids. As is often the case, results from stabilisation semigroups transfer naturally to stabilisation monoids. The definition is closely related to the one for stabilisation semigroups, and we see through this section that it is easy to go from the notion for stabilisation monoids to the one for stabilisation semigroups, and back. As a result, we use the same name “computation” for the two notions elsewhere in the paper.
Thus, the definition deals with the implicit presence of arbitrarily many copies of the empty word (the unit) interleaved with a given word. This definition allows us to work in a transparent way with the empty word (this saves us case distinctions in proofs). In particular the empty word has an sm--computation which is simply , of value . There are many others, like for instance.
Since each -computation is also an sm-n-computation over the same word, it is clear that Theorem ? can be extended to this situation (just the obvious case of the empty word needs to be treated separately):
The following lemma shows that sm-[under/over]-computations are not more expressive than [under/over]-computations. It is also elementary to prove.
It is simple to eliminate each occurrence of an extra by local modifications of the structure of the sm-computation: replace subtrees of the form by , subtrees of the form by , and subtrees of the form by , until all occurrences of have been eliminated. For the empty word, this results in the first part of the lemma. For non-empty words, the resulting simplified sm-computation is a computation. The argument works identically for the under/over variants.
A corollary is that Theorem ? extends to sm-computations.
Indeed, the sm-under-computations and sm-over-computations can be turned into under-computations and over-computations of the same respective values by Lemma ?. The inequality holds for these under and over-computations by Theorem ?.
There is a last lemma which is related and will prove useful.
Let , then for . Let be the -[under/over]-computation for of value . It is easy to construct an -[under/over]-computation for of height at most of value . It is then sufficient to plug in each for the th leaf of .
The consequence of these results is that we can work with sm-[under/over]-computations as with [under/over]-computations. For this reason we shall not distinguish further between the two notions unless necessary.
3.4 Existence of computations: the proof of Theorem
In this section, we establish Theorem ? which states that for all words over a stabilisation semigroup and all non-negative integers , there exists an -computation for of height at most . Remark that the convention in this context is to measure the height of a tree without counting the leaves. This result is a form of extension of the factorisation forest theorem due to Simon :
Some proofs of the factorisation forest theorem can be found in . Our proof could follow similar lines as the above one. Instead of that, we try to reuse as many lemmas as possible from the above constructions.
For proving Theorem ?, we will need one of Green’s relations, namely the -relation (while there are five relations in general). Let us fix a semigroup . We denote by the semigroup extended (if necessary) with a neutral element (this transforms into a monoid). Given two elements , if for some . If and , then . We write to denote and . The interested reader can see, e.g.,  for an introduction to the relations of Green (with a proof of the factorisation forest theorem), or monographs such as ,  or  for deep presentations of this theory. Finally, let us call a regular element in a semigroup an element such that for some .
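For readers who wish to experiment, the two-sided order and the regular elements can be computed by brute force on a small example. The sketch below uses the standard definitions: an element is below another when it belongs to the two-sided ideal the other generates, and an element a is regular when a = a·x·a for some x. The example, the monoid of all maps from a three-element set to itself, and every name in the code are choices made for this illustration only.

```python
from itertools import product

POINTS = range(3)
# All 27 maps from {0, 1, 2} to itself, written as tuples of images.
ELEMENTS = list(product(POINTS, repeat=3))
IDENTITY = (0, 1, 2)

def mul(f, g):
    """Composition: apply f first, then g."""
    return tuple(g[f[i]] for i in POINTS)

def ideal(b):
    """Two-sided ideal generated by b (the monoid already has a unit)."""
    return frozenset(mul(mul(x, b), y) for x in ELEMENTS for y in ELEMENTS)

IDEALS = {a: ideal(a) for a in ELEMENTS}

def leq_j(a, b):
    """a is below b when a = x.b.y for some x, y."""
    return a in IDEALS[b]

# Two elements are equivalent exactly when they generate the same ideal.
j_classes = {}
for a in ELEMENTS:
    j_classes.setdefault(IDEALS[a], set()).add(a)

def is_regular(a):
    return any(mul(mul(a, x), a) == a for x in ELEMENTS)
```

In this example the classes are the sets of maps of a given rank (constant maps, maps with a two-element image, and permutations), and every element is regular.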
The next lemma gathers some classical results concerning finite semigroups.
We will use the following technical lemma.
We use some standard results concerning finite semigroups. The interested reader can find the necessary material for instance in . Let us just recall that the relations , and and are the one-sided variants of and ( stands for “left” and for “right”). Namely, (resp. ) holds if for some (resp. ), and (resp. ). Finally, .
The proof is very short. By definition since . Since by assumption , we obtain (a classical result in finite semigroups). In a symmetric way . Thus . Since an -class contains at most one idempotent, (it is classical that any -class, when containing an idempotent, has a group structure; since groups contain exactly one idempotent element, this is the only one).
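The classical facts used in this argument can be checked exhaustively on a small example. The sketch below groups the elements of the monoid of all maps from a three-element set to itself according to the left and right ideals they generate, which is the standard way to describe the intersection of the two one-sided equivalences. The example and all names are assumptions made for this illustration.

```python
from itertools import product

POINTS = range(3)
ELEMENTS = list(product(POINTS, repeat=3))

def mul(f, g):
    """Composition: apply f first, then g."""
    return tuple(g[f[i]] for i in POINTS)

def h_class_key(a):
    """Pair of one-sided ideals generated by a (the monoid has a unit)."""
    right_ideal = frozenset(mul(a, x) for x in ELEMENTS)
    left_ideal = frozenset(mul(x, a) for x in ELEMENTS)
    return (left_ideal, right_ideal)

# Two elements share a class when they generate the same left and right ideals.
h_classes = {}
for a in ELEMENTS:
    h_classes.setdefault(h_class_key(a), set()).add(a)

idempotents = {e for e in ELEMENTS if mul(e, e) == e}

def is_closed(cls):
    """True if the class is closed under product."""
    return all(mul(a, b) in cls for a in cls for b in cls)
```

On this example, no class contains two idempotents, and every class containing an idempotent is closed under product, in accordance with the group structure mentioned above.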
The next lemma shows that the stabilisation operation behaves in a very uniform way inside -classes (similar arguments can be found in the works of Leung, Simon and Kirsten).
For the second part, assume and . Let . We easily check . Furthermore . Hence . It follows by Lemma ? that . We now compute (using consistency and ).
This proves that implies . Using symmetry, we obtain .
Hence, if is a regular -class, there exists a unique -class which contains for one/all idempotents . If , then is called stable, otherwise, it is called unstable. The following lemma shows that stabilisation is trivial over stable -classes.
Indeed, we have and thus by Lemma ?, .
The situation is different for unstable -classes. In this case, the stabilisation always goes down in the -order.
Since , it is always the case that . Assuming is unstable means that does not hold, which in turn implies .
We say that a word in is -smooth, for a -class, if , and . It is equivalent to say that for all . Indeed for all , . Remark that, according to Lemma ?, if is irregular, -smooth words have length at most . We will use the following lemma from  as a black-box. This is an instance of the factorisation forest theorem, but restricted to a single -class.
Remark that Ramsey factorisations and -computations differ only in what is allowed for a node of large degree, i.e., above . That is why our construction makes use of Lemma ? to produce Ramsey factorisations, and then based on the presence of nodes of large degree, constructs a computation by gluing pieces of Ramsey factorisations together.
Remark that if is irregular, then has length by Lemma ?, and the result is straightforward. Remark also that if is stable, and since the stabilisation is trivial in stable -classes (Lemma ?), every Ramsey factorisation for of height at most (which exists by Lemma ?) is in fact an -computation for .
The case of unstable remains. Let us say that a node in a factorisation is big if its degree is more than . Our goal is to “correct” the value of big nodes. If there is a Ramsey factorisation for which has no big node, then it can be seen as an -computation, and once more the first conclusion of the lemma holds.
Otherwise, consider the least non-empty prefix of for which there is a Ramsey factorisation of height at most which contains a big node. Let be such a factorisation and be a big node in which is maximal for the descendant relation (there are no other big nodes below). Let be the subtree of rooted in . This decomposes into where is the factor of for which is a Ramsey factorisation. For this , it is easy to transform into an -computation for : just replace the label of the root of by . Indeed, since there are no other big nodes in than the root, the root is the only place which prevents from being an -computation. Remark that from Lemma ?, the value of is .
If is empty, then is a prefix of , and an -computation for it. The second conclusion of the lemma holds.
Otherwise, by the minimality assumption and Lemma ?, there exists a Ramsey factorisation for of height at most which contains no big node. Both and being -computations of height at most , it is easy to combine them into an -computation of height at most for . This is an -computation for , which inherits from the property that its value is . It proves that the second conclusion of the lemma holds.
We are now ready to establish Theorem ?.
The proof is by induction on the size of a left-right-ideal , i.e., (remark that a left-right-ideal is a union of -classes). We establish by induction on the size of the following induction hypothesis:
for all words there exists an -computation of height at most for .
Of course, for , this proves Theorem ?.
The base case is when is empty, then has length , and a single node tree establishes the first conclusion of the induction hypothesis (recall that the convention is that the leaves do not count in the height, and as a consequence a single node tree has height ).
Otherwise, assume non-empty. There exists a maximal -class (maximal for ) included in . From the maximality assumption, we can check that is again a left-right-ideal. Remark also that since is a left-right-ideal, it is downward closed for . This means in particular that every element such that belongs to .
We claim () that for all words ,
either there exists an -computation of height for , or;
there exists an -computation of height at most for some non-empty prefix of of value in .
Let be the longest -smooth prefix of . If there exists no such non-empty prefix, this means that the first letter of does not belong to . Two subcases can happen. If has length , this means that , and thus is an -computation witnessing the first conclusion of (). Otherwise has length at least , and thus belongs to . Since furthermore it does not belong to , it belongs to . In this case, is an -computation witnessing the second conclusion of ().
Otherwise, according to Lemma ? applied to , two situations can occur. The first case is when there is an -computation for of value and height at most . There are several sub-cases. If , of course, the -computation is a witness that the first conclusion of () holds. Otherwise, there is a letter such that is a prefix of . If , then is an -computation for of height at most , witnessing that the first conclusion of () holds. Otherwise, has to belong to (because all letters of have to belong to except possibly the last one). But, by maximality of as a -smooth prefix, either , or . Since is a left-right-ideal, implies . Then, is an -computation for of height at most and value . This time, the second conclusion of () holds.
The second case according to Lemma ? is when there exists a prefix of for which there is an -computation of height at most of value . In this case, is also a prefix of , and the value of this computation is in . Once more the second conclusion of () holds. This concludes the proof of Claim ().
As long as the second conclusion of the claim () applied on the word holds, this decomposes into , and we can proceed with . In the end, we obtain that all words can be decomposed into such that there exist -computations of height at most for respectively, and such that the values of all belong to (but not necessarily the value of ). Let be the values of respectively. The word belongs to . Let us apply the induction hypothesis to the word . We obtain an -computation for of height at most . By simply substituting to the leaves of , we obtain an -computation for of height at most . (Remark once more here that the convention is to not count the leaves in the height. Hence the height after a substitution is bounded by the sum of the heights.)
3.5 Comparing computations: the proof of Theorem
We now establish the second key theorem for computations, namely Theorem ? which states that the result of computations is, in some sense, unique. The proof works by a case analysis on the possible ways the over-computations and under-computations may overlap. We perform this proof for stabilisation monoids, thus using sm-computations. More precisely, all statements take as input computations, and output sm-computations, which can be then normalised into non-sm computations. The result for stabilisation semigroups can be derived from it. We fix from now on a stabilisation monoid .
By induction on the height of the over-computation, using the fact that an -over-computation for a word of length at most cannot contain a stabilisation node.
By induction on the height of the over-computation.
A sequence of words is called a decomposition of if . We say that a non-leaf [under/over]-computation for a word decomposes into ,…, if the subtree rooted at the th child of the root is an [under/over]-computation for , for all . Our proof will mainly make use of over-computations. For this reason, we introduce the following terminology.
We say that a word -evaluates to if there exists an -over-computation for of value . We will also say that -evaluate to if -evaluates to for all .
This notion is subject to elementary reasoning such as (a) -evaluates to or (b) if -evaluate to and -evaluates to , then -evaluates to .
The core of the proof is contained in the following property:
From this result, we can deduce Theorem ? as follows.
Let be as in Lemma ?. Let be the th composition of with itself. Let be an -under-computation of height at most for some word of value , and be an -over-computation for of value . We want to establish that . The proof is by induction on .
If , this means that has length , then and are also restricted to a single leaf, and the result obviously holds. Otherwise, decomposes into . Let ,…, be the values of the children of the root of , read from left to right. By applying Lemma ? to and the decomposition , we construct -over-computations for respectively, of respective values , as well as an -over-computation of value for .
For all , we can apply the induction hypothesis on (let us recall that is the sub-under-computation rooted at the th child of the root of ) and , and obtain that . Depending on , three cases have to be separated. If (binary node), then . If (idempotent node), we have which is an idempotent. We have for all . Hence by Lemma ?, , which means . If (stabilisation node), we have once more which is an idempotent, and such that . This time, by Lemma ?, we have . We obtain once more .
The remainder of this section is dedicated to the proof of Lemma ?.
To each ordered pair , let us associate the color . We now apply the theorem of Ramsey to this coloring, for sufficiently large, and get that there exist such that . This implies in particular that , and thus and are idempotents. Furthermore, and . It follows from consistency that . We now have, using the assumptions that and ,
The following lemma will be used for treating the case of idempotent and stabilisation nodes in the proof of Lemma ?.
Let us treat first the case , whatever is . Remark first that naturally -evaluates to . Thus .
Assume now that for where is the constant obtained from Lemma ?. Set for all .
We first claim () that there exists such that -evaluates to . For this, consider the word , and apply Theorem ? for producing an -computation for of height at most . The word has length . Thus there is a stabilisation node in , say of degree . Let be a subtree of rooted at some stabilisation node. Let be the (idempotent) value of the children of this node, the value of being . This subtree corresponds to the factor of . We have to show that -evaluates to . The computation decomposes into and each is of the form for some with . Define now to be and to be for all . It is clear that for all since there is a computation over of value . Furthermore, for all . Hence we can apply Lemma ?, and get that