# CoPaR: An Efficient Generic Partition Refiner

Hans-Peter Deifel Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany    Stefan Milius Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany    Lutz Schröder Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany    Thorsten Wißmann Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
###### Abstract

Partition refinement is a method for minimizing automata and transition systems of various types. We present a tool implementing a recently developed partition refinement algorithm that is generic in the transition type of the given system and typically runs in time either or for systems with edges and states. This matches the runtime of the best known algorithms for several fixed types of systems, e.g. deterministic automata as well as ordinary, weighted, and probabilistic (labelled) transition systems, and in the case of weighted systems over non-cancellative monoids, such as (the additive monoid of) the tropical semiring, even improves the asymptotic runtime. Genericity is achieved by modelling transition types as endofunctors on sets and state-based systems as coalgebras. In addition to thus obtaining an efficient partition refiner for the mentioned types of systems, we demonstrate how the user can quickly obtain a partition refiner for new types of systems by composing pre-implemented basic functors, and how the tool can easily be extended with new basic functors by implementing a refinement interface.

## 1 Introduction

Minimization is a basic verification task on state-based systems, concerned with reducing the number of system states as far as possible while preserving the system behaviour. It is used for equivalence checking of systems and as a preprocessing step in further system analysis tasks, such as model checking.

In general, minimization proceeds in two steps: (1) remove unreachable states, and (2) identify behaviourally equivalent states. Here, we are concerned with the second step. This step depends heavily on what notion of equivalence is imposed on states; we work with the standard notion of bisimilarity and generalizations thereof. Classically, bisimilarity obeys the principle “states and are behaviourally equivalent if for any step can make to some with some effect, can make a step with the same effect to some such that and are behaviourally equivalent, and vice versa”. It is thus given via a fixpoint definition, to be understood as a greatest fixpoint, and can therefore be iteratively approximated from above. This is the principle behind partition refinement algorithms: Initially all states are tentatively considered equivalent, and then this initial partition is iteratively refined according to observations made on the states until a fixpoint is reached. Unsurprisingly, such procedures run in polynomial time.

In fact, Kanellakis and Smolka [16] provide an algorithm with a runtime of for ordinary transition systems with states and transitions. However, even faster partition refinement algorithms running in have been developed for various types of systems over the past 50 years. For example, Hopcroft’s algorithm minimizes deterministic automata for a fixed input alphabet in [13]; it was later generalized to unbounded input alphabets, with runtime [11, 17]. The Paige-Tarjan algorithm minimizes transition systems in time [19], and generalizations to labelled transition systems run in the same time complexity [14, 8, 25]. Minimization of weighted systems is typically called lumping in the literature, and Valmari and Franceschini [27] developed an algorithm, running in , that is rather simple and works with rational weights.

In earlier work [9, 29] we have developed an efficient generic partition refinement algorithm that can be easily instantiated to a wide range of system types, most of the time either matching or improving the previous best run time. The genericity of the algorithm is based on modelling state-based systems as coalgebras following the paradigm of universal coalgebra [21], in which the branching structure of systems is encapsulated as a functor, the type functor. This allows us to cover not only classical relational systems and various forms of weighted systems, but also to combine existing system types in various ways, e.g. nondeterministic and probabilistic branching. Our algorithm uses a functor-specific refinement interface that allows representing coalgebras of the functor as certain graphs. Under suitable assumptions on the type functor and the refinement interface, our algorithm runs in time either or , where and are the numbers of nodes and edges of the representing graph, respectively. In particular the generic algorithm has the same asymptotic complexity as the above-mentioned specific algorithms (for system types with fixed finite alphabet); in the case of weighted systems over non-cancellative monoids, the runtime of the generic algorithm is , thus markedly improving the runtime of previous algorithms for weighted automata [4] and weighted tree automata [12].

In this paper we describe the working tool CoPaR [6] that implements our algorithm in full generality, allowing users to easily obtain efficient partition refiners for a wide variety of different system types by composing a set of pre-implemented basic system types via a dedicated syntax for functor combination. This already covers all the above-mentioned applications; beyond this, users can also add their own custom system types to the basic system types by implementing a simple interface in Haskell. We hasten to add that due to overhead caused by the genericity (e.g. by calls to the functor interface), existing specific implementations of partition refinement for concrete system types will often beat the generic implementation by a constant factor in both time and space consumption, even though the asymptotic complexity remains the same. We have performed a comparison for those system types for which competing specific implementations exist, and provide estimates of the corresponding constant factors. One should keep in mind, however, that in many cases our tool is either the only available implementation or runs asymptotically faster than existing algorithms.

#### Organization

In Section 2 we recall the necessary technical background and the principles behind our generic algorithm [9, 29]. Section 3 describes the implementation, including the specification of refinement interfaces, while Section 3.6 discusses in detail how the mechanism for combining system types is realized within the tool. Some concrete instantiations are described in Section 4. In Section 5 we evaluate our tool by benchmarking it for some system types, and in particular provide a comparison with existing specific tools. Additional proofs can be found in the full version online [7].

## 2 Theoretical Foundations

Our framework achieves its genericity by using coalgebras for a functor on sets as an abstraction of state-based systems with a certain transition type. We now recall the necessary notions from basic category theory. For more details see [9, 29].

###### Notation 2.1

Given maps and we denote the unique induced map into the cartesian product by . The kernel of a map is the equivalence relation . The canonical surjective map corresponding to an equivalence relation is denoted by .

We identify equivalences on with partitions (and quotients) of and use the maps to denote them.

### 2.1 State-Based Systems as Coalgebras

We model state-based systems by coalgebras for endofunctors on sets.

###### Definition 1

An endofunctor on sets consists of two mappings: to each set , assigns a set , and to each map , assigns a map such that preserves identity maps and composition of maps.

###### Example 1

The finite powerset functor maps a set to the set of all finite subsets of and a map is mapped to the map taking direct images.
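To make the action on maps concrete, the following is a minimal Python sketch of the finite powerset functor, assuming finite sets are represented as `frozenset` values. The names `powerset_map`, `f`, and `g` are hypothetical and chosen only for this illustration; the sample maps merely serve to check the two functor laws on one subset.

```python
def powerset_map(f):
    """Action of the finite powerset functor on a map f: take direct images."""
    def lifted(subset):
        return frozenset(f(x) for x in subset)
    return lifted


# Hypothetical sample maps, used only to illustrate the functor laws.
def f(n):
    return n % 3


def g(r):
    return "even" if r % 2 == 0 else "odd"


def identity(x):
    return x


s = frozenset({1, 2, 4, 6})
image = powerset_map(f)(s)                      # direct image of s under f
composed = powerset_map(lambda x: g(f(x)))(s)   # P(g . f) applied to s
stepwise = powerset_map(g)(powerset_map(f)(s))  # (P(g) . P(f)) applied to s
```

On this sample, `composed` and `stepwise` agree, and lifting `identity` leaves `s` unchanged, as required of a functor.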

###### Definition 2

Let be an endofunctor on sets. An -coalgebra is a pair , where is a set, the carrier, and a map, the structure.

A homomorphism is a map that preserves the structure, i.e., , where denotes the composition of maps.

Two states of a coalgebra are said to be behaviourally equivalent if there exists a coalgebra homomorphism with . In this case we write .
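As an illustration for the finite powerset functor, the homomorphism condition can be checked mechanically on small examples. The sketch below assumes coalgebras are given as dictionaries from states to successor sets; `is_homomorphism` and the sample systems `c`, `d`, `h`, `not_h` are hypothetical names introduced here.

```python
def powerset_image(h, subset):
    """Direct image of a set of states under the map h (given as a dict)."""
    return frozenset(h[y] for y in subset)


def is_homomorphism(h, c, d):
    """Check that h preserves the structure: P(h)(c(x)) == d(h(x)) for all x."""
    return all(powerset_image(h, c[x]) == frozenset(d[h[x]]) for x in c)


# A transition system with two deadlock states reachable from x0 ...
c = {"x0": {"x1", "x2"}, "x1": set(), "x2": set()}
# ... and a smaller system in which the two deadlock states are merged.
d = {"a": {"b"}, "b": set()}

h = {"x0": "a", "x1": "b", "x2": "b"}      # preserves the structure
not_h = {"x0": "b", "x1": "b", "x2": "b"}  # sends x0 to a deadlock state
```

Since `h` is a homomorphism and identifies `x1` and `x2`, these two states are behaviourally equivalent in the sense of the definition above.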

Informally, a coalgebra for the endofunctor on sets models a state-based system whose successor type is given by the action of on sets, and the equivalence of the behaviours of states is given by the action of on maps:

###### Example 2
1. -coalgebras are finitely branching (unlabelled) transition systems and two states are behaviourally equivalent iff they are bisimilar.

2. For a fixed finite set , the functor sends a set to the set of pairs of boolean values and functions . An -coalgebra is a deterministic automaton (without initial state). For each state , the first component of tells whether is a final state, i.e. whether accepts the empty word, and the second component is the successor function mapping each input letter to a successor state of . States are behaviourally equivalent iff they accept the same language in the usual automata-theoretic sense.

3. For a commutative monoid , the monoid-valued functor sends each set to the set of finitely supported maps , i.e. maps that vanish at almost all . We refer to the elements of as (finitely supported) -valued measures on ; e.g. -valued measures are measures in the standard sense, and -valued measures are signed measures. An -coalgebra is, equivalently, a finitely branching -weighted transition system; for a state , maps each state to the weight of the transition from to . For a map , sends a finitely supported map to the map , corresponding to the standard image measure construction. As the notion of behavioural equivalence of states in a -coalgebra, we obtain weighted bisimilarity: two states are behaviourally equivalent iff for all we have

$$\sum_{x' \sim z} c(x)(x') \;=\; \sum_{y' \sim z} c(y)(y').$$

For the boolean monoid , the monoid-valued functor is (naturally isomorphic to) the finitary powerset sending a set to the set of finite subsets of , and -coalgebras are finitely branching transition systems.

For the monoid of real numbers , the monoid-valued functor has -weighted systems as coalgebras, e.g. Markov chains.

For the monoid of natural numbers , the monoid-valued functor is the bag functor , sending a set to the set of finite multisets over .
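The image measure construction for monoid-valued functors can be sketched in a few lines of Python. The sketch assumes a finitely supported measure is stored as a dictionary containing only its non-unit entries; the function name `image_measure` and its parameters `add` and `zero` (the monoid operation and its unit) are hypothetical.

```python
from collections import defaultdict


def image_measure(mu, f, add=lambda a, b: a + b, zero=0):
    """Push a finitely supported measure mu (dict: element -> weight) along f.

    The weight of y in the result is the monoid sum of mu(x) over all x with
    f(x) == y; entries equal to the unit are dropped to keep the support finite
    and canonical.
    """
    result = defaultdict(lambda: zero)
    for x, weight in mu.items():
        result[f(x)] = add(result[f(x)], weight)
    return {y: w for y, w in result.items() if w != zero}


# Real-valued example: a probability distribution on three states.
mu = {"x1": 0.5, "x2": 0.25, "x3": 0.25}
merged = image_measure(mu, lambda x: "p" if x == "x1" else "q")

# Signed weights may cancel, so the support can shrink.
cancelled = image_measure({"a": 2, "b": -2}, lambda x: "z")

# Boolean monoid (or, False): the construction specializes to direct images.
reachable = image_measure({1: True, 2: True}, lambda x: 0,
                          add=lambda a, b: a or b, zero=False)
```

Two states of a weighted system are then identified by a map `f` exactly when pushing their successor measures along `f` gives equal dictionaries.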

State minimality of a coalgebra is captured by the notion of simplicity: a coalgebra is called simple if every homomorphism is injective. This means that any two behaviourally equivalent states in are already equal.

One can prove that every -coalgebra has a simple quotient, i.e. there exists a surjective homomorphism with simple, and moreover, that quotient is unique up to isomorphism (bijective homomorphism) [15]. Hence, we call the simple quotient of .

Note that surjectivity of means that no additional states are introduced in , and simplicity of means that all behaviourally equivalent states of are indeed identified by . Thus, the task of minimizing a given coalgebra w.r.t. behavioural equivalence is the task of computing its simple quotient. We now recall a partition refinement algorithm that achieves this task on the level of -coalgebras.

### 2.2 Partition Refinement in State-Based Systems

When reading the previous non-generic partition-refinement algorithms side by side, one notices that they all follow a common pattern. Given a state-based system with state set , the partition refinement algorithms by Hopcroft [13] and Paige-Tarjan [19] maintain two partitions of , and , where will be “one transition step ahead of ”, so the relation is a refinement of the relation . Therefore, the elements of are called subblocks and the elements of are called compound blocks.

###### Algorithm 2.2 (Informal Partition Refinement for a System on X)

Initially, we put (1) and (2) let be the initial partition with respect to the “output behaviour” of the states in . For example, in the case of deterministic automata, the output behaviour of a state is its finality, i.e. the initial partition separates final from non-final states; and in the case of labelled transition systems, the initial partition separates deadlock states from states with successors.

While is properly finer than , iterate:

1. Pick a subblock in that is properly contained in a compound block , i.e. . Note that this choice represents a quotient .

2. Refine with respect to this quotient by splitting into two blocks and ; this is just the (block-wise) intersection of the partitions and .

3. Choose as coarse as possible such that two states are identified in if their transition structure is the same up to . Sometimes, this is referred to as refining in a way such that the transition structure is stable w.r.t. .

Since is refined using information from with each iteration, the partition is finer than invariantly.

Now note that Steps (1) and (2) are independent of the given transition type, because they perform basic operations on quotients. In contrast, the initialization procedure and Step (3) depend on the transition type of the given system , encoded by the type functor.

### 2.3 Coalgebraic Partition Refinement

In [9, Section 3 and 4], the informal pattern of Algorithm 2.2 is formalized as a mathematical construction on -coalgebras. The key observation is that whenever we write “separate the states of with respect to a partition ”, then states need to get separated iff they are mapped to different elements by . For example, in the initial step (1), in order to separate final from non-final states in an automaton , we need to separate states w.r.t. the partition . Indeed, two states in have the same finality iff they are identified by the map .

In order to model the intersection of and , we need the following definition.

###### Definition 3

Given a chain of subsets , define , with , by if , if , and if .

Note that the above definition is obtained from , where are the ordinary characteristic functions, by omitting the impossible case of .
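The three-valued characteristic function can be sketched as follows. The concrete encoding of the three outcomes as the numbers 2, 1, and 0 is an assumption made for this illustration, as are the names `chi` and `classify`.

```python
def chi(S, C):
    """Three-valued characteristic function of a chain S <= C <= X.

    Assumed encoding (not fixed by the text above):
    2 for elements of S, 1 for elements of C \\ S, 0 for elements of X \\ C.
    """
    assert S <= C, "S must be a subset of C"

    def classify(x):
        if x in S:
            return 2
        if x in C:
            return 1
        return 0

    return classify


X = {1, 2, 3, 4, 5}
S = {1}
C = {1, 2, 3}
classify = chi(S, C)
values = {x: classify(x) for x in X}

# Pairing the ordinary characteristic functions of S and C never yields the
# combination "in S but not in C", which is the impossible case omitted above.
pairs = {(x in S, x in C) for x in X}
```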

A key requirement on the type functor for the correctness of our generic algorithm is the following notion.

###### Definition 4 ([9])

A functor is zippable if the following map is injective for all sets : .

All functors listed in Example 2 are zippable. Zippable functors are closed under subfunctors, set-wise , , but not under composition [29].

###### Algorithm 2.3

Given a finite coalgebra for zippable, do:

1. Define and .

2. Compute .

While is properly finer than , witnessed by the canonical quotient , iterate the following:

3. Pick a subblock , such that where .

4. Define .

5. Compute .

In step (1) the partition is defined to consist of a single block containing all states from . Step (2) is the partition obtained by considering the ‘output’ of the states. Step (4) is a basic operation on the partition that splits the block into and . Step (5) does its counterpart on taking into account the coalgebraic structure. Note that steps (2) and (5) depend on the type functor and so require more involved computations, which are described later in Section 3.4.

###### Remark 1

We recall a few properties of Algorithm 2.3 from [9]:

1. At the end of the loop body, or is finer than .

2. After every iteration of the main loop and get properly finer.

3. Before an iteration of the main loop, let be a map such that and assume that , so e.g.  is a candidate for before the first iteration. Then and , using that is zippable [9, Section 4]. So is a candidate for before the next iteration of the main loop.

4. If , we have that for some map . Assume wlog that is surjective. Then , and is well-defined on , i.e. we obtain a map making a coalgebra homomorphism.

###### Theorem 2.4

For a finite -coalgebra , Algorithm 2.3 terminates after at most iterations, and when that happens two states are behaviourally equivalent iff .

Intuitively, gets properly refined from iteration to iteration, and hence can get refined at most times. In every iteration contains all behaviourally equivalent states. Upon termination of the algorithm, is a coalgebra homomorphism, so identifies only behaviourally equivalent states.
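For intuition only, the fixpoint idea can be tried out with a naive refinement loop for unlabelled transition systems (coalgebras for the finite powerset functor). This sketch is not the algorithm described above: it recomputes the signature of every state in every round rather than selecting subblocks, so it does not have the efficient run time discussed in this paper. The names `naive_refine` and `system` are hypothetical.

```python
def naive_refine(c):
    """Naive partition refinement for a transition system.

    c maps each state to an iterable of its successors. Returns a dict mapping
    each state to a block id such that two states share a block iff they are
    bisimilar.
    """
    block = {x: 0 for x in c}  # initially all states are tentatively equivalent
    while True:
        # Signature of x: its current block plus the set of successor blocks.
        # Including the current block guarantees that blocks are only split.
        sig = {x: (block[x], frozenset(block[y] for y in c[x])) for x in c}
        ids = {}
        new_block = {x: ids.setdefault(sig[x], len(ids)) for x in c}
        # The new partition refines the old one, so an equal number of blocks
        # means that the fixpoint has been reached.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


# States 1, 2 and 4 each have a single step into the deadlock state 3.
system = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [3]}
partition = naive_refine(system)
```

Each round of this loop touches every edge, and up to one round per state may be needed, which is what the subblock-based strategy of Algorithm 2.3 avoids.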

## 3 Implementation

We proceed to discuss a more concrete but still highly generic version of the coalgebraic partition refinement algorithm (Algorithm 2.3); we combine the description of the algorithm [9] with an overview of its implementation in Haskell [6]. As indicated in the introduction, the algorithm will either match or improve the best known runtime in most of its instances. Specifically, the steps (2) and (5) in Algorithm 2.3 will be implemented in such a way that under suitable assumptions on , the overall complexity of the algorithm is for -coalgebras with states and edges, with the factor depending on details of the functor; in most cases, is either constant or logarithmic in . Regarding assumptions on , first, has to expose some notion of labelled edges, allowing us to count the edges in an -coalgebra. We call this the encoding of the coalgebra. Secondly, needs to implement a refinement interface [9] that is used to implement steps (2) and (5) in the desired run time. We discuss these requirements in the following two subsections.

### 3.1 Encoding Functors and Coalgebras

###### Definition 5

An encoding of a functor consists of a set of labels and for every set  a map (not necessarily natural in ). The encoding of an -coalgebra is then given by a set of edges, and maps

$$\mathsf{graph}\colon E \to X \times A \times X \qquad\qquad \mathsf{type}\colon X \to F1$$

such that and .

This encoding matches how one intuitively draws coalgebras. For instance for Markov chains (see Fig. 1), i.e. coalgebras for the distribution functor , the set of labels is the set of probabilities , and assigns to each finite probability distribution the finite (multi-)set . (This example also illustrates that, as indicated above,  need not be natural.)
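As a rough illustration of such an encoding, the sketch below turns a small Markov chain into a list of labelled edges together with a type map. It assumes that the graph contains one edge per non-zero transition probability and that the type of a state records the total outgoing weight; the names `encode` and `chain` are hypothetical, and the representation is not the one used inside CoPaR.

```python
def encode(coalgebra):
    """Encode a weighted system as labelled edges plus a type map.

    coalgebra maps each state x to a dict {successor: label}. The result is
    (graph, type_of), where graph lists triples (source, label, target) and
    type_of maps each state to its total outgoing weight (an assumption about
    what the type map records for weighted systems).
    """
    graph = [(x, label, y)
             for x, successors in coalgebra.items()
             for y, label in successors.items()]
    type_of = {x: sum(successors.values())
               for x, successors in coalgebra.items()}
    return graph, type_of


# A two-state Markov chain; the labels are the transition probabilities.
chain = {"s": {"s": 0.5, "t": 0.5}, "t": {"t": 1.0}}
graph, type_of = encode(chain)

# The labels on the outgoing edges of "s" form a multiset: 0.5 occurs twice.
labels_from_s = sorted(label for (x, label, _) in graph if x == "s")
```

The edge count of the encoding, here three, is the quantity that enters the complexity bounds as the number of edges.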

### 3.2 Parsing

The input for our minimization tool is given as a file that represents a finite -coalgebra , and consists of two parts. The first part is a single line specifying the functor . Each of the remaining lines describes one state and its behaviour . Examples of input files are shown in Figure 1; we next describe the format in detail.

#### Functor specification.

Functors are generally described as composites of basic building blocks; that is, the functor given in the first line of an input file is an expression determined by the grammar

$$T ::= \mathtt{X} \mid F(T,\ldots,T) \qquad (F\colon \mathsf{Set}^k \to \mathsf{Set}) \in \mathcal{F}, \tag{$\star$}$$

where the character X is a terminal symbol and is a set of predefined symbols called basic functors, representing functors of the shape . Only basic functors need to be implemented explicitly, in the sense that the refinement interface needs to be provided; for composite functors, the tool derives instances of the algorithm automatically. Basic functors currently implemented include (finite) powerset , bags/finite multisets , monoid-valued functors , and polynomial functors for finite signatures . Since behavioural equivalence is preserved under converting -coalgebras into -coalgebras for a superfunctor  of , we also cover subfunctors of these functors, such as the finite distribution functor . Polynomial functors can have multiple arguments, so e.g.  is also considered a polynomial functor . The polynomial constructs and are written infix, so the currently supported grammar is effectively

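$$T ::= \mathtt{X} \mid \mathcal{P}T \mid \mathcal{B}T \mid \mathcal{D}T \mid M^{(T)} \mid \underbrace{C \mid T+T \mid T\times T \mid T^{A}}_{\text{polynomial constructs}} \qquad C ::= \mathbb{N} \mid A \qquad A ::= \{s_1,\ldots,s_n\}$$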

where the are strings subject to the usual conventions for C-style identifiers, exponents are written F^A, and is one of the monoids , , (multiplication), and , the latter being the additive monoid of the tropical semiring. Note that  effectively ranges over at most countable sets, and  over finite sets. A term  determines a functor in the evident way, with X interpreted as the argument.

#### Coalgebra specification.

The remaining lines define a finite -coalgebra . Each line of the form defines a state , where is a C-style identifier, and represents the element . The syntax for depends on the specified functor , and follows the structure of the term  defining ; we write to indicate that is a term describing an element of :

• For the base case , is an element of , i.e. one of the named states specified in the file.

• For , we have where and . Similarly for , where , . Both constructs extend to higher arities and .

• For sets and multisets , we have brace notation with .

• The notation for -valued measures is with and . It denotes with , cf. Fig. 1(a).

E.g. if the functor  is given by the term ^(X)), then the one-line declaration x: {(a,{x: 2.4}), (a,{}), (b,{x: -8})} defines an -coalgebra with a single state , having two -successors and one -successor, where successors are -valued measures on . One -successor is the zero measure, and the other assigns weight  to ; the -successor assigns weight to  . More examples are shown in Fig. 1.

After reading the functor term , the tool builds a parser for the system-specific input format and parses the above syntax into a map of type . In the case of a composite functor, the system then undergoes a substantial amount of preprocessing that affects also how edges need to be counted; we defer the discussion of this point to Section 3.6, and assume for the time being that is a basic functor with only one argument.

### 3.3 Refinement Interfaces

In order to implement Algorithm 2.3 efficiently, we need to be more specific about what the algorithm does in the steps (2) and (5), which depend on the type functor of the input coalgebra . Given the input coalgebra in the encoded form , we need additional information about the underlying functor in order to refine the system correctly. This information is provided by an interface that needs to be implemented for every functor of interest:

###### Definition 6

Given the encoding of the set functor , a refinement interface for consists of a set of weights and functions

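$$\mathsf{init}\colon F1 \times \mathcal{B}A \to W \qquad\text{and}\qquad \mathsf{update}\colon \mathcal{B}A \times W \to W \times F3 \times W$$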

satisfying the following coherence condition: there exists a map such that for all and all :

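$$\begin{aligned}
w(X)(t) &= \mathsf{init}\bigl(F!(t),\ \mathcal{B}\pi_1(\flat(t))\bigr) \\
\langle w(S),\ F\chi_S^C,\ w(C\setminus S)\rangle(t) &= \mathsf{update}\bigl(\{a \mid (a,x)\in\flat(t),\ x\in S\},\ w(C)\bigr)
\end{aligned}$$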

Note that the comprehension in the first argument of is to be read as a multiset comprehension. Intuitively, is the overall weight of the block in the term . The purpose of is to be able to let efficiently compute the value when is only partially given, more precisely, only the edges into and the weight are given. A more detailed description of the algorithm and how it uses the refinement interface is given in Section 3.4.

For example for we have and carries the accumulated weight of and , i.e. ; in other words . For many other functors , we have , too. In contrast for , we have , and counts the number of edges in to and . With the right choice of for a functor , the functions and are usually easily derived.

###### Example 3

Let be one of the functors where is a group and a finite signature resp. its corresponding polynomial functor. Then , and ; full definitions of the refinement interfaces can be found in [9, 29] and the source.
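For the group-valued case, a refinement interface along these lines can be sketched in Python for the additive group of integers. The sketch assumes that a weight is a pair (accumulated weight outside the block, accumulated weight inside the block) and that an element of F3 is given as the triple of weights assigned to the three outcomes of the chain-characteristic function; the reference function `w` and the sample data are hypothetical and serve only to check the coherence condition on one example. The authoritative definitions are those in [9, 29] and the source.

```python
def init(total_weight, labels):
    """Weight of the single initial block X: nothing lies outside of X."""
    # total_weight (the element of F1) is not needed in this particular sketch.
    return (0, sum(labels))


def update(labels_into_S, weight_C):
    """Split the weight of C into the weights of S and C \\ S."""
    rest, in_C = weight_C
    in_S = sum(labels_into_S)
    in_C_minus_S = in_C - in_S            # subtraction needs inverses: a group
    w_S = (rest + in_C_minus_S, in_S)
    three = (rest, in_C_minus_S, in_S)    # weights of X \ C, C \ S, and S
    w_C_minus_S = (rest + in_S, in_C_minus_S)
    return w_S, three, w_C_minus_S


def w(block, t):
    """Reference definition: weight of t outside and inside the block."""
    outside = sum(v for x, v in t.items() if x not in block)
    inside = sum(v for x, v in t.items() if x in block)
    return (outside, inside)


# Hypothetical sample: one state's outgoing weights, a block C and subblock S.
t = {"a": 3, "b": -1, "c": 4, "d": 2}
C = {"a", "b", "c"}
S = {"a"}
w_S, three, w_rest = update([t[x] for x in S], w(C, t))
```

The point of the design is that `update` only looks at the labels of edges into `S` and at the stored weight of `C`, never at the remaining edges of the state.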

###### Remark 2

In [12], cancellative and non-cancellative monoids are considered. The refinement interface for groups can still be used in the cancellative case, because every commutative, cancellative monoid embeds into an abelian group by the Grothendieck group construction. For non-cancellative monoids, we have the following refinement interface for arbitrary monoids which is new and of course can be found in its entirety in the implementation [6].

###### Example 4

For , we have the encoding with the labels being all elements except the unit, and where . We have a refinement interface with weights . So every consists of the accumulated weight of in and a bag, i.e. a finitely supported map , of non-zero monoid elements, listing the coefficients of -elements in the -valued measure :

$$w(C)(t) = \Bigl(\sum_{x \in X\setminus C} t(x),\ \bigl(m \mapsto |\{x \in C \mid t(x) = m\}|\bigr)\Bigr).$$

Note that there is a canonical map adding up monoid elements in a given bag. Thus, we have for every . The refinement interface is defined by and where we define for bags by .
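The bag-based interface can be tried out on a non-cancellative monoid such as the reals extended by infinity under `min`, the additive monoid of the tropical semiring. The Python sketch below uses `collections.Counter` as a stand-in for bags; it recomputes bag sums from scratch, so it illustrates the coherence condition but not the logarithmic-time map operations needed for the complexity bound. The names `combine`, `total`, `w`, and the sample data are hypothetical.

```python
from collections import Counter
from functools import reduce

INF = float("inf")  # unit of the monoid (R with infinity, min)


def combine(a, b):
    """Monoid operation; min is not cancellative, so sums cannot be undone."""
    return min(a, b)


def total(bag):
    """Canonical map adding up all monoid elements contained in a bag."""
    return reduce(combine, bag.elements(), INF)


def init(f1, labels):
    """Weight of the initial block X: unit outside, all labels inside."""
    return (INF, Counter(labels))


def update(labels_into_S, weight_C):
    rest, bag_C = weight_C
    bag_S = Counter(labels_into_S)
    bag_rem = bag_C - bag_S  # bag difference: labels of edges into C \ S
    w_S = (combine(rest, total(bag_rem)), bag_S)
    three = (rest, total(bag_rem), total(bag_S))
    w_rem = (combine(rest, total(bag_S)), bag_rem)
    return w_S, three, w_rem


def w(block, t):
    """Reference definition following the displayed formula for w(C)(t)."""
    outside = reduce(combine, (v for x, v in t.items() if x not in block), INF)
    return (outside, Counter(v for x, v in t.items() if x in block))


t = {"a": 3, "b": 1, "c": 3, "d": 7}
C = {"a", "b", "c"}
S = {"a"}
w_S, three, w_rem = update([t[x] for x in S], w(C, t))
```

Keeping the whole bag rather than only its sum is what makes the split possible here: from the minimum over `C` alone one could not recover the minimum over `C` without `S`.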

###### Example 5

The powerset functor is a monoid valued functor for the boolean monoid , and so is a special case of the previous example, with labels and weights . Its refinement interface is explicitly given in [9, 29].

### 3.4 The Algorithm

The pseudocode that computes the steps (2) and (5) of Algorithm 2.3 is presented and analysed in [9, 29]; here we briefly recall the ideas before presenting the implementation details.

For the correctness argument of the algorithm, we have a mapping from edges to memory cells of type . This mapping maintains the following invariant: for every and , the edge is mapped to , and for edges we have that these edges map to the same memory location iff and are in the same block of .

Step (2) creates such that it identifies precisely those elements with the same value and it establishes the invariant by calling for each and storing (which is returned by ) in a new memory cell and lets all the outgoing edges of point to this cell.

Step (5) splits a block into and , and uses to efficiently compute . In the pseudocode, is called for each that has at least one edge into . See Figure 2 for such a situation. Here, is called with the bag and the weight which is stored in every edge from to . In order to re-establish the invariant, has to return the weights that will be saved in the edges from to and to , namely and respectively. Since all edges from to save in the same memory location, this value is updated with and a new memory cell is allocated holding and all edges from to are modified such that they use this new memory cell. Then, the set of all states with edges to is partitioned by the value returned by and this partition is used to refine .

###### Theorem 3.1 ([9, 29])

Assume that all the functions of the refinement interface of run in for and that comparison of elements runs in for every -coalgebra with and . Then Algorithm 2.3 computes the simple quotient of in .

###### Example 6
1. For the functors , , , , , where is a group, is a finite monoid, and a finite signature, we have and thus an overall complexity of .

2. For an arbitrary (possibly infinite) monoid the functor meets the complexity bounds for the refinement interface with . The elements of have to be implemented using the data structure ‘map’, whose basic operations run logarithmically in the map’s size, which is bounded by the total number of edges and .

### 3.5 Implementation Details

The actual implementation of our algorithm in Haskell is a pretty close translation of the pseudocode given in [9, 29], but fills in a few omitted data structures and routines.

One key data structure that is needed for is a refinable partition, which maintains the current partition of states during the execution of the algorithm. It has to provide constant time operations for finding the size of a block, marking a state and counting the marked states in a block. Splitting a block in marked and unmarked states must only take linear time in the number of marked states of this block. A possible implementation of such a data structure is described in [27].
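One way to meet these requirements is an array-based layout in the spirit of [27], sketched below in Python. This is an illustrative sketch under the stated requirements, not the Haskell data structure used in CoPaR; the class and method names are hypothetical.

```python
class RefinablePartition:
    """Partition of the states 0..n-1 with marking and splitting."""

    def __init__(self, n):
        self.elems = list(range(n))  # states, stored contiguously per block
        self.loc = list(range(n))    # loc[s] is the index of s in elems
        self.block_of = [0] * n      # block id of each state
        self.first = [0]             # start index of each block in elems
        self.end = [n]               # one past the last index of each block
        self.mid = [0]               # marked states are elems[first[b]:mid[b]]

    def size(self, b):
        return self.end[b] - self.first[b]

    def marked_count(self, b):
        return self.mid[b] - self.first[b]

    def states(self, b):
        return self.elems[self.first[b]:self.end[b]]

    def mark(self, s):
        """Mark state s in constant time by swapping it into the marked prefix."""
        b = self.block_of[s]
        i, m = self.loc[s], self.mid[b]
        if i < m:
            return  # already marked
        other = self.elems[m]
        self.elems[i], self.elems[m] = other, s
        self.loc[other], self.loc[s] = i, m
        self.mid[b] = m + 1

    def split(self, b):
        """Move the marked states of b into a new block and return its id.

        Returns None if no state or every state of b is marked. The running
        time is linear in the number of marked states.
        """
        m = self.mid[b]
        if m == self.first[b]:
            return None
        if m == self.end[b]:
            self.mid[b] = self.first[b]  # nothing to split off: just unmark
            return None
        new = len(self.first)
        self.first.append(self.first[b])
        self.end.append(m)
        self.mid.append(self.first[b])   # the new block starts out unmarked
        self.first[b] = m
        self.mid[b] = m
        for i in range(self.first[new], m):
            self.block_of[self.elems[i]] = new
        return new


p = RefinablePartition(5)
p.mark(3)
p.mark(1)
marked_before_split = p.marked_count(0)
new_block = p.split(0)
```

Only the marked states are relabelled in `split`, so the unmarked remainder keeps the old block id at no cost.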

Picking a subblock from in step (3) of Algorithm 2.3 is implemented as a queue of blocks that fulfil the property , where is the compound block that contains . The termination condition of the main loop translates into the queue being empty. There are some important corner cases when a subblock is already in the queue, and these are discussed in [27].

One optimization not contained in [27, 29] is that weights for blocks of exactly one state are not computed, as those cannot be split any further. This has drastic performance benefits for inputs where the algorithm produces many single-element blocks early on, e.g. (nearly) minimal systems with a fine grained initial partition.

Each basic functor is defined in its own source file as a single Haskell data type that implements two type classes: a class that directly corresponds to the refinement interface given in Definition 6 with its methods init and update, and a parser that defines the coalgebra syntax for the functor. This means that new basic functors can be implemented without modifying any of the existing code, except for registering the new type in a list of existing functors; see src/Copar/Functors [6] for the implemented refinement interfaces.

### 3.6 Combining refinement interfaces

So far, we have discussed the results when from (⋆) in Section 3.2 consists of a plain functor with a refinement interface. However, our implementation is not only modular in the sense that one can implement new refinement interfaces, but also because one can freely combine functors without the need of writing a single line of code. By this we mean that the implementation supports all functors given by (⋆) where is the set of functors with a refinement interface already implemented.

For example, we can handle systems of type . To achieve this, a given -coalgebra is transformed into one for the functor

$$T'X = \mathcal{D}X + (\mathbb{N} \times X \times X) + \mathcal{P}X + \mathcal{B}X.$$

This functor is obtained as the sum of all basic functors involved in , i.e. of all the nodes in the visualization of the functor term (Figure 3). Then the components of the refinement interfaces of the four involved functors , , , and are combined by disjoint union . The transformation of a coalgebra into a -coalgebra introduces sets of intermediate states for each edge in the visualization of the term in Figure 3. First, we introduce intermediate states for every outgoing -edge from every , i.e. . Their successors lie in for the additional intermediate state sets and whose successors, in turn, lie in and , respectively. Overall, we obtain a -coalgebra on , whose minimization w.r.t.  yields the minimization of the given -coalgebra. The theoretical foundations of this construction as well as its correctness are established in full generality in [29].

However, while this is theoretically sound and has no effect on the complexity, the intermediate states lead to a substantial blow-up of the state space w.r.t. the given coalgebra, which affects the memory consumption, hence the scalability, of the algorithm rather badly. Fortunately, this can be partially avoided. In fact, our implementation performs an optimization step that flattens out polynomial parts of the functor term for and thus avoids introducing the corresponding intermediate states. In our example instance, a given -coalgebra is transformed into a coalgebra for the functor with the intermediate states in above but avoiding and . Again, the refinement interface for this functor can be derived from the ones of , , and . In concrete instances, this optimization step may lead to a drastic reduction of the size of the state space of the resulting coalgebra on which partition refinement is performed (see Section 5 for details).

## 4 Instances

Many systems are coalgebras for combined functors. In the following, we list various system types that can be handled by CoPaR. In all cases is the number of edges and is the number of states, i.e. the size of the carrier of the coalgebra.

Transition Systems are coalgebras for , which are minimized in , matching the complexity of the Paige-Tarjan algorithm [19].

Initial partitions: Note that in the classical Paige-Tarjan algorithm [19], the input also features an initial partition explicitly. This is accomplished by considering coalgebras for the functor , where the first component of a coalgebra assigns each state the number of its block in the initial partition. In general, partition refinement on -coalgebras with an initial partition is just partition refinement for . This does not change the complexity (cf. Section 3.6), and thus the initial partition is omitted in all further examples.

Labelled Transition Systems (LTS) are coalgebras for where is a fixed set of labels. Since the preprocessing (Section 3.6) introduces one intermediate state per edge, we minimize LTSs in , matching the complexity of [10] which is slightly slower than the of [25].
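The effect of this preprocessing on an LTS can be pictured with a small Python sketch that splits every labelled transition into a nondeterministic step to a fresh intermediate state followed by a labelled step to the target. The representation and the names `desugar_lts`, `choice_part`, and `label_part` are hypothetical; the tool's internal representation may differ.

```python
def desugar_lts(lts):
    """Split an LTS into a powerset layer and a labelling layer.

    lts maps each state to a list of (label, target) pairs. Returns
    (choice_part, label_part): choice_part maps each original state to its set
    of intermediate states, and label_part maps each intermediate state to the
    (label, target) pair it stands for.
    """
    choice_part = {}
    label_part = {}
    for x, transitions in lts.items():
        successors = set()
        for label, y in transitions:
            mid = ("edge", x, label, y)  # one intermediate state per edge
            label_part[mid] = (label, y)
            successors.add(mid)
        choice_part[x] = successors
    return choice_part, label_part


lts = {"p": [("a", "q"), ("b", "q")], "q": [("a", "p")]}
choice_part, label_part = desugar_lts(lts)

# Each original edge yields one intermediate state, one edge into it, and one
# edge out of it, so the number of edges doubles.
edges_after = sum(len(s) for s in choice_part.values()) + len(label_part)
```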

Markov Chains are coalgebras for or sometimes , when negative weights are allowed. The partition refinement is called Markov chain lumping and is done by CoPaR in matching the best known complexity, presented in [27].
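
To make the notion of lumping concrete, here is a naive Python sketch in which two states stay in the same block as long as their accumulated weights into every block agree. It is an illustration under stated assumptions, not the algorithm of [27] or of CoPaR: it starts from the trivial partition, expects exact weights (integers or `Fraction`s) and recomputes all signatures in every round. The names `lump` and `weights` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def lump(weights):
    """Naive lumping of a weighted system.

    weights: dict mapping every state to a dict {successor: weight};
             every successor must itself be a key of weights.
    Returns a dict mapping every state to the id of its final block.
    """
    states = list(weights)
    block = {s: 0 for s in states}
    while True:
        sig = {}
        for s in states:
            acc = defaultdict(Fraction)  # Fraction() is zero
            for t, w in weights[s].items():
                acc[block[t]] += w
            # Signature: current block plus total weight into each
            # block, ignoring blocks that receive total weight zero.
            sig[s] = (block[s],
                      frozenset((b, w) for b, w in acc.items() if w != 0))
        ids = {}
        new_block = {s: ids.setdefault(sig[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block
```

In the system `{"s0": {"s1": 2, "s2": 1}, "s1": {"s3": 3}, "s2": {"s3": 3}, "s3": {}, "s4": {"s1": 3}}`, the states `s1` and `s2` are lumped together, and so are `s0` and `s4`, because both send a total weight of 3 into the block of `s1` and `s2`.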

Markov decision processes combine nondeterministic and probabilistic choice. The following two instances give a transition system representation of them and two further instances are evaluated in Section 5.

Simple Segala Systems are introduced in [23] and are coalgebras for . A system with nondeterministic and probabilistic edges is minimized in , because the preprocessing introduces one intermediate state per probabilistic edge. Since , this is at least as fast as , previously the best known complexity [1].

General Segala Systems are coalgebras for and have a run-time in where again is the number of nondeterministic edges and the number of probabilistic edges.

Weighted Tree Automata (WTA) for a signature and a commutative monoid are coalgebras for , where the signature is interpreted as the corresponding polynomial functor. Coalgebraic behavioural equivalence is precisely backward simulation (cf. [12, Def 4.1], [4]). Let be the number of edges in an -coalgebra with states and . If is a group or cancellative monoid, then the minimization runs in which is slightly slower than the in [12]. If is not cancellative we use 62 and obtain minimization in which is faster than the previously known from [12]. This instance of our algorithm was not reported in our previous work [9, 29]. It may serve to demonstrate how easy it is to arrive at a working implementation for a new system type: one simply needs to implement the refinement interface for that functor (cf. 4).

The 1-dimensional Weisfeiler-Lehman Algorithm (WL), also known as color refinement, is an important subroutine in graph isomorphism checking (and originally conjectured to check graph isomorphism, see e.g. [5, 28]). Its input is an undirected graph , where contains two-element subsets of . Then WL is just partition refinement on the -coalgebra defined by (i.e. introduce two directed edges per undirected edge). Our algorithm runs in which is faster than the existing polynomial algorithms analysed in the literature [24].
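
The correspondence between colour refinement and partition refinement can be made tangible with a naive Python sketch. It follows the description above in that every undirected edge contributes one directed edge in each direction, and it distinguishes vertices by the multiset of colours of their neighbours. This is only a quadratic-style illustration, not the algorithm implemented in CoPaR; the function name `color_refinement` and its parameters are hypothetical.

```python
def color_refinement(vertices, edges):
    """Naive 1-dimensional Weisfeiler-Lehman colour refinement.

    vertices: list of vertices.
    edges: iterable of pairs (u, v), one pair per undirected edge.
    Returns a dict mapping every vertex to its final colour.
    """
    neighbours = {v: [] for v in vertices}
    for u, v in edges:
        # Two directed edges per undirected edge.
        neighbours[u].append(v)
        neighbours[v].append(u)
    color = {v: 0 for v in vertices}
    while True:
        # Signature: own colour plus the multiset of neighbour
        # colours, represented as a sorted tuple.
        sig = {v: (color[v],
                   tuple(sorted(color[u] for u in neighbours[v])))
               for v in vertices}
        ids = {}
        new_color = {v: ids.setdefault(sig[v], len(ids)) for v in vertices}
        if len(ids) == len(set(color.values())):
            return new_color
        color = new_color
```

On the path 0-1-2-3 the sketch separates the two end points from the two inner vertices. On the disjoint union of a 6-cycle and two triangles all vertices keep the same colour, which is the standard example showing that colour refinement alone does not decide graph isomorphism.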

Deterministic Finite Automata (DFAs) for a fixed input alphabet correspond to coalgebras for the functor and our algorithm instantiates to Hopcroft’s classical minimization [13] with running time . If is part of the input, DFAs can be regarded as labelled transition systems.

Further system types are covered by combining , , and in various ways; a list can be found in [2], including reactive systems, generative systems, stratified systems, alternating systems, bundle systems, and Pnueli-Zuck systems.

## 5 Benchmarking and Evaluation

We tested our implementation with a variety of test cases of multiple system types to verify its correctness and performance. For the benchmarks below, our Haskell program was compiled with GHC version 8.4.4 and executed on an Intel® Core™ i5-6500 processor with 3.20GHz clock rate and 16 GiB of RAM. For large inputs, memory usage is the limiting factor, and for this reason, all timing results in this section are in the range of a few seconds to two minutes. Timings are given individually for the three phases parse, init and refine of our tool, which correspond resp. to reading and parsing the input file as described in Section 3.2, creating the initial partition in step (2) of our algorithm, and executing step (5) until the termination condition is met.

Our first benchmark suite consists of randomly generated DFAs for a fixed size and alphabet , which were generated by uniformly choosing a successor for each state and letter of the alphabet and for each state whether it is final or not. All of the resulting automata were already minimal, which means that our algorithm has to refine the initial partition until all blocks are singletons, which requires a maximal number of iterations. However, the resulting systems were eligible for the optimization described in Section 3.5. The results in (a) and 3(b) show that the implementation can handle coalgebras with 10 million edges, and that parsing the input takes more time than the actual refinement for these particular systems.

In addition, we translated the benchmark suite of the model checker PRISM [18] to coalgebras for the functors for continuous-time Markov chains (CTMC) and for Markov decision processes (MDP). In contrast to DFAs, these functors are compositions of several basic functors and thus require the construction described in Section 3.6. Two of those benchmarks [20] are shown in (c) with different parameters, resulting in three differently sized coalgebras each. The fms family of systems models a flexible manufacturing system as a CTMC, while the wlan benchmarks model various aspects of the IEEE 802.11 Wireless LAN protocol as MDPs.

The results in (c) show that refinement for the fms benchmarks is faster than for the respective wlan ones, even though the first group has more edges. This is due to (a) the fact that the functor for MDPs is more complex and thus introduces more indirection into our algorithms, as explained in Section 3.6, and (b) that our optimization for one-element blocks from Section 3.5 fires much more often for fms.

Compared to an optimized C++ implementation of specialized refinement algorithms for labelled transition systems and Markov chains [27, 26], our implementation is slower by a factor of approximately 2.5 to 15, depending on the system type and how often our optimization on single-element blocks is applicable. This has several reasons: our implementation is written in Haskell and requires additional abstractions and indirections in the code due to the genericity; moreover, our input format is much more expressive than the one for the C++ implementation, slowing down parsing.

## 6 Conclusion and Future Work

We have presented a tool that efficiently minimizes systems w.r.t. coalgebraic behavioural equivalence. Both the algorithm and its implementation are highly generic, and thus provide a tool that can be used off-the-shelf as an efficient minimizer for many different types of state-based systems, e.g. deterministic automata, weighted tree automata, ordinary, weighted, and probabilistic transition systems, and Segala systems (which combine nondeterminism and probabilistic choice). Our tool can easily be extended to accommodate new system types, either by combining existing basic types by composition and polynomial constructions, or by implementing a simple functor interface for a new basic system type. Remarkably, there are instances where the generic algorithm is asymptotically faster than the best algorithms in the literature, notably on weighted systems over non-cancellative weight monoids.

Future development of the generic algorithm will include further broadening of the class of functors covered, in particular with a view to neighbourhood systems, whose coalgebraic type functor fails to be zippable. Another important extension is to include support for base categories (of state spaces) beyond sets, with nominal sets (the base category for nominal automata [3, 22]) as a particularly promising candidate.

Several optimizations of the current implementation can be envisaged; e.g. the implementation could be extended to detect when the set of weights is trivial and simplify the refinement interface.

## References

• [1] C. Baier, B. Engelen, and M. Majster-Cederbaum. Deciding bisimilarity and similarity for probabilistic processes. J. Comput. Syst. Sci., 60:187–231, 2000.
• [2] F. Bartels, A. Sokolova, and E. de Vink. A hierarchy of probabilistic system types. In Coalgebraic Methods in Computer Science, CMCS 2003, volume 82 of ENTCS, pages 57–75. Elsevier, 2003.
• [3] M. Bojańczyk, B. Klin, and S. Lasota. Automata theory in nominal sets. Logical Methods in Computer Science, 10(3), 2014.
• [4] P. Buchholz. Bisimulation relations for weighted automata. Theoret. Comput. Sci., 393:109–123, 2008.
• [5] J.-Y. Cai, M. Fürer, and N. Immerman. An optimal lower bound on the number of variables for graph identification. Combinatorica, 12(4):389–410, dec 1992.
• [6] H.-P. Deifel. CoPaR. Project repository.
• [7] H.-P. Deifel, S. Milius, L. Schröder, and T. Wißmann. CoPaR: An efficient generic partition refiner. Full version with appendix.
• [8] S. Derisavi, H. Hermanns, and W. Sanders. Optimal state-space lumping in Markov chains. Inf. Process. Lett., 87(6):309–315, 2003.
• [9] U. Dorsch, S. Milius, L. Schröder, and T. Wißmann. Efficient Coalgebraic Partition Refinement. In R. Meyer and U. Nestmann, editors, 28th International Conference on Concurrency Theory (CONCUR 2017), volume 85 of Leibniz International Proceedings in Informatics (LIPIcs), pages 32:1–32:16, Dagstuhl, Germany, 2017. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
• [10] A. Dovier, C. Piazza, and A. Policriti. An efficient algorithm for computing bisimulation equivalence. Theor. Comput. Sci., 311(1-3):221–256, 2004.
• [11] D. Gries. Describing an algorithm by Hopcroft. Acta Informatica, 2:97–109, 1973.
• [12] J. Högberg, A. Maletti, and J. May. Bisimulation minimisation for weighted tree automata. In Proceedings of the 11th International Conference on Developments in Language Theory, DLT’07, pages 229–241, Berlin, Heidelberg, 2007. Springer-Verlag.
• [13] J. Hopcroft. An algorithm for minimizing states in a finite automaton. In Theory of Machines and Computations, pages 189–196. Academic Press, 1971.
• [14] D. Huynh and L. Tian. On some equivalence relations for probabilistic processes. Fund. Inform., 17:211–234, 1992.
• [15] T. Ihringer. Allgemeine Algebra. Mit einem Anhang über Universelle Coalgebra von H. P. Gumm, volume 10 of Berliner Studienreihe zur Mathematik. Heldermann Verlag, 2003.
• [16] P. C. Kanellakis and S. A. Smolka. CCS expressions, finite state processes, and three problems of equivalence. Inf. Comput., 86(1):43–68, 1990.
• [17] T. Knuutila. Re-describing an algorithm by Hopcroft. Theor. Comput. Sci., 250:333 – 363, 2001.
• [18] M. Kwiatkowska, G. Norman, and D. Parker. PRISM 4.0: Verification of probabilistic real-time systems. In G. Gopalakrishnan and S. Qadeer, editors, Proc. 23rd International Conference on Computer Aided Verification (CAV’11), volume 6806 of LNCS, pages 585–591. Springer, 2011.
• [19] R. Paige and R. E. Tarjan. Three partition refinement algorithms. SIAM J. Comput., 16(6):973–989, 1987.
• [20] PRISM. Benchmarks fms and wlan. Accessed: 2019-11-16.
• [21] J. Rutten. Universal coalgebra: a theory of systems. Theor. Comput. Sci., 249:3–80, 2000.
• [22] L. Schröder, D. Kozen, S. Milius, and T. Wißmann. Nominal automata with name binding. In J. Esparza and A. Murawski, editors, Foundations of Software Science and Computation Structures, FOSSACS 2017, volume 10203 of LNCS, pages 124–142, 2017.
• [23] R. Segala and N. Lynch. Probabilistic simulations for probabilistic processes. In Concurrency Theory, CONCUR 1994, pages 481–496. Springer, 1994.
• [24] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt. Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res., 12:2539–2561, Nov. 2011.
• [25] A. Valmari. Bisimilarity minimization in O(m log n) time. In Applications and Theory of Petri Nets, PETRI NETS 2009, volume 5606 of LNCS, pages 123–142. Springer, 2009.
• [26] A. Valmari. Simple bisimilarity minimization in O(m log n) time. Fundamenta Informaticae, 105(3):319–339, 2010.
• [27] A. Valmari and G. Franceschinis. Simple O(m log n) time Markov chain lumping. In Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2010, volume 6015 of LNCS, pages 38–52. Springer, 2010.
• [28] B. Weisfeiler. On Construction and Identification of Graphs. Springer, 1976.
• [29] T. Wißmann, U. Dorsch, S. Milius, and L. Schröder. Efficient and modular coalgebraic partition refinement. submitted, https://arxiv.org/abs/1806.05654, 2018.

## Appendix A Omitted Proofs

### A.1 Notes on cancellative monoids (2)

Recall that a commutative monoid $(M,+,0)$ is cancellative iff $a+b=a+c$ implies $b=c$. The Grothendieck group construction for cancellative monoids defines the abelian group $(M\times M)/{\sim}$, where $\sim$ is the equivalence relation given by

$$(a,b)\sim(a',b') \quad\text{iff}\quad a+b'=a'+b.$$

The transitivity of $\sim$ follows from the monoid being cancellative. The group addition is given component-wise, the inverse of $(a,b)$ is $(b,a)$, and the inclusion $M \to (M\times M)/{\sim}$, $m \mapsto (m,0)$, is a monoid homomorphism.
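
The construction can be tried out on small examples with the following Python sketch, which represents a group element by a pair of monoid elements and compares pairs with the relation displayed above. The function names and the `op` parameter are hypothetical and serve only this illustration; the monoid operation defaults to addition on the natural numbers.

```python
import operator


def equivalent(x, y, op=operator.add):
    """(a, b) ~ (a2, b2) iff a + b2 == a2 + b, for the operation op."""
    (a, b), (a2, b2) = x, y
    return op(a, b2) == op(a2, b)


def add(x, y, op=operator.add):
    """Component-wise group addition on pairs."""
    return (op(x[0], y[0]), op(x[1], y[1]))


def inverse(x):
    """The inverse of (a, b) is (b, a)."""
    return (x[1], x[0])


def include(m, zero=0):
    """Inclusion of the monoid into the group: m is sent to (m, 0)."""
    return (m, zero)
```

For the natural numbers with addition, `add((3, 5), inverse((3, 5)))` is equivalent to `include(0)`, as expected of an inverse. For the non-cancellative monoid of natural numbers with `max`, the relation is no longer transitive: `(0, 5)` is related to `(5, 5)` and `(5, 5)` to `(5, 0)`, but `(0, 5)` is not related to `(5, 0)`.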

### A.2 Details on weighted tree automata

###### Definition 7

In [12], a weighted tree automaton (WTA) consists of a finite set of states , a finite signature, and a commutative monoid , a map and for each , a map (where is the set of -ary symbols in ).

Equivalently, a WTA is a (finite) coalgebra for the functor

$$FX = M\times M^{(\Sigma X)}$$

where the signature is identified with its corresponding polynomial functor .

###### Remark 3

In [12, Definition 4.1], a backward bisimulation on a WTA is defined to be an equivalence relation such that for every and every and :

$$\sum_{w\in L}\mu_k(\sigma)_{w,p}=\sum_{w\in L}\mu_k(\sigma)_{w,q}.$$

This is equivalent to saying that the canonical quotient is a coalgebra homomorphism.

In order to minimize an $F$-coalgebra with $m$ edges and $n$ states, we need to introduce one intermediate state per edge as described in Section 3.6, and we need to use the general refinement interface for non-cancellative monoids (4, item 2). Minimization then runs in

$$O((m+n)\cdot\log(m+n)\cdot\log(m)) \quad\text{for arbitrary commutative monoids } M.$$

If the monoid is cancellative, the refinement interface for groups can be used (2) and so we minimize in

$$O((m+n)\cdot\log(m+n)) \quad\text{for cancellative commutative monoids } M.$$

In [12, Theorem 4.21], an algorithm for minimization under backward bisimulation is given and it runs in where is the maximum rank of the input alphabet, i.e. maximum arity in in our terms, is the size of the transition table, and is the number of states, so our algorithm is faster, since for

$$O(m\cdot\log(m)^2)\subsetneq O(m\cdot\sqrt{m})\overset{m\in O(n^2)}{\subseteq}O(m\cdot\sqrt{n^2})=O(m\cdot n)$$

For cancellative monoids, the time bound in [12] is reduced to [12, 1.4 Algorithms], which is slightly faster than our and is the same issue as for LTSs.
