SAT solvers for synchronization in non-deterministic automata

# Using SAT solvers for synchronization issues in non-deterministic automata

Hanan Shabana and Mikhail V. Volkov

Hanan Shabana: Institute of Natural Sciences and Mathematics, Ural Federal University, Lenina 51, 620000 Yekaterinburg, Russia; Faculty of Electronic Engineering, Menoufia University, Egypt

Mikhail V. Volkov: Institute of Natural Sciences and Mathematics, Ural Federal University, Lenina 51, 620000 Yekaterinburg, Russia
Supported by the Russian Foundation for Basic Research, grant no. 16-01-00795, the Ministry of Education and Science of the Russian Federation, project no. 1.3253.2017, and the Competitiveness Enhancement Program of Ural Federal University.

Abstract. We approach the problem of computing a $D_3$-synchronizing word of minimum length for a given nondeterministic automaton by encoding it as an instance of SAT and invoking a SAT solver. We also present some experimental results.

Keywords: Nondeterministic automaton, synchronizing word, SAT, SAT-solver.

## 1. Background and overview

We assume the reader’s familiarity with some basic concepts of computational complexity theory that can be found in the early chapters of any general complexity theory text such as, e.g., . As far as automata theory is concerned, we have tried to make the paper, to a reasonable extent, self-contained.

One of the significant concepts for digital systems is synchronization. It means that all parts of the system are in agreement regarding the present state of the system. This concept is of immense importance in fields such as coding theory, conformance testing, biocomputing, industrial robotics, and many others, and also leads to intriguing mathematical questions, see, e.g., .

From the viewpoint of mathematics, discrete systems are often modeled as finite automata. A finite automaton is a triple $\mathcal{A}=\langle Q,\Sigma,\delta\rangle$, where $Q$ is a finite non-empty set whose elements are referred to as states, $\Sigma$ is a finite non-empty set which is called the input alphabet and whose elements are referred to as input symbols or input letters, and $\delta$ is a map, called the transition function, that describes the action of symbols in $\Sigma$ at states in $Q$. Finite automata are usually classified into three categories according to the nature of their transition function.

1. $\mathcal{A}$ is a deterministic finite automaton (DFA) if the transition function is a total map $\delta\colon Q\times\Sigma\to Q$, that is, $\delta(q,s)$ is defined for every state $q\in Q$ and every symbol $s\in\Sigma$. We interpret $\delta(q,s)$ as the next state to which the DFA moves if it was at the state $q$ and read the symbol $s$.

2. $\mathcal{A}$ is a partial finite automaton (PFA) if the transition function is a partial map $\delta\colon Q\times\Sigma\to Q$, that is, $\delta(q,s)$ is defined for some pairs $(q,s)\in Q\times\Sigma$ but may be undefined for some other pairs. We again interpret $\delta(q,s)$, provided it is defined, as the next state to which the PFA moves if it was at the state $q$ and read the symbol $s$, and we write $\delta(q,s)=\varnothing$ to indicate that $\delta(q,s)$ is undefined. (In the literature, automata that we call PFAs are sometimes referred to as deterministic finite automata, while our DFAs are called complete deterministic finite automata.)

3. $\mathcal{A}$ is a nondeterministic finite automaton (NFA) if the transition function is a map $\delta\colon Q\times\Sigma\to\mathcal{P}(Q)$, where $\mathcal{P}(Q)$ is the power set of $Q$, that is, for every state $q\in Q$ and every symbol $s\in\Sigma$, the expression $\delta(q,s)$ is not a single state but rather a subset of states. If this subset is non-empty, we interpret it as the set of all possible states to which the NFA could move if it was at the state $q$ and read the symbol $s$. If $\delta(q,s)=\varnothing$, we say that the action of $s$ is undefined at $q$.

Clearly, both DFAs and PFAs can be considered as special instances of NFAs. Therefore, in the sequel, we define all concepts for NFAs, commenting on their specializations for DFAs and PFAs if necessary.

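We represent a given automaton $\mathcal{A}=\langle Q,\Sigma,\delta\rangle$ by the labeled directed graph with the vertex set $Q$, the label alphabet $\Sigma$, and the set of labeled edges

$$\{\,q\xrightarrow{s}q'\mid q,q'\in Q,\ s\in\Sigma,\ q'\in\delta(q,s)\,\}.$$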

Figure 1 shows examples of a DFA (left) and an NFA (right). We adopt the convention that edges with multiple labels represent bunches of parallel edges. Thus, an edge in Figure 1 that goes from $q$ to $q'$ and carries two labels $a,b$ represents the two parallel edges $q\xrightarrow{a}q'$ and $q\xrightarrow{b}q'$, etc.

Given an alphabet $\Sigma$, a word over $\Sigma$ is a finite sequence of symbols from $\Sigma$. We do not exclude the empty sequence from this definition; that is, we allow the empty word $\varepsilon$. The set of all words over $\Sigma$, including the empty word, is denoted by $\Sigma^*$ and is referred to as the free monoid over $\Sigma$. If $w=s_1\cdots s_\ell$ with $s_1,\dots,s_\ell\in\Sigma$ is a non-empty word over $\Sigma$, the number $\ell$ is said to be the length of $w$ and is denoted by $|w|$. The length of the empty word is defined to be 0. The set of all words of a given length $\ell$ over $\Sigma$ is denoted by $\Sigma^\ell$.

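For every NFA $\mathcal{A}=\langle Q,\Sigma,\delta\rangle$, the transition function can be extended to a function $Q\times\Sigma^*\to\mathcal{P}(Q)$ (still denoted by $\delta$) by induction on the length of $w\in\Sigma^*$. If $|w|=0$, that is, $w$ is the empty word, then, for each $q\in Q$, we let $\delta(q,w)=\{q\}$. If $|w|>0$, we represent $w$ as $w=w's$ with $w'\in\Sigma^*$ and $s\in\Sigma$ and, for each $q\in Q$, let $\delta(q,w)=\bigcup_{p\in\delta(q,w')}\delta(p,s)$. (The right-hand side of the latter equality is defined by the induction assumption since $|w'|<|w|$.) To lighten the notation, we write $q.w$ for $\delta(q,w)$ and $P.w$ for $\bigcup_{q\in P}\delta(q,w)$ whenever we deal with a fixed automaton.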
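The inductive extension can be sketched in Python as follows. The representation is our own assumption, not the paper's: an NFA over the states $0,\dots,n-1$ is a dictionary that maps a pair (state, symbol) to a set of states, and a missing key stands for an undefined transition. The names `image`, `set_image` and `EXAMPLE` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: delta[(q, s)] is the set of successors of state q
# under symbol s; a missing key means that the action of s is undefined at q.
NFA = Dict[Tuple[int, str], Set[int]]


def image(delta: NFA, q: int, w: str) -> FrozenSet[int]:
    """Return q.w by induction on |w|, as in the definition."""
    if not w:                          # |w| = 0: q.w = {q}
        return frozenset({q})
    prefix, s = w[:-1], w[-1]          # w = w's with s a single symbol
    result: Set[int] = set()
    for p in image(delta, q, prefix):  # union of delta(p, s) over p in q.w'
        result |= delta.get((p, s), set())
    return frozenset(result)


def set_image(delta: NFA, states: Iterable[int], w: str) -> FrozenSet[int]:
    """Return P.w, the union of q.w over q in P."""
    result: Set[int] = set()
    for q in states:
        result |= image(delta, q, w)
    return frozenset(result)


# A small example NFA over {0, 1, 2} (ours); symbol '1' is undefined at state 2.
EXAMPLE: NFA = {(0, '0'): {1}, (1, '0'): {2}, (2, '0'): {0, 2},
                (0, '1'): {0}, (1, '1'): {0, 1}}
```

For instance, `image(EXAMPLE, 2, '00')` is $\{0,1,2\}$, and `image(EXAMPLE, 1, '01')` is empty because the path through state 2 dies on the symbol 1.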

Here we are interested in synchronization of finite automata. The idea of synchronization is as follows: for a given automaton, we are looking for an input word that directs the automaton to a specific state, no matter at which state the automaton was at the beginning. This input is called a synchronizing word, and if an automaton possesses such a word, it is called synchronizing.

The above informal idea of synchronization is easy to formalize for DFAs but for NFAs it admits several non-equivalent formalizations. First, we recall the three versions that were suggested in  and have been widely studied thereafter.

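Let $\mathcal{A}=\langle Q,\Sigma,\delta\rangle$ be an NFA, $i\in\{1,2,3\}$. A word $w\in\Sigma^*$ is said to be $D_i$-synchronizing for $\mathcal{A}$ if it satisfies the condition (D$i$) from the list below:

1. (D1) $|q.w|=|Q.w|=1$ for all $q\in Q$;

2. (D2) $q.w=Q.w\ne\varnothing$ for all $q\in Q$;

3. (D3) $\bigcap_{q\in Q}q.w\ne\varnothing$.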

An NFA $\mathcal{A}$ is called $D_i$-synchronizing, $i=1,2,3$, if it has a $D_i$-synchronizing word. (In some sources, the requirement $Q.w\ne\varnothing$ is not explicitly included in the definition of $D_2$-synchronization. If one omits this requirement, every word that is nowhere defined becomes $D_2$-synchronizing. We think this version of synchronization is hardly of independent interest, since it readily reduces to $D_2$-synchronization in our sense in the automaton obtained from $\mathcal{A}$ by adding a new sink state and making all transitions undefined in $\mathcal{A}$ lead to this sink state.) It should be clear that every $D_1$-synchronizing word is also $D_2$-synchronizing and every $D_2$-synchronizing word is also $D_3$-synchronizing. The converse is not true in general. For an illustration, consider the NFA in Figure 1 (right). It is easy to see that for it, the word is $D_1$-synchronizing, the word is $D_2$-synchronizing but not $D_1$-synchronizing, and the word is $D_3$-synchronizing but not $D_2$-synchronizing. Moreover, the NFA obtained from it by omitting the letter is $D_2$-synchronizing but not $D_1$-synchronizing, while the NFA obtained from it by omitting the letters and is $D_3$-synchronizing but not $D_2$-synchronizing.
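For experiments with small examples, conditions (D1)–(D3) can be tested directly. The Python sketch below assumes a hypothetical dictionary representation: the states are $0,\dots,n-1$ and missing keys stand for undefined transitions. The helper names and the two toy automata are our own illustrations and are not taken from Figure 1.

```python
from typing import Dict, FrozenSet, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined


def image(delta: NFA, q: int, w: str) -> FrozenSet[int]:
    """q.w, computed letter by letter."""
    current: Set[int] = {q}
    for s in w:
        current = {r for p in current for r in delta.get((p, s), set())}
    return frozenset(current)


def is_d1(delta: NFA, n: int, w: str) -> bool:
    # (D1): all sets q.w coincide and are singletons.
    images = {image(delta, q, w) for q in range(n)}
    return len(images) == 1 and len(next(iter(images))) == 1


def is_d2(delta: NFA, n: int, w: str) -> bool:
    # (D2): all sets q.w coincide (hence equal Q.w) and are non-empty.
    images = {image(delta, q, w) for q in range(n)}
    return len(images) == 1 and len(next(iter(images))) > 0


def is_d3(delta: NFA, n: int, w: str) -> bool:
    # (D3): the sets q.w, q in Q, have a state in common.
    common = set(range(n))
    for q in range(n):
        common &= image(delta, q, w)
    return bool(common)


# Toy automata (ours, not those of Figure 1).
TWO: NFA = {(0, '0'): {0, 1}, (1, '0'): {0, 1}, (0, '1'): {0}, (1, '1'): {0}}
EXAMPLE: NFA = {(0, '0'): {1}, (1, '0'): {2}, (2, '0'): {0, 2},
                (0, '1'): {0}, (1, '1'): {0, 1}}
```

In `TWO`, the word 1 satisfies all three conditions, while 0 satisfies (D2) and (D3) but not (D1). In `EXAMPLE`, the word 00 satisfies (D3) but not (D2).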

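Yet another version of synchronization for NFAs has been studied by Martyugin, see, e.g., . Let $\mathcal{A}=\langle Q,\Sigma,\delta\rangle$ be an NFA. A word $w=s_1\cdots s_\ell$ with $s_1,\dots,s_\ell\in\Sigma$ is said to be carefully synchronizing for $\mathcal{A}$ if it satisfies the condition (C), which is the conjunction of (C1)–(C3) below:

1. (C1) $\delta(q,s_1)$ is defined for all $q\in Q$,

2. (C2) $\delta(q,s_t)$ with $1<t\le\ell$ is defined for all $q\in Q.s_1\cdots s_{t-1}$,

3. (C3) $|Q.w|=1$.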

Thus, when $w$ is applied at any state in $Q$, no undefined transition occurs during the course of application. Clearly, every carefully synchronizing word is also $D_1$-synchronizing, but the converse is not true. For instance, the word is not carefully synchronizing for the NFA in Figure 1 (right); moreover, this NFA possesses no carefully synchronizing word. We call an NFA carefully synchronizing if it admits a carefully synchronizing word. Thus, if we denote by $\mathbf{D}_i$, $i=1,2,3$, the class of all $D_i$-synchronizing NFAs and by $\mathbf{C}$ the class of all carefully synchronizing NFAs, we have the following strict inclusions:

$$\mathbf{C}\subset\mathbf{D}_1\subset\mathbf{D}_2\subset\mathbf{D}_3.$$
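Careful synchronization can be checked by reading $w$ from the whole state set at once and failing as soon as a letter is undefined at a state reached so far. The following Python sketch makes that concrete. It assumes the same kind of hypothetical dictionary representation (missing key means undefined), and the toy automaton `CAREFUL_DEMO` is ours. In that automaton, 01 sends every state to $\{0\}$ and so is $D_1$-synchronizing, but it is not carefully synchronizing, whereas 001 is.

```python
from typing import Dict, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined


def is_carefully_synchronizing(delta: NFA, n: int, w: str) -> bool:
    """Check (C1)-(C3) for w = s_1...s_l over the states 0..n-1."""
    current: Set[int] = set(range(n))      # Q.s_1...s_{t-1}, initially Q
    for s in w:
        # (C1)/(C2): s must be defined at every state reached so far
        if any(not delta.get((q, s)) for q in current):
            return False
        current = {r for q in current for r in delta[(q, s)]}
    return len(current) == 1               # (C3): |Q.w| = 1


# Toy NFA (ours): reading 01 from all of Q hits the undefined pair (2, '1').
CAREFUL_DEMO: NFA = {(0, '0'): {1, 2}, (1, '0'): {1}, (2, '0'): {1},
                     (0, '1'): {0}, (1, '1'): {0}}
```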

In this paper, we consider $D_3$-synchronization. As can be seen from the above discussion, it is the most general version of synchronization for NFAs among those considered in the literature so far. Besides that, we think that it reasonably reflects the basic nature of non-determinism. Indeed, if an NFA $\mathcal{A}=\langle Q,\Sigma,\delta\rangle$ is used as an acceptor, we designate some states in $Q$ as initial and final and then say that $\mathcal{A}$ accepts a word $w$ whenever there exists a path labeled $w$ that starts at an initial state and terminates at a final state. The definition of a $D_3$-synchronizing word very much resembles this concept: a word $w$ is $D_3$-synchronizing whenever, for each $q\in Q$, there exists a path labeled $w$ that starts at $q$ and terminates at a certain common state, independent of $q$. In both cases we require neither that a starting state uniquely determine the path labeled $w$, nor that every path labeled $w$ with a given starting state arrive at a final/common state.

We also mention in passing that $D_3$-synchronization has a very transparent meaning within a standard matrix representation of NFAs. In this representation, an NFA $\mathcal{A}=\langle Q,\Sigma,\delta\rangle$ becomes a collection of Boolean $|Q|\times|Q|$-matrices, where each input symbol $s\in\Sigma$ is assigned a matrix $M(s)$ whose $(p,q)$-entry is 1 if $q\in\delta(p,s)$ and 0 otherwise. It is then not hard to see that $\mathcal{A}$ is $D_3$-synchronizing if and only if some product of the matrices $M(s)$, $s\in\Sigma$, has a column consisting entirely of 1s.
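The matrix characterization is easy to try out. The Python sketch below uses plain lists and no external libraries, and all names in it are hypothetical. It builds $M(s)$, multiplies Boolean matrices along a word $w$ so that the $(p,q)$-entry of the product is 1 exactly when $q\in p.w$, and then looks for an all-ones column.

```python
from typing import Dict, List, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined
Matrix = List[List[int]]


def letter_matrix(delta: NFA, n: int, s: str) -> Matrix:
    # (p, q)-entry of M(s) is 1 iff q is in delta(p, s)
    return [[1 if q in delta.get((p, s), set()) else 0 for q in range(n)]
            for p in range(n)]


def bool_product(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[1 if any(a[p][r] and b[r][q] for r in range(n)) else 0
             for q in range(n)] for p in range(n)]


def word_matrix(delta: NFA, n: int, w: str) -> Matrix:
    # M(s_1) M(s_2) ... M(s_l), starting from the identity matrix
    result = [[1 if p == q else 0 for q in range(n)] for p in range(n)]
    for s in w:
        result = bool_product(result, letter_matrix(delta, n, s))
    return result


def has_full_column(m: Matrix) -> bool:
    # a column of 1s at q means q lies in p.w for every state p
    return any(all(row[q] for row in m) for q in range(len(m)))


EXAMPLE: NFA = {(0, '0'): {1}, (1, '0'): {2}, (2, '0'): {0, 2},
                (0, '1'): {0}, (1, '1'): {0, 1}}
```

On this toy NFA (ours), the product for 00 has an all-ones column at state 2, while $M(0)$ alone has none.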

Some information about $D_3$-synchronization can be found in Chapter 8 of Ito's monograph ; recently, some aspects of $D_3$-synchronization have been considered in [6, 7, 8, 9]. (The papers [6, 7] use the language of matrices rather than that of automata.)

It is easy to see that each of the conditions (D1), (D2), (D3), (C) leads to the same notion when restricted to PFAs. Thus, for PFAs and, in particular, for DFAs, we call a word synchronizing if it satisfies any of these conditions. A PFA (in particular, a DFA) is said to be synchronizing if it has a synchronizing word.

It is known that the problem of determining whether or not a DFA with $n$ states is synchronizing can be solved in $O(|\Sigma|\,n^2)$ time, see, e.g.,  or . If such a DFA is synchronizing, it always has a synchronizing word of length at most $\frac{n^3-n}{6}$, see , and it is conjectured that a synchronizing DFA with $n$ states must have a synchronizing word of length at most $(n-1)^2$ (this is the famous Černý conjecture). In contrast, the problem of determining whether or not a given PFA is synchronizing is known to be PSPACE-complete, and there is no upper bound polynomial in $n$ on the length of synchronizing words for a synchronizing PFA with $n$ states. (These results were found by Rystsov in the early 1980s [12, 13] and later rediscovered (and strengthened) by Martyugin .) This readily implies that the problem of determining whether or not a given NFA is $D_3$-synchronizing, as well as the problem of finding a $D_3$-synchronizing word of minimum length, is computationally hard.

Nowadays, a popular approach to computationally hard problems consists in encoding them as instances of the Boolean satisfiability problem (SAT) that are then fed to a SAT solver, that is, a specialized program designed to solve instances of SAT. We refer to this approach as the SAT-solver method. Modern SAT solvers can often solve instances with hundreds of thousands of variables and millions of clauses within a few minutes. Thanks to this remarkable progress, the SAT-solver method has proved to be very efficient for an extremely wide range of problems of both theoretical and practical importance. Its applications are far too numerous to be listed here; some examples of such applications can be found in the survey , which also gives a good introduction to the area. Here we mention only three recent papers that deal with two difficult problems related to finite automata. Geldenhuys, van der Merwe, and van Zijl  have used the SAT-solver method to attack the minimization problem for NFAs. In the minimization problem, which is known to be PSPACE-complete , an NFA $\mathcal{A}$ with designated initial and final states is given, and one looks for an NFA of minimum size that accepts the same set of words as $\mathcal{A}$. Skvortsov and Tipikin  have applied the method to find a synchronizing word of minimum length for a given DFA with two input symbols, and Güniçen, Erdem, and Yenigün  have extended their approach to DFAs with arbitrary input alphabets. The problem of finding a synchronizing word of minimum length is known to be hard for the complexity class $\mathrm{FP}^{\mathrm{NP}[\log]}$. This is the functional analogue of the class $\mathrm{P}^{\mathrm{NP}[\log]}$ of problems solvable by a deterministic polynomial-time Turing machine that has access to an oracle for an NP-complete problem, with the number of queries being logarithmic in the size of the input .

In the present paper, we use the SAT-solver method to approach the problem of computing a $D_3$-synchronizing word of minimum length for a given NFA. It should be stressed that neither the encoding of NFAs used in  nor the encoding of synchronization used in [18, 19] works for our problem, and therefore we have had to invent essentially different encodings.

The rest of the paper is divided into three sections. Section 2 describes our basic encoding and Section 3 presents implementation details and some of our experimental results. The final section contains several concluding remarks and a discussion of possible further developments.

## 2. Encoding

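By the encoding of a problem, we mean a polynomial reduction from this problem to SAT. First, let us precisely formulate the problem we are interested in.

D3W: Given an NFA $\mathcal{A}$ with two input symbols and a positive integer $\ell$, does $\mathcal{A}$ have a $D_3$-synchronizing word of length $\ell$?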

The integer $\ell$ is assumed to be given in unary. If $\ell$ were given in binary, a polynomial reduction from D3W to SAT would hardly be possible. Indeed, it is known that every $D_3$-synchronizing NFA with $n$ states has a $D_3$-synchronizing word whose length is at most an explicit bound depending only on $n$, see [5, Proposition 8.3.10]. Hence, given an NFA $\mathcal{A}$ with $n$ states and two input symbols, the answer to D3W for $\mathcal{A}$ with $\ell$ equal to this bound is YES if and only if $\mathcal{A}$ is $D_3$-synchronizing. As mentioned, the problem of determining whether or not a given NFA is $D_3$-synchronizing is PSPACE-complete, whence the version of D3W in which the integer parameter $\ell$ is given in binary is PSPACE-hard. On the other hand, SAT is an archetypical problem in NP, and clearly, the existence of a polynomial reduction from a PSPACE-hard problem to a problem in NP would imply $\mathrm{NP}=\mathrm{PSPACE}$, and hence that the polynomial hierarchy collapses at level 1. As is usual in complexity theory, the question of whether or not the polynomial hierarchy collapses at any level is open, but the common opinion is that it does not.

In contrast, the version of D3W with the integer parameter $\ell$ given in unary is easily seen to belong to NP. Indeed, given an instance $(\mathcal{A},\ell)$ of D3W in this setting, one may guess a word $w$ of length $\ell$ over the input alphabet of $\mathcal{A}$, since $w$ is obviously of polynomial size in terms of the size of the instance. Then one just checks whether or not $w$ is $D_3$-synchronizing for $\mathcal{A}$, and the time spent on this check is clearly polynomial in the size of $(\mathcal{A},\ell)$. By Cook's classic theorem (see, e.g., [1, Theorem 8.2]), SAT is NP-complete, and by the very definition of NP-completeness, there exists a polynomial reduction from our version of D3W to SAT.
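Replacing the nondeterministic guess by exhaustive enumeration gives a deliberately naive exact procedure for D3W. The check for a single word is polynomial, but there are $2^\ell$ candidate words, which is exactly the gap a SAT solver is meant to bridge. The Python sketch below is only an illustration; the names and the toy NFA are ours.

```python
from itertools import product
from typing import Dict, Optional, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined


def is_d3(delta: NFA, n: int, w: str) -> bool:
    """Polynomial-time check that the sets q.w share a state."""
    common = set(range(n))
    for q in range(n):
        current = {q}
        for s in w:
            current = {r for p in current for r in delta.get((p, s), set())}
        common &= current
    return bool(common)


def d3w(delta: NFA, n: int, ell: int, alphabet: str = '01') -> Optional[str]:
    """Answer D3W by trying every word of length ell; returns a witness or None.
    Exponential in ell: this replaces the NP 'guess' by brute force."""
    for letters in product(alphabet, repeat=ell):
        w = ''.join(letters)
        if is_d3(delta, n, w):
            return w
    return None


EXAMPLE: NFA = {(0, '0'): {1}, (1, '0'): {2}, (2, '0'): {0, 2},
                (0, '1'): {0}, (1, '1'): {0, 1}}
```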

Recall that an instance of SAT is a pair $(V,C)$, where $V$ is a set of Boolean variables and $C$ is a collection of clauses over $V$. (A clause over $V$ is a disjunction of literals, and a literal is either a variable in $V$ or the negation of a variable in $V$.) Any truth assignment on $V$, i.e., any map $\varphi\colon V\to\{0,1\}$, extends to literals and clauses over $V$ (the extension is still denoted by $\varphi$) via the usual rules of propositional calculus: $\varphi(\neg v)=1-\varphi(v)$, $\varphi(c_1\vee c_2)=\max\{\varphi(c_1),\varphi(c_2)\}$. A truth assignment $\varphi$ satisfies $C$ if $\varphi(c)=1$ for all $c\in C$. The answer to an instance $(V,C)$ is YES if $C$ has a satisfying assignment (i.e., a truth assignment on $V$ that satisfies $C$) and NO otherwise.
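These definitions can be mirrored in a few lines of Python. For illustration only, we assume the common DIMACS-style convention in which variable $v$ is the integer `v` and its negation is `-v`. The brute-force search is exponential and serves only to make the definition of a YES instance concrete; the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style: v stands for variable v, -v for its negation


def literal_value(assignment: Dict[int, bool], lit: int) -> bool:
    # phi(not v) = 1 - phi(v)
    value = assignment[abs(lit)]
    return value if lit > 0 else not value


def satisfies(assignment: Dict[int, bool], clauses: List[Clause]) -> bool:
    # A clause is a disjunction: its value is the maximum of its literals' values.
    return all(any(literal_value(assignment, lit) for lit in c) for c in clauses)


def brute_force_sat(num_vars: int, clauses: List[Clause]) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment of variables 1..num_vars, or None."""
    for values in product((False, True), repeat=num_vars):
        assignment = {v + 1: values[v] for v in range(num_vars)}
        if satisfies(assignment, clauses):
            return assignment
    return None
```

For example, $(v_1\vee v_2)\wedge\neg v_1\wedge(\neg v_2\vee v_3)$ has the unique satisfying assignment $v_1=0$, $v_2=v_3=1$, while $v_1\wedge\neg v_1$ is a NO instance.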

Thus, a polynomial reduction from D3W to SAT is an algorithm that, given an arbitrary instance $(\mathcal{A},\ell)$ of D3W, constructs, in polynomial time with respect to the size of $(\mathcal{A},\ell)$, an instance $(V,C)$ of SAT such that the answer to $(V,C)$ is YES if and only if so is the answer to $(\mathcal{A},\ell)$. Of course, neither a pure existence statement nor any general construction that can be extracted from one of the proofs of Cook's theorem can be used for our purposes. We need a sort of "practical" reduction: it should be explicit, easy to implement, and economical in the sense that the degrees of the polynomials that bound the number of variables in $V$ and the number of clauses in $C$ in terms of the size of $(\mathcal{A},\ell)$ should be as small as possible.

In the following presentation of our encoding, precise definitions and statements are interwoven with less formal comments explaining the “physical” meaning of variables and clauses we introduce and with estimations of their numbers.

So, take an NFA $\mathcal{A}=\langle Q,\Sigma,\delta\rangle$ and an integer $\ell>0$. Denote the size of $Q$ by $n$ and fix some numbering of the states in $Q$ so that $Q=\{q_1,\dots,q_n\}$. Recall that we consider the problem D3W for NFAs with two input symbols, so let $\Sigma=\{0,1\}$.

We start by introducing the variables used in the instance $(V,C)$ of SAT that encodes $(\mathcal{A},\ell)$. The set $V$ consists of three sorts of variables: letter variables, token variables, and synchronization variables.

The letter variables are $x_1,\dots,x_\ell$. They are just placeholders for the input symbols 0 and 1. There is an obvious 1-1 correspondence between the truth assignments on the set $\{x_1,\dots,x_\ell\}$ and the words in $\Sigma^\ell$: given a truth assignment $\varphi$, the corresponding word is $\varphi(x_1)\varphi(x_2)\cdots\varphi(x_\ell)$, and, conversely, given a word $w=s_1\cdots s_\ell$ with $s_1,\dots,s_\ell\in\Sigma$, the corresponding truth assignment is $x_t\mapsto s_t$ for each $t=1,\dots,\ell$.

The token variables are $y^t_{ij}$, where $i,j=1,\dots,n$ and $t=0,1,\dots,\ell$. To explain the role of these variables, we use a solitaire-like game $\mathcal{G}$ on the labeled directed graph representing the NFA $\mathcal{A}$. In the initial position of $\mathcal{G}$, each state $q_i$ holds exactly one token, denoted $T_i$. In the course of the game, tokens migrate and may multiply or disappear according to certain rules that will be specified a bit later, when we describe the clauses in $C$. For the moment, it suffices to say that the rules are designed to ensure that the variable $y^t_{ij}$ gets value 1 in a satisfying truth assignment for $C$ if and only if, after $t$ rounds of the game, one of the tokens held by the state $q_j$ is $T_i$.

The synchronization variables are $z_1,\dots,z_n$. They play the role of indicators showing which states may occur at the end of the synchronization process. By the definition of $D_3$-synchronization, the answer to the instance $(\mathcal{A},\ell)$ is YES if and only if there exists a word $w\in\Sigma^\ell$ such that $\bigcap_{i=1}^n q_i.w\ne\varnothing$. The clauses of $C$ will be chosen so that the variable $z_j$ can get value 1 in a satisfying assignment $\varphi$ for $C$ only if the state $q_j$ belongs to the set $\bigcap_{i=1}^n q_i.w$, where $w$ is the word defined by the restriction of $\varphi$ to $\{x_1,\dots,x_\ell\}$.

We see that the total number of variables in $V$ is $\ell+(\ell+1)n^2+n$.
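Most SAT solvers expect variables to be numbered $1,2,3,\dots$. The paper does not fix a numbering, so the Python sketch below shows just one possible choice (our assumption), with the 1-based indices used in the text. Its length confirms the count $\ell+(\ell+1)n^2+n$.

```python
from typing import Dict, Tuple


def variable_numbering(n: int, ell: int) -> Dict[Tuple, int]:
    """Hypothetical consecutive numbering of x_t, y^t_ij and z_j."""
    index: Dict[Tuple, int] = {}
    for t in range(1, ell + 1):                  # letter variables x_1..x_ell
        index[('x', t)] = len(index) + 1
    for t in range(ell + 1):                     # token variables y^t_ij
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                index[('y', t, i, j)] = len(index) + 1
    for j in range(1, n + 1):                    # synchronization variables z_j
        index[('z', j)] = len(index) + 1
    return index
```

For $n=3$ and $\ell=4$ this yields $4+5\cdot9+3=52$ variables.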

Now we turn to constructing the set of clauses $C$. It is the disjoint union of $\ell+2$ sets: the set $C_{\mathrm{ini}}$ of initial clauses, the sets $C_t$, $t=1,\dots,\ell$, of transition clauses, and the set $C_{\mathrm{syn}}$ of synchronization clauses.

The clauses in $C_{\mathrm{ini}}$ describe the initial position of our game $\mathcal{G}$. As mentioned, in this position, each state $q_i$ holds the token $T_i$ and nothing else. In order to reflect this setting, we let $C_{\mathrm{ini}}$ consist of the clauses $y^0_{ii}$, $i=1,\dots,n$, along with all clauses of the form $\neg y^0_{ij}$ with $i\ne j$. Altogether, $C_{\mathrm{ini}}$ contains $n^2$ one-literal clauses.

It is the clauses in $C_t$, $t=1,\dots,\ell$, that encode the rules of $\mathcal{G}$. The rules are as follows. At each move, an input symbol $s\in\Sigma$ is chosen. Then, for each state $q$ such that $\delta(q,s)\ne\varnothing$, all tokens that were held by $q$ slide along the edges labeled $s$ to all states in the set $\delta(q,s)$. (If $|\delta(q,s)|>1$, then every token held by $q$ multiplies into $|\delta(q,s)|$ identical tokens, one for each state in $\delta(q,s)$.) If $\delta(q,s)=\varnothing$, then all tokens that were held by $q$ disappear. Thus, after the move, the token $T_i$ occurs at a state $q'$ if and only if $q'\in\delta(q,s)$ for some state $q$ that held $T_i$ just prior to the move.
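The game $\mathcal{G}$ is straightforward to simulate, and doing so illustrates Lemma 1 below. The Python sketch keeps, for each token $T_i$, the set of states currently holding a copy of it. It uses 0-based indices and a hypothetical dictionary representation of $\delta$ (missing key means undefined); the function names and the toy NFA are ours.

```python
from typing import Dict, List, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined


def initial_position(n: int) -> List[Set[int]]:
    # positions[i] = set of states holding (copies of) token T_i; initially {q_i}
    return [{i} for i in range(n)]


def move(delta: NFA, positions: List[Set[int]], s: str) -> List[Set[int]]:
    # Every copy of T_i at q slides along all s-edges leaving q (multiplying if
    # delta(q, s) has several states) and disappears if delta(q, s) is empty.
    return [{r for q in held for r in delta.get((q, s), set())}
            for held in positions]


def play(delta: NFA, n: int, w: str) -> List[Set[int]]:
    positions = initial_position(n)
    for s in w:
        positions = move(delta, positions, s)
    return positions


EXAMPLE: NFA = {(0, '0'): {1}, (1, '0'): {2}, (2, '0'): {0, 2},
                (0, '1'): {0}, (1, '1'): {0, 1}}
```

After playing 00 on this NFA, state 2 holds all three tokens, which reflects the fact that 00 is $D_3$-synchronizing.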

For an illustration, Figure 2 demonstrates the initial distribution of tokens on a 5-state NFA with the input alphabet $\{0,1\}$ (top), along with the outcomes of the first move, depending on whether 0 or 1 has been chosen for the move (bottom left and bottom right, respectively).

The following observation is immediate.

###### Lemma 1.

Suppose that in the game $\mathcal{G}$ played on $\mathcal{A}$, the sequence of chosen symbols forms a word $w$. Then, for each $i=1,\dots,n$, the set of states holding the token $T_i$ at the end of the game is $q_i.w$.

Now we express the rules of $\mathcal{G}$ by formulas of propositional logic. For a state $q\in Q$, let $P_0(q)$ and $P_1(q)$ stand for the sets of all preimages of $q$ under the actions of the input symbols 0 and 1, respectively, that is, if $s$ is either of the two symbols, $P_s(q)=\{p\in Q\mid q\in\delta(p,s)\}$. Consider, for every $t=1,\dots,\ell$ and all $i,j=1,\dots,n$, the following formulas:

$$\Psi^t_{ij}\colon\quad y^t_{ij}\iff\Bigl(x_t\wedge\bigvee_{q_k\in P_1(q_j)}y^{t-1}_{ik}\Bigr)\vee\Bigl(\neg x_t\wedge\bigvee_{q_h\in P_0(q_j)}y^{t-1}_{ih}\Bigr).$$

Observe that the equivalence $\Psi^t_{ij}$ just translates into the language of propositional logic our propagation rule for the tokens. This rule says that the token $T_i$ occurs at the state $q_j$ after $t$ moves if and only if one of the following alternatives takes place:

• the $t$-th move was done with the input symbol 1, and one of the preimages of $q_j$ under the action of 1 was holding $T_i$ after $t-1$ moves, or

• the $t$-th move was done with the input symbol 0, and one of the preimages of $q_j$ under the action of 0 was holding $T_i$ after $t-1$ moves.

###### Lemma 2.

For every $t=0,1,\dots,\ell$, every truth assignment $\varphi$ on the set of letter variables has a unique extension $\bar\varphi$ to the token variables $y^r_{ij}$, $r=0,1,\dots,t$, that makes the clauses in $C_{\mathrm{ini}}$ and the formulas $\Psi^r_{ij}$, $r=1,\dots,t$, hold true. The token variable $y^t_{ij}$ gets value 1 under $\bar\varphi$ if and only if, after the first $t$ moves of the game $\mathcal{G}$ played with the word defined by $\varphi$, one of the tokens held by the state $q_j$ is $T_i$.

###### Proof.

We induct on $t$. The induction basis $t=0$ is clear: we have to satisfy the clauses in $C_{\mathrm{ini}}$, and the only way to satisfy a one-literal clause is to assign value 1 to its only literal. Hence, independently of $\varphi$, we have to set, for all $i,j=1,\dots,n$,

$$\bar\varphi(y^0_{ij})=\begin{cases}1&\text{if } i=j,\\ 0&\text{otherwise.}\end{cases}$$

Observe that then, in accordance with the initial setting of the game $\mathcal{G}$, the variable $y^0_{ij}$ gets value 1 exactly when the token held by the state $q_j$ is $T_i$.

Now suppose that $t>0$ and that there exists a unique way to define $\bar\varphi(y^r_{ij})$ for all $i,j=1,\dots,n$ and $r=0,1,\dots,t-1$ such that the clauses in $C_{\mathrm{ini}}$ and the formulas $\Psi^r_{ij}$ with $0<r<t$ hold true. Once the variable $x_t$ is assigned the value $\varphi(x_t)$, the value of the right-hand side of each equivalence $\Psi^t_{ij}$ is uniquely defined, and to make this equivalence hold true, we must assign the same value to its left-hand side, that is, to the variable $y^t_{ij}$. This gives a unique way to extend $\bar\varphi$ to the variables $y^t_{ij}$, $i,j=1,\dots,n$. As observed prior to the formulation of the lemma, the equivalences $\Psi^t_{ij}$ express the rule of $\mathcal{G}$. Therefore the token $T_i$ migrates to the state $q_j$ after the $t$-th move if and only if the variable $y^t_{ij}$ gets value 1 under this extension. ∎
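The proof of Lemma 2 is constructive. Given the letter values, each layer $\bar\varphi(y^t_{ij})$ is obtained from the previous one by evaluating the right-hand side of $\Psi^t_{ij}$ through the preimage sets $P_s(q_j)$. The following Python sketch assumes a hypothetical dictionary representation of $\delta$, 0-based indices $i,j$ (the text uses $1,\dots,n$), and `True` for $x_t$ meaning the symbol 1.

```python
from typing import Dict, List, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined


def preimages(delta: NFA, n: int, s: str, j: int) -> List[int]:
    # P_s(q_j) = { q_k : q_j in delta(q_k, s) }
    return [k for k in range(n) if j in delta.get((k, s), set())]


def extend(delta: NFA, n: int, x: List[bool]) -> List[List[List[bool]]]:
    """Return ybar[t][i][j] for t = 0..len(x): the unique values forced by
    C_ini and the equivalences Psi^t_ij for the letter values x."""
    ybar = [[[i == j for j in range(n)] for i in range(n)]]   # t = 0: C_ini
    for letter in x:
        prev = ybar[-1]
        s = '1' if letter else '0'
        # right-hand side of Psi^t_ij once x_t is fixed
        ybar.append([[any(prev[i][k] for k in preimages(delta, n, s, j))
                      for j in range(n)] for i in range(n)])
    return ybar


EXAMPLE: NFA = {(0, '0'): {1}, (1, '0'): {2}, (2, '0'): {0, 2},
                (0, '1'): {0}, (1, '1'): {0, 1}}
```

For $x_1=x_2=0$ on this toy NFA, the last layer has row $i$ equal to the indicator vector of $q_i.00$, in line with Lemmas 1 and 2.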

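For each $t=1,\dots,\ell$, we define the set $C_t$ as the set of all clauses of a suitable CNF (conjunctive normal form) equivalent to $\bigwedge_{i,j=1}^n\Psi^t_{ij}$. In our basic encoding, the set $C_t$ consists of the following clauses, for all $i,j=1,\dots,n$:

$$
\begin{aligned}
&(1)\quad \neg y^t_{ij}\vee x_t\vee\bigvee_{q_h\in P_0(q_j)}y^{t-1}_{ih},\qquad \neg y^t_{ij}\vee\neg x_t\vee\bigvee_{q_k\in P_1(q_j)}y^{t-1}_{ik},\\
&(2)\quad y^t_{ij}\vee\neg x_t\vee\neg y^{t-1}_{ik}\quad\text{for each } q_k\in P_1(q_j),\\
&(3)\quad y^t_{ij}\vee x_t\vee\neg y^{t-1}_{ih}\quad\text{for each } q_h\in P_0(q_j).
\end{aligned}
$$

The verification of the equivalence between $\bigwedge_{i,j=1}^n\Psi^t_{ij}$ and the conjunction of the clauses in (1)–(3) is routine, and we omit it.

It may be worth explaining how the clauses of the form (1)–(3) are understood when one of the sets $P_0(q_j)$ and $P_1(q_j)$, or both of them, happen to be empty. In (1), disjunctions over empty sets are omitted, so that if, say, $P_0(q_j)=\varnothing$, then the first clause in (1) reduces to $\neg y^t_{ij}\vee x_t$. As for (2) and (3), these clauses disappear whenever $P_1(q_j)$ or, respectively, $P_0(q_j)$ is empty.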
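Although the equivalence is routine, it can also be checked mechanically on a small case. The Python sketch below generates the clauses (1)–(3) for a single step $t$, using a local numbering that is hypothetical: $x_t$ is 1, $y^{t-1}_{ij}$ are 2–5 and $y^t_{ij}$ are 6–9. It then compares the clauses with $\bigwedge_{i,j}\Psi^t_{ij}$ on all $2^9$ assignments for a 2-state toy NFA of our own. That NFA has $m=4$ transitions, so the clause count is $nm+2n^2=16$.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined


def preimages(delta: NFA, n: int, s: str, j: int) -> List[int]:
    return [k for k in range(n) if j in delta.get((k, s), set())]


def transition_clauses(delta: NFA, n: int, x: int,
                       y_prev: Callable[[int, int], int],
                       y_cur: Callable[[int, int], int]) -> List[List[int]]:
    """Clauses (1)-(3) of C_t as DIMACS-style integer lists."""
    clauses = []
    for i in range(n):
        for j in range(n):
            p0, p1 = preimages(delta, n, '0', j), preimages(delta, n, '1', j)
            clauses.append([-y_cur(i, j), x] + [y_prev(i, h) for h in p0])   # (1)
            clauses.append([-y_cur(i, j), -x] + [y_prev(i, k) for k in p1])  # (1)
            clauses += [[y_cur(i, j), -x, -y_prev(i, k)] for k in p1]       # (2)
            clauses += [[y_cur(i, j), x, -y_prev(i, h)] for h in p0]        # (3)
    return clauses


TINY: NFA = {(0, '0'): {1}, (1, '0'): {0, 1}, (0, '1'): {0}}
N, X = 2, 1


def y_prev(i: int, j: int) -> int:
    return 2 + i * N + j              # indices 2..5


def y_cur(i: int, j: int) -> int:
    return 2 + N * N + i * N + j      # indices 6..9


CLAUSES = transition_clauses(TINY, N, X, y_prev, y_cur)


def psi_all(v: Dict[int, bool]) -> bool:
    # conjunction of the equivalences Psi^t_ij
    for i in range(N):
        for j in range(N):
            via1 = v[X] and any(v[y_prev(i, k)] for k in preimages(TINY, N, '1', j))
            via0 = not v[X] and any(v[y_prev(i, h)] for h in preimages(TINY, N, '0', j))
            if v[y_cur(i, j)] != (via1 or via0):
                return False
    return True


def clauses_hold(v: Dict[int, bool]) -> bool:
    return all(any(v[abs(lit)] == (lit > 0) for lit in c) for c in CLAUSES)


def assignments():
    for bits in product((False, True), repeat=1 + 2 * N * N):
        yield dict(enumerate(bits, start=1))    # variable index -> value


EQUIVALENT = all(psi_all(v) == clauses_hold(v) for v in assignments())
```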

In order to calculate the number of clauses in $C_t$, denote by $m$ the number of all transitions in $\mathcal{A}$, that is, triples $(q,s,q')$ with $q'\in\delta(q,s)$. Clearly, for each fixed $i$, the number of clauses of the forms (2) and (3) is equal to $m$, whence the total number of such "short" clauses is $nm$. As for the "long" clauses in (1), there are at most two such clauses for each fixed pair $(i,j)$, whence their total number does not exceed $2n^2$. Altogether, $|C_t|\le nm+2n^2$ for each $t=1,\dots,\ell$.

Lemma 1 readily implies that a word $w\in\Sigma^\ell$ is $D_3$-synchronizing for $\mathcal{A}$ if and only if, after the $\ell$ moves in the game $\mathcal{G}$ on $\mathcal{A}$, some state holds all tokens $T_1,\dots,T_n$. This is equivalent to saying that the formula

$$\bigvee_{j=1}^{n}\bigwedge_{i=1}^{n}y^\ell_{ij}\qquad(4)$$

holds true under the extension, specified in Lemma 2, of the truth assignment on $\{x_1,\dots,x_\ell\}$ defined by $w$. A slight difficulty is that a direct conversion of the formula (4) into a CNF produces $n^n$ clauses. To overcome this difficulty, we use a standard trick for which we need $n$ new variables (this is why we introduce the synchronization variables). Let $C_{\mathrm{syn}}$ consist of the following clauses:

$$\bigvee_{j=1}^{n}z_j\quad\text{and}\quad\neg z_j\vee y^\ell_{ij}\ \text{ for all } i,j=1,\dots,n.$$

It is easy to see that the set $C_{\mathrm{syn}}$ and the formula (4) are equisatisfiable. Moreover, let $V'=\{y^\ell_{ij}\mid i,j=1,\dots,n\}$ and $V''=V'\cup\{z_1,\dots,z_n\}$. Then every truth assignment on $V'$ that satisfies (4) can be extended to a truth assignment on $V''$ that satisfies $C_{\mathrm{syn}}$, and, conversely, for every truth assignment on $V''$ that satisfies $C_{\mathrm{syn}}$, its restriction to $V'$ satisfies (4).
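The synchronization clauses and their relation to (4) can be illustrated as follows. The Python sketch builds $C_{\mathrm{syn}}$ with a local numbering that is our assumption. For a fixed matrix of values of $y^\ell_{ij}$, it tries all $2^n$ values of $z_1,\dots,z_n$, so it is satisfiable exactly when some column of the matrix, that is, some state $q_j$, holds all tokens. $C_{\mathrm{syn}}$ has $1+n^2$ clauses, instead of the $n^n$ clauses of a direct CNF of (4).

```python
from itertools import product
from typing import Callable, List


def sync_clauses(n: int, y_last: Callable[[int, int], int],
                 z: Callable[[int], int]) -> List[List[int]]:
    # z_1 v ... v z_n   and   (not z_j) v y^ell_ij  for all i, j
    clauses = [[z(j) for j in range(n)]]
    clauses += [[-z(j), y_last(i, j)] for i in range(n) for j in range(n)]
    return clauses


def c_syn_satisfiable(final: List[List[bool]]) -> bool:
    """final[i][j] is the fixed value of y^ell_ij (0-based i, j)."""
    n = len(final)

    def y_last(i: int, j: int) -> int:
        return 1 + i * n + j             # local index of y^ell_ij

    def z(j: int) -> int:
        return 1 + n * n + j             # local index of z_j

    clauses = sync_clauses(n, y_last, z)
    fixed = {y_last(i, j): final[i][j] for i in range(n) for j in range(n)}
    for bits in product((False, True), repeat=n):
        v = {**fixed, **{z(j): bits[j] for j in range(n)}}
        if all(any(v[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False
```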

The whole set $C$ consists of at most $2n^2+1+\ell(nm+2n^2)$ clauses. The number $m$ of transitions in an NFA with $n$ states and two input symbols is upper-bounded by $2n^2$, whence $|C|\le 2\ell(n^3+n^2)+2n^2+1$. Thus, constructing $(V,C)$ from $(\mathcal{A},\ell)$ takes time polynomial in $n$ and $\ell$. Summarizing the above discussion, we arrive at the main result of the section.

###### Theorem 3.

An NFA $\mathcal{A}$ has a $D_3$-synchronizing word of length $\ell$ if and only if the instance $(V,C)$ of SAT constructed above is satisfiable, and the construction takes time polynomial in the size of $\mathcal{A}$ and the value of $\ell$. Moreover, by the construction, there is a 1-1 correspondence between the $D_3$-synchronizing words of length $\ell$ for $\mathcal{A}$ and the restrictions of satisfying assignments of $C$ to the letter variables.
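To see the whole construction end to end, here is a self-contained Python sketch. It builds $C_{\mathrm{ini}}$, the sets $C_t$ and $C_{\mathrm{syn}}$ with a consecutive numbering of $x_t$, $y^t_{ij}$, $z_j$; the numbering and the 0-based $i,j$ are our assumptions. It then decodes a word from the letter variables. The experiments below used MiniSat. Here a tiny DPLL procedure with unit propagation stands in for a real solver and is only adequate for toy instances.

```python
from typing import Dict, List, Optional, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined
Clause = List[int]


def encode(delta: NFA, n: int, ell: int) -> Tuple[int, List[Clause]]:
    """Basic encoding of (A, ell) as DIMACS-style clauses."""
    def x(t): return t                                      # t = 1..ell
    def y(t, i, j): return ell + t * n * n + i * n + j + 1  # t = 0..ell
    def z(j): return ell + (ell + 1) * n * n + j + 1
    pre = {(s, j): [p for p in range(n) if j in delta.get((p, s), set())]
           for s in '01' for j in range(n)}
    clauses: List[Clause] = []
    for i in range(n):                                      # C_ini
        for j in range(n):
            clauses.append([y(0, i, j)] if i == j else [-y(0, i, j)])
    for t in range(1, ell + 1):                             # C_t: (1)-(3)
        for i in range(n):
            for j in range(n):
                clauses.append([-y(t, i, j), x(t)] + [y(t - 1, i, h) for h in pre['0', j]])
                clauses.append([-y(t, i, j), -x(t)] + [y(t - 1, i, k) for k in pre['1', j]])
                clauses += [[y(t, i, j), -x(t), -y(t - 1, i, k)] for k in pre['1', j]]
                clauses += [[y(t, i, j), x(t), -y(t - 1, i, h)] for h in pre['0', j]]
    clauses.append([z(j) for j in range(n)])                # C_syn
    clauses += [[-z(j), y(ell, i, j)] for i in range(n) for j in range(n)]
    return ell + (ell + 1) * n * n + n, clauses


def dpll(clauses: List[Clause], assignment: Dict[int, bool]) -> Optional[Dict[int, bool]]:
    """Tiny DPLL (unit propagation + branching); a stand-in for MiniSat."""
    assignment = dict(assignment)
    while True:
        unit_found, reduced = False, []
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue                                    # already satisfied
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return None                                 # conflict
            if len(free) == 1:
                assignment[abs(free[0])] = free[0] > 0      # unit clause
                unit_found = True
            else:
                reduced.append(free)
        clauses = reduced
        if not unit_found:
            break
    if not clauses:
        return assignment
    var = abs(clauses[0][0])
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None


def d3_word_via_sat(delta: NFA, n: int, ell: int) -> Optional[str]:
    _, clauses = encode(delta, n, ell)
    model = dpll(clauses, {})
    if model is None:
        return None
    # x_t true means the t-th letter is 1; unassigned letters may be set freely
    return ''.join('1' if model.get(t, False) else '0' for t in range(1, ell + 1))


EXAMPLE: NFA = {(0, '0'): {1}, (1, '0'): {2}, (2, '0'): {0, 2},
                (0, '1'): {0}, (1, '1'): {0, 1}}
```

On this 3-state toy NFA (ours), the instance for $\ell=1$ is unsatisfiable. For $\ell=2$ the decoded word is 00, which one can check by hand is the only $D_3$-synchronizing word of length 2. The instance for $\ell=2$ has $2+3\cdot9+3=32$ variables and $9+2\cdot39+10=97$ clauses.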

###### Remark 4.

We do not claim that the above reduction of D3W to SAT is optimal. For instance, it is possible to reduce the number of variables by getting rid of the letter variables. Namely, for each pair $i,j=1,\dots,n$ and each $t=1,\dots,\ell$, one could take the clause

$$\neg y^t_{ij}\vee\bigvee_{q_h\in P_0(q_j)}y^{t-1}_{ih}\vee\bigvee_{q_k\in P_1(q_j)}y^{t-1}_{ik}\qquad(5)$$

instead of the clauses in (1) and the set of clauses of the form

$$y^t_{ij}\vee\neg y^{t-1}_{ih}\vee\neg y^{t-1}_{ik}\quad\text{for } h \text{ and } k \text{ such that } q_h\in P_0(q_j) \text{ and } q_k\in P_1(q_j)\qquad(6)$$

instead of the ones in (2) and (3). It is easy to see that (1) and (5) are equisatisfiable, and so are the sets of clauses in (2), (3) on the one hand and in (6) on the other.

We have preferred to keep the letter variables because of the fact mentioned in Theorem 3: if a $D_3$-synchronizing word of length $\ell$ exists, we can immediately recover it from the restriction of a satisfying assignment to the letter variables.

## 3. Experimental results

Here we overview our experiments and present some of their results. Our basic procedure has been organized as follows.

1. A positive integer $n$ (the number of states) is fixed. In the experiments whose results we report here, we have considered .

2. A random NFA $\mathcal{A}$ with $n$ states and 2 input symbols is generated. We have used two models of random generation that are specified below.

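3. We check whether $\mathcal{A}$ has an input symbol whose action is defined at each state. If this is not the case, the NFA cannot be $D_3$-synchronizing (the first letter of any $D_3$-synchronizing word must be defined at every state), and we return to Step 2 to generate another random NFA.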

4. A positive integer $\ell$ (the hypothetical length of the shortest $D_3$-synchronizing word for $\mathcal{A}$) is chosen. Initially, we chose $\ell$ to be close to  but, as our early experiments revealed, it is much more practical to start with smaller values of $\ell$. We introduce three integer variables , , and and initialize them as follows: , , .

5. The pair $(\mathcal{A},\ell)$ is encoded into a SAT instance as described in Section 2.

6. A SAT solver is invoked to solve the SAT instance obtained in Step 5. We have used MiniSat 2.2.0; see  for a description of the underlying ideas of MiniSat and  for a discussion and the source code of the solver.

7. The binary search on $\ell$ is performed. In more detail, if the SAT solver returns YES on the encoding of the pair $(\mathcal{A},\ell)$, we first check whether or not $\ell=\ell_{\min}$. If $\ell=\ell_{\min}$, then $\ell$ is the length of the shortest $D_3$-synchronizing word for $\mathcal{A}$, and we go to Step 2 to generate another random NFA. If $\ell>\ell_{\min}$, we update the variables $\ell_{\max}$ and $\ell$ by letting

$$\ell_{\max}:=\ell,\qquad \ell:=\Bigl\lfloor\frac{\ell_{\min}+\ell_{\max}}{2}\Bigr\rfloor,$$

keep the value of $\ell_{\min}$ and go to Step 5. If the SAT solver returns NO on the encoding of the pair $(\mathcal{A},\ell)$, we check whether or not $\ell=\ell_{\max}$. If $\ell=\ell_{\max}$, we interpret this as evidence that the NFA $\mathcal{A}$ fails to be $D_3$-synchronizing and go to Step 2 to generate another random NFA. (Of course, the equality $\ell=\ell_{\max}$ only means that $\mathcal{A}$ has no $D_3$-synchronizing word of length $\ell_{\max}$. It is not excluded, in principle, that $\mathcal{A}$ is $D_3$-synchronizing but its shortest $D_3$-synchronizing word is very long. However, by suitable preprocessing and an appropriate choice of parameters, we eliminated the "bad" cases in which the SAT solver returns NO and $\ell=\ell_{\max}$ from our experiments.) If $\ell<\ell_{\max}$, we update the variables $\ell_{\min}$ and $\ell$ by letting

$$\ell_{\min}:=\ell+1,\qquad \ell:=\Bigl\lceil\frac{\ell_{\min}+\ell_{\max}}{2}\Bigr\rceil,$$

keep the value of $\ell_{\max}$ and go to Step 5.
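The binary search of Step 7 can be sketched in Python as follows. The oracle `has_word(ell)` is a hypothetical stand-in for "encode $(\mathcal{A},\ell)$ and run the SAT solver". The initialization $\ell_{\min}=1$, with a caller-supplied starting value and upper bound, is our assumption. The search relies on monotonicity, which holds here: if $w$ is $D_3$-synchronizing, its first letter $a$ is defined at every state, and then $aw$ is $D_3$-synchronizing too, because each $q.a$ contains some state $p$ and $q.aw\supseteq p.w$. Hence a YES answer for $\ell$ implies a YES answer for $\ell+1$.

```python
from typing import Callable, Optional


def shortest_length(has_word: Callable[[int], bool], ell_start: int,
                    ell_max: int) -> Optional[int]:
    """Step 7 binary search; assumes has_word is monotone in ell and
    1 <= ell_start <= ell_max."""
    ell_min, ell = 1, ell_start
    while True:
        if has_word(ell):                        # solver says YES
            if ell == ell_min:
                return ell                       # shortest length found
            ell_max = ell
            ell = (ell_min + ell_max) // 2       # floor
        else:                                    # solver says NO
            if ell == ell_max:
                return None                      # treated as "not D3-synchronizing"
            ell_min = ell + 1
            ell = (ell_min + ell_max + 1) // 2   # ceiling
```

With a toy oracle that answers YES exactly for $\ell\ge5$, starting from $\ell=3$ with upper bound 20, the search returns 5.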

We implemented the algorithm outlined above in C++ and compiled it with GCC 4.9.2. In our experiments we used a personal computer with an Intel(R) Core(TM) i5-2520M processor with 2.5 GHz CPU and 4GB of RAM. For each fixed $n$, up to  NFAs that passed Step 3 were analyzed. The average calculation time (for one NFA) was 400 seconds for  and 4350 seconds for .

The two models we used for random generation of an NFA with $n$ states and 2 input symbols are the uniform model, based on the uniform distribution, and the Poisson model, based on the Poisson distribution with some parameter $\lambda$. For each state $q\in Q$ and each symbol $s\in\Sigma$, we first choose a number $k\in\{0,1,\dots,n\}$ that serves as the cardinality of the set $\delta(q,s)$. In the uniform model, each $k\in\{0,1,\dots,n\}$ is chosen with probability $\frac{1}{n+1}$. In the Poisson model with parameter $\lambda$, each $k\in\{0,1,\dots,n-1\}$ is chosen with probability $\frac{\lambda^k}{k!}e^{-\lambda}$, and $k=n$ is chosen with probability $1-\sum_{k=0}^{n-1}\frac{\lambda^k}{k!}e^{-\lambda}$. Once $k$ has been chosen, we proceed in the same way in both models: we choose a $k$-element subset uniformly at random among all $k$-element subsets of $Q$ and let $\delta(q,s)$ be the chosen subset.
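Both generation models can be sketched with Python's standard `random` and `math` modules. The function names, the seeding and the representation are our assumptions; in particular, a missing dictionary key stands for $k=0$, that is, an undefined action. `Random.sample` picks a $k$-element subset uniformly among all $k$-element subsets.

```python
import math
import random
from typing import Dict, Optional, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # delta[(q, s)]; missing key = undefined


def uniform_size(n: int, rng: random.Random) -> int:
    # each k in {0, ..., n} with probability 1/(n+1)
    return rng.randint(0, n)


def poisson_size(n: int, lam: float, rng: random.Random) -> int:
    # k < n with probability lam**k e**(-lam) / k!; k = n gets the remaining mass
    u = rng.random()
    cumulative = 0.0
    for k in range(n):
        cumulative += math.exp(-lam) * lam ** k / math.factorial(k)
        if u < cumulative:
            return k
    return n


def random_nfa(n: int, rng: random.Random, lam: Optional[float] = None) -> NFA:
    """Uniform model if lam is None, otherwise Poisson model with parameter lam."""
    delta: NFA = {}
    for q in range(n):
        for s in '01':
            k = uniform_size(n, rng) if lam is None else poisson_size(n, lam, rng)
            if k > 0:                    # k = 0: the action of s is undefined at q
                delta[(q, s)] = set(rng.sample(range(n), k))
    return delta
```

For example, `random_nfa(20, random.Random(1))` draws a 20-state NFA under the uniform model, and passing `lam=2.0` switches to the Poisson model.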

In each of the two models, it is easy to estimate the fraction of automata that survive Step 3. The corresponding results are stated in the following proposition, whose proof amounts to straightforward calculations and is therefore omitted.

###### Proposition 5.

The probability that a random NFA with $n$ states and 2 input symbols has an input symbol whose action is defined at each state is

$$2\Bigl(1-\frac{1}{n+1}\Bigr)^{n}-\Bigl(1-\frac{1}{n+1}\Bigr)^{2n}\qquad(7)$$

if the NFA is generated under the uniform model and

$$2\bigl(1-e^{-\lambda}\bigr)^{n}-\bigl(1-e^{-\lambda}\bigr)^{2n}\qquad(8)$$

if the NFA is generated under the Poisson model with parameter $\lambda$.

Observe that as $n$ grows, the expression in (7) tends to $2e^{-1}-e^{-2}\approx0.60$, while the expression in (8) tends to 0. In the further discussion, we always assume that the NFAs considered have passed Step 3.
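The calculation behind Proposition 5 is short. In both models, a fixed symbol is defined at a fixed state with some probability $p$, independently for all states and symbols. In the uniform model $p=1-\frac{1}{n+1}$, and in the Poisson model $p=1-e^{-\lambda}$. So each symbol is everywhere defined with probability $p^n$, and by inclusion–exclusion at least one of the two symbols is everywhere defined with probability $2p^n-p^{2n}$. The Python sketch below evaluates (7) and (8) and compares the general formula with a Monte Carlo estimate; the names and the seed are illustrative.

```python
import math
import random


def survival_probability(p: float, n: int) -> float:
    # P(symbol 0 or symbol 1 is defined at all n states), by inclusion-exclusion
    return 2 * p ** n - p ** (2 * n)


def uniform_survival(n: int) -> float:            # formula (7)
    return survival_probability(1 - 1 / (n + 1), n)


def poisson_survival(n: int, lam: float) -> float:  # formula (8)
    return survival_probability(1 - math.exp(-lam), n)


def monte_carlo(p: float, n: int, trials: int, seed: int = 0) -> float:
    """Estimate the same probability by sampling 'defined' flags directly."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        everywhere = [all(rng.random() < p for _ in range(n)) for _ in range(2)]
        hits += any(everywhere)
    return hits / trials
```

For large $n$, `uniform_survival(n)` approaches $2e^{-1}-e^{-2}\approx0.6004$, while `poisson_survival(n, lam)` decays to 0 for every fixed $\lambda$.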

For the uniform model, our experiments produced results that may seem surprising at first glance. Namely, it turns out that for an overwhelming majority of NFAs, the length of the shortest $D_3$-synchronizing word is equal to 2, and this conclusion does not depend on the state number $n$, at least within the range of our experiments (recall that we have considered ). For an illustration, see Figure 3, in which the horizontal axis is the length of the shortest $D_3$-synchronizing word and the vertical axis is the number of NFAs. The blue and the yellow circles represent NFAs with 20 and 30 states, respectively.

Figure 3. Distributions of 20- and 30-state NFAs generated under the uniform model according to the length of their shortest $D_3$-synchronizing words.

So far, we have no rigorous theoretical explanation of the observed phenomenon. However, even a quick analysis of the uniform model reveals that the NFAs it produces should tend to have rather short $D_3$-synchronizing words. Indeed, if an NFA $\mathcal{A}$ with $n$ states and 2 input symbols is generated under the uniform model, then the expected cardinality of the set $\delta(q,s)$ is $n/2$ for every $q\in Q$ and $s\in\Sigma$. Therefore the expected size of every set of the form $q.w$ with $|w|=2$ is close to $n$. Hence it is quite likely that $\bigcap_{q\in Q}q.w\ne\varnothing$ for some word $w$ of length 2, which is then a $D_3$-synchronizing word for $\mathcal{A}$.

Some sample experimental results for the Poisson model are presented in Figure 4. The three histograms in Figure 4 correspond to 60-state NFAs generated under the Poisson model with three different values of the parameter $\lambda$ and demonstrate how these NFAs are distributed according to the length of their shortest $D_3$-synchronizing words. As in Figure 3, the horizontal axis is the length of the shortest $D_3$-synchronizing word and the vertical axis is the number of NFAs.

Figure 4. Distributions of 60-state NFAs generated under the Poisson models with $\lambda=1$ (top), $\lambda=2$ (middle), $\lambda=5$ (bottom) according to the length of their shortest $D_3$-synchronizing words.

We see that if the number of states is fixed, the expected length of the shortest $D_3$-synchronizing word decreases as the parameter $\lambda$ grows. This can be explained by an informal argument of the same flavour as the reasoning used above for NFAs generated under the uniform model. Indeed, if an NFA $\mathcal{A}$ with $n$ states and 2 input symbols is generated under the Poisson model with parameter $\lambda$, it follows from a basic property of the Poisson distribution that $\lambda$ is close to the expected cardinality of the sets $\delta(q,s)$ for every $q\in Q$ and $s\in\Sigma$. The larger these sets are, the smaller the value of $\ell$ for which the expected size of sets of the form $q.w$ with $|w|=\ell$ becomes close to $n$.

Our experiments also show that if the parameter $\lambda$ is fixed, the expected length of the shortest D3-synchronizing word grows with the number of states, but the growth rate is rather small. For each $n$, we have calculated the average length $E_1(n)$ of the shortest D3-synchronizing words for $n$-state NFAs generated under the Poisson model with $\lambda=1$. Then, using the method of least squares, we searched for an explicit function of $n$ that approximates $E_1(n)$ and found the following solution:

$$E_1(n)\approx(0.57+0.66\ln n)^2.$$

For $\lambda=2$, the same procedure has led to the following approximation of the similarly defined quantity $E_2(n)$ calculated from our experimental data:

$$E_2(n)\approx(0.77+0.43\ln n)^2.$$

Similar approximations have been obtained for other values of the parameter $\lambda$.
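The fitting step can be reproduced along the following lines. This sketch assumes that the least-squares fit is performed on the square-root scale, where the model $(a+b\ln n)^2$ becomes linear in $a$ and $b$; the text does not say which variant of least squares was used, and `fit_sqrt_log`, `e1` and `e2` are hypothetical names.

```python
import math

def fit_sqrt_log(ns, averages):
    """Least-squares fit of sqrt(E(n)) = a + b*ln(n); returns (a, b).

    Assumption: the regression is done on the square-root scale, which makes
    the model E(n) ~ (a + b ln n)^2 an ordinary linear regression in a and b.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.sqrt(e) for e in averages]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def e1(n):
    """The approximation E_1(n) reported above."""
    return (0.57 + 0.66 * math.log(n)) ** 2

def e2(n):
    """The approximation E_2(n) reported above."""
    return (0.77 + 0.43 * math.log(n)) ** 2
```

Evaluating the two formulas at $n=60$ gives roughly 10.7 for $E_1$ and 6.4 for $E_2$, in line with the observation that shortest D3-synchronizing words get shorter as $\lambda$ grows.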

## 4. Conclusion and future work

We have presented an attempt to approach the problem of computing a D3-synchronizing word of minimum length for a given NFA via the SAT-solver method. We think that our results provide some evidence that this approach is feasible in principle. Of course, they constitute only the very first steps, and more work is needed to improve the performance of our implementation and to extend its range.
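For concreteness, here is one possible way to state "the NFA has a D3-synchronizing word of length $\ell$" as a CNF formula; it is a sketch and not necessarily the basic encoding of Section 2. It assumes the two-letter alphabet $\{0,1\}$ and uses hypothetical variables $L_t$ (the $t$-th letter is 1), $P_{q,t,r}$ (the chosen run from $q$ is in state $r$ after $t$ letters) and $F_s$ (all runs end in $s$). A shortest word could then be found by passing the output to a solver such as MiniSat for $\ell=0,1,2,\dots$ until the formula becomes satisfiable.

```python
from itertools import combinations

def d3w_cnf(delta, n, length):
    """Hypothetical CNF encoding of D3W for a fixed word length (letters 0, 1).

    delta[a][q] is the set of successors of q under letter a.
    Returns (num_vars, clauses), each clause a list of nonzero DIMACS literals.
    """
    num_vars = 0
    def new_var():
        nonlocal num_vars
        num_vars += 1
        return num_vars
    letter = {t: new_var() for t in range(1, length + 1)}  # true = letter 1 at t
    at = {(q, t, r): new_var()
          for q in range(n) for t in range(length + 1) for r in range(n)}
    final = {s: new_var() for s in range(n)}
    clauses = []
    for q in range(n):
        clauses.append([at[q, 0, q]])  # the run from q starts in q
        for t in range(length + 1):
            # exactly one current state per run and time step
            clauses.append([at[q, t, r] for r in range(n)])
            for r1, r2 in combinations(range(n), 2):
                clauses.append([-at[q, t, r1], -at[q, t, r2]])
        for t in range(1, length + 1):
            # forbid steps r -> r2 that are not transitions under the letter at t
            for r in range(n):
                for r2 in range(n):
                    for a in (0, 1):
                        if r2 not in delta[a][r]:
                            not_a = -letter[t] if a == 1 else letter[t]
                            clauses.append([-at[q, t - 1, r], -at[q, t, r2], not_a])
    clauses.append([final[s] for s in range(n)])  # some common end state
    for s in range(n):
        for q in range(n):
            clauses.append([-final[s], at[q, length, s]])  # every run ends in s
    return num_vars, clauses

def to_dimacs(num_vars, clauses):
    """Serialize clauses in the DIMACS CNF format read by MiniSat and similar solvers."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines) + "\n"
```

A satisfying assignment yields the word from the values of $L_t$, and for each state $q$ a run labelled by that word from $q$ into the common state $s$, which is exactly the D3 condition.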

We see several avenues for improvement. First of all, we may try to modify the basic encoding described in Section 2. There are several options for such modifications that all look promising, but it is hard to predict a priori which one will prove the most efficient, so we have to go through several rounds of trial and error. As an example of a relatively successful trial, we briefly report one of the modifications that have already been implemented by the first author.

As mentioned in the description of our basic algorithm in Section 3, every D3-synchronizing NFA must have an everywhere defined input symbol. If all input symbols of the given NFA are everywhere defined, one can use the transformations described in [5, Lemma 8.3.8] or [8, Section 2] to convert it into a DFA that is synchronizing if and only if the NFA is D3-synchronizing, and such that the minimum length of D3-synchronizing words for the NFA equals the minimum length of synchronizing words for the DFA. Since there are powerful methods to compute shortest synchronizing words for DFAs with up to 350 states (see, e.g., ), we can apply one of these methods to this DFA. Hence, we can restrict ourselves to the case when at least one of the input symbols of the NFA is not everywhere defined.

If we consider only NFAs with 2 input symbols, say 0 and 1, we may assume that 0 is everywhere defined while 1 is not. Every D3-synchronizing word for such an NFA must start with the symbol 0: if $\delta(q,1)=\varnothing$ for some state $q$, then $\delta(q,1w)=\varnothing$ for every word $w$. Therefore one can start our solitaire-like game described in Section 2 from the position that arises after the first application of 0, and the basic encoding can be modified accordingly. (If we re-use the illustrative example in Figure 2, the new initial position for this example is the one shown in the bottom left.) For an NFA with states and transitions, this preprocessing allows one to save variables and around clauses in the resulting instance of SAT. Our experiments show that this modification indeed reduces the execution time of solving D3W-instances for NFAs with states, and the average time decrease reaches 50% for NFAs with states. Also, the modification has allowed us to solve D3W for NFAs with more than 100 states, a size that was out of reach with the basic encoding.
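The forced first letter can be expressed in a few lines. The sketch assumes, hypothetically, that a position is represented as the tuple of sets reached from the individual states (this need not match the game of Section 2), and that the alphabet has been normalized as above so that letter 0 is everywhere defined while letter 1 is not. A word $0u$ is then D3-synchronizing exactly when `is_d3_from(initial_position(delta, n), delta, u)` holds.

```python
def undefined_somewhere(delta_a, n):
    """True if the letter with transition table delta_a is undefined at some state."""
    return any(not delta_a[q] for q in range(n))

def initial_position(delta, n):
    """Position after the forced first letter 0: the tuple (delta(q, 0)) over all q."""
    if undefined_somewhere(delta[0], n) or not undefined_somewhere(delta[1], n):
        raise ValueError("expected 0 everywhere defined and 1 undefined somewhere")
    return tuple(delta[0][q] for q in range(n))

def step(position, delta_a):
    """Apply one letter to every set in the position."""
    return tuple(frozenset().union(*(delta_a[r] for r in S)) for S in position)

def is_d3_from(position, delta, word):
    """Check whether the sets reached after reading `word` still share a state."""
    for a in word:
        position = step(position, delta[a])
    return bool(frozenset.intersection(*position))
```

Starting from the singletons instead and reading 1 first immediately empties the set of any state where 1 is undefined, which is why that branch can be dropped from the search.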

Of course, the efficiency of our approach depends not only on the way we encode the problem but also on the software and hardware used in the implementation. Besides optimizing our own code, we plan to experiment with more advanced SAT solvers, namely, with CryptoMiniSat and Lingeling. Using more powerful computers constitutes yet another obvious direction for improvement. In particular, our approach is clearly amenable to parallelization: the calculations needed for different automata are completely independent, so in principle we can process as many automata in parallel as there are processors available.
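Because the automata are processed independently, a plain process pool is enough to exploit several processors. In the sketch below `solver` is a placeholder for whatever per-automaton routine is used (for instance, a loop that calls the SAT solver on increasing word lengths); `solve_all` and `count_states` are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor

def solve_all(automata, solver, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Apply `solver` to each automaton in parallel; results keep the input order.

    With a process pool, `solver` must be a picklable top-level function.
    No communication between workers is needed since the tasks are independent.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(solver, automata))

def count_states(delta):
    """Stand-in solver for illustration: the number of states of the automaton."""
    return len(delta[0])
```

For example, `solve_all(automata, my_solver)` returns one result per automaton in the order of `automata`; on platforms that start worker processes by spawning, the call should sit under an `if __name__ == "__main__":` guard.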

Our future work should include theoretical explanations for the phenomena observed in our experiments, as well as extending our study to automata with arbitrarily many input symbols and to other versions of NFA synchronization such as D1- and D2-synchronization mentioned in Section 1.

## References

•  M. V. Volkov, Synchronizing automata and the Černý conjecture, in: Language and Automata Theory and Applications, volume 5196 of Lect. Notes Comput. Sci. (2008), 11–27.
•  B. Imreh, M. Steinby, Directable nondeterministic automata, Acta Cybernetica 14 (1999), 105–115.
•  P. Martyugin, A lower bound for the length of the shortest carefully synchronizing words, Russ. Math. 54(1) (2010), 46–54.
•  M. Ito, Algebraic Theory of Automata and Languages, World Scientific, 2004.
•  V. D. Blondel, R. M. Jungers, A. Olshevsky, On primitivity of sets of matrices, Automatica 61(C) (2015), 80–88.
•  B. Gerencsér, V. V. Gusev, R. M. Jungers, Primitive sets of nonnegative matrices and synchronizing automata, SIAM J. Matrix Analysis and Applications, accepted; preprint available at https://arxiv.org/abs/1602.07556.
•  H. Don, H. Zantema, Synchronizing non-deterministic finite automata, Preprint, 2017. Available at https://arxiv.org/abs/1703.07995.
•  M. Steinby, Directable fuzzy and nondeterministic automata, Preprint, 2017. Available at https://arxiv.org/abs/1709.07719.
•  S. Sandberg, Homing and synchronizing sequences, in: Model-Based Testing of Reactive Systems, volume 3472 of Lect. Notes Comput. Sci. (2005), 5–33.
•  J.-E. Pin, On two combinatorial problems arising from automata theory, Ann. Discrete Math. 17 (1983), 535–548.
•  I. K. Rystsov, Asymptotic estimate of the length of a diagnostic word for a finite automaton, Cybernetics 16(1) (1980), 194–198.
•  I. K. Rystsov, Polynomial complete problems in automata theory, Inf. Process. Lett. 16(3) (1983), 147–151.
•  P. Martyugin, Synchronization of automata with one undefined or ambiguous transition, in: Implementation and Application of Automata, volume 7381 of Lect. Notes Comput. Sci. (2012), 278–288.
•  C. P. Gomes, H. Kautz, A. Sabharwal, B. Selman, Satisfiability Solvers, Chapter 2 in: Handbook of Knowledge Representation, Elsevier, 2008, 89–134.
•  J. Geldenhuys, B. van der Merwe, L. van Zijl, Reducing nondeterministic finite automata with SAT solvers, in: Finite-State Methods and Natural Language Processing, volume 6062 of Lect. Notes Comput. Sci. (2009), 81–92.
•  T. Jiang, B. Ravikumar, Minimal NFA problems are hard, in: Automata, Languages and Programming, volume 510 of Lect. Notes Comput. Sci. (1991), 629–640.
•  E. Skvortsov, E. Tipikin, Experimental study of the shortest reset word of random automata, in: Implementation and Application of Automata, volume 6807 of Lect. Notes Comput. Sci. (2011), 290–298.
•  C. Güniçen, E. Erdem, H. Yenigün, Generating shortest synchronizing sequences using Answer Set Programming, in: Proceedings of Answer Set Programming and Other Computing Paradigms (ASPOCP 2013), 6th International Workshop (2013), 117–127. Available at https://arxiv.org/abs/1312.6146.
•  J. Olschewski, M. Ummels, The complexity of finding reset words in finite automata, in: Mathematical Foundations of Computer Science, volume 6281 of Lect. Notes Comput. Sci. (2010), 568–579.
•  N. Eén, N. Sörensson, An extensible SAT-solver, in: Theory and Applications of Satisfiability Testing (SAT 2003), volume 2919 of Lect. Notes Comput. Sci. (2004), 502–518.
•  N. Eén, N. Sörensson, The MiniSat Page. Available at http://minisat.se.
•  A. Kisielewicz, J. Kowalski, M. Szykuła, Computing the shortest reset words of synchronizing automata, J. Comb. Optim. 29(1) (2015), 88–124.
•  M. Soos, CryptoMiniSat 2. Available at http://www.msoos.org/cryptominisat2/.
•  A. Biere, Yet another Local Search Solver and Lingeling and Friends entering the SAT Competition 2014, in: Proceedings of SAT Competition 2014: Solver and Benchmark Descriptions, University of Helsinki, 2014, 39–40.