Synthesizing Finite-state Protocols from Scenarios and Requirements

This is the working draft of a paper currently in submission. (February 10, 2014)

Rajeev Alur University of Pennsylvania    Milo Martin University of Pennsylvania    Mukund Raghothaman University of Pennsylvania    Christos Stergiou University of Pennsylvania University of California, Berkeley    Stavros Tripakis University of California, Berkeley Aalto University    Abhishek Udupa University of Pennsylvania
Abstract

Scenarios, or Message Sequence Charts, offer an intuitive way of describing the desired behaviors of a distributed protocol. In this paper we propose a new way of specifying finite-state protocols using scenarios: we show that it is possible to automatically derive a distributed implementation from a set of scenarios augmented with a set of safety and liveness requirements, provided the given scenarios adequately cover all the states of the desired implementation. We first derive incomplete state machines from the given scenarios, and then synthesis corresponds to completing the transition relation of individual processes so that the global product meets the specified requirements. This completion problem, in general, has the same complexity, PSPACE, as the verification problem, but unlike the verification problem, is NP-complete for a constant number of processes. We present two algorithms for solving the completion problem, one based on a heuristic search in the space of possible completions and one based on OBDD-based symbolic fixpoint computation. We evaluate the proposed methodology for protocol specification and the effectiveness of the synthesis algorithms using the classical alternating-bit protocol.

1 Introduction

In formal verification, a system model is checked against correctness requirements to find bugs. Sustained research in improving verification tools over the last few decades has resulted in powerful heuristics for coping with the computational intractability of problems such as Boolean satisfiability and search through the state-space of concurrent processes. The advances in these analysis tools now offer an opportunity to develop new methodologies for system design that allow a programmer to specify a system in more intuitive ways. In this paper, we focus on distributed protocols: the multitude of behaviors arising due to asynchronous concurrency makes the design of such protocols difficult, and the benefits of using model checkers to debug such protocols have been clearly demonstrated. Traditionally a distributed protocol is described using communicating finite-state machines (FSMs), and the goal of this paper is to develop a methodology aimed at simplifying the task of specifying them.

A more intuitive way of specifying the desired behaviors of a protocol is by scenarios, where each scenario describes an expected sequence of message exchanges among participating processes. Such scenarios are used in textbooks and classrooms to explain the protocol and can be specified using the intuitive visual notation of Message Sequence Charts. In fact, the MSC notation is standardized by the ITU [1], and it is supported by some system development environments as design supplements. These observations raise the question: is it plausible to ask the designer to provide enough scenarios so that the protocol implementation can be automatically synthesized? Although one cannot expect a designer to provide scenarios that include all the possible behaviors, our key insight is that even a representative set of scenarios covers all the states of the desired implementation. From a scenario, (local) states of a process are obtained from explicit state-labels that appear as annotations as well as from the histories of events in which the process participates. If we consider all the states and the input/output transitions out of these states for a given process that appear in the given set of scenarios, we obtain a skeleton of the desired FSM implementation of that process. The synthesis problem now corresponds to completing this skeleton by adding transitions. This requires the synthesizer to infer, for instance, how to process a particular input event in a particular state even when this information is missing from the specified scenarios. The more such completions the synthesizer can learn successfully, the lower the burden on the designer to specify details of each and every case. To rule out incorrect completions, we ask the designer to provide a model of the environment and correctness requirements. Some requirements, such as absence of deadlocks, can be generic to all protocols, whereas other requirements specific to the coordination problem being solved by the protocol are given as finite-state monitors for safety and liveness properties commonly used in model checkers.

The synthesis problem then maps to the following protocol completion problem: given (1) a set of FSMs with incomplete transition functions, (2) a model of the environment, and (3) a set of safety/liveness requirements, find a completion of the FSMs so that the composition satisfies all the requirements. We show this problem, similar to the model checking problem, to be PSPACE-complete, but, unlike the model checking problem, to be NP-hard for just one process. We focus on two approaches to solving this problem: the first performs a search through the space of possible completions with heuristics guiding the search order, and the second uses OBDD-based symbolic model checking to compute the set of correct completions by encoding the unknown targets of transitions as rigid variables.

To evaluate our methodology, we consider the Alternating Bit Protocol, a classical solution to provide reliable transmission using unreliable channels. The canonical description of the protocol [2] uses four scenarios to explain its behavior. It turns out that the first scenario corresponding to the typical behavior contains a representative of each local state of both the sender and receiver processes. Our symbolic algorithm for protocol completion is able to find the correct implementation from just one scenario, and thus, automatically learn how to cope with message losses and message duplications. We then vary the input, both in terms of the set of scenarios and the set of correctness requirements, and study how it affects the computational requirements and the ability to learn the correct protocol for both the completion algorithms.

Related Work

Our work builds on techniques and tools for model checking [3] and also on the rich literature for formal modeling and verification of distributed protocols [4].

The problem of deriving finite-state implementations from formal requirements specified, for instance, in temporal logic, is called reactive synthesis, and has been studied extensively [5, 6, 7]. When the implementation is required to be distributed, the problem is known to be undecidable [8, 9, 10, 11]. In bounded synthesis, one fixes a bound on the number of states of the implementation, and this allows algorithmic solutions to distributed synthesis [12]. Another approach uses genetic programming, combined with model checking, to search through protocol implementations to find a correct one, and has been shown to be effective in synthesizing protocols such as leader election [13, 14].

Specifying a reactive system using example scenarios also has a long tradition. In particular, the problem of deriving an implementation that exhibits at least the behaviors specified by a given set of scenarios is well-studied (see, for instance, [15, 16]). A particularly well-developed approach is behavioral programming [17], which builds on the work on an extension of message sequence charts called live sequence charts [18], and has been shown to be effective for specifying the behavior of a single controller reacting with its environment. More recently, scenarios, in the form of “flows”, have been used in the modular verification of cache coherence protocols [19].

Our approach of using both the scenarios and the requirements in an integrated manner, and using scenarios to derive incomplete state machines, offers a conceptually new methodology compared to the existing work. We are inspired by recent work on program sketching [20] and on protocol specification [21]. Compared to Transit [21], in this paper we limit ourselves to finite-state protocols, but consider both safety and liveness requirements, and provide a fully automatic synthesis procedure.

The protocol completion problem itself has conceptual similarities to problems such as program repair studied in the literature [22], but differs in technical details.

2 Methodology

We explain our methodology by illustrating it on an example, the well-known Alternating Bit Protocol (ABP). The ABP protocol ensures reliable message transmission over unreliable channels which can duplicate or lose messages. As input to the synthesis tool the user provides the following:

  • The protocol skeleton: this is a set of processes which are to be synthesized, and for each process, the interface of that process, i.e., its inputs and outputs.

  • The environment: this is a set of processes which are known and fixed, that is, are not to be synthesized nor modified in any way by the synthesizer. The environment processes interact with the protocol processes and the product of all these processes forms a closed system, which can be model-checked against a formal specification.

  • A specification: this is a set of formal requirements. These can be expressed in different ways, e.g., as temporal logic formulas, safety or liveness (e.g., Büchi) monitors, or “hardwired” properties such as absence of deadlock.

  • A set of scenarios: these are example behaviors of the system. In our framework, a scenario is a type of message sequence chart (MSC).

In the case of the ABP example, the above inputs are as follows. The overall system is shown in Figure 1. The protocol skeleton consists of the two unknown processes ABP Sender and ABP Receiver. Their interfaces are shown in the figure; for example, ABP Sender has three inputs and three outputs. The environment processes are: Forward Channel (FC) (from ABP Sender to ABP Receiver, duplicating and lossy), Backward Channel (BC) (from ABP Receiver to ABP Sender, also duplicating and lossy), Timer (sends messages to ABP Sender), Safety Monitor, and a set of Liveness Monitors.

As the specification for ABP we use the following requirements: (1) deadlock-freedom, i.e., absence of reachable global deadlock states (in the product system); (2) safety, captured by a safety monitor such as the one depicted in Figure 1; (3) liveness, captured by Büchi liveness monitors which accept incorrect infinite executions in which either a send message is not followed by a deliver, a deliver is not followed by a send, or a send never appears, provided that the channels are fair; and (4) a property that we call non-blockingness, which informally requires that in every reachable global state, if a process wants to send a message to another process, then the latter must be able to receive it. Non-blockingness lets us specify that the system does not have local deadlocks, where a process is blocked from making progress while the system as a whole is not deadlocked.

[Figure 1 diagram: ABP Sender, Forward Channel, Backward Channel, ABP Receiver, Timer, Safety Monitor, and Liveness Monitors; the safety monitor includes an error state.]

Figure 1: ABP system architecture (left) and the safety monitor which ensures that send and deliver messages alternate (right)

We will use the four message sequence charts shown in Figure 2 to describe the behavior of the ABP protocol. They come from a textbook on computer networking [2]. The first scenario describes the behavior of the protocol when no packets or acknowledgments are lost or duplicated. The second and the third scenarios correspond to packet and acknowledgment loss respectively. Finally, the fourth scenario describes the behavior of ABP on premature timeouts and/or packet duplication.

[Figure 2 diagram: four message sequence charts, each with a Sender lane and a Receiver lane; the first chart carries the state labels “before sending 0”, “before sending 1”, “before receiving 0”, and “before receiving 1”.]

Figure 2: Four scenarios for the alternating-bit protocol. From left to right: No loss, Lost packet, Lost ACK, Premature timeout/duplication.

A candidate solution to the ABP synthesis problem is a pair of processes, one for the ABP Sender and one for the ABP Receiver. Such a candidate is a valid solution if: (a) the two processes respect their I/O interface and satisfy some additional requirements such as determinism (these are defined formally in Section 3.1), (b) the overall ABP system (product of all processes) may exhibit each of the input scenarios, and (c) it satisfies the correctness requirements.

The output of the BDD-based algorithm, when run with the requirements mentioned above and only the first scenario from Figure 2, is shown in Figure 4. It can be checked that the computed solution and the manually constructed solution of Figure 3 are “similar/equivalent” in the sense that they satisfy the same intuitive properties that one expects from the ABP protocol. In particular, the computed solution in Figure 4 eagerly retransmits the appropriate packet when an unexpected acknowledgment is received. This behavior might incur additional traffic but satisfies all the safety and liveness properties for the ABP protocol. The computed solution for the ABP receiver is the same as the manually constructed automaton shown in Figure 3.

Figure 3: ABP “manual” solution: ABP Sender (left), ABP Receiver (right).

Figure 4: Solution computed for ABP Sender by the BDD-based symbolic algorithm using only the first scenario.

3 The Automata Completion Problem

We now describe how the problem, which we have set up in Section 2, can be viewed as a problem of completing the transition relations of finite IO automata.

3.1 Finite-state Input-Output Automata

A finite-state input-output automaton is a tuple $A = (Q, q_0, I, O, T)$ where $Q$ is a finite set of states, $q_0 \in Q$ is the initial state, $I$ is a finite (possibly empty) set of inputs, $O$ is a finite (possibly empty) set of outputs, with $I \cap O = \emptyset$, and $T \subseteq Q \times (I \cup O) \times Q$ is a finite set of transitions. (The framework and synthesis algorithms can easily be extended to handle internal transitions as well, but we suppress this detail for simplicity of presentation.)

We write a transition $(q, x, q') \in T$ as $q \xrightarrow{x?} q'$ when $x \in I$, and as $q \xrightarrow{x!} q'$ when $x \in O$. We write $q \to q'$ if there exists $x$ such that $(q, x, q') \in T$. A transition labeled with $x?$ (respectively, $x!$) is called an input transition (respectively, an output transition).

A state $q \in Q$ is called a deadlock if it has no outgoing transitions. $q$ is called an input state if it has at least one outgoing transition, and all outgoing transitions from $q$ are input transitions. $q$ is called an output state if it has a single outgoing transition, which is an output transition.

Automaton $A$ is called deterministic if for every state $q \in Q$, if there are multiple outgoing transitions from $q$, then all these transitions must be labeled with distinct inputs. Determinism implies that every state is a deadlock, an input state, or an output state.

Automaton $A$ is called closed if $I = \emptyset$.

A safety monitor is an automaton equipped with a set of error states $Q_e$, $Q_e \subseteq Q$. A liveness monitor is an automaton equipped with a set of accepting states $Q_a$, $Q_a \subseteq Q$. A monitor could be both safety and liveness, in which case it is a tuple $(Q, q_0, I, O, T, Q_e, Q_a)$.

A run of an automaton $A$ is a finite or infinite sequence of transitions starting from the initial state: $q_0 \to q_1 \to q_2 \to \cdots$. A state $q$ is called reachable if there exists a finite run reaching that state: $q_0 \to q_1 \to \cdots \to q$. A safety automaton is called safe if it has no reachable error states. An infinite run of a liveness automaton is called accepting if it visits accepting states infinitely often. A liveness automaton is called empty if it has no infinite accepting runs.
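The definitions of this section translate fairly directly into code. The following is a minimal sketch, not the authors' implementation: the class name `Automaton`, its field names, and the two-state timer example are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Automaton:
    states: set
    init: str
    inputs: set
    outputs: set
    transitions: set                      # triples (source, label, target)
    errors: set = field(default_factory=set)

    def outgoing(self, q):
        """Return the (label, target) pairs of all transitions leaving q."""
        return [(x, t) for (s, x, t) in self.transitions if s == q]

    def is_deadlock(self, q):
        return not self.outgoing(q)

    def is_deterministic(self):
        # Multiple outgoing transitions must carry pairwise distinct inputs.
        for q in self.states:
            labels = [x for x, _ in self.outgoing(q)]
            if len(labels) > 1:
                distinct = len(set(labels)) == len(labels)
                all_inputs = all(x in self.inputs for x in labels)
                if not (distinct and all_inputs):
                    return False
        return True

    def reachable(self):
        # Breadth-first search from the initial state.
        seen, todo = {self.init}, deque([self.init])
        while todo:
            q = todo.popleft()
            for _, t in self.outgoing(q):
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen

    def is_safe(self):
        return not (self.reachable() & self.errors)


# Hypothetical two-state timer: it is started, then emits a timeout.
timer = Automaton(
    states={"idle", "running"},
    init="idle",
    inputs={"start"},
    outputs={"timeout"},
    transitions={("idle", "start", "running"), ("running", "timeout", "idle")},
)
```

Here `timer.is_deterministic()` holds because each state has a single outgoing transition, and `timer.is_safe()` holds trivially because no error states are declared.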

3.2 Composition

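We define an asynchronous (interleaving-based) parallel composition operator with rendezvous synchronization. Given two automata $A_1 = (Q_1, q_{0,1}, I_1, O_1, T_1)$ and $A_2 = (Q_2, q_{0,2}, I_2, O_2, T_2)$, the composition of $A_1$ and $A_2$, denoted $A_1 \| A_2$, is defined, provided $O_1 \cap O_2 = \emptyset$, as the automaton

$$A_1 \| A_2 = \bigl(Q_1 \times Q_2,\ (q_{0,1}, q_{0,2}),\ (I_1 \cup I_2) \setminus (O_1 \cup O_2),\ O_1 \cup O_2,\ T\bigr)$$

where $((q_1, q_2), x, (q_1', q_2')) \in T$ iff one of the following holds:

  • $x \in O_1$ and $(q_1, x, q_1') \in T_1$ and either $x \in I_2$ and $(q_2, x, q_2') \in T_2$ or $x \notin I_2$ and $q_2' = q_2$.

  • $x \in O_2$ and $(q_2, x, q_2') \in T_2$ and either $x \in I_1$ and $(q_1, x, q_1') \in T_1$ or $x \notin I_1$ and $q_1' = q_1$.

  • $x \in (I_1 \cup I_2) \setminus (O_1 \cup O_2)$ and at least one of the following holds: (1) $x \in I_1 \setminus I_2$ and $(q_1, x, q_1') \in T_1$ and $q_2' = q_2$, (2) $x \in I_2 \setminus I_1$ and $(q_2, x, q_2') \in T_2$ and $q_1' = q_1$, (3) $x \in I_1 \cap I_2$ and $(q_1, x, q_1') \in T_1$ and $(q_2, x, q_2') \in T_2$.

During composition, the product automaton “inherits” the safety and liveness properties of each of its components. Specifically, a product state $(q_1, q_2)$ is an error state if either $q_1$ or $q_2$ is an error state. A product state $(q_1, q_2)$ is an accepting state if either $q_1$ or $q_2$ is an accepting state.

Note that $\|$ is commutative and associative. So we can write $A_1 \| A_2 \| \cdots \| A_n$ without parentheses, for a set of automata $A_1, \ldots, A_n$.

We call a product $A_1 \| \cdots \| A_n$ strongly non-blocking if in each reachable global state $s$, if some automaton is willing to send a message $x$, and all automata which accept $x$ are in non-output states, then these automata should be able to synchronize on the transition $x$ in that global state. On the other hand, we call a product weakly non-blocking if in each reachable global state $s$, if some automaton is willing to send a message $x$, then it is possible for all automata which accept $x$ to eventually synchronize on the transition $x$. Note that strong non-blockingness is a safety property and can be useful when the model-checker cannot verify liveness properties.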
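The composition operator can be sketched in code under one reading of the rules above: a component that has an event in its interface must take a transition on it, while a component that does not have the event in its interface stays in place. The tuple encoding, the function names `compose` and `reachable_deadlocks`, and the toy sender/receiver pair are assumptions for illustration, and the sketch builds the full product eagerly rather than on the fly.

```python
from itertools import product


def compose(a1, a2):
    """Compose automata given as (states, init, inputs, outputs, transitions)."""
    Q1, q01, I1, O1, T1 = a1
    Q2, q02, I2, O2, T2 = a2
    if O1 & O2:
        raise ValueError("composition undefined: the automata share an output")
    outputs = O1 | O2
    inputs = (I1 | I2) - outputs          # inputs matched by an output are consumed
    transitions = set()
    for x in inputs | outputs:
        for q1, q2 in product(Q1, Q2):
            # A component with x in its interface must move on x;
            # a component without x in its interface stays where it is.
            if x in (I1 | O1):
                next1 = {t for (s, y, t) in T1 if (s, y) == (q1, x)}
            else:
                next1 = {q1}
            if x in (I2 | O2):
                next2 = {t for (s, y, t) in T2 if (s, y) == (q2, x)}
            else:
                next2 = {q2}
            for t1, t2 in product(next1, next2):
                transitions.add(((q1, q2), x, (t1, t2)))
    return set(product(Q1, Q2)), (q01, q02), inputs, outputs, transitions


def reachable_deadlocks(a):
    """Return the reachable states of a that have no outgoing transition."""
    _, init, _, _, transitions = a
    seen, todo = {init}, [init]
    while todo:
        q = todo.pop()
        for (s, _, t) in transitions:
            if s == q and t not in seen:
                seen.add(t)
                todo.append(t)
    return {q for q in seen if not any(s == q for (s, _, _) in transitions)}


# Hypothetical toy pair: the sender emits p and waits for a; the receiver
# waits for p and answers with a.
sender = ({"s0", "s1"}, "s0", {"a"}, {"p"}, {("s0", "p", "s1"), ("s1", "a", "s0")})
receiver = ({"r0", "r1"}, "r0", {"p"}, {"a"}, {("r0", "p", "r1"), ("r1", "a", "r0")})
system = compose(sender, receiver)
```

In this toy system every input is matched by an output, so the product has no inputs left and is closed. Removing the receiver's transition on `a` leaves the global state `("s1", "r1")` as a reachable deadlock, which is exactly the kind of defect the completion step must repair.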

3.3 From Scenarios to Incomplete Automata

The first step in our synthesis method is to automatically generate from the set of input scenarios an incomplete automaton for each protocol process. The second step is then to complete these incomplete automata to derive a complete protocol. In the sections that follow, we formalize and study the automata completion problem. In this section, we illustrate the first step of going from scenarios to incomplete automata, by means of the ABP example.

The idea for transforming scenarios into incomplete automata is simple. First, for every “swim lane” in the message sequence chart corresponding to a given scenario, we identify the corresponding automaton in the overall system. For example, in each scenario shown in Figure 2, the left-most lane corresponds to ABP Sender and the right-most lane to ABP Receiver. These scenarios omit the environment processes for simplicity. In particular, channel processes are omitted; however, we will use a primed version of a message when referencing it on the process that receives it.

Second, for every protocol process $P$, we generate an incomplete automaton $A_P$ as follows. For every message history $h$ (i.e., finite sequence of messages received or sent by the process) specified in some scenario in the lane for $P$, we identify a state $q_h$ in $A_P$. If $h'$ is an extension of history $h$ by one message $x$, then there is a transition $q_h \xrightarrow{x} q_{h'}$ in $A_P$. Applying this procedure to the four scenarios of Figure 2, we obtain the two incomplete automata shown in Figure 5.

Figure 5: Incomplete protocol automata for ABP: Sender (top), Receiver (bottom).
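The history-based construction of the second step amounts to building a prefix tree over the events of one swim lane. The sketch below is an illustration under stated assumptions: the function name `skeleton`, the encoding of events as strings ending in `!` (send) or `?` (receive), and the message names in the example lanes are hypothetical and are not taken from the figures.

```python
def skeleton(lanes):
    """Build an incomplete automaton for one process.

    lanes: one event sequence per scenario, restricted to this process.
    Each state is a message history, encoded as a tuple of events.
    """
    init = ()                              # the empty history
    states, transitions = {init}, set()
    for lane in lanes:
        history = init
        for event in lane:
            extended = history + (event,)  # history extended by one message
            states.add(extended)
            transitions.add((history, event, extended))
            history = extended
    return states, init, transitions


# Hypothetical sender lanes: a loss-free exchange, and a timeout followed
# by a retransmission.
lanes = [
    ["p0!", "a0?", "p1!", "a1?"],
    ["p0!", "timeout?", "p0!", "a0?"],
]
states, init, transitions = skeleton(lanes)
```

The two lanes share the history `("p0!",)`, so that state appears once with two outgoing transitions; scenarios that begin alike therefore contribute to a single automaton rather than to disjoint chains.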

Third, scenarios are annotated with labels. As shown in the first scenario of Figure 2, labels appear between messages on swim lanes. These are used to merge the states that correspond to message histories that are followed by the same label. Merging occurs for states of a single scenario as well as across multiple ones if the same label is used in different scenarios. If consistent labels are given to the initial and final positions in all swim lanes of the scenarios, the resulting incomplete automata can be made cyclic. Furthermore, labels are essential for specifying recurring behaviors in scenarios, and the structure of the incomplete automaton depends on the number and positions of labels used.

Finally, it is often the case that different behaviors of a system are equivalent up to simple replacement of messages. For example, all the ABP scenarios express valid behaviors if the 0 messages are consistently replaced with the corresponding 1 messages and vice versa. Thus, our framework allows for scenarios to be characterized as “symmetric”.

[Figure 6 diagram: incomplete sender automaton with states labeled “before sending 0” and “before sending 1”.]

Figure 6: Incomplete protocol automaton for ABP Sender after adding symmetric scenarios and merging labeled states. (Only the first half of the automaton is shown, the rest is the symmetric case for packet 1.)

We annotate the swim lanes of the ABP Sender scenarios of Figure 2 with “before sending 0” and “before sending 1” labels, and the swim lanes of the ABP Receiver with “before receiving 0” and “before receiving 1” labels. We also add the symmetric scenarios by switching 0 messages with 1 messages. The resulting incomplete automaton for ABP Sender is shown in Figure 6.

3.4 Automata Completion

Having transformed the input scenarios into incomplete automata, the next step is to complete those automata by adding the appropriate transitions, so as to synthesize a complete and correct protocol. In this section we formalize this completion problem. We define two versions of the problem: a general version (Problem 2) and a special version with only a single incomplete automaton (Problem 1). We will use Problem 1 in Section 4.1 to show that even in the simplest case automaton completion is combinatorially hard.

Consider an automaton $A = (Q, q_0, I, O, T)$. Given a set of transitions $T' \subseteq Q \times (I \cup O) \times Q$, the completion of $A$ with $T'$ is the new automaton $A' = (Q, q_0, I, O, T \cup T')$.

Problem 1

Given automaton $E$ (the environment) and deterministic automaton $P$ (the process) such that $E \| P$ is defined, find a set of transitions $T'$ such that, if $P'$ is the completion of $P$ with $T'$, then $P'$ is deterministic and $E \| P'$ has no reachable deadlock states.

Note that if $E \| P$ is defined then $E \| P'$ is also defined, because, by definition, completion does not modify the interface (sets of inputs and outputs) of an automaton.

Problem 2

Given a set of environment automata $E_1, \ldots, E_m$ and a set of deterministic process automata $P_1, \ldots, P_n$ such that $E_1 \| \cdots \| E_m \| P_1 \| \cdots \| P_n$ is defined, find sets of transitions $T_1', \ldots, T_n'$ such that, if $P_i'$ is the completion of $P_i$ with $T_i'$, for $i = 1, \ldots, n$, then

  • $P_i'$ is deterministic, for $i = 1, \ldots, n$,

  • if the product automaton $\Pi = E_1 \| \cdots \| E_m \| P_1' \| \cdots \| P_n'$ is a safety automaton then it is safe,

  • if $\Pi$ is a liveness automaton then it is empty,

  • $\Pi$ has no reachable deadlock states,

  • and, optionally, $\Pi$ is weakly (or strongly) non-blocking.

Some of the environment processes can be safety or liveness monitors. The last requirement means that Problem 2 comes in three versions: one where strong non-blockingness is required, one where the weak version is required, and a third where neither is. These are options provided by the user.

4 Solving Automata Completion

In this section, we consider procedures to solve the automata completion problem. First, we show that Problems 1 and 2 are NP-complete and PSPACE-complete respectively. Then, we present an explicit search algorithm that eagerly prunes parts of the search space and a heuristic which ranks candidate completions. Finally, we describe an algorithm which reduces automata completion to a model-checking problem for a symbolic model-checker.

4.1 Complexity

It can be shown that Problem 2 is PSPACE-complete. Note that this is not surprising, as the verification problem itself is PSPACE-complete, for safety properties of distributed protocols. However, in the special case of two processes, while verification can be performed in polynomial time, a reduction from 3-SAT shows that the corresponding completion Problem 1 is NP-complete. The proofs can be found in the appendix.

Theorem 4.1

Problem 2 is PSPACE-complete and Problem 1 is NP-complete.

4.2 Explicit search

This algorithm for solving the automata completion problem (Problem 2) is based on an explicit search over the space of possible completions, guided by various heuristics. More specifically, the algorithm explores a search tree in which every node is a set of added transitions $T'$ (we include in $T'$ the transitions added in all incomplete protocol automata). The children of each node $T'$ are those nodes which contain exactly one more transition than $T'$. The root of the tree is the empty set of transitions, which corresponds to the original input (i.e., the incomplete automata generated from the scenarios).

For every newly visited node $T'$ (including the root) a model-checking problem is solved: we form the product of all environment processes, protocol processes (with the added transitions $T'$), and monitors, and we check the absence of deadlocks, safety, and liveness violations (and optionally also non-blockingness). The following cases are possible:

  1. No violations are found. In this case, $T'$ is a correct solution, and the search terminates.

  2. A safety or liveness violation is found. This means that this candidate solution is incorrect. Moreover, any other candidate obtained by adding extra transitions to $T'$, i.e., any $T'' \supseteq T'$, will also be incorrect, by exhibiting the same violations. This is because adding extra local transitions can only add, but not remove, global transitions. This in turn implies that any reachable error state with $T'$ will also be a reachable error state with $T''$, so any safety violation with $T'$ will also be a safety violation with $T''$. Similarly, any reachable accepting cycle with $T'$ will also be a reachable accepting cycle with $T''$, so liveness violations cannot be removed either. In conclusion, in this case, the entire sub-tree under $T'$ can be pruned from the search.

  3. No safety nor liveness violation is found, but a deadlock or blocking state is found. In this case, $T'$ is incorrect, but could potentially be made correct by adding more transitions. The search continues exploring the children of $T'$.

The search algorithm saves every visited node $T'$. The same node might be visited via different paths. For example, adding first $t_1$, then $t_2$, leads to the same node as adding first $t_2$, then $t_1$. To reduce the search space, the search stops (and backtracks) when it finds a node that has already been visited. This is clearly sound and complete.

The search continues only in Case 3. In this case, we use heuristics to determine the order in which the children of the current node should be explored. The heuristic we used prioritizes the children according to how similar their added transition is to existing transitions in the protocol automaton. We deem two transitions as being similar if their message and destination are the same and if their starting states already agree on some other transition. In other words, if two states handle a message by transitioning to the same state, or indistinguishable states, the heuristic extrapolates that the states should also handle other messages in the same manner. For example, in Figure 6, two states of the sender already handle one message by transitioning to the same state. Hence, the heuristic prioritizes the candidate transition that makes one of them handle a further message exactly as the other already does, over candidates with a different target. Note that this transition correctly generalizes the behavior on a single timeout described in the scenarios to multiple timeouts.
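The search loop with its three cases can be sketched as follows. This is a simplified illustration rather than the tool described in the paper: the model checker is replaced by a caller-supplied function `check` that returns `"ok"`, `"violation"`, or `"incomplete"`, and the names `complete`, `toy_check`, `EXISTING`, and `CANDIDATES` are hypothetical. The toy checker works on a single unlabeled graph in which reaching `"err"` plays the role of a safety violation.

```python
def complete(candidates, check, rank=None):
    """Depth-first search over sets of added transitions."""
    visited = set()

    def explore(added):
        if added in visited:               # reached earlier via another order
            return None
        visited.add(added)
        verdict = check(added)
        if verdict == "ok":                # Case 1: a correct completion
            return added
        if verdict == "violation":         # Case 2: every superset also fails
            return None
        # Case 3: deadlock or blocking state; try adding one more transition.
        children = [t for t in candidates if t not in added]
        if rank is not None:
            children.sort(key=lambda t: rank(added, t))
        for t in children:
            result = explore(added | {t})
            if result is not None:
                return result
        return None

    return explore(frozenset())


EXISTING = {(0, 1)}                        # edges of the toy incomplete system


def toy_check(added):
    """Stand-in for the model checker on a toy graph with initial state 0."""
    edges = EXISTING | added
    seen, todo = {0}, [0]
    while todo:
        q = todo.pop()
        for (s, t) in edges:
            if s == q and t not in seen:
                seen.add(t)
                todo.append(t)
    if "err" in seen:
        return "violation"
    if any(all(s != q for (s, _) in edges) for q in seen):
        return "incomplete"
    return "ok"


CANDIDATES = [(1, "err"), (1, 2), (2, "err"), (2, 0)]
solution = complete(CANDIDATES, toy_check)
```

On this toy input the search prunes every node that contains an edge into `"err"` without expanding it, and returns the cycle-closing set `{(1, 2), (2, 0)}`. Pruning is sound here for the same monotonicity reason given in Case 2: adding edges can only enlarge the reachable set.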

4.3 BDD-based Symbolic Computation

This technique reduces the automata completion problem to an instance of a model-checking query, which is then solved by using BDD-based symbolic model-checking techniques. Consider a set of environment automata $E_1, \ldots, E_m$ and a set of (possibly incomplete) deterministic process automata $P_1, \ldots, P_n$, with each $P_i = (Q_i, q_{0,i}, I_i, O_i, T_i)$, as described in Section 3. For each state $q \in Q_i$ and for each event $x \in I_i \cup O_i$ such that $(q, x, q') \notin T_i$ for any $q' \in Q_i$, we introduce a variable $v_{q,x}$ whose value ranges over the set $Q_i \cup \{\bot\}$. Intuitively, these variables encode all possible ways to complete the transition relation $T_i$, including the possibility that no transition exists (the value $\bot$). Each of the transition relations $T_i$ is now parametrized by the newly introduced variables $v_{q,x}$. Let $\tilde{P}_i$ be the automata obtained by replacing the transition relation of $P_i$ with the parametrized transition relation whose construction we have just described.

We denote the composition $E_1 \| \cdots \| E_m \| \tilde{P}_1 \| \cdots \| \tilde{P}_n$ by $\tilde{\Pi}$ and its transition relation by $\tilde{T}$. Note that $\tilde{T}$ is also parametrized by the newly introduced variables. Also, the original composition $E_1 \| \cdots \| E_m \| P_1 \| \cdots \| P_n$ has only one initial state, by definition, and thus $\tilde{\Pi}$ has only one (parametrized) initial state as well.
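To make the role of the parameters concrete, the sketch below computes the undefined (state, event) pairs of one process automaton and enumerates every valuation of the corresponding variables explicitly. Explicit enumeration is used here only for illustration and is not the BDD-based technique of this section, which represents the same space symbolically. The function names `holes` and `completions`, the use of `None` for "no transition", and the two-state example are assumptions.

```python
from itertools import product


def holes(states, events, transitions):
    """Return the (state, event) pairs that have no outgoing transition yet."""
    defined = {(q, x) for (q, x, _) in transitions}
    return sorted((q, x) for q in states for x in events if (q, x) not in defined)


def completions(states, events, transitions):
    """Yield one transition set per valuation of the parameters."""
    open_pairs = holes(states, events, transitions)
    domain = sorted(states) + [None]       # None encodes "no transition"
    for choice in product(domain, repeat=len(open_pairs)):
        extra = {(q, x, t) for (q, x), t in zip(open_pairs, choice) if t is not None}
        yield transitions | extra


# Hypothetical process with two states, two events, and one known transition.
states = {"s0", "s1"}
events = {"a", "b"}
known = {("s0", "a", "s1")}
open_pairs = holes(states, events, known)
all_completions = list(completions(states, events, known))
```

With three undefined pairs and a domain of two states plus "no transition", there are 3^3 = 27 valuations. The count grows exponentially with the number of undefined pairs, which is the motivation for a symbolic representation.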

Suppose we are given a safety monitor with a set of error states. We can symbolically represent the states where the monitor automaton is in an error state by a propositional formula $\varphi_{err}$. Now, the parameter values such that states satisfying $\varphi_{err}$ are not reachable in the composition can be obtained by model-checking the parametrized composition against the CTL property $\mathsf{EF}\,\varphi_{err}$. If this property is found to be true, then an erroneous state in the monitor is reachable for every valuation of the parameters. If the property is found to be false, then there must exist a valuation for the parameters which prevents a state satisfying $\varphi_{err}$ from being reachable. These parameter values represent a completion that satisfies the property that no state satisfying $\varphi_{err}$ is reachable.

4.3.1 Determinism, Deadlock Freedom, Non-blockingness and Liveness.

We encode constraints for determinism by restricting the set of initial values that the parameters can take.

A deadlock state is characterized by the formula $\varphi_{dl} = \bigwedge_{x} \neg\,\mathit{enabled}(x)$, where $\mathit{enabled}(x)$ is a formula which expresses that the (sole) sender of the event $x$ is in a state where it can send $x$, and all the receivers of event $x$ are in states where they can receive an $x$. Thus the formula $\mathsf{EF}\,\varphi_{dl}$ represents the set of states from which a deadlock can eventually be reached. A valid completion would render these states unreachable.

Suppose we are given a liveness automaton with a set of accepting states. A completion which is live needs to have no runs that visit accepting states infinitely often. If all accepting states are represented symbolically by the propositional formula $\varphi_{acc}$, then the set of states from which it is possible to visit accepting states infinitely often can be characterized by a CTL formula over $\varphi_{acc}$. Again, we desire that these states be unreachable in a valid completion.

The non-blockingness requirement can be expressed as a safety requirement, and hence can be handled in a similar manner. Finally, we use a symbolic model-checker (NuSMV [23]) to check whether the parametrized composition satisfies the combined property. A valuation of the parameters for which the composition does not satisfy this property represents a completion which has the required safety, liveness, and deadlock-freedom properties.

5 Evaluation

We investigate (1) how effective scenarios are in reducing the empirical complexity of the automata completion problem, (2) the amount of generalization that the proposed algorithms are able to perform, and (3) how adding scenarios reduces the number of formal specifications required for successful completion.

5.0.1 Synthesis with no scenarios.

To validate our hypothesis that scenarios make the synthesis problem easier, we attempted to synthesize the ABP protocol with no transitions specified, but with bounds on the number of states of the processes. These bounds were set to be equal to the corresponding number of states in the manually constructed version of the ABP protocol. We required that the protocol satisfy all the properties discussed in Section 2.

The BDD-based symbolic algorithm ran out of memory (we used a memory limit of 16GB and a time limit of one hour) and failed to synthesize a protocol. Recall that the heuristic algorithm performs generalization by using a similarity metric between states. When the starting point is an empty transition relation, there is no similarity between states, and the heuristic fails to differentiate between candidate transitions. The resulting search procedure runs out of time.

5.0.2 Varying the number of scenarios.

When all four of the scenarios in Figure 2 are used, both the explicit search algorithm and the BDD-based search are able to find a correct completion for the protocol. Moreover, both algorithms find a correct completion when applied to just one scenario. A quantitative summary of our experiments can be found in Table 1. We applied our algorithms on the incomplete automata constructed from the first scenario (row 1), the second scenario (row 2), and all four scenarios (row 3). For each case, we report the number of states of the incomplete automata, the number of transitions in the completions found by the algorithms, and their computational requirements.

In the case of all four scenarios, the ranking heuristic described in Section 4.2 chooses a candidate transition that is part of the correct completion at every step. The search does not backtrack, does not prune any nodes, and takes less than a minute to complete. When only the first scenario is used, the ranking heuristic is less effective in the same way as when no scenarios were used. However, the additional structure imposed by the scenarios significantly constrains the search space and the explicit search with pruning successfully returns a completion. The incomplete automata that correspond to scenario 2 include intermediate states that the heuristic can use to successfully rank candidate edges. The runtime of the explicit search algorithm varies with the order in which candidate transitions are explored. We report the 75th percentile over several randomly chosen orders.

In contrast, the BDD-based search performs better when a single scenario is used rather than all four. The intuition behind this is that the number of BDD variables is smaller when there are fewer intermediate states in the incomplete automata. As a result, the search finishes in less than 15 seconds when only the first scenario of Figure 2 is used, less than 15 minutes when only the second scenario is used, and less than 35 minutes when all four are used. Synthesis takes longer with the second scenario because its incomplete sender automaton, as mentioned earlier, has more intermediate states than that of the first scenario.

| | States in incomplete automaton (Sender) | States in incomplete automaton (Receiver) | Transitions to be completed | Heuristic time | BDD time | BDD memory |
|---|---|---|---|---|---|---|
| Scenario 1 | 6 | 6 | 6 | 4 min. | 15 sec. | 100MB |
| Scenario 2 | 10 | 6 | 8 | 30 sec. | 15 min. | 1GB |
| All scenarios | 12 | 8 | 8 | 45 sec. | 35 min. | 3.8GB |

Table 1: Quantitative summary of experiments. The last three columns give the computational requirements.

5.0.3 Generalization and inference of unspecified behaviors.

With just one scenario specified, the algorithms successfully perform the generalization required to obtain a correct completion. We believe that the generalization performed is non-obvious: the correct protocol behaviors on packet loss, loss of acknowledgments and message duplication are inferred, even though the scenario does not describe what needs to happen in these situations. As can be seen in Figure 7, the incomplete automata constructed from the scenario only describe the protocol behavior over lossless channels. The algorithms are guided solely by the liveness and safety specifications to infer the correct behavior. In contrast, when all four scenarios are specified, the scenarios already contain information about the behavior of the protocol when a single packet loss or a single message duplication occurs. The algorithm thus needs to only generalize this behavior to handle an arbitrary number of packet losses and message duplications.

5.0.4 Varying the Correctness Requirements.

We observed that when fewer scenarios were used, we needed to specify more properties — some of which were non-obvious — so that the algorithms could converge to a correct completion. For instance, when only one scenario was specified, we needed to include the liveness property that every deliver message was eventually followed by a send message. Owing to the structure of the incomplete automata, this property was not necessary to obtain a correct completion when all four scenarios were specified. Another property which was necessary to reject trivial completions when no scenarios were specified was that there has to be at least one send message in every run. Therefore, in some cases, using scenarios can compensate for the lack of detailed formal specifications.
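The two properties mentioned above can be written in linear temporal logic. The following is only a sketch: it assumes atomic propositions named send and deliver that hold exactly at the steps where the corresponding message occurs and that never hold at the same step. These names and this encoding are our assumptions for illustration, and the formalization actually given to the tools may differ.

```latex
% Every deliver message is eventually followed by a send message.
\varphi_{1} \;=\; \mathbf{G}\,\bigl(\mathit{deliver} \rightarrow \mathbf{F}\,\mathit{send}\bigr)

% There is at least one send message in every run.
\varphi_{2} \;=\; \mathbf{F}\,\mathit{send}
```

Under this reading, a completion in which the sender never transmits violates the second formula, which is how such a property can rule out trivial completions.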

5.0.5 Discussion of Experimental Results.

The experimental results clearly demonstrate that specifying the behavior of the protocol using scenarios is essential for the algorithms we have evaluated to be able to construct a correct completion. In particular, even providing just one scenario allows the algorithms to converge on a correct completion within a reasonable amount of time. In contrast, none of the approaches we have evaluated were successful in synthesizing the required protocol when no scenarios were provided. An interesting trend that we observed was that the heuristic algorithm and the BDD-based symbolic algorithm seem to complement each other. Specifically, the BDD-based symbolic algorithm was effective when the number of scenarios — and therefore the number of states in the incomplete automata — was small. On the other hand, the heuristic search does better when more information is provided in the form of additional scenarios — which in turn causes a larger number of states in the incomplete automata to be similar. This is because the BDD-based symbolic algorithm essentially constructs BDDs for all possible completions and picks one that satisfies the properties. In general, the number of possible completions increases with the number of states in the automata, which explains why the BDD-based algorithm performs well when the number of states in the automaton is small. The heuristic algorithm exploits similarities between states to converge on a correct completion. When the algorithm is provided with the incomplete automaton from the first scenario — which has a minimal number of states — it is unable to find states which are similar and thus degenerates to an exhaustive search over all completions. Finally, we also observe that additional scenarios can compensate for unspecified correctness properties. This frees the protocol designer from having to specify the requirements completely formally.

Figure 7: Incomplete automata constructed from the first scenario of Figure 2.

6 Conclusions

The main contribution of this paper is a new methodology, supported by automatic synthesis techniques, for specifying finite-state distributed protocols using a mix of representative behaviors and correctness requirements. The synthesizer derives a skeleton of the state machine for each process using the states that appear in the scenarios and then finds a completion that satisfies the requirements. The promise of the proposed method is demonstrated by the ability of the synthesizer to learn the correct ABP protocol from just a single scenario corresponding to the typical case. Future research should focus on the scalability of the algorithms for protocol completion. One idea is to heuristically limit the choices for targets of transitions before applying the BDD-based symbolic computation, and another approach is to reduce the number of states by representing the implementation as an extended FSM with variables.

References

  • [1] ITU Telecommunication Standardization Sector: ITU-T recommendation Z.120, Message Sequence Charts (MSC ’96). (May 1996)
  • [2] Kurose, J.F., Ross, K.W.: Computer Networking: A Top-Down Approach. 5th edn. Addison-Wesley Publishing Company, USA (2009)
  • [3] Clarke, E.M., Grumberg, O., Peled, D.A.: Model checking. MIT Press (2000)
  • [4] Lynch, N.: Distributed algorithms. Morgan Kaufmann (1996)
  • [5] Ramadge, P., Wonham, W.: The control of discrete event systems. Proceedings of the IEEE 77(1) (1989) 81–98
  • [6] Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: Proceedings of the 16th ACM Symposium on Principles of Programming Languages. (1989)
  • [7] Bloem, R., Jobstmann, B., Piterman, N., Pnueli, A., Sa’ar, Y.: Synthesis of reactive(1) designs. J. Comput. Syst. Sci. 78(3) (2012) 911–938
  • [8] Pnueli, A., Rosner, R.: Distributed reactive systems are hard to synthesize. In: 31st Annual Symposium on Foundations of Computer Science. (1990) 746–757
  • [9] Tripakis, S.: Undecidable Problems of Decentralized Observation and Control on Regular Languages. Information Processing Letters 90(1) (April 2004) 21–28
  • [10] Finkbeiner, B., Schewe, S.: Uniform distributed synthesis. In: IEEE Symposium on Logic in Computer Science. (2005) 321–330
  • [11] Lamouchi, H., Thistle, J.: Effective control synthesis for DES under partial observations. In: 39th IEEE Conference on Decision and Control. (2000) 22–28
  • [12] Finkbeiner, B., Schewe, S.: Bounded synthesis. Software Tools for Technology Transfer 15(5-6) (2013) 519–539
  • [13] Katz, G., Peled, D.: Model checking-based genetic programming with an application to mutual exclusion. In: Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference. LNCS 4963 (2008) 141–156
  • [14] Katz, G., Peled, D.: Synthesizing solutions to the leader election problem using model checking and genetic programming. In: Haifa Verification Conference. (2009) 117–132
  • [15] Alur, R., Etessami, K., Yannakakis, M.: Inference of message sequence charts. IEEE Transactions on Software Engineering 29(7) (2003) 623–633
  • [16] Uchitel, S., Kramer, J., Magee, J.: Synthesis of behavioral models from scenarios. IEEE Trans. Softw. Eng. 29(2) (February 2003) 99–115
  • [17] Harel, D., Marron, A., Weiss, G.: Behavioral programming. Commun. ACM 55(7) (2012) 90–100
  • [18] Damm, W., Harel, D.: LSCs: Breathing life into message sequence charts. Formal Methods in System Design 19(1) (2001) 45–80
  • [19] O’Leary, J., Talupur, M., Tuttle, M.R.: Protocol verification using flows: An industrial experience. In: Formal Methods in Computer-Aided Design, 2009. FMCAD 2009. (Nov 2009) 172–179
  • [20] Solar-Lezama, A., Rabbah, R., Bodik, R., Ebcioglu, K.: Programming by sketching for bit-streaming programs. In: Proc. 2005 ACM Conference on Programming Language Design and Implementation. (2005) 281–294
  • [21] Udupa, A., Raghavan, A., Deshmukh, J.V., Mador-Haim, S., Martin, M.M.K., Alur, R.: TRANSIT: specifying protocols with concolic snippets. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. (2013) 287–296
  • [22] Jobstmann, B., Griesmayer, A., Bloem, R.: Program repair as a game. In: Computer Aided Verification, 17th International Conference. LNCS 3576 (2005) 226–238
  • [23] Cimatti, A., Clarke, E., Giunchiglia, E., Giunchiglia, F., Pistore, M., Roveri, M., Sebastiani, R., Tacchella, A.: NuSMV 2: An OpenSource Tool for Symbolic Model Checking. In: Computer-Aided Verification, 14th International Conference (CAV). LNCS 2404, Springer (2002) 359–364

Appendix: Complexity of Automata Completion

Theorem .1

Problem 1 is in NP.

Proof

Problem 1 is in NP since we can guess a completion, and then check whether the requirements of the problem are satisfied. Checking whether the resulting process automaton is deterministic can be done in polynomial time. Checking whether the product automaton is deadlock-free can be done in polynomial time also, because there are only two automata in the product.
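The two polynomial-time checks in this argument can be sketched as follows. This is a minimal illustration and not the formal definition used in the paper: the dictionary representation of automata, the function names, and the product semantics (both automata must take every step together on a common label) are all assumptions made for the example.

```python
from collections import deque


def is_deterministic(transitions):
    """Check that no (state, label) pair has two different targets.

    transitions: iterable of (source, label, target) triples.
    """
    seen = {}
    for src, label, dst in transitions:
        # setdefault returns the previously stored target, if any.
        if seen.setdefault((src, label), dst) != dst:
            return False
    return True


def has_reachable_deadlock(p, e, p_init, e_init):
    """Breadth-first search over the product of two automata.

    p, e: dict mapping state -> dict mapping label -> next state.
    Assumed semantics: a product step needs a label enabled in both
    automata; a product state with no such label is a deadlock.
    """
    start = (p_init, e_init)
    seen = {start}
    queue = deque([start])
    while queue:
        ps, es = queue.popleft()
        shared = set(p.get(ps, {})) & set(e.get(es, {}))
        if not shared:
            return True
        for label in shared:
            nxt = (p[ps][label], e[es][label])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

The search visits each pair of states at most once, so its running time is polynomial in the sizes of the two automata, which is the point the proof relies on when the number of processes is fixed.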

Theorem .2

Problem 1 is NP-complete.

Proof

We will show that 3SAT is reducible to Problem 1. Together with Theorem .1 this shows that Problem 1 is NP-complete.

Consider an arbitrary instance of 3SAT, given by a set of variables and a set of clauses, where each clause consists of three literals.

Each literal refers to a variable, and is equal either to that variable or to its negation.

We construct a process automaton P and an environment E such that the following are equivalent:

  • there is a set of transitions such that the completion of P with this set is deterministic and its product with E has no reachable deadlock states

  • there is a truth assignment for the variables that satisfies all clauses.

The process automaton has an initial state and, for each variable, a pair of states, one for each truth value. At each step, the environment challenges the process to instantiate some variable by transmitting a message that names the variable. On receipt of this message, the completed process has to transition to one of the two states associated with that variable. It responds with one of two messages indicating the assignment made to the variable. The environment performs this challenge for each literal in each clause, and enters a deadlock state if any clause is left unsatisfied. It follows that a completion exists iff the 3SAT instance is satisfiable.
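The correspondence between completions and truth assignments can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the automata P and E themselves: a completion is modeled directly as a choice of truth value per variable, literals are encoded as nonzero integers (+i for the i-th variable, -i for its negation), and the function names are hypothetical.

```python
from itertools import product


def literal_holds(literal, assignment):
    """Evaluate a literal (+i or -i) under a variable assignment."""
    value = assignment[abs(literal)]
    return value if literal > 0 else not value


def environment_deadlocks(clauses, assignment):
    """Mimic the environment: challenge every literal of every clause
    and deadlock as soon as some clause has no satisfied literal."""
    for clause in clauses:
        answers = [literal_holds(lit, assignment) for lit in clause]
        if not any(answers):
            return True
    return False


def find_completion(num_vars, clauses):
    """Enumerate completions, i.e. one truth value per variable, and
    return the first one for which the environment never deadlocks."""
    for choice in product([True, False], repeat=num_vars):
        assignment = dict(zip(range(1, num_vars + 1), choice))
        if not environment_deadlocks(clauses, assignment):
            return assignment
    return None
```

The enumeration is exponential in the number of variables, which is consistent with the hardness result: the sketch only shows that a deadlock-free completion exists exactly when some assignment satisfies every clause.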

P is an automaton such that:

E is an automaton such that:

  • is the smallest set such that the following hold:

    • for and , if ,

    • for , if ,

    • and if ,

    • for and , if ,

    • for , if ,

    • and if , and

    • .

Note that P and E have size polynomial in the number of variables and clauses, and that they can be constructed in polynomial time.

For an example instance, the process automaton and the environment automaton are shown in Figure 8 and Figure 9, respectively.

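Figure 8: Process automaton before completion. Choosing an assignment to a Boolean variable of the SAT problem means choosing whether to add an input transition, labeled with the message for that variable, to the first state of the variable's pair (in which case one truth value is assigned to the variable) or to the second state (in which case the other truth value is assigned).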

Figure 9: Environment automaton, with one part per clause and states labeled deadlock and success.

We now prove the correctness of the reduction:

.0.1 3SAT ⇒ Problem 1:

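Assume that there is a truth assignment for the variables that satisfies all clauses.

Consider the set of transitions that contains, for each variable, the transition into the state of its pair that corresponds to the truth value given to the variable by the assignment.

Consider the completion of P with this set of transitions. It is easy to see that every transition leaving the state from which these transitions are added has a distinct label, since we add exactly one transition for each label. Therefore the completion is deterministic. We now show that its product with E has no reachable deadlock states.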

Let be the sequence of states in a run of . If is even then is of the form