Unbounded Software Model Checking with Incremental SAT-Solving
(This work was supported by the Baden-Württemberg Stiftung project HIVES.)
Abstract
This paper describes a novel unbounded software model checking approach to find errors in programs written in the C language based on incremental SAT-solving. Instead of using the traditional assumption-based API to incremental SAT solvers, we use the DimSpec format that is used in SAT-based automated planning. A DimSpec formula consists of four CNF formulas representing the initial, goal and intermediate states and the relations between each pair of neighboring states of a transition system. We present a new tool called LLUMC which encodes the presence of certain errors in a C program into a DimSpec formula, which can be solved by either an incremental SAT-based DimSpec solver or the IC3 algorithm for invariant checking. We evaluate the approach in the context of SAT-based model checking for both the incremental SAT-solving and the IC3 algorithm. We show that our encoding expands the functionality of bounded model checkers by also covering large and infinite loops, while still maintaining a feasible time performance. Furthermore, we demonstrate that our approach offers the opportunity to generate runtime optimizations by utilizing parallel SAT-solving.
1 Introduction
Software has become an important part of almost all modern technical devices, such as cars, airplanes, household appliances, therapy machines, and many more. The cars of tomorrow will drive on their own but will be controlled by software. As shown by serious accidents like the rocket crash of Ariane flight 501 [25], the massive overdoses of radiation generated by the therapy machine Therac-25 [24] or the car crash of the Toyota Camry in 2005 [23], software is never perfect; it almost always contains errors and bugs. While testing of software can only cover a limited number of program executions, software verification can guarantee a much higher coverage while producing proofs for the existence or absence of errors. There exist several different software verification approaches, for instance symbolic execution [21] and bounded model checking [13]. Bounded model checking inlines function calls and unrolls loops a finite number of times, say $k$ times, where $k$ is called the bound of the program. This unrolling reduces the complexity of the problem to a feasible level, though it limits the coverage and precision of these approaches.
To extend the functionality of bounded model checkers, we developed a novel unbounded model checking approach. To this end, we removed the bound that limits all bounded model checkers and created a transition system that is traversed by an incremental SAT-solver or an invariant checking algorithm. We focus on sequential programs written in C and use the low-level code representation of the compiler framework LLVM as an intermediate language. Based on this representation we derived an encoding of the program verification task into a DimSpec formula. A DimSpec formula uses four CNF formulas to specify a transition system and is often used in SAT-based automated planning. We first encode the program into an SMT formula and, subsequently, we generate the SAT problem in DimSpec format. The resulting DimSpec formula is then solved by either an incremental SAT-solver that unrolls the transition system to find a transition path to the error state or an invariant checking algorithm that refines an over-approximation of a transition path to the error state.
Our verification system uses Clang and LLVM version 3.7.1 to compile C code into the LLVM intermediate language. Then our new tool LLUMC (Low-Level Unbounded Model Checker) generates DimSpec formulas representing the presence of certain errors in the program. To solve the generated formulas [18] we either use the incremental SAT-solver IncPlan [7] or the invariant checking algorithm implemented in the solver MinireachIC3 [8]. LLUMC was inspired by the bounded model checker LLBMC [27] but runs independently. Our evaluation is based on the Software Verification Competition (SV-COMP) and shows the correctness and feasibility of our approach. LLUMC is available online at [3].
2 Preliminaries
We assume the reader is familiar with propositional logic, first-order logic and SAT, and we use definitions and notations standard in SAT. This section will introduce incremental SAT-solving and describe the theory of bitvectors in the context of SMT-solving. Furthermore, the software bounded model checking approach is briefly described.
2.0.1 Incremental SAT-Solving
In the assumption-based interface [16], two methods are used: add(C) and solve(A), where $C$ is a clause and $A$ a set of literals called assumptions. All clauses can be added with the add method, and their conjunction can then be solved under the condition that all literals in $A$ are true by solve(A). To add a removable clause $C$, we add $(C \vee \neg a)$, where $a$ is an unused variable. The clause is only relevant if we add the literal $a$ (called activation literal) to the assumptions $A$. If the activation literal is not added to the assumptions, $C$ is essentially removed from the set of clauses.
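The effect of an activation literal can be illustrated with a small sketch. The class `TinyIncrementalSolver` below is a hypothetical brute-force stand-in for an incremental SAT solver, written only for illustration; it is not the interface of any particular solver. Literals are non-zero integers in the usual DIMACS style.

```python
from itertools import product


class TinyIncrementalSolver:
    """Brute-force stand-in for an incremental SAT solver (illustration only)."""

    def __init__(self):
        self.clauses = []  # each clause is a list of non-zero ints

    def add(self, clause):
        """Permanently add a clause."""
        self.clauses.append(list(clause))

    def solve(self, assumptions=()):
        """Return True iff all clauses are satisfiable with every assumption true."""
        variables = sorted({abs(l) for c in self.clauses for l in c}
                           | {abs(l) for l in assumptions})
        for values in product([False, True], repeat=len(variables)):
            model = dict(zip(variables, values))

            def holds(lit):
                return model[abs(lit)] == (lit > 0)

            if all(holds(a) for a in assumptions) and \
               all(any(holds(l) for l in c) for c in self.clauses):
                return True
        return False


solver = TinyIncrementalSolver()
x, a = 1, 2             # a is a fresh activation variable
solver.add([x])         # permanent clause: x
solver.add([-x, -a])    # removable clause (not x), guarded by the activation literal a

with_clause = solver.solve(assumptions=[a])  # guard active: x and (not x) clash
without_clause = solver.solve()              # guard inactive: a may be set to false
```

With the activation literal assumed, the guarded clause is enforced and the formula is unsatisfiable; without it, the solver can satisfy the guard itself, so the clause no longer constrains the problem.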
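DimSpec Formulas A DimSpec formula represents a transition system with states $t_0, t_1, \dots$, where each state $t_i$ is a full truth assignment on $n$ Boolean variables $x_1, \dots, x_n$. It consists of four CNF formulas $I$, $G$, $U$ and $T$, where $I$ are the initial clauses, i.e., clauses satisfied by $t_0$, $G$ are the goal clauses satisfied by the final state $t_k$, the clauses $U$ are satisfied by each individual state $t_i$, and finally the transitional clauses $T$ are satisfied by each pair of consecutive states $t_i, t_{i+1}$. The clause sets $I$, $G$ and $U$ contain the variables $x_1, \dots, x_n$, and $T$ contains $x_1, \dots, x_{2n}$. Testing whether the goal state is reachable from the initial state within $k$ steps is equivalent to checking whether the following formula $F_k$ is satisfiable:
$$F_k = I(0) \wedge \Big( \bigwedge_{i=0}^{k-1} U(i) \wedge T(i, i+1) \Big) \wedge U(k) \wedge G(k)$$
where $I(i)$, $G(i)$, $U(i)$ and $T(i, i+1)$ denote the respective formulas where each variable $x_j$ is replaced by $x_{j+in}$. One way to find the smallest number of steps to reach the goal state from the initial state is to solve $F_0, F_1, F_2, \dots$ until a satisfiable formula is reached. An efficient way to implement this is to use an incremental SAT solver with the assumption-based interface via the following steps.
Step 0: add($I(0) \wedge U(0) \wedge (G(0) \vee \neg a_0)$), then solve($\{a_0\}$).
Step $k$: add($T(k-1, k) \wedge U(k) \wedge (G(k) \vee \neg a_k)$), then solve($\{a_k\}$).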
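A minimal sketch of such an incremental loop is given below. It assumes a DimSpec formula given as four clause lists over integer literals and uses a brute-force satisfiability check in place of a real incremental solver; the names `satisfiable`, `shift` and `min_steps` as well as the 2-bit counter example are hypothetical and chosen only for illustration. The clause set only grows, and the goal clauses of earlier steps are switched off by no longer assuming their activation literals.

```python
from itertools import product


def satisfiable(clauses, assumptions):
    """Brute-force check; stands in for solve(A) of an incremental SAT solver."""
    variables = sorted({abs(l) for c in clauses for l in c}
                       | {abs(l) for l in assumptions})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))

        def holds(lit):
            return model[abs(lit)] == (lit > 0)

        if all(holds(a) for a in assumptions) and \
           all(any(holds(l) for l in c) for c in clauses):
            return True
    return False


def shift(clauses, n, step):
    """Rename every variable x_j to x_{j + step*n}."""
    return [[(abs(l) + step * n) * (1 if l > 0 else -1) for l in c]
            for c in clauses]


def min_steps(n, init, goal, univ, trans, max_steps):
    """Smallest k <= max_steps for which the goal is reachable, else None."""
    first_activation = n * (max_steps + 1) + 1   # beyond all state variables
    added = shift(init, n, 0)                    # I(0)
    for k in range(max_steps + 1):
        if k > 0:
            added += shift(trans, n, k - 1)      # T(k-1, k)
        added += shift(univ, n, k)               # U(k)
        act = first_activation + k               # fresh activation variable a_k
        added += [c + [-act] for c in shift(goal, n, k)]  # G(k) guarded by a_k
        if satisfiable(added, [act]):
            return k
    return None


# 2-bit counter: x1 is the low bit, x2 the high bit, x3 and x4 their successors.
INIT = [[-1], [-2]]                      # start at 00
GOAL = [[1], [2]]                        # reach 11
UNIV = []                                # no universal constraints
TRANS = [[1, 3], [-1, -3],               # x3 = not x1
         [-4, 2, 1], [-4, -2, -1],       # x4 = x2 xor x1
         [4, -2, 1], [4, 2, -1]]

steps_needed = min_steps(2, INIT, GOAL, UNIV, TRANS, max_steps=5)
```

For the counter, three increments are needed to get from 00 to 11. The `max_steps` cut-off is an addition of this sketch; without it the loop would not terminate on unreachable goals.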
This algorithm works only if the goal state is reachable from the initial state, otherwise it does not terminate. A more sophisticated approach that can detect unreachability is described next.
IC3 algorithm A different approach to solving reachability in a transition system is described in [14] and implemented in the tool IC3 (Incremental Construction of Inductive Clauses for Indubitable Correctness). Given a transition system $S$ and a safety property $P$, the algorithm either proves that $P$ is $S$-invariant, meaning that $P$ is true in all reachable states of $S$, or produces a counterexample. IC3 incrementally refines a sequence of formulas $F_0, F_1, \dots, F_k$, where each $F_i$ is an over-approximation of the set of states reachable in at most $i$ steps. It can extend the formula sequence in major steps that increase $k$ by one. In minor steps the algorithm refines the approximations $F_i$ with $0 \le i \le k$ by conjoining clauses to $F_j$ with $j \le i$. Given a finite transition system $S$ and a safety property $P$, the IC3 algorithm terminates and returns true iff $P$ is true in all reachable states of $S$ [14]. The IC3 algorithm was implemented and adjusted to the DimSpec format in the tool MinireachIC3 by Suda [8]. (Footnote 1: The clause sets $I$, $U$ and $T$ represent the transition system $S$, and $G$ represents the negation of the invariant property $P$.)
2.0.2 Satisfiability Modulo Theories (SMT)
Due to quantifiers, first-order logic is generally undecidable, but there are numerous decidable subsets. The problem of solving those subsets or theories is called satisfiability modulo theories or SMT. There is a lot of research on various theories, for example the theories of arrays, bitvectors, floating points, heaps, linear arithmetic and many more. These theories can be seen as restrictions on possible models of first-order logic formulas [26]. In this paper, we will restrict ourselves to the theory of bitvectors. SMT was standardized by the SMT-LIB initiative [9]. We will use the same notations, especially when referring to SMT functions defined in the different theories. Such an SMT-LIB function could for example be $bvadd(x, y)$, describing the addition of two bitvectors $x$ and $y$. A more complex function is called if-then-else ($ite$) and is defined by:
$$ite(c, x, y) = \begin{cases} x & \text{if } c \text{ is true} \\ y & \text{otherwise} \end{cases} \qquad (1)$$
We refer to the theory of fixed-size bitvectors defined by the SMT-LIB standard in [9]. The theory of bitvectors models finite bitvectors of length $n$ and operations on these vectors in first-order logic. The set of function symbols contains standard operations on bitvectors, for example addition, multiplication, unsigned division, bitwise and, bitwise or, bitwise exclusive or, left shift, right shift, concatenation, and extraction of bitvectors.
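To make the semantics of a few of these operations concrete, the following sketch models bitvectors of length `n` as non-negative Python integers below $2^n$. The function names follow the SMT-LIB operators they imitate, but the Python signatures (in particular the explicit width parameters) are assumptions of this sketch.

```python
def bvadd(x, y, n):
    """Addition of two bitvectors of length n (wraps around modulo 2^n)."""
    return (x + y) % (1 << n)


def bvmul(x, y, n):
    """Multiplication of two bitvectors of length n (modulo 2^n)."""
    return (x * y) % (1 << n)


def extract(x, hi, lo):
    """Bits hi down to lo of x, as a bitvector of length hi - lo + 1."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def concat(x, y, len_y):
    """Concatenation: x forms the upper bits, y (of length len_y) the lower bits."""
    return (x << len_y) | y


def ite(c, x, y):
    """If-then-else as in Equation (1)."""
    return x if c else y
```

For instance, `bvadd(255, 1, 8)` wraps around to 0, which is the behavior the bit-level encoding of unsigned C arithmetic relies on.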
2.0.3 Software Bounded Model Checking
The general idea of bounded model checking (BMC) is to encode the states of a system and the transitions between them. Furthermore, every loop and function call is unrolled $k$ times. The number $k$ is called the bound and is the reason for the decidability of bounded model checking but also for its limitations. After the unrolling and encoding of the program, a formula that represents the negation of a desired property is added, and the formula is solved with an SMT or SAT solver. If the solver finds a model for the formula, the approach has found an error and the model can be used as a counterexample. The loop bound can be increased step by step until a fixed bound is reached. Thus, the counterexample is always minimal and easier to comprehend for a user. The question of the bound up to which loops should be unrolled is complex and is further discussed, for example, by Biere et al. [13].
Bounded model checking is implemented, for example, in the tool LLBMC (Low-Level Bounded Model Checker). It was developed by the research group "Verification meets Algorithm Engineering" at KIT with the aim of verifying safety-critical embedded systems [26]. To support large parts of the C and C++ languages, it uses the compiler framework LLVM as its foundation. With its algorithm, LLBMC achieves very good results and has earned a number of gold, silver and bronze medals in the Software Verification Competition (SV-COMP), which we will describe and refer to in our evaluation in Section 4. We will use LLBMC as a state-of-the-art reference to compare it to our approach.
2.0.4 LLVM Representation
LLVM is an open source compiler framework project that consists of a "collection of modular and reusable compiler and toolchain technologies" [1]. It supports compilation for a wide range of languages and is known for its research friendliness and good documentation. Working directly on C code is very complex, and it is nearly impossible to support all features and libraries. Thus, we use the intermediate language of LLVM, which describes the statements more directly and provides a number of optimizations and simplifications. We define an LLVM-module bottom up. The smallest executable unit is called an instruction. An instruction is an atomic unit of execution that performs a single operation. A basic block is a linear sequence of program instructions having one entry point and one exit point. It may have many predecessors and many successors and may be its own successor. The last instruction of every basic block is called the terminator. Every basic block is part of a function. A function is a tuple of a name, a sequence of basic blocks, and an entry block. Hereinafter, we refer to the main function of every program as main. A module is a pair of a set of function symbols and a set of global variable symbols.
To optimize our encoding, we run some predefined optimization passes from LLVM and LLBMC on the generated LLVM-module. Among other things, these optimizations remove undefined behavior in C code, promote memory references to register references and inline the program into one main function. These optimizations are described in more detail in [22]. The resulting LLVM-module is then used as input for our encoding.
3 LLUMC Encoding
A bug or error in a software program is a well-known notion, but there exists no universal definition. A general concept is that a program has an error if it does not act according to its specification. For our approach this definition is not specific enough. We will not cover all possible errors but concentrate on two main properties. One of them is the occurrence of an undefined overflow for the signed arithmetic operations addition, subtraction, multiplication and division. We define undefined overflows independently of the variable type and thus independently of the bitvector length representing the variable. Let $x$ be a variable in two's complement and let $n$ be the bit length of $x$; then $max(x)$ returns the maximal value for $x$, $max(x) = 2^{n-1} - 1$, and $min(x)$ returns the minimal value, $min(x) = -2^{n-1}$. In the C language, unsigned overflows are defined by a wrap-around. The addition of two unsigned integers $a$ and $b$ is, e.g., defined modulo $2^n$:
$$a +_u b := (a + b) \bmod 2^n$$
Thus, we can consider undefined overflows solely on signed variables.
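Definition 1 (Undefined Overflow)
Let $x, y$ be signed variables of length $n$, then an undefined overflow occurs, if
$x + y \notin [min, max]$,
$x - y \notin [min, max]$,
$x \cdot y \notin [min, max]$, or
$x / y \notin [min, max]$,
with $min = min(x) = min(y)$ and $max = max(x) = max(y)$.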
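Definition 1 can be checked directly on mathematical integers. The sketch below is only an illustration of the definition and not LLUMC's bit-level encoding; the helper names `signed_range`, `c_div` and `undefined_overflow` are hypothetical. Division truncates toward zero as in C, and division by zero is outside the definition and simply raises Python's `ZeroDivisionError`.

```python
def signed_range(n):
    """(min, max) of an n-bit two's complement variable."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1


def c_div(x, y):
    """Integer division truncating toward zero, as in C (y must be non-zero)."""
    q = abs(x) // abs(y)
    return q if (x >= 0) == (y >= 0) else -q


def undefined_overflow(op, x, y, n):
    """True iff the exact result of 'x op y' leaves the n-bit signed range."""
    lo, hi = signed_range(n)
    exact = {
        "add": lambda: x + y,
        "sub": lambda: x - y,
        "mul": lambda: x * y,
        "div": lambda: c_div(x, y),
    }[op]()
    return not (lo <= exact <= hi)


# Unsigned addition, in contrast, is defined by wrap-around modulo 2^n.
unsigned_sum = (250 + 10) % (1 << 8)
```

For 8-bit values, `undefined_overflow("add", 127, 1, 8)` holds, and the only overflowing division is `-128 / -1`, whose exact result 128 exceeds the maximum 127.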
The other property in our error definition concerns calls to assume and assert. A program acts according to its specification if the assert statements are true under the condition that the assume conditions are met. If the assume condition is not met, the further run of the program is not specified and thus no errors can occur. With these two properties in mind, we can define the term error for our approach.
Definition 2 (Program Error in LLUMC)
Let a program be given. Then there exists an error in the program if all calls to assume that are prior to an assert statement or possible overflow are true and one of the following holds.

An assertion is false: a call to assert with the parameter false.

The occurrence of an undefined overflow for an arithmetic operation.
Of course, there are other errors that can happen during a program execution, like irregular bit shifting, non-termination and many more. These errors can be regarded in future work; for the remainder of this paper the expression "error" refers to the above definition.
To find these errors we regard an LLVM-module as stated in Section 2.0.4. After inlining all function calls, we can concentrate on just the main function. Every basic block together with its variable assignment can be seen as a state. We then add a special error state and try to find a path from the entry state, defined by the entry block of the main function, to the error state. Therefore, we first define the state space of our encoding.
3.0.1 State Space
A transition from one state to the next will always represent the transition from one basic block to the next with respect to its current variable assignment. Often this kind of encoding is called small block encoding [11]. According to the theory of bitvectors, we define every state variable as a bitvector of a fixed length $n$. The number of bitvectors in the state, including the bitvectors representing the current and previous basic block, defines the number of SMT variables that are needed to encode the state, and the total number of bits is the number of CNF variables needed.
The focus on the theory of bitvectors allows us to ignore the state of the main memory and concentrate on the immediate LLVM-module. (Footnote 2: Generally, encoding the state of the main memory is not easily realized, and integrating a main memory model into our approach requires further research.) First of all, every state has to save the current basic block. Hereinafter, $m$ denotes the number of basic blocks of the main function. For our encoding we need two additional blocks. The ok block represents a safe state from which no more errors can occur. This block is reached when the program terminates with the output 0 or when an assume condition is not met. The second block is called error and is our goal state, representing that an error occurred. We uniquely map every basic block to a natural number. If there are $m$ basic blocks in the main function, then the bitvector needs to have the length $\lceil \log_2(m+2) \rceil$ to encode the current basic block. We call this variable:
$$curr$$
In LLVM the value of a register can depend on the previous basic block, which must thus also be encoded:
$$pred$$
Furthermore, we need to save the current variable assignment. We do not need the assignment of all variables, but should concentrate on those that will be accessed later on and cannot be optimized away. Those variables can be classified by two properties, and we call the set of those variables $V$:

Variables that are used in more than one basic block and

variables that are read before written in the same basic block, which is part of a loop.
It is enough to add only those variables to the state space, because all other variables are included during the encoding of the basic block that contains them and their value is not directly used for a transition step. The length of the variables depends on their type. The standard integer in C has a width of 32 bits, long has 64 and Boolean values have a width of 1. There are other types, but their length is always specified by LLVM and can thus be easily extracted.
Definition 3 (State)
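The state space is the set of bitvector variables $S = \{curr, pred\} \cup V$, where $curr$ and $pred$ encode the current and previous basic block and $V$ is the set of program variables selected above. Every variable $v \in S$ has a fixed bit length $n_v$ and can take on concrete bitvectors of length $n_v$ as values. For a specific time point $k$ the state $state(k)$ is the assignment of concrete bitvectors to every variable.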
3.0.2 Encoding to SMT
Our aim is to develop an encoding for an LLVM-module defined in Section 2.0.4 that fits the DimSpec format. Therefore, we must define the four CNF formulas in such a way that if there exists a transition from $I$ to $G$ defined by $T$ and restricted by $U$, then there exists an error in the given program code.
The initial formula can be created by encoding the entry block of the LLVM-module. Due to the restriction to the theory of bitvectors, global variables are not regarded, because they always include a memory access. The encoding has to represent the state that we are currently at the first basic block and that there were no prior actions. We declare the entry block itself as the predecessor to exclude any prior actions. The initial formula is thereby time-independent, because the entry block is the same for every time step. The rest of the variable assignment is arbitrary at this point and can be left undeclared.
Definition 4 (Encoding of Initial Formula)
Let entry be the name of the first block, then the initial formula for the LLVM-module and for time point $k$ is defined as:
$$I(k) := (curr(k) = entry) \wedge (pred(k) = entry)$$
The encoding of the goal formula is also time-independent and can be defined accordingly.
Definition 5 (Encoding of Goal Formula)
Let error be the name of the error block, then the goal formula for the LLVM-module and for time point $k$ is defined as:
$$G(k) := (curr(k) = error)$$
The universal formula consists of constraints that have to be true in all states. In our case, these are bounds for the variables $curr$ and $pred$. In the previous section, the number of bits needed to encode the current and previous basic block was defined as $\lceil \log_2(m+2) \rceil$, where $m$ is the number of basic blocks. In most cases $m+2$ is not a power of two and thus bigger numbers can be represented. These numbers must be excluded at all times in the universal formula $U$.
Definition 6 (Encoding of Universal Formula)
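Let $m$ be the number of basic blocks in the LLVM-module, numbered from $0$ to $m-1$, with ok and error receiving the numbers $m$ and $m+1$. Then the universal formula for time point $k$ is defined as:
$$U(k) := (curr(k) \le m+1) \wedge (pred(k) \le m+1)$$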
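The bound in the universal formula can be made concrete with a small sketch. It assumes, purely for illustration, that the $m$ basic blocks are numbered from 0 in their given order and that ok and error receive the next two numbers; the function `block_encoding` is hypothetical and not part of LLUMC.

```python
def block_encoding(block_names):
    """Number the blocks and report the bit width and the unused codes."""
    numbering = {name: i for i, name in enumerate(block_names)}
    m = len(block_names)
    numbering["ok"], numbering["error"] = m, m + 1     # two additional blocks
    bits = (m + 1).bit_length()                        # equals ceil(log2(m + 2))
    # Codes that fit into the bitvector but name no block; the universal
    # formula has to exclude them for both the current and the previous block.
    unused = [code for code in range(1 << bits) if code > m + 1]
    return numbering, bits, unused


numbering, bits, unused = block_encoding(["entry", "loop", "exit"])
```

With three basic blocks, five codes are needed, so three bits are used and the codes 5, 6 and 7 have to be ruled out in every state.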
At last, we have to define the transition formula. It represents the transition between state $state(k)$ and state $state(k+1)$. It is important to notice that the transition formula has twice as many variables as the other formulas. To distinguish between the variables at time points $k$ and $k+1$, every variable $v$ of our state space is called $v'$ at time point $k+1$. Otherwise, every transition formula would evaluate to false and thus no transition step could ever be taken. In general, the encoding of one transition has the form:
$$state(k) \rightarrow state(k+1) \qquad (2)$$
We call $state(k)$ the antecedent and $state(k+1)$ the consequent. For each $state(k)$ that is reachable from our initial state, a transition must be defined. An undefined transition leads to an undefined $state(k+1)$ with arbitrary values. Thus, if there is a reachable, undefined transition, all goal states can be reached. For the same reason, we determine that for each variable $v'$ the transition must be explicit. Variables that are not important for the transition should not be declared in the antecedent but should be specified in the consequent to avoid undefined values. We will use the auxiliary function
$$same(bb)$$
to encode that variables which are not modified in a basic block maintain their current value. The function returns the conjunction of all $v' = v$, for all variables $v$ in our state space that have not been modified in the transition of our basic block $bb$.
To encode the transition between steps, we take a closer look at the current basic block, further denoted as $bb$, and customize Equation 2 for different branching possibilities. We divide basic blocks into three groups and distinguish them by means of their terminator. Afterwards, we will have a special look at the function calls of assume, assert and error. These function calls together with the possibility of overflows will extend the encoding. The three different types of terminator instructions are called unconditional branching, conditional branching and return.
Unconditional branching (br %bb2): Branches to the basic block with the label bb2 and creates a transition from the current basic block $bb$ to bb2. If the current basic block has no other instructions, only the change of basic block and the saving of the predecessor have to be encoded. Furthermore, we have to state that no variables have changed during this transition:
$$(curr = bb) \rightarrow (curr' = bb2 \wedge pred' = bb \wedge same(bb)) \qquad (3)$$
This encoding is rarely complete, because it does not regard all other instructions in the basic block $bb$. Let $L$ be the ordered list of instructions from bottom to top in $bb$. Then we iterate over $L$ and regard all instructions that (1) are part of our state, (2) have not been visited before and (3) are not the terminator instruction.
The instruction is encoded according to its type and its operands. When an instruction such as an add is encoded, the algorithm checks the operands first. For each operand, the algorithm checks whether it is a variable that is part of our state or a value calculated by an instruction, which the algorithm has to encode recursively. The stop criterion is always the occurrence of a state variable, a constant, or a call to assert, assume or error. For the add instruction the encoding would result in a formula of the form $x' = bvadd(y, z)$, where $x$ is the state variable written by the instruction and $y$ and $z$ are its encoded operands. This generated SMT formula is then conjoined with the consequent of Equation 3. For arithmetic operations an additional overflow check formula, which is described later on, is inserted. The algorithm continues by iterating further through the list $L$ until there are no instructions left.
Conditional branching (br %cond, %bb1, %bb2):
Creates a transition to bb1 with the condition $cond$ and a transition to bb2 with the condition $\neg cond$.
Every conditional branch has a branching condition represented as a variable (%cond). We can extract that condition by visiting and encoding the variable representing the branching condition.
In LLVM the branching condition is a Boolean value that is assigned by the so-called icmp instruction. This instruction returns a Boolean value based on the comparison of two values, and it supports equality, unsigned and signed comparison. The icmp instruction is then encoded recursively by visiting its two operands with the same approach as described for the unconditional branching. The result could for example be a condition such as $bvslt(x, y)$, a signed less-than comparison of two state variables $x$ and $y$.
Based on this condition, the algorithm creates two separate transitions.
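The two kinds of branching transitions can be sketched on explicit states instead of formulas. In the following illustration a state is a Python dictionary with the entries `curr` and `pred` for the current and previous basic block plus one program variable `x`; each transition is an implication whose antecedent tests the current block (and, for conditional branches, the branching condition) and whose consequent sets the successor block, records the predecessor and keeps all unmodified variables. The helper `transition`, the toy loop and the 8-bit width are assumptions of this sketch, not LLUMC's implementation.

```python
WIDTH = 8  # bit length of the toy variable x


def transition(block, guard, target, updates=None):
    """One implication: (curr = block and guard) -> (curr' = target and pred' = block ...)."""
    updates = updates or {}

    def applies(state):
        return state["curr"] == block and guard(state)

    def successor(state):
        nxt = dict(state)                       # frame condition: unmodified variables keep their value
        for var, expr in updates.items():       # modified variables get their new value
            nxt[var] = expr(state)
        nxt["curr"], nxt["pred"] = target, block
        return nxt

    return applies, successor


TRANSITIONS = [
    # entry: x = 0; br %loop
    transition("entry", lambda s: True, "loop", {"x": lambda s: 0}),
    # loop: br (x < 3), %body, %exit  -> two separate transitions
    transition("loop", lambda s: s["x"] < 3, "body"),
    transition("loop", lambda s: not s["x"] < 3, "exit"),
    # body: x = x + 1; br %loop
    transition("body", lambda s: True, "loop",
               {"x": lambda s: (s["x"] + 1) % (1 << WIDTH)}),
]


def run(state, max_steps):
    """Follow enabled transitions until none applies or max_steps is reached."""
    trace = [state["curr"]]
    for _ in range(max_steps):
        enabled = [succ for applies, succ in TRANSITIONS if applies(state)]
        if not enabled:
            break
        state = enabled[0](state)
        trace.append(state["curr"])
    return state, trace


# The initial state fixes only curr and pred; x is arbitrary (here 7).
final_state, trace = run({"curr": "entry", "pred": "entry", "x": 7}, max_steps=20)
```

The guards of the two `loop` transitions are complementary, so exactly one of them is enabled in every state of that block, which mirrors the requirement that every reachable state has a defined transition.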
Return value (ret): The returned value can be an arbitrary integer and represents the return value of the program as usual. This terminator creates a transition to ok. In an extended and already implemented version, another check is inserted verifying that the result value of a correct program is 0, and if this does not hold a transition to error is created.
Now we have to look at the calls to assume, assert, error and the possibility of overflows. During the instruction iteration of a basic block, we regard these instructions differently because they lead to a split of our transitions.
Method calls (error, assume, assert): If the error method, which is used to specify program errors in C code, is called inside a basic block, we do not have to regard any other instructions and thus delete all other transitions from this basic block. We produce a single transition:
$$(curr = bb) \rightarrow (curr' = error \wedge pred' = bb \wedge same(bb))$$
The other three possibilities lead to a split of our transitions similar to the conditional branching. A call of assume($c$) divides the set of current transitions for our basic block. The condition is $\neg c$ and leads to a transition to the state ok with $curr' = ok$. The call to assert($c$) is similar, only with the transition to error if $\neg c$ holds true. In both cases, the encoding continues normally with the next instruction if the conditions are not met.
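The splitting behavior of assume and assert can be sketched on explicit states as follows. The function `encode_checks` is hypothetical; it assumes a basic block that contains only assume and assert calls, whose conditions are evaluated on the state at block entry, followed by an unconditional branch to `target`.

```python
def encode_checks(block, checks, target):
    """Successor function for a block with assume/assert calls.

    checks: list of (kind, condition) pairs in program order, where kind is
    "assume" or "assert" and condition maps a state to a Boolean.
    """
    def successor(state):
        nxt = dict(state)        # all program variables keep their value
        nxt["pred"] = block
        for kind, cond in checks:
            if not cond(state):
                # A failed assume leaves the specified behavior (ok);
                # a failed assert is an error.
                nxt["curr"] = "ok" if kind == "assume" else "error"
                return nxt
        nxt["curr"] = target     # no check fired: continue normally
        return nxt

    return successor


step = encode_checks(
    "check",
    [("assume", lambda s: s["x"] >= 0), ("assert", lambda s: s["x"] < 10)],
    target="exit",
)

after_bad_assume = step({"curr": "check", "pred": "entry", "x": -1})
after_bad_assert = step({"curr": "check", "pred": "entry", "x": 12})
after_all_good = step({"curr": "check", "pred": "entry", "x": 5})
```

Because the assume precedes the assert, a state that violates the assume condition moves to ok and can never reach error, in line with Definition 2.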
Overflow Checks: While calls to error, assume and assert are explicit calls in the LLVM-module, we have to recognize possible overflows while still encoding the operations correctly. Therefore, an overflow check is always inserted when the encoding is called on an arithmetic operation with the flag nsw (no signed wrap). In this case, we know that there is a signed operation with no defined wrap-around. If the condition for an overflow is true, we transition to the error state. We will give the formula for the signed addition; the formulas for subtraction, multiplication and division are similar and comply with the undefined overflow in Definition 1.
Addition: The result of adding up two positive numbers must always be positive, and the addition of two negative values must always result in a negative value. Whether the result is positive or negative can be seen by the sign bit. Starting with 0, we will refer to a single bit at position $i$ of a bitvector $x$ by $x[i]$. For a bitvector of length $n$, the sign bit has the index $n-1$. Let $r$ be the result of adding the two bitvectors $x$ and $y$, then the condition for an undefined overflow is defined by:
$$(x[n-1] = y[n-1]) \wedge (r[n-1] \neq x[n-1])$$
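The sign-bit condition can be compared against the range-based condition of Definition 1 by exhaustive enumeration for small bit lengths. The sketch below models bitvectors as non-negative Python integers; the helper names are hypothetical.

```python
def bit(x, i):
    """Bit at position i of the bitvector x (position 0 is the least significant bit)."""
    return (x >> i) & 1


def to_signed(x, n):
    """Two's complement value of an n-bit bitvector."""
    return x - (1 << n) if bit(x, n - 1) else x


def add_overflows(x, y, n):
    """Sign-bit condition: equal operand signs, but a different result sign."""
    r = (x + y) % (1 << n)          # result of the n-bit addition
    s = n - 1                       # index of the sign bit
    return bit(x, s) == bit(y, s) and bit(r, s) != bit(x, s)


def agrees_with_definition(n):
    """Compare the sign-bit condition with the range check for all n-bit inputs."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return all(
        add_overflows(x, y, n) == (not lo <= to_signed(x, n) + to_signed(y, n) <= hi)
        for x in range(1 << n)
        for y in range(1 << n)
    )
```

Calling `agrees_with_definition(n)` for small `n` confirms that both conditions select the same pairs of operands.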
All components of the transition formula have now been discussed. To obtain the complete transition formula, the algorithm has to iterate over all basic blocks of the main function. Depending on their terminator instruction, every basic block has to be encoded according to the definitions above. To predict which transition is taken in which step would be equal to solving the whole formula. Thus, the transition formula is time-independent and the transition possibilities for all time steps are part of the formula.
Definition 7 (Encoding of the Transition Formula)
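Let $B$ be the set of all basic blocks of the main function and let $enc(bb)$ with $bb \in B$ be the encoding as shown above, then the transition formula for the LLVM-module is defined by:
$$T := \bigwedge_{bb \in B} enc(bb) \qquad (4)$$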
Claim
Proof idea:
We forgo a formal proof, because it would require a structural induction over huge sets of C code and the LLVM language. Instead, we present short arguments and references for our claim.
(1): Using LLVM as a representation for C code is widely accepted and used in research and industry. We assume that the transformation from C code into an LLVM-module does not remove or add any errors, based on the high number of research papers [4, 6, 10] and tools like LLBMC [26] and SeaHorn [19].
(2): The error node has three types of incoming edges: from an assert statement, from an overflow check and an edge from the error node itself. We disregard the edge that points to itself and are left with the two options that match the properties defined in Definition 2. If the encoding of the variables is, as we claim, correct and our state space is closed under $T$ and $U$, we can assume that a transition path from the initial state to the error state corresponds to an error in the LLVM-module.
3.0.3 From SMT to SAT formula
The encoding of the LLVM-module gives us four SMT formulas. These formulas have to be translated into CNFs. The most widespread approach to transform SMT to CNF formulas is called bit-blasting. We have taken one approach implemented in STP [17] and the ABC library [20] and modified these algorithms to correspond to some technical requirements of the DimSpec format. Finally, a CNF in the DimSpec format is created that can be used as input for a number of SAT-solvers.
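The idea of bit-blasting can be illustrated on the addition of two bitvectors. The sketch below is a simplified Tseitin-style ripple-carry encoding written for this illustration; it is not the algorithm implemented in STP or ABC, and the names `Cnf`, `blast_bvadd` and `add_via_cnf` are hypothetical. Every gate output receives a fresh Boolean variable, and clauses tie it to its inputs.

```python
from itertools import product


class Cnf:
    """Collects clauses and hands out fresh Boolean variables."""

    def __init__(self):
        self.num_vars = 0
        self.clauses = []

    def new_var(self):
        self.num_vars += 1
        return self.num_vars

    def xor(self, a, b):
        z = self.new_var()  # z <-> (a xor b)
        self.clauses += [[-a, -b, -z], [a, b, -z], [a, -b, z], [-a, b, z]]
        return z

    def and_(self, a, b):
        z = self.new_var()  # z <-> (a and b)
        self.clauses += [[-a, -b, z], [a, -z], [b, -z]]
        return z

    def or_(self, a, b):
        z = self.new_var()  # z <-> (a or b)
        self.clauses += [[a, b, -z], [-a, z], [-b, z]]
        return z


def blast_bvadd(cnf, xs, ys):
    """Ripple-carry adder; xs and ys list the bit variables, least significant first."""
    result, carry = [], None
    for i, (a, b) in enumerate(zip(xs, ys)):
        t = cnf.xor(a, b)
        result.append(t if carry is None else cnf.xor(t, carry))
        if i < len(xs) - 1:  # the carry out of the top bit is dropped (modulo 2^n)
            g = cnf.and_(a, b)
            carry = g if carry is None else cnf.or_(g, cnf.and_(t, carry))
    return result


def add_via_cnf(x, y, n):
    """Fix the input bits, brute-force the remaining variables, read off the sum."""
    cnf = Cnf()
    xs = [cnf.new_var() for _ in range(n)]
    ys = [cnf.new_var() for _ in range(n)]
    rs = blast_bvadd(cnf, xs, ys)
    fixed = {v: bool((x >> i) & 1) for i, v in enumerate(xs)}
    fixed.update({v: bool((y >> i) & 1) for i, v in enumerate(ys)})
    free = [v for v in range(1, cnf.num_vars + 1) if v not in fixed]
    for values in product([False, True], repeat=len(free)):
        model = {**fixed, **dict(zip(free, values))}
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in cnf.clauses):
            return sum(int(model[v]) << i for i, v in enumerate(rs))
    return None
```

A real bit-blaster hands the clauses to a SAT solver instead of enumerating assignments; the brute-force search is used here only to keep the sketch self-contained.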
4 Experimental Results
The LLUMC approach is implemented as a toolchain. First, the input file in C code is compiled with Clang (version 3.7.1) and then optimized with LLVM and LLBMC passes. This optimized LLVM-module serves as input for the program LLUMC, which performs the encoding as described above. To transform the created SMT formulas into CNF formulas in DimSpec format, the tool STP was modified. The final renaming and aggregation is implemented directly in LLUMC. Thus, the tool produces a single CNF file in DimSpec format.
We tested two different approaches to solve the generated DimSpec/CNF formulas. The tool IncPlan [18] was developed at KIT and implements the incremental SAT-solving described in Section 2.0.1. It can be used with every SAT-solver that accepts the Reentrant Incremental Satisfiability Application Program Interface (IPASIR). We have tested IncPlan with a number of SAT-solvers including Minisat [28], abcdSat [15], Glucose [5] and PicoSat [12]. While Glucose and Minisat produced good results for some benchmarks, they crashed for a number of benchmarks, and thus we concentrated on the usage of abcdSat and PicoSat. The IC3 algorithm was implemented and adjusted to the DimSpec format in the tool MinireachIC3 by Balyo and Suda [8]. The safety property $P$ expresses that the error state should not be reachable and thus is given by $P = \neg G$. Thus, we are not only able to prove the existence of errors but also their non-existence.
4.1 Benchmarks
We evaluated our approach using benchmarks from the Software Verification Competition (SV-COMP) [10]. SV-COMP is an annual competition for academic software verification tools, with the aim of comparing software verifiers. The competition has been held every year since 2012. The verification tasks are divided into different topics and are contributed by a number of research and development groups. While we were not able to participate in the competition, the collected benchmarks serve as an excellent evaluation basis for every verifier. All benchmarks are available at [2], and we used the subfolder c with programs written in the language C.
We screened these benchmarks for tasks that match our theory of bitvectors. We excluded all benchmarks that include memory accesses or floating point arithmetic. Furthermore, we checked that all instructions used in the examples were implemented in LLUMC. It is notable that nearly all instructions were implemented and only the truncate instruction, which cuts the length of values, restricts the usable benchmarks. The truncate instruction is not included in most theories of bitvectors, e.g., in tools like LLBMC, because on a programming level there is not enough (signedness) information about the bitvector to truncate it easily. Lastly, we excluded recursive and concurrent tasks due to the inlining in our approach.
We evaluated our approach on 14 incorrect and 10 correct programs. Our approach creates a CNF formula representing the problem of finding a transition path to the error state. Thus, the desired result of our approach should be sat in case there exists an error and unsat if there is none. While most benchmarks are small and have the purpose of demonstrating the correctness of our approach, we were also able to evaluate our approach on some larger problems. The benchmarks vary between 14 and 646 lines of code (LoC) and between 151 and 116777 clauses. The evaluation was performed on a system with 64 CPUs at 2.4 GHz and 483 GB of memory; for our sequential approach, only one CPU was used. Each benchmark had a time limit of 600 seconds and a memory limit of 8 GB. The time needed to generate the CNF formula and to read and write CNF formulas in and out of files is negligible for larger problems. Thus, we decided to measure only the CPU time needed to solve the generated CNF formulas.
Table 3 displays the result of solving the generated DimSpec/CNF formulas with both the tool IncPlan and MinireachIC3. The results of running IncPlan with the SAT-solver abcdSat were most stable and are thus displayed. One can see that our approach generates correct encodings of the C code and that IncPlan is able to find a satisfying model representing a transition path to the error state for erroneous programs. We also recognize that for small problems the time and memory needed are insignificant and for larger problems they are still manageable. For programs without an error, IncPlan is not able to prove anything, but the timeouts indicate the correctness of our encoding. The jain benchmarks show that the number of iterations the SAT-solvers are able to perform in the given time depends on the complexity of the individual basic block and varies for all benchmarks.
Table 3: Results of IncPlan (with abcdSat) and MinireachIC3. ans: answer, T: CPU time in seconds, M: memory in MB, Its: iterations, TO: timeout.

| Benchmark | abcdSat ans | abcdSat T | abcdSat M | abcdSat Its | MinireachIC3 ans | MinireachIC3 T | MinireachIC3 M | MinireachIC3 Its |
|---|---|---|---|---|---|---|---|---|
| error | | | | | | | | |
| diamond false unreach call2 | sat | 3.95 | 130 | 27 | TO | 600 | 241.2 | / |
| implicitunsignedconversion false unreach call | sat | 0.01 | 0.1 | 4 | sat | 0.001 | 0.0 | 4 |
| jain 1 false no overflow | sat | 0.01 | 0.1 | 3 | sat | 0.007 | 0.0 | 3 |
| jain 2 false no overflow | sat | 0.01 | 0.1 | 3 | sat | 0.019 | 0.0 | 3 |
| jain 4 false no overflow | sat | 0.01 | 0.1 | 3 | sat | 0.023 | 0.0 | 3 |
| jain 5 false no overflow | sat | 0.01 | 0.1 | 3 | sat | 0.009 | 0.0 | 3 |
| jain 6 false no overflow | sat | 0.01 | 0.1 | 3 | sat | 0.024 | 0.0 | 3 |
| jain 7 false no overflow | sat | 0.01 | 0.1 | 3 | sat | 0.021 | 0.0 | 3 |
| signextension2 falseunreachcall | sat | 0.01 | 0.1 | 7 | sat | 0.002 | 0.0 | 7 |
| overflow false unreach call1 | TO | 600 | 899.0 | / | TO | 600 | 366.3 | / |
| overflow false unreach call1 smaller | sat | 11.34 | 113.0 | 507 | sat | 0.83 | 25.5 | 502 |
| overflow false unreach call1 smaller2 | sat | 20.28 | 234.7 | 720 | TO | 600 | 686.3 | / |
| s3 clnt 1 false unreach call true no overflow.BV.c.cil | sat | 198.93 | 940.5 | 68 | TO | 600 | 377 | / |
| s3 clnt 2 false unreach call true no overflow.BV.c.cil | sat | 0.79 | 57.4 | 4 | sat | 0.628 | 33.6 | 4 |
| no error | | | | | | | | |
| implicitunsignedconversion true unreach call | TO | 600 | 300.4 | 13089 | unsat | 0.002 | 0.0 | 4 |
| jain 1 trueunreachcall truenooverflow | TO | 600 | 1458.1 | 5595 | unsat | 0.008 | 0.0 | 4 |
| jain 2 trueunreachcall truenooverflow | TO | 600 | 2386.7 | 4230 | unsat | 0.030 | 0.0 | 6 |
| jain 4 trueunreachcall truenooverflow | TO | 600 | 2795.9 | 3273 | unsat | 0.047 | 0.0 | 6 |
| jain 5 trueunreachcall truenooverflow | TO | 600 | 955.2 | 3083 | unsat | 0.046 | 0.0 | 7 |
| jain 6 trueunreachcall truenooverflow | TO | 600 | 2447.0 | 2827 | unsat | 0.052 | 0.0 | 6 |
| jain 7 trueunreachcall truenooverflow | TO | 600 | 1760.7 | 2731 | unsat | 0.531 | 0.0 | 6 |
| signextension2 trueunreachcall | TO | 600 | 504.3 | 11995 | unsat | 0.002 | 0.0 | 6 |
| gcd 4 true unreach call true no overflow | TO | 600 | 2045 | 2249 | unsat | 6.853 | 25.7 | 11 |
| s3 srvr 1 true alt truenooverflow.BV.c.cil | TO | 600 | 1182.5 | 93 | TO | 600 | 446.5 | / |
MinireachIC3, in comparison, is able to prove not only the existence of errors but also their absence. For erroneous programs, the time difference between IncPlan and IC3 is negligible for smaller benchmarks. For some of the larger benchmarks, IC3 produces a timeout. In general, it is harder to prove the absence of errors than to prove their existence. To prove the existence of an error, the solver only needs to find one valid transition path to the error label, whereas proving the absence of an error requires excluding all possible transition paths to the error label. This difference in complexity is visible in Table 3. The "jain false" and "jain true" benchmarks differ only in a slightly changed assert statement, yet proving the absence of an error always takes more time than proving its existence; in the case of "jain 7", it takes about 25 times longer.
Having evaluated the feasibility of our approach, we compare LLUMC with the state-of-the-art bounded model checker LLBMC in Figure 1. When comparing an unbounded model checker like LLUMC with a bounded model checker, we have to choose a bound up to which the bounded model checker unrolls the program. If the bound is set too low, LLBMC runs very fast but has a high chance of producing incorrect results; if the bound is set too high, LLBMC needs a long time to encode and solve the formula. We tested LLBMC with bounds of 10, 100, and 1000 and compared it with our results generated by IncPlan and MinireachIC3.
Figure 1 shows how the solving time depends on the chosen bound. Setting the bound to 10 leads to a very fast solving process, but fewer problems can be solved than with a bound of 100. Setting the bound to 1000 results in timeouts for more complex benchmarks and thus reduces the number of solved problems. After some overhead for smaller problems, solving the benchmarks with IncPlan and abcdSat leads to good results, but because it can only find errors and cannot prove their absence, it cannot solve as many benchmarks as MinireachIC3. The IC3 algorithm can solve 20 out of 24 benchmarks and has a performance advantage compared to all other approaches.
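The factor quoted for jain 7 can be checked against the MinireachIC3 times in Table 3 (0.531 s for the correct variant, 0.021 s for the erroneous one):

```latex
\[
\frac{T_{\text{unsat}}}{T_{\text{sat}}}
  = \frac{0.531\,\text{s}}{0.021\,\text{s}}
  \approx 25.3
\]
```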
The experimental evaluation illustrates the correctness of our approach for a wide variety of problems. Furthermore, it indicates that the time needed for most problems is reasonable. For model checking in general, the scalability for large programs is always a challenge.
5 Conclusion and Future Work
We introduced a novel unbounded model checking approach that uses the DimSpec format to find errors in software or prove their absence. We have developed a new encoding from C code to a CNF formula in the DimSpec format. Using the intermediate language LLVM, we transform the existence of an error in C code into four SMT formulas representing the problem of finding a transition path from the initial state of the program to a defined error label. By means of an AIG-supported bit-blasting algorithm, the four SMT formulas are then transformed and combined into one CNF in DimSpec format. The encoding has been implemented in the tool LLUMC, and we have evaluated it using both the incremental SAT-solving algorithm implemented in the tool IncPlan and the invariant checking algorithm implemented in MinireachIC3. Based on benchmarks from SV-COMP, the evaluation shows that we extend the functionality of current solvers to infinite loops while providing correct results, and that our approach is comparable to state-of-the-art solvers regarding solving time.
Transforming C code and the existence of errors into CNF formulas in DimSpec format opens up a wide range of possibilities to solve the given problem. While we tested incremental SAT solving and the invariant checking algorithm of IC3, our approach can also benefit from advances in parallel SAT solving. IncPlan can be run with parallel SAT solvers as backend tools, and IC3 was designed to fit both sequential and parallel SAT solving.
In addition to parallel solving, the performance of the LLUMC approach can also be improved by enlarging the incremental steps of the solver. A first evaluation shows that merging basic blocks in LLVM leads to performance improvements, indicating that a large block encoding could be advantageous. Furthermore, the functionality of the approach can be extended. As a next step, an implementation of other theories like the theory of arrays would make LLUMC usable on a greater range of programs.
References
 [1] The LLVM compiler infrastructure, http://llvm.org/, accessed on November 1, 2017
 [2] SV-Benchmarks, https://github.com/sosylab/svbenchmarks/, accessed on October 12, 2017
 [3] LLUMC (Low Level Unbounded Model Checker) (2017), https://github.com/MarkoKleineBuening/llumc, [revision hash: 94e93beecdc7d82bdf6366aef92a7cc7b3ee89a3]
 [4] Albarghouthi, A., Li, Y., Gurfinkel, A., Chechik, M.: UFO: A framework for abstraction- and interpolation-based software verification. In: Computer Aided Verification. pp. 672–678. Springer (2012)
 [5] Audemard, G., Simon, L.: Glucose in the SAT 2014 competition. SAT Competition 2014 p. 31 (2014)
 [6] Babic, D., Hu, A.J.: Structural abstraction of software verification conditions. In: CAV. pp. 366–378. Springer (2007)
 [7] Balyo, T., Gocht, S.: Accelerating SAT based planning with incremental SAT solving (2017)
 [8] Balyo, T., Suda, M.: Reachlunch entering the Unsolvability IPC 2016. Unsolvability IPC: planner abstracts pp. 3–5 (2016)
 [9] Barrett, C., Stump, A., Tinelli, C., et al.: The SMT-LIB standard: Version 2.0. In: Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (Edinburgh, England). vol. 13, p. 14 (2010)
 [10] Beyer, D.: Second competition on software verification (summary of SV-COMP 2013). In: TACAS. vol. 7795, pp. 594–609. Springer (2013)
 [11] Beyer, D., Cimatti, A., Griggio, A., Keremoglu, M.E., Sebastiani, R.: Software model checking via large-block encoding. In: Formal Methods in Computer-Aided Design, 2009. FMCAD 2009. pp. 25–32. IEEE (2009)
 [12] Biere, A.: PicoSAT essentials. Journal on Satisfiability, Boolean Modeling and Computation 4, 75–97 (2008)
 [13] Biere, A., Cimatti, A., Clarke, E.M., Strichman, O., Zhu, Y.: Bounded model checking. Advances in Computers 58, 117–148 (2003)
 [14] Bradley, A.R.: SAT-based model checking without unrolling. In: VMCAI. vol. 6538, pp. 70–87. Springer (2011)
 [15] Chen, J.: MiniSAT BCD and abcdSAT: Solvers based on blocked clause decomposition. SAT Race (2015)
 [16] Eén, N., Sörensson, N.: An extensible SAT-solver. In: International Conference on Theory and Applications of Satisfiability Testing. pp. 502–518. Springer (2003)
 [17] Ganesh, V., Dill, D.L.: A decision procedure for bit-vectors and arrays. In: CAV. vol. 4590, pp. 519–531. Springer (2007)
 [18] Gocht, S.: Incremental SAT solving for SAT based planning (2017)
 [19] Gurfinkel, A., Kahsai, T., Navas, J.A.: SeaHorn: A framework for verifying C programs (competition contribution). In: TACAS. pp. 447–450 (2015)
 [20] Jha, S., Limaye, R., Seshia, S.: Beaver: Engineering an efficient SMT solver for bit-vector arithmetic. In: Computer Aided Verification. pp. 668–674. Springer (2009)
 [21] Khurshid, S., Păsăreanu, C.S., Visser, W.: Generalized symbolic execution for model checking and testing. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 553–568. Springer (2003)
 [22] Kleine Büning, M.: Unbounded software model checking with incremental SAT-solving (2017)
 [23] Koopman, P.: A case study of Toyota unintended acceleration and software safety. Presentation. Sept (2014)
 [24] Leveson, N.G., Turner, C.S.: An investigation of the Therac-25 accidents. Computer 26(7), 18–41 (1993)
 [25] Lions, J.L., et al.: Ariane 5 flight 501 failure (1996)
 [26] Merz, F.: Theory and Implementation of Software Bounded Model Checking. Ph.D. thesis, Karlsruher Institut für Technologie (KIT) (2016)
 [27] Merz, F., Falke, S., Sinz, C.: LLBMC: Bounded model checking of C and C++ programs using a compiler IR. Verified Software: Theories, Tools, Experiments pp. 146–161 (2012)
 [28] Sörensson, N., Eén, N.: MiniSat v1.13: a SAT solver with conflict-clause minimization. SAT 2005(53), 1–2 (2005)
Appendix
.1 Running Example
To illustrate the transformation from C code into an LLVM module and later on into a CNF formula, we demonstrate the encoding on an example. The example was taken from the benchmark verification tasks of the Competition on Software Verification (SV-COMP). It can be found under the category bitvector-loops. Example 1 iterates through a while loop until x is smaller than 10. In every iteration, the value 2 is added to the even number x. At first glance, the loop will never terminate, but after a high number of iterations an overflow occurs and the value becomes smaller than 10, while still being an even number. The maximal value of a 32-bit unsigned integer is the odd number 2^32 - 1 = 4294967295. After a high number of loop iterations, the value of x would be 4294967294. The addition of 2 would then result in 0, and thus the assert condition fails, because 0 is still an even number. This example shows the limitations of bounded model checkers, because they only unroll the loop up to a specific bound, which is often not high enough to find errors like these.
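The wrap-around behind this example, and the reason a small unrolling bound misses it, can be reproduced with a short simulation. The following Python sketch is illustrative only and is not part of LLUMC or LLBMC; the function name `bounded_check`, the status strings, and the start value (taken from the constant mentioned in the description of Example 2) are assumptions made for this sketch.

```python
UINT_BITS = 32
MASK = (1 << UINT_BITS) - 1  # 4294967295, the largest 32-bit unsigned value


def bounded_check(start, bound):
    """Unroll `while (x >= 10) x += 2;` at most `bound` times, then check x % 2.

    Returns a (status, iterations) pair. The status strings are labels chosen
    for this sketch, not the output of any real tool.
    """
    x = start & MASK
    iterations = 0
    while x >= 10:
        if iterations == bound:
            # The loop has not terminated within the bound: nothing is proven.
            return ("bound too small", iterations)
        x = (x + 2) & MASK  # unsigned addition wraps around modulo 2**32
        iterations += 1
    # The assertion of the example requires x % 2 to be non-zero.
    if x % 2 == 0:
        return ("assertion violated", iterations)
    return ("assertion holds", iterations)


if __name__ == "__main__":
    start = 4294967194  # assumed start value, see the lead-in
    for bound in (10, 100, 1000):
        print(bound, bounded_check(start, bound))
```

With the assumed start value, the loop needs 51 iterations before x wraps around to 0, so a bound of 10 is inconclusive, while bounds of 100 and 1000 expose the violated assertion.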
Example 1 (C code)
The LLUMC approach takes this C code file and transforms it into an LLVM module.
Example 2 (LLVM module before optimizations)
We regard only the main function of the LLVM module. This function consists of four basic blocks. Variables are marked by a %, representing register variables. The first basic block assigns the constant 4294967194 to the variable x. It first allocates the needed space before storing the value into the register variable. The second basic block represents the condition of the while loop in Example 1. It loads the value of x into a register variable, and the instruction icmp checks whether it is greater than or equal to (uge) 10. Depending on the result, the branching instruction (br) jumps to the third or fourth basic block. The third basic block, representing the loop body, adds the constant 2 to x. The fourth basic block checks the assert condition (x%2) by first computing the remainder of the unsigned division of x by 2 (urem) and then calling the assertion function with the result as a parameter.
In theory, one could work directly on this LLVM module, but it is more efficient and easier to first run some predefined optimization passes from LLVM and LLBMC. In the first step, we remove undefined values in LLVM. Furthermore, the optimization mem2reg promotes memory references to register references. The pass called inline tries to inline all functions bottom-up into the main function. Afterwards, the two passes instnamer and simplifycfg simplify the program. After running these optimizations on Example 2, we get the following LLVM module function as input for the LLUMC approach.
Example 3 (Optimized LLVM module)
The effect of the LLVM passes can be seen by comparing the resulting main function with Example 2. The result of the instnamer pass is visible in the naming of basic blocks and variables. The mem2reg pass replaced all allocate, store, and load instructions with the phi instruction. Hence, the value of the variable is set either to 4294967194 when coming from the entry block or to the previously computed sum. The inlining pass inlined the assertion function and checks in line 15 whether the assert condition was true (1) or false (0).
.2 Encoding of the example as described in the paper
The state space of this example consists of the two variables and with a bit length of four. Furthermore, the variable with a bit length of 32 is added to the state space, because it occurs in the basic block and also in . The SMT function represents the modulo calculation and the function assigns values to the basic block as follows:
().
The encoding algorithm iterates over all basic blocks of the LLVM module and encodes them as described in the paper. The encoding of the example leads to the following formulas, which are then transformed into CNF formulas by an AIG-based approach.
Initial Formula:
Goal Formula:
Universal Formula:
Transition Formula:
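The roles of the initial, goal, and transition formulas can be illustrated on the running example with an explicit-state search. The sketch below is only an illustration under stated assumptions: it enumerates states directly instead of building CNF formulas or calling a SAT solver, it does not model the universal formula, it uses a reduced bit width so that the state space stays small, and the block names as well as the function names `successors` and `check` are hypothetical.

```python
def successors(state, width, start):
    """Transition relation: map a (block, x) state to its successor states."""
    block, x = state
    mask = (1 << width) - 1
    if block == "entry":
        return [("cond", start & mask)]    # x is initialised
    if block == "cond":
        return [("body", x)] if x >= 10 else [("end", x)]
    if block == "body":
        return [("cond", (x + 2) & mask)]  # wrapping unsigned addition
    if block == "end":
        # The assertion x % 2 fails exactly when x is even.
        return [("error", x)] if x % 2 == 0 else []
    return []                              # the error state has no successors


def check(width, start):
    """Search step by step for a path from the initial state to the error state.

    Returns ("sat", k) if the error state is reached after k steps, or
    ("unsat", k) once no new state can be reached any more.
    """
    frontier = [("entry", 0)]              # initial state
    seen = set(frontier)
    steps = 0
    while frontier:
        if any(block == "error" for block, _ in frontier):  # goal condition
            return ("sat", steps)
        next_frontier = []
        for state in frontier:
            for succ in successors(state, width, start):
                if succ not in seen:
                    seen.add(succ)
                    next_frontier.append(succ)
        frontier = next_frontier
        steps += 1
    return ("unsat", steps)


if __name__ == "__main__":
    print(check(8, 154))  # even start value, assumed for illustration
    print(check(8, 155))  # odd start value, assumed for illustration
```

For an even start value the error state is reached, which corresponds to the answer sat. For an odd start value the search exhausts all reachable states without reaching the error state, which corresponds to the answer unsat and is the kind of result that a bounded unrolling cannot deliver.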
.3 Details from the Experimental Evaluation
Details about the benchmarks used for the experimental evaluation are given in table format. Furthermore, detailed evaluation results are displayed.
| Benchmark | Result | Time/s | Memory/MB | Phases |
|---|---|---|---|---|
| error | | | | |
| diamond false unreach call2 | timeout | 600 | 241.2 | / |
| implicitunsignedconversion false unreach call | sat | 0.001 | 0.0 | 3 |
| jain 1 false no overflow | sat | 0.007 | 0.0 | 2 |
| jain 2 false no overflow | sat | 0.019 | 0.0 | 2 |
| jain 4 false no overflow | sat | 0.023 | 0.0 | 2 |
| jain 5 false no overflow | sat | 0.009 | 0.0 | 2 |
| jain 6 false no overflow | sat | 0.024 | 0.0 | 2 |
| jain 7 false no overflow | sat | 0.021 | 0.0 | 2 |
| signextension2 falseunreachcall | sat | 0.002 | 0.0 | 5 |
| overflow false unreach call1 | timeout | 600 | 366.3 | / |
| overflow false unreach call1 smaller | sat | 0.83 | 25.5 | 502 |
| overflow false unreach call1 smaller2 | timeout | 600 | 686.3 | / |
| s3 clnt 1 false unreach call true no overflow.BV.c.cil | timeout | 600 | 377 | / |
| s3 clnt 2 false unreach call true no overflow.BV.c.cil | sat | 0.628 | 33.6 | 3 |
| no error | | | | |
| implicitunsignedconversion true unreach call | unsat | 0.002 | 0.0 | 3 |
| jain 1 trueunreachcall truenooverflow | unsat | 0.008 | 0.0 | 5 |
| jain 2 trueunreachcall truenooverflow | unsat | 0.030 | 0.0 | 5 |
| jain 4 trueunreachcall truenooverflow | unsat | 0.047 | 0.0 | 5 |
| jain 5 trueunreachcall truenooverflow | unsat | 0.046 | 0.0 | 6 |
| jain 6 trueunreachcall truenooverflow | unsat | 0.052 | 0.0 | 5 |
| jain 7 trueunreachcall truenooverflow | unsat | 0.531 | 0.0 | 5 |
| signextension2 trueunreachcall | unsat | 0.002 | 0.0 | 5 |
| gcd 4 true unreach call true no overflow | unsat | 6.853 | 25.7 | 10 |
| s3 srvr 1 true alt truenooverflow.BV.c.cil | timeout | 600 | 446.5 | / |
| Benchmark | LLBMC bound 10 | Time/s | LLBMC bound 100 | Time/s | LLBMC bound 1000 | Time/s | LLUMC abcdSat | Time/s | LLUMC MinireachIC3 | Time/s |
|---|---|---|---|---|---|---|---|---|---|---|
| error | | | | | | | | | | |
| diamond false unreach call2 | sat | 0.017 | sat | 0.112 | sat | 1.194 | sat | 0.1 | sat | 0.1 |
| implicitunsignedconversion false unreach call | sat | 0.002 | sat | 0.002 | sat | 0.002 | sat | 0.01 | sat | 0.001 |
| jain 1 false no overflow | i.b. | 0.008 | i.b. | 0.031 | i.b. | 0.237 | sat | 0.01 | sat | 0.007 |
| jain 2 false no overflow | i.b. | 0.009 | i.b. | 0.038 | i.b. | 0.316 | sat | 0.01 | sat | 0.019 |
| jain 4 false no overflow | i.b. | 0.010 | i.b. | 0.044 | i.b. | 0.381 | sat | 0.01 | sat | 0.023 |
| jain 5 false no overflow | i.b. | 0.005 | i.b. | 0.019 | i.b. | 0.134 | sat | 0.01 | sat | 0.009 |
| jain 6 false no overflow | i.b. | 0.012 | i.b. | 0.051 | i.b. | 0.429 | sat | 0.01 | sat | 0.024 |
| jain 7 false no overflow | i.b. | 0.008 | i.b. | 0.054 | i.b. | 0.449 | sat | 0.01 | sat | 0.021 |
| signextension2 falseunreachcall | sat | 0.002 | sat | 0.002 | sat | 0.003 | sat | 0.01 | sat | 0.002 |
| overflow false unreach call1 | sat | 0.007 | sat | 0.007 | sat | 0.007 | timeout | 600 | timeout | 600 |
| overflow false unreach call1 smaller | sat | 0.027 | sat | 0.029 | sat | 0.030 | sat | 11.34 | sat | 0.83 |
| overflow false unreach call1 smaller2 | sat | 0.249 | sat | 0.264 | sat | 0.260 | sat | 20.28 | timeout | 600 |
| s3 clnt 1 false unreach call true no overflow.BV.c.cil | i.b. | 0.338 | sat | 8.347 | timeout | 600 | sat | 198.93 | timeout | 600 |
| s3 clnt 2 false unreach call true no overflow.BV.c.cil | i.b. | 0.295 | sat | 5.336 | sat | 222.414 | sat | 0.79 | sat | 0.628 |
| no error | | | | | | | | | | |
| implicitunsignedconversion true unreach call | no error | 0.002 | no error | 0.002 | no error | 0.002 | timeout | 600 | unsat | 0.002 |
| jain 1 trueunreachcall truenooverflow | i.b. | 0.008 | i.b. | 0.031 | i.b. | 0.239 | timeout | 600 | unsat | 0.008 |
| jain 2 trueunreachcall truenooverflow | i.b. | 0.009 | i.b. | 0.039 | i.b. | 0.316 | timeout | 600 | unsat | 0.030 |
| jain 4 trueunreachcall truenooverflow | i.b. | 0.011 | i.b. | 0.045 | i.b. | 0.370 | timeout | 600 | unsat | 0.047 |
| jain 5 trueunreachcall truenooverflow | i.b. | 0.005 | i.b. | 0.020 | i.b. | 0.128 | timeout | 600 | unsat | 0.046 |
| jain 6 trueunreachcall truenooverflow | i.b. | 0.010 | i.b. | 0.051 | i.b. | 0.407 | timeout | 600 | unsat | 0.052 |
| jain 7 trueunreachcall truenooverflow | i.b. | 0.013 | i.b. | 0.054 | i.b. | 0.451 | timeout | 600 | unsat | 0.531 |
| signextension2 trueunreachcall | no error | 0.002 | no error | 0.002 | no error | 0.002 | timeout | 600 | unsat | 0.002 |
| gcd 4 true unreach call true no overflow | no error | 0.006 | no error | 0.020 | no error | 0.132 | timeout | 600 | unsat | 6.853 |
| s3 srvr 1 true alt truenooverflow.BV.c.cil | i.b. | 1.802 | i.b. | 266.09 | i.b. | 600 | timeout | 600 | timeout | 600 |