Graph-based Reduction of Program Verification Conditions
Abstract
Increasing the automaticity of proofs in deductive verification of C programs is a challenging task. When applied to industrial C programs, known heuristics for generating simpler verification conditions are not efficient enough. This is mainly due to the size of the verification conditions and to a high number of irrelevant hypotheses.
This work presents a strategy to reduce program verification conditions by selecting their relevant hypotheses. The relevance of a hypothesis is determined by the combination of a syntactic analysis and two graph traversals. The first graph is labeled by constants and the second one by the predicates in the axioms. The approach is applied on a benchmark arising in industrial program verification.
J.-F. Couchot
LIFC
University of Franche-Comté
F-25030 Besançon
couchot@lifc.univ-fcomte.fr
A. Giorgetti
INRIA CASSIS / LIFC
University of Franche-Comté
F-25030 Besançon
alain.giorgetti@univ-fcomte.fr
N. Stouls
AMAZONES INRIA / CITI
INSA Lyon
F-69621 Villeurbanne
Nicolas.Stouls@insa-lyon.fr
D.2.4 [Software Engineering]: Software/Program Verification
Verification, Experimentation

Proof, hypothesis selection
Deductive software verification aims at verifying program properties with the help of theorem provers. It has gained more interest with the increased use of software embedded in, for instance, airplane commands, cars or smart cards, which requires a high level of confidence.
In the Hoare logic framework, program properties are expressed by first-order logical assertions on program variables (preconditions, postconditions, invariants, …). The deductive verification method consists in transforming a program, annotated with sufficiently many assertions, into so-called verification conditions (VCs) that, when proved, establish that the program satisfies its assertions. In the KeY system [?] a special-purpose logic and calculus are used to prove these verification conditions. The drawback of this approach is that it is specific to a programming language and a target prover. In contrast, a multi-prover approach is followed by effective tools such as ESC/Java [?] for Java programs annotated using the Java Modeling Language [?], Boogie [?] for the C# programming language, and Caduceus/Why [?] for C programs. The latter also offers Java as an input programming language.
A theorem prover is invoked to establish the validity of each verification condition. One of the challenges in deductive software verification is to automatically discharge as many verification conditions as possible. A key issue is that the whole context of a verification condition is a huge set of axioms modelling not only the property and the program under verification, but also many features of the programming language. Simply passing this large context to an automated prover induces a combinatorial explosion, preventing the prover from terminating in reasonable time.
Possible solutions to reduce VC size and complexity are to optimize the memory model (e.g. by introducing separations of zones of pointers [?]), to improve the weakest precondition calculus [?], and to apply strategies for simplifying VCs [?, ?, ?]. This work focuses on the latter. We suggest heuristics to select the axioms fed to automated theorem provers (ATPs). Instead of blindly invoking ATPs on a large VC, we present reduction strategies that significantly prune their search space. The idea behind these strategies is quite natural: an axiom is relevant if a prover applies it successfully, i.e. without diverging, to establish the conclusion. Relevance criteria are computed by the combined traversal of two graphs representing symbol dependencies within axioms. In the graph of constants, edges represent the joint presence of two constants in some ground axiom. In the graph of predicates, arcs represent logical dependencies between predicates occurring in the same axiom.
In former work [?], selection was limited to ground hypotheses and comparison predicates were not taken into account. This led to unsatisfactory results, for instance when the conclusion is some equality between terms. The present work extends selection to context axioms, comparison predicates and hypotheses with quantifiers. We propose new heuristics that increase the number of automatically discharged VCs.
The plan of the article is as follows. The next section presents the industrial C example that motivated this work: a part of the Oslo [?] secure bootloader annotated with a safety property. We then present the general structure of a verification condition, show how dependencies are stored in graphs, and describe the selection strategy of hypotheses. These last two parts are the first contribution. The second contribution is the implementation of this strategy as a module of Caduceus/Why [?]. Experimentation results are then presented, and the article ends with related work, conclusions and future work.
Some new challenges for axiom filtering are posed by the context of the PFC project on Trusted Computing (TC). PFC (meaning trusted platforms in French) is one of the SYSTEM@TIC Paris Region French cluster projects. The main idea of the TC approach is to gain some confidence about the execution context of a program. This confidence is obtained by construction, by using a trusted chain: a chain of executions where each launched program is previously registered with a tamper-proof component, such as the Trusted Platform Module (TPM) hardware chipset. In this context of TC, we focus on the Oslo [?] secure loader. This program is the first step of a trusted chain. It uses hardware functionalities of recent CPUs (AMD-V or Intel TET technologies) to initialize the chain and to launch the first program of the chain.
The main trusted chain properties are temporal, but some recent works [?, ?] propose a method to translate a temporal property into first-order logic annotations in the code. This method is systematic and generates a large number of VCs, including quantifications and arrays with many links between them. Therefore, this approach is a good generator of VCs with a medium or low level of automaticity. The table below gives some factual information about the studied part of Oslo. The VCs of this benchmark are publicly available [?].
Table: Some Metrics about the Oslo Program

  Oslo program and specification:
    Code            1500 lines
    Specification   1500 lines (functional)
    Number of VCs   7300
  Observed part of Oslo:
    Observed code    218 lines
    Specification   1400 lines (functional and generated)
    Number of VCs    771

The verification conditions (VCs) we consider are first-order formulae whose validity implies that a piece of annotated source code satisfies some property. This section describes the general structure of the VCs generated by Caduceus/Why. A VC is composed of a context and a goal. This structure is illustrated in the figure below.
Figure: Structure of verification conditions — a context made of axioms, and a goal made of hypotheses and a conclusion.

The context depends on the programming language. It is a first-order axiomatization of the language features used in the program under verification. Typical features are data types or a memory model, enriched to allow the specification of, e.g., separated pointer regions. For instance, a typical VC produced by Caduceus/Why has a context with more than 80 axioms.
VCs are generated in the input format of many first-order ATPs, among which Simplify [?] and SMT solvers [?]. The Simplify automatic prover has a specific input language. SMT solvers such as Alt-Ergo and Yices have a common input language. Alt-Ergo, however, is addressed in the Why input language for more efficiency. For SMT solvers, the context is presented as a base theory, usually a combination of equality with uninterpreted function symbols and linear arithmetic, extended with a large set of specific axioms.
The goal depends on the program and on the property under verification. When this property is an assertion about a given program control point, the goal is generated by the weakest precondition (wp) calculus of Dijkstra [?] at that control point. The goal is considered as a conclusion implied by hypotheses that encode the program execution up to the control point.
Running example.
Consider the following function:

    struct p { int x; } p;
    struct t { struct p v[2]; } t;

    /*@ requires \valid(a) &&
      @   (\forall int i; 0 <= i <= 1 => \valid(a->v[i]))
      @ assigns a->v[0].x
      @*/
    void f(struct t *a) { a->v[0].x = 2; }
The requires annotation specifies a precondition and the assigns annotation means that function f modifies no other location than a->v[0].x. The hypotheses of the generated VC encode this precondition and the memory model, and the conclusion encodes the assigns clause.
The meaning of these formulae is as follows. A dedicated constant denotes the pointer memory for the structures of type p. The predicate valid_acc means that a memory is initialized, i.e. that this memory is accessible from any valid pointer in the allocation table. The first two hypotheses correspond to the precondition. In the next two hypotheses, the predicates valid_acc_range and separation1_range respectively mean that any access to the memory returns an array whose cells are valid and pairwise separated pointers. The last three hypotheses come from a flattening-like decomposition of the statement a->v[0].x = 2 performed by the VC generator. An access function allows access to a given index in an array. The conclusion translates the assigns annotation into a relation between two memory values: the value of the memory before the execution of f and its value after the execution of f. Its third parameter is the representation of a->v[0]. Our preprocessor eliminates the last three hypotheses and the intermediary constants that they introduce, by inlining them into the conclusion.
Basically, a conclusion is a propositional combination of potentially quantified predicates over some terms. Dependencies between axioms and the conclusion can then arise from terms and predicates. Terms in the goal may either come from the annotated program (from statements or assertions) or may result from a weakest precondition calculus applied to the program and its assertions. The term dependency just transcribes that parts of the goal (in particular, hypotheses and conclusion) share common terms; it is presented first below. Two predicates are dependent if there is a deductive path leading from one to the other; the predicate dependency is presented next, followed by a special dependency analysis for comparison predicates.
In order to describe how hypotheses connect terms together, and following previous work [?], an undirected connected graph is constructed by syntactic analysis of term occurrences in each hypothesis of a VC. The graph vertices are labeled with the constants occurring in the goal and with new constants resulting from the following flattening-like process. A fresh constant f_i, where i is some unique integer, is created for each compound term f(t_1, …, t_n) in the goal. There is a graph edge between the vertex created for a compound term and the vertex of each of its arguments: the argument's own constant if the argument is a constant, or the fresh constant created for the argument if it is itself a compound term.
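The flattening process above can be sketched as follows. The term encoding (nested tuples) and the function name `flatten` are illustrative assumptions, not the tool's actual data structures.

```python
import itertools

def flatten(term, edges, fresh=itertools.count()):
    """Return a vertex label for `term`. A compound term gets a fresh
    constant f_i; an edge links it to the vertex of each argument."""
    if isinstance(term, str):              # a constant labels its own vertex
        return term
    f, *args = term                        # compound term ('f', t1, ..., tn)
    node = f"{f}_{next(fresh)}"            # fresh constant f_i
    for a in args:
        edges.add(frozenset((node, flatten(a, edges, fresh))))
    return node

# acc(shift(t, a), 0), in the spirit of the running example, yields
# vertices acc_0 and shift_1 plus edges to t, a and 0.
edges = set()
root = flatten(('acc', ('shift', 't', 'a'), '0'), edges)
```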
Running example.
An excerpt of the graph representing the VC of the running example is given in the corresponding figure. The vertices shift_6 and acc_7 come from the second hypothesis and the other vertices come from the conclusion.
A weighted directed graph is constructed to represent implication relations between predicates in an efficient way. Intuitively, each graph vertex represents a predicate name, and an arc from a vertex p to a vertex q means that p may imply q. What follows are details on how to compute this graph of predicates. This section describes the general approach; the next section adds a special treatment for comparison predicates.
First, each context axiom is decomposed into conjunctive normal form (CNF). This is done in a straightforward way (in contrast to optimised CNF decomposition [?]): axioms are short and their transformation into CNF does not yield a combinatorial explosion. The resulting clauses are called axiom clauses. Each graph vertex is labeled with a predicate symbol that appears in at least one literal of the context. If a predicate p appears negated (as ¬p) in an axiom clause, it is represented by a vertex labeled with ¬p. A clause is considered as a set of literals. For each axiom clause and each pair of distinct literals in this clause, an arc is added to the graph depending on the polarities of the two literals. There are three distinct cases modulo symmetry, enumerated in the table below, where p and q are two distinct predicates. To reduce the graph size, the contrapositive of each implication is not represented as an arc in the graph, but is taken into account when traversing it, as detailed later.
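The straightforward CNF decomposition can be sketched as follows; the formula representation (tagged tuples) and function names are our own, not the tool's.

```python
# Formulas: ('lit', name, positive) | ('and', a, b) | ('or', a, b) | ('imp', a, b)
def neg(f):
    """Push a negation through a formula."""
    tag = f[0]
    if tag == 'lit':
        return ('lit', f[1], not f[2])
    if tag == 'and':
        return ('or', neg(f[1]), neg(f[2]))
    if tag == 'or':
        return ('and', neg(f[1]), neg(f[2]))
    if tag == 'imp':                       # not (a => b)  =  a and not b
        return ('and', f[1], neg(f[2]))

def cnf(f):
    """Naive CNF: a list of clauses, each a set of (name, positive) literals."""
    tag = f[0]
    if tag == 'lit':
        return [{(f[1], f[2])}]
    if tag == 'imp':                       # a => b  becomes  not a or b
        return cnf(('or', neg(f[1]), f[2]))
    if tag == 'and':
        return cnf(f[1]) + cnf(f[2])
    if tag == 'or':                        # distribute or over and
        return [c1 | c2 for c1 in cnf(f[1]) for c2 in cnf(f[2])]

# valid_acc => (valid and valid_range) yields two binary axiom clauses
clauses = cnf(('imp', ('lit', 'valid_acc', True),
                      ('and', ('lit', 'valid', True), ('lit', 'valid_range', True))))
```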
Table: Translating Pairs of Literals into Arcs. The intended meaning of an arc weight is that the lower the weight, the higher the probability of establishing the target from the source. Therefore, the arc introduced for a pair (¬p, q) following the table is labeled with the number of predicates minus one in the clause under consideration. For instance, a large clause with many negative literals, ¬p among them, and with many consequents, q among them, is less useful for a deduction step leading to q than the smaller clause ¬p ∨ q. Finally, two weighted arcs with the same source and target are replaced with a single arc carrying the minimum of the two weights.
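A minimal sketch of the arc construction, restricted to the mixed-polarity case (¬p, q) described above; the other cases of the table and the data layout are simplified assumptions.

```python
def predicate_graph(clauses):
    """Map (src, dst) -> weight. For a clause containing a negative literal
    ~p and a positive literal q, add an arc p -> q weighted by the clause
    size minus one; parallel arcs keep the minimum weight."""
    arcs = {}
    for clause in clauses:
        w = len(clause) - 1                # lower weight = smaller clause
        for (p, sp) in clause:
            for (q, sq) in clause:
                if (p, sp) != (q, sq) and not sp and sq:   # ~p and q
                    key = (p, q)
                    arcs[key] = min(arcs.get(key, w), w)
    return arcs

# The binary clause gives weight 1, overriding the ternary clause's weight 2.
arcs = predicate_graph([{('valid_acc', False), ('valid', True)},
                        {('valid_acc', False), ('valid', True), ('sep', True)}])
```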
Running example.
The corresponding figure represents the dependency graph for the definitions of three memory-model predicates. It is an excerpt of the graph representing the memory model of Caduceus/Why.
In a former work [?], equalities and inequalities were ignored when memorizing predicate dependencies. This leads to unsatisfactory results when (in)equality is central for deduction, e.g. when the conclusion is some equality between terms. If we handle equality like the other predicates, the process described above connects too many vertices with the vertex labeled =. We have observed that this reduction of the graph diameter has a negative impact on the quality of selection.
More generally, the present section suggests a special construction of graph vertices and edges for comparison predicates. A comparison predicate is an equality (=), an inequality (≠), a (reflexive) order relation (≤ or ≥) or an irreflexive preorder (< or >). The keys of this construction are the support of types and the exploitation of some causalities between comparison predicates.
Each comparison predicate is written ⋈_τ, where ⋈ is =, ≠, ≤, <, ≥ or > and τ is the type of the operands. For simplicity, the focus is on the types τ where ≤ and < are total (reflexive and strict) orders, ≥ and > are their respective reverse orders, and ≠ is the union of < and >. A typical example is the type int of integers.
Each comparison predicate ⋈_τ present in at least one axiom is represented by two nodes, respectively labeled with ⋈_τ and ¬⋈_τ, where ¬=, ¬≠, ¬≤, ¬<, ¬≥ and ¬> respectively are ≠, =, >, ≥, < and ≤. For instance, the two nodes ≤_int and >_int represent a total order on integers and its negation. These labels are called the typed comparison predicates.
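The negation mapping can be sketched as a simple table; the node-label encoding is illustrative.

```python
# Negation map for comparison operators: the node for "not (x <= y)"
# is labeled with the reverse strict order ">", and so on.
NEG = {'=': '!=', '!=': '=', '<=': '>', '<': '>=', '>=': '<', '>': '<='}

def comparison_nodes(op, typ):
    """The two graph nodes for a typed comparison predicate and its negation."""
    return (f"{op}_{typ}", f"{NEG[op]}_{typ}")

nodes = comparison_nodes('<=', 'int')   # the pair for a total order on int
```

Note that the map is an involution: negating twice gives back the original operator, which matches the convention that ¬¬p is p used during graph traversal.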
Apart from this difference in the definition of the vertices, the arcs connected to typed comparison predicates are constructed following the general rules described in the table above.
Verification conditions are expressed as SMT problems in the AUFLIA logic [?]. Since the comparison predicates between integers are interpreted in AUFLIA, no context axiom contributes to their definition. The figure below suggests such a list of axioms. To lighten the figure, the predicates are not indexed with int.
Adding these axioms to the context would be counterproductive. We propose instead to analyze them in order to enrich the predicate graph as if they were in the context. Since the axiom selection algorithm does not take loops into account, the only arcs of interest in the predicate graph are those between distinct nodes. It is therefore pointless to proceed in this way for internal properties like reflexivity, irreflexivity, symmetry or transitivity. This is the reason why the figure is limited to axioms between distinct predicates. The symmetric axioms, where ≥ and > respectively replace ≤ and <, are also treated but are not reproduced. The arcs resulting from the application of the rules of the table above to those ten axioms are added to the graph of predicates.
Figure: Some Axioms Relating Comparison Predicates

Relevant axioms remain to be selected. Intuitively, an axiom is relevant with respect to a conclusion if a proof that needs this axiom can be found. Variables and predicates included in a relevant axiom are also called relevant.
We first show how to select relevant constants, then how to select relevant predicates, and then how to combine these results to select relevant axioms. The selection strategy is finally presented as an algorithm.
A node in the graph of constants is identified with its labeling constant. Let D be the diameter of the graph of constants. Starting from the set C_0 of constants occurring in the conclusion, a breadth-first search algorithm computes the sets C_i of constants reachable from C_0 in at most i steps (1 ≤ i ≤ D). Finally, unreachable constants are added at the limit of the sequence for completeness. Let C_∞ be the resulting set.
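A sketch of this breadth-first computation of the sets C_i; the edge-list encoding of the graph is an assumption made for illustration.

```python
def reachable_sets(edges, conclusion_consts):
    """C_0, C_1, ...: constants reachable from the conclusion in at most
    i steps of the undirected constant graph; unreachable constants are
    appended in a final set for completeness."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels = [set(conclusion_consts)]
    seen = set(conclusion_consts)
    frontier = set(conclusion_consts)
    while frontier:
        frontier = {n for u in frontier for n in adj.get(u, ()) if n not in seen}
        if frontier:
            seen |= frontier
            levels.append(seen.copy())      # C_i = C_{i-1} plus the new frontier
    unreachable = set(adj) - seen
    if unreachable:
        levels.append(seen | unreachable)   # the completed limit set
    return levels

# a-b-c is a chain reachable from the conclusion constant a; d-e is not.
levels = reachable_sets([('a', 'b'), ('b', 'c'), ('d', 'e')], ['a'])
```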
To introduce more granularity in the computation of reachable constants, we propose as a heuristic to insert nodes that are linked several times before nodes that are linked just once. Semantically, it gives priority to constants which are closer to the conclusion. Notice that, in this case, the index i of C_i no longer corresponds to a path length.
A predicate p is identified with the vertex labeled p and its negation with the vertex labeled ¬p in the graph of predicates. A predicate symbol p is relevant w.r.t. a predicate symbol q if there is a path from p to q in the graph, or dually from ¬q to ¬p. Intuitively, the lower the path weight, the higher the probability that p establishes q. Relevant predicates extracted from the graph are stored into an increasing sequence of sets P_i. The natural number i is the maximal weight of the paths considered in the graph of predicates.
We now present how P_i is computed. The conclusion is assumed to be a single clause. P_0 gathers the predicates from the conclusion. For each predicate symbol p that is not in P_0, a graph traversal computes the paths with minimal weight from p to some predicate in P_0.
Furthermore, the contrapositive of each implication is considered: let l and l' be two node labels, each corresponding either to a positive or a negative literal. If the arc l → l' is taken into account, its counterpart ¬l' → ¬l is too, with the convention that ¬¬p is p. Let d_max be the minimal distance from P_0 to the deepest reachable predicate. For 1 ≤ i ≤ d_max, P_i is the set of vertices of the predicate graph whose distance to P_0 is less than or equal to i. P_∞ is the limit P_{d_max} augmented with the vertices from which P_0 is not reachable.
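This traversal can be sketched as a Dijkstra-style search from the conclusion predicates over reversed arcs, with the contrapositive of every arc added on the fly; the `~` label prefix and all names are illustrative assumptions.

```python
import heapq

def negate(lbl):
    """Label of the negated literal, with the convention that ~~p is p."""
    return lbl[1:] if lbl.startswith('~') else '~' + lbl

def distances_to_conclusion(arcs, conclusion_preds):
    """Minimal path weight from every predicate to some conclusion predicate,
    following arcs p -> q and their contrapositives ~q -> ~p; computed as a
    shortest-path search from the conclusion over reversed arcs."""
    rev = {}
    for (p, q), w in arcs.items():
        rev.setdefault(q, []).append((p, w))                 # reverse of p -> q
        rev.setdefault(negate(p), []).append((negate(q), w)) # reverse of ~q -> ~p
    dist = {p: 0 for p in conclusion_preds}
    heap = [(0, p) for p in conclusion_preds]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                        # stale queue entry
        for v, w in rev.get(u, ()):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Two chained arcs: valid_acc -> valid (weight 1), valid -> eq (weight 2).
dist = distances_to_conclusion({('valid_acc', 'valid'): 1,
                                ('valid', 'eq'): 2}, ['eq'])
```

With these distances, P_i simply collects the predicates whose distance is at most i.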
Running example.
From the predicate graph of the running example, depicted in the corresponding figure without the comparison predicates for lack of space, the first five sets of reachable predicates can be computed.
In this section, we present the main principles of axiom selection, combining predicate and constant selection. A first part describes hypothesis selection and a second one extends the approach to axioms from the context.
Let (P_i) and (C_j) respectively be the sequences of relevant predicate and constant sets. Let p be a counter which represents the depth of predicate selection. Similarly, let c be a counter corresponding to the depth of constant selection.
Let H be a clause from a hypothesis. Let C(H) be the set of constants of H, augmented with the constants resulting from flattening (as in the construction of the graph of constants). Let P(H) be the set of predicates of H. The clause should be selected if it includes constants or predicates that are relevant according to the conclusion. Different criteria can be used to check this against the sets C_c and P_p. Possible choices are, in increasing order of selectivity:

1. the clause includes at least one relevant constant or one relevant predicate: C(H) ∩ C_c ≠ ∅ or P(H) ∩ P_p ≠ ∅;

2. the clause includes more than a threshold of relevant constants or more than a threshold of relevant predicates;

3. all the clause constants and clause predicates are relevant: C(H) ⊆ C_c and P(H) ⊆ P_p.
Our experiments with these criteria have shown that too weak a criterion does not accomplish what it is designed for: too many clauses are selected after few iterations, making the prover quickly diverge. Thus, we only consider the strongest criterion (3).
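Criterion (3) amounts to two subset checks; a minimal sketch, with illustrative set-based inputs:

```python
def select_clause(clause_consts, clause_preds, relevant_consts, relevant_preds):
    """Strongest criterion (3): keep a hypothesis clause only if *all* of
    its constants and predicates are currently marked relevant."""
    return (set(clause_consts) <= set(relevant_consts)
            and set(clause_preds) <= set(relevant_preds))

# Kept: every constant and predicate is relevant at the current depths.
keep = select_clause({'a', 't'}, {'valid'}, {'a', 't', 'p0'}, {'valid', 'eq'})
# Dropped: constant z is not (yet) relevant.
drop = select_clause({'a', 'z'}, {'valid'}, {'a', 't', 'p0'}, {'valid', 'eq'})
```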
We have also often observed the case where only a conjunctive part of a universally quantified hypothesis is relevant. In that case, we split the conjunctive hypothesis into its parts and the filtering criterion is applied to the resulting predicates. A particular case is considered if a whole splittable hypothesis is relevant according to the criterion. Indeed, we then consider the original formula, in order to preserve its structure, which can be exploited by provers.
Consider now the case of selecting relevant axioms from the context. Intuitively, an axiom of the context has to be selected if one of the predicate relations it defines is relevant for one hypothesis, i.e. if the corresponding arc is used in the computation of the sets P_i. Practically, for each arc that is passed through while generating these sets, we keep all the axioms of the context that have generated this arc.
Figure: General Algorithm for Discharging a VC with Axiom Selection

The selection strategy experimented in this work is described by this algorithm, which takes three input parameters:

a VC whose satisfiability has to be checked,

a satisfiability solver Prover, and

a maximal amount of time given by the user to the satisfiability solver to discharge the VC.
The algorithm starts with a first attempt to discharge the VC without axiom selection. It stops if this first result is unsatisfiable or satisfiable. Notice that in the latter case, removing axioms cannot modify the result. Otherwise, Prover is called following an incremental constantfirst selection.
The two natural numbers bounding p and c are computed during the predicate graph and constant graph traversals. Since we want to reach P_∞ and C_∞, these bounds are initially computed by the tool as one plus the minimal depth needed to obtain all reachable predicates and constants. This depth is interpreted by the tool as selecting all predicates and constants of the graphs.
The selection function implements the selection of axioms (from context or hypotheses) according to the strongest criterion (3). Discharging the resulting reduced VC into a prover can yield three outcomes: satisfiable, unsatisfiable or timeout.

If the formula is declared to be unsatisfiable, the procedure ends. Adding more axioms cannot make the problem satisfiable.

If the formula is declared to be satisfiable, we may have omitted some axioms; we are then left to increment either p or c, i.e. to enlarge either the set of selected predicates or the set of selected constants.
However, allowing new predicates has a more critical impact than allowing new constants, since constants do not appear in context axioms. Therefore we recommend first incrementing c, up to its bound, before considering incrementing p. In this latter case, c resets to 0.

If the formula is not discharged within the given time after p and c have been iteratively incremented up to their bounds, then the algorithm terminates with a timeout.
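The overall loop can be sketched as follows, with stub prover and selection functions for illustration; the names `pmax`/`cmax` and the three-valued prover outcome are assumptions consistent with the description above.

```python
def discharge(vc, prover, timeout, pmax, cmax, select):
    """Incremental constant-first strategy: first try the full VC, then
    reduced VCs, raising the constant depth c before the predicate
    depth p (c resets to 0 whenever p is incremented)."""
    result = prover(vc, timeout)
    if result in ('unsat', 'sat'):       # axiom selection cannot change this verdict
        return result
    for p in range(pmax + 1):            # predicate depth, incremented last
        for c in range(cmax + 1):        # constant depth, incremented first
            result = prover(select(vc, p, c), timeout)
            if result == 'unsat':        # the reduced VC is proved
                return result
            # 'sat' or 'timeout': enlarge the selection and retry
    return 'timeout'

# Stubs: only the VC reduced at depths p=1, c=0 is provable.
calls = []
def prover(vc, t):
    calls.append(vc)
    return 'unsat' if vc == 'vc[p=1,c=0]' else 'timeout'
def select(vc, p, c):
    return f'vc[p={p},c={c}]'

outcome = discharge('full-vc', prover, 10, pmax=2, cmax=2, select=select)
```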
The proposed approach is included in a global context of annotated C program certification. A separation analysis that strongly simplifies the verification conditions generated by a weakest precondition calculus, and thus greatly helps to prove programs with pointers, has been proposed by T. Hubert and C. Marché [?]. Their approach is supported by the Why tool. The pruning heuristics presented here are developed as a postprocess of this tool.
The following sections give some implementation and experimentation details, then present experimental results on an industrial case study for trusted computing. This case study raises new challenges associated with the certification of C programs annotated with a temporal logic formula. Results obtained on a public benchmark are finally given.
All the strategies presented in this work are implemented in OCaml, as modules of the Why [?] tool, in less than 1700 lines of code. Since these criteria are heuristics, their use is optional, and Why has command-line arguments which allow a user to enable or disable them. In the current version, several other heuristics have been developed; they are not considered here because their impact on the performance of Why seems less obvious. In order to use the presented algorithms, the arguments to include in the Why call are:
  --prune-with-comp --prune-context --prune-coarse-pred-comp
  --prune-vars-filter CNF
The first option includes comparison predicates in the predicate dependency graph. The second one requires filtering not only hypotheses but also axioms from the context. The third one ignores arc weights; this option gives better execution times on the Oslo benchmark. Finally, the fourth one requires rewriting hypotheses into CNF before filtering.
The whole experiment is done on an Intel T8300 at 2.4 GHz with 4 GB of memory, under x86_64 Ubuntu Linux.
First of all, among the 771 generated VCs, 741 are directly discharged, without any axiom selection. Next, the approach developed in [?] increases the result to 752 VCs.
Among the remaining unproved VCs, some rely on quantified hypotheses and others need comparison predicates, neither of which is handled in the previous work [?]. They have motivated the present extensions, namely CNF reduction, comparison handling and context reduction. Thanks to these improvements, 10 more VCs are automatically proved by using the algorithm described above with the three provers Simplify, Alt-Ergo 0.8 and Yices 1.0.20, with a timeout of 10 seconds.
The p and c limits depend on the VCs, and their observed values remain small. These limits express the number of reduced versions into which the VCs have been cut. If edge weights are considered, then the predicate depth limit grows and the execution time is twice as long. These results are summed up in the corresponding figure.
Our approach is developed in the Why tool, which translates Why syntax into the input syntax of several proof assistants (Coq, HOL 4, HOL Light, Isabelle/HOL, Mizar, PVS) and automated theorem provers (Alt-Ergo, CVC3, Simplify, Yices, Z3). This section shows some experimental results on the Why public benchmark (http://proval.lri.fr/whybenchmarks/).
The Why benchmark is a public collection of VCs generated by Caduceus or Krakatoa. These tools generate VCs from C and Java programs respectively, according to CSL and JML specifications. Hence, it partially matches our requirements, since our work focuses on the verification of VCs generated by these tools. The only limitation is that our method targets VCs with a large number of hypotheses, in contrast to the ones presented in this benchmark.
This benchmark is provided in two versions corresponding to two different preprocesses. Our results are similar with both versions. Alt-Ergo discharges 1260 VCs directly and 1297 VCs with axiom selection, while axiom selection adds 3 VCs to the 1310 VCs directly discharged by Simplify.
We have presented a new strategy to select relevant hypotheses in formulae coming from program verification. To do so, we have combined two separate dependency analyses based on graph computation and graph traversal. Moreover, we have given some heuristics to analyse the graphs with a sufficient granularity. Finally, we have shown the relevance of this approach on a benchmark issued from real industrial code.
Strategies to simplify the prover's task have been widely studied for as long as automated provers have existed [?], mainly in order to propose more efficient deductive systems [?, ?, ?]. The KeY deductive system [?] is an extreme case. It is composed of a large list of special-purpose rules dedicated to JML-annotated JavaCard programs. These rules make an explicit axiomatization of data types, the memory model and program execution unnecessary. Priorities between deduction rules help in effective reasoning. Beyond this, choosing rules in that framework requires as much effort as choosing axioms when targeting general-purpose theorem provers.
The present work can be compared with the set of support (sos) selection strategy [?, ?]. This approach starts by asking the user to provide an initial sos: classically, the negation of the conclusion and a subset of the hypotheses. It then only applies inferences with at least one clause in the sos, consequences being added to the sos in turn. Our work can also be viewed as an automatic guess of the initial sos guided by the formula to prove. In this sense, it is close to [?], where initial relevant clauses are selected according to syntactic criteria, i.e. by counting matching rates between symbols of any clause and symbols of clauses issued from the conclusion. By considering syntactic filtering on clauses issued from axioms and hypotheses, this latter work does not consider the relations between hypotheses formalized by axioms of the theory: it provides a reduced forward proof. In contrast, by analyzing dependency graphs, we simulate natural deduction and are not far from backward proof search. By focusing on the predicative part of the verification condition, our objectives are dual to those developed in [?]: that work concerns boolean verification conditions with any boolean structure, whereas we treat predicative formulae whose symbols are axiomatized in a quantified theory. Even with a large set of context axioms, most of the time each verification condition only requires a tiny portion of this context. In [?, ?] a strategy to select relevant context axioms is presented, but it needs a preliminary manual task of classifying axioms. Our predicate graph computation makes this axiom classification automatic. Recent advances have been made in the direction of semantic selection of axioms [?, ?]. Briefly speaking, at each iteration, the selection of an axiom depends on whether a computed valuation is a model of the axiom or not. By comparison, our syntactic axiom selection is more efficient, being linear in the size of the input formula.
In the near future we plan to apply the strategy to other case studies. We also plan to investigate how various strategies discharging the same list of verification conditions affect execution time. In particular, we want to confirm or refute on other benchmarks the observation that weighting predicate dependencies by formula length has no positive impact on automaticity but has a significant negative impact on execution time. Finally, we plan to integrate selection strategies into the Why tool or into a target automated theorem prover.
This work is partially funded by the French Ministry of Research, through the RNTL (Réseau National des Technologies Logicielles) project CAT (C Analysis Toolbox), by the SYSTEM@TIC Paris Region French cluster, through the PFC project (Plateforme de Confiance, trusted platforms), and by INRIA, through the CASSIS project and the CeProMi ARC. The authors also want to thank Christophe Ringeissen and the four anonymous referees for their insightful comments.
 [1] M. Barnett, K. R. M. Leino, and W. Schulte. The Spec# Programming System: An Overview. In Construction and Analysis of Safe, Secure, and Interoperable Smart Devices (CASSIS’04), volume 3362 of Lecture Notes in Computer Science, pages 49–69. Springer, 2004.
 [2] B. Beckert, R. Hähnle, and P. H. Schmitt, editors. Verification of Object-Oriented Software: The KeY Approach, volume 4334 of Lecture Notes in Computer Science. Springer-Verlag, 2007.
 [3] B. Kauer. OSLO: Improving the security of Trusted Computing. In 16th USENIX Security Symposium, August 6-10, 2007, Boston, MA, USA, 2007.
 [4] L. Burdy, Y. Cheon, D. Cok, M. Ernst, J. Kiniry, G. T. Leavens, K. R. M. Leino, and E. Poll. An overview of JML tools and applications. Technical Report NIII-R0309, Dept. of Computer Science, University of Nijmegen, 2003.
 [5] J.-F. Couchot and T. Hubert. A Graph-based Strategy for the Selection of Hypotheses. In FTP 2007 - International Workshop on First-Order Theorem Proving, Liverpool, UK, Sept. 2007.
 [6] L. M. de Moura, B. Dutertre, and N. Shankar. A tutorial on satisfiability modulo theories. In W. Damm and H. Hermanns, editors, CAV, volume 4590 of Lecture Notes in Computer Science, pages 20–36. Springer, 2007.
 [7] D. Deharbe and S. Ranise. Satisfiability Solving for Software Verification. Submitted in 2006. See http://www.loria.fr/~ranise/pubs/stttsubmitted.pdf.
 [8] E. Denney, B. Fischer, and J. Schumann. An empirical evaluation of automated theorem provers in software certification. International Journal on Artificial Intelligence Tools, 15(1):81–108, 2006.
 [9] D. Detlefs, G. Nelson, and J. B. Saxe. Simplify: a theorem prover for program checking. J. ACM, 52(3):365–473, 2005.
 [10] D. L. Detlefs, K. R. M. Leino, G. Nelson, and J. B. Saxe. Extended static checking. Technical Report 159, Compaq Systems Research Center, Dec. 1998. Available at http://www.hpl.hp.com/techreports/CompaqDEC/SRCRR159.html.
 [11] E. W. Dijkstra. A discipline of programming. Series in Automatic Computation. Prentice Hall Int., 1976.
 [12] J.-C. Filliâtre and C. Marché. The Why/Krakatoa/Caduceus platform for deductive program verification. In 19th International Conference on Computer Aided Verification, volume 4590 of Lecture Notes in Computer Science, pages 173–177. Springer, 2007.
 [13] A. Giorgetti and J. Groslambert. JAG: JML Annotation Generation for Verifying Temporal Properties. In L. Baresi and R. Heckel, editors, Fundamental Approaches to Software Engineering, 9th International Conference, FASE 2006, volume 3922 of Lecture Notes in Computer Science, pages 373–376. Springer, 2006.
 [14] E. P. Gribomont. Simplification of boolean verification conditions. Theoretical Computer Science, 239(1):165–185, 2000.
 [15] J. Groslambert and N. Stouls. Vérification de propriétés LTL sur des programmes C par génération d’annotations. In AFADL’09, 2009. Short paper.
 [16] T. Hubert and C. Marché. Separation analysis for deductive verification. In Heap Analysis and Verification (HAV’07), Braga, Portugal, Mar. 2007.
 [17] K. R. M. Leino. Efficient weakest preconditions. Information Processing Letters, 93(6):281–288, 2005.
 [18] J. Meng and L. Paulson. Lightweight relevance filtering for machine-generated resolution problems. In ESCoR: Empirically Successful Computerized Reasoning, 2006.
 [19] A. Nonnengart and C. Weidenbach. Computing Small Clause Normal Forms. In A. Robinson and A. Voronkov, editors, Handbook of Automated Reasoning, volume I, chapter 6, pages 335–367. Elsevier Science, 2001.
 [20] D. A. Plaisted and A. H. Yahya. A relevance restriction strategy for automated deduction. Artificial Intelligence, 144(1-2):59–93, 2003.
 [21] P. Pudlák. Semantic selection of premisses for automated theorem proving. In G. Sutcliffe, J. Urban, and S. Schulz, editors, CEUR Workshop Proceedings, volume 257, pages 27–44, 2007.
 [22] S. Ranise and C. Tinelli. The Satisfiability Modulo Theories Library (SMT-LIB). http://www.SMTLIB.org, 2006.
 [23] W. Reif and G. Schellhorn. Theorem proving in large theories. In M. P. Bonacina and U. Furbach, editors, Int. Workshop on FirstOrder Theorem Proving, FTP’97, pages 119–124. Johannes Kepler Universität, Linz (Austria), 1997.
 [24] N. Stouls. Hypotheses selection applied to trusted computing. http://perso.citi.insalyon.fr/nstouls/tools/oslo/.
 [25] G. Sutcliffe and Y. Puzis. SRASS - a semantic relevance axiom selection system. In Automated Deduction - CADE-21, 21st International Conference on Automated Deduction, Bremen, Germany, July 17-20, 2007, Proceedings, volume 4603 of Lecture Notes in Computer Science, pages 295–310. Springer, 2007.
 [26] L. Wos. Conquering the Meredith single axiom. Journal of Automated Reasoning, 27(2):175–199, 2001.
 [27] L. Wos and G. W. Pieper. The hot list strategy. Journal of Automated Reasoning, 22(1):1–44, 1999.
 [28] L. Wos, G. A. Robinson, and D. F. Carson. Efficiency and completeness of the set of support strategy in theorem proving. J. ACM, 12(4):536–541, 1965.
