Exploiting Pointer Analysis in Memory Models for Deductive Verification


Quentin Bouillaguet CEA, LIST, Software Reliability Laboratory, France
IRIF, University Paris Diderot and CNRS, France
    François Bobot CEA, LIST, Software Reliability Laboratory, France
  
Mihaela Sighireanu
IRIF, University Paris Diderot and CNRS, France
    Boris Yakobowski CEA, LIST, Software Reliability Laboratory, France
AdaCore, Paris, France
Abstract

Cooperation between verification methods is crucial to tackle the challenging problem of software verification. This paper focuses on the verification of C programs using pointers and formalizes a cooperation between static analyzers doing pointer analysis and a deductive verification tool based on first-order logic. We propose a framework based on memory models that captures the partitioning of memory inferred by pointer analyses and complies with the memory models used to generate verification conditions. The framework guided us to propose a pointer analysis that accommodates various low-level operations on pointers while providing precise information about memory partitioning to the deductive verification. We implemented this cooperation inside the Frama-C platform and we show its effectiveness in reducing the task of deductive verification on a complex case study.

1 Introduction

Software verification is a challenging problem for which different solutions have been proposed. Two of these solutions are deductive verification (DV) and static analysis (SA).

Deductive verification checks precise and expressive properties of the input code. It requires effort from the user, who has to specify the properties to be checked, plus other annotations (e.g., loop invariants). Using these specifications, DV tools build verification conditions, which are formulas in various logic theories, and send them to specialized solvers. For C programs with pointers, DV has been boosted by the usage of Separation Logic [OHearnRY01], which leads to compact proofs due to the local reasoning allowed by the separating conjunction operator. However, for programs with low-level operations on pointers (e.g., pointer arithmetic and casting), this approach is limited by the theoretical results on the fragment of separation logic employed [BrotherstonK18] and by the availability of solvers. Therefore, this class of programs is most commonly handled using classic approaches based on memory models à la Burstall-Bornat [ma/Burstall72, Bornat00], which may be adapted to be sound in the presence of low-level operations [RakamaricH09] and dynamic allocation [DBLP:conf/vmcai/0062BW17]. The memory model is in general chosen by the DV engine, which may employ heuristics to guide the choice [hubert07hav]. Indeed, changing the memory model may increase the number of proofs discharged automatically [TuchKN07]. However, annotations on non-aliasing between pointers and on memory partitioning complicate the task of users and of the underlying solvers.

On the other hand, static analysis targets checking a fixed class of properties. This loss in the expressivity of properties is counterbalanced by a high degree of automation. For example, static pointer analysis for C programs usually computes over-approximations of the set of values (addresses) for each pointer expression at each control point. These abstractions do not refer to concrete memory addresses but to symbolic memory regions, given by the memory allocated to program variables and, in the heap, by dynamic allocation.

The information obtained by static analysis may help to infer a partitioning of the memory into disjoint regions, which can then be used by DV tools. The success of this collaboration between SA and DV tools strongly depends on the coarseness of the abstraction used by SA to keep track of the locations scanned by a pointer inside each memory region. For example, consider a pointer to integer p and a variable s of record type with five integer fields, struct {int m,n,o,p,q;}, such that p scans the locations of all fields of s except o (i.e., &s.m, &s.n, &s.p and &s.q). Pointer analyses (e.g., [lctrts/Mine06]) over-approximate the location of p by any location in the memory region of s whose offset is a multiple of an integer, thus including the spurious o field. Therefore, it is important to be able to try several SA algorithms to gather precise information about the memory partitioning.
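The loss of precision described above can be reproduced with a minimal sketch. It assumes a hypothetical abstraction `abs_offsets` (an interval of byte offsets plus a stride), which is one common shape for such domains and not necessarily the exact domain of the cited analysis; all names below are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* The record from the text: five integer fields. */
struct s { int m, n, o, p, q; };

/* Hypothetical abstraction of a set of byte offsets:
 * all values lo + k*stride that lie in [lo, hi]. */
typedef struct { size_t lo, hi, stride; } abs_offsets;

static size_t gcd(size_t a, size_t b) {
    while (b != 0) { size_t t = a % b; a = b; b = t; }
    return a;
}

/* Smallest interval-with-stride covering the n >= 1 given offsets. */
abs_offsets abstract_offsets(const size_t *offs, size_t n) {
    abs_offsets a = { offs[0], offs[0], 0 };
    for (size_t i = 1; i < n; i++) {
        if (offs[i] < a.lo) a.lo = offs[i];
        if (offs[i] > a.hi) a.hi = offs[i];
    }
    for (size_t i = 0; i < n; i++)
        a.stride = gcd(a.stride, offs[i] - a.lo);
    return a;
}

/* Does the abstraction admit the byte offset off? */
bool may_point_to(abs_offsets a, size_t off) {
    if (off < a.lo || off > a.hi) return false;
    return a.stride == 0 ? off == a.lo : (off - a.lo) % a.stride == 0;
}

/* Offsets of the fields actually scanned by p: m, n, p and q (not o). */
abs_offsets scanned_by_p(void) {
    const size_t offs[] = { offsetof(struct s, m), offsetof(struct s, n),
                            offsetof(struct s, p), offsetof(struct s, q) };
    return abstract_offsets(offs, 4);
}
```

Calling `may_point_to(scanned_by_p(), offsetof(struct s, o))` returns true although p never points to the field o: the abstraction cannot express the hole in the middle of the interval.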

[Flow diagram; nodes: C files, Annot., SA, P, MME, DV, Solvers.]

Figure 1: Verification using memory partitioning inferred by pointer analysis

Our contribution targets this specific cooperation of SA and DV methods in the context of first-order logic solvers. The verification process we propose is summarized by the flow diagram in Figure 1. The code to be verified is first given to the static analyzer to produce state invariants including a sound partitioning P of the program’s memory. The partitioning P is exploited by a functor M which produces a memory model environment MME used by the DV tool to generate verification conditions into a logic theory supported by automatic solvers. Our first contribution is the formalization of the functor M and of the information it needs from the static analysis. Secondly, we demonstrate that several existing pointer analyses may be used in this general framework. Thirdly, we implemented this functor in the Frama-C platform [fac/KirchnerKPSY15] between the plug-ins Eva for static analysis and WP for deductive verification. Finally, we propose a new pointer analysis exploiting a value analysis based on abstract interpretation; this analysis is able to produce the memory model that reduces the verification effort of a relevant benchmark.

2 A Motivating Example

 1  typedef int32_t data_t;
 2  typedef uint8_t pos_t;
 3  typedef struct {
 4    data_t *in1, *in2, *in3, *in4;
 5    data_t *out1,*out2,*out3,*out4;
 6    pos_t  *pos1,*pos2,*pos3,*pos4;
 7  } intf4_t;
 8  /*@ requires:
 9   *   sep({args->in1,...,args->in4},
10   *       args->out1,...,args->out4,
11   *       args->pos1,...,args->pos4);
12   *  ensures:
13   *   sorted_vals(&(args->out1),4);
14   *  ensures:
15   *   perm(&(args->in1),&(args->out1),
16   *        &(args->pos1),4); */
17  void sort4(intf4_t *args) {
18    data_t **inArr   =
19      (data_t **) &(args->in1);
20    data_t **outArr  =
21      (data_t **) &(args->out1);
22    pos_t **posArr =
23      (pos_t **) &(args->pos1);
24    /** init arrays from inputs */
25    int32_t sortArr[4];  // values
26    uint8_t permArr[4];  // permutation
27    /*@ loop invariant: ... */
28    for (int i = 0; i < 4; i++) {
29      sortArr[i] = *inArr[i];
30      permArr[i] = i;
31    }
32
33    /* sorting algorithm on sortArr
34     * with permutation in permArr */
35
36    /** copy results to outputs */
37    /*@ loop invariant: ... */
38    for (int i = 0; i < 4; i++) {
39      (*outArr[i]) = sortArr[i];
40      (*posArr[i]) = permArr[i];
41    }
42  }
Figure 2: Sorting function for inputs and outputs

We overview the issues targeted and the solution proposed in this work using the C code given in Figure 2. This code is extracted from the C code generated by the compiler of a high level data flow language. It combines at least three complex features of pointers in C.

The first feature is the duality of records and arrays, which is used here to interpret the (large) list of arguments of a function either as individual fields of a compound (record) type or as cells of an array. Thus, the read of the k-th field, named fk, of a record stored at location s and using only fields of the same type may be written s->fk or *(&(s->f0)+k), where f0 is the first field. It is debatable whether the C standard actually permits this form of dual indexing between records with fields of the same type and arrays [Stackoverflow-Q51737910], but some programs, including this one, use this feature with success. In our example, this duality is used in function sort4 to ease the extraction of numerical values from the inputs and the storage of the sorted values in the outputs. This first feature makes our running example more challenging, but the technique we propose is also effective when the parameters are encapsulated in arrays of pointers, e.g., when inputs and outputs are declared as a field of array type by data_t* in[4]. The second feature is precisely the usage of arrays of pointers, which are notoriously difficult for pointer analyses to handle precisely. The third feature is the complex separation constraints between pointers stored in arrays, which lead to a number of constraints quadratic in the size of the array and complicate the task of DV tools. In the following, we discuss these issues in detail, together with our approach to deal with them.
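The record/array duality can be illustrated by a minimal sketch. The type `inputs_t` and the two accessors are hypothetical names introduced here; the compile-time check encodes the layout assumption (no padding between fields) on which the array view relies. As noted above, whether ISO C permits the indexed access is debatable, so this only shows the idiom as it is used in practice.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t data_t;

/* Record whose fields all have the same type, as in intf4_t. */
typedef struct { data_t *in1, *in2, *in3, *in4; } inputs_t;

/* The array view is only meaningful if the fields are laid out
 * contiguously, without padding; check it at compile time. */
_Static_assert(offsetof(inputs_t, in4) == 3 * sizeof(data_t *),
               "fields are not laid out like an array");

/* Record view: access the field by name. */
data_t read_in3_by_name(const inputs_t *s) {
    return *(s->in3);
}

/* Array view: access the k-th field (0 <= k < 4) by shifting the
 * address of the first field, as done at lines 18-23 of sort4. */
data_t read_by_index(const inputs_t *s, int k) {
    data_t *const *arr = (data_t *const *) &(s->in1);
    return **(arr + k);
}
```

With a record whose field in3 points to the value 30, both `read_in3_by_name(&s)` and `read_by_index(&s, 2)` are expected to return that value on mainstream compilers.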

Inputs and outputs of sort4 have the same type, data_t, which shall encapsulate a numerical value to be sorted. For simplicity, we consider only one field of int32_t type for data_t. Type pos_t models an element of the permutation and denotes the destination position (an unsigned integer) of the value sorted. The parameters of sort4 are collected by type intf4_t: four pointers to data_t for input values, four pointers to data_t for output values, and four pointers to pos_t for the new positions of input values.

The function is annotated with pre/post-conditions and with loop invariants. The pre-condition requires (predicate sep) that (1) all pointers in *args are valid, i.e., point to valid memory locations, (2) the pointers in fields in are disjoint from any pointer in fields out and pos, and (3) the pointers in fields out and pos are pairwise disjoint. Notice that the in fields may alias. The post-condition states that the values pointed to by the fields out are sorted (predicate sorted_vals) and that, for each output i, the value of this output is equal to the value of the input j such that pos[j] is i (predicate perm).

The separation pre-condition is necessary for the proof of the post-condition because any aliasing between fields out may corrupt the results of the sorting algorithm. The encoding of this pre-condition in FOL is done by a conjunction of dis-equalities which is quadratic in the number of pointers concerned. More precisely, for n inputs (and so n outputs and n positions), the number of such constraints is quadratic in n. (In SL, this requirement is encoded in linear formulas.) The original code from which our example is inspired instantiates n with a value large enough to generate a huge number of dis-equalities. Several techniques have been proposed to reduce the number of dis-equalities generated by the separation constraints. For example, a classic technique is assigning a distinct logic value (a color) to each pointer in the separated set. This technique does not apply in our example if the type data_t is a record with more than one field because the color shall concern only the numerical value to be sorted.
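To give an order of magnitude, the following sketch counts the pairs of pointers that must be stated distinct. The counting scheme is our own reading of items (2) and (3) of the pre-condition (each in pointer against each out or pos pointer, plus all pairs among out and pos pointers) and is an assumption, not a formula taken from the encoding itself; the function name is hypothetical.

```c
/* Number of pointer pairs to be stated distinct for n inputs,
 * n outputs and n positions, under the reading stated above. */
unsigned long sep_disequalities(unsigned long n) {
    /* (2) each in pointer against each of the 2n out/pos pointers */
    unsigned long in_vs_rest = n * (2 * n);
    /* (3) all unordered pairs among the 2n out/pos pointers */
    unsigned long rest_pairs = (2 * n) * (2 * n - 1) / 2;
    return in_vs_rest + rest_pairs;   /* equals 4*n*n - n */
}
```

Under this reading, sort4 (n = 4) already needs 60 dis-equalities, and doubling n roughly quadruples that number.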

As an alternative, we propose to use precise points-to analyses to lift out such constraints and to simplify the memory model used for the proof of the function. Importantly, we perform a per-call proof of sort4, instead of a unitary proof. For each call of sort4, the static analysis tries to check that the separation pre-condition is satisfied and provides a model for the memory where the pointers are dispatched over disjoint zones. Unfortunately, the precision of the points-to analyses (and consequently the number of separation constraints discharged) may change radically with the kind of initialization done for the arguments of sort4. We will illustrate this behavior for two calls of sort4 given in Figure 3: the call in listing (a) uses variables and the one in listing (b) uses arrays. Notice that each call satisfies the separation pre-condition of sort4.

Listing 1: (a) using variables
 1  data_t  df_1,df_2,...,df_8;
 2  pos_t   pf_1,pf_2,pf_3,pf_4;
 3  intf4_t SORT = {
 4    .in1=&df_1,  .in2=&df_2,
 5    .in3=&df_3,  .in4=&df_4,
 6    .out1=&df_5, .out2=&df_6,
 7    .out3=&df_7, .out4=&df_8,
 8    .pos1=&pf_1, .pos2=&pf_2,
 9    .pos3=&pf_3, .pos4=&pf_4 };
10
11  df_1 = nondet_data();
12  df_2 = nondet_data();
13  df_3 = nondet_data();
14  df_4 = nondet_data();
15
16  sort4(&SORT);

Listing 2: (b) using arrays
 1  data_t  df[8];
 2  pos_t   pf[4];
 3  intf4_t SORT = {
 4    .in1=df+1,  .in2=df+2,
 5    .in3=df+3,  .in4=df+4,
 6    .out1=df+5, .out2=df+6,
 7    .out3=df+7, .out4=df,
 8    .pos1=pf,   .pos2=pf+1,
 9    .pos3=pf+2, .pos4=pf+3 };
10
11  df[1] = nondet_data();
12  df[2] = nondet_data();
13  df[3] = nondet_data();
14  df[4] = nondet_data();
15
16  sort4(&SORT);
Figure 3: Two calls for the sorting function using different initialization

Typed memory model: For completeness, we first quickly present how DV tools using FOL deal with our example using the Burstall-Bornat model. In this model, the memory is represented by a set of array variables, each array corresponding to a (pre-defined, basic) type of memory locations. For our example, the memory model includes six array variables: M_int32, M_uint8, M_int32_ref, M_uint8_ref, M_int32_ref_ref, M_uint8_ref_ref, storing values of type int32_t, uint8_t, int32_t*, uint8_t*, int32_t** and uint8_t** respectively. Program variables are used as indices in these arrays, e.g., variable inArr is an index in array M_int32_ref_ref and sortArr is an index in M_int32.

The separation pre-condition of sort4 is encoded by dis-equalities, e.g., M_int32_ref[args_in4] <> M_int32_ref[args_out1] where args_in4 is bound to the term which encodes the access to the memory location &(args->in4) using the logic function shift; args_out1 is defined similarly. However, these dis-equalities are not propagated through the assignments at lines 18–23 in Figure 2, which interpret the sequence of (input/output/position) fields as arrays. Therefore, additional annotations are required to prove the correct initialization of the output at lines 39–41. Some of these annotations may be avoided using our method that employs pointer analyses to infer precise memory models, as we show below.
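The typed model can be mimicked by a minimal sketch with one array per basic type. Only two of the six arrays are shown; the bound `CELLS` and the helper names are hypothetical, and locations are simplified to small integers.

```c
#include <stdint.h>

#define CELLS 16  /* hypothetical bound on the number of locations */

/* One array per basic type, indexed by location. */
typedef struct {
    int32_t M_int32[CELLS];
    uint8_t M_uint8[CELLS];
} typed_mem;

/* Functional updates: each store returns the new memory state. */
typed_mem store_int32(typed_mem m, unsigned loc, int32_t v) {
    m.M_int32[loc % CELLS] = v;
    return m;
}

typed_mem store_uint8(typed_mem m, unsigned loc, uint8_t v) {
    m.M_uint8[loc % CELLS] = v;
    return m;
}

int32_t load_int32(const typed_mem *m, unsigned loc) {
    return m->M_int32[loc % CELLS];
}

uint8_t load_uint8(const typed_mem *m, unsigned loc) {
    return m->M_uint8[loc % CELLS];
}
```

A store into M_uint8 can never change a value read from M_int32, so separation between differently typed locations comes for free. Two locations of the same type share an array, which is why the dis-equalities discussed above are still needed for pointers of type int32_t*.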

Base-offset pointer analysis: Consider now a pointer analysis which is field and context sensitive, and which computes an over-approximation of the value of each pointer expression at each program statement. The over-approximation, which we name abstract location, is built upon the standard concrete memory model of C [jar/LeroyB08]. An abstract location is a partial map from the set of the program's variables to the set of intervals. An element of this abstraction denotes the symbolic memory block (i.e., one not related to the locations in the virtual memory space used during the concrete execution) that starts at the location of a program variable (also called the base), together with an interval abstracting the set of possible offsets (in bytes) inside that symbolic block to which the pointer expression may evaluate. In this memory model, symbolic blocks of different program variables are implicitly separated: it is impossible to move from the block of one variable to another using pointer arithmetic. The memory model is represented by a set of logic arrays, one for each symbolic block. The over-approximation computed by the analysis allows dispatching a pointer expression used in a statement on these arrays.
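A minimal sketch of such an abstraction is given below. It is simplified to a single base per abstract location (the text allows a partial map over several bases), and the type `abs_loc` and its operations are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical encoding: a base (index of a program variable) and a
 * closed interval [lo, hi] of byte offsets in its symbolic block. */
typedef struct { int base; long lo, hi; } abs_loc;

/* Pointer arithmetic: shift by any amount in [dlo, dhi] bytes.
 * The base never changes. */
abs_loc abs_shift(abs_loc a, long dlo, long dhi) {
    abs_loc r = { a.base, a.lo + dlo, a.hi + dhi };
    return r;
}

/* Join of two abstract locations with the same base: interval hull. */
abs_loc abs_join(abs_loc a, abs_loc b) {
    abs_loc r = { a.base, a.lo < b.lo ? a.lo : b.lo,
                          a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Sound separation test for accesses of size bytes: different bases
 * are always separated, equal bases need disjoint byte ranges. */
bool abs_separated(abs_loc a, abs_loc b, long size) {
    if (a.base != b.base) return true;
    return a.hi + size <= b.lo || b.hi + size <= a.lo;
}
```

With 4-byte cells, joining the offsets [20, 28] and [0, 0] of the same base yields [0, 28], which can no longer be separated from [4, 16]. This is the hull effect that a single interval per base cannot avoid.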

In our example, for the call of sort4 in Figure 3 (a), the memory model includes the symbolic blocks for the program variables df, pf and SORT. The above analysis computes for the pointer expressions args->in1 and *(args->in1) at the start of sort4, the abstract location and  respectively. The abstract locations for the pointer expressions involving other fields of args are computed similarly. The separation pre-condition of sort4 is implied by these abstract locations. After the fields of args are interpreted as arrays (lines 18–23 of sort4), the pointer expression outArr+i at line 39, where i is restricted to the interval [0, 3], is over-approximated to the abstract location . Similarly, inArr+i is abstracted by . Therefore, the left value given by the pointer expression outArr[i] (at line 39) is (precisely) computed to be . This allows proving the correctness of the output computed by sort4.

For the call in Figure 3 (b), the memory model includes symbolic blocks for the program variables df, pf and SORT. The analysis computes for pointer expressions args->in1 and *(args->in1) (used at the start of sort4), the abstract location resp. , which also allows proving the separation pre-condition. The interpretation of fields as arrays (lines 18–23) leads to the abstract location for inArr+i, which is very precise. However, because the initialization of the field SORT.out4 at line 7 in Figure 3 (b) breaks the uniformity of the interval, the pointer expression outArr+i (at line 39) is over-approximated to . This prevents the proof of the post-condition.

In conclusion, such an analysis is able to infer a sound memory model that offers a finer grain of separation than the typed memory model. However, it is not precise enough to deal with the array of pointers and field duality in records.

Partitioning analysis: Based on the base-offset pointer analysis above, we define in Section 5.3 a new analysis that computes for each pointer expression an abstract location that collects a finite set of slices of symbolic blocks, i.e., the abstraction is a partial mapping from program’s variables to sets of intervals representing offsets in the block. With this analysis, the abstract location computed for outArr+i (at line 39 of sort4, call in Figure 3 (b)) is more precise, i.e., , and it allows to prove the post-condition for sort4. Notice that the analysis computes a finite set of slices in symbolic blocks whose concretizations (sets of locations) are pairwise disjoint. For this reason, this analysis may be imprecise if its parameter fixing the maximum size of this set is exceeded. This analysis also deals precisely with the call of sort4 in Figure 3 (a).
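The idea of collecting a bounded set of slices can be sketched as follows, for a single symbolic block. The bound `MAX_SLICES`, the type names and the collapse-to-hull policy when the bound is exceeded are assumptions made for illustration; they are not claimed to match the analysis of Section 5.3.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLICES 4  /* hypothetical bound on the size of the set */

typedef struct { long lo, hi; } slice;                 /* bytes [lo, hi] */
typedef struct { slice s[MAX_SLICES]; size_t n; } slice_set;

/* Add the bytes [lo, hi]. Slices that overlap or touch the new range
 * are absorbed, so the set stays pairwise disjoint. If the bound is
 * exceeded, everything is collapsed into one hull (loss of precision). */
void slices_add(slice_set *set, long lo, long hi) {
    size_t kept = 0;
    for (size_t i = 0; i < set->n; i++) {
        slice c = set->s[i];
        if (lo <= c.hi + 1 && c.lo <= hi + 1) {   /* touches: absorb */
            if (c.lo < lo) lo = c.lo;
            if (c.hi > hi) hi = c.hi;
        } else {
            set->s[kept++] = c;
        }
    }
    if (kept == MAX_SLICES) {                     /* bound exceeded */
        for (size_t i = 0; i < kept; i++) {
            if (set->s[i].lo < lo) lo = set->s[i].lo;
            if (set->s[i].hi > hi) hi = set->s[i].hi;
        }
        kept = 0;
    }
    set->s[kept] = (slice){ lo, hi };
    set->n = kept + 1;
}

/* May an access to the bytes [lo, hi] hit one of the slices? */
bool slices_overlap(const slice_set *set, long lo, long hi) {
    for (size_t i = 0; i < set->n; i++)
        if (lo <= set->s[i].hi && set->s[i].lo <= hi) return true;
    return false;
}
```

For the call in Figure 3 (b), with 4-byte cells, the targets df+5, df+6, df+7 and df of the out fields give the two slices [20, 31] and [0, 3], which stay disjoint from the bytes [4, 19] targeted by the in fields.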

Dealing with different analyses: The above comments demonstrate the diversity of results obtained for the memory models for different points-to analysis algorithms. One of our contributions is to define a generic interface for the definition of the memory model for the DV based on the results obtained by static analyses doing points-to analysis (SPA). This interface eases the integration of a new SPA algorithm and the comparison of results obtained with different SPA algorithms. We formalize this interface in Section 4 and instantiate it for different SPA algorithms in Section 5. Our results are presented in Section 6.

3 Generating Verification Conditions

To fix ideas, we recall the basic principles of generating verification conditions (VC) using a memory model by means of a simple C-like language.

3.1 A Clight Fragment

We consider a fragment of Clight [jar/BlazyL09] that excludes casts, union types and multi-dimensional arrays. We also restrict the numerical expressions to integer expressions. The syntax of expressions, types and atomic statements is defined by the grammar in Figure 4. This fragment is able to encode all assignment statements in Figures 2 and 3 using classic syntactic sugar (e.g., **(arr + i) for *arr[i], &((*args).in1) for &(args->in1)). Complex control statements can be encoded in the standard way. User defined types are pointer types, static size array types, and record types. A record type declares a list of typed fields with names from a set ; for simplicity, we suppose that each field has a unique name. We split expressions into integer expressions and address expressions to ease their typing. Expressions are statically typed by a type in . When this information is needed, we write .

Figure 4: Syntax of our Clight fragment

We choose to present our work on this simple fragment for readability. However, our framework may be extended to other constructs. For example, our running example contains struct initialization. Struct assignment may be added by explicit assignment of fields. Type casting for arithmetic and compatible pointer types (i.e., aligned on the same type) may be handled soundly in DV tools employing array-based memory models using the technique in [RakamaricH09]. Function calls may also be introduced if we choose a context-sensitive SA. In general, DV tools conduct unit proofs for functions. We restrict this work to whole-program proofs, because this avoids requiring that the SA be able to conduct analyses starting from the pre-conditions of functions.

3.2 Memory Model

We define the denotational semantics of our language using an environment called abstract memory model (AMM). (This name is reminiscent of the first abstract memory model defined in [inria/LeroyAB12, jar/LeroyB08] for CompCert. We enriched it with some notations to increase readability of our presentation.) Figure 5 summarizes the elements of this abstract memory model. The link between the abstract memory model and the concrete standard memory model is provided in Appendix A.

Figure 5: Abstract signature for the concrete memory model

The states of the memory are represented by an abstract data type Mem which associates locations of type Loc to values in the type Val. Locations are pairs where is the identifier of a symbolic block and is an integer giving the offset of the location in the symbolic block of . Because we are not considering dynamic allocation, symbolic blocks are all labeled by program’s variables. Thus we simplify the concrete memory model by replacing block identifiers by program variables. Values of type Loc are built by two operations of AMM: gives the location of a program variable and computes the location obtained by shifting the offset of location by bytes. The shift operation abstracts pointer arithmetics. The typing function is extended to elements of Loc based on the typing of expressions used to access them. Some operations are partial and we denote by the undefined value. A set extended with the undefined value is denoted by . The axiomatization of loading and storing operations is similar to the one in [inria/LeroyAB12, jar/LeroyB08].
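The location operations described above can be sketched in a few lines. The sizes, the type names `loc` and `mem`, and the use of a boolean result to signal the undefined value are assumptions made for illustration; the sketch also updates the memory in place, whereas the abstract model is presented functionally.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { NBLOCKS = 2, BLOCK_SIZE = 32 };   /* hypothetical sizes */

/* A location: the symbolic block of a variable and a byte offset. */
typedef struct { int block; long off; } loc;
typedef struct { uint8_t bytes[NBLOCKS][BLOCK_SIZE]; } mem;

/* Location of a program variable: start of its symbolic block. */
loc base(int var) { return (loc){ var, 0 }; }

/* Pointer arithmetic: the block never changes, only the offset. */
loc shift(loc l, long n) { return (loc){ l.block, l.off + n }; }

static bool valid(loc l, size_t size) {
    return l.block >= 0 && l.block < NBLOCKS
        && l.off >= 0 && (size_t) l.off + size <= (size_t) BLOCK_SIZE;
}

/* Partial operations: false stands for the undefined value. */
bool store_i32(mem *m, loc l, int32_t v) {
    if (!valid(l, sizeof v)) return false;
    memcpy(&m->bytes[l.block][l.off], &v, sizeof v);
    return true;
}

bool load_i32(const mem *m, loc l, int32_t *out) {
    if (!valid(l, sizeof *out)) return false;
    memcpy(out, &m->bytes[l.block][l.off], sizeof *out);
    return true;
}
```

Since shift keeps the block component unchanged, no sequence of shifts can move a location from the block of one variable to the block of another, which is the implicit separation the model relies on.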

3.3 Semantics

Figure 6 defines the rules of the semantics using the abstract memory model, via the overloaded functions . The semantic functions are partial: the undefined case cuts the evaluation. The operators are interpretations of operations over integer types . The functions and are defined by the Application Binary Interface (ABI) and depend on the architecture. Conversions between integer values are done using function .

Figure 6: Semantics of our Clight fragment

3.4 Generating Verification Conditions

Verification conditions (VC) are generated from Hoare’s triple with and formulas in some logic theory used for program annotations and a program statement. The classic method [ipl/Leino05, Flanagan/2001] is built on the computation of a formula in specifying the relation between the states of the program before and after the execution of , which are represented by the set of logic variables resp. . The VC built for the above Hoare’s triple is and it is given to solvers for to check its validity. In the following, we denote by the set of logic terms built in the logic theory using the constants, operations, and variables in a set . For a logic sort , we designate by the terms of type .
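As an illustration of the classic scheme recalled above, the verification condition for a triple can be sketched as follows. The symbols are assumptions introduced here, since the original notation is not recoverable from the text: $X$ and $X'$ stand for the logic variables of the states before and after the statement $s$, and $\varphi_s$ for the relation computed for $s$.

```latex
% Assumed notation: X, X' are the pre- and post-state variables,
% \varphi_s is the relation encoding statement s.
\[
  \{P\}\; s\; \{Q\}
  \quad\leadsto\quad
  \forall X, X'.\;\; P(X) \wedge \varphi_s(X, X') \implies Q(X')
\]
% Instance for the triple  { x >= 0 }  x = x + 1;  { x >= 1 }:
\[
  \forall x, x'.\;\; x \ge 0 \,\wedge\, x' = x + 1 \implies x' \ge 1
\]
```

The second formula is valid in integer arithmetic, so a solver discharges it without further annotation.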

Compilation environment: Formula is defined based on the dynamic semantics of statements, like the one given in Figure 6 for our language. The compilation of this semantics into formulas uses a memory model environment (called simply environment) that implements the interface of the abstract memory model given in Figure 5. This environment changes at each context call and keeps the information required by the practical compilation into formulas, e.g., the set of variables used for modeling the state at the current control point of this specific context call. Figure 7 defines the signature of memory environments.

Figure 7: Signature of the memory model environments

The types Mem and Loc encapsulate information about the program states and memory locations respectively. Notice that the logical representation of locations is hidden by this interface, which allows to capture very different memory models. The compilation information about the values stored is given by the type Val, which represent integers by integer terms in , i.e., in the set . Operation shift implements arithmetics on locations by an integer term. Operation store encapsulates the updating of the environment by an assignment and produces a new environment and a term in , i.e., a formula of .

Prerequisites on the logic theory: For DV tools based on first order logic, the theory is a multi-sorted FOL that embeds the logic theory used to annotate programs (which usually includes boolean and integer arithmetic theories) and McCarthy's array theory [ifip/McCarthy62] employed by the Burstall-Bornat memory model [Bornat00] to represent atomic memory blocks. The memory model environment associates to each memory block a set of logic array variables using base operations. It encodes the operations resp.  into logic array operations resp. , where is the array variable for the symbolic block of location that stores values of type and is the offset of in . also embeds abstract data types (or at least polymorphic pairs with component selection by fst and snd), and uninterpreted functions. Polymorphic conditional expressions “” are also needed.
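For reference, the two axioms of McCarthy's array theory that such an encoding relies on can be stated as follows. The function names $\mathit{read}$ and $\mathit{write}$ are the usual ones in the literature and are an assumption here, since the operation names used by the environment are not recoverable from the text.

```latex
% Read-over-write axioms of the theory of arrays (requires amsmath).
\[
\begin{aligned}
  \mathit{read}(\mathit{write}(a, i, v),\, i) &= v \\
  i \neq j \;\implies\; \mathit{read}(\mathit{write}(a, i, v),\, j)
    &= \mathit{read}(a, j)
\end{aligned}
\]
```

The second axiom is where dis-equalities between locations are consumed: a solver can only skip over a write if it can prove that the two indices differ.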

In the following, we use the logic theory above and suppose that an infinite number of fresh variables can be generated. To ease the reading of environment definitions, we distinguish the logic terms by using the mathematical style and by underlining the terms of , e.g., . For example, the logic term is built from a VC generator term m(b) that computes a logic term of array type and the logic sub-term .

Example: Consider the Hoare’s triple . Let be , where (resp. ) is the environment for the source state (resp. modified by the store for the destination state); that is . The formula (resp. ) is generated from (resp. ) using compilation environment (resp. ). Then the VC generated by the above method is . Notice that the above calls of the environment’s operations follow the order given by the semantics in Figure 6, except for the failure cases. Indeed, to simplify our presentation, we consider that statement’s pre-condition includes the constraints that eliminate runs leading to undefined behaviors. Therefore, the VC generation focuses on encoding in the correct executions of statements.

4 Partition-based Memory Model

We define a functor that produces memory model environments implementing the interface in Figure 7 from the information inferred by a pointer analysis. The main idea is that the SA produces a finite partitioning of symbolic blocks into a set of pairwise disjoint sub-blocks and each sub-block is mapped to a specific set of array logic variables by the compilation environment. We first formalize the pre-requisites for the pointer analysis using a signature constrained by well-formedness properties. Then, we define the functor by providing an implementation for each element of the interface in Figure 7.

4.1 Pointer Analysis Signature

A necessary condition on the pointer analysis is its soundness. To ease the reasoning about this property of analysis, we adopt the abstract interpretation [popl/CousotC77] framework. In this setting, a SA computes an abstract representation of the set of concrete states reached by the program’s executions before the execution of each statement. The abstract states belong to a complete lattice which is related to the set of concrete program configurations by a pair of functions (abstraction) and (concretization) forming a Galois connection. In the following, we overload the symbol to denote concretization functions for other abstract objects.

[Properties listed in the figure: disjointness (1), completeness (2), unique base (3), and the soundness conditions (4) to (7).]
Figure 8: A signature for pointer analysis and its properties

Besides being sound, the SA shall be context sensitive and provide, for each context call, an implementation of the signature in Figure 8. The values of provide, for each statement of the current context, the abstract state in computed by the analysis. The type represents the domain of abstract values computed for the pointer expressions in abstract states. The concretization function maps abstract locations to sets of concrete locations.

The type stands for the set of pairwise disjoint abstract blocks partitioning the symbolic memory blocks, for the fixed specific context call. The concretization function for abstract blocks maps blocks to sets of concrete locations. Equations (1) and (2) in Figure 8 specify that abstract blocks in shall form a partition of the set of concrete locations available in symbolic blocks such that an abstract block belongs to a unique symbolic block.
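The partition requirement can be checked on a concrete instance with a minimal sketch. An abstract block is modeled here, as an assumption, by a byte range inside one symbolic block; the unique-base property then holds by construction, and the two functions check the disjointness and completeness properties. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical abstract block: bytes [lo, hi] of symbolic block base. */
typedef struct { int base; long lo, hi; } abs_block;

/* Disjointness: no two abstract blocks share a concrete location. */
bool blocks_disjoint(const abs_block *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (b[i].base == b[j].base
                && b[i].lo <= b[j].hi && b[j].lo <= b[i].hi)
                return false;
    return true;
}

/* Completeness: every byte of every symbolic block is covered.
 * sizes[v] is the size in bytes of the symbolic block of variable v. */
bool blocks_complete(const abs_block *b, size_t n,
                     const long *sizes, int nbases) {
    for (int v = 0; v < nbases; v++)
        for (long off = 0; off < sizes[v]; off++) {
            bool covered = false;
            for (size_t i = 0; i < n && !covered; i++)
                covered = b[i].base == v
                       && b[i].lo <= off && off <= b[i].hi;
            if (!covered) return false;
        }
    return true;
}
```

For instance, splitting a 32-byte block into the ranges [0, 3], [4, 19] and [20, 31] and keeping a 4-byte block whole passes both checks, whereas two ranges sharing byte 3 fail the disjointness check.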

The operation returns the symbolic block to which belongs, represented by the program variable labeling this symbolic block. The range of an abstract block inside its symbolic block is specified by the operation , which returns a formula (boolean term in ) that constrains to be in this range. The soundness of the and operations is specified by equation (4). The set of abstract blocks covered by an abstract location is provided by the operation , whose soundness is specified by equation (5). The operation abstracts the offset of a program variable. Abstract locations may be shifted by an integer term using operation . Operation computes the abstract location stored at in some context , i.e., it dereferences of type for some . (We denote by the set of all pointer types in the program.) The last two operations shall be sound abstract transformers on abstract locations, as stated in equations (6) resp. (7).

4.2 A Functor for Memory Model Environments

We define now our functor that uses the signature to define the elements of the memory model environment defined in Figure 7. To disambiguate symbols, we prefix names of types and operations by the name of the signature or logic theory when necessary.

Environment’s type: A compilation environment stores the mapping to abstract states from and and a total function that associates to each abstract block in a logic variable in :

(8)

where denotes the set of total functions from to , i.e., . We designate by and the first and second component of some .

If an abstract block stores only one type of values, the logic variable has type where is the logic type for the values stored. For blocks storing integer values (i.e., ), is naturally (logical) or . For blocks storing pointer values, is , where the denotes the abstract block of the location and represents the location’s offset. We denote by the integer constant that uniquely identifies . If an abstract block stores values of both kinds of scalar types (notice that only scalar values are stored in array-based models), the logic variable has the type pair of arrays, where the first array is used for integer values and the second one for pointer values. For readability, we detail here only the case of homogeneously typed blocks. Notice that the mapping binds fresh array variable names to abstract blocks changed by store operation.

Locations’ type: The type collects the logic encoding of locations as a pair of integer terms together with the abstract location provided by the static analysis, i.e., . Intuitively, in the logic pair , is interpreted as an abstract block identifier and models the offset of the location in the symbolic block of the abstract block , i.e., an integer in the slice of .

Locations’ operations: The values of are built by two operations and defined as follows. For a program variable , is based on the abstract location returned by . The domain of shall have only one abstract block because program variables are located at the start of symbolic blocks. Moreover, the term denoting the offset shall be the constant . Formally:

(9)

The shifting of a location in Loc by an expression is computed based on the abstract shift operation as follows:

(10)

where and the new logic base selects (using a conditional expression) the base from the ones of . Let us denote by the boolean term testing that the block identifier in is one of the block identifiers in which has the same symbolic block (i.e., base) as , i.e.:

(11)

Using , if is , the formal definition of is:

(12)

Indeed, since the shift operation cannot change the symbolic block, we have to test, using , that each resulting block identifier has the same symbolic block as .
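To make the role of the symbolic-block test concrete, the following Python sketch evaluates a shift on concrete values instead of building a logic term. It is only an illustration under stated assumptions: the names `AbsBlock`, `same_base` and `shift`, the representation of a slice as an inclusive offset range, and the additional slice test used to pick the resulting block are ours and are not taken from the definitions above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AbsBlock:
    ident: int   # integer constant identifying the abstract block
    base: str    # symbolic block (e.g. a program variable) it belongs to
    lo: int      # first offset of its slice in the symbolic block (assumption)
    hi: int      # last offset of its slice, inclusive (assumption)

def same_base(block_id, blocks, candidate):
    """True iff block_id names one of `blocks` sharing candidate's symbolic block."""
    return any(b.ident == block_id and b.base == candidate.base for b in blocks)

def shift(loc, delta, old_dom, new_dom):
    """Shift the logic pair loc = (block id, offset) by delta.

    old_dom / new_dom: abstract blocks of the location before / after the
    shift, as a pointer analysis would report them (hypothetical inputs).
    """
    block_id, offset = loc
    new_offset = offset + delta
    for cand in new_dom:
        # The shift must stay inside the symbolic block of the old location.
        if same_base(block_id, old_dom, cand) and cand.lo <= new_offset <= cand.hi:
            return (cand.ident, new_offset)
    raise ValueError("shifted location is outside the abstract location")

# Two variables a and b; a is split into two abstract blocks.
A0 = AbsBlock(1, "a", 0, 3)
A1 = AbsBlock(2, "a", 4, 7)
B0 = AbsBlock(3, "b", 0, 3)
print(shift((1, 0), 4, [A0, B0], [A1, B0]))   # (2, 4)
```

In the compiled formula the loop would become a nested conditional over the candidate blocks, so its size grows with the product of the two domains, as noted in the text.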

The size of the expression encoding depends on the product of sizes of domains computed by for and . If the abstract locations have a singleton domain, i.e. , then is simply . When the precision of the SA does not enable such simplification, we could soundly avoid big expressions generated by by using in and operations only the component abstract location of an environment’s location.

Loading from memory: Reading an integer value in the environment m at a location is compiled into a read operation (denoted by for concision) from an array variable obtained by statically dispatching the logical base of l among the possible base identifiers in as follows:

(13)

where

(14)

The size of the expression above may be reduced by asking the SA for an over-approximation of the values of expression in the current state. If the SA is able to produce a precise result for , we could remove from the expression above the cases for abstract blocks for which (i.e., the formula is invalid for the values in ).
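The static dispatch and its pruning can be sketched as follows. This is a minimal illustration, not the compilation function itself: terms are plain strings, `compile_load` and its parameters are hypothetical names, each block is given with an assumed inclusive slice, and the last candidate is used as the default branch on the assumption that the logic base always denotes one of the listed blocks.

```python
def compile_load(base, offset, blocks, offset_range=None):
    """Return a logic term (as a string) reading at the location (base, offset).

    blocks: list of (ident, array_var, (lo, hi)) for the abstract blocks in the
            domain of the abstract location; (lo, hi) is the block's slice.
    offset_range: optional (lo, hi) over-approximation of the offset values,
            as could be supplied by the static analysis.
    """
    if offset_range is not None:
        lo, hi = offset_range
        # Keep only the blocks whose slice may contain the offset.
        blocks = [b for b in blocks if b[2][0] <= hi and lo <= b[2][1]]
    if not blocks:
        raise ValueError("no abstract block can hold this location")
    *init, last = blocks
    term = f"{last[1]}[{offset}]"          # default branch: last candidate
    for ident, array_var, _ in reversed(init):
        term = f"(if {base} = {ident} then {array_var}[{offset}] else {term})"
    return term

blocks = [(1, "A1", (0, 3)), (2, "A2", (4, 7)), (3, "A3", (0, 3))]
print(compile_load("b", "o", blocks))
# (if b = 1 then A1[o] else (if b = 2 then A2[o] else A3[o]))
print(compile_load("b", "o", blocks, offset_range=(4, 6)))
# A2[o]
```

With a singleton domain, or once pruning leaves a single candidate, the read collapses to one array access.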

The expression in equation (14) is also used for reading pointer values. In this case, the expression obtained is a tuple. The abstract location corresponding to this logic expression is obtained using the abstract operation in the abstract state component of the environment:

(15)

Storing in memory: The compilation of the store semantic operation is done by the operation that produces a new environment and a boolean term (formula) encoding the relation between the logic arrays associated with blocks before and after the assignment as follows:

(16)

where with the abstract state computed by the analysis for the control point after the compiled assignment. The new block mapping uses fresh logic variables for the abstract blocks in the domain of the abstract location at which the update is done:

(17)

The fresh variables are related to the old ones using the store operator on logic arrays, denoted by , in the generated formula defined as follows:

(18)

The size of this expression may be reduced using the SA results in a similar way as for load. In general, the size of the expressions generated by the compilation in equations (12), (14) and (18) depends on the size of the domain of the abstract locations computed by the static analysis. Indeed, if the analysis always provides abstract locations with a singleton domain, the compilation produces expressions with only one component, while proving most separation annotations. However, if the analysis computes a small set (although one at least as large as the number of program variables), the generated VC does not gain any concision (we fall back to the separation given by the typed model).
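The effect of the domain size on the generated formula can be seen in the following sketch of a store compilation. It is an illustration under assumptions: `compile_store`, the string syntax `A[o <- v]` for the array store operator, and the naming scheme for fresh variables are hypothetical, and the environment is reduced to its block mapping.

```python
import itertools

def compile_store(env, base, offset, value, dom, fresh):
    """Compile an update at (base, offset) with `value`.

    env:   dict from abstract block id to its current array variable name.
    dom:   ids of the abstract blocks the updated location may belong to.
    fresh: iterator of integers used to build fresh variable names.
    Returns the new block mapping and the formula relating old and new arrays.
    """
    new_env = dict(env)
    conjuncts = []
    for ident in sorted(dom):
        old = env[ident]
        new = f"{old}_{next(fresh)}"       # fresh array variable for this block
        new_env[ident] = new
        updated = f"{old}[{offset} <- {value}]"
        if len(dom) == 1:
            # Singleton domain: the block is known statically, no dispatch.
            conjuncts.append(f"{new} = {updated}")
        else:
            conjuncts.append(
                f"{new} = (if {base} = {ident} then {updated} else {old})")
    return new_env, " /\\ ".join(conjuncts)

env = {1: "A1", 2: "A2", 3: "A3"}
print(compile_store(env, "b", "o", "v", [1], itertools.count(1))[1])
# A1_1 = A1[o <- v]
print(compile_store(env, "b", "o", "v", [1, 2], itertools.count(1))[1])
# A1_1 = (if b = 1 then A1[o <- v] else A1) /\ A2_2 = (if b = 2 then A2[o <- v] else A2)
```

Blocks outside the domain keep their array variable, which is how the separation inferred by the analysis is reflected in the formula without any annotation.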

Functor’s properties: The requirements on the signature PA ensure that the operations , and are sound. This enforces the soundness of definitions for the MME’s operations. Based on this observation, we conjecture that these operations compute a sound post-condition relation, although this relation may not be the strongest post-condition. A formal proof is left for future work.

5 Instances of Pointer Analysis Signature

The signature may be implemented by several existing pointer analyses. We consider three of them here and we show how they fulfill the requirements of . We also define an analysis which exploits the results of a precise pointer analysis to provide an appropriate partitioning of the memory in .

All pointer analyses we consider statically compute the possible values (i) of an address expression, i.e., an over-approximation of ( from Figure 4) and (ii) of an address dereference, i.e., an over-approximation of . For this reason, these analyses belong to the class of points-to analyses [conf/paste/Hind01].

5.1 Basic Analyses (B and B)

The first points-to analysis abstracts locations by a finite set of pairs built from a symbolic block identifier and an abstraction for sets of integers collecting the possible offsets of the location in the symbolic block. If we fix to be the abstract domain used to represent sets of integers, then the abstract domain for locations is defined by .

Many abstract domains have been proposed to deal with integer sets in the abstract interpretation framework. For points-to analysis, most approaches use the classic domain of intervals [popl/CousotC77]. To obtain more precise results, we consider here the extension of the interval domain which also keeps modulo constraints and small sets of integers. This domain is implemented in the Eva plugin of Frama-C [fac/KirchnerKPSY15]. Then, the abstract sets in are defined by the following grammar:

(19)

where are natural constants, are integer constants and are integer constants extended with two symbols to capture unspecified bounds. We write for . The concretization of a value in , maps to the set of integers such that . Because the abstract intervals are used to capture offsets in symbolic blocks which have a known size (given by the ABI), the concrete offsets are always bounded, but they may be very large. We obtain independence of the ABI by introducing unspecified bounds for intervals and the value. For efficiency, the size of explicit sets is kept bounded by a parameter of the analysis, denoted ilvl in the following. The domain comes with lattice operators (e.g., join ) and abstract transformers for operations on integers. Our work requires a sound abstract transformer for addition, .
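A domain of this kind and a sound addition on it can be sketched in a few lines of Python. This is an illustration written for this text, not the Eva implementation: the class `Interval`, the functions `to_interval`, `contains` and `add`, the use of `None` for an unspecified bound, the convention that modulus 0 means "exactly the remainder", and the default value of `ilvl` are all assumptions.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Optional

@dataclass(frozen=True)
class Interval:
    lo: Optional[int]   # None stands for an unspecified lower bound
    hi: Optional[int]   # None stands for an unspecified upper bound
    rem: int            # remainder of every member modulo `mod`
    mod: int            # modulus; 0 is used here for "exactly rem"

def to_interval(v):
    """Over-approximate a non-empty explicit set by an interval with congruence."""
    if isinstance(v, Interval):
        return v
    lo, hi = min(v), max(v)
    mod = reduce(math.gcd, (x - lo for x in v), 0)
    return Interval(lo, hi, lo % mod if mod else lo, mod)

def contains(v, n):
    """Membership of n in the concretization of v."""
    if isinstance(v, frozenset):
        return n in v
    if (v.lo is not None and n < v.lo) or (v.hi is not None and n > v.hi):
        return False
    return n % v.mod == v.rem if v.mod else n == v.rem

def add(a, b, ilvl=8):
    """Sound abstract addition; explicit sets are kept up to ilvl elements."""
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        sums = frozenset(x + y for x in a for y in b)
        return sums if len(sums) <= ilvl else to_interval(sums)
    ia, ib = to_interval(a), to_interval(b)
    lo = None if ia.lo is None or ib.lo is None else ia.lo + ib.lo
    hi = None if ia.hi is None or ib.hi is None else ia.hi + ib.hi
    mod = math.gcd(ia.mod, ib.mod)   # both congruences hold modulo the gcd
    rem = (ia.rem + ib.rem) % mod if mod else ia.rem + ib.rem
    return Interval(lo, hi, rem, mod)

# Offsets of a field at byte 2 in elements located every 4 bytes.
print(add(Interval(0, 36, 0, 4), frozenset({2})))
# Interval(lo=2, hi=38, rem=2, mod=4)
```

Keeping the congruence is what lets the result exclude the offsets of neighbouring fields, which a plain interval `[2..38]` would include.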

Precise offsets (B): Let us consider a precise instance of such an analysis, i.e. field-sensitive and employing the abstract domain of intervals defined above. Let be the abstract domain for program states implemented in this analysis. This domain captures the abstract values for all program variables. We denote by the abstract location (in ) computed by the analysis for the address expression at statement . For address expressions typed as pointer to pointer types, the abstract value of the address expression is also an element of and computes the points-to information.

The types and operations of are shown in Figure 9. The symbolic blocks are not partitioned, since . Then, the slice for a block is the set of valid offsets for the symbolic block and the generated constraint is very simple. Abstract locations are shifted precisely using the abstract transformer for addition in . It is usually precise when is a constant. The soundness properties required by are trivially satisfied due to the simple form of abstract blocks’ type and the soundness of operations on the abstract domains used.

(20)
(21)
(22)
(23)
(24)
Figure 9: Implementation of by analyses B and B
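As an informal companion to Figure 9, the sketch below represents an abstract location as a map from symbolic blocks to offset abstractions and lifts the addition on offsets to a shift on locations. To stay short it uses plain intervals `(lo, hi)` for offsets; the function names and the dictionary representation are assumptions made for illustration only.

```python
def shift(absloc, delta):
    """Shift every offset abstraction of an abstract location.

    absloc: dict from symbolic block name to an offset interval (lo, hi).
    delta:  interval (lo, hi) over-approximating the shift amount.
    """
    lo_d, hi_d = delta
    return {base: (lo + lo_d, hi + hi_d) for base, (lo, hi) in absloc.items()}

def domain(absloc):
    """Blocks of the location; here abstract blocks are the symbolic blocks."""
    return set(absloc)

# A pointer that may target the start of a, or bytes 4..8 of b.
p = {"a": (0, 0), "b": (4, 8)}
q = shift(p, (4, 4))
print(q)                         # {'a': (4, 4), 'b': (8, 12)}
print(domain(q) == domain(p))    # True
```

The shift leaves the domain unchanged because symbolic blocks are not partitioned in these analyses, and the result is exact when the shift amount is a constant.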

Imprecise offsets (B): We also consider an instance of the points-to analysis which is not field-sensitive. For example, the B analysis computes for , where is the assignment at line 3 of the listing in Figure 3(a), the set of abstract locations . The definition of the elements of the signature is exactly the one given in Figure 9.

5.2 Partitioning by Cells (C)

Analyzers that do not handle aggregate types (arrays and structs) decompose the symbolic blocks of variables having aggregate types into atomic blocks that all have a scalar type. We call these blocks cells. For example, the symbolic block of variable pf in Figure 3(b) is split into four cells of type pos_t. For this analysis, the definitions for are those given in Figure 9 except for the type and the operations using this type and . To define , we first define the set of cells-paths of type by induction on the syntax of as follows: