# Why Would You Trust B?

###### Abstract

The use of formal methods provides confidence in the correctness of developments. Yet one may question the actual level of confidence obtained when the method itself (or its implementation) is not formally checked. We address this question for B, a widely used formal method that allows for the derivation of correct programs from specifications. Through a deep embedding of the B logic in Coq, we check the B theory and also implement B tools. Both aspects are illustrated by the description of a proved prover for the B logic.

###### Keywords:

Confidence, Formal Methods, Prover, Deep embedding

A clear benefit of formal methods is that they increase confidence in the correctness of developments. However, one may question the actual level of confidence obtained when the method or its implementation is not itself formally checked. This question is legitimate for safety, as one may accidentally derive invalid results. It is even more relevant when security is a concern, as any flaw can be deliberately exploited by a malicious developer to obfuscate undesirable behaviours of a system while still obtaining certification.

B [abr:1] is a popular formal method that allows for the derivation of correct programs from specifications. Several industrial implementations are available (e.g. Atelier B, B Toolkit), and it is widely used in industry for projects where safety or security is mandatory. B is therefore a good candidate for addressing our concern: when the prover says that a development is right, who says that the prover is right? To answer this question, one has to check the theory as well as the prover w.r.t. this theory (or, alternatively, to provide a proof checker). These are the objectives of BiCoq, a deep embedding of the B logic in Coq [coq:1].

BiCoq benefits from the support of Coq to study the theory of B and to check the validity of standard definitions and results. BiCoq also allows us, through an implementation strategy, to develop formally checked B tools. This strategy is illustrated in this paper by the development of a prover engine for the B logic, which can be extracted and used independently of Coq. Coq is therefore our notary public, witnessing the validity of the results associated with the B theory as well as the correctness of the tools implementing those results, ultimately increasing confidence in B developments. The approach, combining a deep embedding and an implementation technique, can be extended to address further elements of B beyond its logic, or to safely enrich it, as illustrated in this paper.

This paper is divided into 9 sections. Sections 1, 2 and 3 briefly introduce B, Coq and the notion of embedding. The B logic and its formalisation in Coq are presented in Sec. 4. Section 5 describes various results proved using BiCoq. Section 6 focuses on the implementation strategy and presents its application to the development of a set of extractable proof tactics for a B prover. Section 7 discusses further uses of BiCoq and mentions some existing extensions. Finally, Sec. 8 concludes and identifies further activities.

## 1 A Short Introduction to B

In a nutshell, the B method defines a first-order predicate logic extended with elements of set theory, a Generalised Substitution Language (GSL) and a development methodology. An abstract B machine is a module combining a state, properties and operations (described as substitutions) to read or alter the state.

The logic is used to express preconditions, invariants, etc. and to conduct proofs. The GSL allows for the definition of substitutions that can be abstract, declarative and non-deterministic (that is, specifications) as well as concrete, imperative and deterministic (that is, programs). The following example uses a non-deterministic substitution (a “magic” operator finding a value that satisfies a property) to specify the square root of a natural number:

###### Example 1
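B substitutions are not Python, but the distinction between a specification and a program can be sketched as follows, assuming that the intended square root is the integer one, i.e. the natural $r$ with $r^2 \le n < (r+1)^2$. The names `sqrt_spec`, `sqrt_choice` and `sqrt_impl` are hypothetical: the first states the property, the second stands in for the “magic” operator by a brute-force search, and the third is one deterministic program that a refinement could lead to.

```python
# Hypothetical names; assumes the integer square root is meant.

def sqrt_spec(n: int, r: int) -> bool:
    """Specification: r is the natural number with r*r <= n < (r+1)*(r+1)."""
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)


def sqrt_choice(n: int) -> int:
    """Stand-in for the 'magic' operator: search for a value meeting the spec."""
    return next(r for r in range(n + 1) if sqrt_spec(n, r))


def sqrt_impl(n: int) -> int:
    """A deterministic program: binary search for the same value."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so that lo = mid always progresses
        if mid * mid <= n:
            lo = mid              # mid is still a valid lower bound
        else:
            hi = mid - 1          # mid is too large
    return lo
```

Both functions return the same value for every natural `n`; only the program is efficient, while the specification merely states what any correct result must satisfy.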

Regarding the methodology, a machine refines an abstract machine if the two cannot be distinguished by valid operation calls, this notion being independent of internal representations, as illustrated by the following example of a system returning the maximum of a set of stored values:

###### Example 2

The state of the abstract machine is a (non-implementable) set of natural numbers, while the state of the refining machine is a single natural number. Yet the latter, having the expected behaviour, refines the former.
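The following Python sketch mimics the two machines of this example; the class and method names, and the initialisation of the concrete state to 0, are assumptions made for illustration. In B, refinement is established by discharging proof obligations; here we only compare the observations of both machines on a few call sequences, which illustrates indistinguishability but proves nothing.

```python
from __future__ import annotations

# Hypothetical machines; names and the initial value 0 are assumptions.

class AbstractMax:
    """Specification: the state is the set of all stored naturals."""

    def __init__(self) -> None:
        self.values: set[int] = set()

    def store(self, v: int) -> None:
        assert v >= 0, "precondition: v is a natural number"
        self.values.add(v)

    def get(self) -> int:
        assert self.values, "precondition: at least one value was stored"
        return max(self.values)


class ConcreteMax:
    """Refinement: the state is a single natural, the running maximum."""

    def __init__(self) -> None:
        self.best = 0  # harmless start: stored values are naturals, hence >= 0

    def store(self, v: int) -> None:
        assert v >= 0, "precondition: v is a natural number"
        self.best = max(self.best, v)

    def get(self) -> int:
        return self.best


def indistinguishable(trace: list[int]) -> bool:
    """Apply the same valid calls to both machines and compare each observation."""
    spec, impl = AbstractMax(), ConcreteMax()
    for v in trace:
        spec.store(v)
        impl.store(v)
        if spec.get() != impl.get():
            return False
    return True
```

The concrete machine forgets almost everything about the stored values, yet no valid sequence of calls can tell it apart from the specification.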

Refinement being transitive, it is possible to go progressively from the specification to the implementation. By discharging at each step the proof obligations defined by the B methodology, a program can be proved to be a correct and complete implementation of a specification. This methodology, combined with the numerous native notions provided by set theory and the availability of toolkits, makes B a popular formal method, widely used in industry.

Note that the B logic is not genuinely typed and allows for the manipulation of free variables. A special mechanism, called type-checking (but hereafter referred to as wf-checking), filters out ill-formed (potentially paradoxical) terms; as it deserves a dedicated analysis, it is only mentioned in this paper.

The rest of the paper only deals with the B logic (its inference rules).

## 2 A Short Introduction to Coq

Coq is a proof assistant based on a type theory. It offers a higher-order logical framework that allows for the construction and verification of proofs, as well as the development and analysis of functional programs in an ML-like language with pattern-matching. It is possible in Coq to define values and types, including dependent types (that is, types that explicitly depend on values); types of sort Set represent sets of computational values, while types of sort Prop represent logical propositions. When an inductive type (that is, a least fixpoint) is defined, the associated structural induction principles are automatically generated.

For the purpose of this paper, it is sufficient to see Coq as allowing for the manipulation of inductive sets of terms. For example, consider the standard representation of natural numbers:

###### Example 3

It defines a type nat, which is the smallest set of terms closed under application of the constructors O and S. Thus nat consists exactly of the terms $S^n\,O$ for any finite $n$ (with $S^0\,O = O$); as nat is well-founded, structural induction on nat is possible.
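Python has no inductive types, so the following is only an approximation of the Coq definition: two frozen dataclasses named after the constructors O and S, plus hypothetical helper functions written by structural recursion (one case per constructor, recursive calls on direct subterms only). Unlike Coq, Python does not guarantee that `Nat` values are built from these two constructors alone.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class O:
    """Constructor O: zero."""


@dataclass(frozen=True)
class S:
    """Constructor S: successor of a natural number."""
    pred: "Nat"


Nat = Union[O, S]  # intended: only the terms built from O and S


def to_int(n: Nat) -> int:
    """Structural recursion: one case per constructor (hypothetical helper)."""
    return 0 if isinstance(n, O) else 1 + to_int(n.pred)


def add(m: Nat, n: Nat) -> Nat:
    """Addition by structural recursion on the first argument (hypothetical helper)."""
    return n if isinstance(m, O) else S(add(m.pred, n))


def of_int(k: int) -> Nat:
    """Build the term with k applications of S to O, for k >= 0."""
    n: Nat = O()
    for _ in range(k):
        n = S(n)
    return n
```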

Coq also allows for the declaration of inductive logical properties, e.g.:

###### Example 4

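It defines a family of logical types indexed by natural numbers: some of these types are inhabited by terms built from the constructors of the predicate, while others are empty. The standard interpretation is that an inhabitant of such a type is a proof of the corresponding proposition, and that there is no proof of a proposition whose type is empty, that is, its negation holds.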

An intuitive interpretation of our two examples is that nat is a set of terms, and the inductive predicate a predicate marking some of them, thereby defining a subset of nat.
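The “set of terms plus marking predicate” reading can be sketched in Python with a hypothetical predicate, evenness, whose two rules are a base case (0 is even) and a step (if k is even then k + 2 is even). Derivations are plain tuples built from the rule names, and `proves` computes which number a derivation marks; since Python cannot restrict which tuples exist, malformed derivations are rejected at run time.

```python
# Hypothetical inductive predicate "k is even", with two rules:
#   ("zero",)      : 0 is even
#   ("step", d)    : if d proves that k is even, then k + 2 is even

def proves(d: tuple) -> int:
    """Return the number that derivation d marks as even, or raise ValueError."""
    if d == ("zero",):
        return 0
    if len(d) == 2 and d[0] == "step":
        return proves(d[1]) + 2
    raise ValueError(f"not a derivation: {d!r}")


def derive(k: int) -> tuple:
    """Build a derivation of 'k is even'; raise ValueError when none exists."""
    if k == 0:
        return ("zero",)
    if k >= 2:
        return ("step", derive(k - 2))
    raise ValueError(f"no rule concludes that {k} is even")
```

The numbers for which a derivation exists form exactly the subset marked by the predicate.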

## 3 Deep Embedding and Related Works

Embedding in a proof assistant consists of mechanising a guest logic by encoding its syntax and semantics in a host logic ([gor:2, bou:1, azu:1]). In a shallow embedding, the encoding is partially based on a direct translation of the guest logic into constructs of the host logic. In a deep embedding, the syntax and the semantics are formalised as datatypes. At a fundamental level, taking the view presented in Sec. 2, the deep embedding of a logic is simply a definition of the set of all sequents (the terms) and of a predicate marking those that are provable (the inference rules of the guest logic being encoded as constructors of this predicate).
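As a minimal sketch of this view, assuming a toy guest logic (implicational propositional logic, not B), the following Python code represents formulas and derivation trees as tuples, the constructors of derivations playing the role of inference rules. Since Python has no inductive predicates, a hypothetical checker `check` computes the sequent established by a derivation and rejects ill-formed ones; the provable sequents are exactly those returned for some derivation.

```python
# Toy guest logic (hypothetical, not B): formulas are ("atom", name) or ("imp", P, Q).
# Derivations are trees whose constructors are the inference rules:
#   ("ax", P)          proves  P ⊢ P
#   ("imp_i", P, d)    from Γ ⊢ Q infers  Γ \ {P} ⊢ P ⇒ Q
#   ("imp_e", d1, d2)  from Γ1 ⊢ P ⇒ Q and Γ2 ⊢ P infers  Γ1 ∪ Γ2 ⊢ Q

def check(d):
    """Return the sequent (hypotheses, goal) proved by derivation d, or raise ValueError."""
    rule = d[0]
    if rule == "ax":
        _, p = d
        return frozenset([p]), p
    if rule == "imp_i":
        _, p, sub = d
        hyps, q = check(sub)
        return hyps - {p}, ("imp", p, q)    # discharge hypothesis p
    if rule == "imp_e":
        _, major, minor = d
        hyps1, f = check(major)
        hyps2, p = check(minor)
        if f[0] != "imp" or f[1] != p:
            raise ValueError("modus ponens: premises do not match")
        return hyps1 | hyps2, f[2]
    raise ValueError(f"unknown rule {rule!r}")
```

Host and guest stay separate: the guest formulas are mere data, and nothing proved about them leaks into Python's own reasoning.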

Shallow embeddings of B in higher-order logics have been proposed in several papers (cf. [bod:1b, cha:1]), formalising the GSL in PVS, Coq or Isabelle/HOL. Such embeddings do not deal with the B logic and, by directly using the host logic to express B notions, they introduce a form of interpretation. If the objective is an accurate formalisation of the guest system, defining a valid interpretation is difficult: for example, B functions are relations, possibly partial or undecidable, and accurately translating this concept into Coq is a tricky exercise.

BiCoq aims at such an accurate formalisation, to pinpoint any problem in the theory, with the objective of increasing confidence in developments when safety or security is a concern; in addition, we also have an implementation objective. In such cases, a deep embedding is fully justified; see for example the sound and complete theorem prover for first-order logic verified in Isabelle proposed in [rid:1].

A deep embedding of the B logic in Coq is described in [brk:1] (using named notations), to validate the base rules used by the prover of Atelier B, yet without checking standard B results and without an implementation goal. As far as the implementation of a trusted B prover is concerned, we can also mention the encoding of the B logic as a rewriting system proposed in [cir:1].

Deep embeddings also have the advantage of clearly separating the host and guest logics: in BiCoq, the excluded middle, provable in B, is not promoted to Coq. This improves readability and allows one to study meta-theoretical questions such as consistency. Furthermore, the consistency of the host logic is not endangered.

## 4 Formalising the B Logic in Coq

In this section, we present our embedding of the B logic in the Coq system; the embedding uses a De Bruijn representation that avoids ambiguities and constitutes an efficient solution w.r.t. the implementation objective (see [deb:1, lia:1]). Deviations between B and its formalisation are described and justified.

###### Notation

B definitions use upper-case letters with standard notations. BiCoq uses lower-case letters and mixes B and Coq notations: standard notations are used for Coq (e.g. $\forall$ is the universal quantification), while dotted notations are used for the embedded B (e.g. $\dot\forall$ is the universal quantification constructor).

###### Notation

A dedicated notation denotes the type of lists whose elements have a given type.

### 4.1 Syntax

Given a set of identifiers, the B logic syntax defines predicates, expressions, sets and variables as follows:

This syntax includes the (elementary) substitution, lists of variables, pairs of expressions, the choice and powerset operators, and a constant set. The comprehension set operator, while syntactically defined for any list of variables and any predicate, is rejected at wf-checking if it is not of the form $\{x \mid x \in E \wedge P\}$, with $x$ a variable not free in $E$.

###### Definition

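Other connectives are defined from the previous ones: $P \vee Q$ is defined as $\neg P \Rightarrow Q$, $P \Leftrightarrow Q$ as $(P \Rightarrow Q) \wedge (Q \Rightarrow P)$, and $\exists x \cdot P$ as $\neg \forall x \cdot \neg P$.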

The first design choice of BiCoq is to use a pure nameless De Bruijn notation (see [deb:1, ayd:1]), in which variables are represented by indexes giving the position of their binder, here the universal quantifier or the comprehension set. When an index exceeds the number of enclosing binders, it is said to be dangling and represents a free variable, whose name is provided by a scope (left implicit in this paper). Hence any syntactically correct term is semantically valid, and no well-formedness condition is needed (an alternative approach to avoiding well-formedness conditions is described in [pat:1]). In this representation, proofs of side conditions related to name clashes are replaced by computations on indexes, but the index representing a given variable is not constant throughout a term.
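The following Python sketch translates named terms into this nameless form, for a hypothetical toy fragment (variables, equality, implication and a single-variable universal quantifier, encoded as tuples). The function name and the encoding of the scope as a list of free-variable names are assumptions for illustration, not BiCoq definitions.

```python
# Hypothetical named encoding: ("var", x), ("forall", x, P), ("eq", a, b), ("imp", p, q).
# Nameless output: ("idx", i) replaces variables, ("forall", P) carries no name.

def to_debruijn(term, scope, bound=()):
    """Translate a named term into nameless form.

    `scope` lists the names of the free variables (index 0 first);
    `bound` lists the names of the enclosing binders, innermost first.
    """
    tag = term[0]
    if tag == "var":
        name = term[1]
        if name in bound:
            # bound variable: distance to its (innermost) binder
            return ("idx", bound.index(name))
        # free variable: dangling index, past all enclosing binders
        return ("idx", len(bound) + scope.index(name))
    if tag == "forall":
        _, name, body = term
        return ("forall", to_debruijn(body, scope, (name,) + bound))
    # other constructors: translate each argument
    return (tag,) + tuple(to_debruijn(t, scope, bound) for t in term[1:])
```

Alpha-equivalent terms such as ∀x·(x = y) and ∀z·(z = y) receive the same representation, while the free variable y is index 0 outside the binder and index 1 inside it.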

The B syntax is formalised in Coq by two mutually inductive types with the following constructors, indexes being natural numbers and names being taken from an infinite set with a decidable equality:

The first type represents B predicates, while the second merges B expressions, sets and variables.

With a De Bruijn representation, the binders (the universal quantifier and the comprehension set) have no attached names, and each (implicitly) binds a single variable. Binding over lists of variables can be eliminated without loss of expressivity, as illustrated by the following example:

###### Example 5

(The second representation, while standard in B, appears to be an illegal binding over an expression, namely a pair, rather than over a list of variables; however, the same notation is used for both in [abr:1], and such confusions are frequent.)
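On named terms, eliminating list binders amounts to the following desugaring, sketched in Python with a hypothetical tuple encoding in which `("forall*", [x1, x2], P)` stands for a binder over a list of variables; the encoding and the function name are assumptions.

```python
# Hypothetical named encoding: ("var", x), ("forall", x, P), ("forall*", [x1, x2], P),
# other constructors as ("tag", arg1, arg2, ...).

def desugar(term):
    """Replace every list binder by nested single-variable binders."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "forall*":
        _, names, body = term
        result = desugar(body)
        for name in reversed(names):   # the last name gets the innermost binder
            result = ("forall", name, result)
        return result
    if tag == "forall":
        _, name, body = term
        return ("forall", name, desugar(body))
    return (tag,) + tuple(desugar(t) for t in term[1:])
```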

The comprehension set constructor is further modified to be parameterised by an expression, so that the syntax definition only contains wf-checkable terms. Indeed, only comprehension sets of the form $\{x \mid x \in E \wedge P\}$, with $x$ not free in $E$, are valid. BiCoq represents such a set by applying this constructor to $E$ and to the (nameless) predicate; to reflect the non-freeness condition, the constructor only binds variables in its predicate parameter. With these design choices, we bridge the gap between syntactically correct terms and wf-checkable ones, while remaining conservative.

Further constructors represent the constant set and unary (De Bruijn) variables. Another constructor has no B equivalent; its purpose is explained with the inference rules.

###### Notation

denotes the application of constructor to and of constructor to . By abuse of notation the variable is also denoted simply by .

Finally, the elementary substitution is not a syntactic construct in BiCoq; it is replaced by functions on terms, since B only introduces substitution at this stage to describe the inference rules. Note, however, that the full GSL of B can still be formalised by additional term constructors (the explicit substitution approach, see [aba:1, cur:1]).
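For illustration, substitution as a function on nameless terms can be sketched as follows, on a hypothetical tuple encoding (indexes `("idx", i)`, a single binder `("forall", body)`, other constructors as tagged tuples). The replaced variable is a free variable identified by its index in the scope; the substituted expression is lifted each time the traversal goes under a binder, which prevents variable capture without any renaming. Function names are assumptions, not BiCoq's.

```python
# Hypothetical nameless encoding: ("idx", i), ("forall", body),
# other constructors as ("tag", arg1, ...).

def lift(term, cutoff=0):
    """Increase by one every index that is dangling (>= cutoff) in term."""
    tag = term[0]
    if tag == "idx":
        j = term[1]
        return ("idx", j + 1) if j >= cutoff else term
    if tag == "forall":
        return ("forall", lift(term[1], cutoff + 1))
    return (tag,) + tuple(lift(t, cutoff) for t in term[1:])


def subst(term, i, e, level=0):
    """Compute [x := e]term, where x is the free variable with scope index i."""
    tag = term[0]
    if tag == "idx":
        if term[1] == i + level:        # x, seen under `level` binders
            for _ in range(level):
                e = lift(e)             # adapt e's free indexes to the binders crossed
            return e
        return term
    if tag == "forall":
        return ("forall", subst(term[1], i, e, level + 1))
    return (tag,) + tuple(subst(t, i, e, level) for t in term[1:])
```

For instance, with the scope `[x, y]`, substituting `y` for `x` in ∀z·(z = x) yields ∀z·(z = y), the index of `y` becoming 2 under the binder.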

###### Notation

is defined as , as , and as .

###### Notation

A dedicated notation denotes the type of terms, that is, the union of the types of predicates and expressions.

### 4.2 Dealing with the De Bruijn Notation

De Bruijn notation is an elegant way to avoid complex name management and has numerous merits, but it also has a major drawback, as it is an unusual representation for human readers:

###### Example 6

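If a formula is the interpretation of a term in a given scope, placing that term under a universal quantifier does not yield the quantification of the formula: because of the binder, the scope has shifted (so index 2 now represents a different variable), and the meaning has (likely) been distorted.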
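The shift can be observed with a small Python pretty-printer for nameless terms. This is a hypothetical helper, assuming a tuple encoding in which `("idx", i)` is an index and `("forall", body)` a binder; bound variables are printed as v0, v1, ... by nesting level, and dangling indexes are looked up in the scope.

```python
# Hypothetical nameless encoding: ("idx", i), ("forall", P), ("eq", a, b), ("imp", p, q).

def show(term, scope, level=0):
    """Print a nameless term; bound variables are named v0, v1, ... by nesting level."""
    tag = term[0]
    if tag == "idx":
        i = term[1]
        if i < level:
            return f"v{level - 1 - i}"    # bound: name of the binder i levels up
        return scope[i - level]           # dangling: look the name up in the scope
    if tag == "forall":
        return f"∀v{level}·({show(term[1], scope, level + 1)})"
    if tag == "eq":
        return f"{show(term[1], scope, level)} = {show(term[2], scope, level)}"
    if tag == "imp":
        return f"{show(term[1], scope, level)} ⇒ {show(term[2], scope, level)}"
    raise ValueError(f"unknown constructor {tag!r}")
```

With the scope `[x, y, z]`, the subterm with indexes 0 and 2 reads `x = z` at top level but `∀v0·(v0 = y)` under a binder: index 0 is captured and index 2 now denotes `y`.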

In this paragraph, we illustrate some consequences of using a De Bruijn notation, and how to mask them from users.

#### 4.2.1 Induction

When the type of terms is defined, Coq automatically generates the associated structural induction principle. As illustrated in Ex. 6, this principle is however not semantically adequate, because it does not reflect the scoping of De Bruijn indexes. A more useful principle is derived in BiCoq by using the syntactic depth of a term as a well-founded measure:

With this principle, for a universally quantified term, we can choose to use an induction hypothesis on its body instantiated by a variable, instead of on its raw body, in which the bound variable appears as a dangling index.
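The key fact behind this principle can be checked on a Python sketch, assuming a hypothetical tuple encoding of nameless terms: the depth measure ignores the values of indexes, so instantiating the body of a binder by a variable preserves its depth, and the instantiated body is strictly smaller than the quantified term. The names `depth` and `instantiate` are illustrative.

```python
# Hypothetical nameless encoding: ("idx", i), ("forall", body),
# other constructors as ("tag", arg1, ...), constants as ("tag",).

def depth(term) -> int:
    """Syntactic depth: independent of the values of the indexes."""
    if term[0] == "idx" or len(term) == 1:
        return 1                        # indexes and constants are leaves
    return 1 + max(depth(t) for t in term[1:])


def instantiate(body, j, level=0):
    """Open a binder's body, replacing its variable by free variable j."""
    tag = body[0]
    if tag == "idx":
        i = body[1]
        if i < level:
            return body                 # bound inside the body: unchanged
        if i == level:
            return ("idx", j + level)   # the opened variable, seen under `level` binders
        return ("idx", i - 1)           # dangling: one binder fewer above it
    if tag == "forall":
        return ("forall", instantiate(body[1], j, level + 1))
    return (tag,) + tuple(instantiate(t, j, level) for t in body[1:])
```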

#### 4.2.2 Non-Freeness

The B notation $x \backslash P$ means that the variable $x$ does not appear free in $P$. Non-freeness is defined in BiCoq as an inductive type (a relation between indexes, representing the variables, and terms), with the following rules (the rules for the other constructors are straightforward and obtained by similar extension):

The first two rules are axioms: the associated constructors are atomic and do not interact with variables. The rules for the universal quantifier and the comprehension set reflect the fact that the associated constructors are binders and therefore shift the scope.
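A boolean function in the spirit of these rules can be sketched in Python, assuming a hypothetical tuple encoding with an atomic constant `("big",)`, indexes `("idx", i)`, the universal quantifier `("forall", P)` and a comprehension set `("cmp", E, P)` binding only in `P`. Going under a binder, the tested free variable is seen with an index increased by one. In BiCoq, non-freeness is an inductive relation rather than a function; this sketch only mirrors its shape.

```python
# Hypothetical nameless encoding: ("big",) constant set, ("idx", i) index,
# ("forall", P), ("cmp", E, P) comprehension set binding only in P,
# other constructors as ("tag", arg1, ...).

def not_free(i, term):
    """True when free variable i does not occur in term."""
    tag = term[0]
    if tag == "big":
        return True                       # atomic: no variables
    if tag == "idx":
        return term[1] != i
    if tag == "forall":
        return not_free(i + 1, term[1])   # binder: i is seen as i + 1 inside
    if tag == "cmp":
        _, set_expr, pred = term
        # E is outside the binder, P is under it
        return not_free(i, set_expr) and not_free(i + 1, pred)
    return all(not_free(i, t) for t in term[1:])
```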

#### 4.2.3 Binding, Instantiation and Substitution

It is possible to define functions simulating B binding (that is, the use of universal quantification or comprehension sets, representing λ-abstraction). These functions constitute a built-in user interface to produce De Bruijn terms while using the usual representation, making De Bruijn indexes and their management invisible to the user (see also [gor:1] for a similar approach).
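A minimal Python sketch of such a binding function, assuming a hypothetical tuple encoding of nameless terms in which the user designates the variable to bind by its position in the scope: the chosen variable becomes the index of the new binder (0 at top level, more under nested binders), while the other free variables are shifted by one to account for the extra binder. The names `bind` and `forall` are illustrative.

```python
# Hypothetical nameless encoding: ("idx", i), ("forall", body),
# other constructors as ("tag", arg1, ...). Free variable k of the scope
# appears as index k + (number of enclosing binders).

def bind(i, term, level=0):
    """Turn free variable i of term into the variable of a new outer binder."""
    tag = term[0]
    if tag == "idx":
        j = term[1]
        if j < level:
            return term                 # bound inside term: unchanged
        if j - level == i:
            return ("idx", level)       # refers to the new binder
        return ("idx", j + 1)           # other free variables: one more binder above
    if tag == "forall":
        return ("forall", bind(i, term[1], level + 1))
    return (tag,) + tuple(bind(i, t, level) for t in term[1:])


def forall(i, term):
    """User-level constructor for '∀x·term', x being free variable i."""
    return ("forall", bind(i, term))
```

With the scope `[x, y]`, the user writes `forall(0, x = y)` and obtains the nameless term for ∀x·(x = y) without handling any index shift.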