System FR as Foundations for Stainless
Abstract.
We present the design, implementation, and foundation of a verifier for higher-order functional programs with generics and recursive data types. Our system supports proving safety and termination using preconditions, postconditions and assertions. It supports writing proof hints using assertions and recursive calls. To formalize the soundness of the system we introduce System FR, a calculus supporting System F polymorphism, dependent refinement types, and recursive types (including recursion through contravariant positions of function types). Through the use of sized types, System FR supports reasoning about termination of lazy data structures such as streams. We formalize a reducibility argument using the Coq proof assistant and prove the soundness of a type checker with respect to call-by-value semantics, ensuring type safety and normalization for typeable programs. Our program verifier is implemented as an alternative verification-condition generator for the Stainless tool, which relies on an existing SMT-based solver backend for automation. We demonstrate the efficiency of our approach by verifying a collection of higher-order functional programs comprising around 14000 lines of polymorphic higher-order Scala code, including graph search algorithms, basic number theory, monad laws, functional data structures, and assignments from popular Functional Programming MOOCs.
1. Introduction
Automatically verifying the correctness of higher-order programs is a challenging problem that applies to most modern programming languages and proof assistants. Despite extensive research in program verifiers (Nipkow et al., 2002a; Bertot and Castéran, 2004a; Abel, 2010; Norell, 2007; Brady, 2013; Vazou et al., 2014; Swamy et al., 2013; Leino, 2010) there remain significant challenges and trade-offs in checking safety and termination. One motivation for our work is the line of implementations that verify polymorphic functional programs using SMT solvers (Suter et al., 2011; Vazou et al., 2014). To focus on foundations, we look at simpler verifiers that do not perform invariant inference and are mostly based on unfolding recursive definitions and encoding higher-order functions into SMT theories (Suter et al., 2011; Voirol et al., 2015). A recent implementation of such a verifier is the Stainless system (https://github.com/epfl-lara/stainless), which claims to handle a subset of Scala (Odersky et al., 2008). The goal of Stainless is to verify that function contracts hold and that all functions terminate. Unfortunately, to the best of our knowledge, the termination checking procedure is not documented, and even its soundness can be doubted. Researchers have shown (Hupel and Kuncak, 2016) how to map certain patterns of specified Scala programs into Isabelle/HOL to ensure verification, but the link-up imposes a number of restrictions on data type definitions and can certify only a fraction of programs that the original verifier can prove. This paper seeks foundations for verification and termination checking of functional programs with such a rich set of features.
Termination is desirable for many executable functions in programs and is even more important in formal specifications. A non-terminating function definition such as $f(x) = 1 + f(x)$ could be easily mapped to a contradiction and violate the conservative extension principle for definitions. Yet termination in the presence of higher-order functions and data types is challenging to ensure. For example, when using non-monotonic recursive types, terms can diverge even without the explicit use of recursive functions, as illustrated by the following snippet of Scala code:
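A minimal sketch of such a program could look as follows. The names `NonMonotonic`, `D`, `g`, `Cutoff` and `demo` are hypothetical, and the call counter with its cutoff is an addition made only so that the demonstration stops instead of exhausting the stack:

```scala
object NonMonotonic {
  // D occurs to the left of an arrow in its own definition (a negative occurrence).
  final case class D(f: D => Unit)

  // Hypothetical device used only to interrupt the otherwise endless computation.
  final class Cutoff extends RuntimeException("cutoff reached")
  var calls: Int = 0

  // The body of g never mentions g: it only applies the function stored in d.
  def g(d: D): Unit = {
    calls += 1
    if (calls >= 1000) throw new Cutoff // without this line the call in demo never returns
    d.f(d)
  }

  // g(D(g)) evaluates to D(g).f(D(g)), which is g(D(g)) again, and so on.
  def demo(): Boolean =
    try { g(D(g)); false } catch { case _: Cutoff => true }
}
```

No definition here is syntactically recursive; the loop arises purely from passing a value of type `D` to the function it contains.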
Furthermore, even though the concept of termination for all function inputs is an intuitively clear property, its modular definition is subtle: a higher-order function that takes another function as an argument should terminate when given any terminating function, which in turn can be applied to expressions involving further calls to the higher-order function. We were thus led to type-theoretic techniques, where the reducibility method has long been used to show strong normalization of expressive calculi (Tait, 1967), (Girard, 1990, Chapter 6), (Harper, 2016). As a natural framework for analyzing support for first-class functions with preconditions and postconditions we embraced the ideas of refinement dependent types similar to those in Liquid Haskell (Vazou et al., 2014) with a refinement-based notion of subtyping. To explain verification condition generation in the higher-order case (including the question of which assumptions should be visible inside for a given assertion), we resorted to well-known dependent ($\Pi$) function types. To support polymorphism we incorporated type quantifiers, as in System F (Girard, 1971, 1990). We found that the presence of refinement types allowed us to explain soundness of well-founded recursion based on user-defined measures. To provide expressive support for iterative unfolding of recursive functions, we introduced rules to make function bodies available while type checking recursive functions. For recursive types, many existing systems introduce separate notions of inductive and coinductive definitions. We found this distinction less natural for developers and chose to support expressive recursive types (without a necessary restriction to positive recursion) using sized types (Abel, 2010). We draw inspiration from a number of existing systems, yet our solution has a new combination of features that work nicely together. 
For example, we can encode user-defined termination measures for functions using a general fixpoint combinator and refinement types that ensure the termination condition semantically. The recursion in programs is thus not syntactically restricted as in, e.g., System F.
We combined these features into a new type system, System FR, which we present as a bidirectional type checking algorithm. The algorithm generates type checking and type inference goals by traversing terms and types, until it reaches a point where it has to check that a given term evaluates to true. This typically arises when we want to check that a term $t$ has a refinement type $\{x : \tau \mid b\}$, which is the case when $t$ has type $\tau$, and when the term $b$ evaluates to true in the context where $x$ equals $t$. Following the tradition of SMT-based verifiers (Detlefs et al., 1998; Barnett et al., 2004), we call checks that some terms evaluate to true verification conditions.
We prove the soundness of our type system using a reducibility interpretation of types. The goal of our verification system is to ensure that a given term belongs to the semantic denotation of a given type. For simple types such as natural numbers, this denotation is the set of untyped lambda calculus terms that evaluate, in a finite number of steps, to a non-negative integer. For function types, the denotation consists, as is typical in reducibility approaches, of terms that, when applied to terms in the denotation of the argument type, evaluate to terms in the denotation of the result type. Such a denotation gives us a unified framework for function contracts expressed as refinement types. The approach ensures termination of programs because the semantics of types only contain terms that are terminating in call-by-value semantics.
We have formally proven using the Coq proof assistant (Bertot and Castéran, 2004a) the soundness of our typing algorithm, implying that when verification conditions generated for checking that a term belongs to a type are semantically valid, the term belongs to the semantic denotation of the type. The bidirectional typing algorithm handles the expressive types in a deterministic and predictable way, which enables good and localized error reporting to the user. To solve generated verification conditions, we use the existing implementation invoking the Inox solver (https://github.com/epfl-lara/inox), which translates them into the first-order language of SMT solvers (Voirol et al., 2015). Our semantics of types provides a definition of soundness for such solvers; any solver that respects the semantics can be used with our verification condition generator. Our bidirectional type checking algorithm thus becomes a new, trustworthy verification condition generator for Stainless. We were successful in verifying many existing Stainless benchmarks using the new approach.
We summarize our contributions as follows:

We define a bidirectional type-checking algorithm for System FR (Section 5.6). Our algorithm generates verification conditions that are then solved by the (existing) SMT-based solver Inox.

We prove
Figure 1. Template of a recursive function with user-given contracts and a decreasing measure.
2. Examples of Program Verification and Termination Checking
Our goal is to verify correctness and termination of pure Scala functions written as in Figure 1. pre[x] is the precondition of the function f, and is written by the user in the same language as the body of f. The precondition may contain arbitrary expressions and calls to other functions. Similarly, the user specifies in post[x, res] the property that the results of the function should satisfy. To ensure termination of f (which might call itself recursively), the user may also provide a measure using the decreases keyword, which is also an expression (of type $\mathsf{Nat}$, the type of natural numbers) written in the same language. $\tau_1$ and $\tau_2$ may be arbitrary types, including function types or algebraic data types. Informally, the function is terminating and correct, if, for every value v of type $\tau_1$ such that pre[v] evaluates to true, f(v) returns (in a finite number of steps) a value res of type $\tau_2$ such that post[v, res] evaluates to true. By using dependent and refinement types, this can be summarized by saying that the function f has type:

$$\Pi x : \{x : \tau_1 \mid \mathit{pre}[x]\}.\ \{\mathit{res} : \tau_2 \mid \mathit{post}[x, \mathit{res}]\}$$
    sealed abstract class List
    case object Nil extends List
    case class Cons(head: ℤ, tail: List) extends List

    def filter(l: List, p: ℤ => Boolean): List = {
      decreases(l)
      l match {
        case Nil => Nil
        case Cons(h, t) if p(h) => Cons(h, filter(t, p))
        case Cons(_, t) => filter(t, p)
      }
    }

    def count(l: List, x: ℤ): ℤ = {
      decreases(l)
      l match {
        case Nil => 0
        case Cons(h, t) => (if (h == x) 1 else 0) + count(t, x)
      }
    }

Figure 2. The function filter filters elements of a list based on a predicate p, and count counts the number of occurrences of x in a list.

    def partition(l: List, p: ℤ => Boolean): (List, List) = {
      decreases(size(l))
      l match {
        case Nil => (Nil, Nil)
        case Cons(x, xs) =>
          val (l1, l2) = partition(xs, p)
          if (p(x)) (Cons(x, l1), l2)
          else (l1, Cons(x, l2))
      }
    } ensuring { res =>
      res._1 == filter(l, p) &&
      res._2 == filter(l, x => !p(x))
    }

Figure 3. A partition function specified using filter, with a termination measure given by size.

    def partitionMultiplicity(@induct l: List, p: ℤ => Boolean, x: ℤ): Boolean = {
      val (l1, l2) = partition(l, p)
      count(l, x) == count(l1, x) + count(l2, x)
    } holds

Figure 4. A proof (by induction on l) that partitioning a list preserves the multiplicity of each element.

    def isSorted(l: List): Boolean = {
      decreases(size(l))
      l match {
        case Nil => true
        case Cons(x, Nil) => true
        case Cons(x, Cons(y, ys)) => x <= y && isSorted(Cons(y, ys))
      }
    }

    def merge(l1: List, l2: List): List = {
      require(isSorted(l1) && isSorted(l2))
      decreases(size(l1) + size(l2))
      (l1, l2) match {
        case (Cons(x, xs), Cons(y, ys)) =>
          if (x <= y) Cons(x, merge(xs, l2))
          else Cons(y, merge(l1, ys))
        case (Cons(_, _), Nil) => l1
        case _ => l2
      }
    } ensuring { res => isSorted(res) }

Figure 5. A function that checks whether a list is sorted and a function that merges two sorted lists.

As an example, consider the list type as defined in Figure 2. We use ℤ to denote the type of integers (corresponding to Scala's BigInt in actual source code). The function filter filters elements from a list, while count counts the number of occurrences of an integer in the list. These two functions have no pre- or postconditions. 
The decreases clauses specify that the functions terminate because the size of the list decreases at each recursive call.
Using these functions we define partition in Figure 3, which takes a list l of integers and partitions it according to a predicate p: ℤ => Boolean. We prove in the postcondition that partitioning coincides with applying filter to the list with p and its negation.
Figure 4 shows a theorem that partition also preserves the multiplicity of each element. Here we use count to state the property, but we could have used multisets instead (a type which is natively supported in Stainless). The holds keyword is a shorthand for ensuring { res => res }. The @induct annotation instructs the system to add a recursive call to partitionMultiplicity on the tail of l when l is not empty. This gives us access to the multiplicity property for the tail of l, which the system can then use automatically to prove that the property holds for l itself. This corresponds to a proof by induction on l.
Figure 5 shows a function isSorted that checks whether a list is sorted, and a function merge that combines two sorted lists into a sorted list. When given the above input, the system proves the termination of all functions, establishes that postconditions of functions hold, and shows that the theorem holds, without any user interaction or additional annotations.
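To get a feel for what these contracts state, the partition example can also be exercised in plain Scala, where `ensuring` (from `scala.Predef`) is checked at run time on each call rather than proven once and for all. The sketch below is only an illustration under stated assumptions: the names `PartitionDemo`, `IntList` and `Empty` are hypothetical, `BigInt` stands for ℤ, and the `decreases` and `@induct` annotations are omitted because they have no run-time counterpart here.

```scala
object PartitionDemo {
  sealed abstract class IntList
  case object Empty extends IntList
  final case class Cons(head: BigInt, tail: IntList) extends IntList

  def filter(l: IntList, p: BigInt => Boolean): IntList = l match {
    case Empty => Empty
    case Cons(h, t) if p(h) => Cons(h, filter(t, p))
    case Cons(_, t) => filter(t, p)
  }

  def count(l: IntList, x: BigInt): BigInt = l match {
    case Empty => BigInt(0)
    case Cons(h, t) => (if (h == x) BigInt(1) else BigInt(0)) + count(t, x)
  }

  // The postcondition is evaluated on every call; a verifier would instead
  // establish it statically for all inputs.
  def partition(l: IntList, p: BigInt => Boolean): (IntList, IntList) = {
    l match {
      case Empty => (Empty, Empty)
      case Cons(x, xs) =>
        val (l1, l2) = partition(xs, p)
        if (p(x)) (Cons(x, l1), l2) else (l1, Cons(x, l2))
    }
  } ensuring { res => res._1 == filter(l, p) && res._2 == filter(l, x => !p(x)) }

  // Run-time check of the multiplicity property for one choice of l, p and x.
  def partitionMultiplicity(l: IntList, p: BigInt => Boolean, x: BigInt): Boolean = {
    val (l1, l2) = partition(l, p)
    count(l, x) == count(l1, x) + count(l2, x)
  }
}
```

Such a run only tests the property on the inputs supplied, whereas the verified version of Figure 4 covers every list, predicate and element.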
2.1. Reasoning about Streams
Our system also supports reasoning about infinite data structures, including streams that are computed on demand. These data structures are challenging to deal with because even defining termination of an infinite stream is non-obvious, especially in the absence of a concrete operation that uses the stream. Given some type X, Stream[X] represents the type of infinite streams containing elements in X. In a mainstream call-by-value language such as Scala, this type can be defined as:
    case class Stream[X](head: X, tail: () => Stream[X])

For the sake of concise syntax, we typeset a function taking unit, (u:Unit)=>e, using Scala's syntax () => e for a function of zero parameters. Given a stream s: Stream[X], we can call s.head to get the head of the stream (which is of type X), or s.tail to get the tail of the stream (which is of type () => Stream[X]). We can use recursion to define streams, as shown in Figures 6, 7 and 8. The @ghost annotation is used to mark the ghost parameters n of these functions. These parameters are used as annotations to guide our type checker, but they do not influence the computation and can be erased at runtime. For instance, an erased version of constant (without ghost code and without type annotation) looks like:
    def constant(x) = Stream(x, () => constant(x))

Informally, we can say that the constant stream is terminating. Indeed, it has the interesting property that, despite the recursion, for every $n \in \mathbb{N}$, we can take the first $n$ elements in finite time (no divergence in the computation). We say that constant(x) is an $n$-non-diverging stream. Moreover, when a stream is $n$-non-diverging for every $n$, we simply say that it is non-diverging, which means that we can take as many elements as we want without diverging, which is the case for constant(x). Note that non-divergence of constant cannot be shown by defining a measure on its argument x that strictly decreases on each recursive call, because constant is called recursively on the exact same argument x. Instead, we define a measure on the ghost argument n of the annotated version. This corresponds to using type-based termination (Abel, 2008, 2007; Barthe et al., 2008), where the type of the function for the recursive call is smaller than the type of the caller. We expand on that technique in Section 5.2.
In the annotated version of constant from Figure 8, the notation stands for streams of elements in X which are n-non-diverging. The type of constant then states that constant can be called with any (ghost) parameter n to build an n-non-diverging stream. Since parameter n is computationally irrelevant, this proves that the erased version of constant returns a non-diverging stream. At the moment, while our formalization fully supports streams, we have not made the Scala frontend modifications for parameters such as to parse the functions given above. Instead we construct them internally in our tool as syntax trees.
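The erased constant stream can be tried out directly in Scala. The following is a small sketch rather than the annotated version discussed above: `StreamDemo` and the helper `take` are hypothetical names, and `take(s, n)` simply unfolds the stream n times, which is exactly the operation that n-non-divergence guarantees to finish.

```scala
object StreamDemo {
  // The tail is a zero-argument function, so building a stream does not
  // force the construction of the rest of the stream.
  final case class Stream[X](head: X, tail: () => Stream[X])

  // Erased version of constant: the recursive call sits under a lambda.
  def constant[X](x: X): Stream[X] = Stream(x, () => constant(x))

  // Hypothetical helper: collects the first n elements by unfolding n times.
  def take[X](s: Stream[X], n: Int): List[X] =
    if (n <= 0) Nil else s.head :: take(s.tail(), n - 1)
}
```

Calling `StreamDemo.take(StreamDemo.constant(7), 3)` returns after three unfoldings, even though the recursion in `constant` never reaches a base case.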
Figure 9. Grammar for untyped lambda calculus terms
3. Syntax and Operational Semantics
We now give a formal syntax for terms and show (in the Appendix) the call-by-value operational semantics. This untyped lambda calculus with pairs, tagged unions, integers, booleans, and error values models programs that our verification system supports. It is Turing-complete and rather conventional.
3.1. Terms of an Untyped Calculus
Let be a set of variables. We let be the set of all (untyped) terms (see Figure 9) which includes the unit term , pairs, booleans, natural numbers, a recursor rec for iterating over natural numbers, a pattern matching operator match for natural numbers, a recursion operator fix, an error term to represent crashes, and a generic term to represent proofs of equality between terms. The recursor rec can be simulated using fix and match but we keep it in the paper for presenting examples.
The terms and are used to represent data structures (such as lists or streams), where ‘’ plays the role of a constructor, and ‘’ the role of a deconstructor. The terms and are used to represent the erasure of type abstractions and type instantiation terms (for polymorphism) of the form and , where is a type variable and is a type. These type-annotated terms will be introduced in a later section.
The term is a special term to internalize the sizes of syntax trees of values (ignoring lambdas) of our language. It is used as the measure of recursive functions such as the map examples on lists shown in Section 2.
Given a term we denote the set of all free variables of . Terms are considered up to renaming of locally bound variables (alpha-renaming).
3.2. Call-by-Value Operational Semantics
The set of values of our language is defined (inductively) to be , , , , , every variable , every lambda term or , the terms of the form or where , and the terms of the form where .
The call-by-value small-step relation between two terms , written , is standard for the most part and given in Figure 23 (Appendix A). Given a term and a value , denotes the term where every free occurrence of has been replaced by .
To evaluate the fixpoint operator fix, we use the rule , which substitutes the fix under a lambda with unit argument. We wrap fix in a lambda term because we want all substitutions to be values in our call-by-value semantics, and fix is not a value. This also means that, to make a recursive call within , one has to use y() instead of y.
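The effect of this wrapping can be mimicked in Scala. The sketch below assumes that one evaluation step of fix hands the body a unit-taking function that rebuilds the fixpoint when called; `FixDemo`, `fix` and `factorial` are illustrative names, and Scala functions stand in for the untyped terms of the calculus.

```scala
object FixDemo {
  // One unrolling step: the body receives a thunk `y` and must call y()
  // to obtain the recursive occurrence, mirroring "use y() instead of y".
  def fix[A](body: (() => A) => A): A = body(() => fix(body))

  // The body returns a lambda immediately, so unrolling stops until
  // the recursive call y() is actually made.
  val factorial: BigInt => BigInt =
    fix[BigInt => BigInt](y => n => if (n <= 0) BigInt(1) else n * y()(n - 1))
}
```

Passing the fixpoint itself rather than a thunk would force it to be evaluated before the body runs, which in a call-by-value setting would unroll forever.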
To define the semantics of , we use a (mathematical) function that returns the size of a value, ignoring lambdas for which it returns . The precise definition is given in Figure 24 (Appendix A).
We denote by the reflexive and transitive closure of . A term is normalizing if there exists a value such that .
Figure 10. Grammar for types , where is a term variable, is a type variable ( denotes type-annotated terms of Figure 14 that complete the mutually recursive definition)
4. Types, Semantics and Reducibility
We give in Figure 10 the grammar for the types that our verification system supports. Given two types and , we use the notation for when is not a free variable of . Similarly, we use the notation when is not a free variable of .
For recursive types, we introduce the notation:
Then, the type of (non-diverging) streams informally introduced in Section 2 can be understood as a notation, when X is a type, for: . Similarly, for a natural number , the type of non-diverging streams is a notation for . Using this notation, we can also define finite data structures such as lists of natural numbers, as follows: .
We show in Section 4.3 that these types indeed correspond to streams and lists respectively.
Let be the set of all types. We define a (unary) logical relation on types to describe terms that do not get stuck (e.g. due to the error term , or due to an ill-formed application such as ‘’) and that terminate to a value of the given type. Our definition is inspired by the notion of reducibility or hereditary termination (see e.g. (Tait, 1967; Girard, 1990; Harper, 2016)), which we use as a guiding principle for designing the type system and its extensions.
4.1. Reducibility for Closed Terms
For each type , we define in Figure 11 mutually recursively the sets of reducible values and reducible terms . In that sense, a type can be understood as a specification that some terms satisfy (and some do not).
These definitions require an environment , called an interpretation, to give meaning to type variables. Concretely, an interpretation is a partial map from type variables to sets of terms. An interpretation has the constraint that for every type variable , is a reducibility candidate , which, in our setting, means that all terms in are (erased) values. The set of all reducibility candidates is denoted by , and an interpretation is therefore a partial map in .
When the interpretation has no influence on the definition, we may omit it. For instance, for every , we have , so we can just denote this set by .
By construction, only contains (erased) values (of type ), while contains (erased) terms that reduce to a value in . For example, a term in is not only normalizing as a term of its own, but also normalizes whenever applied to a value in .
Figure 11. Definition of reducibility for values and for terms for each type. The function is an auxiliary function, used in the base case of the definition for recursive types.
The type $\{x : \tau \mid b\}$ represents the values of type $\tau$ for which $b$ evaluates to true. We use this type as a building block for writing specifications (pre- and postconditions).
The type represents the values that are in the intersection of the types when ranges over values of type .
The sum type represents values that are either of the form where is a reducible value of , or of the form where is a reducible value of .
The set of reducible values for the equality type makes use of a notion of equivalence on terms which is based on operational semantics. More specifically, we say that and are equivalent, denoted , if for every value , we have . Note that this equivalence relation is defined even if we do not know anything about the types of terms and , and it ensures that if one of the terms reduces to a value, then so does the other.
The type is the polymorphic type from System F. The set is defined by using the environment to bind the type variable to an arbitrary reducibility candidate.
We use the recursive type as a building block for representing data structures such as lists or streams. The definition of reducibility for the recursive type makes use of an auxiliary function that can be seen as an (upper) approximation of the recursive type. Note that removes the type variable from .
Our reducibility definition respects typical lemmas that are needed to prove the soundness of typing rules, such as the following substitution lemma (see (Girard, 1971) for the lemma on System F), which we have formally proven (see also Section 5.8 below).
Lemma 4.1.
Let and be two types, and let be a type variable that may appear in but not in . Let be a type interpretation. Then, we have:
4.2. Reducibility for Open Terms
Having defined reducibility for closed terms, we now define what it means for a term with free term and type variables to be reducible for a type . Informally, we want to ensure that for every interpretation of the type variables, and for every substitution of values for the term variables, the term reduces in a finite number of steps to a value in type . This is formalized by a (semantic) typing relation which is defined as follows.
First, a context is made of a finite set of type variables and of a sequence of pairs in . The domain of , denoted is the list of variables (in ) appearing in the left-hand sides of the pairs. We implicitly assume throughout the paper that all variables appearing in the domains are distinct. This enables us to use as a partial map from to . We use a sequence to represent because the order of variables is important: a variable may have a (dependent) type which refers to previous variables in the context.
Given a partial map , we write for the term where every variable is replaced by . We use the same notation for applying a substitution to a type .
Given a context , a reducible substitution for is a pair of partial maps and where: , , and .
Note that the substitution is also applied to the type , since may be a dependent type with free term variables. The set of all pairs of reducible substitutions for is denoted .
Finally, given a context , a term and a type , we say that holds when for every pair of substitutions for the context , belongs to the reducible values at type . Formally, is defined to hold when:
Our bidirectional type checking and inference algorithm in Section 5 is a sound (even if incomplete) procedure to check .
4.3. Recursive Types
We explain in this section how to interpret the type (see reducibility definition in Figure 11) and how the and types represent streams and lists.
4.3.1. Infinite Streams
For a natural number , consider the type . Let us first see what represents for small values of . As a shortcut, we use the notations , , , for , , ,
The definition refers to , which is by definition. This means that is the set of values of the form , where , and .
By unrolling the definition, we get that is the set of values of the form where is in , which is the same (by Lemma 4.1) as . Therefore, is the set of values of the form where and . This means that when it is applied to , terminates and returns a value in . Similarly, is the set of values of the form where and .
To summarize, we can say that for every , represents values of the language that behave as streams of natural numbers, as long as they are unfolded at most times. This matches the property we mentioned in Section 2, as represents the streams that are non-diverging. We can show that as grows, is subject to more and more constraints. In the limit, a value (which is in every for ) represents a stream of natural numbers that, regardless of the number of times it is unfolded, does not diverge, i.e. a non-diverging stream. Equivalently, we have .
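The idea of a stream that behaves well only up to a given number of unfoldings can be illustrated with a small Scala sketch. Everything here is hypothetical naming (`ApproxDemo`, `brokenAfter`, `zeros`, `survives`), and, as a loudly stated simplification, a thrown exception stands in for a computation that diverges or gets stuck, so that the example remains runnable.

```scala
object ApproxDemo {
  final case class Stream[X](head: X, tail: () => Stream[X])

  // Stand-in for divergence: a real diverging tail would never return.
  final class Diverged extends RuntimeException("stand-in for divergence")

  // A stream of zeros that can be unfolded exactly k times; unfolding
  // number k + 1 fails.
  def brokenAfter(k: Int): Stream[Int] =
    Stream(0, () => if (k <= 0) throw new Diverged else brokenAfter(k - 1))

  // A stream of zeros that can be unfolded any number of times.
  def zeros: Stream[Int] = Stream(0, () => zeros)

  // True when s can be unfolded n times in a row without failing.
  def survives[X](s: Stream[X], n: Int): Boolean =
    if (n <= 0) true
    else try survives(s.tail(), n - 1) catch { case _: Diverged => false }
}
```

In this sketch `brokenAfter(2)` passes the check for 0, 1 and 2 unfoldings but not for 3, while `zeros` passes it for every bound, which mirrors the difference between belonging to finitely many approximations and belonging to all of them.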
4.3.2. Finite Lists
Types of the form can also be used to represent finite data structures such as lists. We let be a notation for , so that:
Here are some examples to show how lists are encoded:

The empty list is ,

A list with one element is ,

More generally, given an element and a list , we can construct the list by writing: .
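As a rough Scala analogue of these three constructions, one can assume that a list is a folded tagged union whose left case carries unit (the empty list) and whose right case carries a pair of a head and a tail. This shape is an assumption made for illustration, and the names `ListEncoding`, `Fold`, `empty`, `cons` and `length` are hypothetical; `BigInt` is used for elements without enforcing non-negativity.

```scala
object ListEncoding {
  // Fold plays the role of the constructor of the recursive type;
  // reading the field `unfold` plays the role of the deconstructor.
  final case class Fold(unfold: Either[Unit, (BigInt, Fold)])

  // The empty list: a folded left injection of unit.
  val empty: Fold = Fold(Left(()))

  // Prepending x to l: a folded right injection of the pair (x, l).
  def cons(x: BigInt, l: Fold): Fold = Fold(Right((x, l)))

  // Unfolds the list once per element until the left case is reached.
  def length(l: Fold): Int = l.unfold match {
    case Left(_) => 0
    case Right((_, tail)) => 1 + length(tail)
  }
}
```

Because `Fold` values are built eagerly, every value of this type has a finite syntax tree, so `length` always reaches the left case.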
Let us now see why represents the type of all finite lists of natural numbers. The first thing to note is that given , does not represent the lists of size . For instance, we know that is the set of values of the form where , i.e. . Therefore, contains lists of all sizes (and also all values that do not represent lists, such as or ).
Instead, can be understood as the values that, as long as they are unfolded no more than times, behave as lists. Just like for streams, we have:
(Footnote 3: This monotonicity comes from the fact that only appears in positive positions in the definitions of the recursive types for streams and lists.)
In the limit, we can show that contains all finite lists, and nothing more.
Lemma 4.2.
Let be a value. Then, if and only if there exists and such that .
It can seem surprising that the type of streams contains infinite streams while the type of lists only contains finite lists. The reason is that, in a call-by-value language, a value representing an infinite list would need to have an infinite syntax tree, with infinitely many ’s (which is not possible). On the other hand, we can represent infinite streams by hiding recursion underneath a lambda term as shown in Section 2.
5. A Bidirectional Type-Checking Algorithm
