A Language for Generic Programming in the Large
Generic programming is an effective methodology for developing reusable software libraries. Many programming languages provide generics and have features for describing interfaces, but none completely support the idioms used in generic programming. To address this need we developed the language . The central feature of is the concept, a mechanism for organizing constraints on generics that is inspired by the needs of modern C++ libraries. provides modular type checking and separate compilation (even of generics). These characteristics support modular software development, especially the smooth integration of independently developed components. In this article we present the rationale for the design of and demonstrate the expressiveness of with two case studies: porting the Standard Template Library and the Boost Graph Library from C++ to . The design of shares much in common with the concept extension proposed for the next C++ Standard (the authors participated in its design) but there are important differences described in this article.
Keywords: programming language design, generic programming, generics, polymorphism, concepts, associated types, software reuse, type classes, modules, signatures, functors, virtual types
The 1968 NATO Conference on Software Engineering identified a software crisis affecting large systems such as IBM’s OS/360 and the SABRE airline reservation system [1, 2]. At this conference McIlroy gave an invited talk entitled Mass-produced Software Components  proposing the systematic creation of reusable software components as a solution to the software crisis. He observed that most software is created from similar building blocks, so programmer productivity would be increased if a standard set of blocks could be shared among many software products. We are beginning to see the benefits of software reuse; Douglas McIlroy’s vision is gradually becoming a reality. The number of commercial and open source software libraries is steadily growing and application builders often turn to libraries for user-interface components, database access, report creation, numerical routines, and network communication, to name a few. Furthermore, larger software companies have benefited from the creation of in-house domain-specific libraries which they use to support entire software product lines .
As the field of software engineering progresses, we learn better techniques for building reusable software. In the 1980s Musser and Stepanov developed a methodology for creating highly reusable algorithm libraries [5, 6, 7, 8], using the term generic programming for their work. (Footnote 1: The term generic programming is often used to mean any use of generics, i.e., any use of parametric polymorphism or templates. The term is also used in the functional programming community for function generation based on algebraic datatypes, i.e., polytypic programming. Here, we use generic programming solely in the sense of Musser and Stepanov.) Their approach was novel in that they wrote algorithms not in terms of particular data structures but rather in terms of abstract requirements on structures based on the needs of the algorithm. Such generic algorithms could operate on any data structure provided that it meet the specified requirements. Preliminary versions of their generic algorithms were implemented in Scheme, Ada, and C. In the early 1990s Stepanov and Musser took advantage of the template feature of C++  to construct the Standard Template Library (STL) [10, 11]. The STL became part of the C++ Standard, which brought generic programming into the mainstream. Since then, the methodology has been successfully applied to the creation of libraries in numerous domains [12, 13, 14, 15, 16].
The ease with which programmers develop and use generic libraries varies greatly depending on the language features available for expressing polymorphism and requirements on type parameters. In 2003 we performed a comparative study of modern language support for generic programming . The initial study included C++, SML, Haskell, Eiffel, Java, and C#, and we evaluated the languages by porting a representative subset of the Boost Graph Library  to each of them. We recently updated the study to include OCaml and Cecil . While some languages performed quite well, none were ideal for generic programming.
Unsatisfied with the state of the art, we began to investigate how to improve language support for generic programming. In general we wanted a language that could express the idioms of generic programming while also providing modular type checking and separate compilation. In the context of generics, modular type checking means that a generic function or class can be type checked independently of any instantiation and that the type check guarantees that any well-typed instantiation will produce well-typed code. Separate compilation is the ability to compile a generic function to native assembly code that can be linked into an application in constant time.
Our desire for modular type checking was a reaction to serious problems that plague the development and use of C++ template libraries. A C++ template definition is not type checked until after it is instantiated, making templates difficult to validate in isolation. Even worse, clients of template libraries are exposed to confusing error messages when they accidentally misuse the library. For example, the following code tries to use stable_sort with the iterators from the list class.
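A minimal sketch of the kind of misuse meant here, with the offending call left as a comment so that the rest compiles. The function names `sort_list_copy`, `sort_via_vector`, and `stable_sort_usage` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Sorts a copy of the list using the member function std::list provides.
inline std::list<int> sort_list_copy(std::list<int> l) {
    // std::stable_sort(l.begin(), l.end());
    // ^ Does not compile: stable_sort requires random access iterators,
    //   but std::list offers only bidirectional iterators.
    l.sort();  // list::sort is a stable sort
    return l;
}

// Alternative: copy into a vector, whose iterators are random access.
inline std::list<int> sort_via_vector(const std::list<int>& l) {
    std::vector<int> v(l.begin(), l.end());
    std::stable_sort(v.begin(), v.end());
    return std::list<int>(v.begin(), v.end());
}

inline void stable_sort_usage() {
    std::list<int> l = {3, 1, 2};
    std::list<int> expected = {1, 2, 3};
    assert(sort_list_copy(l) == expected);
    assert(sort_via_vector(l) == expected);
}
```

Uncommenting the `std::stable_sort` call on the list is what triggers the long diagnostic discussed next.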
Fig. 1 shows a portion of the error message from GNU C++. The error message includes functions and types that the client should not have to know about such as __inplace_stable_sort and _List_iterator. It is not clear from the error message who is responsible for the error. The error message points inside the STL so the client might conclude that there is an error in the STL. This problem is not specific to the GNU C++ implementation, but is instead a symptom of the delayed type checking mandated by the C++ language definition.
Our desire for separate compilation was driven by the increasingly long compile times we (and others) were experiencing when composing sophisticated template libraries. With C++ templates, the compilation time of an application is a function of the amount of code in the application plus the amount of code in all template libraries used by the application (both directly and indirectly). We would much prefer a scenario where generic libraries can be separately compiled so that the compilation time of an application is just a function of the amount of code in the application.
With these desiderata in hand we began laying the theoretical groundwork by developing the calculus F . F is based on System F [20, 21], the standard calculus for parametric polymorphism, and like System F, F has a modular type checker and can be separately compiled. In addition, F provides features for precisely expressing constraints on generics, introducing the concept feature with support for associated types and same-type constraints. The main technical challenge overcome in F is dealing with type equality inside of generic functions. One of the key design choices in F is that models are lexically scoped, making F more modular than Haskell in this regard. (We discuss this in more detail in Section 3.6.1.) Concurrently with our work on F, Chakravarty, Keller, and Peyton Jones responded to our comparative study by developing an extension to Haskell to support associated types [22, 23].
The next step after F was to add two more features needed to express generic libraries: concept-based overloading (used for algorithm specialization) and implicit argument deduction. Fully general implicit argument deduction is non-trivial in the presence of first-class polymorphism (which is present in ), but some mild restrictions make the problem tractable (Section 3.5). However, we discovered a deep tension between concept-based overloading and separate compilation . At this point our work bifurcated into two language designs: the language which supports separate compilation and only a basic form of concept-based overloading [25, 26], and the concepts extension to C++ , which provides full support for concept-based overloading but not separate compilation. For the next revision of the C++ Standard, popularly referred to as C++0X, separate compilation for templates was not practical because the language already included template specialization, a feature that is also deeply incompatible with separate compilation. Thus, for C++0X it made sense to provide full support for concept-based overloading. For we placed separate compilation as a higher priority, leaving out template specialization and requiring programmers to work around the lack of full concept-based overloading (see Section X).
Table 1 shows the results of our comparative study of language support for generic programming  augmented with new columns for C++0X and and augmented with three new rows: modular type checking (previously part of “separate compilation”), lexically scoped models, and concept-based overloading. Table 2 gives a brief description of the evaluation criteria.
|Associated type access||●||●||
|Constraints on assoc. types||-||●||●||●||
|Implicit arg. deduction||●||○||●||●||●||●||
|Modular type checking||○||●||
|Lexically scoped models||○||●||○||○||○||○||○||○||●|
|Multi-type concepts||Multiple types can be simultaneously constrained.|
|Multiple constraints||More than one constraint can be placed on a type parameter.|
|Associated type access||Types can be mapped to other types within the context of a generic function.|
|Constraints on associated types||Concepts may include constraints on associated types.|
|Retroactive modeling||The ability to add new modeling relationships after a type has been defined.|
|Type aliases||A mechanism for creating shorter names for types is provided.|
|Separate compilation||Generic functions can be compiled independently of calls to them.|
|Implicit argument deduction||The arguments for the type parameters of a generic function can be deduced and do not need to be explicitly provided by the programmer.|
|Modular type checking||Generic functions can be type checked independently of calls to them.|
|Lexically scoped models||Model declarations are treated like any other declaration, and are in scope for the remainder of the enclosing namespace. Models may be explicitly imported from other namespaces.|
|Concept-based overloading||There can be multiple generic functions with the same name but differing constraints. For a particular call, the most specific overload is chosen.|
The rest of this article describes the design of in detail. We review the essential ideas of generic programming and survey the idioms used in the Standard Template Library (Section 2). This provides the motivation for the design of the language features in (Section 3). We then evaluate with respect to a port of the Standard Template Library (Section 4) and the Boost Graph Library (Section 5). We conclude with a survey of related work (Section 6) and with the future directions for our work (Section 7).
This article is an updated and greatly extended version of , providing a more detailed rationale for the design of and extending our previous comparative study to include by evaluating a port of the Boost Graph Library to .
2 Generic Programming and the STL
Fig. 2 reproduces the standard definition of generic programming from Jazayeri, Musser, and Loos . The generic programming methodology always consists of the following steps: 1) identify a family of useful and efficient concrete algorithms with some commonality, 2) resolve the differences by forming higher-level abstractions, and 3) lift the concrete algorithms so they operate on these new abstractions. When applicable, there is a fourth step to implement automatic selection of the best algorithm, as described in Fig. 2.
2.1 Type requirements, concepts, and models
The merge algorithm from the STL, shown in Fig. 3, serves as a good example of generic programming. The algorithm does not directly work on a particular data structure, such as an array or linked list, but instead operates on an abstract entity, a concept. A concept is a collection of requirements on a type, or to look at it a different way, it is the set of all types that satisfy the requirements. For example, the Input Iterator concept requires that the type have an increment and dereference operation, and that both are constant-time operations. (We italicize concept names.) A type that meets the requirements is said to model the concept. (It helps to read “models” as “implements”.) For example, the models of the Input Iterator concept include the built-in pointer types, such as int*, the iterator type for the std::list class, and the istream_iterator adaptor. Constraints on type parameters are primarily expressed by requiring the corresponding type arguments to model certain concepts. In the merge template, the argument for InIter1 is required to model the Input Iterator concept. Type requirements are not expressible in C++, so the convention is to specify type requirements in comments or documentation as in Fig. 3.
The type requirements for merge refer to relationships between types, such as the value_type of InIter1. This is an example of an associated type, which maps between types that are part of a concept. The merge algorithm also needs to express that the value_type of InIter1 and InIter2 are the same, which we call same-type constraints. Furthermore, the merge algorithm includes an example of how associated types and modeling requirements can be combined: the value_type of the input iterators is required to be Less Than Comparable.
Fig. 4 shows the definition of the Input Iterator concept following the presentation style used in the SGI STL documentation [30, 31]. In the description, the variable X is used as a place holder for the modeling type. The Input Iterator concept requires several associated types: value_type, difference_type, and iterator_category. Associated types change from model to model. For example, the associated value_type for int* is int and the associated value_type for list<char>::iterator is char. The Input Iterator concept requires that the associated types be accessible via the iterator_traits class. (Traits classes are discussed in Section 2.4.) The count algorithm, which computes the number of occurrences of a value within a sequence, is a simple example of the need for this access mechanism, for it needs to access the difference_type to specify its return type:
The reason that count uses the iterator-specific difference_type instead of int is to accommodate iterators that traverse sequences that may be too long to be measured with an int.
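A minimal sketch of such a count algorithm, assuming the standard `std::iterator_traits` as the access mechanism. The names `my_count` and `count_usage` are hypothetical, chosen to avoid clashing with `std::count`.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Counts occurrences of value in [first, last). The return type is the
// iterator's own difference_type, looked up through iterator_traits.
template <class InIter, class T>
typename std::iterator_traits<InIter>::difference_type
my_count(InIter first, InIter last, const T& value) {
    typename std::iterator_traits<InIter>::difference_type n = 0;
    for (; first != last; ++first)
        if (*first == value) ++n;
    return n;
}

inline void count_usage() {
    std::list<char> letters = {'a', 'b', 'a'};
    assert(my_count(letters.begin(), letters.end(), 'a') == 2);

    int xs[] = {1, 2, 2, 2};
    // For int*, difference_type is std::ptrdiff_t.
    assert(my_count(xs, xs + 4, 2) == 3);
}
```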
In general, a concept may consist of the following kinds of requirements.
- refinements
are analogous to inheritance. They allow one concept to include the requirements from another concept.
- operations
specify the functions that must be implemented for the modeling type.
- associated types
specify mappings between types, and in C++ are provided using traits classes, which we discuss in Section 2.4.
- nested requirements
include requirements on associated types such as modeling a certain concept or being the same type as another type. For example, the Input Iterator concept requires that the associated difference_type be a signed integral type.
- semantic invariants
specify behavioral expectations about the modeling type.
- complexity guarantees
specify constraints on how much time or space may be used by an operation.
2.2 Overview of the STL
The high-level structure of the STL is shown in Fig. 5. The STL contains over fifty generic algorithms and 18 container classes. The generic algorithms are implemented in terms of a family of iterator concepts, and the containers each provide iterators that model the appropriate iterator concepts. As a result, the STL algorithms may be used with any of the STL containers. In fact, the STL algorithms may be used with any data structure that exports iterators with the required capabilities.
Fig. 6 shows the hierarchy of STL’s iterator concepts. An arrow indicates that the source concept is a refinement of the target. The iterator concepts arose from the requirements of algorithms: the need to express the minimal requirements for each algorithm. For example, the merge algorithm passes through a sequence once, so it only requires the basic requirements of Input Iterator for the two ranges it reads from and Output Iterator for the range it writes to. The search algorithm, which finds occurrences of a particular subsequence within a larger sequence, must make multiple passes through the sequence so it requires Forward Iterator. The inplace_merge algorithm needs to move backwards and forwards through the sequence, so it requires Bidirectional Iterator. And finally, the sort algorithm needs to jump arbitrary distances within the sequence, so it requires Random Access Iterator. (The sort function uses the introsort algorithm  which is partly based on quicksort .) Grouping type requirements into concepts enables significant reuse of these specifications: the Input Iterator concept is directly used as a type requirement in over 28 of the STL algorithms. The Forward Iterator, which refines Input Iterator, is used in the specification of over 22 STL algorithms.
The STL includes a handful of common data structures. When one of these data structures does not fulfill some specialized purpose, the programmer is encouraged to implement the appropriate specialized data structure. All of the STL algorithms can then be made available for the new data structure at the small cost of implementing iterators.
Many of the STL algorithms are higher-order: they take functions as parameters, allowing the user to customize the algorithm to their own needs. The STL defines over 25 function objects for creating and composing functions.
The STL also contains a collection of adaptor classes, which are parameterized classes that implement some concept in terms of the type parameter (which is the adapted type). For example, the back_insert_iterator adaptor implements Output Iterator in terms of any model of Back Insertion Sequence. The generic copy algorithm can then be used with back_insert_iterator to append some integers to a list. Adaptors play an important role in the plug-and-play nature of the STL and enable a high degree of reuse.
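The adaptor usage just described can be sketched as follows. The function name `append_with_adaptor` is hypothetical; `std::back_inserter` is the standard helper that constructs a `back_insert_iterator`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Appends three integers to a list through an Output Iterator adaptor.
inline std::list<int> append_with_adaptor() {
    int extra[] = {4, 5, 6};
    std::list<int> l = {1, 2, 3};
    // back_inserter(l) yields a back_insert_iterator<std::list<int>>;
    // each assignment through it calls l.push_back.
    std::copy(extra, extra + 3, std::back_inserter(l));
    return l;
}
```

The copy algorithm itself knows nothing about lists; it only relies on the Output Iterator operations that the adaptor supplies.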
2.3 The problem of argument dependent name lookup in C++
In C++, uses of names inside of a template definition, such as the use of operator< inside of merge, are resolved after instantiation. For example, when merge is instantiated with an iterator whose value_type is of type foo::bar, overload resolution looks for an operator< defined for foo::bar. If there is no such function defined in the scope of merge, the C++ compiler also searches the namespace where the arguments’ types are defined, so looks for operator< in namespace foo. This rule is known as argument dependent lookup (ADL).
The combination of implicit instantiation and ADL makes it convenient to call generic functions. This is a nice improvement over passing concept operations as explicit arguments to a generic function, as in the inc example from Section 1. However, ADL has two flaws. The first problem is that the programmer calling the generic algorithm no longer has control over which functions are used to satisfy the concept operations. Suppose that namespace foo is a third party library and the application programmer writing the main function has defined his own operator< for foo::bar. ADL does not find this new operator<.
The second and more severe problem with ADL is that it opens a hole in the protection that namespaces are supposed to provide. ADL is applied uniformly to all name lookup, whether or not the name is associated with a concept in the type requirements of the template. Thus, it is possible for calls to helper functions to get hijacked by functions with the same name in other namespaces. Fig. 7 shows an example of how this can happen. The function template lib::generic_fun calls load with the intention of invoking lib::load. In main we call generic_fun with an object of type foo::bar, so in the call to load, x also has type foo::bar. Thus, argument dependent lookup also considers namespace foo when searching for load. There happens to be a function named load in namespace foo, and it is a slightly better match than lib::load, so it is called instead, thereby hijacking the call to load.
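A minimal sketch of this hijacking scenario. The names follow the prose, but the bodies and the integer return values are assumptions made here so that the selected overload is observable.

```cpp
#include <cassert>

namespace lib {
// Helper that generic_fun intends to call. Returns 1 so callers can tell
// which overload ran (an assumption for illustration).
template <class T>
int load(const T&) { return 1; }

template <class T>
int generic_fun(const T& x) {
    return load(x);  // unqualified call: subject to ADL at instantiation
}
}  // namespace lib

namespace foo {
struct bar {};
// Unrelated function that happens to share the name. As a non-template
// it is preferred over the equally good template lib::load.
inline int load(const bar&) { return 2; }
}  // namespace foo

inline void adl_usage() {
    assert(lib::generic_fun(42) == 1);          // int has no associated namespace
    assert(lib::generic_fun(foo::bar()) == 2);  // hijacked by foo::load
}
```

Writing `lib::load(x)` inside `generic_fun` would suppress ADL for this call, which is the usual defensive workaround in C++ libraries.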
2.4 Traits classes, template specialization, and separate type checking
The traits class idiom plays an important role in writing generic algorithms in C++. Unfortunately there is a deep incompatibility between the underlying language feature, template specialization, and our goal of separate type checking.
A traits class  maps from a type to other types or functions. Traits classes rely on C++ template specialization to perform this mapping. For example, the following is the primary template definition for iterator_traits.
A specialization of iterator_traits is defined by specifying particular type arguments for the template parameter and by specifying an alternate body for the template. The code below shows a user-defined iterator class, named my_iter, and a specialization of iterator_traits for my_iter.
When the type iterator_traits<my_iter> is used in other parts of the program it refers to the above specialization. In general, a template use refers to the most specific specialization that matches the template arguments, if there is one, or else it refers to an instantiation of the primary template definition.
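A minimal sketch of a primary template and a specialization of this shape. It uses a hypothetical namespace `sketch` rather than the real `std::iterator_traits`, and the members of `my_iter` are assumptions for illustration.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

// Primary template: by default, ask the iterator for its nested typedefs.
template <class Iter>
struct iterator_traits {
    typedef typename Iter::value_type value_type;
    typedef typename Iter::difference_type difference_type;
};

// Partial specialization for pointers, which have no nested typedefs.
template <class T>
struct iterator_traits<T*> {
    typedef T value_type;
    typedef std::ptrdiff_t difference_type;
};

// A user-defined iterator that provides no nested typedefs.
struct my_iter {
    float* pos;
    float& operator*() const { return *pos; }
    my_iter& operator++() { ++pos; return *this; }
};

// Full specialization: supplies the associated types for my_iter.
template <>
struct iterator_traits<my_iter> {
    typedef float value_type;
    typedef int difference_type;
};

}  // namespace sketch

// Uses of the traits class pick the most specific matching specialization.
static_assert(std::is_same<sketch::iterator_traits<sketch::my_iter>::value_type,
                           float>::value, "full specialization is selected");
static_assert(std::is_same<sketch::iterator_traits<int*>::value_type,
                           int>::value, "pointer specialization is selected");
```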
The use of iterator_traits within a template (and template specialization) represents a problem for separate compilation. Consider how a compiler might type check the following unique_copy function template.
To check the first line of the body, the compiler needs to know that the type of *first is the same type as (or at least convertible to) the value_type member of iterator_traits<InIter>. However, prior to instantiation, the compiler does not know what type InIter will be instantiated to, and which specialization of iterator_traits to choose (and different specializations may have different definitions of the value_type).
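A minimal sketch of a function template of this kind, assuming `std::iterator_traits`. The names `my_unique_copy` and `unique_copy_usage` are hypothetical, and the sketch adds an empty-range guard ahead of the `value_type` declaration that the text discusses.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Copies [first, last) to result, dropping consecutive duplicates.
template <class InIter, class OutIter>
OutIter my_unique_copy(InIter first, InIter last, OutIter result) {
    if (first == last) return result;
    // Before instantiation the compiler cannot know which specialization
    // of iterator_traits applies, so it cannot check that *first converts
    // to value_type.
    typename std::iterator_traits<InIter>::value_type value = *first;
    *result = value;
    while (++first != last) {
        if (!(value == *first)) {
            value = *first;
            *++result = value;
        }
    }
    return ++result;
}

inline std::vector<int> unique_copy_usage() {
    std::vector<int> in = {1, 1, 2, 2, 2, 3, 1};
    std::vector<int> out;
    my_unique_copy(in.begin(), in.end(), std::back_inserter(out));
    return out;  // holds 1, 2, 3, 1
}
```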
Thus, if we hope to provide modular type checking, we must develop an alternative to using traits classes for accessing associated types.
2.5 Concept-based overloading using the tag dispatching idiom
One of the main points in the definition of generic programming in Fig. 2 is that it is sometimes necessary to provide more than one generic algorithm for the same purpose. When this happens, the standard approach in C++ libraries is to provide automatic dispatching to the appropriate algorithm using the tag dispatching idiom or enable_if . Fig. 8 shows the advance algorithm of the STL as it is typically implemented using the tag dispatching idiom. The advance algorithm moves an iterator forward (or backward) n positions. There are three overloads of advance_dispatch, each with an extra iterator tag parameter. The C++ Standard Library defines the following iterator tag classes, with their inheritance hierarchy mimicking the refinement hierarchy of the corresponding concepts.
The main advance function obtains the tag for the particular iterator from iterator_traits and then calls advance_dispatch. Normal static overload resolution then chooses the appropriate overload of advance_dispatch. Both the use of traits and the overload resolution rely on knowing actual argument types of the template and the late type checking of C++ templates. So the tag dispatching idiom provides another challenge for designing a language for generic programming with separate type checking.
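A minimal sketch of the tag dispatching idiom described above, placed in a hypothetical namespace `sketch` to keep it apart from `std::advance`. The tag classes come from the standard `<iterator>` header; the bodies are a plausible rendering, not the code of any particular STL implementation.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

namespace sketch {

// Input iterators: step forward one position at a time (n >= 0).
template <class InIter, class Distance>
void advance_dispatch(InIter& i, Distance n, std::input_iterator_tag) {
    while (n--) ++i;
}

// Bidirectional iterators: also allow negative n.
template <class BidirIter, class Distance>
void advance_dispatch(BidirIter& i, Distance n, std::bidirectional_iterator_tag) {
    if (n >= 0) {
        while (n--) ++i;
    } else {
        while (n++) --i;
    }
}

// Random access iterators: jump in constant time.
template <class RandIter, class Distance>
void advance_dispatch(RandIter& i, Distance n, std::random_access_iterator_tag) {
    i += n;
}

// Entry point: fetch the tag from iterator_traits and let ordinary
// overload resolution choose the most derived matching tag.
template <class InIter, class Distance>
void advance(InIter& i, Distance n) {
    typename std::iterator_traits<InIter>::iterator_category category;
    advance_dispatch(i, n, category);
}

}  // namespace sketch

inline void advance_usage() {
    std::list<int> l = {10, 20, 30, 40};
    std::list<int>::iterator li = l.begin();
    sketch::advance(li, 2);   // bidirectional overload
    assert(*li == 30);
    sketch::advance(li, -1);
    assert(*li == 20);

    std::vector<int> v = {10, 20, 30, 40};
    std::vector<int>::iterator vi = v.begin();
    sketch::advance(vi, 3);   // random access overload
    assert(*vi == 40);
}
```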
2.6 Reverse iterators and conditional models
The reverse_iterator class template adapts a model of Bidirectional Iterator and implements Bidirectional Iterator, flipping the direction of traversal so operator++ goes backwards and operator-- goes forwards. An excerpt from the reverse_iterator class template is shown below.
The reverse_iterator class template is an example of a type that models a concept conditionally: if Iter models Random Access Iterator, then so does reverse_iterator<Iter>. The definition of reverse_iterator defines all the operations, such as operator+, required of a Random Access Iterator. The implementations of these operations rely on the Random Access Iterator operations of the underlying Iter. One might wonder why reverse_iterator can be used on iterators such as list<int>::iterator that are bidirectional but not random access. The reason this works is that a member function such as operator+ is type checked and compiled only if it is used. For we need a different mechanism to handle this, since function definitions are always type checked.
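The lazy checking of member functions can be sketched with a stripped-down stand-in for reverse_iterator. The class name `reversed` and its members are hypothetical; the real `std::reverse_iterator` has a larger interface.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

template <class Iter>
class reversed {
    Iter current_;
public:
    explicit reversed(Iter i) : current_(i) {}

    // Bidirectional operations: traversal direction is flipped.
    typename std::iterator_traits<Iter>::reference operator*() const {
        Iter tmp = current_;
        return *--tmp;
    }
    reversed& operator++() { --current_; return *this; }
    reversed& operator--() { ++current_; return *this; }

    // Random access operation. The body uses Iter's operator-, which list
    // iterators lack, but C++ checks the body only if it is called.
    reversed operator+(typename std::iterator_traits<Iter>::difference_type n) const {
        return reversed(current_ - n);
    }
};

inline void reversed_usage() {
    std::list<int> l = {1, 2, 3};
    reversed<std::list<int>::iterator> r(l.end());
    assert(*r == 3);
    ++r;
    assert(*r == 2);   // fine: operator+ is never instantiated for list

    std::vector<int> v = {1, 2, 3};
    reversed<std::vector<int>::iterator> rv(v.end());
    assert(*(rv + 1) == 2);  // operator+ is instantiated and valid here
}
```

Writing `r + 1` for the list-based `r` would be rejected, but only at that call site, which is the per-use checking the text refers to.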
2.7 Summary of language requirements
In this section we surveyed how generic programming is accomplished in C++, taking note of the variety of language features and idioms that are used in current practice. We now summarize the findings as a list of requirements for a language to support generic programming.
The language provides type parameterized functions with the ability to express constraints on the type parameters. The definitions of parameterized functions are type checked independently of how they are instantiated.
The language provides a mechanism, such as “concepts”, for naming and grouping requirements on types, and a mechanism for composing concepts (refinement).
Type requirements include:
requirements for functions and parameterized functions
requirements on associated types
The language provides an implicit mechanism for providing type-specific operations to a generic function, but this mechanism should maintain modularity (in contrast to argument dependent lookup in C++).
The language implicitly instantiates generic functions when they are used.
The language provides a mechanism for concept-based dispatching between algorithms.
The language provides function expressions and function parameters.
The language supports conditional modeling.
3 The Design of
is a statically typed imperative language with syntax and memory model similar to C++. We have implemented a compiler that translates to C++, but could also be interpreted or compiled to byte-code. Compilation units are separately type checked and may be separately compiled, relying only on forward declarations from other compilation units (even compilation units containing generic functions and classes). The language features of that support generic programming are the following:
Concept and model definitions, including associated types and same-type constraints;
Constrained polymorphic functions, classes, structs, and type-safe unions;
Implicit instantiation of polymorphic functions; and
Concept-based function overloading.
In addition, includes the basic types and control constructs of C++.
The following grammar defines the syntax for concepts.
The grammar variable is for concept names and is for type variables. The type variables are place holders for the modeling type (or a list of types for multi-type concepts). and are function signatures and definitions, whose syntax we introduce later in this section. In a concept, a function signature says that a model must define a function with the specified signature. A function definition in a concept provides a default implementation.
The syntax type ; declares an associated type; a model of the concept must provide a type definition for the given type name. The syntax == introduces a same type constraint. In the context of a model definition, the two type expressions must refer to the same type. When the concept is used in the type requirements of a polymorphic function or class, this type equality may be assumed. Type equality in is non-trivial, and is explained in Section 3.9. Concepts may be composed with refines and require. The distinction is that refinement brings in the associated types from the “super” concept. Fig. 9 shows an example of a concept definition in , the definition of InputIterator.
The modeling relation between a type and a concept is established with a model definition using the following syntax.
The following shows an example of the Monoid concept and a model definition that makes int a model of Monoid, using addition for the binary operator and zero for the identity element.
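As a rough C++ analogue only, and not the syntax of the language described in this article, an explicit template specialization can stand in for a model definition. All names below (`monoid_model`, `binary_op`, `identity_elt`, `accumulate_monoid`) are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the concept: the primary template is left undefined, so a
// type counts as a Monoid only if an explicit specialization exists.
template <class T>
struct monoid_model;

// Stand-in for the model definition: int is a Monoid under addition,
// with zero as the identity element.
template <>
struct monoid_model<int> {
    static int binary_op(int a, int b) { return a + b; }
    static int identity_elt() { return 0; }
};

// A generic function that relies only on the Monoid operations.
template <class T>
T accumulate_monoid(const std::vector<T>& xs) {
    T result = monoid_model<T>::identity_elt();
    for (const T& x : xs) result = monoid_model<T>::binary_op(result, x);
    return result;
}

inline void monoid_usage() {
    std::vector<int> xs = {1, 2, 3};
    assert(accumulate_monoid(xs) == 6);
    assert(accumulate_monoid(std::vector<int>()) == 0);
}
```

Unlike a constrained generic function with modular type checking, this C++ version is still checked only at instantiation: calling `accumulate_monoid` on a type without a `monoid_model` specialization fails inside the template body.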
A model definition must satisfy all requirements of the concept. Requirements for associated types are satisfied by type definitions. Requirements for operations may be satisfied by function definitions in the model, by the where clause, or by functions in the lexical scope preceding the model definition. Refinements and nested requirements are satisfied by preceding model definitions in the lexical scope or by the where clause.
A model may be parameterized by placing type variables inside <>’s after the model keyword. The following definition establishes that all pointer types are models of InputIterator.
The optional where clause in a model definition can be used to introduce constraints on the type variables. Constraints are either modeling constraints or same-type constraints.
Using the where clause we can express conditional modeling. As mentioned in Section 2.6, we need conditional modeling to say that reverse_iterator is a model of Random Access Iterator whenever the underlying iterator is. Fig. 10 shows a model definition that says just this.
The rules for type checking parameterized model definitions with constraints are essentially the same as for generic functions, which we discuss in Section 3.4.
3.3 Nominal versus structural conformance
One of the fundamental design choices of was to include model definitions. After all, it is possible to instead have the compiler figure out when a type has implemented all of the requirements of a concept. We refer to the approach of using explicit model definitions as nominal conformance, whereas the implicit approach we call structural conformance. An example of the nominal versus structural distinction can be seen in the example below. Do the concepts create two ways to refer to the same concept or are they different concepts that happen to have the same constraints?
With nominal conformance, the above are two different concepts, whereas with structural conformance, A and B are two names for the same concept. Examples of language mechanisms providing nominal conformance include Java interfaces and Haskell type classes. Examples of language mechanisms providing structural conformance include ML signatures , Objective Caml object types , CLU type sets , and Cforall specifications .
Choosing between nominal and structural conformance is difficult because both options have good arguments in their favor.
Structural conformance is more convenient than nominal conformance. With nominal conformance, the modeling relationship is established by an explicit declaration. For example, a Java class declares that it implements an interface. In Haskell, an instance declaration establishes the conformance between a particular type and a type class. When the compiler sees the explicit declaration, it checks whether the modeling type satisfies the requirements of the concept and, if so, adds the type and concept to the modeling relation.
Structural conformance, on the other hand, requires no explicit declarations. Instead, the compiler determines on a need-to-know basis whether a type models a concept. The advantage is that programmers need not spend time writing explicit declarations.
Nominal conformance is safer than structural conformance. The usual argument against structural conformance is that it is prone to accidental conformance. The classic example of this is a cowboy object being passed to something expecting a Window . The Window interface includes a draw() method, which the cowboy has, so the type system does not complain even though something wrong has happened. This is not a particularly strong argument because the programmer has to make a big mistake for this kind of accidental conformance to occur.
However, the situation changes for languages that support concept-based overloading. For example, in Section 2.5 we discussed the tag-dispatching idiom used in C++ to select the best advance algorithm depending on whether the iterator type models Random Access Iterator or only Input Iterator. With concept-based overloading, it becomes possible for accidental conformance to occur without the programmer making a mistake. The following C++ code is an example where an error would occur if structural conformance were used instead of nominal.
The vector class has two versions of insert, one for models of Input Iterator and one for models of Forward Iterator. An Input Iterator may be used to traverse a range only a single time, whereas a Forward Iterator may traverse through its range multiple times. Thus, the version of insert for Input Iterator must resize the vector multiple times as it progresses through the input range. In contrast, the version of insert for Forward Iterator is more efficient because it first discovers the length of the range (by calling std::distance, which traverses the input range), resizes the vector to the correct length, and then initializes the vector from the range.
The problem with the above code is that istream_iterator fulfills the syntactic requirements for a Forward Iterator but not the semantic requirements: it does not support multiple passes. That is, with structural conformance, there is a false positive and insert dispatches to the version for Forward Iterators. The program resizes the vector to the appropriate size for all the input but it does not initialize the vector because all of the input has already been read.
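The single-pass behavior can be observed directly in C++. The sketch below is not the listing discussed above; traverse_twice and TwoPassResult are hypothetical names. It shows that istream_iterator is declared (nominally) as an input iterator, and it simulates what a multi-pass algorithm would experience: the first traversal consumes the stream, so a second traversal no longer sees all of the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>

// In C++ the iterator category is stated explicitly by the iterator's author.
static_assert(
    std::is_same<std::iterator_traits<std::istream_iterator<int> >::iterator_category,
                 std::input_iterator_tag>::value,
    "istream_iterator is declared to be only an input iterator");

struct TwoPassResult {
    std::ptrdiff_t first_pass;   // elements seen by the first traversal
    std::ptrdiff_t second_pass;  // elements seen by a second traversal
};

// Traverses the same istream_iterator range twice, as a Forward Iterator
// algorithm would (once to measure the length, once to copy the elements).
inline TwoPassResult traverse_twice(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<int> first(in), last;
    TwoPassResult r;
    r.first_pass = std::distance(first, last);  // consumes the stream
    r.second_pass = 0;
    for (; first != last; ++first) ++r.second_pass;
    return r;
}
```

For the input "1 2 3 4" the first pass counts four elements, and the second pass sees fewer because the underlying stream is already exhausted.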
Why not both? It is conceivable to provide both nominal and structural conformance on a concept-by-concept basis, which is in fact the approach used in the concept extension for C++0X. Concepts that are intended to be used for dispatching could be nominal and other concepts could be structural. This matches the current C++ practice: some concepts come with traits classes that provide nominal conformance whereas other concepts do not (the default situation with C++ templates is structural conformance). However, providing both nominal conformance and structural conformance complicates the language, especially for programmers new to the language, and degrades its uniformity. Therefore, we provide only nominal conformance, giving priority to safety and simplicity over convenience.
3.4 Generic Functions
The syntax for generic functions is shown below. The name of the function is the identifier after fun, the type parameters are between the <>’s and are constrained by the requirement in the where clause. A function’s parameters are between the ()’s and the return type of a function comes after the ->.
The default parameter passing mode is read-only pass-by-reference. Read-write pass-by-reference is indicated by ! and pass-by-value is indicated by @.
The merge algorithm, implemented as a generic function, is shown in Fig. 11. The function is parameterized on three types: Iter1, Iter2, and Iter3. The dot notation is used to refer to a member of a model, including associated types such as the value type of an iterator.
The Output Iterator concept used in the merge function is an example of a multi-parameter concept. It has a type parameter X for the iterator and a type parameter T for the type that can be written to the iterator. The following is the definition of the Output Iterator concept.
In general the body of a generic function contains a sequence of statements. Syntax for some of the statements is defined in the following grammar.
The let form introduces local variables, deducing the type of the variable from the right-hand side expression (similar to the auto proposal for C++0X).
The body of a generic function is type checked separately from any instantiation of the function. The type parameters are treated as abstract types so no type-specific operations may be applied to them unless otherwise specified by the where clause. The where clause introduces surrogate model definitions and function signatures (for all the required concept operations) into the scope of the function.
Multiple functions with the same name may be defined, and static overload resolution is performed to decide which function to invoke at a particular call site depending on the argument types and also depending on which model definitions are in scope. When more than one overload may be called, the most specific overload is called (if one exists) according to the rules described in Section 3.10.
3.5 Function calls and implicit instantiation
The syntax for calling functions (or polymorphic functions) is the C-style notation:
Arguments for the type parameters of a polymorphic function need not be supplied at the call site: the compiler will deduce the type arguments by unifying the types of the arguments with the types of the parameters and then implicitly instantiate the polymorphic function. The design issues surrounding implicit instantiation are described below. All of the requirements in the where clause must be satisfied by model definitions in the lexical scope preceding the function call, as described in Section 3.6. The following is an example of calling the generic accumulate function. In this case, the generic function is implicitly instantiated with type argument int*.
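A rough C++ analog may help readers picture deduction of the type argument int*. The sketch below is an assumption-laden illustration, not the paper's listing: accumulate_range, sum_implicit, and sum_explicit are hypothetical names, and C++ template argument deduction stands in for the unification-based deduction described above.

```cpp
#include <cassert>
#include <iterator>

// A simple generic accumulate; the value type is found through iterator_traits.
template <class Iter>
typename std::iterator_traits<Iter>::value_type
accumulate_range(Iter first, Iter last) {
    typename std::iterator_traits<Iter>::value_type total = 0;
    for (; first != last; ++first) total = total + *first;
    return total;
}

// Implicit instantiation: Iter is deduced as int* from the argument types.
inline int sum_implicit(int* first, int* last) {
    return accumulate_range(first, last);
}

// Explicit instantiation: the type argument int* is written at the call site.
inline int sum_explicit(int* first, int* last) {
    return accumulate_range<int*>(first, last);
}
```

Both calls select the same instantiation; the only difference is whether the programmer or the compiler supplies the type argument.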
A polymorphic function may be explicitly instantiated using this syntax:
Following Mitchell we view implicit instantiation as a kind of coercion that transforms an expression of one type to another type. In the example above, the accumulate function was coerced from its generic type to the function type obtained by instantiating its type parameter with int*.
There are several kinds of implicit coercions in the language, and together they form a subtyping relation ≤. The subtyping relation is reflexive and transitive. Like C++, the language contains some bidirectional implicit coercions, such as float ≤ double and double ≤ float, so ≤ is not anti-symmetric. The subtyping relation is defined by a set of subtyping rules. The following is the subtyping rule for generic function instantiation.
The type arguments are substituted for the type parameters, and the constraints in the where clause must be satisfied in the current environment. To apply this rule, the compiler must choose the type arguments. We call this type argument deduction and discuss it in more detail momentarily. Constraint satisfaction is discussed in Section 3.6.
The subtyping relation allows for coercions during type checking according to the subsumption rule:
The (Sub) rule is not syntax-directed so its addition to the type system would result in a non-deterministic type checking algorithm. The standard workaround is to omit the above rule and instead allow coercions in other rules of the type system such as the rule for function application. The following is a rule for function application that allows coercions in both the function type and in the argument types.
As mentioned above, the type checker must guess the type arguments to apply the (Inst) rule. In addition, the (App) rule includes several types that appear from nowhere. The problem of deducing these types is equivalent to trying to find solutions to a system of inequalities. Consider the following example program.
The application apply(id, 0) type checks if there is a solution to the following system:
The following type assignment is a solution to the above system.
Unfortunately, not all systems of inequalities are as easy to solve as the above example. In fact, with Mitchell's original set of subtyping rules, the problem of solving systems of inequalities was proved undecidable by Tiuryn and Urzyczyn. There are several approaches to dealing with this undecidability.
Remove the (Arrow) rule. Mitchell’s subtyping relation included the usual co/contravariant rule for functions.
The (Arrow) rule is nice to have because it allows a function to be coerced to a different type so long as the parameter and return types are coercible in the appropriate way. In the following example the standard ilogb function is passed to foo even though it does not match the expected type. The (Arrow) rule allows for this coercion because int is coercible to double.
However, the (Arrow) rule is one of the culprits in the undecidability of the subtyping problem; removing it makes the problem decidable. The language of Le Botlan and Remy takes this approach, and for the time being, so does ours. With this restriction, type argument deduction is reduced to a variation of unification. Instead of working on a set of variable assignments, this unification algorithm keeps track of either a type assignment or the tightest lower bound seen so far for each variable. The (App) rule is reformulated as follows to use this unify algorithm.
In languages where functions are often written in curried form, it is important to provide even more flexibility than in the above (App) rule by postponing instantiation, as is done in . Consider the apply example again, but this time written in curried form.
In the first application apply(id) we do not yet know that T should be bound to int. The instantiation needs to be delayed until the second application apply(id)(0). In general, each application contributes to the system of inequalities that needs to be solved to instantiate the generic function. In , the return type of each application encodes a partial system of inequalities. The inequalities are recorded in the types as lower bounds on type parameters. The following is an example of such a type.
Postponing instantiation is not as important in our language because functions take multiple parameters and currying is seldom used.
Removal of the arrow rule means that, in some circumstances, the programmer would have to wrap a function inside another function before passing the function as an argument.
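C++ function pointers behave much like a type system without the (Arrow) rule, so they give a convenient way to picture the wrapping. In the sketch below, foo, exponent_of, exponent_of_wrapped, and call_through_wrapper are hypothetical names; exponent_of plays the role of ilogb, with type int(double), while foo expects a function of type double(double).

```cpp
#include <cassert>
#include <cmath>

// foo expects a function from double to double.
inline double foo(double (*f)(double), double x) { return f(x); }

// Has type int(double): the result type differs from what foo expects.
inline int exponent_of(double x) { return std::ilogb(x); }

// foo(exponent_of, 8.0) is rejected by C++: int(*)(double) does not convert
// to double(*)(double). The wrapper adapts the type, and the int -> double
// coercion happens on its return statement.
inline double exponent_of_wrapped(double x) { return exponent_of(x); }

inline double call_through_wrapper(double x) {
    return foo(exponent_of_wrapped, x);
}
```

The wrapper costs one extra function definition, which is the inconvenience the text refers to.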
Restrict the language to predicative polymorphism. Another alternative is to restrict the language so that only monotypes (non-generic types) may be used as the type arguments in an instantiation. This approach is used by Odersky and Läufer and also by Peyton Jones and Shields. However, this approach reduces the expressiveness of the language for the sake of the convenience of implicit instantiation.
Restrict the language to second-class polymorphism. Restricting the language of types to disallow polymorphic types nested inside other types is another way to make the subtyping problem decidable. With this restriction the subtyping problem is solved by normal unification. Languages such as SML and Haskell 98 use this approach. Like the restriction to predicative polymorphism, this approach reduces the expressiveness of the language for the sake of implicit instantiation (and type inference). However, there are many motivating use cases for first-class polymorphism, so throwing out first-class polymorphism is not our preferred alternative.
Use a semi-decision procedure. Yet another alternative is to use a semi-decision procedure for the subtyping problem. The advantage of this approach is that it allows implicit instantiation to work in more situations, though it is not clear whether this extra flexibility is needed in practice. The downside is that there are instances of the subtyping problem where the procedure diverges and never returns with a solution.
3.6 Model lookup (constraint satisfaction)
The basic idea behind model lookup is simple although some of the details are a bit complicated. Consider the following program containing a generic function foo with a requirement for C<T>.
At the call foo(0), the compiler deduces the binding T=int and then seeks to satisfy the where clause, with int substituted for T. In this case the constraint C<int> must be satisfied. In the scope of the call foo(0) there is a model definition for C<int>, so the constraint is satisfied. We call C<int> the model head.
3.6.1 Lexical scoping of models
The design choice to look for models in the lexical scope of the instantiation is an important one, and it differentiates our language from both Haskell and the concept extension for C++. This choice improves modularity by preventing model declarations in separate modules from accidentally conflicting with one another.
For example, in Fig. 12 we create sum and product functions in modules A and B respectively by instantiating accumulate in the presence of different model declarations. This example would not type check in Haskell, even if the two instance declarations were to be placed in different modules, because instance declarations implicitly leak out of a module when anything in the module is used by another module. This example would be illegal in the C++0X concept extension because 1) model definitions must appear in the same namespace as their concept, and 2) if placed in the same namespace, the two model definitions would violate the one-definition rule.
It is also quite possible for separately developed modules to include model definitions that accidentally overlap. In our language this is not a problem, as the model definitions will each apply within their own module. Model definitions may be explicitly imported from one module to another. The syntax for modules and import declarations is shown below. An interesting extension would be parameterized modules, but we leave that for future work.
3.6.2 Constrained models
A model definition may itself be parameterized and the type parameters constrained by a where clause. Fig. 13 shows a typical example of a parameterized model. The model definition in the example says that for any type T, list<T> is a model of Comparable if T is a model of Comparable. Thus, a model definition is like an inference rule or a Horn clause in logic programming. For example, a model definition of the form
corresponds to the Horn clause:
The model definitions from the example in Fig. 13 could be represented in Prolog with the following two rules:
The algorithm for model lookup is essentially a logic programming engine: it performs unification and backward chaining (similar to how instance lookup is performed in Haskell). Unification is used to determine when the head of a model definition matches. For example, in Fig. 13, in the call to generic_foo the constraint Comparable< list<int> > needs to be satisfied. There is a model definition for Comparable< list<T> > and unification of list<int> and list<T> succeeds with the type assignment T = int. However, we have not yet satisfied Comparable< list<int> > because the where clause of the parameterized model must also be satisfied. The model lookup algorithm therefore proceeds recursively and tries to satisfy Comparable<int>, which in this case is trivial. This process is called backward chaining: it starts with a goal (a constraint to be satisfied) and then applies matching rules (model definitions) to reduce the goal into subgoals. Eventually the subgoals are reduced to facts (model definitions without a where clause) and the process is complete. As is typical of Prolog implementations, model lookup processes subgoals in a depth-first manner.
It is possible for multiple model definitions to match a constraint. When this happens the most specific model definition is used, if one exists. Otherwise the program is ill-formed. We say that definition A is a more specific model than definition B if the head of A is a substitution instance of the head of B and if the where clause of A implies the where clause of B. In this context, implication means that every constraint in the where clause of B is satisfied in the current environment augmented with the assumptions from the where clause of A.
The language places very few restrictions on the form of a model definition. The only restriction is that all type parameters of a model must appear in the head of the model. That is, they must appear in the type arguments to the concept being modeled. For example, the following model definition is ill formed because of this restriction.
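The matching and recursion steps can be mimicked with C++ class template partial specialization, which also unifies a pattern against a type and then recurses on the premises. The sketch below is only an analogy under assumptions: the trait named Comparable here is a hypothetical stand-in for the concept, a full specialization plays the role of a fact, and a partial specialization plays the role of a parameterized model.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <type_traits>

// The "concept" as a trait; by default no type is a model.
template <class T> struct Comparable : std::false_type {};

// Fact: int is a model of Comparable.
template <> struct Comparable<int> : std::true_type {};

// Rule: list<T> is a model of Comparable if T is a model of Comparable.
// Matching list<int> against list<T> binds T = int, and the base class
// lookup Comparable<T> is the subgoal.
template <class T>
struct Comparable<std::list<T> > : Comparable<T> {};
```

Asking for Comparable<std::list<std::list<int> > > applies the rule twice and then reaches the fact for int, whereas Comparable<std::list<std::string> > fails because no fact exists for std::string.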
This restriction ensures that unifying a constraint with the model head always produces assignments for all the type parameters.
Horn clause logic is by nature powerful enough to be Turing-complete. For example, it is possible to express general recursive functions. The program in Fig. 14 computes the Ackermann function at compile time by encoding it in model definitions. This power comes at a price: determining whether a constraint is satisfied by a set of model definitions is in general undecidable. Thus, model lookup is not guaranteed to terminate and programmers must take some care in writing model definitions. We could restrict the form of model definitions to achieve decidability; however, there are two reasons not to do so. First, restrictions would complicate the specification of the language and make it harder to learn. Second, there is the danger of ruling out useful model definitions.
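The same kind of compile-time computation is familiar from C++ templates, which offers a way to picture what Fig. 14 does. The following is a C++ template sketch, not the figure itself; the Ackermann struct is a hypothetical name, and each specialization corresponds to one defining equation of the function.

```cpp
#include <cassert>

// A(m, n) = A(m - 1, A(m, n - 1)) for m > 0 and n > 0.
template <unsigned M, unsigned N>
struct Ackermann {
    static constexpr unsigned value =
        Ackermann<M - 1, Ackermann<M, N - 1>::value>::value;
};

// A(0, n) = n + 1.
template <unsigned N>
struct Ackermann<0, N> {
    static constexpr unsigned value = N + 1;
};

// A(m, 0) = A(m - 1, 1) for m > 0.
template <unsigned M>
struct Ackermann<M, 0> {
    static constexpr unsigned value = Ackermann<M - 1, 1>::value;
};

// A(0, 0) = 1; stated separately because both partial specializations match.
template <>
struct Ackermann<0, 0> {
    static constexpr unsigned value = 1;
};

static_assert(Ackermann<2, 3>::value == 9, "A(2, 3) = 9");
```

Larger arguments quickly exceed a compiler's instantiation depth limit, a practical reminder that such lookups are not guaranteed to terminate.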
3.7 Improved error messages
In the introduction we showed how users of generic libraries in C++ are plagued by hard to understand error messages. The introduction of concepts and where clauses solves this problem. The following is the same misuse of the stable_sort function, but this time written in our language.
In contrast to the long C++ error message (Fig. 1), we get the following:
A related problem that plagues authors of generic C++ libraries is that type errors often go unnoticed during library development. Again, this is because C++ delays type checking templates until instantiation. One of the reasons for such type errors is that the implementation of a template is not consistent with its documented type requirements.
This problem is directly addressed by our design: the implementation of a generic function is type-checked with respect to its where clause, independently of any instantiations. Thus, when a generic function successfully compiles, it is guaranteed to be free of type errors and the implementation is guaranteed to be consistent with the type requirements in the where clause.
Interestingly, while implementing the STL in our language, the type checker caught several errors in the STL as defined in C++. One such error was in replace_copy. The implementation below was translated directly from the GNU C++ Standard Library, with the where clause matching the requirements for replace_copy in the C++ Standard.
The compiler gives the following error message:
This is a subtle bug, which explains why it has gone unnoticed for so long. The type requirements say that both the value type of the iterator and T must be writable to the output iterator, but the requirements do not say that the value type and T are the same type, or coercible to one another.
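One plausible way for an implementation to depend on such an undocumented relationship (an assumption; the original listing is not reproduced here) is a conditional expression whose two branches have the value type and T respectively, since the conditional operator needs a common type. The C++ sketch below, with the hypothetical names replace_copy_sketch and replace_in, avoids that dependency by writing each type to the output iterator in its own branch.

```cpp
#include <cassert>
#include <vector>

// Problematic shape, shown only as a comment: the ?: operator requires
// *first and new_value to have a common type, which the documented
// requirements do not promise.
//   *result = (*first == old_value) ? new_value : *first;

// Each branch writes exactly one type to the output iterator, so only
// "T is writable" and "the value type is writable" are needed.
template <class InIter, class OutIter, class T>
OutIter replace_copy_sketch(InIter first, InIter last, OutIter result,
                            const T& old_value, const T& new_value) {
    for (; first != last; ++first, ++result) {
        if (*first == old_value)
            *result = new_value;
        else
            *result = *first;
    }
    return result;
}

// Small helper for demonstration with std::vector<int>.
inline std::vector<int> replace_in(const std::vector<int>& in,
                                   int old_value, int new_value) {
    std::vector<int> out(in.size());
    replace_copy_sketch(in.begin(), in.end(), out.begin(), old_value, new_value);
    return out;
}
```

With separately checked generic functions, the first shape is rejected at definition time, while the second type checks against the documented requirements.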
3.8 Generic classes, structs, and unions
The syntax for generic classes, structs, and unions is defined below. The grammar variable is for class, struct, and union names.
Classes consist of data members, constructors, and a destructor. There are no member functions; normal functions are used instead. Data encapsulation (public/private) is specified at the module level instead of inside the class. Classes, structs, and unions are used as types using the syntax below. Such a type is well-formed if the type arguments are well-formed and if the requirements in its where clause are satisfied.
3.9 Type equality
There are several language constructions that make it difficult to decide when two types are equal. Generic functions complicate type equality because the names of the type parameters do not matter. So, for example, the following two function types are equal:
The order of the type parameters does matter (because a generic function may be explicitly instantiated) so the following two types are not equal.
Inside the scope of a generic function, type parameters with different names are assumed to be different types (this is a conservative assumption). So, for example, the following program is ill formed because variable a has type S whereas function f is expecting an argument of type T.
Associated types and same-type constraints also affect type equality. First, if there is a model definition in the current scope such as:
then we have the equality C<int>.bar = bool.
Inside the scope of a generic function, same-type constraints help determine when two types are equal. For example, the following version of foo is well formed:
There is a subtle difference between the above version of foo and the following one. The reason for the difference is that same-type constraints are checked after type argument deduction.
In the first call to foo_1 the compiler deduces T=double and S=double from the arguments id and 1.0. The compiler then checks the same-type constraint T == S, which in this case is satisfied. For the second call to foo_1, the compiler deduces T=double and S=int and then the same-type constraint T == S is not satisfied. The first call to foo_2 is straightforward. For the second call to foo_2, the compiler deduces T=double from the type of id and the argument 1 is implicitly coerced to double.
Type equality is a congruence relation, which means several things. First it means type equality is an equivalence relation, so it is reflexive, transitive, and symmetric. Thus, for any types ρ, σ, and τ we have ρ = ρ; σ = τ implies τ = σ; and ρ = σ together with σ = τ implies ρ = τ.
For example, the following function is well formed:
The type expression R (the type of a) and the type expression T (the parameter type of f) both denote the same type.
The second aspect of type equality being a congruence is that it propagates in certain ways with respect to type constructors. For example, if we know that S = T then we also know that fun(S)->S = fun(T)->T. Similarly, if we have defined a generic struct such as:
then S = T implies bar<S> = bar<T>. The propagation of equality also goes in the other direction. For example, bar<S> = bar<T> implies that S = T. The congruence extends to associated types. So S = T implies C<S>.bar = C<T>.bar. However, for associated types, the propagation does not go in the reverse direction. So C<S>.bar = C<T>.bar does not imply that S = T. For example, given the model definitions
we have C<int>.bar = C<float>.bar but this does not imply that int = float.
Like type parameters, associated types are in general assumed to be different from one another. So the following program is ill-formed:
The next program is also ill formed.
In the compiler we use the congruence closure algorithm by Nelson and Oppen to keep track of which types are equal. The algorithm is efficient: O(n log n) time complexity on average, where n is the number of types. It has O(n^2) time complexity in the worst case. This can be improved by instead using the Downey-Sethi-Tarjan algorithm, which is O(n log n) in the worst case.
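To make the idea of congruence closure concrete, here is a deliberately naive C++ sketch. It is not the Nelson-Oppen algorithm and makes no claim about the compiler's implementation: TypeEqualities and its members are hypothetical names, the propagation is a simple fixed-point loop that is quadratic per pass, and only the forward direction (equal arguments imply equal constructed types) is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class TypeEqualities {
public:
    // Adds the type expression ctor<args...> and returns its node index.
    // Each leaf (type name without arguments) should be added only once.
    int add(const std::string& ctor, std::vector<int> args = std::vector<int>()) {
        ctors_.push_back(ctor);
        args_.push_back(std::move(args));
        parent_.push_back(static_cast<int>(parent_.size()));
        return parent_.back();
    }

    // Records the assumption a = b, then propagates it through constructors
    // until no more equalities can be derived.
    void assume_equal(int a, int b) {
        parent_[find(a)] = find(b);
        bool changed = true;
        while (changed) {
            changed = false;
            const int n = static_cast<int>(parent_.size());
            for (int i = 0; i < n; ++i)
                for (int j = i + 1; j < n; ++j)
                    if (find(i) != find(j) && congruent(i, j)) {
                        parent_[find(i)] = find(j);
                        changed = true;
                    }
        }
    }

    bool equal(int a, int b) { return find(a) == find(b); }

private:
    // Representative of the equivalence class containing x.
    int find(int x) {
        while (parent_[x] != x) x = parent_[x];
        return x;
    }

    // True when both nodes apply the same constructor to pairwise-equal arguments.
    bool congruent(int i, int j) {
        if (args_[i].empty() || ctors_[i] != ctors_[j] ||
            args_[i].size() != args_[j].size())
            return false;
        for (std::size_t k = 0; k < args_[i].size(); ++k)
            if (find(args_[i][k]) != find(args_[j][k])) return false;
        return true;
    }

    std::vector<std::string> ctors_;
    std::vector<std::vector<int> > args_;
    std::vector<int> parent_;
};
```

After registering S, T, bar<S>, and bar<T>, assuming S = T makes bar<S> and bar<T> equal as well, and the equality continues to propagate to any types built on top of them.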
3.10 Function overloading and concept-based overloading
Multiple functions with the same name may be defined and static overload resolution is performed to decide which function to invoke at a particular call site. The resolution depends on the argument types and on the model definitions in scope. When more than one overload may be called, the most specific overload is called if one exists. The basic overload resolution rules are based on those of C++.
In the following simple example, the second foo is called.
The first foo has the wrong number of arguments, so it is immediately dropped from consideration. The second and fourth are given priority over the third because they can exactly match the argument type int (for the fourth, type argument deduction results in T=int), whereas the third foo requires an implicit coercion from int to double. The second foo is favored over the fourth because it is more specific.
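Since the basic rules are based on those of C++, a C++ analog of such an overload set behaves the same way for this call. The sketch below is an assumed reconstruction of the shape of the example, not the original listing; each overload returns its own position so that the selected one can be observed.

```cpp
#include <cassert>

inline int foo(int, int) { return 1; }   // wrong number of parameters for foo(0)
inline int foo(int)      { return 2; }   // exact match, not generic
inline int foo(double)   { return 3; }   // needs a coercion from int to double
template <class T>
int foo(T) { return 4; }                 // exact match with T = int, but less specific

// Calls foo with an int argument and reports which overload was chosen.
inline int which_foo_for_int() { return foo(0); }
```

The call foo(0) selects the second overload. For comparison, foo(1.5) selects the third, and foo('a') selects the generic fourth because it is the only exact match for char.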
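A function f is a more specific overload than function g if g is callable from f but not vice versa. A function g is callable from function f if you could call g from inside f, forwarding all the parameters of f as arguments to g, without causing a type error. More formally, g is callable from f if, for some choice of type arguments for g, each parameter type of f is coercible to the corresponding instantiated parameter type of g and the instantiated where clause of g is satisfied under the assumptions of the where clause of f.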
In general there may not be a most specific overload in which case the program is ill-formed. In the following example, both foo’s are callable from each other and therefore neither is more specific.
In the next example, neither foo is callable from the other so neither is more specific.
In Section 2.5 we showed how to accomplish concept-based overloading of several versions of advance using the tag dispatching idiom in C++. Fig. 15 shows three overloads of advance implemented in our language. The signatures for these overloads are the same except for their where clauses. The concept BidirectionalIterator is a refinement of InputIterator, so the second version of advance is more specific than the first. The concept RandomAccessIterator is a refinement of BidirectionalIterator, so the third advance is more specific than the second.
The code in Fig. 16 shows two calls to advance. The first call is with an iterator for a singly-linked list. This iterator is a model of InputIterator but not RandomAccessIterator; the overload resolution chooses the first version of advance. The second call to advance is with a pointer, which is a RandomAccessIterator, so the third version of advance is called.
Concept-based overloading is entirely based on static information available during the type checking and compilation of the call site. This presents some difficulties when trying to resolve to optimized versions of an algorithm from within another generic function. Section 4.3 discusses the issues that arise and presents an idiom that ameliorates the problem.
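For comparison, the C++ tag dispatching idiom for the same three versions can be sketched as follows. The names advance_impl and advance_sketch are hypothetical, and each version returns its number (1, 2, or 3) purely so that the selection can be observed; the standard std::advance returns nothing.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>

// Version 1: input iterators can only step forward one element at a time.
template <class Iter>
int advance_impl(Iter& i, int n, std::input_iterator_tag) {
    while (n-- > 0) ++i;
    return 1;
}

// Version 2: bidirectional iterators can also step backward.
template <class Iter>
int advance_impl(Iter& i, int n, std::bidirectional_iterator_tag) {
    if (n >= 0) {
        while (n-- > 0) ++i;
    } else {
        while (n++ < 0) --i;
    }
    return 2;
}

// Version 3: random access iterators jump in constant time.
template <class Iter>
int advance_impl(Iter& i, int n, std::random_access_iterator_tag) {
    i += n;
    return 3;
}

// Dispatches on the iterator's declared category tag.
template <class Iter>
int advance_sketch(Iter& i, int n) {
    return advance_impl(i, n,
        typename std::iterator_traits<Iter>::iterator_category());
}
```

An iterator of std::forward_list (a singly-linked list) selects version 1, and a pointer selects version 3, because the tag types form an inheritance chain that mirrors the refinement hierarchy.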
3.11 Function expressions
The following is the syntax for function expressions and function types.
The body of a function expression may be either a sequence of statements enclosed in braces or a single expression following a colon. The return type of a function expression is deduced from the return statements in the body, or from the single expression.
The following example computes the sum of an array using for_each and a function expression. (Of course, the accumulate function is the appropriate algorithm for this computation, but then the example would not demonstrate the use of function expressions.)
The function expression creates a function object. The body of a function expression is not lexically scoped, so a direct use of sum in the body would be an error. The initialization p=&sum declares a data member inside the function object with type int* and copy constructs the member with the address &sum.
The primary motivation for non-lexically scoped function expressions is to keep the design close to C++ so that function expressions can be directly compiled to function objects in C++. However, this design has some drawbacks, as we discovered while porting the STL.
Most STL implementations implement two separate versions of find_subsequence, one written in terms of operator== and the other in terms of a function object. The version using operator== could be written in terms of the one that takes a function object, but it is not written that way. The original reason for this was to improve efficiency, but with a modern optimizing compiler there should be no difference in efficiency: all that is needed to erase the difference is some simple inlining. In our implementation we write the operator== version of find_subsequence in terms of the higher-order version. The following code shows how this is done; it is a bit more complicated than we would have liked.
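The kind of C++ function object that such a function expression could compile to can be sketched as follows. This is an illustration under assumptions, not the compiler's actual output: AddTo and sum_array are hypothetical names, and the data member p corresponds to the initialization p=&sum.

```cpp
#include <algorithm>
#include <cassert>

// The data member p is initialized with the address of the accumulator;
// the call operator is the body of the function expression.
struct AddTo {
    int* p;
    explicit AddTo(int* target) : p(target) {}
    void operator()(int x) const { *p = *p + x; }
};

// Sums an array with for_each and the function object above.
inline int sum_array(const int* first, const int* last) {
    int sum = 0;
    std::for_each(first, last, AddTo(&sum));
    return sum;
}
```

Because the body can only reach sum through the member p, nothing from the enclosing scope is captured implicitly, which is what makes the translation to a plain function object direct.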
It would have been simpler to write the function expression as
However, this is an error because the operator== from the EqualityComparable<..> requirement is a local name, not a global one, and is therefore not in scope for the body of the function expression. The workaround is to store the comparison function as a data member of the function object. The expression
accesses the operator== member from the model of EqualityComparable for type T.
Examples such as these are a convincing argument that lexical scoping should be allowed in function expressions, and the next generation of the language will support this feature.
3.12 First-class polymorphism
In the introduction we mentioned that our language is based on System F. One of the hallmarks of System F is that it provides first-class polymorphism. That is, polymorphic objects may be passed to and returned from functions. This is in contrast to the ML family of languages, where polymorphism is second class. In Section 3.5 we discussed how the restriction to second-class polymorphism simplifies type argument deduction, reducing it to normal unification. However, we prefer to retain first-class polymorphism and use the somewhat more complicated variant of unification described there.
One of the reasons to retain first-class polymorphism is to retain the expressiveness of function objects in C++. A function object may have member function templates and may therefore be used polymorphically. The following program is a simple use of first-class polymorphism in our language. Note that f is applied to arguments of different types.
4 Analysis of the Language and the STL
In this section we analyze the interdependence of the language features and generic library design in light of implementing the STL. A primary goal of generic programming is to express algorithms with minimal assumptions about data abstractions, so we first look at how generic functions can be used to accomplish this. Another goal of generic programming is efficiency, so we investigate the use of function overloading to accomplish automatic algorithm selection. We conclude this section with a brief look at implementing generic containers and adaptors.
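The C++ function object idiom referred to here can be sketched as follows. Identity and apply_at_two_types are hypothetical names; the point is only that a single object, passed to an ordinary non-template function, is applied at two different types.

```cpp
#include <cassert>
#include <string>

// A function object whose call operator is a member template, so one
// object can be applied to arguments of many types.
struct Identity {
    template <class T>
    T operator()(const T& x) const { return x; }
};

// An ordinary (non-generic) function that receives the polymorphic object
// and applies it to an int and to a string.
inline bool apply_at_two_types(Identity f) {
    int n = f(1);
    std::string s = f(std::string("hello"));
    return n == 1 && s == "hello";
}
```

A plain function pointer could not play this role, since it is fixed to a single parameter type.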
Fig. 17 depicts a few simple STL algorithms implemented using generic functions. The STL provides two versions of most algorithms, such as the overloads for find in Fig. 17. The first version is higher-order, taking a predicate function as its third parameter, while the second version relies on operator==. Functions are first-class in our language, so the higher-order version is straightforward to express. As is typical in the STL, there is a high degree of internal reuse: remove uses remove_copy and find.
Figures 18 and 19 show the STL iterator hierarchy as represented in our language. Required operations are expressed in terms of function signatures, and associated types are expressed with a nested type requirement. The refinement hierarchy is established with the refines clauses and nested model requirements with require. The semantic invariants and complexity guarantees of the iterator concepts are not expressible, as they are beyond the scope of the type system.
4.3 Automatic Algorithm Selection
To realize the generic programming efficiency goals, the language provides mechanisms for automatic algorithm selection. The following code shows two overloads for copy. (We omit the third overload to save space.) The first version is for input iterators and the second for random access, which uses an integer counter thereby allowing some compilers to better optimize the loop. The two signatures are the same except for the where clause.
The use of dispatching algorithms such as copy inside other generic algorithms is challenging because overload resolution is based on the surrogate models from the where clause and not on models defined for the instantiating type arguments. (This rule is needed for separate type checking and compilation.) Thus, a call to an overloaded function such as copy may resolve to a non-optimal overload. Consider the following implementation of merge. The Iter1 and Iter2 types are required to model InputIterator and the body of merge contains two calls to copy.
This merge function always calls the slow version of copy even though the actual iterators may be random access. In C++, with tag dispatching, the fast version of copy is called because the overload resolution occurs after template instantiation. However, C++ does not have separate type checking for templates.
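The C++ behavior described here can be observed with a small sketch. All names below (copy_impl, copy_sketch, merge_sketch, last_copy_version) are hypothetical, and the global variable exists only to record which version of copy ran. Because C++ resolves the call to copy after merge has been instantiated, the random access version is chosen whenever merge is called with random access iterators.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Records which copy ran last: 1 = input iterator loop, 2 = counted loop.
static int last_copy_version = 0;

// Slow version: tests the iterators for equality on each step.
template <class In, class Out>
Out copy_impl(In first, In last, Out out, std::input_iterator_tag) {
    last_copy_version = 1;
    for (; first != last; ++first, ++out) *out = *first;
    return out;
}

// Fast version: uses an integer counter, available only with random access.
template <class In, class Out>
Out copy_impl(In first, In last, Out out, std::random_access_iterator_tag) {
    last_copy_version = 2;
    typename std::iterator_traits<In>::difference_type n = last - first;
    for (; n > 0; --n, ++first, ++out) *out = *first;
    return out;
}

template <class In, class Out>
Out copy_sketch(In first, In last, Out out) {
    return copy_impl(first, last, out,
        typename std::iterator_traits<In>::iterator_category());
}

// merge only needs input iterators, yet its calls to copy_sketch are
// resolved per instantiation, so the fast copy is used when possible.
template <class In1, class In2, class Out>
Out merge_sketch(In1 f1, In1 l1, In2 f2, In2 l2, Out out) {
    while (f1 != l1 && f2 != l2) {
        if (*f2 < *f1) { *out = *f2; ++f2; }
        else           { *out = *f1; ++f1; }
        ++out;
    }
    return copy_sketch(f2, l2, copy_sketch(f1, l1, out));
}
```

Calling merge_sketch on two vectors ends in the counted copy, while calling it on two lists ends in the input iterator copy. The price is that merge_sketch cannot be type checked independently of its instantiations.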
To enable dispatching for copy, the type information at the instantiation of merge must be carried into the body of merge (suppose it is instantiated with a random access iterator). This can be done with a combination of concept and model declarations. First, define a concept with a single operation that corresponds to the algorithm.
Next, add a requirement for this concept to the type requirements of merge and replace the calls to copy with the concept operation copy_range.
The final step of the idiom is to create parameterized model declarations for CopyRange. The where clauses of the model definitions match the where clauses of the respective overloads for copy. In the body of each copy_range there is a call to copy which will resolve to the appropriate overload.
A call to merge with a random access iterator will use the second model to satisfy the requirement for CopyRange. Thus, when copy_range is invoked inside merge, the fast version of copy is called. A nice property of this idiom is that calls to generic algorithms need not change. A disadvantage of this idiom is that the interface of the generic algorithms becomes more complex.
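The three steps of the idiom can be mirrored loosely in C++17, assuming that a class template with a partial specialization stands in for the concept and its two parameterized models. This is only a structural analogy: C++ resolves overloads after instantiation and does not need the idiom. CopyRange, copy_range, and merge come from the text; copy_slow, copy_fast, merge_sketch, Path, last_copy_path, and copy_range_example are hypothetical, and the two named copy functions replace the overloaded copy for brevity.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical marker recording the copy most recently taken (illustration only).
enum class Path { Input, RandomAccess };
Path last_copy_path = Path::Input;

template <typename In, typename Out>
Out copy_slow(In first, In last, Out result) {
    last_copy_path = Path::Input;
    for (; first != last; ++first, ++result) *result = *first;
    return result;
}

template <typename In, typename Out>
Out copy_fast(In first, In last, Out result) {
    last_copy_path = Path::RandomAccess;
    for (auto n = last - first; n > 0; --n, ++first, ++result) *result = *first;
    return result;
}

// Step 1: a "concept" with a single operation. The primary template doubles
// as the model for plain input iterators.
template <typename In, typename Out, typename Enable = void>
struct CopyRange {
    static Out copy_range(In first, In last, Out result) {
        return copy_slow(first, last, result);
    }
};

// Step 3: a second "model", chosen when In is a random access iterator.
template <typename In, typename Out>
struct CopyRange<In, Out,
                 std::enable_if_t<std::is_base_of_v<
                     std::random_access_iterator_tag,
                     typename std::iterator_traits<In>::iterator_category>>> {
    static Out copy_range(In first, In last, Out result) {
        return copy_fast(first, last, result);
    }
};

// Step 2: merge calls the concept operation instead of copy, so the model
// selected for the actual iterator types is what runs.
template <typename In1, typename In2, typename Out>
Out merge_sketch(In1 first1, In1 last1, In2 first2, In2 last2, Out result) {
    while (first1 != last1 && first2 != last2) {
        if (*first2 < *first1) { *result = *first2; ++first2; }
        else                   { *result = *first1; ++first1; }
        ++result;
    }
    result = CopyRange<In1, Out>::copy_range(first1, last1, result);
    return CopyRange<In2, Out>::copy_range(first2, last2, result);
}

inline void copy_range_example() {
    std::vector<int> a{1, 3, 5}, b{2, 4, 6, 7}, out(7);
    merge_sketch(a.begin(), a.end(), b.begin(), b.end(), out.begin());  // call site unchanged
    assert(last_copy_path == Path::RandomAccess);
    assert((out == std::vector<int>{1, 2, 3, 4, 5, 6, 7}));
}
```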
The containers of the STL are implemented in G using polymorphic classes. Fig. 20 shows an excerpt of the doubly-linked list container in G. As usual, a dummy sentinel node is used in the implementation. With each STL container come iterator types that translate between the uniform iterator interface and data-structure-specific operations. Fig. 20 shows the list_iterator, which implements operator* in terms of x.node->data and implements operator++ by performing the assignment x.node = x.node->next.
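Fig. 20 is not reproduced here. The following C++ sketch illustrates the same structure under stated assumptions: list_node, list_sketch, and list_example are hypothetical names, only push_back is provided, and the sentinel stores a default-constructed element, so T must be default constructible. The iterator operators are written as free functions over x so that they read like the description above.

```cpp
#include <cassert>

template <typename T>
struct list_node {
    list_node* next;
    list_node* prev;
    T data;
};

// Read-only iterator: a thin wrapper around a node pointer.
template <typename T>
struct list_iterator {
    list_node<T>* node;
};

template <typename T>
const T& operator*(const list_iterator<T>& x) { return x.node->data; }

template <typename T>
list_iterator<T>& operator++(list_iterator<T>& x) {
    x.node = x.node->next;
    return x;
}

template <typename T>
bool operator==(const list_iterator<T>& a, const list_iterator<T>& b) {
    return a.node == b.node;
}

template <typename T>
bool operator!=(const list_iterator<T>& a, const list_iterator<T>& b) {
    return !(a == b);
}

template <typename T>
class list_sketch {
    list_node<T>* sentinel;  // dummy node: sentinel->next is the first element
public:
    list_sketch() : sentinel(new list_node<T>{}) {
        sentinel->next = sentinel->prev = sentinel;  // empty list points at itself
    }
    list_sketch(const list_sketch&) = delete;
    list_sketch& operator=(const list_sketch&) = delete;
    ~list_sketch() {
        list_node<T>* n = sentinel->next;
        while (n != sentinel) {
            list_node<T>* next = n->next;
            delete n;
            n = next;
        }
        delete sentinel;
    }
    void push_back(const T& value) {
        // New node sits between the old last node and the sentinel.
        list_node<T>* n = new list_node<T>{sentinel, sentinel->prev, value};
        sentinel->prev->next = n;
        sentinel->prev = n;
    }
    list_iterator<T> begin() const { return list_iterator<T>{sentinel->next}; }
    list_iterator<T> end() const { return list_iterator<T>{sentinel}; }
};

inline void list_example() {
    list_sketch<int> l;
    assert(l.begin() == l.end());  // empty: begin is the sentinel
    l.push_back(1);
    l.push_back(2);
    l.push_back(3);
    int sum = 0;
    for (list_iterator<int> it = l.begin(); it != l.end(); ++it) sum += *it;
    assert(sum == 6);
}
```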
Not shown in Fig. 20 is the implementation of the mutable iterator for list (the list_iterator provides read-only access). The definitions of the two iterator types are nearly identical; the only difference is that operator* returns a read-only reference for the constant iterator and a read-write reference for the mutable iterator. The code for these two iterators should be shared, but G does not yet have a language mechanism for this kind of reuse.
In C++ this kind of reuse can be expressed using the Curiously Recurring Template Pattern (CRTP) and by parameterizing the base iterator class on the return type of operator*. This approach cannot be used in G because the parameter passing mode may not be parameterized. Further, the semantics of polymorphism in G does not match the intended use here: we want to generate code for the two iterator types at library construction time. A separate generative mechanism is needed to complement the generic features of G. As a temporary solution, we used the m4 macro system to factor the common code from the iterators. The following is an excerpt from the implementation of the iterator operators.
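The m4 excerpt is not reproduced here. For reference, the C++ approach mentioned above can be sketched as follows, assuming a singly linked node for brevity; list_node, list_iter_base, list_const_iterator, and crtp_example are hypothetical names. The base class is parameterized on the derived iterator (CRTP) and on Ref, the return type of operator*, so the two iterators share every operator.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
struct list_node {
    list_node* next;
    T data;
};

// Shared implementation. Derived is the concrete iterator type (CRTP);
// Ref is the return type of operator*, either T& or const T&.
template <typename Derived, typename T, typename Ref>
struct list_iter_base {
    list_node<T>* n;

    explicit list_iter_base(list_node<T>* p) : n(p) {}

    Ref operator*() const { return n->data; }

    Derived& operator++() {
        n = n->next;
        return static_cast<Derived&>(*this);  // return the concrete iterator type
    }

    friend bool operator==(const Derived& a, const Derived& b) { return a.n == b.n; }
    friend bool operator!=(const Derived& a, const Derived& b) { return !(a == b); }
};

// Mutable iterator: operator* yields a read-write reference.
template <typename T>
struct list_iterator : list_iter_base<list_iterator<T>, T, T&> {
    using list_iter_base<list_iterator<T>, T, T&>::list_iter_base;
};

// Constant iterator: operator* yields a read-only reference.
template <typename T>
struct list_const_iterator : list_iter_base<list_const_iterator<T>, T, const T&> {
    using list_iter_base<list_const_iterator<T>, T, const T&>::list_iter_base;
};

inline void crtp_example() {
    list_node<int> c{nullptr, 3};
    list_node<int> b{&c, 2};
    list_node<int> a{&b, 1};

    list_iterator<int> it(&a);
    *it = 10;  // allowed: Ref is int&
    assert(a.data == 10);

    list_const_iterator<int> cit(&a);
    static_assert(std::is_same_v<decltype(*cit), const int&>,
                  "constant iterator yields a read-only reference");
    ++cit;
    assert(*cit == 2);
    assert(cit == list_const_iterator<int>(&b));
}
```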
The reverse_iterator class is a representative example of an STL adaptor.