# Efficient Type Checking for Path Polymorphism

## Abstract

A type system combining type application, constants as types, union types (associative, commutative and idempotent) and recursive types has recently been proposed for statically typing path polymorphism, the ability to define functions that can operate uniformly over recursively specified applicative data structures. A typical pattern such functions resort to is $x\,y$, which decomposes a compound, in other words any applicative tree structure, into its parts. We study type-checking for this type system in two stages. First we propose algorithms for checking type equivalence and subtyping based on coinductive characterizations of those relations. We then formulate a syntax-directed presentation and prove its equivalence with the original one. This yields a type-checking algorithm which unfortunately has exponential time complexity in the worst case. A second algorithm is then proposed, based on automata techniques, which yields a polynomial-time type-checking algorithm.

Keywords: λ-Calculus, Pattern Matching, Path Polymorphism, Type Checking

Juan Edi [2], Andrés Viso [1,2], Eduardo Bonelli [1,3]

[1] Consejo Nacional de Investigaciones Científicas y Técnicas – CONICET
[2] Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires – UBA, Buenos Aires, Argentina
[3] Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes – UNQ, Bernal, Argentina

Copyright: Juan Edi, Andrés Viso, and Eduardo Bonelli. Subject classification: F.4.1 Lambda calculus and related systems, F.3.2 Semantics of Programming Languages, D.3.3 Language Constructs and Features. Venue: 21st International Conference on Types for Proofs and Programs (TYPES 2015), editor Tarmo Uustalu. DOI: 10.4230/LIPIcs.xxx.yyy.p

## 1 Introduction

The lambda-calculus plays an important role in the study of programming languages (PLs). Programs are represented as syntactic terms and execution by repeated simplification of these terms using a reduction rule called $\beta$-reduction. The study of the lambda-calculus has produced deep results in both the theory and the implementation of PLs. Many variants of the lambda-calculus have been introduced with the purpose of studying specific PL features. One such feature of interest is pattern-matching. Pattern-matching is used extensively in PLs as a means for writing more succinct and, at the same time, elegant programs. This is particularly so in the functional programming community, but by no means restricted to that community.

In the standard lambda-calculus, functions are represented as expressions of the form $\lambda x.t$, $x$ being the formal parameter and $t$ the body. Such a function may be applied to any term, regardless of its form. This is expressed by the above mentioned $\beta$-reduction rule: $(\lambda x.t)\,s \to_\beta \{s/x\}t$, where $\{s/x\}t$ stands for the result of replacing all free occurrences of $x$ in $t$ with $s$. Note that, in this rule, no requirement on the form of $s$ is placed. Pattern calculi are generalizations of the $\beta$-reduction rule in which abstractions $\lambda x.t$ are replaced by $\lambda p.t$ where $p$ is called a pattern. An example is $\lambda\langle x,y\rangle.x$ for projecting the first component of a pair, the pattern being $\langle x,y\rangle$. An expression such as $(\lambda\langle x,y\rangle.x)\,s$ will only be able to reduce if indeed $s$ is of the form $\langle s_1,s_2\rangle$; it will otherwise be blocked.

Patterns may be catalogued in at least two dimensions. One is their structure and another their time of creation. The structure of patterns may be very general. Such is the case of variables: any term can match a variable, as in the standard lambda-calculus. The structure of a pattern may also be very specific. Such is the case when arbitrary terms are allowed to be patterns [vanOostrom90, DBLP:journals/tcs/KlopOV08]. Regarding the time of creation, patterns may either be static or dynamic. Static patterns are those that are created at compile time, such as the pattern $\langle x,y\rangle$ mentioned above. Dynamic patterns are those that may be generated at run-time [DBLP:journals/jfp/JayK09, DBLP:books/daglib/0023687]. For example, consider the term $\lambda x.(\lambda\langle x,y\rangle.y)$; note that it has an occurrence of a pattern with a free variable, namely the $x$ in $\langle x,y\rangle$, that is bound to the outermost lambda. If this term is applied to a constant $c$, then one obtains $\lambda\langle c,y\rangle.y$. However, if we apply it to the constant $d$, then we obtain $\lambda\langle d,y\rangle.y$. Both patterns $\langle c,y\rangle$ and $\langle d,y\rangle$ are created during execution. Note that one could also replace the $x$ in the pattern with an abstraction. This leads to computations that evaluate to patterns.

Expressive pattern features may easily break desired properties, such as confluence, and are not easy to endow with type systems. This work is an attempt at devising type systems for such expressive pattern calculi. We originally set out to type-check the Pure Pattern Calculus (PPC[DBLP:journals/jfp/JayK09, DBLP:books/daglib/0023687]. PPC is a lambda-calculus that embodies the essence of dynamic patterns by stripping away everything inessential to the reduction and matching process of dynamic patterns. It admits terms such as . We soon realized that typing PPC was too challenging and noticed that the static fragment of PPC, which we dub Calculus of Applicative Patterns (CAP), was already challenging in itself. CAP also admits patterns such as however all variables in this pattern are considered bound. Thus, in a term such as both occurrences of and are bound in , disallowing reduction inside patterns. Such patterns, however, allow arguments that are applications to be decomposed, as long as these applications encode data structures. They are therefore useful for writing functions that operate on semi-structured data.

The main obstacle for typing CAP is dealing in the type system with a form of polymorphism called path polymorphism [DBLP:journals/jfp/JayK09, DBLP:books/daglib/0023687], that arises from these kinds of patterns. We next briefly describe path polymorphism and the requirements it places on typing considerations.

Path Polymorphism. In CAP data structures are trees. These trees are built using application and variable arity constants or constructors. Examples of two such trees follow, where the first one represents a list and the second a binary tree:

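$$\mathsf{cons}\,(\mathsf{vl}\,1)\,(\mathsf{cons}\,(\mathsf{vl}\,2)\,\mathsf{nil}) \qquad\qquad \mathsf{node}\,(\mathsf{vl}\,3)\,(\mathsf{node}\,(\mathsf{vl}\,4)\,\mathsf{nil}\,\mathsf{nil})\,(\mathsf{node}\,(\mathsf{vl}\,5)\,\mathsf{nil}\,\mathsf{nil})$$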

The constructor $\mathsf{vl}$ is used to tag values ($1$ and $2$ in the first case, and $3$, $4$ and $5$ in the second). A “map” function for updating the values of either of these two structures by applying some user-supplied function $f$ follows, where type annotations are omitted for clarity:

$$\mathsf{upd} = f \to (\mathsf{vl}\,z \to \mathsf{vl}\,(f\,z) \mid x\,y \to (\mathsf{upd}\,f\,x)\,(\mathsf{upd}\,f\,y) \mid w \to w) \qquad (1)$$

The expression $\mathsf{upd}\,f$ may thus be applied to either of the two data structures illustrated above. For example, we can evaluate it on the list and also on the binary tree. The expression to the right of “=” is called an abstraction (or case) and consists of a unique branch; this branch in turn is formed from a pattern ($f$), and a body (in this case the body is itself another abstraction that consists of three branches). An argument to an abstraction is matched against the patterns, in the order in which they are written, and the appropriate body is selected.

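Notice the pattern $x\,y$. During evaluation of such an application, the variables $x$ and $y$ may be instantiated with different applicative terms in each recursive call to $\mathsf{upd}$. For example, when $\mathsf{upd}\,f$ is evaluated on the list above:

| $x$ | $y$ |
|---|---|
| $\mathsf{cons}\,(\mathsf{vl}\,1)$ | $\mathsf{cons}\,(\mathsf{vl}\,2)\,\mathsf{nil}$ |
| $\mathsf{cons}$ | $\mathsf{vl}\,1$ |
| $\mathsf{cons}\,(\mathsf{vl}\,2)$ | $\mathsf{nil}$ |
| $\mathsf{cons}$ | $\mathsf{vl}\,2$ |

The type assigned to $x$ and $y$ should encompass all terms in its respective column.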
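The behaviour of (1) can be explored with a small untyped sketch. The encoding is an assumption made only for illustration: constants are Python strings, a compound is a pair `(function, argument)`, and the helper `app` and the optional `trace` argument are hypothetical names that do not come from CAP itself.

```python
def app(head, *args):
    """Build the left-nested application `head a1 ... an` as nested pairs."""
    term = head
    for arg in args:
        term = (term, arg)
    return term


def upd(f, t, trace=None):
    """Sketch of upd from equation (1); `trace` collects the (x, y) instantiations."""
    if isinstance(t, tuple):
        x, y = t
        if x == "vl":                 # branch  vl z -> vl (f z)
            return ("vl", f(y))
        if trace is not None:         # branch  x y -> (upd f x) (upd f y)
            trace.append((x, y))
        return (upd(f, x, trace), upd(f, y, trace))
    return t                          # branch  w -> w


lst = app("cons", app("vl", 1), app("cons", app("vl", 2), "nil"))
tree = app("node", app("vl", 3),
           app("node", app("vl", 4), "nil", "nil"),
           app("node", app("vl", 5), "nil", "nil"))

trace = []
new_lst = upd(lambda n: n + 1, lst, trace)   # same function, list input
new_tree = upd(lambda n: n * 10, tree)       # same function, tree input
```

The single definition traverses both shapes because the branch for $x\,y$ only inspects the applicative structure; `trace` records the terms bound to $x$ and $y$ at each recursive call, which is the set a type for these variables has to cover.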

Singleton Types and Type Application. A further consideration in typing CAP is that terms such as the ones depicted below should clearly not be typable.

$$(\mathsf{nil} \to 0)\,\mathsf{cons} \qquad\qquad (\mathsf{vl}\,x \to_{\{x:\mathsf{Nat}\}} x+1)\,(\mathsf{vl}\,\mathsf{true}) \qquad (2)$$

In the first case, $\mathsf{cons}$ will never match $\mathsf{nil}$. The type system will resort to singleton types in order to determine this: $\mathsf{cons}$ will be assigned a type of the form $\mathbf{cons}$ which will fail to match $\mathbf{nil}$. The second expression in (2) breaks Subject Reduction (SR): reduction will produce $\mathsf{true}+1$. Applicative types of the form $\mathbf{vl} @ \mathsf{Bool}$ will allow us to check for these situations, $@$ being a new type constructor that applies datatypes to arbitrary types. Also, note the use of typing environments (the expression $\{x:\mathsf{Nat}\}$) to declare the types of the variables of patterns in branches. These are supplied by the programmer.

Union and Recursive Types. On the assumption that the programmer has provided an exhaustive coverage, the type assigned by CAP to the variable $x$ in the pattern $x\,y$ in $\mathsf{upd}$ is:

 μα.(vl@A)⊕(α@α)⊕(cons⊕node⊕nil)

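Here $\mu$ is the recursive type constructor and $\oplus$ the union type constructor. $\mathbf{vl}$ is the singleton type used for typing the constant $\mathsf{vl}$ and $@$ denotes type application, as mentioned above. The union type constructor is used to collect the types of all the branches. The variable $y$ in the pattern $x\,y$ will also be assigned the same type as $x$. Thus variables in applicative patterns are assigned union types. $\mathsf{upd}$ itself is assigned type $(A \supset B) \supset (F_A \supset F_B)$, where $F_X$ is $\mu\alpha.(\mathbf{vl} @ X) \oplus (\alpha @ \alpha) \oplus (\mathbf{cons} \oplus \mathbf{node} \oplus \mathbf{nil})$.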

Type-Checking for CAP. Based on these, and other similar considerations, we proposed typed CAP [DBLP:journals/entcs/VisoBA16], referred to simply as CAP in the sequel. The system consists of typing rules that combine singleton types, type application, union types, recursive types and subtyping. Also it enjoys several properties, the salient one being safety (subject reduction and progress). Safety relies on a notion of typed pattern compatibility based on subtyping that guarantees that examples such as (2–right) and the following one do not break safety:

$$((\mathsf{vl}\,x \to_{\{x:\mathsf{Bool}\}} \mathsf{if}\ x\ \mathsf{then}\ 1\ \mathsf{else}\ 0) \mid (\mathsf{vl}\,y \to_{\{y:\mathsf{Nat}\}} y+1))\,(\mathsf{vl}\,4) \qquad (3)$$

Assumptions on associativity and commutativity of typing operators in [DBLP:journals/entcs/VisoBA16] make it non-trivial to deduce a type-checking algorithm from the typing rules. The proposed type system is, moreover, not syntax-directed. Also, it relies on coinductive notions of type equivalence and subtyping which, in the presence of recursive types, are not obviously decidable. A practical implementation of CAP is instrumental since a robust theoretical analysis without such an implementation is of little use.

Goal and Summary of Contributions. This paper addresses this implementation. It does so in two stages:

• The first stage presents a naïve but correct, high-level description of a type-checking algorithm, the principal aim being clarity. We propose an invertible presentation of the coinductive notions of type-equivalence and subtyping of [DBLP:journals/entcs/VisoBA16] and also a syntax-directed variant of the presentation in [DBLP:journals/entcs/VisoBA16]. This leads to algorithms for checking subtyping membership and equivalence modulo associative, commutative and idempotent (ACI) unions, both based on an invertible presentation of the functional generating the associated coinductive notions.

• The second stage builds on ideas from the first algorithm with the aim of improving efficiency. $\mu$-types are interpreted as infinite $n$-ary trees and represented using automata, avoiding having to explicitly handle unfoldings of recursive types, and leading to a significant improvement in the complexity of the key steps of the type-checking process, namely equality and subtype checking.

Related work. For literature on (typed) pattern calculi the reader is referred to [DBLP:journals/entcs/VisoBA16]. Algorithms for checking equality and subtyping of recursive types were studied in the 1990s by Amadio and Cardelli [DBLP:journals/toplas/AmadioC93]; Kozen, Palsberg, and Schwartzbach [DBLP:journals/mscs/KozenPS95]; Brandt and Henglein [DBLP:conf/tlca/BrandtH97]; Jim and Palsberg [Jim97typeinference] among others. Additionally, Zhao and Palsberg [DBLP:journals/iandc/PalsbergZ01] studied the possibility of incorporating associative and commutative (AC) products into the equality check, using an automata-based approach that the authors themselves claimed was not extensible to subtyping [Zhao:thesis]. Later on, Di Cosmo, Pottier, and Rémy [DBLP:conf/tlca/CosmoPR05] presented another automata-based algorithm for subtyping that properly handles AC products with a complexity cost of $\mathcal{O}(n\,n'\,d^{5/2})$, where $n$ and $n'$ are the sizes of the analyzed types, and $d$ is a bound on the arity of the involved products.

Structure of the paper. Sec. 2 reviews the syntax and operational semantics of CAP, its type system and the main properties. Further details may be consulted in [DBLP:journals/entcs/VisoBA16]. Sec. 3 proposes invertible generating functions for coinductive notions of type-equivalence and subtyping that lead to inefficient but elegant algorithms for checking these relations. Sec. 4 proposes a syntax-directed type system for CAP. Sec. 5 studies a more efficient type-checking algorithm based on automata. Finally, we conclude in Sec. 6. An implementation of the algorithms described here is available online [EV:2015:Prototipo].

## 2 Review of CAP

### 2.1 Syntax and Operational Semantics

We assume given an infinite set of term variables and constants. CAP has four syntactic categories, namely patterns ($p$), terms ($t$), data structures ($d$) and matchable forms ($m$):

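$$\begin{array}{rcll}
p & ::= & x & \text{(matchable)} \\
  & \mid & c & \text{(constant)} \\
  & \mid & p\,p & \text{(compound)} \\
t & ::= & x & \text{(variable)} \\
  & \mid & c & \text{(constant)} \\
  & \mid & t\,t & \text{(application)} \\
  & \mid & p_1 \to_{\theta_1} t_1 \mid \ldots \mid p_n \to_{\theta_n} t_n & \text{(abstraction)} \\
d & ::= & c & \text{(constant)} \\
  & \mid & d\,t & \text{(compound)} \\
m & ::= & d & \text{(data structure)} \\
  & \mid & p_1 \to_{\theta_1} t_1 \mid \ldots \mid p_n \to_{\theta_n} t_n & \text{(abstraction)}
\end{array}$$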

The set of patterns, terms, data structures and matchable forms are denoted , , and , resp. Variables occurring in patterns are called matchables. We often abbreviate with . The are typing contexts annotating the type assignments for the variables in (cf. Sec. 2.3). The free variables of a term (notation ) are defined as expected; in a pattern we call them free matchables (). All free matchables in each are assumed to be bound in their respective bodies . Positions in patterns and terms are defined as expected and denoted ( denotes the root position). We write for the set of positions of and for the subterm of occurring at position .

A substitution ($\sigma$) is a partial function from term variables to terms. If it assigns $u_i$ to $x_i$, $i \in 1..n$, then we write $\{u_1/x_1, \ldots, u_n/x_n\}$. Its domain ($\mathsf{dom}(\sigma)$) is $\{x_1, \ldots, x_n\}$. Also, $\{\}$ is the identity substitution. We write $\sigma s$ for the result of applying $\sigma$ to term $s$. We say a pattern $p$ subsumes a pattern $q$, written $p \lhd q$, if there exists $\sigma$ such that $\sigma p = q$. Matchable forms are required for defining the matching operation, described next.

Given a pattern $p$ and a term $s$, the matching operation $\{\!\{s/p\}\!\}$ determines whether $s$ matches $p$. It may have one of three outcomes: success, fail (in which case it returns the special symbol $\mathtt{fail}$) or undetermined (in which case it returns the special symbol $\mathtt{wait}$). We say $\{\!\{s/p\}\!\}$ is decided if it is either successful or it fails. In the former it yields a substitution $\sigma$; in this case we write $\{\!\{s/p\}\!\} = \sigma$. The disjoint union of matching outcomes is given as follows (“$\triangleq$” is used for definitional equality):

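$$\begin{array}{rcl}
\mathtt{fail} \uplus o & \triangleq & \mathtt{fail} \\
o \uplus \mathtt{fail} & \triangleq & \mathtt{fail} \\
\sigma_1 \uplus \sigma_2 & \triangleq & \sigma \\
\mathtt{wait} \uplus \sigma & \triangleq & \mathtt{wait} \\
\sigma \uplus \mathtt{wait} & \triangleq & \mathtt{wait} \\
\mathtt{wait} \uplus \mathtt{wait} & \triangleq & \mathtt{wait}
\end{array}$$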

where $o$ denotes any possible output and $\sigma_1 \uplus \sigma_2 \triangleq \sigma$ if the domains of $\sigma_1$ and $\sigma_2$ are disjoint. This always holds given that patterns are assumed to be linear (at most one occurrence of any matchable). The matching operation is defined as follows, where the defining clauses below are evaluated from top to bottom:

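$$\begin{array}{rcll}
\{\!\{u/x\}\!\} & \triangleq & \{u/x\} & \\
\{\!\{c/c\}\!\} & \triangleq & \{\} & \\
\{\!\{u\,v/p\,q\}\!\} & \triangleq & \{\!\{u/p\}\!\} \uplus \{\!\{v/q\}\!\} & \text{if } u\,v \text{ is a matchable form} \\
\{\!\{u/p\}\!\} & \triangleq & \mathtt{fail} & \text{if } u \text{ is a matchable form} \\
\{\!\{u/p\}\!\} & \triangleq & \mathtt{wait} &
\end{array}$$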

For example: ; ; and . We now turn to the only reduction axiom of CAP:

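$$\frac{\{\!\{u/p_i\}\!\} = \mathtt{fail} \ \text{ for all } i < j \qquad \{\!\{u/p_j\}\!\} = \sigma_j \qquad j \in 1..n}{(p_1 \to_{\theta_1} t_1 \mid \ldots \mid p_n \to_{\theta_n} t_n)\,u \;\to\; \sigma_j t_j}\ (\beta)$$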

It may be applied under any context and states that if the argument $u$ to an abstraction fails to match all patterns $p_i$ with $i < j$ and successfully matches pattern $p_j$ (producing a substitution $\sigma_j$), then the term reduces to $\sigma_j t_j$.
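The three matching outcomes and the branch-selection discipline of the reduction axiom can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: patterns and terms are encoded as strings (constants), pairs (compounds) and `Var` objects; abstractions as arguments are left out; the matching clauses assumed are the usual ones (a matchable binds its argument, equal constants succeed, compounds match componentwise against data structures, any other data structure fails, everything else is undetermined). The names `Outcome`, `match_pattern` and `apply_abs` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    FAIL = "fail"
    WAIT = "wait"


@dataclass(frozen=True)
class Var:
    name: str


def is_data(t):
    """A data structure is a constant applied to zero or more terms."""
    while isinstance(t, tuple):
        t = t[0]
    return isinstance(t, str)


def union(o1, o2):
    """Disjoint union of outcomes: fail dominates, then wait, else merge."""
    if Outcome.FAIL in (o1, o2):
        return Outcome.FAIL
    if Outcome.WAIT in (o1, o2):
        return Outcome.WAIT
    return {**o1, **o2}   # domains are disjoint for linear patterns


def match_pattern(u, p):
    """Clauses are tried from top to bottom."""
    if isinstance(p, Var):
        return {p.name: u}
    if isinstance(p, str) and u == p:
        return {}
    if isinstance(p, tuple) and isinstance(u, tuple) and is_data(u):
        return union(match_pattern(u[0], p[0]), match_pattern(u[1], p[1]))
    if is_data(u):
        return Outcome.FAIL
    return Outcome.WAIT


def apply_abs(branches, u):
    """Pick the first branch that matches after all earlier ones failed."""
    for pattern, body in branches:
        outcome = match_pattern(u, pattern)
        if outcome is Outcome.FAIL:
            continue
        if outcome is Outcome.WAIT:
            return None        # blocked: matching is undetermined
        return body(outcome)   # body receives the substitution
    return None                # every branch failed: no reduction


head = [("nil", lambda s: "nil"),
        ((("cons", Var("x")), Var("xs")), lambda s: s["x"])]
result = apply_abs(head, (("cons", "a"), "nil"))
```

Branch bodies are modelled as Python functions of the substitution, which sidesteps implementing substitution on terms; a free variable as argument yields `Outcome.WAIT`, so the application stays blocked instead of skipping to a later branch.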

For instance, consider the function

Then, with , while since and .

###### Proposition

Reduction in CAP is confluent [DBLP:journals/entcs/VisoBA16].

### 2.2 Types

In order to ensure that patterns such as $x\,y$ decompose only data structures rather than arbitrary terms, we shall introduce two sorts of typing expressions: types and datatypes, the latter being strictly included in the former. We assume given countably infinite sets of datatype variables ($\alpha$), of type variables ($X$) and of type constants ($c$). We use the meta-variable $V$ to denote an arbitrary datatype or type variable. Likewise, we write $a$ for an arbitrary variable or type constant. The sets of $\mu$-datatypes and of $\mu$-types, resp., are inductively defined as follows:

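$$\begin{array}{rcll}
D & ::= & \alpha & \text{(datatype variable)} \\
  & \mid & c & \text{(atom)} \\
  & \mid & D @ A & \text{(compound)} \\
  & \mid & D \oplus D & \text{(union)} \\
  & \mid & \mu\alpha.D & \text{(recursion)} \\
A & ::= & X & \text{(type variable)} \\
  & \mid & D & \text{(datatype)} \\
  & \mid & A \supset A & \text{(type abstraction)} \\
  & \mid & A \oplus A & \text{(union)} \\
  & \mid & \mu X.A & \text{(recursion)}
\end{array}$$

###### Remark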

A type of the form $\mu\alpha.A$ is not valid in general since it may produce invalid unfoldings. For example, $\mu\alpha.\alpha \supset \alpha = (\mu\alpha.\alpha \supset \alpha) \supset (\mu\alpha.\alpha \supset \alpha)$, which fails to preserve sorting. On the other hand, types of the form $\mu X.D$ are not necessary since they denote the solution to the equation $X = D$, hence $X$ is a variable representing a datatype, a role already fulfilled by $\alpha$.

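We consider $\oplus$ to bind tighter than $\supset$, while $@$ binds tighter than $\oplus$. E.g. $D @ A \oplus A' \supset B$ means $((D @ A) \oplus A') \supset B$. We write $A \neq \oplus$ to mean that the root symbol of $A$ is different from $\oplus$; and similarly with the other type constructors. Expressions such as $A_1 \oplus A_2 \oplus \ldots \oplus A_n$ will be abbreviated $\oplus_{i \in 1..n} A_i$; this is sound since $\mu$-types will be considered modulo associativity of $\oplus$. A type of the form $\oplus_{i \in 1..n} A_i$ where each $A_i \neq \oplus$, $i \in 1..n$, is called a maximal union. We often write $\mu V.A$ to mean either $\mu\alpha.D$ or $\mu X.A$. A non-union $\mu$-type is a $\mu$-type of one of the following forms: $\alpha$, $c$, $D @ A$, $X$, $A \supset B$, or $\mu V.A$ with $A$ a non-union $\mu$-type. We assume $\mu$-types are contractive: $\mu V.A$ is contractive if $V$ occurs in $A$ only under a type constructor $\supset$ or $@$, if at all. For instance, $\mu X.X \supset X$, $\mu\alpha.c @ \alpha$ and $\mu X.c \oplus (X \supset X)$ are contractive while $\mu X.X$ and $\mu X.X \oplus X$ are not. We henceforth restrict attention to contractive $\mu$-types.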
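Contractiveness as just defined is a simple syntactic check. The sketch below assumes a hypothetical tuple encoding of $\mu$-types (`("var", V)`, `("const", c)`, `("app", D, A)` for $@$, `("arrow", A, B)` for $\supset$, `("union", A, B)` for $\oplus$, `("mu", V, A)`), and it requires every recursive binder inside a type to be contractive as well, which is one reasonable reading of the requirement.

```python
def guarded(v, t):
    """True if every free occurrence of variable v in t lies under @ or ⊃."""
    tag = t[0]
    if tag == "var":
        return t[1] != v
    if tag == "const":
        return True
    if tag in ("app", "arrow"):
        return True                      # anything below is under a constructor
    if tag == "union":
        return guarded(v, t[1]) and guarded(v, t[2])
    # tag == "mu": an inner binder with the same name shadows v
    return t[1] == v or guarded(v, t[2])


def contractive(t):
    """True if every recursive type occurring in t is contractive."""
    tag = t[0]
    if tag in ("var", "const"):
        return True
    if tag == "mu":
        return guarded(t[1], t[2]) and contractive(t[2])
    return contractive(t[1]) and contractive(t[2])


X = ("var", "X")
c = ("const", "c")
nat_list = ("mu", "a", ("union", ("const", "nil"),
                        ("app", ("app", ("const", "cons"), ("const", "Nat")),
                         ("var", "a"))))
```

Note that $\oplus$ is transparent to the check: an occurrence directly under a union is still unguarded, which is why a type such as $\mu X.c \oplus X$ is rejected.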

$\mu$-types come equipped with a notion of type equivalence $\simeq_\mu$ (Fig. 1) and subtyping $\preceq_\mu$ (Fig. 2). In Fig. 2 a subtyping context $\Sigma$ is a set of assumptions over type variables of the form $V \preceq_\mu W$ with $V$, $W$ variables. (e-rec) actually encodes two rules, one for datatypes ($\mu\alpha.D$) and one for arbitrary types ($\mu X.A$). Likewise for (e-fold) and (e-contr). Regarding the subtyping rules, we adopt those for union of [DBLP:conf/csl/Vouillon04]. It should be noted that the naïve variant of (s-rec) in which $\mu V.A \preceq_\mu \mu V.B$ is deduced from $A \preceq_\mu B$ is known to be unsound [DBLP:journals/toplas/AmadioC93]. We often abbreviate $\vdash A \preceq_\mu B$ as $A \preceq_\mu B$.

### 2.3 Typing and Safety

A typing context (or ) is a partial function from term variables to -types; means that maps to . We have two typing judgments, one for patterns and one for terms . Accordingly, we have two sets of typing rules: Fig. 3, top and bottom. We write to indicate that the typing judgment is derivable (likewise for ). The typing schemes speak for themselves except for two of them which we now comment. The first is (t-app). Note that we do not impose any additional restrictions on , in particular it may be a union-type itself. This implies that the argument can have a union type too. Regarding (t-abs) it requests a number of conditions. First of all, each of the patterns must be typable under the typing context , . Also, the set of free matchables in each must be exactly the domain of . Another condition, indicated by , is that the bodies of each of the branches , , must be typable under the context extended with the corresponding . More noteworthy is the condition that the list be compatible.

Compatibility is a condition that ensures that Subject Reduction is not violated. We briefly recall it; see [DBLP:journals/entcs/VisoBA16] for further details and examples. Let $A_i$ and $A_j$ be the types of patterns $p_i$ and $p_j$. As already mentioned in example (3) of the introduction, if $p_i$ subsumes $p_j$ (i.e. $p_i \lhd p_j$) with $i < j$, then the branch $p_j \to_{\theta_j} t_j$ will never be evaluated since the argument will already match $p_i$. Thus, in this case, in order to ensure SR we demand that $A_j \preceq_\mu A_i$. If $p_i$ does not subsume $p_j$ (i.e. $p_i \not\lhd p_j$) with $i < j$ we analyze the cause of failure of subsumption in order to determine whether requirements on $A_i$ and $A_j$ must be put forward, focusing on those cases where $\pi$ is an offending position in both patterns. The following table exhaustively lists them:

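| Case | $p_i\vert_\pi$ | $p_j\vert_\pi$ | Outcome |
|---|---|---|---|
| (a) | $c$ | $y$ | restriction required |
| (b) | $c$ | $d$ | no overlapping ($p_j \not\lhd p_i$) |
| (c) | $c$ | $q_1\,q_2$ | no overlapping |
| (d) | $q_1\,q_2$ | $y$ | restriction required |
| (e) | $q_1\,q_2$ | $d$ | no overlapping |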

In cases (b), (c) and (e), no extra condition on the types of and is necessary, since their respective sets of possible arguments are disjoint. The cases where and must be related are (a) and (d): for those we require . In summary, the cases requiring conditions on their types are: 1) ; and 2) and .

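###### Definition

Given a pattern $p$ of type $A$ and a position $\pi \in \mathsf{pos}(p)$, we say $A$ admits a symbol $\odot$ (a variable, a type constant, $\supset$ or $@$) at position $\pi$ iff $\odot \in A\|_\pi$, where:

$$\begin{array}{rcll}
a\|_\epsilon & \triangleq & \{a\} & \\
(A_1 \star A_2)\|_\epsilon & \triangleq & \{\star\}, & \star \in \{\supset, @\} \\
(A_1 \star A_2)\|_{i\pi} & \triangleq & A_i\|_\pi, & \star \in \{\supset, @\},\ i \in \{1,2\} \\
(A_1 \oplus A_2)\|_\pi & \triangleq & A_1\|_\pi \cup A_2\|_\pi & \\
(\mu V.A')\|_\pi & \triangleq & (\{\mu V.A'/V\}A')\|_\pi &
\end{array}$$

Note that typability of $p$ with type $A$ and contractiveness of $A$ imply $A\|_\pi$ is well-defined for $\pi \in \mathsf{pos}(p)$.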
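The set of symbols admitted at a position can be computed directly from the defining clauses. The sketch below is illustrative only: it assumes a hypothetical tuple encoding of $\mu$-types (`("var", V)`, `("const", c)`, `("app", D, A)`, `("arrow", A, B)`, `("union", A, B)`, `("mu", V, A)`), closed and contractive input types, and positions written as strings over `1` and `2`. Returning the empty set for a position that does not exist in the type is a choice of this sketch; the definition leaves that case undefined.

```python
def subst(t, v, r):
    """Replace free occurrences of variable v in t by r (r assumed closed)."""
    tag = t[0]
    if tag == "var":
        return r if t[1] == v else t
    if tag == "const":
        return t
    if tag == "mu":
        return t if t[1] == v else ("mu", t[1], subst(t[2], v, r))
    return (tag, subst(t[1], v, r), subst(t[2], v, r))


def symbols_at(t, pos):
    """Symbols admitted by type t at position pos."""
    tag = t[0]
    if tag == "mu":                      # unfold the recursive type once
        return symbols_at(subst(t[2], t[1], t), pos)
    if tag == "union":                   # unions do not consume a position
        return symbols_at(t[1], pos) | symbols_at(t[2], pos)
    if not pos:                          # root position
        if tag in ("var", "const"):
            return {t[1]}
        return {"@" if tag == "app" else "⊃"}
    if tag in ("app", "arrow"):          # descend into component 1 or 2
        return symbols_at(t[int(pos[0])], pos[1:])
    return set()                         # position absent (sketch-only choice)


nil = ("const", "nil")
nat = ("const", "Nat")
nat_list = ("mu", "a", ("union", nil,
                        ("app", ("app", ("const", "cons"), nat), ("var", "a"))))
cons_cell = ("app", ("app", ("const", "cons"), nat), nat_list)
```

Termination relies on contractiveness: each unfolding exposes a constructor or a constant before the bound variable is reached again.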

###### Definition

The maximal positions in a set of positions are:

 maxpos(P)≜{π∈P|∄π′≠ϵ.ππ′∈P}

The mismatching positions between two patterns are defined below where, recall from the introduction, $p|_\pi$ stands for the sub-pattern at position $\pi$ of $p$:

$$\mathsf{mmpos}(p,q) \triangleq \{\pi \mid \pi \in \mathsf{maxpos}(\mathsf{pos}(p) \cap \mathsf{pos}(q)) \wedge p|_\pi \not\lhd q|_\pi\}$$

For instance, given patterns and with set of positions and respectively, we have and , while the only mismatching position among them is the root, i.e. .
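Maximal and mismatching positions are straightforward to compute. The following sketch assumes linear patterns encoded as strings (constants), `Var` objects (matchables) and pairs (compounds), with positions written as strings over `1` and `2`; the concrete patterns used at the end are hypothetical examples chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


def positions(p):
    """All positions of pattern p; the empty string is the root."""
    if isinstance(p, tuple):
        return ({""} | {"1" + x for x in positions(p[0])}
                     | {"2" + x for x in positions(p[1])})
    return {""}


def at(p, pos):
    """Sub-pattern of p at position pos."""
    for i in pos:
        p = p[int(i) - 1]
    return p


def subsumes(p, q):
    """p subsumes q: some substitution maps p to q (p assumed linear)."""
    if isinstance(p, Var):
        return True
    if isinstance(p, tuple):
        return (isinstance(q, tuple)
                and subsumes(p[0], q[0]) and subsumes(p[1], q[1]))
    return p == q


def maxpos(ps):
    """Positions in ps that have no proper extension in ps."""
    return {x for x in ps if not any(y != x and y.startswith(x) for y in ps)}


def mmpos(p, q):
    """Maximal common positions at which p does not subsume q."""
    common = positions(p) & positions(q)
    return {x for x in maxpos(common) if not subsumes(at(p, x), at(q, x))}


cons_pat = (("cons", Var("x")), Var("xs"))
vl_x, vl_y = ("vl", Var("x")), ("vl", Var("y"))
```

With these encodings, a constant pattern and a compound pattern share only the root position, so the root is their only possible mismatch, while two patterns of the same shape that differ only in matchable names have no mismatching position at all.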

###### Definition

Define the compatibility predicate as

 Pcomp(p:A,q:B)≜∀π∈mmpos(p,q).A∥π∩B∥π≠∅

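We say $p : A$ is compatible with $q : B$, notation $p : A \ll q : B$, iff

 Pcomp(p:A,q:B)⟹B⪯μA

A list of patterns $[p_i : A_i]_{i \in 1..n}$ is compatible if $\forall i, j \in 1..n.\ i < j \implies p_i : A_i \ll p_j : A_j$.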

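Following the example above, consider types $\mathbf{nil}$ and $\mathbf{cons} @ \mathsf{Nat} @ (\mu\alpha.\mathbf{nil} \oplus \mathbf{cons} @ \mathsf{Nat} @ \alpha)$ for the two patterns, respectively. Compatibility requires no further restriction in this case since the root is the only mismatching position and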

 nil∥ϵ={nil}(cons@Nat@(μα.nil⊕cons@Nat@α))∥ϵ={@}

hence the compatibility predicate is false and the property gets validated trivially.

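On the contrary, recall motivating example (3) in Sec. 1. Compatibility of $\mathsf{vl}\,x : \mathbf{vl} @ \mathsf{Bool}$ with $\mathsf{vl}\,y : \mathbf{vl} @ \mathsf{Nat}$ requires $\mathbf{vl} @ \mathsf{Nat} \preceq_\mu \mathbf{vl} @ \mathsf{Bool}$ since $\mathsf{mmpos}(\mathsf{vl}\,x, \mathsf{vl}\,y) = \emptyset$ (i.e. the compatibility predicate is trivially true). This actually fails because $\mathsf{Nat} \not\preceq_\mu \mathsf{Bool}$. Thus, this pattern combination is rejected by rule (t-abs).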

Types are preserved along reduction. The proof relies crucially on compatibility.

###### Proposition (Subject Reduction)

If $\Gamma \vdash s : A$ and $s \to s'$, then $\Gamma \vdash s' : A$.

Let the set of values be defined as . The following property guarantees that no functional application gets stuck. Essentially this means that, in a well-typed closed term, a function which is applied to an argument has at least one branch that is capable of handling it.

###### Proposition (Progress)

If $\vdash s : A$ and $s$ is not a value, then $\exists s'$ such that $s \to s'$.

## 3 Checking Equivalence and Subtyping

As mentioned in the related work, there are roughly two approaches to implementing equivalence and subtype checking in the presence of recursive types, one based on automata theory and another based on coinductive characterizations of the associated relations. The former leads to efficient algorithms [DBLP:journals/iandc/PalsbergZ01] while the latter is more abstract in nature and hence closer to the formalism itself, although it may not be as efficient. In the particular case of subtyping for recursive types in the presence of ACI operators, the automata approach of [DBLP:journals/iandc/PalsbergZ01] is known not to be applicable [Zhao:thesis] while the coinductive approach, developed in this section, yields a correct algorithm. In Sec. 5 we explore an alternative approach for subtyping based on automata, inspired by [DBLP:conf/tlca/CosmoPR05]. We next further describe the reasoning behind the coinductive approach.

### 3.1 Preliminaries

Consider type constructors $@$ and $\supset$ together with type connector $\oplus$ and the ranked alphabet consisting of a nullary symbol for each variable and type constant and of the binary symbols $@$, $\supset$ and $\oplus$. We write $\mathcal{T}$ for the set of (possibly) infinite types with symbols in this alphabet. This is a standard construction [Terese:2003, DBLP:journals/tcs/Courcelle83] given by the metric completion based on a simple depth function measuring the distance from the root to the minimum conflicting node in two trees. Perhaps worth mentioning is that the type connector $\oplus$ does not contribute to the depth (hence the reason for calling it a connector rather than a constructor), which excludes types consisting of infinite branches of $\oplus$ from $\mathcal{T}$. A tree is regular if the set of all its subtrees is finite. We shall always work with regular trees and also denote them $\mathcal{T}$.

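###### Definition

The truncation of a tree $\mathcal{A}$ at depth $k$ (notation $\mathcal{A}\rfloor_k$) is defined inductively as follows:

$$\begin{array}{rcll}
\mathcal{A}\rfloor_0 & \triangleq & \circ & \\
a\rfloor_{k+1} & \triangleq & a & \\
(\mathcal{A}_1 \star \mathcal{A}_2)\rfloor_{k+1} & \triangleq & \mathcal{A}_1\rfloor_k \star \mathcal{A}_2\rfloor_k & \text{for } \star \in \{@, \supset\} \\
(\mathcal{A}_1 \oplus \mathcal{A}_2)\rfloor_{k+1} & \triangleq & \mathcal{A}_1\rfloor_{k+1} \oplus \mathcal{A}_2\rfloor_{k+1} &
\end{array}$$

where $\circ$ is a distinguished type constant used to identify the nodes where the tree was truncated.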

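###### Definition

The function $\llbracket \cdot \rrbracket^{\mathcal{T}}$, mapping $\mu$-types to types in $\mathcal{T}$, is defined inductively as follows:

$$\begin{array}{rcll}
\llbracket a \rrbracket^{\mathcal{T}}(\epsilon) & \triangleq & a & \\
\llbracket A_1 \star A_2 \rrbracket^{\mathcal{T}}(\epsilon) & \triangleq & \star & \text{for } \star \in \{@, \supset, \oplus\} \\
\llbracket A_1 \star A_2 \rrbracket^{\mathcal{T}}(i\pi) & \triangleq & \llbracket A_i \rrbracket^{\mathcal{T}}(\pi) & \text{for } \star \in \{@, \supset, \oplus\} \\
\llbracket \mu V.A \rrbracket^{\mathcal{T}}(\pi) & \triangleq & \llbracket \{\mu V.A/V\}A \rrbracket^{\mathcal{T}}(\pi) &
\end{array}$$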

Coinductive characterizations of subsets of $\mathcal{T} \times \mathcal{T}$ whose generating function $\Phi$ is invertible admit a simple (although not necessarily efficient) algorithm for membership checking, which consists in “running $\Phi$ backwards” [DBLP:books/daglib/0005958, Sec. 21.5]. This strategy is supported by the fact that contractiveness of $\mu$-types guarantees a finite state space to explore (i.e. unfolding these types results in regular trees); invertibility further guarantees that there is at most one way in which a member of $\nu\Phi$, the greatest fixed-point of $\Phi$, can be generated. Invertibility of $\Phi$ means that for any pair $\langle \mathcal{A}, \mathcal{B} \rangle$, the set of premise sets from which $\Phi$ generates $\langle \mathcal{A}, \mathcal{B} \rangle$ is either empty or contains a unique member.
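The “running backwards” strategy can be sketched generically: given a function returning the unique set of premises that could generate an element (or nothing when no rule applies), membership in the greatest fixed point amounts to exploring premises while assuming every element already visited. The instance below is deliberately simplified and is an assumption of this sketch: it checks equality of recursive types without unions, using a hypothetical tuple encoding (`("var", V)`, `("const", c)`, `("arrow", A, B)`, `("mu", V, A)`), so it illustrates the technique and not the ACI-aware rules of the paper.

```python
def in_gfp(goal, support):
    """Membership in the greatest fixed point of an invertible generating
    function; support(x) is the unique premise set for x, or None."""
    assumed, todo = set(), [goal]
    while todo:
        x = todo.pop()
        if x in assumed:
            continue                 # coinductive hypothesis: already assumed
        premises = support(x)
        if premises is None:
            return False             # no rule can generate x
        assumed.add(x)
        todo.extend(premises)
    return True


def subst(t, v, r):
    """Replace free occurrences of variable v in t by r (r assumed closed)."""
    tag = t[0]
    if tag == "var":
        return r if t[1] == v else t
    if tag == "const":
        return t
    if tag == "mu":
        return t if t[1] == v else ("mu", t[1], subst(t[2], v, r))
    return (tag, subst(t[1], v, r), subst(t[2], v, r))


def support(pair):
    """Premises for equality of two union-free recursive types."""
    s, t = pair
    if s[0] == "mu":                 # unfold on the left first
        return {(subst(s[2], s[1], s), t)}
    if t[0] == "mu":                 # then on the right
        return {(s, subst(t[2], t[1], t))}
    if s[0] != t[0]:
        return None
    if s[0] in ("const", "var"):
        return set() if s == t else None
    return {(s[1], t[1]), (s[2], t[2])}


c, d = ("const", "c"), ("const", "d")
stream_c = ("mu", "X", ("arrow", c, ("var", "X")))
stream_d = ("mu", "X", ("arrow", d, ("var", "X")))
unfolded_c = ("arrow", c, stream_c)
```

Termination depends on the set of reachable pairs being finite, which is what contractiveness provides for regular trees; fixing the order in which the two sides are unfolded is what makes `support` single-valued.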

### 3.2 Equivalence Checking

Fig. 4 presents a coinductive definition of type equality over -types. This relation is defined by means of rules that are interpreted coinductively (indicated by the double lines). The rule (e-union-al) makes use of functions and to encode the ACI nature of . Letters , used in rules (e-rec-l-al) and (e-rec-r-al), denote contexts of the form:

$$A_1 \oplus \ldots \oplus A_{i-1} \oplus \Box \oplus A_{i+1} \oplus \ldots \oplus A_n$$

where $\Box$ denotes the hole of the context, $A_j \neq \mu$ for all $j < i$ and $A_j \neq \oplus$ for all $j \neq i$. Note that, in particular, such a context may take the form $\Box$. These contexts help identify the first occurrence of a $\mu$ constructor within a maximal union. In turn, this allows us to guarantee the invertibility of the generating function associated to these rules.

###### Proposition

The generating function associated with the rules of Fig. 4 is invertible.

Moreover, coincides with :

###### Proposition

iff .

This will allow us to check by using the invertibility of the generating function (implicit in the rules of Fig. 4) for . The proof of Prop. 3.2 relies on an intermediate relation over the possibly infinite trees resulting from the complete unfolding of -types. This relation is defined using the same rules as in Fig. 4 except for two important differences: 1) the relation is defined over regular trees in , and 2) rules (e-rec-l-al) and (e-rec-r-al) are dropped.

The proof is structured as follows. First we characterize equality of -types in terms of equality of their infinite unfoldings [DBLP:journals/entcs/VisoBA16]:

###### Proposition

iff .

The proof of Prop. 3.2 thus reduces to showing that coincides with . In order to do so, we appeal to the following result that states that inspecting all finite truncations suffices to determine whether holds:

###### Lemma

iff .

###### Proof

This is proved by showing that the relations