Enumeration of chord diagrams on many intervals and their non-orientable analogs
Two types of connected chord diagrams with chord endpoints lying in a collection of ordered and oriented real segments are considered here: the real segments may contain additional bivalent vertices in one model but not in the other. In the former case, we record in a generating function the number of fatgraph boundary cycles containing a fixed number of bivalent vertices while in the latter, we instead record the number of boundary cycles of each fixed length. Second order, non-linear, algebraic partial differential equations are derived which are satisfied by these generating functions in each case giving efficient enumerative schemes. Moreover, these generating functions provide multi-parameter families of solutions to the KP hierarchy. For each model, there is furthermore a non-orientable analog, and each such model likewise has its own associated differential equation. The enumerative problems we solve are interpreted in terms of certain polygon gluings. As specific applications, we discuss models of several interacting RNA molecules. We also study a matrix integral which computes numbers of chord diagrams in both orientable and non-orientable cases in the model with bivalent vertices, and the large-N limit is computed using techniques of free probability.
A partial chord diagram is a connected fatgraph (i.e., a graph equipped with a cyclic order on the half edges incident to each vertex) comprised of an ordered set of disjoint real line segments (called backbones) connected with chords in the upper half plane with distinct endpoints, so that there are vertices of degree three (or chord endpoints) and vertices of degree two (or marked points) all belonging to the backbones (in effect, ignoring the vertices of degree one arising from backbone endpoints). If there are no marked points, then we call the diagram a (complete) chord diagram. Each partial or complete chord diagram is a spine of an orientable surface with boundary components and therefore has a well-defined topological genus. The genus of a partial chord diagram on backbones and its number of boundary components are related by Euler’s formula .
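As an illustration of how the genus is read off from the boundary, here is a minimal Python sketch for a complete chord diagram on a single backbone. The encoding is an assumption made for this example only: chord endpoints are numbered 0, ..., 2k-1 from left to right, `match[i]` is the partner of endpoint i, and the names `boundary_cycles` and `genus` are hypothetical. The sketch uses Euler's formula in the form 2 - 2g = b - k + n (b backbones, k chords, n boundary cycles), which follows from collapsing each backbone to a point; the letters are our choice of notation here.

```python
def boundary_cycles(match):
    """Boundary cycles of a complete chord diagram on one backbone.

    match[i] is the partner of chord endpoint i, with endpoints numbered
    0..2k-1 from left to right (an encoding assumed for this sketch).
    Backbone interval i ends at endpoint i; interval 2k is the final
    stretch after the last endpoint.
    """
    m = len(match)
    # Walking rightwards along interval i we reach endpoint i, follow the
    # chord to match[i], and continue on the interval just after it.  From
    # the last interval we pass under the backbone back to interval 0.
    step = [match[i] + 1 for i in range(m)] + [0]
    seen, cycles = set(), []
    for start in range(m + 1):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = step[i]
        cycles.append(cycle)
    return cycles


def genus(match):
    """Genus from Euler's formula 2 - 2g = b - k + n with b = 1."""
    k = len(match) // 2
    n = len(boundary_cycles(match))
    return (1 + k - n) // 2


print(genus([1, 0]))        # a single chord
print(genus([2, 3, 0, 1]))  # two crossing chords
```

In this encoding a single chord gives two boundary cycles and genus 0, while two crossing chords give one boundary cycle and genus 1.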
Chord diagrams occur pervasively in mathematics, which further highlights the importance of the counting results obtained here. To mention a few, see the theory of finite type invariants of knots and links  (cf. also ), the representation theory of Lie algebras , the geometry of moduli spaces of flat connections on surfaces [5, 6] and mapping class groups . Moreover and as we shall further explain later, partial and complete chord diagrams each provide a useful model [28, 29, 25, 37] for the combinatorics of interacting RNA molecules with the associated genus filtration of utility in enumerative problems [3, 11, 25, 32, 33, 37, 38] and in folding algorithms on one [35, 10] and two backbones .
Our goal is to enumerate various classes of connected partial and complete chord diagrams, and to this end, we next introduce combinatorial parameters, where each enumerative problem turns out to be solved by an elegant partial differential equation on a suitable generating function in dual variables. In effect, creation and annihilation operators for the combinatorial data are given by multiplication and differentiation operators in the dual variables leading to algebraic differential equations.
We say that a partial chord diagram has
backbone spectrum if the diagram has backbones with precisely vertices (of degree either two or three);
boundary point spectrum if its boundary contains connected components with marked points;
boundary length spectrum if the boundary cycles of the diagram consist of edge-paths of length , where the length of a boundary cycle is the number of chords it traverses counted with multiplicity (as usual on the graph obtained from the diagram by collapsing each backbone to a distinct point) plus the number of backbone undersides it traverses (or in other words, the number of traversed backbone intervals obtained by removing all the chord endpoints from all the backbones).
The data is called the type of a partial chord diagram (cf. Fig. 1). Note that the entries in the data set are not independent. In particular, we have
Let denote the number of distinct connected partial chord diagrams of type taken to be zero if there are no chord diagrams of the specified type. Our two basic models involve boundary point spectra of partial chord diagrams and boundary length spectra of complete chord diagrams, and each basic model, in turn, has both an orientable and a non-orientable incarnation.
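The boundary length spectrum defined above can be computed mechanically for small examples. The following Python sketch does this for a complete chord diagram on one backbone; the list encoding `match` (endpoints numbered left to right, `match[i]` the partner of endpoint i) and the function name `boundary_length_spectrum` are assumptions introduced for illustration. Each step of the walk crosses one chord side, except the single step that passes under the backbone, so the number of steps in a cycle is the length in the sense described above.

```python
from collections import Counter


def boundary_length_spectrum(match):
    """Map each boundary-cycle length to the number of cycles of that length.

    One backbone, complete chord diagram; match[i] is the partner of
    endpoint i (hypothetical encoding used only in this sketch).
    """
    m = len(match)
    # step[i]: interval reached after leaving interval i along the boundary.
    # Index m is the last interval, which wraps under the backbone to 0.
    step = [match[i] + 1 for i in range(m)] + [0]
    seen, spectrum = set(), Counter()
    for start in range(m + 1):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = step[i]
        if length:
            spectrum[length] += 1
    return spectrum


spectrum = boundary_length_spectrum([1, 0, 3, 2])  # two side-by-side chords
print(dict(spectrum))
# Sanity check: the lengths add up to 2k + 1 (two sides per chord, one underside).
print(sum(length * count for length, count in spectrum.items()))
```

The final check reflects the dependence among the entries of the type: on one backbone with k chords the cycle lengths always sum to 2k + 1.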
We may also consider non-orientable chord diagrams. Let denote the number of both orientable and non-orientable connected diagrams on backbones, out of which exactly have vertices, with pairs of vertices connected by (twisted or untwisted) chords, with boundary point and boundary length spectra and respectively, and with Euler characteristic , where denotes twice the genus in the orientable case and the number of cross caps in the non-orientable case. This can evidently be formalized in the language of planar projections of chord diagrams by two-coloring the chords depending upon whether they preserve or reverse the orientation of the plane of projection.
For partial chord diagrams and boundary point spectra, we shall count the subsets
in the orientable case and
in the non-orientable case.
We can equivalently replace each backbone component containing vertices by a polygon with sides (one of which is distinguished, corresponding to the first along the backbone). Thus, the numbers count the orientable genus connected gluings of polygons, among which exactly have sides, with pairs of sides identified in such a way that the boundary of the glued surface has exactly connected components consisting of sides. [Footnote 1: For , a similar problem was addressed in , but the formulas derived there are considerably different from the ones obtained in this paper and do not stand a numerical check.] In particular, the Harer-Zagier numbers  that enumerate closed orientable genus gluings of a -gon coincide with the numbers with and , where we denote by the vector with 1 in the -th place and 0 elsewhere.
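The polygon-gluing description lends itself to a direct brute-force check for small cases. The Python sketch below glues the sides of a 2k-gon in pairs, always with opposite orientations so that the result is orientable, counts the vertices of the glued surface with a union-find structure, and reads off the genus from V - E + F = 2 - 2g with E = k and F = 1. All function names are hypothetical, and this is an exhaustive enumeration for illustration, not the method of the paper.

```python
from collections import Counter


def pairings(items):
    """Yield every way to split the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        for tail in pairings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + tail


def glued_genus(k, pairs):
    """Genus of the closed orientable surface obtained from a 2k-gon."""
    size = 2 * k
    parent = list(range(size))  # union-find over the polygon corners

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        # Side i runs from corner i to corner i+1.  Gluing sides i and j
        # with opposite directions identifies corner i with corner j+1
        # and corner i+1 with corner j.
        for a, b in ((i, (j + 1) % size), ((i + 1) % size, j)):
            parent[find(a)] = find(b)
    vertices = len({find(c) for c in range(size)})
    return (k + 1 - vertices) // 2  # from V - k + 1 = 2 - 2g


def genus_distribution(k):
    """Number of gluings of a 2k-gon, tabulated by genus."""
    return Counter(glued_genus(k, p) for p in pairings(list(range(2 * k))))


for k in range(1, 5):
    print(k, dict(genus_distribution(k)))
```

For k = 4 this tabulates 14 gluings of genus 0, 70 of genus 1 and 21 of genus 2, the familiar Harer-Zagier values; the enumeration grows like (2k-1)!! and is only practical for small k.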
A useful notation for exponentiating a -tuple of variables by an integral -tuple is to write simply ; we extend this notation in case is a fixed infinite sequence of variables and is a finite tuple. In this notation and setting and , we define the orientable, multi-backbone, boundary point spectrum generating function , where
and the non-orientable generating functions , where
(we recall that in the orientable case and in the non-orientable case, while in both cases ).
Theorem 1 (Boundary point spectrum for partial chord diagrams).
Consider the linear differential operators
and the quadratic differential operator
Then the following partial differential equations hold:
These equations, together with the initial condition at given by , which is common to all cases, determine the generating functions uniquely.
Equivalently, each differential equation is solved by exponentiating times the operator on the right hand side applied to , for example,
This explains the relationship between these differential equations and the corresponding enumerative problems. These are the most efficient enumerations of which we are aware. As we shall see in the proof, each term corresponds to adding a certain type of chord: and , respectively, for chords with both endpoints on the same and different boundary components lying in a common component, for chords whose removal separates the diagram, and the analogue of for Möbius bands that give rise to Möbius graphs as compared to fatgraphs in the oriented case (the subscripts 0,1 and 2 by reflect the change in the Euler characteristic of the chord diagram under such an operation).
In the last section of this paper, we provide matrix model formulas for certain linear combinations of the numbers and . This allows us to compare our computations for partial chord diagrams with results on a certain limiting spectral distribution, the so-called large -limit for one backbone. Note that a recursion for the numbers , for of all complete (not necessarily orientable) gluings of a -gon was derived in  using the methods of random matrix theory. Our formulas specialize to those of  in this particular case.
For complete chord diagrams and boundary length spectra, we shall count the subsets
in the orientable case and
in the non-orientable case. We define the orientable, multi-backbone, length spectrum generating function , where
and the non-orientable generating function , where
Theorem 2 (Boundary length spectrum for complete chord diagrams).
Define the linear differential operators
and the quadratic differential operator
Then the following partial differential equations hold:
These equations, together with the initial condition at given by , which is common to all cases, determine the generating functions uniquely.
Complete gluings of a -gon with a marked edge can be enumerated in a similar way. Consider the image of the polygon perimeter, that is, the graph embedded in the glued surface. We say that the embedded graph has the vertex spectrum if there are exactly vertices of degree . Let denote the number of genus orientable gluings of a -gon such that the embedded graph has the vertex spectrum . The generating function
for the numbers satisfies the equation
and is uniquely determined by it together with the initial condition
Actually, and are explicitly related by the formula
(this immediately follows from the fact that both and commute with ). The same problem, but differently formulated (namely, the enumeration of genus fatgraphs on vertices of specified degrees) was recently solved in . However, our generating function (7) for these numbers and the partial differential equation (8) it satisfies are different from their counterparts in .
We learned the following observation from M. Kazarian [17, 18]: for the generating functions and satisfy an infinite system of non-linear partial differential equations called the KP (Kadomtsev-Petviashvili) hierarchy (in particular, this means that the numbers and additionally obey an infinite system of recursions). The KP hierarchy is one of the best studied completely integrable systems in mathematical physics. The first several equations of the hierarchy are:
where the subscript stands for the partial derivative with respect to . The exponential of any solution is called a tau function of the hierarchy. The space of solutions (or the space of tau functions) has a nice geometric interpretation as an infinite-dimensional Grassmannian (called the Sato Grassmannian), see, e.g.,  or  for details. See also  for another application of the Sato Grassmannian to conformal field theory. The space of solutions is homogeneous: there is a Lie algebra (a central extension of ) that acts infinitesimally on the space of solutions, and the action of the corresponding Lie group is transitive.
Introduce the standard bosonic creation-annihilation operators
(the notation stands for the ordered product , where is a permutation such that ). [Footnote 2: The operator is the famous cut-and-join operator  used in the computation of Hurwitz numbers.] All the operators belong to the Lie algebra . Moreover, it is easy to check that
so that and also belong to . Now we notice that the exponentials and of the initial conditions in Theorems 1 and 2 are both KP tau functions for a trivial reason: their logarithms are linear in and therefore obviously satisfy the equations of the KP hierarchy (1) for any values of the other parameters. Moreover, both and preserve the Sato Grassmannian and map KP tau functions to KP tau functions. Thus, and are KP tau functions as well, and we get
Corollary 1 (M. Kazarian ).
The generating functions
satisfy the infinite system of KP equations (1) with respect to for any values of the parameters . Equivalently, the partition functions and are (multi-parameter) families of KP tau functions.
Let us now comment on the relevance of the above results to describing RNA interactions. Define to be the number of complete and connected chord diagrams of genus on ordered and oriented backbones with chords, so in particular, is the Harer-Zagier number . These chord diagrams provide the basic model for a complex of interacting RNA molecules, one RNA molecule for each backbone and one chord for each Watson-Crick bond between nucleic acids, where one demands that the chord endpoints respect the natural ordering of the nucleic acids in each molecule, i.e., in each oriented backbone. This is very natural, as is the attention to connected chord diagrams in order to avoid separate molecular interactions. In reality, RNA folds according to a partial chord diagram, i.e., there are in practice unbonded nucleic acids. [Footnote 3, on Watson-Crick bonds: These are the allowed bonds G-C and A-U between nucleic acids. For the expert, let us emphasize that any other model including wobble G-U or further exotic base pairs is handled in exactly the same way with one chord for each allowed type of bond.] [Footnote 4, on the natural ordering: From the so-called 5’ to 3’ end as determined by the chemical structure of the RNA.] [Footnote 5, on unbonded nucleic acids: Typically, 50 to 80 percent of nucleic acids participate in Watson-Crick base pairs together with several percent exotic. On the other hand in an extreme example, roughly 50 percent are Watson-Crick and 40 percent exotic for ribosomal RNA.]
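For small parameters, the numbers of complete connected chord diagrams on several backbones can be tabulated by exhaustive search, which gives a simple way to sanity-check any closed formula or recursion. The Python sketch below is such a check under stated assumptions: the backbones are ordered, each carries at least one chord endpoint, the endpoints are distributed over the backbones in all possible ways, and the genus is taken from Euler's formula 2 - 2g = b - k + n (b backbones, k chords, n boundary cycles). The function names and the encoding are hypothetical and are not taken from the paper.

```python
from collections import Counter


def pairings(items):
    """Yield every way to split the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        for tail in pairings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + tail


def compositions(total, parts):
    """Ordered ways to write `total` as a sum of `parts` positive integers."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def connected(b, owner, pairs):
    """True if the chords link all b backbones into one component."""
    parent = list(range(b))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for p, q in pairs:
        parent[find(owner[p])] = find(owner[q])
    return len({find(j) for j in range(b)}) == 1


def boundary_count(sizes, starts, owner, pairs):
    """Number of boundary cycles; intervals are labelled (backbone, position)."""
    match = {}
    for p, q in pairs:
        match[p], match[q] = q, p

    def step(j, t):
        if t == sizes[j]:            # last interval: pass under backbone j
            return (j, 0)
        q = match[starts[j] + t]     # follow the chord at the endpoint reached
        return (owner[q], q - starts[owner[q]] + 1)

    seen, cycles = set(), 0
    for j in range(len(sizes)):
        for t in range(sizes[j] + 1):
            if (j, t) in seen:
                continue
            cycles += 1
            cur = (j, t)
            while cur not in seen:
                seen.add(cur)
                cur = step(*cur)
    return cycles


def count_by_genus(b, k):
    """Connected complete chord diagrams on b backbones with k chords, by genus."""
    counts = Counter()
    for sizes in compositions(2 * k, b):
        starts = [sum(sizes[:j]) for j in range(b)]
        owner = [j for j, s in enumerate(sizes) for _ in range(s)]
        for pairs in pairings(list(range(2 * k))):
            if connected(b, owner, pairs):
                n = boundary_count(sizes, starts, owner, pairs)
                counts[(2 - b + k - n) // 2] += 1
    return counts


print(dict(count_by_genus(1, 3)))  # one backbone: Harer-Zagier values
print(dict(count_by_genus(2, 2)))  # two backbones, two chords
```

On one backbone the sketch reproduces the Harer-Zagier values, which is the consistency one expects from the definition above; the multi-backbone output should be read as the result of this particular encoding rather than as a quoted table.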
Recall from  that a shape is a special connected and complete chord diagram which has no parallel chords, has a unique “rainbow” on each backbone, i.e., a chord whose endpoints are closer to the backbone endpoints than any other chord and no “1-chords” connecting vertices consecutive in a single backbone unless the 1-chord is a rainbow. In the very special (genus zero on one backbone) case, the single-chord diagram is permitted since the 1-chord is a rainbow, but in all other cases, there are no 1-chords, each backbone has a unique rainbow, and , . If a shape is not the special single-chord diagram and we remove its rainbows, then the resulting diagram has . Conversely, in a chord diagram with , no backbone has a rainbow, and rainbows can be added to produce a shape. Let denote the number of shapes of genus on backbones with chords.
Define the generating functions , with
and , with
It follows by construction that
and , so we have computed here both the complete chord diagrams and the shapes. [Footnote 6: Furthermore, the free energy for the matrix model in  is given (up to a constant depending only on times ) by our .] In fact , the generating functions for shapes and chord diagrams are algebraically related by
where is the Catalan generating function, the former equation expressing the formal power series in terms of the polynomial . As a further interesting open problem, inspired by the results of this paper, we ask if there is a non-zero finite order differential operator in the variables which together with an initial condition determines ?
One point about shapes is that standard combinatorial techniques allow their “inflation” to complete chord diagrams as indicated in the previous formulas, and furthermore, complete chord diagrams can likewise be inflated to partial chord diagrams, cf. [7, 34]. Another point is that shape inflation is well-suited to the accepted Ansatz for free energy and so provides efficient polynomial-time algorithms for computing minimum free energy RNA folds [35, 34] at least in the planar case. A further geometric point  is that shapes of genus on backbones are dual to cells in the Harer-Mumford-Strebel  or Penner  decomposition of Riemann’s moduli space of genus surfaces with boundary components provided .
As was already discussed, it is really partial chord diagrams that describe complexes of RNA molecules, with their distillation first to complete chord diagrams and then to shapes. All three formulations of the combinatorics have thus been treated here, namely, shapes and complete chord diagrams by the previous formulas and partial chord diagrams by inflation or instead directly with our generating function in Theorem 1.
This paper is organized as follows. Section 2 contains basic combinatorial results on the boundary point spectra of chord diagrams on one backbone and derives the equation given before on (Proposition 2), and Section 3 extends these results to include possibly separating edges and derives the equation given before on (Proposition 4). Boundary point spectra of non-orientable surfaces are discussed in Section 4, and the equations given before on and are derived (Proposition 5), so together Propositions 2-5 comprise Theorem 1. Section 5 is dedicated to boundary length spectra, and the situation is similar to boundary point spectra in that each counts data for each fatgraph boundary cycle. For this reason, the arguments are only sketched for boundary length spectra, culminating in the equations from before on , , and (Theorem 2). Section 6 introduces a random matrix technique for partial chord diagrams and provides a matrix integral for boundary point spectra computations in both the orientable and non-orientable cases. Free probability techniques permit the computation of the large-N limit, which reproduces computations based on the partial differential equations, providing a consistency check on the entire discussion.
2. Combinatorics of connected partial chord diagrams
As before, denotes the sequence with 1 in the -th place and 0 elsewhere. We say simply that a diagram is of type if it is of type for some and let if there are no diagrams of type .
The numbers enumerating one backbone chord diagrams of type obey the following recursion relation:
Let us start with a chord diagram of type . Note that when we erase a chord in a diagram, we keep its endpoints as marked points. This yields two possibilities.
The first possibility is that the chord belongs to two distinct boundary components, say, one with and the other with marked points. After erasing the chord, these two boundary components join into one component with marked points, and the genus of the diagram does not change (see Fig. 2). Thus, one gets a diagram of genus with chords, marked points and boundary point spectrum .
The second possibility is that one boundary component traverses the chord twice, i.e., once in each direction. Erasing this chord splits the boundary component (say, with marked points) into two (with and marked points respectively, ) (see Fig. 3). In this case, one gets a chord diagram of genus with chords, marked points and boundary point spectrum .
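The two possibilities just described can be checked exhaustively on small diagrams. The Python sketch below erases each chord of every complete one-backbone chord diagram with k chords, keeps the two endpoints as marked points, and records the change in the number of boundary cycles: -1 when two components merge (genus unchanged) and +1 when one component splits (genus drops by one, by Euler's formula). The encoding and the names `count_boundaries` and `erase_effects` are assumptions made for this illustration.

```python
from collections import Counter


def pairings(items):
    """Yield every way to split the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        for tail in pairings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + tail


def count_boundaries(num_points, pairs):
    """Boundary cycles of a partial chord diagram on one backbone.

    Points 0..num_points-1 lie on the backbone from left to right; points
    that occur in no pair are marked points and are simply walked past.
    """
    match = {}
    for p, q in pairs:
        match[p], match[q] = q, p
    # From interval i we reach point i; a chord endpoint sends us to the
    # interval after its partner, a marked point to the next interval.
    step = [match.get(i, i) + 1 for i in range(num_points)] + [0]
    seen, cycles = set(), 0
    for start in range(num_points + 1):
        if start in seen:
            continue
        cycles += 1
        i = start
        while i not in seen:
            seen.add(i)
            i = step[i]
    return cycles


def erase_effects(k):
    """Tabulate the change in boundary-cycle count over all chord erasures."""
    effects = Counter()
    for pairs in pairings(list(range(2 * k))):
        before = count_boundaries(2 * k, pairs)
        for chord in pairs:
            remaining = [c for c in pairs if c != chord]
            effects[count_boundaries(2 * k, remaining) - before] += 1
    return effects


print(dict(erase_effects(2)))
print(dict(erase_effects(3)))
```

Only the values -1 and +1 occur, matching the dichotomy used in the proof; for k = 2 the four merges come from the two planar diagrams and the two splits from the crossing one.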
In order to prove (12), let us compute the number of chord diagrams of type with one marked chord in two different ways. On the one hand, there are possibilities to mark a chord in a diagram with chords, so the number in question is . On the other hand, one can join any two marked points with a marked chord on any diagram with chords. We have described above all types of diagrams with chords that could potentially give a -chord diagram of the required type after adding a chord.
If one takes a diagram of type (let us assume that ), then there are possibilities to choose a boundary component with marked points. One then needs to connect two marked points on it with a chord in such a way that it splits into two boundary components with and marked points respectively. This can be done in different ways. If , then there are ways to split the boundary component into two components with marked points each. For we get the same diagrams as in the case , hence we get the first term on the r.h.s. of (12).
If one takes a diagram of type (let us assume that ), then there are ways to choose a boundary component with marked points, provided . If , then and so and the number of ways is then . There are ways to choose a boundary component with marked points if . If , then the number of ways is . One then needs to connect with a chord a marked point on one boundary component with a marked point on the other one. This can be done in different ways. If , then there are ways to choose a pair of boundary components with marked points, provided . If we have and also , then and the number of ways is . In both cases, there are ways to connect with a chord two points on different components. This gives us the second term on the r.h.s. of (12). ∎
The one backbone generating function is uniquely determined by the equation
together with the initial condition
Equivalently, we have
It is straightforward to check that the equation is equivalent to formula (12). Moreover, every chord diagram of type can be obtained from the unique diagram of type by adding chords to it. On the level of , this amounts to applying the operator to precisely times and taking the coefficient of the monomial in which is equal to by formula (12). ∎
3. The multibackbone case
Let us proceed with the multibackbone case.
The numbers obey the following recursion relation: