Associative and commutative tree representations for Boolean functions
Since the 1990s, several authors have studied a probability distribution on the set of Boolean functions on $n$ variables induced by some probability distributions on formulas built upon the connectors $\wedge$ and $\vee$ and the literals $\{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n\}$. These formulas rely on plane binary labelled trees, known as Catalan trees. We extend all the results, in particular the relation between the probability and the complexity of a Boolean function, to other models of formulas: non-binary or non-plane labelled trees (i.e. Pólya trees). This includes the natural tree class where associativity and commutativity of the connectors $\wedge$ and $\vee$ are realised.
Consider the set of Boolean functions on a set of $n$ variables $\{x_1, \dots, x_n\}$.
There are $2^{2^n}$ such Boolean functions: a value from $\{0, 1\}$ can be assigned to every variable, which gives $2^n$ different assignments, and for every assignment the function has output $0$ or $1$. In this paper we consider And/Or trees, i.e. trees where internal nodes carry labels from the set $\{\wedge, \vee\}$ and external nodes (leaves) have labels from the set $\{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n\}$. Obviously, every such tree represents a function from $\{0,1\}^n$ to $\{0,1\}$.
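Since every tree computes one of the $2^{2^n}$ functions, a convenient way to experiment with these objects is to store a Boolean function as its truth table, an integer with $2^n$ bits. The following Python sketch evaluates an And/Or tree into such an integer. The tuple encoding of trees and the name `truth_table` are our own illustrative choices, not notation from the paper; nodes may have any out-degree, so the same code also covers the associative models below.

```python
from functools import reduce

# Hypothetical tuple encoding of an And/Or tree (not taken from the paper):
#   ("lit", i, negated)    leaf labelled x_i (negated=False) or its negation
#   ("and", c1, c2, ...)   internal node labelled AND
#   ("or", c1, c2, ...)    internal node labelled OR

def truth_table(tree, n):
    """Function computed by `tree`, as an integer with 2^n bits.

    Bit a of the result is the output on assignment a, where bit i of the
    integer a is the value given to x_i (variables are numbered from 0)."""
    full = (1 << (1 << n)) - 1                  # the constant function True
    if tree[0] == "lit":
        _, i, negated = tree
        mask = sum(1 << a for a in range(1 << n) if (a >> i) & 1)
        return full ^ mask if negated else mask
    tables = [truth_table(child, n) for child in tree[1:]]
    if tree[0] == "and":
        return reduce(lambda u, v: u & v, tables, full)
    if tree[0] == "or":
        return reduce(lambda u, v: u | v, tables, 0)
    raise ValueError(f"unknown label {tree[0]!r}")

# x_0 OR (NOT x_0) is the constant True on n = 2 variables.
taut = ("or", ("lit", 0, False), ("lit", 0, True))
print(truth_table(taut, 2) == (1 << 4) - 1)   # True
```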
We consider the uniform distribution on the set of And/Or trees of size $m$ (denoting by size the number of leaves) and are interested in the limiting probability $P_n(f)$ of a given function $f$ being computed by a random tree of size $m$, as $m$ tends to infinity, if it exists.
A lot of work has been going on in this field. Lefmann and Savický were the first to prove the existence of the limiting probability $P_n(f)$. A survey including various numerical results was given by Chauvin et al. and by Gardy. A similar study on implication trees, i.e. Boolean trees where internal nodes carry implication labels ($\Rightarrow$), has been done by Fournier et al.
We base our paper on a recent work by Kozik on pattern languages. In his paper, Kozik proves a strong relation between the limiting probability $P_n(f)$ of a given function $f$ and its complexity (that is, the minimal size of a tree computing $f$), asymptotically as the number $n$ of variables tends to infinity. We want to study the impact of removing, step by step, the restrictions on the trees, that is, considering first plane but non-binary, or non-plane but binary And/Or trees, and later non-plane and non-binary trees. Considering such tree structures seems quite natural, as the new characteristics correspond to associativity (non-binary) and commutativity (non-plane), which hold for the operators $\wedge$ and $\vee$ at the level of Boolean logic. Gardy presented some of these models and gave some numerical values of the probability distributions on the set of Boolean functions. In this paper we use a recent method based on binary plane trees to obtain results on the probability distribution on the set of Boolean functions when $n$ is large.
Kozik has shown that the asymptotic order of $P_n(f)$ depends on the complexity of $f$ for binary plane trees. First, we compare the limiting probabilities of the constant function True in the different models. Supported by numerical results for $n$ equal to 1 and 2, we conjectured that commutativity does not matter. Surprisingly to us, we find that both characteristics have an impact on the limiting probability $P_n(\mathrm{True})$. To be more precise, even if the order of $P_n(\mathrm{True})$ when $n$ tends to infinity is the same in all models, the asymptotic leading coefficient differs from model to model. To get more insight, we then compare the probabilities of functions of complexity $1$, namely the literals $x_i$ and $\bar{x}_i$.
Finally, we prove that for all tree models compared, the asymptotically relevant fraction of trees computing a given function $f$ is given by the set of minimal trees of $f$ expanded once in a given way, and we give bounds for the arising probability distribution. A similar result is proved by Kozik for plane binary And/Or trees and in a paper by Fournier et al. for implication trees.
2 Associative and commutative trees: definitions, generating functions.
Kozik has shown that in binary plane trees the order of magnitude of the limiting probability of a given Boolean function is related to its complexity. We generalise this result and therefore define the complexity of a function as follows:
An And/Or tree is a labelled tree, where each internal node is labelled with one of the connectors $\wedge$, $\vee$ and each leaf with one of the literals $x_1, \bar{x}_1, \dots, x_n, \bar{x}_n$. We define the size of an And/Or tree to be its number of leaves.
The complexity $L(f)$ of a non-constant function $f$ (i.e. $f \notin \{\mathrm{True}, \mathrm{False}\}$) is given by the size of a smallest And/Or tree computing $f$ (in the rest of the paper such trees will be called minimal for $f$), while we define the complexity of True and False to be $0$.
As will become clear later, the complexity of a function does not depend on the chosen tree model.
We are considering a set $\mathcal{T}_m$ of And/Or trees of size $m$. Let $\mu_m$ be the uniform distribution on $\mathcal{T}_m$, and $P_{n,m}$ its image on the set of Boolean functions. We call
$$P_n = \lim_{m \to \infty} P_{n,m}$$
the limiting distribution.
Remark: In all models we will take into consideration, the probability of a function $f$ is equal to that of its negation $\bar{f}$. Indeed, a tree $t$ computing $f$ can be relabelled in the following way: each connector is substituted by the other one and each literal by its negation ($x_i \mapsto \bar{x}_i$ and $\bar{x}_i \mapsto x_i$). The new tree we obtain belongs to the same model as $t$ and computes the function $\bar{f}$.
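The relabelling in this remark is De Morgan's law applied at every node. A minimal sketch, using the same hypothetical tuple encoding as before and helper names of our own, that builds the relabelled tree and checks on all assignments that it computes the negation:

```python
from itertools import product

# Hypothetical encoding: ("lit", i, negated) or (label, child, child, ...)
# with label in {"and", "or"}; any out-degree >= 2 is allowed.

def evaluate(tree, assignment):
    """Value (0 or 1) of `tree` when x_i takes the value assignment[i]."""
    if tree[0] == "lit":
        _, i, negated = tree
        return 1 - assignment[i] if negated else assignment[i]
    values = [evaluate(child, assignment) for child in tree[1:]]
    return int(all(values)) if tree[0] == "and" else int(any(values))

def dual(tree):
    """Relabelling of the remark: swap AND/OR and negate every literal."""
    if tree[0] == "lit":
        _, i, negated = tree
        return ("lit", i, not negated)
    swapped = "or" if tree[0] == "and" else "and"
    return (swapped,) + tuple(dual(child) for child in tree[1:])

def dual_computes_negation(tree, n):
    """Check, on all 2^n assignments, that dual(tree) computes NOT f."""
    other = dual(tree)
    return all(evaluate(other, a) == 1 - evaluate(tree, a)
               for a in product((0, 1), repeat=n))

t = ("and", ("lit", 0, False), ("or", ("lit", 1, True), ("lit", 0, True)))
print(dual_computes_negation(t, 2))   # True
```

Because `dual` keeps the shape of the tree and swaps the labels everywhere, it preserves planarity, arity and the alternation of labels between father and son, which is why the relabelled tree stays in the same model.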
First, we present the result proven by Kozik. This result will be generalised in the forthcoming parts of the paper.
2.1 The classical model.
First, let us consider the set of binary plane trees whose internal nodes are labelled with $\wedge$ or $\vee$, and whose external nodes are labelled with literals chosen in $\{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n\}$: each such tree computes a Boolean function on $n$ variables.
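We denote by $T(z)$ the generating function enumerating this set of trees.
Binary And/Or trees fulfil the symbolic equation
$$\mathcal{T} = \mathcal{Z} + \mathcal{T} \wedge \mathcal{T} + \mathcal{T} \vee \mathcal{T}, \qquad (1)$$
where $\mathcal{Z}$ is a leaf, labelled by one of the $2n$ literals. Thus the generating function verifies $T(z) = 2nz + 2T(z)^2$, and therefore we have:
$$T(z) = \frac{1 - \sqrt{1 - 16nz}}{4},$$
and the singularity of $T$ is $\rho = \frac{1}{16n}$.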
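The equation $T(z) = 2nz + 2T(z)^2$ translates into a recurrence on coefficients, which can be checked against the direct count $2^{m-1}(2n)^m C_{m-1}$ (the Catalan number $C_{m-1}$ of binary plane shapes with $m$ leaves, $2$ labels per internal node, $2n$ per leaf). The sketch below is only a sanity check of the formulas above; the function names are ours.

```python
from math import comb

def plane_binary_counts(n, max_size):
    """t[m] = [z^m] T(z): binary plane And/Or trees with m leaves on n variables,
    from T(z) = 2nz + 2 T(z)^2 (an AND-root or an OR-root over two subtrees)."""
    t = [0] * (max_size + 1)
    t[1] = 2 * n                                   # a single leaf: 2n literals
    for m in range(2, max_size + 1):
        t[m] = 2 * sum(t[i] * t[m - i] for i in range(1, m))
    return t

def closed_form(n, m):
    """2^(m-1) labellings of the m-1 internal nodes, (2n)^m of the leaves,
    times the Catalan number C_{m-1} of binary plane shapes."""
    catalan = comb(2 * (m - 1), m - 1) // m
    return 2 ** (m - 1) * (2 * n) ** m * catalan

t = plane_binary_counts(2, 400)
print(t[400] / t[399] / (16 * 2))   # close to 1, i.e. t[m+1]/t[m] -> 16n
```

The ratio of consecutive coefficients tends to $1/\rho = 16n$, but only slowly (with a relative error of order $1/m$), as expected from a square-root singularity.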
Let us consider the uniform distribution on the set of trees of size $m$ and the probability distribution $P_{n,m}$ it induces on the set of Boolean functions on $n$ variables. The limit of this distribution when $m$ tends to infinity, denoted by $P_n$, has already been studied, in particular by Lefmann and Savický, Chauvin et al. and Kozik, who has shown the following theorem.
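For small $n$ the finite-size distribution $P_{n,m}$ can be computed exactly by dynamic programming over truth tables: a tree of size $m$ computing $f$ is a root label together with two subtrees of sizes $i$ and $m-i$ whose functions combine to $f$. The following sketch (our own code, with functions encoded as $2^n$-bit truth tables) does this for binary plane trees; it only yields $P_{n,m}$ for finite $m$, not the limit $P_n$.

```python
from fractions import Fraction

def function_counts(n, max_size):
    """counts[m][f] = number of binary plane And/Or trees of size m computing f,
    where f is a truth table stored as a 2^n-bit integer."""
    width, full = 1 << n, (1 << (1 << n)) - 1
    counts = [{} for _ in range(max_size + 1)]
    for i in range(n):
        x = sum(1 << a for a in range(width) if (a >> i) & 1)   # the literal x_i
        for lit in (x, full ^ x):                              # x_i and its negation
            counts[1][lit] = counts[1].get(lit, 0) + 1
    for m in range(2, max_size + 1):
        for i in range(1, m):                     # sizes of the two subtrees
            for g, cg in counts[i].items():
                for h, ch in counts[m - i].items():
                    for f in (g & h, g | h):      # AND-root, then OR-root
                        counts[m][f] = counts[m].get(f, 0) + cg * ch
    return counts

def finite_size_probability(counts, m, f):
    """P_{n,m}(f) as an exact fraction."""
    return Fraction(counts[m].get(f, 0), sum(counts[m].values()))

c = function_counts(1, 40)
print(finite_size_probability(c, 40, 3))   # P_{1,40}(True); True has table 0b11
```

The symmetry between a function and its negation noted in the remark above is visible in the counts, e.g. `c[m][3] == c[m][0]` for $n = 1$.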
[8, Kozik] Let $f$ be a Boolean function. Then
where $L(f)$ is the complexity of $f$, i.e. the size of a minimal tree computing $f$, and the constant involved depends on $f$ and will be specified later in this paper.
A variable $x_i$ is essential for a function $f$ if there exists an assignment $\mathbf{a}$ of $0$ or $1$ to the variables $x_j$, $j \ne i$, such that $f(\mathbf{a}, x_i = 0) \ne f(\mathbf{a}, x_i = 1)$.
Remark: An essential variable of $f$ appears in every tree representation of $f$.
Remark: Note that in this theorem, $f$ (and thus $L(f)$) is fixed, and $n$ tends to infinity. The set of essential variables of $f$ is finite (and does not depend on $n$).
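On a truth table, the definition of an essential variable can be checked directly: look for an assignment where flipping $x_i$ flips the output. A short sketch, with a truth-table encoding of our own choosing (bit $a$ of the integer is $f(a)$, and bit $i$ of $a$ is the value of $x_i$):

```python
def essential_variables(table, n):
    """Indices i of the essential variables of f, given its truth table as a
    2^n-bit integer (bit a = f(a), bit i of a = value of x_i)."""
    essential = []
    for i in range(n):
        for a in range(1 << n):
            if (a >> i) & 1:
                continue                   # consider each pair {a, a + 2^i} once
            b = a | (1 << i)               # the same assignment with x_i = 1
            if ((table >> a) & 1) != ((table >> b) & 1):
                essential.append(i)
                break
    return essential

print(essential_variables(0b0110, 2))   # x_0 XOR x_1: [0, 1]
print(essential_variables(0b1111, 2))   # the constant True: []
```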
First of all, let us define associative trees, commutative trees, and then associative and commutative trees, as well as the induced distributions on the set of Boolean functions on $n$ variables.
2.2 The associative plane model.
An associative tree is a plane tree where each node has out-degree chosen in $\{0, 2, 3, 4, \dots\}$. A labelled associative tree is an associative tree in which each external node has a label in $\{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n\}$ and each internal node has an $\wedge$-label or an $\vee$-label but cannot have the same label as its father. We denote by $\mathcal{A}$ the family of associative trees and by $\mathcal{A}_m$ the set of such trees of size $m$.
Hence these trees are stratified: the root can be labelled either by $\wedge$ or by $\vee$, and its label determines the labels of all other internal nodes.
We denote by $P^{\mathcal{A}}_n$ the limiting distribution of Boolean functions induced by associative And/Or trees. Our aim is to compare the limiting distributions $P_n$ and $P^{\mathcal{A}}_n$.
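The generating function enumerating associative trees is given by $A(z) = A_\wedge(z) + A_\vee(z) - 2nz$, where $A_\wedge$ (resp. $A_\vee$) is the generating function of associative trees rooted at an $\wedge$-node (resp. an $\vee$-node) or reduced to a single leaf. Note that $A_\wedge = A_\vee$ and
$$A_\wedge(z) = 2nz + \frac{A_\vee(z)^2}{1 - A_\vee(z)}, \qquad \text{hence} \qquad A_\wedge(z) = \frac{1 + 2nz - \sqrt{1 - 12nz + 4n^2z^2}}{4}, \qquad (2)$$
and its dominant singularity is
$$\rho_{\mathcal{A}} = \frac{3 - 2\sqrt{2}}{2n}.$$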
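The associative equation can be sanity-checked numerically. The sketch below (our own code) computes the coefficients of $A_\wedge$ directly from the sequence construction $A_\wedge = 2nz + A_\vee^2/(1 - A_\vee)$ and provides the singularity $\rho_{\mathcal{A}}$; the accompanying checks compare the coefficients with the quadratic equation $2B^2 - (1 + 2nz)B + 2nz = 0$ satisfied by $B = A_\wedge$, and the coefficient ratios with $1/\rho_{\mathcal{A}}$.

```python
from math import sqrt

def associative_root_counts(n, max_size):
    """b[m] = [z^m] A_and(z): associative And/Or trees of size m whose root is an
    AND-node or a single leaf, from A_and = 2nz + A_or^2 / (1 - A_or) and the
    symmetry A_and = A_or (one coefficient list plays both roles)."""
    b = [0] * (max_size + 1)
    q = [0] * (max_size + 1)   # q[m] = [z^m] B/(1 - B): sequences of >= 1 subtrees
    b[1] = q[1] = 2 * n
    for m in range(2, max_size + 1):
        # sequences of >= 2 subtrees: a first subtree, then a sequence of >= 1 more
        longer = sum(b[i] * q[m - i] for i in range(1, m))
        b[m] = longer              # an internal root has at least two children
        q[m] = b[m] + longer       # sequences of length 1 plus longer sequences
    return b

def associative_totals(n, max_size):
    """[z^m] A(z) with A = A_and + A_or - 2nz (single leaves counted once)."""
    b = associative_root_counts(n, max_size)
    return [0] + [2 * b[m] - (2 * n if m == 1 else 0) for m in range(1, max_size + 1)]

def associative_singularity(n):
    """rho_A = (3 - 2 sqrt 2) / (2n), the smallest root of 1 - 12nz + 4n^2 z^2."""
    return (3 - 2 * sqrt(2)) / (2 * n)

b = associative_root_counts(1, 300)
print(b[300] / b[299] * associative_singularity(1))   # close to 1
```

For instance `associative_root_counts(n, 3)[3]` equals $3(2n)^3$: one $\wedge$-root over three literals, or over a literal and an $\vee$-rooted pair, in either order.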
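Remark: Thanks to the Drmota-Lalley-Woods theorem (well presented in [4, Chapter 8]), we can show that $P^{\mathcal{A}}_{n,m}$ indeed has a limit when $m$ tends to infinity. We denote by $A^\wedge_f$ (resp. $A^\vee_f$) the generating function enumerating associative trees computing $f$ whose roots are labelled by $\wedge$ (resp. $\vee$) or by a literal. These generating functions satisfy the following system, for every Boolean function $f$:
$$A^\wedge_f(z) = z\,\mathbb{1}_{[f \text{ is a literal}]} + \sum_{k \ge 2}\ \sum_{g_1 \wedge \dots \wedge g_k = f}\ \prod_{j=1}^{k} A^\vee_{g_j}(z), \qquad A^\vee_f(z) = z\,\mathbb{1}_{[f \text{ is a literal}]} + \sum_{k \ge 2}\ \sum_{g_1 \vee \dots \vee g_k = f}\ \prod_{j=1}^{k} A^\wedge_{g_j}(z),$$
where the inner sums run over ordered $k$-tuples $(g_1, \dots, g_k)$ of Boolean functions.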
The Drmota-Lalley-Woods theorem says, roughly speaking, that generating functions satisfying such a system of functional equations share a common dominant singularity, of square-root type. By transfer theorems this implies a similar behaviour of their coefficients and eventually the existence of a limiting distribution. For a similar system of functional equations it was shown in [5, Section 3] that all assumptions of the Drmota-Lalley-Woods theorem indeed hold.
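The system above can be turned into exact counts for small $n$. Rather than enumerating all $k \ge 2$ explicitly, the sketch below uses a rearrangement of our own (not taken from the paper) that peels off the first child: an $\wedge$-rooted tree is its first child ($\vee$-rooted or a leaf) combined by $\wedge$ with the remaining children, and these remaining children form either a single child ($\vee$-rooted or a leaf) or, if there are several, an $\wedge$-rooted tree, i.e. an arbitrary tree. Functions are again stored as truth tables.

```python
def associative_function_counts(n, max_size):
    """and_[m][f], or_[m][f]: associative And/Or trees of size m computing f whose
    root is an AND-node (resp. an OR-node) or a single leaf; every[m][f]: all such
    trees. Functions f are truth tables stored as 2^n-bit integers."""
    width, full = 1 << n, (1 << (1 << n)) - 1
    and_ = [{} for _ in range(max_size + 1)]
    or_ = [{} for _ in range(max_size + 1)]
    every = [{} for _ in range(max_size + 1)]
    for i in range(n):
        x = sum(1 << a for a in range(width) if (a >> i) & 1)
        for lit in (x, full ^ x):                       # x_i and its negation
            for table in (and_[1], or_[1], every[1]):
                table[lit] = table.get(lit, 0) + 1
    for m in range(2, max_size + 1):
        for i in range(1, m):
            # AND-root = first child (OR-rooted or a leaf) AND the remaining
            # children; those form one child (OR-rooted or a leaf) or, if there
            # are several, an AND-rooted tree: together, an arbitrary tree of
            # size m - i. Dually for OR-roots.
            for g, cg in or_[i].items():
                for h, ch in every[m - i].items():
                    and_[m][g & h] = and_[m].get(g & h, 0) + cg * ch
            for g, cg in and_[i].items():
                for h, ch in every[m - i].items():
                    or_[m][g | h] = or_[m].get(g | h, 0) + cg * ch
        for table in (and_[m], or_[m]):
            for f, c in table.items():
                every[m][f] = every[m].get(f, 0) + c
    return and_, or_, every
```

Summing `and_[m]` over all functions recovers the coefficients of $A_\wedge$, and the negation symmetry of the earlier remark appears as `and_[m][f] == or_[m][full ^ f]`.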
2.3 The commutative binary model.
A labelled commutative tree on $n$ variables is a non-plane binary tree where every internal node is labelled with one of the connectors $\wedge$, $\vee$ and every leaf is labelled by a literal from $\{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n\}$. We denote this family of trees by $\mathcal{C}$.
We consider the distribution $P^{\mathcal{C}}_{n,m}$ induced over the set of Boolean functions of $n$ variables by the uniform distribution over such trees of size $m$.
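Binary commutative trees fulfil the same symbolic equation as in the plane case (cf. (1)), but because of commutativity, the generating function $C(z)$ of all commutative trees on $n$ variables, counting leaves, is given implicitly by
$$C(z) = 2nz + C(z)^2 + C(z^2),$$
where the term $C(z^2)$ tracks a possible symmetry if both subtrees of the root are identical. See Gardy for details on this model of expressions and Pólya and Read for more general ideas. The system of equations for the generating functions $C_f$ of trees computing a given Boolean function $f$ is given by
$$C_f(z) = z\,\mathbb{1}_{[f \text{ is a literal}]} + \frac{1}{2}\sum_{g \wedge h = f} C_g(z)\,C_h(z) + \frac{1}{2}\sum_{g \vee h = f} C_g(z)\,C_h(z) + C_f(z^2),$$
where the sums run over ordered pairs $(g, h)$ of Boolean functions.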
We can prove that all assumptions of the Drmota-Lalley-Woods theorem hold; hence all the $C_f$ and $C$ have the same singularity $\rho_{\mathcal{C}}$, and therefore $P^{\mathcal{C}}_{n,m}$ converges to a limiting probability distribution $P^{\mathcal{C}}_n$ when $m$ tends to infinity.
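Because the Pólya term $C(z^2)$ is easy to get wrong, here is a small check (our own code, hypothetical tuple encoding of trees): the recurrence derived from $C(z) = 2nz + C(z)^2 + C(z^2)$ is compared with a brute-force count of binary plane trees identified up to reordering of the children.

```python
def commutative_counts(n, max_size):
    """c[m] = [z^m] C(z) with C(z) = 2nz + C(z)^2 + C(z^2).

    For each of the two root labels the number of unordered pairs of subtrees
    is (C(z)^2 + C(z^2)) / 2; the two labels together give C^2 + C(z^2)."""
    c = [0] * (max_size + 1)
    c[1] = 2 * n
    for m in range(2, max_size + 1):
        c[m] = sum(c[i] * c[m - i] for i in range(1, m))
        if m % 2 == 0:
            c[m] += c[m // 2]          # coefficient of z^m in C(z^2)
    return c

def plane_trees(n, m):
    """All binary plane And/Or trees with m leaves, as nested tuples."""
    if m == 1:
        return [("lit", i, neg) for i in range(n) for neg in (False, True)]
    trees = []
    for i in range(1, m):
        lefts, rights = plane_trees(n, i), plane_trees(n, m - i)
        trees += [(op, l, r) for l in lefts for r in rights for op in ("and", "or")]
    return trees

def canonical(tree):
    """Sort the children, so that trees equal up to commutativity coincide."""
    if tree[0] == "lit":
        return tree
    kids = sorted((canonical(k) for k in tree[1:]), key=repr)
    return (tree[0],) + tuple(kids)

def commutative_by_brute_force(n, m):
    return len({canonical(t) for t in plane_trees(n, m)})

print(commutative_counts(1, 5))   # [0, 2, 6, 24, 138, 840]
```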
2.4 The commutative associative model.
Finally we define general labelled trees as commutative and associative trees, with internal nodes labelled by $\wedge$ or $\vee$ (with the condition that father and sons cannot have the same label), and external nodes labelled by literals chosen in $\{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n\}$. We denote this family of trees by $\mathcal{G}$.
As in the other models, we consider the distribution $P^{\mathcal{G}}_{n,m}$ induced over the set of Boolean functions by the uniform distribution over such trees of size $m$.
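Let $G(z)$ be the generating function of general trees, and $G_\wedge(z)$ (resp. $G_\vee(z)$) the generating function of general trees that are rooted by $\wedge$ (resp. by $\vee$) or are a leaf. We have $G(z) = G_\wedge(z) + G_\vee(z) - 2nz$, $G_\wedge = G_\vee$, and
$$G_\wedge(z) = 2nz + \exp\Big(\sum_{k \ge 1} \frac{G_\vee(z^k)}{k}\Big) - 1 - G_\vee(z),$$
the last two terms removing the empty multiset and the multisets with a single element, since every internal node has at least two children.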
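For general trees the multiset construction leads to the exponential of a Pólya sum. A sketch of how the coefficients of $G_\wedge = G_\vee$ can be computed with the Euler-transform recurrence for multisets (our own code; it relies on the symmetry $G_\wedge = G_\vee$ and exact rational arithmetic):

```python
from fractions import Fraction

def general_root_counts(n, max_size):
    """h[m] = [z^m] G_and(z): commutative associative And/Or trees of size m
    whose root is an AND-node or a single leaf.

    With H = G_and = G_or the equation reads H = 2nz + M(z) - 1 - H, where
    M(z) = exp(sum_k H(z^k)/k) counts multisets of subtrees; its coefficients
    satisfy N * m_N = sum_{k=1..N} (sum_{d | k} d * h_d) * m_{N-k}."""
    h = [Fraction(0)] * (max_size + 1)
    mset = [Fraction(1)] + [Fraction(0)] * max_size     # coefficients of M(z)
    for N in range(1, max_size + 1):
        rest = Fraction(0)
        for k in range(1, N + 1):
            # divisors d of k, leaving out d = N (that term contributes h_N itself)
            weight = sum((d * h[d] for d in range(1, min(k, N - 1) + 1) if k % d == 0),
                         Fraction(0))
            rest += weight * mset[N - k]
        rest /= N
        h[N] = (2 * n if N == 1 else 0) + rest   # solves 2 h_N = [N = 1] 2n + rest + h_N
        mset[N] = rest + h[N]
    return [int(x) for x in h]                  # all values are integers

def general_totals(n, max_size):
    """[z^m] G(z) with G = G_and + G_or - 2nz."""
    h = general_root_counts(n, max_size)
    return [0] + [2 * h[m] - (2 * n if m == 1 else 0) for m in range(1, max_size + 1)]
```

For instance, with $n = 2$ one gets $h_2 = 10$ (an $\wedge$-root over an unordered pair of the $4$ literals) and $h_3 = 60$ (an unordered triple of literals, or a literal next to an $\vee$-rooted pair), so that $G$ agrees with $C$ on size $2$, as it must since both models coincide there.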
Moreover, the generating functions $G^\wedge_f$ and $G^\vee_f$ of general trees computing $f$ satisfy the following system:
Thus, we can check the hypotheses of the Drmota-Lalley-Woods theorem and conclude that the limiting distribution $P^{\mathcal{G}}_n$ of $P^{\mathcal{G}}_{n,m}$, when $m$ tends to infinity, exists, and moreover that all the $G^\wedge_f$, $G^\vee_f$ and $G$ have the same singularity, denoted by $\rho_{\mathcal{G}}$.
In the next parts of the paper, we will show that Theorem 2.5 still holds in the associative or commutative cases.
First, we show in Section 3 that the limiting ratio of tautologies is of order $1/n$, and we compute explicitly the limit of $n$ times this ratio when $n$ tends to infinity for the different models. If these limits were the same, we could not conclude anything, but in fact they are all different, which permits us to conclude that asymptotically, when $n$ tends to infinity, the probability distributions induced by the various models are all different. In Section 4, we extend our results to the limiting probabilities of functions which are literals. In all models, the asymptotic ratio is of order $1/n^2$ when $n$ tends to infinity, but the limiting ratios are different from one model to the other. Finally, we generalise Theorem 2.5 in Section 5.
3 Limiting ratio of tautologies.
In this section we compute the limiting probability of the constant function True. We recall that trees computing the function True are called tautologies.
In a tree, if the path from the root to a leaf crosses only $\vee$-nodes, then this path is called an $\vee$-only-path. We extend the definition to the case where the leaf is equal to the root (i.e. the tree has size 1).
As suggested by Kozik's results, the limiting probability of tautologies reduces to the limiting probability of so-called simple tautologies, defined as follows:
A simple tautology realised by $x_i$, $i \in \{1, \dots, n\}$, is a Boolean expression which has the shape $x_i \vee \bar{x}_i \vee \phi$ for some Boolean function $\phi$, i.e. there exist a leaf labelled by $x_i$ and a leaf labelled by $\bar{x}_i$, both connected to the root by an $\vee$-only-path (cf. Figure 2). A simple tautology is a simple tautology realised by some variable $x_i$. We denote by $S_m$ the number of simple tautologies of size $m$ (on $n$ variables; $n$ is omitted for simplicity), and by $S(z) = \sum_m S_m z^m$ their generating function.
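The definition is easy to test on concrete trees. A minimal sketch (hypothetical tuple encoding and helper names of our own) that collects the literals joined to the root by an $\vee$-only-path and reads off the variables realising a simple tautology; the second example shows that a tautology need not be simple.

```python
from itertools import product

# Hypothetical encoding: ("lit", i, negated) or (label, child, child, ...)
# with label in {"and", "or"}.

def or_reachable_literals(tree):
    """Literals (i, negated) whose leaf is joined to the root by an OR-only path."""
    if tree[0] == "lit":
        return {(tree[1], bool(tree[2]))}   # a single leaf is its own root
    if tree[0] == "or":
        return set().union(*(or_reachable_literals(c) for c in tree[1:]))
    return set()                            # an AND-node blocks every path

def realising_variables(tree):
    """Variables x_i that realise `tree` as a simple tautology."""
    lits = or_reachable_literals(tree)
    return {i for (i, negated) in lits if not negated and (i, True) in lits}

def evaluate(tree, assignment):
    if tree[0] == "lit":
        return assignment[tree[1]] != bool(tree[2])   # XOR with the negation flag
    values = [evaluate(c, assignment) for c in tree[1:]]
    return all(values) if tree[0] == "and" else any(values)

def is_tautology(tree, n):
    return all(evaluate(tree, a) for a in product((False, True), repeat=n))

simple = ("or", ("lit", 0, False), ("and", ("lit", 1, False), ("lit", 2, False)),
          ("lit", 0, True))
not_simple = ("and", ("or", ("lit", 0, False), ("lit", 0, True)),
              ("or", ("lit", 1, False), ("lit", 1, True)))
print(realising_variables(simple), is_tautology(simple, 3))          # {0} True
print(realising_variables(not_simple), is_tautology(not_simple, 2))  # set() True
```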
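Let $I \subseteq \{1, \dots, n\}$ be a set of variables and $\mathcal{S}_I$ be the set of simple tautologies realised by every $x_i$, $i \in I$, but not by any other variable $x_j$, $j \notin I$.
$\mathcal{S}^{(1)} = \bigcup_{|I| = 1} \mathcal{S}_I$ is the set of simple tautologies that are realised by exactly one variable;
$\mathcal{S}^{(2)} = \bigcup_{|I| = 2} \mathcal{S}_I$ is the set of simple tautologies that are realised by exactly two different variables;
$\mathcal{S}^{(k)} = \bigcup_{|I| = k} \mathcal{S}_I$ is the set of simple tautologies that are realised by exactly $k$ different variables.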
We denote by the generating function of simple tautologies realised by . Let . Obviously, , because some tautologies are counted several times in . We get .
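To calculate limiting probabilities, we use the singular expansions of the considered generating functions around their dominant singularities. Consider the generating function $T(z)$ of a given family of And/Or trees together with the generating function $F(z)$ of a subset of such trees.
We assume that $T$ and $F$ have the same dominant singularity $\rho$ and a square root singular expansion
$$T(z) = a_T - b_T\sqrt{1 - z/\rho} + O(1 - z/\rho), \qquad F(z) = a_F - b_F\sqrt{1 - z/\rho} + O(1 - z/\rho)$$
around $\rho$, with $b_T \ne 0$. Then
$$\lim_{m \to \infty} \frac{[z^m]F(z)}{[z^m]T(z)} = \lim_{z \to \rho} \frac{F'(z)}{T'(z)} = \frac{b_F}{b_T}.$$
We call this number, when it exists, the limiting ratio of the set counted by $F$.
If $m$ tends to infinity, transfer lemmas give
$$[z^m]F(z) = \frac{b_F}{2\sqrt{\pi}}\,\rho^{-m} m^{-3/2}\big(1 + o(1)\big), \qquad [z^m]T(z) = \frac{b_T}{2\sqrt{\pi}}\,\rho^{-m} m^{-3/2}\big(1 + o(1)\big).$$
Differentiating the singular expansions gives
$$\frac{F'(z)}{T'(z)} = \frac{b_F + O\big(\sqrt{1 - z/\rho}\big)}{b_T + O\big(\sqrt{1 - z/\rho}\big)} \longrightarrow \frac{b_F}{b_T} \qquad (z \to \rho).$$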
Hence the result follows. ∎
Remark: If $F(z)$ is the generating function of the set of trees computing a given function $f$, then the limiting probability of $f$ is equal to the limiting ratio of $F$, because for all $m$, $P_{n,m}(f) = [z^m]F(z) / [z^m]T(z)$.
3.1 Binary plane trees.
In the binary plane model, Kozik has shown that asymptotically, when $n$ tends to infinity, almost all tautologies are simple tautologies. Therefore, to estimate the probability that a binary plane tree computes the function True, it suffices to count simple tautologies and, thanks to the following proposition, only those simple tautologies that are realised by exactly one variable (i.e. the set $\mathcal{S}^{(1)}$).
If $n$ tends to infinity, then
The proof of the proposition is deferred to the end of this section since further technical concepts are required.
The limiting ratio of simple tautologies, and thus the limiting ratio of tautologies in the binary plane model, is
where $T_m$ is the total number of plane binary trees of size $m$ and $S_m$ is the number of simple tautologies of size $m$ labelled with $n$ variables.
Let us compute the generating function of simple tautologies. First, let $E(z)$ be the generating function of trees containing a leaf labelled by $x_i$ which is connected to the root by an $\vee$-only-path (cf. Figure 3), and $E^c(z)$ the generating function of trees which are not of such shape. Hence $E(z) + E^c(z) = T(z)$.
The function $E^c$ is given by:
$$E^c(z) = (2n-1)z + T(z)^2 + E^c(z)^2.$$
This equation is obtained by decomposing the tree at its root: if the root is labelled by an $\wedge$, the tree is not of the shape depicted in Figure 3 and both subtrees are arbitrary trees. If the root is labelled by an $\vee$, neither of the two subtrees may have the shape of Figure 3. If the root is a single leaf, it must not be labelled by $x_i$. By a symbolic argument, the three cases translate to the three terms in the equation. Solving this equation, using the explicit expression of $T$ given by Proposition 2.4 (in particular $2T(z)^2 = T(z) - 2nz$), we get:
$$E^c(z) = \frac{1 - \sqrt{1 - 4(n-1)z - 2T(z)}}{2}.$$
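The equation for $E^c$ and its solution can be checked against a brute-force enumeration. In the sketch below (our own code, hypothetical tuple encoding of trees) the distinguished variable $x_i$ is `x_0`; the coefficients from the recurrence are compared with an exhaustive count for small sizes and with the closed form evaluated inside the disk of convergence.

```python
from math import sqrt

def complement_counts(n, max_size):
    """e[m] = [z^m] E^c(z) for E^c = (2n-1)z + T^2 + (E^c)^2, where
    T = 2nz + 2T^2 counts all binary plane And/Or trees."""
    t = [0] * (max_size + 1)
    e = [0] * (max_size + 1)
    t[1], e[1] = 2 * n, 2 * n - 1
    for m in range(2, max_size + 1):
        t[m] = 2 * sum(t[i] * t[m - i] for i in range(1, m))
        e[m] = sum(t[i] * t[m - i] + e[i] * e[m - i] for i in range(1, m))
    return e

def complement_closed_form(n, z):
    """E^c(z) = (1 - sqrt(1 - 4(n-1)z - 2T(z))) / 2, for 0 <= z < 1/(16n)."""
    T = (1 - sqrt(1 - 16 * n * z)) / 4
    return (1 - sqrt(1 - 4 * (n - 1) * z - 2 * T)) / 2

def plane_trees(n, m):
    """All binary plane And/Or trees with m leaves, as nested tuples."""
    if m == 1:
        return [("lit", i, neg) for i in range(n) for neg in (False, True)]
    trees = []
    for i in range(1, m):
        lefts, rights = plane_trees(n, i), plane_trees(n, m - i)
        trees += [(op, l, r) for l in lefts for r in rights for op in ("and", "or")]
    return trees

def x0_on_or_path(tree):
    """Does a leaf labelled x_0 reach the root through OR-nodes only?"""
    if tree[0] == "lit":
        return tree[1] == 0 and not tree[2]
    return tree[0] == "or" and any(x0_on_or_path(c) for c in tree[1:])
```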
Let $D(z)$ be the generating function of trees of the form $t_1 \vee t_2$ (or $t_2 \vee t_1$), where $t_1$ contains a leaf labelled by $x_i$ and $t_2$ a leaf labelled by $\bar{x}_i$, each connected to the root of its subtree by an $\vee$-only-path; these are the simple tautologies realised by $x_i$ where $x_i$ and $\bar{x}_i$ lie in different subtrees of the root (cf. Figure 4).
Obviously, every tree counted by $D$ is a simple tautology realised by $x_i$. Recall that $S_i(z)$ is the generating function of simple tautologies realised by the variable $x_i$, and let $S_i^c(z)$ be the generating function of trees that are not simple tautologies realised by $x_i$. Again by decomposing and analysing the label of the root, we get:
In particular, if the root is labelled by an $\vee$, neither of the two subtrees can be a simple tautology realised by $x_i$ and, additionally, the whole tree cannot be of the shape depicted in Figure 4. Solving this equation, we obtain an explicit expression for $S_i^c$, and $S_i = T - S_i^c$ yields an expression for $S_i$:
We now go back to Proposition 3.5. In the following, we define pattern languages and some related vocabulary, which can be found in Kozik's paper for the binary case. Interpreting a given And/Or tree as an element of a pattern language, which is possible if patterns and trees have a similar structure, will lead us to the proof of Proposition 3.5.
A pattern language $\mathcal{P}$ is a set of plane trees with internal nodes labelled by $\wedge$ or $\vee$, and with two kinds of external nodes: placeholders and pattern leaves. We define $P(z, y)$ as the generating function of $\mathcal{P}$, with $z$ marking the pattern leaves and $y$ marking the placeholders.
Given a pattern language $\mathcal{P}$, we denote by $\mathcal{P}^{\mathrm{lab}}$ the set of plane labelled trees with internal nodes labelled by $\wedge$ or $\vee$, and external nodes labelled by literals or placeholders, such that if we replace every literal by a pattern leaf, we obtain a tree of $\mathcal{P}$. Therefore, $P(2nz, y)$ is the generating function of $\mathcal{P}^{\mathrm{lab}}$.
Given a set of trees $\mathcal{T}$, we define $\mathcal{P}[\mathcal{T}]$ (resp. $\mathcal{P}^{\mathrm{lab}}[\mathcal{T}]$) as the set of trees obtained by taking an element of $\mathcal{P}$ (resp. $\mathcal{P}^{\mathrm{lab}}$) and plugging an element of $\mathcal{T}$ into each placeholder.
Given two pattern languages $\mathcal{P}$ and $\mathcal{Q}$, we define the composition $\mathcal{P}[\mathcal{Q}]$ of $\mathcal{P}$ and $\mathcal{Q}$ as the pattern language obtained by plugging $\mathcal{Q}$-patterns into the placeholders of the structures of $\mathcal{P}$. The pattern leaves of $\mathcal{P}[\mathcal{Q}]$ are then both the pattern leaves of $\mathcal{P}$ and those of $\mathcal{Q}$.
A pattern language $\mathcal{P}$ is unambiguous if for every family $\mathcal{T}$, every element of $\mathcal{P}[\mathcal{T}]$ can be constructed in only one way.
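A pattern language $\mathcal{P}$ is subcritical for a family $\mathcal{T}$ if the generating function $T(z)$ of $\mathcal{T}$ has a square root singularity $\rho$ and if $P(z, y)$ is analytic in some set $\{(z, y) : |z| < \rho + \varepsilon,\ |y| < T(\rho) + \varepsilon\}$ with $\varepsilon > 0$.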
In the following, we call a variable essential for a tree if and only if it is essential for the function computed by this tree (cf. Definition 2.6).
If $t$ is an element of $\mathcal{P}^{\mathrm{lab}}[\mathcal{T}]$, then $t$ has $k$ $\mathcal{P}$-repetitions if $k$ equals the difference between the number of its $\mathcal{P}$-pattern leaves and the number of distinct variables (and not literals) that appear in its $\mathcal{P}$-pattern leaves.
Further, $t$ has $k$ $\mathcal{P}$-restrictions if $k$ equals the number of its $\mathcal{P}$-repetitions plus the number of essential variables of $t$ that appear at least once in its $\mathcal{P}$-pattern leaves.
For an example of repetitions and restrictions, see Figure 5.
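Both quantities only depend on the literals sitting on the pattern leaves and on the set of essential variables. A direct transcription of the two definitions (our own helper, with literals given as hypothetical `(variable, negated)` pairs):

```python
def repetitions_and_restrictions(pattern_leaf_literals, essential_variables):
    """Counts for one labelled tree, following the two definitions above.

    pattern_leaf_literals: the literals on its pattern leaves, as (variable,
    negated) pairs; essential_variables: the essential variables of the tree."""
    variables = {v for (v, _negated) in pattern_leaf_literals}  # variables, not literals
    repetitions = len(pattern_leaf_literals) - len(variables)
    restrictions = repetitions + len(variables & set(essential_variables))
    return repetitions, restrictions

# x_1, NOT x_1 and x_2 on the pattern leaves, x_2 essential: one repetition
# (x_1 is used twice, once negated) and two restrictions.
print(repetitions_and_restrictions([(1, False), (1, True), (2, False)], {2}))  # (1, 2)
```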
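[8, Kozik] Let $\mathcal{P}$ be a binary unambiguous pattern language which is subcritical for $\mathcal{T}$. We denote by $R^{(k)}_m$ (resp. by $R^{(\ge k)}_m$) the number of elements of $\mathcal{P}^{\mathrm{lab}}[\mathcal{T}]$ of size $m$ which have $k$ (resp. at least $k$) $\mathcal{P}$-restrictions, and by $N_m$ the number of elements of $\mathcal{P}^{\mathrm{lab}}[\mathcal{T}]$ of size $m$. Then,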
when tends to infinity, and is a constant.
Due to this theorem, we can now prove Proposition 3.5.
3.2 Associative plane trees.
To compute the asymptotic behaviour of $P^{\mathcal{A}}_n(\mathrm{True})$ when $n$ tends to infinity, we define simple tautologies and prove that asymptotically almost every tautology is a simple tautology. To this end, we generalise Theorem 3.10 to associative trees.
The limiting probability of the function True in the associative model, $P^{\mathcal{A}}_n(\mathrm{True})$, is given by
Generalisation of Kozik’s theorem to associative trees.
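Let $\mathcal{P}$ be an unambiguous pattern language with out-degrees different from $1$, which is subcritical for $\mathcal{T}$. We denote by $R^{(k)}_m$ (resp. by $R^{(\ge k)}_m$) the number of elements of $\mathcal{P}^{\mathrm{lab}}[\mathcal{T}]$ of size $m$ which have $k$ (resp. at least $k$) $\mathcal{P}$-restrictions, and by $N_m$ the number of elements of $\mathcal{P}^{\mathrm{lab}}[\mathcal{T}]$ of size $m$. Then,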
when tends to infinity, and is a constant.
The proof of the generalisation works analogously to that of Theorem 3.10; still, we state the main ideas, as they will be useful in the following.
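Let $\mathcal{T}$ be the family of associative trees with leaves unlabelled, and let $t \in \mathcal{P}[\mathcal{T}]$ be of size $m$ with $p$ $\mathcal{P}$-pattern leaves. Further, we fix the set of essential variables and denote by $e$ the cardinality of this set. For any $0 \le i \le k$, the number of different leaf-labellings of $t$ which give $i$ $\mathcal{P}$-repetitions and $k$ $\mathcal{P}$-restrictions is:
$$\left\{ {p \atop p-i} \right\} \binom{e}{k-i} \frac{(p-i)!}{(p-k)!} \frac{(n-e)!}{(n-e-p+k)!}\; n^{m-p}\; 2^m,$$
where $\left\{ {p \atop p-i} \right\}$ are the Stirling numbers of the second kind. The factors count, in this order:
$\left\{ {p \atop p-i} \right\}$: the number of partitions of the $p$ $\mathcal{P}$-pattern leaves into $p-i$ classes (leaves in the same class will be labelled by the same variable),
$\binom{e}{k-i}$: the number of different choices for the $k-i$ essential variables that appear in the $\mathcal{P}$-pattern leaves,
$\frac{(p-i)!}{(p-k)!}$: the number of different assignments of these essential variables to the classes of the first term,
$\frac{(n-e)!}{(n-e-p+k)!}$: the number of assignments of non-essential variables to the remaining $p-k$ classes of the $\mathcal{P}$-pattern leaves,
$n^{m-p}$: the number of assignments of variables to the leaves that are not $\mathcal{P}$-pattern leaves,
$2^m$: the number of ways to distribute the negations.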
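The product formula can be checked by brute force on the pattern leaves alone, i.e. leaving out the factors $n^{m-p}$ and $2^m$, which do not interact with repetitions and restrictions. In the sketch below (our own code) the first $e$ variables play the role of the essential ones; `labellings` implements the first four factors and `brute_force` enumerates all $n^p$ assignments of variables to the pattern leaves.

```python
from itertools import product
from math import comb

def stirling2(p, c):
    """Stirling number of the second kind: partitions of p items into c classes."""
    table = [[0] * (c + 1) for _ in range(p + 1)]
    table[0][0] = 1
    for a in range(1, p + 1):
        for b in range(1, c + 1):
            table[a][b] = b * table[a - 1][b] + table[a - 1][b - 1]
    return table[p][c]

def falling(x, r):
    """Falling factorial x (x - 1) ... (x - r + 1); it vanishes when 0 <= x < r."""
    out = 1
    for j in range(r):
        out *= x - j
    return out

def labellings(n, e, p, i, k):
    """Labellings of p pattern leaves by variables with i repetitions and k
    restrictions when e of the n variables are essential (the factors
    n^(m-p) and 2^m of the formula above are left out)."""
    if not 0 <= i <= k <= p:
        return 0
    return (stirling2(p, p - i) * comb(e, k - i)
            * falling(p - i, k - i) * falling(n - e, p - k))

def brute_force(n, e, p, i, k):
    """Same count by enumerating all n^p labellings; variables 0..e-1 are the
    essential ones."""
    total = 0
    for labels in product(range(n), repeat=p):
        used = set(labels)
        repetitions = p - len(used)
        restrictions = repetitions + sum(1 for v in used if v < e)
        total += repetitions == i and restrictions == k
    return total
```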
The following proposition is immediate:
Given an associative tree $t$ with leaves unlabelled, the number of leaf-labellings of $t$ which make it have $k$ $\mathcal{P}$-restrictions is:
where is a polynomial in .
In Kozik's paper the following proposition is proved for binary trees and patterns (cf. [8, Lemma 2.7]), but in fact the proof does not rely on binarity and hence the proposition holds for patterns and trees of arbitrary degree.
Let $\mathcal{T}$ be a set of trees whose generating function $T(z)$ has a unique dominant singularity $\rho$ in the closed disk $\{|z| \le \rho\}$, of square root type. Let $\mathcal{P}$ be an unambiguous pattern language, subcritical for $\mathcal{T}$. Let $T_{m,p}$ denote the number of trees from $\mathcal{P}[\mathcal{T}]$ of size $m$ with exactly $p$ pattern leaves. Finally, let $Q$ be a non-zero polynomial of degree $d$. Then,
for some non-negative real .
Moreover, if has non-negative values and is positive at some point , and if contains a pattern with non pattern leaves and at least one placeholder, then .
Thanks to those propositions, we can now prove Theorem 3.12 for associative trees:
Proof of Theorem 3.12.
Let $\mathcal{P}$ be an associative pattern language and $\mathcal{T}$ the family of trees from $\mathcal{A}$ with leaves unlabelled. We have, thanks to Proposition 3.13:
and this implies:
Thanks to Proposition 3.14, we get:
when tends to infinity. Moreover, we can check that is positive. A lower bound can be proven analogously, the proof for the binary case is given in . It follows that
when tends to infinity. Moreover, we can see that:
the theorem is proven. ∎
In the associative model, asymptotically when $n$ tends to infinity, almost all tautologies are simple tautologies.
The proof is very similar to the proof of the binary case. First we need to introduce patterns:
where the notation means that we start with either an $\wedge$-node or an $\vee$-node, use the corresponding partial pattern, and then use both partial patterns alternatingly until the process finishes. Then $\mathcal{P}$ is an unambiguous pattern language.
The pattern language $\mathcal{P}$ is subcritical for associative trees.
The generating function of the labelled pattern is given by
where (resp. ) is the generating function of the partial labelled patterns (resp. ). These two generating functions satisfy the following system:
Solving this system, we get
Recall that (cf. (2))
To prove that is the dominant singularity of , it is enough to prove that it is the dominant singularity of and . Actually, and are analytic in . For big enough , , and due to non-negative coefficients the inequality holds for all . Thus the dominant singularity of is and is subcritical for associative trees. ∎
Remark: The $\mathcal{P}$-pattern has an interesting property: if we set all the $\mathcal{P}$-pattern leaves of a tree to False, then the whole tree itself computes False. This can be checked by induction on the size of the tree. If the pattern is only a leaf, it returns False. If the root is an $\vee$-node, then all subtrees of the root are patterns returning False by the induction hypothesis. If the root is an $\wedge$-node, the leftmost subtree is a pattern returning False by the induction hypothesis. Thus the whole tree computes False in all cases. This property is the key point of the following proof.
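The induction of this remark can also be tested mechanically. In the sketch below (our own code) placeholders are filled with arbitrary subtrees, summarised only by their Boolean value under the assignment, and pattern leaves are set to False; the roles of the connectors ($\vee$-nodes with all their children in the pattern, $\wedge$-nodes with only their leftmost child in it) follow our reading of the remark.

```python
import random

# Hypothetical encoding for this sketch:
#   "P"              a pattern leaf
#   ("sub", value)   a placeholder filled with an arbitrary subtree, summarised
#                    by its Boolean value under the current assignment
#   ("or", c1, ...)  an OR-node: every child belongs to the pattern
#   ("and", c1, ...) an AND-node: only the leftmost child belongs to the pattern

def random_pattern(rng, depth, parent=None):
    """Random pattern with alternating labels and out-degrees 2 to 4."""
    if depth == 0 or rng.random() < 0.3:
        return "P"
    label = {"and": "or", "or": "and"}.get(parent) or rng.choice(("and", "or"))
    k = rng.randint(2, 4)
    if label == "or":
        return ("or",) + tuple(random_pattern(rng, depth - 1, "or") for _ in range(k))
    others = tuple(("sub", rng.random() < 0.5) for _ in range(k - 1))
    return ("and", random_pattern(rng, depth - 1, "and")) + others

def value_when_pattern_leaves_false(tree):
    """Evaluate the tree after giving every pattern leaf the value False."""
    if tree == "P":
        return False
    if tree[0] == "sub":
        return tree[1]
    values = [value_when_pattern_leaves_false(child) for child in tree[1:]]
    return all(values) if tree[0] == "and" else any(values)

rng = random.Random(0)
print(any(value_when_pattern_leaves_false(random_pattern(rng, 6)) for _ in range(1000)))  # False
```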
Remark: The pattern is a generalisation of the pattern , defined in  to handle the proof in the binary plane case. Note that , and we can find the unique element from which corresponds to a tree by starting at the root of and finding the pattern leaves by traversing the tree top-to-bottom.
Proof of Proposition 3.15.
Let us consider a tautology with exactly one $\mathcal{P}$-restriction (cf. Definition 3.7). This restriction has to be a repetition, since the constant function True has no essential variable.
If the repetition is of the kind $(x_i, x_i)$, i.e. the same literal occurs twice, then we can choose an assignment giving the value False to all the $\mathcal{P}$-pattern leaves, and with this assignment the whole tree computes False, which is impossible for a tautology.
Thus the repetition has to be an repetition. Let us first assume that the repetition does not appear among the -pattern leaves. Thus we can assign all those leaves to , and then the whole term computes , which is impossible. Hence, the repetition must occur in the -pattern leaves. Let us assume that there is a node labelled by on one of the paths from the leaves labelled by and to the root of the tree. Then, the subtree rooted at has shape with . Let us assume that (or ) appears in . Then, we can assign all the -pattern leaves of the other subtrees