
# A stronger null hypothesis for crossing dependencies

R. Ferrer-i-Cancho (E-mail: rferrericancho@cs.upc.edu)
###### Abstract

The syntactic structure of a sentence can be modeled as a tree where vertices are words and edges indicate syntactic dependencies between words. It is well-known that those edges normally do not cross when drawn over the sentence. Here a new null hypothesis for the number of edge crossings of a sentence is presented. That null hypothesis takes into account the length of the pair of edges that may cross and predicts the relative number of crossings in random trees with a small error, suggesting that a ban of crossings or a principle of minimization of crossings are not needed in general to explain the origins of non-crossing dependencies. Our work paves the way for more powerful null hypotheses to investigate the origins of non-crossing dependencies in nature.

## Introduction

The syntactic structure of a sentence can be defined as a network where vertices are words and edges indicate syntactic dependencies [1, 2] as in Fig. 1. The most common assumption is that this structure is a tree (an acyclic connected graph) (e.g., [1, 3]). In the 1960s, a striking pattern of the syntactic dependency trees of sentences was reported: dependencies between words normally do not cross when drawn over the sentence [4, 5] (e.g., Fig. 1). C, the number of different pairs of edges that cross, is small in real sentences. In Fig. 1, C differs between sentence (a) and sentence (b). Interestingly, the tree structure of both sentences is the same but C varies, showing that C depends on the linear arrangement of the vertices.

Imagine that π(v) is defined as the position of the vertex v in a linear arrangement of n vertices (the 1st vertex has position 1, the second vertex has position 2 and so on) and thus 1 ≤ π(v) ≤ n. u∼v is used to refer to an edge formed by the vertices u and v. The length of the edge u∼v in words is d(u∼v) = |π(u) − π(v)| (here |...| is the absolute value operator). s(u∼v) and e(u∼v) are defined, respectively, as the initial and the end position of the edge u∼v, i.e. s(u∼v) = min(π(u), π(v)) and e(u∼v) = max(π(u), π(v)). Two edges u1∼v1 and u2∼v2 that do not share vertices cross if and only if one of the following conditions is met:

• s(u1∼v1) < s(u2∼v2) and s(u2∼v2) < e(u1∼v1) and e(u1∼v1) < e(u2∼v2), or

• s(u2∼v2) < s(u1∼v1) and s(u1∼v1) < e(u2∼v2) and e(u2∼v2) < e(u1∼v1).
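The crossing condition above can be checked mechanically. The sketch below is an illustration, not the author's code; for brevity, vertices are identified with their positions in the linear arrangement, so an edge is just a pair of positions.

```python
# A minimal sketch of the crossing test (identifiers are illustrative).

def edge_span(edge):
    """Initial position s and end position e of an edge."""
    u, v = edge
    return min(u, v), max(u, v)

def edges_cross(edge1, edge2):
    """True iff the two vertex-disjoint edges cross."""
    s1, e1 = edge_span(edge1)
    s2, e2 = edge_span(edge2)
    # The two conditions in the text: exactly one endpoint of one edge
    # lies strictly between the endpoints of the other.
    return (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)

print(edges_cross((1, 3), (2, 4)))  # True: 1 < 2 < 3 < 4
print(edges_cross((1, 4), (2, 3)))  # False: edge 2~3 is nested inside 1~4
```

Note that nesting (one edge entirely inside the other) and disjoint spans both fail the test, as they should.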

It has been hypothesized that C ≈ 0 in real sentences [1, 6] could be due to a principle of minimization of the length of edges [7, 8, 9, 10]. Although the minimization of

 D = ∑_{u∼v} d(u∼v)

reduces crossings to practically zero [7], this does not provide a full explanation of the low frequency of crossings in real sentences: (a) minimum D does not imply C = 0 [11], (b) the actual value of D in real sentences is located between the minimum and that of a random ordering of vertices [12] and (c) the word order that minimizes D might be in serious conflict with other linguistic or cognitive constraints [13]. Here the problem of the reduction of D that is required for explaining C ≈ 0 in real sentences is avoided by means of a null hypothesis that predicts C by considering the actual length of the edges that may cross. With this null hypothesis, one can shed light on a fundamental question: how surprising is it that C ≈ 0 given the lengths of the edges? That null hypothesis is vital for the development of a general but minimal theory of crossing dependencies in nature. First, C ≈ 0 in sentences might also be due to a ban of crossings by grammar [2] or a principle of minimization of C [8]. Second, crossings have also been investigated in networks of nucleotides [14]. Here it will be shown that a simple null hypothesis based on actual dependency lengths would suffice a priori for predicting C in short enough sentences.
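To make the quantities D and C concrete, here is a minimal sketch (not the author's code; vertices are again identified with their positions, an assumption made for brevity):

```python
from itertools import combinations

def total_length(edges):
    """D: the sum of the lengths of all edges."""
    return sum(abs(u - v) for u, v in edges)

def number_of_crossings(edges):
    """C: the number of pairs of vertex-disjoint edges that cross."""
    def cross(e1, e2):
        s1, t1 = sorted(e1)
        s2, t2 = sorted(e2)
        return (s1 < s2 < t1 < t2) or (s2 < s1 < t2 < t1)
    return sum(1 for e1, e2 in combinations(edges, 2)
               if not set(e1) & set(e2) and cross(e1, e2))

# A 4-vertex example tree: the path 1-3-4-2 drawn over the positions.
edges = [(1, 3), (3, 4), (2, 4)]
print(total_length(edges))         # D = 2 + 1 + 2 = 5
print(number_of_crossings(edges))  # C = 1: edges 1~3 and 2~4 cross
```

The example also illustrates the dependence of both D and C on the linear arrangement: relabeling the positions changes both quantities while the tree stays the same.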

## Crossing theory

### The expected number of crossings

C(u∼v) is defined as the number of edge crossings where the edge formed by u and v is involved. C can be defined as

 C = (1/2) ∑_{u∼v} C(u∼v),

where the factor 1/2 is due to the fact that if two edges u1∼v1 and u2∼v2 cross, their crossing will be counted twice, once through C(u1∼v1) and once through C(u2∼v2). C(u∼v) can be defined as

 C(u1∼v1) = ∑_{u2∼v2, {u1,v1}∩{u2,v2}=∅} C(u1∼v1, u2∼v2),

where C(u1∼v1, u2∼v2) indicates if u1∼v1 and u2∼v2 define a couple of edges that cross, i.e. C(u1∼v1, u2∼v2) = 1 if they cross and C(u1∼v1, u2∼v2) = 0 otherwise. Applying the definition of C(u∼v) above, C becomes

 C = (1/2) ∑_{u1∼v1} ∑_{u2∼v2, {u1,v1}∩{u2,v2}=∅} C(u1∼v1, u2∼v2).

Suppose that the vertices are arranged linearly at random (all the permutations of the vertex sequence being equally likely). Then, the expectation of C is

 E_0[C] = (1/2) ∑_{u1∼v1} ∑_{u2∼v2, {u1,v1}∩{u2,v2}=∅} E_0[C(u1∼v1, u2∼v2)].

As C(u1∼v1, u2∼v2) is an indicator variable, E_0[C(u1∼v1, u2∼v2)] can be replaced by p(cross) = 1/3, the probability that two arbitrary edges that do not share any vertex cross when their vertices are arranged linearly at random, which yields [15]

 E_0[C] = C_max / 3

with

 C_max = (n/2)(n − 1 − ⟨k²⟩)

being the number of edge pairs that can potentially cross and ⟨k²⟩ the degree 2nd moment of the tree [10]. ⟨k²⟩ is the mean of squared degrees, i.e.

 ⟨k²⟩ = (1/n) ∑_v k_v²,

where k_v is the degree of vertex v. In uniformly random labeled trees, the expected ⟨k²⟩ is [16, 17]

 E[⟨k²⟩] = (1 − 1/n)(5 − 6/n).

Thus, the expectation of E_0[C] for those trees is

 E[E_0[C]] = (n/6)(n − 1 − E[⟨k²⟩]) = n²/6 − n + 11/6 − 1/n.

This analytical result is easy to check numerically by generating random linear arrangements of vertices of random trees with the procedure in Fig. 2.
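A sketch of such a numerical check follows. It is an illustration under two assumptions: uniformly random labeled trees are drawn by decoding uniformly random Prüfer sequences (which yields the same uniform distribution over labeled trees as the procedure of Fig. 2, not reproduced here), and the tree size, trial count and seed are arbitrary choices.

```python
import heapq
import random
from itertools import combinations

def prufer_to_tree(seq, n):
    """Decode a Pruefer sequence into the edges of a labeled tree on 1..n."""
    degree = [1] * (n + 1)
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # smallest current leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def crossings(edges, pos):
    """Number of crossing pairs under the linear arrangement pos."""
    c = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue
        s1, e1 = sorted((pos[u1], pos[v1]))
        s2, e2 = sorted((pos[u2], pos[v2]))
        c += (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)
    return c

random.seed(0)
n, trials = 8, 20000
total = 0
for _ in range(trials):
    tree = prufer_to_tree([random.randint(1, n) for _ in range(n - 2)], n)
    order = list(range(1, n + 1))
    random.shuffle(order)               # random linear arrangement
    pos = {v: i + 1 for i, v in enumerate(order)}
    total += crossings(tree, pos)

predicted = n**2 / 6 - n + 11 / 6 - 1 / n  # E[E_0[C]] from the text
print(total / trials, predicted)           # the two should be close
```

For n = 8 the formula gives E[E_0[C]] = 4.375, and the empirical mean over trees and arrangements should agree within Monte Carlo error.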

Here we aim to improve E_0[C] by introducing information about the actual length of the dependencies. Suppose that

 p(u1∼v1 and u2∼v2 cross | d)

is the probability that the edges u1∼v1 and u2∼v2 cross in a random linear arrangement of vertices where edge lengths are given by the function d above. Then E[C|d], the expected number of crossings given full knowledge about edge lengths, can be defined as

 E[C|d] = (1/2) ∑_{u1∼v1} ∑_{u2∼v2, {u1,v1}∩{u2,v2}=∅} p(u1∼v1 and u2∼v2 cross | d).

The calculation of E[C|d] for a given sentence is not straightforward: it requires the calculation of all the permutations of the words of the sentence preserving the edge lengths of the original sentence. Besides, E[C|d] makes a prediction about the crossings of a dependency tree involving a lot of information: the edges of the tree and their lengths. In contrast, E_0[C] can be computed just from knowledge of the degree sequence, or simply the values of n and ⟨k²⟩, as the equations above indicate. Here we aim to predict the number of crossings reducing the computational and informational demands of E[C|d] while beating the predictions of E_0[C].

p(cross | d1, d2) is defined as the probability that two edges that are arranged linearly at random cross knowing that their lengths are d1 and d2 and that they do not share any vertex. Replacing

 p(u1∼v1 and u2∼v2 cross | d)

by p(cross | d(u1∼v1), d(u2∼v2)) in the definition of E[C|d] above, one obtains

 E_2[C] = (1/2) ∑_{u1∼v1} ∑_{u2∼v2, {u1,v1}∩{u2,v2}=∅} p(cross | d(u1∼v1), d(u2∼v2)).

E_2[C] refers to an approximation to the expected value of C knowing the length of the edges in every potential crossing (giving priority to the knowledge about the lengths of the pair of edges that may cross in every potential crossing). E_2[C] is an approximation to E[C|d] that is based on a stronger null hypothesis than that of E_0[C] for the probability that two edges cross. E_0[C] and E[C|d] are true expectations (notice that E_2[C] is not). While E[C|d] conditions globally with the function d, i.e. the same conditioning for every pair of edges that may cross, E_2[C] conditions locally with two edge lengths that depend on the pair of edges under consideration. In the remainder of the article two virtues of E_2[C] over E[C|d] will be shown. First, E_2[C] is easier to calculate. Second, it predicts C with small error in spite of discarding, for every pair of edges that may potentially cross, the lengths of the other edges. The point is: if such a rough but simple predictor of crossings works, is it necessary to believe that crossings are forbidden by grammar [2] or to postulate an independent principle of minimization of C [8]?
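The computation of E_2[C] can be sketched as follows. This is an illustration, not the author's code: it anticipates p(cross | d1, d2), which is defined formally in the next section, by brute-force enumeration of valid pairs of initial positions, and identifies vertices with their positions.

```python
from itertools import combinations

def p_cross_given(n, d1, d2):
    """p(cross | d1, d2) by enumerating valid pairs of initial positions."""
    valid = cross = 0
    for s1 in range(1, n - d1 + 1):
        for s2 in range(1, n - d2 + 1):
            if {s1, s1 + d1} & {s2, s2 + d2}:
                continue  # the two edges would share a vertex
            valid += 1
            cross += (s1 < s2 < s1 + d1 < s2 + d2) or \
                     (s2 < s1 < s2 + d2 < s1 + d1)
    return cross / valid if valid else 0.0

def e2_crossings(n, edges):
    """E_2[C]: sum of p(cross | d1, d2) over vertex-disjoint edge pairs."""
    return sum(p_cross_given(n, abs(u1 - v1), abs(u2 - v2))
               for (u1, v1), (u2, v2) in combinations(edges, 2)
               if not {u1, v1} & {u2, v2})

# Example: the 4-vertex tree with edges 1~3, 3~4, 2~4. Its only
# vertex-disjoint pair is 1~3 and 2~4.
print(e2_crossings(4, [(1, 3), (3, 4), (2, 4)]))
```

Summing over unordered pairs already absorbs the factor 1/2 of the double sum above, since each unordered pair would be counted twice there.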

### The probability that two edges cross knowing their lengths

The set S(n, d) is defined as the set of possible initial positions of an edge of length d in a sequence of n vertices, i.e.

 S(n, d) = {s | 1 ≤ s ≤ n − d}.

We say that s1 and s2 are a valid pair of initial positions if they define the initial positions of two edges that have lengths d1 and d2, respectively, and that do not share vertices, i.e. s1 ∈ S(n, d1), s2 ∈ S(n, d2) and {s1, s1 + d1} ∩ {s2, s2 + d2} = ∅.

p(cross | d1, d2) can be defined as a proportion, i.e.

 p(cross | d1, d2) = |α(d1, d2)| / |β(d1, d2)|,

where |...| is the cardinality operator, α(d1, d2) is the set of valid pairs of initial positions of two edges of lengths d1 and d2 that involve a crossing and β(d1, d2) is simply the set of valid pairs of initial positions of edges of lengths d1 and d2. More formally,

 β(d1, d2) = {s1, s2 | s1 and s2 are a valid pair of initial positions}

and

 α(d1, d2) = {s1, s2 | s1 and s2 are a valid pair of initial positions of two edges that cross}.

The definition of α(d1, d2) is based on an adapted version of the formal definition of crossing in the introduction section (notice that an edge with initial position s and length d has end position s + d). Fig. 3 shows p(cross | d1, d2) for two different numbers of vertices. If β(d1, d2) = ∅ then p(cross | d1, d2) is undefined (notice that then |β(d1, d2)| = 0). If that happens, the reasonable convention that p(cross | d1, d2) = 0 is adopted. The order of the edge length information is irrelevant, i.e. p(cross | d1, d2) = p(cross | d2, d1), as Fig. 3 shows. Some crossings are impossible a priori, i.e. p(cross | 1, d2) = 0, and some others are unavoidable, e.g., p(cross | n − 2, n − 2) = 1 (we are assuming n ≥ 4).

p(cross | d1, d2) and p(cross) are related through

 ∑_{d1=1}^{n−1} ∑_{d2=1}^{n−1} p(cross | d1, d2) p(d1, d2) = p(cross),

where p(d1, d2) is the probability that a random linear arrangement of four different vertices, i.e. u1, v1, u2 and v2, produces d(u1∼v1) = d1 and d(u2∼v2) = d2.
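This relation can be verified by brute force for a small n (n = 7 below, an arbitrary choice; the code is an illustration). p(d1, d2) is estimated by enumerating every placement of the four distinct vertices, while p(cross | d1, d2) is computed independently from its definition; combining them recovers p(cross) = 1/3.

```python
from collections import Counter
from itertools import permutations

n = 7

def p_cross_given(n, d1, d2):
    """p(cross | d1, d2) from the enumeration of valid initial positions."""
    valid = cross = 0
    for s1 in range(1, n - d1 + 1):
        for s2 in range(1, n - d2 + 1):
            if {s1, s1 + d1} & {s2, s2 + d2}:
                continue
            valid += 1
            cross += (s1 < s2 < s1 + d1 < s2 + d2) or \
                     (s2 < s1 < s2 + d2 < s1 + d1)
    return cross / valid if valid else 0.0

# Enumerate every assignment of 4 distinct positions to u1, v1, u2, v2
# and record the resulting pair of lengths (d1, d2).
joint = Counter()
crossing_total = 0
for p1, q1, p2, q2 in permutations(range(1, n + 1), 4):
    joint[(abs(p1 - q1), abs(p2 - q2))] += 1
    s1, e1 = sorted((p1, q1))
    s2, e2 = sorted((p2, q2))
    crossing_total += (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)

total = sum(joint.values())
p_cross_direct = crossing_total / total
p_cross_identity = sum(p_cross_given(n, d1, d2) * c / total
                       for (d1, d2), c in joint.items())
print(p_cross_direct, p_cross_identity)  # both equal 1/3
```

The agreement is exact (up to floating point): given the lengths, the placement of the four vertices is uniform over the valid pairs of initial positions, which is exactly what p(cross | d1, d2) assumes.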

## Results

The relative number of crossings is defined as C̄ = C/C_max and thus 0 ≤ C̄ ≤ 1. Table 1 shows that E_2[C] makes better predictions about the (absolute or relative) number of crossings than E_0[C] for the real syntactic dependency trees in Fig. 1. C̄ and its expected values allow for a fairer comparison of the real number of crossings and its predictions as they measure crossings in units of the potential number of crossings. We wish to investigate if E_2[C̄] might shed light on the small number of crossings of real sentences abstracting away from the details of a concrete language, in the spirit of a long tradition of research on crossing dependencies [20, 21]. Our language-neutral perspective is not based on the analysis of real syntactic dependency trees but on that of uniformly random labeled trees whose vertex labels are distinctive numbers from 1 to n that also represent the positions of the vertices, i.e. π(v) = v. Here we aim to compare the capacity of E_0[C̄] and E_2[C̄] to predict C̄_true, the real relative number of crossings in uniformly random labeled trees, when C̄_true is small, as in real sentences [4, 5]. The relative error of the prediction is defined as

 Δ_x = E_x[C̄] − C̄_true = (E_x[C] − C_true) / C_max.

For every sentence length n considered (C_max > 0 requires n ≥ 4 [10]), an ensemble of uniformly random labeled trees with small C̄_true was generated (a) following the procedure in Fig. 2 and (b) rejecting random trees yielding a larger C̄_true till the desired ensemble size was reached. For every relevant value of C̄_true, the mean error was calculated over all configurations where C_max > 0 (C_max = 0 is only achieved by star trees [10]). The maximum sentence length considered was limited by the explosion of rejections as n increases. The space of possible trees is huge (there are n^(n−2) labeled trees of n vertices [22]) and trees with small C̄_true have a number of crossings that is unexpectedly low for that class of random trees (recall the expression for E[E_0[C]] above). These considerations notwithstanding, the range of lengths considered covers the average length of English sentences (about 17.8 words [23, pp. 37-55]) and that of other languages [12].
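The ensemble construction can be sketched as follows. The acceptance threshold and ensemble size below are hypothetical placeholders (the exact values used in the study are not given here), and uniformly random labeled trees are drawn via Prüfer sequences rather than the procedure of Fig. 2 (the resulting distribution over labeled trees is the same).

```python
import heapq
import random
from itertools import combinations

C_TRUE_MAX = 0.1     # hypothetical bound on the relative number of crossings
ENSEMBLE_SIZE = 100  # hypothetical ensemble size
n = 8                # illustrative sentence length

def prufer_to_tree(seq, n):
    """Decode a Pruefer sequence into the edges of a labeled tree on 1..n."""
    degree = [1] * (n + 1)
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def c_true_and_c_max(edges, n):
    """C under the arrangement pi(v) = v, and C_max = (n/2)(n - 1 - <k^2>)."""
    c = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue
        s1, e1 = sorted((u1, v1))
        s2, e2 = sorted((u2, v2))
        c += (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)
    degree = [0] * (n + 1)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    k2 = sum(d * d for d in degree) / n
    return c, n / 2 * (n - 1 - k2)

random.seed(0)
ensemble = []
while len(ensemble) < ENSEMBLE_SIZE:
    tree = prufer_to_tree([random.randint(1, n) for _ in range(n - 2)], n)
    c, c_max = c_true_and_c_max(tree, n)
    if c_max > 0 and c / c_max <= C_TRUE_MAX:  # reject the tree otherwise
        ensemble.append((tree, c, c_max))
print(len(ensemble))
```

The rejection loop makes the explosion of rejections tangible: as n grows, trees with small C̄_true become ever rarer and the loop runs ever longer for a fixed ensemble size.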

Fig. 4 shows the mean Δ_x over ensembles of random trees with small C̄_true, indicating that both E_0[C̄] and E_2[C̄] overestimate C̄_true in general. While Δ_2 remains small, Δ_0 converges to 1/3, as expected from the fact that

 Δ_0 = (C_max/3 − C_true)/C_max = 1/3 − C_true/C_max,

which yields Δ_0 ≈ 1/3 for C_max sufficiently large and C̄_true sufficiently small.

## Discussion

It has been shown that E_2[C̄] is able to predict the actual relative number of crossings in random labeled trees. This is not very surprising: edge length does give information on how likely two edges are to cross. What is not straightforward is that a method that estimates crossings exclusively from local dependency length information (just the lengths of the pair of edges that can potentially cross) is able to make predictions with a small relative error in trees of the size of real sentences. Our finding has important consequences for language research: it suggests that there is no need a priori for banning crossings by grammar [2] or minimizing C [8] to explain C ≈ 0 in short enough sentences. This is consistent with the view that syntactic constraints, in general, do not imply an internally represented grammar [21].

However, the predictive power of E_2[C̄] decreases slightly as the number of vertices increases (Fig. 4). The reason is very simple: E_2[C̄] departs from an estimation of the probability that two edges cross that is based exclusively on their lengths, thus discarding the lengths of the other edges, while E_0[C̄] neglects edge lengths entirely. As n increases, the amount of information discarded increases and the predictions worsen. In the tree in Fig. 1 (c), only certain pairs of edges could cross in the sense of p(cross | d1, d2), i.e. if the dependency lengths of the other edges were ignored (recall that edges of length 1 or n − 1 cannot produce crossings). For such a pair, the definition of p(cross | d1, d2) gives a non-zero probability of crossing, although a crossing may actually be impossible once the lengths of the remaining edges are taken into account. For this reason, E[C̄|d], the expected relative number of crossings knowing all edge lengths in every potential crossing, should be investigated in the future.

We are grateful to D. Blasi, R. Czech, E. Gibson and G. Morrill for helpful discussions. This work was supported by the grant BASMATI (TIN2011-27479-C04-03) from the Spanish Ministry of Science and Innovation.

## References

• [1] Mel’čuk I., Dependency syntax: theory and practice (State of New York University Press, Albany) 1988.
• [2] Hudson R., Language networks. The new word grammar (Oxford University Press, Oxford) 2007.
• [3] Levy R., Fedorenko E., Breen M. and Gibson E., Cognition, 122 (2012) 12.
• [4] Lecerf Y., Rapport CETIS No. 4, (1960) 1 Euratom.
• [5] Hays D., Language, 40 (1964) 511.
• [6] Liu H., Lingua, 120 (2010) 1567.
• [7] Ferrer-i-Cancho R., Europhysics Letters, 76 (2006) 1228.
• [8] Liu H., Journal of Cognitive Science, 9 (2008) 159.
• [9] Morrill G., Valentín O. and Fadda M., Dutch grammar and processing: A case study in TLG in Logic, Language, and Computation, edited by Bosch P., Gabelaia D. and Lang J., Vol. 5422 of Lecture Notes in Computer Science (Springer Berlin Heidelberg) 2009 pp. 272–286.
• [10] Ferrer-i-Cancho R., Glottometrics, 25 (2013) 1.
• [11] Hochberg R. A. Stallmann M. F., Information Processing Letters, 87 (2003) 59.
• [12] Ferrer-i-Cancho R., Physical Review E, 70 (2004) 056135.
• [13] Ferrer-i-Cancho R., Why might SOV be initially preferred and then lost or recovered? A theoretical framework in The Evolution of Language: Proceedings of the 10th International Conference (EVOLANG10), edited by Cartmill E. A., Roberts S., Lyn H. and Cornish H., (Wiley, Vienna, Austria) 2014 pp. 66–73; Evolution of Language Conference (Evolang 2014), April 14-17.
• [14] Chen W. Y. C., Han H. S. W. Reidys C. M., Proceedings of the National Academy of Sciences, 106 (2009) 22061.
• [15] Ferrer-i-Cancho R., http://arxiv.org/abs/1305.4561, (2013) .
• [16] Noy M., Discrete Mathematics, 180 (1998) 301.
• [17] Moon J., Counting labelled trees presented at Canadian Math. Cong. 1970.
• [18] Aldous D., SIAM J. Disc. Math., 3 (1990) 450.
• [19] Broder A., Generating random spanning trees in proc. of Symp. Foundations of Computer Sci., IEEE (New York) 1989 pp. 442–447.
• [20] de Vries M. H., Petersson K. M., Geukes S., Zwitserlood P. and Christiansen M. H., Philosophical Transactions of the Royal Society B: Biological Sciences, 367 (2012) 2065.
• [21] Christiansen M. H. and Chater N., Cognitive Science, 23 (1999) 157.
• [22] Cayley A., Quart. J. Math, 23 (1889) 376.
• [23] Leech G. N. and Short M. H., Style in fiction (Longman, London) 2007.