Ranking a set of objects: a graph based least-square approach

Abstract

We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers. We assume that objects are endowed with intrinsic qualities and that the probability with which an object is preferred to another depends only on the difference between the qualities of the two competitors. We propose a class of non-adaptive ranking algorithms that rely on a least-squares optimization criterion for the estimation of qualities. Such algorithms are shown to be asymptotically optimal in the number of comparisons they require to be $(\epsilon,\delta)$-PAC. Numerical results show that our schemes are also very efficient in many non-asymptotic scenarios, exhibiting a performance similar to that of the maximum-likelihood algorithm. Moreover, we show how they can be extended to adaptive schemes and test them on real-world datasets.

Ranking algorithms, noisy evaluation, applied graph theory, least-square estimation

1 Introduction

Ranking algorithms have many applications. For example, they are used for ranking pages, user preferences for advertisements on the web, hotels, restaurants, or online games [1, 2]. In general, a ranking algorithm infers an estimated order relation among objects starting from a set of evaluations or comparisons. Sometimes, such evaluations are performed by human “workers” in the framework of crowdsourcing applications. However, since the behavior of humans cannot be deterministically predicted, it is usually described through a probabilistic model. The challenge in designing algorithms is then to infer reliable estimates of the ranking starting from “noisy” evaluations of the objects. Often the ranking algorithm resorts to pairwise comparisons of objects. In this work, we focus on such a class of ranking algorithms. Several stochastic models have been proposed in the literature [3, 4, 5, 6] to represent the outcome of comparisons. Most of them are based on the idea that the objects to be compared have an intrinsic quality and that the probability, $p_{i,j}$, that object $i$ is preferred to object $j$ depends on their qualities $q_i$ and $q_j$. In this context, we devise a class of efficient algorithms, which reconstruct object qualities from pairwise differences through a least-squares (LS) approach. To do so, we establish a parallelism between the estimation process and the average cumulative reward of random walks on a weighted graph.

1.1 System model

Let $\mathcal{Q} \subset \mathbb{R}$ be a compact set. We assume that $N$ objects are available for ranking: object $i$ is provided with an intrinsic quality, $q_i \in \mathcal{Q}$, which is unknown to the system. Qualities induce a true ranking among objects, in which object $i$ is ranked ahead of object $j$ iff $q_i > q_j$. A ranking algorithm resorts to a set of observations (or answers) provided by workers, who compare pairs of objects and return the identity of the object they prefer. The comparison procedure implicitly contains some randomness reflecting the workers’ behavior. Thus, in general, workers’ answers can be modeled as a collection of binary random variables, whose distribution depends on the qualities of the objects to be evaluated.

Due to this randomness in the evaluation process, the inferred ranking does not always coincide with the true ranking. The reliability of the inferred ranking depends on how the evaluation process is organized. In particular, it depends on (i) the workers’ behavior, (ii) the choice of the set of object pairs to be compared, (iii) the number of workers assigned to each pair of objects, and (iv) the processing algorithm used to infer the ranking from workers’ answers.

We assume that all workers behave similarly and that they provide independent answers. In particular, a worker comparing objects $i$ and $j$ will express a preference for object $i$ over $j$ with probability:

$$p_{i,j} = F(q_i - q_j) \tag{1}$$

where the function is differentiable and strictly increasing in its argument (and therefore invertible) and such that . Moreover, we assume that is bounded away from zero for where . When the pair of objects is compared, the worker’s output is modeled as a binary random variable, , whose outcomes have probability

(2)

The model in (1) is quite general. For example, it encompasses

  • the Thurstone model [5], where the preferred object (in a pair) is chosen in accordance with the qualities as perceived by the worker and defined as

    $$\tilde q_i = q_i + \eta_i, \qquad \tilde q_j = q_j + \eta_j,$$

    respectively, where $\eta_i$ and $\eta_j$ are zero-mean random variables that represent noise terms. In this case $F(\cdot)$ is the cumulative distribution function of the zero-mean random variable $\eta_j - \eta_i$, i.e.,

    $$F(x) = \mathbb{P}(\eta_j - \eta_i \le x) \tag{3}$$
  • the Bradley-Terry-Luce (BTL) model [3, 4], where

    $$F(x) = \frac{1}{1 + e^{-x}} \tag{4}$$
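As an illustration of the two models above, the following minimal sketch computes the preference probability as a function of the quality difference and simulates the answers of independent workers. It assumes Gaussian noise for the Thurstone model and the logistic form for the BTL model; the function names and the parameter `s` (standard deviation of the noise difference) are hypothetical choices made for this example.

```python
import math
import random

def thurstone_prob(q_i, q_j, s=1.0):
    """P(i preferred to j) when the noise difference is Gaussian with std s (assumption)."""
    return 0.5 * (1.0 + math.erf((q_i - q_j) / (s * math.sqrt(2.0))))

def btl_prob(q_i, q_j):
    """P(i preferred to j) under the logistic (BTL) form, assumed here."""
    return 1.0 / (1.0 + math.exp(-(q_i - q_j)))

def simulate_answers(p_ij, m, rng):
    """Number of workers, out of m independent ones, who prefer i to j."""
    return sum(1 for _ in range(m) if rng.random() < p_ij)

rng = random.Random(0)
wins = simulate_answers(btl_prob(0.7, 0.2), 50, rng)
```

Both functions depend on the qualities only through their difference and return 1/2 for equal qualities, as any symmetric choice of the preference function requires.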

Let $\mathcal{V}$ be the set of objects. We observe that an arbitrary choice of a set of object pairs to be compared, denoted by $\mathcal{E}$, automatically induces an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, whose vertex and edge sets are, respectively, $\mathcal{V}$ and $\mathcal{E}$. Clearly, it is possible to infer a ranking among the objects only if the graph $\mathcal{G}$ is connected.
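The connectivity requirement can be checked before any comparison is collected. The sketch below is one possible way to do so with a breadth-first search; it assumes objects are labeled 0 to n-1 and that the compared pairs are given as a list of tuples (both are conventions of this example, not of the paper).

```python
from collections import deque

def is_connected(n, edges):
    """Return True if the undirected comparison graph on n >= 1 nodes is connected."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

path_ok = is_connected(4, [(0, 1), (1, 2), (2, 3)])
split = is_connected(4, [(0, 1), (2, 3)])
```

In the second call the pairs {0, 1} and {2, 3} are never compared with each other, so no relative order between the two groups can be inferred.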

Each object pair $(i,j)$ to be compared is assigned to a number of workers $m_{i,j}$. In general, an increase of $m_{i,j}$ leads to a more reliable estimate of the ranking. On the other hand, the overall complexity, $C$, of the ranking algorithm is proportional to the total number of workers employed in the process, i.e.,

$$C = \sum_{(i,j)} m_{i,j},$$

where the sum runs over all compared pairs. An efficient ranking algorithm must therefore find a good trade-off between complexity and the reliability of the inferred ranking, i.e., it must return an almost correct ranking of the objects with a minimal number of pairwise comparisons.

Regarding the reliability of the inferred ranking, we say that an estimated ranking is $\epsilon$-quality approximately correct (or is an $\epsilon$-quality ranking) if object $i$ is ranked ahead of object $j$ whenever $q_i > q_j + \epsilon$. Moreover, a ranking algorithm is $(\epsilon,\delta)$-PAC [7, 8, 9] if it returns an $\epsilon$-quality ranking with a probability larger than $1-\delta$.
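The definition above can be turned into a simple check. The sketch below assumes that the estimated ranking is obtained by sorting estimated qualities `q_hat`, and it treats ties in the estimates as violations, which is a conservative convention chosen for this example.

```python
def is_eps_quality(q, q_hat, eps):
    """True if no pair whose true qualities differ by more than eps is swapped."""
    n = len(q)
    for i in range(n):
        for j in range(n):
            # i is truly better than j by more than eps, yet not ranked ahead of it
            if q[i] > q[j] + eps and q_hat[i] <= q_hat[j]:
                return False
    return True

q_true = [0.9, 0.5, 0.45, 0.1]
q_est = [0.8, 0.4, 0.42, 0.0]   # objects 1 and 2 are swapped, their true gap is 0.05
loose = is_eps_quality(q_true, q_est, 0.1)
strict = is_eps_quality(q_true, q_est, 0.01)
```

Repeating such a check over many independent runs gives an empirical estimate of the probability that an algorithm fails to return an $\epsilon$-quality ranking.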

1.2 Paper contribution and related work

This paper contributes to a better understanding of the fundamental limits of ranking algorithms based on noisy pairwise comparisons. Our main results complement and extend previous findings about the minimal complexity of ranking algorithms under different non-parametric preference models recently derived in [8, 9]. As shown in former studies, the efficiency of ranking algorithms is crucially determined by the structure of the underlying preference model.

On the one hand, under a non-parametric preference model satisfying both Strong Stochastic Transitivity (SST) and Stochastic Triangle Inequality (STI) properties, a provably asymptotically optimal adaptive algorithm has been proposed, under the restriction that . In particular, the algorithm proposed in [9] is $(\epsilon,\delta)$-PAC provided that comparisons are dynamically allocated on the basis of previous outcomes. On the other hand, in [8, 9], it is shown that comparisons are strictly needed to obtain a reliable ranking as soon as either STI or SST is relaxed.

When considering parametric models, estimating a ranking is essentially related to estimating the underlying qualities. The works [10, 11] provide a characterization of the expected norm-two distance between estimated and true qualities (later on referred to as mean square error (MSE)), in connection with the properties of a fixed graph $\mathcal{G}$. In particular, [10], under the assumption that $F$ is log-concave, provides universal (i.e., applicable to optimal algorithms, such as the maximum-likelihood (ML) algorithm) order-optimal upper and lower bounds for the MSE, relating it to the spectral gap of a certain scaled version of the Laplacian of $\mathcal{G}$. The very recent paper [11], for the BTL model only, introduces an LS algorithm and provides upper and lower bounds for a variant of the MSE and the relative tail probabilities achievable by such an algorithm, characterizing them in terms of the graph resistance.

Other interesting works are [12, 13, 14, 15, 16, 17, 18]. In [16, 17] an LS approach for ranking is first introduced, but no theoretical guarantees are given. In particular, [17] proposes Sync-Rank, a semi-definite programming algorithm based on the angular synchronization framework. In [12, 13], instead, an iterative algorithm that emulates a weighted random walk on a graph is proposed and its performance is analyzed under the BTL model. In particular, bounds on the MSE and the corresponding tail probabilities are provided. A direct comparison between the performance of the algorithms proposed in [14, 15, 11] is reported in [11], where the LS approach is shown to be, in general, asymptotically more efficient. Under the BTL model, [12, 13] propose and analyze algorithms able to identify the top-$K$ quality objects. Finally, [18] describes a ranking algorithm based on the singular value decomposition approach by assuming that workers return unquantized noisy estimates of object quality differences.

Regarding online ranking algorithms, in [7], for the BTL model, an online algorithm inspired by a finite-budget version of quicksort is described, which is able to obtain an $(\epsilon,\delta)$-PAC ranking with comparisons. In [19], it is shown that, for online ranking algorithms, parametric models help to reduce the complexity only by logarithmic factors, in order sense.

In this work, unlike [11], we introduce a rather general parametric preference model according to which preference probabilities are determined by an arbitrary smooth monotonic function of object-quality differences. In this scenario, we show that order-optimal non-adaptive algorithms can be defined without the necessity of introducing any restriction to parameter . In particular, differently from [10, 11], we work with the PAC framework and show that our algorithms are $(\epsilon,\delta)$-PAC, provided that comparisons are blindly allocated in a single round. Observe that our preference model does not necessarily satisfy STI, while it satisfies SST. Our ranking procedure is based on the reconstruction of object qualities from pairwise quality differences, by adopting an LS approach akin to the one in [11]. Notice, however, that the analysis in [11] only applies to the case where total comparisons are performed. Our analysis establishes a parallelism between the quality estimation process and the cumulative reward of random walks on graphs. As an original contribution, we also introduce a weighted LS algorithm with performance very close to that of the more complex ML algorithm. Finally, by simulation, we show that the performance of our algorithms is extremely good also in non-asymptotic scenarios.

The paper is organized as follows: in Section 2 we introduce a ranking algorithm based on the Maximum Likelihood (ML) approach, which is used as a performance reference. In Section 3 we describe our proposed LS estimation algorithm, whose asymptotic analysis is investigated in Section 4. The LS estimation algorithm is then tested in Sections 5 and 6 against synthetic and real-world datasets, respectively. Finally, in Section 7 we draw our conclusions.

1.3 Notation

Boldface uppercase and lowercase letters denote matrices and vectors, respectively. $\mathbf{I}$ is the identity matrix. The transpose of the matrix $\mathbf{A}$ is denoted by $\mathbf{A}^{\mathsf{T}}$, while $[\mathbf{A}]_{i,j}$ indicates its $(i,j)$-th entry. For the sake of notation compactness we use the notation $\mathbf{A} = \{a_{i,j}\}$ to define a matrix whose elements are $a_{i,j}$. Finally, the symbol $\odot$ represents the Hadamard product, while calligraphic letters denote sets or graphs.

2 Maximum-likelihood quality estimation

Consider a graph with vertices where each pair of objects is evaluated times by independent workers. Without loss of generality we assume that the indices of the objects connected by the generic edge are such that . Moreover, we assume that the -th worker evaluating the pair of objects outputs the binary random variable whose distribution is given by (2).

In our proposed ML approach, the estimate of the ranking can be obtained by sorting the quality estimates which are obtained as follows:

(5)

When workers are independent of each other and behave similarly, the random variables can be modeled as independent and identically distributed. Therefore, the conditional probability in (5) factorizes as

By using (2) we write

where we recall that . By substituting the above result in (5), the ML estimate of the qualities can be rewritten as

(6)

where

and . The function has a finite global maximum. Indeed, since , and , it is straightforward to show that . However, in general, is a non-linear function of and its maximization is nontrivial. Nevertheless, a local maximum can be found by using standard techniques such as, for example, the Newton-Raphson method, which works iteratively and requires the function to be twice differentiable.

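Let $\hat{\mathbf{q}}^{(t)}$ be the estimate of the qualities at iteration $t$. Then the estimate at iteration $t+1$ can be updated as follows:

$$\hat{\mathbf{q}}^{(t+1)} = \hat{\mathbf{q}}^{(t)} - \mathbf{H}^{-1}\big(\hat{\mathbf{q}}^{(t)}\big)\,\mathbf{g}\big(\hat{\mathbf{q}}^{(t)}\big)$$

where $\mathbf{g}$ and $\mathbf{H}$ are, respectively, the gradient and the Hessian matrix of the function to be maximized in (6). Specifically, the $i$-th entry of $\mathbf{g}$ is the partial derivative of that function with respect to $q_i$, and the $(i,j)$-th entry of $\mathbf{H}$ is its second-order partial derivative with respect to $q_i$ and $q_j$. In order to compute $\mathbf{g}$ and $\mathbf{H}$, consider a generic node $i$ and the set of edges connecting node $i$ to its neighbors. Then, the function can be rewritten as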

(7)

where the term does not depend on . Since , we can write the partial derivatives of as follows:

and, similarly

It immediately follows that

and

Moreover, for

The above equations can be specialized for both the Thurstone model as well as for the BTL model, by using the expressions for provided, respectively, in (3) and (4).
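To make the iteration concrete, here is a minimal sketch of Newton-Raphson ML estimation specialized to the BTL model, assuming the logistic form of the preference function. The data layout (`counts` mapping an edge `(i, j)` to the pair "answers in favor of i, total answers"), the choice of the last node as reference with quality 0, and the plain Newton step without step-size control are all assumptions of this example rather than details taken from the paper.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def ml_btl(n, counts, iters=25):
    """Newton-Raphson on the BTL log-likelihood; node n-1 is the reference (quality 0)."""
    q = [0.0] * n
    for _ in range(iters):
        g = [0.0] * n
        neg_h = [[0.0] * n for _ in range(n)]
        for (i, j), (w, m) in counts.items():
            f = 1.0 / (1.0 + math.exp(-(q[i] - q[j])))
            r = w - m * f            # derivative of the edge term w.r.t. q_i
            c = m * f * (1.0 - f)    # minus the second derivative of the edge term
            g[i] += r
            g[j] -= r
            neg_h[i][i] += c
            neg_h[j][j] += c
            neg_h[i][j] -= c
            neg_h[j][i] -= c
        # Drop the reference row/column, then q <- q - H^{-1} g = q + (-H)^{-1} g.
        step = solve([row[:n - 1] for row in neg_h[:n - 1]], g[:n - 1])
        for k in range(n - 1):
            q[k] += step[k]
    return q

# Noise-free demo: expected counts generated from hypothetical qualities.
q_true = [1.0, 0.5, -0.3, 0.0]
counts = {(i, j): (100 / (1.0 + math.exp(-(q_true[i] - q_true[j]))), 100)
          for i in range(4) for j in range(i + 1, 4)}
q_ml = ml_btl(4, counts)
```

With noise-free counts the gradient vanishes at the true qualities, so the iteration should return them up to numerical precision; with real answers the counts are integers and the estimate is only approximate.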

3 Least-squares quality estimation

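We propose a simpler linear estimation algorithm, based on a least-squares criterion, that can be applied on the graph $\mathcal{G}$. Let the distance between objects $i$ and $j$ be

$$d_{i,j} = q_i - q_j = F^{-1}(p_{i,j})$$

and let $\mathcal{W}_{i,j}$ be the set of binary answers, of cardinality $m_{i,j}$, provided by the workers comparing the pair $(i,j)$. Also, let $w_{i,j}$ be the number of times object $i$ is preferred to object $j$. Then, by construction, $w_{i,j}$ follows the binomial distribution $\mathrm{Bin}(m_{i,j}, p_{i,j})$, where $p_{i,j} = F(d_{i,j})$. Out of the evaluation results, an estimate of $d_{i,j}$ is formed as

$$\hat d_{i,j} = F^{-1}(\hat p_{i,j}), \qquad \hat p_{i,j} = \frac{w_{i,j}}{m_{i,j}} = p_{i,j} + \eta_{i,j} \tag{8}$$

where $\hat p_{i,j}$ is the estimate of $p_{i,j}$, and $\eta_{i,j}$ represents the estimation error on the probability $p_{i,j}$. Note that $\eta_{i,j}$ has zero mean and variance $p_{i,j}(1-p_{i,j})/m_{i,j}$. As a consequence, $\hat d_{i,j} = d_{i,j} + \delta_{i,j}$, where $\delta_{i,j}$ represents the error on the estimate of $d_{i,j}$ induced by the presence of $\eta_{i,j}$. From the set of noisy estimates $\{\hat d_{i,j}\}$, the estimate $\hat{\mathbf{q}}$ of the qualities can be obtained by solving the following LS optimization problem

$$\hat{\mathbf{q}} = \arg\min_{\mathbf{x}} \sum_{(i,j)\in\mathcal{E}} \omega_{i,j}\,\big(x_i - x_j - \hat d_{i,j}\big)^2 \tag{9}$$

where $\omega_{i,j}$ are arbitrary positive weights, whose setting is discussed in Section 3.1. The solution of (9) satisfies the following linear equations:

$$\hat q_i = \frac{1}{\rho_i} \sum_{j\in\mathcal{N}_i} \omega_{i,j}\,\big(\hat q_j + \hat d_{i,j}\big) \tag{10}$$

where $\mathcal{N}_i$ represents the neighborhood of node $i$ (i.e., the set of nodes connected to $i$ in $\mathcal{G}$), and $\rho_i$ is its generalized degree, i.e., $\rho_i = \sum_{j\in\mathcal{N}_i} \omega_{i,j}$. We can compactly express the previous linear system in terms of a matrix associated with the graph $\mathcal{G}$, whose elements are defined as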

Let be the identity matrix, , and . Moreover let and be, respectively, the antisymmetric matrices of the true and estimated quality differences. Thus, from (10) we can write:

(11)

where represents the Hadamard product and is a column vector of size . We observe that, by construction, , i.e., is singular. Indeed , as can be easily checked. This implies that the associated linear operator on is not injective and that, given a solution of (10), also is a solution of (10) for any . Note, however, that, for the purposes of object ranking, the actual value of is irrelevant, since every solution of the form induces the same object ranking. Therefore, we can arbitrarily fix the quality of, say, object to 0 as a reference, i.e., . To take this constraint into account, we define the new matrices and as follows:

(12)

and , respectively. We then replace and in (11) with, respectively, and . Since is full rank, solving for we obtain

(13)

where we have used the fact that .
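The estimation procedure of this section can be sketched as follows: form the empirical preference fractions, map them to quality differences through the inverse of the preference function, and solve the resulting linear system with one quality pinned to 0. The sketch assumes the last node is the reference, solves the normal equations directly instead of forming the matrices used in the text, and clips the empirical fractions away from 0 and 1 so that the inverse stays finite; the names `counts`, `f_inv`, `weights` and `clip` are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def ls_qualities(n, counts, f_inv, weights=None, clip=1e-3):
    """(Weighted) LS quality estimates; node n-1 is the reference with quality 0."""
    A = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for (i, j), (w, m) in counts.items():
        p_hat = min(max(w / m, clip), 1.0 - clip)   # empirical P(i preferred to j)
        d_hat = f_inv(p_hat)                        # estimated q_i - q_j
        om = 1.0 if weights is None else weights[(i, j)]
        A[i][i] += om
        A[j][j] += om
        A[i][j] -= om
        A[j][i] -= om
        rhs[i] += om * d_hat
        rhs[j] -= om * d_hat
    q = solve([row[:n - 1] for row in A[:n - 1]], rhs[:n - 1])
    return q + [0.0]

def logit(p):
    """Inverse of the logistic function (BTL assumption)."""
    return math.log(p / (1.0 - p))

# Noise-free demo on a ring of four objects with hypothetical qualities.
q_true = [1.0, 0.5, -0.3, 0.0]
ring = [(0, 1), (1, 2), (2, 3), (0, 3)]
counts = {(i, j): (100 / (1.0 + math.exp(-(q_true[i] - q_true[j]))), 100)
          for (i, j) in ring}
q_ls = ls_qualities(4, counts, logit)
```

The ranking is then obtained by sorting the returned estimates; since only differences matter, the choice of the reference node does not affect the order.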

3.1 Weight optimization

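In the following, we will consider two possible choices for the weights $\omega_{i,j}$. The first, which will be studied in the next section for its simplicity, corresponds to $\omega_{i,j} = 1$ for all compared pairs $(i,j)$, and will be called unweighted LS or simply LS. The second, which will be called weighted LS (WLS), is dictated by the fact that the estimates $\hat d_{i,j}$ of the quality differences $d_{i,j}$ do not have the same reliability. Indeed, by developing (8) at the first order for a small estimation error $\eta_{i,j}$ on the probability $p_{i,j}$, we obtain

$$\hat d_{i,j} = F^{-1}(p_{i,j} + \eta_{i,j}) = d_{i,j} + \frac{\eta_{i,j}}{F'(d_{i,j})} + o(\eta_{i,j})$$

so that, if we neglect the higher-order term, $\hat d_{i,j} - d_{i,j}$ is a zero-mean random variable with variance

$$\sigma^2_{i,j} = \frac{p_{i,j}\,(1 - p_{i,j})}{m_{i,j}\,\big[F'(d_{i,j})\big]^2}$$

where $m_{i,j}$ is the number of workers comparing the pair $(i,j)$. Given the values of $\sigma^2_{i,j}$ for all compared pairs, the optimal weights in (10) are then proportional to $1/\sigma^2_{i,j}$. For our WLS algorithm, we will then set , with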

where , for a small positive parameter such that exists finite. Note that, under this setting
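A minimal sketch of the inverse-variance weighting idea is given below. It uses the first-order variance of the estimated quality difference, evaluated at a clipped empirical fraction; the clipping level `clip` is a hypothetical stand-in for the small positive parameter mentioned above, and the exact expression adopted by the WLS algorithm may differ.

```python
import math

def wls_weight(w, m, f_inv, f_prime, clip=0.05):
    """Inverse of the first-order variance of the estimated quality difference."""
    p = min(max(w / m, clip), 1.0 - clip)          # clipped empirical fraction
    d = f_inv(p)                                   # estimated quality difference
    var = p * (1.0 - p) / (m * f_prime(d) ** 2)    # delta-method variance
    return 1.0 / var

def logit(p):
    return math.log(p / (1.0 - p))

def logistic_prime(d):
    s = 1.0 / (1.0 + math.exp(-d))
    return s * (1.0 - s)

balanced = wls_weight(50, 100, logit, logistic_prime)   # objects of similar quality
lopsided = wls_weight(90, 100, logit, logistic_prime)   # objects far apart
```

Under the logistic (BTL) assumption the weight reduces to the number of answers times p(1 - p), so edges between objects of similar quality receive the largest weights.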

4 Asymptotic analysis of the Least-Square Estimator

All the theoretical results in this section are obtained by considering the unweighted LS estimator, for simplicity. However, they can be extended to the general weighted case as long as is bounded away from 0, as for the case described in Section 3.1.

The following propositions derive the conditions for the asymptotic convergence of the estimated qualities to their true values. We start by presenting a preliminary asymptotic result on the mean square error.

Proposition 4.1

Consider the unweighted LS estimator in (10). Assume that the degrees of the nodes of the graph are upper-bounded and define . Then the mean square error (MSE) on the estimates can be bounded by

(14)

where is a constant, is the largest eigenvalue of and for a sufficiently large .

The proof is provided in Appendix A.

Even if an expression similar to (14) is reported in [10], we recall that the latter was derived for perfect ML-estimators under the assumption that is log-concave; our results, instead, apply to the LS algorithm for a generic strictly-increasing . Furthermore, (14) complements and extends results in [11] under more general settings (we recall that results in [11] apply to the BTL model only). It is also to be noted that the theoretical results in [11] only apply to the regime where is large, i.e., . Under such a constraint, for any connected graph, the total complexity of the algorithm in terms of number of comparisons is at least .

From (14), we can deduce that, whenever is bounded (as for example in the case of Ramanujan graphs), by symmetry, , . Thus, if for , then converges in probability to .

To find out the minimum number of comparisons under which the LS approach satisfies the -PAC conditions, we need to evaluate for . The following proposition gives sufficient conditions in order for the absolute error to converge to zero in the properly defined limiting regime.

Proposition 4.2

Consider the unweighted LS estimator in (10). For any , as grows, , provided that

  1. (i.e., the -norm of is bounded),

  2. the total number of edges of is , and for some .

Assumption i) can be weakened by the following condition i’):

  1. .

The proof is provided in Appendix B.

Remark 4.1

Note that Proposition 4.2 provides sufficient conditions for the existence of an $(\epsilon,\delta)$-PAC ranking algorithm with complexity . In the following subsection we characterize classes of graphs meeting condition (i) or (i’) of Proposition 4.2.

4.1 Considerations on graph structure

Proposition 4.2 grants that the absolute error can be well controlled as under some conditions on the matrix (condition (i) or (i’)). Such conditions hold depending on the structure of the graph . In order to characterize the class of graphs for which condition (i) or condition (i’) holds, we first observe that (13) computes the quality of object as the average value of the sum of estimated quality differences along all paths joining node to the reference node . In other words, can be regarded as the average total reward earned by a standard random walk that starts from node and stops as soon as it hits node , when estimates are the elementary rewards associated with graph edges [20]. Then, in Section 4.1.1 we restrict our asymptotic analysis to directed and acyclic graphs. Finally, in Section 4.1.2 we extend it to the more general class of undirected graphs.
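The random-walk interpretation can be illustrated with a small Monte Carlo sketch. It assumes the unweighted case, in which the walk moves to a uniformly chosen neighbor and earns, on each step from node i to node j, the estimated difference between the quality of i and that of j. The graph, the qualities and all names below are hypothetical.

```python
import random

def walk_reward(start, ref, adj, d_hat, rng):
    """Cumulative reward of one standard random walk from start until it hits ref."""
    total, node = 0.0, start
    while node != ref:
        nxt = rng.choice(adj[node])      # uniform choice among neighbors
        total += d_hat[(node, nxt)]      # reward: estimated q_node - q_nxt
        node = nxt
    return total

def rw_estimate(start, ref, adj, d_hat, n_walks=200, seed=0):
    """Monte Carlo average of the total reward over independent walks."""
    rng = random.Random(seed)
    return sum(walk_reward(start, ref, adj, d_hat, rng)
               for _ in range(n_walks)) / n_walks

q = [0.8, 0.2, -0.5, 0.0]                       # node 3 is the reference
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
d_exact = {(i, j): q[i] - q[j] for i in adj for j in adj[i]}
single = walk_reward(0, 3, adj, d_exact, random.Random(1))
average = rw_estimate(0, 3, adj, d_exact)
```

With noise-free differences the rewards telescope, so every single walk from node 0 returns the quality of node 0 relative to the reference; with noisy differences individual walks differ and only their average approaches the LS estimate.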

4.1.1 $\mathcal{G}$ is directed and acyclic

An explicit solution of (10) can be given when graph is turned into a directed graph, i.e., by imposing to all edges one of the two possible directions. While this assumption is suboptimal, since it constrains the random walk to a subset of possible trajectories, it greatly simplifies the analysis. Indeed, first observe that, for directed graphs, (10) can be rewritten as:

where represents the set of in-neighbors of and . Then, when the graph is directed and acyclic, and has the reference node, , as a common ancestor, an explicit solution for , , is

(15)

where

and is the length of the longest (simple) path from node to the reference node. Proposition 4.3 gives sufficient conditions for a directed acyclic graph to meet the requirements of Proposition 4.2. The proposition exploits the notion of proximality between nodes according to the following definition:

Definition 4.1

Given a family of graphs , we say that a node is proximal to the reference node , with parameters , if a random walk starting from reaches the reference node within hops with a probability that is asymptotically (with ) bounded below by .

Proposition 4.3

Given a family of directed and acyclic graphs with bounded diameter, condition (i’) of Proposition 4.2 is satisfied if one of the following three conditions is met: (i) all paths from any node to the reference have bounded length, (ii) , or (iii) a fraction bounded away from 0 of the in-neighbors of any node is proximal for some and .

The proof is provided in Appendix C.

4.1.2 $\mathcal{G}$ is undirected

Now, let us go back to the original formulation (10) on the undirected graph. In the following, we will show that, considered from the point of view of a given node, the solution of (10) for an undirected graph can be obtained by defining an equivalent problem for a properly defined directed acyclic graph. Consider the graph on nodes and let be the matrix obtained from matrix by removing the last row and column (i.e., those corresponding to the reference node ). Consider a given node , , and notice that gives the average number of times that node is visited in the random walk starting from , before ending in the reference node [21]. Let

(16)

be the average number of times any edge incident to node is traversed in the direction from to its neighbors, in the standard random walk defined on . Now, define a directed graph , where if and only if and one of the two following conditions is satisfied: (i) and , or (ii) , , and .

Notice that . Let be the set of in-neighbors of in . It can be easily verified that in node has only in-neighbors and node (the reference node) has only out-neighbors. It is also easy to prove that is acyclic. Indeed, suppose that the cycle belongs to . This implies that, by definition, , which is impossible.

Let us also define a biased random walk on digraph , for which, given that the current node is , the probability of taking outgoing edge , is given by

(17)

The following proposition relates the standard random walk on to the biased random walk on .

Proposition 4.4

The estimate of given by (13) on can be obtained by solving

(18)

on , and then setting .

The proof is provided in Appendix D.

According to Proposition 4.4, can be equivalently seen as the average total reward of the standard random walk on graph or as the average total reward of the biased random walk on graph . The following proposition gives sufficient conditions for a family of graphs to meet the conditions of Proposition 4.2.

Proposition 4.5

Given a family of graphs with bounded diameter, condition i’) of Proposition 4.2 is satisfied if, for each node , one of the following conditions is satisfied: (i) all paths in from to the reference have bounded length, (ii) in , a fraction bounded away from 0 of the in-neighbors of any node is proximal.

The proof is provided in Appendix E.

Example 4.1

Consider the family of complete graphs on nodes, i.e., . Because of symmetry, we can easily see that, after a proper permutation of the nodes, for every . For the same reason, in , for . Thus, the only surviving edges in are the edges connected either to node 1 or to the reference node . Then, the maximum path length from node 1 to the reference is 2. Thus this family of graphs meets condition i) of Prop. 4.5. In particular, the estimate of is given by

Example 4.2

Let and be any two positive numbers. Let us build the family of graphs as follows. Nodes (a set that includes the reference) are “hubs” with potentially unbounded degree. The subgraph induced by the hub nodes is a connected arbitrary graph. The remaining nodes are divided into subsets . Subset , , is composed of nodes with maximum degree , which are neighbors of hub node and whose other neighbors all belong to . It is easy to see that, for this family of graphs, the diameter is bounded by .

Consider a node . Since all paths that reach the reference must pass through the hub nodes, it is easy to see that, in , node is connected only to nodes belonging to . Whenever the biased random walk on leaves (by reaching hub node ), it does not enter it any more. Thus, we can divide the biased random walk into two parts: the first on the subgraph of induced by , where hub node serves as reference, and the second on the hub nodes. Then, we can deduce the following facts.

  • In the first part of the random walk, since hub node is the reference, the probability of reaching it in one step from any node in is larger than . Thus, the probability of reaching it within steps is upper-bounded by .

  • The second part of the random walk lasts for at most steps.

Thus, every node is proximal with parameters , and condition ii) of Prop. 4.5 is satisfied.

Remark 4.2

Although our unweighted LS estimator is akin to the one in [11], our analysis of its performance differs substantially, also because we consider the PAC approach. Consequently, our characterization of “good” graphs does not coincide with that of [11]. For instance, a particular case of Example 4.2 is the wheel graph, which corresponds to choosing (the reference node as the only hub) and . From [11, Theorem 1], the wheel graph would require comparisons per edge in order for the upper bound on the estimation error to hold, since every edge belongs to a simple path from any node to the reference. Instead, Prop. 4.5 allows us to conclude that is enough to achieve the $(\epsilon,\delta)$-PAC property.

Example 4.3

Star graphs represent a particular sequence of acyclic graphs with bounded-length paths. Therefore, they satisfy condition i) of Prop. 4.5. In such a case, an object (let us say object 1) is taken as pivot (i.e., the center of the star) and the qualities of all the other objects are estimated only through direct comparisons with the pivot. Observe that, in this particular case, the ranking among objects can be directly inferred from without the necessity of inverting function . In practice it is enough to rank objects according to the following rules: iff and iff . Therefore, star graphs are appealing when function (i.e., the precise worker model) is not known.
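A possible reading of the star-graph rules is sketched below: since the preference function is increasing, sorting the objects by the fraction of answers that prefer them to the pivot gives the same order as sorting the estimated qualities. The sketch assumes that two objects of equal quality are preferred with probability 1/2, which places the pivot itself at score 0.5; the function name and data layout are hypothetical.

```python
def star_ranking(pivot, p_hat):
    """Rank objects from best to worst using only comparisons with the pivot.

    p_hat[i] is the fraction of answers preferring object i to the pivot (i != pivot).
    """
    scores = dict(p_hat)
    scores[pivot] = 0.5   # assumption: equal qualities give a fair coin flip
    return sorted(scores, key=lambda i: scores[i], reverse=True)

order = star_ranking(0, {1: 0.8, 2: 0.3, 3: 0.6})
```

No knowledge of the worker model is needed here, because the inverse of the preference function is never evaluated.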

5 Results with synthetic datasets

Fig. 1: Error probability achieved by several ranking algorithms plotted versus the complexity per object , for . Object qualities are equally spaced in and the workers behave according to the Thurstone model.

Fig. 2: Error probability achieved by several ranking algorithms plotted versus the complexity per object , for . Object qualities are equally spaced in and the workers behave according to the Thurstone model.

We present numerical results showing the performance of our proposed algorithms for moderate values of $N$. In Figures 1 and 2 we compare the error probability achieved by several ranking algorithms versus the complexity per object . Object qualities are equally spaced in the range , i.e., object has quality where (Fig. 1) and (Fig. 2). Workers’ behavior is described by the Thurstone model detailed in Section 1.1 where and is the cdf of a Gaussian random variable with zero mean and standard deviation . On the $y$-axis we display the empirical probability of generating an output which is not an $\epsilon$-quality ranking, for . Note that an error is counted whenever at least two objects, whose quality difference exceeds $\epsilon$, appear swapped in the estimated ranking. The curve labeled “MergeRank” refers to the Merge-Rank algorithm proposed in [22], which we consider as a performance reference. The LS, WLS, and ML algorithms have been applied to randomly generated regular graphs [23] whose nodes have degree (lines with square marker) and (lines without markers). The figures show the superior performance of our ranking algorithms. It is interesting to observe that the WLS algorithm provides significant enhancements with respect to the LS algorithm and almost perfectly matches the performance of the more (computationally) complex ML approach. As the number of nodes increases, our proposed solutions substantially outperform the “MergeRank” algorithm.

Fig. 3: Performance of the LS and WLS ranking algorithms plotted against the complexity per object , for objects. Object qualities are drawn from a uniform distribution in and the workers behave according to the Thurstone model.

Figure 3 compares the performance of the LS and of the WLS algorithms for objects. Object qualities are randomly generated according to a uniform distribution in . Other system parameters are set as in Figure 2. The figure reports the empirical error probability plotted versus the number of tests per node, , for different values of the degree of the nodes in the graph. We first observe that, given , the number of tests per edge of the graph decreases as the degree, , increases. Hence, as increases, distances between pairs of nodes (corresponding to edges of the graph) are estimated with a decreasing accuracy. In spite of that, a larger number of neighbors for each node (i.e., a larger ) leads to a more reliable estimation of object qualities. This effect is more evident when the WLS algorithm is employed. Indeed, because of the weights , as increases, WLS is able to exploit the increasing number of highly reliable edges in the graph connecting objects with similar qualities; at the same time WLS is able to limit the impact of the greater number of scarcely reliable edges that connect objects with largely different qualities.

5.1 Adaptive multistage approach

The performance of the proposed ranking algorithms can be improved by adopting a multistage approach where, at each stage, new edges are added to the graph, depending on the quality estimates obtained at the previous stage. The rationale of this approach stems from the fact that such algorithms provide approximate rankings, in which the probability of swapping the order of two objects increases as their distance (in terms of their qualities) decreases. Therefore, in order to mitigate this phenomenon and, thus, improve the reliability of the estimate, it is convenient to (i) add to the graph extra edges connecting neighboring objects (in terms of their estimated qualities); (ii) assign additional workers to the already existing edges connecting the aforementioned neighboring objects. This procedure can be iterated until a desired performance level is achieved.

In our simulation setup, we have considered a 2-stage approach where we first apply the estimation algorithm to a random regular graph, , of degree , obtaining the vector of estimates . In the second stage, we create a new regular graph, of degree , where each node is connected to its closest neighbors, according to the estimates . Finally, the estimation algorithm is applied to the graph obtaining the output which is used to infer the ranking. In Figure 4 we show the performance of the ML and WLS algorithms when the proposed multistage approach is employed. For both algorithms we show the error probability versus the number of tests per object, , for , and . We observe that the second stage allows for a significant improvement of the performance and a reduction of about 60% of the required tests per object for and of about 30% for . In both cases the performance of the WLS algorithm is very close to that provided by the ML algorithm.
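The construction of the second-stage graph can be sketched as follows: sort the objects by their first-stage estimates and connect each object to the ones that follow it most closely in that order. This is a simplified variant written for illustration; unlike the regular graph used in the simulations, objects at the two ends of the ranking receive fewer edges, and the function name and the parameter `k` (target degree) are hypothetical.

```python
def second_stage_edges(q_hat, k):
    """Connect each object to its nearest neighbors in estimated quality.

    Each object is linked to the k // 2 objects that follow it in the sorted
    order, so interior objects end up with degree 2 * (k // 2).
    """
    order = sorted(range(len(q_hat)), key=lambda i: q_hat[i])
    half = k // 2
    edges = set()
    for pos, i in enumerate(order):
        for step in range(1, half + 1):
            if pos + step < len(order):
                j = order[pos + step]
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)

stage2 = second_stage_edges([0.1, 0.9, 0.5, 0.3], 2)
```

The new edges concentrate comparisons on pairs that the first stage judged close in quality, which are the pairs most likely to be swapped.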

Fig. 4: Error probability provided by ML and WLS algorithms when a 2-stage adaptive approach is employed, for , and .

6 Results with real-world datasets

In this section, we show that our algorithm works well even when considering a real scenario, where the “evaluations” are the outcome of experiments, and not synthetically generated by simulations. In particular, we consider five recent seasons of the English Premier League and build up a complete graph, where nodes are the football teams and edges are the matches between each pair of them. The match between team and team is considered as lasting for 180 minutes, since it includes both the round when is at home and the round where is away. If team has scored goals in the match against team , we count evaluations in favor of when compared to , where and are constant. The total number of comparisons between and is then simply .

The WLS algorithm has been run with and both the Thurstone and BTL models, to see the influence of the underlying worker model. The true ranking is assumed to be the final season ranking. The results have been plotted in terms of the Kendall tau distance, which counts the number of inversions in the estimated ranking with respect to the true ranking, i.e. the number of pairs for which is ranked better than in the true ranking and worse than in the estimated one.
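For reference, the Kendall tau distance described above can be computed with the short sketch below. It assumes both rankings are given as lists of the same objects ordered from best to worst; the quadratic pair enumeration is chosen for clarity rather than speed.

```python
def kendall_tau_distance(true_rank, est_rank):
    """Number of object pairs ordered differently by the two rankings."""
    pos = {obj: k for k, obj in enumerate(est_rank)}
    n = len(true_rank)
    inversions = 0
    for a in range(n):
        for b in range(a + 1, n):
            # true_rank[a] is better than true_rank[b] in the true ranking
            if pos[true_rank[a]] > pos[true_rank[b]]:
                inversions += 1
    return inversions

one_swap = kendall_tau_distance(["A", "B", "C", "D"], ["A", "C", "B", "D"])
reversed_all = kendall_tau_distance(["A", "B", "C", "D"], ["D", "C", "B", "A"])
```

The distance ranges from 0 for identical rankings to n(n-1)/2 for a fully reversed one.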