Ranking a set of objects: a graph based least-square approach
Abstract
We consider the problem of ranking objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers. We assume that objects are endowed with intrinsic qualities and that the probability with which an object is preferred to another depends only on the difference between the qualities of the two competitors. We propose a class of non-adaptive ranking algorithms that rely on a least-squares optimization criterion for the estimation of qualities. Such algorithms are shown to be asymptotically optimal (i.e., they require comparisons to be PAC). Numerical results show that our schemes are also very efficient in many non-asymptotic scenarios, exhibiting a performance similar to the maximum-likelihood algorithm. Moreover, we show how they can be extended to adaptive schemes and test them on real-world datasets.
1 Introduction
Ranking algorithms have many applications. For example, they are used for ranking pages, user preferences against advertisements on the web, hotels, restaurants, or online games [1, 2]. In general, a ranking algorithm infers an estimated order relation among objects starting from a set of evaluations or comparisons. Sometimes, such evaluations are performed by human “workers” in the framework of crowdsourcing applications. However, since the behavior of humans cannot be deterministically predicted, it is usually described through a probabilistic model. The challenge in designing algorithms is then to infer reliable estimates of the ranking starting from “noisy” evaluations of the objects. Often the ranking algorithm resorts to pairwise comparisons of objects. In this work, we focus on such a class of ranking algorithms. Several stochastic models have been proposed in the literature [3, 4, 5, 6] to represent the outcome of comparisons. Most of them are based on the idea that the objects to be compared have an intrinsic quality and that the probability that one object is preferred to another depends on their respective qualities. In this context, we devise a class of efficient algorithms, which reconstruct object qualities from pairwise differences through a least-square (LS) approach. To do so, we establish a parallelism between the estimation process and the average cumulative reward of random walks on a weighted graph.
1.1 System model
Let be a compact set. We assume that objects are available for ranking:
object is provided with an intrinsic quality, , which is unknown to the
system. Qualities induce a true ranking among objects, in which
iff
Due to this randomness in the evaluation process, the inferred ranking for object , , does not always coincide with the true ranking . The reliability of depends on how the evaluation process is organized. In particular, it depends on (i) the workers’ behavior, (ii) the choice of the set of object pairs to be compared, (iii) the number of workers assigned to each pair of objects, and (iv) the processing algorithm used to infer the ranking from workers’ answers.
We assume that all workers behave similarly and that they provide independent answers. In particular, a worker comparing objects and , will express a preference for object against with probability:
(1) 
where the function is differentiable and strictly increasing in its argument (and therefore invertible) and such that . Moreover, we assume that is bounded away from zero for where . When the pair of objects is compared, the worker’s output is modeled as a binary random variable, , whose outcomes have probability
(2) 
The model in (1) is quite general. For example, it encompasses

the Thurstone model [5], where the preferred object (in a pair) is chosen in accordance with the qualities as perceived by the worker and defined as
respectively, where and are zero-mean random variables that represent noise terms. In this case is the cumulative distribution function of the zero-mean random variable , i.e.,
(3)
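As a minimal sketch of the worker model in (1) under a Thurstone-type assumption, the following Python fragment takes the preference function to be a zero-mean Gaussian cdf evaluated at the quality difference and draws the binary answers of independent workers. The function names, the parameter sigma (assumed here to be the standard deviation of the noise difference) and the numerical values are illustrative assumptions, not part of the model definition.

```python
import math
import random

def gaussian_cdf(x, sigma=1.0):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def preference_probability(q_i, q_j, sigma=1.0):
    """Probability that object i is preferred to object j.

    Assumption: the preference function is a Gaussian cdf applied to q_i - q_j.
    """
    return gaussian_cdf(q_i - q_j, sigma)

def simulate_comparisons(q_i, q_j, m, rng, sigma=1.0):
    """Number of times i is preferred to j by m independent, identical workers."""
    p = preference_probability(q_i, q_j, sigma)
    return sum(1 for _ in range(m) if rng.random() < p)

# Hypothetical example: two objects with qualities 0.8 and 0.2, 1000 workers.
rng = random.Random(0)
wins = simulate_comparisons(0.8, 0.2, 1000, rng, sigma=0.5)
```

Because the Gaussian cdf is symmetric around zero, the two preference probabilities of a pair sum to one, and two objects of equal quality are each preferred with probability 1/2.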
Let be the set of objects. We observe that an arbitrary choice of a set of object pairs to be compared, denoted by , automatically induces an undirected graph , whose vertex and edge sets are, respectively, and . Clearly, it is possible to infer a ranking among the objects only if the graph is connected.
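The connectivity requirement can be checked before any worker is employed. The sketch below is one possible way to do so, using a breadth-first search; objects are assumed to be labeled 0, ..., n-1 and the function name is hypothetical.

```python
from collections import deque

def is_connected(n, edges):
    """Return True if the undirected graph on nodes 0..n-1 is connected."""
    if n == 0:
        return True
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # Every node must be reachable from node 0.
    return len(seen) == n
```

If the check fails, objects in different connected components are never compared, directly or indirectly, so their relative order cannot be inferred.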
Each object pair is assigned to a number of workers . In general, an increase of leads to a more reliable estimate of the ranking. On the other hand, the overall complexity, , of the ranking algorithm is proportional to the total number of workers employed in the process, i.e.,
Then, an efficient ranking algorithm must find a good tradeoff between the complexity and the reliability of the inferred ranking, i.e., by returning an almost correct ranking of objects with a minimal number of pair comparisons.
1.2 Paper contribution and related work
This paper contributes to a better understanding of the fundamental limits of ranking algorithms based on noisy pairwise comparisons. Our main results complement and extend previous findings about the minimal complexity of ranking algorithms under different non-parametric preference models recently derived in [8, 9]. As shown in earlier studies, the efficiency of a ranking algorithm is crucially determined by the structure of the underlying preference model.
On the one hand, under a non-parametric preference model satisfying
both Strong Stochastic Transitivity (SST) and Stochastic Triangle
Inequality (STI) properties,
When considering parametric models, estimating a ranking is essentially related to estimating the underlying qualities. The works [10, 11] provide a characterization of the expected norm-two distance between estimated and true qualities (later on referred to as mean square error (MSE)), in connection with the properties of a fixed graph . In particular, [10], under the assumption that is log-concave, provides universal (i.e., applicable to optimal algorithms, such as the maximum-likelihood (ML) algorithm) order-optimal upper and lower bounds for the MSE, relating it to the spectral gap of a certain scaled version of the Laplacian of . The very recent paper [11], for the BTL model only, introduces a LS algorithm and provides upper and lower bounds for a variant of the MSE and the relative tail probabilities achievable by such an algorithm, characterizing it in terms of the graph resistance.
Other relevant works are [12, 13, 14, 15, 16, 17, 18]. In [16, 17] a LS approach for ranking is first introduced, but no theoretical guarantees are given. In particular, [17] proposes SyncRank, a semidefinite programming algorithm based on the angular synchronization framework. In [12, 13], instead, an iterative algorithm that emulates a weighted random walk on the graph is proposed and its performance analyzed under the BTL model; in particular, bounds on the MSE and the corresponding tail probabilities are provided. A direct comparison between the performance of the algorithms proposed in [14, 15, 11] is reported in [11], where the LS approach is shown to be, in general, asymptotically more efficient. Under the BTL model, [12, 13] propose and analyze algorithms able to identify the top quality objects. Finally, [18] describes a ranking algorithm based on the singular value decomposition approach by assuming that workers return unquantized noisy estimates of object quality differences.
Regarding online ranking algorithms, [7] describes, for the BTL model, an online algorithm inspired by a finite-budget version of quicksort, able to obtain a PAC ranking with comparisons. In [19], it is shown that, for online ranking algorithms, parametric models help to reduce the complexity only by logarithmic factors, in order sense.
In this work, unlike [11], we introduce a rather general parametric preference model according to which preference probabilities are determined by an arbitrary smooth monotonic function of object-quality differences. In this scenario, we show that order-optimal non-adaptive algorithms can be defined without the necessity of introducing any restriction to parameter . In particular, differently from [10, 11], we work with the PAC framework and show that our algorithms are PAC, provided that comparisons are blindly allocated in a single round. Observe that our preference model does not necessarily satisfy STI, while it satisfies SST. Our ranking procedure is based on the reconstruction of object qualities from pairwise quality differences, by adopting a LS approach akin to the one in [11]. Notice, however, that the analysis in [11] only applies to the case where total comparisons are performed. Our analysis establishes a parallelism between the quality estimation process and the cumulative reward of random walks on graphs. As an original contribution, we also introduce a weighted LS algorithm with performance very close to the more complex ML algorithm. Finally, by simulation, we show that the performance of our algorithms is extremely good also in non-asymptotic scenarios.
The paper is organized as follows: in Section 2 we introduce a ranking algorithm based on the Maximum Likelihood (ML) approach, which is used as a performance reference. In Section 3 we describe our proposed LS estimation algorithm, whose asymptotic analysis is investigated in Section 4. The LS estimation algorithm is then tested in Sections 5 and 6 against synthetic and real-world datasets, respectively. Finally, in Section 7 we draw our conclusions.
1.3 Notation
Boldface uppercase and lowercase letters denote matrices and vectors, respectively. is the identity matrix. The transpose of the matrix is denoted by , while indicates its th entry. For the sake of notation compactness we use the notation to define a matrix whose elements are . Finally, the symbol represents the Hadamard product, while calligraphic letters denote sets or graphs.
2 Maximum-likelihood quality estimation
Consider a graph with vertices where each
pair of objects is evaluated times by independent
workers
In our proposed ML approach, the estimate of the ranking can be obtained by sorting the quality estimates which are obtained as follows:
(5) 
When workers are independent of each other and behave similarly, the random variables can be modeled as independent and identically distributed. Therefore, the conditional probability in (5) factorizes as
By using (2) we write
where we recall that . By substituting the above result in (5), the ML estimate of the qualities can be rewritten as
(6)  
where
and . The function has a finite global maximum. Indeed, since , and , it is straightforward to show that . However, in general, is a nonlinear function of and its maximization is non-trivial. Nevertheless, a local maximum can be found by using standard techniques such as, for example, the Newton-Raphson method, which works iteratively and requires the function to be twice differentiable.
Let be the estimate of at iteration . Then the estimate of at iteration can be updated as follows:
where and are, respectively, the gradient and the Hessian matrix of . Specifically, and . In order to compute and consider a generic node and the set of edges connecting node to its neighbors. Then, the function can be rewritten as
(7) 
where the term does not depend on . Since , we can write the partial derivatives of as follows:
and, similarly
It immediately follows that
and
Moreover, for
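To illustrate the Newton-Raphson iteration described in this section, the sketch below makes several assumptions that are not taken from the text. The preference function is assumed to be logistic (as in the BTL model), chosen only because its derivatives have a simple closed form. The log-likelihood is assumed to be the sum, over the compared pairs, of w log F(q_i - q_j) + (m - w) log(1 - F(q_i - q_j)), where w is the number of preferences for i out of m tests. The last node is assumed to be the reference, with quality fixed to 0. All function names are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def ml_newton(n, counts, iters=50, tol=1e-10):
    """Newton-Raphson maximization of the (assumed) logistic log-likelihood.

    counts maps an edge (i, j) to (w, m): preferences for i over j, and tests.
    Node n-1 is the reference node, whose quality stays at 0.
    """
    q = [0.0] * n
    for _ in range(iters):
        g = [0.0] * n
        h = [[0.0] * n for _ in range(n)]
        for (i, j), (w, m) in counts.items():
            f = logistic(q[i] - q[j])
            d1 = w - m * f             # derivative w.r.t. the difference q_i - q_j
            d2 = -m * f * (1.0 - f)    # second derivative w.r.t. the difference
            g[i] += d1
            g[j] -= d1
            h[i][i] += d2
            h[j][j] += d2
            h[i][j] -= d2
            h[j][i] -= d2
        # Drop the reference node so that the reduced Hessian is non-singular.
        reduced = [row[:n - 1] for row in h[:n - 1]]
        step = solve_linear(reduced, g[:n - 1])
        for k in range(n - 1):
            q[k] -= step[k]            # Newton update: q <- q - H^{-1} g
        if max(abs(s) for s in step) < tol:
            break
    return q

# Hypothetical noiseless example: counts equal to their expected values.
q_true = [0.5, -0.3, 0.8, 0.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
counts = {(i, j): (100 * logistic(q_true[i] - q_true[j]), 100) for i, j in edges}
q_ml = ml_newton(4, counts)
```

In this noiseless example the gradient vanishes at the true qualities, so the iteration is expected to return them up to numerical precision. With real counts the result is only a local maximizer, as noted above.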
3 Least-squares quality estimation
We propose a simpler linear estimation algorithm, based on a least-square criterion, that can be applied on the graph . Let the distance between objects and be
and let be the set of binary answers, of cardinality , provided by the workers comparing the pair . Also, let be the number of times object is preferred to object . Then, by construction, follows the binomial distribution , where . Out of the evaluation results, an estimate of is formed as
(8) 
where is the estimate of , and represents the estimation error on the probability . Note that has zero mean and variance . As a consequence, , where represents the error on the estimate of induced by the presence of . From the set of noisy estimates , the estimate of can be obtained by solving the following LS optimization problem
(9) 
where are arbitrary positive weights, whose setting is discussed in Section 3.1. The solution of (9) satisfies the following linear equations:
(10) 
where represents the neighborhood of node (i.e., the set of nodes connected to in ), and is its generalized degree, i.e., . We can compactly express the previous linear system in terms of the matrix associated to the graph , whose elements are defined as
Let be the identity matrix,
, and . Moreover let and
be, respectively, the
antisymmetric matrices of the true and estimated quality
differences
(11) 
where represents the Hadamard product and is a column vector of size . We observe that, by construction, , i.e., is singular. Indeed , as can be easily checked. This implies that the associated linear operator on is not injective and that, given a solution of (10), also is a solution of (10) for any . Note, however, that, for the purposes of object ranking, the actual value of is irrelevant, since every solution of the form induces the same object ranking. Therefore, we can arbitrarily fix the quality of, say, object to 0 as a reference, i.e., . To take this constraint into account, we define the new matrices and as follows:
(12) 
and , respectively. We then replace and in (11) with, respectively, and . Since is full rank, solving for we obtain
(13) 
where we have used the fact that .
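The estimation procedure of this section can be sketched in a few lines of Python under stated assumptions. The LS criterion is assumed to minimize the weighted sum, over the compared pairs, of (q_i - q_j - d_ij)^2, where d_ij is the estimated quality difference obtained by inverting the preference function at the empirical preference frequency. Setting the derivative with respect to each quality to zero makes every non-reference quality the weighted average, over its neighbors, of q_j + d_ij. Instead of the matrix inversion in (13), the sketch solves these equations with Gauss-Seidel sweeps, which is a swapped-in technique chosen to keep the example dependency-free. The Gaussian preference function, the clipping parameter and all names are illustrative assumptions.

```python
from statistics import NormalDist

def ls_qualities(n, counts, sigma=1.0, weights=None, sweeps=2000, clip=1e-3):
    """Least-squares quality estimates on a connected comparison graph.

    counts maps an edge (i, j) to (w, m): preferences for i over j, and tests.
    Node n-1 is the reference node, whose quality is fixed to 0.
    """
    inv_cdf = NormalDist(0.0, sigma).inv_cdf
    nbrs = [[] for _ in range(n)]
    for (i, j), (w, m) in counts.items():
        # Clip the empirical frequency so that the inverse cdf stays finite.
        p_hat = min(max(w / m, clip), 1.0 - clip)
        d_hat = inv_cdf(p_hat)                  # estimated difference q_i - q_j
        omega = 1.0 if weights is None else weights[(i, j)]
        nbrs[i].append((j, omega, d_hat))
        nbrs[j].append((i, omega, -d_hat))      # differences are antisymmetric
    q = [0.0] * n
    for _ in range(sweeps):
        for i in range(n - 1):                  # the reference node is skipped
            rho = sum(om for _, om, _ in nbrs[i])
            q[i] = sum(om * (q[j] + d) for j, om, d in nbrs[i]) / rho
    return q

# Hypothetical noiseless example with a standard Gaussian preference function.
dist = NormalDist(0.0, 1.0)
q_true = [0.5, -0.3, 0.8, 0.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
counts = {(i, j): (1000 * dist.cdf(q_true[i] - q_true[j]), 1000) for i, j in edges}
q_ls = ls_qualities(4, counts)
ranking = sorted(range(4), key=lambda k: -q_ls[k])   # best object first
```

The sketch assumes that the graph is connected, so that every node has at least one neighbor. Passing a dictionary of positive weights selects a weighted variant, while the default corresponds to unit weights.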
3.1 Weight optimization
In the following, we will consider two possible choices for the weights . The first, which will be studied in the next section for its simplicity, corresponds to for all , and will be called unweighted LS or simply LS. The second, which will be called weighted LS (WLS), is dictated by the fact that the estimates do not have the same reliability. Indeed, by expanding (8) to first order for , we obtain
so that, if we neglect the higher-order term, is a zero-mean random variable with variance
Given the values of , , the optimal weights for in (10) are then proportional to . For our WLS algorithm, we will then set , with
where , for a small positive parameter such that exists finite. Note that, under this setting
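As an illustration of inverse-variance weighting, the sketch below computes a weight for one edge from a first-order (delta-method) approximation: if the empirical frequency has variance p(1 - p)/m, the estimated difference obtained through the inverse preference function has approximate variance p(1 - p)/(m F'(d)^2), and the weight is taken as its reciprocal. This formula is a reconstruction under the assumption of a Gaussian preference function, not a transcription of the expressions of this section. The clipping parameter and the function name are also assumptions.

```python
from statistics import NormalDist

def wls_weight(w, m, sigma=1.0, clip=1e-3):
    """Approximate inverse variance of the estimated quality difference.

    w is the number of preferences for the first object out of m tests.
    """
    dist = NormalDist(0.0, sigma)
    # Clip the empirical frequency so that the inverse cdf stays finite.
    p = min(max(w / m, clip), 1.0 - clip)
    d_hat = dist.inv_cdf(p)        # estimated quality difference
    slope = dist.pdf(d_hat)        # derivative of the cdf at d_hat
    return m * slope ** 2 / (p * (1.0 - p))
```

Under these assumptions, edges whose empirical frequency is close to 1/2 (objects of similar quality) receive larger weights than edges whose outcome is almost deterministic.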
4 Asymptotic analysis of the Least-Square Estimator
All the theoretical results in this section are obtained by considering the unweighted LS estimator, for simplicity. However, they can be extended to the general weighted case as long as is bounded away from 0, as for the case described in Section 3.1.
The following propositions derive the conditions for the asymptotic convergence of the estimated qualities to their true values. We start by presenting a preliminary asymptotic result on the mean square error.
Proposition 4.1
Consider the unweighted LS estimator in (10). Assume that the degrees of the nodes of the graph are upper-bounded and define . Then the mean square error (MSE) on the estimates can be bounded by
(14) 
where is a constant, is the largest eigenvalue of and for a sufficiently large .
The proof is provided in Appendix A.
Even if an expression similar to (14) is reported in [10], we recall that the latter was derived for perfect ML estimators under the assumption that is log-concave; our results, instead, apply to the LS algorithm for a generic strictly increasing . Furthermore, (14) complements and extends results in [11] under more general settings (we recall that results in [11] apply to the BTL model only). It is also to be noted that the theoretical results in [11] only apply to the regime where is large, i.e., . Under such a constraint, for any connected graph, the total complexity of the algorithm in terms of number of comparisons is at least .
From (14), we can deduce that, whenever is bounded (as for example in the case of Ramanujan graphs), by symmetry, , . Thus, if for , then converges in probability to .
To find out the minimum number of comparisons under which the LS approach satisfies the PAC conditions, we need to evaluate for . The following proposition gives sufficient conditions in order for the absolute error to converge to zero in the properly defined limiting regime.
Proposition 4.2
Consider the unweighted LS estimator in (10). For any , as grows, , provided that

(i.e., the norm of is bounded),

the total number of edges of is , and for some .
Assumption i) can be weakened by the following condition i’):

.
The proof is provided in Appendix B.
Remark 4.1
4.1 Considerations on graphs structure
Proposition 4.2 grants that the absolute error can be well controlled as under some conditions on the matrix (condition (i) or (i’)). Such conditions hold depending on the structure of the graph . In order to characterize the class of graphs for which condition (i) or condition (i’) holds, we first observe that (13) computes the quality of object as the average value of the sum of estimated quality differences along all paths joining node to the reference node . In other words, can be regarded as the average total reward earned by a standard random walk that starts from node and stops as soon as it hits node , when estimates are the elementary rewards associated with graph edges [20]. Then, in Section 4.1.1 we restrict our asymptotic analysis to directed and acyclic graphs. Finally, in Section 4.1.2 we extend it to the more general class of undirected graphs.
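The random-walk interpretation can be illustrated with a short simulation. In the sketch below, a standard random walk starts from a node, moves to a uniformly chosen neighbor at each step, collects the reward attached to the traversed edge (taken here as the quality difference between the current and the next node) and stops at the reference. The graph, the qualities and the function name are hypothetical.

```python
import random

def random_walk_reward(start, ref, nbrs, reward, rng):
    """Total reward collected by a standard random walk from start until it hits ref."""
    node, total = start, 0.0
    while node != ref:
        nxt = rng.choice(nbrs[node])       # uniform choice among the neighbors
        total += reward[(node, nxt)]       # reward of the edge traversed node -> nxt
        node = nxt
    return total

# Hypothetical example: a ring of four nodes, node 3 being the reference.
q_true = [0.5, -0.3, 0.8, 0.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
nbrs = {k: [] for k in range(4)}
reward = {}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)
    reward[(i, j)] = q_true[i] - q_true[j]   # noiseless quality difference
    reward[(j, i)] = q_true[j] - q_true[i]   # antisymmetric reward

walk_total = random_walk_reward(0, 3, nbrs, reward, random.Random(1))
```

With noiseless rewards the sum telescopes along any path, so every walk from node 0 returns the quality of node 0 relative to the reference. With noisy rewards different walks return different totals, and it is their average that plays the role of the quality estimate.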
4.1.1 The graph is directed and acyclic
An explicit solution of (10) can be given when graph is turned into a directed graph, i.e., by imposing to all edges one of the two possible directions. While this assumption is suboptimal, since it constrains the random walk to a subset of possible trajectories, it greatly simplifies the analysis. Indeed, first observe that, for directed graphs, (10) can be rewritten as:
where represents the set of in-neighbors of and . Then, when the graph is directed and acyclic, and has the reference node, , as a common ancestor, an explicit solution for , , is
(15) 
where
and is the length of the longest (simple) path from node to the reference node. Proposition 4.3 gives sufficient conditions for a directed acyclic graph to meet the requirements of Proposition 4.2. The proposition exploits the notion of proximality between nodes according to the following definition:
Definition 4.1
Given a family of graphs , we say that a node is proximal to the reference node , with parameters , if a random walk starting from reaches the reference node within hops with a probability that is asymptotically (with ) bounded below by .
Proposition 4.3
Given a family of directed and acyclic graphs with bounded diameter, condition (i’) of Proposition 4.2 is satisfied if one of the following three conditions is met: (i) all paths from any node to the reference have bounded length, (ii) , or (iii) a fraction bounded away from 0 of the in-neighbors of any node is proximal for some and .
The proof is provided in Appendix C.
4.1.2 The graph is undirected
Now, let us go back to the original formulation (10) on the undirected graph. In the following, we will show that, considered from the point of view of a given node, the solution of (10) for an undirected graph can be obtained by defining an equivalent problem for a properly defined directed acyclic graph. Consider the graph on nodes and let be the matrix obtained from matrix by removing the last row and column (i.e., those corresponding to the reference node ). Consider a given node , , and notice that gives the average number of times that node is visited in the random walk starting from , before ending in the reference node [21]. Let
(16) 
be the average number of times any edge incident to node is traversed in the direction from to its neighbors, in the standard random walk defined on . Now, define a directed graph , where if and only if and one of the two following conditions is satisfied: (i) and , or (ii) , , and .
Notice that . Let be the set of in-neighbors of in . It can be easily verified that in node has only in-neighbors and node (the reference node) has only out-neighbors. It is also easy to prove that is acyclic. Indeed, suppose that the cycle belongs to . This implies that, by definition, , which is impossible.
Let us also define a biased random walk on digraph , for which, given that the current node is , the probability of taking outgoing edge , is given by
(17) 
The following proposition relates the standard random walk on to the biased random walk on .
Proposition 4.4
The proof is provided in Appendix D.
According to Proposition 4.4, can be equivalently seen as the average total reward of the standard random walk on graph or as the average total reward of the biased random walk on graph . The following proposition gives sufficient conditions for a family of graphs to meet the conditions of Proposition 4.2.
Proposition 4.5
Given a family of graphs with bounded diameter, condition i’) of Proposition 4.2 is satisfied if, for each node , one of the following conditions is satisfied: (i) all paths in from to the reference have bounded length, (ii) in , a fraction bounded away from 0 of the in-neighbors of any node is proximal.
The proof is provided in Appendix E.
Example 4.1
Consider the family of complete graphs on nodes, i.e.,
.
Example 4.2
Let and be any two positive numbers. Let us build the family of graphs as follows. Nodes (a set that includes the reference) are “hubs” with potentially unbounded degree. The subgraph induced by the hub nodes is a connected arbitrary graph. The remaining nodes are divided into subsets . Subset , , is composed of nodes with maximum degree , which are neighbors of hub node and whose other neighbors all belong to . It is easy to see that, for this family of graphs, the diameter is bounded by .
Consider a node . Since all paths that reach the reference must pass through the hub nodes, it is easy to see that, in , node is connected only to nodes belonging to . Once the biased random walk on leaves (by reaching hub node ), it does not enter it any more. Thus, we can divide the biased random walk into two parts: the first on the subgraph of induced by , where hub node serves as reference, and the second on the hub nodes. Then, we can deduce the following facts.

In the first part of the random walk, since hub node is the reference, the probability of reaching it in one step from any node in is larger than . Thus, the probability of reaching it within steps is lower-bounded by .

The second part of the random walk lasts for at most steps.
Thus, every node is proximal with parameters , and condition ii) of Prop. 4.5 is satisfied.
Remark 4.2
Although our unweighted LS estimator is akin to the one in [11], our analysis of its performance differs substantially, also because we consider the PAC approach. Consequently, our characterization of “good” graphs does not coincide with that of [11]. For instance, a particular case of Example 4.2 is the wheel graph, which corresponds to choosing (the reference node as the only hub) and . From [11, Theorem 1], the wheel graph would require comparisons per edge in order for the upper bound on the estimation error to hold, since every edge belongs to a simple path from any node to the reference. Instead, Prop. 4.5 allows us to conclude that is enough to achieve the PAC.
Example 4.3
Star graphs represent a particular sequence of acyclic graphs with bounded-length paths. Therefore, they satisfy condition i) of Prop. 4.5. In such a case, an object (let us say object 1) is taken as pivot (i.e., the center of the star) and the qualities of all the other objects are estimated only through direct comparisons with the pivot. Observe that, in this particular case, the ranking among objects can be directly inferred from without the necessity of inverting function . In practice, it is enough to rank objects according to the following rules: iff and iff . Therefore, star graphs are appealing when function (i.e., the precise worker model) is not known.
5 Results with synthetic datasets
We present numerical results showing the performance of our proposed
algorithms for moderate values of . In Figures 1
and 2 we compare the error probability achieved by
several ranking algorithms versus the complexity per object
. Object qualities are equally spaced in the range ,
i.e., object has quality where
(Fig. 1) and (Fig. 2).
Workers’ behavior is described by the Thurstone model detailed in
Section 1.1 where and
is the cdf of a Gaussian random variable with zero mean and
standard deviation . On the axis we display the
empirical probability of generating an output which is not an
quality ranking, for . Note that an error
is counted whenever at least two objects, whose quality difference
exceeds , appear swapped in the estimated ranking. The
curve labeled “MergeRank” refers to the MergeRank algorithm
proposed in [22], which we consider as
a performance reference. The LS, WLS
Figure 3 compares the performance of the LS and of the WLS algorithms for objects. Object qualities are randomly generated according to a uniform distribution in . Other system parameters are set as in Figure 2. The figure reports the empirical error probability plotted versus the number of tests per node, , for different values of the degree of the nodes in the graph. We first observe that, given , the number of tests per edge of the graph decreases as the degree, , increases. Hence, as increases, distances between pairs of nodes (corresponding to edges of the graph) are estimated with a decreasing accuracy. In spite of that, a larger number of neighbors for each node (i.e., a larger ) leads to a more reliable estimation of object qualities. This effect is more evident when the WLS algorithm is employed. Indeed, because of the weights , as increases, WLS is able to exploit the increasing number of highly reliable edges in the graph connecting objects with similar qualities; at the same time, WLS is able to limit the impact of the greater number of scarcely reliable edges that connect objects with largely different qualities.
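The error event used in these experiments, namely that at least two objects whose true quality difference exceeds a tolerance appear swapped in the estimated ranking, can be checked as in the following sketch. The function name and the convention that a tie in the estimates does not count as a swap are assumptions.

```python
def is_eps_quality_ranking(q_true, q_est, eps):
    """Return True if no pair with true quality gap above eps is swapped."""
    n = len(q_true)
    for i in range(n):
        for j in range(n):
            # i is truly better than j by more than eps, yet estimated worse.
            if q_true[i] - q_true[j] > eps and q_est[i] < q_est[j]:
                return False
    return True
```

The empirical error probability is then the fraction of simulation runs for which this function returns False.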
5.1 Adaptive multistage approach
The performance of the proposed ranking algorithms can be improved by adopting a multistage approach where, at each stage, new edges are added to the graph, depending on the quality estimates obtained at the previous stage. The rationale of this approach stems from the fact that such algorithms provide approximate rankings, in which the probability of swapping the order of two objects increases as their distance (in terms of their qualities) decreases. Therefore, in order to mitigate this phenomenon and, thus, improve the reliability of the estimate, it is convenient to (i) add to the graph extra edges connecting neighboring objects (in terms of their estimated qualities); (ii) assign additional workers to the already existing edges connecting the aforementioned neighboring objects. This procedure can be iterated until a desired performance level is achieved.
In our simulation setup, we have considered a two-stage approach where we first apply the estimation algorithm to a random regular graph, , of degree , obtaining the vector of estimates . In the second stage, we create a new regular graph, of degree , where each node is connected to its closest neighbors, according to the estimates . Finally, the estimation algorithm is applied to the graph , obtaining the output , which is used to infer the ranking. In Figure 4 we show the performance of the ML and WLS algorithms when the proposed multistage approach is employed. For both algorithms we show the error probability versus the number of tests per object, , for , and . We observe that the second stage allows for a significant improvement of the performance and a reduction of about 60% of the required tests per object for and of about 30% for . In both cases the performance of the WLS algorithm is very close to that provided by the ML algorithm.
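One possible way to build the second-stage edge set is sketched below: objects are sorted by their first-stage estimates and each object is connected to the objects that immediately follow it in the sorted order. This is a simplification adopted here for illustration: objects at the two ends of the ranking get fewer neighbors, so the resulting graph is not exactly regular. The function name and the parameter half_degree are assumptions.

```python
def second_stage_edges(q_est, half_degree):
    """Connect each object to its nearest neighbors in estimated quality.

    Each object is linked to the half_degree objects that follow it in the
    order induced by q_est (and hence to those that precede it).
    """
    order = sorted(range(len(q_est)), key=lambda k: q_est[k])
    edges = set()
    for pos, node in enumerate(order):
        for step in range(1, half_degree + 1):
            if pos + step < len(order):
                other = order[pos + step]
                edges.add((min(node, other), max(node, other)))
    return edges
```

The second-stage estimation can then be run on the union of these edges with the first-stage graph, or on the new edges alone, depending on how the test budget is split between the two stages.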
6 Results with real-world datasets
In this section, we show that our algorithm works well even when
considering a real scenario, where the “evaluations” are the outcome
of experiments, and not synthetically generated by simulations. In
particular, we consider five recent seasons of the English Premier
League and build up a complete graph, where nodes are the
football teams and edges are the matches between each pair of
them. The match between team and team is considered as lasting
for 180 minutes, since it includes both the round when is at home
and the round where is away. If team has scored goals
in the match against team , we count
evaluations in favor of when
compared to , where and are
constant. The total number of comparisons between and is then
simply
The WLS algorithm has been run with and both the Thurstone and BTL models, to see the influence of the underlying worker model. The true ranking is assumed to be the final season ranking. The results have been plotted in terms of the Kendall tau distance, which counts the number of inversions in the estimated ranking with respect to the true ranking, i.e. the number of pairs for which is ranked better than in the true ranking and worse than in the estimated one.
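The Kendall tau distance described above can be computed directly from its definition by counting discordant pairs. In the sketch below, each ranking is given as a list of positions (rank of object k at index k); the function name and this input format are assumptions.

```python
from itertools import combinations

def kendall_tau_distance(true_rank, est_rank):
    """Number of object pairs ordered differently in the two rankings."""
    return sum(
        1
        for i, j in combinations(range(len(true_rank)), 2)
        # The pair is discordant when the two rank differences have opposite signs.
        if (true_rank[i] - true_rank[j]) * (est_rank[i] - est_rank[j]) < 0
    )
```

The distance is 0 when the two rankings coincide and reaches n(n - 1)/2 for n objects when one ranking is the reverse of the other.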