Sync-Rank: Robust Ranking, Constrained Ranking and Rank Aggregation via Eigenvector and SDP Synchronization

Mihai Cucuringu Department of Mathematics, UCLA, 520 Portola Plaza, Mathematical Sciences Building 6363, Los Angeles, CA 90095-1555, email: mihai@math.ucla.edu
Abstract

We consider the classic problem of establishing a statistical ranking of a set of items given a set of inconsistent and incomplete pairwise comparisons between such items. Instantiations of this problem occur in numerous applications in data analysis (e.g., ranking teams in sports data), computer vision, and machine learning. We formulate the above problem of ranking with incomplete noisy information as an instance of the group synchronization problem over the group SO(2) of planar rotations, whose usefulness has been demonstrated in numerous applications in recent years in computer vision and graphics, sensor network localization and structural biology. Its least squares solution can be approximated by either a spectral or a semidefinite programming (SDP) relaxation, followed by a rounding procedure. We show extensive numerical simulations on both synthetic and real-world data sets (Premier League soccer games, a Halo 2 game tournament and NCAA College Basketball games), which show that our proposed method compares favorably to other ranking methods from the recent literature. Existing theoretical guarantees on the group synchronization problem imply lower bounds on the largest amount of noise permissible in the data while still achieving exact recovery of the ground truth ranking. We propose a similar synchronization-based algorithm for the rank-aggregation problem, which integrates in a globally consistent ranking many pairwise rank-offsets or partial rankings, given by different rating systems on the same set of items, an approach which yields significantly more accurate results than other aggregation methods, including Rank-Centrality, a recent state-of-the-art algorithm. Furthermore, we discuss the problem of semi-supervised ranking when there is available information on the ground truth rank of a subset of players, and propose an algorithm based on SDP which is able to recover the ranking of the remaining players, subject to such hard constraints. 
Finally, synchronization-based ranking, combined with a spectral technique for the densest subgraph problem, makes it possible to extract locally-consistent partial rankings, in other words, to identify the rank of a small subset of players whose pairwise rank comparisons are less noisy than the rest of the data, which other methods are not able to identify. We discuss a number of related open questions and variations of the ranking problem in other settings, which we defer for future investigation.

July 3, 2019

Key words. ranking, angular synchronization, spectral algorithms, semidefinite programming, rank aggregation, partial rankings, least squares, singular value decomposition, densest subgraph problem.

1 Introduction

We consider the problem of ranking a set of $n$ players, given ordinal or cardinal pairwise comparisons on their rank offsets. In most practical applications, such information is incomplete, especially in the setting where $n$ is large, and the available data is very noisy, meaning that a large fraction of the pairwise measurements are both incorrect and inconsistent with respect to the existence of an underlying total ordering. In such scenarios, one can at most hope to recover a total (or partial) ordering that is as consistent as possible with the available noisy measurements. For instance, at sports tournaments where all pairs of players meet, in most circumstances the outcomes contain cycles (A beats B, B beats C, and C beats A), and one seeks to recover a ranking that minimizes the number of upsets, where an upset is a pair of players for which the higher ranked player is beaten by the lower ranked player.
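To make the notion of an upset concrete, the following minimal Python sketch counts upsets for a candidate ranking. The function name `count_upsets`, the score-vector representation of a ranking, and the sign convention (a positive entry `C[i, j]` means that player `i` beat player `j`, and a zero entry means the pair was not compared) are assumptions made for illustration only.

```python
import numpy as np

def count_upsets(scores, C):
    """Count compared pairs whose outcome contradicts the order induced by scores.

    Assumed convention: C[i, j] > 0 means i beat j, C[i, j] == 0 means the
    pair (i, j) was not compared, and a higher score means a better rank.
    """
    scores = np.asarray(scores, dtype=float)
    C = np.asarray(C, dtype=float)
    predicted = np.sign(scores[:, None] - scores[None, :])
    observed = np.sign(C)
    iu = np.triu_indices(len(scores), k=1)   # visit each unordered pair once
    compared = observed[iu] != 0
    return int(np.sum(predicted[iu][compared] != observed[iu][compared]))

# The cycle from the text: A beats B, B beats C, and C beats A.
C_cycle = np.array([[ 0,  1, -1],
                    [-1,  0,  1],
                    [ 1, -1,  0]])
upsets_abc = count_upsets([3, 2, 1], C_cycle)   # candidate ranking A > B > C
```

Because the three outcomes form a cycle, no ordering of A, B and C is free of upsets; the candidate ranking above leaves exactly one (C beating A).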

Due to the sheer size of today's data sets, the set of available comparisons between the items is very sparse, with much, or even most, of the data being incomplete, thus rendering the ranking problem considerably harder. In addition, the available measurements are not uniformly distributed around the network, a fact which can significantly affect the ranking procedure. Similarly, the noise in the data may not be distributed uniformly throughout the network, with part of the network containing pairwise measurements that are a lot less noisy than the rest of the network, which provides an opportunity to recover partial rankings that are locally consistent with the given data. We investigate this possibility in Appendix LABEL:sec:PartialRank, and show how our proposed method can be combined with recent spectral algorithms for detecting planted cliques or dense subgraphs in a graph. Furthermore, in many scenarios, the available data is governed by an underlying (static or dynamic) complex network, whose structural properties can play a crucial role in the accuracy of the ranking process if exploited accordingly, as in the case of recent work on time-aware ranking in dynamic citation networks [29].

The analysis of many modern large-scale data sets implicitly requires various forms of ranking to allow for the identification of the most important entries, for efficient computation of search and sort operations, or for extraction of main features. Instances of such problems are abundant in various disciplines, especially in modern internet-related applications such as the famous search engine provided by Google [43, 54], eBay’s feedback-based reputation mechanism [65], Amazon’s Mechanical Turk (MTurk) system for crowdsourcing which enables individuals and businesses to coordinate the use of human labor to perform various tasks [55, 34], the popular movie recommendation system provided by Netflix [16], the Cite-Seer network of citations [30], or for ranking of college football teams [32].

Another setting which can be reduced to the ranking problem comes up in the area of exchange economic systems, where an item $i$ can be exchanged for an item $j$ at a given rate; for example, 1 unit of item $i$ is worth $a_{ij}$ units of item $j$. Such information can be collected in the form of a (possibly incomplete) exchange matrix $A = (a_{ij})$, which is a reciprocal matrix since $a_{ij} = 1/a_{ji}$ for non-zero entries $a_{ij}$. Such reciprocal matrices have been studied since the 1970s with the work of Saaty [56] in the context of paired preference aggregation systems, and more recently by Ma for asset pricing in foreign exchange markets [48]. In this setup, the goal is to compute a universal (non-negative) value $\pi_i$ associated to each item $i$, such that $a_{ij} = \pi_i / \pi_j$, which can easily be seen as equivalent to the pairwise ranking problem via the logarithmic map $C_{ij} = \log a_{ij}$, whenever $a_{ij} \neq 0$. In other words, the available pairwise measurement $C_{ij}$ is a, perhaps noisy, measurement of the offset $\log \pi_i - \log \pi_j$. Note that in arbitrage-free economic systems, the triangular condition $a_{ij} a_{jk} = a_{ik}$ is always satisfied.
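The reduction from exchange rates to rank offsets can be illustrated with a short Python sketch. It assumes the convention that one unit of item $i$ is worth $a_{ij} = \pi_i / \pi_j$ units of item $j$; the value vector `pi` and all variable names are hypothetical and chosen only for this example.

```python
import numpy as np

# Hypothetical item values (an assumption for illustration only).
pi = np.array([4.0, 2.0, 1.0])

# Arbitrage-free exchange matrix: one unit of item i buys a_ij = pi_i / pi_j units of item j.
A = pi[:, None] / pi[None, :]

# The logarithmic map turns exchange rates into pairwise offsets log(pi_i) - log(pi_j).
C = np.log(A)

# In the noiseless, complete case the row means recover log(pi) up to an additive constant.
recovered = C.mean(axis=1)
```

In this noiseless setting `A` is reciprocal, `C` is skew-symmetric, and sorting `recovered` orders the items by value. With noisy or missing rates the row-mean step is no longer exact, which is where a robust ranking method becomes relevant.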

Traditional ranking methods, most often coming from the social choice theory literature, have proven less efficient in dealing with today's data, for several reasons. Most of this literature has been developed with ordinal comparisons in mind, while much of the current data deals with cardinal (numerical) scores for the pairwise comparisons. However, it is also true that in certain applications such as movie or music rating systems, it is somewhat more natural to express preferences in relative terms (e.g., movie $X$ is better than movie $Y$) rather than in absolute terms (e.g., $X$ and $Y$ should each be assigned a specific rank, or $X$ is better than $Y$ by 6 units). In other applications, however, such as sports, the outcome of a match is often a score; for example, in soccer we know the goal difference by which team $i$ beat team $j$. Along the same lines, in many instances one is often interested not only in recovering the ordering of the items, but also in associating a score to each of the items themselves, which reflects the level of accuracy or the intensity of the proposed preference ordering. For example, the TrueSkill ranking algorithm developed by Microsoft Research assigns scores to online gamers based on the outcome of games played between pairs of players. Every time a player competes in a new game, the ranking engine updates his estimated score and the associated level of confidence, and in doing so, it learns the underlying inherent skill parameters each player is assumed to have.

There exists a very rich literature on ranking, which dates back as early as the 1940s with the seminal work of Kendall and Smith [38], who were interested in recovering the ranking of a set of players from pairwise comparisons reflecting a total ordering. Perhaps the most popular ranking algorithm to date is the famous PageRank [54], used by Google to rank web pages in increasing order of their relevance, based on the hyperlink structure of the Web graph. On a related note, Kleinberg’s HITS algorithm [40] is another website ranking algorithm in the spirit of PageRank, based on identifying good authorities and hubs for a given topic queried by the user, and which assigns two numbers to a web page: an authority and a hub weight, which are defined recursively. A higher authority weight occurs if the page is pointed to by pages with high hub weights. And similarly, a higher hub weight occurs if the page points to many pages that have high authority weights.

In another line of work [14], Braverman and Mossel proposed an algorithm which outputs an ordering based on pairwise comparisons on adaptively selected pairs. Their model assumes that there exists an underlying true ranking of the players, and the available data comes in the form of noisy comparison results, in which the true ordering of a queried pair is revealed with probability $\frac{1}{2} + \gamma$, for some parameter $\gamma$ which does not depend on the pair of players that compete against each other. However, such a noise model is somewhat unrealistic in certain instances like chess matches or other sporting events, in the sense that the noisy outcome of the comparison does not depend on the strength of the opponents that are competing (i.e., on their underlying skill level) [21].

A very common approach in the rank aggregation literature is to treat the input rankings as data generated from a probabilistic model, and then learn the Maximum Likelihood Estimator (MLE) of the input data, an idea which has been explored in depth in both the machine learning and computational social choice theory communities. Examples include the Bradley-Terry-Luce (BTL) model [13], the Plackett-Luce (PL) model [47], the Mallows-Condorcet model [49, 15], and the Random Utility Model (RUM) [60]. The BTL model has found numerous applications in recent years, including pricing in the airline industry [59], or analysis of professional basketball results [41]. Much of the related research within the machine learning community has focused on the development of computationally efficient algorithms to estimate parameters for some of the above popular models, a line of work commonly referred to as learning to rank [45]. We refer the reader to the recent book of Liu [46], for a comprehensive review of the main approaches to learning to rank, and how one may leverage tools from the machine learning community to improve on and propose new ranking models. Another such example is the earlier work of Freund et al. [28], who proposed Rank-Boost, an efficient algorithm for combining preferences based on the boosting approach from machine learning.

Soufiani et al. propose a class of efficient Generalized Method-of-Moments algorithms for computing parameters of the Plackett-Luce model, by breaking the full rankings into pairwise comparisons, and then computing the parameters that satisfy a set of generalized moment conditions [5]. Of independent interest is the approach of breaking full rankings into pairwise comparisons, since the input to the synchronization-based approach proposed in this paper consists of pairwise comparisons. This technique of rank breaking was explored in more depth by a subset of the same set of authors [6], with a focus on the consistency of their proposed breaking methods for a variety of models, together with fast algorithms for estimating the parameters.

Other related work includes [36], whose authors propose to adaptively select pairwise comparisons, an approach which, under certain assumptions, recovers the underlying ranking with far fewer measurements than the more naive approach of choosing at random. Kenyon-Mathieu and Schudy [39] propose a polynomial time approximation scheme (PTAS) for the minimum feedback arc set problem on tournaments¹, an NP-hard problem which arises at tournaments where all pairs of players meet and one seeks to recover a ranking that minimizes the number of upsets. Very computationally efficient methods based on simple scoring methodologies that come with certain guarantees have existed since the work of Huber in the 1960s [35] (based on a simple row-sum procedure), and very recently Wauthier et al. [64] proposed such a method in the context of ranking from a random sample of binary comparisons, which can also account for whether one seeks an approximately uniform quality across the ranking, or more accuracy near the top of the ranking than the bottom.

¹ When all pairwise comparisons between a set of players are available, the data can be conveniently represented as a directed complete graph, referred to as a tournament graph in the theoretical computer science literature. Such scenarios are very common in practice, especially in sports, where in a round robin tournament every two players meet, and the direction of each edge encodes the outcome of the match, or, in a more general setting, a preference relation between a set of items.

The idea of angular embedding, which we exploit in this paper, is not new, and aside from recent work by Singer in the context of the angular synchronization problem [57], has also been explored by Yu [66], who observes that embedding in the angular space is significantly more robust to outliers when compared to embedding in the linear space. In the latter case, the traditional least squares formulation, or its $\ell_1$ norm formulation, cannot match the performance of the angular embedding approach, thus suggesting that the key to overcoming outliers comes not with imposing additional constraints on the solution, but by adaptively penalizing the inconsistencies between the measurements themselves. Yu's proposed spectral method returns very satisfactory results in terms of robustness to noise when applied to an image reconstruction problem. In recent work, Braxton et al. [53] propose an $\ell_1$-norm formulation for the statistical ranking problem, for the case of cardinal data, which provides an alternative to previous recent work utilizing an $\ell_2$-norm formulation, as in [33] and [37].

In very recent work [51], Negahban et al. propose an iterative algorithm for the rank aggregation problem of integrating ranking information from multiple ranking systems, by estimating scores for the items from the stationary distribution of a certain random walk on the graph of items, where each edge encodes the outcome of pairwise comparisons. We summarize their approach in Section LABEL:secsec:RankCent, and compare against it in the setting of the rank aggregation problem. However, for the case of a single rating system, we propose and compare to a variant of their algorithm. Their work addresses several shortfalls of earlier work [2] by a subset of the same authors, who view the available comparison data as partial samples from an unknown distribution over permutations, and reduce ranking to performing inference on this distribution, but in doing so, assume that the comparisons between all pairs of items are available, a less realistic constraint in most practical applications.

In other very recent work [26], which we briefly summarize in Section LABEL:secsec:SER, Fogel et al. propose a ranking algorithm given noisy incomplete pairwise comparisons by making an explicit connection to another very popular ordering problem, namely seriation [4], where one is given a similarity matrix between a set of items and assumes that a total order exists and aims to order the items along a chain such that the similarity between the items decreases with their distance along this chain. Furthermore, they demonstrate in [27] the applicability of the same seriation paradigm to the setup of semi-supervised ranking, where additional structural constraints are imposed on the solution.

Contribution of our paper. The contribution of our present work can be summarized as follows.

  • We make an explicit connection between ranking and the angular synchronization problem, and use existing spectral and SDP relaxations for the latter problem to compute robust global rankings.

  • We perform a very extensive set of numerical simulations comparing our proposed method with existing state-of-the-art algorithms from the ranking literature, across a variety of synthetic measurement graphs and noise models, both for numerical (cardinal) and binary (ordinal) pairwise comparisons between the players. In addition, we compare the algorithms on three real data sets: the outcome of soccer games in the English Premier League, a Microsoft tournament for the computer game Halo 2, and NCAA College Basketball games. Overall, we compare (in most instances, favorably) to the two recently proposed state-of-the-art algorithms, Serial-Rank [26], and Rank-Centrality [51], aside from the more traditional Least-Squares method.

  • Furthermore, we propose and compare to a very simple ranking method based on Singular Value Decomposition, which may be of independent interest as its performance (which we currently investigate theoretically in a separate ongoing work) is comparable to that of a recent state-of-the-art method.

  • We propose a method for ranking in the semi-supervised setting where a subset of the players have a prescribed rank to be enforced as a hard constraint.

  • We also adjust the synchronization approach to the setting of the rank aggregation problem of integrating ranking information from multiple rating systems that provide independent, incomplete and inconsistent pairwise comparisons for the same set of players, with the goal of producing a single global ranking.

  • Finally, we show that by combining Sync-Rank with recent algorithms for the planted clique and densest subgraph problem, we are able to identify planted locally-consistent partial rankings, which other methods are not able to extract.

The advantage of the synchronization-based ranking algorithm (Sync-Rank) stems from the fact that it is a computationally simple, non-iterative algorithm that is model independent and relies exclusively on the available data, which may come as either pairwise ordinal or cardinal comparisons. Existing theoretical guarantees from the recent literature on the group synchronization problem [57, 8, 9, 10] trivially translate to lower bounds for the largest amount of noise permissible in the measurements that would still allow for an exact or almost exact recovery of the underlying ground truth ranking. We point out that a perfect recovery of the angles in the angular synchronization problem is not a necessary condition for a perfect recovery of the underlying ground truth ranking, since it suffices that only the relative ordering of the angles is preserved.

The remainder of this paper is organized as follows. Section LABEL:sec:OtherMethods summarizes related methods against which we compare, with a focus on the recent Serial-Rank algorithm. Section LABEL:sec:GroupAngSync is a review of the angular synchronization problem and existing results from the literature. Section LABEL:sec:Sync-Rank describes the Sync-Rank algorithm for ranking via eigenvector and SDP-based synchronization. Section LABEL:sec:numexp provides an extensive numerical comparison of Sync-Rank with other methods from the literature. Section LABEL:sec:RankAggregation considers the rank aggregation problem, which we solve efficiently via the same spectral and SDP relaxations of the angular synchronization problem. In Section LABEL:sec:constRanking we consider the constrained ranking problem and propose to solve it via a modified SDP-based synchronization algorithm. Section LABEL:sec:varOpen summarizes several variations and open problems related to ranking, while Section LABEL:sec:summaryDisc is a summary and discussion. Appendix LABEL:sec:PartialRank proposes an algorithm for extracting locally-consistent partial rankings from comparison data. Appendix LABEL:sec:appSER summarizes the recently proposed Serial-Rank algorithm. Finally, in Appendix LABEL:sec:appEngland we provide additional numerical results for the English Premier League soccer data set.

2 Related Methods

In this section, we briefly summarize the Serial-Rank algorithm recently introduced in [26], which performs spectral ranking via seriation and was shown to compare favorably to other classical ranking methods, some of which we discussed in the Introductory section. In addition, we summarize the very recent Rank-Centrality algorithm proposed by Negahban et al. [51], which we used for the rank aggregation problem discussed in Section LABEL:sec:RankAggregation, and also propose a modification of it for the setting of a single rating system, making it amenable to both cardinal and ordinal comparisons. Finally, we consider two other approaches for obtaining a global ranking based on Singular Value Decomposition (SVD) and the popular method of Least Squares (LS).

2.1 Serial-Rank and Generalized Linear Models

In very recent work [26], Fogel et al. propose a seriation algorithm for ranking a set of players given noisy incomplete pairwise comparisons between the players. The gist of their approach is to assign similar rankings to players that compare similarly with all other players. They do so by constructing a similarity matrix from the available pairwise comparisons, relying on existing seriation methods to reorder the similarity matrix and thus recover the final rankings. The authors make an explicit connection between the ranking problem and another related classical ordering problem, namely seriation, where one is given a similarity matrix between a set of items and assumes that the items have an underlying ordering on the line such that the similarity between items decreases with their distance. In other words, the more similar two items are, the closer they should be in the proposed solution. By and large, the goal of the seriation problem is to recover the underlying linear ordering based on unsorted, inconsistent and incomplete pairwise similarity information. We briefly summarize their approach in Appendix LABEL:sec:appSER.

2.2 Ranking via Singular Value Decomposition

An additional ranking method we propose, and compare against, is based on the traditional Singular Value Decomposition (SVD) method. The applicability of the SVD-Rank approach stems from the observation that, in the case of cardinal measurements (LABEL:cardComp), the noiseless matrix of rank offsets $C = (C_{ij})$, with $C_{ij} = r_i - r_j$, is a skew-symmetric matrix of even rank 2 since

\begin{equation}
C = r \mathbf{1}^T - \mathbf{1} r^T, \tag{2.1}
\end{equation}

where $r$ is the vector of ranks and $\mathbf{1}$ denotes the all-ones column vector. In the noisy case, $C$ is a random perturbation of a rank-2 matrix. We consider the top two singular vectors of $C$, order their entries by their size, extract the resulting rankings, and choose between the first and second singular vector based on whichever one minimizes the number of upsets. Note that since the singular vectors are determined only up to a global sign, we (again) choose the ordering which minimizes the number of upsets. Though a rather naive approach, SVD-Rank returns, under the multiplicative uniform noise model, results that are comparable to those of very recent algorithms such as Serial-Rank [26] and Rank-Centrality [51]. A previous application of SVD to ranking has been explored in Gleich and Zhukov [31], for studying relational data as well as developing a method for interactive refinement of the search results. To the best of our knowledge, no other work considers SVD-based ranking for the setting considered in this paper. An interesting research direction, which we are pursuing in ongoing work, is to analyze the performance of SVD-Rank using tools from the random matrix theory literature on rank-2 deformations of random matrices [11].
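The SVD-Rank procedure described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch rather than the implementation used in the experiments: the function names `svd_rank` and `count_upsets`, the convention that $C_{ij} = r_i - r_j > 0$ means player $i$ is ranked above player $j$, and the toy rank vector `r_true` are all assumptions.

```python
import numpy as np

def count_upsets(scores, C):
    """Number of compared pairs whose sign in C disagrees with the order of scores."""
    predicted = np.sign(scores[:, None] - scores[None, :])
    observed = np.sign(C)
    iu = np.triu_indices(len(scores), k=1)
    compared = observed[iu] != 0
    return int(np.sum(predicted[iu][compared] != observed[iu][compared]))

def svd_rank(C):
    """Score players with one of the top two left singular vectors of C.

    Each of the two vectors is tried with both global signs, and the
    candidate producing the fewest upsets with respect to C is returned.
    """
    C = np.asarray(C, dtype=float)
    U, _, _ = np.linalg.svd(C)
    best_scores, best_upsets = None, None
    for k in range(2):                 # first and second singular vector
        for sign in (1.0, -1.0):       # resolve the global sign ambiguity
            scores = sign * U[:, k]
            upsets = count_upsets(scores, C)
            if best_upsets is None or upsets < best_upsets:
                best_scores, best_upsets = scores, upsets
    return best_scores

# Noiseless toy example: C = r 1^T - 1 r^T for an assumed rank vector r_true.
r_true = np.array([3.0, 0.0, 5.0, 1.0, 7.0, 2.0, 6.0, 4.0])
C_clean = r_true[:, None] - r_true[None, :]
scores = svd_rank(C_clean)
```

In the noiseless case the two nonzero singular values of a skew-symmetric rank-2 matrix coincide, so the top two singular vectors form an arbitrary orthonormal basis of the span of $r$ and $\mathbf{1}$. At least one of them is then a strictly monotone function of $r$, which is one way to see why trying both vectors and both signs is needed.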

2.2.1 Rank-2 Decomposition in the ERO model

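For the Erdős-Rényi Outliers ERO($n, p, \eta$) model given by (LABEL:ERoutliers), the following decomposition could render the SVD-Rank method amenable to a theoretical analysis. The expected value of the entries of $C$ is given by

\begin{equation}
\mathbb{E}\, C_{ij} = (r_i - r_j)(1 - \eta)\, p, \tag{2.2}
\end{equation}

in other words, $\mathbb{E}\, C$ is a rank-2 skew-symmetric matrix

\begin{equation}
\mathbb{E}\, C = (1 - \eta)\, p \left( r \mathbf{1}^T - \mathbf{1} r^T \right). \tag{2.3}
\end{equation}

Next, one may decompose the given data matrix $C$ as

\begin{equation}
C = \mathbb{E}\, C + R, \tag{2.4}
\end{equation}

where $R$ is a random skew-symmetric matrix whose elements have zero mean and are given by

\begin{equation}
R_{ij} =
\begin{cases}
(r_i - r_j)\left[ 1 - (1 - \eta) p \right] & \text{with probability } (1 - \eta) p, \\
q - (r_i - r_j)(1 - \eta) p & \text{with probability } \eta p, \text{ where } q \sim \mathrm{Unif}[-(n-1), n-1], \\
-(r_i - r_j)(1 - \eta) p & \text{with probability } 1 - p,
\end{cases} \tag{2.5}
\end{equation}

whenever $i < j$, which renders the given data matrix decomposable into a low-rank (rank-2) perturbation of a random skew-symmetric matrix. The case $p = 1$, which corresponds to the complete graph, simplifies (LABEL:ERoutliers_Rij), and is perhaps a first step towards a theoretical investigation.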

2.2.2 Rank-2 Decomposition in the MUN model

A similar decomposition holds for the other noise model we have considered, Multiplicative Uniform Noise, MUN($n, p, \eta$), given by (LABEL:MUN_model). As above,

\begin{equation}
\mathbb{E}\, C = p \left( r \mathbf{1}^T - \mathbf{1} r^T \right), \tag{2.6}
\end{equation}

and a similar decomposition as in (LABEL:decompC_EC_R) holds, where the zero-mean entries of the random matrix $R$ are given by

\begin{equation}
R_{ij} =
\begin{cases}
(r_i - r_j)(1 - p + \epsilon) & \text{with probability } p, \\
-(r_i - r_j)\, p & \text{with probability } 1 - p,
\end{cases} \tag{2.7}
\end{equation}

with $\epsilon \sim \mathrm{Unif}[-\eta, \eta]$. Note that, as opposed to (LABEL:ERoutliers_Rij), the entries are no longer independent. To limit the dependency between the entries of the random matrix, one may further assume that there are no comparisons whenever $|r_i - r_j|$ is large enough, i.e., between players who are far apart in the rankings, an assumption which may seem natural in certain settings such as chess competitions, where it is less common that a highly skilled chess master plays against a much weaker player. It would be interesting to investigate whether this additional assumption could make the SVD approach amenable to a theoretical analysis in light of recent results from the random matrix theory literature by Anderson and Zeitouni [3], which relax the independence condition and consider finite-range dependent random matrices that allow for dependency between the entries which are "nearby" in the matrix.

2.3 Ranking via Least Squares

We also compare our proposed ranking method with the more traditional least-squares approach. Assuming the number of edges in $G$ is given by $m = |E(G)|$, we denote by $B$ the edge-vertex incidence matrix of size $m \times n$ whose entries, for each edge $e = (i,j) \in E(G)$, are given by

\begin{equation}
B_{e,v} =
\begin{cases}
1 & \text{if } v = i, \\
-1 & \text{if } v = j, \\
0 & \text{otherwise},
\end{cases} \tag{2.8}
\end{equation}

and by $w$ the vector of length $m$ which contains the pairwise rank measurements $w(e) = C_{ij}$, for all edges $e = (i,j) \in E(G)$. We obtain the least-squares solution to the ranking problem by solving the following minimization problem

\begin{equation}
\underset{x \in \mathbb{R}^n}{\text{minimize}} \;\; \| B x - w \|_2^2. \tag{2.9}
\end{equation}

We point out here the work of Hirani et al. [33], who show that the problem of least-squares ranking on graphs has far-reaching rich connections with various other research areas, including spectral graph theory and multilevel methods for graph Laplacian systems, Hodge decomposition theory and random clique complexes in topology.
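As an illustration of the least-squares formulation, the sketch below builds an edge-vertex incidence matrix and solves the minimization with `numpy.linalg.lstsq`. The orientation convention (row $e = (i,j)$ has $+1$ in column $i$ and $-1$ in column $j$, so that $(Bx)_e = x_i - x_j$), the function name `least_squares_rank`, and the toy data are assumptions made for this example.

```python
import numpy as np

def least_squares_rank(n, edges, w):
    """Least-squares scores x minimizing ||B x - w||_2^2.

    edges : list of (i, j) pairs; w[e] is the measured offset x_i - x_j.
    The solution is defined only up to an additive constant on each
    connected component, so the scores are centered before returning.
    """
    B = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        B[e, i] = 1.0     # assumed orientation: (B x)_e = x_i - x_j
        B[e, j] = -1.0
    x, *_ = np.linalg.lstsq(B, np.asarray(w, dtype=float), rcond=None)
    return x - x.mean()

# Toy example on a connected graph with exact (noiseless) offsets.
r = np.array([2.0, 0.0, 3.0, 1.0])
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
w = [r[i] - r[j] for i, j in edges]
x_ls = least_squares_rank(len(r), edges, w)
```

Sorting `x_ls` yields the ranking. With noiseless offsets on a connected graph the centered solution matches the centered ground truth, whereas outliers in `w` can pull the least-squares solution far from it.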

2.4 The Rank-Centrality algorithm

In recent work [51], Negahban et al. propose an iterative algorithm for the rank aggregation problem by estimating scores for the items from the stationary distribution of a certain random walk on the graph of items, where edges encode the outcome of pairwise comparisons. The authors propose this approach in the context of the rank aggregation problem: given as input a collection of $k$ sets of pairwise comparisons over $n$ players or partial rankings (where each such set is provided by an independent rating system, or member of a jury of size $k$), the goal is to provide a global ranking that is as consistent as possible with the given measurements of all ranking systems.

At each iteration of the random walk, the probability of transitioning from vertex $i$ to vertex $j$ is directly proportional to how often player $j$ beat player $i$ across all the matches between the two players, and is zero if the two players have never played a game before. In other words, the random walk has a higher chance of transitioning to a more skillful neighbor, and the frequency of visiting a particular node, which reflects the rank or the skill level of the corresponding player, is thus encoded in the stationary distribution of the associated Markov chain. Such an interpretation of the stationary distribution of a Markov chain can be traced back to early work on the topic of network centrality from the network science literature. Network centrality-based tools have been designed to measure which nodes of the graph (or other network structures) are most important [52, 63], some of which have a natural interpretation in terms of information flow within a network [22]. One of the most popular applications of network centrality is the PageRank algorithm [54] for computing the relative importance of a web page on the web graph. More recently, dynamic centrality measures have been proposed for the analysis of temporal network data in neuroscience, for studying the functional activity in the human brain using functional magnetic resonance imaging [50].

In the context of the popular BTL model, the authors of [51] propose the following approach for computing the Markov matrix, which we adjust to render it applicable to both ordinal and cardinal measurements, in the case of a single rating system. Note that in Section LABEL:sec:RankAggregation where we discuss the rank aggregation problem in the context of multiple rating systems, we rely on the initial Rank-Centrality algorithm introduced in [51].

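For a pair of items $i$ and $j$, let $Y_{ij}^{(l)}$ be equal to 1 if player $j$ beats player $i$, and 0 otherwise, during the $l$-th match between the two players, with $l = 1, \ldots, k$. The BTL model assumes that $\mathbb{P}\left( Y_{ij}^{(l)} = 1 \right) = \frac{w_j}{w_i + w_j}$, where $w$ represents the underlying vector of positive real weights associated to each player. The approach in [51] starts by estimating the fraction of times player $j$ has defeated player $i$, which is denoted by

\begin{equation}
a_{ij} = \frac{1}{k} \sum_{l=1}^{k} Y_{ij}^{(l)}, \tag{2.10}
\end{equation}

as long as players $i$ and $j$ competed in at least one match, and $a_{ij} = 0$ otherwise. Next, consider the matrix

\begin{equation}
A_{ij} = \frac{a_{ij}}{a_{ij} + a_{ji}}, \tag{2.11}
\end{equation}

which converges to $\frac{w_j}{w_i + w_j}$, as $k \to \infty$. To define a valid transition probability matrix, the authors of [51] scale all the edge weights by $1/d_{\max}$ and consider the resulting random walk

\begin{equation}
P_{ij} =
\begin{cases}
\frac{1}{d_{\max}} A_{ij} & \text{if } i \neq j, \\
1 - \frac{1}{d_{\max}} \sum_{k \neq i} A_{ik} & \text{if } i = j,
\end{cases} \tag{2.12}
\end{equation}

where $d_{\max}$ denotes the maximum out-degree of a node, thus making sure that each row sums to 1. The stationary distribution is the top left eigenvector of $P$, and its entries denote the final numerical scores associated to each node, which, upon sorting, induce a ranking of the players.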
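The construction of the random walk and its stationary distribution can be sketched as follows. This is a minimal illustration and not the reference implementation of [51]: the function name `rank_centrality`, the input layout (`wins[i, j]` holds the number of matches in which player `j` beat player `i`), the use of a dense eigendecomposition, and the toy win counts are assumptions.

```python
import numpy as np

def rank_centrality(wins):
    """Stationary distribution of the comparison random walk.

    wins[i, j] : number of matches in which player j beat player i (assumed layout).
    Returns a probability vector; larger entries indicate stronger players.
    """
    wins = np.asarray(wins, dtype=float)
    total = wins + wins.T                      # matches played by each pair
    # A_ij = fraction of matches between i and j won by j (0 if they never met).
    A = np.divide(wins, total, out=np.zeros_like(wins), where=total > 0)
    d_max = int((total > 0).sum(axis=1).max()) # maximum number of opponents
    P = A / d_max
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # self-loops make each row sum to 1
    # The stationary distribution is the left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return np.abs(v) / np.abs(v).sum()

# Toy data matching BTL weights proportional to (1, 2, 4), 15 matches per pair.
wins_toy = np.array([[0, 10, 12],
                     [5,  0, 10],
                     [3,  5,  0]])
pi_hat = rank_centrality(wins_toy)
```

When the empirical win fractions equal the BTL probabilities exactly, as in this toy input, the walk is reversible and its stationary distribution is proportional to the weight vector, so sorting `pi_hat` recovers the order of the players.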

To render the above approach applicable² in the case $k = 1$ of a single rating system (for both cardinal and ordinal measurements), but also when $k > 1$ for the case of cardinal measurements, we propose the following alternatives to designing the winning probabilities $a_{ij}$ given by (LABEL:aij_RC), and inherently the final winning probability matrix $A$ in (LABEL:Aij_RC). Once we have an estimate for $A$ (given by (LABEL:myA_ordinal_RC) for ordinal data, respectively by (LABEL:myA_cardinal_RC) for cardinal data), we proceed with building the transition probability matrix $P$ as in (LABEL:P_RC) and consider its stationary distribution. Note that we also make these two new methods proposed below applicable to the setting of multiple rating systems $k > 1$, by simply averaging the resulting winning probabilities $A^{(l)}$, $l = 1, \ldots, k$, given by each rating system via (LABEL:myA_ordinal_RC) and (LABEL:myA_cardinal_RC), across all rating systems

\begin{equation}
A = \frac{1}{k} \sum_{l=1}^{k} A^{(l)}, \tag{2.13}
\end{equation}

and then consider the transition probability matrix $P$ as in (LABEL:P_RC) and its stationary distribution.

² Otherwise, in the ordinal case, $A_{ij}$ is either 0 or 1.

2.4.1 Adjusted Rank-Centrality for ordinal measurements

To handle the case of ordinal measurements, we propose a hybrid approach that combines Serial-Rank and Rank-Centrality, and yields more accurate results than the former one, as can be seen in Figure LABEL:fig:Meth6_n200_ord. We proceed by computing the matrix as in the Serial-Rank algorithm (given by (LABEL:SijMatch) and (LABEL:SMatch) in Appendix LABEL:sec:appSER) that counts the number of matching comparisons between and with other third reference items . The intuition behind this similarity measure is that players that beat the same players and are beaten by the same players should have a similar ranking in the final solution. Note that for any pair of players (footnote 3: this is since is defined to be ; otherwise it would be true that ), and thus

Note that, whenever is very large, say , meaning the two players are very similar, then the quantity is small and close to zero, and thus a good proxy for the difference in the winning probabilities and defined in (LABEL:Aij_RC). In other words, if two players are very similar, it is unlikely that, had they played a lot of matches against each other, one player will defeat the other in most of the matches. On the other hand, if two players are very dissimilar, and thus is close to zero and is close to one, then it must be that, had the two players met in many matches, one would have defeated the other in a significant fraction of the games. With these observations in mind, and in the spirit of (LABEL:Aij_RC), we design the matrix of winning probabilities such that

(2.14)

for a pair of players that met in a game. We lean the balance in favor of the player who won in the (single) direct match, and assign to him the larger winning probability. Keeping in mind that should be a proxy for the fraction of times player defeated player (thus whenever it must be that ), the above system of equations (LABEL:System2by2RC) yields

(2.15)

We remark that, in the case of outliers given by the ERO noise model (LABEL:ERoutliers), our above proposed version of Rank-Centrality (denoted as RC), when used in the setting of multiple rating systems, performs much better than the original Rank-Centrality algorithm (denoted as RCO), as shown in the bottom plot of Figure LABEL:fig:ErrorsJuryERO.

2.4.2 Adjusted Rank-Centrality for cardinal measurements

For the case of cardinal measurements, we propose a similar matrix of winning probabilities, and incorporate the magnitude of the score into the entries of . The intuition behind defining the winning probability is given by the following reasoning. Whenever takes the largest possible (absolute) value (i.e., assume , thus defeats by far), we define the winning probability that player defeats player to be largest possible, i.e., , and in general, the larger the magnitude of , the larger should be. On the other hand, whenever has the smallest possible (absolute) value (i.e., assume ), then the winning probability should be as small as possible, i.e., close to . With these two observations in mind, we define the winning probability matrix as

(2.16)

3 The Group Synchronization Problem

Finding group elements from noisy measurements of their ratios is known as the group synchronization problem. For example, the synchronization problem over the special orthogonal group SO($d$) consists of estimating a set of $n$ unknown $d \times d$ matrices $R_1, \ldots, R_n \in SO(d)$ from noisy measurements $R_{ij}$ of a subset of their pairwise ratios $R_i^{-1} R_j$

$$\underset{R_1, \ldots, R_n \in SO(d)}{\text{minimize}} \; \sum_{(i,j) \in E} w_{ij} \, \| R_i^{-1} R_j - R_{ij} \|_F^2 \qquad (3.1)$$

where $\| \cdot \|_F$ denotes the Frobenius norm, and $w_{ij}$ are non-negative weights representing the confidence in the noisy pairwise measurements $R_{ij}$. Spectral and semidefinite programming relaxations for solving an instance of the above synchronization problem were originally introduced and analyzed by Singer [57] in the context of angular synchronization, over the group SO(2) of planar rotations, where one is asked to estimate $n$ unknown angles

$$\theta_1, \ldots, \theta_n \in [0, 2\pi) \qquad (3.2)$$

given $m$ noisy measurements $\Theta_{ij}$ of their offsets

$$\Theta_{ij} = \theta_i - \theta_j \mod 2\pi. \qquad (3.3)$$

The difficulty of the problem is amplified on one hand by the amount of noise in the offset measurements, and on the other hand by the fact that $m \ll \binom{n}{2}$, i.e., only a very small subset of all possible pairwise offsets are measured. In general, one may consider other groups $\mathcal{G}$ (such as SO($d$), O($d$)) for which there are available noisy measurements $g_{ij}$ of ratios between the group elements. The set $E$ of pairs $(i,j)$ for which a ratio of group elements is available can be realized as the edge set of a graph $G = (V, E)$, $|V| = n$, $|E| = m$, with vertices corresponding to the group elements $g_1, \ldots, g_n$, and edges to the available pairwise measurements $g_{ij}$.

In [57], Singer analyzed the following noise model, where each edge in the measurement graph is present with probability , and each available measurement is either correct with probability or a random measurement with probability . For such a noise model with outliers, the available measurement matrix is given by the following mixture

(3.4)

Using tools from random matrix theory, in particular rank-1 deformations of large random matrices [25], Singer showed in [57] that for the complete graph (thus ), the spectral relaxation for the angular synchronization problem given by (LABEL:FinalRelaxAmitMaxi) and summarized in the next Section LABEL:secsec:syncEIG, undergoes a phase transition phenomenon, with the top eigenvector of the Hermitian matrix in (LABEL:mapToCircle) exhibiting above random correlations with the underlying ground truth solution as soon as

(3.5)

In other words, even for very small values of (thus a large noise level), the eigenvector synchronization method summarized in Section LABEL:secsec:syncEIG is able to successfully recover the ground truth angles if there are enough pairwise measurements available, i.e., whenever is large enough. For the general case of Erdős-Rényi graphs, the same phenomenon is encountered as soon as , where denotes the number of edges in the measurement graph .

3.1 Spectral Relaxation

Following the approach introduced in [57], we build the $n \times n$ sparse Hermitian matrix $H = (H_{ij})$ whose elements are either $0$ or points on the unit circle in the complex plane

$$H_{ij} = \begin{cases} e^{\iota \Theta_{ij}} & \text{if } (i,j) \in E, \\ 0 & \text{if } (i,j) \notin E. \end{cases} \qquad (3.6)$$

In an attempt to preserve the angle offsets as best as possible, Singer considers the following maximization problem

$$\underset{\theta_1, \ldots, \theta_n \in [0, 2\pi)}{\text{maximize}} \; \sum_{i,j=1}^{n} e^{-\iota \theta_i} H_{ij} e^{\iota \theta_j} \qquad (3.7)$$

which gets incremented by $+1$ whenever an assignment of angles $\theta_i$ and $\theta_j$ perfectly satisfies the given edge constraint $\Theta_{ij} = \theta_i - \theta_j \mod 2\pi$ (i.e., for a good edge), while the contribution of an incorrect assignment (i.e., of a bad edge) will be uniformly distributed on the unit circle in the complex plane. Note that (3.7) is equivalent to the formulation in (3.1) by exploiting properties of the Frobenius norm, and relying on the fact that it is possible to represent group elements for the special case of SO(2) as complex-valued numbers. Since the non-convex optimization problem in (3.7) is difficult to solve computationally, Singer introduced the following spectral relaxation

$$\underset{z_1, \ldots, z_n \in \mathbb{C}; \; \sum_{i=1}^{n} |z_i|^2 = n}{\text{maximize}} \; \sum_{i,j=1}^{n} \bar{z}_i H_{ij} z_j \qquad (3.8)$$

by replacing the individual constraints $z_i = e^{\iota \theta_i}$ having unit magnitude by the much weaker single constraint $\sum_{i=1}^{n} |z_i|^2 = n$. Next, we recognize the resulting maximization problem in (3.8) as the maximization of a quadratic form whose solution is known to be given by the top eigenvector of the Hermitian matrix $H$, which has an orthonormal basis over $\mathbb{C}^n$, with real eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$ and corresponding eigenvectors $v_1, v_2, \ldots, v_n$. In other words, the spectral relaxation of the non-convex optimization problem in (3.7) is given by

$$\underset{\| z \|^2 = n}{\text{maximize}} \; z^{*} H z \qquad (3.9)$$

which can be solved via a simple eigenvector computation, by setting $z = v_1$, where $v_1$ is the top eigenvector of $H$, satisfying $H v_1 = \lambda_1 v_1$, with $\| v_1 \|^2 = n$, corresponding to the largest eigenvalue $\lambda_1$. Before extracting the final estimated angles, we consider the following normalization of $H$ using the diagonal matrix $D$, whose diagonal elements are given by $D_{ii} = \sum_{j=1}^{n} |H_{ij}|$, and define

$$R = D^{-1} H \qquad (3.10)$$

which is similar to the Hermitian matrix $D^{-1/2} H D^{-1/2}$ through

$$R = D^{-1/2} \left( D^{-1/2} H D^{-1/2} \right) D^{1/2}.$$

Thus, $R$ has $n$ real eigenvalues $\lambda_1^{R} \geq \lambda_2^{R} \geq \ldots \geq \lambda_n^{R}$ with corresponding $n$ orthogonal (complex valued) eigenvectors $v_1^{R}, \ldots, v_n^{R}$, satisfying $R v_i^{R} = \lambda_i^{R} v_i^{R}$. Finally, we define the estimated rotation angles $\hat{\theta}_1, \ldots, \hat{\theta}_n$ using the top eigenvector $v_1^{R}$ via

$$e^{\iota \hat{\theta}_i} = \frac{v_1^{R}(i)}{|v_1^{R}(i)|}, \quad i = 1, 2, \ldots, n. \qquad (3.11)$$

Note that the estimation of the rotation angles is up to an additive phase since $e^{\iota \alpha} v_1^{R}$ is also an eigenvector of $R$ for any $\alpha \in \mathbb{R}$. We point out that the only difference between the above approach and the angular synchronization algorithm in [57] is the normalization (3.10) of the matrix $H$ prior to the computation of the top eigenvector, considered in our previous work [18], and formalized in [58] via the notion of graph connection Laplacian (or its normalized version), for which it can be shown that the bottom eigenvectors can be used to recover the unknown elements of SO($d$), after a certain rounding procedure.
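The spectral procedure of this section can be illustrated with a short pure-Python sketch. Several choices here are assumptions made for the illustration rather than details from [57]: the offsets are passed as a dictionary `{(i, j): Theta_ij}` with the convention that each measurement approximates the angle of node `i` minus the angle of node `j`, every node is assumed to have at least one measurement, and the top eigenvector is approximated by power iteration on the shifted matrix (identity plus the degree-normalized matrix) instead of a library eigensolver.

```python
import cmath
import math

def eigenvector_sync(theta_offsets, n, n_iter=500):
    """Estimate n angles, up to a global phase, from pairwise offsets.

    theta_offsets: dict {(i, j): Theta_ij}, Theta_ij ~ theta_i - theta_j mod 2*pi
    (assumed convention for this sketch).
    """
    # Sparse Hermitian matrix: H_ij = exp(i * Theta_ij), H_ji = conj(H_ij).
    H = [[0j] * n for _ in range(n)]
    for (i, j), t in theta_offsets.items():
        H[i][j] = cmath.exp(1j * t)
        H[j][i] = H[i][j].conjugate()
    # Degree normalisation R = D^{-1} H, with d_i = sum_j |H_ij| (assumed > 0).
    deg = [sum(abs(h) for h in row) for row in H]
    R = [[H[i][j] / deg[i] for j in range(n)] for i in range(n)]
    # Power iteration on I + R: the shift makes the top eigenvalue of R dominant.
    v = [cmath.exp(1j * k) for k in range(n)]  # arbitrary deterministic start
    for _ in range(n_iter):
        w = [v[i] + sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    # Rounding step: keep only the phase of each eigenvector entry.
    return [cmath.phase(x) % (2 * math.pi) for x in v]
```

Because the result is defined only up to a global rotation, it should be compared with a reference through pairwise differences of angles rather than entry by entry.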

3.2 Semidefinite Programming Relaxation

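A second relaxation proposed in [57], as an alternative to the spectral relaxation, is via the following semidefinite programming formulation. In an attempt to preserve the angle offsets as best as possible, one may consider the following maximization problem

$$\sum_{i,j=1}^{n} e^{-\iota \theta_i} H_{ij} e^{\iota \theta_j} = \operatorname{trace}(H \Upsilon) \qquad (3.12)$$

where $\Upsilon$ is the (unknown) $n \times n$ Hermitian matrix of rank 1 given by

$$\Upsilon_{ij} = e^{\iota (\theta_i - \theta_j)} \qquad (3.13)$$

with ones on the diagonal, $\Upsilon_{ii} = 1$ for all $i = 1, \ldots, n$. Note that, with the exception of the rank-1 constraint, all the remaining constraints are convex and altogether define the following SDP relaxation for the angular synchronization problem

$$\begin{aligned} \underset{\Upsilon \in \mathbb{C}^{n \times n}}{\text{maximize}} \quad & \operatorname{trace}(H \Upsilon) \\ \text{subject to} \quad & \Upsilon_{ii} = 1, \quad i = 1, \ldots, n, \\ & \Upsilon \succeq 0, \end{aligned} \qquad (3.14)$$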

which can be solved via standard methods from the convex optimization literature [61]. We remark that, from a computational perspective, solving such SDP problems is feasible only for relatively small problems (typically with several thousand unknowns, up to about ), though there exist distributed methods for solving such convex optimization problems, such as the popular Alternating Direction Method of Multipliers (ADMM) [12], which can handle large-scale problems arising nowadays in statistics and machine learning [67].
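As a brief check of why the objective becomes linear in the matrix variable, the following identity can be verified directly. The symbols are assumptions reconstructed for this note: $z_i = e^{\iota \theta_i}$ and $\Upsilon = z z^{*}$, so that $\Upsilon_{ij} = e^{\iota(\theta_i - \theta_j)}$.

```latex
\sum_{i,j=1}^{n} e^{-\iota \theta_i} H_{ij} e^{\iota \theta_j}
  = \sum_{i,j=1}^{n} H_{ij} \, e^{\iota (\theta_j - \theta_i)}
  = \sum_{i=1}^{n} \sum_{j=1}^{n} H_{ij} \Upsilon_{ji}
  = \sum_{i=1}^{n} (H \Upsilon)_{ii}
  = \operatorname{trace}(H \Upsilon).
```

The non-convexity is therefore confined to the requirement that $\Upsilon$ have rank 1, which is the only constraint the relaxation drops.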

As pointed out in [57], this program is very similar to the well-known Goemans-Williamson SDP relaxation for the famous MAX-CUT problem of finding the maximum cut in a weighted graph, the only difference being the fact that here we optimize over the cone of complex-valued Hermitian positive semidefinite matrices, not just real symmetric matrices. Since the recovered solution is not necessarily of rank-1, the estimator is obtained from the best rank-1 approximation of the optimal solution matrix via a Cholesky decomposition. We plot in Figure LABEL:fig:rkSDP_Meth6_n200_num the recovered ranks of the SDP relaxation for the ranking problem, and point out the interesting phenomenon that, even for noisy data, under favorable noise regimes, the SDP program is still able to find a rank-1 solution. The tightness of this relaxation has been explained only recently in the work of Bandeira et al. [7]. The advantage the SDP relaxation brings is that it explicitly imposes the unit magnitude constraint for , which we cannot otherwise enforce in the spectral relaxation solved via the eigenvector method.

4 Sync-Rank: Ranking via Synchronization

We now consider the application of the angular synchronization framework [57] to the ranking problem. The underlying idea has also been considered in [66] in the context of image denoising, who suggested, similar to [57], to perform the denoising step in the angular embedding space as opposed to the linear space, and observed increased robustness against sparse outliers in the measurements.

Figure 4.1: (a) Equidistant mapping of the rankings around half a circle, for , where the rank of the player is . (b) The recovered solution at some random rotation, motivating the step which computes the best circular permutation of the recovered rankings, which minimizes the number of upsets with respect to the initially given pairwise measurements.

Denote the true ranking of the players to be , and assume without loss of generality that , i.e., the rank of the player is . In the ideal case, the ranks can be imagined to lie on a one-dimensional line, sorted from to to , with the pairwise rank comparisons given, in the noiseless case, by (for cardinal measurements) or (for ordinal measurements). In the angular embedding space, we consider the ranks of the players mapped to the unit circle, say fixing to have a zero angle with the -axis, and the last player corresponding to angle equal to . In other words, we imagine the players wrapped around a fraction of the circle, interpret the available rank-offset measurements as angle-offsets in the angular space, and thus arrive at the setup of the angular synchronization problem detailed in Section LABEL:sec:GroupAngSync.

We also remark that the modulus used to wrap the players around the circle plays an important role in the recovery process. If we choose to map the players across the entire circle, this would cause ambiguity at the end points, since the very highly ranked players will be positioned very close (or perhaps even mixed) with the very poorly ranked players. To avoid the confusion, we simply choose to map the players to the upper half of the unit circle .

In the ideal setting, the angles obtained via synchronization would be as shown in the left plot of Figure LABEL:fig:wrapHalfCircleEx, from where one can easily infer the ranking of the players by traversing the upper half circle in an anti-clockwise direction. However, since the solution to the angular synchronization problem is computed up to a global shift (see, for example, the right plot of Figure LABEL:fig:wrapHalfCircleEx), an additional post-processing step is required to accurately extract the underlying ordering of the players that best matches the observed data. To this end, we simply compute the best circular permutation of the initial rankings obtained from synchronization, that minimizes the number of upsets in the given data. We illustrate this step with an actual noisy instance of Sync-Rank in Figure LABEL:fig:SyncExCircular_n100_025, where plot (a) shows the rankings induced by the initial angles recovered from synchronization, while plot (b) shows the final ranking solution, obtained after shifting by the best circular permutation.

Denote by the ordering induced by the angles recovered from angular synchronization, when sorting the angles from smallest to largest, where denotes the label of the player ranked on the position. For example, denotes the player with the smallest corresponding angle . To measure the accuracy of each candidate circular permutation , we first compute the pairwise rank offsets associated to the induced ranking, via

(4.1)

where $\otimes$ denotes the outer product of two vectors, $x \otimes y = x y^{T}$, $\circ$ denotes the Hadamard product of two matrices (entrywise product), and is the adjacency matrix of the graph . In other words, for an edge , it holds that , i.e., the resulting rank offset after applying the cyclic shift (footnote 4: the circular or cyclic shift is given by ; the results of repeatedly applying circular shifts to a given tuple are often called the circular shifts of the tuple). Next, we choose the circular permutation which minimizes the norm (footnote 5: this is just the norm of the vectorized form of the matrix) of the residual matrix

(4.2)

which counts the total number of upsets. Note that an alternative to the above error is given by

(4.3)

which takes into account the actual magnitudes of the offsets, not just their sign.

Figure 4.2: An illustration of the steps for recovering the ranking from the eigenvector synchronization approach, for an Erdős-Rényi measurement graph with cardinal comparisons, and outliers chosen uniformly at random with probability as in the ERO model (LABEL:ERoutliers). Left: the ranking induced by the initially obtained angles, estimated via angular synchronization. Right: the ranking obtained after shifting the set of angles by the best circular permutation which minimizes the number of upsets (i.e., inverted rankings) in the available measurements; (respectively, ) denote the Kendall distance (respectively, correlation) between the recovered and the ground truth ranking.

We summarize the above steps of the Sync-Rank approach in Algorithm LABEL:Algo:listSync. Note that, throughout the paper, we denote by SYNC (or SYNC-EIG) the version of synchronization-based ranking which relies on the spectral relaxation of the synchronization problem, and by SYNC-SDP the one obtained via the SDP relaxation.

0:   the graph of pairwise comparisons and the matrix of pairwise comparisons (rank offsets), such that whenever we have available a (perhaps noisy) comparison between players and , either a cardinal comparison () or an ordinal comparison .
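1:  Map all rank offsets $C_{ij}$ to an angle $\Theta_{ij}$, with scaling parameter $\delta$, using the transformation
$$C_{ij} \mapsto \Theta_{ij} := 2 \pi \delta \, \frac{C_{ij}}{n-1} \qquad (4.4)$$
We choose $\delta = \frac{1}{2}$, and hence $\Theta_{ij} := \pi \frac{C_{ij}}{n-1}$.
2:  Build the $n \times n$ Hermitian matrix $H$ with $H_{ij} = e^{\iota \Theta_{ij}}$ if $(i,j) \in E$, and $H_{ij} = 0$ otherwise, as in (LABEL:mapToCircle).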
3:  Solve the angular synchronization problem via either its spectral (LABEL:FinalRelaxAmitMaxi) or SDP (LABEL:SDP_program_SYNC) relaxation, and denote the recovered solution by , where denotes the recovered eigenvector
4:  Extract the corresponding set of angles from .
5:  Order the set of angles in increasing order, and denote the induced ordering by .
6:  Compute the best circular permutation of the above ordering that minimizes the resulting number of upsets with respect to the initial rank comparisons given by
(4.5)
with defined as in (LABEL:distTwoRanks).
7:  Output as a final solution the ranking induced by the circular permutation .
Algorithm 1 Summary of the Synchronization-Ranking (Sync-Rank) Algorithm
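The pre-processing (steps 1 and 2) and post-processing (steps 5 to 7) of Algorithm 1 can be sketched as follows, taking the synchronized angles as given by any solver. This is a minimal illustration under assumptions made here: the comparison matrix `C` is a skew-symmetric list of lists in which 0 marks a missing comparison, `C[i][j]` is a proxy for the rank of `i` minus the rank of `j`, the half-circle mapping multiplies each offset by pi / (n - 1), and all function names are hypothetical.

```python
import cmath
import math

def offsets_to_hermitian(C):
    """Steps 1-2: map rank offsets to angles on the half circle and build H."""
    n = len(C)
    return [[cmath.exp(1j * math.pi * C[i][j] / (n - 1)) if C[i][j] != 0 else 0j
             for j in range(n)] for i in range(n)]

def count_upsets(ranks, C):
    """Measured pairs whose sign disagrees with the induced offset r_i - r_j."""
    n = len(C)
    upsets = 0
    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j] != 0 and (ranks[i] - ranks[j]) * C[i][j] < 0:
                upsets += 1
    return upsets

def ranks_from_angles(angles, C):
    """Steps 5-7: sort the angles, then keep the circular shift with fewest upsets."""
    n = len(angles)
    order = sorted(range(n), key=lambda i: angles[i])  # order[k] = player at position k
    best_ranks, best_upsets = None, None
    for s in range(n):
        ranks = [0] * n
        for k, player in enumerate(order):
            ranks[player] = (k + s) % n + 1  # cyclic shift of the sorted positions
        u = count_upsets(ranks, C)
        if best_upsets is None or u < best_upsets:
            best_ranks, best_upsets = ranks, u
    return best_ranks, best_upsets
```

Trying all n cyclic shifts costs n passes over the measured pairs, which is cheap next to the eigenvector or SDP computation, and it removes the arbitrary global rotation left by synchronization.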

Figure LABEL:fig:BarpsMethodComp is a comparison of the rankings obtained by the different methods: SVD, LS, SER, SER-GLM, RC, and SYNC-EIG, for an instance of the ranking problem given by the Erdős-Rényi measurement graph with cardinal comparisons, and outliers chosen uniformly at random with probability , according to the ERO() noise model given by (LABEL:ERoutliers). Note that at low levels of noise, all methods yield satisfactory results, but as the noise level increases, only Sync-Rank is able to recover a somewhat accurate solution, and significantly outperforms the results obtained by any of the other methods in terms of the number of flips (i.e., Kendall distance) with respect to the original ranking.

Figure 4.3, panels (a)-(c): A comparison of the rankings obtained by the different methods: LS, SVD, SER, RC and SYNC-EIG (eigenvector synchronization), for the graph ensemble with cardinal comparisons, and outliers chosen uniformly at random with probability , according to the ERO() noise model given by (LABEL:ERoutliers). We omit the results obtained via SER-GLM and SYNC-SDP, since they were very similar to SER, respectively SYNC-EIG, for this particular instance.

4.1 Synchronization Ranking for Ordinal Comparisons

When the pairwise comparisons are ordinal, and thus , all the angle offsets in the synchronization problem will have constant magnitude, which is perhaps undesirable. To this end, we report on a scoring method for the synchronization problem, for which the magnitude of the resulting entries (used as input for synchronization) give a more accurate description of the rank-offset between a pair of players. For an ordered pair , we define the Superiority Score of with respect to as follows, in a manner somewhat analogous to the score (LABEL:SijMatch) used by the Serial-Rank algorithm. Let

(4.6)

where denotes the set of nodes with rank lower than , and denotes the set of nodes with rank higher than . Based on this, the final score (rank offset) used as input for the Sync-Rank algorithm is given by

(4.7)

To preserve skew-symmetry, we set . The philosophy behind this measure is as follows. In the end, we want to reflect the true rank-offset between two nodes. One can think of as the number of witnesses favorable to (supporters of ), which are players ranked lower than but higher than . Similarly, is the number of witnesses favorable to (supporters of ), which are players ranked lower than but higher than . Then, the final rank offset is given by the difference of the two values and (though one could also perhaps consider the maximum). The sign adjustment is just so that we have being a proxy for (if is high-rank and thus is a small number, and is low-rank and thus is a large number, then should be strongly negative). We solve the resulting synchronization problem via the eigenvector method, and denote the approach by SYNC-SUP. While this approach yields rather poor results (compared to the initial synchronization method) on the synthetic data sets, its performance is comparable to that of the other methods when applied to real data sets, and quite surprisingly, it performs twice as well as any of the other methods when applied to the NCAA basketball data set. We attribute this rather intriguing fact to the observation (based on a preliminary investigation) that for this particular data set, the measurement graphs for each season are somewhat close to having a one-dimensional structure, with teams grouped in leagues according to their strengths, and it is less likely for a highly ranked team to play a match against a very weak team. It would be interesting to investigate the performance of all the algorithms on a measurement graph that is a disc graph with nodes (i.e., players) lying on a one-dimensional line, where there is an edge (i.e., match) between two nodes if and only if they are at most units apart.

5 Noise models and experimental results

We compare the performance of the Sync-Rank algorithm with that of other methods across a variety of measurement graphs, varying parameters such as the number of nodes, edge density, level of noise and underlying noise model. We detail below the two noise models we have experimented with, and then proceed with the performance outcomes of extensive numerical simulations.

5.1 Measurement and Noise Models

In most real world scenarios, noise is inevitable and imposes additional significant challenges, aside from those due to sparsity of the measurement graph. To each player (corresponding to a node of ), we associate a corresponding unique positive integer weight . For simplicity, one may choose to think of as taking values in . In other words, we assume there is an underlying ground truth ranking of the players, with the most skillful player having rank , and the least skillful one having rank . We denote by

$$C_{ij} = r_i - r_j \qquad (5.1)$$

the ground truth cardinal rank-offset of a pair of players. Thus, in the case of cardinal comparisons, the available measurements are noisy versions of the ground truth entries. On the other hand, an ordinal measurement is a pairwise comparison between two players which reveals only who the higher-ranked player is, or in the case of items, the preference relationship between the two items, i.e., $C_{ij} > 0$ if item $j$ is preferred to item $i$, and $C_{ij} < 0$ otherwise, thus without revealing the intensity of the preference relationship

$$C_{ij} = \operatorname{sign}(r_i - r_j) \qquad (5.2)$$

This setup is commonly encountered in classical social choice theory, under the name of Kemeny model, where , and takes value if player is ranked higher than player , and otherwise. Given (perhaps noisy) versions of the pairwise cardinal or ordinal measurements given by (LABEL:cardComp) or (LABEL:ordComp), the goal is to recover an ordering (ranking) of all players that is as consistent as possible with the given data. We compare our proposed ranking methods with those summarized in Section LABEL:sec:OtherMethods, on measurement graphs of varying edge density, under two different noise models, and for both ordinal and cardinal comparisons.

To test for robustness against incomplete measurements, we use a measurement graph of size given by the popular Erdős-Rényi model , where edges between the players are present independently with probability . In our experiments, we consider three different values of , the latter case corresponding to the complete graph on nodes, which corresponds to having a ranking comparison between all possible pairs of players. To test the robustness against noise, we consider the following two noise models detailed below. We remark that, for both models, noise is added such that the resulting measurement matrix remains skew-symmetric.

5.1.1 Multiplicative Uniform Noise (MUN) Model

In the Multiplicative Uniform Noise model, which we denote by MUN(), noise is multiplicative and uniform, meaning that, for cardinal measurements, instead of the true rank-offset measurement , we actually measure

$$C_{ij} = (r_i - r_j)(1 + \epsilon), \quad \epsilon \sim \mathcal{U}[-\eta, \eta]. \qquad (5.3)$$

An equivalent formulation is that to each true rank-offset measurement we add random noise (depending on ) uniformly distributed in , i.e.,

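$$C_{ij} = r_i - r_j + \epsilon, \quad \epsilon \sim \mathcal{U}[-\eta (r_i - r_j), \, \eta (r_i - r_j)]. \qquad (5.4)$$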

Note that we cap the erroneous measurements at in absolute value. Thus, whenever we set it to , and whenever we set it to , since the furthest away two players can be is positions. The percentage noise added is , (e.g., corresponds to noise). Thus, if , and the ground truth rank offset , then the available measurement is a random number in .
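A small generator for this noise model can be sketched as follows. The function name, the seeded `random.Random` instance, and the list-of-lists output (with 0.0 marking a missing edge) are assumptions made for this illustration; the multiplicative perturbation, the cap at n - 1, and the skew-symmetry follow the description above.

```python
import random

def mun_comparisons(ranks, p, eta, seed=0):
    """Cardinal comparisons on an Erdos-Renyi graph under multiplicative uniform noise."""
    rng = random.Random(seed)
    n = len(ranks)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:                          # edge present with probability p
                eps = rng.uniform(-eta, eta)              # relative perturbation
                noisy = (ranks[i] - ranks[j]) * (1.0 + eps)
                noisy = max(-(n - 1), min(n - 1, noisy))  # cap at n - 1 in absolute value
                C[i][j] = noisy
                C[j][i] = -noisy                          # keep the matrix skew-symmetric
    return C
```

With `eta = 0.5` and a true offset of 10, each generated entry falls in the interval from 5 to 15, matching the example in the text.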

5.1.2 Erdős-Rényi Outliers (ERO) Model

The second noise model we experiment with, is an Erdős-Rényi Outliers model, abbreviated by ERO(), where the available measurements are given by the following mixture model

(5.5)

Note that this noise model is similar to the one considered by Singer [57], in which angle offsets are either perfectly correct, with probability , or randomly distributed on the unit circle, with probability .

We expect that in most practical applications, the first noise model (MUN) is perhaps more realistic, where most rank offsets are being perturbed by noise (to a smaller or larger extent, proportionally to their magnitude), as opposed to having a combination of perfectly correct rank-offsets and randomly chosen rank-offsets.
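The outlier model can be sketched in the same style. The distribution of an outlier is an assumption made here: it is drawn uniformly from the integers between -(n - 1) and n - 1, the full range of possible rank offsets. The function name and the parameters `p` (edge probability) and `eta` (outlier probability) are likewise illustrative.

```python
import random

def ero_comparisons(ranks, p, eta, seed=0):
    """Cardinal comparisons where each present edge is an outlier with probability eta."""
    rng = random.Random(seed)
    n = len(ranks)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() >= p:
                continue                                  # missing edge
            if rng.random() < eta:
                value = rng.randint(-(n - 1), n - 1)      # outlier (assumed distribution)
            else:
                value = ranks[i] - ranks[j]               # correct rank offset
            C[i][j] = value
            C[j][i] = -value                              # keep the matrix skew-symmetric
    return C
```

In this sketch an outlier may happen to equal 0, which the matrix encoding cannot distinguish from a missing edge; a sparse edge list would avoid that ambiguity.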

5.2 Numerical Comparison on Synthetic Data Sets

This section compares the spectral and SDP synchronization-based ranking with the other methods briefly summarized in Sec. LABEL:sec:OtherMethods, and also in Table LABEL:tab:methodAbbrev. More specifically, we compare against the Serial-Rank algorithm, based on both (LABEL:SijMatch) and (LABEL:def:SimGLM) summarized in Sec. LABEL:secsec:SER and Appendix LABEL:sec:appSER, the SVD-based approach in Sec. LABEL:secsec:SVD, the method of Least Squares in Sec. LABEL:secsec:LS, and finally the Rank-Centrality algorithm in Sec. LABEL:secsec:RankCent.

Acronym   Name                                                                          Section
SVD       SVD Ranking                                                                   Sec. LABEL:secsec:SVD
LS        Least Squares Ranking                                                         Sec. LABEL:secsec:LS
SER       Serial-Ranking                                                                Sec. LABEL:secsec:SER
SER-GLM   Serial-Ranking in the GLM model                                               Sec. LABEL:secsec:SER
RC        Rank-Centrality                                                               Sec. LABEL:secsec:RankCent
SYNC      Synchronization-Ranking via the spectral relaxation                           Sec. LABEL:sec:Sync-Rank
SYNC-SUP  Synchronization-Ranking based on the Superiority Score (spectral relaxation)  Sec. LABEL:secsec:syncWeighted
SYNC-SDP  Synchronization-Ranking via the SDP relaxation                                Sec. LABEL:secsec:syncSDP
Table 5.1: Names of the algorithms we compare, their acronyms, and respective Sections.

We measure the accuracy of the recovered solutions, using the popular Kendall distance, i.e., the number of pairs of candidates that are ranked in different order (flips), in the two permutations, the original one and the recovered one. Given two rankings and , their Kendall distance is defined as

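$$\kappa(\pi_1, \pi_2) = \frac{\left| \{ (i,j) : i < j, \; [\pi_1(i) < \pi_1(j) \wedge \pi_2(i) > \pi_2(j)] \vee [\pi_1(i) > \pi_1(j) \wedge \pi_2(i) < \pi_2(j)] \} \right|}{\binom{n}{2}} = \frac{\text{nr. flips}}{\binom{n}{2}} \qquad (5.6)$$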

We compute the Kendall distance on a logarithmic scale (), and remark that in the error curves, for small levels of noise, the error levels for certain methods are missing due to them being equal to 0. In Figure LABEL:fig:Meth6_n200_num we compare the methods for the case of cardinal pairwise measurements, while in Figure LABEL:fig:Meth6_n200_ord we do so for ordinal measurements. The SYNC and SYNC-SDP methods are far superior to the other methods in the case of cardinal measurements, and similar to the other methods for ordinal measurements. In Figure LABEL:fig:rkSDP_Meth6_n200_num we plot the recovered rank of the SDP program (LABEL:SDP_program_SYNC). Note that for favorable levels of noise, the SDP solution is indeed of rank 1 even though we did not specifically enforce this constraint, a phenomenon explained only recently in the work of Bandeira et al. [7], who investigated the tightness of this SDP relaxation.
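For concreteness, the flip count behind the Kendall distance can be computed as in the following sketch. The function name and the choice to return both the raw number of flips and its fraction of all pairs are assumptions made for this illustration; rankings are passed as lists where entry `i` is the rank of item `i`.

```python
from itertools import combinations

def kendall_distance(pi1, pi2):
    """Return (number of flips, flips divided by the number of pairs)."""
    n = len(pi1)
    # A pair (i, j) is a flip when the two rankings order it in opposite ways.
    flips = sum(
        1
        for i, j in combinations(range(n), 2)
        if (pi1[i] - pi1[j]) * (pi2[i] - pi2[j]) < 0
    )
    return flips, flips / (n * (n - 1) / 2)
```

Two identical rankings give zero flips, while a fully reversed ranking flips every pair and so has normalized distance 1.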

Figure 5.1: The rank of the recovered solution from the SDP program (LABEL:SDP_program_SYNC), for the case of cardinal comparisons, as we vary the amount of noise. Top, panels (a)-(c): Multiplicative Uniform Noise (MUN()). Bottom, panels (d)-(f): Erdős-Rényi Outliers (ERO()). We average the results over 20 experiments.
Figure 5.2: Comparison of all ranking algorithms in the case of cardinal comparisons. Top, panels (a)-(c): Multiplicative Uniform Noise (MUN()). Bottom, panels (d)-(f): Erdős-Rényi Outliers (ERO()). We average the results over 20 experiments.
Figure 5.3: Comparison of all methods for ranking with ordinal comparisons. Top, panels (a)-(c): Multiplicative Uniform Noise (MUN()). Bottom, panels (d)-(f): Erdős-Rényi Outliers (ERO()). We average the results over 20 experiments.

5.3 Numerical Comparison on the English Premier League Data Set

The first real data set we consider is from the analysis of sports data, in particular, several years of match outcomes from the English Premier League soccer data set. We consider the last 3 complete seasons: 2011-2012, 2012-2013, and 2013-2014, both home and away games. During each championship, any pair of teams meets exactly twice, both home and away. We pre-process the data in several ways, and propose several methods to extract information from the game outcomes, i.e., of building the pairwise comparison matrix from the raw data given by and , and report on the numerical results we obtain across all scenarios. The four different criteria we experiment with are as follows

  1. : Total-Point-Difference: for each pair of teams, we aggregate the total score for the two matches teams and played against each other

    (5.7)
  2. : Sign-Total-Point-Difference: considers the winner after aggregating the above score

    (5.8)
  3. : Net Wins: the number of times a team has beaten the other one. In the soccer data set, this takes value . In the Halo data set, the number of games a pair of players may play against each other is larger.

    (5.9)
  4. : Sign(Net Wins) (): only considers the winner in terms of the number of victories.

    (5.10)

Note that and lead to cardinal measurements, while and lead to ordinal ones. Each user may also interpret each variant of the pre-processing step as a different winning criterion for the game under consideration. We remind the reader that for soccer games, a win is rewarded with 3 points, a tie with 1 point, and a loss with 0 points; the final ranking of the teams is determined by the cumulative points a team gathers throughout the season. Given a particular criterion, we are interested in finding an ordering that minimizes the number of upsets, or weighted upsets as defined below.
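The four pre-processing variants can be sketched as follows. The input layout (a list of `(i, j, goals_i, goals_j)` tuples, one per match) and the sign convention (a positive entry means that team `i` did better than team `j`) are assumptions made for this illustration and may differ from the convention used in (5.7) to (5.10); the function name is hypothetical.

```python
def comparison_matrices(matches, n):
    """Build the four pairwise comparison matrices from raw match scores.

    matches: iterable of (i, j, goals_i, goals_j) tuples (assumed layout).
    Returns (tpd, stpd, nw, snw): total point difference, its sign,
    net wins, and the sign of net wins.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    tpd = [[0] * n for _ in range(n)]
    nw = [[0] * n for _ in range(n)]
    for i, j, gi, gj in matches:
        # Aggregate score differences and win counts, keeping skew-symmetry.
        tpd[i][j] += gi - gj
        tpd[j][i] -= gi - gj
        nw[i][j] += sign(gi - gj)
        nw[j][i] -= sign(gi - gj)
    stpd = [[sign(x) for x in row] for row in tpd]
    snw = [[sign(x) for x in row] for row in nw]
    return tpd, stpd, nw, snw
```

The two signed matrices discard magnitudes and therefore give ordinal input, while the other two keep magnitudes and give cardinal input.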

We denote by the estimated rank of player as computed by the method of choice. Recall that lower values of correspond to higher ranks (better players or more preferred items). We then construct the induced (possibly incomplete) matrix of pairwise rank-offsets

and remark that denotes that the rank of player is higher than the rank of player . To measure the accuracy of a proposed reconstruction, we rely on the following three metrics. First, we use the popular metric that counts the number of upsets (lower is better)

(5.11)

which counts the number of disagreeing ordered comparisons. It contributes with a to the summation whenever the ordering in the provided data contradicts the ordering in the induced ranking. Next we define the following two measures of correlation (higher is better) between the given rank comparison data, and the one induced by the recovered solution

(5.12)

and the very similar one

(5.13)
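The upset count described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not a transcription of (5.11): the hypothetical input `C` is a skew-symmetric comparison matrix in which `C[i, j] > 0` means that player i beat player j (and 0 means a tie or no comparison), and `rank` holds the estimated ranks, with lower values denoting better players.

```python
import numpy as np

def count_upsets(C, rank):
    """Count pairs whose observed outcome contradicts the induced ranking.

    Assumptions: C[i, j] > 0 means i beat j; a zero entry means a tie or a
    missing comparison; lower values in `rank` denote better players.
    """
    C = np.asarray(C, dtype=float)
    rank = np.asarray(rank, dtype=float)
    # induced[i, j] > 0 exactly when i is ranked better than j.
    induced = rank[None, :] - rank[:, None]
    disagree = np.sign(C) * np.sign(induced) < 0
    # Count each unordered pair once (upper triangle only).
    return int(np.triu(disagree, k=1).sum())

# Toy comparison matrix for three players (made up for illustration).
C = np.array([[0, 0, -2], [0, 0, 1], [2, -1, 0]])
upsets_a = count_upsets(C, [1, 2, 3])  # player 0 ranked best
upsets_b = count_upsets(C, [3, 1, 2])  # player 1 ranked best
print(upsets_a, upsets_b)
```

Pairs with a zero entry in `C` never count as upsets here, so missing comparisons and ties are ignored rather than penalized.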
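| Team | SVD | LS | SER | SYNC | SER-GLM | SYNC-SUP | SYNC-SDP | RC | GT |
|---|---|---|---|---|---|---|---|---|---|
| Arsenal | 1 | 4 | 5 | 4 | 4 | 5 | 4 | 4 | 4 |
| Aston Villa | 19 | 16 | 18 | 16 | 20 | 19 | 16 | 16 | 15 |
| Cardiff | 17 | 20 | 16 | 20 | 15 | 20 | 20 | 20 | 20 |
| Chelsea | 8 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 3 |
| Crystal Palace | 14 | 12 | 14 | 12 | 16 | 13 | 12 | 13 | 11 |
| Everton | 5 | 5 | 4 | 5 | 5 | 6 | 5 | 5 | 5 |
| Fulham | 15 | 17 | 17 | 17 | 17 | 16 | 17 | 17 | 19 |
| Hull | 13 | 14 | 13 | 14 | 13 | 14 | 14 | 14 | 16 |
| Liverpool | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
| Man City | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 1 |
| Man United | 4 | 7 | 7 | 7 | 8 | 7 | 7 | 7 | 7 |
| Newcastle | 10 | 11 | 10 | 11 | 11 | 10 | 11 | 11 | 10 |
| Norwich | 18 | 15 | 19 | 15 | 18 | 18 | 15 | 15 | 18 |
| Southampton | 7 | 8 | 8 | 8 | 7 | 8 | 8 | 8 | 8 |
| Stoke | 11 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| Sunderland | 20 | 18 | 20 | 18 | 19 | 17 | 18 | 18 | 14 |
| Swansea | 9 | 10 | 11 | 10 | 10 | 12 | 10 | 10 | 12 |
| Tottenham | 6 | 6 | 6 | 6 | 6 | 4 | 6 | 6 | 6 |
| West Brom | 16 | 19 | 15 | 19 | 12 | 15 | 19 | 19 | 17 |
| West Ham | 12 | 13 | 12 | 13 | 14 | 11 | 13 | 12 | 13 |
| Nr. upsets | 66 | 44 | 44 | 44 | 52 | 48 | 44 | 46 | 54 |
| Score/100 | 9.5 | 10.3 | 10.0 | 10.3 | 10.0 | 10.1 | 10.3 | 10.2 | 10.4 |
| W-Score/1000 | 9.5 | 10.0 | 9.9 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.6 |
| Corr w. GT | 0.69 | 0.87 | 0.80 | 0.87 | 0.75 | 0.84 | 0.87 | 0.86 | 1.00 |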
Table 5.2: English Premier League Standings 2013-2014, based on the Net Wins input matrix (the number of net wins between a pair of teams) given by (5.9). GT denotes the final official ranking at the end of the season.

We show in Table 5.2 the rankings obtained by the different methods we have experimented with, for the 2013-2014 Premier League season, when the input is based on the Net Wins measurements. The final column, GT, is the final official ranking at the end of the season. We sort the teams alphabetically, and show the rank assigned to each team by each of the methods. The bottom rows of Table 5.2 report the quality measures defined above: the number of upsets and the two correlation scores. The very last row of the table reports the Kendall correlation between the respective ranking and the official final standing GT. We report similar results in the Appendix, in tables based on Total-Goal-Difference, Sign-Total-Goal-Difference, and finally Sign(Net Wins).
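To make the last row of Table 5.2 concrete, the following sketch computes a Kendall correlation between the LS and GT columns of the table. The specific variant used here (concordant minus discordant pairs, divided by the total number of pairs, valid for rankings without ties) is an assumption, since the text does not state which Kendall statistic is reported. The function name `kendall_tau` is hypothetical.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall correlation of two tie-free rankings of the same items."""
    n = len(a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1   # both rankings order the pair the same way
        elif s < 0:
            discordant += 1   # the rankings disagree on this pair
    return (concordant - discordant) / (n * (n - 1) / 2)

# LS and GT columns of Table 5.2, teams in alphabetical order.
ls = [4, 16, 20, 1, 12, 5, 17, 14, 3, 2, 7, 11, 15, 8, 9, 18, 10, 6, 19, 13]
gt = [4, 15, 20, 3, 11, 5, 19, 16, 2, 1, 7, 10, 18, 8, 9, 14, 12, 6, 17, 13]
tau = kendall_tau(ls, gt)
print(round(tau, 2))  # 0.87
```

The two rankings disagree on 12 of the 190 pairs of teams, which gives 166/190, in agreement with the 0.87 listed for LS in the table.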

We remark that, across the different types of inputs considered, LS, SYNC and SYNC-SDP (abbreviated by SDP in the table) correlate best with the official ranking (see footnote 6), and in almost all scenarios (with very few exceptions) achieve the highest values for the two correlation scores. In terms of the number of upsets, the synchronization-based methods alternate with SER for first place, depending on the pre-processing step and the input used for the ranking procedure. In addition, we show in Figure 5.4 the Q-scores associated with three recent Premier League seasons: 2011-2012, 2012-2013, and 2013-2014, together with the mean across these three seasons. We show the number of upsets in Figure 5.4 (a), and the correlation scores in (b) and (c), across the four possible types of input.

Footnote 6: We do not believe that correlation with the official standings is a good measure of success, simply because they are based on different rules, i.e., on accumulation of points for each win, tie, or loss. One can think of the four different pre-processing criteria as four different input data sets with pairwise comparisons, and the goal is to propose a solution that best agrees with the input data, whatever that is.

(a) Number of upsets (lower is better)
(b) Correlation score (higher is better)
(c) Correlation score (higher is better)
Figure 5.4: English Premier League 2011-2014. The input data is as follows. Column 1: Total-Point-Difference, Column 2: Sign-Total-Point-Difference, Column 3: Net Wins, and Column 4: Sign(Net Wins) (as we number the columns left to right).

5.4 Numerical Comparison on the Halo-2 Data Set

In this section, we detail the outcome of numerical experiments performed on a second real data set of game outcomes gathered during the Beta testing period for the Xbox game Halo 2 (see footnote 7). There are 606 players, who play a total of head-to-head games. We remove the low-degree nodes, i.e., discard the players who have played fewer than 3 games. The resulting comparison graph has nodes, and edges, with average (respectively maximum) degree of the graph being (respectively ) and standard deviation . We show the histogram of resulting degrees in Figure LABEL:fig:HaloDegDist.

Footnote 7: Credits for the use of the Halo 2 Beta Dataset are given to Microsoft Research Ltd. and Bungie.
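The low-degree filtering step can be sketched as below. This is a hedged illustration: the tuple layout `(player_a, player_b, outcome)` and the function name `drop_low_degree` are hypothetical, and the filter is applied in a single pass, since the text does not say whether the pruning is repeated until every remaining player has at least 3 games.

```python
from collections import Counter

def drop_low_degree(games, min_games=3):
    """Keep only games between players who each played at least `min_games`.

    `games` is assumed to be a list of (player_a, player_b, outcome) tuples.
    Single pass: a kept player may end up with fewer games after filtering.
    """
    played = Counter()
    for a, b, _ in games:
        played[a] += 1
        played[b] += 1
    keep = {p for p, count in played.items() if count >= min_games}
    return [g for g in games if g[0] in keep and g[1] in keep]

# Toy data: player "d" appears in only one game and is discarded.
games = [("a", "b", 1), ("b", "c", -1), ("a", "c", 1), ("a", "b", -1), ("c", "d", 1)]
filtered = drop_low_degree(games)
print(filtered)
```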

(a) Number of upsets (lower is better)
(b) Correlation score (higher is better)
(c) Correlation score (higher is better)
Figure 5.5: A comparison of the methods on the Halo data set, based on Head-to-Head matches. The input is as follows. Column 1: Total-Point-Difference, Column 2: Sign-Total-Point-Difference, Column 3: Net Wins, and Column 4: Sign(Net Wins) (as we number the columns left to right).

We plot in Figure 5.5 the results obtained, in terms of the number of upsets (a) and the two correlation scores (b) and (c), as we compare all the methods considered so far across the four possible types of input (which index the columns from left to right in Figure 5.5). We remark that LS, SYNC, and RC achieve the lowest number of upsets across all possible inputs, and also the best results for one of the two correlation scores. The ranking in terms of performance based on the other correlation score varies across the different types of inputs, with SVD achieving the best results in two instances.

5.5 Numerical Comparison on the NCAA College Basketball Data Set

Our third and final real data set contains the outcomes of NCAA College Basketball matches during the regular season, for the time interval 1985 - 2014. During the regular season, a pair of teams most often plays against each other at most once. For example, during the 2014 season, in which 351 teams competed, there were a total of 5362 games, with each team playing on average about 30 games. We remark that in the earlier years, significantly fewer teams participated in the season, and therefore fewer games were played, which also explains the increasing number of upsets for the more recent years, as shown in Figure 5.6 (a). For example, during the 1985 season, 282 teams participated, playing a total of 3737 games. In Figure 5.6 (a) we compare the performance of all methods across the years, for both cardinal and ordinal measurements, i.e., when we consider the point difference or simply the winner of the game. Similarly, in Figure 5.6 (b) we compute the average number of upsets across all years in the interval 1985-2014, again for both cardinal and ordinal measurements. In the less frequent cases when a pair of teams plays against each other more than once, we simply average the available scores. We remark that the SYNC-SUP method, i.e., eigenvector-based synchronization based on the superiority score defined earlier, significantly outperforms all other methods in terms of the number of upsets. The second best results are obtained (in no particular order) by LS, SYNC, SYNC-SDP and RC, which all yield very similar results, while SVD and SER are clearly the worst performers. We left the SER-GLM method out of the simulations due to its long running time.
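The averaging of repeated matches mentioned above can be sketched as follows. The function name `comparison_matrix` and the game format `(i, j, diff)`, with `diff` the point difference of team i over team j, are assumptions made for illustration. For the ordinal case, the sketch records only the winner of each match (the sign of `diff`) before averaging, which is one plausible reading of the text.

```python
import numpy as np

def comparison_matrix(n, games, ordinal=False):
    """Average repeated matches into one skew-symmetric comparison matrix.

    `games` is assumed to be a list of (i, j, diff) tuples, where diff is the
    point difference of team i over team j. With ordinal=True, only the sign
    of each match outcome is recorded before averaging.
    """
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    for i, j, diff in games:
        value = np.sign(diff) if ordinal else diff
        total[i, j] += value
        total[j, i] -= value
        count[i, j] += 1
        count[j, i] += 1
    C = np.zeros((n, n))
    played = count > 0               # pairs with at least one match
    C[played] = total[played] / count[played]
    return C

# Toy data: teams 0 and 1 meet twice, teams 1 and 2 meet once.
games = [(0, 1, 10), (0, 1, -4), (1, 2, 7)]
C_card = comparison_matrix(3, games)
C_ord = comparison_matrix(3, games, ordinal=True)
print(C_card)
print(C_ord)
```

Pairs that never met keep a zero entry, so the resulting matrix is incomplete in the same sense as the comparison graphs discussed in the text.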

(a) Number of upsets, for each year in the interval 1985 - 2014.
(b) Average number of upsets across 1985 - 2014.
Figure 5.6: A comparison of the methods on the NCAA College Basketball Championship data set, over the interval 1985-2014, in terms of the number of upsets, across the years (left) and averaged over the entire period (right). In the left figure, we append the suffix "ord", respectively "card", when the input to the ranking algorithms is given by ordinal measurements that record only the winner of each match, respectively cardinal measurements that rely on the point difference for each match.

6 Rank Aggregation

In many practical instances, it is often the case that multiple voters or rating systems provide incomplete and inconsistent rankings or pairwise comparisons between the same set of players or items. For example, at a contest, two players compete against each other, and judges decide what the score should be. Or perhaps, as in certain championships, teams may play against each other multiple times, as is the case in the USA Basketball Championship where teams may play multiple matches against each other throughout a season. In such settings, a natural question to ask is, given a list of (possibly incomplete) matrices of size corresponding to ordinal or cardinal rank comparisons on a set of players, how should one aggregate all the available data and produce a single ranking of the players, that is as consistent as possible with the observed data? A particular instance of this setup occurs when many partial (most likely inconsistent) rankings are provided on a subset of the items; for example, given a partial ranking of size , the resulting comparison matrix will only have nonzero elements corresponding to all paired comparisons of the items (chosen out of all items). The difficulty of the problem is amplified on one hand by the sparsity of the measurements, since each ratin