Identifying top football players and springboard clubs from a football player collaboration and club transfer networks
Abstract
We consider all players and clubs in the top twenty world football leagues in the last fifteen seasons. The purpose of this paper is to reveal the top football players and to identify springboard clubs. To do that, we construct two separate weighted networks. The player collaboration network consists of players who are connected to each other if they ever played together at the same club. In the directed club transfer network, clubs are connected if players were ever transferred from one club to another. To get meaningful results, we apply different network analysis methods to our networks. Our approach based on PageRank reveals Cristiano Ronaldo as the top player. Using a variation of betweenness centrality, we identify Standard Liege as the best springboard club.
keywords:
football network, sports networks, network analysis, measures of centrality
1 Introduction
Football is probably the most popular sport in the world, with around 265 million active players around the globe (1), and even more people enjoy watching it. Every year football clubs spend a lot of money in an attempt to build a strong team by buying good players from their rivals.
Most data available on the websites of official football associations or tournaments cover only a specific match, tournament or season. To collect data for the top leagues over the last few seasons, we need to look elsewhere.
A useful source from this perspective is www.transfermarkt.co.uk. It covers all the major leagues, including their clubs, rankings and players, together with information about the players and their estimated market values.
In an attempt to analyse players and the connections between them, we construct a large network of professional football players from different clubs in different leagues. We are particularly interested in the influence of teammates on a football player and in whether it is possible to identify the best players based on knowing with whom they play now and where they played in the past. With these analyses we may be able to find out which players are the best according to different metrics.
In a football player network, two players are connected to each other if they have ever played for the same club. Such a network can be represented by a bipartite graph consisting of clubs and players. Every player is connected to all the clubs he has played for, and through the nodes that represent clubs we can see which players played together for a specific club. To simplify the analysis, we treat players and clubs separately and project the bipartite graph onto a network consisting only of nodes that represent football players. Two nodes are connected to each other if the players were ever teammates. This is an undirected network.
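The projection described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `project_players`, the `(player, club, season)` record layout and the toy records are hypothetical, and grouping by club and season is one possible reading of "played together".

```python
from collections import defaultdict
from itertools import combinations


def project_players(memberships):
    """Project (player, club, season) records onto a player-player network.

    Returns a dict mapping an undirected edge (a, b), with a < b, to the set
    of seasons in which the two players were at the same club.
    """
    # Bipartite side: every (club, season) node lists the players attached to it.
    squad = defaultdict(set)
    for player, club, season in memberships:
        squad[(club, season)].add(player)

    # Projection: connect every pair of players that share a (club, season) node.
    shared_seasons = defaultdict(set)
    for (_club, season), players in squad.items():
        for a, b in combinations(sorted(players), 2):
            shared_seasons[(a, b)].add(season)
    return dict(shared_seasons)


# Hypothetical toy records, not taken from the data set.
records = [
    ("A", "Club X", 2010), ("B", "Club X", 2010),
    ("C", "Club X", 2011),
    ("A", "Club Y", 2012), ("C", "Club Y", 2012),
]
edges = project_players(records)
```

Keeping the shared seasons on each edge makes it possible to compute a season-dependent edge weight later, without rebuilding the network.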
Apart from analysing players, we also want to identify the best springboard clubs, which serve as players' entry points into the best football clubs in the world. Because the first network contains no information about clubs, we construct a second network, the club transfer network. Its nodes are the clubs from the top twenty leagues, and two clubs are connected if any player was ever transferred from one to the other. Each edge points from the club that sold the player to the club that bought him.
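A minimal sketch of how such a directed transfer network could be assembled from transfer records is shown here. The record layout `(player, selling_club, buying_club)`, the function name `transfer_counts` and the sample records are assumptions made for illustration only.

```python
from collections import Counter


def transfer_counts(transfers):
    """Count transfers per directed club pair (selling club -> buying club)."""
    counts = Counter()
    for _player, seller, buyer in transfers:
        if seller != buyer:  # ignore records that do not change the club
            counts[(seller, buyer)] += 1
    return counts


# Hypothetical toy records, not taken from the data set.
sample = [
    ("p1", "Club A", "Club B"),
    ("p2", "Club A", "Club B"),
    ("p3", "Club B", "Club A"),
]
counts = transfer_counts(sample)
```

Each key of the result is a directed edge, so transfers in opposite directions between the same two clubs are counted separately.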
Preliminary analysis of an unweighted, undirected player collaboration network shows that weighted networks are needed to extract information about the best players. We expect very well-known football players to come out on top when analysing a weighted player collaboration network. To identify springboard clubs, a weighted directed club transfer network has to be constructed. The edge weights are calculated using different equations that take multiple metrics into account. Using these networks we identify the top players in the world of football and the top springboard clubs.
2 Related Work
As pointed out by Peña and Touchette (2), football data has become more easily available in recent years, since FIFA has published more match data on its website. Many authors have taken advantage of this and constructed different networks in order to analyse them and extract information.
The same authors (2) reveal the key players of a team by analysing its passing network. They show that it is possible to identify different team strategies, such as focusing passes on a single player or distributing passes evenly between all players in the team. They perform several analyses on a team passing network using well-known network analysis methods such as PageRank, betweenness centrality and closeness centrality.
Player contribution to a team was also analysed by Duch et al. (3). They used a variation of the betweenness centrality of a player with regard to the opponent's goal, which the authors call flow centrality.
We use similar network theory methods, but we adapt them to test different theories.
Cotta et al. (4) dug a little deeper but followed the same idea. They concentrated on one specific team and constructed several networks for the same match, introducing a time dimension.
Although there are various papers on football network analysis, the majority of football networks consider only a certain match or tournament. In this paper we construct a much larger network consisting of thousands of players.
In other team sports, such as cricket, some authors have already tried to identify the best individuals among all the players that played over a certain time period.
Very interesting networks covering sportsmen over several decades were analysed in (5) and (6). In (6) the authors attempted to find out who is the best player in the history of tennis. We try to construct a somewhat similar network, but since football is very different from tennis the networks still differ a lot. The main difference is that football is a team sport, so we cannot simply link players based on their matches. Here players are connected based on their affiliation to a club.
3 Methods
3.1 Data Extraction and Network Construction
In this paper we analyse a large set of football players throughout the past fifteen seasons. To collect this data we use the site www.transfermarkt.co.uk, which is becoming the leading portal for football players and information about them. Several scripts are used to extract the relevant data for different clubs and players. The network is constructed from players of the 20 most valuable football leagues from 2001 to 2016. The leagues and their values are presented in Table 3.
Using the gathered data we construct two separate networks, the first consisting of football players and the second of football clubs. The football player network is a player collaboration network in which players are connected if they ever played together at the same club. It is an undirected weighted network consisting of 36,214 nodes and 1,412,232 edges.
Other basic network properties are shown in Table 1.
The club network is a directed transfer network between all the clubs in the top twenty world football leagues. Nodes represent clubs, and a club is connected to another club if a player was ever transferred from the first club to the second. It is a directed weighted network consisting of 330 nodes and 12,841 edges.
Other basic network properties are shown in Table 2.
Table 1: Basic properties of the player collaboration network
Property  Value
Nodes  36,214
Edges  1,412,232
Fraction of nodes in LCC  99.96%
Average degree  78.02
Average clustering  0.67
Table 2: Basic properties of the club transfer network
Property  Value
Nodes  330
Edges  12,841
Fraction of nodes in LCC  99.90%
Average degree  77.82
Average distance  2.03
Table 3: Leagues and their values
League  Value [£]  League  Value [£]
Premier League (ENG)  3.01bn  Pro League (BEL)  354m
La Liga (ESP)  2.25bn  Primera Division (ARG)  306m
Serie A (ITA)  1.79bn  Premier Liga (UKR)  299m
1. Bundesliga (GER)  1.65bn  Super League (GRE)  227m
Ligue 1 (FRA)  1.06bn  Super League (SWI)  175m
Super Lig (TUR)  698m  MLS (USA)  162m
Premier Liga (RUS)  638m  Liga 1 (ROM)  118m
Serie A (BRA)  608m  1. HNL (CRO)  115m
Liga NOS (POR)  574m  Bundesliga (AUT)  110m
Eredivisie (NED)  375m  Premiership (SCO)  85m
3.2 Player Network Analysis
To reveal the best players in our network, we need an appropriate measure of node importance. Since we want to identify the best players of the last fifteen seasons, we expect the best-known and most valued names in football to be at the top of the list. So as not to neglect younger players, we also separate players into age groups and analyse each group individually in order to identify the most promising players. Since our network is a collaboration network, we have to distinguish between its edges. The players that play with the best players are usually good themselves. Players with a lower value may change clubs, and therefore teammates, many times in a couple of seasons, but weighting the edges penalises such connections. In general, a player's market value is a good indicator of his quality. Therefore we choose market value as the core property for calculating the edge weight. Since our data spans fifteen seasons, we have to take inflation into account, so that good players who played in the past are not penalised. We take the average inflation rate from Eurostat (7). The final formula for calculating the weight of a specific edge is
(1) 
In Equation 1, the weight of an edge is computed from the market values of the two players it connects, the seasons in which they played together, and the average yearly inflation rate for Europe in the last 13 years. The result is divided by 100,000 to obtain smaller numbers. To determine which node is the most important, we use one of the most popular node importance algorithms, PageRank (8), and calculate the PageRank score of every node in our weighted network. To identify the most promising players, we separate players into age groups; the most promising players are those with the highest scores in their age group.
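To illustrate how edge weights enter the PageRank computation, here is a minimal sketch of weighted PageRank by power iteration on an undirected network. It is not the implementation used for the results in this paper: the damping factor of 0.85, the convergence tolerance, the function name `weighted_pagerank` and the toy weights are all assumptions.

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-12, max_iter=500):
    """Weighted PageRank on an undirected graph given as {(u, v): weight}."""
    # Each undirected edge is stored in both directions.
    neighbours = {}
    for (u, v), w in edges.items():
        neighbours.setdefault(u, {})[v] = w
        neighbours.setdefault(v, {})[u] = w

    nodes = sorted(neighbours)
    n = len(nodes)
    # Strength = sum of the weights of all edges incident to a node.
    strength = {u: sum(neighbours[u].values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            for v, w in neighbours[u].items():
                # A node passes its score on in proportion to the edge weights.
                new_rank[v] += damping * rank[u] * w / strength[u]
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy network: "a" and "b" share a heavy edge, "c" is weakly attached.
toy_scores = weighted_pagerank({("a", "b"): 10.0, ("a", "c"): 1.0, ("b", "c"): 1.0})
```

In the toy network the two players joined by the heavy edge receive equal scores that are higher than the score of the weakly attached player, which is the behaviour the weighting is meant to produce.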
3.3 Club Network Analysis
From the club transfer network we want to identify the springboard clubs. These are the clubs where younger players gather experience before being sold to better clubs, or even to the best clubs in the world. As with the player collaboration network, this network has to be weighted. We are able to extract the number of transfers in both directions for all pairs of clubs, but the absolute number alone does not provide the information needed to identify springboard clubs. Thus, we multiply the number of transfers represented by each edge by a weight related to the importance of the destination club. The importance of the destination club is calculated using two different equations. One is based on the average ranking of the destination club in the past fifteen seasons and the ranking of the league it plays in, and the other is based on the value of the destination club. Both equations are stated and explained below.
(2) 
(3) 
The weight in Equation 2 is calculated as the reciprocal of the destination club's average ranking in the past fifteen seasons, multiplied by our predefined ranking of the destination club's league. The predefined league rankings are listed in Table 4 and are defined for the purpose of this paper.
The weight in Equation 3 is calculated as the destination club's average value in the past fifteen seasons, divided by a constant to lower the weight values.
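The two destination-club weights can be sketched in code directly from their verbal definitions. The function names and the `scale` parameter are hypothetical; in particular, the constant used as the divisor in Equation 3 is not given in the text, so it is left as an explicit argument rather than guessed.

```python
def rank_weight(average_ranking, league_ranking):
    """Weight by rank: league ranking times the reciprocal of the club's average ranking."""
    return league_ranking / average_ranking


def value_weight(average_value, scale):
    """Weight by value: average club value divided by a scaling constant.

    `scale` is an assumption: the text does not state the actual constant.
    """
    return average_value / scale


# Illustration with league rankings from Table 4 and made-up club figures:
# a club finishing 2nd on average in La Liga (ranking 100) and
# a club finishing 4th on average in MLS (ranking 5).
la_liga_club = rank_weight(average_ranking=2.0, league_ranking=100)
mls_club = rank_weight(average_ranking=4.0, league_ranking=5)
```

With these definitions, a transfer into a highly placed club of a highly ranked league contributes far more to an edge than a transfer into a mid-table club of a low-ranked league.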
To identify springboard clubs we have to choose a different method from the one we use for the player collaboration network. The most important features of this network are the transfer paths from less valuable clubs to the most valuable ones. A club is considered a springboard if it is involved in a lot of transfers to the most valuable clubs. Thus, betweenness centrality (9) is the most suitable measure. We implement the fast betweenness algorithm of Brandes (10). Since our network is weighted, we have to modify the proposed algorithm so that it takes weights into account. The only difference from the proposed algorithm is the calculation of path lengths: instead of adding one for every hop, we add a length derived from the edge weight. We use the reciprocal of the weight as the edge length, because in our network a larger weight is better and we want shortest paths to favour edges with larger weights.
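The modification described above can be sketched as a Brandes-style betweenness computation in which the breadth-first search is replaced by Dijkstra's algorithm and every edge has length 1/weight. This is an illustrative sketch, not the code used for the results: the function name, the normalisation by (n-1)(n-2) and the toy network are assumptions.

```python
import heapq
from collections import defaultdict


def weighted_betweenness(weights):
    """Betweenness on a directed graph {(u, v): weight}; edge length is 1 / weight."""
    adj = defaultdict(list)
    nodes = set()
    for (u, v), w in weights.items():
        adj[u].append((v, 1.0 / w))  # larger weight -> shorter edge
        nodes.update((u, v))

    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Dijkstra from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0.0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        done, order = set(), []
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            order.append(v)
            for w, length in adj[v]:
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w] = nd
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies in order of decreasing distance from s.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]

    # Assumed normalisation for directed graphs: divide by (n - 1)(n - 2).
    n = len(nodes)
    if n > 2:
        score = {v: b / ((n - 1) * (n - 2)) for v, b in score.items()}
    return score


# Hypothetical toy network: the strong path small -> spring -> top
# is shorter than the weak direct edge small -> top.
toy = weighted_betweenness({
    ("small", "spring"): 1.0,
    ("spring", "top"): 10.0,
    ("small", "top"): 0.2,
})
```

In the toy network the only shortest path from "small" to "top" runs through "spring", so "spring" is the only node with a non-zero score.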
Table 4: League rankings
League  Ranking  League  Ranking
La Liga (ESP)  100  Premier League (UKR)  20
Premier League (ENG)  95  Super League (SWI)  20
Serie A (ITA)  85  Serie A (BRA)  20
Bundesliga (GER)  75  Super Lig (TUR)  15
Ligue 1 (FRA)  50  Primera Division (ARG)  15
Primera Liga (POR)  40  Super League (GRE)  13
Eredivisie (NED)  40  Liga 1 (ROM)  12
Pro League (BEL)  25  1. HNL (CRO)  10
Premier League (RUS)  25  Bundesliga (AUT)  10
Premiership (SCO)  20  MLS (USA)  5
4 Results and Discussion
4.1 Top players
The analysis of the player collaboration network identifies Cristiano Ronaldo as the best player. He is followed by several other players who have played for several of the best clubs. Table 5 lists the top 20 players identified by our algorithm together with their scores, and shows that a player's value is not the only factor that affects his score. Players like Beckham, Ronaldinho, Kaká and Keane, whose market value has recently decreased considerably because of their age, still have high scores because they played for many important clubs during their careers. Most players on the top 20 list are still active today and play in the best leagues.
The most promising players in each age group are listed in Tables 9, 8, 7 and 6.
When assessing a player's potential, the most important factor besides his value and the values of his teammates is his age. Since our network is an undirected network connecting two players, age cannot simply be added to the weight equation. Including age in the weight equation would favour not only players with valuable teammates but also players with younger teammates, which is not desired. Therefore, to identify the most promising players, we keep the network unchanged and only interpret the results differently: we divide players into groups based on their age and compare scores only within the same group.
On average, older players have higher scores, which is expected: they have played more seasons, which results in a higher degree. Thus, the separation into age groups is beneficial. Some of the most promising players according to our algorithm already play for the best clubs, and others, despite their young age, play an important role in their clubs.
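The within-group comparison can be sketched as a simple grouping step applied to already computed scores. The function name, the choice of one group per year of age and the placeholder players are assumptions, since the exact group boundaries are not stated in the text.

```python
from collections import defaultdict


def top_per_age_group(scores, ages, k=5):
    """Return the k highest-scoring players within each age group."""
    groups = defaultdict(list)
    for player, score in scores.items():
        groups[ages[player]].append((score, player))
    # Sort by descending score; ties are broken alphabetically by name.
    return {
        age: [player for _score, player in
              sorted(members, key=lambda m: (-m[0], m[1]))[:k]]
        for age, members in groups.items()
    }


# Hypothetical placeholder players and scores.
example_scores = {"p1": 0.5, "p2": 0.2, "p3": 0.1, "p4": 0.05}
example_ages = {"p1": 30, "p2": 19, "p3": 19, "p4": 30}
best = top_per_age_group(example_scores, example_ages, k=1)
```

Because scores are only compared within a group, a young player with a low absolute score can still be ranked first among players of the same age.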
Based on the results, we can conclude that PageRank is an appropriate algorithm for determining the best players in our weighted network.
Table 5: Top 20 players by PageRank score
Player  PageRank score  Value 2015/16 [£]
Cristiano Ronaldo  0.000557  77,000,000
Lionel Messi  0.000544  84,000,000
David Beckham  0.000528  /
Zlatan Ibrahimović  0.000459  10,500,000
Ronaldinho Gaúcho  0.000444  1,005,000
Kaká  0.000417  3,500,000
Wayne Rooney  0.000407  28,000,000
Fernando Torres  0.000402  4,900,000
Steven Gerrard  0.000400  1,400,000
Samuel Eto’o  0.000399  1,400,000
Robbie Keane  0.000390  876,000
Daniele De Rossi  0.000389  5,250,000
Neymar  0.000388  70,000,000
Cesc Fábregas  0.000377  35,000,000
Sergio Agüero  0.000376  42,000,000
Andrés Iniesta  0.000376  24,500,000
Wesley Sneijder  0.000370  10,500,000
David Villa  0.000358  4,900,000
Gianluigi Buffon  0.000349  1,400,000
Carlos Tévez  0.000347  14,000,000
Player  PageRank score  Player  PageRank score 

Gianluigi Donnarumma  0.000020  Hachim Mastour  0.000023 
Alexandru Petrus  0.000011  Ianis Hagi  0.000020 
Maximiliano Romero  0.000010  Dani Olmo  0.000017 
Robert Moldoveanu  0.000009  Martin Ödegaard  0.000015 
Vlad Dragomir  0.000009  Reece Oxford  0.000015 
Player  PageRank score  Player  PageRank score 

Youri Tielemans  0.000070  Alen Halilovic  0.000052 
Breel Embolo  0.000054  Gabriel  0.000052 
Malcom  0.000042  Kingsley Coman  0.000050 
Ante Ćorić  0.000036  Timo Werner  0.000049 
Andrija Balić  0.000035  Fabrice Olinga  0.000044 
Player  PageRank score  Player  PageRank score 

Max Meyer  0.000058  Mateo Kovacic  0.000107 
Luke Shaw  0.000057  Marquinhos  0.000092 
Adrien Rabiot  0.000053  Domenico Berardi  0.000091 
Ángel Correa  0.000052  Raheem Sterling  0.000089 
Dorin Rotariu  0.000052  Gerard Deulofeu  0.000086 
Player  PageRank score  Player  PageRank score 

Romelu Lukaku  0.000202  Neymar  0.000388 
Paul Pogba  0.000151  Lucas  0.000178 
Julian Draxler  0.000143  Mario Götze  0.000175 
Raphaël Varane  0.000104  Christian Eriksen  0.000157 
Luciano Vietto  0.000096  Jack Wilshere  0.000148 
4.2 Springboard Clubs Identification
The club transfer network analysis identifies Standard Liege as the best springboard club among the clubs in the top twenty leagues. The results are plausible, since the top 15 list does not contain the most valuable and best clubs in the world. The top 15 clubs by betweenness centrality, with scores calculated on the network using both weight equations, are listed in Table 10. The results show only a very slight difference between the two proposed weight equations. The top two clubs are the same regardless of the weight, and the third and fourth clubs switch positions when the weight equation is changed. Nearly all the clubs on the top 15 list are from less valuable leagues. Such clubs normally buy younger players who are more affordable and sell those whose value rises above a certain level. This makes them a perfect springboard for younger and less experienced players. Because of this transfer activity, these clubs obtain high betweenness centrality scores, as they play an important role in the transfer paths from less valuable clubs to the best clubs.
Table 10: Club ranking using betweenness centrality

Club  Score by value (Eq. 3)  Club  Score by rank (Eq. 2) 
Standard Liege  0.013605  Standard Liege  0.012823 
AEK Athens  0.011217  AEK Athens  0.012240 
SL Benfica  0.010937  Sporting CP  0.010424 
Sporting CP  0.010312  SL Benfica  0.010172 
Skoda Xanthi  0.009605  AS Monaco  0.009275 
Dinamo Bukarest  0.008743  FC Porto  0.008988 
AS Monaco  0.008704  Rubin Kazan  0.008884 
Dinamo Zagreb  0.008675  CFR Cluj  0.008681 
Olympiacos Pir.  0.008553  Skoda Xanthi  0.008638 
CFR Cluj  0.008542  Dinamo Bukarest  0.008518 
Steaua Bucharest  0.008180  Olympiacos Pir.  0.008397 
Udinese Calcio  0.007899  Rangers FC  0.008216 
FC Porto  0.007889  Dinamo Zagreb  0.008170 
Celtic FC  0.007849  Iraklis Thess.  0.007925 
Petrolul Ploiesti  0.007794  Red Bull Salzburg  0.007907 
5 Conclusion
The player collaboration network for the past fifteen seasons of the top twenty football leagues consists of over 36 thousand nodes and more than 1.4 million edges. Therefore, time- and space-consuming algorithms can prove too demanding to run on regular computers. The weighted PageRank algorithm, however, was able to calculate the scores for all the players in a reasonable time. With the PageRank algorithm and a proper edge weight, we are able to identify the top players of the last fifteen seasons. A very important factor in the weight equation is the inflation rate, which ensures that older players who were never as valuable as the best players of recent seasons are also present on the top players list.
Using the same network, we are also able to identify the most promising football players by separating their PageRank scores into age groups. With this approach, we compare only players of similar age who have played for a similar number of seasons. This ensures the same conditions for all the players in a specific age group. The results highlight some young players who already play for the best football clubs and some young players from less known clubs, where they play an essential role.
The results of the club transfer network analysis closely match our initial hypothesis that clubs from less valuable leagues would come out on top. We are able to identify springboard clubs from the data on player transfers in the past fifteen seasons by constructing a directed weighted network, with weights based on the club values or on the club rankings in the past seasons. On this network, we use a weighted betweenness centrality algorithm to reveal the best springboard clubs in the top football leagues in the world. Our algorithm identifies clubs from the Belgian, Greek and Portuguese leagues as the best springboard clubs.
References

(1) FIFA, Big Count (2006). URL http://www.fifa.com/worldfootball/bigcount/
(2) J. L. Peña, H. Touchette, A network theory analysis of football strategies, arXiv preprint arXiv:1206.6904.
(3) J. Duch, J. S. Waitzman, L. A. N. Amaral, Quantifying the performance of individual players in a team activity, PLoS ONE 5 (6) (2010) e10937.
(4) C. Cotta, A. M. Mora, J. J. Merelo, C. Merelo-Molina, A network analysis of the 2010 FIFA World Cup champion team play, Journal of Systems Science and Complexity 26 (1) (2013) 21–42.
(5) S. Mukherjee, Identifying the greatest team and captain - a complex network approach to cricket matches, Physica A: Statistical Mechanics and its Applications 391 (23) (2012) 6066–6076.
(6) F. Radicchi, M. Perc, Who is the best player ever? A complex network analysis of the history of professional tennis, PLoS ONE 6 (2) (2011) e17249.
(7) Eurostat, HICP inflation rate (2015). URL http://ec.europa.eu/eurostat/
(8) L. Page, S. Brin, R. Motwani, T. Winograd, PageRank: Bringing order to the web, Tech. rep., Stanford Digital Libraries Working Paper (1997).
(9) L. C. Freeman, A set of measures of centrality based on betweenness, Sociometry (1977) 35–41.
(10) U. Brandes, A faster algorithm for betweenness centrality, Journal of Mathematical Sociology 25 (2) (2001) 163–177.