Identifying top football players and springboard clubs from a football player collaboration and club transfer networks

Matic Tribušon mt0932@student.uni-lj.si Matevž Lenič ml9497@student.uni-lj.si University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, SI-1000 Ljubljana, Slovenia
Abstract

We consider all players and clubs in the top twenty world football leagues over the last fifteen seasons. The purpose of this paper is to reveal the top football players and to identify springboard clubs. To do so, we construct two separate weighted networks. The player collaboration network consists of players who are connected to each other if they ever played together at the same club. In the directed club transfer network, clubs are connected if a player was ever transferred from one club to the other. We then apply network analysis methods to both networks. Our approach based on PageRank reveals Cristiano Ronaldo as the top player. Using a variation of betweenness centrality, we identify Standard Liege as the best springboard club.

keywords:
football network, sports networks, network analysis, measures of centrality

1 Introduction

Football is probably the most popular sport in the world, with around 265 million active players around the globe [1], and even more people enjoy watching it. Every year football clubs spend large sums of money in an attempt to build a strong team by buying good players from their rivals. Most of the data available on the websites of official football unions or tournaments addresses only a specific match, tournament or season. To collect data for the top leagues over the last few seasons, we need to look elsewhere. A very useful website from this perspective is www.transfermarkt.co.uk. It covers all the major leagues, including their clubs, rankings and players, together with information about the players and their estimated market value.
To analyse players and the connections between them, we construct a large network of professional football players from different clubs in different leagues. We are particularly interested in the influence of teammates on a football player and in whether the best players can be identified from whom they play with now and where they played in the past. These analyses let us rank players according to different metrics. In the football player network, two players are connected to each other if they have ever played for the same club. Such a network can be represented by a bipartite graph consisting of clubs and players. Every player is connected to all the clubs he has played for, and through the nodes that represent clubs we can see which players played together for a specific club. For simpler analysis, we project the bipartite graph onto a network that contains only nodes representing football players. Two nodes are connected if the players were ever teammates. This is an undirected network.
Apart from the analysis of players, we also want to identify the best springboard clubs, which serve as players' entry points into the best football clubs in the world. Because the first network does not include information about clubs, we construct a second network, a club transfer network. Its nodes are the clubs from the top twenty leagues, and two clubs are connected if any player was ever transferred from one to the other. Each edge points from the club that sold the player to the club that bought the player.
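The projection of the bipartite player-club graph onto players can be sketched as follows. This is a minimal illustration, not the authors' extraction code: the record layout `(player, club, season)`, the function name `project_to_players` and the toy names are assumptions, and the sketch treats two players as teammates only if they were at the same club in the same season.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (player, club, season). Names are illustrative only.
records = [
    ("Player A", "Club X", 2010),
    ("Player B", "Club X", 2010),
    ("Player B", "Club Y", 2011),
    ("Player C", "Club Y", 2011),
    ("Player A", "Club X", 2011),
    ("Player C", "Club X", 2012),
]

def project_to_players(records):
    """Project the player-club bipartite graph onto players.

    Returns a dict mapping a sorted player pair to the set of
    (club, season) squads that the two players shared.
    """
    squads = defaultdict(set)  # (club, season) -> players in that squad
    for player, club, season in records:
        squads[(club, season)].add(player)

    shared = defaultdict(set)
    for squad, players in squads.items():
        # Every pair of squad members becomes one undirected edge.
        for a, b in combinations(sorted(players), 2):
            shared[(a, b)].add(squad)
    return dict(shared)

player_edges = project_to_players(records)
```

Keeping the shared squads on each edge, rather than a bare yes/no link, leaves room for computing a weight from the seasons two players spent together.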
Preliminary analysis of an unweighted undirected player collaboration network shows that weighted networks are needed to extract information about the best players. We expect very well-known football players to come out on top when analysing the weighted player collaboration network. To identify springboard clubs, a weighted directed club transfer network has to be constructed. The edge weights are calculated using different equations that take multiple metrics into account. Using these networks we identify the top players in the world of football and the top springboard clubs.

2 Related Work

As pointed out in [2], football data has become more easily available in recent years, since FIFA has published more data on individual matches on its website. Many authors have taken advantage of this and constructed different networks in order to analyse them. In [2] the authors reveal the key players of a team by analysing that team's passing network. They show that different team strategies can be identified, such as focusing passes on a single player or distributing passes evenly among all players in the team. They perform several analyses on a team passing network using well-known network analysis methods such as PageRank, betweenness centrality and closeness centrality.
Player contribution to a team was also analysed in [3]. The authors used a variation of betweenness centrality of the player with regard to the opponent's goal, which they call flow centrality. We use similar network theory methods, but we adapt them to test different hypotheses. The authors of [4] went a little deeper but followed the same idea. They concentrated on one specific team and constructed several networks for the same match, introducing the time dimension.
Although there are various papers on football network analysis, most football networks consider only a certain match or tournament. In this paper we construct a much larger network consisting of thousands of players. In other sports, such as cricket and tennis, some authors have already tried to identify the best individuals among all the players that played over a certain time period. Very interesting networks covering sportsmen over several decades were analysed in [5, 6]. In [6] the authors attempted to find out who is the best tennis player in the history of the sport. We construct a somewhat similar network, but since football is very different from tennis, the networks still differ considerably. The main difference is that football is a team sport, so we cannot simply link players based on their matches. Instead, players are connected based on their affiliation to a club.

3 Methods

3.1 Data Extraction and Network Construction

In this paper we analyse a large set of football players over the past fifteen seasons. To collect the data we use the website www.transfermarkt.co.uk, which is becoming the leading portal for information about football players. Several scripts are used to extract the relevant data for different clubs and players. The network is constructed from players in the 20 most valuable football leagues from 2001 to 2016. The leagues and their values are presented in Table 3.
Using the gathered data we construct two separate networks, the first consisting of football players and the second consisting of football clubs. The football player network is a collaboration network in which players are connected if they ever played together at the same club. It is an undirected weighted network consisting of 36,214 nodes and 1,412,232 edges. Other basic network properties are shown in Table 1. The club network is a directed transfer network between all the clubs in the top twenty world football leagues. Nodes represent clubs, and a club is connected to another club if a player was ever transferred from the first club to the second. It is a directed weighted network consisting of 330 nodes and 12,841 edges. Other basic network properties are shown in Table 2.
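The directed club transfer network can be sketched in the same spirit. The snippet below is a minimal illustration under assumptions: the record layout `(player, selling club, buying club)`, the function name `build_transfer_network` and the toy clubs are hypothetical. It only counts transfers per ordered pair of clubs, which is the raw quantity that the edge weights are later applied to.

```python
from collections import Counter

# Hypothetical transfer records: (player, selling club, buying club).
transfers = [
    ("Player A", "Club X", "Club Y"),
    ("Player B", "Club X", "Club Y"),
    ("Player C", "Club Y", "Club Z"),
    ("Player D", "Club Y", "Club X"),
]

def build_transfer_network(transfers):
    """Count transfers for every ordered (seller, buyer) pair of clubs."""
    return Counter((seller, buyer) for _, seller, buyer in transfers)

transfer_edges = build_transfer_network(transfers)
clubs = {club for edge in transfer_edges for club in edge}
```

Because the pair is ordered, transfers from Club X to Club Y and from Club Y to Club X are stored as two different edges, matching the direction from seller to buyer.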

Property                    Value
Nodes                       36,214
Edges                       1,412,232
Fraction of nodes in LCC    99.96%
Average degree              78.02
Average clustering          0.67
Table 1: Player collaboration network properties

Property                    Value
Nodes                       330
Edges                       12,841
Fraction of nodes in LCC    99.90%
Average degree              77.82
Average distance            2.03
Table 2: Club transfer network properties

Figure 1: Player network degree distribution

League                  Value [£]   League                    Value [£]
Premier League (ENG)    3.01bn      Pro League (BEL)          354m
La Liga (ESP)           2.25bn      Primera Division (ARG)    306m
Serie A (ITA)           1.79bn      Premier Liga (UKR)        299m
1. Bundesliga (GER)     1.65bn      Super League (GRE)        227m
Ligue 1 (FRA)           1.06bn      Super League (SWI)        175m
Super Lig (TUR)         698m        MLS (USA)                 162m
Premier Liga (RUS)      638m        Liga 1 (ROM)              118m
Serie A (BRA)           608m        1. HNL (CRO)              115m
Liga NOS (POR)          574m        Bundesliga (AUT)          110m
Eredivisie (NED)        375m        Premiership (SCO)         85m
Table 3: Values of the top twenty football leagues

3.2 Player Network Analysis

To reveal the best players in our network, we need an appropriate method of determining node importance. Since we want to identify the best players of the last fifteen seasons, we expect the best-known and most valued names in football to be at the top of the list. So as not to neglect younger players, we also separate players into age groups and analyse each age group individually in order to identify the most promising players. Since our network is a collaboration network, we have to weight the edges. Players who play with the best players are usually good themselves. Players with a lower value may change many clubs and teammates within a couple of seasons, but the weighting penalises their edges. In general, a player's market value is a good indicator of his quality. We therefore choose market value as the core property for calculating the edge weight. Since our data spans fifteen seasons, we have to take inflation into account, so that good players who played in the past are not penalised. We take the average inflation rate from [7]. The final formula for calculating the weight of a specific edge is

(1)

The weight in Equation 1 depends on the market values of the two players connected by the edge, the seasons in which they played together, and the average yearly inflation ratio for Europe over the last 13 years. The result is divided by 100,000 to obtain smaller numbers. To calculate which node is the most important, we choose one of the most popular node importance algorithms, PageRank [8]. We calculate the PageRank score of every node in our weighted network. To identify the most promising players, we compare scores within age groups: the most promising players are those with the highest score in their age group.
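The PageRank step on a weighted undirected network can be sketched as follows. This is a minimal power-iteration sketch, not the authors' implementation: the damping factor of 0.85, the tolerance, the function name `weighted_pagerank` and the toy edge weights are all assumptions made for the example.

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank on an undirected graph given as {(u, v): weight}.

    Assumes all weights are positive, so every node has non-zero strength.
    """
    # Build a symmetric adjacency structure from the undirected edges.
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    nodes = sorted(adj)
    n = len(nodes)
    strength = {u: sum(adj[u].values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                # u passes its score to v in proportion to the edge weight.
                new[v] += damping * rank[u] * w / strength[u]
        change = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if change < tol:
            break
    return rank

# Hypothetical toy weights: A and B share a heavy edge, C has two light edges.
toy_edges = {("A", "B"): 5.0, ("B", "C"): 1.0, ("A", "C"): 1.0}
scores = weighted_pagerank(toy_edges)
```

In this toy network A and B receive the same score and both outrank C, because a node passes on its score in proportion to edge weight and the heavy A-B edge carries most of it.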

3.3 Club Network Analysis

From the club transfer network we want to identify the springboard clubs. These are the clubs where younger players gain experience before being sold to better clubs, or even to the best clubs in the world. Like the player collaboration network, this network has to be weighted. We can extract the number of transfers in both directions for every pair of clubs, but the absolute number does not provide the information necessary for springboard club identification. We therefore weight every edge, which represents the transfers from one club to another, with a weight related to the importance of the destination club. The importance of the destination club is calculated using two different equations. One is based on the average ranking of the destination club over the past fifteen seasons and the ranking of the league it plays in; the other is based on the value of the destination club. Both equations are stated and explained below.

$$w = \frac{1}{\bar{r}_d} \cdot \ell_d \tag{2}$$
$$w = \frac{\bar{v}_d}{c} \tag{3}$$

The weight in Equation 2 is the reciprocal of the destination club's average ranking $\bar{r}_d$ over the past fifteen seasons, multiplied by our predefined ranking $\ell_d$ of the league in which the destination club plays. The predefined league rankings are listed in Table 4 and are defined for the purpose of this paper. The weight in Equation 3 is the destination club's average value $\bar{v}_d$ over the past fifteen seasons, divided by a constant $c$ to lower the weight values.
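The two destination-club weights described above can be illustrated with a short sketch. The function names, the average rankings, the club value and the scaling constant of 1,000,000 are hypothetical values chosen only for the example. The league rankings 100 and 25 are the Table 4 entries for La Liga and the Pro League.

```python
def weight_by_rank(avg_rank, league_ranking):
    """Reciprocal of the club's average ranking times its league ranking."""
    return (1.0 / avg_rank) * league_ranking

def weight_by_value(avg_value, scale):
    """Average club value divided by a scaling constant."""
    return avg_value / scale

# Hypothetical club averaging 2nd place in La Liga (league ranking 100).
top_club_weight = weight_by_rank(avg_rank=2.0, league_ranking=100)

# Hypothetical club averaging position 2.5 in the Pro League (ranking 25).
small_club_weight = weight_by_rank(avg_rank=2.5, league_ranking=25)

# Hypothetical average club value of 250 million, scaled by an assumed constant.
value_weight = weight_by_value(avg_value=250_000_000, scale=1_000_000)
```

With these inputs a transfer into the La Liga club gets five times the weight of a transfer into the Pro League club, even though both clubs finish near the top of their own leagues.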
To identify springboard clubs we need a different method from the one used for the player collaboration network. What matters most in this network are the transfer paths from less valuable clubs to the most valuable ones. A club is considered a springboard if it is involved in many transfers to the most valuable clubs. Betweenness centrality [9] is therefore the most suitable measure. We implement the fast betweenness algorithm described in [10]. Since our network is weighted, we modify the algorithm so that it takes weights into account. The only difference from the original algorithm is the calculation of path lengths: instead of adding one for every hop, we add a length derived from the edge weight. We use the reciprocal of the weight, because in our network a larger weight is better and we want to favour edges with larger weights.
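A weighted betweenness computation of this kind can be sketched as follows. This is a minimal sketch in the spirit of the accumulation scheme from [10], with Dijkstra's algorithm replacing breadth-first search. It is not the authors' code: the function name, the toy clubs and the weights are hypothetical, the scores are left unnormalised, and ties between path lengths are detected by exact floating-point equality.

```python
import heapq
from collections import defaultdict

def weighted_betweenness(edges):
    """Betweenness on a directed graph {(u, v): weight}, weights > 0.

    The length of an edge is 1 / weight, so heavier edges are shorter.
    """
    adj = defaultdict(dict)
    nodes = set()
    for (u, v), w in edges.items():
        adj[u][v] = 1.0 / w
        nodes.update((u, v))

    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist = {s: 0.0}
        sigma = defaultdict(float)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)   # predecessors on shortest paths
        settled, done = [], set()
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            settled.append(v)
            for w, length in adj[v].items():
                nd = d + length
                if w not in dist or nd < dist[w]:
                    # Strictly shorter path found: restart the path count.
                    dist[w] = nd
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w]:
                    # Another shortest path of equal length.
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(settled, 0.0)
        for w in reversed(settled):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical toy network: the route via "Spring" has length 0.5 + 0.25,
# which is shorter than the direct edge of length 1.0.
toy_transfers = {("Small", "Spring"): 2.0, ("Spring", "Big"): 4.0, ("Small", "Big"): 1.0}
club_scores = weighted_betweenness(toy_transfers)
```

In the toy network the only club lying between two others on a shortest path is "Spring", so it is the only node with a non-zero score.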


League                  Ranking   League                    Ranking
La Liga (ESP)           100       Premier League (UKR)      20
Premier League (ENG)    95        Super League (SWI)        20
Serie A (ITA)           85        Serie A (BRA)             20
Bundesliga (GER)        75        Super Lig (TUR)           15
Ligue 1 (FRA)           50        Primera Division (ARG)    15
Primera Liga (POR)      40        Super League (GRE)        13
Eredivisie (NED)        40        Liga 1 (ROM)              12
Pro League (BEL)        25        1. HNL (CRO)              10
Premier League (RUS)    25        Bundesliga (AUT)          10
Premiership (SCO)       20        MLS (USA)                 5
Table 4: Predefined league rankings (higher is better)

4 Results and Discussion

4.1 Top players

The analysis of the player collaboration network identifies Cristiano Ronaldo as the best player. He is followed by several other players who have played for several of the best clubs. Table 5 lists the top 20 players identified by our algorithm together with their scores, and it shows that a player's value is not the only thing that affects his score. Players like Beckham, Ronaldinho, Kaká and Keane have high scores even though their market value has decreased considerably because of their age, since they played for many important clubs during their careers. Most players on the top 20 list are still active today and play in the best leagues.
The most promising players in each age group are listed in Tables 6, 7, 8 and 9. When assessing a player's potential, the most important factor besides his value and the values of his teammates is his age. Since our network is undirected, age cannot simply be added to the weight equation. Including age in the weight equation would favour not only players with valuable teammates but also players with younger teammates, which is not desired. Therefore, to identify the most promising players, we keep the network unchanged and only interpret the results differently. We divide players into groups based on their age and compare only the scores of players in the same group. On average, older players have higher scores, which is expected, as they have played more seasons and therefore have a higher degree. Separating players into age groups is thus beneficial. Some of the most promising players according to our algorithm already play for the best clubs, while others, despite their young age, play an important role in their clubs.
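The age-group comparison amounts to grouping the PageRank scores by birth year and ranking players within each group. A minimal sketch follows. The function name `top_by_birth_year` and the parameter `k` are assumptions, while the four scores and birth years are taken from Table 6.

```python
from collections import defaultdict

def top_by_birth_year(scores, birth_year, k=5):
    """Return the k highest-scoring players for every birth year."""
    groups = defaultdict(list)
    for player, score in scores.items():
        groups[birth_year[player]].append((score, player))
    return {
        # Sort by descending score, breaking ties by name.
        year: [p for _, p in sorted(members, key=lambda m: (-m[0], m[1]))[:k]]
        for year, members in groups.items()
    }

# Scores and birth years as listed in Table 6.
pagerank_scores = {
    "Gianluigi Donnarumma": 0.000020,
    "Alexandru Petrus": 0.000011,
    "Hachim Mastour": 0.000023,
    "Ianis Hagi": 0.000020,
}
born = {
    "Gianluigi Donnarumma": 1999,
    "Alexandru Petrus": 1999,
    "Hachim Mastour": 1998,
    "Ianis Hagi": 1998,
}
best_per_year = top_by_birth_year(pagerank_scores, born, k=1)
```

Only the interpretation of the scores changes here: the network and the PageRank computation stay exactly as they were for the overall ranking.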
Based on the results, we can conclude that PageRank is an appropriate algorithm for determining the best players in our weighted network.

Player                PageRank score  Value 2015/16 [£]
Cristiano Ronaldo     0.000557        77,000,000
Lionel Messi          0.000544        84,000,000
David Beckham         0.000528        /
Zlatan Ibrahimović    0.000459        10,500,000
Ronaldinho Gaúcho     0.000444        1,005,000
Kaká                  0.000417        3,500,000
Wayne Rooney          0.000407        28,000,000
Fernando Torres       0.000402        4,900,000
Steven Gerrard        0.000400        1,400,000
Samuel Eto’o          0.000399        1,400,000
Robbie Keane          0.000390        876,000
Daniele De Rossi      0.000389        5,250,000
Neymar                0.000388        70,000,000
Cesc Fábregas         0.000377        35,000,000
Sergio Agüero         0.000376        42,000,000
Andrés Iniesta        0.000376        24,500,000
Wesley Sneijder       0.000370        10,500,000
David Villa           0.000358        4,900,000
Gianluigi Buffon      0.000349        1,400,000
Carlos Tévez          0.000347        14,000,000
Table 5: Player collaboration network PageRank results

Player                  PageRank score  Player                PageRank score
Gianluigi Donnarumma    0.000020        Hachim Mastour        0.000023
Alexandru Petrus        0.000011        Ianis Hagi            0.000020
Maximiliano Romero      0.000010        Dani Olmo             0.000017
Robert Moldoveanu       0.000009        Martin Ödegaard       0.000015
Vlad Dragomir           0.000009        Reece Oxford          0.000015
Table 6: PageRank results for players born in year 1999 (left) and 1998 (right)

Player                  PageRank score  Player                PageRank score
Youri Tielemans         0.000070        Alen Halilovic        0.000052
Breel Embolo            0.000054        Gabriel               0.000052
Malcom                  0.000042        Kingsley Coman        0.000050
Ante Ćorić              0.000036        Timo Werner           0.000049
Andrija Balić           0.000035        Fabrice Olinga        0.000044
Table 7: PageRank results for players born in year 1997 (left) and 1996 (right)

Player                  PageRank score  Player                PageRank score
Max Meyer               0.000058        Mateo Kovacic         0.000107
Luke Shaw               0.000057        Marquinhos            0.000092
Adrien Rabiot           0.000053        Domenico Berardi      0.000091
Ángel Correa            0.000052        Raheem Sterling       0.000089
Dorin Rotariu           0.000052        Gerard Deulofeu       0.000086
Table 8: PageRank results for players born in year 1995 (left) and 1994 (right)

Player                  PageRank score  Player                PageRank score
Romelu Lukaku           0.000202        Neymar                0.000388
Paul Pogba              0.000151        Lucas                 0.000178
Julian Draxler          0.000143        Mario Götze           0.000175
Raphaël Varane          0.000104        Christian Eriksen     0.000157
Luciano Vietto          0.000096        Jack Wilshere         0.000148
Table 9: PageRank results for players born in year 1993 (left) and 1992 (right)

4.2 Springboard Clubs Identification

The club transfer network analysis identifies Standard Liege as the best springboard club among the clubs in the top twenty leagues. The results are plausible, since the top 15 list does not contain the most valuable and best clubs in the world. Table 10 lists the top 15 clubs by betweenness centrality, together with their scores, for networks built with each of the two weight equations. The two weight equations produce only slightly different results: the top two clubs are the same regardless of the weight, and the third and fourth clubs switch positions when the weight equation is changed. Most of the clubs on the top 15 lists are from less valuable leagues. Such clubs normally buy younger, more affordable players and sell those whose value rises above a certain level, which makes them a perfect springboard for younger and less experienced players. Because of this transfer activity, these clubs score highly on betweenness centrality, as they play an important role in the transfer paths from less valuable clubs to the best clubs.

Club ranking using betweenness centrality
Club                Score by value (Eq. 3)  Club                Score by rank (Eq. 2)
Standard Liege      0.013605                Standard Liege      0.012823
AEK Athens          0.011217                AEK Athens          0.012240
SL Benfica          0.010937                Sporting CP         0.010424
Sporting CP         0.010312                SL Benfica          0.010172
Skoda Xanthi        0.009605                AS Monaco           0.009275
Dinamo Bukarest     0.008743                FC Porto            0.008988
AS Monaco           0.008704                Rubin Kazan         0.008884
Dinamo Zagreb       0.008675                CFR Cluj            0.008681
Olympiacos Pir.     0.008553                Skoda Xanthi        0.008638
CFR Cluj            0.008542                Dinamo Bukarest     0.008518
Steaua Bucharest    0.008180                Olympiacos Pir.     0.008397
Udinese Calcio      0.007899                Rangers FC          0.008216
FC Porto            0.007889                Dinamo Zagreb       0.008170
Celtic FC           0.007849                Iraklis Thess.      0.007925
Petrolul Ploiesti   0.007794                Red Bull Salzburg   0.007907
Table 10: Club transfer network betweenness centrality results

5 Conclusion

The player collaboration network for the past fifteen seasons of the top twenty football leagues consists of over 36 thousand nodes and about 1.4 million edges. Time- and space-consuming algorithms can therefore prove too demanding to run on regular computers. The weighted PageRank algorithm, however, calculated the scores for all players in a reasonable time. With the PageRank algorithm and a suitable edge weight, we are able to identify the top players of the last fifteen seasons. A very important factor in the weight equation is the inflation rate, which ensures that older players whose market values were never as high as those of the best players of recent seasons are also present on the top players list.
Using the same network, we are also able to identify the most promising football players by separating their PageRank scores into age groups. With this approach, we compare only players of similar age who have played for a similar number of seasons, which ensures the same conditions for all players in a specific age group. The results highlight some young players who already play for the best football clubs and some young players from less-known clubs, where they play an essential role.
The results of the club transfer network analysis closely match our initial hypothesis that clubs from less valuable leagues would come out on top. We identify springboard clubs from player transfer data for the past fifteen seasons by constructing a directed weighted network whose weights are based on club value or on club rankings in past seasons. On this network, we use a weighted betweenness centrality algorithm to reveal the best springboard clubs in the top football leagues in the world. Our algorithm identifies clubs from the Belgian, Greek and Portuguese leagues as the best springboard clubs.

References

  • (1) FIFA, Big Count (2006).
    URL http://www.fifa.com/worldfootball/bigcount/
  • (2) J. L. Peña, H. Touchette, A network theory analysis of football strategies, arXiv preprint arXiv:1206.6904 (2012).
  • (3) J. Duch, J. S. Waitzman, L. A. N. Amaral, Quantifying the performance of individual players in a team activity, PloS one 5 (6) (2010) e10937.
  • (4) C. Cotta, A. M. Mora, J. J. Merelo, C. Merelo-Molina, A network analysis of the 2010 FIFA World Cup champion team play, Journal of Systems Science and Complexity 26 (1) (2013) 21–42.
  • (5) S. Mukherjee, Identifying the greatest team and captain—a complex network approach to cricket matches, Physica A: Statistical Mechanics and its Applications 391 (23) (2012) 6066–6076.
  • (6) F. Radicchi, M. Perc, Who is the best player ever? a complex network analysis of the history of professional tennis, PloS one 6 (2) (2011) e17249.
  • (7) Eurostat, HICP - inflation rate (2015).
    URL http://ec.europa.eu/eurostat/
  • (8) L. Page, S. Brin, R. Motwani, T. Winograd, Pagerank: Bringing order to the web, Tech. rep., Stanford Digital Libraries Working Paper (1997).
  • (9) L. C. Freeman, A set of measures of centrality based on betweenness, Sociometry (1977) 35–41.
  • (10) U. Brandes, A faster algorithm for betweenness centrality, Journal of Mathematical Sociology 25 (2) (2001) 163–177.