Schools are segregated by educational outcomes in the digital space

Ivan Smirnov Institute of Education; National Research University Higher School of Economics, Myasnitskaya ul., 20, Moscow 101000, Russia

The Internet provides students with a unique opportunity to connect and maintain social ties with peers from other schools, irrespective of how far they are from each other. However, little is known about the real structure of such online relationships. In this paper, we investigate the structure of interschool friendship on a popular social networking site. We use data on students from schools in a large European city. We find that the probability of a friendship tie between students from neighboring schools is high and that it decreases with the distance between schools following a power law. We also find that students are more likely to be connected if the educational outcomes of their schools are similar. We show that this fact is not a consequence of residential segregation. While high- and low-performing schools are evenly distributed across the city, this is not the case for the digital space, where schools turn out to be segregated by educational outcomes. There is no significant correlation between the educational outcomes of a school and those of its geographical neighbors; however, there is a strong correlation between the educational outcomes of a school and those of its digital neighbors. These results challenge the common assumption that the Internet is a borderless space, and may have important implications for the understanding of educational inequality in the digital age.

schools, social networks, educational outcomes, digital inequality

The Internet creates unique opportunities for people to connect with each other, a point that even Pope Francis has made pope2014 (). It may, therefore, be highly beneficial for its users because social ties are known to play a significant role in human well-being, including life satisfaction diener1999subjective (), health holt2010social (); kawachi2001social (), and professional development podolny1997resources (); ng2005predictors (). There is growing evidence that these findings apply not only to offline social ties but to online friendship as well hobbs2016online (); manago2012me (). This role of the Internet may be particularly important for underprivileged groups such as students from low-performing schools, who lack resources in their immediate environment. Connections with students from high-performing schools might influence their university aspirations cohen1983peer (), improve educational outcomes lomi2011some (), and promote positive behavioral change maxwell2002friends ().

People from underprivileged backgrounds tend not to benefit from the Internet as much as their peers, a phenomenon usually referred to as digital inequality dimaggio2004unequal (). While well-educated people often use the Internet for medical or legal advice, job seeking, or education, their less educated peers use it predominantly for entertainment pearce2017somewhat (); buchi2016modeling (); van2014digital (). The use of social media by students is known to be differentiated in a similar way depending on their academic performance: high-performing students use it for information seeking, while low-performing students use it for chatting and entertainment junco2012too (); smirnov2018predicting (). It may be expected that online social ties also depend on academic achievement and that students are segregated by educational outcomes in the digital space. At a general level, segregation is the degree to which several groups of people are separated from each other allen2007should (). In this paper, we investigate whether students from high- and low-performing schools are separated (i.e. not connected via online friendship) in the digital space.

Figure 1: The school network. Circles represent schools. Different colors correspond to administrative districts of Saint Petersburg. Two schools are connected if there is a friendship tie between their students. For visual clarity, only strong connections (at least three friendship ties) are shown.

We use data on 15-year-old students from schools in Saint Petersburg, Russia, registered on the popular social networking site VK (see Methods for details about the sample). VK is the Russian analog of Facebook and the largest European social networking site. It is ubiquitous among young Russians: more than 90% of 18-24-year-olds use it regularly fom2016online (). The information in users’ public profiles includes their age and the schools they attend. This information is available via the open application programming interface (API) of VK. We use the VK API to download information about all users who indicate that they study in one of Saint Petersburg’s schools and who were born in 2001 (i.e. who were 15 years old at the time of data collection).

As on other social networking sites, users become “friends” on VK if they mutually confirm this status. We use information about such online friendships to construct a weighted network of schools (Fig. 1), where two schools are connected if there is at least one friendship tie between their students (see Methods for details), and the weight corresponds to the number of such ties. For each school, its geographical coordinates and the performance of its graduates on the Unified State Examination (USE) are available (see Methods). The USE scores serve as a proxy for schools’ educational outcomes.

Residential segregation by income is believed to be an important source of variation in schools’ educational outcomes in some countries flores2008residential (); gordon2003urban (); owens2018income (). This means that low-performing schools are concentrated in less affluent neighborhoods and that the educational outcomes of a school can be effectively predicted from the socioeconomic status of its district reardon2017geography (). The situation might be different in Saint Petersburg thanks to the egalitarian nature of the Russian educational system inherited from the Soviet period. To account for potential effects of residential segregation, we collect data on apartments from the largest Russian real estate site, CIAN, and use the average apartment price as a proxy for neighborhood affluence. We then check whether schools’ educational outcomes are correlated with the affluence of their neighborhood.

We measure geographical segregation of schools as a correlation between the educational outcomes of a school and those of its closest geographical neighbors. We then compare this segregation with that in the digital space. In this case, instead of the closest geographical neighbors, we examine the educational outcomes of schools’ closest digital neighbors. We assume that the distance between two schools in the digital space is inversely proportional to the number of online friendship ties between them.

The probability of an online friendship between two people is known to be strongly dependent on the geographical distance between them takhteyev2012geography (); shin2015new (); lengyel2015geographies (); grabowicz2014entangling (). It is, therefore, important to ensure that any observed effect for the digital network of schools is not solely driven by the geographical constraints. To achieve this, we use a random graph model that preserves geographical constraints – namely, the probability of a friendship tie between two schools given the geographical distance between them. We then compare the results obtained for such random networks with the observed results for the real network.

Figure 2: Probability of a friendship tie between two schools as a function of the distance between them. For close schools, the probability is high, and it then declines with distance following a power law (inset).


Distance and online relationships
We find that geographical distance plays an important role in the formation of interschool friendships. The probability of a friendship tie between two close schools is high, but it declines rapidly with distance following a power law (Fig. 2). The best-fit exponent (Fig. 2 inset) is similar to previously observed results grabowicz2014entangling ().
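A power-law decay of this kind is commonly estimated by a straight-line fit in log-log space. The sketch below illustrates that idea only; it is not necessarily the fitting procedure used for Fig. 2. The function name `fit_power_law` is hypothetical, and the input values are synthetic, not data from this study.

```python
import math

def fit_power_law(distances, probabilities):
    """Least-squares fit of p = c * d**(-alpha) in log-log space.

    Returns (alpha, c). Assumes all inputs are strictly positive.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The power-law exponent is the negative slope of the log-log line
    return -slope, math.exp(intercept)

# Synthetic illustration: tie probabilities generated from a known power law
distances_km = [1, 2, 4, 8, 16]
probabilities = [0.5 * d ** -1.2 for d in distances_km]
alpha, c = fit_power_law(distances_km, probabilities)
```

On real data, the tie probabilities would first have to be estimated per distance bin, and the fit would be sensitive to the choice of bins.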

Figure 3: Correlation between educational outcomes of schools and their closest geographical (a) and digital neighbors (b). While there is no correlation for physical neighbors, there is a relatively strong correlation for digital neighbors. These results hold true regardless of the number of neighbors used in the analysis.

Geographical segregation
We find that the educational outcomes of schools do not depend on their distance from the city center, as measured by the Pearson correlation coefficient between schools’ USE scores and their distance from the center. The distance from the center may be, however, a poor proxy for neighborhood affluence. Hence, we additionally collect information about average apartment prices across the city (see Fig. S1 for the corresponding heat map). We then compute the correlation between schools’ USE scores and neighborhood affluence (see Methods). The exact value depends on the radius R that defines the school neighborhood (see Fig. S2), but even the maximum value indicates only a weak correlation between educational outcomes and neighborhood affluence. Finally, we compute the correlation between the USE scores of schools and the average USE score of their N closest geographical neighbors (see Methods). We find no correlation (Fig. 3a), and this result holds true for all values of N (Fig. S3).

We therefore find at most a weak relationship between the educational outcomes of a school and its location in physical space. However, as we show in the next section, this result does not hold for the school’s location in the digital space.

Digital segregation
We find a relatively strong correlation between the educational outcomes of schools and those of their N closest digital neighbors (see Methods and Fig. 3b). The correlation is significant for all values of N (Fig. S3).

To rule out the role of geographical constraints in the observed digital segregation, we use a random graph model that preserves the relationship between distance and the probability of a friendship tie in the observed network (i.e. we create a tie between two schools with a probability that depends on the distance between them, taken from the distribution shown in Figure 2). We compute the digital segregation for the generated random networks and compare it with the value observed for the real network. Judged against the distribution of values obtained in these simulations, including their maximum, the observed digital segregation is statistically significant.
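A distance-preserving null model of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used here: schools are placed on a plane with Euclidean distances, the tie probability is estimated per fixed-width distance bin, and ties are unweighted. The names `empirical_tie_probability`, `random_network`, and `bin_width` are hypothetical.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two (x, y) points (planar approximation)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def empirical_tie_probability(coords, ties, bin_width):
    """Fraction of school pairs that are tied, per distance bin.

    ties: set of (i, j) index pairs with i < j.
    """
    tied, total = {}, {}
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            b = int(distance(coords[i], coords[j]) // bin_width)
            total[b] = total.get(b, 0) + 1
            if (i, j) in ties:
                tied[b] = tied.get(b, 0) + 1
    return {b: tied.get(b, 0) / total[b] for b in total}

def random_network(coords, prob_by_bin, bin_width, rng):
    """Draw each pair independently with the probability of its distance bin."""
    n = len(coords)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            b = int(distance(coords[i], coords[j]) // bin_width)
            if rng.random() < prob_by_bin.get(b, 0.0):
                edges.add((i, j))
    return edges

# Toy example: three nearby schools and one remote school (synthetic data)
coords = [(0, 0), (1, 0), (0, 1), (5, 5)]
ties = {(0, 1), (0, 2)}
probs = empirical_tie_probability(coords, ties, bin_width=2.0)
simulated = random_network(coords, probs, 2.0, random.Random(42))
```

Repeating `random_network` many times and recomputing the segregation measure on each draw gives a reference distribution against which the observed value can be compared.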

We also find that high-performing schools not only tend to be connected with each other but also have more connections on average than low-performing schools: the degree centrality of schools in the network is positively correlated with their educational outcomes. Note that this simple network property explains as much variation in the educational outcomes as the socioeconomic status of students yasterbov2014contextualizing (), which is one of the key context variables used in educational studies sirin2005socioeconomic ().

We show, therefore, that the educational outcomes of a school are closely related to its location in the digital space. More central schools tend to be high performing. We also show that schools with similar academic performance tend to be connected in the digital space. We demonstrate that these results cannot be explained by schools’ locations in the physical space.


Both for research and policy-making purposes, it is crucial to understand the context in which schools operate. This requirement traditionally means collecting information about school resources and the socioeconomic status of its students. Today, students spend much of their time online koroleva2016always (), and it may be warranted to consider students’ online environment on a par with their home environment. In this paper, we focus only on one dimension of such an online environment, namely interschool friendship on a social networking site. We find that school position in an online friendship network could explain as much variation in the educational outcomes of its students as their socioeconomic status, indicating the importance of the digital context. Online inequalities might merely reflect existing socioeconomic inequality or rather complement it. Future research is required to clarify this relationship.

Social media have become the main source of information for young people. In Russia, VK is referred to as the main source of information about the country and the world by 70.3% of respondents, more than any other information source kasamara2017 (). It is also considered more trustworthy than traditional media kasamara2017 (). The news feed of the social network mainly comprises posts shared by online friends. Friends from different schools may, therefore, be an important source of diversity in the information environment of students. In particular, connections with students from high-performing schools could have a positive impact on students from low-performing schools. However, our results suggest that interschool friendship ties mainly exist between schools with similar educational outcomes. Intriguingly, this digital separation cannot be explained by the geographical location of schools. This result means that the digital environment not only fails to remove segregation but might even amplify it.


Methods

Data collection
According to the open data government portal, there are 638 high schools in Saint Petersburg. This number excludes specific types of schools such as boarding schools, cadet schools, and educational centers. We use the open VK API to find these schools in the VK database. We find VK IDs for 628 of the schools. We exclude school №1 from the sample because it has an implausibly large number of users (more than 1000 per cohort). We also exclude two pairs of schools with identical names. We then use data from the web portal "Schools of Saint Petersburg" to obtain the average performance of schools’ graduates at the Unified State Examination. This is a mandatory state examination that all school graduates in Russia must take. This information was available for 590 schools from our sample.

We then query the VK API to obtain the lists of all users who were born in 2001 and indicate that they are studying in one of the schools from our sample. To exclude users who provided false information about their school, we remove profiles with no friends from the same school, as previously recommended smirnov2016search (). We also exclude students who indicate several schools in their profiles. Finally, we download the lists of all VK friends for users from our sample. All collected data are publicly available. The VK team confirmed to us that we can use its API in this way for research purposes.
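The two profile filters described above can be sketched as a single pass over already-downloaded data. This is an illustration under assumptions rather than the exact pipeline: the function name `filter_profiles` and the input layout are hypothetical, the filter is applied once rather than iteratively, and only friends who themselves list exactly one school count as same-school friends. No VK API calls are shown.

```python
def filter_profiles(schools_by_user, friends_by_user):
    """Keep users who list exactly one school and have a same-school friend.

    schools_by_user: dict user_id -> list of school ids from the profile
    friends_by_user: dict user_id -> set of friend user ids
    Returns dict user_id -> school id for the retained users.
    """
    # Drop users who indicate several (or no) schools
    single = {u: s[0] for u, s in schools_by_user.items() if len(s) == 1}
    kept = {}
    for user, school in single.items():
        friends = friends_by_user.get(user, set())
        # Retain the profile only if at least one friend lists the same school
        if any(single.get(f) == school for f in friends):
            kept[user] = school
    return kept

# Synthetic example: users 1 and 2 confirm each other's school "A"
schools_by_user = {1: ["A"], 2: ["A"], 3: ["B"], 4: ["A", "B"], 5: ["B"]}
friends_by_user = {1: {2, 3}, 2: {1}, 3: {1, 4}, 4: {3}, 5: set()}
retained = filter_profiles(schools_by_user, friends_by_user)
```

In this toy input, user 3 is dropped because none of their friends lists school "B" alone, user 4 because two schools are listed, and user 5 because they have no friends.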

We also use data from the largest Russian real estate site, CIAN, to collect information about the prices of all 2-room apartments in Saint Petersburg listed on the site. For each apartment, we calculate its price per square meter. The CIAN team approved the use of these data for research purposes.

Network of schools
We define an adjacency matrix $A$ that represents the friendship network of students (i.e. $A_{ij} = 1$ if students $i$ and $j$ are friends on VK and $A_{ij} = 0$ otherwise). We assume that student $i$ studies in school $s(i)$, and construct a weighted network of schools by counting the number of all friendship ties between two schools. This network is represented by matrix $W$ where

$$W_{kl} = \sum_{i,j:\ s(i)=k,\ s(j)=l} A_{ij}, \qquad k \neq l.$$
One potential disadvantage of this definition is that two schools could be considered closely connected when only one student from the first school has a lot of friends from the other. We therefore also use an alternative way to define the weight of the school tie. In this case, instead of friendship ties, we count the number of students from one school that have friends from another (i.e. we define $\tilde{W}_{kl}$ as the number of students from school $k$ who have at least one friend in school $l$). We could then construct a symmetric matrix from $\tilde{W}$. This alternative metric leads to the same results, and therefore we opted for the first, more straightforward approach.
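Both ways of weighting school ties can be sketched in a few lines. The function names `school_weights` and `students_with_friends`, the input layout, and the example data are hypothetical and serve only as an illustration of the two counting rules.

```python
from collections import defaultdict

def school_weights(friendships, school_of):
    """Count friendship ties between every pair of distinct schools.

    friendships: iterable of (i, j) student pairs, each tie listed once
    school_of: dict student -> school
    Returns dict frozenset({k, l}) -> number of ties.
    """
    weights = defaultdict(int)
    for i, j in friendships:
        k, l = school_of[i], school_of[j]
        if k != l:  # ties within one school are not interschool ties
            weights[frozenset((k, l))] += 1
    return dict(weights)

def students_with_friends(friendships, school_of):
    """Alternative: number of students of school k with a friend in school l.

    Returns dict (k, l) -> count; this is not symmetric in general.
    """
    seen = defaultdict(set)
    for i, j in friendships:
        k, l = school_of[i], school_of[j]
        if k != l:
            seen[(k, l)].add(i)
            seen[(l, k)].add(j)
    return {pair: len(students) for pair, students in seen.items()}

# Synthetic example: student a1 alone accounts for three of the four A-B ties
school_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}
friendships = [("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
               ("a2", "b1"), ("a1", "a2")]
ties = school_weights(friendships, school_of)
counts = students_with_friends(friendships, school_of)
```

Here the tie count between the two schools is 4, while the student-based counts are 2 (from A toward B) and 3 (from B toward A), which shows how a single well-connected student inflates the first measure.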

Segregation measures
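If $u_i$ is the average performance on the Unified State Examination of graduates from school $i$, we could then define segregation based on the affluence of school neighborhoods in the following way:

$$S_{a}(R) = \mathrm{corr}\left(u_i,\ \langle p_j \rangle_{j:\, d_{ij} \le R}\right),$$

where $p_j$ is the price of apartment $j$ in rubles per square meter, $d_{ij}$ is the distance between school $i$ and apartment $j$, $R$ is the radius that defines the school neighborhood, $\langle \cdot \rangle$ denotes the average over the apartments within that radius, and the correlation is computed across schools.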
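The affluence-based measure can be sketched as a correlation between school scores and the mean price of nearby apartments. This is an illustration under assumptions: coordinates are planar, the neighborhood is a disc of radius `radius`, schools with no apartment inside the disc are skipped, and the Pearson coefficient is used. The names `pearson` and `affluence_segregation` are hypothetical, and the data are synthetic.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def affluence_segregation(schools, apartments, radius):
    """Correlation between school scores and mean nearby apartment price.

    schools: list of (x, y, score)
    apartments: list of (x, y, price_per_m2)
    Schools with no apartment within `radius` are skipped (an assumption).
    """
    scores, prices = [], []
    for sx, sy, score in schools:
        nearby = [p for ax, ay, p in apartments
                  if math.hypot(sx - ax, sy - ay) <= radius]
        if nearby:
            scores.append(score)
            prices.append(sum(nearby) / len(nearby))
    return pearson(scores, prices)

# Synthetic example in which prices rise in step with school scores
schools = [(0, 0, 50), (10, 0, 60), (20, 0, 70)]
apartments = [(0, 1, 90), (0, -1, 110), (10, 1, 200), (20, 1, 300)]
r = affluence_segregation(schools, apartments, radius=2)
```

Evaluating the function over a range of radii reproduces the kind of dependence on the neighborhood size that Fig. S2 describes.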

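We denote the geographical neighbors of school $i$ by $G_i$. $G_i$ is an ordered list of all other schools such that $d(i, G_i^{(k)}) \le d(i, G_i^{(k+1)})$, where $d$ is the geographical distance between schools and $G_i^{(k)}$ is the $k$-th element of the list. We then denote the list of $N$-closest geographical neighbors by $G_i^N$. We define the $N$-closest digital neighbors $D_i^N$ by replacing geographical distance with the digital distance that is equal to $1/W_{ij}$, where $W_{ij}$ is the number of friendship ties between schools $i$ and $j$.

We then define geographical and digital segregations by academic performance in the following manner:

$$S_{g}(N) = \mathrm{corr}\left(u_i,\ \langle u_j \rangle_{j \in G_i^N}\right), \qquad S_{d}(N) = \mathrm{corr}\left(u_i,\ \langle u_j \rangle_{j \in D_i^N}\right),$$

where $u_i$ is the average USE score of school $i$ and $\langle \cdot \rangle$ denotes the average over the listed neighbors.

Note that in the case of digital segregation, there could be several schools at exactly the same distance from a certain school. In this case, $D_i^N$ is not uniquely defined. In our computations, we randomly select, with equal probabilities, one of the possible $D_i^N$.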
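The neighbor-based measures, including the random choice among equally distant schools, can be sketched as follows. This is an illustration under assumptions: distances are given as a full matrix, unconnected pairs are assigned an infinite digital distance, and the Pearson coefficient is used. The names `closest_neighbors`, `segregation`, and `digital_distance` are hypothetical, and the scores and weights are synthetic.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def closest_neighbors(i, dist, n_neighbors, rng):
    """Indices of the n closest schools to i; equal distances are ordered at random."""
    others = [j for j in range(len(dist)) if j != i]
    rng.shuffle(others)                     # random order among equal distances
    others.sort(key=lambda j: dist[i][j])   # stable sort keeps that order for ties
    return others[:n_neighbors]

def segregation(scores, dist, n_neighbors, rng):
    """Correlation between each school's score and its neighbors' mean score."""
    neighbor_means = []
    for i in range(len(scores)):
        nbrs = closest_neighbors(i, dist, n_neighbors, rng)
        neighbor_means.append(sum(scores[j] for j in nbrs) / len(nbrs))
    return pearson(scores, neighbor_means)

def digital_distance(weights):
    """Digital distance 1 / W_ij; unconnected pairs get infinity (an assumption)."""
    return [[1 / w if w > 0 else math.inf for w in row] for row in weights]

# Synthetic example: schools 0-1 and 2-3 form two tightly connected pairs
scores = [80, 78, 50, 52]
weights = [[0, 5, 1, 1],
           [5, 0, 1, 1],
           [1, 1, 0, 4],
           [1, 1, 4, 0]]
s_digital = segregation(scores, digital_distance(weights), 1, random.Random(0))
```

Passing a matrix of geographical distances instead of `digital_distance(weights)` gives the geographical counterpart with the same code.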


  • [1] Pope Francis. Message for the 48th World Communications Day, 2014. [Accessed 02.08.2018].
  • [2] Ed Diener, Eunkook M Suh, Richard E Lucas, and Heidi L Smith. Subjective well-being: Three decades of progress. Psychological bulletin, 125(2):276, 1999.
  • [3] Julianne Holt-Lunstad, Timothy B Smith, and J Bradley Layton. Social relationships and mortality risk: a meta-analytic review. PLoS medicine, 7(7):e1000316, 2010.
  • [4] Ichiro Kawachi and Lisa F Berkman. Social ties and mental health. Journal of Urban health, 78(3):458–467, 2001.
  • [5] Joel M Podolny and James N Baron. Resources and relationships: Social networks and mobility in the workplace. American sociological review, 62(5):673–693, 1997.
  • [6] Thomas WH Ng, Lillian T Eby, Kelly L Sorensen, and Daniel C Feldman. Predictors of objective and subjective career success: A meta-analysis. Personnel psychology, 58(2):367–408, 2005.
  • [7] William R Hobbs, Moira Burke, Nicholas A Christakis, and James H Fowler. Online social integration is associated with reduced mortality risk. Proceedings of the National Academy of Sciences, 113(46):12980–12984, 2016.
  • [8] Adriana M Manago, Tamara Taylor, and Patricia M Greenfield. Me and my 400 friends: The anatomy of college students’ Facebook networks, their communication patterns, and well-being. Developmental psychology, 48(2):369, 2012.
  • [9] Jere Cohen. Peer influence on college aspirations with initial aspirations controlled. American Sociological Review, 48(5):728–734, 1983.
  • [10] Alessandro Lomi, Tom AB Snijders, Christian EG Steglich, and Vanina Jasmine Torló. Why are some more peer than others? Evidence from a longitudinal study of social networks and individual academic performance. Social Science Research, 40(6):1506–1520, 2011.
  • [11] Kimberly A Maxwell. Friends: The role of peer influence across adolescent risk behaviors. Journal of Youth and adolescence, 31(4):267–277, 2002.
  • [12] Paul DiMaggio, Eszter Hargittai, Coral Celeste, and Steven Shafer. From unequal access to differentiated use: A literature review and agenda for research on digital inequality. Social inequality, pages 355–400, 2004.
  • [13] Katy E Pearce and Ronald E Rice. Somewhat separate and unequal: digital divides, social networking sites, and capital-enhancing activities. Social Media+ Society, 3(2):2056305117716272, 2017.
  • [14] Moritz Büchi, Natascha Just, and Michael Latzer. Modeling the second-level digital divide: A five-country study of social differences in internet use. New Media & Society, 18(11):2703–2722, 2016.
  • [15] Alexander JAM Van Deursen and Jan AGM Van Dijk. The digital divide shifts to differences in usage. New media & society, 16(3):507–526, 2014.
  • [16] Reynol Junco. Too much face and not enough books: The relationship between multiple indices of Facebook use and academic performance. Computers in human behavior, 28(1):187–198, 2012.
  • [17] Ivan Smirnov. Predicting PISA scores from students’ digital traces. In International Conference on Web and Social Media, pages 360–364, 2018.
  • [18] Rebecca Allen and Anna Vignoles. What should an index of school segregation measure? Oxford Review of Education, 33(5):643–668, 2007.
  • [19] Public Opinion Foundation. Online practices of Russians: social networks, 2016. [Accessed 01.08.2018].
  • [20] Carolina Andrea Flores. Residential segregation and the geography of opportunities: a spatial analysis of heterogeneity and spillovers in education. PhD thesis, The University of Texas at Austin, 2008.
  • [21] Ian Gordon and Vassilis Monastiriotis. Urban size, spatial segregation and educational outcomes. London: London School of Economics, 2003.
  • [22] Ann Owens. Income segregation between school districts and inequality in students’ achievement. Sociology of Education, 91(1):1–27, 2018.
  • [23] Sean F Reardon, Demetra Kalogrides, and Ken Shores. The geography of racial/ethnic test score gaps. CEPA Working Paper No. 16-10., 2017.
  • [24] Yuri Takhteyev, Anatoliy Gruzd, and Barry Wellman. Geography of twitter networks. Social networks, 34(1):73–81, 2012.
  • [25] Won-Yong Shin, Bikash C Singh, Jaehee Cho, and André M Everett. A new understanding of friendships in space: Complex networks meet twitter. Journal of Information Science, 41(6):751–764, 2015.
  • [26] Balázs Lengyel, Attila Varga, Bence Ságvári, Ákos Jakobi, and János Kertész. Geographies of an online social network. PloS one, 10(9):e0137248, 2015.
  • [27] Przemyslaw A Grabowicz, Jose J Ramasco, Bruno Gonçalves, and Víctor M Eguíluz. Entangling mobility and interactions in social media. PloS one, 9(3):e92196, 2014.
  • [28] Gordey Yastrebov, Alexey Bessudnov, Marina Pinskaya, and Sergey Kosaretsky. Contextualizing academic performance in Russian schools: School characteristics, the composition of student body and local deprivation. Higher School of Economics Research Paper, 2014.
  • [29] Selcuk R Sirin. Socioeconomic status and academic achievement: A meta-analytic review of research. Review of educational research, 75(3):417–453, 2005.
  • [30] Diana Koroleva. Always online: Using mobile technology and social media at home and at school by modern teenagers. Educational Studies, 1:205–224, 2016.
  • [31] Valeria Kasamara and Anna Sorokina. Russian students’ values, 2017. [Accessed 02.08.2018].
  • [32] Ivan Smirnov, Elizaveta Sivak, and Yana Kozmina. In search of lost profiles: The reliability of vkontakte data and its importance in educational research. Educational Studies, 4:106–122, 2016.
Figure S1: The heat map of average apartment price in Saint Petersburg. The price is the highest in the city center, near Krestovsky Island, and along the Moscow avenue.
Figure S2: Segregation as a function of the radius R that defines school neighborhood.
Figure S3: Digital and geographical segregations as functions of the number of the neighbors N used in the analysis.