Complete trails of co-authorship network evolution
The rise and fall of a research field is the cumulative outcome of its intrinsic scientific value and social coordination among scientists. The structure of the social component is quantifiable by the social network of researchers linked via co-authorship relations, which can be tracked through digital records. Here, we use such co-authorship data in theoretical physics and study their complete evolutionary trail since inception, with a particular emphasis on the early transient stages. We find that the co-authorship networks evolve through three common major processes in time: the nucleation of small isolated components, the formation of a tree-like giant component through cluster aggregation, and the entanglement of the network by large-scale loops. The giant component is constantly changing yet robust upon link degradations, forming the network’s dynamic core. The observed patterns are successfully reproducible through a new network model.
PACS numbers: 89.75.Hc, 89.75.Fb, 89.65.-s
At the brink of the twenty-first century, two papers were published that heralded the beginning of a new science of networks: one on small-world networks and one on scale-free networks ws (); ba (). Since then, complex network research (CNR) has flourished, not only as an active research field but also as a common structural analytic framework by which the systems approach to various complex systems in the natural, social, and information sciences can potentially be unified book (); caldarelli-book (); internet-book (); palsson (); social (). Along with the emergence of a new research field, a new research community forms and evolves. Thus, the CNR provides a useful example for studying the spreading of a research field in terms of the evolution of the social network behind it perspective (); newman (); toroczkai (); onnela (). Specifically, we can quantify evolutionary patterns in the initial transient period of network formation, which have not been explicitly observed before. We can also portray the route through which the co-authorship network reaches a fully grown state.
To achieve this goal, we apply complex network theory to study how the CNR has developed since its inception. We first construct the co-authorship network, in which nodes are researchers participating in the CNR and a link is made when two authors write a paper together newman (). The weight of a link is given by the number of papers co-authored. To track the evolution of the network, we chose the two aforementioned papers ws (); ba () and three early review papers rmp (); advphys (); siam () as starting points. They are the most highly cited papers and are regarded as pioneering papers in the CNR field. Next, we regarded all subsequently published papers citing any of these five papers as engaging in the CNR. According to Web of Science, there are 5,008 such papers with information on the list of authors and publication times measured in months, written by 6,816 non-redundant authors (identified by last name and initials) over a period spanning 127 months from June 1998 to December 2008. The co-authorship network was constructed for each month from the papers published up to that month exclude (). This network is a growing, weighted network. With these data, we performed a detailed temporal analysis of the evolution of the large-scale structure of the network and discuss its social implications. We particularly emphasize the early stage of evolution, which has not been addressed in previous studies pref (); palla ().
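The construction described above can be illustrated with a minimal sketch. It is not the authors' code: the input format (a list of month index and author list pairs) and the function name are assumptions made for illustration. The cutoff of 20 authors follows the exclusion rule stated in the footnote cited above.

```python
from collections import defaultdict
from itertools import combinations

def build_coauthorship_snapshots(papers, max_authors=20):
    """Return {month: {(a, b): weight}} with cumulative link weights.

    papers: iterable of (month_index, [author, ...]); a hypothetical format.
    A snapshot is stored only for months in which at least one paper appears.
    """
    by_month = defaultdict(list)
    for month, authors in papers:
        by_month[month].append(sorted(set(authors)))

    weights = defaultdict(int)  # cumulative number of co-authored papers
    snapshots = {}
    for month in sorted(by_month):
        for authors in by_month[month]:
            if len(authors) > max_authors:  # skip very large collaborations
                continue
            for a, b in combinations(authors, 2):
                weights[(a, b)] += 1
        snapshots[month] = dict(weights)
    return snapshots
```

Each snapshot contains every link formed up to that month, so the sequence of snapshots is a growing, weighted network in the sense used in the text.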
Our fine-scale temporal analysis in Secs. II–IV reveals a global structural transition of the network through three major regimes: (I) the nucleation of small isolated components, (II) the formation of a tree-like giant component by cluster aggregation, and (III) the network entanglement by long-range loop formation. The network reaches a steady state in which the mean separation between two nodes stabilizes around a finite value. The locality constraint, that is, the tendency of new links to form much more locally than globally, played an important role in sustaining the network’s tree-like structure in regime (II). Here, by tree-like structure, we mean that the network is dominated by short-range loops and devoid of long-range connections, and thus becomes a tree when coarse-grained into a network of supernodes, corresponding in this case to groups led by each principal investigator. This implies that most papers result from in-group collaborations, even though researchers began sharing ideas through international conferences. If the locality effect were weak, the intermediate stage (II) would not appear. Moreover, such a tree-like structure is a fractal and persists even underneath the entangled network in late regime (II). This structure is unveiled upon the removal of inactivated edges, and has the same fractal dimension as the tree-like structure. This implies that a hidden ordered structure with the same fractal dimension underlies the evolution process. In Sec. V, a model is constructed based on the empirical findings; it suggests that the structural transition in the real co-authorship network can be understood as a percolation transition in the growth parameter space. Finally, we summarize the results and discuss their robustness and implications in Sec. VI.
II Large-scale network evolution and structural properties
Since its inception, CNR has grown steadily over the decade, and the pace of growth has not yet started to decelerate (Fig. 1a). The largest connected component (giant component) of the co-authorship network has also grown in size and has reached almost a half of the total network (Fig. 1a). The mean separation between two nodes in the giant component becomes relatively stable to reach around after passing the intermediate regime, during which it displays strong temporal fluctuations (Fig. 1b). This stable behavior of is robust in other co-authorship networks maldacena (); randall (); kpz (); soc ().
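The two quantities tracked in this paragraph, the giant component and the mean separation within it, can be computed as in the following sketch. It uses plain breadth-first search on an unweighted adjacency map; the data layout and function names are assumptions for illustration, and the all-pairs search is only practical for networks of modest size.

```python
from collections import deque

def components(adj):
    """Connected components of an undirected graph {node: set(neighbors)}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def mean_separation(adj, nodes):
    """Average shortest-path length over all pairs of a connected node set."""
    total, pairs = 0, 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())   # distances from src to every other node
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

The giant component is then `max(components(adj), key=len)`, and its relative size is its length divided by `len(adj)`.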
The network has grown both by the expansion and merging of existing components and by the continuous introduction of new components. In the earliest stage (regime (I) in Fig. 1b), small-sized components nucleate independently and their number and size increase with time. Most of the currently highly connected nodes (researchers) have already appeared in this regime, playing the role of pioneers and contributing to the progress of the field. On the brink of the intermediate stage (regime (II) in Fig. 1b), the giant component is formed, which might be promoted by the first international conference exclusively devoted to CNR (1⃝ in Figs. 1b-c). In regime (II), the giant component grows in a tree-like manner; it branches out more and deeper with the passage of time, but rarely establishes links between branches, as can be seen by coarse-graining the network (Fig. 2) obtained by the affinity propagation algorithm affinity (). This can also be seen quantitatively in the distribution of shortcut lengths, which is dominated by the peak at (Fig. 3c). At this stage, the giant component is a fractal song (); goh_box () (Figs. 3a,b), with the fractal dimension measured by the recently introduced box-covering algorithm fractal_chaos () and the mean branching ratio around unity fractal_long (). With the passage of time, such dynamics continue and the giant component and mean separation gradually increase. Component sizes become inhomogeneous in the growth process (Fig. 4). Such a steady growth may be promoted by large-scale international conferences such as the one held in Santa Fe with the purpose of bringing together scientists from diverse disciplines to discuss network science problems across fields (2⃝ in Fig. 1b-c). However, there are a few intermittent jumps (e.g., the left arrow in Fig. 1b), resulting from the merging of smaller but macroscopic components with the largest one (Fig. 5a). 
A large-scale loop does not appear until 2004, when it is formed by a long-range inter-branch link (Fig. 5b). Such long-range loop formation can be monitored by the sudden drop in the mean separation of the giant component (i.e., the right arrow in Fig. 1b). This change can also be monitored by examining the size of the largest bi-connected component. Furthermore, before this long-range loop formation, the ratio of the largest bi-connected component size to the largest singly-connected component size tends to decrease, implying tree-like growth of the largest component during that period. This event is a consequence of the first major multinational project devoted to CNR in Europe (COSIN) (3⃝ in Figs. 1b,c). Another prominent example is the peak in January 2006 (Figs. 5c,d). Since then, an increasing number of large-scale loops have formed, resulting in an increasingly entangled and interwoven giant component. The network has thus made the transition into regime (III), in which network properties such as the mean separation become stable despite the steady growth of the giant component. This transition into a stable research field may be epitomized by the establishment of a regular international conference gathering researchers from various multidisciplinary fields (International Workshop and Conference on Network Science (NetSci); 4⃝ in Figs. 1b,c).
Temporal evolution of motif contents reveals the global structural changes from a different viewpoint. We observed that the motifs with one triangle begin to be significant approximately from the beginning of the intermediate regime (II), whereas the motifs containing two triangles do so around June 2005, at which the mean separation exhibits a drastic jump. In this way, the temporal evolution of motif contents is related to that of the mean separation and complements the global evolution picture.
The co-authorship network exhibits heavy-tailed behaviors in the degree (number of links a node is connected to) and strength (the sum of the weights of links a node has) distributions, which become robust over time (Figs. 6a,b). The degree-degree correlation within the giant component is almost neutral or weakly assortative, in contrast with the assortative behavior observed for the full network and other social networks (Figs. 6c,d) friendship (); mobile ().
III Effect of link degradations
Social ties decay in strength over time in the absence of reinforcement. Co-authorship links may no longer be active if the collaboration ceased long ago. Thus, to ensure that the generic features remain robust, it is informative to examine how the overall network structure is affected in the presence of a link degradation process. The central question is whether the giant component persists to support the integration of the research field. To this end, for each month, we removed all the links that had not been re-activated during the previous two years, a typical postdoctoral contract period.
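The two-year inactivation rule can be sketched as follows. The input format and function names are hypothetical, and the exact treatment of the boundary month (a link last active exactly 24 months ago is removed here) is an assumption, since the text does not specify it.

```python
from itertools import combinations

def last_activity(papers, month):
    """Map each link to the month of its most recent joint paper up to `month`.

    papers: iterable of (month_index, [author, ...]); a hypothetical format.
    """
    last = {}
    for t, authors in papers:
        if t > month:
            continue
        for link in combinations(sorted(set(authors)), 2):
            last[link] = max(t, last.get(link, t))
    return last

def surviving_links(papers, month, window=24):
    """Links that were (re-)activated within the last `window` months."""
    last = last_activity(papers, month)
    return {link for link, t in last.items() if month - t < window}
```

The link-degraded network at a given month is the graph formed by `surviving_links(papers, month)`; its largest connected component plays the role of the link-degraded giant component discussed next.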
The link degradation process significantly affects network configurations because many links eventually become inactive (Fig. 7a); for example, 86% of the links formed up to the year 2006 disappeared before the end of 2008 according to the two-year inactivation rule. However, the giant component not only persists upon degradation, but is also more stable, in the sense that its relative size has been stable at 10% of the total network since 2000 (Fig. 7c). At the same time, the link-degraded giant component (LDGC) is highly dynamic, in that its members constantly change over time. At the end of 2008, the LDGC comprised 1,195 nodes, of which only 272 had belonged to the 727-node LDGC of December 2006. This indicates that the CNR is still a vigorous field palla (). The LDGC exhibits a tree-like structure throughout the observation period, implying that such a tree-like spanning component exists to provide a dynamic backbone underlying the complex, interwoven original network. Furthermore, the LDGC (Fig. 7a) topologically resembles the original giant component in regime (II) (Fig. 5a), and their fractal dimensions are the same as (Figs. 3d and 7b).
Link degradation properties indicate the future prospects of the research field. Based on the evolutionary trajectory observed, the CNR has passed its initial transient growth period and has now settled into a steady growth regime with stationary topological properties such as the degree distribution and the mean separation, even when the link degradation process is taken into consideration. Moreover, the link degradation process in the co-authorship network occurs in a manner that is consistent with the so-called asymmetric disassembly uzzi (), where the probability of link degradation decreases with the degree of the connecting node (Fig. 7d). Given that asymmetric disassembly provides structural robustness in a declining network uzzi (), the current steady growth of the CNR co-authorship network, backed up by asymmetric disassembly, suggests the integration and stability of the research discipline in the future, even after it eventually enters the network saturation stage.
IV Microscopic link dynamics
To understand the microscopic mechanisms responsible for the large-scale evolution pattern observed, we focus on the link dynamics. We categorize each new link into five classes depending on the nature of the nodes that it connects and measure their relative frequency in the link dynamics (Fig. 8a). They are i) the duplicate link, connecting two nodes already linked, ii) the intra-component link, connecting two unlinked nodes in the same component, iii) the cluster-growing link, connecting an existing node in a component to a new node, thereby resulting in an incremental growth of the component, iv) the cluster-merging link, connecting two nodes in different components, and v) the new-cluster link, connecting two new nodes to introduce a new component.
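The five link classes listed above can be assigned with a union-find structure that tracks component membership, as in the sketch below. The class and label names are illustrative. Links are classified one at a time in the order given; how links arising from the same paper should be ordered is not specified in the text, so that ordering is an assumption of this sketch.

```python
class LinkClassifier:
    """Classify each incoming link against the network built so far."""

    def __init__(self):
        self.parent = {}    # union-find forest over nodes
        self.links = set()  # links already present, stored as sorted pairs

    def _find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add(self, a, b):
        """Return the class of link (a, b), then insert it into the network."""
        key = (a, b) if a <= b else (b, a)
        new_a, new_b = a not in self.parent, b not in self.parent
        if key in self.links:
            label = "duplicate"
        elif new_a and new_b:
            label = "new-cluster"
        elif new_a or new_b:
            label = "cluster-growing"
        elif self._find(a) == self._find(b):
            label = "intra-component"
        else:
            label = "cluster-merging"
        for x in (a, b):
            self.parent.setdefault(x, x)
        self.parent[self._find(a)] = self._find(b)  # merge the two components
        self.links.add(key)
        return label
```

Feeding the time-ordered link list through `add` and counting the returned labels gives the relative frequency of each class.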
Among them, the duplicate link, the cluster-growing link, and the new-cluster link are found to be of high frequency, each constituting approximately a quarter to a third of all the links. The remaining two classes, the cluster-merging link and the intra-component link, are far less frequent, comprising 2.8% and 4.7% of all links, respectively. Although infrequent, these two classes are the driving forces of major large-scale structural changes: The former provides the punctuated growth of components by merging existing macroscopic components, while the latter can introduce long-range loops that entangle the connectivity structure.
We found that the existing nodes play an equally important role as that of the new nodes: Among nodes connected by new links, approximately half of them are existing nodes (51%). We also found a clear signature of the locality effect (i.e., the tendency of nodes in proximity to link with each other), which manifests itself as a strong enrichment of the links connecting nodes at shorter separations, compared to random linkages without such a locality constraint. For example, about 47% of the links between existing nodes are found to be separated by two links before linking, that is, they connect “friends of a friend”, compared to 1.5% for random linkages (Fig. 8b). It is this locality-constrained link formation that is responsible for the tree-like growth dominating the early structure of the network.
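The locality measurement described above, the separation between two existing nodes just before they become linked, can be sketched as follows. The function names and the input format (a time-ordered list of node pairs) are assumptions for illustration; pairs in different components are recorded under the key `None`.

```python
from collections import Counter, defaultdict, deque

def separation(adj, a, b):
    """Shortest-path length between a and b, or None if disconnected."""
    if a == b:
        return 0
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == b:
                    return dist[v]
                queue.append(v)
    return None

def separation_histogram(links):
    """Count pre-link separations for new links between existing nodes."""
    adj = defaultdict(set)
    hist = Counter()
    for a, b in links:
        # Only links joining two existing, not yet linked nodes are counted.
        if a in adj and b in adj and b not in adj[a]:
            hist[separation(adj, a, b)] += 1
        adj[a].add(b)
        adj[b].add(a)
    return hist
```

The fraction `hist[2] / sum(hist.values())` corresponds to the share of "friends of a friend" links; a random-linkage baseline would be obtained by applying the same measurement to randomly chosen node pairs.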
V Network evolution model
We model the co-authorship network evolution by incorporating the observed microscopic link dynamics. The network model is built upon a number of previous network models callaway (); dm (); holme (); guimera (); kertesz (), with relevant growth ingredients such as the preferential attachment-based internal link formations dm (), the triad formation due to locality constraint holme (), and the team-based evolution guimera (). Although all these ingredients are found relevant in the co-authorship network evolution, none of them alone can account for the whole process. Therefore, we combined ingredients from these previous models and incorporate them into the combined model with additional parameters for the relative frequencies of these processes. In this combined network evolution model which is schematically depicted in Fig. 9a, the model network evolves by a node dynamics and a link dynamics, according to the following rules applied at each time step: i) With the probability , a new component with three connected nodes (a triangle) is introduced, representing a new group. The average number of authors per paper is observed to be approximately three (Fig. 10a). ii) With the complementary probability , a new node is added, and then it selects an existing component in proportion to the component size and connects to a node in the component chosen in proportion to the degree (preferential attachment ba ()), as well as to a randomly chosen neighbor of that node, forming a triangle. iii) Independently, with the probability , a randomly chosen node links to a random neighbor of its neighbor (a friends’ friend). iv) With the complementary probability , a triangle is formed by a random pair of nodes with separation larger than two and one of their nearest neighbors. Here, the parameters and control the influx of new components and the strength of the locality effect, respectively.
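One possible reading of rules i)–iv) is sketched below. This is an illustrative implementation under stated assumptions, not the authors' code: the parameter names `p_new` (probability of introducing a new triangle) and `q_local` (probability of the local link in rule iii) are hypothetical labels, the network starts from a single triangle, links are unweighted so that duplicate links have no effect, and in rule iv) node pairs in different components are treated as having separation larger than two.

```python
import random

def grow_network(steps, p_new, q_local, seed=0):
    """Grow a model network; returns {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # assumed seed: one triangle

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    def component_of(start):
        seen, stack = {start}, [start]
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return sorted(seen)

    for _ in range(steps):
        n = len(adj)
        if rng.random() < p_new:
            # (i) a new isolated triangle, representing a new group
            a, b, c = n, n + 1, n + 2
            adj[a], adj[b], adj[c] = {b, c}, {a, c}, {a, b}
        else:
            # (ii) a uniformly random node selects its component with
            # probability proportional to component size; the target inside
            # it is then chosen in proportion to degree
            comp = component_of(rng.randrange(n))
            target = rng.choices(comp, weights=[len(adj[v]) for v in comp])[0]
            friend = rng.choice(sorted(adj[target]))
            adj[n] = set()
            link(n, target)
            link(n, friend)
        size = len(adj)
        if rng.random() < q_local:
            # (iii) link a random node to a neighbor of one of its neighbors
            u = rng.randrange(size)
            w = rng.choice(sorted(adj[u]))
            candidates = sorted(adj[w] - {u})
            if candidates:
                link(u, rng.choice(candidates))
        else:
            # (iv) link a random pair with separation larger than two, and
            # close a triangle through a neighbor of one of them
            for attempt in range(100):
                u, v = rng.randrange(size), rng.randrange(size)
                if u != v and v not in adj[u] and not (adj[u] & adj[v]):
                    link(u, v)
                    link(v, rng.choice(sorted(adj[u] - {v})))
                    break
    return adj
```

Because the component lookup is recomputed at every step, this sketch is only suitable for small networks; a union-find structure would be needed to reach large sizes.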
V.1 Model simulation results
We run the network model up to the size for various growth parameters and and calculate the general characteristics of the model network, specifically the fraction of nodes in the giant component and the average size of the finite (non-giant) components used in the percolation study. Here, and , where is the fraction of -sized component and the summation runs over finite components. In addition, we measure the mean separation between two nodes in the giant component, which may be analogous to the correlation length used in the percolation theory. decreases monotonically in both and (Fig. 9b-1), quite abruptly in the region bounded with dashed lines. (Fig. 9b-2) exhibits a peak behavior along the same region, denoted by dotted line. Both behaviors suggest a percolation transition-type event occurring across the region in the parameter space. The mean separation (Fig. 9b-3) also displays a peak behavior along the same region in a large regime, denoted by dotted line, establishing that the giant component is tree-like in the percolation transition region.
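The two percolation observables named above can be computed from the list of component sizes as in the sketch below. The explicit formulas did not survive in this copy of the text, so the sketch assumes the standard percolation definitions: the giant-component fraction is the largest component size divided by the total number of nodes, and the mean finite-component size is the sum of squared sizes divided by the sum of sizes, taken over all components except the largest. The function name is illustrative.

```python
def percolation_observables(sizes):
    """Return (giant fraction, mean finite-component size).

    sizes: list of component sizes (positive integers).
    Assumes the standard definitions from percolation theory.
    """
    total = sum(sizes)
    giant = max(sizes)
    finite = list(sizes)
    finite.remove(giant)  # drop a single copy of the largest component
    giant_fraction = giant / total
    denom = sum(finite)
    mean_finite = sum(s * s for s in finite) / denom if denom else 0.0
    return giant_fraction, mean_finite
```

For example, component sizes `[6, 2, 1, 1]` give a giant fraction of 0.6 and a mean finite-component size of (4 + 1 + 1) / 4 = 1.5.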
Having understood the generic behavior of the network model, where does the real co-authorship network reside in the parameter space? We measure the parameter set from the empirical data, finding that while is steady at , depends on time (Fig. 10b). We estimate it roughly by taking the average over time to be . Interestingly, the measured parameter set (indicated by a white dot in Figs. 9b) is located within the percolation transition region. This implies that the network achieves a balance between the continuous influx of new isolated components and the formation of global connectivity. The tree-like giant component may be rooted in the fact that most research groups tend to work independently and rarely collaborate with other group members; however, the hub group members perform out-of-group collaborations more actively by locating themselves near the center of the network.
The configurations generated from the network model with the empirically measured parameter set successfully reproduce the observed general large-scale structural features, such as the tree-like growth of the giant component (Fig. 9c) as well as a strong fluctuation in the mean separation in the early time regime, followed by its stabilization through the network entanglement by long-range loops in the later time regime (Fig. 9d). Outside of the empirical parameter point, the model network evolves in different ways. Notably, when the locality effect is weak (small ), the network is unable to maintain the initial tree-like growth stage and quickly forms a hairball-like interwoven structure, without the macroscopic-scale cluster aggregation process, as in most of the unconstrained random growth models (Fig. 11). Furthermore, generalization of the model with more complex component structure such as a mixture of dimer and trimer in the growth rule does not affect the main results. Thus, the current simple network model appears to address the essential mechanisms underlying the evolution of real networks in a minimal way.
In the past 10 years, more than 5,000 papers have been published on the subject of the CNR by approximately 6,800 researchers. Using the Web of Science database, we have traced those papers for 127 months. The evolution exhibits a percolation transition through which the giant component forms to establish global connectivity under the continuous supply of non-inbred new members into a society. The co-authorship network currently appears to maintain diversity without sacrificing the internal evolution within groups, by a moderate value of model parameter . However, in order to form a stable research discipline, both growth and connectedness are important. In this respect, the existence of a giant component spanning the system, even with link degradation, would represent the maturity of the subject in that it allows the exchange of ideas through the body of the community by occasional unconstrained collaborations that overcome the prevailing locality effect. Such formation of a giant component would correspond to the emergence of the so-called invisible college guimera (). However, such a dominant college supported by the core tree-like giant component is shown to be highly dynamic, raising the possibility of continuous diversity and variations in the leading ideas and research trends. What is remarkable is that these evolutionary patterns appear robustly in other systems. In addition to the CNR, we chose two new topics in theoretical high-energy physics, the anti-de Sitter/conformal field theory (AdS/CFT) duality conjecture maldacena () and the Randall-Sundrum model randall (), and confirmed that their evolutionary pattern is similar to what we discussed for the CNR. On the other hand, when we consider a research field that is fading away, we find different patterns.
For example, the core giant component spanning the co-authorship network disappears after link degradation, as observed in the cases of fractal surface growth and self-organized criticality, triggered by the papers kpz () and soc (), respectively (Fig. 12). These two subjects passed their heyday in the 1990s, and their LDGC has now degenerated even though new papers are published at a steady rate and newcomers continue to enter the fields. Thus the relative size of the co-authorship LDGC may be an indicator of the current state of a research field. There may be other subject-specific factors that are responsible for some of the observed properties, and comparisons between different fields therefore have to be interpreted with care. Nevertheless, our finding that these fields exhibit many shared patterns suggests the validity of the common evolutionary mechanisms explored in this work.
We have shown that two microscopic mechanisms, the continuous influx of new nodes and groups and link formation strongly constrained by the locality effect, underlie the observed co-authorship evolution pattern. Moreover, the current state of the network was found to be nearly critical from the perspective of percolation theory. An important remaining question is how the system has located itself in such a delicate state. One appealing answer might be that it has done so in a self-organized way, calling for further studies in this direction.
This work is supported by Mid-career Researcher Program through NRF grants funded by the MEST (No. 2010-0015066) (to BK) and (No. 2009-0080801) (to K-IG), and by KRCF (to DK).
- (1) D. J. Watts and S. H. Strogatz, Nature (London) 393, 440 (1998).
- (2) A.-L. Barabási and R. Albert, Science 286, 509 (1999).
- (3) M. E. J. Newman, A.-L. Barabási, and D. J. Watts, Structure and dynamics of networks (Princeton University Press, Princeton, 2006).
- (4) G. Caldarelli, Scale-free networks (Oxford University Press, Oxford, 2007).
- (5) R. Pastor-Satorras and A. Vespignani, Evolution and structure of the Internet: A statistical physics approach (Cambridge University Press, Cambridge, 2004).
- (6) B. O. Palsson, Systems biology: Properties of reconstructed networks (Cambridge University Press, Cambridge, 2006).
- (7) M. O. Jackson, Social and economic networks (Princeton University Press, Princeton, 2008).
- (8) D. Lazer et al., Science 323, 721 (2009).
- (9) M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 98, 404 (2001).
- (10) S. Eubank, H. Guclu, V. S. A. Kumar, M. Marathe, A. Srinivasan, Z. Toroczkai, and N. Wang, Nature (London) 429, 180 (2004).
- (11) J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási, Proc. Natl. Acad. Sci. U.S.A. 104, 7332 (2007).
- (12) R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
- (13) S. N. Dorogovtsev and J. F. F. Mendes, Adv. Phys. 51, 1079 (2002).
- (14) M. E. J. Newman, SIAM Rev. 45, 167 (2003).
- (15) We excluded 11 papers with more than 20 authors, most of which were from large-scale biology experiments such as the human genome sequencing project.
- (16) H. Jeong, Z. Néda, and A.-L. Barabási, Europhys. Lett. 61, 567 (2003).
- (17) G. Palla, A.-L. Barabási, and T. Vicsek, Nature (London) 446, 664 (2007).
- (18) J. M. Maldacena, Adv. Theor. Math. Phys. 2, 231 (1998); J. M. Maldacena, Int. J. Theor. Phys. 38, 1113 (1999); S. S. Gubser, I. R. Klebanov, and A. M. Polyakov, Phys. Lett. B 428, 105 (1998); E. Witten, Adv. Theor. Math. Phys. 2, 253 (1998).
- (19) L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 3370 (1999).
- (20) M. Kardar, G. Parisi, and Y.-C. Zhang, Phys. Rev. Lett. 56, 889 (1986).
- (21) P. Bak, C. Tang, and K. Wiesenfeld, Phys. Rev. Lett. 59, 381 (1987).
- (22) B. J. Frey and D. Dueck, Science 315, 972 (2007).
- (23) C. Song, S. Havlin, and H. A. Makse, Nature (London) 433, 392 (2005).
- (24) K.-I. Goh, G. Salvi, B. Kahng, and D. Kim, Phys. Rev. Lett. 96, 018701 (2006).
- (25) D.-H. Kim, J.-D. Noh, and H. Jeong, Phys. Rev. E 70, 046126 (2004).
- (26) J. S. Kim, K.-I. Goh, B. Kahng, and D. Kim, Chaos 17, 026116 (2007).
- (27) J. S. Kim, K.-I. Goh, G. Salvi, E. Oh, B. Kahng, and D. Kim, Phys. Rev. E 75, 016110 (2007).
- (28) M. Llas, P. M. Gleiser, J. M. Lopez, and A. Diaz-Guilera, Phys. Rev. E 68, 066101 (2003).
- (29) G. Caldarelli, R. Pastor-Satorras, and A. Vespignani, Eur. Phys. J. B 38, 183 (2004).
- (30) A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, Proc. Natl. Acad. Sci. U.S.A. 101, 3747 (2004).
- (31) M. C. González, H. J. Herrmann, J. Kertész, and T. Vicsek, Physica A 379, 307 (2007).
- (32) J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, M. Argollo de Menezes, K. Kaski, A.-L. Barabási, and J. Kertész, New J. Phys. 9, 179 (2007).
- (33) S. Saavedra, F. Reed-Tsochas, and B. Uzzi, Proc. Natl. Acad. Sci. U.S.A. 105, 16466 (2008).
- (34) D. S. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. J. Newman, and S. H. Strogatz, Phys. Rev. E 64, 041902 (2001).
- (35) S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Phys. Rev. E 64, 066110 (2001).
- (36) P. Holme and B. J. Kim, Phys. Rev. E 65, 026170 (2002).
- (37) R. Guimera, B. Uzzi, J. Spiro, and L. A. N. Amaral, Science 308, 697 (2005).
- (38) J. M. Kumpula, J.-P. Onnela, J. Saramäki, K. Kaski, and J. Kertész, Phys. Rev. Lett. 99, 228701 (2007).