Modularity and community detection in bipartite networks.

Michael J. Barber Austrian Research Centers GmbH—ARC, Bereich systems research, Vienna, Austria michael.barber@arcs.ac.at
July 17, 2019
Abstract

The modularity of a network quantifies the extent, relative to a null model network, to which vertices cluster into community groups. We define a null model appropriate for bipartite networks, and use it to define a bipartite modularity. The bipartite modularity is presented in terms of a modularity matrix $\mathbf{B}$; some key properties of the eigenspectrum of $\mathbf{B}$ are identified and used to describe an algorithm for identifying modules in bipartite networks. The algorithm is based on the idea that the modules in the two parts of the network are dependent, with each part mutually being used to induce the vertices for the other part into the modules. We apply the algorithm to real-world network data, showing that the algorithm successfully identifies the modular structure of bipartite networks.

PACS numbers: 89.75.Hc, 02.10.Ud

I Introduction

Networks have attracted a burst of attention in the last decade (useful reviews include Refs. Christensen and Albert (2007); Newman (2006a, 2003); Albert and Barabási (2002)), with applications to natural, social, and technological networks. Of great current interest is the identification of the modular structure of the network. Detecting modules, or communities, allows quantitative investigation of relevant subnetworks, which may have different properties from the aggregate properties of the network as a whole, e.g., modules in the World Wide Web are sets of topically related web pages.

Informally, a network module is a subgraph whose vertices are more likely to be connected to one another than to the vertices outside the subgraph. A variety of approaches (Angelini et al., 2007; Gol’dshtein and Koganov, 2006; Hastings, 2006; Newman and Leicht, 2007; Reichardt and Bornholdt, 2006; Palla et al., 2005; Newman and Girvan, 2004; Clauset et al., 2004; Girvan and Newman, 2002) have been taken to explore this concept. See Refs. Danon et al. (2005); Newman (2004a) for useful reviews.

In this work we focus on the measure called modularity, introduced by Newman and Girvan (2004). Modularity reflects the extent, relative to a null model network, to which edges are formed within modules instead of between modules. Using the modularity, we can assess the quality of any assignment of vertices to modules. Further, the module identification problem becomes a modularity optimization problem. However, exact maximization of the modularity is in general an intractable problem, because the number of ways to partition the set of vertices grows extremely rapidly (Rota, 1964). In light of this, a number of effective algorithms have been introduced to find high modularity partitions of the vertices (Pujol et al., 2006; Newman, 2004b). The modularity can also be defined in terms of a so-called modularity matrix, the eigenspectrum of which has a fundamental relationship with the modular nature of the network (Newman, 2006b).

Given the explicit dependence of the modularity upon a null model, it is clear that the specific choice of null model has a profound impact on the modularity. Surprisingly, only one null model has been so far explored at length: networks with edges randomly assigned such that the expected degrees of model-network vertices equal the actual degrees of corresponding real-network vertices (Newman, 2006b). Specific classes of networks have additional constraints that could be and, indeed, should be reflected in the null model.

A significant such class of networks is that of bipartite networks. The vertices of a bipartite network can be partitioned into two disjoint sets such that no two vertices within the same set are adjacent. There are thus two distinct kinds of vertices, providing a natural representation for many affiliation or interaction networks, with one kind of vertex representing actors and the other representing relations. Examples of actor-relation pairs include people attending events (Davis et al., 1941; Freeman, 2003; Doreian et al., 2004), court justices making decisions (Doreian et al., 2004), scientists jointly publishing articles (Newman, 2001a, b), organizations collaborating in projects (Barber et al., 2006; Roediger-Schluga and Barber, 2007), and legislators serving on committees (Porter et al., 2007). Arguably, bipartite networks are the empirically standard case for social networks and other interaction networks, with unipartite networks appearing—often implicitly—as projections.

In the statistical physics community, the usual approach taken to identify modules in bipartite networks is to first construct a unipartite projection of one part of the network, and then identify modules in that projection using methods for unipartite networks. For example, in the scientist-publication network mentioned above, a network of scientists is created by linking scientists when they have jointly published. These unipartite projections can be illuminating, but intrinsically lose information; indeed, Guimerà et al. (2007) demonstrate that analysis of an unweighted, unipartite projection can give unreliable or incorrect results.

The principal contribution in this work is a proposed definition of a modularity for bipartite networks. The approach taken is based on defining a bipartite modularity matrix $\mathbf{B}$ as an extension of the recent work by Newman (2006b). Some key properties of the eigenspectrum of $\mathbf{B}$ are identified and used to specialize Newman’s matrix-based algorithms to bipartite networks. An additional algorithm fundamentally based on the bipartite character of the networks is introduced; we call the algorithm BRIM, for bipartite, recursively induced modules.

In parallel, Guimerà et al. (2007) have independently investigated modularity in bipartite networks. They proceed by first identifying the two parts of the network as actors and teams, and then formulating a bipartite modularity in which modules consist of groups of actors that are closely interconnected based on joint participation in many teams. The resulting modularity is thus focused on identifying modules in only one part of the network at a time. Interestingly, Guimerà et al. point out the possibilities of classifying both partite sets of the network simultaneously and of customizing spectral methods for bipartite networks, which is essentially the approach taken in the present work.

As of this writing, we are aware of no other attempts to define modularity for bipartite networks. However, bipartite networks, or “two mode networks,” have undergone several related studies in the sociology community using other methods (see, e.g., Refs. Doreian et al. (2004); Freeman (2003) and references cited therein).

The structure of the paper is as follows: in section II we define a modularity matrix and measure for bipartite networks. We discuss using the bipartite modularity matrix to identify modules in section III, and apply the algorithm therein devised to two real-world networks in section IV. Finally, we conclude in section V with an assessment of the present investigation and an outlook for future work.

II Bipartite Modularity

In this section, we develop a modularity matrix for bipartite networks. Structurally and notationally, the development parallels the discussion of the modularity matrix by Newman (2006b).

Consider a network with $n$ vertices and $m$ edges defined by an adjacency matrix $\mathbf{A}$. Each vertex $i$ is assigned to a community group or module, denoted by $g_i$. The modularity $Q$ for such an assignment reflects the extent, relative to a null model, to which edges are formed within modules instead of between modules. Formally, the modularity is defined as

$$Q = \frac{1}{2m} \sum_{i=1}^{n} \sum_{j=1}^{n} \left( A_{ij} - P_{ij} \right) \delta(g_i, g_j), \tag{1}$$

where the $A_{ij}$ are the adjacency matrix elements, the $P_{ij}$ are probabilities in the null model that an edge exists between vertices $i$ and $j$, and $\delta(g_i, g_j)$ equals 1 when $g_i = g_j$ and 0 otherwise.

The modularity can be given an equivalent definition in matrix form. First, the community indices $g_i$ with values taken from $\{1, \ldots, c\}$ are replaced by an $n \times c$ index matrix $\mathbf{S}$, where $c$ is the number of modules. All elements of $\mathbf{S}$ take on either a 0 or 1 value, so that column $k$ is an index vector $\mathbf{s}_k$ showing membership in module $k$; a value of 1 in position $i$ of $\mathbf{s}_k$ indicates that vertex $i$ belongs to module $k$. Given that each vertex is assigned to exactly one module, each row of $\mathbf{S}$ has a single unit value and the index vectors are thus orthogonal.

Further, a modularity matrix $\mathbf{B}$ is defined with elements

$$B_{ij} = A_{ij} - P_{ij}. \tag{2}$$

Using $\mathbf{B}$ and $\mathbf{S}$, the modularity becomes

$$Q = \frac{1}{2m} \operatorname{Tr} \mathbf{S}^{T} \mathbf{B} \mathbf{S}. \tag{3}$$

The eigenspectrum of $\mathbf{B}$ has a fundamental relationship with the modular nature of the network, as Newman (2006b) has explored.
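As a rough illustration of the two equivalent forms of the modularity, the following Python sketch evaluates both the double sum of Eq. (1) and the trace form of Eq. (3) for a small made-up network (two triangles joined by a single edge). The degree-based null model $P_{ij} = k_i k_j / (2m)$ used here is the usual unipartite choice and is an assumption of this example, as are the function names and the test network.

```python
def modularity_sum(A, P, g):
    """Double-sum form: Q = (1/2m) * sum_ij (A_ij - P_ij) * delta(g_i, g_j)."""
    n = len(A)
    two_m = sum(map(sum, A))  # each undirected edge is counted twice
    total = sum(A[i][j] - P[i][j]
                for i in range(n) for j in range(n) if g[i] == g[j])
    return total / two_m


def modularity_trace(A, P, g, c):
    """Matrix form: Q = (1/2m) * Tr(S^T B S), with S the n-by-c index matrix."""
    n = len(A)
    two_m = sum(map(sum, A))
    B = [[A[i][j] - P[i][j] for j in range(n)] for i in range(n)]
    S = [[1 if g[i] == k else 0 for k in range(c)] for i in range(n)]
    # (S^T B S)_kk = sum_ij S_ik * B_ij * S_jk
    trace = sum(S[i][k] * B[i][j] * S[j][k]
                for k in range(c) for i in range(n) for j in range(n))
    return trace / two_m


# Hypothetical example: triangles (0, 1, 2) and (3, 4, 5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

k = [sum(row) for row in A]          # vertex degrees
two_m = sum(k)
# Assumed null model for this example: P_ij = k_i * k_j / (2m).
P = [[k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

g = [0, 0, 0, 1, 1, 1]               # one module per triangle
q_sum = modularity_sum(A, P, g)
q_trace = modularity_trace(A, P, g, c=2)
```

Both forms give the same value for this assignment, which is the point of the matrix reformulation: the module labels enter only through the index matrix.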

From Eqs. (1) through (3), it is apparent that the choice of null model has a profound impact on the modularity. Thus, for example, a Bernoulli random graph with constant $P_{ij}$ for all $i$ and $j$ is a poor representation of most real-world networks, so would be an inappropriate choice of null model. Instead, the usual choice of null model (Newman, 2006b) assigns edges at random with the expected degrees of model vertices constrained to match the degrees in the actual network.

In much the same fashion, bipartite networks have specific constraints that should be reflected in the null model. The vertices of a bipartite network can be partitioned into two disjoint sets such that no two vertices within the same set are adjacent. An equivalent, but more visual, definition is that the vertices in a bipartite graph can be assigned one of two colors, say red and blue, with no neighboring vertices bearing the same color. In the remainder of this section, we will define a null model with the above requirement that the expected degrees match the degrees in the real network, along with the additional constraint that each edge links a red vertex and a blue vertex.

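Let $p$ be the number of red vertices and $q$ be the number of blue vertices; this implies $n = p + q$. Without loss of generality, assume that the vertices are indexed so that red vertices are labeled $1, 2, \ldots, p$ and the blue vertices are labeled $p+1, p+2, \ldots, n$. The adjacency matrix then has a block off-diagonal form of

$$\mathbf{A} = \begin{bmatrix} \mathbf{O}_{p \times p} & \tilde{\mathbf{A}} \\ \tilde{\mathbf{A}}^{T} & \mathbf{O}_{q \times q} \end{bmatrix}, \tag{4}$$

where $\mathbf{O}_{r \times s}$ is the all-zero matrix with $r$ rows and $s$ columns, and $\tilde{\mathbf{A}}$ is the $p \times q$ block of red-blue adjacencies. Require the same block structure for $\mathbf{P}$ that is exhibited by $\mathbf{A}$, giving

$$\mathbf{P} = \begin{bmatrix} \mathbf{O}_{p \times p} & \tilde{\mathbf{P}} \\ \tilde{\mathbf{P}}^{T} & \mathbf{O}_{q \times q} \end{bmatrix}. \tag{5}$$

This form for $\mathbf{P}$ assigns zero likelihood to edges between vertices with the same color, precluding any such edges in the null model.

The modularity matrix $\mathbf{B}$ in turn has a block off-diagonal form of

$$\mathbf{B} = \begin{bmatrix} \mathbf{O}_{p \times p} & \tilde{\mathbf{B}} \\ \tilde{\mathbf{B}}^{T} & \mathbf{O}_{q \times q} \end{bmatrix}, \tag{6}$$

where $\tilde{\mathbf{B}} = \tilde{\mathbf{A}} - \tilde{\mathbf{P}}$. The all-zero blocks on the diagonal are the potential modularity contributions from pairs of vertices of the same color being present in a module; all meaningful contributions, positive or negative, to the modularity thus are made by pairs of vertices with distinct colors. In contrast, with the usual null model based on unipartite networks (Newman, 2006b), the corresponding blocks contain only negative elements (or zeros for isolated nodes of degree zero), always providing a modularity penalty for pairs of like-colored vertices in the same module.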

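Equation (1) can be rewritten as

$$Q = \frac{1}{m} \sum_{i=1}^{p} \sum_{j=1}^{q} \left( \tilde{A}_{ij} - \tilde{P}_{ij} \right) \delta(g_i, h_j), \tag{7}$$

where $h_j = g_{j+p}$. Since $Q = 0$ when all vertices are in the same module, we can set all $g_i$ and $h_j$ equal, giving

$$\sum_{i=1}^{p} \sum_{j=1}^{q} \left( \tilde{A}_{ij} - \tilde{P}_{ij} \right) = 0, \tag{8}$$

so that

$$\sum_{i=1}^{p} \sum_{j=1}^{q} \tilde{P}_{ij} = \sum_{i=1}^{p} \sum_{j=1}^{q} \tilde{A}_{ij} = m. \tag{9}$$

Thus, the expected number of edges in the null model must equal the number of edges in the actual network.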

The degrees of the red vertices are given by $k_i$, while those of the blue vertices are given by $d_j$. By constraining the expected degrees in the null model to match the actual degrees, as discussed above, we obtain

$$\sum_{j=1}^{q} \tilde{P}_{ij} = k_i, \tag{10}$$

$$\sum_{i=1}^{p} \tilde{P}_{ij} = d_j. \tag{11}$$

Since

$$\sum_{i=1}^{p} k_i = m, \tag{12}$$

$$\sum_{j=1}^{q} d_j = m, \tag{13}$$

Eqs. (10) and (11) ensure that Eq. (9) holds.

In the usual null model, the probability of an edge being present between two vertices is proportional to the product of the degrees of the vertices. For the bipartite case, this becomes $\tilde{P}_{ij} = C k_i d_j$ for some constant $C$. Combining this definition with Eqs. (11) and (12), we obtain

$$d_j = \sum_{i=1}^{p} \tilde{P}_{ij} = C d_j \sum_{i=1}^{p} k_i = C m d_j, \tag{14}$$

so that $C = 1/m$ and thus

$$\tilde{P}_{ij} = \frac{k_i d_j}{m}. \tag{15}$$

The same result can be obtained from Eqs. (10) and (13) instead of Eqs. (11) and (12). With Eq. (15), we have fully defined the modularity for a bipartite network.
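The bipartite null model can be checked numerically. The sketch below, in plain Python, builds the null-model block with entries equal to the product of a red degree and a blue degree divided by the number of edges, confirms that its row and column sums reproduce the actual degrees, and evaluates the bipartite modularity for one assignment. The small network, the function names, and the variable names (`A_tilde`, `P_tilde`) are made up for illustration.

```python
def bipartite_null_model(A_tilde):
    """Return (k, d, m, P_tilde) with P_tilde[i][j] = k[i] * d[j] / m."""
    k = [sum(row) for row in A_tilde]          # red degrees (row sums)
    d = [sum(col) for col in zip(*A_tilde)]    # blue degrees (column sums)
    m = sum(k)                                 # number of edges
    P_tilde = [[ki * dj / m for dj in d] for ki in k]
    return k, d, m, P_tilde


def bipartite_modularity(A_tilde, red, blue):
    """Q = (1/m) * sum_ij (A~_ij - P~_ij) * delta(red_i, blue_j)."""
    _, _, m, P_tilde = bipartite_null_model(A_tilde)
    total = sum(A_tilde[i][j] - P_tilde[i][j]
                for i in range(len(red)) for j in range(len(blue))
                if red[i] == blue[j])
    return total / m


# Hypothetical network: 3 red vertices (rows) and 3 blue vertices (columns),
# with a single edge running between the two intended modules.
A_tilde = [[1, 1, 0],
           [1, 1, 1],
           [0, 0, 1]]
k, d, m, P_tilde = bipartite_null_model(A_tilde)

# Expected degrees in the null model should match the actual degrees.
row_sums = [sum(row) for row in P_tilde]
col_sums = [sum(col) for col in zip(*P_tilde)]

Q = bipartite_modularity(A_tilde, red=[0, 0, 1], blue=[0, 0, 1])
Q_single = bipartite_modularity(A_tilde, red=[0, 0, 0], blue=[0, 0, 0])
```

Placing every vertex in one module gives zero modularity, consistent with the requirement that the expected and actual edge counts agree.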

III Module Identification

III.1 Spectral Methods for Module Identification

Using the modularity defined in section II, we can assess the quality of any partitioning of the vertices of a bipartite graph into modules. A partitioning can be determined using any method. Two general approaches seem relevant. First, the modularity defined in section II can be maximized using standard optimization algorithms such as genetic algorithms, greedy search methods (Newman, 2004b), or extremal optimization (Duch and Arenas, 2005); this is generally straightforward and will not be discussed at length in this work. Second, the spectral properties of $\mathbf{B}$ or other matrices associated with the graph can be analyzed to partition the vertices into modules.

For example, one standard partitioning approach is to assign the vertices to modules using spectral partitioning (SP). In spectral partitioning, the eigenvectors of the network Laplacian are used to minimize the number of edges running between groups. The SP approach has a significant drawback: the vertices are assigned to modules of predetermined size. This is problematic for the investigation of real-world networks, where the number and sizes of community groups are not generally known in advance.

An analogous approach based on the spectral properties of the modularity matrix has recently been proposed (Newman, 2006b). Since the modularity is conceptually closer to our understanding of network community structure, this spectral optimization of modularity (SOM) is better tailored for real world networks.

An important special case in both spectral partitioning and spectral optimization of modularity is to assign the vertices to two groups based on a single eigenvector of the Laplacian (SP) or modularity (SOM) matrix. In the case of SP, we are interested in the eigenvector corresponding to the smallest positive eigenvalue; this is the Fiedler vector. For SOM, we are interested in the leading eigenvector $\mathbf{u}$, corresponding to the largest positive eigenvalue $\lambda$ of $\mathbf{B}$; we propose calling this the Newman vector. Using the Newman vector, we approximate $\mathbf{B}$ as

$$\mathbf{B} \approx \lambda \mathbf{u} \mathbf{u}^{T}. \tag{16}$$

With just two modules, $\mathbf{S} = [\mathbf{s}_1 \,|\, \mathbf{s}_2]$, so that the modularity in Eq. (3) becomes

$$Q \approx \frac{\lambda}{2m} \left[ \left( \mathbf{u}^{T} \mathbf{s}_1 \right)^2 + \left( \mathbf{u}^{T} \mathbf{s}_2 \right)^2 \right]. \tag{17}$$

Recall that the index vectors $\mathbf{s}_1$ and $\mathbf{s}_2$ take on values from $\{0, 1\}$. It is clear how to maximize the modularity in Eq. (17): when $u_i$, the $i$th element of $\mathbf{u}$, is positive, assign vertex $i$ to the first module by setting the $i$th entry of $\mathbf{s}_1$ to one, and when $u_i$ is negative, assign vertex $i$ to the second module by setting the $i$th entry of $\mathbf{s}_2$ to one. (The assignment when $u_i = 0$ is arbitrary, and makes no contribution to the modularity.)

The use of multiple eigenvectors allows more than two modules to be considered ($c > 2$), with at most one module more than the number of positive eigenvalues of $\mathbf{B}$ (Newman, 2006b). Additional eigenvectors of $\mathbf{B}$ can also be used for SOM (Newman, 2006b) in a vector partitioning algorithm adapted from spectral partitioning (Alpert and Yao, 1995; Alpert et al., 1999). In the present work, we will not make use of this algorithm, nor of a recursive bipartitioning approach, instead developing an alternative technique that capitalizes on the bipartite nature of the networks.

III.2 Module Identification in Bipartite Networks

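In section III.1, we have seen how to identify community groups of networks by using the Newman vector to maximize $Q$. However, we made no use of the bipartite character of the networks. For a bipartite network, the eigenvalue equation can be written as

$$\begin{bmatrix} \mathbf{O}_{p \times p} & \tilde{\mathbf{B}} \\ \tilde{\mathbf{B}}^{T} & \mathbf{O}_{q \times q} \end{bmatrix} \begin{bmatrix} \mathbf{u} \\ \mathbf{v} \end{bmatrix} = \lambda \begin{bmatrix} \mathbf{u} \\ \mathbf{v} \end{bmatrix}, \tag{18}$$

where $\mathbf{u}$ is a vector with $p$ components and $\mathbf{v}$ is a vector with $q$ components. The left-hand side of Eq. (18) can be multiplied out, giving

$$\begin{bmatrix} \tilde{\mathbf{B}} \mathbf{v} \\ \tilde{\mathbf{B}}^{T} \mathbf{u} \end{bmatrix} = \lambda \begin{bmatrix} \mathbf{u} \\ \mathbf{v} \end{bmatrix}, \tag{19}$$

i.e., $\tilde{\mathbf{B}} \mathbf{v} = \lambda \mathbf{u}$ and $\tilde{\mathbf{B}}^{T} \mathbf{u} = \lambda \mathbf{v}$.

Additionally, we can construct a vector from $\mathbf{u}$ and $-\mathbf{v}$, so that

$$\begin{bmatrix} \mathbf{O}_{p \times p} & \tilde{\mathbf{B}} \\ \tilde{\mathbf{B}}^{T} & \mathbf{O}_{q \times q} \end{bmatrix} \begin{bmatrix} \mathbf{u} \\ -\mathbf{v} \end{bmatrix} = \begin{bmatrix} -\tilde{\mathbf{B}} \mathbf{v} \\ \tilde{\mathbf{B}}^{T} \mathbf{u} \end{bmatrix} = -\lambda \begin{bmatrix} \mathbf{u} \\ -\mathbf{v} \end{bmatrix}. \tag{20}$$

Hence, for any eigenvalue $\lambda$ of $\mathbf{B}$, $-\lambda$ is also an eigenvalue of $\mathbf{B}$.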

Since only the eigenvectors corresponding to positive eigenvalues of $\mathbf{B}$ can give positive contributions to $Q$, we can focus on just the positive eigenvalues $\lambda > 0$. In this case, $\mathbf{u}$ and $\mathbf{v}$ are, respectively, left and right singular vectors of $\tilde{\mathbf{B}}$. If we shift our attention from the spectral decomposition of $\mathbf{B}$ to the singular value decomposition (SVD) of $\tilde{\mathbf{B}}$, we therefore automatically exclude the eigenvectors of $\mathbf{B}$ that correspond to negative eigenvalues.

The appearance of the singular vectors of $\tilde{\mathbf{B}}$ is not surprising. All the information about the linkage structure of the network is contained in $\tilde{\mathbf{B}}$, and the singular value decomposition is the natural generalization of the spectral decomposition used for $\mathbf{B}$ to asymmetric matrices like $\tilde{\mathbf{B}}$. What is more, the singular values and singular vectors of $\tilde{\mathbf{B}}$ can sometimes provide more information than the eigenvalues and eigenvectors of $\mathbf{B}$.

For example, the number of modules is at most one more than the number of positive eigenvalues of $\mathbf{B}$. Since, for each vertex, the expected degree in the null model equals the actual degree in the network, the rows and columns of $\tilde{\mathbf{B}}$ all sum to zero. The rank of $\tilde{\mathbf{B}}$, which equals the number of nonzero singular values of $\tilde{\mathbf{B}}$, must then be less than both $p$ and $q$. From this, we conclude that the number of communities is at most equal to the smaller of $p$ and $q$.

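To assign vertices to modules using $\tilde{\mathbf{B}}$, we first partition the index matrix $\mathbf{S}$ so that

$$\mathbf{S} = \begin{bmatrix} \mathbf{R} \\ \mathbf{T} \end{bmatrix}. \tag{21}$$

The matrices $\mathbf{R}$ and $\mathbf{T}$ have dimensions $p \times c$ and $q \times c$, respectively, indexing the red and blue vertices into modules. Substituting the partitioned matrices into Eq. (3), we obtain

$$Q = \frac{1}{m} \operatorname{Tr} \mathbf{R}^{T} \tilde{\mathbf{B}} \mathbf{T}. \tag{22}$$

Our goal then becomes to assign network vertices to modules such that Eq. (22) is maximized.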

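One approach to optimizing the modularity as expressed in Eq. (22) is essentially the same as the Newman vector approach considered in section III.1. Without loss of generality, label the singular values such that $\sigma_1 \geq \sigma_2 \geq \cdots$, with $\mathbf{u}_1$ and $\mathbf{v}_1$ the left and right singular vectors belonging to $\sigma_1$. Approximate $\tilde{\mathbf{B}}$ as

$$\tilde{\mathbf{B}} \approx \sigma_1 \mathbf{u}_1 \mathbf{v}_1^{T}. \tag{23}$$

Now, we bipartition the vertices with $\mathbf{R} = [\mathbf{r}_1 \,|\, \mathbf{r}_2]$ and $\mathbf{T} = [\mathbf{t}_1 \,|\, \mathbf{t}_2]$, so that

$$Q \approx \frac{\sigma_1}{m} \left[ \left( \mathbf{u}_1^{T} \mathbf{r}_1 \right) \left( \mathbf{v}_1^{T} \mathbf{t}_1 \right) + \left( \mathbf{u}_1^{T} \mathbf{r}_2 \right) \left( \mathbf{v}_1^{T} \mathbf{t}_2 \right) \right]. \tag{24}$$

As with the Newman vector approach, $Q$ is maximized by assigning the vertices to modules based on the signs of the corresponding component of $\mathbf{u}_1$ or $\mathbf{v}_1$, as appropriate. This maximizes the magnitude of the inner products in Eq. (24), with consistent assignment of both red and blue vertices to the same module based on the signs ensuring that positive contributions are made to the modularity.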
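The sign-based bipartition can be sketched in plain Python. Power iteration is used here only as a dependency-free stand-in for a full SVD routine; it is a choice made for this example, not something prescribed in the text. The function names and the test network (two disconnected complete bipartite blocks) are hypothetical.

```python
import math


def leading_singular_vectors(B, iterations=200):
    """Power iteration for the leading singular triple (sigma, u, v) of B."""
    p, q = len(B), len(B[0])
    # Rows of the bipartite modularity block sum to zero, so a uniform start
    # vector would be mapped to zero; use a non-uniform one instead.
    v = [float(j + 1) for j in range(q)]
    u = [0.0] * p
    sigma = 0.0
    for _ in range(iterations):
        u = [sum(B[i][j] * v[j] for j in range(q)) for i in range(p)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            break                      # start vector fell in the null space
        u = [x / norm_u for x in u]
        v = [sum(B[i][j] * u[i] for i in range(p)) for j in range(q)]
        sigma = math.sqrt(sum(x * x for x in v))
        if sigma == 0.0:
            break
        v = [x / sigma for x in v]
    return sigma, u, v


def spectral_bipartition(A_tilde):
    """Split red and blue vertices by the signs of the leading singular vectors."""
    k = [sum(row) for row in A_tilde]
    d = [sum(col) for col in zip(*A_tilde)]
    m = sum(k)
    B = [[A_tilde[i][j] - k[i] * d[j] / m for j in range(len(d))]
         for i in range(len(k))]
    _, u, v = leading_singular_vectors(B)
    red = [0 if x > 0 else 1 for x in u]     # module label per red vertex
    blue = [0 if x > 0 else 1 for x in v]    # module label per blue vertex
    return red, blue


# Hypothetical network: two disconnected 2-by-2 complete bipartite blocks.
A_tilde = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
red, blue = spectral_bipartition(A_tilde)
```

Because the left and right vectors are updated from one another, their signs stay paired, so a red vertex and a blue vertex with the same sign land in the same module.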

III.3 Recursive Identification of Bipartite Modules

In sections III.1 and III.2, we have seen how the leading eigenvector of $\mathbf{B}$ and the leading singular vectors of $\tilde{\mathbf{B}}$ can be used to bipartition network vertices. Extending these methods to use the full modularity matrices and to handle more than two modules is in general nontrivial. However, for the bipartite case at least, there is a relatively straightforward extension that leads to a useful algorithm.

First, we assume that the blue vertices are all assigned to modules through some mechanism. Maximizing the modularity then consists solely of assigning the red vertices to modules. This is a comparatively simple task. To see this, rewrite Eq. (22), giving

$$Q = \frac{1}{m} \operatorname{Tr} \mathbf{R}^{T} \tilde{\mathbf{T}}, \tag{25}$$

where we have aggregated the fixed terms into the matrix $\tilde{\mathbf{T}} = \tilde{\mathbf{B}} \mathbf{T}$. We now write Eq. (25) in terms of explicit sums, so that

$$Q = \frac{1}{m} \sum_{i=1}^{p} \sum_{k=1}^{c} R_{ik} \tilde{T}_{ik}. \tag{26}$$

The inner sum in Eq. (26) is a sum across the rows of $\mathbf{R}$. Since each row of $\mathbf{R}$ consists of a single 1 with all other elements being 0, the modularity is now simple to maximize: we just assign red vertex $i$ to module $k$ such that $\tilde{T}_{ik}$ is the maximum of the $i$th row of $\tilde{\mathbf{T}}$. (An arbitrary rule is needed to break ties, for example, random assignment of the vertex to one of the modules that attain the maximum.)

Conversely, if the red vertices are all assigned to modules, maximizing $Q$ consists of assigning the blue vertices to modules. Analogously to the previous case, we define $\tilde{\mathbf{R}} = \tilde{\mathbf{B}}^{T} \mathbf{R}$ and manipulate Eq. (22) into the form

$$Q = \frac{1}{m} \operatorname{Tr} \mathbf{T}^{T} \tilde{\mathbf{R}} = \frac{1}{m} \sum_{j=1}^{q} \sum_{k=1}^{c} T_{jk} \tilde{R}_{jk}. \tag{27}$$

As with the red vertices, we maximize $Q$ by assigning the $j$th blue vertex to the module $k$ such that $\tilde{R}_{jk}$ is the maximum of the $j$th row of $\tilde{\mathbf{R}}$.

Taken together, these two maximization procedures define an algorithm that we call BRIM (bipartite, recursively induced modules). The BRIM algorithm is an iterative algorithm for maximizing $Q$, with the sets of red and blue vertices each recursively drawing the other into modular structures. For each iteration, $Q$ is guaranteed never to decrease, as it is always possible at least to maintain the previous vertex partitioning and keep the modularity the same. Because the number of possible partitions is finite, the BRIM algorithm therefore always terminates at a partition that is a maximum of $Q$ with respect to these reassignment steps. In general, the identified partition will correspond to a local maximum in $Q$, not the global maximum.

Note that the BRIM algorithm can work with the entire $\tilde{\mathbf{B}}$ matrix, or a rank-restricted approximation calculated by omitting the smallest singular values. By using the full matrix, we automatically include all positive contributions to the modularity. As well, the algorithm can work with any assumed number of modules; however, no constraint exists to ensure that each module is occupied.
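The alternating reassignment described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `brim`, its signature, the sweep limit, and the stopping tolerance are all hypothetical, and ties are broken by taking the lowest module index, which is just one possible arbitrary rule.

```python
def brim(A_tilde, blue, c, max_sweeps=100):
    """Alternate red and blue reassignments until Q stops increasing.

    A_tilde: p-by-q 0/1 matrix of red-blue adjacencies.
    blue:    initial module label (0 .. c-1) for each blue vertex.
    """
    p, q = len(A_tilde), len(A_tilde[0])
    k = [sum(row) for row in A_tilde]
    d = [sum(col) for col in zip(*A_tilde)]
    m = sum(k)
    B = [[A_tilde[i][j] - k[i] * d[j] / m for j in range(q)] for i in range(p)]

    def argmax(row):
        # max() returns the first maximum, so ties go to the lowest index.
        return max(range(len(row)), key=lambda idx: row[idx])

    blue = list(blue)
    red = [0] * p
    Q = float("-inf")
    for _ in range(max_sweeps):
        # T~ = B~ T: entry (i, k) is the gain from placing red vertex i in module k.
        T_tilde = [[0.0] * c for _ in range(p)]
        for i in range(p):
            for j in range(q):
                T_tilde[i][blue[j]] += B[i][j]
        red = [argmax(row) for row in T_tilde]

        # R~ = B~^T R: entry (j, k) is the gain from placing blue vertex j in module k.
        R_tilde = [[0.0] * c for _ in range(q)]
        for i in range(p):
            for j in range(q):
                R_tilde[j][red[i]] += B[i][j]
        blue = [argmax(row) for row in R_tilde]

        new_Q = sum(R_tilde[j][blue[j]] for j in range(q)) / m
        improved = new_Q > Q + 1e-12
        Q = new_Q
        if not improved:
            break
    return red, blue, Q


# Hypothetical network: two disconnected 2-by-2 complete bipartite blocks,
# started with every blue vertex in its own module.
A_tilde = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
red, blue, Q = brim(A_tilde, blue=[0, 1, 2, 3], c=4)
```

In this toy run, four modules are allowed but only two end up occupied, illustrating the remark that nothing forces every module to be used.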

To test the efficacy of the BRIM algorithm, we apply it to a simple model network. The model consists of modules, each containing red and blue vertices. An edge exists between a red vertex and a blue vertex with probability $p_{\mathrm{in}}$ if they are in the same module and with probability $p_{\mathrm{out}}$ if they are in different modules. No edges exist between vertices with the same color.

The qualitative behavior of the model depends on $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$. When $p_{\mathrm{in}} > p_{\mathrm{out}}$, there is a greater probability of vertices within a module being linked than vertices in different modules, matching our intuitive notion of modularity. With $p_{\mathrm{in}}$ sufficiently close to one and $p_{\mathrm{out}}$ small, the actual modular structure of a particular realization of the model should correspond to the assumed modular structure. As $p_{\mathrm{out}} \to p_{\mathrm{in}}$, the network becomes more uniform, with the assumed modular structure ultimately vanishing and all vertices belonging to a single module. (More customary (see, e.g., Danon et al., 2005) is to fix the expected degree of the network vertices and vary the expected number of edges linking vertices in different modules, with $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$ calculated from the expectation values. However, for the bipartite network model under consideration, the base case, with edges only existing between vertices in the same module, will often be excluded using this approach.) Lower values of $p_{\mathrm{in}}$ introduce additional substructure into the modules; the general behavior as $p_{\mathrm{out}}$ varies should be similar to the previous case, but with an overall reduced correspondence between the assumed modules and the actual modules in networks instantiated from the model.
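A generator for this kind of model network might look like the following sketch. The function name, the parameter names (`reds_per_module`, `blues_per_module`, `p_in`, `p_out`, `seed`), and the sizes used in the example call are assumptions made for illustration; they are not taken from the text.

```python
import random


def planted_bipartite_network(c, reds_per_module, blues_per_module,
                              p_in, p_out, seed=None):
    """Sample a bipartite network with c planted modules.

    Returns (A_tilde, red, blue): the red-by-blue 0/1 adjacency block and the
    planted module label of every red and blue vertex.
    """
    rng = random.Random(seed)
    red = [mod for mod in range(c) for _ in range(reds_per_module)]
    blue = [mod for mod in range(c) for _ in range(blues_per_module)]
    # Only red-blue pairs are considered, so like-colored vertices never link.
    A_tilde = [[1 if rng.random() < (p_in if r == b else p_out) else 0
                for b in blue] for r in red]
    return A_tilde, red, blue


# With p_in = 1 and p_out = 0 the planted modules are disconnected
# complete bipartite blocks.
A_tilde, red, blue = planted_bipartite_network(3, 2, 4, 1.0, 0.0, seed=1)
```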

Following Danon et al. (2005), we make precise the above qualitative description in terms of the normalized mutual information $I_{\mathrm{norm}}$. Consider two schemes $X$ and $Y$ for dividing the vertices into community groups, represented by two index matrices $\mathbf{S}_X$ and $\mathbf{S}_Y$. (Analogous measures can be defined in a straightforward fashion using the portions of the index matrices that correspond to just the red or blue vertices.) The two index matrices are used to calculate the so-called confusion matrix $\mathbf{N}$, which takes the simple form

$$\mathbf{N} = \mathbf{S}_X^{T} \mathbf{S}_Y. \tag{28}$$

The probability that a vertex is assigned to community $i$ in scheme $X$ and to community $j$ in scheme $Y$ is proportional to the corresponding element of the confusion matrix, so that

$$P(X = i, Y = j) = \frac{N_{ij}}{n}. \tag{29}$$

Using the probability as defined in Eq. (29), we can calculate the normalized mutual information as

$$I_{\mathrm{norm}}(X, Y) = \frac{2 I(X, Y)}{H(X) + H(Y)}. \tag{30}$$

Equation (30) is expressed in terms of the usual mutual information $I(X, Y)$ and entropies $H(X)$ and $H(Y)$ (Cover and Thomas, 1991), defined as

$$I(X, Y) = \sum_{x} \sum_{y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)}, \tag{31}$$

$$H(X) = -\sum_{x} P(x) \log P(x), \tag{32}$$

$$H(Y) = -\sum_{y} P(y) \log P(y). \tag{33}$$

In Eqs. (30) through (33), we have made use of the common shorthand abbreviations $P(x) = P(X = x)$, $P(y) = P(Y = y)$, and $P(x, y) = P(X = x, Y = y)$. The base of the logarithms in Eqs. (31) through (33) is arbitrary, as the computed measures only appear in the ratio in Eq. (30).

The normalized mutual information is a measure of the amount of information common to the two partitioning schemes. By taking one of the partitions to be the assumed modular structure of the network and one to be the structure found using the BRIM algorithm, we can thus explore the efficacy of the algorithm. When the found modules match the real ones, we have $I_{\mathrm{norm}} = 1$, and when the found modules are independent of the real ones, we have $I_{\mathrm{norm}} = 0$.
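The normalized mutual information can be computed directly from two lists of module labels, since the counts of co-occurring label pairs are exactly the entries of the confusion matrix. The sketch below is one way to do this in plain Python; the function name is hypothetical, and returning 1 when both partitions consist of a single module (where the ratio is undefined) is a convention assumed here.

```python
import math
from collections import Counter


def normalized_mutual_information(x, y):
    """Twice the mutual information divided by the sum of the two entropies."""
    n = len(x)
    joint = Counter(zip(x, y))   # confusion matrix entries N_ij
    px = Counter(x)              # row sums of the confusion matrix
    py = Counter(y)              # column sums of the confusion matrix

    def entropy(counts):
        return -sum((v / n) * math.log(v / n) for v in counts.values())

    # P(x, y) / (P(x) P(y)) = N_ij * n / (row_i * col_j)
    mutual = sum((nij / n) * math.log(nij * n / (px[i] * py[j]))
                 for (i, j), nij in joint.items())
    denom = entropy(px) + entropy(py)
    if denom == 0.0:
        return 1.0               # both partitions are a single module (assumed convention)
    return 2.0 * mutual / denom


same = normalized_mutual_information([0, 0, 1, 1], [5, 5, 7, 7])
independent = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The first call compares two partitions that differ only in their labels, and the second compares two partitions that carry no information about each other.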

We now set , , and , giving vertices in the network. With various choices of $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$, we repeatedly instantiate the model network and determine the assignment of vertices to modules using the BRIM algorithm. The algorithm is initialized by assigning each of the blue vertices to a unique module. For each sample, we calculate $I_{\mathrm{norm}}$.

In Fig. 1, we show results of applying the BRIM algorithm to the model network. The points show the mean value of $I_{\mathrm{norm}}$, averaged over 100 instantiations of the network. The error bars show the standard error of the mean. The general behavior is as anticipated, lending confidence to the algorithm definition.

Figure 1: Agreement between model network modules and modules found using the BRIM algorithm. Each point shows the mean normalized mutual information between the model network community groups and those identified using the algorithm, averaged over 100 realizations of the model network. Error bars show the standard error of the mean.

III.4 Determining the Number of Modules

The BRIM algorithm is silent on the issue of how many modules should be used. As noted in section III.2, the number of modules is at most one more than the rank of $\tilde{\mathbf{B}}$, which is a relatively weak constraint. One approach is thus to assign each vertex of the smaller of the red and blue vertex sets to unique modules, and allow the vertices to be grouped into an appropriate number of modules. For the BRIM algorithm, said approach is resource intensive, requiring the calculation of modularity contributions for what may be a grossly overestimated number of modules. Worse still, when the number of vertices is much greater than the number of modules, the BRIM algorithm may terminate at low-quality local maxima far from the true number of modules in the network (see section IV.2 for an example of this).

Clearly, automatically selecting the correct number of allowed modules in such a case would be preferable. The allowed number of modules thus becomes an adaptable parameter for which a value is to be found that optimizes the modularity. This presents some difficulties in that there is no obvious relationship between the allowed number of modules and the modularity found by the BRIM algorithm. However, by assuming that the modularity depends on the allowed number of modules in a reasonably smooth fashion, we can use a simple bisection approach to identify an appropriate value for the number of allowed modules.

The search begins by requiring all vertices to belong to the same module, $c = 1$, giving $Q = 0$. We double the allowed number of modules $c$. Half of the vertices are randomly reassigned to the newly defined modules, and a new, locally optimal solution is found using the BRIM algorithm. This process continues, with $c$ being repeatedly doubled so long as $Q$ continues to increase. Each step in the $c$-search builds on the previous solution by partially reusing the assignment of vertices to modules.

Once $Q$ drops as $c$ increases, we have crossed a maximum in the modularity landscape. We therefore switch from extrapolating to larger numbers of modules to interpolating within the interval that includes the maximum. The interpolation is done using a simple bisection search in the allowed number of modules, trying new values for $c$ so as to continuously reduce the interval wherein the putative maximum in $Q$ lies. As with the initial extrapolation stage of the search, vertices are assigned from earlier solutions to the newly allowed modules for each value of $c$, and a new, locally optimal solution is found.

The search for $c$ terminates once the interval becomes sufficiently small. In this work, we take the interval to be 2, i.e., the maximum at $c$ is bracketed by inferior solutions at $c - 1$ and $c + 1$. This adaptive BRIM algorithm enables us to identify the appropriate number of modules in a number of steps that scales logarithmically with the number of vertices in the network.
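The doubling and bisection stages can be sketched as a small search routine. This is one possible reading of the procedure, not a definitive implementation: `quality` is a hypothetical callback that returns the modularity found when a given number of modules is allowed (for instance, from a BRIM-style run that reuses earlier assignments), `c_max` is an assumed upper limit on the number of modules, and the reuse of previous solutions is left entirely to that callback.

```python
def search_module_count(quality, c_max):
    """Doubling followed by bisection over the allowed number of modules c.

    quality(c) is a hypothetical callback returning the modularity found
    when c modules are allowed; results are cached so each c is tried once.
    """
    cache = {}

    def q(c):
        if c not in cache:
            cache[c] = quality(c)
        return cache[c]

    # Extrapolation: start from c = 1 and double c while Q keeps increasing.
    lo = mid = 1
    hi = min(2, c_max)
    while q(hi) > q(mid) and hi < c_max:
        lo, mid, hi = mid, hi, min(2 * hi, c_max)
    if q(hi) > q(mid):
        lo, mid = mid, hi        # Q was still increasing at the limit c_max

    # Interpolation: keep q(mid) >= q(lo), q(hi) and shrink [lo, hi]
    # until mid is bracketed by its immediate neighbors.
    while max(mid - lo, hi - mid) > 1:
        if mid - lo > hi - mid:
            cand = (lo + mid) // 2
            if q(cand) > q(mid):
                hi, mid = mid, cand
            else:
                lo = cand
        else:
            cand = (mid + hi) // 2
            if q(cand) > q(mid):
                lo, mid = mid, cand
            else:
                hi = cand
    return mid, q(mid)


# Toy stand-in for the modularity landscape, peaked at c = 11.
best_c, best_q = search_module_count(lambda c: -(c - 11) ** 2, c_max=64)
```

The sketch relies on the same assumption as the text, namely that the modularity varies reasonably smoothly with the allowed number of modules; on a rugged landscape it may settle on a local peak.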

IV Results

In this section, we apply the BRIM algorithm to a network showing the interactions of women in the American Deep South at various social events (Davis et al., 1941) and to a network showing corporate interlocks in Scottish firms (Scott and Hughes, 1980). Both networks are conveniently available on the World Wide Web in Pajek format (Batagelj and Mrvar, 2006).

IV.1 Southern Women Event Participation

As an initial example, we consider the Southern women data set, collected by Davis et al. (1941) in and around Natchez, Mississippi during the 1930s as part of an extensive study of class and race in the Deep South. This data set and networks derived from it have been much studied. Indeed, Freeman (2003) has described it as “…a touchstone for comparing analytic methods in social network analysis.”

The Southern women data set describes the participation of 18 women in 14 social events. The women and social events constitute a bipartite network; an edge exists between a woman and a social event if the woman was in attendance at the event. The network is connected.

We identified network modular structure using the BRIM algorithm. The initial state is in general important. The dependence on the initial state is most visible in the quality of the stable solution, i.e., the algorithm can get “stuck” at a poor quality local maximum. We initialized the assignment of events to modules in $\mathbf{T}$ using several strategies: (1) assigning each event to its own module, (2) assigning all events to a single module, and (3) randomly assigning events to modules.

For this network, all three strategies identify modular structures. The first strategy produces a good quality solution (4 modules, $Q = 0.34554$). The second strategy also produces a solution that captures a great deal of the modular structure, but is somewhat coarser than the first (2 modules, ). The third strategy, random initial assignment, sheds light on the quality of the first two. Because the network is small, a large number of trials can be run without difficulty; we ran 500,000 trials. The greatest modularity found equalled that found with all events initially in unique modules, $Q = 0.34554$, indicating that this best solution found is quite good.

In Fig. 2, we show the best assignment of vertices to modules determined using the BRIM algorithm with all events initially in different modules. The shapes of the vertices show which ones belong to the same modules, with four modules in all. Open symbols with black labels portray vertices corresponding to the women, and filled symbols with white labels portray vertices corresponding to the events. The positions of the vertices are based on the singular vectors corresponding to the two largest singular values of $\tilde{\mathbf{B}}$, with the right singular vectors giving the coordinates for the events and the left singular vectors giving the coordinates for the women. Several vertices have been shifted slightly to prevent overlapping vertex symbols while preserving the overall character of the network.

Figure 2: Modules in the Southern women network. The women are represented as open symbols with black labels and the events as filled symbols with white labels. The modules are indicated by the shape of the symbols. Vertices are positioned with coordinates based on the elements of the singular vectors corresponding to the two largest singular values of $\tilde{\mathbf{B}}$; some vertices are repositioned slightly to eliminate overlaps. The vertex partition pictured has the highest modularity we have found for the Southern women network, $Q = 0.34554$.

The community groups found using the BRIM algorithm are comparable to those found in previous investigations of the Southern women data set (Ref. Freeman (2003) provides a useful survey). Most such studies have focused on the women, leaving the groupings for the events unspecified; we can use the groupings of the women to assign the events to the best modules, as described in section III.3, and calculate modularity values for purposes of comparison. The community groups can be further compared using the normalized mutual information between the various groupings of the women and the best grouping found using the BRIM algorithm. Values of $Q$ and $I_{\mathrm{norm}}$ are summarized in table 1 and discussed in depth below.

Modules
BRIM
Spectral
Davis 1
Davis 2
Doreian
Unipartite
Table 1: Comparison of modules in the Southern women network. Where necessary, the modularity values are calculated from an optimistic assignment of the events to the best possible modules from a given assignment of the women to modules. Values of the normalized mutual information are calculated between the given divisions of the women and the best division found using the BRIM algorithm.

In the original investigation, Davis et al. (1941) used general ethnographic knowledge of the community to assign the women to two groups. The groups consisted of women 1–9 and of women 9–18; woman 9 is a secondary member of both groups. To be consistent with the definitions in section II, we must assign this individual to a specific group. The modularity and normalized mutual information values are seen from table 1 to be similar for both assignments, with the case where woman 9 is grouped with women 10–18 labeled as “Davis 1” and the case where woman 9 is grouped with women 1–8 labeled as “Davis 2.” The latter division is the same as what Freeman (2003) identified as the consensus from 21 different studies of the Southern women data set. The modularity and normalized mutual information values are reasonably similar to the values found for two modules using either the BRIM algorithm or spectral bipartitioning as discussed in section III.2, which groups the women into sets {1–7, 9} and {8, 10–18} (identified in table 1 with the label “Spectral”).
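To make the comparison of groupings concrete, the normalized mutual information between two divisions of the same vertices can be computed as in the sketch below. It assumes the commonly used definition 2 I(A;B) / (H(A) + H(B)), which equals 1 for identical divisions and 0 for independent ones; the function name is hypothetical. As an illustration only, it compares the two Davis assignments with each other, which is not one of the comparisons reported in table 1.

```python
from collections import Counter
from math import log

def normalized_mutual_information(labels_a, labels_b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)) of two labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same vertices")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # n * I(A;B), accumulated over the nonempty cells of the confusion matrix
    mutual = sum(
        n_ab * log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    # n * H(A) and n * H(B)
    entropy_a = -sum(c * log(c / n) for c in count_a.values())
    entropy_b = -sum(c * log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # both labelings place every vertex in a single group
    return 2.0 * mutual / (entropy_a + entropy_b)

# Women 1-18 in order; woman 9 is placed with women 10-18 ("Davis 1")
# or with women 1-8 ("Davis 2").
davis_1 = [1] * 8 + [2] * 10
davis_2 = [1] * 9 + [2] * 9
davis_similarity = normalized_mutual_information(davis_1, davis_2)
```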

Doreian et al. (2004) considered the modular nature of both parts of the network, suggesting several divisions of the women and events. The division with the greatest modularity (given in their Table 4) is characterized in table 1 with the label “Doreian.” Taking just their partitioning of the events into three groups (events 1–5, 6–9, and 10–14) and replacing their partitioning of the women using the approach from section III.3, the modularity can be increased from 0.29390 to 0.32950. This is similar to the best assignment of vertices to modules we described above, with modularity of 0.34554, wherein the additional structure produces a modest, but real, improvement in the modularity.
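The replacement of one part's partition, given the other, can be illustrated with a short sketch. It assumes the null model k_i d_j / m and assumes that the approach of section III.3 amounts to placing each vertex in the module for which the summed modularity-matrix entries with the fixed part are largest. All function names and the toy network are hypothetical.

```python
def modularity_block(A):
    """Return (B, m) with B[i][j] = A[i][j] - k_i * d_j / m (assumed null model)."""
    row_deg = [sum(row) for row in A]
    col_deg = [sum(col) for col in zip(*A)]
    m = sum(row_deg)
    B = [[A[i][j] - row_deg[i] * col_deg[j] / m
          for j in range(len(col_deg))]
         for i in range(len(row_deg))]
    return B, m

def induce_row_modules(B, col_modules, num_modules):
    """Assign each row vertex to its best module, holding the columns fixed."""
    assignment = []
    for row in B:
        scores = [0.0] * num_modules
        for entry, module in zip(row, col_modules):
            scores[module] += entry
        # ties are broken in favor of the lowest module index
        assignment.append(max(range(num_modules), key=scores.__getitem__))
    return assignment

def bipartite_modularity(B, m, row_modules, col_modules):
    """Sum B over vertex pairs that share a module, normalized by m."""
    total = sum(B[i][j]
                for i, g in enumerate(row_modules)
                for j, h in enumerate(col_modules)
                if g == h)
    return total / m

# Hypothetical toy network: two women and two events in each of two blocks.
toy = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
event_modules = [0, 0, 1, 1]
B, m = modularity_block(toy)
women_modules = induce_row_modules(B, event_modules, num_modules=2)
q = bipartite_modularity(B, m, women_modules, event_modules)
```

Because each vertex is assigned independently of the others in its part, the induced partition is the best possible one for the fixed partition of the other part.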

It is also of interest to compare the community groups obtained for the Southern women network using the bipartite network to those found using an unweighted projection network. Here, we focus on the projection consisting of the eighteen women as vertices, with edges defined by mutual participation in events. The best division we found for the women, discussed above and shown in Fig. 2, actually has a negative value for the standard unipartite modularity; it is thus better to use only a single module containing all eighteen women than the best module found for the bipartite network. Since the modules we identified from the bipartite network using the BRIM algorithm are similar to those found in numerous other studies, this highlights the difficulties that can arise using a unipartite projection.
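The effect can be reproduced on a small example. The sketch below builds an unweighted projection and evaluates the standard unipartite modularity, assumed here to be the Newman-Girvan form (1/2M) Σ_ij (W_ij − k_i k_j / 2M) δ(g_i, g_j), with M the number of edges in the projection. The function names and the toy network are hypothetical.

```python
def unweighted_projection(A):
    """Adjacency matrix linking two row vertices that share at least one column."""
    n = len(A)
    return [[1 if i != j and any(a and b for a, b in zip(A[i], A[j])) else 0
             for j in range(n)]
            for i in range(n)]

def unipartite_modularity(W, modules):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    degrees = [sum(row) for row in W]
    two_m = sum(degrees)
    if two_m == 0:
        return 0.0
    total = 0.0
    for i, g in enumerate(modules):
        for j, h in enumerate(modules):
            if g == h:
                total += W[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m

# Hypothetical toy network: four women, three events. The third event is
# attended by everyone, so the projection is a complete graph.
toy = [[1, 0, 1],
       [1, 0, 1],
       [0, 1, 1],
       [0, 1, 1]]
W = unweighted_projection(toy)
q_split = unipartite_modularity(W, [0, 0, 1, 1])
q_single = unipartite_modularity(W, [0, 0, 0, 0])
```

In this toy case a single widely attended event makes the projection complete, so the split that is natural in the bipartite network has negative unipartite modularity, while a single module has modularity zero.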

Conversely, we can determine the bipartite modularity for community groups found using the unipartite projection. We first use the Newman vector to partition the women into two groups as described in section III.1, with women 2 and 4–7 in one group and all others in a second group. Next, we determine the best assignment of events to modules using the approach from section III.3. Together, this gives the values shown in table 1 for the label “Unipartite,” which show that this solution captures some of the modular structure of the network but is generally inferior to the solutions found from the bipartite network. Further, the solution from the unweighted projection does not correspond to a maximum in the bipartite modularity; using the solution as the initial state for the BRIM algorithm, a solution is obtained with two modules identical to those found using spectral bipartitioning as described in section III.2.

IV.2 Scotland Corporate Interlock

As a second example, we consider a data set on corporate interlocks in Scotland in the early twentieth century (Scott and Hughes, 1980). The data set characterizes 108 Scottish firms during 1904-5, detailing the corporate sector, capital, and board of directors for each firm. The data set includes only those board members who held multiple directorships, totaling 136 individuals.

Here, we focus on the bipartite network of firms and directors, with edges existing between each firm and its board members. Unlike the Southern women network, the Scotland corporate interlock network is not connected. In the following, we consider only the largest component of the graph, containing 131 directors and 86 firms—and thus, as many as 86 modules.
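Restricting the analysis to the largest component is a simple graph traversal. The sketch below uses a breadth-first search over a list of (firm, director) edges; the function name and the edge list are hypothetical and do not come from the Scotland data set.

```python
from collections import defaultdict, deque

def largest_component(edges):
    """Return (firms, directors) of the largest connected component.

    edges is an iterable of (firm, director) pairs. Vertices are tagged
    with their part so that a firm and a director may share a name.
    """
    adjacency = defaultdict(set)
    for firm, director in edges:
        f, d = ("firm", firm), ("director", director)
        adjacency[f].add(d)
        adjacency[d].add(f)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search from an unvisited vertex
            vertex = queue.popleft()
            for neighbor in adjacency[vertex]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component

    firms = sorted(name for kind, name in best if kind == "firm")
    directors = sorted(name for kind, name in best if kind == "director")
    return firms, directors

# Hypothetical edge list with two components.
example_edges = [("F1", "D1"), ("F1", "D2"), ("F2", "D2"), ("F3", "D3")]
firms, directors = largest_component(example_edges)
```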

As with the Southern women network, assigning all directors to unique modules or to the same module results in solutions that capture some of the modular character of the network. However, in contrast to the Southern women network, these solutions are rather poor compared to what can be found starting from a random assignment of directors to modules (see Fig. 3).

Further, the best solutions are found by restricting the allowed number of modules to less than the maximum. In principle, allowing the number of modules to take on any value leaves the BRIM algorithm to search the largest possible space, potentially finding the largest possible modularity value. In practice, the results are inferior to those obtained from a more restricted search. In Fig. 3, we show the results, in terms of the actual numbers of modules occupied and modularity values, for BRIM searches with the allowed number of modules restricted. This trades off the possibility of higher modularity values in the excluded region for improved searching in the remaining region. The trade-off is clearly a good one, as the best solutions are found with fewer than thirty modules.
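A restricted search of this kind can be sketched as follows. This is a simplified illustration, not the implementation used for Fig. 3. It assumes the null model k_i d_j / m, assumes that each half-step reassigns the vertices of one part to their best modules with the other part held fixed, and stops when the modularity no longer increases. The function names, the parameter c for the allowed number of modules, and the toy network are hypothetical.

```python
import random

def modularity_block(A):
    """Return (B, m) with B[i][j] = A[i][j] - k_i * d_j / m (assumed null model)."""
    k = [sum(row) for row in A]
    d = [sum(col) for col in zip(*A)]
    m = sum(k)
    return [[A[i][j] - k[i] * d[j] / m for j in range(len(d))]
            for i in range(len(k))], m

def induce(B, fixed_modules, c):
    """Assign each row of B to the best of c modules, holding the columns fixed."""
    result = []
    for row in B:
        scores = [0.0] * c
        for entry, module in zip(row, fixed_modules):
            scores[module] += entry
        result.append(max(range(c), key=scores.__getitem__))
    return result

def modularity(B, m, rows, cols):
    return sum(B[i][j] for i, g in enumerate(rows)
               for j, h in enumerate(cols) if g == h) / m

def brim(A, col_modules, c, max_iter=100):
    """Alternate the two induction half-steps until modularity stops improving."""
    B, m = modularity_block(A)
    Bt = [list(col) for col in zip(*B)]
    best_q, row_modules = float("-inf"), None
    for _ in range(max_iter):
        new_rows = induce(B, col_modules, c)
        new_cols = induce(Bt, new_rows, c)
        q = modularity(B, m, new_rows, new_cols)
        if q <= best_q + 1e-12:
            break
        row_modules, col_modules, best_q = new_rows, new_cols, q
    return row_modules, col_modules, best_q

def best_of_random_starts(A, c, trials=20, seed=0):
    """Run BRIM from random column assignments restricted to c modules."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        start = [rng.randrange(c) for _ in range(len(A[0]))]
        result = brim(A, start, c)
        if best is None or result[2] > best[2]:
            best = result
    return best

toy = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
rows, cols, q = brim(toy, [0, 0, 0, 1], c=2)
```

The outcome depends on the initial state: starting the toy network with all columns in one module leaves the search at zero modularity, which is why many random starts are compared.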

In Fig. 3, we also show three runs of the adaptive BRIM algorithm described in section III.4. The lines show the progress of the number of modules and modularity value during the search. The number of modules allowed for the BRIM search is typically close (within 10%) to the number of modules actually found, suggesting that the adaptive approach eliminates a wasteful search through vertex assignments with too many modules. The three traces all show typical behavior and lead to good solutions; two of the adaptive runs lead to better solutions, in terms of modularity, than any of the much larger number of trials using BRIM with a fixed number of allowed modules.

Figure 3: Quality of solutions found in the Scotland corporate interlock network. The modularity depends on the allowed number of modules. The points correspond to solutions found using the BRIM algorithm starting from a random initial assignment of vertices to modules. The values on the ordinate indicate the number of modules occupied by at least one vertex in the solution state found by the BRIM algorithm. All points are slightly dithered to better show regions with many similar or identical solutions. The lines show the course of an adaptive search for the correct number of modules to maximize the modularity, terminating at states with the modularity and number of modules shown by the crosses.

Based on the solutions shown in Fig. 3, the main component of the Scotland corporate interlock network has roughly twenty community groups, considerably fewer than the 131 directors or 86 firms. This analysis could serve as a starting point for an investigation of the community structures of the firms or directors. A more comprehensive analysis would take into account the available information on the corporate sectors and capital of the firms.

V Conclusions

We have defined and explored a modularity appropriate for bipartite networks. The presented results extend and specialize the matrix-based approach recently reported by Newman (2006b) for unipartite networks. The bipartite structure of the network is reflected mathematically in the importance of an asymmetric submatrix of the full bipartite modularity matrix, with a corresponding emphasis on the singular value decomposition of the submatrix instead of the spectral decomposition of the full matrix. We made use of the properties of this submatrix to define an algorithm, BRIM, for use in identifying network modules. By applying the algorithm to real-world networks, we demonstrated its effectiveness and identified some of its limitations.

The usual unipartite modularity has a limited resolution that depends on the number of edges in the network (Fortunato and Barthelemy, 2007). The main consequence of the resolution limit is that the modules in large networks may have hidden substructures that require deeper investigations to reveal. Although we have not shown it, we expect that the bipartite modularity introduced in this work has a similar resolution limit, with similar consequences.

One of the key themes in this paper has been that the bipartite structure of the network can be beneficially incorporated into its mathematical description and its computational treatment. This theme was realized in the BRIM algorithm, where the assignment of vertices to modules in one part of the network, when held fixed, provides a stable modularity landscape in which it is straightforward to partition the vertices of the other part into modules. We expect that the characteristics of other specialized classes of networks could be taken advantage of in an analogous fashion to define appropriate null model networks, modularity measures, and community detection algorithms.

The eigenvalues of the graph Laplacian are closely related to many important properties and invariants of the graph (Chung, 1997). In contrast, relatively little is known about the spectra of modularity matrices, be they for unipartite or bipartite networks. We are optimistic that the eigenvalues of the modularity matrix usefully relate to important and interesting network properties.

Acknowledgements.
The author thanks Ludwig Streit, Philippe Blanchard, and Thomas Roediger-Schluga for useful comments and suggestions. This work has been supported in part by the European FP6-NEST-Adventure Programme, contract number 028875.

References
