Closeness Centralization Measure for Two-mode Data of Prescribed Sizes

This work was supported by PHC Proteus 26818PC, Slovenian ARRS bilateral projects BI-FR/12-13-PROTEUS-011 and BI-FR/14-15-PROTEUS-001.
Abstract
We confirm a conjecture by Everett, Sinclair, and Dankelmann [Some centrality results new and old, J. Math. Sociology 28 (2004), 215–227] regarding the problem of maximizing closeness centralization in two-mode data, where the number of data of each type is fixed. Intuitively, our result states that among all networks obtainable from two-mode data, the largest closeness centralization is achieved by simply locally maximizing the closeness of a node. Mathematically, our study concerns bipartite graphs with bipartitions of fixed sizes, and we show that the extremal configuration is a rooted tree of depth 2, where the neighbors of the root have an equal or almost equal number of children.
Keywords: centrality, closeness centrality, network, graph, complex network.
1 Introduction
A social network is often conveniently modeled by a graph: nodes represent individual persons and edges represent the relationships between pairs of individuals. Our work focuses on simple unweighted graphs: a graph only tells us, for a given binary relation $R$, which pairs of individuals are related by $R$.
Centrality is a crucial concept in the study of social networks [8, 12]. It measures how central the position of an individual in a social network is. Various node-based centrality measures have been proposed to determine the relative importance of a node within a graph (the reader is referred to the work of Koschützki et al. [9] for an overview). Widely used centrality measures include degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality (definitions and extended discussions can be found in the book edited by Brandes and Erlebach [5]).
We focus on closeness centrality, which measures how close a node is to all other nodes in the graph: the smaller the total distance from a node to all other nodes, the more important the node is. Various closeness-based measures have been developed [1, 2, 4, 11, 13, 14, 16].
Let us see an example: suppose we want to place a service facility, e.g., a school, such that the total distance to all inhabitants in the region is minimal. This makes the chosen location as convenient as possible for the inhabitants taken together. In social network analysis, the centrality index based on this concept is called closeness centrality.
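Formally, for a node $x$ of a graph $G$, the closeness of $x$ is defined to be

$$C_G(x) := \frac{1}{\sum_{y \in V(G)} d_G(x,y)}, \tag{1}$$

where $d_G(x,y)$ is the distance between $x$ and $y$ in $G$, that is, the length of a shortest path in $G$ between nodes $x$ and $y$. We shall use the shorthand $\sigma_G(x) := \sum_{y \in V(G)} d_G(x,y)$. In both notations, we may drop the subscript $G$ when there is no risk of confusion.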
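Because the graphs considered here are simple and unweighted, definition (1) can be evaluated with one breadth-first search per node. The following Python sketch is illustrative only: it assumes a connected graph with at least two nodes, stored as an adjacency dictionary, and the names `distances_from`, `total_distance` and `closeness` are hypothetical helpers, not part of the paper. Fractions keep the values exact.

```python
from collections import deque
from fractions import Fraction


def distances_from(adj, x):
    """Breadth-first search from x: map each reachable node y to d_G(x, y)."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # the first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def total_distance(adj, x):
    """Total distance sigma_G(x): sum of d_G(x, y) over all nodes y of G."""
    dist = distances_from(adj, x)
    if len(dist) != len(adj):
        raise ValueError("closeness (1) needs a connected graph")
    return sum(dist.values())


def closeness(adj, x):
    """C_G(x) = 1 / sigma_G(x), as an exact fraction (needs at least two nodes)."""
    return Fraction(1, total_distance(adj, x))


# Path a - b - c: sigma(b) = 1 + 1 = 2 and sigma(a) = 1 + 2 = 3.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(path, "b"), closeness(path, "a"))  # 1/2 1/3
```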
While centrality measures compare the importance of a node within a graph, the associated notion of centralization, as introduced by Freeman [8], allows us to compare the relative importance of nodes within their respective graphs. The closeness centralization of a node in a graph is given by
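$$C'_G(x) := \sum_{y \in V(G)} \bigl(C_G(x) - C_G(y)\bigr). \tag{2}$$

Further, we set $C'(G) := \max_{x \in V(G)} C'_G(x)$.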
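The centralization (2) of a node and its maximum over a graph can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the graph is connected, has at least two nodes and is given as an adjacency dictionary, and `total_distance`, `centralization` and `graph_centralization` are hypothetical names. Exact fractions are used so that ties between nodes are not blurred by rounding.

```python
from collections import deque
from fractions import Fraction


def total_distance(adj, x):
    """Sum of BFS distances from x to every node; the graph must be connected."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != len(adj):
        raise ValueError("graph is not connected")
    return sum(dist.values())


def centralization(adj, x):
    """C'_G(x): sum over all nodes y of (C_G(x) - C_G(y)), where C = 1 / total distance."""
    c = {y: Fraction(1, total_distance(adj, y)) for y in adj}
    return sum(c[x] - c[y] for y in adj)


def graph_centralization(adj):
    """C'(G): the maximum of C'_G(x) over all nodes x."""
    return max(centralization(adj, x) for x in adj)


# Star on 4 nodes with centre 0: total distance 3 at the centre, 1 + 2 + 2 = 5 at each leaf.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(centralization(star, 0))  # 3 * (1/3 - 1/5) = 2/5
```

The term $y = x$ contributes zero, so summing over all nodes, as (2) does, is harmless.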
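It is important to note that the parameter $C'$ is tailored to compare the centralization of nodes in different graphs. If only one graph is involved, then one readily sees that maximizing $C'_G$ over the nodes of a graph $G$ amounts to minimizing $\sigma_G$. Indeed, suppose that $G$ is a graph and $x$ a node of $G$ such that $\sigma_G(x) \le \sigma_G(y)$ for every $y \in V(G)$. Then for every node $z$ of $G$,

$$C'_G(x) - C'_G(z) = |V(G)|\bigl(C_G(x) - C_G(z)\bigr) = |V(G)|\left(\frac{1}{\sigma_G(x)} - \frac{1}{\sigma_G(z)}\right) \ge 0.$$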
In what follows, we use the following notation. The star graph $S_n$ of order $n$, sometimes simply called an $n$-star, is the tree on $n$ nodes with one node of degree $n-1$. The star graph is thus a complete bipartite graph with one part of size $1$. Everett, Sinclair, and Dankelmann [7] established that, over all graphs with a fixed number of nodes, the closeness centralization is maximized by the star graph.
Theorem 1.
If $G$ is a graph with $n$ nodes, then

$$C'(G) \le C'_{S_n}(c),$$

where $c$ is the node of $S_n$ of maximum degree.
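For very small $n$, Theorem 1 can be checked exhaustively. In the star, the centre $c$ has total distance $n-1$ and each leaf has total distance $1 + 2(n-2) = 2n-3$, so a direct computation from (2) gives $C'_{S_n}(c) = (n-1)\bigl(\tfrac{1}{n-1} - \tfrac{1}{2n-3}\bigr) = \tfrac{n-2}{2n-3}$. The Python sketch below, with hypothetical function names, enumerates every graph on the nodes $0, \dots, n-1$, skips disconnected ones (for which (1) is undefined), and compares the maximum centralization with that of the star. The enumeration grows like $2^{n(n-1)/2}$, so this is a sanity check for tiny cases, not a proof.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def total_distances(n, edges):
    """Return [sigma(0), ..., sigma(n-1)], or None if the graph is disconnected."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    sigmas = []
    for x in range(n):
        dist = {x: 0}
        queue = deque([x])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < n:
            return None
        sigmas.append(sum(dist.values()))
    return sigmas


def max_centralization(n, edges):
    """C'(G) = max over x of sum_y (C(x) - C(y)), or None if G is disconnected."""
    sigmas = total_distances(n, edges)
    if sigmas is None:
        return None
    c = [Fraction(1, s) for s in sigmas]
    return max(sum(c[x] - c[y] for y in range(n)) for x in range(n))


def star_value(n):
    """C'(S_n), which is attained at the centre 0 of the star."""
    return max_centralization(n, [(0, i) for i in range(1, n)])


def theorem1_holds(n):
    """Check C'(G) <= C'(S_n) for every connected graph G on n >= 2 nodes."""
    pairs = list(combinations(range(n), 2))
    best = star_value(n)
    for k in range(n - 1, len(pairs) + 1):  # fewer than n - 1 edges cannot connect n nodes
        for edges in combinations(pairs, k):
            value = max_centralization(n, edges)
            if value is not None and value > best:
                return False
    return True


for n in range(2, 6):
    print(n, star_value(n), theorem1_holds(n))
```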
They also considered the problem of maximizing centralization measures for two-mode data [7]. In this context, the relation studied links two different types of data (e.g., persons and events), and we are interested in the centralization of one type of data only (e.g., the most central person). Thus the graph obtained is bipartite: its nodes can be partitioned into two parts so that every edge joins nodes belonging to different parts. A toy example is depicted in Figure 1, where one type of data consists of students and the other of classes: edges link the students to the classes they attended. (The sole purpose of this example is to make sure the reader is at ease with the definitions of $C$ and $C'$.) Closeness centrality is maximized at the student “” for one part and at the class “” for the other. An example of a real-world two-mode network with 89 edges and parts of sizes 18 and 14, borrowed from [6], is depicted in Figure 2. It shows the attendance of a group of women at social events in Old City in 1936. Tables 1 and 2 list the closeness centralization for the part of women and the part of events, respectively; closeness centrality (and hence centralization) is maximized at “Mrs. Evelyn Jefferson” and at the event of “September 16th”, respectively.
Woman | Closeness | Closeness centralization
---|---|---|
Mrs. Evelyn Jefferson | ||
Miss Theresa Anderson | ||
Mrs. Nora Fayette | ||
Mrs. Sylvia Avondale | ||
Miss Laura Mandeville | ||
Miss Brenda Rogers | ||
Miss Katherine Rogers | ||
Mrs. Helen Lloyd | ||
Miss Ruth DeSand | ||
Miss Verne Sanderson | ||
Miss Myra Liddell | ||
Miss Frances Anderson | ||
Miss Eleanor Nye | ||
Miss Pearl Oglethorpe | ||
Mrs. Dorothy Murchison | ||
Miss Charlotte McDowd | ||
Mrs. Olivia Carleton | ||
Mrs. Flora Price |
Event | Label on Fig. 2 | Closeness | Closeness centralization
---|---|---|---|
September 16th | P8 | ||
April 8th | P9 | ||
March 15th | P7 | ||
May 19th | P6 | ||
February 25th | P5 | ||
April 12th | P3 | ||
April 7th | P12 | ||
June 10th | P10 | ||
September 26th | P4 | ||
February 23rd | P11 | ||
June 27th | P1 | ||
March 2nd | P2 | ||
November 21st | P13 | ||
August 3rd | P14 |
Everett et al. formulated an interesting conjecture, which was later proved by Sinclair [15]. To state it, we first need a definition.
Definition 2.
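Let $S(n_1,n_2)$ be the tree with node bipartition $(V_1, V_2)$ such that

- $|V_i| = n_i$ for $i \in \{1,2\}$;
- there exists a node $r \in V_1$ such that $N(r) = V_2$; and
- $|\deg(u) - \deg(v)| \le 1$ for all nodes $u, v \in V_2$.

The node $r$ is called the root of $S(n_1,n_2)$.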
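The tree of Definition 2 is easy to build explicitly: join the root to every node of $V_2$, then deal the remaining $n_1 - 1$ nodes of $V_1$ out as leaves in round-robin order, so that the degrees in $V_2$ differ by at most one. The Python sketch below assumes this reading of the definition; the node labels `("v1", i)` and `("v2", j)` and the names `build_S` and `total_distance` are hypothetical. Since every node of $V_2$ is at distance 1 from the root and every other node of $V_1$ at distance 2, the root has total distance $n_2 + 2(n_1 - 1)$.

```python
from collections import deque


def build_S(n1, n2):
    """Hypothetical helper: adjacency dict of S(n1, n2) and its root.

    Nodes of V1 are ("v1", i) for i < n1, with root ("v1", 0); nodes of V2 are ("v2", j).
    """
    if n1 < 1 or n2 < 1:
        raise ValueError("both parts must be non-empty")
    adj = {("v1", i): [] for i in range(n1)}
    adj.update({("v2", j): [] for j in range(n2)})

    def link(u, v):
        adj[u].append(v)
        adj[v].append(u)

    root = ("v1", 0)
    for j in range(n2):  # the root is adjacent to every node of V2
        link(root, ("v2", j))
    for i in range(1, n1):  # deal the other V1 nodes out as leaves, round-robin
        link(("v1", i), ("v2", (i - 1) % n2))
    return adj, root


def total_distance(adj, x):
    """Sum of BFS distances from x to every reachable node."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())


adj, root = build_S(6, 3)
print(sorted(len(adj[("v2", j)]) for j in range(3)))  # degrees in V2: [2, 3, 3]
print(total_distance(adj, root))  # n2 + 2 * (n1 - 1) = 3 + 10 = 13
```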
The aforementioned conjecture was that the pair $(S(n_1,n_2), r)$ is extremal for the problem of maximizing betweenness centralization in bipartite graphs with a bipartition into parts of fixed sizes $n_1$ and $n_2$. Recall that for two-mode data, we are only interested in one type of data: in graph-theoretic terms, we look only at nodes that belong to the part of size $n_1$, and we want to know which of these nodes has the largest closeness in the graph. In other words, letting $V_1$ be the part of size $n_1$ of $G$, we want to determine $\max_{x \in V_1} C'_G(x)$.

Everett et al. also suggested that the same pair is extremal for the closeness and eigenvector centralization measures. In this paper, we confirm the conjecture for the closeness centralization measure. That is, we prove that the pair $(S(n_1,n_2), r)$, where $r$ is the root, is extremal for the problem of maximizing closeness centralization in bipartite graphs with parts of sizes $n_1$ and $n_2$.
We point out that a similar study for the centrality measure of eccentricity was recently carried out [10]. In addition, Bell [3] worked on closely related notions, namely subgroup centrality measures. As for two-mode data, a subset of the nodes, called a group, is fixed, and the aim is to find a node of the group with the largest centrality. However, unlike in the standard notion, the centrality itself is computed using distances only to the nodes in the group (local centrality) or only to the nodes outside it (global centrality). Note that the standard notion, which is used in this work, takes into account the distances to all other nodes in the graph.
2 Bipartite Networks With Fixed Number of Nodes
Theorem 3.
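Let $G$ be a bipartite graph with node parts $V_1$ and $V_2$ of sizes $n_1$ and $n_2$, respectively. Then for each $x \in V_1$,

$$C'_G(x) \le C'_{S(n_1,n_2)}(r),$$

where $r$ is the root of $S(n_1,n_2)$.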
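Theorem 3 can likewise be sanity-checked for tiny part sizes by enumerating every subset of the $n_1 n_2$ possible edges between the two parts. The Python sketch below assumes the reading of Definition 2 in which the root is adjacent to all of $V_2$ and the remaining nodes of $V_1$ are leaves spread as evenly as possible; node labels and function names are hypothetical. It uses the identity $C'_G(x) = |V(G)|\,C_G(x) - \sum_{y} C_G(y)$, which follows directly from (2). Exhaustive search is only feasible for very small graphs and is no substitute for the proof.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations, product


def sigmas(nodes, edges):
    """Map each node to its total distance, or return None if the graph is disconnected."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    out = {}
    for x in nodes:
        dist = {x: 0}
        queue = deque([x])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(nodes):
            return None
        out[x] = sum(dist.values())
    return out


def centralizations(nodes, edges):
    """Map each node x to C'(x) = |V| * C(x) - sum_y C(y); None if disconnected."""
    s = sigmas(nodes, edges)
    if s is None:
        return None
    c = {v: Fraction(1, s[v]) for v in nodes}
    total = sum(c.values())
    return {x: len(nodes) * c[x] - total for x in nodes}


def root_value(n1, n2):
    """C' at the root ("a", 0) of S(n1, n2); the other V1 nodes are round-robin leaves."""
    A = [("a", i) for i in range(n1)]
    B = [("b", j) for j in range(n2)]
    edges = [(A[0], b) for b in B] + [(A[i], B[(i - 1) % n2]) for i in range(1, n1)]
    return centralizations(A + B, edges)[A[0]]


def theorem3_holds(n1, n2):
    """Check C'_G(x) <= C' at the root of S(n1, n2) for all connected bipartite G, x in V1."""
    A = [("a", i) for i in range(n1)]
    B = [("b", j) for j in range(n2)]
    possible = list(product(A, B))  # every edge joins V1 to V2
    best = root_value(n1, n2)
    for k in range(n1 + n2 - 1, len(possible) + 1):  # a connected graph needs n - 1 edges
        for edges in combinations(possible, k):
            cent = centralizations(A + B, edges)
            if cent is not None and any(cent[x] > best for x in A):
                return False
    return True


print(theorem3_holds(3, 3))
```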
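To prove Theorem 3, suppose that $G$ is a bipartite graph with bipartition $(V_1, V_2)$ where $|V_i| = n_i$ for $i \in \{1,2\}$, and $x$ is a node in $V_1$ such that $C'_G(x) \ge C'_{S(n_1,n_2)}(r)$. We prove that this inequality must actually be an equality by showing that any such extremal pair $(G, x)$ must satisfy the following three properties:

1. $G$ is a tree;
2. $N_G(x) = V_2$; and
3. $|\deg_G(u) - \deg_G(v)| \le 1$ whenever $u, v \in V_2$.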
Property 1 is relatively straightforward to check, and so is Property 3 if we assume that Property 2 holds. Thus most of the discussion below is devoted to proving Property 2, which we do last. For convenience, we define $n$ to be $n_1 + n_2$.
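We start by establishing Property 1, namely that $G$ is a tree. Assume, for the sake of contradiction, that $G$ is not a tree, and let $T$ be a breadth-first-search tree of $G$ rooted at $x$; note that $T$ has the same bipartition as $G$. Then $\sigma_T(x) = \sigma_G(x)$ and $\sigma_T(y) \ge \sigma_G(y)$ for any node $y$. In addition, the above inequality is strict for at least two nodes, namely the two ends $u$ and $v$ of any edge of $G$ that is not in $T$, since $d_T(u,v) \ge 2 > d_G(u,v)$. It follows that $C'_T(x) > C'_G(x)$, a contradiction.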
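The breadth-first-search tree argument can be illustrated on the smallest bipartite graph that is not a tree, the 4-cycle $0$–$1$–$2$–$3$–$0$, rooted at node $0$. The Python sketch below (function names are hypothetical) shows that the root keeps its total distance, no node gets closer, and the two ends of the discarded edge $\{2,3\}$ get strictly farther away, so the centralization at the root goes up.

```python
from collections import deque
from fractions import Fraction


def bfs_tree(adj, x):
    """Adjacency dict of a breadth-first-search spanning tree of a connected graph, rooted at x."""
    tree = {v: [] for v in adj}
    seen = {x}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:  # first discovery of w: keep the edge uw in the tree
                seen.add(w)
                tree[u].append(w)
                tree[w].append(u)
                queue.append(w)
    return tree


def total_distance(adj, x):
    """Sum of BFS distances from x to every node."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())


def centralization(adj, x):
    """C'(x) = sum over y of (1 / sigma(x) - 1 / sigma(y))."""
    c = {y: Fraction(1, total_distance(adj, y)) for y in adj}
    return sum(c[x] - c[y] for y in adj)


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree = bfs_tree(cycle, 0)  # keeps edges 01, 03, 12 and drops 23
print({v: (total_distance(cycle, v), total_distance(tree, v)) for v in cycle})
# {0: (4, 4), 1: (4, 4), 2: (4, 6), 3: (4, 6)}
print(centralization(cycle, 0), centralization(tree, 0))  # 0 1/6
```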
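We now establish that Property 3 holds if Property 2 does. Thus we know that $G$ is a tree and we assume that $N_G(x) = V_2$; therefore all nodes of $V_1 \setminus \{x\}$ are leaves. Suppose, for the sake of contradiction, that there exist nodes $u, v \in V_2$ such that $\deg_G(u) \ge \deg_G(v) + 2$. Let $w$ be a neighbor of $u$ different from $x$ and consider the graph $G'$ obtained by deleting the edge $uw$ and replacing it with $vw$. Note that $\sigma_{G'}(x) = \sigma_G(x)$ and that $\sigma_{G'}(y) = \sigma_G(y)$ unless $y \in N_G[u] \cup N_G[v]$, that is, unless $y$ belongs to the closed neighborhood of either $u$ or $v$. So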
(3) |
Now, let and where, by assumption, .
Recalling that $G$ is a tree, observe that the following hold for every and every (see Figure 3 for an illustration).
-
;
-
;
-
;
-
;
-
and
-
.
From (i)–(iii), we infer that for any ,
and similarly by (v) and (vi),
Thus the right side of (3) is greater than
which is positive by (i) and (iv). This contradiction shows that Property 3 holds provided Property 2 does.
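The leaf-moving step can be checked numerically on the depth-two trees that arise once Property 2 holds: a root adjacent to every node of $V_2$, with all other nodes of $V_1$ hanging as leaves below $V_2$. In the Python sketch below, which is illustrative and uses hypothetical names, `children[j]` is the number of leaves under the $j$-th node of $V_2$; a degree gap of at least two in $V_2$ is the same as a gap of at least two in `children`. The sketch moves one leaf from a node with the most leaves to one with the fewest and compares the centralization at the root.

```python
from collections import deque
from fractions import Fraction


def depth_two_tree(children):
    """Hypothetical helper: root "r" adjacent to V2 nodes ("b", j); node ("b", j)
    carries children[j] leaves ("a", i) from V1. Returns an adjacency dict."""
    adj = {"r": []}
    leaf = 0
    for j, count in enumerate(children):
        b = ("b", j)
        adj[b] = ["r"]
        adj["r"].append(b)
        for _ in range(count):
            a = ("a", leaf)
            leaf += 1
            adj[a] = [b]
            adj[b].append(a)
    return adj


def total_distance(adj, x):
    """Sum of BFS distances from x to every node."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())


def root_centralization(children):
    """C' at the root of the depth-two tree described by children."""
    adj = depth_two_tree(children)
    c = {y: Fraction(1, total_distance(adj, y)) for y in adj}
    return sum(c["r"] - c[y] for y in adj)


def move_one_leaf(children):
    """Move one leaf from a V2 node with the most leaves to one with the fewest."""
    moved = list(children)
    moved[children.index(max(children))] -= 1
    moved[children.index(min(children))] += 1
    return moved


print(root_centralization([2, 0]), root_centralization([1, 1]))  # 19/180 19/105
```

The root's own total distance does not change under the move (every leaf stays at distance two from it), so the gain comes entirely from the other nodes becoming more remote, matching the structure of the argument above.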
It remains to prove that Property 2 holds. First, if $n_2 = 1$, then the tree $G$ must be an $n$-star, hence Property 2 is satisfied. Now consider the case where $n_2 = 2$. Then there is precisely one node that is adjacent to both nodes in $V_2$. Moreover, if since, if then while . Thus and hence , as wanted.
From now on, we assume that $n_2 \ge 3$. As in the proof of Property 3, we argue that if Property 2 does not hold, then $C'_G(x)$ can be increased by altering the graph $G$. In this case, however, we need our assumption that $C'_G(x)$ itself is at least as large as $C'_{S(n_1,n_2)}(r)$. Combined with the next lemma, this provides a lower bound on $C'_G(x)$.
Lemma 4.
.
Proof.
We establish the inequality by a direct, and unfortunately lengthy, computation.
We set and we write where . Let us now calculate for each node of .
-
.
-
Consider the neighbors of : there are
-
neighbors for which ; and
-
neighbors for which .
-
-
Consider the nodes at distance two from : there are
-
nodes for which ; and
-
nodes for which .
-
As is seen from (4), if is fixed and tends to infinity (hence, so does ), then approaches .
Let us now subtract from the right side of (5) and show that the difference is non-negative. After cross-multiplying and simplifying, we obtain a fraction with positive denominator (since each denominator in the right side of (5) is positive), and with numerator equal to
(6) |
This expression increases with and is clearly positive when (to see this quickly, compare, within each parenthesis, every maximal sequence of consecutive negative terms with the maximal sequence of positive terms preceding it). Further, a direct calculation shows that (6) is actually positive even when .
However, if , then (6) can take negative values for certain values of . To deal with these two cases, we return to the initial equation (4).
Assume that . Then subtracting from both sides of (4) yields that is at least
(7) |
Placing (7) under one (positive) denominator, the numerator becomes
(8) |
which is clearly positive as .
A similar calculation yields the conclusion when . In this case, the difference of (4) and yields that is at least
whose numerator, when placed under a common (positive) denominator, is
This is non-negative as . This concludes the proof. ∎
It remains to demonstrate that Property 2 holds. To this end, we consider the tree to be rooted at and, for a node , we let be the subtree of rooted at . To avoid unnecessary notation later, let us observe immediately that if then Property 2 holds. Otherwise, and there exists a node at distance two from such that . As a result, , which implies that , a contradiction.
We also note that if for all , then Property 2 is satisfied. So assume that there exists some child of whose subtree has depth at least . Among all such children of , let be such that is maximum, that is,
We now introduce some notation, illustrated in Figure 4. Let be the nodes of with depth and set . Note that, by definition, and whenever . Let be the children of (in ) with degree more than and set . Let be the set of children of with degree and set .
Note that for any , the definition of ensures that is a star whenever . The graph is obtained from as follows. (An illustration is given in Figure 5.) For convenience, we set .
-
For each , the edge is added.
-
For each , the edge is removed and all other edges incident to but one are removed. Thus the vertices become leaves of , each being attached to one of the vertices .
-
If there exists a child of different from with , then we select an arbitrary set of size and we set . Then for each , we replace the edge by the edge .
-
If there is no node as in 3, then we let be a child of different from such that is as large as possible, and we define to be . (Recall that , hence such a child always exists.) Moreover, we set for convenience.
As noted earlier, if 3 applies then is a star. Moreover, if , then one can see that and hence . However, this is not a contradiction since and .
Regardless of whether 3 or 4 applies, . Actually, it is important to notice that, in , no child of different from has more than children itself. Even more, for any such child we know that . This follows from our previous remark if has depth at most , and from the fact that otherwise. Also, setting , we observe that for every node
Therefore, . Since the definition of implies that , it follows that the size of is at most .
Note that is a tree, which we see rooted at , and and have the same node set, which we call . In addition, and have the same bipartition . Our next task is to compare the total distance of nodes in and in , that is, we compare and . For readability purposes, let us set , , and let be the subtree of rooted at . We now make a few statements about and for various nodes. We shall often use that
Lemma 5.
The following hold.
-
If , then .
-
If , then .
-
If , then .
-
If , then .
-
If , then and whenever and .
-
If , then .
-
for every node .
Proof.
We prove all the statements in order.
1. If , then the distance from to any node not in is unchanged. In addition, whenever , hence the conclusion.
2. If , then for each . In addition, if , then , which yields the conclusion.
3. It suffices to observe that if , then
4. First note that if , then the definition of ensures that for each , which implies that .
Now let . Observe that if , then . In addition, if , then . Consequently,
which is non-negative since when , and .
5. Let and . First note that every node in is two units closer to than to . Similarly, every node in