Closeness Centralization Measure for Two-mode Data of Prescribed Sizes
This work was supported by PHC Proteus 26818PC, Slovenian ARRS bilateral projects BIFR/1213PROTEUS011 and BIFR/1415PROTEUS001.
Abstract
We confirm a conjecture by Everett, Sinclair, and Dankelmann [Some Centrality results new and old, J. Math. Sociology 28 (2004), 215–227] regarding the problem of maximizing closeness centralization in two-mode data, where the number of data of each type is fixed. Intuitively, our result states that among all networks obtainable via two-mode data, the largest closeness is achieved by simply locally maximizing the closeness of a node. Mathematically, our study concerns bipartite graphs with fixed-size bipartitions, and we show that the extremal configuration is a rooted tree of depth $2$, where neighbors of the root have an equal or almost equal number of children.
Keywords: centrality, closeness centrality, network, graph, complex network.
1 Introduction
A social network is often conveniently modeled by a graph: nodes represent individual persons and edges represent the relationships between pairs of individuals. Our work focuses on simple unweighted graphs: our graph only tells us, for a given (binary) relation $R$, which pairs of individuals are in relation according to $R$.
Centrality is a crucial concept in studying social networks [8, 12]. It can be seen as a measure of how central the position of an individual is in a social network. Various node-based measures of centrality have been proposed to determine the relative importance of a node within a graph (the reader is referred to the work of Koschützki et al. [9] for an overview). Some widely used centrality measures are the degree centrality, the betweenness centrality, the closeness centrality and the eigenvector centrality (definitions and extended discussions are found in the book edited by Brandes and Erlebach [5]).
We focus on closeness centrality, which measures how close a node is to all other nodes in the graph: the smaller the total distance from a node to all other nodes, the more important the node is. Various closeness-based measures have been developed [1, 2, 4, 11, 13, 14, 16].
Let us see an example: suppose we want to place a service facility, e.g., a school, such that the total distance to all inhabitants in the region is minimal. This would make the chosen location as convenient as possible for most inhabitants. In social network analysis the centrality index based on this concept is called closeness centrality.
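Formally, for a node $v$ of a graph $G$, the closeness of $v$ is defined to be
(1) $C_G(v) = \dfrac{1}{\sum_{u \in V(G)} \operatorname{dist}_G(u,v)}$,
where $\operatorname{dist}_G(u,v)$ is the distance between $u$ and $v$ in $G$, that is, the length of a shortest path in $G$ between nodes $u$ and $v$. We shall use the shorthand $W_G(v) = \sum_{u \in V(G)} \operatorname{dist}_G(u,v)$. In both notations, we may drop the subscript when there is no risk of confusion.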
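As a small illustration of the definition above, the following sketch computes the total distance and the closeness of a node by breadth-first search. It assumes that closeness is the reciprocal of the total distance and that the graph is connected with at least two nodes; the function names and the adjacency-dictionary representation are illustrative choices, not taken from the paper.

```python
from collections import deque
from fractions import Fraction


def distances_from(adj, source):
    """Shortest-path lengths from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def total_distance(adj, v):
    """Sum of distances from v to all nodes; the graph must be connected."""
    dist = distances_from(adj, v)
    if len(dist) != len(adj):
        raise ValueError("graph must be connected")
    return sum(dist.values())


def closeness(adj, v):
    """Assumed definition: reciprocal of the total distance from v."""
    return Fraction(1, total_distance(adj, v))


# Star on four nodes: node 0 is the center, nodes 1..3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(total_distance(star, 0), closeness(star, 0))
print(total_distance(star, 1), closeness(star, 1))
```

Exact rational arithmetic (`Fraction`) is used so that closeness values of different nodes can be compared without rounding issues.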
While centrality measures compare the importance of a node within a graph, the associated notion of centralization, as introduced by Freeman [8], allows us to compare the relative importance of nodes within their respective graphs. The closeness centralization of a node $v$ in a graph $G$ is given by
(2) $C_1(v;G) = \sum_{u \in V(G)} \left[ C_G(v) - C_G(u) \right]$.
Further, we set $C_1(G) = \max_{v \in V(G)} C_1(v;G)$.
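The centralization of a node can be computed directly from the closeness values. The sketch below assumes Freeman's form, that is, the sum over all nodes $u$ of the difference between the closeness of $v$ and that of $u$, with closeness taken as the reciprocal of the total distance; the helper names are hypothetical and the graph is assumed connected with at least two nodes.

```python
from collections import deque
from fractions import Fraction


def closeness_map(adj):
    """Closeness (assumed: 1 / total distance) of every node of a connected graph."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        result[source] = Fraction(1, sum(dist.values()))
    return result


def node_centralization(adj, v):
    """Sum over all nodes u of closeness(v) - closeness(u)."""
    c = closeness_map(adj)
    return sum(c[v] - c[u] for u in adj)


def graph_centralization(adj):
    """Largest node centralization over all nodes of the graph."""
    c = closeness_map(adj)
    total = sum(c.values())
    return max(len(adj) * c[v] - total for v in adj)


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(node_centralization(star, 0))   # center of the star
print(node_centralization(star, 1))   # a leaf
print(graph_centralization(star))
```

Since the sum of all closeness values does not depend on $v$, the node with the largest closeness also has the largest centralization within a fixed graph, which is what `graph_centralization` exploits.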
It is important to note that the parameter $C_1$ is really tailored to compare the centralization of nodes in different graphs. If only one graph is involved, then one readily sees that maximizing $C_1$ over the nodes of a graph amounts to minimizing $W$. Indeed, suppose that $G$ is a graph and $v$ a node of $G$ such that $W_G(v) \le W_G(u)$ for every $u \in V(G)$. Then for every node $x$ of $G$,
$C_1(v;G) - C_1(x;G) = |V(G)| \cdot \left( C_G(v) - C_G(x) \right) \ge 0$.
In what follows, we use the following notation. The star graph $S_n$ of order $n$, sometimes simply known as an $n$-star, is the tree on $n$ nodes with one node having degree $n-1$. The star graph is thus a complete bipartite graph with one part of size $1$. Everett, Sinclair, and Dankelmann [7] established that over all graphs with a fixed number of nodes, the closeness is maximized by the star graph.
Theorem 1.
If $G$ is a graph with $n$ nodes, then
$C_1(G) \le C_1(v; S_n)$,
where $v$ is the node of $S_n$ of maximum degree.
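For very small orders, the statement can be checked by exhaustive search. The following sketch enumerates all labeled connected graphs on $n$ nodes and compares the largest centralization found with that of the star; it assumes closeness is the reciprocal of the total distance and centralization is Freeman's sum of differences. It is only a sanity check under these assumptions, not part of the proof, and the function names are illustrative.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def graph_centralization(adj):
    """Largest node centralization, or None if the graph is disconnected."""
    closeness = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) != len(adj):
            return None
        closeness[source] = Fraction(1, sum(dist.values()))
    total = sum(closeness.values())
    return max(len(adj) * closeness[v] - total for v in adj)


def max_over_connected_graphs(n):
    """Brute force over all edge subsets of the complete graph on n nodes."""
    pairs = list(combinations(range(n), 2))
    best = None
    for mask in range(1 << len(pairs)):
        adj = {v: [] for v in range(n)}
        for i, (a, b) in enumerate(pairs):
            if (mask >> i) & 1:
                adj[a].append(b)
                adj[b].append(a)
        value = graph_centralization(adj)
        if value is not None and (best is None or value > best):
            best = value
    return best


def star_centralization(n):
    """Centralization of the star on n nodes (attained at its center)."""
    adj = {0: list(range(1, n))}
    adj.update({leaf: [0] for leaf in range(1, n)})
    return graph_centralization(adj)


for n in (3, 4, 5):
    print(n, max_over_connected_graphs(n), star_centralization(n))
```

The search is exponential in the number of node pairs, so it is practical only for a handful of nodes.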
They also considered the problem of maximizing centralization measures for two-mode data [7]. In this context, the relation studied links two different types of data (e.g., persons and events) and we are interested in the centralization of one type of data only (e.g., the most central person). Thus the graph obtained is bipartite: its nodes can be partitioned into two parts so that all the edges join nodes belonging to different parts. A toy example is depicted in Figure 1, where one type of data consists of students and the other of classes: edges link the students to the classes they attended. (The sole purpose of this example is to make sure the reader is at ease with the definitions of closeness and closeness centralization.) Closeness centrality is maximized at the student “” for one part and at the class “” for the other. An example of a real-world two-mode network on $89$ edges with partition sizes $18$ and $14$, borrowed from [6], is depicted in Figure 2. The figure shows the frequency of inter-participation of a group of women in social events in Old City, 1936. Tables 1 and 2 list the closeness centralization for the parts of sizes $18$ and $14$, and show that closeness centrality (and hence centralization) is maximized at “Mrs. Evelyn Jefferson” and the event from “September 16th”, respectively.
Mrs. Evelyn Jefferson  

Miss Theresa Anderson  
Mrs. Nora Fayette  
Mrs. Sylvia Avondale  
Miss Laura Mandeville  
Miss Brenda Rogers  
Miss Katherine Rogers  
Mrs. Helen Lloyd  
Miss Ruth DeSand  
Miss Verne Sanderson  
Miss Myra Liddell  
Miss Frances Anderson  
Miss Eleanor Nye  
Miss Pearl Oglethorpe  
Mrs. Dorothy Murchison  
Miss Charlotte McDowd  
Mrs. Olivia Carleton  
Mrs. Flora Price 
Event | Label on Fig. 2
September 16th | P8
April 8th | P9
March 15th | P7
May 19th | P6
February 25th | P5
April 12th | P3
April 7th | P12
June 10th | P10
September 26th | P4
February 23rd | P11
June 27th | P1
March 2nd | P2
November 21st | P13
August 3rd | P14
Everett et al. formulated an interesting conjecture, which was later proved by Sinclair [15]. To state it, we first need a definition.
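Definition 2.
Let $H(n_0,n_1)$ be the tree with node bipartition $(A_0,A_1)$ such that
• $|A_i| = n_i$ for $i \in \{0,1\}$;
• there exists a node $v_0 \in A_0$ such that $N(v_0) = A_1$; and
• $\deg(w) \in \left\{ \left\lfloor \frac{n_0-1}{n_1} \right\rfloor + 1, \left\lceil \frac{n_0-1}{n_1} \right\rceil + 1 \right\}$ for all nodes $w \in A_1$.
The node $v_0$ is called the root of $H(n_0,n_1)$.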
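The tree of Definition 2 is easy to construct explicitly. The sketch below follows the description given in the abstract (a rooted tree of depth 2 whose root is adjacent to the whole other part, with the remaining nodes of the root's part attached as leaves as evenly as possible). The node labels, the parameter names `n0` and `n1`, and the round-robin attachment are illustrative assumptions.

```python
def build_H(n0, n1):
    """Build the depth-2 tree with parts of sizes n0 (containing the root) and n1.

    Nodes of the root's part are labeled ("a", i) and the root is ("a", 0);
    nodes of the other part are labeled ("b", j).
    Returns (adjacency dictionary, root).
    """
    if n0 < 1 or n1 < 1:
        raise ValueError("both parts must be non-empty")
    root = ("a", 0)
    adj = {("a", i): [] for i in range(n0)}
    adj.update({("b", j): [] for j in range(n1)})
    # The root is adjacent to every node of the other part.
    for j in range(n1):
        adj[root].append(("b", j))
        adj[("b", j)].append(root)
    # The remaining n0 - 1 nodes become leaves, spread round-robin so that
    # the numbers of children of any two neighbors of the root differ by at most one.
    for i in range(1, n0):
        parent = ("b", (i - 1) % n1)
        adj[("a", i)].append(parent)
        adj[parent].append(("a", i))
    return adj, root


adj, root = build_H(4, 2)
for node in sorted(adj):
    print(node, sorted(adj[node]))
```

Round-robin attachment is one simple way to obtain the balanced distribution; any other assignment with the same numbers of children per neighbor of the root yields an isomorphic tree.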
The aforementioned conjecture was that the pair $(H(n_0,n_1), v_0)$ is an extremal pair for the problem of maximizing betweenness centralization in bipartite graphs with a fixed-size bipartition into parts of sizes $n_0$ and $n_1$. Recall that for two-mode data, we are only interested in one type of data: in graph-theoretic terms, we look only at nodes that belong to the part of size $n_0$, and we want to know which of these nodes has the largest closeness in the graph. In other words, letting $A_0$ be the part of size $n_0$ of $G$, we want to determine $\max_{v \in A_0} C_1(v;G)$.
Everett et al. also suggested that the same pair is extremal for closeness and eigenvector centralization measures. In this paper, we confirm the conjecture for the closeness centralization measure. That is, we prove that the pair $(H(n_0,n_1), v_0)$ is extremal for the problem of maximizing closeness centralization in bipartite graphs with parts of size $n_0$ and $n_1$, where $v_0$ is the root.
We point out that a similar study for the centrality measure of eccentricity was recently carried out [10]. In addition, Bell [3] worked on closely related notions, namely subgroup centrality measures. As with two-mode data, a subset $S$ of the nodes is fixed (called a group) and the aim is to find a node in $S$ with largest centrality. However, unlike in the standard centrality notion, the centrality itself is computed using distances only to the nodes in $S$ (local centrality) or to the nodes outside $S$ (global centrality). Note that the standard notion, which is used in this work, takes into account the distances to all other nodes in the graph.
2 Bipartite Networks With Fixed Number of Nodes
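Theorem 3.
Let $G$ be a bipartite graph with node parts $A_0$ and $A_1$ of sizes $n_0$ and $n_1$, respectively. Then for each $v \in A_0$,
$C_1(v;G) \le C_1(v_0; H(n_0,n_1))$,
where $v_0$ is the root of $H(n_0,n_1)$.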
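For tiny part sizes, the statement can be sanity-checked by brute force. The sketch below enumerates all connected bipartite graphs with parts of sizes `n0` and `n1`, takes the largest centralization over nodes of the first part, and compares it with the value at the root of the balanced depth-2 tree. It assumes closeness is the reciprocal of the total distance and centralization sums over all nodes of the graph; all names are illustrative, and the check is no substitute for the proof.

```python
from collections import deque
from fractions import Fraction
from itertools import product


def centralizations(adj):
    """Map each node to its centralization, or return None if disconnected."""
    closeness = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) != len(adj):
            return None
        closeness[source] = Fraction(1, sum(dist.values()))
    total = sum(closeness.values())
    return {v: len(adj) * closeness[v] - total for v in adj}


def best_over_bipartite_graphs(n0, n1):
    """Largest centralization of a first-part node over all connected bipartite graphs."""
    part0 = [("a", i) for i in range(n0)]
    part1 = [("b", j) for j in range(n1)]
    pairs = list(product(part0, part1))
    best = None
    for mask in range(1 << len(pairs)):
        adj = {v: [] for v in part0 + part1}
        for k, (a, b) in enumerate(pairs):
            if (mask >> k) & 1:
                adj[a].append(b)
                adj[b].append(a)
        values = centralizations(adj)
        if values is None:
            continue
        candidate = max(values[a] for a in part0)
        if best is None or candidate > best:
            best = candidate
    return best


def root_value_in_balanced_tree(n0, n1):
    """Centralization of the root of the balanced depth-2 tree."""
    root = ("a", 0)
    adj = {("a", i): [] for i in range(n0)}
    adj.update({("b", j): [root] for j in range(n1)})
    adj[root] = [("b", j) for j in range(n1)]
    for i in range(1, n0):
        parent = ("b", (i - 1) % n1)
        adj[("a", i)].append(parent)
        adj[parent].append(("a", i))
    return centralizations(adj)[root]


for sizes in [(2, 2), (3, 2), (3, 3)]:
    print(sizes, best_over_bipartite_graphs(*sizes), root_value_in_balanced_tree(*sizes))
```

The enumeration visits every subset of the `n0 * n1` possible edges, so it is feasible only when that product is small.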
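To prove Theorem 3, suppose that $G$ is a bipartite graph with bipartition $(A_0,A_1)$ where $|A_i| = n_i$ for $i \in \{0,1\}$, and $v$ is a node in $A_0$ such that $C_1(v;G) \ge C_1(v_0;H(n_0,n_1))$. We prove that this inequality must actually be an equality by showing that any such extremal pair $(G,v)$ must satisfy the following three properties:
1. $G$ is a tree;
2. $N_G(v) = A_1$; and
3. $|\deg_G(w) - \deg_G(w')| \le 1$ whenever $w, w' \in A_1$.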
Property 1 is relatively straightforward to check and so is 3 if we assume that 2 holds. Thus the majority of the discussion below will be devoted to proving that 2 holds, which we do last. For convenience, we define to be .
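We start by establishing 1; namely, that the graph $G$ is a tree. Assume, for the sake of contradiction, that $G$ is not a tree and let $T$ be a breadth-first-search tree of $G$ rooted at $v$. Note that $W_T(v) = W_G(v)$ and $W_T(u) \ge W_G(u)$ for any node $u$. In addition, there exist at least two nodes for which the above inequality is strict (for instance, the two endpoints of an edge of $G$ that is not in $T$, since they are adjacent in $G$ but not in $T$). It follows that $C_1(v;T) > C_1(v;G)$, a contradiction.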
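The breadth-first-search argument can be illustrated on a small example. The sketch below builds a BFS tree of a 4-cycle rooted at a chosen node and compares total distances and the centralization of the root before and after. It assumes closeness is the reciprocal of the total distance and centralization is the sum of closeness differences over all nodes; the names are illustrative.

```python
from collections import deque
from fractions import Fraction


def total_distances(adj):
    """Total distance from each node to all others (connected graph assumed)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        result[source] = sum(dist.values())
    return result


def bfs_tree(adj, root):
    """Adjacency dictionary of a breadth-first-search tree rooted at root."""
    tree = {v: [] for v in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                tree[x].append(y)
                tree[y].append(x)
                queue.append(y)
    return tree


def node_centralization(adj, v):
    """Sum over all nodes u of 1/W(v) - 1/W(u)."""
    w = total_distances(adj)
    return sum(Fraction(1, w[v]) - Fraction(1, w[u]) for u in adj)


# A 4-cycle is bipartite but not a tree.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree = bfs_tree(cycle, 0)
w_graph, w_tree = total_distances(cycle), total_distances(tree)
print(w_graph, w_tree)
print(node_centralization(cycle, 0), node_centralization(tree, 0))
```

Distances from the root are preserved by the BFS tree, while the other total distances can only grow, so the centralization of the root does not decrease and here strictly increases.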
We now establish that 3 holds if 2 does. Thus we know that is a tree and we assume that , therefore also all nodes from are leaves. Suppose, for the sake of contradiction, that there exist nodes such that . Let be a neighbor of different from and consider the graph obtained by deleting the edge and replacing it with . Note that and that unless , that is unless belongs to the closed neighborhood of either or . So
(3) 
Now, let and where, by assumption, .
Recalling that is a tree, observe that the following hold for every and every (for better illustration, see Figure 3).

;

;

;

;

and

.
From (i)–(iii), we infer that for any ,
and similarly by (v) and (vi),
Thus the right side of (3) is greater than
which is positive by (i) and (iv). This contradiction shows that 3 holds provided 2 does.
It remains to prove that 2 holds to complete the proof. First, if , then the tree must be an star, hence the second property is satisfied. Now consider the case where . Then there is precisely one node that is adjacent to both nodes in . Moreover, if since, if then while . Thus and hence , as wanted.
From now on, we assume that . As in the proof of 3, we argue that if 2 does not hold then can be increased by altering the graph . In this case, however, we find it necessary to use our assumption that itself is at least as large as . This shall allow us to have a lower bound on , by the next lemma.
Lemma 4.
.
Proof.
We establish the inequality via a direct computation. Unfortunately, the expressions involved force a lengthy computation.
We set and we write where . Let us now calculate for each node of .

.

Consider the neighbors of : there are

neighbors for which ; and

neighbors for which .


Consider the nodes at distance two from : there are

nodes for which ; and

nodes for which .

As is seen from (4), if is fixed and tends to infinity (hence, so does ), then approaches .
Let us now subtract from the right side of (5) and show that the difference is nonnegative. After cross-multiplying and simplifying, we obtain a fraction with positive denominator (since each denominator in the right side of (5) is positive), and with numerator equal to
(6) 
This expression increases with and is clearly positive when (to see it quickly just compare, in each parenthesis, every (maximal) sequence of consecutive negative terms with the (maximal) sequence of positive terms preceding it). Further, a direct calculation ensures that (6) is actually positive even when .
However, if , then (6) could take on negative values for certain values of . To deal with these two cases we return to the initial equation (4).
Assume that . Then subtracting from both sides of (4) yields that is at least
(7) 
Placing (7) under one (positive) denominator, the numerator becomes
(8) 
which is clearly positive as .
A similar calculation yields the conclusion when . In this case, the difference of (4) and yields that is at least
whose numerator, when placed under a common (positive) denominator, is
This is nonnegative as . This concludes the proof. ∎
It remains to demonstrate that 2 holds. To this end, we consider the tree to be rooted at and, for a node , we let be the subtree of rooted at . To avoid unnecessary notation later, let us observe immediately that if then 2 holds. For otherwise, and there exists a node at distance two from such that . As a result, , which implies that , a contradiction.
We also note that if for all , then 2 is satisfied. So assume that there exists some child of whose subtree has depth at least . Among all such children of , let be such that is maximum, that is,
We now introduce some notation, which is illustrated in Figure 4. Let be the nodes of with depth and set . Note that, by definition, and whenever . Let be the children of (in ) with degree more than and set . Let be the set of children of with degree and set .
Note that for any , the definition of ensures that is a star whenever . The graph is obtained from as follows. (An illustration is given in Figure 5.) For convenience, we set .

For each , the edge is added.

For each , the edge is removed and all other edges incident to but one are removed. Thus the vertices become leaves of , each being attached to one of the vertices .

If there exists a child of different from with , then we select an arbitrary set of size and we set . Then for each , we replace the edge by the edge .

If there is no node as in 3, then we let be a child of different from such that is as large as possible, and we define to be . (Recall that , hence such a child always exists.) Moreover, we set for convenience.
As noted earlier, if 3 applies then is a star. Moreover, if , then one can see that and hence . However, this is not a contradiction since and .
Regardless of whether 3 or 4 applies, . Actually, it is important to notice that, in , no child of different from has more than children itself. Even more, for any such child we know that . This follows from our previous remark if has depth at most , and from the fact that otherwise. Also, setting , we observe that for every node
Therefore, . Since the definition of implies that , it follows that the size of is at most .
Note that is a tree, which we see rooted at , and and have the same node set, which we call . In addition, and have the same bipartition . Our next task is to compare the total distance of nodes in and in , that is, we compare and . For readability purposes, let us set , , and let be the subtree of rooted at . We now make a few statements about and for various nodes. We shall often use that
Lemma 5.
The following hold.

If , then .

If , then .

If , then .

If , then .

If , then and whenever and .

If , then .

for every node .
Proof.
We prove all the statements in order.
1. If , then the distance from to any node not in is unchanged. In addition, whenever , hence the conclusion.
2. If , then for each . In addition, if , then , which yields the conclusion.
3. It suffices to observe that if , then
4. First note that if , then the definition of ensures that for each , which implies that .
Now let . Observe that if , then . In addition, if , then . Consequently,
which is nonnegative since when , and .
5. Let and . First note that every node in is two units closer to than to . Similarly, every node in