
# Closeness Centralization Measure for Two-mode Data of Prescribed Sizes

This work was supported by PHC Proteus 26818PC, Slovenian ARRS bilateral projects BI-FR/12-13-PROTEUS-011 and BI-FR/14-15-PROTEUS-001.

Matjaž Krnc (Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Slovenia; matjaz.krnc@gmail.com)

Jean-Sébastien Sereni (CNRS (LORIA), Vandœuvre-lès-Nancy, France; sereni@kam.mff.cuni.cz). This author's work was partially supported by the French Agence Nationale de la Recherche under reference ANR 10 JCJC 0204 01. Corresponding author.

Riste Škrekovski (Department of Mathematics, University of Ljubljana, and Faculty of Information Studies, Novo Mesto, and FAMNIT, University of Primorska, Koper, Slovenia; skrekovski@gmail.com). Partially supported by ARRS Program P1-0383.

Zelealem B. Yilma (Carnegie Mellon University Qatar, Doha, Qatar; zyilma@qatar.cmu.edu). This author's work was partially supported by the French Agence Nationale de la Recherche under reference ANR 10 JCJC 0204 01.
July 29, 2019
###### Abstract

We confirm a conjecture by Everett, Sinclair, and Dankelmann [Some Centrality results new and old, J. Math. Sociology 28 (2004), 215–227] regarding the problem of maximizing closeness centralization in two-mode data, where the number of data of each type is fixed. Intuitively, our result states that among all networks obtainable via two-mode data, the largest closeness centralization is achieved by simply locally maximizing the closeness of a node. Mathematically, our study concerns bipartite graphs with fixed size bipartitions, and we show that the extremal configuration is a rooted tree of depth $2$, where neighbors of the root have an equal or almost equal number of children.

Keywords: centrality, closeness centrality, network, graph, complex network.


## 1 Introduction

A social network is often conveniently modeled by a graph: nodes represent individual persons and edges represent the relationships between pairs of individuals. Our work focuses on simple unweighted graphs: for a given binary relation, our graph only tells us which pairs of individuals are related.

Centrality is a crucial concept in studying social networks [8, 12]. It can be seen as a measure of how central the position of an individual is in a social network. Various node-based measures of centrality have been proposed to determine the relative importance of a node within a graph (the reader is referred to the work of Koschützki et al. [9] for an overview). Some widely used centrality measures are the degree centrality, the betweenness centrality, the closeness centrality and the eigenvector centrality (definitions and extended discussions are found in the book edited by Brandes and Erlebach [5]).

We focus on closeness centrality, which measures how close a node is to all other nodes in the graph: the smaller the total distance from a node $v$ to all other nodes, the more important the node $v$ is. Various closeness-based measures have been developed [1, 2, 4, 11, 13, 14, 16].

Let us see an example: suppose we want to place a service facility, e.g., a school, such that the total distance to all inhabitants in the region is minimal. This would make the chosen location as convenient as possible for most inhabitants. In social network analysis the centrality index based on this concept is called closeness centrality.

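Formally, for a node $v$ of a graph $G$, the closeness of $v$ is defined to be

$$C_G(v) \coloneqq \frac{1}{\sum_{u \in V(G)} \operatorname{dist}_G(v,u)}, \tag{1}$$

where $\operatorname{dist}_G(v,u)$ is the distance between $v$ and $u$ in $G$, that is, the length of a shortest path in $G$ between nodes $v$ and $u$. We shall use the shorthand $W_G(v) \coloneqq \sum_{u \in V(G)} \operatorname{dist}_G(v,u)$. In both notations, we may drop the subscript when there is no risk of confusion.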
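As a small illustration of definition (1), the following sketch computes the total distance and the closeness of a node by breadth-first search. The adjacency-dictionary representation and the function names `total_distance` and `closeness` are choices made for this example only; the graph is assumed to be connected and to have at least two nodes.

```python
from collections import deque
from fractions import Fraction


def total_distance(adj, v):
    """Return W(v), the sum of shortest-path distances from v to all nodes."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:  # the first visit gives the BFS distance
                dist[y] = dist[x] + 1
                queue.append(y)
    if len(dist) != len(adj):
        raise ValueError("definition (1) needs a connected graph")
    return sum(dist.values())


def closeness(adj, v):
    """Closeness C(v) = 1 / W(v), kept exact with Fraction."""
    return Fraction(1, total_distance(adj, v))


# Path a - b - c: the middle node has the smallest total distance.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(path, "b"), closeness(path, "a"))
```

Exact fractions are used so that closeness values of different nodes can be compared without rounding issues.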

While centrality measures compare the importance of a node within a graph, the associated notion of centralization, as introduced by Freeman [8], allows us to compare the relative importance of nodes within their respective graphs. The closeness centralization of a node $v$ in a graph $G$ is given by

$$C_1(v;G) \coloneqq \sum_{u \in V(G)} \bigl[C(v) - C(u)\bigr]. \tag{2}$$

Further, we set $C_1(G) \coloneqq \max\{C_1(v;G) : v \in V(G)\}$.

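It is important to note that the parameter $C_1$ is really tailored to compare the centralization of nodes in different graphs. If only one graph is involved, then one readily sees that maximizing $C_1(v;G)$ over the nodes of a graph $G$ amounts to minimizing $W_G(v)$. Indeed, suppose that $G$ is a graph with $n$ nodes and $v$ a node of $G$ such that $W_G(v) \leqslant W_G(x)$ for every $x \in V(G)$. Then for every node $x$ of $G$,

$$C_1(v;G) - C_1(x;G) = (n-1)\left(\frac{1}{W_G(v)} - \frac{1}{W_G(x)}\right) - \left(\frac{1}{W_G(x)} - \frac{1}{W_G(v)}\right) = n\left(\frac{1}{W_G(v)} - \frac{1}{W_G(x)}\right) \geqslant 0.$$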
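The observation above can be checked on a small graph. The sketch below evaluates (2) directly and compares the node maximizing the centralization with the node minimizing the total distance; the helper names and the example graph are illustrative assumptions, not part of the original text.

```python
from collections import deque
from fractions import Fraction


def total_distance(adj, v):
    """W(v): sum of BFS distances from v in a connected graph."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist.values())


def centralization(adj, v):
    """C_1(v; G) = sum over all nodes u of [C(v) - C(u)], as in (2)."""
    c = {x: Fraction(1, total_distance(adj, x)) for x in adj}
    return sum(c[v] - c[u] for u in adj)


# A star on four nodes with center "u".
star = {"u": ["a", "b", "c"], "a": ["u"], "b": ["u"], "c": ["u"]}
best_by_c1 = max(star, key=lambda v: centralization(star, v))
best_by_w = min(star, key=lambda v: total_distance(star, v))
print(best_by_c1, best_by_w, centralization(star, "u"))
```

Within a single graph both selections return the same node, which is why the centralization only becomes informative when different graphs are compared.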

In what follows, we use the following notation. The star graph of order $n+1$, sometimes simply known as an $n$-star and denoted by $S_n$, is the tree on $n+1$ nodes with one node having degree $n$. The star graph is thus a complete bipartite graph with one part of size $1$. Everett, Sinclair, and Dankelmann [7] established that over all graphs with a fixed number of nodes, the closeness centralization is maximized by the star graph.

###### Theorem 1.

If $G$ is a graph with $n$ nodes, then

$$C_1(u;S_{n-1}) \geqslant C_1(G),$$

where $u$ is the node of $S_{n-1}$ of maximum degree.
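For orientation, the left side of the inequality in Theorem 1 can be evaluated explicitly. The following worked computation is a sketch that assumes $S_{n-1}$ denotes the star with $n$ nodes (one center $u$ and $n-1$ leaves) and that $n \geqslant 2$.

```latex
% Total distances in the star on n nodes with center u and leaves x:
W(u) = n - 1, \qquad W(x) = 1 + 2(n-2) = 2n - 3 .
% The term for u itself vanishes in (2), so only the n-1 leaves contribute:
C_1(u; S_{n-1}) = (n-1)\left(\frac{1}{n-1} - \frac{1}{2n-3}\right)
                = 1 - \frac{n-1}{2n-3}
                = \frac{n-2}{2n-3} .
```

Under these assumptions the value increases with $n$ and tends to $1/2$.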

They also considered the problem of maximizing centralization measures for two-mode data [7]. In this context, the relation studied links two different types of data (e.g., persons and events) and we are interested in the centralization of one type of data only (e.g., the most central person). Thus the graph obtained is bipartite: its nodes can be partitioned into two parts so that all the edges join nodes belonging to different parts. A toy example is depicted in Figure 1, where one type of data consists of students and the other of classes: edges link the students to the classes they attended. (The sole purpose of this example is to make sure the reader is at ease with the definitions of closeness and closeness centralization.) In each part, closeness centrality is maximized at a single node: one student in one part and one class in the other (see Figure 1). An example of a real-world two-mode network, borrowed from [6], is depicted in Figure 2. The figure shows the frequency of interparticipation of a group of women in social events in Old City, 1936. Tables 1 and 2 list the closeness centralization for the two parts: closeness centrality (and hence centralization) is maximized at “Mrs. Evelyn Jefferson” and the event from “September 16th”, respectively.

Everett et al. formulated an interesting conjecture, which was later proved by Sinclair [15]. To state it, we first need a definition.

###### Definition 2.

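Let $H(u;n_0,n_1)$ be the tree with node bipartition $(A_0,A_1)$ such that

• $|A_i| = n_i$ for $i \in \{0,1\}$;

• there exists a node $u \in A_0$ such that $N(u) = A_1$; and

• $\deg(w) \in \left\{\left\lfloor \frac{n_0-1}{n_1} \right\rfloor + 1, \left\lceil \frac{n_0-1}{n_1} \right\rceil + 1\right\}$ for all nodes $w \in A_1$.

The node $u$ is called the root of $H(u;n_0,n_1)$.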
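The tree of Definition 2 is easy to build explicitly. The sketch below assumes the reading suggested by the abstract and by Theorem 3: a root $u$ in the part of size $n_0$, adjacent to all $n_1$ nodes of the other part, with the remaining $n_0 - 1$ nodes attached as leaves so that the numbers of children differ by at most one. The function name `build_h` and the node labels are hypothetical.

```python
def build_h(n0, n1):
    """Adjacency dict of a depth-2 tree: root 'u', n1 neighbours, n0 - 1 leaves.

    Assumed shape of H(u; n0, n1): leaves are dealt round-robin, so the
    numbers of children of two neighbours of the root differ by at most one.
    """
    if n0 < 1 or n1 < 1:
        raise ValueError("both parts must be non-empty")
    adj = {"u": []}
    mids = [f"w{i}" for i in range(n1)]
    for w in mids:
        adj[w] = ["u"]
        adj["u"].append(w)
    for j in range(n0 - 1):
        w = mids[j % n1]          # round-robin keeps the tree balanced
        leaf = f"x{j}"
        adj[leaf] = [w]
        adj[w].append(leaf)
    return adj


h = build_h(6, 3)
print(sorted(len(h[w]) for w in ("w0", "w1", "w2")))
```

With $n_0 = 6$ and $n_1 = 3$, the five leaves are split as $2, 2, 1$, so the neighbors of the root have degrees $3, 3, 2$.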

The aforementioned conjecture was that the pair $(u, H(u;n_0,n_1))$ is an extremal pair for the problem of maximizing betweenness centralization in bipartite graphs with a fixed-size bipartition into parts of sizes $n_0$ and $n_1$. Recall that for two-mode data, we are only interested in one type of data: in graph-theoretic terms, we look only at nodes that belong to the part of size $n_0$, and we want to know which of these nodes has the largest closeness centralization in the graph. In other words, letting $A_0$ be the part of size $n_0$ of $G$, we want to determine $\max\{C_1(v;G) : v \in A_0\}$.

Everett et al. also suggested that the same pair is extremal for closeness and eigenvector centralization measures. In this paper, we confirm the conjecture for the closeness centralization measure. That is, we prove that the pair $(u, H(u;n_0,n_1))$ is extremal for the problem of maximizing closeness centralization in bipartite graphs with parts of size $n_0$ and $n_1$, where $u$ is the root.

We point out that a similar study for the centrality measure of eccentricity was recently carried out [10]. In addition, Bell [3] worked on closely related notions, namely subgroup centrality measures. As for two-mode data, a subset of the nodes (called a group) is fixed and the aim is to find a node in the group with largest centrality. However, unlike in the standard centrality notion, the centrality itself is computed using distances only to the nodes in the group (local centrality) or to the nodes outside the group (global centrality). Note that the standard notion, which is used in this work, takes into account the distances to all other nodes in the graph.

## 2 Bipartite Networks With Fixed Number of Nodes

###### Theorem 3.

Let $G$ be a bipartite graph with node parts $A_0$ and $A_1$ of sizes $n_0$ and $n_1$, respectively. Then for each $v \in A_0$,

$$C_1(u;H(u;n_0,n_1)) \geqslant C_1(v;G).$$

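To prove Theorem 3, suppose that $G$ is a bipartite graph with bipartition $(A_0,A_1)$ where $|A_i| = n_i$ for $i \in \{0,1\}$, and $u$ is a node in $A_0$ such that $C_1(u;G) \geqslant C_1(u;H(u;n_0,n_1))$. We prove that this inequality must actually be an equality by showing that any such extremal pair must satisfy the following three properties:

1. $G$ is a tree;

2. $N_G(u) = A_1$; and

3. $|\deg(w_1) - \deg(w_2)| \leqslant 1$ whenever $w_1, w_2 \in A_1$.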

Property 1 is relatively straightforward to check and so is 3 if we assume that 2 holds. Thus the majority of the discussion below will be devoted to proving that 2 holds, which we do last. For convenience, we define  to be .

We start by establishing 1; namely, that the graph $G$ is a tree. Assume, for the sake of contradiction, that $G$ is not a tree and let $T$ be a breadth-first-search tree of $G$ rooted at $u$. Note that $W_T(u) = W_G(u)$ and $W_T(x) \geqslant W_G(x)$ for any node $x$. In addition, there exist at least two nodes for which the above inequality is strict. It follows that $C_1(u;T) > C_1(u;G)$, a contradiction.
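The breadth-first-search argument can be seen on the smallest bipartite graph that is not a tree, the $4$-cycle. The following sketch builds a BFS tree and compares the centralization of the root in both graphs; all function names are illustrative and the code is not taken from the paper.

```python
from collections import deque
from fractions import Fraction


def total_distance(adj, v):
    """W(v): sum of BFS distances from v in a connected graph."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist.values())


def centralization(adj, v):
    """C_1(v; G) as in (2)."""
    c = {x: Fraction(1, total_distance(adj, x)) for x in adj}
    return sum(c[v] - c[u] for u in adj)


def bfs_tree(adj, root):
    """Adjacency dict of a breadth-first-search tree of adj rooted at root."""
    tree = {x: [] for x in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:      # keep only the edge that discovers y
                seen.add(y)
                tree[x].append(y)
                tree[y].append(x)
                queue.append(y)
    return tree


cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
tree = bfs_tree(cycle, "a")
print(centralization(cycle, "a"), centralization(tree, "a"))
```

The root keeps its total distance while other nodes can only move farther from each other, so the centralization of the root does not decrease and, here, strictly increases.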

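We now establish that 3 holds if 2 does. Thus we know that $G$ is a tree and we assume that $N_G(u) = A_1$, therefore also all nodes from $A_0 \setminus \{u\}$ are leaves. Suppose, for the sake of contradiction, that there exist nodes $w_1, w_2 \in A_1$ such that $\deg(w_1) - \deg(w_2) \geqslant 2$. Let $z$ be a neighbor of $w_1$ different from $u$ and consider the graph $G'$ obtained by deleting the edge $zw_1$ and replacing it with $zw_2$. Note that $W_{G'}(u) = W_G(u)$ and that $W_{G'}(x) = W_G(x)$ unless $x \in N_G[w_1] \cup N_G[w_2]$, that is unless $x$ belongs to the closed neighborhood of either $w_1$ or $w_2$. So

$$C_1(u;G') - C_1(u;G) = \sum_{x \in N_G[w_1] \cup N_G[w_2]} \frac{1}{W_G(x)} - \sum_{x \in N_G[w_1] \cup N_G[w_2]} \frac{1}{W_{G'}(x)}. \tag{3}$$

Now, let $N_G(w_1) \setminus \{u,z\} = \{x_1,\dots,x_t\}$ and $N_G(w_2) \setminus \{u\} = \{y_1,\dots,y_s\}$ where, by assumption, $s < t$.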

Recalling that is a tree, observe that the following hold for every and every (for better illustration, see Figure 3).

1. ;

2. ;

3. ;

4. ;

5. and

6. .

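From (i)–(iii), we infer that for any $j \in \{1,\dots,s\}$,

$$\frac{1}{W_{G'}(x_j)} + \frac{1}{W_{G'}(y_j)} < \frac{1}{W_G(x_j)} + \frac{1}{W_G(y_j)},$$

and similarly by (v) and (vi),

$$\frac{1}{W_{G'}(w_1)} + \frac{1}{W_{G'}(w_2)} < \frac{1}{W_G(w_1)} + \frac{1}{W_G(w_2)}.$$

Thus the right side of (3) is greater than

$$\frac{1}{W_G(z)} - \frac{1}{W_{G'}(z)} + \sum_{j=s+1}^{t} \left(\frac{1}{W_G(x_j)} - \frac{1}{W_{G'}(x_j)}\right),$$

which is positive by (i) and (iv). This contradiction shows that 3 holds provided 2 does.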
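The effect of the edge switch used above can be observed numerically. In the sketch below, the root `u` has two neighbors, one with three leaf children and one with none; moving the leaf `z` from the first to the second increases the centralization of `u`. The tree and the helper `move_leaf` are hypothetical examples, not objects from the paper.

```python
from collections import deque
from fractions import Fraction


def total_distance(adj, v):
    """W(v): sum of BFS distances from v in a connected graph."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist.values())


def centralization(adj, v):
    """C_1(v; G) as in (2)."""
    c = {x: Fraction(1, total_distance(adj, x)) for x in adj}
    return sum(c[v] - c[u] for u in adj)


def move_leaf(adj, leaf, old, new):
    """Return a copy of adj where the edge leaf-old is replaced by leaf-new."""
    out = {x: list(nbrs) for x, nbrs in adj.items()}
    out[old].remove(leaf)
    out[new].append(leaf)
    out[leaf] = [new]
    return out


before = {
    "u": ["w1", "w2"],
    "w1": ["u", "a", "b", "z"],   # degree 4
    "w2": ["u"],                  # degree 1, so the degrees differ by 3
    "a": ["w1"], "b": ["w1"], "z": ["w1"],
}
after = move_leaf(before, "z", "w1", "w2")
print(centralization(before, "u"), centralization(after, "u"))
```

The total distance of `u` is unchanged by the switch; only the closeness values of the other nodes change, as in equation (3).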

It remains to prove that 2 holds to complete the proof. First, if $n_1 = 1$, then the tree $G$ must be an $n_0$-star, hence the second property is satisfied. Now consider the case where $n_1 = 2$. Then there is precisely one node $w$ that is adjacent to both nodes in $A_1$. Moreover, if since, if then while . Thus $u = w$ and hence $N_G(u) = A_1$, as wanted.

From now on, we assume that $n_1 \geqslant 3$. As in the proof of 3, we argue that if 2 does not hold then $C_1(u;G)$ can be increased by altering the graph $G$. In this case, however, we find it necessary to use our assumption that $C_1(u;G)$ itself is at least as large as $C_1(u;H(u;n_0,n_1))$. This shall allow us to have a lower bound on $C_1(u;G)$, by the next lemma.

###### Lemma 4.

If $n_1 \geqslant 3$, then $C_1(u;H(u;n_0,n_1)) \geqslant \frac{n_1-1}{4n_1-2}$.

###### Proof.

We establish the inequality via a direct computation. Unfortunately, the expressions involved force a lengthy computation.

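We set $m \coloneqq n_0 - 1$ and we write $m = qn_1 + r$ where $q$ and $r$ are integers with $0 \leqslant r < n_1$. Let us now calculate $W(x)$ for each node $x$ of $H(u;n_0,n_1)$.

1. $W(u) = n_1 + 2m$.

2. Consider the neighbors of $u$: there are

1. $r$ neighbors $w$ for which $W(w) = 3m + 2n_1 - 3 - 2q$; and

2. $n_1 - r$ neighbors $w$ for which $W(w) = 3m + 2n_1 - 1 - 2q$.

3. Consider the nodes at distance two from $u$: there are

1. $r(q+1)$ nodes $x$ for which $W(x) = 4m + 3n_1 - 4 - 2q$; and

2. $(n_1 - r)q$ nodes $x$ for which $W(x) = 4m + 3n_1 - 2 - 2q$.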

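Writing each total distance over the common factor $n_1$ and summing $C(u) - C(x)$ over all nodes $x$, it follows that if $r > 0$ then

$$C_1(u) = \frac{n_1+m}{n_1+2m} - \frac{rn_1}{3mn_1-2m+2n_1^2-3n_1+2r} - \frac{n_1(n_1-r)}{3mn_1-2m+2n_1^2-n_1+2r} - \frac{r(m+n_1-r)}{4mn_1-2m+3n_1^2-4n_1+2r} - \frac{(n_1-r)(m-r)}{4mn_1-2m+3n_1^2-2n_1+2r} \tag{4}$$

$$\geqslant \frac{n_1+m}{n_1+2m} - \frac{n_1^2}{3mn_1-2m+2n_1^2-3n_1+2r} - \frac{n_1m}{4mn_1-2m+3n_1^2-4n_1+2r}, \tag{5}$$

where (5) is obtained by replacing, in each of the two pairs of fractions in (4), the larger denominator by the smaller one.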

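One notes that (5) is still true if $r = 0$. Indeed, in this case all neighbors of $u$ have the same number of children, so

$$C_1(u) = \frac{n_1+m}{n_1+2m} - \frac{n_1^2}{3mn_1-2m+2n_1^2-n_1} - \frac{n_1m}{4mn_1-2m+3n_1^2-2n_1},$$

so that (5) stays true.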
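Formula (4) can be cross-checked against a direct computation on the tree itself. The sketch below assumes that $m = n_0 - 1$, that $r$ is the remainder of $m$ modulo $n_1$, and that the extremal tree consists of a root `u` with $n_1$ neighbors and $m$ leaves spread as evenly as possible; `formula_4` is our transcription of the five fractions in (4), and all names are chosen for this example.

```python
from collections import deque
from fractions import Fraction


def build_h(m, n1):
    """Root 'u', n1 neighbours, m leaves dealt round-robin (assumed shape)."""
    adj = {"u": []}
    for i in range(n1):
        adj[("w", i)] = ["u"]
        adj["u"].append(("w", i))
    for j in range(m):
        adj[("x", j)] = [("w", j % n1)]
        adj[("w", j % n1)].append(("x", j))
    return adj


def total_distance(adj, v):
    """W(v): sum of BFS distances from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist.values())


def c1_root(adj):
    """C_1(u; H) computed directly from definition (2)."""
    c = {x: Fraction(1, total_distance(adj, x)) for x in adj}
    return sum(c["u"] - c[x] for x in adj)


def formula_4(m, n1):
    """Closed form (4); valid here for n1 >= 2 so that no denominator vanishes."""
    r = m % n1
    a = 3 * m * n1 - 2 * m + 2 * n1 ** 2
    b = 4 * m * n1 - 2 * m + 3 * n1 ** 2
    return (Fraction(n1 + m, n1 + 2 * m)
            - Fraction(r * n1, a - 3 * n1 + 2 * r)
            - Fraction(n1 * (n1 - r), a - n1 + 2 * r)
            - Fraction(r * (m + n1 - r), b - 4 * n1 + 2 * r)
            - Fraction((n1 - r) * (m - r), b - 2 * n1 + 2 * r))


agree = all(c1_root(build_h(m, n1)) == formula_4(m, n1)
            for n1 in range(2, 6) for m in range(1, 13))
print(agree)
```

Since the terms carrying a factor $r$ vanish when $r = 0$, the same expression also covers the case of equal numbers of children.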

As is seen from (4), if $n_1$ is fixed and $n_0$ tends to infinity (hence, so does $m$), then $C_1(u)$ approaches $\frac{n_1-1}{4n_1-2}$.
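The limit can be read off from (4) term by term. The short derivation below is a sketch that assumes $m = n_0 - 1$ and $0 \leqslant r < n_1$, so that $r$ stays bounded while $m$ grows.

```latex
% First fraction of (4):
\frac{n_1 + m}{n_1 + 2m} \longrightarrow \frac{1}{2}.
% The fractions with numerators r n_1 and n_1 (n_1 - r) have bounded
% numerators and denominators of order m, so they tend to 0.
% The last two numerators add up to r(m + n_1 - r) + (n_1 - r)(m - r) = n_1 m,
% and both denominators equal (4 n_1 - 2) m + O(1), so together they tend to
\frac{n_1 m}{(4 n_1 - 2)\, m} = \frac{n_1}{4 n_1 - 2}.
% Hence
\lim_{m \to \infty} C_1(u) = \frac{1}{2} - \frac{n_1}{4 n_1 - 2}
                           = \frac{n_1 - 1}{4 n_1 - 2}.
```

For $n_1 = 3$ and $n_1 = 4$ this gives $1/5$ and $3/14$, the constants subtracted in the two special cases treated at the end of the proof.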

Let us now subtract $\frac{n_1-1}{4n_1-2}$ from the right side of (5) and show that the difference is non-negative. After cross-multiplying and simplifying, we obtain a fraction with positive denominator (since each denominator in the right side of (5) is positive), and with numerator equal to

$$\begin{aligned} &m^2\left(10n_1^4 - 44n_1^3 + 12n_1^2r + 30n_1^2 - 8n_1r - 4n_1\right) \\ &\quad + m\left(15n_1^5 - 77n_1^4 + 38n_1^3r + 74n_1^3 - 54n_1^2r - 14n_1^2 + 8n_1r^2 + 8n_1r\right) \\ &\quad + \left(6n_1^6 - 35n_1^5 + 22n_1^4r + 45n_1^4 - 48n_1^3r - 12n_1^3 + 12n_1^2r^2 + 14n_1^2r - 4n_1r^2\right). \end{aligned} \tag{6}$$

This expression increases with  and is clearly positive when (to see it quickly just compare, in each parenthesis, every (maximal) sequence of consecutive negative terms with the (maximal) sequence of positive terms preceding it). Further, a direct calculation ensures that (6) is actually positive even when .

However, if $n_1 \in \{3,4\}$, then (6) could take on negative values for certain values of $m$. To deal with these two cases we revert to the initial equation (4).

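Assume that $n_1 = 3$. Then subtracting $\frac{1}{5}$ from both sides of (4) yields that $C_1(u) - \frac{1}{5}$ is at least

$$\frac{m+3}{2m+3} - \frac{3r}{7m+9+2r} - \frac{9-3r}{7m+15+2r} - \frac{r(m+3-r)}{10m+15+2r} - \frac{(3-r)(m-r)}{10m+21+2r} - \frac{1}{5}. \tag{7}$$

Placing (7) under one (positive) denominator, the numerator becomes

$$\begin{aligned} &1540m^4 + 2m^3\left(9075 - 1016r + 588r^2\right) + 6m^2\left(10605 - 1047r + 937r^2 + 112r^3\right) \\ &\quad + m\left(88155 - 3816r + 9828r^2 + 2408r^3 + 96r^4\right) + \left(42525 + 1350r + 6174r^2 + 2280r^3 + 184r^4\right), \end{aligned} \tag{8}$$

which is clearly positive as $r \geqslant 0$.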

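A similar calculation yields the conclusion when $n_1 = 4$. In this case, the difference of (4) and $\frac{3}{14}$ yields that $C_1(u) - \frac{3}{14}$ is at least

$$\frac{m+4}{2m+4} - \frac{2r}{5m+10+r} - \frac{8-2r}{5m+14+r} - \frac{r(m+4-r)}{14m+32+2r} - \frac{(4-r)(m-r)}{14m+40+2r} - \frac{3}{14},$$

whose numerator, when placed under a common (positive) denominator, is

$$\begin{aligned} &1855m^4 + 4m^3\left(5855 - 82r + 100r^2\right) + 2m^2\left(52090 + 206r + 1405r^2 + 80r^3\right) \\ &\quad + 4m\left(49180 + 2022r + 1793r^2 + 194r^3 + 4r^4\right) + 3\left(44800 + 4080r + 2204r^2 + 332r^3 + 13r^4\right). \end{aligned}$$

This is non-negative as $r \geqslant 0$. This concludes the proof. ∎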
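The bound obtained in this proof can be spot-checked numerically. The sketch below assumes that the lemma states $C_1(u;H(u;n_0,n_1)) \geqslant \frac{n_1-1}{4n_1-2}$ for $n_1 \geqslant 3$ (the value subtracted in the proof, equal to $1/5$ for $n_1 = 3$ and $3/14$ for $n_1 = 4$) and that the tree has the balanced depth-2 shape described in the introduction. It only tests small sizes and is no substitute for the proof.

```python
from collections import deque
from fractions import Fraction


def build_h(m, n1):
    """Root 'u', n1 neighbours, m = n0 - 1 leaves dealt round-robin."""
    adj = {"u": []}
    for i in range(n1):
        adj[("w", i)] = ["u"]
        adj["u"].append(("w", i))
    for j in range(m):
        adj[("x", j)] = [("w", j % n1)]
        adj[("w", j % n1)].append(("x", j))
    return adj


def total_distance(adj, v):
    """W(v): sum of BFS distances from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist.values())


def c1_root(adj):
    """C_1 of the root 'u', computed from definition (2)."""
    c = {x: Fraction(1, total_distance(adj, x)) for x in adj}
    return sum(c["u"] - c[x] for x in adj)


def lower_bound(n1):
    """Assumed bound of the lemma: (n1 - 1) / (4 n1 - 2)."""
    return Fraction(n1 - 1, 4 * n1 - 2)


holds = all(c1_root(build_h(m, n1)) >= lower_bound(n1)
            for n1 in range(3, 7) for m in range(1, 16))
print(holds)
```

The check uses exact rational arithmetic, so a failure would indicate a genuine counterexample to the assumed statement rather than a rounding artifact.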

It remains to demonstrate that 2 holds. To this end, we consider the tree  to be rooted at  and, for a node , we let be the subtree of  rooted at . To avoid unnecessary notation later, let us observe immediately that if then 2 holds. For otherwise, and there exists a node  at distance two from  such that . As a result, , which implies that , a contradiction.

We also note that if $T_v$ has depth at most $1$ for all children $v$ of $u$, then 2 is satisfied. So assume that there exists some child of $u$ whose subtree has depth at least $2$. Among all such children of $u$, let $z$ be such that $|V(T_z)|$ is maximum, that is,

$$|V(T_z)| = \max\{|V(T_v)| : v \text{ child of } u \text{ and } T_v \text{ has depth at least } 2\}.$$

We now give some notations, which are illustrated in Figure 4. Let be the nodes of  with depth  and set . Note that, by definition, and whenever . Let be the children of  (in ) with degree more than  and set . Let be the set of children of  with degree  and set .

Note that for any , the definition of  ensures that is a star whenever . The graph  is obtained from  as follows. (An illustration is given in Figure 5.) For convenience, we set .

1. For each , the edge  is added.

2. For each , the edge  is removed and all other edges incident to  but one are removed. Thus the vertices become leaves of , each being attached to one of the vertices .

3. If there exists a child  of  different from  with , then we select an arbitrary set of size and we set . Then for each , we replace the edge  by the edge .

4. If there is no node  as in 3, then we let be a child of  different from  such that is as large as possible, and we define  to be . (Recall that , hence such a child always exists.) Moreover, we set for convenience.

As noted earlier, if 3 applies then is a star. Moreover, if , then one can see that and hence . However, this is not a contradiction since and .

Regardless of whether 3 or 4 applies, . Actually, it is important to notice that, in , no child of  different from  has more than children itself. Even more, for any such child  we know that . This follows from our previous remark if has depth at most , and from the fact that otherwise. Also, setting , we observe that for every node

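$$\operatorname{dist}_G(p_i,x) = \begin{cases} \operatorname{dist}_G(u,x) - 2 & \text{if } x \in V(T_{p_i}), \\ \operatorname{dist}_G(u,x) + 2 & \text{if } x \in R \cup V(T_w), \\ \operatorname{dist}_G(u,x) & \text{otherwise.} \end{cases}$$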

Therefore, . Since the definition of  implies that , it follows that the size of  is at most .

Note that is a tree, which we see rooted at , and and have the same node set, which we call . In addition, and have the same bipartition . Our next task is to compare the total distance of nodes in  and in , that is, we compare  and . For readability purposes, let us set , , and let be the subtree of  rooted at . We now make a few statements about  and  for various nodes. We shall often use that

###### Lemma 5.

The following hold.

1. If , then .

2. If , then .

3. If , then .

4. If , then .

5. If , then and whenever and .

6. If , then .

7. for every node .

###### Proof.

We prove all the statements in order.

1. If , then the distance from to any node not in  is unchanged. In addition, whenever , hence the conclusion.

2. If , then for each . In addition, if , then , which yields the conclusion.

3. It suffices to observe that if , then

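$$\operatorname{dist}_{G'}(x,v) = \begin{cases} \operatorname{dist}_G(x,v) & \text{if } v \in V \setminus (S \cup Y), \\ \operatorname{dist}_G(x,v) - 2 & \text{if } v \in Y, \\ \operatorname{dist}_G(x,v) + 2 & \text{if } v \in S. \end{cases}$$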

4. First note that if , then the definition of  ensures that for each , which implies that .

Now let . Observe that if , then . In addition, if , then . Consequently,

$$W'(x) - W(x) \geqslant 2\left|S' \cup \{w\}\right| - 2\left|V \setminus (\{x,w\} \cup S')\right|,$$

which is non-negative since when , and .

5. Let and . First note that every node in is two units closer to  than to . Similarly, every node in