Signed Network Structural Analysis and Applications with a Focus on Balance Theory

Samin Aref
Abstract

This research is an effort to understand small-scale properties of networks resulting in global structure in larger scales. Networks are modelled by graphs and graph-theoretic conditions are used to determine the structural properties exhibited by the network. Our focus is on signed networks which have positive and negative signs as a property on the edges. We analyse networks from the perspective of balance theory which predicts structural balance as a global structure for signed social networks that represent groups of friends and enemies. The vertex set of balanced signed networks can be partitioned into two subsets such that each negative edge joins vertices belonging to different subsets.

The scarcity of balanced networks encouraged us to define the notion of partial balance in order to quantify the extent to which a network is balanced. We evaluate several numerical measures of partial balance and recommend using the frustration index, a measure that satisfies key axiomatic properties and allows us to analyse graphs based on their levels of partial balance.

The exact algorithms used in the literature to compute the frustration index, also called the line index of balance, are not scalable and cannot process graphs with more than a few hundred edges. We formulate computing the frustration index as a graph optimisation problem: find the minimum number of edges whose removal results in a balanced network, using binary decision variables associated with graph nodes and edges. We use our first optimisation model to analyse graphs with up to 3000 edges.

Reformulating the optimisation problem, we develop three more efficient binary linear programming models. Equipping the models with valid inequalities and prioritised branching as speed-up techniques allows us to process graphs with 15000 edges on inexpensive hardware. Besides making exact computations possible for large graphs, we show that our models outperform heuristics and approximation algorithms suggested in the literature by orders of magnitude.

We extend the concepts of balance and frustration in signed networks to applications beyond the classic friend-enemy interpretation of balance theory in social context. Using a high-performance computer, we analyse graphs with up to 100000 edges to investigate a range of applications from biology and chemistry to finance, international relations, and physics.

Doctor of Philosophy (Ph.D.), Computer Science, 2019

Dedicated to
M. (infatuation)
H. (sharp sword)
M. (moonlight)
and S. (star)

Acknowledgements

I would like to express my very great appreciation to Dr. Mark C. Wilson for supervising this research and motivating me in the past couple of years. The experience of working with him was extremely valuable and I am deeply indebted to him for sharing his knowledge and expertise. It was proven to me numerous times that having him as a supervisor has played a key role in the success of this Ph.D. project.

I also would like to thank Dr. Andrew J. Mason for co-supervising this research. I am very grateful to him not only for his valuable comments, but for sharing his mathematical modelling expertise which strengthened the contributions of this thesis.

I was privileged to have Dr. Serge Gaspers and Prof. Gregory Gutin as examiners of this thesis. I am thankful for their essential comments which helped in revising and improving the thesis.

This research would not have been completed without the tremendous support of my partner, for whom my heart is filled with gratitude. I cannot thank her enough for her selfless and pure love that has lit up my life. The challenges we faced could not possibly have been overcome without her devotion and dedication.

I am also indebted to a lifetime of love and support from my parents and my sister. Their presence has always encouraged me to accept new challenges such as a Ph.D. program in New Zealand. I am grateful for having the best father, the best mother, and the best sister I can possibly imagine.

I would like to acknowledge the University of Auckland for investing in these ideas. The support provided by the Department of Computer Science, the Centre for eResearch, and Te Pūnaha Matatini was greatly appreciated.

In the end, I would like to thank everyone who has taught me something; past teachers and professors as well as authors of the papers I have read and the reviewers who have commented on my works.


Chapter 1 Introduction

We investigate small-scale properties of networks resulting in global structure in larger scales. Networks are modelled by graphs and graph-theoretic conditions are used to determine the structural properties exhibited by the network. Our focus is on signed networks which have positive and negative signs as a property on the edges. We analyse networks from the perspective of balance theory which predicts structural balance as a global structure for signed social networks that represent groups of friends and enemies. The vertex set of balanced signed networks can be partitioned into two subsets such that each negative edge joins vertices belonging to different subsets.

The scarcity of balanced networks encouraged us to define the notion of partial balance in Chapter 2 in order to quantify the extent to which a network is balanced. We evaluate several numerical measures of partial balance using randomly generated graphs and basic axioms. The results favour the frustration index, a measure that satisfies key axiomatic properties and allows us to analyse graphs based on their levels of partial balance [12].

Two types of random graphs that we use are Erdős-Rényi graphs and Barabási-Albert graphs [22]. Erdős-Rényi graphs, denoted by $G(n,m)$, are random graphs generated by a model named after Paul Erdős and Alfréd Rényi in which, given a fixed vertex set of size $n$, all graphs with $m$ edges are equally likely to be generated. Another model for generating random graphs was introduced contemporaneously by Edgar Gilbert, in which each edge has a fixed probability $p$ of being present in a graph with $n$ nodes. Such randomly generated graphs are denoted by $G(n,p)$, but are also referred to as Erdős-Rényi graphs. Throughout this thesis, we use the term Erdős-Rényi graphs alongside the distinguishing notation, $G(n,m)$ or $G(n,p)$, to clarify the type of Erdős-Rényi graph.

Barabási-Albert graphs are another type of random graphs that are generated based on the preferential attachment process [22]. According to this random graph generation model, a graph is grown by attaching new nodes each with a certain number of edges that are preferentially attached to existing high-degree nodes. Different types of random graphs can be generated using the NetworkX package. NetworkX provides functions which take parameters such as size and order and generate graphs according to certain random graph generation models and processes such as Erdős-Rényi model or preferential attachment process.
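The random graph models described above can be sketched with NetworkX as follows. This is a minimal illustration rather than the code used in the thesis: the sizes, the probability, the number of edges per new node, and the seed are assumptions chosen for the example, while `gnm_random_graph`, `gnp_random_graph`, and `barabasi_albert_graph` are NetworkX generator functions.

```python
import networkx as nx

# Illustrative parameters (assumptions, not values prescribed by the thesis)
n, m, p = 15, 50, 0.3
edges_per_new_node = 3

# Erdős-Rényi model: uniform over all graphs with n nodes and exactly m edges
g_nm = nx.gnm_random_graph(n, m, seed=1)

# Gilbert model: each possible edge is present independently with probability p
g_np = nx.gnp_random_graph(n, p, seed=1)

# Barabási-Albert model: new nodes attach preferentially to high-degree nodes
g_ba = nx.barabasi_albert_graph(n, edges_per_new_node, seed=1)

print(g_nm.number_of_nodes(), g_nm.number_of_edges())
print(g_np.number_of_nodes(), g_np.number_of_edges())
print(g_ba.number_of_nodes(), g_ba.number_of_edges())
```

Only the first model fixes the number of edges in advance; in the second model the edge count varies from sample to sample.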

The exact algorithms used in the literature to compute the frustration index, also called the line index of balance, are not scalable and cannot process graphs with more than a few hundred edges. In Chapter 3, we formulate computing the frustration index as a graph optimisation problem: find the minimum number of edges whose removal results in a balanced network, using binary decision variables associated with graph nodes and edges. We use our first optimisation model to analyse graphs with up to 3000 edges. Such computations take a few seconds on an ordinary computer [11].

In Chapter 4, we reformulate the optimisation problem to develop three more efficient binary linear programming models. Equipping the models with valid inequalities and prioritised branching as speed-up techniques allows us to process graphs with 15000 edges. Using our more advanced models, such instances take less than a minute on inexpensive hardware. Besides making exact computations possible for large graphs, we show that our models outperform heuristics and approximation algorithms suggested in the literature by orders of magnitude [10].

In Chapter 5, we extend the concepts of balance and frustration in signed networks to applications beyond the classic friend-enemy interpretation of balance theory in a social context. Using a high-performance computer, we analyse graphs with up to 100000 edges to investigate a range of applications from biology and chemistry to finance, international relations, and physics. The longest solve time for these instances is 9.3 hours. We use the frustration index as a measure of distance to monotonicity in biological networks, a predictor of fullerene chemical stability, a measure of bi-polarisation in international relations, a measure of financial portfolio performance, and an indicator of ground-state energy in models of atomic magnets [13].

Chapters 2 to 5 of this thesis are based on the results from the following papers [12, 11, 10, 13]. Links to the publishers' verified versions of the four papers are provided in the bibliography. Each chapter is written as a self-contained paper, and readers who are interested in a specific chapter can jump directly to that chapter. Those who read this thesis as a whole may notice several preliminary definitions recurring at the beginning of each chapter. In particular, readers may notice an overlap between Chapter 3 and Chapter 4, which both concern computing the frustration index. More introductory discussions regarding computing the frustration index are provided in Chapter 3, while Chapter 4 concerns more advanced discussions about the efficiency of such computations.

Chapter 2 Measuring Partial Balance in Signed Networks

Abstract

Is the enemy of an enemy necessarily a friend? If not, to what extent does this tend to hold? Such questions were formulated in terms of signed (social) networks, and necessary and sufficient conditions for a network to be “balanced” were obtained around 1960. Since then the idea that signed networks tend over time to become more balanced has been widely used in several application areas. However, investigation of this hypothesis has been complicated by the lack of a standard measure of partial balance, since complete balance is almost never achieved in practice. We formalise the concept of a measure of partial balance, discuss various measures, compare the measures on synthetic datasets, and investigate their axiomatic properties. The synthetic data involves Erdős-Rényi and specially structured random graphs. We show that some measures behave better than others in terms of axioms and ability to differentiate between graphs. We also use well-known datasets from the sociology and biology literature, such as Read’s New Guinean tribes, gene regulatory networks related to two organisms, and a network involving senate bill co-sponsorship. Our results show that substantially different levels of partial balance are observed under cycle-based, eigenvalue-based, and frustration-based measures. We make some recommendations for measures to be used in future work.

1 Introduction to Chapter 2

Transitivity of relationships has a pivotal role in analysing social interactions. Is the enemy of an enemy a friend? What about the friend of an enemy or the enemy of a friend? Network science is a key instrument in the quantitative analysis of such questions. Researchers in the field are interested in knowing the extent of transitivity of ties and its impact on the global structure and dynamics in communities with positive and negative relationships. Whether the application involves international relationships among states, friendships and enmities between people, or ties of trust and distrust formed among shareholders, relationship to a third entity tends to be influenced by immediate ties.

There is a growing body of literature that aims to connect theories of social structure with network science tools and techniques to study local behaviours and global structures in signed graphs that come up naturally in many unrelated areas. The building block of structural balance is a work by Heider [78] that was expanded into a set of graph-theoretic concepts by Cartwright and Harary [27] to handle a social psychology problem a decade later. The relationship under study has an antonym or dual to be expressed by the opposite sign [71]. In a setting where the opposite of a negative relationship is a positive relationship, a tie to a distant neighbour can be expressed by the product of the signs along a path reaching that neighbour. Cycles containing an odd number of negative edges are considered to be unbalanced, so total balance is guaranteed only in networks containing no such cycles. This strict condition makes it quite unlikely for a signed network to be totally balanced. The literature on signed networks suggests many different formulae to measure balance. These measures are useful for detecting total balance and imbalance, but for intermediate cases their performance is not clear and has not been systematically studied.

Our contribution in Chapter 2

The main focus of this chapter is to provide insight into measuring partial balance, as much uncertainty still exists on this. The dynamics leading to specific global structures in signed networks remain speculative even after studies with fine-grained approaches. The central thesis of this chapter is that not all measures are equally useful. We provide a numerical comparison of several measures of partial balance on a variety of undirected signed networks, both randomly generated and inferred from well-known datasets. Using theoretical results for simple classes of graphs, we suggest an axiomatic framework to evaluate these measures and shed light on the methodological details involved in using such measures.

This chapter begins by laying out the theoretical dimensions of the research in Section 2 and looks at basic definitions and terminology. In Section 3 different means of checking for total balance are outlined. Section 4 discusses some approaches to measuring partial balance in Eqs. (4) to (11), categorised into three families of measures in Subsections 4.1 to 4.3 and summarised in Table 1. Numerical results on synthetic data are provided in Figures 1 and 2 in Section 5. Section 6 is concerned with analytical results on synthetic data in closed-form formulae in Table 2 and visually represented in Figures 3 and 4. Axioms and desirable properties are suggested in Section 7 to evaluate the measures systematically. Section 8 concerns recommendations for choosing a measure of balance. Numerical results on real signed networks are presented in Section 9. Finally, Section 10 summarises the chapter.

2 Problem statement and notation

Throughout this chapter, the terms signed graph and signed network will be used interchangeably to refer to a graph with positive and negative edges. We use the term cycle only as a shorthand for referring to simple cycles of the graph. While several definitions of the concept of balance have been suggested, this chapter will only use the definition for undirected signed graphs unless explicitly stated.

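We consider an undirected signed network $G = (V, E, \sigma)$ where $V$ and $E$ are the sets of vertices and edges, and $\sigma$ is the sign function $\sigma: E \rightarrow \{-1, +1\}$. The set of nodes is denoted by $V$, with $|V| = n$. The set of edges is represented by $E$ including $m^-$ negative edges and $m^+$ positive edges adding up to a total of $m = m^+ + m^-$ edges. We denote the graph density by $\rho = 2m/(n(n-1))$. The symmetric signed adjacency matrix and the unsigned adjacency matrix are denoted by $\mathbf{A}$ and $|\mathbf{A}|$ respectively. Their entries are defined in (1) and (2).

$$a_{uv} = \begin{cases} \sigma_{(u,v)} & \text{if } (u,v) \in E \\ 0 & \text{if } (u,v) \notin E \end{cases} \tag{1}$$

$$|a_{uv}| = \begin{cases} 1 & \text{if } (u,v) \in E \\ 0 & \text{if } (u,v) \notin E \end{cases} \tag{2}$$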

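The positive degree and negative degree of node $i$ are denoted by $d^+_i$ and $d^-_i$, representing the number of positive and negative edges incident on node $i$ respectively. They are calculated based on $d^+_i = (\sum_j |a_{ij}| + \sum_j a_{ij})/2$ and $d^-_i = (\sum_j |a_{ij}| - \sum_j a_{ij})/2$. The degree of node $i$ is represented by $d_i$ and equals the number of edges incident on node $i$. It is calculated based on $d_i = d^+_i + d^-_i = \sum_j |a_{ij}|$.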
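As a small worked example of these definitions, the following sketch builds the signed and unsigned adjacency matrices of a hypothetical four-node signed graph and derives the degrees from row sums. It assumes the identities $d^+_i = (\sum_j |a_{ij}| + \sum_j a_{ij})/2$ and $d^-_i = (\sum_j |a_{ij}| - \sum_j a_{ij})/2$ and the usual density $2m/(n(n-1))$; the graph itself is made up for illustration.

```python
import numpy as np

# Hypothetical signed graph on 4 nodes:
# (0,1) positive, (0,2) negative, (1,2) negative, (2,3) positive
A = np.array([
    [0, 1, -1, 0],
    [1, 0, -1, 0],
    [-1, -1, 0, 1],
    [0, 0, 1, 0],
])
abs_A = np.abs(A)  # unsigned adjacency matrix

d = abs_A.sum(axis=1)                            # degree of each node
d_pos = (abs_A.sum(axis=1) + A.sum(axis=1)) // 2  # positive degree
d_neg = (abs_A.sum(axis=1) - A.sum(axis=1)) // 2  # negative degree

n = A.shape[0]
m = abs_A.sum() // 2           # each undirected edge appears twice in the matrix
density = 2 * m / (n * (n - 1))

print(d.tolist(), d_pos.tolist(), d_neg.tolist(), m, density)
```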

A walk of length $k$ in $G$ is a sequence of nodes $v_0, v_1, \ldots, v_{k-1}, v_k$ such that for each $i = 1, 2, \ldots, k$ there is an edge from $v_{i-1}$ to $v_i$. If $v_0 = v_k$, the sequence is a closed walk of length $k$. If all the nodes in a closed walk are distinct except the endpoints, it is a cycle (simple cycle) of length $k$. The sign of a cycle is the product of the signs of its edges. A cycle is balanced if its sign is positive and is unbalanced otherwise. The total number of balanced cycles (closed walks) of length $k$ is denoted by $O_k^+$ ($Q_k^+$). Similarly, $O_k^-$ ($Q_k^-$) denotes the total number of unbalanced cycles (closed walks) of length $k$. The total number of cycles (closed walks) of length $k$ is represented by $O_k = O_k^+ + O_k^-$ ($Q_k = Q_k^+ + Q_k^-$).

We use $G_r = (V, E, \sigma_r)$ to denote a reshuffled graph in which the sign function $\sigma_r$ is a random mapping of $E$ to $\{-1, +1\}$ that preserves the number of negative edges. The reshuffling process preserves the underlying graph structure.
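The reshuffling process can be sketched as a random permutation of the existing signs over the fixed edge set, which keeps both the underlying graph and the number of negative edges unchanged. The function name `reshuffle_signs` and the dictionary representation of the sign function are assumptions made for this illustration.

```python
import random

def reshuffle_signs(signs, seed=None):
    """Return a reshuffled sign function.

    signs maps each edge (any hashable, e.g. a tuple (u, v)) to +1 or -1.
    The edge set and the number of negative edges are preserved.
    """
    rng = random.Random(seed)
    edges = list(signs)
    values = [signs[e] for e in edges]
    rng.shuffle(values)  # permute the signs over the fixed edge set
    return dict(zip(edges, values))

# Hypothetical signed graph with two negative edges
sigma = {(0, 1): 1, (0, 2): -1, (1, 2): -1, (2, 3): 1}
sigma_r = reshuffle_signs(sigma, seed=42)
print(sigma_r)
```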

3 Checking for balance

It is essential to have an algorithmic means of checking for balance. We recall several known methods here. The characterisation of bi-polarity (also called bipartitionability), that a signed graph is balanced if and only if its vertex set can be partitioned into two subsets such that each negative edge joins vertices belonging to different subsets [70], leads to an algorithm of complexity [75] similar to the usual algorithm for determining whether a graph is bipartite. An alternate algebraic criterion is that the eigenvalues of the signed and unsigned adjacency matrices are equal if and only if the signed network is balanced [2] which results in an algorithm of complexity to check for balance. For our purposes the following additional method of detecting balance is also important. We define the switching function $g(X)$ operating over a set of vertices $X \subseteq V$ as follows.

$$\sigma^{(g(X))}_{(u,v)} = \begin{cases} \sigma_{(u,v)} & \text{if } u, v \in X \text{ or } u, v \notin X \\ -\sigma_{(u,v)} & \text{if } (u \in X \text{ and } v \notin X) \text{ or } (u \notin X \text{ and } v \in X) \end{cases} \tag{3}$$

As the sign of cycles remains the same when $g(X)$ is applied, any balanced graph can switch to an all-positive signature [69]. Accordingly, a balance detection algorithm of complexity can be developed by constructing a switching rule on a spanning tree and a root vertex, as suggested in [69]. Finally, another method of checking for balance in connected signed networks makes use of the signed Laplacian matrix defined by $\mathbf{L} = \mathbf{D} - \mathbf{A}$ where $\mathbf{D}_{ii} = \sum_j |a_{ij}|$ is the diagonal matrix of degrees. The signed Laplacian matrix, $\mathbf{L}$, is positive-semidefinite, i.e. all of its eigenvalues are nonnegative [139, 141]. The smallest eigenvalue of $\mathbf{L}$ equals 0 if and only if the graph is balanced [139, Section 8A]. This leads to an balance checking algorithm.
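The partition characterisation of balance can be sketched as a breadth-first two-colouring, in the same spirit as the usual bipartiteness test: endpoints of a positive edge must receive the same side and endpoints of a negative edge must receive opposite sides. This is an illustrative sketch and not the algorithm of any of the cited references; the function name and the edge-list input format are assumptions.

```python
from collections import deque

def is_balanced(n, signed_edges):
    """Check balance of a signed graph with nodes 0..n-1.

    signed_edges is a list of (u, v, sign) with sign in {+1, -1}.
    """
    adj = [[] for _ in range(n)]
    for u, v, s in signed_edges:
        adj[u].append((v, s))
        adj[v].append((u, s))

    side = [None] * n
    for root in range(n):          # handle every connected component
        if side[root] is not None:
            continue
        side[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # positive edge: same side; negative edge: opposite side
                expected = side[u] if s > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False   # an unbalanced cycle has been closed
    return True

print(is_balanced(3, [(0, 1, 1), (1, 2, -1), (0, 2, -1)]))  # two negative edges
print(is_balanced(3, [(0, 1, 1), (1, 2, 1), (0, 2, -1)]))   # one negative edge
```

Each node and edge is visited a constant number of times, so the sketch runs in time linear in the size of the graph.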

4 Measures of partial balance

Several ways of measuring the extent to which a graph is balanced have been introduced by researchers. We discuss three families of measures here and summarise them in Table 1.

4.1 Measures based on cycles

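The simplest of such measures is the degree of balance suggested by Cartwright and Harary [27], denoted by $D(G)$, which is the fraction of balanced cycles. Here $O_k^+$ and $O_k$ count the balanced cycles and all cycles of length $k$ respectively.

$$D(G) = \frac{\sum_{k=3}^{n} O_k^+}{\sum_{k=3}^{n} O_k} \tag{4}$$

There are other cycle-based measures closely related to $D(G)$. The relative $k$-balance, denoted by $D_k(G)$ and formulated in Eq. (5), is a cycle-based measure where the sums defining the numerator and denominator of $D(G)$ are restricted to a single term of fixed index $k$ [71, 74]. The special case $k = 3$ is called the triangle index, denoted by $T(G)$.

$$D_k(G) = \frac{O_k^+}{O_k} \tag{5}$$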

Giscard et al. have recently introduced efficient algorithms for counting simple cycles [63] making it possible to use various measures related to to evaluate balance in signed networks [64].

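A generalisation is the weighted degree of balance, denoted by $C(G)$, obtained by weighting cycles based on length as in Eq. (6), in which $f(k)$ is a monotonically decreasing nonnegative function of the length of the cycle.

$$C(G) = \frac{\sum_{k=3}^{n} f(k)\, O_k^+}{\sum_{k=3}^{n} f(k)\, O_k} \tag{6}$$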

The selection of an appropriate weighting function is briefly discussed by Norman and Roberts [113], suggesting functions such as , but no objective criterion for choosing such a weighting function is known. We consider two weighting functions and for evaluating in this chapter. Given the typical distribution of cycles of different lengths, makes mostly dominated by longer cycles that are more frequent while makes mostly determined by shorter cycles.

Although fast algorithms have been developed for counting and listing cycles of undirected graphs [20, 63], the number of cycles grows exponentially with the size of a typical real-world network. To tackle the computational complexity, Terzi and Winkler [132] used the triangle index $T(G)$ in their study and made use of the equivalence between triangles and closed walks of length 3. The triangle index can be calculated efficiently by the formula in (7), where $Tr(\mathbf{A})$ denotes the trace of $\mathbf{A}$ (the trace of a matrix is the sum of its diagonal entries).

$$T(G) = \frac{O_3^+}{O_3} = \frac{Tr(\mathbf{A}^3) + Tr(|\mathbf{A}|^3)}{2\, Tr(|\mathbf{A}|^3)} \tag{7}$$
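The trace-based calculation of the triangle index can be sketched with NumPy as below. The sketch assumes the formula $T(G) = (Tr(\mathbf{A}^3) + Tr(|\mathbf{A}|^3)) / (2\,Tr(|\mathbf{A}|^3))$, which follows from $Tr(\mathbf{A}^3)$ counting balanced minus unbalanced triangles and $Tr(|\mathbf{A}|^3)$ counting all triangles, each six times. The function name is hypothetical, and returning `None` for triangle-free graphs is a choice of this sketch rather than the convention adopted in the chapter.

```python
import numpy as np

def triangle_index(A):
    """Fraction of balanced triangles, from the signed adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    abs_A = np.abs(A)
    tr_signed = np.trace(np.linalg.matrix_power(A, 3))        # 6 * (balanced - unbalanced)
    tr_unsigned = np.trace(np.linalg.matrix_power(abs_A, 3))  # 6 * (all triangles)
    if tr_unsigned == 0:
        return None  # no triangles; this sketch leaves the value undefined
    return (tr_signed + tr_unsigned) / (2 * tr_unsigned)

# Complete graph on 4 nodes with a single negative edge (0, 1):
# two of its four triangles contain that edge and are unbalanced.
K4 = np.ones((4, 4)) - np.eye(4)
K4[0, 1] = K4[1, 0] = -1
print(triangle_index(K4))
```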

The relative signed clustering coefficient is suggested as a measure of balance by Kunegis [88], taking insight from the classic clustering coefficient. After normalisation, this measure is equal to the triangle index. Having access to an easy-to-compute formula [132] for the triangle index $T(G)$ obviates the need for a clustering-based calculation which requires iterating over all triads (groups of three nodes) in the graph.

Bonacich argues that dissonance and tension are unclear in cycles of length greater than three [23], justifying the use of the triangle index to analyse structural balance. However, the neglected interactions may represent potential tension and dissonance, though not as strong as that represented by unbalanced triads. One may consider a smaller weight for longer cycles, thereby reducing their impact rather than totally disregarding them. Note that the weighted degree of balance $C(G)$ is a generalisation of both the degree of balance $D(G)$ and the relative $k$-balance $D_k(G)$.

In all the cycle-based measures, we consider a value of for the case of division by zero. This allows the measures and () to provide a value for acyclic graphs (graphs with no -cycle).

4.2 Measures related to eigenvalues

Besides checking cycles, there are computationally easier approaches to evaluating structural balance such as the walk-based approach. The walk-based measure of balance, denoted by $W(G)$, is suggested by Pelino and Maimone [116] with more weight placed on shorter closed walks than the longer ones. Let $Tr(e^{\mathbf{A}})$ and $Tr(e^{|\mathbf{A}|})$ denote the trace of the matrix exponential (a matrix function similar to the ordinary exponential function) for $\mathbf{A}$ and $|\mathbf{A}|$ respectively. In Eq. (8), closed walks are weighted by a function with a relatively fast rate of decay compared to functions suggested in [113]. The weighted ratio of balanced to total closed walks is formulated in Eq. (8), in which $Q_k^+$ and $Q_k^-$ count the balanced and unbalanced closed walks of length $k$.

$$W(G) = \frac{K(G) + 1}{2}, \qquad K(G) = \frac{\sum_k \frac{Q_k^+ - Q_k^-}{k!}}{\sum_k \frac{Q_k^+ + Q_k^-}{k!}} = \frac{Tr(e^{\mathbf{A}})}{Tr(e^{|\mathbf{A}|})} \tag{8}$$

Regarding the calculation of $Tr(e^{\mathbf{A}})$, one may use the standard fact that $\mathbf{A}$ is a symmetric matrix for undirected graphs. It follows that $Tr(e^{\mathbf{A}}) = \sum_{\lambda} e^{\lambda}$ in which $\lambda$ ranges over eigenvalues of $\mathbf{A}$. The idea of a walk-based measure was then used by Estrada and Benzi [45]. They tested their measure on five signed networks, resulting in values inclined towards imbalance which were in conflict with some previous observations [48, 88]. The walk-based measure of balance suggested in [45] has been scrutinised in subsequent studies [64, 128]. Giscard et al. discuss how using closed walks in which the edges might be repeated results in mixing the contribution of various cycle lengths and leads to values that are difficult to interpret [64]. Singh et al. criticise the walk-based measure from another perspective and explain how the inverse factorial weighting distorts the measure towards showing imbalance [128].
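The eigenvalue-based calculation can be sketched as follows. The sketch assumes the walk-based measure has the form $W(G) = (K(G) + 1)/2$ with $K(G) = Tr(e^{\mathbf{A}})/Tr(e^{|\mathbf{A}|})$, and uses the fact that, for a symmetric matrix, the trace of the matrix exponential is the sum of the exponentials of its eigenvalues. The function name is hypothetical.

```python
import numpy as np

def walk_based_balance(A):
    """Walk-based measure of balance from the signed adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    eig_signed = np.linalg.eigvalsh(A)            # eigenvalues of A (symmetric)
    eig_unsigned = np.linalg.eigvalsh(np.abs(A))  # eigenvalues of |A|
    K = np.exp(eig_signed).sum() / np.exp(eig_unsigned).sum()
    return (K + 1) / 2

positive_triangle = np.ones((3, 3)) - np.eye(3)
negative_triangle = -positive_triangle
print(walk_based_balance(positive_triangle))  # balanced graph
print(walk_based_balance(negative_triangle))  # unbalanced graph
```

For a balanced graph the signed and unsigned matrices share the same spectrum, so the ratio equals one; an unbalanced graph gives a strictly smaller value.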

The idea of another eigenvalue-based measure comes from spectral graph theory [89]. The smallest eigenvalue of the signed Laplacian matrix (defined in Section 3) provides a measure of balance for connected graphs called algebraic conflict [89]. Algebraic conflict, denoted by $\lambda(G)$, equals zero if and only if the graph is balanced. Positive-semidefiniteness of the signed Laplacian results in $\lambda(G)$ representing the amount of imbalance in a signed network. Algebraic conflict is used in [88] to compare the level of balance in online signed networks of different sizes. Moreover, Pelino and Maimone analysed signed network dynamics based on $\lambda(G)$ [116]. Bounds for $\lambda(G)$ are investigated in [80], leading to recent applicable results in [17, 18]. Belardo and Zhou prove that $\lambda(G)$ for a fixed $n$ is maximised by the complete all-negative graph of order $n$ [18]. Belardo shows that $\lambda(G)$ is bounded by $\lambda_{\max}(G) = \overline{d}_{\max} - 1$ in which $\overline{d}_{\max}$ represents the maximum average degree of endpoints over graph edges [17]. We use this upper bound to normalise algebraic conflict. Normalised algebraic conflict, denoted by $A(G)$, is expressed in Eq. (9).

$$A(G) = 1 - \frac{\lambda(G)}{\overline{d}_{\max} - 1} \tag{9}$$
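A minimal NumPy sketch of algebraic conflict and its normalisation is given below. It assumes the signed Laplacian is the diagonal degree matrix minus the signed adjacency matrix, and that the normalising upper bound has the form $\overline{d}_{\max} - 1$, where $\overline{d}_{\max}$ is the largest average degree of the two endpoints of an edge; both function names are hypothetical, and the normalisation is undefined when $\overline{d}_{\max} = 1$.

```python
import numpy as np

def algebraic_conflict(A):
    """Smallest eigenvalue of the signed Laplacian D - A."""
    A = np.asarray(A, dtype=float)
    D = np.diag(np.abs(A).sum(axis=1))
    return np.linalg.eigvalsh(D - A)[0]  # eigvalsh returns ascending eigenvalues

def normalised_algebraic_conflict(A):
    """One minus algebraic conflict divided by the assumed bound d_bar_max - 1."""
    A = np.asarray(A, dtype=float)
    abs_A = np.abs(A)
    deg = abs_A.sum(axis=1)
    u, v = np.nonzero(np.triu(abs_A))          # each edge once, with u < v
    d_bar_max = ((deg[u] + deg[v]) / 2).max()  # maximum average endpoint degree
    return 1 - algebraic_conflict(A) / (d_bar_max - 1)

all_negative_K4 = -(np.ones((4, 4)) - np.eye(4))
all_positive_K4 = -all_negative_K4
print(algebraic_conflict(all_negative_K4), normalised_algebraic_conflict(all_negative_K4))
print(algebraic_conflict(all_positive_K4), normalised_algebraic_conflict(all_positive_K4))
```

For the complete all-negative graph on four nodes the smallest eigenvalue is 2 and the assumed bound is also 2, so the normalised value is 0; the all-positive graph is balanced and gives 1.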

4.3 Measures based on frustration

A quite different measure is the frustration index [1, 72, 140] that is also referred to as the line index for balance [72]. A set of edges $E^*$ is called a minimum deletion set if deleting all edges in $E^*$ results in a balanced graph and deleting edges from no smaller set leads to a balanced graph. The frustration index, denoted by $L(G)$, equals the cardinality of a minimum deletion set as in Eq. (10).

$$L(G) = |E^*| \tag{10}$$

Each edge in a minimum deletion set $E^*$ lies on an unbalanced cycle and every unbalanced cycle of the network contains an odd number of edges in $E^*$. Iacono et al. showed that the frustration index $L(G)$ equals the minimum number of unbalanced fundamental cycles induced over all spanning trees of the graph [82]. (The fundamental cycles of a spanning tree are a minimal set of cycles formed by combining a path from the tree with a single edge from outside the tree.) The graph resulting from deleting all edges in $E^*$ is called a balanced transformation of a signed graph.

Similarly, in a setting where each vertex is given a black or white colour, positive edges whose endpoints have different colours and negative edges whose endpoints have the same colour are “frustrated”. The frustration index is therefore the smallest number of frustrated edges over all possible 2-colourings of the nodes.
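The 2-colouring definition can be turned directly into an exhaustive search, which is practical only for very small graphs because the number of colourings doubles with every node. The sketch below is an illustration of the definition and is not one of the optimisation models developed in this thesis; the function name and input format are assumptions, and at least one node is assumed.

```python
from itertools import product

def frustration_index_bruteforce(n, signed_edges):
    """Smallest number of frustrated edges over all 2-colourings.

    Nodes are 0..n-1 (n >= 1); signed_edges is a list of (u, v, sign).
    """
    best = len(signed_edges)
    # Swapping the two colours does not change frustration, so node 0 is fixed.
    for rest in product((0, 1), repeat=n - 1):
        colour = (0,) + rest
        frustrated = sum(
            1
            for u, v, s in signed_edges
            # positive edge with different colours, or negative edge with same colour
            if (s > 0) != (colour[u] == colour[v])
        )
        best = min(best, frustrated)
    return best

all_negative_triangle = [(0, 1, -1), (1, 2, -1), (0, 2, -1)]
print(frustration_index_bruteforce(3, all_negative_triangle))
```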

The frustration index $L(G)$ is hard to compute as the special case with all edges being negative is equivalent to the MAXCUT problem [60], which is known to be NP-hard. There are upper bounds for the frustration index such as $L(G) \leq m^-$, which states the obvious result of removing all negative edges.

Facchetti, Iacono, and Altafini have used computational methods related to Ising spin glass models to estimate the frustration index in relatively large online social networks [48]. Using an estimation of the frustration index obtained by a heuristic algorithm, they concluded that the online signed networks are extremely close to total balance; an observation that contradicts some other research studies like [45].

The number of frustrated edges in special Erdős-Rényi graphs, , is analysed by El Maftouhi, Manoussakis and Megalakaki [42]. It follows a binomial distribution with parameters and in which represents the sum of equal probabilities for positive and negative edges in Erdős-Rényi graph . Therefore, the expected number of frustrated edges is . They also prove that such a network is almost always not balanced when . It is straightforward to prove that frustration index is equal to the minimum number of negative edges over all switching functions [141]. Petersdorf [117] proves that the frustration index is bounded by .

Bounds for the largest number of frustrated edges for a graph with $n$ nodes and $m$ edges are provided in [4]. It follows that $L(G) \leq m/2$; an upper bound that is not necessarily tight.

Another upper bound for the frustration index is reported in [82], referred to as the worst-case upper bound on the consistency deficit. However, the frustration index values in complete graphs with all negative edges show that the upper bound is incorrect.

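In order to compare with the other indices which take values in the unit interval and give the value 1 for balanced graphs, we suggest the normalised frustration index, denoted by $F(G)$ and formulated in Eq. (11), in which $L(G)$ is the frustration index and $m$ is the number of edges.

$$F(G) = 1 - \frac{L(G)}{m/2} \tag{11}$$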

Using a different upper bound for normalising the frustration index, we discuss another frustration-based measure in Subsection 6.3 and formulate it in Eq. (12).

4.4 Other methods of evaluating balance

Balance can also be analysed by blockmodeling (a method for dividing network vertices into particular sets called blocks) based on iteratively calculating Pearson moment correlations (measures of correlation which quantify the strength and the direction of the relationship between two variables) from the columns of A [34]. Blockmodeling reveals increasingly homogeneous sets of vertices.

Doreian and Mrvar discuss this approach in partitioning signed networks [37]. Applying the method to Correlates of War data on positive and negative international relationships, they refute the hypothesis that signed networks gradually move towards balance using blockmodeling alongside some variations of and [38].

Moreover, there are probabilistic methods that compare the expected number of balanced and unbalanced triangles in the signed network and its reshuffled version [94, 138, 129, 130]. As long as these measures are used to evaluate balance, the result will not be different to what the triangle index $T(G)$ provides alongside a basic statistical testing of its value against reshuffled networks.

Some researchers suggest that studying the structural dynamics of signed networks is more important than measuring balance [26, 99]. This approach is usually associated with considering an energy function to be minimised by local graph operations decreasing the energy. However, the energy function is somehow a measure of network imbalance which requires a proper definition and investigation of axiomatic properties. Seven measures of partial balance investigated in this chapter are outlined in Table 1.

Measure: Name, Reference, and Description
$D(G)$: Degree of balance [27, 72]. A cycle-based measure representing the ratio of balanced cycles.
$C(G)$: Weighted degree of balance [113]. An extension of $D(G)$ using cycles weighted by a function of length.
$D_k(G)$: Relative $k$-balance [71, 74]. A variant of $C(G)$ placing a non-zero weight only on cycles of length $k$.
$T(G)$: Triangle index [132, 88]. A triangle-based measure representing the ratio of balanced triangles.
$W(G)$: Walk-based measure of balance [116, 45]. A simplified extension of the cycle-based measures replacing cycles by closed walks.
$A(G)$: Normalised algebraic conflict [89, 88]. A normalised measure using the least eigenvalue of the Laplacian matrix.
$F(G)$: Normalised frustration index [72, 48]. Normalised minimum number of edges whose removal results in balance.
Table 1: Measures of partial balance summarised

Outline of the rest of the chapter

We started by discussing balance in signed networks in Sections 2 and 3 and reviewed different measures in Section 4. We will provide some observations on synthetic data in Figures 1 to 4 in Sections 5 and 6 to demonstrate the values of different measures. The reader who is not particularly interested in the analysis of measures using synthetic data may directly go to Section 7 in which we introduce axioms and desirable properties for measures of partial balance. In Section 8, we provide some recommendations on choosing a measure and discuss how using unjustified measures has led to conflicting observations in the literature. The numerical results on real signed networks are presented in Section 9.

5 Numerical results on synthetic data

In this section, we start with a brief discussion on the relationship between negative edges and imbalance in networks. According to the definition of structural balance, all-positive signed graphs (merely containing positive edges) are totally balanced. Intuitively, one may expect that all-negative signed graphs are very unbalanced. Perhaps another intuition derived by assuming symmetry is that increasing the number of negative edges in a network reduces partial balance proportionally. We analyse partial balance in randomly generated graphs to evaluate these intuitions. Our motivation for analysing balance in such graphs is to gain an understanding of the behaviour of the measures and their connections with signed graph parameters like , and .

5.1 Erdős-Rényi random network with various $m^-$

We calculate measures of partial balance, denoted by $\mu(G)$, for an Erdős-Rényi random network, $G(n,m)$, with 15 nodes, 50 edges, and a varying number of negative edges. Figure 1 demonstrates the partial balance measured by different methods. For each data point, we report the average of 50 runs, each assigning negative edges at random to the fixed underlying graph. The subfigures (c) and (d) of Figure 1 show the mean along with standard deviation.
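The experimental setup described above can be sketched for a single measure as follows. The sketch fixes one random graph with 15 nodes and 50 edges, assigns a given number of negative edges at random in each of 50 runs, and averages the fraction of balanced triangles. It is an illustration under stated assumptions, not the code used to produce Figure 1: the seeds, the helper names, and the selected numbers of negative edges are made up, and only the triangle-based measure is computed.

```python
import random
import networkx as nx

def triangle_index(graph, sign):
    """Fraction of balanced triangles; sign maps frozenset({u, v}) to +1 or -1."""
    balanced = total = 0
    for u, v in graph.edges():
        for w in set(graph[u]) & set(graph[v]):  # common neighbours close a triangle
            total += 1  # every triangle is seen three times, once per edge
            product = (sign[frozenset((u, v))]
                       * sign[frozenset((u, w))]
                       * sign[frozenset((v, w))])
            if product > 0:
                balanced += 1
    if total == 0:
        raise ValueError("graph has no triangles")
    return balanced / total

def mean_triangle_index(graph, m_neg, runs=50, seed=0):
    """Average over random assignments of m_neg negative edges to a fixed graph."""
    rng = random.Random(seed)
    edges = [frozenset(e) for e in graph.edges()]
    values = []
    for _ in range(runs):
        negative = set(rng.sample(edges, m_neg))
        sign = {e: (-1 if e in negative else 1) for e in edges}
        values.append(triangle_index(graph, sign))
    return sum(values) / runs

graph = nx.gnm_random_graph(15, 50, seed=1)  # fixed underlying graph
for m_neg in (0, 10, 25, 40, 50):
    print(m_neg, round(mean_triangle_index(graph, m_neg), 3))
```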

(a) The mean values of seven measures of partial balance
(b) The mean values of relative -balance
(c) The standard deviation of and
(d) The standard deviation of and
Figure 1: Partial balance measured by different methods in Erdős-Rényi network, , with a varying number of negative edges
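The sampling procedure behind Figure 1 can be illustrated with a minimal sketch. It is not the code used to produce the figure: the function names are hypothetical, the underlying graph is drawn uniformly among graphs with the given numbers of nodes and edges, and only the fraction of balanced triangles is computed as a stand-in for the measures compared in the figure.

```python
import itertools
import random


def random_graph(n, m, rng):
    """Return m distinct undirected edges (u, v), u < v, chosen uniformly at random."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, m)


def assign_signs(edges, num_negative, rng):
    """Give exactly num_negative edges the sign -1 and all other edges +1."""
    negative = set(rng.sample(edges, num_negative))
    return {e: (-1 if e in negative else 1) for e in edges}


def triangle_index(n, signs):
    """Fraction of triangles whose product of edge signs is positive.

    Returns None when the graph has no triangle.
    """
    balanced = total = 0
    for a, b, c in itertools.combinations(range(n), 3):
        triangle = [(a, b), (a, c), (b, c)]
        if all(e in signs for e in triangle):
            total += 1
            if signs[triangle[0]] * signs[triangle[1]] * signs[triangle[2]] > 0:
                balanced += 1
    return balanced / total if total else None


def mean_triangle_index(n, m, num_negative, runs=50, seed=0):
    """Average the triangle index over random sign assignments on one fixed graph."""
    rng = random.Random(seed)
    edges = random_graph(n, m, rng)  # the underlying graph stays fixed across runs
    values = []
    for _ in range(runs):
        value = triangle_index(n, assign_signs(edges, num_negative, rng))
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    for num_negative in (0, 10, 25, 40, 50):
        print(num_negative, mean_triangle_index(15, 50, num_negative))
```

Fixing the seed keeps the underlying graph identical across the different numbers of negative edges, which mirrors the description of a fixed underlying graph with reshuffled signs.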

Measures and with are observed to tend to where , not differentiating partial balance in graphs with a non-trivial number of negative edges. Given the typical distribution of cycles of different lengths, we expect and with to be mostly determined by longer cycles that are much more frequent. For this particular graph, cycles with a length of 10 and above account for more than of the total cycles in the graph. Such long cycles tend to be balanced roughly half the time for almost all values of (for all the values within the range of in the network considered here). The perfect overlap of data points for and with in Figure 1 shows that using a linear rate of decay does not make a difference. One may think that if we use which does not mix cycles of different lengths, it may circumvent the issues. However, subfigure (b) of Figure 1 demonstrating values of for different cycle lengths shows the opposite. It shows not only does not resolve the problems of lack of sensitivity and clustering around , but it behaves unexpectedly with substantially different values based on the parity of when . weighted by and mostly determined by shorter cycles, decreases slower than and then provides values close to for . drops below for and then clusters around for . is the measure with a wide range of values symmetric to . The single most striking observation to emerge is that seems to have a completely different range of values, which we discuss further in Subsection 6.3. A steady linear decrease is observed from for .

5.2 4-regular random networks of different orders

To investigate the impacts of graph order (number of nodes) and density on balance, we computed the measures for randomly generated 4-regular graphs with 50 percent negative edges. Intuitively we expect values to have low variation and no trends for similarly structured graphs of different orders. Figure 2 demonstrates the analysis in a setting where the degree of all the nodes remains constant, but the density () is decreasing in larger graphs. For each data point the average and standard deviation of 100 runs are reported. In each run, negative weights are randomly assigned to half of the edges in a fixed underlying 4-regular graph of order .

Figure 2: Partial balance measured by different methods in 50% negative 4-regular graphs of different orders and decreasing densities

According to Figure 2, the four measures differ not only in the range of values, but also in their sensitivity to the graph order and density. First, when for larger graphs although the graphs are structurally similar, which goes against intuition. Clustered around is which features a substantial standard deviation for 4-regular random graphs. Values of are around and do not seem to change substantially when increases. provides stationary values around when increases. While and depend on the graph order and size, the relative constancy of and values suggests that the normalised measures and are largely independent of the graph size and order, as our intuition expects. We further discuss the normalisation of and in Subsection 6.3.

6 Analytical results on synthetic data

In this section, we analyse the capability of measuring partial balance in some families of specially structured graphs. Closed-form formulae for the measures in specially structured graphs are provided in Table 2. We will describe two families of complete signed graphs in 6.1 and 6.2.

6.1 Minimally unbalanced complete graphs with a single negative edge

The first family includes complete graphs with a single negative edge, denoted by . Such graphs are only one edge away from a state of total balance. It is straightforward to provide closed-form formulae for as expressed in Eqs. (21)–(27) in Appendix 11.1.

Table 2: Balance in minimally and maximally unbalanced graphs (6.1) and (6.2)
Figure 3: Partial balance measured by different methods for (6.1)

In , intuitively we expect to increase with and as . We also expect the measure to detect the imbalance in (a triangle with one negative edge). Figure 3 demonstrates the behaviour of different indices for complete graphs with one negative edge. gives unreasonably large values for . Except for , the measures are co-monotone over the given range of ; that is, excluding , we observe a consistent order among the values of the other five measures within the given range of .

6.2 Maximally unbalanced complete graphs with all-negative edges

The second family of specially structured graphs to analyse includes all-negative complete graphs denoted by . The indices are calculated in Eqs. (29)–(35) in Appendix 11.1.

Intuitively, we expect a measure of partial balance to represent the lack of balance in by providing a value close to . Figure 4 illustrates oscillating around and as increases. We explain the oscillation of in Appendix 11.1. Clearly, measures , and, provide values for that go against our intuition. Figure 4 shows that as as expected based on Table 2.

Figure 4: Partial balance measured by different methods for (6.2)

6.3 Normalisation of the measures

It is worth mentioning that measures of partial balance may lead to different maximally unbalanced complete graphs. Based on and , are maximally unbalanced graphs [18, 117] (also see Subsections 4.2 and 14.4), while it is merely one family among the maximally unbalanced graphs according to . Estrada and Benzi have found complete graphs comprised of one cycle of positive edges with the remaining pairs of nodes connected by negative edges to be a family of maximally unbalanced graphs based on [45], while this argument is not supported by any other measures. It is difficult to find the structure of maximally unbalanced graphs under the cycle-based measures and partly because the signs of cycles in a graph are not independent. This is a major obstacle in finding a suitable way to normalise cycle-based measures.

A simple comparison of (calculations provided in Eq. (34) in Appendix 11.1) and the proposed upper bound reveals substantial gaps. These gaps equal for even and for odd . This supports the previous discussions on looseness of as an upper bound for frustration index. As is maximally unbalanced under , can be used as a tight upper bound for normalising the frustration index. This allows a modified version of normalised frustration index, denoted by and defined in Eq. (12), to take the value zero for .

(12)
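The gap discussed above can be checked numerically on small graphs. The following brute-force sketch is only feasible for a handful of nodes and is unrelated to the optimisation models developed later in this work. It computes the frustration index of all-negative complete graphs and compares it with half the number of edges, which we assume here to be the upper bound referred to in the text. The function names are hypothetical.

```python
import itertools


def frustration_index(n, signs):
    """Minimum number of frustrated edges over all two-part node partitions.

    An edge is frustrated if it is positive and joins the two parts, or
    negative and lies inside one part. Brute force: 2**(n - 1) partitions.
    """
    best = len(signs)
    # Node 0 is fixed in part 0; the remaining nodes range over both parts.
    for rest in itertools.product((0, 1), repeat=n - 1):
        part = (0,) + rest
        frustrated = sum(
            1
            for (u, v), s in signs.items()
            if (s > 0) != (part[u] == part[v])
        )
        best = min(best, frustrated)
    return best


def all_negative_complete(n):
    """Signs of the complete graph on n nodes with every edge negative."""
    return {e: -1 for e in itertools.combinations(range(n), 2)}


if __name__ == "__main__":
    for n in range(3, 9):
        signs = all_negative_complete(n)
        index = frustration_index(n, signs)
        half_edges = len(signs) / 2
        print(n, index, half_edges, half_edges - index)
```

For these graphs the minimum is attained by splitting the nodes as evenly as possible, so the computed values coincide with the floor of (n-1)^2/4, and the difference from half the number of edges grows linearly with the order.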

Similarly, the upper bound, , used to normalise algebraic conflict, is not tight for many graphs. For instance, in the Erdős-Rényi graph, , studied in Section 5 with , the existence of an edge with makes , while .

The two observations mentioned above suggest that tighter upper bounds can be used for normalisation. However, the statistical analysis we use in Section 9 to evaluate balance in real networks is independent of the normalisation method, so we do not pursue this question further now.

6.4 Expected values of the cycle-based measures

Relative -balance, , is proved by El Maftouhi, Manoussakis and Megalakaki [42] to tend to for Erdős-Rényi graphs, , such that the probability of an edge being negative is equal to . Moreover, Giscard et al. discuss the probability distribution of . Their discussion is based on a model in which the sign of any edge is negative with a fixed probability [64, Section 4.2]. We use the same model to present some simple observations that appear not to have been noticed by previous authors advocating for the use of cycle-based measures. We are going to take a different approach from that of Giscard et al. and merely calculate the expected values of cycle-based measures in general, rather than the full distribution under additional assumptions. Note that for an arbitrary graph, gives the probability that a randomly chosen -cycle is balanced and is denoted by . Let be a graph and consider the sign function obtained by independently choosing each edge to be negative with probability , and positive otherwise. Then, the expected value of ,

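$$\mathbb{E}\left[\text{fraction of balanced } k\text{-cycles}\right] = \frac{1 + (1 - 2q)^{k}}{2}, \qquad (13)$$

where $q$ is the probability that an edge is negative and $k$ is the cycle length.

Proof.

Note that a cycle is balanced if and only if it has an even number of negative edges. Thus the expected value equals

$$\sum_{i \text{ even}} \binom{k}{i} q^{i} (1 - q)^{k - i}$$

(compare with [64, Eq. 4.1]). This simplifies to the stated formula (details of calculations are given in Appendix 11.1). ∎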

Note that the expected values are independent of the graph structure and obtaining them does not require making any assumptions on the signs of cycles being independent random variables. As the signs of the edges are independent random variables, the expected value of can be obtained by summing on all cases having an even number of negative signs in the -cycle.

Based on (13), when and when supporting our intuitive expectations. However, when , takes extremal values based on the parity of which is a major problem as previously observed in the subfigure (b) of Figure 1. It is clear to see that the parity of makes a substantial difference to when a considerable proportion of edges are negative.
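The parity effect can be made concrete with a short numerical check. The sketch below assumes that Eq. (13) has the closed form (1 + (1 - 2q)^k)/2, where q is the probability that an edge is negative and k is the cycle length, and compares it with the direct sum over even numbers of negative edges. The function names are illustrative.

```python
from math import comb


def expected_balanced_fraction(k, q):
    """Probability that a k-cycle has an even number of negative edges,

    when each edge is negative independently with probability q.
    """
    return sum(
        comb(k, i) * q**i * (1 - q) ** (k - i)
        for i in range(0, k + 1, 2)
    )


def closed_form(k, q):
    """Assumed closed form of Eq. (13)."""
    return (1 + (1 - 2 * q) ** k) / 2


if __name__ == "__main__":
    for q in (0.0, 0.1, 0.5, 0.9, 1.0):
        row = [round(closed_form(k, q), 3) for k in range(3, 9)]
        print(q, row)
```

For q close to 1 the printed values alternate between values near 0 (odd k) and values near 1 (even k), while for q = 0.5 every cycle length gives exactly 0.5.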

Let be a graph and consider the sign function obtained by independently choosing each edge to be negative with probability , and positive otherwise. Then

(14)
Proof.

The random variable can be written as . Taking expected value from the two sides gives as is a constant for a fixed . This completes the proof using the result from Eq. (13). ∎

Note that the exponential decay of the factor reduces the contribution for large , and small values of will dominate for many graphs. For example, if the expression for simplifies to

For many graphs encountered in practice, will initially grow with (exponentially, but at a rate less than ) and then decrease, so the tail contribution will be small. Larger values of only make this effect more pronounced. Thus we expect that will often be very close to in signed graphs with a reasonably large fraction of negative edges (we have already seen such a phenomenon in Subsection 5.1). A similar conclusion can be made for . This casts doubt on the usefulness of the measures that mix cycles of different lengths whether weighted or not.

While we have also observed many problems involving values of cycle-based measures on synthetic data in other parts of Sections 5 and 6, we will continue evaluating their axiomatic properties in Section 7 and then summarise the methodological findings in Section 8.

7 Axiomatic framework of evaluation

The results in Section 5 and Section 6 indicate that the choice of measure substantially affects the values of partial balance. Besides that, the lack of a standard measure calls for a framework of comparing different methods. Two different sets of axioms are suggested in [113], which characterise the measure inside a smaller family (up to the choice of ). Moreover, the theory of structural balance itself is axiomatised in [124]. However, to our knowledge, axioms for general measures of balance have never been developed. Here we provide the first set of axioms and desirable properties for measures of partial balance, in order to shed light on their characteristics and performance.

7.1 Axioms for measures of partial balance

We define a measure of partial balance to be a function taking each signed graph to an element of . Worthy of mention is that some of these measures were originally defined as a measure of imbalance (algebraic conflict, frustration index and the original walk-based measure) calibrated at for completely balanced structures, so that some normalisation was required, and perhaps our normalisation choices can be improved on (see Subsection 6.3). As the choice of as the upper bound for normalising the line index of balance was somewhat arbitrary, another normalised version of frustration index is defined in Eq. (15).

(15)

Before listing the axioms, we justify the need for an axiomatic evaluation of balance measures. To that end, we introduce two unsophisticated and trivial measures that come to mind for measuring balance. The fraction of positive edges, denoted by , is defined in Eq. (16) on the basis that all-positive signed graphs are balanced. Moreover, a binary measure of balance, denoted by , is defined in Eq. (17). While and appear to be irrelevant, without an axiomatic framework there is currently no formal reason not to use such measures.

(16)
(17)
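The two trivial measures can be sketched as follows. Since Eqs. (16) and (17) are not reproduced here, the code relies on their verbal descriptions only: the first measure is taken to be the share of positive edges, and the second is assumed to equal 1 for balanced graphs and 0 otherwise. Balance is tested by attempting to split the nodes into two sets with negative edges between the sets and positive edges inside them.

```python
from collections import deque


def fraction_positive(signs):
    """Share of edges with a positive sign (assumed reading of Eq. (16))."""
    return sum(1 for s in signs.values() if s > 0) / len(signs)


def is_balanced(n, signs):
    """True if the nodes can be split into two sets such that every negative
    edge joins the two sets and every positive edge stays inside one set."""
    adjacency = {v: [] for v in range(n)}
    for (u, v), s in signs.items():
        adjacency[u].append((v, s))
        adjacency[v].append((u, s))
    side = {}
    for start in range(n):
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adjacency[u]:
                expected = side[u] if s > 0 else 1 - side[u]
                if v not in side:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False  # an unbalanced cycle has been closed
    return True


def binary_balance(n, signs):
    """1 for balanced graphs and 0 otherwise (assumed reading of Eq. (17))."""
    return 1 if is_balanced(n, signs) else 0


if __name__ == "__main__":
    two_negative = {(0, 1): 1, (0, 2): -1, (1, 2): -1}  # balanced triangle
    one_negative = {(0, 1): 1, (0, 2): 1, (1, 2): -1}   # unbalanced triangle
    print(fraction_positive(two_negative), binary_balance(3, two_negative))
    print(fraction_positive(one_negative), binary_balance(3, one_negative))
```

In this small example the balanced triangle has the lower fraction of positive edges, which illustrates why such a measure needs to be tested against explicit requirements.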

We consider the following notation for referring to basic operations on signed graphs:

denotes signed graph switched by (switched graph).
denotes the disjoint union of two signed graphs and (disjoint union).
denotes with deleted (removing an edge).
denotes after removing the edges in a minimum deletion set (balanced transformation).
denotes the disjoint union of graphs and a positive 3-cycle (adding a balanced 3-cycle).
denotes the disjoint union of graphs and a negative 3-cycle (adding an unbalanced 3-cycle).
denotes an edge in a minimum deletion set.
denotes a balanced transformation of a graph with an edge added to it.

We list the following axioms:

A1

.

A2

if and only if is balanced.

A3

If , then .

A4

.

The justifications for such axioms are connected to very basic concepts in balance theory. We consider A1 essential in order to make meaningful comparisons between measures. Introducing the notion of partial balance, we argue that total balance, being the extreme case of partial balance, should be denoted by an extremal value as in A2. In A3, the argument is that the overall balance of two disjoint graphs is bounded between their individual balances. This also covers the basic requirement that the disjoint union of two copies of graph must have the same value of partial balance as . Switching nodes should not change balance [141] as in A4.
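The requirement on switching can be illustrated with a small sketch. We assume the usual definition of switching, in which the signs of all edges with exactly one endpoint in the switched node set are negated. The function names are hypothetical, and the fraction of balanced triangles is used only as an example of a quantity that switching leaves unchanged.

```python
import itertools


def switch(signs, switched_nodes):
    """Negate the sign of each edge with exactly one endpoint in switched_nodes."""
    return {
        (u, v): (-s if (u in switched_nodes) != (v in switched_nodes) else s)
        for (u, v), s in signs.items()
    }


def balanced_triangle_fraction(n, signs):
    """Fraction of triangles with a positive product of edge signs.

    Assumes the graph has at least one triangle.
    """
    products = [
        signs[(a, b)] * signs[(a, c)] * signs[(b, c)]
        for a, b, c in itertools.combinations(range(n), 3)
        if (a, b) in signs and (a, c) in signs and (b, c) in signs
    ]
    return sum(1 for p in products if p > 0) / len(products)


def fraction_positive(signs):
    """Share of edges with a positive sign."""
    return sum(1 for s in signs.values() if s > 0) / len(signs)


# Complete graph on 4 nodes with one negative edge; edges are keyed (u, v), u < v.
signs = {e: 1 for e in itertools.combinations(range(4), 2)}
signs[(0, 1)] = -1
switched = switch(signs, {0})

if __name__ == "__main__":
    print(balanced_triangle_fraction(4, signs), balanced_triangle_fraction(4, switched))
    print(fraction_positive(signs), fraction_positive(switched))
```

Switching negates an even number of edges on every cycle, so the sign of each cycle is preserved, whereas the share of positive edges changes.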

Table 3 shows how some measures fail on particular axioms. The results provide important insights into how some of the measures are not suitable for measuring partial balance. A more detailed discussion on the proof ideas and counterexamples related to Table 3 is provided in Appendix 11.2.

Table 3: Different measures satisfying or failing axioms A1–A4

7.2 Some other desirable properties

We also consider four desirable properties that formalise our expectations of a measure of partial balance. We do not consider the following as axioms in that they are based on adding or removing 3-cycles and edges which may bias the comparison in favour of cycle-based and frustration-based measures.

Positive and negative 3-cycles are very commonly used to explain the theory of structural balance, which makes B1 and B2 obvious requirements. Removing an edge that belongs to a minimum deletion set should not decrease balance, as in B3. Finally, if a balanced transformation of graph becomes unbalanced by adding an edge, the addition of such an edge to the graph should not increase balance, as in B4.

B1

If , then .

B2

If , then

B3

If , then .

B4

If and , then .

Table 4 shows how some measures fail on particular desirable properties. It is worth mentioning that the evaluation in Tables 3–4 is somewhat independent of parametrisation: for each strictly increasing function such that and , the results in Tables 3–4 hold for . Proof ideas and counterexamples related to Table 4 are provided in Appendix 11.2.

Table 4: Different measures satisfying or failing desirable properties B1–B4

Another desirable property, which we have not formulated as a formal requirement owing to its vagueness, is that the measure takes on a wide range of values. For example, and tend rapidly to as increases which makes their interpretation and possibly comparison with other measures difficult. A possible way to formalise it would be expecting to give and on each complete graph of order at least , for some assignment of signs of edges. This condition would be satisfied by and , as well as . However, and would not satisfy this condition due to the existence of balanced cycles and closed walks in complete signed graphs of general orders. Moreover, the very small standard deviation of , , and makes statistical testing against the balance of reshuffled networks complicated. The measures , , and also have shown some unexpected behaviours on various types of graphs discussed in Section 5 and Section 6.

8 Discussion on methodological findings

Taken together, the findings in Sections 5–7 give strong reason not to use cycle-based measures and , regardless of the weights. The major issues with cycle-based measures and include the very small variance in randomly generated and reshuffled graphs, lack of sensitivity, and clustering of values around 0.5 for graphs with a non-trivial number of negative edges. Recall the numerical analysis of synthetic data in Section 5, analytical results on the expected values of cycle-based measures in Subsection 6.4, and the numerical values which are difficult to interpret like the oscillation of and values of for graphs in Table 2 and Figure 4.

The relative -balance which is ultimately from the same family of measures, seems to resolve some, but not all the problems discussed above. However, it fails on several axioms and desirable properties. It is easy to compute based on closed walks of length 3 [132] and there are recent methods resolving the computational burden of computing for general [64, 63]. However, cannot be used for cyclic graphs that do not have -cycles. Besides, for networks with a large proportion of negative edges, the parity of substantially distorts the values of . Accepting all these shortcomings, one may use when cycles of a particular length have a meaningful interpretation in the context of study.

Walk-based measures like require a more systematic way of weighting to correct for the double-counting of closed walks with repeated edges. The shortcomings of involving the weighting method and contribution of non-simple cycles are also discussed in [64, 128]. Recall that in 4-regular graphs when we increase as in our discussion in Subsection 5.2. Besides, as increases as discussed in Subsection 6.2. The commonly observed clustering of values near 0.5 may also present problems. Moreover, the model behind is strange as signs of closed walks do not represent balance or imbalance. For these reasons we do not recommend for future use.

The major weakness of the normalised algebraic conflict, , seems to be its incapability of evaluating the overall balance in graphs that have more than one connected component. Note that some of the failures observed for on axioms and desirable properties stem from its dependence on the smallest eigenvalue of the signed Laplacian matrix. might be determined by a component of the graph disconnected from other components and in turn not capturing the overall balance of the graph as a whole. For analysing graphs with just one cyclic connected component, one may use while disregarding the acyclic components. However, if a graph has more than one cyclic connected component, using or is similar to disregarding all but the most balanced connected component in the graph.

The three trivial measures, namely , and , fail on various basic axioms and desirable properties in Tables 3 and 4, and also show a lack of sensitivity to the graph, making them inappropriate to be used as measures of balance.

Satisfying almost all the axioms and desirable properties, seems to measure something different from what is obtained using all cycles or all -cycles, and to be worth pursuing in future. Note that equals the minimum number of unbalanced fundamental cycles [82], suggesting a connection between the frustration and unbalanced cycles yet to be explored further. We recommend using for all graphs as long as their size allows computing (to be further discussed in subsequent chapters). The optimisation models discussed in subsequent chapters are shown to be capable of computing the frustration index in graphs with up to thousands of nodes and edges. For larger graphs, exact computation of would be time consuming and it can be approximated using a nonzero optimality gap tolerance with the same optimisation models. Alternatively, and seem to be the other options. Depending on the type of the graph, -cycles might not necessarily capture global structural properties. For instance, this would make an improper choice for some specific graphs like sparse 4-regular graphs (as in Subsection 5.2), square grids, and sparse graphs with a small number of 3-cycles. Similarly, is not suitable for graphs that have more than one connected component (including many sparse graphs).

Notes on previous work

In the literature, balance theory is widely used on directed signed graphs. It seems that this approach is questionable in two ways. First, it neglects the fact that many edges in signed digraphs are not reciprocated. Bearing that in mind, investigating balance theory in signed digraphs deals with conflict avoidance when one actor in such a relationship may not necessarily be aware of good will or ill will on the part of other actors. This would make studying balance in directed networks analogous to studying how people avoid potential conflict resulting from potentially unknown ties. Secondly, balance theory does not make use of the directionality of ties and the concepts of sending and receiving positive and negative links.

Leskovec, Huttenlocher and Kleinberg compare the reliability of predictions made by competing theories of social structure: balance theory and status theory (a theory that explicitly includes direction and gives quite different predictions) [94]. The consistency of these theories with observations is investigated through large signed directed networks such as Epinions, Slashdot, and Wikipedia. The results suggest that status theory predicts accurately most of the time while predictions made by balance theory are incorrect half of the time. This supports the inefficacy of balance theory for structural analysis of signed digraphs. For another comparison of the theories on signed networks, one may refer to a study of 8 theories to explain signed tie formation between students [138].

In a parallel line of research on network structural analysis, researchers differentiate between classical balance theory and structural balance specifically in the way that the latter is directional [23]. They consider another setting for defining balance where absence of ties implies negative relationships. This assumption makes the theory limited to complete signed digraphs. Accordingly, 64 possible structural configurations emerge for three nodes. These configurations can be reduced to 16 classes of triads, referred to as 16 MAN triad census, based on the number of Mutual, Asymmetric, and Null relationships they contain. There are only 2 out of 16 classes that are considered balanced. New definitions are suggested by researchers in order to make balance theory work in a directional context. According to Prell [119], there is a second, a third, and a fourth definition of permissible triads allowing for 3, 7, and 9 classes of all 16 MAN triads. However, there have been many instances of findings in conflict with expectations [119].
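The counts of 64 configurations and 16 classes quoted above can be reproduced by direct enumeration. The sketch below assumes that a configuration on three labelled nodes is determined by which of the six possible directed ties are present, and that two configurations belong to the same class when one is obtained from the other by relabelling the nodes. The names are illustrative.

```python
import itertools

# The six possible directed ties among three labelled nodes.
ARCS = [(u, v) for u in range(3) for v in range(3) if u != v]


def canonical(arcs):
    """Smallest relabelling of an arc set over all six node permutations."""
    return min(
        tuple(sorted((p[u], p[v]) for u, v in arcs))
        for p in itertools.permutations(range(3))
    )


def all_configurations():
    """Every subset of the six possible arcs (2**6 labelled configurations)."""
    return [
        tuple(arc for arc, keep in zip(ARCS, mask) if keep)
        for mask in itertools.product((0, 1), repeat=len(ARCS))
    ]


configurations = all_configurations()
classes = {canonical(c) for c in configurations}

if __name__ == "__main__":
    print(len(configurations), len(classes))
```

Counting only the numbers of mutual, asymmetric, and null pairs would give fewer types, so the 16 classes also distinguish the orientation of asymmetric ties.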

Apart from directionality, the interpretation of balance measures is very important. Numerous studies have compared balance measures with their extremal values and found that signed networks are far from balanced, for example [45]. However, with such a strict criterion, we must be careful not to look for properties that are almost impossible to satisfy. A much more systematic approach is to compare values of partial balance in the signed graphs in question to the corresponding values for reshuffled graphs [129, 130] as we have done in Section 9.

So far we formalised the notion of partial balance and compared various measures of balance based on their values in different graphs where the underlying structure was not important. We also evaluated the measures based on their axiomatic properties and ruled out the measures that we could not justify. In the next section, we focus on exploring real signed graphs based on the justified methods.

9 Results on real signed networks

(a) Highland tribes network (G1), a signed network of 16 tribes of the Eastern Central Highlands of New Guinea [121]
(b) Monastery interactions network (G2) of 18 New England novitiates inferred from the integration of all positive and negative relationships [123]
(c) Fraternity preferences network (G3) of 17 boys living in a pseudo-dormitory inferred from ranking data of the last week in [112]
(d) College preferences network (G4) of 17 girls at an Eastern college inferred from ranking data of house B in [92]
Figure 5: Four small signed networks visualised where dotted lines represent negative edges and solid lines represent positive edges

In this section, we analyse partial balance for a range of signed networks inferred from datasets of positive and negative interactions and preferences. Read’s dataset for New Guinean highland tribes [121] is demonstrated as a signed graph (G1) in Figure 5(a), where dotted lines represent negative edges and solid lines represent positive edges. The fourth time window of Sampson’s dataset for monastery interactions [123] (G2) is drawn in Figure 5(b). We also consider datasets of students’ choice and rejection (G3 and G4) [112, 92] as demonstrated in Figure 5(c) and Figure 5(d). The last three are converted to undirected signed graphs by considering mutually agreed relations. A further explanation on the details of inferring signed graphs from the choice and rejection data is provided in Appendix 11.3.

A larger signed network (G5) is inferred by [111] through implementing a stochastic degree sequence model on Fowler’s data on Senate bill co-sponsorship [56]. Besides the signed social network datasets, large scale biological networks can be analysed as signed graphs. There are relatively large signed biological networks analysed by [30] and [82] from a balance viewpoint under a different terminology where monotonicity is the equivalent of balance. The two gene regulatory networks we consider are related to two organisms: a eukaryote (the yeast Saccharomyces cerevisiae) and a bacterium (Escherichia coli). Graphs G6 and G7 represent the gene regulatory networks of Saccharomyces cerevisiae [29] and Escherichia coli [122] respectively. Note that the densities of these networks are much smaller than those of the other networks introduced above. In gene regulatory networks, nodes represent genes. Positive and negative edges represent activating connections and inhibiting connections respectively. Figure 6 shows the bill co-sponsorship network as well as the biological signed networks. The colours of the edges correspond to the signs on the edges (green for positive and red for negative). For more details on the biological datasets, one may refer to [82].

(a) The bill co-sponsorship network (G5) of senators [111]
(b) The gene regulatory network (G6) of Saccharomyces cerevisiae [29]
(c) The gene regulatory network (G7) of the Escherichia coli [122]
Figure 6: Three larger signed datasets illustrated as signed graphs in which red lines represent negative edges and green lines represent positive edges

As Figure 6 shows, graphs G6 and G7 have more than one connected component. Besides the giant component, there are a number of small components that we discard in order to use and . Note that this procedure does not change and as the small components are all acyclic. The values of for giant components of G6 and G7 are and respectively.

The results are shown in Table 5. Although none of the networks is completely balanced, the small values of suggest that removal of relatively few edges makes the networks completely balanced. Table 5 also provides a comparison of partial balance between different datasets of similar sizes. In this regard, it is essential to know that the choice of measure can make a substantial difference. For instance among G1–G4, under , G1 and G3 are respectively the most and the least partially balanced networks. However, if we choose as the measure, G1 and G3 would be the least and the most partially balanced networks respectively. This confirms our previous discussions on how choosing a different measure can substantially change the results and helps to clarify some of the conflicting observations in the literature [48, 88] and [45], as previously discussed in Section 8.

Graph (nodes, edges, negative edges)  Row       Density  Triangle  Norm. alg.  Norm.        Alg.      Frustration
                                                         index     conflict    frustration  conflict  index
G1 (16, 58, 29)                       observed  0.483    0.87      0.88        0.76         1.04      7
                                      mean               0.50      0.76        0.49         2.08      14.65
                                      SD                 0.06      0.02        0.05         0.20      1.38
                                      Z-score            6.04      5.13        5.54
G2 (18, 49, 12)                       observed  0.320    0.86      0.88        0.80         0.75      5
                                      mean               0.55      0.79        0.60         1.36      9.71
                                      SD                 0.09      0.03        0.05         0.18      1.17
                                      Z-score            3.34      3.37        4.03
G3 (17, 40, 17)                       observed  0.294    0.78      0.90        0.80         0.50      4
                                      mean               0.49      0.82        0.62         0.89      7.53
                                      SD                 0.11      0.06        0.06         0.30      1.24
                                      Z-score            2.64      1.32        2.85
G4 (17, 36, 16)                       observed  0.265    0.79      0.88        0.67         0.71      6
                                      mean               0.49      0.87        0.64         0.79      6.48
                                      SD                 0.14      0.03        0.06         0.17      1.08
                                      Z-score            2.16      0.50        0.45
G5 (100, 2461, 1047)                  observed  0.497    0.86      0.87        0.73         8.92      331
                                      mean               0.50      0.75        0.22         17.46     965.6
                                      SD                 0.00      0.00        0.01         0.02      9.08
                                      Z-score            118.5     387.8       69.89
G6 (690, 1080, 220)                   observed  0.005    0.54      1.00        0.92         0.02      41
                                      mean               0.58      1.00        0.77         0.02      124.3
                                      SD                 0.07      0.00        0.01         0.00      4.97
                                      Z-score                      8.61        16.75
G7 (1461, 3215, 1336)                 observed  0.003    0.50      1.00        0.77         0.06      371
                                      mean               0.50      1.00        0.59         0.06      653.4
                                      SD                 0.02      0.00        0.00         0.00      7.71
                                      Z-score                      3.11        36.64
Table 5: Partial balance computed for signed graphs (G1–G7) and reshuffled graphs (mean and SD rows refer to the reshuffled graphs)

In Table 5, the mean and standard deviation of measures for the reshuffled graphs , denoted by and , are also provided for comparison. We implement a very basic statistical analysis as in [129, 130] using and of 500 reshuffled graphs. Reshuffling the signs on the edges 500 times, we obtain two parameters of balance distribution for the fixed underlying structure. For measures of balance, Z-scores are calculated based on Eq. (18).

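$$Z = \frac{x - \mu_r}{\sigma_r}, \qquad (18)$$

where $x$ is the value of the measure on the signed graph in question, and $\mu_r$ and $\sigma_r$ are the mean and standard deviation of the same measure over the reshuffled graphs.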

The Z-score shows how far the observed balance lies from the balance distribution of the underlying structure, measured in standard deviations. Positive values of Z-score for , , and can be interpreted as existence of more partial balance than the average random level of balance.
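The reshuffling analysis can be sketched in a few lines. This is an illustration rather than the code behind Table 5: the function names are hypothetical, the measure is passed in as an arbitrary function of the edge signs, and the sample standard deviation is used, which is an assumption since the estimator is not specified in the text.

```python
import random
import statistics


def reshuffle_signs(signs, rng):
    """Keep the edges and the number of negative signs; permute signs over edges."""
    edges = list(signs)
    values = list(signs.values())
    rng.shuffle(values)
    return dict(zip(edges, values))


def z_from_summary(observed, mean, sd):
    """Z-score of an observed balance value against reshuffled mean and SD."""
    return (observed - mean) / sd


def z_score(measure, signs, runs=500, seed=0):
    """Z-score of measure(signs) against `runs` sign reshufflings.

    The measure must vary across reshufflings, otherwise the standard
    deviation is zero and the Z-score is undefined.
    """
    rng = random.Random(seed)
    samples = [measure(reshuffle_signs(signs, rng)) for _ in range(runs)]
    return z_from_summary(
        measure(signs), statistics.mean(samples), statistics.stdev(samples)
    )


if __name__ == "__main__":
    # Rounded summary values of the kind reported in a results table.
    print(z_from_summary(0.76, 0.49, 0.05))
```

Because tabulated means and standard deviations are rounded, Z-scores recomputed from them only approximate the values obtained from the unrounded samples.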

It is worth pointing out that the statistical analysis we have implemented is independent of the normalisation method used in and . The two rightmost columns of Table 5 provide and alongside their associated Z-scores.

The Z-scores show that, as measured by the frustration index and algebraic conflict, signed networks G1–G7 exhibit a level of partial (but not total) balance beyond what is expected by chance. Based on these two measures, the level of partial balance is high for graphs G1, G2, G5, G6, and G7 while the numerical results for G3 and G4 do not allow a conclusive interpretation. This indicates that most of the real signed networks investigated are relatively consistent with the theory of structural balance. However, the Z-scores obtained based on the triangle index for G6–G7 show totally different results. Note that G6 and G7 are relatively sparse graphs which have only 70 and 1052 triangles, respectively. This may explain the difference between Z-scores of and that of other measures. The numerical results using the algebraic conflict and frustration index support previous observations of real-world networks’ closeness to balance [48, 88].

10 Conclusion of the chapter

In this chapter, we started by discussing balance in signed networks in Sections 2 and 3 and introduced the notion of partial balance. We discussed different ways to measure partial balance in Section 4 and provided some observations on synthetic data in Sections 5 and 6. After gaining an understanding of the behaviour of different measures, basic axioms and desirable properties were used in Section 7 to rule out the measures that cannot be justified.

We have discussed various methodologies and how they have led to conflicting observations in the literature in Section 8. Taking axiomatic properties of the measures into account, using the common cycle-based measures denoted by and and the walk-based measure is not recommended. and may introduce some problems, but overall using them seems to be more appropriate compared to , and . The observations on synthetic data, taken together with the axiomatic properties, recommend as the best overall measure of partial balance. However, considering the difficulty of computing the exact value of for very large graphs (to be discussed in subsequent chapters), one may approximate it using a nonzero optimality gap tolerance with exact optimisation-based computational models. Alternatively, and seem to be the other options accepting their potential shortcomings.

Using three measures (the triangle index, the algebraic conflict, and the frustration index), each representing a family of measures, we compared balance in real signed graphs and analogous reshuffled graphs having the same structure in Section 9. Table 5 provides this comparison, showing that different results are obtained under different measures.

Returning to the questions posed at the beginning of this chapter, it is now possible to state that under the frustration index and algebraic conflict many signed networks exhibit a level of partial (but not total) balance beyond that expected by chance. However, the numerical results in Table 5 show that the level of balance observed using the triangle index can be totally different. One of the more significant findings to emerge from this chapter is that the measures suggested for quantifying balance may apply in different contexts and may require some justification before their values are interpreted. This chapter confirms that some measures of partial balance cannot be taken as a reliable static measure for analysing network dynamics.

One gap in this chapter is that we avoid using structural balance theory for analysing directed networks, which leaves directed signed network datasets such as Epinions, Slashdot, and Wikipedia Elections [94, 45, 64] untested by our approach. However, see our discussion in Section 8. Although the numerical part of this chapter is based on signed networks with fewer than a few thousand nodes, the analytical findings, which are not restricted to a particular size, suggest that some methods are ineffective for analysing larger networks as well.

From a practical viewpoint, international relations is a crucial application area for signed network structural analysis. With an efficient measure of partial balance in hand, international relations can be investigated by evaluating partial balance over time for networks of states (to be discussed in a later chapter).

11 Appendix

11.1 Details of calculations

In order to simplify the sum , one may add the two following equations and divide the result by 2:

(19)

In the complete graph on $n$ vertices with a single negative edge, a $k$-cycle is specified by choosing $k$ vertices in some order, then correcting for the overcounting by dividing by $2$ (the possible directions) and $k$ (the number of starting points, namely the length of the cycle). If the unique negative edge is required to belong to the cycle, by orienting this in a fixed way we need choose only $k-2$ further elements in order, and no overcounting occurs. The numbers of negative cycles and total cycles are as follows.

$$\text{negative cycles} = \sum_{k=3}^{n} \frac{(n-2)!}{(n-k)!}, \qquad \text{total cycles} = \sum_{k=3}^{n} \frac{n!}{2k\,(n-k)!} \tag{20}$$
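
The counting argument above can be checked by brute force on a small complete graph. The following Python sketch is illustrative only: it assumes vertices labelled $0, \dots, n-1$, takes the edge $\{0, 1\}$ as the unique negative edge, and compares enumerated counts with the closed forms implied by the argument, namely $n!/(2k\,(n-k)!)$ cycles of length $k$ in total and $(n-2)!/(n-k)!$ of them through the negative edge. The function names are hypothetical.

```python
from itertools import permutations
from math import factorial


def count_cycles(n, k):
    """Enumerate k-cycles of the complete graph on n vertices.

    Returns (total, negative), where a cycle is negative when it uses
    the edge {0, 1}, taken here as the unique negative edge.
    """
    negative_edge = frozenset((0, 1))
    seen = set()
    for order in permutations(range(n), k):
        # For k >= 3 a cycle is determined by its edge set, so rotations
        # and reversals of the same vertex order collapse to one entry.
        edges = frozenset(
            frozenset((order[i], order[(i + 1) % k])) for i in range(k)
        )
        seen.add(edges)
    negative = sum(1 for edges in seen if negative_edge in edges)
    return len(seen), negative


def closed_form(n, k):
    """Closed-form counts from the ordering and overcounting argument."""
    total = factorial(n) // (2 * k * factorial(n - k))
    negative = factorial(n - 2) // factorial(n - k)
    return total, negative


n = 6
enumerated = [count_cycles(n, k) for k in range(3, n + 1)]
predicted = [closed_form(n, k) for k in range(3, n + 1)]
total_cycles = sum(t for t, _ in enumerated)
negative_cycles = sum(m for _, m in enumerated)
```

For $n = 6$ the enumeration gives 197 cycles in total, 64 of which contain the negative edge, in agreement with the closed forms. The enumeration is exponential in $n$ and is meant only as a sanity check for small cases.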

Asymptotic approximations for these sums can be obtained by introducing the exponential generating function. For example, letting

we have

Similarly we obtain

Standard singularity analysis methods [53] show the denominator of the expression for to be asymptotic to while the number of negative cycles is asymptotic to . Similarly the weighted sum defining , where we choose , can be expressed using the ordinary generating function, which for the denominator turns out to be

Again, singularity analysis techniques yield an approximation . The numerator is easier, and asymptotic to . This yields the result.

The unsigned adjacency matrix of the complete graph on $n$ vertices has the form $E - I$, where $E$ is the matrix of all 1's. The latter matrix has rank 1 and nonzero eigenvalue $n$. Thus the unsigned adjacency matrix has eigenvalues $n-1$ (with multiplicity 1) and $-1$ (with multiplicity $n-1$). The matrix has a similar form and we can guess eigenvectors of the form and . Then satisfies a quadratic . Solving for and the corresponding eigenvalues, we obtain eigenvalues (with multiplicity