The HIM glocal metric and kernel for
network comparison and classification
Abstract
Due to the ever rising importance of the network paradigm across several areas of science, comparing and classifying graphs represent essential steps in the networks analysis of complex systems. Both tasks have been recently tackled via quite different strategies, even tailored adhoc for the investigated problem. Here we deal with both operations by introducing the HammingIpsenMikhailov (HIM) distance, a novel metric to quantitatively measure the difference between two graphs sharing the same vertices. The new measure combines the local Hamming distance and the global spectral IpsenMikhailov distance so to overcome the drawbacks affecting the two components separately. Building then the HIM kernel function derived from the HIM distance it is possible to move from network comparison to network classification via the Support Vector Machine (SVM) algorithm. Applications of HIM distance and HIM kernel in computational biology and social networks science demonstrate the effectiveness of the proposed functions as a general purpose solution.
The HIM glocal metric and kernel for
network comparison and classification
G. Jurman^{†}^{†}thanks: Corresponding author , R. Visintainer, M. Filosi, S. Riccadonna, C. Furlanello Fondazione Bruno Kessler, Trento, Italy CIBIO, University of Trento, Italy Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy {jurman,visintainer,filosi,furlan}@fbk.eu samantha.riccadonna@fmach.it
1 Introduction
The arising prevalence of the network paradigm [1] as the elective model for complex systems analysis in different workfields has strongly contributed in stimulating graph theoretical techniques in the recent scientific literature. Methods based on graph properties have spread through the static and dynamic analysis of different economical, chemical and biological system, computer networking, social networks and neuroscience. As a relevant example, it is worthwhile mentioning the rapid diffusion, in computational biology, of the differential network analysis [2, 3, 4, 5, 6, 7, 8, 9, 10]. In particular, two key tasks constitute the backbone of most of the aforementioned analysis techinques, namely network comparison and network classification, and they both rely on the basic idea of measuring the similarity between two graphs.
Network comparison consists in the quantification of the difference between two homogeneous objects in some network space, while the aim of network classification is to predictively discriminate graphs belonging to different classes, for instance by means of machine learning algorithms. Network comparison has its roots in the quantitative description of main properties of a graph (e.g., degree distribution), which can be encoded into a feature vector [11], thus providing a convenient representation for classification tasks (see for instance [12] for a very recent approach). As a major alternative strategy, one can adopt a direct comparison method stemming from the graph isomorphism problem, by defining a suitable similarity measure on the topology of the underlying (possibly directed and/or weighted) graphs. This line of study dates back to the 70âs with the theory of graph distances, regarding both inter and intragraphs metrics [13]. Since then, a wide range of similarity measures has been defined, based on very different graph indicators. To mention some of the most important metrics, we list the family of edit distances, evaluating the minimum cost of transformation of one graph into another by means of the usual edit operations (insertion and deletion of links), the family of common network subgraphs, looking for shared structures between the graphs and the family of spectral measures, relying on functions of the eigenvalues of one of the graph connectivity matrices. Similarly, graph classification can be tackled by a number of different techniques, for instance nearest neighbours on Euclidean distance of the features’ vectors of the graphs [14, 15, 16], or Support Vector Machine with the graph Laplacian as a regularization term [17], or via different subgraphbased lerning algorithms [18]. However, in general the most efficient techniques use a kernel machine, where the kernel itself corresponds to a scalar product (and hence a distance) in a suitable Hilbert space [19, 20, 21, 22, 23, 24, 25, 26, 27]. For more recent advances, we cite the WeisfeilerLehman graph kernel [28], and its use in neuroimaging classification for discriminating mild cognitive impairment from Alzheimer’s disease [29]. This last citation stands as an example of the increasing interest for these techniques recently appearing in neurosciences [30, 31].
In the present work we propose a novel solution to both the comparison and the classification tasks by introducing the novel HIM metric for comparing graphs (even directed and weighted) and a graph kernel induced by the HIM measure. The HIM distance is defined as the oneparameter family of product metrics linearly combining – by a nonnegative real factor – the normalized Hamming distance H [32, 33, 34, 35] and the normalized IpsenMikhailov distance IM [36]; the product metric is normalized by the factor to set its upper bound to 1. In absence of a gold standard driving the search for the optimal weight ratio, we decided for an equal contribution from the two components as the most natural choice. The Hamming distance is the simplest member of the family of edit distances, evaluating the occurrence of matching links in the compared networks: by definition, it is a local measure of dissimilarity between graphs, because it only focusses on the links as independent entities, disregarding the overall structure. On the other hand, the spectral distances are global measures, evaluating the differences between the whole network structures: however, they cannot discriminate between isospectral nonidentical graphs: for a recent spectral approach, see [37]. In the comparative review [38], the properties of the existing graph spectral distances were studied, and the IpsenMikhailov metric emerged as the more reliable and stable. The combination of the two components within a single metric allows overcoming their drawbacks and obtaining a measure which is simultaneously global and local. Moreover, the imposed normalization limits the values of the HIM distance between zero (reached only by comparing identical networks) and one (attained when comparing a clique and the empty graph), regardless of the number of vertices. Finally, the HIM distance can also be applied to multilayer networks [39, 40], since a rigorous definition of their Laplacian has just been proposed [41, 42]. By a Gaussianlike map [43], the HIM distance generates the HIM kernel. Plugging the HIM kernel [44] into a Support Vector Machine gives us a classification algorithm based on the HIM distance, to be used as is or together with other graph kernels in a MultiKernel Learning framework to increase the classification performance and to enhance the interpretability of the results [45]. Note that, although positive definiteness does not hold globally for the HIM kernel, this property can be guaranteed on the given training data, thus leading to positive definite matrices suitable for the convergence of the SVM optimizer.
To conclude with, we present some applications of the HIM distance and the HIM kernel to some real datasets belonging to different areas of science. These examples support the positive impact of the HIM suite as general analysis tool whenever it is required to extract information from the quantitative evaluation of the difference among diverse instances of a complex system.
We also provide for analysis the R [46] package nettools including functions to compute the HIM distance. The package is provided as a working beta version and it is accessible on GitHub at https://github.com/filosi/nettools.git. To reduce computing time, the software can be used on multicore workstations and on high performance computing (HPC) clusters.
2 The HIM family of distances
2.1 Notations
Let and be two simple networks on nodes, described by the corresponding adjacency matrices and , with , where for unweighted graphs and for weighted networks. Let then be the identity matrix , let be the unitary matrix with all entries equal to one and let be the null matrix with all entries equal to zero. Denote then by the empty network with nodes and no links (with adjacency matrix ) and by the clique (undirected full network) with nodes and all possible links, whose adjacency matrix is . For an undirected network, its adjacency matrix is symmetric. For a directed network , following the convention in [47], a link is represented by setting in the corresponding adjacency matrix , which thus is, in general, not symmetric.
For instance, the matrix represents the full directed network , with all possible directed links .
2.2 The Hamming distance
The Hamming distance is one of the most common dissimilarity measures in coding and string theory, recently used also for (biological) network comparison [32, 33, 35, 34]. Since the Hamming measure basically evaluates the presence/absence of matching links on the two networks being compared, it has a simple expression in terms of the neworks’ adjacency matrices. This is not the case for many other members of the edit distance family, whose computation is known to be a NPhard task. The definition of the normalized Hamming distance H is in fact the following:
(1) 
where the normalization factor bound the range of the function H in the interval . The lower bound is attained only for identical networks , while the upper bound is reached whenever the two networks are complementary . When and are unweighted networks, is just the fraction of different matching links over the total number of possible links between the two graphs.
2.3 The IpsenMikhailov distance
Originally introduced in [36] as a tool for network reconstruction from its Laplacian spectrum, the definition of the IpsenMikhailov IM metric follows the dynamical interpretation of an nodes network as an molecules system connected by identical elastic strings as in Fig. 1(ab), where the pattern of connections is defined by the adjacency matrix of the corresponding network.
(a)  (b)  (c) 
The dynamical system is described by the set of differential equations
(2) 
We recall that the Laplacian matrix of an undirected network is defined as the difference between the degree and the adjacency matrices , where is the diagonal matrix with vertex degrees as entries. is positive semidefinite and singular [48, 49, 50, 51], so its eigenvalues are . The vibrational frequencies for the network model in Eq. 2 are given by the square root of the eigenvalues of the Laplacian matrix of the network: , with . In [48], the Laplacian spectrum is called the vibrational spectrum. Estimates (actual and asymptotic) of the eigenvalues distribution are available for complex networks [52], while the relations between the spectral properties and the structure and the dynamics of a network are discussed in [53, 54, 55]. The spectral density for a graph as the sum of Lorentz distributions is defined as
where is the common width and is the normalization constant defined by the condition , and thus
The scale parameter specifies the halfwidth at halfmaximum, which is equal to half the interquartile range. An example of Lorentz distribution for two networks is shown In Fig. 5. Then the spectral distance between two graphs and on nodes with densities and can then be defined as
The highest value of is reached, for each , when evaluating the distance between and . Denoting by the unique solution of
(3) 
the normalized IpsenMikahilov distance between two undirected (possibly weighted) networks can be defined as
(4) 
so that IM is bounded between 0 and 1, with upper bound attained only for . A detailed description of the uniqueness of the solution of Eq. 3 is described in Appendix A. Isospectral networks (and thus also isomorphic networks) cannot be distinguished by this class of measures, so this is a distance between classes of isospectral graphs. Although the number of isospectral networks is negligible for large number of nodes [56], their fraction is relevant for smaller networks. The case of directed networks is discussed in a later paragraph.
2.4 The HIM distance
Consider now two copies of the space of all simple undirected networks on nodes, and endow the first copy with the Hamming metric H and the second copy with the IpsenMikhailov distance IM. Then the two obtained pairs and are metric spaces. Define now on their Cartesian product the oneparameter HIM function as the (Euclidean) product metric [57] combining H and IM, normalized by the factor , for . Via the natural correspondence of the same network in the two spaces, the HIM function becomes a distance on :
(5) 
where in what follows we will omit the subscript when it is equal to one. Obviously, and (see Fig. 1(c)); apart from values of close to the bounds where the prevalence of one of the factors becomes dominant, the qualitative impact of is minimal in practice when using as a distance. In what follows, when no a priori hypothesis supports unbalancing the metric towards one of the two components, will be assumed. However, the impact of is definitely more relevant when is used to generate a kernel function to be used for classification purposes, as we will show in a later section. The metric is bounded in the interval , with lower bound attained for every couple of identical networks, and upper bound attained only on the pair . Moreover, all distances will be nonzero for nonidentical isomorphic/isospectral graphs.
Consider now the Hamming/IpsenMikhailov (H/IM) space, where a point has coordinates , and the distance of from the origin is . If we (roughly) split the Hamming/IpsenMikhailov space into four main zones I,II,III,IV as in Fig. 2, two networks whose distances correspond to a point in zone I are quite close both in terms of matching links and of structure, while those falling in the zone III are very different with respect to both measures. Networks corresponding to a point in zone II have many common links, but their structure is rather different (for instance, they have a different number of connected components), while a point in zone IV indicates two networks with few common links, but with similar structure (e.g., isospectral nonidentical graphs). In Fig. 2 we show some examples of points in the Hamming/IpsenMikhailov space.
2.5 The directed network case
In this situation, the connectivity matrices are not symmetric, thus the Laplacian spectrum lies in . Hence, computing the IpsenMikhailov distance would require extending the Lorentzian distribution to the complex plane. A simpler solution can be obtained by transforming the directed network into an undirected (bipartite) one , as in [47]. For each node in , the graph has two nodes and (where I and O stand for In and Out respectively) and for each directed link in there is a link in . If the adjacency matrix for is , the corresponding matrix for is , with respect to the node ordering . An example of the above transformation is shown in Fig. 3.
Thus it is possible to define as after substituing the normalizing factors and with the corresponding and derived by imposing the conditions and , so that by using Eq. (5). It is immediate to compute , while can be numerically computed as for : details are given in Appendix B.
3 The HIM kernel
Following [43], a kernel can be naturally derived from a distance by means of a Gaussian (Radial Basis Function) map (see also [58]). Thus, given two graphs and on the same nodes and a positive real number , the HIM kernel can be defined as
Whenever a novel kernel is introduced, one has to check whether it is positively defined.
A function is a kernel of condionally negative type if

;

;

.
A variant of Schoenberg’s theorem [59] (proved in [60, 61]) states that
Theorem 3.1
For a function , the following are equivalent:

is of conditionally negative type;

is a positive semidefinite kernel for all .
The above theorem describes the correspondence between negativetype distances and positive definite kernel, which is also equivalent to embeddability [62]. Hence, is a positive semidefinite kernel if and only if is a symmetric function of conditionally negative type. Although the square of many distances are condionally negative type functions, cannot be proven to be of conditionally negative type (actually, it is probably not of negative type, as it is the case for many edit distances [43, 63, 64, 65, 66], the HIM kernel is not positively defined in general for all . Nevertheless, this problem can be overcomed by using Prop. 1.3.4 in [67] (see also [58, 65]):
Theorem 3.2
Suppose the data and the kernel ) are such that the matrix
is positive. Then it is possible to construct a map into a feature space such that
Conversely, for a map Î¦ into some feature space , the matrix is positive.
Note that Th. 3.2 does not even require to belong to a vector space. This theorem implies that, even though the kernel is not positive definite, it is still possible to use it in Support Vector Machines or other algorithms requiring to correspond to a dot product in some space if the kernel matrix is positive for the given training data. This condition can be obtained by choosing a suitable value of : in the experiments shown hereafter, the HIM kernel is always positively defined on the given training data, leading to positive definite matrices, and thus posing no difficulties for the SVM optimizer, as in [68].
4 Applications
4.1 A minimal example
Consider the two networks with corresponding adjacency matrices shown in Fig. 4. The Hamming distance between and is
From the spectral point of view, the corresponding Laplacian matrices and eigenvalues are
From the above spectra, we can compute the corresponding Lorentz distributions , where : their plots are shown in Fig. 5.
The resulting IpsenMikhailov distance is
so that the HIM distance results
The situation can be graphically represented as in Fig. 5: the two networks are quite different in terms of matching links, but their structures are not so diverse.
4.2 Small networks
Fixed the number of nodes , there are exactly different simple undirected unweighted networks, which can be grouped into isomorphism classes. As anticipated before, isomorphic graphs cannot be distiguished by spectral metrics, while their mutual Hamming distances are non zero, since their links are in different positions. As an example, for there are 8 networks grouped in 4 isomorphism classes, for there are 11 isomorphism classes including a total of 64 graphs and for 34 classes with 1024 networks (for , the number of classes is respectively 156 e 1044).
To give an overview of a broader situation, we compute a number of mutual distances between networks with a given number of nodes (all possible couples for and a subset of them for ) and we display the results in Fig. 6. To select a good range of variability for the networks with 15 nodes, we select the empty graph, the full graph (with 105 nodes) and 10 different graphs with edges each, for .
(a)  (b)  (c)  (d) 
As shown by the plots, all possible situations can occur, apart from points in the northwest corner of zone II which are the rarest. For instance, the point in Fig. 6(b) corresponds to 6 different pairs of networks with nodes with maximal Hamming distance and minimal spectral distance: as an example, we show one of these pairs in Fig. 7.
4.3 Comparison with Matthews Correlation Coefficient
When assessing performances in a link prediction task (for instance, in the series of DREAM challenges [69, 70, 71]), the standard strategy following the machine learning approach, is to rely on functions of the confusion matrix, i.e., the table collecting the number of correct and wrong predictions with respect to the ground truth. Classical measures of this kind are the pairs Sensitivity/Specificity and Precision/Recall, and the derived Area Under the Curve.
A reliable alternative is the Matthews Correlation Coefficient (MCC for short) [72], summarizing into a single value the confusion matrix of a binary classification task. This is a measure of common use in the machine learning community [73], recently accepted as an effective metric also for network comparison [74, 75]. Also known as the coefficient, for a contingency table MCC corresponds to the square root of the average statistic
where is the total number of observations. In the binary case of two classes positive P and negative N, for the confusion matrix , where T and F stand for true and false respectively, the Matthews Correlation Coefficient has the following shape:
MCC lives in the range , where is perfect classification, is reached in the complete misclassification case while corresponds to coin tossing classification, and it is invariant for scalar multiplication of the whole confusion matrix.
Here we want to provide a quick comparison of MCC and HIM distances in a few cases. First of all, some considerations on the extreme cases:

only for , which has .

only for ; in this case, for , and in all other cases.

only for , and thus HIM=0.

The two values or can correspond to a landscape of quite different pairs of networks, for which the HIM distance can assume very diverse values.
To investigate the last case in the above list, we randomly generated 250,000 pairs of networks of different size, and we compared the MCC with the H, IM and HIM distances: the corresponding scatterplots are shown in Fig. 8. Since MCC is a similarity measure, for a direct comparison we displayed it as the normalized dissimilarity measure .
As predictable, since the confusion matrix is unaware of the network structure but it takes into account only matching and mismatching links, the MCC is well correlated with the Hamming distance (Pearson Coefficient PC=0.92) and poorly correlated with the IpsenMikhailov distance (PC=0.01), resulting in an good global correlation with the HIM distance (PC=0.79). Nonetheless, the plots in Fig. 8 show that the relevant variability of one measure for a given value of the other one supports the claim of a strong independency between MCC and HIM. Finally, as an example giving a quantitative basis to the last claim of the above list, for all the pairs with we obtain values of HIM ranging in , with median 0.37 and mean 0.39, while when the range of the HIM values is , with mean and median equal to 0.74.
4.4 Dynamical networks
In what follows we show the evolution of the Hamming, the IpsenMikhailov and the HIM distance during the evolution of the following dynamical processes moving through consecutive steps:

Random Addition is obtained from by randomly adding a not already present link.

Random Removal is obtained from by randomly removing an existing link.

Sequential Addition is obtained from by adding a new link in the same row and in the next available column of the last added link, if possible, or in the following row, starting from the first available column. The whole process starts from the first available row with the smallest index. As an example, if , then the process evolves inserting ones in the adjacency matrix following the sequence in .

Sequential Removal : as in , but removing one link at each step.

Highest Degree Addition is obtained from by adding a previously not existing link connecting the node with the highest degree.

Highest Degree Removal is obtained from by removing an existing link connecting the node with the highest degree.
As a first example, consider the processes and with the empty graph as starting network . They both end at the Nnodes clique after steps: . The corresponding inverse processes and evolve in the opposite direction: and . In Fig. 9 we show the curves of for =H, IM and HIM in the cases 10, 25 and 100 nodes.
For the representation of the curves, we use two different spaces: the already introduced Hamming/IpsenMikhailov space, with the metric H on the axis and the metric IM on the axis, and the Fractionofnodes/HIM space, with the ratio between the number of newly added or removed links over the total number of links on the axis and the HIM distance on the axis; in this representation, the th step of a process has coordinates . In all cases, since one edge is removed or added at each step, in both spaces the evolution of the processes proceeds from left to right in both graphs, and the trend of the curves representing the distances of the same process in the Hamming/IpsenMikhailov space and in the Fractionofnodes/HIM space are similar, varying only for a scaling factor.
(a)  (b)  (c) 
For the random processes and we show the means of the distances computed on 100 runs; no standard deviation or confidence intervals are plotted, because they are negligible at the scale of the plot. For instance, in the case 25, the order of magnitude of the standard deviation for HIM at each step is , and the span of the 95% boostrap confidence intervals is in the range of . As a first observation, all curves are monotonically increasing and the bigger the graph, the larger the distances, but in the emptytoclique case, where, in the second half of the process, induces distances which are smaller than and which are smaller for larger graphs. The most interesting observation is the different shape of the curves between the emptytoclique process and the cliquetoempty: for the same Hamming distance (or fraction of links), the corresponding IpsenMikhailov (or HIM, respectively) distance is larger when the nodes are added rather than removed, because adding links quickly generates degree correlation. Furthermore, in the emptytoclique case, not much difference occurs between the random and the sequential process, while this difference is much wider (with the random one inducing larger distances) for the cliquetoempty case.
(a)  (b)  (c)  (d) 
An analogous experiment was carried out within the family of Poissonian graphs, with ErdösRényi model [76, 77] . In particular, for 10, 25 and 100, let be a sparse network , with 2, 11 and 230 edges respectively and let be a dense network , with 39, 275 and 4462 edges respectively. Consider the following four processes, of which we represent the initial steps in Fig. 10:

, for , with , for =10, 25, 100.

, for , with , for =10, 25, 100.

, for , with , for =10, 25, 100.

, for , with , for =10, 25, 100.
In this case, too, results on the random processes are averaged over 100 runs, with negligible confidence intervals. To better highlight the differences of the resulting distances in the various processes, in Fig. 10 (c) and (d) we show the ratio of some pairs of HIM distances as a function of the removed/added links. In particular, in subfigure (c), for each step , we show the quotient of HIM distances for over , in the three cases 10, 25 and 100. The three curves show that HIM distances for are larger than the HIM distances for for =25 and 100, and their difference is higher in the first steps of the process , while they tend to get closer as far as the processes evolve. In the other cases and (not shown here), the differences are smaller and they converge faster to one, but in this case the process accounts for the smaller values of HIM distances. In the plot (c) of Fig. 10, we show the curves for as a function of . All the three curves are monotonically decreasing and converging to one after the first stages of the processes, yielding that, for all values of , adding links produces higher values of HIM distance. In the case of the evolution targeting higher degree nodes first (not shown here), the trend is the same, only scaled down to smaller ratios.
(a)  (b)  (c)  (d) 
The final examples consider processes having scale free networks as starting graphs. For =10, 25 and 100, let be a scale free sparse network generated following the AlbertBarabasi model [78], with power law exponent 2.3 and with 9, 24 and 99 edges rispectively, and a dense network with the same exponent 2.3 but with 35, 300 and 4150 edges respectively. The same four processes of the previous case were tested for the initial steps:

, for , with , for =10, 25, 100.

, for , with , for =10, 25, 100.

, for , with , for =10, 25, 100.

, for , with , for =10, 25, 100.
The corresponding curves are plotted in Fig. 11 (a) and (b). We recall here that scalefree networks are not invariant for percolation, i.e., they do not remain scalefree when links are randomly removed or added. However, the evolution of the processes is not very different from the ErdösRényi case, especially for the processes removing links as shown in Fig. 11, panel (b). Some differences emerge for the processes adding links, and a few peculiarities that are also present in the Poissonian case here become more evident. In particular, for all and for both and the derivative of the curves are larger than those in panel (b), and it is not true anymore that the larger the number of nodes, the larger the distances. For instance, in the case , both the processes quickly modify the network structure, resulting in a fast increment of the IpsenMikhailov distance for , while later the curves grow at a much smaller rate. To better study this behaviour, a larger starting network was generated following the scale free model in [79], with 200 nodes and 1000 edges, power law exponent 2.001 and degree distribution as in the histogram of Fig. 11, panel (c). The following processes were started from , and they were carried on until they reach either the empty network or the clique:

, with and .

, with and .

, with and .

, with and .
The curves corresponding to HIM distances from in the four aforementioned processes are plotted in Fig. 11, panel (d), versus the percentage of progress of the process, i.e., , with and , . The HIM distance for the processes and are monotonically and similarly increasing when evolving from to , slower at the beginning and much faster in the last steps of the process. The two other processes instead show the same effect previously noted: and change rapidly in the initial 10% of the processes, yielding a fast increase in the IpsenMikhailov distance, due to the quick modification in the network structure. After this initial period, the growth of both curves proceed with a smaller derivatives until they reach their maximum at the end of the process.
(a)  (b)  (c)  (d) 
4.5 Graph families
In this section we investigate the distribution of the distances from the empty network of a set of graphs randomly extracted from five families. In particular, for each = 10, 20, 50, 100 and 1000 we extracted 1000 networks on nodes from each of the following class of graphs:

BA BarabasiAlbert model [78], with power of preferential attachment extracted from the uniform distribution between 0.1 and 10.

WS WattsStrogatz model [80], with neighborhood within which the vertices of the lattice will be connected uniformly sampled in and rewiring probability extracted from the uniform distribution between 0.1 and 0.9.

PL Scalefree random graphs from vertex fitness scores [79], with number of edges uniformly sampled between 1 and and power law exponent of the degree distribution extracted from the uniform distribution between 2.005 and 3.

KR Random regular graphs, with all possible values of node degree.
In Tab. 2 we list mean and standard deviation of for all combinations of node size and network type: note that we do not report the corresponding median, because its distance from the mean is always smaller than 0.02 nor the bootstrap confidence intervals, whose range is always smaller than 0.02 from either side of the mean. In Fig. 12 we also show the corresponding boxplots, while in Fig. 13(a) we display the scatterplot in the Hamming/IpsenMikhailov space of all the aforementioned distances. In the Hamming/IpsenMikhailov space all the BA nets are confined in the narrow rectangle , while all other classes of graphs span a much wider area. In particular, the points corresponding to distances of the PL nets occupy densely all the upper left triangle of the H/IM plane, and the same happens, with , also for the ER networks, while WS and KR points lie in the upper rectangle . Thus, different PL networks show very different stucture, while the BA nets are very homogeneous. Notably, no point occurs in the lower right corner of the H/IM space. Moreover, in average, the standard deviation decreases inversely with the network size, showing larger homogeneity in bigger networks.
BA  PL  
ER  KR  
WS  All 
Finally, to better highlight the difference among the diverse families, we randomly extracted 100 networks with 100 nodes for the four families BA, ER, WS and PL and we computed the mutual distances between all possible pairs of these 400 graphs. A few statistics of these HIM distances are reported in Tab. 1, while the planar multidimensional scaling plot [81] is displayed in Fig. 13(b). Apart from the PL networks, the three families BA, ER and WS can be mutually well separated as shown in the multidimensional plot; moreover, the graphs in the BA and in the WS families are mutually quite similar, as supported by the small interclass mean HIM distance. On the other side, the PL networks have essentially the same distance from all other groups, so they cannot be easily distiguished.
BA  ER  WS  PL  

BA  0.05 (0.06)  0.69 (0.10)  0.47 (0.13)  0.65 (0.17) 
ER  0.50 (0.13)  0.56 (0.15)  0.51 (0.14)  
WS  0.29 (0.12)  0.56 (0.18)  
PL  0.50 (0.17) 
N  10  20  50  100  1000  All  

T  
BA  0.53  0.01  0.51  0.01  0.50  0.02  0.49  0.02  0.50  0.02  0.51  0.02 
ER  0.69  0.18  0.73  0.12  0.77  0.09  0.77  0.08  0.76  0.08  0.74  0.12 
WS  0.91  0.14  0.76  0.15  0.64  0.08  0.62  0.06  0.62  0.05  0.71  0.16 
PL  0.72  0.19  0.72  0.19  0.75  0.15  0.74  0.14  0.72  0.11  0.73  0.16 
KR  0.60  0.11  0.54  0.10  0.50  0.05  0.49  0.00  0.48  0.00  0.52  0.08 
All  0.69  0.19  0.65  0.17  0.63  0.15  0.62  0.14  0.62  0.13  0.64  0.16 
(a)  (b) 
4.6 The D. melanogaster development dataset
In [82], the authors used the Keller algorithm to infer the gene regulatory networks of Drosophila melanogaster from a time series of gene expression data measured during its full life cycle, originally published in [83]. They followed the dynamics of 588 development genes along 66 time points spanning through four different stages (Embryonic – time points 130, Larval – t.p. 3140, Pupal – t.p. 4158, Adult – t.p. 5966), constructing a time series of inferred networks , publicly available at http://cogitob.ml.cmu.edu/keller/downloads.html. Hereafter we evaluate the structural differences between and the initial network , as measured by the HIM distance: the resulting plot is displayed in Fig. 14. The largest variations, both between consecutive terms and with respect to the initial network , occur in the embrional stage (E): in particular, the HIM distance grows until time points 23, then the following networks start getting closer again to , showing that the interactions of the selected 588 genes in the adult stage are more similar to the corresponding net of interaction in the embrional stage, rather than in the other two stages. Moreover, while Hamming distance ranges between and , the IpsenMikhailov distance has as its maximum, indicating an higher variability of the networks in terms of structure rather than matching links. Finally, using a Support Vector Machine with HIM kernel built in the kernlab package in R, a 5fold Cross Validation with and reached accuracy 0.97 in discriminating Embryonic and Adult networks from Larval and Pupal, while, in the same setup, we reach perfect separation between Embryonic and Adult stages for all values of larger than 1000.
(a)  (b)  (c) 
4.7 The HCC dataset
Publicly available at the Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo, at the Accession Number GSE6857, the HepatoCellular Carcinoma (HCC) dataset [84, 85] collects 482 tissue samples from 241 patients affected by HCC, a wellstudied pathology [86, 87] where the impact of microRNA (miRNA) is notably relevant [88, 89]. For each patient, a sample from cancerous hepatic tissue and a sample from surrounding noncancerous hepatic tissue are available, hybridized on the Ohio State University CCC MicroRNA Microarray Version 2.0 platform collecting the signals of 11,520 probes of 250 nonredundant human and 200 mouse miRNA. After a preprocessing phase including imputation of missing values [90] and discarding probes corresponding to nonhuman (mouse and controls) miRNA, we consider the dataset HCC of 240+240 paired samples described by 210 human miRNA, with the cohort consisting of 210 male and 30 female patients. We thus parted the whole dataset HCC into four subsets combining the sex and disease status phenotypes, collecting respectively the cancer tissue for the male patients (MT), the cancer tissue for the female patients (FT) and the corresponding two datasets including the non cancer tissues (MnT, FnT). Then we first generated the four coexpression networks on the 210 miRNA as vertices, inferred via absolute Pearson’s correlation and corresponding to the combinations of the two binary phenotypes, and we computed all mutual HIM distances. In particular, to show the possible effects due to the different sample size, we computed 30 instances of the MT and MnT networks, inferred using only 30 matching samples and then averaging all the mutual HIM distances. One instance of MT and MnT is displayed as an hairball in FruchtermanReingold layout [91] together with the nets FT and FnT. The corresponding twodimensional scaling plot [81] in the right panel of the Figure 15. The four networks are widely separated, with orthogonal separations for the two phenotypes, but the values of the HIM distances between the network support the known different development of HCC in male and female: for instance, the FT network is closer to the MnT net (HIM=0.08), rather than to the MT and FnT (HIM=0.13 and 0.16, respectively). Note that the largest distance (HIM=0.23) is detected between the two nontumoral networks MnT, FnT.
MT a  MnT g  
FT a  FnT g 
4.8 The Gulf Dataset
Part of the Kansas Event Data System, available at http://vlado.fmf.unilj.si/pub/networks/data/KEDS/, the Gulf Dataset collects, on a monthly bases, political events between pairs of countries focusing on the Gulf region and the Arabian peninsula for the period 15 April 1979 to 31 March 1999, for a total of 240 months. Political events belong to 66 classes (including for instance ”pessimist comment”, ”meet”, ”formal protest”, ”military engagement”, etc.) and involve 202 countries. This dataset formally translates into a time series of 240 unweighted and undirected graphs with 202 nodes, for which we computed all the mutual HIM distances. These distances are then used to project the 240 networks on a plane through a multidimensional scaling [81]: the resulting plot is displayed in Fig. 16. The months corresponding to the First Gulf War months (July 1990  April 1991) are close together and confined in the lower left corner of the plane, showing both a mutual high degree of homogeneity and, at the same time, a relevant difference to the graphs of all other months. This shows that, at the onset of the conflict, the diplomatic relations worldwide changed consistently and their structure remained very similar throughout the whole event. Note that the blue point (closer to the warlike period) corresponds to February 1998, the time of Iraq disarmament crisis: Iraqi President Saddam Hussein negotiates a deal with U.N. Secretary General Kofi Annan, allowing weapons inspectors to return to Baghdad, preventing military action by the United States and Britain.
4.9 The International Trade Network data
As an application of the HIM distance on directed and weighted networks, we show four examples based on the International Trade Network (ITN) data, version 4.1, by Gledisch [94] available at http://privatewww.essex.ac.uk/â¼ksg/exptradegdp.html, collecting estimates of trade flows between independent states (19482000) and GDP per capita of independent states (19502000). As noted by [95], due to differences in reporting procedures between countries, incongruences occur between exports from to and imports from to : to avoid this issues, in our analysis we only use the figures reported as export in the dataset.
In what follows, we extract four sets of countries, and we study the evolution of their trade subnetworks during the aforementioned period. In each example, chosen the set of countries , we construct, for every year, the weighted directed network having as nodes. A link between country and country represents the export from to , and its weight corresponds to the volume of the export flow. Then we compute all mutual HIM distances among these networks, first rescaling link weights in the unit interval. Finally, using these HIM distances we construct a planar classical Multidimensional Scaling plot, transforming the networks in a set of points such that the distances between the points are approximately equal to the mutual HIM dissimilarities, using the methods in [96, 97, 98, 81] as implemented in R. The aim here is to connect the structural changes in yearly trade networks with time periods and events having a role in explaing such changes. Note that in [95], the authors show that bilateral trade fulfills fluctuationresponse theorem [99], stating that the average relative change in import (export) between two countries is a sum of relative changes in their GDPs. This result yields that directed connections, i.e., bilateral trade volumes, are only characterized by the product of the trading countriesâ GDPs.
As a first example we present the BRICS countries case. Introduced in 2001, the acronym BRICS collects the five nations Brazil, Russia, India, China and South Africa (Fig. 17(a)) which, although developing or newly industrialized countries, are distinguished by their large and fastgrowing economies and by their significant influence on regional and global affairs. To such aim, in Fig. 17 we show the bidimensional scaling of their trade networks for the years 1950–2000, with the HIM matrix as the distance constraint. As shown by the plot, three groups of years can be clearly divided, thus yielding that the corresponding networks are similar within each group, but diverse across different groups: the early years recovering after WWII (until about 1963), the seventies and eighties, where the economies of the involved countries started to develop, and the nineties, where their growth begun to accelerate.
BRA RUS IND CHN SAF  
(a)  (b) 
A very similar situation occurs in the regional trade network among the South American countries (Fig. 18), where the global behaviour is essentially controlled by the two local giants Brazil and Argentina, and for which the larger differences between the nets can be appreciated between the economic growth of the 90s and the suffering economies in the late 70s / early 80s due to the struggling political situations.

Not much different is the case of the larger trade subnetworks of the top 20 world economies ranked by Gross Domestic Product 2012 (PPP) (Top20 for short) as listed by the World Bank http://data.worldbank.org and shown in Fig. 19, with the notable difference that the networks for the 60s are more homogeneous to those of the 70s and 80s, supporting a faster recovery of these economies after WWII than the BRICS or t