A Quotient Space Formulation for Statistical Analysis of Graphical Data
Abstract
Complex analyses involving multiple, dependent random quantities often lead to graphical models – a set of nodes denoting variables of interest, and corresponding edges denoting statistical interactions between nodes. To develop statistical analyses for graphical data, one needs mathematical representations and metrics for matching and comparing graphs, and other geometrical tools, such as geodesics, means, and covariances, on representation spaces of graphs. This paper utilizes a quotient structure to develop efficient algorithms for computing these quantities, leading to useful statistical tools, including principal component analysis, linear dimension reduction, and analytical statistical modeling. The efficacy of this framework is demonstrated using datasets taken from several problem areas, including alphabets, video summaries, social networks, and biochemical structures.
I Introduction
Due to rapid advances in sensing and measurement technology, data are increasingly complex and structured, creating a growing need for new approaches and problem formulations. One common approach to understanding complex, high-dimensional datasets is to represent them as graphs. Typically, one identifies a number of variables of interest in the data, designates them as nodes, and represents their interactions as edges in a graph. Such a graph captures variability and interactions associated with a large number of variables, and is amenable to higher-order statistical analysis. Examples of graphical representations can be found in many areas, including video data analysis [de2014pattern], social networks [ugander2011anatomy], gene expression networks [westenberg2008interactive], brain connectivity data [dai2016testing], geographical data [mackaness1993use], financial stocks [yang2003market], communication networks [hakimi1965optimum], epidemiology [shirley2005impacts], and so on. Fig. 1 shows some examples: letters with straight edges, molecules with atoms as nodes and valence as edges, videos represented as pattern-theoretic graphs [de2014pattern] with objects or actions as nodes and their relationships as edges, and brain connectivity networks from Human Connectome Project (HCP) [van2013wu] data.
The use of graph representations is of great interest in machine learning, including deep learning. In principle, geometric deep learning is concerned with learning on the manifold of data elements (nodes), which, in turn, can naturally be represented by a graph. Additionally, some papers consider entire graphs as the entities of interest. For instance, graph2vec [narayanan2017graph2vec] and UgraphEmbed [baiunsupervised] consider the problem of assigning a fixed vector space representation to entire graphs. Other papers seek vector representations for nodes such that distances in the vector space reflect the neighborhood structure of the graph. Examples include modeling random walks through the nodes with skip-gram language models [perozzi2014deepwalk, grover2016node2vec], methods that preserve first- and second-order proximity information [wang2016structural, tang2015line], and methods that consider larger neighborhood structures as captured by node coarsening [chen2018harp]. Another set of papers performs graph convolutions, i.e., convolutions over the nodes that respect the manifold defined by the given graph. Examples include graph signal processing [ortega2018graph, defferrard2016convolutional] and works that direct processing via the local graph structure [kipf2016semi, hamilton2017inductive]. Like the embedding problem above, this line of work considers a single graph and operations on its nodes.
We focus on problems where one has several graphs, each representing a snapshot or an observation of a system at some instant, and one is interested in capturing, modeling and analyzing statistical variability across these graphs. For instance, consider using graphical structures to represent the functional connectivity of parts of a human brain during performance of a certain task, as measured by fMRI signals. Given several such graphs, one for each human subject under each task and performance, one has a large amount of graph data to analyze and model. Similarly, one may have graphical representations of different social or economic networks, each representing a different community. The general goals of statistical analysis are to: (1) derive common characteristics across observed graphs, (2) distinguish graph populations using statistical testing, and (3) model variability in graph data using analytical generative models. A distinctive aspect of this work is that we can create a generative model over entire graphs, and thus generate synthetic graphs that capture the dominant variability of the domain. An interesting short-term use of this generative capability is to augment training data for other graph-based deep learning approaches. In the long term, our hope is that the theory in this paper will enable the study of deep learning on non-Euclidean manifolds of graphs, where the data elements are themselves graphs.
The structured nature of graphs makes them difficult to analyze using classical statistical tools. A graph is a non-Euclidean data object that consists of a set of variables in the form of nodes and their interactions in the form of edges. There are two sources of variability in graphs: (i) different numbers and values (attributes) of nodes, and (ii) different connectivity patterns of the nodes (in the form of edges). One is interested in incorporating both factors when comparing and analyzing graphs. Learning the structures underlying observed graphs can help us understand deeper relationships between the variables of interest. Therefore, one is interested in mathematical representations that enable quantitative statistical analysis of graph data in terms of both edge and node attributes. To quantify differences across graphs, one requires metrics that can incorporate an arbitrary combination of differences in these properties. However, a major issue in analyzing graph data is that nodes across graphs often come without matchings or correspondences. The problem of establishing correspondences between nodes across graphs is called registration (or graph matching), and it represents one of the biggest challenges in the statistical analysis of graphs.
In the literature, there are two main types of graph matching: exact matching and inexact matching [neuhaus2007bridging]. Exact matching seeks a bijective map such that the nodes and edges of two graphs are in one-to-one correspondence. If two graphs can be matched exactly, the mapping is called an isomorphism. A related topic is subgraph isomorphism [carletti2017challenging], where one graph is matched to a substructure of another graph. In contrast, inexact matching seeks an optimal registration between graphs that may be dissimilar. Inexact matching is more common in practice because of the complexities associated with real data. Since matching two sets of nodes is essentially a combinatorial problem, finding a global optimum for inexact graph matching is NP-hard [zhou2012factorized]. Therefore, most graph matching algorithms seek approximate solutions based on different relaxations of the original problem [conte2004thirty]. As described later, the variability in matching nodes across graphs is represented by the action of a permutation group: permuting the ordering of nodes in one graph changes its registration with the ordered set of nodes of another graph. The approximate solutions correspond to enlarging the permutation group to some larger set where solutions are more readily available. One idea is to replace permutations by orthogonal matrices and then use spectral (eigendecomposition-based) approaches to find the optimal orthogonal matrix, see [umeyama1988eigendecomposition, caelli2004eigenspace]. Another direction is to replace permutations by doubly stochastic matrices and find the solution in that larger space. In all these cases, the final solution is eventually projected back onto the discrete set of permutation matrices, see e.g. [gold1996graduated, lyzinski2016graph].
Besides these approximations, there are other algorithms for approximate graph matching [riesen2009approximate, almohamad1993linear, krcmar1994application]. There are also tree-search-based methods to calculate the graph edit distance (GED) [neuhaus2007bridging], a problem that incorporates graph matching. The general idea here is to solve the problem iteratively: given the current estimate of the cost (based on predefined costs of node/edge insertion, deletion and substitution), determine the next operation based on a heuristic estimate of the future cost, see [hart1968formal].
In this paper, we present a metric-based approach for comparing, summarizing, and analyzing graphs. The basic idea is to represent graphs as matrices and to formulate the registration problem as that of permuting entries in those matrices. Mathematically speaking, we represent the registration variability using the action of the permutation group on the set of matrices representing all graphs. To remove this nuisance group, we form a quotient space and inherit a metric on the quotient space from the original set of matrices. We use a standard Euclidean metric, with appropriate invariance properties, because it allows for efficient registration of nodes across graphs. The quotient space metric is then used to define and compute statistical summaries such as clusters, sample means, covariances, and principal components. Principal component analysis can be used to perform dimension reduction and to impose compact statistical models on observed graphs. These models can play important roles in hypothesis testing and other statistical inferences involving graph data.
The novel contributions of this paper are as follows:

It adapts a quotient space metric structure on the set of graph representations, originally introduced in [jain2009structure], and extends it to include both node and edge attributes. It uses this metric structure to quantify graph differences and to compute optimal deformations (geodesics) between graphs. Using this metric structure, it establishes a framework for computing sample statistics such as mean and covariance for graph data.

A key idea here is that it does not assume the graphs to be isomorphic. That is, one allows nodes to remain unmatched across the graphs. Past metric-based approaches often insist on every node being matched to a proper node during graph comparisons.

It defines the notion of principal component analysis of graphs and uses it to develop low-dimensional representations of observed graphs.

It develops a simple Gaussian-type model for capturing variability in observed graphs and uses it to generate random samples from such graphical models. This sampling, in turn, can be used for Bayesian inference involving graphical data, although that direction is not pursued here.
The rest of this paper is organized as follows. Section II describes the chosen mathematical representation of graphs using symmetric matrices. Section III studies the graph matching problem using an action of the permutation group. Section IV extends this framework to include both node and edge attributes. Section V presents techniques for statistical analysis of graph data. Section VI shows a number of experiments illustrating this framework. The paper ends with a short discussion and some conclusions in Section VII.
II Graph Representation and Metric Structure
In this section, we present a framework for representing graphs that was first developed in [jain2009structure, jain2012learning]. We apply and extend this framework as described below.
II-A Adjacency Matrix Representation
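We start by providing a mathematical representation for analyzing weighted graphs. A weighted graph $G$ is an ordered pair $(V, \omega)$, where $V$ is a set of nodes and $\omega$ is a weighting function $\omega: V \times V \to \mathcal{E}$. Here, $\mathcal{E}$ is a Riemannian manifold on which we can define distances, averages, and covariances. That is, $\omega(v_i, v_j) \in \mathcal{E}$ characterizes the edge between $v_i, v_j \in V$, where elements of the set $V \times V$ are the edges of $G$. A binary graph is the special case of a weighted graph where the edge weights are either zero or one. Assuming that the number of nodes, denoted by $|V|$, is $n$, $G$ can be represented by its adjacency matrix $A \in \mathcal{E}^{n \times n}$, where the element $a_{ij} = \omega(v_i, v_j)$. For an undirected graph $G$, we have $\omega(v_i, v_j) = \omega(v_j, v_i)$, and therefore $A$ is a symmetric matrix. (In this paper, we only focus on undirected graphs, although the framework is extendable to directed graphs.) The set of all such matrices is given by $\mathcal{A} = \{A \in \mathcal{E}^{n \times n} \mid a_{ij} = a_{ji}\}$. Let $d_e$ denote the Riemannian distance on $\mathcal{E}$. We use it to impose a metric on the representation space $\mathcal{A}$. That is, for any two $A_1, A_2 \in \mathcal{A}$, with corresponding entries $a^1_{ij}$ and $a^2_{ij}$, respectively, the metric $d_a(A_1, A_2) = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} d_e(a^1_{ij}, a^2_{ij})^2}$ quantifies the difference between the graphs they represent. Under the chosen metric, the geodesic, or shortest path, between two points in $\mathcal{A}$ can be written as a set of geodesics in $\mathcal{E}$ between the corresponding components. That is, for any $A_1, A_2 \in \mathcal{A}$, the geodesic $\alpha(t)$, $t \in [0, 1]$, consists of components $\alpha_{ij}(t)$, each a geodesic path in $\mathcal{E}$ between $a^1_{ij}$ and $a^2_{ij}$. In case $\mathcal{E} = \mathbb{R}$, $\mathcal{A}$ is a vector space, equivalent to a Euclidean space of dimension $n(n+1)/2$, and the geodesic between two points in $\mathcal{A}$ is a straight line. That is, for any $A_1, A_2 \in \mathcal{A}$, the geodesic path is given by $\alpha(t) = (1 - t) A_1 + t A_2$, $t \in [0, 1]$.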
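As a small illustration for $\mathcal{E} = \mathbb{R}$, the sketch below (our own, in plain Python with matrices stored as nested lists; the function names are hypothetical) computes $d_a$, which then reduces to the Frobenius distance, and evaluates points on the straight-line geodesic $(1 - t) A_1 + t A_2$.

```python
import math


def frobenius_distance(A1, A2):
    """d_a(A1, A2) for E = R: square root of the summed squared entrywise differences."""
    return math.sqrt(sum((a - b) ** 2
                         for row1, row2 in zip(A1, A2)
                         for a, b in zip(row1, row2)))


def straight_line_geodesic(A1, A2, t):
    """Point alpha(t) = (1 - t) * A1 + t * A2 on the geodesic in A, for t in [0, 1]."""
    return [[(1 - t) * a + t * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(A1, A2)]


# Two 3-node undirected graphs with the SAME node ordering: a path 0-1-2 and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

# Only edge (0, 2) differs; it appears twice in the symmetric matrix, so d_a = sqrt(2).
print(frobenius_distance(path, triangle))
# Halfway along the geodesic, edge (0, 2) has weight 0.5.
print(straight_line_geodesic(path, triangle, 0.5))
```

Because the full symmetric matrix is stored, each off-diagonal edge is counted twice, matching the double sum in the definition of $d_a$.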
Since the ordering of nodes in graphs is often arbitrary, the ensuing analysis should not depend on this arbitrary choice. We view the ordering variability as a nuisance and seek to remove its influence from the analysis. Stated differently, nodes across graphs need to be registered during comparisons, and we use permutations to perform registration. Let $\mathcal{P}$ be the set of all $n \times n$ permutation matrices. A permutation matrix is a matrix that has exactly one 1 in each row and each column, with all other entries being zero. This set forms a group, with matrix multiplication as the group operation and the identity matrix as the identity element. Note that $\mathcal{P}$ is a subgroup of $O(n)$, the group of all $n \times n$ orthogonal matrices. For any $P \in \mathcal{P}$, the inverse of $P$ is given by $P^T$, the transpose of $P$. We define the action of $\mathcal{P}$ on $\mathcal{A}$ using the map:
$\mathcal{P} \times \mathcal{A} \to \mathcal{A}, \quad (P, A) \mapsto P A P^T.$
One can easily verify that this is a proper group action. For any $A \in \mathcal{A}$, its orbit under the action of $\mathcal{P}$ is given by:
$[A] = \{P A P^T \mid P \in \mathcal{P}\}.$
It is the set of all possible permutations of the node ordering in a graph represented by $A$. Any two elements of an orbit denote exactly the same graph, except that the ordering of the nodes has been changed. Therefore, orbit membership defines an equivalence relation on the set $\mathcal{A}$:
$A_1 \sim A_2 \iff \exists\, P \in \mathcal{P} \text{ such that } A_1 = P A_2 P^T. \quad (1)$
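One can check that any two orbits $[A_1]$ and $[A_2]$, for any $A_1, A_2 \in \mathcal{A}$, are either equal or disjoint. The set of all equivalence classes forms the quotient space, or graph space:
$\mathcal{G} = \mathcal{A}/\mathcal{P} = \{[A] \mid A \in \mathcal{A}\}. \quad (2)$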
$\mathcal{G}$ is a nonlinear space because it is a quotient space: one cannot perform linear operations, such as addition or multiplication, directly on its elements. For example, $[A_1] + [A_2]$ is not well defined in $\mathcal{G}$ for arbitrary $A_1, A_2 \in \mathcal{A}$, since $A_1 + P A_2 P^T$ generally lies in different orbits for different $P \in \mathcal{P}$. Next, we impose a metric structure on this quotient space and use this metric to compute statistical summaries and to perform statistical analysis.
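The action, the orbits, and the failure of naive addition on $\mathcal{G}$ can be checked by brute force on tiny graphs. The following plain-Python sketch is our own illustration with hypothetical helper names; it encodes a permutation matrix $P$ by an index list `p` with `P[i][p[i]] = 1`, so that $(P A P^T)_{ij} = a_{p(i)p(j)}$.

```python
from itertools import permutations


def act(p, A):
    """Action of the permutation matrix P with P[i][p[i]] = 1: (P A P^T)[i][j] = A[p[i]][p[j]]."""
    n = len(A)
    return [[A[p[i]][p[j]] for j in range(n)] for i in range(n)]


def orbit(A):
    """The orbit [A] = {P A P^T : P in P}; enumerates all n! permutations, so small n only."""
    return {tuple(map(tuple, act(p, A))) for p in permutations(range(len(A)))}


def add(A1, A2):
    """Entrywise sum of two matrices (well defined in A, but not on orbits)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path 0-1-2, center at node 1
print(len(orbit(path)))                     # 3 distinct orderings (choice of center)

relabeled = act((1, 0, 2), path)            # same graph, center now at node 0
print(tuple(map(tuple, relabeled)) in orbit(path))   # True: [relabeled] = [path]

# "[A1] + [A2]" is not well defined: the sum depends on the chosen representatives.
s1 = add(path, path)
s2 = add(path, relabeled)
print(tuple(map(tuple, s2)) in orbit(s1))   # False: different points of G
```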
We can carry the chosen distance $d_a$ (the Frobenius norm when $\mathcal{E} = \mathbb{R}$) from $\mathcal{A}$ over to the quotient space $\mathcal{G}$ because of the following result.
Lemma 1
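The action of $\mathcal{P}$ on $\mathcal{A}$ is by isometries. That is, for any $A_1, A_2 \in \mathcal{A}$ and $P \in \mathcal{P}$, we have
$d_a(P A_1 P^T, P A_2 P^T) = d_a(A_1, A_2). \quad (3)$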
The proof is straightforward: applying the same permutation to both graphs leaves the registration between nodes across the graphs unchanged, and it merely reorders the terms in the sum defining $d_a$. Also, since $\mathcal{P}$ is a finite set, the orbits under $\mathcal{P}$ are finite. This enables the following definition.
Definition 1 (Graph Metric)
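Define a metric on the graph space $\mathcal{G}$ according to:
$d_g([A_1], [A_2]) = \min_{P_1, P_2 \in \mathcal{P}} d_a(P_1 A_1 P_1^T, P_2 A_2 P_2^T) = \min_{P \in \mathcal{P}} d_a(A_1, P A_2 P^T). \quad (4)$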
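Since $\mathcal{P}$ is a finite set, the minimum is well defined. The last equality follows from the fact that the action of $\mathcal{P}$ is by isometries (Eqn. 3) and that $\mathcal{P}$ is a group: taking $P = P_1^T P_2$ reduces the minimization to a single permutation.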
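For very small graphs, Eqn. 4 can be evaluated exactly by enumerating all of $\mathcal{P}$. The sketch below is our own reference solver with hypothetical names (not one of the matching algorithms used in this paper); it also checks Lemma 1 numerically for $\mathcal{E} = \mathbb{R}$.

```python
import math
from itertools import permutations


def act(p, A):
    """(P A P^T)[i][j] = A[p[i]][p[j]] for the permutation matrix with P[i][p[i]] = 1."""
    n = len(A)
    return [[A[p[i]][p[j]] for j in range(n)] for i in range(n)]


def d_a(A1, A2):
    """Frobenius distance between two adjacency matrices (E = R)."""
    return math.sqrt(sum((a - b) ** 2 for r1, r2 in zip(A1, A2) for a, b in zip(r1, r2)))


def d_g(A1, A2):
    """Graph metric of Eqn. 4, min over P of d_a(A1, P A2 P^T), by exhaustive search.

    Returns (distance, optimal p). The cost is O(n! n^2), so this is only a
    reference solver for very small graphs."""
    best_d, best_p = math.inf, None
    for p in permutations(range(len(A1))):
        d = d_a(A1, act(p, A2))
        if d < best_d:
            best_d, best_p = d, p
    return best_d, best_p


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
relabeled = act((1, 0, 2), path)

print(d_a(path, relabeled))      # 2.0: the unregistered matrices differ
print(d_g(path, relabeled)[0])   # 0.0: both represent the same point of G

# Lemma 1: applying the same permutation to both arguments preserves d_a.
q = (2, 0, 1)
print(math.isclose(d_a(act(q, path), act(q, triangle)), d_a(path, triangle)))  # True
```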
One can define geodesics in the graph space as follows. For any two graphs with adjacency matrices $A_1$ and $A_2$, let $P^*$ denote the optimal permutation of $A_2$ to best register it with $A_1$ (according to Eqn. 4). Then, the geodesic path between $[A_1]$ and $[A_2]$ in $\mathcal{G}$ is given by $[\alpha(t)]$, where the components $\alpha_{ij}(t)$ are geodesics in $\mathcal{E}$ between the registered elements of $A_1$ and $P^* A_2 P^{*T}$; for $\mathcal{E} = \mathbb{R}$, $\alpha(t) = (1 - t) A_1 + t P^* A_2 P^{*T}$. This geodesic, in turn, is useful in computing graph summaries and graph PCA, as defined later.
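The two-step recipe (register, then interpolate) can be sketched as follows for $\mathcal{E} = \mathbb{R}$. This is our own illustration with hypothetical names, and an exhaustive search stands in for the registration algorithms of Section III, so it is only feasible for tiny graphs.

```python
from itertools import permutations


def act(p, A):
    """(P A P^T)[i][j] = A[p[i]][p[j]] for the permutation matrix with P[i][p[i]] = 1."""
    n = len(A)
    return [[A[p[i]][p[j]] for j in range(n)] for i in range(n)]


def register(A1, A2):
    """Return P* A2 P*^T minimizing the squared Frobenius distance to A1 (exhaustive search)."""
    n = len(A1)
    best = min(permutations(range(n)),
               key=lambda p: sum((A1[i][j] - A2[p[i]][p[j]]) ** 2
                                 for i in range(n) for j in range(n)))
    return act(best, A2)


def interpolate(A1, A2, t):
    """(1 - t) * A1 + t * A2."""
    return [[(1 - t) * a + t * b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]


def graph_space_geodesic(A1, A2, ts):
    """Points on the geodesic in G: register A2 to A1 first, then interpolate linearly."""
    A2_registered = register(A1, A2)
    return [interpolate(A1, A2_registered, t) for t in ts]


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
relabeled = act((1, 0, 2), path)   # the same graph with a different node ordering

# In the prespace A, the midpoint has fractional edge weights (a "blurred" graph) ...
print(interpolate(path, relabeled, 0.5))
# ... whereas in G the geodesic between two copies of one graph stays at that graph.
print(graph_space_geodesic(path, relabeled, [0.5])[0])
```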
II-B Alternative Representation: Laplacian Matrix
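In the special case when $\mathcal{E} = \mathbb{R}$, one can also use the graph Laplacian matrix [horaud2012short, severn2019manifold], instead of the adjacency matrix, as a mathematical representation of a graph. The graph Laplacian matrix is defined as follows:
$L = D - A, \quad D = \mathrm{diag}(d_1, \dots, d_n), \quad d_i = \sum_{j=1}^{n} a_{ij}.$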
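The set of all such Laplacian matrices is denoted by $\mathcal{L}$; for nonnegative edge weights, $\mathcal{L}$ is a subset of the set of $n \times n$ symmetric positive semidefinite matrices. Assuming the graphs have no self-loops ($a_{ii} = 0$), there is a bijective mapping $\phi: \mathcal{A} \to \mathcal{L}$ between adjacency matrices and Laplacian matrices, defined as follows. Suppose $A$ is an adjacency matrix and $L$ is the Laplacian matrix for the same graph $G$. Then, $L = \phi(A) = D - A$, where $D = \mathrm{diag}(A\mathbf{1})$ and $\mathbf{1}$ is the vector of all ones. The inverse of $\phi$ is given by $\phi^{-1}(L) = \mathrm{diag}(l_{11}, \dots, l_{nn}) - L$. The bijectivity of $\phi$ can be proved as follows. First, if $\phi(A_1) = \phi(A_2)$, then the off-diagonal entries of $A_1$ and $A_2$ coincide, and since both have zero diagonals, $A_1 = A_2$ (injection). Second, for any $L \in \mathcal{L}$, $\phi^{-1}(L)$ is a preimage of $L$ (surjection). There are some interesting properties associated with the two representations: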

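Since $P^T \mathbf{1} = \mathbf{1}$ for every $P \in \mathcal{P}$, we have $\phi(P A P^T) = P\, \phi(A)\, P^T$, for all $P \in \mathcal{P}$.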

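For any geodesic path $\alpha(t) = (1 - t) A_1 + t A_2$ in $\mathcal{A}$, the corresponding path in $\mathcal{L}$ is given by:
$\phi(\alpha(t)) = (1 - t)\, \phi(A_1) + t\, \phi(A_2),$
since $\phi$ is linear. Note that $\phi(\alpha(t))$ is generally not a geodesic path in $\mathcal{L}$ under the commonly used metrics on positive semidefinite matrices.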

Also, under the Frobenius norms on $\mathcal{A}$ and $\mathcal{L}$, the mapping $\phi$ is not an isometry, i.e., $\|\phi(A_1) - \phi(A_2)\|_F \neq \|A_1 - A_2\|_F$ in general.
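The properties listed above are easy to verify numerically. The sketch below (our own, plain Python, hypothetical names) implements $\phi$ and $\phi^{-1}$ for graphs without self-loops, checks the permutation equivariance $\phi(P A P^T) = P \phi(A) P^T$, and shows that $\phi$ changes Frobenius distances.

```python
import math


def laplacian(A):
    """phi(A) = D - A with D = diag(A 1); assumes a zero diagonal (no self-loops)."""
    n = len(A)
    deg = [sum(row) for row in A]
    return [[(deg[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]


def adjacency(L):
    """phi^{-1}(L) = diag(l_11, ..., l_nn) - L."""
    n = len(L)
    return [[(L[i][i] if i == j else 0) - L[i][j] for j in range(n)] for i in range(n)]


def act(p, A):
    """(P A P^T)[i][j] = A[p[i]][p[j]] for the permutation matrix with P[i][p[i]] = 1."""
    n = len(A)
    return [[A[p[i]][p[j]] for j in range(n)] for i in range(n)]


def frob(A1, A2):
    """Frobenius distance between two matrices."""
    return math.sqrt(sum((a - b) ** 2 for r1, r2 in zip(A1, A2) for a, b in zip(r1, r2)))


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(adjacency(laplacian(path)) == path)                   # True: phi is invertible
q = (2, 0, 1)
print(laplacian(act(q, path)) == act(q, laplacian(path)))   # True: equivariance
print(frob(path, triangle), frob(laplacian(path), laplacian(triangle)))  # 1.414... 2.0
```

The extra distance in the Laplacian representation comes from the diagonal: adding edge $(0, 2)$ also changes the degrees of nodes 0 and 2.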
The framework developed in this paper can alternatively be applied to the Laplacian representation instead of the adjacency representation. For simplicity, we mainly focus on the adjacency representation in this paper.
III Graph Matching Problem
The problem of optimizing over $\mathcal{P}$, as stated in Eqn. 4, is a key step in evaluating the graph metric and performing statistical analysis. Let $G_1, G_2$ be any two weighted graphs, and let $A_1, A_2 \in \mathcal{A}$ be the corresponding adjacency matrices. To simplify the discussion of graph matching and the existing literature, we focus entirely on the case where $\mathcal{E} = \mathbb{R}$. (The problem of matching nodes when entries of $A$ are elements of arbitrary nonlinear manifolds remains unsolved.) Then, registration requires solving the problem:
$P^* = \arg\min_{P \in \mathcal{P}} \|A_1 - P A_2 P^T\|_F^2. \quad (5)$
Most current graph matching algorithms are applicable only to graphs with an equal number of nodes. Even those that allow different numbers of nodes require each node of the smaller graph to be registered to some node in the larger graph. Here, 'smaller' and 'larger' refer to the size of the graphs, i.e., the number of nodes.
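In general, given two graphs, $G_1$ with $n_1$ nodes and $G_2$ with $n_2$ nodes, we add $n_2$ and $n_1$ null nodes to $G_1$ and $G_2$, respectively, to bring each of them to the same size $n = n_1 + n_2$. The null nodes are unattached nodes with zero values for the node attributes. As a result, the new adjacency matrices of $G_1$ and $G_2$ are:
$\tilde{A}_1 = \begin{bmatrix} A_1 & 0_{n_1 \times n_2} \\ 0_{n_2 \times n_1} & 0_{n_2 \times n_2} \end{bmatrix}, \quad \tilde{A}_2 = \begin{bmatrix} A_2 & 0_{n_2 \times n_1} \\ 0_{n_1 \times n_2} & 0_{n_1 \times n_1} \end{bmatrix}. \quad (6)$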
Both new matrices are of size $(n_1 + n_2) \times (n_1 + n_2)$, and therefore one can now apply existing graph matching algorithms. In fact, this idea of extending the adjacency matrices using Eqn. 6 can be applied even when the graphs being compared have the same number of nodes, to allow individual nodes to match null nodes. This provides more degrees of freedom to reach a better matching and further reduce the cost function.
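A minimal sketch of the padding in Eqn. 6, assuming (as in the reconstruction above) that $G_1$ receives $n_2$ null nodes and $G_2$ receives $n_1$; the function names are hypothetical. The padded matrices can then be passed unchanged to any matching algorithm that expects equal sizes.

```python
def pad_with_null_nodes(A, k):
    """Append k null nodes (isolated, all-zero rows/columns): returns [[A, 0], [0, 0]]."""
    n = len(A)
    return [list(row) + [0] * k for row in A] + [[0] * (n + k) for _ in range(k)]


def pad_pair(A1, A2):
    """Eqn. 6: add n2 null nodes to G1 and n1 null nodes to G2, so both have n1 + n2 nodes."""
    n1, n2 = len(A1), len(A2)
    return pad_with_null_nodes(A1, n2), pad_with_null_nodes(A2, n1)


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # 3 nodes
edge = [[0, 1], [1, 0]]                     # 2 nodes
B1, B2 = pad_pair(path, edge)
print(len(B1), len(B2))   # 5 5
for row in B2:
    print(row)            # the single edge, followed by three isolated null nodes
```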
In the next two subsections, we present two different solutions for this optimization problem over $\mathcal{P}$; the third subsection compares them with an existing alternative.
III-A Umeyama Algorithm
First, we introduce a classic solution from [umeyama1988eigendecomposition] that is based on the eigendecomposition of the representation matrices. This method is summarized in Algorithm 1 and is not repeated in the text here. Note that Algorithm 1 applies to the current discussion with $\lambda = 0$; the more general case is discussed later in Section IV. As noted in [umeyama1988eigendecomposition], the solution is globally optimal for isomorphic graphs, and it usually provides a good initialization for more general graph matching problems. Thus, we use it as an initial condition for a greedy search (pairwise exchanges of rows and columns) that seeks to further improve the solution.
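The greedy refinement step can be sketched as below. This is our own minimal version with hypothetical names: in the paper the search starts from the Umeyama solution of Algorithm 1, whereas here the starting permutation `p0` defaults to the identity purely for illustration, and the cost is recomputed from scratch after every swap (an incremental update would be cheaper).

```python
def cost(A1, A2, p):
    """Squared Frobenius cost ||A1 - P A2 P^T||^2, with (P A2 P^T)[i][j] = A2[p[i]][p[j]]."""
    n = len(A1)
    return sum((A1[i][j] - A2[p[i]][p[j]]) ** 2 for i in range(n) for j in range(n))


def pairwise_exchange(A1, A2, p0=None):
    """Greedy local search: swap two entries of p (i.e., exchange two rows and the
    corresponding columns of P A2 P^T) whenever this lowers the cost; stop when no
    single swap helps. Returns a local optimum (p, cost)."""
    n = len(A1)
    p = list(range(n)) if p0 is None else list(p0)
    best = cost(A1, A2, p)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                p[i], p[j] = p[j], p[i]
                c = cost(A1, A2, p)
                if c < best:
                    best, improved = c, True      # keep the swap
                else:
                    p[i], p[j] = p[j], p[i]       # undo the swap
    return p, best


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
relabeled = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # path with its center at node 0
p, c = pairwise_exchange(path, relabeled)        # starts from the identity (cost 4)
print(p, c)                                      # [1, 0, 2] 0
```

Because only strictly improving swaps are accepted, the loop always terminates, but it can stop at a local optimum; this is why a good initialization matters.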
We illustrate this idea using some simple examples in Fig. 2. This dataset has binary graphs representing uppercase English letters [riesen2008iam]. Each row shows the original graphs $G_1$ (first graph) and $G_2$ (last graph) and, in the middle, the two registered outcomes: the optimal permutation (from Algorithm 1) of $G_2$, and $G_1$, each possibly with some null nodes added. The first row shows the simpler case, where $G_1$ and $G_2$ have the same number of nodes. We still add null nodes to both of them and permute $G_2$ to match $G_1$. As expected, the null nodes of $G_2$ are found to be registered to null nodes of $G_1$, and are not displayed here. In the second row, the graphs $G_1$ and $G_2$ have different sizes. It is interesting to note that a null node of one graph is registered to a regular node of the other. In the last row, the two graphs $G_1$ and $G_2$ have the same size. However, both registered graphs have null nodes that are matched to regular nodes of the other graph. This seems to result in a more natural matching.
III-B Fast Approximate Quadratic Programming
Algorithm 1 generally works well for smaller graphs but becomes slow when the number of nodes gets large, mainly because of the computational cost its greedy part incurs for each node exchange. Recently, [vogelstein2015fast, lyzinski2016graph] used the Frank–Wolfe algorithm [frank1956algorithm] to develop a different solution, called Fast Approximate Quadratic (FAQ) assignment. The main idea is to restate the matching problem as:
$\arg\min_{P \in \mathcal{P}} \|A_1 - P A_2 P^T\|_F^2 = \arg\min_{P \in \mathcal{P}} -\mathrm{tr}(A_1 P A_2^T P^T).$
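Indeed, for symmetric $A_1, A_2$, $\|A_1 - P A_2 P^T\|_F^2 = \|A_1\|_F^2 + \|A_2\|_F^2 - 2\,\mathrm{tr}(A_1 P A_2^T P^T)$, and the first two terms do not depend on $P$. The right-hand side of the above equation is a special case of the quadratic assignment problem. One can solve it using the gradient of the cost function $f(P) = -\mathrm{tr}(A_1 P A_2^T P^T)$, namely $\nabla f(P) = -(A_1 P A_2^T + A_1^T P A_2)$. To handle the discrete nature of permutation matrices, the procedure first replaces the permutation matrix by a doubly stochastic matrix: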
$\hat{D} = \arg\min_{D \in \mathcal{D}} -\mathrm{tr}(A_1 D A_2^T D^T), \quad (7)$
where $\mathcal{D}$ is the set of $n \times n$ doubly stochastic matrices, i.e., matrices whose entries are all nonnegative and whose rows and columns each sum to one. After the optimization, the solution is projected back onto $\mathcal{P}$ by solving a linear assignment problem. We summarize this approach in Algorithm 2; in the current context, it applies with $\lambda = 0$.
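The relaxation-and-projection idea can be sketched with a basic Frank–Wolfe loop. This is our own simplified illustration of the FAQ strategy, not the reference implementation of [vogelstein2015fast]: it starts from the barycenter $\frac{1}{n}\mathbf{1}\mathbf{1}^T$, uses an exact line search, and solves each linear assignment subproblem by brute force, which is only viable for tiny $n$ (a Hungarian-type solver is the usual choice). All names are hypothetical; for practical use, SciPy offers `scipy.optimize.quadratic_assignment` with `method="faq"`.

```python
from itertools import permutations


def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inner(X, Y):
    """Frobenius inner product <X, Y> = tr(X^T Y)."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def solve_lap(M, maximize=False):
    """Permutation s optimizing sum_i M[i][s[i]] (= <M, Q> for Q[i][s[i]] = 1).
    Brute force over all n! permutations: tiny n only."""
    n = len(M)
    sign = -1.0 if maximize else 1.0
    return min(permutations(range(n)),
               key=lambda s: sign * sum(M[i][s[i]] for i in range(n)))


def perm_matrix(s):
    n = len(s)
    return [[1.0 if s[i] == j else 0.0 for j in range(n)] for i in range(n)]


def f(A1, A2, D):
    """Relaxed objective of Eqn. 7: -tr(A1 D A2^T D^T), computed as -<A1 D, D A2>."""
    return -inner(matmul(A1, D), matmul(D, A2))


def faq(A1, A2, max_iter=30):
    """Frank-Wolfe on the relaxation of Eqn. 7, followed by projection onto P.
    Returns (p, D): p encodes P with P[i][p[i]] = 1, D is the final relaxed solution."""
    n = len(A1)
    D = [[1.0 / n] * n for _ in range(n)]         # barycenter of the doubly stochastic set
    A1T, A2T = transpose(A1), transpose(A2)
    for _ in range(max_iter):
        # Gradient of f at D: -(A1 D A2^T + A1^T D A2).
        g1 = matmul(matmul(A1, D), A2T)
        g2 = matmul(matmul(A1T, D), A2)
        grad = [[-(a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]
        # Frank-Wolfe vertex: the permutation matrix minimizing <grad, Q>.
        Q = perm_matrix(solve_lap(grad))
        R = [[q - d for q, d in zip(rq, rd)] for rq, rd in zip(Q, D)]
        # Exact line search on f(D + a R) = f(D) + b a + c a^2, a in [0, 1].
        b, c = inner(grad, R), f(A1, A2, R)
        if c > 0:
            a = min(1.0, max(0.0, -b / (2.0 * c)))
        else:
            a = 1.0 if b + c < 0 else 0.0
        if a == 0.0:
            break                                  # no descent along R
        D = [[d + a * r for d, r in zip(rd, rr)] for rd, rr in zip(D, R)]
    # Projection onto P: the permutation maximizing <D, P> (another linear assignment).
    return solve_lap(D, maximize=True), D


A1 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]   # path 0-1-2-3
A2 = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]]   # path 2-0-3-1
p, D = faq(A1, A2)
cost = sum((A1[i][j] - A2[p[i]][p[j]]) ** 2 for i in range(4) for j in range(4))
print(p, cost)
```

Every iterate stays doubly stochastic because it is a convex combination of doubly stochastic matrices, and the exact line search never increases the relaxed objective; the final projection, however, can lose some of that progress, which is why FAQ is an approximate method.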
III-C Comparisons of Algorithms for Registration
We compare these two algorithms, along with the well-known graduated assignment algorithm [gold1996graduated], using simulated data. Algorithm 1 was run in Python 3.6.8, while FAQ and graduated assignment were run in Matlab R2018b. All three were run on the same machine with a 2.5 GHz Intel Core i7 CPU. For each pair of graphs, we apply these three algorithms and record the energy, i.e., the optimal value of the cost in Eqn. 5, and the elapsed time. For graduated assignment, the parameters used are . We show the results in Fig. 3.

Case 1: We randomly generate 100 pairs of Erdős–Rényi (binomial) graphs [wiki:binomial] with number of nodes and probability for edges.

Case 2: We repeat the same procedure for graphs with and .
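Random test pairs of this kind can be drawn as follows. The generator is our own sketch with hypothetical names; the node count and edge probability below are placeholders, not the settings used for Cases 1 and 2.

```python
import random


def erdos_renyi(n, p, rng):
    """Adjacency matrix of a G(n, p) graph: each of the n(n-1)/2 possible undirected
    edges is present independently with probability p; no self-loops."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A


rng = random.Random(0)        # fixed seed for reproducibility
N_NODES, P_EDGE = 10, 0.3     # hypothetical placeholder values
pairs = [(erdos_renyi(N_NODES, P_EDGE, rng), erdos_renyi(N_NODES, P_EDGE, rng))
         for _ in range(100)]
print(len(pairs), sum(map(sum, pairs[0][0])) // 2)   # number of pairs, edges in first graph
```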
As the first row shows, Algorithm 1 performs better for smaller graphs. For larger graphs, FAQ is superior in terms of both energy and time. These experiments indicate that different graph sizes call for different matching algorithms.
IV Extension Involving Both Edge Weights and Node Attributes
In many cases, the structure of a graph can be identified by comparing edge weights exclusively. However, the information associated with the nodes of graphs is sometimes also important in matching and comparing graphs. Next, we extend the previous framework to incorporate node information as well.
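Let $\mathcal{V}$ be the set of potential node attributes, and let $\emptyset \in \mathcal{V}$ be a distinguished element denoting the null or void element. A node-attributed weighted graph is represented by $G = (V, \omega, \ell)$, consisting of: (i) a finite nonempty set $V$ of nodes, (ii) a weight function $\omega: V \times V \to \mathcal{E}$ for edges, and (iii) an attribute function for nodes given by $\ell: V \to \mathcal{V}$. Let $d_v$ be an appropriate distance on $\mathcal{V}$, $d_v: \mathcal{V} \times \mathcal{V} \to \mathbb{R}_{\geq 0}$.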
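For any two graphs $G_1 = (V_1, \omega_1, \ell_1)$ and $G_2 = (V_2, \omega_2, \ell_2)$, each with $n$ nodes, let $C \in \mathbb{R}^{n \times n}$ denote the matrix of pairwise distances between nodes across the two graphs. That is, $C_{ij} = d_v(\ell_1(v_i), \ell_2(u_j))$, where $v_i \in V_1$ and $u_j \in V_2$. Now the matching problem becomes:
$P^* = \arg\min_{P \in \mathcal{P}} \left( \|A_1 - P A_2 P^T\|_F^2 + \lambda\, \mathrm{tr}(P^T C) \right), \quad (8)$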
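where $\lambda \geq 0$ is a tuning parameter that balances the contributions of edge and node attributes in matching. For FAQ, the equivalent matching problem is defined as:
$\hat{D} = \arg\min_{D \in \mathcal{D}} \left( -2\, \mathrm{tr}(A_1 D A_2^T D^T) + \lambda\, \mathrm{tr}(D^T C) \right). \quad (9)$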
The new gradient for Eqn. 9 becomes $-2(A_1 D A_2^T + A_1^T D A_2) + \lambda C$. The previous algorithms can be easily modified to handle the new formulation. In fact, these general solutions are already presented in Algorithms 1 and 2 for a general $\lambda$.
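As a small sketch of an objective of the form of Eqn. 8 (exhaustive search, hypothetical names, Euclidean node attributes assumed), the example below uses two drawings of the same 3-node path. Edges alone cannot tell the two mirror-image registrations apart ($\lambda = 0$ gives a tie), while the node coordinates select the geometrically consistent one.

```python
import math
from itertools import permutations


def attribute_cost_matrix(X1, X2):
    """C[i][j] = Euclidean distance between the attribute of node i in G1 and node j in G2."""
    return [[math.dist(x, y) for y in X2] for x in X1]


def matching_cost(A1, A2, C, p, lam):
    """||A1 - P A2 P^T||_F^2 + lam * tr(P^T C) for P[i][p[i]] = 1,
    where tr(P^T C) = sum_i C[i][p[i]] (node i of G1 matched to node p[i] of G2)."""
    n = len(A1)
    edge_term = sum((A1[i][j] - A2[p[i]][p[j]]) ** 2 for i in range(n) for j in range(n))
    node_term = sum(C[i][p[i]] for i in range(n))
    return edge_term + lam * node_term


def best_matching(A1, A2, C, lam):
    """Exhaustive minimizer of the combined cost (tiny graphs only)."""
    return min(permutations(range(len(A1))), key=lambda p: matching_cost(A1, A2, C, p, lam))


# Path 0-1-2 with nodes laid out on a line, and the same shape with shuffled labels.
A1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
A2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]      # center is node 0
X2 = [(1.0, 0.0), (2.0, 0.0), (0.0, 0.0)]

C = attribute_cost_matrix(X1, X2)
print(best_matching(A1, A2, C, lam=0.0))    # (1, 0, 2): one of two edge-only optima
print(best_matching(A1, A2, C, lam=1.0))    # (2, 0, 1): coordinates break the tie
```

`math.dist` requires Python 3.8 or later.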
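More generally, for $G_1$ with $n_1$ nodes and $G_2$ with $n_2$ nodes ($n_1 \neq n_2$ in general), we extend the matrix $C \in \mathbb{R}^{n_1 \times n_2}$ according to:
$\tilde{C} = \begin{bmatrix} C & C_{1\emptyset} \\ C_{\emptyset 2} & 0_{n_2 \times n_1} \end{bmatrix}, \quad (C_{1\emptyset})_{ij} = d_v(\ell_1(v_i), \emptyset), \quad (C_{\emptyset 2})_{ij} = d_v(\emptyset, \ell_2(u_j)). \quad (10)$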
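Here, the entry $(i, j)$ of the off-diagonal block $C_{1\emptyset}$ represents the node-attribute distance between $v_i \in V_1$ and the $j$th null node added to $G_2$. The same explanation applies to $C_{\emptyset 2}$, whose entry $(i, j)$ is the distance between the $i$th null node added to $G_1$ and $u_j \in V_2$.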
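Under the reading of Eqn. 10 given above (null nodes carry the zero attribute, as stated in Section III, and are appended after the real nodes), the padded cost matrix can be assembled as follows. This is a hypothetical helper for illustration, not code from the paper.

```python
import math


def padded_cost_matrix(X1, X2, null=(0.0, 0.0)):
    """Node-attribute cost for G1 padded with n2 null nodes and G2 padded with n1 null nodes.

    Rows: the n1 nodes of G1, then n2 null nodes; columns: the n2 nodes of G2,
    then n1 null nodes. Null nodes carry the attribute `null` (zeros here), so the
    null-to-null block is zero."""
    rows = list(X1) + [null] * len(X2)
    cols = list(X2) + [null] * len(X1)
    return [[math.dist(x, y) for y in cols] for x in rows]


X1 = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]   # three nodes in G1
X2 = [(0.0, 1.0), (6.0, 8.0)]               # two nodes in G2
C = padded_cost_matrix(X1, X2)
for row in C:
    print([round(v, 2) for v in row])
```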
In Fig. 4, we present some illustrations of these algorithms when node attributes are also used. In this example, we use the planar coordinates of the nodes of letter graphs as attributes and incorporate this additional information in matching graphs. The first row is the case without any node attributes, i.e., $\lambda = 0$ in Eqn. 8. In the second row, we add node attributes with $\lambda > 0$. Compared to the first row, this case shows a better correspondence across graphs, since the edges are now registered across graphs. If we further increase the weight on the node attributes, as the last row shows, the matching completely ignores the edge correspondence. In the second row, only one edge is matched to a null edge, while in the last row two edges are matched to null edges.
As mentioned earlier, an important strength of this framework is that it provides geodesic paths between registered graphs, as elements of the quotient space $\mathcal{G}$. The geodesics in the prespace $\mathcal{A}$ and in the graph space $\mathcal{G}$ are both linear interpolations, except that the registration has been optimized in the latter case. Fig. 5 compares geodesics in $\mathcal{A}$ (top row) and in $\mathcal{G}$ (bottom row) between the same two graphs. The two original graphs, representing the letters 'A' and 'F', are at the two ends. In this example, we also use the node coordinates as node attributes ($\lambda > 0$). As one can see, the geodesic in $\mathcal{G}$ shows a more natural deformation from one graph to the other, resulting from an improved matching of nodes.
As stated in Section II-B, we can also use Laplacian matrices to represent graphs. Although one can easily map an adjacency matrix to a Laplacian matrix using $\phi$, and vice versa, the past literature has rarely used Laplacian matrices for graph matching. We present one example in Fig. 6, where we perform matching under both representations, adjacency matrix and Laplacian matrix, using Algorithm 1. As commented earlier, the mapping $\phi$ is not an isometry under the Frobenius norm on both spaces, and minimizing $\|A_1 - P A_2 P^T\|_F$ results in a different solution than minimizing $\|L_1 - P L_2 P^T\|_F$. Nevertheless, the two representations differ only in the resulting solutions; the general procedures are quite similar.
V Statistical Analysis of Graphs
We have developed a metric space for representing, matching and comparing graphs. Additionally, we now have a way of computing geodesic paths between arbitrary graphs. Together, these tools help us derive statistical summaries of graph data and develop stochastic models that capture the observed variability in given data. We start by defining sample means and covariances.
V-A Mean of Graph Data
Given a set of graphical data, it is important to summarize the given graphs using the notion of a mean or a median. However, a simple average of the adjacency matrices does not make much sense if the nodes are not registered, which is usually the case in practice. Therefore, we seek the mean in the graph space $\mathcal{G}$. Given a set of graphs $G_1, \dots, G_m$ with corresponding adjacency matrices $A_1, \dots, A_m$, the adjacency matrix of the mean graph is defined as:
$\bar{A} = \arg\min_{A \in \mathcal{A}} \sum_{i=1}^{m} d_g([A], [A_i])^2, \quad (11)$
where $d_g$ is as defined in Eqn. 4. An algorithm for computing this mean is given in Algorithm 3.
If node attributes are included, one needs to endow the node attribute space with a metric structure so that one can also average the nodes. For Euclidean attributes, this is straightforward. For categorical node attributes, however, one needs to impose a metric structure on the categories and use that structure to compute the mean node value.
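Algorithm 3 is not reproduced in this text. A common way to compute such a mean, and the one sketched below under that assumption, alternates between registering every graph to the current estimate and averaging the registered matrices entrywise. An exhaustive search stands in for the matching algorithms of Section III, so only tiny graphs are feasible; the names are hypothetical, and the loop typically settles at a local solution of Eqn. 11 rather than a guaranteed global one.

```python
from itertools import permutations


def act(p, A):
    """(P A P^T)[i][j] = A[p[i]][p[j]] for the permutation matrix with P[i][p[i]] = 1."""
    n = len(A)
    return [[A[p[i]][p[j]] for j in range(n)] for i in range(n)]


def register(M, A):
    """Permute A to best match M in squared Frobenius distance (exhaustive search)."""
    n = len(M)
    p = min(permutations(range(n)),
            key=lambda p: sum((M[i][j] - A[p[i]][p[j]]) ** 2
                              for i in range(n) for j in range(n)))
    return act(p, A)


def graph_mean(As, max_iter=20):
    """Iterative estimate of the mean in Eqn. 11: register every graph to the current
    estimate, average the registered matrices entrywise, repeat until nothing changes.
    All graphs must already have the same (padded) size."""
    m, n = len(As), len(As[0])
    mean = [[float(v) for v in row] for row in As[0]]   # initialize with the first graph
    for _ in range(max_iter):
        registered = [register(mean, A) for A in As]
        new_mean = [[sum(R[i][j] for R in registered) / m for j in range(n)]
                    for i in range(n)]
        if new_mean == mean:
            break
        mean = new_mean
    return mean


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Three differently ordered copies of the path: the mean recovers the path itself.
print(graph_mean([path, act((1, 0, 2), path), act((0, 2, 1), path)]))
# A path and a triangle: the path plus a half-weight third edge.
print(graph_mean([path, triangle]))
```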
V-B Principal Component Analysis of Graphs
The high dimensionality of observed graphs is a major issue in many problem domains. For a graph with $n$ nodes, the number of potential edges can be as high as $n(n-1)/2$. It is useful to have a technique for projecting graph data to lower dimensions while capturing as much of the intrinsic variability in the data as possible. Principal component analysis (PCA) can naturally be used as a simple tool for linear projection and dimension reduction, and to discover dominant directions or subspaces in the data space. As mentioned earlier, the lack of registration of nodes in the raw data is an obstacle to applying PCA directly in $\mathcal{A}$. Instead, one can apply PCA in the quotient space $\mathcal{G}$, as listed in Algorithm 4. After PCA, graphs can be represented as low-dimensional vectors, which facilitates further analysis. Compared with graph embedding techniques that also represent graphs as vectors [bahonar2019graph], this approach can also project principal scores back to graphs.
Given a set of graphs with adjacency matrices $A_1, \dots, A_m$, let $\bar{A}$ denote their sample mean in $\mathcal{G}$ (obtained using Algorithm 3) and let $\tilde{A}_1, \dots, \tilde{A}_m$ be the matrices registered to $\bar{A}$. Then, the differences $\tilde{A}_i - \bar{A}$ are elements of a vector space, and one can use them to perform PCA. The PCA procedure is summarized in Algorithm 4.
As mentioned before, the extended adjacency matrices can be used for graphs with different numbers of nodes. In fact, one can also augment the vectorized $\tilde{A}_i$ with node attributes when nodes are taken into account. We skip further discussion of these possibilities to save space.
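Algorithm 4 is likewise not reproduced here. The sketch below is our own minimal version of the idea just described: register each graph to a given mean (exhaustively), flatten the differences $\tilde{A}_i - \bar{A}$ into vectors, and extract the leading eigenvectors of their sample covariance by power iteration with deflation. All names are hypothetical, and a real implementation would call a symmetric eigensolver from a linear algebra library instead.

```python
import math
from itertools import permutations


def act(p, A):
    """(P A P^T)[i][j] = A[p[i]][p[j]] for the permutation matrix with P[i][p[i]] = 1."""
    n = len(A)
    return [[A[p[i]][p[j]] for j in range(n)] for i in range(n)]


def register(M, A):
    """Permute A to best match M in squared Frobenius distance (exhaustive search)."""
    n = len(M)
    p = min(permutations(range(n)),
            key=lambda p: sum((M[i][j] - A[p[i]][p[j]]) ** 2
                              for i in range(n) for j in range(n)))
    return act(p, A)


def graph_pca(As, mean, k, iters=500):
    """PCA of the flattened registered differences; returns (eigenvalues, eigenvectors, scores).

    Uses the sample covariance (divisor m - 1, so m >= 2) and power iteration with
    deflation from a fixed uniform start vector (a random start is safer in general)."""
    m, n = len(As), len(mean)
    X = [[a - b for ra, rb in zip(register(mean, A), mean) for a, b in zip(ra, rb)]
         for A in As]
    d = n * n
    S = [[sum(x[r] * x[c] for x in X) / (m - 1) for c in range(d)] for r in range(d)]
    values, vectors = [], []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(S[r][c] * v[c] for c in range(d)) for r in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[r] * sum(S[r][c] * v[c] for c in range(d)) for r in range(d))
        values.append(lam)
        vectors.append(v)
        S = [[S[r][c] - lam * v[r] * v[c] for c in range(d)] for r in range(d)]  # deflate
    scores = [[sum(x[r] * v[r] for r in range(d)) for v in vectors] for x in X]
    return values, vectors, scores


mean = [[0.0, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.0]]
As = [[[0, 1, w], [1, 0, 1], [w, 1, 0]] for w in (0.0, 0.5, 1.0)]   # only edge (0, 2) varies
values, vectors, scores = graph_pca(As, mean, k=1)
print(round(values[0], 6))   # 0.5
```

The leading direction puts equal weight on entries $(0, 2)$ and $(2, 0)$, and its scores can be mapped back to graphs as $\bar{A}$ plus a multiple of the reshaped eigenvector.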
V-C Generative Graph Model
In some situations involving statistical inference, it is useful to develop analytical generative models for graphical data. For example, such models can be useful in performing Bayesian graphical inference [jensen1996introduction]. However, model estimation directly from observed graphs may have a large error because the graphs are not registered. We introduce a simple Gaussian-type model in the graph space to better capture the essential variability of graphical data. In conjunction with graph PCA and potential dimension reduction, this yields a very efficient model.
Assume that we have a set of graphs with adjacency matrices $A_1, \dots, A_m$. By applying Algorithm 4, we obtain the PC scores $x_i \in \mathbb{R}^k$ by projecting each registered difference $\tilde{A}_i - \bar{A}$ onto the space spanned by the first $k$ principal components. For the $x_i$'s, we impose a $k$-dimensional Gaussian model with the sample mean and covariance as the model parameters. Note that the Gaussian model is used only for illustration; any general parametric or nonparametric model can also be used here. If one wants to use a multivariate normal density, it is useful to validate it using a normality test beforehand. Unlike the generative graph model in [han2015generative], where one needs to gradually add nodes and edges to obtain a new sample graph, one can directly sample a new graph from the proposed model.
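A minimal sketch of the Gaussian model on PC scores, assuming the mean graph, the principal directions (as flattened $n \times n$ vectors), and the scores come from graph PCA (Algorithm 4); the names and the numbers in the example are hypothetical. Sampled matrices are real-valued.

```python
import math
import random


def fit_gaussian(scores):
    """Per-component sample mean and variance (divisor m - 1, so m >= 2) of PC scores.

    PC scores computed from the same sample are uncorrelated, so their sample
    covariance is diagonal and these values fully specify the fitted Gaussian."""
    m, k = len(scores), len(scores[0])
    mu = [sum(s[j] for s in scores) / m for j in range(k)]
    var = [sum((s[j] - mu[j]) ** 2 for s in scores) / (m - 1) for j in range(k)]
    return mu, var


def sample_graphs(mean, components, mu, var, count, rng):
    """Draw z ~ N(mu, diag(var)) and map back: A = mean + sum_j z_j * reshape(u_j)."""
    n = len(mean)
    samples = []
    for _ in range(count):
        z = [rng.gauss(m_j, math.sqrt(v_j)) for m_j, v_j in zip(mu, var)]
        samples.append([[mean[i][j] + sum(z_j * u[i * n + j] for z_j, u in zip(z, components))
                         for j in range(n)] for i in range(n)])
    return samples


# Hypothetical example: one principal direction that changes only edge (0, 2).
s = math.sqrt(0.5)
mean = [[0.0, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.0]]
u = [0.0, 0.0, s, 0.0, 0.0, 0.0, s, 0.0, 0.0]          # flattened 3 x 3 unit vector
mu, var = fit_gaussian([[-s], [0.0], [s]])              # PC scores of three graphs
for A in sample_graphs(mean, [u], mu, var, count=3, rng=random.Random(1)):
    print([[round(a, 2) for a in row] for row in A])
```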
VI Experiments and Applications
To illustrate this framework, we have implemented it on a variety of graph datasets. The results are presented in this section.
VI-A Letter Shapes
The Letter Graphs dataset is part of the IAM Graph Database used in [riesen2008iam] and consists of small graphs depicting 15 uppercase letters (A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z) that can be drawn using straight lines. The edge weights in these graphs are either one or zero. The nodes have location coordinates in $\mathbb{R}^2$, so that the collection of edges forms the shape of a letter. The authors also introduced distortions to the prototype graphs at three different levels: low, medium, and high. Fig. 7 shows some sample graphs of the letter 'A' at these three distortion levels.
First, we use Algorithm 3 to compute the mean graphs of 50 observations associated with the letter 'A' at each of the three distortion levels. The results are shown in Fig. 8. To match different numbers of nodes across multiple graphs, one has to add a number of null nodes to the mean shape, and this can be seen in the resulting means. The mean graphs resemble the letter 'A' in all three cases, despite significant variability and distortions in the original data.
Additionally, we perform PCA on the letter data in the quotient space and display the results in Figs. 9, 10 and 11 for the low, medium and high distortion graphs, respectively. In these figures, each row depicts the shape variability along one principal direction, in the form of graphs at the mean $\pm$ standard deviation. This analysis helps identify the main modes of structural variability in the original data. For example, Fig. 9 shows graphs along the first three principal directions of variability in the low distortion dataset. In all these graphs, the main edges are stable and there are no significant changes along the principal axes, which implies that the observations in this set are quite similar in shape. In contrast, for the medium distortion data in Fig. 10, the horizontal edge changes significantly along the first principal direction. At the high distortion level, there are significant changes in shape along the principal directions; for instance, in the top row, an extra edge at the top left dominates the first principal direction.
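A hedged sketch of how such rows can be generated once the graphs are registered: compute the principal directions by SVD and place graphs at the mean plus multiples of the standard deviation along each direction. The function name `principal_paths`, the step values from -2 to 2, and the random toy data are assumptions for illustration, not the settings used for Figs. 9 to 11.

```python
import numpy as np

def principal_paths(aligned, n_dirs=3, steps=(-2, -1, 0, 1, 2)):
    """Graphs at mean + t * sd_j * v_j along the first n_dirs principal directions.

    aligned: array (n, m, m) of graphs ASSUMED already registered to a common mean.
    """
    n, m, _ = aligned.shape
    X = aligned.reshape(n, m * m)
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    sd = s / np.sqrt(n - 1)           # standard deviation of the scores along v_j
    paths = np.array([[(mean + t * sd[j] * Vt[j]).reshape(m, m) for t in steps]
                      for j in range(n_dirs)])
    return paths, mean.reshape(m, m), sd

# Usage with hypothetical symmetric toy data (30 graphs, 4 nodes).
rng = np.random.default_rng(1)
data = rng.random((30, 4, 4))
data = 0.5 * (data + data.transpose(0, 2, 1))
paths, mean_graph, sd = principal_paths(data)
```

Each `paths[j]` corresponds to one row of such a figure; its middle entry (`t = 0`) is the mean graph.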
Another important application of PCA is dimension reduction: one performs PCA on the graph data and represents the original graphs by low-dimensional PC scores. One can then reconstruct and visualize these approximate graphs to evaluate the quality of the representation. Fig. 12 shows reconstructions of graphs using only the first 16, 16 and 14 dimensions for the low, medium and high distortion letters, respectively. These dimensions were chosen so that the representation retains at least 80% of the original variance. The reconstructed graphs are quite similar to the originals despite a significant reduction in size. Fig. 13 shows the percentage of variance in the first $k$ components versus $k$, and the actual cutoffs are listed in Table I.
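The cutoff rule can be sketched as follows: compute the cumulative fraction of variance from the singular values, pick the smallest number of components that reaches the threshold, and reconstruct from the truncated scores. The names `pcs_for_variance` and `reconstruct` and the toy matrix are hypothetical; the rows of `X` stand in for vectorized, already-aligned graphs.

```python
import numpy as np

def pcs_for_variance(X, frac=0.8):
    """Smallest k whose leading principal components explain >= frac of the variance."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)   # cumulative fraction of variance
    return int(np.searchsorted(ratio, frac) + 1), ratio

def reconstruct(X, k):
    """Project rows of X onto the first k PCs and map them back to the original space."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:k]
    scores = (X - mean) @ V.T         # k-dimensional representation of each row
    return mean + scores @ V, scores

# Toy data: variance 18 along the first axis and 2 along the second.
X = np.array([[3.0, 0, 0], [-3.0, 0, 0], [0, 1.0, 0], [0, -1.0, 0]])
k, ratio = pcs_for_variance(X, 0.8)
X_hat, Z = reconstruct(X, 2)
```

For this toy matrix the first component carries 90% of the variance, so a single component meets an 80% cutoff, while two components reconstruct `X` exactly.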
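Distortion  Number of principal components at increasing variance cutoffs (second column: 80%; last column: 100%)
Low  13  16  18  50
Medium  11  16  25  50
High  11  14  21  50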
We also fitted a Gaussian model to the PC scores of the observed graphs. We first use PCA to reduce the dimension so that around 80% of the variance is retained, as described above, and then impose a Gaussian model on these principal component scores. To test the model, we generate random samples from it and project them back into graph space; these samples are presented in Fig. 14. The visible similarity of these random samples to the original graphs suggests that the model fits the data well.
As a final experiment with letters, we perform classification. The full letter dataset consists of training, validation and test subsets, each containing 750 graphs. The graphs are uniformly distributed over the letters, i.e., 50 graphs per class. We classify the graphs based on their pairwise distances in graph space, computed using Algorithm 1, with a Support Vector Machine (SVM) with a radial kernel as the classifier. The classification results on the test data are given in Table III, alongside those reported in [riesen2008iam].
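An SVM with a radial kernel needs a Gram matrix rather than distances. A minimal sketch, assuming the kernel takes the form $\exp(-\gamma d^2)$ on the Algorithm 1 distances, where `gamma` is a hypothetical tuning parameter: because a distance minimized over permutations need not be Euclidean, this matrix is not guaranteed to be positive semidefinite, so the sketch clips negative eigenvalues. That is one common remedy, not necessarily the authors' choice; the function name and toy matrices are also hypothetical.

```python
import numpy as np

def rbf_kernel_from_distances(D, gamma):
    """K_ij = exp(-gamma * d_ij**2), projected onto the PSD cone by eigenvalue clipping."""
    K = np.exp(-gamma * np.asarray(D, dtype=float) ** 2)
    K = 0.5 * (K + K.T)                          # enforce exact symmetry
    w, V = np.linalg.eigh(K)
    return (V * np.clip(w, 0.0, None)) @ V.T     # drop negative eigenvalues

# Hypothetical symmetric distance matrix that is not Euclidean-embeddable in general.
D = np.array([[0, 1, 1, 1], [1, 0, 1, 3], [1, 1, 0, 1], [1, 3, 1, 0]], dtype=float)
K = rbf_kernel_from_distances(D, gamma=0.5)
```

The training matrix can then be passed to scikit-learn's `SVC(kernel="precomputed")`, whose `predict` expects an `n_test x n_train` kernel between test and training graphs, computed from the same $\exp(-\gamma d^2)$ formula.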
VI-B Molecular Shapes
In this section, we analyze another graph dataset from the IAM Graph Database [riesen2008iam], which involves shapes of molecular compounds. The molecules are converted into graphs in a straightforward manner by representing atoms as nodes and covalent bonds as edges; that is, edges are weighted by valence and nodes are labeled by atom type. The dataset consists of two classes (active and inactive), indicating whether or not a molecule is active against HIV. Fig. 15 shows some example graphs of active and inactive molecules. We use the edge weights to match the graphs, and Fig. 16 shows the geodesic between one pair of graphs in the original space and in the graph space. The deformation along the graph-space geodesic follows a more natural path.
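Under the Frobenius metric and a finite, isometric permutation action, a geodesic between two graphs in the quotient space can be obtained by aligning one graph to the other and interpolating linearly, whereas the original-space path interpolates without alignment. The Python sketch below contrasts the two, with a brute-force `align` standing in for a real matcher (feasible only for tiny graphs); the function names and the toy weighted path, whose weights play the role of valences, are hypothetical.

```python
import itertools
import numpy as np

def align(A, B):
    """Brute-force stand-in for graph matching: permute B's nodes to best match A."""
    best, best_d = B, np.inf
    for perm in itertools.permutations(range(A.shape[0])):
        p = list(perm)
        Bp = B[np.ix_(p, p)]
        d = np.linalg.norm(A - Bp)    # Frobenius distance for this relabeling
        if d < best_d:
            best, best_d = Bp, d
    return best

def geodesics(A, B, n_steps=5):
    """Linear paths from A to B: unaligned (original space) and aligned (graph space)."""
    B_star = align(A, B)
    ts = np.linspace(0.0, 1.0, n_steps)
    original = [(1 - t) * A + t * B for t in ts]
    quotient = [(1 - t) * A + t * B_star for t in ts]
    return original, quotient

# Usage: two labelings of the same weighted path graph.
A = np.array([[0, 2, 0], [2, 0, 1], [0, 1, 0]], dtype=float)
p = [2, 0, 1]
B = A[np.ix_(p, p)]
orig_path, graph_path = geodesics(A, B)
```

Here `B` is only a relabeling of `A`, so the graph-space path stays at `A`, while the original-space midpoint is a spurious blend of two labelings.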
The complex structure of molecules results in a high-dimensional representation in the graph space, but this can be handled by PCA. We use the first 22 and 21 principal components for the active and inactive classes, respectively, which contain roughly 80% of the total variance, to represent and reconstruct these molecules. As shown in Fig. 17, the original graphs can be successfully reconstructed with these smaller dimensions. Fig. 18 shows the percentage of variance in the top components versus the number of principal components. The detailed cutoff values can be found in Table II.




Class  Number of principal components at increasing variance cutoffs (last column: 100%)
Active  16  22  31  50
Inactive  15  21  31  190
This dataset consists of a training set and a validation set of 250 elements each, and a test set of 1,500 elements, for a total of 2,000 elements: 1,600 inactive and 400 active, distributed uniformly over the subsets. We classify this dataset using an SVM classifier and the pairwise graph distances (Fig. 19) in the graph space. To include node attributes (atom types) in the analysis, we adopt a binary distance for nodes: the distance between two nodes is zero if they represent exactly the same atom, and one otherwise. In this experiment, we use the weighting parameter in Eqn. 8 to balance the edge information with the node attributes. The classification results on the test data are given in Table III and are compared to those presented in the original paper [riesen2008iam].
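Since Eqn. 8 is not reproduced here, the following is a hypothetical edge-plus-node cost that combines the Frobenius edge term with the binary atom distance through a weight `lam`, minimized by brute force over node permutations. It only illustrates how a binary node distance can enter the matching and should not be read as the paper's exact Eqn. 8; the function name and toy molecules are also hypothetical. Graphs of different sizes would first need padding with null nodes (for example, with a dedicated null label).

```python
import itertools
import numpy as np

def attributed_distance(A, a_labels, B, b_labels, lam):
    """HYPOTHETICAL edge + node matching cost, minimized over all node permutations.

    cost(p) = ||A - P B P^T||_F^2 + lam * #{i : a_labels[i] != b_labels[p[i]]}
    """
    n = A.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        edge_cost = np.sum((A - B[np.ix_(p, p)]) ** 2)
        # Binary node distance: 0 for the same atom type, 1 otherwise.
        node_cost = sum(a_labels[i] != b_labels[p[i]] for i in range(n))
        best = min(best, edge_cost + lam * node_cost)
    return float(np.sqrt(best))

# Usage: a single C-O bond versus a C-C bond (edge weight = valence).
bond = np.array([[0, 1], [1, 0]], dtype=float)
d_same = attributed_distance(bond, ["C", "O"], bond, ["O", "C"], lam=0.5)
d_diff = attributed_distance(bond, ["C", "O"], bond, ["C", "C"], lam=0.5)
```

Relabeling the same molecule costs nothing, while replacing one atom adds `lam` to the squared cost, so `d_diff` equals the square root of 0.5.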
Method  Letter (low distortion)  Letter (medium distortion)  Letter (high distortion)  Molecule
Algorithm 1  98.5%  96.4%  93.6%  99.6%
[riesen2008iam]  99.6%  94.0%  90.0%  97.3%
VI-C Video Graphs
The third example comes from the representation of cooking videos as pattern theoretic graphs, as developed in [souza2015temporally]. There are 1,020 graphs representing 102 different cooking-related video clips; specifically, each video clip has 10 graphs that provide multiple interpretations of the clip [souza2015temporally]. The nodes in a graph represent features extracted from the video, and the edges represent their interactions or relationships. For example, a node at the lowest level can be a histogram of oriented gradients (HOG) or histogram of optical flow (HOF) feature. At the second level, a node can be an object such as a bowl or a cup, and at the highest level a node can be an action such as stir or pour. Edge weights represent the strength of the interaction between nodes. Some examples of such graphs can be found in Fig. 20.
We treat each video clip as a separate class and classify videos using their graphical representations. We perform this classification with a nearest-neighbor classifier under the pairwise graph distance in the graph space. To use node attributes, we also impose a binary distance on the node level or the node name, with different weights. The leave-one-out classification results are given in Table IV.
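Given a precomputed matrix of pairwise graph distances, leave-one-out nearest-neighbor classification takes only a few lines. The sketch below assumes a 1-nearest-neighbor rule (the number of neighbors is not stated in the text) and uses a hypothetical function name, toy distance matrix, and clip labels; the node-level or node-name variants would change only how the distance matrix is computed.

```python
import numpy as np

def loo_nearest_neighbor_accuracy(D, labels):
    """Leave-one-out 1-nearest-neighbor accuracy from a pairwise distance matrix."""
    D = np.array(D, dtype=float)        # copy, so the caller's matrix is untouched
    labels = np.asarray(labels)
    np.fill_diagonal(D, np.inf)         # a graph may not serve as its own neighbor
    nearest = np.argmin(D, axis=1)
    return float(np.mean(labels[nearest] == labels))

# Usage: two well-separated pairs of graphs from two hypothetical clips.
D = np.array([[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]], dtype=float)
acc = loo_nearest_neighbor_accuracy(D, ["clip1", "clip1", "clip2", "clip2"])
```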
Features  Edge (E) only  E + NodeLevel  E + NodeLevel  E + NodeName
Accuracy  71.3%  72.3%  73.0%  73.2%
VI-D Wikipedia Graphs
Our last example comes from communication networks of the Chinese Wikipedia [konect:2017:wiki_talk_zh, konect:sun_2016_49561]. In these graphs, nodes represent users of the Chinese Wikipedia, and an edge (0 or 1) denotes whether one user left a message on another user's talk page at a certain time stamp. We take monthly graphs from the year 2004, resulting in a sample of 12 graphs. On average, each graph has around 300 nodes and 431 edges. We compute the mean in the original space and in the graph space, with results shown in Fig. 21. The top panel is simply the average of the adjacency matrices; it is complex, and it is hard to discern any pattern in it. The bottom panel shows the mean in graph space; this graph shows clear clusters of users, implying that there are prominent subsets of users who actively interact with others in their own clusters.
The results of PCA on these graphs are shown in Fig. 22. They show that most user interactions are stable and remain unchanged, while the principal variations in the data come from only a handful of active users.
VII Discussion & Conclusion
In this paper, we presented a novel framework for learning and analyzing the structure of graphs. The quotient space formulation removes the nuisance permutation variability. Because the permutation group acts isometrically, the quotient space inherits a metric that enables metric-based statistical analysis of graphs: geodesics, means, PCA, and Gaussian-type models.
The tools developed in this paper are useful in several contexts. For instance, one can use them to analyze methods in geometric deep learning, where both data and inferences can involve graphs in different forms. Low-dimensional Euclidean representations of graphs enable direct use of more sophisticated statistical models, including many deep learning architectures, and the ability to reconstruct full graphs from these representations is important for synthesizing new graphs.
One limitation of the current formulation is that the edge attributes are restricted to be Euclidean. Future work includes extending the framework to graphs with more general edge attributes.
Acknowledgment
The authors would like to thank the creators of different public datasets used in this paper. The authors also thank Dr. Adam Duncan for his contributions in the implementation of a preliminary version of the approach.