A Convolutional Neural Network into graph space

Abstract

Convolutional neural networks (CNNs) have, within a few decades, outperformed previous state-of-the-art methods in many classification tasks. However, as originally formalised, CNNs are bound to operate on Euclidean spaces: convolution is a signal-processing operation defined on Euclidean domains. This has restricted the main uses of deep learning to Euclidean-structured data such as sound or images.

And yet, numerous application fields of computer science (among which network analysis, computational social science, chemo-informatics and computer graphics) produce data that are not defined on Euclidean domains, such as graphs, networks or manifolds.

In this paper we propose a new convolutional neural network architecture defined directly in graph space: both the convolution and pooling operators are defined in the graph domain. We show that it can be trained in a back-propagation context.

Experimental results show that our model reaches state-of-the-art performance on simple tasks. It is robust to changes of the graph domain and compares favourably with other Euclidean and non-Euclidean convolutional architectures.

1 Introduction

Graphs are frequently used in various fields of computer science, since they constitute a universal modeling tool which allows the description of structured data: the handled objects and their relations are described in a single, human-readable formalism. Hence, tools for supervised graph classification and graph mining are required in many applications such as pattern recognition [22], chemical component analysis [9] and structured data retrieval [20].

1.1 Graph Classification

Graph classifiers can be divided into two categories, according to whether the classifier operates in a graph space or in a vector space.

Graph space

Graph space classification consists in finding a metric (within the graph space) to evaluate the dissimilarity between two graphs. This metric can later be used in a K-Nearest Neighbour context, where the distances between the object to be classified and the elements of the learning database form the basis for classification. Computing the similarity or dissimilarity between two graphs requires computing and evaluating the "best" matching between them. Since exact isomorphism rarely occurs in pattern analysis applications, the matching process must be error-tolerant, i.e., it must tolerate differences in the topology and/or its labelling. For instance, in the Graph Edit Distance (GED) problem [22], the graph matching process and the dissimilarity computation are linked through the introduction of a set of graph edit operations. Each edit operation is characterized by a cost, and the dissimilarity measure is the total cost of the least expensive set of operations that transforms one graph into the other. In [22, 3], the GED is shown to be equivalent to a Quadratic Assignment Problem (QAP). Since error-tolerant graph matching problems are NP-hard, most research has long focused on developing accurate and efficient approximate algorithms. In [3], thanks to this quadratic formulation, two well-known graph matching methods, the Integer Projected Fixed Point method [14] and the Graduated Non-Convexity and Concavity Procedure [15], are applied to GED. The heuristic of [14] improves an initial solution by solving a linear sum assignment problem (LSAP) and a relaxed QAP in which the binary constraints are relaxed to the continuous domain; the algorithm iterates through gradient descent, using the Hungarian algorithm to solve the LSAP, followed by a line search. In [15], a path-following algorithm approximates the solution of a QAP by considering a convex-concave relaxation through a modified quadratic function.

Vector space

Vector space graph classification consists in representing graphs as vectors in order to classify them.

A first approach consists in transforming the initial structural problem into a common statistical pattern recognition one by describing the graphs with vectors in a Euclidean space [16]. In such a context, some features (vertex degree, label occurrence histograms, etc.) are extracted from the graph. Hence, the graph is projected into a Euclidean space and classical machine learning algorithms can be applied. Such approaches suffer from a main drawback: to obtain a satisfactory description of the topological structure and of the graph content, the number of such features has to be very large, and dimensionality issues occur.

Another possible approach also consists in projecting the graphs into a Euclidean space of a given dimension, but using a distance matrix between each pair of graphs. In such cases, a dissimilarity measure between graphs has to be designed [5]. Kernels can be derived from the distance matrix; this is the case for the multidimensional scaling methods proposed in [23].

Alternatively, graph embedding can be implemented implicitly through kernel-based machine learning algorithms. In kernel approaches, an explicit data representation is of secondary interest: rather than defining individual representations for each pattern or object, the data at hand are represented by pairwise comparisons only. The graphs are thus implicitly, rather than explicitly, projected into a Euclidean space, without the embedding function ever being made explicit. More formally, under given conditions, a similarity function can be replaced by a graph kernel function. Most kernel methods can only process kernel values produced by symmetric and positive definite kernel functions. Many kernels have been proposed in the literature [18, 9]. In most cases, the graph is embedded in a feature space composed of label sequences obtained through a graph traversal; according to this traversal, the kernel value is then computed by measuring the similarity between label sequences. Even if such approaches have proven to achieve high performance, they suffer from a lack of interpretability: it is very difficult to come back to graph space from the kernel space. This is known as the "pre-image" problem.

1.2 Euclidean and geometric deep learning

Deep learning has achieved a remarkable performance breakthrough in several fields, most notably in speech recognition, natural language processing, and computer vision. In particular, convolutional neural network (CNN) architectures currently produce state-of-the-art performance on a variety of image analysis tasks such as object detection and recognition. Most deep learning research has so far focused on 1D, 2D or 3D Euclidean-structured data such as acoustic signals, images or videos.

Recently, there has been an increasing interest in geometric deep learning, which attempts to generalize deep learning methods to non-Euclidean structured data such as graphs and manifolds, with a variety of applications in network analysis, computational social science or computer graphics. Graph neural networks are one possible way to implement explicit graph embedding: the neural network takes a graph as input and outputs a vector, which can then be used for classification. Moreover, graph neural networks learn this explicit embedding according to a given learning criterion.

1.3 Graph Neural Networks

These neural networks often try to apply convolution to graphs so as to mimic classical convolutional neural networks. Defining convolution on graph space is a tedious theoretical task: there is no straightforward definition. However, one can identify two families of definitions in the existing literature. The first family (spectral approaches) relies on the convolution theorem, which states that the convolution operator in the spatial domain is equivalent to the product operator in the frequency domain. Although this theorem was only proven on Euclidean spaces, a group of approaches in the literature postulates its validity on graph space. A graph frequency domain is accessed through diagonalization of its Laplacian $L = D - A$ ($D$ and $A$ respectively being the degree and adjacency matrices of the graph). Such approaches have two main limitations. The first is their sensitivity to topological variations: a slight deformation of the graph structure changes the resulting convolution signal drastically. The second is that there is no Fast Fourier Transform on graph space: as stated above, accessing the graph frequency domain relies on matrix diagonalization, and therefore inversion, which is a costly operation.
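As a concrete illustration of the spectral family, the sketch below filters a graph signal in the Laplacian eigenbasis (a generic illustration, not any specific published model; the filter function is an arbitrary assumption):

import numpy as np

def spectral_graph_filter(A, x, filter_fn):
    """Filter a graph signal x in the Laplacian eigenbasis (graph 'frequency' domain)."""
    D = np.diag(A.sum(axis=1))
    L = D - A                                     # combinatorial Laplacian L = D - A
    eigvals, U = np.linalg.eigh(L)                # diagonalization: the costly step
    x_hat = U.T @ x                               # graph Fourier transform of the signal
    return U @ (filter_fn(eigvals) * x_hat)       # filter in frequency, transform back

# toy usage: a low-pass filter on a 4-node cycle graph
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
x = np.random.rand(4)
print(spectral_graph_filter(A, x, lambda lam: np.exp(-lam)))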

These drawbacks exist because convolution is applied to the graph implicitly, through its frequency domain. A simple way to avoid them is to apply convolution directly in the spatial domain. The second family of approaches (the spatial ones) tries to come up with analogies of the original convolution definition. However, existing approaches often degrade the graphs and therefore do not fully exploit their structural information.

In this paper, we propose a graph convolution operator which operates solely on graph space. This is made possible by using graph matching to define the local convolution operation. In doing so, we try to establish a link between two scientific communities that respectively work on graphs and on deep learning. More specifically, we define graph-based computations using operators from the graph matching literature within a deep learning (neural network) framework.

2 State of the Art

This section offers a review of existing graph neural network definitions. Every graph neural network layer can be written as a non-linear function:

$H^{(l+1)} = f(H^{(l)}, A)$

where $H^{(l)}$ is the matrix of node features at layer $l$ and $A$ is the adjacency matrix of the graph.

As an example, let's consider the following very simple form of a layer-wise propagation rule:

$f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$

where $W^{(l)}$ is a trainable weight matrix and $\sigma$ is a non-linear activation function like the ReLU.

Multiplying the input with $A$ now corresponds to aggregating, for each node, the features of its neighbours from layer $l$. This is also called "average neighbour messages" in the literature, and it acts like passing averaged node features from one layer to the next. In [11], a better (symmetric) normalization of the adjacency matrix is proposed, i.e.

$f(H^{(l)}, A) = \sigma(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)})$

with $\hat{A} = A + I$, where $I$ is the identity matrix and $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$. A per-neighbour normalization is thus performed instead of a simple average: the normalization varies across neighbours. The overall time complexity of this model is $\mathcal{O}(|E|)$ ($E$ being the set of edges).
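As an illustration, here is a minimal NumPy sketch of the symmetrically normalized propagation rule written above (a sketch, not the reference implementation of [11]; names follow the notation of this section):

import numpy as np

def gcn_layer(H, A, W, activation=lambda x: np.maximum(x, 0)):
    """One propagation step: sigma(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_hat = A_hat.sum(axis=1)                         # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_hat))        # D_hat^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt          # symmetric normalization
    return activation(A_norm @ H @ W)

# toy usage: a 4-node path graph, 3 input features, 2 output features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.rand(4, 3)
W = np.random.rand(3, 2)
print(gcn_layer(H, A, W).shape)  # (4, 2)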

More operations have been investigated in the literature [19]. A complete family of operators can be used:

  • $I$: the identity operator does not consider the structure of the graph and does not provide any aggregation. Used alone, this operator turns the GNN into a collection of completely independent MLPs, one per node feature vector.

  • $A$: the adjacency operator gathers information on the node neighbourhood (1 hop).

  • $D$: the degree operator gathers information on the node degree; $D$ is the node degree matrix (a diagonal matrix).

  • $A^{(j)}$: powers of the adjacency matrix. $A^{(j)}$ encodes the $2^j$-hop neighbourhood of each node and allows us to aggregate local information at different scales, which is useful in regular graphs.

  • $U$: the matrix filled with ones. This average operator allows information to be broadcast globally at each layer, thus giving the GNN the ability to recover average degrees or, more generally, moments of local graph properties.

Let us denote by $\mathcal{A} = \{I, A, D, A^{(1)}, \dots, A^{(J)}, U\}$ this family of operators. A GNN layer is defined as:

$H^{(l+1)} = \sigma\Big(\sum_{B \in \mathcal{A}} B\, H^{(l)}\, \theta_B^{(l)}\Big)$

where the $\theta_B^{(l)}$ are trainable parameters.
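Under the reconstruction above, such an operator-family layer can be sketched as follows (the operator set and names are assumptions based on [19], not the authors' exact implementation):

import numpy as np

def gnn_operator_layer(H, operators, thetas, activation=np.tanh):
    """Sum the contributions B @ H @ theta_B of every operator B in the family."""
    out = sum(B @ H @ theta for B, theta in zip(operators, thetas))
    return activation(out)

# toy usage on a 4-node graph with the operators I, A and D
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
I = np.eye(4)
D = np.diag(A.sum(axis=1))
H = np.random.rand(4, 3)
thetas = [np.random.rand(3, 2) for _ in range(3)]
print(gnn_operator_layer(H, [I, A, D], thetas).shape)  # (4, 2)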

Key distinctions lie in how the different approaches aggregate messages. So far, the proposals have aggregated the neighbour messages by taking their (weighted) average, but is it possible to do better? In [10], a GNN called GraphSAGE is proposed, in which the aggregation of neighbour information is more elaborate. The general aggregation scheme can be written through an aggregation function AGG applied to the neighbour embeddings:

Let $\mathcal{N}(v)$ denote the set of nodes in the 1-hop neighbourhood of node $v$ and $h_u$ the embedding of a node $u$ (a minimal code sketch of the first two aggregators is given after this list).

  • mean: $\mathrm{AGG} = \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} h_u$, the element-wise average of the neighbour embeddings.

  • max: $\mathrm{AGG} = \max(\{h_u, u \in \mathcal{N}(v)\})$. Neighbour vectors are stacked into a matrix and an element-wise max pooling is applied.

  • LSTM: $\mathrm{AGG} = \mathrm{LSTM}([h_u, u \in \pi(\mathcal{N}(v))])$, where $\pi$ is a random permutation. The idea is to feed the LSTM a sequence composed of the neighbour embeddings: the input sequence contains $|\mathcal{N}(v)|$ vectors and is randomly permuted by $\pi$.
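The mean and max aggregators can be sketched as follows (illustrative code, not GraphSAGE's reference implementation):

import numpy as np

def aggregate_mean(neighbor_embeddings):
    """Mean aggregator: element-wise average of the neighbour embeddings."""
    return np.mean(neighbor_embeddings, axis=0)

def aggregate_max(neighbor_embeddings):
    """Max aggregator: stack neighbour vectors and take an element-wise maximum."""
    return np.max(neighbor_embeddings, axis=0)

# toy usage: three 4-dimensional neighbour embeddings of some node v
h_neighbors = np.random.rand(3, 4)
print(aggregate_mean(h_neighbors))
print(aggregate_max(h_neighbors))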

In [17], the graph structure is locally embedded into a vector space, and the distribution of local structures in that space is estimated by a Gaussian Mixture Model. The weighting function is then expressed as a mixture of Gaussians, whose parameters (covariance matrices and mean vectors) are learnt during the training of the neural network.

A notable variant of GNNs is the graph attention network (GAT), first proposed in [25]. This model includes a self-attention mechanism to evaluate the individual importance of adjacent nodes; it can therefore be applied to graph nodes of different degrees by assigning arbitrary weights to the neighbours [25].

For further reading, good surveys about graph neural networks have been published [28, 27, 26].

Deadlocks, contributions and motivations. From the literature, two main deadlocks can be identified. First, in many related works [11, 19, 25], edge features are not well considered. However, edge information is of first interest to boost the structural knowledge in the computation of node embeddings. Second, most of the aforementioned approaches do not take full advantage of the graph topology [17, 11]: the graph structure is locally embedded into a vector space (i.e. the tangent space at a given point of a Riemannian manifold). In this paper, we propose CNN architectures that remain in the graph domain. In particular, we design a convolution operator on graph space through the solution of a graph matching problem. The problem of graph matching under node and pairwise constraints is fundamental to capture topological information: it takes into account the node and edge features along with their neighbourhood structure. Consequently, graph matching-based convolution can release the deadlocks related to edge information integration, sensitivity to domain changes and Euclidean space projection. Graph matching can be seen as adding local constraints to the machine learning problem. We promote a truly novel class of neural network architectures in which layers contain a combinatorial optimization scheme that plays a fundamental role in the construction of the entire network. Consequently, we highlight the interplay between machine learning and combinatorial optimization.

3 Graph Convolutional Neural Network

3.1 Notation

Frequently used notations are summarized in Table 1.

Notation Description
An input graph
A filter graph
Neighbourhood subgraph rooted at vertex in
, Vertices in graph
An edge in graph between and
A vertex in
An edge in between and
Labelling function for vertices
Labelling function for edges
A filter graph and its associated weights
Vertex label of
Vertex label of parametrized by
Cardinality of
Kronecker delta of and
Table 1: Frequently used notations

3.2 Graph matching

To define our convolution operator, we must first define the graph matching function that will be used pointwise.

Graph matching problem

Let and be attributed graphs: and
(1a)
subject to (1b)
(1c)
(1d)
(1e)

The similarity function is defined as follows:

(2a)
(2b)
(2c)
(2d)
(2e)
Let denote an assignment of element (edge or vertex) to some element in :
(3a)
(3b)
(3c)
(3d)

The similarity function can be rewritten as follows:

(4a)
(4b)
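To make the role of the assignment concrete, the sketch below scores a candidate node assignment between two small attributed graphs by summing node similarities and the similarities of preserved edges (the similarity functions and graph representation are illustrative assumptions, not the exact cost model of Equations (1)-(4)):

import networkx as nx

def matching_score(g1, g2, assignment, node_sim, edge_sim):
    """Score a node assignment {v in g1 -> u in g2}: sum of node similarities
    plus similarities of the edge pairs mapped onto each other."""
    score = sum(node_sim(g1.nodes[v], g2.nodes[assignment[v]]) for v in assignment)
    for (a, b) in g1.edges:
        if a in assignment and b in assignment and g2.has_edge(assignment[a], assignment[b]):
            score += edge_sim(g1.edges[a, b], g2.edges[assignment[a], assignment[b]])
    return score

# toy usage with scalar vertex attributes and unlabelled edges
node_sim = lambda x, y: 1.0 if abs(x["attr"] - y["attr"]) < 0.5 else 0.0
edge_sim = lambda x, y: 1.0  # reward preserved adjacency

g1 = nx.Graph(); g1.add_node(0, attr=0.1); g1.add_node(1, attr=0.9); g1.add_edge(0, 1)
g2 = nx.Graph(); g2.add_node("a", attr=0.2); g2.add_node("b", attr=1.0); g2.add_edge("a", "b")
print(matching_score(g1, g2, {0: "a", 1: "b"}, node_sim, edge_sim))  # 3.0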

3.3 Graph convolution based on graph matching

Now that our matching operator is formulated, we can apply it to an input graph to compute the result of a convolution.

Let and be attributed graphs: and . and are respectively referred to as the input graph and the filter graph.

Graph convolution operator

The graph convolution operator is a function and is defined as follows:

(5a)
with (5b)
(5c)

where and are defined as follows.

Vertex neighbourhood graph (n-hops)

This function defines the neighbourhood (which is a subgraph) of a given vertex in the input graph:

(6a)
with (6b)
and (6c)

Edge attribute in convolved graph

score is a function mapping an edge to its matching score in the graph matching solution (GMS) that was found. The problem is that an edge might be assigned multiple times:

(7)

potentially contains more than one element. Therefore, score can be defined as follows:

(8a)
with (8b)

3.4 Convolution layer

Now that the convolution operator is defined, it is possible to use it as a base to build a convolution layer. This layer can be included in a graph neural network.

Graph convolution filter: the filter graph

A graph convolution filter is an attributed graph. Its role is analogous to that of a vanilla CNN kernel: it modifies the output and is itself modified through backpropagation. Every attribute function is parametrized by a weight vector.

(9a)
with (9b)
(9c)

Graph convolution layer

A convolution layer is a set of convolution filters applied to the same input graph. The output of the layer consists of all the filter results (analogous to Euclidean convolution feature maps) stacked up.

Let be the output function of the layer s.t.:

(10a)
with (10b)
(10c)
(10d)
(10e)
The stacking function keeps a single graph structure and concatenates the vertex/edge attributes. The output of the layer is therefore a graph with the same topology as the input but whose attributes are vectors composed of the attributes produced by every filter output.

Graph convolution can be seen as a step-by-step process (shown in Figure 1). The first step is neighbourhood extraction: for each vertex of the input graph, its neighbourhood graph is extracted; it is composed of every neighbour within a given range (1 hop away, but possibly n hops away). This neighbourhood graph and the filter graph are then matched, and the matching score becomes the output of the convolution at that vertex.
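This pointwise process can be sketched schematically as follows, assuming a matching routine such as the one sketched in Subsection 3.2 (the solver itself is abstracted away; names are illustrative):

import networkx as nx

def graph_convolution(g_in, g_filter, match, hops=1):
    """For each vertex: extract its n-hop neighbourhood subgraph, match it
    against the filter graph, and keep the matching score."""
    out = {}
    for v in g_in.nodes:
        neighbourhood = nx.ego_graph(g_in, v, radius=hops)  # n-hop subgraph around v
        out[v] = match(neighbourhood, g_filter)             # matching score at v
    return out  # one scalar per vertex, i.e. the feature map of this filter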

Figure 1: Computing graph convolution
Definition 1

Graph convolution differentiation. Let the input graph and the output of a convolution layer (called Conv) be related s.t.:

(11a)
(11b)

To simplify notation, let's consider the output of the Conv layer to be the vertex and edge labelling functions, as neither the vertex set nor the edge set changes during convolution. The output will be noted as in Equation 12b.

Let be a loss function (for example mean-squared error or categorical cross-entropy). Let’s suppose Conv is involved in the calculation of such that:

where the two composed functions are respectively the processing before and after Conv. In order to minimize the loss, its gradient must be calculated with respect to the filter; this gradient will then be used to modify the filter itself. For the purpose of the calculation, let Conv be the output function of the Conv layer. This output function is defined w.r.t. the graph labelling functions:

(12a)
(12b)
(12c)
(12d)
(12e)

The error gradient for the filter is calculated using the chain rule:

(13a)
(13b)
(13c)

Let’s assume exists and is known. Therefore, only is to be calculated.

First of all, we need to expand . Let’s expand first for a given vertex :
(14a)
(14b)
(14c)
(14d)
(14e)
Then let’s expand :
(14f)
(14g)
If is , the same rewriting as in Equation 14c applies:
(14h)
(14i)
(14j)
If is avg:
(14k)

Let’s differentiate with respect to . :

(15a)
(15b)

Now let’s differentiate . and :

If is :

(16a)

If is avg:

(17a)

In any case:

(18a)

Now, can be calculated:

(19a)
(19b)
w.r.t. Eq 18a (19c)
(20a)
(20b)

Finally, let’s suppose is parameterized with vector . In this case, is to be calculated:

(21a)
(21b)
(21c)

Let’s assume exists and is known. has already been evaluated. Therefore, only is to be calculated. :

(22a)
(22b)

If is , :

(23a)

If is avg, :

(24a)

In any case:

(25)

3.5 About graph matching differentiation

In Definition 1, a differentiation of the convolution operator is proposed. This differentiation does not take into account the dependencies between the optimal graph matching and the filter variables. As these variables are used to compute the possible matchings, such dependencies clearly exist. Nevertheless, the matching solver in use (see Subsection 3.9) is not differentiable, at least a priori. We therefore treated the optimal matching as a constant in the gradient computation with respect to these variables, by means of the change of variable in Eq. 14c.

3.6 A "no edge matching" version of the graph convolution layer

This section presents a degraded model: it ignores topology at the local level by not matching edges, thereby reducing the graph matching problem to a node assignment problem inside a given neighbourhood. One concern about this simplification could be that we no longer take advantage of the graph topology; however, topological information is still used when computing vertex neighbourhoods. Additionally, this model has a lower time complexity since edge information is not taken into account (see details in Subsection 3.9).
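Under this simplification, each pointwise matching reduces to a linear assignment between neighbourhood vertices and filter vertices, which can be sketched with SciPy's Hungarian solver (the node cost used here, a squared attribute difference, is an illustrative assumption):

import numpy as np
from scipy.optimize import linear_sum_assignment

def node_only_matching_score(neigh_attrs, filter_weights):
    """Assign neighbourhood vertices to filter vertices without matching edges.
    neigh_attrs:    (n, d) array of vertex attributes of the neighbourhood graph
    filter_weights: (m, d) array of learnable vertex attributes of the filter graph
    """
    # pairwise node costs (illustrative: squared Euclidean distance)
    costs = ((neigh_attrs[:, None, :] - filter_weights[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(costs)   # optimal node assignment
    return -costs[rows, cols].sum()             # higher score = better match

# toy usage: 3-node neighbourhood, 2-node filter, scalar attributes
neigh = np.array([[0.1], [0.8], [0.4]])
filt = np.array([[0.0], [1.0]])
print(node_only_matching_score(neigh, filt))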

The graphs used are then 3-tuples, and the similarity function is simplified as follows:

(26a)
(26b)

As a consequence of removing the edge attributes from the filter graph, its parameters reduce to a vector (with as many parameters as vertices). The filter is defined as follows:

(27a)
with (27b)

The output function of the filter is defined as follows:

(28a)
(28b)

3.7 Graph pooling

As in Euclidean convolutional neural networks, we want to implement not only convolution layers but also pooling/downsampling layers. In the existing literature, downsampling is viewed as graph coarsening [4]. A recurrent choice of coarsening algorithm is Graclus [8] (used in [17, 7]).

We propose to use a community detection algorithm (the Louvain method [2]) as the base of our graph pooling layer. The Louvain method deals with weighted graphs; in our case, edge weights are computed as the scalar product of the attributes of the incident vertices. This choice follows the intuition that the higher the scalar product of two node attribute vectors, the higher the probability that these vertices fall in the same cluster (because a higher scalar product implies vector similarity).
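A possible sketch of this pooling step, using the Louvain implementation shipped with recent NetworkX versions and averaging vertex attributes inside each community (the aggregation choice and attribute names are illustrative assumptions, not necessarily the exact pooling used in our experiments):

import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities

def louvain_pooling(g):
    """Weight edges by the scalar product of incident vertex attributes,
    cluster with the Louvain method, then average attributes per cluster."""
    for u, v in g.edges:
        g.edges[u, v]["weight"] = float(np.dot(g.nodes[u]["attr"], g.nodes[v]["attr"]))
    communities = louvain_communities(g, weight="weight")
    pooled = nx.Graph()
    for i, com in enumerate(communities):
        pooled.add_node(i, attr=np.mean([g.nodes[u]["attr"] for u in com], axis=0))
    # connect two pooled nodes whenever an edge crosses their communities
    for i, ci in enumerate(communities):
        for j, cj in enumerate(communities):
            if i < j and any(g.has_edge(u, v) for u in ci for v in cj):
                pooled.add_edge(i, j)
    return pooled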

3.8 Hyperparameterization

As in any neural network, graph neural networks have parameters that are not optimized by gradient descent.

The first one is the graph filter itself (its number of nodes and its adjacency matrix). The number of nodes in the graph filter is analogous to the size of a classic convolution kernel: a 3×3 kernel is equivalent to a 9-node filter graph with grid-like adjacency. The second hyperparameter is the size of the extracted neighbourhood graphs, i.e. the maximum node distance within a given neighbourhood. A 2-hop neighbourhood includes the nodes that can be reached from the origin node in two hops or less.
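For instance, such a grid-like filter topology can be built directly with NetworkX (a 3×3 grid, i.e. 9 nodes, each carrying one trainable attribute; the random initialization is an assumption):

import numpy as np
import networkx as nx

# 9-node filter graph with grid-like (3x3) adjacency, one trainable scalar per node
filter_graph = nx.grid_2d_graph(3, 3)
for node in filter_graph.nodes:
    filter_graph.nodes[node]["attr"] = np.random.randn()  # trainable weight
print(filter_graph.number_of_nodes(), filter_graph.number_of_edges())  # 9 12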

These hyperparameters could be optimized through grid or random search. However, to restrain the scope of our study, we adopt the following postulate: a graph filter should be congruent with the extracted neighbourhoods; in other words, the two should have equal sizes and, as far as possible, identical topologies. This postulate comes from classic convolution, where each kernel coefficient is matched with one and only one image coefficient.

3.9 Choosing the graph matching solver

The algorithm used to solve the graph matching problem is a critical element of the model. The first reason is that it is potentially the component with the highest complexity, since graph matching problems are up to NP-hard. Additionally, graph matching is solved as many times as there are vertices in the input graph (the size of each problem being that of the corresponding vertex neighbourhood).

We opted for a bipartite (BP) graph matching algorithm [21]. Its complexity is among the lowest (polynomial time) for solving error-tolerant graph matching problems suboptimally.

The bipartite graph matching algorithm reduces graph matching to vertex matching by embedding an estimation of the edge costs into the vertex costs. This edge cost estimation is computed by solving an edge assignment problem for every node-matching possibility. Therefore, BP has to solve as many assignment problems as there are edge costs.

We used a variant of BP called Square Fast BP [24], where the cost matrix for vertex matching has a size given by the numbers of vertices in the filter graph and in the neighbourhood graph; assuming both the neighbourhood and the filter graphs are complete, each matching problem remains polynomial in these sizes.

As a consequence, worst case complexity with fast bipartite matching is the following:

Some preliminary experiments showed impractical computation times for the full model. As a first workaround, the experimental part of this paper focuses on the "no edge matching" model. This workaround keeps processing time at an acceptable level (suitable for small classification experiments), since edge cost estimation by edge matching is no longer required. The simplified model has the following pointwise complexity:

4 Experimental work

In this section, we evaluate the model along several axes, starting with a simple classification task on MNIST digit images.

4.1 Baselines

Our approach was compared with two other approaches:

  • Vanilla CNN layer

  • [17] mixture model graph CNN.

The same network topology was used for all approaches. It consists of classical Conv/Pool blocks linearly connected; Figure 2 shows the exact network structure in use. In the case of graph convolution, the equivalents of the convolution filters are filter graphs with a fixed number of nodes, and pooling becomes a 4-node pooling. The number of filter nodes is set depending on the average graph connectivity of a given dataset: if the average number of neighbours in a dataset is 9, 9-node filters are used.

The last layer is a global pooling layer. As in the Euclidean case, it consists in aggregating each filter feature map into one scalar value. In our case, feature maps are aggregated by taking their average value.

conv, 32
maxpool/2
conv, 64
maxpool/2
conv, 128
maxpool/2
global avgpool
fc, n

Figure 2: Network structure used for graph convolution experiments

4.2 Data

Quantitative experiments in this section are carried out on digit images from the MNIST dataset [13]. We chose this dataset because it has been used in the graph convolution literature. MNIST is a good "hello world" machine learning (ML) dataset: it helps to iterate quickly on the learning model, and performance gathered from experiments on MNIST gives a first indication of how the model might perform on much harder and larger datasets such as ImageNet.

In addition to the original MNIST dataset, a rotated version was used [12]. To compare results with MNIST-rotated, MNIST-original has to be adapted, as the MNIST-rotated proportions are unusual: 10 000, 2 000 and 50 000 images respectively for train, validation and test, whereas MNIST-original has 60 000 and 10 000 images respectively for train/validation and test. We therefore used MNIST-reduced, a resampled version of MNIST-original that matches the MNIST-rotated ratios between subset cardinalities: MNIST-reduced and MNIST-rotated both have 10 000, 2 000 and 50 000 images respectively for train, validation and test. All the set cardinalities are summed up in Table 2. Note that the test set of MNIST-reduced is larger than the training set by a factor of 5; consequently, the generalization ability is better assessed.

Dataset Training set Validation set Testing set
MNIST-original 48 000 12 000 10 000
MNIST-rotated 10 000 2 000 50 000
MNIST-reduced 10 000 2 000 50 000
MNIST-mixed 10 000 2 000 50 000
Table 2: Different MNIST-based graph datasets

Lastly, to test rotation invariance, a third MNIST-based dataset was added: MNIST-mixed. It was generated by combining the MNIST-reduced train and validation sets with the MNIST-rotated test set. It is designed so that the models are trained on rotation-free images but tested on rotated images.

As MNIST is an image dataset, a graph-based representation of the images has to be chosen. The representations used in [17] are superpixel graphs and grid graphs. We used grid graphs (images resized to a smaller resolution) and generated 75-superpixel Region Adjacency Graphs (RAG) using the SLIC algorithm [1], with superpixel adjacency as edges (see Table 3). Sample graphs are depicted in Figure 3.
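A possible sketch of this image-to-graph conversion with scikit-image, segmenting into roughly 75 SLIC superpixels and building a region adjacency graph (depending on the scikit-image version, the RAG utilities live in skimage.graph or skimage.future.graph; this is an illustration, not the exact preprocessing pipeline used here):

import numpy as np
from skimage.segmentation import slic
from skimage.graph import rag_mean_color  # older versions: skimage.future.graph

def image_to_superpixel_rag(image, n_segments=75):
    """Segment the image into ~75 SLIC superpixels and build a RAG whose
    vertices carry the average superpixel intensity."""
    img_rgb = np.stack([image] * 3, axis=-1)                 # SLIC/RAG expect a colour image
    labels = slic(img_rgb, n_segments=n_segments, compactness=10, start_label=1)
    rag = rag_mean_color(img_rgb, labels)                    # nodes: mean colour; edges: adjacency
    return rag, labels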

Figure 3: MNIST graphs. Top is grid, bottom is 75 superpixels RAG. Red symbolizes vertex frontiers and green shows edges.
Representation Nb nodes Vertex attributes Edge attributes
grid Pixel intensities Relative polar coordinates
75 superpixels 75 (average) Average superpixel intensities
Table 3: MNIST representations

4.3 Parameterization

The following hyperparameters were set after preliminary tests: models are trained for 50 epochs using Adaptive Moment estimation (Adam) gradient descent with a fixed learning rate. The neighbourhood reach in use is 1 hop, and the filter size was set in accordance with the average neighbourhood size (9 nodes).

4.4 Protocol

Following experiments were conducted:

Experiment 1

Models are tested on the MNIST digit image classification task.

Experiment 2

Several neighbourhood connectivities (1 and 2 hops) are tested with our model.

Experiment 3

Rotation invariance is investigated. Spatial information in our datasets is conveyed by edge attributes; since our "no edge" model ignores edge attributes, it is theoretically rotation-invariant. Experiment 3 aims at experimentally validating this claim by training models on unrotated images and testing them on rotated ones. The MNIST-mixed set is used to this end.

Experiment 4

A sample handcrafted filter is visualized on some MNIST example images.

Experiment 5

Graph-based methods are tested on regular grids and on irregular graphs (75-superpixel RAGs) to assess sensitivity to domain changes.

As stated before, and because of technical limitations, experiments involving the MNIST datasets focus on the first two MNIST classes (referred to as MNIST-2class).

4.5 Results

Results on MNIST-2class are listed in Table 4. They include classification from both grid graphs and SLIC 75-superpixel graphs. The table shows results for each dataset using a classic CNN, MoNet [17] and our method.

Experiment 1: MNIST

On MNIST-2class, our model competes within a 3% margin with the baselines used.

Experiment 2: Neighbourhood size

Extending the neighbourhood size did not have any significant effect on performance (see Table 5).

Experiment 3: Rotation invariance

On MNIST-mixed, no performance loss was observed at test time for our method. This is especially visible in the grid graph results, where only the classic CNN and MoNet show a 10-percent loss. A simple explanation of how this invariance is obtained is that our graph convolution filters are non-oriented, since edge attributes are ignored.

Representation Dataset CNN (Valid / Test) MoNet (Valid / Test) Ours (Valid / Test)
grid MNIST reduced 100% / 99.88% 97.56% / 99.40% 99.51% / 97.76%
grid MNIST mixed 100% / 89.87% 97.76% / 88.90% 99.27% / 95.63%
75 superpixels MNIST reduced NA 94.13% / 92.70% 94.13% / 89.53%
75 superpixels MNIST mixed NA 94.13% / 92.90% 94.62% / 94.17%
Table 4: Recognition rates on MNIST-2class
Representation 1 hop (Valid / Test) 2 hops (Valid / Test)
grid 99.02% / 97.55% 98.04% / 96.47%
75 superpixels 97.55% / 93.74% 96.82% / 93.62%
Table 5: Recognition rates for different neighbourhood sizes on MNIST-reduced 2-class

Experiment 4: Visualizing graph convolution on images

As additional experimental material, we visualized the result of a handcrafted filter on images. As in Euclidean convolution, the most straightforward filter operation is edge detection. It is usually performed with the Sobel operator, which computes the intensity gradient at each spatial point of the image.

A potential graph convolution equivalent is a 2-node filter graph whose nodes carry two contrasting attributes. The intuition behind this filter is that its nodes will be matched respectively to the lowest and the highest intensities in the neighbourhood. As a consequence, this filter finds the highest node attribute difference in every node neighbourhood, making it a sort of eager edge detection filter.
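Under the "no edge matching" model, this behaviour amounts to taking the difference between the highest and lowest intensities in each neighbourhood; the sketch below implements that interpretation (an assumption about the exact filter weights, which are not reproduced here):

import networkx as nx

def eager_edge_detector(grid_graph, attr="intensity", hops=1):
    """For each vertex, output the largest intensity difference found in its
    neighbourhood (the behaviour described for the 2-node filter)."""
    out = {}
    for v in grid_graph.nodes:
        neigh = nx.ego_graph(grid_graph, v, radius=hops)
        values = [neigh.nodes[u][attr] for u in neigh.nodes]
        out[v] = max(values) - min(values)
    return out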

We applied this filter to grid graphs in order to visualize the output graph as an image (the graph-to-image transformation being trivial). Figure 4 shows example applications of this filter on both original and rotated examples; it suggests rotation invariance.

Figure 4: MNIST graph convolution examples (respectively original, convoluted and rotated convoluted versions)

Experiment 5: Testing graph convolution across domain

A particular concern for graph convolution operators is their sensitivity to domain changes, i.e. their capacity to identify similarities on irregular graphs. Both graph convolutions tested show little performance loss between the regular (grid) and irregular (75-superpixel RAG) results.

Training duration

As mentioned in Subsection 3.9, the complexity of the model makes experiments tedious to run. Epoch durations are given in Table 6.

Representation CNN MoNet Ours
grid 1s 1s 17min 29s
75 superpixels NA 1s 2min 42s
Table 6: Epoch durations on MNIST 2class (Models use different implementations/hardware: CNN is Keras on GPU, MoNet is Theano on GPU and Ours is Keras on CPU)

5 Conclusion and perspectives

In this paper, a graph convolutional neural network layer is proposed and tested in a simplified form.

Our model performance is at state of the art level on simple tasks. It shows robustness with respect to graph domain changes.

The following improvements could greatly benefit performance and computational cost. The bipartite solver is not the most suitable choice for our use: its complexity seems too high for efficient application. Using a less complex solver would allow the full model to be used in practice and applied to larger graphs. Using the edge information would probably enhance performance significantly; moreover, it would probably help with solving more complex problems.

Another point of improvement concerns differentiation: the solver operator is not differentiable, so the gradient must be approximated by neglecting the contribution of the solver's intermediary states. Finding a differentiable solver would enhance the trainability of the model.

Addressing these issues would not only enhance the current degraded version of the model but also allow the full model to be implemented in a usable form. The full model has the peculiarity of learning edge attributes as well as vertex attributes; to our knowledge, it is the only graph convolution formulation that proposes to modify the spatiality of edge attributes.

Finally, our downsampling layer would justify a whole study of its own. It would be interesting to study the quality of the downsampled graphs, but also the effect of weighting edges according to vertex similarity.

6 Code

Code for running the model can be found at https://github.com/prafiny/graphconv

References

  1. R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk (2012) SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11), pp. 2274–2282.
  2. V. D. Blondel, J. Guillaume, R. Lambiotte and E. Lefebvre (2008) Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008 (10), pp. 10008.
  3. S. Bougleux, L. Brun, V. Carletti, P. Foggia, B. Gaüzère and M. Vento (2017) Graph edit distance as a quadratic assignment problem. Pattern Recognition Letters 87, pp. 38–46.
  4. M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam and P. Vandergheynst (2016) Geometric deep learning: going beyond Euclidean data. CoRR abs/1611.08097.
  5. H. Bunke and K. Riesen (2008) Graph classification on dissimilarity space embedding. In N. da Vitoria Lobo et al. (Eds.), Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshop, SSPR & SPR 2008, Orlando, USA, Proceedings, pp. 2.
  6. N. da Vitoria Lobo, T. Kasparis, F. Roli, J. T. Kwok, M. Georgiopoulos, G. C. Anagnostopoulos and M. Loog (Eds.) (2008) Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshop, SSPR & SPR 2008, Orlando, USA, Proceedings. Lecture Notes in Computer Science, Vol. 5342, Springer. ISBN 978-3-540-89688-3.
  7. M. Defferrard, X. Bresson and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. CoRR abs/1606.09375.
  8. I. S. Dhillon, Y. Guan and B. Kulis (2007) Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (11), pp. 1944–1957.
  9. B. Gaüzère, L. Brun and D. Villemin (2012) Two new graph kernels in chemoinformatics. Pattern Recognition Letters 33 (15), pp. 2038–2047.
  10. W. L. Hamilton, R. Ying and J. Leskovec (2017) Inductive representation learning on large graphs. CoRR abs/1706.02216.
  11. T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. CoRR abs/1609.02907.
  12. H. Larochelle, D. Erhan, A. Courville, J. Bergstra and Y. Bengio (2007) An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning, pp. 473–480.
  13. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
  14. M. Leordeanu, M. Hebert and R. Sukthankar (2009) An integer projected fixed point method for graph matching and MAP inference. In Proceedings of Neural Information Processing Systems, pp. 1114–1122.
  15. Z. Liu and H. Qiao (2014) GNCCP – graduated nonconvexity and concavity procedure. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, pp. 1258–1267.
  16. M. M. Luqman, J. Ramel, J. Lladós and T. Brouard (2013) Fuzzy multilevel graph embedding. Pattern Recognition 46 (2), pp. 551–565.
  17. F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda and M. M. Bronstein (2016) Geometric deep learning on graphs and manifolds using mixture model CNNs. CoRR abs/1611.08402.
  18. M. Neuhaus and H. Bunke (2007) Bridging the Gap Between Graph Edit Distance and Kernel Machines. Series in Machine Perception and Artificial Intelligence, Vol. 68, World Scientific. ISBN 978-981-270-817-5.
  19. A. Nowak, S. Villar, A. S. Bandeira and J. Bruna (2017) A note on learning algorithms for quadratic assignment with graph neural networks. CoRR abs/1706.07450.
  20. R. Raveaux, J. Burie and J. Ogier (2013) Structured representations in a content based image retrieval context. Journal of Visual Communication and Image Representation 24 (8), pp. 1252–1268.
  21. K. Riesen and H. Bunke (2009) Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing 27 (7), pp. 950–959.
  22. K. Riesen (2015) Structural Pattern Recognition with Graph Edit Distance – Approximation Algorithms and Applications. Advances in Computer Vision and Pattern Recognition, Springer.
  23. V. Roth, J. Laub, M. Kawanabe and J. M. Buhmann (2003) Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (12), pp. 1540–1551.
  24. F. Serratosa (2015) Speeding up fast bipartite graph matching through a new cost matrix. International Journal of Pattern Recognition and Artificial Intelligence 29 (02), pp. 1550010.
  25. P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò and Y. Bengio (2017) Graph attention networks. arXiv:1710.10903.
  26. Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang and P. S. Yu (2019) A comprehensive survey on graph neural networks. CoRR abs/1901.00596.
  27. Z. Zhang, P. Cui and W. Zhu (2018) Deep learning on graphs: a survey. CoRR abs/1812.04202.
  28. J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu and M. Sun (2018) Graph neural networks: a review of methods and applications. CoRR abs/1812.08434.