A Simple Baseline Algorithm for Graph Classification
Abstract
Graph classification has recently received a lot of attention from various fields of machine learning, e.g., kernel methods, sequential modeling or graph embedding. All these approaches offer promising results with different respective strengths and weaknesses. However, most of them rely on complex mathematics and require heavy computational power to achieve their best performance. We propose a simple and fast algorithm, based on the spectral decomposition of the graph Laplacian, to perform graph classification and get a first reference score for a dataset. We show that this method obtains competitive results compared to state-of-the-art algorithms.
1 Introduction
Graph classification methods can schematically be divided into three categories: graph kernels, sequential methods and embedding methods. In this section, we briefly present these different approaches, focusing on methods that only use the structure of the graph and no exogenous information, such as node features, since we only want to compare the capacity of the algorithms to capture structural information.
Kernel methods
Kernel methods [13, 14, 12, 11] perform pairwise comparisons between the graphs of the dataset and apply a classifier, usually a support vector machine (SVM), on the similarity matrix. In order to keep the number of comparisons tractable when the number of graphs is large, they often use the Nyström algorithm [18] to compute a low-rank approximation of the similarity matrix. The key is to construct an efficient kernel that can be applied to graphs of varying sizes and that captures useful features for the downstream classification.
Sequential methods
Some methods tackle the varying sizes of graphs by processing them as sequences of nodes. The earliest models used random-walk-based representations [4, 19]. More recently, [7] and [20] transform a graph into a sequence of fixed-size vectors, corresponding to its nodes, which is fed to a recurrent neural network. The two main challenges in this approach are the design of the embedding function for the nodes and the order in which the embeddings are given to the recurrent neural network.
Embedding methods
Embedding methods [6, 1, 5, 10] derive a fixed number of features for each graph, which is used as a vector representation for classification. Even though deriving a good set of features is often a difficult task, this approach has the benefit of being compatible with any standard classifier in a plug-and-play fashion (SVM, random forest, multilayer perceptron, etc.). Our model belongs to this class of methods, as we rely on spectral features of the graph.
2 Model
Let $G$ be an undirected and unweighted graph and $A$ its boolean adjacency matrix with respect to an arbitrary indexing of the nodes. $G$ is assumed to be connected; otherwise, we extract its largest connected component. Let $D$ be the diagonal matrix of node degrees. The normalized Laplacian of $G$ is defined as
$$L = I - D^{-1/2} A D^{-1/2}. \qquad (1)$$
We use the $k$ smallest positive eigenvalues of $L$, sorted in increasing order, as the embedding of the graph. If the graph has fewer than $k + 1$ nodes, we use right zero padding to get a vector of the appropriate dimension: $(\lambda_1, \dots, \lambda_{n-1}, 0, \dots, 0) \in \mathbb{R}^k$. We denote this embedding as spectral features (SF).
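As a concrete illustration, the embedding above can be computed in a few lines of NumPy. This is only a sketch (a dense eigendecomposition of the full Laplacian, with a function name of our own choosing), not the authors' implementation:

```python
import numpy as np

def spectral_features(adjacency, k):
    """k smallest positive eigenvalues of the normalized Laplacian,
    right-padded with zeros when the graph has fewer than k + 1 nodes."""
    A = np.asarray(adjacency, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))        # assumes a connected graph
    # Normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigenvalues = np.linalg.eigvalsh(L)              # ascending order
    positive = eigenvalues[1:k + 1]                  # drop the single 0 eigenvalue
    return np.pad(positive, (0, k - len(positive)))  # right zero padding
```

For the path graph on three nodes, whose normalized Laplacian spectrum is $(0, 1, 2)$, `spectral_features(A, 4)` returns $(1, 2, 0, 0)$.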
A major benefit of this representation is that, unlike traditional node embeddings [2], it does not depend on the indexing of the nodes. Besides, it can be interpreted in different ways. In [3], each eigenvalue of the Laplacian corresponds to the energy level of a stable configuration of the nodes in the embedding space: the lower the energy, the more stable the configuration. In [17], these eigenvalues correspond to the frequencies of a Fourier decomposition of any signal living on the vertices of the graph. Thus, truncating the Fourier decomposition acts as a low-pass filter on the signal.
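The indexing invariance can be checked directly: permuting the rows and columns of the adjacency matrix leaves the spectrum of the normalized Laplacian unchanged. A small sanity check (illustrative code, not part of the original experiments):

```python
import numpy as np

def normalized_laplacian_spectrum(A):
    """Eigenvalues, in ascending order, of L = I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(A, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)

# Path graph on three nodes, then the same graph with the nodes re-indexed.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
perm = np.array([2, 0, 1])
A_perm = A[np.ix_(perm, perm)]
# The spectrum, hence the SF embedding, is unchanged by the re-indexing.
assert np.allclose(normalized_laplacian_spectrum(A),
                   normalized_laplacian_spectrum(A_perm))
```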
The choice of the classifier is left to the discretion of the user. In our experiments, we chose a random forest classifier (RFC) which offers a good computational speed versus accuracy tradeoff. Results with several other common classifiers are displayed in appendix A.
An illustration of the model is proposed in figure 1.
3 Experiments
Datasets
We evaluated our model on standard datasets from biology: Mutag (MT), Predictive Toxicology Challenge (PTC), Enzymes (EZ), Proteins Full (PF), Dobson and Doig (DD) and National Cancer Institute (NCI1) [8]. All graphs represent chemical compounds: nodes are molecular substructures (typically atoms) and edges represent connections between these substructures (chemical bonds or spatial proximity). In MT, the compounds are either mutagenic or not, while in PTC, they are either carcinogens or not. EZ contains tertiary structures of proteins from the 6 Enzyme Commission top-level classes. In DD, graphs represent secondary structures of proteins, which are either enzymes or not. PF is a subset of DD where the largest graphs have been removed. In NCI1, compounds either have an anti-cancer activity or not. Statistics about the graphs are presented in table 1.
MT  PTC  EZ  PF  DD  NCI1  
graphs  188  344  600  1113  1178  4110 
classes  2  2  6  2  2  2 
bias (%)  66.5  55.8  16.7  59.6  58.7  50.0 
avg. |V|  18  14  33  39  284  30 
avg. |E|  39  15  124  146  1431  65 
Experimental setup
Each dataset is divided into 10 folds such that the class proportions are preserved in each fold for all datasets. These folds are then used for cross-validation, i.e., one fold serves as the testing set while the other ones compose the training set. Results are averaged over all testing sets. We built the folds using scikit-learn's [15] StratifiedKFold function with a fixed random seed in order to get reproducible results.
The embedding dimension $k$ is set to the average number of nodes of each dataset (see appendix B for additional experiments), and a unique set of hyperparameters for the classifier is used for all datasets. We used the random forest classifier from scikit-learn with class_weight="balanced". The other non-default hyperparameters were selected by randomized cross-validation over the different datasets (see table 3 for more details). All experiments were run on a laptop equipped with an Intel Core i7 vPro processor and 16GB of RAM.
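A minimal sketch of this evaluation protocol, assuming the spectral embeddings have already been computed; the seed value and the remaining forest hyperparameters are placeholders here, not the tuned values of table 3:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

def evaluate(embeddings, labels, seed=0):
    """Mean test accuracy of a random forest over 10 stratified folds,
    the folds being fixed by the seed for reproducibility."""
    embeddings, labels = np.asarray(embeddings), np.asarray(labels)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in folds.split(embeddings, labels):
        clf = RandomForestClassifier(class_weight="balanced", random_state=seed)
        clf.fit(embeddings[train_idx], labels[train_idx])
        accuracies.append(clf.score(embeddings[test_idx], labels[test_idx]))
    return float(np.mean(accuracies))
```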
Results
We compare our results (SF + RFC) to those obtained by Earth Mover's Distance [14] (EMD), Pyramid Match [14] (PM), Feature-Based [1] (FB), Dynamics-Based Features [6] (DyF) and Stochastic Graphlet Embedding [5] (SGE). All values are taken directly from the aforementioned papers, as they used a setup similar to ours. For algorithms reporting results both with and without node features, we report the results without node features. For algorithms reporting results with several sets of hyperparameters, we report the results for the set of parameters that gave the best performance on the largest number of datasets. Results are reported in table 2.
MT  PTC  EZ  PF  DD  NCI1  
EMD  86.1  57.7  36.8      72.7 
PM  85.6  59.4  28.2    75.6  69.7 
FB  84.7  55.6  29.0  70.0    62.9 
DyF  86.3  56.2  26.6  73.1    66.6 
SGE  87.2  60.0  40.7    76.6   
SF + RFC  88.4  62.8  43.7  73.6  75.4  75.2 

We see that our model achieves good performance compared to the state of the art. It gives the best result on five out of the six datasets (MT, PTC, EZ, PF, NCI1). Besides, it did not require any intensive per-dataset hyperparameter tuning, as we used the same random forest for all datasets.
Computation analysis
The results were obtained extremely quickly (some kernel methods cannot run within one day on DD, for example [5]). Embedding all graphs took only a few minutes (most of it dedicated to DD, which has the largest graphs and the largest embedding dimension), while training and testing the random forest on all folds took less than a minute. Hence, the total time to run all described experiments was a matter of minutes.
Robustness analysis
In order to confirm the intrinsic quality of our spectral graph representation, we performed a robustness analysis of our model with respect to the classifier. To do so, we measured the marginal variation of accuracy with respect to each hyperparameter, the others being fixed.
To ensure that we only capture parameter sensitivity, we fixed the seed of the random forest for all experiments. See table 3 for the parameter grid and figure 2 for the results.
We see that our method is very robust to RFC hyperparameter variability. Outliers in the boxplots are all due to highly improper parameter values.
RFC hyperparameters  Hyperparameters grid 

1, 10, 50, 100, 250, 500, 750, 1000  
1, 2, 3, 4, 5, 6  
1, 5, 10, 50, 100, 250, 500, 750, 1000  
True, False 
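The marginal analysis above can be sketched as follows; the hyperparameter names and grids passed in are illustrative examples, not the exact grid of table 3:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def marginal_accuracy(X, y, param_name, grid, fixed=None, seed=0):
    """Cross-validated accuracy when one RFC hyperparameter sweeps its
    grid while the others (and the seed) stay fixed."""
    fixed = dict(fixed or {})
    scores = []
    for value in grid:
        params = {**fixed, param_name: value}
        clf = RandomForestClassifier(random_state=seed, **params)
        scores.append(cross_val_score(clf, X, y, cv=10).mean())
    return scores
```

For example, `marginal_accuracy(X, y, "n_estimators", [1, 10, 100])` traces the accuracy as the number of trees grows, all other settings unchanged.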
Conclusion
We experimentally showed the interest of normalized Laplacian eigenvalues for graph classification. This feature is easy to extract and can be combined with any other graph representation in order to improve model performance. We hope it will inspire new approaches to graph classification. Experimenting with permutation-invariant classifiers [9, 16] could be a natural continuation of this work, in order to properly include information from the eigenvectors of $L$, which are node-indexing dependent.
Acknowledgments
We would like to thank Thomas Bonald and Sebastien Razakarivony for their reviews and help. This work is supported by the company Safran through the CIFRE convention 2017/1317.
Appendix A Results for different classifiers
Besides RFC, we experimented with different standard classifiers combined with our spectral embedding, namely: nearest-neighbors classifier (NNC), 2-layer perceptron with ReLU non-linearity (MLP), support vector machine with one-versus-one classification (SVM) and ridge regression classifier (RRC). Results are reported in table 4.
MT  PTC  EZ  PF  DD  NCI1  

SF + RFC  88.4  62.8  43.7  73.6  75.4  75.2 
SF + 1NNC  86.8  59.3  37.3  65.6  69.6  68.3 
SF + 15NNC  85.7  61.9  33.7  70.4  75.0  69.6 
SF + MLP  86.3  60.5  31.8  71.6  75.6  62.3 
SF + SVM  85.3  60.8  31.3  73.0  75.0  63.9 
SF + RRC  84.2  59.6  26.7  71.5  75.0  62.2 

As we can see, RFC provides the best results for all datasets except DD, where MLP has an accuracy of 75.6 against 75.4. Our intuition to explain these good results is that the decision tree, which is at the core of RFC, is an algorithm based on level thresholding. As explained in section 2, our embedding represents a sequence of energy levels; being above or below a certain level is thus likely to be meaningful for classification.
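A sketch of this comparison using scikit-learn; the hidden-layer width of the MLP and the other unspecified settings are our own assumptions, not the values used in the paper:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

CLASSIFIERS = {
    "RFC": RandomForestClassifier(class_weight="balanced"),
    "1-NNC": KNeighborsClassifier(n_neighbors=1),
    "15-NNC": KNeighborsClassifier(n_neighbors=15),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,)),  # ReLU by default
    "SVM": SVC(decision_function_shape="ovo"),
    "RRC": RidgeClassifier(),
}

def compare_classifiers(X, y):
    """Mean 10-fold cross-validated accuracy of each classifier on the
    spectral features X."""
    return {name: cross_val_score(clf, X, y, cv=10).mean()
            for name, clf in CLASSIFIERS.items()}
```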
Appendix B Results for different embedding dimensions
We experimented with different embedding dimensions for RFC: $k \in \{1, 5, 10, 25, 50\}$. The hyperparameters are the same as in section 3. Results are reported in table 5.
$k$  MT  PTC  EZ  PF  DD  NCI1  

1  76.2  56.1  23.8  64.0  57.2  58.2 
5  86.8  62.5  39.0  69.6  73.9  72.5 
10  86.8  61.4  42.8  71.7  75.5  75.5 
25  88.4  62.8  42.7  72.8  75.7  75.2 
50  88.4  62.8  43.7  73.6  75.1  75.2 

We see that even the first energy level is sufficient to obtain a non-trivial classification. $k = 5$ already provides competitive results, while $k = 25$ provides results very similar to $k = 50$. We did not experiment with larger values of $k$, as it would mostly result in additional zero padding for most graphs. Note that embedding all graphs with $k = 50$ took less than a minute in our experimental setting.
Footnotes
 The smallest eigenvalue of the normalized Laplacian of a connected graph is 0, with multiplicity one.
References
 Ian Barnett, Nishant Malik, Marieke L Kuijjer, Peter J Mucha, and Jukka-Pekka Onnela. Feature-based classification of networks. arXiv preprint arXiv:1610.05868, 2016.
 Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems, pages 585–591, 2002.
 Thomas Bonald, Alexandre Hollocou, and Marc Lelarge. Weighted spectral embedding of graphs. arXiv preprint arXiv:1809.11115, 2018.
 Jérôme Callut, Kevin Françoisse, Marco Saerens, and Pierre Dupont. Classification in graphs using discriminative random walks, 2008.
 Anjan Dutta and Hichem Sahbi. High order stochastic graphlet embedding for graphbased pattern recognition. arXiv preprint arXiv:1702.00156, 2017.
 Leonardo Gutierrez Gomez, Benjamin Chiem, and Jean-Charles Delvenne. Dynamics based features for graph classification. arXiv preprint arXiv:1705.10817, 2017.
 Yu Jin and Joseph F JaJa. Learning graph-level representations with gated recurrent neural networks. arXiv preprint arXiv:1805.07683, 2018.
 Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. Benchmark data sets for graph kernels, 2016. http://graphkernels.cs.tu-dortmund.de.
 Thomas Lucas, Corentin Tallec, Jakob Verbeek, and Yann Ollivier. Mixed batches and symmetric discriminators for GAN training. arXiv preprint arXiv:1806.07185, 2018.
 Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. graph2vec: Learning distributed representations of graphs. CoRR, abs/1707.05005, 2017.
 Marion Neumann, Roman Garnett, Christian Bauckhage, and Kristian Kersting. Propagation kernels: efficient graph kernels from propagated information. Machine Learning, 102(2):209–245, 2016.
 Giannis Nikolentzos, Polykarpos Meladianos, Stratis Limnios, and Michalis Vazirgiannis. A degeneracy framework for graph similarity. In IJCAI, pages 2595–2601, 2018.
 Giannis Nikolentzos, Polykarpos Meladianos, Antoine Jean-Pierre Tixier, Konstantinos Skianis, and Michalis Vazirgiannis. Kernel graph convolutional neural networks. arXiv preprint arXiv:1710.10689, 2017.
 Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. Matching node embeddings for graph similarity. In AAAI, pages 2429–2435, 2017.
 Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
 Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017.
 David I Shuman, Benjamin Ricaud, and Pierre Vandergheynst. Vertex-frequency analysis on graphs. Applied and Computational Harmonic Analysis, 40(2):260–291, 2016.
 Christopher KI Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in neural information processing systems, pages 682–688, 2001.
 Xiaohua Xu, Lin Lu, Ping He, Zhoujin Pan, and Cheng Jing. Protein classification using random walk on graph. In International Conference on Intelligent Computing, pages 180–184. Springer, 2012.
 Jiaxuan You, Rex Ying, Xiang Ren, William L Hamilton, and Jure Leskovec. Graphrnn: A deep generative model for graphs. arXiv preprint arXiv:1802.08773, 2018.