A Simple Baseline Algorithm for Graph Classification

Abstract

Graph classification has recently received a lot of attention from various fields of machine learning, e.g., kernel methods, sequential modeling, or graph embedding. All these approaches offer promising results with different respective strengths and weaknesses. However, most of them rely on complex mathematics and require heavy computational power to achieve their best performance. We propose a simple and fast algorithm based on the spectral decomposition of the graph Laplacian to perform graph classification and get a first reference score for a dataset. We show that this method obtains competitive results compared to state-of-the-art algorithms.

1 Introduction

Graph classification methods can schematically be divided into three categories: graph kernels, sequential methods and embedding methods. In this section, we briefly present these different approaches, focusing on methods that only use the structure of the graph and no exogenous information, such as node features, to perform classification, as we only want to compare the capacity of the algorithms to capture structural information.

Kernel methods

Kernel methods [13, 14, 12, 11] perform pairwise comparisons between the graphs of the dataset and apply a classifier, usually a support vector machine (SVM), on the similarity matrix. In order to keep the number of comparisons tractable when the number of graphs is large, they often use the Nyström algorithm [18] to compute a low-rank approximation of the similarity matrix. The key is to construct an efficient kernel that can be applied to graphs of varying sizes and captures useful features for the downstream classification.

Sequential methods

Some methods tackle the varying sizes of graphs by processing them as a sequence of nodes. The earliest models used random walk based representations [4, 19]. More recently, [7] or [20] transform a graph into a sequence of fixed-size vectors, corresponding to its nodes, which is fed to a recurrent neural network. The two main challenges in this approach are the design of the embedding function for the nodes and the order in which the embeddings are given to the recurrent neural network.

Embedding methods

Embedding methods [6, 1, 5, 10] derive a fixed number of features for each graph, which is used as a vector representation for classification. Even though deriving a good set of features is often a difficult task, this approach has the benefit of being compatible with any standard classifier in a plug-and-play fashion (SVM, random forest, multilayer perceptron…). Our model belongs to this class of methods as we rely on spectral features of the graph.

2 Model

Let $G$ be an undirected and unweighted graph and $A$ its boolean adjacency matrix with respect to an arbitrary indexing of the nodes. $G$ is assumed to be connected; otherwise, we extract its largest connected component. Let $D$ be the diagonal matrix of node degrees. The normalized Laplacian of $G$ is defined as

$$L = I - D^{-1/2} A D^{-1/2} \qquad (1)$$

We use the $k$ smallest positive eigenvalues of $L$ (see footnote 1), sorted in ascending order, as input of the classifier. If the graph has fewer than $k+1$ nodes, we use right zero padding to get a vector of the appropriate dimension. We denote this embedding as spectral features (SF).
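As a concrete illustration, the embedding can be computed with a standard scientific Python stack. The sketch below uses networkx and numpy; the function name, the tolerance used to discard the zero eigenvalue, and the example graph are our own illustrative choices, not taken from the paper's implementation.

```python
import numpy as np
import networkx as nx

def spectral_features(graph: nx.Graph, k: int) -> np.ndarray:
    """k smallest positive eigenvalues of the normalized Laplacian (eq. 1),
    in ascending order, right-padded with zeros if the graph is too small."""
    # Keep only the largest connected component, as assumed in the text.
    nodes = max(nx.connected_components(graph), key=len)
    subgraph = graph.subgraph(nodes)
    # networkx returns L = I - D^{-1/2} A D^{-1/2} as a sparse matrix.
    lap = nx.normalized_laplacian_matrix(subgraph).toarray()
    eigenvalues = np.linalg.eigvalsh(lap)            # ascending order
    positive = eigenvalues[eigenvalues > 1e-9][:k]   # discard the zero eigenvalue
    return np.pad(positive, (0, k - len(positive)))  # right zero padding

# Example: embed a 4-node path graph into a 5-dimensional vector.
print(spectral_features(nx.path_graph(4), k=5))
```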

A major benefit of this representation is that, unlike traditional node embedding [2], it does not depend on the indexing of the nodes. Besides, it can be interpreted in different ways. In [3], each eigenvalue of the Laplacian corresponds to the energy level of a stable configuration of the nodes in the embedding space; the lower the energy, the more stable the configuration. In [17], these eigenvalues correspond to frequencies associated with a Fourier decomposition of any signal living on the vertices of the graph. Thus, the truncation of the Fourier decomposition acts as a low-pass filter on the signal.

The choice of the classifier is left to the discretion of the user. In our experiments, we chose a random forest classifier (RFC), which offers a good trade-off between computational speed and accuracy. Results with several other common classifiers are displayed in appendix A.

An illustration of the model is proposed in figure 1.

Figure 1: Schematic view of our model. $L$ denotes the normalized Laplacian as in equation 1 and $\hat{y}$ the predicted class.

3 Experiments

Datasets

We evaluated our model against some standard datasets from biology: Mutag (MT), Predictive Toxicology Challenge (PTC), Enzymes (EZ), Proteins Full (PF), Dobson and Doig (DD) and National Cancer Institute (NCI1) [8]. All graphs represent chemical compounds. Nodes are molecular substructures (typically atoms) and edges represent connections between these substructures (chemical bond or spatial proximity). In MT, the compounds are either mutagenic or not, while in PTC, they are either carcinogens or not. EZ contains tertiary structures of proteins from the 6 Enzyme Commission top level classes. In DD, graphs represent secondary structures of proteins, being either enzymes or not. PF is a subset of DD where the largest graphs have been removed. In NCI1, compounds have either an anti-cancer activity or not. Statistics about the graphs are presented in table 1.

MT PTC EZ PF DD NCI1
graphs 188 344 600 1113 1178 4110
classes 2 2 6 2 2 2
bias (%) 66.5 55.8 16.7 59.6 58.7 50.0
avg. |V| 18 14 33 39 284 30
avg. |E| 39 15 124 146 1431 65
Table 1: Basic characteristics of the datasets. Bias indicates the proportion of the dominant class.

Experimental setup

Each dataset is divided into 10 folds such that the class proportions are preserved in each fold. These folds are then used for cross-validation, i.e., one fold serves as the testing set while the others compose the training set. Results are averaged over all testing sets. We built the folds using the scikit-learn [15] StratifiedKFold function with a fixed random seed in order to get reproducible results.

The embedding dimension $k$ is set to the average number of nodes of each dataset (see appendix B for additional experiments) and a single set of hyperparameters for the classifier is used for all datasets. We used the random forest classifier from scikit-learn with class_weight set to balanced. The other non-default hyperparameters were selected by randomized cross-validation over the different datasets (see table 3 for more details). All experiments were run on a laptop equipped with an Intel Core i7 vPro processor and 16GB of RAM.
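For concreteness, a minimal sketch of this protocol is given below, assuming the spectral features X and the labels y are already computed as numpy arrays. The seed and the non-default hyperparameter values are placeholders, not the ones actually used in the experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

def cross_validated_accuracy(X, y, seed=0):
    """10-fold stratified cross-validation; accuracy averaged over the test folds."""
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = []
    for train_index, test_index in folds.split(X, y):
        clf = RandomForestClassifier(class_weight="balanced", random_state=seed)
        clf.fit(X[train_index], y[train_index])
        scores.append(clf.score(X[test_index], y[test_index]))
    return np.mean(scores)
```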

Results

We compare our results (RFC) to those obtained by Earth Mover’s Distance [14] (EMD), Pyramid Match [14] (PM), Feature-Based [1] (FB), Dynamic-Based Features [6] (DyF) and Stochastic Graphlet Embedding [5] (SGE). All values are directly taken from the aforementioned papers as they used a setup similar to ours. For algorithms presenting results with and without node features, we reported the results without node features. For algorithms presenting results with several sets of hyper-parameters, we reported the results for the set of parameters that gave the best performance on the largest number of datasets. Results are reported in table 2.

MT PTC EZ PF DD NCI1
EMD 86.1 57.7 36.8 - - 72.7
PM 85.6 59.4 28.2 - 75.6 69.7
FB 84.7 55.6 29.0 70.0 - 62.9
DyF 86.3 56.2 26.6 73.1 - 66.6
SGE 87.2 60.0 40.7 - 76.6 -
SF + RFC 88.4 62.8 43.7 73.6 75.4 75.2

Table 2: Experimental accuracy (%) of different models and ours over standard molecular datasets.

We see that our model achieves good performance compared to the state of the art. It gives the best result on five out of the six datasets (MT, PTC, EZ, PF, NCI1). Besides, it did not require any intensive per-dataset hyperparameter tuning, as we used the same random forest configuration for all datasets.

Computation analysis

The results were obtained extremely quickly (some kernel methods cannot run within one day on DD, for example [5]). Embedding all graphs took a matter of minutes (most of it dedicated to DD, which has the largest graphs and the largest embedding dimension), while training and testing the random forest on all folds took less than a minute. Hence, the total time to run all described experiments was of the order of a few minutes.

Robustness analysis

In order to confirm the intrinsic quality of our spectral graph representation, we performed a robustness analysis of our model with respect to the classifier. To do so, we measured the marginal variation of accuracy with respect to some hyperparameters, the others being fixed.

To ensure that we only capture parameter sensitivity, we fixed the seed of the random forest for all experiments. See table 3 for the parameter grid and figure 2 for the results.
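This marginal-sensitivity protocol can be sketched as follows: each hyperparameter is swept along its grid while the others stay at their reference values. The grids and reference values below are illustrative placeholders standing in for table 3, and evaluate is any cross-validated scoring routine such as the one sketched in section 3.

```python
from sklearn.ensemble import RandomForestClassifier

# Placeholder reference values and grids standing in for table 3.
reference = {"n_estimators": 100, "max_depth": None, "bootstrap": True}
grids = {"n_estimators": [1, 10, 50, 100, 250, 500, 750, 1000],
         "bootstrap": [True, False]}

def marginal_accuracies(X, y, evaluate):
    """evaluate(clf, X, y) must return a cross-validated accuracy."""
    results = {}
    for name, grid in grids.items():
        accuracies = []
        for value in grid:
            # Vary one hyperparameter, keep the others at their reference values.
            params = dict(reference, **{name: value}, random_state=0)  # fixed RFC seed
            accuracies.append(evaluate(RandomForestClassifier(**params), X, y))
        results[name] = accuracies
    return results
```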

We see that our method is very robust to RFC hyperparameter variations. Outliers in the box plots are all due to highly improper parameter values.

RFC hyperparameters Hyperparameters grid
1, 10, 50, 100, 250, 500, 750, 1000
1, 2, 3, 4, 5, 6
1, 5, 10, 50, 100, 250, 500, 750, 1000
True, False
Table 3: Parameters grid for RFC. Bold values correspond to parameters used in the experimental setup and reference values for robustness analysis.
Figure 2: Box plots of the empirical distribution of the classification accuracy with respect to four hyperparameters of the RFC (box: quartiles, whiskers: confidence interval, isolated points: outliers).

4 Conclusion

We experimentally showed the interest of normalized Laplacian eigenvalues for graph classification. These features are easy to extract and can be combined with any other graph representation in order to improve model performance. We hope it will inspire new approaches to graph classification. Experimenting with permutation-invariant classifiers [9, 16] could be a natural continuation of this work, in order to properly include information from the eigenvectors of $L$, which are node-indexing dependent.

Acknowledgments

We would like to thank Thomas Bonald and Sebastien Razakarivony for their reviews and help. This work is supported by the company Safran through the CIFRE convention 2017/1317.

Appendix A Results for different classifiers

Besides RFC, we experimented with different standard classifiers combined with our spectral embedding, namely: $k$-nearest neighbors classifiers ($k$-NNC), a 2-layer perceptron with ReLU non-linearity (MLP), a support vector machine with one-versus-one classification (SVM) and a ridge regression classifier (RRC). Results are reported in table 4.

MT PTC EZ PF DD NCI1
SF + RFC 88.4 62.8 43.7 73.6 75.4 75.2
SF + 1-NNC 86.8 59.3 37.3 65.6 69.6 68.3
SF + 15-NNC 85.7 61.9 33.7 70.4 75.0 69.6
SF + MLP 86.3 60.5 31.8 71.6 75.6 62.3
SF + SVM 85.3 60.8 31.3 73.0 75.0 63.9
SF + RRC 84.2 59.6 26.7 71.5 75.0 62.2

Table 4: Accuracy (%) of different classifiers combined with the spectral features embedding.

As we can see, RFC provides the best results for all datasets except DD, where MLP has an accuracy of 75.6 against 75.4. Our intuition to explain these good results is that the decision tree classifier, which is at the core of RFC, is an algorithm based on level thresholding. As explained in section 2, our embedding represents a sequence of energy levels; being above or below a certain level is thus likely to be meaningful for classification.
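The plug-and-play nature of the embedding makes this comparison a one-line change per classifier. A sketch of the scikit-learn models involved is given below; the hidden layer sizes of the MLP and the other non-default values are guesses, not the exact configurations behind table 4.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = {
    "SF + RFC": RandomForestClassifier(class_weight="balanced"),
    "SF + 1-NNC": KNeighborsClassifier(n_neighbors=1),
    "SF + 15-NNC": KNeighborsClassifier(n_neighbors=15),
    "SF + MLP": MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu"),
    "SF + SVM": SVC(decision_function_shape="ovo"),  # one-versus-one
    "SF + RRC": RidgeClassifier(),
}

# Each classifier is then evaluated with the same cross-validation routine,
# e.g. a variant of cross_validated_accuracy from section 3.
```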

Appendix B Results for different embedding dimensions

We experimented with different embedding dimensions for RFC: $k \in \{1, 5, 10, 25, 50\}$. The hyperparameters are the same as in section 3. Results are reported in table 5.

k MT PTC EZ PF DD NCI1
1 76.2 56.1 23.8 64.0 57.2 58.2
5 86.8 62.5 39.0 69.6 73.9 72.5
10 86.8 61.4 42.8 71.7 75.5 75.5
25 88.4 62.8 42.7 72.8 75.7 75.2
50 88.4 62.8 43.7 73.6 75.1 75.2

Table 5: Accuracy (%) of RFC combined with the spectral features embedding for different dimensions.

We see that even the first energy level is sufficient to obtain a non-trivial classification. $k=10$ provides results competitive with the state of the art, while $k=25$ provides results relatively similar to $k=50$. We did not experiment with larger values of $k$ as it would mostly result in additional zero padding for most graphs. Note that embedding all graphs for the smaller dimensions took less than a minute in our experimental setting.
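Assuming the helpers sketched earlier (spectral_features and cross_validated_accuracy) and a list of networkx graphs with their labels, the sweep behind table 5 reduces to a short loop; the names graphs and labels below are placeholders for the loaded dataset.

```python
import numpy as np

# Recompute the spectral features for each value of k and rerun the evaluation.
for k in [1, 5, 10, 25, 50]:
    X = np.array([spectral_features(g, k) for g in graphs])
    print(k, cross_validated_accuracy(X, np.array(labels)))
```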

Footnotes

  1. The smallest eigenvalue of the normalized Laplacian of a connected graph is 0, with multiplicity one.

References

  1. Ian Barnett, Nishant Malik, Marieke L Kuijjer, Peter J Mucha, and Jukka-Pekka Onnela. Feature-based classification of networks. arXiv preprint arXiv:1610.05868, 2016.
  2. Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems, pages 585–591, 2002.
  3. Thomas Bonald, Alexandre Hollocou, and Marc Lelarge. Weighted spectral embedding of graphs. arXiv preprint arXiv:1809.11115, 2018.
  4. Jérôme Callut, Kevin Françoisse, Marco Saerens, and Pierre Dupont. Classification in graphs using discriminative random walks, 2008.
  5. Anjan Dutta and Hichem Sahbi. High order stochastic graphlet embedding for graph-based pattern recognition. arXiv preprint arXiv:1702.00156, 2017.
  6. Leonardo Gutierrez Gomez, Benjamin Chiem, and Jean-Charles Delvenne. Dynamics based features for graph classification. arXiv preprint arXiv:1705.10817, 2017.
  7. Yu Jin and Joseph F JaJa. Learning graph-level representations with gated recurrent neural networks. arXiv preprint arXiv:1805.07683, 2018.
  8. Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. Benchmark data sets for graph kernels, 2016. http://graphkernels.cs.tu-dortmund.de.
  9. Thomas Lucas, Corentin Tallec, Jakob Verbeek, and Yann Ollivier. Mixed batches and symmetric discriminators for gan training. arXiv preprint arXiv:1806.07185, 2018.
  10. Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. graph2vec: Learning distributed representations of graphs. CoRR, abs/1707.05005, 2017.
  11. Marion Neumann, Roman Garnett, Christian Bauckhage, and Kristian Kersting. Propagation kernels: efficient graph kernels from propagated information. Machine Learning, 102(2):209–245, 2016.
  12. Giannis Nikolentzos, Polykarpos Meladianos, Stratis Limnios, and Michalis Vazirgiannis. A degeneracy framework for graph similarity. In IJCAI, pages 2595–2601, 2018.
  13. Giannis Nikolentzos, Polykarpos Meladianos, Antoine Jean-Pierre Tixier, Konstantinos Skianis, and Michalis Vazirgiannis. Kernel graph convolutional neural networks. arXiv preprint arXiv:1710.10689, 2017.
  14. Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. Matching node embeddings for graph similarity. In AAAI, pages 2429–2435, 2017.
  15. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
  16. Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017.
  17. David I Shuman, Benjamin Ricaud, and Pierre Vandergheynst. Vertex-frequency analysis on graphs. Applied and Computational Harmonic Analysis, 40(2):260–291, 2016.
  18. Christopher KI Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, pages 682–688, 2001.
  19. Xiaohua Xu, Lin Lu, Ping He, Zhoujin Pan, and Cheng Jing. Protein classification using random walk on graph. In International Conference on Intelligent Computing, pages 180–184. Springer, 2012.
  20. Jiaxuan You, Rex Ying, Xiang Ren, William L Hamilton, and Jure Leskovec. Graphrnn: A deep generative model for graphs. arXiv preprint arXiv:1802.08773, 2018.