Improving the Robustness of Graphs through Reinforcement Learning and Graph Neural Networks
Abstract
Graphs can be used to represent and reason about real-world systems, and a variety of metrics have been devised to quantify their global characteristics. An important property is robustness to failures and attacks, which is relevant for the infrastructure and communication networks that power modern society. Prior work on making topological modifications to a graph, e.g., adding edges, in order to increase robustness is typically based on local and spectral properties or a shallow search, since robustness is expensive to compute directly. However, such strategies are necessarily suboptimal. In this work, we present RNet–DQN, an approach that uses Reinforcement Learning to construct networks with improved robustness to random and targeted removals of nodes. In particular, the approach relies on changes in the estimated robustness as a reward signal and on Graph Neural Networks for representing states. Experiments on synthetic and real-world graphs show that this approach can deliver performance superior to existing methods while being much cheaper to evaluate and generalizing to out-of-sample graphs, as well as to larger out-of-distribution graphs in some cases. The approach is readily applicable to optimizing other global structural properties of graphs.
^{1} University College London
^{2} The Alan Turing Institute
^{3} University of Bologna
{v.darvariu, s.hailes, m.musolesi}@ucl.ac.uk
1 Introduction
Graphs are mathematical abstractions that can be used to model a variety of systems, from infrastructure and biological networks to social structures. Various methods for analyzing networks have been developed; these have often been used for understanding the systems themselves, and range from mathematical models of how families of graphs are generated Watts and Strogatz (1998); Barabási and Albert (1999) to measures of centrality for capturing the roles of vertices Bianchini, Gori, and Scarselli (2005) and global network characteristics Newman (2018), to name a few.
A measure that has attracted significant interest from researchers and practitioners is robustness Newman (2003) (sometimes called resilience), which is typically defined as the capacity of the graph to withstand random failures, targeted attacks on key nodes, or some combination thereof. A network is considered robust if a large fraction (critical fraction) of nodes have to be removed before it becomes disconnected Cohen et al. (2000), its diameter increases Albert, Jeong, and Barabási (2000), or its largest connected component diminishes in size Beygelzimer et al. (2005). Previous work has focused on the robustness of communication networks such as the Internet Cohen et al. (2001) and infrastructure networks used for transportation and energy distribution Cetinay, Devriendt, and Van Mieghem (2018).
In many practical cases, the goal is not to build robust networks from scratch, but to improve existing ones by modifying their structure. For example, Beygelzimer et al. (2005) approach this problem by considering edge addition or rewiring, based on random and preferential (wrt. node degree) modifications. Schneider et al. (2011) propose a “greedy” modification scheme based on random edge selection and swapping if the resilience metric improves. Another line of work focuses on the spectral decomposition of the graph Laplacian, and using properties such as the algebraic connectivity Wang and Van Mieghem (2008) and effective graph resistance Wang et al. (2014) to guide modifications. While simple and interpretable, these strategies may not yield the best solutions or generalize across networks with varying characteristics and sizes. Certainly, better solutions may be found by exhaustive search, but the time complexity of exploring all the possible topologies and the cost of computing the metric render this strategy infeasible. We thus ask whether generalizable robustness improvement strategies can be learned.
In this work, we propose addressing this question in the Reinforcement Learning (RL) framework. We formalize the process of adding edges to a graph as a Markov Decision Process (MDP) in which rewards are proportional to the improvement measured through a graph-level objective function. We consider two objective functions that quantify robustness as the critical fraction of the network in the presence of random failures and targeted attacks. Inspired by recent successes of RL in solving combinatorial optimization problems on graphs Bello et al. (2016); Khalil et al. (2017), we make use of Graph Neural Network (GNN) architectures Gilmer et al. (2017) together with the Deep Q-Network (DQN) Mnih et al. (2015) algorithm. Recent work in goal-directed graph generation and improvement considers performing edge additions for adversarially attacking GNN classifiers Dai et al. (2018) and generating molecules with certain desirable properties using domain-specific rewards You et al. (2018a). In contrast, to the best of our knowledge, this is the first time that RL is used to learn how to optimize a global structural property of a graph through its construction. While in this paper we focus on robustness, other intrinsic global properties of graphs can be used as optimization targets.
The contribution of this paper is twofold. Firstly, we propose using RL as a framework for improving global structural properties of graphs, through formulating the Graph Improvement MDP (GIMDP). Secondly, focusing on the robustness of graphs under failures and attacks as a core case study, we offer an in-depth empirical evaluation that demonstrates significant advantages over existing approaches in this domain, both in terms of the quality of the solutions found and the time complexity of model evaluation. Since this approach addresses the problem of building robust networks with a DQN, we name it RNet–DQN.
The remainder of the paper is structured as follows. We provide the definitions of the GIMDP and the robustness measures in Section 2. Section 3 describes state and action representations for deep RL using GNNs. We describe our experimental setup in Section 4, and discuss our main results in Section 5. In Section 6 we review and compare the key works in this area. Finally, we conclude and offer a discussion of avenues for future work in Section 7.
2 Modeling Graph Robustness for Reinforcement Learning
MDP Preliminaries.
An MDP is one possible formalization of decision-making processes. The decision maker, called agent, interacts with an environment. When in a state $s \in \mathcal{S}$, the agent must take an action $a$ out of the set $\mathcal{A}(s)$ of valid ones, receiving a reward governed by the reward function $\mathcal{R}(s, a)$. Finally, the agent finds itself in a new state $s'$, depending on a transition model $\mathcal{P}$ that governs the joint probability distribution $\mathcal{P}(s' \mid s, a)$ of transitioning to state $s'$ after taking action $a$ in state $s$. This sequence of interactions gives rise to a trajectory. The agent's goal is to maximize the expected (possibly discounted) sum of rewards it receives over all trajectories. The tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$ defines this MDP, where $\gamma \in [0, 1]$ is a discount factor. We also define a policy $\pi(a \mid s)$, a distribution of actions over states, which fully determines the behavior of the agent. Given a policy $\pi$, the state-action value function $Q^{\pi}(s, a)$ is defined as the expected return when starting from $s$, taking action $a$, and subsequently following policy $\pi$.
Modeling Graph Improvement.
Let $\mathbb{G}^{N}$ be the set of labeled, undirected, unweighted graphs with $N$ nodes; each such graph $G = (V, E)$ consists of a vertex set $V$ and edge set $E$. We also let $\mathbb{G}^{N}_{C}$ be the subset of $\mathbb{G}^{N}$ containing the connected graphs. We also let $\mathcal{F} : \mathbb{G}^{N} \to [0, 1]$ be an objective function, and $L \in \mathbb{N}$ be a modification budget. Given an initial graph $G_0 \in \mathbb{G}^{N}_{C}$, the aim is to perform a series of $L$ edge additions to $G_0$ such that the resulting graph $G_*$ satisfies:
$$G_* = \operatorname*{arg\,max}_{G'} \mathcal{F}(G'),$$
where $G'$ ranges over the connected graphs obtainable from $G_0$ through $L$ edge additions.
This combinatorial optimization problem can be cast as a sequential decision-making process. In order to enable scaling to large graphs, the agent has to select a node at each step, and an edge is added to the graph after every two decisions Dai et al. (2018). Tasks are episodic; each episode proceeds for at most $2L$ steps. A trajectory visualization is shown in Figure 1. Formally, we map our problem to the Graph Improvement MDP (GIMDP) as follows:

State: The state $S_t$ is a tuple $(G_t, \sigma_t)$ containing the graph $G_t = (V, E_t)$ and an edge stub $\sigma_t$. $\sigma_t$ can be either the empty set $\emptyset$ or the singleton $\{\sigma\}$, where $\sigma \in V$.

Action: $A_t$ corresponds to the selection of a node in $V$. Letting the degree of node $v$ be $d_v$, available actions are defined as:
$$\mathcal{A}(S_t) = \begin{cases} \{v \in V \mid d_v < N - 1\}, & \text{if } \sigma_t = \emptyset, \\ \{v \in V \mid v \neq \sigma \wedge (\sigma, v) \notin E_t\}, & \text{if } \sigma_t = \{\sigma\}. \end{cases}$$

Transitions: The transition model is deterministic: selecting a node $v$ when $\sigma_t = \emptyset$ yields $S_{t+1} = (G_t, \{v\})$, while selecting $v$ when $\sigma_t = \{\sigma\}$ yields $S_{t+1} = ((V, E_t \cup \{(\sigma, v)\}), \emptyset)$.

Reward: The reward $R_t$ is defined as follows^{1}:
$$R_t = \begin{cases} \mathcal{F}(G_{2L}) - \mathcal{F}(G_0), & \text{if } t = 2L, \\ 0, & \text{otherwise.} \end{cases}$$
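As an illustration of these dynamics, the following is a minimal environment sketch (our own, with hypothetical naming; not the paper's implementation), using networkx and an arbitrary objective function supplied by the caller:

```python
import networkx as nx

class GraphImprovementEnv:
    """Minimal sketch of the GIMDP: an edge is added after every two
    node selections, and the reward F(G_T) - F(G_0) is given at the end."""

    def __init__(self, graph, objective_fn, budget):
        self.g0 = graph.copy()
        self.objective_fn = objective_fn  # maps a graph to a scalar
        self.budget = budget              # number of edges L to add
        self.reset()

    def reset(self):
        self.graph = self.g0.copy()
        self.stub = None                  # edge stub: None or a single node
        self.edges_added = 0
        return self.graph, self.stub

    def valid_actions(self):
        n = self.graph.number_of_nodes()
        if self.stub is None:
            # any node that is not already connected to every other node
            return [v for v in self.graph if self.graph.degree(v) < n - 1]
        # any node not equal to, nor already adjacent to, the stub
        return [v for v in self.graph
                if v != self.stub and not self.graph.has_edge(self.stub, v)]

    def step(self, node):
        assert node in self.valid_actions()
        if self.stub is None:             # first of the two decisions
            self.stub = node
            return (self.graph, self.stub), 0.0, False
        self.graph.add_edge(self.stub, node)
        self.stub = None
        self.edges_added += 1
        done = self.edges_added == self.budget
        # reward only at the end of the episode (see footnote 1)
        reward = (self.objective_fn(self.graph)
                  - self.objective_fn(self.g0)) if done else 0.0
        return (self.graph, self.stub), reward, done
```

Note how the two-decision scheme keeps the action space linear in $|V|$ at every step, rather than quadratic as in direct edge selection.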
Definition of Objective Functions for Robustness.
In this study, we are interested in the robustness of graphs as objective functions. Given a graph $G$ and a permutation $\xi$ of its nodes, we let the critical fraction $p(G, \xi)$ be the minimum fraction of nodes that have to be removed from $G$, in the order given by $\xi$, for it to become disconnected (i.e., have more than one connected component). Connectedness is a crucial operational constraint, and the higher this fraction is, the more robust the graph can be said to be. We note that the order in which nodes are removed can have an impact on $p(G, \xi)$, and corresponds to different attack strategies. We consider both random permutations $\xi^{\text{random}}$ of the nodes in $V$, as well as permutations $\xi^{\text{targeted}}$, which are subject to the constraint that nodes must appear in decreasing order of their degree.
We define the objective functions in the following way:

Expected Critical Fraction to Random Removal:
$$\mathcal{F}_{\text{random}}(G) = \mathbb{E}_{\xi^{\text{random}}}\left[ p(G, \xi^{\text{random}}) \right]$$

Expected Critical Fraction to Targeted Removal:
$$\mathcal{F}_{\text{targeted}}(G) = \mathbb{E}_{\xi^{\text{targeted}}}\left[ p(G, \xi^{\text{targeted}}) \right]$$
As there are currently no closed-form expressions to compute these quantities, we use Monte Carlo (MC) sampling to estimate them. For completeness, Algorithm 1 in the Technical Appendix describes how the simulations are performed. In the remainder of the paper, we use $\hat{\mathcal{F}}_{\text{random}}$ and $\hat{\mathcal{F}}_{\text{targeted}}$ to indicate the estimates obtained in this way. We highlight that evaluating an MC sample has time complexity $\mathcal{O}(N \cdot (N + |E|))$: it involves checking connectedness (an $\mathcal{O}(N + |E|)$ operation) after the removal of each of the $N$ nodes. Typically, many such samples need to be used to obtain a low-variance estimate of the quantities. Coupled with the number of possible topologies, the high cost renders even shallow search methods infeasible in this domain.
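For concreteness, the MC estimation can be sketched as follows (a simplified stand-in for Algorithm 1 in the Technical Appendix, which is not reproduced here; function names are our own):

```python
import random
import networkx as nx

def critical_fraction(graph, xi):
    """Fraction of nodes removed, in the order given by permutation xi,
    before the graph first becomes disconnected (1.0 if it never does)."""
    g = graph.copy()
    n = g.number_of_nodes()
    for i, v in enumerate(xi):
        g.remove_node(v)
        # connectedness check after each removal: O(N + E)
        if g.number_of_nodes() > 0 and not nx.is_connected(g):
            return (i + 1) / n
    return 1.0

def estimate_robustness(graph, targeted=False, n_sims=100, rng=random):
    """MC estimate of E[p(G, xi)] over random or degree-ordered permutations."""
    total = 0.0
    for _ in range(n_sims):
        nodes = list(graph.nodes())
        rng.shuffle(nodes)  # random order; also breaks degree ties randomly
        if targeted:
            # constraint: nodes must appear in decreasing order of degree
            nodes.sort(key=lambda v: graph.degree(v), reverse=True)
        total += critical_fraction(graph, nodes)
    return total / n_sims
```

For example, a star graph on 4 nodes has $\hat{\mathcal{F}}_{\text{targeted}} = 0.25$, since removing the hub immediately disconnects it, while a complete graph never disconnects and scores 1.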
3 Learning to Build Robust Graphs with Function Approximation
We will now discuss the key ingredients behind the RNet–DQN algorithm: a scalable approach for learning how to build robust graphs.
While the problem formulation described in Section 2 may allow us to work with a tabular RL method, the number of states quickly becomes intractable: the number of labeled, connected graphs with 20 vertices alone is astronomically large OEIS Foundation (2020). We thus require a means of considering graph properties that are label-agnostic, permutation-invariant, and generalize across similar states and actions. Graph Neural Network architectures address these requirements. In particular, we use a graph representation based on a variant of structure2vec (S2V) Dai, Dai, and Song (2016), a GNN architecture inspired by mean field inference in graphical models, which computes node embeddings through message passing rounds of the form
$$\mu_v^{(k+1)} = \mathrm{relu}\Big(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{(k)}\Big),$$
where $\mathcal{N}(v)$ is the neighborhood of node $v$. We initialize embeddings with $\mu_v^{(0)} = \mathbf{0}$, and let $\mu_v = \mu_v^{(K)}$ after $K$ message passing rounds. Once node-level embeddings are obtained, permutation-invariant embeddings for a subgraph $\mathcal{S} \subseteq V$ can be derived by summing the node-level embeddings: $\mu_{\mathcal{S}} = \sum_{v \in \mathcal{S}} \mu_v$. Summing embeddings has the following advantage over other aggregators proposed in the GNN literature (mean, max): it can distinguish between early states (with fewer selected nodes) and later states; intuitively, this captures how far the agent is in an episode. The node features $x_v$ are one-hot 2-dimensional vectors representing whether $v$ is the edge stub, and their use is required to satisfy the Markovian assumption behind the MDP framework (the agent "commits" to selecting $\sigma$, which is now part of its present state).
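To make the message passing concrete, the following is a small numpy sketch of the embedding computation (an illustrative variant with caller-supplied weights, not the trained architecture):

```python
import numpy as np

def s2v_embed(adj, x, theta1, theta2, rounds=4):
    """Mean-field-style message passing: each round combines a node's own
    features with the sum of its neighbors' current embeddings.
    adj: (n, n) adjacency matrix; x: (n, f) node features;
    theta1: (f, d) and theta2: (d, d) are the (learnable) weights."""
    n, d = adj.shape[0], theta2.shape[0]
    mu = np.zeros((n, d))                                 # mu_v^(0) = 0
    for _ in range(rounds):
        mu = np.maximum(0.0, x @ theta1 + adj @ mu @ theta2)  # relu
    return mu

def subgraph_embedding(mu, nodes):
    # sum aggregation: unlike mean or max, it distinguishes states with
    # different numbers of selected nodes
    return mu[list(nodes)].sum(axis=0)
```

Relabeling the nodes permutes the rows of `mu` identically, so the summed subgraph embedding is invariant to node ordering, as required.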
In Q-learning Watkins and Dayan (1992), the agent estimates the state-action value function introduced earlier, and derives a deterministic policy that acts greedily with respect to it. The agent interacts with the environment and updates its estimates according to:
$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \Big[ R_t + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t) \Big],$$
where $\alpha$ is the learning rate.
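In code, a single tabular update of this kind looks as follows (a generic sketch of the Q-learning rule, not RNet–DQN itself, which uses the neural approximation described next):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, next_actions, alpha=0.1, gamma=1.0):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict mapping (state, action) pairs to value estimates."""
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

# Q = defaultdict(float) gives a zero-initialized table of any size, but the
# table grows with the state space, which is what motivates the GNN below.
```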
During learning, exploratory random actions are taken with probability $\epsilon$. In the case of high-dimensional state and action spaces, approaches that use a neural network to estimate $Q$ have been successful in a variety of domains, ranging from general game-playing to continuous control Mnih et al. (2015); Lillicrap et al. (2016). In particular, we use the DQN algorithm: a sample-efficient method that improves on neural fitted Q-iteration Riedmiller (2005) by use of an experience replay buffer and an iteratively updated target network for state-action value function estimation. Specifically, we use two parametrizations of the Q-function depending on whether the state contains an edge stub, both of which score an action node by passing the concatenation of the relevant node embeddings and the graph embedding $\mu_{V}$ through a small feed-forward network:
$$Q(S_t, A_t) = \begin{cases} q\big( [\mu_{A_t} \,\Vert\, \mu_{V}] \big), & \text{if } \sigma_t = \emptyset, \\ q'\big( [\mu_{A_t} \,\Vert\, \mu_{\sigma} \,\Vert\, \mu_{V}] \big), & \text{if } \sigma_t = \{\sigma\}, \end{cases}$$
where $\Vert$ represents concatenation. This lets the model learn combinations of relevant node features (e.g., that connecting two central nodes has a high Q-value). The use of GNNs has several advantages: firstly, the parameters can be learned in a goal-directed fashion for the RL objective, allowing for flexibility in the learned representation. Secondly, the embeddings have the potential to generalize to larger graphs, since they control how to combine the node features of neighbors in the message passing rounds and are not restricted to graphs of a particular size.
4 Experimental Setup
We evaluate RNet–DQN and baselines both on synthetic and real-world graphs. We allow agents a number of edge additions equal to a percentage of the total possible edges. Training is performed separately for each graph family, objective function $\mathcal{F}$, and value of $L$. No hyperparameter tuning is performed due to computational constraints. Further details of the hyperparameters, implementation, and running costs are provided in the supplementary material.
Baselines.
We compare against the following approaches:

Random: Randomly selects an available action.

Greedy: Uses lookahead and selects the action that gives the biggest improvement in the estimated value $\hat{\mathcal{F}}$ over one edge addition.

Preferential: Previous works have considered preferential additions between the two nodes with the lowest degrees Beygelzimer et al. (2005), connecting a node with the lowest degree to a random node Wang and Van Mieghem (2008), or connecting the two nodes with the lowest degree product Wang et al. (2014), i.e., adding an edge between the pair of vertices $(u, v) \notin E$ with minimal product $d_u \cdot d_v$. We find the latter works best in all settings tested, and refer to it as LDP.

Supervised Learning (SL): We consider a supervised learning baseline by regressing on $\mathcal{F}$ to learn an approximate $\hat{\mathcal{F}}$. We use the same S2V architecture as RNet–DQN, which we train using the MSE loss instead of the Q-learning loss. To select actions for a graph $G$, the agent considers all graphs $G'$ that are one edge addition away, selecting the one that satisfies $\operatorname*{arg\,max}_{G'} \hat{\mathcal{F}}(G')$.
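As an example of how simple these heuristics are, the LDP baseline can be sketched in a few lines (our own illustrative implementation using networkx):

```python
import networkx as nx

def ldp_edge(graph):
    """Lowest-degree-product heuristic: among absent edges, pick the one
    connecting the pair of vertices with the smallest degree product."""
    return min(nx.non_edges(graph),
               key=lambda e: graph.degree(e[0]) * graph.degree(e[1]))
```

On a path graph 0-1-2-3, for instance, the two endpoints have degree 1, so LDP adds the edge (0, 3), closing the path into a cycle.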
Synthetic Graphs.
We study performance on graphs generated by the Erdős–Rényi (ER) Erdős and Rényi (1960) and Barabási–Albert (BA) Barabási and Albert (1999) models. We consider graphs with a fixed number of nodes $N$, allowing agents to add a percentage of the total number of edges, which yields budgets $L \in \{2, 5, 10\}$. For RNet–DQN and SL, we train on a disjoint set of graphs, periodically measuring performance on a separate validation set and storing the best model found. The performance of all agents is evaluated on a distinct test set of graphs generated using the ER and BA models.
To evaluate out-of-distribution generalization, we repeat the evaluation on larger graphs (with a smaller maximum size for Greedy and SL due to computational cost; see next section), scaling the ER model parameters and the budget $L$ accordingly.
RealWorld Graphs.
In order to evaluate our approach on realworld graphs, we consider infrastructure networks (for which robustness is a critical property) extracted from two datasets:

Scigrid: a dataset of the European power grid Medjroubi et al. (2017).

Euroroad: a dataset of the international European E-road network Kunegis (2013).
We split these graphs by the country in which the nodes are located, selecting the largest connected component in case they are disconnected. We then select those of suitable size, obtaining a set of infrastructure graphs for Scigrid and for Euroroad. Since in this context the performance on individual instances matters more than generalizability, we train and evaluate the models on each graph separately. SL is excluded from this experiment, as the dataset size is 1.
5 Results
Table 1: Results on synthetic graphs.

| Objective | Graph | $L$ | Random | LDP | FV | ERes | Greedy | SL (avg) | SL (best) | RNet–DQN (avg) | RNet–DQN (best) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\mathcal{F}_{\text{random}}$ | BA | 2 | 0.018 | 0.036 | 0.051 | 0.053 | 0.033 | 0.048 | 0.057 | 0.051 | 0.057 |
| | BA | 5 | 0.049 | 0.089 | 0.098 | 0.106 | 0.079 | 0.099 | 0.122 | 0.124 | 0.130 |
| | BA | 10 | 0.100 | 0.158 | 0.176 | 0.180 | 0.141 | 0.161 | 0.203 | 0.211 | 0.222 |
| | ER | 2 | 0.029 | 0.100 | 0.103 | 0.103 | 0.082 | 0.094 | 0.100 | 0.098 | 0.104 |
| | ER | 5 | 0.071 | 0.168 | 0.172 | 0.175 | 0.138 | 0.158 | 0.168 | 0.164 | 0.173 |
| | ER | 10 | 0.138 | 0.238 | 0.252 | 0.253 | 0.217 | 0.221 | 0.238 | 0.240 | 0.249 |
| $\mathcal{F}_{\text{targeted}}$ | BA | 2 | 0.010 | 0.022 | 0.018 | 0.018 | 0.045 | 0.022 | 0.033 | 0.042 | 0.047 |
| | BA | 5 | 0.025 | 0.091 | 0.037 | 0.077 | 0.077 | 0.055 | 0.077 | 0.108 | 0.117 |
| | BA | 10 | 0.054 | 0.246 | 0.148 | 0.232 | 0.116 | 0.128 | 0.217 | 0.272 | 0.289 |
| | ER | 2 | 0.020 | 0.103 | 0.090 | 0.098 | 0.149 | 0.102 | 0.118 | 0.122 | 0.128 |
| | ER | 5 | 0.050 | 0.205 | 0.166 | 0.215 | 0.293 | 0.182 | 0.238 | 0.268 | 0.279 |
| | ER | 10 | 0.098 | 0.306 | 0.274 | 0.299 | 0.477 | 0.269 | 0.374 | 0.461 | 0.482 |
In Table 1, we present the results of our experimental evaluation for synthetic graphs. We also display the evolution of the validation loss during training in Figure 2. Outofdistribution generalization results are shown in Figure 3. The results for realworld graphs are provided in Table 2, aggregated by dataset (for results split by individual country please see Table A.1 in the Technical Appendix).
Main Findings.
We summarize our findings as follows:
RNet–DQN provides competitive performance, especially for longer action sequences. Across all settings tested, RNet–DQN performed significantly better than random. On synthetic graphs, the best model obtained the highest performance in 8 out of 12 settings tested, while the average performance is at least 89% of that of the best-performing configuration. For BA graphs, RNet–DQN obtained the best performance across all tasks tested. For ER graphs, ERes performed slightly better when considering $\mathcal{F}_{\text{random}}$; for $\mathcal{F}_{\text{targeted}}$, the greedy baseline performed better for shorter sequences. For real-world graphs, RNet–DQN obtained the best performance across all tasks.
Strategies for improving $\mathcal{F}_{\text{random}}$ are easier to learn. The performance gap between the trained model and the baselines is smaller for $\mathcal{F}_{\text{random}}$, suggesting it is less complex to learn. This is also supported by the evaluation losses monitored during training, which show that performance improves and plateaus more quickly. For $\mathcal{F}_{\text{random}}$, the network with randomly initialized parameters already yields reasonable policies, and training brings some improvement. In contrast, the improvements for $\mathcal{F}_{\text{targeted}}$ are much more dramatic.
Out-of-distribution generalization only occurs for $\mathcal{F}_{\text{random}}$. The performance on larger out-of-distribution graphs is preserved for the $\mathcal{F}_{\text{random}}$ objective, and especially for BA graphs we observe strong generalization. The performance for $\mathcal{F}_{\text{targeted}}$ decays rapidly, becoming worse than the baselines as the size increases. The poor performance of the greedy policy means the estimates are no longer accurate under distribution shift. There are several possible explanations: e.g., the inherent noise in estimating $\mathcal{F}_{\text{random}}$ makes the neural network more robust to outliers, or central nodes impact message passing in larger graphs differently. We think investigating this phenomenon is a worthwhile future direction of this work, since out-of-distribution generalization does occur for $\mathcal{F}_{\text{random}}$ and evaluating the objective functions directly is prohibitively expensive for large graphs.
Performance on real-world graphs is comparatively better with respect to the baselines. This is expected, since training is performed separately for each graph to be optimized.
Table 2: Results on real-world graphs, aggregated by dataset.

| Dataset | Algorithm | $\mathcal{F}_{\text{random}}$ | $\mathcal{F}_{\text{targeted}}$ |
|---|---|---|---|
| Euroroad | Random | 0.081 | 0.033 |
| | LDP | 0.164 | 0.122 |
| | FV | 0.185 | 0.091 |
| | ERes | 0.183 | 0.104 |
| | Greedy | 0.161 | 0.143 |
| | RNet–DQN (avg) | 0.187 | 0.213 |
| | RNet–DQN (best) | 0.202 | 0.222 |
| Scigrid | Random | 0.082 | 0.046 |
| | LDP | 0.211 | 0.123 |
| | FV | 0.212 | 0.087 |
| | ERes | 0.217 | 0.139 |
| | Greedy | 0.185 | 0.122 |
| | RNet–DQN (avg) | 0.236 | 0.217 |
| | RNet–DQN (best) | 0.258 | 0.231 |
Time Complexity.
Below, we compare the time complexities of all the approaches considered.

RNet–DQN: $\mathcal{O}(|V| + |E|)$ operations at each step: constructing node embeddings involves a fixed number of message passing rounds over the edges, and, based on these embeddings, $Q$ is estimated for all valid actions.

Random: $\mathcal{O}(1)$ for sampling, assuming the environment checks action validity.

Greedy: The improvement in $\hat{\mathcal{F}}$ is estimated for all $\mathcal{O}(|V|^2)$ possible edges. For each edge, this involves performing MC simulations; as described in Section 2, each MC simulation has complexity $\mathcal{O}(|V| \cdot (|V| + |E|))$.

LDP: $\mathcal{O}(|V|^2)$: computing the product of node degrees for all candidate pairs.

FV, ERes: $\mathcal{O}(|V|^3)$, since they involve computing the eigendecomposition and the Moore–Penrose pseudoinverse of the graph Laplacian, respectively (this may be faster in practice).

SL: $\hat{\mathcal{F}}$ is predicted for the $\mathcal{O}(|V|^2)$ graphs that are one edge addition away, each prediction costing $\mathcal{O}(|V| + |E|)$; then an argmax is taken.
It is worth noting that the analysis above does not account for the cost of training, the complexity of which is difficult to determine as it depends on many hyperparameters and the specific characteristics of the problem at hand. The approach is thus advantageous in situations in which predictions need to be made quickly, over many graphs, or the model transfers well from a cheaper training regime.
6 Related Work
Network Resilience.
Network resilience was first discussed by Albert, Jeong, and Barabási (2000), who examined the average shortest path distance as a function of the number of removed nodes. Analyzing two scale-free communication networks, they found that this type of network has good robustness to random failures but is vulnerable to targeted attacks. A more extensive investigation by Holme et al. (2002) analyzed the robustness of several real-world networks, as well as some generated by synthetic models, under a variety of attack strategies. Another area of interest is the analysis of the phase transitions of the graph in terms of connectivity under the two attack strategies Cohen et al. (2000, 2001). Optimal network topologies have also been discovered: for example, under the objective of resilience to both failures and attacks, the optimal network has a bimodal or trimodal degree distribution Valente, Sarkar, and Stone (2004); Tanizawa et al. (2005). There exists evidence to suggest that the topological robustness of infrastructure systems is correlated with operational robustness Solé et al. (2008). More broadly, the resilience of systems is highly important in structural engineering and risk management Cimellaro, Reinhorn, and Bruneau (2010); Ganin et al. (2016).
GNNs and Combinatorial Optimization.
Neural network architectures able to deal not solely with Euclidean but also with manifold and graph data have been developed in recent years Bronstein et al. (2017), and applied to a variety of problems where their capacity for representing structured, relational information can be exploited Battaglia et al. (2018). A subcategory of such approaches are Message Passing Neural Networks (MPNN) Gilmer et al. (2017), often referred to as Graph Neural Networks (GNN) instead. Significant progress has been achieved in machine learning for combinatorial optimization problems Bengio, Lodi, and Prouvost (2018) such as Minimum Vertex Cover and the Traveling Salesman Problem by framing them as a supervised learning Vinyals, Fortunato, and Jaitly (2015) or RL Bello et al. (2016) task. Combining GNNs with RL algorithms has yielded models capable of solving several graph optimization problems with the same architecture while generalizing to graphs an order of magnitude larger than those used during training Khalil et al. (2017). The problem as formulated in this paper is a combinatorial optimization problem, related to other problems in network design that arise in operations research Ahuja et al. (1995) – albeit with a different objective.
Graph Generation.
Similarities exist between the current work and the area of graph generative modeling, which tries to learn a generative model of graphs Li et al. (2018); You et al. (2018b); Liao et al. (2019) in a computationally efficient way given some existing examples. This generation, however, is not necessarily conditioned by an objective that captures a global property to be optimized. Other lines of work target constructing, ab initio, graphs that have desirable properties: examples include neural architecture search Zoph and Le (2017); Liu et al. (2018) and molecular graph generation Jin, Barzilay, and Jaakkola (2018); You et al. (2018a); Bradshaw et al. (2019). The concurrent work GraphOpt Trivedi, Yang, and Zha (2020) tackles the inverse problem: given a graph, the goal is to learn the underlying objective function that led to its generation. This is achieved with maximum entropy inverse RL and GNNs.
Relationship to RL–S2V.
Our method builds on RL–S2V, a method that was applied for constructing adversarial examples against graph classifiers Dai et al. (2018). For simplicity, we keep the learning mechanism very similar (S2V+DQN), while the problem domain we treat is fundamentally different. The key algorithmic differences from RL–S2V relate to the GIMDP formulation: the reward function used, which in this case quantifies a global property of the graph itself, as well as the definition of the action spaces and the transition model, which account for excluding already-existing edges (RL–S2V ignores this, leading to some of the edge budget being wasted). Since the values of the structural properties we discuss are increasing in the number of edges (the complete graph has robustness 1), RNet–DQN generally yields strictly better performance.
7 Conclusion and Outlook
In this work, we have formulated the problem of improving the value of an arbitrary global graph objective function as the Graph Improvement MDP (GIMDP) for the first time. Our approach, named RNet–DQN, uses Reinforcement Learning with Graph Neural Networks as a key component for generalization. As a case study, we have considered the problem of improving graph robustness to random and targeted removals of nodes.
Our experimental evaluation on synthetic and realworld graphs shows that in certain situations this approach can deliver performance superior to existing methods, both in terms of the solutions found (i.e., the resulting robustness of the graphs) and time complexity of model evaluation. Further, we have shown the ability to transfer to outofsample graphs, as well as the potential to transfer to outofdistribution graphs larger than those used during training.
The proposed approach can be applied to other problems based on different definitions of resilience or considering fundamentally different objective functions such as efficiency Latora and Marchiori (2001), path diversity Gvozdiev et al. (2018), and assortativity Newman (2002), which are of interest in various biological, communication, and social networks.
Ethics and Broader Impact
The approach described in this work can be used to improve the robustness of a variety of human-made infrastructure systems, such as communication networks, transportation networks, and power grids. This research might benefit the civil and structural engineering communities, which aim to make these systems resilient to natural or intentional damage. It may also help in the process of defining disaster recovery protocols. We are excited about the potential applications of this approach to the complex systems that enable society to function.
Beyond human-made structures, the process of optimizing global characteristics of graphs as presented in this paper may find applications in the context of biological neural networks. For example, the topology of the brain evolves over time as connections are made and removed under changing cognitive demands. It is hypothesized that this process satisfies a trade-off between efficiency and metabolic wiring cost Bullmore and Sporns (2012); both are global structural properties of the network.
In principle, an approach very similar to the one presented might be used for learning strategies for attacking networks. The targeting of nodes by degree that we considered in this work is a common but relatively simple strategy; there exists a body of literature concerned with strategies for dismantling networks (see, e.g., Wandelt et al. (2018) for a survey). While legitimate use cases for this type of method exist (such as disrupting malicious networks or tackling the spread of misinformation), we warn that ethical considerations should drive the application of this method, and potential negative impacts associated with it should be thoroughly and transparently addressed. We believe that this method does not leverage or amplify any biases in the data; indeed, the experiments presented are solely based on data that represent topological structure.
Footnotes
^{1} Since $\mathcal{F}$ is very expensive to estimate, we deliberately only provide the reward at the end of the episode in order to make training computationally feasible, at the cost of possible credit assignment issues. Intermediate rewards based on the true objective or a related quantity represent a middle ground, which we leave for future work.
^{2} We note that the problem formulation does not depend on the specific GNN or RL algorithm used. While further advances developed by the community in these areas Maron et al. (2019); Hessel et al. (2018) can be incorporated, in this paper we focus on aspects specific to the challenges of optimizing the global properties of graphs.
References
 Ahuja, R. K.; Magnanti, T. L.; Orlin, J. B.; and Reddy, M. R. 1995. Chapter 1 Applications of Network Optimization. In Handbooks in Operations Research and Management Science, volume 7 of Network Models, 1–83. Elsevier.
 Albert, R.; Jeong, H.; and Barabási, A.-L. 2000. Error and Attack Tolerance of Complex Networks. Nature 406(6794): 378–382.
 Barabási, A.-L.; and Albert, R. 1999. Emergence of Scaling in Random Networks. Science 286(5439): 509–512.
 Battaglia, P. W.; Hamrick, J. B.; Bapst, V.; Sánchez-González, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. 2018. Relational Inductive Biases, Deep Learning, and Graph Networks. arXiv:1806.01261 .
 Bello, I.; Pham, H.; Le, Q. V.; Norouzi, M.; and Bengio, S. 2016. Neural Combinatorial Optimization with Reinforcement Learning. arXiv:1611.09940 .
 Bengio, Y.; Lodi, A.; and Prouvost, A. 2018. Machine Learning for Combinatorial Optimization: a Methodological Tour d’Horizon. arXiv:1811.06128 .
 Beygelzimer, A.; Grinstein, G.; Linsker, R.; and Rish, I. 2005. Improving Network Robustness by Edge Modification. Physica A 357: 593–612.
 Bianchini, M.; Gori, M.; and Scarselli, F. 2005. Inside PageRank. ACM Trans. Internet Technol. 5(1): 92–128.
 Bradshaw, J.; Paige, B.; Kusner, M. J.; Segler, M. H. S.; and Hernández-Lobato, J. M. 2019. A Model to Search for Synthesizable Molecules. In NeurIPS.
 Bronstein, M. M.; Bruna, J.; LeCun, Y.; Szlam, A.; and Vandergheynst, P. 2017. Geometric Deep Learning: Going beyond Euclidean data. IEEE Signal Processing Magazine 34(4): 18–42.
 Bullmore, E.; and Sporns, O. 2012. The Economy of Brain Network Organization. Nature Reviews Neuroscience 13(5): 336–349.
 Cetinay, H.; Devriendt, K.; and Van Mieghem, P. 2018. Nodal Vulnerability to Targeted Attacks in Power Grids. Applied Network Science 3(1): 34.
 Cimellaro, G. P.; Reinhorn, A. M.; and Bruneau, M. 2010. Framework for Analytical Quantification of Disaster Resilience. Engineering Structures 32: 3639–3649.
 Cohen, R.; Erez, K.; ben Avraham, D.; and Havlin, S. 2000. Resilience of the Internet to Random Breakdowns. Physical Review Letters 85(21): 4626–4628.
 Cohen, R.; Erez, K.; ben Avraham, D.; and Havlin, S. 2001. Breakdown of the Internet under Intentional Attack. Physical Review Letters 86(16): 3682–3685.
 Dai, H.; Dai, B.; and Song, L. 2016. Discriminative Embeddings of Latent Variable Models for Structured Data. In ICML.
 Dai, H.; Li, H.; Tian, T.; Huang, X.; Wang, L.; Zhu, J.; and Song, L. 2018. Adversarial Attack on Graph Structured Data. In ICML.
 Ellens, W.; Spieksma, F.; Van Mieghem, P.; Jamakovic, A.; and Kooij, R. 2011. Effective Graph Resistance. Linear Algebra and its Applications 435(10): 2491–2506.
 Erdős, P.; and Rényi, A. 1960. On the Evolution of Random Graphs. Publ. Math. Inst. Hung. Acad. Sci 5(1): 17–60.
 Fiedler, M. 1973. Algebraic Connectivity of Graphs. Czechoslovak Mathematical Journal 23(2): 298–305.
 Ganin, A. A.; Massaro, E.; Gutfraind, A.; Steen, N.; Keisler, J. M.; Kott, A.; Mangoubi, R.; and Linkov, I. 2016. Operational Resilience: Concepts, Design and Analysis. Scientific Reports 6(1): 1–12.
 Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; and Dahl, G. E. 2017. Neural Message Passing for Quantum Chemistry. In ICML.
Gvozdiev, N.; Vissicchio, S.; Karp, B.; and Handley, M. 2018. On Low-Latency-Capable Topologies, and Their Impact on the Design of Intra-Domain Routing. In SIGCOMM.
 Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; and Silver, D. 2018. Rainbow: Combining Improvements in Deep Reinforcement Learning. In AAAI.
 Holme, P.; Kim, B. J.; Yoon, C. N.; and Han, S. K. 2002. Attack Vulnerability of Complex Networks. Physical Review E 65(5).
 Jin, W.; Barzilay, R.; and Jaakkola, T. 2018. Junction Tree Variational Autoencoder for Molecular Graph Generation. In ICML.
 Khalil, E.; Dai, H.; Zhang, Y.; Dilkina, B.; and Song, L. 2017. Learning Combinatorial Optimization Algorithms over Graphs. In NeurIPS.
Kunegis, J. 2013. KONECT: The Koblenz Network Collection. In WWW Companion.
Latora, V.; and Marchiori, M. 2001. Efficient Behavior of Small-World Networks. Physical Review Letters 87(19): 198701.
 Li, Y.; Vinyals, O.; Dyer, C.; Pascanu, R.; and Battaglia, P. 2018. Learning Deep Generative Models of Graphs. In ICML.
 Liao, R.; Li, Y.; Song, Y.; Wang, S.; Nash, C.; Hamilton, W. L.; Duvenaud, D.; Urtasun, R.; and Zemel, R. S. 2019. Efficient Graph Generation with Graph Recurrent Attention Networks. In NeurIPS.
Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; et al. 2016. Continuous Control with Deep Reinforcement Learning. In ICLR.
 Liu, H.; Simonyan, K.; Vinyals, O.; Fernando, C.; and Kavukcuoglu, K. 2018. Hierarchical Representations for Efficient Architecture Search. In ICLR.
Maron, H.; Ben-Hamu, H.; Serviansky, H.; and Lipman, Y. 2019. Provably Powerful Graph Networks. In NeurIPS.
Medjroubi, W.; Müller, U. P.; Scharf, M.; Matke, C.; and Kleinhans, D. 2017. Open Data in Power Grid Modelling: New Approaches Towards Transparent Grid Models. Energy Reports 3: 14–21.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; et al. 2015. Human-Level Control through Deep Reinforcement Learning. Nature 518(7540): 529–533.
 Newman, M. E. J. 2002. Assortative Mixing in Networks. Physical Review Letters 89(20).
 Newman, M. E. J. 2003. The Structure and Function of Complex Networks. SIAM Review 45(2).
 Newman, M. E. J. 2018. Networks. Oxford University Press.
OEIS Foundation. 2020. The On-Line Encyclopedia of Integer Sequences. URL https://oeis.org/A001187.
Riedmiller, M. 2005. Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In ECML.
 Schneider, C. M.; Moreira, A. A.; Andrade, J. S.; Havlin, S.; and Herrmann, H. J. 2011. Mitigation of Malicious Attacks on Networks. PNAS 108(10): 3838–3841.
Solé, R. V.; Rosas-Casals, M.; Corominas-Murtra, B.; and Valverde, S. 2008. Robustness of the European Power Grids under Intentional Attack. Physical Review E 77(2): 026102.
 Tanizawa, T.; Paul, G.; Cohen, R.; Havlin, S.; and Stanley, H. E. 2005. Optimization of Network Robustness to Waves of Targeted and Random Attacks. Physical Review E 71(4).
 Trivedi, R.; Yang, J.; and Zha, H. 2020. GraphOpt: Learning Optimization Models of Graph Formation. In ICML.
Valente, A. X. C. N.; Sarkar, A.; and Stone, H. A. 2004. Two-Peak and Three-Peak Optimal Complex Networks. Physical Review Letters 92(11).
 Vinyals, O.; Fortunato, M.; and Jaitly, N. 2015. Pointer Networks. In NeurIPS.
Wandelt, S.; Sun, X.; Feng, D.; Zanin, M.; and Havlin, S. 2018. A Comparative Analysis of Approaches to Network-Dismantling. Scientific Reports 8(1): 1–15.
Wang, H.; and Van Mieghem, P. 2008. Algebraic Connectivity Optimization via Link Addition. In Proceedings of the Third International Conference on Bio-Inspired Models of Network Information and Computing Systems (Bionetics).
 Wang, X.; Pournaras, E.; Kooij, R. E.; and Van Mieghem, P. 2014. Improving Robustness of Complex Networks via the Effective Graph Resistance. The European Physical Journal B 87(9): 221.
Watkins, C. J. C. H.; and Dayan, P. 1992. Q-learning. Machine Learning 8(3–4): 279–292.
Watts, D. J.; and Strogatz, S. H. 1998. Collective Dynamics of 'Small-World' Networks. Nature 393(6684): 440.
You, J.; Liu, B.; Ying, R.; Pande, V.; and Leskovec, J. 2018a. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. In NeurIPS.
 You, J.; Ying, R.; Ren, X.; Hamilton, W. L.; and Leskovec, J. 2018b. GraphRNN: Generating Realistic Graphs with Deep Autoregressive Models. In ICML.
 Zoph, B.; and Le, Q. V. 2017. Neural Architecture Search with Reinforcement Learning. In ICLR.
Šubelj, L.; and Bajec, M. 2011. Robust Network Community Detection Using Balanced Propagation. The European Physical Journal B 81(3): 353–362.