Improving the Robustness of Graphs through Reinforcement Learning and Graph Neural Networks


Graphs can be used to represent and reason about real-world systems, and a variety of metrics have been devised to quantify their global characteristics. An important property is robustness to failures and attacks, which is relevant for the infrastructure and communication networks that power modern society. Prior work on making topological modifications to a graph (e.g., adding edges) in order to increase robustness is typically based on local and spectral properties or on a shallow search, since robustness is expensive to compute directly. However, such strategies are necessarily suboptimal. In this work, we present RNet–DQN, an approach for constructing networks that uses Reinforcement Learning to improve the robustness of graphs to random and targeted removals of nodes. In particular, the approach relies on changes in the estimated robustness as a reward signal and on Graph Neural Networks for representing states. Experiments on synthetic and real-world graphs show that this approach can deliver performance superior to existing methods while being much cheaper to evaluate and generalizing to out-of-sample graphs, as well as to larger out-of-distribution graphs in some cases. The approach is readily applicable to optimizing other global structural properties of graphs.


1 University College London 2 The Alan Turing Institute 3 University of Bologna
{v.darvariu, s.hailes, m.musolesi}

1 Introduction

Graphs are mathematical abstractions that can be used to model a variety of systems, from infrastructure and biological networks to social structures. Various methods for analyzing networks have been developed; these are often used for understanding the systems themselves, and range from mathematical models of how families of graphs are generated Watts and Strogatz (1998); Barabási and Albert (1999) to measures of centrality that capture the roles of vertices Bianchini, Gori, and Scarselli (2005) and global network characteristics Newman (2018), to name a few.

A measure that has attracted significant interest from researchers and practitioners is robustness Newman (2003) (sometimes called resilience), which is typically defined as the capacity of the graph to withstand random failures, targeted attacks on key nodes, or some combination thereof. A network is considered robust if a large fraction (critical fraction) of nodes have to be removed before it becomes disconnected Cohen et al. (2000), its diameter increases Albert, Jeong, and Barabási (2000), or its largest connected component diminishes in size Beygelzimer et al. (2005). Previous work has focused on the robustness of communication networks such as the Internet Cohen et al. (2001) and infrastructure networks used for transportation and energy distribution Cetinay, Devriendt, and Van Mieghem (2018).

In many practical cases, the goal is not to build robust networks from scratch, but to improve existing ones by modifying their structure. For example, Beygelzimer et al. (2005) approach this problem by considering edge addition or rewiring, based on random and preferential (w.r.t. node degree) modifications. Schneider et al. (2011) propose a "greedy" modification scheme based on random edge selection and swapping if the resilience metric improves. Another line of work focuses on the spectral decomposition of the graph Laplacian, using properties such as the algebraic connectivity Wang and Van Mieghem (2008) and effective graph resistance Wang et al. (2014) to guide modifications. While simple and interpretable, these strategies may not yield the best solutions or generalize across networks with varying characteristics and sizes. Certainly, better solutions may be found by exhaustive search, but the time complexity of exploring all the possible topologies and the cost of computing the metric render this strategy infeasible. We thus ask whether generalizable robustness improvement strategies can be learned.

Figure 1: Illustration of a Graph Improvement MDP (GI-MDP) trajectory. The agent is provided with a start state G_0. It must make edge additions over a sequence of 6 node selections, receiving rewards proportional to the value of an objective function F applied to the graph. In this case, F quantifies the robustness of the network to targeted node removal, computed by removing nodes in decreasing order of their degree and in decreasing order of the labels if two nodes have the same degree. The objective value of the graph improves over the trajectory. Actions and the corresponding edges are highlighted.

In this work, we propose addressing this question in the Reinforcement Learning (RL) framework. We formalize the process of adding edges to a graph as a Markov Decision Process (MDP) in which rewards are proportional to the improvement measured through a graph-level objective function. We consider two objective functions that quantify robustness as the critical fraction of the network in the presence of random failures and targeted attacks. Inspired by recent successes of RL in solving combinatorial optimization problems on graphs Bello et al. (2016); Khalil et al. (2017), we make use of Graph Neural Network (GNN) architectures Gilmer et al. (2017) together with the Deep Q-Network (DQN) Mnih et al. (2015) algorithm. Recent work in goal-directed graph generation and improvement considers performing edge additions for adversarially attacking GNN classifiers Dai et al. (2018) and generating molecules with certain desirable properties using domain-specific rewards You et al. (2018a). In contrast, to the best of our knowledge, this is the first time that RL is used to learn how to optimize a global structural property of a graph through its construction. While in this paper we focus on robustness, other intrinsic global properties of graphs can be used as optimization targets.

The contribution of this paper is twofold. Firstly, we propose using RL as a framework for improving global structural properties of graphs through formulating the Graph Improvement MDP (GI-MDP). Secondly, focusing on the robustness of graphs under failures and attacks as a core case study, we offer an in-depth empirical evaluation that demonstrates significant advantages over existing approaches in this domain, both in terms of the quality of the solutions found as well as the time complexity of model evaluation. Since this approach addresses the problem of building robust networks with a DQN, we name it RNet–DQN.

The remainder of the paper is structured as follows. We provide the definitions of the GI-MDP and the robustness measures in Section 2. Section 3 describes state and action representations for deep RL using GNNs. We describe our experimental setup in Section 4, and discuss our main results in Section 5. In Section 6 we review and compare the key works in this area. Finally, we conclude and offer a discussion of avenues for future work in Section 7.

2 Modeling Graph Robustness for Reinforcement Learning

MDP Preliminaries.

An MDP is one possible formalization of decision-making processes. The decision maker, called the agent, interacts with an environment. When in a state s ∈ S, the agent must take an action a out of the set A(s) of valid ones, receiving a reward governed by the reward function R(s, a). Finally, the agent finds itself in a new state s′, depending on a transition model P that governs the joint probability distribution P(s′ | a, s) of transitioning to state s′ after taking action a in state s. This sequence of interactions gives rise to a trajectory. The agent's goal is to maximize the expected (possibly discounted) sum of rewards it receives over all trajectories. The tuple (S, A, P, R, γ) defines this MDP, where γ ∈ [0, 1] is a discount factor. We also define a policy π(a | s), a distribution of actions over states, which fully determines the behavior of the agent. Given a policy π, the state-action value function Q^π(s, a) is defined as the expected return when starting from s, taking action a, and subsequently following policy π.

Modeling Graph Improvement.

Let G^N be the set of labeled, undirected, unweighted, connected graphs with N nodes; each such graph G consists of a vertex set V and an edge set E. Let G^N_m be the subset of G^N whose graphs have m edges. We also let F : G^N → [0, 1] be an objective function, and L ∈ ℕ be a modification budget. Given an initial graph G_0 ∈ G^N, the aim is to perform a series of L edge additions to G_0 such that the resulting graph G_* satisfies:

$$G_* = \arg\max_{G'} \mathcal{F}(G'),$$

where G′ ranges over the graphs obtainable from G_0 by adding L edges.
This combinatorial optimization problem can be cast as a sequential decision-making process. In order to enable scaling to large graphs, the agent has to select a node at each step, and an edge is added to the graph after every two decisions Dai et al. (2018). Tasks are episodic; each episode proceeds for at most 2L steps. A trajectory visualization is shown in Figure 1. Formally, we map our problem to the Graph Improvement MDP (GI-MDP) as follows:

  1. State: The state S_t is a tuple (G_t, σ_t) containing the graph G_t and an edge stub σ_t. σ_t can be either the empty set ∅ or the singleton {v}, where v ∈ V.

  2. Action: An action a_t corresponds to the selection of a node in V. Letting the degree of node v be deg(v), the available actions are defined as:

$$\mathcal{A}(S_t) = \begin{cases} \{ v \in V \mid \deg(v) < N - 1 \}, & \text{if } \sigma_t = \emptyset, \\ \{ u \in V \mid u \neq v \wedge (v, u) \notin E_t \}, & \text{if } \sigma_t = \{v\}. \end{cases}$$

  3. Transitions: The transition model P is deterministic: selecting node v in a state with σ_t = ∅ leads to S_{t+1} = (G_t, {v}), while selecting node u in a state with σ_t = {v} leads to S_{t+1} = (G_{t+1}, ∅), where E_{t+1} = E_t ∪ {(v, u)}.

  4. Reward: The reward R_t is defined as follows1:

$$R_t = \begin{cases} \mathcal{F}(G_T) - \mathcal{F}(G_0), & \text{if } t = T, \\ 0, & \text{otherwise.} \end{cases}$$
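The components above can be sketched as a minimal environment. The following is an illustrative Python sketch using networkx; names such as GraphImprovementEnv are our own, and the objective is any callable mapping a graph to [0, 1], not the paper's actual implementation:

```python
import networkx as nx

class GraphImprovementEnv:
    """Minimal sketch of the GI-MDP: nodes are picked in pairs, each pair
    adding one edge; the terminal reward is the gain in the objective F."""

    def __init__(self, objective, budget):
        self.objective = objective  # callable G -> [0, 1]
        self.budget = budget        # number of edges to add (L)

    def reset(self, graph):
        self.G = graph.copy()
        self.stub = None            # the edge stub (sigma)
        self.edges_left = self.budget
        self.initial_value = self.objective(self.G)
        return self.G, self.stub

    def valid_actions(self):
        n = self.G.number_of_nodes()
        if self.stub is None:       # first pick: any non-saturated node
            return [v for v in self.G if self.G.degree(v) < n - 1]
        # second pick: any node not already adjacent to the stub
        return [u for u in self.G
                if u != self.stub and not self.G.has_edge(self.stub, u)]

    def step(self, node):
        assert node in self.valid_actions()
        if self.stub is None:       # commit to the edge stub
            self.stub = node
            return (self.G, self.stub), 0.0, False
        self.G.add_edge(self.stub, node)
        self.stub = None
        self.edges_left -= 1
        done = self.edges_left == 0
        # reward only at the end of the episode (see footnote 1)
        reward = self.objective(self.G) - self.initial_value if done else 0.0
        return (self.G, self.stub), reward, done
```

Picking a node first stores it as the edge stub; the second pick adds the edge and, on the final addition, yields the difference in objective value between the final and initial graphs.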

Definition of Objective Functions for Robustness.

In this study, we are interested in the robustness of graphs as objective functions. Given a graph G, we let the critical fraction p(G, ξ) be the minimum fraction of nodes that have to be removed from G in some order ξ for it to become disconnected (i.e., have more than one connected component). Connectedness is a crucial operational constraint, and the higher this fraction is, the more robust the graph can be said to be. We note that the order ξ in which nodes are removed can have an impact on p(G, ξ), and corresponds to different attack strategies. We consider both random permutations ξ_random of the nodes in V, as well as permutations ξ_targeted, which are subject to the constraint that nodes must appear in decreasing order of their degree, i.e., ∀ i < j : deg(ξ_targeted(i)) ≥ deg(ξ_targeted(j)).

We define the objective functions in the following way:

  1. Expected Critical Fraction to Random Removal:

$$\mathcal{F}_{random}(G) = \mathbb{E}\left[\, p(G, \xi_{random}) \,\right]$$

  2. Expected Critical Fraction to Targeted Removal:

$$\mathcal{F}_{targeted}(G) = \mathbb{E}\left[\, p(G, \xi_{targeted}) \,\right]$$

As there are currently no closed-form expressions for these quantities, we use Monte Carlo (MC) sampling to estimate them. For completeness, Algorithm 1 in the Technical Appendix describes how the simulations are performed. In the remainder of the paper, we use F̂_random and F̂_targeted to indicate the estimates obtained in this way. We highlight that evaluating an MC sample has time complexity O(|V| · (|V| + |E|)): it involves checking connectedness (an O(|V| + |E|) operation) after the removal of each of the |V| nodes. Typically, many such samples need to be used to obtain a low-variance estimate of these quantities. Coupled with the number of possible topologies, the high cost renders even shallow search methods infeasible in this domain.
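As a concrete sketch, the MC estimation of the two objectives might look as follows. This is a simplified reading of the procedure, not the paper's Algorithm 1; in particular, we assume ties in the targeted ordering are broken at random rather than by node label:

```python
import random
import networkx as nx

def critical_fraction(G, order):
    """Fraction of nodes removed (following `order`) before G disconnects."""
    H = G.copy()
    n = H.number_of_nodes()
    for i, v in enumerate(order, start=1):
        H.remove_node(v)
        if H.number_of_nodes() == 0:
            return 1.0  # never disconnected: maximally robust
        if nx.number_connected_components(H) > 1:
            return i / n
    return 1.0

def estimate_objective(G, targeted=False, num_samples=32, seed=0):
    """MC estimate of F_random (or F_targeted when targeted=True)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        order = list(G.nodes())
        rng.shuffle(order)  # random order; also randomizes tie-breaking
        if targeted:
            # stable sort keeps the shuffled order among equal degrees
            order.sort(key=lambda v: G.degree(v), reverse=True)
        total += critical_fraction(G, order)
    return total / num_samples
```

Note that the inner loop exhibits exactly the cost discussed above: one connectivity check per removed node, repeated for every sample.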

3 Learning to Build Robust Graphs with Function Approximation

We will now discuss the key ingredients behind the RNet–DQN algorithm: a scalable approach for learning how to build robust graphs.

While the problem formulation described in Section 2 may allow us to work with a tabular RL method, the number of states quickly becomes intractable – for example, there are approximately 10^57 labeled, connected graphs with 20 vertices OEIS Foundation (2020). We thus require a means of considering graph properties that are label-agnostic, permutation-invariant, and generalize across similar states and actions. Graph Neural Network architectures address these requirements. In particular, we use a graph representation based on a variant of structure2vec (S2V) Dai, Dai, and Song (2016), a GNN architecture inspired by mean field inference in graphical models.2 Given an input graph G where each node v has a feature vector x_v, its objective is to produce for each node an embedding vector μ_v that captures the structure of the graph as well as interactions between neighbors. This is performed in several rounds of aggregating the features of neighbors and applying an element-wise non-linear activation function such as the rectified linear unit. For each round k, the network simultaneously applies updates of the form:

$$\mu_v^{(k+1)} = \mathrm{ReLU}\left( \theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{(k)} \right),$$
where N(v) is the neighborhood of node v. We initialize the embeddings with μ_v^(0) = 0, and let μ_v = μ_v^(K) after the final round K. Once node-level embeddings are obtained, permutation-invariant embeddings for a subgraph S can be derived by summing the node-level embeddings: μ_S = Σ_{v∈S} μ_v. Summing embeddings has the following advantage over other aggregators proposed in the GNN literature (mean, max): it can distinguish between early states (with fewer selected nodes) and later states; intuitively, this captures how far the agent is in an episode. The node features x_v are one-hot 2-dimensional vectors representing whether v is the edge stub, and their use is required to satisfy the Markovian assumption behind the MDP framework (the agent "commits" to selecting v, which is now part of its present state).
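A toy NumPy rendition of these rounds with sum pooling may help fix ideas; the weight matrices theta1, theta2 and the exact ReLU placement are illustrative assumptions, not the paper's trained parameters:

```python
import numpy as np

def s2v_embeddings(adj, features, thetas, rounds=4):
    """Sketch of S2V-style updates: each round combines a node's features
    with the sum of its neighbours' current embeddings, then applies ReLU."""
    theta1, theta2 = thetas               # assumed weight matrices
    n = adj.shape[0]
    mu = np.zeros((n, theta1.shape[0]))   # mu_v^(0) = 0
    for _ in range(rounds):
        neighbour_sum = adj @ mu          # sum over u in N(v) of mu_u
        mu = np.maximum(0.0, features @ theta1.T + neighbour_sum @ theta2.T)
    return mu

def graph_embedding(mu, nodes=None):
    """Permutation-invariant (sub)graph embedding: sum of node embeddings."""
    return mu.sum(axis=0) if nodes is None else mu[nodes].sum(axis=0)
```

Because the per-node update is shared and the readout is a sum, relabeling the nodes (permuting rows/columns of the adjacency and feature matrices consistently) leaves the graph embedding unchanged.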

In Q-learning Watkins and Dayan (1992), the agent estimates the state-action value function introduced earlier, and derives a deterministic policy that acts greedily with respect to it. The agent interacts with the environment and updates its estimates according to:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$
During learning, exploratory random actions are taken with probability ε. In the case of high-dimensional state and action spaces, approaches that use a neural network to estimate Q have been successful in a variety of domains ranging from general game-playing to continuous control Mnih et al. (2015); Lillicrap et al. (2016). In particular, we use the DQN algorithm: a sample-efficient method that improves on neural fitted Q-iteration Riedmiller (2005) by use of an experience replay buffer and an iteratively updated target network for state-action value function estimation. Specifically, we use two parametrizations of the Q-function, depending on whether the state contains an edge stub, of the form:

$$Q\big((G_t, \emptyset),\, v\big) = \theta^{\top} [\,\mu_{V} \,\|\, \mu_v\,], \qquad Q\big((G_t, \{v\}),\, u\big) = \theta'^{\top} [\,\mu_{V} \,\|\, \mu_v \,\|\, \mu_u\,],$$

where ∥ represents concatenation. This lets the model learn combinations of relevant node features (e.g., that connecting two central nodes has a high Q-value). The use of GNNs has several advantages: firstly, the parameters can be learned in a goal-directed fashion for the RL objective, allowing for flexibility in the learned representation. Secondly, the embeddings have the potential to generalize to larger graphs, since they control how to combine the node features of neighbors in the message-passing rounds and are not restricted to graphs of a particular size.
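One plausible rendition of the two stub-dependent readouts is a linear score over concatenated sum-pooled and node embeddings. The parameter vectors theta_a, theta_b and the helper names below are illustrative assumptions, not the paper's exact parametrization:

```python
import numpy as np

def q_first_pick(mu, theta_a):
    """Q((G, empty), v) for every node v: score [mu_G ; mu_v] linearly."""
    mu_G = mu.sum(axis=0)                 # permutation-invariant graph embedding
    feats = np.hstack([np.tile(mu_G, (mu.shape[0], 1)), mu])
    return feats @ theta_a                # one Q-value per candidate node

def q_second_pick(mu, stub, theta_b):
    """Q((G, {v}), u) for every node u, given edge stub v = `stub`."""
    mu_G = mu.sum(axis=0)
    n = mu.shape[0]
    feats = np.hstack([np.tile(mu_G, (n, 1)),
                       np.tile(mu[stub], (n, 1)),   # stub embedding
                       mu])
    return feats @ theta_b
```

Masking the returned vectors to the valid actions of the GI-MDP (non-saturated nodes, respectively non-neighbors of the stub) then yields the greedy choice.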

4 Experimental Setup

We evaluate RNet–DQN and the baselines on both synthetic and real-world graphs. We allow agents a number of edge additions proportional to a percentage of the total possible edges. Training is performed separately for each graph family, objective function, and value of the modification budget L. No hyperparameter tuning is performed due to computational constraints. Further details of the hyperparameters, implementation, and running costs are provided in the supplementary material.


We compare against the following approaches:

  • Random: Randomly selects an available action.

  • Greedy: Uses lookahead and selects the action that gives the biggest improvement in the estimated value of the objective over one edge addition.

  • Preferential: Previous works have considered preferential additions between the two nodes with the lowest degrees Beygelzimer et al. (2005), connecting a node with the lowest degree to a random node Wang and Van Mieghem (2008), or connecting the two nodes with the lowest degree product Wang et al. (2014), i.e., adding an edge between the vertices u, v that satisfy argmin_{(u,v)∉E} deg(u) · deg(v). We find the latter works best in all settings tested, and refer to it as LDP.

  • Fiedler Vector (FV): Introduced by Fiedler (1973) and applied to robustness improvement by Wang and Van Mieghem (2008), this strategy adds an edge between the vertices u, v that satisfy argmax_{(u,v)∉E} |f_u − f_v|, where f is the Fiedler vector, i.e., the eigenvector of the graph Laplacian corresponding to the second-smallest eigenvalue.

  • Effective Resistance (ERes): Introduced by Ellens et al. (2011) and applied to robustness improvement as a local pairwise approximation by Wang et al. (2014), this strategy selects the vertices u, v that satisfy argmax_{(u,v)∉E} R_{u,v}. The effective resistance R_{u,v} is defined as L†_{uu} + L†_{vv} − 2L†_{uv}, where L† is the Moore–Penrose pseudoinverse of the graph Laplacian L.

  • Supervised Learning (SL): We consider a supervised learning baseline by regressing on the objective to learn an approximation F̂. We use the same S2V architecture as RNet–DQN, which we train using an MSE loss instead of the Q-learning loss. To select actions for a graph G, the agent considers all graphs G′ that are one edge away, selecting the one that satisfies argmax_{G′} F̂(G′).
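For concreteness, the LDP, FV, and ERes selection rules can be sketched as follows. This uses networkx and NumPy with a dense eigendecomposition for illustration only; real implementations would exploit sparsity:

```python
import itertools
import numpy as np
import networkx as nx

def nonedges(G):
    """All candidate edges, i.e., node pairs not already connected."""
    return [(u, v) for u, v in itertools.combinations(G.nodes(), 2)
            if not G.has_edge(u, v)]

def ldp_edge(G):
    """Lowest degree product: argmin over non-edges of deg(u) * deg(v)."""
    return min(nonedges(G), key=lambda e: G.degree(e[0]) * G.degree(e[1]))

def fiedler_edge(G):
    """Largest difference in Fiedler-vector components across a non-edge."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                  # second-smallest eigenvalue's vector
    idx = {v: i for i, v in enumerate(G.nodes())}
    return max(nonedges(G),
               key=lambda e: abs(fiedler[idx[e[0]]] - fiedler[idx[e[1]]]))

def eres_edge(G):
    """Largest pairwise effective resistance across a non-edge."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    Li = np.linalg.pinv(L)                # Moore-Penrose pseudoinverse
    idx = {v: i for i, v in enumerate(G.nodes())}
    def resistance(e):
        i, j = idx[e[0]], idx[e[1]]
        return Li[i, i] + Li[j, j] - 2 * Li[i, j]
    return max(nonedges(G), key=resistance)
```

On a path graph, all three heuristics agree in joining the two endpoints, which matches the intuition that the endpoints are the most peripheral pair.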

Synthetic Graphs.

We study performance on graphs generated by the following models:

  • Erdős–Rényi (ER): A graph sampled uniformly out of the set of graphs with N nodes and m edges Erdős and Rényi (1960). We use a number of edges m equal to 20% of all possible edges.

  • Barabási–Albert (BA): A growth model in which each newly added node attaches preferentially to existing nodes Barabási and Albert (1999).

We consider graphs with N = 20 nodes, allowing agents to add a percentage of the total number of edges, which yields budgets L ∈ {2, 5, 10}. For RNet–DQN and SL, we train on a disjoint set of graphs. We periodically measure performance on a separate validation set, storing the best model found. The performance of all agents is evaluated on a held-out test set of graphs generated using the ER and BA models.

To evaluate out-of-distribution generalization, we repeat the evaluation on graphs with larger N (capped at a smaller size for Greedy and SL due to computational cost; see the next section), scaling m (for ER) and L accordingly.

Real-World Graphs.

In order to evaluate our approach on real-world graphs, we consider infrastructure networks (for which robustness is a critical property) extracted from two datasets:

  • Euroroad: road connections in mainland Europe and parts of Western and Central Asia Šubelj and Bajec (2011); Kunegis (2013);

  • Scigrid: a dataset of the European power grid Medjroubi et al. (2017).

We split these graphs by the country in which the nodes are located, selecting the largest connected component in case they are disconnected. We then select those of suitable size, obtaining a set of infrastructure graphs for Scigrid and another for Euroroad. Since in this context the performance on individual instances matters more than generalizability, we train and evaluate the models on each graph separately. SL is excluded from this experiment, as the dataset size is 1.

5 Results

Objective    Graph  L    Random  LDP    FV     ERes   Greedy  SL      SL      RNet–DQN  RNet–DQN
                                                             (avg)   (best)  (avg)     (best)
F_random     BA     2    0.018   0.036  0.051  0.053  0.033  0.048   0.057   0.051     0.057
             BA     5    0.049   0.089  0.098  0.106  0.079  0.099   0.122   0.124     0.130
             BA     10   0.100   0.158  0.176  0.180  0.141  0.161   0.203   0.211     0.222
             ER     2    0.029   0.100  0.103  0.103  0.082  0.094   0.100   0.098     0.104
             ER     5    0.071   0.168  0.172  0.175  0.138  0.158   0.168   0.164     0.173
             ER     10   0.138   0.238  0.252  0.253  0.217  0.221   0.238   0.240     0.249
F_targeted   BA     2    0.010   0.022  0.018  0.018  0.045  0.022   0.033   0.042     0.047
             BA     5    0.025   0.091  0.037  0.077  0.077  0.055   0.077   0.108     0.117
             BA     10   0.054   0.246  0.148  0.232  0.116  0.128   0.217   0.272     0.289
             ER     2    0.020   0.103  0.090  0.098  0.149  0.102   0.118   0.122     0.128
             ER     5    0.050   0.205  0.166  0.215  0.293  0.182   0.238   0.268     0.279
             ER     10   0.098   0.306  0.274  0.299  0.477  0.269   0.374   0.461     0.482
Table 1: Mean cumulative reward per episode obtained by the agents on synthetic graphs with N = 20, grouped by objective function, graph family, and number of edge additions L.

In Table 1, we present the results of our experimental evaluation for synthetic graphs. We also display the evolution of the validation loss during training in Figure 2. Out-of-distribution generalization results are shown in Figure 3. The results for real-world graphs are provided in Table 2, aggregated by dataset (for results split by individual country please see Table A.1 in the Technical Appendix).

Main Findings.

We summarize our findings as follows:

RNet–DQN provides competitive performance, especially for longer action sequences. Across all settings tested, RNet–DQN performed significantly better than random. On synthetic graphs the best model obtained the highest performance in 8 out of 12 settings tested, while the average performance is at least 89% of that of the best-performing configuration. For BA graphs, RNet–DQN obtained the best performance across all tasks tested. For ER graphs, ERes performed slightly better when considering F_random; for F_targeted, the greedy baseline performed better for shorter sequences. For real-world graphs, RNet–DQN obtained the best performance across all tasks.

Strategies for improving F_random are easier to learn. The performance gap between the trained model and the baselines is smaller for F_random, suggesting it is less complex to learn. This is also supported by the evaluation losses monitored during training, which show that performance improves and plateaus more quickly. For F_random, the network with randomly initialized parameters already yields reasonable policies, and training brings some improvement. In contrast, the improvements for F_targeted are much more dramatic.

Figure 2: Performance on synthetic graphs as a function of training steps. Note the different x-axis scales: more training steps are typically required for longer edge addition sequences.

Out-of-distribution generalization only occurs for F_random. The performance on larger out-of-distribution graphs is preserved for the F_random objective, and especially for BA graphs we observe strong generalization. The performance for F_targeted decays rapidly, becoming worse than the baselines as the size increases. The poor performance of the greedy policy suggests the learned estimates are no longer accurate under distribution shift. There are several possible explanations: e.g., the inherent noise of estimating F_random makes the neural network more robust to outliers, or central nodes impact message passing differently in larger graphs. We think investigating this phenomenon is a worthwhile future direction for this work, since out-of-distribution generalization does occur for F_random and evaluating the objective functions directly is prohibitively expensive for large graphs.

Figure 3: Performance on out-of-distribution synthetic graphs as a function of graph size, grouped by target problem and percentage of edge additions. For RNet–DQN and SL, models trained on graphs with N = 20 are used.

Performance on real-world graphs is comparatively better with respect to the baselines. This is expected, since training is performed separately for each graph to be optimized.

Dataset    Algorithm         F_random  F_targeted
Euroroad   Random            0.081     0.033
           LDP               0.164     0.122
           FV                0.185     0.091
           ERes              0.183     0.104
           Greedy            0.161     0.143
           RNet–DQN (avg)    0.187     0.213
           RNet–DQN (best)   0.202     0.222
Scigrid    Random            0.082     0.046
           LDP               0.211     0.123
           FV                0.212     0.087
           ERes              0.217     0.139
           Greedy            0.185     0.122
           RNet–DQN (avg)    0.236     0.217
           RNet–DQN (best)   0.258     0.231
Table 2: Mean cumulative reward per episode obtained by the agents on real-world graphs.

Time Complexity.

We also compare the time complexities of all the approaches considered:

  • RNet–DQN: Requires, at each step, constructing the node embeddings and, based on these embeddings, estimating Q for all valid actions; both scale linearly in the size of the graph.

  • Random: O(1) for sampling, assuming the environment checks action validity.

  • Greedy: The improvement in the estimated objective is evaluated for all O(|V|^2) possible edges. For each edge, this involves a number of MC simulations; as described in Section 2, each MC simulation has complexity O(|V| · (|V| + |E|)).

  • LDP: O(|V|^2), for computing the degree product of all candidate node pairs.

  • FV, ERes: O(|V|^3), since they involve computing the eigendecomposition and the Moore–Penrose pseudoinverse of the graph Laplacian, respectively (this may be faster in practice).

  • SL: F̂ is predicted for the O(|V|^2) graphs that are one edge away, then an argmax is taken.

It is worth noting that the analysis above does not account for the cost of training, the complexity of which is difficult to determine as it depends on many hyperparameters and the specific characteristics of the problem at hand. The approach is thus advantageous in situations in which predictions need to be made quickly, over many graphs, or the model transfers well from a cheaper training regime.

6 Related Work

Network Resilience.

Network resilience was first discussed by Albert, Jeong, and Barabási (2000), who examined the average shortest path distance as a function of the number of removed nodes. Analyzing two scale-free communication networks, they found that this type of network has good robustness to random failure but is vulnerable to targeted attacks. A more extensive investigation by Holme et al. (2002) analyzed the robustness of several real-world networks as well as some generated by synthetic models using a variety of attack strategies. Another area of interest is the analysis of the phase transitions of the graph in terms of connectivity under the two attack strategies Cohen et al. (2000, 2001). Optimal network topologies have also been discovered – for example, under the objective of resilience to both failures and attacks, the optimal network has a bi-modal or tri-modal degree distribution Valente, Sarkar, and Stone (2004); Tanizawa et al. (2005). There exists evidence to suggest that the topological robustness of infrastructure systems is correlated to operational robustness Solé et al. (2008). More broadly, the resilience of systems is highly important in structural engineering and risk management Cimellaro, Reinhorn, and Bruneau (2010); Ganin et al. (2016).

GNNs and Combinatorial Optimization.

Neural network architectures able to deal not solely with Euclidean but also with manifold and graph data have been developed in recent years Bronstein et al. (2017), and applied to a variety of problems where their capacity for representing structured, relational information can be exploited Battaglia et al. (2018). A sub-category of such approaches are Message Passing Neural Networks (MPNN) Gilmer et al. (2017), often referred to as Graph Neural Networks (GNN) instead. Significant progress has been achieved in machine learning for combinatorial optimization problems Bengio, Lodi, and Prouvost (2018) such as Minimum Vertex Cover and the Traveling Salesman Problem by framing them as a supervised learning Vinyals, Fortunato, and Jaitly (2015) or RL Bello et al. (2016) task. Combining GNNs with RL algorithms has yielded models capable of solving several graph optimization problems with the same architecture while generalizing to graphs an order of magnitude larger than those used during training Khalil et al. (2017). The problem as formulated in this paper is a combinatorial optimization problem, related to other problems in network design that arise in operations research Ahuja et al. (1995) – albeit with a different objective.

Graph Generation.

Similarities exist between the current work and the area of graph generative modeling, which tries to learn a generative model of graphs Li et al. (2018); You et al. (2018b); Liao et al. (2019) in a computationally efficient way given some existing examples. This generation, however, is not necessarily conditioned by an objective that captures a global property to be optimized. Other lines of work target constructing, ab initio, graphs that have desirable properties: examples include neural architecture search Zoph and Le (2017); Liu et al. (2018) and molecular graph generation Jin, Barzilay, and Jaakkola (2018); You et al. (2018a); Bradshaw et al. (2019). The concurrent work GraphOpt Trivedi, Yang, and Zha (2020) tackles the inverse problem: given a graph, the goal is to learn the underlying objective function that led to its generation. This is achieved with maximum entropy inverse RL and GNNs.

Relationship to RL–S2V.

Our method builds on RL–S2V, a method that was applied for constructing adversarial examples against graph classifiers Dai et al. (2018), and for simplicity we keep the learning mechanism very similar (S2V+DQN), while the problem domain we treat is fundamentally different. The key algorithmic differences to RL–S2V are related to the GI-MDP formulation: the reward function used, which in this case quantifies a global property of the graph itself, as well as the definition of the action spaces and the transition model, which account for excluding already-existing edges (RL–S2V ignores this, leading to some of the edge budget being wasted). Since the values of structural properties we discuss are increasing in the number of edges (the complete graph has robustness 1), RNet–DQN generally yields strictly better performance results.

7 Conclusion and Outlook

In this work, we have formulated the problem of improving the value of an arbitrary global objective function of a graph as the Graph Improvement MDP (GI-MDP) for the first time. Our approach, named RNet–DQN, uses Reinforcement Learning and Graph Neural Networks as a key component for generalization. As a case study, we have considered the problem of improving graph robustness to random and targeted removals of nodes.

Our experimental evaluation on synthetic and real-world graphs shows that in certain situations this approach can deliver performance superior to existing methods, both in terms of the solutions found (i.e., the resulting robustness of the graphs) and time complexity of model evaluation. Further, we have shown the ability to transfer to out-of-sample graphs, as well as the potential to transfer to out-of-distribution graphs larger than those used during training.

The proposed approach can be applied to other problems based on different definitions of resilience or considering fundamentally different objective functions such as efficiency Latora and Marchiori (2001), path diversity Gvozdiev et al. (2018), and assortativity Newman (2002), which are of interest in various biological, communication, and social networks.

Ethics and Broader Impact

The approach described in this work can be used to improve the robustness of a variety of human-made infrastructure systems such as communication networks, transportation networks, and power grids. This research might benefit the civil and structural engineering communities, which aim to make these systems resilient to natural or intentional damage. It may also help in the process of defining disaster recovery protocols. We are excited about the potential applications of this approach to the complex systems that enable society to function.

Beyond human-made structures, there may exist applications of the process of optimizing global characteristics of graphs as presented in this paper in the context of biological neural networks. For example, the topology of the brain evolves over time as connections are made and removed under changing cognitive demands. It is hypothesized that this process satisfies a trade-off between efficiency and metabolic wiring cost Bullmore and Sporns (2012); both are global structural properties of the network.

In principle, an approach very similar to the one presented might be used for learning strategies for attacking networks. The targeting of nodes by degree that we considered in this work is a common but relatively simple strategy; there exists a body of literature concerned with strategies for dismantling networks (see e.g., Wandelt et al. (2018) for a survey). While legitimate use cases for this type of method exist (such as disrupting malicious networks or tackling the spread of misinformation), we warn that ethical considerations should drive the application of this method and potential negative impacts associated to it should be thoroughly and transparently addressed. We believe that this method does not leverage or amplify any biases in the data – indeed, the experiments presented are solely based on data that represent topological structure.


  1. Since the objective is very expensive to estimate, we deliberately only provide the reward at the end of the episode in order to make training computationally feasible, to the detriment of possible credit assignment issues. Intermediate rewards based on the true objective or a related quantity represent a middle ground, which we leave for future work.
  2. We note that the problem formulation does not depend on the specific GNN or RL algorithm used. While further advances developed by the community in these areas Maron et al. (2019); Hessel et al. (2018) can be incorporated, in this paper we focus on aspects specific to the challenges of optimizing the global properties of graphs.


  1. Ahuja, R. K.; Magnanti, T. L.; Orlin, J. B.; and Reddy, M. R. 1995. Chapter 1 Applications of Network Optimization. In Handbooks in Operations Research and Management Science, volume 7 of Network Models, 1–83. Elsevier.
  2. Albert, R.; Jeong, H.; and Barabási, A.-L. 2000. Error and Attack Tolerance of Complex Networks. Nature 406(6794): 378–382.
  3. Barabási, A.-L.; and Albert, R. 1999. Emergence of Scaling in Random Networks. Science 286(5439): 509–512.
  4. Battaglia, P. W.; Hamrick, J. B.; Bapst, V.; Sánchez-González, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. 2018. Relational Inductive Biases, Deep Learning, and Graph Networks. arXiv:1806.01261 .
  5. Bello, I.; Pham, H.; Le, Q. V.; Norouzi, M.; and Bengio, S. 2016. Neural Combinatorial Optimization with Reinforcement Learning. arXiv:1611.09940 .
  6. Bengio, Y.; Lodi, A.; and Prouvost, A. 2018. Machine Learning for Combinatorial Optimization: a Methodological Tour d’Horizon. arXiv:1811.06128 .
  7. Beygelzimer, A.; Grinstein, G.; Linsker, R.; and Rish, I. 2005. Improving Network Robustness by Edge Modification. Physica A 357: 593–612.
  8. Bianchini, M.; Gori, M.; and Scarselli, F. 2005. Inside PageRank. ACM Trans. Internet Technol. 5(1): 92–128.
  9. Bradshaw, J.; Paige, B.; Kusner, M. J.; Segler, M. H. S.; and Hernández-Lobato, J. M. 2019. A Model to Search for Synthesizable Molecules. In NeurIPS.
  10. Bronstein, M. M.; Bruna, J.; LeCun, Y.; Szlam, A.; and Vandergheynst, P. 2017. Geometric Deep Learning: Going beyond Euclidean data. IEEE Signal Processing Magazine 34(4): 18–42.
  11. Bullmore, E.; and Sporns, O. 2012. The Economy of Brain Network Organization. Nature Reviews Neuroscience 13(5): 336–349.
  12. Cetinay, H.; Devriendt, K.; and Van Mieghem, P. 2018. Nodal Vulnerability to Targeted Attacks in Power Grids. Applied Network Science 3(1): 34.
  13. Cimellaro, G. P.; Reinhorn, A. M.; and Bruneau, M. 2010. Framework for Analytical Quantification of Disaster Resilience. Engineering Structures 32: 3639–3649.
  14. Cohen, R.; Erez, K.; ben Avraham, D.; and Havlin, S. 2000. Resilience of the Internet to Random Breakdowns. Physical Review Letters 85(21): 4626–4628.
  15. Cohen, R.; Erez, K.; ben Avraham, D.; and Havlin, S. 2001. Breakdown of the Internet under Intentional Attack. Physical Review Letters 86(16): 3682–3685.
  16. Dai, H.; Dai, B.; and Song, L. 2016. Discriminative Embeddings of Latent Variable Models for Structured Data. In ICML.
  17. Dai, H.; Li, H.; Tian, T.; Huang, X.; Wang, L.; Zhu, J.; and Song, L. 2018. Adversarial Attack on Graph Structured Data. In ICML.
  18. Ellens, W.; Spieksma, F.; Van Mieghem, P.; Jamakovic, A.; and Kooij, R. 2011. Effective Graph Resistance. Linear Algebra and its Applications 435(10): 2491–2506.
  19. Erdős, P.; and Rényi, A. 1960. On the Evolution of Random Graphs. Publ. Math. Inst. Hung. Acad. Sci 5(1): 17–60.
  20. Fiedler, M. 1973. Algebraic Connectivity of Graphs. Czechoslovak Mathematical Journal 23(2): 298–305.
  21. Ganin, A. A.; Massaro, E.; Gutfraind, A.; Steen, N.; Keisler, J. M.; Kott, A.; Mangoubi, R.; and Linkov, I. 2016. Operational Resilience: Concepts, Design and Analysis. Scientific Reports 6(1): 1–12.
  22. Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; and Dahl, G. E. 2017. Neural Message Passing for Quantum Chemistry. In ICML.
  23. Gvozdiev, N.; Vissicchio, S.; Karp, B.; and Handley, M. 2018. On Low-Latency-Capable Topologies, and Their Impact on the Design of Intra-Domain Routing. In SIGCOMM.
  24. Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; and Silver, D. 2018. Rainbow: Combining Improvements in Deep Reinforcement Learning. In AAAI.
  25. Holme, P.; Kim, B. J.; Yoon, C. N.; and Han, S. K. 2002. Attack Vulnerability of Complex Networks. Physical Review E 65(5).
  26. Jin, W.; Barzilay, R.; and Jaakkola, T. 2018. Junction Tree Variational Autoencoder for Molecular Graph Generation. In ICML.
  27. Khalil, E.; Dai, H.; Zhang, Y.; Dilkina, B.; and Song, L. 2017. Learning Combinatorial Optimization Algorithms over Graphs. In NeurIPS.
  28. Kunegis, J. 2013. KONECT: The Koblenz Network Collection. In WWW Companion.
  29. Latora, V.; and Marchiori, M. 2001. Efficient Behavior of Small-World Networks. Physical Review Letters 87(19): 198701.
  30. Li, Y.; Vinyals, O.; Dyer, C.; Pascanu, R.; and Battaglia, P. 2018. Learning Deep Generative Models of Graphs. In ICML.
  31. Liao, R.; Li, Y.; Song, Y.; Wang, S.; Nash, C.; Hamilton, W. L.; Duvenaud, D.; Urtasun, R.; and Zemel, R. S. 2019. Efficient Graph Generation with Graph Recurrent Attention Networks. In NeurIPS.
  32. Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; et al. 2016. Continuous Control with Deep Reinforcement Learning. In ICLR.
  33. Liu, H.; Simonyan, K.; Vinyals, O.; Fernando, C.; and Kavukcuoglu, K. 2018. Hierarchical Representations for Efficient Architecture Search. In ICLR.
  34. Maron, H.; Ben-Hamu, H.; Serviansky, H.; and Lipman, Y. 2019. Provably Powerful Graph Networks. In NeurIPS.
  35. Medjroubi, W.; Müller, U. P.; Scharf, M.; Matke, C.; and Kleinhans, D. 2017. Open Data in Power Grid Modelling: New Approaches Towards Transparent Grid Models. Energy Reports 3: 14–21.
  36. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; et al. 2015. Human-Level Control through Deep Reinforcement Learning. Nature 518(7540): 529–533.
  37. Newman, M. E. J. 2002. Assortative Mixing in Networks. Physical Review Letters 89(20).
  38. Newman, M. E. J. 2003. The Structure and Function of Complex Networks. SIAM Review 45(2).
  39. Newman, M. E. J. 2018. Networks. Oxford University Press.
  40. OEIS Foundation. 2020. The On-Line Encyclopedia of Integer Sequences.
  41. Riedmiller, M. 2005. Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In ECML.
  42. Schneider, C. M.; Moreira, A. A.; Andrade, J. S.; Havlin, S.; and Herrmann, H. J. 2011. Mitigation of Malicious Attacks on Networks. PNAS 108(10): 3838–3841.
  43. Solé, R. V.; Rosas-Casals, M.; Corominas-Murtra, B.; and Valverde, S. 2008. Robustness of the European Power Grids under Intentional Attack. Physical Review E 77(2): 026102.
  44. Tanizawa, T.; Paul, G.; Cohen, R.; Havlin, S.; and Stanley, H. E. 2005. Optimization of Network Robustness to Waves of Targeted and Random Attacks. Physical Review E 71(4).
  45. Trivedi, R.; Yang, J.; and Zha, H. 2020. GraphOpt: Learning Optimization Models of Graph Formation. In ICML.
  46. Valente, A. X. C. N.; Sarkar, A.; and Stone, H. A. 2004. Two-Peak and Three-Peak Optimal Complex Networks. Physical Review Letters 92(11).
  47. Vinyals, O.; Fortunato, M.; and Jaitly, N. 2015. Pointer Networks. In NeurIPS.
  48. Wandelt, S.; Sun, X.; Feng, D.; Zanin, M.; and Havlin, S. 2018. A Comparative Analysis of Approaches to Network-Dismantling. Scientific Reports 8(1): 1–15.
  49. Wang, H.; and Van Mieghem, P. 2008. Algebraic Connectivity Optimization via Link Addition. In Proceedings of the Third International Conference on Bio-Inspired Models of Network Information and Computing Systems (Bionetics).
  50. Wang, X.; Pournaras, E.; Kooij, R. E.; and Van Mieghem, P. 2014. Improving Robustness of Complex Networks via the Effective Graph Resistance. The European Physical Journal B 87(9): 221.
  51. Watkins, C. J. C. H.; and Dayan, P. 1992. Q-learning. Machine Learning 8(3-4): 279–292.
  52. Watts, D. J.; and Strogatz, S. H. 1998. Collective Dynamics of ‘Small-World’ Networks. Nature 393(6684): 440.
  53. You, J.; Liu, B.; Ying, R.; Pande, V.; and Leskovec, J. 2018a. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. In NeurIPS.
  54. You, J.; Ying, R.; Ren, X.; Hamilton, W. L.; and Leskovec, J. 2018b. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In ICML.
  55. Zoph, B.; and Le, Q. V. 2017. Neural Architecture Search with Reinforcement Learning. In ICLR.
  56. Šubelj, L.; and Bajec, M. 2011. Robust Network Community Detection Using Balanced Propagation. The European Physical Journal B 81(3): 353–362.