Local Distributed Decision

Supported by a France-Israel cooperation grant (“Multi-Computing” project) from the France Ministry of Science and Israel Ministry of Science.

A central theme in distributed network algorithms concerns understanding and coping with the issue of locality. Despite considerable progress, research efforts in this direction have not yet resulted in a solid basis in the form of a fundamental computational complexity theory. Inspired by sequential complexity theory, we focus on a complexity theory for distributed decision problems. In the context of locality, solving a decision problem requires the processors to independently inspect their local neighborhoods and then collectively decide whether a given global input instance belongs to some specified language.

We consider the standard LOCAL model of computation and define $\mathrm{LD}(t)$ (for local decision) as the class of decision problems that can be solved in $t$ communication rounds. We first study the intriguing question of whether randomization helps in local distributed computing, and to what extent. Specifically, we define the corresponding randomized class $\mathrm{BPLD}(t,p,q)$, containing languages for which there exists a randomized algorithm that runs in $t$ rounds and accepts correct instances with probability at least $p$ and rejects incorrect ones with probability at least $q$. We show that there exists a language that does not belong to $\mathrm{LD}(t)$ for any $t = o(n)$ but which belongs to $\mathrm{BPLD}(0,p,q)$ for any $p, q \in (0,1]$ such that $p^2 + q \le 1$. On the other hand, we show that, restricted to hereditary languages, $\mathrm{BPLD}(t,p,q) = \mathrm{LD}(O(t))$, for any function $t$ and any $p, q \in (0,1]$ such that $p^2 + q > 1$.

In addition, we investigate the impact of non-determinism on local decision, and establish some structural results inspired by classical computational complexity theory. Specifically, we show that non-determinism does help, but that this help is limited, as there exist languages that cannot be decided non-deterministically. Perhaps surprisingly, it turns out that it is the combination of randomization with non-determinism that makes it possible to decide all languages in constant time. Finally, we introduce the notion of local reduction, and establish some completeness results.

Keywords: Local distributed algorithms, local decision, nondeterminism, randomized algorithms.

1 Introduction

1.1 Motivation

Distributed computing concerns a collection of processors which collaborate in order to achieve some global task. With time, two main disciplines have evolved in the field. One discipline deals with timing issues, namely, uncertainties due to asynchrony (the fact that processors run at their own speed, and possibly crash), and the other concerns topology issues, namely, uncertainties due to locality constraints (the lack of knowledge about far away processors). Studies carried out by the distributed computing community within these two disciplines were to a large extent problem-driven. Indeed, several major problems considered in the literature concern coping with one of the two uncertainties. For instance, in the asynchrony-discipline, Fischer, Lynch and Paterson [14] proved that consensus cannot be achieved in the asynchronous model, even in the presence of a single fault, and in the locality-discipline, Linial [28] proved that $(\Delta+1)$-coloring cannot be achieved locally (i.e., in a constant number of communication rounds), even in the ring network.

One of the significant achievements of the asynchrony-discipline was its success in establishing unifying theories in the flavor of computational complexity theory. Some central examples of such theories are failure detectors [6, 7] and the wait-free hierarchy (including Herlihy’s hierarchy) [18]. In contrast, despite considerable progress, the locality-discipline still suffers from the absence of a solid basis in the form of a fundamental computational complexity theory. Obviously, defining some common cost measures (e.g., time, message, memory, etc.) enables us to compare problems in terms of their relative cost. Still, from a computational complexity point of view, it is not clear how to relate the difficulty of problems in the locality-discipline. Specifically, if two problems have different kinds of outputs, it is not clear how to reduce one to the other, even if they cost the same.

Inspired by sequential complexity theory, we focus on decision problems, in which one is aiming at deciding whether a given global input instance belongs to some specified language. In the context of distributed computing, each processor must produce a boolean output, and the decision is defined by the conjunction of the processors’ outputs, i.e., if the instance belongs to the language, then all processors must output “yes”, and otherwise, at least one processor must output “no”. Observe that decision problems provide a natural framework for tackling fault-tolerance: the processors have to collectively check whether the network is fault-free, and a node detecting a fault raises an alarm. In fact, many natural problems can be phrased as decision problems, like “is there a unique leader in the network?” or “is the network planar?”. Moreover, decision problems occur naturally when one is aiming at checking the validity of the output of a computational task, such as “is the produced coloring legal?”, or “is the constructed subgraph an MST?”. Construction tasks such as exact or approximated solutions to problems like coloring, MST, spanner, MIS, maximum matching, etc., received enormous attention in the literature (see, e.g., [5, 25, 26, 28, 30, 31, 32, 38]), yet the corresponding decision problems have hardly been considered.

The purpose of this paper is to investigate the nature of local decision problems. Decision problems seem to provide a promising approach to building up a distributed computational theory for the locality-discipline. Indeed, as we will show, one can define local reductions in the framework of decision problems, thus enabling the introduction of complexity classes and notions of completeness.

We consider the LOCAL model [36], which is a standard distributed computing model capturing the essence of locality. In this model, processors are woken up simultaneously, and computation proceeds in fault-free synchronous rounds during which every processor exchanges messages of unlimited size with its neighbors, and performs arbitrary computations on its data. Informally, let us define $\mathrm{LD}(t)$ (for local decision) as the class of decision problems that can be solved in $t$ communication rounds in the LOCAL model. (We find special interest in the case where $t$ represents a constant, but in general we view $t$ as a function of the input graph. We note that in the LOCAL model, every decidable decision problem can be solved in $n$ communication rounds, where $n$ denotes the number of nodes in the input graph.)

Some decision problems are trivially in $\mathrm{LD}(O(1))$ (e.g., “is the given coloring a $(\Delta+1)$-coloring?”, “do the selected nodes form an MIS?”, etc.), while some others can easily be shown to be outside $\mathrm{LD}(t)$, for any $t = o(n)$ (e.g., “is the network planar?”, “is there a unique leader?”, etc.). In contrast to the above examples, there are some languages for which it is not clear whether they belong to $\mathrm{LD}(t)$, even for $t = O(1)$. To elaborate on this, consider the particular case where it is required to decide whether the network belongs to some specified family $\mathcal{F}$ of graphs. If this question can be decided in a constant number of communication rounds, then this means, informally, that the family $\mathcal{F}$ can somehow be characterized by relatively simple conditions. For example, a family $\mathcal{F}$ of graphs that can be characterized as consisting of all graphs having no subgraph from $\mathcal{C}$, for some specified finite set $\mathcal{C}$ of finite subgraphs, is obviously in $\mathrm{LD}(O(1))$. However, the question of whether a family of graphs can be characterized as above is often non-trivial. For example, characterizing cographs as precisely the graphs with no induced $P_4$, attributed to Seinsche [40], is not easy, and requires nontrivial usage of modular decomposition.

The first question we address is whether and to what extent randomization helps. For $p, q \in (0,1]$, define $\mathrm{BPLD}(t,p,q)$ as the class of all distributed languages that can be decided by a randomized distributed algorithm that runs in $t$ communication rounds and produces correct answers on legal (respectively, illegal) instances with probability at least $p$ (resp., $q$). An interesting observation is that for $p$ and $q$ such that $p^2 + q \le 1$, we have $\mathrm{LD}(t) \subsetneq \mathrm{BPLD}(t,p,q)$. In fact, for such $p$ and $q$, there exists a language $\mathcal{L} \in \mathrm{BPLD}(0,p,q)$, such that $\mathcal{L} \notin \mathrm{LD}(t)$, for any $t = o(n)$. To see why, consider the following Unique-Leader language. The input is a graph where each node has a bit indicating whether it is a leader or not. An input is in the language Unique-Leader if and only if there is at most one leader in the graph. Obviously, this language is not in $\mathrm{LD}(t)$, for any $t = o(n)$. We claim it is in $\mathrm{BPLD}(0,p,q)$, for $p$ and $q$ such that $p^2 + q \le 1$. Indeed, for such $p$ and $q$, we can design the following simple randomized algorithm that runs in 0 time: every node which is not a leader says “yes” with probability 1, and every node which is a leader says “yes” with probability $p$. Clearly, if the graph has at most one leader then all nodes say “yes” with probability at least $p$. On the other hand, if there are at least $k \ge 2$ leaders, at least one node says “no”, with probability at least $1 - p^k \ge 1 - p^2 \ge q$.
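The zero-round algorithm for Unique-Leader can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names `accept_probability` and `is_valid_decider` are hypothetical, and exact rational arithmetic is used only to avoid rounding issues. It computes the probability that all nodes say “yes” and verifies the two success guarantees for a given pair $(p, q)$.

```python
from fractions import Fraction


def accept_probability(leader_bits, p):
    """Probability that every node says "yes" in the zero-round algorithm.

    Non-leaders always say "yes"; each leader independently says "yes"
    with probability p, so the result is p ** (number of leaders).
    """
    prob = Fraction(1)
    for bit in leader_bits:
        if bit == 1:
            prob *= p
    return prob


def is_valid_decider(p, q, max_leaders=10):
    """Check the (p, q) guarantees for instances with 0..max_leaders leaders.

    Only the number of leaders matters, so an instance is modeled as a
    list of leader bits (an illustrative simplification).
    """
    # Legal instances (at most one leader): accepted with probability >= p.
    legal_ok = all(accept_probability([1] * k, p) >= p for k in (0, 1))
    # Illegal instances (k >= 2 leaders): rejected with probability 1 - p^k >= q.
    illegal_ok = all(
        1 - accept_probability([1] * k, p) >= q
        for k in range(2, max_leaders + 1)
    )
    return legal_ok and illegal_ok
```

With $p = 1/2$ and $q = 3/4$ (so that $p^2 + q = 1$) the check succeeds, whereas raising $q$ to $4/5$ makes it fail on instances with exactly two leaders.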

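It turns out that the aforementioned choice of $p$ and $q$ is not coincidental, and that $p^2 + q = 1$ is really the correct threshold. Indeed, we show that $\text{Unique-Leader} \notin \mathrm{BPLD}(t,p,q)$, for any $t = o(n)$, and any $p$ and $q$ such that $p^2 + q > 1$. In fact, we show a much more general result, that is, we prove that if $p^2 + q > 1$, then restricted to hereditary languages, $\mathrm{BPLD}(t,p,q)$ actually collapses into $\mathrm{LD}(O(t))$, for any $t$.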

In the second part of the paper, we investigate the impact of non-determinism on local decision, and establish some structural results inspired by classical computational complexity theory. Specifically, we show that non-determinism does help, but that this help is limited, as there exist languages that cannot be decided non-deterministically. Perhaps surprisingly, it turns out that it is the combination of randomization with non-determinism that makes it possible to decide all languages in constant time. Finally, we introduce the notion of local reduction, and establish some completeness results.

1.2 Our contributions

Impact of randomization

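We study the impact of randomization on local decision. We prove that if $p^2 + q > 1$, then restricted to hereditary languages, $\mathrm{BPLD}(t,p,q) = \mathrm{LD}(O(t))$, for any function $t$. This, together with the observation that $\mathrm{LD}(t) \subsetneq \mathrm{BPLD}(t,p,q)$ whenever $p^2 + q \le 1$, for any $t = o(n)$, may indicate that $p^2 + q = 1$ serves as a sharp threshold for distinguishing the deterministic case from the randomized one.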

Impact of non-determinism

We first show that non-determinism helps local decision, i.e., we show that the class $\mathrm{NLD}(O(1))$ (cf. Section 2.3) strictly contains $\mathrm{LD}(O(1))$. More precisely, we show that there exists a language in $\mathrm{NLD}(O(1))$ which is not in $\mathrm{LD}(t)$ for every $t = o(n)$, where $n$ is the size of the input graph. Nevertheless, $\mathrm{NLD}(t)$ does not capture all (decidable) languages, for $t = o(n)$. Indeed we show that there exists a language not in $\mathrm{NLD}(t)$ for every $t = o(n)$. Specifically, this language is

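Perhaps surprisingly, it turns out that it is the combination of randomization with non-determinism that makes it possible to decide all languages in constant time. Let $\mathrm{BPNLD}(O(1)) = \mathrm{BPNLD}(O(1),p,q)$, for some constants $p$ and $q$ such that $p^2 + q \le 1$. We prove that $\mathrm{BPNLD}(O(1))$ contains all languages. To sum up, $\mathrm{LD}(O(1)) \subsetneq \mathrm{NLD}(O(1)) \subsetneq \mathrm{BPNLD}(O(1)) = \mathrm{All}$.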

Finally, we introduce the notion of many-one local reduction, and establish some completeness results. We show that there exists a problem, called cover, which is, in a sense, the most difficult decision problem. That is, we show that cover is $\mathrm{BPNLD}(O(1))$-complete. (Interestingly, a small relaxation of cover, called containment, turns out to be $\mathrm{NLD}(O(1))$-complete.)

1.3 Related work

Locality issues have been thoroughly studied in the literature, via the analysis of various construction problems, including $(\Delta+1)$-coloring and Maximal Independent Set (MIS) [1, 5, 23, 26, 28, 30, 35], Minimum Spanning Tree (MST) [12, 25, 37], Maximal Matching [19], Maximum Weighted Matching [31, 32, 41], Minimum Dominating Set [24, 27], Spanners [9, 13, 38], etc. For some problems (e.g., coloring [5, 23, 35]), there are still large gaps between the best known results on specific families of graphs (e.g., bounded degree graphs) and on arbitrary graphs.

The question of what can be computed in a constant number of communication rounds was investigated in the seminal work of Naor and Stockmeyer [34]. In particular, that paper considers a subclass of $\mathrm{LD}(O(1))$, called LCL, which is essentially $\mathrm{LD}(O(1))$ restricted to languages involving graphs of constant maximum degree, and involving processor inputs taken from a set of constant size, and studies the question of how to compute in $O(1)$ rounds the constructive versions of decision problems in LCL. The paper provides some beautiful general results. In particular, the authors show that if there exists a randomized algorithm that constructs a solution for a problem in LCL in $O(1)$ rounds, then there is also a deterministic algorithm constructing a solution for this problem in $O(1)$ rounds. Unfortunately, the proof of this result relies heavily on the definition of LCL. Indeed, the constant bound constraints on the degrees and input sizes allow the authors to cleverly use Ramsey theory. It is thus not clear whether it is possible to extend this result to all languages in $\mathrm{LD}(O(1))$.

The question of whether randomization helps in decreasing the locality parameter of construction problems has been the focus of numerous studies. To date, there exists evidence that, for some problems at least, randomization does not help. For instance, [33] proves this for 3-coloring the ring. In fact, for low degree graphs, the gaps between the efficiencies of the best known randomized and deterministic algorithms for problems like MIS, $(\Delta+1)$-coloring, and Maximal Matching are very small. On the other hand, for graphs of arbitrarily large degrees, there seem to be indications that randomization does help, at least in some cases. For instance, $(\Delta+1)$-coloring can be randomly computed in expected $O(\log n)$ communication rounds on $n$-node graphs [1, 30], whereas the best known deterministic algorithm for this problem performs in $2^{O(\sqrt{\log n})}$ rounds [35]. $(\Delta+1)$-coloring results whose performances are measured also with respect to the maximum degree $\Delta$ illustrate this phenomenon as well. Specifically, [39] shows that $(\Delta+1)$-coloring can be randomly computed in expected $O(\log \Delta + \sqrt{\log n})$ communication rounds whereas the best known deterministic algorithm performs in $O(\Delta + \log^* n)$ rounds [5, 23].

Recently, several results were established concerning decision problems in distributed computing. For example, [8] and [20] study specific decision problems in the CONGEST model. (In contrast to the LOCAL model, this model assumes that the message size is bounded by $O(\log n)$ bits, hence dealing with congestion is the main issue.) Specifically, tight bounds are established in [20] for the time and message complexities of the problem of deciding whether a subgraph is an MST, and time lower bounds for many other subgraph-decision problems (e.g., spanning tree, connectivity) are established in [8]. It is interesting to note that some of these lower bounds imply strong unconditional time lower bounds on the hardness of distributed approximation for many classical construction problems in the CONGEST model. Decision problems have received recent attention from the asynchrony-discipline too, in the framework of wait-free computing [17]. In this framework, the focus is on task checkability. Wait-free checkable tasks have been characterized in terms of covering spaces, a fundamental tool in algebraic topology.

The theory of proof-labeling schemes [21, 22] was designed to tackle the issue of locally verifying (with the aid of a proof, i.e., a certificate, at each node) solutions to problems that cannot be decided locally (e.g., “is the given subgraph a spanning tree of the network?”, or, “is it an MST?”). In fact, the model of proof-labeling schemes has some resemblance to our definition of the class NLD. Investigations in the framework of proof-labeling schemes mostly focus on the minimum size of the certificate necessary so that verification can be performed in a single round. The notion of proof-labeling schemes also has interesting similarities with the notions of local detection [2], local checking [3], or silent stabilization [11], which were introduced in the context of self-stabilization [10].

The use of oracles that provide information to nodes was studied intensively in the context of distributed construction tasks. For instance, this framework, called local computation with advice, was studied in [15] for MST construction and in [15] for 3-coloring a cycle.

Finally, we note that our notion of NLD seems to be related to the theory of lifts, e.g., [29].

2 Decision problems and complexity classes

2.1 Model of computation

Let us first recall some basic notions in distributed computing. We consider the LOCAL model [36], which is a standard model capturing the essence of locality. In this model, processors are assumed to be nodes of a network $G$, provided with arbitrary distinct identities, and computation proceeds in fault-free synchronous rounds. At each round, every processor $v \in V(G)$ exchanges messages of unrestricted size with its neighbors in $G$, and performs computations on its data. We assume that the number of steps (sequential time) used for the local computation made by the node $v$ in some round $r$ is bounded by some function $f_{\mathcal{A}}(H(r,v))$, where $H(r,v)$ denotes the size of the “history” seen by node $v$ up to the beginning of round $r$, that is, the total number of bits encoded in the input and the identity of the node, as well as in the incoming messages from previous rounds. Here, we do not impose any restriction on the growth rate of $f_{\mathcal{A}}$. We would like to point out, however, that imposing such restrictions, or alternatively, imposing restrictions on the memory used by a node for local computation, may lead to interesting connections between the theory of locality and classical computational complexity theory. To sum up, during the execution of a distributed algorithm $\mathcal{A}$, all processors are woken up simultaneously, and, initially, a processor is solely aware of its own identity, and possibly of some local input too. Then, in each round $r$, every processor $v$
(1) sends messages to its neighbors,
(2) receives messages from its neighbors, and
(3) performs at most $f_{\mathcal{A}}(H(r,v))$ computations.

After a number of rounds (that may depend on the network $G$ and may vary among the processors, simply because nodes have different identities, potentially different inputs, and are typically located at non-isomorphic positions in the network), every processor $v$ terminates and outputs some value $\mathrm{out}(v)$. Consider an algorithm running in a network $G$ with input x and identity assignment Id. The running time of a node $v$, denoted $T_v(G,x,\mathrm{Id})$, is the maximum of the number of rounds until $v$ outputs. The running time of the algorithm, denoted $T(G,x,\mathrm{Id})$, is the maximum of the number of rounds until all processors terminate, i.e., $T(G,x,\mathrm{Id}) = \max\{T_v(G,x,\mathrm{Id}) \mid v \in V(G)\}$. Let $t$ be a non-decreasing function of input configurations $(G,x,\mathrm{Id})$. (By non-decreasing, we mean that if $G'$ is an induced subgraph of $G$ and $x'$ and $\mathrm{Id}'$ are the restrictions of x and Id, respectively, to the nodes in $G'$, then $t(G',x',\mathrm{Id}') \le t(G,x,\mathrm{Id})$.) We say that an algorithm has running time at most $t$, if $T(G,x,\mathrm{Id}) \le t(G,x,\mathrm{Id})$, for every $(G,x,\mathrm{Id})$. We shall give special attention to the case that $t$ represents a constant function. Note that in general, given $(G,x,\mathrm{Id})$, the nodes may not be aware of $t(G,x,\mathrm{Id})$. On the other hand, note that, if $t = t(G,x,\mathrm{Id})$ is known, then w.l.o.g. one can always assume that a local algorithm running in time at most $t$ operates at each node $v$ in two stages: (A) collect all information available in $B_G(v,t)$, the $t$-neighborhood, or ball of radius $t$ of $v$ in $G$, including inputs, identities and adjacencies, and (B) compute the output based on this information.
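The two-stage view of a local algorithm (collect the ball, then compute) can be sketched as a centralized simulation. This is only an illustration under stated assumptions: the graph is given as an adjacency dictionary, and the helper names `ball` and `run_local` as well as the layout of the `view` dictionary are hypothetical, not notation from the paper.

```python
from collections import deque


def ball(adj, v, t):
    """Return the set of nodes at distance at most t from v (truncated BFS)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue  # do not explore beyond radius t
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def run_local(adj, inputs, ids, t, decide):
    """Simulate a t-round algorithm: stage (A) gather the ball, stage (B) decide.

    Simplification: the view contains the subgraph induced by the ball, which
    also includes edges between two nodes at distance exactly t from v.
    """
    outputs = {}
    for v in adj:
        nodes = ball(adj, v, t)
        view = {
            "center": v,
            "adj": {u: [w for w in adj[u] if w in nodes] for u in nodes},
            "inputs": {u: inputs[u] for u in nodes},
            "ids": {u: ids[u] for u in nodes},
        }
        outputs[v] = decide(view)
    return outputs
```

Any deterministic rule that maps a view to an output can be passed as `decide`; the simulation visits the nodes sequentially, which does not affect the outputs since each node's decision depends only on its own view.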

2.2 Local decision (LD)

We now refine some of the above concepts, in order to formally define our objects of interest. Obviously, a distributed algorithm that runs on a graph $G$ operates separately on each connected component of $G$, and nodes of a component $C$ of $G$ cannot distinguish the underlying graph $G$ from $C$. For this reason, we consider connected graphs only.

Definition 2.1

A configuration $(G,x)$ is a pair where $G$ is a connected graph, and every node $v \in V(G)$ is assigned as its local input a binary string $x(v) \in \{0,1\}^*$.

In some problems, the local input of every node is empty, i.e., $x(v) = \epsilon$ for every $v \in V(G)$, where $\epsilon$ denotes the empty binary string. Since an undecidable collection of configurations remains undecidable in the distributed setting too, we consider only decidable collections of configurations. Formally, we define the following.

Definition 2.2

A distributed language is a decidable collection of configurations.

In general, there are several possible ways of representing a configuration of a distributed language corresponding to standard distributed computing problems. Some examples considered in this paper are the following.

Unique-Leader consists of all configurations such that there exists at most one node with local input 1, with all the others having local input 0.

Consensus consists of all configurations such that all nodes agree on the value proposed by some node.

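Coloring $= \{(G,x) \mid \forall v \in V(G),\ \forall w \in N(v),\ x(v) \neq x(w)\}$, where $N(v)$ denotes the (open) neighborhood of $v$, that is, all nodes at distance 1 from $v$.

MIS consists of all configurations $(G,x)$ such that the set $\{v \in V(G) \mid x(v) = 1\}$ forms a maximal independent set of $G$.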

SpanningTree is a spanning tree of consists of all configurations such that the set of edges between every node and its neighbor satisfying forms a spanning tree of .
(The language MST, for minimum spanning tree, can be defined similarly).

An identity assignment Id for a graph $G$ is an assignment of distinct integers to the nodes of $G$. A node $v \in V(G)$ executing a distributed algorithm $\mathcal{A}$ in a configuration $(G,x)$ initially knows only its own identity $\mathrm{Id}(v)$ and its own input $x(v)$, and is unaware of the graph $G$. After $t$ rounds, $v$ acquires knowledge only of its $t$-neighborhood $B_G(v,t)$. In each round $r$ of the algorithm $\mathcal{A}$, a node may communicate with its neighbors by sending and receiving messages, and may perform at most $f_{\mathcal{A}}(H(r,v))$ computations. Eventually, each node $v \in V(G)$ must output a local output $\mathrm{out}_{\mathcal{A}}(G,x,\mathrm{Id},v)$.

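Let $\mathcal{L}$ be a distributed language. We say that a distributed algorithm $\mathcal{A}$ decides $\mathcal{L}$ if and only if for every configuration $(G,x)$, and for every identity assignment Id for the nodes of $G$, every node of $G$ eventually terminates and outputs “yes” or “no”, satisfying the following decision rules:

  • If $(G,x) \in \mathcal{L}$, then $\mathrm{out}_{\mathcal{A}}(G,x,\mathrm{Id},v) =$ “yes” for every node $v \in V(G)$;

  • If $(G,x) \notin \mathcal{L}$, then there exists at least one node $v \in V(G)$ such that $\mathrm{out}_{\mathcal{A}}(G,x,\mathrm{Id},v) =$ “no”.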

We are now ready to define one of our main subjects of interest, the class $\mathrm{LD}(t)$, for local decision.

Definition 2.3

Let $t$ be a non-decreasing function of triplets $(G,x,\mathrm{Id})$. Define $\mathrm{LD}(t)$ as the class of all distributed languages that can be decided by a local distributed algorithm that runs in at most $t$ rounds.

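For instance, $\text{Coloring} \in \mathrm{LD}(1)$ and $\text{MIS} \in \mathrm{LD}(1)$. On the other hand, it is not hard to see that languages such as Unique-Leader, Consensus, and SpanningTree are not in $\mathrm{LD}(t)$, for any $t = o(n)$. In what follows, we define $\mathrm{LD}(O(t)) = \cup_{c > 1} \mathrm{LD}(c \cdot t)$.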
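As an illustration of the decision rules, here is a minimal sketch of a one-round decider for Coloring, simulated centrally. The function names `coloring_node_output` and `accepts` are hypothetical, and the graph is assumed to be given as an adjacency dictionary with one color per node.

```python
def coloring_node_output(v, adj, x):
    """One-round rule at node v: say "yes" iff no neighbor shares v's color."""
    return "yes" if all(x[v] != x[w] for w in adj[v]) else "no"


def accepts(adj, x, node_rule):
    """Global decision: the configuration is accepted iff all nodes say "yes".

    Equivalently, a single "no" at any node rejects the whole configuration.
    """
    return all(node_rule(v, adj, x) == "yes" for v in adj)
```

Each node only compares its own input with those of its neighbors, which requires a single communication round; an improperly colored edge is detected by both of its endpoints.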

2.3 Non-deterministic local decision (NLD)

A distributed verification algorithm is a distributed algorithm $\mathcal{A}$ that gets as input, in addition to a configuration $(G,x)$, a global certificate vector y, i.e., every node $v$ of a graph $G$ gets as input a binary string $x(v) \in \{0,1\}^*$, and a certificate $y(v) \in \{0,1\}^*$. A verification algorithm $\mathcal{A}$ verifies $\mathcal{L}$ if and only if for every configuration $(G,x)$, the following hold:

  • If $(G,x) \in \mathcal{L}$, then there exists a certificate y such that for every id-assignment Id, algorithm $\mathcal{A}$ applied on $(G,x)$ with certificate y and id-assignment Id outputs “yes” for all $v \in V(G)$;

  • If $(G,x) \notin \mathcal{L}$, then for every certificate y and for every id-assignment Id, algorithm $\mathcal{A}$ applied on $(G,x)$ with certificate y and id-assignment Id outputs “no” for at least one node $v \in V(G)$.

One motivation for studying the nondeterministic verification framework comes from settings in which one must perform local verifications repeatedly. In such cases, one can afford to have a relatively “wasteful” preliminary step in which a certificate is computed for each node. Using these certificates, local verifications can then be performed very fast. See [21, 22] for more details regarding such applications. Indeed, the definition of a verification algorithm finds similarities with the notion of proof-labeling schemes discussed in [21, 22]. Informally, in a proof-labeling scheme, the construction of a “good” certificate y for a configuration $(G,x)$ may depend also on the given id-assignment. Since the question of whether a configuration $(G,x)$ belongs to a language $\mathcal{L}$ is independent from the particular id-assignment, we prefer to let the “good” certificate y depend only on the configuration. In other words, as defined above, a verification algorithm operating on a configuration $(G,x) \in \mathcal{L}$ and a “good” certificate y must say “yes” at every node regardless of the id-assignment.

We now define the class $\mathrm{NLD}(t)$, for nondeterministic local decision. (Our terminology is by direct analogy to the class NP in sequential computational complexity.)

Definition 2.4

Let $t$ be a non-decreasing function of triplets $(G,x,\mathrm{Id})$. Define $\mathrm{NLD}(t)$ as the class of all distributed languages that can be verified in at most $t$ communication rounds.

2.4 Bounded-error probabilistic local decision (BPLD)

A randomized distributed algorithm is a distributed algorithm $\mathcal{A}$ that enables every node $v$, at any round $r$ during the execution, to toss a number of random bits obtaining a string $r(v) \in \{0,1\}^*$. Clearly, this number cannot exceed $f_{\mathcal{A}}(H(r,v))$, the bound on the number of computational steps used by node $v$ at round $r$. Note however, that $H(r,v)$ may now also depend on the random bits produced by other nodes in previous rounds. For $p, q \in (0,1]$, we say that a randomized distributed algorithm $\mathcal{A}$ is a $(p,q)$-decider for $\mathcal{L}$, or, that it decides $\mathcal{L}$ with “yes” success probability $p$ and “no” success probability $q$, if and only if for every configuration $(G,x)$, and for every identity assignment Id for the nodes of $G$, every node of $G$ eventually terminates and outputs “yes” or “no”, and the following properties are satisfied:

  • If $(G,x) \in \mathcal{L}$, then $\Pr[\mathrm{out}_{\mathcal{A}}(G,x,\mathrm{Id},v) = \text{“yes” for every node } v \in V(G)] \ge p$,

  • If $(G,x) \notin \mathcal{L}$, then $\Pr[\mathrm{out}_{\mathcal{A}}(G,x,\mathrm{Id},v) = \text{“no” for at least one node } v \in V(G)] \ge q$,

where the probabilities in the above definition are taken over all possible coin tosses performed by nodes. We define the class $\mathrm{BPLD}(t,p,q)$, for “Bounded-error Probabilistic Local Decision”, as follows.

Definition 2.5

For $p, q \in (0,1]$ and a function $t$, $\mathrm{BPLD}(t,p,q)$ is the class of all distributed languages that have a local randomized distributed $(p,q)$-decider running in time $t$ (i.e., can be decided in time $t$ by a local randomized distributed algorithm with “yes” success probability $p$ and “no” success probability $q$).

3 A sharp threshold for randomization

Consider some graph $G$, and a subset $U$ of the nodes of $G$, i.e., $U \subseteq V(G)$. Let $G[U]$ denote the vertex-induced subgraph of $G$ defined by the nodes in $U$. Given a configuration $(G,x)$, let $x[U]$ denote the input x restricted to the nodes in $U$. For simplicity of presentation, if $H$ is a subgraph of $G$, we denote $x[V(H)]$ by $x[H]$. A prefix of a configuration $(G,x)$ is a configuration $(G[U], x[U])$, where $U \subseteq V(G)$ (note that in particular, $G[U]$ is connected). We say that a language $\mathcal{L}$ is hereditary if every prefix of every configuration $(G,x) \in \mathcal{L}$ is also in $\mathcal{L}$. Coloring and Unique-Leader are clearly hereditary languages. As another example of an hereditary language, consider a family $\mathcal{F}$ of hereditary graphs, i.e., that is closed under vertex deletion; then the language $\{(G,\epsilon) \mid G \in \mathcal{F}\}$ is hereditary. Examples of hereditary graph families are planar graphs, interval graphs, forests, chordal graphs, cographs, perfect graphs, etc.
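The notion of a prefix can be made concrete with a small brute-force sketch that enumerates all connected induced sub-configurations of a tiny instance. The helper names (`is_connected`, `prefixes`, `closed_under_prefixes`) and the two membership tests are hypothetical illustrations; in particular, the “exactly one leader” variant is not a language discussed in the text and is included only as a contrasting non-hereditary example. The enumeration is exponential and meant for very small graphs only.

```python
from itertools import combinations


def is_connected(adj, nodes):
    """Check that the subgraph induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def prefixes(adj, x):
    """Yield every prefix (G[U], x[U]) for which G[U] is connected."""
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if is_connected(adj, subset):
                chosen = set(subset)
                sub_adj = {v: [w for w in adj[v] if w in chosen] for v in subset}
                sub_x = {v: x[v] for v in subset}
                yield sub_adj, sub_x


def at_most_one_leader(adj, x):
    """Membership test for Unique-Leader (leader bit 1, otherwise 0)."""
    return sum(x.values()) <= 1


def exactly_one_leader(adj, x):
    """Contrasting test that is NOT closed under taking prefixes."""
    return sum(x.values()) == 1


def closed_under_prefixes(adj, x, member):
    """True iff every prefix of (adj, x) satisfies the membership test."""
    return all(member(a, s) for a, s in prefixes(adj, x))
```

On a three-node path with a single leader, every prefix still has at most one leader, whereas the prefix consisting of the two non-leader nodes violates the “exactly one leader” condition.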

Theorem 3.1 below asserts that, for hereditary languages, randomization does not help if one imposes that $p^2 + q > 1$, i.e., the “no” success probability is larger than one minus the square of the “yes” success probability. Somewhat more formally, we prove that for hereditary languages, we have $\cup_{p^2+q>1} \mathrm{BPLD}(t,p,q) = \mathrm{LD}(O(t))$. This complements the fact that for $p^2 + q \le 1$, we have $\mathrm{LD}(t) \subsetneq \mathrm{BPLD}(t,p,q)$, for any $t = o(n)$.

Recall that [34] investigates the question of whether randomization helps for constructing in constant time a solution for a problem in LCL. We stress that the technique used in [34] for tackling this question relies heavily on the definition of LCL, specifically, that only graphs of constant degree and of constant input size are considered. Hence it is not clear whether the technique of [34] can be useful for our purposes, as we impose no such assumptions on the degrees or input sizes. Also, although it seems at first glance that the Lovász local lemma might have been helpful here, we could not effectively apply it in our proof. Instead, we use a completely different approach.

Theorem 3.1

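Let $\mathcal{L}$ be an hereditary language and let $t$ be a function. If $\mathcal{L} \in \mathrm{BPLD}(t,p,q)$ for constants $p, q \in (0,1]$ such that $p^2 + q > 1$, then $\mathcal{L} \in \mathrm{LD}(O(t))$.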

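Proof. Let us start with some definitions. Let $\mathcal{L}$ be a language in $\mathrm{BPLD}(t,p,q)$ where $p, q \in (0,1]$ and $p^2 + q > 1$, and $t$ is some function. Let $\mathcal{A}$ be a randomized algorithm deciding $\mathcal{L}$, with “yes” success probability $p$, and “no” success probability $q$, whose running time is at most $t(G,x,\mathrm{Id})$, for every configuration $(G,x)$ with identity assignment Id. Fix a configuration $(G,x)$, and an id-assignment Id for the nodes of $G$. The distance $\mathrm{dist}_G(u,v)$ between two nodes $u, v$ of $G$ is the minimum number of edges in a path connecting $u$ and $v$ in $G$. The distance between two subsets $U_1, U_2 \subseteq V(G)$ is defined as

$$\mathrm{dist}_G(U_1,U_2) = \min\{\mathrm{dist}_G(u,v) \mid u \in U_1,\ v \in U_2\}.$$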

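For a set $S \subseteq V(G)$, let $\mathcal{E}(G,x,\mathrm{Id},S)$ denote the event that when running $\mathcal{A}$ on $(G,x)$ with id-assignment Id, all nodes in $S$ output “yes”. Let $v \in V(G)$. The running time of $\mathcal{A}$ at $v$ may depend on the coin tosses made by the nodes. Let $t_v = t_v(G,x,\mathrm{Id})$ denote the maximal running time of $v$ over all possible coin tosses. Note that $t_v \le t(G,x,\mathrm{Id})$ (we do not assume that either $t$ or $t_v$ is known to $v$).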

The radius of a node , denoted , is the maximum value such that there exists a node , where . (Observe that the radius of a node is at most .) The radius of a set of nodes is . In what follows, fix a constant such that , and define .

A splitter of is a triplet of pairwise disjoint subsets of nodes such that , . (Observe that may depend on the identity assignment and the input, and therefore, being a splitter is not just a topological property depending only on ). Given a splitter of , let , and let be the input x restricted to nodes in , for .

The following structural claim does not use the fact that is hereditary.

Lemma 3.2

For every configuration with identity assignment Id, and every splitter of , we have

Let be a configuration with identity assignment Id. Assume, towards contradiction, that there exists a splitter of triplet , such that and , yet . (The fact that and implies that both and are connected, however, we note, that for the claim to be true, it is not required that , or are connected.) Let .

Given a vertex , we define the level of by . For an integer , let denote the set of nodes in of level . For an integer , let , and finally, for a set of integers, let .

Define

Claim 3.3

There exists such that .

Proof. For proving Claim 3.3, we upper bound the size of by . This is done by covering the integers in by at most sets, such that each one is -independent, that is, for every two integers in the same set, they are at least apart. Specifically, for and , we define . Observe that, as desired, , and for each , is -independent. In what follows, fix and let . Since , we know that,

Observe that for , , and hence, the -neighborhood in of every node is contained in , i.e., . It therefore follows that:

(1)

Consider two integers and in . We know that . Hence, the distance in between any two nodes and is at least . Thus, the events and are independent. It follows by the definition of , that

(2)

By (1) and (2), we have that and thus . Since can be covered by the sets , , each of which is -independent, we get that

Combining this bound with the fact that , we get that . It follows by the pigeonhole principle that there exists some such that , as desired. This completes the proof of Claim 3.3.

Fix such that , and let . By definition,

(3)

Let denote the subgraph of induced by the nodes in . We similarly define as the subgraph of induced by the nodes in . Note that , and for any two nodes and , we have . It follows that, for , the -neighborhood in of each node equals the -neighborhood in of , that is, . (To see why, consider, for example, the case . Given , it is sufficient to show that , such that . Indeed, if such a vertex exists then , and hence . Since there must exist a vertex such that , we get that , in contradiction to the fact that .) Thus, for , since , we get .

Let . As the events and are independent, it follows that , that is

(4)

By Eqs. (3) and (4), and using union bound, it follows that . Thus

This is in contradiction to the assumption that . This concludes the proof of Lemma 3.2.

Our goal now is to show that by proving the existence of a deterministic local algorithm that runs in time and recognizes . (No attempt is made here to minimize the constant factor hidden in the notation.) Recall that both and may not be known to . Nevertheless, by inspecting the balls for increasing , each node can compute an upper bound on as given by the following claim.

Claim 3.4

Fix a configuration , an id-assignment Id, and a constant . In time, each node can compute a value such that (1) and (2) for every , we have .

To establish the claim, observe first that in time, each node can compute a value satisfying . Indeed, given the ball , for some integer , and using the upper bound on the number of (sequential) local computations, node can simulate all its possible executions up to round . The desired value is the smallest for which all executions of up to round conclude with an output at . Once is computed, node aims at computing . For this purpose, it starts again to inspect the balls for increasing , to obtain from each . (For this purpose, it may need to wait until computes , but this delays the whole computation by at most time.) Now, node outputs for the smallest satisfying (1) and (2) for every , we have . It is easy to see that for this , we have , hence .

Given a configuration , and an id-assignment Id, Algorithm , applied at a node first calculates , and then outputs “yes” if and only if the -neighborhood of in belongs to . That is,

Obviously, Algorithm is a deterministic algorithm that runs in time . We claim that Algorithm decides . Indeed, since is hereditary, if , then every prefix of is also in , and thus, every node outputs “yes”. Now consider the case where , and assume by contradiction that by applying on with id-assignment Id, every node outputs “yes”. Let be maximal by inclusion, such that is connected and . Obviously, is not empty, as for every node . On the other hand, we have , because .

Let be a node with maximal such that contains a node outside . Define as the subgraph of induced by . Observe that is connected and that strictly contains . Towards contradiction, our goal is to show that .

Let denote the graph which is maximal by inclusion such that is connected and

Let be the connected components of , ordered arbitrarily. Let be the empty graph, and for , define the graph . Observe that is connected for each . We prove by induction on that for every . This will establish the contradiction since . For the basis of the induction, the case , we need to show that . However, this is immediate by the facts that is a connected subgraph of , the configuration , and is hereditary. Assume now that we have for , and consider the graph . Define the sets of nodes

A crucial observation is that is a splitter of . This follows from the following arguments. Let us first show that . By definition, we have , for every . Hence, in order to bound the radius of (in ) by it is sufficient to prove that there is no node such that . Indeed, if such a node exists then and hence contains a node outside , in contradiction to the choice of . It follows that .

We now claim that . Consider a simple directed path in going from a node to a node . Since and , we get that must pass through a vertex in . Let be the last vertex in such that , and consider the directed subpath of going from to . Now, let . The first vertices in the directed subpath must belong to . In addition, observe that all nodes in must be in . It follows that the first nodes of are in . Since , we get that , and thus . Consequently, , as desired. This completes the proof that is a splitter of .

Now, by the induction hypothesis, we have , because . In addition, we have , because , and