The Power of Choice combined with Preferential Attachment

Yury Malyshkin, Department of Mathematics and Mechanics, Moscow State University;
Laboratory of Solid State Electronics, Tver State University;
yury.malyshkin@mail.ru
and Elliot Paquette, Department of Mathematics, Weizmann Institute of Science; paquette@weizmann.ac.il
July 11, 2019
Abstract.

We prove almost sure convergence of the maximum degree in an evolving tree model combining local choice and preferential attachment. At each step in the growth of the graph, a new vertex is introduced. A fixed, finite number of possible neighbors are sampled from the existing vertices with probability proportional to degree. Of these possibilities, the vertex with the largest degree is chosen. The maximal degree in this model has linear or near-linear behavior. This contrasts sharply with what is seen in the same choice model without preferential attachment. The proof is based on showing that the tree has a persistent hub, by comparison with the standard preferential attachment model, together with martingale and random walk arguments.

YM gratefully acknowledges the support of the Weizmann Institute of Science, where this work was performed. EP gratefully acknowledges the support of NSF Postdoctoral Fellowship DMS-1304057.

1. Introduction

In the present work we further explore how the addition of choice affects the classic preferential attachment model (see [BA99, KRL00]), building on previous work [DKM07, MP13, KR13]. The preferential attachment graph is a sequence of graphs, indexed by time and constructed inductively as follows. We start with some initial graph, and at each step we add a new vertex and an edge between it and one of the old vertices, chosen with probability proportional to its degree. Many properties of this model have been obtained in both the mathematics and the physics literature (see [BA99, KRL00, Mór05, DvdHH10]).

In the current work we are interested in the degree distribution and, in particular, in the maximal degree. For the preferential attachment model this problem is studied in [FFF05, Mór05]. It is shown in [Mór05] that the maximum degree $M_n$ at time $n$ has $M_n/\sqrt{n}$ converging almost surely to a random variable with a non-degenerate, absolutely continuous distribution. In [MP13], limited choice is introduced into the preferential attachment model. More specifically, at each step we independently choose $2$ (or in general $d$) existing vertices with probability proportional to degree and connect the new vertex with the one of smaller degree. In [MP13] it is shown that the maximal degree at time $n$ in such a model is, with high probability, $\log\log n/\log 2 + \Theta(1)$ ($\log\log n/\log d + \Theta(1)$ in the case of $d$ choices). There, it is also conjectured by the present authors that if we choose the vertex with the higher degree, the maximal degree will be of order $n/\log n$. Subsequently, this model was studied in the physics literature [KR13], where the analysis is expanded to show that for $d=2$ this is indeed the case, while for $d>2$ the maximal degree has linear order.

We will give exact first-order asymptotics for the maximal degree in the max-choice model and show almost sure convergence of the appropriately scaled maximal degree. We now describe the model in more detail.

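Define a sequence of trees $(T_n)_{n \ge 1}$ by the following rule. Let $T_1$ be the one-edge tree. Given $T_n$, define $T_{n+1}$ by first adding one new vertex $v_{n+1}$. Let $X_{n,i}$, where $1 \le i \le d$, be i.i.d. vertices from $V(T_n)$, the set of vertices of $T_n$, chosen with probability

$$\mathbb{P}\left(X_{n,i} = v \mid T_n\right) = \frac{\deg_{T_n}(v)}{2n}, \qquad v \in V(T_n).$$

Note that as the graph $T_n$ has $n$ edges, $\sum_{v \in V(T_n)} \deg_{T_n}(v) = 2n$. Finally, create a new edge between $v_{n+1}$ and $Y_n$, where $Y_n$ is whichever of $X_{n,1}, \dots, X_{n,d}$ has the largest degree. In the case of a tie, choose according to an independent fair coin toss. We call this the max-choice preferential attachment tree.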
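To make the growth rule concrete, here is a minimal Python sketch of the max-choice preferential attachment tree. It illustrates the rule under stated assumptions and is not the authors' code. The function name `max_choice_tree` is hypothetical. For $d>2$ the sketch assumes that ties are broken uniformly among the distinct tied candidates, which reduces to the fair coin toss when two distinct vertices tie. Degree-proportional sampling uses a list of edge endpoints in which each vertex $v$ appears exactly $\deg(v)$ times, so a uniformly chosen entry equals $v$ with probability $\deg(v)/(2n)$.

```python
import random


def max_choice_tree(n_steps, d, seed=None):
    """Grow the max-choice preferential attachment tree from the one-edge tree.

    Returns the degree list of the tree after n_steps growth steps
    (that tree has n_steps + 1 edges and n_steps + 2 vertices).
    """
    rng = random.Random(seed)
    degree = [1, 1]      # T_1: one edge between vertices 0 and 1
    endpoints = [0, 1]   # vertex v appears here exactly degree[v] times
    for _ in range(n_steps):
        # d i.i.d. samples, each equal to v with probability deg(v) / (2 * #edges)
        samples = [endpoints[rng.randrange(len(endpoints))] for _ in range(d)]
        top = max(degree[v] for v in samples)
        # Assumed tie rule: uniform over the distinct sampled vertices of top degree
        target = rng.choice(sorted({v for v in samples if degree[v] == top}))
        new_vertex = len(degree)
        degree.append(1)
        degree[target] += 1
        endpoints.extend((target, new_vertex))
    return degree
```

For instance, `max(max_choice_tree(100_000, 3, seed=1)) / 100_000` gives a single-run estimate of $M_n/n$ for $d=3$ at $n \approx 10^5$.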

Let us formulate our main theorem:

Theorem 1.1.

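In the case $d=2$ the maximum degree $M_n$ of $T_n$ satisfies

$$\lim_{n\to\infty} \frac{M_n \log n}{n} = 4$$

a.s. For $d>2$ the maximum degree $M_n$ of $T_n$ satisfies

$$\lim_{n\to\infty} \frac{M_n}{n} = x^*$$

a.s., where $x^*$ is the unique positive solution of the equation $1 - (1 - x/2)^d = x$ in the interval $(0,1)$.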

Our proof is based on the existence of a persistent hub, i.e. a single vertex that, after some almost surely finite random time, has the highest degree for all subsequent times. Using this, instead of analyzing the maximum degree over all vertices, we effectively only need to analyze the degree of a single vertex.

Proposition 1.2.

There exist a random time $N$, finite almost surely, and a random vertex $v$ such that at any time $n \ge N$, the vertex $v$ has the highest degree among all vertices.

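Let $K_n$ denote the number of vertices at time $n$ that have maximal degree. The dynamics of $M_n$ are given by the rule

$$M_{n+1} = \begin{cases} M_n + 1 & \text{with probability } 1 - \left(1 - \dfrac{K_n M_n}{2n}\right)^d, \\ M_n & \text{otherwise.} \end{cases} \qquad (1)$$

The effect of Proposition 1.2 is that for some random $N$, we have $K_n = 1$ for all $n \ge N$. If we were to assume that $K_n$ were identically one, we would effectively be considering a simple multi-choice urn.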

This urn contains two types of balls, colored black and colored white, with the number of black balls corresponding to $M_n$ and the number of white balls being $2n - M_n$. At every time step, $d$ balls are sampled from the urn with replacement and then put back into the urn. If all $d$ are white, then two white balls are added to the urn. If at least one is black, then one white ball and one black ball are added to the urn. Such urn models with multiple samplings have appeared recently in the literature (see [KMP13, CW05]), although this case does not appear to have been covered.
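The urn dynamics can be sketched in a few lines of Python. This is an illustrative simulation and not part of the proof. The function name `multi_choice_urn` is hypothetical. The default starting configuration of one black and one white ball, which mirrors the two degree-one vertices of the one-edge tree, is also an assumption made for the example. Each step adds two balls, so after $k$ steps the urn holds $2(k+1)$ balls, matching the total degree of a tree with $k+1$ edges.

```python
import random


def multi_choice_urn(n_steps, d, black=1, white=1, seed=None):
    """Simulate the multi-choice urn; return the (black, white) counts after each step."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_steps):
        total = black + white
        # Draw d balls with replacement; the step is "all white" iff every draw is white.
        all_white = all(rng.random() < white / total for _ in range(d))
        if all_white:
            white += 2               # two white balls are added
        else:
            black += 1               # at least one black ball was drawn:
            white += 1               # add one black and one white ball
        history.append((black, white))
    return history
```

The number of black balls divided by half the total number of balls plays the role of $M_n/n$.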

Proof approach and organization

We start in section 3 with some initial lower-bound estimates for the maximal degree. All subsequent arguments require that the maximal degree grows quickly enough to ensure deterministic behavior takes over.

In section 4 we prove the existence of the persistent hub, which allows us to consider the degree of a single vertex instead of the maximal degree. The argument follows the proof of [Gal13] for convex preferential attachment models and consists of two steps. First, we show that the number of possible leaders, vertices that have maximal degree at some time, is almost surely finite; this follows on account of the maximal degree growing quickly enough that vertices added after a long time have a very small probability of ever catching up. Second, we show that any two vertices have degrees that change leadership only finitely many times. These arguments rely heavily on comparison with the preferential attachment model and the Pólya urn respectively.

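In sections 5 and 6 we prove convergence of the scaled maximal degree in the cases $d=2$ and $d>2$ respectively, which require different analyses. From (1), we anticipate that the maximal degree of the graph evolves according to the differential equation

$$\frac{dM}{dn} = 1 - \left(1 - \frac{M}{2n}\right)^d.$$

Setting $M(n) = n\,x(\log n)$, we get that $x(t)$ satisfies the autonomous differential equation

$$\frac{dx}{dt} = 1 - \left(1 - \frac{x}{2}\right)^d - x.$$

In the case $d=2$ the right-hand side equals $-x^2/4$, and the equation can be explicitly solved to give $x(t) = 4/(t+C)$ for a constant $C$, i.e. $M(n) \sim 4n/\log n$. In the $d>2$ case, we are led to consider critical points, which are solutions of $1 - (1 - x/2)^d = x$. When $d>2$ there are two solutions of this equation in the interval $[0,1]$, but only one of them, $x^*$, is stable (meaning that $1 - (1 - x/2)^d - x$ has the opposite sign of $x - x^*$ in a neighborhood of $x^*$).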
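The following Python sketch is only a numerical illustration of the heuristic above. It locates the stable critical point by bisection and integrates the autonomous equation with forward Euler. The helper names `drift`, `stable_root` and `euler`, the bisection bracket and the step size are choices made for this example. Bisection is valid because the right-hand side is concave in $x$ on $[0,1]$, vanishes at $0$, has slope $d/2 - 1 > 0$ there when $d>2$, and equals $-2^{-d}$ at $x=1$.

```python
def drift(x, d):
    """Right-hand side of dx/dt = 1 - (1 - x/2)**d - x, where t = log n."""
    return 1.0 - (1.0 - x / 2.0) ** d - x


def stable_root(d, tol=1e-12):
    """Bisection for the positive zero x* of the drift on (0, 1]; requires d > 2."""
    lo, hi = 1e-6, 1.0  # drift(lo) > 0 since drift'(0) = d/2 - 1 > 0; drift(1) = -2**-d < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if drift(mid, d) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def euler(x0, d, t_end, dt=0.01):
    """Forward Euler integration of dx/dt = drift(x, d) from x(0) = x0 up to t_end."""
    x = x0
    for _ in range(round(t_end / dt)):
        x += dt * drift(x, d)
    return x
```

For $d=3$, substituting $u = x/2$ turns $1-(1-u)^3 = 2u$ into $u(u^2 - 3u + 1) = 0$. Hence $x^* = 3 - \sqrt{5} \approx 0.764$, which `stable_root(3)` reproduces. For $d=2$, `euler(1.0, 2, t)` tracks the explicit solution $4/(t+4)$.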

In section 5 we prove the case $d=2$ by considering explicit scale functions of the maximal degree that can be guessed from the solution of the differential equation. In section 6, we prove the $d>2$ case, which can be formulated generally as follows. Consider a continuous function $f$ and define a process $(X_n)$, started from a point $X_0$, such that the increments $X_{n+1} - X_n$ are independent random variables conditionally on the past. This problem has appeared many times in the stochastic approximation literature under the name of the Robbins-Monro model (see [KC78] or [Ben99]). Off-the-shelf techniques are nearly applicable to the situation for $d>2$, but they still require us to show that $M_n/n$ is in a neighborhood of the stable critical point $x^*$ infinitely often, which is the bulk of the work here. We then give a quick random walk argument to show that $M_n/n$ converges to $x^*$.
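The $d>2$ case has the Robbins-Monro form. Write $M_n$ for the maximal degree, $K_n$ for the number of maximal-degree vertices and $\Delta_n = M_{n+1} - M_n$. Then $M_{n+1}/(n+1) = M_n/n + (\Delta_n - M_n/n)/(n+1)$, and when $K_n = 1$, $\Delta_n$ is a Bernoulli variable with mean $1 - (1 - M_n/(2n))^d$ by (1). The Python sketch below simply iterates this recursion, with step size $1/(n+1)$ and mean field $1 - (1-x/2)^d - x$. It assumes $K_n = 1$ throughout, which is the urn simplification, and it starts from the one-edge tree. The name `robbins_monro_max_degree` is hypothetical. The sketch illustrates the stochastic approximation and does not reproduce the authors' argument.

```python
import random


def robbins_monro_max_degree(n_steps, d, seed=None):
    """Iterate x_{n+1} = x_n + (B_n - x_n) / (n + 1), B_n ~ Bernoulli(1 - (1 - x_n/2)**d).

    Here x_n stands for M_n / n under the simplifying assumption K_n = 1.
    Returns the final iterate.
    """
    rng = random.Random(seed)
    x = 1.0  # one-edge tree: n = 1 edge, maximal degree 1
    for n in range(1, n_steps + 1):
        p = 1.0 - (1.0 - x / 2.0) ** d  # chance that the hub is among the d samples
        b = 1.0 if rng.random() < p else 0.0
        x += (b - x) / (n + 1)
    return x
```

For $d=3$ the iterates settle near the stable zero $x^* = 3 - \sqrt{5}$, though fairly slowly. The linearized drift at $x^*$ has slope only about $-0.43$.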

2. Discussion

Theorem 1.1 allows us to complete Table 1 about the influence of choice on the maximum degree of growing random trees. In summary, for the min-choice models, the effect of the choice completely overwhelms the extra effect of the preferential attachment. On the other hand, the combined effect of preferential attachment with max-choice completely changes the structure of the graph and the order of the maximum degree (see also Figure 1 for a simulation of these trees). In comparison, adding max-choice to the uniform attachment model produces only a quantitative increase in the maximum degree.

Theorem 1.1 together with Proposition 1.2 provides information about the degree sequence of the graph and some structural information about the graph, but it would be nice to know more about the topology of the tree. One natural topological property to consider is the diameter of the tree.

Table 1. Comparison of max/min-choice for $d$ choices with preferential or uniform attachment: order of the maximum degree at time $n$.

\begin{tabular}{r|ccc}
 & max-choice & no-choice & min-choice \\
\hline
Preferential attachment & $n/\log n$ ($d=2$); $n$ ($d>2$) & $\sqrt{n}$ (a) & $\log\log n$ (b) \\
Uniform attachment & $\log n$ (c) & $\log n$ (d) & $\log\log n$ (c) \\
\end{tabular}

(a) [Mór05]. (b) [MP13]. (c) [DKM07]. (d) To our knowledge, this is not claimed formally anywhere. However, getting the correct order is an elementary exercise.

In the standard preferential attachment model the diameter is known to be logarithmic [DvdHH10]. It is natural to wonder whether the diameter in this situation is smaller. To increase the diameter we must add an edge between a new vertex and an existing vertex of degree 1. In the max-choice model, choosing such a vertex is still not too rare: while it is less likely to choose a degree-1 vertex than in preferential attachment, there are of order $n$ degree-1 vertices. Thus, degree-1 vertices are selected at each time step with some probability bounded away from $0$. Conditional on choosing a vertex of degree 1, the exact choice of vertex is uniform over all possible choices. Thus we conjecture that the diameter of the graph grows at a rate commensurate with that of the preferential attachment model.

The rate could be different if we change the rule for picking the vertex in the case of a tie. The model we study breaks ties uniformly, but in fact any tie-breaking rule gives the same evolution of the degree sequence in law. However, the rule could significantly affect the structure of the graph. For example, if instead of a fair coin toss we define a function $f$ on the vertices, and at each step we choose the vertex with the smallest value of $f$ among all tied vertices of the same degree, we anticipate something like order diameter (see also [KRL00], where such a model is considered).

In the model we study here, we consider only graphs that are trees, and we believe that similar results should hold for classes of non-tree models. One such natural model would be to add more than one edge at each step. A second would be to flip a coin at each time step to choose, with some fixed probability, between adding a new vertex and adding an edge between existing vertices. If adding a vertex, the rule would be the same as in our model, while for adding an edge there are a few natural possibilities that could affect the structure of the graph. Here is one such rule. We choose the first vertex with probability proportional to the degrees of the vertices of the graph (which is preferential attachment without choice), and then we choose the second vertex among all non-adjacent vertices using the max-choice rule. In this case we anticipate the max-degree behavior to match that of the tree model. Note that both these methods will only increase the average degree of the vertices of the graph.

(a) The preferential attachment tree.
(b) The max-choice preferential attachment tree.
(c) The max-choice uniform attachment tree.
Figure 1. All renderings are with vertices.

3. A priori estimates

We begin with a pair of lower bounds for the growth of the maximal degree. These are needed both for the persistent hub proof and the eventual precise estimates. We will frequently use the following lemma of [Gal13].

Lemma 3.1.

Suppose that a sequence of positive numbers satisfies

for fixed reals and Then has a positive limit.

This is easily checked by a direct computation. We use $\mathcal{F}_n$ to denote the natural filtration for the whole tree, i.e. $\mathcal{F}_n = \sigma(T_1, \dots, T_n)$. With respect to this filtration, both and are measurable.

Lemma 3.2.

With probability

Proof.

Define with By Lemma 3.1 we have that converges to a positive limit. Now, we will show that is a supermartingale from which the desired conclusion follows.

Let $p_n$ be the probability that the maximal degree increases at the $n$-th step. Note that

For we get

We now show that this initial bound can be improved by a second application of the same argument.

Lemma 3.3.

For any fixed ,

a.s.

Proof.

Let be the stopping time given by

From Lemma 3.2, we have that as Set to be the event

As in the proof of Lemma 3.2, we get that Then for it holds that

For each fixed and

if for some sufficiently large . Hence for we get

Define . Then is a supermartingale and from Lemma 3.1 it follows that converges to a positive finite limit. Setting we have that by Doob’s theorem tends to a finite limit with probability 1. Hence, conditioned on we have that Thus, it follows that

Taking we conclude the proof. ∎

4. Persistent hub

Our method of proof is essentially by comparison with the preferential attachment model, and we use the machinery developed in [Gal13] for this task. First, we estimate the probability that the degree of a vertex added at a given step could ever exceed the degree of the vertex with the highest degree at that step. For this we use the following lemma:

Lemma 4.1.

The probability that the degree of the vertex added on the -th step becomes maximal does not exceed

where is some polynomial of and is the maximum degree at the -th step. Hence, the number of vertices that at some point in the process have maximal degree is finite almost surely.

First we prove the following auxiliary result:

Lemma 4.2.

Fix Let for denote the random walk on started from that moves one step right or one step up with probabilities proportional to and respectively. For any pair of vertices and , the probability that their degrees become equal at some time is bounded above by the probability that the random walk reaches the line , where at time

Proof.

Consider the two-dimensional random walk where is the degree of vertex and is the degree of vertex . Without loss of generality assume that . We want to show that

We will show the existence of an appropriate coupling of and To this end, set

and let and

The probability that increases is at least the probability that ,…, and that all the other choices have degree strictly less than Thus

Likewise, the probability that increases is at most the probability that vertex ,…, and Thus

So long as we have Hence

Using the convexity of , we have the bound for Applying this to the previous inequality, we get:

Thus,

Letting be the times at which moves, we have that and can be coupled in such a way that both and until the first time Thus if at some finite time it must also be that there is a time at which completing the proof.

The walk would describe the evolution of the degrees of two vertices in the preferential attachment model without choices. Hence we can apply to it some of the results from [Gal13]. We will now use it to prove Lemma 4.1.

Proof.

Consider the vertex added on the -th step. Its degree at time equals 1. Let , , and . Corollary 15 of [Gal13] gives the following estimate for the probability that the walk , moves from the point to the diagonal:

where is some polynomial.

By Lemma 3.2 we get that for some random almost surely. In particular, forms a convergent series with probability , and by Borel–Cantelli, the number of for which the vertex added at the -th step has maximal degree at some point in time is finite almost surely. ∎

To complete the proof of Proposition 1.2 we now need the following lemma:

Lemma 4.3.

Consider two vertices that at some time have maximal degree. With probability $1$ there are only finitely many times when these vertices have the same degree and are maximal.

Proof.

Let and be two vertices that at some point have equal, maximal degree, and let be the first time that this occurs. Consider a two-dimensional random walk with coordinates equal to for all time They have the same degree if and only if the walk is on the line . As in Lemma 4.2, the probability that hits the line when started off the line is bounded from above by the probability that hits the line Hence the number of times that returns to the line is bounded above by the number of times returns to the line

It is a standard fact about the Pólya urn that if starts from a point , then the fraction tends in law to a random variable as tends to infinity, where has a beta probability distribution:

(See also Proposition 16 of [Gal13].) Since the beta distribution is absolutely continuous, the fraction tends to an absolutely continuous probability distribution for any starting point of the process . Thus the limit of exists almost surely, and it takes value with probability 0. Hence this fraction can be equal to only finitely many times, and so can return to the line only finitely many times.
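The beta limit for the classical Pólya urn can be checked numerically. The Python sketch below is an illustration, not a proof. It runs the walk that steps right with probability $x/(x+y)$ and up with probability $y/(x+y)$ from a starting point $(x_0, y_0)$. It then compares the empirical mean and variance of the final fraction $x/(x+y)$ with those of the Beta$(x_0, y_0)$ law, which is the standard limit for this urn. The function names and run lengths are arbitrary choices for the example.

```python
import random
import statistics


def polya_fraction(x0, y0, n_steps, rng):
    """Walk from (x0, y0): step right w.p. x/(x+y), else up. Return x/(x+y) at the end."""
    x, y = x0, y0
    for _ in range(n_steps):
        if rng.random() < x / (x + y):
            x += 1
        else:
            y += 1
    return x / (x + y)


def beta_moments(a, b):
    """Mean and variance of the Beta(a, b) distribution."""
    s = a + b
    return a / s, a * b / (s * s * (s + 1))


def compare_with_beta(x0, y0, n_steps=1000, n_runs=1000, seed=0):
    """Empirical (mean, variance) of the final fraction next to the Beta(x0, y0) moments."""
    rng = random.Random(seed)
    samples = [polya_fraction(x0, y0, n_steps, rng) for _ in range(n_runs)]
    empirical = (statistics.fmean(samples), statistics.pvariance(samples))
    return empirical, beta_moments(x0, y0)
```

Because the fraction is a bounded martingale, its mean is exactly $x_0/(x_0+y_0)$ at every step. Only the spread of the samples reflects the beta shape.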

Thus, the only way that there can be infinitely many times for which is if both and stabilize, i.e. there is a not depending on and an for which for all However, in this case, these degrees are only maximal for finitely many times as the maximal degree goes to infinity by Lemma 3.2, which completes the proof. ∎

Proof of Proposition 1.2 .

From Lemma 4.1 the number of vertices that at some point have maximal degree is finite almost surely, and from Lemma 4.3 these finitely many vertices only change leadership finitely many times almost surely. Thus, after some sufficiently long time, a single vertex remains the maximal degree vertex for all subsequent time. ∎
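Proposition 1.2 can also be probed by simulation. The Python sketch below grows the max-choice tree and records every step at which some vertex strictly overtakes the current leader. By Proposition 1.2 the list of change times is almost surely finite, although a finite simulation can only suggest this. Three choices are assumptions made for the example: the function name, keeping the incumbent leader on ties, and uniform tie-breaking among the distinct tied candidates for $d>2$.

```python
import random


def leader_changes(n_steps, d, seed=None):
    """Grow the max-choice tree and record the steps at which the leader changes.

    The leader is a vertex of maximal degree; on ties the incumbent is kept.
    Returns (leader, change_steps, degree).
    """
    rng = random.Random(seed)
    degree = [1, 1]      # one-edge tree
    endpoints = [0, 1]   # vertex v appears here exactly degree[v] times
    leader, change_steps = 0, []
    for step in range(n_steps):
        samples = [endpoints[rng.randrange(len(endpoints))] for _ in range(d)]
        top = max(degree[v] for v in samples)
        target = rng.choice(sorted({v for v in samples if degree[v] == top}))
        degree.append(1)
        degree[target] += 1
        endpoints.extend((target, len(degree) - 1))
        # Only target's degree grew, so leadership can only pass to target.
        if degree[target] > degree[leader]:
            leader = target
            change_steps.append(step)
    return leader, change_steps, degree
```

In a typical run the last entry of `change_steps` is small compared with `n_steps`, in line with a persistent hub emerging early.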

5. The case d=2

In this section, we show the limiting behavior of the maximum degree in the case $d=2$. From Proposition 1.2 it follows that

Introduce events , and the stopping times For fixed we define the following set of scale functions of

(2)
Lemma 5.1.

In the following, let and be a fixed positive number.

  (i) For each there is a constant sufficiently large so that if then is a supermartingale.

  (ii) For each there is a constant sufficiently large so that if then is a supermartingale.

Proof of Lemma 5.1.

Since we only consider we have that almost surely, and hence for the probability at the -th step that increases.

Proof of (i) We must estimate for under the assumption that As we wish to show this is a supermartingale, it suffices to show that there is a sufficiently large so that under these assumptions

The proof follows by Taylor expansion.

Noting that and that under our assumption, it follows that this error term is Substituting in the definition of we get

Note that the constant in the term depends only on and Hence, we may find a constant sufficiently large so that this is always strictly less than which completes the proof.

Proof of (ii) This is precisely the same calculation as was done for (i). Once more, it suffices to show that for

If we expand this expectation, we get

The same calculus shows that we have

so that when the desired claim holds. ∎

Using the a priori estimates, we are now able to use these supermartingales to prove the main theorem for $d=2$.

Proof of Theorem 1.1.

Using these supermartingales, the proof proceeds along similar lines as in Lemma 3.3. Once again set to be the event From Lemma 3.3 we have

Hence, we have that

Thus,

On the event we have by positive supermartingale convergence that there is some large random so that

Hence, on this event,

and so

Thus we have that

and so taking and we have that

As this holds for any we conclude the desired lower bound.

The upper bound follows by the exact same machinery. On the event we have by positive supermartingale convergence that there is some large random so that

Hence, on this event,

and so

Thus we have that

and so taking and we have that

As this holds for any the proof is complete. ∎

6. The case d>2

The case $d>2$ requires a different analysis from the case $d=2$. Let be the solution of equation in the interval . Note that by monotonicity and continuity of each side of the equation, this solution exists and is unique. From section 5, recall the events , and the stopping time

Lemma 6.1.

Conditional on for any and there is a random with finite almost surely so that .

Proof.

The statement of the lemma is equivalent to the statement that for any and there is and such that and ; as the process has bounded increments, if such and exist, there must be a time in between that satisfies the statement of the lemma, provided is taken sufficiently large.

Recall that is the probability that conditional on Note that for with

Hence if we define the function

then . If this function is equal to . Therefore is the solution of equation in the interval . Note that for any there is a so that if and if .

We will start by proving the lower bound. Assume that for (otherwise we could just put ). Let be the first moment after such that . We need to prove that conditional on the event . Consider the expectation

Thus, by the monotonicity of there is a such that

provided for some large and Setting , we have that is a supermartingale for this same range of By Lemma 3.1 we have that converges to a positive limit, and by Doob’s theorem tends to a finite limit with probability 1. Thus there is a random constant so that for all On the other hand, and so it must be that almost surely. Thus, on the event that we have and so we can put .

Now we turn to the upper bound, which proceeds by nearly the same argument, though using a different supermartingale. To that end, consider the expectation:

Assume that for (otherwise we could just put ). Let be the first moment after , such that . We need to prove that on the event .

Lemma 3.3 and the monotonicity of imply that if and if is large enough, then there is a such that . Therefore is a supermartingale for , where . By Lemma 3.1 we have that converges to a positive limit. Setting we have by Doob’s theorem tends to a finite limit with probability 1, and in particular, there is a random constant so that

However, for we have that and so it must be that Thus conditional on we have which completes the proof. ∎

Now we need an auxiliary lemma about sums of independent random variables.

Lemma 6.2.

Let denote a random walk with independent centered increments bounded by For any there is a so that for any

Proof.

For a fixed , we have by Hoeffding’s inequality that there is a so that

Summing this over we get

so that adjusting we have the desired bound. ∎
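The concentration input in the proof above is Hoeffding's inequality. For $n$ independent centered increments bounded by $c$, it gives $\mathbb{P}(|S_n| \ge t) \le 2\exp(-t^2/(2nc^2))$. The Python sketch below compares this bound with a Monte Carlo estimate for a simple $\pm c$ walk. It checks the inequality itself, not the specific statement of Lemma 6.2, and the function names are hypothetical.

```python
import math
import random


def hoeffding_bound(n, t, c):
    """Two-sided Hoeffding bound for n independent centered increments in [-c, c]."""
    return 2.0 * math.exp(-t * t / (2.0 * n * c * c))


def empirical_tail(n, t, c, n_runs=2000, seed=0):
    """Monte Carlo estimate of P(|S_n| >= t) for a walk with i.i.d. +/-c increments."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        s = sum(c if rng.random() < 0.5 else -c for _ in range(n))
        if abs(s) >= t:
            hits += 1
    return hits / n_runs
```

The bound is not tight for the $\pm 1$ walk. With $n = 100$ and $t = 25$ it equals $2e^{-3.125} \approx 0.088$, while the true tail is a few times smaller. Because the bound decays exponentially in $t^2/n$, it can be summed over many times, as in the proof above.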

Using this lemma, we prove the next result:

Lemma 6.3.

With probability

Proof.

We will show that for each only finitely many times with probability $1$. The argument to show that it is less than only finitely many times is identical. Together, both statements complete the proof. For any let denote the interval For any let be the first time greater than that or Call the event

As with probability there is an so that then by virtue of Lemma 6.1, is larger than infinitely often if and only if occurs infinitely often.

Set From the monotonicity of and the definition of we have that for and for In particular, we have that

Now, given that then for any with we have that

Thus,