Labeling Algorithm and Compact Routing Scheme for a Small World Network Model


Santiago Viertel sviertel@inf.ufpr.br André Luís Vignatti vignatti@inf.ufpr.br DINF, Federal University of Paraná, Curitiba, Brazil
Abstract

This paper presents a generative model for small world networks and a labeling algorithm for networks generated by this model. In the context of routing messages in networks, labeling algorithms process a network and assign labels, which serve as addresses, to the nodes. Our model is based on the Kleinberg model and generates a 2-dimensional torus with additional random undirected long-range edges. The Kleinberg routing algorithm can forward messages in these networks, but it needs the vertex labels, which are positions in a lattice, to make routing decisions. Finding these labels when they are not known a priori is a problem in routing in small world networks, and labeling algorithms are possible solutions.

Our labeling algorithm searches for induced 4-cycles to find the underlying torus. However, the generated graph may have 4-cycles with edges not in this torus. We show that the probability of these cycles appearing in a vertex is O. This property allows the removal of the long-range edges through the detection of lattice patterns in combinations of 4-cycles, followed by a breadth-first search in the resulting graph.

Our labeling algorithm labels almost all vertices in O expected time, where . We also present a compact routing scheme, which combines a preprocessing algorithm that generates sublinear structures per vertex with a routing algorithm that uses these structures to route messages through paths of bounded length. Our preprocessing algorithm generates structures with expected size of O bits per vertex in expected time O. The Kleinberg routing algorithm uses these structures, running in expected constant time in each vertex and performing an expected number of forwards O.

keywords:
small world network, labeling algorithm, compact routing scheme, greedy routing
journal: Theoretical Computer Science

1 Introduction

Milgram’s experiment milgram1967small () concluded that the average path length between two vertices in real-world networks is small. This phenomenon is known as “small world”, and it motivates the development of formal models. There are two well-known small world graph models watts1998cds (); Kleinberg:2000 (), both focusing on small average path length and on vertex clustering. Small world graphs have many applications, such as electronic mail exchange networks ADAMIC2005187 (), friendship networks Liben-Nowell16082005 (), Internet domain networks Krioukov2004 (), peer-to-peer networks Jovanovic:2001 (); ilprints628 () and wireless sensor networks Liu2009 ().

Kleinberg Kleinberg:2000 () presented a small world graph model and a greedy routing algorithm for it, and proved that the algorithm delivers messages with few forwards. The model consists of a 2-dimensional lattice with some random directed long-range edges. The greedy routing algorithm forwards a received message to the neighbor closest to the target. Kleinberg proved that this decentralized routing procedure performs an expected number of message forwards of O in graphs generated by his model. Some works modify the Kleinberg model and the greedy routing in order to obtain improvements or trade-offs between the storage needed in bits per vertex and the path lengths Martel:2004:AKS:1011767.1011794 (); Zeng:2005:NOR:2098796.2098861 (); Liu2009 (). The Kleinberg routing algorithm requires that the vertices have labels and routing tables with positioning information. Finding these labels is the main goal of this paper.

In the context of compact routing, preprocessing algorithms are those that generate a label and a routing table for each vertex, which typically store topological information about the graph. A routing algorithm uses the label and routing table of a vertex to forward messages through it Cowen:2001:CRM:370968.370995 (). A routing scheme is a combination of a preprocessing algorithm and the related routing algorithm, which provides a complete mechanism for routing messages in a network Thorup:2001:CRS:378580.378581 (). A routing scheme is compact if each vertex uses storage space that is sublinear in the number of vertices Chen:2009:CRP:1813164.1813216 (); Chen:2012:CRS:2390176.2390180 (). The worst-case and expected numbers of bits stored per vertex are common metrics for measuring the effectiveness of routing schemes Thorup:2001:CRS:378580.378581 (); Chen:2012:CRS:2390176.2390180 (). Some works also measure stretch Cowen:2001:CRM:370968.370995 (); Abraham:2006:RNL:1153919.1154086 (); Chen:2009:CRP:1813164.1813216 (), which is the maximum, over all pairs of vertices, of the ratio between the length of the path taken by the routing scheme and the length of a shortest path. Others measure the expected length of the path taken by the routing scheme Zeng:2005:NOR:2098796.2098861 (); Zeng2006 (); Liu2009 ().

There are compact routing schemes for general graphs Cowen:2001:CRM:370968.370995 (); Thorup:2001:CRS:378580.378581 (), trees Thorup:2001:CRS:378580.378581 (), graphs with low doubling dimension Abraham:2006:RNL:1153919.1154086 (); Konjevod:2016:SCR:2930058.2876055 () and power law graphs Chen:2009:CRP:1813164.1813216 (); Chen:2012:CRS:2390176.2390180 (). Some works deal with greedy routing on metric spaces Ban:2010:NRC:2790231.2790244 (); Bringmann:2017:GRA:3087801.3087829 (), even in dynamic networks Krioukov2008 (); 5462131 (). Some of them propose the study of embeddings in metric spaces. Embedding graphs in the metric space defined by a 2-dimensional lattice is an interesting problem in the context of greedy routing in graphs generated by the Kleinberg model. An embedding attaches to each vertex a label with two integers representing the vertex position on the lattice. Sandberg Sandberg:2006:DRS:2791171.2791185 () claims that in some applications, such as peer-to-peer networks, a vertex may not have stored the positions of its neighbors. He solved this problem by presenting a Markov chain Monte Carlo algorithm that estimates the positions of all vertices. Kleinberg Kleinberg06complexnetworks () also mentioned this problem.

We present an algorithm that labels almost all vertices of small world graphs built upon a torus instead of a lattice. The algorithm tries to identify the spanning torus and labels the vertices with their positions on the torus. The spanning torus is a torus subgraph generated by the small world model presented in this paper. It is a spanning subgraph because it contains all the vertices. Section 3 presents it in more detail. The algorithm runs a bounded-depth search from each vertex, aiming to find all the 4-cycles rooted in it. The vertex recognizes lattice patterns in all combinations of 4-cycles and identifies the edges in the torus, if possible. Finally, the algorithm labels all vertices with identified edges and their neighbors by a global breadth-first search.

Contributions. We present a small world model based on the Kleinberg model, called the undirected toroidal small world (UTSW) model. It differs in generating graphs over a torus instead of a lattice, and with undirected long-range edges instead of directed ones. We present upper and lower bounds on the normalizing factor of , where is a parameter of the model that defines the torus size. The normalizing factor is a multiplicative factor that keeps the probability distribution summing to one in the random generation of the long-range edges. We present a formal definition of the labeling problem and an upper bound on the probability of existence of 4-cycles composed of edges not in the spanning torus, and show that this probability is small, namely O. We also present an expected linear time algorithm that labels almost all vertices and show a compact routing scheme for the UTSW model.

Organization. Section 2 presents related work on small world graph models, improvements and trade-offs in greedy routing, and labeling algorithms for routing. Section 3 presents our model and an analysis of its normalizing factor. Section 4 presents the formal definition of the toroidal small world labeling problem. Section 5 presents a sequence of results that bound the probability of the existence of 4-cycles, rooted in a given vertex, that do not belong to the spanning torus. This upper bound is important for the probabilistic analysis of the labeling algorithm. Section 6 presents all procedures of the labeling algorithm, together with their analyses, and finishes by presenting a compact routing scheme. Section 7 presents the conclusion and future work.

2 Related Works

Watts and Strogatz watts1998cds () claim that many technological and social networks have a topology between regular and random. They present a model that generates an undirected -vertex ring and connects each vertex with its closest vertices. After that, it changes the head of each edge uniformly and independently at random with probability . The resulting graphs have a small average path length for some values of , as Bollobás and Chung doi:10.1137/0401033 () prove that a cycle with a random matching has diameter with high probability.

Kleinberg Kleinberg:2000 () claims that people have “close” and “far” contacts. He presents a model based on geographic positions. It generates a lattice where each vertex has a distinct position in the lattice, represented by a pair of . For each vertex, it creates directed edges to vertices within steps in the lattice and directed edges with independent random trials. Each edge with tail in has head with probability , where is the distance in the lattice between and , is the parameter for the probability distribution and is the normalizing factor in . The distance in the lattice between and is , where is the position of in the lattice. The normalizing factor in is . The probability distribution is referred in this paper as inverse -power distribution.

Kleinberg also investigates the influence of in a specific greedy routing procedure. Myopic search, as the author named it in one of his books David:2010:NCM:1805895 (), is a greedy routing procedure that uses the positions of the neighbors and the target. In myopic search, a vertex forwards the message to its neighbor , where is the set of neighbors of , is the target and is the distance in the lattice. He proves that myopic search performs an expected number of forwards O, for and . Martel and Nguyen Martel:2004:AKS:1011767.1011794 () show that this bound is tight. Myopic search is considered an efficient algorithm because a polylogarithmic function in defines the expected number of forwards. Sandberg Sandberg:2006:DRS:2791171.2791185 () presents the problem of labeling vertices when the positions are unknown. His algorithm estimates the position of each vertex when the graph is toroidal and created by the Kleinberg model. The labeling algorithm presented in this paper labels a large fraction of vertices with exact positions, which distinguishes it from the statistical estimates that the Sandberg method outputs.
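The forwarding rule of myopic search can be illustrated with a minimal Python sketch. It assumes a 2-dimensional lattice with Manhattan distance and vertices named by their positions; the helper names (`build_lattice`, `lattice_distance`, `myopic_search`) and the single hand-placed long-range edge are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def lattice_distance(a: Pos, b: Pos) -> int:
    """Manhattan distance between two lattice positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_lattice(n: int) -> Dict[Pos, List[Pos]]:
    """Adjacency lists of an n x n lattice with undirected local edges."""
    adj: Dict[Pos, List[Pos]] = {(i, j): [] for i in range(n) for j in range(n)}
    for (i, j) in adj:
        for w in ((i + 1, j), (i, j + 1)):
            if w in adj:
                adj[(i, j)].append(w)
                adj[w].append((i, j))
    return adj


def myopic_search(adj: Dict[Pos, List[Pos]], source: Pos, target: Pos) -> List[Pos]:
    """Forward greedily to the neighbor closest to the target.

    Termination relies on the lattice: some local neighbor is always
    strictly closer to the target than the current vertex.
    """
    path = [source]
    current = source
    while current != target:
        current = min(adj[current], key=lambda w: lattice_distance(w, target))
        path.append(current)
    return path


adj = build_lattice(6)
adj[(0, 0)].append((4, 4))  # one directed long-range edge, placed by hand
route = myopic_search(adj, (0, 0), (5, 5))
print(len(route) - 1, "forwards:", route)
```

Each vertex only needs its own neighbors' positions and the target's position, which is why the labels are a prerequisite for this kind of routing.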

Greedy routing has also been analyzed in alternative models. Manku ilprints628 () presents a routing algorithm similar to myopic search that also considers the neighbors of the neighbors. The expected number of forwards is O for a unidirectional -ring based model, where is the number of directed edges generated by random trials. Martel and Nguyen Martel:2004:AKS:1011767.1011794 () present a routing algorithm that also considers the positions of the endpoints of the long-range edges of the nearest vertices. The expected number of forwards is O for a -dimensional lattice with long-range edges created with the inverse -power distribution. Fraigniaud, Gavoille and Paul Fraigniaud:2006:ESE:1160297.1160302 () obtain this result through an oblivious routing algorithm, which does not change the message header.

Zeng, Hsu and Wang Zeng:2005:NOR:2098796.2098861 () present a unidirectional -ring based model. For each vertex, it creates one long-range edge with the inverse -power distribution and two augmented links with the vertices within distance chosen uniformly at random. They define a non-oblivious and an oblivious routing algorithm that consider the positions of the O nearest vertices reachable through augmented links. Both have an expected number of forwards of O. Zeng and Hsu Zeng2006 () generalize the model for dimensions and define an oblivious algorithm with an expected number of forwards of O.

Liu, Guan, Bai and Lu Liu2009 () present a model that creates one long-range edge per subgroup of vertices. The routing algorithm considers the positions of the endpoints of the O nearest long-range edges, where . The expected number of forwards is O. More works AspnesDS2002 (); Dietzfelbinger:2009:TLB:1536414.1536494 () also present models, greedy routing algorithms and their expected number of forwards.

In the routing context, each vertex usually has a label and a routing table that encode topological information about the network. Preprocessing algorithms take as input a graph that models the network and output the labels and routing tables of all vertices. Labeling algorithms assign addresses, or labels, to the vertices Cowen:2001:CRM:370968.370995 (). Running time and label size are the main metrics for analyzing labeling algorithms.

Cowen Cowen:2001:CRM:370968.370995 () presents a compact routing scheme for general networks. The labeling algorithm runs in Õ time (the asymptotic notation Õ hides polylogarithmic multiplicative factors from the bound), where is the number of vertices, is the number of edges and . It chooses a set of vertices called landmarks and generates labels with bits, where is the maximum degree of the graph. Thorup and Zwick Thorup:2001:CRS:378580.378581 () present a landmark choosing process that changes the running time to Õ.

There are specialized compact routing schemes. Abraham, Gavoille, Goldberg, and Malkhi Abraham:2006:RNL:1153919.1154086 () present one for networks with low doubling dimension. The labeling algorithm running time is linear and it generates labels with bits. Chen, Sommer, Teng and Wang Chen:2009:CRP:1813164.1813216 (); Chen:2012:CRS:2390176.2390180 () present a compact routing scheme for power law networks. The labels encode a shortest path and, because of that, they have size of O bits with probability o. The preprocessing algorithm runs in O expected time, where .

Some authors use existing compact routing schemes. Brady and Cowen Brady:2006:CRP:2791171.2791183 () present a compact routing scheme for power law networks that uses the Thorup and Zwick scheme for trees Thorup:2001:CRS:378580.378581 (). The labels have O bits, where assumes small values. Konjevod, Richa and Xia Konjevod:2016:SCR:2930058.2876055 () present a compact routing scheme for networks with low doubling dimension that also uses the Thorup and Zwick scheme for trees, where the labels have bits.

3 Undirected Toroidal Small World Model

This section presents the undirected toroidal small world (UTSW) model. The model is similar to the Kleinberg Kleinberg:2000 () model. Informally speaking, instead of using a lattice, we “tie the borders” of the lattice, obtaining a torus. Also, Kleinberg uses directed long-range edges, whereas we use undirected ones. More formally, let and . The model first generates a 2-dimensional undirected torus as follows. It creates vertices such that each vertex is a distinct element of . After that, each vertex is connected with the vertices and through undirected edges. The result is a torus such as the one in Figure 1, which shows the 2-dimensional undirected torus with size . We denote this generated torus as with size in the rest of the paper, unless the generated torus or are locally redefined. Note that and . We also assume sizes , since otherwise the torus may have parallel edges or loops, neither of which we consider in this paper.

Figure 1: The -dimensional undirected torus with size .

Let be the distance between and on . Note that for all , where is defined for during the torus generation. After the torus creation, each vertex chooses another vertex to create an edge. The choice is sampled from the inverse -power distribution, i.e., chooses with probability , where is the normalizing factor in . The normalizing factor is a value that keeps the probability distribution summing to one in each . Its value is

To avoid parallel edges, an edge is not created if it already exists. We refer to the edges created in the torus generation process as local edges and to those created by the randomized process as long-range edges. Let be the event of vertex choosing the vertex to create a long-range edge. Note that the probability distribution , for all , sums to one. We use the distance function and the event in the rest of the paper. We also assume that the positions of all , computed during the torus generation, are not an output of the model. In fact, finding these positions is the problem that we solve in this paper.
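The generation process described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the torus is taken to be 2-dimensional with side `n`, the exponent of the inverse power distribution is called `r` here, the torus distance is the usual wrap-around L1 distance, and the names `torus_distance` and `generate_utsw` are hypothetical. Vertices are keyed by their positions only for convenience; in the model these positions are not part of the output.

```python
import random
from typing import Dict, Set, Tuple

Pos = Tuple[int, int]


def torus_distance(a: Pos, b: Pos, n: int) -> int:
    """Wrap-around L1 distance on an n x n torus (assumed form of the distance)."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, n - dx) + min(dy, n - dy)


def generate_utsw(n: int, r: float, seed: int = 0) -> Dict[Pos, Set[Pos]]:
    """Build a torus, then let every vertex pick one long-range partner."""
    rng = random.Random(seed)
    vertices = [(i, j) for i in range(n) for j in range(n)]
    adj: Dict[Pos, Set[Pos]] = {v: set() for v in vertices}

    # Local edges: each vertex is joined to its successor in each dimension.
    for (i, j) in vertices:
        for w in (((i + 1) % n, j), (i, (j + 1) % n)):
            adj[(i, j)].add(w)
            adj[w].add((i, j))

    # Long-range edges: u picks v with probability proportional to d(u, v)^(-r).
    # random.choices normalizes the weights, which plays the role of the
    # normalizing factor. Using sets means an existing edge is not duplicated.
    for u in vertices:
        others = [v for v in vertices if v != u]
        weights = [torus_distance(u, v, n) ** (-r) for v in others]
        v = rng.choices(others, weights=weights, k=1)[0]
        adj[u].add(v)
        adj[v].add(u)
    return adj


graph = generate_utsw(n=8, r=2.0, seed=1)
edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
print(len(graph), "vertices,", edge_count, "edges")
```

The sketch recomputes the weights for every vertex, which is quadratic in the number of vertices; it is meant to mirror the description, not to be efficient.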

Theorem 1

for all .

[Proof.] Let for all . Figure 2 shows the vertices in in a torus with . As is symmetrical for all and , so and, then

Figure 2: Vertices at a distance from , with .

Although each vertex has its own normalizing factor, Theorem 1 shows that all of them are equal. So, we denote it as , as shown in Definition 3. The UTSW model is defined in Definition 3.
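The equality of the per-vertex normalizing factors can be checked numerically on a small torus. The sketch below assumes a 2-dimensional torus of side `n`, the wrap-around L1 distance, and an integer exponent `r` (a name chosen here), and it computes each factor exactly with rational arithmetic as the sum of d(u, v)^(-r) over all v different from u. That sum is the standard normalizer of an inverse power distribution and is taken as an assumption about the elided formula.

```python
from fractions import Fraction
from typing import Tuple

Pos = Tuple[int, int]


def torus_distance(a: Pos, b: Pos, n: int) -> int:
    """Wrap-around L1 distance on an n x n torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, n - dx) + min(dy, n - dy)


def normalizing_factor(u: Pos, n: int, r: int) -> Fraction:
    """Exact sum of d(u, v)^(-r) over every vertex v other than u."""
    total = Fraction(0)
    for i in range(n):
        for j in range(n):
            if (i, j) != u:
                total += Fraction(1, torus_distance(u, (i, j), n) ** r)
    return total


n, r = 6, 2
factors = {normalizing_factor((i, j), n, r) for i in range(n) for j in range(n)}
print("distinct normalizing factors:", len(factors))
```

Because translations of the torus preserve distances, every vertex sees the same multiset of distances, so the set `factors` collapses to a single value.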

Definition 3 (Normalizing factor)

The normalizing factor is for any .

Definition 3 (UTSW model)

The undirected toroidal small world (UTSW) model is a graph over a -dimensional undirected torus with size and, for all and a , the edge is included in with probability .

We assume that is a UTSW graph from now on. The remainder of this section presents some results related to the UTSW model that are used throughout the paper.

Lemma 2

Let for each and . Then, if , and otherwise.

[Proof.] The proof follows by “sweeping” in one of the two dimensions of . When ,

When , is an upper bound for .∎
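The counting behind this kind of lemma can be explored with a small Python sketch. The exact statement of Lemma 2 lost its formulas in this copy, so the property checked here is an assumption: on a 2-dimensional torus of side `n` with wrap-around L1 distance, exactly 4j vertices lie at distance j from any vertex when j < n/2, and at most 4j otherwise. The function name `layer_sizes` is hypothetical.

```python
from collections import Counter


def layer_sizes(n: int) -> Counter:
    """Count the vertices at each torus distance j >= 1 from vertex (0, 0)."""
    sizes: Counter = Counter()
    for i in range(n):
        for j in range(n):
            d = min(i, n - i) + min(j, n - j)
            if d > 0:
                sizes[d] += 1
    return sizes


for n in (7, 8):
    sizes = layer_sizes(n)
    for j in sorted(sizes):
        kind = "exactly 4j" if sizes[j] == 4 * j else "fewer than 4j"
        print(f"n={n} j={j} count={sizes[j]} ({kind})")
```

By symmetry, counting from vertex (0, 0) is enough; the layers shrink only once the diamond of radius j starts to wrap around the torus.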

Fact 3 (Harmonic number)

Let be the harmonic number, then .

Theorem 3

The normalizing factor .

[Proof.] Let for any and all . Given the Definition 3,

due to and Lemma 2. By Fact 3 and , .∎

Theorem 4

The normalizing factor .

[Proof.] As in the beginning of the Theorem 3 proof and by Lemma 2, . By Fact 3, .∎

Although Theorem 4 is not used in the rest of the paper, it implies that the upper bound on , presented in Theorem 3, is tight.

4 Toroidal Small World Labeling Problem

A position, or label, in a -dimensional torus of size is an element of , and , for , are the coordinates of .

Definition

Let be the function such that , where . A -dimensional toroidal vertex labeling function of is a function such that for all and .

The function is similar to , but defines the distance between a pair of vertices on and defines the distance between a pair of positions (labels). The function assigns a label to each vertex of such that the distances and between any two vertices of are equal. The toroidal small world labeling problem consists of finding a -dimensional toroidal vertex labeling function for a UTSW graph and a spanning torus of size . Note that is a spanning torus of that may be distinct from the original torus generated by the UTSW model. This is an interesting problem because, given a -dimensional toroidal vertex labeling function , it is possible to run myopic search for routing messages in .
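The defining property of a labeling function, that graph distances on the torus equal distances between labels, can be checked directly on a small instance. The Python sketch below assumes a 2-dimensional torus of side `n` and the wrap-around L1 distance between labels; the helper names are hypothetical, and the vertex keys are used only as opaque identifiers. It also illustrates that a valid labeling is not unique: any translation of the original positions passes the check.

```python
from collections import deque
from typing import Dict, Set, Tuple

Pos = Tuple[int, int]


def torus_adj(n: int) -> Dict[Pos, Set[Pos]]:
    """Adjacency sets of the n x n torus."""
    adj: Dict[Pos, Set[Pos]] = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in list(adj):
        for w in (((i + 1) % n, j), (i, (j + 1) % n)):
            adj[(i, j)].add(w)
            adj[w].add((i, j))
    return adj


def bfs_distances(adj: Dict[Pos, Set[Pos]], source: Pos) -> Dict[Pos, int]:
    """Hop distances from source, computed by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def label_distance(p: Pos, q: Pos, n: int) -> int:
    """Wrap-around L1 distance between two labels."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return min(dx, n - dx) + min(dy, n - dy)


def is_toroidal_labeling(adj: Dict[Pos, Set[Pos]], labels: Dict[Pos, Pos], n: int) -> bool:
    """True if labels are distinct and preserve all pairwise torus distances."""
    if len(set(labels.values())) != len(labels):
        return False
    for u in adj:
        dist = bfs_distances(adj, u)
        if any(dist[v] != label_distance(labels[u], labels[v], n) for v in adj):
            return False
    return True


n = 5
torus = torus_adj(n)
shifted = {(i, j): ((i + 2) % n, (j + 3) % n) for (i, j) in torus}
broken = dict(shifted)
broken[(0, 0)], broken[(1, 1)] = broken[(1, 1)], broken[(0, 0)]
print(is_toroidal_labeling(torus, shifted, n), is_toroidal_labeling(torus, broken, n))
```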

Note that has at least one 2-dimensional spanning torus, due to the UTSW definition. Besides that, the long-range edge creation process in UTSW may generate other 2-dimensional spanning tori, as we explain next. The original torus is composed only of local edges, while the other tori also contain long-range edges. The left side of Figure 3 shows a part of a UTSW graph. In this case, each vertex chose a specific vertex to create a long-range edge, represented by the dashed arrows. This process generates a graph that may have more than one spanning torus. The right side of Figure 3 highlights one of them.

Figure 3: Graph with more than one spanning torus.

5 Cycles of Size 4 Outside the Torus

Identifying a 2-dimensional spanning torus is a crucial procedure. It can be used by a breadth-first search to label the vertices, as we do in Section 6. Our approach is to detect cycles of size 4 composed of distinct vertices, because a 2-dimensional torus is an arrangement of such cycles.

A UTSW graph may have induced 4-cycles that contain long-range edges, instead of only local edges. Cycles of this type are a problem for the torus identification process if one searches for induced 4-cycles. For any , there are only four cycles of size 4 rooted in in the torus . Besides that, the UTSW process may create long-range edges such that more induced 4-cycles rooted in appear in . Despite this, we show in Theorem 16 that the probability of the existence of 4-cycles composed of at least one long-range edge is small, and this is the main result of this section.
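The count of four 4-cycles rooted in a vertex of the torus, and the way a single long-range edge can add a fifth, can be reproduced with a short Python sketch. It assumes a 2-dimensional torus of side 7 and hypothetical helper names; the function enumerates all 4-cycles through the root and does not test whether they are induced.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Pos = Tuple[int, int]


def torus_adj(n: int) -> Dict[Pos, Set[Pos]]:
    """Adjacency sets of the n x n torus."""
    adj: Dict[Pos, Set[Pos]] = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in list(adj):
        for w in (((i + 1) % n, j), (i, (j + 1) % n)):
            adj[(i, j)].add(w)
            adj[w].add((i, j))
    return adj


def four_cycles_rooted_at(adj: Dict[Pos, Set[Pos]], u: Pos) -> Set[Tuple[Pos, Pos, Pos, Pos]]:
    """All cycles u - a - b - c - u, each reported once with a < c."""
    cycles = set()
    for a, c in combinations(sorted(adj[u]), 2):
        # b closes the cycle if it is adjacent to both a and c and is not u.
        for b in adj[a] & adj[c]:
            if b != u:
                cycles.add((u, a, b, c))
    return cycles


graph = torus_adj(7)
root = (0, 0)
before = four_cycles_rooted_at(graph, root)

# Add one hand-picked long-range edge between (1, 0) and (0, 2).
graph[(1, 0)].add((0, 2))
graph[(0, 2)].add((1, 0))
after = four_cycles_rooted_at(graph, root)
print(len(before), "cycles in the torus,", len(after), "after the extra edge")
print("new cycle:", after - before)
```

The extra cycle uses two local edges at the root, one more local edge, and the added long-range edge, which is the kind of pattern the analysis in this section has to account for.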

We aim to bound the probability of the event that at least one 4-cycle, rooted in , is in but is not in . This event turns out to be a union of nine other events. Lemmas 7 to 15 show upper bounds on the probabilities of each of these nine events. Lemmas 5 and 6 are technical results and, together with Fact 5, are used to prove the other results.

Lemma 5

for all .

[Proof.] Grouping the terms of the sum by their values and using Lemma 2, . By Fact 3, .∎

Fact 5 (Riemann zeta function)

where is the Riemann zeta function.

Lemma 6

for all and .

[Proof.] The distance because is a -dimensional torus. Thus, the sum can be split into three sums: (i) , (ii) and (iii) .

For (i), for all terms. This fact allows the use of the triangle inequality such that is positive for all . So, this term is at most . Since and by Lemma 2,

where the last inequality holds because for all terms in both sums. As all sums terms are positive for , then , by Fact 5.

For (ii), as , , and Lemma 2,

For (iii), holds by an argument similar to that for (i). The latter is equal to , by Fact 5. Given the upper bounds on (i), (ii) and (iii), we conclude that .∎

Given , let be the event of the existence of at least one 4-cycle, rooted in , that is in but not in . Note that this event is equal to the event of UTSW creating long-range edges such that there exists at least one 4-cycle rooted in that belongs to and is composed of at least one long-range edge. Let be a local edge and be a long-range edge in a sequence of edges of a 4-cycle rooted in . Note that there are 16 possible combinations of local and long-range edges and one of them is certainly not in , which is , i.e. the 4-cycle with only local edges. The other combinations can be grouped into nine sub-events of , where each group leads to a different analysis, but the events belonging to a given group have the same analysis (which motivates the grouping). Then, and each is defined by the following list.

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence or ;

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence or ;

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence or ;

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence or ;

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence ;

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence or ;

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence ;

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence or ;

  • : existence of at least one -cycle, rooted in , that belongs to with edge sequence .
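The count of nine events can be reproduced by a short enumeration. The concrete edge sequences are missing from this copy, so the sketch below rests on an assumption: a sequence of four edges around the cycle and its reversal (the same cycle traversed in the opposite direction) are placed in the same group. The letters `l` (local) and `r` (long-range) are notation chosen here. Under that assumption the 15 sequences with at least one long-range edge fall into six groups of two sequences and three groups of one, which matches the shape of the list above (six items with "or" and three without).

```python
from collections import defaultdict
from itertools import product
from typing import DefaultDict, Set

groups: DefaultDict[str, Set[str]] = defaultdict(set)
for seq in product("lr", repeat=4):
    s = "".join(seq)
    if s == "llll":
        continue  # the all-local cycle belongs to the torus itself
    # A sequence and its reversal get the same canonical representative.
    groups[min(s, s[::-1])].add(s)

for representative in sorted(groups):
    print(" or ".join(sorted(groups[representative])))
print(len(groups), "groups covering", sum(len(g) for g in groups.values()), "sequences")
```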

Lemmas 7 to 15 show upper bounds on the probabilities of each of these events. Finally, Theorem 16 bounds the probability of . Recall that is the event of vertex choosing the vertex to create a long-range edge.

Lemma 7

.

[Proof.] Let . Then, . Using union bound, Definition 3, and , by Lemma 2, then . By Theorem 3, .∎

Lemma 8

.

[Proof.] Let and . So . Using union bound, Definition 3 and , then . By Lemma 2, . When , occurs five times and occurs three times in the last sum. So, because , then . By Theorem 3, .∎

Lemma 9

.

[Proof.] Let . The event is

Note that the three intersections are between mutually independent events. So, similarly to the proofs of Lemmas 7 and 8, we have

The last sum can be split into two other sums, one for and the other for , because there is no such that in . When , for all and, as a consequence, , due to Lemma 2. When , the triangle inequality is used to find , because . By this fact and Lemma 2, . Therefore, , by the fact that , due to Lemma 2. Using Fact 5 and Theorem 3, .∎

Lemma 10

.

[Proof.] Let and, for a given , . Note that, for , is an event and is a set. The event is

All simple events are mutually independent. Using the union bound, the simple events independence and Definition 3, then

The second sum can be split for and , because there is no such that for all in .

When , holds, due to Lemma 2, for all . Note that, at most five of the at most eight vertices have their with size . The others at most three vertices do not belong to the event . Furthermore, when and, so .

When , for all . Combining these splittings of the sum for and , together with Lemma 2,

As , by Fact 5 and Theorem 3, .∎

Lemma 11

.

[Proof.] Let such that in any sequence. The event is

All simple events in the intersections are independent. So,

Note that in . When , there is a such that for all . Thus, , due to Lemma 2. Furthermore, for all . Thus, the inequality holds when . When , for all and . Combining with Lemma 2, the inequality holds. As , then . By Lemma 2, , thus there are distinct combinations of values for and . Therefore, , due to Fact 5 and Theorem 3.∎

Lemma 12

.

[Proof.] Let . The event is

All the intersections in are among mutually independent simple events. So, because , , and by Definition 3, the probabilities of the four combinations of intersections of simple events in are equal to , for all , and . Using the union bound,

Using Lemmas 6 and 5 and due to , by Lemma 2 and Theorem 3, .∎

Lemma 13

.

[Proof.] For a given , let . The event is

All intersections are between mutually independent simple events. Using the union bound, the events independence, Definition 3, and , then

Note that in . So, the first sum can be split for and .

When , there is a such that for all . As a consequence, the inequalities and hold, due to Lemma 2. Besides that, for all and . Thus, when .

When , then for all and . Therefore,