Distributed Storage in Mobile Wireless Networks with Device-to-Device Communication

Jesper Pedersen, Alexandre Graell i Amat, Iryna Andriyanova, and Fredrik Brännström
This paper was presented in part at the IEEE Information Theory Workshop, Jeju Island, Korea, October 2015. This work was partially funded by the Swedish Research Council under grants 2011-5961 and 2011-5950, and by the European Research Council under Grant No. 258418 (COOPNET). J. Pedersen, A. Graell i Amat, and F. Brännström are with the Department of Signals and Systems, Chalmers University of Technology, SE-41296 Gothenburg, Sweden (e-mail: {jesper.pedersen, alexandre.graell, fredrik.brannstrom}@chalmers.se). I. Andriyanova is with the ETIS-UMR8051 group, ENSEA/University of Cergy-Pontoise/CNRS, 95015 Cergy, France (e-mail: iryna.andriyanova@ensea.fr).
Abstract

We consider the use of distributed storage (DS) to reduce the communication cost of content delivery in wireless networks. Content is stored (cached) in a number of mobile devices using an erasure correcting code. Users retrieve content from other devices using device-to-device communication or, at a higher communication cost, from the base station (BS). We address the repair problem that arises when a device storing data leaves the cell. We introduce a repair schedule in which repair is performed periodically, and we derive analytical expressions for the overall communication cost of content download and data repair as a function of the repair interval. The derived expressions are then used to evaluate the communication cost entailed by DS using several erasure correcting codes. Our results show that DS can reduce the communication cost with respect to the case where content is downloaded only from the BS, provided that repairs are performed frequently enough. If devices storing content arrive at the cell, the communication cost using DS is further reduced and, for a large enough arrival rate, DS is always beneficial. Interestingly, we show that MDS codes, which do not perform well for classical DS, can yield a low overall communication cost in wireless DS.

Index Terms: Caching, content delivery, device-to-device communication, distributed storage, erasure correcting codes.
BS: base station
cdf: cumulative distribution function
CDN: content delivery network
c.u.: cost units
D2D: device-to-device
DS: distributed storage
ECC: erasure correcting code
i.i.d.: independent, identically distributed
LRC: locally repairable code
MBR: minimum bandwidth regenerating
MDS: maximum distance separable
MIMO: multiple input multiple output
MSR: minimum storage regenerating
OFDM: orthogonal frequency division multiplexing
P2P: peer-to-peer
pdf: probability density function
pmf: probability mass function
RV: random variable
t.u.: time unit

I Introduction

It is predicted that the global mobile data traffic will exceed 30 exabytes per month by 2020, nearly a tenfold increase compared to the traffic in 2015 [1]. This dramatic increase threatens to completely congest the already burdened wireless networks. One popular approach to reduce peak traffic is to store popular content closer to the end users, a technique known as caching. The idea is to deploy a number of access points (called helpers) with large storage capacity, but low-rate wireless backhaul, and store data across them [2, 3]. Users can then download content from the helpers, resulting in a higher throughput per user. In [4] it was suggested to store content directly in the mobile devices, taking advantage of the high storage capacity of modern smart phones and tablets. The requested content can then be directly retrieved from neighbouring mobile devices, using device-to-device (D2D) communication. This allows for a more efficient content delivery at no additional infrastructure cost. Caching in the mobile devices to alleviate the wireless bottleneck has attracted significant interest in the research community in recent years [5, 6, 7, 8]. In all these works, simple content caching and/or replication (i.e., a number of copies of each content item are stored in the network) is considered. Additionally, the use of maximum distance separable (MDS) codes to facilitate decentralized random caching was investigated in [8].

A relevant problem in D2D-assisted mobile caching networks is the repairing of the lost data when a storage device is unavailable, e.g., when a storage device fails or leaves the network. Repairing of the lost data was considered in [9], where the communication cost incurred by data download and repair was analyzed for a caching scheme where data is stored in the mobile devices using replication and regenerating codes [10]. A strong assumption in [9] is that the repair of the lost content is performed instantaneously. As a result, content can always be downloaded from the mobile devices. Under the assumption of instantaneous repair, the caching strategy that minimizes the overall communication cost is -replication.

In this paper, we consider content caching in a wireless network scenario using erasure correcting codes. When using erasure correcting codes to cache content, caching bears strong ties with the concept of distributed storage (DS) for reliable data storage. Indeed, the set of mobile devices storing content can be seen as a distributed storage network. The fundamental difference with respect to DS for reliable data storage is that data download can be done not only from the storage nodes, but the base station (BS) can also assist to deliver the data. Therefore, the strict guarantees on fault tolerance can be relaxed, which brings new and interesting degrees of freedom with respect to erasure-correcting coding for DS for reliable data storage. Here, to avoid confusion with standard (uncoded) caching, we will use the term wireless distributed storage, highlighting the resemblance with DS using erasure correcting codes for reliable data storage in, e.g., data centers. Similar to the scenario in [9], we consider a cellular system where mobile devices roam in and out of a cell according to a Poisson random process and request content at random times. The cell is served by a BS, which always has access to the content. Content is also stored across a limited number of mobile devices using an erasure correcting code. Our main focus is on the repair problem when a device that stores data leaves the network. In particular, we introduce a more realistic repair scheduling than the one in [9] where lost content is repaired (from storage devices using D2D communication or from the BS) at periodic times.

We derive analytical, closed-form expressions for the overall communication cost of content download and data repair as a function of the repair interval. The derived expressions are general and can be used to analyze the overall communication cost incurred by any erasure correcting code for DS. As an example of the application of the proposed framework, we analyze the overall communication cost incurred by MDS codes, regenerating codes [10], and locally repairable codes [11]. We show that wireless DS can reduce the overall communication cost as compared to the basic scenario where content is only downloaded from the BS. However, this is provided that repairs can be performed frequently enough. Moreover, in the case when nodes storing content arrive to the cell, the communication cost using DS is further reduced and, for large enough arrival rate, it is always beneficial as compared to BS download. The repair interval that minimizes the overall communication cost depends on the network parameters and the underlying erasure correcting code. We show that, in general, instantaneous repair is not optimal. The derived expressions can also be used to find, for a given repair interval, the erasure correcting code yielding the lowest overall communication cost.

Non-instantaneous repairs, the so-called “lazy” repairs, have already been proposed for DS in data centers [12, 13] to reduce the amount of data that has to be transmitted within the storage network during the repair process, known as the repair bandwidth. However, contrary to [12, 13], in the wireless scenario considered here the non-instantaneous repairs impact both data repair and download. We show that, somewhat interestingly, erasure correcting codes achieving a low repair bandwidth do not always perform well in a wireless DS setting. On the other hand, MDS codes, which entail a high repair bandwidth, can yield a low overall communication cost for some repair intervals.

Notation: The probability density function (pdf) of a random variable is denoted by . Expectation and probability are denoted by and , respectively. We use bold lowercase letters to denote vectors and bold uppercase letters for matrices.

II System Model

We consider a single cell in a cellular network, served by a BS, where mobile devices (referred to as nodes) arrive and depart according to a Poisson random process. The initial number of nodes in the network is . Nodes wish to download content from the network. For simplicity, we assume that there is a single object (file), of size bits, stored at the BS. We further assume that nodes can store data and communicate between them using D2D communication. The considered scenario is depicted in Fig. 1.

Arrival-departure model. Nodes arrive according to a Poisson process with exponential independent, identically distributed (i.i.d.) random inter-arrival times with pdf

(1)

where is the expected arrival rate of a node and is time, measured in time units.

The nodes stay in the cell for an i.i.d. exponential random lifetime with pdf

(2)

where is the expected departure rate of a node. The number of nodes in the cell can be described by an M/M/∞ queuing model, where the probability that there are nodes in the cell is [14]

(3)

For simplicity, we assume that , i.e., the flow in and out from the cell is the same and the expected number of nodes in the cell stays constant (equal to ).
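For reference, the displayed expressions (1) to (3) take the standard forms for a Poisson arrival process, exponential lifetimes, and the stationary population of an infinite-server queue. The block below is a sketch under assumed notation: the symbols \lambda (arrival rate), \mu (per-node departure rate), M = \lambda/\mu (expected number of nodes), and the names f_arr, f_dep, and N are placeholders chosen here and may differ from the original typesetting.

```latex
% Assumed notation (placeholders, not taken from the source):
%   \lambda = expected arrival rate, \mu = expected departure rate per node,
%   M = \lambda / \mu = expected number of nodes in the cell.
\begin{align}
  f_{\mathrm{arr}}(t) &= \lambda e^{-\lambda t}, \quad t \ge 0, \\
  f_{\mathrm{dep}}(t) &= \mu e^{-\mu t}, \quad t \ge 0, \\
  \Pr(N = i) &= \frac{M^{i}}{i!}\, e^{-M}, \quad i = 0, 1, 2, \ldots
\end{align}
```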

Figure 1: A wireless network with data storage in the mobile devices (nodes). A new node arrives to the network at rate . The departure rate per node is . Blue nodes store exactly bits each. The green node requests the file and downloads it from the storage nodes (solid arrows), or from the BS (dashed arrow). The repair onto a node (in red) is carried out by transmitting bits from storage nodes (solid arrows) or bits from the BS (dashed arrow).

Data storage. The file is partitioned into packets, called symbols, of size bits and is encoded into coded symbols, , using an erasure correcting code of rate . The encoded data is stored in nodes, , referred to as storage nodes. Note that implies that a storage node may store multiple coded symbols. For some of the considered erasure correcting codes, this is the case (see Section VI). To simplify the analysis in Sections III and IV, we set . This guarantees that the probability that the number of nodes in the cell is smaller than is negligibly small, i.e.,

(4)

using (3). For example, for and , (4) is less than . Therefore, with high probability the file can be stored in the cell. In the results section we show that this simplification has negligible impact and that the analytical expressions match closely with the simulation results.
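The bound in (4) can be checked numerically. The following is a minimal sketch, assuming the stationary number of nodes is Poisson distributed with mean equal to the expected number of nodes in the cell; the function name, argument names, and the example values 10 and 30 are hypothetical and are not the values used in the text.

```python
import math


def prob_fewer_than(m: int, mean_nodes: float) -> float:
    """Return P(N < m) for N ~ Poisson(mean_nodes).

    Assumption: N is the stationary number of nodes in the cell and
    mean_nodes is the expected number of nodes (arrival rate / departure rate).
    """
    term = math.exp(-mean_nodes)  # P(N = 0)
    total = 0.0
    for i in range(m):
        total += term                  # add P(N = i)
        term *= mean_nodes / (i + 1)   # recurrence gives P(N = i + 1)
    return total


if __name__ == "__main__":
    # Hypothetical values: 10 storage nodes, 30 nodes in the cell on average.
    print(prob_fewer_than(10, 30.0))
```

When the number of storage nodes is well below the expected population, this tail probability is small, which is the regime the simplification relies on.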

Each storage node stores exactly bits, i.e., we consider a symmetric allocation [15]. (Footnote 1: Without loss of generality, we assume .) Hence,

(5)

Incoming process. Nodes arriving to the cell may bring cached content. The expected arrival rate of nodes storing content is , . We also assume that the expected arrival rate of nodes not carrying content is , so that the expected arrival rate of a node (with or without content) is and the expected number of nodes in the cell is (see above). The incoming process is discussed in more detail in Section V.

Data delivery. Nodes request the file at random times with i.i.d. random inter-request time with pdf

(6)

where is the expected request rate per node. Whenever possible, the file is downloaded from the storage nodes using D2D communication, referred to as D2D download. In particular, we assume that data can be downloaded from any subset of storage nodes, , which we will refer to as the download locality. In other words, D2D download is possible if or more storage nodes remain in the cell. In this case, the amount of downloaded data is bits. (Footnote 2: To simplify the analysis in Sections III and IV, we assume that the download bandwidth is the same irrespective of whether the request comes from a storage node itself or not, i.e., users do not have access to their own stored data. This is a reasonable approximation if . Furthermore, this may be a practical assumption. Due to concerns about security in systems that allow for D2D connectivity, it has been proposed to isolate part of the memory in the mobile devices to be used only for DS, so that devices cannot have access to their own cached data [16].) In the case where there are fewer than storage nodes in the cell, the file is downloaded from the BS, which we refer to as BS download. In this case, bits are downloaded.

Communication cost. We assume that transmission from the BS and from a storage node (in D2D communication) have different costs. We denote by and the cost (in cost units (c.u.) per bit, [c.u./bit]) of transmitting one bit from the BS and from a storage node, respectively. Therefore, the cost of downloading a file from the BS and the storage nodes is and , respectively. Furthermore, we define , where corresponds to a high traffic load in the BS-to-device link and reflects a scenario where the battery of the devices is the main constraint.

II-A Repair Process

When a storage node leaves the cell, its stored data is lost (see blue node with orange stripes in Fig. 1). Therefore, another node needs to be populated with data to maintain the initial state of reliability of the DS network, i.e., storage nodes. The restore (repair) of the lost data onto another node, chosen uniformly at random from all nodes in the cell that do not store any content, will be referred to as the repair process. We introduce a scheduled repair scheme where the repair process is run periodically. We denote the interval between two repairs by (in t.u.), . Note that corresponds to the case of instantaneous repair, considered in [9].

Similar to the download, repair can be accomplished from the storage nodes (D2D repair) or from the BS (BS repair), with cost per bit and , respectively. The amount of data (in bits) that needs to be retrieved from the network to repair a single failed node is referred to as the repair bandwidth, denoted by . For simplicity, we assume that each repair is handled independently of the others. In particular, we assume that D2D repair can be performed from any subset of storage nodes, , by retrieving bits from each node. In other words, D2D repair is possible if there are at least storage nodes in the cell at the moment of repair. In this case, , and the corresponding communication cost is . Parameter is usually referred to as the repair locality in the DS literature. If there are less than storage nodes in the cell at the moment of repair, then the repair is carried out by the BS. In this case, , with communication cost . Note that . For both repair and download, we assume error-free transmission.

Parameters , , , and , and subsequently and , depend on the erasure correcting code used for storage. Since , and are very important parameters, an erasure correcting code in DS is typically defined with the triple . This will be further explained in Section VI.

III Repair and Download Cost

In this section, we derive analytical expressions for the repair and download cost, and subsequently for the overall communication cost, as a function of the repair interval . For analysis purposes, we initially disregard the incoming process, i.e., set . The case is then addressed in Section V building upon the results in this section. We denote by the average communication cost of repairing lost data, and refer to it as the repair cost. Also, we denote by the average communication cost of downloading the file, and refer to it as the download cost. The (average) overall communication cost is denoted by , where . The costs are defined in cost units per bit and time unit, [c.u./(bit·t.u.)].

For later use, we denote by the probability mass function (pmf) of the binomial distribution with parameters and ,

(7)

III-A Repair Cost

The repair cost has two contributions, corresponding to the cases of BS repair and D2D repair. Denote by and the average number of nodes repaired from the storage nodes and from the BS, respectively, in one repair interval. Then, (in [c.u./(bit·t.u.)]) is given by

(8)

where and (in c.u.) are the cost of repairing a single storage node from the BS and from storage nodes, respectively (see Section II-A), and we normalize by such that does not depend on the file size.

The repair cost, , is given in the following theorem.

Theorem 1.

Consider the DS network in Section II with departure rate , communication costs and , BS repair bandwidth , file size , repair interval , and probability that a node has not left the network during a time . Furthermore, consider the use of an erasure correcting code with D2D repair bandwidth . The repair cost is given by

(9)
Proof:

As the inter-departure times are exponentially distributed, the probability that a storage node has not left the network during a time and is available for repair is

Hence, the probability that storage nodes are available for repair is . If storage nodes remain in the cell, then repairs need to be performed. D2D repair is performed if , and BS repair is performed otherwise. Therefore,

Using these expressions in (8), we obtain (9). ∎
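The argument of the proof can be turned into a short numerical sketch. It assumes that each of the storage nodes survives a repair interval independently with probability exp(-departure rate × interval), that the number of survivors is therefore binomial, that every missing node is repaired from the storage nodes when at least as many nodes as the repair locality remain and from the BS otherwise, and that the result is normalized by the file size times the repair interval. All function and parameter names are hypothetical.

```python
import math


def binom_pmf(i: int, m: int, p: float) -> float:
    """Binomial pmf: probability of i successes in m trials with success prob. p."""
    return math.comb(m, i) * p**i * (1.0 - p) ** (m - i)


def repair_cost(m, r, mu, delta, rho_bs, rho_d2d, gamma_bs, gamma_d2d, file_size):
    """Average repair cost per bit and time unit (sketch, assumed notation).

    m: number of storage nodes, r: repair locality, mu: departure rate,
    delta: repair interval, rho_*: cost per bit, gamma_*: repair bandwidth.
    """
    p = math.exp(-mu * delta)  # a storage node is still present at repair time
    cost = 0.0
    for i in range(m + 1):     # i = number of surviving storage nodes
        lost = m - i           # nodes that must be repaired
        # D2D repair needs at least r survivors; otherwise the BS repairs.
        per_node = rho_d2d * gamma_d2d if i >= r else rho_bs * gamma_bs
        cost += binom_pmf(i, m, p) * lost * per_node
    # Assumed normalization: file size times repair interval.
    return cost / (file_size * delta)
```

A quick sanity check of the design: when both repair modes cost the same per node, the result reduces to that cost times the expected number of departed nodes, divided by the normalization.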

Remark 1.

We see from (8) that if , i.e., , D2D repair should never be performed, as repairing always from the BS yields a lower repair cost. In this case the repair cost would be

III-B Download Cost

Similar to , the download cost has two contributions, corresponding to the case where content is downloaded from the BS and from the storage nodes. Denote by and the probability that, for a request, the file is downloaded from the BS and from the storage nodes, respectively. Then, can be written as

(10)

where and are the cost of downloading the file from the BS and from the storage nodes, respectively (see Section II), and is the overall request rate per t.u. Again, we normalize by so that the cost does not depend on the file size. The download cost is given in the following theorem.

Theorem 2.

Consider the DS network in Section II with expected number of nodes in the cell , departure rate , request rate , communication costs and , file size , and repair interval . Furthermore, consider the use of an erasure correcting code that stores bits per node. Let for , and . The download cost is given by

(11)
Figure 2: Number of available storage nodes within the repair interval . At , there are storage nodes available. is the time after which less than storage nodes are available, hence D2D download is no longer possible.

The proof is given in Appendix A. Here, for ease of understanding, we give an outline of the proof. Since , it follows from (10) that, to derive , it is sufficient to derive . Let be the number of storage nodes alive in the cell within a repair interval, i.e., for , with . It is important to observe that is described by a Poisson death process [14], since storage nodes may leave the cell, and no repair is attempted before a time . This random process is illustrated in Fig. 2. At some point, too many storage nodes have left the network, such that the number of available storage nodes falls below and D2D download is no longer possible. Denote the (random) time this occurs by , i.e., , (see Fig. 2). Denote by the arrival time of the th file request within a repair interval, . The probability can then be derived in two steps.

  1. Find the pdf of the arrival time of the file requests within a repair interval , .

  2. Find the probability that a request arrives before , (i.e., D2D download is possible).
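The two steps above can be cross-checked numerically without the closed form of Theorem 2. The sketch below is not the expression in (11); it assumes that file requests form a Poisson process independent of the departures, so that a request time within the repair interval is uniformly distributed, and it integrates the probability that enough storage nodes are still present. The names and the midpoint integration rule are choices made here for illustration.

```python
import math


def prob_at_least(h: int, m: int, p: float) -> float:
    """P(X >= h) for X ~ Binomial(m, p)."""
    return sum(math.comb(m, i) * p**i * (1.0 - p) ** (m - i) for i in range(h, m + 1))


def prob_d2d_download(m: int, h: int, mu: float, delta: float, steps: int = 2000) -> float:
    """Probability that a request within a repair interval finds >= h storage nodes.

    m: storage nodes at the start of the interval, h: download locality,
    mu: per-node departure rate, delta: repair interval (assumed notation).
    """
    dt = delta / steps
    acc = 0.0
    for s in range(steps):
        t = (s + 0.5) * dt                      # midpoint of the s-th sub-interval
        survive = math.exp(-mu * t)             # a node is still present at time t
        acc += prob_at_least(h, m, survive)     # D2D download possible at time t
    return acc * dt / delta                     # average over the repair interval
```

For a single storage node and download locality one, the result approaches (1 - exp(-mu·delta)) / (mu·delta), which gives a convenient check of the integration.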

Remark 2.

If , i.e., , performing BS download only is optimal. The download cost is then

(12)

We also have the following result about the behavior of in (11).

Corollary 1.

For , is monotonically increasing with if , monotonically decreasing with if , and constant otherwise.

Proof:

The proof follows directly from differentiating with respect to and is therefore omitted. ∎

III-C Overall Communication Cost

Combining Theorems 1 and 2, one obtains the expression for the overall communication cost,

(13)

Note that, in general, is not monotone with . We can derive the following result for (instantaneous repair) and (no repair).

Corollary 2.
(14)

Moreover, for ,

(15)
Proof:

See Appendix B. ∎

For instantaneous repair (), both repair and download are always performed from the storage nodes. Thus, the two terms in (14) correspond to the D2D repair and D2D download, and we recover the result in [9]. For , data is never repaired (hence, ). For , the number of storage nodes in the cell will become smaller than at some point, and D2D download is no longer possible. Therefore, the overall communication cost in (15) is the BS download cost in (12).

IV Hybrid Repair and Download

In the system model in Section II and the analysis in Section III we assumed that if repair (resp. download) cannot be completed from storage nodes (because there are less than (resp. ) storage nodes available in the cell), BS repair (resp. download) is performed. Alternatively, for both repair and download, a node might retrieve data from the available storage nodes using D2D communication and retrieve the rest from the BS to complete the repair or the download. We will refer to this setup as partial D2D repair and partial D2D download, and the scheme that implements it as the hybrid repair and download scheme. In the following, we extend the analysis in Section III to the hybrid scheme.

IV-A Repair Cost

Assume that, at the time of repair, storage nodes are available, i.e., repair cannot be accomplished exclusively from the storage nodes. However, bits could be retrieved from the available storage nodes and the remaining bits to complete the repair from the BS. The corresponding communication cost is . For the conventional scheme, D2D repair is not possible for , and the repair cost corresponds to that of BS repair, i.e., . This implies that, if , partial repair leads to a reduced repair cost if or, equivalently, . For , the hybrid scheme performs partial D2D repair if and BS repair otherwise. The repair cost is given in the following theorem.

Theorem 3.

Consider the DS network in Section II using the hybrid scheme. The repair cost is given by

where , for all codes in Section VI, and .

Proof:

It follows the same lines as the proof of Theorem 1. ∎

IV-B Download Cost

Similar to the repair case, if storage nodes are available at the time of a file request, the file cannot be downloaded solely from the storage nodes. However, bits could be downloaded from the available storage nodes and the remaining bits from the BS, with communication cost . For the conventional scheme, the download cost corresponds to that of BS download, i.e., . Hence, the hybrid scheme leads to a lower download cost if , or equivalently, . For , the hybrid scheme performs partial D2D download if and BS download otherwise. The download cost is given in the following theorem.

Theorem 4.

Consider the DS network in Section II using the hybrid scheme. Let and , for . The download cost is given by

(16)

where , , and

Proof:

See Appendix C. ∎

V Repair and Download Cost with an Incoming Process

The analysis in the preceding sections does not consider the possibility that nodes arriving to the cell may bring content. In a real scenario with neighboring cells, however, this may be the case. We will refer to the arrival of nodes with content as the incoming process. Considering an incoming process significantly complicates the analysis. This is due to the fact that arriving nodes may bring content that is not directly useful, in the sense that they may bring code symbols which are already available in another storage node. At a given time, it is likely that some symbols will be stored by more than one storage node, while other symbols will not be present in the storage network (due to node departures). As a result, the analysis needs to consider storage node classes, where a node class defines the set of storage nodes storing given code symbols. In general, for an erasure correcting code, there are storage node classes, since all code symbols are different. The case of simply replicating the data (using a repetition code) is slightly different. Despite the fact that all code symbols are equal, for the analysis of -replication we still need to consider storage node classes, i.e., we treat each of the code symbols of the -replication as if they were different.

In this section, we extend the analysis in Sections III and IV to the scenario with an incoming process. In particular, we show that Theorem 1 and Theorem 2 can also be used to analyze the repair and download costs for this scenario by using different input parameters. More precisely, we consider the scenario where storage nodes of a given class arrive to the cell according to a Poisson process with expected arrival rate . An incoming storage node brings a single code symbol of a given class. Furthermore, nodes not storing content arrive according to a Poisson process with expected arrival rate . The departure rate for all nodes is , i.e., as before, the average number of nodes in the cell is . We assume the practical scenario where the BS maintains a list of the nodes storing content, which is communicated periodically to all nodes in the cell every t.u. For simplicity, we assume that .

V-A Repair Cost

Denote by the number of class- storage nodes in the cell at time . Also, denote by the probability that class is empty at time , i.e., . Since all storage node classes have the same arrival and departure rate, we can drop subindex and write . Also, let be the stationary distribution, where is the probability that class has storage nodes. Equation (9) in Theorem 1 can then be used for the scenario with an incoming process by setting .

The difficulty here lies in computing . Without repairs, the evolution of is given by a Poisson birth-death process, which can be modeled by an M/M/∞ Markov chain. In this case, the stationary distribution exists and can be computed. However, the repairs performed every t.u. interfere with the stationarity of the process. Indeed, in the presence of repairs, the evolution of no longer corresponds to a Poisson birth-death process. In this case, an exact analysis appears to be formidable.

Here, we propose the following two-step procedure to compute . Consider a single repair interval of duration , where is the number of storage nodes in class at time . Within a repair interval , is described by a Poisson birth-death process. (Footnote 3: This is in contrast to the case with no incoming process, where the evolution of for is described by a Poisson death process.) Since storage node classes are independent of each other and have the same arrival and departure rates, we can focus on a single class. Hence, we will drop the subindex in and simply write .

Let denote the transition probability function of the continuous-time Markov chain representing the Poisson birth-death process. can be computed by deriving a set of differential equations, called Kolmogorov’s forward equations, whose solution can be computed as follows [17]. Let be the matrix with th entry , where is the maximum number of storage nodes of one class. Also, let be the transition rates of the continuous-time Markov chain. Then can be computed as [17]

(17)

where is the generator of the Markov chain, with entries , and , given by

with

(18)

The infinite power series in (17) converges for any square matrix , and can be efficiently computed using, e.g., the algorithm described in [18].
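As an illustration of (17), the sketch below builds a truncated generator for a birth-death process and evaluates the matrix exponential. It assumes births at a constant class arrival rate and deaths at a rate proportional to the current number of nodes, which is one reading of the transition rates in (18). The matrix exponential uses a plain scaling-and-squaring Taylor approximation, swapped in here for the algorithm of [18]. All names are hypothetical.

```python
import math


def birth_death_generator(lam: float, mu: float, size: int):
    """Generator for states 0..size-1: births at rate lam, deaths at rate i*mu."""
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if i + 1 < size:
            q[i][i + 1] = lam       # arrival of a node of this class
        if i > 0:
            q[i][i - 1] = i * mu    # departure of one of the i nodes
        q[i][i] = -sum(q[i])        # diagonal makes each row sum to zero
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(a, terms: int = 20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2.0**squarings for x in row] for row in a]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):      # undo the scaling by repeated squaring
        result = mat_mul(result, result)
    return result


def transition_matrix(lam: float, mu: float, t: float, size: int):
    """Transition probabilities over time t for the truncated birth-death chain."""
    q = birth_death_generator(lam, mu, size)
    return mat_exp([[x * t for x in row] for row in q])
```

Each row of the returned matrix should sum to one, and the truncation size can be increased until the entries of interest stop changing.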

Note that in our scenario, is not finite. However, if the probability of having storage nodes of a given class at time , , sharply decreases with . Therefore, we can limit to a sufficiently large value, and by solving (17) get a very good approximation of .

Given , we can estimate the stationary distribution recursively. For a given distribution at time , , we can compute as

(19)

where and , due to the repair, and for .

Equivalently, this recursion can be written in compact form as

(20)
(21)

where is an matrix with entries , for , and . Note that and are the stationary distributions before and after repair, respectively.
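The recursion in (20) and (21) can be sketched as a fixed-point iteration. The code assumes that a repair refills an empty class with exactly one storage node, so that the probability mass of the empty state moves to the one-node state and all other states are unchanged; this is an interpretation of the repair matrix, not a statement taken from the source. The function names and the small transition matrix in the usage note are hypothetical.

```python
def apply_repair(dist):
    """Distribution after repair: an empty class is refilled with one node (assumed)."""
    after = list(dist)
    after[1] += after[0]
    after[0] = 0.0
    return after


def vec_mat(v, p):
    """Row vector times matrix."""
    n = len(v)
    return [sum(v[i] * p[i][j] for i in range(n)) for j in range(n)]


def stationary_before_after(p_delta, tol: float = 1e-12, max_iter: int = 10_000):
    """Iterate 'evolve over one repair interval, then repair' until it settles.

    p_delta: row-stochastic transition matrix over one repair interval.
    Returns the distributions just before and just after repair.
    """
    n = len(p_delta)
    after = [0.0] * n
    after[1] = 1.0                           # start with one node in the class
    before = vec_mat(after, p_delta)
    for _ in range(max_iter):
        before = vec_mat(after, p_delta)     # evolve over one repair interval
        new_after = apply_repair(before)     # then perform the repair
        if max(abs(a - b) for a, b in zip(new_after, after)) < tol:
            return before, new_after
        after = new_after
    return before, after
```

For example, with the hypothetical matrix `[[0.7, 0.25, 0.05], [0.4, 0.45, 0.15], [0.2, 0.4, 0.4]]`, the first entry of the returned before-repair distribution is the probability that the class is empty at the moment of repair.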

Theorem 5.

Consider the DS network in Section II with departure rate , arrival rate of storage nodes of a given class , arrival rate of nodes not storing content , communication costs and , BS repair bandwidth , file size , and repair interval . Furthermore, consider the use of an erasure correcting code with D2D repair bandwidth . The repair cost is given by (9) with , and is given by the first element of in (21).

Proof:

The proof follows from the discussion above. ∎

Theorem 6.

Consider the DS network in Section II with departure rate , arrival rate of storage nodes of a given class , and arrival rate of nodes not storing content , using the hybrid scheme. The repair cost is given by the expression in Theorem 3 with , and is given by the first element of in (21).

Proof:

The proof follows from the discussion above. ∎

Remark 3.

It is important to remark that the analysis for the scenario with an incoming process does not consider the departure of individual storage nodes, but rather the departure of whole classes, i.e., all nodes of a given class. Thus, and in (9) should not be interpreted as storage nodes and storage nodes, respectively, but as and storage node classes.

Remark 4.

Note that in the analysis above we have made the assumption that the stationary distribution exists. While we do not have a formal proof for this, our numerical results suggest that it does exist. In fact, the recursion (20) and (21) converges to the same independently of .

V-B Download Cost

Assume that after repair there are storage nodes of a given class, say class . With some abuse of notation, let be the number of storage nodes of class at time , where parameter indicates that . The evolution of for is given by a Poisson death process. Denote by the time instant at which the last of the storage nodes in class- leaves the cell. is hypoexponentially distributed with pdf given by (31), with and . The expected value of is [19, Sec. 1.3.1]

(22)

Note that is exponentially distributed.

Let be the time instant at which the last of the storage nodes in class leaves the cell or, in other words, the time instant at which the whole class leaves the cell. The pdf of is a weighted sum of the pdfs , weighted by , i.e., it is a weighted sum of hypoexponential distributions. The expected value of is

(23)

Let be the number of nonempty storage node classes in the cell at time . Computing exactly requires computing the distribution of the time instant at which changes from to , denoted by , similar to the case with no incoming process (see Appendix A). Unfortunately, due to the fact that the pdf of is a weighted sum of hypoexponential distributions, computing the pdf of seems infeasible. Here, we propose to approximate the pdf of by an exponential pdf. Indeed, it appears that is in general the largest element in , and therefore the distribution of has a large exponential component. Assuming that is well approximated by an exponential distribution with mean , the download cost for the scenario with an incoming process can then be computed using (11) in Theorem 2 by setting , where now a storage node departure should be interpreted as a storage node class departure. We have observed that by approximating the pdf of by an exponential distribution with mean , the analytical results match very well with the simulations for the whole range of interesting values of and , as shown in the results section. The download cost for the hybrid scheme is found by using (16) in Theorem 4 with .
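The exponential approximation can be illustrated with a short sketch. It assumes that the mean time until the last of a group of nodes leaves is the sum of the means of the successive exponential stages of the death process (a harmonic sum divided by the departure rate), that the class lifetime mean is the average of these values weighted by the after-repair distribution, and that the effective class departure rate is the reciprocal of that mean. Function and parameter names are hypothetical.

```python
def expected_last_departure(i: int, mu: float) -> float:
    """Mean time until all i nodes have left, each leaving at rate mu.

    With j nodes present the next departure occurs at rate j*mu, so the
    mean is the sum of 1/(j*mu) for j = 1..i.
    """
    return sum(1.0 / (j * mu) for j in range(1, i + 1))


def effective_class_departure_rate(after_repair_dist, mu: float) -> float:
    """Rate of the exponential approximation to the class lifetime.

    after_repair_dist[i] is the probability that the class holds i nodes
    right after a repair (index 0 is expected to carry no mass).
    """
    mean_lifetime = sum(
        weight * expected_last_departure(i, mu)
        for i, weight in enumerate(after_repair_dist)
    )
    return 1.0 / mean_lifetime
```

When every class holds exactly one node after repair, the effective rate equals the per-node departure rate, which matches the case without an incoming process.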

VI Erasure Correcting Codes in Distributed Storage

From Sections III-V, it can be seen that the overall communication cost depends on the network parameters (), , , and , and on the parameters , , , , and (and subsequently on and ), which are determined by the erasure correcting code used for DS. An erasure correcting code for DS is typically described in terms of the number of nodes used for storage, the download locality and the repair locality, and is defined using the notation . In this section, we briefly describe MDS codes [20], regenerating codes [10] and LRCs [11] in the context of DS. We also connect the code parameters with the code parameters . In Section VII, we then evaluate the overall communication cost of DS using these three code families.

We remark that the analysis in the previous sections applies directly to MDS and regenerating codes. However, due to the specificities of LRCs, Theorem 1 needs to be slightly modified, as shown in Section VI-C below.

VI-A Maximum Distance Separable Codes

Assume the use of an MDS code for DS. In this case, each storage node stores one coded symbol, hence and . Due to the MDS property, D2D repair and D2D download require contacting storage nodes. Therefore, an MDS code in a DS context is described by the triple . Moreover, , i.e., . The fact that an amount of information equal to the size of the entire file has to be retrieved to repair a single storage node is a known drawback of MDS codes [10]. The simplest MDS code is the -replication scheme. In this case, each storage node stores the entire file, i.e., and .

VI-B Regenerating Codes

A lower repair bandwidth (as compared to MDS codes) can be achieved by using regenerating codes [10], at the expense of increasing [10]. Two main classes of regenerating codes are covered here, minimum storage regenerating (MSR) codes and minimum bandwidth regenerating (MBR) codes. MSR codes yield the minimum storage per node, i.e., is minimum, while MBR codes achieve minimum D2D repair bandwidth. Regenerating codes have two repair models, functional repair and exact repair [21]. In exact repair, the lost data is regenerated exactly [21]. In functional repair, the lost data is regenerated such that the initial state of reliability in the DS system is restored [21], but the regenerated data does not need to be a replica of the lost data [21]. Here, we consider only exact repair, since it is of more practical interest [22].

An exact-repair MSR code in a DS system has and , with [22]. (Footnote 4: The design of linear, exact-repair MSR codes with has been proven impossible [23].) Hence, using (5),

Furthermore [22],

with equality only when , which is only possible for and due to the restriction on the values for the repair locality. The repair bandwidth,

is minimized for [10]. We remark that the storage per node (and hence the average download cost) for an MDS code and an MSR code are equal.

An MBR code further reduces the repair bandwidth at the expense of increasing the storage per node. An exact-repair MBR code has and for [22]. Using (5), we have

Furthermore [22],

Similar to the MSR codes, the repair bandwidth of an MBR code,

is minimized for [10].

Note that an regenerating code has exactly the same overall communication cost as an -replication scheme.
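The storage and repair-bandwidth trade-off described above can be checked numerically with the standard MSR and MBR operating points from the regenerating-code literature [10]. The sketch below is illustrative only: it assumes the usual cut-set expressions, and the names `F` (file size), `k` (download locality) and `d` (repair locality) are placeholders chosen here rather than the notation of this paper. It does not enforce the constraints under which exact-repair constructions exist.

```python
from fractions import Fraction


def msr_point(F, k, d):
    """Storage per node and repair bandwidth at the MSR point.

    Assumes the standard cut-set expressions: each node stores F/k and
    each of the d helper nodes sends F / (k * (d - k + 1)).
    """
    alpha = Fraction(F, k)
    beta = Fraction(F, k * (d - k + 1))
    return alpha, d * beta


def mbr_point(F, k, d):
    """Storage per node and repair bandwidth at the MBR point.

    Assumes each of the d helper nodes sends 2F / (k * (2d - k + 1)) and
    that a node stores exactly what it downloads during repair.
    """
    beta = Fraction(2 * F, k * (2 * d - k + 1))
    gamma = d * beta
    return gamma, gamma


if __name__ == "__main__":
    F, k = 12, 4
    for d in range(k, 8):
        print(d, msr_point(F, k, d), mbr_point(F, k, d))
```

With `k = d = 1` both functions return storage `F` and repair bandwidth `F`, which matches the replication case noted above, and for fixed `k` the repair bandwidth of both points decreases as `d` grows.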

VI-C Locally Repairable Codes

A lower repair locality (as compared to MDS codes) is achieved by using LRCs [11]. An LRC has and , where and . Each node stores

bits. The storage nodes are arranged in disjoint repair groups with nodes in each group. Any single storage node can be repaired locally by retrieving bits from nodes in the repair group [11]. A storage node involved in the repair process transmits all its stored data, i.e., , hence

If local D2D repair is not possible, repair can be carried out globally by retrieving bits from any subset of storage nodes. Since it is necessary to distinguish between local and global repairs (as opposed to MDS and regenerating codes), the expression of the repair cost in Theorem 1 does not apply to LRCs and needs to be modified. We denote by and the average number of nodes repaired from the storage nodes locally and globally, respectively, in one repair interval. We will also need the following definitions. Let be the random vector whose component is the random variable giving the number of repair groups with storage node departures in a repair interval . Note that takes values in and . The probability of storage node departures in a repair group is , where is the probability that a storage node has not left the network during a time . Let be a realization of and let . Then,

(24)

where , is the multinomial coefficient, and .
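The two probabilities defined above can be sketched in a few lines. The code assumes exponentially distributed node lifetimes, so that a node is still present after a repair interval with probability exp(-mu * delta), and that nodes depart independently, which makes the number of departures in one repair group binomial and the vector of group counts multinomial. The function and parameter names (`group_size`, `mu`, `delta`, `counts`) are hypothetical and chosen for readability.

```python
import math


def departure_pmf(group_size, mu, delta):
    """Binomial pmf of the number of departures in one repair group.

    p is the probability that a single node has not left during delta;
    entry j of the returned list is the probability of exactly j departures.
    """
    p = math.exp(-mu * delta)
    return [
        math.comb(group_size, j) * (1 - p) ** j * p ** (group_size - j)
        for j in range(group_size + 1)
    ]


def group_count_probability(counts, pmf):
    """Multinomial probability that counts[j] groups see exactly j departures."""
    groups = sum(counts)
    # Multinomial coefficient groups! / (counts[0]! * counts[1]! * ...).
    coeff = math.factorial(groups)
    for c in counts:
        coeff //= math.factorial(c)
    prob = float(coeff)
    for c, q in zip(counts, pmf):
        prob *= q ** c
    return prob


if __name__ == "__main__":
    pmf = departure_pmf(group_size=3, mu=1.0, delta=0.5)
    # Two groups: one with no departures, one with a single departure.
    print(group_count_probability((1, 1, 0, 0), pmf))
```

Summing `group_count_probability` over all count vectors with a fixed number of groups gives one, which is a convenient check when implementing the repair cost.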

The repair cost for LRCs is given in the following theorem.

Theorem 7.

Consider the DS network in Section II with departure rate , communication costs and , BS repair bandwidth , file size , and repair interval . Furthermore, consider the use of an LRC with disjoint repair groups and D2D repair bandwidth . The repair cost is given by

(25)

where

and is an indicator function.

Proof:

See Appendix D. ∎

It is easy to verify that Corollary 2 holds also for LRCs.

VI-D Lowest Overall Communication Cost for Instantaneous Repair

For instantaneous repair, the minimum overall communication cost is given in the following lemma.

Lemma 1.

For (instantaneous repair), the lowest possible overall communication cost for any linear code with , regenerating codes and LRCs is

where is given in (14) in Corollary 2. The minimum is achieved by -replication.

Proof:

See Appendix E. ∎

This is in agreement with the result in [9], where -replication was shown to be optimal.

VII Numerical Results

In this section, we evaluate the overall communication cost (computed using (1) and (11)) for the erasure correcting codes discussed in the previous section. For the results, we consider a network with nodes, where the number of storage nodes is . This gives a probability smaller than of having fewer than nodes in the cell (see (4)), which is considered negligible. Without loss of generality, we set the departure rate and , i.e., . Figs. 3-9 refer to a system with no incoming process, i.e., , while Figs. 10 and 11 consider the presence of an incoming process, .

Figure 3: Normalized overall communication cost versus the repair interval for a selection of MDS codes, regenerating codes and LRCs with , compared to the normalized BS download cost (dotted line).

Fig. 3 shows normalized to the cost of downloading from the BS, , i.e., , as a function of the normalized repair interval, , for a selection of MDS codes, regenerating codes and LRCs with . The ratio between the request rate and departure rate is , i.e., the average request rate in the cell is requests per t.u., and . The meaning of is that each node places on average requests per node lifetime. Also, in the figure means that the repair interval is equal to one average node lifetime. Simulation results are also included in the figure (markers). (Footnote 5: When simulating the wireless DS system, the repair process is not executed if the number of nodes in the cell is less than at the particular repair instant.) Note that since we normalize to the BS download cost, values below ordinate 1 correspond to the case where DS is beneficial. For relatively high repair frequencies, all codes yield lower than BS download. However, exceeds , i.e., BS download is less costly than the DS communication cost, for values of the repair interval larger than a threshold, which we define as

(26)

For , retrieving the file from the BS is always less costly, therefore storing data in the nodes is useless. depends on the network parameters , , and as well as the code parameters , and .

We see from Fig. 3 that the value of that minimizes , denoted by , depends on the code used for storage. In particular, for the MSR code, i.e., instantaneous repair is optimal. Performing an exhaustive search for , it is readily verified that the same is true for any of the codes in Section VI with . It is reasonable to assume that this will be the case also for . On the other hand, for the MDS code. depends on the network and code parameters. In particular, the tolerance to storage node departures in a repair interval affects . In Section VII-A, we investigate how the network parameters affect and . In Section VII-B, we explore how the code parameters affect .

VII-A Effect of Varying Network Parameters

Fig. 4 shows how increases with for the same codes as in Fig. 3 and . For , approximately, for all considered codes, i.e., it is never beneficial to use the devices for storage and the file should always be downloaded from the BS. It is worth noting that, for moderate-to-large , the MSR code requires on the order of 10 repairs per average node lifetime, while the MDS code requires only around 0.66 repairs per node lifetime for DS to be beneficial over BS download. The main difference between the MDS code and the MSR code is the number of storage node departures in a repair interval that the code can tolerate such that D2D repair is still possible, i.e., . The MDS code can handle the departure of up to storage nodes while the MSR code can tolerate a single departure only. This explains the higher repair frequency required by the MSR code.

Figure 4: The maximum repair interval versus the transmission cost ratio .
Figure 5: Normalized overall cost versus the repair interval for the LRC for different values of the ratio , as compared with the normalized BS download cost (straight dotted line). The arrow points in the direction of increasing .

For the LRC and , Fig. 5 shows how and are affected by the ratio . We see that increasing reduces for all and that increases with . The same behavior is observed using any of the codes in Section VI, which can be verified by the following manipulations of the equations in Section III. The case corresponds to , which can be readily seen by taking the limit in (13), using (1) and (11), for fixed and finite . This shows that the overall communication cost is essentially the download cost for a sufficiently high . Since is monotonically increasing in (Corollary 1) and as (Corollary 2), we also have that for . Hence, DS always leads to a lower overall communication cost, as compared to the BS download cost, for sufficiently large .

VII-B Results of Changing Code Parameters

We investigate how the repair locality affects . Fig. 6 shows versus for the MSR code for and . We observe that for the lowest is achieved for , i.e., the highest possible repair locality. This is due to the fact that for regenerating codes is minimized for (see [10] and Section VI-B). However, increasing requires decreasing to yield the lowest . This is due to the improved tolerance to storage node departures as decreases. The result is interesting, because it means that in wireless DS, if repairs cannot be accomplished very frequently, repair locality is a more important parameter than repair bandwidth. On the other hand, if repairs can be performed very frequently, repair bandwidth becomes more important than repair locality, because tolerance to storage node departures is not critical. In general, there is a tradeoff between the repair bandwidth and the tolerance to storage node departures (directly related to the repair locality), which holds true for any of the codes in Section VI. How to set the parameter depends on how frequently we can repair the DS system.

Figure 6: Normalized overall cost versus the repair interval for the MSR code compared with the normalized BS download cost (dotted line). The arrow shows the direction of increasing .

VII-C Improved Communication Cost Using the Hybrid Scheme

We return to the hybrid repair and download scheme presented in Section IV to investigate the gains in overall communication cost as compared to the cost when using the conventional scheme. We remark that the hybrid scheme does not improve for all codes in Section VI. In particular, for finite , is only reduced if (Theorem 3) and is only improved if (Theorem 4). Fig. 7 shows versus for all codes in Fig. 3 that achieve lower when using the hybrid scheme. We set and and include simulation results in the figure (markers). Dashed curves correspond to the conventional scheme, and solid curves to the hybrid scheme.

Figure 7: The normalized overall cost versus the repair interval