# Repair Scheduling in Wireless Distributed Storage with D2D Communication

###### Abstract

We consider distributed storage (DS) for a wireless network where mobile devices arrive and depart according to a Poisson random process. Content is stored in a number of mobile devices, using an erasure correcting code. When requesting a piece of content, a user retrieves the content from the mobile devices using device-to-device communication or, if not possible, from the base station (BS), at the expense of a higher communication cost. We consider the repair problem when a device that stores data leaves the network. In particular, we introduce a repair scheduling where repair is performed (from storage devices or the BS) periodically. We derive analytical expressions for the overall communication cost of repair and download as a function of the repair interval. We illustrate the analysis by giving results for maximum distance separable codes and regenerating codes. Our results indicate that DS can reduce the overall communication cost with respect to the case where content is only downloaded from the BS, provided that repairs are performed frequently enough. The required repair frequency depends on the code used for storage and the network parameters. In particular, minimum bandwidth regenerating codes require frequent repairs, while maximum distance separable codes give better performance if repair is performed less frequently. We also show that instantaneous repair is not always optimal.

## I Introduction

It is predicted that global mobile data traffic will reach 24.3 exabytes per month by 2019, nearly a tenfold increase compared to the traffic in 2014 [1]. This dramatic increase in mobile data traffic threatens to completely congest the already burdened wireless networks. One popular approach to reduce peak traffic is to store popular data closer to the end users, a technique also known as caching. Recently, a novel architecture was proposed to efficiently handle highly predictable bulky traffic, such as video traffic [2]. The idea is to deploy a number of access points (called helpers) with large storage capacity, but low-rate wireless backhaul, and store data across them. Users can then download content from the helpers, resulting in a performance gain.

In [3] it was suggested to store content directly in the mobile devices, taking advantage of the high storage capacity of modern smart phones and tablets. Hence, no additional infrastructure is required. Traffic to the BS is alleviated by maximizing the number of times a requested file can be retrieved from the mobile devices storing content, using device-to-device (D2D) communication. The problem of repairing the lost data when a device leaves the network was considered in [4], where data is stored in the mobile devices using erasure correcting coding. In particular, the communication cost incurred by data download and repair is analyzed in [4], assuming an infinite storage capacity in the mobile devices and instantaneous repair.

In this paper, we consider distributed storage (DS) in a wireless network scenario similar to the one in [4]. We consider a cellular system where mobile devices roam in and out of a cell according to a Poisson random process and request content at random times. The cell is served by a base station (BS), which always has access to the content. Content is also stored across a limited number of mobile devices using an erasure correcting code. When a user requests a piece of content, it attempts to download it from the mobile devices using D2D communication. If this is not possible, the content is downloaded from the BS, at the expense of a higher communication cost. Our main focus is on the repair problem when a device that stores data leaves the network. In particular, we introduce a repair scheduling where lost content is repaired (from storage devices sojourning in the cell or from the BS) at periodic times. We derive analytical expressions for the total communication cost of repair and download as a function of the repair interval. Furthermore, we analyze several erasure correcting codes, namely maximum distance separable (MDS) codes and regenerating codes. We show that DS can reduce the overall communication cost as compared to the classical scenario where content is only downloaded from the BS, provided that repairs are performed frequently enough. The required frequency depends on the code family and on the network parameters. Somewhat surprisingly, instantaneous repair is not always optimal.

## II System Model

We consider a single cell in a cellular network, served by a BS, where mobile devices (referred to as nodes) arrive and depart according to a Poisson process. The average number of nodes in the network is . Nodes wish to download content from the network. For simplicity, we assume that there is a single object (file), of size bits, stored at the BS. We further assume that nodes can store data and communicate between them using D2D communication. The considered scenario is depicted in Fig. 1.

Arrival-departure model. Nodes arrive according to a Poisson process with exponential independent, identically distributed (i.i.d.) random inter-arrival times with probability density function (pdf)

(1) |

where is the expected arrival rate of a node and is time, measured in time units (t.u.).

The nodes stay in the cell for an i.i.d. exponential random lifetime with pdf

(2) |

where is the expected departure rate of a node. The number of nodes in the cell can be described by an M/M/∞ queuing model. We assume that , i.e., the average number of nodes in the cell stays constant (equal to ).
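The steady-state behavior of this arrival-departure model can be checked numerically. The following is a minimal sketch, assuming the standard result that the number of customers in an M/M/∞ queue is Poisson distributed in steady state, with mean equal to the total arrival rate divided by the per-node departure rate. The names `arrival_rate`, `departure_rate`, and `prob_fewer_than`, as well as the numerical values, are illustrative assumptions and do not come from the paper.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """P(N = n) for a Poisson variable, evaluated in log-space for stability."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def prob_fewer_than(m: int, mean: float) -> float:
    """Probability that fewer than m nodes are in the cell in steady state."""
    return sum(poisson_pmf(n, mean) for n in range(m))


# Hypothetical values: 100 arrivals per time unit, mean lifetime of 1 time unit.
arrival_rate = 100.0
departure_rate = 1.0
mean_nodes = arrival_rate / departure_rate

print("average number of nodes:", mean_nodes)
print("P(fewer than 10 nodes):", prob_fewer_than(10, mean_nodes))
```

With an average population much larger than the number of storage nodes, the tail probability is vanishingly small, which is the regime in which the file can always be stored in the network.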

Data storage. The file is partitioned into packets and encoded using an erasure correcting code of rate . The encoded data is stored in nodes, referred to as storage nodes. For simplicity, we assume , hence the probability that the number of nodes in the cell is smaller than is negligibly small. Therefore, the file can always be stored in the network. In particular, each storage node stores exactly bits, i.e., we consider a symmetric allocation [5]. Hence,

(3) |

Like [5], we also introduce an overall storage budget constraint of bits, , across the nodes in the cell, i.e., . Note that to satisfy this constraint, .

Data delivery. Nodes request the file at random times with i.i.d. random inter-request time with pdf

(4) |

where is the expected request rate per node. Whenever possible, the file is downloaded from the storage nodes using D2D communication, referred to as D2D download. In particular, we assume that data can be downloaded from any subset of storage nodes. In other words, D2D download is possible if or more storage nodes remain in the cell. In this case, the amount of downloaded data is bits, where the inequality follows because . The parameter depends on the properties of the erasure correcting code used for storage, and will be discussed in Section IV. In the case where there are fewer than storage nodes in the cell, the file is downloaded from the BS, referred to as BS download. In this case, bits are downloaded. To simplify the analysis in Section III, we assume that the download bandwidth is the same irrespective of whether the request comes from a storage node itself or not. This is a reasonable approximation, since .

We assume that transmissions from the BS and from a node (in D2D communication) have different costs. We denote by and the cost (in cost units (c.u.) per bit, [c.u./bit]) of transmitting one bit from the BS and from a node, respectively, and by their ratio. We further assume , hence transmission from the BS is at least as costly as transmission in D2D communication.

### II-A Repair Process

When a storage node leaves the network, its stored data is lost (see blue node with orange stripes in Fig. 1). Therefore, another node needs to be populated with data to maintain the initial state of reliability of the DS network, i.e., storage nodes. The restoration (repair) of the lost data onto another node, chosen uniformly at random from all nodes in the cell that do not store any content, will be referred to as the repair process. In particular, we introduce a scheduled repair scheme where the repair process is launched periodically. We denote the interval between two repairs by (in t.u.), . Note that corresponds to the case of instantaneous repair, considered in [4].

Similarly to the download, repair can be accomplished from the storage nodes (D2D repair) or from the BS (BS repair), with cost per bit and , respectively. The amount of data (in bits) that needs to be retrieved from the network to repair a single failed node is referred to as the repair bandwidth, . In particular, we assume that D2D repair can be performed from any subset of storage nodes by retrieving bits from each node. In other words, D2D repair is possible if there are at least storage nodes in the cell at the moment of repair. is usually referred to as the repair access in the literature. In this case , where the subindex indicates that repair is performed from the storage nodes. If there are fewer than storage nodes in the network at the moment of repair, then the repair is carried out by the BS. In this case . We assume that repair always succeeds. Furthermore, for both repair and download we assume error-free transmission.

## III Repair and Delivery Cost

In this section, we derive analytical expressions for the repair cost, , download cost, , and total cost , as a function of the repair interval, . The cost is defined in cost units per bit and time unit [c.u./(bit·t.u.)].

### III-A Average Repair Cost

Denote by and the average number of nodes repaired from the storage nodes and from the BS, respectively, in one repair interval. Also, let be the probability mass function (pmf) of the binomial distribution with parameters and .

###### Lemma 1.

(5) | ||||

(6) |

where .

###### Proof:

As the inter-departure times are exponentially distributed, the probability that a storage node has not left the network during a time and is accessible for repair is . Hence, the probability that storage nodes are accessible is . If only storage nodes remain in the network, then repairs need to be performed. D2D repair is performed if ; BS repair is performed otherwise. Therefore, (5) and (6) hold. ∎
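The counting argument in the proof can be sketched numerically. The snippet below is an illustration only and assumes the following notation, which is not taken from the paper: `m` is the number of storage nodes, `r` the repair access, `mu` the departure rate of a node, and `delta` the repair interval. It weights the number of repairs needed when `i` storage nodes survive by the binomial probability of that event, and splits the sum at `i = r`.

```python
import math


def binom_pmf(i: int, m: int, p: float) -> float:
    """Binomial pmf: probability that exactly i of m nodes survive."""
    return math.comb(m, i) * p**i * (1.0 - p) ** (m - i)


def average_repairs(m: int, r: int, mu: float, delta: float) -> tuple[float, float]:
    """Average number of D2D and BS repairs in one repair interval.

    Assumed notation (hypothetical): m storage nodes, repair access r,
    departure rate mu, repair interval delta.
    """
    p = math.exp(-mu * delta)  # a storage node survives the interval
    # If i nodes survive, m - i repairs are needed.
    d2d = sum((m - i) * binom_pmf(i, m, p) for i in range(r, m + 1))  # i >= r
    bs = sum((m - i) * binom_pmf(i, m, p) for i in range(0, r))       # i < r
    return d2d, bs


d2d, bs = average_repairs(m=10, r=4, mu=1.0, delta=0.2)
print("average D2D repairs:", d2d)
print("average BS repairs:", bs)
```

As a sanity check, the two averages add up to the expected number of departed storage nodes, `m * (1 - exp(-mu * delta))`, since every lost node is repaired by one of the two mechanisms.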

The average repair cost, , is given in the following theorem.

###### Theorem 1.

Consider the DS network in Section II with parameters , , , , , , , and . The average repair cost is

(7) | ||||

(8) |

where .

###### Proof:

From the system model, it follows that the cost of repairing a single storage node from the BS is c.u. Similarly, the cost of D2D repair of a single node is c.u. Normalizing by the file size ( bits) and the duration of the repair interval , we obtain (7) in [c.u./(bit·t.u.)]. Finally, using Lemma 1, we obtain (8). ∎
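The steps of this proof can be sketched as a short computation. This is a hedged illustration rather than the paper's exact expression: all parameter names (`gamma_d2d`, `gamma_bs`, `rho_d2d`, `rho_bs`, `file_size`, and so on) are assumed here. The per-node repair costs are taken as cost per bit times repair bandwidth, weighted by the average number of repairs of each type from Lemma 1, and normalized by the file size and the repair interval.

```python
import math


def binom_pmf(i: int, m: int, p: float) -> float:
    """Binomial pmf: probability that exactly i of m nodes survive."""
    return math.comb(m, i) * p**i * (1.0 - p) ** (m - i)


def repair_cost(m: int, r: int, mu: float, delta: float,
                gamma_d2d: float, gamma_bs: float,
                rho_d2d: float, rho_bs: float, file_size: float) -> float:
    """Average repair cost per bit and time unit (requires delta > 0).

    All argument names are illustrative assumptions:
    gamma_* are repair bandwidths in bits, rho_* are costs per bit.
    """
    p = math.exp(-mu * delta)
    avg_d2d = sum((m - i) * binom_pmf(i, m, p) for i in range(r, m + 1))
    avg_bs = sum((m - i) * binom_pmf(i, m, p) for i in range(0, r))
    total = rho_bs * gamma_bs * avg_bs + rho_d2d * gamma_d2d * avg_d2d
    return total / (file_size * delta)


# Hypothetical example: BS transmission 40 times as costly as D2D.
cost = repair_cost(m=10, r=4, mu=1.0, delta=0.2,
                   gamma_d2d=1000.0, gamma_bs=250.0,
                   rho_d2d=1.0, rho_bs=40.0, file_size=1000.0)
print("average repair cost:", cost)
```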

### III-B Average Download Cost

The average download cost is given in the following theorem.

###### Theorem 2.

Consider the DS network in Section II with parameters , , , , , , , , and . Let for , and . Then

(9) |

where .

###### Proof:

See appendix. ∎

### III-C Average Total Cost

Combining Theorems 1 and 2, one obtains the expression for . Note that in general is not monotone with . We can derive the following result for and .

###### Corollary 1.

. Moreover, for , .

For instantaneous repair (), both repair and download are always performed from the storage nodes. Thus, the two terms in for in Corollary 1 correspond to the repair and download costs in the D2D regime. For , data is never repaired (hence, ). For , the number of storage nodes in the cell will become smaller than at some point, and D2D download is not possible. Therefore, the average download cost is the average BS download cost.

## IV MDS and Regenerating Codes

From Section III it can be seen that the total cost, , depends on the DS system parameters , , , , and (among others). This section describes how, in turn, these parameters depend on the erasure correcting codes used for storage. We consider as examples MDS codes [6] and regenerating codes [7].

### IV-A Maximum Distance Separable Codes

Assume the use of an MDS code for DS. Then, due to the MDS property, D2D repair and D2D download require to contact storage nodes. Moreover, , which means that . The fact that an amount of information equal to the size of the entire file has to be retrieved to repair a single storage node is a known drawback of MDS codes [7].

The simplest MDS code is the -replication scheme. In this case, each storage node stores the entire file, i.e., . For the replication scheme, and .
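The MDS parameters described above can be collected in a small helper. This is a sketch under assumed notation (`file_size` for the file size in bits, `k` for the code dimension, and the field names of `CodeParams`), using the standard MDS facts that any `k` nodes suffice for download and repair, that each node stores a `1/k` fraction of the file, and that repairing one node requires retrieving the whole file. Replication is treated as the special case `k = 1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeParams:
    """Storage-code parameters; field names are illustrative assumptions."""
    alpha: float      # bits stored per storage node
    h: int            # nodes contacted for D2D download
    r: int            # nodes contacted for D2D repair (repair access)
    gamma_d2d: float  # D2D repair bandwidth in bits


def mds_params(file_size: float, k: int) -> CodeParams:
    """Parameters of an MDS code of dimension k storing file_size bits."""
    return CodeParams(alpha=file_size / k, h=k, r=k, gamma_d2d=float(file_size))


def replication_params(file_size: float) -> CodeParams:
    """Replication: every storage node keeps the whole file (k = 1)."""
    return mds_params(file_size, 1)


print(mds_params(1200.0, 4))
print(replication_params(1200.0))
```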

### IV-B Regenerating Codes

A lower repair bandwidth (as compared to MDS codes) can be obtained by using regenerating codes [7], but at the expense of increasing [7]. Two main classes of regenerating codes are covered here, minimum storage regenerating (MSR) codes and minimum bandwidth regenerating (MBR) codes. For given and , MSR codes yield the best storage efficiency, i.e., is minimum, while MBR codes achieve minimum D2D repair bandwidth, i.e., is minimum.

For an MSR code in a DS system, . Moreover, storage nodes are contacted during the D2D repair process. Hence, the download cost for an MSR code is equal to the one of an MDS code. However, [7]. is minimized for . For , the total cost of the MSR code is equal to that of the MDS code.
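The trade-off between storage and repair bandwidth can be made concrete with the well-known MSR and MBR operating points from [7]. The sketch below assumes the notation `file_size` (bits), `k` (nodes needed for download), and `r` (repair access, with `r >= k`); these names are chosen here for illustration and are not taken from the paper.

```python
def msr_params(file_size: float, k: int, r: int) -> tuple[float, float]:
    """(per-node storage, D2D repair bandwidth) at the MSR point, r >= k."""
    alpha = file_size / k
    gamma = file_size * r / (k * (r - k + 1))
    return alpha, gamma


def mbr_params(file_size: float, k: int, r: int) -> tuple[float, float]:
    """(per-node storage, D2D repair bandwidth) at the MBR point, r >= k."""
    alpha = 2.0 * file_size * r / (k * (2 * r - k + 1))
    gamma = alpha  # at the MBR point a repair downloads exactly what is stored
    return alpha, gamma


file_size, k = 1200.0, 4
for r in (4, 5, 7):
    print("r =", r, "MSR:", msr_params(file_size, k, r),
          "MBR:", mbr_params(file_size, k, r))
```

For `r = k` the MSR repair bandwidth equals the file size, matching the MDS case, and it decreases as the repair access grows. The MBR point stores more per node than the MSR point but needs less repair bandwidth.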

## V Numerical Results

In this section, we evaluate the total cost for MDS and regenerating codes. For the results, we consider a network with average nodes, request rate , and a cost ratio . Also, the storage budget is set to . Without loss of generality we set c.u./bit, i.e., . To specify a code, we use the alternative notation .

Fig. 2 shows the value of the normalized cost versus the normalized repair interval for , for the MDS code, the MSR code with , i.e., moderate and high repair access respectively, the MBR code with and the 5-replication scheme. The code rate for all codes is , except for the MBR code that has and the MBR code that has . In the figure, means that the repair interval is equal to one average node lifetime.

The code parameters are chosen to highlight particularly interesting behaviors of the different codes. Note that since , (and hence ) and are proportional to the file size , as specified in Section IV, the repair and download cost in (7) and (9), respectively, are independent of the file size . From Corollary 1, (the cost of always downloading content from the BS) when . We observe from Fig. 2 that this is indeed the case. It is interesting to point out that the normalized total cost exceeds for values of the repair interval larger than a threshold . We define the maximum repair interval as

(10) |

For , retrieving the file from the BS is always less costly, therefore storing data in the nodes is useless. Clearly, is a function of the cost ratio . Fig. 3 shows as a function of , for all codes in Fig. 2. We observe that if , approximately, it is never beneficial to use the devices for storage, i.e., the file should always be downloaded from the BS. As increases, storing data in the mobile devices is beneficial, if repair is performed with . The regenerating codes with high repair access require very frequent repairs. Although not included here due to space constraints, the same is true for other MSR and MBR codes with high repair access. The MDS codes and the regenerating codes with moderate repair access require less frequent repairs; for large , the repair interval must be at most around 1.5 and 0.5 average node lifetimes respectively.

For the same parameters and codes used in Fig. 2, Fig. 4 shows the normalized total cost for shorter repair intervals. We observe that instantaneous repair is optimal for the MBR and MSR codes with (Fig. 4(a)). On the other hand, for the MDS codes and the regenerating codes with moderate repair access is minimized for (Fig. 4(b)).

## VI Conclusions

We considered distributed storage for a wireless network where data is stored in a distributed manner across mobile devices. We introduced a repair scheduling where the repair of the data lost due to device departures is performed periodically. We derived analytical expressions for the total communication cost, due to repair and download, as a function of the repair interval. For a particular network, we showed that there exists a maximum value of the repair interval after which retrieving the file from the BS is always less costly. Therefore, DS is useful if the repair can be performed frequently enough. Instantaneous repair is not always the best solution. The optimal repair interval that minimizes the total communication cost depends on the code used for storage. For a given repair interval, one should find the code that minimizes the total communication cost. A more thorough investigation is left for future research.

## Appendix A Outline of the Proof of Theorem 2

A file request entails a cost with probability , and a cost with probability . The overall request rate per t.u. is . Normalizing by the file size gives the first equality in (9). In the following, we prove the last equality of the theorem.

Within a repair interval, the number of storage nodes in the cell is described by a Poisson death process [8]. Denote by the time interval for which , (see Fig. 5 for illustration). is exponentially distributed with rate . Denote by the time instant within the repair interval at which changes from to . Then,

(11) |
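The structure of the death process can be illustrated with a short computation. The sketch assumes that each of the `i` storage nodes still in the cell departs independently at rate `mu`, so that the holding time in state `i` is exponential with rate `i * mu` (the standard pure death process). The names `m`, `h`, and `mu` are notation chosen here for the number of storage nodes, the number of nodes needed for D2D download, and the departure rate.

```python
def expected_time_d2d_available(m: int, h: int, mu: float) -> float:
    """Expected time until fewer than h of the m storage nodes remain,
    if no repair is performed.

    The process visits states m, m-1, ..., h before D2D download becomes
    impossible; the mean holding time in state i is 1 / (i * mu).
    """
    return sum(1.0 / (i * mu) for i in range(h, m + 1))


# Hypothetical example: 10 storage nodes, 4 needed for download, mean lifetime 1.
print(expected_time_d2d_available(m=10, h=4, mu=1.0))
```

This expected time gives a rough sense of how long a repair interval can be before D2D download typically becomes unavailable.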

The pdf of is given by [9]

(12) |

We are interested in the distribution of file requests within a repair interval . Let be the time instant of the th request. is computed as the sum of inter-request times with pdf given by (4). Thus, is an Erlang distributed random variable with pdf [8]

(13) |
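For reference, the Erlang density of the time of the `j`th request can be evaluated as follows. This is a minimal sketch using the textbook form of the Erlang pdf for a sum of `j` i.i.d. exponential inter-request times; `rate` denotes the overall request rate and is a name assumed here for illustration.

```python
import math


def erlang_pdf(t: float, j: int, rate: float) -> float:
    """pdf of the sum of j i.i.d. exponential(rate) variables at time t."""
    if t < 0:
        return 0.0
    return rate**j * t ** (j - 1) * math.exp(-rate * t) / math.factorial(j - 1)


# Numerical check that the density integrates to approximately one
# (midpoint rule on [0, 20] for j = 3, rate = 2).
step = 0.001
total = sum(erlang_pdf((n + 0.5) * step, 3, 2.0) * step for n in range(20000))
print("integral of the pdf:", total)
```

For `j = 1` the expression reduces to the exponential pdf of a single inter-request time.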

Define . The following result holds.

###### Lemma 2.

The distribution of for is

###### Lemma 3.

The proofs are omitted due to lack of space. It can be verified numerically that converges to the uniform distribution already for small values of .

D2D download is possible if at least storage nodes are available in the network. Thus, given the sequence of random variables ,

where the approximation follows because for large enough , .

Now, using (12), after some calculations we obtain

(14) |

## References

- [1] Cisco, “Cisco visual networking index: Global mobile data traffic forecast update, 2014-2019,” Cisco, Tech. Rep., 2015.
- [2] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “Femtocaching: Wireless content delivery through distributed caching helpers,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 8402–8413, Dec. 2013.
- [3] N. Golrezaei, P. Mansourifard, A. F. Molisch, and A. G. Dimakis, “Base-station assisted device-to-device communications for high-throughput wireless video networks,” IEEE Trans. Wireless Commun., vol. 13, no. 7, pp. 3665–3676, Jul. 2014.
- [4] J. Pääkkönen, C. Hollanti, and O. Tirkkonen, “Device-to-device data storage for mobile cellular systems,” in Proc. IEEE Globecom Work., Dec. 2013.
- [5] D. Leong, A. G. Dimakis, and T. Ho, “Distributed storage allocations,” IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4733–4752, Jul. 2012.
- [6] W. E. Ryan and S. Lin, Channel Codes: Classical and Modern. Cambridge University Press, 2009.
- [7] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010.
- [8] S. L. Miller and D. Childers, Probability and Random Processes. Elsevier, 2004.
- [9] G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi, Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. Wiley-Interscience, 2006.