On the Service Capacity Region of Accessing Erasure Coded Content
Cloud storage systems generally add redundancy in storing content files such that files are replicated or erasure coded and stored on nodes. In addition to providing reliability against failures, the redundant copies can be used to serve a larger volume of content access requests. A request for one of the files can either be sent to a systematic node, or one of the repair groups. In this paper, we seek to maximize the service capacity region, that is, the set of request arrival rates for the files that can be supported by a coded storage system. We explore two aspects of this problem: 1) for a given erasure code, how to optimally split incoming requests between systematic nodes and repair groups, and 2) choosing an underlying erasure code that maximizes the achievable service capacity region. In particular, we consider MDS and Simplex codes. Our analysis demonstrates that erasure coding makes the system more robust to skews in file popularity than simply replicating a file at multiple servers, and that coding and replication together can make the capacity region larger than either alone.
Cloud storage systems are expected to provide reliability against failures and ensure availability of stored content during high demand, while handling massive amounts of data. In order to combat failures, redundancy is added using either replication or erasure coding. Even though replication has conventionally been preferred due to its simplicity, a large body of recent literature has proposed novel erasure coding techniques as a more efficient way to provide reliability, see e.g., [1, 2, 3, 4]. In addition to reliability, redundancy has been shown to be effective in enhancing availability by reducing download latency for retrieving entire data in a number of recent research papers, see e.g., [5, 6, 7, 8]. On the other hand, for downloading hot data, wherein users are interested in downloading individual files with different popularities, the role of erasure codes in reducing latency is not yet well understood, and is a topic of active research, see [9, 10, 11].
Besides download latency, an important metric that measures the availability of the stored data is the service capacity region, which is the space of download request rates for which the system is stable. In comparison to download latency, this metric of service capacity has received very little attention. One notable exception is the work of [12], in which the authors study storage allocation strategies to maximize the service rate for downloading entire data.
In this paper, we seek to investigate the effect of redundancy on the service capacity region of the system. To the best of our knowledge, this is the first work to investigate the service capacity for downloading hot data from coded storage. More specifically, we consider a system with files, that are replicated or stored in coded form on servers. Requests to download file arrive at rate . Our first objective is to maximize the set of arrival rates supported by a given coding scheme. Next, we compare service capacity regions for different coding schemes.
Together with replication and maximum distance separable (MDS) codes, we consider an important family of distributed storage codes called availability codes (see [13, 14, 15]). Availability codes enable any codeword symbol to be recovered from multiple, disjoint subsets of other symbols of small size. Amongst availability codes, we focus our attention on the special sub-class, namely simplex codes due to their optimality in rate .
We note that MDS codes are more robust to variations in the access patterns than replication. Availability codes handle skews in popularity better than MDS codes. Surprisingly, hybrid codes, formed by replicating some of the files and adding MDS parity symbols, perform exceptionally well, achieving a large service capacity region.
It is important to note that, even though we focus on the service capacity for content download, the techniques are also applicable for analyzing service capacity for coded computation. For example, suppose some users are interested in computing a matrix vector product , while others are interested in computing . Suppose two worker nodes store matrices and respectively, while the third worker node stores the sum of the two matrices, (assuming that and are of the same size). Then, an excess of requests to compute can be satisfied by using the other two workers.
Organization: In Section II, we describe the problem setup and formally define the service capacity region. In Section III, we motivate the analysis of service capacity by computing the service capacity for several small examples of codes. In Section IV, we focus on systematic MDS codes. We find an outer bound on the service capacity region, and present a greedy algorithm referred to as the waterfilling algorithm. We show that the waterfilling algorithm is optimal, and that it achieves the outer bound for MDS codes of rate smaller than or equal to half. In Section V, we characterize the service capacity of simplex codes. Our proof of the converse uses an interesting connection to graph covering. In Section VI, we consider hybrid codes consisting of replication and MDS parities. For files, we characterize the service capacity of hybrid codes as a function of the number of replicas of the two files and the number of MDS parities.
II Problem Formulation
We have files, of equal size stored redundantly across nodes, labeled through . We refer to the coding scheme encoding files into as an code. Requests to download arrive at rate . Our objective is to determine the set of arrival rates that can be served by the system. We refer to the space of arrival rates that can be served as the capacity region of the system.
As the coding scheme adds redundancy, each file can be recovered in multiple ways. For a file , a subset of nodes (of minimal size) from which the file can be recovered is referred to as a recovering set of . We denote the number of distinct recovering sets of as , and label them as . For example, consider the following code over : . There are four recovering sets for each file. Recovering sets of are given as , , , and . Observe that for a systematic MDS code, there are recovering sets for every file.
We consider the class of scheduling strategies that assign a fraction of requests for a file to each of its recovering sets. Let be the fraction of requests for file that are assigned to its recovering set . Note that . Then, the service capacity region of an coding scheme is defined as follows.
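To make the role of these fractions concrete, the following sketch computes the load that a given split places on each node. The three-node layout (nodes storing a, b, and a+b), the arrival rates, the split, the unit service rate, and all names such as `node_loads` are illustrative assumptions rather than anything taken from the paper.

```python
from fractions import Fraction


def node_loads(rates, recovering_sets, fractions, n):
    """Return the request rate placed on each of n nodes.

    rates[i]           -- arrival rate of file i
    recovering_sets[i] -- recovering sets of file i, each a tuple of node indices
    fractions[i][j]    -- fraction of file-i requests sent to its j-th recovering set
    """
    loads = [Fraction(0)] * n
    for lam, sets, fracs in zip(rates, recovering_sets, fractions):
        assert sum(fracs) == 1, "fractions for a file must sum to one"
        for nodes, frac in zip(sets, fracs):
            for node in nodes:
                # Every node in the chosen recovering set takes part in serving.
                loads[node] += frac * lam
    return loads


# Hypothetical system: node 0 stores a, node 1 stores b, node 2 stores a+b.
sets = [[(0,), (1, 2)],   # recovering sets of a
        [(1,), (0, 2)]]   # recovering sets of b
rates = [Fraction(6, 5), Fraction(2, 5)]
split = [[Fraction(5, 6), Fraction(1, 6)], [Fraction(1), Fraction(0)]]

loads = node_loads(rates, sets, split, n=3)
feasible = all(load <= 1 for load in loads)  # assumed unit service rate per node
```

In this example the systematic node for a is exactly saturated, and the overflow of requests for a is absorbed by the pair of nodes storing b and a+b.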
Definition 1 (Service Capacity Region).
Consider a system storing files over nodes using an code such that a file has recovering sets . Let the service rate of every node be . Then, the service capacity region of such a system is the set of vectors such that, for every , there exist , , satisfying the following:
Note that, given any arrival rates , finding the maximum value of and the allocations such that (1), (2), and (3) hold can be considered as a constrained optimization problem. Specifically, given , the linear program to compute the maximum is described as follows.
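One way to write such a program is sketched here. The notation is assumed for illustration and may differ from the paper's own formulation: $\lambda_{i,j}$ is the rate of requests for file $i$ routed to its $j$-th recovering set $R_{i,j}$ (that is, the assigned fraction times the arrival rate, which keeps the program linear), $t_i$ is the number of recovering sets of file $i$, $\mu$ is the service rate of a node, and the rate of file 1 is maximized while the rates of the other files are held fixed.

```latex
\begin{align*}
\text{maximize}\quad & \sum_{j=1}^{t_1} \lambda_{1,j} \\
\text{subject to}\quad
 & \sum_{j=1}^{t_i} \lambda_{i,j} = \lambda_i, && i = 2, \dots, k, \\
 & \sum_{i=1}^{k} \; \sum_{j \,:\, \ell \in R_{i,j}} \lambda_{i,j} \le \mu, && \ell = 1, \dots, n, \\
 & \lambda_{i,j} \ge 0, && \text{for all } i, j.
\end{align*}
```

The first constraint forces every fixed demand to be fully routed, and the second keeps the total rate seen by each node within its service rate.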
III Examples of Service Capacity Regions
To motivate the analysis, suppose , and we have two files and which are stored on nodes. We compare three storage schemes: uncoded, MDS coded, and a hybrid between repetition and coding shown in Fig. 1.
III-A Repetition Coding
III-B MDS Coding
Next let us find the rate region of the coded system illustrated in Fig. 1. Recall for each given , we want to determine the maximum achievable . We divide the problem into three cases:
Case 1 (): All the requests for file should be assigned to the systematic node . Requests for file can utilize the remaining capacity of this node. We can use this by assigning requests for file to nodes and , and to nodes and . Now nodes and have capacity each remaining, which can be used to serve requests per second. Thus, maximum achievable is
Case 2 ():
Out of , volume of requests are assigned to the systematic node . The remaining traffic is assigned to nodes and from which we can recover file . Thus, the coded nodes and have capacity remaining to serve requests for file . Hence, the maximum achievable is
Case 3 (): The solution to this case is the same as Case 1, with replaced by . Thus the maximum achievable is
Combining these cases, we get the achievable rate region illustrated in blue in Fig. 2.
III-C Hybrid Coding
If file is known to be more popular than , we can have a coded system with nodes storing , , and respectively. This coding scheme is a combination of repetition and erasure coding. We can find the service capacity by dividing the problem into cases, similar to Section III-B. For this system the service capacity region is given by Fig. 2.
IV Systematic MDS Coded Systems
In this section we find the service capacity region of a system of servers that store files together with parity files that are generated using an MDS code. Each of the original and redundant files are distributed across all servers. Each file can be downloaded from the server storing it, which we refer to as the systematic server for the file, or by accessing any of the remaining servers.
Let the arrival rate of requests for file be denoted by . We want to determine the set of arrival rate vectors that can be supported by the system.
IV-A Outer Bound on the Rate Region
First we find an outer bound of the service capacity region.
Theorem 1 (Outer Bound).
The set of all achievable request vectors lies inside the region described by
where the notation .
Each server in the system can support volume of requests, and thus the total capacity is . We now determine the total system capacity utilized by file download requests, and ensure that it is less than . Downloading a file from coded servers requires downloading data of size times the file size. If is the rate of request arrivals for file , the minimum system capacity utilized by these requests is . Since the total system capacity is , the sum of the capacity utilized by all requests must be less than . Thus we have (9). ∎
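The counting argument in this proof can be sketched as a short membership test. The sketch assumes the reading suggested by the proof: with `k` files, `n` servers, and per-server service rate `mu`, the part of each file's rate up to `mu` costs one unit of capacity per request at the systematic server, and any excess costs `k` units per request. The function name and parameters are hypothetical.

```python
def within_outer_bound(rates, n, mu=1.0):
    """Check whether a rate vector passes the capacity-counting outer bound.

    rates -- arrival rate of each of the k files
    n     -- total number of servers
    mu    -- service rate of every server (assumed identical)
    """
    k = len(rates)
    used = 0.0
    for lam in rates:
        systematic = min(lam, mu)      # served at unit cost by the systematic server
        excess = max(lam - mu, 0.0)    # must be decoded from k other servers
        used += systematic + k * excess
    # Total capacity consumed cannot exceed the capacity of all n servers.
    return used <= n * mu
```

For example, with two files on four unit-rate servers, the vector (2, 1) sits exactly on the bound, while (3, 0) falls outside it.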
IV-B : Achievable Region Matches Outer Bound
We seek to find a strategy to split the download requests across the servers such that the set of feasible arrival rates matches, or comes close to the outer bound given by Theorem 1. We now propose a water-filling algorithm to schedule requests to servers on a coded system. Then we prove the optimality of this algorithm by considering two cases: 1) (the code rate ), and 2) (the code rate ).
Definition 2 (Waterfilling Algorithm).
Given arrival rates , , … for the files, the water-filling algorithm assigns them to the nodes as follows.
Let be the load on node for . Assign requests to their respective systematic nodes until these nodes are saturated. Set for .
Each of the remaining requests can be served by any unsaturated servers.
While and do the following:
Find the least-loaded servers in the system, that is, the servers with minimum ’s.
From , send an infinitesimally small rate to each of these servers. Decrement by , and increment the corresponding ’s by .
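The steps above can be sketched as a discretized simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: nodes `0..k-1` are taken to be the systematic nodes of files `0..k-1`, any `k` unsaturated nodes can serve a residual request (the MDS property), files are processed in index order, and the infinitesimal rate is replaced by a finite step `eps`. Exact `Fraction` arithmetic avoids rounding issues at saturation.

```python
from fractions import Fraction


def waterfill(rates, n, mu=Fraction(1), eps=Fraction(1, 100)):
    """Return (served rate per file, load per node) after water-filling."""
    k = len(rates)
    loads = [Fraction(0)] * n
    served, residual = [], []

    # Step 1: fill each file's systematic node up to its service rate.
    for i, lam in enumerate(rates):
        lam = Fraction(lam)
        direct = min(lam, mu)
        loads[i] = direct
        served.append(direct)
        residual.append(lam - direct)

    # Step 2: spread each residual over the k least-loaded unsaturated nodes.
    for i in range(k):
        while residual[i] > 0:
            open_nodes = sorted(
                (j for j in range(n) if loads[j] < mu),
                key=lambda j: (loads[j], j),
            )
            if len(open_nodes) < k:
                break  # fewer than k unsaturated nodes: nothing more can be served
            chosen = open_nodes[:k]
            # Never push a chosen node past mu or serve more than is left.
            step = min(eps, residual[i], mu - max(loads[j] for j in chosen))
            for j in chosen:
                loads[j] += step
            residual[i] -= step
            served[i] += step
    return served, loads
```

As a usage example, `waterfill([3, 0], n=4)` serves rate 5/2 of the first file and leaves all four nodes saturated: rate 1 goes to the systematic node and the remaining 3/2 consumes the capacity of the other three nodes at two units per request.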
For , the proposed water-filling algorithm is optimal. The set of feasible arrival rates spans the whole region inside the outer bound given in Theorem 1.
For let us evaluate the set of feasible arrival rates using this waterfilling algorithm.
Without loss of generality, sort the arrival rates in descending order such that . After sending requests to systematic servers until they are saturated, the total residual arrival rate is , as illustrated in Fig. 3 for the MDS coded system. Assume that . If this is not true, then and all requests can be served by systematic servers.
The algorithm first uniformly splits requests over servers, . Then, for every , it uniformly splits requests over servers, . Using this water-filling algorithm, the maximum rate of requests that can be supported using coded servers is
In Fig. 3, the height of each patterned fill in the rightmost column, starting from the bottom upwards, corresponds to each term in the above summation.
IV-C : Waterfilling is optimal
Next let us consider the second case . For this case, we cannot always achieve the same rate region as given by the outer bound. However, we can show that the waterfilling algorithm is optimal, and no other rate splitting scheme can yield a strictly larger service capacity region. This result follows from the two lemmas below.
It is optimal to first send requests to their systematic node. Only when the systematic node is saturated, requests should be served using coded servers.
For , we show that not utilizing the systematic node can only add load to the system, and thus reduce its service capacity region. Suppose for some , that is all requests for can be served by the systematic node. Instead, suppose we serve rate using the systematic node , and send the remaining portion to other servers, and decode file from the coded versions. As a result we are reducing the load on the systematic node by , and instead adding load to other servers. If , at least one of these servers is also a systematic node, which stores file . Thus, the maximum rate of requests for file that can be served by its systematic node reduces by .
For , we showed in Theorem 2 that the water-filling algorithm, which first sends requests to the systematic node is optimal. Thus, there is no loss of optimality in sending requests to the systematic node until it is saturated.
After the systematic node is saturated, it is optimal to always send each request to the least-loaded servers that can serve it.
For each rate of requests in , we pick servers that will serve it. By using any algorithm for picking the servers, we will reach one of possible states:
unsaturated servers with the same load . Then we can split a maximum of request rate uniformly over these servers. As a result all servers will be saturated, and the outer bound will be achieved.
There are exactly unsaturated servers in the system with loads , where at least one of these inequalities is strict. Then the additional rate we can serve is . This would leave a non-zero amount of capacity unused.
Since it always sends requests to the least-loaded nodes in the system, the water-filling algorithm always achieves the first state when it is feasible. And if the system ends up in the second state, water-filling minimizes . ∎
V Binary Simplex Coded Systems
Simplex codes are an important subclass of availability codes. When files are encoded with a binary simplex code, must hold and a particular file can be recovered from (availability) disjoint groups of two (locality) servers. As an example, a simplex code encodes three files into seven as . This code has availability three, e.g., file can be repaired from either and or and , or and .
Each file can be recovered from its systematic or any of its repair groups. Therefore, the request for each file can be served at rate when the requests for all other files are zero.
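The systematic node and the size-two repair groups of a file can be enumerated mechanically. The sketch below assumes a bitmask representation that is not part of the paper: each server is a nonzero `k`-bit integer whose set bits indicate which files are XORed together on that server, so the systematic server of file `i` is `1 << i`. Only recovering sets of size one or two are listed, matching the repair groups considered here.

```python
from itertools import combinations


def simplex_nodes(k):
    """All nonzero k-bit masks; bit i set means file i is part of the stored XOR."""
    return list(range(1, 2 ** k))


def recovering_sets(k, file_index):
    """Systematic node plus all pairs of nodes whose XOR equals the file."""
    target = 1 << file_index
    sets = [(target,)]  # the systematic server
    for u, v in combinations(simplex_nodes(k), 2):
        if u ^ v == target:
            sets.append((u, v))  # a repair group of two servers
    return sets
```

For `k = 3`, `recovering_sets(3, 0)` yields the systematic server and three disjoint pairs, consistent with availability three.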
The maximum sum of arrival rates that can be served by a Simplex system is .
Fig. 4 shows the graph representation (i.e., the Fano plane) of the Simplex code for . Vertices correspond to servers, and the file stored on each server is indicated by its label. Each edge corresponds to service of a particular file from a systematic server (self-loops on a server) or a repair group (edges between two servers). Recall that repairing a file from one of its repair groups requires accessing two servers, hence supplying a unit of service rate from a repair group consumes twice the capacity of supplying it from a systematic server. Service rates are shown with label ’s on each edge such that denotes the file that is served and is an index to differentiate between the edges that serve the same file. The sum of the service rates supplied from edges that share the same vertex cannot be greater than .
Firstly consider system. Total service rate supplied by the system is
which is the sum of the service rates supplied by all the edges in graph. Edges with label and are attached to vertex , hence . Similarly, . Thus
Secondly consider system. Total service rate supplied by the system is
All edges in the graph are covered by the edges attached to vertices , , and . Therefore we can conclude
In general, total number of edges in the graph of a Simplex code for can be written as
where number of vertices .
Simplex is a binary linear code with generator matrix consisting of all size- bit vectors up to but not including vector of all ones. For instance, for , the generator matrix is
Every vertex in the graph of a Simplex code can be associated with the corresponding bit vector. For instance for , , and . Ignoring the loops on systematic vertices, there is an edge between two vertices if and only if corresponding bit vectors differ in a single bit (so that a symbol can be repaired from the two vertices). Then, graph of any Simplex code is a bipartite graph such that vertices that correspond to bit vectors with even number of ones can be separated from those with odd number of ones. All edges (excluding the loops) are covered by either one of the partitions. To cover also the loops, we need to pick the partition that includes the systematic vertices (i.e., bit vectors with a single one). In this chosen partition, every vertex has edges attached and no two vertices share any edge, therefore, number of vertices in the partition is .
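The covering argument can be checked numerically for small cases. The sketch below builds the graph under the same assumed bitmask representation (vertices are nonzero `k`-bit integers, an edge joins two vertices that differ in exactly one bit, and each single-bit vertex carries a loop) and tests whether the odd-weight vertices touch every edge. Function names are hypothetical.

```python
def simplex_graph(k):
    """Return (vertices, edges); a loop is stored as (v, v)."""
    vertices = list(range(1, 2 ** k))
    edges = []
    for i in range(k):
        unit = 1 << i
        edges.append((unit, unit))  # loop: file i served by its systematic server
        for v in vertices:
            w = v ^ unit
            if w != 0 and v < w:
                edges.append((v, w))  # repair group {v, w} for file i
    return vertices, edges


def odd_weight_cover(k):
    """Return (cover, covers_all_edges, number_of_edges)."""
    vertices, edges = simplex_graph(k)
    cover = {v for v in vertices if bin(v).count("1") % 2 == 1}
    covers_all = all(u in cover or w in cover for u, w in edges)
    return cover, covers_all, len(edges)
```

For `k = 3` the cover consists of the three systematic vertices and the all-ones vertex, four vertices in total, and it touches all twelve edges of the graph.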
Overall, for any , there exists a set of vertices that cover all the edges in the graph. Then the total service rate that can be supplied by the system can be bounded as
Using Lemma 1, we show that capacity region of an Simplex system is the simplex geometry in .
Simplex system can serve arrival rates if and only if .
If every server (systematic or not) dedicates the fraction of its capacity solely to serving requests for file , then the part of the system dedicated to acts as a binary simplex code on servers, each with capacity serving exclusively requests for file , giving the supplied service rate of . By construction, inequality always has to hold, and thus every achievable service rate tuple can be realized by the corresponding choice of fraction tuple . This observation together with Lemma 1 shows that achievable capacity region of the system is a simplex in . ∎
VI Effect of Adding Systematic Nodes
Suppose we have files and stored across a storage system of cache nodes. Denote the arrival rates of requests for and as and , respectively. In all that follows, we assume any coded nodes or a coded node and systematic node may recover file and file . The service capacity region will be denoted by . Moreover, is the number of systematic nodes for file , is the number of systematic nodes for file , and is the number of coded nodes. In this section, we identify the service capacity region of such storage systems.
Let denote the maximum demand for that can be supported by a given storage system. Thus, there exists some splitting strategy for requests to the storage system that handles demand for file . For every this guaranteed splitting strategy also supports demand . Also, given expected wait time for each of the nodes, any demand cannot be supported by the storage system. In this way, given any fixed demand , the set of all of supported is a non-empty, closed subset of that is bounded above by . Thus, there is a maximum such , with in the service capacity region of the storage system. Define to be this maximum supported at given . With this definition, the function
is well-defined. The storage system’s service capacity region can be described as the subset of that is bounded by , , , and . For convenience, further denote .
If and there are coded nodes, then is the region bounded by and . If there are coded nodes and no systematic nodes, then the service capacity region is the point
Since each node can support rate of arrivals and recovering either file requires the use of two coded nodes, and . If , no file can be recovered. Suppose and label the nodes . For each , pair node with node . Note that each node is in two pairs. Requests for file may be evenly divided among the pairs, and requests for file can utilize the remaining capacity of each node, with half of the node’s remaining capacity devoted to each of the two pairs the node is in. Thus, the maximum achievable is
Note that the conclusion of Lemma 2 could be expressed as and . Also, . In this case, the boundary is redundant.
Given a storage system with nodes:
Case 1: If and a systematic node is added for file , then the service capacity region has -bound
Case 2: If and a systematic node is added for file , then the service capacity region has -bound
Case 3: If and a systematic node is added for file , then the service capacity region has -bound
Case 4: If and a systematic node is added for file , then the service capacity region has -bound
Case 1: First, consider when .
Subcase 1 ( is odd): Since , requests for file may be divided evenly among the systematic nodes for file . Since a systematic node and coded node can recover both files and , every systematic node for file can be paired with a coded node and requests for file can utilize the remaining capacity . Thus, coded nodes now have capacity . From Lemma 2, we know they can support requests for file . We can also pair off the remaining coded nodes, the number of which is even since is odd. They can be utilized at their full capacity . Note, the systematic nodes for file can support a rate of arrivals for file . Thus, the maximum achievable is
Subcase 2 ( is even): Since , requests for file may be divided evenly among of the systematic nodes for file . As above, every systematic node for file can be paired with a coded node, and each pair can support requests for file . Note, one of the systematic nodes for file has not received any requests. We can form a triple with this systematic node and two coded nodes to serve requests for file . Also, the systematic nodes for file can support a rate of arrivals for file . Thus, the maximum achievable is the same as Subcase 1.
Now, if , then we may send requests for file to the systematic node for file that was added to the system. Then the system of available nodes reduces to the previous system, . A similar argument may be used to prove Cases 2, 3, and 4.
Note that in Lemma 3, the region has , so it would be equivalent in Case 1 to specify on as . In this way, when the addition of a systematic node for file adds a “bonus” region beyond the right-shift by that is seen both in Case 2 and when such a node is added to an uncoded system. A similar “bonus” region is added in Case 3. Figure 5 pictures the resulting rate region for systems with .
The service capacity region is bounded by , and
Let if and otherwise. Let if and otherwise. Similarly, let if and otherwise and let if and otherwise. Consider building a coded storage system by first adding coded nodes. Then by Lemma 2, the service capacity region is bounded by , , and where .
We will then add systematic nodes for file . Repeatedly applying Lemma 3 Case 1 yields
If , then we may continue to add systematic nodes for file by applying Lemma 3 Case 2. This yields
Thus we have
Applying Lemma 3 Case 3 to add systematic nodes for file yields
If , we may continue to add systematic nodes for file by applying Lemma 3 Case 4. This raises the boundary of the service capacity region by in the direction, giving the desired result. ∎
VII Concluding Remarks
Popular content files are generally replicated at multiple nodes in order to support a larger volume of access requests. For large files, it can be slow and expensive to dynamically add replicas to adjust to changes in popularity.
In this paper, we consider an erasure-coded system where some nodes store coded combinations of multiple content files. We determine the service capacity region, or the maximum rate of access requests that can be served by this system. Our results indicate that for the same amount of redundancy, adding coded nodes instead of replicas provides more robustness to changes in content popularity. We determine the capacity region for commonly used codes like MDS and Simplex codes. Comparison of the regions sheds light on designing the erasure code to maximize service capacity.
To the best of our knowledge, this is the first work to analyze the service capacity of coded storage systems. There are many questions open for future research. While we have studied the service capacity regions of commonly used codes, developing a general theory to optimally split requests, and design codes that maximize the capacity region remains an open problem. Also, in this paper we consider sending a request to one of the repair groups. Redundantly assigning requests to multiple groups and waiting for any one copy may increase service capacity, as shown in  for task replication in computing.
Part of this research is based upon work supported by the National Science Foundation under Grants No. CIF-1717314, and No. DMS-1439786 while some authors were in residence at the Institute for Computational and Experimental Research in Mathematics in Providence, RI, during the Women in Data Science and Mathematics Research Collaboration Workshop at ICERM in July 2017.
- A. G. Dimakis, P. B. Godfrey, M. Wainwright, and K. Ramchandran, “Network Coding for Distributed Storage Systems,” IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010.
- A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, “A Survey on Network Codes for Distributed Storage,” Proceedings of the IEEE, vol. 99, no. 3, pp. 476–489, Mar. 2011.
- C. Huang, M. Chen, and J. Li, “Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems,” in Network Computing and Applications, 2007. NCA 2007. Sixth IEEE International Symposium on, July 2007, pp. 79–86.
- P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the locality of codeword symbols,” Information Theory, IEEE Transactions on, vol. 58, no. 11, pp. 6925–6934, Nov 2012.
- G. Joshi, Y. Liu, and E. Soljanin, “Coding for fast content download,” in Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on. IEEE, 2012, pp. 326–333.
- N. B. Shah, K. Lee, and K. Ramchandran, “The MDS queue: Analysing the latency performance of erasure codes,” in 2014 IEEE International Symposium on Information Theory (ISIT’14), pp. 861–865.
- G. Liang and U. C. Kozat, “Fast cloud: Pushing the envelope on delay performance of cloud storage with coding,” Networking, IEEE/ACM Transactions on, vol. 22, no. 6, pp. 2012–2025, 2014.
- K. Gardner, S. Zbarsky, S. Doroudi, M. Harchol-Balter, and E. Hyytia, “Reducing latency via redundant requests: Exact analysis,” ACM SIGMETRICS Performance Evaluation Review, vol. 43, no. 1, pp. 347–360, 2015.
- S. Kadhe, E. Soljanin, and A. Sprintson, “Analyzing download time for availability codes,” in Information Theory Proceedings (ISIT), 2015 IEEE International Symposium on, July 2015.
- ——, “When do the availability codes make the stored data more available?” in 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Sept 2015, pp. 956–963.
- M. F. Aktas, E. Najm, and E. Soljanin, “Simplex queues for hot-data download,” in Proceedings of the 2017 ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems. ACM, 2017, pp. 35–36.
- M. Noori, E. Soljanin, and M. Ardakani, “On storage allocation for maximum service rate in distributed storage systems,” in 2016 IEEE International Symposium on Information Theory (ISIT), July 2016, pp. 240–244.
- A. Wang and Z. Zhang, “Repair locality with multiple erasure tolerance,” Information Theory, IEEE Transactions on, vol. 60, no. 11, pp. 6979–6987, Nov 2014.
- A. Rawat, D. Papailiopoulos, A. Dimakis, and S. Vishwanath, “Locality and availability in distributed storage,” in Information Theory (ISIT), 2014 IEEE International Symposium on, June 2014, pp. 681–685.
- I. Tamo and A. Barg, “Bounds on locally recoverable codes with multiple recovering sets,” in Information Theory (ISIT), 2014 IEEE International Symposium on, June 2014, pp. 691–695.
- V. Cadambe and A. Mazumdar, “Bounds on the size of locally recoverable codes,” IEEE Transactions on Information Theory, vol. 61, no. 11, pp. 5787–5794, Nov 2015.
- E. W. Weisstein, “Fano plane. From MathWorld—A Wolfram Web Resource.” [Online]. Available: http://mathworld.wolfram.com/FanoPlane.html
- G. Joshi, “Boosting service capacity via adaptive replication,” in Proceedings of ACM/IFIP Performance, Nov. 2017.