Repairable Replication-based Storage Systems Using Resolvable Designs
We consider the design of regenerating codes for distributed storage systems at the minimum bandwidth regeneration (MBR) point. The codes allow for a repair process that is exact and uncoded, but table-based. These codes were introduced in prior work and consist of an outer MDS code followed by an inner fractional repetition (FR) code where copies of the coded symbols are placed on the storage nodes. The main challenge in this domain is the design of the inner FR code.
In our work, we consider generalizations of FR codes, by establishing their connection with a family of combinatorial structures known as resolvable designs. Our constructions based on affine geometries, Hadamard designs and mutually orthogonal Latin squares allow the design of systems where a new node can be exactly regenerated by downloading one or more packets from each node in a subset of the surviving nodes (prior work only considered the case of a single packet per node). Our techniques allow the design of systems over a large range of parameters. Specifically, the repetition degree of a symbol, which dictates the resilience of the system, can be varied over a large range in a simple manner. Moreover, the actual table needed for the repair can also be implemented in a rather straightforward way. Furthermore, we answer an open question posed in prior work by demonstrating the existence of codes with parameters that are not covered by Steiner systems.
Large scale data storage systems have become ubiquitous in recent years. The availability of low cost storage media such as magnetic disks has fueled the growth of various applications such as Facebook, YouTube, etc. These applications require a massive amount of data to be stored and accessed in a distributed manner. In these systems it is often the case that the individual storage nodes are unreliable. Thus, the integrity of the data and the speed of the data access need to be maintained even in the presence of such unreliable storage nodes. This issue is typically handled by introducing redundancy in the storage system. For instance, one could replicate data across multiple nodes or use Maximum Distance Separable (MDS) codes such as Reed-Solomon codes that allow for better reliability at the same redundancy.
However, the large scale distributed nature of the systems under consideration introduces another issue. Namely, if a given storage node fails, it needs to be regenerated so that the new system continues to have the properties of the original system. It is of course desirable to perform this regeneration in a distributed manner and download as little data as possible from the existing nodes. The problem of regenerating codes was introduced by Dimakis et al. [1]. The authors demonstrated a fundamental tradeoff between the amount of data stored at each node (storage capacity) and the amount of data that needs to be downloaded for regenerating a failed node (repair bandwidth).
In particular, consider a distributed storage system (DSS) that consists of $n$ storage nodes, each of which stores $\alpha$ packets. A given user needs to have the ability to reconstruct the stored file by contacting any $k$ nodes; this is referred to as the MDS property of the system. Suppose that a given node fails. The DSS needs to be repaired by introducing a new node. This node should be able to contact any $d$ surviving nodes and download $\beta$ packets from each of them, for a total repair bandwidth of $\gamma = d\beta$ packets. The new DSS should continue to have the MDS property. The work of [1] considered the case of functional repair, where the new node needs to be functionally equivalent to the failed node. It was shown that this could be achieved by the usage of random network coding. In particular, under functional repair the entire storage vs. repair bandwidth curve is known exactly. One can also consider exact repair where the new node should be able to recreate the contents of the failed node (see [2, 3]). Two points on the curve deserve special mention and are arguably of most interest from a practical perspective. The minimum bandwidth regenerating (MBR) point refers to the point where the repair bandwidth, $\gamma$, is minimum. Likewise, the minimum storage regenerating (MSR) point refers to the point where the storage per node, $\alpha$, is minimized.
Much of the existing work in the area of DSS considers coded repair where the surviving nodes need to compute linear combinations of all their existing packets. It is well recognized that the read/write bandwidth of machines is much lower than the network bandwidth. Thus, this process induces undesirable latencies in the repair process. The process can also be potentially memory intensive if the packets comprising the file are large. Motivated by these issues, in [4], El Rouayheb and Ramchandran considered the following variant of the DSS problem. The DSS needs to satisfy the property of exact and uncoded repair, i.e., the regenerating node needs to produce an exact copy of the failed node by simply downloading packets from the surviving nodes. This allows the entire system to work without requiring any computation at the surviving nodes. In addition they considered systems that are resilient to multiple failures. However, the DSS only has the property that the repair can be conducted by contacting some set of $d$ nodes, i.e., unlike the original setup, repair is not guaranteed by contacting any set of $d$ nodes. This is reasonable as most practical systems operate via a table-based repair, where the new node is provided information on the set of surviving nodes that it needs to contact. The work of [4] proposed a construction whereby an outer MDS code is concatenated with an inner “fractional repetition” code of a certain degree. The main challenge here is to design the inner fractional repetition code in a systematic manner.
The work of [4] primarily considered fractional repetition (FR) codes that result from Steiner systems, which are an instance of a combinatorial design. Subsequently, Koo and Gill [5] considered the usage of finite projective planes for the design of these codes. Both [4] and [5] consider fractional repetition codes where the new node downloads exactly one packet (i.e., $\beta = 1$) from each of the surviving nodes that are contacted. In this work we study the design of fractional repetition codes in more generality.
I-A Main Contributions
In this work we consider REPairable REplication-based Storage Systems Using REsolvable Designs, abbreviated as REPRESSURED codes. REPRESSURED codes are more general than fractional repetition codes as the new node has the flexibility of downloading $\beta \geq 1$ packets from each of the surviving nodes that it contacts. Our design is based on combinatorial structures called resolvable designs [6]. Our work makes the following contributions.
- Our constructions based on affine geometries and Hadamard designs allow for a large class of codes where $\beta > 1$, i.e., where the new node downloads more than one packet from each node that it contacts.
- The work of [4] considers Steiner systems where parameters such as the repetition degree of each packet are fixed a priori. In contrast, our code design allows the system designer to vary the repetition degree within a large range in a simple manner.
- We resolve an open question posed in [4] by showing the existence of FR codes with a repetition degree greater than two that cannot be constructed from Steiner systems.
- The systems under consideration require table-based repair, whereby a table of nodes that need to be contacted under the various failure patterns needs to be maintained. As will be evident, our code design approach is such that this table can be maintained in a very simple manner.
This paper is organized as follows. Section II contains a formal discussion of the problem formulation. Section III and Section IV discuss the design of FR codes from resolvable designs and Latin squares respectively. We conclude the paper with a comparison with existing work and discussion of future issues in Section V.
II Problem Formulation
The DSS is specified by parameters $(n, k, d)$, where $n$ is the number of storage nodes, $k$ is the number of nodes to be contacted for recovering the file and $d$ is the number of nodes to be contacted in order to regenerate a failed node. The storage capacity of each node is denoted by $\alpha$. In case of repair, the new node downloads $\beta$ packets from each surviving node that it contacts, for a total of $\gamma = d\beta$ packets. Let $\mathcal{M}$ denote the size of the file being stored on the DSS. Under functional repair, it is known that at the MBR point, $\gamma = \alpha$.
We consider the design of fractional repetition codes that are best explained by means of the following example  with in the discussion below.
Consider a file of packets that needs to be stored on the DSS. We use an MDS code that outputs packets and . The coded packets are placed on storage nodes as shown in Fig. 1. This placement specifies the inner fractional repetition code. It can be observed that each is repeated times and the total number of symbols . Any user who contacts any nodes can recover the file (using the MDS property). Moreover, it can be verified that if a node fails, one packet each can be downloaded from the four surviving nodes, i.e., $\beta = 1$ and $d = 4$, so that $\gamma = d\beta = 4$.
Thus, the approach uses an MDS code to encode a file consisting of a certain number of symbols. Let $\theta$ denote the number of encoded symbols. Copies of these symbols are placed on the $n$ nodes such that each symbol is repeated $\rho$ times and each node contains $\alpha$ symbols. Moreover, if a given node fails, it can be exactly recovered by downloading $\beta$ packets from each node in some set of $d$ surviving nodes, for a total repair bandwidth of $\gamma = d\beta$. It is to be noted that in this case $\alpha = d\beta$, i.e., these schemes operate at the MBR point. In the example above, $\beta = 1$, so that $\alpha = d$. However, one can consider systems with $\beta > 1$ in general.
In this work we propose the construction of several fractional repetition codes. Before introducing the formal definition of a fractional repetition (FR) code we need the notion of $\beta$-recoverability. Let $[n]$ denote the set $\{1, 2, \dots, n\}$.
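Definition 1. Let $\Omega = [\theta]$ and let $V_i$, $i = 1, \dots, d$, be subsets of $\Omega$. Let $V = \{V_1, \dots, V_d\}$ and consider $A \subset \Omega$ with $|A| = d\beta$. We say that $A$ is $\beta$-recoverable from $V$ if there exists $B_i \subseteq V_i$ for each $i = 1, \dots, d$ such that $|B_i| = \beta$ and $\cup_{i=1}^{d} B_i = A$.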
A fractional repetition (FR) code for a DSS (where ) with repetition degree and normalized repair bandwidth ( and are positive integers) is a set of subsets of a symbol set with the following properties.
The cardinality of each is .
Each element of belongs to sets in .
Let denote any sized subset of and . Each is -recoverable from some -sized subset of .
The value of is a measure of the resilience of the system to node failures. The code rate is defined as
It can be observed that corresponds to the maximum filesize that can be obtained with a certain value of . We remark that in , only FR codes with were studied. In this case the requirement (c) in Definition 2 is automatically satisfied and it can be seen that the system is resilient to failures.
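As an illustration of how the code rate can be evaluated, the following is a minimal sketch. It assumes the usual definition from the FR-code literature, namely that the rate for a given $k$ is the smallest number of distinct symbols stored by any $k$ nodes. The placement used here (symbols are the edges of the complete graph on five vertices, and each node stores the edges incident to one vertex) is an illustrative assumption and is not taken from Fig. 1; the function name `code_rate` is hypothetical.

```python
from itertools import combinations


def code_rate(nodes, k):
    """Smallest number of distinct symbols held by any k of the nodes."""
    return min(len(set().union(*group)) for group in combinations(nodes, k))


# Illustrative placement (an assumption): the 10 symbols are the edges of the
# complete graph on 5 vertices; node v stores the 4 edges incident to v.
edges = list(combinations(range(5), 2))
nodes = [frozenset(e for e in edges if v in e) for v in range(5)]

# Every symbol is stored on exactly two nodes (repetition degree 2).
repetition = {e: sum(e in node for node in nodes) for e in edges}
print(sorted(set(repetition.values())))            # [2]

# Rate for k = 1, ..., 4 contacted nodes.
print([code_rate(nodes, k) for k in range(1, 5)])  # [4, 7, 9, 10]
```

The brute-force search over all $k$-subsets is only practical for small systems, but it is a convenient check for the constructions discussed later.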
Our proposed constructions aim to maximize the file size given the parameters while ensuring a certain level of failure resilience. (Footnote 1: It can be seen that these further specify . Furthermore, it can be seen that .) In the case of MBR constructions the maximum filesize under the Dimakis et al. model is known to be . Accordingly, we call a FR code universally good if the code rate ( used this terminology when ). As will be evident, all our constructions in this work are universally good.
III REPRESSURED codes from Affine Resolvable Designs
We now discuss the construction of FR codes from resolvable designs. As we shall see, this construction allows us to easily vary the repetition degree $\rho$ and the normalized repair bandwidth $\beta$.
Let where be a FR code. A subset is said to be a parallel class if for and with we have and . A partition of into parallel classes is called a resolution. If there exists at least one resolution then the code is called a resolvable fractional repetition code.
The properties of a resolvable FR code are best illustrated by means of the following example.
Consider a DSS construction with parameters and . Suppose that we arrange the symbols in in a array shown below.
Let the rows and the columns of form the nodes in the FR code (see Fig. 2), thus . It is evident that there are two parallel classes in , (corresponding to rows) and (corresponding to columns). As , this code can tolerate one failure.
It can be observed that for and , we have . Using this we can compute the code rate when , as follows. Let with . Then, the number of distinct symbols in a set of nodes from is
where nodes are from and nodes are from . This is minimized when . Thus, and it can be seen that the construction is universally good.
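The counting argument above can be checked numerically. The sketch below assumes a $t \times t$ symbol array with $t = 3$ (the array size is an assumption, since the array itself is not reproduced here) and takes $\alpha = t$ and $\beta = 1$. It compares a brute-force count of distinct symbols with the closed form $tk - a(k - a)$ for $a$ row nodes and $k - a$ column nodes, and with the bound $k\alpha - \binom{k}{2}\beta$. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def grid_fr_code(t):
    """Rows and columns of a t x t symbol array, as two parallel classes."""
    array = [[r * t + c for c in range(t)] for r in range(t)]
    rows = [frozenset(array[r]) for r in range(t)]
    cols = [frozenset(array[r][c] for r in range(t)) for c in range(t)]
    return rows, cols


def min_distinct(nodes, k):
    """Smallest number of distinct symbols over all k-node subsets."""
    return min(len(set().union(*g)) for g in combinations(nodes, k))


t = 3                      # assumed array size
rows, cols = grid_fr_code(t)
nodes = rows + cols
alpha, beta = t, 1

for k in range(1, t + 1):
    # a row nodes and (k - a) column nodes share a * (k - a) symbols
    closed_form = min(t * k - a * (k - a) for a in range(k + 1))
    bound = k * alpha - comb(k, 2) * beta
    print(k, min_distinct(nodes, k), closed_form, bound)
```

For every $k \le t$ the brute-force minimum agrees with the closed form, which is attained when the $k$ nodes are split as evenly as possible between rows and columns.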
It can be seen that given a resolvable FR code with parallel classes, one can obtain a resolvable FR code with repetition degree , simply by choosing the node set in to be any distinct parallel classes from . Moreover, the recovery process when at most nodes are in failure and is also quite simple. Specifically, it is clear that upon node failures, there is at least one parallel class in that remains intact. As all symbols from are represented in any parallel class, any failed node can be regenerated by contacting the nodes in the remaining class.
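The repair rule described above (find a parallel class with no failed node and read the lost symbols from it) can be sketched as follows. The data layout, with each parallel class stored as a dictionary from a node name to its symbol set, and the function name `repair_plan` are assumptions made for this illustration.

```python
def repair_plan(classes, failed_names, target):
    """Return {helper_name: symbols_to_send} for rebuilding node `target`.

    classes      -- list of parallel classes, each a dict {node_name: symbol_set}
    failed_names -- set of names of all nodes currently in failure
    target       -- name of the failed node to regenerate
    """
    lost = next(cls[target] for cls in classes if target in cls)
    for cls in classes:
        # A class is intact if none of its node names is in the failed set.
        if failed_names.isdisjoint(cls):
            return {name: sorted(symbols & lost) for name, symbols in cls.items()}
    raise ValueError("no intact parallel class: too many failures")


# Two parallel classes from a 3 x 3 symbol array (rows and columns).
t = 3
rows = {f"row{r}": {r * t + c for c in range(t)} for r in range(t)}
cols = {f"col{c}": {r * t + c for r in range(t)} for c in range(t)}

plan = repair_plan([rows, cols], {"row0"}, "row0")
print(plan)  # {'col0': [0], 'col1': [1], 'col2': [2]}
```

Because every symbol appears exactly once in each parallel class, the symbols returned by the helpers always cover the lost node, so the repair table only needs to record which class is intact.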
We now present explicit constructions of resolvable FR codes by leveraging the properties of combinatorial designs. For an in-depth discussion of combinatorial designs, see [6].
A $(v, k, \lambda)$ balanced incomplete block design (BIBD) is a pair $(V, \mathcal{B})$, where $V$ is a $v$-element set and $\mathcal{B}$ is a collection of $k$-subsets of $V$, called blocks, such that every element of $V$ is contained in exactly $r$ blocks and every $2$-subset of $V$ is contained in exactly $\lambda$ blocks. (Here $k$ denotes the block size of the design.)
Let $b$ denote the number of blocks. It can be shown that for a $(v, k, \lambda)$-BIBD, the following relations hold.
$$bk = vr, \tag{1}$$
$$r(k - 1) = \lambda(v - 1). \tag{2}$$
It can be observed that a BIBD is essentially a FR code, with the additional property that every $2$-subset of $V$ is contained in exactly $\lambda$ blocks. Likewise we can define a resolvable $(v, k, \lambda)$-BIBD (analogous to a resolvable FR code) and the notions of a parallel class and resolution. Namely, a parallel class is a subset of disjoint blocks from $\mathcal{B}$ whose union is $V$, and a partition of $\mathcal{B}$ into $r$ parallel classes is a resolution.
A Steiner system $S(t, k, v)$ is a set $V$ of $v$ elements and a collection of subsets of $V$ of size $k$, called blocks, such that any $t$-subset of the symbol set appears in exactly one of the blocks.
It can be seen that a Steiner system $S(2, k, v)$ is a $(v, k, \lambda)$-BIBD where $\lambda = 1$.
Bose’s Inequality [7]. Suppose that there exists a resolvable $(v, k, \lambda)$-BIBD. Then, $b \geq v + r - 1$.
Within the class of resolvable designs we will primarily be interested in the class of affine resolvable designs, for which $b = v + r - 1$.
III-A Affine geometry based constructions
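First, we will discuss the construction of a resolvable $(q^2, q, 1)$-BIBD. This is also known as the affine plane of order $q$.
We can explicitly construct affine planes when $q$ is a prime power. Let $q$ be a prime power and let $\mathbb{F}_q$ denote the finite field of order $q$. We define the symbol set $V = \mathbb{F}_q \times \mathbb{F}_q$. For any $a, b \in \mathbb{F}_q$, define a block $B_{a,b} = \{(x, y) \in V : y = ax + b\}$. For any $c \in \mathbb{F}_q$, define $B_{\infty,c} = \{(c, y) : y \in \mathbb{F}_q\}$. So there are $q^2 + q$ blocks, which we can partition into $q + 1$ parallel classes, each of size $q$. Specifically, for each fixed $a \in \mathbb{F}_q$ the set $\{B_{a,b} : b \in \mathbb{F}_q\}$ forms a parallel class, which gives $q$ parallel classes, and the last parallel class is given by $\{B_{\infty,c} : c \in \mathbb{F}_q\}$.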
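The construction above can be sketched in a few lines. This sketch is restricted to prime $q$ so that arithmetic modulo $q$ is field arithmetic; a prime power that is not prime would need a proper finite-field implementation, which is not attempted here. The function name `affine_plane` is hypothetical.

```python
from itertools import combinations


def affine_plane(q):
    """Parallel classes of the affine plane of prime order q (arithmetic mod q)."""
    classes = []
    for a in range(q):
        # Lines y = a*x + b with a fixed slope a form one parallel class.
        classes.append([frozenset((x, (a * x + b) % q) for x in range(q))
                        for b in range(q)])
    # The vertical lines x = c form the last parallel class.
    classes.append([frozenset((c, y) for y in range(q)) for c in range(q)])
    return classes


q = 3
classes = affine_plane(q)
blocks = [blk for cls in classes for blk in cls]
points = [(x, y) for x in range(q) for y in range(q)]

print(len(classes), len(blocks))  # 4 12

# Every pair of distinct points lies in exactly one block (lambda = 1).
pair_counts = {sum(p in b and r in b for b in blocks)
               for p, r in combinations(points, 2)}
print(pair_counts)  # {1}
```

Choosing any subset of the returned parallel classes as the node set gives a resolvable FR code whose repetition degree equals the number of classes chosen.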
By using the above construction we can construct an affine plane of order 2 (see Fig. 3).
The set of symbols is and the blocks are as follows:
Affine planes are also considered in  since any affine plane is a Steiner system. They also mentioned in . However, here we have the flexibility of constructing fractional repetition codes with repetition by choosing any parallel classes. For instance, Example 2 can be also constructed by considering only two parallel classes of an affine plane of order when is a prime power.
Next, we discuss affine geometries which yield a larger class of constructions. Let be a prime power, and . Let . Note that is an -dimensional vector space over . A -flat is a solution set to a system of independent linear equations that can be homogeneous or non-homogeneous. The set and the set of all -flats of comprise the -dimensional affine geometry over , denoted by . It turns out that one can generate a large class of resolvable designs by considering . Let denote the Gaussian coefficient, so that
 Let denote the set of all -flats in . Then and form a resolvable design with and .
The case of corresponds to the case of affine planes that were discussed above. It can be shown that when , using Theorem 1, we obtain affine resolvable designs with . In this case the DSS parameters are , , and . The design can be specified by means of the following algorithm.
1. Let $V = \mathbb{F}_q^m$ be the symbol set.
2. Find the $(q^m - 1)/(q - 1)$ distinct $(m-1)$-dimensional subspaces of $V$; each of them contains the all-zero symbol. Note that each of these subspaces is the solution set of a homogeneous linear equation over $\mathbb{F}_q$ in $m$ variables. These subspaces are representatives of the different parallel classes.
3. Construct each parallel class by considering the additive cosets of its representative. Let $U$ be an $(m-1)$-dimensional subspace corresponding to a given homogeneous equation. Take a symbol $y \notin U$ and add $y$ to each symbol in $U$ to form the coset $U + y$ (which corresponds to a nonhomogeneous equation). There are $q - 1$ distinct cosets of this kind, and each of them forms a new block.
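The steps above can be sketched as follows for the hyperplane case. The sketch assumes that $q$ is prime (so that arithmetic modulo $q$ suffices) and enumerates one normal vector per parallel class by requiring its first nonzero entry to be 1; the block for right-hand side $c = 0$ is the subspace representative and the blocks for $c \neq 0$ are its cosets. The function name `hyperplane_classes` is hypothetical.

```python
from itertools import product


def hyperplane_classes(q, m):
    """Parallel classes of hyperplanes of the m-dimensional affine geometry, q prime."""
    points = list(product(range(q), repeat=m))
    classes = []
    for a in product(range(q), repeat=m):
        # Keep one normal vector per direction: first nonzero entry equals 1.
        first = next((x for x in a if x), None)
        if first != 1:
            continue
        # Block c is the solution set of a . x = c (mod q); c = 0 is the subspace.
        cls = [frozenset(p for p in points
                         if sum(ai * pi for ai, pi in zip(a, p)) % q == c)
               for c in range(q)]
        classes.append(cls)
    return classes


classes = hyperplane_classes(3, 3)
num_blocks = sum(len(cls) for cls in classes)
print(len(classes), num_blocks, len(classes[0][0]))  # 13 39 9

# Blocks from different parallel classes overlap in q^(m-2) = 3 symbols.
overlaps = {len(b1 & b2)
            for i, c1 in enumerate(classes) for c2 in classes[i + 1:]
            for b1 in c1 for b2 in c2}
print(overlaps)  # {3}
```

For $q = 3$ and $m = 3$ this yields 13 parallel classes and 39 blocks of 9 symbols each.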
Let $q = 3$ and $m = 3$. The set of symbols is $\mathbb{F}_3^3$ and there are 39 blocks which can be partitioned into 13 parallel classes. The representatives of the 13 parallel classes are as follows:
The other blocks are additive cosets of these 13 representatives. For example, the first parallel class consists of the following blocks:
The overlap between blocks from different parallel classes in the case of affine resolvable designs is known from the following result.
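Any two blocks from different parallel classes of an affine resolvable $(v, k, \lambda)$-BIBD intersect in exactly $k^2/v$ symbols.
Let $P_1$ and $P_2$ be parallel classes in an affine resolvable $(v, k, \lambda)$-BIBD. Let $\beta = k^2/v$. Any block from $P_1$ is $\beta$-recoverable from $P_2$.
Proof: By Lemma 2 it is clear that the intersection between a block in $P_1$ and any block in $P_2$ is of size $k^2/v$. Next, there is no overlap between the blocks in $P_2$ and there exist $v/k$ blocks in $P_2$, so these intersections are disjoint and together contain $(v/k)(k^2/v) = k$ symbols, i.e., the whole block; this gives us the result. Thus, for affine resolvable designs resulting from affine geometries, we have $\beta = q^{m-2}$ and $d = q$.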
In addition to the above examples, we emphasize that we can generate resolvable FR codes with a wide range of parameters as shown in Table I.
For instance, let . Then a resolvable FR code with exists. This code has parallel classes. Suppose that we wish to deploy a DSS with a repetition degree of . We can simply pick parallel classes to form the node set. In the event of four node failures, we contact all the nodes in the intact parallel class and download symbols from each of them, i.e., the code is resilient to four node failures. The code rate is guaranteed to be at least as any two nodes have at most symbols in common. This implies that these codes are universally good.
This approach provides us with a systematic way of designing codes with that are resilient up to failures. Furthermore, it can be seen that a system can be resilient to at most failures, i.e., our approach is optimal from a resilience point of view. It is known that there exist affine resolvable designs that are not Steiner systems, i.e., our class of codes is different from the Steiner system based codes considered in .
III-B Hadamard matrix based construction
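A second construction of affine resolvable designs can be obtained from Hadamard matrices or, equivalently, difference sets, as discussed below. Consider a group $(G, +)$ of order $v$ and $D \subset G$ such that $|D| = k$, with the property that every nonidentity element of $G$ can be expressed as a difference of elements of $D$ in exactly $\lambda$ ways. We refer to $D$ as a $(v, k, \lambda)$-difference set.
Quadratic residue difference set. Let $q$ be an odd prime power with $q \equiv 3 \pmod 4$. Let $D$ be the set of quadratic residues in $\mathbb{F}_q$, i.e., the nonzero squares. Then $D$ is a $(q, (q-1)/2, (q-3)/4)$-difference set in $(\mathbb{F}_q, +)$, where $(\mathbb{F}_q, +)$ denotes the additive group of $\mathbb{F}_q$.
For any $g \in G$, we define the translate of $D$ by $D + g = \{x + g : x \in D\}$, and define the development of $D$ by $\mathrm{Dev}(D) = \{D + g : g \in G\}$. If $D$ is a $(v, k, \lambda)$-difference set in $G$, then $(G, \mathrm{Dev}(D))$ is a $(v, k, \lambda)$-BIBD [6].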
Let be the -BIBD constructed by using a quadratic residue difference set. Let , and define for . Then it can be shown  that is an affine resolvable -BIBD. Using the equations (1) and (2) it can be seen that this corresponds to a resolvable FR code with parameters and .
$D = \{1, 2, 4\}$ is a $(7, 3, 1)$-difference set in $\mathbb{Z}_7$. Using the difference set $D$, we can construct the Fano plane, which is a $(7, 3, 1)$-BIBD. By applying the above construction we have the following parallel classes and their corresponding storage nodes.
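A minimal sketch of this construction is given below. It assumes the standard extension step, in which each translate of the difference set is augmented with one new point and paired with its complement to form a parallel class; it also assumes that $q$ is a prime with $q \equiv 3 \pmod 4$, so that arithmetic modulo $q$ suffices. The label `INF` for the added point and the function name `hadamard_classes` are hypothetical.

```python
from itertools import combinations

INF = "inf"  # hypothetical label for the point added to the symbol set


def hadamard_classes(q):
    """Parallel classes from the quadratic-residue difference set in Z_q.

    Assumes q is a prime with q % 4 == 3.
    """
    residues = {(x * x) % q for x in range(1, q)}
    classes = []
    for g in range(q):
        block = frozenset((d + g) % q for d in residues)   # translate D + g
        complement = frozenset(range(q)) - block
        classes.append([block | {INF}, complement])
    return classes


classes = hadamard_classes(7)
blocks = [b for cls in classes for b in cls]
points = list(range(7)) + [INF]

print(len(classes), {len(b) for b in blocks})  # 7 {4}

# Every pair of symbols lies in the same number of blocks.
pair_counts = {sum(p in b and r in b for b in blocks)
               for p, r in combinations(points, 2)}
print(pair_counts)  # {3}
```

For $q = 7$ this gives 7 parallel classes of two nodes each, with 4 symbols per node, and any two nodes from different classes share exactly 2 symbols.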
For this class of codes, is always 2. However, they offer more flexibility in the choice of ; unlike affine geometry based codes, we do not require to be a prime power.
Table II contains the parameters of this construction corresponding to some representative values of .
For these codes which implies that . It can be seen that the code rate is for any Hadamard design based resolvable fractional code for . In this case, since is equal to , these codes are universally good.
The advantage of affine resolvable designs is that the overlap between blocks from different parallel classes is known exactly. This is not the case in general for resolvable designs that are not affine resolvable. Thus, if the design is not affine resolvable, we may not be able to guarantee the $\beta$-recoverable property of the FR codes. However, it is conceivable that general resolvable designs can be used for the design of FR codes. In the next section we present constructions of resolvable FR codes where $\beta = 1$, but the corresponding designs are not affine resolvable. We emphasize that these designs result in FR codes whose parameters are not achieved by the constructions in [4].
IV REPRESSURED codes from Latin Squares
In this section, we consider resolvable FR codes with $\beta = 1$. We use mutually orthogonal Latin squares [6] for the construction.
A Latin square of order $N$ with entries from an $N$-set $X$ is an $N \times N$ array $L$ in which every cell contains an element of $X$ such that every row of $L$ is a permutation of $X$ and every column of $L$ is a permutation of $X$.
Suppose that $L_1$ and $L_2$ are Latin squares of order $N$ with entries from $X$ and $Y$ respectively (where $|X| = |Y| = N$). We say that $L_1$ and $L_2$ are orthogonal Latin squares if for every $x \in X$ and for every $y \in Y$ there is a unique cell $(i, j)$ such that $L_1(i, j) = x$ and $L_2(i, j) = y$.
Equivalently, one can consider the superposition of $L_1$ and $L_2$ in which each cell $(i, j)$ is occupied by the pair $(L_1(i, j), L_2(i, j))$. Then, $L_1$ and $L_2$ are orthogonal if and only if the resultant array contains every value in $X \times Y$. A set of Latin squares $L_1, \dots, L_s$ of order $N$ is said to be mutually orthogonal if $L_i$ and $L_j$ are orthogonal for all $1 \leq i < j \leq s$.
We now demonstrate a procedure of constructing FR codes from mutually orthogonal Latin squares . Let , and let be a set of mutually orthogonal Latin squares of order ().
Arrange the elements of in a array . Each row and each column of corresponds to a storage node (this gives us nodes).
Note that takes values in . Within identify the set of pairs where a given value appears. Create a storage node by including the entries of corresponding to the identified pairs.
Repeat this for each and all . This creates another storage nodes.
Thus, a total of storage nodes of size can be obtained. Of course one can choose fewer storage nodes if so desired.
The construction procedure described above produces a resolvable fractional repetition code with and .
Proof: It is clear from the construction that and . Each storage node has symbols so that . We need to show that the code is resolvable. Towards this end note that it is evident that we obtain a parallel class by considering the nodes corresponding to the rows of (similar argument holds for the columns of ). Next the nodes obtained by considering Latin square also form a parallel class, since the set of elements obtained by considering the pairs corresponding to are distinct from those corresponding to , if . As we have parallel classes, we obtain . Next, consider the overlap between any two storage nodes belonging to different parallel classes. As and are orthogonal, any entry appears exactly once in the superposition of and , which implies that the overlap between storage nodes from different parallel classes corresponding to the ’s is exactly one element. Similarly, a block from a parallel class corresponding to has exactly one overlap with the blocks corresponding to the rows and columns of .
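The construction and the properties used in the proof can be checked on a small instance. The sketch below assumes that the symbol in cell $(i, j)$ of the array is labeled $iN + j$ and that the Latin squares use entries $0, \dots, N-1$; the two order-3 squares are written out by hand for illustration, and the function name `fr_code_from_squares` is hypothetical.

```python
def fr_code_from_squares(squares):
    """Parallel classes from an N x N symbol array and a list of Latin squares."""
    n = len(squares[0])

    def symbol(i, j):
        return i * n + j  # symbol stored in cell (i, j) of the array

    classes = [
        [frozenset(symbol(i, j) for j in range(n)) for i in range(n)],  # rows
        [frozenset(symbol(i, j) for i in range(n)) for j in range(n)],  # columns
    ]
    for sq in squares:
        # One node per value s: all cells of the array where the square equals s.
        classes.append([frozenset(symbol(i, j)
                                  for i in range(n) for j in range(n)
                                  if sq[i][j] == s)
                        for s in range(n)])
    return classes


# Two orthogonal Latin squares of order 3 (illustrative, entries 0..2).
L1 = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
L2 = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]

classes = fr_code_from_squares([L1, L2])
print(len(classes), sum(len(cls) for cls in classes))  # 4 12

# Nodes from different parallel classes share exactly one symbol.
overlaps = {len(b1 & b2)
            for i, c1 in enumerate(classes) for c2 in classes[i + 1:]
            for b1 in c1 for b2 in c2}
print(overlaps)  # {1}
```

Dropping one or more of the classes from the returned list lowers the repetition degree without affecting the single-symbol overlap between the remaining classes.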
Let , and . Then, we have the following construction
where it can be verified that and are orthogonal. Then we have the following parallel classes and corresponding storage nodes.
Note that in describing the above construction we assumed the existence of mutually orthogonal Latin squares. We now discuss the issue of the existence of such structures.
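If $p$ is a prime number, $m$ is a positive integer, and $N = p^m$, then we can construct $N - 1$ mutually orthogonal Latin squares of order $N$ as described below.
Define $L_a$, for $a \in \mathbb{F}_N \setminus \{0\}$, by $L_a(i, j) = a i + j$ (where the addition is over $\mathbb{F}_N$) for all $i, j \in \mathbb{F}_N$. Then $L_a$ is a Latin square since for a given row $i$ (or column $j$) the column (or row) location of an element is uniquely specified.
For any $a \neq b$, $L_a$ and $L_b$ are orthogonal since for a given ordered pair $(x, y)$ the system $a i + j = x$, $b i + j = y$ determines $i$ and $j$ uniquely.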
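A minimal sketch of this construction follows. It assumes the squares take the form $L_a(i, j) = a i + j$ and is restricted to prime order, so that arithmetic modulo $N$ is field arithmetic; prime powers that are not prime would require a finite-field implementation. The function names are hypothetical.

```python
def mols(n):
    """The n - 1 squares L_a(i, j) = a*i + j (mod n), a = 1..n-1, for prime n."""
    return [[[(a * i + j) % n for j in range(n)] for i in range(n)]
            for a in range(1, n)]


def is_latin(sq):
    """Every row and every column is a permutation of 0..n-1."""
    n = len(sq)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in sq)
    cols_ok = all({sq[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(s1, s2):
    """The superposition of s1 and s2 contains every ordered pair exactly once."""
    n = len(s1)
    pairs = {(s1[i][j], s2[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


squares = mols(5)
print(len(squares))                                   # 4
print(all(is_latin(sq) for sq in squares))            # True
print(all(are_orthogonal(squares[a], squares[b])
          for a in range(len(squares))
          for b in range(a + 1, len(squares))))       # True
```

The orthogonality check mirrors the superposition criterion: two squares of order $N$ are orthogonal exactly when their superposition contains $N^2$ distinct pairs.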
Let N=3. Then , and . The two orthogonal Latin squares of order 3 constructed by the above method are
In general, the construction of orthogonal Latin squares is somewhat involved. However, the celebrated results of , demonstrate the construction of two orthogonal Latin squares for all orders . This immediately allows us to construct resolvable fractional repetition codes with the following parameters and for any .
It can be observed that this construction allows us to design FR codes that are not covered by those arising from Steiner systems. For instance, Let and . Then to construct a FR code we need use the Steiner system which does not exist . However the above construction with two orthogonal Latin squares of order 10 provides us a resolvable fractional code with and .
V Concluding Remarks
In this work we introduced REPRESSURED codes that are fractional repetition codes constructed from resolvable designs. Our work offers the following advantages with respect to the existing work.
The resolvable nature of our codes allows for a natural tradeoff between the repetition degree and the number of parallel classes, i.e., we can obtain FR codes with a higher or lower repetition degree simply by including or removing parallel classes. This flexibility is lacking in the approach of [4], where the entire Steiner system needs to be used. In particular, it is known that there exist Steiner systems that are not resolvable. For instance, Ray-Chaudhuri and Wilson [11] showed that a resolvable Steiner system $S(2, 3, v)$ exists if and only if $v \equiv 3 \pmod 6$.
The work of  mentioned the usage of affine planes for designing distributed storage systems. Our work has considered a larger class of designs based on affine geometries, Hadamard designs and Latin squares.
It is to be noted that all our proposed constructions are universally good, i.e. . Moreover, the repair process is particularly simple. With failures, at least one parallel class is guaranteed to be intact. The new node can simply contact the nodes in the intact parallel class for regeneration. This property is likely to simplify the implementation of the proposed systems.
Future work would include the investigation of other classes of combinatorial designs and a more careful analysis of the maximum filesize of the proposed codes.
- [1] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. on Info. Th., vol. 56, no. 9, pp. 4539–4551, Sept. 2010.
- [2] K. Rashmi, N. Shah, P. Kumar, and K. Ramchandran, “Explicit construction of optimal exact regenerating codes for distributed storage,” in 47th Annual Allerton Conference on Communication, Control, and Computing, 2009, pp. 1243–1249.
- [3] ——, “Explicit and optimal exact-regenerating codes for the minimum-bandwidth point in distributed storage,” in IEEE International Symposium on Information Theory Proceedings (ISIT), 2010, pp. 1938–1942.
- [4] S. E. Rouayheb and K. Ramchandran, “Fractional repetition codes for repair in distributed storage systems,” in 48th Annual Allerton Conference on Communication, Control, and Computing, 2010, pp. 1510–1517.
- [5] J. Koo and J. Gill, “Scalable constructions of fractional repetition codes in distributed storage systems,” in 49th Annual Allerton Conference on Communication, Control, and Computing, 2011, pp. 1366–1373.
- [6] D. R. Stinson, Combinatorial Designs: Construction and Analysis. Springer, 2003.
- [7] R. C. Bose, “A Note on the Resolvability of Balanced Incomplete Block Designs,” Sankhya: The Indian Journal of Statistics (1933-1960), vol. 6, no. 2, pp. 105–110, 1942.
- [8] F. Yates, “A new method of arranging variety trials involving a large number of varieties,” The Journal of Agricultural Science, vol. 26, no. 03, pp. 424–455, 1936. [Online]. Available: http://dx.doi.org/10.1017/S0021859600022760
- [9] R. C. Bose, S. S. Shrikhande, and E. T. Parker, “Further results on the construction of mutually orthogonal Latin squares and the falsity of Euler’s conjecture,” Canadian Journal of Mathematics, vol. 12, pp. 189–203, 1960.
- [10] C. W. H. Lam, L. Thiel, and S. Swiercz, “The nonexistence of finite projective planes of order 10,” Canadian Journal of Mathematics, vol. 41, pp. 1117–1123, 1989.
- [11] D. K. Ray-Chaudhuri and R. M. Wilson, “Solution of Kirkman’s school girl problem,” Amer. Math. Soc. Proc. Symp. in Pure Math., vol. 19, pp. 187–203, 1971.