Centralized Multi-Node Repair Regenerating Codes
In a distributed storage system, recovering from multiple failures is a frequent task that is crucial for maintaining the system's reliability and fault-tolerance. In this work, we focus on the problem of repairing multiple failures in a centralized way, which can be desirable in many data storage configurations, and we show that a significant repair traffic reduction is possible. We establish the fundamental tradeoff between the repair bandwidth and the storage size under functional repair. Using a graph-theoretic formulation, the optimal tradeoff is identified as the solution to an integer optimization problem, for which a closed-form expression is derived. Expressions of the extreme points, namely the minimum storage multi-node repair (MSMR) and minimum bandwidth multi-node repair (MBMR) points, are obtained. We describe a general framework for converting single erasure minimum storage regenerating codes to MSMR codes. The repair strategy for $e$ failures is similar to that for a single failure; however, certain extra requirements need to be satisfied by the repairing functions for a single failure. For illustration, the framework is applied to product-matrix codes and interference alignment codes. Furthermore, we prove that the functional MBMR point is not achievable for linear exact repair codes. We also show that exact-repair minimum bandwidth cooperative repair (MBCR) codes achieve an interior point that lies near the MBMR point, when , $k$ being the minimum number of nodes needed to reconstruct the entire data. Finally, for and , where $d$ is the number of helper nodes during repair, we show that the functional repair tradeoff is not achievable under exact repair, except for maybe a small portion near the MSMR point, which parallels the results for single erasure repair by Shah et al.
Ensuring data reliability is of paramount importance in modern storage systems. Reliability is typically achieved through the introduction of redundancy. Traditionally, simple replication of data has been adopted in many systems. For instance, the Google File System opted for a triple replication policy [ghemawat2003google]. However, for the same redundancy factor, replication falls short of providing the highest level of reliability. Erasure codes, on the other hand, can be optimal in terms of the redundancy-reliability tradeoff. In erasure codes, a file of size $\mathcal{M}$ is divided into $k$ fragments, each of size $\mathcal{M}/k$. The $k$ fragments are then encoded into $n$ fragments using an $(n,k)$ maximum distance separable (MDS) code and then stored at $n$ different nodes. Using such a scheme, the data is guaranteed to be recoverable from any $n-k$ node erasures, providing the highest level of worst-case data reliability for the given redundancy. However, traditional erasure codes suffer from high repair bandwidth. In the case of a single node erasure, they require downloading the entire data of size $\mathcal{M}$ to repair a single node storing a fragment of size $\mathcal{M}/k$. This expansion factor made erasure codes impractical in some applications using distributed storage systems. In the last decade, the repair problem has gained increasing interest and motivated research on a new class of erasure codes with better repair capabilities. The seminal work in [dimakis2010network] proposed a new class of erasure codes, called regenerating codes, that optimally solve the repair bandwidth problem. Interestingly, the authors in [dimakis2010network] proved that one can significantly reduce the amount of bandwidth required for repair, and that the bandwidth decreases as each node stores more information. Formally, suppose any $k$ out of $n$ nodes are sufficient to recover the entire file of size $\mathcal{M}$.
Assuming that $d$ nodes, termed helpers, participate in the repair process, and denoting the storage capacity of each node by $\alpha$ and the amount of information downloaded from each helper by $\beta$, an optimal regenerating code satisfies
$$\mathcal{M} = \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}. \tag{1}$$
Equation (1) describes the fundamental tradeoff between the storage capacity $\alpha$ and the bandwidth $\beta$. Two extreme points can be obtained from the tradeoff. Minimum storage regenerating (MSR) codes correspond to the best storage efficiency with $\alpha = \mathcal{M}/k$, while minimum bandwidth regenerating (MBR) codes achieve the lowest possible bandwidth at the expense of extra storage per node.
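As a rough numerical illustration of the single-erasure tradeoff, the sketch below evaluates the cut-set expression $\sum_{i=0}^{k-1}\min\{\alpha,(d-i)\beta\}$ from [dimakis2010network] and the two extreme points. The closed forms used for the MSR and MBR points are the standard ones from the regenerating-codes literature and are not spelled out in the text above; the function names are hypothetical.

```python
from fractions import Fraction

def cutset_capacity(k, d, alpha, beta):
    """Right-hand side of the single-erasure cut-set bound."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

def msr_point(M, k, d):
    """Minimum storage point: alpha = M/k, beta = M / (k (d - k + 1))."""
    alpha = Fraction(M, k)
    beta = Fraction(M, k * (d - k + 1))
    return alpha, beta

def mbr_point(M, k, d):
    """Minimum bandwidth point: beta = 2M / (k (2d - k + 1)), alpha = d * beta."""
    beta = Fraction(2 * M, k * (2 * d - k + 1))
    alpha = d * beta
    return alpha, beta

# Both extreme points meet the bound with equality for M = 12, k = 3, d = 4.
for point in (msr_point(12, 3, 4), mbr_point(12, 3, 4)):
    print(point, cutset_capacity(3, 4, *point))
```

Exact rational arithmetic is used so that the equality with $\mathcal{M}$ can be checked without rounding issues.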
If we recover the exact same information as the failed node, we call it exact repair, otherwise we call it functional repair. Using network coding [ahlswede2000network, ho2006random], it is possible to construct functional regenerating codes satisfying (1) [dimakis2010network]. Following the seminal work in [dimakis2010network], there has been a flurry of interest in designing exact regenerating codes achieving the optimal tradeoff, focusing mainly on the extreme MSR and MBR points, e.g., [shah2012interference, suh2011exact, wu2009reducing, papailiopoulos2013repair, wang2014explicit, rawat2016progress, goparaju2016minimum, cadambe2011optimal, Zigzag_Codes_IT, ye2016nearly]. For interior points that are between the MBR and MSR points in the tradeoff of (1), [non_achievability] showed that most points are not achievable for exact repair.
The aforementioned references, like most studies on regenerating codes in the literature, focus on the single erasure repair problem. However, in many practical scenarios, such as large-scale storage systems, multiple failures are more frequent than a single failure. Moreover, many systems (e.g., [bhagwan2004total]) apply a lazy repair strategy, which seeks to limit the repair cost of erasure codes: instead of immediately repairing every single failure, one waits until $e$ erasures occur, $e \le n-k$; the repair is then done by downloading the equivalent of the total information in the system to regenerate the erased nodes. A natural question of interest is whether we can reduce the amount of download in such scenarios.
In this work, we consider the repair problem of multiple erasures in a centralized manner. The framework requires the content of any $k$ out of $n$ nodes in the system to be sufficient to reconstruct the entire data. Upon failure of $e$ nodes in the system, the repair is carried out by contacting any $d$ nodes (helpers) out of the $n-e$ available nodes, $k \le d \le n-e$, and downloading $\beta$ amount of information from each of the $d$ helpers. Our objective is to characterize the functional repair tradeoff between the storage per node $\alpha$ and the repair bandwidth $\beta$ under the centralized multiple failure repair framework. Under functional repair, the repaired nodes are not necessarily the same as the failed nodes. Exact repair, however, requires that the replacement nodes recover exactly the content of the failed nodes. We also seek to investigate the achievability of the functional tradeoff under exact repair.
The centralized repair framework is applicable to many practical situations. Indeed, there are situations in which, due to architectural constraints, it is more desirable to regenerate the lost nodes at a central server before dispatching the regenerated content to the replacement nodes [bhagwan2004total]. For instance, one can think of a rack-based node placement architecture [rawat2016centralized] in which failures frequently occur to nodes corresponding to a particular rack. In this scenario, a centralized repair of the entire rack is favorable to repairing the rack on the per-node basis. Furthermore, [rawat2016centralized] showed that a centralized repair framework can have interesting applications to communication-efficient secret sharing. Finally, centralized repair can be used in a broadcast network, where the repair information is transmitted to all replacement nodes (e.g. [hu2015broadcast]). For the above reasons, characterizing the repair-bandwidth tradeoff under the centralized repair framework is important from both an information-theoretic and also a practical perspective.
I-A Related work
Cooperative regenerating codes (also known as coordinated regenerating codes) have been studied to address the repair of multiple erasures [kermarrec2011repairing, closed_form_cooperative_regene] in a distributed manner. In this framework, each replacement node downloads information from $d$ helpers in the first stage. Then, the replacement nodes exchange information among themselves before regenerating the lost nodes. Cooperative regenerating codes achieving the extreme points on the cooperative tradeoff have been developed: minimum storage cooperative regenerating (MSCR) codes [li2014cooperative, closed_form_cooperative_regene] and minimum bandwidth cooperative regenerating (MBCR) codes [wang2013exact]. In [li2014cooperative], the authors showed that, given an instance of linear exact MSR codes, it is possible to construct an instance of exact linear MSCR codes for 2 erasures.
The number of nodes involved in the repair of a single node, known as locality, is another important measure of node repair efficiency [locality]. Various bounds and code constructions have been proposed in the literature [locality, family_locality]. Recent works have investigated the problem of multiple node repair under locality constraints [locality_two_erasures, song2015locally].
The problem of centralized repair has been considered in [cadambe2013asymptotic], in which the authors restricted themselves to MDS codes, corresponding to the point of minimum required storage per node. [cadambe2013asymptotic] showed the existence of MDS codes with optimal repair bandwidth in the asymptotic regime where the storage per node (as well as the entire information) tends to infinity. In [wang2016optimal], the authors proved that Zigzag codes, which are MDS codes designed initially for optimally repairing single erasures [Zigzag_Codes_IT], can also be used to optimally repair multiple erasures in a centralized manner. In [rawat2016centralized], the authors independently proved that multiple failures can be repaired in Zigzag codes with optimal bandwidth. Moreover, [rawat2016centralized] defines the minimum bandwidth multi-node repair codes as codes in which the downloaded information matches the entropy of the failed nodes. Based on that, the authors derived a lower bound on the repair bandwidth for systems having a certain entropy accumulation property and then showed achievability of the minimum bandwidth using MBCR codes. However, the optimal storage size per node is not known under these codes. In [ye2016explicit], the authors presented an explicit MDS code construction that provides optimal repair for all $e \le n-k$ and $k \le d \le n-e$ simultaneously. The authors in [hu2015broadcast] studied the problem of broadcast repair for wireless distributed storage, which is equivalent to the model we study in this paper. It is worth pointing out that the previous constructions are for high-rate codes, with large subpacketization $\alpha$. For scalar MDS codes, i.e., $\alpha = 1$, it is shown that exact repair cannot be achieved when . In [li2015enabling], the authors presented an approach that enables single erasure MSR codes to recover from multiple failures simultaneously with optimal bandwidth. Based on simulation, [li2015enabling] showed that their approach can provide efficient recovery of most of the failure patterns, but not all of them. The repair problem of Reed-Solomon codes has been investigated in [guruswami2015repairing]. Repairing multiple failures in Reed-Solomon codes has been investigated in [dau2016repairing, dau2017repairing]. In [IA_code], the authors proved that the interference alignment MSR construction of [suh2011exact], originally designed for repairing any single node failure, can recover from multiple failures in a cooperative way. Specifically, it is shown that any set of systematic nodes, any set of parity-check nodes, or any pair of nodes can be repaired cooperatively with optimal bandwidth.
I-B Contributions of the paper
The main contributions of this paper are summarized as follows.
We first establish the explicit tradeoff between the repair bandwidth and the storage size for functional repair. We obtain the tradeoff using information flow graphs. From the functional tradeoff, we characterize the minimum storage multi-node repair (MSMR) point, and the minimum bandwidth multi-node repair (MBMR) point.
When the number of erasures satisfies $e \ge k$, $k$ being the minimum number of nodes needed to reconstruct the entire data, the tradeoff reduces to a single point, for which we provide an explicit code construction.
We formalize a construction for exact-repair MSMR codes. Given an instance of an exact linear MSR code, we present a framework to construct an instance of an exact linear MSMR regenerating code. We note here that [li2015enabling] and [li2014cooperative] used a similar approach for their numerical results and MSCR codes, respectively. Based on this framework, we study the product-matrix (PM) MSR codes [Rashmi_Product_Matrix] and the interference alignment (IA) construction in [suh2011exact]. We prove the existence of PM and IA MSMR codes for any number of failures $e$, $e \le n-k$. Moreover, for the IA code, we prove that one can always efficiently recover from any set of node failures as long as the failed nodes are either all systematic nodes or all parity nodes; for failures including both systematic and parity nodes, we derive explicit design conditions under which exact recovery is ensured, for some particular system parameters. We note here that, unlike previous constructions, our codes are applicable when the code rate is low and use a small subpacketization size of or .
We prove that, somewhat surprisingly, the functional minimum bandwidth multi-node repair point is not achievable for linear exact repair codes, whereas linear codes do achieve the corresponding point for a single erasure [Rashmi_Product_Matrix].
We show that exact-repair minimum bandwidth cooperative repair (MBCR) codes achieve an interior point, that lies near the MBMR point, when .
We show that the functional repair tradeoff is not achievable under exact repair for interior points between MBMR and MSMR points, except for maybe a small portion near the minimum storage multi-node repair point, which parallels the results for single erasure repair [non_achievability], for . The achievability of the functional tradeoff under exact repair is summarized in Table I.
Finally, we study the repair problem of multiple erasures in MBR regenerating codes and present an MBR construction with optimal repair, simultaneously for varying number of helpers and varying number of erasures.
Table I: Achievability of the functional repair tradeoff under exact repair (✗ denotes non-achievability).
|Number of erasures|MSMR point|MBMR point|Interior points|
|$e = 1$|[Zigzag_Codes_IT, suh2011exact, Rashmi_Product_Matrix]|[Rashmi_Product_Matrix]|✗, except maybe for a small portion near the MSMR point [non_achievability].|
|$1 < e < k$|[ye2016explicit, cadambe2013asymptotic], [Sections IV-B, IV-C, IV-D]|✗ (for linear codes) [Section V]|if : an interior point near the MBMR point is achievable [Section V-D]; if : ✗, except maybe for a small portion near the MSMR point [Section VI].|
|$e \ge k$|Section IV-A|Section IV-A|Section IV-A|
I-C Organization of the paper
The remainder of the paper is organized as follows. A description of the system model is provided in Section II. The analysis of the functional tradeoff is detailed in Section III. Section IV-A describes our code construction for the case $e \ge k$. In Section IV-B, we describe the MSMR codes framework and its application to the product-matrix and the interference alignment codes. We prove the non-achievability of MBMR codes under linear exact repair in Section V. The non-achievability of the interior points under exact repair is investigated in Section VI. The repair of multiple erasures for an MBR code is presented in Section VII. Section VIII draws conclusions.
Notation. The superscript is used to denote the transpose of a matrix. For a matrix , denotes its determinant and refers to its entry at position . denotes the standard basis vector whose dimension is clear from the context. For a set , denotes the resultant set after removing item , while denotes the size of . denotes the identity matrix of size and denotes the diagonal matrix with the corresponding elements. denotes the set of elements . The symbol is 1 if and 0 otherwise. The notations and are used to denote whether is a multiple of , or not, respectively. denotes a vector of length .
II System model and main results
The centralized multi-node repair problem is characterized by the parameters $(n, k, d, e, \alpha, \beta)$. We consider a distributed storage system with $n$ nodes storing $\mathcal{M}$ amount of information. The data elements are distributed across the $n$ storage nodes such that each node can store up to $\alpha$ amount of information. The system should satisfy the following two properties:
Reconstruction property: a data collector (DC) connecting to any $k$ nodes should be able to reconstruct the entire data.
Regeneration property: upon failure of $e$ nodes, a central node is assumed to contact $d$ helpers, $k \le d \le n-e$, and download $\beta$ amount of information from each of them. Then $e$ new replacement nodes join the system, and the content of each is determined by the central node. We call $\beta$ the repair bandwidth. The total bandwidth is denoted $\gamma = d\beta$.
We consider functional repair and exact repair. In the former case, the replacement nodes are not required to be exact copies of the failed nodes, but the repaired code should again satisfy the above two properties. Our objective is to characterize the tradeoff between the storage per node and the repair bandwidth under the centralized multiple failure repair framework. On the optimal tradeoff, the minimum bandwidth multi-node repair (MBMR) point has the minimum possible $\beta$, and the minimum storage multi-node repair (MSMR) point has the minimum possible $\alpha$.
In the paper, we will use the notation , such that and .
We state our main theorems that will be proved in the sequel of the paper. The first result is the explicit functional repair tradeoff of and .
Theorem 2. For fixed system parameters , functional regenerating codes satisfying the centralized multi-node repair condition exist if and only if
The next result combines Theorems 5 and 9, which give constructions of MSMR codes for exact repair.
Theorems 5 and 9. There exist interference alignment MSMR codes and product-matrix MSMR codes, defined over a large enough finite field, such that any $e$ erasures can be optimally repaired.
III Functional storage-bandwidth tradeoff
In this section, we study the fundamental tradeoff between the storage size $\alpha$ and the repair bandwidth $\beta$ for $e$ erasures under functional repair. We evaluate the minimum cut of a multicast information flow graph, a technique also used for single erasure codes [dimakis2010network] and cooperative regenerating codes [closed_form_cooperative_regene].
III-A Information flow graphs
The performance of a storage system can be characterized by the concept of information flow graphs (IFGs). Our constructed IFG depicts the amount of information transferred, processed and stored during repair. We design our IFG with the following kinds of nodes (see Figure 1). It contains a single source node $s$ that represents the source of the data object. Each storage node $x^i$ of the IFG is represented by two distinct nodes: an input storage node $x^i_{\mathrm{in}}$ and an output storage node $x^i_{\mathrm{out}}$. Each input node is connected to its output node with an edge of capacity $\alpha$, reflecting the storage constraint of each individual node. The information flow graph is formed with $n$ initial nodes, each with storage size $\alpha$, connected to the source node with edges of capacity $\infty$.
The IFG evolves with time. Upon failure of $e$ nodes, $e$ new nodes simultaneously join the system. Each of the replacement nodes is similarly represented by an input node and an output node, linked with an edge of capacity $\alpha$. To model the centralized repair nature of the system, we add a virtual node $h$ that links the $d$ helpers to the new storage nodes. Likewise, the virtual node consists of an input node $h_{\mathrm{in}}$ and an output node $h_{\mathrm{out}}$. The input node $h_{\mathrm{in}}$ is connected to the $d$ helpers with edges each of capacity $\beta$. The input node $h_{\mathrm{in}}$ is connected to the output node $h_{\mathrm{out}}$ with an edge of capacity $e\alpha$, reflecting the overall size of the data to be stored in the new replacement nodes. The output node $h_{\mathrm{out}}$ is then connected to the input nodes of the replacement nodes, with edges of capacity $\alpha$.
Each IFG represents one particular history of the failure patterns. The ensemble of IFGs is denoted by $\mathcal{G}(n,k,d,e,\alpha,\beta)$. For convenience, we drop the parameters whenever they are clear from the context. Given an IFG $G \in \mathcal{G}$, there are $\binom{n}{k}$ different data collectors connecting to $k$ nodes in $G$. The set of all data collector nodes in a graph $G$ is denoted by $\mathrm{DC}(G)$. For an IFG $G$ and a data collector $t \in \mathrm{DC}(G)$, the minimum cut (min-cut) value separating the source node $s$ and the data collector $t$ is denoted by $\mathrm{mincut}_G(s,t)$.
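To make the graph construction concrete, the following sketch builds the IFG for a single round of centralized repair and computes the min-cut to a data collector with a plain Edmonds-Karp max-flow. It is only an illustration under stated assumptions: the node labels, the function names, and the capacity $\alpha$ on the edges from the virtual output node to the replacement nodes are choices made here, not taken from the paper.

```python
from collections import defaultdict, deque

INF = float("inf")

def build_ifg(n, e, alpha, beta, failed, helpers):
    """Capacity map of an IFG after one round of centralized repair."""
    cap = defaultdict(lambda: defaultdict(int))
    for i in range(n):
        cap["s"][("in", i)] = INF             # source feeds every initial node
        cap[("in", i)][("out", i)] = alpha    # per-node storage constraint
    for h in helpers:
        cap[("out", h)]["h_in"] = beta        # each helper sends beta to the virtual node
    cap["h_in"]["h_out"] = e * alpha          # total content of the e replacement nodes
    for f in failed:
        cap["h_out"][("in", f, "new")] = alpha            # assumed capacity
        cap[("in", f, "new")][("out", f, "new")] = alpha  # storage of the replacement
    return cap

def max_flow(cap, source, sink):
    """Edmonds-Karp; cap is used as the residual graph and modified in place."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = INF, sink
        while parent[v] is not None:          # find the bottleneck of the path
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:          # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

def dc_min_cut(cap, contacted):
    """Attach a data collector to the given output nodes and return the min-cut."""
    for node in contacted:
        cap[node]["DC"] = INF
    return max_flow(cap, "s", "DC")

# n = 4 nodes, nodes 2 and 3 fail, nodes 0 and 1 help; alpha = 3, beta = 2.
g = build_ifg(4, 2, 3, 2, failed=[2, 3], helpers=[0, 1])
print(dc_min_cut(g, [("out", 2, "new"), ("out", 3, "new")]))
```

Because `max_flow` consumes the capacity map, a fresh graph should be built for every data collector that is examined.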
III-B Network coding analysis
The key idea behind representing the repair problem by an IFG lies in the observation that the repair problem can be cast as a multicast network coding problem [dimakis2010network]. Celebrated results from network coding [ahlswede2000network, ho2006random] are then invoked to establish the fundamental limits of the repair problem.
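According to the max-flow bound of network coding [ahlswede2000network], for a data collector to be able to reconstruct the data, the min-cut separating the source from the data collector must be at least the data object size $\mathcal{M}$. Considering all possible data collectors and all possible failure patterns, the following condition is necessary and sufficient for the existence of regenerating codes satisfying the reliability constraint:
$$\min_{G}\ \min_{t}\ \mathrm{mincut}_G(s,t) \ge \mathcal{M},$$
where the minimization is over all IFGs $G$ and all data collectors $t$ of $G$.
Analyzing the minimum cut of all IFGs results in the following theorem.
Theorem 1. For fixed system parameters $(n,k,d,e,\alpha,\beta)$, regenerating codes satisfying the centralized multi-node repair condition exist if and only if
$$\mathcal{M} \le \min_{\mathbf{u} \in \mathcal{P}} f(\mathbf{u}), \tag{5}$$
where
$$f(\mathbf{u}) = \sum_{i=1}^{g} \min\Big(u_i\alpha,\ \big(d - \sum_{j=1}^{i-1} u_j\big)\beta\Big), \tag{6}$$
$$\mathcal{P} = \Big\{\mathbf{u} = (u_1,\dots,u_g) : 1 \le u_i \le e,\ \sum_{i=1}^{g} u_i = k\Big\}. \tag{7}$$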
We note that (5) was also independently developed in [rawat2016centralized].
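The bound can be explored numerically by brute force. The sketch below assumes that the cut expression has the form $f(\mathbf{u}) = \sum_i \min\big(u_i\alpha, (d - \sum_{j<i} u_j)\beta\big)$, minimized over ordered group sizes $1 \le u_i \le e$ that sum to $k$, which is how the proof that follows accounts for the contribution of each repair group. The function names are hypothetical, and the enumeration is exponential in $k$, so it is only meant for small parameters.

```python
def compositions(k, e):
    """Yield every ordered tuple of integers in 1..e that sums to k."""
    if k == 0:
        yield ()
        return
    for first in range(1, min(e, k) + 1):
        for rest in compositions(k - first, e):
            yield (first,) + rest

def cut_value(u, d, alpha, beta):
    """Cut contributed by a recovery scenario u = (u_1, ..., u_g)."""
    total, used = 0, 0
    for ui in u:
        # A group either pays its storage, or the helper links not already
        # supplied by nodes of earlier groups that the data collector contacts.
        total += min(ui * alpha, (d - used) * beta)
        used += ui
    return total

def min_cut_bound(k, d, e, alpha, beta):
    """Largest file size supported by (alpha, beta) under the assumed bound."""
    return min(cut_value(u, d, alpha, beta) for u in compositions(k, e))

# k = 3, d = 4, e = 2, alpha = 2, beta = 1: the scenario (1, 2) is the bottleneck.
print(min_cut_bound(3, 4, 2, 2, 1))
```

With $e = 1$ the only scenario is $(1,\dots,1)$ and the function reduces to the single-erasure expression of [dimakis2010network].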
Define a recovery scenario as follows. A data collector DC connects to a subset of nodes , where is the subset of contacted nodes. The size of the support of corresponds to the number of repair groups of size taking part in the reconstruction process, while corresponds to the number of nodes contacted from repair group .
As all incoming edges of DC have infinite capacity, we only examine cuts with and . Every directed acyclic graph has a topological sorting, which is an ordering of its vertices such that the existence of an edge implies . We recall that nodes within the same repair group are repaired simultaneously. Since nodes are sorted, nodes considered at the th step cannot depend on nodes considered at th step with .
Considering the -th group, consider the case and the remaining nodes are such that .
if , then the contribution of each node is . The overall contribution of these nodes is .
else: , then if , the contribution of this node is . Thus, we only consider the case . Then, we discuss two cases
if , the contribution to the cut is .
else, since the -th group is the topologically i-th repair group, at most edge come from output nodes in . Thus, the contribution is . Thus, the contribution of this node is . Note that , we do not need to account for other similar nodes.
Thus, if , the contribution of the i-th repair group is . If , the contribution is , which can be reduced to if . Thus, to lower the cut, either in the case of or otherwise. Thus, the total contribution of the -th repair group is
Finally, summing all contributions from different repair groups and considering the worst case for implies that
with defined as in (7).
Therefore, the existence of regenerating codes is guaranteed by [ahlswede2000network] as long as
(Footnote 1: Strictly speaking, this is only valid when the number of failures/repairs is bounded. A rigorous proof is required to drop the boundedness assumption, as in [wu2007deterministic, closed_form_cooperative_regene].) ∎
III-C Solving the minimum cut problem
In this section, we derive the structure of the optimal scenario in (5) for any set of parameters . For instance, we show that for , the number of optimal repair groups (the support of ) is equal to . The result is formalized in the following theorem. Recall that we denote .
For fixed system parameters , functional regenerating codes satisfying the centralized multi-node repair condition exist if and only if
We denote by the vector that is the concatenation of the vectors . The next lemma shows that the minimum cut can be obtained by optimizing any subsequence of first. The proof follows directly from the definition of in (5) and is omitted.
Consider vectors such that . If
In proving the result of Theorem 2, we first characterize the optimal solution in the case of . Insight and intuition gained from the first case are used to motivate and derive the general solution. We first state the following lemma, which represents a key step towards proving our result.
Let be non-negative reals such that , then the following inequality holds
where is defined as in (6).
To prove the result, we cast it as an optimization problem:
The objective function in (13), as a function of , is concave on the interval . The concavity is due to the convexity of . Therefore, the minimum is achieved at one of the extreme values. Equivalently, or . ∎
In this scenario, connecting to nodes from the same repair group yields the worst case scenario from an information flow perspective. Given a particular repair scenario characterized by a vector , for any two adjacent repair groups (i.e., two adjacent entries in ) with and nodes respectively, we have . One can combine these two groups into a single repair group to achieve a lower cut value. Indeed, from the cut expression in (5), the contribution of the initial set to the cut is for some . After combining the groups into a single repair group, the contribution of the newly formed repair group is , which is lower than the initial contribution by virtue of Lemma 2, thus achieving a lower cut. This means that starting from an IFG, we construct a new IFG that has one less repair group and lower min-cut value. This process can be repeated until we end up with a single repair group consisting of nodes, which corresponds to the minimum cut over all graphs in this case.
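Therefore, the tradeoff in (5) is simply characterized by $\mathcal{M} \le \min(k\alpha, d\beta)$. Moreover, $\alpha \ge \mathcal{M}/k$ and $\beta \ge \mathcal{M}/d$. Equivalently, the functional storage-bandwidth tradeoff reduces to a single point given by $(\alpha, \beta) = (\mathcal{M}/k, \mathcal{M}/d)$.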
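The merging argument can be sanity-checked by brute force. The sketch below uses one reading of Lemma 2, namely that for group sizes $a, b \ge 1$ with $a + b \le e \le d$ a single group of $a+b$ nodes never yields a larger cut than two consecutive groups of sizes $a$ and $b$; both this reading and the cut expression in `cut_value` are assumptions made for illustration, and the function names are hypothetical.

```python
def cut_value(u, d, alpha, beta):
    """Cut of a scenario u under the assumed min(u_i*alpha, (d - used)*beta) form."""
    total, used = 0, 0
    for ui in u:
        total += min(ui * alpha, (d - used) * beta)
        used += ui
    return total

def merging_never_increases_cut(e, d, alpha, beta):
    """Check f((a + b,)) <= f((a, b)) for all a, b >= 1 with a + b <= e."""
    return all(
        cut_value((a + b,), d, alpha, beta) <= cut_value((a, b), d, alpha, beta)
        for a in range(1, e)
        for b in range(1, e - a + 1)
    )

# Exhaustive check over a small grid of integer parameters with d >= e.
ok = all(
    merging_never_increases_cut(e, d, alpha, beta)
    for e in range(2, 5)
    for d in range(e, 8)
    for alpha in range(1, 6)
    for beta in range(1, 6)
)
print(ok)
```

When all $k \le e$ contacted nodes belong to one repair group, `cut_value((k,), d, alpha, beta)` is just $\min(k\alpha, d\beta)$, the single-group cut discussed above.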
Motivated by the previous case, the intuition is that, given a scenario , one should form a new scenario which exhibits as many groups of size as possible. Subsequently, one constructs a scenario such that all its entries, except maybe one entry, are equal to . Lemma 2 addresses the case . Generalizing it to the case where follows the same approach.
Assume that and . Then, the following inequality holds
where is defined as in (6).
For a fixed , we denote the min-cut corresponding to , as a function of , by . As will be shown later in the proof of Theorem 2, a careful analysis of the behavior of the different scenarios is needed to determine the overall optimal scenario leading to the lowest minimum cut. We state the result in the following lemma, whose proof is relegated to Appendix -A.
There exists a point such that, for any ,
Now that we have the necessary machinery, we proceed as follows: given any scenario , we keep combining and/or changing repair groups by means of successive applications of Lemma 2 and Lemma 3 on subsequences of until we can no longer reduce the minimum cut. By Lemma 1 we reduced the overall minimum cut. The algorithm converges because at each step, either the number of repair groups in is reduced by one, or the number of repair groups of full size is increased by one. As the number of repair groups is lower bounded by , and as the number of repair groups of full size is upper bounded by , the algorithm must converge after a finite number of steps. It can be seen then that the above reduction procedure has a finite number of outcomes, given by
with and .
Therefore, if , then the optimal scenario corresponds to considering exactly repair groups. On the other hand, if , then, it is optimal to consider exactly repair groups. However, the optimal position of the repair group with nodes needs to be determined. Then, using Lemma 4, the result in Theorem 2 follows. ∎
Let with . Then, one can start by reducing the first three repair groups . This leads to . Another approach would be to consider the set . Reducing this set leads to either or . Reducing further leads to or . Reducing leads to or . It remains to compare the cuts given by , , and . Following Theorem 2, either or gives the lowest min-cut.
III-D Explicit expression of the tradeoff
Having characterized the optimal scenario generating the minimum cut in the last section, we are now ready to state the admissible storage-repair bandwidth region for the centralized multi-node repair problem, the proof of which is in Appendix -B.
For an storage system, there exists a threshold function such that for any , regenerating codes exist. For any , it is impossible to construct codes achieving the target parameters. The threshold function is defined as follows:
if , then: ,
else if , then:
else: with , then:
The functional repair tradeoff is illustrated in Figure 2 for multiple values of for and .
In the case of , the following equality holds for all points on the tradeoff
Therefore, the tradeoff between and is the same as the single erasure tradeoff of a system with reduced parameters given by , and . The expression of the tradeoff in this case can be recovered from [dimakis2010network] with the appropriate parameters.
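We now give the expressions of the two extremal points on the optimal tradeoff. We focus on the case $k > e$, as otherwise the optimal tradeoff reduces to a single point.
MSMR. The MSMR point is the same irrespective of the relation between $k$ and $e$, and it is given by
$$\alpha_{\mathrm{MSMR}} = \frac{\mathcal{M}}{k}, \qquad \beta_{\mathrm{MSMR}} = \frac{e\mathcal{M}}{k(d-k+e)}.$$
MBMR. Interestingly, the MBMR point depends on whether $e$ divides $k$ or not.
If $e \mid k$, we obtain
$$\alpha_{\mathrm{MBMR}} = \frac{2d\mathcal{M}}{k(2d-k+e)}, \qquad \beta_{\mathrm{MBMR}} = \frac{2e\mathcal{M}}{k(2d-k+e)}.$$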
The amount of information downloaded for repair is equal to the amount of information stored at the replacement nodes. This property of the MBMR point is similar to the minimum bandwidth point in the single erasure case [dimakis2010network] and also the minimum bandwidth cooperative repair point [closed_form_cooperative_regene].
If , we obtain
This situation is novel for multiple erasures as the nodes need to store more than the overall downloaded information. This is an extra cost in order to achieve the low value of the repair bandwidth. However, later we will see that for both and , the total bandwidth at MBMR is equal to the entropy of the failed nodes (see Lemma 6 and Lemma 11):
where is the information stored in nodes.
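The extreme points can be evaluated with a few lines of exact arithmetic. The closed forms below, $(\alpha, \beta) = \big(\mathcal{M}/k,\ e\mathcal{M}/(k(d-k+e))\big)$ for MSMR and, when $e$ divides $k$, $\beta = 2e\mathcal{M}/(k(2d-k+e))$ with $e\alpha = d\beta$ for MBMR, are assumptions of this sketch: they are obtained by solving the cut of the scenario made of $k/e$ full repair groups. The case where $e$ does not divide $k$ is not covered, and the function names are hypothetical.

```python
from fractions import Fraction

def msmr_point(M, k, d, e):
    """Assumed MSMR point: minimum storage, then minimum bandwidth."""
    alpha = Fraction(M, k)
    beta = Fraction(e * M, k * (d - k + e))
    return alpha, beta

def mbmr_point_divisible(M, k, d, e):
    """Assumed MBMR point, only for the case where e divides k."""
    if k % e != 0:
        raise ValueError("this sketch only covers the case where e divides k")
    beta = Fraction(2 * e * M, k * (2 * d - k + e))
    alpha = d * beta / e          # downloaded data equals stored data: e*alpha = d*beta
    return alpha, beta

def full_group_cut(k, d, e, alpha, beta):
    """Cut of the scenario with k // e repair groups, each of size e."""
    return sum(min(e * alpha, (d - i * e) * beta) for i in range(k // e))

# M = 10, k = 4, d = 6, e = 2.
for point in (msmr_point(10, 4, 6, 2), mbmr_point_divisible(10, 4, 6, 2)):
    print(point, full_group_cut(4, 6, 2, *point))
```

For these parameters both points make the full-group cut equal to $\mathcal{M} = 10$, and the MBMR point satisfies $e\alpha = d\beta$, in line with the observation that the downloaded information equals the information stored at the replacement nodes.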
From the statement of Theorem 2, we note that if we only consider points between the MSMR and the MBMR points, then the scenario always generates the lowest cut.
We compare the centralized repair scheme repairing nodes to a separate strategy repairing each of the nodes separately using single erasure regenerating codes.
We fix and .
Case I: both strategies use helpers. The separate strategy generates a total bandwidth given by , while the centralized repair generates , where the subscript indicates the number of erasures. For simplicity, we assume that . The case can be treated in a similar way. For points on the multi-node repair tradeoff, we have
Consider a point with the same and on the single erasure tradeoff, we write
It follows that with equality if and only if . Moreover, we note that . Therefore, for any storage capacity , multi-node repair requires strictly less bandwidth than a separate strategy for the same number of helpers .
Case II: multi-node repair uses helpers, and separate repair uses helpers. In this case, the original number of available nodes that can serve as helpers is assumed to be , and erasures occur within the available nodes. Then a separate strategy may generate smaller bandwidth for some values of , as illustrated by Figure 3. However, when is sufficiently large, we observe that multi-node repair with helpers performs better than a separate strategy for all values of . Moreover, for the MSMR point, the separate repair bandwidth is , and the centralized repair bandwidth is . It follows that a centralized repair is always better than a lazy repair strategy; specifically, for ,
IV Exact MSMR codes constructions
In the remainder of the paper, we study exact repair. In this section, we analyze the case and then construct MSMR codes when . Later, we study the feasibility of MBMR codes and the interior points under exact repair.
IV-A Construction when
In the case of , the optimal tradeoff becomes a single point, so our MSMR construction in this section is also an MBMR code. The optimal parameters satisfy and . We note that the overall repair bandwidth and the reconstruction bandwidth are the same. Therefore, one can achieve and by dividing the data into packets and encoding them using an MDS code (for example, a Reed-Solomon code). The repair can be done by downloading the full content of any out of helpers while not contacting nodes. Such a repair is asymmetric in nature. Below, we describe one approach for achieving the repair with equal contribution from all helpers.
Divide the original file into symbols/packets (that is ) and encode them using an MDS code.
Store the encoded packets at nodes, such that each node stores encoded packets.
For reconstruction, from any nodes, we obtain different symbols. By virtue of the MDS property, we can reconstruct the data.
For repair, each helper node transmits any symbols. The replacement nodes receive different coded symbols, which are sufficient to reconstruct the whole data and thus regenerate the missing symbols.
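The four steps above can be sketched in a few lines of Python. This is only an illustration under assumed parameters: a file of $kd$ symbols, $d$ stored symbols per node, and $k$ transmitted symbols per helper. These counts, the prime field GF(257), the systematic Reed-Solomon encoding by polynomial evaluation, and all function names are assumptions made for the example and are not fixed by the construction.

```python
# Illustrative sketch (assumed parameters and names, not a reference implementation).
P = 257  # assumed prime field size; must exceed the total code length n*d


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through `points` [(xi, yi)] over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n, d):
    """Systematic MDS (Reed-Solomon) encoding; node i stores positions i*d .. i*d+d-1."""
    pts = list(enumerate(data))
    code = [lagrange_eval(pts, x) for x in range(n * d)]
    return [{x: code[x] for x in range(i * d, (i + 1) * d)} for i in range(n)]


def decode(symbols, num_data):
    """Recover the data from any num_data coded symbols given as {position: value}."""
    pts = list(symbols.items())[:num_data]
    return [lagrange_eval(pts, x) for x in range(num_data)]


def repair(nodes, failed, helpers, k, d):
    """Each helper sends k of its symbols; the k*d received symbols regenerate every failed node."""
    received = {}
    for h in helpers:
        for pos in list(nodes[h])[:k]:  # any k symbols of the helper would do
            received[pos] = nodes[h][pos]
    pts = list(received.items())
    return {f: {x: lagrange_eval(pts, x) for x in range(f * d, (f + 1) * d)}
            for f in failed}


# Assumed toy instance: n = 6 nodes, k = 2, d = 3 helpers, e = 3 erasures.
n, k, d = 6, 2, 3
data = [10, 20, 30, 40, 50, 60]  # k*d file symbols
nodes = encode(data, n, d)
repaired = repair(nodes, failed=[0, 1, 2], helpers=[3, 4, 5], k=k, d=d)
```

In this toy instance every helper contributes the same number of symbols, and the total download equals the file size, which matches the observation that the repair and reconstruction bandwidths coincide in this regime.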
The above procedure works for a specific predetermined . However, it can be generalized to support any value of satisfying . For instance, let (lcm denotes the least common multiple). The file of size is then divided into packets and encoded into with an MDS code. Each node then stores coded symbols. For repair with helpers, for any , each helper node transmits any of its coded symbols. Similarly, it can be seen that reconstruction is always feasible. Note that the constraint on the field size arises from the need for an MDS code. The field size needs to be no less than , e.g., when using Reed-Solomon codes.
IV-B Minimum storage codes framework
In the following sub-sections, we discuss an explicit MSMR code construction method using existing MSR codes designed for single failures. We first describe the general framework, and then present two specific codes.
The framework described in this section has been developed in [li2015enabling] for numerical simulations. We present it here in a formal way. Consider an instance of an exact linear MSR code, where . Consider nodes, indexed with , and other distinct nodes, indexed with , such that . Let and define . Consider the repair algorithm corresponding to failed node and helper nodes . We denote by the information sent by node to repair node , for helpers . We drop the superscript when it is clear from the context. The size of is symbols.
Now we construct an MSMR code. Upon failure of the nodes , the centralized node carrying the repair connects to the set of helpers . Each helper node transmits symbols given by One can check that the parameters of an MSMR code in (21) are satisfied with equality.
The approach consists in using the underlying MSR repair procedure for each of the failed nodes. Note that can be obtained from the helpers, for . To this end, we need to reconstruct for all , which we treat as unknowns. Let denote the encoding function used to encode the information sent from node to node . Also, let denote the decoding function used by the MSR code to repair node given information from helpers. Then, we write
where denotes the content of node , . Equation (28) generates linear equations in unknowns. Let be a vector containing the unknowns . Then, we seek to form a system of linear equations as
where is a known matrix and is a known vector. If is non-singular, one can thus recover . Then, the centralized node can recover the failed node as . We adopt the above approach throughout the section.
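The recovery step above amounts to solving a square linear system over the finite field of the code. As a minimal sketch, assuming a prime field GF($p$) and a hypothetical helper name `solve_mod_p`, Gauss-Jordan elimination either returns the unique solution or reports that the matrix is singular:

```python
def solve_mod_p(A, b, p):
    """Solve A x = b over GF(p) by Gauss-Jordan elimination.

    A is a square list of rows, b a list, p a prime (all assumptions of this sketch).
    Returns the unique solution as a list, or None if A is singular over GF(p).
    """
    n = len(A)
    # Augmented matrix [A | b], reduced modulo p.
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None  # no pivot: A is singular
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse (Python 3.8+)
        M[col] = [v * inv % p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % p for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]
```

For a field of prime-power order the same elimination applies, with the modular arithmetic replaced by the corresponding field operations.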
While the described framework applies to codes with arbitrary rates, we focus in the sequel on low-rate codes. For instance, for a target MSMR code with rate , the construction in [ye2016explicit] yields a storage size , while applying the above approach to IA codes or to PM codes results in a smaller storage size .
IV-C Product-matrix codes
In this section, we construct MSMR codes for any erasures based on product-matrix (PM) codes [Rashmi_Product_Matrix]. The PM framework allows design of MBR codes for any value of and design of MSR codes for . Moreover, the PM construction enjoys simple encoding and decoding and ensures optimal repair of all nodes. Product-matrix MSR codes are a family of scalar MSR codes, i.e., , designed for parameters . We first focus on the case . Under this setup, . The codeword is represented by an code matrix such that its row corresponds to the symbols stored by the node. Each code matrix is given by
where is an encoding matrix and is an message matrix, and are symmetric matrices constructed such that the entries in the upper-triangular part of each of the two matrices are filled up by distinct message symbols. is an matrix and is an diagonal matrix. The elements of should satisfy:
any rows of are linearly independent;
any rows of are linearly independent;
the diagonal elements of are distinct.
The above conditions may be met by choosing to be a Vandermonde matrix, in which case its row is given by . It follows that . In the following, we assume that is a Vandermonde matrix.
Repair of a single erasure in PM codes. The single erasure repair algorithm [Rashmi_Product_Matrix] is reviewed below. Let denote the content stored at a failed node. Let be the row of . Then, . Let denote the set of helpers. Each helper transmits to the replacement node, who obtains , where Note that is invertible by construction. Thus, using the symmetry of and , we obtain . We can then reconstruct .
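The single erasure repair can be checked numerically on a small instance. The sketch below assumes $k = 3$, so that $\alpha = k - 1 = 2$ and $d = 2k - 2 = 4$, with $n = 6$ nodes over the prime field GF(13), a Vandermonde encoding matrix with evaluation points $1, \dots, 6$, and arbitrary symmetric message matrices. All of these choices and all function names are illustrative assumptions.

```python
# Illustrative sketch (assumed toy parameters): PM-MSR single-node repair over GF(13).
P = 13
k = 3
alpha, d, n = k - 1, 2 * (k - 1), 6
xs = [1, 2, 3, 4, 5, 6]  # assumed evaluation points; x**alpha must be distinct mod P

phi = [[pow(x, t, P) for t in range(alpha)] for x in xs]  # rows of Phi
lam = [pow(x, alpha, P) for x in xs]                      # diagonal of Lambda
psi = [[pow(x, t, P) for t in range(d)] for x in xs]      # rows of Psi = [Phi, Lambda Phi]
S1 = [[1, 2], [2, 3]]  # symmetric message matrices (arbitrary example values)
S2 = [[4, 5], [5, 6]]


def solve_mod_p(A, b):
    """Solve A x = b over GF(P) by Gauss-Jordan elimination (A assumed non-singular)."""
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col])
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, P)
        M[col] = [v * inv % P for v in M[col]]
        for r in range(m):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % P for vr, vc in zip(M[r], M[col])]
    return [row[m] for row in M]


def node_content(i):
    """Row i of the code matrix: phi_i^T S1 + lambda_i * phi_i^T S2."""
    return [(sum(phi[i][r] * S1[r][c] for r in range(alpha))
             + lam[i] * sum(phi[i][r] * S2[r][c] for r in range(alpha))) % P
            for c in range(alpha)]


def helper_symbol(j, f):
    """Helper j sends the inner product of its content with phi_f (one symbol)."""
    return sum(cj * pf for cj, pf in zip(node_content(j), phi[f])) % P


def repair(f, helpers):
    """Recover node f from d helper symbols."""
    received = [helper_symbol(j, f) for j in helpers]
    # Invert the d x d Vandermonde submatrix to obtain v = [S1 phi_f ; S2 phi_f].
    v = solve_mod_p([psi[j] for j in helpers], received)
    # By symmetry of S1 and S2, the content is (S1 phi_f)^T + lambda_f (S2 phi_f)^T.
    return [(v[c] + lam[f] * v[alpha + c]) % P for c in range(alpha)]
```

Each helper sends a single symbol, so the replacement node downloads $d$ symbols in total to regenerate $\alpha$ stored symbols.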
Repair of multiple erasures in PM codes. Given the symmetry of PM codes, we can assume without loss of generality that nodes in have failed. Define . Let . The centralized node connects to nodes . Each helper node transmits .
Let . Our goal is to express explicitly and as in (29).
Consider the repair of node by the set of helpers in . From the previous subsection, we write