Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient
We consider a setup where a file of size $M$ is stored in $n$ distributed storage nodes, using an $(n,k)$ minimum storage regenerating (MSR) code, i.e., a maximum distance separable (MDS) code that also allows efficient exact repair of any failed node. The MDS property ensures that the original file can be reconstructed even if any $n-k$ storage nodes fail. When a node fails, a new node collects data from the remaining healthy nodes and repairs the failed node. The problem of interest in this paper is to minimize the repair bandwidth $B$ for exact regeneration of the failed node, i.e., the minimum amount of data to be downloaded by the new node to replace the failed node by its exact replica. Previous work has shown that, with random network coding, a bandwidth of $B = \frac{M(n-1)}{k(n-k)}$ is necessary and sufficient for functional (not exact) regeneration, i.e., when the repaired node need not be identical to the failed node, but only information equivalent to it. It has also been shown, using interference alignment based techniques, that if $k \le \max(n/2, 3)$ then, surprisingly, there is no extra cost of exact regeneration over functional regeneration, and the same repair bandwidth of $\frac{M(n-1)}{k(n-k)}$ suffices for exact regeneration. The practically relevant setting of low redundancy, i.e., $k/n > 1/2$, remains open for $k > 3$, and it has been shown that there is an extra bandwidth cost for exact repair over functional repair in this case. In this work, we adapt to the distributed storage context an asymptotically optimal interference alignment scheme previously proposed by Cadambe and Jafar for large wireless interference networks. With this scheme we solve the problem of repair bandwidth minimization for $(n,k)$ exact-MSR codes for all $(n,k)$ values, including the previously open case of $k > \max(n/2, 3)$. Our main result is that, for any $(n,k)$ and sufficiently large file sizes, there is no extra cost of exact regeneration over functional regeneration in terms of the repair bandwidth per bit of regenerated data. More precisely, we show that $B/M \to \frac{n-1}{k(n-k)}$ as $M \to \infty$.
The result is analogous to the wireless interference channel setting where exact interference alignment through linear beamforming is seen to be infeasible for more than users, but almost perfect alignment is achieved asymptotically by the Cadambe-Jafar scheme over a large number of signaling dimensions for any number of users.
The problem of interest in this paper is to minimize the bandwidth required to exactly repair failed nodes in distributed storage systems. It is well known that maximum distance separable (MDS) codes can be used to reliably store data in distributed storage nodes. To see this, consider a scenario where a file of size $M$ is to be stored in $n$ distributed storage nodes. The file is split into $k$ equal parts of size $M/k$ and stored in the first $k$ storage nodes, also known as systematic nodes. The remaining $n-k$ nodes, known as parity nodes or non-systematic nodes, store data of the same size, i.e., $M/k$, adding redundancy to protect against failure of storage nodes. The parity nodes are designed so that a failure of up to $n-k$ storage nodes can be tolerated, i.e., the original file can be completely recovered from the data stored at any $k$ nodes out of the original $n$ nodes. Clearly, for this problem, storing the data using an $(n,k)$ MDS code suffices to achieve the required reconstruction criterion, since an $(n,k)$ MDS code protects the data from $n-k$ erasures. Now, consider the case where only $1$ node fails, and a new node is introduced to replace the failed node. The total amount of data to be downloaded by the new node to regenerate a single failed node will henceforth be referred to as the repair bandwidth. Clearly, a repair bandwidth of $M$ suffices to repair a failed node, since the new node can download data of total size $M$ from any $k$ of the remaining $n-1$ healthy nodes to reconstruct the failed node. However, note the inherent inefficiency in this solution: to reconstruct a node of size $M/k$, the newcomer downloads data of size $M$, i.e., $k$ times the size of the data to be repaired. A question of interest is whether this inefficiency is fundamental, or whether the node can be repaired with the newcomer downloading data of size less than $M$. More specifically, the question of interest in this paper is: what is the minimum repair bandwidth required to repair a failed node?
The question of minimum repair bandwidth has been studied previously from two perspectives [1, 2, 3, 4, 5]. The first is called functional regeneration [1, 2] and the second is called exact (or systematic) regeneration [3, 4, 5].
The functional regeneration problem requires the new node to replace the failed node by a function of the data, so that the reconstructed new node, along with the other nodes, satisfies the property of being an $(n,k)$ MDS code. In other words, the repaired node is information equivalent to the originally stored data. Note that in the functional regeneration problem, the data stored by the repaired node need not be identical to the data stored by the failed node; all that is required is that the repaired node, along with the other nodes, forms an $(n,k)$ MDS code. This problem has been shown to be equivalent to finding the capacity of a particular wired single-source multicast network. Since network coding achieves the cut-set bound in a single-source multicast network, the functional regeneration problem has been solved, and it is shown in [1] that the minimum bandwidth required is $\frac{M(n-1)}{k(n-k)}$. Note that since $\frac{M(n-1)}{k(n-k)}$ is smaller than $M$ (for $1 < k < n-1$), the solution trivially implies that reconstruction of a single failed node requires a smaller repair bandwidth than reconstruction of $n-k$ failed nodes, by a factor of $\frac{n-1}{k(n-k)}$.
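The gap between the naive repair strategy and the cut-set bound for functional repair can be made concrete with a few lines of arithmetic. The sketch below assumes the standard minimum-storage cut-set expression $\frac{M(n-1)}{k(n-k)}$, with all $n-1$ healthy nodes helping in the repair; the function name and the sample $(n,k)$ pairs are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def functional_repair_bandwidth(M, n, k):
    """Cut-set bound M(n-1)/(k(n-k)) on the repair bandwidth at the
    minimum-storage point, assuming all n-1 healthy nodes help."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    return Fraction(M * (n - 1), k * (n - k))


# Illustrative (n, k) pairs; M = k(n-k) is chosen only so that the
# bound comes out as an integer number of symbols.
for n, k in [(4, 2), (6, 3), (10, 7), (14, 10)]:
    M = k * (n - k)
    bound = functional_repair_bandwidth(M, n, k)
    # The naive strategy downloads the whole file, i.e. M symbols.
    print(f"(n,k)=({n},{k}): naive={M}, cut-set={bound}, ratio={bound / M}")
```

For example, for a $(4,2)$ code storing $M = 4$ symbols, the naive strategy downloads 4 symbols while the bound is 3, which matches the three linear combinations used in the alignment example of Section II.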
The focus of this paper is on the exact (or systematic) regeneration problem, where the newcomer is required to replace the failed node by a replica, i.e., an identical copy of the failed node. Since exact regeneration ensures that a failed systematic node is replaced by a systematic node, the systematic structure of data storage is retained. Preserving the systematic structure has a practical advantage: it ensures easy access to the data for a client, since the client can simply download it from the systematic nodes without any decoding. Note that the constraints for systematic or exact regeneration are stricter than those of the functional regeneration problem. Since any solution to the exact regeneration problem is also a solution to the functional regeneration problem, $\frac{M(n-1)}{k(n-k)}$ serves as a lower bound on the minimum repair bandwidth for the exact repair problem. However, whether a repair bandwidth of $\frac{M(n-1)}{k(n-k)}$ suffices in general has been an open question. It is this open question that is the focus of this paper.
I-A Related Work and Summary of Contributions
The exact regeneration problem was formulated and solved for the special case of in . The solution was further extended to the more general case of in [4, 5]. The results in all these cases yield the same surprising conclusion: there is no price for exact regeneration over functional regeneration, and a repair bandwidth of suffices even for exact regeneration. The solution for these cases stems from drawing parallels between the exact regeneration problem and the wireless interference channel . Such parallels enable the use of the interference management technique of interference alignment [6, 7] for the exact regeneration problem. However, prior to this work, as far as we are aware, little was known about the minimum repair bandwidth for . From a practical perspective, note that the previously unsolved case of is important because this case corresponds to the amount of parity data (i.e., number of parity nodes) being smaller than the original file size (number of systematic nodes). This case is briefly studied in reference  where for , it is shown that the lower bound of cannot be achieved using linear codes. The main contribution of this paper is to make progress in this open problem drawing inspiration from the interference alignment solution for the user wireless interference channel in . We argue that, while that lower bound on the repair bandwidth of may not be sufficient in general as noted in , the repair bandwidth per bit of repaired data can indeed achieve this lower bound in the limit of large file sizes for any . More precisely, we show that
i.e., for any , with sufficiently large amount of data, there is no cost of exact repair over functional repair in terms of repair bandwidth per bit of repaired data. An interesting insight of our solution is that the size of the symbol extension in wireless interference channels is analogous to the file size of our solution. Reference  shows that the optimal number of degrees of freedom of the interference channel cannot be achieved with finite symbol extensions (using linear schemes), but can only be achieved asymptotically in the limiting case of arbitrarily large symbol extensions. This is analogous to our result combined with that of reference . The reference shows that the bound of cannot be achieved exactly. Here, we complete the analogy between file size and symbol extensions in our main result by showing that while the bound is not exactly achievable, it is achievable asymptotically in the limit of large file sizes. We state our main result formally below.
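Theorem 1. Consider any tuple $(n,k)$ such that $n > k$. For a file of size $M$ stored in $n$ distributed storage nodes as a part of an $(n,k)$ MDS code, the minimum repair bandwidth $B$ for exact regeneration of a (single) failed node satisfies
$$B = \frac{M(n-1)}{k(n-k)} + o(M).$$
Equivalently, we can write
$$\lim_{M \to \infty} \frac{B}{M} = \frac{n-1}{k(n-k)}.$$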
II The Role of Interference Alignment in Exact Regeneration
Consider the case where the . Further, let the file to be stored be where are vectors over a finite field of size denoted by .
Note that for sufficiently large file sizes, the field size is a design parameter. For the solution for presented here, any prime suffices. Also by defining to be vectors, we are assuming that the file-size , i.e., 4 scalars over the field. For large file sizes, can be treated as a design parameter, since a code for a specified can be used for larger files by splitting the file into portions of size .
As in Figure 2, the first systematic node stores and the second systematic node stores , Let the parity nodes store vectors of the form , where are matrices. Now, consider the case where and
Note that can be reconstructed from any of the nodes. Now, consider the case where the first node fails. First, we present a naive solution which does not align interference. Note that the contents of the node, i.e. can be reconstructed by a new comer downloading linear combinations (or equations) from any other two nodes. For example, it can be reconstructed by downloading the vector from node and the vector from node . In this case, note that among the dimensions (corresponding to the linear equations) at the new comer, two dimensions are occupied by the data to be reconstructed, i.e., the desired data and two dimensions are occupied by the undesired data or interference .
However, a more efficient solution exists. By aligning the interference into dimension, we can see that the node can be repaired by downloading only linear combinations of the stored data from the remaining healthy nodes. To see this, consider the case where the new comer downloads a total of linear combinations of the stored data, one from each remaining healthy node, as follows.
the interference aligns into dimension at the new comer. Further, since
and is linearly independent of the aligned interference , the data storage node can reconstruct the desired two dimensional vector from the received dimensional vector. The code can be shown to exactly repair any failed node with a repair bandwidth of , i.e., with the new comer collecting linear combinations from the healthy nodes. We now proceed to extend this for general values of and prove our main result.
III Proof of Theorem 1
We begin by generalizing the setting described in the previous section. The total data is represented by the dimensional matrix , where is an dimensional vector stored by systematic node . Node , where being a parity node stores the vector , where is a square matrix for Henceforth, we assume that for ,
The above assumption implies that the data stored in node is the vector
Note that for are a design choice that define the code; these matrices will henceforth be referred to as the coding matrices. We need to choose these matrices so that the code is an MDS code, i.e., using any subset of nodes, the entire vector of data must be reconstructable. Thus, we need to ensure that
for any distinct .
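The MDS requirement above can be checked mechanically for a small code. The following is a minimal sketch, assuming a systematic code whose parity nodes use diagonal coding matrices over a prime field GF(p); the helper names (`rank_mod_p`, `node_rows`, `is_mds`), the 0-based node indexing, and the two toy $(4,2)$ codes are hypothetical choices made for illustration, not the code of Section II.

```python
from itertools import combinations


def rank_mod_p(rows, p):
    """Rank of a matrix over GF(p), p prime, by Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def node_rows(j, k, L, coding):
    """L x kL matrix mapping the file to the content of node j (0-based).
    coding[j][i] lists the diagonal of the block applied to part i."""
    rows = []
    for l in range(L):
        row = [0] * (k * L)
        if j < k:  # systematic node: identity block in position j
            row[j * L + l] = 1
        else:      # parity node: one diagonal block per file part
            for i in range(k):
                row[i * L + l] = coding[j][i][l]
        rows.append(row)
    return rows


def is_mds(n, k, L, coding, p):
    """True if every set of k nodes determines the whole file."""
    return all(
        rank_mod_p([r for j in nodes for r in node_rows(j, k, L, coding)], p) == k * L
        for nodes in combinations(range(n), k)
    )


p = 5
good = {2: [[1, 1], [1, 2]], 3: [[1, 1], [2, 3]]}  # toy (4,2) code, L = 2
bad = {2: [[1, 1], [1, 2]], 3: [[1, 1], [1, 3]]}   # the two parities collide
print(is_mds(4, 2, 2, good, p), is_mds(4, 2, 2, bad, p))
```

In the second code, the two parity nodes store the same combination in their first coordinate, so that pair of nodes cannot recover the file and the check fails.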
Now, when a node fails, the new comer collects a vector from each of the remaining healthy nodes where , so that the total repair bandwidth is . Our goal is to find the coding matrices and design the vector to be downloaded by the new comer so as to meet the required bound (presented in the statement of the theorem). We now describe our solution assuming that a systematic node fails. We will later describe how the solution can be adapted to repair failures of parity nodes. Without loss of generality, let us assume that node fails. We provide a linear solution to this problem, so that the vector downloaded by the new comer from node to repair node is , where is a matrix. The matrices will be henceforth referred to as the repair vectors. The new comer now has to regenerate the vector using vectors of the form , each of dimension . Notice that the vectors (of dimension ) downloaded using the systematic nodes do not contain any information about the desired vector and can be interpreted as interference. Therefore, the new comer has, apart from the interference, vectors of dimension containing linear combinations of the desired data. Thus, the vectors available at the new comer can be described as follows.
vectors of the form - these vectors are downloaded from the healthy systematic nodes. They contain no information about the desired data, and will be used to cancel interference.
vectors of the - these vectors contain both the desired signal and components of the interference.
The goal of our solution will be to completely cancel the interference from the latter vectors using the former vectors listed above, and then to regenerate using the latter vectors. In order to completely cancel the interference related to using by linear techniques, we will need, , and for some matrix ,
While the above condition ensures that the entire interference can be cancelled, we also need to ensure that, on interference cancellation, the vectors of dimension are sufficient to reconstruct . Note that after interference cancellation, each of the vectors is of the form , for . For linear reconstruction, we need
for some matrix . Therefore, we need
Therefore, our goal is to design and for so that
The code is an MDS code.
The interference is aligned appropriately so that it can be completely cancelled.
The desired signal can be regenerated at the new comer.
Thus, essentially we need to pick and for so that (1), (4) and (5) are satisfied. Further, as noted in Remark 1, the field size and are also design choices (for large file sizes) that we can use to satisfy these conditions.
III-A The solution : Choosing and
For , the solutions of [4, 5] design these matrices using Cauchy matrices to satisfy these conditions. Here, note that the conditions (4), (5) are similar to the interference alignment conditions in the interference channel . Specifically, (4) is analogous to the condition that all the interference must align in the -user interference channel, and (5) is similar to the condition that the desired signal must be linearly independent for linear decoding in the interference channel . These parallels with enable us to build a solution based on the asymptotically perfect interference alignment scheme of the same reference.
On noting that there are alignment equations in (4), like in [8], we choose and where can be any integer. (The intuition for these choices of and will hopefully become clear later in this section for a reader unfamiliar with [8].) For any value of , we show the existence of a field size , matrices and so that (1), (4), (5) are satisfied and the failed node can be repaired. Finally, we show that our code can be used to repair non-systematic nodes as well. Before we proceed to give a random coding based construction of the coding matrices and repair vectors, we evaluate the repair bandwidth achieved by our scheme. Noting that our construction is applicable for any value of , we can make , a design parameter, arbitrarily large. As , we have and
We now proceed to explain our construction of coding matrices and repair vectors satisfying the constraints of repair (1), (4), (5). Our solution, unlike those in references [4, 5], is a random coding solution. Specifically, we choose the coding matrices randomly. We then provide an expression for as a (random) function of so that (4) is satisfied. Then we show, for large field size , that (1) and (5) are satisfied with a non-zero probability. This implies that there exists at least one choice of coding matrices so that all the desired conditions, i.e., (1), (4), (5), are satisfied.
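The repair-bandwidth evaluation earlier in this subsection can be illustrated numerically. The sketch below rests on an assumed reading of the scheme in the style of the Cadambe-Jafar counting: $N = (k-1)(n-k)$ alignment conditions, a design integer `delta`, a node size of $(n-k)\,\Delta^N$ symbols, $\Delta^N$ symbols downloaded from each parity node, and $(\Delta+1)^N$ symbols downloaded from each surviving systematic node. These symbol names and counts are assumptions made for this example.

```python
from fractions import Fraction


def scheme_parameters(n, k, delta):
    """File size M and repair bandwidth B under the assumed counting."""
    N = (k - 1) * (n - k)                         # alignment conditions
    node_size = (n - k) * delta ** N              # symbols per node, M/k
    from_parity = (n - k) * delta ** N            # desired data + interference
    from_systematic = (k - 1) * (delta + 1) ** N  # used to cancel interference
    return k * node_size, from_parity + from_systematic


def bandwidth_per_bit(n, k, delta):
    M, B = scheme_parameters(n, k, delta)
    return Fraction(B, M)


n, k = 6, 4
target = Fraction(n - 1, k * (n - k))  # cut-set value (n-1)/(k(n-k))
for delta in (1, 10, 100, 1000):
    ratio = bandwidth_per_bit(n, k, delta)
    print(delta, float(ratio), float(ratio - target))
```

Under these assumptions the excess over the cut-set value is $\frac{k-1}{k(n-k)}\left[(1+1/\Delta)^N - 1\right]$, which is positive for every finite $\Delta$ and vanishes as $\Delta$ grows. This matches the statement that the bound is approached asymptotically but not met with equality.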
Design of Coding Matrices,
The alignment constraints, (4), are similar to the alignment constraints for the interference channel (See equation (50) in ). Note that the matrices play a role analogous to channel matrices in wireless interference channels . Drawing inspiration from , we choose the dimensional matrices to be random diagonal matrices with each diagonal entry of each matrix chosen independently and uniformly distributed over the non-zero elements of the field . In other words, we choose
with all the diagonal entries chosen independently of each other and independently of all the diagonal entries of all other coding matrices, i.e., with chosen independent of from the non-zero elements of the field, for all or or , where , and . Note that all the coding matrices are full rank since all the diagonal elements are non-zero. We later show that this code is an MDS code with non-zero probability.
Design of Repair Vectors,
Here, we provide a set of repair vectors that satisfy (4). We first set the columns of vectors (which are analogous to beamforming vectors in interference channels)
where, and are dimensional matrices. Then the relations (4) can be re-written as
for Note that there are conditions contained in (7). We wish to find so that all these conditions are satisfied.
Intuitive understanding of asymptotic alignment: Before we provide precise expressions for , we will intuitively explain the extent of alignment required to satisfy (4), (5). Since our bandwidth is restricted by , we need and . Further, noting that (5) implies , we get . Therefore must have at least non-zero linearly independent columns. In order to satisfy (7), the span of the non-zero column vectors of the matrix
should align in the space spanned by the column vectors of . For large values of , since , and all the coding matrices have a full rank of , we have for any . From (7) this implies that . In other words, the alignment between the matrices on the left hand side of the relations indicated by (7) is asymptotically perfect for large . Next we return to the mathematical construction of the alignment scheme.
Following the arguments of [8, 9], we choose the set of non-zero column vectors of as shown below. (For convenience, we abuse notation in these equations: the quantity on the left denotes the matrix, whereas the quantity on the right denotes only the set of non-zero columns of the matrix.)
where the entries of the column vector are chosen uniformly over the non-zero elements of the field and independent of all the coding matrices.
Thus, the elements of contain products of (diagonal) coding matrices corresponding to interference symbols contained in the parity nodes, with each matrix raised to an exponent that is allowed to take integer values from up to . Since there are coding matrices and possible distinct values for the exponent of each matrix, the total number of elements, i.e., column vectors, in is . Similarly, the total number of column vectors in is . To understand the notation better, consider, e.g., the case where . Then, , i.e., just one column vector, and contains all the vectors of the form
where . For any general value of , the columns of are of the form
where and has columns of the form
where . Note that the ordering of the matrices in the above notation is irrelevant, since the coding matrices, being diagonal, commute. This commuting property is the key to the alignment scheme. Because the ordering of matrices is irrelevant, it is readily verified that multiplying any column vector from by any of the involved, produces a column vector contained in . This is because multiplication by simply raises the corresponding exponent of the element in by one, but the elements of already include all such terms. Since the set of columns of is a sub-set of the columns of for any , it is evident that this choice of repair vectors satisfies (7), and equivalently, (4).
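The alignment argument above can be checked on a toy instance. The sketch below represents each diagonal coding matrix by its diagonal and builds two column sets, called `V` and `Vp` here, from exponent tuples. It assumes exponents ranging over $1,\dots,\Delta$ for the smaller set and $1,\dots,\Delta+1$ for the larger one; any shifted range would work the same way. The prime `p`, the seed, the parameter values and the function names are all hypothetical.

```python
from itertools import product
import random


def alignment_sets(diags, w, delta, p):
    """Return (V, Vp) as sets of column vectors (tuples) over GF(p).
    diags: list of N diagonals, one per coding matrix; w: seed column."""
    def column(exps):
        col = list(w)
        for d, e in zip(diags, exps):
            # Multiply by the e-th power of a diagonal matrix, entrywise.
            col = [(c * pow(x, e, p)) % p for c, x in zip(col, d)]
        return tuple(col)

    N = len(diags)
    V = {column(e) for e in product(range(1, delta + 1), repeat=N)}
    Vp = {column(e) for e in product(range(1, delta + 2), repeat=N)}
    return V, Vp


def times_diag(d, col, p):
    """Apply a diagonal matrix (given by its diagonal d) to a column."""
    return tuple((x * c) % p for x, c in zip(d, col))


rng = random.Random(0)
p, n, k, delta = 101, 4, 3, 2
N = (k - 1) * (n - k)     # number of interfering coding matrices (assumed)
L = (n - k) * delta ** N  # vector length (assumed node size)
diags = [[rng.randrange(1, p) for _ in range(L)] for _ in range(N)]
w = [rng.randrange(1, p) for _ in range(L)]

V, Vp = alignment_sets(diags, w, delta, p)
aligned = all(times_diag(d, v, p) in Vp for d in diags for v in V)
print(aligned, V <= Vp)
```

Because diagonal matrices commute, multiplying a column of `V` by any coding matrix only raises one exponent by one, so the result is again a column of `Vp`. The check holds for every draw of the diagonals, not just this seed.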
We have now chosen coding matrices and repair vectors so that the alignment constraints (4) are satisfied. We now need to show (1) and (5). In order to show that the matrices of (1) and (5) are full rank, it is enough to show that their determinants are non-zero. Notice that the determinant of the matrix of (1), i.e.,
is a polynomial in its entries. Note that there are polynomials of this kind, which can be represented, for , as
denotes all the diagonal entries of the coding matrices. In the appendix, we show that each of these polynomials is a non-zero polynomial.
Similarly, we need to show (5), i.e.,
where has non-zero columns chosen using (8). Using these non-zero columns (i.e., discounting the columns of which are zero) the above matrix is of dimension . Therefore, to show that this square matrix has a full rank of , we need to show that its determinant is non-zero. Since a determinant is a polynomial function of its entries, the determinant expansion above is a polynomial
where . An argument very similar to Lemma 1 of  can be used to show that the polynomial formed by this matrix for our solution is a non-zero polynomial (See also Appendix III in ). Thus, the product is a non-zero polynomial of . Using the Schwartz-Zippel Lemma, for large enough , we have at least one choice of coding matrices and repair vectors such that these polynomials do not evaluate to zero, and therefore a solution exists so that (1) and (5) are satisfied.
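The counting behind the Schwartz-Zippel step can be seen on a small example. The lemma states that a non-zero polynomial of total degree $d$ in $m$ variables has at most $d\,|S|^{m-1}$ zeros on a grid $S^m$. The sketch below verifies this exhaustively for a hypothetical degree-2 determinant over the non-zero elements of GF(7); the polynomial and the field size are illustrative choices, not quantities from the proof.

```python
from itertools import product


def count_zeros(poly, num_vars, values, p):
    """Count points of values^num_vars where poly vanishes modulo p."""
    return sum(1 for pt in product(values, repeat=num_vars) if poly(*pt) % p == 0)


p = 7
nonzero = range(1, p)  # entries are drawn from the non-zero field elements


def det(x, y):
    """Determinant of the hypothetical matrix [[x, 1], [1, y]]."""
    return x * y - 1


zeros = count_zeros(det, 2, nonzero, p)
total = len(nonzero) ** 2
bound = 2 * len(nonzero) ** (2 - 1)  # degree * |S|^(m-1)
print(zeros, total, bound)  # prints "6 36 12"
```

Since the number of zeros grows like $|S|^{m-1}$ while the number of points grows like $|S|^m$, a large enough field always leaves at least one point where the polynomial is non-zero.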
Repair of Non-Systematic (Parity) nodes
So far, we have discussed an achievable scheme for regenerating a systematic node. The code constructed here can also be used to regenerate a failed parity node in the same manner. To see this, suppose that a parity node, say node , fails. The new comer intends to regenerate . Let
Since the code is an MDS code, using a change of basis, we can write
where are all diagonal. In other words, a change of basis can essentially transform the regeneration of a parity node to appear like regeneration of a systematic node, i.e., with nodes viewed as systematic nodes storing data ; nodes are viewed as parity nodes using coding matrices . Since all the coding matrices are diagonal, the problem can be solved in a similar manner as above, i.e, the vectors downloaded by the new comer can be constructed as in (9),(8) and be verified to satisfy a property similar to (4). The only thing that remains is to verify if a reconstruction criterion similar to (5) is satisfied. In order to show this, it is enough to show that all the (random) diagonal entries of any new coding matrix for are uniformly distributed over the non-zero entries of the field which are independent of each other and independent of all diagonal entries in the other new coding matrices, much like our original construction. Showing this independence property will ensure that our earlier proof of (5) is applicable. In order to show this independence property, we explicitly evaluate for as follows.
Now, note that if and are two diagonal matrices with their diagonal entries drawn independently and uniformly from the non-zero elements of the field, then each of the matrices and has diagonal entries uniformly distributed over the non-zero elements of the field. Further, each of these matrices is independent of . This implies that all the diagonal entries of are distributed independently of each other, and uniformly over the non-zero elements of . Also, this property can be used in (11) to verify that all the entries of any coding matrix are independent of all the entries of any other new coding matrix for or . For example, is distributed independently of since the entries of are independent of in our original code construction. Thus, the basis transformation preserves the required independence criteria and a property similar to (5) holds. This completes the proof.
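The distributional fact used above can be verified exhaustively for a single diagonal entry. The sketch below assumes the two matrices in question are a product and a quotient of the original diagonal matrices, so each entry has the form $ab$ or $ab^{-1}$ with $a$, $b$ independent and uniform over the non-zero elements of GF(p). The field size and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

p = 5
nonzero = range(1, p)

joint_quotient = Counter()  # counts of (a * b^{-1}, b)
joint_product = Counter()   # counts of (a * b, b)
for a, b in product(nonzero, repeat=2):
    joint_quotient[((a * pow(b, -1, p)) % p, b)] += 1
    joint_product[((a * b) % p, b)] += 1


def uniform_and_independent(joint):
    """Every (value, b) pair occurs exactly once, so the joint law is
    uniform on the grid: the value is uniform and independent of b."""
    return len(joint) == (p - 1) ** 2 and set(joint.values()) == {1}


print(uniform_and_independent(joint_quotient), uniform_and_independent(joint_product))
```

The reason is that, for fixed $b$, multiplication by $b$ or by $b^{-1}$ is a bijection on the non-zero field elements, which is the same argument that carries over to each diagonal entry of the transformed coding matrices.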
We have shown that, per bit of data to be reconstructed, surprisingly, there is no loss of exact regeneration over functional regeneration in terms of the amount of repair bandwidth per bit of repaired data, in the limit of large file sizes, regardless of the desired redundancy level. The result is in contrast with previous work in  where it is shown that there is an efficiency loss for exact regeneration over functional regeneration especially for low redundancy levels. However, note that the two results do not contradict each other. While our asymptotic alignment scheme can approach arbitrarily close to the cut-set bound on minimum repair bandwidth per bit of repaired data, the bound is not achieved with exact equality. Also unlike previous work in [4, 5] we do not provide explicit codes or specify the minimum field size, since our arguments are based on properties of random matrices. Directions for ongoing work include interference alignment solutions for exact repair for each point on the storage-bandwidth tradeoff curve.
Appendix A Proof of (1)
We intend to show that the determinant of the matrix in (1) is a non-zero polynomial in its entries. Assuming, without loss of generality, that are in ascending order, let and . Therefore, we need to show that the determinant of the following matrix is a non-zero polynomial of its entries.
we want the following matrix to be full rank.
Therefore, we essentially need to show that the determinant formed by the above matrix is non-zero. Since the first matrix is the identity matrix, expanding the determinant along the first rows, the determinant can be shown to be equal to the determinant of the following matrix.
We need to show that the determinant of the matrix is non-zero. Note that we have
where each is independent of for or or . Since interchanging the rows or columns of a matrix does not change its determinant except for its sign, we make the row and column exchange operations to simplify . Let the rows of be . Now, we only need to show that the determinant of is non-zero where
where is a permutation. Now, further, let be the columns of . We then perform column exchange operations of to get the matrix where, is also a permutation. Now, that the determinant of is non-zero is equivalent to showing that the determinant of is non-zero. Choosing the permutations as
it can be verified that the matrix has a block diagonal structure, with blocks of size . The th block of is
Since the determinant of a block diagonal matrix is a product of the determinant of each of its blocks, and the determinant of the square matrix formed by the above block is a non-zero polynomial of its entries, the determinant of the matrix in (1) is a non-zero polynomial of its entries, as required.
- [1] A. Dimakis, P. Godfrey, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” in IEEE INFOCOM, pp. 2000–2008, May 2007.
- [2] Y. Wu, “Existence and construction of capacity-achieving network codes for distributed storage,” in IEEE International Symposium on Information Theory, pp. 1150–1154, June 28 – July 3, 2009.
- [3] Y. Wu and A. Dimakis, “Reducing repair traffic for erasure coding-based storage via interference alignment,” in IEEE International Symposium on Information Theory, pp. 2276–2280, June 28 – July 3, 2009.
- [4] C. Suh and K. Ramchandran, “Exact regeneration codes for distributed storage repair using interference alignment,” CoRR, vol. abs/1001.0107, 2010. http://arxiv.org/abs/1001.0107.
- [5] N. B. Shah, R. K. V., P. V. Kumar, and K. Ramchandran, “Explicit codes minimizing repair bandwidth for distributed storage,” CoRR, vol. abs/0908.2984, 2009. http://arxiv.org/abs/0908.2984.
- [6] S. Jafar and S. Shamai, “Degrees of freedom region for the MIMO X channel,” IEEE Trans. on Information Theory, vol. 54, pp. 151–170, Jan. 2008.
- [7] M. Maddah-Ali, A. Motahari, and A. Khandani, “Communication over MIMO X channels: Interference alignment, decomposition, and performance analysis,” IEEE Trans. on Information Theory, pp. 3457–3470, 2008.
- [8] V. Cadambe and S. Jafar, “Interference alignment and the degrees of freedom of the K user interference channel,” IEEE Trans. on Information Theory, vol. 54, pp. 3425–3441, Aug. 2008.
- [9] V. R. Cadambe and S. Jafar, “Reflections on interference alignment and the degrees of freedom of the K user interference channel,” IEEE Information Theory Society Newsletter, vol. 59, pp. 5–9, Dec. 2009.
- [10] V. R. Cadambe and S. A. Jafar, “Interference alignment and the degrees of freedom of wireless X networks,” IEEE Trans. on Information Theory, vol. 55, no. 9, pp. 3893–3908, Sep. 2009.