A Note on Secure Minimum Storage Regenerating Codes
This short note revisits the problem of designing secure minimum storage regenerating (MSR) codes for distributed storage systems. A secure MSR code ensures that a distributed storage system does not reveal the stored information to a passive eavesdropper. The eavesdropper is assumed to have access to the content stored on $\ell_1$ storage nodes in the system and the data downloaded during the bandwidth efficient repair of an additional $\ell_2$ storage nodes. This note combines the Gabidulin codes based precoding  and a new construction of MSR codes (without security requirements) by Ye and Barg  in order to obtain secure MSR codes. Such optimal secure MSR codes were previously known in the setting where the eavesdropper was only allowed to observe the repair of nodes among a specific subset of nodes [18, 7]. The secure coding scheme presented in this note allows the eavesdropper to observe repair of any $\ell_2$ out of $n$ nodes in the system and characterizes the secrecy capacity of linear repairable MSR codes.
Consider a distributed storage system that stores a file of size $\mathcal{M}$ (symbols over a finite field $\mathbb{F}$) on a network of $n$ storage nodes such that the file can be reconstructed from the content of any $k$ out of $n$ nodes in the system. In [3], Dimakis et al. study the issue of recovering the content stored in a node by downloading a small amount of data from the remaining nodes in the system. This problem is referred to as the node repair problem. The ability to conduct node repair is useful in maintaining the redundancy level of the system in the event of a node failure. Moreover, the content of a temporarily unavailable node can be accessed using the rest of the (available) nodes in the system by treating the unavailable node as a failure and invoking the mechanism to repair this node. Dimakis et al. introduce repair-bandwidth, the number of symbols downloaded to repair a node, as a metric to quantify the efficiency of the node repair mechanism [3]. Assuming that each node stores $\alpha$ symbols (over $\mathbb{F}$) and the node repair mechanism requires contacting $d$ storage nodes and downloading $\beta$ symbols from each of the contacted nodes, [3] presents the following fundamental trade-off between the repair-bandwidth $d\beta$ and per node storage $\alpha$.
$$\mathcal{M} \leq \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}. \tag{1}$$
The codes that operate at this trade-off are referred to as regenerating codes [3]. In particular, the codes corresponding to the minimum storage point, i.e., $\alpha = \mathcal{M}/k$, are called minimum storage regenerating (MSR) codes. Note that an MSR code is an MDS vector code  and operates at the following point on the trade-off defined by the bound in (1).
$$(\alpha, d\beta) = \left(\frac{\mathcal{M}}{k}, \frac{d\,\mathcal{M}}{k(d-k+1)}\right). \tag{2}$$
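As a quick numerical check of the trade-off and of the minimum storage point, the following is a minimal sketch. It assumes the standard notation of Dimakis et al. (file size M, per-node storage alpha, per-helper download beta, any k nodes suffice for reconstruction, d helper nodes per repair). The function names and the parameter values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def cutset_bound(k, d, alpha, beta):
    """Largest file size allowed by the cut-set bound
    M <= sum_{i=0}^{k-1} min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(file_size, k, d):
    """Per-node storage and per-helper download at the minimum storage point."""
    alpha = Fraction(file_size, k)
    beta = alpha / (d - k + 1)
    return alpha, beta


# Illustrative parameters (assumed, not taken from the note): k = 4, d = 6, M = 12.
alpha, beta = msr_point(12, k=4, d=6)
bound = cutset_bound(4, 6, alpha, beta)
```

With these values the bound is met with equality, and lowering beta any further makes the bound drop below the file size, which is what makes this the minimum storage operating point.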
An MSR code is said to be exact-repairable if its repair mechanism ensures reconstruction of the data that is identical to the content stored on the node being repaired. The exact-repairable MSR codes form an attractive class of coding schemes as they preserve the structure of the storage system throughout the operation of the system. The problem of designing exact-repairable MSR codes has been extensively studied in [16, 2, 14, 24, 20, 17, 19, 5] and references therein. Recently, Ye and Barg present explicit constructions for exact-repairable MSR codes for all values of system parameters , and in [27, 28]. These constructions enable repair of all nodes in the system as opposed to some of the earlier constructions (e.g. the constructions from [24, 2]) which enable bandwidth-efficient repair only for a particular set of (systematic) nodes.
In this document, we address the issue of designing distributed storage systems that protect the stored information against eavesdropping attacks. Given increasing utilization of distributed storage systems (a.k.a. cloud storage) for storing valuable and confidential information, it is important that these systems prevent leakage of the stored information to an unauthorized and (or) adversarial agent. In this paper, we present a coding scheme that is information theoretically secure against an eavesdropper who can observe the data downloaded during the repair of storage nodes and access the content stored on (additional) nodes. The secure coding scheme enables exact-repair of all nodes in the system with and operates at the MSR point. The scheme is obtained by combining Gabidulin codes based precoding scheme  with a code construction presented in . We note that the obtained coding scheme characterizes the secrecy capacity  of those distributed storage systems that employ linear repair mechanisms and operate at the MSR point.
2 Background and related work
In this section we formally define the underlying eavesdropper model and the associated secrecy capacity. We then present a brief description of two key components of our secure coding schemes: 1) Gabidulin precoding scheme and 2) a code construction from . We conclude the section with a discussion on the prior work in the area of designing secure coding schemes for distributed storage systems.
2.1 System model
We consider a DSS with storage nodes where each node stores symbols over a finite field . We assume that the DSS employs a coding scheme such that content of any out of nodes in the system is sufficient to construct the content stored in the remaining nodes. Furthermore, we assume that the content of every node in the system can be exactly reconstructed by contacting any out of remaining nodes and downloading symbols (over ) from each of the contacted nodes. It follows from the Singleton bound that such a system can store a file with at most (independent) symbols (over ). In fact, an MDS coding scheme does store a file of size symbols (over ). For such coding schemes, it follows from the work of Dimakis et al.  that
Here, we focus on the DSS employing those exact-repairable coding schemes that are both storage and repair-bandwidth efficient, i.e., we have that
2.2 Eavesdropper model and secrecy capacity
We consider the -eavesdropper model introduced in . Let the nodes in the DSS be indexed by the set . For and such that , an -eavesdropper can directly access the content stored on any storage nodes indexed by the set . Additionally, the eavesdropper observes the data downloaded during the repair of any storage nodes indexed by the set . The nodes indexed by the sets and are referred to as storage-eavesdropped and download-eavesdropped nodes, respectively. Note that a download-eavesdropped node may reveal more information compared to a storage-eavesdropped node as the content stored on a node is a function of the data downloaded during its repair. In this document we focus on coding schemes that are information theoretically secure against an -eavesdropper. We formalize this notion in the following definition.
Let be a secure file of size symbols (over ). We say that the DSS securely stores against an -eavesdropper, if we have
Here, denotes the observations of an eavesdropper with its storage-eavesdropped and download-eavesdropped nodes indexed by the sets and , respectively. Equivalently, we also say that the DSS achieves a secure file size .
The secrecy capacity of a DSS against an -eavesdropper is defined as the maximum secure file size achieved by the DSS. In other words, the secrecy capacity of a DSS denotes the maximum sized secure file that it can store without leaking any information to an -eavesdropper.
As described in Sections 2.1 and 2.2, we aim to store a secure file of size symbols (over ) in a DSS that stores symbols (over ) without any security guarantees. Moreover, the DSS is required to operate at the MSR point which is defined by the parameters given in (4). Towards this we utilize random symbols (over ). The following lemma from [26, 22] allows us to argue that the proposed coding scheme is secure against an -eavesdropper.
Let be the secure file that needs to be stored on a DSS and be random symbols (independent of ). Let be the observations of an eavesdropper with its storage-eavesdropped and download-eavesdropped nodes indexed by the sets and , respectively. If we have and , then
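A toy instance may help fix ideas. The lemma's conditions are not legible in this text, so the sketch below assumes the form in which such secrecy lemmas are usually stated: if the observations carry no more entropy than the random symbols, and the random symbols are determined by the secure file together with the observations, then the observations are independent of the secure file. The example is a one-time pad on a single bit; the variable names are hypothetical, and entropies are computed by brute-force enumeration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of equally likely outcomes."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())


# All equally likely (s, r) pairs: one secure bit s and one random bit r.
outcomes = list(product([0, 1], repeat=2))
# Toy eavesdropper observation e = s XOR r (a one-time pad).
triples = [(s, r, s ^ r) for s, r in outcomes]

h_e = entropy([e for _, _, e in triples])
h_r = entropy([r for _, r, _ in triples])
h_se = entropy([(s, e) for s, _, e in triples])
# H(R | S, E) = H(S, R, E) - H(S, E)
h_r_given_se = entropy(triples) - h_se
# I(S; E) = H(S) + H(E) - H(S, E)
mutual_info = entropy([s for s, _, _ in triples]) + h_e - h_se
```

Both assumed conditions hold here (the observation has one bit of entropy, as does the random symbol, and the random bit is recoverable from the secure bit and the observation), and the computed mutual information is zero.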
2.3.1 Gabidulin precoding
Given a vector and points which are linearly independent (over a subfield of ), the Gabidulin precoding of is obtained in two steps.
First, construct a linearized polynomial with the vector defining its coefficients as follows.
where, for a positive integer , we use to denote .
Evaluate the linearized polynomial at the given set of points to obtain the associated Gabidulin precoded vector
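The two steps can be sketched in a few lines. This is only an illustration under stated assumptions: the field is taken to be GF(2^4) with subfield GF(2) (so the exponents are powers of 2), elements are encoded as 4-bit integers modulo x^4 + x + 1, and the function names and sample values are hypothetical.

```python
M = 4                 # extension degree: work in GF(2^4), with q = 2 (assumed toy field)
MODULUS = 0b10011     # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^4), each encoded as a 4-bit integer."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return result


def linearized_eval(coeffs, y):
    """Evaluate f(y) = sum_i coeffs[i] * y^(2^i) in GF(2^4)."""
    acc, power = 0, y
    for c in coeffs:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, power)   # y^(2^i) -> y^(2^(i+1))
    return acc


def gabidulin_precode(coeffs, points):
    """Evaluate the linearized polynomial at each of the given points."""
    return [linearized_eval(coeffs, y) for y in points]


message = [3, 7, 9]          # coefficient vector (illustrative values)
points = [1, 2, 4, 8]        # 1, x, x^2, x^3: linearly independent over GF(2)
codeword = gabidulin_precode(message, points)
```

Two properties matter for the security argument later in the note. The evaluation map is additive over the subfield, so evaluations at linear combinations of points can be formed from evaluations at the points themselves. Evaluations at as many linearly independent points as there are coefficients determine the coefficient vector uniquely.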
2.3.2 Ye and Barg construction 
In , Ye and Barg present multiple code constructions for the MSR codes for all values of , and . These are the first fully explicit constructions of this nature. Here, we briefly describe one of the constructions from  which we utilize to construct secure coding schemes at the MSR point. Similarly, other constructions from  can also be utilized to obtain secure coding schemes.
For an element ,
denotes the -ary vector representation of the element . For , and , denotes the element with the following -ary vector representation.
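The digit-vector notation can be made concrete with a small helper. This is a sketch under assumptions: the base and the number of digits are passed explicitly because their symbols are not legible here, digits are stored least significant first, coordinates are indexed from 1, and all function names are hypothetical.

```python
def to_digits(a, base, length):
    """Return the base-`base` digits of a, least significant first, padded to `length`."""
    digits = []
    for _ in range(length):
        a, digit = divmod(a, base)
        digits.append(digit)
    return digits


def from_digits(digits, base):
    """Inverse of to_digits: rebuild the integer from least-significant-first digits."""
    return sum(d * base ** j for j, d in enumerate(digits))


def replace_digit(a, i, u, base, length):
    """Element whose digit vector equals that of a except that digit i (1-indexed) is u."""
    digits = to_digits(a, base, length)
    digits[i - 1] = u
    return from_digits(digits, base)


# Example in base 3 with 3 digits: 14 = 1*9 + 1*3 + 2.
example_digits = to_digits(14, base=3, length=3)
replaced = replace_digit(14, i=2, u=0, base=3, length=3)
```

Replacing a digit by its current value returns the element unchanged, which is a convenient sanity check when indexing conventions are in doubt.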
Let be a field with and be its primitive element. An MSR code with is defined by the following parity check matrix.
where denotes the identity matrix. For , is an matrix which is defined as follows.
where denotes addition modulo , and . Here, denotes the collection of standard basis vectors in , i.e., all but -th coordinate of the vector are equal to zero and the -th coordinate has its entry equal to .
Let denote a codeword of the MSR code defined by the parity check matrix (cf. (8)), i.e.,
For , we have that
which denotes the symbols stored on the -th storage node in the system. In , Ye and Barg show that the code defined by is an MDS array code, i.e.,
and for any codeword and any set , the symbols are sufficient to reconstruct the entire codeword . Furthermore, Ye and Barg establish that the code is an MSR code with , i.e., for any and , the symbols stored on the -th node can be reconstructed by downloading symbols (over ) from each of the remaining nodes. In particular, for , can be reconstructed by downloading the following symbols.
Similarly, can be reconstructed by downloading the following symbols.
We refer the reader to  for further details of the construction.
2.4 Related work
Pawar et al. formally begin the study of the problem of designing coding schemes for DSS that are secure against passive eavesdropping attacks in . For a distributed storage system that has per node storage and requires downloading symbols from intact nodes during the repair of a failed node, Pawar et al. obtain the following upper bound on its secrecy capacity .
Recall that we are only considering those distributed storage systems where the content of any out of storage nodes is sufficient to reconstruct the entire stored information. In , Shah et al. utilize the product-matrix construction  for minimum bandwidth regenerating (MBR) codes to construct coding schemes that are secure against an -eavesdropper for all values of and such that . These coding schemes operate at and attain the bound on the secrecy capacity in (13). Shah et al. also utilize the product-matrix construction for MSR codes to design secure MSR coding schemes that achieve a secure file size of
against an -eavesdropper . Note that, for , there is a gap between the bound in (13) and the secure file size achieved in (14). In , Rawat et al. obtained an improved bound on the secure file size achievable at the MSR point.
where denotes the set of download-eavesdropped nodes and denotes the data sent by the -th node for the repair of the storage nodes indexed by the set . Furthermore, for linear repair schemes with and , the bound in (15) specializes to the following .
In , Goparaju et al. show that the bound in (16) holds for all values of . They further generalize this bound and show that for linear repair schemes with , the secure file size achievable at the MSR point satisfies the following .
As for the achievability schemes, for , Rawat et al. obtain a secure coding scheme at the MSR point that attains the bound in (16) provided that, whenever , download-eavesdropped nodes are restricted to a fixed set of nodes among the nodes in the system. This coding scheme is obtained by combining the Gabidulin precoding (cf. Section 2.3.1) with the zigzag codes from . We also note that for , the secure coding scheme from  is optimal as it attains the bound in (15). Recently, Huang et al. further explore the problem of characterizing the secrecy capacity of MSR codes in . For , the secure file size in (14) is shown to be optimal [8, 21]. Huang et al. show the optimality of the bound in (14) for the MSR codes with bounded values of . The problem of obtaining bounds on the secrecy capacity of distributed storage systems is also studied in  under the non black-box version of the problem. For brevity, we skip a discussion on this and refer the reader to [25, 6].
In this paper, we establish that for linear repairable DSS with , the bound in (16) is the exact characterization of the secrecy capacity of an MSR code. One of the code constructions of MSR codes from  (cf. Section 2.3.2) enables us to remove the restriction appearing in the secure coding scheme from  that the download-eavesdropped nodes be restricted to a subset of nodes. As for the possibility of attaining a larger secure file size by utilizing non-linear repair schemes, Goparaju et al. show Pareto optimality of the linear repairable MSR codes among those MSR codes that simultaneously allow for all values of during the design of a secure coding scheme operating at the MSR point .
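To see how the quantities discussed in this subsection compare numerically, here is a rough sketch. The displayed expressions are not legible in this text, so the three formulas below are assumptions based on how these results are commonly stated in the cited literature: the upper bound of Pawar et al., the secure file size of the product-matrix MSR scheme of Shah et al., and the bound for linear repair schemes in which all remaining nodes act as helpers. The function names and parameter values are hypothetical.

```python
from fractions import Fraction


def pawar_upper_bound(k, d, alpha, beta, l1, l2):
    """Assumed form: sum_{i=l1+l2+1}^{k} min(alpha, (d - i + 1) * beta)."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(l1 + l2 + 1, k + 1))


def product_matrix_msr_size(k, alpha, beta, l1, l2):
    """Assumed form of the secure file size of Shah et al.: (k - l1 - l2)(alpha - l2 * beta)."""
    return (k - l1 - l2) * (alpha - l2 * beta)


def linear_msr_bound(n, k, alpha, l1, l2):
    """Assumed form for d = n - 1: (k - l1 - l2) * (1 - 1/(n - k))^l2 * alpha."""
    return (k - l1 - l2) * (1 - Fraction(1, n - k)) ** l2 * alpha


# Illustrative MSR parameters (assumed): n = 7, k = 4, d = n - 1 = 6,
# alpha = 9, beta = alpha / (d - k + 1) = 3, with l1 = 1 and l2 = 2.
n, k, d, alpha, beta, l1, l2 = 7, 4, 6, 9, 3, 1, 2
general_bound = pawar_upper_bound(k, d, alpha, beta, l1, l2)
product_matrix = product_matrix_msr_size(k, alpha, beta, l1, l2)
linear_bound = linear_msr_bound(n, k, alpha, l1, l2)
```

Under these assumptions the three values are strictly ordered for the chosen parameters, which illustrates the gap that the discussion above refers to.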
The cooperative regenerating codes enable simultaneous bandwidth efficient repair of multiple node failures [23, 10]. Security of DSS employing cooperative regenerating codes against passive eavesdropping attacks is explored in [11, 9]. Locally repairable codes (LRCs) is another class of codes designed to be employed in distributed storage systems [4, 13]. These codes aim at repairing a failed node by contacting a small number of surviving nodes in the system. We note that the problem of designing secure locally repairable codes against passive eavesdropping attacks is considered in [18, 1].
3 Secure MSR codes
In this section we present a linear repairable coding scheme that operates at the MSR point with and achieves the optimal secure file size in this setting.
Let and be given system parameters. Let be a secure file of size
symbols (over a finite field ). We assume that we have , where (cf. (18)) and . Let and . We now generate a coding scheme that securely stores the file against an -eavesdropper in the following two-step process.
Let denote i.i.d. random symbols (independent of ) that are uniformly distributed over . We take linearly independent points (over ) and perform Gabidulin precoding of the vector as defined in Section 2.3.1. Let denote the precoded vector, i.e.,
where is the linearized polynomial associated with the vector as defined in (5), i.e.,
For , the -th storage node stores the symbols in the subvector (cf. (22)).
The repairability of the proposed coding scheme with the repair-bandwidth symbols (over ) follows from the repairability of the code (cf. Section 2.3.2). Next, we argue that the proposed coding scheme is secure against an -eavesdropper. Towards this, we present the following simple lemma.
Let . For , let denote the symbols downloaded from the -th storage node during the repair of the storage nodes indexed by the set . Then, we have
The coding scheme described in Construction 2 is secure against an -eavesdropper.
The proof of this proposition is very similar to the proof of [18, Theorem 18]. Let and denote the indices of the storage-eavesdropped and download-eavesdropped nodes, respectively. Consider a set of storage nodes such that . Let denote the symbols observed by the eavesdropper. Note that
Since is a generator matrix of an MDS array code, it follows from [18, Lemma 9] that the symbols in the set
correspond to the evaluations of the linearized polynomial (cf. (20)) at
linearly independent (over ) points in . Note that the symbols in (26) can be obtained from the symbols observed by the eavesdropper (cf. (26)). Given the secure file , one can remove the contribution of from these evaluations of to obtain the evaluations of the following polynomial at the linearly independent (over ) points in .
where the last equality holds as the first coordinates of the vector are composed of random symbols (cf. Construction 2). Now, it is straightforward from [18, Remark 8] that these evaluations are sufficient to recover the coefficients of the linearized polynomial . In other words, we have that
Since the choice of and is arbitrary, this establishes that the coding scheme obtained by Construction 2 is secure against an -eavesdropper. ∎
We characterize the secrecy capacity of linear repairable MSR codes with against a passive eavesdropping attack, where the eavesdropper is allowed to observe repair of storage nodes in addition to the content stored on storage nodes. One of the code constructions for MSR codes from  proves instrumental in establishing this result. It is an interesting question to establish similar results for general values of . Another direction for future work is to characterize the secrecy capacity of minimum storage cooperative regenerating (MSCR) codes.
[1] A. Agarwal and A. Mazumdar. Security in locally repairable storage. In Proc. of 2015 IEEE Information Theory Workshop (ITW), pages 1–5, April 2015.
[2] V. R. Cadambe, C. Huang, and J. Li. Permutation code: Optimal exact-repair of a single failed node in MDS code based distributed storage systems. In Proc. of 2011 IEEE International Symposium on Information Theory (ISIT), pages 1225–1229, 2011.
[3] A. G. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Transactions on Information Theory, 56(9):4539–4551, 2010.
[4] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin. On the locality of codeword symbols. IEEE Transactions on Information Theory, 58(11):6925–6934, 2012.
[5] S. Goparaju, A. Fazeli, and A. Vardy. Minimum storage regenerating codes for all parameters. CoRR, abs/1602.04496, 2016.
[6] S. Goparaju, S. El Rouayheb, and R. Calderbank. Can linear minimum storage regenerating codes be universally secure? In Proc. of 49th Asilomar Conference on Signals, Systems and Computers, pages 549–553, Nov 2015.
[7] S. Goparaju, S. El Rouayheb, R. Calderbank, and H. V. Poor. Data secrecy in distributed storage systems under exact repair. In Proc. of 2013 International Symposium on Network Coding (NetCod), pages 1–6, June 2013.
[8] K. Huang, U. Parampalli, and M. Xian. Characterization of secrecy capacity for general MSR codes under passive eavesdropping model. CoRR, abs/1505.01986, 2015.
[9] K. Huang, U. Parampalli, and M. Xian. Security concerns in minimum storage cooperative regenerating codes. CoRR, abs/1509.01324, 2015.
[10] A.-M. Kermarrec, N. Le Scouarnec, and G. Straub. Repairing multiple failures with coordinated and adaptive regenerating codes. In Proc. of 2011 International Symposium on Network Coding (NetCod), pages 1–6, 2011.
[11] O. O. Koyluoglu, A. S. Rawat, and S. Vishwanath. Secure cooperative regenerating codes for distributed storage systems. IEEE Transactions on Information Theory, 60(9):5228–5244, Sept 2014.
[12] F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes. Amsterdam: North-Holland, 1983.
[13] D. S. Papailiopoulos and A. G. Dimakis. Locally repairable codes. IEEE Transactions on Information Theory, 60(10):5843–5855, Oct 2014.
[14] D. S. Papailiopoulos, A. G. Dimakis, and V. Cadambe. Repair optimal erasure codes through Hadamard designs. IEEE Transactions on Information Theory, 59(5):3021–3037, 2013.
[15] S. Pawar, S. El Rouayheb, and K. Ramchandran. Securing dynamic distributed storage systems against eavesdropping and adversarial attacks. IEEE Transactions on Information Theory, 57(10):6734–6753, 2011.
[16] K. V. Rashmi, N. B. Shah, and P. V. Kumar. Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE Transactions on Information Theory, 57(8):5227–5239, Aug 2011.
[17] N. Raviv, N. Silberstein, and T. Etzion. Access-optimal MSR codes with optimal sub-packetization over small fields. CoRR, abs/1505.00919, 2015.
[18] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath. Optimal locally repairable and secure codes for distributed storage systems. IEEE Transactions on Information Theory, 60(1):212–236, Jan 2014.
[19] A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath. Progress on high-rate MSR codes: Enabling arbitrary number of helper nodes. CoRR, abs/1601.06362, 2016.
[20] B. Sasidharan, G. K. Agarwal, and P. V. Kumar. A high-rate MSR code with polynomial sub-packetization level. CoRR, abs/1501.06662, 2015.
[21] B. Sasidharan, P. V. Kumar, N. B. Shah, K. V. Rashmi, and K. Ramchandran. Optimality of the product-matrix construction for secure MSR regenerating codes. In Proc. of 6th International Symposium on Communications, Control and Signal Processing (ISCCSP), pages 10–14, May 2014.
[22] N. B. Shah, K. V. Rashmi, and P. V. Kumar. Information-theoretically secure regenerating codes for distributed storage. In Proc. of 2011 IEEE Global Telecommunications Conference (GLOBECOM), pages 1–5, 2011.
[23] K. W. Shum and Y. Hu. Cooperative regenerating codes. IEEE Transactions on Information Theory, 59(11):7229–7258, 2013.
[24] I. Tamo, Z. Wang, and J. Bruck. Zigzag codes: MDS array codes with optimal rebuilding. IEEE Transactions on Information Theory, 59(3):1597–1616, 2013.
[25] R. Tandon, S. Amuru, T. C. Clancy, and R. M. Buehrer. Toward optimal secure distributed storage systems with exact repair. IEEE Transactions on Information Theory, 62(6):3477–3492, June 2016.
[26] A. Wyner. The wire-tap channel. The Bell System Technical Journal, 54(8):1355–1387, October 1975.
[27] M. Ye and A. Barg. Explicit constructions of high-rate MDS array codes with optimal repair bandwidth. CoRR, abs/1604.00454, 2016.
[28] M. Ye and A. Barg. Explicit constructions of optimal-access MDS codes with nearly optimal sub-packetization. CoRR, abs/1605.08630, 2016.