A Note on Secure Minimum Storage Regenerating Codes
Abstract
This short note revisits the problem of designing secure minimum storage regenerating (MSR) codes for distributed storage systems. A secure MSR code ensures that a distributed storage system does not reveal the stored information to a passive eavesdropper. The eavesdropper is assumed to have access to the content stored on a number of storage nodes in the system and to the data downloaded during the bandwidth-efficient repair of an additional number of storage nodes. This note combines the Gabidulin-code-based precoding [18] with a new construction of MSR codes (without security requirements) by Ye and Barg [27] in order to obtain secure MSR codes. Such optimal secure MSR codes were previously known only in the setting where the eavesdropper was allowed to observe the repair of nodes within a specific subset of nodes [18, 7]. The secure coding scheme presented in this note allows the eavesdropper to observe the repair of an arbitrary set of nodes in the system (of the permitted size) and characterizes the secrecy capacity of linear repairable MSR codes.
1 Introduction
Consider a distributed storage system that stores a file of size $\mathcal{M}$ (symbols over a finite field $\mathbb{F}$) on a network of $n$ storage nodes such that the file can be reconstructed from the content of any $k$ out of $n$ nodes in the system. In [3], Dimakis et al. study the issue of recovering the content stored in a node by downloading a small amount of data from the remaining nodes in the system. This problem is referred to as the node repair problem. The ability to conduct node repair is useful in maintaining the redundancy level of the system in the event of a node failure. Moreover, the content of a temporarily unavailable node can be accessed using the rest of the (available) nodes in the system by treating the unavailable node as a failure and invoking the mechanism to repair this node. Dimakis et al. introduce repair-bandwidth, the number of symbols downloaded to repair a node, as a metric to quantify the efficiency of the node repair mechanism [3]. Assuming that each node stores $\alpha$ symbols (over $\mathbb{F}$) and the node repair mechanism requires contacting $d$ storage nodes and downloading $\beta$ symbols from each of the contacted nodes, [3] presents the following fundamental tradeoff between the repair-bandwidth $d\beta$ and the per-node storage $\alpha$.
(1) $\mathcal{M} \leq \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}$
The codes that operate at this tradeoff are referred to as regenerating codes [3]. In particular, the codes corresponding to the minimum storage point, i.e., $\alpha = \mathcal{M}/k$, are called minimum storage regenerating (MSR) codes. Note that an MSR code is an MDS vector code [12] and operates at the following point on the tradeoff defined by the bound in (1).
(2) $(\alpha, d\beta) = \left(\frac{\mathcal{M}}{k}, \frac{d\,\mathcal{M}}{k(d-k+1)}\right)$
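As a small numerical illustration of the storage versus repair-bandwidth tradeoff, the following sketch evaluates the cut-set bound of [3] in its usual form, with file size M, per-node storage alpha, d helper nodes and beta symbols downloaded per helper. The function names and the toy parameters are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def cutset_bound(k, d, alpha, beta):
    """Right-hand side of the cut-set bound: sum_{i=0}^{k-1} min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(file_size, k, d):
    """Minimum storage point: alpha = M / k and beta = alpha / (d - k + 1)."""
    alpha = Fraction(file_size, k)
    beta = alpha / (d - k + 1)
    return alpha, beta


# Hypothetical toy parameters (not taken from the note).
M, k, d = 12, 3, 5
alpha, beta = msr_point(M, k, d)

repair_bandwidth = d * beta                   # symbols downloaded to repair one node
supported = cutset_bound(k, d, alpha, beta)   # largest file size the bound allows
```

With these parameters the bound is met with equality, and the repair-bandwidth is strictly smaller than the file size, which is the saving that regenerating codes offer over downloading the whole file.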
An MSR code is said to be exact-repairable if its repair mechanism ensures reconstruction of data that is identical to the content stored on the node being repaired. Exact-repairable MSR codes form an attractive class of coding schemes as they preserve the structure of the storage system throughout the operation of the system. The problem of designing exact-repairable MSR codes has been extensively studied in [16, 2, 14, 24, 20, 17, 19, 5] and references therein. Recently, Ye and Barg presented explicit constructions of exact-repairable MSR codes for all values of the system parameters in [27, 28]. These constructions enable repair of all nodes in the system, as opposed to some of the earlier constructions (e.g., the constructions from [24, 2]), which enable bandwidth-efficient repair only for a particular set of (systematic) nodes.
In this paper, we address the issue of designing distributed storage systems that protect the stored information against eavesdropping attacks. Given the increasing utilization of distributed storage systems (a.k.a. cloud storage) for storing valuable and confidential information, it is important that these systems prevent leakage of the stored information to an unauthorized and/or adversarial agent. We present a coding scheme that is information-theoretically secure against an eavesdropper who can observe the data downloaded during the repair of storage nodes and access the content stored on (additional) nodes. The secure coding scheme enables exact repair of all nodes in the system with and operates at the MSR point. The scheme is obtained by combining the Gabidulin-code-based precoding scheme [18] with a code construction presented in [27]. We note that the obtained coding scheme characterizes the secrecy capacity [15] of those distributed storage systems that employ linear repair mechanisms and operate at the MSR point.
2 Background and related work
In this section we formally define the underlying eavesdropper model and the associated secrecy capacity. We then present a brief description of two key components of our secure coding scheme: 1) the Gabidulin precoding scheme and 2) a code construction from [27]. We conclude the section with a discussion of the prior work on designing secure coding schemes for distributed storage systems.
2.1 System model
We consider a DSS with storage nodes where each node stores symbols over a finite field . We assume that the DSS employs a coding scheme such that the content of any out of nodes in the system is sufficient to reconstruct the content stored in the remaining nodes. Furthermore, we assume that the content of every node in the system can be exactly reconstructed by contacting any out of the remaining nodes and downloading symbols (over ) from each of the contacted nodes. It follows from the Singleton bound that such a system can store a file with at most (independent) symbols (over ). In fact, an MDS coding scheme does store a file of size symbols (over ). For such coding schemes, it follows from the work of Dimakis et al. [3] that
(3) 
Here, we focus on DSS employing those exact-repairable coding schemes that are both storage and repair-bandwidth efficient, i.e., we have that
(4) 
2.2 Eavesdropper model and secrecy capacity
We consider the eavesdropper model introduced in [22]. Let the nodes in the DSS be indexed by the set . For and such that , an eavesdropper can directly access the content stored on any storage nodes indexed by the set . Additionally, the eavesdropper observes the data downloaded during the repair of any storage nodes indexed by the set . The nodes indexed by the sets and are referred to as storage-eavesdropped and download-eavesdropped nodes, respectively. Note that a download-eavesdropped node may reveal more information than a storage-eavesdropped node, as the content stored on a node is a function of the data downloaded during its repair. In this paper we focus on coding schemes that are information-theoretically secure against an eavesdropper. We formalize this notion in the following definition.
Definition 1.
Let be a secure file of size symbols (over ). We say that the DSS securely stores against an eavesdropper, if we have
Here, denotes the observations of an eavesdropper with its storage-eavesdropped and download-eavesdropped nodes indexed by the sets and , respectively. Equivalently, we also say that the DSS achieves a secure file size .
Remark 1.
The secrecy capacity of a DSS against an eavesdropper is defined as the maximum secure file size achieved by the DSS. In other words, the secrecy capacity of a DSS denotes the maximum sized secure file that it can store without leaking any information to an eavesdropper.
2.3 Preliminaries
As described in Sections 2.1 and 2.2, we aim to store a secure file of size symbols (over ) in a DSS that stores symbols (over ) without any security guarantees. Moreover, the DSS is required to operate at the MSR point, which is defined by the parameters given in (4). To this end, we utilize random symbols (over ). The following lemma from [26, 22] allows us to argue that the proposed coding scheme is secure against an eavesdropper.
Lemma 1 (Secrecy Lemma [26, 22]).
Let be the secure file that needs to be stored on a DSS and be random symbols (independent of ). Let be the observations of an eavesdropper with its storage-eavesdropped and download-eavesdropped nodes indexed by the sets and , respectively. If we have and , then
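The following toy computation illustrates the secrecy lemma in the form in which it is commonly stated, which is an assumption here: if the random symbols r can be recovered from the secure file s together with the eavesdropper's observations e, and the entropy of e does not exceed that of r, then the mutual information between s and e is zero. The one-time-pad style encoding and all names below are hypothetical and are not the scheme of this note.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of equally likely samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def mutual_information(pairs):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for equally likely (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return entropy(xs) + entropy(ys) - entropy(pairs)


# s and r are independent and uniform over two-bit values.
outcomes = list(product(range(4), range(4)))

# Observation e = s XOR r: r is recoverable from (s, e) and H(e) <= H(r).
secure_view = [(s, s ^ r) for s, r in outcomes]

# Observation e = s XOR (low bit of r): r is NOT recoverable from (s, e).
leaky_view = [(s, s ^ (r & 1)) for s, r in outcomes]

leak_secure = mutual_information(secure_view)
leak_leaky = mutual_information(leaky_view)
```

In the first case both hypotheses hold and nothing leaks. In the second case the recoverability hypothesis fails and the eavesdropper learns one bit of s, which is why both conditions need to be checked for the scheme in Section 3.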
2.3.1 Gabidulin precoding
Given a vector and points which are linearly independent (over a subfield of ), the Gabidulin precoding of is obtained in two steps.

First, construct a linearized polynomial with the vector defining its coefficients as follows.
(5) where, for a positive integer , we use to denote .

Second, evaluate the linearized polynomial at the given set of points to obtain the associated Gabidulin precoded vector
(6)
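The two steps above can be sketched in a few lines. The example below makes several assumptions that are not fixed by the text: the field is GF(2^4) with modulus x^4 + x + 1, the subfield is GF(2), and the i-th monomial of the linearized polynomial is the 2^i-th power of its argument. The function names, the coefficient vector and the evaluation points are hypothetical.

```python
def gf_mul(a, b, modulus=0b10011, m=4):
    """Multiply two elements of GF(2^m), represented as bit vectors."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):   # reduce modulo the field polynomial
            a ^= modulus
    return result


def frobenius_power(y, i):
    """Return y^(2^i) by squaring i times."""
    for _ in range(i):
        y = gf_mul(y, y)
    return y


def linearized_eval(coeffs, y):
    """Evaluate f(y) = sum_i coeffs[i] * y^(2^i) over GF(2^4)."""
    acc = 0
    for i, c in enumerate(coeffs):
        acc ^= gf_mul(c, frobenius_power(y, i))
    return acc


def gabidulin_precode(coeffs, points):
    """Step 1: coeffs define the polynomial. Step 2: evaluate it at the points."""
    return [linearized_eval(coeffs, y) for y in points]


coeffs = [3, 7, 9]    # hypothetical message vector over GF(16)
points = [1, 2, 4]    # 1, x, x^2: linearly independent over GF(2)
precoded = gabidulin_precode(coeffs, points)
```

The property that the security argument relies on is that the evaluation map is linear over the subfield, so evaluations at subfield-linear combinations of the points carry no new information beyond the evaluations at the points themselves.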
2.3.2 Ye and Barg construction [27]
In [27], Ye and Barg present multiple constructions of MSR codes for all values of the system parameters. These are the first fully explicit constructions of this nature. Here, we briefly describe one of the constructions from [27], which we utilize to construct secure coding schemes at the MSR point. Other constructions from [27] can similarly be utilized to obtain secure coding schemes.
Construction 1.
For an element ,
denotes the ary vector representation of the element . For , and , denotes the element with the following ary vector representation.
(7) 
Let be a field with and be its primitive element. An MSR code with is defined by the following parity check matrix.
(8) 
where denotes the identity matrix. For , is an matrix which is defined as follows.
(9) 
where denotes addition modulo , and . Here, denotes the collection of standard basis vectors in , i.e., all but th coordinate of the vector are equal to zero and the th coordinate has its entry equal to .
Let denote a codeword of the MSR code defined by the parity check matrix (cf. (8)), i.e.,
(10) 
For , we have that
which denotes the symbols stored on the th storage node in the system. In [27], Ye and Barg show that the code defined by is an MDS array code, i.e.,
and for any codeword and any set , the symbols are sufficient to reconstruct the entire codeword . Furthermore, Ye and Barg establish that the code is an MSR code with , i.e., for any and , the symbols stored on the th node can be reconstructed by downloading symbols (over ) from each of the remaining nodes. In particular, for , can be reconstructed by downloading the following symbols.
(11) 
Similarly, can be reconstructed by downloading the following symbols.
(12) 
We refer the reader to [27] for further details of the construction.
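The index notation used in Construction 1 can be illustrated with a short sketch. It assumes the convention of [27] in which a row index a is written in base r with n digits (a_n, ..., a_1), and in which a(i, u) denotes the index obtained by replacing the i-th digit of a with u. The function names and the example values are hypothetical.

```python
def digits(a, r, n):
    """Base-r digits of a as (a_n, ..., a_1), most significant digit first."""
    return tuple((a // r ** (i - 1)) % r for i in range(n, 0, -1))


def replace_digit(a, i, u, r):
    """Return a(i, u): the index whose i-th base-r digit (1-indexed) is set to u."""
    a_i = (a // r ** (i - 1)) % r
    return a + (u - a_i) * r ** (i - 1)


# Hypothetical example with r = 3 and n = 2, so indices range over 0..8.
r, n = 3, 2
a = 5
a_digits = digits(a, r, n)        # 5 = 1 * 3 + 2, i.e. (a_2, a_1) = (1, 2)
b = replace_digit(a, 1, 0, r)     # set a_1 to 0
c = replace_digit(a, 2, 2, r)     # set a_2 to 2
```

Grouping the indices that differ only in their i-th digit is what couples the rows of the i-th block of the parity check matrix, and hence what the repair of the i-th node exploits.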
2.4 Related work
Pawar et al. formally initiate the study of the problem of designing coding schemes for DSS that are secure against passive eavesdropping attacks in [15]. For a distributed storage system that has per-node storage and requires downloading symbols from intact nodes during the repair of a failed node, Pawar et al. obtain the following upper bound on its secrecy capacity [15].
(13) 
Recall that we are only considering those distributed storage systems where the content of any out of storage nodes is sufficient to reconstruct the entire stored information. In [22], Shah et al. utilize the product-matrix construction [16] for minimum bandwidth regenerating (MBR) codes to construct coding schemes that are secure against an eavesdropper for all values of and such that . These coding schemes operate at and attain the bound on the secrecy capacity in (13). Shah et al. also utilize the product-matrix construction for MSR codes to design secure MSR coding schemes that achieve a secure file size of
(14) 
against an eavesdropper [22]. Note that, for , there is a gap between the bound in (13) and the secure file size achieved in (14). In [18], Rawat et al. obtain an improved bound on the secure file size achievable at the MSR point.
(15) 
where denotes the set of download-eavesdropped nodes and denotes the data sent by the th node for the repair of the storage nodes indexed by the set . Furthermore, for linear repair schemes with and , the bound in (15) specializes to the following [18].
(16) 
In [7], Goparaju et al. show that the bound in (16) holds for all values of . They further generalize this bound and show that for linear repair schemes with , the secure file size achievable at the MSR point satisfies the following [7].
(17) 
As for the achievability schemes, for , Rawat et al. obtain a secure coding scheme at the MSR point that attains the bound in (16) provided that, whenever , the download-eavesdropped nodes are restricted to a fixed set of nodes among the nodes in the system. This coding scheme is obtained by combining the Gabidulin precoding (cf. Section 2.3.1) with the zigzag codes from [24]. We also note that for , the secure coding scheme from [22] is optimal as it attains the bound in (15). Recently, Huang et al. further explore the problem of characterizing the secrecy capacity of MSR codes in [8]. For , the secure file size in (14) is shown to be optimal [8, 21]. Huang et al. show the optimality of the secure file size in (14) for the MSR codes with the bounded values of . The problem of obtaining bounds on the secrecy capacity of distributed storage systems is also studied in [25] under the non-black-box version of the problem. For brevity, we skip a discussion of this and refer the reader to [25, 6].
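To make the gap between these expressions concrete, the sketch below evaluates three of them numerically. The closed forms used are assumptions, written in the forms in which these results are commonly quoted: the bound of [15] as a sum of min(alpha, (d - i + 1) beta) over i from l1 + l2 + 1 to k, the secure file size of [22] at the MSR point as (k - l1 - l2)(alpha - l2 beta), and the bound of [18] for linear repair with every surviving node as a helper as (k - l1 - l2)(1 - 1/(n - k))^l2 alpha. Here l1 and l2 count the storage-eavesdropped and download-eavesdropped nodes. All names and parameter values are hypothetical.

```python
from fractions import Fraction


def pawar_bound(k, d, l1, l2, alpha, beta):
    """Assumed form of the bound in [15]."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(l1 + l2 + 1, k + 1))


def product_matrix_size(k, l1, l2, alpha, beta):
    """Assumed form of the secure file size achieved in [22] at the MSR point."""
    return (k - l1 - l2) * (alpha - l2 * beta)


def linear_repair_bound(n, k, l1, l2, alpha):
    """Assumed form of the bound in [18] when all n - 1 survivors help in repair."""
    return (k - l1 - l2) * (1 - Fraction(1, n - k)) ** l2 * alpha


# Hypothetical MSR parameters: n = 6, k = 3, d = n - 1 = 5, alpha = (d - k + 1) * beta.
n, k, d = 6, 3, 5
beta = Fraction(3)
alpha = (d - k + 1) * beta

one_repair = (
    pawar_bound(k, d, 0, 1, alpha, beta),
    product_matrix_size(k, 0, 1, alpha, beta),
    linear_repair_bound(n, k, 0, 1, alpha),
)
two_repairs = (
    pawar_bound(k, d, 0, 2, alpha, beta),
    product_matrix_size(k, 0, 2, alpha, beta),
    linear_repair_bound(n, k, 0, 2, alpha),
)
```

Under these assumed forms, the last two quantities coincide when a single repair is observed and separate when two repairs are observed, which matches the discussion above of when the scheme from [22] is optimal.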
In this paper, we establish that for linear repairable DSS with , the bound in (16) is the exact characterization of the secrecy capacity of an MSR code. One of the code constructions of MSR codes from [27] (cf. Section 2.3.2) enables us to remove the restriction appearing in the secure coding scheme from [18] that the download-eavesdropped nodes be restricted to a subset of nodes. As for the possibility of attaining a larger secure file size by utilizing non-linear repair schemes, Goparaju et al. show Pareto optimality of the linear repairable MSR codes among those MSR codes that simultaneously allow for all values of during the design of a secure coding scheme operating at the MSR point [6].
Cooperative regenerating codes enable simultaneous bandwidth-efficient repair of multiple node failures [23, 10]. The security of DSS employing cooperative regenerating codes against passive eavesdropping attacks is explored in [11, 9]. Locally repairable codes (LRCs) are another class of codes designed to be employed in distributed storage systems [4, 13]. These codes aim at repairing a failed node by contacting a small number of surviving nodes in the system. We note that the problem of designing locally repairable codes that are secure against passive eavesdropping attacks is considered in [18, 1].
3 Secure MSR codes
In this section we present a linear repairable coding scheme that operates at the MSR point with and achieves the optimal secure file size in this setting.
Construction 2.
Let and be given system parameters. Let be a secure file of size
(18) 
symbols (over a finite field ). We assume that we have , where (cf. (18)) and . Let and . We now generate a coding scheme that securely stores the file against an eavesdropper in the following two-step process.

Let denote i.i.d. random symbols (independent of ) that are uniformly distributed over . We take linearly independent points (over ) and perform Gabidulin precoding of the vector as defined in Section 2.3.1. Let denote the precoded vector, i.e.,
(19) where is the linearized polynomial associated with the vector as defined in (5), i.e.,
(20)
For , the th storage node stores the symbols in the subvector (cf. (22)).
The repairability of the proposed coding scheme with the repair-bandwidth symbols (over ) follows from the repairability of the code (cf. Section 2.3.2). Next, we argue that the proposed coding scheme is secure against an eavesdropper. To this end, we present the following simple lemma.
Lemma 2.
Let . For , let denote the symbols downloaded from the th storage node during the repair of the storage nodes indexed by the set . Then, we have
(23) 
Proof.
Proposition 1.
The coding scheme described in Construction 2 is secure against an eavesdropper.
Proof.
The proof of this proposition is very similar to the proof of [18, Theorem 18]. Let and denote the indices of the storage-eavesdropped and download-eavesdropped nodes, respectively. Consider a set of storage nodes such that . Let denote the symbols observed by the eavesdropper. Note that
(26) 
Consider
(27) 
where step follows from the fact that for , is a function of the symbols in the set . The steps and follow from [8, Lemma 5] and Lemma 2, respectively.
Since is a generator matrix of an MDS array code, it follows from [18, Lemma 9] that the symbols in the set
(28) 
correspond to the evaluations of the linearized polynomial (cf. (20)) at
linearly independent (over ) points in . Note that the symbols in (28) can be obtained from the symbols observed by the eavesdropper (cf. (26)). Given the secure file , one can remove the contribution of from these evaluations of to obtain the evaluations of the following polynomial at the linearly independent (over ) points in .
(29) 
where the last equality holds as the first coordinates of the vector are composed of random symbols (cf. Construction 2). Now, it is straightforward from [18, Remark 8] that these evaluations are sufficient to recover the coefficients of the linearized polynomial . In other words, we have that
(30) 
It follows from (3) and (30) that the coding scheme defined in Construction 2 satisfies both requirements of Lemma 1. Thus, we have
Since the choice of and is arbitrary, this establishes that the coding scheme obtained by Construction 2 is secure against an eavesdropper. ∎
4 Conclusion
We characterize the secrecy capacity of linear repairable MSR codes with against a passive eavesdropping attack, where the eavesdropper is allowed to observe the repair of storage nodes in addition to the content stored on storage nodes. One of the code constructions for MSR codes from [27] proves instrumental in establishing this result. It is an interesting question whether similar results can be established for general values of . Another direction for future work is to characterize the secrecy capacity of minimum storage cooperative regenerating (MSCR) codes.
References
 [1] A. Agarwal and A. Mazumdar. Security in locally repairable storage. In Proc. of 2015 IEEE Information Theory Workshop (ITW), pages 1–5, April 2015.
 [2] V. R. Cadambe, C. Huang, and J. Li. Permutation code: Optimal exact-repair of a single failed node in MDS code based distributed storage systems. In Proc. of 2011 IEEE International Symposium on Information Theory (ISIT), pages 1225–1229, 2011.
 [3] A. G. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Transactions on Information Theory, 56(9):4539–4551, 2010.
 [4] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin. On the locality of codeword symbols. IEEE Transactions on Information Theory, 58(11):6925–6934, 2012.
 [5] S. Goparaju, A. Fazeli, and A. Vardy. Minimum storage regenerating codes for all parameters. CoRR, abs/1602.04496, 2016.
 [6] S. Goparaju, S. El Rouayheb, and R. Calderbank. Can linear minimum storage regenerating codes be universally secure? In Proc. of 49th Asilomar Conference on Signals, Systems and Computers, pages 549–553, Nov 2015.
 [7] S. Goparaju, S. El Rouayheb, R. Calderbank, and H. V. Poor. Data secrecy in distributed storage systems under exact repair. In Proc. of 2013 International Symposium on Network Coding (NetCod), pages 1–6, June 2013.
 [8] K. Huang, U. Parampalli, and M. Xian. Characterization of secrecy capacity for general MSR codes under passive eavesdropping model. CoRR, abs/1505.01986, 2015.
 [9] K. Huang, U. Parampalli, and M. Xian. Security concerns in minimum storage cooperative regenerating codes. CoRR, abs/1509.01324, 2015.
 [10] A.-M. Kermarrec, N. Le Scouarnec, and G. Straub. Repairing multiple failures with coordinated and adaptive regenerating codes. In Proc. of 2011 International Symposium on Network Coding (NetCod), pages 1–6, 2011.
 [11] O. O. Koyluoglu, A. S. Rawat, and S. Vishwanath. Secure cooperative regenerating codes for distributed storage systems. IEEE Transactions on Information Theory, 60(9):5228–5244, Sept 2014.
 [12] F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes. Amsterdam: North-Holland, 1983.
 [13] D. S. Papailiopoulos and A. G. Dimakis. Locally repairable codes. IEEE Transactions on Information Theory, 60(10):5843–5855, Oct 2014.
 [14] D. S. Papailiopoulos, A. G. Dimakis, and V. Cadambe. Repair optimal erasure codes through Hadamard designs. IEEE Transactions on Information Theory, 59(5):3021–3037, 2013.
 [15] S. Pawar, S. El Rouayheb, and K. Ramchandran. Securing dynamic distributed storage systems against eavesdropping and adversarial attacks. IEEE Transactions on Information Theory, 57(10):6734–6753, 2011.
 [16] K. V. Rashmi, N. B. Shah, and P. V. Kumar. Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE Transactions on Information Theory, 57(8):5227–5239, Aug 2011.
 [17] N. Raviv, N. Silberstein, and T. Etzion. Access-optimal MSR codes with optimal sub-packetization over small fields. CoRR, abs/1505.00919, 2015.
 [18] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath. Optimal locally repairable and secure codes for distributed storage systems. IEEE Transactions on Information Theory, 60(1):212–236, Jan 2014.
 [19] A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath. Progress on high-rate MSR codes: Enabling arbitrary number of helper nodes. CoRR, abs/1601.06362, 2016.
 [20] B. Sasidharan, G. K. Agarwal, and P. V. Kumar. A high-rate MSR code with polynomial sub-packetization level. CoRR, abs/1501.06662, 2015.
 [21] B. Sasidharan, P. V. Kumar, N. B. Shah, K. V. Rashmi, and K. Ramachandran. Optimality of the product-matrix construction for secure MSR regenerating codes. In Proc. of 6th International Symposium on Communications, Control and Signal Processing (ISCCSP), pages 10–14, May 2014.
 [22] N. B. Shah, K. V. Rashmi, and P. V. Kumar. Information-theoretically secure regenerating codes for distributed storage. In Proc. of 2011 IEEE Global Telecommunications Conference (GLOBECOM), pages 1–5, 2011.
 [23] K. W. Shum and Y. Hu. Cooperative regenerating codes. IEEE Transactions on Information Theory, 59(11):7229–7258, 2013.
 [24] I. Tamo, Z. Wang, and J. Bruck. Zigzag codes: MDS array codes with optimal rebuilding. IEEE Transactions on Information Theory, 59(3):1597–1616, 2013.
 [25] R. Tandon, S. Amuru, T. C. Clancy, and R. M. Buehrer. Toward optimal secure distributed storage systems with exact repair. IEEE Transactions on Information Theory, 62(6):3477–3492, June 2016.
 [26] A. Wyner. The wire-tap channel. The Bell System Technical Journal, 54(8):1355–1387, October 1975.
 [27] M. Ye and A. Barg. Explicit constructions of high-rate MDS array codes with optimal repair bandwidth. CoRR, abs/1604.00454, 2016.
 [28] M. Ye and A. Barg. Explicit constructions of optimal-access MDS codes with nearly optimal sub-packetization. CoRR, abs/1605.08630, 2016.