# Bounds and Constructions of Codes with All-Symbol Locality and Availability

The research was carried out at the IITP RAS and supported by the Russian Science Foundation (project no. 14-50-00150).

Stanislav Kruglik (1, 2, 3) and Alexey Frolov (1, 2)

stanislav.kruglik@skolkovotech.ru, al.frolov@skoltech.ru

1. Skolkovo Institute of Science and Technology, Moscow, Russia
2. Institute for Information Transmission Problems, Moscow, Russia
3. Moscow Institute of Physics and Technology, Moscow, Russia

###### Abstract

We investigate the distance properties of linear locally recoverable codes (LRC codes) with all-symbol locality and availability. New upper and lower bounds on the minimum distance of such codes are derived. The upper bound is based on the shortening method and improves existing shortening bounds. To reduce the gap between the upper and lower bounds, we do not restrict the alphabet size and propose explicit constructions of codes with locality and availability via rank-metric codes. The first construction relies on expander graphs and is better in the low-rate region; the second construction uses LRC codes developed by Wang et al. as inner codes and is better in the high-rate region.

## I Introduction

A locally recoverable code (LRC) is a code over a finite alphabet such that each symbol is a function of a small number of other symbols that form a recovering set [1, 2, 3, 4, 5]. These codes are important due to their applications in distributed and cloud storage systems. LRC codes are well investigated in the literature. Bounds on the rate and minimum code distance are given in [1, 3] for the case of large alphabet size. An alphabet-dependent shortening bound (see [6] for an explanation of the method) is proposed in [7]. Optimal code constructions are given in [8] based on rank-metric codes (for large alphabet size, which is an exponential function of the code length) and in [9] based on Reed-Solomon codes (for small alphabet size, which is a linear function of the code length).

The natural generalization of an LRC code is an LRC code with availability (or multiple disjoint recovering sets). Availability allows us to handle multiple simultaneous requests to an erased symbol in parallel. This property is very important for hot data that is simultaneously requested by a large number of users. The case of LRC codes with availability is much less investigated. Bounds on the parameters of such codes and constructions are given in [4, 10, 11, 12]. Most of these papers focus on information-symbol locality and availability.

We are interested in all-symbol locality and availability, which is preferable in applications as it permits a uniform approach to system design. In this paper we continue the research started in [10] and improve upper and lower bounds on the minimum distance of linear LRC codes with availability. To reduce the gap between the upper and lower bounds, we do not restrict the alphabet size and propose explicit constructions of codes with locality and availability via rank-metric codes using the ideas from [8].

Our contribution is as follows. New upper and lower bounds on the minimum distance of LRC codes with availability are derived. The upper bound is based on the shortening method (developed in [6]) and improves existing shortening bounds. We propose explicit constructions of LRC codes with availability via rank-metric codes. The first construction relies on expander graphs and is better in the low-rate region; the second construction uses the codes with arbitrary all-symbol locality and availability, high rate and small minimum distance developed in [13] as inner codes and is better in the high-rate region.

## II Preliminaries

### II-A Locally recoverable codes

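Let us denote by $\mathbb{F}_q$ a field with $q$ elements. Let $[n] = \{1, \dots, n\}$. The code $\mathcal{C} \subset \mathbb{F}_q^n$ has locality $r$ if every symbol of the codeword $c \in \mathcal{C}$ can be recovered from a subset of $r$ other symbols of $c$ [1]. In other words, given $c \in \mathcal{C}$ and $i \in [n]$, there exists a subset of coordinates $R_i \subset [n] \setminus \{i\}$, $|R_i| \le r$, such that the restriction of $c$ to the coordinates in $R_i$ enables one to find the value of $c_i$. The subset $R_i$ is called a recovering set for the symbol $c_i$.

Generalizing this concept, assume that every symbol of the code $\mathcal{C}$ can be recovered from $t$ disjoint subsets of symbols of size $r$. More formally, denote by $\mathcal{C}_I$ the restriction of the code $\mathcal{C}$ to a subset of coordinates $I \subset [n]$. Given $a \in \mathbb{F}_q$, define the set of codewords

$$\mathcal{C}(i,a) = \{ c \in \mathcal{C} : c_i = a \}, \quad i \in [n].$$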

###### Definition 1

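A code $\mathcal{C}$ is said to have $t$ disjoint recovering sets if for every $i \in [n]$ there are pairwise disjoint subsets $R_i^1, \dots, R_i^t \subset [n] \setminus \{i\}$ such that for all $j = 1, \dots, t$ and every pair of symbols $a, a' \in \mathbb{F}_q$, $a \ne a'$,

$$\mathcal{C}(i,a)_{R_i^j} \cap \mathcal{C}(i,a')_{R_i^j} = \emptyset.$$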
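Definition 1 can be checked by brute force on a small code. The sketch below is only an illustration: the choice of the binary [7, 3] simplex code, the column ordering, and the helper names `codewords` and `is_recovering_set` are assumptions made here, not part of the paper. A set of coordinates is a recovering set for symbol i exactly when no two codewords agree on that set but differ in position i.

```python
from itertools import product

def codewords(generator_rows):
    """All codewords of the binary linear code spanned by generator_rows."""
    k = len(generator_rows)
    n = len(generator_rows[0])
    words = set()
    for coeffs in product((0, 1), repeat=k):
        word = tuple(
            sum(c * row[j] for c, row in zip(coeffs, generator_rows)) % 2
            for j in range(n)
        )
        words.add(word)
    return words

def is_recovering_set(code, i, subset):
    """True if c_i is a function of the restriction of c to `subset`."""
    seen = {}
    for c in code:
        key = tuple(c[j] for j in subset)
        # setdefault stores the first value of c_i seen for this restriction
        if seen.setdefault(key, c[i]) != c[i]:
            return False
    return True

# Illustrative choice: columns of G are the seven nonzero vectors of F_2^3.
columns = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1),
           (1, 0, 1), (0, 1, 1), (1, 1, 1)]
G = [[col[row] for col in columns] for row in range(3)]
code = codewords(G)

# Coordinate 6 is column (1,1,1); each pair of columns below sums to it,
# so the three pairs are disjoint recovering sets of size 2.
recovering_sets = [(0, 5), (1, 4), (2, 3)]
results = [is_recovering_set(code, 6, R) for R in recovering_sets]
```

Here all three pairs pass the check, while a pair such as (0, 1) does not, because the symbol in position 6 also depends on the third information bit.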

In what follows we refer to these codes as $(r,t)$-LRC codes. We briefly list the existing results below. The first bound for $(r,t)$-LRC codes was given in [14, 15]:

$$d \le n - k + 2 - \left\lceil \frac{t(k-1)+1}{t(r-1)+1} \right\rceil.$$

An improvement of this bound was obtained in [10]:

$$d \le n - \sum_{i=0}^{t} \left\lfloor \frac{k-1}{r^i} \right\rfloor.$$
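The two bounds above are easy to evaluate side by side. The following is a minimal sketch with hypothetical function names (`bound_first` for the bound from [14, 15], `bound_improved` for the bound from [10]); it uses integer arithmetic only, so no rounding issues arise.

```python
def bound_first(n, k, r, t):
    """d <= n - k + 2 - ceil((t(k-1)+1) / (t(r-1)+1))."""
    num = t * (k - 1) + 1
    den = t * (r - 1) + 1
    ceil_ratio = -(-num // den)  # integer ceiling division
    return n - k + 2 - ceil_ratio

def bound_improved(n, k, r, t):
    """d <= n - sum_{i=0}^{t} floor((k-1) / r^i)."""
    return n - sum((k - 1) // r**i for i in range(t + 1))

# Illustrative parameters chosen here, not taken from the paper.
small = (bound_first(7, 3, 2, 3), bound_improved(7, 3, 2, 3))
larger = (bound_first(30, 10, 3, 2), bound_improved(30, 10, 3, 2))
```

For n = 7, k = 3, r = 2, t = 3 both expressions give 4, which is the minimum distance of the binary [7, 3] simplex code. For n = 30, k = 10, r = 3, t = 2 the first bound gives 18 and the second gives 17.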

An alphabet-dependent bound was proposed in [12] and has the form

 d≤min1≤x≤⌈k−1(r−1)t+1⌉;1≤yj≤t;j∈[x]A

where , and denote the largest possible minimum distance of a code over .

The bound on the rate of $(r,t)$-LRC codes was given in [10]:

$$\frac{k}{n} \le R^*(r,t) = \prod_{i=1}^{t} \frac{1}{1 + \frac{1}{ir}}. \tag{1}$$

This bound was improved in [11] for $t = 2$.
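The rate bound (1) can be evaluated exactly with rational arithmetic. This is a small sketch; the function name `rate_bound` is a label chosen here. Each factor 1 / (1 + 1/(ir)) is rewritten as ir / (ir + 1).

```python
from fractions import Fraction

def rate_bound(r, t):
    """R*(r, t) = prod_{i=1}^{t} 1 / (1 + 1/(i r)), as an exact fraction."""
    value = Fraction(1)
    for i in range(1, t + 1):
        value *= Fraction(i * r, i * r + 1)
    return value

values = {(r, t): rate_bound(r, t) for r in (2, 3) for t in (1, 2)}
```

For example, `rate_bound(2, 1)` is 2/3 and `rate_bound(3, 2)` is 9/14.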

In [13] a recursive construction of binary $(r,t)$-LRC codes was proposed. The parameters of these codes are as follows: , and . We refer to these codes as WZL codes and use them as inner codes in our constructions. We note that in case of the construction of WZL codes coincides with the construction from [10].

### II-B Rank-metric codes

###### Definition 2

A linearized polynomial over of -degree can be presented as follows

$$f(x) = \sum_{i=0}^{\ell} a_i x^{[i]},$$

where , , and .

We now explain how to construct a codeword of a Gabidulin code [16]. Let us choose an arbitrary linearized polynomial over , such that its -degree is less than or equal to . This polynomial includes information symbols as coefficients. Then

$$c_G = (f(\alpha_1), f(\alpha_2), \dots, f(\alpha_{n_G})),$$

where the elements $\alpha_1, \dots, \alpha_{n_G}$ are linearly independent as vectors (of length ) over . In what follows we assume , since this condition is needed for linearly independent vectors to exist.

Note that the following property of linearized polynomials holds:

$$f(a\beta + b\gamma) = a f(\beta) + b f(\gamma), \tag{2}$$

where and .
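Property (2) can be observed on a toy field. The sketch below assumes the base field GF(2) and the extension GF(2^4) built with the irreducible polynomial x^4 + x + 1; the coefficients and evaluation points are arbitrary illustrative choices, and the helper names are hypothetical. Over GF(2) the scalars a, b are 0 or 1, so linearity reduces to additivity.

```python
MOD = 0b10011  # x^4 + x + 1, irreducible over GF(2)
M = 4          # extension degree (assumption for this example)

def gf_mul(a, b):
    """Multiply two elements of GF(2^4) given as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MOD  # reduce modulo the field polynomial
    return result

def frobenius_power(x, i):
    """x^[i] = x^(2^i), computed by squaring i times."""
    for _ in range(i):
        x = gf_mul(x, x)
    return x

def linearized_eval(coeffs, x):
    """f(x) = sum_i coeffs[i] * x^[i] over GF(2^4)."""
    value = 0
    for i, a in enumerate(coeffs):
        value ^= gf_mul(a, frobenius_power(x, i))
    return value

coeffs = [3, 7, 9]       # arbitrary coefficients, playing the role of information symbols
points = (1, 2, 4, 8)    # a basis of GF(2^4) over GF(2), hence linearly independent
codeword = [linearized_eval(coeffs, alpha) for alpha in points]

additive = all(
    linearized_eval(coeffs, b ^ g)
    == linearized_eval(coeffs, b) ^ linearized_eval(coeffs, g)
    for b in range(16) for g in range(16)
)
```

In particular, the evaluation at the sum of two evaluation points equals the sum of the two corresponding codeword symbols, which is the fact the constructions below rely on.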

### II-C Expander graphs

Let us consider a biregular bipartite graph such that and for , and for .

###### Definition 3

is an -expander if for any subset

$$|V'| \le \alpha n \Rightarrow |\Gamma(V')| > t\gamma |V'|,$$

where $\Gamma(V')$ is the set of vertexes connected to the set $V'$.

The definition is illustrated in Fig. 1.
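For very small graphs the expansion condition of Definition 3 can be verified exhaustively. The sketch below is an illustration under assumptions made here: the left vertexes are numbered 0..n-1, `neighbors[v]` lists the right vertexes adjacent to v, and the parameter `expansion` plays the role of the product tγ. The running time is exponential in αn, so this is not a practical test for graphs of interesting size.

```python
from itertools import combinations

def is_expander(neighbors, alpha, expansion):
    """Check |V'| <= alpha*n  =>  |Gamma(V')| > expansion*|V'| by brute force."""
    n = len(neighbors)
    max_size = int(alpha * n)
    for size in range(1, max_size + 1):
        for subset in combinations(range(n), size):
            gamma_set = set().union(*(neighbors[v] for v in subset))
            if len(gamma_set) <= expansion * size:
                return False
    return True

# Toy biregular graph: 6 left vertexes of degree 2, 4 right vertexes of degree 3.
# Every left vertex is joined to a distinct pair of right vertexes.
neighbors = [{0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}]

ok = is_expander(neighbors, alpha=0.5, expansion=0.9)
too_strong = is_expander(neighbors, alpha=0.5, expansion=1.0)
```

In this toy graph three left vertexes can share only three neighbors, so the check succeeds with expansion 0.9 and fails with expansion 1.0.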

The usual way to check the expansion properties of a graph is to examine its second-largest eigenvalue (see e.g. [17]). Unfortunately, the explicit constructions of expander graphs with expansion greater than are not known (Kahale [18] even shows that eigenvalue separation cannot certify greater expansion). Thus, in what follows we rely on the expansion properties of random expander graphs. The following asymptotic result, due to [19], is cited here in the form given in [20, p. 431].

###### Lemma 1

Let be a graph chosen uniformly from the ensemble of -regular bipartite graphs and let For a given let be the positive solution of the equation

$$\frac{t-1}{t} h(\delta) - \frac{1}{r+1} h(\delta\gamma(r+1)) - \delta\gamma(r+1)\, h\!\left(\frac{1}{\gamma(r+1)}\right) = 0.$$

Then for and

$$\Pr(\{G \text{ is a } (t, r+1, \delta', t\gamma) \text{ expander}\}) \ge 1 - O(n^{-\beta}).$$
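The equation in Lemma 1 has no closed-form solution, but its positive root can be located numerically. The following is a rough sketch, assuming that h denotes the binary entropy function and that the caller supplies a bracket on which the left-hand side changes sign. The parameter values t = 3, r = 5, γ = 1/3 and the bracket [0.01, 0.1] are illustrative choices made here, not values from the paper.

```python
from math import log2

def h(x):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def lemma_lhs(delta, t, r, gamma):
    """Left-hand side of the equation in Lemma 1."""
    c = gamma * (r + 1)
    return ((t - 1) / t) * h(delta) - h(delta * c) / (r + 1) - delta * c * h(1 / c)

def solve_delta(t, r, gamma, lo, hi, iterations=200):
    """Bisection on [lo, hi]; the caller must supply a sign change."""
    f_lo = lemma_lhs(lo, t, r, gamma)
    if f_lo * lemma_lhs(hi, t, r, gamma) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = lemma_lhs(mid, t, r, gamma)
        if f_lo * f_mid <= 0:
            hi = mid              # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return (lo + hi) / 2

delta = solve_delta(t=3, r=5, gamma=1 / 3, lo=0.01, hi=0.1)
residual = lemma_lhs(delta, 3, 5, 1 / 3)
```

The left-hand side is positive at 0.01 and negative at 0.1 for these parameters, so bisection converges to a root inside the bracket.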

## III Upper bound on the minimum distance

Shortening is a well known and widely used technique in coding theory. The idea is to remove (fix) some coordinates of the original code and use the new code to obtain bounds for the original code.

Denote by $\mathrm{Cl}(I)$ the set of all coordinates $i \in [n]$ such that for every codeword $c \in \mathcal{C}$ the value $c_i$ can be found from the values of $c_I$. We will call the subset $\mathrm{Cl}(I)$ the closure of $I$ in $[n]$.
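For a linear code the closure has a concrete description: a coordinate belongs to the closure of a set exactly when its column of a generator matrix lies in the span of the columns indexed by that set. The sketch below uses this fact for a binary code; the helper names `rank_gf2` and `closure`, and the choice of the [7, 3] simplex code as the example, are assumptions made here.

```python
def rank_gf2(vectors):
    """Rank over GF(2) of vectors given as Python ints (bitmasks)."""
    pivots = {}  # leading bit position -> reduced vector
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

def closure(generator_rows, coords):
    """Coordinates whose generator column lies in the span of the columns in coords."""
    k = len(generator_rows)
    n = len(generator_rows[0])

    def column(j):
        return sum(generator_rows[row][j] << row for row in range(k))

    base = [column(j) for j in coords]
    base_rank = rank_gf2(base)
    return {j for j in range(n) if rank_gf2(base + [column(j)]) == base_rank}

# Generator matrix of the binary [7, 3] simplex code (illustrative choice).
G = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
cl_two = closure(G, [0, 1])
cl_three = closure(G, [0, 1, 3])
```

Here two coordinates already determine a third one, and three suitably chosen coordinates determine the whole codeword.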

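Let us introduce some notation. By $k^*(q, n, d)$ we denote an upper bound on the dimension of any linear code of length $n$ and minimum distance $d$ over $\mathbb{F}_q$. By $d^*(q, n, k)$ we denote an upper bound on the distance of any linear code of length $n$ and dimension $k$ over $\mathbb{F}_q$. The shortening bound can be formulated as follows.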

###### Theorem 1 (Shortening bound)

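Assume we are given an $[n, k, d]$ linear code $\mathcal{C}$ over $\mathbb{F}_q$. The following inequalities hold for the parameters of the code

$$k \le \min_{I :\, |\mathrm{Cl}(I)| \le n - d} \{ |I| + k^*(q, n - |\mathrm{Cl}(I)|, d) \}$$

and

$$d \le \min_{I :\, |I| < k} d^*(q, n - |\mathrm{Cl}(I)|, k - |I|).$$

###### Remark 1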

We note that Theorem 1 is also valid in the non-linear case. In this case by we mean .

Now we explain how the special structure of an LRC code enables us to apply the shortening technique most efficiently. The result is formulated in the following theorem.

###### Theorem 2

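Let . Assume we are given an $[n, k, d]$ linear $(r,t)$-LRC code over $\mathbb{F}_q$; then the following inequalities hold for the parameters of the code

$$k \le \min_{s :\, sr + 1 \le n - d} \{ 1 + (r-1)s + k^*(q, n - 1 - sr, d) \}$$

and

$$d \le \min_{s :\, 1 + (r-1)s < k} d^*(q, n - 1 - sr, k - 1 - (r-1)s).$$

###### Proof: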

In the proof we construct a set of coordinates $I$, $|I| \le 1 + (r-1)s$, with the following property:

$$|\mathrm{Cl}(I)| \ge 1 + rs.$$

Let us denote the code dual to $\mathcal{C}$ by $\mathcal{C}^\perp$. By $\mathcal{C}^\perp_{r+1}$ we denote the set of codewords of the dual code with weight less than or equal to $r+1$ (local checks), i.e.

$$\mathcal{C}^\perp_{r+1} = \{ h \in \mathcal{C}^\perp : \mathrm{wt}(h) \le r + 1 \}.$$

Here and in what follows, by weight we mean the Hamming weight, i.e. the number of non-zero elements in a vector. In what follows we work only with the set of all local checks $\mathcal{C}^\perp_{r+1}$.
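For a small binary code the set of local checks can be listed by enumerating the dual code. The sketch below is an illustration under assumptions made here: the dual code is given by a parity-check matrix with linearly independent rows, the zero word is excluded because it checks nothing, and the example is the [7, 3] simplex code, whose dual is the [7, 4] Hamming code.

```python
from itertools import product

def dual_codewords(parity_rows):
    """All codewords of the dual code, i.e. the GF(2) row space of H."""
    n = len(parity_rows[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(parity_rows)):
        word = tuple(
            sum(c * row[j] for c, row in zip(coeffs, parity_rows)) % 2
            for j in range(n)
        )
        words.add(word)
    return words

def local_checks(parity_rows, r):
    """Nonzero dual codewords of Hamming weight at most r + 1."""
    return {w for w in dual_codewords(parity_rows) if 0 < sum(w) <= r + 1}

# Parity-check matrix of the [7, 3] simplex code (illustrative choice).
H = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 1],
]
checks = local_checks(H, r=2)
coverage = [sum(w[j] for w in checks) for j in range(7)]
```

With r = 2 this yields seven checks of weight 3, and every coordinate is covered by exactly three of them, which matches availability t = 3 for this code.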

To construct the required set of coordinates , we apply Algorithm 1 with input parameters and . Let us explain the algorithm in more detail. At each step the algorithm adds a new local check (from the set ) to the set until linearly independent local checks are added. By we denote the set of covered positions. The algorithm chooses a local check with the largest intersection with (line 12). Two cases are possible:

1. there exists a local check, which intersects with .

2. there is no new local check, which intersects with .

In the first case we need to check linear dependency (to properly calculate the number of check symbols) and add the local check to . The second case is more interesting. This condition means that the elements of form an -LRC code of smaller length. Indeed, the absence of new local checks that intersect with at least one element of means that each position is covered either times or not covered at all. We store the number of recovery sets that form an -LRC code of smaller length in the variable and the number of check symbols of this code in the variable .

It is clear that the algorithm constructs the set , such that . The only thing to check is that cannot be bigger than . We know that the first elements of form an -LRC code with check symbols. The worst case for the remaining elements of is to intersect in exactly one position, so

$$|I| \le \frac{s_1}{1 - R^*(r,t)} - s_1 + 1 + (s - s_1)(r - 1) \le 1 + s(r - 1)$$

as for (see (1))

$$R^*(r,t) \le \frac{r-1}{r}.$$

∎

###### Corollary 1

If we substitute the Singleton bound for the function $d^*(q, n, k)$ we obtain

$$d \le n - (k-1) - \left\lfloor \frac{k-2}{r-1} \right\rfloor.$$

###### Corollary 2

The asymptotic form of the new upper bound is as follows

$$R \le \frac{r-1}{r}(1 - \delta) + o(1).$$
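The closed form of Corollary 1 can be cross-checked against a direct minimisation over s. The sketch below rests on an assumption stated here explicitly: the distance inequality of Theorem 2 is taken in the form d ≤ min over s with 1 + (r-1)s < k of d*(q, n-1-sr, k-1-(r-1)s), with s starting from 0, and d* is replaced by the Singleton bound n' - k' + 1. The function names are hypothetical.

```python
def corollary1_bound(n, k, r):
    """Closed form of Corollary 1 (requires r >= 2 and k >= 2)."""
    return n - (k - 1) - (k - 2) // (r - 1)

def shortening_with_singleton(n, k, r):
    """Minimise the Singleton bound of the shortened code over s."""
    candidates = []
    s = 0
    while 1 + (r - 1) * s < k:
        length = n - 1 - s * r
        dimension = k - 1 - (r - 1) * s
        candidates.append(length - dimension + 1)  # Singleton bound
        s += 1
    return min(candidates)

agree = all(
    corollary1_bound(n, k, r) == shortening_with_singleton(n, k, r)
    for n in range(10, 40) for k in range(2, 10) for r in range(2, 6)
)
```

Each candidate equals n - k + 1 - s, so the minimum is attained at the largest admissible s, namely the integer part of (k-2)/(r-1), which reproduces the closed form. For instance, `corollary1_bound(30, 10, 3)` evaluates to 17.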

## IV Expander-based constructions

In this section we show the existence of $(r,t)$-LRC codes over a sufficiently large finite field with large minimum distance. The proof relies on the existence of regular bipartite graphs with good expansion properties. We note that the result here coincides with the result from [10]. At the same time, the construction is explicit and the proof is simpler.

Let be a bipartite graph with the following properties:

• and for ;

• and for ;

• G is an -expander;

• .

###### Remark 2

As shown in [21], the probability that a random regular graph on vertexes has no cycles of length is bounded away from zero as This result, together with Lemma 1, implies that there exist biregular bipartite expanding graphs with the required properties.

We now construct a matrix , , , over . We associate the columns of with the vertexes from and the rows of with the vertexes of . The element is non-zero if and only if the vertexes and are connected with an edge. We choose non-zero elements equiprobably and independently from the set . By we denote a linear code of length over determined by . The following inequality follows for the rate of the code

$$R(\mathcal{C}_E) \ge 1 - \frac{t}{r+1} - o(1),$$

the equality takes place in case of full rank of .
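The constant in the rate inequality comes from counting the edges of the graph from both sides. The short derivation below is a sketch under assumptions: the graph has n column vertexes of degree t and m row vertexes of degree r + 1 (the degrees suggested by the expander parameters), and H stands for the matrix built from the graph; the symbols m and H are used here only for illustration.

```latex
\begin{align*}
nt &= m(r+1) && \text{(each edge is counted once from either side)} \\
\frac{m}{n} &= \frac{t}{r+1} \\
R(\mathcal{C}_E) &= 1 - \frac{\operatorname{rank} H}{n} \;\ge\; 1 - \frac{m}{n} \;=\; 1 - \frac{t}{r+1}
\end{align*}
```

Equality holds exactly when the m rows of H are linearly independent.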

Let us consider a code over , which is constructed in the following way. We first encode information symbols with a Gabidulin code. Then we encode the resulting codeword of the Gabidulin code with the code .

###### Theorem 3

Let us denote the relative minimum distance of the code by . For sufficiently large and the following inequality holds for the rate of the code

$$R \ge 1 - \frac{t}{r+1} - \max\{\delta(1 - t\gamma), 0\} - o(1),$$

where .

###### Proof:

Note that due to property (2) the checks added by the code are evaluations of in points of that depend linearly on . To decode the code we need to interpolate . To do this it is sufficient to find evaluation points which correspond to linearly independent elements of .

Let the code have minimum distance , so that this code can correct any erasures. Let us denote the set of erasures by $E$, , and estimate the number of evaluation points (), which correspond to linearly independent elements of . The code imposes linear restrictions. We cannot take all the evaluation points that belong to the same linear restriction as they are linearly dependent. By of size we denote a submatrix of , which corresponds to erased positions (we removed zero rows). The probability for this submatrix to have full rank tends to when grows (see [10]). Thus, the number of evaluation points corresponding to linearly independent elements of can be estimated as follows

$$k' \ge n - |E| - (m - \min\{|\Gamma(E)|, |E|\}),$$

where $\Gamma(E)$ is the set of linear restrictions connected to the set of erased nodes $E$. To conclude the proof we note that and choose . ∎
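The passage from the estimate on k' to the rate inequality of Theorem 3 can be spelled out as follows. This is a sketch under assumptions that are not all explicit above: the erasure set has size |E| = d - 1, it is small enough for the expansion property |Γ(E)| > tγ|E| to apply, and the number of restrictions satisfies m/n = t/(r+1).

```latex
\begin{align*}
k' &\ge n - |E| - m + \min\{|\Gamma(E)|, |E|\} \\
   &\ge n - m - |E|\,\bigl(1 - \min\{t\gamma, 1\}\bigr) \\
\frac{k'}{n} &\ge 1 - \frac{t}{r+1} - \delta\,\bigl(1 - \min\{t\gamma, 1\}\bigr) - o(1) \\
   &= 1 - \frac{t}{r+1} - \max\{\delta(1 - t\gamma),\, 0\} - o(1)
\end{align*}
```

The last step uses the identity 1 - min{x, 1} = max{1 - x, 0}, and the o(1) term absorbs the difference between (d - 1)/n and δ.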

## V Concatenated construction

We encode information symbols in two steps. First, information symbols over are encoded using a Gabidulin code. The codeword of the Gabidulin code of length is then partitioned into local groups, and each local group is then encoded using an binary WZL code. In what follows we assume that . This process is illustrated in Fig. 2.

Let us consider a WZL code with parity-check matrix . Let us denote the erasure pattern by , and estimate the rank of the submatrix of , which corresponds to erased positions. The following estimate holds

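$$\operatorname{rank}(\tilde{H}_I) \ge L^*(e_I) = \begin{cases} e_I, & e_I \le t, \\ \max\{\lceil (1 - R^*(r-1,t))\, e_I \rceil,\ t\}, & e_I > t. \end{cases}$$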
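The piecewise estimate above is straightforward to tabulate. The sketch below evaluates it with exact rational arithmetic; the function names `rate_bound` and `l_star` are labels chosen here, `rate_bound` implements the product formula (1), and r ≥ 2 is assumed so that the argument r - 1 is at least 1.

```python
from fractions import Fraction
from math import ceil

def rate_bound(r, t):
    """R*(r, t) from (1), as an exact fraction."""
    value = Fraction(1)
    for i in range(1, t + 1):
        value *= Fraction(i * r, i * r + 1)
    return value

def l_star(e, r, t):
    """Lower estimate L*(e) on the rank of the erased submatrix (r >= 2)."""
    if e <= t:
        return e
    return max(ceil((1 - rate_bound(r - 1, t)) * e), t)

# Illustrative parameters r = 3, t = 2, chosen here.
table = [l_star(e, 3, 2) for e in (1, 2, 3, 6, 15)]
```

For r = 3 and t = 2 the factor 1 - R*(2, 2) equals 7/15, so the estimate stays at t for slightly more than t erasures and then grows roughly as 7e/15.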

We use the fact that the minimum code distance is and that the submatrix corresponds to an -LRC code with .

###### Theorem 4

Let us consider a code of length , minimum distance over . The following bound is valid for the dimension of the code

$$k \ge k_I \left\lfloor \frac{n - d + 1}{n_I} \right\rfloor + k_I - e_I + L^*(e_I),$$

where

###### Proof:

As the distance is , we need to find the worst combination of erasures to estimate the number of evaluation points which correspond to linearly independent elements of . Due to the properties of WZL codes, the worst combination of erasures should cover whole blocks (codewords) of the inner code . The number of blocks that do not contain erasures in this case is equal to , and we can take information symbols of these blocks. In case does not divide we have one block which is partially erased. We can take symbols from it, where is the number of erasures in this block. ∎

###### Corollary 3

The asymptotic form of this bound is as follows

$$R \ge \frac{r}{r+t}(1 - \delta) - o(1).$$

## VI Numerical results

A comparison of the upper and lower bounds for different values of locality and availability is presented in Fig. 3. We note that the obtained upper bound improves the upper bounds from [10, 12]. Another interesting fact is as follows: in the high-rate region the concatenated construction is better than the expander-based construction, while the situation is the opposite in the low-rate region.
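The leading terms of two of the asymptotic bounds can be compared directly. The sketch below evaluates the coefficient (r-1)/r of the upper bound from Section III and the coefficient r/(r+t) of the concatenated construction, ignoring the o(1) terms; the function names are hypothetical, and the expander-based bound is left out because it depends on the expansion parameter γ. It does not reproduce Fig. 3.

```python
from fractions import Fraction

def upper_rate(delta, r):
    """Leading term (r - 1)/r * (1 - delta) of the asymptotic upper bound."""
    return Fraction(r - 1, r) * (1 - delta)

def concatenated_rate(delta, r, t):
    """Leading term r/(r + t) * (1 - delta) of the concatenated construction."""
    return Fraction(r, r + t) * (1 - delta)

deltas = [Fraction(0), Fraction(1, 4), Fraction(1, 2)]
consistent = all(
    concatenated_rate(d, r, t) <= upper_rate(d, r)
    for d in deltas for r in range(2, 10) for t in range(2, 10)
)
gap = upper_rate(Fraction(1, 2), 3) - concatenated_rate(Fraction(1, 2), 3, 2)
```

For r ≥ 2 and t ≥ 2 the lower bound never exceeds the upper bound, since r/(r+t) ≤ (r-1)/r is equivalent to t(r-1) ≥ r. With r = 3, t = 2 and δ = 1/2 the gap between the two leading terms is 1/30.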

## References

• [1] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the locality of codeword symbols,” IEEE Trans. Inf. Theory, vol. 58, no. 11, pp. 6925–6934, Nov. 2012.
• [2] P. Gopalan, C. Huang, B. Jenkins, and S. Yekhanin, “Explicit maximally recoverable codes with locality,” IEEE Trans. Inf. Theory, vol. 60, no. 9, pp. 5245–5256, Sep. 2014.
• [3] D. S. Papailiopoulos and A. G. Dimakis, “Locally repairable codes,” IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 5843–5855, Oct 2014.
• [4] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath, “Optimal locally repairable and secure codes for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 212–236, Jan 2014.
• [5] S. Yekhanin, “Locally decodable codes,” Found. Trends Theoretical Comput. Sci., vol. 6, no. 3, pp. 139–255, 2012.
• [6] Y. Ben-Haim and S. Litsyn, “Upper bounds on the rate of LDPC codes as a function of minimum distance,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 2092–2100, May 2006.
• [7] V. R. Cadambe and A. Mazumdar, “Bounds on the size of locally recoverable codes,” IEEE Trans. Inf. Theory, vol. 61, no. 11, pp. 5787–5794, Nov. 2015.
• [8] N. Silberstein, A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath, “Optimal locally repairable codes via rank metric codes,” in Proceedings IEEE International Symposium on Information Theory (ISIT), 2013, pp. 1819–1823.
• [9] I. Tamo and A. Barg, “A family of optimal locally recoverable codes,” IEEE Trans. Inf. Theory, vol. 60, no. 8, pp. 4661–4676, Aug. 2014.
• [10] I. Tamo, A. Barg, and A. Frolov, “Bounds on the parameters of locally recoverable codes,” IEEE Trans. Inf. Theory, vol. 62, no. 6, pp. 3070–3083, Jun. 2016.
• [11] N. Prakash, V. Lalitha, and P. V. Kumar, “Codes with locality for two erasures,” in Proceedings IEEE International Symposium on Information Theory (ISIT), 2014, pp. 1962–1966.
• [12] P. Huang, E. Yaakobi, H. Uchikawa, and P. H. Siegel, “Linear locally repairable codes with availability,” in Proceedings IEEE International Symposium on Information Theory (ISIT), Jun. 2015, pp. 1871–1875.
• [13] A. Wang, Z. Zhang, and M. Liu, “Achieving arbitrary locality and availability in binary codes,” in Proceedings IEEE International Symposium on Information Theory (ISIT), Jun. 2015, pp. 1866–1870.
• [14] A. Wang and Z. Zhang, “Repair locality with multiple erasure tolerance,” IEEE Trans. Inf. Theory, vol. 60, no. 11, pp. 6979–6987, Nov 2014.
• [15] A. S. Rawat, D. S. Papailiopoulos, A. G. Dimakis, and S. Vishwanath, “Locality and availability in distributed storage,” in Proceedings IEEE International Symposium on Information Theory (ISIT), June 2014, pp. 681–685.
• [16] E. M. Gabidulin, “Theory of codes with maximum rank distance,” Probl. Inf. Transm., vol. 21, no. 1, pp. 1–12, 1985.
• [17] N. Alon, “Eigenvalues and expanders,” Combinatorica, vol. 6, pp. 83–96, 1986.
• [18] N. Kahale, “On the second eigenvalue and linear expansion of regular graphs,” in Proc. IEEE Symp on Foundations of Computer Science, 1992, pp. 296–303.
• [19] D. Burshtein and G. Miller, “Expander graph arguments for message-passing algorithms,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 782–790, 2001.
• [20] T. Richardson and R. Urbanke, Modern Coding Theory.   U.K.: Cambridge Univ. Press, 2008.
• [21] B. D. McKay, N. C. Wormald, and B. Wysocka, “Short cycles in random regular graphs,” Electron. J. Combinat., vol. 11, no. 1, p. 66, 2004.