# Linear Coded Caching Scheme for Centralized Networks

Minquan Cheng, Jie Li, Xiaohu Tang, Ruizhong Wei

M. Cheng is with Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China (e-mail: chengqinshi@hotmail.com). J. Li is with the Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China (e-mail: jieli873@gmail.com). X. Tang is with the Information Security and National Computing Grid Laboratory, Southwest Jiaotong University, Chengdu 610031, China (e-mail: xhutang@swjtu.edu.cn). R. Wei is with the Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada, P7B 5E1 (e-mail: rwei@lakeheadu.ca).

###### Abstract

Coded caching systems have been widely studied to reduce the data transmission during the peak traffic time. In practice, two important parameters of a coded caching system should be considered, i.e., the rate which is the maximum amount of the data transmission during the peak traffic time, and the subpacketization level, the number of divided packets of each file when we implement a coded caching scheme. We prefer to design a scheme with rate and packet number as small as possible since they reflect the transmission efficiency and complexity of the caching scheme, respectively.

In this paper, we first characterize a coded caching scheme from the viewpoint of linear algebra and show that designing a linear coded caching scheme is equivalent to constructing three classes of matrices satisfying some rank conditions. Then, based on invariant linear subspaces and combinatorial design theory, several classes of new coded caching schemes over $\mathbb{F}_2$ are obtained by constructing these three classes of matrices. It turns out that the rate of our new schemes is the same as that of the scheme constructed by Yan et al. (IEEE Trans. Inf. Theory 63, 5821-5833, 2017), but the packet number is significantly reduced. A concatenating construction is then used to accommodate a flexible number of users. Finally, by means of these matrices, we show that minimum storage regenerating codes can also be used to construct coded caching schemes.

Linear coded caching scheme, matrices, rate, packet number

## I Introduction

Due to the fast growth of network applications, there is tremendous pressure on data transmission over computer networks, and the high temporal variability of network traffic further aggravates this pressure. As a result, communication systems are congested during peak-traffic hours and underutilized during off-peak hours. Recently, caching systems, in which some content is proactively placed into the users' memories during off-peak hours, have been regarded as an interesting solution to relieve this pressure and have been used in heterogeneous wireless networks [2, 14, 15, 8, 9, 13, 12].

A centralized caching system, which has been widely discussed, is the basic network model for studying caching systems. In a centralized caching system, a server containing $N$ files of equal size is connected by a single shared link to $K$ users, each of which has a cache of size $M$ files. A caching scheme consists of two independent phases, i.e., the placement phase during the off-peak traffic hours and the delivery phase during the peak traffic hours. Assume that each user requests one file during the peak traffic hours. We prefer to design a scheme such that, in the delivery phase, the broadcast data can satisfy all possible demands of all users while the amount of broadcast data, called the rate, is as small as possible.

Maddah-Ali and Niesen in [13] proposed a coded caching approach. The proposed coded caching schemes, i.e., coded caching in the placement phase and coded broadcasts in the delivery phase, allow significant reductions in the number of bits transmitted from the server to the end users. A coded caching scheme is called an $F$-division scheme if each file is split into $F$ packets. If the packets of all files are directly cached in the placement phase, we call it uncoded placement. Otherwise we call it coded placement. Through an elaborate uncoded placement and a coded delivery, Maddah-Ali and Niesen in [13] proposed the first deterministic scheme for an $F$-division coded caching system with $F=\binom{K}{KM/N}$, when $KM/N$ is an integer. Such a scheme is referred to as the MN scheme in this paper.

### I-A Prior work

In this paper, we are interested in centralized coded caching schemes. So far, many results have been obtained based on this model, for instance, [13, 1, 5, 7, 17, 22, 25, 27, 29, 28, 10, 26]. The following two parameters are among the main concerns when researchers construct caching schemes.

Transmission rate: The first parameter is the transmission rate under the assumption that each file in the library is large enough. In [7], an improved lower bound on the transmission rate is derived via a combinatorial problem of optimally labeling the leaves of a directed tree. By interference elimination, the transmission rate of a coded caching scheme is obtained for the case . There are many discussions of the transmission rate under an additional condition, namely uncoded placement. For example, [26] used graph theory to show that the MN scheme has the minimum rate when . [28] obtained the transmission rate for various demand patterns by modifying the delivery phase of the MN scheme. [10] translated the transmission rate for non-uniform file popularity and various demand patterns into an optimization problem with variables. Interestingly, when and all of the files have the same popularity, the solution of this optimization problem is exactly the rate of the MN scheme. For the other cases, however, this optimization problem is NP-hard.

Packet number: The second parameter is the packet number $F$. In practice the packet number is finite. Furthermore, the complexity of a coded caching scheme increases as $F$ increases. $F$ is also referred to as the subpacketization level by many authors. So far, all the known discussions of the packet number assume an identical uncoded caching policy, i.e., each user caches packets with the same indices from all files, where the packets of every file are ordered according to a chosen numbering (note that we assume every file has the same size). In this paper we only consider the schemes under this assumption when . All the previously known studies introduced below make this assumption. In [5] the authors showed that the minimum packet number is $\binom{K}{KM/N}$, i.e., the packet number of the MN scheme, for the minimum rate. It is easy to see that $F$ in the MN scheme increases very quickly with the number of users $K$, which becomes infeasible when $K$ is large. It is well known that there is a tradeoff between $R$ and $F$.

The first scheme with lower packet number was proposed by Shanmugam et al. in [18]. Recently, Yan et al. in [27] characterized an $F$-division caching scheme by a simple array called a placement delivery array (PDA), where and . They then generated two infinite classes of PDAs, which give two classes of schemes. Some other deterministic coded caching schemes with lower subpacketization level were proposed at the cost of an increased rate. For example, [16] obtained two classes of schemes by constructing special -free hypergraphs. [3] generalized all the constructions in [27] and most results in [16] by means of PDAs. [23] used resolvable combinatorial designs and linear block codes to construct two classes of schemes. [19, 20] obtained some schemes from known results on Ruzsa-Szemerédi graphs. [29] obtained a class of schemes from results on strong edge coloring of bipartite graphs. [11] obtained a class of schemes by means of projective spaces over where is a prime power. However, up to now there are only a few results on the above related graphs for some special parameters, and the existence of these graphs is an open problem, especially for deterministic constructions. Furthermore, the authors in [20] pointed out that all the deterministic coded caching schemes can be recast as PDAs. By means of PDAs, the authors in [5] obtained two classes of Pareto-optimal PDAs based on the MN scheme.

As we have mentioned, $F$ in the MN scheme has to be at least $\binom{K}{KM/N}$. When $K$ is large, $F$ will be so large that the MN scheme becomes infeasible in practice. Therefore researchers have since investigated schemes with smaller $F$. On the other hand, the rate of the MN scheme is almost optimal. In this paper, we focus on explicit coded caching schemes with small rate and reduced packet number when . To compare with the known results, we list one scheme from [27] and the MN scheme from [13] in Table I. As mentioned above, there are many results on the construction of coded caching schemes. However, up to now only the schemes from [27] and [13] have a rate similar to that of our new scheme, which approximates the optimal rate. The rates of the other schemes are not competitive with that of our new scheme.

From the table, we can see that the scheme in [27] significantly reduces $F$ while its rate approximates that of the MN scheme when $K$ is large. An interesting question is whether, for the same or a similar rate as the MN scheme, there exist classes of schemes in which the packet number is smaller than that of the scheme in [27]. In this paper, we give a positive answer. The scheme in the third row is from this paper, and it further significantly reduces the value of $F$. Our new deterministic scheme has a rate almost as small as that of the MN scheme, but its packet number is much smaller than that in [27]. It is easy to check that for the fixed , the ratio of packet numbers of these two schemes is , which is exponential in . We also provide some methods to relax the restriction on $K$ so that the value of $K$ is flexible.

The main contributions of this paper are as follows. Firstly, we give a generalization of the PDA method, which is essentially the basic method for most previous constructions (see [20]). We characterize a general coded caching system from the viewpoint of linear algebra. Consequently, a coded caching scheme can be represented by three classes of matrices, namely caching matrices, coding matrices and decoding matrices, satisfying some rank conditions. This reduces the problem to finding sequences of subspaces of a linear space satisfying certain properties and interrelationships. Secondly, as one example of implementation, we use the invariant subspaces of linear algebra and combinatorial design theory to construct three classes of matrices over $\mathbb{F}_2$. Thus a class of deterministic coded caching schemes over $\mathbb{F}_2$ is obtained. The new scheme has a good rate and a significantly reduced packet number. Thirdly, since many previous schemes require strict user numbers, we use some concatenating constructions to obtain schemes with flexible $K$. Finally, we demonstrate that regenerating codes with optimal repair bandwidth, a hot topic in distributed storage, can also be used to construct a coded caching scheme.

The rest of this paper is organized as follows. Section II introduces the system model and background for coded caching systems. In Section III, a new characterization of coded caching schemes is proposed from the viewpoint of matrices. In Section IV, a new class of coded caching schemes is proposed. Section V introduces the concatenating construction. In Section VI, we show that regenerating codes with optimal repair bandwidth can be used to construct a coded caching scheme. Finally, a conclusion is drawn in Section VII. Some detailed proofs are given in the Appendix.

## II Problem formulation and preliminaries

In the centralized caching system, a single server containing $N$ files of the same length, denoted by $W_0, W_1, \ldots, W_{N-1}$, is connected to $K$ users, indexed by $k\in[0,K)$, over a shared link, and each user has a cache memory of size $M$ files (see Fig. 1).

An $F$-division coded caching scheme consists of two independent phases.

• Placement phase: The server divides each file into $F$ packets of equal size, and then places some uncoded/coded packets of each file into the users' cache memories. Denote the content cached by user $k$ by $Z_k$. In this phase the server does not know the users' requests in the following phase.

• Delivery phase: Assume that each user independently requests one file from the $N$ files at random. Denote the requested file numbers by $\mathbf{d}=(d_0,\cdots,d_{K-1})$, i.e., user $k$ requests the file $W_{d_k}$ where $d_k\in[0,N)$ and $k\in[0,K)$. The server then sends a coded signal of size at most $S_{\mathbf{d}}$ packets, denoted by $X_{\mathbf{d}}$, to the users such that each user's demand is satisfied with the help of the contents of its cache.

Let $R$ denote the maximum transmission amount among all the requests during the delivery phase, i.e.,

$$R=\sup_{\substack{\mathbf{d}=(d_0,\cdots,d_{K-1})\\ d_k\in[0,N),\ \forall k\in[0,K)}}\left\{\frac{S_{\mathbf{d}}}{F}\right\}. \tag{1}$$

$R$ is called the rate of a coded caching scheme. Maddah-Ali and Niesen proposed the first well-known deterministic coded caching scheme.

###### Lemma 1:

(MN scheme, [13]) For any positive integers $K$, $M$ and $N$ such that $KM/N$ is an integer, there exists an $F$-division coded caching scheme with $F=\binom{K}{KM/N}$ and $R=\frac{K(1-M/N)}{1+KM/N}$.

Denote the minimum value of $R$ by $R^*$. Maddah-Ali and Niesen in [13] proved that the ratio between their rate and $R^*$ is at most a constant 12. So the MN scheme is order optimal. Up to now, the rate of the MN scheme is the smallest among all the known schemes when .
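To make the growth of the packet number concrete, the following minimal sketch evaluates the standard MN parameters $F=\binom{K}{KM/N}$ and $R=K(1-M/N)/(1+KM/N)$ from [13] for a few user counts. The function name `mn_parameters` and the choice $M/N=1/2$ are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from math import comb

def mn_parameters(K, M, N):
    """Packet number F and rate R of the MN scheme (hypothetical helper).

    Assumes integer M and N and that t = K*M/N is an integer.
    """
    if (K * M) % N != 0:
        raise ValueError("K*M/N must be an integer")
    t = K * M // N
    F = comb(K, t)                           # packets per file
    R = Fraction(K * (N - M), N) / (1 + t)   # K(1 - M/N) / (1 + KM/N)
    return F, R

# The rate stays below 1 while F grows very quickly with K.
for K in (6, 12, 24, 48):
    F, R = mn_parameters(K, M=1, N=2)
    print(K, F, R)
```

Exact rational arithmetic is used so that the rate is not blurred by floating-point rounding.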

### II-A Placement delivery array

Before introducing our new linear method, we review the placement delivery array, which inspired it. As indicated in [20], the PDA method is essentially the basic method for most previous constructions. Our purpose is to generalize the PDA method and obtain better results from the generalization.

For uncoded placement, Yan et al. [27] generated some $F$-division coded caching schemes by constructing a related combinatorial structure, defined as follows.

###### Definition 1 ([27]):

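For positive integers $K$, $F$, $Z$ and $S$, an $F\times K$ array $\mathbf{P}=(p_{j,k})$, $j\in[0,F)$, $k\in[0,K)$, composed of a specific symbol "$*$" and $S$ nonnegative integers $0,1,\cdots,S-1$, is called a $(K,F,Z,S)$ placement delivery array (PDA) if it satisfies the following conditions:

• The symbol "$*$" appears $Z$ times in each column;

• Each integer occurs at least once in the array;

• For any two distinct entries $p_{j_1,k_1}$ and $p_{j_2,k_2}$, $p_{j_1,k_1}=p_{j_2,k_2}=s$ is an integer only if

  • $j_1\neq j_2$, $k_1\neq k_2$, i.e., they lie in distinct rows and distinct columns; and

  • $p_{j_1,k_2}=p_{j_2,k_1}=*$, i.e., the corresponding $2\times 2$ subarray formed by rows $j_1,j_2$ and columns $k_1,k_2$ must be of the following form

$$\begin{pmatrix} s & * \\ * & s \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} * & s \\ s & * \end{pmatrix}$$
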
###### Example 1:

It is easily checked that the following array is a PDA.

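$$\mathbf{P}=\begin{pmatrix} * & 1 & * & 2 & * & 0\\ 0 & * & * & 3 & 1 & *\\ * & 3 & 0 & * & 2 & *\\ 2 & * & 1 & * & * & 3 \end{pmatrix} \tag{2}$$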
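The claim of Example 1 can be checked mechanically. The sketch below encodes the array of (2) and tests the three PDA conditions directly; the helper name `is_pda` and the use of the string `"*"` for the star symbol are assumptions made for illustration.

```python
STAR = "*"

# The 4 x 6 array of Example 1: rows are packets, columns are users.
P = [
    [STAR, 1, STAR, 2, STAR, 0],
    [0, STAR, STAR, 3, 1, STAR],
    [STAR, 3, 0, STAR, 2, STAR],
    [2, STAR, 1, STAR, STAR, 3],
]

def is_pda(P):
    """Return True if the array satisfies the three PDA conditions."""
    F, K = len(P), len(P[0])
    # Condition 1: the star appears the same number of times in each column.
    stars = [sum(P[j][k] == STAR for j in range(F)) for k in range(K)]
    if len(set(stars)) != 1:
        return False
    # Collect the cells holding each integer.
    cells = {}
    for j in range(F):
        for k in range(K):
            if P[j][k] != STAR:
                cells.setdefault(P[j][k], []).append((j, k))
    # Condition 2: the integers are exactly 0, 1, ..., S-1.
    if sorted(cells) != list(range(len(cells))):
        return False
    # Condition 3: equal integers lie in distinct rows and columns,
    # and the two opposite corners of their 2 x 2 subarray are stars.
    for pos in cells.values():
        for a in range(len(pos)):
            for b in range(a + 1, len(pos)):
                (j1, k1), (j2, k2) = pos[a], pos[b]
                if j1 == j2 or k1 == k2:
                    return False
                if P[j1][k2] != STAR or P[j2][k1] != STAR:
                    return False
    return True

print(is_pda(P))
```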

Based on a PDA with and , Yan et al., in [27] showed that an -division caching scheme for a caching system with can be conducted by Algorithm 1.

###### Theorem 1 ([27]):

For a given PDA, by Algorithm 1 there exists an -division caching scheme with and for any and .

###### Example 2:

From Example 1, let us generate a -division coded caching scheme by Algorithm 1. Clearly and each file is , by Line-2 of Algorithm 1. According to Lines 2-5 and Lines 8-10, the two phases of a coded caching scheme can be implemented as follows:

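• Placement Phase: The contents cached by the users are

$$\begin{aligned} Z_0&=\{W_{i,0},W_{i,2}: i\in[0,6)\}, & Z_1&=\{W_{i,1},W_{i,3}: i\in[0,6)\},\\ Z_2&=\{W_{i,0},W_{i,1}: i\in[0,6)\}, & Z_3&=\{W_{i,2},W_{i,3}: i\in[0,6)\},\\ Z_4&=\{W_{i,0},W_{i,3}: i\in[0,6)\}, & Z_5&=\{W_{i,1},W_{i,2}: i\in[0,6)\}. \end{aligned}$$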
• Delivery Phase: Assume the request vector . Table II shows the transmitting process.
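The two phases of Example 2 can be sketched as follows. This is a minimal simulation under stated assumptions: packets are modeled as random bytes combined by XOR, the library has 6 files, and the request vector $\mathbf{d}=(0,1,2,3,4,5)$ is an illustrative choice. The function names `cache_of`, `deliver` and `decode` are hypothetical.

```python
import random

STAR = "*"
P = [
    [STAR, 1, STAR, 2, STAR, 0],
    [0, STAR, STAR, 3, 1, STAR],
    [STAR, 3, 0, STAR, 2, STAR],
    [2, STAR, 1, STAR, STAR, 3],
]
F, K, N = len(P), len(P[0]), 6
S = 1 + max(x for row in P for x in row if x != STAR)  # number of integers

random.seed(0)
W = [[random.randrange(256) for _ in range(F)] for _ in range(N)]  # W[n][j]

def cache_of(k):
    """Placement: user k stores packet j of every file when P[j][k] is a star."""
    return {(n, j): W[n][j] for n in range(N) for j in range(F) if P[j][k] == STAR}

def deliver(d):
    """Delivery: signal s is the XOR of W[d[k]][j] over all cells with P[j][k] = s."""
    X = [0] * S
    for j in range(F):
        for k in range(K):
            if P[j][k] != STAR:
                X[P[j][k]] ^= W[d[k]][j]
    return X

def decode(k, d, X):
    """User k recovers its file from its cache and the broadcast signals."""
    Z = cache_of(k)
    out = []
    for j in range(F):
        if P[j][k] == STAR:
            out.append(Z[(d[k], j)])
            continue
        s, val = P[j][k], X[P[j][k]]
        # Every other cell holding s sits in a row that user k has cached.
        for j2 in range(F):
            for k2 in range(K):
                if P[j2][k2] == s and (j2, k2) != (j, k):
                    val ^= Z[(d[k2], j2)]
        out.append(val)
    return out

d = (0, 1, 2, 3, 4, 5)
X = deliver(d)
print(all(decode(k, d, X) == W[d[k]] for k in range(K)), len(X) / F)  # prints: True 1.0
```

The number of broadcast signals equals the number of distinct integers in the array, so the rate of this sketch is $4/4=1$.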

By constructing appropriate PDAs, Yan et al. proposed the following two schemes, whose rate is close to that of the MN scheme but whose packet number is far smaller than that of the MN scheme.

###### Lemma 2:

([27]) For any positive integers , with , there exists a -division coded caching scheme with , and rate , respectively.

PDAs have been used to design most previous caching schemes. In a PDA, each column represents the cache content of a user. A '*' means that the corresponding packet is stored in this user's cache. An integer indicates that the packet is not stored. If a user requests a packet not stored in its cache (i.e., the corresponding cell of the PDA contains some integer, say $s$), then the server looks at all the cells of the array containing $s$. For every column containing $s$, the server adds the requested packet indexed by the row in which $s$ appears to the delivered signal. The property of the PDA guarantees that the user can recover the requested file.

## III Linear coded caching schemes

The PDA method showed that we need some algorithm to do the arrangement of the data for caching and delivery. For efficiency, we propose linear algorithms to do that. That is the main motivation of our linear coded caching schemes. Now we start to introduce our method below.

Assume that each file is a column vector of length $F$ over a certain field, say $\mathbb{F}_q$ for some prime power $q$, and that the identical caching policy for all files is carried out for each user. When $FM/N$ is a positive integer, let us characterize the coded caching scheme from the viewpoint of linear algorithms, i.e., the two phases can be written as follows.

• In the placement phase, for each $k\in[0,K)$, the $k$th user caches

$$Z_k=\{S_kW_n \mid n\in[0,N)\}$$

where $S_k$, which is called a caching matrix, is an $\frac{FM}{N}\times F$ matrix over $\mathbb{F}_q$. Clearly each user caches contents of size $M$ files.

• In the delivery phase, for any fixed request $\mathbf{d}$, the server broadcasts the signal

$$X_{\mathbf{d}}=A_0W_{d_0}+A_1W_{d_1}+\ldots+A_{K-1}W_{d_{K-1}} \tag{3}$$

where $A_k$, which is called a coding matrix, is a matrix with $F$ columns. Then user $k$ can obtain the required file $W_{d_k}$ from the cached contents $Z_k$ and the contents

$$W'_{d_k}=S'_kX_{\mathbf{d}}$$

where $S'_k$, which is called a decoding matrix, is a matrix with as many columns as $X_{\mathbf{d}}$ has entries.

In this paper, a coded caching scheme which can be characterized by the above three classes of matrices, i.e., caching matrices, coding matrices and decoding matrices, is called linear caching scheme.

Next, we shall prove some algebraic properties of the three series of matrices.

###### Theorem 2:

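For any request $\mathbf{d}$ and the related signal $X_{\mathbf{d}}$ in (3), user $k$ can obtain the required file $W_{d_k}$ if and only if the matrices $S_k$, $A_{k'}$ and $S'_k$, $k,k'\in[0,K)$, satisfy the following conditions.

$$\operatorname{rank}\begin{pmatrix}S_k\\ S'_kA_{k'}\end{pmatrix}=\begin{cases}\frac{FM}{N} & \text{if } k\neq k',\\ F & \text{if } k=k',\end{cases}\qquad k,k'\in[0,K). \tag{4}$$
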
###### Proof.

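For any request vector $\mathbf{d}$ and the related signal $X_{\mathbf{d}}$, user $k$ can get the following contents by the decoding matrix $S'_k$:

$$S'_kX_{\mathbf{d}}=S'_kA_0W_{d_0}+S'_kA_1W_{d_1}+\ldots+S'_kA_{K-1}W_{d_{K-1}}.$$

That is,

$$S'_kA_kW_{d_k}=S'_kX_{\mathbf{d}}-\sum_{k'\in[0,K)\setminus\{k\}}S'_kA_{k'}W_{d_{k'}}. \tag{5}$$

Clearly user $k$ can cancel the last term on the right-hand side of the above equality if and only if, for each $k'\neq k$, one can obtain $S'_kA_{k'}W_{d_{k'}}$ from user $k$'s cached contents $Z_k$, i.e., the rows of $S'_kA_{k'}$ can be linearly represented by the rows of $S_k$. This implies that the following equation always holds.

$$\operatorname{rank}\begin{pmatrix}S_k\\ S'_kA_{k'}\end{pmatrix}=\frac{FM}{N}.$$

Furthermore, user $k$ can obtain the packets of $W_{d_k}$ if and only if the following equation holds.

$$\operatorname{rank}\begin{pmatrix}S_k\\ S'_kA_k\end{pmatrix}=F.$$

The proof is complete. ∎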

###### Remark 1:

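By (4) in Theorem 2, we know that the matrices $S_k$ and $S'_kA_k$, $k\in[0,K)$, must have full row rank. From linear algebra, for each $k,k'\in[0,K)$ with $k\neq k'$, we can find a matrix $D_{k,k'}$ satisfying

$$D_{k,k'}S_k=S'_kA_{k'}. \tag{6}$$

So from the above equation, (5) can be written as follows.

$$S'_kA_kW_{d_k}=S'_kX_{\mathbf{d}}-\sum_{k'\in[0,K)\setminus\{k\}}D_{k,k'}S_kW_{d_{k'}}.$$

Then

$$\begin{pmatrix}S_kW_{d_k}\\ S'_kA_kW_{d_k}\end{pmatrix}=\begin{pmatrix}S_k\\ S'_kA_k\end{pmatrix}W_{d_k}=\begin{pmatrix}S_kW_{d_k}\\ S'_kX_{\mathbf{d}}-\sum_{k'\in[0,K)\setminus\{k\}}D_{k,k'}S_kW_{d_{k'}}\end{pmatrix}.$$

This implies that

$$W_{d_k}=\begin{pmatrix}S_k\\ S'_kA_k\end{pmatrix}^{-1}\begin{pmatrix}S_kW_{d_k}\\ S'_kX_{\mathbf{d}}-\sum_{k'\in[0,K)\setminus\{k\}}D_{k,k'}S_kW_{d_{k'}}\end{pmatrix}. \tag{7}$$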

The following example demonstrates how the scheme works.

###### Example 3:

Let , and . We have . For , , define and in the following way.

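$$A_0=\begin{pmatrix}e_1\\e_1\\e_3\\e_3\end{pmatrix},\quad A_1=\begin{pmatrix}e_0\\e_0\\e_2\\e_2\end{pmatrix},\quad A_2=\begin{pmatrix}e_2\\e_3\\e_2\\e_3\end{pmatrix},\quad A_3=\begin{pmatrix}e_0\\e_1\\e_0\\e_1\end{pmatrix},\quad A_4=\begin{pmatrix}e_0\\0\\e_2\\0\end{pmatrix},\quad A_5=\begin{pmatrix}e_0\\e_1\\0\\0\end{pmatrix}$$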

Here $e_i$ is a row vector of length 4 whose $i$th entry is 1 and whose other entries are 0s, $i\in[0,4)$. It is easy to check that (4) holds for each integer . A -division coded caching system can be implemented as follows, where :

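• Placement Phase: The contents cached by the users are

$$\begin{aligned} Z_0&=\{S_0W_i: i\in[0,6)\}, & Z_1&=\{S_1W_i: i\in[0,6)\},\\ Z_2&=\{S_2W_i: i\in[0,6)\}, & Z_3&=\{S_3W_i: i\in[0,6)\},\\ Z_4&=\{S_4W_i: i\in[0,6)\}, & Z_5&=\{S_5W_i: i\in[0,6)\}. \end{aligned}$$

• Delivery Phase: Assume the request vector $\mathbf{d}=(0,1,2,3,4,5)$. Then the server just broadcasts

$$X_{\mathbf{d}}=A_0W_0+A_1W_1+A_2W_2+A_3W_3+A_4W_4+A_5W_5$$

Now let us consider user 0 first. User 0 can obtain the following content from $X_{\mathbf{d}}$ and $S'_0$.

$$\begin{aligned} S'_0X_{\mathbf{d}} &= S_0A_0W_0+S_0A_1W_1+S_0A_2W_2+S_0A_3W_3+S_0A_4W_4+S_0A_5W_5\\ &= \begin{pmatrix}e_1\\e_3\end{pmatrix}W_0+\begin{pmatrix}e_0\\e_2\end{pmatrix}W_1+\begin{pmatrix}e_2\\e_2\end{pmatrix}W_2+\begin{pmatrix}e_0\\e_0\end{pmatrix}W_3+\begin{pmatrix}e_0\\e_2\end{pmatrix}W_4+\begin{pmatrix}e_0\\0\end{pmatrix}W_5 \end{aligned}$$

That is,

$$\begin{pmatrix}e_1\\e_3\end{pmatrix}W_0 = S'_0X_{\mathbf{d}}-\begin{pmatrix}e_0\\e_2\end{pmatrix}W_1-\begin{pmatrix}e_2\\e_2\end{pmatrix}W_2-\begin{pmatrix}e_0\\e_0\end{pmatrix}W_3-\begin{pmatrix}e_0\\e_2\end{pmatrix}W_4-\begin{pmatrix}e_0\\0\end{pmatrix}W_5.$$

Then we can find matrices

$$D_{0,1}=\begin{pmatrix}1&0\\0&1\end{pmatrix},\ D_{0,2}=\begin{pmatrix}0&1\\0&1\end{pmatrix},\ D_{0,3}=\begin{pmatrix}1&0\\1&0\end{pmatrix},\ D_{0,4}=\begin{pmatrix}1&0\\0&1\end{pmatrix},\ D_{0,5}=\begin{pmatrix}1&0\\0&0\end{pmatrix}$$

satisfying (6). So we have

$$\begin{pmatrix}e_1\\e_3\end{pmatrix}W_0=\begin{pmatrix}w_{1,0}\\w_{3,0}\end{pmatrix}=S'_0X_{\mathbf{d}}-D_{0,1}S_0W_1-D_{0,2}S_0W_2-D_{0,3}S_0W_3-D_{0,4}S_0W_4-D_{0,5}S_0W_5.$$

By (7), user 0 can get

$$W_0=\begin{pmatrix}S_0\\S_0A_0\end{pmatrix}^{-1}\begin{pmatrix}w_{0,0}\\w_{2,0}\\w_{1,0}\\w_{3,0}\end{pmatrix}=\begin{pmatrix}e_0\\e_2\\e_1\\e_3\end{pmatrix}^{-1}\begin{pmatrix}w_{0,0}\\w_{2,0}\\w_{1,0}\\w_{3,0}\end{pmatrix}=\begin{pmatrix}w_{0,0}\\w_{1,0}\\w_{2,0}\\w_{3,0}\end{pmatrix}.$$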

Similarly we can show that the other users’ requests can be all satisfied.
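The decoding steps of user 0 in Example 3 can be replayed numerically. The sketch below works over $\mathbb{F}_2$ (an assumption; the example only uses entries 0 and 1) and takes $S_0=S'_0$ to be the matrix with rows $e_0,e_2$, as read off from the worked equations above. The helper names `matvec`, `matmul` and `user0_decodes` are hypothetical.

```python
def e(i, n=4):
    """Unit row vector of length n with a 1 in position i."""
    return [1 if j == i else 0 for j in range(n)]

ZERO = [0, 0, 0, 0]

def matvec(Mx, v):
    return [sum(a * b for a, b in zip(row, v)) % 2 for row in Mx]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(r, c)) % 2 for c in cols] for r in X]

# Coding matrices of Example 3, one list of rows per user.
A = [
    [e(1), e(1), e(3), e(3)],
    [e(0), e(0), e(2), e(2)],
    [e(2), e(3), e(2), e(3)],
    [e(0), e(1), e(0), e(1)],
    [e(0), ZERO, e(2), ZERO],
    [e(0), e(1), ZERO, ZERO],
]
S0 = [e(0), e(2)]   # caching matrix of user 0 (inferred from the example)
S0p = S0            # decoding matrix of user 0, assumed equal to S0
D = {1: [[1, 0], [0, 1]], 2: [[0, 1], [0, 1]], 3: [[1, 0], [1, 0]],
     4: [[1, 0], [0, 1]], 5: [[1, 0], [0, 0]]}

def eq6_holds():
    """Check D_{0,k'} S_0 = S'_0 A_{k'} for every k' != 0."""
    return all(matmul(Dm, S0) == matmul(S0p, A[kp]) for kp, Dm in D.items())

def user0_decodes(W, d=(0, 1, 2, 3, 4, 5)):
    """Return user 0's reconstruction of the file it requested."""
    X = [0, 0, 0, 0]
    for k in range(6):                       # server side: signal (3)
        X = [(x + y) % 2 for x, y in zip(X, matvec(A[k], W[d[k]]))]
    cached = [matvec(S0, Wn) for Wn in W]    # user side: S_0 W_n for all n
    y = matvec(S0p, X)
    for kp, Dm in D.items():                 # cancel the interference terms
        y = [(a - b) % 2 for a, b in zip(y, matvec(Dm, cached[d[kp]]))]
    w0, w2 = cached[d[0]]                    # cached packets 0 and 2
    return [w0, y[0], w2, y[1]]              # y holds packets 1 and 3

W = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0],
     [0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 1]]
print(eq6_holds(), user0_decodes(W) == W[0])
```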

Now we indicate that the linear coded caching system is a generalization of the PDA. We have the following theorem which is proved in Appendix.

###### Theorem 3:

The coded caching scheme obtained by a PDA is also linear.

The following example gives a demonstration.

###### Example 4:

It is easy to check that in Example 1 the coded caching scheme realized by a PDA can be represented by the following matrices and , , , , .

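$$S_0=\begin{pmatrix}e_0\\e_2\end{pmatrix},\ S_1=\begin{pmatrix}e_1\\e_3\end{pmatrix},\ S_2=\begin{pmatrix}e_0\\e_1\end{pmatrix},\ S_3=\begin{pmatrix}e_2\\e_3\end{pmatrix},\ S_4=\begin{pmatrix}e_0\\e_3\end{pmatrix},\ S_5=\begin{pmatrix}e_1\\e_2\end{pmatrix}$$

$$A_0=\begin{pmatrix}0&1&0&0\\0&0&0&0\\0&0&0&1\\0&0&0&0\end{pmatrix},\ A_1=\begin{pmatrix}0&0&0&0\\1&0&0&0\\0&0&0&0\\0&0&1&0\end{pmatrix},\ A_2=\begin{pmatrix}0&0&1&0\\0&0&0&1\\0&0&0&0\\0&0&0&0\end{pmatrix},$$

$$A_3=\begin{pmatrix}0&0&0&0\\0&0&0&0\\1&0&0&0\\0&1&0&0\end{pmatrix},\ A_4=\begin{pmatrix}0&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&0\end{pmatrix},\ A_5=\begin{pmatrix}1&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&1\end{pmatrix}$$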

It is easy to check that (4) holds for each integer .
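The rank conditions (4) can be verified mechanically for a PDA-based scheme. The sketch below derives caching and coding matrices from the PDA of Example 1 and checks the ranks over $\mathbb{F}_2$. The derivation rule is an assumption made for illustration: row $s$ of $A_k$ is $e_j$ when $p_{j,k}=s$, and $S'_k$ selects the signals whose integers appear in column $k$. The names `rank_gf2` and `rank_conditions_hold` are hypothetical.

```python
STAR = "*"
P = [
    [STAR, 1, STAR, 2, STAR, 0],
    [0, STAR, STAR, 3, 1, STAR],
    [STAR, 3, 0, STAR, 2, STAR],
    [2, STAR, 1, STAR, STAR, 3],
]
F, K = len(P), len(P[0])
S_NUM = 4  # number of distinct integers in P

def e(i, n):
    return [1 if j == i else 0 for j in range(n)]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(r, c)) % 2 for c in cols] for r in X]

def rank_gf2(rows):
    """Rank over GF(2) by Gaussian elimination on 0/1 row vectors."""
    rows = [list(r) for r in rows]
    rank = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Caching matrix: unit rows for the starred packets of column k.
S_mat = [[e(j, F) for j in range(F) if P[j][k] == STAR] for k in range(K)]
# Coding matrix: row s has a 1 in column j exactly when P[j][k] = s.
A = [[[1 if P[j][k] == s else 0 for j in range(F)] for s in range(S_NUM)]
     for k in range(K)]
# Decoding matrix: selects the signals whose integers occur in column k.
Sp = [[e(P[j][k], S_NUM) for j in range(F) if P[j][k] != STAR] for k in range(K)]

def rank_conditions_hold():
    Z = len(S_mat[0])  # equals F*M/N
    return all(
        rank_gf2(S_mat[k] + matmul(Sp[k], A[kp])) == (F if kp == k else Z)
        for k in range(K) for kp in range(K)
    )

print(rank_conditions_hold())
```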

## IV A new construction of linear coded caching scheme

From Theorem 2, we need to find the three classes of matrices $S_k$, $A_k$ and $S'_k$ satisfying the rank conditions. In this section, we give one concrete construction. Other constructions should exist, and finding better ones is an interesting open problem.

Throughout this paper we do not specifically distinguish between a matrix and the vector space spanned by its rows if the context is clear. Note that (4) holds if and only if the following formula holds.

$$\begin{cases} S'_kA_{k'}\subseteq S_k & \text{if } k\neq k',\\ S_k+S'_kA_{k'}=\mathbb{F}_2^F & \text{if } k=k', \end{cases}\qquad k,k'\in[0,K) \tag{8}$$

Here the sum of two subspaces $U$, $V$ of $\mathbb{F}_2^F$ is defined as $U+V=\{u+v \mid u\in U,\ v\in V\}$.

In this section, we will focus on schemes over $\mathbb{F}_2$. For simplicity, we assume that each file has a size of $F$ bits.

### IV-A Intuitions

From the proof of Theorem 3, we know that a PDA can be represented by caching matrices, coding matrices and decoding matrices satisfying the rank conditions. So we may construct linear coded caching schemes by borrowing some key structures that are used in the constructions of PDAs. We note, however, that there should be other methods to construct linear coded caching schemes.

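Now let us construct caching matrices, coding matrices and decoding matrices satisfying (8). For any fixed integers $q$ and $m$, we can denote each integer $s\in[0,q^m)$ by $s=(s_0,\cdots,s_{m-1})_q$ if $s=\sum_{j=0}^{m-1}s_jq^j$ where $s_j\in[0,q)$. We refer to $(s_0,\cdots,s_{m-1})_q$ as the $q$-ary representation of $s$. There are $m$ partitions of $[0,q^m)$, i.e., for each $u\in[0,m)$, the $u$th partition is

$$V_{u,v}=\{s=(s_0,\cdots,s_{m-1})_q \mid s_u=v,\ s\in[0,q^m)\},\quad v\in[0,q),$$

which are the placement sets used to construct PDAs in [27]. It is easy to check that the following formula holds for any integers $u_1,u_2\in[0,m)$ and $v_1,v_2\in[0,q)$ with $(u_1,v_1)\neq(u_2,v_2)$.

$$\left|V_{u_1,v_1}\cap V_{u_2,v_2}\right|=\begin{cases}0 & \text{if } u_1=u_2,\ v_1\neq v_2,\\ q^{m-2} & \text{if } u_1\neq u_2.\end{cases}$$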
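The intersection sizes of the placement sets can be confirmed by brute force. In the sketch below, the digit order (least significant digit first) is an assumption chosen to match the sets $E_{0,0}=\{e_0,e_3,e_6\}$ listed in Example 5; the names `digits`, `V` and `intersections_ok` are hypothetical.

```python
from itertools import product

def digits(s, q, m):
    """q-ary digits (s_0, ..., s_{m-1}) of s, least significant first."""
    return [(s // q**j) % q for j in range(m)]

def V(u, v, q, m):
    """Placement set: all s in [0, q^m) whose u-th digit equals v."""
    return {s for s in range(q**m) if digits(s, q, m)[u] == v}

def intersections_ok(q, m):
    """Check both cases of the intersection formula for all index pairs."""
    for u1, v1, u2, v2 in product(range(m), range(q), range(m), range(q)):
        size = len(V(u1, v1, q, m) & V(u2, v2, q, m))
        if u1 == u2 and v1 != v2 and size != 0:
            return False
        if u1 != u2 and size != q ** (m - 2):
            return False
    return True

print(sorted(V(0, 0, 3, 2)), intersections_ok(3, 2), intersections_ok(2, 3))
```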

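Let $e_s$ be a row vector of length $q^m$ whose $s$th entry is 1 and whose other entries are 0s. Define

$$E_{u,v}=\{e_s \mid s\in V_{u,v}\} \tag{10}$$

and

$$Q_u=\left\{\sum_{s_u=0}^{q-1}e_{(s_0,\cdots,s_{m-1})_q} \;\middle|\; s_j\in[0,q),\ j\neq u\right\} \tag{11}$$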

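where the sum is performed modulo 2. For each , and with , define

$$C_{u,v,v'}=\begin{pmatrix}\phi_{u,v}(0)\,e_{\varphi_{u,v'}(0)}\\ \phi_{u,v}(1)\,e_{\varphi_{u,v'}(1)}\\ \vdots\\ \phi_{u,v}(q^m-1)\,e_{\varphi_{u,v'}(q^m-1)}\end{pmatrix}+\begin{pmatrix}\phi_{u,v'}(0)\,e_0\\ \phi_{u,v'}(1)\,e_1\\ \vdots\\ \phi_{u,v'}(q^m-1)\,e_{q^m-1}\end{pmatrix} \tag{12}$$

and

$$C_{u,q,v}=\begin{pmatrix}\phi_{u,v}(0)\,e_0\\ \phi_{u,v}(1)\,e_1\\ \vdots\\ \phi_{u,v}(q^m-1)\,e_{q^m-1}\end{pmatrix} \tag{13}$$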

where

 (14)
###### Example 5:

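Let $q=3$ and $m=2$. From (10) and (11) we have

$$\begin{aligned} E_{0,0}&=\{e_0,e_3,e_6\}, & E_{0,1}&=\{e_1,e_4,e_7\}, & E_{0,2}&=\{e_2,e_5,e_8\},\\ E_{1,0}&=\{e_0,e_1,e_2\}, & E_{1,1}&=\{e_3,e_4,e_5\}, & E_{1,2}&=\{e_6,e_7,e_8\} \end{aligned}$$

and

$$\begin{aligned} Q_0&=\{e_0+e_1+e_2,\ e_3+e_4+e_5,\ e_6+e_7+e_8\},\\ Q_1&=\{e_0+e_3+e_6,\ e_1+e_4+e_7,\ e_2+e_5+e_8\}. \end{aligned}$$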

By (12) and (13), we have

 (15)
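The sets of (10) and (11) in Example 5 can be regenerated with a few lines. In this sketch each unit vector $e_s$ is represented by its index $s$, and each vector of $Q_u$ by the sorted list of indices of its summands. This representation and the function names `E` and `Q` are illustrative assumptions.

```python
def E(u, v, q, m):
    """Indices s of the unit vectors e_s in E_{u,v} (digit u of s equals v)."""
    return sorted(s for s in range(q**m) if (s // q**u) % q == v)

def Q(u, q, m):
    """Each vector of Q_u, given as the indices of the q unit vectors it sums."""
    groups = {}
    for s in range(q**m):
        key = s - ((s // q**u) % q) * q**u   # s with its u-th digit set to 0
        groups.setdefault(key, []).append(s)
    return sorted(groups.values())

q, m = 3, 2
for u in range(m):
    print([E(u, v, q, m) for v in range(q)], Q(u, q, m))
```

For $q=3$ and $m=2$ the output lists the same index sets as Example 5, for instance `[0, 3, 6]` for $E_{0,0}$ and `[0, 1, 2]` for the first vector of $Q_0$.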

The following result is very useful for our new construction of coded caching scheme.

###### Lemma 3:

Given positive integers and , sets and matrices in (10), (11), (12) and (13) satisfy the following conditions for any integers , and any integers , , .

• If , .

• If , . Otherwise, if , .

• If and , . Otherwise, if or , .

The proof of Lemma 3 is included in Appendix B. For any positive integer with , let . For any integer and , we can define sets

$$G_{v,\varepsilon}=\{v+1+\varepsilon(q-z),\ v+2+\varepsilon(q-z),\ \ldots,\ v+(q-z)+\varepsilon(q-z)\}_q, \tag{16}$$

Here for any given integer set .
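The sets of (16) are easy to enumerate. The sketch below assumes that the subscript $q$ on a set means that every element is reduced modulo $q$, and it uses $q=8$, $z=6$ as an illustrative choice that reproduces sets such as $\{1,2\}$, $\{3,4\}$ and $\{7,0\}$. The function name `G` is hypothetical.

```python
def G(v, eps, q, z):
    """Elements v+1+eps(q-z), ..., v+(q-z)+eps(q-z), each reduced mod q."""
    step = q - z
    return [(v + i + eps * step) % q for i in range(1, step + 1)]

q, z = 8, 6  # assumed parameters, so that q - z = 2
for v in range(3):
    print([G(v, eps, q, z) for eps in range(3)])
```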

###### Example 6:
• When and , we have . From (16), we have

 (17)
• When and , we have . By (16), we have

 G0,0={1,2},   G0,1={3,4},   G0,2={5,6}, G1,0={2,3},   G1,1={4,5},   G1,2={6,7}, G2,0={3,4},   G2,1={5,6},   G2,2={7,0}, G3,0={4,5},