# On the Benefits of Asymmetric Coded Cache Placement in Combination Networks with End-User Caches

Kai Wan¹, Mingyue Ji², Pablo Piantanida¹, Daniela Tuninetti³

¹L2S CentraleSupélec-CNRS-Université Paris-Sud, France, {kai.wan, pablo.piantanida}@l2s.centralesupelec.fr

²University of Utah, Salt Lake City, USA, mingyue.ji@utah.edu

³University of Illinois at Chicago, Chicago, USA, danielat@uic.edu

###### Abstract

THIS PAPER IS ELIGIBLE FOR THE STUDENT PAPER AWARD. This paper investigates the fundamental tradeoff between cache size and download time in the $(\mathsf{H},\mathsf{r},\mathsf{M},\mathsf{N})$ combination network, where a server with $\mathsf{N}$ files is connected to $\mathsf{H}$ relays (without caches) and each of the $\mathsf{K}:=\binom{\mathsf{H}}{\mathsf{r}}$ users (with caches of size $\mathsf{M}$ files) is connected to a different subset of $\mathsf{r}$ relays. Existing schemes fall into two categories: they either use the uncoded symmetric cache placement originally proposed for the shared-link model and design a delivery phase that depends on the network topology, or effectively divide the combination network into $\mathsf{H}$ independent shared-link networks, each serving $\binom{\mathsf{H}-1}{\mathsf{r}-1}$ users; in either case, the placement phase is independent of the network topology. In this paper, a novel strategy is proposed where the coded cache placement depends on the network topology. The proposed scheme is shown to be information theoretically optimal for large cache size. In addition, when not exactly optimal, the proposed scheme can also outperform existing schemes.

## I Introduction

Caching content in the end-users' memories smooths the network traffic. A caching scheme comprises two phases. (i) Placement phase: during off-peak hours, the server places parts of its library into the users' caches without knowledge of what the users will later demand. When pieces of files are simply copied into the cache, the cache placement phase is said to be uncoded; otherwise it is coded. (ii) Delivery phase: each user requests one file during peak-hour time. According to the user demands and cache contents, the server transmits the smallest number of files needed to satisfy the user demands.

Caching was originally studied by Maddah-Ali and Niesen (MAN) in [dvbt2fundamental] for shared-link networks, which comprise a server with $\mathsf{N}$ files, $\mathsf{K}$ users each with a cache of size $\mathsf{M}$ files, and an error-free broadcast link. The MAN scheme uses uncoded cache placement and a binary linear network code to deliver coded messages that are simultaneously useful for $1+\mathsf{K}\mathsf{M}/\mathsf{N}$ users. Coded caching was shown to provide a multiplicative coded caching/multicast gain of $1+\mathsf{K}\mathsf{M}/\mathsf{N}$ over conventional uncoded caching schemes. In [yas2], a variation of the MAN scheme was shown to be information theoretically optimal to within a factor $2$ for shared-link networks.

Since users may communicate with the central server through intermediate relays, caching in relay networks has recently been considered. Since it is difficult to analyze general relay networks, a symmetric network, known as the combination network [cachingincom], has received significant attention. A combination network comprises a server with $\mathsf{N}$ files that is connected to $\mathsf{H}$ relays (without caches) through orthogonal links, and each of the $\mathsf{K}=\binom{\mathsf{H}}{\mathsf{r}}$ users (with caches of size $\mathsf{M}$ files) is connected to a different subset of $\mathsf{r}$ relays through orthogonal links, as shown in Fig. 1.

#### Past Work (for combination networks)

Existing works use the MAN uncoded placement for shared-link networks in the placement phase (which is agnostic of the network topology) and then design the delivery phase by leveraging the network topology [cachingincom, novelwan2017, wan2017novelmulticase]; these schemes are symmetric in the sense that for every file there exists one subfile cached by each subset of $t=\mathsf{K}\mathsf{M}/\mathsf{N}$ users. The main limitation of the MAN placement is that the multicasting opportunities (directly related to the overall coded caching gain) to transmit the various subfiles are not "symmetric" across subfiles (because relays are connected to different sets of users). One way to deal with this limitation is to divide the combination network into $\mathsf{H}$ independent shared-link networks and to precode every file by an MDS (Maximum Distance Separable) code, so that it becomes irrelevant from which relay a user has received a coded subfile, as long as enough coded subfiles have been collected [Zewail2017codedcaching]. The limitation of this coded placement is that the coded caching gain is now that of a network with an equivalent number of $\binom{\mathsf{H}-1}{\mathsf{r}-1}$ users, which appears to be suboptimal in light of known results for shared-link networks (i.e., the coded caching gain fundamentally scales linearly with the number of users $\mathsf{K}$).

#### Contributions

In this paper we propose a novel placement that aims to attain identical "multicasting opportunities" for each coded subfile, which is then delivered by using a variation of the scheme proposed in [wan2017novelmulticase]. Interestingly, our asymmetric placement leads to a "symmetric delivery" (to be made precise later). The novel scheme is proved to be information theoretically optimal when $\mathsf{M}\geq \mathsf{N}(\mathsf{K}-\mathsf{H}+\mathsf{r}-1)/\mathsf{K}$. To the best of our knowledge, this is the very first work that characterizes the exact memory-download time tradeoff for combination networks. In addition, when not optimal, the proposed scheme can also outperform state-of-the-art schemes.

The paper is organized as follows. Section II gives the formal problem definition and states the main results. Section III contains proofs and numerical evaluations.

## II System Model and Main Results

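We use the following notation convention. A collection is a set of sets, e.g., $\{\{1,2\},\{1,3\}\}$. Calligraphic symbols denote sets or collections, bold symbols denote vectors, and sans-serif symbols denote system parameters. We use $|\cdot|$ to represent the cardinality of a set or the length of a vector; $[a:b]:=\{a,a+1,\ldots,b\}$ and $[n]:=[1:n]$; $\oplus$ represents bit-wise XOR. We define the set $\arg\max_{x\in\mathcal{X}} f(x):=\{x\in\mathcal{X}: f(x)=\max_{x'\in\mathcal{X}} f(x')\}$. Our convention is that $\binom{x}{y}=0$ if $x<0$ or $y<0$ or $x<y$.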

### II-A System Model

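In a combination network, a server has $\mathsf{N}$ files, denoted by $F_1,\ldots,F_{\mathsf{N}}$, each composed of $\mathsf{B}$ i.i.d. uniformly distributed bits. The server is connected to $\mathsf{H}$ relays through $\mathsf{H}$ error-free orthogonal links. The relays are connected to $\mathsf{K}=\binom{\mathsf{H}}{\mathsf{r}}$ users through $\mathsf{r}\mathsf{K}$ error-free orthogonal links. Each user has a local cache of size $\mathsf{M}\mathsf{B}$ bits, for $\mathsf{M}\in[0,\mathsf{N}]$, and is connected to a distinct subset of $\mathsf{r}$ relays.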

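The set of users connected to relay $h$ is denoted by $\mathcal{U}_h$, $h\in[\mathsf{H}]$. The set of relays connected to user $k$ is denoted by $\mathcal{H}_k$, $k\in[\mathsf{K}]$. For each set of relays $\mathcal{J}\subseteq[\mathsf{H}]$, we denote the common connected users for the relays in $\mathcal{J}$ by $\mathcal{U}_{\mathcal{J}}:=\cap_{h\in\mathcal{J}}\mathcal{U}_h$. For the network in Fig. 1, for example, $\mathcal{U}_1=\{1,2,3\}$, $\mathcal{H}_1=\{1,2\}$ and $\mathcal{U}_{\{1,2\}}=\{1\}$.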
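The index sets above are straightforward to enumerate. The following is a minimal sketch, assuming users are numbered in lexicographic order of their relay subsets (a convention not stated explicitly in the text, although it reproduces the relay-user sets listed in Example 1); the function names are illustrative only.

```python
from itertools import combinations

def combination_network(H, r):
    """Build an (H, r) combination network.

    Returns (relays_of, users_of): relays_of[k] is the relay set H_k of user k,
    users_of[h] is the user set U_h of relay h (relays and users are 1-indexed).
    Assumption: users are numbered in lexicographic order of their relay subsets.
    """
    relays_of = {k: frozenset(s)
                 for k, s in enumerate(combinations(range(1, H + 1), r), start=1)}
    users_of = {h: {k for k, s in relays_of.items() if h in s}
                for h in range(1, H + 1)}
    return relays_of, users_of

def common_users(users_of, relays):
    """Users connected to every relay in the non-empty collection `relays`."""
    return set.intersection(*(users_of[h] for h in relays))

relays_of, users_of = combination_network(5, 3)
print(len(relays_of))                              # number of users K
print(sorted(users_of[2]))                         # U_2
print(sorted(common_users(users_of, [1, 2])))      # common users of relays 1 and 2
```

Each relay serves the same number of users, which is the quantity denoted by K′ in the sequel.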

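In the placement phase, user $k\in[\mathsf{K}]$ stores information about the $\mathsf{N}$ files in its cache of size $\mathsf{M}\mathsf{B}$ bits, where $\mathsf{M}\in[0,\mathsf{N}]$. The cache content of user $k$ is denoted by $Z_k$; let $\mathbf{Z}:=(Z_1,\ldots,Z_{\mathsf{K}})$. During the delivery phase, user $k$ requests file $d_k\in[\mathsf{N}]$; the demand vector $\mathbf{d}:=(d_1,\ldots,d_{\mathsf{K}})$ is revealed to all nodes. Given $(\mathbf{d},\mathbf{Z})$, the server sends a message $X_h$ of $\mathsf{B}\,\mathsf{R}_h(\mathbf{d},\mathbf{Z})$ bits to relay $h\in[\mathsf{H}]$. Then, relay $h$ transmits a message $X_{h\to k}$ of $\mathsf{B}\,\mathsf{R}_{h\to k}(\mathbf{d},\mathbf{Z})$ bits to user $k\in\mathcal{U}_h$. User $k$ must recover its desired file $F_{d_k}$ from $Z_k$ and $(X_{h\to k}: h\in\mathcal{H}_k)$ with high probability when $\mathsf{B}\to\infty$. The objective is to determine the optimal max-link load defined as

$$\mathsf{R}^{\star}:=\min_{\mathbf{Z}}\ \max_{k\in\mathcal{U}_h,\ h\in[\mathsf{H}],\ \mathbf{d}\in[\mathsf{N}]^{\mathsf{K}}}\ \max\big\{\mathsf{R}_h(\mathbf{d},\mathbf{Z}),\ \mathsf{R}_{h\to k}(\mathbf{d},\mathbf{Z})\big\}. \tag{1}$$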

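Since the max-link load of the uncoded routing scheme in [cachingincom] is $\mathsf{K}(1-\mathsf{M}/\mathsf{N})/\mathsf{H}$, we define the coded caching gain of a scheme with max-link load $\mathsf{R}$ as

$$\mathsf{g}:=\frac{\mathsf{K}(1-\mathsf{M}/\mathsf{N})/\mathsf{H}}{\mathsf{R}}. \tag{2}$$

Define $\mathsf{K}':=\mathsf{K}\mathsf{r}/\mathsf{H}=\binom{\mathsf{H}-1}{\mathsf{r}-1}$, where $\mathsf{K}'$ is the number of users connected to each relay. By the cut-set bound in [cachingincom], $\mathsf{g}\leq\mathsf{K}'$.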

### II-B Main Results

We now state our main results. Thm.1 gives the max-link load of the novel proposed scheme with coded asymmetric cache placement and Thm.2 gives the optimality results. Different from the state-of-the-art schemes, which fix the cache size and compute the load (and thus the coded caching gain), in the proposed scheme we fix a coded caching gain and then find the minimum needed cache size.

###### Theorem 1.

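For an $(\mathsf{H},\mathsf{r},\mathsf{M},\mathsf{N})$ combination network, the lower convex envelope of the following points

$$(\mathsf{M},\mathsf{R})=\left(\frac{\mathsf{N}\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}}{\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}+\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}},\ \frac{\mathsf{K}(1-\mathsf{M}/\mathsf{N})}{\mathsf{H}\,\mathsf{g}}\right), \tag{3}$$

for (coded caching gain) $\mathsf{g}\in[1:\mathsf{K}']$, $\mathsf{q}:=\mathsf{K}'-\mathsf{g}+1$, and $\mathsf{K}'':=\binom{\mathsf{H}}{\mathsf{r}-1}$, is achievable.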
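The corner points of Thm.1 can be tabulated numerically. The sketch below assumes that (3) reads $\mathsf{M}=\mathsf{N}\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}/\big(\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}+\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}\big)$ with $\mathsf{K}'=\binom{\mathsf{H}-1}{\mathsf{r}-1}$, $\mathsf{K}''=\binom{\mathsf{H}}{\mathsf{r}-1}$ and $\mathsf{q}=\mathsf{K}'-\mathsf{g}+1$; this reading is inferred from Examples 1 and 2, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def theorem1_point(H, r, N, g):
    """Return the (M, R) point of Thm.1 for coded caching gain g (assumed reading of (3))."""
    K = comb(H, r)                 # number of users
    K1 = comb(H - 1, r - 1)        # K': users per relay
    K2 = comb(H, r - 1)            # K'': number of (r-1)-subsets of relays
    if not 1 <= g <= K1:
        raise ValueError("g must lie in [1 : K']")
    q = K1 - g + 1
    cached = comb(K2 - r, q)               # coded symbols of a file held in each cache
    delivered = r * comb(K1 - 1, q - 1)    # coded symbols each user receives in delivery
    M = Fraction(N * cached, cached + delivered)
    R = Fraction(K, H) * (1 - M / N) / g
    return M, R

# Corner points for the parameters of Example 3 (H=6, r=2, N=15)
for g in range(1, comb(5, 1) + 1):
    print(g, theorem1_point(6, 2, 15, g))
```

Exact rational arithmetic is used so that the points can be compared with closed-form outer bounds without rounding issues.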

###### Theorem 2.

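Under the same assumptions of Thm.1, when $\mathsf{M}\geq\mathsf{N}(\mathsf{K}-\mathsf{H}+\mathsf{r}-1)/\mathsf{K}$, we have

$$\mathsf{R}^{\star}=\frac{1-\mathsf{M}/\mathsf{N}}{\mathsf{r}}. \tag{4}$$
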
###### Proof:

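Converse: From the general cut-set outer bound in [cachingincom, Thm.2] with and , we have . Between , the outer bound is a straight line. When $\mathsf{M}=\mathsf{N}(\mathsf{K}-\mathsf{H}+\mathsf{r}-1)/\mathsf{K}$ we have

$$\mathsf{R}^{\star}\geq\frac{1-(\mathsf{K}-\mathsf{H}+\mathsf{r}-1)/\mathsf{K}}{\mathsf{r}}=\frac{\mathsf{H}-\mathsf{r}+1}{\mathsf{r}\mathsf{K}}. \tag{5}$$

Achievability: When $\mathsf{g}=\mathsf{K}'$ in Thm.1, we have $\mathsf{q}=1$ and $\mathsf{M}=\mathsf{N}(\mathsf{K}''-\mathsf{r})/\mathsf{K}''=\mathsf{N}(\mathsf{K}-\mathsf{H}+\mathsf{r}-1)/\mathsf{K}$. Thus from (3) we have

$$\mathsf{R}^{\star}\leq\frac{\mathsf{K}(1-\mathsf{M}/\mathsf{N})}{\mathsf{H}\,\mathsf{g}}=\frac{\mathsf{H}-\mathsf{r}+1}{\mathsf{r}\mathsf{K}}, \tag{6}$$

which coincides with (5). Memory sharing is then used between $\mathsf{M}=\mathsf{N}(\mathsf{K}-\mathsf{H}+\mathsf{r}-1)/\mathsf{K}$ and $\mathsf{M}=\mathsf{N}$. ∎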
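The algebra behind the achievability step can be spelled out as follows. This is a sketch under the assumption that (3) specializes, for $\mathsf{g}=\mathsf{K}'$ (i.e., $\mathsf{q}=1$), to $\mathsf{M}/\mathsf{N}=(\mathsf{K}''-\mathsf{r})/\mathsf{K}''$ with $\mathsf{K}''=\binom{\mathsf{H}}{\mathsf{r}-1}$ and $\mathsf{K}'=\mathsf{K}\mathsf{r}/\mathsf{H}$.

```latex
% Relation between K and K'':
%   \binom{H}{r} = \frac{H-r+1}{r}\binom{H}{r-1}
\mathsf{K}=\frac{\mathsf{H}-\mathsf{r}+1}{\mathsf{r}}\,\mathsf{K}''
\quad\Longrightarrow\quad
\frac{\mathsf{r}}{\mathsf{K}''}=\frac{\mathsf{H}-\mathsf{r}+1}{\mathsf{K}} .

% Memory point for q = 1:
\frac{\mathsf{M}}{\mathsf{N}}
 =\frac{\mathsf{K}''-\mathsf{r}}{\mathsf{K}''}
 =1-\frac{\mathsf{H}-\mathsf{r}+1}{\mathsf{K}}
 =\frac{\mathsf{K}-\mathsf{H}+\mathsf{r}-1}{\mathsf{K}} .

% Load for g = K' = Kr/H:
\mathsf{R}
 =\frac{\mathsf{K}\,(1-\mathsf{M}/\mathsf{N})}{\mathsf{H}\,\mathsf{K}'}
 =\frac{\mathsf{H}-\mathsf{r}+1}{\mathsf{H}\cdot\mathsf{K}\mathsf{r}/\mathsf{H}}
 =\frac{\mathsf{H}-\mathsf{r}+1}{\mathsf{r}\,\mathsf{K}} .
```

For the parameters of Example 1 ($\mathsf{H}=5$, $\mathsf{r}=3$, $\mathsf{K}=10$) these expressions evaluate to $\mathsf{M}/\mathsf{N}=7/10$ and $\mathsf{R}=1/10$.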

###### Remark 1.

For the scheme in [Zewail2017codedcaching], in order to achieve a coded caching gain the minimum needed cache size is . When , the minimum cache size for our scheme in Thm.1 is strictly less than the one in [Zewail2017codedcaching]; moreover when , we have , and for any memory size the proposed scheme is better than [Zewail2017codedcaching]. The proof can be found in the extended version of this paper.

## III Proof of Thm.1

#### Uncoded cache placement

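If each user directly stores some bits of the files in its cache, the placement is uncoded. When the placement is uncoded, each file can be effectively partitioned as $F_i=\{F_{i,\mathcal{W}}:\mathcal{W}\subseteq[\mathsf{K}]\}$, where $F_{i,\mathcal{W}}$ represents the bits of $F_i$ which are only cached by the users in $\mathcal{W}$.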

#### MAN placement

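For $\mathsf{M}=t\mathsf{N}/\mathsf{K}$ where $t\in[0:\mathsf{K}]$, each file $F_i$ where $i\in[\mathsf{N}]$ is divided into $\binom{\mathsf{K}}{t}$ non-overlapping subfiles of length $\mathsf{B}/\binom{\mathsf{K}}{t}$ bits; $F_i=\{F_{i,\mathcal{W}}:\mathcal{W}\subseteq[\mathsf{K}],|\mathcal{W}|=t\}$.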

With MAN placement, a delivery scheme that creates multicast messages by leveraging the symmetries in the topology was proposed in [wan2017novelmulticase]; in Section III-A, we revisit it. In Section III-B, we describe the proposed scheme to achieve coded caching gain $\mathsf{g}=\mathsf{K}'$. In Section III-C, we generalize the scheme to any coded caching gain $\mathsf{g}\in[2:\mathsf{K}']$.

### III-A Separate Relay Decoding Delivery Scheme (SRDS) [wan2017novelmulticase]

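In the delivery phase, user $k\in[\mathsf{K}]$ should recover $F_{d_k,\mathcal{W}}$ for all $\mathcal{W}\subseteq[\mathsf{K}]$ where $k\notin\mathcal{W}$. For each such subfile $F_{d_k,\mathcal{W}}$, we find $\mathcal{S}_{k,\mathcal{W}}:=\arg\max_{h\in\mathcal{H}_k}|\mathcal{U}_h\cap\mathcal{W}|$ (i.e., the set of relays in $\mathcal{H}_k$ each of which is connected to the largest number of users in $\mathcal{W}$). We partition $F_{d_k,\mathcal{W}}$ into $|\mathcal{S}_{k,\mathcal{W}}|$ equal-length pieces and denote $F_{d_k,\mathcal{W}}=\{F_{d_k,\mathcal{W},h}:h\in\mathcal{S}_{k,\mathcal{W}}\}$. For each relay $h\in\mathcal{S}_{k,\mathcal{W}}$, we add $F_{d_k,\mathcal{W},h}$ to $\mathcal{T}^{h}_{k,\mathcal{W}\cap\mathcal{U}_h}$; here $\mathcal{T}^{h}_{k,\mathcal{V}}$ represents the set of bits needed to be recovered by user $k$ (first entry in the subscript) from relay $h$ (superscript) and already known by the users in $\mathcal{V}$ (second entry in the subscript) who are also connected to relay $h$ (superscript).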

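The next step is to generate multicast messages. For each relay $h\in[\mathsf{H}]$ and each set $\mathcal{J}\subseteq\mathcal{U}_h$, the server forms the multicast messages

$$W^{h}_{\mathcal{J}}:=\bigoplus_{k\in\mathcal{J}}\mathcal{T}^{h}_{k,\mathcal{J}\setminus\{k\}}, \tag{7}$$

where we used the same convention as that in the literature when it comes to 'summing' sets. The message $W^{h}_{\mathcal{J}}$ is sent to relay $h$, which then forwards it to the users in $\mathcal{J}$.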
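The bookkeeping of SRDS can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: each file is split evenly into one subfile per caching set, all users request distinct files, users are numbered lexicographically, a subfile needed by user $k$ and cached by $\mathcal{W}$ is filed under the pair $(k,\mathcal{W}\cap\mathcal{U}_h)$ at each relay in the arg-max set, and the XOR in (7) zero-pads to the longest term (so each message costs the largest term length). The name `srds_load` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def srds_load(H, r, caching_sets):
    """Max-link load (in units of files) of SRDS for distinct demands.

    caching_sets: list of frozensets W; every file is split evenly into one
    subfile per W, and the subfile indexed by W is cached by the users in W.
    """
    relays_of = {k: set(s) for k, s in enumerate(combinations(range(1, H + 1), r), start=1)}
    users_of = {h: {k for k, s in relays_of.items() if h in s} for h in range(1, H + 1)}
    size = Fraction(1, len(caching_sets))
    # T[h][(k, V)]: amount needed by user k via relay h and known to the users in V
    T = defaultdict(lambda: defaultdict(Fraction))
    for k, Hk in relays_of.items():
        for W in caching_sets:
            if k in W:
                continue  # user k already caches this subfile
            best = max(len(users_of[h] & W) for h in Hk)
            S = [h for h in Hk if len(users_of[h] & W) == best]  # arg-max relays
            for h in S:
                T[h][(k, frozenset(users_of[h] & W))] += size / len(S)
    loads = []
    for h in users_of:
        msg_len = defaultdict(Fraction)  # J -> length of the multicast message W^h_J
        for (k, V), length in T[h].items():
            J = V | {k}
            msg_len[J] = max(msg_len[J], length)  # zero-padded XOR (assumption)
        loads.append(sum(msg_len.values(), Fraction(0)))
    return max(loads)

# MAN placement with t = 2 on the (H=4, r=2) network
man = [frozenset(W) for W in combinations(range(1, 7), 2)]
print(srds_load(4, 2, man))
```

Since a relay only forwards messages it has received, the relay-to-user loads never exceed the server-to-relay load, which is why only the latter is tracked here.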

The main limitation of SRDS with MAN placement is that the delivery of some subfiles, due to the network topology, needs more bits than others (see Example 2 later on). In the next subsection, we propose a novel placement so that all multicast messages need the same amount of transmitted bits to be delivered to the intended users; in other words, all multicast messages have the same "multicasting opportunities" and thus the delivery phase is "symmetric".

### III-B Novel Caching Scheme for g=K′

We start by describing, by way of an example, how to achieve the maximal coded caching gain $\mathsf{g}=\mathsf{K}'$; in this case, the coded multicast gain equals the number of users connected to a relay.

###### Example 1 (H=5, r=3, N=10, g=6).

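In this example, we have $\mathsf{K}=\binom{5}{3}=10$ and

$$\mathcal{U}_1=[6],\quad \mathcal{U}_2=\{1,2,3,7,8,9\},\quad \mathcal{U}_3=\{1,4,5,7,8,10\},\quad \mathcal{U}_4=\{2,4,6,7,9,10\},\quad \mathcal{U}_5=\{3,5,6,8,9,10\}.$$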

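We aim to achieve coded caching gain $\mathsf{g}=6$, that is, every multicast message is simultaneously useful for $6$ users. Since $\mathsf{r}=3$ (each user is connected to three relays), we can see that every $\mathsf{r}-1=2$ relays (denoted by $\mathcal{Y}$) have $\mathsf{H}-\mathsf{r}+1=3$ common connected users (denoted by $\mathcal{P}_{\mathcal{Y}}:=\mathcal{U}_{\mathcal{Y}}$). Besides the relays in $\mathcal{Y}$, each of these three users is connected to a different relay other than the two relays in $\mathcal{Y}$. For one user $k\in\mathcal{P}_{\mathcal{Y}}$, we assume user $k$ is connected to relay $h$ where $h\notin\mathcal{Y}$; since relay $h$ is connected to only one user (user $k$) in $\mathcal{P}_{\mathcal{Y}}$, we have $|\mathcal{U}_h\setminus\mathcal{P}_{\mathcal{Y}}|=\mathsf{K}'-1=5$. (For example, if $\mathcal{Y}=\{1,2\}$, we have $\mathcal{P}_{\mathcal{Y}}=\{1,2,3\}$; let us focus on user $1$, who is also connected to relay $3$; we can see $\mathcal{U}_3\setminus\mathcal{P}_{\mathcal{Y}}=\{4,5,7,8,10\}$.) This motivates the following placement, which considers all the subsets of relays with cardinality $\mathsf{r}-1=2$.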

#### Placement phase

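We divide each $F_i$ into $\binom{\mathsf{H}}{\mathsf{r}-1}=10$ non-overlapping and equal-length pieces and denote

$$\begin{aligned}
F_i &= \big(F_{i,[\mathsf{K}]\setminus\mathcal{P}_{\mathcal{Y}}}:\mathcal{Y}\subseteq[\mathsf{H}],\ |\mathcal{Y}|=\mathsf{r}-1\big)\\
&=\big\{F_{i,[10]\setminus\{1,2,3\}},\ F_{i,[10]\setminus\{1,4,5\}},\ F_{i,[10]\setminus\{2,4,6\}},\ F_{i,[10]\setminus\{3,5,6\}},\\
&\qquad F_{i,[10]\setminus\{1,7,8\}},\ F_{i,[10]\setminus\{2,7,9\}},\ F_{i,[10]\setminus\{3,8,9\}},\ F_{i,[10]\setminus\{4,7,10\}},\\
&\qquad F_{i,[10]\setminus\{5,8,10\}},\ F_{i,[10]\setminus\{6,9,10\}}\big\}.
\end{aligned}$$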

It can be seen that the required memory size is $\mathsf{M}=7\mathsf{N}/10=7$. Note that, compared to MAN placement, not all subfiles $F_{i,\mathcal{W}}$ where $|\mathcal{W}|=7$ are present.
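The placement of Example 1 can be checked by direct enumeration. The sketch below assumes lexicographic user numbering and reads the placement as "user $k$ caches the piece indexed by $\mathcal{Y}$ exactly when $k$ is not among the common users of the relays in $\mathcal{Y}$"; variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

H, r, N = 5, 3, 10
relays_of = {k: set(s) for k, s in enumerate(combinations(range(1, H + 1), r), start=1)}

# P[Y]: users connected to every relay in Y, for each set Y of r-1 relays
P = {Y: {k for k, s in relays_of.items() if set(Y) <= s}
     for Y in combinations(range(1, H + 1), r - 1)}

# Number of pieces (out of len(P) per file) stored by each user
pieces_cached = {k: sum(k not in users for users in P.values()) for k in relays_of}

# Cache size in units of files
memory = {k: Fraction(N * c, len(P)) for k, c in pieces_cached.items()}

print(sorted(P[(1, 2)]))   # users excluded from the piece indexed by Y = {1, 2}
print(memory[1])           # cache size of user 1
```

Every user turns out to store the same number of pieces, so the placement is asymmetric across subsets of users but still uses all caches equally.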

#### Delivery phase

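Assume $\mathbf{d}=(1:10)$. We use SRDS to let each user $k\in[10]$ recover $F_{d_k,[10]\setminus\mathcal{P}_{\mathcal{Y}}}$ where $k\in\mathcal{P}_{\mathcal{Y}}$. For example, user $1$ should recover $F_{1,[10]\setminus\{1,2,3\}}$, $F_{1,[10]\setminus\{1,4,5\}}$ and $F_{1,[10]\setminus\{1,7,8\}}$. For $F_{1,[10]\setminus\{1,2,3\}}$, we can see that, among the users in $[10]\setminus\{1,2,3\}$, relay 1 is connected to users $\{4,5,6\}$, relay 2 is connected to users $\{7,8,9\}$ while relay 3 is connected to users $\{4,5,7,8,10\}$, and thus we have $\mathcal{S}_{1,[10]\setminus\{1,2,3\}}=\{3\}$. Therefore $\mathcal{T}^{3}_{1,\{4,5,7,8,10\}}=\{F_{1,[10]\setminus\{1,2,3\}}\}$. Similarly, $\mathcal{T}^{2}_{1,\{2,3,7,8,9\}}=\{F_{1,[10]\setminus\{1,4,5\}}\}$ and $\mathcal{T}^{1}_{1,\{2,3,4,5,6\}}=\{F_{1,[10]\setminus\{1,7,8\}}\}$. After considering all the subfiles demanded by all users, for relay $1$ (and similarly for all other relays) we have

$$\begin{aligned}
&\mathcal{T}^{1}_{1,\{2,3,4,5,6\}}=\{F_{1,[10]\setminus\{1,7,8\}}\},\quad
\mathcal{T}^{1}_{2,\{1,3,4,5,6\}}=\{F_{2,[10]\setminus\{2,7,9\}}\},\quad
\mathcal{T}^{1}_{3,\{1,2,4,5,6\}}=\{F_{3,[10]\setminus\{3,8,9\}}\},\\
&\mathcal{T}^{1}_{4,\{1,2,3,5,6\}}=\{F_{4,[10]\setminus\{4,7,10\}}\},\quad
\mathcal{T}^{1}_{5,\{1,2,3,4,6\}}=\{F_{5,[10]\setminus\{5,8,10\}}\},\quad
\mathcal{T}^{1}_{6,\{1,2,3,4,5\}}=\{F_{6,[10]\setminus\{6,9,10\}}\}.
\end{aligned}$$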

For each relay $h\in[5]$, we create the multicast message $W^{h}_{\mathcal{U}_h}$ as in (7) to be sent to relay $h$ and then forwarded to the users in $\mathcal{U}_h$. Notice that in this example, by the novel placement, each subfile is multicasted with $5$ other subfiles and thus the coded caching gain is $\mathsf{g}=6$. The achieved max-link load is $1/10$, which coincides with the cut-set outer bound in [cachingincom]; the max-link load in [Zewail2017codedcaching] is . In this example, the proposed placement is uncoded and is information theoretically optimal.

We now generalize the scheme in Example 1 to achieve the maximal coded caching gain $\mathsf{g}=\mathsf{K}'$.

#### Placement phase

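Each file $F_i$ is divided into $\mathsf{K}''=\binom{\mathsf{H}}{\mathsf{r}-1}$ non-overlapping and equal-length pieces denoted by $F_i=\{F_{i,[\mathsf{K}]\setminus\mathcal{P}_{\mathcal{Y}}}:\mathcal{Y}\subseteq[\mathsf{H}],|\mathcal{Y}|=\mathsf{r}-1\}$. $F_{i,[\mathsf{K}]\setminus\mathcal{P}_{\mathcal{Y}}}$ is cached by user $k$ if $k\notin\mathcal{P}_{\mathcal{Y}}$, which requires $\mathsf{M}=\mathsf{N}(\mathsf{K}''-\mathsf{r})/\mathsf{K}''=\mathsf{N}(\mathsf{K}-\mathsf{H}+\mathsf{r}-1)/\mathsf{K}$ (since each user belongs to $\mathcal{P}_{\mathcal{Y}}$ for exactly $\mathsf{r}$ of the $\mathsf{K}''$ sets $\mathcal{Y}$).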

#### Delivery phase

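User $k$ should recover $F_{d_k,[\mathsf{K}]\setminus\mathcal{P}_{\mathcal{Y}}}$ where $\mathcal{Y}\subseteq[\mathsf{H}]$ with $|\mathcal{Y}|=\mathsf{r}-1$ and $k\in\mathcal{P}_{\mathcal{Y}}$. It can be seen that if and only if $\mathcal{Y}\subseteq\mathcal{H}_k$, we have $k\in\mathcal{P}_{\mathcal{Y}}$. So we need to consider each user $k\in[\mathsf{K}]$ and each set of relays $\mathcal{Y}\subseteq\mathcal{H}_k$ with cardinality $\mathsf{r}-1$. We can see that $|\mathcal{H}_k\setminus\mathcal{Y}|=1$ and let $\{h\}=\mathcal{H}_k\setminus\mathcal{Y}$. Besides $\mathcal{Y}$, each user in $\mathcal{P}_{\mathcal{Y}}$ is connected to a different relay other than the relays in $\mathcal{Y}$. Hence, $\mathcal{U}_h\cap\mathcal{P}_{\mathcal{Y}}=\{k\}$ and thus $|\mathcal{U}_h\setminus\mathcal{P}_{\mathcal{Y}}|=\mathsf{K}'-1$. Each relay $h'\in\mathcal{Y}$ is connected to all the $\mathsf{H}-\mathsf{r}+1$ users in $\mathcal{P}_{\mathcal{Y}}$ and thus $|\mathcal{U}_{h'}\setminus\mathcal{P}_{\mathcal{Y}}|=\mathsf{K}'-(\mathsf{H}-\mathsf{r}+1)$. So for $F_{d_k,[\mathsf{K}]\setminus\mathcal{P}_{\mathcal{Y}}}$, we have $\mathcal{S}_{k,[\mathsf{K}]\setminus\mathcal{P}_{\mathcal{Y}}}=\{h\}$ and put it in $\mathcal{T}^{h}_{k,\mathcal{U}_h\setminus\{k\}}$.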

For each relay $h\in[\mathsf{H}]$, the server forms the multicast message $W^{h}_{\mathcal{U}_h}$ as in (7) and transmits it to relay $h$, which then forwards it to the users in $\mathcal{U}_h$.

#### Max-link load

Each demanded subfile is multicasted with $\mathsf{K}'-1$ other subfiles and thus $\mathsf{g}=\mathsf{K}'$. As a result, the max-link load is as in (3).

### III-C Generalization to g∈[2:K′]

We now extend the scheme in Section III-B to any $\mathsf{g}\in[2:\mathsf{K}']$. The novel ingredient here is an additional 'precoding' of the files before placement, in other words, the design of a coded placement based on the topology of the network instead of an uncoded placement. We start with an example.

###### Example 2 (H=4, r=2, N=6, g=2).

In Example 1, for each subset of relays with cardinality $\mathsf{r}-1$, we have one corresponding subfile. Similar to Example 1, for each set of $\mathsf{r}-1=1$ relay, in this example we also determine the set of common connected users, in this case $\mathcal{P}_{\{h\}}=\mathcal{U}_h$, i.e., $\mathcal{P}_{\{1\}}=\{1,2,3\}$, $\mathcal{P}_{\{2\}}=\{1,4,5\}$, $\mathcal{P}_{\{3\}}=\{2,4,6\}$, and $\mathcal{P}_{\{4\}}=\{3,5,6\}$. In addition, we have . Before introducing the additional MDS precoding, we show that, if we proceed as for the previous example, not all the subfiles are sent in a linear combination involving the same number of subfiles; in other words, not all subfiles have the same "multicasting opportunities."

Consider subsets of relays each with cardinality , e.g., and . Since besides relay , each user in is connected to a different relay other than relay , we have . Similarly, we have . It can also be checked that . Hence, we have and thus . Hence with SRDS, for user we can transmit and simultaneously in one linear combination. Similarly, it can be seen that the subfiles , and demanded by user have the same “multicasting opportunities” as .

However, consider the following subsets of relays each with cardinality : and . For user who does not know , since , we can see that for each we have and so . In other words, with SRDS to transmit , we cannot transmit other subfiles in the same combination.

Hence, the main idea of our proposed scheme is to let user recover , , and in the delivery phase, and ignore which has less “multicasting opportunities”. Notice that the subfile is cached by user . This motivates the following placement.

#### Placement phase

Each file $F_i$ is divided into $5$ non-overlapping and equal-length pieces, which are then encoded by using a $(6,5)$ MDS code (not the MDS code as in [Zewail2017codedcaching]). Each MDS coded symbol of $F_i$ is cached by one user $k\in[6]$ and is denoted by $f_{i,\{k\}}$, which contains $\mathsf{B}/5$ bits. So the cache size needs to be $\mathsf{M}=\mathsf{N}/5=6/5$.
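To make the MDS precoding concrete, here is a minimal sketch assuming the code in this example has length 6 and dimension 5 (five pieces, six coded symbols, any five of which suffice). A single-parity-check code is one MDS code with these parameters; the source does not say which MDS code is used, so the choice below and the function names are illustrative only.

```python
from functools import reduce

def xor_bytes(a, b):
    """Bit-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def spc_encode(pieces):
    """(n+1, n) single-parity-check code: the pieces followed by their XOR."""
    return list(pieces) + [reduce(xor_bytes, pieces)]

def spc_decode(symbols):
    """Recover the n pieces from n+1 coded symbols, at most one of which is None."""
    missing = [i for i, s in enumerate(symbols) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity-check code tolerates one erasure only")
    symbols = list(symbols)
    if missing:
        # All coded symbols XOR to zero, so the erased one is the XOR of the others.
        symbols[missing[0]] = reduce(xor_bytes, [s for s in symbols if s is not None])
    return symbols[:-1]

pieces = [bytes([i] * 4) for i in range(1, 6)]      # five toy pieces of a file
coded = spc_encode(pieces)                           # six coded symbols
received = coded[:2] + [None] + coded[3:]            # one coded symbol is never obtained
print(spc_decode(received) == pieces)
```

In the scheme, the one coded symbol a user never obtains plays the role of the erasure: the user holds one cached symbol and receives four more during delivery.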

#### Delivery phase

Assume $\mathbf{d}=(1:6)$. We use SRDS to let each user $k\in[6]$ recover $f_{d_k,\{k'\}}$ where $k'\neq k$ and $\mathcal{H}_{k'}\cap\mathcal{H}_k\neq\emptyset$, such that from the placement and delivery phases, each user can obtain $5$ MDS coded symbols of file $F_{d_k}$ and is thus able to recover $F_{d_k}$. For example, user $1$ must recover $f_{1,\{2\}}$, $f_{1,\{3\}}$, $f_{1,\{4\}}$, and $f_{1,\{5\}}$; those, together with the cached MDS coded symbol $f_{1,\{1\}}$, allow it to recover $F_1$. For , we can see that relay 1 is connected to user , while relay 2 is connected to user , and thus we have and . After considering all the subfiles demanded by all the users, for relay $1$ (and similarly for all other relays) we have

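$$\mathcal{T}^{1}_{1,\{2\}}=\{f_{1,\{2\}}\},\ \ \mathcal{T}^{1}_{1,\{3\}}=\{f_{1,\{3\}}\},\ \ \mathcal{T}^{1}_{2,\{1\}}=\{f_{2,\{1\}}\},\ \ \mathcal{T}^{1}_{2,\{3\}}=\{f_{2,\{3\}}\},\ \ \mathcal{T}^{1}_{3,\{1\}}=\{f_{3,\{1\}}\},\ \ \mathcal{T}^{1}_{3,\{2\}}=\{f_{3,\{2\}}\}.$$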

We then create the multicast messages as in (7) for each $\mathcal{J}\subseteq\mathcal{U}_h$ where $|\mathcal{J}|=2$. For example, the server transmits to relay 1 the messages $f_{1,\{2\}}\oplus f_{2,\{1\}}$, $f_{1,\{3\}}\oplus f_{3,\{1\}}$ and $f_{2,\{3\}}\oplus f_{3,\{2\}}$, which are then forwarded to the demanding users. The achieved max-link load is $3/5$, while that of [Zewail2017codedcaching] is . The outer bound idea used in [cachingincom], which leverages the cut-set bound from [dvbt2fundamental], can be straightforwardly extended to leverage the tighter outer bound from [yas2]; by doing so, for this example we obtain as outer bound $\mathsf{R}^{\star}\geq 3/5$; therefore, our proposed scheme is optimal. In this example, thanks to the novel placement, each MDS coded symbol is multicasted with another one and thus the coded caching gain is $\mathsf{g}=2$.

Notice that the outer bound under the constraint of uncoded placement in [novelwan2017, Thm.4] is , that is, in this example using uncoded cache placement is strictly suboptimal.

We now present our novel scheme that attains .

#### Placement phase

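Recall $\mathsf{K}''=\binom{\mathsf{H}}{\mathsf{r}-1}$, and $\mathsf{q}=\mathsf{K}'-\mathsf{g}+1$. Each file $F_i$ is divided into $\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}+\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}$ non-overlapping and equal-length pieces, which are then encoded by using a $\Big(\binom{\mathsf{K}''}{\mathsf{q}},\ \binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}+\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}\Big)$ MDS code. For each collection $\mathcal{Q}$ including $\mathsf{q}$ subsets of relays with cardinality $\mathsf{r}-1$ each, there is an MDS coded symbol $f_{i,[\mathsf{K}]\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}}$ cached by the users in $[\mathsf{K}]\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}$. The required memory size to store these MDS coded symbols is $\mathsf{M}=\mathsf{N}\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}\Big/\Big(\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}+\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}\Big)$. After the placement phase, since there are $\binom{\mathsf{K}''}{\mathsf{q}}$ collections of $\mathsf{q}$ subsets of relays each of cardinality $\mathsf{r}-1$, each file has $\binom{\mathsf{K}''}{\mathsf{q}}$ MDS coded symbols. For each user $k\in[\mathsf{K}]$, we divide the collections into $3$ classes.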

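1. Class 1: if there is no $\mathcal{Y}\in\mathcal{Q}$ such that $\mathcal{Y}\subseteq\mathcal{H}_k$, we have $k\notin\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}$ and thus the symbol $f_{i,[\mathsf{K}]\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}}$ is cached by user $k$ where $i\in[\mathsf{N}]$. The number of collections of Class 1 is $\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}$.

2. Class 2: if $k\in\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}$ and $\max_{h\in\mathcal{H}_k}|\mathcal{U}_h\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}|=\mathsf{g}-1$, we transmit the symbol $f_{d_k,[\mathsf{K}]\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}}$ to user $k$ in one combination including $\mathsf{g}-1$ other symbols in the delivery phase. Furthermore, if and only if at least one $\mathcal{Y}\in\mathcal{Q}$ is a subset of $\mathcal{H}_k$, we can see that $k\in\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}$. Recall that each $\mathcal{Y}\in\mathcal{Q}$ has $\mathsf{r}-1$ relays. So if some relay $h\in\mathcal{H}_k$ does not belong to $\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{Y}$, at most one $\mathcal{Y}\in\mathcal{Q}$ is a subset of $\mathcal{H}_k$. As a result, if and only if there exists $h\in\mathcal{H}_k$ with $h\notin\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{Y}$ and there exists only one $\mathcal{Y}\in\mathcal{Q}$ such that $\mathcal{Y}\subseteq\mathcal{H}_k$, one has $k\in\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}$ and $|\mathcal{U}_h\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}|=\mathsf{g}-1$. Hence, the number of symbols to be recovered by user $k$ in the delivery phase is $\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}$.

3. Class 3: if $k\in\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}$ and $\max_{h\in\mathcal{H}_k}|\mathcal{U}_h\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}|<\mathsf{g}-1$, for each $h\in\mathcal{H}_k$ we have $|\mathcal{U}_h\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}|<\mathsf{g}-1$. Hence, we let user $k$ ignore the symbol $f_{d_k,[\mathsf{K}]\setminus\cup_{\mathcal{Y}\in\mathcal{Q}}\mathcal{P}_{\mathcal{Y}}}$.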
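The three class sizes can be checked by enumeration for small networks. The sketch below rests on assumptions that should be read with care: Class 1 is taken to be the collections with no member contained in the user's relay set, Class 2 is generated through the construction in (8) (one relay $h$ of the user, plus $\mathsf{q}-1$ other users of that relay), Class 3 is everything else, and $\mathsf{q}=\mathsf{K}'-\mathsf{g}+1$. The function name is hypothetical.

```python
from itertools import combinations
from math import comb

def class_counts(H, r, g, k=1):
    """Return (|Class 1|, |Class 2|, |Class 3|) for user k (assumed class definitions)."""
    relays = range(1, H + 1)
    relays_of = {u: frozenset(s) for u, s in enumerate(combinations(relays, r), start=1)}
    users_of = {h: {u for u, s in relays_of.items() if h in s} for h in relays}
    K1 = comb(H - 1, r - 1)                  # K'
    q = K1 - g + 1                           # sets of relays per collection (assumption)
    Ys = [frozenset(y) for y in combinations(relays, r - 1)]
    Hk = relays_of[k]

    # Class 1: no member of the collection is a subset of H_k (symbol cached by user k)
    class1 = sum(1 for Q in combinations(Ys, q) if not any(Y <= Hk for Y in Q))

    # Class 2: collections of the form (8), built from a relay h of user k and a
    # set M of q-1 other users connected to h
    class2 = set()
    for h in Hk:
        for M in combinations(sorted(users_of[h] - {k}), q - 1):
            Q_prime = {relays_of[u] - {h} for u in M} | {Hk - {h}}
            class2.add(frozenset(Q_prime))

    class3 = comb(len(Ys), q) - class1 - len(class2)
    return class1, len(class2), class3

print(class_counts(4, 2, 2))   # parameters of Example 2
print(class_counts(5, 3, 6))   # parameters of Example 1
```

Under these assumptions the first two counts can be compared against $\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}$ and $\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}$, the two terms that appear in the memory expression.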

#### Delivery phase

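In the delivery phase, we let each user recover the MDS coded symbols in Class 2. We focus on each user $k\in[\mathsf{K}]$, each relay $h\in\mathcal{H}_k$, and each set of users $\mathcal{M}$ where $\mathcal{M}\subseteq\mathcal{U}_h\setminus\{k\}$ and $|\mathcal{M}|=\mathsf{q}-1$. Besides the relays in $\mathcal{H}_k\setminus\{h\}$, each user in $\mathcal{P}_{\mathcal{H}_k\setminus\{h\}}$ is connected to a different relay other than the relays in $\mathcal{H}_k\setminus\{h\}$. So we have that $\mathcal{U}_h\cap\mathcal{P}_{\mathcal{H}_k\setminus\{h\}}=\{k\}$. Let $\mathcal{J}:=\mathcal{U}_h\setminus\mathcal{M}$. For each user $k'\in\mathcal{M}$, we can find a set of relays $\mathcal{H}_{k'}\setminus\{h\}$. We can similarly prove that $\mathcal{U}_h\cap\mathcal{P}_{\mathcal{H}_{k'}\setminus\{h\}}=\{k'\}$. Hence, we construct the collection

$$\mathcal{Q}'=\big\{\mathcal{H}_{k'}\setminus\{h\}:k'\in\mathcal{M}\big\}\cup\big\{\mathcal{H}_k\setminus\{h\}\big\} \tag{8}$$

and we have $|\mathcal{Q}'|=\mathsf{q}$. By this construction, we have $\mathcal{U}_h\setminus\cup_{\mathcal{Y}\in\mathcal{Q}'}\mathcal{P}_{\mathcal{Y}}=\mathcal{J}\setminus\{k\}$. In addition, since $h\in\mathcal{H}_k$ and there is no set in $\mathcal{Q}'$ containing relay $h$, we have that $\mathcal{H}_k\setminus\{h\}$ is the only set in $\mathcal{Q}'$ which is a subset of $\mathcal{H}_k$ and that $|\mathcal{U}_h\setminus\cup_{\mathcal{Y}\in\mathcal{Q}'}\mathcal{P}_{\mathcal{Y}}|=\mathsf{g}-1$. So $f_{d_k,[\mathsf{K}]\setminus\cup_{\mathcal{Y}\in\mathcal{Q}'}\mathcal{P}_{\mathcal{Y}}}$ should be recovered by user $k$ and be put in $\mathcal{T}^{h}_{k,\mathcal{J}\setminus\{k\}}$. As a result, for each relay $h\in\mathcal{H}_k$ and each set of users $\mathcal{M}$ where $\mathcal{M}\subseteq\mathcal{U}_h\setminus\{k\}$ and $|\mathcal{M}|=\mathsf{q}-1$, we consider a different symbol of $F_{d_k}$ demanded by user $k$. With $\mathsf{r}$ choices of $h$ and $\binom{\mathsf{K}'-1}{\mathsf{q}-1}$ choices of $\mathcal{M}$, we can prove that in the delivery phase, we consider all of the $\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}$ symbols which are needed to be recovered by user $k$.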

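For each relay $h\in[\mathsf{H}]$ and each set $\mathcal{J}\subseteq\mathcal{U}_h$ where $|\mathcal{J}|=\mathsf{g}$, the server forms the multicast message $W^{h}_{\mathcal{J}}$ as in (7) and transmits it to relay $h$, which then forwards it to each user $k\in\mathcal{J}$.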

Notice that when $\mathsf{g}=\mathsf{K}'$, we have $\mathsf{q}=1$. So $|\mathcal{Q}|=1$ and thus the MDS precoding procedure is not needed. Hence, the above scheme is equivalent to the scheme in Section III-B.

#### Max-link load

Each demanded subfile is multicasted with $\mathsf{g}-1$ other subfiles such that the coded caching gain is $\mathsf{g}$. As a result, the max-link load is as in (3).
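The load claim can be cross-checked by building the multicast messages explicitly. This is a sketch under assumptions, not the authors' code: it takes $\mathsf{q}=\mathsf{K}'-\mathsf{g}+1$, assumes each file is split into $\binom{\mathsf{K}''-\mathsf{r}}{\mathsf{q}}+\mathsf{r}\binom{\mathsf{K}'-1}{\mathsf{q}-1}$ pieces, uses the collection in (8) to pick the symbol each user in a message needs, and numbers users lexicographically. The inner assertion checks that every other user in the message caches that symbol, so the XOR is decodable. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def proposed_scheme_load(H, r, g):
    """Server-to-relay load of the scheme, counting one coded symbol per multicast message."""
    relays = range(1, H + 1)
    relays_of = {u: frozenset(s) for u, s in enumerate(combinations(relays, r), start=1)}
    users_of = {h: {u for u, s in relays_of.items() if h in s} for h in relays}
    everyone = set(relays_of)
    K1, K2 = comb(H - 1, r - 1), comb(H, r - 1)
    q = K1 - g + 1
    pieces = comb(K2 - r, q) + r * comb(K1 - 1, q - 1)   # assumed MDS dimension
    symbol = Fraction(1, pieces)                          # size of one coded symbol (in files)

    def common(Y):
        """Users connected to every relay in Y."""
        return {u for u, s in relays_of.items() if Y <= s}

    loads = []
    for h in relays:
        total = Fraction(0)
        for J in map(set, combinations(sorted(users_of[h]), g)):
            M = users_of[h] - J
            for k in J:
                # Collection (8) for user k, relay h and the set M
                Q = {relays_of[u] - {h} for u in M} | {relays_of[k] - {h}}
                cachers = everyone - set().union(*(common(Y) for Y in Q))
                # user k lacks the symbol; all other users in J cache it
                assert k not in cachers and J - {k} <= cachers
            total += symbol   # one XOR of g equal-length symbols
        loads.append(total)
    return max(loads)

print(proposed_scheme_load(4, 2, 2))   # parameters of Example 2
print(proposed_scheme_load(5, 3, 6))   # parameters of Example 1
```

Counting messages this way gives $\binom{\mathsf{K}'}{\mathsf{g}}$ coded symbols per relay, which matches $\mathsf{K}(1-\mathsf{M}/\mathsf{N})/(\mathsf{H}\mathsf{g})$ because $\binom{\mathsf{K}'}{\mathsf{g}}=\frac{\mathsf{K}'}{\mathsf{g}}\binom{\mathsf{K}'-1}{\mathsf{g}-1}$.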

###### Remark 2.

From (8), since , for each , we have . Hence, when , we have . In conclusion, and the multicast message generation is identical to SRDS. However, when , may not be the relay connected to the largest number of users in the considered subset and the coded caching gain may be reduced. So the future work includes the improvement for .

###### Example 3 (H=6, r=2, N=15).

In Fig. 2, we compare the performance of the proposed scheme with that of [Zewail2017codedcaching, wan2017novelmulticase, novelwan2017] and the enhanced cut-set outer bound based on [yas2] as described in Example 2. Notice that the proposed scheme is exactly optimal for .

## IV Further Improvement for Thm.1

We can further improve the asymmetric coded placement proposed in Section III-C.

It is stated in Section III-C that for each collection $\mathcal{Q}$ where $|\mathcal{Q}|=\mathsf{q}$ and each element in $\mathcal{Q}$ includes $\mathsf{r}-1$ relays, we can generate one MDS symbol