# Multi-Message Private Information Retrieval with Private Side Information

###### Abstract

We consider the problem of private information retrieval (PIR) where a single user with private side information aims to retrieve multiple files from a library stored (uncoded) at a number of servers. We assume the side information at the user consists of a subset of the library files that is held privately (i.e., the servers do not know the indices of these files). In addition, we require that the identities of the requests and of the side information at the user are not revealed to any of the servers. The problem involves finding the minimum load to be transmitted from the servers to the user such that the requested files can be decoded with the help of the received data and the side information. By providing matching lower and upper bounds, for certain regimes, we characterize the minimum load imposed on all the servers (i.e., the capacity of this PIR problem). Our result shows that the capacity is the same as the capacity of a multi-message PIR problem without private side information, but with a library of reduced size. The effective size of the library is equal to the original library size minus the size of the side information.

## I Introduction

Private Information Retrieval (PIR) is the problem of downloading contents from a library stored at a number of servers, without revealing the indices of the requested contents to any of the servers. This problem was first considered in the computer science community, from a computational complexity viewpoint [1], [2], which resulted in elegant cryptographic PIR schemes even for the simple scenario of a single user connected to a single server. Recently, in [3], the capacity of this problem has been characterized from an information-theoretic perspective. (This notion of information-theoretic privacy is much stronger than the notion of cryptographic privacy used in the computer science community.)
Here, the capacity refers to the supremum of the number of decoded bits per downloaded bit.
The setup considered in [3] consists of multiple non-colluding databases (servers) having access to a common library, from which a single user requests one file, via a shared link. Interestingly, as shown in [3], the user can wisely send queries to the non-cooperative servers so that the aggregate load imposed on the servers is minimized, while no server gets any information about the requested file index. Following the success of [3] in characterizing the exact capacity of the problem, many works have investigated extensions of this setup, such as coded databases [4, 5], colluding databases [6], multi-message PIR [7], adversarial PIR [8], PIR with asymmetric traffic at servers [9], and private function retrieval [10, 11].

An important extension to the PIR problem is the case when the user has access to some form of stored data.
This data can be provided by caching data during network off-peak hours (known as the Cache Content Placement phase), to reduce the required load of servers when the actual request arrives (known as the Content Delivery Phase).
The first work along this line is [12], which characterizes the capacity of a public cache setup for a multi-server scenario (by public we mean that all the servers know the cache contents). In contrast, [13] considers a PIR problem with a private cache, where the servers do not know the cache contents.
Moreover, [14] considers a scenario where the cache content placement is done via the same servers used in the content delivery phase, hence resulting in a partially known cache scenario, i.e., each server only knows which contents itself has sent during the cache content placement phase.

While in the cache-aided PIR problem one can design the cache contents, in the PIR problem with side information it is assumed that the user has access to a given subset of the library files. Along this direction, the authors in [15] consider a single-server scenario with private side information, and characterize the capacity by reducing the PIR setup to an index coding problem. The work [16] extends this result to the original multiple-server setup with a user having private side information.

In this paper, we consider a new PIR scenario, namely multi-message PIR with private side information. In this setup we assume that the user requires multiple messages and at the same time has access to private side information, which is a subset of the files in the library. This generalized setup includes the scenarios considered in [7] (i.e., multi-message PIR without any side information) and [16] (i.e., single-message PIR with private side information) as its special cases. We propose lower and upper bounds on the required load, which match in some regimes of the problem parameters, and thus characterize the PIR capacity in those regimes. Specifically, we establish that if the user has access to private side information consisting of a given number of files, then the capacity of the problem is the same as the capacity of the multi-message PIR problem (see [7]) without any side information, but with a library size reduced by that number of files. This result is a generalization of the same finding in [16] for the single-message PIR setting, and suggests the following conjecture: the capacity of any PIR problem with private side information is the same as the capacity of the same problem without side information, but with a library size reduced by the size of the side information.

The rest of the paper is organized as follows. In Section II we describe the problem setup. In Section III we state our main result for the capacity of a multi-message PIR problem with private side information. Sections IV and V provide the achievability and converse proofs, respectively. Finally, Section VI concludes the paper.

## II Problem Setup

Consider a single user connected to servers having access to a library of files , where each file consists of symbols chosen independently and uniformly at random from a finite field , i.e., . We assume the user has private side information which contains files from the library, denoted by , , where the servers do not know the index set . The user wishes to retrieve new files , , where . To this end, the user sends a set of queries , where server receives only without having any access to the other queries (this is known as the non-colluding servers assumption). The queries should be designed such that the servers do not obtain any information about either the requested file index set or the side information index set , as formally stated in the following privacy constraint:

After receiving the queries, each server sends the answer which is a function of library contents and the query received at that server. The answers must be designed such that there exists a decoding function which satisfies

The objective is to characterize the minimum required download symbols defined as follows

so that the privacy and decodability constraints are satisfied, where the infimum is taken over all possible strategies. Equivalently, the PIR capacity can be defined as .

## III Main Result

In this paper, we characterize the capacity of a multi-message PIR problem with side information. Our main result is formally stated in Theorem 1.

###### Theorem 1.

For we have

(1) |

Moreover, for and we have

(2) |

It is worth mentioning that Theorem 1 can be equivalently stated as

under the assumptions of the theorem, where the capacity of the multi-message PIR problem without side information is characterized in [7]. This implies that, for the multi-message PIR problem, introducing private side information makes the capacity equal to that of a multi-message PIR problem without side information but with a library whose size is reduced by the number of side-information files. Note that our result generalizes the same finding reported in [16] for the single-message PIR problem with side information.
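As a rough numerical illustration of this equivalence, the following Python sketch evaluates the multi-message PIR capacity expressions reported in [7] on a library of reduced size. The symbol names (`N` servers, `K` files, `M` side-information files, `P` requested files) and both function names are assumptions introduced here for illustration; the regime conditions are written in terms of the reduced library size `K - M`, as the discussion above suggests.

```python
from fractions import Fraction


def multi_message_capacity(N, K, P):
    """Multi-message PIR capacity without side information, as reported in [7].

    N: number of servers, K: library size, P: number of requested files.
    Returns None outside the two regimes in which [7] gives the exact capacity.
    """
    if N < 2 or not (1 <= P <= K):
        raise ValueError("need N >= 2 and 1 <= P <= K")
    if 2 * P >= K:
        # Regime 1: at least half of the library is requested.
        return 1 / (1 + Fraction(K - P, P * N))
    if K % P == 0:
        # Regime 2: P <= K/2 and K/P is an integer.
        r = Fraction(1, N)
        return (1 - r) / (1 - r ** (K // P))
    return None


def capacity_with_private_side_info(N, K, M, P):
    """Illustrative restatement of Theorem 1: same expressions, library size K - M."""
    if not (0 <= M and P <= K - M):
        raise ValueError("need M >= 0 and P <= K - M")
    return multi_message_capacity(N, K - M, P)
```

For instance, `capacity_with_private_side_info(2, 4, 1, 2)` evaluates the first expression on a library of three files.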

The above library size reduction effect would be trivial if the privacy constraint were not required for the side information. However, we prove that the same performance can be achieved while preserving the privacy of the side information. The main ingredient of the proposed achievable scheme is to use an outer layer of an MDS code to leverage the private side information available at the user.

## IV The Achievability Proof

Let us first start with a motivating example for the case of .

###### Example 1.

Suppose we have servers, files, file as side information, and the user requests new files. First, we review the achievable scheme proposed in [7] which is designed for the case, i.e., no side information. In their scheme, each file contents is permuted independently and uniformly at random, resulting in permuted files , , , and . Moreover, it is assumed that these permutations are not known by the servers. Next, each of the files is partitioned into equal-sized chunks, i.e., , , , and .

Their scheme consists of two phases. Suppose the user requires to privately retrieve files and . In the first phase, each server sends a chunk from each file, i.e., Server 1 sends and Server 2 sends . In the second phase, the scheme uses a Reed-Solomon generator matrix in as follows

The user generates two new matrices and by applying two independent random permutations on the columns of . Then, the user requests from the first and second servers

respectively. Since user has received and from the second server in the first phase, it can decode and from the above two linear equations sent by the first server in the second phase. Similarly, it can decode and . Thus, the user has retrieved files and . This scheme results in the load of per decoded file.

Now, suppose the user has access to a private side information of size . We use the fact that the user can construct some of the above transmissions from its side information directly. In the above, Server 1 sends coded chunks where one of the file chunks in the first phase is already available as side information, which can be used to reduce the load of the first server. Since the side information should remain private to the server, we employ an MDS generator matrix . Then Server 1 sends linear equations of the original coded chunks with coefficients obtained from the rows of . By removing the term already available as side information, the user can form a linear system of equations to decode the remaining chunks. The same procedure is followed by the second server. This results in the load of per decoded file. Notice that as stated in Theorem 1, this scheme achieves the optimal load.

The above example covers only one of the two regimes of Theorem 1. Since side information is incorporated into the achievable scheme in the same way in the other regime, we do not present a separate example for it. In the following, we explain the general achievable scheme.

The proposed scheme is along the same line introduced in [16]. Let us first review the concept of linear PIR schemes. Suppose the user chooses , , independently and uniformly at random from the set of all permutations of , hidden from the servers. We define the scrambled version of file as . Then, we group these permuted symbols into chunks of symbols, i.e., where for , in which for the sake of presentation clarity we have assumed that divides .

A -sum of type with coefficients vector of chunks is defined as where are distinct elements of , , , and all the operations are performed element-wise in .
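To make the definition concrete, here is a minimal Python sketch of such a sum. It assumes a prime field GF(p) so that arithmetic reduces to integer arithmetic modulo `p`; the data layout `chunks[f][j]` (chunk `j` of permuted file `f`) and the function name `k_sum` are hypothetical and chosen only for illustration.

```python
def k_sum(chunks, terms, coeffs, p):
    """Element-wise linear combination of k chunks taken from k distinct files.

    chunks: mapping file index -> list of chunks, each chunk a list of GF(p) symbols
    terms:  list of (file_index, chunk_index) pairs; file indices must be distinct
    coeffs: one GF(p) coefficient per term
    p:      prime field size (assumption: a prime field is used)
    """
    files = [f for f, _ in terms]
    if len(set(files)) != len(files):
        raise ValueError("a k-sum involves k distinct files")
    if len(coeffs) != len(terms):
        raise ValueError("one coefficient per term is required")
    first_file, first_chunk = terms[0]
    result = [0] * len(chunks[first_file][first_chunk])
    for (f, j), c in zip(terms, coeffs):
        for i, symbol in enumerate(chunks[f][j]):
            # All operations are performed element-wise in the field.
            result[i] = (result[i] + c * symbol) % p
    return result
```

The set of file indices appearing in `terms` plays the role of the type of the sum, and `len(terms)` is the value of k.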

In a general linear PIR scheme, each server transmits blocks where the block consists of all possible types of -sums. This symmetric structure is a consequence of the privacy requirement [3]. Moreover, each type of -sums appears in distinct instances, by involving different chunks from the corresponding files in the -sum. Also, each distinct instance mentioned above appears times with possibly different coefficient vectors. Thus in total, each block consists of chunks which results in the total load of

(3) |

symbols, imposed to each server. Notice that in a general achievable scheme, the chunk size depends on and , hence we denote it by . A linear PIR achievable scheme is defined to be valid if it satisfies both the decodability and privacy constraints, defined in Section II.

Now, for the general PIR problem with private side information, we describe an achievable scheme by employing MDS codes on top of the PIR scheme without private side information, similar to the special case proposed in [16]. In order to do this, assume symbols are transmitted from each server in the PIR scheme without side information. Then, if the user is equipped with side information, a subset of these symbols can be constructed directly from the side information, and thus need not be transmitted. Let us denote the number of such symbols by , which can be calculated as follows

(4) |

Now, consider an MDS code over the finite field , where each server encodes its symbols with such an MDS code. (In the following, we may drop the dependency of the functions and on , , and wherever it is clear from the context.) Then, instead of sending the original symbols, each server only transmits symbols corresponding to the non-systematic part of such a code. Accordingly, the total number of symbols, per decoded file, required to be transmitted by all servers, in the presence of private side information, is

It can be easily verified that the new scheme is valid if the original scheme is valid. First, the user can recover the remaining elements from the coded symbols and the side information, due to the MDS code properties. Second, the new scheme sends the same set of queries as the original scheme, which satisfy the privacy constraint. Hence, the total load of the new scheme is symbols per server.
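The MDS step can be sketched as follows. This is a minimal illustration, not the construction used in the paper: it assumes a prime field of size `PRIME = 257` and uses a Cauchy matrix as the non-systematic (parity) part of a systematic MDS code, which is one standard choice because every square submatrix of a Cauchy matrix is invertible. All names are hypothetical. The server only needs to know how many symbols the user can construct, not which ones.

```python
PRIME = 257  # assumed prime field size; needs 2 * len(symbols) - n_known <= PRIME


def inv(a):
    """Multiplicative inverse in GF(PRIME), via Fermat's little theorem."""
    return pow(a, PRIME - 2, PRIME)


def cauchy_matrix(rows, cols):
    """Entry (i, j) is 1 / (x_i - y_j) with x_i = i and y_j = rows + j."""
    return [[inv((i - rows - j) % PRIME) for j in range(cols)] for i in range(rows)]


def server_encode(symbols, n_known):
    """Send only len(symbols) - n_known parity symbols instead of all symbols."""
    A = cauchy_matrix(len(symbols) - n_known, len(symbols))
    return [sum(a * x for a, x in zip(row, symbols)) % PRIME for row in A]


def solve_mod_p(mat, rhs):
    """Gauss-Jordan elimination over GF(PRIME) for an invertible square system."""
    n = len(mat)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] % PRIME)
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = inv(aug[col][col])
        aug[col] = [v * scale % PRIME for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                g = aug[r][col]
                aug[r] = [(v - g * w) % PRIME for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def user_decode(parity, known, total):
    """known maps symbol positions to values the user builds from side information."""
    A = cauchy_matrix(total - len(known), total)
    unknown = [j for j in range(total) if j not in known]
    # Remove the contribution of the symbols already available at the user.
    rhs = [(y - sum(row[j] * v for j, v in known.items())) % PRIME
           for row, y in zip(A, parity)]
    sub = [[row[j] for j in unknown] for row in A]
    recovered = dict(known)
    recovered.update(zip(unknown, solve_mod_p(sub, rhs)))
    return [recovered[j] for j in range(total)]
```

Because the parity symbols depend only on the count `n_known`, the transmission is the same for every choice of side-information positions of that size, in line with the privacy argument above.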

The above construction is also used in [16] on top of the PIR problem studied in [3]. The interesting finding of [16] is that the load of the PIR problem with private side information of size is equal to the load of a PIR problem without private side information where the library size is reduced by , i.e., .
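This library size reduction can be checked numerically for the single-message case. The sketch below assumes the symbol counts of the scheme in [3]: a file length of `N**K` symbols, and, at each server, `(N - 1)**(k - 1)` instances of every one of the `C(K, k)` types of k-sums. The variable names (`N` servers, `K` files, `M` side-information files) and the function names are assumptions made for illustration. The symbols that the user can build from its side information are taken to be the k-sums supported only on side-information files.

```python
from fractions import Fraction
from math import comb


def symbols_per_server(N, K):
    """Symbols sent by one server in the scheme of [3] for a library of K files."""
    return sum(comb(K, k) * (N - 1) ** (k - 1) for k in range(1, K + 1))


def load_no_side_info(N, K):
    """Total download from all servers, normalized by the file length N**K."""
    return Fraction(N * symbols_per_server(N, K), N ** K)


def load_with_side_info(N, K, M):
    """Normalized download after the MDS step removes the constructible symbols."""
    total = symbols_per_server(N, K)
    # k-sums involving only the M side-information files can be built locally.
    constructible = symbols_per_server(N, M)
    return Fraction(N * (total - constructible), N ** K)
```

Under these assumptions, `load_with_side_info(N, K, M)` coincides with `load_no_side_info(N, K - M)` for every parameter choice tried in the accompanying checks, which matches the finding of [16] described above.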

In this section, we use the general construction introduced above to develop an achievable scheme for the multi-message PIR problem with private side information, based on the original scheme proposed in [7]. Thus, the main question to be answered is whether in this case providing the user with private side information of size will reduce the effective library size from to .

The answer to the above question is yes if the following constraint is satisfied

(5) |

where and , defined in (3) and (4), are determined by the achievable scheme through the coefficients , and . Thus, it just remains to check if the corresponding coefficients used in the achievable scheme proposed for the multi-message problem without private side information in [7] satisfy (5) or not. In the rest of this section we answer this question in the affirmative.

### IV-A Analysis of the Achievable Scheme for

For this regime, considering the scheme introduced in [7] and without going into the details, one can verify the following:

### IV-B Analysis of the Achievable Scheme for

By inspecting the scheme introduced in [7] for the case of , we can verify that

where according to [7] we have and satisfies . Here, is the solution of the linear equation [7, Eq. (64)]. Notice that we do not need the exact values of ’s since they will cancel out later from our computations. Moreover, we have in this regime.

The constant of the scheme in [7] can be derived as follows. Each -sum transmitted by each server can be one of the two following types. The first type includes those sums containing at least some chunks from the requested files and maybe some chunks from the not-requested files. The second type only consists of sums containing not-requested file chunks, which are being transmitted due to the privacy requirements. For the decodability constraint we require that the total number of useful equations (equations containing at least one chunk from the requested files) transmitted by all the servers should be equal to the total number of symbols required to recover the requested files (i.e., symbols). This will result in

Hence, to verify (5) we can proceed as follows

which is equal to according to [7]. This concludes the proof.

## V The Converse Proof

### V-A The Converse Argument for

To prove the converse for this case, let us proceed as follows. Without loss of generality we assume , . Then, we can state the following lemma.

###### Lemma 1.

Under the assumptions of Theorem 1, we have

(6) |

###### Proof.

The proof is provided in the Appendix. ∎

Now, we can lower bound as follows

where (a) follows from Lemma 1 and (b) holds due to Lemma 2, stated below. This concludes the proof.

###### Lemma 2.

If , then we have

###### Proof.

The proof is provided in the Appendix. ∎

### V-B The Converse Argument for

To prove the converse for this regime, we can state the following lemma.

###### Lemma 3.

Under the assumption of Theorem 1, we have

###### Proof.

The proof is provided in the Appendix. ∎

The result stated in Lemma 3 can be equivalently written as

(7) |

Equation (7) provides a lower bound on . Applying the same chain of inequalities used to prove Lemma 3, one can derive a similar inequality for . Hence, by applying (7) inductively over the library size, one can derive a lower bound for , similar to [7, Equation (125)]. The important difference between the above calculations and those in [7] is that here we start the induction from the message while the starting point of [7] is the message .

## VI Conclusions

In this paper, we have characterized the capacity of a multi-message PIR problem where the user has access to private side information, for certain regimes. Our result shows that the role of private side information is equivalent to reducing the effective library size by the size of the side information. This result, along with a similar conclusion for a single-message PIR setup in [16], motivates the conjecture that for any PIR problem, adding private side information is equivalent to reducing the effective library size.

## References

- [1] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, “Private information retrieval,” J. ACM, vol. 45, no. 6, pp. 965–981, Nov. 1998.
- [2] W. Gasarch, “A survey on private information retrieval,” Bulletin of the EATCS, vol. 82, pp. 72–107, 2004.
- [3] H. Sun and S. A. Jafar, “The capacity of private information retrieval,” IEEE Transactions on Information Theory, vol. 63, no. 7, pp. 4075–4088, July 2017.
- [4] R. Tajeddine, O. W. Gnilke, and S. E. Rouayheb, “Private information retrieval from MDS coded data in distributed storage systems,” IEEE Transactions on Information Theory, pp. 1–1, 2018.
- [5] K. Banawan and S. Ulukus, “The capacity of private information retrieval from coded databases,” IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1945–1956, March 2018.
- [6] H. Sun and S. A. Jafar, “The capacity of robust private information retrieval with colluding databases,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2361–2370, April 2018.
- [7] K. A. Banawan and S. Ulukus, “Multi-message private information retrieval: Capacity results and near-optimal schemes,” CoRR, vol. abs/1702.01739, 2017. [Online]. Available: http://arxiv.org/abs/1702.01739
- [8] Q. Wang and M. Skoglund, “Secure symmetric private information retrieval from colluding databases with adversaries,” CoRR, vol. abs/1707.02152, 2017. [Online]. Available: http://arxiv.org/abs/1707.02152
- [9] K. A. Banawan and S. Ulukus, “Asymmetry hurts: Private information retrieval under asymmetric traffic constraints,” CoRR, vol. abs/1801.03079, 2018. [Online]. Available: http://arxiv.org/abs/1801.03079
- [10] H. Sun and S. A. Jafar, “The capacity of private computation,” CoRR, vol. abs/1710.11098, 2017. [Online]. Available: http://arxiv.org/abs/1710.11098
- [11] M. Mirmohseni and M. A. Maddah-Ali, “Private function retrieval,” CoRR, vol. abs/1711.04677, 2017. [Online]. Available: http://arxiv.org/abs/1711.04677
- [12] R. Tandon, “The capacity of cache aided private information retrieval,” CoRR, vol. abs/1706.07035, 2017. [Online]. Available: http://arxiv.org/abs/1706.07035
- [13] Y. Wei, K. A. Banawan, and S. Ulukus, “Fundamental limits of cache-aided private information retrieval with unknown and uncoded prefetching,” CoRR, vol. abs/1709.01056, 2017. [Online]. Available: http://arxiv.org/abs/1709.01056
- [14] Y. Wei, K. A. Banawan, and S. Ulukus, “Cache-aided private information retrieval with partially known uncoded prefetching: Fundamental limits,” CoRR, vol. abs/1712.07021, 2017. [Online]. Available: http://arxiv.org/abs/1712.07021
- [15] S. Kadhe, B. Garcia, A. Heidarzadeh, S. Y. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information,” CoRR, vol. abs/1709.00112, 2017. [Online]. Available: http://arxiv.org/abs/1709.00112
- [16] Z. Chen, Z. Wang, and S. Jafar, “The capacity of private information retrieval with private side information,” CoRR, vol. abs/1709.03022, 2017. [Online]. Available: http://arxiv.org/abs/1709.03022

## Appendix

This appendix contains the proofs omitted from the main body of the paper.

###### Proof of Lemma 1.

To prove this lemma we proceed as follows

(8) | |||

(9) |

where and ’s are all subsets of size from . Notice that without loss of generality we can choose . In the above chain of equations, (a) holds since is independent of and , (b) is true because can be decoded from , , and , (c) follows since the answers are deterministic functions of the library and queries, (d) follows from [7, Lemma 1] (see also [3]) and the chain rule, (e) holds since can be recovered from the side information available at the user and the answers , (f) follows from [7, Lemma 1] and the fact that conditioning does not increase entropy, (g) follows from the chain rule, and (h) holds since can be recovered from , and the queries. ∎

###### Proof of Lemma 2.

In order to prove the lemma, we can write

where (a) follows since , and (b) holds since is a deterministic function of , , and , and (c) follows from [7, Lemma 1]. This completes the proof of the lemma. ∎

###### Proof of Lemma 3.

###### Lemma 4.

Consider , , and , then we have

###### Proof of Lemma 4.

We can write