# The CEO Problem With Secrecy Constraints

## Abstract

We study a lossy source coding problem with secrecy constraints in which a remote information source should be transmitted to a single destination via multiple agents in the presence of a passive eavesdropper. The agents observe noisy versions of the source and independently encode and transmit their observations to the destination via noiseless rate-limited links. The destination should estimate the remote source, based on the information received from the agents, within a certain mean distortion threshold. The eavesdropper, with access to side information correlated with the source, is able to listen in on one of the links from the agents to the destination in order to obtain as much information as possible about the source. This problem can be viewed as the so-called CEO problem with additional secrecy constraints. We establish inner and outer bounds on the rate-distortion-equivocation region of this problem. We also obtain the region in special cases where the bounds are tight. Furthermore, we study the quadratic Gaussian case and provide the optimal rate-distortion-equivocation region when the eavesdropper has no side information, and an achievable region for a more general setup with side information at the eavesdropper.

## 1 Introduction

As networks become more distributed, their vulnerability to malicious activity increases, which in turn raises concerns about their security. Consequently, information-theoretic security has gained attention among researchers as a concrete framework for analyzing secrecy in networks [2]. Information-theoretic security, which was initially introduced by Shannon [4], exploits the differing statistical characteristics of the information received at the legitimate receiver and at the eavesdropper. Moreover, unlike traditional cryptographic approaches to secrecy, it makes no assumptions about the computational power of the eavesdropper. Later, Wyner introduced the wiretap channel model in [5] and showed that perfectly secure communication without a shared secret key is possible if the channel from the transmitter to the eavesdropper is a degraded version of the channel to the legitimate receiver. This result was generalized to broadcast channels with confidential messages by Csiszár and Körner in [6]. Subsequently, many extensions of this problem have been developed and studied in the literature (see, for instance, [2], [3], and references therein).

### 1.1 Related Work

The chief executive/estimation officer (CEO) problem was motivated in [7] by a communication and distributed processing system analogous to a scenario in which a firm’s CEO is interested in a source that cannot be observed directly. The CEO assigns a group of agents to independently observe corrupted versions of the source and communicate their observations to the CEO. The lossless variant of this setup was initially studied by Gel’fand and Pinsker [8]. It was extended to the lossy case with only two encoders by Yamamoto and Itoh [9] as well as by Flynn and Gray [10], who derived an achievable rate-distortion region. Berger and Viswanathan [7] generalized the model to the CEO problem with many encoders and studied the trade-off between the end-to-end average distortion and the sum of the rates at which the agents transmit to the CEO. Multiterminal lossy source coding problems, including the CEO problem, remain open in general. However, for the special case of the quadratic Gaussian CEO problem [11], the sum-rate-distortion function for an infinite number of agents with identical signal-to-noise ratios (SNRs) was derived by Oohama [12], and later, the complete rate-distortion region for an arbitrary number of agents and arbitrary SNR values was characterized by Prabhakaran et al. [13] and Oohama [14]. More recently, Courtade and Weissman [15] characterized the rate-distortion region of the CEO problem under the logarithmic-loss distortion measure.

Secure lossless source coding with uncoded side information at the legitimate decoder and the eavesdropper was studied by Prabhakaran and Ramchandran [16] under the assumption of no rate constraint on the encoder-decoder link. They derived the minimum leakage rate and showed that, due to the side information at the eavesdropper, the usual Slepian-Wolf scheme [17] is not always optimal. Lossless source coding with coded side information at the decoder (the so-called one-helper problem) and no side information at the eavesdropper was studied by Tandon et al. [18], who characterized the rate-equivocation region. Gündüz et al. [19] extended this setup with additional side information at the eavesdropper and derived inner and outer bounds on the compression-equivocation rate region that do not match in general. Secure distributed lossless compression of two correlated sources, in which both sources are to be reconstructed at the decoder, was considered by Luh and Kundur [20] without side information at the eavesdropper and by Gündüz et al. [21] with side information at the eavesdropper. These models were generalized by Salimi et al. [22] to the case where both the legitimate receiver and the eavesdropper have access to correlated side information and the eavesdropper can choose to intercept either of the links from the encoders to the decoder at each instant. In [22], inner and outer bounds on the compression-equivocation region were provided and shown to be tight in several special cases.

The extension to the lossy case was considered in [23] and, more recently, by Villard and Piantanida [26], where inner and outer bounds on the rate-distortion-equivocation region were derived. The optimal characterization of the rate-distortion-equivocation region was first found in [24] for the lossy case with uncoded side information. Later, in [26], the optimal characterization for the lossless case was also derived. A different setup was considered by Kittichokechai et al. [27], in which the eavesdropper can only access the coded side information, and the complete region was characterized under the logarithmic-loss distortion measure [15]. Chia and Kittichokechai [28] studied the case in which the encoder has access to the decoder’s side information. Tandon et al. [29] considered a scenario with two legitimate receivers and investigated the privacy of the side information at one receiver with respect to the other. An alternative approach to providing secrecy in source coding problems relies on a secret key shared between the transmitter and the legitimate receiver [30]; we do not pursue this approach in this work.

### 1.2 Contributions

Our setup in this paper differs from the aforementioned scenarios in two main respects. First, the destination (CEO) is interested in estimating the remote source rather than the agents’ observations, which were the targets of estimation in all prior works. Second, the secrecy constraints in our problem are imposed on the eavesdropper’s equivocation with respect to the remote source, not with respect to the agents’ observations. In fact, our setup generalizes the cases previously considered for secure lossy source coding. We extend our previous work [33] on the lossless variant of this problem to the lossy case and derive inner and outer bounds on the rate-distortion-equivocation region of the CEO problem with secrecy constraints. We also characterize the region in special cases where the bounds are tight, and we show that in these cases our results coincide with previous results in the literature.

In addition, we consider the quadratic Gaussian CEO problem with secrecy constraints and provide the optimal characterization of the rate-distortion-equivocation region for the case when the eavesdropper has no side information and an achievable region for a more general setup with side information at the eavesdropper.

### 1.3 Notations and Organization

In this paper, we use capital letters to indicate random variables, small letters to indicate their realizations, calligraphic letters to denote sets, e.g., $\mathcal{X}$, and $|\mathcal{X}|$ to indicate the cardinality of a set. The notation $X^n$ denotes the sequence $(X_1, \dots, X_n)$. The notation $X - Y - Z$ indicates that $X$, $Y$, and $Z$ form a Markov chain, i.e., $p(x, z | y) = p(x | y)\, p(z | y)$ or, equivalently, $p(z | x, y) = p(z | y)$. We define for , and for . Finally, denotes the indicator function such that for , and otherwise.
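The Markov-chain notation recurs throughout (e.g., in the one-helper special case and in the error analysis). As a minimal sketch, assuming finite alphabets and a joint pmf stored as a Python dict keyed by `(x, y, z)`, the defining condition $p(x,z|y)=p(x|y)p(z|y)$ can be checked numerically after clearing denominators, i.e., by testing $p(x,y,z)\,p(y)=p(x,y)\,p(y,z)$ for every triple. The function names and the toy binary examples are hypothetical illustrations, not part of the paper.

```python
from itertools import product

def is_markov_chain(p, tol=1e-12):
    """Check X - Y - Z for a joint pmf p[(x, y, z)] on finite alphabets.

    X - Y - Z holds iff p(x, y, z) * p(y) == p(x, y) * p(y, z) for all triples,
    which is p(x, z | y) = p(x | y) p(z | y) with the denominators cleared.
    """
    xs = {x for x, _, _ in p}
    ys = {y for _, y, _ in p}
    zs = {z for _, _, z in p}

    def get(x, y, z):
        return p.get((x, y, z), 0.0)

    for x, y, z in product(xs, ys, zs):
        p_y = sum(get(a, y, c) for a in xs for c in zs)
        p_xy = sum(get(x, y, c) for c in zs)
        p_yz = sum(get(a, y, z) for a in xs)
        if abs(get(x, y, z) * p_y - p_xy * p_yz) > tol:
            return False
    return True

def bsc_chain(a, b):
    """X ~ Bernoulli(1/2), Y = X flipped w.p. a, Z = Y flipped w.p. b (a Markov chain)."""
    return {(x, y, z): 0.5 * (a if x != y else 1 - a) * (b if y != z else 1 - b)
            for x, y, z in product((0, 1), repeat=3)}

def copy_x_to_z(a):
    """X ~ Bernoulli(1/2), Y = X flipped w.p. a, Z = X (not a Markov chain for 0 < a < 1)."""
    return {(x, y, z): 0.5 * (a if x != y else 1 - a) * (1.0 if z == x else 0.0)
            for x, y, z in product((0, 1), repeat=3)}

print(is_markov_chain(bsc_chain(0.1, 0.2)))  # True
print(is_markov_chain(copy_x_to_z(0.1)))     # False
```

The second example fails because, given $Y$, the variables $X$ and $Z = X$ remain perfectly correlated.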

The rest of the paper is organized as follows. In Section 2, we describe the problem setting along with some definitions. The main results, inner and outer bounds on the rate-distortion-equivocation region, are presented in Section 3. In Section 4, we study special cases of our results in which the region is completely characterized. The rate-distortion-equivocation region for the quadratic Gaussian case is given in Section 5. Finally, Section 6 concludes the paper.

## 2 Problem Setting

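Let $d: \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ be a finite distortion measure. We define the component-wise mean distortion between two sequences $x^n \in \mathcal{X}^n$ and $\hat{x}^n \in \hat{\mathcal{X}}^n$ as

$$d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i).$$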

## 3 Inner and Outer Bounds on the Rate-Distortion-Equivocation Region

### 3.1 Inner Bound

The achievability scheme leading to the inner bound is based on superposition coding and random binning at the agents, and joint decoding at the CEO. In particular, agent first transmits, over its noiseless link, the bin index of the codeword of the auxiliary random variable with distribution . Then, the agents send the remaining information that the CEO requires to reconstruct the source, following the Wyner-Ziv scheme [34]. The detailed proof is given in Appendix Section 7; here, we provide some intuition behind the result. Inequalities – and are similar to the Berger-Tung bounds [35], which guarantee that and can be decoded at the CEO, from which can be reconstructed within the distortion limit . In the equivocation bounds and , the first term corresponds to Eve’s uncertainty about the source after decoding the codeword from the received bin index together with her side information, and the second term is the reduction in her uncertainty when she also receives the remaining information transmitted by the agents to the CEO. Finally, the last term in and stems from the fact that, in contrast to previous works, the secrecy constraints are on Eve’s equivocation with respect to the remote source, whereas the information transmitted by the agents is a function of their respective observations rather than of the source; this results in an increase in Eve’s uncertainty. Inequalities and depict a trade-off between Eve’s equivocation and the transmission rates, implying that each link’s transmission rate limits the other link’s equivocation rate.
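The last point, that Eve learns less about the remote source than about the observation an agent encodes, is an instance of the data processing inequality: any message $M$ is computed from an observation $Y$, so $X - Y - M$ is a Markov chain and $I(X;M) \le I(Y;M)$. A minimal sketch on a hypothetical toy model (binary uniform source, binary symmetric observation noise with crossover `eps`, and a placeholder `encoder`) computes both leakages; it illustrates the inequality only and is not the paper's bound.

```python
from collections import defaultdict
from math import log2

def mutual_information(p_ab):
    """I(A;B) in bits from a joint pmf given as {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), q in p_ab.items():
        p_a[a] += q
        p_b[b] += q
    return sum(q * log2(q / (p_a[a] * p_b[b]))
               for (a, b), q in p_ab.items() if q > 0)

def leakage(eps, encoder):
    """Return (I(X;M), I(Y;M)) for X ~ Bernoulli(1/2), Y = X flipped with
    probability eps, and M = encoder(Y), so that X - Y - M is a Markov chain."""
    p_xm, p_ym = defaultdict(float), defaultdict(float)
    for x in (0, 1):
        for y in (0, 1):
            q = 0.5 * (eps if x != y else 1 - eps)
            m = encoder(y)
            p_xm[(x, m)] += q
            p_ym[(y, m)] += q
    return mutual_information(p_xm), mutual_information(p_ym)

# Extreme case: the intercepted message reveals the observation completely (M = Y).
i_xm, i_ym = leakage(0.1, lambda y: y)
print(f"I(X;M) = {i_xm:.3f} bits <= I(Y;M) = {i_ym:.3f} bits")
# I(X;M) = 0.531 bits <= I(Y;M) = 1.000 bits
```

Even when Eve sees the observation exactly, her residual uncertainty about the source equals the binary entropy of the observation noise (about 0.469 bits here).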

## 4 Special Case: The One-Helper Problem with Secrecy Constraints

If Agent 1 has access to the source sequence $X^n$, our setup reduces to the lossy source coding problem with a helper and an eavesdropper who can choose to listen in on either the source-destination link or the helper-destination link.

where the auxiliary random variables and satisfy the Markov chain .

The achievability proof follows from the proof of Theorem ? by setting and . Inequalities – are inactive in this setup. The converse proof is given in [26] for secure lossy source coding with uncoded side information. Note that if Eve intercepts the helper’s link, she can also reconstruct the helper’s sequence losslessly.

The achievability proof follows from the proof of Theorem ? by setting and . The converse proof is similar to the proof given in [16].

The achievability follows as a special case of Theorem ?, obtained by setting and to be constants, , and . The converse proof is given in Appendix Section 11.

## 5 The Quadratic Gaussian Case

In this section, we study the Gaussian CEO problem with secrecy constraints and a quadratic distortion measure.

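Let $X$ be a Gaussian source, i.e., $X \sim \mathcal{N}(0, \sigma_X^2)$. The observations at the agents are modeled as $Y_k = X + N_k$ for $k = 1, 2$, with $N_k \sim \mathcal{N}(0, \sigma_{N_k}^2)$, where the Gaussian random variables $X$, $N_1$, and $N_2$ are mutually independent.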
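Because the CEO's estimate is a function of the agents' messages, which are themselves functions of the observations, no choice of rates can push the mean squared error below the minimum mean squared error (MMSE) of estimating the source from both observations. A minimal sketch, assuming the additive model $Y_k = X + N_k$ with zero-mean, mutually independent Gaussian $X$, $N_1$, $N_2$ and illustrative variance parameters, computes this floor in closed form, $\left(1/\sigma_X^2 + 1/\sigma_{N_1}^2 + 1/\sigma_{N_2}^2\right)^{-1}$, and checks it by Monte Carlo with the conditional-mean estimator (which is linear for jointly Gaussian variables).

```python
import random

def mmse_floor(var_x, var_n1, var_n2):
    """MMSE of X given (Y1, Y2), with Yk = X + Nk and independent zero-mean Gaussians."""
    return 1.0 / (1.0 / var_x + 1.0 / var_n1 + 1.0 / var_n2)

def simulate(var_x, var_n1, var_n2, n=100_000, seed=0):
    """Return (closed-form MMSE, empirical MSE of the conditional-mean estimator)."""
    rng = random.Random(seed)
    d = mmse_floor(var_x, var_n1, var_n2)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, var_x ** 0.5)
        y1 = x + rng.gauss(0.0, var_n1 ** 0.5)
        y2 = x + rng.gauss(0.0, var_n2 ** 0.5)
        # Conditional mean E[X | y1, y2]: observations weighted by their noise precisions.
        x_hat = d * (y1 / var_n1 + y2 / var_n2)
        total += (x - x_hat) ** 2
    return d, total / n

d, mse = simulate(var_x=1.0, var_n1=0.5, var_n2=1.0)
print(f"MMSE floor = {d:.4f}, empirical MSE = {mse:.4f}")
```

Under these assumptions, any distortion threshold below the printed floor is infeasible whatever the link rates.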

First, we consider the case where the eavesdropper has no side information. The model is depicted in Figure 2, and the following theorem provides the complete rate-distortion-equivocation region for this Gaussian setup.

An example of the region of Theorem ? is illustrated in Figure 4 for different distortion constraints.

Next, we consider the case where Eve has access to additional side information correlated with the source, as shown in Figure 5. We model this side information as $Z = X + N_Z$, where $N_Z$ is a Gaussian random variable with $N_Z \sim \mathcal{N}(0, \sigma_{N_Z}^2)$ that is independent of $X$, $N_1$, and $N_2$. The following theorem gives an inner bound on the rate-distortion-equivocation region of the quadratic Gaussian CEO problem with secrecy constraints and side information at the eavesdropper.

Note that if there is no correlation between Eve’s side information and the source, i.e., , the region of Theorem ? coincides with the one in Theorem ?.
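For intuition about Eve's starting point, consider a minimal sketch that assumes her side information follows $Z = X + N_Z$ with independent zero-mean Gaussians (an assumed model with illustrative variances). Per source letter, her uncertainty before intercepting any link is $h(X|Z)=\frac{1}{2}\log_2\!\left(2\pi e\,\frac{\sigma_X^2\sigma_{N_Z}^2}{\sigma_X^2+\sigma_{N_Z}^2}\right)$ bits. As $\sigma_{N_Z}^2$ grows, $Z$ becomes uninformative and $h(X|Z)$ approaches $h(X)$, which is the regime in which Eve effectively has no side information.

```python
from math import e, log2, pi

def gaussian_entropy_bits(var):
    """Differential entropy, in bits, of a scalar Gaussian with variance var."""
    return 0.5 * log2(2 * pi * e * var)

def eve_conditional_entropy(var_x, var_nz):
    """h(X | Z) in bits under the assumed model Z = X + N_Z (independent zero-mean Gaussians)."""
    var_x_given_z = var_x * var_nz / (var_x + var_nz)
    return gaussian_entropy_bits(var_x_given_z)

h_x = gaussian_entropy_bits(1.0)
for var_nz in (0.1, 1.0, 10.0, 1e6):
    h = eve_conditional_entropy(1.0, var_nz)
    print(f"var_nz = {var_nz:g}: h(X|Z) = {h:.4f} bits, I(X;Z) = {h_x - h:.4f} bits")
```

The gap $h(X) - h(X|Z) = \frac{1}{2}\log_2(1 + \sigma_X^2/\sigma_{N_Z}^2)$ is the information Eve's side information already carries about each source letter.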

## 6 Conclusion

We studied the CEO problem with secrecy constraints, an extension of the classical CEO problem. This setup is relevant to communication scenarios, such as sensor networks or smart power grids, in which links are vulnerable to eavesdropping. We derived inner and outer bounds on the rate-distortion-equivocation region in the discrete case. We also showed that the results derived for the one-helper problem with secrecy constraints in [18] and [26] can be obtained as special cases of our results for the CEO problem with secrecy constraints. In addition, we provided the optimal region for the quadratic Gaussian case when Eve has no side information, as well as an achievable region for a more general case. In this work, we considered noiseless links from the agents to the CEO; it would be interesting to investigate the effects of noisy channels in this problem. Moreover, extending this problem to more agents and more eavesdroppers with possibly different side information is another direction worth investigating.

## 7 Proof of Theorem ?: The Inner Bound

We first state the following lemma that we use in the proof of Theorem ?. The lemma follows from [2].

Now, we proceed to prove Theorem ?.

Let , , , and be random variables on some finite sets , , , and according to the joint distribution , along with a function satisfying the conditions of Theorem ?.

Codebook generation: For fixed conditional distributions and , , randomly generate independent codewords of length according to , where . Then, divide them into equal-sized bins, indexed by and denoted by . For each codeword , randomly generate independent sequences according to , and divide them into equal-sized bins, indexed by and denoted by . Define for . The codebook is revealed to the agents, CEO, and Eve.
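The two-layer codebook construction (cloud centers with random binning, then binned satellite codewords for each center) can be sketched at a toy blocklength. This is a minimal illustration, assuming binary alphabets, a hypothetical cloud-center distribution $p(w)$ = Bernoulli(1/2), and a hypothetical satellite distribution $p(u|w)$ given by a binary symmetric channel with crossover 0.1; the function names and rate values are illustrative, and $2^{nR}$ is rounded to an integer power of two so that the bins have exactly equal size.

```python
import random
from collections import Counter

def generate_binned_codebook(n, codebook_rate, bin_rate, sample_symbol, rng):
    """Draw 2^(n*codebook_rate) i.i.d. length-n codewords and split them into
    2^(n*bin_rate) equal-sized bins. Returns (codewords, bin_of)."""
    num_codewords = 2 ** round(n * codebook_rate)
    num_bins = 2 ** round(n * bin_rate)
    if num_bins > num_codewords:
        raise ValueError("bin_rate must not exceed codebook_rate")
    per_bin = num_codewords // num_bins  # both are powers of two, so this is exact
    codewords = [tuple(sample_symbol(rng) for _ in range(n)) for _ in range(num_codewords)]
    # The codewords are i.i.d., so assigning consecutive index blocks to bins is
    # statistically equivalent to a uniformly random equal-sized binning.
    bin_of = [j // per_bin for j in range(num_codewords)]
    return codewords, bin_of

def generate_superposition_layer(centers, codebook_rate, bin_rate, sample_given, rng):
    """For every cloud center w^n, draw satellite codewords u^n with U_i ~ p(u | w_i)
    and bin them the same way. Returns a list of (satellites, bin_of) pairs."""
    n = len(centers[0])
    num = 2 ** round(n * codebook_rate)
    num_bins = 2 ** round(n * bin_rate)
    if num_bins > num:
        raise ValueError("bin_rate must not exceed codebook_rate")
    per_bin = num // num_bins
    layer = []
    for w in centers:
        satellites = [tuple(sample_given(w_i, rng) for w_i in w) for _ in range(num)]
        layer.append((satellites, [j // per_bin for j in range(num)]))
    return layer

rng = random.Random(1)
bern_half = lambda r: r.randint(0, 1)              # hypothetical p(w): Bernoulli(1/2)
bsc = lambda w_i, r: w_i ^ int(r.random() < 0.1)   # hypothetical p(u|w): BSC(0.1)
centers, center_bins = generate_binned_codebook(8, 0.5, 0.25, bern_half, rng)
layer = generate_superposition_layer(centers, 0.25, 0.125, bsc, rng)
print(len(centers), Counter(center_bins))  # 16 codewords, 4 in each of 4 bins
```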

Encoding: Assume that the sequence is observed by Agent , . Find a codeword jointly typical with . If there is more than one such codeword, select one uniformly at random. If there is no such , select one out of uniformly at random. Given , find a codeword jointly typical with . If there is more than one such codeword, select one uniformly at random. If there is no such , select one out of uniformly at random. The agent transmits the bin indices and of the codewords and , respectively, i.e., .
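The encoding rule for one layer (typicality search, uniform tie-breaking, uniform fallback when no codeword is typical, and transmission of only the bin index) can be sketched as follows. This assumes one common definition of robust joint typicality, $|\pi(a,b)-p(a,b)|\le\epsilon\,p(a,b)$ for all pairs, where $\pi$ is the empirical joint pmf; the paper's exact typicality notion is not shown here, and the joint pmf and three-codeword codebook are hand-made toy values.

```python
import random
from collections import Counter

def is_jointly_typical(a_seq, b_seq, p_ab, eps):
    """Robust joint typicality: |pi(a,b) - p(a,b)| <= eps * p(a,b) for every pair,
    where pi is the empirical joint pmf of the two sequences."""
    n = len(a_seq)
    counts = Counter(zip(a_seq, b_seq))
    pairs = set(p_ab) | set(counts)
    return all(abs(counts[ab] / n - p_ab.get(ab, 0.0)) <= eps * p_ab.get(ab, 0.0)
               for ab in pairs)

def encode(y_seq, codewords, bin_of, p_wy, eps, rng):
    """Return (codeword index, bin index to transmit) for the observation y^n."""
    typical = [j for j, w in enumerate(codewords)
               if is_jointly_typical(w, y_seq, p_wy, eps)]
    # Ties are broken uniformly at random; if no codeword is typical, any codeword
    # is picked uniformly at random (this corresponds to an encoding error event).
    j = rng.choice(typical) if typical else rng.randrange(len(codewords))
    return j, bin_of[j]

# Toy example with a hypothetical joint pmf p(w, y) and a hand-made codebook.
p_wy = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
y = (0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
codewords = [(0,) * 10, (0, 0, 0, 0, 1, 0, 1, 1, 1, 1), (1,) * 10]
bin_of = [0, 1, 1]
print(encode(y, codewords, bin_of, p_wy, eps=0.1, rng=random.Random(0)))  # (1, 1)
```

Only the second codeword has exactly the empirical pair frequencies 0.4/0.4/0.1/0.1 with `y`, so the agent sends bin index 1; the CEO must then resolve the codeword inside that bin.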

Decoding at the CEO: Given the received messages from both agents, and , find a unique index tuple such that the codewords are jointly typical and lie in the bins indexed by . If there is such a unique index tuple, compute the source estimate component-wise as for ; otherwise, set the output to an arbitrary sequence in .

Error analysis: Let and be the chosen indices at the encoders and decoder, respectively. Let denote the probability of an error event during the encoding and decoding steps. We now show that this probability, averaged over all possible codebooks, tends to zero as provided that the conditions of Theorem ? are satisfied. Consider the following error events in the encoding steps (for , and ):

Next, consider the following error event in decoding step:

Finally, by the union of events bound, the probability of error in the encoding and decoding steps is upper bounded as

We proceed to bound each term in . By the properties of typical sequences, vanishes as . By the covering lemma [39], and tend to zero as . For , and , since , by the conditional typicality lemma [39], tends to zero as . Similarly, as , also vanishes as . To bound , let . Then, , and by the Markov lemma [39], tends to zero as . Using similar steps based on the Markov lemma, and also tend to zero as .

As can be seen from , in the decoding step an error occurs if the decoded codewords are jointly typical and lie in the bins indexed by , but the decoded tuple of codeword indices differs from the tuple chosen at the encoders, i.e., . We split this event into eight sub-events (all other cases lead to the same constraints as one of these eight) and bound its probability using the union of events bound as follows:

Now, we consider each of the terms in –.

where is due to the mutual packing lemma [39]. Therefore, vanishes as if

where and follow from the long Markov chain .
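The decoding-error terms share a generic shape that explains where each rate condition comes from. As a rough sketch, with placeholder names: if a received bin holds about $2^{n(\tilde{R}-R)}$ codewords (codebook rate $\tilde{R}$, bin-index rate $R$) and each wrong codeword is jointly typical with the decoder's other information with probability at most $2^{-n(I-\delta)}$, where $I$ stands for whichever mutual-information term the event involves, then the union bound gives $2^{n(\tilde{R}-R-I+\delta)}$, which vanishes when $\tilde{R}-R<I-\delta$. This is a generic packing-style illustration, not the exact condition derived in the paper.

```python
def wrong_codeword_bound(n, codebook_rate, bin_rate, mutual_info, delta):
    """Union bound on the probability that at least one wrong codeword in the
    received bin looks jointly typical.

    The bin holds about 2^(n*(codebook_rate - bin_rate)) codewords, and each wrong
    one is jointly typical with probability at most 2^(-n*(mutual_info - delta)).
    """
    exponent = n * (codebook_rate - bin_rate - mutual_info + delta)
    return min(1.0, 2.0 ** exponent)

# Hypothetical rates: 0.8 - 0.5 < 0.4 - 0.05, so the bound decays exponentially in n.
for n in (50, 100, 200, 400):
    print(n, wrong_codeword_bound(n, 0.8, 0.5, 0.4, 0.05))
```

With a smaller bin-index rate (for example 0.3), the exponent becomes positive and the bound is trivial, which is why each sub-event imposes a lower bound on the transmitted rates.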

Next, we bound the probability of the event in which is correctly decoded but not , i.e., as