Compressed Secret Key Agreement

Maximizing Multivariate Mutual Information Per Bit
Chung Chan Institute of Network Coding,
The Chinese University of Hong Kong, Hong Kong.

The multiterminal secret key agreement problem by public discussion is formulated with an additional source compression step where, prior to the public discussion phase, users independently compress their private sources to filter out strongly correlated components for generating a common secret key. The objective is to maximize the achievable key rate as a function of the joint entropy of the compressed sources. Since the maximum achievable key rate captures the total amount of information mutual to the compressed sources, an optimal compression scheme essentially maximizes the multivariate mutual information per bit of randomness of the private sources, and can therefore be viewed more generally as a dimension reduction technique. Single-letter lower and upper bounds on the maximum achievable key rate are derived for the general source model, and an explicit polynomial-time computable formula is obtained for the pairwise independent network model. In particular, the converse results and the upper bounds are obtained from those of the related secret key agreement problem with rate-limited discussion. A precise duality is shown for the two-user case with one-way discussion, and such duality is extended to obtain the desired converse results in the multi-user case. In addition to posing new challenges in information processing and dimension reduction, the compressed secret key agreement problem helps shed new light on resolving the difficult problem of secret key agreement with rate-limited discussion, by offering a more structured achieving scheme and some simpler conjectures to prove.

secret key agreement; source compression; rate-limited discussion; communication complexity; dimension reduction; multivariate mutual information

1 Introduction

In information-theoretic security, the secret key agreement problem by public discussion is the problem where a group of users discuss in public to generate a common secret key that is independent of their discussion. The problem was first formulated by Maurer [34] and by Ahlswede and Csiszár [1] under a private source model involving two users who observe some correlated private sources. Rather surprisingly, public discussion was shown to be useful in generating the secret key, i.e., it strictly increases the maximum achievable key rate called the secrecy capacity. Such a phenomenon was also discovered in [4] in a different formulation. Furthermore, the secrecy capacity was given an information-theoretically appealing characterization: it is equal to Shannon’s mutual information [41] between the two private sources, assuming the wiretapper can listen to the entire public discussion but not observe any other side information of the private sources. It was also shown that the capacity can be achieved by one-way public discussion, i.e., with only one of the users discussing in public.

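As a simple illustration, let $\mathsf{X}_0$, $\mathsf{X}_1$ and $\mathsf{J}$ be three uniformly random independent bits, and suppose user 1 observes $\mathsf{Z}_1 := (\mathsf{X}_0, \mathsf{X}_1)$ privately while user 2 observes $\mathsf{Z}_2 := (\mathsf{J}, \mathsf{X}_{\mathsf{J}})$, where $\mathsf{X}_{\mathsf{J}} = \mathsf{X}_0$ when $\mathsf{J} = 0$ but $\mathsf{X}_{\mathsf{J}} = \mathsf{X}_1$ when $\mathsf{J} = 1$. If user 2 reveals $\mathsf{J}$ in public, then user 1 can recover $\mathsf{X}_{\mathsf{J}}$ and therefore $\mathsf{Z}_2$. Furthermore, since $\mathsf{X}_{\mathsf{J}}$ is independent of $\mathsf{J}$, it can serve as a secret key bit that is recoverable by both users but remains perfectly secret to a wiretapper who observes only the public message $\mathsf{J}$. This scheme achieves the secrecy capacity equal to the mutual information $I(\mathsf{Z}_1 \wedge \mathsf{Z}_2) = 1$ roughly because user 2 reveals 1 bit in public so there is $2 - 1 = 1$ bit of randomness left for the secret key. However, if no public discussion is allowed, it follows from the work of Gács and Körner [27] that no common secret key bit can be extracted from the sources. In particular, $\mathsf{X}_{\mathsf{J}}$ cannot be used as a secret key because user 1 does not know whether $\mathsf{X}_{\mathsf{J}}$ is $\mathsf{X}_0$ or $\mathsf{X}_1$. $\mathsf{X}_0$ and $\mathsf{X}_1$ cannot be used as a secret key either because they may not be observed by user 2 when $\mathsf{J} = 1$ and $\mathsf{J} = 0$ respectively. It can be seen that, while the private sources are clearly statistically dependent, public discussion is needed to consolidate the mutual information of the sources into a common secret key.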
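The illustration can be checked by brute force. The following Python sketch is not part of the original text: it assumes that user 1 observes two independent uniform bits $(\mathsf{X}_0, \mathsf{X}_1)$, that user 2 observes a uniform selector bit $\mathsf{J}$ together with $\mathsf{X}_{\mathsf{J}}$, and that user 2 reveals $\mathsf{J}$ in public. The function name `run_protocol` is hypothetical.

```python
from collections import Counter
from itertools import product


def run_protocol(x0, x1, j):
    """One run of the assumed two-user scheme; returns (public, key1, key2)."""
    z1 = (x0, x1)              # user 1's private observation (X_0, X_1)
    z2 = (j, (x0, x1)[j])      # user 2's private observation (J, X_J)
    public = z2[0]             # user 2 reveals J in public
    key_user1 = z1[public]     # user 1 selects X_J using the revealed J
    key_user2 = z2[1]          # user 2 already holds X_J
    return public, key_user1, key_user2


# Enumerate the 8 equally likely outcomes and tally (public message, key).
joint = Counter()
for x0, x1, j in product((0, 1), repeat=3):
    public, k1, k2 = run_protocol(x0, x1, j)
    assert k1 == k2            # recoverability: both users agree on the key
    joint[(public, k1)] += 1

# Every (public, key) pair occurs equally often, so the key is uniform
# and independent of the public message.
print(sorted(joint.items()))
```

Under these assumptions, the tally is flat over the four (message, key) pairs, which is the perfect secrecy claimed in the text for a wiretapper who sees only the public message.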

The secret key agreement formulation was subsequently extended to the multi-user case by Csiszár and Narayan [22]. Some users are also allowed to act as helpers who can participate in the public discussion but need not share the secret key. The designated set of users who need to share the secret key are referred to as the active users. Different from the two-user case, one-way discussion may not achieve the secrecy capacity when there are more than two users. Instead, an omniscience strategy was considered in [22] where the users first communicate minimally in public until omniscience, i.e., the users discuss in public at the smallest total rate until every active user can recover all the private sources. The scheme was shown to achieve the secrecy capacity in the case when the wiretapper only listens to the public discussion. This assumes, however, that the public discussion is lossless and unlimited in rate, and the sources take values from finite alphabet sets. If the sources were continuous or if the public discussion were limited to a certain rate, it may be impossible to attain omniscience.

This work is motivated by the search for a better alternative to the omniscience strategy for multiterminal secret key agreement. A prior work of Csiszár and Narayan [21] considered secret key agreement under rate-limited public discussion. The model involves two users and a helper observing correlated discrete memoryless sources. The public discussion by the users is conducted in a particular order and direction. While the region of achievable secret key rate and discussion rates remains unknown, single-letter characterizations involving two auxiliary random variables were given for many special cases, including the two-user case with two rounds of interactive public discussion, where each user speaks once in sequence, with the last public message possibly depending on the first. By further restricting to one-way public discussion, the characterization involves only one auxiliary random variable and was extended to continuous sources by Watanabe and Oohama [48], who also gave an explicit characterization without any auxiliary random variable for scalar Gaussian sources. For vector Gaussian sources, the characterization by the same authors in [49] involving some matrix optimization was further improved in [31] to a more explicit formula. However, if the discussion is allowed to be two-way and interactive, Tyagi [45] showed with a concrete two-user example that the minimum total discussion rate required, called the communication complexity, can be strictly reduced. Using the technique of Kaspi [30], multi-letter characterizations were given in [45] for the communication complexity and, similarly, by Liu et al. [32] for the region of achievable secret key rate. The characterization in [32] was further simplified using the idea of convex envelope, following the technique of Ma et al. [33].
While these characterizations provide many new insights and properties, they are not considered computable, compared to the usual single-letter and explicit characterizations. Further extension to the multi-user case also appears difficult, as the converse can be seen to rely on the Csiszár sum identity [1, Lemma 4.1], which does not appear to extend beyond the two-user case.

Nevertheless, partial solutions under more restrictive public discussion constraints were possible. By simplifying the problem to the right extent, new results were discovered in the multi-user case, which has led to the formulation in this work. For instance, Gohari and Anantharam [28] characterized the secrecy capacity in the multi-user case under the simpler vocality constraint where some users have to remain silent throughout the public discussion. Using this result, simple necessary and sufficient conditions can be derived as to whether a user can remain silent without diminishing the maximum achievable key rate [36, 50, 7]. This is a simpler result than characterizing the achievable rate region because it does not say how much discussion is required if a user must discuss. Another line of work [19, 35, 37, 9] follows [45] to characterize the communication complexity but in the multi-user case. Courtade and Halford [19] characterized the communication complexity under a special non-asymptotic hypergraphical source model with linear discussion. [37] obtained a multi-letter lower bound on the communication complexity for the asymptotic general source model. It also gave a precise and simple condition under which the omniscience strategy for secret key agreement is optimal for a special source model called the pairwise independent network (PIN) [40], which is a special hypergraphical source model [18]. [9, 17] further derived some single-letter and more easily computable explicit lower bounds, from which one can also obtain conditions for the omniscience strategy to be optimal under the hypergraphical source model, which covers the PIN model as a special case. [10] considered the more general problem of characterizing the multiterminal secrecy capacity under rate-limited public discussion. In particular, an objective of [10] is to characterize the constrained secrecy capacity defined as the maximum achievable key rate as a function of the total discussion rate. 
This covers the communication complexity as a special case when further increase in the public discussion rate does not increase the secrecy capacity. While only single-letter bounds were derived for the general source model, a surprisingly simple explicit formula was derived for the PIN model [10]. The optimal scheme in [10] follows the tree-packing protocol in [39]. It turns out to belong to the more general approach of decremental secret key agreement in [6, 5] inspired by the achieving scheme in [19] and the notion of excess edge in [18]. More precisely, the omniscience strategy is applied after some excess or less useful edge random variables are removed (decremented) from the source. Since the entropy of the decremented source is smaller, the discussion required to attain omniscience of the decremented source is also smaller. Such a decremental secret key agreement approach applies to hypergraphical sources more generally, and it results in one of the best upper bounds in [35] for communication complexity. However, for more general source models that are not necessarily hypergraphical, the approach does not directly apply.

The objective of this work is to formalize and extend the idea of decremental secret key agreement beyond the hypergraphical source model. More precisely, the secret key agreement problem is considered with an additional source compression step before public discussion where each user independently compresses their private source component to filter away less correlated randomness that does not contribute much to the achievable secret key rate. The compression is such that the entropy rate of the compressed sources is reduced to below a certain specified level. In particular, the edge removal process in decremental secret key agreement can be viewed as a special case of source compression, and the more general problem will be referred to as compressed secret key agreement. The objective is to characterize the achievable secret key rate maximized over all valid compression schemes. For simplicity, this work will focus on the case without helpers, i.e., when all users are active and want to share a common secret key. A closely related formulation is by Nitinawarat and Narayan [38], which characterized the maximum achievable key rate for the two-user case under the scalar Gaussian source model where one of the users is required to quantize the source to within a given rate. [46] also extended the formulation and techniques in [38] to the multi-user case where every user can quantize their sources individually to a certain rate. The compression considered in this work is more general than quantization for Gaussian sources, and the new results are meaningful beyond continuous sources.

The compressed secret key agreement problem is also motivated by the study of multivariate mutual information (MMI) [15], i.e., an extension of Shannon’s mutual information to the multivariate case involving possibly more than two random variables. The unconstrained secrecy capacity in the no-helper case has been viewed as a measure of mutual information in [11, 15], not only because of its mathematically appealing interpretations such as the residual independence relation and data processing inequalities in [15], but also because of its operational significance in undirected network coding [13, 14], data clustering [8] and feature selection [16] (cf. [20]). The optimal source compression scheme that achieves the compressed secrecy capacity can be viewed more generally as an optimal dimension reduction procedure that maximizes the MMI per bit of randomness, which is an extension of the information bottleneck problem [44] to the multivariate case. However, different from the multivariate extension in [25], the MMI is used instead of Watanabe’s total correlation [47], and so it captures only the information mutual to all the random variables rather than the information mutual to any subsets of the random variables. Furthermore, the compression is on each random variable rather than subsets of random variables.

The paper is organized as follows. The problem of compressed secret key agreement is formulated in Section 2. Preliminary results of secret key agreement are given in Section 3. The main results are motivated in Section 4 and presented in Section 5, followed by the conclusion and some discussions on potential extensions in Section 6.

2 Problem Formulation

Similar to the multiterminal secret key agreement problem [22] without helpers or wiretapper’s side information, the setting of the problem involves a finite set $V$ of users, and a discrete memoryless multiple source

$$\mathsf{Z}_V := (\mathsf{Z}_i \mid i \in V) \sim P_{\mathsf{Z}_V} \quad \text{taking values from } Z_V := \prod_{i \in V} Z_i.$$

N.b., letters in sans serif font are used for random variables and the corresponding capital letters in the usual math italic font denote the alphabet sets. $P_{\mathsf{Z}_V}$ denotes the joint distribution of the $\mathsf{Z}_i$’s.

A secret key agreement protocol with source compression can be broken into the following phases:

Private observation:

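Each user $i \in V$ observes an $n$-sequence

$$\mathsf{Z}_i^n := (\mathsf{Z}_{it} \mid t \in [n]) = (\mathsf{Z}_{i1}, \dots, \mathsf{Z}_{in})$$

i.i.d. generated from the source $\mathsf{Z}_i$ for some block length $n$. N.b., for convenience, $[n]$ denotes the set of positive integers up to $n$, i.e., $\{1, \dots, n\}$.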

Private randomization:

Each user $i \in V$ generates a random variable $\mathsf{U}_i$ independent of the private source, i.e.,

$$H(\mathsf{U}_V \mid \mathsf{Z}_V) = \sum_{i \in V} H(\mathsf{U}_i).$$

Source compression:

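Each user $i \in V$ computes

$$\tilde{\mathsf{Z}}_i = \zeta_i(\mathsf{U}_i, \mathsf{Z}_i^n)$$

for some function $\zeta_i$ that maps to a finite set. $\tilde{\mathsf{Z}}_V$ is referred to as the compressed source.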

Public discussion:

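Using a public authenticated noiseless channel, a user $i_t \in V$ is chosen in round $t \in [\ell]$ to broadcast a message

$$\tilde{\mathsf{F}}_t := \tilde{f}_t(\tilde{\mathsf{Z}}_{i_t}, \tilde{\mathsf{F}}^{t-1}), \tag{3a}$$

where $\ell$ is a positive integer denoting the number of rounds and $\tilde{\mathsf{F}}^{t-1}$ denotes all the messages broadcast in the previous rounds. If the dependency on $\tilde{\mathsf{F}}^{t-1}$ is dropped, the discussion is said to be non-interactive. The discussion is said to be one-way (from user $i$) if $\ell = 1$ (and $i_1 = i$). For convenience,

$$\mathsf{F}_i := (\tilde{\mathsf{F}}_t \mid t \in [\ell],\ i_t = i) \quad \text{and} \quad \mathsf{F} := \mathsf{F}_V = (\mathsf{F}_i \mid i \in V)$$

denote the aggregate message from user $i$ and the aggregation of the messages from all users respectively.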

Key generation:

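A random variable $\mathsf{K}$, called the secret key, is required to satisfy the recoverability constraint that

$$\lim_{n \to \infty} \Pr\bigl(\exists i \in V,\ \mathsf{K} \neq \theta_i(\tilde{\mathsf{Z}}_i, \mathsf{F})\bigr) = 0$$

for some function $\theta_i$, and the secrecy constraint that

$$\lim_{n \to \infty} \frac{1}{n}\bigl[\log|K| - H(\mathsf{K} \mid \mathsf{F})\bigr] = 0,$$

where $K$ denotes the finite alphabet set of possible key values.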

N.b., unlike [45], non-interactive discussion is considered different from one-way discussion in the two-user case since both users are allowed to discuss even though their messages cannot depend on each other. Different from [23], there is an additional source compression phase, after which the protocol can only depend on the original sources through the compressed sources.

The objective is to characterize the maximum achievable secret key rate for a continuum of different levels of source compression:

Definition 1

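The compressed secrecy capacity with a joint entropy limit $\alpha \geq 0$ is defined as

$$\tilde{C}_{\mathrm{S}}(\alpha) := \sup \liminf_{n \to \infty} \frac{1}{n} \log|K|,$$

where the supremum is over all possible compressed secret key agreement schemes satisfying

$$\limsup_{n \to \infty} \frac{1}{n} H(\tilde{\mathsf{Z}}_V) \leq \alpha.$$

This constraint limits the joint entropy rate of the compressed source.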

N.b., instead of the joint entropy limit, one may also consider entropy limits on some subset that


If multiple entropy limits are imposed, the compressed secrecy capacity will be a higher-dimensional surface instead of a one-dimensional curve. For example, in the two-user case under the scalar Gaussian source model, [38] considered the entropy limit only on one of the users. In the multi-user case under the Gaussian Markov tree model, [46] considered the symmetric case where the entropy limit is imposed on every user.

For simplicity, however, the joint entropy constraint (4) will be the primary focus in this work. It will be shown that is closely related to the constrained secrecy capacity defined as [10]


with instead of (2), i.e., without compression, and the entropy limit (4) replaced by the constraint on the total discussion rate


N.b., it follows directly from the result of [22] that remains unchanged whether the discussion is interactive or not. Indeed, the relation between and to be shown in this work will not be affected either. Therefore, for notational simplicity, may refer to the case with or without interaction, even though may be smaller with non-interactive discussion.

It is easy to show that is continuous, non-decreasing and concave in  [10, Proposition 3.1]. As goes to , the secrecy capacity


is the usual unconstrained secrecy capacity defined in [22] without the discussion rate constraint (7). The smallest discussion rate that achieves the unconstrained secrecy capacity is the communication complexity denoted by


Similar to , the following basic properties can be shown for :

Proposition 1

is continuous, non-decreasing and concave in . Furthermore,


achieving the unconstrained secrecy capacity in the limit.


Continuity, monotonicity and (10) follow directly from the definition of . Concavity follows from the usual time-sharing argument, i.e., for any , , a secret key rate of is achievable with the entropy limit by applying the optimal scheme that achieves for the first samples of and applying the optimal scheme that achieves for the remaining samples.

Because of (10), a quantity playing the same role as the communication complexity does for the constrained secrecy capacity can be defined for the compressed secrecy capacity as follows.

Definition 2

The smallest entropy limit that achieves the unconstrained secrecy capacity is defined as


and referred to as the minimum admissible joint entropy.

One may also consider both the entropy limit (4) and discussion rate constraint (7) simultaneously, and define the secrecy capacity as a function of and . For simplicity, however, we will not consider this case but, instead, focus on the relationship between and .

The following example illustrates the problem formulation. It will be revisited at the end of Section 5 (Example 3) to illustrate the main results.

Example 1

Consider and


where and are uniformly random and independent bits. It is easy to argue that


To see this, notice that is observed by every user. Any choice of can therefore be recovered by every user without any discussion, satisfying the recoverability constraint (1) trivially. Since there is no public discussion required, the secrecy constraint (2) also holds immediately by taking a portion of the bits from to be the key bits in . Finally, setting for all ensures , satisfying the entropy limit (4) with equal to the key rate. Hence, as desired. Indeed, we will show (by Proposition 5) that the reverse inequality holds in general, and so we have equality for for this example.

For , every user can simply retain their source without compression, i.e., with for while satisfying the entropy limit (4). Now, with and where is the elementwise XOR, it can be shown that both the recoverability (1) and secrecy (2) constraints hold. This is because user can recover from the XOR with the side information . Furthermore, the XOR bit is independent of and therefore does not leak any information about the key bits. With this scheme, . By the usual time-sharing argument,


Indeed, the reverse inequality can be argued using one of the main results (Theorem 5.1) and so the minimum admissible joint entropy will turn out to be .
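The two schemes in this example can be verified exhaustively. The Python sketch below is an illustration under assumptions that are not stated explicitly in the surviving text: three users observe $(\mathsf{X}_a, \mathsf{X}_b, \mathsf{X}_c)$, $(\mathsf{X}_a, \mathsf{X}_b)$ and $(\mathsf{X}_a, \mathsf{X}_c)$ respectively, user 1 broadcasts $\mathsf{X}_b \oplus \mathsf{X}_c$, and the key is $(\mathsf{X}_a, \mathsf{X}_b)$. The names `example_scheme` and `time_sharing_bound` are hypothetical, and the bound coded here is simply the concave interpolation of the operating points (joint entropy, key rate) $= (0,0)$, $(1,1)$ and $(3,2)$ described above.

```python
from collections import Counter
from itertools import product


def example_scheme(xa, xb, xc):
    """Uncompressed scheme: returns the public bit and each user's key."""
    z1, z2, z3 = (xa, xb, xc), (xa, xb), (xa, xc)   # assumed private sources
    f = z1[1] ^ z1[2]                # user 1 broadcasts X_b XOR X_c
    k1 = (z1[0], z1[1])              # user 1 reads (X_a, X_b) directly
    k2 = (z2[0], z2[1])              # user 2 reads (X_a, X_b) directly
    k3 = (z3[0], f ^ z3[1])          # user 3 recovers X_b = F XOR X_c
    return f, k1, k2, k3


def time_sharing_bound(alpha):
    """Key rate reachable by time-sharing the points (0,0), (1,1), (3,2)."""
    return min(alpha, (1 + alpha) / 2, 2)


tally = Counter()
for bits in product((0, 1), repeat=3):
    f, k1, k2, k3 = example_scheme(*bits)
    assert k1 == k2 == k3            # recoverability
    tally[(f, k1)] += 1              # joint counts of (public bit, key)

# A flat tally means the 2-bit key is uniform and independent of F.
print(sorted(tally.items()))
print(time_sharing_bound(2))
```

With a joint entropy limit of 2, the interpolation gives a key rate of 1.5, which corresponds to using the compressed scheme and the uncompressed scheme for half of the samples each.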

3 Preliminaries

In this section, a brief summary of related results for the secrecy capacity and communication complexity will be given. The results for the two-user case will be introduced first, followed by the more general results for the multi-user case, and the stronger results for the special hypergraphical source model. An example will also be given at the end to illustrate some of the results.

3.1 Two-user case

As mentioned in the introduction, no single-letter characterization is known for and even in the two-user case where . Furthermore, while multi-letter characterizations for and were given in [45] and [32] respectively in the two-user case under interactive discussion, no such multi-letter characterization is known for the case with non-interactive discussion. Nevertheless, if one-way discussion from user  is considered, then the result of [21, Theorem 2.4] and its extension [48] to continuous sources gave the following characterization of :


The last constraint (1c) corresponds to the Markov chain and so the supremum is taken over the choices of the conditional distribution . Using the double Markov property as in [45], it follows that can be characterized more explicitly by the Gács–Körner common information


where is a discrete random variable. If (1) is finite, a unique optimal solution exists and is called the maximum common function of and because any common function of and must be a function of . The communication complexity also has a more explicit characterization [45, (44)]


and is a discrete random variable. If is finite, a unique optimal solution exists and is called the minimum sufficient statistics of for since can only depend on through .
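For finite alphabets, the maximum common function of two sources can be computed from the support of their joint distribution: it labels the connected components of the bipartite graph that links a symbol of one source to a symbol of the other whenever the pair has positive probability, and the Gács–Körner common information is the entropy of that label. The Python sketch below illustrates this standard construction; the function name `gacs_korner` and the two test sources are illustrative assumptions rather than material from the paper.

```python
from collections import defaultdict
from math import log2


def gacs_korner(pmf):
    """Entropy (bits) of the maximum common function of a two-source pmf.

    pmf maps pairs (z1, z2) to probabilities.
    """
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Union the two endpoints of every pair in the support.
    for (z1, z2), p in pmf.items():
        if p > 0:
            a, b = ("user1", z1), ("user2", z2)
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            parent[find(a)] = find(b)

    # Probability mass of each connected component.
    mass = defaultdict(float)
    for (z1, z2), p in pmf.items():
        if p > 0:
            mass[find(("user1", z1))] += p
    return sum(-p * log2(p) for p in mass.values())


bits = (0, 1)
# Assumed source of the introductory illustration: (X0, X1) versus (J, X_J).
intro = {((x0, x1), (j, (x0, x1)[j])): 1 / 8
         for x0 in bits for x1 in bits for j in bits}
# Two sources that share one common bit A.
shared = {((a, b), (a, c)): 1 / 8 for a in bits for b in bits for c in bits}
print(gacs_korner(intro), gacs_korner(shared))
```

The first source has a connected support graph, so no common bit can be extracted without discussion, whereas the second source yields one common bit.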

In Section 4, the expression will be related to the compressed secret key agreement restricted to the two-user case when the entropy limit is imposed only on user . This duality relationship in the two-user case will serve as the motivation of the main results for the multi-user case. Indeed, the desired characterization of for the two-user case has appeared in [38, Lemma 4.1] for the scalar Gaussian source model:


For the general source model, the expression (3.1) has also appeared before with other information-theoretic interpretations as mentioned in [24]. The Lagrangian dual of (3.1), in particular, reduces to the dimension reduction technique called the information bottleneck method in [44], where is an observable used to predict the target , and is a feature of that captures as much mutual information with the target variable as possible per bit of mutual information with the observable. Interestingly, the principle of the information bottleneck method was also proposed in [43, 42] as a way to understand deep learning, since the best prediction of from is nothing but a particular feature of sharing a lot of mutual information with .

3.2 General source with finite alphabet set

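Consider the multi-user case. If $\mathsf{Z}_V$ takes values from a finite set, then the unconstrained secrecy capacity was shown in [22] to be achievable via communication for omniscience (CO) and equal to

$$C_{\mathrm{S}} = H(\mathsf{Z}_V) - R_{\mathrm{CO}},$$

where $R_{\mathrm{CO}}$ is the smallest rate of CO [22] characterized by the linear program

$$R_{\mathrm{CO}} = \min_{r_V} r(V) \quad \text{such that} \quad r(B) \geq H(\mathsf{Z}_B \mid \mathsf{Z}_{V \setminus B}) \quad \forall B \subsetneq V,\ B \neq \emptyset,$$

where $r(B)$ denotes the sum $\sum_{i \in B} r_i$. Further, $R_{\mathrm{CO}}$ can be achieved by non-interactive discussion. It follows that

$$R_{\mathrm{S}} \leq R_{\mathrm{CO}}, \tag{1a}$$

or equivalently

$$C_{\mathrm{S}}(R) = C_{\mathrm{S}} \quad \text{for } R \geq R_{\mathrm{CO}}. \tag{1b}$$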
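The constraints of the omniscience linear program, namely that every proper non-empty subset $B$ of users must discuss at a total rate of at least $H(\mathsf{Z}_B \mid \mathsf{Z}_{V \setminus B})$, can be checked directly on a small source. The Python sketch below is a coarse grid search rather than a linear-program solver. It assumes a three-user source built from independent uniform bits $\mathsf{X}_a$, $\mathsf{X}_b$, $\mathsf{X}_c$ observed by the user sets $\{1,2,3\}$, $\{1,2\}$ and $\{1,3\}$; the names `EDGES`, `cond_entropy`, `is_feasible` and `min_co_rate_on_grid` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

USERS = (1, 2, 3)
# Assumed source: each named bit is observed by the listed set of users.
EDGES = {"a": {1, 2, 3}, "b": {1, 2}, "c": {1, 3}}


def cond_entropy(B):
    """H(Z_B | Z_{V minus B}) in bits: bits seen only by users inside B."""
    return sum(1 for members in EDGES.values() if members <= set(B))


def is_feasible(rate):
    """Check r(B) >= H(Z_B | Z_{V minus B}) for all proper non-empty B."""
    for size in range(1, len(USERS)):
        for B in combinations(USERS, size):
            if sum(rate[i] for i in B) < cond_entropy(B):
                return False
    return True


def min_co_rate_on_grid(step=Fraction(1, 2), max_rate=Fraction(3)):
    """Smallest-sum feasible rate tuple on a grid (brute force, not an LP)."""
    levels = [step * k for k in range(int(max_rate / step) + 1)]
    best = None
    for values in product(levels, repeat=len(USERS)):
        rate = dict(zip(USERS, values))
        if is_feasible(rate) and (best is None
                                  or sum(values) < sum(best.values())):
            best = rate
    return best


best = min_co_rate_on_grid()
r_co = sum(best.values())
print(best, r_co, len(EDGES) - r_co)   # rates, omniscience rate, H(Z_V) - rate
```

For this assumed source, the search returns a total rate of 1 with only user 1 discussing, and subtracting it from the joint entropy of 3 bits gives 2.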

It was also pointed out in [22] that private randomization does not increase . Hence, if is finite, we have


because can be achieved with . While it seems plausible that randomization neither decreases nor increases for any , a rigorous proof remains elusive. Similarly, it appears plausible that neither nor is affected by randomization but, again, no proof is known yet.

An alternative characterization of $C_{\mathrm{S}}$ was established in [11, 18] by showing that the divergence bound in [22] is tight in the case without helpers. More precisely, with $\Pi'(V)$ defined as the set of partitions of $V$ into at least two non-empty disjoint sets,

$$C_{\mathrm{S}} = I(\mathsf{Z}_V) := \min_{\mathcal{P} \in \Pi'(V)} \frac{1}{|\mathcal{P}| - 1} D\Bigl(P_{\mathsf{Z}_V} \Bigm\| \prod_{C \in \mathcal{P}} P_{\mathsf{Z}_C}\Bigr).$$

In the bivariate case when $V = \{1, 2\}$, $I(\mathsf{Z}_V)$ reduces to Shannon’s mutual information $I(\mathsf{Z}_1 \wedge \mathsf{Z}_2)$. It was further pointed out in [15] that $I(\mathsf{Z}_V)$ is the minimum solution $\gamma$ to the residual independence relation

$$H(\mathsf{Z}_V) - \gamma = \sum_{C \in \mathcal{P}} \bigl[H(\mathsf{Z}_C) - \gamma\bigr]$$

for some $\mathcal{P} \in \Pi'(V)$. To get an intuition of the above relation, notice that $\gamma = 0$ is a solution when the joint entropy $H(\mathsf{Z}_V)$ on the left is equal to the sum of the entropies $H(\mathsf{Z}_C)$ on the right for some partition $\mathcal{P} \in \Pi'(V)$. In other words, the MMI is the smallest value of $\gamma$ whose removal leads to an independence relation, i.e., the total residual randomness on the left is equal to the sum of individual residual randomness on the right according to some partitioning of the random variables. It was further shown in [15] that there is a unique finest optimal partition to (2a) with a clustering interpretation in [8]. The MMI is also computable in polynomial time, following the result of Fujishige [26].
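For a small finite source, the partition characterization of the MMI can be evaluated by brute force, using the fact that the divergence between the joint distribution and the product of the block marginals equals the sum of the block entropies minus the joint entropy. The Python sketch below enumerates every partition into at least two blocks, so it is exponential in the number of users and is not the polynomial-time algorithm mentioned above. The function names and the test source (three users observing $(\mathsf{X}_a,\mathsf{X}_b,\mathsf{X}_c)$, $(\mathsf{X}_a,\mathsf{X}_b)$ and $(\mathsf{X}_a,\mathsf{X}_c)$ for independent uniform bits) are assumptions made for illustration.

```python
from collections import defaultdict
from math import log2


def entropy(pmf, coords):
    """Entropy (bits) of the marginal on the given user indices."""
    marginal = defaultdict(float)
    for outcome, p in pmf.items():
        marginal[tuple(outcome[i] for i in coords)] += p
    return sum(-p * log2(p) for p in marginal.values() if p > 0)


def partitions(items):
    """Yield every set partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                       # first in its own block
        for k in range(len(part)):                   # or joined to block k
            yield part[:k] + [[first] + part[k]] + part[k + 1:]


def mmi(pmf, num_users):
    """min over partitions P of [sum_C H(Z_C) - H(Z_V)] / (|P| - 1)."""
    users = list(range(num_users))
    joint = entropy(pmf, users)
    best = float("inf")
    for part in partitions(users):
        if len(part) < 2:
            continue
        gap = sum(entropy(pmf, block) for block in part) - joint
        best = min(best, gap / (len(part) - 1))
    return best


bits = (0, 1)
source = {((a, b, c), (a, b), (a, c)): 1 / 8
          for a in bits for b in bits for c in bits}
print(mmi(source, 3))
```

For the assumed three-user source, the minimum is attained both by the partition into singletons and by the bipartitions that isolate user 2 or user 3.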

In the opposite extreme with , it is easy to argue that


where is the multivariate extension of the Gács–Körner common information in (1)


with again chosen as a discrete random variable. Note that, even without any public discussion, every user can compress their source independently to where is the maximum common function if is finite. Hence, it is easy to achieve a secret key rate of without any discussion. The reverse inequality of (2) seems plausible but has not been proven yet except in the two-user case. The technique in [21] which relies on the Csiszár sum identity does not appear to extend to the multi-user case to give a matching converse.

3.3 Hypergraphical sources

Stronger results have been derived for the following special source model:

Definition 3 (Definition 2.4 of [18])

is a hypergraphical source w.r.t. a hypergraph with edge functions iff, for some independent edge variables for with ,


In the special case when the hypergraph is a graph, i.e., , the model reduces to the pairwise independent network (PIN) model in [40]. The hypergraphical source can also be viewed as a special case of the finite linear source considered in [12] if the edge random variables take values from a finite field.

For hypergraphical sources, various bounds on and have been derived in [35, 37, 9, 10]. The achieving scheme makes use of the idea of decremental secret key agreement [6, 5], where the redundant or less useful edge variables are removed or reduced before public discussion. This is a special case of the compressed secret key agreement, where the compression step simply selects the more useful edge variables up to the joint entropy limit.

For the PIN model, it turns out that decremental secret key agreement is optimal, leading to a single-letter characterization of and in [10]:


It can be verified that (5a) is the smallest value of such that using (5b). While the proof of converse, i.e., for (5b), is rather involved, the achievability is by a simple tree packing protocol, which belongs to the decremental secret key agreement approach that removes excess edges unused for the maximum tree packing. In other words, the achieving scheme is a compressed secret key agreement scheme. This connection will lead to a single-letter characterization of for the PIN model (in Theorem 5.2).
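The idea behind tree packing can be illustrated on a single spanning tree. The Python sketch below is one simple way to turn a spanning tree of independent uniform edge bits into one key bit, and it is not claimed to be the exact protocol of [39]: every vertex announces the XOR of consecutive pairs of its incident tree-edge bits, and the key is the bit on a designated edge. The function names, the choice of key edge and the example tree are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import product


def public_messages(tree_edges, bits):
    """Each vertex announces XORs of consecutive incident tree-edge bits."""
    incident = defaultdict(list)
    for edge in tree_edges:
        for vertex in edge:
            incident[vertex].append(edge)
    messages = {}
    for vertex in sorted(incident):
        edges = incident[vertex]
        for e1, e2 in zip(edges, edges[1:]):
            messages[(e1, e2)] = bits[e1] ^ bits[e2]
    return messages


def recover_key(vertex, tree_edges, bits, messages):
    """Propagate XOR relations from the vertex's own edges to the key edge."""
    known = {e: bits[e] for e in tree_edges if vertex in e}
    progress = True
    while progress:
        progress = False
        for (e1, e2), x in messages.items():
            if e1 in known and e2 not in known:
                known[e2], progress = known[e1] ^ x, True
            elif e2 in known and e1 not in known:
                known[e1], progress = known[e2] ^ x, True
    return known[tree_edges[0]]          # key = bit on the first tree edge


tree = [(1, 2), (2, 3), (2, 4)]          # a spanning tree on users 1..4
vertices = sorted({v for e in tree for v in e})
tally = Counter()
for values in product((0, 1), repeat=len(tree)):
    bits = dict(zip(tree, values))
    msgs = public_messages(tree, bits)
    assert len(msgs) == len(vertices) - 2            # |V| - 2 public bits
    assert all(recover_key(v, tree, bits, msgs) == bits[tree[0]]
               for v in vertices)                    # every user gets the key
    tally[(tuple(msgs.values()), bits[tree[0]])] += 1

# A flat tally means the key bit is independent of the public messages.
print(sorted(tally.items()))
```

In this sketch each spanning tree retains $|V| - 1$ edge bits, spends $|V| - 2$ bits of public discussion and yields one key bit; edge bits outside the packed trees are never used, which is the sense in which the scheme acts as a compression of the source.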

To illustrate the above results, a single-letter characterization for will be derived in the following for the source in Example 1. It will also demonstrate how an exact characterization for can be extended from a PIN model to a hypergraphical model via some contrived arguments. The characterization will also be useful later in Example 3 to give an exact characterization of .

Example 2

The source defined in (12) in Example 1, for instance, is a hypergraphical source with , , and . By (3.2), we have with the optimal solution and . This means that user needs to discuss bit to attain omniscience. In particular, user can reveal the XOR so that user and can recover and respectively from their observations. By (1b), then, we have


It can also be checked that the alternative characterization of in (3.2) gives

Next, we argue that


The achievability, i.e., the inequality , is by the usual time-sharing argument. In particular, the bound , for example, can be achieved by the compressed secret key agreement scheme in Example 1 with , i.e., by time-sharing the compressed secret key agreement schemes for and for equally. More precisely, we set , , , and . It follows that the public discussion rate is .

Now, to prove the reverse inequality for (2), we modify the source to another source defined as follows with an additional uniformly random and independent bit :

N.b., is different from , namely, is obtained from by adding , and is obtained from by adding and removing . It follows that is a PIN. By (3.2) and (5b), the constrained secrecy capacity for the modified source is

The desired inequality is proved if we can show that

To argue this, note that, if user reveals in public, then user can recover . Furthermore, does not leak any information about , and so the source effectively emulates the source . Consequently, any optimal discussion scheme that achieves for can be used to achieve the same secret key rate but after an additional bit of discussion . This gives the desired inequality that establishes (2).

4 Multi-letter characterization

We start with a simple multi-letter characterization of the compressed secrecy capacity in terms of the MMI (3.2).

Proposition 2

For any , we have


where the supremum is over all valid compressed source satisfying the joint entropy limit (4).


This is because the compressed secrecy capacity is simply the secret key agreement on a compressed source. Hence, by (3.2), the MMI on the compressed source gives the compressed secrecy capacity.∎

The characterization in (3) is simpler than the formulation in (3) because it does not involve the random variables and , nor the recoverability (1) and secrecy (2) constraints. Although such a multi-letter expression is not computable and therefore not accepted as a solution to the problem, it serves as an intermediate step that helps derive further results. More precisely, consider the bivariate case where . Then, (3) becomes


If in addition the joint entropy constraint (4b) is replaced by the entropy constraint on user  only, i.e.,


then can be single-letterized by standard techniques as in [21] to defined in (3.1). The following gives a simple upper bound that is tight for sufficiently small .

Proposition 3

defined in (3.1) is continuous, non-decreasing and concave in with


Furthermore, equality holds iff .


Monotonicity is obvious. Continuity and concavity can be shown by the usual time-sharing argument as in Proposition 1. (1) follows directly from the data processing inequality that under the Markov chain required in (4c). If , then there exists a feasible solution to (1) (a common function of and ) with , and so the compressed sources and can be chosen as a function of to achieve the equality for (1). Conversely, suppose is finite and (1) is satisfied with equality. Then, in addition to , we also have , which implies by the double Markov property that, for the maximum common function achieving defined in (1),

In other words, the optimal is a stochastic function of the maximum common function of and , and so as desired.∎

We will show that the above upper bound in (1) extends to the multi-user case (in Proposition 5). However, for , the above upper bound is not tight even in the two-user case. To improve the upper bound, the following duality between and will be used and extended to the multi-user case (in Theorem 5.1).

Proposition 4

For ,


Furthermore, the set of optimal solutions to the left (achieving defined in (3.1)) is the same as the set of optimal solutions to the right (achieving in (3.1) with ). It follows that the minimum admissible entropy (9) but with the entropy constraint on user  instead is


where and are defined in (2) and (3) respectively.


Set . Consider first an optimal solution to and show that it is also an optimal solution to . By optimality,


By the constraint (4b), . It follows that the constraint (1b) holds, and so is a feasible solution to , i.e., we have for (2) that


To show that is also optimal to , suppose to the contrary that there exists a strictly better solution to , i.e., with


It follows that


The last equality means that the constraint (4b) is satisfied with equality. If, to the contrary, the equality did not hold, setting to be for some fraction of time would give a better solution to , contradicting the optimality of . The first inequality can also be argued similarly by the optimality of . Now, we have

where (a) is by the concavity of ; and (b) is by the upper bound in (1). N.b., equality cannot hold simultaneously for (a) and (b) because, otherwise, we have , which, together with (6) and (7), contradicts the result in Proposition 3 that (with strict inequality) for . Hence,

which, together with (6) and (7), implies