Analysis of Remaining Uncertainties and Exponents under Various Conditional Rényi Entropies


Vincent Y. F. Tan and Masahito Hayashi

Vincent Y. F. Tan is with the Department of Electrical and Computer Engineering and the Department of Mathematics, National University of Singapore (Email: vtan@nus.edu.sg). Masahito Hayashi is with the Graduate School of Mathematics, Nagoya University, and the Center for Quantum Technologies (CQT), National University of Singapore (Email: masahito@math.nagoya-u.ac.jp). This paper was presented in part at the 2016 International Symposium on Information Theory (ISIT) in Barcelona, Spain.
Abstract

In this paper, we analyze the asymptotics of the normalized remaining uncertainty of a source when a compressed or hashed version of it and correlated side-information are observed. For this system, commonly known as Slepian-Wolf source coding, we establish the optimal (minimum) rate of compression of the source to ensure that the remaining uncertainties vanish. We also study the exponential rate of decay of the remaining uncertainty to zero when the rate is above the optimal rate of compression. In our study, we consider various classes of random universal hash functions. Instead of measuring remaining uncertainties using traditional Shannon information measures, we do so using two forms of the conditional Rényi entropy. Among other techniques, we employ new one-shot bounds and the moments of type class enumerator method for these evaluations. We show that these asymptotic results are generalizations of the strong converse exponent and the error exponent of the Slepian-Wolf problem under maximum a posteriori (MAP) decoding.

Remaining uncertainty, Conditional Rényi entropies, Rényi divergence, Error exponent, Strong converse exponent, Slepian-Wolf coding, Universal hash functions, Information-theoretic security, Moments of type class enumerator method

I Introduction

In information-theoretic security [1, 2], it is of fundamental importance to study the remaining uncertainty of a random variable given a compressed version of itself and another correlated signal. This model, reminiscent of the Slepian-Wolf source coding problem [3] (in this paper, we abuse terminology and use the terms Slepian-Wolf coding [3] and lossless source coding with decoder side-information interchangeably), is illustrated in Fig. 1. A model somewhat similar to the one we study here was studied by Tandon, Ulukus and Ramachandran [4], who analyzed the problem of secure source coding with a helper. In particular, a party would like to reconstruct a source given a “helper” signal (or a compressed version of it), but an eavesdropper, who can tap on the compressed helper signal, is also present in the system. The authors in [4] analyzed the tradeoff between the compression rate and the equivocation of the source given the eavesdropper’s observation. Villard and Piantanida [5] and Bross [6] considered the setting in which the eavesdropper also has access to memoryless side-information that is correlated with the source. However, there are many ways that one could measure the equivocation or remaining uncertainty. The traditional way, starting from Wyner’s seminal paper on the wiretap channel [7] (and also in [1, 2, 4, 5, 6]), is to do so using the conditional Shannon entropy, leading to a “standard” equivocation measure. In this paper, we study the asymptotics of remaining uncertainties based on the family of Rényi information measures [8]. The measures we consider include the conditional Rényi entropy and its so-called Gallager form. We note that unlike the conditional Shannon entropy, there is no universally accepted definition for the conditional Rényi entropy, so we define the quantities that we study carefully in Section II-A. Extensive discussions of various plausible notions of the conditional Rényi entropy are provided in the recent works by Teixeira, Matos and Antunes [9] and Fehr and Berens [10].

We motivate our study by first showing that the limits of the (normalized) remaining uncertainty and the exponent of the remaining uncertainty (for appropriately chosen Rényi parameters) are, respectively, generalizations of the strong converse exponent and the error exponent for decoding the source given its compressed version and the side-information. Recall that the strong converse exponent [11, 12] is the exponential rate at which the probability of correct decoding tends to zero when one operates at a rate below the first-order coding rate, i.e., the conditional Shannon entropy. In contrast, the error exponent [13, 14, 15, 16] is the exponential rate at which the probability of incorrect decoding tends to zero when one operates at a rate above the conditional Shannon entropy. Thus, studying the asymptotics of the conditional Rényi entropy not only allows us to understand the remaining uncertainty for various classes of hash functions [17, 18] but also allows us to provide additional information and intuition concerning the strong converse exponent and the error exponent for Slepian-Wolf coding [3]. We also motivate our study by considering a scenario in information-theoretic security where the hash functions we study appear naturally, and coding can be done in a computationally efficient manner. The present work can be regarded as a follow-on from the authors’ previous work [19] on the asymptotics of the equivocations, where we studied the behavior of the equivocation as a function of the rate of the hash function (i.e., the normalized logarithm of the cardinality of its range). In [19], we also studied the exponents and second-order asymptotics of the equivocation. However, we note that because we consider the remaining uncertainty instead of the equivocation, several novel techniques, including new one-shot bounds and large-deviation techniques, have to be developed to single-letterize various expressions.

Fig. 1: The Slepian-Wolf [3] source coding problem. We are interested in quantifying the asymptotic behavior of the remaining uncertainty of the source given its hashed version and the side-information, measured according to the conditional Rényi entropies defined in (10) and (12).

Paper Organization

This paper is organized as follows. In Section II, we recap the definitions of standard Shannon information measures and some less common Rényi information measures [10, 9]. We also introduce some new quantities and state relevant properties of the information measures. We state some notation concerning the method of types [16]. In Section III, we further motivate our study by relating the quantities we wish to characterize to the error exponent and strong converse exponent of Slepian-Wolf coding (Proposition 1). In Section IV, we define various important classes of hash functions [17, 18] (such as universal and strongly universal hash functions) and further motivate the study of the quantities of interest by discussing efficient implementations of universal hash functions via circulant matrices [20]. The final parts of Section IV contain our main results concerning the asymptotics of the normalized remaining uncertainties (Theorem 2), the optimal rates of compression of the main source to ensure that the remaining uncertainties vanish (Theorem 3), and the exponents of the remaining uncertainties (Theorem 4). We show that the optimal rates are tight in certain ranges of the Rényi parameter. For these evaluations, we make use of several novel one-shot bounds, large-deviation techniques, as well as the moments of type class enumerator method [21, 22, 23, 24, 25]. Theorems 2, 3 and 4 are proved in Sections V, VI and VII respectively. We conclude our discussion and suggest further avenues for research in Section VIII. Some technical results (e.g., one-shot bounds, concentration inequalities) are relegated to the appendices.

II Information Measures and Other Preliminaries

II-A Basic Shannon and Rényi Information Quantities

We now introduce some information measures that generalize Shannon’s information measures. Fix a normalized distribution $P$ and a sub-distribution $Q$ (a non-negative vector, not necessarily summing to one), both supported on a finite set $\mathcal{X}$. Then the relative entropy and the Rényi divergence of order $1+s$ are respectively defined as

$D(P\|Q) := \sum_{x\in\mathcal{X}} P(x)\log\frac{P(x)}{Q(x)}$   (1)
$D_{1+s}(P\|Q) := \frac{1}{s}\log\sum_{x\in\mathcal{X}} P(x)^{1+s}\,Q(x)^{-s}$   (2)

where throughout, $\log$ is to the natural base. It is known that $\lim_{s\to 0} D_{1+s}(P\|Q) = D(P\|Q)$, so a special (limiting) case of the Rényi divergence is the usual relative entropy. It is also known that the map $s\mapsto sD_{1+s}(P\|Q)$ is convex in $s$, and hence $s\mapsto D_{1+s}(P\|Q)$ is monotonically increasing for $s\in(-1,\infty)$. Furthermore, the following data processing or information processing inequalities for Rényi divergences hold for $s\in(-1,\infty)$,

$D(PW\|QW) \le D(P\|Q)$   (3)
$D_{1+s}(PW\|QW) \le D_{1+s}(P\|Q)$   (4)

Here $W$ is any stochastic matrix (channel) and $PW$ (resp. $QW$) is the output distribution induced by $W$ and $P$ (resp. $Q$).
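To make these definitions concrete, the following sketch (ours, not part of the original paper) evaluates the Rényi divergence in the order-$(1+s)$ parametrization assumed in (1)-(2) and numerically checks the data-processing inequalities (3)-(4) on a toy channel; all function and variable names are illustrative.

```python
import numpy as np

def renyi_divergence(P, Q, s):
    """D_{1+s}(P||Q) = (1/s) log sum_x P(x)^{1+s} Q(x)^{-s}, natural log.
    The s -> 0 case is handled separately and equals the relative entropy D(P||Q)."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    mask = P > 0
    if abs(s) < 1e-12:
        return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))
    return float(np.log(np.sum(P[mask] ** (1 + s) * Q[mask] ** (-s))) / s)

# Toy example: two distributions on a ternary alphabet and a channel W(y|x).
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.3, 0.5])
W = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])   # rows indexed by x, columns by y

for s in [0.0, 0.5, 1.0, 2.0]:
    d_in = renyi_divergence(P, Q, s)
    d_out = renyi_divergence(P @ W, Q @ W, s)   # output distributions PW and QW
    print(f"s = {s:3.1f}:  D(P||Q) = {d_in:.4f},  D(PW||QW) = {d_out:.4f}")
    assert d_out <= d_in + 1e-9   # data-processing inequalities (3)-(4)
```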

We also introduce conditional entropies on the product alphabet $\mathcal{X}\times\mathcal{Y}$ based on the divergences above. If $P_{XY}$ is a distribution on $\mathcal{X}\times\mathcal{Y}$, we define the conditional entropy, the conditional Rényi entropy of order $1+s$, and the conditional min-entropy relative to another normalized distribution $Q_Y$ on $\mathcal{Y}$ as

$H(X|Y|P_{XY}\|Q_Y) := -\sum_{x,y} P_{XY}(x,y)\log\frac{P_{XY}(x,y)}{Q_Y(y)}$   (5)
$H_{1+s}(X|Y|P_{XY}\|Q_Y) := -\frac{1}{s}\log\sum_{x,y} P_{XY}(x,y)^{1+s}\,Q_Y(y)^{-s}$   (6)
$H_{\min}(X|Y|P_{XY}\|Q_Y) := -\log\max_{x,y}\frac{P_{XY}(x,y)}{Q_Y(y)}$   (7)

It is known that $\lim_{s\to 0} H_{1+s}(X|Y|P_{XY}\|Q_Y) = H(X|Y|P_{XY}\|Q_Y)$ and

$\lim_{s\to\infty} H_{1+s}(X|Y|P_{XY}\|Q_Y) = H_{\min}(X|Y|P_{XY}\|Q_Y).$   (8)

If $Q_Y = P_Y$, we simplify the above notation and denote the conditional entropy, the conditional Rényi entropy of order $1+s$ and the conditional min-entropy as

$H(X|Y) := H(X|Y|P_{XY}\|P_Y)$   (9)
$H_{1+s}(X|Y) := H_{1+s}(X|Y|P_{XY}\|P_Y) = -\frac{1}{s}\log\sum_{x,y} P_{XY}(x,y)^{1+s}\,P_Y(y)^{-s}$   (10)
$H_{\min}(X|Y) := H_{\min}(X|Y|P_{XY}\|P_Y)$   (11)

The map $s\mapsto sH_{1+s}(X|Y)$ is concave, and $s\mapsto H_{1+s}(X|Y)$ is monotonically decreasing for $s\in(-1,\infty)$. The definition of the conditional Rényi entropy in (10) is due to Hayashi [26, Section II.A] and Škorić et al. [27, Definition 7].
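As a small numerical illustration (ours), the sketch below evaluates the conditional Rényi entropy in the form assumed for (10) on a toy joint distribution; the printed values decrease in $s$, consistent with the monotonicity just stated.

```python
import numpy as np

def cond_renyi_entropy(P_XY, s):
    """H_{1+s}(X|Y) = -(1/s) log sum_{x,y} P_XY(x,y)^{1+s} P_Y(y)^{-s}, natural log.
    The s -> 0 case is handled separately and equals H(X|Y)."""
    P_XY = np.asarray(P_XY, dtype=float)   # rows indexed by x, columns by y
    P_Y = P_XY.sum(axis=0)                 # marginal on Y
    ratio = P_XY / P_Y                     # P_{X|Y}(x|y), broadcast column-wise
    mask = P_XY > 0
    if abs(s) < 1e-12:
        return float(-np.sum(P_XY[mask] * np.log(ratio[mask])))
    return float(-np.log(np.sum(P_XY[mask] * ratio[mask] ** s)) / s)

# Toy joint distribution P_XY on {0,1} x {0,1}.
P_XY = [[0.4, 0.1],
        [0.1, 0.4]]
for s in [0.0, 0.5, 1.0, 4.0, 50.0]:
    print(f"s = {s:5.1f}:  H_(1+s)(X|Y) = {cond_renyi_entropy(P_XY, s):.4f}")
# As s grows, the values approach the min-entropy -log max_{x,y} P_{X|Y}(x|y).
```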

We are also interested in the so-called Gallager form of the conditional Rényi entropy and the min-entropy for a joint distribution $P_{XY}$:

$H_{1+s}^{\uparrow}(X|Y) := -\frac{1+s}{s}\log\sum_{y}\Big(\sum_{x} P_{XY}(x,y)^{1+s}\Big)^{\frac{1}{1+s}}$   (12)
$H_{\min}^{\uparrow}(X|Y) := -\log\sum_{y}\max_{x} P_{XY}(x,y)$   (13)

By defining the familiar Gallager function [13, 14] (parametrized slightly differently)

$E_0(s|P_{XY}) := \log\sum_{y}\Big(\sum_{x} P_{XY}(x,y)^{1+s}\Big)^{\frac{1}{1+s}}$   (14)

we can express (12) as

$H_{1+s}^{\uparrow}(X|Y) = -\frac{1+s}{s}\,E_0(s|P_{XY}),$   (15)

thus (loosely) justifying the nomenclature “Gallager form” of the conditional Rényi entropy in (12). Note that $H_{1+s}^{\uparrow}(X|Y)$ and $H_{\min}^{\uparrow}(X|Y)$ appear under different notation in the paper by Fehr and Berens [10]. The Gallager form of the conditional Rényi entropy, also commonly known as Arimoto’s conditional Rényi entropy [28], was shown in [10] to satisfy two natural properties for $s\in(-1,\infty)$, namely, monotonicity under conditioning (or simply monotonicity)

$H_{1+s}^{\uparrow}(X|YZ) \le H_{1+s}^{\uparrow}(X|Y)$   (16)

and the chain rule

$H_{1+s}^{\uparrow}(XY|Z) \ge H_{1+s}^{\uparrow}(X|YZ).$   (17)

The monotonicity property of $H_{1+s}^{\uparrow}(X|Y)$ was also shown operationally by Bunte and Lapidoth in the context of lossless source coding with lists and side-information [29] and encoding tasks with side-information [30]. We exploit these properties in the sequel. The quantities $H_{1+s}(X|Y)$ and $H_{1+s}^{\uparrow}(X|Y)$ can be shown to be related as follows [10, Theorem 4]:

$\max_{Q_Y} H_{1+s}(X|Y|P_{XY}\|Q_Y) = H_{1+s}^{\uparrow}(X|Y)$   (18)

for $s\in(-1,\infty)$. The maximum on the left-hand-side is attained for the tilted distribution

$Q_Y^{(1+s)}(y) := \frac{\big(\sum_{x} P_{XY}(x,y)^{1+s}\big)^{\frac{1}{1+s}}}{\sum_{y'}\big(\sum_{x} P_{XY}(x,y')^{1+s}\big)^{\frac{1}{1+s}}}.$   (19)

The map $s\mapsto sH_{1+s}^{\uparrow}(X|Y)$ is concave and the map $s\mapsto H_{1+s}^{\uparrow}(X|Y)$ is monotonically decreasing for $s\in(-1,\infty)$. It can be shown by L’Hôpital’s rule that $\lim_{s\to 0} H_{1+s}^{\uparrow}(X|Y) = H(X|Y)$. Thus, we regard $H_{1}^{\uparrow}(X|Y)$ as $H(X|Y)$, i.e., when $s=0$, the conditional Rényi entropy and its Gallager form coincide and are equal to the conditional Shannon entropy.
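For completeness, here is the short L'Hôpital computation behind this limiting statement, written for the form in (10) as given above (our own worked derivation; the same argument applies to the Gallager form in (12)).

```latex
\begin{align*}
\lim_{s\to 0} H_{1+s}(X|Y)
 &= \lim_{s\to 0}\frac{-\log\sum_{x,y}P_{XY}(x,y)^{1+s}P_Y(y)^{-s}}{s}
    && \text{(a $0/0$ form, since the sum equals $1$ at $s=0$)}\\
 &= \lim_{s\to 0}\frac{-\sum_{x,y}P_{XY}(x,y)^{1+s}P_Y(y)^{-s}\log\frac{P_{XY}(x,y)}{P_Y(y)}}
                      {\sum_{x,y}P_{XY}(x,y)^{1+s}P_Y(y)^{-s}}
    && \text{(differentiate numerator and denominator in $s$)}\\
 &= -\sum_{x,y}P_{XY}(x,y)\log\frac{P_{XY}(x,y)}{P_Y(y)} \;=\; H(X|Y).
\end{align*}
```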

We also find it useful to consider the following two-parameter family of conditional Rényi entropies (this new information-theoretic quantity is somewhat related to a quantity in the work by Hayashi and Watanabe [31, Eqs. (14)-(15)], but is different and should not be confused with it):

(20)

Clearly, for an appropriate choice of the parameters this quantity reduces to (12), so the two-parameter conditional Rényi entropy is a generalization of the Gallager form of the conditional Rényi entropy in (12).

For future reference, given a joint source $P_{XY}$, define the critical rates

(21)
(22)

II-B Notation for Types

The proofs of our results leverage the method of types [16, Ch. 2], so we summarize some relevant notation here. The set of all distributions (probability mass functions) on a finite set $\mathcal{X}$ is denoted as $\mathcal{P}(\mathcal{X})$. The type or empirical distribution of a sequence $x^n = (x_1,\ldots,x_n)\in\mathcal{X}^n$ is the distribution $\hat{P}_{x^n}(a) := \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\{x_i = a\}$, $a\in\mathcal{X}$. The set of all sequences with type $P$ is the type class of $P$ and is denoted as $\mathcal{T}_P$. The set of all $n$-types (types formed from length-$n$ sequences) on alphabet $\mathcal{X}$ is denoted as $\mathcal{P}_n(\mathcal{X})$. When we write $a_n \lesssim b_n$, we mean inequality on an exponential scale, i.e., $\limsup_{n\to\infty}\frac{1}{n}\log\frac{a_n}{b_n}\le 0$. The notations $a_n \gtrsim b_n$ and $a_n \doteq b_n$ are defined analogously. Throughout, we will use the fact that the number of types $|\mathcal{P}_n(\mathcal{X})| \le (n+1)^{|\mathcal{X}|}$ grows only polynomially in $n$.
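To fix ideas, the following illustrative snippet (ours) computes the type of a short sequence, the size of its type class, and the polynomially growing number of types; the specific sequence and alphabet are arbitrary.

```python
from collections import Counter
from math import comb, factorial, prod

def type_of(seq, alphabet):
    """Empirical distribution (type) of a length-n sequence."""
    n, counts = len(seq), Counter(seq)
    return {a: counts[a] / n for a in alphabet}

def type_class_size(counts):
    """|T_P| = n! / prod_a (n P(a))! for the type with the given integer counts."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)

seq, alphabet = "aababbbaab", "ab"
print("type:", type_of(seq, alphabet))        # {'a': 0.5, 'b': 0.5}
print("|T_P| =", type_class_size([5, 5]))     # 252 sequences share this type
# The number of n-types on an alphabet of size k is C(n+k-1, k-1), which grows
# only polynomially in n (it is at most (n+1)^k) -- the fact used throughout.
print("number of 10-types on {a,b}:", comb(10 + 2 - 1, 2 - 1))   # 11
```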

III Motivation for Studying Remaining Uncertainties

As mentioned in the introduction, in this paper we study the remaining uncertainty and its rate of exponential decay, measured using various Rényi information measures. In this section, we further motivate the relevance of this study by relating the remaining uncertainty to the strong converse exponent for decoding the source given the side information and the compressed (hashed) version of the source (the Slepian-Wolf problem). We also relate the exponential rate of decay of the remaining uncertainty, for a source coding rate above the first-order fundamental limit, to the error exponent of the Slepian-Wolf problem.

III-A Relation to the Strong Converse Exponent for Slepian-Wolf Coding

Consider the Slepian-Wolf source coding problem as shown in Fig. 1. For a given encoder $f:\mathcal{X}^n\to\mathcal{M}_n$ and side information vector $y^n\in\mathcal{Y}^n$, we may define the maximum a-posteriori (MAP) decoder as follows:

$\hat{x}^n(c, y^n) := \operatorname*{arg\,max}_{\bar{x}^n : f(\bar{x}^n) = c} P_{X^n|Y^n}(\bar{x}^n\,|\,y^n), \qquad c\in\mathcal{M}_n.$   (23)

Define the probability of correctly decoding $X^n$ given the encoder $f$ and the MAP decoder as follows:

$\mathrm{P_c}(f) := \Pr\big[\hat{x}^n(f(X^n), Y^n) = X^n\big] = \sum_{y^n\in\mathcal{Y}^n}\sum_{c\in\mathcal{M}_n}\;\max_{\bar{x}^n : f(\bar{x}^n)=c} P_{X^nY^n}(\bar{x}^n, y^n).$   (24)

Then, by the definition of $H_{\min}^{\uparrow}$ in (13), we immediately see that

$-\log \mathrm{P_c}(f) = H_{\min}^{\uparrow}\big(X^n \,\big|\, f(X^n), Y^n\big).$   (25)

When optimized over the encoder $f$, the quantity on the left of (25) (or its limit) is called the strong converse exponent, as it characterizes the optimal exponential rate at which the probability of correctly decoding the true source given the hashed version and the side information decays to zero. Thus, by studying the asymptotics of the remaining uncertainties for all Rényi parameters and, in particular, the limiting min-entropy case (which we do in (53) in Part (2) of Theorem 2), we obtain a generalization of the strong converse exponent for the Slepian-Wolf problem. In fact, it is known that $\mathrm{P_c}(f_n)\to 0$ for any sequence of encoders $f_n$ if and only if the rate is below the conditional Shannon entropy [12, Theorem 2]. This fact will be utilized in the proof of Theorem 3.
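The identity in (25) can be checked numerically. The sketch below (ours) uses a toy i.i.d. source of blocklength 3 and a hypothetical one-bit parity "encoder", computes the MAP decoder's success probability by brute force as in (23)-(24), and notes that it equals the exponential of minus the Gallager-form min-entropy of the source given the bin index and the side information, as stated in (25).

```python
import numpy as np
from itertools import product

n = 3
P_XY = np.array([[0.4, 0.1],
                 [0.1, 0.4]])   # single-letter joint pmf P_XY[x, y]

def f(x_seq):
    """A hypothetical 1-bit encoder (bin index) for X-sequences."""
    return sum(x_seq) % 2

def p_seq(x_seq, y_seq):
    """Probability of the pair of length-n sequences under the i.i.d. source."""
    return float(np.prod([P_XY[x, y] for x, y in zip(x_seq, y_seq)]))

# MAP decoding as in (23): given the bin index c and side information y^n,
# output the most likely X-sequence in bin c.  Its success probability (24) is
# the sum over (c, y^n) of the largest source probability inside the bin.
P_c = sum(max(p_seq(x_seq, y_seq)
              for x_seq in product(range(2), repeat=n) if f(x_seq) == c)
          for y_seq in product(range(2), repeat=n)
          for c in range(2))

# By the definition of the Gallager-form min-entropy (13) applied to the pair
# (f(X^n), Y^n) in place of Y, exp(-H^up_min(X^n | f(X^n), Y^n)) equals the
# same sum, which is exactly the identity (25).
print("P_c =", P_c, " so  H^up_min(X^n | f(X^n), Y^n) =", -np.log(P_c))
```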

III-B Relation to the Error Exponent for Slepian-Wolf Coding

Similarly, we may define the probability of incorrectly decoding $X^n$ given the encoder $f$ and MAP decoder as follows:

$\mathrm{P_e}(f) := 1 - \mathrm{P_c}(f).$   (26)

Then we have the following proposition concerning the exponents of the remaining uncertainties.

Proposition 1.

Assume that $\mathrm{P_e}(f_n)$ tends to zero exponentially fast for a given sequence of hash functions $\{f_n\}$, i.e., $\lim_{n\to\infty} -\frac{1}{n}\log\mathrm{P_e}(f_n)$ exists and is positive (the existence of the limit is part of the assumption). Then for any $s>0$, we have

(27)
(28)

We recall, by the Slepian-Wolf theorem [3], that there exists a sequence of encoders such that $\mathrm{P_e}(f_n)$ tends to zero if and only if the rate exceeds the conditional Shannon entropy. When optimized over the encoders, the quantity on the left of (27) is called the optimal error exponent, and it characterizes the optimal exponential rate at which the error probability of decoding the source given the hashed version and the side information decays to zero. Thus, Proposition 1 says that the exponents of the remaining uncertainties, measured by the conditional Rényi entropy and its Gallager form for positive Rényi parameters, are generalizations of the error exponent of decoding the source given the hashed version and the side information. We establish bounds on these limits for certain classes of hash functions in Part (2) of Theorem 4.

Proof.

We first consider the Gallager form of the conditional Rényi entropy . For brevity, we let (suppressing the dependence on ) and we also define the probability distributions and . Recall the definition of the MAP decoder in (23). We have

(29)
(30)
(31)
(32)

In the following chain of inequalities, we will employ Taylor’s theorem with the Lagrange form of the remainder for the function at , i.e.,

(33)

for some . We choose to be in our application in (36) to follow. Let be a generic element of taking the role of in the Taylor series expansion in (33). We bound the conditional Rényi entropy as follows:

(34)
(35)
(36)
(37)
(38)
(39)
(40)

In (37), noting that , we uniformly upper bounded by . We also upper bounded by . In (40), we used the definition of stated in (26). Because is assumed to decay exponentially fast, we have

(41)
(42)
(43)
(44)
(45)

where (41) and (45) follow from , (42) uses (32), (43) uses the fact that (cf. (18)) and (44) uses (40). The second term in (45) is exponentially smaller than because of the square operation and the fact that . Now, since is constant, the exponents of the quantities on the left and right sides of the above chain are equal. Thus they are equal to the exponents of and for every . This completes the proof of Proposition 1. ∎

IV Main Results: Asymptotics of the Remaining Uncertainties

In this section, we present our results concerning the asymptotic behavior of the remaining uncertainties and their exponential rates of decay. As mentioned in Section III, the former is a generalization of the strong converse exponent for the Slepian-Wolf problem [3], while the latter is a generalization of the error exponent for the same problem. Before doing so, we define various classes of random hash functions and further motivate our analysis using an example from information-theoretic security.

IV-A Definitions of Various Classes of Hash Functions

We now define various classes of hash functions. We start by stating a slight generalization of the canonical definition of a universal hash function by Carter and Wegman [17].

Definition 1.

A random hash function $f_Z$ is a stochastic map from $\mathcal{X}$ to $\{1,\ldots,\mathsf{M}\}$, where $Z$ denotes a random variable describing its stochastic behavior. (For brevity, we will sometimes omit the qualifier “random”; it is understood, henceforth, that all so-mentioned hash functions are random hash functions.) The set of all random hash functions mapping from $\mathcal{X}$ to $\{1,\ldots,\mathsf{M}\}$ is denoted as $\mathcal{F}$. A hash function $f_Z$ is called an $\varepsilon$-almost universal hash function if it satisfies the following condition: For any distinct $x \ne x' \in \mathcal{X}$,

$\Pr\big[f_Z(x) = f_Z(x')\big] \le \frac{\varepsilon}{\mathsf{M}}.$   (46)

When $\varepsilon = 1$ in (46), we simply say that $f_Z$ is a universal hash function [17]. We denote the set of universal hash functions mapping from $\mathcal{X}$ to $\{1,\ldots,\mathsf{M}\}$ by $\mathcal{F}_{\mathrm{uni}}$.

The following definition is due to Wegman and Carter [18].

Definition 2.

A random hash function $f_Z$ is called strongly universal when the random variables $\{f_Z(x)\}_{x\in\mathcal{X}}$ are independent and subject to a uniform distribution, i.e.,

$\Pr\big[f_Z(x) = m\big] = \frac{1}{\mathsf{M}}$   (47)

for all $x\in\mathcal{X}$ and $m\in\{1,\ldots,\mathsf{M}\}$. If $f_Z$ is a strongly universal hash function, we emphasize this fact by writing $f_Z\in\mathcal{F}_{\mathrm{str}}$.

Fig. 2: Hierarchy of hash functions. See Definitions 1 and 2.

As an example, if $f_Z$ independently and uniformly assigns each element of $\mathcal{X}$ to one of $\mathsf{M}$ “bins” indexed by $\{1,\ldots,\mathsf{M}\}$ (i.e., the familiar random binning process introduced by Cover in the context of Slepian-Wolf coding [32]), then (47) holds, yielding a strongly universal hash function. The hierarchy of hash functions is shown in Fig. 2.
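As an illustration (ours), the snippet below implements random binning and empirically checks the collision-probability condition in (46) against the universal-hash target value; the domain size, number of bins, and trial count are arbitrary choices.

```python
import random
from itertools import combinations

def random_binning(domain, M, rng):
    """Strongly universal hash: every x gets an independent, uniform bin in
    {0, ..., M-1} (Cover-style random binning)."""
    return {x: rng.randrange(M) for x in domain}

domain, M, trials = range(16), 4, 5000
rng = random.Random(0)
pairs = list(combinations(domain, 2))
collisions = {pair: 0 for pair in pairs}
for _ in range(trials):
    f = random_binning(domain, M, rng)
    for x, xp in pairs:
        collisions[(x, xp)] += (f[x] == f[xp])

worst = max(c / trials for c in collisions.values())
print(f"worst empirical collision probability: {worst:.3f} (universal target 1/M = {1/M:.3f})")
```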

A universal hash function can be implemented efficiently via circulant (a special case of Toeplitz) matrices. The complexity is low: applying such a hash to an $\ell$-bit string requires only $O(\ell\log\ell)$ operations in general. For details, see the discussion in Hayashi and Tsurumaru [20] and the subsection to follow. It is therefore natural to assume that the encoding functions we analyze in this paper are universal hash functions.

IV-B Another Motivation for Analyzing Remaining Uncertainties

Fig. 3: A secure communication scenario that motivates our study of remaining uncertainties. See Section IV-B for a discussion.

To ensure a reasonable level of security in practice, we often send our message via multiple paths in networks. Assume that Alice wants to send an $\ell$-bit “message” $M$ to Bob via $k$ paths, and that Eve has access to side-information correlated with $M$ and intercepts one of the paths. We also suppose that $\ell = kl$ for some integer $l$. Alice applies an invertible function $F$ to $M$ and divides $F(M)$ into $k$ equal-sized parts of $l$ bits each. See Fig. 3. Bob receives all of them, and applies $F^{-1}$ to decode $M$. Hence, Bob can recover the original message losslessly. However, if Eve somehow manages to tap on the $i$-th part (where $1\le i\le k$), Eve can possibly estimate the message from this part and her side-information (in Fig. 3, we assume Eve taps on the first piece of information). Eve’s uncertainty with respect to $M$ is the conditional entropy of $M$ given her observations (here, “entropy” is a generic entropy function; it will be taken to be various conditional Rényi entropies in the subsequent subsections). In this scenario, it is not easy to estimate the uncertainty as it depends on the choice of $F$. To avoid such a difficulty, we propose to apply a random invertible function $F$ to $M$. To further resolve the aforementioned issue from a computational perspective, we regard the message alphabet as the finite extension field $\mathbb{F}_{2^{\ell}}$. When Alice and Bob choose an invertible element $a$ of the finite field subject to the uniform distribution, and $F$ is defined as $F(M) := aM$, the map $F$ is a universal hash function. Then, Eve’s uncertainty with respect to $M$ can be described as the remaining uncertainty of $M$ given the tapped part and her side-information. When the message is taken to be a sequence of independent and identically distributed symbols, our results in the following subsections are directly applicable in evaluating Eve’s uncertainty measured according to various conditional Rényi entropies. We remark that if $\ell$ is not a multiple of $k$, we can make the final block smaller than $l$ bits without any loss of generality asymptotically.

Indeed, this protocol can be efficiently implemented with (low) complexity $O(\ell\log\ell)$ [20], because multiplication in the finite field can be realized by an appropriately-designed circulant matrix, leading to a fast Fourier transform-like algorithm. Therefore, this communication setup, which contains an eavesdropper, is “practical” in the sense that encoding and decoding can be realized efficiently.
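The following toy-sized sketch (ours) illustrates the protocol for a single byte, i.e., $\mathbb{F}_{2^8}$ with the AES reduction polynomial, and $k = 2$ pieces: Alice multiplies the message by a uniformly random nonzero (hence invertible) field element and splits the product; Bob, knowing the element and all pieces, can invert; Eve sees only one piece. The field size, polynomial, and helper names are illustrative choices, not the paper's.

```python
import random

POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1, the AES reduction polynomial

def gf256_mul(a, b):
    """Carry-less ('Russian peasant') multiplication in GF(2^8) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r

def hash_and_split(message, k, rng):
    """Alice's side: multiply the message by a uniformly random invertible
    field element (a universal hash), then split the result into k pieces."""
    a = rng.randrange(1, 256)   # every nonzero element of GF(2^8) is invertible
    y = gf256_mul(a, message)
    bits = format(y, "08b")
    step = 8 // k
    pieces = [bits[i * step:(i + 1) * step] for i in range(k)]
    return a, pieces

rng = random.Random(0)
a, pieces = hash_and_split(0b10110011, k=2, rng=rng)
print("field element a =", a, " transmitted pieces =", pieces)
# Bob, who knows a and receives every piece, reassembles y and recovers the
# message by multiplying with the inverse of a.  Eve, who intercepts only one
# piece (plus any correlated side-information), is left with the remaining
# uncertainty that the results of the following subsections quantify.
```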

IV-C Asymptotics of Remaining Uncertainties

Our results in Theorem 2 to follow pertain to the worst-case remaining uncertainties over all universal hash functions. We are interested in $H_{1+s}(X^n \mid f(X^n), Y^n)$ and $H_{1+s}^{\uparrow}(X^n \mid f(X^n), Y^n)$, where $(X^n, Y^n)$ is distributed according to the $n$-fold product measure $P_{XY}^n$. We emphasize that the evaluations of these worst-case quantities are stronger than those in standard achievability arguments in Shannon theory, where one often uses a random selection argument to assert that an object (e.g., a code) with good properties exists. In our calculations of the asymptotics of the worst-case remaining uncertainties, we assert that all hash functions in the universal class have a certain desirable property; namely, that the remaining uncertainties can be appropriately upper bounded. In addition, in Theorem 3 to follow, we also quantify the minimum rate such that the best-case remaining uncertainties over all random hash functions vanish. For many values of the Rényi parameter, we show that the minimum rates for the two different evaluations (worst-case over all universal hash functions and best-case over all random hash functions) coincide, establishing tightness of the optimal compression rates.

Let . The following is our first main result.

Theorem 2 (Remaining Uncertainties).

For each $n$, let the size of the range of $f_n$ be $\mathsf{M}_n = e^{nR}$ (when we write $e^{nR}$, we mean that $\mathsf{M}_n$ is the integer $\lceil e^{nR}\rceil$). Fix a joint distribution $P_{XY}$. Define the worst-case limiting normalized remaining uncertainties over all universal hash functions as

(48)
(49)

Recall the definitions of the critical rates in (21) and (22). The following achievability statements hold:

  1. For any , we have

    (50)

    and for any , we have