# On the Most Informative Boolean Functions of the Very Noisy Channel

Hengjie Yang and Richard D. Wesel
Department of Electrical and Computer Engineering
University of California, Los Angeles, Los Angeles, CA 90095, USA
Email: {hengjie.yang, wesel}@ucla.edu
###### Abstract

Let $X^n$ be a sequence of independent and identically distributed Bernoulli(1/2) random variables and let $Y^n$ be the result of passing $X^n$ through a binary symmetric channel (BSC) with crossover probability $\alpha$. For any Boolean function $f:\{0,1\}^n\to\{0,1\}$, Courtade and Kumar postulated that $I(f(X^n);Y^n)\le 1-H(\alpha)$. In this paper, we prove, from a purely mathematical point of view, that the conjecture is true in the high noise regime by showing that $H(\alpha)-H(f(X^n)|Y^n)\le 1-H(f(X^n))$ holds for $\alpha\in[1/2-\delta,1/2+\delta]$, where $\delta>0$ is some universally small constant. We first point out that in the high noise regime, a function $f$ is more informative if the second derivative of $H(\alpha)-H(f(X^n)|Y^n)$ has a larger value at $\alpha=1/2$. Then we show that the ratio spectrum of $f^{-1}(0)$, an integer sequence that characterizes the structure of $f^{-1}(0)$, is the fundamental metric that determines the second derivative of $H(\alpha)-H(f(X^n)|Y^n)$ evaluated at $\alpha=1/2$. With $|f^{-1}(0)|$ fixed, the lex function is a locally most informative function for the very noisy BSC. The dictator function $f(X^n)=X_i$, with $i=1$ being a special case of the lex function with $|f^{-1}(0)|=2^{n-1}$, is the globally most informative function for the very noisy BSC, as it is the only type of function that maximizes the second derivative of $H(\alpha)-H(f(X^n)|Y^n)$ evaluated at $\alpha=1/2$ to $0$ over all possible $|f^{-1}(0)|$.

## I Introduction

### I-A Previous work

In 2013, Courtade and Kumar [1] postulated the following maximum mutual information conjecture.

###### Conjecture 1 ([1])

Let $X^n$ be a sequence of $n$ i.i.d. Bernoulli(1/2) random variables, and let $Y^n$ be the result of passing $X^n$ through a memoryless binary symmetric channel (BSC) with crossover probability $\alpha$. For any Boolean function $f:\{0,1\}^n\to\{0,1\}$, we have

$$I(f(X^n);Y^n)\le 1-H(\alpha). \tag{1}$$

Although Conjecture 1 remains open, several new results have been obtained in a series of papers [2, 3, 4]. Samorodnitsky [2] proved that the conjecture holds in the high noise regime, where $\alpha\in[1/2-\delta,1/2+\delta]$ for some universally small constant $\delta>0$, by considering the entropy of the image of $f$ under the noise operator with the corresponding noise parameter.

###### Theorem 1 ([2])

There exists an absolute constant $\delta>0$ such that for any noise $\alpha\ge 0$ with $(1-2\alpha)^2\le\delta$ and for any Boolean function $f:\{0,1\}^n\to\{0,1\}$, it holds that

$$I(f(X^n);Y^n)\le 1-H(\alpha). \tag{2}$$

Nevertheless, the best known bound that holds universally for all Boolean functions is

$$I(f(X^n);Y^n)\le(1-2\alpha)^2. \tag{3}$$

This bound, derived by Erkip [5], can be established through various techniques, including an application of Mrs. Gerber’s Lemma [6], the strong data-processing inequality [7], and standard Fourier analysis. Unfortunately, this bound is still strictly weaker than that in Conjecture 1.
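As an informal numerical illustration of the gap between (3) and the conjectured bound in (1), the following sketch evaluates both right-hand sides on a few crossover probabilities. The helper names (`binary_entropy`, `conjectured_bound`, `erkip_bound`) and the grid are our own choices for illustration; this is a sanity check, not part of any proof.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def conjectured_bound(alpha: float) -> float:
    """Right-hand side of (1): 1 - H(alpha)."""
    return 1.0 - binary_entropy(alpha)


def erkip_bound(alpha: float) -> float:
    """Right-hand side of (3): (1 - 2*alpha)^2."""
    return (1.0 - 2.0 * alpha) ** 2


if __name__ == "__main__":
    # Print both bounds side by side for a few noise levels.
    for alpha in (0.05, 0.10, 0.25, 0.40, 0.45):
        print(f"alpha={alpha:.2f}  "
              f"1-H(alpha)={conjectured_bound(alpha):.4f}  "
              f"(1-2alpha)^2={erkip_bound(alpha):.4f}")
```

On such a grid the value of (3) never falls below that of (1), which is consistent with the standard inequality $H(\alpha)\ge 4\alpha(1-\alpha)$.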

In 2016, Ordentlich, Shayevitz, and Weinstein [3] proved a new upper bound for any balanced Boolean function $f$, i.e., any Boolean function that satisfies $\Pr(f(X^n)=0)=\Pr(f(X^n)=1)=1/2$, which beats (3) for $1/3<\alpha<1/2$.

###### Theorem 2 ([3])

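For any balanced Boolean function $f:\{0,1\}^n\to\{0,1\}$ and any $\frac{1}{2}\left(1-\frac{1}{\sqrt{3}}\right)\le\alpha\le\frac{1}{2}$, we have that

$$I(f(X^n);Y^n)\le\frac{\log_2(e)}{2}(1-2\alpha)^2+9\left(1-\frac{\log_2(e)}{2}\right)(1-2\alpha)^4. \tag{4}$$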
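To see where (4) improves on (3), write $x=(1-2\alpha)^2$ and $c=\log_2(e)/2$. Then (4) is smaller than (3) exactly when $cx+9(1-c)x^2<x$, i.e., when $9x<1$, which for $\alpha\le 1/2$ means $\alpha>1/3$. The short sketch below checks this crossover numerically; the function names are hypothetical helpers introduced only for this illustration.

```python
import math

# Coefficient log2(e)/2 that appears in (4).
C = math.log2(math.e) / 2.0


def erkip_bound(alpha: float) -> float:
    """Right-hand side of (3)."""
    return (1.0 - 2.0 * alpha) ** 2


def balanced_bound(alpha: float) -> float:
    """Right-hand side of (4), stated for balanced Boolean functions."""
    x = (1.0 - 2.0 * alpha) ** 2
    return C * x + 9.0 * (1.0 - C) * x * x


if __name__ == "__main__":
    for alpha in (0.25, 0.30, 1.0 / 3.0, 0.40, 0.45):
        print(f"alpha={alpha:.4f}  (3)={erkip_bound(alpha):.5f}  "
              f"(4)={balanced_bound(alpha):.5f}")
```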

In 2017, Huleihel and Ordentlich [4] studied a complementary problem of Conjecture 1 and proved the following theorem.

###### Theorem 3 ([4])

For any function $f:\{0,1\}^n\to\{0,1\}^{n-1}$, we have

$$I(f(X^n);Y^n)\le(n-1)(1-H(\alpha)), \tag{5}$$

and this bound is attained with equality by, e.g., $f(x^n)=(x_1,\dots,x_{n-1})$.

In 2018, Li and Médard [8] studied the problem of finding the Boolean function $f$ that maximizes a prescribed moment of the image of $f$ under the noise operator, and discussed the relationship between this problem and Conjecture 1.

### I-B Our main contributions

In this paper, we prove the same result as in Theorem 1 by dealing with Conjecture 1 directly, from a purely mathematical point of view. First, we reformulate Conjecture 1 into a form where, given $n$ and $|f^{-1}(0)|$, the left-hand side is a function of $\alpha$ whereas the right-hand side is a constant specified by $n$ and $|f^{-1}(0)|$. Then, we show that $H(\alpha)-H(f(X^n)|Y^n)\le 1-H(f(X^n))$ always holds when $\alpha\in[1/2-\delta,1/2+\delta]$, with $\delta>0$ being some universally small constant, by showing that the first derivative of the left-hand side with respect to $\alpha$ vanishes at $\alpha=1/2$ and that its second derivative is nonpositive there.

Given $|f^{-1}(0)|$, we say "$f$ is a most informative function" if $H(\alpha)-H(f(X^n)|Y^n)$ is undominated in the high noise regime, i.e., if its value for this choice of $f$ is greater than or equal to its value for any other choice of $f$ with the same $|f^{-1}(0)|$ near $\alpha=1/2$. Since for any $f$ this quantity equals the constant $1-H(f(X^n))$ at $\alpha=1/2$, and its first derivative is zero at $\alpha=1/2$ for all $f$, it follows that it is undominated if its second derivative at $\alpha=1/2$ is maximized. Hence the most informative functions are the ones that maximize this second derivative.

We introduce the ratio spectrum of $f^{-1}(0)$, an integer sequence that characterizes the structure of $f^{-1}(0)$ and is sufficient to determine the second derivative at $\alpha=1/2$. With $|f^{-1}(0)|$ fixed, the lex function [1], a Boolean function with the first $|f^{-1}(0)|$ lexicographically ordered codewords mapped to $0$, is a "locally" most informative function for the very noisy BSC, as it maximizes the second derivative over all Boolean functions with the same $|f^{-1}(0)|$. Note that, with $|f^{-1}(0)|$ fixed, the lex function is not the only locally most informative function. In general, any other function that has the same ratio spectrum as the lex function is also a locally most informative function, which will be shown in Theorem 7. However, the dictator function is the "globally" most informative function for the very noisy BSC, since it is the only type of function that maximizes the second derivative to $0$ over all possible $|f^{-1}(0)|$, i.e., over all possible choices of Boolean functions. Note that with $i=1$, the dictator function $f(X^n)=X_i$ is a special case of the lex function with $|f^{-1}(0)|=2^{n-1}$.

This paper is organized as follows. Section II reformulates Conjecture 1 into the form $F(\alpha,n,M,f)\le T(n,M)$. Section III presents the lemmas that are necessary and sufficient to prove the reformulated conjecture in the high noise regime. Sections IV, V, and VI present the proof details for the respective lemmas, and Section VII concludes the paper.

## II Reformulation of Conjecture 1

First, noting that the inequality (1) is equivalent to

$$H(\alpha)-H(f(X^n)|Y^n)\le 1-H(f(X^n)), \tag{6}$$

we reformulate Conjecture 1 as follows.

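Let $\mathcal{S}=\{c_0,c_1,\dots,c_{S-1}\}$ denote the universal set of $n$-bit binary codewords sorted in lexicographical order, where $S=2^n$ and $c_i$ is the $n$-bit binary representation of index $i$. A Boolean function $f:\{0,1\}^n\to\{0,1\}$ can be specified as follows:

$$f(c_i)=\begin{cases}0, & c_i\in\mathcal{U};\\ 1, & c_i\in\mathcal{V}=\mathcal{S}\setminus\mathcal{U},\end{cases} \tag{7}$$

where $\mathcal{U}=\{c_{t_0},c_{t_1},\dots,c_{t_{M-1}}\}$ with $|\mathcal{U}|=M$, $0\le M\le S$. For brevity, we denote the above mapping by $f$. Note that when $M=0$ or $M=S$, (6) degenerates to $H(\alpha)\le 1$, which always holds; thus it is enough to focus on the case $0<M<S$.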

Define

$$F(\alpha,n,M,f)\triangleq H(\alpha)-H(f(X^n)|Y^n), \tag{8}$$

and

$$T(n,M)\triangleq 1-H(f(X^n)), \tag{9}$$

where $H(f(X^n))=H(M/S)$ because $\Pr(f(X^n)=0)=M/S$, and the logarithm base is $2$. Note that in Sections IV, V, and VI, the logarithm base is $e$. Therefore, (6) is equivalent to

$$F(\alpha,n,M,f)\le T(n,M). \tag{10}$$

Note that given a pair $(n,M)$ with $0<M<S$, $T(n,M)$ is a constant specified by $(n,M)$, whereas $F(\alpha,n,M,f)$ still depends on the choice of $f$. Later we will show that the second derivative of $F(\alpha,n,M,f)$ at $\alpha=1/2$ is uniquely determined by the ratio spectrum of $f^{-1}(0)$, a fundamental quantity that characterizes the structure of the codewords in $f^{-1}(0)$. Thus, Conjecture 1 translates to the following conjecture.

###### Conjecture 2

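Let $S=2^n$. Given a pair $(n,M)$ with $0<M<S$, for any function $f$ and $\alpha\in[0,1]$, we have

$$\max_{\alpha\in[0,1]}F(\alpha,n,M,f)=F(\alpha^*,n,M,f)=T(n,M), \tag{11}$$

where

$$\alpha^*=\arg\max_{\alpha\in[0,1]}\{F(\alpha,n,M,f)\}. \tag{12}$$

Note that it is trivial to show $F(1/2,n,M,f)=T(n,M)$: when $\alpha=1/2$, $X^n$ and $Y^n$ are independent, so that $H(f(X^n)|Y^n)=H(f(X^n))$. Thus, $F(1/2,n,M,f)=H(1/2)-H(f(X^n))=T(n,M)$. Hence, the crucial part is to show $\alpha^*=1/2$.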

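An interesting fact is that the dictator function satisfies $F(\alpha,n,M,f)=T(n,M)$ for any $\alpha\in[0,1]$. Let $f(X^n)=X_i$, $1\le i\le n$. Then $H(f(X^n)|Y^n)=H(X_i|Y_i)=H(\alpha)$ and $H(f(X^n))=1$, i.e., $F(\alpha,n,M,f)=T(n,M)=0$. In this paper, we will also show that the dictator function is the globally most informative function for the very noisy BSC, as it is the only type of function that maximizes the second derivative of $F(\alpha,n,M,f)$ evaluated at $\alpha=1/2$ to $0$ over all possible $M$.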
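For small $n$, the reformulated inequality (10) can be checked by brute force directly from the definitions (8) and (9). The sketch below is such a check under our own conventions: codewords are represented by their integer indices, `U` is the list of indices mapped to $0$, and logarithms are base $2$ as in this section. The function names `F` and `T` mirror the notation above, but the implementation is only an illustration whose cost grows as $M\cdot 2^n$.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits; values outside (0, 1) are treated as 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def F(alpha: float, n: int, U) -> float:
    """H(alpha) - H(f(X^n)|Y^n) for the function that maps indices in U to 0."""
    S = 2 ** n
    total = 0.0
    for j in range(S):
        # lambda_j = Pr(f = 0 | Y^n = c_j), summed over codewords in U.
        lam = 0.0
        for t in U:
            k = bin(t ^ j).count("1")  # Hamming distance between c_t and c_j
            lam += alpha ** k * (1.0 - alpha) ** (n - k)
        total += h2(lam)
    return h2(alpha) - total / S


def T(n: int, M: int) -> float:
    """1 - H(f(X^n)) with Pr(f = 0) = M / 2^n."""
    return 1.0 - h2(M / 2 ** n)


if __name__ == "__main__":
    n = 3
    for U in ([0], [0, 1, 2], [0, 1, 2, 3]):  # the last one is the dictator X_1
        worst = max(F(a / 20, n, U) for a in range(21))
        print(f"U={U}  max F on grid={worst:.6f}  T={T(n, len(U)):.6f}")
```

For the dictator function the computed $F$ is zero (up to rounding) at every $\alpha$, matching the observation above, while for the other example sets the maximum over the grid is attained at $\alpha=1/2$.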

As an example of the reformulation, Fig. 1 shows that $F(\alpha,n,M,f)\le T(n,M)$ for the plotted choices of $(n,M)$ and $f$. Fig. 1 also depicts two typical shapes of $F(\alpha,n,M,f)$: a quasi-concave shape and a "single-peak wave" shape. In fact, we conjecture that these are the only two possible shapes of $F(\alpha,n,M,f)$ given $(n,M)$. Note that even for the dictator function, $F(\alpha,n,M,f)$ is still quasi-concave.

## III Main Results

Note that for any $f$, $F(\alpha,n,M,f)$ is a continuous, twice differentiable function of $\alpha$. We prove that Conjecture 2 holds in the high noise regime, as stated in Theorem 4, which is equivalent to Theorem 1.

###### Theorem 4

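Let $S=2^n$. Given a pair $(n,M)$ with $0<M<S$, for any function $f$, there exists a universally small constant $\delta>0$ such that

$$\max_{\alpha\in[\alpha^*-\delta,\alpha^*+\delta]}F(\alpha,n,M,f)=F(\alpha^*,n,M,f)=T(n,M), \tag{13}$$

where

$$\alpha^*=\arg\max_{\alpha\in[\alpha^*-\delta,\alpha^*+\delta]}\{F(\alpha,n,M,f)\}. \tag{14}$$

The remainder of the paper establishes Theorem 4 by proving the following three lemmas, in which derivatives at $\alpha=\alpha^*$ are evaluated at $\alpha^*=1/2$.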

###### Lemma 1

Given a pair $(n,M)$ with $0<M<S$, for any function $f$, $F(\alpha,n,M,f)$ is symmetric with respect to $\alpha=1/2$.

###### Lemma 2

Given a pair $(n,M)$ with $0<M<S$, for any function $f$, we have

$$\left.\frac{\partial F(\alpha,n,M,f)}{\partial\alpha}\right|_{\alpha=\alpha^*}=0. \tag{15}$$

###### Lemma 3

Given a pair $(n,M)$ with $0<M<S$, for any function $f$, we have

$$\left.\frac{\partial^2 F(\alpha,n,M,f)}{\partial\alpha^2}\right|_{\alpha=\alpha^*}\le 0, \tag{16}$$

where equality holds if and only if $f$ is a dictator function.

In the proof of Lemma 3, we introduce several new concepts, such as the lex function and the ratio spectrum, which play an important role in the proof. We also show that, with $M$ fixed, the lex function is a locally most informative function for the very noisy BSC, as it maximizes the second derivative in (16) over the set of Boolean functions with the same $M$. The dictator function is the globally most informative function, as it is the only type of function that maximizes this second derivative to $0$ over all possible $M$, i.e., over all possible choices of Boolean functions.

## IV Proof of Lemma 1

The expansion of $F(\alpha,n,M,f)$ is as follows.

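$$\begin{align}
F(\alpha,n,M,f) &= H(\alpha)-H(f(X^n)|Y^n) \tag{17}\\
&= H(\alpha)-\sum_{j=0}^{S-1}\Pr(Y^n=c_j)\,H(f(X^n)|Y^n=c_j) \tag{18}\\
&= H(\alpha)+\sum_{j=0}^{S-1}\Pr(Y^n=c_j)\sum_{f=0}^{1}p(f|c_j)\log p(f|c_j) \tag{19}\\
&= H(\alpha)-\frac{1}{S}\sum_{j=0}^{S-1}H(\lambda_j), \tag{20}
\end{align}$$

where

$$\begin{align}
\lambda_j &\triangleq p(f=0|c_j) \tag{21}\\
&= \sum_{i=0}^{M-1}\Pr(X^n=c_{t_i}|Y^n=c_j) \tag{22}\\
&= \sum_{i=0}^{M-1}\alpha^{k_{t_i,j}}(1-\alpha)^{n-k_{t_i,j}}, \tag{23}\\
H(p) &\triangleq p\log\frac{1}{p}+(1-p)\log\frac{1}{1-p}, \tag{24}
\end{align}$$

and $k_{t_i,j}$ denotes the Hamming distance between $c_{t_i}$ and $c_j$. Note that when $\alpha=1/2$, $\lambda_j=M/S$ for all $j$.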

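Consider $\alpha$ and its symmetric counterpart $\bar{\alpha}=1-\alpha$. Since $c_j$ and its mirror codeword $c_{S-1-j}$ satisfy $k_{t_i,j}+k_{t_i,S-1-j}=n$, we have

$$\begin{align}
\lambda'_j &= \sum_{i=0}^{M-1}\bar{\alpha}^{k_{t_i,j}}(1-\bar{\alpha})^{n-k_{t_i,j}} \tag{25}\\
&= \sum_{i=0}^{M-1}(1-\alpha)^{n-k_{t_i,S-1-j}}\alpha^{k_{t_i,S-1-j}} \tag{26}\\
&= \lambda_{S-1-j}. \tag{27}
\end{align}$$

Thus, it follows from (27) that

$$\begin{align}
F(\bar{\alpha},n,M,f) &= H(\bar{\alpha})-\frac{1}{S}\sum_{j=0}^{S-1}H(\lambda'_j) \tag{28}\\
&= H(\alpha)-\frac{1}{S}\sum_{j=0}^{S-1}H(\lambda_{S-1-j}) \tag{29}\\
&= F(\alpha,n,M,f), \tag{30}
\end{align}$$

which implies that $F(\alpha,n,M,f)$ is symmetric with respect to $\alpha=1/2$.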
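The two facts used in this proof, namely that a codeword and its mirror have Hamming distances to any fixed codeword summing to $n$, and that replacing $\alpha$ by $1-\alpha$ maps $\lambda_j$ to $\lambda_{S-1-j}$, are easy to confirm numerically. The following sketch does so for an arbitrary example set; the integer-index representation of codewords and the helper names are assumptions made for this illustration.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between the binary representations of a and b."""
    return bin(a ^ b).count("1")


def lam(alpha: float, n: int, U, j: int) -> float:
    """lambda_j from (23) for the codewords indexed by U."""
    return sum(alpha ** hamming(t, j) * (1.0 - alpha) ** (n - hamming(t, j))
               for t in U)


if __name__ == "__main__":
    n, alpha = 4, 0.3
    S = 2 ** n
    U = [0, 3, 5, 6, 9]  # arbitrary example set of indices mapped to 0
    for j in range(S):
        mirror = S - 1 - j  # index of the bitwise complement of c_j
        # Distances to a codeword and to its mirror always add up to n.
        assert all(hamming(t, j) + hamming(t, mirror) == n for t in U)
        # Swapping alpha and 1 - alpha maps lambda_j to lambda_{S-1-j}.
        assert abs(lam(1.0 - alpha, n, U, j) - lam(alpha, n, U, mirror)) < 1e-12
    print("mirror identities hold for the example set")
```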

Similarly, an additional symmetry property for complementary function is presented as follows.

###### Theorem 5

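Given a pair $(n,M)$ with $0<M<S$, for each function $f$, define its complementary function $f^c$ with $f^c(c_i)=1$ for $c_i\in f^{-1}(0)$ and $f^c(c_i)=0$ otherwise. Then, $F(\alpha,n,M,f)=F(\alpha,n,S-M,f^c)$.

###### Proof:

It suffices to express $F$ for $f^c$ in the form (20). Define

$$\begin{align}
\lambda^c_j\triangleq p(f^c=1|c_j) &= \sum_{i=0}^{M-1}\Pr(X^n=c_{t_i}|Y^n=c_j) \tag{31}\\
&= \sum_{i=0}^{M-1}\alpha^{k_{t_i,j}}(1-\alpha)^{n-k_{t_i,j}}. \tag{32}
\end{align}$$

From (31), we notice that $\lambda^c_j$ is exactly the same as the definition of $\lambda_j$. Since $H(p)=H(1-p)$, $f$ and $f^c$ have the same $F$. ∎

The implication of Theorem 5 is that it suffices to focus on $M\le S/2$.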

## V Proof of Lemma 2

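According to (23), the first derivative of $\lambda_j$ with respect to $\alpha$ is

$$\begin{align}
\frac{\partial\lambda_j}{\partial\alpha} &= \sum_{i=0}^{M-1}\frac{\partial\,\alpha^{k_{t_i,j}}(1-\alpha)^{n-k_{t_i,j}}}{\partial\alpha} \tag{33}\\
&= \sum_{i=0}^{M-1}(k_{t_i,j}-n\alpha)\,\alpha^{k_{t_i,j}-1}(1-\alpha)^{n-k_{t_i,j}-1}. \tag{34}
\end{align}$$

Therefore, from (20), the first derivative of $F(\alpha,n,M,f)$ evaluated at $\alpha=\alpha^*$ is

$$\begin{align}
\left.\frac{\partial F(\alpha,n,M,f)}{\partial\alpha}\right|_{\alpha=\alpha^*}
&= \left.\frac{\partial\left\{H(\alpha)-\frac{1}{S}\sum_{j=0}^{S-1}H(\lambda_j)\right\}}{\partial\alpha}\right|_{\alpha=\alpha^*} \tag{35}\\
&= \left.\left(\log\frac{1-\alpha}{\alpha}+\frac{1}{S}\sum_{j=0}^{S-1}\frac{\partial\lambda_j}{\partial\alpha}\log\frac{\lambda_j}{1-\lambda_j}\right)\right|_{\alpha=\alpha^*} \tag{36}\\
&= 0+\frac{1}{S}\left(\sum_{j=0}^{S-1}\left.\frac{\partial\lambda_j}{\partial\alpha}\right|_{\alpha=\alpha^*}\right)\log\frac{\lambda_0}{1-\lambda_0} \tag{37}\\
&= \frac{C}{S}\left(\sum_{j=0}^{S-1}\sum_{i=0}^{M-1}\left(k_{t_i,j}-\frac{1}{2}n\right)\left(\frac{1}{2}\right)^{n-2}\right) \tag{38}\\
&= \frac{4C}{S^2}\left(\sum_{i=0}^{M-1}\sum_{j=0}^{S-1}k_{t_i,j}-\frac{1}{2}nMS\right) \tag{39}\\
&= \frac{4C}{S^2}\left(\sum_{i=0}^{M-1}\sum_{j=0}^{\frac{S}{2}-1}\left(k_{t_i,j}+k_{t_i,S-1-j}\right)-\frac{1}{2}nMS\right) \tag{40}\\
&= \frac{4C}{S^2}\left(\sum_{i=0}^{M-1}\sum_{j=0}^{\frac{S}{2}-1}n-\frac{1}{2}nMS\right) \tag{41}\\
&= 0, \tag{42}
\end{align}$$

where $C\triangleq\log\frac{\lambda_0}{1-\lambda_0}$, and the step from (36) to (37) follows from the fact that $\lambda_j=M/S$ for all $j$ when $\alpha=1/2$.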
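The chain (35) to (42) can be sanity-checked numerically by evaluating (34) and (36) at $\alpha=1/2$ for an arbitrary set of codewords. The sketch below does this with natural logarithms, as in this section; codewords are represented by integer indices and all function names are hypothetical helpers for this illustration.

```python
import math


def hamming(a: int, b: int) -> int:
    """Hamming distance between the binary representations of a and b."""
    return bin(a ^ b).count("1")


def lam(alpha: float, n: int, U, j: int) -> float:
    """lambda_j from (23)."""
    return sum(alpha ** hamming(t, j) * (1.0 - alpha) ** (n - hamming(t, j))
               for t in U)


def dlam(alpha: float, n: int, U, j: int) -> float:
    """First derivative of lambda_j with respect to alpha, from (34)."""
    total = 0.0
    for t in U:
        k = hamming(t, j)
        total += (k - n * alpha) * alpha ** (k - 1) * (1.0 - alpha) ** (n - k - 1)
    return total


def dF(alpha: float, n: int, U) -> float:
    """First derivative of F from (36), valid for 0 < alpha < 1 and 0 < M < S."""
    S = 2 ** n
    value = math.log((1.0 - alpha) / alpha)
    for j in range(S):
        l = lam(alpha, n, U, j)
        value += dlam(alpha, n, U, j) * math.log(l / (1.0 - l)) / S
    return value


if __name__ == "__main__":
    n, U = 4, [0, 3, 5, 6, 9]  # arbitrary example
    print("sum_j dlam_j at 1/2:", sum(dlam(0.5, n, U, j) for j in range(2 ** n)))
    print("dF/dalpha at 1/2:   ", dF(0.5, n, U))
```

Both printed values should vanish up to floating-point rounding, reflecting the identity $\sum_{j}k_{t_i,j}=nS/2$ used between (39) and (41).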

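## VI Proof of Lemma 3

The proof of Lemma 3 proceeds as follows. First, we compute $\left.\frac{\partial^2F(\alpha,n,M,f)}{\partial\alpha^2}\right|_{\alpha=\alpha^*}$ for any $f$. Then we find that, with $M$ fixed, the lex function maximizes this second derivative. The proof concludes by showing that $\left.\frac{\partial^2F(\alpha,n,M,f)}{\partial\alpha^2}\right|_{\alpha=\alpha^*}\le 0$, i.e., the second derivative is nonpositive, when $f$ is lex.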

We first introduce several new definitions which play an important role in proving Lemma 3.

###### Definition 1

(lex function) We define $f$ to be lex when $f^{-1}(0)=\{c_0,c_1,\dots,c_{M-1}\}$, i.e., the first $M$ lexicographically ordered codewords.
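As a small illustration of Definition 1, the sketch below builds the set of codewords that a lex function maps to $0$ and confirms that, for $M=2^{n-1}$, the lex function coincides with the dictator function on the first bit. The string representation of codewords and the helper names are our own conventions for this example.

```python
def codeword(i: int, n: int) -> str:
    """n-bit binary representation of index i (most significant bit first)."""
    return format(i, f"0{n}b")


def lex_set(n: int, M: int):
    """The first M lexicographically ordered n-bit codewords."""
    return [codeword(i, n) for i in range(M)]


def lex_function(x: str, M: int) -> int:
    """Lex function: 0 on the first M codewords, 1 elsewhere."""
    return 0 if int(x, 2) < M else 1


if __name__ == "__main__":
    n = 4
    print(lex_set(n, 3))
    # With M = 2^(n-1), the lex function returns the first bit of its input.
    for i in range(2 ** n):
        x = codeword(i, n)
        assert lex_function(x, 2 ** (n - 1)) == int(x[0])
    print("lex with M = 2^(n-1) equals the dictator on the first bit")
```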

###### Definition 2

(codeword matrix ) Given pair and any function , define the codeword matrix as an matrix in which the -th row is .

###### Definition 3

( ratio) Assume the -th column of has bit ’s and bit ’s, where . Define the ratio of the -th column of as .

###### Definition 4

(ratio spectrum) The ratio spectrum of is defined by a sequence , where denotes the number of columns in that have the ratio, .

###### Definition 5

(lexicographic ordering of ratio spectra) The ratio spectrum is said to be (strictly) lexicographically greater than , denoted by , if and only if for some and for all .

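With these definitions established, we begin the proof of Lemma 3. According to (34) and (36), the second derivatives of $\lambda_j$ and $F(\alpha,n,M,f)$ evaluated at $\alpha=\alpha^*$ are given as follows.

$$\begin{align}
\frac{\partial^2\lambda_j}{\partial\alpha^2} &= \sum_{i=0}^{M-1}\frac{\partial\,(k_{t_i,j}-n\alpha)\,\alpha^{k_{t_i,j}-1}(1-\alpha)^{n-k_{t_i,j}-1}}{\partial\alpha} \tag{43}\\
&= \sum_{i=0}^{M-1}\left[k_{t_i,j}(k_{t_i,j}-1)+2(1-n)k_{t_i,j}\alpha+(n^2-n)\alpha^2\right]\alpha^{k_{t_i,j}-2}(1-\alpha)^{n-k_{t_i,j}-2}. \tag{44}
\end{align}$$

Thus,

$$\left.\frac{\partial^2\lambda_j}{\partial\alpha^2}\right|_{\alpha=\alpha^*}=\frac{16}{S}\sum_{i=0}^{M-1}\left[k_{t_i,j}(k_{t_i,j}-1)+(1-n)k_{t_i,j}+\frac{1}{4}(n^2-n)\right]. \tag{45}$$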

Therefore,

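$$\begin{align}
\left.\frac{\partial^2F(\alpha,n,M,f)}{\partial\alpha^2}\right|_{\alpha=\alpha^*}
&= \left.\frac{\partial}{\partial\alpha}\left(\log\frac{1-\alpha}{\alpha}+\frac{1}{S}\sum_{j=0}^{S-1}\frac{\partial\lambda_j}{\partial\alpha}\log\frac{\lambda_j}{1-\lambda_j}\right)\right|_{\alpha=\alpha^*} \tag{46}\\
&= \left.\left\{-\frac{1}{\alpha(1-\alpha)}+\frac{1}{S}\sum_{j=0}^{S-1}\left[\frac{\partial^2\lambda_j}{\partial\alpha^2}\log\frac{\lambda_j}{1-\lambda_j}+\frac{\left(\frac{\partial\lambda_j}{\partial\alpha}\right)^2}{(1-\lambda_j)\lambda_j}\right]\right\}\right|_{\alpha=\alpha^*} \tag{47}\\
&= -4+\frac{C}{S}\sum_{j=0}^{S-1}\left.\frac{\partial^2\lambda_j}{\partial\alpha^2}\right|_{\alpha=\alpha^*}+\frac{S}{(S-M)M}\sum_{j=0}^{S-1}\left.\left(\frac{\partial\lambda_j}{\partial\alpha}\right)^2\right|_{\alpha=\alpha^*} \tag{48}\\
&= -4+L_1+L_2, \tag{49}
\end{align}$$

where

$$\begin{align}
C &\triangleq \log\frac{\lambda_0}{1-\lambda_0}, \tag{50}\\
L_1 &\triangleq \frac{C}{S}\sum_{j=0}^{S-1}\left.\frac{\partial^2\lambda_j}{\partial\alpha^2}\right|_{\alpha=\alpha^*}, \tag{51}\\
L_2 &\triangleq \frac{S}{(S-M)M}\sum_{j=0}^{S-1}\left.\left(\frac{\partial\lambda_j}{\partial\alpha}\right)^2\right|_{\alpha=\alpha^*}, \tag{52}
\end{align}$$

and the step from (47) to (48) follows from the fact that $\lambda_j=M/S$ for all $j$ when $\alpha=1/2$.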
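The decomposition (49) can be evaluated exactly for small examples by plugging $\alpha=1/2$ into (34) and (45). The sketch below computes $-4+L_1+L_2$ this way, using natural logarithms as in this section and integer indices for codewords; the function name `second_derivative_at_half` and the example sets are assumptions made for illustration only.

```python
import math


def hamming(a: int, b: int) -> int:
    """Hamming distance between the binary representations of a and b."""
    return bin(a ^ b).count("1")


def second_derivative_at_half(n: int, U):
    """Return (-4 + L1 + L2, L1, L2) from (49)-(52) for indices U mapped to 0."""
    S, M = 2 ** n, len(U)
    if not 0 < M < S:
        raise ValueError("requires 0 < M < S")
    C = math.log(M / (S - M))  # (50) with lambda_0 = M/S at alpha = 1/2
    sum_d2, sum_d1_sq = 0.0, 0.0
    for j in range(S):
        ks = [hamming(t, j) for t in U]
        # (34) at alpha = 1/2: each term is (k - n/2) * (1/2)^(n-2) = (k - n/2) * 4/S.
        d1 = (4.0 / S) * sum(k - n / 2 for k in ks)
        # (45): second derivative of lambda_j at alpha = 1/2.
        d2 = (16.0 / S) * sum(k * (k - 1) + (1 - n) * k + (n * n - n) / 4
                              for k in ks)
        sum_d1_sq += d1 * d1
        sum_d2 += d2
    L1 = C / S * sum_d2
    L2 = S / ((S - M) * M) * sum_d1_sq
    return -4.0 + L1 + L2, L1, L2


if __name__ == "__main__":
    examples = {
        "dictator X_1": [0, 1, 2, 3],
        "single codeword": [0],
        "parity (even weight)": [0, 3, 5, 6],
    }
    for name, U in examples.items():
        value, L1, L2 = second_derivative_at_half(3, U)
        print(f"{name:22s} L1={L1:+.4f}  L2={L2:.4f}  second derivative={value:+.4f}")
```

In these $n=3$ examples, the dictator function yields a second derivative of $0$, whereas the other two functions give strictly negative values, in line with the equality condition of Lemma 3.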

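Next, we show that $L_1=0$:

$$\begin{align}
L_1 &= \frac{C}{S}\sum_{j=0}^{S-1}\left.\frac{\partial^2\lambda_j}{\partial\alpha^2}\right|_{\alpha=\alpha^*} \tag{53}\\
&= \frac{16C}{S^2}\sum_{j=0}^{S-1}\sum_{i=0}^{M-1}\left[k_{t_i,j}(k_{t_i,j}-1)+(1-n)k_{t_i,j}+\frac{1}{4}(n^2-n)\right] \tag{54}\\
&= \frac{16C}{S^2}\left[\sum_{i=0}^{M-1}\sum_{j=0}^{S-1}k_{t_i,j}(k_{t_i,j}-n)+\frac{1}{4}(n^2-n)MS\right] \tag{55}\\
&= \frac{16C}{S^2}\left[-\sum_{i=0}^{M-1}\sum_{j=0}^{S-1}k_{t_i,j}(n-k_{t_i,j})+\frac{1}{4}(n^2-n)MS\right] \tag{56}\\
&= \frac{16C}{S^2}\left[-\sum_{i=0}^{M-1}\left(\sum_{k=0}^{n}\binom{n}{k}k(n-k)\right)+\frac{1}{4}(n^2-n)MS\right] \tag{57}\\
&= \frac{16C}{S^2}\left[M\left(\sum_{k=0}^{n}\binom{n}{k}k^2-n\sum_{k=0}^{n}\binom{n}{k}k\right)+\frac{1}{4}(n^2-n)MS\right] \tag{58}\\
&= \frac{16C}{S^2}\left[M\left(n(n+1)2^{n-2}-n^2 2^{n-1}\right)+\frac{1}{4}(n^2-n)MS\right] \tag{59}\\
&= \frac{16C}{S^2}\left[\frac{1}{4}MS(-n^2+n)+\frac{1}{4}(n^2-n)MS\right] \tag{60}\\
&= 0, \tag{61}
\end{align}$$

where the step from (58) to (59) is shown in Appendix A.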
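The two combinatorial ingredients of this derivation can be checked mechanically for small $n$: the step from (56) to (57) relies on exactly $\binom{n}{k}$ codewords lying at Hamming distance $k$ from any fixed codeword, and the step from (58) to (59) uses the identities $\sum_k\binom{n}{k}k=n2^{n-1}$ and $\sum_k\binom{n}{k}k^2=n(n+1)2^{n-2}$. The sketch below verifies both in integer arithmetic; the helper names are ours.

```python
from math import comb


def binomial_identities_hold(n: int) -> bool:
    """Check sum C(n,k) k = n 2^(n-1) and sum C(n,k) k^2 = n(n+1) 2^(n-2)."""
    sum_k = sum(comb(n, k) * k for k in range(n + 1))
    sum_k2 = sum(comb(n, k) * k * k for k in range(n + 1))
    # Multiply through by 2 and 4 so that everything stays an integer for n = 1.
    return 2 * sum_k == n * 2 ** n and 4 * sum_k2 == n * (n + 1) * 2 ** n


def distance_sum(n: int, t: int) -> int:
    """Sum over all n-bit codewords c_j of k (n - k), with k = d(c_t, c_j)."""
    total = 0
    for j in range(2 ** n):
        k = bin(t ^ j).count("1")
        total += k * (n - k)
    return total


if __name__ == "__main__":
    assert all(binomial_identities_hold(n) for n in range(1, 13))
    n, t = 4, 5
    by_weight = sum(comb(n, k) * k * (n - k) for k in range(n + 1))
    assert distance_sum(n, t) == by_weight
    print("identities verified for n = 1..12")
```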

 L2 =S(S−M)MS−1∑j=0(∂λj∂α)2∣∣∣α=α∗ (62) =S(S−M)MS−1∑j=0[M−1∑i=0(kti,j−12n)(12)n−2]2 (63) =4(S−M)MS⎡⎣4S−1<