On the Most Informative Boolean Functions
of the Very Noisy Channel
Let $X^n$ be a sequence of independent and identically distributed Bernoulli(1/2) random variables and let $Y^n$ be the result of passing $X^n$ through a binary symmetric channel (BSC) with crossover probability $\alpha$. For any Boolean function $f:\{0,1\}^n\to\{0,1\}$, Courtade and Kumar postulated that $I(f(X^n);Y^n)\le 1-H(\alpha)$. In this paper, we prove, from a purely mathematical point of view, that the conjecture is true in the high-noise regime by showing that $H(\alpha)-H(f(X^n)\mid Y^n)\le 1-H(f(X^n))$ holds for $\alpha\in[1/2-\delta,1/2+\delta]$, where $\delta>0$ is some universally small constant. We first point out that, in the high-noise regime, a function $f$ is more informative if the second derivative of $H(\alpha)-H(f(X^n)\mid Y^n)$ has a larger value at $\alpha=1/2$. Then we show that the ratio spectrum of $f^{-1}(0)$, an integer sequence that characterizes the structure of $f^{-1}(0)$, is the fundamental metric that determines the second derivative of $H(\alpha)-H(f(X^n)\mid Y^n)$ evaluated at $\alpha=1/2$. With $|f^{-1}(0)|$ fixed, the lex function is a locally most informative function for the very noisy BSC. The dictator function $f(X^n)=X_i$, which for $i=1$ is the lex function with $|f^{-1}(0)|=2^{n-1}$, is the globally most informative function for the very noisy BSC, as it is the only type of function for which the second derivative of $H(\alpha)-H(f(X^n)\mid Y^n)$ evaluated at $\alpha=1/2$ attains its maximum value $0$ over all possible $|f^{-1}(0)|$.
I-A Previous work
In 2013, Courtade and Kumar  postulated the following maximum mutual information conjecture.
Conjecture 1 ()
Let $X^n$ be a sequence of $n$ i.i.d. Bernoulli(1/2) random variables, and let $Y^n$ be the result of passing $X^n$ through a memoryless binary symmetric channel (BSC) with crossover probability $\alpha$. For any Boolean function $f:\{0,1\}^n\to\{0,1\}$, we have
$$I(f(X^n);Y^n)\le 1-H(\alpha). \qquad (1)$$
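For small $n$ the conjectured inequality can be checked by exhaustive enumeration. The following is a minimal sketch, assuming the conjectured bound is the usual $1-H(\alpha)$ with $H$ the binary entropy in bits; the function names (`h2`, `mutual_information`, `dictator`, `majority`, `parity`) are illustrative and not taken from the paper.

```python
from itertools import product
from math import log2


def h2(p):
    """Binary entropy in bits; arguments at or beyond 0 and 1 give 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def mutual_information(f, n, alpha):
    """I(f(X^n); Y^n) for uniform X^n observed through a BSC(alpha)."""
    points = list(product((0, 1), repeat=n))
    zeros = [x for x in points if f(x) == 0]
    cond_entropy = 0.0
    for y in points:  # every y has probability 2^-n
        p0 = 0.0      # P(f(X^n) = 0 | Y^n = y)
        for x in zeros:
            d = sum(a != b for a, b in zip(x, y))  # Hamming distance
            p0 += alpha ** d * (1 - alpha) ** (n - d)
        cond_entropy += h2(p0) / 2 ** n
    return h2(len(zeros) / 2 ** n) - cond_entropy


def dictator(x):
    return x[0]


def majority(x):
    return int(sum(x) > len(x) / 2)


def parity(x):
    return sum(x) % 2


if __name__ == "__main__":
    n, alpha = 3, 0.1
    bound = 1 - h2(alpha)
    for name, f in [("dictator", dictator), ("majority", majority), ("parity", parity)]:
        value = mutual_information(f, n, alpha)
        print(f"{name:9s} I = {value:.6f}   1 - H(alpha) = {bound:.6f}")
```

The enumeration costs on the order of $4^n$ operations, so it is only a sanity check for very small $n$ and says nothing about the general conjecture.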
Although Conjecture 1 remains open, several new results have been obtained in a series of papers [2, 3, 4]. Samorodnitsky proved that the conjecture holds for the high noise regime, where , for some universally small constant , by considering the entropy of the image of under the noise operator with noise parameter .
Theorem 1 ()
There exists an absolute constant such that for any noise with and for any Boolean function , it holds that
Nevertheless, the best known bound that holds universally for all Boolean functions is
$$I(f(X^n);Y^n)\le(1-2\alpha)^2.$$
This bound, derived by Erkip, can be established through various techniques, including an application of Mrs. Gerber’s Lemma, the strong data-processing inequality, and standard Fourier analysis. Unfortunately, this bound is still strictly weaker than that in Conjecture 1.
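The gap between the two bounds can be tabulated numerically. The sketch below assumes the universal bound referred to here is the familiar $(1-2\alpha)^2$ and the conjectured bound is $1-H(\alpha)$; the helper names are illustrative.

```python
from math import log2


def h2(p):
    """Binary entropy in bits; arguments at or beyond 0 and 1 give 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def conjectured_bound(alpha):
    """Right-hand side of the conjecture, 1 - H(alpha)."""
    return 1 - h2(alpha)


def universal_bound(alpha):
    """The (1 - 2*alpha)^2 bound (assumed form of the bound above)."""
    return (1 - 2 * alpha) ** 2


if __name__ == "__main__":
    for k in range(0, 11):
        alpha = k / 20  # grid on [0, 1/2]
        gap = universal_bound(alpha) - conjectured_bound(alpha)
        print(f"alpha = {alpha:.2f}   gap = {gap:.6f}")
```

The gap is nonnegative on the whole grid because $H(\alpha)\ge 4\alpha(1-\alpha)$, and it vanishes only at the endpoints $\alpha=0$ and $\alpha=1/2$.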
In 2016, Ordentlich, Shayevitz, and Weinstein  proved a new upper bound for balanced Boolean function , i.e., any Boolean function that satisfies , which beats in .
Theorem 2 ()
For any balanced Boolean function , and any , we have that
Theorem 3 ()
For any function , we have
and this bound is attained with equality by, e.g., .
I-B Our main contributions
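In this paper, we prove the same result as in Theorem 1 by dealing directly with Conjecture 1 from a purely mathematical point of view. First, we reformulate Conjecture 1 into a form where, given $n$ and $|f^{-1}(0)|$, the left-hand side is a function of $\alpha$ whereas the right-hand side is a constant specified by $n$ and $|f^{-1}(0)|$. Then, we show that the reformulated inequality always holds when $\alpha\in[1/2-\delta,1/2+\delta]$, with $\delta>0$ being some universally small constant, by showing that the first derivative of the left-hand side with respect to $\alpha$ is zero at $\alpha=1/2$ and that its second derivative is nonpositive there.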
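Given $|f^{-1}(0)|$, we say “$f$ is a most informative function” if $f$ is undominated in the high-noise regime, i.e., if the left-hand side for this choice of $f$ is greater than or equal to that for any other choice of $f$ with the same $|f^{-1}(0)|$ near $\alpha=1/2$. Since, for any $f$, the left-hand side is equal to the constant right-hand side at $\alpha=1/2$, and the first derivative of the left-hand side is zero at $\alpha=1/2$ for all $f$, it follows that $f$ is undominated if the second derivative of the left-hand side at $\alpha=1/2$ is maximized, which implies that the most informative functions are the ones that maximize this second derivative.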
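The step from "same value and zero slope at $\alpha=1/2$" to "maximize the second derivative" is a second-order Taylor argument. As a sketch, write $g_f(\alpha)$ for the left-hand side $H(\alpha)-H(f(X^n)\mid Y^n)$; the symbol $g_f$ is shorthand introduced here, not the paper's notation, and $g_f$ is assumed to be twice differentiable near $\alpha=1/2$.

```latex
\begin{align*}
g_f(\alpha)
  &= g_f\!\left(\tfrac12\right)
   + g_f'\!\left(\tfrac12\right)\left(\alpha-\tfrac12\right)
   + \tfrac12\, g_f''\!\left(\tfrac12\right)\left(\alpha-\tfrac12\right)^2
   + o\!\left(\left(\alpha-\tfrac12\right)^2\right) \\
  &= 1 - H\bigl(f(X^n)\bigr)
   + \tfrac12\, g_f''\!\left(\tfrac12\right)\left(\alpha-\tfrac12\right)^2
   + o\!\left(\left(\alpha-\tfrac12\right)^2\right).
\end{align*}
```

For two functions with the same $|f^{-1}(0)|$ the constant term coincides, so the one with the strictly larger $g_f''(1/2)$ has the larger value for all $\alpha$ sufficiently close to, but different from, $1/2$.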
We introduce the ratio spectrum of $f^{-1}(0)$, an integer sequence that characterizes the structure of $f^{-1}(0)$ and is sufficient to determine the second derivative of the left-hand side at $\alpha=1/2$. With $|f^{-1}(0)|$ fixed, the lex function, a Boolean function with the first $|f^{-1}(0)|$ lexicographically ordered codewords mapped to $0$, is a “locally” most informative function for the very noisy BSC, as it maximizes this second derivative over all Boolean functions with the same $|f^{-1}(0)|$. Note that, with $|f^{-1}(0)|$ fixed, the lex function is not the only locally most informative function: any other function that has the same ratio spectrum as the lex function is also locally most informative, as will be shown in Theorem 7. However, the dictator function is the “globally” most informative function for the very noisy BSC, since it is the only type of function for which the second derivative attains its maximum value $0$ over all possible $|f^{-1}(0)|$, i.e., over all possible choices of Boolean functions. Note that the dictator function $f(X^n)=X_1$ is the special case of the lex function with $|f^{-1}(0)|=2^{n-1}$.
This paper is organized as follows. Section II reformulates Conjecture 1 into the form $H(\alpha)-H(f(X^n)\mid Y^n)\le 1-H(f(X^n))$. Section III presents the lemmas needed to prove the reformulated conjecture in the high-noise regime. Sections IV, V, and VI present the proofs of the respective lemmas, and Section VII concludes the paper.
II Reformulation of Conjecture 1
First, noting that inequality (1) is equivalent to
$$H(\alpha)-H(f(X^n)\mid Y^n)\le 1-H(f(X^n)),$$
we reformulate Conjecture 1 as follows.
Let denote the universal set of -bit binary codewords sorted in lexicographical order, where , is the -bit binary representation of index . A Boolean function can be specified as follows:
where with , . For brevity, we denote the above mapping by . Note that when or , (6) degenerates to which always holds, thus it is enough to focus on the case when .
Note that given pair , , is a constant specified by whereas still depends on the choice of . Later we will show that the second derivative of is uniquely determined by the ratio spectrum of , a fundamental quantity that characterizes the structure of codewords in . Thus, Conjecture 1 translates to the following conjecture.
Let . Given pair , , for any function and , we have
Note that it is trivial to show due to the fact that when , and are independent so that . Thus, . Hence, the crucial part is to show .
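For reference, the rearrangement behind the reformulation and the equality at $\alpha=1/2$ can be written out as follows. This is a sketch that assumes the reformulated inequality is the direct rearrangement of the conjecture $I(f(X^n);Y^n)\le 1-H(\alpha)$.

```latex
\begin{align*}
I\bigl(f(X^n);Y^n\bigr) \le 1-H(\alpha)
  &\iff H\bigl(f(X^n)\bigr) - H\bigl(f(X^n)\mid Y^n\bigr) \le 1-H(\alpha) \\
  &\iff H(\alpha) - H\bigl(f(X^n)\mid Y^n\bigr) \le 1 - H\bigl(f(X^n)\bigr). \\[4pt]
\text{At } \alpha=\tfrac12:\quad
  H\!\left(\tfrac12\right) - H\bigl(f(X^n)\mid Y^n\bigr)
  &= 1 - H\bigl(f(X^n)\bigr),
  \quad\text{since } X^n \text{ and } Y^n \text{ are independent.}
\end{align*}
```

Because $X^n$ is uniform, $H(f(X^n))$ is the binary entropy of $|f^{-1}(0)|/2^n$, which is why the right-hand side depends only on $n$ and $|f^{-1}(0)|$.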
An interesting fact is that the dictator function satisfies for any . Let , . Therefore, and , i.e., . In this paper, we will also show that the dictator function is the globally most informative function for the very noisy BSC as it is the only type of functions that maximize the second derivative of evaluated at to over all possible .
As an example of the reformulation, Fig. 1 shows that for and . Meanwhile, Fig. 1 also depicts two typical shapes of : a quasi-concave shape as shown in , and a “single-peak wave” shape as shown in . In fact, we conjecture that these are the only two possible shapes of given . Note that even for the dictator function , is still quasi-concave.
III Main Results
Let . Given pair , , for any function , there exists a universally small constant such that
The remainder of the paper establishes Theorem 4 by proving the following three lemmas.
Given pair , , for any function , is symmetric with respect to .
Given pair , , for any function , we have
Given pair , , for any function , we have
where equality holds if and only if is a dictator function.
In the proof of Lemma 3, we introduce several new concepts such as lex function, ratio spectrum, etc., which play an important role in the proof. We also show that, with fixed, the lex function is a locally most informative function for the very noisy BSC as it maximizes the over the set of Boolean functions with the same . The dictator function is the globally most informative function as it is the only type of functions that maximize to over all possible , i.e., over all possible choices of Boolean functions.
IV Proof of Lemma 1
The expansion of is as follows.
and denotes the Hamming distance between and . Note that when , .
Consider and its symmetric part . Note that and its mirror codeword satisfy , we have
Thus, it follows from (27) that
which implies that is symmetric with respect to .
Similarly, an additional symmetry property for complementary function is presented as follows.
Given pair , , for each function , define its complementary function with and . Then, .
It is equivalent to expressing . Define
From (31), we notice that is exactly the same as the definition of . Thus, and have the same . ∎
The implication of Theorem 5 is that it suffices to focus on .
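Both symmetry properties can be checked numerically for small $n$. The sketch below assumes the quantity in question is $H(\alpha)-H(f(X^n)\mid Y^n)$, that the symmetry of Lemma 1 is about $\alpha=1/2$, and that the complementary function swaps $f^{-1}(0)$ and $f^{-1}(1)$; a function is represented by its zero set, and the names `lhs`, `zeros` and `complement` are illustrative.

```python
from itertools import product
from math import log2


def h2(p):
    """Binary entropy in bits; arguments at or beyond 0 and 1 give 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def lhs(zeros, n, alpha):
    """H(alpha) - H(f(X^n) | Y^n), with f given by its zero set f^{-1}(0)."""
    cond_entropy = 0.0
    for y in product((0, 1), repeat=n):  # every y has probability 2^-n
        p0 = 0.0                         # P(f(X^n) = 0 | Y^n = y)
        for x in zeros:
            d = sum(a != b for a, b in zip(x, y))  # Hamming distance
            p0 += alpha ** d * (1 - alpha) ** (n - d)
        cond_entropy += h2(p0) / 2 ** n
    return h2(alpha) - cond_entropy


n = 3
zeros = [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
complement = [x for x in product((0, 1), repeat=n) if x not in zeros]

if __name__ == "__main__":
    for alpha in (0.05, 0.2, 0.35):
        print(alpha,
              lhs(zeros, n, alpha),        # original function at alpha
              lhs(zeros, n, 1 - alpha),    # same function at 1 - alpha
              lhs(complement, n, alpha))   # complementary function at alpha
```

The three printed values in each row should agree up to floating-point rounding.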
V Proof of Lemma 2
According to (23), the first derivative of with respect to is
VI Proof of Lemma 3
The proof of Lemma 3 proceeds as follows: First we compute for any , then we find that, with fixed, the lex function maximizes . The proof concludes by showing that , i.e., the second derivative is nonpositive, when is lex.
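The sign of the second derivative at $\alpha=1/2$ can be probed numerically with a central finite difference. This is a rough sketch, not the closed-form expression derived in the proof: it assumes the quantity being differentiated is $H(\alpha)-H(f(X^n)\mid Y^n)$, and the step size and all names are illustrative choices.

```python
from itertools import product
from math import log2


def h2(p):
    """Binary entropy in bits; arguments at or beyond 0 and 1 give 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def lhs(zeros, n, alpha):
    """H(alpha) - H(f(X^n) | Y^n), with f given by its zero set f^{-1}(0)."""
    cond_entropy = 0.0
    for y in product((0, 1), repeat=n):
        p0 = 0.0
        for x in zeros:
            d = sum(a != b for a, b in zip(x, y))  # Hamming distance
            p0 += alpha ** d * (1 - alpha) ** (n - d)
        cond_entropy += h2(p0) / 2 ** n
    return h2(alpha) - cond_entropy


def second_derivative_at_half(zeros, n, step=1e-3):
    """Central-difference estimate of the second derivative at alpha = 1/2."""
    return (lhs(zeros, n, 0.5 + step)
            - 2 * lhs(zeros, n, 0.5)
            + lhs(zeros, n, 0.5 - step)) / step ** 2


# Dictator f(x) = x_1 on n = 3 bits: zero set is every codeword with x_1 = 0.
dictator_zeros = [x for x in product((0, 1), repeat=3) if x[0] == 0]
# AND of two bits: zero set is every codeword except (1, 1).
and_zeros = [(0, 0), (0, 1), (1, 0)]

if __name__ == "__main__":
    print("dictator:", second_derivative_at_half(dictator_zeros, 3))
    print("AND     :", second_derivative_at_half(and_zeros, 2))
```

For the dictator the estimate should be zero up to rounding, since the left-hand side vanishes identically in that case, while the AND function gives a clearly negative value.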
We first introduce several new definitions which play an important role in proving Lemma 3.
(lex function) We define to be lex when , i.e., the first lexicographically ordered codewords.
(codeword matrix ) Given pair and any function , define the codeword matrix as an matrix in which the -th row is .
( ratio) Assume the -th column of has bit ’s and bit ’s, where . Define the ratio of the -th column of as .
(ratio spectrum) The ratio spectrum of is defined by a sequence , where denotes the number of columns in that have the ratio, .
(lexicographic ordering of ratio spectra) The ratio spectrum is said to be (strictly) lexicographically greater than , denoted by , if and only if for some and for all .
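The first few definitions can be made concrete for small parameters. The sketch below builds the zero set of a lex function and counts the 0 bits and 1 bits in each column of its codeword matrix. It assumes that codeword $i$ is the $n$-bit binary representation of $i$ with the most significant bit first and that the lex function maps the first $M$ codewords to $0$. The reduction of these column counts to the paper's ratio and ratio spectrum is deliberately not reproduced, since its exact normalization is not recoverable from the text here. All function names are illustrative.

```python
def codewords(n):
    """All n-bit codewords in lexicographic order (index i -> binary form of i)."""
    return [tuple((i >> (n - 1 - k)) & 1 for k in range(n)) for i in range(2 ** n)]


def lex_zero_set(n, m):
    """Zero set of the lex function: the first m codewords (assumed mapped to 0)."""
    return codewords(n)[:m]


def column_counts(zero_set, n):
    """Per column of the M x n codeword matrix: (number of 0 bits, number of 1 bits)."""
    counts = []
    for j in range(n):
        ones = sum(row[j] for row in zero_set)
        counts.append((len(zero_set) - ones, ones))
    return counts


if __name__ == "__main__":
    n, m = 3, 4
    rows = lex_zero_set(n, m)
    for row in rows:
        print(row)
    print(column_counts(rows, n))
```

With $n=3$ and $M=4$ the first column is constant, which reflects the fact that this lex function coincides with the dictator function on the first bit.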