Efficient One-Way Secret-Key Agreement and Private Channel Coding via Polarization
Abstract
We introduce explicit schemes based on the polarization phenomenon for the tasks of one-way secret-key agreement from common randomness and private channel coding. For the former task, we show how to use common randomness and insecure one-way communication to obtain a strongly secure key such that the key construction has a complexity essentially linear in the blocklength and the rate at which the key is produced is optimal, i.e., equal to the one-way secret-key rate. For the latter task, we present a private channel coding scheme that achieves the secrecy capacity under the condition of strong secrecy and whose encoding and decoding complexity are again essentially linear in the blocklength.
I Introduction
Consider two parties, Alice and Bob, connected by an authentic but otherwise fully insecure communication channel. It has been shown that without access to additional resources, it is impossible for them to carry out information-theoretically secure private communication Shannon (1949); Maurer (1993). In particular, they are unable to generate an unconditionally secure key with which to encrypt messages transmitted over the insecure channel. However, if Alice and Bob have access to correlated randomness about which an adversary (Eve) has only partial knowledge, the situation changes completely: information-theoretically secure secret-key agreement and private communication become possible. Alternatively, if Alice and Bob are connected by a noisy discrete memoryless channel (DMC) to which Eve has only limited access—the so-called wiretap channel scenario of Wyner Wyner (1975), Csiszár and Körner Csiszár and Körner (1978), and Maurer Maurer (1993)—private communication is again possible.
In this paper, we present explicit schemes for efficient one-way secret-key agreement from common randomness and for private channel coding in the wiretap channel scenario. Our schemes are based on polar codes, a family of capacity-achieving linear codes introduced by Arıkan Arıkan (2009) that can be encoded and decoded efficiently. Previous work by us in a quantum setup Renes et al. () already implies that practically efficient one-way secret-key agreement and private channel coding in a classical setup are possible, where a practically efficient scheme is one whose computational complexity is essentially linear in the blocklength. The aim of this paper is to explain the schemes in detail and give a purely classical proof that the schemes are reliable, secure, practically efficient, and achieve optimal rates. Section II introduces the problems of one-way secret-key agreement and private channel coding. We summarize known and new results about the optimal rates for these two problems in different wiretap channel scenarios. In Section III, we explain how to obtain one-way secret-key agreement that is practically efficient, strongly secure, reliable, and achieves the one-way secret-key rate. However, we are not able to give an efficient algorithm for code construction. Section IV introduces a similar scheme that can be used for strongly secure private channel coding at the secrecy capacity. Finally, in Section V, we state two open problems that are of interest in the setup of this paper as well as in the quantum mechanical scenario introduced in Renes et al. ().
II Background and Contributions
II.1 Notation and Definitions
Let [k]=\left\{1,\ldots,k\right\} for k\in\mathbb{Z}^{+}. For x\in\mathbb{Z}_{2}^{k} and \mathcal{I}\subseteq[k] we write x[\mathcal{I}]=[x_{i}:i\in\mathcal{I}], x^{i}=[x_{1},\ldots,x_{i}] and x_{j}^{i}=[x_{j},\ldots,x_{i}] for j<i. The set \mathcal{A}^{\mathsf{c}} denotes the complement of the set \mathcal{A}. The uniform distribution on the alphabet of a random variable X is denoted by \overline{P}_{X}. For distributions P and Q over the same alphabet \mathcal{X}, the variational distance is defined by \delta(P,Q):=\tfrac{1}{2}\sum_{x\in\mathcal{X}}\left|P(x)-Q(x)\right|. The notation X - Y - Z means that the random variables X,Y,Z form a Markov chain in the given order.
In this setup we consider a discrete memoryless wiretap channel (DMWTC) \mathsf{W}:\mathcal{X}\to\mathcal{Y}\times\mathcal{Z}, which is characterized by its transition probability distribution P_{Y,Z|X}. We assume that the variable X belongs to Alice, Y to Bob, and Z to Eve.
According to Körner and Marton Körner and Marton (1977), a DMWTC \mathsf{W}:\mathcal{X}\to\mathcal{Y}\times\mathcal{Z} is termed more capable if I(X;Y)\geq I(X;Z) for every possible distribution on X. The channel \mathsf{W} is termed less noisy if I(U;Y)\geq I(U;Z) for every possible distribution on (U,X) where U has finite support and U - X - (Y,Z) form a Markov chain. If X - Y - Z form a Markov chain, \mathsf{W} is called degraded. It has been shown Körner and Marton (1977) that being more capable is a strictly weaker condition than being less noisy, which is in turn a strictly weaker condition than being degraded. Hence, a DMWTC \mathsf{W} which is degraded is also less noisy, which in turn implies that \mathsf{W} is more capable.
II.2 Polarization Phenomenon
Let X^{N} be a vector whose entries are i.i.d. Bernoulli(p) distributed for p\in[0,1] and N=2^{n} where n\in\mathbb{Z}^{+}. Then define U^{N}=G_{N}X^{N}, where G_{N} denotes the polarization (or polar) transform which can be represented by the matrix
G_{N}:=\begin{pmatrix}1&1\\ 0&1\end{pmatrix}^{\!\!\otimes\log N},  (1) 
where A^{\otimes k} denotes the kth Kronecker power of an arbitrary matrix A. Furthermore, let Y^{N}=\mathsf{W}^{N}X^{N}, where \mathsf{W}^{N} denotes N independent uses of a DMC \mathsf{W}:\mathcal{X}\to\mathcal{Y}. For \epsilon\in(0,1) we may define the two sets
\mathcal{R}_{\epsilon}^{N}(X|Y):=\left\{i\in[N]:H(U_{i}|U^{i-1},Y^{N})\geq 1-\epsilon\right\}\quad\textnormal{and}  (2)
\mathcal{D}_{\epsilon}^{N}(X|Y):=\left\{i\in[N]:H(U_{i}|U^{i-1},Y^{N})\leq\epsilon\right\}.  (3)
The former consists of outputs U_{j} which are essentially uniformly random, even given all previous outputs U^{j-1} as well as Y^{N}, while the latter set consists of the essentially deterministic outputs. The polarization phenomenon is that essentially all outputs are in one of these two subsets, and their sizes are given by the conditional entropy of the input X given Y.
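As a concrete numerical companion to (1) (a sketch of ours, not part of the original text; the helper name polar_transform_matrix is illustrative), the transform can be built explicitly and applied to a random input. The check at the end uses the fact that the kernel \begin{pmatrix}1&1\\0&1\end{pmatrix} is its own inverse over GF(2), hence so is G_{N}:

```python
import numpy as np

def polar_transform_matrix(n: int) -> np.ndarray:
    """Return G_N = [[1,1],[0,1]]^{(Kronecker power n)} over GF(2), N = 2^n, as in (1)."""
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        G = np.kron(G, np.array([[1, 1], [0, 1]], dtype=np.uint8))
    return G

n = 3
N = 2 ** n
G = polar_transform_matrix(n)
x = np.random.randint(0, 2, size=N, dtype=np.uint8)
u = G.dot(x) % 2                      # U^N = G_N X^N over GF(2)

# The kernel squares to the identity mod 2, hence so does G_N:
assert np.array_equal(G.dot(G) % 2, np.eye(N, dtype=np.uint8))
assert np.array_equal(G.dot(u) % 2, x)  # X^N is recovered by applying G_N again
```

In particular, invertibility of G_{N} over GF(2) is what makes U^{N} an equivalent representation of X^{N}.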
Theorem 1 (Polarization Phenomenon Arıkan (2009, 2010)).
For any \epsilon\in(0,1)
\left|\mathcal{R}_{\epsilon}^{N}(X|Y)\right|=N\,H(X|Y)-o(N)\quad\textnormal{and}\quad\left|\mathcal{D}_{\epsilon}^{N}(X|Y)\right|=N\left(1-H(X|Y)\right)-o(N).  (4)
Based on this theorem it is possible to construct a family of linear error correcting codes, called polar codes, that have several desirable attributes Arıkan (2009); Sasoglu et al. (2009); Arıkan and Telatar (2009); Honda and Yamamoto (2012): they provably achieve the capacity of any DMC; they have an encoding and decoding complexity that is essentially linear in the blocklength N; the error probability decays exponentially in the square root of the blocklength.
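For the special case where \mathsf{W} is a binary erasure channel BEC(p) with uniform input, the conditional entropies H(U_{i}|U^{i-1},Y^{N}) can be computed exactly: they equal the erasure probabilities of the synthetic channels, which obey the recursion z \mapsto (2z-z^{2},\,z^{2}). The sketch below is our own illustration (the function name synthetic_entropies is ours) and gives a quick numerical check of Theorem 1:

```python
def synthetic_entropies(p: float, n: int) -> list:
    """H(U_i | U^{i-1}, Y^N) for W = BEC(p) with uniform input and N = 2^n.
    These equal the erasure probabilities of the synthetic channels,
    which follow the recursion z -> (2z - z^2, z^2)."""
    zs = [p]
    for _ in range(n):
        zs = [t for z in zs for t in (2 * z - z * z, z * z)]
    return zs

p, n, eps = 0.4, 16, 0.01
h = synthetic_entropies(p, n)
N = len(h)
frac_random = sum(hi >= 1 - eps for hi in h) / N   # tends to H(X|Y) = p
frac_det = sum(hi <= eps for hi in h) / N          # tends to 1 - H(X|Y)
print(frac_random, frac_det)
```

The two printed fractions approach p and 1-p as n grows, in agreement with (4); the remaining unpolarized fraction vanishes with the blocklength.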
Correlated sequences of binary random variables may be polarized using a multilevel construction, as shown in Sasoglu et al. (2009).^{1}
^{1}An alternative approach is given in Abbe (2011); Sahebi and Pradhan (2011), where the polarization phenomenon has been generalized to arbitrary finite fields. We will however focus on the multilevel construction in this paper.
Given M i.i.d. instances of a sequence X=(X_{(1)},X_{(2)},\dots,X_{(K)}) and a possibly correlated random variable Y, the basic idea is to first polarize X_{(1)}^{M} relative to Y^{M}, then treat X_{(1)}^{M} and Y^{M} as side information in polarizing X_{(2)}^{M}, and so on. More precisely, defining U^{M}_{(j)}=G_{M}X^{M}_{(j)} for j=1,\dots,K, we may define the random and deterministic sets for each j as
\mathcal{R}_{\epsilon,(j)}^{M}(X_{(j)}|X_{(j-1)},\ldots,X_{(1)},Y)=\{i\in[M]:H(U_{(j),i}|U_{(j)}^{i-1},X_{(j-1)}^{M},\ldots,X_{(1)}^{M},Y^{M})\geq 1-\epsilon\},  (5)
\mathcal{D}_{\epsilon,(j)}^{M}(X_{(j)}|X_{(j-1)},\ldots,X_{(1)},Y)=\{i\in[M]:H(U_{(j),i}|U_{(j)}^{i-1},X_{(j-1)}^{M},\ldots,X_{(1)}^{M},Y^{M})\leq\epsilon\}.  (6)
In principle we could choose a different \epsilon parameter for each j, but this will not be necessary here. Now, Theorem 1 applies to the random and deterministic sets for every j. The sets \mathcal{R}_{\epsilon}^{M}(X|Y)=\{\mathcal{R}_{\epsilon,(j)}^{M}(X_{(j)}|X_{(j-1)},\ldots,X_{(1)},Y)\}_{j=1}^{K} and \mathcal{D}_{\epsilon}^{M}(X|Y)=\{\mathcal{D}_{\epsilon,(j)}^{M}(X_{(j)}|X_{(j-1)},\ldots,X_{(1)},Y)\}_{j=1}^{K} have sizes given by
\left|\mathcal{R}_{\epsilon}^{M}(X|Y)\right|=\sum_{j=1}^{K}\left|\mathcal{R}_{\epsilon,(j)}^{M}(X_{(j)}|X_{(j-1)},\ldots,X_{(1)},Y)\right|  (7)
=\sum_{j=1}^{K}M\,H(X_{(j)}|X_{(1)},\dots,X_{(j-1)},Y)-o(M)  (8)
=M\,H(X|Y)-o(KM),  (9)
and
\left|\mathcal{D}_{\epsilon}^{M}(X|Y)\right|=\sum_{j=1}^{K}\left|\mathcal{D}_{\epsilon,(j)}^{M}(X_{(j)}|X_{(j-1)},\ldots,X_{(1)},Y)\right|  (10)
=\sum_{j=1}^{K}M\left(1-H(X_{(j)}|X_{(1)},\dots,X_{(j-1)},Y)\right)-o(M)  (11)
=M\left(K-H(X|Y)\right)-o(KM).  (12)
In the following we will make use of both the polarization phenomenon in its original form, Theorem 1, and the multilevel extension. To simplify the presentation, we denote by \widetilde{G}_{M}^{K} the K parallel applications of G_{M} to the K random variables X^{M}_{(j)}.
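A minimal sketch of the operator \widetilde{G}_{M}^{K} (our own illustrative code, using the explicit matrix form of G_{M} for clarity rather than the fast butterfly implementation):

```python
import numpy as np

def G_matrix(m: int) -> np.ndarray:
    """G_M = [[1,1],[0,1]]^{(Kronecker power m)} over GF(2), M = 2^m."""
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(m):
        G = np.kron(G, np.array([[1, 1], [0, 1]], dtype=np.uint8))
    return G

def multilevel_transform(T: np.ndarray) -> np.ndarray:
    """widetilde{G}_M^K: apply G_M separately to each of the K rows T_(j)^M."""
    K, M = T.shape
    GM = G_matrix(int(np.log2(M)))
    return (T @ GM.T) % 2             # row j becomes G_M T_(j)^M

K, M = 3, 8
T = np.random.randint(0, 2, size=(K, M), dtype=np.uint8)
U = multilevel_transform(T)
assert np.array_equal(multilevel_transform(U), T)   # G_M is self-inverse mod 2
```

Since the K applications are independent, they can be carried out in parallel; the total cost with the butterfly implementation of each G_{M} is O(KM\log M).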
II.3 One-Way Secret-Key Agreement
At the start of the one-way secret-key agreement protocol, Alice, Bob, and Eve share N=2^{n} (n\in\mathbb{Z}^{+}) i.i.d. copies (X^{N},Y^{N},Z^{N}) of a triple of correlated random variables (X,Y,Z) which take values in discrete but otherwise arbitrary alphabets \mathcal{X}, \mathcal{Y}, \mathcal{Z}.^{2}
^{2}The correlation of the random variables (X,Y,Z) is described by their joint probability distribution P_{X,Y,Z}.
Alice starts the protocol by performing an operation \tau_{A}:\mathcal{X}^{N}\rightarrow(\mathcal{S}^{J},\mathcal{C}) on X^{N} which outputs both her secret key S_{A}^{J}\in\mathcal{S}^{J}, for \mathcal{S}=\{0,1\}, and an additional random variable C\in\mathcal{C}, which she transmits to Bob over an insecure but noiseless public channel. Bob then performs an operation \tau_{B}:(\mathcal{Y}^{N},\mathcal{C})\rightarrow\mathcal{S}^{J} on Y^{N} and the information C he received from Alice to obtain a vector S_{B}^{J}\in\mathcal{S}^{J}, his secret key. The secret key thus produced should be reliable, i.e., satisfy the
\textnormal{reliability condition:}\quad\lim\limits_{N\to\infty}\,{\rm Pr}\!\left[S_{A}^{J}\neq S_{B}^{J}\right]=0,  (13)
and secure, i.e., satisfy the
\textnormal{(strong) secrecy condition:}\quad\lim_{N\rightarrow\infty}\left\lVert P_{S_{A}^{J},Z^{N},C}-\overline{P}_{S_{A}^{J}}\times P_{Z^{N},C}\right\rVert_{1}=0,  (14)
where \overline{P}_{X} denotes the uniform distribution on the alphabet of the random variable X.
Historically, secrecy was first characterized by a (weak) secrecy condition of the form
\lim\limits_{N\to\infty}\frac{1}{N}I\!\left({S_{A}^{J}};{Z^{N},C}\right)=0.  (15) 
Maurer and Wolf showed that (15) is not a sufficient secrecy criterion Maurer (1994); Maurer and Wolf (2000) and introduced the strong secrecy condition
\lim\limits_{N\to\infty}I\!\left({S_{A}^{J}};{Z^{N},C}\right)=0,  (16) 
where in addition it is required that the key is uniformly distributed, i.e.,
\lim\limits_{N\to\infty}\delta\!\left({P_{S_{A}^{J}}},{\overline{P}_{S_{A}^{J}% }}\right)=0.  (17) 
In recent years, the strong secrecy condition (16) has often been replaced by (14), since (half) the L_{1} distance directly bounds the probability of distinguishing the actual key produced by the protocol from an ideal key. This operational interpretation is particularly helpful in the finite blocklength regime. In the limit N\to\infty, the two secrecy conditions (14) and (16) are equivalent, which can be shown using Pinsker's and Fano's inequalities.
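One direction of this equivalence can be made explicit (a sketch of ours, not taken from the original text; the reverse direction follows from a Fano-type continuity bound on the entropy). Expanding the relative entropy and applying Pinsker's inequality gives

```latex
% Decomposition of the relative entropy underlying (14):
D\!\left(P_{S_{A}^{J},Z^{N},C}\,\middle\|\,\overline{P}_{S_{A}^{J}}\times P_{Z^{N},C}\right)
  = I\!\left(S_{A}^{J};Z^{N},C\right)
  + D\!\left(P_{S_{A}^{J}}\,\middle\|\,\overline{P}_{S_{A}^{J}}\right),
% and Pinsker's inequality then yields
\left\lVert P_{S_{A}^{J},Z^{N},C}-\overline{P}_{S_{A}^{J}}\times P_{Z^{N},C}\right\rVert_{1}
  \leq\sqrt{2\ln 2\left(I\!\left(S_{A}^{J};Z^{N},C\right)
  + D\!\left(P_{S_{A}^{J}}\,\middle\|\,\overline{P}_{S_{A}^{J}}\right)\right)}.
```

Thus (16), together with uniformity of the key measured in relative entropy (a slightly stronger requirement than the variational-distance condition (17)), implies (14).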
Since weak secrecy is not sufficient, we will only consider strong secrecy in this paper. It has been proven that every secret-key agreement protocol achieving weak secrecy can be transformed into a strongly secure protocol Maurer and Wolf (2000). However, it is not clear whether the resulting protocol is guaranteed to be practically efficient.
For one-way communication, Csiszár and Körner Csiszár and Körner (1978) and later Ahlswede and Csiszár Ahlswede and Csiszár (1993) showed that the optimal rate R:=\lim_{N\rightarrow\infty}\frac{J}{N} of generating a secret key satisfying (13) and (16), called the one-way secret-key rate S_{\to}(X;Y|Z), is characterized by a closed single-letter formula.
Theorem 2 (Csiszár and Körner (1978); Ahlswede and Csiszár (1993)).
For triples (X,Y,Z) described by P_{X,Y,Z} as explained above,
S_{\to}(X;Y|Z)=\left\{\begin{array}{rl}\max\limits_{P_{U,V}}&H(U|Z,V)-H(U|Y,V)\\ \mathrm{s.t.}&V - U - X - (Y,Z)\textnormal{ is a Markov chain},\\ &|\mathcal{V}|\leq|\mathcal{X}|,\ |\mathcal{U}|\leq|\mathcal{X}|^{2}.\end{array}\right.  (18)
The expression for the oneway secretkey rate given in Theorem 2 can be simplified if one makes additional assumptions about P_{X,Y,Z}.
Corollary 3.
For P_{X,Y,Z} such that the induced DMWTC \mathsf{W} described by P_{Y,Z|X} is more capable,
S_{\to}(X;Y|Z)=\left\{\begin{array}{rl}\max\limits_{P_{V}}&H(X|Z,V)-H(X|Y,V)\\ \mathrm{s.t.}&V - X - (Y,Z)\textnormal{ is a Markov chain},\\ &|\mathcal{V}|\leq|\mathcal{X}|.\end{array}\right.  (19)
Proof.
In terms of the mutual information, we have
H(U|Z,V)-H(U|Y,V)=I(U;Y|V)-I(U;Z|V)  (20)
=I(X,U;Y|V)-I(X,U;Z|V)-\left(I(X;Y|U,V)-I(X;Z|U,V)\right)  (21)
\leq I(X,U;Y|V)-I(X,U;Z|V)  (22)
=I(X;Y|V)-I(X;Z|V),  (23)
using the chain rule, the more capable condition, and the Markov chain properties, respectively. Thus, the maximum in S_{\to}(X;YZ) can be achieved when omitting U. ∎
Corollary 4.
For P_{X,Y,Z} such that the induced DMWTC \mathsf{W} described by P_{Y,Z|X} is less noisy,
S_{\to}(X;Y|Z)=H(X|Z)-H(X|Y).  (24)
Proof.
Since \mathsf{W} being less noisy implies \mathsf{W} being more capable, we know that the one-way secret-key rate is given by (19). Using the chain rule we obtain
H(X|Z,V)-H(X|Y,V)=I(X;Y|V)-I(X;Z|V)  (25)
=I(X,V;Y)-I(X,V;Z)-I(V;Y)+I(V;Z)  (26)
=I(X;Y)-I(X;Z)-\left(I(V;Y)-I(V;Z)\right)  (27)
\leq I(X;Y)-I(X;Z).  (28)
Equation (27) follows from the chain rule and the Markov chain condition. The inequality uses the assumption of being less noisy. ∎
Note that (24) is also equal to the one-way secret-key rate for the case where \mathsf{W} is degraded, as this implies \mathsf{W} being less noisy. The proof of Theorem 2 does not imply that there exists an efficient one-way secret-key agreement protocol. A computationally efficient scheme was constructed in Holenstein and Renner (2005), but it is not known to be practically efficient.^{3}
^{3}As defined in Section I, we call a scheme practically efficient if its computational complexity is essentially linear in the blocklength.
For key agreement with two-way communication, no formula comparable to (18) for the optimal rate is known. However, it has been shown that the two-way secret-key rate can be strictly larger than the one-way secret-key rate. It is also known that the intrinsic information I(X;Y\!\downarrow\!Z):=\min_{P_{Z'|Z}}I(X;Y|Z') is an upper bound on the two-way secret-key rate S(X;Y|Z), but is not tight Ahlswede and Csiszár (1993); Maurer and Wolf (1999); Renner and Wolf (2003).
II.4 Private Channel Coding
Private channel coding over a wiretap channel is closely related to the task of one-way secret-key agreement from common randomness (cf. Section II.5). Here Alice would like to transmit a message M^{J}\in\mathcal{M}^{J} privately to Bob. The messages may be distributed according to an arbitrary distribution P_{M^{J}}. To do so, she first encodes the message by computing X^{N}=\operatorname{enc}(M^{J}) for some encoding function \operatorname{enc}:\mathcal{M}^{J}\to\mathcal{X}^{N} and then sends X^{N} over the wiretap channel to Bob (and to Eve), which is represented by (Y^{N},Z^{N})=\mathsf{W}^{N}X^{N}. Bob next decodes the received message to obtain a guess for Alice's message, \hat{M}^{J}=\operatorname{dec}(Y^{N}), for some decoding function \operatorname{dec}:\mathcal{Y}^{N}\to\mathcal{M}^{J}. As in secret-key agreement, the private channel coding scheme should be reliable, i.e., satisfy the
\textnormal{reliability condition:}\quad\lim\limits_{J\to\infty}\,{\rm Pr}\!\left[M^{J}\neq\hat{M}^{J}\right]=0  (29)
and (strongly) secure, i.e., satisfy the
\textnormal{(strong) secrecy condition:}\quad\lim_{J\rightarrow\infty}\left\lVert P_{M^{J},Z^{N},C}-P_{M^{J}}\times P_{Z^{N},C}\right\rVert_{1}=0.  (30)
The variable C denotes any additional information made public by the protocol.
As mentioned in Section II.3, in the limit J\to\infty this strong secrecy condition is equivalent to the historically older (strong) secrecy condition
\lim\limits_{J\to\infty}I\!\left({M^{J}};{Z^{N},C}\right)=0.  (31) 
The highest achievable rate R:=\lim_{N\rightarrow\infty}\frac{J}{N} fulfilling (29) and (30) is called the secrecy capacity.
Csiszár and Körner showed (Csiszár and Körner, 1978, Corollary 2) that there exists a single-letter formula for the secrecy capacity.^{4}
^{4}Maurer and Wolf showed that the single-letter formula remains valid under strong secrecy Maurer and Wolf (2000).
Theorem 5 (Csiszár and Körner (1978)).
For an arbitrary DMWTC \mathsf{W} as introduced above,
C_{s}=\left\{\begin{array}{rl}\max\limits_{P_{V,X}}&H(V|Z)-H(V|Y)\\ \mathrm{s.t.}&V - X - (Y,Z)\textnormal{ is a Markov chain},\\ &|\mathcal{V}|\leq|\mathcal{X}|.\end{array}\right.  (32)
This expression can be simplified using additional assumptions about \mathsf{W}.
Corollary 6 (Körner and Marton (1977)).
If \mathsf{W} is more capable,
C_{s}=H(X|Z)-H(X|Y).  (33)
II.5 Previous Work and Our Contributions
In Section III, we present a one-way secret-key agreement scheme based on polar codes that achieves the secret-key rate, is strongly secure and reliable, and whose implementation is practically efficient, with complexity O(N\log N) for blocklength N. Our protocol improves on previous efficient secret-key constructions Abbe (2012), where only weak secrecy could be proven and where the eavesdropper is assumed to have no prior knowledge and/or degradability assumptions are required. However, we are not able to give an efficient algorithm for code construction.
In Section IV, we introduce a coding scheme based on polar codes that provably achieves the secrecy capacity for arbitrary discrete memoryless wiretap channels. We show that the complexity of the encoding and decoding operations is O(N\log N) for blocklength N. Our scheme improves on previous work on practically efficient private channel coding at the optimal rate Mahdavifar and Vardy (2011), where only weak secrecy could be proven under the additional assumption that the channel \mathsf{W} is degraded.^{5} Recently, Bellare et al. introduced an efficient coding scheme that is strongly secure and achieves the secrecy capacity for binary symmetric wiretap channels Bellare et al. (2012).^{6} Several other constructions of private channel coding schemes have been reported Andersson et al. (2010); Hof and Shamai (2010); Koyluoglu and El Gamal (2010), but all achieve only weak secrecy.
^{5}Note that Mahdavifar and Vardy showed that their scheme achieves strong secrecy if the channel to Eve (induced from \mathsf{W}) is noiseless. Otherwise their scheme is not provably reliable Mahdavifar and Vardy (2011).
^{6}They claim that their scheme works for a large class of wiretap channels. However, this class has not been characterized precisely so far. It is therefore not clear whether their scheme requires, for example, degradability assumptions. Note that to obtain strong secrecy for an arbitrarily distributed message, it is required that the wiretap channel is symmetric (Bellare et al., 2012, Lemma 14).
The tasks of one-way secret-key agreement and private channel coding explained in the previous two subsections are closely related. Maurer showed how one-way secret-key agreement can be derived from a private channel coding scheme Maurer (1993). More precisely, he showed how to obtain the common randomness needed for one-way secret-key agreement by constructing a “virtual” degraded wiretap channel from Alice to Bob. This approach can be used to obtain the one-way secret-key rate from the secrecy capacity result in the wiretap channel scenario (El Gamal and Kim, 2012, Section 22.4.3).
One of the main advantages of the two schemes introduced in this paper is that they are both practically efficient. However, even given a practically efficient private channel coding scheme, it is not known whether Maurer's construction yields a practically efficient scheme for secret-key agreement. For this reason, as well as simplicity of presentation, we treat the one-way secret-key agreement and private channel coding problems separately in the two sections to follow.
III One-Way Secret-Key Agreement Scheme
Our key agreement protocol is a concatenation of two subprotocols, an inner and an outer layer, as depicted in Figure 1. The protocol operates on blocks of N i.i.d. triples (X,Y,Z), which are divided into M subblocks of size L for input to the inner layer. In the following we assume \mathcal{X}=\{0,1\}, which however is only for convenience; the techniques of Sasoglu et al. (2009) and Karzand and Telatar (2010) can be used to generalize the schemes to discrete memoryless wiretap channels with arbitrary input size.
The task of the inner layer is to perform information reconciliation and that of the outer layer is to perform privacy amplification. Information reconciliation refers to the process of carrying out error correction to ensure that Alice and Bob obtain a shared bit string, and here we only allow communication from Alice to Bob for this purpose. On the other hand, privacy amplification refers to the process of distilling from Alice’s and Bob’s shared bit string a smaller set of bits whose correlation with the information available to Eve is below a desired threshold.
Each subprotocol in our scheme is based on the polarization phenomenon. For information reconciliation of Alice's random variable X^{L} relative to Bob's information Y^{L}, Alice applies a polar transformation to X^{L} and forwards the bits in the complement of the deterministic set \mathcal{D}_{\epsilon_{1}}^{L}(X|Y) to Bob over an insecure public channel, which enables him to recover X^{L} using the standard polar decoder Arıkan (2009). Her remaining information is then fed into a multilevel polar transformation and the bits of the random set are kept as the secret key.
Let us now define the protocol more precisely. For L=2^{\ell}, \ell\in\mathbb{Z}^{+}, let V^{L}=G_{L}X^{L} where G_{L} is as defined in (1). For \epsilon_{1}>0, we define
\mathcal{E}_{K}:=\mathcal{D}_{\epsilon_{1}}^{L}(X|Y),  (34)
with K:=\left|\mathcal{D}_{\epsilon_{1}}^{L}(X|Y)\right|. Then, let T_{(j)}=V^{L}[\mathcal{E}_{K}]_{j} for j=1,\dots,K and C_{(j)}=V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]_{j} for j=1,\dots,L-K so that T=(T_{(1)},\dots,T_{(K)}) and C=(C_{(1)},\dots,C_{(L-K)}). For \epsilon_{2}>0 and U^{M}_{(j)}=G_{M}T_{(j)}^{M} for j=1,\dots,K (or, more briefly, U^{M}=\widetilde{G}_{M}^{K}T^{M}), we define
\mathcal{F}_{J}:=\mathcal{R}_{\epsilon_{2}}^{M}(T|C,Z^{L}),  (35)
with J:=\left|\mathcal{R}_{\epsilon_{2}}^{M}(T|C,Z^{L})\right|.

Protocol 1: Oneway secretkey agreement 
Given:  Index sets \mathcal{E}_{K} and \mathcal{F}_{J} (code construction) 
Notation:  Alice’s input: x^{N}\in\mathbb{Z}_{2}^{N} (a realization of X^{N}) 
Bob’s / Eve’s input: (y^{N},z^{N}) (realizations of Y^{N} and Z^{N})  
Alice’s output: s_{A}^{J}  
Bob’s output: s_{B}^{J}  
Step 1:  Alice computes v_{i+1}^{i+L}=G_{L}x_{i+1}^{i+L} for all i\in\{0,L,2L,\ldots,(M-1)L\}. 
Step 2:  Alice computes t_{i}=v_{i+1}^{i+L}[\mathcal{E}_{K}] for all i\in\{0,L,2L,\ldots,(M-1)L\}. 
Step 3:  Alice sends c_{i}=v_{i+1}^{i+L}[\mathcal{E}_{K}^{\mathsf{c}}] for all i\in\{0,L,2L,\ldots,(M-1)L\} over a public channel to Bob. 
Step 4:  Alice computes u^{M}=\widetilde{G}_{M}^{K}t^{M} and obtains s_{A}^{J}=u^{M}[\mathcal{F}_{J}]. 
Step 5:  Bob applies the standard polar decoder Arıkan (2009); Honda and Yamamoto (2012) to (c_{i},y_{i+1}^{i+L}) to obtain \hat{v}_{i+1}^{i+L} and \hat{t}_{i}=\hat{v}_{i+1}^{i+L}[\mathcal{E}_{K}] for all i\in\{0,L,2L,\ldots,(M-1)L\}. 
Step 6:  Bob computes \hat{u}^{M}=\widetilde{G}_{M}^{K}\hat{t}^{M} and obtains s_{B}^{J}=\hat{u}^{M}[\mathcal{F}_{J}]. 

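To make the data flow of Steps 1-4 concrete, here is a toy sketch of Alice's side in Python (our own code, not the paper's implementation: the index sets \mathcal{E}_{K} and \mathcal{F}_{J} are assumed to be given by the code construction, \mathcal{F}_{J} is passed per level as F_J[j] = \mathcal{F}_{(j)}, and Bob's successive-cancellation decoding of Steps 5-6 is omitted):

```python
import numpy as np

def polar_transform(x: np.ndarray) -> np.ndarray:
    """u = G_L x over GF(2) via log L butterfly stages (O(L log L))."""
    u = x.copy()
    L = len(u)
    step = 1
    while step < L:
        for i in range(0, L, 2 * step):
            u[i:i + step] ^= u[i + step:i + 2 * step]
        step *= 2
    return u

def alice(x: np.ndarray, L: int, E_K: list, F_J: list):
    """Toy sketch of Alice's Steps 1-4 (index sets assumed given).

    x:   her N = M*L observed bits;
    E_K: indices kept as T within each sub-block;
    F_J: per-level index sets F_(j), one list per level j = 1..K.
    Returns the key s_A and the public message C."""
    M = len(x) // L
    Ec = sorted(set(range(L)) - set(E_K))
    # Step 1: polar-transform each of the M sub-blocks of length L
    V = np.array([polar_transform(x[i * L:(i + 1) * L]) for i in range(M)])
    T = V[:, sorted(E_K)]             # Step 2: kept bits, shape (M, K)
    C = V[:, Ec]                      # Step 3: sent publicly to Bob
    # Step 4: apply G_M to each level, then keep the random-set bits
    U = [polar_transform(T[:, j]) for j in range(T.shape[1])]
    s_A = np.concatenate([U[j][F_J[j]] for j in range(len(F_J))])
    return s_A, C

x = np.random.randint(0, 2, 16, dtype=np.uint8)   # N = 16: M = 4 sub-blocks, L = 4
s_A, C = alice(x, L=4, E_K=[2, 3], F_J=[[0], [0, 3]])
```

The index sets used here are arbitrary placeholders; in the protocol they come from the code construction of Section III.3.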
III.1 Rate, Reliability, Secrecy, and Efficiency
Theorem 7.
Protocol 1 allows Alice and Bob to generate secret keys S_{A}^{J} and S_{B}^{J}, respectively, using public one-way communication C^{M}, such that for \beta<\tfrac{1}{2}:
Reliability:  \,{\rm Pr}\!\left[S_{A}^{J}\neq S_{B}^{J}\right]=O\!\left(M2^{-L^{\beta}}\right)  (36)
Secrecy:  \left\lVert P_{S_{A}^{J},Z^{N},C}-\overline{P}_{S_{A}^{J}}\times P_{Z^{N},C}\right\rVert_{1}=O\!\left(\sqrt{N}2^{-\frac{N^{\beta}}{2}}\right)  (37)
Rate:  R:=\frac{J}{N}=H(X|Z)-\frac{1}{L}H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Z^{L})-\frac{o(N)}{N}.  (38)
All operations by both parties may be performed in O(N\log N) steps.
Proof.
The reliability of Alice's and Bob's key follows from the standard polar decoder error probability and the union bound. Each instance of the decoding algorithm employed by Bob has an error probability which scales as O(2^{-L^{\beta}}) for \beta<\tfrac{1}{2} Arıkan (2010); application of the union bound gives the prefactor M.
To prove the secrecy statement requires more effort. Using Pinsker’s inequality we obtain
\delta\!\left(P_{S_{A}^{J},Z^{N},C^{M}},\overline{P}_{S_{A}^{J}}\times P_{Z^{N},C^{M}}\right)\leq\sqrt{\tfrac{\ln 2}{2}D\!\left(P_{S_{A}^{J},Z^{N},C^{M}}\,\middle\|\,\overline{P}_{S_{A}^{J}}\times P_{Z^{N},C^{M}}\right)}  (39)
=\sqrt{\tfrac{\ln 2}{2}\left(J-H(S_{A}^{J}|Z^{N},C^{M})\right)},  (40)
where the last step uses the chain rule for relative entropies and that \overline{P}_{S_{A}^{J}} denotes the uniform distribution. We can simplify the conditional entropy expression using the chain rule
H(S_{A}^{J}|Z^{N},C^{M})
=H\left(U^{M}[\mathcal{F}_{J}]\,\middle|\,Z^{N},(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])^{M}\right)  (41)
=\sum_{j=1}^{K}H\left(U_{(j)}^{M}[\mathcal{F}_{(j)}]\,\middle|\,U_{(1)}^{M}[\mathcal{F}_{(1)}],\ldots,U_{(j-1)}^{M}[\mathcal{F}_{(j-1)}],Z^{N},(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])^{M}\right)  (42)
=\sum_{j=1}^{K}\sum_{i=1}^{|\mathcal{F}_{(j)}|}H\left(U_{(j)}^{M}[\mathcal{F}_{(j)}]_{i}\,\middle|\,U_{(j)}^{M}[\mathcal{F}_{(j)}]^{i-1},U_{(1)}^{M}[\mathcal{F}_{(1)}],\ldots,U_{(j-1)}^{M}[\mathcal{F}_{(j-1)}],Z^{N},(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])^{M}\right)  (43)
\geq\sum_{j=1}^{K}\sum_{i\in\mathcal{F}_{(j)}}H\left(U_{(j),i}\,\middle|\,U_{(j)}^{i-1},U_{(1)}^{M}[\mathcal{F}_{(1)}],\ldots,U_{(j-1)}^{M}[\mathcal{F}_{(j-1)}],Z^{N},(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])^{M}\right)  (44)
\geq J\left(1-\epsilon_{2}\right),  (45)
where the first inequality uses the fact that conditioning reduces entropy and the second inequality follows from the definition of \mathcal{F}_{J}. Recall that we are using the notation introduced in Section II.2. For \mathcal{F}_{J} as defined in (35), we have \mathcal{F}_{J}=\left\{\mathcal{F}_{(j)}\right\}_{j=1}^{K} where \mathcal{F}_{(j)}=\mathcal{R}_{\epsilon_{2}}^{M}(T_{(j)}|T_{(j-1)},\ldots,T_{(1)},C,Z^{L}). The polarization phenomenon, Theorem 1, implies J=O(N), which together with (40) proves the secrecy statement of Theorem 7, since \epsilon_{2}=O(2^{-N^{\beta}}) for \beta<\tfrac{1}{2}.
The rate of the scheme is
R=\frac{\left|\mathcal{F}_{J}\right|}{N}  (46)
=\frac{1}{L}H(V^{L}[\mathcal{E}_{K}]|V^{L}[\mathcal{E}_{K}^{\mathsf{c}}],Z^{L})-\frac{o(N)}{N}  (47)
=\frac{1}{L}\left(H(V^{L}|Z^{L})-H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Z^{L})\right)-\frac{o(N)}{N}  (48)
=H(X|Z)-\frac{1}{L}H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Z^{L})-\frac{o(N)}{N},  (49)
where (47) uses the polarization phenomenon stated in Theorem 1.
It remains to show that the computational complexity of the scheme is O(N\log N). Alice performs the operation G_{L} in the first layer M times, each requiring O(L\log L) steps Arıkan (2009). In the second layer she performs \tilde{G}_{M}^{K}, or K parallel instances of G_{M}, requiring O(KM\log M) total steps. From the polarization phenomenon, we have K=O(L), and thus the complexity of Alice’s operations is not worse than O(N\log N). Bob runs M standard polar decoders which can be done in O(ML\log L) complexity Arıkan (2009); Honda and Yamamoto (2012). Bob next performs the polar transform \widetilde{G}_{M}^{K}, whose complexity is not worse than O(N\log N) as justified above. Thus, the complexity of Bob’s operations is also not worse than O(N\log N). ∎
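The O(N\log N) count for applying G_{N} rests on the factorization G_{2N}=(G_{2}\otimes I_{N})(I_{2}\otimes G_{N}), which yields \log N butterfly stages of O(N) XOR operations each. A sketch of this iterative implementation (our own code, checked against the matrix form (1) for small N):

```python
import numpy as np

def polar_transform_fast(x: np.ndarray) -> np.ndarray:
    """Compute G_N x over GF(2) in O(N log N), exploiting
    G_{2N} = (G_2 kron I_N)(I_2 kron G_N): log N butterfly stages."""
    u = x.copy()
    N = len(u)
    step = 1
    while step < N:                           # log N stages ...
        for i in range(0, N, 2 * step):
            u[i:i + step] ^= u[i + step:i + 2 * step]   # ... of O(N) XORs each
        step *= 2
    return u

# Consistency check against the explicit Kronecker-power matrix (1):
G = np.array([[1]], dtype=np.uint8)
for _ in range(3):
    G = np.kron(G, np.array([[1, 1], [0, 1]], dtype=np.uint8))
x = np.random.randint(0, 2, 8, dtype=np.uint8)
assert np.array_equal(polar_transform_fast(x), G.dot(x) % 2)
```

The same routine serves for G_{L}, G_{M}, and (applied level by level) for \widetilde{G}_{M}^{K}.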
In principle, the two parameters L and M can be chosen freely. However, to maintain the reliability of the scheme (cf. (36)), M must not grow exponentially fast in L. A reasonable choice is to have both parameters scale comparably, i.e., \frac{M}{L}=O(1).
Corollary 8.
The rate of Protocol 1 given in Theorem 7 can be bounded as
R\geq\max\left\{0,\,H(X|Z)-H(X|Y)-\frac{o(N)}{N}\right\}.  (50) 
Proof.
According to (49) the rate of Protocol 1 is
\displaystyle R  \displaystyle=H(X|Z)-\frac{1}{L}H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Z^{L})-\frac{o(N)}{N}  (51)  
\displaystyle\geq\max\left\{0,\,H(X|Z)-\frac{|\mathcal{E}_{K}^{\mathsf{c}}|}{L}-\frac{o(N)}{N}\right\}  (52)  
\displaystyle=\max\left\{0,\,H(X|Z)-H(X|Y)-\frac{o(N)}{N}\right\},  (53) 
where (53) uses the polarization phenomenon stated in Theorem 1. ∎
III.2 Achieving the SecretKey Rate
Theorem 7 together with Corollaries 4 and 8 immediately implies that Protocol 1 achieves the secret-key rate S_{\to}(X;Y|Z) if P_{X,Y,Z} is such that the induced DM WTC \mathsf{W} is less noisy. If we can solve the optimization problem (18), i.e., find the optimal auxiliary random variables V and U, our one-way secret-key agreement scheme can achieve S_{\to}(X;Y|Z) for a general setup: we make V public, replace X by U, and run Protocol 1. Note that finding the optimal random variables V and U might be difficult, although it has been shown that for certain distributions they can be found analytically Holenstein and Renner (2005).
Two open problems discussed in Section V address the question of whether Protocol 1 can achieve a rate strictly larger than \max\{0,H(X|Z)-H(X|Y)\} if nothing about the optimal auxiliary random variables V and U is known, i.e., if we run the protocol directly for X without making V public.
III.3 Code Construction
Before the protocol starts one must construct the code, i.e., compute the index sets \mathcal{E}_{K} and \mathcal{F}_{J}. The set \mathcal{E}_{K} can be computed approximately with a linear-time algorithm introduced in Tal et al. (2012), given the distributions P_{X} and P_{Y|X}. Alternatively, Tal and Vardy’s older algorithm Tal and Vardy (2011) and its adaptation to the asymmetric setup Honda and Yamamoto (2012) can be used.
To compute the outer index set \mathcal{F}_{J} even approximately requires more effort. In principle, we can again use the above algorithms, which require a description of the “supersource” seen by the outer layer, i.e., the source which outputs the triple of random variables (V^{L}[\mathcal{E}_{K}],(Y^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]),(Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])). However, its alphabet size is exponential in L, and thus such a direct approach will not be efficient in the overall blocklength N. Nonetheless, due to the structure of the inner layer, it may be possible to extend the method of approximation by limiting the alphabet size Tal and Vardy (2011); Tal et al. (2012) to this case. In particular, a recursive construction motivated by the decoding operation introduced in Renes et al. () could potentially lead to an efficient computation of the index set \mathcal{F}_{J}.
IV Private Channel Coding Scheme
Our private channel coding scheme is a simple modification of the secret-key agreement protocol of the previous section. Again it consists of two layers, an inner layer which ensures transmitted messages can be reliably decoded by the intended receiver, and an outer layer which guarantees privacy from the unintended receiver. The basic idea is to simply run the key agreement scheme in reverse, inputting messages to the protocol where secret key bits would be output in key agreement. The immediate problem in doing so is that key agreement also produces outputs besides the secret key, so the procedure is not immediately reversible. To overcome this problem, the encoding operations here simulate the random variables output in the key agreement protocol, and then perform the polar transformations \widetilde{G}_{M}^{K} and G_{L} in reverse (as it happens, G_{L} is its own inverse).
The scheme is visualized in Figure 2 and described in detail in Protocol 2. Not explicitly shown is the simulation of the bits U^{M}[\mathcal{F}_{J}^{\mathsf{c}}] at the outer layer and the bits V^{L}[\mathcal{E}^{\mathsf{c}}_{K}] at the inner layer. The outer layer, whose simulated bits are nearly deterministic, makes use of the method described in (Sutter et al., 2012, Definition 1), while the inner layer, whose bits are nearly uniformly distributed, follows (Honda and Yamamoto, 2012, Section IV). Both proceed by successively sampling from the individual bit distributions given all previous values in the particular block, i.e., constructing V_{j} by sampling from P_{V_{j}|V^{j-1}}. These distributions can be efficiently constructed, as described in Section IV.3.
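The successive sampling step can be sketched as follows. This is a toy illustration under the assumption that the successive conditionals P_{V_j|V^{j-1}} are available as exact probabilities (in the actual protocol they are computed recursively by the polar decoding machinery); the function names cond_prob_one and sample_successively are our own.

```python
import random

def cond_prob_one(joint, prefix):
    """P(V_j = 1 | V^{j-1} = prefix), computed from a joint pmf over bit tuples."""
    j = len(prefix)
    den = sum(p for v, p in joint.items() if v[:j] == prefix)
    num = sum(p for v, p in joint.items() if v[:j] == prefix and v[j] == 1)
    return num / den if den > 0 else 0.5  # arbitrary choice on zero-probability prefixes

def sample_successively(joint, n, rng):
    """Simulate V^n bit by bit from its successive conditionals P_{V_j|V^{j-1}}."""
    prefix = ()
    for _ in range(n):
        p1 = cond_prob_one(joint, prefix)
        prefix += ((1,) if rng.random() < p1 else (0,))
    return prefix

# Toy source: two perfectly correlated bits.
joint = {(0, 0): 0.5, (1, 1): 0.5}
assert cond_prob_one(joint, ()) == 0.5    # first bit is uniform
assert cond_prob_one(joint, (1,)) == 1.0  # second bit is determined by the first
assert sample_successively(joint, 2, random.Random(0)) in joint
```

Every sample drawn this way has positive probability under the joint distribution, which is exactly the property the simulation of the nearly deterministic and nearly uniform bits relies on.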
Note that a public channel is used to communicate the information reconciliation information to Bob, enabling reliable decoding. However, it is possible to dispense with the public channel and still achieve the same rate and efficiency properties, as will be discussed in Section IV.3.
In the following we assume that the message M^{J} to be transmitted is uniformly distributed over the message set \mathcal{M}=\left\{0,1\right\}^{J}. As mentioned in Section II.4, it may be desirable to have a private coding scheme that works for an arbitrarily distributed message. This can be achieved by assuming that the wiretap channel \mathsf{W} is symmetric, more precisely that the two channels \mathsf{W}_{1}:\mathcal{X}\to\mathcal{Y} and \mathsf{W}_{2}:\mathcal{X}\to\mathcal{Z} induced by \mathsf{W} are symmetric. We can define a superchannel \mathsf{W}^{\prime}:\mathcal{T}\to\mathcal{Y}^{L}\times\mathcal{Z}^{L}\times\mathcal{C} which consists of an inner encoding block and L basic channels \mathsf{W}; this superchannel is explained in more detail in Section V.2. The superchannel \mathsf{W}^{\prime} again induces two channels \mathsf{W}_{1}^{\prime}:\mathcal{T}\to\mathcal{Y}^{L}\times\mathcal{C} and \mathsf{W}_{2}^{\prime}:\mathcal{T}\to\mathcal{Z}^{L}\times\mathcal{C}. Arıkan showed that if \mathsf{W}_{1} (respectively \mathsf{W}_{2}) is symmetric, then \mathsf{W}_{1}^{\prime} (respectively \mathsf{W}_{2}^{\prime}) is symmetric (Arıkan, 2009, Proposition 13). It has been shown in (Mahdavifar and Vardy, 2011, Proposition 3) that for symmetric channels polar codes remain reliable for an arbitrary distribution of the message bits. We thus conclude that if \mathsf{W}_{1} is assumed to be symmetric, our coding scheme remains reliable for arbitrarily distributed messages. Similarly, assuming that \mathsf{W}_{2} is symmetric implies that \mathsf{W}_{2}^{\prime} is symmetric, which proves that our scheme is strongly secure for arbitrarily distributed messages; this can be seen directly from the strong secrecy condition given in (30).

Protocol 2: Private channel coding 
Given:  index sets \mathcal{E}_{K} and \mathcal{F}_{J} (code construction). 
Notation:  message to be transmitted: m^{J}. 
Outer encoding:  Let u^{M}[\mathcal{F}_{J}]=m^{J} and u^{M}[\mathcal{F}_{J}^{\mathsf{c}}]=r^{KM-J}, where r^{KM-J} is (randomly) generated as explained in (Sutter et al., 2012, Definition 1). Let t^{M}=\widetilde{G}_{M}^{K}u^{M}. 
Inner encoding:  For all i\in\{0,L,\ldots,L(M-1)\}, Alice does the following: let \bar{v}_{i+1}^{i+L}[\mathcal{E}_{K}]=t_{(i/L)+1} and \bar{v}_{i+1}^{i+L}[\mathcal{E}_{K}^{\mathsf{c}}]=s_{i+1}^{i+L-K}, where s_{i+1}^{i+L-K} is (randomly) generated as explained in (Honda and Yamamoto, 2012, Section IV). Send C_{(i/L)+1}:=s_{i+1}^{i+L-K} over a public channel to Bob. Finally, compute x_{i+1}^{i+L}=G_{L}\bar{v}_{i+1}^{i+L}. 
Transmission:  (y^{N},z^{N})=\mathsf{W}^{N}x^{N}. 
Inner decoding:  Bob uses the standard decoder Arıkan (2009); Honda and Yamamoto (2012) with inputs C_{(i/L)+1} and y_{i+1}^{i+L} to obtain \hat{v}_{i+1}^{i+L}, and hence \hat{t}_{(i/L)+1}=\hat{v}_{i+1}^{i+L}[\mathcal{E}_{K}], for each i\in\{0,L,\ldots,L(M-1)\}. 
Outer decoding:  Bob computes \hat{u}^{M}=\widetilde{G}_{M}^{K}\hat{t}^{M} and outputs a guess for the sent message \hat{m}^{J}=\hat{u}^{M}[\mathcal{F}_{J}]. 
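As a sanity check of the inner layer's structure, the following sketch encodes one inner block as in Protocol 2 and "decodes" it over a hypothetical noiseless channel, where Bob can simply invert G_{L} instead of running the successive cancellation decoder. The index set, the frozen bits s (taken as given rather than sampled), and the helper names are our own illustrative choices.

```python
def polar_transform(u):
    """x = G_L u over GF(2), computed by the O(L log L) butterfly recursion."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    top = [u[2 * i] ^ u[2 * i + 1] for i in range(half)]
    bot = [u[2 * i + 1] for i in range(half)]
    return polar_transform(top) + polar_transform(bot)

def inner_encode(t, s, E, L):
    """Place information bits t on E and frozen bits s on E^c, then apply G_L."""
    Ec = [j for j in range(L) if j not in E]
    v = [0] * L
    for j, b in zip(E, t):
        v[j] = b
    for j, b in zip(Ec, s):
        v[j] = b
    return polar_transform(v)  # x^L = G_L v^L

# One block with L = 8 and K = 3 information positions (E chosen arbitrarily here).
E = [3, 5, 7]
t, s = [1, 0, 1], [0, 1, 1, 0, 0]
x = inner_encode(t, s, E, 8)

# Noiseless channel: y = x. Since G_L is its own inverse, Bob recovers v directly.
v_hat = polar_transform(x)
assert [v_hat[j] for j in E] == t
```

Over the actual noisy channel \mathsf{W}, inverting G_{L} is of course not available to Bob; this is precisely where the successive cancellation decoder with side information C is needed.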

IV.1 Rate, Reliability, Secrecy, and Efficiency
Corollary 9.
For any \beta<\tfrac{1}{2}, Protocol 2 satisfies
Reliability:  \displaystyle\,{\rm Pr}\!\left[M^{J}\neq\hat{M}^{J}\right]=O\!\left(M2^{-L^{\beta}}\right)  (54)  
Secrecy:  \displaystyle\left\lVert P_{M^{J},Z^{N},C}-\overline{P}_{M^{J}}\times P_{Z^{N},C}\right\rVert_{1}=O\!\left(\sqrt{N}2^{-\frac{N^{\beta}}{2}}\right)  (55)  
Rate:  \displaystyle R=H(X|Z)-\frac{1}{L}H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Z^{L})-\frac{o(N)}{N}  (56) 
and its computational complexity is O(N\log N).
Proof.
Recall that the idea of the private channel coding scheme is to run Protocol 1 backwards. Since Protocol 2 simulates the nearly deterministic bits U^{M}[\mathcal{F}_{J}^{\mathsf{c}}] at the outer encoder as described in (Sutter et al., 2012, Definition 1) and the almost uniform bits V^{L}[\mathcal{E}_{K}^{\mathsf{c}}] at the inner encoder as explained in (Honda and Yamamoto, 2012, Section IV), for large values of L and M the private channel coding scheme approximates the one-way secret-key agreement setup arbitrarily well, i.e., \lim_{N\to\infty}\delta(P_{T^{M}},P_{(V^{L}[\mathcal{E}_{K}])^{M}})=0 and \lim_{L\to\infty}\delta(P_{X^{L}},P_{\hat{X}^{L}})=0, where P_{X^{L}} denotes the distribution of the vector X^{L} sent over the wiretap channel \mathsf{W} and P_{\hat{X}^{L}} denotes the distribution of Alice’s random variable \hat{X}^{L} in the one-way secret-key agreement setup. We thus can use the decoder introduced in Arıkan (2010) to decode the inner layer. Since we are using M identical independent inner decoding blocks, by the union bound we obtain the desired reliability condition. The secrecy and rate statements are immediate consequences of Theorem 7.
∎
As mentioned after Theorem 7, to ensure reliability of the protocol, M may not grow exponentially fast in L.
Corollary 10.
The rate of Protocol 2 given in Corollary 9 can be bounded as
R\geq\max\left\{0,\,H(X|Z)-H(X|Y)-\frac{o(N)}{N}\right\}.  (57) 
Proof.
The proof is identical to the proof of Corollary 8. ∎
IV.2 Achieving the Secrecy Capacity
Corollaries 6 and 10 immediately imply that our private channel coding scheme achieves the secrecy capacity for the setup where \mathsf{W} is more capable. If we can find the optimal auxiliary random variable V in (32), Protocol 2 can achieve the secrecy capacity for a general wiretap channel scenario. We define a superchannel \overline{\mathsf{W}}:\mathcal{V}\to\mathcal{Y}\times\mathcal{Z} which includes the random variable X and the wiretap channel \mathsf{W}. The superchannel \overline{\mathsf{W}} is characterized by its transition probability distribution P_{Y,Z|V}, where V is the optimal random variable solving (32). The private channel coding scheme is then applied to the superchannel, achieving the secrecy capacity. Note that finding the optimal random variable V might be difficult.
In Section V, we discuss the question of whether Protocol 2 can achieve a rate strictly larger than \max\{0,H(X|Z)-H(X|Y)\} if nothing about the optimal auxiliary random variable V is known.
IV.3 Code Construction & Public Channel Communication
Before the private channel coding scheme can start, the code must be constructed; that is, the index sets \mathcal{E}_{K} and \mathcal{F}_{J} as defined in (34) and (35) need to be computed. This can be done as explained in Section III.3. The code construction fixes the input distribution P_{X} to the wiretap channel, which should be chosen to maximize the scheme’s rate given in (56).
We next explain how the communication C^{M}\in\mathcal{C}^{M} from Alice to Bob can be reduced such that it does not affect the rate, i.e., we show that we can choose |\mathcal{C}|=o(L). Recall that we defined the index set \mathcal{E}_{K}:=\mathcal{D}_{\epsilon_{1}}^{L}(X|Y) in (34). Let \mathcal{G}:=\mathcal{R}_{\epsilon_{1}}^{L}(X|Y), using the notation introduced in (2), and \mathcal{I}:=[L]\backslash(\mathcal{E}_{K}\cup\mathcal{G})=\mathcal{E}_{K}^{\mathsf{c}}\backslash\mathcal{G}. As explained in Section II.2, \mathcal{G} consists of the outputs V_{j} which are essentially uniformly random, even given all previous outputs V^{j-1} as well as Y^{L}, where V^{L}=G_{L}X^{L}. The index set \mathcal{I} consists of the outputs V_{j} which are neither essentially uniformly random nor essentially deterministic given V^{j-1} and Y^{L}. The polarization phenomenon stated in Theorem 1 ensures that this set is small, i.e., that |\mathcal{I}|=o(L). Since the bits in \mathcal{G} are almost uniformly distributed, we can fix these bits independently of the message, as part of the code construction, without affecting the reliability of the scheme for large blocklengths; recall that we choose \epsilon_{1}=O(2^{-L^{\beta}}) for \beta<\tfrac{1}{2}, so that for L\to\infty the index set \mathcal{G} contains only uniformly distributed bits. We thus only need to communicate the bits belonging to the index set \mathcal{I}.
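The three-way split of [L] into \mathcal{E}_{K}, \mathcal{G}, and \mathcal{I} can be sketched by thresholding the per-bit conditional entropies h_{j}=H(V_{j}|V^{j-1},Y^{L}). This assumes those entropies are already available, e.g., from the code-construction algorithms cited in Section III.3; the function name split_indices and the toy entropy values are our own.

```python
def split_indices(h, eps):
    """Partition positions by conditional bit entropy h[j] = H(V_j | V^{j-1}, Y^L):
    E: essentially deterministic (h_j <= eps),
    G: essentially uniformly random (h_j >= 1 - eps),
    I: the unpolarized remainder, of size o(L) by the polarization phenomenon."""
    E = [j for j, hj in enumerate(h) if hj <= eps]
    G = [j for j, hj in enumerate(h) if hj >= 1 - eps]
    I = [j for j in range(len(h)) if eps < h[j] < 1 - eps]
    return E, G, I

# Toy entropy profile for L = 8; in practice eps = O(2^{-L^beta}).
h = [0.001, 0.999, 0.4, 0.0, 1.0, 0.97, 0.02, 0.6]
E, G, I = split_indices(h, eps=0.05)
assert (E, G, I) == ([0, 3, 6], [1, 4, 5], [2, 7])
```

Only the positions in I would have to be communicated to Bob; the bits in G are fixed once and for all as part of the code.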
We can send the bits belonging to \mathcal{I} over a separate public noiseless channel. Alternatively, we could send them over the wiretap channel \mathsf{W} that we are using for private channel coding. However, since \mathsf{W} is assumed to be noisy and it is essential that the bits in \mathcal{I} are received by Bob without any errors, we need to protect them using an error-correcting code. To preserve the essentially linear computational complexity of our scheme, the code needs to have a practically efficient encoder and decoder. Since |\mathcal{I}|=o(L), we can use any error-correcting code with a non-vanishing rate. For symmetric binary DMCs, polar coding can be used to reliably transmit an arbitrarily distributed message (Mahdavifar and Vardy, 2011, Proposition 3). We can therefore symmetrize our wiretap channel \mathsf{W} and use polar codes to transmit the bits in \mathcal{I}; the symmetrization reduces the rate of the channel, but this does not matter since we only need a non-vanishing rate.
As the reliability of the scheme is the average over the possible assignments of the random bits belonging to \mathcal{I} (or even \mathcal{E}_{K}^{\mathsf{c}}), at least one choice must be as good as the average, meaning a reliable, efficient, and deterministic scheme must exist. However, it might be computationally hard to find this choice.
V Discussion
In this section, we describe two open problems, both of which address the question of whether rates beyond \max\{0,H(X|Z)-H(X|Y)\} can be achieved by our key agreement scheme, even if the optimal auxiliary random variables V and U are not given, i.e., if we run Protocol 1 directly for X (instead of U) without making V public. It may even be possible that the key agreement scheme achieves the optimal rate; no result to our knowledge implies otherwise. The two questions could also be formulated in the private coding scenario, asking whether rates beyond \max\{0,\max_{P_{X}}H(X|Z)-H(X|Y)\} are possible, but as positive answers in the former context imply positive answers in the latter, we shall restrict attention to the key agreement scenario for simplicity.
V.1 Polarization with Bob’s or Eve’s Side Information
Question 1.
An equivalent formulation of this question is whether inequality \eqref{eq:tight} is always tight for large enough N, i.e.,
Question 1’.
Using the polarization phenomenon stated in Theorem 1 we obtain
\lim\limits_{L\to\infty}\frac{1}{L}\left|\mathcal{E}_{K}^{\mathsf{c}}\right|=H(X|Y),  (60) 
which together with (59) would imply that R>\max\{0,H(X|Z)-H(X|Y)\} for N\to\infty is possible. Relation (59) can only be satisfied if the high-entropy set with respect to Bob’s side information, i.e., the set \mathcal{E}^{\mathsf{c}}_{K}, is not always a high-entropy set with respect to Eve’s side information. Thus, the question of rates in the key agreement protocol is closely related to fundamental structural properties of the polarization phenomenon.
For less noisy channels \mathsf{W} defined by P_{Y,Z|X} (cf. Section II.1), these questions can be answered in the negative. In this case we have H(X^{L}|Z^{L})\geq H(X^{L}|Y^{L}), and since V^{L}[\mathcal{E}_{K}^{\mathsf{c}}] is a deterministic function of X^{L},
\lim\limits_{L\to\infty}\frac{1}{L}H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Z^{L})\geq\lim\limits_{L\to\infty}\frac{1}{L}H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Y^{L})=\lim\limits_{L\to\infty}\frac{1}{L}\left|\mathcal{E}_{K}^{\mathsf{c}}\right|.  (61) 
Thus, (59) cannot hold. The final equality can be justified as follows. Recall that we defined \mathcal{E}_{K}:=\mathcal{D}_{\epsilon_{1}}^{L}(X|Y) in (34). Let \mathcal{H}_{L-K}:=\mathcal{R}_{\epsilon_{1}}^{L}(X|Y) and \mathcal{I}:=[L]\backslash(\mathcal{E}_{K}\cup\mathcal{H}_{L-K}) such that \mathcal{E}_{K}^{\mathsf{c}}=\mathcal{H}_{L-K}\cup\mathcal{I}. Recall that we can choose \epsilon_{1}=O(2^{-L^{\beta_{1}}}) for \beta_{1}<\frac{1}{2}. Using the chain rule and the polarization phenomenon given in Theorem 1, we obtain
\displaystyle\lim\limits_{L\to\infty}\frac{1}{L}H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Y^{L})  \displaystyle=\lim\limits_{L\to\infty}\frac{1}{L}\sum_{i\in\mathcal{E}_{K}^{\mathsf{c}}}H(V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]_{i}|V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]^{i-1},Y^{L})  (62)  
\displaystyle\geq\lim\limits_{L\to\infty}\frac{1}{L}\left(\left(1-\epsilon_{1}\right)\left|\mathcal{H}_{L-K}\right|+\epsilon_{1}\left|\mathcal{I}\right|\right)  (63)  
\displaystyle=\lim\limits_{L\to\infty}\frac{1}{L}\left|\mathcal{E}_{K}^{\mathsf{c}}\right|.  (64) 
Using the upper bound on the entropy in terms of the alphabet size, we conclude that equality holds in (61). The fact that (59) is not possible when \mathsf{W} is less noisy accords with the one-way secret-key rate formula given in (24), which excludes rates beyond \max\{0,H(X|Z)-H(X|Y)\}.
If the answer to Question 1, or equivalently to Question 1’, is “yes”, this would give some new insights into the problem of finding the optimal auxiliary random variables U,V in (18) (and V in (32)), which may be hard in general.
Furthermore, a positive answer to Question 1 would imply that we can send quantum information reliably over a quantum channel at a rate beyond the coherent information, using the scheme introduced in Renes et al. (). Since the best known achievable rate for a wide class of quantum channels is the coherent information, our scheme would improve this bound. It would also be of interest to know by how much we can outperform the coherent information; since there exist many good converse bounds for sending quantum information reliably over an arbitrary quantum channel Bennett et al. (1996); Smith and Smolin (2008); Smith et al. (2008), it would be interesting to see how closely they can be met.
V.2 Approximately Less Noisy SuperChannel
To state the second open problem, consider the supersource which outputs the triple of random variables (V^{L}[\mathcal{E}_{K}],(Y^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]),(Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])). For instance, Figure 1 consists of two such supersources. The supersource implicitly defines a superchannel \mathsf{W}^{\prime} via the conditional probability distribution of the latter two random variables given the first. Then we have
Proposition 11.
For sufficiently large L, the channel \mathsf{W}^{\prime} is approximately less noisy, irrespective of \mathsf{W}.
Proof.
Using the chain rule we can write
\displaystyle H(V^{L}[\mathcal{E}_{K}]|V^{L}[\mathcal{E}_{K}^{\mathsf{c}}],Y^{L})  \displaystyle=\sum_{i\in\mathcal{E}_{K}}H(V^{L}[\mathcal{E}_{K}]_{i}|V^{L}[\mathcal{E}_{K}]^{i-1},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}],Y^{L})  (65)  
\displaystyle\leq\sum_{i\in\mathcal{E}_{K}}H(V_{i}|V^{i-1},Y^{L})  (66)  
\displaystyle\leq K\epsilon_{1},  (67) 
where the last inequality follows by the definition of the set \mathcal{E}_{K}. Recall that we can choose \epsilon_{1}=O(2^{-L^{\beta}}) for \beta<\tfrac{1}{2}. The polarization phenomenon stated in Theorem 1 ensures that K=O(L). Hence, we can apply the following Lemma 12, which proves the assertion. ∎
Lemma 12.
If U - X - (Y,Z) form a Markov chain in the given order and H(X|Y)\leq\epsilon for some \epsilon\geq 0, then H(U|Y)\leq H(U|Z)+\epsilon for all possible distributions of (U,X).
Proof.
Using the chain rule and the nonnegativity of the entropy we can write
\displaystyle H(U|Y)  \displaystyle\leq H(U|Y)+H(X|Y,U)  (68)  
\displaystyle=H(U,X|Y)  (69)  
\displaystyle=H(X|Y)+H(U|X,Y)  (70)  
\displaystyle\leq\epsilon+H(U|X)  (71)  
\displaystyle\leq\epsilon+H(U|Z).  (72) 
Inequality (71) follows by assumption and since conditioning reduces entropy. The final inequality uses the data processing inequality. ∎
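Lemma 12 can be checked numerically on a small example. The following is an illustrative sketch with a hand-picked binary Markov chain U - X - (Y,Z); the crossover parameters and the helper cond_entropy are our own choices, not taken from the paper.

```python
from math import log2
from itertools import product

def cond_entropy(joint, a_idx, b_idx):
    """H(A|B) for a joint pmf over tuples; a_idx/b_idx select the coordinates of A and B."""
    pb, pab = {}, {}
    for v, p in joint.items():
        a = tuple(v[i] for i in a_idx)
        b = tuple(v[i] for i in b_idx)
        pb[b] = pb.get(b, 0.0) + p
        pab[(a, b)] = pab.get((a, b), 0.0) + p
    return -sum(p * log2(p / pb[b]) for (a, b), p in pab.items() if p > 0)

# Markov chain U - X - (Y, Z): P(u,x,y,z) = P(x) P(u|x) P(y|x) P(z|x).
joint = {}
for u, x, y, z in product((0, 1), repeat=4):
    px = 0.5
    pu = 0.9 if u == x else 0.1    # U is a noisy copy of X
    py = 0.95 if y == x else 0.05  # Bob's observation is close to X
    pz = 0.7 if z == x else 0.3    # Eve's observation is noisier
    joint[(u, x, y, z)] = px * pu * py * pz

# Coordinates: u = 0, x = 1, y = 2, z = 3.
eps = cond_entropy(joint, (1,), (2,))  # eps = H(X|Y)
assert cond_entropy(joint, (0,), (2,)) <= cond_entropy(joint, (0,), (3,)) + eps + 1e-12
```

Since (68)-(72) hold for any joint distribution with this Markov structure, the final assertion passes for any choice of the crossover parameters above.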
Proposition 11 and Lemma 12 imply that the DM WTC \mathsf{W}^{\prime} induced by the supersource described above is almost less noisy. More precisely, for \beta<\tfrac{1}{2} and \xi=O(L2^{-L^{\beta}}) we have
H(T|V^{L}[\mathcal{E}_{K}^{\mathsf{c}}],Y^{L})\leq H(T|V^{L}[\mathcal{E}_{K}^{\mathsf{c}}],Z^{L})+\xi,  (73) 
for all possible distributions of T, where T - V^{L}[\mathcal{E}_{K}] - ((Y^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]),(Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])) forms a Markov chain and \left|\mathcal{T}\right|\leq K. Following the proof of Corollary 4, using (73) in (28), we obtain the one-way secret-key rate of the supersource as
\displaystyle\frac{1}{L}S_{\to}(V^{L}[\mathcal{E}_{K}];Y^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])  
\displaystyle =\frac{1}{L}\left(H(V^{L}[\mathcal{E}_{K}]|Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])-H(V^{L}[\mathcal{E}_{K}]|Y^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])+\xi\right)  (74)  
\displaystyle =\frac{1}{L}H(V^{L}[\mathcal{E}_{K}]|Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])-\frac{o(N)}{N}  (75)  
\displaystyle =R.  (76) 
The second equality follows by the definition of the set \mathcal{E}_{K}, and (76) follows from (47). We thus conclude that the one-way secret-key agreement scheme introduced in Section III always achieves the one-way secret-key rate of the supersource defined above. This raises the question of when the supersource has the same key rate as the original source, i.e., how much is lost in the first layer of our key agreement scheme.
Question 2.
If \tfrac{1}{L}S_{\to}(V^{L}[\mathcal{E}_{K}];Y^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}]|Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])=S_{\to}(X;Y|Z), then Protocol 1 achieves the one-way secret-key rate without knowing anything about the optimal auxiliary random variables V and U. If \mathsf{W} is less noisy, Corollary 4 ensures that this equality must be satisfied. For other scenarios Question 2 is currently unsolved.
For the setup of private channel coding, following the proof of Corollary 6 using (73) shows that the secrecy capacity of the superchannel \mathsf{W}^{\prime} is
\displaystyle C_{s}(\mathsf{W}^{\prime})  \displaystyle=\frac{1}{L}\left(H(V^{L}[\mathcal{E}_{K}]|Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])-H(V^{L}[\mathcal{E}_{K}]|Y^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])+\xi\right)  (77)  
\displaystyle=\frac{1}{L}H(V^{L}[\mathcal{E}_{K}]|Z^{L},V^{L}[\mathcal{E}_{K}^{\mathsf{c}}])-\frac{o(N)}{N}  (78)  
\displaystyle=R.  (79) 
The scheme introduced in Protocol 2 hence achieves the secrecy capacity of the channel \mathsf{W}^{\prime} irrespective of the channel \mathsf{W}. This raises the question of when the superchannel and the original channel have the same secrecy capacity.
If C_{s}(\mathsf{W}^{\prime})=C_{s}(\mathsf{W}) holds, then Protocol 2 achieves the secrecy capacity of \mathsf{W} without having knowledge about the optimal auxiliary random variable V. If \mathsf{W} is more capable, then according to Corollary 6, C_{s}(\mathsf{W}^{\prime})=C_{s}(\mathsf{W}) must hold. For other channels, Question 2’ has not yet been resolved.
V.3 Conclusion
We have constructed practically efficient protocols (with complexity essentially linear in the blocklength) for oneway secretkey agreement from correlated randomness and for private channel coding over discrete memoryless wiretap channels. Each protocol achieves the corresponding optimal rate. Compared to previous methods, we do not require any degradability assumptions and achieve strong (rather than weak) secrecy.
Our scheme is formulated for arbitrary discrete memoryless wiretap channels. Using ideas of Şaşoğlu et al. Sasoglu et al. (2009), the two protocols presented in this paper can also be used for wiretap channels with continuous input alphabets.
Acknowledgments
The authors would like to thank Alexander Vardy for useful discussions. This work was supported by the Swiss National Science Foundation (through the National Centre of Competence in Research ‘Quantum Science and Technology’ and grant No. 200020135048) and by the European Research Council (grant No. 258932).
References
 Shannon (1949) Claude E. Shannon, “Communication theory of secrecy systems,” Bell System Technical Journal 28, 656–715 (1949).
 Maurer (1993) Ueli Maurer, “Secret key agreement by public discussion from common information,” IEEE Transactions on Information Theory 39, 733 –742 (1993).
 Wyner (1975) Aaron D. Wyner, “The wiretap channel,” Bell System Technical Journal 54, 1355–1387 (1975).
 Csiszár and Körner (1978) Imre Csiszár and János Körner, “Broadcast channels with confidential messages,” IEEE Transactions on Information Theory 24, 339 – 348 (1978).
 Arıkan (2009) Erdal Arıkan, “Channel polarization: A method for constructing capacityachieving codes for symmetric binaryinput memoryless channels,” IEEE Transactions on Information Theory 55, 3051 –3073 (2009).
 (6) Joseph M. Renes, David Sutter, Frédéric Dupuis, and Renato Renner, “Efficient quantum channel coding scheme requiring no preshared entanglement,” In preparation.
 Körner and Marton (1977) János Körner and Katalin Marton, “Comparison of two noisy channels,” in Topics in Information Theory, Colloquia Mathematica Societatis, edited by János Bolyai (The Netherlands: NorthHolland, 1977) pp. 411–424.
 Arıkan (2010) Erdal Arıkan, “Source polarization,” Proceedings IEEE International Symposium on Information Theory (ISIT) , 899 –903 (2010).
 Sasoglu et al. (2009) Eren Sasoglu, Emre Telatar, and Erdal Arıkan, “Polarization for arbitrary discrete memoryless channels,” Proceedings Information Theory Workshop (ITW) , 144–148 (2009).
 Arıkan and Telatar (2009) Erdal Arıkan and Emre Telatar, “On the rate of channel polarization,” Proceedings IEEE International Symposium on Information Theory (ISIT) (2009), 10.1109/ISIT.2009.5205856.
 Honda and Yamamoto (2012) Junya Honda and Hirosuke Yamamoto, “Polar coding without alphabet extension for asymmetric channels,” Proceedings IEEE International Symposium on Information Theory (ISIT) , 2147 –2151 (2012).
 Abbe (2011) Emmanuel Abbe, “Randomness and dependencies extraction via polarization,” Information Theory and Applications Workshop (ITA) , 1–7 (2011).
 Sahebi and Pradhan (2011) Area G. Sahebi and S. Sandeep Pradhan, “Multilevel polarization of polar codes over arbitrary discrete memoryless channels,” 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) , 1718–1725 (2011).
 Maurer (1994) Ueli Maurer, “The strong secret key rate of discrete random triples,” in Communication and Cryptography, edited by Richard E. Blahut (Boston: Kluwer Academic, 1994) pp. 271–285.
 Maurer and Wolf (2000) Ueli Maurer and Stefan Wolf, “Information-theoretic key agreement: From weak to strong secrecy for free,” in Advances in Cryptology – EUROCRYPT 2000, Lecture Notes in Computer Science, Vol. 1807, edited by Bart Preneel (Springer Berlin Heidelberg, 2000) pp. 351–368.
 Ahlswede and Csiszár (1993) Rudolf Ahlswede and Imre Csiszár, “Common randomness in information theory and cryptography. i. secret sharing,” IEEE Transactions on Information Theory 39, 1121 –1132 (1993).
 Holenstein and Renner (2005) Thomas Holenstein and Renato Renner, “One-way secret-key agreement and applications to circuit polarization and immunization of public-key encryption,” in Advances in Cryptology – CRYPTO 2005, Lecture Notes in Computer Science, Vol. 3621, edited by Victor Shoup (Springer Berlin Heidelberg, 2005) pp. 478–493.
 Maurer and Wolf (1999) Ueli Maurer and Stefan Wolf, “Unconditionally secure key agreement and the intrinsic conditional information,” IEEE Transactions on Information Theory 45, 499 –514 (1999).
 Renner and Wolf (2003) Renato Renner and Stefan Wolf, “New bounds in secretkey agreement: The gap between formation and secrecy extraction,” in Advances in Cryptology EUROCRYPT 2003, Lecture Notes in Computer Science, Vol. 2656, edited by Eli Biham (Springer Berlin Heidelberg, 2003) pp. 562–577.
 El Gamal and Kim (2012) Abbas El Gamal and YoungHan Kim, Network Information Theory (Cambridge University Press, 2012).
 Abbe (2012) Emmanuel Abbe, “Low complexity constructions of secret keys using polar coding,” Proceedings Information Theory Workshop (ITW) (2012).
 Mahdavifar and Vardy (2011) Hessam Mahdavifar and Alexander Vardy, “Achieving the secrecy capacity of wiretap channels using polar codes,” IEEE Transactions on Information Theory 57, 6428 –6443 (2011).
 Bellare et al. (2012) Mihir Bellare, Stefano Tessaro, and Alexander Vardy, “Semantic security for the wiretap channel,” in Advances in Cryptology – CRYPTO 2012, Lecture Notes in Computer Science, Vol. 7417, edited by Reihaneh Safavi-Naini and Ran Canetti (Springer Berlin Heidelberg, 2012) pp. 294–311.
 Andersson et al. (2010) Mattias Andersson, Vishwambhar Rathi, Ragnar Thobaben, Jorg Kliewer, and Mikael Skoglund, “Nested polar codes for wiretap and relay channels,” IEEE Communications Letters 14, 752 –754 (2010).
 Hof and Shamai (2010) Eran Hof and Shlomo Shamai, “Secrecyachieving polarcoding,” Proceedings Information Theory Workshop (ITW) , 1 –5 (2010).
 Koyluoglu and El Gamal (2010) Ozan O. Koyluoglu and Hesham El Gamal, “Polar coding for secure transmission and key agreement,” IEEE 21st International Symposium on Personal Indoor and Mobile Radio Communications (PIMRC) , 2698 –2703 (2010).
 Karzand and Telatar (2010) Mohammad Karzand and Emre Telatar, “Polar codes for qary source coding,” Proceedings IEEE International Symposium on Information Theory (ISIT) , 909 –912 (2010).
 Tal et al. (2012) Ido Tal, Artyom Sharov, and Alexander Vardy, “Constructing polar codes for nonbinary alphabets and macs,” Proceedings IEEE International Symposium on Information Theory (ISIT) , 2132 –2136 (2012).
 Tal and Vardy (2011) Ido Tal and Alexander Vardy, “How to construct polar codes,” (2011), submitted to IEEE Transactions on Information Theory, available at arXiv:1105.6164.
 Sutter et al. (2012) David Sutter, Joseph M. Renes, Frédéric Dupuis, and Renato Renner, “Achieving the capacity of any DMC using only polar codes,” Proceedings Information Theory Workshop (ITW) , 114 –118 (2012), extended version available at arXiv:1205.3756.
 Bennett et al. (1996) Charles H. Bennett, David P. DiVincenzo, John A. Smolin, and William K. Wootters, “Mixedstate entanglement and quantum error correction,” Physical Review A 54, 3824–3851 (1996).
 Smith and Smolin (2008) Graeme Smith and John A. Smolin, “Additive extensions of a quantum channel,” Proceedings Information Theory Workshop (ITW) , 368–372 (2008).
 Smith et al. (2008) Graeme Smith, John A. Smolin, and Andreas Winter, “The quantum capacity with symmetric side channels,” IEEE Transactions on Information Theory 54, 4208–4217 (2008).