A Quantum Multiparty Packing Lemma and the Relay Channel
Abstract
Optimally encoding classical information in a quantum system is one of the oldest and most fundamental challenges of quantum information theory. Holevo's bound places a hard upper limit on such encodings, while the Holevo-Schumacher-Westmoreland (HSW) theorem addresses the question of how many classical messages can be "packed" into a given quantum system. In this article, we use Sen's recent quantum joint typicality results to prove a one-shot multiparty quantum packing lemma generalizing the HSW theorem. The lemma is designed to be easily applicable in many network communication scenarios. As an illustration, we use it to straightforwardly obtain quantum generalizations of well-known classical coding schemes for the relay channel: multihop, coherent multihop, decode-forward, and partial decode-forward. We provide both finite blocklength and asymptotic results, the latter matching existing formulas. Given the key role of the classical packing lemma in network information theory, our packing lemma should help open the field to direct quantum generalization.
I Introduction
The packing lemma is one of the central tools used in the construction and analysis of information transmission protocols [elgamal2011network]. It quantifies the asymptotic rate at which messages can be "packed" reversibly into a medium, in the sense that the probability of a decoding error vanishes in the limit of large blocklength. For concreteness, consider the following general version of the packing lemma. (See, e.g., [elgamal2011network]; our formulation is slightly paraphrased and uses notation that is more suitable for what follows.)
Lemma 1 (Classical Packing Lemma).
Let (U,X,Y) be a triple of random variables with joint distribution p_{UXY}. For each n, let (\tilde{U}^{n},\tilde{Y}^{n}) be a pair of arbitrarily distributed random sequences and \{\tilde{X}^{n}(m)\} a family of at most 2^{nR} random sequences such that each \tilde{X}^{n}(m) is conditionally independent of \tilde{Y}^{n} given \tilde{U}^{n} (but arbitrarily dependent on the other \tilde{X}^{n}(m^{\prime}) sequences). Further assume that each \tilde{X}^{n}(m) is distributed as \otimes_{i=1}^{n}p_{X|U=\tilde{U}_{i}} given \tilde{U}^{n}. Then, there exists \delta(\varepsilon) that tends to zero as \varepsilon\to 0 such that
\displaystyle\lim_{n\to\infty}\Pr\big((\tilde{U}^{n},\tilde{X}^{n}(m),\tilde{Y}^{n})\in\mathcal{T}_{\varepsilon}^{(n)}\text{ for some }m\big)=0
if R<I(X;Y|U)-\delta(\varepsilon), where \mathcal{T}_{\varepsilon}^{(n)} is the set of \varepsilon-typical strings of length n with respect to p_{UXY}.
The packing lemma provides a unified approach to many, if not most, of the achievability results in Shannon theory. Despite its broad utility, it is a simple consequence of the union bound and the standard joint typicality lemma for the three variables U, X, Y. The usual channel coding theorem follows directly by taking U=\emptyset and \tilde{Y}^{n}\sim p_{Y}^{\otimes n}.
For the case when U=\emptyset and \tilde{Y}^{n}\sim p_{Y}^{\otimes n}, the quantum generalization of the packing lemma is known: the Holevo-Schumacher-Westmoreland (HSW) theorem [holevo1998capacity, schumacher1997sending]. This can be proven using a conditional typicality lemma for a classical-quantum state with one classical and one quantum system. However, until recently no such typicality lemma was known for two classical systems and one quantum system, and so a quantum version of Lemma 1 was lacking. Furthermore, while in classical Shannon theory Lemma 1 can be used repeatedly in settings where the message is encoded into multiple random variables, this approach fails in the quantum case due to measurement disturbance, specifically the influence of one decoding on subsequent ones. Hence, while two senders and one receiver suffice to solve the full multiparty packing problem in the classical case, a general multiparty packing lemma with k\in\mathbb{N} senders is required in the quantum case. The bottleneck is again the lack of a general quantum joint typicality lemma with more than two parties. However, partial results can be obtained in the quantum case for some network settings, as we describe below.
In this paper we use the quantum joint typicality lemma established recently by Sen [senInPrep] to prove a quantum one-shot multiparty packing lemma for k senders. (Sen modestly calls his result a lemma, but the highly ingenious proof more than justifies calling it a theorem.) We then demonstrate the wide applicability of the lemma by using it to straightforwardly generalize classical protocols in a specific network communication setting to the quantum case. The lemma allows us to construct and prove the correctness of these simple generalizations and, we believe, should help to open the field of classical network information theory to direct quantum generalization. One feature of the lemma is that it leads naturally to demonstrations of the achievability of rate regions without having to resort to time-sharing, a desirable property known as simultaneous decoding. In network settings, this is often necessary because different receivers could have different effective rate regions and therefore require incompatible time-sharing strategies. Indeed, this is a frequent source of incomplete or incorrect results even in classical information theory [DBLP:journals/corr/abs12070543]. A general construction leading to simultaneous decoding in the quantum setting has therefore been sought for many years [DBLP:journals/corr/abs12070543, fawzi2012classical, dutil2011multiparty, winter2001capacity, fawzi2011quantum, christandl2018recoupling, walter2014multipartite]. Sen's quantum joint typicality lemma achieves this goal, as does our packing lemma, which can be viewed as a user-friendly presentation of Sen's lemma.
Recall that network information theory is the study of communication in the setting of multiple parties, a generalization of the conventional single-sender, single-receiver two-party scenario, commonly known as point-to-point communication. Common network scenarios include multiple senders encoding different messages, as in the multiple access channel [shannon1961two]; multiple receivers decoding the same message, as in the broadcast channel setting [cover1972broadcast]; or a combination of both, as for the interference channel [ahlswede1974capacity]. However, the above examples are all instances of what is called single-hop communication, where the message travels directly from a sender to a receiver. In multihop communication, there are one or even multiple intermediate nodes where the message is decoded or partially decoded before being transmitted to the final receiver. Examples of such settings include the relay channel [van1971three], which we will focus on in this paper, and, more generally, graphical multicast networks [kramer2005cooperative, xie2005achievable].
Research in quantum joint typicality has generally been driven by the need to establish quantum generalizations of results in classical network information theory. Examples include the quantum multiple access channel [winter2001capacity, yard2008capacity], the quantum broadcast channel [yard2011quantum, dupuis2010father], and the quantum interference channel [fawzi2011quantum]. Indeed, some partial results on joint typicality had been established or conjectured in order to prove achievability bounds for various network information processing tasks [dutil2011multiparty, sen2012achieving]. Subsequent work made some headway on the abstract problem of joint typicality for quantum states, but not enough to affect coding theorems [drescher2013simultaneous, notzel2012solution], prior to Sen's breakthrough [senInPrep].
The quantum relay channel was studied previously in [savov2012partial], where the authors constructed a partial decode-forward protocol. Here we develop finite blocklength results for the relay channel, in addition to reproducing the earlier conclusions, while avoiding a resolvable issue with error accumulation from successive measurements in their partial decode-forward bound. (We construct a joint decoder which obtains all the messages from the multiple rounds of communication at once.) Naturally, our analysis makes extensive use of the quantum multiparty packing lemma. Indeed, once the coding strategy is specified, a direct application of the packing lemma in the asymptotic limit gives a list of inequalities which describe the rate region, which we then simplify using entropy inequalities to the usual rate region of the partial decode-forward lower bound. There has also been related work in [jin2012lower], which considered concatenated channels, a special case of the more general relay channel model. As noted in [savov2012partial], work on quantum relay channels may have applications to designing quantum repeaters [collins2005quantum]. Sen has also used his joint typicality lemma to prove achievability results for the quantum multiple access, broadcast, and interference channels [senInPrep], but here we give a general packing lemma which can be conveniently used as a black box for quantum network information applications.
Our paper is structured as follows. In Section II, we establish our notation and discuss some preliminaries. In Section III, we describe the setting and state the quantum multiparty packing lemma. The statement will very much resemble a one-shot, multiparty generalization of Lemma 1 but, to reiterate, while the multiparty generalization is trivial in the classical case, it requires the power of a full joint typicality lemma in the quantum case. In Section IV we describe the setting of the classical-quantum (cq) relay channel and systematically describe the achievability bounds corresponding to known coding schemes in the classical setting: multihop, coherent multihop, decode-forward, and partial decode-forward [cover1979capacity]. It is worthwhile to note that while the first three bounds only require the packing lemma with two senders, the last bound is proved by applying multiparty packing for an arbitrary number of senders. In addition to the one-shot bounds, we show that the asymptotic bounds are obtained by taking the limit of large blocklength, thereby obtaining quantum generalizations of known capacity lower bounds for the classical case. In Section V we prove the quantum multiparty packing lemma via Sen's quantum joint typicality lemma [senInPrep]. For convenience, we restate a special case of Sen's joint typicality lemma and suppress some of the details. In Section VI we conclude, including an evaluation of the method proposed in this paper as well as possible directions for future work.
II Preliminaries
We first establish some notation and recall some basic results.
Classical and quantum systems: A classical system X is identified with an alphabet \mathcal{X} and a Hilbert space of dimension \left|\mathcal{X}\right|, while a quantum system B is given by a Hilbert space of dimension d_{B}. Classical states are modeled by diagonal density operators such as \rho_{X}=\sum_{x\in\mathcal{X}}p_{X}(x)\ket{x}\bra{x}_{X}, where p_{X} is a probability distribution; quantum states are described by density operators \rho_{A}, etc.; and classical-quantum states are described by density operators of the form
\displaystyle\rho_{XB}=\sum_{x\in\mathcal{X}}p_{X}(x)\ket{x}\bra{x}_{X}\otimes\rho_{B}^{(x)}.  (1)
Probability bound: Let E_{1} and E_{2} be two events. We will use the following inequality repeatedly in the paper:
\displaystyle\Pr(E_{1})  \displaystyle=\Pr(E_{1}\mid E_{2})\Pr(E_{2})+\Pr(E_{1}\mid\overline{E}_{2})\Pr(\overline{E}_{2})
\displaystyle\leq\Pr(E_{2})+\Pr(E_{1}\mid\overline{E}_{2}),  (2)
where \overline{E}_{2} denotes the complement of E_{2} and we used the facts that \Pr(E_{1}\mid E_{2})\leq 1 and \Pr(\overline{E}_{2})\leq 1.
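The bound above is elementary; as a quick numerical sanity check, here is a small Monte Carlo verification on a toy sample space of our own choosing (Z uniform on {0,...,9}, with E_1 the event that Z is even and E_2 the event that Z < 4):

```python
import random

def union_style_bound_check(trials=100_000, seed=0):
    """Estimate Pr(E1) and the Eq. 2 bound Pr(E2) + Pr(E1 | not E2) on a
    toy space: Z uniform on {0,...,9}, E1 = "Z is even", E2 = "Z < 4"."""
    rng = random.Random(seed)
    n_e1 = n_e2 = n_not_e2 = n_e1_and_not_e2 = 0
    for _ in range(trials):
        z = rng.randrange(10)
        n_e1 += (z % 2 == 0)
        n_e2 += (z < 4)
        if z >= 4:
            n_not_e2 += 1
            n_e1_and_not_e2 += (z % 2 == 0)
    p_e1 = n_e1 / trials
    bound = n_e2 / trials + n_e1_and_not_e2 / n_not_e2
    return p_e1, bound

p_e1, bound = union_style_bound_check()  # exact values are 0.5 and 0.9
```

Here the bound is loose (0.5 versus 0.9), which is typical: its value lies in splitting an error event into "a previous step failed" plus "this step failed given the previous one succeeded".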
Hypothesis-testing relative entropy: The hypothesis-testing relative entropy is defined as
D_{H}^{\varepsilon}(\rho\|\sigma)=\max_{\begin{subarray}{c}0\leq\Pi\leq I\\ \operatorname{tr}(\Pi\rho)\geq 1-\varepsilon\end{subarray}}\big(-\log\operatorname{tr}(\Pi\sigma)\big).
For n copies of states \rho and \sigma, [datta2011strong] establishes the following inequalities:
D(\rho\|\sigma)-\frac{F_{1}(\varepsilon)}{\sqrt{n}}\leq\frac{1}{n}D_{H}^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n})\leq D(\rho\|\sigma)+\frac{F_{2}(\varepsilon)}{\sqrt{n}},  (3)
where F_{1}(\varepsilon),F_{2}(\varepsilon)\geq 0 are given by F_{1}(\varepsilon)\equiv 4\sqrt{2}\log\frac{1}{\varepsilon}\log\eta and F_{2}(\varepsilon)\equiv 4\sqrt{2}\log\frac{1}{1-\varepsilon}\log\eta, with \eta\equiv 1+\operatorname{tr}\rho^{3/2}\sigma^{-1/2}+\operatorname{tr}\rho^{1/2}\sigma^{1/2}. In the limit of large n, we obtain the quantum Stein's lemma [ogawa2005strong, hiai1991proper]:
\displaystyle\lim_{n\to\infty}\frac{1}{n}D_{H}^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n})=D(\rho\|\sigma).  (4)
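For commuting (diagonal) states, the optimization defining the hypothesis-testing relative entropy reduces to a classical Neyman-Pearson problem, which the following sketch solves greedily. The function name, the base-2 logarithm, and the test distributions are our choices for illustration:

```python
import numpy as np

def hypothesis_testing_divergence(p, q, eps):
    """D_H^eps(rho || sigma) in bits for commuting states with eigenvalue
    vectors p and q (all p_i > 0 assumed): minimize tr(Pi sigma) over tests
    0 <= Pi <= I with tr(Pi rho) >= 1 - eps, then return -log2 of the
    minimum.  By Neyman-Pearson, an optimal diagonal test puts weight on
    outcomes in order of decreasing likelihood ratio p_i / q_i."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    order = np.argsort(-(p / q))  # largest likelihood ratio first
    p, q = p[order], q[order]
    need = 1.0 - eps  # remaining constraint on tr(Pi rho)
    beta = 0.0        # accumulated tr(Pi sigma)
    for pi, qi in zip(p, q):
        if need <= 1e-15:
            break
        w = min(1.0, need / pi)  # the last outcome may get fractional weight
        beta += w * qi
        need -= w * pi
    return -np.log2(beta)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
d = hypothesis_testing_divergence(p, q, eps=0.1)  # -log2(0.75) ~ 0.415 bits
```

Note that with eps = 0 the constraint forces the full test Pi = I, giving zero, and that the quantity grows as eps increases, consistent with the definition above.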
Conditional density operators: Let a classical system X consist of subsystems X_{v}, for v in some index set V, with alphabet \mathcal{X}=\bigtimes_{v\in V}\mathcal{X}_{v}. Consider a classicalquantum state \rho_{XB} as in Eq. 1 and a subset S\subseteq V. We can write
\displaystyle\rho_{XB}=\sum_{x_{\overline{S}}}p_{X_{\overline{S}}}(x_{\overline{S}})\ket{x_{\overline{S}}}\bra{x_{\overline{S}}}_{X_{\overline{S}}}\otimes\rho_{X_{S}B}^{(x_{\overline{S}})},  (5)
where
\rho^{(x_{\overline{S}})}_{X_{S}B}\equiv\sum_{x_{S}}p_{X_{S}|X_{\overline{S}}}(x_{S}|x_{\overline{S}})\ket{x_{S}}\bra{x_{S}}_{X_{S}}\otimes\rho^{(x_{S},x_{\overline{S}})}_{B}.
We can interpret \rho^{(x_{{\overline{S}}})}_{X_{S}B} as a “conditional” density operator. We further define \rho_{XB}^{(\left\{X_{S},B\right\})} by replacing the conditional density operator in Eq. 5 by the tensor product of its marginals:
\displaystyle\rho_{XB}^{(\left\{X_{S},B\right\})}  \displaystyle=\sum_{x_{\overline{S}}}p_{X_{\overline{S}}}(x_{\overline{S}})\ket{x_{\overline{S}}}\bra{x_{\overline{S}}}_{X_{\overline{S}}}\otimes\rho^{(x_{\overline{S}})}_{X_{S}}\otimes\rho^{(x_{\overline{S}})}_{B}
\displaystyle=\sum_{x}p_{X}(x)\ket{x}\bra{x}_{X}\otimes\rho^{(x_{\overline{S}})}_{B}.
This formulation lets us obtain the conditional mutual information as an asymptotic limit of the hypothesis-testing relative entropy; by Eq. 4,
\displaystyle\lim_{n\to\infty}\frac{1}{n}D_{H}^{\varepsilon}\Big{(}\rho_{XB}^{\otimes n}\Big\|\left(\rho_{XB}^{(\{X_{S},B\})}\right)^{\otimes n}\Big{)}  \displaystyle=D(\rho_{XB}\|\rho_{XB}^{(\{X_{S},B\})})
\displaystyle=\sum_{x_{\overline{S}}}p_{X_{\overline{S}}}(x_{\overline{S}})D\Big{(}\rho^{(x_{\overline{S}})}_{X_{S}B}\;\Big\|\;\rho^{(x_{\overline{S}})}_{X_{S}}\otimes\rho^{(x_{\overline{S}})}_{B}\Big{)}
\displaystyle=\sum_{x_{\overline{S}}}p_{X_{\overline{S}}}(x_{\overline{S}})I(X_{S};B)_{\rho^{(x_{\overline{S}})}}
\displaystyle=I(X_{S};B|X_{\overline{S}})_{\rho}.  (6)
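Eq. 6 expresses the conditional mutual information of a cq state as an average of Holevo quantities, one for each value of x_{\overline{S}}. A minimal numerical sketch (function names and the toy state are ours; entropies in bits):

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy of a density matrix, in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def holevo(p, states):
    """I(X_S; B) of a cq state sum_x p(x) |x><x| (x) rho_B^{(x)}."""
    avg = sum(px * rho for px, rho in zip(p, states))
    return vn_entropy(avg) - sum(px * vn_entropy(rho) for px, rho in zip(p, states))

def conditional_mi(p_joint, states):
    """I(X_S; B | X_Sbar) as the average in Eq. 6: p_joint[i][j] is
    p(x_sbar = i, x_s = j) and states[i][j] is rho_B^{(x_s, x_sbar)}."""
    total = 0.0
    for p_row, s_row in zip(p_joint, states):
        p_sbar = sum(p_row)
        if p_sbar > 0:
            total += p_sbar * holevo([px / p_sbar for px in p_row], s_row)
    return total

# Toy check: B stores X_S perfectly and ignores X_Sbar, all bits uniform,
# so I(X_S; B | X_Sbar) should be exactly 1 bit.
ket0, ket1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
cmi = conditional_mi([[0.25, 0.25], [0.25, 0.25]],
                     [[ket0, ket1], [ket0, ket1]])
```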
III Quantum Multiparty Packing Lemma
In this section, we formulate a general multiparty packing lemma for quantum Shannon theory that can be conveniently used as a black box for random coding constructions. The goal is to “pack” as many classical messages as possible into our quantum system while retaining distinguishability. A multiparty packing lemma is concerned with packing classical messages via an encoding that involves multiple classical systems. As mentioned in the introduction, this is necessary in quantum information theory due to measurement disturbance. That is, while in classical information theory one can do consecutive decoding operations with impunity, in quantum information theory a decoding operation can change the system and thereby affect a subsequent operation. For example, while classically it is possible to check whether the output of a channel is typical for a tuple of input random variables simply by verifying typicality pair by pair, quantumly this method can be problematic. Hence, we would like to combine a set of decoding operations into one simultaneous decoding. We obtain a construction of this flavor in Lemma 2. Its asymptotic version, Lemma 3, states that the decoding error vanishes provided that a set of inequalities on the rate of transmission is satisfied, as opposed to a single one as in Lemma 1. This is exactly what we expect from a simultaneous decoding operation.
In order to motivate the formal statements to come, it is helpful to have an example in mind. In network coding scenarios, it is often necessary to have multiple message sets, representing in the simplest cases transmissions to and from different users or in different rounds of communication. Those messages, in turn, may be generated in a correlated fashion. Suppose for the purpose of illustration that we have three message sets M_{1}, M_{2}, and M_{3} and a family of density operators \rho^{(x_{1},x_{2},x_{3})}. To generate a code, we could choose x_{1}(m_{1}) for each m_{1}\in M_{1} according to p_{X_{1}}, next generate x_{2}(m_{1},m_{2}) for each pair (m_{1},m_{2}) according to p_{X_{2}|X_{1}=x_{1}(m_{1})}, and lastly draw x_{3}(m_{1},m_{2},m_{3}) for each triple (m_{1},m_{2},m_{3}) according to p_{X_{3}|X_{1}=x_{1}(m_{1}),X_{2}=x_{2}(m_{1},m_{2})}. This arrangement can be represented graphically by a structure that we call a multiplex Bayesian network (Fig. 1, explained below). This structure is key to the technical setup of our multiparty packing lemma.
Let the random variable X be a Bayesian network with respect to a directed acyclic graph (DAG) G=(V,E). The random variable X is composed of random variables X_{v} with alphabet \mathcal{X}_{v} for each v\in V. For v\in V, let
\displaystyle\operatorname{pa}(v)\equiv\left\{v^{\prime}\in V\;:\;(v^{\prime},v)\in E\right\}
denote the set of parents of v, corresponding to the random variables that X_{v} is conditioned on. Below, we will use the Bayesian network to generate codewords x(m) with components x_{v}(m) for v\in V. Just like in our example, different components of a codeword may only depend on a subset of the message. We will model this situation by an index set J, which labels the different parts of the message, message sets M_{j} for each j\in J, and a function \operatorname{ind}:V\to\mathcal{P}(J), where \operatorname{ind}(v)\subseteq J corresponds to the (indices of) the message parts that the codeword component X_{v} depends on. Below we will use this multiplex Bayesian network to construct a code, and for this construction to be welldefined, we will require that given v\in V,
\operatorname{ind}(v^{\prime})\subseteq\operatorname{ind}(v)\text{ for every }v^{\prime}\in\operatorname{pa}(v).  (7)
In the example, this captures the fact that the random variable x_{3}(m) is defined conditional on the value of x_{2}(m_{1},m_{2}) and therefore must necessarily depend on m_{1} and m_{2}; similarly, x_{2}(m_{1},m_{2}) is defined conditional on x_{1}(m_{1}) and must depend on m_{1}.
We will call the tuple \mathcal{B}=(G,X,M,\operatorname{ind}), where M\equiv\bigtimes_{j\in J}M_{j}, a multiplex Bayesian network. We can visualize a multiplex Bayesian network by adjoining to the DAG G additional vertices M_{j}, one for each j\in J, and edges that connect each M_{j} to those X_{v} such that j\in\operatorname{ind}(v). For a visualization of the example with three random variables, see Fig. 1.
Fix a multiplex Bayesian network \mathcal{B}=(G,X,M,\operatorname{ind}). We would like to produce a random codebook
\displaystyle\left\{x_{v}(m)\right\}_{v\in V,m\in M},  (8) 
where x_{v} is a random variable with alphabet \mathcal{X}_{v}. We will generate a random codebook via an algorithm implemented with respect to the multiplex Bayesian network being considered. The vertices represent components of the codewords and the graph G will be the Bayesian network describing the dependencies between the components of the random codewords. Moreover, each component x_{v}(m) will only depend on those parts m_{j}\in M_{j} of the message for which j\in\operatorname{ind}(v). That is, x_{v}(m) and x_{v}(m^{\prime}) will be equal as random variables provided m_{j}=m^{\prime}_{j} for every j\in\operatorname{ind}(v).
We now give the algorithm for generating the random codebook. Since G is a DAG, it has a topological ordering, that is, a total ordering on V such that for every (v^{\prime},v)\in E, v^{\prime} precedes v in the ordering. We also pick an arbitrary total ordering on J and on M_{j} for every j\in J. This then induces a lexicographical ordering on their Cartesian products, which we denote by M_{J^{\prime}}:=\bigtimes_{j\in J^{\prime}}M_{j} for any J^{\prime}\subseteq J. We define M_{\emptyset}=\{\emptyset\} as a singleton set so that we can identify M_{J^{\prime}}\times M_{J^{\prime\prime}}=M_{J^{\prime}\cup J^{\prime\prime}} for any two disjoint subsets J^{\prime},J^{\prime\prime}\subseteq J. These total orderings determine the order in which we perform the for loops below, but do not impact the joint distribution of the codewords. We can therefore define the following algorithm:
Algorithm 1: Codebook generation from a multiplex Bayesian network
for each v\in V in topological order:
  for each m_{v}\in M_{\operatorname{ind}(v)}:
    draw x_{v}(m_{v}) according to p_{X_{v}|X_{\operatorname{pa}(v)}=x_{\operatorname{pa}(v)}(m_{\operatorname{pa}(v)})}
    set x_{v}(m_{v},m_{\bar{v}})\equiv x_{v}(m_{v}) for all m_{\bar{v}}\in M_{{\overline{\operatorname{ind}(v)}}}
Here, {\overline{\operatorname{ind}(v)}}\equiv J\setminus\operatorname{ind}(v), m_{\operatorname{pa}(v)} is the restriction of m_{v} to M_{\operatorname{ind}(\operatorname{pa}(v))} (this makes sense by Eq. 7), X_{\operatorname{pa}(v)}=(X_{v^{\prime}})_{v^{\prime}\in\operatorname{pa}(v)} and similarly for x_{\operatorname{pa}(v)}(m_{\operatorname{pa}(v)}), and the pair (m_{v},m_{\bar{v}}) is interpreted as an element of M with the appropriate components. The topological ordering on V ensures that x_{\operatorname{pa}(v)}(m_{\operatorname{pa}(v)}) is generated before x_{v}(m_{v}), so this algorithm can be run. We thus obtain a random codebook as in Eq. 8.
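To make the construction concrete, the following is a hypothetical Python sketch of Algorithm 1 (the function name and sampling interface are ours). It exploits Eq. 7: since \operatorname{ind}(u)\subseteq\operatorname{ind}(v) for every parent u of v, restricting m_{v} to \operatorname{ind}(u) recovers the message part from which x_{u} was generated, and keying codewords by the restricted message makes x_{v}(m_{v},m_{\bar{v}}) independent of m_{\bar{v}} by construction:

```python
import random
from itertools import product

def generate_codebook(order, pa, ind, message_sets, sample, seed=0):
    """Sketch of Algorithm 1: generate a random codebook from a multiplex
    Bayesian network.

    order:        the vertices of G listed in a topological order
    pa[v]:        tuple of parents of v
    ind[v]:       sorted tuple of the message indices j that x_v depends on
    message_sets: message_sets[j] is the message set M_j
    sample:       sample(v, parent_values, rng) draws one symbol from
                  p_{X_v | X_pa(v) = parent_values}

    Returns x with x[v][m_v] for each v and m_v in M_{ind(v)}."""
    rng = random.Random(seed)
    x = {v: {} for v in order}
    for v in order:  # topological order: parents are generated first
        for m_v in product(*(message_sets[j] for j in ind[v])):
            parent_values = tuple(
                # restrict m_v to ind(u); this is well-defined by Eq. 7
                x[u][tuple(mj for j, mj in zip(ind[v], m_v) if j in ind[u])]
                for u in pa[v]
            )
            x[v][m_v] = sample(v, parent_values, rng)
    return x

# Usage on the three-variable example of Fig. 1, with binary message sets
# and a toy conditional distribution (uniformly random bits):
codebook = generate_codebook(
    order=(1, 2, 3),
    pa={1: (), 2: (1,), 3: (1, 2)},
    ind={1: (1,), 2: (1, 2), 3: (1, 2, 3)},
    message_sets={1: [0, 1], 2: [0, 1], 3: [0, 1]},
    sample=lambda v, parents, rng: rng.randrange(2),
)
```

Storing only the restricted keys m_{v} is what realizes the property that x_{v}(m_{v},m_{\bar{v}}) does not depend on m_{\bar{v}}.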
We make a few observations.

1. By construction, for all m\in M and \xi\in\mathcal{X},
\displaystyle\Pr(x(m)=\xi)=p_{X}(\xi)\equiv\prod_{v\in V}p_{X_{v}|X_{\operatorname{pa}(v)}}\left(\xi_{v}|\xi_{\operatorname{pa}(v)}\right). That is, x(m) is a Bayesian network with respect to G equal in distribution to X.

2. By construction, given v\in V and m_{v}\in M_{\operatorname{ind}(v)}, all x_{v}(m_{v},m_{\bar{v}}) for m_{\bar{v}}\in M_{{\overline{\operatorname{ind}(v)}}} are equal as random variables.

3. Generalizing observation 1, the joint distribution of all codewords can be split into factors in a simple manner. Specifically, given \xi(m)\in\mathcal{X} for every m\in M, we have
\displaystyle\Pr(x(m)=\xi(m)\text{ for all }m\in M)=\prod_{v\in V}\prod_{m_{v}\in M_{\operatorname{ind}(v)}}p_{X_{v}|X_{\operatorname{pa}(v)}}\left(\xi_{v}(m_{v})|\xi_{\operatorname{pa}(v)}(m_{\operatorname{pa}(v)})\right), provided \xi_{v}(m)=\xi_{v}(m^{\prime}) for all m,m^{\prime} with m_{v}=m^{\prime}_{v}. Otherwise, the joint probability is zero.
We will use Algorithm 1 on \mathcal{B} to obtain a codebook for which we would like to construct multiple different quantum decoders. More precisely, let H be the induced subgraph of G on some V_{H}\subseteq V such that for all v\in V_{H}, \operatorname{pa}(v)\subseteq V_{H}. We call H an ancestral subgraph. Then, we can naturally define X_{H} to be the set of random variables corresponding to V_{H}, J_{H}\equiv\bigcup_{v\in V_{H}}\operatorname{ind}(v)\subseteq J, M_{H}\equiv\bigtimes_{j\in J_{H}}M_{j}, and C_{H}\equiv\left\{x_{H}(m_{H})\right\}_{m_{H}\in M_{H}}. (Note that by the definition of M_{H} we only need m_{H} to identify x_{H} up to equality as random variables.) We will then use a quantum encoding \left\{\rho_{B}^{(x_{H})}\right\}_{x_{H}\in\mathcal{X}_{H}}, where B is some quantum system. Furthermore, the receiver will only need to decode a subset D\subseteq J_{H} of the components of the message, since they might in general have a guess for the other components {\overline{D}}\equiv J_{H}\setminus D. This is a very general construction for classical-quantum network communication settings, where J and X will respectively correspond to the messages and classical inputs to the classical-quantum channel on different rounds of communication. H would then correspond to the inputs of a particular round, and {\overline{D}} would be the decoder's message estimates from previous rounds.
We can now state our quantum multiparty packing lemma:
Lemma 2 (Oneshot quantum multiparty packing lemma).
Let \mathcal{B}=(G,X,M,\operatorname{ind}) be a multiplex Bayesian network and run Algorithm 1 to obtain a random codebook C=\left\{x(m)\right\}_{m\in M}. Let H\subseteq G be an ancestral subgraph, \{\rho_{B}^{(x_{H})}\}_{x_{H}\in\mathcal{X}_{H}} a family of quantum states, D\subseteq J_{H}, and \varepsilon\in(0,1). Then there exists a POVM \{Q_{B}^{(m_{D}|m_{{\overline{D}}})}\}_{m_{D}\in M_{D}} for each m_{{\overline{D}}}\in M_{{\overline{D}}} (these POVMs depend on the codebook C_{H} and are hence involved in the averaging in Eq. 9, which will be important in the analyses below) such that, for all (m_{D},m_{{\overline{D}}})\in M_{H},
\displaystyle\mathbb{E}_{C_{H}}\left[\operatorname{tr}\left[(I-Q_{B}^{(m_{D}|m_{{\overline{D}}})})\rho^{(x_{H}(m_{D},m_{{\overline{D}}}))}_{B}\right]\right]\leq f(\left|V_{H}\right|,\varepsilon)+4\sum_{\emptyset\neq T\subseteq D}2^{\big{(}\sum_{t\in T}R_{t}\big{)}-D_{H}^{\varepsilon}(\rho_{X_{H}B}\|\rho^{(\{X_{S_{T}},B\})}_{X_{H}B})}.  (9)
Here, \mathbb{E}_{C_{H}} denotes the expectation over the random codebook C_{H}=\{x_{H}(m_{H})\}_{m_{H}\in M_{H}}, R_{t}\equiv\log\left|M_{t}\right|,
\displaystyle S_{T}\equiv\left\{v\in V_{H}\;:\;\operatorname{ind}(v)\cap T\neq\emptyset\right\},
and
\displaystyle\rho_{X_{H}B}\equiv\sum_{x_{H}\in\mathcal{X}_{H}}p_{X_{H}}(x_{H})\ket{x_{H}}\bra{x_{H}}_{X_{H}}\otimes\rho^{(x_{H})}_{B}.
Furthermore, f(k,\varepsilon) is a universal function (independent of our setup) that tends to zero as \varepsilon\to 0.
Remark.
The bound in Eq. 9 can also be written as
\displaystyle\mathbb{E}_{C_{H}}\left[\operatorname{tr}\left[(I-Q_{B}^{(m_{D}|m_{{\overline{D}}})})\rho^{(x_{H}(m_{D},m_{{\overline{D}}}))}_{B}\right]\right]\leq f(\left|V_{H}\right|,\varepsilon)+4\sum_{m_{D}^{\prime}\neq m_{D}}2^{-D_{H}^{\varepsilon}(\rho_{X_{H}B}\|\rho_{X_{H}B}^{(\{X_{S},B\})})},  (10)
where
\displaystyle S\equiv\left\{v\in V_{H}\;:\;\exists j\in D\cap\operatorname{ind}(v)\text{ such that }(m_{D})_{j}\neq(m^{\prime}_{D})_{j}\right\}.
In words, S is the set of (vertices of) random codewords that depend on a part of the message that differs between m_{D} and m_{D}^{\prime}; note that S depends on m_{D}^{\prime}. This is similar to decoding error bounds obtained with conventional methods, such as the Hayashi-Nagaoka lemma [hayashi2003general]. We obtain Eq. 9 from Eq. 10 by parametrizing the different m_{D}^{\prime} with respect to the components that differ from m_{D}.
Remark.
Note that Eq. 9 assumes that the decoder's guess of m_{\overline{D}} is correct. That is, they choose the POVM \left\{Q_{B}^{(m_{D}|m_{{\overline{D}}})}\right\}_{m_{D}\in M_{D}}, where m_{{\overline{D}}} is exactly the m_{{\overline{D}}} in the encoded state \rho^{(x_{H}(m_{D},m_{{\overline{D}}}))}_{B}. If the decoder's guess is incorrect, then this bound will not hold in general. In applications, m_{{\overline{D}}} will typically correspond to message estimates from previous rounds, which we assume to be correct by invoking a union bound. That is, we bound the total probability of error by summing the probabilities of error of each decoding, assuming that all previous decodings were correct.
Using Lemma 2 and Eq. 6, we can naturally obtain the asymptotic version, where we simply repeat the encoding-decoding procedure n\in\mathbb{N} times and take the limit of large n. By the quantum Stein's lemma, Eq. 4, the error in Eq. 9 vanishes if the rates of encoding are bounded by conditional mutual information quantities. We present this as a self-contained statement.
Lemma 3 (Asymptotic quantum multiparty packing lemma).
Let \mathcal{B}=(G,X,M,\operatorname{ind}) be a multiplex Bayesian network. Run Algorithm 1 n times to obtain a random codebook C^{n}=\left\{x^{n}(m)\in\mathcal{X}^{n}\right\}_{m\in M}. Let H\subseteq G be an ancestral subgraph, \{\rho_{B}^{(x_{H})}\}_{x_{H}\in\mathcal{X}_{H}} a family of quantum states, and D\subseteq J_{H}. Then there exists a POVM \{Q_{B^{n}}^{(m_{D}|m_{{\overline{D}}})}\}_{m_{D}\in M_{D}} for each m_{{\overline{D}}}\in M_{{\overline{D}}} such that, for all (m_{D},m_{{\overline{D}}})\in M_{H},
\displaystyle\lim_{n\to\infty}\mathbb{E}_{C_{H}^{n}}\left[\operatorname{tr}\left[(I-Q_{B^{n}}^{(m_{D}|m_{{\overline{D}}})})\bigotimes_{i=1}^{n}\rho_{B_{i}}^{(x_{i,H}(m_{D},m_{{\overline{D}}}))}\right]\right]=0,
provided that
\sum_{t\in T}R_{t}<nI(X_{S_{T}};B|X_{\overline{S_{T}}})_{\rho}\quad\text{for all }\emptyset\neq T\subseteq D.
Above, \mathbb{E}_{C_{H}^{n}} is the expectation over the random codebook C_{H}^{n}\equiv\left\{x_{H}^{n}(m_{H})\right\}_{m_{H}\in M_{H}}, R_{t}\equiv\log\left|M_{t}\right|,
\displaystyle S_{T}\equiv\left\{v\in V_{H}\;:\;\operatorname{ind}(v)\cap T\neq\emptyset\right\},
and
\displaystyle\rho_{X_{H}B}\equiv\sum_{x_{H}\in\mathcal{X}_{H}}p_{X_{H}}(x_{H})\ket{x_{H}}\bra{x_{H}}_{X_{H}}\otimes\rho^{(x_{H})}_{B}.
Example.
To clarify the definitions and illustrate the application of Lemma 3, we give a concrete example of a multiparty packing setting. Consider the multiplex Bayesian network given in Fig. 1. Then, choosing H=G and D=\{2,3\}, we obtain a POVM \{Q_{B}^{(m_{2},m_{3}|m_{1})}\}_{m_{2}\in M_{2},m_{3}\in M_{3}} for each m_{1}\in M_{1}. The mapping from T\subseteq D to S_{T}\subseteq V=\left\{1,2,3\right\} is given in Table 1. Hence, we obtain vanishing error in the asymptotic limit if
\displaystyle R_{2}  \displaystyle<nI(X_{2}X_{3};B|X_{1})_{\rho}
\displaystyle R_{3}  \displaystyle<nI(X_{3};B|X_{1}X_{2})_{\rho}
\displaystyle R_{2}+R_{3}  \displaystyle<nI(X_{2}X_{3};B|X_{1})_{\rho},
where
\displaystyle\rho_{X_{1}X_{2}X_{3}B}=\sum_{x_{1},x_{2},x_{3}}p_{X_{1}X_{2}X_{3}}(x_{1},x_{2},x_{3})\ket{x_{1},x_{2},x_{3}}\bra{x_{1},x_{2},x_{3}}_{X_{1}X_{2}X_{3}}\otimes\rho_{B}^{(x_{1},x_{2},x_{3})}.
Table 1: The mapping from T to S_{T} in the example.

T                        S_{T}
\left\{2\right\}         \left\{2,3\right\}
\left\{3\right\}         \left\{3\right\}
\left\{2,3\right\}       \left\{2,3\right\}
Note that the third inequality subsumes the first.
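The mapping in Table 1 follows mechanically from the definition S_{T}=\{v\in V_{H} : \operatorname{ind}(v)\cap T\neq\emptyset\}; a few lines of Python (ours) reproduce it:

```python
def s_of(ind, v_h, t):
    """S_T = {v in V_H : ind(v) intersects T}, as in Lemmas 2 and 3."""
    return {v for v in v_h if set(ind[v]) & set(t)}

# ind for the Fig. 1 example: x_1 depends on m_1, x_2 on (m_1, m_2),
# and x_3 on (m_1, m_2, m_3); the decoding set is D = {2, 3}.
ind = {1: {1}, 2: {1, 2}, 3: {1, 2, 3}}
v_h = {1, 2, 3}
table = {frozenset(t): s_of(ind, v_h, t) for t in [{2}, {3}, {2, 3}]}
```

The computed `table` matches Table 1; in particular, T={2} pulls in vertex 3 as well, because x_3 also depends on the message part m_2.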
We expect that Lemma 3 can be used in a variety of settings to directly generalize results from classical network information theory, which often hinge on Lemma 1, to the quantum case.
In fact, it is not too difficult to see that an i.i.d. variant of Lemma 1 can be derived from Lemma 3. (This is because we assume i.i.d. codewords in Lemma 3, which is sufficient for, e.g., relay, multiple access [senInPrep], and broadcast channels [senInPrep2].) More precisely, let (U,X,Y)\sim p_{UXY} be a triple of random variables as in the former. Consider a DAG G consisting of two vertices, corresponding to random variables U and X with joint distribution p_{UX}, and an edge going from the former to the latter. We set J=\{1\}, \operatorname{ind}(X)=\{1\}, and M_{1}=M as the message set. A visualization of this simple multiplex Bayesian network (G,(U,X),M,\operatorname{ind}) is given in Fig. 2.
By running Algorithm 1 n times, we obtain codewords which we can identify as \tilde{U}^{n} and \tilde{X}^{n}(m). Conditioned on \tilde{U}^{n}, it is clear that for each m\in M, \tilde{X}^{n}(m)\sim\bigotimes_{i=1}^{n}p_{X|U=\tilde{U}_{i}}. Next, choose the ancestral subgraph to be all of G, the family of quantum states to be the classical states
\displaystyle\left\{\rho_{\tilde{Y}}^{(u,x)}\equiv\sum_{\tilde{y}\in\mathcal{Y}}p_{Y|UX}(\tilde{y}|u,x)\ket{\tilde{y}}\bra{\tilde{y}}_{\tilde{Y}}\right\}_{u\in\mathcal{U},x\in\mathcal{X}}
and the decoding subset D=\{1\}, corresponding to M. We see that if we consider the entire system consisting of \tilde{U}^{n}, \tilde{X}^{n}(m), and \bigotimes_{i=1}^{n}\rho_{\tilde{Y}_{i}}^{(\tilde{U}_{i},\tilde{X}_{i}(m^{\prime}))} for m^{\prime}\neq m, it is clear that \tilde{X}^{n}(m) is conditionally independent of \tilde{Y}^{n} given \tilde{U}^{n}, due to the conditional independence of \tilde{X}^{n}(m) and \tilde{X}^{n}(m^{\prime}) given \tilde{U}^{n}. By Lemma 3, we obtain a POVM \{Q_{\tilde{Y}^{n}}^{(m)}\}_{m\in M} such that, for all m\in M,
\displaystyle\lim_{n\to\infty}\mathbb{E}_{C^{n}}\left[\operatorname{tr}\left[\left(I-Q_{\tilde{Y}^{n}}^{(m)}\right)\bigotimes_{i=1}^{n}\rho_{\tilde{Y}_{i}}^{(\tilde{u}_{i},x_{i}(m))}\right]\right]=0 
provided R<I(X;Y|U), which is analogous to Lemma 1 if we “identify” the POVM measurement with the typicality test.
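As a concrete illustration of the codeword structure used in this reduction, the following Python sketch samples a shared sequence \tilde{U}^{n} and codewords \tilde{X}^{n}(m) that are conditionally i.i.d. given \tilde{U}^{n}, exactly as Lemma 1 requires. The alphabets and distributions are illustrative choices, and this is not the paper's Algorithm 1 itself:

```python
import random

# Sketch of the i.i.d. codebook structure in Lemma 1: one shared sequence
# U^n, and codewords X^n(m), each drawn conditionally i.i.d. from p_{X|U}
# given U^n. Alphabets and distributions below are illustrative assumptions.
random.seed(0)

n = 8                       # blocklength
num_msgs = 4                # number of messages (2^{nR})
p_U = {0: 0.5, 1: 0.5}      # p_U (assumed)
p_X_given_U = {0: {0: 0.9, 1: 0.1},   # p_{X|U=0} (assumed)
               1: {0: 0.2, 1: 0.8}}   # p_{X|U=1} (assumed)

def sample(pmf):
    """Draw one symbol from a dict-valued pmf by inverse-CDF sampling."""
    r, acc = random.random(), 0.0
    for sym, p in pmf.items():
        acc += p
        if r < acc:
            return sym
    return sym

# Shared sequence U^n, common to every codeword.
u_seq = [sample(p_U) for _ in range(n)]

# Codewords X^n(m): mutually dependent only through U^n, i.e. each is
# conditionally independent of everything else given U^n.
codebook = {m: [sample(p_X_given_U[u]) for u in u_seq]
            for m in range(num_msgs)}

assert len(codebook) == num_msgs and all(len(c) == n for c in codebook.values())
```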
IV Application to the Classical-Quantum Relay Channel
To illustrate the wide applicability of Lemma 2 and demonstrate its power, we will use it to prove achievability results for the classical-quantum relay channel. The first three results make use of the packing lemma in situations where the number of random variables involved in the decoding is at most two (\left|V_{H}\right|\leq 2). This situation can be dealt with using existing techniques savov2012partial (). The final partial decode-forward lower bound, however, applies the packing lemma with \left|V_{H}\right| unbounded with increasing blocklength, thus requiring its full strength. These lower bounds are well-known for classical relay channels elgamal2011network (), and our packing lemma allows us to straightforwardly generalize them to the quantum and even finite blocklength case.^{6}^{6}6Note that in this case the one-shot capacity reduces to the point-to-point scenario, as the relay lags behind the sender. We can then invoke Lemma 3 to obtain lower bounds on the capacity, which match exactly those of the classical setting with the quantum generalization of mutual information. Note that the partial decode-forward asymptotic bound for the classical-quantum relay channel was first established in savov2012partial ().
First, we give some definitions. Recall that a classical-quantum relay channel savov2012partial (); jin2012lower () is a classical-quantum channel \mathcal{N} with two classical inputs X_{1},X_{2} and two quantum outputs B_{2}B_{3}:
\mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}}\colon\mathcal{X}_{1}\times\mathcal{X}_{2}\to\mathcal{H}_{B_{2}}\otimes\mathcal{H}_{B_{3}},\quad(x_{1},x_{2})\mapsto\rho_{B_{2}B_{3}}^{(x_{1}x_{2})}. 
The sender transmits X_{1}, the relay transmits X_{2} and obtains B_{2}, and the receiver obtains B_{3}. The setup is shown in Fig. 3. Note that this is much more general than the setting of two concatenated channels, because the relay’s transmission also affects the system that the relay obtains, and the sender’s transmission affects the system that the receiver obtains.
We now define what comprises a general code for the classical-quantum relay channel. Let n\in\mathbb{N} and R\in\mathbb{R}_{\geq 0}. An (n,2^{nR}) code for a classical-quantum relay channel \mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}}, for n uses of the channel and 2^{nR} messages, consists of

A message set M with cardinality 2^{nR}.

An encoding x_{1}^{n}(m)\in\mathcal{X}_{1}^{n} for each m\in M.

A relay encoding and decoding \mathcal{R}_{(B_{2})_{j-1}(B_{2}^{\prime})_{j-1}\to(X_{2})_{j}(B_{2}^{\prime})_{j}} for j\in[n]. Here, (B_{2})_{j} is isomorphic to B_{2} and (X_{2})_{j} isomorphic to X_{2}, while (B_{2}^{\prime})_{j} is some arbitrary quantum system. The relay starts with a trivial (one-dimensional) quantum system (B_{2})_{0}(B_{2}^{\prime})_{0}.

A receiver decoding POVM \{Q_{B_{3}^{n}}^{(m)}\}_{m\in M}.
On round j, the sender transmits (x_{1})_{j}(m) while the relay applies \mathcal{R}_{(B_{2})_{j-1}(B_{2}^{\prime})_{j-1}\to(X_{2})_{j}(B_{2}^{\prime})_{j}}^{7}^{7}7Here \mathcal{R}_{(B_{2})_{j-1}(B_{2}^{\prime})_{j-1}\to(X_{2})_{j}(B_{2}^{\prime})_{j}} has label j that we will not write explicitly since the systems X_{2}, B_{2} and B_{2}^{\prime} are already labeled. to their (B_{2})_{j-1}(B_{2}^{\prime})_{j-1} system and transmits the (X_{2})_{j} state while keeping the (B_{2}^{\prime})_{j} system. After the completion of n rounds, the receiver applies the decoding POVM \left\{Q^{(m)}_{B_{3}^{n}}\right\}_{m\in M} to their received state \rho_{B_{3}^{n}}(m) to obtain their estimate for the message. See Fig. 4 for a visualization of a protocol with n=3 rounds.
The average probability of error of a general protocol is given by
\displaystyle p_{e}=\frac{1}{\left|M\right|}\sum_{m\in M}\operatorname{tr}\left[\left(I-Q_{B_{3}^{n}}^{(m)}\right)\rho_{B_{3}^{n}}(m)\right]. 
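The average error probability is a simple trace expression, which the following toy numerical check makes concrete for |M|=2 messages on a single qubit output. The states and POVM here are illustrative assumptions, not derived from any channel:

```python
import numpy as np

# Toy check of p_e = (1/|M|) * sum_m tr[(I - Q^{(m)}) rho(m)] with |M| = 2.
rho = {0: np.diag([0.9, 0.1]),      # rho_{B_3^n}(m=0) (assumed)
       1: np.diag([0.2, 0.8])}      # rho_{B_3^n}(m=1) (assumed)
Q = {0: np.diag([1.0, 0.0]),        # POVM element for m = 0
     1: np.diag([0.0, 1.0])}        # POVM element for m = 1

# Sanity check: POVM elements sum to the identity.
assert np.allclose(Q[0] + Q[1], np.eye(2))

I = np.eye(2)
p_e = sum(np.trace((I - Q[m]) @ rho[m]).real for m in rho) / len(rho)
print(p_e)  # ≈ 0.15: errors 0.1 and 0.2 for the two messages, averaged
```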
In the protocols we give below, we use random codebooks. We can derandomize in the usual way to conform to the above definition of a code. Furthermore, in our protocols the relay only leaves behind a classical system when decoding. Since our relay channels are classical-quantum, it is not clear that this is suboptimal.
Given R\in\mathbb{R}_{\geq 0},\,n\in\mathbb{N},\,\delta\in[0,1], we say that the triple (R,n,\delta) is achievable for a relay channel if there exists an (n,2^{nR^{\prime}}) code such that
\displaystyle R^{\prime}\geq R\quad\mathrm{and}\quad p_{e}\leq{\delta}. 
The capacity of the classicalquantum relay channel \mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}} is then defined as
\displaystyle C(\mathcal{N})\equiv\lim_{\delta\to 0}\liminf_{n\to\infty}\sup\left\{R:\text{$(R,n,\delta)$ is achievable for $\mathcal{N}$}\right\}. 
Now, before looking at specific coding schemes, we first give a general upper bound, a direct generalization of the cutset bound for the classical relay channel:
Proposition 4 (Cutset Bound).
Given a classicalquantum relay channel \mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}}, its capacity is bounded from above by
\displaystyle C(\mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}})\leq\max_{p_{X_{1}X_{2}}}\min\left\{I(X_{1}X_{2};B_{3}),I(X_{1};B_{2}B_{3}|X_{2})\right\}.  (11) 
Proof.
See Appendix A. ∎
For some special relay channels, this along with some of the lower bounds proven below will be sufficient to determine the capacity.
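For intuition on the two terms in the cutset bound, the following sketch evaluates them for a toy *classical* relay channel (commuting, diagonal outputs), in which the relay hears the sender perfectly (B_2 = X_1) and the receiver hears the relay perfectly (B_3 = X_2). The channel, input distributions, and the claim that both cuts then give one bit are illustrative assumptions of this example, not results from the text:

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a probability array (any shape)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Toy noiseless two-hop channel: B2 = X1, B3 = X2, uniform inputs (assumed).
p_x1 = np.array([0.5, 0.5])
p_x2 = np.array([0.5, 0.5])
joint = np.zeros((2, 2, 2, 2))            # axes: x1, x2, b2, b3
for x1 in range(2):
    for x2 in range(2):
        joint[x1, x2, x1, x2] = p_x1[x1] * p_x2[x2]

# I(X1X2; B3) = H(X1X2) + H(B3) - H(X1X2B3)
I_cut1 = (H(joint.sum(axis=(2, 3))) + H(joint.sum(axis=(0, 1, 2)))
          - H(joint.sum(axis=2)))
# I(X1; B2B3 | X2) = H(X1X2) + H(X2B2B3) - H(X1X2B2B3) - H(X2)
I_cut2 = (H(joint.sum(axis=(2, 3))) + H(joint.sum(axis=0))
          - H(joint) - H(joint.sum(axis=(0, 2, 3))))

cutset = min(I_cut1, I_cut2)
print(cutset)  # 1.0 bit: each noiseless hop carries exactly one bit per use
```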
IV.1 Multihop Scheme
The multihop lower bound is obtained by a simple two-step process where the sender transmits the message to the relay and the relay then transmits it to the receiver. That is, the relay simply “relays” the message. The protocol we give below is exactly analogous to the classical case elgamal2011network (), right down to the structure of the codebook. The only difference is that the channel outputs a quantum state and the decoding uses a POVM measurement.
Consider a relay channel
\mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}}\colon\mathcal{X}_{1}\times\mathcal{X}_{2}\to\mathcal{H}_{B_{2}}\otimes\mathcal{H}_{B_{3}},\quad(x_{1},x_{2})\mapsto\rho_{B_{2}B_{3}}^{(x_{1}x_{2})}. 
Let R\geq 0, b\in\mathbb{N}, \varepsilon\in(0,1), where b is the number of blocks. Again, R will be the log of the size of the message set and b the number of relay uses, while \varepsilon will be the small parameter input to Lemma 2. We will show that we can achieve the triple (\frac{b-1}{b}R,b,\delta) for some \delta that is a function of R,b,\varepsilon. Let p_{X_{1}},p_{X_{2}} be probability distributions over \mathcal{X}_{1},\mathcal{X}_{2}, respectively. Throughout, we will use
\rho_{X_{1}X_{2}B_{2}B_{3}}\equiv\sum_{x_{1},x_{2}}p_{X_{1}}(x_{1})p_{X_{2}}(x_{2})\ket{x_{1}x_{2}}\bra{x_{1}x_{2}}_{X_{1}X_{2}}\otimes\rho_{B_{2}B_{3}}^{(x_{1}x_{2})}. 
We also define \rho_{B_{3}}^{(x_{2})}\equiv\sum_{x_{1}}p_{X_{1}}(x_{1})\rho_{B_{3}}^{(x_{1}x_{2})} to be the reduced state on B_{3} induced by tracing out X_{1}B_{2} and fixing X_{2}. We will use random coding with a one-shot block Markov scheme.
Code: Throughout, j\in[b]. Let G be a graph with 2b vertices corresponding to independent random variables (X_{1})_{j}\sim p_{X_{1}},(X_{2})_{j}\sim p_{X_{2}}. Since all the random variables are independent, there are no edges. Furthermore, let M_{0},M_{j} be index sets, where \left|M_{0}\right|=1 and \left|M_{j}\right|=2^{R}. That is, J=[0:b]. The M_{j} will be the sets from which the messages for each round will be taken. We use the single-element message set M_{0} to make the effect of the first and last blocks more explicit. Finally, the function \operatorname{ind} maps (X_{1})_{j} to \{j\} and (X_{2})_{j} to \{j-1\}. Then, letting X\equiv X_{1}^{b}X_{2}^{b} and M\equiv\bigtimes_{j=0}^{b}M_{j}, \mathcal{B}\equiv(G,X,M,\operatorname{ind}) is a multiplex Bayesian network. See Fig. 5 for a visualization when b=3.
Now, run Algorithm 1 with \mathcal{B} as the argument. This will return a random codebook
\displaystyle C=\bigcup_{j=1}^{b}\left\{(x_{1})_{j}(m_{j}),(x_{2})_{j}(m_{j-1})\right\}_{m_{j}\in M_{j},m_{j-1}\in M_{j-1}}, 
where we restricted to the message components the codewords are dependent on via \operatorname{ind}. For decoding we will apply Lemma 2 with this codebook and use the assortment of POVMs that are given for different ancestral subgraphs and other parameters.
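The block-Markov codebook above can be sketched in a few lines of Python. We assume (as an illustration, not as the paper's Algorithm 1) that the sampling step draws each codeword symbol i.i.d. from its variable's marginal, which suffices here since the DAG has no edges; the alphabets and distributions are placeholders:

```python
import random

# Sketch of the multihop random codebook C: for each block j, one codeword
# (x1)_j(m_j) per message in M_j and one codeword (x2)_j(m_{j-1}) per message
# in M_{j-1}, with |M_0| = 1. Symbols drawn i.i.d. (assumed sampling rule).
random.seed(1)

b = 3                 # number of blocks
R = 2                 # so |M_j| = 2^R = 4 messages per block
alphabet = [0, 1]     # illustrative input alphabet

def sample_x():
    return random.choice(alphabet)   # uniform p_{X_1} = p_{X_2} (assumed)

M = {0: [0]}                                  # single-element M_0
M.update({j: list(range(2 ** R)) for j in range(1, b + 1)})

# C = union over j of {(x1)_j(m_j), (x2)_j(m_{j-1})}
codebook = {}
for j in range(1, b + 1):
    codebook[("x1", j)] = {m: sample_x() for m in M[j]}       # depends on M_j
    codebook[("x2", j)] = {m: sample_x() for m in M[j - 1]}   # depends on M_{j-1}

assert len(codebook[("x2", 1)]) == 1          # block-1 relay codeword is fixed
assert all(len(codebook[("x1", j)]) == 2 ** R for j in range(1, b + 1))
```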
Encoding: On the jth transmission, the sender transmits a message m_{j}\in M_{j} via (x_{1})_{j}(m_{j})\in C.
Relay encoding: Set \tilde{m}_{0} to be the sole element of M_{0}. On the jth transmission, the relay sends their estimate \tilde{m}_{j-1} via (x_{2})_{j}(\tilde{m}_{j-1})\in C. Note that this is the relay’s estimate of the message m_{j-1} transmitted by the sender on the (j-1)th transmission.
Relay decoding: Consider the jth transmission. We invoke Lemma 2 with the ancestral subgraph containing the two vertices (X_{1})_{j} and (X_{2})_{j}, the set of quantum states \left\{\rho_{B_{2}}^{(x_{1}x_{2})}\right\}_{x_{1}\in\mathcal{X}_{1},x_{2}\in\mathcal{X}_{2}}, decoding subset \left\{j\right\}\subseteq\left\{j-1,j\right\}, and small parameter \varepsilon\in(0,1). The relay picks the POVM corresponding to the message estimate for the previous round \tilde{m}_{j-1}, which is denoted by \left\{Q_{B_{2}}^{(m_{j}^{\prime}|\tilde{m}_{j-1})}\right\}_{m_{j}^{\prime}\in M_{j}}. They apply this to their received state to obtain a measurement result \tilde{m}_{j}. Note that this is the relay’s estimate for message m_{j}.
Decoding: On the jth transmission, we again invoke Lemma 2 and the receiver will use the POVM corresponding to the ancestral subgraph containing just the vertex (X_{2})_{j}, the set of quantum states \left\{\rho_{B_{3}}^{(x_{2})}\right\}_{x_{2}\in\mathcal{X}_{2}}, decoding subset \{j-1\}\subseteq\left\{j-1\right\}, and small parameter \varepsilon. Note that we don’t have a message guess here since the decoding subset is not a proper subset. In this case we will suppress the conditioning for conciseness. We denote the POVM by \left\{Q_{B_{3}}^{(m_{j-1}^{\prime})}\right\}_{m_{j-1}^{\prime}\in M_{j-1}}, and the receiver applies this to their received state to obtain a measurement result \hat{m}_{j-1}. Note that this is the receiver’s estimate of the (j-1)th message. \hat{m}_{0} will trivially be the sole element of M_{0}.
Error analysis:
Set m_{0} to be the sole element of M_{0}. Fix {\bf m}\equiv(m_{0},\dots,m_{b-1}). Note that m_{b} is never decoded by the receiver since it is the message sent in the last block, and thus we can ignore it without loss of generality. Let \tilde{\bf m}\equiv(\tilde{m}_{0},\dots,\tilde{m}_{b-1}),\hat{\bf m}\equiv(\hat{m}_{0},\dots,\hat{m}_{b-1}) denote the aggregation of the message estimates of the relay and receiver, respectively. The probability of error averaged over the random codebook C is given by
p_{e}(C)=\mathbb{E}_{C}\left[p(\hat{\bf m}\neq{\bf m})\right], 
where p here denotes the probability for a fixed codebook. Now, by Eq. 2,
p_{e}(C)\leq\mathbb{E}_{C}\left[p(\tilde{\bf m}\neq{\bf m})\right]+\mathbb{E}_{C}\left[p(\hat{\bf m}\neq{\bf m}\,|\,\tilde{\bf m}={\bf m})\right].  (12) 
We consider the first term corresponding to the relay decoding. By the union bound,
\mathbb{E}_{C}\left[p(\tilde{\bf m}\neq{\bf m})\right]\leq\mathbb{E}_{C}\left[p(\tilde{m}_{0}\neq m_{0})\right]+\sum_{j=1}^{b-1}\mathbb{E}_{C}\left[p(\tilde{m}_{j}\neq m_{j}\,|\,\tilde{m}_{j-1}=m_{j-1})\right]. 
By the definition of \tilde{m}_{0}, the first term is zero. Now, we can apply Eq. 9 to bound each summand in the second term as follows:^{8}^{8}8The careful reader will notice that the conditioning on \tilde{m}_{j-1}=m_{j-1} is not necessary here since the probability of decoding m_{j} correctly at the relay is independent of whether m_{j-1} was decoded successfully. However, this will be necessary for the other schemes we give.
\displaystyle\mathbb{E}_{C}\left[p(\tilde{m}_{j}\neq m_{j}\,|\,\tilde{m}_{j-1}=m_{j-1})\right]  \displaystyle=\mathbb{E}_{C}\left[\operatorname{tr}\left[(I-Q^{(m_{j}|m_{j-1})}_{B_{2}})\rho^{((x_{1})_{j}(m_{j}),(x_{2})_{j}(m_{j-1}))}_{B_{2}}\right]\right]  
\displaystyle=\mathbb{E}_{C_{(X_{1})_{j}(X_{2})_{j}}}\left[\operatorname{tr}\left[(I-Q^{(m_{j}|m_{j-1})}_{B_{2}})\rho^{((x_{1})_{j}(m_{j}),(x_{2})_{j}(m_{j-1}))}_{B_{2}}\right]\right]  
\displaystyle\leq f(2,\varepsilon)+4\sum_{T=\left\{j\right\}}2^{R-D_{H}^{\varepsilon}(\rho_{(X_{1})_{j}(X_{2})_{j}B_{2}}\|\rho_{(X_{1})_{j}(X_{2})_{j}B_{2}}^{(\{X_{S_{T}},B_{2}\})})}  
\displaystyle=f(2,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{2}}\|\rho_{X_{1}X_{2}B_{2}}^{(\left\{X_{1},B_{2}\right\})})}, 
where C_{(X_{1})_{j}(X_{2})_{j}} is the corresponding subset of the codebook C, and we used S_{\left\{j\right\}}=\left\{(X_{1})_{j}\right\}. We dropped the index j in the last equality since (X_{1})_{j}(X_{2})_{j}\sim p_{X_{1}}\times p_{X_{2}}. Hence, overall,
\mathbb{E}_{C}\left[p(\tilde{\bf m}\neq{\bf m})\right]\leq b\left[f(2,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{2}}\|\rho_{X_{1}X_{2}B_{2}}^{(\left\{X_{1},B_{2}\right\})})}\right]. 
We now consider the second term in Eq. 12, corresponding to the receiver decoding. By the union bound,
\mathbb{E}_{C}\left[p(\hat{\bf m}\neq{\bf m}\,|\,\tilde{\bf m}={\bf m})\right]\leq\mathbb{E}_{C}\left[p(\hat{m}_{0}\neq m_{0}\,|\,\tilde{{\bf m}}={\bf m})\right]+\sum_{j=1}^{b-1}\mathbb{E}_{C}\left[p(\hat{m}_{j}\neq m_{j}\,|\,\tilde{\bf m}={\bf m})\right]. 
Again by definition, the first term vanishes. Now, the receiver on the (j+1)th transmission obtains the state \rho_{B_{3}}^{((x_{1})_{j+1}(m_{j+1}),(x_{2})_{j+1}(\tilde{m}_{j}))}. Averaging over (x_{1})_{j+1}(m_{j+1}), this becomes \rho_{B_{3}}^{((x_{2})_{j+1}(\tilde{m}_{j}))}. Hence, the summands in the second term are also bounded via Eq. 9:
\displaystyle\mathbb{E}_{C}\left[p(\hat{m}_{j}\neq m_{j}\,|\,\tilde{\bf m}={\bf m})\right]  \displaystyle=\mathbb{E}_{C}\left[\operatorname{tr}\left[(I-Q^{(m_{j})}_{B_{3}})\rho^{((x_{1})_{j+1}(m_{j+1}),(x_{2})_{j+1}(m_{j}))}_{B_{3}}\right]\right]  
\displaystyle=\mathbb{E}_{C_{(X_{1})_{j+1}(X_{2})_{j+1}}}\left[\operatorname{tr}\left[(I-Q^{(m_{j})}_{B_{3}})\rho^{((x_{1})_{j+1}(m_{j+1}),(x_{2})_{j+1}(m_{j}))}_{B_{3}}\right]\right]  
\displaystyle=\mathbb{E}_{C_{(X_{2})_{j+1}}}\left[\operatorname{tr}\left[(I-Q^{(m_{j})}_{B_{3}})\rho^{((x_{2})_{j+1}(m_{j}))}_{B_{3}}\right]\right]  
\displaystyle\leq f(1,\varepsilon)+4\sum_{T=\left\{j\right\}}2^{R-D_{H}^{\varepsilon}(\rho_{(X_{2})_{j+1}B_{3}}\|\rho_{(X_{2})_{j+1}B_{3}}^{(\{X_{S_{T}},B_{3}\})})}  
\displaystyle\leq f(1,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{2}B_{3}}\|\rho_{X_{2}B_{3}}^{(\left\{X_{2},B_{3}\right\})})}, 
where we used S_{\left\{j\right\}}=\left\{(X_{2})_{j+1}\right\} and again dropped indices in the last inequality. Hence, overall
\mathbb{E}_{C}\left[p(\hat{\bf m}\neq{\bf m}\,|\,\tilde{\bf m}={\bf m})\right]\leq b\left[f(1,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{2}B_{3}}\|\rho_{X_{2}B_{3}}^{(\left\{X_{2},B_{3}\right\})})}\right]. 
We have therefore established the following:
Proposition 5 (Multihop).
Given R\in\mathbb{R}_{\geq 0},\,\varepsilon\in(0,1),\,b\in\mathbb{N}, the triple (\frac{b-1}{b}R,b,\delta) is achievable for the classical-quantum relay channel, where^{9}^{9}9Note that we need R,\varepsilon to be sufficiently small so that \delta\in[0,1]. Otherwise we can simply take the minimum between the expression and 1. For large b, a more useful bound can be obtained by using the channel a finite number n of times for each of the b blocks.
\displaystyle\delta=b\big{[}f(1,\varepsilon)+f(2,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{2}B_{3}}\|\rho_{X_{2}B_{3}}^{(\left\{X_{2},B_{3}\right\})})}+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{2}}\|\rho_{X_{1}X_{2}B_{2}}^{(\left\{X_{1},B_{2}\right\})})}\big{]}. 
At this point it would be useful to give the explicit form of f(k,\varepsilon) for k\in\mathbb{N} from senInPrep (), and our proof of the packing lemma in Section V:
\displaystyle f(k,\varepsilon)=\varepsilon+2^{\frac{k}{2}+2}\varepsilon^{1/4}\left(2^{2^{k+3}-1/2}+1\right). 
Note that some coarse approximations are made to obtain a simple expression.
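To see how coarse these approximations are, one can evaluate f(k,\varepsilon) numerically. The sketch below hard-codes the expression f(k, eps) = eps + 2^(k/2+2) * eps^(1/4) * (2^(2^(k+3) - 1/2) + 1) (as reconstructed here; the precise constants are from Sen's bound and should be checked against senInPrep ()). The doubly exponential prefactor means \varepsilon must be extremely small before f(k,\varepsilon) becomes non-trivial:

```python
import math  # not strictly needed; powers are computed with ** directly

def f(k, eps):
    """Evaluate f(k, eps) = eps + 2^(k/2+2) * eps^(1/4) * (2^(2^(k+3)-1/2) + 1)."""
    return eps + 2 ** (k / 2 + 2) * eps ** 0.25 * (2 ** (2 ** (k + 3) - 0.5) + 1)

# f decreases monotonically as eps -> 0, but only becomes small for tiny eps,
# e.g. for k = 1 the prefactor is ~2^18, so eps must be far below 1e-20.
for eps in (1e-10, 1e-50, 1e-80):
    print(eps, f(1, eps), f(2, eps))

assert f(1, 1e-80) < f(1, 1e-50) < f(1, 1e-10)
```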
In the asymptotic limit we use the channel n/b times in each of the b blocks. The protocol will be analogous to the one-shot protocol, except that the relay channel will have a tensor product form \mathcal{N}^{\otimes(n/b)}_{X_{1}X_{2}\to B_{2}B_{3}} characterized by a family of quantum states \rho^{(x_{1}^{(n/b)}x_{2}^{(n/b)})}_{B_{2}^{(n/b)}B_{3}^{(n/b)}}. The codebook will be C^{(n/b)}, and for finite b and large n we will invoke Lemma 3 (instead of Lemma 2) to construct POVMs for the relay and the receiver such that the decoding error vanishes if R<\min\left\{I(X_{1};B_{2}|X_{2})_{\rho},I(X_{2};B_{3})_{\rho}\right\}, thereby obtaining the quantum equivalent of the classical multihop bound for sufficiently large b:^{10}^{10}10Note that our rate is \frac{b-1}{b}R. To achieve rate R we need \frac{b-1}{b}\to 1, and so we take the large n limit followed by the large b limit.
C\geq\max_{p_{X_{1}}p_{X_{2}}}\min\left\{I(X_{1};B_{2}|X_{2})_{\rho},I(X_{2};B_{3})_{\rho}\right\}.  (13) 
IV.2 Coherent Multihop Scheme
In the multihop scheme, we obtained a rate optimized over product distributions, specifically Eq. 13. For the coherent multihop scheme we will obtain the same rate, except optimized over all joint distributions p_{X_{1}X_{2}}, by conditioning codewords on each other.
Again, let R\geq 0 be our rate, \varepsilon\in(0,1), and total blocklength b\in\mathbb{N}. We will show that we can achieve the triple (\frac{b-1}{b}R,b,\delta) for some \delta that is a function of R,b,\varepsilon. Let p_{X_{1}X_{2}} be a probability distribution over \mathcal{X}_{1}\times\mathcal{X}_{2}. Throughout, we will use
\rho_{X_{1}X_{2}B_{2}B_{3}}\equiv\sum_{x_{1},x_{2}}p_{X_{1}X_{2}}(x_{1},x_{2})\ket{x_{1}x_{2}}\bra{x_{1}x_{2}}_{X_{1}X_{2}}\otimes\rho_{B_{2}B_{3}}^{(x_{1}x_{2})}. 
We also again define \rho_{B_{3}}^{(x_{2})}\equiv\sum_{x_{1}}p_{X_{1}|X_{2}}(x_{1}|x_{2})\rho_{B_{3}}^{(x_{1}x_{2})} to be the reduced state on B_{3} obtained by tracing out X_{1}B_{2} and fixing X_{2}. Our coding scheme will be very similar to that of the multihop.
Code: Let G be a graph with 2b vertices corresponding to random variables (X_{1})_{j}(X_{2})_{j}\sim p_{X_{1}X_{2}}, independent of the other pairs, with edges from (X_{2})_{j} to (X_{1})_{j}. Furthermore, let M_{0},M_{j} be index sets, where \left|M_{0}\right|=1 and \left|M_{j}\right|=2^{R}. Finally, the function \operatorname{ind} maps (X_{1})_{j} to \{j\} and (X_{2})_{j} to \{j-1\}. Then, letting X\equiv X_{1}^{b}X_{2}^{b} and M\equiv\bigtimes_{j=0}^{b}M_{j}, it is easy to see that \mathcal{B}\equiv(G,X,M,\operatorname{ind}) is a multiplex Bayesian network. See Fig. 6 for a visualization when b=3.
Now, run Algorithm 1 with \mathcal{B} as the argument. This will return a random codebook
\displaystyle C=\bigcup_{j=1}^{b}\left\{(x_{1})_{j}(m_{j-1},m_{j}),(x_{2})_{j}(m_{j-1})\right\}_{m_{j}\in M_{j},m_{j-1}\in M_{j-1}}, 
where we restricted to the message components the codewords are dependent on via \operatorname{ind}. For decoding we will apply Lemma 2 with this codebook and use the assortment of POVMs that are given for different ancestral subgraphs and other parameters.
Encoding: Set m_{0} to be the sole element of M_{0}. On the jth transmission, the sender transmits a message m_{j}\in M_{j} via (x_{1})_{j}(m_{j-1},m_{j})\in C.
Relay encoding: Same as multihop.
Relay decoding: Same as multihop.^{11}^{11}11Note, however, that the POVM the relay uses from Lemma 2 will not be the same as that of the multihop case since the multiplex Bayesian networks are not the same.
Decoding: Same as multihop.
Error analysis:
With an analysis essentially identical to that of the multihop protocol we arrive at the following.
Proposition 6 (Coherent Multihop).
Given R\in\mathbb{R}_{\geq 0},\,\varepsilon\in(0,1),\,b\in\mathbb{N}, the triple (\frac{b-1}{b}R,b,\delta) is achievable for the classical-quantum relay channel, where
\displaystyle\delta=b\big{[}f(1,\varepsilon)+f(2,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{2}B_{3}}\|\rho_{X_{2}B_{3}}^{(\left\{X_{2},B_{3}\right\})})}+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{2}}\|\rho_{X_{1}X_{2}B_{2}}^{(\left\{X_{1},B_{2}\right\})})}\big{]}. 
Asymptotically, this vanishes if R<\min\left\{I(X_{1};B_{2}|X_{2})_{\rho},I(X_{2};B_{3})_{\rho}\right\}, yielding the quantum equivalent of the coherent multihop bound for sufficiently large b:
C\geq\max_{p_{X_{1}X_{2}}}\min\left\{I(X_{1};B_{2}|X_{2})_{\rho},I(X_{2};B_{3})_{\rho}\right\}. 
IV.3 Decode-Forward Scheme
In the decode-forward protocol we make an incremental improvement on the coherent multihop protocol by letting the receiver’s decoding also involve X_{1}.
Again, let R\geq 0 be our rate, \varepsilon\in(0,1), and total blocklength b\in\mathbb{N}. The classical-quantum state \rho_{X_{1}X_{2}B_{2}B_{3}} is identical to that of the coherent multihop scenario.
Code: The codebook is generated in the same way as in the coherent multihop protocol, except that the index set M_{b} has cardinality 1 to take into account boundary effects for the backward decoding protocol^{12}^{12}12In elgamal2011network () multiple decoding protocols are given. We here give the quantum generalization of the backward decoding protocol. we will implement.
Encoding: Set m_{0} to be the sole element of M_{0}. On the jth transmission, the sender transmits the message m_{j}\in M_{j} via (x_{1})_{j}(m_{j-1},m_{j})\in C. Note that there is only one message m_{b}\in M_{b} they can choose on the bth round.
Relay encoding: Same as that of coherent multihop.
Relay decoding: Same as that of coherent multihop. However, note that on the bth round, since \left|M_{b}\right|=1, the decoding is trivial and the estimate \tilde{m}_{b} will be the sole element of M_{b}.
Decoding: The receiver waits until all b transmissions are finished. Then, they implement a backward decoding protocol, that is, starting with the last system they obtained. Set \hat{m}_{b} to be the sole element of M_{b}. On the jth system they use the POVM corresponding to the ancestral subgraph containing vertices (X_{1})_{j} and (X_{2})_{j}, the set of quantum states \left\{\rho_{B_{3}}^{(x_{1}x_{2})}\right\}_{x_{1}\in\mathcal{X}_{1},x_{2}\in\mathcal{X}_{2}}, decoding subset \{j-1\}\subseteq\left\{j-1,j\right\}, and small parameter \varepsilon. We denote the POVM by \left\{Q_{B_{3}}^{(m_{j-1}^{\prime}|\hat{m}_{j})}\right\}_{m_{j-1}^{\prime}\in M_{j-1}}, where we use the estimate \hat{m}_{j}, and the obtained measurement result is \hat{m}_{j-1}. Note that trivially \hat{m}_{0} is the sole element of M_{0}.
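The backward decoding order can be sketched as follows. This toy simulation only illustrates the ordering (decode block b first, then condition each earlier measurement on the later estimate); the per-block "measurement" here is a hypothetical noiseless lookup, not the POVM construction from Lemma 2, and all message values are arbitrary placeholders:

```python
# Backward decoding sketch: blocks are processed j = b-1, ..., 0, and the
# measurement for block j+1 is conditioned on the later estimate hat{m}_{j+1}.
b = 4
messages = {j: (j * 7) % 5 for j in range(b + 1)}   # arbitrary m_1..m_{b-1}
messages[b] = 0                                      # |M_b| = 1: m_b is fixed
messages[0] = 0                                      # |M_0| = 1: m_0 is fixed

# In decode-forward, the block-(j+1) observation is determined by (m_j, m_{j+1}).
observations = {j + 1: (messages[j], messages[j + 1]) for j in range(b)}

def measure(obs, conditioning):
    # Stand-in for the POVM {Q^{(m'_{j-1} | hat m_j)}}: noiseless recovery of
    # m_{j-1}, valid only when the conditioning estimate hat m_j is correct.
    m_prev, m_cur = obs
    assert conditioning == m_cur
    return m_prev

hat = {b: 0}                                         # hat m_b is fixed
for j in range(b - 1, -1, -1):                       # backward pass
    hat[j] = measure(observations[j + 1], hat[j + 1])

assert all(hat[j] == messages[j] for j in range(b + 1))
```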
Error analysis: Fix some {\bf m}=(m_{0},\dots,m_{b})\in M. Let \tilde{\bf m}=(\tilde{m}_{0},\dots,\tilde{m}_{b}),\hat{\bf m}=(\hat{m}_{0},\dots,\hat{m}_{b}) denote the aggregation of the message estimates of the relay and receiver, respectively. Then, the probability of error averaged over C is given by
p_{e}(C)=\mathbb{E}_{C}\left[p(\hat{\bf m}\neq{\bf m})\right]. 
Again, by the bound in Eq. 2,
p_{e}(C)\leq\mathbb{E}_{C}\left[p(\tilde{\bf m}\neq{\bf m})\right]+\mathbb{E}_{C}\left[p(\hat{\bf m}\neq{\bf m}\,|\,\tilde{\bf m}={\bf m})\right]. 
The bound on the first term is identical to that of the coherent multihop protocol and is given by
\mathbb{E}_{C}\left[p(\tilde{\bf m}\neq{\bf m})\right]\leq b\left[f(2,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{2}}\|\rho_{X_{1}X_{2}B_{2}}^{(\left\{X_{1},B_{2}\right\})})}\right]. 
For the second term, we first apply the union bound:
\mathbb{E}_{C}\left[p(\hat{\bf m}\neq{\bf m}\,|\,\tilde{\bf m}={\bf m})\right]\leq\mathbb{E}_{C}\left[\sum_{j=1}^{b-1}p(\hat{m}_{j}\neq m_{j}\,|\,\hat{m}_{j+1}=m_{j+1}\land\tilde{\bf m}={\bf m})\right], 
where we take into account that the terms corresponding to 0 and b vanish by definition. Each of the summands can be bounded via Lemma 2:
\displaystyle\mathbb{E}_{C}\left[p(\hat{m}_{j}\neq m_{j}\,|\,\hat{m}_{j+1}=m_{j+1}\land\tilde{\bf m}={\bf m})\right]  
\displaystyle=\mathbb{E}_{C}\left[\operatorname{tr}\left[(I-Q^{(m_{j}|m_{j+1})}_{B_{3}})\rho^{((x_{1})_{j+1}(m_{j},m_{j+1}),(x_{2})_{j+1}(m_{j}))}_{B_{3}}\right]\right]  
\displaystyle=\mathbb{E}_{C_{(X_{1})_{j+1}(X_{2})_{j+1}}}\left[\operatorname{tr}\left[(I-Q^{(m_{j}|m_{j+1})}_{B_{3}})\rho^{((x_{1})_{j+1}(m_{j},m_{j+1}),(x_{2})_{j+1}(m_{j}))}_{B_{3}}\right]\right]  
\displaystyle\leq f(2,\varepsilon)+4\sum_{T=\{j\}}2^{R-D_{H}^{\varepsilon}(\rho_{(X_{1})_{j+1}(X_{2})_{j+1}B_{3}}\|\rho_{(X_{1})_{j+1}(X_{2})_{j+1}B_{3}}^{(\{X_{S_{T}},B_{3}\})})}  
\displaystyle\leq f(2,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{3}}\|\rho_{X_{1}X_{2}B_{3}}^{(\left\{X_{1}X_{2},B_{3}\right\})})}, 
where we use that S_{\{j\}}=\{(X_{1})_{j+1}(X_{2})_{j+1}\}. Hence, we conclude that
\mathbb{E}_{C}\left[p(\hat{\bf m}\neq{\bf m}\,|\,\tilde{\bf m}={\bf m})\right]\leq b\left[f(2,\varepsilon)+4\times 2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{3}}\|\rho_{X_{1}X_{2}B_{3}}^{(\left\{X_{1}X_{2},B_{3}\right\})})}\right]. 
We conclude the following.
Proposition 7 (Decode-Forward).
Given R\in\mathbb{R}_{\geq 0},\,\varepsilon\in(0,1),\,b\in\mathbb{N}, the triple (\frac{b-1}{b}R,b,\delta) is achievable for the classical-quantum relay channel, where
\displaystyle\delta=b\left[2f(2,\varepsilon)+4\times\left(2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{3}}\|\rho_{X_{1}X_{2}B_{3}}^{(\left\{X_{1}X_{2},B_{3}\right\})})}+2^{R-D_{H}^{\varepsilon}(\rho_{X_{1}X_{2}B_{2}}\|\rho_{X_{1}X_{2}B_{2}}^{(\left\{X_{1},B_{2}\right\})})}\right)\right]. 
Asymptotically, this vanishes if R<\min\left\{I(X_{1};B_{2}|X_{2})_{\rho},I(X_{1}X_{2};B_{3})_{\rho}\right\}, thereby obtaining the decode-forward lower bound for sufficiently large b:
C\geq\max_{p_{X_{1}X_{2}}}\min\left\{I(X_{1};B_{2}|X_{2})_{\rho},I(X_{1}X_{2};B_{3})_{\rho}\right\}. 
IV.4 Partial Decode-Forward Scheme
We now derive the partial decode-forward lower bound. This will actually require the full power of Lemma 2, as the receiver will decode all the messages simultaneously by performing a joint measurement on all b blocks. Intuitively, partial decode-forward builds on decode-forward by letting the relay only decode and pass on a part of the overall message, which we will call P.
We will split the message into two parts P and Q with respective rates R_{p},R_{q}\geq 0. Let \varepsilon\in(0,1) and b\in\mathbb{N} be the total blocklength. Choose some distribution p_{X_{1}X_{2}} but also a random variable U correlated with X_{1}X_{2} so that the overall distribution is p_{UX_{1}X_{2}}. The classicalquantum state of interest will be
\rho_{UX_{1}X_{2}B_{2}B_{3}}\equiv\sum_{u,x_{1},x_{2}}p_{UX_{1}X_{2}}(u,x_{1},% x_{2})\ket{ux_{1}x_{2}}\bra{ux_{1}x_{2}}_{UX_{1}X_{2}}\otimes\rho_{B_{2}B_{3}}% ^{(x_{1}x_{2})}.  (14) 
Note that \rho_{B_{2}B_{3}}^{(x_{1}x_{2})} does not depend on u, but sometimes we will write \rho_{B_{2}B_{3}}^{(ux_{1}x_{2})}(=\rho_{B_{2}B_{3}}^{(x_{1}x_{2})}) to keep notation simple. However, if we trace over X_{1}, we induce a u dependence via the correlation between U and X_{1}X_{2}:
\rho_{UX_{2}B_{2}B_{3}}=\sum_{u,x_{2}}p_{UX_{2}}(u,x_{2})\ket{ux_{2}}\bra{ux_{2}}_{UX_{2}}\otimes\rho_{B_{2}B_{3}}^{(ux_{2})}, 
where
\rho_{B_{2}B_{3}}^{(ux_{2})}\equiv\sum_{x_{1}}p_{X_{1}|UX_{2}}(x_{1}|u,x_{2})\rho_{B_{2}B_{3}}^{(x_{1}x_{2})}. 
This will be important for the relay decoding.
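The induced u-dependence is easy to verify numerically. The sketch below forms \rho^{(u,x_{2})}=\sum_{x_{1}}p(x_{1}|u,x_{2})\,\rho^{(x_{1}x_{2})} for an illustrative binary example (the states and the conditional distribution are assumptions), and checks that tracing out X_{1} leaves a genuinely u-dependent mixture even though each \rho^{(x_{1}x_{2})} is u-independent:

```python
import numpy as np

# Illustrative relay outputs rho_{B2}^{(x1 x2)} (assumed, diagonal qubits).
rho_b2 = {(0, 0): np.diag([1.0, 0.0]),
          (1, 0): np.diag([0.0, 1.0]),
          (0, 1): np.diag([0.7, 0.3]),
          (1, 1): np.diag([0.3, 0.7])}

# p_{X1|U,X2}: U biases X1 (assumed correlation); keys are (u, x2).
p_x1_given = {(0, 0): [0.9, 0.1], (1, 0): [0.1, 0.9],
              (0, 1): [0.8, 0.2], (1, 1): [0.2, 0.8]}

def induced(u, x2):
    """rho^{(u, x2)} = sum_{x1} p(x1 | u, x2) * rho^{(x1 x2)}."""
    p = p_x1_given[(u, x2)]
    return sum(p[x1] * rho_b2[(x1, x2)] for x1 in range(2))

for u in range(2):
    for x2 in range(2):
        rho = induced(u, x2)
        assert abs(np.trace(rho) - 1.0) < 1e-12          # valid density matrix
assert not np.allclose(induced(0, 0), induced(1, 0))     # genuine u-dependence
```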
Code: Let G be a graph with 3b vertices corresponding to random variables (U)_{j}(X_{1})_{j}(X_{2})_{j}\sim p_{UX_{1}X_{2}}. The graph has edges going from (X_{2})_{j} to (U)_{j} and from (U)_{j} to (X_{1})_{j} for all j, and no edges going across blocks with different j’s. Furthermore, let P_{0},P_{j} and Q_{j} be index sets, so that J=[0:b]\sqcup[b], where \left|P_{0}\right|=\left|P_{b}\right|=\left|Q_{b}\right|=1, and \left|P_{j}\right|=2^{R_{p}} and \left|Q_{j}\right|=2^{R_{q}} otherwise. Finally, the function \operatorname{ind} maps (X_{1})_{j} to^{13}^{13}13For convenience we will denote the elements of J by the index sets they correspond to. \{P_{j},Q_{j},P_{j-1}\}, (U)_{j} to \{P_{j},P_{j-1}\}, and (X_{2})_{j} to \{P_{j-1}\}. Then, letting X\equiv U^{b}X_{1}^{b}X_{2}^{b}, M_{p}=\bigtimes_{j=0}^{b}P_{j}, M_{q}=\bigtimes_{j=1}^{b}Q_{j} and M=M_{p}\times M_{q}, it is easy to see that \mathcal{B}\equiv(G,X,M,\operatorname{ind}) is a multiplex Bayesian network. See Fig. 7 for a visualization when b=3.
Now, run Algorithm 1 with \mathcal{B} as the argument. This will return a random codebook
\displaystyle C=\bigcup_{j=1}^{b}\left\{(x_{1})_{j}(p_{j-1},p_{j},q_{j}),(u)_{j}(p_{j-1},p_{j}),(x_{2})_{j}(p_{j-1})\right\}_{p_{j}\in P_{j},p_{j-1}\in P_{j-1},q_{j}\in Q_{j}}, 
where we restricted to the message components the codewords are dependent on via \operatorname{ind}. For decoding we will apply Lemma 2 with this codebook and use the assortment of POVMs that are given for different ancestral subgraphs and other parameters.
Encoding: Set p_{0} to be the sole element of P_{0}. On the jth transmission, the sender transmits the two-part message (p_{j},q_{j})\in P_{j}\times Q_{j} via (x_{1})_{j}(p_{j-1},p_{j},q_{j})\in C. Note that on the bth transmission the sender has to send a fixed message (p_{b},q_{b}), the sole element of P_{b}\times Q_{b}.
Relay encoding: Let \tilde{p}_{0} be the sole element of P_{0}. On the jth transmission, the relay sends \tilde{p}_{j-1} via (x_{2})_{j}(\tilde{p}_{j-1}) from codebook C. Note that this is the relay’s estimate of the p-part of the message sent by the sender on the (j-1)th transmission.
Relay decoding: The relay will try to recover the p-part of the sender’s message using the same technique as in the previous protocols. On the jth transmission the relay will use the POVM corresponding to the ancestral subgraph containing the two vertices (U)_{j} and (X_{2})_{j}, the set of quantum states \left\{\rho_{B_{2}}^{(ux_{2})}\right\}_{u\in\mathcal{U},x_{2}\in\mathcal{X}_{2}}, decoding subset \left\{P_{j}\right\}\subseteq\left\{P_{j-1},P_{j}\right\}, and small parameter \varepsilon. The POVM is denoted by \Big{\{}Q_{B_{2}}^{(p_{j}^{\prime}|\tilde{p}_{j-1})}\Big{\}}_{p_{j}^{\prime}\in P_{j}}, where we use the estimate \tilde{p}_{j-1}, and the relay applies this to their received state to obtain a measurement result \tilde{p}_{j}. Note that \tilde{p}_{b} is trivially the sole element of P_{b}.
Decoding: The decoder waits until all b transmissions are completed. The receiver will use the POVM corresponding to the ancestral subgraph being the entire graph G, the set of quantum states \left\{\bigotimes_{j=1}^{b}\rho_{B_{3}}^{((u)_{j}(x_{1})_{j}(x_{2})_{j})}\right\}, where the (u)_{j} dependence here is trivial, decoding set \bigtimes_{j=1}^{b-1}P_{j}\times\bigtimes_{j=1}^{b-1}Q_{j}, and small parameter \varepsilon. We denote the POVM by^{14}^{14}14Since the only index sets which are not included in the part to be decoded are all of cardinality 1, we omit the conditioning for conciseness. \left\{Q^{(p^{\prime}_{1}p^{\prime}_{2}\cdots p^{\prime}_{b-1},q_{1}^{\prime}q_{2}^{\prime}\cdots q_{b-1}^{\prime})}_{B^{b}_{3}}\right\}_{p_{j}^{\prime}\in P_{j},q_{j}^{\prime}\in Q_{j}}, and the receiver applies it to their received state on B_{3}^{b} to obtain their estimate of the entire string of messages, which we call \hat{m}_{p}\equiv(\hat{p}_{0},\dots,\hat{p}_{b}),\hat{m}_{q}\equiv(\hat{q}_{1},\dots,\hat{q}_{b}), where \hat{p}_{0},\hat{p}_{b},\hat{q}_{b} are set to be the sole elements of the respective index sets.
Error analysis:
We fix the strings of messages m_{p}=(p_{0},\dots,p_{b}) and m_{q}=(q_{1},\dots,q_{b}). By the bound in Eq. 2,
p_{e}(C)\equiv\mathbb{E}_{C}[p(\hat{m}_{p}\hat{m}_{q}\neq m_{p}m_{q})]\leq\mathbb{E}_{C}[p(\tilde{m}_{p}\neq m_{p})]+\mathbb{E}_{C}[p(\hat{m}_{p}\hat{m}_{q}\neq m_{p}m_{q}\,|\,\tilde{m}_{p}=m_{p})]. 
We can bound the first term just as we did for the other protocols. First, use the union bound:
\mathbb{E}_{C}[p(\tilde{m}_{p}\neq m_{p})]\leq\sum_{j=1}^{b-1}\mathbb{E}_{C}[p(\tilde{p}_{j}\neq p_{j}|\tilde{p}_{j-1}=p_{j-1})].
By Lemma 2 we can bound each summand as follows:
\displaystyle\mathbb{E}_{C}\left[p(\tilde{p}_{j}\neq p_{j}|\tilde{p}_{j-1}=p_{j-1})\right]=\mathbb{E}_{C}\left[\operatorname{tr}\left[(I-Q^{(p_{j}|p_{j-1})}_{B_{2}})\rho^{((x_{1})_{j}(p_{j-1},p_{j},q_{j})(x_{2})_{j}(p_{j-1}))}_{B_{2}}\right]\right]
\displaystyle=\mathbb{E}_{C_{(U)_{j}(X_{2})_{j}}}\left[\operatorname{tr}\left[(I-Q^{(p_{j}|p_{j-1})}_{B_{2}})\rho^{((u)_{j}(p_{j-1},p_{j})(x_{2})_{j}(p_{j-1}))}_{B_{2}}\right]\right]
\displaystyle\leq f_{2}(\varepsilon)+4\sum_{T=\{P_{j}\}}2^{R_{p}-D_{H}^{\varepsilon}(\rho_{(U)_{j}(X_{2})_{j}(B_{2})_{j}}\|\rho_{(U)_{j}(X_{2})_{j}(B_{2})_{j}}^{(\{X_{S_{T}},B_{2}\})})}
\displaystyle=f_{2}(\varepsilon)+4\times 2^{R_{p}-D_{H}^{\varepsilon}(\rho_{UX_{2}B_{2}}\|\rho_{UX_{2}B_{2}}^{(\left\{U,B_{2}\right\})})},
where we used S_{\left\{P_{j}\right\}}=\left\{(U)_{j}\right\}. We dropped the index j in the last equality since (U)_{j}(X_{2})_{j}\sim p_{UX_{2}}. Hence, overall,
\mathbb{E}_{C}\left[p(\tilde{m}_{p}\neq m_{p})\right]\leq b\left[f_{2}(\varepsilon)+4\times 2^{R_{p}-D_{H}^{\varepsilon}(\rho_{UX_{2}B_{2}}\|\rho_{UX_{2}B_{2}}^{(\left\{U,B_{2}\right\})})}\right].
For the second term, we again invoke Lemma 2:
\displaystyle\mathbb{E}_{C}[p(\hat{m}_{p}\hat{m}_{q}\neq m_{p}m_{q}|\tilde{m}_{p}=m_{p})]
\displaystyle=\mathbb{E}_{C}\left[\operatorname{tr}\left[(I-Q^{(p_{1}\cdots p_{b-1},q_{1}\cdots q_{b-1})}_{B^{b}_{3}})\bigotimes_{j=1}^{b}\rho^{(x_{1}(p_{j-1},p_{j},q_{j})x_{2}(p_{j-1}))}_{B_{3}}\right]\right]
\displaystyle=\mathbb{E}_{C}\left[\operatorname{tr}\left[(I-Q^{(p_{1}\cdots p_{b-1},q_{1}\cdots q_{b-1})}_{B^{b}_{3}})\bigotimes_{j=1}^{b}\rho^{(u(p_{j-1},p_{j})x_{1}(p_{j-1},p_{j},q_{j})x_{2}(p_{j-1}))}_{B_{3}}\right]\right]
\displaystyle\leq f_{3b}(\varepsilon)+4\sum_{J_{p},J_{q}\subseteq[b-1]:\,j_{p}+j_{q}>0}2^{j_{p}R_{p}+j_{q}R_{q}-D_{H}^{\varepsilon}\left(\rho_{U^{b}X^{b}_{1}X^{b}_{2}B^{b}_{3}}\|\rho_{U^{b}X^{b}_{1}X^{b}_{2}B^{b}_{3}}^{(\{X_{S_{(J_{p},J_{q})}},B_{3}^{b}\})}\right)},
where S_{(J_{p},J_{q})}=\left\{X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}}\right\}. Here we used the following definitions: \mathcal{J}\equiv\mathcal{J}_{p}\cup J_{q}, \mathcal{J}_{p}\equiv J_{p}\cup J_{p}^{\prime}, J_{p}^{\prime}\equiv\left\{j\in[b]\,|\,j-1\in J_{p}\right\}, and j_{p}\equiv|J_{p}|, j_{q}\equiv|J_{q}|. Also, note that \rho_{U^{b}X_{1}^{b}X_{2}^{b}B_{3}^{b}}=\rho_{UX_{1}X_{2}B_{3}}^{\otimes b}. Thus, overall, we have proved
Proposition 8.
Given R_{p},R_{q}\in\mathbb{R}_{\geq 0}, \varepsilon\in(0,1), and b\in\mathbb{N}, the triple (\frac{b-1}{b}(R_{p}+R_{q}),b,\delta) is achievable for the classical-quantum relay channel, where
\displaystyle\delta=b\left[f_{2}(\varepsilon)+4\times 2^{R_{p}-D_{H}^{\varepsilon}(\rho_{UX_{2}B_{2}}\|\rho_{UX_{2}B_{2}}^{(\left\{U,B_{2}\right\})})}\right]+f_{3b}(\varepsilon)+4\sum_{J_{p},J_{q}\subseteq[b-1]:\,j_{p}+j_{q}>0}2^{j_{p}R_{p}+j_{q}R_{q}-D_{H}^{\varepsilon}\left(\rho_{U^{b}X^{b}_{1}X^{b}_{2}B^{b}_{3}}\|\rho_{U^{b}X^{b}_{1}X^{b}_{2}B^{b}_{3}}^{(\{X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}},B_{3}^{b}\})}\right)}.
In the asymptotic limit, the error vanishes provided
R_{p}<I(U;B_{2}|X_{2})  (15)
and, for all J_{p},J_{q}\subseteq[b-1],
j_{p}R_{p}+j_{q}R_{q}<I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{1}^{\overline{\mathcal{J}}}X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})_{\rho_{U^{b}X^{b}_{1}X^{b}_{2}B^{b}_{3}}}.  (16)
Note J_{p},J_{q}\subseteq[b-1] and J_{p}^{\prime}\subseteq[2:b]. However, we will use the convention that all complementary sets are with respect to the largest containing set^{15}^{15}15The b-th messages and estimates will match, but in general the b-th x_{1},x_{2},u depend also on the (b-1)-th messages and estimates. [b]. We can simplify Eq. 16 via a general lemma:
Lemma 9.
Let \rho_{B_{1}\dots B_{m}} be an m-partite quantum state, and consider the state \rho_{B_{1}\dots B_{m}}^{\otimes n} for some n\in\mathbb{N}. Now, let B,B^{\prime},C be disjoint subsystems of (B_{1}\dots B_{m})^{\otimes n} such that B,B^{\prime} are supported on disjoint tensor factors. Then,
I(B;B^{\prime}|C)=0.
Proof.
We prove this by the definition of the conditional mutual information and the fact that \rho_{B_{1}\dots B_{m}}^{\otimes n} is a tensor product state:
\displaystyle I(B;B^{\prime}|C)=S(BC)+S(B^{\prime}C)-S(BB^{\prime}C)-S(C)
\displaystyle=S(BC_{B})+S(C_{\overline{B}})+S(B^{\prime}C_{B^{\prime}})+S(C_{\overline{B^{\prime}}})-S(BC_{B})-S(B^{\prime}C_{B^{\prime}})-S(C_{\overline{BB^{\prime}}})-S(C)
\displaystyle=0,
where C_{B} is the subsystem of C supported on the tensor factors that support B and C_{\overline{B}} is the rest of C. The final cancellation follows by writing S(C_{\overline{B}})=S(C_{B^{\prime}})+S(C_{\overline{BB^{\prime}}}), S(C_{\overline{B^{\prime}}})=S(C_{B})+S(C_{\overline{BB^{\prime}}}), and S(C)=S(C_{B})+S(C_{B^{\prime}})+S(C_{\overline{BB^{\prime}}}). ∎
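Lemma 9 can also be sanity-checked numerically in its classical (diagonal) special case, where all von Neumann entropies reduce to Shannon entropies. The following minimal sketch builds a two-fold product of a random joint distribution p_{B_1B_2} and verifies that I(B;B'|C) vanishes when B and B' sit on the two different tensor factors; the alphabet size and random seed are arbitrary choices of ours, not taken from the text.

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a joint distribution given as an ndarray."""
    p = p.ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)

# Random joint distribution p(b1, b2) and its two-fold product p(b1,b2)p(b1',b2'),
# the classical analogue of rho^{tensor n} with n = 2.
p = rng.random((3, 3)); p /= p.sum()
pp = np.einsum('ab,cd->abcd', p, p)   # axes: (b1, b2, b1', b2')

# B = b1 (first factor), B' = b1' (second factor), C = (b2, b2').
# Lemma 9 predicts I(B; B' | C) = 0 since B and B' sit on disjoint tensor factors.
p_BC   = pp.sum(axis=2)               # marginal on (b1, b2, b2')
p_BpC  = pp.sum(axis=0)               # marginal on (b2, b1', b2')
p_BBpC = pp                           # full joint (b1, b2, b1', b2')
p_C    = pp.sum(axis=(0, 2))          # marginal on (b2, b2')

cmi = H(p_BC) + H(p_BpC) - H(p_BBpC) - H(p_C)
print(abs(cmi))   # numerically zero
```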
Thus, using this lemma and the chain rule, in any conditional mutual information quantity we can remove conditioning systems which are supported on tensor factors disjoint from those that support the non-conditioning systems. This will be key in the following analyses. For instance, in Eq. 16, \overline{\mathcal{J}} and \mathcal{J}\cup J_{p}^{\prime}\cup\mathcal{J}_{p}=\mathcal{J} index disjoint sets of tensor factors, and so we can remove the conditioning on the X_{1}^{\overline{\mathcal{J}}} system:
\displaystyle I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{1}^{\overline{\mathcal{J}}}X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})
\displaystyle=I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})+I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};X_{1}^{\overline{\mathcal{J}}}|B^{b}_{3}X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})-I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};X_{1}^{\overline{\mathcal{J}}}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})
\displaystyle=I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}}).
Thus, Eq. 16 reduces to
\displaystyle j_{p}R_{p}+j_{q}R_{q}<I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}}).
We claim that the set of pairs (R_{p},R_{q}) that satisfy these bounds gives the classical partial decode-forward lower bound with quantum mutual information quantities in the limit of large b.^{16}^{16}16This will also cause \frac{b-1}{b}\to 1, so that the rate we achieve really is R_{p}+R_{q}. In particular, we show:
Proposition 10.
Let
\displaystyle S(b)\equiv\Big\{(R_{p},R_{q})\in\mathbb{R}^{2}_{\geq 0}\,\big|\,\forall J_{p},J_{q}\subseteq[b-1]\text{ such that }j_{p}+j_{q}>0:\;j_{p}R_{p}+j_{q}R_{q}<I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})_{\rho_{U^{b}X^{b}_{1}X^{b}_{2}B^{b}_{3}}}\Big\}
and
\displaystyle S\equiv\Big\{(R_{p},R_{q})\in\mathbb{R}^{2}_{\geq 0}\,\big|\,R_{q}<I(X_{1};B_{3}|UX_{2})_{\rho_{UX_{1}X_{2}B_{3}}},\;R_{p}+R_{q}<I(X_{1}X_{2};B_{3})_{\rho_{X_{1}X_{2}B_{3}}}\Big\},
where \rho_{UX_{1}X_{2}B_{2}B_{3}} is given by Eq. 14. Then, \lim_{b\to\infty}S(b) exists and is equal to S.
Note that the bounds that define S do not match the bounds given, for instance, in elgamal2011network (), since we do not first decode P and only then Q, but instead jointly decode to obtain all of P,Q. However, in the end we still obtain the same lower bound on the capacity.
Proof.
For reference, we list the bounds:
\displaystyle j_{p}R_{p}+j_{q}R_{q}<I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})_{\rho_{U^{b}X^{b}_{1}X^{b}_{2}B^{b}_{3}}}  (17)
and
\displaystyle R_{q}<I(X_{1};B_{3}|UX_{2})_{\rho_{UX_{1}X_{2}B_{3}}}  (18)
\displaystyle R_{p}+R_{q}<I(X_{1}X_{2};B_{3})_{\rho_{X_{1}X_{2}B_{3}}}.  (19)
We first claim \limsup_{b\to\infty}S(b)\subseteq S. Consider J_{p}=J_{q}=[b-1], in which case Eq. 17 becomes
(b-1)(R_{p}+R_{q})<I(X_{1}^{b}(X_{2})_{2}^{b}U^{b};B_{3}^{b}|(X_{2})_{1}),
which, using Lemma 9, can be manipulated into
\displaystyle R_{p}+R_{q}<\frac{b}{b-1}I(X_{1}X_{2}U;B_{3})-\frac{1}{b-1}I(X_{2};B_{3})=I(X_{1}X_{2};B_{3})+\frac{1}{b-1}I(X_{1};B_{3}|X_{2}).
In the limit of large b, this becomes Eq. 19. To obtain Eq. 18, take j_{p}=0. Then, by Lemma 9, Eq. 17 becomes
j_{q}R_{q}<I(X_{1}^{J_{q}};B_{3}^{b}|X_{2}^{b}U^{b})=j_{q}I(X_{1};B_{3}|X_{2}U).
Now, since j_{p}=0, j_{q} cannot be zero, so this is equivalent to
R_{q}<I(X_{1};B_{3}|X_{2}U).
The claim thus follows.
We next claim S(b)\supseteq S for all b, and so \liminf_{b\to\infty}S(b)\supseteq S. We only need to consider the case j_{p}>0, since otherwise we obtain Eq. 18 as shown above, which holds for all b. Now, interpret each of the inequalities above as a linear bound in the (R_{p},R_{q})-plane. We will show that none of the lines corresponding to Eq. 17 cuts into S. First, fixing J_{p},J_{q}\subseteq[b-1], we find the R_{p}-intercept of said line:
\displaystyle\frac{1}{j_{p}}I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})=\frac{1}{j_{p}}\left(I(X_{1}^{J_{p}^{\prime}}X_{2}^{J_{p}^{\prime}}U^{J_{p}^{\prime}};B_{3}^{b}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})+\cdots\right)
\displaystyle\geq\frac{1}{j_{p}}I(X_{1}^{J_{p}^{\prime}}X_{2}^{J_{p}^{\prime}}U^{J_{p}^{\prime}};B_{3}^{J_{p}^{\prime}})
\displaystyle=I(X_{1}X_{2}U;B_{3})=I(X_{1}X_{2};B_{3}),
where \cdots stands for some conditional mutual information quantity and is therefore nonnegative. Thus, the R_{p}-intercept is at least as large as that of Eq. 19, as shown in Fig. 8. This determines one of the points of the line.
We now find another point. We observe that I(X_{1};B_{3}|X_{2}U)\leq I(X_{1}X_{2}U;B_{3}), so the line associated with Eq. 18 intersects that of Eq. 19 in \mathbb{R}_{\geq 0}^{2}. Hence, it is sufficient to show that the bound on R_{p} obtained from Eq. 17 at R_{q}=I(X_{1};B_{3}|X_{2}U) is weaker than I(X_{1}X_{2}U;B_{3})-I(X_{1};B_{3}|X_{2}U)=I(X_{2}U;B_{3}). To see this, we plug R_{q}=I(X_{1};B_{3}|X_{2}U) into Eq. 17:
\displaystyle j_{p}R_{p}+j_{q}I(X_{1};B_{3}|X_{2}U)\leq I(X^{\mathcal{J}}_{1}X^{J_{p}^{\prime}}_{2}U^{\mathcal{J}_{p}};B^{b}_{3}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})
\displaystyle=I(X_{1}^{J_{q}};B_{3}^{b}|X_{1}^{\mathcal{J}\backslash J_{q}}X_{2}^{b}U^{b})+I(X_{1}^{\mathcal{J}\backslash J_{q}}X_{2}^{J_{p}^{\prime}}U^{\mathcal{J}_{p}};B_{3}^{b}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})
\displaystyle=I(X_{1}^{J_{q}};B_{3}^{J_{q}}|X_{2}^{J_{q}}U^{J_{q}})+I(X_{2}^{J_{p}^{\prime}}U^{J_{p}^{\prime}};B_{3}^{b}|X_{2}^{\overline{J_{p}^{\prime}}}U^{\overline{\mathcal{J}_{p}}})+\cdots
\displaystyle=j_{q}I(X_{1};B_{3}|X_{2}U)+j_{p}I(X_{2}U;B_{3})+\cdots.
This establishes our claim and completes the proof. ∎
Therefore, combining the bounds Eqs. 19, 18 and 15, the overall rate R_{p}+R_{q} of the entire protocol is achievable if
R_{p}+R_{q}<\min\left\{I(X_{1};B_{3}|UX_{2})_{\rho}+I(U;B_{2}|X_{2})_{\rho},\,I(X_{1}X_{2};B_{3})_{\rho}\right\}.
This is sufficient since, if it holds, we can choose R_{p},R_{q} to satisfy the bounds. It is also necessary since, if it is violated, then one of the bounds has to be violated. Optimizing over p_{UX_{1}X_{2}}, we obtain the partial decode-forward lower bound:
C\geq\max_{p_{UX_{1}X_{2}}}\min\left\{I(X_{1};B_{3}|UX_{2})_{\rho}+I(U;B_{2}|X_{2})_{\rho},\,I(X_{1}X_{2};B_{3})_{\rho}\right\}.  (20)
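To make Eq. 20 concrete, its inner expression (for one fixed input distribution, without the outer maximization) can be evaluated numerically in the classical special case, where the quantum mutual informations reduce to Shannon quantities. The sketch below uses a randomly chosen toy distribution p(u,x_1,x_2) and channel p(b_2b_3|x_1x_2) with binary alphabets; these numbers are hypothetical stand-ins of ours, not from the text.

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a joint distribution given as an ndarray."""
    p = p.ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I(p, A, B, C=()):
    """Conditional mutual information I(A;B|C); A, B, C are tuples of axes of p."""
    all_ax = set(range(p.ndim))
    marg = lambda keep: p.sum(axis=tuple(all_ax - set(keep)))
    A, B, C = tuple(A), tuple(B), tuple(C)
    return H(marg(A + C)) + H(marg(B + C)) - H(marg(A + B + C)) - H(marg(C))

rng = np.random.default_rng(1)

# Toy classical relay channel: input distribution p(u, x1, x2) and
# channel p(b2, b3 | x1, x2), all alphabets binary (hypothetical numbers).
p_in = rng.random((2, 2, 2)); p_in /= p_in.sum()
ch = rng.random((2, 2, 2, 2)); ch /= ch.sum(axis=(2, 3), keepdims=True)

# Joint p(u, x1, x2, b2, b3); axes are (u, x1, x2, b2, b3) = (0, 1, 2, 3, 4).
joint = np.einsum('uxy,xybc->uxybc', p_in, ch)

# min{ I(X1;B3|U X2) + I(U;B2|X2), I(X1 X2;B3) }, the inner expression of Eq. 20.
bound = min(I(joint, (1,), (4,), (0, 2)) + I(joint, (0,), (3,), (2,)),
            I(joint, (1, 2), (4,)))
print(bound)   # a nonnegative rate in bits
```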
Remark.
This coding scheme is optimal when \mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}} is semideterministic, namely when B_{2} is classical and \rho_{B_{2}}^{(x_{1}x_{2})} is pure for all x_{1},x_{2}. In this case the partial decode-forward lower bound Eq. 20 with the choice U=B_{2} matches the cutset upper bound Eq. 11. This is possible because of the purity condition, which essentially means that B_{2} is a deterministic function of X_{1},X_{2}. The semideterministic classical relay channel was defined and analyzed in gamal1982capacity ().
V Proof of the Quantum Multiparty Packing Lemma
In this section we prove Lemma 2 via Sen’s joint typicality lemma senInPrep (). We then use Lemma 2 to prove the asymptotic version, Lemma 3. We shall state a special case of the joint typicality lemma, the t=1 intersection case in the notation of senInPrep (), as a theorem. For the sake of conciseness, we suppress some of the detailed expressions.
We first give some definitions. A subpartition \mathcal{L} of some set S is a collection of nonempty, pairwise disjoint subsets of S. We define \bigcup(\mathcal{L}) to be their union, that is, \bigcup(\mathcal{L})\equiv\bigcup_{L\in\mathcal{L}}L. Note that \bigcup(\mathcal{L})\subseteq S. We say a subpartition \mathcal{L} of S covers T\subseteq S if T\subseteq\bigcup(\mathcal{L}).
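These set-theoretic definitions can be restated as a small executable check (an illustrative Python sketch; the function names are ours, not from senInPrep ()):

```python
from itertools import combinations

def is_subpartition(blocks, S):
    """blocks is a subpartition of S: nonempty, pairwise disjoint subsets of S."""
    blocks = [set(b) for b in blocks]
    S = set(S)
    nonempty_subsets = all(b and b <= S for b in blocks)
    pairwise_disjoint = all(not (a & b) for a, b in combinations(blocks, 2))
    return nonempty_subsets and pairwise_disjoint

def union_of(blocks):
    """The union over all blocks, i.e. the set written union(L) in the text."""
    return set().union(*[set(b) for b in blocks]) if blocks else set()

def covers(blocks, T):
    """A subpartition covers T if T is contained in the union of its blocks."""
    return set(T) <= union_of(blocks)

S = range(1, 7)
L = [{1, 2}, {4}]                         # a subpartition of S with union {1, 2, 4}
print(is_subpartition(L, S))              # True
print(covers(L, {1, 4}), covers(L, {3}))  # True False
```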
Theorem 11 (Oneshot Quantum Joint Typicality Lemma senInPrep ()).
Let
\displaystyle\rho_{XA}=\sum_{x}p_{X}(x)\ket{x}\bra{x}_{X}\otimes\rho_{A}^{(x)} 
be a classical-quantum state, where A\equiv A_{1}\dots A_{N} and X\equiv X_{1}\dots X_{M}. Let \varepsilon\in(0,1) and let Y=Y_{1}\dots Y_{N+M} consist of N+M identical copies of some classical system, with total dimension d_{Y}. Then there exist quantum systems \widehat{A}_{k} and isometries \widehat{J}_{k}\colon A_{k}\to\widehat{A}_{k} for k\in[N], as well as a cqc-state of the form
\widehat{\rho}_{X\widehat{A}Y}=\frac{1}{d_{Y}}\sum_{x,y}p_{X}(x)\ket{x}\bra{x}% _{X}\otimes\widehat{\rho}_{\widehat{A}}^{(x,y)}\otimes\ket{y}\bra{y}_{Y}, 
and a cqcPOVM \widehat{\Pi}_{X\widehat{A}Y}, such that, with \widehat{J}\equiv\bigotimes_{k\in[N]}\widehat{J}_{k},

1. \left\|\widehat{\rho}_{X\widehat{A}Y}-(\mathbbm{1}_{X}\otimes\widehat{J})\rho_{XA}(\mathbbm{1}_{X}\otimes\widehat{J})^{\dagger}\otimes\tau_{Y}\right\|_{1}\leq f(N,M,\varepsilon), where \tau_{Y}=\frac{1}{d_{Y}}\sum_{y}\ket{y}\bra{y}_{Y} denotes the maximally mixed state on Y.
2. \operatorname{tr}\left[\widehat{\Pi}_{X\widehat{A}Y}\widehat{\rho}_{X\widehat{A}Y}\right]\geq 1-g(N,M,\varepsilon).
3. Let \mathcal{L} be a subpartition of [M]\sqcup[N] that covers [N]. Define Y_{\mathcal{L}}:=Y_{\bigcup(\mathcal{L})}, S\equiv[M]\cap\bigcup(\mathcal{L}), {\overline{S}}\equiv[M]\setminus S, and the “conditional” quantum states
\displaystyle\widehat{\rho}^{(x_{{\overline{S}}},y_{{\overline{S}}})}_{X_{S}\widehat{A}Y_{\mathcal{L}}}\equiv\frac{1}{d_{Y_{\mathcal{L}}}}\sum_{x_{S},y_{\mathcal{L}}}p_{X_{S}|X_{{\overline{S}}}}(x_{S}|x_{{\overline{S}}})\ket{x_{S}}\bra{x_{S}}_{X_{S}}\otimes\widehat{\rho}_{\widehat{A}}^{(x_{{\overline{S}}}x_{S},y_{{\overline{S}}}y_{\mathcal{L}})}\otimes\ket{y_{\mathcal{L}}}\bra{y_{\mathcal{L}}}_{Y_{\mathcal{L}}},
\displaystyle\rho^{(x_{{\overline{S}}})}_{X_{S}A}\equiv\sum_{x_{S}}p_{X_{S}|X_{{\overline{S}}}}(x_{S}|x_{{\overline{S}}})\ket{x_{S}}\bra{x_{S}}_{X_{S}}\otimes\rho_{A}^{(x_{{\overline{S}}}x_{S})}.
We can now define
\displaystyle\widehat{\rho}_{X\widehat{A}Y}^{(\mathcal{L})}\equiv\frac{1}{d_{Y_{\overline{S}}}}\sum_{x_{\overline{S}},y_{\overline{S}}}p_{X_{\overline{S}}}(x_{{\overline{S}}})\ket{x_{{\overline{S}}}}\bra{x_{{\overline{S}}}}_{X_{\overline{S}}}\otimes\bigotimes_{L\in\mathcal{L}}\widehat{\rho}^{(x_{{\overline{S}}},y_{{\overline{S}}})}_{X_{L\cap[M]}\widehat{A}_{L\cap[N]}Y_{L}}\otimes\ket{y_{\overline{S}}}\bra{y_{\overline{S}}}_{Y_{\overline{S}}},
\displaystyle\rho^{(\mathcal{L})}_{XA}\equiv\sum_{x_{\overline{S}}}p_{X_{\overline{S}}}(x_{{\overline{S}}})\ket{x_{{\overline{S}}}}\bra{x_{{\overline{S}}}}_{X_{\overline{S}}}\otimes\bigotimes_{L\in\mathcal{L}}\rho^{(x_{{\overline{S}}})}_{X_{L\cap[M]}A_{L\cap[N]}},
in terms of the reduced density matrices of the states \widehat{\rho}^{(x_{{\overline{S}}},y_{{\overline{S}}})}_{X_{S}\widehat{A}Y_{\mathcal{L}}} and \rho^{(x_{{\overline{S}}})}_{X_{S}A} defined above. Then,
\operatorname{tr}\left[\widehat{\Pi}_{X\widehat{A}Y}\,\widehat{\rho}_{X\widehat{A}Y}^{(\mathcal{L})}\right]\leq 2^{-D_{H}^{\varepsilon}\left(\rho_{XA}\|\rho^{(\mathcal{L})}_{XA}\right)}+h(N,M,d_{A},d_{Y}).
Here, f(N,M,\varepsilon), g(N,M,\varepsilon), h(N,M,d_{A},d_{Y}) are universal functions (independent of the setup) such that
\lim_{\varepsilon\to 0}f(N,M,\varepsilon)=\lim_{\varepsilon\to 0}g(N,M,% \varepsilon)=\lim_{d_{Y}\to\infty}h(N,M,d_{A},d_{Y})=0. 
Proof.
This follows readily from Sen’s Lemma 1 in senInPrep () with an appropriate change of notation and suitable simplifications. We will use Sen’s terminology and notation. We choose k_{\text{Sen}}\equiv N, c_{\text{Sen}}\equiv M, \mathcal{L}_{\text{Sen}} a system isomorphic to our Y_{k}, \delta_{\text{Sen}}=\varepsilon^{1/4N}, and the same error \varepsilon for each pseudosubpartition of [M]\sqcup[N]. We denote \widehat{A}_{k}\equiv(A^{\prime\prime}_{k})_{\text{Sen}}, so that (A^{\prime}_{k})_{\text{Sen}}=\widehat{A}_{k}Y_{k} and (X^{\prime}_{k})_{\text{Sen}}=X_{k}Y_{k}; that is, we explicitly include the augmenting systems in our notation. We also write \widehat{J}_{k} for the natural embedding A_{k}\hookrightarrow A^{\prime\prime}_{k}. Then Sen’s lemma yields a state \widehat{\rho}_{X\widehat{A}Y}\equiv\rho^{\prime}_{\text{Sen}} and a POVM \widehat{\Pi}_{X\widehat{A}Y}\equiv\Pi^{\prime}_{\text{Sen}} that satisfy all the desired properties. First, statement 1 in Sen’s lemma asserts that \widehat{\rho}_{X\widehat{A}Y} and \widehat{\Pi}_{X\widehat{A}Y} are cqc. Next, our properties 1 and 2 are direct restatements of his statements 2 and 3, with f(N,M,\varepsilon)=2^{(N+M)/2+1}\varepsilon^{1/4N} and g(N,M,\varepsilon)=2^{2^{MN+4}(N+1)^{N}}2^{(N+M)^{2}}\varepsilon^{1/2}+2^{(N+M)/2+1}\varepsilon^{1/4N}. Finally, we apply statement 4 in Sen’s lemma to a subpartition \mathcal{L} covering [N] and the probability distribution q_{\text{Sen}}(x)=p_{X_{{\overline{S}}}}(x_{{\overline{S}}})\prod_{L\in\mathcal{L}}p_{X_{L\cap[M]}|X_{{\overline{S}}}}(x_{L\cap[M]}|x_{{\overline{S}}}). Then our \rho^{(\mathcal{L})}_{XA} is Sen’s \rho_{(S_{1},\dots,S_{l})} and our \widehat{\rho}^{(\mathcal{L})}_{X\widehat{A}Y} is Sen’s \rho^{\prime}_{(S_{1},\dots,S_{l})}, so we obtain property 3 with h(N,M,d_{A},d_{Y})=3\cdot 2^{N}d_{A}\,d_{Y}^{-1/2(N+M)}. ∎
Now, we will prove a generalization of Lemma 2 which takes greater advantage of the power of Theorem 11 by abstracting the properties that the random codebook C needs to satisfy for the multiparty packing lemma to hold. We will use the notation X\equiv X_{1}\dots X_{k} to denote a set of k\in\mathbb{N} systems.
Lemma 12.
Let \{p_{X},\rho_{B}^{(x)}\} be an ensemble of quantum states, where X\equiv X_{1}\dots X_{k} with k\in\mathbb{N}, let \mathcal{I}=\mathcal{I}_{1}\times\mathcal{I}_{2} be an index set, and let \varepsilon\in(0,1) be a small parameter. Now, let \mathcal{C}=\left\{x(i)\right\}_{i\in\mathcal{I}} be a family of random variables such that for every i\in\mathcal{I}, x(i)\sim p_{X_{1}\cdots X_{k}}, and there exists a map^{17}^{17}17Note that the bound does not depend on the specific choice of the map. \Psi:\mathcal{I}\times\mathcal{I}\to\mathcal{P}([k]) such that for every i,i^{\prime}\in\mathcal{I}, letting T\equiv\Psi(i,i^{\prime}),
1. x_{\overline{T}}(i)=x_{\overline{T}}(i^{\prime}) as random variables;
2. x_{T}(i),x_{T}(i^{\prime}) are independent conditioned on x_{{\overline{T}}}(i) (=x_{{\overline{T}}}(i^{\prime})),
where {\overline{T}}\equiv[k]\setminus T. Then, for each i_{1}\in\mathcal{I}_{1} there exists a POVM \{Q^{(i_{2}|i_{1})}_{B}\}_{i_{2}\in\mathcal{I}_{2}}, dependent on the random variables in \mathcal{C}, such that for all i=(i_{1},i_{2})\in\mathcal{I},
\mathbb{E}_{\mathcal{C}}\left[\operatorname{tr}[(I-Q_{B}^{(i_{2}|i_{1})})\rho^{(x(i_{1},i_{2}))}_{B}]\right]\leq f(k,\varepsilon)+4\sum_{i_{2}^{\prime}\neq i_{2}}2^{-D_{H}^{\varepsilon}(\rho_{XB}\|\rho_{XB}^{(\{X_{S},B\})})},
where \mathbb{E}_{\mathcal{C}} is the expectation over the random variables in \mathcal{C}, S\equiv\Psi((i_{1},i_{2}),(i_{1},i_{2}^{\prime})), and
\displaystyle\rho_{XB}\equiv\sum_{x}p_{X}(x)\ket{x}\bra{x}_{X}\otimes\rho^{(x)}_{B}.
Furthermore, f(k,\varepsilon) is a universal function such that \lim_{\varepsilon\to 0}f(k,\varepsilon)=0.
Before we prove Lemma 12, we first show that Lemma 2 follows from Lemma 12 by establishing that the random codebook generated by Algorithm 1 satisfies the required properties.
Proof of Lemma 2.
Fix a subgraph H, \{\rho_{B}^{(x_{H})}\}_{x_{H}\in\mathcal{X}_{H}}, D\subseteq J_{H}, and \varepsilon\in(0,1). We invoke Lemma 12 with the ensemble \{p_{X_{H}},\rho_{B}^{(x_{H})}\}, with k=|V_{H}|, \mathcal{I}_{1}=M_{{\overline{D}}}, \mathcal{I}_{2}=M_{D}, the same \varepsilon, and the family of random variables \mathcal{C}=C_{H}. We thus identify \mathcal{I}=M_{H}=M_{D}\times M_{{\overline{D}}}. We also define an arbitrary ordering on V_{H} so that we can identify it with [k].
We check that C_{H} satisfies the required properties using the observations we made regarding Algorithm 1. First, for every m_{H}\in M_{H}, x_{H}(m_{H})\sim p_{X_{H}} by observation 1 on p. 1.
Next, we claim the map
\displaystyle\Psi(m_{H},m_{H}^{\prime})\equiv\left\{v\in V_{H}\;|\;\exists j\in\operatorname{ind}(v)\text{ such that }(m_{D})_{j}\neq(m^{\prime}_{D})_{j}\right\}
satisfies the required conditions. Let m_{H},m_{H}^{\prime}\in M_{H} and T=\Psi(m_{H},m_{H}^{\prime}). By definition, given v\in{\overline{T}}, for all j\in\operatorname{ind}(v) we have (m_{H})_{j}=(m_{H}^{\prime})_{j}. Hence, (m_{H})_{\operatorname{ind}(v)}=(m_{H}^{\prime})_{\operatorname{ind}(v)}, so by observation 2 on p. 2, x_{v}(m_{H})=x_{v}(m^{\prime}_{H}) as random variables. Thus, x_{{\overline{T}}}(m_{H})=x_{{\overline{T}}}(m^{\prime}_{H}) as random variables, and we have established condition 1.
We now prove that the conditional independence statement in condition 2 is satisfied. For \xi_{{\overline{T}}}\in\mathcal{X}_{{\overline{T}}}, observation 1 shows that
\displaystyle\Pr\left(x_{{\overline{T}}}(m_{H})=\xi_{{\overline{T}}}\right)=\prod_{v\in{\overline{T}}}p_{X_{v}|X_{\operatorname{pa}(v)}}\left(\xi_{v}|\xi_{\operatorname{pa}(v)}\right),
where we used that \operatorname{pa}({\overline{T}})\subseteq{\overline{T}} as a consequence of Eq. 7. Next, observation 3 implies that the joint distribution of x_{T}(m_{H}), x_{T}(m^{\prime}_{H}), and x_{{\overline{T}}}(m_{H}) is given as follows. For \xi,\xi^{\prime}\in\mathcal{X} such that \xi_{{\overline{T}}}=\xi^{\prime}_{{\overline{T}}},
\displaystyle\Pr\left(x_{T}(m_{H})=\xi_{T},\,x_{T}(m^{\prime}_{H})=\xi^{\prime}_{T},\,x_{{\overline{T}}}(m_{H})=\xi_{{\overline{T}}}\right)=\Pr\left(x(m_{H})=\xi,\,x(m^{\prime}_{H})=\xi^{\prime}\right)
\displaystyle=\left(\prod_{v\in{\overline{T}}}p_{X_{v}|X_{\operatorname{pa}(v)}}(\xi_{v}|\xi_{\operatorname{pa}(v)})\right)\left(\prod_{v\in T}p_{X_{v}|X_{\operatorname{pa}(v)}}(\xi_{v}|\xi_{\operatorname{pa}(v)})\right)\left(\prod_{v\in T}p_{X_{v}|X_{\operatorname{pa}(v)}}(\xi^{\prime}_{v}|\xi^{\prime}_{\operatorname{pa}(v)})\right).
Hence, x_{T}(m_{H}) and x_{T}(m^{\prime}_{H}) are independent conditional on x_{{\overline{T}}}(m_{H}). Lemma 2 in the form given in Eq. 10 then directly follows from applying Lemma 12. ∎
Proof of Lemma 3.
This follows from Lemma 2 by replacing X with n\in\mathbb{N} i.i.d. copies of itself, X^{n}. Then, associating each v\in V with X_{v}^{n}, (G,X^{n},M,\operatorname{ind}) is a multiplex Bayesian network.
We now apply Algorithm 1 with (G,X^{n},M,\operatorname{ind}) as input. This is equivalent to applying it with (G,X,M,\operatorname{ind}) n times. Then, applying Lemma 2 with inputs H, \{\bigotimes_{i=1}^{n}\rho_{B_{i}}^{(x_{i,H})}\}_{x_{H}^{n}\in\mathcal{X}^{n}_{H}}, D, and \varepsilon(n)\in(0,1), we obtain a POVM \{Q_{B^{n}}^{(m_{D}|m_{{\overline{D}}})}\}_{m_{D}\in M_{D}} for each m_{{\overline{D}}}\in M_{{\overline{D}}} such that, for (m_{D},m_{{\overline{D}}})\in M_{H},
\displaystyle\mathbb{E}_{C_{H}^{n}}\left[\operatorname{tr}\left[(I-Q_{B^{n}}^{(m_{D}|m_{{\overline{D}}})})\bigotimes_{i=1}^{n}\rho_{B_{i}}^{((x_{i})_{H}(m_{D},m_{{\overline{D}}}))}\right]\right]\leq f(|V_{H}|,\varepsilon(n))+4\sum_{\emptyset\neq T\subseteq D}2^{(\sum_{t\in T}R_{t})-D_{H}^{\varepsilon(n)}(\rho_{X_{H}^{n}B^{n}}\|\rho_{X_{H}^{n}B^{n}}^{(\{X^{n}_{S_{T}},B^{n}\})})}.
Consider now
\displaystyle\rho_{X_{H}^{n}B^{n}}  \displaystyle=\sum_{x_{H}^{n}}p_{X_{H}}^{\otimes n}(x_{H}^{n})\ket{x_{H}^{n}}% \bra{x_{H}^{n}}_{X_{H}^{n}}\otimes\bigotimes_{i=1}^{n}\rho_{B_{i}}^{((x_{i})_{% H})} 
and
\displaystyle\rho_{X_{H}^{n}B^{n}}^{(\{X^{n}_{S_{T}},B^{n}\})}=\sum_{x_{H}^{n}% }p_{X_{H}}^{\otimes n}(x_{H}^{n})\ket{x^{n}_{H}}\bra{x^{n}_{H}}_{X_{H}^{n}}% \otimes\rho_{B^{n}}^{(x^{n}_{{\overline{S_{T}}}})}. 
It is not difficult to see that
\displaystyle\rho_{X_{H}^{n}B^{n}}=\left(\sum_{x_{H}}p_{X_{H}}(x_{H})\ket{x_{H% }}\bra{x_{H}}_{X_{H}}\otimes\rho_{B}^{(x_{H})}\right)^{\otimes n}=\rho_{X_{H}B% }^{\otimes n}, 
which conveniently justifies this slight abuse of notation. Furthermore, considering
\displaystyle\rho_{B^{n}}^{(x_{{\overline{S_{T}}}}^{n})}=\sum_{x_{S_{T}}^{n}}p_{X_{S_{T}}|X_{{\overline{S_{T}}}}}^{\otimes n}(x_{S_{T}}^{n}|x_{{\overline{S_{T}}}}^{n})\bigotimes_{i=1}^{n}\rho_{B_{i}}^{((x_{i})_{S_{T}}(x_{i})_{{\overline{S_{T}}}})}=\bigotimes_{i=1}^{n}\rho_{B_{i}}^{((x_{i})_{{\overline{S_{T}}}})},
we likewise conclude
\displaystyle\rho_{X_{H}^{n}B^{n}}^{(\{X^{n}_{S_{T}},B^{n}\})}=\left(\rho_{X_{% H}B}^{(\{X_{S_{T}},B\})}\right)^{\otimes n}. 
The conclusion therefore follows by Eq. 6, where we choose \varepsilon(n) such that \varepsilon(n)\to 0, so that f(|V_{H}|,\varepsilon(n))\to 0 and \frac{1}{n}D_{H}^{\varepsilon(n)}(\rho^{\otimes n}\|\sigma^{\otimes n})\to D(\rho\|\sigma). Given Eq. 3, one possibility is \varepsilon(n)=1/n. This concludes the proof. ∎
Finally, we prove Lemma 12. Note that Theorem 11 gives a pair \widehat{\rho},\widehat{\Pi} that satisfy joint typicality properties but live in a larger Hilbert space. In order to prove Lemma 2, which claims the existence of a POVM on the original Hilbert space, we will need to construct the corresponding POVM in the larger Hilbert space and then appropriately invert the isometry. There is also an extra classical system Y associated with the X systems, which we can interpret as an additional random codebook. We will use a conventional derandomization argument to eliminate it from the statement. The extra Y’s associated with the B systems we will simply trace over.
Proof of Lemma 12.
We invoke Theorem 11 with inputs \rho_{XB}, \varepsilon, and a classical system YZ. Here X\equiv X_{1}\dots X_{k}, Y\equiv Y_{1}\dots Y_{k}, and Z is a classical system associated with B. We obtain a quantum state \widehat{\rho}_{X\widehat{B}YZ} and a POVM \widehat{\Pi}_{X\widehat{B}YZ}, which we can expand as follows:
\widehat{\rho}_{X\widehat{B}YZ}=\bigoplus_{x,y}p_{X}(x)\ket{x}\bra{x}_{X}% \otimes\frac{1}{d_{Y}}\ket{y}\bra{y}_{Y}\otimes\widehat{\rho}_{\widehat{B}Z}^{% (x,y)} 
\widehat{\Pi}_{X\widehat{B}YZ}=\bigoplus_{x,y}\ket{x}\bra{x}_{X}\otimes\ket{y}% \bra{y}_{Y}\otimes\widehat{\Pi}_{\widehat{B}Z}^{(x,y)}. 
Now, for every x_{j}\in\mathcal{X}_{j}, draw y_{j}(x_{j}) uniformly at random from \mathcal{Y}_{j}, and consider the random vectors y(x):=(y_{1}(x_{1}),\dots,y_{k}(x_{k})). We use these random vectors and the codebook \mathcal{C}=\{x(i)\}_{i\in\mathcal{I}} to define a codebook \mathcal{C}^{\prime}=\{y(i)\}_{i\in\mathcal{I}}, where we set y(i)=y(x(i)). We also define the joint codebook \mathcal{C}^{\prime\prime}=\{x(i)y(i)\}_{i\in\mathcal{I}}. Then, for every i,i^{\prime}\in\mathcal{I}, letting T\equiv\Psi(i,i^{\prime}), the following holds:

1. x_{\overline{T}}(i)y_{\overline{T}}(i)=x_{\overline{T}}(i^{\prime})y_{\overline{T}}(i^{\prime}) as random variables;
2. x_{T}(i)y_{T}(i) and x_{T}(i^{\prime})y_{T}(i^{\prime}) are independent conditioned on x_{\overline{T}}(i)y_{\overline{T}}(i) (=x_{\overline{T}}(i^{\prime})y_{\overline{T}}(i^{\prime})),
with probabilities
\displaystyle p_{X_{\overline{T}}Y_{\overline{T}}}(x_{\overline{T}},y_{\overline{T}})=p_{X_{\overline{T}}}(x_{\overline{T}})\cdot p_{Y_{\overline{T}}}(y_{\overline{T}})=\frac{1}{d_{Y_{\overline{T}}}}p_{X_{\overline{T}}}(x_{\overline{T}}),
\displaystyle p_{X_{T}Y_{T}|X_{\overline{T}}Y_{\overline{T}}}(x_{T},y_{T}|x_{\overline{T}},y_{\overline{T}})=\frac{1}{d_{Y_{T}}}p_{X_{T}|X_{\overline{T}}}(x_{T}|x_{\overline{T}}).
Define the indexed objects:
\widehat{\rho}^{(i)}_{\widehat{B}Z}\equiv\widehat{\rho}_{\widehat{B}Z}^{(x(i),% y(i))}\quad\text{and}\quad\widehat{\Pi}^{(i)}_{\widehat{B}Z}\equiv\widehat{\Pi% }_{\widehat{B}Z}^{(x(i),y(i))}. 
We then define the squareroot measurement
\widehat{Q}_{\widehat{B}Z}^{(i_{2}i_{1})}\equiv\left(\sum_{i_{2}^{\prime}\in% \mathcal{I}_{2}}\widehat{\Pi}^{(i_{1},i_{2}^{\prime})}_{\widehat{B}Z}\right)^{% 1/2}\widehat{\Pi}_{\widehat{B}Z}^{(i_{1},i_{2})}\left(\sum_{i_{2}^{\prime}\in% \mathcal{I}_{2}}\widehat{\Pi}^{(i_{1},i_{2}^{\prime})}_{\widehat{B}Z}\right)^{% 1/2} 
and “invert” the isometry \widehat{J} to obtain the following family of POVM’s on the original Hilbert space:
Q_{B}^{(i)}=Q_{B}^{(i_{2}i_{1})}\equiv\frac{1}{d_{Z}}(\widehat{J}_{B\to% \widehat{B}})^{\dagger}\operatorname{tr}_{Z}\left[\widehat{Q}_{\widehat{B}Z}^{% (i)}\right]\widehat{J}_{B\to\widehat{B}}. 
Note that we have a POVM for each value of i_{1} and these POVM’s are dependent on our random encoding x(i) and random choice of y(i).
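For intuition, here is a small NumPy sketch of the square-root measurement construction above: random positive operators stand in for the typicality POVM elements \widehat{\Pi}^{(i_{1},i_{2}^{\prime})}, and conjugation by the inverse square root of their sum yields operators summing to the identity on the support. This is an illustrative toy of ours, not the actual operators from Theorem 11.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 3

def rand_psd(d, rng):
    """A random positive semidefinite matrix, a stand-in for a POVM element."""
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return A @ A.conj().T

Pi = [rand_psd(d, rng) for _ in range(n)]   # placeholders for the Pi-hat operators
S = sum(Pi)

# Inverse square root of S on its support (here S is full rank almost surely).
w, V = np.linalg.eigh(S)
s = np.where(w > 1e-12, w, np.inf)          # dividing by inf zeroes out the kernel
S_inv_sqrt = (V / np.sqrt(s)) @ V.conj().T

# Square-root measurement: Q_i = S^{-1/2} Pi_i S^{-1/2}.
Q = [S_inv_sqrt @ P @ S_inv_sqrt for P in Pi]

print(np.allclose(sum(Q), np.eye(d), atol=1e-8))   # True: the Q's form a POVM
```

Each Q_i is positive semidefinite by construction, and their sum is the projector onto the support of S, which is the full identity whenever S is invertible.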
Now, fixing i=(i_{1},i_{2})\in\mathcal{I}, we compute the probability of error averaged over the random choice of x(i) and y(i), denoting this expectation by \mathbb{E}\equiv\mathbb{E}_{\mathcal{C}^{\prime\prime}}:
\displaystyle\mathbb{E}\operatorname{tr}\left[\left(I-Q^{(i)}_{B}\right)\rho^{(i)}_{B}\right]
\displaystyle=1-\mathbb{E}\operatorname{tr}\left[Q_{B}^{(i)}\rho^{(i)}_{B}\right]
\displaystyle=1-\mathbb{E}\operatorname{tr}\left[\widehat{Q}^{(i)}_{\widehat{B}Z}\left(\widehat{J}_{B\to\widehat{B}}\rho_{B}^{(i)}\widehat{J}_{B\to\widehat{B}}^{\dagger}\otimes\tau_{Z}\right)\right]
\displaystyle\leq 1-\mathbb{E}\operatorname{tr}\left[\widehat{Q}^{(i)}_{\widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(i)}\right]+\mathbb{E}\left\|\widehat{J}_{B\to\widehat{B}}\rho^{(i)}_{B}\widehat{J}_{B\to\widehat{B}}^{\dagger}\otimes\tau_{Z}-\widehat{\rho}_{\widehat{B}Z}^{(i)}\right\|_{1}
\displaystyle\leq 1-\mathbb{E}\operatorname{tr}\left[\widehat{Q}^{(i)}_{\widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(i)}\right]+\left\|\left(\mathbbm{1}_{X}\otimes\widehat{J}_{B\to\widehat{B}}\right)\rho_{XB}\left(\mathbbm{1}_{X}\otimes\widehat{J}_{B\to\widehat{B}}^{\dagger}\right)\otimes\tau_{YZ}-\widehat{\rho}_{X\widehat{B}YZ}\right\|_{1}
\displaystyle\leq 1-\mathbb{E}\operatorname{tr}\left[\widehat{Q}^{(i)}_{\widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(i)}\right]+f(1,k,\varepsilon)
\displaystyle\leq 2\left(1-\mathbb{E}\operatorname{tr}\left[\widehat{\Pi}^{(i)}_{\widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(i)}\right]\right)+4\sum_{i_{2}^{\prime}\neq i_{2}}\mathbb{E}\operatorname{tr}\left[\widehat{\Pi}^{(i_{1},i_{2}^{\prime})}_{\widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(i_{1},i_{2})}\right]+f(1,k,\varepsilon)
\displaystyle\leq 4\sum_{i_{2}^{\prime}\neq i_{2}}\mathbb{E}\operatorname{tr}\left[\widehat{\Pi}^{(i_{1},i_{2}^{\prime})}_{\widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(i_{1},i_{2})}\right]+f(1,k,\varepsilon)+2g(1,k,\varepsilon),
where in the last three inequalities we used Theorem 11 and the Hayashi-Nagaoka lemma hayashi2003general (); wilde2013quantum ().
We consider the first term. Let S=\Psi((i_{1},i_{2}),(i_{1},i_{2}^{\prime})). Note that by our conditions on the random codebook, the codewords are equal as random variables on {\overline{S}} and hence,
\displaystyle 4\sum_{i_{2}^{\prime}\neq i_{2}}\mathbb{E}\operatorname{tr}\left% [\widehat{\Pi}^{(i_{1},i_{2}^{\prime})}_{\widehat{B}Z}\widehat{\rho}_{\widehat% {B}Z}^{(i_{1},i_{2})}\right]  
\displaystyle=4\sum_{i_{2}^{\prime}\neq i_{2}}\mathbb{E}_{XX^{\prime}YY^{% \prime}}\operatorname{tr}\left[\widehat{\Pi}^{(i_{1},i_{2}^{\prime})}_{% \widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(i_{1},i_{2})}\right]  
\displaystyle=4\sum_{i_{2}^{\prime}\neq i_{2}}\operatorname{tr}\left[\mathbb{E}_{X_{\overline{S}}Y_{\overline{S}}}\left[\mathbb{E}_{X_{S}^{\prime}Y_{S}^{\prime}|X_{\overline{S}}Y_{\overline{S}}}\left(\widehat{\Pi}^{(i_{1},i_{2}^{\prime})}_{\widehat{B}Z}\right)\mathbb{E}_{X_{S}Y_{S}|X_{\overline{S}}Y_{\overline{S}}}\left(\widehat{\rho}_{\widehat{B}Z}^{(i_{1},i_{2})}\right)\right]\right]  
\displaystyle=4\sum_{i_{2}^{\prime}\neq i_{2}}\operatorname{tr}\left[\sum_{x_{\overline{S}},y_{\overline{S}}}p(x_{\overline{S}})\frac{1}{d_{Y_{\overline{S}}}}\sum_{x_{S}^{\prime},y_{S}^{\prime}}p(x_{S}^{\prime}|x_{\overline{S}})\frac{1}{d_{Y_{S}}}\widehat{\Pi}^{(x_{S}^{\prime}x_{\overline{S}},y_{S}^{\prime}y_{\overline{S}})}_{\widehat{B}Z}\sum_{x_{S},y_{S}}p(x_{S}|x_{\overline{S}})\frac{1}{d_{Y_{S}}}\widehat{\rho}_{\widehat{B}Z}^{(x_{S}x_{\overline{S}},y_{S}y_{\overline{S}})}\right]  
\displaystyle=4\sum_{i_{2}^{\prime}\neq i_{2}}\operatorname{tr}\left[\sum_{x_{\overline{S}},y_{\overline{S}}}p(x_{\overline{S}})\frac{1}{d_{Y_{\overline{S}}}}\sum_{x_{S}^{\prime},y_{S}^{\prime}}p(x_{S}^{\prime}|x_{\overline{S}})\frac{1}{d_{Y_{S}}}\widehat{\Pi}^{(x_{S}^{\prime}x_{\overline{S}},y_{S}^{\prime}y_{\overline{S}})}_{\widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(x_{\overline{S}},y_{\overline{S}})}\right]  
\displaystyle=4\sum_{i_{2}^{\prime}\neq i_{2}}\operatorname{tr}\left[\sum_{x,y}p(x)\frac{1}{d_{Y}}\widehat{\Pi}^{(x,y)}_{\widehat{B}Z}\widehat{\rho}_{\widehat{B}Z}^{(x_{\overline{S}},y_{\overline{S}})}\right]  
\displaystyle=4\sum_{i_{2}^{\prime}\neq i_{2}}\operatorname{tr}\left[\widehat{\Pi}_{X\widehat{B}YZ}\widehat{\rho}_{X\widehat{B}YZ}^{(\{X_{S}Y_{S},\hat{B}Z\})}\right]  
\displaystyle\leq 4\sum_{i_{2}^{\prime}\neq i_{2}}2^{-D_{H}^{\varepsilon}(\rho_{XB}\|\rho_{XB}^{(\{X_{S},B\})})}+\varepsilon. 
In the first two equalities we use the notation X\equiv x(i_{1},i_{2}),X^{\prime}\equiv x(i_{1},i_{2}^{\prime}) and similarly for Y,Y^{\prime}. In the fourth equality \widehat{\rho}_{\hat{B}Z}^{(x_{\overline{S}},y_{\overline{S}})} is the marginal of the conditional density operator \widehat{\rho}_{X_{S}\widehat{B}Y_{S}Z}^{(x_{\overline{S}},y_{\overline{S}})}. In the last inequality we use Theorem 11 and choose the dimensions of Y,Z to be sufficiently large so that h(1,k,d_{B},d_{Y}d_{Z})\leq\varepsilon.
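For intuition about the quantity 2^{-D_{H}^{\varepsilon}} controlling the error per wrong codeword, the following sketch (our own illustration) computes the \varepsilon-hypothesis-testing divergence in the commuting (classical) case, where the defining optimization reduces to the Neyman–Pearson test; the function name is ours.

```python
import numpy as np

def dh_eps(p, q, eps):
    """Hypothesis-testing divergence D_H^eps(p||q) in bits, for classical
    distributions: accept outcomes in decreasing likelihood-ratio order
    (Neyman-Pearson) until the test accepts p with probability >= 1 - eps;
    the resulting acceptance probability under q is the optimal type-II error."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    order = np.argsort(-(p / np.maximum(q, 1e-300)))
    need = 1.0 - eps   # remaining acceptance probability required under p
    beta = 0.0         # type-II error: acceptance probability under q
    for pi, qi in zip(p[order], q[order]):
        if need <= 0:
            break
        if pi <= 0:
            continue
        t = min(1.0, need / pi)   # possibly fractional acceptance of this outcome
        beta += t * qi
        need -= t * pi
    return -np.log2(max(beta, 1e-300))

p, q = np.array([0.9, 0.1]), np.array([0.5, 0.5])
print(dh_eps(p, q, 0.1))   # 1.0 bit for this example
```

In the quantum setting the minimization runs over all operators 0\leq Q\leq\mathbbm{1} rather than classical tests, but the operational meaning, the best type-II error at fixed type-I error \varepsilon, is the same.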
Finally, we can invoke the usual derandomization argument to remove the dependency of our POVM on the choice of y(i). That is, we know that
\displaystyle\mathbb{E}\operatorname{tr}\left[(I-Q^{(i)}_{B})\rho_{B}^{(i)}\right]  \displaystyle=\mathbb{E}_{\mathcal{C}^{\prime}}\mathbb{E}_{\mathcal{C}}\operatorname{tr}\left[(I-Q^{(i)}_{B})\rho_{B}^{(i)}\right]  
\displaystyle\leq\varepsilon+f(1,k,\varepsilon)+2g(1,k,\varepsilon)+4\sum_{i_{2}^{\prime}\neq i_{2}}2^{-D_{H}^{\varepsilon}(\rho_{XB}\|\rho_{XB}^{(\{X_{S},B\})})}. 
Hence, there is a particular choice of y(i) such that the corresponding POVM Q^{(i_{2}|i_{1})}_{B} satisfies the bound in Lemma 12, with
\displaystyle f(k,\varepsilon)=\varepsilon+f(1,k,\varepsilon)+2g(1,k,% \varepsilon). 
∎
VI Conclusions
The packing lemma is a cornerstone of classical network information theory, used as a black box in the analysis of all kinds of network communication protocols. At its core, the packing lemma follows from properties of the set of jointly typical sequences for multiple random variables. In this article, we provide an analogous statement in the quantum setting that we believe can serve a similar purpose for quantum network information theory. We illustrate this by using it as a black box to prove achievability results for the classical-quantum relay channel. Our result is based on a joint typicality lemma recently proved by Sen senInPrep (). At a high level, this result provides a single POVM which achieves the hypothesis testing bound for all possible divisions of a multiparty state into a tensor product of its marginals. It thereby allows for the construction of finite blocklength protocols for quantum multiple-access, relay, broadcast, and interference channels senInPrep2 ().
Two alternative formulations of joint typicality were proposed in dutil2011multiparty () and drescher2013simultaneous (). In the first work, the author conjectured the existence of a jointly typical state that is close to an i.i.d. multiparty state but whose marginals have purities satisfying certain bounds. This notion of typicality was then used in the analysis of multiparty state merging and assisted entanglement distillation protocols. In the second work, the authors provided a similar statement for the one-shot case. Specifically, for a given multiparty state, they conjectured the existence of a state that is close to the initial state but whose min-entropy is bounded by the smoothed min-entropy of the initial state for all marginals. In a follow-up paper we will try to understand the relationship between these various notions of quantum joint typicality and whether Sen's results can be extended to prove the other notions or to realize the applications for which they were designed.
Also, as noted in the corresponding section, our protocol for the partial decode-forward bound is not a straightforward generalization of the classical protocol in elgamal2011network (). Our algorithm involves a joint measurement on all the transmitted blocks instead of performing backward decoding followed by forward decoding as in the classical case. The problem arises because the classical protocol makes multiple measurements on a single system, interleaved with intermediate measurements on other systems. Hence, a direct application of our packing lemma has to combine these different measurements, including the intermediate ones, into one joint measurement. This results in a set of inequalities for the rate region that must be simplified to obtain the desired bound, a step that might also be necessary in other applications of our packing lemma.
There are still several interesting open questions regarding quantum relay channels. The most obvious is proving converses for the given achievability lower bounds. Converses are known for special classical relay channels, and it would be interesting to extend them to the quantum case, as we did for semideterministic relay channels. Another, albeit less trivial, direction is to prove a quantum equivalent of the compress-forward lower bound elgamal2011network (). We might need to analyze this in the entanglement-assisted setting, since it is only there that a single-letter quantum rate-distortion theorem is known datta2013quantum (). Another idea is to study networks of relay channels, where the relays operate in series or in parallel. Some preliminary work was done in jin2012lower (), and the most general notion of this in the classical literature is a multicast network elgamal2011network (). Lastly, relay channels with feedback would also be interesting to investigate.
Acknowledgements.
We thank Pranab Sen for interesting discussions and for sharing his draft senInPrep () with us. We would also like to thank Mario Berta, Philippe Faist, and Mark Wilde for inspiring discussions. PH was supported by AFOSR (FA9550-16-1-0082), CIFAR, and the Simons Foundation. HG was supported in part by NSF grant PHY-1720397. MW acknowledges financial support by the NWO through Veni grant no. 680-47-459. DD is supported by the Stanford Graduate Fellowship and the National Defense Science and Engineering Graduate Fellowship. DD would like to thank God for all of His provisions.
References
 (1) Abbas El Gamal and YoungHan Kim. Network Information Theory. Cambridge University Press, 2011.
 (2) Alexander S Holevo. The capacity of the quantum channel with general signal states. IEEE Transactions on Information Theory, 44(1):269–273, 1998. doi:10.1109/18.651037.
 (3) Benjamin Schumacher and Michael D Westmoreland. Sending classical information via noisy quantum channels. Physical Review A, 56(1):131, 1997. doi:10.1103/PhysRevA.56.131.
 (4) Pranab Sen. A oneshot quantum joint typicality lemma. 2018. arXiv:1806.07278.
 (5) Omar Fawzi and Ivan Savov. Ratesplitting in the presence of multiple receivers. CoRR, abs/1207.0543, 2012. arXiv:1207.0543.
 (6) Omar Fawzi, Patrick Hayden, Ivan Savov, Pranab Sen, and Mark M Wilde. Classical communication over a quantum interference channel. IEEE Transactions on Information Theory, 58(6):3670–3691, 2012. doi:10.1109/TIT.2012.2188620.
 (7) Nicolas Dutil. Multiparty quantum protocols for assisted entanglement distillation. arXiv preprint arXiv:1105.4657, 2011. arXiv:1105.4657.
 (8) Andreas Winter. The capacity of the quantum multipleaccess channel. IEEE Transactions on Information Theory, 47(7):3059–3065, 2001. doi:10.1109/18.959287.
 (9) Omar Fawzi, Patrick Hayden, Ivan Savov, Pranab Sen, and Mark M Wilde. Quantum interference channels. In Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on, pages 609–616. IEEE, 2011. doi:10.1109/Allerton.2011.6120224.
 (10) Matthias Christandl, M Burak Şahinoğlu, and Michael Walter. Recoupling coefficients and quantum entropies. In Annales Henri Poincaré, volume 19, pages 385–410. Springer, 2018. doi:10.1007/s0002301706391.
 (11) Michael Walter. Multipartite quantum states and their marginals. PhD thesis, ETH Zurich, 2014. doi:10.3929/ethza010250985.
 (12) Claude E Shannon. Twoway communication channels. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. The Regents of the University of California, 1961.
 (13) Thomas Cover. Broadcast channels. IEEE Transactions on Information Theory, 18(1):2–14, 1972. doi:10.1109/TIT.1972.1054727.
 (14) Rudolf Ahlswede. The capacity region of a channel with two senders and two receivers. The Annals of Probability, pages 805–814, 1974. doi:10.1214/aop/1176996549.
 (15) Edward C Van Der Meulen. Threeterminal communication channels. Advances in Applied Probability, 3(1):120–154, 1971. doi:10.2307/1426331.
 (16) Gerhard Kramer, Michael Gastpar, and Piyush Gupta. Cooperative strategies and capacity theorems for relay networks. IEEE Transactions on Information Theory, 51(9):3037–3063, 2005. doi:10.1109/TIT.2005.853304.
 (17) LiangLiang Xie and Panganamala R Kumar. An achievable rate for the multiplelevel relay channel. IEEE Transactions on Information Theory, 51(4):1348–1358, 2005. doi:10.1109/TIT.2005.844066.
 (18) Jon Yard, Patrick Hayden, and Igor Devetak. Capacity theorems for quantum multipleaccess channels: Classicalquantum and quantumquantum capacity regions. IEEE Transactions on Information Theory, 54(7):3091–3113, 2008. doi:10.1109/TIT.2008.924665.
 (19) Jon Yard, Patrick Hayden, and Igor Devetak. Quantum broadcast channels. IEEE Transactions on Information Theory, 57(10):7147–7162, 2011. doi:10.1109/TIT.2011.2165811.
 (20) Frédéric Dupuis, Patrick Hayden, and Ke Li. A father protocol for quantum broadcast channels. IEEE Transactions on Information Theory, 56(6):2946–2956, 2010. doi:10.1109/TIT.2010.2046217.
 (21) Pranab Sen. Achieving the HanKobayashi inner bound for the quantum interference channel. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 736–740. IEEE, 2012. doi:10.1109/ISIT.2012.6284656.
 (22) Lukas Drescher and Omar Fawzi. On simultaneous minentropy smoothing. In Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, pages 161–165. IEEE, 2013. doi:10.1109/ISIT.2013.6620208.
 (23) Janis Nötzel. A solution to two-party typicality using representation theory of the symmetric group. arXiv:1209.5094.
 (24) Ivan Savov, Mark M Wilde, and Mai Vu. Partial decodeforward for quantum relay channels. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 731–735. IEEE, 2012. doi:10.1109/ISIT.2012.6284655.
 (25) Shi JinJing, Shi RongHua, Peng XiaoQi, Guo Ying, Yi LiuYang, and Lee MoonHo. Lower bounds on the capacities of quantum relay channels. Communications in Theoretical Physics, 58(4):487, 2012. doi:10.1088/02536102/58/4/06.
 (26) Daniel Collins, Nicolas Gisin, and Hugues De Riedmatten. Quantum relays for long distance quantum cryptography. Journal of Modern Optics, 52(5):735–753, 2005. doi:10.1080/09500340412331283633.
 (27) Thomas Cover and Abbas El Gamal. Capacity theorems for the relay channel. IEEE Transactions on Information Theory, 25(5):572–584, 1979. doi:10.1109/TIT.1979.1056084.
 (28) Nilanjana Datta, MinHsiu Hsieh, and Fernando GSL Brandao. Strong converse rates and an example of violation of the strong converse property. arXiv:1106.3089.
 (29) Tomohiro Ogawa and Hiroshi Nagaoka. Strong converse and Stein's lemma in quantum hypothesis testing. In Asymptotic Theory Of Quantum Statistical Inference: Selected Papers, pages 28–42. World Scientific, 2005. doi:10.1142/9789812563071_0003.
 (30) Fumio Hiai and Dénes Petz. The proper formula for relative entropy and its asymptotics in quantum probability. Communications in mathematical physics, 143(1):99–114, 1991. doi:10.1007/BF02100287.
 (31) Masahito Hayashi and Hiroshi Nagaoka. General formulas for capacity of classicalquantum channels. IEEE Transactions on Information Theory, 49(7):1753–1768, 2003. doi:10.1109/TIT.2003.813556.
 (32) Pranab Sen. Inner bounds via simultaneous decoding in quantum network information theory. 2018. arXiv:1806.07278.
 (33) Abbas El Gamal and Mohammad Aref. The capacity of the semideterministic relay channel (corresp.). IEEE Transactions on Information Theory, 28(3):536–536, 1982. doi:10.1109/TIT.1982.1056502.
 (34) Mark M Wilde. Quantum information theory. Cambridge University Press, 2013.
 (35) Nilanjana Datta, MinHsiu Hsieh, and Mark M Wilde. Quantum rate distortion, reverse shannon theorems, and sourcechannel separation. IEEE Transactions on Information Theory, 59(1):615–630, 2013. doi:10.1109/TIT.2012.2215575.
Appendix A Proof of Cutset Bound
We give a proof of Proposition 4, essentially identical to that of elgamal2011network ():
Proof.
Consider an (n,2^{nR}) code for \mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}}. Suppose we have a uniform distribution over the message set M, and denote the final classical system obtained by Bob from the POVM measurement by \hat{M}. By the classical Fano’s inequality,
\displaystyle nR=H(M)=I(M;\hat{M})+H(M|\hat{M})\leq I(M;\hat{M})+n\delta(n), 
where \delta(n) satisfies \lim_{n\to\infty}\delta(n)=0 if the decoding error is to vanish in the asymptotic limit.
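Fano's inequality itself states H(M|\hat{M})\leq h_{2}(P_{e})+P_{e}\log_{2}(|M|-1), where P_{e}=\Pr[M\neq\hat{M}] and h_{2} is the binary entropy. As a quick, non-authoritative numerical sketch (our own, with our own helper names), this can be checked on random joint distributions of (M,\hat{M}):

```python
import numpy as np

rng = np.random.default_rng(1)

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x <= 0 or x >= 1 else -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def cond_entropy(P):
    """H(M | Mhat) in bits for a joint pmf P[m, mhat]."""
    H = 0.0
    for j, pj in enumerate(P.sum(axis=0)):
        if pj > 0:
            H += pj * sum(-c * np.log2(c) for c in P[:, j] / pj if c > 0)
    return H

k = 4  # message set size |M|
for _ in range(200):
    P = rng.random((k, k))
    P /= P.sum()                           # random joint pmf of (M, Mhat)
    pe = 1.0 - np.trace(P)                 # error probability Pr[M != Mhat]
    bound = h2(pe) + pe * np.log2(k - 1)   # Fano upper bound on H(M|Mhat)
    assert cond_entropy(P) <= bound + 1e-9
print("Fano's inequality holds on all random instances")
```

Since P_{e}\to 0 forces the bound to vanish, this is exactly what licenses absorbing H(M|\hat{M}) into the n\delta(n) term above.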
We denote by (X_{1})_{j},(X_{2})_{j},(B_{2})_{j},(B_{3})_{j} the respective classical and quantum systems induced by our protocol. We argue
\displaystyle I(M;\hat{M})\leq I(M;B_{3}^{n})  \displaystyle=\sum_{j=1}^{n}I(M;(B_{3})_{j}|B_{3}^{j-1})  
\displaystyle\leq\sum_{j=1}^{n}I(MB_{3}^{j-1};(B_{3})_{j})  
\displaystyle\leq\sum_{j=1}^{n}I((X_{1})_{j}(X_{2})_{j}MB_{3}^{j-1};(B_{3})_{j})  
\displaystyle=\sum_{j=1}^{n}I((X_{1})_{j}(X_{2})_{j};(B_{3})_{j}). 
The last step follows from the i.i.d. nature of the n channel uses and the fact that the channel is classical-quantum. More explicitly, writing out the overall state as the protocol progresses, and using that the input to the channel in each round is classical, it is not difficult to see that, given (X_{1})_{j}(X_{2})_{j}, the systems (B_{2})_{j}(B_{3})_{j} are in tensor product with the other systems. This would not hold if the channel took quantum inputs, in which case we would expect an upper bound involving regularization. Now, similarly,
\displaystyle I(M;\hat{M})\leq I(M;B_{3}^{n})  \displaystyle\leq I(M;B_{2}^{n}B_{3}^{n})  
\displaystyle=\sum_{j=1}^{n}I(M;(B_{2})_{j}(B_{3})_{j}|B_{2}^{j-1}B_{3}^{j-1})  
\displaystyle=\sum_{j=1}^{n}I(M;(B_{2})_{j}(B_{3})_{j}|B_{2}^{j-1}B_{3}^{j-1}(X_{2})_{j})  
\displaystyle\leq\sum_{j=1}^{n}I(MB_{2}^{j-1}B_{3}^{j-1};(B_{2})_{j}(B_{3})_{j}|(X_{2})_{j})  
\displaystyle\leq\sum_{j=1}^{n}I((X_{1})_{j}MB_{2}^{j-1}B_{3}^{j-1};(B_{2})_{j}(B_{3})_{j}|(X_{2})_{j})  
\displaystyle=\sum_{j=1}^{n}I((X_{1})_{j};(B_{2})_{j}(B_{3})_{j}|(X_{2})_{j}), 
where the second equality follows since, given B_{2}^{j-1}, one can obtain (X_{2})_{j} by a series of \mathcal{R} operations (note that (B_{2})_{0}(B_{2}^{\prime})_{0} is a trivial system and thus independent of the code).
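The manipulations above repeatedly use only the chain rule and nonnegativity of (conditional) mutual information, in the form I(A;B|C)\leq I(AC;B). As a small sanity check (our own illustration, for classical variables where all entropies are Shannon entropies), this identity can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(2)

def H(P):
    """Shannon entropy in bits of a (possibly multi-dimensional) pmf."""
    return float(sum(-x * np.log2(x) for x in np.ravel(P) if x > 0))

# random joint pmf of (A, B, C) on small alphabets; axes are (A, B, C)
P = rng.random((3, 3, 3))
P /= P.sum()

HB = H(P.sum((0, 2)))
HC = H(P.sum((0, 1)))
HBC, HAC, HABC = H(P.sum(0)), H(P.sum(1)), H(P)

I_A_B_given_C = HAC + HBC - HC - HABC   # I(A;B|C)
I_C_B = HC + HB - HBC                   # I(C;B)
I_AC_B = HAC + HB - HABC                # I(AC;B)

# chain rule: I(AC;B) = I(C;B) + I(A;B|C)
assert abs(I_AC_B - (I_C_B + I_A_B_given_C)) < 1e-9
# since I(C;B) >= 0, this gives I(A;B|C) <= I(AC;B)
assert I_A_B_given_C <= I_AC_B + 1e-9
```

In the quantum steps of the proof the same inequalities hold with von Neumann entropies, since conditional quantum mutual information is nonnegative by strong subadditivity.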
Define the state
\displaystyle\sigma_{QX_{1}X_{2}B_{2}B_{3}}\equiv\frac{1}{n}\sum_{q=1}^{n}\ket% {q}\bra{q}_{Q}\otimes\sigma^{(q)}_{X_{1}X_{2}B_{2}B_{3}}, 
where \sigma^{(q)} is the classicalquantum state on the qth round of the protocol, that is, the state on the system (X_{1})_{q}(X_{2})_{q}(B_{2})_{q}(B_{3})_{q}. Now, I(B_{2}B_{3};Q|X_{1}X_{2})_{\sigma}=0, so
\displaystyle\sum_{j=1}^{n}I((X_{1})_{j}(X_{2})_{j};(B_{3})_{j})  \displaystyle=nI(X_{1}X_{2};B_{3}|Q)_{\sigma}  
\displaystyle\leq nI(X_{1}X_{2}Q;B_{3})_{\sigma}  
\displaystyle=nI(X_{1}X_{2};B_{3})_{\sigma} 
and similarly
\displaystyle\sum_{j=1}^{n}I((X_{1})_{j};(B_{2})_{j}(B_{3})_{j}|(X_{2})_{j})  \displaystyle=nI(X_{1};B_{2}B_{3}|X_{2}Q)_{\sigma}  
\displaystyle\leq nI(X_{1}Q;B_{2}B_{3}|X_{2})_{\sigma}  
\displaystyle=nI(X_{1};B_{2}B_{3}|X_{2})_{\sigma}. 
Hence,
\displaystyle R\leq\min\{I(X_{1}X_{2};B_{3})_{\sigma},I(X_{1};B_{2}B_{3}|X_{2})_{\sigma}\}+\delta(n). 
Now, since \sigma_{X_{1}X_{2}B_{2}B_{3}} is simply a uniform average of the classical-quantum states from each round of the protocol, it is also a possible classical-quantum state induced by \mathcal{N}_{X_{1}X_{2}\to B_{2}B_{3}} acting on some classical input distribution p_{X_{1}X_{2}}. In particular, R is therefore upper bounded by maximizing the right-hand side over input distributions:
\displaystyle R\leq\max_{p_{X_{1}X_{2}}}\min\{I(X_{1}X_{2};B_{3}),I(X_{1};B_{2}B_{3}|X_{2})\}+\delta(n). 
Taking the n\to\infty limit completes the proof. ∎
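To make the cutset expression concrete, the following sketch evaluates \max_{p_{X_{1}X_{2}}}\min\{I(X_{1}X_{2};B_{3}),I(X_{1};B_{2}B_{3}|X_{2})\} for a toy fully classical relay channel of our own devising (binary symmetric links with flip probability 0.1; the channel model, parameter, and function names are all our assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)

def H(P):
    """Shannon entropy in bits of a pmf."""
    return float(sum(-x * np.log2(x) for x in np.ravel(P) if x > 0))

# Toy classical relay channel, all systems binary: B2 = X1 through a BSC(0.1)
# (source -> relay) and B3 = X2 through a BSC(0.1) (relay -> destination).
def joint(px, f=0.1):
    """Joint pmf over (x1, x2, b2, b3) induced by input distribution px."""
    P = np.zeros((2, 2, 2, 2))
    for x1, x2, b2, b3 in np.ndindex(2, 2, 2, 2):
        p2 = 1 - f if b2 == x1 else f
        p3 = 1 - f if b3 == x2 else f
        P[x1, x2, b2, b3] = px[x1, x2] * p2 * p3
    return P

def cutset(px):
    P = joint(px)
    # I(X1X2;B3) = H(B3) + H(X1X2) - H(X1X2B3)
    cut1 = H(P.sum((0, 1, 2))) + H(P.sum((2, 3))) - H(P.sum(2))
    # I(X1;B2B3|X2) = H(X1X2) + H(X2B2B3) - H(X2) - H(X1X2B2B3)
    cut2 = H(P.sum((2, 3))) + H(P.sum(0)) - H(P.sum((0, 2, 3))) - H(P)
    return min(cut1, cut2)

# crude random search over input distributions for the outer maximization
best = 0.0
for _ in range(500):
    px = rng.random((2, 2))
    px /= px.sum()
    best = max(best, cutset(px))
print(f"cutset upper bound for the toy relay channel: ~{best:.3f} bits")
```

For this toy channel both cuts are bottlenecked by a single BSC(0.1), so the maximum is 1-h_{2}(0.1)\approx 0.531 bits, attained near the uniform input distribution; the random search should approach this value.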