
# Simple algorithm for computing the communication complexity of quantum communication processes

A. Hansen, A. Montina, S. Wolf
Facoltà di Informatica, Università della Svizzera Italiana, Via G. Buffi 13, 6900 Lugano, Switzerland
July 6, 2019
###### Abstract

A two-party quantum communication process with classical inputs and outcomes can be simulated by replacing the quantum channel with a classical one. The minimal amount of classical communication required to reproduce the statistics of the quantum process is called its communication complexity. In the case of many instances simulated in parallel, the minimal communication cost per instance is called the asymptotic communication complexity. Previously, we reduced the computation of the asymptotic communication complexity to a convex minimization problem. In most cases, the objective function does not have an explicit analytic form, as the function is defined as the maximum over an infinite set of convex functions. Therefore, the overall problem takes the form of a minimax problem and cannot be directly solved by standard optimization methods. In this paper, we introduce a simple algorithm to compute the asymptotic communication complexity. For some special cases with an analytic objective function, one can employ available convex-optimization libraries; in the tested cases our method turned out to be notably faster. Finally, using our method we obtain an improved lower bound on the asymptotic communication complexity of a noiseless quantum channel with a capacity of 1 qubit.

## I Introduction

Quantum communication can be tremendously more powerful than its classical counterpart in solving distributed computational problems [buhrman]. This is one of the important results of quantum communication complexity, a field concerned with the understanding of quantum channels and how they compare to classical ones. A measure of performance of a quantum communication process, called the communication complexity, is the amount of communication required by the most efficient classical protocol simulating the process. If there are n instances of the same quantum process, they can be simulated in parallel. The minimal communication cost per instance in the asymptotic limit n → ∞ is called the asymptotic communication complexity of the quantum process. In a previous work [montina], we showed that the computation of this quantity can be reduced to a convex minimization problem. Generally, the objective function does not take an analytic form, but is given as the maximum over an infinite set of convex functions. Thus, the computation of the asymptotic communication complexity is generally a minimax problem. In special cases with a suitable symmetry, the objective function is analytically known and the dual form of the minimization is a geometric program [montina2, montina3]. Geometric programming is an extensively studied class of nonlinear optimization problems that can be solved by robust and very efficient algorithms [gp1, gp2]. A commercial implementation is provided by the MOSEK package (see http://www.mosek.com). In this paper, we present a simple and robust algorithm that solves the general minimax problem, which cannot be directly handled by convex or geometric-programming libraries. Furthermore, in the numerically considered cases in which the objective function is known, our tailored algorithm turns out to be much faster than available libraries solving convex problems and, more specifically, geometric programs.

The paper is organized as follows. In Sec. II, we introduce the concept of a classical simulation of a quantum communication process and use it to define the communication complexity of the process. In Sec. III, we review the results of Ref. [montina], where it was shown that the computation of the communication complexity can be reduced to a convex optimization problem. In Sec. IV, the algorithm for computing the communication complexity is presented. The convergence of the algorithm is discussed in Sec. V. As the algorithm is iterative, the solution is approached asymptotically, and the iteration is stopped when a desired accuracy is reached. In Sec. VI, we provide an upper bound on the error and, thus, a stopping criterion. Finally, in Sec. VII, we illustrate the method with a numerical example and derive an improved lower bound on the asymptotic communication complexity of a noiseless quantum channel with a capacity of 1 qubit.

## II Classical simulation of a quantum communication process

### II.1 Quantum scenario

Let us consider the following one-way quantum communication process between two parties. A party, say Alice, prepares a quantum state, labeled by an index a, chosen among the elements of a set A. Then, she sends the quantum state to Bob through a quantum channel. Finally, Bob performs a measurement, labeled by b, chosen among a set B of measurements. The measurement produces an outcome s with some probability P(s|a,b) depending on the prepared quantum state and the performed measurement. The function P(s|a,b) completely characterizes the overall process and depends on the sets A and B as well as on the quantum channel used in the communication. Alice and Bob both know the sets A and B, but not the choice made by the other party. That is, their choices are not mutually conditioned. There is no particular constraint on A and B, but since we are interested in implementing a numerical method, we assume that A and B have a finite but arbitrarily large number of elements. That is, their cardinalities |A| and |B| are finite.

### II.2 Single-shot classical simulation

Since the inputs and outcomes are classical, the statistics of a quantum process can be classically simulated by replacing the quantum communication with a classical channel. Besides, Alice and Bob are allowed to share a stochastic variable, say y. The variable y can be an arbitrarily long list of numbers that is generated and delivered to the parties before the inputs a and b are chosen, so that the numbers in the list do not contain any information on a and b. The corresponding classical protocol is as follows. Alice generates a variable k according to a probability distribution ρA(k|a,y) that depends on her input a and the shared stochastic variable y. The variable y is generated according to some probability distribution ρs(y) and, as mentioned before, is uncorrelated with a and b. Then, Alice sends k to Bob. Finally, Bob generates an outcome s with a conditional probability ρB(s|b,k,y) depending on his input b, the communicated variable k, and the shared stochastic variable y. Alice and Bob can also use private stochastic variables, but these can be included in y without loss of generality. The protocol exactly simulates the process P(s|a,b) if

 ∑k∫dy ρB(s|b,k,y)ρA(k|a,y)ρs(y)=P(s|a,b). (1)

Note that the integral symbol stands for an integral over some measurable space; the space of y could equally well be discrete.

There are different definitions of the communication cost of a protocol. Without loss of generality, we can assume that k is generated deterministically for each value of Alice's input a and of the shared variable y. The number of bits required to encode and transmit the variable k depends in general on these two values. Let C(a,y) be the number of bits sent by Alice when she chooses a with the shared noise y. The worst-case communication cost is the maximum of C(a,y) over every possible value taken by a and y. Alternatively, one can first average over y and then take the maximum over the input a to obtain the so-called average communication cost

 C̄ ≡ maxa ∫dy ρs(y) C(a,y). (2)

There is also an entropic definition, which has been used in Ref. [montina]. The entropic cost is always smaller than or equal to the average cost C̄. The results presented here hold for both of the last two definitions, so the average and entropic costs can be used interchangeably. In what follows, we will refer to the average cost C̄.

###### Definition 1 (Communication complexity).

We define the communication complexity Cmin of a quantum process to be the minimal communication cost required to simulate it.

### II.3 Parallel protocols

If the two parties simulate n instances of the same quantum process with different inputs for each instance, it is possible to envisage a larger set of communication protocols, where the probability of generating the communicated variable k can depend on the full set of Alice's inputs, a1, …, an. In other words, the distribution ρA(k|a,y) becomes ρA(k|a1,…,an,y). The asymptotic communication cost, hereafter denoted by C̄asym, is the cost of the parallelized simulation divided by n in the limit n → ∞.

###### Definition 2.

We define the asymptotic communication complexity Casymmin of a problem to be the minimum of C̄asym over the class of parallel protocols that solve the problem.

Since the set of protocols working for parallel simulations is larger than the set of single-shot protocols, it is clear that

 Casymmin≤Cmin. (3)

However, as shown in Ref. [montina], the difference between Cmin and Casymmin scales at most as the logarithm of Casymmin, as reviewed in the next section.

## III Computation of Casymmin as a convex optimization problem

In Ref. [montina], we showed that the computation of the asymptotic communication complexity Casymmin is equivalent to a convex optimization problem. Tight lower and upper bounds on the single-shot communication complexity Cmin are given in terms of Casymmin. The optimization is made over a suitable set of probability distributions. The set, denoted by V(P), depends on the quantum process P(s|a,b) and is defined as follows.

###### Definition 3.

Given a process P(s|a,b), the set V(P) is defined as the set of conditional probabilities ρ(s|a) over the sequences s ≡ (s1,…,s|B|) whose marginal distribution of the b-th variable sb is equal to P(sb|a,b). In other words, the set contains any ρ(s|a) satisfying the constraints

 ∑s,sb=s ρ(s|a) = P(s|a,b)  ∀ a, b and s, (4)

where the sum is performed over every element of the sequence s except the b-th element sb, which is set equal to s.

The central result in Ref. [montina] is a convex optimization problem that yields the asymptotic communication complexity of P(s|a,b). The asymptotic communication complexity is equal to the minimum over V(P) of the capacity of the channel a → s, the capacity being a convex functional of ρ(s|a). Before introducing the definition of the channel capacity, let us recall some concepts from information theory. The mutual information of two stochastic variables X and Y is defined as

 I(X;Y) = ∑x ∑y ρ(x,y) log2 [ρ(x,y)/(ρ(x)ρ(y))], (5)

where ρ(x,y) is the joint probability distribution of X and Y, and ρ(x) and ρ(y) are the marginal distributions of X and Y, respectively [cover]. The mutual information is a measure of the degree of correlation between two stochastic variables. It is always non-negative, and equal to zero if the variables are uncorrelated. Let us now introduce the concept of a channel. In information theory, a channel is a physical device, such as a wire, carrying information from a sender to a receiver. The channel is mathematically represented by a conditional probability ρ(y|x) of getting the outcome y given the input x [cover]. The capacity of the channel x → y, which we denote by C(x→y), is the maximum of the mutual information between x and y over the space of probability distributions ρ(x) of the input x [cover], that is,

 C(x→y)≡maxρ(x)I(x;y). (6)

The information-theoretic interpretation of the channel capacity is provided by the noisy-channel coding theorem [cover]. Roughly speaking, the capacity of a channel is the maximum rate of information that can be reliably transmitted through the channel.
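As a small self-contained illustration (our own sketch; the function name and array layout are not from the paper), the mutual information of Eq. (5) can be computed directly from a joint distribution:

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution rho(x, y), as in Eq. (5).

    `joint` is a 2-D array summing to 1; terms with rho(x, y) = 0
    contribute zero, by the usual convention 0 log 0 = 0.
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal rho(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal rho(y)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px * py)[mask])))
```

For a product distribution the function returns 0, while two perfectly correlated uniform bits give 1 bit, matching the interpretation of the mutual information as a degree of correlation.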

Given these definitions, let us introduce the functional D(P) as the minimum of the capacity C(a→s) over the distributions ρ(s|a) in V(P).

 D(P) ≡ minρ(s|a)∈V(P) C(a→s) = minρ(s|a)∈V(P) maxρ(a) I(A;S). (7)

The following theorems, proved in Ref. [montina], relate D(P) to the communication complexity.

###### Theorem 1.

The asymptotic communication complexity Casymmin of a process P(s|a,b) is equal to D(P).

###### Theorem 2.

The communication complexity Cmin is bounded by the inequalities

 D(P)≤Cmin≤D(P)+2log2[D(P)+1]+2log2e. (8)
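To make the size of the gap in Eq. (8) concrete, here is a one-line helper (the naming is ours) evaluating the right-hand side; at D(P) = 0 the gap is just 2 log2 e ≈ 2.89 bits, and the relative gap vanishes as D(P) grows.

```python
import math

def single_shot_upper_bound(D):
    """Right-hand side of Eq. (8): D(P) + 2 log2(D(P) + 1) + 2 log2 e."""
    return D + 2 * math.log2(D + 1) + 2 * math.log2(math.e)
```

For instance, at D(P) = 1000 bits the upper bound exceeds D(P) by less than 3%, illustrating that the single-shot and asymptotic complexities differ only logarithmically.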

The single-shot communication complexity Cmin is always greater than or equal to the asymptotic communication complexity Casymmin. However, as anticipated in the previous section, the difference scales at most logarithmically in D(P). Theorem 1 reduces the computation of the asymptotic communication complexity to the following convex optimization problem.

###### Problem 1.
 minρ(s|a) C(a→s) subject to the constraints ρ(s|a) ≥ 0, ∑s,sb=s ρ(s|a) = P(s|a,b). (9)

Note that the functional C(a→s) is convex in ρ(s|a), since the mutual information is convex in ρ(s|a) [cover] and the pointwise maximum of a set of convex functions is a convex function [boyd].

In general, the channel capacity does not have a known analytic expression. However, in some symmetric problems, it is possible to get rid of the maximization over ρ(a) in the definition of the channel capacity given by Eq. (6). This can be shown by using Sion's minimax theorem [minimax] and some general properties of the mutual information. As the mutual information is convex in ρ(s|a) and concave in ρ(a) [cover], we have from the minimax theorem that the minimization and maximization in the definition of D(P) [Eq. (7)] can be interchanged. Thus, we obtain

 D(P)=maxρ(a)J(P) (10)

where

 J(P)≡minρ(s|a)∈V(P)I(A;S) (11)

is a functional of ρ(a) and P. As the mutual information is concave in ρ(a) and the pointwise minimum of a set of concave functions is concave [boyd], the functional J is concave. In some cases, it is trivial to find the distribution ρ(a) maximizing J. For example, if the conditional probability P(s|a,b) is invariant under permutations of the input a, up to some suitable transformations of s and b, then we can infer from this symmetry and the concavity of J that the uniform distribution over a maximizes J. This case will be considered as a numerical example in Sec. VII.

Thus, if the maximizing distribution ρ(a) is known, the computation of D(P) is reduced to the following convex optimization problem.

###### Problem 2.
 minρ(s|a) I(A;S) subject to the constraints ρ(s|a) ≥ 0, ∑s,sb=s ρ(s|a) = P(s|a,b). (12)

As shown in Refs. [montina2, montina3], the dual form of Problem 2 is a geometric program (see Ref. [boyd] for an introduction to duality theory). Geometric programming is an extensively studied class of nonlinear optimization problems [gp1, gp2], and the commercial package MOSEK (see http://www.mosek.com) provides a solver specialized for this class. However, if the distribution ρ(a) is not known and we set it equal to an arbitrary distribution, the solution of Problem 2 yields merely a lower bound on the asymptotic communication complexity.

In Sec. IV, we present a simple and robust algorithm that directly solves Problem 1. Furthermore, in Sec. VII, we consider some numerical cases in which ρ(a) is known, and we show that the introduced algorithm turns out to be much faster than the MOSEK package in solving Problem 2.

## IV Numerical algorithm

We introduce a simple numerical algorithm (hereafter Algorithm 1) for solving Problem 1 and computing the asymptotic communication complexity of a quantum process P(s|a,b). A further simplification is given by Algorithm 2, which solves Problem 2. The two algorithms are based on the block coordinate descent method [bertsekas], also called alternating minimization or block-nonlinear Gauss-Seidel method. Alternating minimization is an iterative method that performs the minimization over blocks of variables. Namely, the set of variables with respect to which the minimization is performed is divided into blocks. The objective function is first minimized with respect to the variables in the first block, while keeping the variables in the other blocks constant, then with respect to the variables in the second block, and so on. This procedure is repeated cyclically. There are several results on the convergence of the method for constrained and unconstrained problems. The continuous differentiability of the objective function is generally the basic common assumption. In Ref. [bertsekas], it is proved that the algorithm converges toward a minimum if each block minimization has a unique solution. As the objective function of our problem is not differentiable everywhere (see the later discussion), the results relying on continuous differentiability cannot be employed for a convergence proof. The convergence of Algorithm 2 is a consequence of a general theorem proved in Ref. [csiszar]. Although these results cannot be adapted to the case of Algorithm 1, we will provide arguments for its convergence in Sec. V. The convergence proof for Algorithm 2 is given in the same section.
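The two-block scheme can be illustrated on a toy smooth objective (our own example, unrelated to the functional minimized in this paper), where each block minimization has a closed form:

```python
def alternating_minimization(x=0.0, y=5.0, steps=60):
    """Two-block coordinate descent on f(x, y) = (x - y)**2 + (x - 1)**2.

    Block 1: argmin over x with y fixed is x = (y + 1) / 2.
    Block 2: argmin over y with x fixed is y = x.
    The iterates converge geometrically to the minimizer (1, 1).
    """
    for _ in range(steps):
        x = (y + 1.0) / 2.0   # exact block minimization over x
        y = x                 # exact block minimization over y
    return x, y
```

Each block update can only lower the objective, which is also the mechanism exploited by Algorithms 1 and 2 below.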

The alternating minimization method with a two-block partition is particularly advantageous for the computation of the asymptotic communication complexity. Indeed, using a suitable partition, one block minimization turns out to be decoupled from the maximization with respect to ρ(a), whereas the minimization in the other block can be performed analytically.

To derive the algorithm, let us recast Problem 1 as follows. The task is to evaluate the quantity D(P) defined by Eq. (7), which takes the form of Eqs. (10) and (11). The mutual information can be rewritten as the minimization of the functional

 K ≡ ∑s,a ρ(s|a) ρ(a) ln[ρ(s|a)/R(s)] (13)

over the space of probability distributions R(s). Indeed, using the Karush-Kuhn-Tucker conditions for optimality [boyd], we find that the global minimum of the functional with respect to R(s) is at R(s) = ∑a ρ(s|a)ρ(a) ≡ ρ(s) [note that the functional is convex in R(s)]. Thus, Eqs. (10) and (11) turn into the following minimax problem,

 D(P)=maxρ(a)minρ(s|a)∈V(P)minR(s)K. (14)
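The identity behind Eq. (14), namely that minimizing the functional K of Eq. (13) over R(s) recovers the mutual information, is easy to check numerically. The following sketch uses our own variable names; values are in nats, as Eq. (13) uses the natural logarithm.

```python
import numpy as np

def K(rho_sa, rho_a, R):
    # Eq. (13): K = sum_{s,a} rho(s|a) rho(a) ln[rho(s|a) / R(s)]
    joint = rho_sa * rho_a               # joint distribution rho(s, a)
    return float(np.sum(joint * np.log(rho_sa / R[:, None])))

rng = np.random.default_rng(1)
ns, na = 4, 3
rho_sa = rng.random((ns, na))
rho_sa /= rho_sa.sum(axis=0)             # columns are the conditionals rho(s|a)
rho_a = np.full(na, 1.0 / na)            # a fixed input distribution
rho_s = (rho_sa * rho_a).sum(axis=1)     # marginal rho(s)

mutual_info = K(rho_sa, rho_a, rho_s)    # K at R = rho(s) equals I(S;A)
```

Evaluating K at any other distribution R(s) gives a larger value, since K differs from I(S;A) by the relative entropy between ρ(s) and R(s).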

As K is linear in ρ(a) and convex in ρ(s|a) and R(s), we can again swap the minimization and the maximization [minimax], and obtain

 D(P)=minρ(s|a)∈V(P)minR(s)¯K, (15)

where

 ¯K≡maxρ(a)K (16)

is a convex functional of ρ(s|a) and R(s). Note that the function K is not differentiable if ρ(s|a) or R(s) is equal to zero for some s and a. Furthermore, K̄ is not differentiable at other points, since the maximizing distribution ρ(a) can be discontinuous as a function of ρ(s|a) and R(s).

To find the global minimum of K̄, we apply the block coordinate descent method by alternately minimizing with respect to ρ(s|a) and R(s). Given a strictly positive initial distribution R(s) (the strict positivity is fundamental for the convergence, as discussed at the end of the section and in Sec. V), we search for the distribution ρ(s|a) minimizing K̄ over the set V(P). Then, we minimize K̄ with respect to R(s). We iterate these two minimization steps until we reach the global minimum up to some given accuracy. By construction, each iteration always lowers the value of K̄.

### IV.1 Minimization w.r.t. ρ(s|a)

Let us consider the minimization with respect to ρ(s|a). This is made in the domain of non-negative functions under the constraints of Problem 1 and corresponds to solving the minimax problem

 minρ(s|a)maxρ(a)K=maxρ(a)minρ(s|a)K. (17)

We first solve the minimization problem, and show that the solution does not depend on the distribution ρ(a). The distribution ρ(s|a) solving the minimization problem under the constraints (4) minimizes the Lagrangian

 L1≡K−∑s,a,b¯λ(s,a,b)ρ(a)[∑s,sb=sρ(s|a)−P(s|a,b)] (18)

over the domain of ρ(s|a), where λ̄(s,a,b) are suitable Lagrange multipliers that are set so that the constraints (4) are satisfied or, equivalently, by maximizing the dual objective function, as discussed later. To find the minimum, we set the derivative of L1 with respect to ρ(s|a) equal to zero and obtain an optimal distribution of the form

 ρ(s|a)=R(s)e∑bλ(sb,a,b), (19)

where λ(s,a,b) ≡ λ̄(s,a,b) − 1/|B|, the constant shift absorbing the factor coming from the derivative of K. Replacing the distribution (19) in the Lagrangian, we obtain

 K1≡∑s,a,bP(s|a,b)ρ(a)λ(s,a,b)+1−∑sR(s)Fλ(s), (20)

where

 Fλ(s)≡∑aρ(a)e∑bλ(s,a,b). (21)

By definition, K1 is the dual objective function of the original minimization problem (see Ref. [boyd] for an introduction to duality theory). Since the constraints of the primal problem satisfy the Slater conditions [boyd], strong duality holds, and the maximum of K1 is equal to the minimum of the original problem. The maximum is characterized by setting the derivative of K1 with respect to λ(s,a,b) equal to zero. This gives

 ∑s,sb=s R(s) e∑b̄λ(sb̄,a,b̄) = P(s|a,b), (22)

that is, the constraints (4) after the substitution of Eq. (19). Thus, the minimizing distribution is given by Eq. (19) with the multipliers solving Eq. (22). As K1 is a concave function of the multipliers, solving Eq. (22) is equivalent to an unconstrained convex optimization problem, and it can be done with the Newton method [boyd]. The introduced quantity Fλ(s) plays an important role in the dual form of Problem 2, in evaluating lower and upper bounds on the asymptotic communication complexity, and in the formulation of necessary and sufficient conditions for optimality.

At this point, it is important to stress that the solution of Eq. (22) does not depend on ρ(a). That is, the minimization over ρ(s|a) in the minimax problem (17) is completely decoupled from the maximization over ρ(a). Thus, we have reduced the first step of the algorithm to a simple unconstrained maximization of the function K1 with respect to the multipliers λ(s,a,b). The computation of the optimal ρ(a) is irrelevant here, as it does not affect the next step, the minimization with respect to R(s).

### IV.2 Minimization w.r.t. R(s)

Let us consider the minimization of K̄ with respect to R(s), which corresponds to solving the minimax problem

 minR(s)maxρ(a)K=maxρ(a)minR(s)K. (23)

As already said, the minimization with respect to R(s) yields

 R(s)=∑aρ(s|a)ρ(a)≡ρ(s). (24)

Thus, the minimization replaces R(s) with ρ(s). Making this replacement in K, we get from Eq. (13)

 K = I(S;A) = ∑s,a ρ(s|a) ρ(a) ln[ρ(s|a)/ρ(s)], (25)

that is, K becomes the mutual information between the variables S and A. Thus, the maximization with respect to ρ(a) in Eq. (23) is just the computation of the capacity of the channel a → s, which can be performed by standard methods, such as the Blahut-Arimoto algorithm [cover].
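A compact version of the Blahut-Arimoto iteration, sketched here with our own conventions (`W[y, x]` holds ρ(y|x), with columns summing to 1), reads:

```python
import numpy as np

def blahut_arimoto(W, iters=1000):
    """Estimate the capacity, in bits, of the channel W[y, x] = rho(y|x)."""
    W = np.asarray(W, dtype=float)
    nx = W.shape[1]
    p = np.full(nx, 1.0 / nx)                 # initial input distribution rho(x)
    safeW = np.where(W > 0, W, 1.0)           # placeholder avoiding log(0)
    for _ in range(iters):
        q = W @ p                             # output marginal rho(y)
        # d[x] = relative entropy D(W(.|x) || q), in nats
        d = np.sum(W * np.where(W > 0, np.log(safeW / q[:, None]), 0.0), axis=0)
        p = p * np.exp(d)                     # multiplicative update
        p /= p.sum()
    q = W @ p
    terms = np.where(W > 0, np.log2(safeW / q[:, None]), 0.0)
    return float(np.sum(p * np.sum(W * terms, axis=0)))   # I(X;Y) at the final p
```

For a binary symmetric channel with crossover probability 0.1 this returns 1 − H2(0.1) ≈ 0.531 bits, the textbook value.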

### IV.3 Alternating minimization

The two block-minimizations over ρ(s|a) and R(s) are iterated until a given accuracy is reached. The stopping criterion will be discussed in Sec. VI.

Summarizing, the algorithm is as follows.
Algorithm 1 (for Problem 1).

1. Set some initial distribution R(s) > 0.

2. Compute λ(s,a,b) by solving the equations

 ∑s,sb=sR(s)e∑¯bλ(s¯b,a,¯b)=P(s|a,b). (26)
3. Set ρ(s|a) = R(s) e∑bλ(sb,a,b).

4. Maximize the mutual information

 I(S;A) = ∑s,a ρ(s|a) ρ(a) log[ρ(s|a)/∑ā ρ(s|ā)ρ(ā)] (27)

with respect to ρ(a) [computation of the capacity of the channel a → s].

5. Set R(s) = ∑a ρ(s|a) ρ(a).

6. Stop if a given accuracy is reached.

7. Repeat from step 2.

The algorithm can be recast into a more illuminating form by getting rid of ρ(s|a). This also substantially reduces the amount of required memory. Furthermore, the new form suggests an approximate computation of the channel capacity that is much more efficient numerically. Using constraint (4) and the expression of ρ(s|a) given at step 3, the mutual information (27) takes the form

 I(S;A)=∑s,a,bP(s|a,b)ρ(a)λ(s,a,b)−∑sR(s)Fλ(s)logFλ(s). (28)

As shown in Refs. [montina2, montina3] and later in Secs. V.1 and VI.3, at the optimum the minimizer satisfies the equation Fλ(s) = 1 wherever R(s) ≠ 0. Thus, if R(s) and λ(s,a,b) are close to the solution, we can approximate I(S;A) up to the second order in Fλ(s) − 1,

 I(S;A)≃∑s,a,bP(s|a,b)ρ(a)λ(s,a,b)+∑sR(s)Fλ(s)[1−Fλ(s)]. (29)

This form is quadratic in ρ(a). Using Eq. (4) and the definition (21) of Fλ(s), we obtain

 I(S;A) ≃ ∑a ρ(a) d1(a) + 1 − ∑a,a′ ρ(a) ρ(a′) d2(a,a′), (30)

where

 d1(a) ≡ ∑s,b P(s|a,b) λ(s,a,b),  d2(a,a′) ≡ ∑s R(s) e∑bλ(sb,a,b)+∑bλ(sb,a′,b). (31)

Maximizing the quadratic form (30) is numerically much more efficient than maximizing the exact form (28). Indeed, the maximization of the exact form requires computing the objective function and, possibly, its derivatives many times. The associated computational cost grows exponentially with |B| because of the sum over the sequences s. Conversely, with the approximate form, the computation of the coefficients d1 and d2, which is the hardest part, is made only once before each maximization.

Numerical experiments show that this approximation does not affect the convergence. Algorithm 1 is recast as follows
Algorithm 1b (for Problem 1).

1. Set some initial distribution R(s) > 0.

2. Compute λ(s,a,b) by solving the equations

 ∑s,sb=sR(s)e∑¯bλ(s¯b,a,¯b)=P(s|a,b). (32)
3. Maximize the function I(S;A), given by Eq. (28) or its approximate form (30), with respect to ρ(a).

4. Perform the replacement R(s) → R(s) Fλ(s).

5. Stop if a given accuracy is reached.

6. Repeat from step 2.

This recast of Algorithm 1 will be useful for the subsequent discussion on the convergence.

If the maximizing distribution ρ(a) is known, we can fix it and skip step 3, the maximization of I(S;A) over ρ(a). The resulting algorithm solves Problem 2 and is as follows.
Algorithm 2 (for Problem 2).

1. Set some initial distribution R(s) > 0.

2. Compute λ(s,a,b) by solving the equations

 ∑s,sb=sR(s)e∑¯bλ(s¯b,a,¯b)=P(s|a,b). (33)
3. Perform the replacement R(s) → R(s) Fλ(s).

4. Stop if a given accuracy is reached.

5. Repeat from step 2.
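For very small instances, Algorithm 2 can be prototyped in a few dozen lines. The sketch below is our own illustration, not the implementation used in the paper: it fixes ρ(a), solves the dual equations of step 2 by iterative proportional fitting rather than Newton's method (a substitution of ours), and certifies the result with the dual lower bound discussed in Sec. VI. All names and the toy process are ours; values are in nats (divide by ln 2 for bits).

```python
import itertools
import numpy as np

def algorithm2(P, rho_a, outer=300, inner=25):
    """Toy run of Algorithm 2 for P[s, a, b] = P(s|a,b) with fixed rho(a).

    Returns (upper, lower, residual): the value I(S;A) of the current
    feasible point, a dual lower bound obtained by shifting the multipliers,
    and the largest violation of the constraints (4).  All values in nats.
    """
    ns, na, nb = P.shape
    seqs = list(itertools.product(range(ns), repeat=nb))   # outcome sequences s
    R = np.full(len(seqs), 1.0 / len(seqs))                # strictly positive start
    lam = np.zeros((ns, na, nb))

    def weights():
        # rho(s|a) = R(s) exp(sum_b lambda(s_b, a, b)), one row per sequence
        return np.array([R[i] * np.exp(sum(lam[seq[c], :, c] for c in range(nb)))
                         for i, seq in enumerate(seqs)])

    def marginal(rho, b):
        # left-hand side of the constraints (4) for measurement b
        m = np.zeros((ns, na))
        for i, seq in enumerate(seqs):
            m[seq[b]] += rho[i]
        return m

    rho = weights()
    for _ in range(outer):
        for _ in range(inner):                # step 2: fit lambda by scaling
            for b in range(nb):
                lam[:, :, b] += np.log(P[:, :, b] / marginal(weights(), b))
        rho = weights()                       # feasible rho(s|a), Eq. (19)
        R = rho @ rho_a                       # step 3: R(s) <- rho(s)

    upper = float(np.sum(rho * rho_a * np.log(rho / R[:, None])))   # I(S;A)
    F = np.array([np.dot(rho_a, np.exp(sum(lam[seq[c], :, c] for c in range(nb))))
                  for seq in seqs])           # F_lambda(s) of Eq. (21)
    lower = float(np.sum(P * rho_a[None, :, None] * lam) - np.log(F.max()))
    residual = max(np.abs(marginal(rho, b) - P[:, :, b]).max() for b in range(nb))
    return upper, lower, residual

# Toy process: two inputs a, two binary measurements b (our own example)
P = np.zeros((2, 2, 2))
P[:, :, 0] = [[0.8, 0.2], [0.2, 0.8]]    # P(s|a, b=0)
P[:, :, 1] = [[0.7, 0.3], [0.3, 0.7]]    # P(s|a, b=1)
upper, lower, residual = algorithm2(P, np.array([0.5, 0.5]))
```

The gap between `upper` and `lower` certifies the accuracy of the returned value, in the spirit of the stopping criterion developed in Sec. VI.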

It is worth noting that the initialization R(s) > 0 is necessary for the convergence of the algorithm, unless the support of the minimizer distribution is known. Indeed, suppose that we set R(s) = 0 for some sequence s on which the minimizer is nonzero. The update performed at step 4 of Algorithm 1b keeps R(s) equal to zero, provided that Fλ(s) is finite. Thus, the algorithm never converges toward the solution.

## V Convergence proof

The convergence of Algorithm 2 is a consequence of the results in Ref. [csiszar] and will be proved below. Although this proof does not hold for the general Problem 1, at the end of the section we give some arguments for the convergence of Algorithm 1.

The proof relies on three simple lemmas.

###### Lemma 1.

(Lemma 1 in Ref. [csiszar]) Let an and bn (n = 1, 2, …) be extended real numbers greater than −∞, and let c be a finite number such that

 c+bn−1≥bn+an,n=1,2,… (34)

and

 limsupn→∞ bn > −∞,  bn0 < +∞ for some n0. (35)

Then,

 liminfn→∞an≤c. (36)

Proof. Suppose that liminfn→∞ an > c. As bn0 is finite, Eq. (34) implies that bn is finite for every n ≥ n0. Furthermore, from Eq. (34) we have that

 liminfn→∞(bn−1−bn)>0, (37)

which implies that bn → −∞, in contradiction to the hypothesis (35).

###### Lemma 2.

Let ρ1(s|a) be the minimizer of

 K|R=R0 ≡ ∑s,a ρ(s|a) ρ(a) log[ρ(s|a)/R0(s)] (38)

with respect to ρ(s|a) ∈ V, where V is a convex set. Then,

 ∑s,a ρ1(s|a) ρ(a) log[ρ1(s|a)/R0(s)] ≤ ∑s,a ρ(s|a) ρ(a) log[ρ1(s|a)/R0(s)] (39)

for every ρ(s|a) ∈ V.
Proof. As the set V is convex, we have that ρt(s|a) ≡ (1−t) ρ(s|a) + t ρ1(s|a) ∈ V for every t ∈ [0,1]. As the function

 Kt≡K|ρ(s|a)=ρt(s|a),R=R0

is minimal at t = 1, we have that

 dKtdt∣∣∣t=1≤0, (40)

which gives Ineq. (39).

###### Lemma 3.

For every pair of joint distributions ρ(s,a) and ρ1(s,a), we have that

 ∑s,a ρ(s,a) log[ρ(s,a)/∑ā ρ(s,ā)] ≥ ∑s,a ρ(s,a) log[ρ1(s,a)/∑ā ρ1(s,ā)], (41)

where ∑ā ρ(s,ā) and ∑ā ρ1(s,ā) are the corresponding marginal distributions of s.

Proof. For every differentiable convex function f, we have that f(x) ≥ f(x1) + (x − x1)·∇f(x1). As the function ∑s,a ρ(s,a) log[ρ(s,a)/∑ā ρ(s,ā)] is convex in ρ(s,a) and its derivative with respect to ρ(s,a) is equal to log[ρ(s,a)/∑ā ρ(s,ā)], we have that

 ∑s,a ρ(s,a) log[ρ(s,a)/∑ā ρ(s,ā)] ≥ ∑s,a ρ1(s,a) log[ρ1(s,a)/∑ā ρ1(s,ā)] + ∑s,a [ρ(s,a)−ρ1(s,a)] log[ρ1(s,a)/∑ā ρ1(s,ā)], (42)

and, therefore, Ineq. (41).

At this point, we can prove the following.

###### Theorem 3.

Algorithm 2 converges to the solution of Problem 2.

Proof. Let ρn(s|a) and Rn(s), with n = 1, 2, …, be the sequences generated by the algorithm. Namely, ρn(s|a) is the minimizer of K|R=Rn−1 with respect to ρ(s|a) ∈ V(P), and Rn(s) is the minimizer of K|ρ=ρn with respect to R(s). In other words, the sequences are generated as follows. We start with an initial distribution R0(s) > 0 and compute ρ1(s|a) through block-minimization of K with respect to ρ(s|a), taking R = R0. Then, we compute R1(s) through block-minimization with respect to R(s), taking ρ(s|a) = ρ1(s|a), and so on. The block-minimization with respect to R(s) gives

 Rn(s) = ∑a ρn(s|a) ρ(a). (43)

At the n-th round, after the minimization with respect to R(s), the objective function takes the value

 Kn = ∑s,a ρn(s|a) ρ(a) log[ρn(s|a)/∑ā ρn(s|ā)ρ(ā)]. (44)

By construction, the sequence Kn is monotonically decreasing. To prove that it converges to the minimum J(P), it is sufficient to prove that

 liminfn→∞Kn≤J(P). (45)

First, we have that

 Kn ≤ ∑s,a ρn(s|a) ρ(a) log[ρn(s|a)/Rn−1(s)], (46)

since Rn(s) minimizes K|ρ=ρn with respect to R(s). Using Lemma 2, we obtain the inequalities

 Kn ≤ ∑s,a ρ(s|a) ρ(a) log[ρn(s|a)/Rn−1(s)] (47)

for every ρ(s|a) ∈ V(P). Thus,

 Kn ≤ ∑s,a ρ(s|a) ρ(a) log[ρn(s|a)/Rn(s)] + ∑s,a ρ(s|a) ρ(a) log[Rn(s)/Rn−1(s)]. (48)

As Rn(s) = ∑ā ρn(s|ā) ρ(ā), Lemma 3 implies that

 Kn ≤ ∑s,a ρ(s|a) ρ(a) log[ρ(s|a)/∑ā ρ(s|ā)ρ(ā)] + ∑s,a ρ(s|a) ρ(a) log[Rn(s)/Rn−1(s)]. (49)

By making the identifications

 ∑s,a ρ(s|a) ρ(a) log[ρ(s|a)/∑ā ρ(s|ā)ρ(ā)] → c,  ∑s,a ρ(s|a) ρ(a) log[ρ(s|a)/Rn(s)] → bn,  Kn → an, (50)

Ineq. (49) takes the form of Eq. (34). The quantity bn is non-negative for every n, being an average of relative entropies. Furthermore, it is finite for n = 0, since R0(s) > 0 for every s (initialization condition of the algorithm). Thus, Lemma 1 implies that

 liminfn→∞ Kn ≤ ∑s,a ρ(s|a) ρ(a) log[ρ(s|a)/∑ā ρ(s|ā)ρ(ā)] (51)

for every ρ(s|a) ∈ V(P). Thus,

 liminfn→∞ Kn ≤ minρ(s|a)∈V(P) I(S;A) = J(P). (52)

### V.1 Convergence of Algorithm 1

The machinery used for proving the convergence of Algorithm 2 cannot be used for Algorithm 1, since the distribution ρ(a) is updated at each round of the iteration. Here, we give some arguments supporting the hypothesis that Algorithm 1 also converges toward the minimum. This hypothesis is supported by numerical experiments as well.

As shown in Refs. [montina2, montina3] and later in Sec. VI.3, the necessary and sufficient conditions for optimality of Problem 2 are

 ρ(s|a) = ρ(s) e∑bλ(sb,a,b),  Fλ(s) ≤ 1,  ρ(s) ≥ 0,  ∑s,sb=s ρ(s|a) = P(s|a,b). (53)

The distributions ρ(s|a) and ρ(a) are also solutions of Problem 1 if ρ(a) maximizes the mutual information I(A;S).

We first observe that the decrease of K̄ produced by the block-minimization with respect to R(s) goes to zero as the number of iterations goes to infinity. That is,

 limn→∞[maxρ(a)Kn−1/2−maxρ(a)Kn]=0, (54)

where

 Kn−1/2 ≡ ∑s,a ρn(s|a) ρ(a) log[ρn(s|a)/Rn−1(s)], (55)
 Kn ≡ ∑s,a ρn(s|a) ρ(a) log[ρn(s|a)/Rn(s)], (56)

with ρn(s|a) and Rn(s) being the distributions ρ(s|a) and R(s) at the n-th iteration. Thus,

 limsupn→∞ [Kn−1/2 − Kn]|ρ(a)=ρn(a) ≤ 0, (57)

where ρn(a) maximizes Kn. This gives the inequality

 limsupn→∞ ∑s Rn(s) log[Rn(s)/Rn−1(s)] ≤ 0. (58)

The terms of the sequence are the relative entropies between Rn and Rn−1 and are always non-negative. Thus,

 limn→∞ ∑s Rn(s) log[Rn(s)/Rn−1(s)] = 0. (59)

Since the relative entropy between two distributions is equal to zero only if the distributions are equal, we also have

 limn→∞[Rn(s)−Rn−1(s)]=0. (60)

Now, the minimization at the n-th iteration gives

 ρn(s|a)=Rn−1(s)e∑bλn(sb,a,b), (61)

By Eq. (60), we also have Rn−1(s) ≃ Rn(s) ≡ ρn(s) with arbitrary precision, provided that n is sufficiently large. Thus,

 ρn(s|a)≃ρn(s)e∑bλn(sb,a,b) (62)

with arbitrary precision, which is the first optimality condition in (53). The third and fourth conditions are satisfied as well. Thus, it remains to check whether the second condition, Fλ(s) ≤ 1, is asymptotically satisfied in the limit n → ∞. Let us assume that the sequences ρn(s|a) and Rn(s), as well as λn(s,a,b), converge to some limit point. In particular, it is sufficient to assume that Fλn(s) converges to some Fλ(s). Then, it is clear from step 4 that Fλ(s) ≤ 1, since otherwise Rn(s) would grow indefinitely for every s such that Fλ(s) > 1. Note that Rn(s) converges to a nonzero value only if Fλ(s) = 1. Indeed, the first condition for optimality implies that

 ρ(s)≠0⇒Fλ(s)=1. (63)

Taking for granted that the sequences ρn(s|a) and ρn(a) converge to some ρ(s|a) and ρ(a), respectively, this reasoning shows that K̄ converges toward its minimum with ρ(a) equal to the limit distribution. Furthermore, it converges to the minimum of the channel capacity, since ρn(s|a) is the minimizer of the mutual information at each step of the iteration. Thus, the algorithm converges to the solution of Problem 1.

## VI Error estimation

The iterations of Algorithm 1 stop at step 6 when a given accuracy is reached. Until now, we have not addressed the issue of how to estimate the error. As the algorithm converges to D(P) from above, the value of the objective function obtained at each iteration yields an upper bound. In the following, we will use the dual form of Problem 2 to derive a lower bound, and we will show that the difference of the two bounds converges to zero, so that it is a reasonable measure of the accuracy. To do so, we will employ the necessary and sufficient conditions for optimality (53) derived in Ref. [montina2].

The section is organized as follows. In Sec. VI.1, we introduce the dual form of Problem 2, which takes the form of a geometric program [montina2, montina3]. Then, in Sec. VI.2, we show how to compute lower and upper bounds at each step of the iteration. In Sec. VI.3, we use the dual problem to derive necessary and sufficient conditions for optimality. Using these conditions, we show that, in the limit of infinitely many iterations, the lower and upper bounds approach the asymptotic communication complexity. Thus, as a possible stopping criterion, the iterations are terminated when the difference between the lower and upper bounds drops below some desired accuracy.

### VI.1 Dual form of Problem 2

The dual form of a constrained minimization problem (the primal problem) is a maximization problem in which the constraints are replaced by variables, the Lagrange multipliers. In general, the dual maximum is smaller than the minimum of the primal problem. The difference between the minimum and the maximum is called the duality gap. However, the dual maximum turns out to be equal to the primal minimum if some regularity conditions on the constraints of the primal problem are satisfied [boyd]. This is the case for Problem 2 [montina2, montina3].

The dual objective function is given by the minimum of the Lagrangian with respect to the primal variables. As done for the derivation of the numerical algorithm, it is convenient to replace the objective function of Problem 2 with the functional K defined by Eq. (13). The minimization is now performed over the variables ρ(s|a) and R(s). The first set of variables satisfies the constraints of Problem 1. Additionally, the variables R(s) satisfy the positivity constraints

 R(s)≥0. (64)

A direct way of getting the dual problem passes through the dual form of the block optimization over the variables ρ(s|a). As we have seen in Sec. IV.1, this dual form is an unconstrained maximization of the objective function K1, given by Eq. (20). Thus, Problem 2 takes the form

 minR(s) maxλ(s,a,b) K1 subject to the constraints R(s) ≥ 0. (65)

The variables R(s) in Eq. (65) can be regarded as Lagrange multipliers associated with the inequality constraints of the following maximization problem:

###### Problem 3.
 maxλ(s,a,b) Idual subject to the constraints Fλ(s) ≤ 1, (66)

where the objective function is

 Idual≡∑sabP(s|a,b)ρ(a)λ(s,a,b). (67)

Problem 3 is the dual form of Problem 2 and was derived in Refs. [montina2, montina3] in different ways. It is a particular case of a geometric program [gp1, gp2].

### VI.2 Lower and upper bounds on Casymmin

A lower bound on the asymptotic communication complexity is provided by any feasible point of the dual Problem 3. Such a point gives a lower bound on the optimal value of Problem 2 and, hence, on the optimal value of Problem 1. A feasible point can be easily obtained at each step of Algorithm 1. The procedure is as follows. Given the Lagrange multipliers λ(s,a,b) computed at step 2 of the algorithm, we define the variables λ̃(s,a,b) ≡ λ(s,a,b) + k by adding a constant k to the multipliers. The constant is chosen so that the λ̃(s,a,b) satisfy the constraints of Problem 3, that is, so that

 Fλ̃(s) ≤ 1. (68)

The quantity

 C−=∑s,a,bP(s|a,b)ρ(a)[λ(s,a,b)+k] (69)

is a lower bound on Casymmin. To have a lower bound as close as possible to Casymmin, we have to choose the constant k such that C− is as large as possible while the constraints (68) remain satisfied. This is attained by taking k = −(1/|B|) log maxs Fλ(s), which gives the lower bound

 C− = ∑s,a,b P(s|a,b) ρ(a) λ(s,a,b) − log maxs Fλ(s). (70)
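The shift can be checked numerically in a few lines (toy sizes and random multipliers, our own sketch; natural logarithms are used, matching the exponentials in Fλ):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
ns, na, nb = 2, 3, 2                      # toy sizes (our own choice)
lam = rng.normal(size=(ns, na, nb))       # arbitrary multipliers lambda(s, a, b)
rho_a = np.full(na, 1.0 / na)

# F_lambda(s) of Eq. (21), evaluated on every outcome sequence s
F = np.array([np.dot(rho_a, np.exp(sum(lam[seq[b], :, b] for b in range(nb))))
              for seq in itertools.product(range(ns), repeat=nb)])

k = -np.log(F.max()) / nb                 # constant added to every lambda(s, a, b)
F_shifted = F * np.exp(nb * k)            # F for the shifted multipliers
```

After the shift, every value of Fλ̃(s) is at most 1, so the shifted multipliers are a feasible point of Problem 3 and the bound of Eq. (70) follows.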

An upper bound on Casymmin at each step of the iteration is given by the value taken by the objective function. After each iteration, we have that

 ρ(s|a) = ρ(s) Fλ(s)^−1 e∑bλ(sb,a,b),  R(s) = ρ(s),  ∑s,sb=s ρ(s|a) = P(s|a,b). (71)

Using the first two equations, the upper bound takes the form

 C+≡∑s,a,bρ(s|a)ρ(a)λ(sb,a,b)−∑sρ(s)logFλ(s). (72)

Using the last of Eqs. (71), the upper bound can be rewritten in the form

 C+=∑s,a,bP(s|a,b)ρ(a)λ(s,a,b)−∑sρ(s)logFλ(s), (73)

which is computationally more convenient, as the summation over the sequences s is replaced by a summation over the single outcomes s.

As Algorithm 1 converges to the solution of Problem 1, C+ obviously converges to D(P) from above. To prove that the lower bound also converges to