Operator scaling with specified marginals
Abstract
The completely positive operators, which can be viewed as a generalization of the nonnegative matrices, are maps between spaces of linear operators arising in the study of C*-algebras. The existence of the operator analogues of doubly stochastic scalings of matrices, the study of which is known as operator scaling, is equivalent to a multitude of problems in computer science and mathematics such as rational identity testing in noncommuting variables, noncommutative rank of symbolic matrices, and a basic problem in invariant theory (Garg, Gurvits, Oliveira and Wigderson, FOCS, 2016).
We study operator scaling with specified marginals, which is the operator analogue of scaling matrices to specified row and column sums (or marginals). We characterize the operators which can be scaled to given marginals, much in the spirit of Gurvits’ algorithmic characterization of the operators that can be scaled to doubly stochastic (Gurvits, Journal of Computer and System Sciences, 2004). Our algorithm, which is a modified version of Gurvits’ algorithm, produces approximate scalings in polynomial time whenever scalings exist. A central ingredient in our analysis is a reduction from operator scaling with specified marginals to operator scaling in the doubly stochastic setting.
Instances of operator scaling with specified marginals arise in diverse areas of study such as the Brascamp-Lieb inequalities, communication complexity, eigenvalues of sums of Hermitian matrices, and quantum information theory. Some of the known theorems in these areas, several of which had no algorithmic proof, are straightforward consequences of our characterization theorem. For instance, we obtain a simple algorithm to find, when they exist, a tuple of Hermitian matrices with given spectra whose sum has a given spectrum. We also prove new theorems such as a generalization of Forster’s theorem (Forster, Journal of Computer and System Sciences, 2002) concerning radial isotropic position.
Contents:
 1 Introduction
 2 Applications and special cases
 3 Preliminaries and main theorems
 4 A reduction to the doubly stochastic case
 5 Positive capacity implies scalability by upper triangulars
 6 Rank-nondecreasingness implies positive capacity
 7 Scalability implies rank-nondecreasingness
 8 Non-upper-triangular scalings: Proof of Theorem 3.9
 9 A sufficient condition for exact scalability
 10 Proofs for Applications
 11 Future work
 12 Appendix
1 Introduction
Completely positive maps are linear maps between spaces of linear operators that, informally speaking, preserve positive-semidefiniteness in a strong sense. Completely positive maps generalize nonnegative matrices in some sense and arise naturally in quantum information theory and the study of C*-algebras [20]. If is a complex inner product space, let denote the space of Hermitian operators on . To each completely positive map is associated another completely positive map known as the dual of . In analogy with the matrix case, say a completely positive map is doubly stochastic if and . A scaling of a completely positive map by a pair of invertible linear maps is the completely positive map . One is led to ask which completely positive maps have doubly stochastic scalings; operator scaling is the study of this question. In fact, several other problems such as rational identity testing in noncommuting variables, membership in the null cone of the left-right action of [12], and a special case of Edmonds’ problem [14] each reduce to (or are equivalent to) an approximate version of this question. In [14], Gurvits gave two useful equivalent conditions for approximate scalability: a completely positive map can be approximately scaled to doubly stochastic if and only if is rank-nondecreasing, i.e. for all , or equivalently where
Gurvits also gave an algorithm to compute approximate scalings if either of these equivalent conditions holds. The authors of [12], [11], and [14] analyzed the same algorithm to obtain polynomial-time decision algorithms for each of the aforementioned problems.
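For intuition, the doubly stochastic case can be approached by an operator analogue of Sinkhorn iteration: alternately renormalize the Kraus operators so that the map sends the identity to the identity, and so that its dual does too. The following Python sketch illustrates this idea numerically; it is an illustration only (function names are ours, and it is not the paper's Algorithm TOSI or the OSI algorithm of [14]).

```python
import numpy as np

def apply_cp(kraus, X):
    """Apply the completely positive map T(X) = sum_i A_i X A_i^dagger."""
    return sum(A @ X @ A.conj().T for A in kraus)

def apply_dual(kraus, Y):
    """Apply the dual map T*(Y) = sum_i A_i^dagger Y A_i."""
    return sum(A.conj().T @ Y @ A for A in kraus)

def inv_sqrt(P):
    """Inverse square root of a positive-definite Hermitian matrix."""
    w, V = np.linalg.eigh(P)
    return V @ np.diag(w ** -0.5) @ V.conj().T

def operator_sinkhorn(kraus, iters=200):
    """Alternately scale the Kraus operators so that T(I) = I and T*(I) = I."""
    kraus = [A.astype(complex) for A in kraus]
    n = kraus[0].shape[0]
    for _ in range(iters):
        S = inv_sqrt(apply_cp(kraus, np.eye(n)))    # left normalization
        kraus = [S @ A for A in kraus]
        R = inv_sqrt(apply_dual(kraus, np.eye(n)))  # right normalization
        kraus = [A @ R for A in kraus]
    return kraus

# Example: a random completely positive map on 3x3 matrices,
# which is rank-nondecreasing generically and hence scalable.
rng = np.random.default_rng(0)
kraus = [rng.standard_normal((3, 3)) for _ in range(3)]
scaled = operator_sinkhorn(kraus)
err = np.linalg.norm(apply_cp(scaled, np.eye(3)) - np.eye(3))
```

After the final right normalization the dual condition holds exactly, while the primal condition holds up to the convergence error `err`.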
We consider a natural generalization of doubly stochastic scalings. Say maps if and and say is an scaling of if is a scaling of that maps .
Question 1.
Given positive semidefinite matrices and and a completely positive map , does have an scaling?
We extend Gurvits’ characterization of approximate scalability to the setting of Question 1. As in [14], our existence proofs lead to algorithms that efficiently produce approximate scalings when they exist. Theorem 3.8, which closely resembles the characterization in [14], characterizes the existence of approximate scalings by block-upper-triangular matrices. Theorem 3.9 extends this characterization to handle scalings in the full general linear group with a somewhat surprising outcome: informally, a completely positive map has approximate scalings if and only if a suitable random scaling of satisfies the conditions of Theorem 3.8 with high probability. We also give an exponential-time algorithm to decide if can be scaled to map with arbitrarily small error.
A close variant of Question 1 first appeared in [13], in which the authors propose scalings as quantum analogues of Markov chains satisfying certain relative entropy minimality conditions. The authors of [13] conjectured a partial answer to Question 1, which was confirmed in [9]. Our Theorem 10.20 extends the answer of [9], and proves the conjecture of [13] apart from one small caveat.
This paper is organized as follows: in Section 2, we describe several questions that can be reduced to Question 1 and for which our results yield a number of new characterizations and algorithms. In Section 3, after providing the necessary background, we state our main results, Theorems 3.8 and 3.9. We prove Theorem 3.8 in Sections 4 through 7 and Theorem 3.9 in Section 8. In Section 9 we describe a sufficient condition called indecomposability that guarantees the existence of exact scalings. Finally, in Section 10 we bring Theorems 3.8 and 3.9 to bear on the questions from Section 2.
Acknowledgements
The author would like to thank Michael Saks for many insightful discussions, and Rafael Oliveira for interesting observations and pointers to relevant literature.
2 Applications and special cases
Here we mention a few questions that can be answered via reduction to Question 1.
Question 2 (Matrix scaling).
Given a nonnegative matrix and nonnegative row- and column-sum vectors and , do there exist diagonal matrices such that the row (resp. column) sums of are (resp. c)?
It is well-known that matrix scaling can be reduced to an instance of operator scaling with specified marginals, but Gurvits’ characterization does not apply to this instance unless and are the all-ones vectors. In 10.1, we recall the reduction from Question 2 to Question 1 and derive the classic theorem of [29] on the existence of such scalings as a consequence of Theorem 3.8.
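For concreteness, the matrix case of Question 2 is solved by Sinkhorn's classical alternating normalization whenever the matrix is strictly positive and the target sums are consistent. The sketch below illustrates that iteration (it is an illustration only, not the reduction of 10.1).

```python
import numpy as np

def sinkhorn(A, r, c, iters=500):
    """Alternately rescale rows and columns of a positive matrix A
    so that its row sums approach r and its column sums approach c.
    Requires sum(r) == sum(c)."""
    A = np.array(A, dtype=float)
    for _ in range(iters):
        A *= (r / A.sum(axis=1))[:, None]   # fix row sums
        A *= (c / A.sum(axis=0))[None, :]   # fix column sums
    return A

A = np.array([[1.0, 2.0], [3.0, 4.0]])
r = np.array([1.0, 2.0])   # target row sums
c = np.array([2.0, 1.0])   # target column sums
B = sinkhorn(A, r, c)
```

The diagonal scaling matrices of Question 2 can be recovered as the accumulated row and column multipliers.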
Question 3 (Eigenvalues of sums of Hermitian matrices).
Given nonincreasing sequences of real numbers, are the spectra of some Hermitian matrices satisfying ?
In [21], Klyachko showed (amazingly) that the answer to Question 3 is “yes” if and only if satisfy a certain finite set of linear inequality constraints. That is, such form a polyhedral cone. A long line of work has been devoted to describing the set , which has connections to representation theory, Schubert calculus, and combinatorics ([22], [21], [10]). There are even polynomial-time algorithms to test if satisfy [27]. However, no previous work has provided an algorithm to find the Hermitian matrices in question. Our reduction, which can be found in 10.2, yields an algorithmic proof of the result in [21]. That is, we exhibit an algorithm that outputs a sequence of Hermitian matrices (in particular, real symmetric matrices!) with spectra approaching if satisfy .
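One of the linear constraints in Klyachko's description is forced by the trace identity tr(A+B) = tr A + tr B: the target spectrum of the sum must have total sum equal to the sums of the two given spectra. The snippet below illustrates this constraint on randomly conjugated Hermitian matrices with prescribed spectra (an illustration only, not the algorithm of 10.2; the helper names are ours).

```python
import numpy as np

def random_hermitian_with_spectrum(spec, rng):
    """Return U diag(spec) U^dagger for a random unitary U."""
    n = len(spec)
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    U = Q @ np.diag(np.diag(R) / np.abs(np.diag(R)))  # fix phases of Q
    return U @ np.diag(spec) @ U.conj().T

rng = np.random.default_rng(1)
alpha = [3.0, 1.0, 0.0]
beta = [2.0, 2.0, 1.0]
A = random_hermitian_with_spectrum(alpha, rng)
B = random_hermitian_with_spectrum(beta, rng)
gamma = np.linalg.eigvalsh(A + B)   # spectrum of the sum
# The trace constraint holds regardless of the unitaries chosen:
# sum(gamma) == sum(alpha) + sum(beta)
```

Which spectra gamma actually arise as the choice of unitaries varies is exactly the content of Klyachko's inequalities.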
Question 4 (Forster’s scalings).
Given vectors , nonnegative numbers , and a positive-semidefinite matrix , when does there exist an invertible linear transformation such that
Barthe [2] answered this question completely for the case . Forster independently answered Question 4 in the affirmative for , in general position, and ; as a consequence he was able to prove previously unattainable lower bounds in communication complexity [8]. As noted in [14], Forster’s result is a consequence of Gurvits’ characterization of doubly stochastic scalings. In 10.3 we reduce the general case of Question 4 to an instance of Question 1, and use this reduction to answer the approximate version of Question 4. For fixed and , the admissible form a convex polytope whose form is a natural generalization of the polytope, known as the basis polytope, described in [2]. In fact, one can derive the Schur-Horn theorem on diagonals of Hermitian matrices with given spectra [16] from our answer to Question 4.
Lastly, we hope our techniques will be of use in quantum information theory. The completely positive maps that map have a meaningful interpretation: by a fact known as channel-state duality [19], to each completely positive map is associated a unique unnormalized mixed bipartite quantum state . The operator maps if and only if and . That is, the local mixed states induced by are and . Operator scaling has established connections between separable quantum states and matroid theory [14], so perhaps our techniques can shed further light on such relationships. We discuss this further in Section 11.
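Channel-state duality can be made concrete numerically: the (unnormalized) Choi state of a map given by Kraus operators has one partial trace equal to the image of the identity under the map, and the other equal to a transpose of the dual's image of the identity. The sketch below computes the Choi matrix and its two partial traces; conventions for which marginal carries the transpose vary across the literature, so this is an illustration under one common convention.

```python
import numpy as np

def choi(kraus, n):
    """Unnormalized Choi matrix J = sum_{ij} E_ij (x) T(E_ij),
    where T(X) = sum_k A_k X A_k^dagger acts on n x n matrices."""
    m = kraus[0].shape[0]
    J = np.zeros((n * m, n * m), dtype=complex)
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n), dtype=complex)
            E[i, j] = 1.0
            TE = sum(A @ E @ A.conj().T for A in kraus)
            J += np.kron(E, TE)
    return J

def partial_trace(J, n, m, keep):
    """Partial trace of an (n*m) x (n*m) matrix on C^n (x) C^m;
    keep=0 traces out the second factor, keep=1 the first."""
    J4 = J.reshape(n, m, n, m)
    return np.trace(J4, axis1=1, axis2=3) if keep == 0 else np.trace(J4, axis1=0, axis2=2)

rng = np.random.default_rng(2)
n = 2
kraus = [rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) for _ in range(2)]
J = choi(kraus, n)
marg1 = partial_trace(J, n, n, keep=0)   # equals the transpose of T*(I)
marg2 = partial_trace(J, n, n, keep=1)   # equals T(I)
```

Scaling the map thus corresponds to acting on the Choi state by local invertible operations, which transforms its two marginals.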
3 Preliminaries and main theorems
Before presenting the main theorems we fill in some background and justify a few assumptions we will make throughout the paper. The notation established in 3.1 and 3.2 will be summarised in 3.6.
3.1 Preliminaries
Definition 3.1 (Completely positive maps).
A completely positive map is a map of the form
where and are finite-dimensional complex inner product spaces and are linear maps called Kraus operators of . Note that preserves positive-semidefiniteness. The map is given by
and is the adjoint of in the trace inner product. Recall that we say maps if and .
Definition 3.2 (Scalings of completely positive maps).
If is a completely positive map, and , we define the completely positive map by
Observe that
is called the scaling of by .
Here (resp. ) will be a subset of (resp. ), and often a subgroup.
Definition 3.3 (Approximate scalings).
Say a scaling of is an scaling if maps with and . If and , say is approximately scalable to if for all , has an scaling by .
If and are invertible, approximate (resp. exact) scalability to is equivalent to approximate (resp. exact) scalability to for and , so we mainly restrict attention to scalings.
It will be handy to move easily back and forth between scalings and scalings. The following easy lemma, which we prove in Appendix 12.2, gives us this freedom.
Lemma 3.1.
Suppose and are positive-definite, , and that is block-diagonal. The following are equivalent:

is approximately (resp. exactly) scalable to by .

is approximately (resp. exactly) scalable to by .

is approximately (resp. exactly) scalable to by .

is approximately (resp. exactly) scalable to by .
Moreover, if has an scaling by then has , , and scalings by .
Henceforward (resp. ) denotes a positive-semidefinite operator in (resp. ). We further assume because .
Flags
We will think of positive-semidefinite operators in terms of their spectrum and an associated sequence of subspaces called a flag.
Definition 3.4 (Flags).

If is an dimensional vector space, a flag on is a sequence of subspaces
where .

The signature of , denoted , is the set of dimensions appearing in the flag. Say a flag is complete if it has signature ; otherwise it is partial.

The standard flag in an orthonormal basis of is the complete flag

Conversely, a complete flag is the standard flag in a unique orthonormal basis up to multiplication of each basis vector by a complex number of modulus 1. In general, if is a flag, say is an adapted basis for if is orthonormal and for . That is, is a subflag of the standard flag in .

If is a set of linear transformations of , denote the set
When is a subgroup of , is the stabilizer subgroup of under the action of .
Definition 3.5 (Block-upper-triangular scalings).
If is a flag on and is a linear operator on , we say is block-upper-triangular (with respect to ) if
If is a set of linear transformations of , let
When is a subgroup of , is the stabilizer subgroup of under the action of .
Note that a linear transformation is block-upper-triangular if and only if the matrix for is block-upper-triangular with block sizes in an adapted basis for .
Next we discuss how to view Hermitian operators in terms of their spectra and an associated flag. If is a subspace, let denote the orthogonal projection to . Observe that if is a flag and a sequence of positive numbers, then
is a positive-semidefinite operator in .
In fact, every positive semidefinite operator has a unique representation of this form; this can be seen by taking the sequence to be the sequence of sums of eigenspaces of in decreasing order of eigenvalue. More precisely:
Fact 3.2 (See the survey [10], e.g.).
Let be positive-semidefinite. Let denote the spectrum of . Then there is a unique shortest flag, denoted , such that there exist satisfying
Further, we have and for where .
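Fact 3.2 can be checked numerically: writing the eigenvalues in nonincreasing order, a positive-semidefinite matrix is the sum over i of the eigenvalue gap (lambda_i minus lambda_{i+1}, with lambda_{n+1} = 0) times the projection onto the span of the top i eigenvectors. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal((4, 4))
rho = Z @ Z.T                      # a positive-semidefinite matrix

w, V = np.linalg.eigh(rho)
w, V = w[::-1], V[:, ::-1]         # nonincreasing eigenvalue order
gaps = np.append(w[:-1] - w[1:], w[-1])   # lambda_i - lambda_{i+1}, lambda_{n+1} = 0

# P_i projects onto the span of the top i eigenvectors (the flag subspaces);
# the telescoping sum of the gaps recovers each eigenvalue.
recon = sum(gaps[i] * (V[:, :i + 1] @ V[:, :i + 1].T) for i in range(4))
```

When consecutive eigenvalues coincide, the corresponding gap vanishes and the flag omits that subspace, which is why the flag of Fact 3.2 is the unique shortest one.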
Note that for any flag such that , we must have . Thus, not all spectra and flags are compatible. We give a name to those flags that are compatible with given spectrum.
Definition 3.6 (partial flag).
If is a nonincreasing sequence of nonnegative numbers, say is an partial flag if
It will be useful to have some shorthand for the difference sequence .
Definition 3.7 (Difference sequences).
If is a sequence, define to be the sequence
Here . We define
Note that if and only if is an partial flag.
Definition 3.8 (Flag notation convention).

will denote the flag and will denote the flag .

and will denote adapted bases for and , respectively.
Note that (resp. ) is diagonal with nonincreasing diagonal in basis (resp. ).
Definition 3.9 (Projections to flags).
For , let be a partial isometry. That is, is , the orthogonal projection to . In the basis (and basis for ) we have
Let for be the analogous partial isometries.
Restrictions on scalings
We must impose some restrictions on in order for our methods to work. Luckily, this level of generality suffices for all the applications known to the author. In particular, these restrictions will never rule out the case and , so any reader only interested in scalings can safely skim this subsection.
Our characterization can apply in a more general setting than the one discussed here. For the sake of simplicity, we describe this more general setting in Remark 8.12 after presenting our main theorems and algorithms.
Our groups will take the form
(1) 
where and (here the direct sums are assumed to be orthogonal direct sums).
For our proof techniques to work, we must assume respects the decompositions and . That is, we require
(2)  
(3) 
If 2 and 3 hold, say is compatible with and . Note that if is compatible with and , then and where , are positive-semidefinite operators. Say an operator of this form is compatible with , and analogously for and . Observe that compatibility of with depends only on . Further, and are compatible if and only if
For this reason we say a flag is compatible with if for all ; we define compatibility with analogously. Since we are interested in scalability, we may assume and are compatible with and , respectively, or else any that is compatible with and is clearly neither exactly nor approximately scalable to . We summarise our assumptions in the next definition.
Definition 3.10 (Block-diagonal).
If there exist decompositions and such that

and satisfy 1,

is compatible with and , and

and are compatible with and , respectively,
say is block-diagonal. For convenience, say is block-diagonal if is block-diagonal.
Example 1.
The tuple is always block-diagonal.
Example 2.
Observation 3.3.
If is block-diagonal, then approximate or exact scalability of to depends only on the spectra of and .
Proof.
All and with fixed spectra such that is blockdiagonal are conjugate by unitaries in and , respectively. However, for any unitaries and , the change of variables by the transformation shows approximate (resp. exact) scalability to is equivalent to approximate (resp. exact) scalability to . ∎
Extensions of Gurvits’ conditions
We remind the reader of Gurvits’ theorem characterizing scalability of completely positive maps to doubly stochastic.
Theorem 3.4 ([14]).
Suppose is a completely positive map. The following are equivalent:


is rank-nondecreasing, that is, for all , .

is approximately scalable to .
In order to state our main theorems, we’ll need extensions of rank-nondecreasingness and capacity. To define our notion of rank-nondecreasingness, we’ll define a polytope depending on , and then define to have the rank-nondecreasingness property if is in the polytope defined by .
Definition 3.11 (rank-nondecreasingness for specified marginals).
Suppose are given partial flags.

We say a pair of subspaces is independent if for all .

Define to be the set of satisfying , , and
(6) for all independent pairs . Because the coefficients of the in the above sum can take on only finitely many values, is a convex polytope.

Say is rank-nondecreasing if .
This definition extends the definition of rank-nondecreasingness. Rank-nondecreasingness is usually, and equivalently, defined by the nonexistence of a shrunk subspace, or a subspace such that . Since
is an independent pair and all other independent pairs have , there is no shrunk subspace if and only if for all independent pairs . That is, is rank-nondecreasing if and only if is rank-nondecreasing, because .
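To make the shrunk-subspace condition concrete: given Kraus operators, the image of a subspace W is the span of the images of W under each operator, and W is shrunk when that span has strictly smaller dimension than W. A minimal sketch of this dimension count (illustrative only; deciding whether any shrunk subspace exists is the harder algorithmic problem):

```python
import numpy as np

def image_dimension(kraus, W, tol=1e-9):
    """dim span{ A_i w : w in W }, where the columns of W span the subspace."""
    stacked = np.hstack([A @ W for A in kraus])
    return np.linalg.matrix_rank(stacked, tol=tol)

def is_shrunk(kraus, W):
    """W is shrunk if its image under the Kraus operators has
    strictly smaller dimension than W itself."""
    return image_dimension(kraus, W) < np.linalg.matrix_rank(W)

# Example: with the single Kraus operator E_11, the span of e_2 is shrunk
# (it maps to zero), while the span of e_1 is not.
A = np.array([[1.0, 0.0], [0.0, 0.0]])
e2 = np.array([[0.0], [1.0]])
# is_shrunk([A], e2) -> True
```

This matches the classical picture: the map with single Kraus operator E_11 kills the second coordinate and hence is not rank-nondecreasing.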
Remark 3.5.
ranknondecreasingness does not depend on the particular choice of Kraus operators for , because independence of does not depend on the choice of Kraus operators for . This is because is independent if and only if , where denotes the orthogonal projection to the subspace .
We will need a variant of the determinant that depends on an additional argument which is a positive semidefinite operator.
Definition 3.12 (Relative determinant).
Define the determinant of relative to , denoted , by
(7) 
Of course, can be defined analogously for any positive-semidefinite operator .
The relative determinant inherits a few of the multiplicative properties of the determinant when restricted to block-upper-triangular matrices.
Lemma 3.6 (Properties of ).
If , then
(8)  
(9)  
(10) 
We defer the (easy) proofs to Appendix 12.2.
Remark 3.7.
Observe that
Equivalently, is the von Neumann entropy of , denoted , which is equal to the Shannon entropy of the spectrum of . One can also draw some parallels with the quantum relative entropy. By the concavity of the determinant, is concave in . Further, we will see that for fixed nonsingular , is maximized at subject to and . Thus, it can be intuitively helpful to think of as a cross-entropy of and , and
as a relative entropy of with respect to , though it is not equal to the von Neumann relative entropy.
Finally we come to an extension of Gurvits’ capacity.
Definition 3.13 (Capacity for specified marginals).
Here we take . Recall from Definition 3.9 that , are partial isometries. Define
(11) 
If partial flags and are given, then refers to the quantity where and are the unique operators with , and , .
Note that and . By the existence of Cholesky decompositions, . This implies for , so is an extension of the usual capacity.
3.2 Main theorems
We are ready to state our analogue of Gurvits’ characterization for block-upper-triangular scalings. Gurvits’ characterization is the special case and of the following theorem. Recall that if is a flag on and is a subgroup of , then is the subgroup of fixing each subspace in .
Theorem 3.8.
Suppose is a completely positive map and are positive-semidefinite. The following are equivalent:

is rank-nondecreasing.

.

is approximately scalable to for all and such that is block-diagonal.
For fixed, the completely positive map is either rank-nondecreasing for no , or is rank-nondecreasing for generic . Here “generic ”, a strengthening of “almost all ”, means “all not in the zero set of some fixed finite set of polynomials, none of which vanishes on ”. This allows us to extend our characterization from scalability to a characterization of scalability.
Theorem 3.9.
Suppose is block-diagonal. The following are equivalent:

is rank-nondecreasing for generic .

for generic .

is approximately scalable to .
Recall that for block-diagonal, the scalability of to depends only on the spectra of and . In fact, the spectra for which approximate scaling can be done form a convex polytope.
Theorem 3.10.
The spectra of pairs of positive-semidefinite operators such that

is block-diagonal and

is approximately scalable to
form a convex polytope, which we denote
We also obtain algorithmic counterparts of Theorem 3.8 and Theorem 3.9. Let denote the least nonzero eigenvalues of and , and let be the total bit-complexity of the input, where is given by Kraus operators written down in bases and in which and , respectively, are diagonal.
Theorem 3.11.
Suppose is block-diagonal. There is a deterministic algorithm of time-complexity that takes as input and outputs such that is an scaling whenever and ERROR otherwise.
Theorem 3.12.
Suppose is block-diagonal. There is a randomized algorithm of time-complexity that takes as input and outputs such that is an scaling with probability at least 2/3 whenever is approximately scalable to and ERROR otherwise.
We can give only a randomized exponential-time algorithm for the decision version of our problem. It would be interesting to find a polynomial-time algorithm for this, as it would be a step towards finding a truly polynomial-time algorithm for membership in the Kronecker polytope. Note that the only exponential dependence is on the bit-complexity of the spectra and .
Theorem 3.13.
Suppose is block-diagonal. There is a randomized algorithm of time-complexity to decide if . Equivalently, the algorithm decides if is approximately scalable to .
3.3 Proof overviews and discussion
Theorem 3.8:
Theorem 3.9:
Theorem 3.9 is proved in Section 8. The implications 1⇒2 and 2⇒3 follow immediately from Theorem 3.8. The only hard work left is the implication 3⇒1. It is not hard to see (Corollary 7.2) that approximate scalability implies there exists such that is rank-nondecreasing. Next, via an algebraic geometry argument (Lemma 8.1), we show that the existence of any such implies is rank-nondecreasing for generic .
Theorem 3.10:
Theorem 3.10 appears as Corollary 8.9 in Section 8.2, but we give an overview of the proof here. For and fixed, it is clear that the set , which we denote , is a convex polytope, since it is defined by a finite number of linear constraints.
It is not hard to see (Corollary 7.2 and Theorem 3.8) that is approximately scalable to if and only if for some . In other words, the obtainable pairs of spectra are
This set is not obviously convex, but due to the results of Section 8, we find that for generic ,
This tells us that for some , the obtainable spectra comprise the convex polytope , proving Theorem 3.10.
We remark that Theorem 3.10 could likely be obtained by other methods involving the representation theory of Lie algebras (see [32]), using which one might be able to show is what is known as a moment polytope. We could not see how to obtain Theorems 3.8 and 3.9 from those methods, however.
Theorems 3.11 and 3.12:
Our proofs of Theorem 3.8 and Theorem 3.9 are just shy of effective. While the approximate scalings in Theorem 3.8 are produced by iterated scaling (see Algorithm TOSI of Section 5.1 and Algorithm GOSI of Section 8.3), a priori the bit-complexity of the scaled operators could grow exponentially. In Appendix 12.4 we obtain the efficient algorithms Algorithm 12.8 and Algorithm 12.10 by modifying the iterative scaling algorithms and rounding. The running times of the modified algorithms are described by Theorems 12.9 and 12.10.
The reader might wonder if analyzing the performance of Sinkhorn scaling for operators, which is called “Algorithm ” in [12] and “OSI” in [14], on the reduction in Section 4 would suffice to obtain our algorithmic results. However, the reduction only works if and have integral spectra, and it results in a completely positive map from . Thus, the dimension of the reduction depends on a common denominator for all the entries in the spectra of and , rendering the algorithms inefficient.
In fact, operator Sinkhorn scaling on the reduction amounts to Algorithm TOSI anyway, which is simpler to state without using the reduction.
Theorem 3.13:
We will prove Theorem 3.13 as Corollary 12.13 and present the algorithm (Algorithm 12.12) in Section 12.4, but the proof is straightforward so we summarize it here.
Corollary 7.2 states that if is an scaling for smaller than times the inverse of the least common denominator of the entries of and , then must be rank-nondecreasing. In other words, . However, . From Theorem 3.12, we have a time algorithm (Algorithm 12.10) which outputs an scaling with probability at least whenever and ERROR otherwise. In particular, if , Algorithm 12.10 never outputs anything other than ERROR and scalings (even if it did, we could easily check whether the outputs were valid scalings).
Thus, running Algorithm 12.10 with small enough (in particular we can take ) and outputting NO if and only if Algorithm 12.10 outputs ERROR yields a time decision algorithm for membership in .
3.4 Additional background
Our algorithms will rely on the Cholesky decomposition.
Fact 3.14 (Existence and uniqueness of Cholesky decompositions).
Suppose is a flag in an dimensional vector space with , and that is an adapted basis for . If is a positive semidefinite operator on , then there exists an operator
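Numerically, the Cholesky decomposition is available in standard linear algebra libraries, which return a lower-triangular factor; an upper-triangular factor of the kind used here can be obtained by reversing the basis order. A minimal sketch (for the complete standard flag; the flag-adapted block variant of Fact 3.14 is a straightforward modification):

```python
import numpy as np

rng = np.random.default_rng(4)
Z = rng.standard_normal((4, 4))
X = Z @ Z.T + 4 * np.eye(4)        # a positive-definite matrix

L = np.linalg.cholesky(X)          # lower triangular, X = L @ L.T

# An upper-triangular factorization X = U @ U.T: conjugate by the
# order-reversing permutation J, factor, and conjugate back.
J = np.eye(4)[::-1]
U = J @ np.linalg.cholesky(J @ X @ J) @ J
```

Reversing rows and columns turns a lower-triangular matrix into an upper-triangular one, which is why the conjugation trick works.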