# Operator scaling with specified marginals

## Abstract

The completely positive operators, which can be viewed as a generalization of the nonnegative matrices, are maps between spaces of linear operators arising in the study of C*-algebras. The existence of the operator analogues of doubly stochastic scalings of matrices, the study of which is known as operator scaling, is equivalent to a multitude of problems in computer science and mathematics such as rational identity testing in non-commuting variables, noncommutative rank of symbolic matrices, and a basic problem in invariant theory (Garg, Gurvits, Oliveira and Wigderson, FOCS, 2016).
We study operator scaling with specified marginals, which is the operator analogue of scaling matrices to specified row and column sums (or marginals). We characterize the operators which can be scaled to given marginals, much in the spirit of Gurvits’ algorithmic characterization of the operators that can be scaled to doubly stochastic (Gurvits, Journal of Computer and System Sciences, 2004). Our algorithm, which is a modified version of Gurvits’ algorithm, produces approximate scalings in time whenever scalings exist. A central ingredient in our analysis is a reduction from operator scaling with specified marginals to operator scaling in the doubly stochastic setting.
Instances of operator scaling with specified marginals arise in diverse areas of study such as the Brascamp-Lieb inequalities, communication complexity, eigenvalues of sums of Hermitian matrices, and quantum information theory. Some of the known theorems in these areas, several of which had no algorithmic proof, are straightforward consequences of our characterization theorem. For instance, we obtain a simple algorithm to find, when they exist, a tuple of Hermitian matrices with given spectra whose sum has a given spectrum. We also prove new theorems such as a generalization of Forster’s theorem (Forster, Journal of Computer and System Sciences, 2002) concerning radial isotropic position.

## 1 Introduction

Completely positive maps are linear maps between spaces of linear operators that, informally speaking, preserve positive-semidefiniteness in a strong sense. Completely positive maps generalize nonnegative matrices in some sense and arise naturally in quantum information theory and the study of C*-algebras [20]. If is a complex inner product space, let denote the space of Hermitian operators on . To each completely positive map is associated another completely positive operator known as the dual of . In analogy with the matrix case, say a completely positive map is doubly stochastic if and . A scaling of a completely positive map by a pair of invertible linear maps is the completely positive map . One is led to ask which completely positive maps have doubly stochastic scalings; operator scaling is the study of this question. In fact, several other problems such as rational identity testing in non-commuting variables, membership in the null-cone of the left-right action of [12], and a special case of Edmonds’ problem [14] each reduce to (or are equivalent to) an approximate version of this question. In [14], Gurvits gave two useful equivalent conditions for approximate scalability: a completely positive map can be approximately scaled to doubly stochastic if and only if is rank-nondecreasing, i.e. for all , or equivalently where

$$\operatorname{cap} T := \inf_{X \succeq 0,\ \det X = 1} \det T(X).$$

Gurvits also gave an algorithm to compute approximate scalings if either of these equivalent conditions holds. The authors of [12], [11], and [14] analyzed the same algorithm to obtain polynomial-time decision algorithms for each of the aforementioned problems.
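
To make the doubly stochastic case concrete, here is a minimal sketch of the alternating normalization usually called operator Sinkhorn iteration. It assumes the map is given by square Kraus matrices and that doubly stochastic means both the map and its dual send the identity to the identity; the function names and the fixed iteration count are illustrative choices, not the algorithm as analyzed in [14].

```python
import numpy as np

def inv_sqrt(M):
    """Inverse square root of a Hermitian positive-definite matrix."""
    w, U = np.linalg.eigh(M)
    return (U / np.sqrt(w)) @ U.conj().T

def apply_T(kraus, X):
    """T(X) = sum_i A_i X A_i^dagger."""
    return sum(A @ X @ A.conj().T for A in kraus)

def apply_T_dual(kraus, X):
    """T*(X) = sum_i A_i^dagger X A_i."""
    return sum(A.conj().T @ X @ A for A in kraus)

def operator_sinkhorn(kraus, iters=500):
    """Alternately rescale the Kraus operators so that T(I) = I, then T*(I) = I."""
    n = kraus[0].shape[0]
    I = np.eye(n)
    for _ in range(iters):
        left = inv_sqrt(apply_T(kraus, I))
        kraus = [left @ A for A in kraus]    # after this step T(I) = I
        right = inv_sqrt(apply_T_dual(kraus, I))
        kraus = [A @ right for A in kraus]   # after this step T*(I) = I
    return kraus
```

On generic inputs the two conditions are met simultaneously in the limit; no attempt is made here to detect maps that cannot be scaled.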

We consider a natural generalization of doubly stochastic scalings. Say maps if and and say is an -scaling of if is a scaling of that maps .

###### Question 1.

Given positive semidefinite matrices and and a completely positive map , does have an -scaling?

We extend Gurvits’ characterization of approximate scalability to the setting of Question 1. As in [14], our existence proofs lead to algorithms that efficiently produce approximate scalings when they exist. Theorem 3.8, which closely resembles the characterization in [14], characterizes the existence of approximate -scalings by block-upper-triangular matrices. Theorem 3.9 extends this characterization to handle scalings in the full general-linear group with a somewhat surprising outcome: informally, a completely positive map has approximate -scalings if and only if a suitable random scaling of satisfies the conditions of Theorem 3.8 with high probability. We also give an exponential time algorithm to decide if can be scaled to map with arbitrarily small error.
A close variant of Question 1 first appeared in [13], in which the authors propose -scalings as quantum analogues of Markov chains satisfying certain relative entropy minimality conditions. The authors of [13] conjectured a partial answer to Question 1, which was confirmed in [9]. Our Theorem 10.20 extends the answer of [9], and proves the conjecture of [13] apart from one small caveat.
This paper is organized as follows: in Section 2, we describe several questions that can be reduced to Question 1 and for which our results yield a number of new characterizations and algorithms. In Section 3, after providing the necessary background, we state our main results, Theorems 3.8 and 3.9. We prove Theorem 3.8 in Sections 4 through 7 and Theorem 3.9 in Section 8. In Section 9 we describe a sufficient condition called -indecomposability that guarantees the existence of exact scalings. Finally, in Section 10 we bring Theorems 3.8 and 3.9 to bear on the questions from Section 2.

## Acknowledgements

The author would like to thank Michael Saks for many insightful discussions, and Rafael Oliveira for interesting observations and pointers to relevant literature.

## 2 Applications and special cases

Here we mention a few questions that can be answered via reduction to Question 1.

###### Question 2 (Matrix scaling).

Given a nonnegative matrix and nonnegative row- and column-sum vectors and , do there exist diagonal matrices such that the row (resp. column) sums of are (resp. c)?

It is well-known that matrix scaling can be reduced to an instance of operator scaling with specified marginals, but Gurvits’ characterization does not apply to this instance unless and are the all-ones vectors. In 10.1, we recall the reduction from Question 2 to Question 1 and derive the classic theorem of [29] on the existence of such scalings as a consequence of Theorem 3.8.
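
For orientation, the matrix problem in Question 2 can be attacked directly by the classical Sinkhorn iteration. The sketch below is that textbook iteration, not the reduction to Question 1 described in the paper; it assumes a strictly positive matrix and marginals with equal totals, and the names `sinkhorn`, `r`, and `c` are illustrative.

```python
import numpy as np

def sinkhorn(A, r, c, iters=1000):
    """Find diagonal X, Y so that X A Y has row sums r and column sums c."""
    x = np.ones(A.shape[0])
    y = np.ones(A.shape[1])
    for _ in range(iters):
        x = r / (A @ y)      # fix the row sums for the current y
        y = c / (A.T @ x)    # fix the column sums for the current x
    return np.diag(x), np.diag(y)

# Assumed example data: a positive 2 x 2 matrix and marginals with equal totals.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
r = np.array([1.0, 2.0])
c = np.array([1.5, 1.5])
X, Y = sinkhorn(A, r, c)
B = X @ A @ Y
```

When the input has zero entries the iteration may fail to converge or may divide by zero, which is the situation the existence theorem of [29] addresses.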

###### Question 3 (Eigenvalues of sums of Hermitian matrices).

Given nonincreasing sequences of real numbers, are the spectra of some Hermitian matrices satisfying ?

In [21], Klyachko showed (amazingly) that the answer to Question 3 is “yes” if and only if satisfy a certain finite set of linear inequality constraints. That is, such form a polyhedral cone. A long line of work has been devoted to describing the set , which has connections to representation theory, Schubert calculus, and combinatorics ([22], [21], [10]). There are even polynomial-time algorithms to test if satisfy [27]. However, no previous work has provided an algorithm to find the Hermitian matrices in question. Our reduction, which can be found in 10.2, yields an algorithmic proof of the result in [21]. That is, we exhibit an algorithm that outputs a sequence of Hermitian matrices (in particular, real symmetric matrices!) with spectra approaching if satisfy .

###### Question 4 (Forster’s scalings).

Given vectors , nonnegative numbers , and a positive-semidefinite matrix , when does there exist an invertible linear transformation such that

$$\sum_{i=1}^{n} p_i \frac{B u_i (B u_i)^\dagger}{\|B u_i\|^2} = Q?$$
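
As a small illustration of the expression in Question 4, the following sketch evaluates its left-hand side for a candidate transformation. The function name `forster_sum` and the random test data are assumptions made for the example. Since each normalized term has unit trace, the trace of the result equals the sum of the weights, which gives an immediate necessary condition on the target matrix.

```python
import numpy as np

def forster_sum(B, us, ps):
    """Return sum_i p_i * (B u_i)(B u_i)^dagger / ||B u_i||^2."""
    d = B.shape[0]
    total = np.zeros((d, d), dtype=complex)
    for u, p in zip(us, ps):
        v = B @ u
        # Outer product v v^dagger divided by the squared norm of v.
        total += p * np.outer(v, v.conj()) / np.vdot(v, v).real
    return total
```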

Barthe [2] answered this question completely for the case . Forster independently answered Question 4 in the positive for , in general position, and ; as a consequence he was able to prove previously unattainable lower bounds in communication complexity [8]. As noted in [14], Forster’s result is a consequence of Gurvits’ characterization of doubly stochastic scalings. In 10.3 we reduce the general case of Question 4 to an instance of Question 1, and use this reduction to answer the approximate version of Question 4. For fixed and , the admissible form a convex polytope whose form is a natural generalization of the polytope, known as the basis polytope, described in [2]. In fact, one can derive the Schur-Horn theorem on diagonals of Hermitian matrices with given spectra [16] from our answer to Question 4.
Lastly, we hope our techniques will be of use in quantum information theory. The completely positive maps that map have a meaningful interpretation: by a fact known as channel-state duality [19], to each completely positive map is associated a unique unnormalized mixed bipartite quantum state . The operator maps if and only if and . That is, the local mixed states induced by are and . Operator scaling has established connections between separable quantum states and matroid theory [14], so perhaps our techniques can shed further light on such relationships. We discuss this further in Section 11.

## 3 Preliminaries and main theorems

Before presenting the main theorems we fill in some background and justify a few assumptions we will make throughout the paper. The notation established in 3.1 and 3.2 will be summarised in 3.6.

### 3.1 Preliminaries

###### Definition 3.1 (Completely positive maps).

A completely positive map is a map of the form

$$T : X \mapsto \sum_{i=1}^{r} A_i X A_i^\dagger,$$

where and are finite dimensional complex inner product spaces and are linear maps called Kraus operators of . Note that preserves positive-semidefiniteness. The map is given by

$$T^* : X \mapsto \sum_{i=1}^{r} A_i^\dagger X A_i,$$

and is the adjoint of in the trace inner product. Recall that we say maps if and .
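
Definition 3.1 can be checked numerically. The sketch below builds a map and its dual from randomly chosen Kraus matrices (the dimensions and the random data are assumptions for illustration) and computes both sides of the adjointness relation in the trace inner product.

```python
import numpy as np

def apply_T(kraus, X):
    """T(X) = sum_i A_i X A_i^dagger."""
    return sum(A @ X @ A.conj().T for A in kraus)

def apply_T_dual(kraus, Y):
    """T*(Y) = sum_i A_i^dagger Y A_i."""
    return sum(A.conj().T @ Y @ A for A in kraus)

rng = np.random.default_rng(0)
n, m, r = 3, 2, 4  # dim V, dim W, number of Kraus operators (illustrative)

def random_complex(rows, cols):
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

kraus = [random_complex(m, n) for _ in range(r)]
X = random_complex(n, n)
X = X + X.conj().T            # Hermitian input on V
Y = random_complex(m, m)
Y = Y + Y.conj().T            # Hermitian input on W

lhs = np.trace(apply_T(kraus, X) @ Y)       # <T(X), Y>
rhs = np.trace(X @ apply_T_dual(kraus, Y))  # <X, T*(Y)>

# A positive-semidefinite input stays positive-semidefinite.
psd_in = X @ X.conj().T
min_eig = np.linalg.eigvalsh(apply_T(kraus, psd_in)).min()
```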

###### Definition 3.2 (Scalings of completely positive maps).

If is a completely positive map, and , we define the completely positive map by

$$T_{g,h} : X \mapsto g^\dagger T(h X h^\dagger) g.$$

Observe that

$$(T_{g,h})^* = T^*_{h,g}.$$

is called the scaling of by .
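
In terms of Kraus operators, scaling replaces each Kraus operator by its product with the adjoint of the first scaling matrix on the left and the second on the right. The sketch below, with assumed random real data and the hypothetical helper `scale`, checks both the defining formula and the identity for the dual.

```python
import numpy as np

def apply_T(kraus, X):
    return sum(A @ X @ A.conj().T for A in kraus)

def apply_T_dual(kraus, Y):
    return sum(A.conj().T @ Y @ A for A in kraus)

def scale(kraus, g, h):
    """Kraus operators of the scaled map: A_i -> g^dagger A_i h."""
    return [g.conj().T @ A @ h for A in kraus]

rng = np.random.default_rng(1)
n, m = 3, 2
kraus = [rng.standard_normal((m, n)) for _ in range(3)]
g = rng.standard_normal((m, m))   # acts on W
h = rng.standard_normal((n, n))   # acts on V
X = rng.standard_normal((n, n))
X = X + X.T
Y = rng.standard_normal((m, m))
Y = Y + Y.T

scaled = scale(kraus, g, h)
# Scaled map applied to X versus g^dagger T(h X h^dagger) g.
primal_ok = np.allclose(apply_T(scaled, X),
                        g.conj().T @ apply_T(kraus, h @ X @ h.conj().T) @ g)
# Dual of the scaled map versus h^dagger T*(g Y g^dagger) h.
dual_ok = np.allclose(apply_T_dual(scaled, Y),
                      h.conj().T @ apply_T_dual(kraus, g @ Y @ g.conj().T) @ h)
```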

Here (resp. ) will be a subset of (resp. ), and often a subgroup.

###### Definition 3.3 (Approximate scalings).

Say a scaling of is an --scaling if maps with and . If and , say is approximately -scalable to if for all , has an --scaling by .

If and are invertible, approximate (resp. exact) scalability to is equivalent to approximate (resp. exact) scalability to for and , so we mainly restrict attention to -scalings.
It will be handy to be able to easily move back and forth between -scalings and -scalings. The following easy lemma, which we prove in Appendix 12.2, gives us this freedom.

###### Lemma 3.1.

Suppose and are positive-definite, , and that is block-diagonal. The following are equivalent:

1. is approximately (resp. exactly) scalable to by .

2. is approximately (resp. exactly) scalable to by .

3. is approximately (resp. exactly) scalable to by .

4. is approximately (resp. exactly) scalable to by .

Moreover, if has an --scaling by then has -, -, and --scalings by .

Henceforward (resp. ) denotes a positive-semidefinite operator in (resp. ). We further assume because .

#### Flags

We will think of positive-semidefinite operators in terms of their spectrum and an associated sequence of subspaces called a flag.

###### Definition 3.4 (Flags).

1. If is an -dimensional vector space, a flag on is a sequence of subspaces

$$0 \subset F_{i_1} \subsetneq \cdots \subsetneq F_{i_k} \subset V$$

where .

2. The signature of , denoted , is the set of dimensions appearing in the flag. Say a flag is complete if it has signature ; else is partial.

3. The standard flag in an orthonormal basis of is the complete flag

$$F_\bullet = (\{0\}, \langle f_1 \rangle, \ldots, \langle f_1, \ldots, f_{n-1} \rangle, V).$$

4. Conversely, a complete flag is the standard flag in a unique orthonormal basis up to multiplication of each basis vector by a complex number of modulus 1. In general, if is a flag, say is an adapted basis for if is orthonormal and for . That is, is a subflag of the standard flag in .

5. If is a set of linear transformations of , denote the set

$$\{h : h F_i \subset F_i \text{ for all } i \in \sigma(F_\circ)\}.$$

When is a subgroup of , is the stabilizer subgroup of under the action of .

###### Definition 3.5 (Block-upper-triangular scalings).

If is a flag on and is a linear operator on , we say is block-upper-triangular (with respect to ) if

$$h F_i \subset F_i \text{ for all } i \in \sigma(F_\circ).$$

If is a set of linear transformations of , let

$$H_{F_\circ} := \{h \in H : h \text{ is block-upper-triangular w.r.t. } F_\circ\}.$$

When is a subgroup of , is the stabilizer subgroup of under the action of .
Note that a linear transformation is block-upper-triangular if and only if the matrix for is block-upper-triangular with block-sizes in an adapted basis for .

Next we discuss how to view Hermitian operators in terms of their spectra and an associated flag. If is a subspace, let denote the orthogonal projection to . Observe that if is a flag and a sequence of positive numbers, then

$$\sum_{i \in \sigma(F_\circ)} c_i \pi_{F_i}$$

is a positive-semidefinite operator in .
In fact, every positive semidefinite operator has a unique representation of this form; this can be seen by taking the sequence to be the sequence of sums of eigenspaces of in decreasing order of eigenvalue. More precisely:

###### Fact 3.2 (See the survey [10], e.g.).

Let be positive-semidefinite. Let denote the spectrum of . Then there is a unique shortest flag, denoted , such that there exist satisfying

$$\sum_{i \in \sigma(F_\circ(A))} c_i \pi_{F_i(A)} = A.$$

Further, we have and for where .
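
The decomposition in Fact 3.2 can be computed from an eigendecomposition. The sketch below assumes the standard form of this fact: the coefficients are the consecutive gaps of the nonincreasing spectrum (with a zero appended at the end), and the subspace of dimension i is spanned by eigenvectors of the i largest eigenvalues. The function name and the tolerance are illustrative.

```python
import numpy as np

def flag_decomposition(A, tol=1e-9):
    """Return [(i, c_i, projection onto F_i(A))] with A = sum_i c_i * projection."""
    alpha, U = np.linalg.eigh(A)
    order = np.argsort(alpha)[::-1]           # nonincreasing eigenvalue order
    alpha, U = alpha[order], U[:, order]
    gaps = alpha - np.append(alpha[1:], 0.0)  # alpha_i - alpha_{i+1}, alpha_{n+1} = 0
    terms = []
    for i in range(1, len(alpha) + 1):
        if gaps[i - 1] > tol:                 # keep only dimensions with a spectral gap
            Ui = U[:, :i]
            terms.append((i, gaps[i - 1], Ui @ Ui.conj().T))
    return terms
```

For an operator with spectrum (3, 3, 1, 0), for instance, only the dimensions 2 and 3 appear, with coefficients 2 and 1.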

Note that for any flag such that , we must have . Thus, not all spectra and flags are compatible. We give a name to those flags that are compatible with a given spectrum.

###### Definition 3.6 (α-partial flag).

If is a non-increasing sequence of nonnegative numbers, say is an -partial flag if

$$\sigma(F_\circ) \supset \{i \in [n] : \alpha_i - \alpha_{i+1} > 0\}.$$

It will be useful to have some shorthand for the difference sequence .

###### Definition 3.7 (Difference sequences).

If is a sequence, define to be the sequence

$$\Delta\alpha_i = \alpha_i - \alpha_{i+1}.$$

Here . We define

$$\sigma(\alpha) = \{i : \Delta\alpha_i \neq 0\}.$$

Note that if and only if is an -partial flag.
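
A short worked example of Definitions 3.6 and 3.7 may help. The sketch below assumes the convention that the entry following the last term of the sequence is zero, and the helper names are hypothetical. For the sequence (3, 3, 1, 0) the difference sequence is (0, 2, 1, 0), so a flag is compatible with this spectrum exactly when its signature contains 2 and 3.

```python
import numpy as np

def difference_sequence(alpha):
    """Delta alpha_i = alpha_i - alpha_{i+1}, with alpha_{n+1} taken to be 0."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha - np.append(alpha[1:], 0.0)

def sigma(alpha):
    """1-based indices i with Delta alpha_i != 0."""
    return {i + 1 for i, d in enumerate(difference_sequence(alpha)) if d != 0}

def is_alpha_partial(signature, alpha):
    """A flag is alpha-partial when its signature contains sigma(alpha)."""
    return sigma(alpha) <= set(signature)
```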

###### Definition 3.8 (Flag notation convention).

1. will denote the flag and will denote the flag .

2. and will denote adapted bases for and , respectively.

Note that (resp. ) is diagonal with nonincreasing diagonal in basis (resp. ).

###### Definition 3.9 (Projections to flags).

For , let be a partial isometry. That is, is , the orthogonal projection to . In the basis (and basis for ) we have

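$$\eta_j = \left.\overbrace{\begin{bmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \end{bmatrix}}^{n}\,\right\}\, j.$$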

Let for be the analogous partial isometries.

#### Restrictions on scalings

We must impose some restrictions on in order for our methods to work. Luckily, this level of generality suffices for all the applications known to the author. In particular, these restrictions will never rule out the case and , so any reader only interested in -scalings can safely skim this subsection.
Our characterization can apply in a more general setting than the one discussed here. For the sake of simplicity, we describe this more general setting in Remark 8.12 after presenting our main theorems and algorithms.

Our groups will take the form

$$G = \bigoplus_i \operatorname{GL}(W_i) \quad\text{and}\quad H = \bigoplus_i \operatorname{GL}(V_i) \tag{1}$$

where and (here the direct sums are assumed to be orthogonal direct sums).

For our proof techniques to work, we must assume respects the decompositions and . That is, we require

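\begin{align}
T \bigoplus_i L(V_i) &\subset \bigoplus_j L(W_j) \tag{2}\\
T^* \bigoplus_j L(W_j) &\subset \bigoplus_i L(V_i). \tag{3}
\end{align}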

If (2) and (3) hold, say is compatible with and . Note that if is compatible with and , then and where , are positive-semidefinite operators. Say an operator of this form is compatible with , and analogously for and . Observe that compatibility of with depends only on . Further, and are compatible if and only if

$$F_j(B) = \bigoplus_i F_j(B) \cap W_i.$$

For this reason we say a flag is compatible with if for all ; we define compatibility with analogously. Since we are interested in -scalability, we may assume and are compatible with and , respectively, or else any that is compatible with and is clearly neither exactly nor approximately -scalable to . We summarise our assumptions in the next definition.

###### Definition 3.10 (Block-diagonal).

If there exist decompositions and such that

1. and satisfy (1),

2. is compatible with and , and

3. and are compatible with and , respectively,

say is block-diagonal. For convenience, say is block-diagonal if is block-diagonal.

###### Example 1.

The tuple is always block-diagonal.

###### Example 2.

If Kraus operators of satisfy

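\begin{align}
A_l V &\subset \{0\} \oplus \cdots \oplus \{0\} \oplus W_{i(l)} \oplus \{0\} \oplus \cdots \tag{4}\\
\text{and}\quad A_l^\dagger W &\subset \{0\} \oplus \cdots \oplus \{0\} \oplus V_{j(l)} \oplus \{0\} \oplus \cdots \tag{5}
\end{align}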

for some and , then is compatible with and .
As we will see in Section 10, the Kraus operators of the completely positive maps arising in examples 2, 3, and 4 satisfy the containments (4) and (5).

###### Observation 3.3.

If is block-diagonal, then approximate or exact -scalability of to depends only on the spectra of and .

###### Proof.

All and with fixed spectra such that is block-diagonal are conjugate by unitaries in and , respectively. However, for any unitaries and , the change of variables by the transformation shows approximate (resp. exact) -scalability to is equivalent to approximate (resp. exact) -scalability to . ∎

#### Extensions of Gurvits’ conditions

We remind the reader of Gurvits’ theorem characterizing scalability of completely positive maps to doubly stochastic.

###### Theorem 3.4 ([14]).

Suppose is a completely positive map. The following are equivalent:

1. is rank-nondecreasing, that is, for all , .

2. is approximately -scalable to .

In order to state our main theorems, we’ll need extensions of rank-nondecreasingness and capacity. To define our notion of rank-nondecreasingness, we’ll define a polytope depending on , and then define to have the rank-nondecreasingness property if is in the polytope defined by .

###### Definition 3.11 (rank-nondecreasingness for specified marginals).

Suppose are given partial flags.

1. We say a pair of subspaces is -independent if for all .

2. Define to be the set of satisfying , , and

$$\sum_{i \in \sigma(E_\circ)} \Delta q_i \dim E_i \cap L + \sum_{j \in \sigma(F_\circ)} \Delta p_j \dim F_j \cap R \le N \tag{6}$$

for all -independent pairs . Because the coefficients of the in the above sum can take on only finitely many values, is a convex polytope.

3. Say is -rank-nondecreasing if .

This definition extends the definition of rank-nondecreasingness. Rank-nondecreasingness is usually, and equivalently, defined by the nonexistence of a shrunk subspace, or a subspace such that . Since

$$\left(L, \left(\sum_i A_i L\right)^\perp\right)$$

is a -independent pair and all other -independent pairs have , there is no shrunk subspace if and only if for all -independent pairs . That is, is rank-nondecreasing if and only if is -rank-nondecreasing, because .

###### Remark 3.5.

-rank-nondecreasingness does not depend on the particular choice of Kraus operators for , because -independence of does not depend on the choice of Kraus operators for . This is because is -independent if and only if , where denotes the orthogonal projection to the subspace .

We will need a variant of the determinant that depends on an additional argument, which is a positive semidefinite operator.

###### Definition 3.12 (Relative determinant).

Define the determinant of relative to , denoted , by

$$\det(P, X) = \prod_{j \in \sigma(F_\circ(P))} \left(\det \eta_j X \eta_j^\dagger\right)^{\Delta p_j}. \tag{7}$$

Of course, can be defined analogously for any positive-semidefinite operator .

The relative determinant inherits a few of the multiplicative properties of determinant when restricted to block-upper-triangular matrices.

###### Lemma 3.6 (Properties of det(P,X)).

If , then

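\begin{align}
\det(P, Xh) &= \det(P, X)\,\det(P, h), \tag{8}\\
\det(P, h^\dagger X h) &= \det(P, h^\dagger h)\,\det(P, X), \tag{9}\\
\text{and}\quad \det(P, h^{-\dagger} h^{-1}) &= \det(P, h^\dagger h)^{-1}. \tag{10}
\end{align}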

We defer the (easy) proofs to Appendix 12.2.
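
The three identities can be checked numerically in the basis where the reference operator is diagonal with nonincreasing diagonal, so that compressing by a partial isometry amounts to taking a leading principal submatrix and block-upper-triangular matrices are upper triangular. The sketch below makes those assumptions, uses real matrices with positive leading minors so that fractional powers are unambiguous, and takes the entry after the last spectral value to be zero; `rel_det` is a hypothetical helper name.

```python
import numpy as np

def rel_det(p, X):
    """det(P, X) for P = diag(p), p nonincreasing, via leading principal minors."""
    p = np.asarray(p, dtype=float)
    dp = p - np.append(p[1:], 0.0)   # Delta p_j, with p_{n+1} taken to be 0
    value = 1.0
    for j in range(1, len(p) + 1):
        if dp[j - 1] != 0:
            value *= np.linalg.det(X[:j, :j]) ** dp[j - 1]
    return value

rng = np.random.default_rng(3)
n = 4
p = [0.5, 0.3, 0.3, 0.1]
M = rng.standard_normal((n, n))
X = M @ M.T + np.eye(n)              # positive definite
# Upper triangular with positive diagonal, hence invertible.
h = np.triu(rng.standard_normal((n, n)), k=1) + np.diag(rng.uniform(0.5, 2.0, n))
h_inv = np.linalg.inv(h)

check_8 = np.isclose(rel_det(p, X @ h), rel_det(p, X) * rel_det(p, h))
check_9 = np.isclose(rel_det(p, h.T @ X @ h), rel_det(p, h.T @ h) * rel_det(p, X))
check_10 = np.isclose(rel_det(p, h_inv.T @ h_inv), 1.0 / rel_det(p, h.T @ h))
```

The identities rely on the triangular structure: for a general invertible matrix the leading block of the product is no longer the product of the leading blocks.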

###### Remark 3.7.

Observe that

$$\log\det(P, P) = \operatorname{Tr} P \log P.$$

Equivalently, is the von Neumann entropy of , denoted , which is equal to the Shannon entropy of the spectrum of . One can also draw some parallels with the quantum relative entropy. By the log-concavity of the determinant, is concave in . Further, we will see that for fixed nonsingular , is maximized at subject to and . Thus, it can be intuitively helpful to think of as a cross-entropy of and , and

$$-\log\det(P, XP^{-1})$$

as a relative entropy of with respect to , though it is not equal to the von Neumann relative entropy.
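
The identity at the start of Remark 3.7 follows from a telescoping sum over the leading minors, and it is easy to confirm numerically. The sketch below assumes a diagonal operator with strictly positive, nonincreasing diagonal and takes the entry after the last spectral value to be zero; the spectrum used is an arbitrary example.

```python
import numpy as np

def rel_det(p, X):
    """det(P, X) for P = diag(p), p nonincreasing, via leading principal minors."""
    p = np.asarray(p, dtype=float)
    dp = p - np.append(p[1:], 0.0)   # Delta p_j, with p_{n+1} taken to be 0
    value = 1.0
    for j in range(1, len(p) + 1):
        if dp[j - 1] != 0:
            value *= np.linalg.det(X[:j, :j]) ** dp[j - 1]
    return value

p = np.array([0.5, 0.3, 0.2])        # assumed example spectrum with unit trace
lhs = np.log(rel_det(p, np.diag(p))) # log det(P, P)
rhs = np.sum(p * np.log(p))          # Tr P log P for diagonal P
```

For a unit-trace spectrum such as this one, the negative of either quantity is the Shannon entropy of the spectrum.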

Finally we come to an extension of Gurvits’ capacity.

###### Definition 3.13 (Capacity for specified marginals).

Here we take . Recall from Definition 3.9 that , are partial isometries. Define

$$\operatorname{cap}(T, P, Q) = \inf_{h \in \operatorname{GL}(V)_{F_\circ}} \frac{\det(Q, T(h P h^\dagger))}{\det(P, h^\dagger h)}. \tag{11}$$

If -partial flags and -partial flags and are given, then refers to the quantity where and are the unique operators with , and , .

Note that and . By the existence of Cholesky decompositions, . This implies for , so is an extension of the usual capacity.

### 3.2 Main theorems

We are ready to state our analogue of Gurvits’ characterization for block-upper-triangular scalings. Gurvits’ characterization is the special case and of the following theorem. Recall that if is a flag on and is a subgroup of , then is the subgroup of fixing each subspace in .

###### Theorem 3.8.

Suppose is a completely positive map and are positive-semidefinite. The following are equivalent:

1. is -rank-nondecreasing.

2. .

3. is approximately -scalable to for all and such that is block-diagonal.

For fixed, the completely positive map is either -rank-nondecreasing for no , or is -rank-nondecreasing for generic . Here “generic ”, a strengthening of “almost all ”, means “all not in the zero set of some fixed finite set of polynomials, none of which vanishes on ”. This allows us to extend our characterization from -scalability to a characterization of -scalability.

###### Theorem 3.9.

Suppose is block-diagonal. The following are equivalent:

1. is -rank-nondecreasing for generic .

2. for generic .

3. is approximately -scalable to .

Recall that for block-diagonal, the -scalability of to depends only on the spectra of and . In fact, the spectra for which approximate scaling can be done form a convex polytope.

###### Theorem 3.10.

The spectra of pairs of positive-semidefinite operators such that

1. is block-diagonal and

2. is approximately -scalable to

forms a convex polytope, which we denote

We also obtain algorithmic counterparts of Theorem 3.8 and Theorem 3.9. Let denote the least nonzero eigenvalues of and , and let be the total bit-complexity of the input where is given by Kraus operators written down in bases and in which and , respectively, are diagonal.

###### Theorem 3.11.

Suppose is block-diagonal. There is a deterministic algorithm of time-complexity that takes as input and outputs such that is an --scaling whenever and ERROR otherwise.

###### Theorem 3.12.

Suppose is block-diagonal. There is a randomized algorithm of time-complexity that takes as input and outputs such that is an --scaling with probability at least 2/3 whenever is approximately -scalable to and ERROR otherwise.

We can give only a randomized exponential time algorithm for the decision version of our problem. It would be interesting to find a polynomial time algorithm for this, as it would be a step towards finding a truly polynomial time algorithm for membership in the Kronecker polytope. Note that the only exponential dependence is on the bit complexity of the spectra and .

###### Theorem 3.13.

Suppose is block-diagonal. There is a randomized algorithm of time-complexity to decide if Equivalently, the algorithm decides if is approximately -scalable to .

### 3.3 Proof overviews and discussion

#### Theorem 3.8:

(1 ⇒ 2):

We prove -rank-nondecreasingness implies in Section 6 using the reduction to the doubly stochastic case from Section 4 and some concavity properties of capacity.

(2 ⇒ 3):

We prove implies is -scalable to in Section 5 by analyzing Sinkhorn scaling as in [14], but with replaced by .

(3 ⇒ 1):

That -scalability of to implies is -rank-nondecreasing is a direct linear algebra argument presented in Section 7.

#### Theorem 3.9:

Theorem 3.9 is proved in Section 8. The implications 1 ⇒ 2 and 2 ⇒ 3 follow immediately from Theorem 3.8. The only hard work left is the implication 3 ⇒ 1. It is not hard to see (Corollary 7.2) that approximate -scalability implies there exists such that is -rank-nondecreasing. Next, via an algebraic geometry argument (Lemma 8.1), we show the existence of any such implies that, for a generic , is -rank-nondecreasing.

#### Theorem 3.10:

Theorem 3.10 appears as Corollary 8.9 in Section 8.2, but we give an overview of the proof here. For and fixed, it is clear that the set which we denote , is a convex polytope since it is defined by a finite number of linear constraints.
It is not hard to see (Corollary 7.2 and Theorem 3.8) that is approximately -scalable to if and only if for some . In other words, the obtainable pairs of spectra are

$$\bigcup_{g \in G,\, h \in H} K(T_{g,h}, E_\circ, F_\circ).$$

This set is not obviously convex, but due to the results of Section 8, we find that for generic ,

$$K(T_{g,h}, E_\circ, F_\circ) = \bigcup_{g' \in G,\, h' \in H} K(T_{g',h'}, E_\circ, F_\circ).$$

This tells us that for some , the obtainable spectra comprise the convex polytope , proving Theorem 3.10.
We remark that Theorem 3.10 could likely be obtained by other methods involving the representation theory of Lie algebras (see [32]), using which one might be able to show is what is known as a moment polytope. We could not see how to obtain Theorems 3.8 and 3.9 from those methods, however.

#### Theorems 3.11 and 3.12:

Our proofs of Theorem 3.8 and Theorem 3.9 are just shy of effective. While the approximate scalings in Theorem 3.8 are produced by iterated scaling (see Algorithm TOSI of Section 5.1 and Algorithm GOSI of Section 8.3), a priori the bit-complexity of the scaled operators could grow exponentially. In Appendix 12.4 we obtain the efficient algorithms Algorithm 12.8 and Algorithm 12.10 by modifying the iterative scaling algorithms and rounding. The running times of the modified algorithms are described by Theorems 12.9 and 12.10.
The reader might wonder if analyzing the performance of Sinkhorn scaling for operators, which is called “Algorithm ” in [12] and “OSI” in [14], on the reduction in Section 4 would be sufficient to obtain our algorithmic results. However, the reduction only works if and have integral spectra and results in a completely positive map from . Thus, the dimension of the reduction depends on a common denominator for all the entries in the spectra of and , rendering the algorithms inefficient. In fact, operator Sinkhorn scaling on the reduction amounts to Algorithm TOSI anyway, which is simpler to state without using the reduction.

#### Theorem 3.13:

We will prove Theorem 3.13 as Corollary 12.13 and present the algorithm (Algorithm 12.12) in Section 12.4, but the proof is straightforward so we summarize it here.
Corollary 7.2 states that if is an --scaling for -smaller than times the inverse of the least common denominator of the entries of and , then must be -rank-nondecreasing. In other words, . However, . From Theorem 3.12, we have a -time algorithm (Algorithm 12.10) which outputs an --scaling with probability at least whenever and ERROR otherwise. In particular, if Algorithm 12.10 never outputs anything other than ERROR and -scalings (even if it did, we could easily check if they were).
Thus, running Algorithm 12.10 with small enough (in particular we can take ) and outputting NO if and only if Algorithm 12.10 outputs ERROR is a -time decision procedure for membership in .