
# A First-order Method for Monotone Stochastic Variational Inequalities on Semidefinite Matrix Spaces

Nahidsadat Majlesinasab, Farzad Yousefian, and Mohammad Javad Feizollahi. Nahidsadat Majlesinasab is with the Department of Industrial Engineering and Management, Oklahoma State University, Stillwater, OK 74075, USA. nahid.majlesinasab@okstate.edu. Farzad Yousefian is with the Faculty of Industrial Engineering and Management, Oklahoma State University, Stillwater, OK 74078, USA. farzad.yousefian@okstate.edu. Mohammad Javad Feizollahi is with the Robinson College of Business, Georgia State University, Atlanta, GA 30302, USA. mfeizollahi@gsu.edu.
###### Abstract

Motivated by multi-user optimization problems and non-cooperative Nash games in stochastic regimes, we consider stochastic variational inequality (SVI) problems on matrix spaces where the variables are positive semidefinite matrices and the mapping is merely monotone. Much of the interest in the theory of variational inequality (VI) has focused on addressing VIs on vector spaces, and most existing methods either rely on strong assumptions or require a two-loop framework where, at each iteration, a projection problem, i.e., a semidefinite optimization problem, needs to be solved. Motivated by this gap, we develop a stochastic mirror descent method where the distance generating function is chosen to be the quantum entropy. This method is a single-loop first-order method in the sense that it only requires a gradient-type update at each iteration. The novelty of this work lies in the convergence analysis, which is carried out by employing an auxiliary sequence of stochastic matrices. Our contribution is three-fold: (i) under this setting and employing averaging techniques, we show that the iterate generated by the algorithm converges to a weak solution of the SVI; (ii) moreover, we derive a convergence rate in terms of the expected value of a suitably defined gap function; (iii) we implement the developed method for a throughput maximization problem in a multiple-input multiple-output multi-cell cellular wireless network composed of seven hexagonal cells and present numerical experiments supporting the convergence of the proposed method.

## I Introduction

Variational inequality problems, first introduced in the 1960s, have a wide range of applications arising in engineering, finance, and economics (cf. [1]) and are strongly tied to game theory. VI theory provides a tool to formulate different equilibrium problems and analyze them in terms of existence and uniqueness of solutions, stability, and sensitivity analysis. In mathematical programming, VIs encompass problems such as systems of nonlinear equations, optimization problems, and complementarity problems, to name a few [2]. In this paper, we consider stochastic variational inequality problems where the variable is a positive semidefinite matrix. Given a set $\mathcal{X}\subseteq\mathbb{S}^n_+$ and a mapping $F:\mathcal{X}\to\mathbb{S}^n$, a VI problem, denoted by VI$(\mathcal{X},F)$, seeks a positive semidefinite matrix $X^*\in\mathcal{X}$ such that

 $\operatorname{tr}\big(F(X^*)(X-X^*)\big)\ge 0,\quad\text{for all }X\in\mathcal{X}.\qquad(1)$

In particular, we study VI$(\mathcal{X},F)$ where $F(X):=\mathbb{E}\big[\Phi(X,\xi(\omega))\big]$, i.e., the mapping $F$ is the expected value of a stochastic mapping $\Phi$, where $\xi$ is a random vector associated with a probability space represented by $(\Omega,\mathcal{F},\mathbb{P})$. Here, $\Omega$ denotes the sample space, $\mathcal{F}$ denotes a $\sigma$-algebra on $\Omega$, and $\mathbb{P}$ is the associated probability measure. Therefore, $X^*$ solves VI$(\mathcal{X},F)$ if

 $\operatorname{tr}\big(\mathbb{E}\big[\Phi(X^*,\xi(\omega))\big](X-X^*)\big)\ge 0,\quad\text{for all }X\in\mathcal{X}.\qquad(2)$

Throughout, we assume that $F$ is well-defined (i.e., the expectation is finite).

### I-A Motivating Example

A non-cooperative game involves a number of decision makers, called players, who have conflicting interests and each of whom tries to minimize/maximize his own payoff/utility function. Assume there are $N$ players, each controlling a positive semidefinite matrix variable $X_i$ which belongs to the set of all possible actions of the $i$-th player, denoted by $\mathcal{X}_i$. Let us define $X_{-i}:=(X_j)_{j\neq i}$ as the feasible actions of the other players. Let the payoff function of player $i$ be quantified by $f_i(X_i,X_{-i})$. Then, each player needs to solve the following semidefinite optimization problem

 $\underset{X_i\in\mathcal{X}_i}{\text{minimize}}\ f_i(X_i,X_{-i}).\qquad(3)$

A solution to this game, called a Nash equilibrium, is a feasible strategy profile $X^*=(X_1^*,\ldots,X_N^*)$ such that $f_i(X_i^*,X_{-i}^*)\le f_i(X_i,X_{-i}^*)$, for all $X_i\in\mathcal{X}_i$, $i=1,\ldots,N$. As we discuss in Lemma 3, the optimality conditions of the above Nash game can be formulated as a VI$(\mathcal{X},F)$ where $F$ and $\mathcal{X}$ are given by (9) and (10), respectively.
Problem (3) has a wide range of applications in wireless communications and information theory. Here we discuss a communication network example.

Consider a multi-user MIMO network with $N$ transmitter-receiver pairs, where $H_{ji}$ denotes the channel matrix between transmitter $j$ and receiver $i$, and $X_j$ is the transmit covariance matrix of user $j$. The achievable rate of user $i$ is

 $R_i(X_i,X_{-i})=\log\det\Big(I_{n_i}+\sum_{j=1}^{N}H_{ji}X_jH_{ji}^\dagger\Big)-\log\det(W_{-i}),\qquad(4)$

where $W_{-i}$ is the multi-user interference (MUI)-plus-noise covariance matrix at receiver $i$ [6]. The goal is to solve

 $\underset{X_i\in\mathcal{X}_i}{\text{maximize}}\ R_i(X_i,X_{-i}),\qquad(5)$

for all $i=1,\ldots,N$, where $\mathcal{X}_i$ denotes the set of feasible transmit covariance matrices of user $i$.
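As a concrete illustration of evaluating the rate in (4), the following is a minimal numpy sketch for a single fixed receiver $i$. All dimensions, the random channel draws, and the trace normalization of the covariances are our own hypothetical choices, not the paper's:

```python
import numpy as np

def rate(i, H, X, n_r):
    # R_i in (4): log det(I + sum_j H_ji X_j H_ji^dagger) - log det(W_{-i}),
    # where the interference term collects all users j != i plus noise.
    total = np.eye(n_r, dtype=complex)
    for j, Xj in enumerate(X):
        total += H[j] @ Xj @ H[j].conj().T
    interf = total - H[i] @ X[i] @ H[i].conj().T
    return np.linalg.slogdet(total)[1] - np.linalg.slogdet(interf)[1]

rng = np.random.default_rng(2)
n_t, n_r, N = 2, 2, 3                       # hypothetical dimensions
# Channels of the N transmitters toward the fixed receiver.
H = [(rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t)))
     / np.sqrt(2) for _ in range(N)]
X = []
for _ in range(N):                          # random feasible transmit covariances
    B = rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))
    S = B @ B.conj().T
    X.append(S / np.trace(S).real)          # unit-trace PSD matrices
print(rate(0, H, X, n_r))                   # nonnegative achievable rate
```

The rate is nonnegative because the total covariance dominates the interference covariance in the positive semidefinite order.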

### I-B Existing methods

Our primary interest in this paper lies in solving SVIs on semidefinite matrix spaces. Computing the solution to this class of problems is challenging mainly due to the presence of uncertainty and the semidefinite solution space. In what follows, we review some of the methods in addressing these challenges. More details are presented in Table I.

Stochastic approximation (SA) schemes: The SA method was first developed in [16] and has been very successful in solving optimization and equilibrium problems under uncertainty. Jiang and Xu [7] appear to be amongst the first who applied SA methods to address SVIs. In recent years, prox generalizations of SA methods were developed for solving stochastic optimization problems [17, 18] and VIs. The monotonicity of the gradient mapping plays an important role in the convergence analysis of this class of solution methods. The extragradient method, which relies on weaker assumptions, i.e., pseudo-monotone mappings, to address VIs was developed in [19], but this method requires two projections per iteration. Dang and Lan [20] developed a non-Euclidean extragradient method to address generalized monotone VIs. The prox generalization of extragradient schemes to stochastic settings was developed in [8]. Averaging techniques, first introduced in [21], proved successful in increasing the robustness of the SA method. In vector spaces equipped with non-Euclidean norms, Nemirovski et al. [17] developed the stochastic mirror descent (SMD) method for solving nonsmooth stochastic optimization problems. While SA schemes and their prox generalizations can be applied directly to solve problems with semidefinite constraints, they result in a two-loop framework and require projection onto a semidefinite cone by solving an optimization problem at each iteration, which increases the computational complexity.

Exponential learning methods: Optimizing over sets of positive semidefinite matrices is more challenging than over vector spaces because of the form of the problem constraints. In this line of research, an approach based on matrix exponential learning (MEL) is proposed in [10] to solve the power allocation problem in MIMO multiple access channels. MEL is an optimization algorithm applied to positive definite nonlinear problems and has strong ties to mirror descent methods. MEL makes use of the quantum entropy as the distance generating function. Later, the convergence analysis of MEL was provided in [5], where its robustness w.r.t. uncertainties is shown. In [22], the single-user MIMO throughput maximization problem is addressed, which is an optimization problem, not a Nash game. In the multiple channel case, an optimization problem can be derived that makes the analysis much easier. However, there are some practical problems that cannot be treated as an optimization problem, such as the multi-user MIMO throughput maximization discussed earlier. In this regard, [4] proposed an algorithm relying on MEL for solving N-player games under feedback errors and presented its convergence to a stable Nash equilibrium under a strong stability assumption. However, in most applications, including the game in Section I-A, the mapping does not satisfy this assumption.

Semidefinite and cone programming: Sparse inverse covariance estimation (SICE) is a procedure which improves the stability of covariance estimation by setting a certain number of coefficients in the inverse covariance to zero. Lu [23] developed two first-order methods including the adaptive spectral projected gradient and the adaptive Nesterov’s smooth methods to solve the large scale covariance estimation problem. In this line of research, a block coordinate descent (BCD) method with a superlinear convergence rate is proposed in [11]. In conic programming with complicated constraints, many first-order methods are combined with duality or penalty strategies [9, 15]. These methods are projection based and do not scale with the problem size.

Much of the interest in the VI regime has focused on addressing VIs on vector spaces. Moreover, in the literature of semidefinite programming, most of the methods address deterministic semidefinite optimization. Yet, there are many stochastic systems, such as wireless communication systems, that can be modeled as positive semidefinite Nash games. In this paper, we consider SVIs on matrix spaces where the mapping is merely monotone. Our main contributions are as follows:

(i) Developing an averaging matrix stochastic mirror descent (AM-SMD) method: We develop an SMD method where we choose the distance generating function to be defined as the quantum entropy, following [24]. It is a first-order method in the sense that only a gradient-type update at each iteration is needed. The algorithm does not need a projection step at each iteration since it provides a closed-form solution for the projected point. To improve the robustness of the method for solving SVIs, we use an averaging technique. Our work is an improvement of the MEL method [4] and is motivated by the need to weaken the strong stability (monotonicity) requirement on the mapping. The main novelty of our work lies in the convergence analysis in the absence of strong monotonicity, where we introduce an auxiliary sequence and are able to establish convergence to a weak solution of the SVI. Then, we derive a convergence rate of $\mathcal{O}(1/\sqrt{T})$ in terms of the expected value of a suitably defined gap function. To clarify the distinctions of our contributions, we prepared Table I, where we summarize the differences between the existing methods and our work.

(ii) Implementation results: We present the performance of the proposed AM-SMD method applied to the throughput maximization problem in wireless multi-user MIMO networks. Our results indicate the robustness of the AM-SMD scheme with respect to problem parameters and uncertainty. Also, it is shown that AM-SMD outperforms both the non-averaging M-SMD and MEL [4].

The paper is organized as follows. In Section II, we state the assumptions on the problem and outline our AM-SMD algorithm. Section III contains the convergence analysis and the rate derived for the AM-SMD method. We report some numerical results in Section IV and conclude in Section V.

Notation. Throughout, we let $\mathbb{S}^n$ denote the set of all $n\times n$ symmetric matrices and $\mathbb{S}^n_+$ the cone of all positive semidefinite matrices. We define $\mathcal{X}:=\{X\in\mathbb{S}^n_+:\operatorname{tr}(X)=1\}$. The mapping $F$ is called monotone if for any $X,Y\in\mathcal{X}$, we have $\operatorname{tr}\big((F(X)-F(Y))(X-Y)\big)\ge 0$. Let $[X]_{uv}$ denote the elements of matrix $X$ and $\mathbb{C}$ denote the set of complex numbers. The norm $\|X\|_2$ denotes the spectral norm of a matrix $X$, being the largest singular value of $X$. The trace norm of a matrix, denoted by $\|X\|_{tr}$, is the sum of the singular values of the matrix. Note that the spectral and trace norms are dual to each other [25]. We use $SOL(\mathcal{X},F)$ to denote the set of solutions to VI$(\mathcal{X},F)$.
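The duality between the spectral and trace norms, i.e., $\operatorname{tr}(A^TB)\le\|A\|_2\|B\|_{tr}$, can be checked numerically. A small sketch (matrix sizes are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
spectral = np.linalg.norm(A, 2)                        # largest singular value
trace_norm = np.linalg.svd(B, compute_uv=False).sum()  # sum of singular values
print(np.trace(A.T @ B), spectral * trace_norm)        # first value <= second
```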

## II Algorithm outline

In this section, we present the AM-SMD scheme for solving (2). Suppose $\omega:\mathcal{X}\to\mathbb{R}$ is a strictly convex and differentiable function, where $\mathcal{X}\subseteq\mathbb{S}^n_+$, and let $X,Y\in\mathcal{X}$. Then, the Bregman divergence between $X$ and $Y$ is defined as

 $D(X,Y):=\omega(X)-\omega(Y)-\operatorname{tr}\big((X-Y)\nabla\omega(Y)^T\big).$

In what follows, our choice of $\omega$ is the quantum entropy [26],

 $\omega(X)=\begin{cases}\operatorname{tr}(X\log X-X)&\text{if }X\in\mathcal{X},\\+\infty&\text{otherwise}.\end{cases}\qquad(6)$

The Bregman divergence corresponding to the quantum entropy is called the von Neumann divergence [24] and is given by

 $D(X,Y)=\operatorname{tr}(X\log X-X\log Y).\qquad(7)$

In our analysis, we use the following property of $\omega$.

###### Lemma 1

([27]) The quantum entropy is strongly convex with modulus 1 under the trace norm.

Since $\omega$ in (6) differs from the quantum entropy by the linear term $-\operatorname{tr}(X)$, the function $\omega$ is also strongly convex with modulus 1 under the trace norm.
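For concreteness, the quantum entropy (6) and the von Neumann divergence (7) can be evaluated via an eigendecomposition. The following is a small numpy sketch; the function names and the eigenvalue clipping are our own choices:

```python
import numpy as np

def matrix_log(X, eps=1e-12):
    # Log of a symmetric PSD matrix via eigendecomposition; eigenvalues are
    # clipped at eps to avoid log(0) on the boundary of the cone.
    w, V = np.linalg.eigh(X)
    return (V * np.log(np.clip(w, eps, None))) @ V.T

def quantum_entropy(X):
    # omega(X) = tr(X log X - X), the distance generating function in (6).
    return np.trace(X @ matrix_log(X) - X)

def von_neumann_divergence(X, Y):
    # D(X, Y) = tr(X log X - X log Y), the Bregman divergence in (7).
    return np.trace(X @ (matrix_log(X) - matrix_log(Y)))

def random_density_matrix(n, rng):
    # A random trace-one positive definite matrix (an element of the set X).
    A = rng.standard_normal((n, n))
    S = A @ A.T + 1e-6 * np.eye(n)
    return S / np.trace(S)

rng = np.random.default_rng(0)
X, Y = random_density_matrix(4, rng), random_density_matrix(4, rng)
print(von_neumann_divergence(X, Y))   # nonnegative
print(von_neumann_divergence(X, X))   # ~0
```

On trace-one positive semidefinite matrices, $D(X,Y)$ coincides with the quantum relative entropy and is therefore nonnegative.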

Next, we address the optimality conditions of a matrix constrained optimization problem as a VI which is an extension of Prop. 1.1.8 in [28].

###### Lemma 2

Let $\mathcal{X}\subseteq\mathbb{S}^n$ be a nonempty closed convex set, and let $f:\mathbb{S}^n\to\mathbb{R}$ be a differentiable convex function. Consider the optimization problem

 $\underset{\tilde X\in\mathcal{X}}{\text{minimize}}\ f(\tilde X).\qquad(8)$

A matrix $\tilde X^*$ is optimal to problem (8) iff $\tilde X^*\in\mathcal{X}$ and $\operatorname{tr}\big(\nabla^T_{\tilde X}f(\tilde X^*)(\tilde X-\tilde X^*)\big)\ge 0$, for all $\tilde X\in\mathcal{X}$.

{proof}

($\Rightarrow$) Assume $\tilde X^*$ is optimal to problem (8). Assume by contradiction that there exists some $\hat Z\in\mathcal{X}$ such that $\operatorname{tr}\big(\nabla^T_{\tilde X}f(\tilde X^*)(\hat Z-\tilde X^*)\big)<0$. Since $f$ is continuously differentiable, by the first-order Taylor expansion, for all sufficiently small $\alpha>0$, we have

 $f\big(\tilde X^*+\alpha(\hat Z-\tilde X^*)\big)=f(\tilde X^*)+\alpha\operatorname{tr}\big(\nabla^T_{\tilde X}f(\tilde X^*)(\hat Z-\tilde X^*)\big)+o(\alpha)<f(\tilde X^*),$

following the hypothesis $\operatorname{tr}\big(\nabla^T_{\tilde X}f(\tilde X^*)(\hat Z-\tilde X^*)\big)<0$. Since $\mathcal{X}$ is convex and $\tilde X^*,\hat Z\in\mathcal{X}$, the matrix $\tilde X^*+\alpha(\hat Z-\tilde X^*)$ is feasible with a smaller objective function value than the optimal matrix $\tilde X^*$. This is a contradiction. Therefore, we must have $\operatorname{tr}\big(\nabla^T_{\tilde X}f(\tilde X^*)(\tilde X-\tilde X^*)\big)\ge 0$ for all $\tilde X\in\mathcal{X}$.
($\Leftarrow$) Now suppose that $\tilde X^*\in\mathcal{X}$ and $\operatorname{tr}\big(\nabla^T_{\tilde X}f(\tilde X^*)(Z-\tilde X^*)\big)\ge 0$ for all $Z\in\mathcal{X}$. Since $f$ is convex, we have

 $f(\tilde X^*)+\operatorname{tr}\big(\nabla^T_{\tilde X}f(\tilde X^*)(Z-\tilde X^*)\big)\le f(Z),$

for all $Z\in\mathcal{X}$, which implies

 $f(Z)-f(\tilde X^*)\ge\operatorname{tr}\big(\nabla^T_{\tilde X}f(\tilde X^*)(Z-\tilde X^*)\big)\ge 0,$

where the last inequality follows by the hypothesis. Since $Z\in\mathcal{X}$ is arbitrary, it follows that $\tilde X^*$ is optimal.

The next lemma shows a set of sufficient conditions under which a Nash equilibrium can be obtained by solving a VI.

###### Lemma 3

[Nash equilibrium] Let $\mathcal{X}_i\subseteq\mathbb{S}^{n_i}$ be a nonempty closed convex set and $f_i(X_i,X_{-i})$ be a differentiable convex function in $X_i$ for all $i$, where $i=1,\ldots,N$. Then, $X^*$ is a Nash equilibrium (NE) to game (3) if and only if $X^*$ solves VI$(\mathcal{X},F)$, where

 $F(X)\triangleq\operatorname{diag}\big(\nabla_{X_1}f_1(X),\cdots,\nabla_{X_N}f_N(X)\big),\qquad(9)$
 $\mathcal{X}\triangleq\{X\,|\,X=\operatorname{diag}(X_1,\cdots,X_N),\;X_i\in\mathcal{X}_i\text{ for all }i\}.\qquad(10)$
{proof}

First, suppose $X^*$ is an NE to game (3). We want to prove that $X^*$ solves VI$(\mathcal{X},F)$, i.e., $\operatorname{tr}\big(F(X^*)^T(Z-X^*)\big)\ge 0$ for all $Z\in\mathcal{X}$. By the optimality conditions of optimization problem (3) and from Lemma 2, we know $X^*$ is an NE if and only if $\operatorname{tr}\big(\nabla^T_{X_i}f_i(X^*)(Z_i-X_i^*)\big)\ge 0$ for all $Z_i\in\mathcal{X}_i$ and all $i$. Then, we obtain for all $i$

 $\operatorname{tr}\big(\nabla^T_{X_i}f_i(X^*)(Z_i-X_i^*)\big)=\sum_u\sum_v[\nabla_{X_i}f_i(X^*)]_{uv}[Z_i-X_i^*]_{uv}\ge 0.\qquad(11)$

Invoking the definition of the mapping $F$ given by (9) and from (11), we have $\operatorname{tr}\big(F(X^*)^T(Z-X^*)\big)=\sum_{i=1}^N\operatorname{tr}\big(\nabla^T_{X_i}f_i(X^*)(Z_i-X_i^*)\big)\ge 0$ for all $Z\in\mathcal{X}$. From the definition of VI$(\mathcal{X},F)$ and relation (1), we conclude that $X^*\in SOL(\mathcal{X},F)$. Conversely, suppose $X^*\in SOL(\mathcal{X},F)$. Then, $\operatorname{tr}\big(F(X^*)^T(Z-X^*)\big)\ge 0$ for all $Z\in\mathcal{X}$. Consider a fixed $i$ and a matrix $\bar Z\in\mathcal{X}$ given by (10) such that the only difference between $\bar Z$ and $X^*$ is in the $i$-th block, i.e.,

 $\bar Z=\operatorname{diag}\big([X_1^*],\ldots,[X_{i-1}^*],[Z_i],[X_{i+1}^*],\ldots,[X_N^*]\big),$

where is an arbitrary matrix in . Then, we have

 $\bar Z-X^*=\operatorname{diag}\big(0_{n_1\times n_1},\ldots,[Z_i-X_i^*],\ldots,0_{n_N\times n_N}\big).\qquad(12)$

Therefore, substituting $\bar Z-X^*$ by term (12), we obtain

 $\operatorname{tr}\big(F(X^*)^T(\bar Z-X^*)\big)=\sum_u\sum_v[\nabla_{X_i}f_i(X^*)]_{uv}[Z_i-X_i^*]_{uv}=\operatorname{tr}\big(\nabla^T_{X_i}f_i(X^*)(Z_i-X_i^*)\big)\ge 0.$

Since $Z_i\in\mathcal{X}_i$ was chosen arbitrarily, $\operatorname{tr}\big(\nabla^T_{X_i}f_i(X^*)(Z_i-X_i^*)\big)\ge 0$ for any $Z_i\in\mathcal{X}_i$. Hence, by applying Lemma 2, we conclude that $X^*$ is a Nash equilibrium to game (3).

Algorithm 1 presents the outline of the AM-SMD method. At each iteration $t$, first, using an oracle, a realization of the stochastic mapping is generated at $X_t$, denoted by $\Phi(X_t,\xi_t)$. Next, a matrix $Y_{t+1}$ is updated using (14). Here, $\{\eta_t\}$ is a non-increasing step-size sequence. Then, $Y_{t+1}$ is projected onto the set $\mathcal{X}$ using the closed-form solution (15), producing $X_{t+1}$. Then the averaged sequence $\{\bar X_t\}$ is generated using relations (16). Next, we state the main assumptions. Let us define the stochastic error at iteration $t$ as

 $Z_t\triangleq\Phi(X_t,\xi_t)-F(X_t)\quad\text{for all }t\ge 0.\qquad(13)$

Let $\mathcal{F}_t$ denote the history of the algorithm up to time $t$, i.e., $\mathcal{F}_t:=\{X_0,\xi_0,\xi_1,\ldots,\xi_{t-1}\}$ for $t\ge 1$ and $\mathcal{F}_0:=\{X_0\}$.
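The per-iteration updates described above can be sketched as follows. Everything below is a minimal numpy illustration on a toy (in fact strongly) monotone instance of our own choosing, not the paper's implementation: the dual step plays the role of (14), the matrix exponential step the closed-form projection (15), and the weighted running average the relations (16).

```python
import numpy as np

def gibbs_projection(Y):
    # Closed-form "projection" X = grad omega*(Y) = exp(Y + I)/tr(exp(Y + I)),
    # computed stably by shifting eigenvalues before exponentiating.
    w, V = np.linalg.eigh(Y)
    e = np.exp(w - w.max())               # the shift cancels in the normalization
    return (V * e) @ V.T / e.sum()

def am_smd(sample_map, n, T, eta):
    # Sketch of the AM-SMD iteration: gradient-type dual update, matrix
    # exponential step, and step-size-weighted averaging of the iterates.
    Y = np.zeros((n, n))
    X = gibbs_projection(Y)               # equals I/n at Y = 0
    avg, weight = np.zeros((n, n)), 0.0
    for t in range(T):
        Y = Y - eta[t] * sample_map(X)    # dual update with a stochastic sample
        X = gibbs_projection(Y)           # closed-form projection onto X
        avg, weight = avg + eta[t] * X, weight + eta[t]
    return avg / weight                   # averaged iterate

# Toy monotone instance (our choice): F(X) = X - A, whose VI solution over
# trace-one PSD matrices is A itself when A lies in that set; the stochastic
# oracle Phi adds symmetric zero-mean noise.
rng = np.random.default_rng(1)
A = np.diag([0.7, 0.2, 0.1])
def sample_map(X):
    N = 0.01 * rng.standard_normal((3, 3))
    return X - A + (N + N.T) / 2

X_bar = am_smd(sample_map, n=3, T=400,
               eta=[0.2 / np.sqrt(t + 1) for t in range(400)])
print(np.trace(X_bar))                    # ~1: feasibility holds in closed form
```

Note that no semidefinite optimization problem is solved at any iteration; feasibility is maintained by the matrix exponential alone.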

###### Assumption 1

Let the following hold:

(a) The mapping $F$ is monotone and continuous over the set $\mathcal{X}$.

(b) The stochastic mapping $\Phi$ has a finite mean squared norm, i.e., there exists some $C>0$ such that $\mathbb{E}\big[\|\Phi(X_t,\xi_t)\|_2^2\mid\mathcal{F}_t\big]\le C^2$ and $\mathbb{E}\big[\|Z_t\|_2^2\mid\mathcal{F}_t\big]\le C^2$ for all $t\ge 0$. (Under this assumption, the mean squared error of the stochastic noise is bounded.)

(c) The stochastic noise has a zero mean, i.e., $\mathbb{E}[Z_t\mid\mathcal{F}_t]=0$ for all $t\ge 0$.

## III Convergence analysis

In this section, our interest lies in analyzing the convergence and deriving a rate statement for the sequence generated by the AM-SMD method. Note that a solution of VI$(\mathcal{X},F)$ is also called a strong solution. Next, we define a weak solution, which is considered a counterpart of the strong solution.

###### Definition 1

(Weak solution) The matrix $X^*\in\mathcal{X}$ is called a weak solution to VI$(\mathcal{X},F)$ if it satisfies $\operatorname{tr}\big(F(X)(X-X^*)\big)\ge 0$, for all $X\in\mathcal{X}$.

Let us denote by $SOL_w(\mathcal{X},F)$ and $SOL(\mathcal{X},F)$ the sets of weak solutions and strong solutions to VI$(\mathcal{X},F)$, respectively.

###### Remark 1

Under Assumption 1(a), when the mapping $F$ is monotone, any strong solution of problem (2) is a weak solution, i.e., $SOL(\mathcal{X},F)\subseteq SOL_w(\mathcal{X},F)$. Provided that $F$ is also continuous, the converse is also true and a weak solution is a strong solution. Moreover, for a monotone mapping on a convex compact set, e.g., $\mathcal{X}$, a weak solution always exists [8].

Unlike optimization problems, where the objective function provides a metric for distinguishing solutions, there is no immediate analog in VI problems. However, we use the following residual function associated with a VI problem.

###### Definition 2

($G$ function) Define the function $G:\mathcal{X}\to\mathbb{R}$ as

 $G(X)=\sup_{Z\in\mathcal{X}}\operatorname{tr}\big(F(Z)(X-Z)\big),\quad\text{for all }X\in\mathcal{X}.$

The next lemma provides some properties of the $G$ function.

###### Lemma 4

The function $G$ given by Definition 2 is a well-defined gap function, i.e., $G(X)\ge 0$ for all $X\in\mathcal{X}$; moreover, $X^*$ is a weak solution to problem (2) iff $G(X^*)=0$.

{proof}

For an arbitrary $A\in\mathcal{X}$, we have

 $G(X)=\sup_{Z\in\mathcal{X}}\operatorname{tr}\big(F(Z)(X-Z)\big)\ge\operatorname{tr}\big(F(A)(X-A)\big),$

for all $X\in\mathcal{X}$. For $A=X$, the above inequality suggests that $G(X)\ge\operatorname{tr}\big(F(X)(X-X)\big)=0$, implying that the $G$ function is nonnegative for all $X\in\mathcal{X}$.
Assume $X^*$ is a weak solution. By Definition 1, $\operatorname{tr}\big(F(X)(X-X^*)\big)\ge 0$ for all $X\in\mathcal{X}$, which implies $G(X^*)=\sup_{Z\in\mathcal{X}}\operatorname{tr}\big(F(Z)(X^*-Z)\big)\le 0$. On the other hand, from the first part of the proof, we get $G(X^*)\ge 0$. We conclude that $G(X^*)=0$ for any weak solution $X^*$. Conversely, assume that there exists an $\hat X\in\mathcal{X}$ such that $G(\hat X)=0$. Therefore, $\sup_{Z\in\mathcal{X}}\operatorname{tr}\big(F(Z)(\hat X-Z)\big)=0$, which implies $\operatorname{tr}\big(F(Z)(Z-\hat X)\big)\ge 0$ for all $Z\in\mathcal{X}$, implying $\hat X$ is a weak solution.

###### Remark 2

Assume the step-size sequence $\{\eta_t\}$ is non-increasing and the averaged sequence $\{\bar X_t\}$ is given by the recursive rules (16), where $\bar X_0:=X_0$. Then, using induction, it can be shown that $\bar X_t$ is a convex combination of $X_0,\ldots,X_t$, and hence $\bar X_t\in\mathcal{X}$ for any $t\ge 0$.

Next, we derive the conjugate of the quantum entropy and its gradient.

###### Proposition 1

Let $Y\in\mathbb{S}^n$ and let $\omega$ be defined as in (6). Then, we have

 $\omega^*(Y)=\log\big(\operatorname{tr}(\exp(Y+I_n))\big),\qquad(17)$
 $\nabla\omega^*(Y)=\dfrac{\exp(Y+I_n)}{\operatorname{tr}(\exp(Y+I_n))}.\qquad(18)$
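Before proceeding to the proof, (17)-(18) can be sanity-checked numerically. The sketch below (helper names are ours) verifies that $\nabla\omega^*(Y)$ has unit trace and attains the supremum defining the conjugate:

```python
import numpy as np

def omega(D, eps=1e-12):
    # Quantum entropy (6) via eigendecomposition; eps guards against log(0).
    w, V = np.linalg.eigh(D)
    L = (V * np.log(np.clip(w, eps, None))) @ V.T
    return np.trace(D @ L - D)

def conjugate(Y):
    # omega*(Y) = log tr(exp(Y + I)) as in (17), via a stable log-sum-exp.
    w = np.linalg.eigvalsh(Y) + 1.0
    m = w.max()
    return m + np.log(np.exp(w - m).sum())

def grad_conjugate(Y):
    # grad omega*(Y) = exp(Y + I)/tr(exp(Y + I)) as in (18), the Gibbs state.
    w, V = np.linalg.eigh(Y)
    e = np.exp(w + 1.0 - (w + 1.0).max())  # the shift cancels after normalizing
    return (V * e) @ V.T / e.sum()

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
Y = (B + B.T) / 2                  # an arbitrary symmetric matrix
G = grad_conjugate(Y)
print(np.trace(G))                 # the Gibbs state has unit trace
print(np.trace(G @ Y) - omega(G))  # attains the sup, i.e. equals omega*(Y)
```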
{proof}

The function $\omega$ is a lower semi-continuous convex function on the linear space of all symmetric matrices. The conjugate of the function $\omega$ can be defined as

 $\omega^*(Y)=\sup\{\operatorname{tr}(DY)-\omega(D):D\in\mathcal{X}\}=\sup\{\operatorname{tr}(DY)-\operatorname{tr}(D\log D-D):D\in\mathcal{X}\}=-\inf\{\underbrace{-\operatorname{tr}(D(Y+I_n))+\operatorname{tr}(D\log D)}_{\text{Term 1}}:D\in\mathcal{X}\}.$

The minimizer of the above problem is $D^*=\exp(Y+I_n)/\operatorname{tr}(\exp(Y+I_n))$, which is called the Gibbs state (see [29], Example 3.29). We observe that $D^*$ is a positive semidefinite matrix with trace equal to one, implying that $D^*\in\mathcal{X}$. By plugging it into Term 1, we obtain (17). The relation (18) follows by standard matrix analysis and the fact that $\nabla_Y\operatorname{tr}(\exp(Y))=\exp(Y)$ [30].

Throughout, we use the notion of Fenchel coupling [31]:

 $H(Q,Y)\triangleq\omega(Q)+\omega^*(Y)-\operatorname{tr}(QY),\qquad(19)$

which provides a proximity measure between $Q\in\mathcal{X}$ and $Y\in\mathbb{S}^n$ and is equal to the associated Bregman divergence between $Q$ and $\nabla\omega^*(Y)$. We also make use of the following lemma, which is proved in the Appendix.

###### Lemma 5

([4]) For all matrices $X\in\mathcal{X}$ and for all $Y,Z\in\mathbb{S}^n$, the following holds

 $H(X,Y+Z)\le H(X,Y)+\operatorname{tr}\big(Z(\nabla\omega^*(Y)-X)\big)+\|Z\|_2^2.\qquad(20)$

Next, we develop an error bound for the $G$ function. For simplicity of notation, we use $\Phi_t$ to denote $\Phi(X_t,\xi_t)$.

###### Lemma 6

Consider problem (2). Let the sequence $\{\bar X_T\}$ be generated by the AM-SMD algorithm. Suppose Assumption 1 holds. Then, for any $T\ge 1$,

 $\mathbb{E}\big[G(\bar X_T)\big]\le\dfrac{2\big(\log(n)+\sum_{t=0}^{T-1}\eta_t^2C^2\big)}{\sum_{t=0}^{T-1}\eta_t}.\qquad(21)$
{proof}

From the definition of $Z_t$ in relation (13), the recursion in the AM-SMD algorithm can be stated as

 $Y_{t+1}=Y_t-\eta_t\big(F(X_t)+Z_t\big).\qquad(22)$

Consider (20). From Algorithm 1 and (18), we have $X_t=\nabla\omega^*(Y_t)$. Let $X\in\mathcal{X}$, and set $Y:=Y_t$ and $Z:=-\eta_t(F(X_t)+Z_t)$ in (20). From (22), we obtain

 $H(X,Y_{t+1})\le H(X,Y_t)-\eta_t\operatorname{tr}\big((X_t-X)(F(X_t)+Z_t)\big)+\eta_t^2\|F(X_t)+Z_t\|_2^2.$

By adding and subtracting $\eta_t\operatorname{tr}\big((X_t-X)F(X)\big)$, we get

 $H(X,Y_{t+1})\le H(X,Y_t)-\eta_t\operatorname{tr}\big((X_t-X)Z_t\big)-\eta_t\operatorname{tr}\big((X_t-X)F(X)\big)+\eta_t^2\|F(X_t)+Z_t\|_2^2,\qquad(23)$

where we used the monotonicity of the mapping $F$. Let us define an auxiliary sequence $\{U_t\}$ such that $U_{t+1}:=U_t+\eta_tZ_t$, where $U_0:=Y_0$, and define $V_t:=\nabla\omega^*(U_t)$. From (23), invoking the definition of $\Phi_t$ and by adding and subtracting $\eta_t\operatorname{tr}(V_tZ_t)$, we obtain

 $\eta_t\operatorname{tr}\big((X_t-X)F(X)\big)\le H(X,Y_t)-H(X,Y_{t+1})+\eta_t\operatorname{tr}\big((V_t-X_t)Z_t\big)+\eta_t\operatorname{tr}\big((X-V_t)Z_t\big)+\eta_t^2\|\Phi_t\|_2^2.\qquad(24)$

Then, we estimate the term $\eta_t\operatorname{tr}\big((X-V_t)Z_t\big)$. By Lemma 5, setting $Y:=U_t$ and $Z:=\eta_tZ_t$, we get

 $\eta_t\operatorname{tr}\big((X-V_t)Z_t\big)\le H(X,U_t)-H(X,U_{t+1})+\eta_t^2\|Z_t\|_2^2.$

By plugging the above inequality into (24), we get

 $\eta_t\operatorname{tr}\big((X_t-X)F(X)\big)\le H(X,Y_t)-H(X,Y_{t+1})+H(X,U_t)-H(X,U_{t+1})+\eta_t^2\|Z_t\|_2^2+\eta_t\operatorname{tr}\big((V_t-X_t)Z_t\big)+\eta_t^2\|\Phi_t\|_2^2.$

By summing the above inequality from $t=0$ to $T-1$ and rearranging the terms, we have

 $\sum_{t=0}^{T-1}\eta_t\operatorname{tr}\big((X_t-X)F(X)\big)\le H(X,Y_0)-H(X,Y_T)+H(X,U_0)-H(X,U_T)+\sum_{t=0}^{T-1}\eta_t^2\|Z_t\|_2^2+\sum_{t=0}^{T-1}\eta_t\operatorname{tr}\big((V_t-X_t)Z_t\big)+\sum_{t=0}^{T-1}\eta_t^2\|\Phi_t\|_2^2$
 $\le H(X,Y_0)+H(X,U_0)+\sum_{t=0}^{T-1}\eta_t^2\|Z_t\|_2^2+\sum_{t=0}^{T-1}\eta_t\operatorname{tr}\big((V_t-X_t)Z_t\big)+\sum_{t=0}^{T-1}\eta_t^2\|\Phi_t\|_2^2,\qquad(25)$

where the last inequality holds by the nonnegativity of the Fenchel coupling [4]. By choosing $Y_0=U_0=I_n$ and recalling that $\operatorname{tr}(X)=1$ and $\operatorname{tr}(X\log X)\le 0$ for $X\in\mathcal{X}$, and $\operatorname{tr}(\exp(2I_n))=ne^2$ [32], from (6), (17) and (19),

 $H(X,Y_0)=H(X,U_0)=\operatorname{tr}(X\log X-X)-\operatorname{tr}(X)+\log\operatorname{tr}(\exp(2I_n))\le 0-1-1+2+\log(n)=\log(n).$

Plugging the above inequality into (25) yields

 $\sum_{t=0}^{T-1}\eta_t\operatorname{tr}\big((X_t-X)F(X)\big)\le 2\log(n)+\sum_{t=0}^{T-1}\eta_t^2\|Z_t\|_2^2+\sum_{t=0}^{T-1}\eta_t\operatorname{tr}\big((V_t-X_t)Z_t\big)+\sum_{t=0}^{T-1}\eta_t^2\|\Phi_t\|_2^2.\qquad(26)$

Let us define $\gamma_t:=\eta_t/\sum_{r=0}^{T-1}\eta_r$ and $\bar X_T:=\sum_{t=0}^{T-1}\gamma_tX_t$. We divide both sides of (26) by $\sum_{t=0}^{T-1}\eta_t$. Then for all $X\in\mathcal{X}$,

 $\operatorname{tr}\Big(\big(\textstyle\sum_{t=0}^{T-1}\gamma_tX_t-X\big)F(X)\Big)=\operatorname{tr}\big((\bar X_T-X)F(X)\big)\le\dfrac{1}{\sum_{t=0}^{T-1}\eta_t}\Big(2\log(n)+\sum_{t=0}^{T-1}\eta_t^2\|Z_t\|_2^2+\sum_{t=0}^{T-1}\eta_t\operatorname{tr}\big((V_t-X_t)Z_t\big)+\sum_{t=0}^{T-1}\eta_t^2\|\Phi_t\|_2^2\Big).$

The set $\mathcal{X}$ is convex. Since $X_t\in\mathcal{X}$ for all $t$ and $\sum_{t=0}^{T-1}\gamma_t=1$, we have $\bar X_T\in\mathcal{X}$. Now, we take the supremum of the left-hand side over the set $\mathcal{X}$ with respect to $X$ and use the definition of the $G$ function; note that the right-hand side of the above inequality is independent of $X$. We obtain

 $G(\bar X_T)\le\dfrac{1}{\sum_{t=0}^{T-1}\eta_t}\Big(2\log(n)+\sum_{t=0}^{T-1}\eta_t^2\|Z_t\|_2^2+\sum_{t=0}^{T-1}\eta_t\operatorname{tr}\big((V_t-X_t)Z_t\big)+\sum_{t=0}^{T-1}\eta_t^2\|\Phi_t\|_2^2\Big).$

By taking expectations on both sides, we get

 $\mathbb{E}\big[G(\bar X_T)\big]\le\dfrac{1}{\sum_{t=0}^{T-1}\eta_t}\Big(2\log(n)+\sum_{t=0}^{T-1}\eta_t^2\mathbb{E}\big[\|Z_t\|_2^2\big]+\sum_{t=0}^{T-1}\eta_t\mathbb{E}\big[\operatorname{tr}\big((V_t-X_t)Z_t\big)\big]+\sum_{t=0}^{T-1}\eta_t^2\mathbb{E}\big[\|\Phi_t\|_2^2\big]\Big).$

By definition, both $Y_t$ and $U_t$ are $\mathcal{F}_t$-measurable. Therefore, $X_t=\nabla\omega^*(Y_t)$ is $\mathcal{F}_t$-measurable. In addition, $V_t=\nabla\omega^*(U_t)$ is $\mathcal{F}_t$-measurable. Thus, by Assumption 1(c), we have $\mathbb{E}\big[\operatorname{tr}\big((V_t-X_t)Z_t\big)\big]=\mathbb{E}\big[\operatorname{tr}\big((V_t-X_t)\,\mathbb{E}[Z_t\mid\mathcal{F}_t]\big)\big]=0$. Applying Assumption 1(b),

 $\mathbb{E}\big[G(\bar X_T)\big]\le\dfrac{2}{\sum_{t=0}^{T-1}\eta_t}\Big(\log(n)+\sum_{t=0}^{T-1}\eta_t^2C^2\Big).$
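For intuition, the right-hand side of (21) is easy to evaluate numerically. The sketch below (with hypothetical values of $n$, $C$, and $T$, chosen by us) shows that a constant step size $\eta_t=\frac{1}{C}\sqrt{\log(n)/T}$ yields a bound of order $\sqrt{\log(n)/T}$:

```python
import math

def gap_bound(n, C, etas):
    # Right-hand side of (21) for a given step-size sequence.
    return 2 * (math.log(n) + sum(e * e for e in etas) * C**2) / sum(etas)

n, C, T = 64, 2.0, 10_000              # hypothetical problem parameters
eta = math.sqrt(math.log(n) / T) / C   # constant step size
b = gap_bound(n, C, [eta] * T)
print(b, 4 * C * math.sqrt(math.log(n) / T))  # the two values coincide
```

With this choice, the numerator becomes $4\log(n)$ and the denominator $\sqrt{T\log(n)}/C$, giving exactly $4C\sqrt{\log(n)/T}$.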

Next, we present the convergence rate of the AM-SMD scheme.

###### Theorem 1

Consider problem (2) and let the sequence $\{\bar X_T\}$ be generated by the AM-SMD algorithm. Suppose Assumption 1 holds and let the step-size be given by

 $\eta_t=\frac{1}{C}\sqrt{\frac{\log(n)}{T}},\quad\text{for all }t\ge 0.\qquad(27)$

Then, $\mathbb{E}\big[G(\bar X_T)\big]\le 4C\sqrt{\log(n)/T}$.