A First-order Method for Monotone Stochastic Variational Inequalities on Semidefinite Matrix Spaces
Abstract
Motivated by multi-user optimization problems and non-cooperative Nash games in stochastic regimes, we consider stochastic variational inequality (SVI) problems on matrix spaces where the variables are positive semidefinite matrices and the mapping is merely monotone. Much of the interest in the theory of variational inequalities (VIs) has focused on addressing VIs on vector spaces. Yet, most existing methods either rely on strong assumptions or require a two-loop framework where, at each iteration, a projection problem, i.e., a semidefinite optimization problem, needs to be solved. Motivated by this gap, we develop a stochastic mirror descent method where we choose the distance generating function to be the quantum entropy. This method is a single-loop first-order method in the sense that it only requires a gradient-type of update at each iteration. The novelty of this work lies in the convergence analysis, which is carried out by employing an auxiliary sequence of stochastic matrices. Our contribution is threefold: (i) under this setting and employing averaging techniques, we show that the iterate generated by the algorithm converges to a weak solution of the SVI; (ii) moreover, we derive a convergence rate in terms of the expected value of a suitably defined gap function; (iii) we implement the developed method for solving a multiple-input multiple-output multi-cell cellular wireless network composed of seven hexagonal cells and present numerical experiments supporting the convergence of the proposed method.
I Introduction
Variational inequality problems, first introduced in the 1960s, have a wide range of applications arising in engineering, finance, and economics (cf. [1]) and are strongly tied to game theory. VI theory provides a tool to formulate different equilibrium problems and to analyze them in terms of existence and uniqueness of solutions, stability, and sensitivity. In mathematical programming, VIs encompass problems such as systems of nonlinear equations, optimization problems, and complementarity problems, to name a few [2]. In this paper, we consider stochastic variational inequality problems where the variable is a positive semidefinite matrix. Given a set $\mathcal{K} \subseteq \mathbb{S}^n_+$ and a mapping $F: \mathcal{K} \to \mathbb{S}^n$, a VI problem, denoted by VI$(\mathcal{K}, F)$, seeks a positive semidefinite matrix $X^* \in \mathcal{K}$ such that
$\operatorname{tr}\big(F(X^*)(X - X^*)\big) \ge 0, \quad \text{for all } X \in \mathcal{K}.$ (1)
In particular, we study VI$(\mathcal{K}, F)$ where $F(X) \triangleq \mathbb{E}[\Phi(X, \xi)]$, i.e., the mapping $F$ is the expected value of a stochastic mapping $\Phi(X, \xi)$, where $\xi$ is a random vector associated with a probability space represented by $(\Omega, \mathcal{F}, \mathbb{P})$. Here, $\Omega$ denotes the sample space, $\mathcal{F}$ denotes a $\sigma$-algebra on $\Omega$, and $\mathbb{P}$ is the associated probability measure. Therefore, $X^* \in \mathcal{K}$ solves VI$(\mathcal{K}, F)$ if
$\operatorname{tr}\big(\mathbb{E}[\Phi(X^*, \xi)](X - X^*)\big) \ge 0, \quad \text{for all } X \in \mathcal{K}.$ (2)
Throughout, we assume that $F$ is well-defined (i.e., the expectation is finite).
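To make the monotone-mapping notion concrete, the following self-contained sketch (ours, not from the paper; the names `F` and `B` are illustrative) numerically checks $\operatorname{tr}\big((F(X) - F(Y))(X - Y)\big) \ge 0$ for the gradient map of a convex quadratic on symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def sym(A):
    return (A + A.T) / 2

# A simple monotone mapping on symmetric matrices: F(X) = X - B,
# the gradient of the convex function f(X) = 0.5 * ||X - B||_F^2.
B = sym(rng.standard_normal((n, n)))
F = lambda X: X - B

# Monotonicity check: tr((F(X) - F(Y)) (X - Y)) >= 0 for random symmetric X, Y.
for _ in range(100):
    X, Y = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))
    assert np.trace((F(X) - F(Y)) @ (X - Y)) >= -1e-12
```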
I-A Motivating Example
A non-cooperative game involves a number of decision makers, called players, who have conflicting interests, and each tries to minimize/maximize its own payoff/utility function. Assume there are $N$ players, each controlling a positive semidefinite matrix variable $X_i$ which belongs to the set of all possible actions of player $i$, denoted by $\mathcal{K}_i$. Let us define $X_{-i}$ as the feasible actions of the other players. Let the payoff function of player $i$ be quantified by $f_i(X_i, X_{-i})$. Then, each player needs to solve the following semidefinite optimization problem
$\min_{X_i \in \mathcal{K}_i} \; f_i(X_i, X_{-i}).$ (3)
A solution to this game, called a Nash equilibrium, is a feasible strategy profile $X^* = (X_1^*, \dots, X_N^*)$ such that $f_i(X_i^*, X_{-i}^*) \le f_i(X_i, X_{-i}^*)$
for all $X_i \in \mathcal{K}_i$ and all $i \in \{1, \dots, N\}$. As we discuss in Lemma 3, the optimality conditions of the above Nash game can be formulated as VI$(\mathcal{K}, F)$, where $\mathcal{K}$ and $F$ are defined in (9)-(10).
Problem (3) has a wide range of applications in wireless communications and information theory. Here we discuss a communication network example.
Wireless Communication Networks: A wireless network is founded on transmitters that generate radio waves and receivers that detect radio waves. To enhance the performance of a wireless transmission system, multiple antennas can be used to transmit and receive the radio signals. This system is called multiple-input multiple-output (MIMO), which provides high spectral efficiency in single-user wireless links without interference [3]. Other MIMO systems include MIMO broadcast channels and MIMO multiple access channels, where there are multiple users (players) that mutually interfere. In these systems, players either share the same transmitter or the same receiver. Recently, there has been much interest in MIMO systems under uncertainty, when the channel state information is subject to measurement errors, delays, or other imperfections [4]. Here, we consider the throughput maximization problem in multi-user MIMO networks under feedback errors and uncertainty. In this problem, we have $N$ MIMO links, where each link represents a transmitter-receiver pair with $m$ antennas at the transmitter and $n$ antennas at the receiver. We assume each of these links is a player of the game. Let $x_i$ and $y_i$ denote the signal transmitted from and received by the $i$th link, respectively. The signal model can be described by $y_i = H_{ii} x_i + \sum_{j \neq i} H_{ij} x_j + z_i$, where $H_{ii}$ is the direct-channel matrix of link $i$, $H_{ij}$ is the cross-channel matrix between transmitter $j$ and receiver $i$, and $z_i$ is a zero-mean circularly symmetric complex Gaussian noise vector with the covariance matrix $R_i$ [5]. The action for each player is the transmit power, meaning that each transmitter wants to transmit at its maximal power level to improve its performance. However, doing so increases the overall interference in the system, which in turn adversely impacts the performance of all involved transmitters and presents a conflict. It should be noted that we treat the interference generated by other users as an additive noise.
Therefore, $\sum_{j \neq i} H_{ij} x_j$ represents the multi-user interference (MUI) received by the $i$th player and generated by the other players. Assuming the complex random vector $x_i$ follows a Gaussian distribution, transmitter $i$ controls its input signal covariance matrix $Q_i$ subject to two constraints: first, the signal covariance matrix is positive semidefinite, and second, each transmitter's maximum transmit power is bounded by a positive scalar $P_i$. Under these assumptions, each player's achievable transmission throughput for a given set of players' covariance matrices is given by
$R_i(Q_i, Q_{-i}) = \log\det\big(I + H_{ii}^H W_i^{-1} H_{ii} Q_i\big),$ (4)
where $W_i \triangleq R_i + \sum_{j \neq i} H_{ij} Q_j H_{ij}^H$ is the MUI-plus-noise covariance matrix at receiver $i$ [6]. The goal is to solve
$\max_{Q_i} \; R_i(Q_i, Q_{-i}) \quad \text{s.t.} \quad Q_i \succeq 0, \; \operatorname{tr}(Q_i) \le P_i,$ (5)
for all $i \in \{1, \dots, N\}$, where $\mathcal{K}_i \triangleq \{Q_i \succeq 0 : \operatorname{tr}(Q_i) \le P_i\}$ and $Q_{-i}$ denotes the covariance matrices of the other players.
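For concreteness, the throughput in (4) can be evaluated numerically. The sketch below is ours, not the paper's experiment; the dimensions, random channel draws, and unit-variance noise covariance are all illustrative assumptions, and we use the standard MIMO rate formula $\log\det(I + H_{ii}^H W_i^{-1} H_{ii} Q_i)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tx, n_rx, n_links = 2, 2, 3

def crandn(*shape):
    # Circularly symmetric complex Gaussian entries with unit variance.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Channels H[i][j]: from transmitter j to receiver i; Q[j]: PSD input covariances.
H = [[crandn(n_rx, n_tx) for _ in range(n_links)] for _ in range(n_links)]
Q = []
for _ in range(n_links):
    A = crandn(n_tx, n_tx)
    Q.append(A @ A.conj().T)  # PSD by construction

def throughput(i, Q):
    # MUI-plus-noise covariance at receiver i (identity noise covariance assumed).
    W = np.eye(n_rx, dtype=complex)
    for j in range(n_links):
        if j != i:
            W += H[i][j] @ Q[j] @ H[i][j].conj().T
    M = np.eye(n_tx) + H[i][i].conj().T @ np.linalg.solve(W, H[i][i]) @ Q[i]
    return np.log(np.linalg.det(M)).real  # log det(I + H^H W^-1 H Q)

rates = [throughput(i, Q) for i in range(n_links)]
assert all(r >= 0 for r in rates)  # achievable rates are nonnegative
```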
I-B Existing methods
Our primary interest in this paper lies in solving SVIs on semidefinite matrix spaces. Computing the solution to this class of problems is challenging, mainly due to the presence of uncertainty and the semidefinite solution space. In what follows, we review some of the methods addressing these challenges. More details are presented in Table I.
Reference | Problem | Characteristic | Assumptions | Space | Scheme | Single-loop | Rate
Jiang and Xu [7] | VI | Stochastic | SM, S | Vector | SA | ✗ |
Juditsky et al. [8] | VI | Stochastic | MM, S/NS | Vector | Extragradient SMP | ✗ |
Lan et al. [9] | Opt | Deterministic | C, S/NS | Matrix | Primal-dual Nesterov's methods | ✗ |
Mertikopoulos et al. [10] | Opt | Stochastic | C, S | Matrix | Exponential Learning | ✓ |
Hsieh et al. [11] | Opt | Deterministic | NS, C | Matrix | BCD | ✗ | superlinear
Koshal et al. [12] | VI | Stochastic | MM, S | Vector | Regularized Iterative SA | ✗ |
Yousefian et al. [13] | VI | Stochastic | PM, S | Vector | Averaging BSMP | ✗ |
Yousefian et al. [14] | VI | Stochastic | MM, NS | Vector | Regularized Smooth SA | ✗ |
Mertikopoulos et al. [4] | VI | Stochastic | SM, S | Matrix | Exponential Learning | ✓ |
Necoara et al. [15] | Opt | Deterministic | C, S/NS | Vector | Inexact Lagrangian | ✗ |
Our work | VI | Stochastic | MM, NS | Matrix | AMSMD | ✓ |
Stochastic approximation (SA) schemes: The SA method was first developed in [16] and has been very successful in solving optimization and equilibrium problems under uncertainty. Jiang and Xu [7] appear to be amongst the first who applied SA methods to address SVIs. In recent years, prox generalizations of SA methods were developed for solving stochastic optimization problems [17, 18] and VIs. The monotonicity of the gradient mapping plays an important role in the convergence analysis of this class of solution methods. The extragradient method, which relies on weaker assumptions, i.e., pseudo-monotone mappings, to address VIs was developed in [19], but this method requires two projections per iteration. Dang and Lan [20] developed a non-Euclidean extragradient method to address generalized monotone VIs. The prox generalization of the extragradient schemes to stochastic settings was developed in [8]. Averaging techniques, first introduced in [21], proved successful in increasing the robustness of the SA method. In vector spaces equipped with non-Euclidean norms, Nemirovski et al. [17] developed the stochastic mirror descent (SMD) method for solving nonsmooth stochastic optimization problems. While SA schemes and their prox generalizations can be applied directly to solve problems with semidefinite constraints, they result in a two-loop framework and require projection onto a semidefinite cone by solving an optimization problem at each iteration, which increases the computational complexity.
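To illustrate the per-iteration projection cost mentioned above: even for the unit-trace spectrahedron $\{X \succeq 0, \operatorname{tr}(X) = 1\}$, the Euclidean projection requires a full eigendecomposition followed by a simplex projection of the spectrum. The sketch below is a standard construction (ours, not an algorithm from this paper):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0)

def project_spectrahedron(X):
    """Projection of symmetric X onto {X >= 0, tr X = 1}: eigendecomposition + simplex step."""
    w, V = np.linalg.eigh(X)
    return V @ np.diag(project_simplex(w)) @ V.T

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
P = project_spectrahedron(A)
assert abs(np.trace(P) - 1) < 1e-10          # unit trace
assert np.linalg.eigvalsh(P).min() >= -1e-10  # positive semidefinite
```

Projecting a point that is already feasible returns it unchanged, which is a quick consistency check on the construction.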
Exponential learning methods: Optimizing over sets of positive semidefinite matrices is more challenging than over vector spaces because of the form of the problem constraints. In this line of research, an approach based on matrix exponential learning (MEL) was proposed in [10] to solve the power allocation problem in MIMO multiple access channels. MEL is an optimization algorithm applied to positive definite nonlinear problems and has strong ties to mirror descent methods. MEL makes use of the quantum entropy as the distance generating function. Later, the convergence analysis of MEL was provided in [5] and its robustness w.r.t. uncertainties was shown. In [22], the single-user MIMO throughput maximization problem is addressed, which is an optimization problem, not a Nash game. In the multiple channel case, an optimization problem can be derived that makes the analysis much easier. However, there are some practical problems that cannot be treated as an optimization problem, such as the multi-user MIMO throughput maximization problem discussed earlier. In this regard, [4] proposed an algorithm relying on MEL for solving $N$-player games under feedback errors and presented its convergence to a stable Nash equilibrium under a strong stability assumption. However, in most applications, including the game in Section I-A, the mapping does not satisfy this assumption.
Semidefinite and cone programming: Sparse inverse covariance estimation (SICE) is a procedure that improves the stability of covariance estimation by setting a certain number of coefficients in the inverse covariance to zero. Lu [23] developed two first-order methods, namely the adaptive spectral projected gradient and the adaptive Nesterov's smooth methods, to solve the large-scale covariance estimation problem. In this line of research, a block coordinate descent (BCD) method with a superlinear convergence rate is proposed in [11]. In conic programming with complicated constraints, many first-order methods are combined with duality or penalty strategies [9, 15]. These methods are projection-based and do not scale with the problem size.
Much of the interest in the VI regime has focused on addressing VIs on vector spaces. Moreover, in the literature on semidefinite programming, most of the methods address deterministic semidefinite optimization. Yet, there are many stochastic systems, such as wireless communication systems, that can be modeled as positive semidefinite Nash games. In this paper, we consider SVIs on matrix spaces where the mapping is merely monotone. Our main contributions are as follows:
(i) Developing an averaging matrix stochastic mirror descent (AMSMD) method: We develop an SMD method where we choose the distance generating function to be the quantum entropy, following [24]. It is a first-order method in the sense that only a gradient-type of update is needed at each iteration. The algorithm does not need a projection step at each iteration since it provides a closed-form solution for the projected point. To improve the robustness of the method for solving SVIs, we use an averaging technique. Our work is an improvement over the MEL method [4] and is motivated by the need to weaken the strong stability (monotonicity) requirement on the mapping. The main novelty of our work lies in the convergence analysis in the absence of strong monotonicity, where we introduce an auxiliary sequence and are able to establish convergence to a weak solution of the SVI. Then, we derive a convergence rate in terms of the expected value of a suitably defined gap function. To clarify the distinctions of our contributions, we prepared Table I, where we summarize the differences between the existing methods and our work.
(ii) Implementation results: We present the performance of the proposed AMSMD method applied to the throughput maximization problem in wireless multi-user MIMO networks. Our results indicate the robustness of the AMSMD scheme with respect to problem parameters and uncertainty. Also, it is shown that AMSMD outperforms both the non-averaging MSMD and MEL [4].
The paper is organized as follows. In Section II, we state the assumptions on the problem and outline our AMSMD algorithm. Section III contains the convergence analysis and the rate derived for the AMSMD method. We report some numerical results in Section IV and conclude in Section V.
Notation. Throughout, we let $\mathbb{S}^n$ denote the set of all $n \times n$ symmetric matrices and $\mathbb{S}^n_+$ the cone of all positive semidefinite matrices. We define the spectrahedron $\Delta \triangleq \{X \in \mathbb{S}^n_+ : \operatorname{tr}(X) = 1\}$. The mapping $F$ is called monotone if for any $X, Y \in \mathcal{K}$, we have $\operatorname{tr}\big((F(X) - F(Y))(X - Y)\big) \ge 0$. Let $[A]_{ij}$ denote the elements of a matrix $A$ and $\mathbb{C}$ denote the set of complex numbers. The norm $\|A\|$ denotes the spectral norm of a matrix $A$, i.e., the largest singular value of $A$. The trace norm of a matrix, denoted by $\|A\|_*$, is the sum of the singular values of the matrix. Note that the spectral and trace norms are dual to each other [25]. We use $\mathrm{SOL}(\mathcal{K}, F)$ to denote the set of solutions to VI$(\mathcal{K}, F)$.
II Algorithm outline
In this section, we present the AMSMD scheme for solving (2). Suppose $\omega: \mathcal{D} \to \mathbb{R}$ is a strictly convex and differentiable function, where $\mathcal{K} \subseteq \mathcal{D} \subseteq \mathbb{S}^n$, and let $X, Y \in \mathcal{D}$. Then, the Bregman divergence between $X$ and $Y$ is defined as $D_\omega(X, Y) \triangleq \omega(X) - \omega(Y) - \operatorname{tr}\big(\nabla\omega(Y)(X - Y)\big)$.
In what follows, our choice of $\omega$ is the quantum entropy [26],
$\omega(X) = \operatorname{tr}(X \log X).$ (6)
The Bregman divergence corresponding to the quantum entropy is called the von Neumann divergence and is given by
$D(X, Y) = \operatorname{tr}\big(X \log X - X \log Y - X + Y\big)$ (7)
[24]. In our analysis, we use the following property of the quantum entropy.
Lemma 1
([27]) The quantum entropy is strongly convex with modulus 1 under the trace norm.
Since $\mathcal{K}$ is contained in the set of unit-trace positive semidefinite matrices, the quantum entropy is also strongly convex with modulus 1 under the trace norm over $\mathcal{K}$.
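The quantum entropy and the von Neumann divergence in (6)-(7) are straightforward to compute via an eigendecomposition. The sketch below (ours) also numerically checks a well-known consequence of the strong convexity in Lemma 1, the quantum Pinsker inequality $D(X, Y) \ge \tfrac{1}{2}\|X - Y\|_*^2$ on unit-trace matrices:

```python
import numpy as np

def sym_logm(X):
    # Matrix logarithm of a symmetric positive definite matrix via eigendecomposition.
    w, V = np.linalg.eigh(X)
    return V @ np.diag(np.log(w)) @ V.T

def quantum_entropy(X):
    # omega(X) = tr(X log X), with the convention 0 log 0 = 0.
    w = np.linalg.eigvalsh(X)
    w = w[w > 1e-15]
    return float(np.sum(w * np.log(w)))

def von_neumann_div(X, Y):
    # D(X, Y) = tr(X log X - X log Y - X + Y)
    return quantum_entropy(X) - float(np.trace(X @ sym_logm(Y))) \
        - float(np.trace(X)) + float(np.trace(Y))

def random_density(rng, n):
    A = rng.standard_normal((n, n))
    S = A @ A.T + 1e-6 * np.eye(n)  # strictly positive definite
    return S / np.trace(S)          # normalized to unit trace

rng = np.random.default_rng(3)
for _ in range(20):
    X, Y = random_density(rng, 4), random_density(rng, 4)
    D = von_neumann_div(X, Y)
    tr_norm = np.sum(np.abs(np.linalg.eigvalsh(X - Y)))  # trace norm of X - Y
    # Strong-convexity consequence (quantum Pinsker): D(X, Y) >= 0.5 ||X - Y||_*^2
    assert D >= 0.5 * tr_norm**2 - 1e-8
```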
Next, we address the optimality conditions of a matrix-constrained optimization problem as a VI, which is an extension of Prop. 1.1.8 in [28].
Lemma 2
Let $\mathcal{K} \subseteq \mathbb{S}^n$ be a nonempty closed convex set, and let $f: \mathcal{K} \to \mathbb{R}$ be a differentiable convex function. Consider the optimization problem
$\min_{X \in \mathcal{K}} \; f(X).$ (8)
A matrix $X^*$ is optimal to problem (8) iff $X^* \in \mathcal{K}$ and $\operatorname{tr}\big(\nabla f(X^*)(X - X^*)\big) \ge 0$, for all $X \in \mathcal{K}$.
($\Rightarrow$) Assume $X^*$ is optimal to problem (8). Assume, by contradiction, that there exists some $X \in \mathcal{K}$ such that $\operatorname{tr}\big(\nabla f(X^*)(X - X^*)\big) < 0$. Since $f$ is continuously differentiable, by the first-order Taylor expansion, for all sufficiently small $\epsilon > 0$, we have
$f\big(X^* + \epsilon(X - X^*)\big) = f(X^*) + \epsilon \operatorname{tr}\big(\nabla f(X^*)(X - X^*)\big) + o(\epsilon) < f(X^*),$
following the hypothesis $\operatorname{tr}\big(\nabla f(X^*)(X - X^*)\big) < 0$. Since $\mathcal{K}$ is convex and $X^*, X \in \mathcal{K}$, we have $X^* + \epsilon(X - X^*) \in \mathcal{K}$ with a smaller objective function value than the optimal matrix $X^*$. This is a contradiction. Therefore, we must have $\operatorname{tr}\big(\nabla f(X^*)(X - X^*)\big) \ge 0$ for all $X \in \mathcal{K}$.
($\Leftarrow$) Now suppose that $X^* \in \mathcal{K}$ and $\operatorname{tr}\big(\nabla f(X^*)(X - X^*)\big) \ge 0$ for all $X \in \mathcal{K}$. Since $f$ is convex, we have
$f(X) \ge f(X^*) + \operatorname{tr}\big(\nabla f(X^*)(X - X^*)\big)$
for all $X \in \mathcal{K}$, which implies
$f(X) \ge f(X^*),$
where the last inequality follows by the hypothesis. Since $X^* \in \mathcal{K}$, it follows that $X^*$ is optimal. The next lemma shows a set of sufficient conditions under which a Nash equilibrium can be obtained by solving a VI.
Lemma 3
[Nash equilibrium] Let $\mathcal{K}_i$ be a nonempty closed convex set and $f_i$ be a differentiable convex function in $X_i$ for all $i \in \{1, \dots, N\}$, where $X \triangleq (X_1, \dots, X_N)$ and $X_i \in \mathcal{K}_i$. Then, $X^*$ is a Nash equilibrium (NE) to game (3) if and only if $X^*$ solves VI$(\mathcal{K}, F)$, where
$F(X) \triangleq \big(\nabla_{X_1} f_1(X), \dots, \nabla_{X_N} f_N(X)\big),$ (9)
$\mathcal{K} \triangleq \mathcal{K}_1 \times \dots \times \mathcal{K}_N.$ (10)
First, suppose $X^*$ is an NE to game (3). We want to prove that $X^*$ solves VI$(\mathcal{K}, F)$, i.e., $\operatorname{tr}\big(F(X^*)(X - X^*)\big) \ge 0$ for all $X \in \mathcal{K}$. By the optimality conditions of optimization problem (3) and from Lemma 2, we know $X^*$ is an NE if and only if $\operatorname{tr}\big(\nabla_{X_i} f_i(X^*)(X_i - X_i^*)\big) \ge 0$ for all $X_i \in \mathcal{K}_i$ and all $i$. Then, summing over $i$, we obtain for all $X \in \mathcal{K}$
$\sum_{i=1}^{N} \operatorname{tr}\big(\nabla_{X_i} f_i(X^*)(X_i - X_i^*)\big) \ge 0.$ (11)
Invoking the definition of the mapping $F$ given by (9) and from (11), we have $\operatorname{tr}\big(F(X^*)(X - X^*)\big) \ge 0$ for all $X \in \mathcal{K}$. From the definition of VI$(\mathcal{K}, F)$ and relation (1), we conclude that $X^*$ solves VI$(\mathcal{K}, F)$. Conversely, suppose $X^*$ solves VI$(\mathcal{K}, F)$. Then, $\operatorname{tr}\big(F(X^*)(X - X^*)\big) \ge 0$ for all $X \in \mathcal{K}$. Consider a fixed $i$ and a matrix $X \in \mathcal{K}$ given by (10) such that the only difference between $X$ and $X^*$ is in the $i$th block, i.e., $X = (X_1^*, \dots, X_{i-1}^*, X_i, X_{i+1}^*, \dots, X_N^*)$,
where $X_i$ is an arbitrary matrix in $\mathcal{K}_i$. Then, we have
$\operatorname{tr}\big(F(X^*)(X - X^*)\big) = \operatorname{tr}\big(\nabla_{X_i} f_i(X^*)(X_i - X_i^*)\big).$ (12)
Therefore, substituting the term (12), we obtain $\operatorname{tr}\big(\nabla_{X_i} f_i(X^*)(X_i - X_i^*)\big) \ge 0$ for all $X_i \in \mathcal{K}_i$.
Since $i$ was chosen arbitrarily, this holds for any $i \in \{1, \dots, N\}$. Hence, by applying Lemma 2, we conclude that $X^*$ is a Nash equilibrium to game (3). Algorithm 1 presents the outline of the AMSMD method. At each iteration $k$, first, using an oracle, a realization of the stochastic mapping at $X_k$, denoted by $\Phi(X_k, \xi_k)$, is generated. Next, a matrix $Y_{k+1}$ is updated using (14). Here, $\{\gamma_k\}$ is a nonincreasing stepsize sequence. Then, $Y_{k+1}$ is projected onto the set $\mathcal{K}$ using the closed-form solution (15). Then, the averaged sequence is generated using the relations in (16). Next, we state the main assumptions. Let us define the stochastic error at iteration $k$ as
$w_k \triangleq \Phi(X_k, \xi_k) - F(X_k).$ (13)
Let $\mathcal{F}_k$ denote the history of the algorithm up to time $k$, i.e., $\mathcal{F}_k \triangleq \{X_0, \xi_0, \dots, \xi_{k-1}\}$ for $k \ge 1$ and $\mathcal{F}_0 \triangleq \{X_0\}$.
Assumption 1
Let the following hold:

The mapping $F$ is monotone and continuous over the set $\mathcal{K}$.

The stochastic mapping $\Phi$ has a finite mean squared error, i.e., there exists some $C > 0$ such that $\mathbb{E}\big[\|\Phi(X, \xi)\|^2\big] \le C^2$ for all $X \in \mathcal{K}$. (Under this assumption, the mean squared error of the stochastic noise $w_k$ is bounded.)

The stochastic noise has a zero mean, i.e., $\mathbb{E}[w_k \mid \mathcal{F}_k] = 0$ for all $k \ge 0$.
$Y_{k+1} = Y_k - \gamma_k \Phi(X_k, \xi_k),$ (14)
$X_{k+1} = \frac{\exp(Y_{k+1})}{\operatorname{tr}\big(\exp(Y_{k+1})\big)},$ (15)
$\Gamma_{k+1} = \Gamma_k + \gamma_k, \qquad \bar X_{k+1} = \frac{\Gamma_k \bar X_k + \gamma_k X_k}{\Gamma_{k+1}}.$ (16)
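A minimal numerical sketch of a mirror-descent iteration of this form with $\gamma$-weighted averaging is given below. This is our own illustration, not the paper's experiment: we assume the unit-trace spectrahedron as the feasible set (so the mirror projection has the closed form $\exp(Y)/\operatorname{tr}\exp(Y)$), and the monotone map $F(X) = X - B$, the noise level, and the stepsizes are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 4, 1000

B = rng.standard_normal((n, n)); B = (B + B.T) / 2
def stoch_grad(X):
    # Phi(X, xi) = X - B + noise: a noisy sample of the monotone map F(X) = X - B.
    noise = 0.01 * rng.standard_normal((n, n))
    return X - B + (noise + noise.T) / 2

def mirror_step(Y):
    # Closed-form projection onto {X >= 0, tr X = 1}: X = exp(Y) / tr(exp(Y)).
    w, V = np.linalg.eigh(Y)
    e = np.exp(w - w.max())          # shift for numerical stability
    return V @ np.diag(e / e.sum()) @ V.T

Y = np.zeros((n, n))
X = mirror_step(Y)                   # X_0 = I/n
X0 = X.copy()
num, den = np.zeros((n, n)), 0.0
for k in range(T):
    gamma = 0.2 / np.sqrt(k + 1)     # nonincreasing stepsizes
    num += gamma * X; den += gamma   # gamma-weighted averaging of the iterates
    Y -= gamma * stoch_grad(X)
    X = mirror_step(Y)
X_bar = num / den

f = lambda X: 0.5 * np.linalg.norm(X - B) ** 2
assert abs(np.trace(X_bar) - 1) < 1e-8   # averaged iterate stays feasible
assert f(X_bar) < f(X0)                  # progress on the underlying convex objective
```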
III Convergence analysis
In this section, our interest lies in analyzing the convergence and deriving a rate statement for the sequence generated by the AMSMD method. Note that a solution of VI$(\mathcal{K}, F)$ is also called a strong solution. Next, we define a weak solution, which is considered to be a counterpart of the strong solution.
Definition 1
(Weak solution) A matrix $\bar X \in \mathcal{K}$ is called a weak solution to VI$(\mathcal{K}, F)$ if it satisfies $\operatorname{tr}\big(F(X)(X - \bar X)\big) \ge 0$, for all $X \in \mathcal{K}$.
Let us denote by $\mathcal{K}^w$ and $\mathcal{K}^s$ the sets of weak and strong solutions to VI$(\mathcal{K}, F)$, respectively.
Remark 1
Under Assumption 1(a), when the mapping $F$ is monotone, any strong solution of problem (2) is a weak solution. Provided that $F$ is also continuous, the converse is also true and a weak solution is a strong solution. Moreover, for a monotone mapping on a convex compact set, e.g., $\mathcal{K}$, a weak solution always exists [8].
Unlike optimization problems, where the objective function provides a metric for distinguishing solutions, there is no immediate analog in VI problems. However, we use the following residual function associated with a VI problem.
Definition 2
(G function) Define the function $\mathrm{G}: \mathcal{K} \to \mathbb{R}$ as $\mathrm{G}(\bar X) \triangleq \sup_{X \in \mathcal{K}} \operatorname{tr}\big(F(X)(\bar X - X)\big)$.
The next lemma provides some properties of the G function.
Lemma 4
For an arbitrary $\bar X \in \mathcal{K}$, we have
$\mathrm{G}(\bar X) \ge \operatorname{tr}\big(F(X)(\bar X - X)\big)$
for all $X \in \mathcal{K}$. For $X = \bar X$, the above inequality suggests that $\mathrm{G}(\bar X) \ge 0$, implying that the G function is nonnegative for all $\bar X \in \mathcal{K}$.
Assume $\bar X$ is a weak solution. By Definition 1, $\operatorname{tr}\big(F(X)(X - \bar X)\big) \ge 0$ for all $X \in \mathcal{K}$, which implies
$\mathrm{G}(\bar X) = \sup_{X \in \mathcal{K}} \operatorname{tr}\big(F(X)(\bar X - X)\big) \le 0.$
On the other hand, from Lemma 4, we get $\mathrm{G}(\bar X) \ge 0$. We conclude that $\mathrm{G}(\bar X) = 0$ for any weak solution $\bar X$.
Conversely, assume that there exists an $\bar X \in \mathcal{K}$ such that $\mathrm{G}(\bar X) = 0$. Therefore, $\sup_{X \in \mathcal{K}} \operatorname{tr}\big(F(X)(\bar X - X)\big) = 0$, which implies $\operatorname{tr}\big(F(X)(X - \bar X)\big) \ge 0$ for all $X \in \mathcal{K}$, implying $\bar X$ is a weak solution.
Remark 2
Assume the stepsize sequence $\{\gamma_k\}$ is nonincreasing and the averaged sequence $\{\bar X_k\}$ is given by the recursive rules in (16). Then, using induction, it can be shown that $\bar X_k \in \mathcal{K}$ for any $k$.
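The induction in Remark 2 can be checked numerically. Under one natural reading of (16) as a $\gamma$-weighted running average (an assumption on our part), the recursion reproduces $\sum_t \gamma_t X_t / \sum_t \gamma_t$, which is a convex combination of the iterates and hence stays in the convex feasible set:

```python
import numpy as np

rng = np.random.default_rng(6)
n, T = 3, 50
gammas = 0.5 / np.sqrt(np.arange(1, T + 1))   # nonincreasing stepsizes
Xs = [rng.standard_normal((n, n)) for _ in range(T)]

# Recursive gamma-weighted averaging.
Gamma, X_bar = 0.0, np.zeros((n, n))
for g, X in zip(gammas, Xs):
    X_bar = (Gamma * X_bar + g * X) / (Gamma + g)
    Gamma += g

# Direct weighted average for comparison.
direct = sum(g * X for g, X in zip(gammas, Xs)) / gammas.sum()
assert np.allclose(X_bar, direct)
```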
Next, we derive the conjugate of the quantum entropy and its gradient.
Proposition 1
Let $Y \in \mathbb{S}^n$ and let $\omega$ be defined as in (6). Then, we have
$\omega^*(Y) = \log \operatorname{tr}\big(\exp(Y)\big),$ (17)
$\nabla \omega^*(Y) = \frac{\exp(Y)}{\operatorname{tr}\big(\exp(Y)\big)}.$ (18)
The quantum entropy $\omega$ is a lower semicontinuous convex function on the linear space of all symmetric matrices. The conjugate of the function $\omega$ can be defined as
$\omega^*(Y) \triangleq \sup_{X \in \mathcal{K}} \big\{\operatorname{tr}(XY) - \omega(X)\big\}.$
The maximizer of the above problem is $X = \exp(Y)/\operatorname{tr}\big(\exp(Y)\big)$, which is called the Gibbs state (see [29], Example 3.29). We observe that the Gibbs state is a positive semidefinite matrix with trace equal to one, implying that it is feasible. By plugging it into the above problem, we obtain (17). The relation (18) follows by standard matrix analysis and the fact that $\nabla_Y \operatorname{tr}\big(\exp(Y)\big) = \exp(Y)$ [30]. Throughout, we use the notion of Fenchel coupling [31]:
$\mathcal{F}(X, Y) \triangleq \omega(X) + \omega^*(Y) - \operatorname{tr}(XY),$ (19)
which provides a proximity measure between $X$ and $\nabla\omega^*(Y)$ and is equal to the associated Bregman divergence between $X$ and $\nabla\omega^*(Y)$. We also make use of the following lemma, which is proved in the Appendix.
Lemma 5
([4]) For all matrices $X \in \mathcal{K}$ and for all $Y, Y' \in \mathbb{S}^n$, the following holds
$\mathcal{F}(X, Y') \le \mathcal{F}(X, Y) + \operatorname{tr}\big((Y' - Y)(\nabla\omega^*(Y) - X)\big) + \tfrac{1}{2}\|Y' - Y\|^2.$ (20)
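As a quick numerical sanity check of Proposition 1 (ours, not from the paper), the Gibbs state $\exp(Y)/\operatorname{tr}\exp(Y)$ attains the supremum defining $\omega^*$ over unit-trace positive semidefinite matrices, and the optimal value equals $\log \operatorname{tr}\exp(Y)$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
Y = rng.standard_normal((n, n)); Y = (Y + Y.T) / 2

w, V = np.linalg.eigh(Y)
gibbs = V @ np.diag(np.exp(w) / np.exp(w).sum()) @ V.T  # exp(Y) / tr(exp(Y))

def objective(X):
    # tr(XY) - tr(X log X): the function whose sup defines omega*(Y).
    lam = np.linalg.eigvalsh(X)
    lam = lam[lam > 1e-15]
    return float(np.trace(X @ Y)) - float(np.sum(lam * np.log(lam)))

opt = objective(gibbs)
assert abs(opt - np.log(np.exp(w).sum())) < 1e-8  # omega*(Y) = log tr exp(Y)

# The Gibbs state should dominate random unit-trace PSD points.
for _ in range(50):
    A = rng.standard_normal((n, n)); S = A @ A.T
    X = S / np.trace(S)
    assert objective(X) <= opt + 1e-8
```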
Next, we develop an error bound for the G function. For simplicity of notation, we use $\Phi_k$ to denote $\Phi(X_k, \xi_k)$.
Lemma 6
From the definition of $w_k$ in relation (13), the recursion in the AMSMD algorithm can be stated as
$Y_{k+1} = Y_k - \gamma_k\big(F(X_k) + w_k\big).$ (22)
Consider (20). From Algorithm 1 and (18), we have $X_k = \nabla\omega^*(Y_k)$. Let $Y = Y_k$ and $Y' = Y_{k+1}$. From (22), we obtain
By adding and subtracting , we get
(23) 
where we used the monotonicity of the mapping $F$. Let us define an auxiliary sequence such that , where and define . From (23), invoking the definition of and by adding and subtracting , we obtain
(24) 
Then, we estimate the term . By Lemma 5 and setting and , we get
By plugging the above inequality into (24), we get
By summing the above inequality from $k = 0$ to $T - 1$ and rearranging the terms, we have
(25) 
where the last inequality holds by [4]. By choosing and recalling that for , and [32], from (6), (17) and (19),
Plugging the above inequality into (25) yields
(26) 
Let us define and . We divide both sides of (26) by . Then, for all ,
The set $\mathcal{K}$ is a convex set. Since and , . Now, we take the supremum over the set $\mathcal{K}$ with respect to $X$ and use the definition of the G function. Note that the right-hand side of the above inequality is independent of $X$.
By taking expectations on both sides, we get
By definition, both and are measurable. Therefore, is measurable. In addition, is measurable. Thus, by Assumption 1(c), we have . Applying Assumption 1(b),
Next, we present the convergence rate of the AMSMD scheme.