A First-order Method for Monotone Stochastic Variational Inequalities on Semidefinite Matrix Spaces
Motivated by multi-user optimization problems and non-cooperative Nash games in stochastic regimes, we consider stochastic variational inequality (SVI) problems on matrix spaces where the variables are positive semidefinite matrices and the mapping is merely monotone. Much of the interest in the theory of variational inequality (VI) has focused on addressing VIs on vector spaces. Yet, most existing methods either rely on strong assumptions, or require a two-loop framework where at each iteration, a projection problem, i.e., a semidefinite optimization problem needs to be solved. Motivated by this gap, we develop a stochastic mirror descent method where we choose the distance generating function to be defined as the quantum entropy. This method is a single-loop first-order method in the sense that it only requires a gradient-type of update at each iteration. The novelty of this work lies in the convergence analysis that is carried out through employing an auxiliary sequence of stochastic matrices. Our contribution is three-fold: (i) under this setting and employing averaging techniques, we show that the iterate generated by the algorithm converges to a weak solution of the SVI; (ii) moreover, we derive a convergence rate in terms of the expected value of a suitably defined gap function; (iii) we implement the developed method for solving a multiple-input multiple-output multi-cell cellular wireless network composed of seven hexagonal cells and present the numerical experiments supporting the convergence of the proposed method.
Variational inequality problems, first introduced in the 1960s, have a wide range of applications arising in engineering, finance, and economics and are strongly tied to game theory. VI theory provides a tool to formulate different equilibrium problems and to analyze them in terms of existence and uniqueness of solutions, stability, and sensitivity. In mathematical programming, VIs encompass problems such as systems of nonlinear equations, optimization problems, and complementarity problems, to name a few. In this paper, we consider stochastic variational inequality problems where the variable is a positive semidefinite matrix. Given a set , and a mapping , a VI problem denoted by VI seeks a positive semidefinite matrix such that
In particular, we study VI() where , i.e., the mapping is the expected value of a stochastic mapping where the vector is a random vector associated with a probability space represented by . Here, denotes the sample space, denotes a -algebra on , and is the associated probability measure. Therefore, solves VI() if
Throughout, we assume that is well-defined (i.e., the expectation is finite).
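For reference, the two problem statements above can be written out as follows. The display equations did not survive in this version of the text, so the symbols used here are assumed notation: $\mathcal{X}$ for the feasible set, $F$ for the mapping, $\Phi$ for the stochastic mapping, and $\xi$ for the random vector. The pairing is the standard trace inner product on matrix spaces.

```latex
% Assumed notation; a reconstruction of the standard form, not a verbatim copy.
\text{VI}(\mathcal{X},F):\quad
\text{find } X^* \in \mathcal{X} \text{ such that }
\operatorname{tr}\!\big((X - X^*)^T F(X^*)\big) \ge 0
\quad \text{for all } X \in \mathcal{X}. \tag{1}

% Stochastic case: the mapping is an expectation.
F(X) = \mathbb{E}\big[\Phi(X,\xi)\big],
\qquad
\operatorname{tr}\!\big((X - X^*)^T\, \mathbb{E}[\Phi(X^*,\xi)]\big) \ge 0
\quad \text{for all } X \in \mathcal{X}. \tag{2}
```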
I-A Motivating Example
A non-cooperative game involves a number of decision makers called players who have conflicting interests and each tries to minimize/maximize his own payoff/utility function. Assume there are players each controlling a positive semidefinite matrix variable which belongs to the set of all possible actions of the player denoted by . Let us define as the feasible actions of other players. Let the payoff function of player be quantified by . Then, each player needs to solve the following semidefinite optimization problem
A solution to this game called a Nash equilibrium is a feasible strategy profile such that ,
for all , . As we discuss in Lemma 3, the optimality conditions of the above Nash game can be formulated as a VI where and .
Problem (3) has a wide range of applications in wireless communications and information theory. Here we discuss a communication network example.
Wireless Communication Networks: A wireless network is founded on transmitters that generate radio waves and receivers that detect radio waves. To enhance the performance of the wireless transmission system, multiple antennas can be used to transmit and receive the radio signals. This system is called multiple-input multiple-output (MIMO) which provides high spectral efficiency in single-user wireless links without interference . Other MIMO systems include MIMO broadcast channels and MIMO multiple access channels, where there are multiple users (players) that mutually interfere. In these systems players either share the same transmitter or the same receiver. Recently, there has been much interest in MIMO systems under uncertainty when the state channel information is subject to measurement errors, delays or other imperfections . Here, we consider the throughput maximization problem in multi-user MIMO networks under feedback errors and uncertainty. In this problem, we have MIMO links where each link represents a pair of transmitter-receiver with antennas at the transmitter and antennas at the receiver. We assume each of these links is a player of the game. Let and denote the signal transmitted from and received by the th link, respectively. The signal model can be described by , where is the direct-channel matrix of link , is the cross-channel matrix between transmitter and receiver , and is a zero-mean circularly symmetric complex Gaussian noise vector with the covariance matrix . The action for each player is the transmit power, meaning that each transmitter wants to transmit at its maximal power level to improve its performance. However, doing so increases the overall interference in the system, which in turn, adversely impacts the performance of all involved transmitters and presents a conflict. It should be noted that we treat the interference generated by other users as an additive noise. 
Therefore, represents the multi-user interference (MUI) received by th player and generated by other players. Assuming the complex random vector follows a Gaussian distribution, transmitter controls its input signal covariance matrix subject to two constraints: first, the signal covariance matrix is positive semidefinite, and second, each transmitter’s maximum transmit power is bounded by a positive scalar . Under these assumptions, each player’s achievable transmission throughput for a given set of players’ covariance matrices is given by
where is the MUI-plus-noise covariance matrix at receiver . The goal is to solve
for all , where , .
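The throughput expression can be illustrated numerically. The sketch below assumes the standard log-determinant form of the achievable rate, namely log det(W + H Q H†) minus log det(W), where W is the MUI-plus-noise covariance. The indexing convention `H[j][i]` (channel from transmitter j to receiver i), the scalar noise covariance `noise_var * I`, and all function names are assumptions made for illustration.

```python
import numpy as np

def mui_plus_noise(i, H, Q, noise_var=1.0):
    """MUI-plus-noise covariance at receiver i (assumed noise covariance: noise_var * I)."""
    m = H[i][i].shape[0]
    W = noise_var * np.eye(m, dtype=complex)
    for j in range(len(Q)):
        if j != i:
            # Interference from transmitter j, treated as additive noise.
            W = W + H[j][i] @ Q[j] @ H[j][i].conj().T
    return W

def throughput(i, H, Q, noise_var=1.0):
    """Achievable rate of link i: log det(W + H Q H^dagger) - log det(W)."""
    W = mui_plus_noise(i, H, Q, noise_var)
    S = W + H[i][i] @ Q[i] @ H[i][i].conj().T
    return np.linalg.slogdet(S)[1] - np.linalg.slogdet(W)[1]

# Two links with random complex channels and uniform power allocation.
rng = np.random.default_rng(0)
N, n_tx, n_rx, p_max = 2, 3, 2, 1.0
H = [[rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))
      for _ in range(N)] for _ in range(N)]
Q = [p_max / n_tx * np.eye(n_tx, dtype=complex) for _ in range(N)]

rate_with_mui = throughput(0, H, Q)
# Silence the second transmitter to remove the interference seen by link 0.
Q_silent = [Q[0], np.zeros((n_tx, n_tx), dtype=complex)]
rate_without_mui = throughput(0, H, Q_silent)
```

Comparing `rate_with_mui` with `rate_without_mui` shows the conflict described above: any power used by the other transmitter can only lower the rate of link 0.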
I-B Existing methods
Our primary interest in this paper lies in solving SVIs on semidefinite matrix spaces. Computing the solution to this class of problems is challenging mainly due to the presence of uncertainty and the semidefinite solution space. In what follows, we review some of the methods in addressing these challenges. More details are presented in Table I.
| Reference | Problem | Setting | Assumptions | Space | Scheme | Single-loop | Rate |
|---|---|---|---|---|---|---|---|
| Jiang and Xu | VI | Stochastic | SM, S | Vector | SA | ✗ | |
| Juditsky et al. | VI | Stochastic | MM, S/NS | Vector | Extragradient SMP | ✗ | |
| Lan et al. | Opt | Deterministic | C, S/NS | Matrix | Primal-dual Nesterov’s methods | ✗ | |
| Mertikopoulos et al. | Opt | Stochastic | C, S | Matrix | Exponential Learning | ✓ | |
| Hsieh et al. | Opt | Deterministic | NS, C | Matrix | BCD | ✗ | superlinear |
| Koshal et al. | VI | Stochastic | MM, S | Vector | Regularized Iterative SA | ✗ | |
| Yousefian et al. | VI | Stochastic | PM, S | Vector | Averaging B-SMP | ✗ | |
| Yousefian et al. | VI | Stochastic | MM, NS | Vector | Regularized Smooth SA | ✗ | |
| Mertikopoulos et al. | VI | Stochastic | SM, S | Matrix | Exponential Learning | ✓ | |
| Necoara et al. | Opt | Deterministic | C, S/NS | Vector | Inexact Lagrangian | ✗ | |
| Our work | VI | Stochastic | MM, NS | Matrix | AM-SMD | ✓ | |
Stochastic approximation (SA) schemes: The SA method was first developed in  and has been very successful in solving optimization and equilibrium problems with uncertainties. Jiang and Xu  appear to be among the first to apply SA methods to SVIs. In recent years, prox generalizations of SA methods were developed for solving stochastic optimization problems [17, 18] and VIs. The monotonicity of the gradient mapping plays an important role in the convergence analysis of this class of solution methods. The extragradient method, which relies on weaker assumptions (i.e., pseudo-monotone mappings) to address VIs, was developed in , but this method requires two projections per iteration. Dang and Lan  developed a non-Euclidean extragradient method to address generalized monotone VIs. The prox generalization of the extragradient schemes to stochastic settings was developed in . Averaging techniques, first introduced in , proved successful in increasing the robustness of the SA method. In vector spaces equipped with non-Euclidean norms, Nemirovski et al.  developed the stochastic mirror descent (SMD) method for solving nonsmooth stochastic optimization problems. While SA schemes and their prox generalizations can be applied directly to solve problems with semidefinite constraints, they result in a two-loop framework and require projection onto a semidefinite cone by solving an optimization problem at each iteration, which increases the computational complexity.
Exponential learning methods: Optimizing over sets of positive semidefinite matrices is more challenging than optimizing over vector spaces because of the form of the problem constraints. In this line of research, an approach based on matrix exponential learning (MEL) is proposed in  to solve the power allocation problem in MIMO multiple access channels. MEL is an optimization algorithm applied to positive definite nonlinear problems and has strong ties to mirror descent methods. MEL uses the quantum entropy as the distance generating function. Later, the convergence analysis of MEL is provided in  and its robustness with respect to uncertainties is shown. In , the single-user MIMO throughput maximization problem is addressed, which is an optimization problem and not a Nash game. In the multiple channel case, an optimization problem can be derived that makes the analysis much easier. However, some practical problems, such as the multi-user MIMO throughput maximization discussed earlier, cannot be treated as an optimization problem. In this regard,  proposed an algorithm relying on MEL for solving N-player games under feedback errors and presented its convergence to a stable Nash equilibrium under a strong stability assumption. However, in most applications, including the game (I-A), the mapping does not satisfy this assumption.
Semidefinite and cone programming: Sparse inverse covariance estimation (SICE) is a procedure which improves the stability of covariance estimation by setting a certain number of coefficients in the inverse covariance to zero. Lu  developed two first-order methods including the adaptive spectral projected gradient and the adaptive Nesterov’s smooth methods to solve the large scale covariance estimation problem. In this line of research, a block coordinate descent (BCD) method with a superlinear convergence rate is proposed in . In conic programming with complicated constraints, many first-order methods are combined with duality or penalty strategies [9, 15]. These methods are projection based and do not scale with the problem size.
Much of the interest in the VI regime has focused on addressing VIs on vector spaces. Moreover, in the literature of semidefinite programming, most of the methods address deterministic semidefinite optimization. Yet, there are many stochastic systems, such as wireless communication systems, that can be modeled as positive semidefinite Nash games. In this paper, we consider SVIs on matrix spaces where the mapping is merely monotone. Our main contributions are as follows:
(i) Developing an averaging matrix stochastic mirror descent (AM-SMD) method: We develop an SMD method where we choose the distance generating function to be defined as the quantum entropy following . It is a first-order method in the sense that only a gradient-type update is needed at each iteration. The algorithm does not need to solve a projection subproblem at each iteration since the projected point is available in closed form. To improve the robustness of the method for solving SVIs, we use the averaging technique. Our work improves on the MEL method  and is motivated by the need to weaken the strong stability (monotonicity) requirement on the mapping. The main novelty of our work lies in the convergence analysis in the absence of strong monotonicity, where we introduce an auxiliary sequence and establish convergence to a weak solution of the SVI. Then, we derive a convergence rate of in terms of the expected value of a suitably defined gap function. To clarify the distinctions of our contributions, we prepared Table I where we summarize the differences between the existing methods and our work.
(ii) Implementation results: We present the performance of the proposed AM-SMD method applied to the throughput maximization problem in wireless multi-user MIMO networks. Our results indicate the robustness of the AM-SMD scheme with respect to problem parameters and uncertainty. They also show that AM-SMD outperforms both non-averaging M-SMD and MEL .
The paper is organized as follows. In Section II, we state the assumptions on the problem and outline our AM-SMD algorithm. Section III contains the convergence analysis and the rate derived for the AM-SMD method. We report some numerical results in Section IV and conclude in Section V.
Notation. Throughout, we let denote the set of all symmetric matrices and the cone of all positive semidefinite matrices. We define . The mapping is called monotone if for any , we have . Let denote the elements of matrix and denote the set of complex numbers. The norm denotes the spectral norm of a matrix being the largest singular value of . The trace norm of a matrix denoted by is the sum of singular values of the matrix. Note that spectral and trace norms are dual to each other . We use to denote the set of solutions to VI().
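The duality between the spectral and trace norms can be checked numerically. The sketch below uses NumPy's `np.linalg.norm` with `ord=2` (largest singular value) and `ord="nuc"` (sum of singular values); the helper names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def spectral_norm(A):
    """Largest singular value of A."""
    return np.linalg.norm(A, 2)

def trace_norm(A):
    """Sum of the singular values of A (nuclear norm)."""
    return np.linalg.norm(A, "nuc")

def holder_gap(A, B):
    """||A||_2 * ||B||_tr - |tr(A^T B)|, which is nonnegative by norm duality."""
    return spectral_norm(A) * trace_norm(B) - abs(np.trace(A.T @ B))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                      # symmetric test matrix
B = rng.standard_normal((4, 4))
B = (B + B.T) / 2                      # symmetric test matrix
P = B @ B.T                            # positive semidefinite test matrix

gap = holder_gap(A, B)
psd_trace_norm_error = abs(trace_norm(P) - np.trace(P))
```

For a positive semidefinite matrix the singular values coincide with the eigenvalues, so the trace norm reduces to the trace, which is what `psd_trace_norm_error` measures.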
II Algorithm outline
In this section, we present the AM-SMD scheme for solving (2). Suppose is a strictly convex and differentiable function, where , and let . Then, Bregman divergence between and is defined as
In what follows, our choice of is the quantum entropy ,
The Bregman divergence corresponding to the quantum entropy is called von Neumann divergence and is given by
. In our analysis, we use the following property of .
() The quantum entropy is strongly convex with modulus 1 under the trace norm.
Since the trace norm of a matrix is never smaller than its spectral norm, the quantum entropy is also strongly convex with modulus 1 under the spectral norm.
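The strong convexity property can be sanity-checked numerically. The sketch below assumes the standard form of the von Neumann divergence, tr(X log X - X log Z - X + Z), and restricts attention to positive definite matrices with unit trace so that the matrix logarithm is well defined. Strong convexity with modulus 1 under the trace norm then reads D(X, Z) ≥ ½‖X − Z‖²_tr. All function names are hypothetical.

```python
import numpy as np

def random_density_matrix(n, rng):
    """Random positive definite matrix with unit trace."""
    A = rng.standard_normal((n, n))
    X = A @ A.T + 1e-3 * np.eye(n)
    return X / np.trace(X)

def sym_log(X):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def von_neumann_divergence(X, Z):
    """Bregman divergence of the quantum entropy: tr(X log X - X log Z - X + Z)."""
    return np.trace(X @ (sym_log(X) - sym_log(Z))) - np.trace(X) + np.trace(Z)

def trace_norm(A):
    """Sum of singular values."""
    return np.linalg.norm(A, "nuc")

rng = np.random.default_rng(1)
X = random_density_matrix(5, rng)
Z = random_density_matrix(5, rng)

divergence = von_neumann_divergence(X, Z)
lower_bound = 0.5 * trace_norm(X - Z) ** 2
```

On this pair of matrices `divergence` should dominate `lower_bound`; a single random draw is only a spot check, not a proof of the lemma.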
Next, we address the optimality conditions of a matrix constrained optimization problem as a VI which is an extension of Prop. 1.1.8 in .
Let be a nonempty closed convex set, and let be a differentiable convex function. Consider the optimization problem
A matrix is optimal to problem (8) iff and , for all .
() Assume is optimal to problem (8). Assume by contradiction, there exists some such that . Since is continuously differentiable, by the first-order Taylor expansion, for all sufficiently small , we have
following the hypothesis . Since is convex and , we have with smaller objective function value than the optimal matrix . This is a contradiction. Therefore, we must have for all .
() Now suppose that and for all , . Since is convex , we have
for all which implies
where the last inequality follows by the hypothesis. Since , it follows that is optimal.
The next lemma shows a set of sufficient conditions under which a Nash equilibrium can be obtained by solving a VI.
[Nash equilibrium] Let be a nonempty closed convex set and be a differentiable convex function in for all , where and . Then, is a Nash equilibrium (NE) to game (3) if and only if solves VI(), where
First, suppose is an NE to game (3). We want to prove that solves VI(), i.e., , for all . By optimality conditions of optimization problem (3) and from Lemma 2, we know is an NE if and only if for all and all . Then, we obtain for all
Invoking the definition of mapping given by (9) and from (II), we have From the definition of VI() and relation (1), we conclude that . Conversely, suppose . Then, . Consider a fixed and a matrix given by (10) such that the only difference between and is in -th block, i.e.
where is an arbitrary matrix in . Then, we have
Therefore, substituting by term (12), we obtain
Since was chosen arbitrarily, for any . Hence, by applying Lemma 2 we conclude that is a Nash equilibrium to game (3).
Algorithm 1 presents the outline of the AM-SMD method. At each iteration , first, using an oracle, a realization of the stochastic mapping is generated at , denoted by . Next, a matrix is updated using (14). Here is a non-increasing step-size sequence. Then, will be projected onto set using the closed-form solution (15). Then the averaged sequence is generated using relations (16).
Next, we state the main assumptions. Let us define the stochastic error at iteration as
Let denote the history of the algorithm up to time , i.e., for and .
Let the following hold:
The mapping is monotone and continuous over the set .
The stochastic mapping has a finite mean squared error, i.e., there exists some such that . (Under this assumption, the mean squared error of the stochastic noise is bounded.)
The stochastic noise has a zero mean, i.e., for all .
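The AM-SMD iteration described before the assumptions can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Algorithm 1 verbatim: the feasible set is taken to be the set of positive semidefinite matrices with unit trace, the closed-form step (15) is assumed to be the normalized matrix exponential, the step size is assumed to be eta0 / sqrt(t + 1), the averaging recursion is an assumed form of (16), and `noisy_linear_oracle` is a hypothetical test oracle for the constant (hence monotone) mapping F(X) = C with zero-mean symmetric noise.

```python
import numpy as np

def gibbs_state(Y):
    """Normalized matrix exponential exp(Y) / tr(exp(Y)) of a symmetric matrix Y."""
    w, V = np.linalg.eigh(Y)
    p = np.exp(w - w.max())            # shift eigenvalues for numerical stability
    return (V * (p / p.sum())) @ V.T

def noisy_linear_oracle(C, sigma):
    """Hypothetical oracle: returns C plus zero-mean symmetric Gaussian noise."""
    def oracle(X, rng):
        G = rng.standard_normal(C.shape)
        return C + sigma * (G + G.T) / 2
    return oracle

def step_size(t, eta0):
    """Assumed non-increasing step-size rule."""
    return eta0 / np.sqrt(t + 1)

def am_smd(oracle, n, steps, eta0=1.0, seed=0):
    """Averaged matrix stochastic mirror descent on the unit-trace PSD set (sketch)."""
    rng = np.random.default_rng(seed)
    Y = np.zeros((n, n))
    X = gibbs_state(Y)                 # starts at the scaled identity I / n
    X_bar, gamma = X.copy(), step_size(0, eta0)
    for t in range(steps):
        Y = Y - step_size(t, eta0) * oracle(X, rng)    # gradient-type update
        X = gibbs_state(Y)                             # closed-form "projection"
        eta_next = step_size(t + 1, eta0)
        X_bar = (gamma * X_bar + eta_next * X) / (gamma + eta_next)  # weighted averaging
        gamma = gamma + eta_next
    return X_bar

# For F(X) = C the VI solution minimizes tr(C X), so the optimal value is the
# smallest eigenvalue of C (here 1.0).
C = np.diag([1.0, 2.0, 3.0])
X_bar = am_smd(noisy_linear_oracle(C, sigma=0.1), n=3, steps=2000)
objective = np.trace(C @ X_bar)
```

Each iteration costs one eigendecomposition and no inner optimization loop, which is the single-loop property emphasized in the introduction. The averaged iterate stays feasible because it is a convex combination of feasible points.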
III Convergence analysis
In this section, our interest lies in analyzing the convergence and deriving a rate statement for the sequence generated by the AM-SMD method. Note that a solution of VI() is also called a strong solution. Next, we define a weak solution which is considered to be a counterpart of the strong solution.
(Weak solution) The matrix is called a weak solution to VI() if it satisfies , for all
Let us denote and the set of weak solutions and strong solutions to VI(), respectively.
Under Assumption 1(a), when the mapping is monotone, any strong solution of problem (2) is a weak solution, i.e., . Provided that is also continuous, the converse is also true and a weak solution is a strong solution. Moreover, for a monotone mapping on a convex compact set e.g., , a weak solution always exists .
Unlike optimization problems where the function provides a metric for distinguishing solutions, there is no immediate analog in VI problems. However, we use the following residual function associated with a VI problem.
( function) Define the following function as
The next lemma provides some properties of the function.
For an arbitrary , we have
for all . For , the above inequality suggests that implying that the function is nonnegative for all .
Assume is a weak solution. By Definition 1, , for all which implies . On the other hand, from Lemma 4, we get . We conclude that for any weak solution . Conversely, assume that there exists an such that . Therefore, which implies for all implying is a weak solution.
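In symbols, the weak-solution condition and the gap function used in this argument take the following standard form. The displayed formulas were lost in this version of the text, so this is a reconstruction consistent with the proof above, written in assumed notation ($\mathcal{X}$ for the feasible set, $F$ for the mapping, $X_w$ for a weak solution), and not a verbatim copy.

```latex
% Weak solution (assumed notation):
X_w \in \mathcal{X} \text{ is a weak solution}
\iff
\operatorname{tr}\!\big((X - X_w)^T F(X)\big) \ge 0
\quad \text{for all } X \in \mathcal{X}.

% Gap function:
G(X) = \sup_{Z \in \mathcal{X}} \operatorname{tr}\!\big((X - Z)^T F(Z)\big).

% Choosing Z = X inside the supremum gives nonnegativity,
% and the weak-solution condition gives the reverse bound:
G(X) \ge \operatorname{tr}\!\big((X - X)^T F(X)\big) = 0,
\qquad
G(X_w) \le 0
\;\Longrightarrow\;
G(X_w) = 0.
```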
Assume the sequence is non-increasing and the sequence is given by the recursive rules (16) where and . Then, using induction, it can be shown that for any .
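The induction claim can be checked on a small example. The sketch below assumes that the recursion (16) has the form Γ_{t+1} = Γ_t + η_{t+1} and X̄_{t+1} = (Γ_t X̄_t + η_{t+1} X_{t+1}) / Γ_{t+1}, started from Γ_0 = η_0 and X̄_0 = X_0, and compares it with the explicit convex combination using weights η_t / Σ_k η_k. The function names and the test data are hypothetical.

```python
import numpy as np

def averaged_by_recursion(xs, etas):
    """Run the assumed averaging recursion over iterates xs with step sizes etas."""
    gamma, x_bar = etas[0], xs[0]
    for t in range(len(xs) - 1):
        gamma_next = gamma + etas[t + 1]
        x_bar = (gamma * x_bar + etas[t + 1] * xs[t + 1]) / gamma_next
        gamma = gamma_next
    return x_bar

def averaged_by_weights(xs, etas):
    """Explicit convex combination with weights eta_t / sum_k eta_k."""
    lam = etas / etas.sum()
    return np.tensordot(lam, xs, axes=1)

rng = np.random.default_rng(2)
T, n = 50, 3
xs = rng.standard_normal((T + 1, n, n))        # stand-ins for the iterates X_0..X_T
etas = 1.0 / np.sqrt(np.arange(1, T + 2))      # a non-increasing step-size sequence

recursive_average = averaged_by_recursion(xs, etas)
weighted_average = averaged_by_weights(xs, etas)
```

The recursive form needs only the current average and the running sum of step sizes, so the averaged iterate can be maintained without storing past iterates.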
Next, we derive the conjugate of the quantum entropy and its gradient.
Let and be defined as (6). Then, we have
is a lower semi-continuous convex function on the linear space of all symmetric matrices. The conjugate of function can be defined as
The minimizer of the above problem is which is called the Gibbs state (see , Example 3.29). We observe that is a positive semidefinite matrix with trace equal to one, implying that . By plugging it into Term 1, we have (17). The relation (18) follows by standard matrix analysis and the fact that . Throughout, we use the notion of Fenchel coupling :
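The relation between the conjugate and the Gibbs state can be verified with finite differences. The sketch below assumes the log-partition form log tr exp(Y) for the conjugate, which may differ from (17) by a shift of the argument; such a shift does not change the normalized exponential. It then checks that the directional derivative along a symmetric direction D matches tr(D · exp(Y) / tr exp(Y)). All function names are hypothetical.

```python
import numpy as np

def log_partition(Y):
    """log tr exp(Y) for symmetric Y, computed stably from the eigenvalues."""
    w = np.linalg.eigvalsh(Y)
    return w.max() + np.log(np.exp(w - w.max()).sum())

def gibbs_state(Y):
    """exp(Y) / tr(exp(Y)) for symmetric Y."""
    w, V = np.linalg.eigh(Y)
    p = np.exp(w - w.max())
    return (V * (p / p.sum())) @ V.T

def directional_derivative(Y, D, h=1e-6):
    """Central finite difference of log_partition at Y along direction D."""
    return (log_partition(Y + h * D) - log_partition(Y - h * D)) / (2 * h)

rng = np.random.default_rng(3)
Y = rng.standard_normal((4, 4))
Y = (Y + Y.T) / 2
D = rng.standard_normal((4, 4))
D = (D + D.T) / 2

numeric = directional_derivative(Y, D)
analytic = np.trace(D @ gibbs_state(Y))
shift_error = np.abs(gibbs_state(Y + np.eye(4)) - gibbs_state(Y)).max()
```

Here `shift_error` confirms that adding a multiple of the identity to Y leaves the Gibbs state unchanged, which is why the closed-form update always returns a positive semidefinite matrix with unit trace.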
which provides a proximity measure between and and is equal to the associated Bregman divergence between and . We also make use of the following lemma, which is proved in the Appendix.
() For all matrices and for all , the following holds
Next, we develop an error bound for the G function. For simplicity of notation we use to denote .
From the definition of in relation (13), the recursion in the AM-SMD algorithm can be stated as
By adding and subtracting , we get
where we used the monotonicity of mapping . Let us define an auxiliary sequence such that , where and define . From (III), invoking the definition of and by adding and subtracting , we obtain
Then, we estimate the term . By Lemma 5 and setting and , we get
By plugging the above inequality into (III), we get
By summing the above inequality from to , and rearranging the terms, we have
Plugging the above inequality into (III) yields
Let us define and . We divide both sides of (III) by . Then for all ,
The set is a convex set. Since and , . Now, we take the supremum over the set with respect to and use the definition of the function. Note that the right-hand side of the above inequality is independent of .
By taking expectations on both sides, we get
Next, we present the convergence rate of the AM-SMD scheme.