# Large-Scale Beamforming for Massive MIMO via Randomized Sketching

## Abstract

Massive MIMO system yields significant improvements in spectral and energy efficiency for future wireless communication systems. The regularized zero-forcing (RZF) beamforming is able to provide good performance with the capability of achieving numerical stability and robustness to the channel uncertainty. However, in massive MIMO systems, the matrix inversion operation in RZF beamforming becomes computationally expensive. To address this computational issue, we shall propose a novel randomized sketching based RZF beamforming approach with low computational latency. This is achieved by solving a linear system via randomized sketching based on the preconditioned Richard iteration, which guarantees high quality approximations to the optimal solution. We theoretically prove that the sequence of approximations obtained iteratively converges to the exact RZF beamforming matrix linearly fast as the number of iterations increases. Also, it turns out that the system sum-rate for such sequence of approximations converges to the exact one at a linear convergence rate. Our simulation results verify our theoretical findings.

## I Introduction

With the explosive growth in mobile data traffic and number of mobile devices, as well as the stringent and diverse demands of intelligent mobile services, wireless networks are facing formidable challenges to enable high spectral efficiency and support massive connectivity with low-latency. To satisfy these requirements, network densification becomes the key enabling technology. This is achieved by deploying more base stations (BSs) with the storage and computational capabilities, yielding an ultra-dense network (UDN) [1]. In particular, massive multiple-input multiple-output (MIMO) technique provides an alternative to achieve UDN by simply increasing the number of antennas at the existing BS [2, 3]. The key success is based on the fact that deploying large-scale antenna arrays allows for an exceptional array gain and unprecedented spatial resolution such that the wireless communication system is robust to inter-user interference [4]. Furthermore, the large arrays regime provides the opportunities for asymptotic system analysis, e.g., the high-dimensional random matrix theory can provide deterministic approximations for achievable data rates [5, 6].

Transmit beamforming at the BSs is a key method to optimize the network utility function (e.g., sum-rate) in terms of signal-to-interference-plus-noise ratios (SINRs). However, the resulting beamforming optimization problem is generally very difficult to be solved due to the nonconvexity and high-dimensionality. With the known optimal SINRs parameters for maximizing the network utility function, a simple structure for the optimal beamforming can be derived based on the Lagrange duality theory [7]. To find the optimal SINRs parameters, we normally need to solve a sequence of convex subproblems [8]. For instance, in the max-min fairness rate optimization problem, the optimal SINRs parameters can be found via the bi-section method [9], wherein a sequence of convex subproblems are solved. Although the general large-scale convex optimization problem can be solved by the operate splitting method, it still needs to solve a sequence of subspace projection and cone projection problems in the transformed high-dimensional space for the standard cone program [10]. Instead, the heuristic transmit beamforming, i.e., regularized zero-forcing (RZF) beamforming [11] turns out to be the appealing choice since it performs closely to the optimal beamforming in terms of sum-rate and has relatively low computational complexity [7]. We thus focus on investigating RZF beamforming in this paper. However, the RZF beamforming needs to compute a matrix inversion with complexity proportional to , where is the number of users served by transmit antennas. This is however computational expensive in massive MIMO scenario where . To tackle this issue [12, 13] proposed to replace the matrix inversion in RZF beamformer by a truncated polynomial expansion, but it is not clear that which degree of the polynomial is needed to guarantee the good performance for the system sum-rate.

In recent years, randomized sketching algorithms[14, 15, 16] have received a great deal of attention in order to solve large-scale matrix computation problems. The main idea behind randomized sketching algorithms is to compress a given large-scale matrix to a much smaller matrix by multiplying it by a random matrix with certain properties. Very expensive computation can then be operated by the smaller matrix efficiently. In particular, several novel randomized algorithms are proposed for the ridge regression problem [17, 18, 19, 20]. Inspired by these progresses, we propose a randomized sketching based beamforming method to overcome the computational issues for designing beamformers in massive MIMO systems. Specifically, the randomized sketching RZF beamforming matrix is achieved by solving a linear system by preconditioned Richard iteration [21] with the randomized sketching techniques [19]. The proposed randomized sketching RZF beamforming method has a computational complexity proportional to with as the sketching matrix size. We prove that the beamforming matrix obtained iteratively converges to the RZF beamforming matrix at a linear convergence rate. Furthermore, we prove that the achievable system sum-rate of the MIMO system with the proposed randomized method converges to the achievable sum-rate given by RZF beamforming linearly as the number of iteration increases. Extensive numerical results are presented to verify our theoretical findings.

### I-a Outline

The organization of this paper is as follows. In Sec. II, the system model and problem statement of estimating the beamforming matrix for a massive MIMO communication system are described. In Sec. III, we propose the randomized sketching method to approximate the beamforming matrix with low-complexity and provide convergence analysis and complexity analysis. In Sec. IV, we prove that the system sum-rate of the randomized sketching based beamformer converges to the sum-rate of the RZF beamforming matrix as the number of iterations increases. We provide the exact rate of convergence as well. In Sec. V, we numerically evaluate the performance of the randomized sketching based beamforming method. Finally, conclusions are drawn in Sec. VI.

### I-B Notation

Let (resp. ) be the set of real (resp. complex) numbers. For a matrix , (resp. ) denotes -th column(resp. row) vector of . (resp. ) denotes the operator (resp. Frobenius) norm. For a vector , denotes the Euclidean norm. The superscript denotes the transpose operator. is a complex conjugate transpose of . The diagonal matrix whose diagonal entries consist of entries of a vector is denoted by . Denote the identity matrix of size as . When the size can be trivially determined by the context, we simply write . Denote the zero matrix with size as . Let and denote the real and imaginary parts of a matrix , respectively. For a matrix with of rank , its (thin) Singular Value Decomposition (SVD) is the form where is the matrix of the left singular vectors, is the matrix of the right singular vectors, and is a diagonal matrix whose diagonal entries are the singular value of . We denote the singular values of a matrix as . We denote the matrix of the top left singular vectors as and the matrix of the bottom left singular vectors as .

## Ii System Model and Problem Statement

### Ii-a System Model

We consider a single-cell massive MIMO system consisting of one BS equipped with antennas and single-antenna users, where . During the downlink transmission, the received signal at the -th user is given by

(1) |

where is the transmit beamforming vector from the BS for data symbol to user , is the channel propagation coefficients from the BS to the -th user, and is the additive noise (i.e., is a circularly symmetric complex Gaussian random distribution with mean and variance ). Therefore, the SINR at the -th user is given as

(2) |

where is the aggregative beamforming matrix with the total transmit power limited by , i.e.,

(3) |

The achievable system sum-rate is thus given by

(4) |

One of the main goal of transmit beamforming is to maximize the achievable system sum-rate. However, it is generally computationally demanding to find the optimal beamforming matrix [8].

### Ii-B Regularized Zero-Forcing Beamforming

Although there are various precoding techniques such as matched filter, zero forcing, regularized zero-forcing, truncated polynomial expansion, and phased zero forcing [22], this article considers the suboptimal beamforming approach, regularized zero-forcing (RZF) beamforming [11], which is known to have the capability achieving robustness and numerical stability to the channel uncertainty [11, 7]. RZF precoder has been considered as the state-of-the-art linear precoder for MIMO wireless communication systems. Since we focus on the computational issues of RZF, we consider the following RZF with equal power allocation for simplification [5]

(5) | |||||

where is the channel matrix, is an optimal regularizer, and is a normalization parameter to satisfy the power constraint (3). In particular, can be derived as in the symmetric scenario, where the channels are equally strong[11].

### Ii-C Complexity Issues in Massive MIMO

The main computational complexity for computing (5) lies in computing the matrix inversion directly, which leads to computational complexity. To support ultra-low latency communications in massive MIMO systems, it becomes critical to design large-scale precoding algorithm with small computation latency [12, 13]. As fast inversions of large-scale matrices in every coherence period needs to be performed, it is desired to find efficient algorithms to reduce the high computational complexity with performance guarantees.

In this paper, we shall develop the randomized sketching based precoding algorithm to compute the large-scale RZF beamforming matrix in (5). This is based on the key observation that the large-scale array regime, i.e., , offers the opportunity for dimension reduction in (5), thereby reducing the computational complexity while guaranteeing the high performance accuracy. Specifically, we develop the scalable algorithm for computing in (5) based on the principles of Randomized Numerical Linear Algebra [16]. In particular, the theoretical guarantees for the achievable system sum-rate (4) using the randomized sketching based beamforming method will be presented in Sec. IV.

## Iii Randomized Sketching for Large-Scale Beamforming

### Iii-a Randomized Sketching Algorithm

Randomized sketching algorithm exploits randomization as a computational resource to develop improved algorithms for large-scale matrix computation problems. The key idea of randomized algorithm is to compress a given large-scale matrix to a much smaller matrix by multiplying it by a random matrix with certain properties. Very expensive computation can then be performed on the smaller matrix efficiently. For a given matrix and a random matrix , the technique of replacing by is known as a sketching technique and is referred to as a sketch of . Such is called a sketching matrix.

Sketching technique can be accomplished by random sampling or random projection. For random sampling method, the sketch consists of a small number of carefully-sampled and rescaled columns/rows of matrix . On the other hand, for random projection method, the sketch consists of a small number of linear combinations of the columns/rows of . We will discuss various construction for the random matrix in Section III-D.

### Iii-B Randomized Sketching Based RZF Beamforming

The first key observation is that (5) can be expressed as the matrix ridge regression problem as follows [19]:

(6) |

where . To facilitate algorithm design in real field, we focus on solving the equivalent real counterpart of (6):

(7) |

where

Note that since , . Then the optimal solution of (7) takes the form,

(8) |

Given the matrix , it is trivial to obtain the complex RZF beamforming matrix in (5).

Iterative methods provide the solution to the linear system as the limit of a sequence , and usually involve matrix only through multiplications by given vectors. Generally, any iterative method is based on a suitable splitting of the matrix with , where is nonsingular. Then the sequence is generated as follows:

(9) |

where is a given initial vector. Equivalently, such iteration can be restated as , where is the residual at the step , where is called preconditioner for . The following iteration is called preconditioned Richardson iteration[21]:

(10) |

where is the real acceleration parameter.

We present the novel sketching based randomized beamforming in Algorithm 1, which iteratively computes a sequence of matrixes for and returns the approximation to the true solution matrix of (8). In fact, it can be viewed as a preconditioned Richardson iteration. Indeed, for a given in Algorithm 1, we denote . Note that our solution is . By (i) and (iii) in Algorithm 1, we have

(11) |

Applying the recurrence relation (11) successively, it follows that

Then it holds that

Thus, Algorithm 1 can be formulated as a preconditioned Richard iteration to solve the linear system

(12) |

with preconditioner and for all in (10).

Algorithm 1 iteratively computes a sequence of matrices for and returns the approximation to the true solution in (8). Equivalently, it computes the approximation to the true solution in (5). We call such approximation a randomized sketching based beamformer.

Algorithm 1 uses the sketching matrix for the preconditioner in order to improve the rate of convergence and reduce the computational complexity. Specifically, using the sketching matrix with , the preconditioner can be computed by matrix with much smaller size.

### Iii-C Convergence Analysis

The convergence analysis depends on the selected sketching matrix, which satisfies the constraint (13). Theorem 1 presents a quality-of-approximation result under the assumption that the sketching matrix satisfies the constraint (13).

###### Theorem 1.

Assume that for some constant , the sketching matrix satisfies the following constraint

(13) |

where be the matrix of right singular vectors of . Then, after number of iterations, the approximation returned by Algorithm 1 satisfies

where is the true value of the RZF beamforming matrix in (5) in the complex version.

###### Proof.

Then each column of can be considered as the solution of the following optimization problem

(15) |

for each . Recall that and is the -th column of and , respectively. By Theorem 1 in [19], it follows that

for all . Then we have

Clearly, and . ∎

To check whether a sketching matrix satisfies (13), a number of columns that is proportional to is required (see Theorem 3). Thus, the running time of any algorithm that computes the sketch is also proportional to . To reduce the running time, it would be much better to use a parameter which is significantly smaller than . For simplicity of exposition, we will assume that the rank of is .

In the context of ridge regression, a much more important quantity than the rank of is the (effective) degrees of freedom of as follows [26]:

(16) |

where are the singular values of and . Since , it is trivial that . That is, the degrees of freedom is upper bounded by the rank of .

Define a diagonal matrix whose -th diagonal entry is given by

(17) |

where is the -th singular value of and .

Now we provide a weaker constraint with the effective degrees of freedom.

###### Theorem 2.

Assume that for some constant , the sketching matrix satisfies the following constraint

(18) |

where is the matrix of right singular vectors of . Then, after number of iterations, the approximation returned by Algorithm 1 satisfies

(19) |

where is an integer number such that , is the matrix of the bottom left singular vectors of the matrix , and is the true value of the RZF beamforming matrix in (5) in the complex version.

###### Proof.

This improved dependency on instead of the rank of matrix results in a mild loss in accuracy. can be thought of as regularizing the bottom singular values of the matrix , since it dominates them. Theorem 2 presents a quality-of-approximation result, which uses a relative-additive error approximation. The term is a norm of the part of matrix that lies on the regularized component of . As the increase of this part, the quality of the approximation will become worsen. The error decreases exponentially fast with the number of iterations.

The bounds of (13) and (18) guarantee high-quality approximations to the optimal solution. Constraint (13) can be satisfied by constructing the sampling-and-rescaling matrix whose size depends on the rank of matrix , and Theorem 1 guarantees relative error approximations. The second constraint (18) can be satisfied by sampling with respect to the ridge leverage scores, which construct the sampling-and-rescaling matrix whose size depends on the degrees of freedom , and Theorem 2 guarantees relative error approximations.

### Iii-D Sketching Matrices

Matrix sketching attempts to reduce the size of large matrices while minimizing the loss of spectral information that is useful in tasks like linear regression. Matrix sketching algorithms use a typically randomized procedure to compress into an approximation (or ”sketch”) with many fewer columns . Matrix sketching can be accomplished by random sampling or random projection. Random projection algorithms construct by forming random linear combinations of the columns in . On the other hand, random sampling algorithms construct by selecting and possibly rescaling a columns in . In the latter case, we call a sketching matrix as the sampling-and-rescaling matrix.

Sampling itself is simple and extremely efficient. A simple way to perform this random sampling would be to select those columns uniformly at random in i.i.d. trials, which mean . A more sophisticated and much more powerful way to do this would be to construct an important sampling routines which select columns using carefully chosen, non-uniform probabilities . It is known that variations on the standard “statistical leverage scores” give probabilities that are provably sufficient for approximations such as low-rank approximation. Many of these probabilities are modifications on the standard statistical leverage scores.

###### Definition 1.

The (statistical) leverage score of the column of is defined as:

(20) |

for .

Here, denotes the Moore-Penrose pseudoinverse of a matrix. When is full rank, . measures how important is in composing the range of . It is maximized at 1 when is linearly independent from ’s other columns and decreases when many other columns approximately align with or when is small.

Leverage score sampling sets proportional to the (exact or approximate) leverage scores of . The leverage scores are used in fast sketching algorithms for linear regression and matrix preconditioning[27, 28, 29].

Notably, leverage scores are defined in terms of , which is not always unique and regardless can be sensitive to matrix perturbations. As a result, the scores can change drastically when is modified slightly or when only partial information about the matrix is known. This largely limits the possibility of quickly approximating the scores with sampling algorithms, and motivates our adoption of a new leverage score. Rather than using leverage scores based on , we employ regularized scores called ridge leverage scores, which have been used for approximate kernel ridge regression [30] and in works on iteratively computing standard leverage scores [31, 31]. For a given regularization parameter , we define the -ridge leverage score as:

(21) |

Let be the best low-rank approximation for with respect to the Frobenius norm. In other words,

Note that can be expressed as . That is, the best rank approximation can be found by projecting onto the span of its top singular vectors. We will always set as follows.

###### Definition 2.

The ridge leverage score of the column of with respect to the ridge parameter is defined as:

(22) |

for .

Note that the ridge leverage score can also be expressed as

where is the matrix of right singular vectors of and is defined as (17). The constraint (18) can also be satisfied by sampling with respect to the ridge leverage scores [30]. The difference is that, instead of having the column size of the matrix depend on , it now depends on , which could be considerably smaller. Indeed, it follows that by sampling-and-rescaling from the design matrix (using either exact or approximate ridge leverage scores).

In this article we only consider the sampling-and-rescaling matrix for a sketching matrix . Algorithm 2 provides the construction of it. The following theorems show how many sampled columns guarantee that the sketching matrix holds the constraint (13). This theorem is adopted from Theorem 3 in [19], so the proof is omitted.

###### Theorem 3.

Let be the matrix of right singular vectors of . Let be constructed by Algorithm 2 with the sampling probabilities for . Let be a failure probability and let be an accuracy parameter. If the number of sampled columns satisfies

(23) |

then, with probability at least ,

(24) |

The sampling probabilities are the column leverage scores [19] of the channel matrix . Setting suffices to satisfy the condition (13). [32] demonstrated a construction for such with columns such that, for , the product can be computed in time for some constant . Here is the number of nonzero entries of .

Additionally, there are a variety of sketching matrix constructions for that can satisfy (24). The running time of sketch depends on the dimension of , and the construction of sampling with respect to leverage scores is proportional to (we assume that ), which means the running time of is also proportional to . Therefore, we let dimensionality depend on the degrees of freedom of the ridge regression problem, as opposed to the rank of matrix . In this way, the running time would result in significant savings.

To achieve the reduction in running time, the column size of matrix is thus better designed proportional to degrees of freedom which depends on the distribution of the singular value of and instead of proportional to for , which could be significantly smaller than .

###### Theorem 4.

Let be the matrix of right singular vectors of . Let be constructed by Algorithm 2 with the sampling probabilities for . Let be a failure probability and let be an accuracy parameter. If the number of sampled columns satisfies

(25) |

then, with probability at least ,

(26) |

The sampling probabilities are the column ridge leverage scores [33, 30] of the channel matrix . Similarly to the constraint of (24), setting suffices to satisfy the condition (26). Note that while the leverage scores which construct the sampling-and-rescaling matrix with the column size depends on rank of , the ridge leverage scores construct depend on , which could be considerably smaller than rank of . Hence, it could surely achieve time saving. However, the running time savings would lead to a drop in accuracy as shown in Theorem 2.

### Iii-E Time Complexity

We now discuss the time complexity of Algorithm 1. Note that each column in (8) can be computed by each column of , separately. We consider as a column vector. Let . Note that to find , it suffices to compute the singular value decomposition of . Since the singular values of can be computed through , where denotes the singular value of . And the left and right singular vectors of are the same as the left singular vectors of . We store it implicitly by storing its left (and right) singular vector and its singular values , before we just compute all the necessary matrix-vector products using this implicit representation of . The above analysis shows that we do not need to compute directly. Thus computing takes time.

Updating each , , and is dominated by the aforementioned running times, as all updates amount to just matrix-vector products. Thus, summing over all iterations, the running time of Algorithm 1 is given by

(27) |

Thus the time complexity is reduced evidently. Note that the complexity of computing the matrix inversion (8) is .

## Iv The system sum-rate analysis with approximate RZF beamformers

In this section, we show that the system sum-rate of the randomized sketching based beamformer converges to the sum-rate of the RZF beamforming matrix as the number of iterations increases. Moreover, if an approximation sequence converges to the true beamforming matrix with the rate of convergence , then the system sum-rate of the approximation sequence converges with the same rate of convergence . Before stating our main results, we introduce the extra notation, , to cast SINR at the -th user in (2) with a simpler form. From now on, we assume that the channel matrix is fixed and the beamforming matrix is considered as complex variables. Then we can easily deal with the system sum-rate for any approximate beamforming matrix.

For each , let a function be defined by for all . Note that

The system sum-rate of a given variables can thus be rewritten as

(28) |

That is, can be viewed as a function from to , as shown in Fig. 1.

Let be a nonempty open subset of , , and . Recall that a function is said to be on if each partial derivative of of order exists and is continuous on . is said to be on if is on for all . In other words, a -mapping is a function that is differentiable for all degrees of differentiation.

###### Lemma 1.

The system sum-rate is a -mapping on .

###### Proof.

Note that the complex variables can be considered as real variables . We use and interchangeably. Let be real variables. Then it is easy to check that that is a multivariate polynomial in , i.e., the ring of polynomials with real coefficients over variables . Thus is -mapping on . Since the logarithm function are -mapping, the function is a -mapping on , provided . Let be a given channel matrix. Considering the beamforming matrix as real variables in , the system sum-rate in (4) can be considered as a function defined by

In other words, can be considered as a function from . Moreover, it is easy to check that is a multivariate polynomial in , which is -mapping on . Since it can be rewritten as

and the logarithm function is -mapping, the function is a -mapping on , provided . ∎

Denote the true solution of the regularized RZF problem (8) in the complex version as . By Lemma 1, is continuous. By Theorem 1, each entry of the approximation converges to the entry of the true solution, respectively, i.e.,

Note that the image of a convergent sequence under a continuous function converges to the image of limit. Thus, the following holds.

###### Proposition 5.

Assume that for some constant , the sketching matrix satisfies the constraint (13). Let be the number of iterations. Then, the system sum-rate of the approximation converges to the system sum-rate of the true solution as the number of iterations increases. That is,

(29) |

The next theorem is our key result. It shows that the error of the system sum-rate is bounded by the error of an approximation of beamforming matrix. Using this result, the rate of convergence for the system sum-rate of an approximation can be obtained.

###### Theorem 6.

Let be a given channel matrix, and let (resp. ) be the approximation (resp. true) RZF beamforming matrix. Then it holds that

where is constant independent to .

###### Proof.

See Appendix B. ∎

Suppose a sequence converges to zero, and converges to a number . Recall that converges to with rate of convergence if a positive constant exists with for sufficiently large .

The following shows that if an approximation sequence converges to the true beamforming matrix with the rate of convergence , then converges to with the same rate of convergence .

###### Theorem 7.

Let . If an approximation sequence converges to such that then

provided sufficiently large .