# Interference Exploitation-based Hybrid Precoding with Robustness Against Phase Errors

Ganapati Hegde, Christos Masouros, Marius Pesavento

Ganapati Hegde and Marius Pesavento are with the Communication Systems Group, Technische Universität Darmstadt, Darmstadt 64283, Germany (e-mail: hegde@nt.tu-darmstadt.de; pesavento@nt.tu-darmstadt.de). Christos Masouros is with the Department of Electronic & Electrical Engineering, University College London, London WC1E 7JE, U.K. (e-mail: c.masouros@ucl.ac.uk).

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

###### Abstract

Hybrid analog-digital precoding significantly reduces the hardware costs in massive MIMO transceivers when compared to fully-digital precoding, at the expense of increased transmit power. To mitigate this shortfall we use the concept of constructive interference-based precoding, which has been shown to offer significant transmit power savings when compared with the conventional interference-suppression-based precoding in fully-digital multiuser MIMO systems. Moreover, in order to circumvent the potential quality-of-service degradation at the users due to hardware impairments in the transmitters, we incorporate robustness against such vulnerabilities in the precoder design. Since the resulting constructive interference-based robust hybrid precoding problem is nonconvex with infinitely many constraints and thus difficult to solve optimally, we decompose the problem into two subtasks, namely, analog precoding and digital precoding. In this paper we propose an algorithm to compute the optimal constructive interference-based robust digital precoders. Furthermore, we devise a scheme to facilitate the implementation of the proposed algorithm in a low-complexity and distributed manner. We also discuss block-level analog precoding techniques. Simulation results demonstrate the superiority of the proposed algorithm and its implementation scheme over the state-of-the-art methods.

## I Introduction

Massive multiple-input multiple-output (MIMO) systems, in which the base stations (BSs) are equipped with hundreds of antennas, are one of the key pillars of the upcoming 5G mobile networks to enable immensely high spectral efficiency [1, 2]. As in conventional MIMO systems, which comprise only a few antennas (typically fewer than ten), the degrees of freedom resulting from the large antenna array can be used to form narrow transmit beams using precoding in a massive MIMO downlink system. Even though the precoding techniques that were developed over the decades for MIMO systems [3, 4] are theoretically extendable to massive MIMO systems, they are often practically unsuitable because the conventional precoding techniques are employed in the digital baseband domain and require a dedicated radio-frequency (RF) chain for each antenna element. The cost and power footprints of the RF chains pose a significant obstacle in practical implementations when the number of antennas grows large [5]. Therefore, it is essential to devise novel precoding schemes that keep hardware costs and power efficiency in mind and are thus compatible with massive MIMO systems.

One of the solutions proposed in the literature to reduce the hardware costs in a massive MIMO system is hybrid analog-digital precoding [6, 7, 8, 9], which requires significantly fewer RF chains than antenna elements. In this system, the RF chains are connected to the antenna elements through analog phase shifters (PSs). The hybrid precoding is performed in two stages: digital precoding in the baseband domain and analog precoding using PSs in the RF domain. In a downlink system, digital precoders (DPs) are first applied to the transmit symbols and the resulting signals are fed to the RF chains. The outputs of the RF chains are processed using analog precoders (APs) and subsequently fed to the antenna elements as shown in Fig. 1.

Hybrid precoding can significantly reduce the hardware costs at the expense of reduced spectral efficiency or increased transmit power to satisfy a certain quality-of-service (QoS) requirement, when compared to fully-digital precoding [10, 11, 12]. Therefore, schemes to improve the energy efficiency are even more desirable in hybrid precoding-based networks than in traditional networks. Constructive interference (CI)-based precoding, in which the interference power is exploited to improve the useful signal power at the users, has been shown to offer significant transmit power savings in a fully-digital multiuser downlink system [13, 14, 15, 16] when compared to the conventional interference-suppression or cancellation-based precoding schemes [4, 17]. It is therefore natural to extend CI-based precoding to hybrid precoding architectures in order to reduce the required transmit power. Some initial results in this direction are found in [18].

Some of the envisioned use-cases of 5G networks, such as vehicle-to-x communication and industrial WLAN, require ultra-reliable communication [19]. Therefore, it is crucial to equip the precoders with robustness against interference, imperfect channel knowledge, hardware impairments, etc., in order to guarantee a certain QoS in all circumstances. In [20] the authors develop a method to design interference-suppression-based hybrid precoders (HPs) with robustness against multiple access interference, inter-symbol interference, and errors in the PSs. In [21, 13] the authors extend CI-based precoding to design precoders that are robust against imperfect channel knowledge in a fully-digital precoding architecture. CI-based precoding, in which the precoders are largely determined by the phase values of the transmit symbols and channel elements [13], can be highly sensitive to phase errors in the PSs. However, to the best of our knowledge, the problem of designing CI-based HPs with robustness against phase errors in the PSs, which is the main focus of this paper, has not been considered in the literature.

In this paper we consider symbol-level precoding in a multiuser massive MIMO downlink system (in symbol-level precoding the precoders are updated at every symbol-interval, whereas in block-level precoding they are kept constant for a block of symbol-intervals). Our goal is to design CI-based HPs that require minimum transmit power to guarantee a certain QoS to the users in the presence of phase errors in the PSs. The resulting optimization problem is nonconvex and contains infinitely many constraints, and is thus difficult to solve optimally. To deal with the nonconvexity, we propose a method to solve the joint analog and digital precoding problem suboptimally, where we decompose the problem into two sequential subtasks, namely, analog precoding and digital precoding. First, we design the DPs under the premise that the APs are fixed. Subsequently, we discuss various schemes to obtain the APs. The main contributions of this paper are summarized below:

The CI-based robust digital precoding problem is formulated as a semi-infinite program. An iterative algorithm is proposed to solve the formulated problem. Closed-form expressions are derived to obtain the error matrices, which are required to update the constraint sets in each iteration. The convergence of the algorithm to the optimal point is proven.

A descent-direction-based iterative scheme is devised to facilitate the distributed implementation of the proposed algorithm on parallel hardware architecture. Closed-form expressions are derived for the update direction and step-size - which are required in a descent direction method [22].

To relax the strict latency requirements on the APs update, we propose block-level analog precoding.

The paper is organized as follows. In Section II we present the system model. The optimization problem is formulated in Section III. In Section IV we propose an algorithm to design the optimal robust DPs. We develop an iterative scheme to efficiently implement the proposed algorithm in Section V. We present the block-level analog precoding methods in Section VI. The numerical results are presented in Section VII. Finally, Section VIII concludes the paper.

Notation: Bold lower-case letters (e.g., $\mathbf{a}$) denote vectors and bold upper-case letters (e.g., $\mathbf{A}$) denote matrices. The symbol $a_n$ represents the $n$th element of vector $\mathbf{a}$, $\mathbf{a}_r$ indicates the $r$th column of matrix $\mathbf{A}$, and $a_{nr}$ stands for the entry in the $n$th row and $r$th column of matrix $\mathbf{A}$. $\mathcal{A}$ denotes a set and its cardinality is indicated by $|\mathcal{A}|$. The letters $\mathbb{R}$ and $\mathbb{C}$ symbolize the real and complex-valued domains respectively. The operators $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^\dagger$ correspond to the transpose, Hermitian, and pseudo-inverse of a matrix respectively. The symbol $\odot$ represents the Hadamard (element-wise) product. The operations $|\cdot|$, $\|\cdot\|$, and $\|\cdot\|_F$ indicate the absolute value, $\ell_2$-norm, and Frobenius-norm operations respectively. $\exp(\cdot)$ stands for the exponential function. $\operatorname{Re}(\cdot)$ and $\operatorname{Im}(\cdot)$ denote the real part and imaginary part of a complex argument respectively. The letter $j$ stands for the imaginary unit, and $s^*$ indicates the complex conjugate of the complex scalar $s$.

## II System Model

Consider a co-channel multiuser MIMO downlink system consisting of a BS equipped with $N$ transmit antennas and $R$ RF chains, where . Let $\mathcal{K}$ denote a set of $K$ single-antenna users served by the BS. The transmit symbol vector at the BS is given by , where the element $s_k$ indicates the symbol intended for the $k$th user. The symbols are assumed to be drawn from an $M$-ary phase-shift keying ($M$-PSK) constellation (nevertheless, the proposed techniques can be extended to other modulation formats following the principles in [23, 24]), and without loss of generality (w.l.o.g.) each transmit symbol is assumed to be of constant unit modulus. A DP $\mathbf{d}_k$ is applied to the transmit symbol $s_k$ and the resulting signals are fed to the RF chains. Each RF chain is connected to all transmit antennas through analog PSs. The PSs have constant gains, which w.l.o.g. are assumed to take the same value $a$ for all PSs. Let denote the designed (intended) phase value of the PS that connects the $n$th antenna to the $r$th RF chain, and let $a_{nr}$ denote the resulting complex value of the corresponding PS. Moreover, $\mathbf{a}_r$ forms the AP applied to the output of the $r$th RF chain for . Let $\mathbf{A}$ be the AP matrix. The PSs are assumed to be imperfect, i.e., their actual phase values can vary from their designed phase values due to phase noise, phase drift, etc. [25, 26, 20], while the actual gains of the PSs are unaltered from their nominal values [20, 27]. Let represent the phase error associated with a PS whose designed value is . Then the true value of the PS is given by $a_{nr}e_{nr}$, where $e_{nr}$ represents the resulting multiplicative complex error associated with the PS. We assume the phase errors are bounded within a known bound $\delta$ such that [20], as shown in Fig. 2.

Let the set $\mathcal{E}$ denote the infinite set of all possible error matrices that are associated with the AP matrix $\mathbf{A}$, i.e.,

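$$\mathcal{E} \triangleq \left\{\mathbf{E} \;\middle|\; \mathbf{E}\in\mathbb{C}^{N\times R},\ |e_{nr}|=1,\ |\angle e_{nr}|\le\delta,\ \forall n\in\mathcal{N},\ \forall r\in\mathcal{R}\right\}.$$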

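Let $\tilde{\mathbf{h}}_k$ be the frequency-flat channel vector between the BS and the $k$th user, which is assumed to be known at the BS [28, 29]. Let $n_k$ represent the i.i.d. additive white Gaussian noise at the $k$th user. The received signal at the $k$th user can be expressed as

$$y_k = \tilde{\mathbf{h}}_k^{T}(\mathbf{A}\odot\mathbf{E})\left(\sum_{\ell=1}^{K}\mathbf{d}_\ell s_\ell\right) + n_k, \tag{1}$$

where the error matrix $\mathbf{E}\in\mathcal{E}$.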
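The signal model in (1) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `draw_error_matrix` and `received_signal` are hypothetical, the phase errors are drawn uniformly in $[-\delta,\delta]$ (the text only requires them to be bounded), and matrices are stored as nested lists.

```python
import cmath
import random

def draw_error_matrix(n_ant, n_rf, delta, rng):
    """Draw one E with |e_nr| = 1 and |angle(e_nr)| <= delta.
    Uniform phase errors are an assumption made for this sketch."""
    return [[cmath.exp(1j * rng.uniform(-delta, delta)) for _ in range(n_rf)]
            for _ in range(n_ant)]

def received_signal(h, A, E, D, s, noise=0j):
    """Eq. (1) for one user: y = h^T (A ⊙ E) (sum_l d_l s_l) + n.

    h: channel (length N); A, E: N x R nested lists;
    D: list of K digital precoders (each of length R); s: K symbols."""
    n_ant, n_rf = len(A), len(A[0])
    # Baseband signal after digital precoding: x = sum_l d_l * s_l.
    x = [sum(D[l][r] * s[l] for l in range(len(s))) for r in range(n_rf)]
    # Antenna signal (A ⊙ E) x, Hadamard product applied entry by entry.
    t = [sum(A[n][r] * E[n][r] * x[r] for r in range(n_rf))
         for n in range(n_ant)]
    # Plain transpose (no conjugation), as written in Eq. (1).
    return sum(h[n] * t[n] for n in range(n_ant)) + noise

# Small example: 4 antennas, 2 RF chains, one user, ideal phase shifters.
ones = [[1 + 0j] * 2 for _ in range(4)]
y_ideal = received_signal([1, 1, 1, 1], ones, ones, [[1, 0]], [1])
E_demo = draw_error_matrix(4, 2, 0.1, random.Random(0))
```

With `E` set to the all-ones matrix the call reduces to the error-free hybrid precoding model; replacing it with `E_demo` shows how bounded phase errors perturb the received sample.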

## III Problem Formulation

In a communication system with i.i.d. transmit symbols, the Bayesian decision-region of a symbol for the purpose of decoding is defined as the set of points in the complex domain that have the smallest Euclidean distance to the respective symbol [30]. CI-based precoding is a linear precoding technique that exploits the knowledge of the channel and data of all users to pre-equalize the transmit signals, such that the received signal at each user lies in the correct decision-region at least a threshold-margin away from the corresponding decision boundaries [31, 14, 13]. The part of a decision-region that is a threshold-margin away from the corresponding decision boundaries is called the constructive interference-region (CI-region). The CI-regions of constellation symbols in the case of QPSK and 8-PSK are illustrated in Fig. 3, where represents the threshold-margin. The enforced threshold-margins control the achieved symbol-error-rates (SERs) and hence the resulting QoS at the users.

In this paper, we extend the CI-based precoding concept to the hybrid precoding architecture. Our objective is to design HPs with the minimum transmit power at the BS, such that the received signal at each user lies in the CI-region of the respective transmitted symbol. Note that, when non-robust precoding is employed, the phase errors in the PSs can drive the received signals at the users outside the corresponding CI-regions, resulting in increased SER. To overcome this drawback, we incorporate robustness into the digital precoding to ensure that the received signal at each user lies in the appropriate CI-region for any error matrix $\mathbf{E}\in\mathcal{E}$. Extending the CI-based precoding problem developed for the fully-digital precoding system in [13], we formulate a semi-infinite program [32, 33, 34] to implement the above stated task as

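$$\begin{aligned}
\underset{\mathbf{A},\,\{\mathbf{d}_k\}_{k\in\mathcal{K}}}{\text{minimize}}\quad & \left\|\mathbf{A}\sum_{k=1}^{K}\mathbf{d}_k s_k\right\|^2 && \text{(2a)}\\
\text{s.t.}\quad & \left|\operatorname{Im}\!\left(s_k^{*}\tilde{\mathbf{h}}_k^{T}(\mathbf{A}\odot\mathbf{E})\sum_{\ell=1}^{K}\mathbf{d}_\ell s_\ell\right)\right| \le \left(\operatorname{Re}\!\left(s_k^{*}\tilde{\mathbf{h}}_k^{T}(\mathbf{A}\odot\mathbf{E})\sum_{\ell=1}^{K}\mathbf{d}_\ell s_\ell\right)-\gamma_k\right)\tan\theta,\quad \forall\mathbf{E}\in\mathcal{E},\ \forall k\in\mathcal{K}, && \text{(2b)}\\
& |a_{nr}| = a,\quad \forall n\in\mathcal{N},\ \forall r\in\mathcal{R}. && \text{(2c)}
\end{aligned}$$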

In the above problem $\theta$ denotes the angular distance between the transmit symbol and the corresponding decision boundaries for a given modulation order $M$, the QoS controlling parameter with indicating the threshold-margin at the $k$th user. In the problem, the objective function (2a) minimizes the total transmit power at the BS. The constraints in (2b) force the received signals to lie in the appropriate CI-regions for each user [13]. The constraints in (2c) enforce the constant gain of each element of the AP matrix $\mathbf{A}$. The problem (2) can be reformulated as an equivalent single-group multicast problem [13, 35] as

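$$\begin{aligned}
\underset{\mathbf{A},\,\mathbf{b}}{\text{minimize}}\quad & \|\mathbf{A}\mathbf{b}\|^2 && \text{(3a)}\\
\text{s.t.}\quad & \left|\operatorname{Im}\!\left(\mathbf{h}_k^{T}(\mathbf{A}\odot\mathbf{E})\mathbf{b}\right)\right| \le \left(\operatorname{Re}\!\left(\mathbf{h}_k^{T}(\mathbf{A}\odot\mathbf{E})\mathbf{b}\right)-\gamma_k\right)\tan\theta,\quad \forall\mathbf{E}\in\mathcal{E},\ \forall k\in\mathcal{K}, && \text{(3b)}\\
& |a_{nr}| = a,\quad \forall n\in\mathcal{N},\ \forall r\in\mathcal{R}, && \text{(3c)}
\end{aligned}$$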

where the effective channel $\mathbf{h}_k \triangleq s_k^{*}\tilde{\mathbf{h}}_k$. The optimal multicast DP $\mathbf{b}^\star$ of problem (3) and the optimal DPs $\mathbf{d}_k^\star$ of problem (2) are related by [13]

$$\mathbf{d}_k^\star = \frac{\mathbf{b}^\star s_k^{*}}{K},\quad \forall k\in\mathcal{K}. \tag{4}$$
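The CI-region condition in (2b) and the relation (4) are easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, the test is noise-free, and the QPSK example uses $\theta=\pi/M$ with $M=4$ and an arbitrary margin value of 1.

```python
import cmath
import math

def in_ci_region(y, s, gamma, theta, tol=1e-12):
    """Noise-free CI-region test |Im(s* y)| <= (Re(s* y) - gamma) tan(theta),
    i.e. the inequality in (2b) evaluated for one received sample y."""
    v = s.conjugate() * y            # rotate so the symbol lies on the real axis
    return abs(v.imag) <= (v.real - gamma) * math.tan(theta) + tol

def unicast_from_multicast(b, s):
    """Eq. (4): d_k = b * conj(s_k) / K for every user k."""
    K = len(s)
    return [[b_r * s_k.conjugate() / K for b_r in b] for s_k in s]

# QPSK example (M = 4, theta = pi / 4); values chosen only for illustration.
theta = math.pi / 4
symbols = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]
b = [1 + 2j, -0.5j]
D = unicast_from_multicast(b, symbols)
# With unit-modulus symbols, sum_k d_k s_k recovers the multicast precoder b.
b_back = [sum(D[k][r] * symbols[k] for k in range(4)) for r in range(2)]
inside = in_ci_region(2.0 * symbols[0], symbols[0], 1.0, theta)
outside = in_ci_region(0.5 * symbols[0], symbols[0], 1.0, theta)
```

The recovery of `b` from the per-user precoders relies on the unit-modulus assumption on the symbols, which is why the factor $1/K$ appears in (4).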

The problem (3) is nonconvex and difficult to solve optimally, for the following reasons: i) the bilinear coupling of the AP matrix $\mathbf{A}$ and the DP $\mathbf{b}$, ii) the nonconvex domain of the elements of $\mathbf{A}$, iii) the constraint in (3b) must be satisfied $\forall\mathbf{E}\in\mathcal{E}$, i.e., the number of constraints is infinite. We propose a sequential optimization approach that decomposes the problem into two subproblems, namely, analog precoding and robust digital precoding. In Sections IV and V, we consider the robust digital precoding and its efficient implementation under the premise that the AP matrix is fixed. Subsequently, in Section VI we study the AP design techniques.

## IV Optimal Robust Digital Precoding

In this section we design the worst-case robust DP of problem (3) when the AP matrix is fixed to $\hat{\mathbf{A}}$. The resulting problem can be expressed as a semi-infinite problem given by

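$$\begin{aligned}
\underset{\mathbf{b}}{\text{minimize}}\quad & \|\hat{\mathbf{A}}\mathbf{b}\|^2 && \text{(5a)}\\
\text{s.t.}\quad & +\operatorname{Im}\!\left(\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}\right) \le \left(\operatorname{Re}\!\left(\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}\right)-\gamma_k\right)\tan\theta,\quad \forall\mathbf{E}\in\mathcal{E},\ \forall k\in\mathcal{K}, && \text{(5b)}\\
& -\operatorname{Im}\!\left(\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}\right) \le \left(\operatorname{Re}\!\left(\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}\right)-\gamma_k\right)\tan\theta,\quad \forall\mathbf{E}\in\mathcal{E},\ \forall k\in\mathcal{K}, && \text{(5c)}
\end{aligned}$$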

where the constraints in (5b) enforce the received signal at each user to lie below the anti-clockwise boundary of the corresponding CI-region, and the constraints in (5c) enforce them to lie above the clockwise boundary of the corresponding CI-region (see Fig. 4). We assume the problem (5) is feasible. Based on the cutting plane method and alternating procedure [33, 34], we develop an iterative algorithm to efficiently solve the formulated semi-infinite program by exploiting a structure in the problem, namely, the constant magnitude property of elements of error matrix .

We initialize the algorithm (iteration number ) with sets and , , where is an matrix with all elements equal to 1. The proposed algorithm comprises two stages in each iteration. In the first stage of the th iteration we solve the following convex quadratic problem, which corresponds to the non-robust precoding problem in the first iteration.

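$$\begin{aligned}
\underset{\mathbf{b}^{i}}{\text{minimize}}\quad & \|\hat{\mathbf{A}}\mathbf{b}^{i}\|^2 && \text{(6a)}\\
\text{s.t.}\quad & +\operatorname{Im}\!\left(\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}^{i}\right) \le \left(\operatorname{Re}\!\left(\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}^{i}\right)-\gamma_k\right)\tan\theta,\quad \forall\mathbf{E}\in\mathcal{E}^{i+}_{k},\ \forall k\in\mathcal{K}, && \text{(6b)}\\
& -\operatorname{Im}\!\left(\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}^{i}\right) \le \left(\operatorname{Re}\!\left(\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}^{i}\right)-\gamma_k\right)\tan\theta,\quad \forall\mathbf{E}\in\mathcal{E}^{i-}_{k},\ \forall k\in\mathcal{K}. && \text{(6c)}
\end{aligned}$$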

In the subsequent iterations, this problem comprises a finite subset of constraints of problem (5): the constraint (6b) for every error matrix ; the constraint (6c) for every error matrix , . The problem (6) can be solved optimally using any general purpose solver such as SDPT3 [36]. In Section V we develop a customized scheme to solve it more efficiently. Let denote the optimal solution of the problem (6) in the th iteration.

In the second stage of the th iteration, we compute the worst-case error matrices of constraints (5b) and (5c) at , . The worst-case error matrix of constraint (5b) is defined as an error matrix that violates the constraint (5b) with the largest margin, or fulfills it with the smallest margin when the constraint is satisfied , for the th user at . Equivalently, the error matrix drives the received signal at the th user furthest away from the CI-region in the anti-clockwise direction (see Fig. 4), when the DP is set to . Similarly, the worst-case error matrix of constraint (5c) for the th user, denoted as , drives the received signal furthest away from the corresponding CI-region in the clockwise direction. The closed-form expressions to compute and are presented below. Now, if violates the constraint (5b) then it is added to the corresponding set of error matrices, i.e.,

$$\mathcal{E}^{(i+1)+}_{k} = \mathcal{E}^{i+}_{k}\cup\left\{\mathbf{E}^{i+}_{k}\right\}. \tag{7}$$

Similarly, if the error matrix violates the constraint (5c), then it will be included in set , i.e.,

$$\mathcal{E}^{(i+1)-}_{k} = \mathcal{E}^{i-}_{k}\cup\left\{\mathbf{E}^{i-}_{k}\right\}. \tag{8}$$

When both and , , satisfy the constraints (5b) and (5c) respectively, we conclude that the solution of problem (6) is the global optimal solution of problem (5) and thus terminate the algorithm.

Optionally, in order to reduce the number of constraints of problem (6) in the next iteration, the redundant constraints can be dropped [34]. To this end, we identify the error matrices that result in strict inequality of the corresponding constraint in (6b) for the given DP and exclude them from the set . Similarly, the error matrices that cause strict inequality of the corresponding constraint in (6c) are excluded from the set .

Closed-form expressions for the worst-case error matrices: The worst-case error matrices, and for , of constraints (5b) and (5c) for a given DP can be obtained by solving the problems (9) and (10) respectively.

These problems are nonconvex due to the nonconvex domain of optimization variables . We exploit the constant magnitude property of element and derive a closed-form expression to the worst-case error matrices (see Appendix A), which are given by

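$$\mathbf{E}^{i+}_{k} = \mathbf{U}^{+} + j\mathbf{W}^{+}, \tag{11a}$$

$$\mathbf{E}^{i-}_{k} = \mathbf{U}^{-} + j\mathbf{W}^{-}, \tag{11b}$$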

where the elements of above matrices are computed as

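$$\begin{aligned}
u^{+}_{nr} &= \max\!\left(\cos\delta,\ \frac{\operatorname{Im}(z_{nr})\cos\theta-\operatorname{Re}(z_{nr})\sin\theta}{|z_{nr}|}\right), && \text{(12a)}\\
w^{+}_{nr} &= \frac{\operatorname{Re}(z_{nr})+\operatorname{Im}(z_{nr})\tan\theta}{\left|\operatorname{Re}(z_{nr})+\operatorname{Im}(z_{nr})\tan\theta\right|}\sqrt{1-(u^{+}_{nr})^2}, && \text{(12b)}\\
u^{-}_{nr} &= \max\!\left(\cos\delta,\ \frac{-\operatorname{Im}(z_{nr})\cos\theta-\operatorname{Re}(z_{nr})\sin\theta}{|z_{nr}|}\right), && \text{(12c)}\\
w^{-}_{nr} &= \frac{-\operatorname{Re}(z_{nr})+\operatorname{Im}(z_{nr})\tan\theta}{\left|-\operatorname{Re}(z_{nr})+\operatorname{Im}(z_{nr})\tan\theta\right|}\sqrt{1-(u^{-}_{nr})^2}, && \text{(12d)}
\end{aligned}$$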

with . (Indices and are dropped from the matrices and for notational simplicity).
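A minimal sketch of the element-wise expressions (12a)-(12d) follows. It assumes that $z_{nr}$ is the scalar multiplying $e_{nr}$ in the received signal, so that $\mathbf{h}_k^{T}(\hat{\mathbf{A}}\odot\mathbf{E})\mathbf{b}=\sum_{n,r} z_{nr}e_{nr}$; this reading, the function names, and the handling of the degenerate cases ($z_{nr}=0$ and a zero sign term) are assumptions of the sketch. The example compares the closed form against randomly drawn feasible error matrices.

```python
import cmath
import math
import random

def worst_case_error(z, theta, delta, sign=+1):
    """Element e = u + j*w following (12a)-(12b) for sign=+1 (constraint (5b))
    and (12c)-(12d) for sign=-1 (constraint (5c))."""
    if z == 0:
        return complex(1.0, 0.0)     # this element cannot affect the constraint
    u = max(math.cos(delta),
            (sign * z.imag * math.cos(theta) - z.real * math.sin(theta)) / abs(z))
    u = min(u, 1.0)                  # guard against rounding slightly above 1
    c = sign * z.real + z.imag * math.tan(theta)
    # copysign maps the 0/0 case of the sign term to +1 (assumption).
    w = math.copysign(1.0, c) * math.sqrt(1.0 - u * u)
    return complex(u, w)

def violation(z_list, e_list, gamma, theta, sign=+1):
    """Left-hand side minus right-hand side of (5b) (sign=+1) or (5c) (sign=-1);
    a positive value means the constraint is violated."""
    v = sum(z * e for z, e in zip(z_list, e_list))
    return sign * v.imag - (v.real - gamma) * math.tan(theta)

# Compare the closed form with random feasible error matrices.
rng = random.Random(1)
theta, delta, gamma = math.pi / 4, 0.2, 1.0
z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(12)]
gaps = []
for sign in (+1, -1):
    e_wc = [worst_case_error(zi, theta, delta, sign) for zi in z]
    v_wc = violation(z, e_wc, gamma, theta, sign)
    for _ in range(500):
        e_rand = [cmath.exp(1j * rng.uniform(-delta, delta)) for _ in z]
        gaps.append(v_wc - violation(z, e_rand, gamma, theta, sign))
```

Because the violation is a sum of terms that each depend on a single $e_{nr}$, the maximization separates over the elements, which is what makes a closed form possible despite the nonconvex unit-modulus domain.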

Substituting the optimal solutions and in the objective functions of the problems (9) and (10) we obtain the optimal values and respectively. A non-positive implies that satisfies the constraint (5b) for the th user . On the other hand, a positive value for implies that the constraint (5b) is violated at for the error matrix ; a positive means the constraint (5c) is violated at for the error matrix , for the th user.

The above algorithm to design the worst-case robust digital precoding is summarized in Alg. 1.

Theorem 1: When Alg. 1 terminates after an th iteration the optimal solution of problem (6) is equal to the optimal solution of problem (5).
Proof: see Appendix B-1.

Theorem 2: The sequence of optimal solutions of problem (6) generated by Alg. 1 converges to the optimal solution of problem (5).
Proof: see Appendix B-2.

## V Low-complexity Parallel Implementation Scheme

The major part of the computations involved in the proposed robust digital precoding algorithm arises from the optimization problem (6), which needs to be solved in every iteration. In this section we develop a low-complexity scheme that solves the problem (6) in a parallelized manner in order to reap the benefits of any available parallel hardware and thus speed up the algorithm.

In the following, we first transform the complex-valued problem (6) into an equivalent real-valued problem and then derive its dual problem. Subsequently, the dual problem is solved iteratively, similarly to [37, 38], to obtain the optimal solution of the primal problem by performing the following steps. First, an approximate problem is constructed for the dual problem that delivers a descent direction of the dual problem at a given point. Next, the approximate problem is decomposed into multiple independent subproblems, which can be solved in parallel. Afterwards, a closed-form expression is derived for the optimal solutions of the subproblems. Finally, we derive a closed-form expression to compute the step-size, which is required to update the current point in the descent direction.

Let be a function that transforms a complex matrix to a real matrix such that

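$$\mathbf{Y} = F_{\mathrm{c2r}}(\mathbf{X}) \triangleq \begin{bmatrix}\operatorname{Re}(\mathbf{X}) & -\operatorname{Im}(\mathbf{X})\\ \operatorname{Im}(\mathbf{X}) & \operatorname{Re}(\mathbf{X})\end{bmatrix}. \tag{13}$$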

Let be a function that transforms a complex vector into a real vector such that

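$$\mathbf{y} = f_{\mathrm{c2r}}(\mathbf{x}) \triangleq \left[\operatorname{Re}(\mathbf{x})^{T},\ \operatorname{Im}(\mathbf{x})^{T}\right]^{T}. \tag{14}$$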

Let . We define the sets of AP matrices of problem (6) in real-valued domain as below (the iteration index is dropped for notational convenience).

 (15) (16)

Furthermore we define the following: , , and

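 Π1≜ (17)

$$\mathbf{p}_k \triangleq \boldsymbol{\Pi}_2\mathbf{f}_k,\qquad \mathbf{q}_k \triangleq \boldsymbol{\Pi}_1\mathbf{f}_k\tan\theta,\qquad r_k \triangleq \gamma_k\tan\theta. \tag{18}$$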

In Eq. (17), and are identity and zero matrices respectively. Now we can reformulate the problem (6) in real-valued domain as

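$$\begin{aligned}
\underset{\mathbf{g}}{\text{minimize}}\quad & \|\mathbf{M}_0\mathbf{g}\|^2\\
\text{s.t.}\quad & (+\mathbf{p}_k-\mathbf{q}_k)^{T}\mathbf{M}\mathbf{g}+r_k\le 0,\quad \forall\mathbf{M}\in\mathcal{M}^{+}_{k},\ \forall k\in\mathcal{K},\\
& (-\mathbf{p}_k-\mathbf{q}_k)^{T}\mathbf{M}\mathbf{g}+r_k\le 0,\quad \forall\mathbf{M}\in\mathcal{M}^{-}_{k},\ \forall k\in\mathcal{K}.
\end{aligned} \tag{19}$$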
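The equivalence between the complex-valued constraints (5b)-(5c) and the real-valued constraints in (19) can be checked numerically. The sketch below is not the authors' construction: it assumes $\mathbf{f}_k=f_{\mathrm{c2r}}(\mathbf{h}_k)$, $\mathbf{g}=f_{\mathrm{c2r}}(\mathbf{b})$, $\mathbf{M}=F_{\mathrm{c2r}}(\hat{\mathbf{A}}\odot\mathbf{E})$, and the selection matrices $\boldsymbol{\Pi}_1=\mathrm{blkdiag}(\mathbf{I},-\mathbf{I})$ and $\boldsymbol{\Pi}_2=[\mathbf{0},\mathbf{I};\mathbf{I},\mathbf{0}]$, which is one choice that makes (18)-(19) reproduce the complex-valued inequality.

```python
import math
import random

def F_c2r(X):
    """Eq. (13): complex matrix -> real block matrix [[Re, -Im], [Im, Re]]."""
    top = [[x.real for x in row] + [-x.imag for x in row] for row in X]
    bot = [[x.imag for x in row] + [x.real for x in row] for row in X]
    return top + bot

def f_c2r(x):
    """Eq. (14): complex vector -> [Re(x)^T, Im(x)^T]^T."""
    return [v.real for v in x] + [v.imag for v in x]

def real_constraint_lhs(h, W, b, gamma, theta, sign=+1):
    """(sign * p_k - q_k)^T M g + r_k as in (19), with W standing for A_hat ⊙ E.
    Pi_1 and Pi_2 below are assumptions of this sketch (see lead-in)."""
    n = len(h)
    f = f_c2r(h)
    pi1_f = f[:n] + [-v for v in f[n:]]          # Pi_1 f_k = [Re h; -Im h]
    pi2_f = f[n:] + f[:n]                        # Pi_2 f_k = [Im h;  Re h]
    t = math.tan(theta)
    coeff = [sign * p - q * t for p, q in zip(pi2_f, pi1_f)]
    g = f_c2r(b)
    Mg = [sum(m * gi for m, gi in zip(row, g)) for row in F_c2r(W)]
    return sum(c * v for c, v in zip(coeff, Mg)) + gamma * t

def complex_constraint_lhs(h, W, b, gamma, theta, sign=+1):
    """Reference value: sign * Im(h^T W b) - (Re(h^T W b) - gamma) tan(theta)."""
    v = sum(h[n] * sum(W[n][r] * b[r] for r in range(len(b)))
            for n in range(len(h)))
    return sign * v.imag - (v.real - gamma) * math.tan(theta)

rng = random.Random(0)
def cgauss():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))
h = [cgauss() for _ in range(3)]
W = [[cgauss() for _ in range(2)] for _ in range(3)]
b = [cgauss() for _ in range(2)]
diffs = [abs(real_constraint_lhs(h, W, b, 0.7, math.pi / 8, s)
             - complex_constraint_lhs(h, W, b, 0.7, math.pi / 8, s))
         for s in (+1, -1)]
```

Both values agree to numerical precision for either sign, so a solver working on the real-valued problem enforces the same CI-region constraints as the complex-valued formulation.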

The Lagrangian function of the above problem can be written as

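$$L(\mathbf{g},\boldsymbol{\lambda}) = \|\mathbf{M}_0\mathbf{g}\|^2 - (\boldsymbol{\Psi}\boldsymbol{\lambda})^{T}\mathbf{g} + \mathbf{r}^{T}\boldsymbol{\lambda}, \tag{20}$$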

where denotes the vector of Lagrange multipliers. The matrix and vector are given in Eq. (21) and Eq. (22) respectively.

Taking the infimum of the Lagrangian function w.r.t. we obtain the dual function in terms of , and subsequently we formulate the dual problem of (19) as

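$$\begin{aligned}
\underset{\boldsymbol{\lambda}}{\text{minimize}}\quad & \|\mathbf{N}\boldsymbol{\lambda}\|^2 - \mathbf{r}^{T}\boldsymbol{\lambda} && \text{(23a)}\\
\text{s.t.}\quad & \boldsymbol{\lambda}\ge\mathbf{0}, && \text{(23b)}
\end{aligned}$$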

where . Note that the problem (19) is convex and comprises only affine inequalities in $\mathbf{g}$. Therefore, according to Slater's condition, strong duality holds for this problem when it is feasible [39]. Moreover, one of the KKT conditions dictates that the Lagrangian function (20) has a vanishing gradient w.r.t. $\mathbf{g}$ at an optimal primal point $\mathbf{g}^\star$ and an optimal dual point $\boldsymbol{\lambda}^\star$ [39]. By setting $\nabla_{\mathbf{g}}L(\mathbf{g},\boldsymbol{\lambda})=\mathbf{0}$ we obtain the expression for an optimal primal point of the problem (19) in terms of the corresponding optimal dual point as

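$$\mathbf{g}^\star = \frac{1}{2}\left(\mathbf{M}_0^{T}\mathbf{M}_0\right)^{-1}\boldsymbol{\Psi}\boldsymbol{\lambda}^\star. \tag{24}$$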

In the following we design an iterative algorithm to solve the dual problem (23) optimally.

Approximate problem: Let be the total number of elements in vector and . In problem (23) the objective function is convex in each variable for . Therefore, based on the Jacobi theorem [40, 37] we construct an approximate problem for the problem (23) in the th iteration around a given point as

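$$\begin{aligned}
\underset{\lambda_w,\ w\in\mathcal{W}}{\text{minimize}}\quad & \sum_{w=1}^{W}\left(\left\|\mathbf{N}_{-w}\boldsymbol{\lambda}^{i}_{-w}+\mathbf{n}_w\lambda_w\right\|^2-\mathbf{r}_{-w}^{T}\boldsymbol{\lambda}^{i}_{-w}-r_w\lambda_w\right) && \text{(25a)}\\
\text{s.t.}\quad & \lambda_w\ge 0,\quad \forall w\in\mathcal{W}, && \text{(25b)}
\end{aligned}$$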

where denotes the matrix obtained by discarding the th column from matrix , denotes the vector obtained by discarding the th element from vector , and denotes the vector obtained by eliminating the th element from vector . Let denote the optimal solution of this problem. Then, according to the Jacobi theorem represents a descent direction of the objective function (23a) in the domain of problem (23) [40, 37]. Therefore, the current point can be updated in the descent direction of the objective function (23a) as

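$$\boldsymbol{\lambda}^{i+1} = \boldsymbol{\lambda}^{i} + \eta^{i}\left(\hat{\boldsymbol{\lambda}}-\boldsymbol{\lambda}^{i}\right), \tag{26}$$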

where is an appropriate step-size, with . When the iterative algorithm has converged to the global optimal solution of problem (23).

Decomposition of the approximate problem: The objective function (25a) comprises summands, where each summand contains only one optimization variable . Moreover, the constraints set in (25b) is a Cartesian product of convex sets, with each convex set defined by only one optimization variable . Therefore we can decompose the problem (25) into independent subproblems [37], each involving only one optimization variable , as

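$$\hat{\lambda}_w = \underset{\lambda_w\ge 0}{\arg\min}\ \left\|\mathbf{N}_{-w}\boldsymbol{\lambda}^{i}_{-w}+\mathbf{n}_w\lambda_w\right\|^2 - r_w\lambda_w, \tag{27}$$

$\forall w\in\mathcal{W}$, where the constant term $\mathbf{r}_{-w}^{T}\boldsymbol{\lambda}^{i}_{-w}$ has been dropped without affecting the optimal solution.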

Closed-form solution of the subproblems: The objective function in subproblem (27) is convex in , and it comprises only an affine inequality, namely, . According to Slater’s condition the strong duality holds for the subproblem and its dual, and KKT conditions are satisfied by the primal and dual optimal points of the subproblem [39]. The Lagrangian of subproblem (27) can be written as

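$$L(\lambda_w,\mu_w) = \left\|\mathbf{N}_{-w}\boldsymbol{\lambda}^{i}_{-w}+\mathbf{n}_w\lambda_w\right\|^2 - r_w\lambda_w - \mu_w\lambda_w, \tag{28}$$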

where is the Lagrange multiplier. Using the KKT conditions we derive a closed-form expression for as

 (29)

Optimal step-size computation: Based on the exact line search method [37], we can formulate an optimization problem to compute the optimal step-size that minimizes the objective function (23a) between the current point and the descent direction as

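$$\eta^{i} = \underset{0\le\eta\le 1}{\arg\min}\ \underbrace{\left\|\mathbf{N}\left(\boldsymbol{\lambda}^{i}+\eta(\hat{\boldsymbol{\lambda}}-\boldsymbol{\lambda}^{i})\right)\right\|^2 - \mathbf{r}^{T}\left(\boldsymbol{\lambda}^{i}+\eta(\hat{\boldsymbol{\lambda}}-\boldsymbol{\lambda}^{i})\right)}_{\mathring{f}(\eta)}. \tag{30}$$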

The function in the above problem is convex and differentiable in . Differentiating w.r.t. and equating the gradient to zero, we obtain a closed-form expression for the optimal solution of problem (30) as

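$$\eta^{i} = \left[\frac{-2(\mathbf{N}\boldsymbol{\lambda}^{i})^{T}\mathbf{N}(\hat{\boldsymbol{\lambda}}-\boldsymbol{\lambda}^{i})+\mathbf{r}^{T}(\hat{\boldsymbol{\lambda}}-\boldsymbol{\lambda}^{i})}{2\left(\mathbf{N}(\hat{\boldsymbol{\lambda}}-\boldsymbol{\lambda}^{i})\right)^{T}\mathbf{N}(\hat{\boldsymbol{\lambda}}-\boldsymbol{\lambda}^{i})}\right]_{0}^{1}. \tag{31}$$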

Termination: When the iterative algorithm has converged to the global optimal solution of problem (23) [37]. In practical applications where a finite numerical precision is sufficient, the iterations can be terminated when , where is a small positive scalar that controls the numerical precision of the scheme.

The above proposed scheme to solve the problem (6) is summarized in Alg. 2.
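The iteration described in this section can be sketched for the dual problem (23) as follows. This is an illustrative implementation, not Alg. 2 itself: the per-coordinate update is the stationarity condition of subproblem (27) projected onto $\lambda_w\ge 0$, written out here and possibly differing in form from the paper's closed-form expression; the stopping rule (largest coordinate change below `tol`), the zero initial point, and the function name are assumptions. The step-size follows (31).

```python
def solve_dual(N, r, tol=1e-10, max_iter=10000):
    """Minimise ||N lam||^2 - r^T lam subject to lam >= 0, cf. (23).
    N is a list of rows; columns of N are assumed to be nonzero."""
    rows, W = len(N), len(r)
    cols = [[N[i][w] for i in range(rows)] for w in range(W)]   # columns n_w
    sq = [sum(v * v for v in c) for c in cols]                  # ||n_w||^2
    lam = [0.0] * W
    for _ in range(max_iter):
        Nl = [sum(N[i][w] * lam[w] for w in range(W)) for i in range(rows)]
        # Jacobi step: each coordinate is minimised independently of the
        # others, so this loop could run in parallel.
        hat = []
        for w in range(W):
            # n_w^T N_{-w} lam_{-w} = n_w^T N lam - ||n_w||^2 lam_w
            cross = sum(c * v for c, v in zip(cols[w], Nl)) - sq[w] * lam[w]
            if sq[w] > 0:
                hat.append(max(0.0, (r[w] - 2.0 * cross) / (2.0 * sq[w])))
            else:
                hat.append(lam[w])
        d = [h - l for h, l in zip(hat, lam)]
        if max(abs(v) for v in d) < tol:
            break
        # Exact line search along d, clipped to [0, 1] as in (31).
        Nd = [sum(N[i][w] * d[w] for w in range(W)) for i in range(rows)]
        num = (sum(rw * dw for rw, dw in zip(r, d))
               - 2.0 * sum(a * b for a, b in zip(Nl, Nd)))
        den = 2.0 * sum(v * v for v in Nd)
        eta = min(1.0, max(0.0, num / den)) if den > 0 else 1.0
        lam = [l + eta * dw for l, dw in zip(lam, d)]
    return lam

# Small example with arbitrary numbers.
N_demo = [[1.0, 0.3, 0.0], [0.0, 1.0, 0.4], [0.2, 0.0, 1.0]]
r_demo = [1.0, -0.5, 2.0]
lam_demo = solve_dual(N_demo, r_demo)
```

Once a dual solution is available, the primal point follows from (24); that step needs $\boldsymbol{\Psi}$ and $\mathbf{M}_0$ and is omitted from the sketch.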

## VI Block-level Analog Precoding

In the previous sections we focused on designing the DPs under the premise that the APs are fixed. In this section we discuss techniques to design APs in a CI-based hybrid precoding setting. There are two types of APs that are generally used in hybrid precoding systems, namely, continuous-valued APs [7] and codebook-based APs [41, 10]. A continuous-valued AP has more degrees of freedom, as each of its elements can take any phase value between 0 and $2\pi$. However, its realization requires expensive high-resolution tunable PSs. On the other hand, in codebook-based analog precoding the APs are selected from a predefined codebook that is commonly realized in hardware with switchable spatial filter banks composed of inexpensive fixed PSs [42]. Due to the reduced degrees of freedom, the codebook-based APs require an increased transmit power to fulfill a certain QoS as compared to the continuous-valued APs.

Paper [18] compares the performance of different symbol-level AP design techniques in a CI-based hybrid precoding system. These techniques can be used to obtain the AP matrix at every symbol-interval before employing the proposed algorithm to compute the robust DPs. However, symbol-level AP design can become inappropriate in many scenarios, such as ultra-low latency applications of 5G networks that require symbol durations of a few microseconds [19]. In such cases, symbol-level analog precoding can cause drastic performance degradation in hybrid precoding systems with inexpensive PSs whose transient response time is in the order of microseconds (e.g., PSs comprising RF MEMS [43]). To overcome this shortcoming we propose block-level analog precoding, where an AP matrix that is suitable for a block of symbol-intervals is designed, instead of recomputing it in every symbol-interval. We choose , where is the coherence time of the channel, so that the block-level AP matrix can be designed using the known constant channel matrix. In the following we extend the application of the methods of [18] to block-level analog precoding.

### VI-A Continuous-valued AP design

#### VI-A1 Conjugate phase of channel (CPC) method

In this method the BS assigns a dedicated RF chain to each user. Then, the array gain between the th user and the associated RF chain is maximized by assigning the conjugate phase values of the elements of channel vector to the corresponding elements of the AP , i.e., , where indicates the phase value of the th element of channel vector [6]. We remark that in this method the AP matrix is independent of the transmit symbol vector and remains the same as long as the channel is constant. Thus the method is inherently suitable for block-level AP design.
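The CPC rule can be illustrated with a short sketch. It assumes that each AP entry takes the form $a\,e^{-j\varphi_n}$, where $\varphi_n$ is the phase of the $n$th channel entry and $a$ the common PS gain; the symbol $\varphi_n$, the function name, and the example channel are introduced here for illustration only.

```python
import cmath

def cpc_precoder(h, gain=1.0):
    """Conjugate-phase AP for one user: a_n = gain * exp(-j * angle(h_n)).
    Every entry has modulus `gain`, as required of a phase-shifter network."""
    return [gain * cmath.exp(-1j * cmath.phase(hn)) for hn in h]

h_demo = [1 + 1j, -2j, 0.5]
a_demo = cpc_precoder(h_demo)
# With the plain transpose used in Eq. (1), the phases cancel and the array
# gain h^T a equals gain * sum_n |h_n|, the largest value achievable with
# constant-modulus entries.
array_gain = sum(hn * an for hn, an in zip(h_demo, a_demo))
```

Since the rule depends on the channel only, the same AP can be reused for every symbol-interval within a coherence block.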

### VI-B Codebook-based AP design

In the codebook-based AP design techniques, the APs are chosen from a predefined set , where . Let be the corresponding codebook matrix.

#### VI-B1 Margin widening and selection operator (MWASO)

In [18] a sparsity-based AP selection technique, termed as MWASO, is devised to select APs from the codebook that maximize a utility function of the system. Here, we extend this technique to enable block-level analog precoding over symbol-intervals by formulating a block-sparsity-based convex optimization problem [44, 45] as

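$$\begin{aligned}
\underset{\Upsilon\in\mathbb{R},\ \{\mathbf{x}_t\}_{t\in\mathcal{T}}}{\text{minimize}}\quad & \Upsilon+\epsilon\|\mathbf{X}\|_{2,1} && \text{(32a)}\\
\text{s.t.}\quad & \left|\operatorname{Im}\!\left(s_k^{*}\tilde{\mathbf{h}}_k^{T}\mathbf{C}\mathbf{x}_t\right)\right| \le \left(\operatorname{Re}\!\left(s_k^{*}\tilde{\mathbf{h}}_k^{T}\mathbf{C}\mathbf{x}_t\right)-(\gamma_k-\Upsilon)\right)\tan\theta,\quad \forall k\in\mathcal{K},\ \forall t\in\mathcal{T}. && \text{(32b)}
\end{aligned}$$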

In this problem, determines the minimum margin between the received signals and the decision boundaries of the associated transmit symbols of the th user over all symbol-intervals. The optimization variable in the objective function along with the constraints in (32b) forces the received signals towards the interior of the CI-region for all users over all symbol-intervals. The optimization matrix acts as the selection operator. The mixed norm in the objective function promotes row sparsity on the matrix [45], thereby allowing the selection of APs from the codebook matrix that are appropriate for all symbol-intervals. The positive scalar is an appropriate weighting factor, which can be chosen, e.g., using the bisection method, to force the number of non-zero rows in to equal . Subsequently, the columns of the codebook matrix that correspond to the non-zero rows of the optimal solution form the AP matrix .

#### VI-B2 Best matching code selection (BMCS) method

In this method, for each user the AP from the codebook that maximizes the inner product with its channel vector is selected [18]. Similar to the CPC method, this method designs the AP matrix independent of the transmit symbol vector, and hence it is inherently suitable for block-level AP design.
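A minimal sketch of the BMCS selection rule is given below. Two assumptions are made: the matching score is taken as $|\tilde{\mathbf{h}}_k^{T}\mathbf{c}|$ (plain transpose, consistent with Eq. (1)), and the example codebook is a hypothetical DFT-type codebook chosen only to make the demonstration verifiable. The function name is illustrative, and the sketch does not handle the case where two users select the same codeword.

```python
import cmath
import math

def bmcs_select(H, C):
    """For each user channel h_k in H, return the index of the codeword c in C
    that maximises |h_k^T c|. C is a list of codewords (codebook columns)."""
    chosen = []
    for h in H:
        scores = [abs(sum(hn * cn for hn, cn in zip(h, c))) for c in C]
        chosen.append(max(range(len(C)), key=scores.__getitem__))
    return chosen

# Hypothetical DFT-type codebook for N = 4 antennas (assumption of the sketch).
N_ANT = 4
codebook = [[cmath.exp(-2j * math.pi * m * n / N_ANT) for n in range(N_ANT)]
            for m in range(N_ANT)]
# Channels aligned with codewords 2 and 0, so the expected selection is known.
channels = [[c.conjugate() for c in codebook[2]],
            [c.conjugate() for c in codebook[0]]]
selected = bmcs_select(channels, codebook)
```

The selected columns are then stacked to form the block-level AP matrix, one column per user and RF chain.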

After obtaining the block-level AP matrix by employing any appropriate method, we proceed to design the robust DPs at every symbol-interval as discussed in the previous sections.

## VII Numerical Results

In this section, we first evaluate the QoS degradation at the users due to phase errors in the PSs in the case of CI-based non-robust hybrid precoding to validate the necessity of robust hybrid precoding. Then we compare the performance of the proposed robust precoding with that of a conventional robust precoding technique. Afterwards, we compare the performance of the proposed CI-based hybrid precoding with that of the interference-suppression-based state-of-the-art precoding schemes for non-robust scenarios. Next, we numerically evaluate the performance of different block-level analog precoding methods. Finally, we examine the computational complexity of the different methods. For the simulation we employ a geometric channel model [46, 6].

In interference-suppression-based precoding systems, the SINR metric is generally used to measure the quality of the received signal, as the SINR controls the achieved SER. However, in CI-based precoding systems the interference plays a constructive role and does not necessarily cause symbol errors; therefore, the SINR is not an appropriate metric to measure the quality of the received signals in such systems. In order to quantify the received signal quality in CI-based precoding in a noisy environment, we introduce the metric Threshold-margin-to-Noise power Ratio (TNR), which is defined as . It measures the relative margin between the CI-region and the corresponding decision boundaries w.r.t. the noise power, and directly influences the achieved SER. The empirical relations between SNR, TNR, and SER for different modulation schemes are provided in Appendix C.

### VII-A QoS degradation due to errors in PSs

In order to determine the extent of the QoS degradation at the users due to errors in the PSs in CI-based hybrid precoding systems, we employ the non-robust precoding and compute the resulting SER. Table I lists the percentage increase in SER, relative to the expected value (which is achieved in the case of ideal PSs), due to errors in the PSs for different values of the phase error bound . The table indicates a significant deterioration in QoS, reaffirming the need for robust precoding in such scenarios. The proposed robust precoding is designed to handle the worst-case scenario; thus, it completely eliminates the symbol errors resulting from phase errors in the PSs (i.e., 0% SER increase).

### VII-B Proposed vs. conventional robust hybrid precoding

A conventional approach to obtain robust DPs is to design non-robust DPs that target a higher QoS than the required QoS. This technique provides robustness against errors by assigning extra power to the DPs, compared to the power required to achieve the actual QoS in the error-free scenario [47, 48]. This method can be extended to CI-based hybrid precoding by appropriately choosing a new TNR value for the non-robust precoding such that, in the presence of phase errors in the PSs and additive noise at the users, it achieves an SER performance similar to that of the optimal worst-case robust DPs. In our simulation we choose the new TNR values using the empirical relation between , TNR, and SER given in Table IV of Appendix D.

Table II compares the performance of the proposed optimal robust precoding method and the conventional method. For different values of we design robust HPs using the proposed algorithm to achieve a TNR = 2, and compute the required transmit power and the resulting SER. Then, using Table IV, we determine the TNR value required to achieve a similar SER performance for the given with non-robust precoding. For this tuned TNR value we design the non-robust DPs and compute the resulting transmit power . Table II reveals that the conventional method requires significantly more transmit power than the optimal robust precoding method.

### VII-C CI-based precoding vs. state-of-the-art precoding schemes

In this subsection we compare the SER achieved by the proposed CI-based hybrid precoding with that of the following state-of-the-art hybrid precoding schemes: the PZF method proposed in [6] and the interference-suppression-based hybrid precoding method (IS-based HP) proposed in [7].

In the formulated CI-based hybrid precoding problem, the objective is to minimize the transmit power for a given TNR. The competing methods, however, aim to maximize the spectral efficiency (or SINR) for a given power budget. In order to facilitate a fair comparison, we utilize the empirical relation between SNR, TNR, and SER given in Appendix C, and obtain SER vs. transmit power relations for all methods. For the simulation we adopt the following system parameters: transmit antennas, and users. We use RF chains in the case of the proposed CI-based hybrid precoding and the PZF method, and for IS-based HP (this method requires ). The PSs are assumed to be ideal.

Fig. 5 depicts the SER achieved by all three methods for the BPSK, QPSK, and 8-PSK modulation schemes. The figure shows that the proposed method considerably reduces the SER compared to the competing methods for all considered modulation schemes (by a factor of approximately 500 for BPSK at a transmit power of 5 dB).

### VII-D CI-based hybrid precoding vs. fully-digital precoding

Fig. 6 compares the SER (in log-scale) achieved by the proposed CI-based hybrid precoding with that of the optimal CI-based fully-digital precoding [13] and the conventional SINR fulfillment-based fully-digital precoding [17, 4] (referred to as Conv. fully-DP), for a system with transmit antennas and users.

Both CI-based and SINR fulfillment-based fully-digital precoding assume the number of RF chains . The proposed method is employed for different values of , and the APs are chosen from a 64×64 DFT codebook using the MWASO method. The figure also includes the SER achieved by the CI-based hybrid precoding with continuous-valued APs, which are designed employing the CPC method.
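For readers who wish to reproduce a DFT codebook of this size, a minimal sketch follows. The sign of the exponent and the absence of a normalization factor are assumptions (the text does not specify them); the names `dft_codebook` and `column` are hypothetical. Each column is one candidate AP with unit-modulus entries, so it can be realized with phase shifters only.

```python
import cmath

def dft_codebook(n):
    """n x n DFT matrix with unit-modulus entries; column m is one candidate AP."""
    return [[cmath.exp(-2j * cmath.pi * row * col / n) for col in range(n)]
            for row in range(n)]

def column(matrix, m):
    """Extract column m of a matrix stored as a list of rows."""
    return [row[m] for row in matrix]

F = dft_codebook(64)

# Every entry has unit modulus, as required by a phase-shifter network.
assert all(abs(abs(x) - 1.0) < 1e-12 for row in F for x in row)

# Distinct columns are mutually orthogonal.
c3, c7 = column(F, 3), column(F, 7)
print(abs(sum(a.conjugate() * b for a, b in zip(c3, c7))) < 1e-9)  # True
```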

The figure reveals that the CI-based hybrid precoding (even with , and codebook-based APs) yields significantly better performance than the conventional SINR fulfillment-based fully-digital precoding. As we increase the number of RF chains, the SER of the CI-based hybrid precoding gradually approaches that of the optimal CI-based fully-digital precoding. Moreover, we notice that continuous-valued analog precoding yields considerably better results than codebook-based analog precoding, at the cost of expensive high-resolution PSs.

### VII-E Evaluation of block-level analog precoding techniques

In this subsection, we compare the performance of different block-level analog precoding techniques in terms of the transmit power that the resulting HPs require to achieve a certain QoS. In order to facilitate a fair comparison, the optimal CI-based DPs are designed after applying each block-level analog precoding technique. Fig. 7 plots the transmit power of the HPs obtained with the different methods over a range of block-length .

The figure also plots the transmit power required in the case of CI-based fully-digital precoding for comparison. In the simulation we assume the channel is constant for symbol-intervals. As discussed in Section VI, the CPC and BMCS methods are based on the channel matrix and are independent of the transmit symbols. Therefore, the transmit powers associated with these methods are constant over the block-length . In contrast, the MWASO method designs the APs based on both the channel matrix and the transmit symbol vector. Thus, the transmit power required by the MWASO method increases with . The figure also reveals that the continuous-valued CPC method is considerably more efficient than the codebook-based MWASO and BMCS methods in terms of transmit power. Among the codebook-based methods, the MWASO method requires less transmit power than the heuristic BMCS method, but it is computationally more demanding because it involves solving an optimization problem.

### VII-F Computational complexity analysis

In this subsection, we evaluate the computational complexity of the different methods discussed in this paper in terms of their computation time. The simulations are conducted on a system with the following configuration: Intel(R) Core(TM) i7-4790K CPU at 4.00 GHz, Arch Linux 4.16.8, MATLAB 2018b, and CVX 2.1 with the SDPT3 solver.

Table III lists the average computation time required to implement the proposed robust hybrid precoding using CVX with SDPT3 and the proposed scheme (Alg. 2) for different values of phase error bound . In the table we notice that the proposed scheme is significantly faster than the SDPT3 solver.

Fig. 8 plots the computation time required by different methods to compute the non-robust HPs for different values of . The figure reveals that the CI-based HPs can be computed significantly faster using the proposed scheme in Alg. 2 than using CVX with SDPT3. Even though the PZF method is faster than the proposed method, we have seen in Section VII-C that it is inferior to the proposed method in terms of SER performance.

We leave a rigorous theoretical and numerical analysis of the computational complexities of the different methods, and their comparison, to future work.

## VIII Conclusion

In this paper we developed an algorithm for computing the optimal CI-based digital precoders with robustness against errors in the PSs. We also devised a scheme to implement the proposed algorithm efficiently in a distributed manner on parallel hardware architectures. Furthermore, we proposed block-level analog precoding techniques, which are necessary for ultra-low latency applications. The simulation results validated the need for CI-based robust hybrid precoding, and demonstrated the superiority of the proposed precoding over a conventional robust hybrid precoding method. The results also illustrated the superiority of the CI-based hybrid precoding over the interference-suppression-based state-of-the-art schemes. We further verified that the devised scheme implements the robust precoding significantly faster than the general-purpose solver SDPT3. Finally, we inferred from the simulations that continuous-valued APs yield significantly better performance in block-level precoding than codebook-based APs, at the cost of high-resolution PSs.

## Appendix A Closed-form expressions for the worst-case error matrices

Consider the objective function of problem (9)

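$$\hat{f} \triangleq \operatorname{Im}\!\big(\mathbf{h}_k^{\mathsf{T}}(\hat{\mathbf{A}}\odot\mathbf{E})\,\mathbf{b}_i^{\star}\big) - \Big(\operatorname{Re}\!\big(\mathbf{h}_k^{\mathsf{T}}(\hat{\mathbf{A}}\odot\mathbf{E})\,\mathbf{b}_i^{\star}\big) - \gamma_k\Big)\tan\theta.$$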

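Let $g \triangleq \mathbf{h}_k^{\mathsf{T}}(\hat{\mathbf{A}}\odot\mathbf{E})\,\mathbf{b}_i^{\star}$. We can rewrite $g$ as

$$g = \sum_{\forall n\in\mathcal{N}}\ \sum_{\forall r\in\mathcal{R}} h_{kn}\, b_{i,r}^{\star}\, \hat{a}_{nr}\, e_{nr},$$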

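where $b_{i,r}^{\star}$ denotes the $r$th element of vector $\mathbf{b}_i^{\star}$. This reveals that the objective function $\hat{f}$ is separable in each optimization variable $e_{nr}$. Therefore, $\hat{f}$ can be maximized separately w.r.t. each $e_{nr}$ for $n\in\mathcal{N}$, $r\in\mathcal{R}$. Consider a summand of $g$. Define $\chi \triangleq h_{kn}\, b_{i,r}^{\star}\, \hat{a}_{nr} = \bar{\chi} + j\tilde{\chi}$, and $e_{nr} \triangleq \alpha + j\beta$. Substituting these new definitions, the part of function $\hat{f}$ that comprises the variable $e_{nr}$ can be expressed as

$$\tilde{f}(\alpha,\beta) = \underbrace{(\tilde{\chi}-\bar{\chi}\tan\theta)}_{\kappa}\,\alpha + \underbrace{(\bar{\chi}+\tilde{\chi}\tan\theta)}_{\tau}\,\beta.$$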
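The decomposition into the coefficients κ and τ can be checked numerically. The sketch below assumes that the bar and tilde denote the real and imaginary parts of the per-element product χ and that α and β are the real and imaginary parts of the unit-modulus error term; the function names and the test values are hypothetical, and the constant term involving γ_k is omitted because it does not depend on the error.

```python
import cmath
import math

def objective_term(chi, e, theta):
    """Im(chi*e) - Re(chi*e)*tan(theta) for a single phase shifter."""
    p = chi * e
    return p.imag - p.real * math.tan(theta)

def kappa_tau(chi, theta):
    """Coefficients of alpha and beta in the separated objective."""
    t = math.tan(theta)
    kappa = chi.imag - chi.real * t
    tau = chi.real + chi.imag * t
    return kappa, tau

chi = 0.7 - 1.3j            # hypothetical per-element product
theta = math.pi / 4         # hypothetical decision-region angle
e = cmath.exp(1j * 0.2)     # unit-modulus error with a phase of 0.2 rad

kappa, tau = kappa_tau(chi, theta)
lhs = objective_term(chi, e, theta)
rhs = kappa * e.real + tau * e.imag
print(abs(lhs - rhs) < 1e-12)  # True
```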

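The constraints on phase error values, given by $|e_{nr}| = 1$ and $|\angle e_{nr}| \le \delta$, can be equivalently expressed as $\alpha^2 + \beta^2 = 1$ and $\alpha \ge \cos\delta$. Substituting $\beta = \pm\sqrt{1-\alpha^2}$ in the above equation, we get a new equivalent function $f(\alpha) = \kappa\alpha \pm \tau\sqrt{1-\alpha^2}$. This function comprises the following two variants:

$$f_1(\alpha) = \kappa\alpha + \tau\sqrt{1-\alpha^2}, \qquad f_2(\alpha) = \kappa\alpha - \tau\sqrt{1-\alpha^2}.$$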

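Note that $\sqrt{1-\alpha^2}$ is a concave function (the square-root function is a non-decreasing concave function and $1-\alpha^2$ is a concave function) [39]. We can identify two cases based on the value of $\tau$. In the first case, when $\tau \ge 0$, $f_1$ is a concave function, $f_2$ is a convex function, and $f_1(\alpha) \ge f_2(\alpha)$ for all feasible $\alpha$. Moreover, an optimal point $\alpha^\star$ that maximizes $f_1$ also maximizes $\tilde{f}$ together with $\beta^\star = \sqrt{1-{\alpha^\star}^2}$. Similarly, in the second case, when $\tau < 0$, an optimal point $\alpha^\star$ that maximizes the (then) concave function $f_2$ also maximizes $\tilde{f}$ together with $\beta^\star = -\sqrt{1-{\alpha^\star}^2}$.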

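If $\tau \ge 0$, we can obtain the optimal point that maximizes $f_1$ by differentiating it w.r.t. $\alpha$ and equating the derivative to zero, i.e.,

$$\frac{\mathrm{d}f_1}{\mathrm{d}\alpha} = \kappa - \frac{\tau\alpha}{\sqrt{1-\alpha^2}} = 0 \;\Longrightarrow\; \alpha^\star = \frac{\kappa}{\sqrt{\kappa^2+\tau^2}}.$$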

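Similarly, if $\tau < 0$, we can obtain the optimal point that maximizes $f_2$ as

$$\frac{\mathrm{d}f_2}{\mathrm{d}\alpha} = \kappa + \frac{\tau\alpha}{\sqrt{1-\alpha^2}} = 0 \;\Longrightarrow\; \alpha^\star = \frac{\kappa}{\sqrt{\kappa^2+\tau^2}}.$$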
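The algebra behind both stationarity conditions can be written out in one step. The following worked derivation uses only the quantities κ, τ, and α from above; the upper sign corresponds to the first variant (with non-negative τ) and the lower sign to the second variant (with negative τ).

```latex
\begin{align*}
\kappa\sqrt{1-\alpha^2} &= \pm\,\tau\alpha \\
\kappa^2\,(1-\alpha^2) &= \tau^2\alpha^2 \\
\alpha^2 &= \frac{\kappa^2}{\kappa^2+\tau^2}
\quad\Longrightarrow\quad
|\alpha^\star| = \frac{|\kappa|}{\sqrt{\kappa^2+\tau^2}}.
\end{align*}
```

In both cases the factor multiplying α in the first line ($\tau$ in the first case, $-\tau$ in the second) is non-negative and the square root is non-negative, so α must have the same sign as κ, which removes the absolute values and yields the same expression for the stationary point in both cases.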

(In the above derivations, we have explicitly used the prior knowledge of the sign of $\tau$ and the intermediate result that the sign of $\alpha^\star$ must be the same as the sign of $\kappa$.)

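Recall that the function $f_1$ is concave in $\alpha$ if $\tau \ge 0$ and $f_2$ is concave in $\alpha$ if $\tau < 0$. Therefore, if the obtained optimal point $\alpha^\star$ is smaller than $\cos\delta$, then we can simply enforce $\alpha^\star = \cos\delta$ to get the optimal point within the domain of the phase error that maximizes $\tilde{f}$. Substituting the expressions for $\kappa$ and $\tau$, we get $e_{nr}^\star = \alpha^\star + j\beta^\star$, where

$$\alpha^\star = \max\!\left(\cos\delta,\ \frac{\tilde{\chi}\cos\theta-\bar{\chi}\sin\theta}{|\bar{\chi}+j\tilde{\chi}|}\right), \qquad \beta^\star = \operatorname{sign}(\tau)\sqrt{1-{\alpha^\star}^2} = \frac{\bar{\chi}+\tilde{\chi}\tan\theta}{|\bar{\chi}+\tilde{\chi}\tan\theta|}\sqrt{1-{\alpha^\star}^2}.$$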
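As a sanity check, the closed-form maximizer can be compared against a brute-force search over the admissible phase range. The sketch below is an illustration under stated assumptions: the error is parametrized as a unit-modulus term with phase in the interval from −δ to δ, the sign of a zero τ is taken as +1, and all function names and test values are hypothetical.

```python
import math

def coefficients(chi, theta):
    """kappa and tau of the separated objective for one phase shifter."""
    t = math.tan(theta)
    return chi.imag - chi.real * t, chi.real + chi.imag * t

def worst_case_closed_form(chi, theta, delta):
    """Closed-form (alpha, beta) and the resulting objective value."""
    kappa, tau = coefficients(chi, theta)
    alpha = max(math.cos(delta), kappa / math.hypot(kappa, tau))
    sign_tau = 1.0 if tau >= 0 else -1.0  # sign(0) taken as +1 (assumption)
    beta = sign_tau * math.sqrt(max(0.0, 1.0 - alpha * alpha))
    return alpha, beta, kappa * alpha + tau * beta

def worst_case_grid(chi, theta, delta, steps=20001):
    """Brute-force maximum of kappa*cos(phi) + tau*sin(phi) over |phi| <= delta."""
    kappa, tau = coefficients(chi, theta)
    best = -math.inf
    for i in range(steps):
        phi = -delta + 2 * delta * i / (steps - 1)
        best = max(best, kappa * math.cos(phi) + tau * math.sin(phi))
    return best

theta, delta = math.pi / 4, math.radians(10)
# The third value has its unconstrained maximizer inside the phase range;
# the others are clipped to the boundary alpha = cos(delta).
for chi in (0.7 - 1.3j, -0.4 + 0.9j, -1.0 + 1.1j, -2.0 - 0.1j):
    _, _, closed = worst_case_closed_form(chi, theta, delta)
    assert abs(closed - worst_case_grid(chi, theta, delta)) < 1e-6
print("closed form matches grid search")
```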

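Let $z_{nr} \triangleq h_{kn}\, b_{i,r}^{\star}\, \hat{a}_{nr}$. Then, the worst-case error values for all PSs at the BS can be obtained efficiently by computing the error matrix whose $(n,r)$th element is $u_{nr}^{+} + j\,w_{nr}^{+}$, where

$$u_{nr}^{+} = \max\!\left(\cos\delta,\ \frac{\operatorname{Im}(z_{nr})\cos\theta-\operatorname{Re}(z_{nr})\sin\theta}{|z_{nr}|}\right), \qquad w_{nr}^{+} = \frac{\operatorname{Re}(z_{nr})+\operatorname{Im}(z_{nr})\tan\theta}{|\operatorname{Re}(z_{nr})+\operatorname{Im}(z_{nr})\tan\theta|}\sqrt{1-(u_{nr}^{+})^2}.$$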

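Similarly, we can derive the expression for the error matrix whose $(n,r)$th element is $u_{nr}^{-} + j\,w_{nr}^{-}$, where

$$u_{nr}^{-} = \max\!\left(\cos\delta,\ \frac{-\operatorname{Im}(z_{nr})\cos\theta-\operatorname{Re}(z_{nr})\sin\theta}{|z_{nr}|}\right), \qquad w_{nr}^{-} = \frac{-\operatorname{Re}(z_{nr})+\operatorname{Im}(z_{nr})\tan\theta}{|-\operatorname{Re}(z_{nr})+\operatorname{Im}(z_{nr})\tan\theta|}\sqrt{1-(u_{nr}^{-})^2}.$$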
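The two element-wise expressions differ only in the signs with which the imaginary part enters u and the real part enters w, so both error matrices can be produced by one routine. The sketch below is an illustration under stated assumptions: each matrix entry is formed as u + jw, the element z is the product of the channel entry, the digital precoder entry, and the nominal AP entry, and the names `worst_case_entry`, `worst_case_matrix`, and all toy values are hypothetical.

```python
import cmath
import math

def worst_case_entry(z, theta, delta, flip):
    """Closed-form worst-case error u + j*w for one phase shifter.

    flip=+1 gives the '+' variant and flip=-1 the '-' variant, in which
    Im(z) enters u and Re(z) enters w with the opposite sign.
    """
    u = max(math.cos(delta),
            (flip * z.imag * math.cos(theta) - z.real * math.sin(theta)) / abs(z))
    tau = flip * z.real + z.imag * math.tan(theta)
    sign = 1.0 if tau >= 0 else -1.0  # sign(0) taken as +1 (assumption)
    return complex(u, sign * math.sqrt(max(0.0, 1.0 - u * u)))

def worst_case_matrix(Z, theta, delta, flip):
    """Apply the closed form to every element of Z (list of rows)."""
    return [[worst_case_entry(z, theta, delta, flip) for z in row] for row in Z]

# Hypothetical toy data: 2 antennas, 2 RF chains.
h = [0.8 + 0.3j, -0.5 + 1.1j]      # channel entries of one user
b = [1.2 - 0.4j, 0.3 + 0.9j]       # digital precoder entries
A_hat = [[1, 1j], [-1j, -1]]       # nominal unit-modulus AP entries
Z = [[h[n] * b[r] * A_hat[n][r] for r in range(2)] for n in range(2)]

theta, delta = math.pi / 4, math.radians(5)
E_plus = worst_case_matrix(Z, theta, delta, +1)
E_minus = worst_case_matrix(Z, theta, delta, -1)

# Every worst-case error has unit modulus and a phase within the bound.
for E in (E_plus, E_minus):
    for row in E:
        for e in row:
            assert abs(abs(e) - 1.0) < 1e-12
            assert abs(cmath.phase(e)) <= delta + 1e-9
```

Since every element is computed independently, the loop over the matrix entries is trivially parallelizable.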

## Appendix B Convergence properties of Alg. 1

Lemma 1: The problems (