# Doubling Phase Shifters for Efficient Hybrid Precoder Design in Millimeter-Wave Communication Systems

Xianghao Yu, Jun Zhang, and Khaled B. Letaief

This work was presented in part at the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, Nov. 2016 [1]. X. Yu, J. Zhang, and K. B. Letaief are with the Department of Electronic and Computer Engineering, the Hong Kong University of Science and Technology (HKUST), Kowloon, Hong Kong (e-mail: {xyuam, eejzhang, eekhaled}@ust.hk). K. B. Letaief is also with Hamad Bin Khalifa University, Doha, Qatar (e-mail: kletaief@hbku.edu.qa). This work was supported by the Hong Kong Research Grants Council under Grant No. 16210216.

###### Abstract

Hybrid precoding is a cost-effective approach to support directional transmissions for millimeter-wave (mm-wave) communications, but its precoder design is highly complicated. In this paper, we propose a new hybrid precoder implementation, namely the double phase shifter (DPS) implementation, which enables highly tractable hybrid precoder design. Efficient algorithms are then developed for two popular hybrid precoder structures, i.e., the fully- and partially-connected structures. For the fully-connected one, the RF-only precoding and hybrid precoding problems are formulated as a least absolute shrinkage and selection operator (LASSO) problem and a low-rank matrix approximation problem, respectively. In this way, computationally efficient algorithms are provided to approach the performance of the fully digital precoder with a small number of radio frequency (RF) chains. On the other hand, the hybrid precoder design in the partially-connected structure is identified as an eigenvalue problem. To enhance the performance of this cost-effective structure, dynamic mapping from RF chains to antennas is further proposed, for which a greedy algorithm and a modified K-means algorithm are developed. Simulation results demonstrate the performance gains of the proposed hybrid precoding algorithms over existing ones. They show that, with the proposed DPS implementation, the fully-connected structure enjoys both satisfactory performance and low design complexity, while the partially-connected one serves as an economical solution with low hardware complexity.

###### Index Terms

5G networks, hybrid precoding, low-rank matrix approximation, millimeter-wave communications, multiple-input multiple-output (MIMO), OFDM.

## I Introduction

The proliferation of smart mobile devices has resulted in an ever-increasing wireless data explosion, which calls for an exponential increase in the capacity of wireless networks. In particular, the upcoming 5G networks require a 1000X increase in capacity by 2020 [2]. The spectrum crunch in current wireless systems has stimulated extensive interest in exploiting new spectrum bands for cellular communications, and millimeter-wave (mm-wave) bands from 30 GHz to 300 GHz have been demonstrated to be promising candidates in recent experiments [3]. Thanks to the smaller wavelength of mm-wave signals, large-scale antenna arrays can be leveraged at both the transmitter and receiver sides, which can provide spatial multiplexing gains with the help of multiple-input multiple-output (MIMO) techniques. On the other hand, the ten-fold increase of the carrier frequency introduces several challenges to mm-wave communication systems, especially the high power consumption and cost of hardware components at mm-wave bands [4]. In addition, the large available bandwidth at mm-wave frequencies induces severe frequency selectivity, for which multicarrier techniques such as orthogonal frequency-division multiplexing (OFDM) should be utilized. All the above-mentioned design aspects should be taken into consideration when developing practical transceivers for mm-wave MIMO systems.

By utilizing a small number of radio frequency (RF) chains to combine a low-dimensional digital baseband precoder and a high-dimensional analog RF precoder, hybrid precoding stands out as a cost-effective transceiver solution for mm-wave MIMO systems [5, 6, 7]. Compared with conventional MIMO systems, the additional high-dimensional analog RF precoder is the differentiating part. According to the mapping strategies from RF chains to antennas in the analog RF precoder, hybrid precoders can be categorized into the fully- and partially-connected structures [8]. In the fully-connected structure, each antenna is connected to all the RF chains. In contrast, each antenna is connected to one RF chain in the partially-connected structure, with a significant reduction in the hardware complexity.

To effectively reduce the power consumption in the RF domain, analog RF precoders are usually implemented by phase shifters at the expense of sacrificing the ability to adjust the amplitude of the RF signals [5]. Thus, the analog component forms the major challenge in designing hybrid precoders. Given the large dimension of the design space and the unit modulus constraint induced by the phase shifter implementation, an important design aspect of hybrid precoders is the computational complexity. While various attempts have been made to balance the performance and computational complexity, there is still no systematic approach to designing hybrid precoders that are computationally efficient and, at the same time, achieve satisfactory performance. In this paper, we will show the great potential to develop efficient hybrid precoding algorithms by adopting a novel double phase shifter (DPS) hybrid precoder implementation.

### I-A Related Works and Motivation

Most existing works on hybrid precoding focused on the fully-connected structure [6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. The initial efforts started from single-user single-carrier mm-wave systems [6, 8, 9, 10] (in this paper, single-carrier systems refer to single-carrier transmissions over flat-fading channels). Then, the investigation was extended to single-user multicarrier [8, 11, 12] and multiuser single-carrier systems [13, 14, 15, 16]. The main differences among these existing works are the approaches to dealing with the unit modulus constraints on the analog RF precoder.

By choosing the analog beamforming vectors from a predefined candidate set, e.g., array response vectors in [6, 11, 13, 14] and discrete Fourier transform beamformers in [10], a greedy algorithm called orthogonal matching pursuit (OMP) has been widely used in designing hybrid precoders. Although its computational complexity is relatively low, its performance is not satisfactory and has been improved by several follow-up works. In [8], it was shown that the unit modulus constraints define a Riemannian manifold, and manifold optimization was introduced to directly tackle them, which helps to approach the performance of the fully digital precoder with a small number of RF chains. Furthermore, the contribution of each phase shifter to the spectral efficiency was identified in [9, 16], based on which the analog precoder was optimized in a phase shifter-by-phase shifter fashion. However, these algorithms all involve iterative procedures to optimize the analog RF precoders, which results in high computational complexity. Moreover, there were also some studies on how to achieve the performance of the fully digital precoder with the hybrid structure [17, 18], yet they require a large number of RF chains, which, to some extent, deviates from the motivation of hybrid precoding.

On the other hand, less attention has been paid to hybrid precoding in the partially-connected structure. In [19, 20], codebook-based design of hybrid precoders was presented for single-user narrowband and OFDM systems, respectively. While using a codebook enjoys low complexity, there will be a certain performance loss, and how to design the codebook remains to be clarified. By borrowing the concept of successive interference cancellation, an iterative hybrid precoding algorithm in the partially-connected structure was proposed in [21] for single-user single-carrier systems. Since the partially-connected structure employs far fewer phase shifters, some degradation in the analog precoding gain is inevitable, which makes it difficult for such a structure to achieve a high spectral efficiency, especially when the analog precoder is shared across all the users and subcarriers as in multiuser multicarrier systems. Hence, how to efficiently use the limited number of phase shifters is an urgent issue to be solved in the partially-connected structure.

As illustrated above, in both the fully- and partially-connected structures, there is no comprehensive way to efficiently design hybrid precoders with satisfactory performance, which motivates us to seek a new hybrid precoding architecture that can relieve us from the current dilemma. Furthermore, it is still unclear how to design hybrid precoders in multiuser multicarrier systems, where a single analog RF precoder is shared by a large number of subcarriers, and multiple users that interfere with each other. In this paper, we propose a novel DPS implementation for hybrid precoding in the general setting of multiuser OFDM mm-wave MIMO systems. Although similar implementations were considered in [18, 22], the systematic design approach and algorithmic advantages of this new implementation have not been exploited, which will be illustrated in this paper via effective algorithms for different hybrid precoder structures.

### I-B Contributions

Conventionally, a single phase shifter is used to connect an RF chain and an antenna, i.e., the SPS implementation, which introduces the unit modulus constraints and hinders efficient algorithm design. In this paper, to overcome this algorithmic difficulty, we propose a novel hybrid precoder implementation that makes the precoder design more tractable. Our main contributions are summarized as follows.

• We propose a novel hybrid precoder implementation, i.e., the DPS implementation, which relaxes the unit modulus constraints of the analog RF precoder and thus enables computationally efficient hybrid precoder design. To the best of the authors’ knowledge, this is the first attempt to directly adopt the DPS implementation for designing hybrid precoders in multiuser OFDM mm-wave MIMO systems.

• For the fully-connected structure, the optimization of the analog RF precoder is formulated as a least absolute shrinkage and selection operator (LASSO) problem, based on which efficient algorithms are developed. Furthermore, the hybrid precoder design is identified as a low-rank matrix approximation problem, which has a closed-form solution. In addition, the efficient algorithm for the DPS implementation inspires an effective heuristic hybrid precoder design for the conventional SPS implementation, which outperforms the state-of-the-art algorithms in both computational complexity and spectral efficiency.

• For the partially-connected structure, we identify that the hybrid precoder design is an eigenvalue problem, and provide closed-form solutions for both analog RF and digital baseband precoders. To further improve the system performance, a dynamic partially-connected structure is proposed. Two effective algorithms, i.e., the greedy and modified K-means algorithms, are proposed to dynamically optimize the mapping strategies from RF chains to antennas.

• For both structures, we discover that the hybrid precoder in the multiuser setting will produce residual interuser interference, as it only approximates the fully digital precoder. To this end, we propose to cascade an additional block diagonalization (BD) precoder at the baseband to cancel the interuser interference, which is shown to be effective to further improve the spectral efficiency and multiplexing gain.

• Analytical results on the performance gap between the fully- and partially-connected structures are provided. Furthermore, extensive comparisons are offered via simulations to unravel valuable design insights. In particular, the proposed algorithm helps the fully-connected structure to easily approach the performance of the fully digital precoder with a reasonably small number of RF chains, which cannot be achieved by the widely used OMP algorithm. On the other hand, for the partially-connected structure, it turns out that the dynamic mapping from RF chains to antennas is crucial to achieve good performance. Moreover, while the DPS partially-connected structure employs far fewer phase shifters, its performance is comparable to that of the SPS fully-connected structure with the OMP algorithm, which shows its great potential for practical implementation.

### I-C Organization

The remainder of this paper is organized as follows. We introduce the system model and the problem formulation in Section II. Then, hybrid precoder designs for the fully- and partially-connected structures are presented in Section III and Section IV, respectively. Simulation results are presented in Section V. Finally, we conclude this paper in Section VI.

### I-D Notations

The following notations are used throughout this paper. The imaginary unit is denoted as ; and symbolize a column vector and a matrix, respectively; , , , and stand for the transpose, conjugate, conjugate transpose, and pseudo-inverse of matrix ; The -th row, the -th column, and the -th entry in matrix are denoted as , , and ; The determinant, Frobenius norm, and -norm of matrix are expressed as , , and ; denotes the -th largest eigenvalue of matrix , and the corresponding eigenvector is noted as ; and indicate the trace and vectorization of matrix ; and stand for the Hadamard and Kronecker products between two matrices; Expectation and the real part of a complex variable are denoted by and .

## II System Model and Problem Formulation

### II-A System Model

Consider the downlink transmission of a multiuser OFDM mm-wave MIMO system, as shown in Fig. 1, where the base station (BS) is equipped with $N_t$ antennas and transmits signals to $K$ $N_r$-antenna users over $F$ subcarriers. On each subcarrier, $N_s$ data streams are transmitted to each user. The limitations of the RF chains are given by $KN_s\le N_{\mathrm{RF}}^t<N_t$ and $N_s\le N_{\mathrm{RF}}^r<N_r$, where $N_{\mathrm{RF}}^t$ and $N_{\mathrm{RF}}^r$ are the numbers of RF chains equipped at the BS and each user, respectively.

The received signal for the -th user on the -th subcarrier is given by

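$$\mathbf{y}_{k,f}=\mathbf{W}_{\mathrm{BB}k,f}^H\mathbf{W}_{\mathrm{RF}k}^H\left(\mathbf{H}_{k,f}\sum_{k=1}^{K}\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}k,f}\mathbf{s}_{k,f}+\mathbf{n}_{k,f}\right),\tag{1}$$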

where the subscript represents the -th user on the -th subcarrier, and is the transmitted symbol vector such that . The digital baseband precoders and combiners are symbolized by and , respectively. Because the transmitted signals for all the users are mixed together via the digital baseband precoder, and the analog RF precoder is a post-IFFT (inverse fast Fourier transform) operation, the analog RF precoder is shared by all the users and subcarriers, denoted as . Similarly, the analog RF combiner is subcarrier-independent for each user , denoted as . Furthermore, the additive noise at the users is represented by , whose elements are independent and identically distributed according to the complex Gaussian distribution . The achievable sum rate on the -th subcarrier when transmitted symbols follow a Gaussian distribution is given by [6, 23]

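$$R_f=\sum_{k=1}^{K}\log\det\left(\mathbf{I}_{N_s}+\frac{1}{KN_sF}\mathbf{W}_{k,f}^H\mathbf{H}_{k,f}\mathbf{F}_{k,f}\mathbf{F}_{k,f}^H\mathbf{H}_{k,f}^H\mathbf{W}_{k,f}\boldsymbol{\Omega}_{k,f}^{-1}\right),\tag{2}$$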

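where $\mathbf{F}_{k,f}=\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}k,f}$ and $\mathbf{W}_{k,f}=\mathbf{W}_{\mathrm{RF}k}\mathbf{W}_{\mathrm{BB}k,f}$ are the precoder and combiner matrices, and $\boldsymbol{\Omega}_{k,f}$ stands for the interference plus noise matrix.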

The mm-wave MIMO channel between the BS and the $k$-th user on the $f$-th subcarrier, denoted as $\mathbf{H}_{k,f}$, can be characterized by the Saleh-Valenzuela model as in [6, 8, 11]. Although this specific channel model will be used in the simulation, our precoder design approaches are compatible with other general channel models.

### II-B New Hybrid Precoder Implementation

According to the mapping strategies from RF chains to antennas, the hybrid precoder structures can be classified into the fully- and partially-connected ones [8, Fig. 1]. The fully-connected structure fully exploits the degrees of freedom (DoFs) in the RF domain with a natural mapping strategy, i.e., to connect each RF chain to all the antennas. On the contrary, in the partially-connected structure, each antenna element is connected to only one RF chain. These two different mapping strategies (structures) correspond to different constraints in the hybrid precoder design problem, which will be illustrated in detail later in Sections III and IV.

As mentioned before, the analog RF precoder is practically implemented by phase shifters. Conventionally, in either the fully- or partially-connected structure [8], each connection from a certain RF chain to one of its connected antenna elements is implemented by a single phase shifter, as shown in Fig. 2(a), which is referred to as the SPS implementation in this paper. This mapping strategy implies that each non-zero element in the analog precoding and combining matrices should have unit modulus, i.e., $|(\mathbf{F}_{\mathrm{RF}})_{i,j}|=|(\mathbf{W}_{\mathrm{RF}k})_{i,j}|=1$. This is intrinsically a non-convex constraint and difficult to tackle, which forms the main design challenge. Although there exist some approaches that can directly deal with this non-convex constraint [8, 9], the design complexity is still unacceptable in mm-wave systems, whose coherence time is much shorter than that of current sub-6 GHz systems. As a matter of fact, the main obstacle is that we can only adjust the phase but not the amplitude of the RF signals. This motivates us to consider an alternative hybrid precoder implementation that can adjust the amplitude of the RF signals, yet is still realized by phase shifters.

In this paper, we propose a new implementation as shown in Fig. 2(b), referred to as the DPS implementation [1], where the phase shifter network is divided into two groups. For each connection from an RF chain to one of its connected antenna elements, one unique phase shifter in each group is selected, and their outputs are summed together to compose the analog precoding gain. With this special implementation, each non-zero element in the analog RF precoding and combining matrices corresponds to a sum of two phase shifters. Note that the summation operation creates the possibility to adjust the amplitude of the RF signals, which should be no greater than two, i.e., the new constraints for the analog RF precoder and combiner are $|(\mathbf{F}_{\mathrm{RF}})_{i,j}|\le 2$ and $|(\mathbf{W}_{\mathrm{RF}k})_{i,j}|\le 2$ for all the non-zero entries. By doubling the number of phase shifters, the new constraints become convex and therefore make it more tractable to develop low-complexity design approaches. We impose these amplitude constraints in this paper, and the actual implementation of the phase shifters can then be easily obtained by factorizing a complex number with amplitude no greater than two into two unit modulus components, expressed as

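$$ae^{\jmath\theta}=e^{\jmath(\theta+\phi)}+e^{\jmath(\theta-\phi)},\tag{3}$$

where $a$ and $\theta$ are the amplitude and phase of the non-zero element in $\mathbf{F}_{\mathrm{RF}}$ and $\mathbf{W}_{\mathrm{RF}k}$, and $\phi=\cos^{-1}\left(\frac{a}{2}\right)$, since the right-hand side of (3) equals $2\cos(\phi)e^{\jmath\theta}$.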
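The factorization in (3) can be checked numerically. The following is a minimal sketch, not the authors' implementation; the function name `dps_factorize` and the sample gain are illustrative assumptions. It uses the identity $e^{\jmath(\theta+\phi)}+e^{\jmath(\theta-\phi)}=2\cos(\phi)e^{\jmath\theta}$, so that $\phi=\cos^{-1}(a/2)$.

```python
import cmath
import math


def dps_factorize(g):
    """Split a complex gain g with |g| <= 2 into two unit-modulus terms, as in (3).

    The function name and interface are hypothetical, chosen for illustration.
    """
    a, theta = abs(g), cmath.phase(g)
    if a > 2.0 + 1e-12:
        raise ValueError("amplitude must not exceed 2")
    # a = 2*cos(phi); clamp the argument to guard against rounding just above 1
    phi = math.acos(min(a / 2.0, 1.0))
    return cmath.exp(1j * (theta + phi)), cmath.exp(1j * (theta - phi))


# Example: a gain with amplitude 1.2 and phase 0.7 rad
g = 1.2 * cmath.exp(1j * 0.7)
p1, p2 = dps_factorize(g)
print(abs(p1), abs(p2), abs(p1 + p2 - g))
```

Both returned terms have unit modulus, so each can be realized by one phase shifter, and their sum reproduces the desired gain up to rounding.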

Remark 1: Despite the increased number of phase shifters, as will be shown in this paper, the DPS implementation enjoys unique advantages in both algorithmic and performance aspects. It also provides valuable guidelines for other hybrid precoder design problems. We highlight the benefits of this proposal as follows.

• The hybrid precoder design is greatly simplified when adopting the DPS implementation, as illustrated in Sections III and IV.

• With this new implementation, hybrid precoders can approach the performance of the fully digital one with fewer RF chains than existing works. Thus, this proposal serves as an algorithmically efficient hybrid precoder design for general multiuser multicarrier mm-wave systems.

• The DPS fully-connected hybrid precoder structure serves as a performance upper bound for structures with lower hardware complexity. It is a tighter upper bound than the fully digital precoder, especially when the number of RF chains is small.

• The precoder design problem becomes a low-rank matrix approximation (eigenvalue) problem for the DPS fully-connected (partially-connected) structure, and theoretical analysis, which is intractable for other structures, becomes possible. It will then help to better understand hybrid precoding systems.

• Thanks to the benefits in both performance and algorithmic perspectives, the proposed DPS implementation would drive the hardware research for this implementation.

### II-C Problem Formulation

There exist different problem formulations for hybrid precoding. Some works tried to directly maximize the spectral efficiency based on approximations and bounds in single-user systems [9, 16], or based on some extra constraints on the analog precoder to simplify the design in multiuser single-carrier systems [15, 13]. However, when it comes to multiuser multicarrier systems, it is highly challenging and intractable to directly optimize the hybrid precoder with the spectral efficiency as the objective function, given that the spectral efficiencies of all users on all subcarriers are coupled with each other by the shared analog RF precoder. On the other hand, extensive works showed that minimizing the Euclidean distance between the fully digital precoder and the hybrid precoder is an effective surrogate for maximizing the spectral efficiency in mm-wave MIMO systems [6, 8, 20, 10, 11, 14, 1]. (In this paper, the Euclidean distance between two precoders refers to the Euclidean distance between the two points determined by the vectorization of the two precoding matrices.) In this paper, we adopt this alternative objective as our design goal. We focus on the precoder design, and the combiner design problem can be formulated in the same way without the transmit power constraint. The formulation is given by

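$$\begin{aligned}\underset{\mathbf{F}_{\mathrm{RF}},\mathbf{F}_{\mathrm{BB}}}{\text{minimize}}\quad&\left\|\mathbf{F}_{\mathrm{opt}}-\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right\|_F^2\\\text{subject to}\quad&\begin{cases}\mathbf{F}_{\mathrm{RF}}\in\mathcal{A}\\\left\|\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right\|_F^2\le KN_sF,\end{cases}\end{aligned}\tag{4}$$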

where $\mathbf{F}_{\mathrm{opt}}$ is the combined fully digital precoder, and $\mathbf{F}_{\mathrm{BB}}$ is the concatenated digital baseband precoder. The second constraint is the transmit power constraint at the BS side. The analog RF precoder $\mathbf{F}_{\mathrm{RF}}$ is a common component for all users and subcarriers, which is restricted to the candidate set $\mathcal{A}$ induced by the phase shifter implementation. The set $\mathcal{A}$ will be specified later for different hybrid precoder structures. Justifications for the formulation (4) for single-user systems with flat-fading channels were provided in [6]. Here we provide some intuition for this formulation for general hybrid precoding systems. The fully digital precoder serves as a performance upper bound for the hybrid one, and one ideal design goal is to obtain hybrid precoders that approach the performance of the fully digital one. Therefore, it is intuitive to formulate the design problem as approximating the fully digital precoder with the hybrid one.

With this formulation, the proposed algorithm can be applied with any fully digital precoder. In this paper, we adopt the classical BD precoder as the fully digital one, which is asymptotically optimal in the high signal-to-noise ratio (SNR) regime [24]. We will investigate the hybrid precoder design with the DPS implementation for the fully- and partially-connected structures in Sections III and IV, respectively.

## III Hybrid Precoding for the Fully-connected Structure

The fully-connected hybrid precoder structure has drawn much research attention in recent years [6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], which will be investigated in this section with the new DPS implementation. We will first present an RF-only precoder to demonstrate the advantages of doubling the phase shifters, where the optimization of the analog RF precoder is formulated as a LASSO problem. Afterwards, the hybrid precoder design will be performed via a simple low-rank matrix approximation.

### III-A RF-Only Precoding

The main difference between the conventional SPS hybrid precoder implementation and the proposed DPS one lies in the analog RF precoder. Therefore, we first present an RF-only precoder design [25], where the analog RF precoder is optimized for a given digital precoder. This problem may arise as a subproblem in hybrid precoder design, as in [1, 14, 9], or in situations where the digital precoder has a fixed design, e.g., from a codebook. The investigation of this problem will demonstrate the algorithmic advantage of the DPS implementation. For the fully-connected structure, the feasible set can be specified as $\mathcal{A}_f=\{\mathbf{A}\,:\,|\mathbf{A}_{i,j}|\le 2\}$, as each RF chain is connected to all the antenna elements. The analog RF precoder design problem is given by

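$$\begin{aligned}\underset{\mathbf{F}_{\mathrm{RF}}}{\text{minimize}}\quad&\left\|\mathbf{F}_{\mathrm{opt}}-\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right\|_F^2\\\text{subject to}\quad&\mathbf{F}_{\mathrm{RF}}\in\mathcal{A}_f.\end{aligned}\tag{5}$$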

Note that the power constraint in (4) is temporarily removed. In fact, after designing the analog RF precoder, we can normalize it if the transmit power constraint is not satisfied. It has been shown in [8, Lemma 1] that as long as we can make the Euclidean distance between the fully digital precoder and the hybrid precoder sufficiently small when ignoring the power constraint, the normalization step will also achieve a small distance to the fully digital precoder. The optimization problem (5) is a convex one and can be solved by solvers such as CVX. Nevertheless, to further reduce the computational complexity, we will exploit the inherent structure of the solution by considering its dual problem.

###### Lemma 1.

The dual problem of (5) is a LASSO problem, given by

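$$\underset{\mathbf{x}}{\text{minimize}}\quad\frac{1}{2}\left\|\mathbf{A}\mathbf{x}-\mathbf{b}\right\|_2^2+2\left\|\mathbf{x}\right\|_1.\tag{6}$$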

The parameters and are given by

where and is the singular value decomposition (SVD) of . The optimal solution of (5) can be written as

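$$\mathrm{vec}\left(\mathbf{F}_{\mathrm{RF}}^\star\right)=\mathbf{f}_{\mathrm{RF}}^\star=\mathbf{A}^H\left(\mathbf{b}-\mathbf{A}\mathbf{x}^\star\right).\tag{8}$$
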
###### Proof:

See Appendix A. ∎

Based on Lemma 1, problem (5) is transformed into a LASSO problem. This provides the opportunity to leverage the large body of existing works on low-complexity algorithms for solving the general LASSO problem [26]. Recall that, with the conventional SPS implementation, the analog RF precoder is optimized through high-complexity algorithms such as manifold optimization [8] to achieve good performance. In contrast, doubling the phase shifters offers great potential to significantly reduce the computational complexity when designing the analog RF precoder.
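As one illustration of such a low-complexity LASSO solver, the sketch below applies the iterative shrinkage-thresholding algorithm (ISTA) to a problem of the form (6). This is a generic proximal-gradient method chosen here for illustration and is not claimed to be the solver used in the paper; the matrices `A` and `b` in the demo are random placeholders rather than the parameters defined in Lemma 1, and the function names are assumptions.

```python
import numpy as np


def soft_threshold(z, tau):
    """Element-wise complex soft-thresholding: shrink each modulus by tau, keep the phase."""
    return np.exp(1j * np.angle(z)) * np.maximum(np.abs(z) - tau, 0.0)


def ista_lasso(A, b, lam=2.0, n_iter=500):
    """Minimize 0.5*||A x - b||_2^2 + lam*||x||_1 by proximal gradient (ISTA) steps."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth term
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - b)      # gradient of the least-squares term
        x = soft_threshold(x - grad / L, lam / L)
    return x


# Placeholder data (assumption): A with orthonormal columns, random b
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4)))
b = 5.0 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))

x_ista = ista_lasso(A, b)
x_closed = soft_threshold(A.conj().T @ b, 2.0)   # closed form when A^H A = I
print(np.allclose(x_ista, x_closed))
```

When the columns of `A` are orthonormal, the iteration reaches its fixed point after the first step, and that fixed point coincides with direct soft-thresholding of $\mathbf{A}^H\mathbf{b}$.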

A special case that deserves additional mention is one in which a closed-form solution exists, which further reduces the computational complexity. It was shown in [8, 9] that a semi-orthogonal structure of the digital baseband precoder, i.e., $\mathbf{F}_{\mathrm{BB}}\mathbf{F}_{\mathrm{BB}}^H=\mathbf{I}_{N_{\mathrm{RF}}^t}$, leads to an approximately optimal solution. (Note that in mm-wave multiuser OFDM systems, $\mathbf{F}_{\mathrm{BB}}\in\mathbb{C}^{N_{\mathrm{RF}}^t\times KN_sF}$, where $N_{\mathrm{RF}}^t<KN_sF$ for practical system parameters, which means $\mathbf{F}_{\mathrm{BB}}$ is a fat matrix.) Therefore, we resort to this special case, where the observation matrix $\mathbf{A}$ in the LASSO problem (6) is also semi-orthogonal, i.e.,

 (9)

With the semi-orthogonal observation matrix $\mathbf{A}$, the LASSO problem (6) has a closed-form solution called soft-thresholding [26], which is given by

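$$\mathbf{x}^\star=\exp\left\{\jmath\angle\left(\mathbf{A}^H\mathbf{b}\right)\right\}\circ\left(\left|\mathbf{A}^H\mathbf{b}\right|-2\right)^+,\tag{10}$$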

where $\angle(\cdot)$, $|\cdot|$, and $(\cdot)^+=\max\{\cdot,0\}$ are element-wise operations, and the first two extract the phase and amplitude of a complex variable, respectively. Then, substituting (9) and (10) into (8), we obtain the corresponding optimal solution of $\mathbf{F}_{\mathrm{RF}}$ in (5) as

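$$\mathbf{F}_{\mathrm{RF}}^\star=\mathbf{F}_{\mathrm{opt}}\mathbf{F}_{\mathrm{BB}}^H-\exp\left\{\jmath\angle\left(\mathbf{F}_{\mathrm{opt}}\mathbf{F}_{\mathrm{BB}}^H\right)\right\}\circ\left(\left|\mathbf{F}_{\mathrm{opt}}\mathbf{F}_{\mathrm{BB}}^H\right|-2\right)^+.\tag{11}$$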

Note that, in order to obtain the optimal analog RF precoder when the digital baseband precoder is semi-orthogonal, a product of $\mathbf{F}_{\mathrm{opt}}$ and $\mathbf{F}_{\mathrm{BB}}^H$ is the only required step, which is computationally much more efficient than solving the original problem (5) with a general-purpose iterative solver. This result also suggests that it is beneficial to set the digital baseband precoder as a semi-orthogonal one in the RF-only precoding with the DPS implementation.
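The closed form (11) amounts to keeping the phase of every entry of $\mathbf{F}_{\mathrm{opt}}\mathbf{F}_{\mathrm{BB}}^H$ and clipping its amplitude at two. The following is a minimal NumPy sketch under stated assumptions: the function name `rf_precoder_closed_form`, the matrix sizes, and the random test data are illustrative and not taken from the paper, and the semi-orthogonal baseband precoder is generated from a QR factorization purely for the demo.

```python
import numpy as np


def rf_precoder_closed_form(F_opt, F_BB, max_amp=2.0):
    """Evaluate (11): subtract the soft-thresholded part of M = F_opt F_BB^H from M."""
    M = F_opt @ F_BB.conj().T
    excess = np.maximum(np.abs(M) - max_amp, 0.0)     # (|M| - 2)^+ element-wise
    return M - np.exp(1j * np.angle(M)) * excess


# Illustrative sizes (assumptions): 16 antennas, 4 RF chains, 32 stacked columns
rng = np.random.default_rng(0)
N_t, N_rf, n_cols = 16, 4, 32

# Semi-orthogonal baseband precoder: F_BB F_BB^H = I (rows are orthonormal)
Q, _ = np.linalg.qr(rng.standard_normal((n_cols, N_rf)) + 1j * rng.standard_normal((n_cols, N_rf)))
F_BB = Q.conj().T
F_opt = 3.0 * (rng.standard_normal((N_t, n_cols)) + 1j * rng.standard_normal((N_t, n_cols)))

F_RF = rf_precoder_closed_form(F_opt, F_BB)
print(np.abs(F_RF).max())   # never exceeds 2 (up to rounding)
```

Entries of the product whose amplitude is already at most two pass through unchanged, so the only work beyond the matrix product is an element-wise clip.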

### III-B Hybrid Precoding

Previously, we demonstrated the benefit of doubling the phase shifters when optimizing the analog part. When the digital baseband precoder can be jointly optimized, the hybrid precoder design problem is further simplified as an unconstrained matrix decomposition problem, i.e.,

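$$\underset{\mathbf{F}_{\mathrm{RF}},\mathbf{F}_{\mathrm{BB}}}{\text{minimize}}\quad\left\|\mathbf{F}_{\mathrm{opt}}-\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right\|_F^2.\tag{12}$$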

Remark 2: The constraint in (5), i.e., , is in fact redundant in hybrid precoding. Once a pair of the unconstrained optimal solution is obtained, one can always get another pair of optimal solution with the factor to satisfy the constraint , which will not affect the objective value. On the other hand, one may consider deploying phase shifters for each connection from an RF chain to an antenna, and the corresponding constraint would be . As illustrated above, this constraint is redundant and the factor can be applied. Therefore, from both performance and algorithmic perspectives, it does not help to further increase the number of phase shifters. Obviously, the minimum number, i.e., two phase shifters, should be adopted due to cost and power consideration. Furthermore, the transmit power constraint is automatically satisfied by the optimal solution of the hybrid precoder, which will be elaborated in the following optimization.

While the main focus of this paper is on multiuser multicarrier systems, some advantages of the proposed DPS implementation in hybrid precoding will be firstly presented in single-carrier systems, as shown in the following result.

###### Lemma 2.

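For single-carrier systems, with the DPS implementation, the fully digital precoder $\mathbf{F}_{\mathrm{opt}}$ can be perfectly decomposed into $\mathbf{F}_{\mathrm{RF}}$ and $\mathbf{F}_{\mathrm{BB}}$ using the minimum number of RF chains, i.e., $N_{\mathrm{RF}}^t=KN_s$ and $N_{\mathrm{RF}}^r=N_s$.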

###### Proof:

The proof can be easily obtained by the rank sufficiency of and in the decomposition when , and is omitted due to space limitation. ∎

Lemma 2 shows that, for single-carrier systems with either single-user or multiuser transmissions, the performance of the fully digital precoder can be easily achieved by the hybrid precoder via a simple matrix decomposition. Note that, with the conventional SPS implementation, the number of RF chains should be at least twice the number of data streams in order to realize the fully digital precoder, i.e., $N_{\mathrm{RF}}^t=2KN_s$ and $N_{\mathrm{RF}}^r=2N_s$ [8, 9]. In this case, since the numbers of phase shifters in use are the same, i.e., $2KN_sN_t$ at the BS, for both the SPS and DPS implementations, the proposed DPS implementation, which requires fewer RF chains, is more energy efficient when achieving the fully digital precoder.

When it comes to multiuser multicarrier systems, where typically $KN_sF>N_t$, the rank of $\mathbf{F}_{\mathrm{opt}}$ should be $N_t$ (no longer $KN_s$ as in single-carrier systems; without loss of generality, we assume that all the precoding matrices in (12) have full rank), and thus perfect decomposition can only be achieved when $N_{\mathrm{RF}}^t=N_t$, which, however, severely deviates from the setting of hybrid precoding. The matrix decomposition therefore cannot be perfect for hybrid precoder design due to the rank deficiency, i.e., $N_{\mathrm{RF}}^t<N_t$. Consequently, problem (12) is typically a low-rank matrix approximation problem, with a closed-form solution given by

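$$\left(\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right)^\star\triangleq\hat{\mathbf{F}}_{\mathrm{opt}}=\mathbf{U}_1\mathbf{S}_1\mathbf{V}_1^H.\tag{13}$$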

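Here, the SVD of $\mathbf{F}_{\mathrm{opt}}$ is denoted as $\mathbf{F}_{\mathrm{opt}}=\mathbf{U}\mathbf{S}\mathbf{V}^H$, where matrices $\mathbf{U}_1$ and $\mathbf{V}_1$ are the first $N_{\mathrm{RF}}^t$ columns of $\mathbf{U}$ and $\mathbf{V}$, respectively, and $\mathbf{S}_1$ is the diagonal matrix whose diagonal elements are the $N_{\mathrm{RF}}^t$ largest singular values of $\mathbf{F}_{\mathrm{opt}}$. This means that the optimal solution of $\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}$ is simply obtained by extracting the $N_{\mathrm{RF}}^t$ most principal components of $\mathbf{F}_{\mathrm{opt}}$. From the optimal solution (13), we observe that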

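$$\left\|\left(\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right)^\star\right\|_F^2=\left\|\hat{\mathbf{F}}_{\mathrm{opt}}\right\|_F^2\le\left\|\mathbf{F}_{\mathrm{opt}}\right\|_F^2\le KN_sF,\tag{14}$$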

which means the transmit power constraint is satisfied by the optimal solution $\left(\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right)^\star$. So far we have obtained the optimal solution for the entire hybrid precoder, and our next task is to decompose it into two parts. In fact, a large number of options are available for decomposing $\hat{\mathbf{F}}_{\mathrm{opt}}$ into $\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}$. Nevertheless, we are especially interested in the following one.
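The low-rank approximation in (13) and the power bound in (14) can be illustrated with a truncated SVD. The sketch below is a numerical illustration under assumed, arbitrary system sizes; the helper name `best_rank_r` and the random stand-in for the fully digital precoder are not from the paper.

```python
import numpy as np


def best_rank_r(F_opt, r):
    """Rank-r approximation of F_opt that is closest in Frobenius norm (truncated SVD)."""
    U, s, Vh = np.linalg.svd(F_opt, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :], s


# Illustrative sizes only (assumptions): N_t antennas, K users, N_s streams, F subcarriers
rng = np.random.default_rng(1)
N_t, K, N_s, F, n_rf = 16, 2, 2, 8, 6
F_opt = rng.standard_normal((N_t, K * N_s * F)) + 1j * rng.standard_normal((N_t, K * N_s * F))
F_opt *= np.sqrt(K * N_s * F) / np.linalg.norm(F_opt)   # scale so ||F_opt||_F^2 = K*N_s*F

F_hat, s = best_rank_r(F_opt, n_rf)
err = np.linalg.norm(F_opt - F_hat) ** 2     # squared approximation error
power = np.linalg.norm(F_hat) ** 2           # transmit power of the hybrid precoder
print(err, power, K * N_s * F)
```

The squared error equals the sum of the squared discarded singular values, and the power of the approximation never exceeds that of the original matrix, which is the content of (14).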

###### Lemma 3.

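The matrix $\hat{\mathbf{F}}_{\mathrm{opt}}$ can be decomposed into $\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}$ in the following form:

$$\mathbf{F}_{\mathrm{RF}}=\begin{bmatrix}\mathbf{I}_{N_{\mathrm{RF}}^t}\\\hat{\mathbf{F}}_{\mathrm{opt},2}\hat{\mathbf{F}}_{\mathrm{opt},1}^\dagger\end{bmatrix},\quad\mathbf{F}_{\mathrm{BB}}=\hat{\mathbf{F}}_{\mathrm{opt},1},$$

where $\hat{\mathbf{F}}_{\mathrm{opt}}=\begin{bmatrix}\hat{\mathbf{F}}_{\mathrm{opt},1}\\\hat{\mathbf{F}}_{\mathrm{opt},2}\end{bmatrix}$, and $\hat{\mathbf{F}}_{\mathrm{opt},1}$ and $\hat{\mathbf{F}}_{\mathrm{opt},2}$ are the first $N_{\mathrm{RF}}^t$ rows and the $(N_{\mathrm{RF}}^t+1)$-th to $N_t$-th rows of $\hat{\mathbf{F}}_{\mathrm{opt}}$, respectively.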

###### Proof:

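Assume $\mathbf{F}_{\mathrm{RF}}=\begin{bmatrix}\mathbf{I}_{N_{\mathrm{RF}}^t}\\\mathbf{X}\end{bmatrix}$. Then the main task in proving Lemma 3 is to find $\mathbf{X}$ and $\mathbf{F}_{\mathrm{BB}}$ that satisfy $\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}=\hat{\mathbf{F}}_{\mathrm{opt}}$.

First, we have

$$\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}=\begin{bmatrix}\mathbf{I}_{N_{\mathrm{RF}}^t}\\\mathbf{X}\end{bmatrix}\mathbf{F}_{\mathrm{BB}}=\begin{bmatrix}\mathbf{F}_{\mathrm{BB}}\\\mathbf{X}\mathbf{F}_{\mathrm{BB}}\end{bmatrix}=\hat{\mathbf{F}}_{\mathrm{opt}}=\begin{bmatrix}\hat{\mathbf{F}}_{\mathrm{opt},1}\\\hat{\mathbf{F}}_{\mathrm{opt},2}\end{bmatrix}.$$

Therefore, it is easy to determine that $\mathbf{F}_{\mathrm{BB}}=\hat{\mathbf{F}}_{\mathrm{opt},1}$. The remaining task is to solve the equation

$$\mathbf{X}\hat{\mathbf{F}}_{\mathrm{opt},1}=\hat{\mathbf{F}}_{\mathrm{opt},2}.$$

Since $\hat{\mathbf{F}}_{\mathrm{opt}}$ has rank $N_{\mathrm{RF}}^t$ and is obtained by the SVD of $\mathbf{F}_{\mathrm{opt}}$, the first $N_{\mathrm{RF}}^t$ rows of $\hat{\mathbf{F}}_{\mathrm{opt}}$ (the rows of $\hat{\mathbf{F}}_{\mathrm{opt},1}$) are linearly independent, and the remaining rows in $\hat{\mathbf{F}}_{\mathrm{opt}}$ (the rows of $\hat{\mathbf{F}}_{\mathrm{opt},2}$) can be linearly expressed by the rows of $\hat{\mathbf{F}}_{\mathrm{opt},1}$. Hence, $\mathbf{X}=\hat{\mathbf{F}}_{\mathrm{opt},2}\hat{\mathbf{F}}_{\mathrm{opt},1}^\dagger$ is the solution to the equation, which completes the proof. ∎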
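The decomposition pattern of Lemma 3 can be reproduced in a few lines. The following is a minimal sketch under stated assumptions: the function name `lemma3_decompose`, the matrix sizes, and the random rank-deficient test matrix are illustrative. It also assumes that the leading rows of the rank-deficient matrix are linearly independent, and it does not enforce the amplitude constraint on the non-identity block of the analog precoder.

```python
import numpy as np


def lemma3_decompose(F_hat, n_rf):
    """Split a rank-n_rf matrix F_hat into F_RF = [I; X] and F_BB = first n_rf rows."""
    F1, F2 = F_hat[:n_rf, :], F_hat[n_rf:, :]
    X = F2 @ np.linalg.pinv(F1)                        # solves X F1 = F2
    F_RF = np.vstack([np.eye(n_rf, dtype=complex), X])
    return F_RF, F1


# Illustrative rank-4 matrix with 16 rows (antennas) and 32 columns (assumed sizes)
rng = np.random.default_rng(2)
N_t, n_rf, n_cols = 16, 4, 32
left = rng.standard_normal((N_t, n_rf)) + 1j * rng.standard_normal((N_t, n_rf))
right = rng.standard_normal((n_rf, n_cols)) + 1j * rng.standard_normal((n_rf, n_cols))
F_hat = left @ right

F_RF, F_BB = lemma3_decompose(F_hat, n_rf)
print(np.allclose(F_RF @ F_BB, F_hat))
```

The top block of the analog precoder is an identity matrix, so only its remaining rows need phase shifters in this pattern.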

The advantage of this decomposition form lies in the pattern of $\mathbf{F}_{\mathrm{RF}}$ in Lemma 3. The first $N_{\mathrm{RF}}^t$ rows of $\mathbf{F}_{\mathrm{RF}}$ form an identity matrix, which in fact does not need a phase shifter implementation since the zero elements correspond to no connections whereas the diagonal elements refer to direct connections from RF chains to antennas. This means we only need phase shifters in the analog RF precoder, instead of . Although a similar result was presented in [18], note that the result in Lemma 3 saves more phase shifters and the method is simpler and more straightforward than the decomposition procedure involving two QR decompositions as in [18]. Furthermore, the decomposition pattern in Lemma 3 can also be applied to single-carrier systems based on the result in Lemma 2, which will further improve the energy efficiency when achieving the fully digital precoder.
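The decomposition in Lemma 3 can be checked numerically. The following is a minimal NumPy sketch, assuming a random complex matrix as a stand-in for the fully digital precoder and hypothetical sizes (16 antennas, 4 RF chains, 24 stacked columns); the function name `dps_decompose` is illustrative and does not come from the paper.

```python
import numpy as np

def dps_decompose(F_opt, n_rf):
    """Rank-n_rf approximation of F_opt, factored as in Lemma 3."""
    U, s, Vh = np.linalg.svd(F_opt, full_matrices=False)
    # Truncated SVD U1 S1 V1^H, i.e., the closed-form solution (13).
    F_hat = (U[:, :n_rf] * s[:n_rf]) @ Vh[:n_rf, :]
    F_hat_1, F_hat_2 = F_hat[:n_rf, :], F_hat[n_rf:, :]
    F_bb = F_hat_1
    # X solves X F_hat_1 = F_hat_2 via the pseudo-inverse.
    X = F_hat_2 @ np.linalg.pinv(F_hat_1)
    F_rf = np.vstack([np.eye(n_rf), X])
    return F_rf, F_bb, F_hat

# Hypothetical sizes and a random stand-in for the fully digital precoder.
rng = np.random.default_rng(0)
n_t, n_rf, n_cols = 16, 4, 24
F_opt = rng.standard_normal((n_t, n_cols)) + 1j * rng.standard_normal((n_t, n_cols))
F_rf, F_bb, F_hat = dps_decompose(F_opt, n_rf)

print(np.allclose(F_rf @ F_bb, F_hat))           # factorization reproduces (13)
print(np.allclose(F_rf[:n_rf], np.eye(n_rf)))    # identity block in F_RF
```

The same run also lets one verify the power bound (14), since the Frobenius norm of `F_hat` cannot exceed that of `F_opt`.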

As demonstrated above, by doubling the phase shifters, what we need for the hybrid precoder design is computing a subset of singular values and vectors of the fully digital precoding matrix, i.e., the most principal components of , whose computational complexity is . Recall that OMP, the most popular algorithm for the conventional SPS implementation, has computational complexity , which is higher than that of the simple approach we proposed and depends on the channel parameter . In other words, the proposed DPS implementation equips us with precoding algorithms that are computationally much more efficient than existing ones. Later in Section V, its merits in terms of achievable performance will also be demonstrated via simulations.

### III-C DPS-Enabled SPS Hybrid Precoding

In this part, inspired by the above hybrid precoder design, we propose an efficient way to design the conventional SPS implementation. In particular, based on the solution for the DPS implementation, we adopt a heuristic way to tackle the unit modulus constraints induced by the SPS implementation.

As shown in (13), the optimal hybrid precoder can be decomposed by SVD. Therefore, one optimal solution to the hybrid precoder with the DPS implementation is

$$\mathbf{F}_{\mathrm{RF}}=\mathbf{U}_1,\quad \mathbf{F}_{\mathrm{BB}}=\mathbf{S}_1\mathbf{V}_1^H. \tag{15}$$

Note that the unitary matrix $\mathbf{U}_1$ fully extracts the information of the column space of $\hat{\mathbf{F}}_{\mathrm{opt}}$, a basis of which is formed by the orthonormal columns of $\mathbf{U}_1$.

In contrast, in the SPS implementation, the unit modulus constraints, i.e., not only require each column in to have a constant norm like , but also induce an element-wise constraint. Since each element in can only contain the phase information, we propose to extract the phases of the optimal analog precoder for the DPS implementation to construct the SPS solution, given by

$$\mathbf{F}_{\mathrm{RF}}=\exp\left\{\jmath\angle(\mathbf{U}_1)\right\}. \tag{16}$$

Although this step is based on heuristics, it shall be shown in Section V that simply extracting the phase information only incurs negligible performance loss. Similar approaches can also be found in [15, 8].
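As an illustration of the phase extraction in (16), the sketch below builds a unit-modulus analog precoder from the dominant left singular vectors of a random stand-in for the fully digital precoder. The least-squares refit of the baseband part is an assumption made here only to obtain a residual for comparison; the text does not specify how the digital precoder is updated after the phase extraction. The function name and sizes are hypothetical.

```python
import numpy as np

def sps_analog_precoder(F_opt, n_rf):
    """Unit-modulus analog precoder built from the phases of U1, as in (16)."""
    U, _, _ = np.linalg.svd(F_opt, full_matrices=False)
    return np.exp(1j * np.angle(U[:, :n_rf]))

# Hypothetical sizes and a random stand-in for the fully digital precoder.
rng = np.random.default_rng(1)
n_t, n_rf, n_cols = 16, 4, 24
F_opt = rng.standard_normal((n_t, n_cols)) + 1j * rng.standard_normal((n_t, n_cols))
F_rf = sps_analog_precoder(F_opt, n_rf)

# Assumption (not from the text): refit the baseband precoder by least squares.
F_bb = np.linalg.lstsq(F_rf, F_opt, rcond=None)[0]
sps_residual = np.linalg.norm(F_opt - F_rf @ F_bb) ** 2

# DPS optimum from (13): energy of the discarded singular values.
s = np.linalg.svd(F_opt, compute_uv=False)
dps_residual = np.sum(s[n_rf:] ** 2)
print(sps_residual, dps_residual)
```

Because any product with `n_rf` columns has rank at most `n_rf`, the SPS residual cannot fall below the DPS residual; how close the two are on realistic channels is what the simulations in Section V examine.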

Compared with existing hybrid precoding algorithms with the SPS implementation, e.g., the MO-AltMin [8], OMP [6] algorithms, and the algorithm in [16], the proposed DPS-enabled design method enjoys much lower computational complexity without any iterative procedure, which makes it a good candidate for low-complexity hybrid precoding with the SPS fully-connected structure.

### III-D Interuser Interference Cancellation

While we can perfectly cancel the interuser interference with the fully digital precoder , there will be residual interuser interference when applying the hybrid precoder, which is an approximation of the fully digital one. For the same reason, as hybrid combining is adopted at the receiver side, the interuser interference cannot be canceled by the receiver either. Later in Section V, we will see that in multiuser multicarrier systems, interuser interference is a severe problem that will dramatically degrade the hybrid precoding performance, especially at high SNRs.

In this subsection, after designing the hybrid precoder and combiner, we propose to cascade another digital baseband precoder that is responsible for canceling the residual interuser interference. In particular, with the hybrid precoder and combiner at hand, we define an effective channel for the $k$-th user on the $f$-th subcarrier as

$$\hat{\mathbf{H}}_{k,f}=\mathbf{W}_{\mathrm{BB}k,f}^{H}\mathbf{W}_{\mathrm{RF}k}^{H}\mathbf{H}_{k,f}\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}f}, \tag{17}$$

where $\mathbf{F}_{\mathrm{BB}f}$ is the composite digital precoder on the $f$-th subcarrier. Our goal is to design the precoders $\mathbf{F}_{\mathrm{BD}k,f}$, which satisfy the conditions

$$\hat{\mathbf{H}}_{j,f}\mathbf{F}_{\mathrm{BD}k,f}=\mathbf{0},\quad k\neq j. \tag{18}$$

A simple way to achieve the conditions is the BD precoder, and note that the dimension of the effective channel is , which is sufficient for BD. More details can be found in [24]. Therefore, after cascading the BD precoder at the baseband, the overall digital baseband precoder of the $k$-th user on the $f$-th subcarrier is

$$\mathbf{F}_{\mathrm{B}k,f}=\mathbf{F}_{\mathrm{BB}k,f}\mathbf{F}_{\mathrm{BD}k,f}. \tag{19}$$

Since now we have obtained an interuser interference free system, we can normalize the precoder to satisfy the maximum transmit power, in order to improve the SNRs of the users. The same approach to cancel the interuser interference will also be used in the partially-connected structure and will not be repeatedly presented in the next section.
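The zero-forcing conditions in (18) can be met by placing each user's BD precoder in the null space of the other users' stacked effective channels. The sketch below illustrates this on one subcarrier, assuming random effective channels of hypothetical size $N_s\times KN_s$ with at least two users; the function name `bd_precoders` and the rank tolerance are illustrative choices, and the power normalization mentioned above is not included.

```python
import numpy as np

def bd_precoders(H_eff):
    """Null-space BD precoders for a list of K >= 2 effective channels."""
    K = len(H_eff)
    precoders = []
    for k in range(K):
        # Stack the effective channels of all users except user k.
        others = np.vstack([H_eff[j] for j in range(K) if j != k])
        _, s, Vh = np.linalg.svd(others, full_matrices=True)
        rank = int(np.sum(s > 1e-10 * s[0]))
        # Right singular vectors beyond the rank span the null space.
        precoders.append(Vh[rank:, :].conj().T)
    return precoders

# Hypothetical setting: K = 3 users, N_s = 2 streams per user.
rng = np.random.default_rng(2)
K, n_s = 3, 2
H_eff = [rng.standard_normal((n_s, K * n_s)) + 1j * rng.standard_normal((n_s, K * n_s))
         for _ in range(K)]
F_bd = bd_precoders(H_eff)

leak = max(np.abs(H_eff[j] @ F_bd[k]).max()
           for j in range(K) for k in range(K) if j != k)
print(leak)  # residual interuser leakage, numerically close to zero
```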

## IV Hybrid Precoding for the Partially-connected Structure

One of the shortcomings of the fully-connected structure is the large number of phase shifters. The partially-connected structure, as a more energy efficient and cost-effective structure [8, 21], employs notably fewer phase shifters, i.e., phase shifters with the DPS implementation, which lends itself to practical implementation. Since the DoFs of the analog precoder are greatly reduced, RF-only precoding is far from satisfactory in the partially-connected structure. In this section, we shall first present the hybrid precoding with a fixed mapping from RF chains to antennas. Two algorithms will then be proposed to perform dynamic mapping to further improve the performance.

### IV-A Hybrid Precoding With Fixed Mapping

In [8, 21], fixed mapping was considered in the partially-connected structure, i.e., each RF chain is connected to a certain number of antennas in a predetermined manner. To present the hybrid precoder design with fixed mapping clearly, we take one special mapping [8, 21, 27] as an example in the following, where the -th RF chain is connected to the -th set of adjacent antennas. The corresponding constraint on the analog RF precoding matrix can be visualized as a set of block diagonal matrices , where each block is an dimension vector, i.e.,

 (20)

where . The amplitude of the analog precoding gain for the -th connection from RF chains to antennas is denoted as . Similar to the hybrid precoding in the DPS fully-connected structure, the constraints are redundant and therefore they are omitted in the following derivation. Furthermore, the transmit power constraint is also automatically satisfied by the optimal solution of the hybrid precoder, which will be shown in the following parts. Thus, the hybrid precoder design problem with fixed mapping can be recast as

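$$\begin{aligned}&\underset{\mathbf{F}_{\mathrm{RF}},\mathbf{F}_{\mathrm{BB}}}{\text{minimize}} && \left\|\mathbf{F}_{\mathrm{opt}}-\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right\|_F^2\\ &\text{subject to} && \mathbf{F}_{\mathrm{RF}}\in\mathcal{A}_b.\end{aligned} \tag{21}$$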

Note that there is only one non-zero element in each row of the analog RF precoding matrix . Due to this special structure, different vectors will be multiplied by distinct rows of , which decouples problem (21) into subproblems in an RF chain-by-RF chain sense. The optimization of the hybrid precoder for the -th RF chain is given by

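$$\mathcal{P}_j:\ \underset{\{a_i\},\mathbf{x}_j}{\text{minimize}}\ \sum_{i\in\mathcal{F}_j}\left\|\mathbf{y}_i-a_i\mathbf{x}_j\right\|_2^2, \tag{22}$$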

where , , and .

###### Proposition 1.

The optimal solution to the subproblem is given by the following closed-form expression.

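$$\mathbf{x}_j^\star=\boldsymbol{\lambda}_1\left(\sum_{i\in\mathcal{F}_j}\mathbf{y}_i\mathbf{y}_i^H\right),\quad a_i^\star=\frac{\mathbf{x}_j^H\mathbf{y}_i}{\|\mathbf{x}_j\|_2^2}. \tag{23}$$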
###### Proof:

We check the first order optimality conditions as

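$$\frac{\partial}{\partial a_i}f(a_i,\mathbf{x}_j)=0\ \Rightarrow\ -\mathbf{y}_i^H\mathbf{x}_j+a_i^*\|\mathbf{x}_j\|_2^2=0\ \Rightarrow\ a_i=\frac{\mathbf{x}_j^H\mathbf{y}_i}{\|\mathbf{x}_j\|_2^2}, \tag{24}$$

$$\frac{\partial}{\partial \mathbf{x}_j}f(a_i,\mathbf{x}_j)=0\ \Rightarrow\ \sum_{i\in\mathcal{F}_j}\left(-a_i\mathbf{y}_i^H+|a_i|^2\mathbf{x}_j^H\right)=0\ \Rightarrow\ \sum_{i\in\mathcal{F}_j}|a_i|^2\mathbf{x}_j=\sum_{i\in\mathcal{F}_j}a_i^*\mathbf{y}_i, \tag{25}$$

where $f(a_i,\mathbf{x}_j)$ is the objective function in subproblem $\mathcal{P}_j$. Substituting (24) into (25), we can get

$$\mathbf{x}_j^H\sum_{i\in\mathcal{F}_j}|a_i|^2=\frac{\mathbf{x}_j^H}{\|\mathbf{x}_j\|_2^2}\sum_{i\in\mathcal{F}_j}\mathbf{y}_i\mathbf{y}_i^H\ \Rightarrow\ \left(\sum_{i\in\mathcal{F}_j}\mathbf{y}_i\mathbf{y}_i^H\right)\mathbf{x}_j=\left(\|\mathbf{x}_j\|_2^2\sum_{i\in\mathcal{F}_j}|a_i|^2\right)\mathbf{x}_j\triangleq\lambda_j\mathbf{x}_j, \tag{26}$$

which shows that $\lambda_j$ and $\mathbf{x}_j$ are an eigenvalue and the corresponding eigenvector of $\sum_{i\in\mathcal{F}_j}\mathbf{y}_i\mathbf{y}_i^H$. Moreover, by substituting (24) into the objective function in $\mathcal{P}_j$, it can be rewritten as

$$f(a_i,\mathbf{x}_j)=\sum_{i\in\mathcal{F}_j}\left(\mathbf{y}_i^H\mathbf{y}_i-|a_i|^2\mathbf{x}_j^H\mathbf{x}_j\right)=\sum_{i\in\mathcal{F}_j}\|\mathbf{y}_i\|_2^2-\lambda_j. \tag{27}$$

Hence, minimizing the objective function is equivalent to taking $\lambda_j$ as the largest eigenvalue of the covariance matrix $\sum_{i\in\mathcal{F}_j}\mathbf{y}_i\mathbf{y}_i^H$, denoted as $\lambda_1\left(\sum_{i\in\mathcal{F}_j}\mathbf{y}_i\mathbf{y}_i^H\right)$.

From equation (27), we obtain

$$\begin{aligned}\left\|\mathbf{F}_{\mathrm{opt}}-\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right\|_F^2&=\left\|\mathbf{F}_{\mathrm{opt}}\right\|_F^2-\sum_{j=1}^{N_{\mathrm{RF}}^t}\lambda_j=\left\|\mathbf{F}_{\mathrm{opt}}\right\|_F^2-\sum_{j=1}^{N_{\mathrm{RF}}^t}\left(\|\mathbf{x}_j\|_2^2\sum_{i\in\mathcal{F}_j}|a_i|^2\right)\\ &=\left\|\mathbf{F}_{\mathrm{opt}}\right\|_F^2-\left\|\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right\|_F^2\ge 0\\ &\Rightarrow\ \left\|\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\right\|_F^2\le\left\|\mathbf{F}_{\mathrm{opt}}\right\|_F^2\le KN_sF,\end{aligned} \tag{28}$$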

which means that the transmit power constraint is naturally satisfied by the optimal solutions. While we fixed the mapping strategy as shown in (20), the proposed design approach is applicable to an arbitrary mapping strategy.
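The closed-form solution in Proposition 1 can be sketched in a few lines of NumPy. The code below reads $\mathbf{y}_i$ as the $i$-th row of the fully digital precoder, $\mathbf{x}_j$ as the $j$-th row of the digital precoder, and $a_i$ as the single non-zero entry in the $i$-th row of the analog precoder; this reading is an assumption based on the structure of (21) and (22). The sizes, the random stand-in for the fully digital precoder, and the function name are hypothetical.

```python
import numpy as np

def fixed_mapping_precoder(F_opt, groups):
    """Per-RF-chain closed form (23): principal eigenvector plus projections."""
    n_t, n_cols = F_opt.shape
    n_rf = len(groups)
    F_rf = np.zeros((n_t, n_rf), dtype=complex)
    F_bb = np.zeros((n_rf, n_cols), dtype=complex)
    lam = np.zeros(n_rf)
    for j, idx in enumerate(groups):
        Yj = F_opt[idx, :].T                      # columns are y_i for i in F_j
        w, V = np.linalg.eigh(Yj @ Yj.conj().T)   # eigenvalues in ascending order
        x = V[:, -1]                              # unit-norm principal eigenvector
        lam[j] = w[-1]
        F_bb[j, :] = x
        F_rf[idx, j] = x.conj() @ Yj              # a_i = x^H y_i since ||x|| = 1
    return F_rf, F_bb, lam

# Hypothetical setting: 16 antennas, 4 RF chains, adjacent-antenna mapping.
rng = np.random.default_rng(3)
n_t, n_rf, n_cols = 16, 4, 24
F_opt = rng.standard_normal((n_t, n_cols)) + 1j * rng.standard_normal((n_t, n_cols))
groups = np.arange(n_t).reshape(n_rf, -1)
F_rf, F_bb, lam = fixed_mapping_precoder(F_opt, groups)

residual = np.linalg.norm(F_opt - F_rf @ F_bb) ** 2
print(np.isclose(residual, np.linalg.norm(F_opt) ** 2 - lam.sum()))  # matches (28)
```

Passing a different list of index sets as `groups` applies the same closed form to an arbitrary mapping.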

### IV-B Hybrid Precoding With Dynamic Mapping

Different from the fully-connected structure that utilizes all the connections from RF chains to antennas, the partially-connected structure will induce non-negligible performance loss [8]. In this section, we propose to improve its performance by optimizing the mapping strategy, i.e., we will dynamically determine for each RF chain which antennas it should be connected to. The dynamic mapping problem is given as

where is a set of matrices for which every row only has one non-zero entry, i.e., , meaning that each antenna can only be connected to one RF chain. As indicated by equation (28), once the mapping is fixed, the optimal value of the objective function in (21) is

 (30)

Hence, when we have the freedom to design the mapping strategy from RF chains to antennas, the design target is to seek the mapping that maximizes the sum of the largest eigenvalues, i.e.,

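$$\begin{aligned}&\underset{\{\mathcal{D}_j\}_{j=1}^{N_{\mathrm{RF}}^t}}{\text{maximize}} && \sum_{j=1}^{N_{\mathrm{RF}}^t}\lambda_1\left(\sum_{i\in\mathcal{D}_j}\mathbf{y}_i\mathbf{y}_i^H\right)\\ &\text{subject to} && \begin{cases}\cup_{j=1}^{N_{\mathrm{RF}}^t}\mathcal{D}_j=\{1,\cdots,N_t\}\\ \mathcal{D}_j\cap\mathcal{D}_k=\emptyset,\ \forall j\neq k,\end{cases}\end{aligned} \tag{31}$$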

where $\mathcal{D}_j$ is the mapping set containing the antenna indices that are mapped to the $j$-th RF chain. The dynamic mapping problem is a combinatorial problem and the optimal solution can be found by exhaustive search over an extremely large number of possible mapping strategies, i.e., , which prevents its practical implementation. Therefore, we first propose a greedy algorithm to solve the problem. The pseudocode of the algorithm is omitted due to its simplicity and space limitation.

In each iteration of the greedy algorithm, we connect the -th antenna to the -th RF chain, which is the connection with the maximum increment of the largest eigenvalue when this connection is added into the mapping network. Note that the computational complexity of the algorithm is dominated by the calculation of the largest eigenvalue. In the greedy algorithm, the number of times we need to perform the eigenvalue decomposition (EVD) is , which is a quite large number especially when large-scale antenna arrays are leveraged in mm-wave MIMO systems. To relieve us from the high computational complexity, we then propose a modified K-means algorithm to solve the dynamic design problem (31).
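Since the pseudocode of the greedy algorithm is omitted, the following is only one possible reading of the description above, not the authors' implementation. In every iteration it evaluates all pairs of unassigned antennas and RF chains and adds the connection that yields the largest increment of the corresponding largest eigenvalue. As before, $\mathbf{y}_i$ is assumed to be the $i$-th row of the fully digital precoder, and the sizes and names are hypothetical.

```python
import numpy as np

def greedy_mapping(F_opt, n_rf):
    """Greedy antenna-to-RF-chain mapping for problem (31) (one possible reading)."""
    n_t, n_cols = F_opt.shape
    Y = F_opt.T                                   # column i is y_i (assumption)
    covs = [np.zeros((n_cols, n_cols), dtype=complex) for _ in range(n_rf)]
    lams = np.zeros(n_rf)                         # largest eigenvalue per RF chain
    groups = [[] for _ in range(n_rf)]
    unassigned = set(range(n_t))
    while unassigned:
        best = None
        for i in sorted(unassigned):
            outer = np.outer(Y[:, i], Y[:, i].conj())
            for j in range(n_rf):
                # One eigenvalue computation per candidate connection.
                gain = np.linalg.eigvalsh(covs[j] + outer)[-1] - lams[j]
                if best is None or gain > best[0]:
                    best = (gain, i, j, outer)
        gain, i, j, outer = best
        covs[j] += outer
        lams[j] += gain
        groups[j].append(i)
        unassigned.remove(i)
    return groups, float(lams.sum())

# Hypothetical small setting to keep the number of eigenvalue computations low.
rng = np.random.default_rng(4)
n_t, n_rf, n_cols = 8, 2, 6
F_opt = rng.standard_normal((n_t, n_cols)) + 1j * rng.standard_normal((n_t, n_cols))
groups, total = greedy_mapping(F_opt, n_rf)
print(groups, total)
```

The nested loops make the cost of the repeated eigenvalue computations visible, which is the motivation for the lower-complexity clustering approach.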

We reconsider problem (31) as follows. The problem is equivalent to classifying vectors (antennas) into clusters (RF chains). K-means, aiming at partitioning the observation vectors into clusters, is a prevalent approach for cluster analysis in data mining, where is a predefined parameter, and turns out to be suitable for problem (31). In the classical K-means algorithm, the objective is to minimize the sum of the squared Euclidean distances from each observation vector to the centroid of the cluster it belongs to. (Footnote 6: We present the distortion function in the classical K-means algorithm with a slight abuse of notations and so that the content of the modified one in the following is easier to follow.) The distortion function that is to be minimized in the classical K-means algorithm is given by

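$$D(\mathbf{y}_i,\mathbf{x}_j)=\sum_{j=1}^{K_{\mathrm{cl}}}\sum_{i\in\mathcal{K}_j}\left\|\mathbf{y}_i-\mathbf{x}_j\right\|_2^2, \tag{32}$$

where $\mathbf{y}_i$ are the observation vectors while $\mathbf{x}_j$ is the centroid of the $j$-th cluster.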

However, this distortion function cannot be directly adopted to solve the dynamic mapping design problem (31) since the objectives are quite different. In (31), the objective is to maximize the sum of the largest eigenvalues of the covariance matrices of each cluster. Therefore, we propose to modify the distortion function in the K-means algorithm as

 (33)

The modified distortion function is the sum of Rayleigh quotients of the covariance matrices of each cluster, whose optimal value is the sum of the largest eigenvalues when we maximize (33) over . The overall clustering problem can be written as

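$$\begin{aligned}&\underset{\{\mathcal{D}_j,\mathbf{x}_j\}_{j=1}^{N_{\mathrm{RF}}^t}}{\text{maximize}} && \text{the modified distortion function in (33)}\\ &\text{subject to} && \begin{cases}\cup_{j=1}^{N_{\mathrm{RF}}^t}\mathcal{D}_j=\{1,\cdots,N_t\}\\ \mathcal{D}_j\cap\mathcal{D}_k=\emptyset,\ \forall j\neq k.\end{cases}\end{aligned} \tag{34}$$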

We propose to adopt alternating maximization to solve this problem, which alternately updates the clustering and centroids when the other one is fixed. This approach results in closed-form solutions for the two update procedures.

In the clustering update, we allocate each vector to the cluster whose centroid has the largest inner product with it, i.e., allocate to the -th cluster, where

$$j^\star=\arg\max_{j}\left|\mathbf{y}_i^H\mathbf{x}_j\right|^2. \tag{35}$$

In the centroid update, the optimization of the centroids is equivalent to maximizing the Rayleigh quotients for each cluster, whose optimal solution is simply given by the eigenvector corresponding to the largest eigenvalue, i.e.,

$$\mathbf{x}_j^\star=\boldsymbol{\lambda}_1\left(\sum_{i\in\mathcal{D}_j}\mathbf{y}_i\mathbf{y}_i^H\right). \tag{36}$$

Now we have the modified K-means algorithm, which is summarized as Algorithm 1.

Note that Steps 3 and 4 both give the globally optimal solutions to the clustering and centroid. Hence, the algorithm will converge to a stationary point since it is a two block coordinate descent procedure [28]. Because the modified distortion function is not jointly convex with respect to and , the modified K-means algorithm converges to a local optimum of problem (34) so the solution is sensitive to the initial centroids selection. For hybrid precoding, the size of the observation set is much larger than the cluster number, i.e., . One heuristic rule of thumb to design the initial centroids is to pick observation vectors with small correlations. In our proposed algorithm, we propose to select pairs of vectors out of the observation vectors as the initial centroids, which have the smallest inner products.
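The alternating updates (35) and (36) can be sketched as follows. This is a minimal illustration rather than a reproduction of Algorithm 1: the initialization is a simple stand-in that greedily picks weakly correlated observation vectors, empty clusters keep their previous centroid, and the stopping rule (unchanged assignments or an iteration cap) is an assumption. The vectors $\mathbf{y}_i$ are again taken to be the rows of a random stand-in for the fully digital precoder, and all names and sizes are hypothetical.

```python
import numpy as np

def modified_kmeans(F_opt, n_rf, n_iter=50):
    """Alternating maximization with clustering update (35) and centroid update (36)."""
    Y = F_opt.T                                    # column i is y_i (assumption)
    Yn = Y / np.linalg.norm(Y, axis=0)
    # Stand-in initialization: greedily pick weakly correlated observations.
    corr = np.abs(Yn.conj().T @ Yn)
    picks = [0]
    while len(picks) < n_rf:
        score = corr[:, picks].max(axis=1)
        score[picks] = np.inf
        picks.append(int(np.argmin(score)))
    X = Yn[:, picks].copy()                        # unit-norm initial centroids
    labels = None
    for _ in range(n_iter):
        new_labels = np.argmax(np.abs(X.conj().T @ Y) ** 2, axis=0)   # (35)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(n_rf):
            Yj = Y[:, labels == j]
            if Yj.shape[1] > 0:                    # keep old centroid if cluster is empty
                X[:, j] = np.linalg.eigh(Yj @ Yj.conj().T)[1][:, -1]  # (36)
    lam_sum = 0.0
    for j in range(n_rf):
        Yj = Y[:, labels == j]
        if Yj.shape[1] > 0:
            lam_sum += np.linalg.eigvalsh(Yj @ Yj.conj().T)[-1]
    return labels, X, lam_sum

# Hypothetical sizes: 16 antennas, 4 RF chains, 24 stacked columns.
rng = np.random.default_rng(5)
n_t, n_rf, n_cols = 16, 4, 24
F_opt = rng.standard_normal((n_t, n_cols)) + 1j * rng.standard_normal((n_t, n_cols))
labels, X, lam_sum = modified_kmeans(F_opt, n_rf)
print(labels, lam_sum)
```

Each pass through the loop needs one eigenvalue decomposition per cluster, so the total count scales with the number of RF chains times the number of iterations.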

Recall that EVD is the dominant part of the computational complexity in dynamic mapping design. In each alternating iteration in the modified K-means algorithm, times of EVD are needed and therefore the overall times are , where is the iteration number. For practical settings in Section V, the modified K-means algorithm typically converges within 10 iterations, which is much less than and thus results in significant complexity reduction compared to the greedy algorithm.

### IV-C Fully-connected vs. Partially-connected Structures

There exist several studies [13, 8, 9] investigating different design algorithms for the fully- and partially-connected structures, and comparisons between these two structures are provided via simulations. However, to the best of the authors’ knowledge, so far there are no analytical quantitative comparisons between different structures. The complicated design approaches to handle the unit modulus constraints induced by the SPS implementation are the main obstacles. With the DPS implementation and its resulting low-complexity design approaches at hand, we are able to fill this gap. Following (13) and (21) in Sections III and IV, we obtain that

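$$f_f^\star=\sum_{p=N_{\mathrm{RF}}^t+1}^{N_t}\sigma_p^2(\mathbf{F}_{\mathrm{opt}}),\quad f_p^\star=\left\|\mathbf{F}_{\mathrm{opt}}\right\|_F^2-\sum_{j=1}^{N_{\mathrm{RF}}^t}\lambda_1\left(\sum_{i\in\mathcal{D}_j}\mathbf{y}_i\mathbf{y}_i^H\right), \tag{37}$$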

where $f_f^\star$ and $f_p^\star$ are the optimal values of the objective function in (4) for the fully- and partially-connected structures, respectively, and $\sigma_p(\mathbf{F}_{\mathrm{opt}})$ denotes the $p$-th largest singular value of $\mathbf{F}_{\mathrm{opt}}$. We define the performance gap between the fully- and partially-connected structures as the difference between the two objective values, i.e.,

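$$\begin{aligned}\Delta\triangleq f_p^\star-f_f^\star&=\left\|\mathbf{F}_{\mathrm{opt}}\right\|_F^2-\sum_{j=1}^{N_{\mathrm{RF}}^t}\lambda_1\left(\sum_{i\in\mathcal{D}_j}\mathbf{y}_i\mathbf{y}_i^H\right)-\sum_{p=N_{\mathrm{RF}}^t+1}^{N_t}\sigma_p^2(\mathbf{F}_{\mathrm{opt}})\\ &=\sum_{p=1}^{N_{\mathrm{RF}}^t}\lambda_p\left(\mathbf{F}_{\mathrm{opt}}^H\mathbf{F}_{\mathrm{opt}}\right)-\sum_{j=1}^{N_{\mathrm{RF}}^t}\lambda_1\left(\sum_{i\in\mathcal{D}_j}\mathbf{y}_i\mathbf{y}_i^H\right)\\ &=\sum_{p=1}^{N_{\mathrm{RF}}^t}\lambda_p\left(\mathbf{F}_{\mathrm{opt}}^H\mathbf{F}_{\mathrm{opt}}\right)-\sum_{j=1}^{N_{\mathrm{RF}}^t}\lambda_1\left(\mathbf{Y}_j\mathbf{Y}_j^H\right),\end{aligned} \tag{38}$$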

where $\mathbf{Y}_j$ is composed of the vectors $\{\mathbf{y}_i\}_{i\in\mathcal{D}_j}$ as its columns. Therefore, once the RF-antenna mapping in the partially-connected structure is determined, the performance gap is given by (38), which provides an analytical comparison of the two hybrid precoder structures. This expression indicates that the performance gap depends on the channel realization, as well as the RF chain-antenna mapping strategy.
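For a given mapping, the gap in (38) reduces to a handful of singular value computations. The sketch below evaluates it for a random stand-in for the fully digital precoder and an adjacent-antenna mapping, again assuming that $\mathbf{y}_i$ is the $i$-th row of the fully digital precoder; the sizes and the function name are hypothetical.

```python
import numpy as np

def performance_gap(F_opt, groups):
    """Evaluate f_f, f_p in (37) and the gap in (38) for a given mapping."""
    n_rf = len(groups)
    sigma = np.linalg.svd(F_opt, compute_uv=False)        # descending order
    f_full = np.sum(sigma[n_rf:] ** 2)
    # lambda_1(Y_j Y_j^H) equals the squared largest singular value of Y_j.
    lam = [np.linalg.svd(F_opt[idx, :], compute_uv=False)[0] ** 2 for idx in groups]
    f_part = np.sum(sigma ** 2) - np.sum(lam)
    return f_part - f_full, f_full, f_part

# Hypothetical setting: 16 antennas, 4 RF chains, adjacent-antenna mapping.
rng = np.random.default_rng(6)
n_t, n_rf, n_cols = 16, 4, 24
F_opt = rng.standard_normal((n_t, n_cols)) + 1j * rng.standard_normal((n_t, n_cols))
groups = np.arange(n_t).reshape(n_rf, -1)
gap, f_full, f_part = performance_gap(F_opt, groups)
print(gap)
```

The gap is non-negative for any mapping because the partially-connected product has rank at most $N_{\mathrm{RF}}^t$, so it cannot beat the best approximation of that rank given by (13).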

## V Simulation Results

In this section, we numerically evaluate the performance of the proposed hybrid precoder design for multiuser OFDM mm-wave MIMO systems, where subcarriers are assumed. The channel parameters are given by clusters, rays, and the average power of each cluster is . The AoDs and AoAs follow the Laplacian distribution with uniformly distributed mean angles in and angular spread of 10 degrees. The antenna elements in the USPA are separated by half a wavelength, and all simulation results are averaged over 1000 channel realizations.

### V-A RF-Only Precoding in the Fully-Connected Structure

First, we test how much performance gain we can get when we double the number of phase shifters by investigating the RF-only precoding in the fully-connected structure. Note that, with the conventional SPS implementation, manifold optimization was shown in [8] to be an effective method to directly tackle the unit modulus constraints and achieve higher spectral efficiency than other existing works. In this subsection, we adopt Algorithm 1 in [8] for the SPS implementation as the benchmark to show the advantage of the proposed DPS implementation. For fair comparison, we always keep the same digital baseband precoder that is semi-orthogonal for both SPS and DPS implementations in the simulation. (Footnote 7: Note that we cannot adjust the digital precoder in the RF-only precoding, so we do not apply the additional BD mentioned in Section III-D at this stage.)