Doubling Phase Shifters for Efficient Hybrid Precoder Design in Millimeter-Wave Communication Systems
Abstract
Hybrid precoding is a cost-effective approach to support directional transmissions for millimeter-wave (mmWave) communications, but its precoder design is highly complicated. In this paper, we propose a new hybrid precoder implementation, namely the double phase shifter (DPS) implementation, which enables highly tractable hybrid precoder design. Efficient algorithms are then developed for two popular hybrid precoder structures, i.e., the fully- and partially-connected structures. For the fully-connected one, the RF-only precoding and hybrid precoding problems are formulated as a least absolute shrinkage and selection operator (LASSO) problem and a low-rank matrix approximation problem, respectively. In this way, computationally efficient algorithms are provided to approach the performance of the fully digital precoder with a small number of radio frequency (RF) chains. On the other hand, the hybrid precoder design in the partially-connected structure is identified as an eigenvalue problem. To enhance the performance of this cost-effective structure, dynamic mapping from RF chains to antennas is further proposed, for which a greedy algorithm and a modified K-means algorithm are developed. Simulation results demonstrate the performance gains of the proposed hybrid precoding algorithms over existing ones. They show that, with the proposed DPS implementation, the fully-connected structure enjoys both satisfactory performance and low design complexity, while the partially-connected one serves as an economic solution with low hardware complexity.
I Introduction
The proliferation of smart mobile devices has resulted in an ever-increasing wireless data explosion, which calls for an exponential increase in the capacity of wireless networks. In particular, the upcoming 5G networks require a 1000-fold increase in capacity by 2020 [2]. The spectrum crunch in current wireless systems stimulates extensive interest in exploiting new spectrum bands for cellular communications, and millimeter-wave (mmWave) bands from 30 GHz to 300 GHz have been demonstrated to be promising candidates in recent experiments [3]. Thanks to the smaller wavelength of mmWave signals, large-scale antenna arrays can be leveraged at both the transmitter and receiver sides, which can provide spatial multiplexing gains with the help of multiple-input multiple-output (MIMO) techniques. On the other hand, the tenfold increase of the carrier frequency introduces several challenges to mmWave communication systems, especially the high power consumption and cost of hardware components at mmWave bands [4]. In addition, the large available bandwidth at mmWave frequencies induces severe frequency selectivity, for which multicarrier techniques such as orthogonal frequency-division multiplexing (OFDM) should be utilized. All the above-mentioned design aspects should be taken into consideration when developing practical transceivers for mmWave MIMO systems.
By utilizing a small number of radio frequency (RF) chains to combine a low-dimensional digital baseband precoder and a high-dimensional analog RF precoder, hybrid precoding stands out as a cost-effective transceiver solution for mmWave MIMO systems [5, 6, 7]. Compared with conventional MIMO systems, the differentiating part is the additional high-dimensional analog RF precoder. According to the mapping strategy from RF chains to antennas in the analog RF precoder, hybrid precoders can be categorized into the fully- and partially-connected structures [8]. In the fully-connected structure, each antenna is connected to all the RF chains. In contrast, each antenna is connected to only one RF chain in the partially-connected structure, which significantly reduces the hardware complexity.
To effectively reduce the power consumption in the RF domain, analog RF precoders are usually implemented by phase shifters, at the expense of sacrificing the ability to adjust the amplitude of the RF signals [5]. Thus, the analog component forms the major challenge in designing hybrid precoders. Given the large dimension of the design space and the unit modulus constraints induced by the phase shifter implementation, computational complexity is an important design aspect of hybrid precoders. While various attempts have been made to balance performance and computational complexity, there is still no systematic approach to designing computationally efficient hybrid precoders that also achieve satisfactory performance. In this paper, we show the great potential for developing efficient hybrid precoding algorithms by adopting a novel double phase shifter (DPS) hybrid precoder implementation.
I-A Related Works and Motivation
Most existing works on hybrid precoding focused on the fully-connected structure [6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. The initial efforts started from single-user single-carrier mmWave systems [6, 8, 9, 10] (in this paper, single-carrier systems refer to single-carrier transmissions over flat-fading channels). Then, the investigation was extended to single-user multicarrier [8, 11, 12] and multiuser single-carrier systems [13, 14, 15, 16]. The main differences among these existing works lie in the approaches to dealing with the unit modulus constraints on the analog RF precoder.
By choosing the analog beamforming vectors from a predefined candidate set, e.g., array response vectors in [6, 11, 13, 14] and discrete Fourier transform beamformers in [10], a greedy algorithm called orthogonal matching pursuit (OMP) has been widely used in designing hybrid precoders. Although its computational complexity is relatively low, its performance is not satisfactory, and it has been improved upon by several follow-up works. In [8], it was shown that the unit modulus constraints define a Riemannian manifold, and manifold optimization was introduced to tackle them directly, which helps to approach the performance of the fully digital precoder with a small number of RF chains. Furthermore, the contribution of each phase shifter to the spectral efficiency was identified in [9, 16], based on which the analog precoder was optimized in a phase-shifter-by-phase-shifter fashion. However, these algorithms all involve iterative procedures to optimize the analog RF precoders, which results in high computational complexity. Moreover, some studies investigated how to achieve the performance of the fully digital precoder with the hybrid structure [17, 18], but they require a large number of RF chains, which, to some extent, deviates from the motivation of hybrid precoding.
On the other hand, less attention has been paid to hybrid precoding in the partially-connected structure. In [19, 20], codebook-based design of hybrid precoders was presented for single-user narrowband and OFDM systems, respectively. While using a codebook enjoys low complexity, it incurs a certain performance loss, and how to design the codebook remains to be clarified. By migrating the concept of successive interference cancellation, an iterative hybrid precoding algorithm for the partially-connected structure was proposed in [21] for single-user single-carrier systems. Since the partially-connected structure employs far fewer phase shifters, some degradation in the analog precoding gain is inevitable. This makes it difficult for such a structure to achieve a high spectral efficiency, especially when the analog precoder is shared across all the users and subcarriers, as in multiuser multicarrier systems. Hence, how to efficiently use the limited number of phase shifters is an urgent issue for the partially-connected structure.
As illustrated above, in both the fully- and partially-connected structures, there is no comprehensive way to efficiently design hybrid precoders with satisfactory performance, which motivates us to seek a new hybrid precoding architecture that can resolve this dilemma. Furthermore, it is still unclear how to design hybrid precoders in multiuser multicarrier systems, where a single analog RF precoder is shared by a large number of subcarriers and by multiple users that interfere with each other. In this paper, we propose a novel DPS implementation for hybrid precoding in the general setting of multiuser OFDM mmWave MIMO systems. Although similar implementations were considered in [18, 22], the systematic design approach and algorithmic advantages of this new implementation have not been exploited; they will be illustrated in this paper via effective algorithms for different hybrid precoder structures.
I-B Contributions
Conventionally, a single phase shifter is used to connect an RF chain and an antenna, i.e., the single phase shifter (SPS) implementation, which introduces the unit modulus constraints and hinders efficient algorithm design. In this paper, to overcome this algorithmic difficulty, we propose a novel hybrid precoder implementation that makes the precoder design more tractable. Our main contributions are summarized as follows.

We propose a novel hybrid precoder implementation, i.e., the DPS implementation, which relaxes the unit modulus constraints of the analog RF precoder and thus enables computationally efficient hybrid precoder design. To the best of the authors’ knowledge, this is the first attempt to directly adopt the DPS implementation for designing hybrid precoders in multiuser OFDM mmwave MIMO systems.

For the fully-connected structure, the optimization of the analog RF precoder is formulated as a least absolute shrinkage and selection operator (LASSO) problem, based on which efficient algorithms are developed. Furthermore, the hybrid precoder design is identified as a low-rank matrix approximation problem, which has a closed-form solution. In addition, the efficient algorithm for the DPS implementation inspires an effective heuristic hybrid precoder design for the conventional SPS implementation, which outperforms the state-of-the-art algorithms in both computational complexity and spectral efficiency.

For the partially-connected structure, we identify that the hybrid precoder design is an eigenvalue problem, and provide closed-form solutions for both the analog RF and digital baseband precoders. To further improve the system performance, a dynamic partially-connected structure is proposed. Two effective algorithms, i.e., the greedy and modified K-means algorithms, are proposed to dynamically optimize the mapping from RF chains to antennas.

For both structures, we find that the hybrid precoder in the multiuser setting produces residual inter-user interference, as it only approximates the fully digital precoder. To this end, we propose to cascade an additional block diagonalization (BD) precoder at the baseband to cancel the inter-user interference, which is shown to effectively improve the spectral efficiency and multiplexing gain further.

Analytical results on the performance gap between the fully- and partially-connected structures are provided. Furthermore, extensive comparisons are offered via simulations to unravel valuable design insights. In particular, the proposed algorithm helps the fully-connected structure to easily approach the performance of the fully digital precoder with a reasonably small number of RF chains, which cannot be achieved by the widely used OMP algorithm. On the other hand, for the partially-connected structure, it turns out that the dynamic mapping from RF chains to antennas is crucial to achieving good performance. Furthermore, while the DPS partially-connected structure employs far fewer phase shifters, its performance is comparable to that of the SPS fully-connected structure with the OMP algorithm, which shows its great potential for practical implementation.
I-C Organization
The remainder of this paper is organized as follows. We introduce the system model and the problem formulation in Section II. Then, hybrid precoder design for the fully- and partially-connected structures is presented in Section III and Section IV, respectively. Simulation results are presented in Section V. Finally, we conclude this paper in Section VI.
I-D Notations
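The following notations are used throughout this paper. The imaginary unit is denoted as $j$; $\mathbf{a}$ and $\mathbf{A}$ symbolize a column vector and a matrix, respectively; $\mathbf{A}^T$, $\mathbf{A}^*$, $\mathbf{A}^H$, and $\mathbf{A}^\dagger$ stand for the transpose, conjugate, conjugate transpose, and pseudoinverse of matrix $\mathbf{A}$; the $i$th row, the $j$th column, and the $(i,j)$th entry of matrix $\mathbf{A}$ are denoted as $\mathbf{A}(i,:)$, $\mathbf{A}(:,j)$, and $\mathbf{A}(i,j)$; the determinant, Frobenius norm, and $\ell_1$ norm of matrix $\mathbf{A}$ are expressed as $\det(\mathbf{A})$, $\|\mathbf{A}\|_F$, and $\|\mathbf{A}\|_1$; $\lambda_i(\mathbf{A})$ denotes the $i$th largest eigenvalue of matrix $\mathbf{A}$, and the corresponding eigenvector is denoted as $\mathbf{v}_i(\mathbf{A})$; $\operatorname{tr}(\mathbf{A})$ and $\operatorname{vec}(\mathbf{A})$ indicate the trace and vectorization of matrix $\mathbf{A}$; $\circ$ and $\otimes$ stand for the Hadamard and Kronecker products between two matrices; expectation and the real part of a complex variable are denoted by $\mathbb{E}[\cdot]$ and $\Re\{\cdot\}$.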
II System Model and Problem Formulation
II-A System Model
Consider the downlink transmission of a multiuser OFDM mmWave MIMO system, as shown in Fig. 1, where the base station (BS) is equipped with $N_t$ antennas and transmits signals to $U$ users, each equipped with $N_r$ antennas, over $K$ subcarriers. On each subcarrier, $N_s$ data streams are transmitted to each user. The numbers of RF chains are limited by $UN_s\le N_{RF}^t\le N_t$ and $N_s\le N_{RF}^r\le N_r$, where $N_{RF}^t$ and $N_{RF}^r$ are the numbers of RF chains at the BS and at each user, respectively.
The received signal for the $u$th user on the $k$th subcarrier is given by
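$$\mathbf{y}_{u,k}=\mathbf{W}_{BB,u,k}^H\mathbf{W}_{RF,u}^H\mathbf{H}_{u,k}\sum_{v=1}^{U}\mathbf{F}_{RF}\mathbf{F}_{BB,v,k}\mathbf{s}_{v,k}+\mathbf{W}_{BB,u,k}^H\mathbf{W}_{RF,u}^H\mathbf{n}_{u,k},\tag{1}$$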
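where the subscript $(u,k)$ denotes the $u$th user on the $k$th subcarrier, and $\mathbf{s}_{u,k}\in\mathbb{C}^{N_s}$ is the transmitted symbol vector such that $\mathbb{E}[\mathbf{s}_{u,k}\mathbf{s}_{u,k}^H]=\frac{1}{UN_s}\mathbf{I}_{N_s}$. The digital baseband precoders and combiners are denoted by $\mathbf{F}_{BB,u,k}\in\mathbb{C}^{N_{RF}^t\times N_s}$ and $\mathbf{W}_{BB,u,k}\in\mathbb{C}^{N_{RF}^r\times N_s}$, respectively. Because the transmitted signals of all the users are mixed together by the digital baseband precoder, and the analog RF precoder is a post-IFFT (inverse fast Fourier transform) operation, the analog RF precoder is shared by all the users and subcarriers and is denoted as $\mathbf{F}_{RF}\in\mathbb{C}^{N_t\times N_{RF}^t}$. Similarly, the analog RF combiner is subcarrier-independent for each user $u$ and is denoted as $\mathbf{W}_{RF,u}\in\mathbb{C}^{N_r\times N_{RF}^r}$. Furthermore, the additive noise at the users is represented by $\mathbf{n}_{u,k}\in\mathbb{C}^{N_r}$, whose elements are independent and identically distributed according to the complex Gaussian distribution $\mathcal{CN}(0,\sigma_n^2)$. The achievable sum rate on the $k$th subcarrier when the transmitted symbols follow a Gaussian distribution is given by [6, 23]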
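$$R_k=\sum_{u=1}^{U}\log_2\det\left(\mathbf{I}_{N_s}+\frac{1}{UN_s}\mathbf{R}_{u,k}^{-1}\mathbf{W}_{u,k}^H\mathbf{H}_{u,k}\mathbf{F}_{u,k}\mathbf{F}_{u,k}^H\mathbf{H}_{u,k}^H\mathbf{W}_{u,k}\right),\tag{2}$$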
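where $\mathbf{F}_{u,k}=\mathbf{F}_{RF}\mathbf{F}_{BB,u,k}$ and $\mathbf{W}_{u,k}=\mathbf{W}_{RF,u}\mathbf{W}_{BB,u,k}$ are the precoder and combiner matrices, and $\mathbf{R}_{u,k}=\frac{1}{UN_s}\sum_{v\neq u}\mathbf{W}_{u,k}^H\mathbf{H}_{u,k}\mathbf{F}_{v,k}\mathbf{F}_{v,k}^H\mathbf{H}_{u,k}^H\mathbf{W}_{u,k}+\sigma_n^2\mathbf{W}_{u,k}^H\mathbf{W}_{u,k}$ stands for the interference-plus-noise matrix.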
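To make the rate expression concrete, the following Python sketch evaluates the per-subcarrier sum rate for given overall precoders $\mathbf{F}_{u,k}$ and combiners $\mathbf{W}_{u,k}$. It is a minimal illustration under stated assumptions. It assumes $\mathbb{E}[\mathbf{s}_{u,k}\mathbf{s}_{u,k}^H]=p\,\mathbf{I}_{N_s}$ with a user-supplied per-symbol power $p$ (e.g., $p=1/(UN_s)$) and noise variance $\sigma_n^2$. The function name `subcarrier_sum_rate` is hypothetical and does not come from the paper.

```python
import numpy as np


def subcarrier_sum_rate(H, F, W, noise_var, symbol_power):
    """Sum rate (bits/s/Hz) on one subcarrier.

    H[u]: N_r x N_t channel of user u.
    F[u]: N_t x N_s overall precoder of user u (e.g., F_RF @ F_BB[u]).
    W[u]: N_r x N_s overall combiner of user u (e.g., W_RF[u] @ W_BB[u]).
    symbol_power: p such that E[s s^H] = p * I (assumed convention).
    """
    num_users = len(H)
    rate = 0.0
    for u in range(num_users):
        G = W[u].conj().T @ H[u]                  # combined channel of user u
        D = G @ F[u]                              # desired-signal term
        S = symbol_power * D @ D.conj().T
        # interference-plus-noise matrix R_{u,k}
        R = noise_var * (W[u].conj().T @ W[u])
        for v in range(num_users):
            if v != u:
                I_v = G @ F[v]                    # leakage from user v's streams
                R = R + symbol_power * I_v @ I_v.conj().T
        M = np.eye(S.shape[0]) + np.linalg.solve(R, S)
        _, logdet = np.linalg.slogdet(M)          # natural log of |det(M)|
        rate += logdet / np.log(2.0)
    return rate


# Single-user, single-stream sanity check: log2(1 + |h|^2 p / sigma^2) = log2(5)
r = subcarrier_sum_rate([np.array([[2.0]])], [np.array([[1.0]])],
                        [np.array([[1.0]])], noise_var=1.0, symbol_power=1.0)
print(round(r, 6))  # 2.321928
```

Off-diagonal blocks $\mathbf{W}_{u,k}^H\mathbf{H}_{u,k}\mathbf{F}_{v,k}$ that are not zero reduce the rate through $\mathbf{R}_{u,k}$. This is the inter-user interference that the hybrid designs below must keep small.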
The mmWave MIMO channel between the BS and the $u$th user on the $k$th subcarrier, denoted as $\mathbf{H}_{u,k}\in\mathbb{C}^{N_r\times N_t}$, can be characterized by the Saleh-Valenzuela model as in [6, 8, 11]. Although this specific channel model will be used in the simulations, our precoder design approaches are compatible with other general channel models.
II-B New Hybrid Precoder Implementation
According to the mapping strategies from RF chains to antennas, the hybrid precoder structures can be classified into the fully- and partially-connected ones [8, Fig. 1]. The fully-connected structure fully exploits the degrees of freedom (DoFs) in the RF domain with a natural mapping strategy, i.e., connecting each RF chain to all the antennas. On the contrary, in the partially-connected structure, each antenna element is connected to only one RF chain. These two mapping strategies (structures) correspond to different constraints in the hybrid precoder design problem, which will be illustrated in detail in Sections III and IV.
As mentioned before, the analog RF precoder is practically implemented by phase shifters. Conventionally, in either the fully- or partially-connected structure [8], each connection from an RF chain to one of its connected antenna elements is implemented by a single phase shifter, as shown in Fig. 2(a), which is referred to as the SPS implementation in this paper. This implementation implies that each nonzero element in the analog precoding and combining matrices should have unit modulus, i.e., $|\mathbf{F}_{RF}(i,j)|=1$ and $|\mathbf{W}_{RF,u}(i,j)|=1$. This is intrinsically a nonconvex constraint and difficult to tackle, which forms the main design challenge. Although some approaches can directly deal with this nonconvex constraint [8, 9], their design complexity is still unacceptable in mmWave systems, whose coherence time is much shorter than that of current sub-6 GHz systems. As a matter of fact, the main obstacle is that only the phase, but not the amplitude, of the RF signals can be adjusted. This motivates us to consider an alternative hybrid precoder implementation that can adjust the amplitude of the RF signals, yet is still realized by phase shifters.
In this paper, we propose a new implementation, shown in Fig. 2(b) and referred to as the DPS implementation [1], where the phase shifter network is divided into two groups. For each connection from an RF chain to one of its connected antenna elements, one unique phase shifter in each group is selected, and the two outputs are summed to compose the analog precoding gain. With this special implementation, each nonzero element in the analog RF precoding and combining matrices corresponds to a sum of two phase shifters. Note that the summation makes it possible to adjust the amplitude of the RF signals, which can be at most two, i.e., the new constraints for the analog RF precoder and combiner are $|\mathbf{F}_{RF}(i,j)|\le 2$ and $|\mathbf{W}_{RF,u}(i,j)|\le 2$ for all the nonzero entries. By doubling the number of phase shifters, the new constraints become convex and therefore make it more tractable to develop low-complexity design approaches. We impose these amplitude constraints in this paper, and the actual phase shifter settings can then be easily obtained by factorizing a complex number with amplitude no larger than two into two unit-modulus components, expressed as
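$$a e^{j\theta}=e^{j\left(\theta+\arccos\frac{a}{2}\right)}+e^{j\left(\theta-\arccos\frac{a}{2}\right)},\tag{3}$$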
where $a$ and $\theta$ are the amplitude and phase of a nonzero element in $\mathbf{F}_{RF}$ or $\mathbf{W}_{RF,u}$, and $0\le a\le 2$.
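The factorization in (3) is easy to check numerically. The Python sketch below maps every entry of an analog precoder with modulus at most two to the two phase-shifter angles $\theta\pm\arccos(a/2)$. It then confirms that the two unit-modulus terms add back to the original entry. The helper name `dps_factorize` and the matrix size are illustrative assumptions. The clipping only guards against floating-point round-off.

```python
import numpy as np


def dps_factorize(F_rf):
    """Return phase matrices (phi1, phi2) with exp(1j*phi1) + exp(1j*phi2) == F_rf.

    Each entry a*exp(1j*theta) of F_rf must satisfy a <= 2, as required by (3).
    """
    F_rf = np.asarray(F_rf, dtype=complex)
    amp = np.abs(F_rf)
    if np.any(amp > 2.0 + 1e-12):
        raise ValueError("DPS entries must have modulus <= 2")
    theta = np.angle(F_rf)
    delta = np.arccos(np.clip(amp / 2.0, 0.0, 1.0))   # arccos(a / 2)
    return theta + delta, theta - delta


rng = np.random.default_rng(0)
shape = (8, 4)                                          # e.g., N_t = 8, N_RF = 4
F_rf = rng.uniform(0.0, 2.0, shape) * np.exp(1j * rng.uniform(-np.pi, np.pi, shape))
phi1, phi2 = dps_factorize(F_rf)
recon = np.exp(1j * phi1) + np.exp(1j * phi2)
print(np.allclose(recon, F_rf))                         # True
```

An entry of zero maps to $\theta\pm\pi/2$, so the two phase shifters cancel. An entry of modulus two maps to two identical phases.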
Remark 1: Despite the increased number of phase shifters, as will be shown in this paper, the DPS implementation enjoys unique advantages in both algorithmic and performance aspects. It also provides valuable guidelines for other hybrid precoder design problems. We highlight the benefits of this proposal as follows.

The hybrid precoder design is greatly simplified when the DPS implementation is adopted, as illustrated in Sections III and IV.

With this new implementation, hybrid precoders can approach the performance of the fully digital precoder with fewer RF chains than in existing works. Thus, this proposal provides an algorithmically efficient hybrid precoder design for general multiuser multicarrier mmWave systems.

The DPS fully-connected hybrid precoder structure serves as a performance upper bound for structures with lower hardware complexity. It is a tighter upper bound than the fully digital precoder, especially when the number of RF chains is small.

The precoder design problem becomes a low-rank matrix approximation (eigenvalue) problem for the DPS fully-connected (partially-connected) structure. This makes theoretical analysis possible, whereas it is intractable for other structures, and it helps to better understand hybrid precoding systems.

Thanks to its benefits from both performance and algorithmic perspectives, the proposed DPS implementation is expected to drive hardware research on this implementation.
II-C Problem Formulation
There exist different problem formulations for hybrid precoding. Some works tried to directly maximize the spectral efficiency, based either on approximations and bounds in single-user systems [9, 16] or on extra constraints on the analog precoder that simplify the design in multiuser single-carrier systems [15, 13]. However, when it comes to multiuser multicarrier systems, it is highly challenging and intractable to directly optimize the hybrid precoder with the spectral efficiency as the objective function, since the spectral efficiencies of all users on all subcarriers are coupled through the shared analog RF precoder. On the other hand, extensive works showed that minimizing the Euclidean distance between the fully digital precoder and the hybrid precoder is an effective surrogate for maximizing the spectral efficiency in mmWave MIMO systems [6, 8, 20, 10, 11, 14, 1]. Here, the Euclidean distance between two precoders refers to the Euclidean distance between the two points determined by the vectorizations of the two precoding matrices. In this paper, we adopt this alternative objective as our design goal. Focusing on the precoder design (the combiner design problem can be formulated in the same way without the transmit power constraint), the formulation is given by
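$$\begin{aligned}\underset{\mathbf{F}_{RF},\mathbf{F}_{BB}}{\text{minimize}}\quad&\|\mathbf{F}_{opt}-\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F\\ \text{subject to}\quad&\mathbf{F}_{RF}\in\mathcal{A},\\ &\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2\le KUN_s,\end{aligned}\tag{4}$$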
where $\mathbf{F}_{opt}=[\mathbf{F}_{opt,1,1},\ldots,\mathbf{F}_{opt,U,K}]\in\mathbb{C}^{N_t\times KUN_s}$ is the combined fully digital precoder, and $\mathbf{F}_{BB}=[\mathbf{F}_{BB,1,1},\ldots,\mathbf{F}_{BB,U,K}]\in\mathbb{C}^{N_{RF}^t\times KUN_s}$ is the concatenated digital baseband precoder. The second constraint is the transmit power constraint at the BS. The analog RF precoder $\mathbf{F}_{RF}$ is a common component for all users and subcarriers. It is restricted to the feasible set $\mathcal{A}$ induced by the phase shifter implementation, which will be specified later for different hybrid precoder structures. Justifications for the formulation (4) for single-user systems with flat-fading channels were provided in [6]. Here we provide some intuition for this formulation for general hybrid precoding systems. The fully digital precoder serves as a performance upper bound for the hybrid one, and an ideal design goal is to obtain hybrid precoders that approach the performance of the fully digital precoder. Therefore, it is intuitive to formulate the design problem as approximating the fully digital precoder with the hybrid one.
With this formulation, the proposed algorithms can be applied with any fully digital precoder. In this paper, we adopt the classical BD precoder as the fully digital precoder, which is asymptotically optimal in the high signal-to-noise ratio (SNR) regime [24]. We will investigate the hybrid precoder design with the DPS implementation for the fully- and partially-connected structures in Sections III and IV, respectively.
III Hybrid Precoding for the Fully-Connected Structure
The fully-connected hybrid precoder structure has drawn much research attention in recent years [6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], and it will be investigated in this section with the new DPS implementation. We first present an RF-only precoder to demonstrate the advantages of doubling the phase shifters, where the optimization of the analog RF precoder is formulated as a LASSO problem. Afterwards, the hybrid precoder design is performed via a simple low-rank matrix approximation.
Iiia RFOnly Precoding
The main difference between the conventional SPS hybrid precoder implementation and the proposed DPS one lies in the analog RF precoder. Therefore, we first present an RF-only precoder design [25], where the analog RF precoder is optimized for a given digital precoder. This problem may arise as a subproblem in hybrid precoder design, as in [1, 14, 9], or in situations where the digital precoder has a fixed design, e.g., from a codebook. The investigation of this problem will demonstrate the algorithmic advantage of the DPS implementation. For the fully-connected structure, the feasible set can be specified as $\mathcal{A}_f=\{\mathbf{F}_{RF}\in\mathbb{C}^{N_t\times N_{RF}^t}:|\mathbf{F}_{RF}(i,j)|\le 2,\ \forall i,j\}$, as each RF chain is connected to all the antenna elements. The analog RF precoder design problem is given by
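$$\begin{aligned}\underset{\mathbf{F}_{RF}}{\text{minimize}}\quad&\|\mathbf{F}_{opt}-\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F\\ \text{subject to}\quad&|\mathbf{F}_{RF}(i,j)|\le 2,\ \forall i,j.\end{aligned}\tag{5}$$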
Note that the power constraint in (4) is temporarily removed. In fact, after designing the analog RF precoder, we can normalize it if the transmit power constraint is not satisfied. It has been shown in [8, Lemma 1] that, as long as the Euclidean distance between the fully digital precoder and the hybrid precoder is sufficiently small when the power constraint is ignored, the normalization step also achieves a small distance to the fully digital precoder. Problem (5) is convex and can be solved by solvers such as CVX. Nevertheless, to further reduce the computational complexity, we will exploit the inherent structure of the solution by considering its dual problem.
Lemma 1.
Proof:
See Appendix A. ∎
Based on Lemma 1, problem (5) is transformed into a LASSO problem. This makes it possible to leverage the large body of existing work on low-complexity algorithms for the general LASSO problem [26]. Recall that, with the conventional SPS implementation, the analog RF precoder is optimized through high-complexity algorithms such as manifold optimization [8] to achieve good performance. In contrast, doubling the phase shifters offers huge potential to significantly reduce the computational complexity of designing the analog RF precoder.
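Because Lemma 1 turns (5) into a LASSO problem, any generic LASSO solver can be used. The sketch below implements the iterative shrinkage-thresholding algorithm (ISTA) for a complex-valued LASSO $\min_{\mathbf{x}}\frac{1}{2}\|\mathbf{b}-\mathbf{A}\mathbf{x}\|_2^2+\lambda\|\mathbf{x}\|_1$. It is one illustrative choice, not necessarily the solver used by the authors. The symbols $\mathbf{A}$, $\mathbf{b}$, and $\lambda$ are generic placeholders, because the exact observation matrix of Lemma 1 is not reproduced here. When $\mathbf{A}$ has orthonormal columns ($\mathbf{A}^H\mathbf{A}=\mathbf{I}$), ISTA reaches the soft-thresholding solution $e^{j\angle(\mathbf{A}^H\mathbf{b})}\circ\max(|\mathbf{A}^H\mathbf{b}|-\lambda,0)$ after one iteration.

```python
import numpy as np


def soft_threshold(z, t):
    """Complex soft-thresholding: shrink each modulus by t and keep its phase."""
    return np.exp(1j * np.angle(z)) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(A, b, lam, n_iter=500):
    """ISTA for min_x 0.5 * ||b - A x||_2^2 + lam * ||x||_1 with complex data."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - b)           # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((16, 6)) + 1j * rng.standard_normal((16, 6))
Q, _ = np.linalg.qr(A)                            # 16 x 6 with orthonormal columns
b = rng.standard_normal(16) + 1j * rng.standard_normal(16)
lam = 0.8
x_ista = lasso_ista(Q, b, lam)
x_closed = soft_threshold(Q.conj().T @ b, lam)
print(np.allclose(x_ista, x_closed))              # True
```

For a general (non-orthonormal) observation matrix, the same routine converges to the LASSO optimum at the usual ISTA rate. Accelerated variants such as FISTA reduce the number of iterations.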
One special case deserves additional mention, because it admits a closed-form solution that further reduces the computational complexity. It was shown in [8, 9] that a semi-orthogonal digital baseband precoder, i.e., $\mathbf{F}_{BB}\mathbf{F}_{BB}^H=\alpha^2\mathbf{I}_{N_{RF}^t}$ for some $\alpha>0$, leads to an approximately optimal solution. (Note that in mmWave multiuser OFDM systems, $\mathbf{F}_{BB}\in\mathbb{C}^{N_{RF}^t\times KUN_s}$, where $N_{RF}^t<KUN_s$ for practical system parameters, which means that $\mathbf{F}_{BB}$ is a fat matrix.) Therefore, we resort to this special case, where the observation matrix in the LASSO problem (6) is also semi-orthogonal, i.e.,
(9) 
With a semi-orthogonal observation matrix, the LASSO problem (6) has a closed-form solution called soft-thresholding [26], which is given by
(10) 
where $\angle(\cdot)$, $|\cdot|$, and $\max(\cdot,0)$ are elementwise operations, and the first two extract the phase and amplitude of a complex variable, respectively. Then, substituting (9) and (10) into (8), we obtain the corresponding optimal solution to $\mathbf{F}_{RF}$ in (5) as
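$$\mathbf{F}_{RF}=e^{j\angle(\mathbf{F}_{opt}\mathbf{F}_{BB}^H)}\circ\min\left(\frac{|\mathbf{F}_{opt}\mathbf{F}_{BB}^H|}{\alpha^2},2\right),\tag{11}$$
where the exponential, $\angle(\cdot)$, $|\cdot|$, and $\min(\cdot,2)$ are applied elementwise, and $\alpha^2$ is given by $\mathbf{F}_{BB}\mathbf{F}_{BB}^H=\alpha^2\mathbf{I}_{N_{RF}^t}$.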
Note that, to obtain the optimal analog RF precoder when the digital baseband precoder is semi-orthogonal, the product of $\mathbf{F}_{opt}$ and $\mathbf{F}_{BB}^H$ is the only required matrix operation. This is computationally much more efficient than solving the original problem (5) with an algorithm-embedded solver. This result also suggests that it is beneficial to choose a semi-orthogonal digital baseband precoder in RF-only precoding with the DPS implementation.
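The closed form for the semi-orthogonal case can be cross-checked numerically. When $\mathbf{F}_{BB}\mathbf{F}_{BB}^H=\alpha^2\mathbf{I}$, the squared objective of (5) equals $\alpha^2\|\mathbf{F}_{RF}-\mathbf{F}_{opt}\mathbf{F}_{BB}^H/\alpha^2\|_F^2$ plus a constant. The optimum is therefore the elementwise projection of $\mathbf{F}_{opt}\mathbf{F}_{BB}^H/\alpha^2$ onto the disk of radius two. The sketch below compares this projection with a generic projected-gradient solver of (5), which works for any $\mathbf{F}_{BB}$. The helper names, dimensions, and random matrices are illustrative assumptions.

```python
import numpy as np


def project_to_disk(Z, radius=2.0):
    """Elementwise projection onto {z : |z| <= radius} (keeps phase, clips modulus)."""
    return np.exp(1j * np.angle(Z)) * np.minimum(np.abs(Z), radius)


def rf_only_semi_orthogonal(F_opt, F_BB):
    """Closed-form optimum of (5) when F_BB @ F_BB^H = alpha^2 * I."""
    alpha2 = np.real(np.trace(F_BB @ F_BB.conj().T)) / F_BB.shape[0]
    return project_to_disk(F_opt @ F_BB.conj().T / alpha2)


def rf_only_projected_gradient(F_opt, F_BB, n_iter=300):
    """Generic projected-gradient solver for (5); valid for any F_BB."""
    step = 1.0 / np.linalg.norm(F_BB, 2) ** 2
    F_RF = np.zeros((F_opt.shape[0], F_BB.shape[0]), dtype=complex)
    for _ in range(n_iter):
        grad = (F_RF @ F_BB - F_opt) @ F_BB.conj().T
        F_RF = project_to_disk(F_RF - step * grad)
    return F_RF


rng = np.random.default_rng(2)
N_t, N_rf, M = 16, 4, 12            # M plays the role of K*U*N_s (fat F_BB)
F_opt = rng.standard_normal((N_t, M)) + 1j * rng.standard_normal((N_t, M))
Q, _ = np.linalg.qr(rng.standard_normal((M, N_rf)) + 1j * rng.standard_normal((M, N_rf)))
alpha = 0.5
F_BB = alpha * Q.conj().T           # orthogonal rows: F_BB F_BB^H = alpha^2 I

F_closed = rf_only_semi_orthogonal(F_opt, F_BB)
F_pgd = rf_only_projected_gradient(F_opt, F_BB)
print(np.allclose(F_closed, F_pgd), np.max(np.abs(F_closed)) <= 2.0 + 1e-12)  # True True
```

With $\alpha=0.5$, many entries of $\mathbf{F}_{opt}\mathbf{F}_{BB}^H/\alpha^2$ exceed two, so the amplitude constraint is active. The agreement with the iterative solver shows that the clipping step really is optimal.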
III-B Hybrid Precoding
Previously, we demonstrated the benefit of doubling the phase shifters when optimizing the analog part. When the digital baseband precoder can be jointly optimized, the hybrid precoder design problem further reduces to an unconstrained matrix decomposition problem, i.e.,
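$$\underset{\mathbf{F}_{RF},\mathbf{F}_{BB}}{\text{minimize}}\quad\|\mathbf{F}_{opt}-\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F.\tag{12}$$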
Remark 2: The constraint in (5), i.e., $|\mathbf{F}_{RF}(i,j)|\le 2$, is in fact redundant in hybrid precoding. Once an unconstrained optimal pair $(\mathbf{F}_{RF}^\star,\mathbf{F}_{BB}^\star)$ is obtained, one can always get another optimal pair $(\mathbf{F}_{RF}^\star/\beta,\beta\mathbf{F}_{BB}^\star)$ with the factor $\beta=\max_{i,j}|\mathbf{F}_{RF}^\star(i,j)|/2$. This pair satisfies the constraint $|\mathbf{F}_{RF}(i,j)|\le 2$ without affecting the objective value. On the other hand, one may consider deploying $N$ phase shifters for each connection from an RF chain to an antenna, in which case the corresponding constraint would be $|\mathbf{F}_{RF}(i,j)|\le N$. As illustrated above, this constraint is also redundant, and the factor $\beta=\max_{i,j}|\mathbf{F}_{RF}^\star(i,j)|/N$ can be applied. Therefore, from both performance and algorithmic perspectives, further increasing the number of phase shifters does not help. Obviously, the minimum number, i.e., two phase shifters, should be adopted due to cost and power considerations. Furthermore, the transmit power constraint is automatically satisfied by the optimal hybrid precoder, as elaborated in the following optimization.
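The rescaling argument of Remark 2 can be sketched in a few lines of Python. The hypothetical helper `rescale_for_dps` takes any unconstrained factorization and returns $(\mathbf{F}_{RF}/\beta,\beta\mathbf{F}_{BB})$. As a small variant of the remark, it only rescales when $\beta>1$, i.e., when some entry actually violates the amplitude limit. The parameter `max_amp` plays the role of the number of phase shifters per connection ($N$ in the remark).

```python
import numpy as np


def rescale_for_dps(F_RF, F_BB, max_amp=2.0):
    """Return (F_RF / beta, beta * F_BB) so that max |F_RF(i, j)| <= max_amp.

    The product F_RF @ F_BB, and hence the objective of (12), is unchanged.
    """
    beta = max(np.max(np.abs(F_RF)) / max_amp, 1.0)   # shrink only if needed
    return F_RF / beta, beta * F_BB


rng = np.random.default_rng(3)
F_RF = 5.0 * (rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3)))
F_BB = rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))
F_RF2, F_BB2 = rescale_for_dps(F_RF, F_BB)
print(np.allclose(F_RF2 @ F_BB2, F_RF @ F_BB))        # True: same hybrid precoder
print(np.isclose(np.max(np.abs(F_RF2)), 2.0))          # True: limit is met with equality
```

Calling the helper with `max_amp=4.0` gives the four-phase-shifter case. The product is again unchanged, which illustrates why extra phase shifters bring no benefit.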
While the main focus of this paper is on multiuser multicarrier systems, some advantages of the proposed DPS implementation in hybrid precoding are first presented for single-carrier systems, as shown in the following result.
Lemma 2.
For single-carrier systems, with the DPS implementation, the fully digital precoder $\mathbf{F}_{opt}$ can be perfectly decomposed into $\mathbf{F}_{RF}$ and $\mathbf{F}_{BB}$ using the minimum number of RF chains, i.e., $N_{RF}^t=UN_s$ and $N_{RF}^r=N_s$.
Proof:
The proof follows from the rank sufficiency of $\mathbf{F}_{RF}$ and $\mathbf{F}_{BB}$ in the decomposition $\mathbf{F}_{opt}=\mathbf{F}_{RF}\mathbf{F}_{BB}$ when $K=1$, and is omitted due to space limitations. ∎
Lemma 2 shows that, for single-carrier systems with either single-user or multiuser transmissions, the performance of the fully digital precoder can be easily achieved by the hybrid precoder via a simple matrix decomposition. Note that, with the conventional SPS implementation, the number of RF chains should be at least twice the number of data streams in order to realize the fully digital precoder, i.e., $N_{RF}^t\ge 2UN_s$ and $N_{RF}^r\ge 2N_s$ [8, 9]. With this minimum number of RF chains, the numbers of phase shifters in use are the same for both the SPS and DPS implementations, i.e., $2N_tUN_s$ at the BS. Hence, the proposed DPS implementation, which requires fewer RF chains, is more energy efficient when achieving the fully digital precoder.
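The following sketch gives a minimal numerical illustration of Lemma 2 and of the phase-shifter count comparison. It assumes $K=1$ (a single subcarrier). With $N_{RF}^t=UN_s$, one simple feasible DPS factorization is $\mathbf{F}_{RF}=\mathbf{F}_{opt}/\beta$ and $\mathbf{F}_{BB}=\beta\mathbf{I}$, with $\beta$ chosen as in Remark 2. This construction is only one instance of the rank argument, not necessarily the factorization intended in the omitted proof. The dimensions and helper names are hypothetical.

```python
import numpy as np


def single_carrier_dps(F_opt, max_amp=2.0):
    """Exact DPS factorization of an N_t x (U*N_s) fully digital precoder with U*N_s RF chains."""
    beta = max(np.max(np.abs(F_opt)) / max_amp, 1.0)
    F_RF = F_opt / beta                      # every entry now has modulus <= 2
    F_BB = beta * np.eye(F_opt.shape[1])     # N_RF^t = U * N_s
    return F_RF, F_BB


def phase_shifters(n_t, n_rf, per_connection):
    """Number of phase shifters in a fully-connected analog precoder."""
    return n_t * n_rf * per_connection


rng = np.random.default_rng(4)
N_t, U, N_s = 32, 4, 2
F_opt = rng.standard_normal((N_t, U * N_s)) + 1j * rng.standard_normal((N_t, U * N_s))
F_RF, F_BB = single_carrier_dps(F_opt)
print(np.allclose(F_RF @ F_BB, F_opt))                                      # True
# SPS with 2*U*N_s RF chains vs. DPS with U*N_s RF chains
print(phase_shifters(N_t, 2 * U * N_s, 1), phase_shifters(N_t, U * N_s, 2))  # 512 512
```

Both options use the same 512 phase shifters here, but the DPS option needs half as many RF chains. This is the source of the energy-efficiency advantage stated above.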
In multiuser multicarrier systems, typically $KUN_s>N_{RF}^t$. Assuming without loss of generality that all the precoding matrices in (12) have full rank, the rank of $\mathbf{F}_{opt}$ is then $\min(N_t,KUN_s)$, no longer $UN_s$ as in single-carrier systems. Thus, a perfect decomposition can only be achieved when $N_{RF}^t\ge\min(N_t,KUN_s)$, which severely deviates from the setting of hybrid precoding. Therefore, the matrix decomposition cannot be perfect for hybrid precoder design due to the rank deficiency, i.e., $\operatorname{rank}(\mathbf{F}_{RF}\mathbf{F}_{BB})\le N_{RF}^t<\operatorname{rank}(\mathbf{F}_{opt})$. Problem (12) is thus typically a low-rank matrix approximation problem, with the closed-form solution
$$\mathbf{F}_{RF}\mathbf{F}_{BB}=\mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^H.\tag{13}$$
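Denote the singular value decomposition (SVD) of $\mathbf{F}_{opt}$ as $\mathbf{F}_{opt}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$. The matrices $\mathbf{U}_1$ and $\mathbf{V}_1$ consist of the first $N_{RF}^t$ columns of $\mathbf{U}$ and $\mathbf{V}$, respectively. $\boldsymbol{\Sigma}_1$ is the diagonal matrix whose diagonal elements are the $N_{RF}^t$ largest singular values of $\mathbf{F}_{opt}$. This means that the optimal $\mathbf{F}_{RF}\mathbf{F}_{BB}$ is simply obtained by extracting the $N_{RF}^t$ principal components of $\mathbf{F}_{opt}$. From the optimal solution (13), we observe that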
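$$\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2=\|\mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^H\|_F^2=\|\boldsymbol{\Sigma}_1\|_F^2\le\|\boldsymbol{\Sigma}\|_F^2=\|\mathbf{F}_{opt}\|_F^2,\tag{14}$$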
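which means that the optimal solution $\mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^H$ satisfies the transmit power constraint whenever the fully digital precoder $\mathbf{F}_{opt}$ satisfies it. So far, we have obtained the optimal solution for the entire hybrid precoder, and our next task is to decompose it into two parts. In fact, there are many options for decomposing $\mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^H$ into $\mathbf{F}_{RF}\mathbf{F}_{BB}$. Nevertheless, we are especially interested in the following one.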
Lemma 3.
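The matrix $\mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^H$ can be decomposed into $\mathbf{F}_{RF}\mathbf{F}_{BB}$ in the following form:
$$\mathbf{F}_{RF}=\begin{bmatrix}\mathbf{I}_{N_{RF}^t}\\ \mathbf{U}_{12}\mathbf{U}_{11}^{-1}\end{bmatrix},\qquad\mathbf{F}_{BB}=\mathbf{U}_{11}\boldsymbol{\Sigma}_1\mathbf{V}_1^H,$$
where $\mathbf{U}_{11}$ and $\mathbf{U}_{12}$ are the first $N_{RF}^t$ rows and the $(N_{RF}^t+1)$th to $N_t$th rows of $\mathbf{U}_1$, respectively.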
Proof:
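Assume $\mathbf{F}_{RF}=\begin{bmatrix}\mathbf{I}_{N_{RF}^t}\\ \mathbf{X}\end{bmatrix}$. Then the main task in proving Lemma 3 is to find $\mathbf{X}$ and $\mathbf{F}_{BB}$ that satisfy $\mathbf{F}_{RF}\mathbf{F}_{BB}=\mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^H$.
First, we have
$$\begin{bmatrix}\mathbf{F}_{BB}\\ \mathbf{X}\mathbf{F}_{BB}\end{bmatrix}=\begin{bmatrix}\mathbf{U}_{11}\boldsymbol{\Sigma}_1\mathbf{V}_1^H\\ \mathbf{U}_{12}\boldsymbol{\Sigma}_1\mathbf{V}_1^H\end{bmatrix}.$$
Therefore, it is easy to determine that $\mathbf{F}_{BB}=\mathbf{U}_{11}\boldsymbol{\Sigma}_1\mathbf{V}_1^H$. The remaining task is to solve the equation
$$\mathbf{X}\mathbf{U}_{11}\boldsymbol{\Sigma}_1\mathbf{V}_1^H=\mathbf{U}_{12}\boldsymbol{\Sigma}_1\mathbf{V}_1^H.$$
Since $\mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^H$ has rank $N_{RF}^t$ and $\mathbf{U}_1$ is obtained from the SVD of $\mathbf{F}_{opt}$, the first $N_{RF}^t$ rows of $\mathbf{U}_1$ (the rows of $\mathbf{U}_{11}$) are linearly independent. The remaining rows of $\mathbf{U}_1$ (the rows of $\mathbf{U}_{12}$) can therefore be expressed as linear combinations of the rows of $\mathbf{U}_{11}$. Hence, $\mathbf{X}=\mathbf{U}_{12}\mathbf{U}_{11}^{-1}$ is a solution to the equation, which completes the proof. ∎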
The advantage of this decomposition lies in the pattern of $\mathbf{F}_{RF}$ in Lemma 3. The first $N_{RF}^t$ rows of $\mathbf{F}_{RF}$ form an identity matrix, which in fact does not need a phase shifter implementation. Its zero elements correspond to no connections, whereas its diagonal elements correspond to direct connections from RF chains to antennas. This means that we only need $2N_{RF}^t(N_t-N_{RF}^t)$ phase shifters in the analog RF precoder, instead of $2N_tN_{RF}^t$. A similar result was presented in [18]. The result in Lemma 3, however, saves more phase shifters, and its method is simpler and more straightforward than the decomposition procedure involving two QR decompositions in [18]. Furthermore, the decomposition pattern in Lemma 3 can also be applied to single-carrier systems based on the result in Lemma 2, which further improves the energy efficiency when achieving the fully digital precoder.
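The following Python sketch chains (13), (14), and Lemma 3. It first takes the $N_{RF}^t$ principal components of a stand-in $\mathbf{F}_{opt}$. It then checks two things: that the approximation error equals the energy of the discarded singular values (Eckart-Young), and that the power does not exceed $\|\mathbf{F}_{opt}\|_F^2$. Finally, it builds $\mathbf{F}_{RF}$ with the identity block and counts the phase shifters. The random $\mathbf{F}_{opt}$ and its normalization to $\|\mathbf{F}_{opt}\|_F^2=KUN_s$ are assumptions for illustration; the paper itself uses a BD precoder. The helper names are hypothetical, and $\mathbf{U}_{11}$ is assumed invertible, which holds for generic data.

```python
import numpy as np


def lowrank_hybrid(F_opt, n_rf):
    """Optimal solution of (12): the n_rf principal components of F_opt."""
    left, s, Vh = np.linalg.svd(F_opt, full_matrices=False)
    return left[:, :n_rf], s[:n_rf], Vh[:n_rf, :], s


def lemma3_decomposition(U1, s1, V1h):
    """F_RF = [I; U12 U11^{-1}], F_BB = U11 diag(s1) V1^H (assumes U11 invertible)."""
    n_rf = U1.shape[1]
    U11, U12 = U1[:n_rf, :], U1[n_rf:, :]
    X = np.linalg.solve(U11.T, U12.T).T      # solves X @ U11 = U12
    F_RF = np.vstack([np.eye(n_rf), X])
    F_BB = U11 @ np.diag(s1) @ V1h
    return F_RF, F_BB


rng = np.random.default_rng(5)
N_t, K, U, N_s, N_rf = 32, 8, 2, 2, 6
M = K * U * N_s
F_opt = rng.standard_normal((N_t, M)) + 1j * rng.standard_normal((N_t, M))
F_opt *= np.sqrt(M) / np.linalg.norm(F_opt, "fro")   # assumed power budget K*U*N_s

U1, s1, V1h, s_all = lowrank_hybrid(F_opt, N_rf)
F_hyb = U1 @ np.diag(s1) @ V1h
err2 = np.linalg.norm(F_opt - F_hyb, "fro") ** 2
print(np.isclose(err2, np.sum(s_all[N_rf:] ** 2)))     # True (Eckart-Young)
print(np.linalg.norm(F_hyb, "fro") ** 2 <= M + 1e-9)    # True, cf. (14)

F_RF, F_BB = lemma3_decomposition(U1, s1, V1h)
print(np.allclose(F_RF @ F_BB, F_hyb))                  # True
print(2 * N_rf * (N_t - N_rf), "vs", 2 * N_t * N_rf)    # 312 vs 384
```

This sketch only verifies the algebraic identity of Lemma 3 and the phase-shifter count. It does not check the amplitude of the entries of $\mathbf{U}_{12}\mathbf{U}_{11}^{-1}$.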
As demonstrated above, by doubling the phase shifters, the hybrid precoder design only requires computing a subset of the singular values and vectors of the fully digital precoding matrix, i.e., the $N_{RF}^t$ principal components of $\mathbf{F}_{opt}$, whose computational complexity is $\mathcal{O}(N_tKUN_sN_{RF}^t)$. Recall that OMP, the most popular algorithm for the conventional SPS implementation, has computational complexity $\mathcal{O}(N_tKUN_sN_{RF}^tL)$, where $L$ is the number of candidate array response vectors, i.e., the number of propagation paths. This is higher than the complexity of the proposed simple approach, and it grows with the channel parameter $L$. In other words, the proposed DPS implementation equips us with precoding algorithms that are computationally much more efficient than existing ones. Later in Section V, its merits in achievable performance will also be demonstrated via simulations.
III-C DPS-Enabled SPS Hybrid Precoding
In this part, inspired by the above hybrid precoder design, we propose an efficient design method for the conventional SPS implementation. In particular, based on the solution for the DPS implementation, we adopt a heuristic way to tackle the unit modulus constraints induced by the SPS implementation.
As shown in (13), the optimal hybrid precoder can be decomposed by SVD. Therefore, one optimal solution to the hybrid precoder with the DPS implementation is
(15) 
Note that the unitary matrix fully captures the column space of , for which the orthonormal columns of form a basis.
In contrast, in the SPS implementation, the unit modulus constraints, i.e., , not only require each column of to have a constant norm like , but also impose an element-wise constraint. Since each element of can only carry phase information, we propose to extract the phases of the optimal analog precoder for the DPS implementation to construct the SPS solution, given by
(16) 
Although this step is heuristic, Section V will show that simply extracting the phase information incurs only a negligible performance loss. Similar approaches can also be found in [15, 8].
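The phase-extraction step can be sketched as follows. Recomputing the baseband precoder by least squares for the fixed unit-modulus analog precoder is an assumption made here for completeness; this part of the text does not specify that step. The function name `sps_from_dps` is hypothetical, and the random matrix says nothing about the negligible-loss claim, which concerns the mmwave channels simulated in Section V.

```python
import numpy as np

def sps_from_dps(F_opt, n_rf):
    """Phase-extraction heuristic: keep only the phases of the truncated-SVD
    analog precoder, then refit the baseband precoder by least squares (assumed)."""
    U, _, _ = np.linalg.svd(F_opt, full_matrices=False)
    F_RF = np.exp(1j * np.angle(U[:, :n_rf]))          # unit-modulus entries
    F_BB = np.linalg.lstsq(F_RF, F_opt, rcond=None)[0]
    return F_RF, F_BB

rng = np.random.default_rng(2)
nt, m, n_rf = 32, 8, 4
F_opt = rng.standard_normal((nt, m)) + 1j * rng.standard_normal((nt, m))
F_RF, F_BB = sps_from_dps(F_opt, n_rf)
s = np.linalg.svd(F_opt, compute_uv=False)
err_sps = np.linalg.norm(F_opt - F_RF @ F_BB) ** 2
err_dps = np.sum(s[n_rf:] ** 2)   # truncated-SVD (DPS) residual, a lower bound
print(np.allclose(np.abs(F_RF), 1.0), err_sps >= err_dps - 1e-9)   # True True
```

The SPS residual can never fall below the DPS one, since both precoders have rank at most `n_rf`; the heuristic only decides how close it gets.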
Compared with existing hybrid precoding algorithms for the SPS implementation, e.g., the MO-AltMin [8] and OMP [6] algorithms and the algorithm in [16], the proposed DPS-enabled design method enjoys much lower computational complexity and requires no iterative procedure, which makes it a good candidate for low-complexity hybrid precoding in the SPS fully-connected structure.
III-D Inter-User Interference Cancellation
While the inter-user interference can be perfectly canceled with the fully digital precoder , residual inter-user interference remains when the hybrid precoder, which only approximates the fully digital one, is applied. For the same reason, since hybrid combining is adopted at the receiver side, the inter-user interference cannot be canceled by the receiver either. In Section V, we will see that in multiuser multicarrier systems, inter-user interference is a severe problem that dramatically degrades the hybrid precoding performance, especially at high SNRs.
In this subsection, after designing the hybrid precoder and combiner, we propose to cascade an additional digital baseband precoder that cancels the residual inter-user interference. In particular, with the hybrid precoder and combiner at hand, we define the effective channel for the th user on the th subcarrier as
(17) 
where is the composite digital precoder on the th subcarrier. Our goal is to design the precoders , which satisfy the conditions
(18) 
A simple way to satisfy these conditions is BD precoding, which is feasible since the dimension of the effective channel is ; more details can be found in [24]. Therefore, after cascading the BD precoder at the baseband, the overall digital baseband precoder of the th user on the th subcarrier is
(19) 
Since the system is now free of inter-user interference, we can scale the precoder to meet the maximum transmit power and thereby improve the users' SNRs. The same approach to cancel the inter-user interference is also used in the partially-connected structure and will not be repeated in the next section.
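A generic block diagonalization (BD) step can be sketched as follows: each user's precoder is restricted to the null space of the other users' stacked effective channels, then aligned with that user's dominant directions inside the null space. This is a textbook BD construction, not the authors' code; the generic channel shapes, the function name `block_diagonalization`, and the single joint power scaling to an assumed budget `p_max` are illustrative assumptions.

```python
import numpy as np

def block_diagonalization(H_eff, tol=1e-10):
    """BD precoders for a list of effective channels H_eff[u] (n_u x n_t).
    Returns P[u] (n_t x d_u) with H_eff[j] @ P[u] = 0 for every j != u.
    Assumes n_t exceeds the total row count of the other users' channels."""
    precoders = []
    for u, H_u in enumerate(H_eff):
        H_others = np.vstack([H for j, H in enumerate(H_eff) if j != u])
        _, s, Vh = np.linalg.svd(H_others)         # full_matrices=True: Vh is n_t x n_t
        rank = int(np.sum(s > tol * s[0]))
        V0 = Vh[rank:, :].conj().T                 # orthonormal basis of the null space
        # Inside the null space, steer along user u's dominant right singular vectors.
        _, _, Vh_u = np.linalg.svd(H_u @ V0)
        d_u = min(H_u.shape[0], V0.shape[1])
        precoders.append(V0 @ Vh_u[:d_u, :].conj().T)
    return precoders

rng = np.random.default_rng(3)
n_users, n_u, n_t = 3, 2, 6
H_eff = [rng.standard_normal((n_u, n_t)) + 1j * rng.standard_normal((n_u, n_t))
         for _ in range(n_users)]
P = block_diagonalization(H_eff)
# Scale all precoders jointly to an assumed total power budget p_max.
p_max = 1.0
scale = np.sqrt(p_max) / np.linalg.norm(np.hstack(P))
P = [scale * P_u for P_u in P]
leak = max(np.abs(H_eff[j] @ P[u]).max()
           for u in range(n_users) for j in range(n_users) if j != u)
print(leak < 1e-10)    # True
```

The power scaling is applied only after interference is removed, mirroring the order described above.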
IV Hybrid Precoding for the Partially-Connected Structure
One of the shortcomings of the fully-connected structure is its large number of phase shifters. The partially-connected structure, as a more energy-efficient and cost-effective structure [8, 21], employs notably fewer phase shifters, i.e., phase shifters with the DPS implementation, which makes it attractive for practical implementation. Since the DoFs of the analog precoder are greatly reduced, RF-only precoding is far from satisfactory in the partially-connected structure. In this section, we first present hybrid precoding with a fixed mapping from RF chains to antennas, and then propose two algorithms that perform dynamic mapping to further improve the performance.
IV-A Hybrid Precoding With Fixed Mapping
In [8, 21], fixed mapping was considered in the partially-connected structure, i.e., each RF chain is connected to a certain number of antennas in a predetermined manner. To present the hybrid precoder design with fixed mapping clearly, we take one special mapping [8, 21, 27] as an example, where the th RF chain is connected to the th set of adjacent antennas. The corresponding constraint on the analog RF precoding matrix can be visualized as a set of block diagonal matrices , where each block is an -dimensional vector, i.e.,
(20) 
where . The amplitude of the analog precoding gain for the th connection from RF chains to antennas is denoted as . As in the hybrid precoding for the DPS fully-connected structure, the constraints are redundant and are therefore omitted in the following derivation. Furthermore, the transmit power constraint is automatically satisfied by the optimal solution of the hybrid precoder, as will be shown in the following. Thus, the hybrid precoder design problem with fixed mapping can be recast as
(21)  
Note that there is only one nonzero element in each row of the analog RF precoding matrix . Due to this special structure, different vectors are multiplied by distinct rows of , which decouples problem (21) into subproblems on an RF chain-by-RF chain basis. The optimization of the hybrid precoder for the th RF chain is given by
(22) 
where , , and .
Proposition 1.
The optimal solution to the subproblem is given by the following closed-form expression:
(23) 
Proof:
We check the first-order optimality conditions:
(24)  
(25) 
where is the objective function in subproblem . Substituting (24) into (25), we obtain
(26) 
which shows that and are an eigenvalue and the corresponding eigenvector of . Moreover, by substituting (24) into the objective function in , it can be rewritten as
(27) 
Hence, minimizing the objective function is equivalent to taking as the largest eigenvalue of the covariance matrix , denoted as . ∎
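Proposition 1 can be checked numerically with a short sketch. It assumes each subproblem has the rank-one form of minimizing the squared Frobenius distance between the block of the fully digital precoder wired to one RF chain and the product of an analog column vector and a baseband row. The scale split between the two factors is arbitrary, so the analog vector is normalized to unit norm here; the helper names and the adjacent-antenna grouping are illustrative assumptions.

```python
import numpy as np

def rf_chain_subproblem(F_i):
    """Closed-form solution of one RF chain's rank-one subproblem,
    min ||F_i - a f^H||_F^2, with the normalization ||a|| = 1."""
    R = F_i @ F_i.conj().T                 # covariance matrix of the block
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    a = eigvecs[:, -1]                     # principal eigenvector
    f = F_i.conj().T @ a                   # optimal baseband row for this a
    return a, f, eigvals[-1]

def fixed_mapping_precoder(F_opt, n_rf):
    """Block-diagonal analog precoder for the adjacent-antenna mapping
    (assumes n_rf divides the number of antennas)."""
    nt, m = F_opt.shape
    size = nt // n_rf
    F_RF = np.zeros((nt, n_rf), dtype=complex)
    F_BB = np.zeros((n_rf, m), dtype=complex)
    lam_sum = 0.0
    for i in range(n_rf):
        rows = slice(i * size, (i + 1) * size)
        a, f, lam = rf_chain_subproblem(F_opt[rows])
        F_RF[rows, i] = a
        F_BB[i] = f.conj()                 # row i of F_BB is f^H
        lam_sum += lam
    return F_RF, F_BB, lam_sum

rng = np.random.default_rng(4)
nt, m, n_rf = 16, 6, 4
F_opt = rng.standard_normal((nt, m)) + 1j * rng.standard_normal((nt, m))
F_RF, F_BB, lam_sum = fixed_mapping_precoder(F_opt, n_rf)
err = np.linalg.norm(F_opt - F_RF @ F_BB) ** 2
print(np.isclose(err, np.linalg.norm(F_opt) ** 2 - lam_sum))   # True
```

The residual equals the total energy of the fully digital precoder minus the sum of the per-chain largest eigenvalues, which is the quantity the dynamic mapping problem in the next subsection seeks to optimize.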
IV-B Hybrid Precoding With Dynamic Mapping
Unlike the fully-connected structure, which utilizes all the connections from RF chains to antennas, the partially-connected structure incurs a non-negligible performance loss [8]. In this subsection, we propose to improve its performance by optimizing the mapping strategy, i.e., we dynamically determine which antennas each RF chain should be connected to. The dynamic mapping problem is given as
(29)  
where is the set of matrices in which every row has only one nonzero entry, i.e., , meaning that each antenna can only be connected to one RF chain. As indicated by (28), once the mapping is fixed, the optimal value of the objective function in (21) is
(30) 
Hence, when we have the freedom to design the mapping strategy from RF chains to antennas, the design target is to seek the mapping that maximizes the sum of the largest eigenvalues, i.e.,
(31)  
where is the mapping set containing the indices of the antennas that are mapped to the th RF chain. The dynamic mapping problem is combinatorial; its optimal solution can be found by exhaustive search, but the number of possible mapping strategies, , is prohibitively large for practical implementation. Therefore, we first propose a greedy algorithm to solve the problem. Its pseudocode is omitted due to its simplicity and space limitations.
In each iteration of the greedy algorithm, we connect the th antenna to the th RF chain, choosing the connection that yields the maximum increase of the largest eigenvalue when it is added to the mapping network. Note that the computational complexity of the algorithm is dominated by the calculation of the largest eigenvalue. In the greedy algorithm, the number of eigenvalue decompositions (EVDs) required is , which is quite large, especially when large-scale antenna arrays are employed in mmwave MIMO systems. To reduce this computational burden, we then propose a modified K-means algorithm to solve the dynamic mapping problem (31).
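Since the pseudocode is omitted in the text, the following is only a plausible reading of the greedy rule, not the authors' implementation. It assumes each antenna corresponds to one row of a stacked fully digital precoder, that a cluster's largest eigenvalue is the squared spectral norm of its rows, and that no per-chain size limit is imposed. The function names are hypothetical.

```python
import numpy as np

def lam_max(F_rows):
    """Largest eigenvalue of F_S F_S^H, computed as the squared spectral norm."""
    return float(np.linalg.norm(F_rows, 2) ** 2)

def greedy_mapping(F_opt, n_rf):
    """Illustrative greedy RF chain-antenna mapping: at each step add the
    (antenna, RF chain) pair with the largest increase of that chain's
    largest eigenvalue. Uses n_rf * nt * (nt + 1) / 2 eigenvalue evaluations."""
    nt = F_opt.shape[0]
    clusters = [[] for _ in range(n_rf)]
    lam = [0.0] * n_rf
    unassigned = list(range(nt))
    while unassigned:
        best_gain, best_n, best_k, best_lam = -np.inf, None, None, None
        for n in unassigned:
            for k in range(n_rf):
                new_lam = lam_max(F_opt[clusters[k] + [n]])
                if new_lam - lam[k] > best_gain:
                    best_gain, best_n, best_k, best_lam = new_lam - lam[k], n, k, new_lam
        clusters[best_k].append(best_n)
        lam[best_k] = best_lam
        unassigned.remove(best_n)
    return clusters, sum(lam)

rng = np.random.default_rng(5)
nt, m, n_rf = 16, 6, 4
F_opt = rng.standard_normal((nt, m)) + 1j * rng.standard_normal((nt, m))
clusters, obj = greedy_mapping(F_opt, n_rf)
s = np.linalg.svd(F_opt, compute_uv=False)
print(sorted(sum(clusters, [])) == list(range(nt)))   # True: every antenna mapped once
print(obj <= np.sum(s[:n_rf] ** 2) + 1e-9)            # True: bounded by the fully-connected value
```

The nested loop over unassigned antennas and RF chains is what makes the eigenvalue count grow quadratically with the number of antennas.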
We now revisit problem (31), which is equivalent to classifying vectors (antennas) into clusters (RF chains). K-means, which aims at partitioning observation vectors into clusters, where is a predefined parameter, is a prevalent approach to cluster analysis in data mining and turns out to be suitable for problem (31). In the classical K-means algorithm, the objective is to minimize the sum of the squared Euclidean distances from each observation vector to the centroid of the cluster it belongs to. The distortion function to be minimized in the classical K-means algorithm is given by [Footnote 6: We present the distortion function of the classical K-means algorithm with a slight abuse of notation so that the modified version presented next is easier to follow.]
(32) 
where are the observation vectors while is the centroid of the th cluster.
However, this distortion function cannot be directly adopted to solve the dynamic mapping problem (31), since the objectives are quite different. In (31), the objective is to maximize the sum of the largest eigenvalues of the covariance matrices of the clusters. Therefore, we propose to modify the distortion function of the K-means algorithm as
(33) 
The modified distortion function is the sum of the Rayleigh quotients of the covariance matrices of the clusters, whose maximum over is the sum of the largest eigenvalues. The overall clustering problem can be written as
(34)  
We adopt alternating maximization to solve this problem, which alternately updates the clustering with the centroids fixed and the centroids with the clustering fixed. Both update procedures admit closed-form solutions.
In the clustering update, we allocate each vector to the cluster whose centroid has the largest inner product (in magnitude) with it, i.e., allocate to the th cluster, where
(35) 
In the centroid update, optimizing the centroids is equivalent to maximizing the Rayleigh quotient of each cluster, whose optimal solution is simply the eigenvector corresponding to the largest eigenvalue, i.e.,
(36) 
The resulting modified K-means algorithm is summarized in Algorithm 1.
Note that Steps 3 and 4 give the globally optimal clustering and centroids, respectively, when the other block is fixed. Hence, the algorithm converges to a stationary point, since it is a two-block coordinate descent procedure [28]. Because problem (34) is not jointly convex with respect to and , the modified K-means algorithm generally converges to a local optimum of problem (34), so the solution is sensitive to the choice of initial centroids. For hybrid precoding, the size of the observation set is much larger than the number of clusters, i.e., . A heuristic rule of thumb for choosing the initial centroids is to pick observation vectors with small correlations. In the proposed algorithm, we select as the initial centroids pairs of vectors out of the observation vectors that have the smallest inner products.
Recall that EVD dominates the computational complexity of dynamic mapping design. Each alternating iteration of the modified K-means algorithm requires EVDs, so the overall number of EVDs is , where is the number of iterations. For the practical settings in Section V, the modified K-means algorithm typically converges within 10 iterations, which is far fewer than , and thus yields a significant complexity reduction compared to the greedy algorithm.
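The two closed-form updates, (35) and (36), can be sketched as follows. Several details are assumptions rather than the paper's specification: the observation vectors are taken as the conjugated rows of a stacked fully digital precoder (one per antenna), the initialization is a simple greedy low-correlation pick that stands in for the pair-based rule described above, an empty cluster keeps its previous centroid, and the loop stops when the assignment no longer changes. All function names are hypothetical.

```python
import numpy as np

def low_correlation_init(F, n_rf):
    """Hypothetical stand-in for the paper's initialization: greedily pick n_rf
    observation vectors x_n (conjugated rows of F) with small mutual correlations."""
    X = F.conj() / np.linalg.norm(F, axis=1, keepdims=True)   # unit-norm x_n as rows
    chosen = [int(np.argmax(np.linalg.norm(F, axis=1)))]
    while len(chosen) < n_rf:
        corr = np.abs(X @ X[chosen].conj().T).max(axis=1)     # max |x_c^H x_n| over chosen c
        corr[chosen] = np.inf
        chosen.append(int(np.argmin(corr)))
    return X[chosen].T.copy()                                 # columns are unit-norm centroids

def modified_kmeans(F, n_rf, max_iter=50):
    """Alternating maximization of sum_k ||F_{S_k} c_k||^2 = sum_k c_k^H R_k c_k."""
    C = low_correlation_init(F, n_rf)
    labels, history = None, []
    for _ in range(max_iter):
        # Clustering update: antenna n joins the cluster maximizing |c_k^H x_n|^2 = |F[n] @ c_k|^2.
        new_labels = np.argmax(np.abs(F @ C) ** 2, axis=1)
        # Centroid update: principal eigenvector of each non-empty cluster's covariance.
        for k in range(n_rf):
            F_k = F[new_labels == k]
            if F_k.shape[0] > 0:
                _, vecs = np.linalg.eigh(F_k.conj().T @ F_k)
                C[:, k] = vecs[:, -1]
        history.append(sum(np.linalg.norm(F[new_labels == k] @ C[:, k]) ** 2
                           for k in range(n_rf)))
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return new_labels, C, history

rng = np.random.default_rng(6)
nt, m, n_rf = 32, 8, 4
F_opt = rng.standard_normal((nt, m)) + 1j * rng.standard_normal((nt, m))
labels, C, history = modified_kmeans(F_opt, n_rf)
s = np.linalg.svd(F_opt, compute_uv=False)
print(all(b >= a - 1e-9 for a, b in zip(history, history[1:])))   # True: never decreases
print(history[-1] <= np.sum(s[:n_rf] ** 2) + 1e-9)                 # True
```

Each pass performs at most `n_rf` EVDs, one per cluster, which matches the per-iteration cost discussed above; the monotone objective history reflects the block-coordinate argument for convergence.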
IV-C Fully-Connected vs. Partially-Connected Structures
Several studies [13, 8, 9] have investigated different design algorithms for the fully- and partially-connected structures and compared these two structures via simulations. However, to the best of the authors' knowledge, no analytical, quantitative comparison between the two structures has been reported so far. The main obstacle is the complicated design approaches needed to handle the unit modulus constraints induced by the SPS implementation. With the DPS implementation and the resulting low-complexity design approaches at hand, we are able to fill this gap. Following (13) and (21) in Sections III and IV, we obtain
(37) 
where and are the optimal values of the objective function in (4) for the fully- and partially-connected structures, respectively, and denotes the th largest singular value of . We define the performance gap between the fully- and partially-connected structures as the difference between the two objective values, i.e.,
(38) 
where is composed of the vectors as its columns. Therefore, once the RF chain-antenna mapping in the partially-connected structure is determined, the performance gap is given by (38), which provides an analytical comparison of the two hybrid precoder structures. This expression indicates that the performance gap depends on the channel realization as well as on the RF chain-antenna mapping strategy.
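The gap can be evaluated numerically under an assumed reading of the two optimal values. The reading is that the objective in (4) is the squared Frobenius distance to a stacked fully digital precoder. Under it, the fully-connected optimum is the energy in the discarded singular values. The partially-connected optimum is the total energy minus the sum of the per-cluster largest eigenvalues, as in (30). The function name `performance_gap` is hypothetical, and every cluster is assumed to be non-empty.

```python
import numpy as np

def performance_gap(F_opt, clusters):
    """Gap between the partially- and fully-connected optimal objective values,
    assuming both objectives are squared Frobenius errors to F_opt and every
    cluster (RF chain) is non-empty."""
    n_rf = len(clusters)
    s = np.linalg.svd(F_opt, compute_uv=False)
    full = np.sum(s[n_rf:] ** 2)                        # best rank-n_rf residual
    lam = sum(np.linalg.norm(F_opt[idx], 2) ** 2 for idx in clusters)
    partial = np.sum(s ** 2) - lam                      # ||F_opt||_F^2 - sum of largest eigenvalues
    return partial - full                               # = sum of top n_rf sigma^2 - lam

rng = np.random.default_rng(7)
nt, m, n_rf = 16, 6, 4
F_opt = rng.standard_normal((nt, m)) + 1j * rng.standard_normal((nt, m))
adjacent = [list(range(i * 4, (i + 1) * 4)) for i in range(n_rf)]
gap = performance_gap(F_opt, adjacent)
print(gap >= -1e-9)   # True: a rank-n_rf structure cannot beat the truncated SVD
```

Passing a different list of index sets, e.g. one produced by a dynamic mapping, shows directly how the mapping strategy changes the gap for the same channel realization.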
V Simulation Results
In this section, we numerically evaluate the performance of the proposed hybrid precoder design for multiuser OFDM mmwave MIMO systems, where subcarriers are assumed. The channel parameters are given by clusters and rays, and the average power of each cluster is . The AoDs and AoAs follow the Laplacian distribution with uniformly distributed mean angles in and an angular spread of 10 degrees. The antenna elements in the USPA are separated by half a wavelength, and all simulation results are averaged over 1000 channel realizations.
V-A RF-Only Precoding in the Fully-Connected Structure
First, we evaluate how much performance gain can be obtained by doubling the number of phase shifters, via RF-only precoding in the fully-connected structure. Note that, with the conventional SPS implementation, manifold optimization was shown in [8] to be an effective method that directly tackles the unit modulus constraints and achieves higher spectral efficiency than other existing works. In this subsection, we adopt Algorithm 1 in [8] for the SPS implementation as the benchmark to show the advantage of the proposed DPS implementation. For a fair comparison, we always keep the same semi-orthogonal digital baseband precoder for both the SPS and DPS implementations in the simulation. [Footnote 7: Note that we cannot adjust the digital precoder in RF-only precoding, so we do not apply the additional BD mentioned in Section III-D at this stage.]