A Hardware-Efficient Analog Network Structure for Hybrid Precoding in Millimeter Wave Systems
Abstract
Hybrid precoding has recently been proposed as a cost-effective transceiver solution for millimeter wave (mmWave) systems. While the number of radio frequency (RF) chains has been effectively reduced in existing works, a large number of high-precision phase shifters are still needed. Practical phase shifters have coarsely quantized phases, and their number should be minimized due to cost and power considerations. In this paper, we propose a novel hardware-efficient implementation for hybrid precoding, called the fixed phase shifter (FPS) implementation. It only requires a small number of phase shifters with quantized and fixed phases. To enhance the spectral efficiency, a switch network is put forward to provide dynamic connections from the phase shifters to the antennas, which is adaptive to the channel states. An effective alternating minimization (AltMin) algorithm is developed with closed-form solutions in each iteration to determine the hybrid precoder and the states of the switches. Moreover, to further reduce the hardware complexity, a group-connected mapping strategy is proposed to reduce the number of switches. Simulation results show that the FPS fully-connected hybrid precoder achieves higher hardware efficiency with far fewer phase shifters than existing proposals. Furthermore, the group-connected mapping achieves a good balance between spectral efficiency and hardware complexity.
I Introduction
Raising the carrier frequency to millimeter wave (mmWave) bands is an effective approach to meet the capacity requirements of the upcoming 5G networks, and thus mmWave communication has drawn extensive attention from both academia and industry [2, 3]. Thanks to the small wavelength of mmWave signals, large-scale antenna arrays can be leveraged at transceivers to combat the severe path loss at mmWave frequencies and to support directional transmissions with advanced multiple-input multiple-output (MIMO) techniques. As equipping each antenna element with a dedicated radio frequency (RF) chain is costly and power hungry, hybrid precoding has been put forward as a cost-effective transceiver solution, which uses a limited number of RF chains to connect a digital baseband precoder and an analog RF precoder [4].
In contrast to the conventional fully digital precoder, the additional hardware in the hybrid precoder is the analog component, also called the analog network, which determines the overall hardware structure of the hybrid precoder. Most existing works on hybrid precoding are performance-oriented, i.e., they aim at maximizing the spectral efficiency [5, 4, 6]. However, spectral efficiency close to that of the fully digital precoder was achieved with bulky hardware and impractical assumptions on the analog network, which results in poor hardware efficiency and hinders practical implementation. Thus, it is of great importance to develop hardware-efficient analog networks that facilitate the practical deployment of hybrid precoders.
To discuss hardware-efficient design, we first introduce a few terms for describing the hybrid precoder structure. Each hybrid precoder structure is specified by its mapping strategy and its hardware implementation. Specifically, the mapping strategy decides how the RF chains and antenna elements are connected, which also determines the number of hardware components needed in the analog network. Typical mapping strategies include the fully-connected and partially-connected ones. The fully-connected one exploits all the degrees of freedom to perform the mapping, i.e., it maps every RF chain to all the antennas, e.g., [4]. In contrast, each RF chain is only connected to a subset of antennas in the partially-connected one, e.g., [7]. On the other hand, the hardware implementation specifies the adopted hardware components and the way each RF chain-antenna pair is connected. The single phase shifter (SPS) implementation is the most commonly adopted one, which deploys one phase shifter to realize each RF chain-antenna connection [8]. More recently, a double phase shifter (DPS) implementation was proposed in [9, 10] to simplify the hybrid precoding algorithm design, where two distinct phase shifters are used to connect each RF chain-antenna pair.
In this paper, we propose a novel analog network structure that significantly improves the hardware efficiency of hybrid precoders. This is achieved by an innovative hardware implementation, called the fixed phase shifter (FPS) implementation, and a new mapping strategy, i.e., the group-connected mapping. In particular, the new structure can approach the performance of the fully digital precoder with very few fixed phase shifters.
I-A Related Works
The fully-connected mapping strategy with the SPS implementation, referred to as the SPS fully-connected structure, is the most popular structure in earlier works on hybrid precoding [4, 11, 6, 12, 13]. However, this structure has a drawback in the analog network: the number of phase shifters in use equals the product of the numbers of RF chains and antennas. Note that phase shifters, originally used in military radar systems, are newly introduced hardware components in hybrid precoding systems and are currently very costly for commercial use, e.g., a single phase shifter can cost around a hundred US dollars even at low resolution [14]. Hence, deploying such a large number of phase shifters would cause prohibitively high cost and power consumption. More importantly, phase shifters are assumed to have variable, high resolution in order to provide near-optimal performance with effective algorithms, which is far from practical.
To improve the hardware efficiency, one possible way is to reduce the number of phase shifters in use by changing the mapping strategy. Partially-connected mapping, which connects each RF chain to a subset of antennas, stands out as a popular solution [15, 16, 7, 17, 10]. A semidefinite relaxation based alternating minimization (SDR-AltMin) algorithm was proposed in [15] for hybrid precoder design with this mapping strategy. Based on an idea similar to successive interference cancellation (SIC), an iterative hybrid precoding algorithm for the partially-connected mapping was proposed in [16]. In addition, a greedy algorithm and a modified K-means algorithm were developed in [17] and [10], respectively, to dynamically optimize the subarrays in the partially-connected mapping for performance improvement. While various techniques have been introduced to design hybrid precoders with the partially-connected mapping, there still exists a non-negligible gap in spectral efficiency compared with the fully-connected one. Inevitably, trade-offs need to be made between hardware efficiency and spectral efficiency, but the partially-connected mapping goes to an extreme, i.e., it enhances the hardware efficiency at the cost of too much performance degradation. It is thus of practical importance to develop hardware-efficient hybrid precoder structures that can achieve more flexible trade-offs.
On the other hand, different hybrid precoding algorithms have been proposed assuming phase shifters with arbitrary precision, e.g., orthogonal matching pursuit (OMP) [4], manifold optimization [15], and SIC [16]. Following these works, a straightforward refinement for practical hardware implementation is to design hybrid precoders with quantized phase shifters [12, 18, 11, 19, 20]. The main approach is either to determine all the phases at once [4, 18, 11, 19] or to update one phase at a time [20], ignoring the quantization effect at first. The phases are then heuristically quantized into the finite feasible set according to certain criteria. However, such a simple quantization step is far from satisfactory, and the optimality and convergence of the resulting algorithms cannot be guaranteed [20]. In addition, hybrid precoder design based on codebooks consisting of quantized phases was investigated in [21, 22, 23]. While codebook-based design enjoys low complexity, it incurs a certain performance loss, and it is not clear how much performance gain can still be obtained. The number of quantized phase shifters was reduced to some extent in [19], where the required number depends on the target approximation precision. Unfortunately, a large number of phase shifters are still needed to achieve high spectral efficiency under practical settings in multiuser OFDM systems, e.g., 40 quantized phase shifters for each RF chain, and the number varies with the precision requirement. More importantly, in these existing works, the phases need to be adapted to the channel states, which brings high hardware implementation complexity and also increases power consumption. Recently, a hybrid precoder structure that adopts switches to improve the hardware efficiency was put forward in [24]. Nevertheless, simply replacing variable phase shifters with switches causes significant performance degradation.
Therefore, a more effective approach to handle quantized phases is needed, and the number of phase shifters should be reduced to a minimum.
I-B Contributions
In this paper, we investigate hardware-efficient design for hybrid precoding in general multiuser orthogonal frequency-division multiplexing (OFDM) mmWave systems. The main contributions are summarized as follows.

As a first step, a novel hardware implementation for the analog network is proposed, called the fixed phase shifter (FPS) implementation, where only a small number of phase shifters with fixed phases are needed. To compensate for the performance loss induced by the fixed phases, a switch network is proposed to provide dynamic connections from the phase shifters to the antennas, which is easily implemented with adaptive switches.

An AltMin algorithm is developed to design the hybrid precoder with the fully-connected mapping, where an upper bound of the objective function is derived as an effective surrogate. In particular, the large-scale binary constraints induced by the switch network are handled with the help of the upper bound, which leads to closed-form solutions for both the dynamic switch network and the digital baseband precoder, and therefore enables a low-complexity hybrid precoding algorithm.

To further reduce the hardware complexity, a novel mapping strategy, i.e., the group-connected mapping, is proposed and applied along with the FPS implementation. This flexible mapping strategy includes the popular fully-connected and partially-connected mapping strategies as special cases. More importantly, this new mapping strategy does not incur any additional design challenges, as the hybrid precoder can be readily designed by leveraging existing hybrid precoding algorithms.

Extensive comparisons are provided to reveal valuable design insights. In particular, the FPS fully-connected hybrid precoder structure is shown to easily approach the performance of the fully digital precoder while enjoying a higher hardware efficiency than existing proposals. Notably, the number of phase shifters is sharply reduced compared with existing hybrid precoder implementations, e.g., 10 fixed phase shifters in total are sufficient. In addition, the FPS group-connected structure, which further reduces the number of switches, provides a flexible way to trade off spectral efficiency against hardware complexity.
In summary, our results show that the proposed FPS group-connected structure is a promising candidate for hardware-efficient hybrid precoding in 5G mmWave communication systems.
I-C Organization
The remainder of this paper is organized as follows. In Section II, we introduce the system model and the proposed FPS implementation, followed by the problem formulation. The AltMin algorithms for single-carrier and multicarrier systems with the FPS implementation and the fully-connected mapping strategy are presented in Sections III and III-C, respectively. Section IV introduces the group-connected mapping strategy. Simulation results are presented in Section V. Finally, we conclude this paper in Section VI.
I-D Notations
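The following notations are used throughout this paper. $\mathbf{a}$ and $\mathbf{A}$ stand for a column vector and a matrix, respectively. The conjugate, transpose, and conjugate transpose of $\mathbf{A}$ are represented by $\mathbf{A}^*$, $\mathbf{A}^T$, and $\mathbf{A}^H$; $\|\mathbf{a}\|_2$ and $\|\mathbf{A}\|_F$ denote the $\ell_2$ norm of vector $\mathbf{a}$ and the Frobenius norm of matrix $\mathbf{A}$; $\mathrm{blkdiag}(\mathbf{A}_1,\ldots,\mathbf{A}_n)$ establishes a block diagonal matrix using $\mathbf{A}_1,\ldots,\mathbf{A}_n$ as its diagonal terms; $\mathrm{Tr}(\cdot)$ and $\mathrm{vec}(\cdot)$ indicate the trace and vectorization; expectation and the real part of a complex variable are denoted by $\mathbb{E}[\cdot]$ and $\Re\{\cdot\}$.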
II System Model
II-A Hybrid Precoding and Combining
Consider the downlink transmission of a multiuser mmWave MIMO-OFDM system as shown in Fig. 1. A base station (BS) leverages a large antenna array to serve multiple users over multiple subcarriers using OFDM. Each user is equipped with multiple antennas and receives multiple data streams from the BS on each subcarrier. The numbers of RF chains available at the BS and at each user are limited: each is no smaller than the number of data streams it supports and no larger than the corresponding number of antennas.
The received signal of a given user on a given subcarrier is given by
(1) 
where the subscript indicates the user and subcarrier indices. The average received power of the user scales the received signal, and the transmitted signal is normalized according to the transmit power. In addition, the noise at the users is circularly symmetric complex Gaussian with a given power. The digital baseband precoders and combiners are defined separately for each user and subcarrier. Since the transmitted signals for all the users are mixed together by the digital precoders, and analog RF precoding is a post-IFFT (inverse fast Fourier transform) operation, the analog RF precoder is a common component shared by all the users and subcarriers. Correspondingly, the analog RF combiner of each user is subcarrier-independent. In this paper, we focus on the precoder design, while the combiners can be designed in a similar way.
As discussed in Section I, each hybrid precoder structure is primarily determined by its mapping strategy and hardware implementation. In particular, the former maps the signals from the limited number of RF chains to the large-scale antenna array, while the latter decides which kinds and how many hardware components are adopted to process the signal for each RF chain-antenna pair. In this section, a novel hardware implementation is first proposed to obtain a hardware-efficient hybrid precoder structure. Then, to achieve a better balance between hardware complexity and spectral efficiency, a flexible mapping strategy is introduced in Section IV.
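Table I: Hardware components required in the analog network of different hybrid precoder structures, with the power consumption of each component [24]

| Implementation | Mapping strategy | Phase shifters: number | Phase shifters: type | Phase shifters: power (each) | Other components: hardware | Other components: number | Other components: power (each) |
|---|---|---|---|---|---|---|---|
| SPS [4, 15] | Fully-connected | One per RF chain-antenna pair | Adaptive | 50 mW | N/A | N/A | N/A |
| SPS [4, 15] | Partially-connected | One per connected RF chain-antenna pair | Adaptive | 50 mW | N/A | N/A | N/A |
| SPS with Butler matrices [25] | Fully-connected | | Fixed | 20 mW | Coupler | | 10 mW |
| SPS with Butler matrices [25] | Partially-connected | | Fixed | 20 mW | Coupler | | 10 mW |
| DPS [9, 10] | Fully-connected | Two per RF chain-antenna pair | Adaptive | 50 mW | N/A | N/A | N/A |
| DPS [9, 10] | Partially-connected | Two per connected RF chain-antenna pair | Adaptive | 50 mW | N/A | N/A | N/A |
| FPS | Fully-connected | Small, independent of array size | Multichannel fixed | 20 mW | Switch | One per phase shifter per RF chain-antenna pair | 5 mW |
| FPS | Group-connected | Small, independent of array size | Multichannel fixed | 20 mW | Switch | One per phase shifter per connected RF chain-antenna pair | 5 mW |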
II-B FPS Implementation
Recently, a DPS implementation was proposed in [9, 10], which enables low-complexity hybrid precoder design and also greatly improves the spectral efficiency. These benefits come from allowing the same signal to pass through two phase shifters. Inspired by this insight, we propose a hardware-efficient implementation as follows.
In the proposed implementation, only a small number of phase shifters are used, as shown in Fig. 1. One critical difference between the proposed implementation and existing ones is that the number of phase shifters no longer depends on other parameters, e.g., the numbers of RF chains or antennas, and can be made very small, which effectively improves the hardware efficiency. Inspired by the beneficial operation of the DPS implementation, the signal from each RF chain is passed through all available phase shifters. In other words, each phase shifter is a multichannel phase shifter [26] that can simultaneously process the output signals of all the RF chains in parallel. On the other hand, while the number of (multichannel) phase shifters can be small, it is still impractical for them to shift arbitrary phases or to switch between multiple quantized phase levels at high speed to adapt to the channel states. In our proposal, instead of variable phase shifters, the phase shifters are assumed to have fixed phases [27], which are independent of the channel states. Thus, this proposal is referred to as the FPS implementation.
Remark 1: With a limited number of fixed phase shifters, the analog precoder can only provide the same static precoding gain for all RF chain-antenna pairs and therefore inevitably incurs performance loss.
To overcome this drawback of the simplified hardware implementation, we propose to cascade a dynamic switch network, adapted to the channel states, after the fixed phase shifters. To illustrate the signal flow of the FPS implementation, we focus on one RF chain-antenna pair, as shown in Fig. 2.
The fixed phase shifters generate differently phase-shifted copies of the output signal of the given RF chain. We propose to adaptively combine a subset of these copies to compose the analog precoding gain from the RF chain to the antenna, which is realized by adaptive switches. Hence, one switch per fixed phase shifter is needed for each RF chain-antenna pair. Note that, with only binary on-off states, adaptive switches are much easier to implement than adaptive phase shifters [27, 24].
Remark 2: The adaptive switch network enables the analog precoder to offer different precoding gains for different RF chain-antenna pairs to adapt to the channel states. Later we will see that, although the proposed FPS implementation can only provide analog precoding gains from a codebook whose size is two to the power of the number of fixed phase shifters, its performance is satisfactory with just a small number of phase shifters.
In summary, the hardware components needed for the FPS implementation are a small number of fixed phase shifters plus one switch per fixed phase shifter for each connected RF chain-antenna pair, and the total number of switches depends on the employed mapping strategy.
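To make Remark 2 concrete, the Python sketch below enumerates every analog gain that one RF chain-antenna pair can realize in the FPS implementation: each fixed phase shifter supplies one phase-shifted copy of the RF chain's signal, and each on-off pattern of the switches adds up a subset of these copies. The uniformly spaced phases, the normalization by the square root of the number of phase shifters, and the function name `fps_gain_codebook` are our own illustrative assumptions, not values taken from the paper.

```python
import itertools

import numpy as np


def fps_gain_codebook(phases):
    """Enumerate the analog gains one RF chain-antenna pair can realize.

    phases: fixed phases in radians, one per phase shifter (hypothetical values).
    Returns 2**len(phases) complex gains, one per on-off switch pattern.
    """
    n = len(phases)
    # Assumption: unit-norm normalized phase shifter vector.
    c = np.exp(1j * np.asarray(phases, dtype=float)) / np.sqrt(n)
    gains = []
    for states in itertools.product((0, 1), repeat=n):
        # A closed switch (state 1) passes the corresponding phase-shifted copy.
        gains.append(np.dot(states, c))
    return np.array(gains)


# Hypothetical example: four fixed phase shifters with phases 0, pi/2, pi, 3pi/2.
codebook = fps_gain_codebook(2 * np.pi * np.arange(4) / 4)
print(len(codebook))                           # 16 switch patterns
print(len(np.unique(np.round(codebook, 12))))  # 9 distinct gains
```

With these four phases, the 16 switch patterns yield only 9 distinct gains, so the codebook size is an upper bound on the number of distinct gains; how many are distinct depends on the chosen fixed phases.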
Accordingly, the analog RF precoding matrix can be expressed as
(2) 
where the switch matrix is a binary matrix with one row per antenna and one column per combination of RF chain and fixed phase shifter, and its Boolean constraints are induced by the binary states of the switches. Note that some entries may be forced to zero by the mapping strategy, as will be discussed later. The matrix representing the phase shift operation of the fixed phase shifters is block diagonal, given by
(3) 
where each diagonal block is the normalized phase shifter vector containing all the fixed phases. Note that although the phase shifter matrix repeats this vector once per RF chain, only the small set of fixed phase shifters is physically required, since these multichannel phase shifters process all RF chains in parallel and are shared by all RF chain-antenna pairs.
Table I lists the hardware components required in the analog network of different hybrid precoder structures, as well as the corresponding power consumption of each kind of component [24]. It shows that the proposed FPS implementation employs far fewer (fixed) phase shifters and consumes less power than existing works. Although a number of switches are cascaded after the fixed phase shifters, the advantages of this proposal in hardware complexity and power consumption will be demonstrated more explicitly in Section V via numerical comparisons.
Remark 3: Ease of implementation and operation is another important aspect of hybrid precoder design. As switches only have binary states while high-resolution phase shifters need to switch among a large number of states, adaptive switches are generally easier to design and implement than high-resolution adaptive phase shifters [28], which makes the proposed FPS a practical and hardware-efficient implementation for the hybrid precoder structure.
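A minimal NumPy sketch of (2) and (3), under our own hypothetical naming: `S` is the binary switch matrix with one column per RF chain and phase shifter, `C` is the block-diagonal phase shifter matrix built with a Kronecker product, and the analog precoder is their product. The sizes, the uniformly spaced fixed phases, and the unit-norm normalization of the phase shifter vector are assumptions for illustration.

```python
import numpy as np


def normalized_phase_vector(phases):
    """Unit-norm vector of fixed phase shifts (assumed normalization)."""
    phases = np.asarray(phases, dtype=float)
    return np.exp(1j * phases) / np.sqrt(phases.size)


def phase_shifter_matrix(c, n_rf):
    """Block-diagonal matrix with one copy of c per RF chain.

    Shape: (n_rf * len(c), n_rf). Only len(c) physical multichannel phase
    shifters are needed, even though c is repeated n_rf times.
    """
    return np.kron(np.eye(n_rf), c.reshape(-1, 1))


rng = np.random.default_rng(0)
n_t, n_rf, n_c = 16, 4, 3                        # hypothetical sizes
c = normalized_phase_vector(2 * np.pi * np.arange(n_c) / n_c)
C = phase_shifter_matrix(c, n_rf)
S = rng.integers(0, 2, size=(n_t, n_rf * n_c))   # random on-off switch states
F_rf = S @ C                                     # analog precoder, as in (2)

print(F_rf.shape)                                # (16, 4)
print(np.allclose(C.conj().T @ C, np.eye(n_rf))) # True: C is semi-unitary
print(np.count_nonzero(C))                       # 12 nonzero entries
```

Each entry of the analog precoder is the sum of the phase-shifted copies whose switches are closed, which is the per-pair combining described for Fig. 2.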
II-C Problem Formulation
There exist different formulations to maximize the spectral efficiency of hybrid precoding systems. One can either directly maximize the spectral efficiency [5] or adopt other performance metrics, e.g., the mean square error (MSE) [29], as surrogates. However, these formulations either result in high-complexity algorithms or yield poor performance. More importantly, in multiuser multicarrier (MU-MC) systems, the analog precoder is shared by all users and subcarriers, which creates additional difficulties for hybrid precoder design and therefore calls for a more tractable formulation. It has been shown in [4, 15, 18, 20, 9, 13, 30] that minimizing the Euclidean distance between the fully digital precoder and the hybrid precoder is an effective and tractable alternative objective for maximizing the spectral efficiency in mmWave systems.
On the other hand, it was found in [9, 10] that the hybrid precoder in the multiuser setting produces residual inter-user interference, as it only approximates the fully digital precoder. Such interference significantly degrades the system performance, especially in the high-SNR regime. Moreover, this issue is more prominent in multicarrier systems, as the analog precoder is shared by a large number of subcarriers.
Therefore, to both effectively approximate the fully digital precoder and cancel the inter-user interference, we propose to apply two-layer precoding at the baseband [31]. In particular, the digital baseband precoder consists of two parts, i.e.,
(4) 
where the first factor is a normalization factor, the second is the precoder that, together with the analog precoder, approximates the fully digital precoder, and the third is the precoder responsible for canceling the inter-user interference. A similar approach was adopted in [32].
Correspondingly, the first task, i.e., to approximate the fully digital precoder, can be formulated as
(5) 
where the combined fully digital precoder and the concatenated digital precoder (Footnote 1: With a slight abuse of terminology, the phrase "digital precoder" refers to this matrix in the remainder of this paper, as it is the digital part of the hybrid precoder that approximates the fully digital precoder.) collect the corresponding precoders of all users and subcarriers, and the switch matrix is restricted to its binary constraint set. Note that, while the transmit power constraint is not explicitly considered in (5), it is satisfied by adapting the normalization factor after (5) is solved.
With the digital precoder at hand, the interference-canceling precoder is cascaded after it to cancel the inter-user interference, based on the effective channel that includes the hybrid precoder and the physical channel, which is given by
(6) 
where the composite digital precoder on the given subcarrier appears. Then, our goal is to design the interference-canceling precoders that satisfy the conditions
(7) 
A simple way to satisfy these conditions is block diagonalization (BD) precoding; more details can be found in [33].
Since the inter-user interference is canceled, we can determine the normalization factor to satisfy the transmit power constraint, which gives
(8) 
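The following Python sketch illustrates the second layer of the baseband precoder and the normalization step under our own assumptions, with hypothetical names and sizes: each user's effective channel in (6) is given as a matrix `G_k` whose columns match the inputs of the interference-canceling precoder, the conditions in (7) are taken to mean zero leakage from each user's precoder block to every other user, and the power constraint in (8) is taken to fix the squared Frobenius norm of the overall precoder. The BD construction is the standard null-space method; the exact scaling used in (8) may differ.

```python
import numpy as np


def block_diagonalization(G_list, n_s, tol=1e-10):
    """BD precoder: the columns for user k lie in the null space of all
    other users' effective channels (requires at least two users)."""
    blocks = []
    for k, G_k in enumerate(G_list):
        G_others = np.vstack([G for j, G in enumerate(G_list) if j != k])
        _, sv, Vh = np.linalg.svd(G_others)       # full SVD: Vh is d x d
        rank = int(np.sum(sv > tol * sv.max()))
        null_basis = Vh[rank:].conj().T           # orthonormal null-space basis
        # Strongest n_s directions of user k's channel inside that null space.
        _, _, Vh_k = np.linalg.svd(G_k @ null_basis)
        blocks.append(null_basis @ Vh_k[:n_s].conj().T)
    return np.hstack(blocks)


def normalization_factor(F_unscaled, power):
    """Assumed constraint: squared Frobenius norm of the precoder equals power."""
    return np.sqrt(power) / np.linalg.norm(F_unscaled)


rng = np.random.default_rng(1)
K, n_s, n_t, n_rf = 3, 2, 32, 6                  # hypothetical sizes
d = K * n_s                                      # inputs of the BD precoder
# Assumption: random stand-ins for the effective channels of (6).
G_list = [rng.standard_normal((n_s, d)) + 1j * rng.standard_normal((n_s, d))
          for _ in range(K)]
F_bd = block_diagonalization(G_list, n_s)

F_rf = rng.standard_normal((n_t, n_rf)) + 1j * rng.standard_normal((n_t, n_rf))
F_dd = rng.standard_normal((n_rf, d)) + 1j * rng.standard_normal((n_rf, d))
gamma = normalization_factor(F_rf @ F_dd @ F_bd, power=1.0)

leak = max(np.abs(G_list[j] @ F_bd[:, k * n_s:(k + 1) * n_s]).max()
           for k in range(K) for j in range(K) if j != k)
print(leak < 1e-9)                               # True: no inter-user leakage
```

The null-space step needs enough precoder inputs, here at least the total number of rows of the other users' effective channels plus the streams of user k.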
Note that the combiners at the user side have the same analog network structure as (2). The hybrid combiners can be designed in a similar way for each user independently, and thus are omitted due to space limitations. In addition, the problem formulation is not limited to any specific channel model or fully digital precoding scheme. It can be easily observed that the hybrid precoder can be readily designed by (6) to (8) once (5) is solved, and hence we focus on (5) in the following sections.
III Hybrid Precoder Design With the FPS Implementation
In this section, we design the hybrid precoder with the FPS implementation and the popular fully-connected mapping strategy, for which every entry of the switch matrix is a binary optimization variable, and the total number of switches is the product of the numbers of antennas, RF chains, and fixed phase shifters. As shown in problem (5), the main task is to design the binary switch matrix and the digital precoding matrix. First, we make some observations on (5).
Remark 4: Since the switch matrix has finitely many possible values, the constraint set of the analog precoding matrix is finite, which means that the OMP algorithm [4, 13] is applicable to (5). However, unlike the SPS case, the dictionary of the OMP algorithm for the FPS implementation is oversized: each candidate column of the analog precoder corresponds to one on-off pattern of all the switches serving an RF chain, so the dictionary size grows exponentially with the product of the numbers of antennas and fixed phase shifters. This is a huge number in large-scale antenna systems and hence hinders practical implementation.
Remark 5: Alternating minimization can be directly applied to (5), where the binary constraints can be tackled with the semidefinite relaxation (SDR) technique [15]. However, a high-dimensional semidefinite programming (SDP) problem, whose size grows with the number of switches, has to be solved in each iteration, which causes prohibitive computational complexity. Moreover, how to recover a rank-one solution from an SDR with binary constraints is still an open problem [34]. This means that the optimality of the relaxation in each iteration of the alternating procedure cannot be ensured, and hence the overall convergence of the AltMin algorithm cannot be guaranteed.
As discussed above, the main difficulty in solving (5) lies in the large-scale binary constraints on the switch matrix. In fact, even if we only focus on the design of the switch matrix, (5) is an NP-hard problem [34]. In this section, by deriving an effective surrogate for the objective function and adopting alternating minimization, we develop a low-complexity hybrid precoding algorithm that handles the binary constraints well.
Note that the properties of the combined digital precoding matrix differ for different system settings. It is a tall matrix in single-carrier systems, since the number of RF chains is no smaller than the total number of data streams. In contrast, in multicarrier systems it is likely to be a fat matrix, as the total number of data streams over all subcarriers exceeds the number of RF chains for practical system parameters. As we will see in this section, this difference affects how the algorithm is manipulated, and we first present the hybrid precoder design in single-carrier systems. (Footnote 2: In this paper, single-carrier systems refer to single-carrier transmissions over flat-fading channels. This model is chosen for ease of presentation, and the algorithm is later extended to the more realistic multicarrier case with frequency-selective fading channels.)
Iiia An Upper Bound for the Objective
In [15, 9, 5], it has been shown that imposing a semiorthogonal structure for is an efficient way to achieve nearoptimal performance. Inspired by these results, we take a similar approach as follows. In singlecarrier systems, the digital precoding matrix is a tall matrix, and thus the semiorthogonal constraint is specified as
(9) 
where the digital precoder is expressed as the product of a scaling factor and a semi-unitary matrix. Then, an upper bound for the objective function in (5) is derived in the following lemma.
Lemma 1.
The objective function in (5) is upper bounded by
(10) 
Proof:
The objective function in (5) can be rewritten as
(11) 
According to (3), the phase shifter matrix is semi-unitary, i.e., its columns are orthonormal. Therefore, we can derive an upper bound for the last term in (11), given by
(12) 
Step (a) follows from the singular value decomposition (SVD) and the semi-unitary property of the phase shifter matrix, whose columns serve as left singular vectors. ∎
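Because equations (10) to (12) were lost in extraction, the following Python sketch checks our reading of Lemma 1 numerically; it is an assumption, not a transcription. We write (our notation) the digital precoder as alpha times a matrix `F_dd` with orthonormal columns, as in (9), and assume the bound replaces the last term alpha^2 ||S C F_dd||_F^2 of the expanded objective by alpha^2 ||S||_F^2, i.e., alpha^2 times the number of closed switches. Both steps only use the semi-unitarity of `C` and `F_dd`.

```python
import numpy as np


def random_semi_unitary(rows, cols, rng):
    """Random complex matrix with orthonormal columns (assumes rows >= cols)."""
    a = rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
    q, _ = np.linalg.qr(a)
    return q


def exact_and_bound(F_opt, S, C, F_dd, alpha):
    """Exact objective and the assumed bound with alpha^2 * ||S||_F^2."""
    exact = np.linalg.norm(F_opt - alpha * S @ C @ F_dd) ** 2
    cross = np.real(np.trace(F_dd.conj().T @ C.conj().T @ S.T @ F_opt))
    # Assumption: for binary S, ||S||_F^2 equals the number of closed switches.
    bound = np.linalg.norm(F_opt) ** 2 - 2 * alpha * cross + alpha**2 * S.sum()
    return exact, bound


rng = np.random.default_rng(2)
n_t, n_rf, n_c, n_str = 16, 4, 3, 3              # hypothetical, n_rf >= n_str
c = np.exp(2j * np.pi * np.arange(n_c) / n_c) / np.sqrt(n_c)
C = np.kron(np.eye(n_rf), c.reshape(-1, 1))      # semi-unitary phase shifter matrix
gaps = []
for _ in range(200):
    F_opt = rng.standard_normal((n_t, n_str)) + 1j * rng.standard_normal((n_t, n_str))
    S = rng.integers(0, 2, size=(n_t, n_rf * n_c)).astype(float)
    F_dd = random_semi_unitary(n_rf, n_str, rng)
    exact, bound = exact_and_bound(F_opt, S, C, F_dd, alpha=rng.uniform(0.1, 2.0))
    gaps.append(bound - exact)
print(min(gaps) >= -1e-9)                        # True on every random draw
```

The gap equals alpha^2 (||S||_F^2 - ||S C F_dd||_F^2), which is non-negative because both `C C^H` and `F_dd F_dd^H` are orthogonal projections.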
III-B Alternating Minimization
By adopting the upper bound (10) as the surrogate objective function and dropping the constant term, the hybrid precoder design problem (5) is reformulated as
(13) 
Alternating minimization, an effective tool for optimization problems involving different subsets of variables, has been widely applied in hybrid precoder design and shown to be empirically successful [15, 9, 5]. In this section, we apply this design principle to the FPS fully-connected structure. In each step of the AltMin algorithm, one subset of the optimization variables is optimized while the others are kept fixed.
When the switch matrix and the scaling factor are fixed, the optimization problem can be written as
(14)  
According to the definition of the dual norm [35], we have
(15) 
where the two norms are the Schatten ∞-norm and 1-norm [35], respectively, and (b) follows from Hölder's inequality. The equality holds when
(16) 
where the unitary factors follow from the (thin) SVD, and the diagonal matrix contains the nonzero singular values.
While the optimization of the switch matrix and the scaling factor could be divided into two separate subproblems, we propose to update them simultaneously to reduce the number of subproblems involved in the AltMin algorithm and therefore the computational complexity. By adding a constant term to the objective function in (13), the subproblem of updating the switch matrix and the scaling factor can be recast as
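The update in (16) is the polar factor of a matrix: by Hölder's inequality, the real part of tr(F_dd^H A) over semi-unitary `F_dd` is at most the nuclear (Schatten 1-) norm of A, with equality at U V^H from the thin SVD A = U Sigma V^H. The Python sketch below assumes (our notation, hypothetical sizes) that A = C^H S^T F_opt and compares the closed form against random feasible matrices.

```python
import numpy as np


def update_digital(F_opt, S, C):
    """Maximize Re tr(F_dd^H A) over semi-unitary F_dd, with A = C^H S^T F_opt.

    Assumption: this A is our reading of the matrix in (14) to (16).
    The maximizer is the polar factor U V^H of the thin SVD of A, and the
    maximum equals the nuclear norm sum(sv).
    """
    A = C.conj().T @ S.T @ F_opt
    U, sv, Vh = np.linalg.svd(A, full_matrices=False)
    return U @ Vh, sv.sum()


rng = np.random.default_rng(3)
n_t, n_rf, n_c, n_str = 16, 4, 3, 3              # hypothetical sizes
c = np.exp(2j * np.pi * np.arange(n_c) / n_c) / np.sqrt(n_c)
C = np.kron(np.eye(n_rf), c.reshape(-1, 1))
S = rng.integers(0, 2, size=(n_t, n_rf * n_c)).astype(float)
F_opt = rng.standard_normal((n_t, n_str)) + 1j * rng.standard_normal((n_t, n_str))

F_dd, best = update_digital(F_opt, S, C)
A = C.conj().T @ S.T @ F_opt
attained = np.real(np.trace(F_dd.conj().T @ A))

# Random feasible competitors (orthonormal columns) never do better.
competitors = []
for _ in range(500):
    Q, _ = np.linalg.qr(rng.standard_normal((n_rf, n_str))
                        + 1j * rng.standard_normal((n_rf, n_str)))
    competitors.append(np.real(np.trace(Q.conj().T @ A)))
print(np.isclose(attained, best), max(competitors) <= best + 1e-9)  # True True
```

Only one thin SVD of a small matrix is needed per update, which is why this step is cheap compared with relaxation-based alternatives.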
(17)  
Proposition 1.
Proof:
See Appendix A. ∎
Basically, the objective, viewed as a function of the scaling factor, is quadratic within each interval, as shown in (36) in the proof. This means that its minimum over each interval can only be attained either at an endpoint of the interval or at the vertex of the parabola, if the vertex falls inside the interval. Therefore, the optimal scaling factor is obtained in closed form by comparing the minima over all the intervals, as indicated in (18). Nevertheless, since the number of intervals to be compared grows with the number of switches, this incurs high computational complexity when the number of antennas is large, as in mmWave systems. In the following lemma, we show that there is no need to compute the minimum over all the intervals, which further reduces the complexity of the proposed algorithm.
Lemma 2.
The optimal scaling factor is obtained at one of the candidate points, where the candidate set consists of the indices that yield finite objective values.
Proof:
See Appendix B. ∎
Lemma 2 indicates that none of the interval endpoints can be the optimal solution for the scaling factor. Moreover, since the objective is a coercive function, i.e., it grows without bound as the magnitude of the scaling factor grows, we only need to consider the candidates with finite objective values, i.e., those that satisfy the first two conditions in (20), and the optimal scaling factor is given by
(21) 
By Lemma 2, the number of intervals that need to be compared to obtain the optimal scaling factor shrinks from the total number of intervals to the size of the candidate set, which is empirically found to be less than 5 in the simulations in Section V, further reducing the computational complexity of the proposed AltMin algorithm.
Thus, we have shown that, with the help of the upper bound derived in (12), the large-scale binary switch matrix can be efficiently optimized via the closed-form solution (19), which demonstrates the benefit of adopting the surrogate objective function in (13). With the closed-form solutions (16), (19), and (21) at hand, the AltMin algorithm for the FPS fully-connected structure in single-carrier systems is summarized as the FPS-AltMin algorithm. Several issues involved in the FPS-AltMin algorithm require further remarks.
1) Convergence: The FPS-AltMin algorithm is essentially a block coordinate descent (BCD) algorithm with two blocks, namely the semi-unitary digital precoder and the pair formed by the switch matrix and the scaling factor, whose globally optimal solutions are given by (16), (19), and (21). Hence, the algorithm is guaranteed to converge to a stationary point of (13) [36].
2) Initial point: Since the algorithm converges to a stationary point, it may be sensitive to the initial point. We provide a way to construct the initial point in the FPS-AltMin algorithm. The fully digital precoding matrix can be decomposed according to its SVD as follows:
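Since (17) to (21) are missing from this version, the Python sketch below implements an equivalent closed form that we derived under our own assumptions; it is not a transcription of Proposition 1. We assume that (17) reads: minimize ||F_opt F_dd^H C^H - alpha S||_F^2 over binary `S` and alpha >= 0 (consistent with completing the square in the assumed bound). Only the real part `x` of the vectorized target matters. For a fixed set of k closed switches, the best alpha is the mean of the selected entries and the objective drops by (their sum)^2 / k; for fixed k the k largest entries are best, so one sort plus a scan over k suffices. Unlike Lemma 2, this sketch scans all k rather than pruning the candidate set, and it verifies the result by brute force on a tiny instance.

```python
import itertools

import numpy as np


def update_switches(X):
    """Binary S and alpha >= 0 minimizing ||X - alpha * S||_F^2 (assumed form of (17))."""
    x = np.real(X).ravel()                 # Im(X) only adds a constant
    order = np.argsort(-x)                 # entries sorted in descending order
    csum = np.cumsum(x[order])             # sum of the k largest entries, k = 1..n
    if not np.any(csum > 0):
        return np.zeros(X.shape), 0.0      # closing no switch is optimal
    k = np.arange(1, x.size + 1)
    # For the top-k set, alpha = csum / k and the objective drops by csum^2 / k.
    gain = np.where(csum > 0, csum**2 / k, -np.inf)
    k_best = int(np.argmax(gain)) + 1
    s = np.zeros(x.size)
    s[order[:k_best]] = 1.0
    return s.reshape(X.shape), csum[k_best - 1] / k_best


def brute_force(X):
    """Exhaustive search over all switch patterns (tiny sizes only)."""
    x = np.real(X).ravel()
    best = np.sum(x**2)                    # all switches open
    for bits in itertools.product((0.0, 1.0), repeat=x.size):
        s = np.array(bits)
        if s.sum() == 0:
            continue
        alpha = max(0.0, x @ s / s.sum())
        best = min(best, np.sum((x - alpha * s) ** 2))
    return best


rng = np.random.default_rng(4)
X = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
S, alpha = update_switches(X)
closed_form = np.linalg.norm(np.real(X) - alpha * S) ** 2
print(np.isclose(closed_form, brute_force(X)), alpha >= 0)  # True True
```

At the optimum, every closed switch has a real target entry of at least alpha/2 and every open switch at most alpha/2, which is the thresholding behind the interval structure discussed above.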
(22) 
where the first factor has full column rank and the second combines a square matrix with zero padding, while an arbitrary matrix completes the decomposition. In (22), the fully digital precoding matrix is decomposed into two matrices whose dimensions match those of the analog and digital precoders, respectively. In other words, the two factors form a globally optimal solution to the hybrid precoding problem without any constraints on the analog precoding matrix. Accordingly, we generate the initial point as
(23) 
Note that this initial point fully captures the row space of the fully digital precoding matrix, whose basis is given by its right singular vectors. We also stress that the initial point satisfies the semi-unitary constraint introduced in (9).
3) Computational complexity: We compare the computational complexity of the proposed algorithm with those of the methods mentioned in Remarks 4 and 5. Since the dictionary of the OMP algorithm grows exponentially with the numbers of antennas and fixed phase shifters, its computational complexity can be prohibitively high even though it only needs a small number of iterations. For the SDR method in Remark 5, in each iteration (Footnote 3: The procedure that updates both the analog and digital precoders is counted as one iteration.), a high-dimensional SDP problem has to be solved to update the analog part, while a pseudo-inverse operation is needed to update the digital precoder, and the SDP dominates the per-iteration complexity. In contrast, in each iteration of the proposed FPS-AltMin algorithm, the computational complexity is dominated by the truncated SVD and sorting operations, which is much lower than that of the OMP algorithm and the SDR method. (Footnote 4: To solve for the switch matrix in one iteration, the SDR method takes 1.3 s, while the proposed FPS-AltMin algorithm takes 0.04 s, for one particular setting of the system parameters.)
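Putting the pieces together, the following self-contained Python sketch runs single-carrier FPS-AltMin iterations under our own assumptions (hypothetical names; not the paper's algorithm box). We assume the surrogate is ||F_opt||_F^2 - 2 alpha Re tr(F_dd^H C^H S^T F_opt) + alpha^2 ||S||_F^2; the switch and scaling step uses a sort-and-scan closed form we derived for that surrogate, which may differ in form from (18) to (21); the digital precoder step uses the polar factor; and the initial point stacks the right singular vectors of the fully digital precoder on top of zero rows, which is our reading of (22) and (23).

```python
import numpy as np


def phase_shifter_matrix(phases, n_rf):
    c = np.exp(1j * np.asarray(phases, dtype=float)) / np.sqrt(len(phases))
    return np.kron(np.eye(n_rf), c.reshape(-1, 1))


def initial_digital(F_opt, n_rf):
    """Assumed reading of (23): right singular vectors padded with zero rows."""
    _, _, Vh = np.linalg.svd(F_opt, full_matrices=False)
    pad = np.zeros((n_rf - Vh.shape[0], Vh.shape[1]), dtype=complex)
    return np.vstack([Vh, pad])                  # orthonormal columns


def update_switches(X):
    """Binary S and alpha >= 0 minimizing ||X - alpha * S||_F^2 (sort-and-scan)."""
    x = np.real(X).ravel()
    order = np.argsort(-x)
    csum = np.cumsum(x[order])
    if not np.any(csum > 0):
        return np.zeros(X.shape), 0.0
    k = np.arange(1, x.size + 1)
    gain = np.where(csum > 0, csum**2 / k, -np.inf)
    k_best = int(np.argmax(gain)) + 1
    s = np.zeros(x.size)
    s[order[:k_best]] = 1.0
    return s.reshape(X.shape), csum[k_best - 1] / k_best


def update_digital(F_opt, S, C):
    """Polar factor of C^H S^T F_opt (the Hölder equality case)."""
    U, _, Vh = np.linalg.svd(C.conj().T @ S.T @ F_opt, full_matrices=False)
    return U @ Vh


def surrogate(F_opt, S, C, F_dd, alpha):
    """Assumed upper bound used as the AltMin objective."""
    cross = np.real(np.trace(F_dd.conj().T @ C.conj().T @ S.T @ F_opt))
    return np.linalg.norm(F_opt) ** 2 - 2 * alpha * cross + alpha**2 * S.sum()


def fps_altmin(F_opt, C, n_rf, iters=100, tol=1e-9):
    F_dd = initial_digital(F_opt, n_rf)
    history = []
    for _ in range(iters):
        # Block 1: switches and scaling factor (closed form) for fixed F_dd.
        S, alpha = update_switches(F_opt @ F_dd.conj().T @ C.conj().T)
        # Block 2: semi-unitary digital precoder (closed form) for fixed S, alpha.
        F_dd = update_digital(F_opt, S, C)
        history.append(surrogate(F_opt, S, C, F_dd, alpha))
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return S, alpha, F_dd, history


rng = np.random.default_rng(5)
n_t, n_rf, n_c, n_str = 32, 6, 4, 4              # hypothetical, n_rf >= n_str
C = phase_shifter_matrix(2 * np.pi * np.arange(n_c) / n_c, n_rf)
F_opt = rng.standard_normal((n_t, n_str)) + 1j * rng.standard_normal((n_t, n_str))
S, alpha, F_dd, history = fps_altmin(F_opt, C, n_rf)
exact = np.linalg.norm(F_opt - alpha * S @ C @ F_dd) ** 2
print(len(history), history[-1], exact)
```

Because each block update exactly minimizes the assumed surrogate, the recorded surrogate values are non-increasing, mirroring the BCD argument in remark 1), and the exact objective stays below the surrogate, as Lemma 1 states.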
III-C Hybrid Precoder Design in Multicarrier Systems
Multicarrier techniques such as OFDM are often used to overcome the frequency-selective fading caused by the large available bandwidth in mmwave systems. Compared with the narrowband hybrid precoder design in Section III, the main difference in OFDM systems is that the analog precoder is shared not only by all users but also across all subcarriers [15, 21]. In particular, the digital precoding matrix in the objective is no longer a tall matrix, since in practical OFDM settings the number of RF chains is smaller than the total number of data streams over all subcarriers.
In this subsection, we modify the FPS-AltMin algorithm for OFDM systems. Similar to (9), we enforce a semi-orthogonal constraint on the digital precoding matrix. As the digital precoding matrix is generally fat in this case, the semi-orthogonal constraint is specified as
(24) 
In this way, the upper bound of the objective function derived in (12) still holds since
(25) 
where (c) follows from the SVD of , i.e., , since is a semi-unitary matrix and the columns of are the left singular vectors of . As the modifications for multicarrier systems only concern the digital precoding matrices and , the updates of and in the modified AltMin algorithm are the same as those in Section III-B. On the other hand, since is a fat matrix in OFDM systems, the optimization of should be modified as
(26) 
where and is a diagonal matrix with nonzero singular values , which is the SVD of . Correspondingly, the construction of the initial is given by
(27) 
where is the SVD of and the subscript denotes the first to the th columns of a matrix.
By substituting (27) and (26) into Steps 1 and 4 of the FPS-AltMin algorithm, respectively, we obtain the modified FPS-AltMin algorithm for mmwave OFDM systems. The convergence conclusion remains the same as discussed in Section III-B, while the computational complexity is . Furthermore, the inter-user interference cancellation approach can also be extended to OFDM systems, i.e., an additional BD precoder is applied based on the effective channel defined as
(28) 
where with dimension is the composite digital precoder on the th subcarrier. Therefore, the extension to multicarrier systems does not introduce extra design difficulties compared with single-carrier systems.
IV The Group-Connected Mapping Strategy for Hybrid Precoding
In the previous sections, the hybrid precoder design was based on a novel hardware implementation but with a conventional mapping strategy, i.e., the fully-connected mapping. In this section, a new mapping strategy, called the group-connected mapping, is proposed to offer a flexible trade-off between hardware complexity and spectral efficiency. In particular, with this mapping strategy, the number of switches in the FPS implementation is further reduced.
Iva The GroupConnected Mapping Strategy
Fig. 3 compares different mapping strategies. In the group-connected mapping, the RF chains and antennas are divided into groups, as shown in Fig. 3(c). Within each group, the mapping strategy is the same as the fully-connected mapping, i.e., each RF chain is connected to all antennas in its group. Thus, the analog precoding matrix has a block-diagonal structure, with each block corresponding to one RF chain-antenna group, specified as
(29) 
with being the analog precoding matrix of the th group. Note that while the RF chains and antennas are uniformly divided into groups in Fig. 3(c) to simplify notation, the grouping can be flexible, i.e., different groups can contain different numbers of RF chains and antennas.
The proposed group-connected mapping is a general mapping strategy that includes existing mapping strategies as special cases:
- When there is only one group, i.e., all RF chains and antennas belong to a single group, the group-connected mapping reduces to the fully-connected one, as shown in Fig. 3(a).
- When the number of groups equals the number of RF chains, i.e., each group contains only one RF chain, which is connected to the antennas of its own group as shown in Fig. 3(b), the mapping corresponds to the partially-connected one, and the analog precoding matrix is a block-diagonal matrix with each block being an -dimensional vector [15, Eq. 29].
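To make the two special cases concrete, the sketch below builds the 0/1 connection pattern between antennas and RF chains for a given number of groups, assuming uniform grouping of consecutive antennas and consecutive RF chains. The function names and the toy sizes (8 antennas, 4 RF chains) are hypothetical. One group recovers the fully-connected mapping, one RF chain per group recovers the partially-connected mapping, and intermediate group counts lie in between.

```python
def connection_mask(n_ant: int, n_rf: int, n_groups: int) -> list[list[int]]:
    """0/1 mask M with M[n][r] = 1 iff antenna n and RF chain r share a group.

    Assumes uniform grouping: consecutive antennas and consecutive RF chains
    are assigned to the same group index.
    """
    if n_ant % n_groups or n_rf % n_groups:
        raise ValueError("n_groups must divide both n_ant and n_rf")
    ant_per_group = n_ant // n_groups
    rf_per_group = n_rf // n_groups
    return [
        [int(n // ant_per_group == r // rf_per_group) for r in range(n_rf)]
        for n in range(n_ant)
    ]


def num_connections(mask: list[list[int]]) -> int:
    """Total number of RF chain-antenna connections in the mask."""
    return sum(map(sum, mask))


n_ant, n_rf = 8, 4                             # hypothetical toy sizes
full = connection_mask(n_ant, n_rf, 1)         # one group: fully-connected
partial = connection_mask(n_ant, n_rf, n_rf)   # one RF chain per group: partially-connected
group = connection_mask(n_ant, n_rf, 2)        # an intermediate group-connected case
print(num_connections(full), num_connections(group), num_connections(partial))  # 32 16 8
```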
Inevitably, trade-offs must be made between hardware complexity and spectral efficiency. The two existing mapping strategies provide such a trade-off, but in an extreme way: the fully-connected mapping has low hardware efficiency, while the partially-connected one incurs significant performance degradation. In contrast, as will be shown in Section V, the group-connected mapping provides a smoother transition between these two extremes. To the best of the authors' knowledge, this is the first proposal of a general mapping strategy for hybrid precoding systems.
Similar to existing mapping strategies, the group-connected mapping can be applied to hybrid precoding with any hardware implementation, e.g., the SPS, DPS, and FPS implementations. As this paper mainly focuses on the FPS hardware implementation, we elaborate on the hybrid precoder design for the FPS group-connected structure in the following.
IV-B Hybrid Precoder Design for the FPS Group-Connected Structure
As mentioned before, the numbers of RF chains and phase shifters have already been reduced by the FPS implementation. On the other hand, the number of switches depends on the number of connections, which in turn is determined by the mapping strategy. For the group-connected structure, the analog precoding matrix can be rewritten as
(30) 
where is a block-diagonal matrix that extracts the first blocks from the matrix in (3), and with dimension is the switch matrix for the th group. Hence, there are RF chain-antenna pairs, and the number of switches in use is , which is reduced by a factor of compared with the FPS fully-connected structure. Furthermore, the group-connected mapping simplifies the hardware implementation of the analog network. In particular, the conventional fully-connected mapping requires -way power dividers and -way power combiners [37], whereas the proposed group-connected mapping only needs -way power dividers and -way power combiners.
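The switch count can be checked with a few lines of arithmetic. The sketch below assumes that the fully-connected FPS switch matrix has one switch for every pair of an antenna and a column of the block-diagonal fixed-phase-shifter matrix in (3), i.e., N_t × N_c·N_RF switches in total, where N_t, N_c, and N_RF (antennas, fixed phase shifters, RF chains) are our own labels. With the 144-antenna BS of Section V and a hypothetical N_RF = 8, this reproduces the switch counts 11520 and 2304 listed in Table II, and uniform grouping into K groups divides the count by K.

```python
def fps_switch_count(n_ant: int, n_c: int, n_rf: int, n_groups: int = 1) -> int:
    """Number of switches in an FPS structure with uniform grouping.

    Assumption: group k's switch matrix connects its n_ant/K antennas to the
    n_c * n_rf/K phase-shifter outputs of its own RF chains, with one switch
    per (antenna, output) pair; n_groups = 1 is the fully-connected case.
    """
    if n_ant % n_groups or n_rf % n_groups:
        raise ValueError("n_groups must divide both n_ant and n_rf")
    per_group = (n_ant // n_groups) * (n_c * (n_rf // n_groups))
    return n_groups * per_group


# Hypothetical N_RF = 8 with the 144-antenna BS used in Section V.
print(fps_switch_count(144, 10, 8), fps_switch_count(144, 2, 8))  # 11520 2304
print(fps_switch_count(144, 10, 8, n_groups=4))                   # 2880
```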
Fortunately, the reduced hardware complexity does not introduce additional difficulty or computational complexity in the hybrid precoder design. Due to the block-diagonal structure of , the product of and can be expressed as
(31) 
The matrix is the submatrix consisting of the th to the th rows of . In this way, the hybrid precoder design problem can be decoupled into subproblems, each of which corresponds to one group, given by
(32) 
where is the submatrix consisting of the th to the th rows of . Each subproblem has the same form as the problem for the FPS fully-connected structure. This is also intuitive, since the mapping strategy within each group is simply the fully-connected one.
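The decoupling in (31) and (32) follows from the block-diagonal structure alone: the k-th block of the analog precoder only multiplies the k-th row block of the digital precoder, so an objective of the form ||F_opt − F_RF F_BB||_F² (the squared Frobenius form is assumed here, as is standard for this class of designs) splits into a sum of per-group terms. The plain-Python sketch below checks this identity numerically on random complex matrices; the sizes and helper names (`block_diag`, `sq_fro_dist`, and so on) are hypothetical, and no optimization is performed.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def block_diag(blocks):
    """Place the given matrices on the diagonal of a zero matrix."""
    rows = sum(len(b) for b in blocks)
    cols = sum(len(b[0]) for b in blocks)
    out = [[0j] * cols for _ in range(rows)]
    r0 = c0 = 0
    for blk in blocks:
        for i, row in enumerate(blk):
            out[r0 + i][c0:c0 + len(row)] = row
        r0 += len(blk)
        c0 += len(blk[0])
    return out


def sq_fro_dist(a, b):
    """Squared Frobenius norm of a - b."""
    return sum(abs(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rand_cmat(rows, cols, rng):
    """Random complex Gaussian matrix as a list of rows."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
n_groups, ant_g, rf_g, n_s = 3, 4, 2, 2                     # hypothetical sizes
A = [rand_cmat(ant_g, rf_g, rng) for _ in range(n_groups)]  # per-group analog blocks
F_bb = rand_cmat(n_groups * rf_g, n_s, rng)                 # digital precoder
F_opt = rand_cmat(n_groups * ant_g, n_s, rng)               # fully digital target

# Joint objective with the block-diagonal analog precoder.
joint = sq_fro_dist(F_opt, matmul(block_diag(A), F_bb))
# Sum of decoupled objectives: group k only sees its own rows of F_opt and F_bb.
split = sum(
    sq_fro_dist(F_opt[k * ant_g:(k + 1) * ant_g],
                matmul(A[k], F_bb[k * rf_g:(k + 1) * rf_g]))
    for k in range(n_groups)
)
print(abs(joint - split) < 1e-9 * joint)  # True
```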
Following the same procedures as in Sections III and III-C, the subproblems can be solved in parallel. The only additional step is to determine whether the matrix is tall or fat, i.e., whether holds, since the two cases correspond to the single-carrier and multicarrier updates of , respectively. For the FPS group-connected structure, the computational complexity of the proposed FPS-AltMin algorithm is .
Note that this design methodology for the group-connected mapping is applicable to any kind of hardware implementation. This means that the algorithm design for the group-connected mapping with any hardware implementation can be obtained by directly migrating the design for the fully-connected mapping, which has been investigated in abundant existing works [4, 11, 6, 5, 15]. This also shows the benefit of introducing the group-connected mapping from an algorithmic perspective.
V Simulation Results
In this section, we evaluate the performance of the proposed FPS-AltMin algorithm via simulations. Unless otherwise specified, the BS and each user are equipped with 144 and 16 antennas, respectively, and all transceivers employ uniform planar arrays. Four users and 128 subcarriers are assumed when considering multiuser OFDM systems. To reduce the cost and power consumption, the minimum number of RF chains is adopted according to the assumptions in Section II-A, i.e., and . The phases of the available fixed phase shifters are uniformly spaced within at equal-length intervals. The nominal SNR is defined as , and all simulation results are averaged over 1000 channel realizations. For the fully digital precoder, the BD precoder is adopted, which is asymptotically optimal in the high-SNR regime [33]. Furthermore, the Saleh-Valenzuela model is adopted to characterize mmwave channels [4, 15], and the frequency-domain channel matrix for the th subcarrier is given by [38, 15]
(33) 
where is the normalization factor. The numbers of clusters and rays per cluster are denoted by and , respectively. The channel gain of the th ray in the th cluster is denoted as . Furthermore, represent the receive and transmit array response vectors, where () and () stand for the azimuth and elevation angles of arrival and departure, respectively. While this channel model is used in the simulations, our precoder design does not depend on the channel model and is also applicable to more general models.
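For reference, the snippet below sketches the UPA array response vector in the form commonly used in the mmwave hybrid precoding literature, with half-wavelength spacing assumed; since (33) is not reproduced above, the exact expression used here may differ. It also forms a single ray's rank-one contribution, gain × a_r a_t^H. The full channel in (33) sums such terms over clusters and rays together with the normalization factor and a subcarrier-dependent factor, both omitted here. The 12 × 12 and 4 × 4 array layouts and all function names are assumptions consistent with the 144 and 16 antennas used in the simulations.

```python
import cmath
import math


def upa_response(phi, theta, width, height, spacing=0.5):
    """Unit-norm array response of a width x height uniform planar array.

    phi, theta: azimuth and elevation angles (radians); spacing: element
    spacing in wavelengths (half-wavelength assumed). Element (m, n) has
    phase 2*pi*spacing*(m*sin(phi)*sin(theta) + n*cos(theta)).
    """
    kd = 2.0 * math.pi * spacing
    scale = 1.0 / math.sqrt(width * height)
    return [scale * cmath.exp(1j * kd * (m * math.sin(phi) * math.sin(theta)
                                         + n * math.cos(theta)))
            for m in range(width) for n in range(height)]


def ray_term(gain, a_r, a_t):
    """gain * a_r a_t^H: the contribution of a single ray to the channel matrix."""
    return [[gain * r * t.conjugate() for t in a_t] for r in a_r]


a_t = upa_response(0.3, 1.2, 12, 12)   # assumed 12x12 = 144-element BS array
a_r = upa_response(-0.5, 1.0, 4, 4)    # assumed 4x4 = 16-element user array
H_ray = ray_term(0.8 - 0.2j, a_r, a_t)
print(len(a_t), len(a_r), len(H_ray), len(H_ray[0]))  # 144 16 16 144
```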
V-A Single-User Single-Carrier (SU-SC) Systems
As considerable effort has been devoted to point-to-point systems, it is natural to evaluate the proposed implementation and algorithm against existing works as benchmarks. The OMP algorithm proposed in [4, 13] has been widely used as a low-complexity algorithm, with the analog precoder selected from a predefined set that contains the array response vectors of the channel. An alternating minimization algorithm, referred to as the MO-AltMin algorithm, was then proposed in [15] to improve upon the OMP algorithm, at the cost of the high computational complexity of manifold optimization. For the SPS partially-connected structure, a dynamic subarray approach was proposed in [17] to compensate for the performance loss caused by the fewer connections between the RF chains and antennas [Footnote 5: As the algorithm in [17] can only design the hybrid precoder at the BS side, a fully digital combiner is adopted at the user side for this approach, while the other approaches adopt hybrid combiners in Fig. 4.].
In Fig. 4, the performance of a random binary switch matrix in the FPS fully-connected structure is presented first. It shows that this approach is far from satisfactory, and therefore a careful design of the switch matrix is needed. Fig. 4 also compares the performance achieved by the proposed FPS-AltMin algorithm in the FPS fully-connected structure with the three existing SPS-based approaches described above. It shows that, although the phase shifters have fixed phases and their number is small, i.e., 30 fixed phase shifters, the proposed FPS fully-connected structure achieves the highest spectral efficiency. Thanks to the proposed low-complexity FPS-AltMin algorithm, its simulation time is comparable to that of the OMP algorithm for the SPS fully-connected structure. The spectral efficiency gain over the benchmarks is mainly attributed to the proposed FPS hardware implementation, where each signal from an RF chain passes through more than one phase shifter. Furthermore, the results show that the proposed FPS-AltMin algorithm leads to an effective design of the dynamic switch network. Note that the MO-AltMin algorithm achieves the best performance reported so far for the SPS fully-connected structure, which means that the proposed structure and algorithm stand out as an excellent candidate for hybrid precoding with high hardware efficiency, high spectral efficiency, and a low-complexity design methodology.
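To illustrate what the random-switch baseline computes, the sketch below forms an FPS analog precoder F_RF = S C from a random binary switch matrix S and a block-diagonal fixed-phase-shifter matrix C holding one copy of the phase vector per RF chain. The product form, the uniform spacing of the fixed phases over [0, 2π), the toy sizes, and the function names are assumptions for illustration. Each entry of F_RF is then a sum of a few fixed unit-modulus phases, so a random S offers no control over the resulting beam, consistent with the poor performance of this baseline.

```python
import cmath
import math
import random


def fixed_phases(n_c):
    """n_c unit-modulus phase shifts uniformly spaced over [0, 2*pi) (assumed range)."""
    return [cmath.exp(2j * math.pi * i / n_c) for i in range(n_c)]


def fps_analog_precoder(S, c, n_rf):
    """Return F_RF = S C with C = blkdiag(c, ..., c), one copy of c per RF chain.

    S is an n_ant x (n_c * n_rf) 0/1 switch matrix. Because C is block diagonal,
    entry (n, r) of F_RF is the sum of the phases in c whose switch to antenna n
    is closed in RF chain r's block of columns.
    """
    n_c = len(c)
    return [[sum(row[r * n_c + i] * c[i] for i in range(n_c)) for r in range(n_rf)]
            for row in S]


rng = random.Random(1)
n_ant, n_c, n_rf = 16, 4, 2                    # hypothetical toy sizes
c = fixed_phases(n_c)                          # approximately 1, j, -1, -j
S_rand = [[rng.randint(0, 1) for _ in range(n_c * n_rf)] for _ in range(n_ant)]
F_rf = fps_analog_precoder(S_rand, c, n_rf)    # random-switch baseline
print(len(F_rf), len(F_rf[0]))                 # 16 2
```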
V-B Multiuser Multicarrier (MU-MC) Systems
As we have shown that only a small number of phase shifters is required to approach the performance of the fully digital precoder in SU-SC systems, we now examine whether this still holds when the analog precoder is shared by all subcarriers and users in MU-MC systems. While the MO-AltMin algorithm handles the unit modulus constraint induced by the SPS implementation well, its extremely high computational complexity hinders its extension to MU-MC systems, where the dimension of the optimization grows quickly.
Besides the fully digital case, we consider the following three baselines for comparison. A hybrid precoder design that optimizes one phase shifter in each iteration was developed in [39], and it achieves the best spectral efficiency reported so far in the literature. In addition, Butler matrices can use fixed phase shifters and hybrid couplers to realize the SPS fully-connected structure, and the OMP algorithm is suitable for designing the analog network based on Butler matrices. In [9], the DPS fully-connected structure was proposed for MU-MC systems to approach the performance of the fully digital precoder at the cost of hardware efficiency, since it employs a large number of phase shifters, i.e., phase shifters. In the evaluation of MU-MC systems, the DPS fully-connected structure is adopted as a benchmark, for which a simple low-rank matrix approximation suffices to design the hybrid precoder.
As shown in Fig. 5, the proposed FPS fully-connected structure incurs only a small performance loss compared with the DPS fully-connected one when only 30 fixed phase shifters are adopted. Both the DPS and FPS fully-connected structures benefit from allowing the same signal to pass through multiple phase shifters, while the main difference between them is the quantized and fixed phases assumed in the FPS one. This result demonstrates that the performance loss caused by the quantization is negligible with the proposed hybrid precoder structure. On the other hand, the FPS fully-connected structure achieves a significant spectral efficiency improvement over the SPS fully-connected structure, both with the algorithm in [39] and with the OMP algorithm based on Butler matrices, which illustrates the effectiveness of both the proposed implementation and algorithm. More importantly, it indicates that the number of phase shifters can be sharply reduced by the proposed FPS implementation even when the analog precoder is shared in MU-MC systems.
Table II: Power consumption of the analog network for different hybrid precoder structures in MU-MC systems

| Structure | Phase shifters: number | Phase shifters: type | Other hardware | Other hardware: number | Total power^6 |
|---|---|---|---|---|---|
| DPS fully-connected [9] | 2304 | Adaptive | N/A | N/A | 115.2 W |
| FPS fully-connected | 10 | Fixed^7 | Switch | 11520 | 59.2 W |
| SPS fully-connected, 4-bit quantization [39] | 1152 | Adaptive | N/A | N/A | 57.6 W |
| FPS fully-connected | 2 | Fixed | Switch | 2304 | 11.84 W |
| SPS fully-connected with Butler matrices | 3456 | Fixed | Coupler | 4032 | 109.44 W |

^6 The total power consumed by the main hardware components in the analog network.
^7 For fair comparisons, the power consumed by the FPS implementation is counted by calculating the power of fixed phase shifters, each with the same power consumption as a fixed phase shifter in the Butler matrix implementation.
V-C Comparisons of Hardware Efficiency
To improve hardware efficiency, the number of fixed phase shifters, i.e., , should be reduced to a minimum. A natural question is thus how many fixed phase shifters are needed to support a satisfactory spectral efficiency. Fig. II plots the spectral efficiency achieved with different numbers of fixed phase shifters, i.e., . The simulation parameters are the same as those in Figs. 4 and 5 for SU-SC and MU-MC systems, respectively. Fig. II shows that in SU-SC systems, 15 phase shifters are enough to achieve satisfactory performance, as the spectral efficiency almost saturates when the number of fixed phase shifters is increased further. By contrast, 576 variable phase shifters with arbitrary precision are needed in the SPS implementation. Moreover, the OMP algorithm achieves a lower spectral efficiency, and the MO-AltMin algorithm suffers from high computational complexity. A similar phenomenon is observed in MU-MC systems, i.e., around 10 fixed phase shifters are sufficient, which has not been revealed in existing works. Although the DPS implementation slightly outperforms the proposed FPS-AltMin algorithm, it employs more than 200 times as many phase shifters, which must also be variable and of high resolution. This illustrates that the proposed FPS implementation is much more hardware-efficient than existing hybrid precoder implementations while maintaining satisfactory performance, which is attractive for the practical implementation of hybrid precoding.
As MU-MC is more likely to be the system setting in future 5G mmwave networks, we compare the power consumption of different hybrid precoder structures in such systems, as listed in Table II.
Since the power consumption of the baseband and RF chains is the same for different hybrid precoder structures, we compare here the power consumption of the analog network, which is the part that distinguishes the structures and is mainly determined by the power consumed by phase shifters, switches, or couplers. The total power consumption of the analog network in Table II is calculated as
(34) 
where and are the power consumption of each phase shifter and each switch/coupler, respectively, given in Table I. For a fair comparison, we evaluate hardware efficiency by calculating the power consumption of hybrid precoder structures that achieve comparable spectral efficiency. As indicated in Fig. II, 10 fixed phase shifters in the FPS fully-connected structure suffice to achieve performance comparable to that of the DPS fully-connected one. Table II shows that, although the FPS fully-connected structure requires a switch network, it consumes much less power, since each switch consumes little power. This leads to higher hardware efficiency than the DPS fully-connected structure, which requires a large number of adaptive phase shifters.
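The arithmetic behind Table II can be sketched with (34). The snippet below stores the Table II entries, derives the per-phase-shifter power implied by the DPS row (which has no switches or couplers), and checks that this value also reproduces the 4-bit SPS row. The per-unit values of Table I are not reproduced here, so the implied 0.05 W should be read as an inference from Table II rather than a quoted specification. The snippet then computes the ratios behind the "almost 5 times" power comparison and the roughly 200-fold difference in phase-shifter count discussed in this section. The function name `analog_network_power` and the dictionary keys are hypothetical.

```python
import math


def analog_network_power(n_ps, p_ps, n_other=0, p_other=0.0):
    """Total analog-network power in the form of (34), in watts."""
    return n_ps * p_ps + n_other * p_other


# (number of phase shifters, number of switches/couplers, total power in W), from Table II.
table_ii = {
    "DPS fully-connected": (2304, 0, 115.2),
    "FPS fully-connected, 10 fixed": (10, 11520, 59.2),
    "SPS fully-connected, 4-bit": (1152, 0, 57.6),
    "FPS fully-connected, 2 fixed": (2, 2304, 11.84),
    "SPS fully-connected, Butler": (3456, 4032, 109.44),
}

# Per-phase-shifter power implied by the DPS row, which has no other components.
n_dps, _, total_dps = table_ii["DPS fully-connected"]
p_adaptive = total_dps / n_dps
print(round(p_adaptive, 4))  # 0.05

# The same unit power reproduces the 4-bit SPS row through (34).
n_sps, _, total_sps = table_ii["SPS fully-connected, 4-bit"]
print(math.isclose(analog_network_power(n_sps, p_adaptive), total_sps))  # True

# Power ratio (4-bit SPS vs FPS with 2 fixed phase shifters) and phase-shifter count ratio.
print(round(total_sps / table_ii["FPS fully-connected, 2 fixed"][2], 2))  # 4.86
print(n_dps / table_ii["FPS fully-connected, 10 fixed"][0])               # 230.4
```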
On the other hand, Fig. II shows that 2 fixed phase shifters in the FPS fully-connected structure suffice to achieve spectral efficiency comparable to that of the SPS fully-connected structure with the algorithm in [39]. Note that although infinite-resolution phase shifters are assumed in [39], quantized phase shifters should be adopted to ensure a practical comparison in terms of power consumption. Therefore, as suggested in [24], all the phase shifters in the SPS fully-connected structure are quantized with 4 bits. According to Table II, to achieve the same spectral efficiency, the SPS fully-connected structure needs almost 5 times as much power as the FPS fully-connected one, which again demonstrates the advantage of our proposal in terms of hardware efficiency. In addition, due to its large numbers of fixed phase shifters and hybrid couplers, the Butler matrix implementation suffers from huge power consumption and the lowest spectral efficiency, resulting in low hardware efficiency. Moreover, different levels of hardware efficiency can be readily achieved by adapting the number of fixed phase shifters in the FPS fully-connected structure.
V-D The FPS Group-Connected Hybrid Precoder Structure
In this part, we evaluate the spectral efficiency achieved by the proposed group-connected mapping strategy. By employing this mapping strategy with the FPS implementation, the number of switches can be reduced by a factor of , which is the number of groups in the mapping. In existing works, only the fully-connected () and p