Analysis of Cache-Enabled Hybrid Millimeter Wave & Sub-6 GHz Massive MIMO Networks
Abstract
This paper focuses on edge caching in mm/Wave hybrid wireless networks, in which all mmWave SBSs and Wave MBSs are capable of storing contents to alleviate the traffic burden on the backhaul links that connect the BSs to the core network and are used to retrieve non-cached contents. The main aim of this work is to address the effect of capacity-limited backhaul on the average success probability (ASP) of file delivery and on latency. In particular, we consider a more practical mmWave hybrid beamforming in small cells and massive MIMO communication in macro cells. Based on stochastic geometry and a simple retransmission protocol, we derive the association probabilities, from which the ASP of file delivery and the latency are derived. Taking the no-caching case as the benchmark, we evaluate these QoS performance metrics under the caching most popular contents (MC) and uniform caching (UC) placement policies. The theoretical results demonstrate that backhaul capacity has a significant impact on network performance, especially when the backhaul capacity is low. We also show the tradeoff among cache size, retransmission attempts, ASP of file delivery, and latency. This interplay shows that cache size and retransmission under different caching placement schemes alleviate the backhaul requirements. Simulation results are presented to validate our analysis.
I Introduction
The huge volume of mobile traffic in wireless communications, driven mainly by mobile video, which accounts for the majority of total mobile data traffic, has created the need for 5G and beyond-5G technologies [1, 2]. Currently, massive multiple-input multiple-output (MIMO) communication, millimeter-wave communication, and network densification through heterogeneous networks (HetNets) are the three most promising techniques proposed for 5G wireless communication systems. However, although these solutions benefit the access links, they do little to alleviate the burden on the backhaul links that connect edge base stations (BSs) to the data center in the core network. Further, their reliance on expensive backhaul links exacerbates the backhaul congestion issue during peak hours. In particular, it has been found that only a small percentage (5–10%) of contents (i.e., the popular contents) are repeatedly requested by the majority of users, which results in a substantial amount of redundant data traffic over networks. Motivated by this, caching the popular contents at edge nodes closer to users (i.e., wireless edge caching) has been proposed as a promising solution to offload traffic from the backhaul links and to reduce backhaul cost and latency. Caching comprises two phases. The first is the caching placement phase, which is conducted during off-peak hours according to the statistics of the users’ requests and is mainly limited by the caching capacity. The second is the content delivery phase, which is performed after the actual requests of the users have been revealed and is mainly limited by the QoS requirements.
As noted above, since the caching capacity at edge nodes is limited and much smaller than the total amount of popular content of interest to users, it is necessary to design proper caching placement strategies to fully exploit the benefit of caching. However, because wireless networks are more dynamic than wired networks, implementing caching in wireless networks is more challenging, and wired caching strategies cannot be directly applied. Put another way, the unique transmission features and randomness of cellular networks, e.g., channel fading, limited spectrum, and co-channel interference, must be taken into consideration when designing efficient caching strategies.
Recent studies have focused on caching design and analysis in various scenarios. Both centralized and decentralized coded caching were studied in [3] and [4] in a basic model with a shared error-free link, where additional caching gains are obtained by creating more multicast transmissions. Further, by taking network topology into consideration, the optimal caching placement strategies were designed to minimize the average sum delay for both coded and uncoded scenarios in simple cache-enabled femtocell networks (i.e., caching helper networks) [5]. The throughput scaling law was studied with the random caching strategy in simple grid-modelled D2D networks [6]. However, the network models considered in [3, 4, 5, 6] did not capture the stochastic nature of channel fading, interference, and the geographic locations of network nodes. To account for realistic cellular networks, other works focused on caching in a stochastic geometric framework. In [7], a probabilistic caching model was applied to single-tier stochastic wireless caching helper networks and the optimal caching placement was designed in terms of the average success probability of delivery for both noise-limited and interference-limited scenarios. In [8], caching cooperation was studied in the same type of caching helper network and a near-optimal caching placement scheme that maximizes the approximated average success probability of delivery was obtained. Further, the optimal caching placement strategy in the tier HetNets was designed, where the optimal caching probabilities maximize the average success probability of delivery [9]. Within the stochastic geometric framework, D2D caching was investigated in [10] and [11]. While [10] evaluated the throughput gain achieved with different optimal cache-hit and throughput caching placement strategies, [11] considered a D2D underlaid cellular network and studied an optimal cooperative caching placement whose average success probability (ASP) outperforms that of other caching placement strategies.
However, on the one hand, prior works focused on the cache hit event to obtain the optimal caching strategies, but they did not take the cache miss event into consideration and did not relate the design and insights of caching to backhaul limitations. On the other hand, the caching capacity of local BSs can be treated as a new type of wireless network resource in addition to time, frequency, and space. Therefore, the emerging radio-access technologies and wireless network architectures provide edge caching with new opportunities to fulfil the common goals of improving the quality of service (QoS) and quality of experience (QoE) for users, which makes it imperative to investigate the performance of these technologies in a coexisting framework. In this regard, [12] evaluated the impact of backhaul delay and proposed an optimal caching strategy with respect to average download delay, while [13] considered the backhaul effect on throughput and delay analysis in multi-tier HetNets. Moreover, the backhaul effect was also taken into consideration in [14, 15]. While [14] aimed to find the tradeoff between optimal cache size and backhaul requirement, [15] analyzed the impact of backhaul on the energy efficiency of HetNets. In another research direction, caching in mmWave networks and mmWave/Wave hybrid networks has been studied in [16, 17, 18, 19], motivated by the Wave spectrum crunch. However, none of the above works has studied cache-enabled hybrid HetNets with limited backhaul transmission while considering a relatively practical mmWave hybrid beamforming together with massive MIMO.
In this paper, we focus on edge caching in a backhaul-limited mm/Wave hybrid network assisted by massive MIMO, which has not been well understood yet, especially when mmWave hybrid beamforming is considered. On the one hand, mmWave hybrid beamforming is motivated by the high cost and power consumption of large-scale antenna arrays at mmWave bands; on the other hand, backhaul implementation is very expensive, especially for large-capacity backhaul. Therefore, it is necessary to investigate which network design parameters can help alleviate the backhaul capacity requirement and how the backhaul affects different performance metrics. Our contributions are summarized as follows:

We consider cache-enabled hybrid HetNets, where the locations of nodes are modelled as homogeneous Poisson point processes (PPPs). In particular, small BSs (SBSs) aided by mmWave hybrid beamforming and operating at mmWave frequencies, and macro BSs (MBSs) aided by massive MIMO and operating at Wave frequencies, are equipped with finite cache sizes to store some popular contents. Moreover, we also consider limited backhaul links between BSs and the gateways, which has not been studied in the existing literature.

We first derive the association probabilities, which characterize the probability that the typical user is associated with each type of BS. We then derive the PDF of the distance between the serving BS and the typical user.

Considering mmWave and Wave transmission, we derive the retransmission-based ASP of file delivery, latency, and backhaul load per unit area based on stochastic geometry.

Taking no caching as the benchmark, we numerically analyze the ASP of file delivery, latency, and backhaul load per unit area under different caching strategies with respect to key network design parameters, such as cache size, number of antennas, backhaul capacity, blockage density, target data rate, number of retransmission attempts, and content popularity. We demonstrate that weak and strong backhaul have different impacts on different caching strategies due to the association probabilities, e.g., when the backhaul capacity is very large, UC performs worse than the no-caching case. We also evaluate the tradeoff between cache size and backhaul capacity. Moreover, we confirm that increasing the number of retransmission attempts improves QoS, but we also show the resulting tradeoff between ASP of file delivery and latency.
II System Model
In this section, we introduce the network topology, the caching model, the cache-enabled content access protocol, the partial probabilistic caching placement schemes, and the user association policy. The main notations used in this paper are summarized in Table I.
Notations  Physical meaning 

, ,  PPP distributed locations of Wave MBSs, mmWave SBSs, and UEs 
, , ,  Spatial densities of Wave MBSs, mmWave SBSs, UEs, and gateways 
,  Number of transmit antennas at each Wave MBS and mmWave SBS 
,  Number of receive antennas at each UE 
,  Transmit power of each Wave MBS and mmWave SBS 
i.e.,  The limited file set with files 
with  The probability for requesting the th file 
,  Cache sizes of each Wave MBS and mmWave SBS 
The number of bits per file  
The backhaul capacity per BS (either mm or Wave)  
, , ,  Locations of the associated mmWave SBSs with LOS and NLOS transmission for the th file 
,  Locations of the associated cache hit and cache miss Wave MBSs for the th file 
, , ,  Distances between the associated mmWave SBSs with LOS and NLOS transmissions and the typical user 
,  Distances between the cache hit and cache miss Wave MBSs and the typical user 
Blockage density  
,  Bias factors 
,  The LOS and NLOS probabilities of the channel link 
The set of users that can be served by a BS  
with  The number of users served by the associated mmWave or Wave BS 
RF chain  
The number of users associated with the associated BS  
with  The number of paths 
,  AOA and AOD 
Channel coefficient of Wave communication  
Channel coefficient of mmWave communication 
II-A Network Architecture
We consider the downlink of a two-tier cache-enabled hybrid wireless heterogeneous network, where massive MIMO-aided macro BSs (Wave MBSs) operating in the sub-6 GHz Wave spectrum are overlaid with successively denser small BSs (mmWave SBSs) operating in the mmWave spectrum. Following the stochastic geometric framework, SBSs and MBSs are deployed in a 2D Euclidean plane according to mutually independent homogeneous Poisson point processes (HPPPs) with densities and , respectively. All users are likewise distributed as another PPP with density . In practical systems there are more users than BSs, thus we assume in this work. Since mmWave and Wave transmissions occur in different frequency bands, the two transmissions can be considered orthogonal to each other, with separate transmit and receive antenna interfaces. Put another way, the BSs and users belonging to one network (the small cell network (SCN) or the macro cell network (MCN)) operate in the same spectrum (mm or Wave) and do not interfere with the BSs or users of the other network. Further, all MBSs and SBSs are equipped with and antennas with transmit power and , respectively. Since two different transmission types are considered, the user equipments (UEs) are assumed to be equipped with two sets of RF chains with antennas and to independently receive Wave and mmWave signals, respectively.
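As an illustration of the deployment model, the following minimal sketch samples three mutually independent HPPPs on a finite square window. The densities, the window size, and the function names are hypothetical values chosen only for this example and are not taken from this work.

```python
import random


def sample_poisson(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_hppp(density, side, rng):
    """Sample a homogeneous PPP with the given density on a side x side square centred at the origin."""
    n = sample_poisson(density * side * side, rng)
    half = side / 2.0
    # Given the number of points, their locations are i.i.d. uniform over the window.
    return [(rng.uniform(-half, half), rng.uniform(-half, half)) for _ in range(n)]


rng = random.Random(1)
side = 1000.0                        # window side in metres (assumed)
mbs = sample_hppp(1e-5, side, rng)   # assumed MBS density per square metre
sbs = sample_hppp(1e-4, side, rng)   # assumed SBS density, denser than the MBS tier
ues = sample_hppp(1e-3, side, rng)   # assumed user density, denser than both BS tiers
print(len(mbs), len(sbs), len(ues))
```

The typical user can then be placed at the origin of the window, in line with Slivnyak's theorem.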
Remark 1.
We assume and . This is due to the intrinsic relation between signal wavelength and antenna separation: the wavelength of Wave signals is much larger than that of mmWave signals, and hence a much larger separation between antennas is required for Wave to avoid correlation and coupling. As a result, accommodating more than one antenna operating in the Wave spectrum in small devices such as mobile phones may not be feasible.
Compared to closed access, this work considers open access, in which a user is allowed to be associated with a BS of any tier to provide best-case coverage. For analytical tractability, instead of considering the exact average biased received signal power, we consider a cell association criterion based on the least biased path loss, with a bias factor that includes the average channel gain and all other effects to control the cell range. This enables the cell association to be tuned for balancing the cell load or for other purposes. When the bias factor is greater than one, it extends the cell range; when it is smaller than one, it shrinks the cell range. Due to the short propagation distance of mmWave signals, it is reasonable to set the bias factor smaller than one. Based on the least biased path loss association criterion, the user will be served by either a mmWave SBS or a Wave MBS with different association probabilities. Slivnyak’s theorem ensures that the statistics measured at a random point of a PPP are the same as those measured at the origin [20]. Therefore, the analysis hereinafter is performed for the typical user located at the origin, denoted by . It is also worth noting that the propagation between the mmWave SBS and the typical user is via a fully connected hybrid precoder that combines radio frequency (RF) and baseband (BB) precoding. Due to the sparsity of mmWave channels, we assume that all scattering happens in the azimuth plane. Therefore, a uniform linear array (ULA) is assumed to be employed at each mmWave BS and UE with sizes and , respectively. In contrast, the propagation between the Wave MBS and the typical user is via massive MIMO, where we do not consider any training in the forward link and the users therefore do not have any channel knowledge. In particular, the pilot contamination problem is mainly due to uplink training with non-orthogonal pilots. Since this work focuses on the analysis of caching placement in a backhaul-limited hybrid network, pilot contamination is avoided by assuming that orthogonal pilot sequences are assigned to the users associated with MBSs. In this regard, we assume that the number of users associated with the MCN is less than the available pilot length.
II-B Caching model
It has been observed that people are mostly interested in the most popular multimedia contents, so that only a small portion of the contents is frequently accessed by the majority of users. This work assumes a finite file set consisting of multimedia contents stored on the content server, which all BSs can access to retrieve cache-miss contents via the capacity-limited backhaul links. To avoid the backhaul burden caused by redundant transmissions of repeated requests during peak hours, caching is implemented at all BSs (both mmWave and Wave BSs) but with different cache sizes and , respectively, such that . For clarity, all files in the file set are of equal size, denoted by bits. This assumption is justifiable because files of different sizes can be divided into multiple chunks of equal size. Hereinafter, for notational simplicity, we use the file index to denote each file, namely . Then, each mmWave and Wave BS can cache up to bits and bits, respectively. Further, each user independently and identically requests the th file from the file set according to the Zipf distribution given by , where a skewness parameter controls the skewness of the content popularity distribution. The content popularity tends to a uniform distribution as goes to zero. Although the Zipf distribution is commonly used [21, 22], especially for videos and web pages, the analysis of this work can also be applied to other content popularity distributions and is expected to yield similar analytical results.
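The popularity profile described above can be sketched as follows, assuming the standard Zipf form in which the request probability of the file with popularity rank f is proportional to f raised to the power of minus the skewness parameter. The names `num_files` and `skew` are illustrative and not the notation of this work.

```python
def zipf_popularity(num_files, skew):
    """Return request probabilities for files ranked 1..num_files under a Zipf law."""
    weights = [rank ** (-skew) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


popularity = zipf_popularity(num_files=10, skew=0.8)  # assumed example values
uniform_like = zipf_popularity(num_files=10, skew=0.0)
print(popularity[0], popularity[-1])
print(uniform_like[0], uniform_like[-1])
```

With a skewness of zero every file is requested with the same probability, which matches the uniform limit noted in the text.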
II-C Cache-enabled Push and Delivery Content Access Protocol
Based on the client/server (C/S) structure, all BSs are connected to a default gateway (or central controller) via capacity-limited wired backhaul, while high-capacity wired backhaul, which supports highly reliable transmission, is assumed for the connection between a gateway and the content server. This work therefore only considers the effect of the backhaul between the BSs and the central controller and neglects the effect of the backhaul connection between the central controller and the content server. In particular, the limited backhaul capacity is equally allocated to all BSs, so the backhaul capacity of each BS, denoted by , is a function of the BS density, given as [23, 24]
(1) 
where , are arbitrary coefficients with regard to the capacity of backhaul links. The more BSs the network contains, the lower the backhaul capacity available to each BS.
During peak hours, the users’ requests are collected to estimate the content popularity distribution by means of suitable estimation techniques. For analytical tractability, hereinafter we assume that the popularity of the files is perfectly known and stationary. This assumption is perhaps overly simplistic, and we leave the investigation of unknown and time-varying popularity for future work. In particular, time-varying content popularity would require new caching placement schemes and an analysis that incorporates estimation error, neither of which is addressed in this paper. During off-peak hours, the network traffic load is relatively low and the cache placement phase is carried out according to the content popularity distribution and specific proactive caching placement policies. By pushing desired contents to the local caches of the BSs via broadcasting, the aim is to alleviate the network traffic burden in the content delivery phase during peak hours. In particular, all the cache-enabled BSs proactively fetch copies of the contents up to their full cache sizes through the caching placement schemes detailed in the following subsection.
When a user requests a content from the file set , it is first checked whether the requested content is available in the local caches of the associated BS. If so, the BS serves the requesting user directly without using the backhaul link. Otherwise, the requested content is first retrieved by the BS from the content server in the core network via the capacity-limited backhaul link modelled by (1), and then forwarded to the requesting user via the wireless access link. Considering that the typical user requests the th file, there are six possible content access and association events in total, covering both cache hit and cache miss scenarios, as described below. Because mmWave signals are sensitive to blockages in the network, such as trees and concrete buildings, this work considers two transmission conditions, i.e., LOS and NLOS transmissions with different path loss coefficients. Details are provided in the association subsection.
Scenario 1: The typical user associated with a mmWave BS located at that has the requested file in its local caches is served in LOS transmission. The distance between the typical user and the associated BS is denoted by .
Scenario 2: The typical user associated with a mmWave BS located at that has the requested file in its local caches is served in NLOS transmission. The distance between the typical user and the associated BS is denoted by .
Scenario 3: The typical user associated with a mmWave BS located at that does not have the requested file in its local caches is served in LOS transmission. The requested file is forwarded to the typical user via the backhaul link and then the access link. The distance between the typical user and the associated BS is denoted by .
Scenario 4: The typical user associated with a mmWave BS located at that does not have the requested file in its local caches is served in NLOS transmission. The requested file is forwarded to the typical user via the backhaul link and then the access link. The distance between the typical user and the associated BS is denoted by .
Scenario 5: The typical user is associated with a Wave BS located at that has the requested file in its local caches. The distance between the typical user and the associated BS is denoted by .
Scenario 6: The typical user is associated with a Wave BS located at that does not have the requested file in its local caches. The requested file is forwarded to the typical user via the backhaul link and then the access link. The distance between the typical user and the associated BS is denoted by .
Scenarios 1–4 are for the typical user associated with the SCN, while Scenarios 5–6 are for the typical user associated with the MCN. In particular, this work takes the no-caching scenario as the benchmark, in which no BS is able to cache any files. The association probability of the typical user for each of the aforesaid events is given in the association subsection.
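The six content access and association events can be summarized by a small lookup, sketched below. The `Tier` enumeration and the function `access_scenario` are hypothetical names introduced only for illustration; the sketch maps a tier, a cache state, and (for the SCN) a LOS/NLOS state to the scenario number, and reports whether the backhaul link is used.

```python
from enum import Enum


class Tier(Enum):
    SCN = "mmWave SBS"
    MCN = "Wave MBS"


def access_scenario(tier, cache_hit, los=None):
    """Return (scenario number, uses_backhaul) for one content request."""
    if tier is Tier.MCN:
        # Scenarios 5 and 6: the LOS/NLOS distinction is not applied to Wave MBSs.
        scenario = 5 if cache_hit else 6
    else:
        if los is None:
            raise ValueError("LOS/NLOS state is required for a mmWave SBS")
        if cache_hit:
            scenario = 1 if los else 2
        else:
            scenario = 3 if los else 4
    # A cache miss always requires retrieval over the capacity-limited backhaul.
    return scenario, not cache_hit


print(access_scenario(Tier.SCN, cache_hit=False, los=True))
print(access_scenario(Tier.MCN, cache_hit=True))
```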
II-D Caching placement schemes
As for proactive caching, the content placement is usually conducted during off-peak traffic periods. In this phase, the network prefetches contents to the storage according to a caching placement strategy that decides which file should be cached at which BSs. Unlike wired networks, whose topology is fixed and known, the highly dynamic topology of wireless networks makes it impossible to know a priori which user will associate with which BS, because user locations are undetermined. To deal with this problem, this work uses a probabilistic content caching policy rather than the deterministic caching policy considered in wired networks: the contents are independently and randomly cached with given probabilities in a distributed manner, so the policy is applicable even to complex networks with high flexibility.
This work applies three probabilistic caching placement schemes, namely uniform caching (UC), caching most popular contents (MC), and random caching (RC), that are commonly used in existing work. In particular, we consider that all the BSs in the same tier use the same caching probabilities and that each BS caches contents with the given probabilities independently of other BSs. For clarity, we denote the probabilities with which Wave and mmWave BSs cache the contents as with subscript denoting either mmWave SCN or Wave MCN, respectively. Based on the thinning theorem, the thinned PPP consisting of the BSs storing the th file is characterized by the density .
UC: The contents in the file set are uniformly selected to be cached in the local caches with caching probabilities given as with and .
MC: The first contents in the file set are certainly selected to be cached in the local caches with caching probabilities given as
(2) 
with .
RC: The contents in the file set are randomly selected to be cached in the local caches with caching probabilities given as with . The term random caching is ambiguous unless the distribution that generates the caching probabilities is specified. For simplicity, this work considers that each caching probability is drawn uniformly between and . The sum of all caching probabilities then has a mean of . However, when the cache size is so small that , this procedure will practically never generate a valid realisation of the random caching probabilities. To deal with this, we introduce a scaling factor to make the mean approximately equal to . In this manner, it is reasonable to expect that random caching is slightly better than uniform caching. In fact, the performance of random caching depends strongly on this generating distribution.
Finally, no caching is defined as with , which is considered as the benchmark.
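A minimal sketch of the UC, MC, and no-caching probability vectors is given below. It assumes that files are indexed by decreasing popularity, that UC assigns every file the probability cache size divided by library size, and that MC assigns probability one to the first cache-size files; these forms, the Zipf helper, and all function names are assumptions made for illustration. RC is omitted because its generating distribution is only loosely specified above.

```python
def zipf_popularity(num_files, skew):
    """Assumed Zipf request probabilities for files ranked 1..num_files."""
    weights = [rank ** (-skew) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def uniform_caching(num_files, cache_size):
    """UC: every file is cached with the same probability (assumed cache_size / num_files)."""
    return [cache_size / num_files] * num_files


def most_popular_caching(num_files, cache_size):
    """MC: the cache_size most popular files are cached with probability one."""
    return [1.0 if index < cache_size else 0.0 for index in range(num_files)]


def no_caching(num_files):
    """Benchmark: nothing is cached."""
    return [0.0] * num_files


def hit_probability(popularity, caching_probs):
    """Probability that a requested file is cached at a given BS."""
    return sum(p * q for p, q in zip(popularity, caching_probs))


def thinned_density(bs_density, caching_prob):
    """Density of the thinned PPP of BSs that store a given file."""
    return bs_density * caching_prob


popularity = zipf_popularity(num_files=100, skew=0.8)   # assumed example values
uc = uniform_caching(100, 10)
mc = most_popular_caching(100, 10)
print(hit_probability(popularity, uc), hit_probability(popularity, mc))
print(thinned_density(1e-4, uc[0]))
```

Under this per-BS hit metric alone MC dominates UC whenever popularity is skewed; the association and backhaul effects analyzed in this work are what make the overall comparison less one-sided.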
II-E Association probability
Unlike existing work that, in order to find the optimal caching placement, considers users associated only with BSs caching the requested files, this work considers both cache hit and cache miss association scenarios to capture the backhaul effects. In addition, due to mmWave LOS/NLOS transmissions, mmWave SBS coverage areas are no longer typical weighted Voronoi cells, since a user can associate with a distant BS with LOS transmission rather than the closest BS with NLOS transmission. Thus, the least-distance association criterion is no longer suitable. For simplicity, we consider the least biased path loss as the user association strategy instead of the maximum biased received signal power. The least biased path losses for mmWave and Wave BSs are given by with and , respectively. The bias factor controls the cell range. The bias factors are usually set to 1, as in [26]. However, due to the short propagation distance of mmWave signals, we slightly shrink the mmWave SBS coverage area by setting less than 1 while keeping as 1.
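The association rule can be illustrated with the sketch below. It assumes a biased path loss of the form distance raised to the path loss exponent, divided by the bias factor, and ignores frequency-dependent intercepts; this form, the exponents, and the candidate labels are assumptions for illustration rather than the expressions of this work.

```python
def biased_path_loss(distance, path_loss_exponent, bias):
    """Assumed biased path loss: a larger bias lowers the metric and extends the cell range."""
    return distance ** path_loss_exponent / bias


def associate(candidates):
    """Pick the label with the least biased path loss.

    candidates: list of (label, distance, path_loss_exponent, bias) tuples.
    """
    best = min(candidates, key=lambda c: biased_path_loss(c[1], c[2], c[3]))
    return best[0]


# A distant LOS SBS can beat a closer NLOS SBS (assumed exponents 2 and 4).
blockage_case = [
    ("SBS-LOS", 30.0, 2.0, 1.0),
    ("SBS-NLOS", 10.0, 4.0, 1.0),
    ("MBS", 200.0, 3.0, 1.0),
]
print(associate(blockage_case))

# A bias below one shrinks the SBS cell so that the user moves to the MBS.
unbiased = [("SBS", 10.0, 3.0, 1.0), ("MBS", 12.0, 3.0, 1.0)]
shrunk = [("SBS", 10.0, 3.0, 0.5), ("MBS", 12.0, 3.0, 1.0)]
print(associate(unbiased), associate(shrunk))
```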
This work adopts a two-state statistical blockage model for each link as in [27], such that the probability of the link being in the LOS or NLOS state is a function of the distance between the typical user and the serving mmWave SBS. Assuming the distance between them is , the probability that the link is in the LOS state is and in the NLOS state is , where is the blockage density. Based on the thinning theorem, the PPP is further thinned into and for the LOS and NLOS states, with densities and , respectively.
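The distance-dependent thinning can be sketched as follows, assuming the commonly used exponential LOS probability, i.e., the exponential of minus the blockage density times the distance. This functional form, the example blockage density, and the function names are assumptions made for illustration.

```python
import math
import random


def los_probability(distance, blockage_density):
    """Assumed LOS probability exp(-blockage_density * distance)."""
    return math.exp(-blockage_density * distance)


def thin_los_nlos(points, blockage_density, rng):
    """Independently mark each BS location as LOS or NLOS as seen from the origin."""
    los, nlos = [], []
    for x, y in points:
        distance = math.hypot(x, y)
        if rng.random() < los_probability(distance, blockage_density):
            los.append((x, y))
        else:
            nlos.append((x, y))
    return los, nlos


rng = random.Random(3)
# Hypothetical SBS locations drawn uniformly in a 400 m x 400 m window.
points = [(rng.uniform(-200.0, 200.0), rng.uniform(-200.0, 200.0)) for _ in range(500)]
los_points, nlos_points = thin_los_nlos(points, blockage_density=0.01, rng=rng)
print(len(los_points), len(nlos_points))
```

Because the retention probability depends on the distance to the typical user, the two thinned processes are inhomogeneous even though the parent process is homogeneous.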
As described in the previous subsection, we have defined six association events. The following Lemmas 1 and 2 give the six association probabilities for the typical user associated with the SCN and the MCN, respectively.
Lemma 1.
The probabilities that the typical user requesting the th file is associated with cache hit mmWave SBSs located at and with LOS and NLOS transmissions are given by
(3) 
(4) 
and the probabilities that the typical user requesting the th file is associated with cache miss mmWave SBSs located at and with LOS and NLOS transmissions are given by
(5) 
(6) 
where and .
Proof.
Lemma 2.
The probability that the typical user requesting the th file is associated with cache hit Wave MBS located at is given by
(7) 
and the probability that the typical user requesting the th file is associated with cache miss Wave MBS located at is given by
(8) 
where all parameters have already been defined in Lemma 1.
Proof.
The proof is similar to that of Lemma 1. ∎
II-F Average Number of Users
Each mmWave SBS and Wave MBS serves multiple users simultaneously with equal power allocation. Consequently, the link capacity of each user is reduced by a factor that depends on the number of users served simultaneously. Let the numbers of users simultaneously served by a mmWave and a Wave BS located at and be denoted as and , respectively. Since the association coverage areas differ from distance-based Poisson-Voronoi cells, it is complicated to compute the exact cell size distribution. Consequently, this work considers the average number of users, following the same assumption as in [20], where the average number of users is obtained by assuming the same mean cell areas as those of Poisson-Voronoi cells. The average numbers of users associated with the tagged mmWave and Wave BSs are given by [28, 29]
(9) 
where and are the association probabilities of the typical user with the SCN and the MCN, respectively, such that and . It is worth noting that and are not functions of the caching probabilities, since the total number of users associated with the SCN and the MCN includes all cache hit and cache miss users, so that the caching probabilities are averaged out.
Accordingly, the numbers of associated users of the other mmWave and Wave BSs except the tagged mmWave and Wave BS are given as
(10) 
where and are the number of users associated with SCN and MCN, respectively.
However, in practice, due to the finite number of RF chains (antennas), the number of users served in one time/frequency resource block cannot exceed the number of available RF chains (antennas). Assume the sets of served users of each mmWave BS located at and Wave BS located at are denoted by and , respectively. Accordingly, the cardinalities of the sets and are given by and , respectively. Unlike mmWave hybrid beamforming, this paper applies fully digital baseband processing to massive MIMO, where one RF chain is used per antenna. However, this approach is impractical for mmWave BSs equipped with much larger antenna arrays than massive MIMO. Therefore, hybrid beamforming (detailed in the next section) is implemented, and the number of RF chains is smaller than the number of antennas due to hardware complexity, power consumption, and cost. Let the number of mmWave RF chains be ; the total numbers of served UEs of a tagged Wave and tagged mmWave BS are then and , respectively. The total numbers of users of any other Wave BS and mmWave BS except the tagged ones are and , respectively. For notational simplicity, hereinafter the average numbers of users of the tagged mmWave BS and of the other mmWave BSs are expressed as and , respectively, and the average numbers of users of the tagged Wave BS and of the other Wave BSs are expressed as and , respectively.
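The load model of this subsection can be sketched as below. The sketch assumes the widely used Poisson-Voronoi mean-load approximation, in which the tagged BS serves on average one plus 1.28 times the per-BS user load and any other BS serves the per-BS user load, and it caps the result by the number of RF chains (or antennas). These expressions and all names are assumptions for illustration; the expressions of this work in (9) and (10) remain authoritative.

```python
def average_users_tagged(user_density, bs_density, association_prob):
    """Assumed mean load of the tagged BS: 1 + 1.28 * lambda_u * A / lambda_BS."""
    return 1.0 + 1.28 * user_density * association_prob / bs_density


def average_users_other(user_density, bs_density, association_prob):
    """Assumed mean load of any other BS: lambda_u * A / lambda_BS."""
    return user_density * association_prob / bs_density


def served_users(average_users, rf_resources):
    """Cap the served users by the available RF chains (mmWave) or antennas (massive MIMO)."""
    return min(average_users, rf_resources)


# Hypothetical densities (per square metre) and an assumed SCN association probability.
tagged_load = average_users_tagged(user_density=1e-3, bs_density=1e-4, association_prob=0.5)
print(tagged_load, served_users(tagged_load, rf_resources=4))
```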
II-G Distribution of the distance between the typical user and the serving BS
The distance between a point of a PPP and its nearest neighbour follows the standard nearest-neighbour distance distribution. In contrast, this subsection derives the distribution of the distance between the serving BS and the typical user as a conditional probability.
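For reference, the unconditional nearest-neighbour law of a homogeneous PPP in the plane, whose PDF is 2πλr exp(-πλr²) and whose mean is 1/(2√λ), can be checked numerically with the sketch below. The density value, the sample size, and the function names are assumptions chosen for illustration; the conditional PDFs of Lemma 3 differ from this baseline because they are conditioned on the association event.

```python
import math
import random


def nearest_neighbour_pdf(r, density):
    """PDF of the distance from the origin to the nearest point of a planar HPPP."""
    return 2.0 * math.pi * density * r * math.exp(-math.pi * density * r * r)


def sample_nearest_distance(density, rng):
    """Inverse-CDF sampling from F(r) = 1 - exp(-pi * density * r^2)."""
    u = rng.random()
    return math.sqrt(-math.log(1.0 - u) / (math.pi * density))


density = 1e-4                      # assumed BS density per square metre
rng = random.Random(7)
samples = [sample_nearest_distance(density, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
analytic_mean = 1.0 / (2.0 * math.sqrt(density))
print(empirical_mean, analytic_mean)
```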
Assume that the distances between the typical user and the serving cache hit and cache miss mmWave SBS with and without the requested th file are denoted by , , , and , respectively. Similarly, the distances between the typical user and the serving cache hit and cache miss Wave MBS are denoted by and , respectively. The following Lemma provides the probability density function (PDF) for each of these distances.
Lemma 3.
The PDFs of , , , and of SCN and and of MCN are given as follows:
(11)  
(12)  
(13)  
(14)  
(15)  
(16) 
where and are defined above, as are all other parameters.
Proof.
The proof of the Lemma can be found in Appendix B. ∎
III Propagation model
In this section, we model the mmWave hybrid beamforming and massive MIMO channels in order to illustrate the propagation model. For simplicity, we only consider the cache hit association event as an example, in which the typical user is served by a BS that caches the requested file. Here we assume that the typical user, located at the origin and requesting the th file, denoted by , is associated with either a mmWave SBS located at with depending on the LOS or NLOS state or a Wave MBS located at .
III-A MmWave propagation model
The typical user communicates with its associated mmWave SBS through a fully connected hybrid precoder that consists of both RF and BB precoders. For simplicity, we assume that each mmWave SBS transmits a total of data streams to serve multiple users, with a single stream per user. It is therefore sufficient for each user to adopt an RF-only combiner with analog beamforming to decode the transmitted signal [30].
1) Channel Model: The mmWave channel between the serving BS and the typical user, denoted by , is written as follows (hereinafter, for notational simplicity, the typical user subscript is omitted):
(17) 
where is the complex channel gain on the th path, which follows small-scale Rayleigh fading for both LOS and NLOS paths for analytical tractability [30]; is the number of paths from the serving BS to the typical user; is the angle of arrival (AOA); is the angle of departure (AOD); and is the path loss exponent, which differs for LOS and NLOS paths. As mentioned in [30], this work considers multiple LOS and NLOS paths, since a more general channel model would incorporate scenarios with one or more LOS paths as well as NLOS paths, and assumes that each scatterer provides a single dominant path. It is expected that as in [30]. The vectors and are the array response vectors of the user and the BS, respectively, both modelled as uniform linear arrays (ULAs):
(18)  
(19) 
where is the spacing between antenna elements, commonly set to half the wavelength () of the transmitted signal.
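As an illustration of the ULA model, the sketch below builds an array response vector in the conventional unit-norm form, with a phase progression of 2π(d/λ)sin(θ) between adjacent elements. This form, the function names, the 64-element array, and the angles are assumptions made for illustration; the exact expressions of (18) and (19) may differ in normalization or sign convention.

```python
import cmath
import math


def ula_response(n_antennas: int, angle_rad: float, spacing_over_wavelength: float = 0.5):
    """Unit-norm ULA array response vector (conventional form, assumed here)."""
    phase_step = 2.0 * math.pi * spacing_over_wavelength * math.sin(angle_rad)
    scale = 1.0 / math.sqrt(n_antennas)
    return [scale * cmath.exp(1j * k * phase_step) for k in range(n_antennas)]


def beam_gain(a, b):
    """Squared magnitude of the inner product a^H b."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2


# Hypothetical 64-element array with half-wavelength spacing.
aligned = beam_gain(ula_response(64, 0.3), ula_response(64, 0.3))
misaligned = beam_gain(ula_response(64, 0.3), ula_response(64, 0.9))
print(aligned, misaligned)
```

With a large array, the gain is close to one when the two angles coincide and close to zero otherwise, which is the behaviour that the ON/OFF approximation used later in this section relies on.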
2) Received Signal: After passing through the BB and RF precoders, the channel, and the RF combiner, the signal sent from the mmWave BS located at and received at the typical user is given by
(20) 
where the effective channel gain . The BB and RF precoders are defined as and , respectively. The RF combiner is defined as , where each entry with . The transmitted data stream from the mmWave BS is defined as . The noise term . is the average received signal power, where the total power constraint enforces .
3) Design of hybrid precoding: Although we have provided the received signal at the typical user requesting the th file served by the mmWave BS , the BB and RF beamforming vectors have not yet been defined. To eliminate the inter-user interference (IUI) that appears in (20), we apply zero-forcing (ZF) beamforming at the BB precoder, such that .
For the RF precoder and combiner vectors, this work follows the method in [31] to achieve near-optimal received signal power. As a result, the RF precoding and combining vectors are given by and , respectively, where is the path that has the best channel gain, i.e., .
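The path selection step can be sketched as follows. The `Path` record, the function name, and the three example paths are hypothetical. The sketch also assumes that the RF precoder and combiner are steered along the AOD and AOA of the selected path, which is a common choice but is not confirmed by the text above.

```python
from dataclasses import dataclass


@dataclass
class Path:
    gain: complex  # complex small-scale gain of the path
    aoa: float     # angle of arrival at the user, in radians
    aod: float     # angle of departure at the BS, in radians


def strongest_path(paths):
    """Return the path with the largest gain magnitude."""
    if not paths:
        raise ValueError("at least one path is required")
    return max(paths, key=lambda p: abs(p.gain))


# Hypothetical three-path channel (illustrative values only).
paths = [
    Path(gain=0.3 + 0.1j, aoa=0.2, aod=-0.4),
    Path(gain=-0.9 + 0.5j, aoa=1.0, aod=0.3),
    Path(gain=0.2 - 0.6j, aoa=-0.7, aod=0.8),
]
best = strongest_path(paths)
# Under the stated assumption, the combiner would point at best.aoa
# and the precoder at best.aod.
print(best.aoa, best.aod)
```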
4) SINR characterization: Based on (20), the SINR of the typical user served by the mmWave BS is formulated as
(21) 
However, a direct analysis of (21) is not tractable. We therefore give a tractable SINR approximation following [30, 32] under the assumption . In particular, the first term in the denominator is zero because of the ZF BB precoding, and the SINR reduces to
(22) 
where using the ON/OFF approximation model for the array response vectors (assuming , this model improves analytical tractability since if ; otherwise, it takes a nonzero value that is lower bounded by due to ZF precoding), and is the ZF precoding penalty, i.e., the probability that the signal power is dominant at the typical user while the signal powers of the other users in the tagged cell are lower bounded by , given by [32]
(23) 
The second term in the denominator, , is the inter-cell interference (ICI), given by
(24)  
However, whereas the IUI in the tagged cell is cancelled by ZF precoding, no ZF penalty applies to the interfering signals from BSs other than the serving BS. Therefore, the ON/OFF approximation made for the IUI is neither suitable nor accurate for the ICI. Instead of setting the unaligned interfering signal power to , we use two other values, and , to approximate the term by
(25) 
Further, (24) is reduced to a simple closed-form expression that is used in the Performance Metric section.
III-B Massive MIMO
Received signal: For the massive MIMO access link, this work applies ZF beamforming with equal power allocation at the massive MIMO enabled MBSs. The MBSs perform average channel estimation in TDD mode to acquire the channel state information (CSI). Owing to channel reciprocity in TDD mode, no channel estimation is performed at the users, so the users have no channel knowledge. Rather than using instantaneous CSI to compute the usual SINR, this work adopts the worst-case Gaussian noise assumption as a lower-bounding technique that applies to any precoding strategy [33]. Consequently, the additional self-interference caused by the average channel estimation appears in the SINR denominator and is treated as part of the noise. The method in [33] does not use stochastic geometry, whereas this work analyses a dense cellular network within a stochastic geometry framework. Based on the instantaneous received signal expression rewritten to compute the achievable data rate [33, 34], the received signal at the typical user requesting the th file is
(26)  
where is the signal transmitted from the MBS at to the user at the origin, is the channel gain between the serving Wave MBS located at and the user located at the origin, is the channel gain between an MBS other than the serving MBS and the typical user, is the distance between the serving MBS and the typical user, and is the noise term.
IV Performance Metric
The QoS is closely related to the ASP of file delivery, the delay experienced by users, the network capacity, and the backhaul load. Downlink transmission over the wireless medium incurs outage and delay, mainly because of interference from concurrent transmissions and channel fading. We therefore consider a simple retransmission protocol in which a packet of the requested content is transmitted repeatedly until it is delivered successfully, up to a predefined number of retransmission attempts . The BS deems a packet delivery successful if the SINR exceeds the predefined threshold . If a packet is delivered successfully, we assume that the BS receives a one-bit acknowledgement message from the UE with negligible delay and error. Otherwise, the BS receives a one-bit negative acknowledgement message in the same manner. An outage event occurs if the packet is not delivered after attempts. This work thus considers four retransmission-based performance metrics: the ASP of file delivery, the average packet delay, the throughput per user, and the backhaul load per unit area. In particular, we consider a static scenario in which the locations of users and BSs remain fixed across consecutive retransmission attempts. The channel fading power coefficient is assumed to be stationary and i.i.d. across attempt slots. For analytical tractability, we neglect the temporal interference correlation and treat every attempt as an independent event, which yields the best-case retransmission-based ASP of file delivery (upper bound), packet delay (lower bound), throughput per user (upper bound), and backhaul load per unit area (upper bound). For a complete characterization of the retransmission-based ASP of file delivery and latency, readers are referred to [35] and [36].
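Under the independence assumption above, the effect of retransmissions on a single link can be sketched as follows. Here `p_success` stands for the per-attempt probability that the SINR exceeds the threshold and is treated as a given number; the function names and the choice of counting transmission slots as the delay measure are assumptions for illustration, not the paper's definitions, which also average over the association events.

```python
def success_within(p_success: float, max_attempts: int) -> float:
    """Probability of at least one success in max_attempts independent attempts."""
    if not 0.0 <= p_success <= 1.0 or max_attempts < 1:
        raise ValueError("need 0 <= p_success <= 1 and max_attempts >= 1")
    return 1.0 - (1.0 - p_success) ** max_attempts


def expected_attempts(p_success: float, max_attempts: int) -> float:
    """Mean number of slots used; an outage counts as max_attempts slots."""
    if p_success == 0.0:
        return float(max_attempts)
    # E[min(G, N)] for a geometric G equals (1 - (1 - p)^N) / p.
    return success_within(p_success, max_attempts) / p_success


# Hypothetical per-attempt success probability of 0.5 with up to 3 attempts.
print(success_within(0.5, 3), expected_attempts(0.5, 3))
```

Increasing the number of allowed attempts raises the delivery probability but also raises the mean number of slots used, which is the trade-off between ASP of file delivery and latency discussed in this section.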
IV-A Retransmission-based ASP of file delivery
Owing to the independence of the PPPs and , the six association events are treated as independent, and the retransmission protocol is applied in each event. Therefore, by the law of total probability and the definition of conditional probability, the retransmission-based ASP of file delivery is given by