Data-Aided Secure Massive MIMO Transmission with Active Eavesdropping
Abstract
In this paper, we study the design of secure communication for time division duplexing multicell multiuser massive multiple-input multiple-output (MIMO) systems with active eavesdropping. We assume that the eavesdropper actively attacks the uplink pilot transmission and the uplink data transmission before eavesdropping the downlink data transmission phase of the desired users. We exploit both the received pilots and data signals for uplink channel estimation. We show analytically that when the number of transmit antennas and the length of the data vector both tend to infinity, the signals of the desired user and the eavesdropper lie in different eigenspaces of the received signal matrix at the base station if their signal powers are different. This finding reveals that decreasing (instead of increasing) the desired user’s signal power might be an effective approach to combat a strong active attack from an eavesdropper. Inspired by this result, we propose a data-aided secure downlink transmission scheme and derive an asymptotic achievable secrecy sum-rate expression for the proposed design. Numerical results indicate that under strong active attacks, the proposed design achieves significant secrecy rate gains compared to the conventional design employing matched filter precoding and artificial noise generation.
I Introduction
Wireless networks are widely used in civilian and military applications and have become an indispensable part of our daily lives. Security is therefore a critical issue for future wireless networks. Conventional security approaches based on cryptographic techniques have many well-known weaknesses. Hence, new approaches to security based on information theoretical concepts, such as the secrecy capacity of the propagation channel, have been developed and are collectively referred to as physical layer security [1, 2, 3, 4].
Massive MIMO is a promising approach for efficient transmission of massive amounts of information and is regarded as one of the “big three” 5G technologies [5]. Most studies on physical layer security in massive MIMO systems assume that the eavesdropper is passive and does not attack the communication process of the system [6, 7, 8, 9]. However, a smart eavesdropper can launch a pilot contamination attack to jeopardize the channel estimation process at the base station [10]. Due to the channel hardening effect caused by large antenna arrays, the pilot contamination attack poses a serious secrecy threat to time division duplexing (TDD)-based massive MIMO systems [10].
The authors of [11] propose a secret key agreement protocol for single-cell multiuser massive MIMO systems under the pilot contamination attack. An estimator for the base station (BS) is designed to evaluate the information leakage. Then, the BS and the desired users perform secure communication by adjusting the length of the secrecy key based on the estimated information leakage. Other works have studied how to combat the pilot contamination attack. The authors of [12] investigate the pilot contamination attack problem for single-cell multiuser massive MIMO systems over independent and identically distributed (i.i.d.) fading channels. The eavesdropper is assumed to know only the pilot signal set, whose size scales polynomially with the number of transmit antennas. For each transmission, the desired users randomly select certain pilot signals from this set, which are unknown to the eavesdropper. In this case, it is proved that the impact of the pilot contamination attack can be eliminated as the number of transmit antennas goes to infinity. For the more pessimistic assumption that the eavesdropper knows the exact pilot signals of the desired users for each transmission, the secrecy threat caused by the pilot contamination attack in multicell multiuser massive MIMO systems over correlated fading channels is analyzed in [10]. Based on this, three transmission strategies for combating the pilot contamination attack are proposed. However, the designs in [10] are not able to guarantee a high (or even a nonzero) secrecy rate for weakly correlated or i.i.d. fading channels under a strong pilot contamination attack.
In this paper, we investigate secure transmission for i.i.d. fading TDD multicell multiuser massive MIMO systems under a strong active attack. (For simplicity of presentation, we assume i.i.d. fading to present the basic idea of data-aided secure massive MIMO transmission. The results can be extended to the general case of correlated fading channels by combining the techniques in [10] with those in this paper. This will be considered in the extended journal version of this paper.) We assume the system first performs uplink training, followed by an uplink data transmission phase and a downlink data transmission phase. The eavesdropper jams the uplink training phase and the uplink data transmission phase and then eavesdrops the downlink data transmission.
We utilize the uplink transmission data to aid the channel estimation at the BS. Then, based on the estimated channels, the BS designs precoders for the downlink transmission.
This paper makes the following key contributions:

We prove that when the number of transmit antennas and the amount of transmitted data both approach infinity, the desired users’ and the eavesdropper’s signals lie in different eigenspaces of the uplink received signal matrix due to their power differences. Our results reveal that increasing the power gap between the desired users’ and the eavesdropper’s signals is beneficial for separating the desired users and the eavesdropper. This implies that when facing a strong active attack, decreasing (instead of increasing) the desired users’ signal power could be an effective approach to enable secure communication.

Inspired by this observation, we propose a joint uplink and downlink data-aided transmission scheme to combat strong active attacks from an eavesdropper. Then, we derive an asymptotic achievable secrecy sum-rate expression for this scheme. The derived expression indicates that the impact of an active attack on the uplink transmission can be completely eliminated by the proposed design.

Our numerical results reveal that the proposed design achieves a good secrecy performance under strong active attacks, while the conventional design employing matched filter precoding and artificial noise generation (MFAN) [10] is not able to guarantee secure communication in this case.
Notation: Vectors are denoted by lower-case boldface letters; matrices are denoted by upper-case boldface letters. Superscripts $(\cdot)^{T}$, $(\cdot)^{*}$, and $(\cdot)^{H}$ stand for the matrix transpose, conjugate, and conjugate-transpose operations, respectively. We use $\mathrm{tr}(\mathbf{A})$ and $\mathbf{A}^{-1}$ to denote the trace and the inverse of matrix $\mathbf{A}$, respectively. $\mathrm{diag}\{\mathbf{b}\}$ denotes a diagonal matrix with the elements of vector $\mathbf{b}$ on its main diagonal. $\mathrm{diag}\{\mathbf{B}\}$ denotes a diagonal matrix containing the diagonal elements of matrix $\mathbf{B}$ on the main diagonal. The identity matrix is denoted by $\mathbf{I}$, and the all-zero matrix and the all-zero vector are denoted by $\mathbf{0}$. The fields of complex and real numbers are denoted by $\mathbb{C}$ and $\mathbb{R}$, respectively. $E[\cdot]$ denotes statistical expectation. $[\mathbf{A}]_{mn}$ denotes the element in the $m$th row and $n$th column of matrix $\mathbf{A}$. $[\mathbf{a}]_{m}$ denotes the $m$th entry of vector $\mathbf{a}$. $\otimes$ denotes the Kronecker product. $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$ denotes a circularly symmetric complex vector $\mathbf{x}$ with zero mean and covariance matrix $\mathbf{R}$. $\mathrm{var}(a)$ denotes the variance of random variable $a$. $[x]^{+}$ stands for $\max\{0, x\}$. $a \gg b$ means that $a$ is much larger than $b$.
II Uplink Transmission
Throughout the paper, we adopt the following transmission protocol. The uplink transmission phase, comprising the uplink training and the uplink data transmission, is followed by a downlink data transmission phase.
We assume the main objective of the eavesdropper is to eavesdrop the downlink data. The eavesdropper chooses to attack the uplink transmission phase to impair the channel estimation phase at the BS. The resulting mismatched channel estimation will increase the information leakage in the subsequent downlink transmission. In the downlink transmission phase, the eavesdropper does not attack but focuses on eavesdropping the data.
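The effect of such an attack on a conventional pilot-only estimate can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not taken from the paper: a single cell, one single-antenna user, one single-antenna eavesdropper that retransmits the user's pilot, i.i.d. Rayleigh fading, and a least-squares estimate followed by matched filter beamforming. The function name `contaminated_ls_estimate` and all parameter names are hypothetical.

```python
import numpy as np


def contaminated_ls_estimate(n_antennas=256, pilot_len=16, p_user=1.0,
                             p_eve=4.0, noise_var=0.1, seed=0):
    """Return the beamforming gains toward the user and the eavesdropper.

    Illustrative assumptions: single cell, one user, one single-antenna
    eavesdropper that sends the user's own pilot with power p_eve.
    """
    rng = np.random.default_rng(seed)

    def cn(*shape):
        # Circularly symmetric complex Gaussian samples with unit variance.
        return (rng.standard_normal(shape)
                + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    h = cn(n_antennas)  # user-to-BS channel
    g = cn(n_antennas)  # eavesdropper-to-BS channel
    pilot = np.exp(2j * np.pi * rng.random(pilot_len))  # unit-modulus pilot
    noise = np.sqrt(noise_var) * cn(n_antennas, pilot_len)

    # Received pilot block: user pilot + identical attacking pilot + noise.
    y_pilot = (np.sqrt(p_user) * np.outer(h, pilot)
               + np.sqrt(p_eve) * np.outer(g, pilot) + noise)

    # Least-squares estimate: correlate with the known pilot sequence.
    h_hat = y_pilot @ pilot.conj() / (pilot_len * np.sqrt(p_user))

    # Matched filter beamformer built from the contaminated estimate.
    w = h_hat / np.linalg.norm(h_hat)
    gain_user = abs(np.vdot(w, h)) ** 2
    gain_eve = abs(np.vdot(w, g)) ** 2
    return gain_user, gain_eve
```

In this toy setting, the estimate is roughly the user's channel plus a scaled copy of the eavesdropper's channel, so a stronger attack steers more of the downlink beam toward the eavesdropper. Calling the function with `p_eve=0.0` gives the uncontaminated baseline for comparison.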
We study a multicell multiuser system with cells. We assume an antenna BS and single-antenna users are present in each cell. The cells are indexed by , where cell is the cell of interest. We assume an antenna active eavesdropper is located in the cell of interest and attempts to eavesdrop the data intended for all users in the cell. (An antenna eavesdropper is equivalent to cooperative single-antenna eavesdroppers.) The eavesdropper sends pilot signals and artificial noise to interfere with channel estimation and uplink data transmission, respectively. (We note that if the eavesdropper only attacks the channel estimation phase and remains silent during the uplink data transmission, then the impact of this attack can be easily eliminated with the joint channel estimation and data detection scheme in [13]. Therefore, a smart eavesdropper will attack the entire uplink transmission.) Let and denote the coherence time of the channel and the length of the pilot signal, respectively. For notational simplicity, we assume the users in each cell use the same transmit power [6]; using techniques similar to those in this paper, the results can be easily extended to the case of different transmit powers of the users in each cell. Then, for uplink transmission, the received pilot signal matrix and the received data signal matrix at the BS in cell are given by
(1) 
(2) 
where , , and denote the average transmit power, the pilot sequence, and the uplink transmission data of the th user in the cell of interest, respectively. It is assumed that the same orthogonal pilot sequences are used in each cell where and . and denote the average transmit power and the uplink transmission data of the th user in the th cell, respectively. denotes the channel between the th user in the th cell and the BS in the th cell, where is the corresponding large-scale path loss. and denote the channel between the eavesdropper and the base station in the th cell and the average transmit power of the eavesdropper, respectively. We assume the columns of are i.i.d. with Gaussian distribution , where is the large-scale path loss for the eavesdropper. For the training phase, the eavesdropper attacks all the users in the cell of interest. Therefore, it uses the attacking pilot sequences [12], where . For the uplink data transmission phase, the eavesdropper generates artificial noise , whose elements are i.i.d. standard Gaussian distributed. and are noise matrices whose columns are i.i.d. Gaussian distributed with .
We define and the eigenvalue decomposition , where the eigenvalues on the main diagonal of matrix are arranged in ascending order. For the following, we make the important assumption that due to the strong active attack and the large-scale path loss difference between the cell of interest and other cells, , , and have the relationship . Let and vector has the same elements as vector but with the elements arranged in ascending order whose index satisfies , . Define . Define and . Then, we have the following theorem.
Theorem 1.
Let and . Then, when and , the minimum mean square error (MMSE) estimate of based on is given by
(3) 
where and .
Proof.
Please refer to Appendix A. ∎
Remark 1: The basic intuition behind Theorem 1 is that when and , each channel tends to be an eigenvector of the received signal matrix. As a result, we project the received signal matrix along the eigenspace which corresponds to the desired users’ channel. In this case, the impact of the strong active attack can be effectively eliminated.
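The intuition in Remark 1 can be checked numerically on a toy model. The sketch below is an illustration under assumptions that are not the paper's full system model: one user and one single-antenna eavesdropper in a single cell, i.i.d. Rayleigh fading, Gaussian data and artificial noise, and an eavesdropper whose power exceeds the user's. The function name `eigenspace_alignment` and its parameters are hypothetical. It forms the sample covariance of the received data block and measures how well the two dominant eigenvectors align with the two channels.

```python
import numpy as np


def eigenspace_alignment(n_antennas=256, n_symbols=2000, p_user=1.0,
                         p_eve=4.0, noise_var=1.0, seed=0):
    """Alignment of the two dominant eigenvectors with the two channels.

    Illustrative assumptions: single cell, one user, one single-antenna
    eavesdropper, and p_eve > p_user so the ordering is unambiguous.
    """
    rng = np.random.default_rng(seed)

    def cn(*shape):
        # Circularly symmetric complex Gaussian samples with unit variance.
        return (rng.standard_normal(shape)
                + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    h = cn(n_antennas)      # user-to-BS channel
    g = cn(n_antennas)      # eavesdropper-to-BS channel
    d_user = cn(n_symbols)  # uplink data of the user
    d_eve = cn(n_symbols)   # artificial noise sent by the eavesdropper
    noise = np.sqrt(noise_var) * cn(n_antennas, n_symbols)

    # Received uplink data block (antennas x symbols).
    y_data = (np.sqrt(p_user) * np.outer(h, d_user)
              + np.sqrt(p_eve) * np.outer(g, d_eve) + noise)

    # Sample covariance; eigh returns eigenvalues in ascending order.
    cov = y_data @ y_data.conj().T / n_symbols
    _, eigvecs = np.linalg.eigh(cov)
    v_eve = eigvecs[:, -1]   # largest eigenvalue: stronger (attacking) signal
    v_user = eigvecs[:, -2]  # second largest: desired user's signal

    # Squared cosine of the angle between each eigenvector and its channel.
    align_user = abs(np.vdot(v_user, h)) ** 2 / np.vdot(h, h).real
    align_eve = abs(np.vdot(v_eve, g)) ** 2 / np.vdot(g, g).real
    return align_user, align_eve
```

With a clear power gap, both alignment values should be close to one in this toy setting, which is what makes projecting onto the user's eigenspace meaningful. Bringing `p_eve` close to `p_user` shrinks the gap between the two dominant eigenvalues, and the alignment can be expected to become less reliable.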
Remark 2: In Theorem 1, we assume that the coherence time of the channel is significantly larger than the symbol duration [14]. This assumption can be justified based on the expression for the coherence time in [14, Eq. (1)]. For typical speeds of mobile users and typical symbol durations, the coherence time can span hundreds of symbol durations or more.
Remark 3: The simulation results in Section IV indicate that a sufficient power gap between and can guarantee a good secrecy performance when the number of transmit antennas and the coherence time of the channel are large but not infinite. We note that allocating more power to the desired users to combat a strong active attack is not needed. On the contrary, a larger gap between and helps the channel estimate approach the result in Theorem 1. This implies that decreasing the power of the desired users can be an effective secure transmission strategy under a strong active attack.
Remark 4: We can use large-dimensional random matrix theory [15] to obtain a more accurate approximation for the eigenvalue distribution of for the case when and are large but not infinite. Then, power design policies for , , and can be obtained. This will be discussed in the extended journal version of this work.
Based on Theorem 1, we can design the precoders for downlink transmission.
III Downlink Transmission
In this section, we consider the downlink transmission. We assume the BSs in all cells perform channel estimation according to Theorem 1 by replacing , , , and by , , , and , respectively. Then, the th BS designs the transmit signal as follows
(4) 
where is the downlink transmission power, , and is the downlink transmitted signal for the th user in the th cell.
For the proposed precoder design, the base station only needs to know the statistical channel state information of the eavesdropper in order to construct . This assumption is justified in [10].
Because each user in the cell of interest has the risk of being eavesdropped, an achievable ergodic secrecy sum-rate can be expressed as [16]
(5) 
where and denote an achievable ergodic rate between the BS and the th user and the ergodic capacity between the BS and the eavesdropper seeking to decode the information of the th user, respectively.
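As a small worked example of how such a secrecy sum-rate is evaluated, the sketch below assumes the common form in which each user's contribution is its achievable rate minus the eavesdropper's capacity for that user, clipped at zero, and then summed over users. This form is an assumption here, since the expression itself is not reproduced in this text. The function name `secrecy_sum_rate` and the example numbers are hypothetical.

```python
import numpy as np


def secrecy_sum_rate(user_rates, eve_capacities):
    """Sum over users of max(rate - eavesdropper capacity, 0).

    Assumed form of a secrecy sum-rate; inputs are per-user values in
    bits/s/Hz and must have the same length.
    """
    user_rates = np.asarray(user_rates, dtype=float)
    eve_capacities = np.asarray(eve_capacities, dtype=float)
    if user_rates.shape != eve_capacities.shape:
        raise ValueError("one eavesdropper capacity is needed per user")
    # A user whose link is weaker than the eavesdropper's contributes zero.
    return float(np.sum(np.maximum(user_rates - eve_capacities, 0.0)))


# Hypothetical numbers: user 1 keeps a 2 bits/s/Hz margin, user 2 has none.
example = secrecy_sum_rate([3.0, 1.0], [1.0, 2.0])
```

The clipping reflects that a user cannot contribute a negative secrecy rate: when the eavesdropper's capacity exceeds the user's rate, that user simply adds nothing to the sum.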
The received signal at the th user in the cell of interest is given by
(6) 
where is the noise in the downlink transmission.
For , we adopt the same pessimistic assumption as in [10], i.e., we assume that the eavesdropper can eliminate all interference from intra- and inter-cell users to obtain an upper bound of as follows
(9) 
where
(10) 
IV Numerical Results
In this section, we present numerical results to examine the proposed design and the obtained analytical results. We set , , , , , , , and . We define the signal-to-noise ratio (SNR) as . Also, we define .
Figure 1 plots the asymptotic and exact secrecy rate performance vs. the SNR for , dB, , and different numbers of users, respectively. The exact secrecy rate is obtained based on Monte Carlo simulation of (III) and (9). We note from Figure 1 that the asymptotic secrecy rate in Theorem 2 provides a good estimate for the exact secrecy rate.
Figure 2 compares the secrecy performance of the proposed design and the MFAN design in [10] for large but finite and as a function of for , dB, SNR = 5 dB, and different values of . We keep constant and increase to increase . We observe from Figure 2 that when the active attack is strong, the MFAN design cannot provide a nonzero secrecy rate. However, our proposed design performs well in the entire considered range of . As increases, the gap between and increases as well. Therefore, the secrecy rate increases with for the proposed design. Moreover, Figure 2 reveals that increasing is beneficial for the secrecy performance of the proposed design.
V Conclusions
In this paper, we have proposed a data-aided secure transmission scheme for multicell multiuser massive MIMO systems under a strong active attack. We exploit the received uplink data signal for joint uplink channel estimation and secure downlink transmission. We show analytically that when the number of transmit antennas and the length of the data vector both approach infinity, the proposed design can effectively eliminate the impact of an active attack by an eavesdropper. Numerical results validate our theoretical analysis and demonstrate the effectiveness of the proposed design under strong active attacks.
Appendix A Proof of Theorem 1
We define ,
, ,
, ,
, .
(16) 
where
(17)  
(18)  
(19)  
(20) 
and has orthogonal columns.
When , we have
(21) 
Appendix B Proof of Theorem 2
First, based on the property of MMSE estimates, we know that .
Next, we evaluate . First, we obtain