Keys through ARQ
Abstract
This paper develops a novel framework for sharing secret keys using the well-known Automatic Repeat reQuest (ARQ) protocol. The proposed key sharing protocol does not assume any prior knowledge about the channel state information (CSI), but harnesses the available opportunistic secrecy gains using only the one-bit feedback, in the form of ACK/NACK. The distribution of key bits among multiple ARQ epochs, in our approach, allows for mitigating the secrecy outage phenomenon observed in earlier works. We characterize the information theoretic limits of the proposed scheme, under different assumptions on the channel spatial and temporal correlation function, and develop low complexity explicit implementations. Our analysis reveals a novel role of “dumb antennas” in overcoming the negative impact of spatial correlation, between the legitimate and eavesdropper channels, on the achievable secrecy rates. We further develop an adaptive rate allocation policy which achieves higher secrecy rates by exploiting the channel temporal correlation. Finally, our theoretical claims are validated by numerical results that establish the achievability of nonzero secrecy rates even when the eavesdropper channel is less noisy, on the average, than the legitimate channel.
I Introduction
The recent flurry of interest in wireless physical layer secrecy is inspired by Wyner’s pioneering work on the wiretap channel. Under the assumption that the eavesdropper channel is a degraded version of the legitimate channel, Wyner showed in [1, 2] that perfectly secure communication is possible by hiding the message in the additional noise level seen by the eavesdropper. The effect of fading on the secrecy capacity was studied later. In particular, by appropriately distributing the message across different fading realizations, it was shown that the multiuser diversity gain can be harnessed to enhance the secrecy capacity, e.g., [3, 8]. More recently, the authors of [4] proposed using the well-known Hybrid ARQ protocol to facilitate the exchange of secure messages over fading channels. This paper extends this line of work by developing a novel ARQ-based approach for secret key sharing between two legitimate users (Alice and Bob), communicating over a wireless channel, in the presence of a passive eavesdropper (Eve). The shared key can then be used to secure any future message transmission.
One innovative aspect of our framework is the distribution of key bits over an asymptotically large number of ARQ epochs. This approach allows for overcoming the secrecy outage phenomenon observed in [4] at the expense of increased delay. In this setup, we characterize the fundamental information theoretic limits on the maximum achievable key rate, subject to a perfect secrecy constraint. Our information theoretic analysis inspires the design of explicit ARQ protocols that attain an excellent throughput-delay-secrecy tradeoff with a realizable coding/decoding complexity. It also reveals the negative impact of spatial correlation on the achievable key rate. This problem is mitigated via the efficient use of dumb antennas, which are shown to effectively decorrelate the legitimate and eavesdropper channels in the asymptotic limit of a large number of transmit antennas. Moreover, we propose a greedy rate adaptation algorithm capable of transforming the temporal correlation in the legitimate channel into additional gains in the secrecy rate. In a nutshell, our results demonstrate the achievability of a nonzero perfectly secure key rate over fading channels by opportunistically exploiting the ARQ feedback (even when the eavesdropper channel is less noisy, on the average, than the main channel).
The rest of this paper is organized as follows. Our system model is detailed in Section II. Section III develops the main results for the spatially independent block fading model. In Section IV, we extend our analysis to spatially and temporally correlated channels, whereas numerical results that validate our theoretical claims are presented in Section V. Finally, Section VI offers some concluding remarks and our proofs are collected in the Appendices to enhance the flow of the paper.
II System Model
Our model, shown in Figure 1, assumes one transmitter (Alice), one legitimate receiver (Bob) and one passive eavesdropper (Eve). We adopt a block fading model in which the channel is assumed to be fixed over one coherence interval and changes from one interval to the next. In order to obtain rigorous information theoretic results, we consider the scenario of asymptotically large coherence intervals and allow for sharing the secret key across an asymptotically large number of those intervals. The finite delay case will be considered as well. In any particular interval, the signals received by Bob and Eve are respectively given by,
(1)  
(2) 
where is the transmitted symbol in the block, is the received symbol by Bob in the block, is the received symbol by Eve in the block, and are the complex block channel gains from Alice to Bob and Eve, respectively. The channel gains can also be written as
(3)  
(4) 
where and , the phase shifts at Bob and Eve respectively, are assumed to be independent in all considered scenarios. Moreover, and are zero-mean, unit-variance white complex Gaussian noise coefficients at Bob and Eve, respectively. We do not assume any prior knowledge about the channel state information at Alice. Bob, however, is assumed to know and Eve is assumed to know both and a priori. We impose the following short-term average power constraint
(5) 
Our model only allows for one bit of ARQ feedback from Bob to Alice. Each ARQ epoch is assumed to be contained in one coherence interval (i.e., fixed channel gains), and different epochs are assumed to correspond to different coherence intervals. The transmitted packets are assumed to carry a perfect error detection mechanism that Bob (and Eve) can use to determine whether the packet has been received correctly or not. Based on the error check, Bob sends back to Alice an ACK/NACK bit, through a public and error-free feedback channel. Eve is assumed to be passive (i.e., cannot transmit); an assumption which can be justified in several practical settings. To minimize Bob’s receiver complexity, we adopt the memoryless decoding assumption, implying that frames received in error are discarded and not used to aid in future decoding attempts.
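The ARQ mechanism described above can be illustrated with a minimal simulation sketch. It is not taken from the paper: it assumes unit-mean Rayleigh fading (exponentially distributed power gains), rates measured in bits per channel use, and an idealized error detector that returns an ACK exactly when the transmission rate is supported by Bob's instantaneous capacity. The names `simulate_arq`, `rate`, and `power` are hypothetical.

```python
import numpy as np

def simulate_arq(num_frames, rate, power, rng):
    """Simulate memoryless ARQ over i.i.d. block fading (illustrative only).

    Returns Bob's ACK flags and the per-frame capacities of both links,
    in bits per channel use.
    """
    # Assumption: unit-mean exponential power gains, i.e. Rayleigh fading.
    h_bob = rng.exponential(1.0, size=num_frames)
    h_eve = rng.exponential(1.0, size=num_frames)
    cap_bob = np.log2(1.0 + power * h_bob)
    cap_eve = np.log2(1.0 + power * h_eve)
    # Idealized error detection: ACK exactly when the rate is supportable.
    acks = cap_bob >= rate
    return acks, cap_bob, cap_eve

rng = np.random.default_rng(0)
acks, cap_bob, cap_eve = simulate_arq(100_000, rate=1.0, power=1.0, rng=rng)
ack_fraction = float(acks.mean())
print(f"fraction of ACKed frames: {ack_fraction:.3f}")
```

Under these assumptions the ACK probability is exp(-(2^rate - 1)/power), so the empirical fraction should be close to exp(-1) for the parameters used here.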
III Secrecy via ARQ
Our main results are first derived for the scenario where and vary independently from one block to another according to a joint distribution . The impact of temporal correlation on the performance of our secret key sharing protocols will be investigated in the next section.
III-A Information Theoretic Foundation
In our setup, Alice wishes to share a secret key with Bob. To transmit this key, Alice and Bob use an code consisting of: 1) a stochastic encoder at Alice that maps the key to a codeword , 2) a decoding function : which is used by Bob to recover the key. The codeword is partitioned into blocks, each of which corresponds to one ARQ epoch and contains symbols where . For now, we focus on the asymptotic scenario where and .
Alice starts with a random selection of the first block of symbols. Upon reception, Bob attempts to decode this block. If successful, Bob sends an ACK bit to Alice, who moves ahead, makes a random choice of the second block, and sends it to Bob. Here, Alice must make sure that the concatenation of the two blocks belongs to a valid codeword. As shown in the sequel, this constraint is easily satisfied. If an error is detected, then Bob sends a NACK bit to Alice. To simplify the analysis, we assume that the error detection mechanism is perfect, which is justified in the asymptotic scenario . In this case, Alice replaces the first block of symbols with another randomly chosen block and transmits it. The process then repeats until Alice and Bob agree on a sequence of blocks, each of length symbols, corresponding to the key.
The code construction must allow for reliable decoding at Bob while hiding the key from Eve. It is clear that the proposed protocol exploits the error detection mechanism to make sure that both Alice and Bob agree on the key (i.e., ensures reliable decoding). What remains is the secrecy requirement which is measured by the equivocation rate defined as the entropy rate of the transmitted key conditioned on the intercepted ACKs or NACKs and the channel outputs at Eve, i.e.,
(6) 
where is the number of symbols transmitted to exchange the key (including the symbols in the discarded blocks due to decoding errors), , denotes sequence of ACK/NACK bits, and are the sequences of channel coefficients seen by Bob and Eve in the blocks, and denotes Eve’s channel outputs in the symbol intervals. We limit our attention to the perfect secrecy scenario, which requires the equivocation rate to be arbitrarily close to the key rate. The secrecy rate is said to be achievable if for any , there exists a sequence of codes such that for any , we have
(7) 
and the key rate for a given input distribution is defined as the maximum achievable perfect secrecy rate with this distribution. The following result characterizes this rate, assuming a Gaussian input distribution
Theorem 1
The key rate for the memoryless ARQ protocol with Gaussian inputs is given by:
(8) 
where and if is true and otherwise. For the special case of spatially independent fading, the above expression simplifies to
(9) 
A few remarks are now in order

It is clear from (8) that a positive secret key rate is achievable under very mild conditions on the channels experienced by Bob and Eve. More precisely, unlike the approach proposed in [4], Theorem 1 establishes the achievability of a positive perfect secrecy rate by appropriately exploiting the ARQ feedback even when Eve’s average SNR is higher than that of Bob.

Theorem 1 characterizes the fundamental limit on secret key sharing and not message transmission. The difference between the two scenarios stems from the fact that the message is known to Alice before starting the transmission of the first block, whereas Alice and Bob can defer the agreement on the key till the last successfully decoded block. This observation was exploited by our approach in making Eve’s observations of the frames discarded by Bob, due to failure in decoding, useless.

It is intuitively pleasing that the secret key rate in (9) is the product of the probability of success at Bob and the expected value of the additional mutual information gleaned by Bob, as compared to Eve, in those successfully decoded frames.

We stress the fact that our approach does not require any prior knowledge about the channel state information. The only assumption is that the public feedback channel is errorfree, authenticated, and only accessible by Bob.

The achievability of (8) hinges on a random binning argument which only establishes the existence of a coding scheme that achieves the desired rate. Our result, however, stops short of explicitly finding such an optimal coding scheme and characterizing its encoding/decoding complexity. This observation motivates the development of the explicit secrecy coding schemes in Section III-B.
III-B Explicit Secrecy Coding Schemes
This section develops explicit secrecy coding schemes that allow for sharing keys using the underlying memoryless ARQ protocol with realizable encoding/decoding complexity and delay. We proceed in three steps. The first step replaces the random binning construction, used in the achievability proof of Theorem 1, with an explicit coset coding scheme for the erasure-wiretap channel. This erasure-wiretap channel is created by the ACK/NACK feedback and accounts for the computational complexity available to Eve. In the second step, we limit the decoding delay by distributing the key bits over only a finite number of ARQ frames. Finally, we replace the capacity achieving Gaussian channel code with practical coding schemes in the third step. Overall, our three-step approach allows for a nice performance-vs-complexity tradeoff.
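The structure of the key rate discussed in these remarks can be explored numerically. The following is a hedged Monte Carlo sketch rather than the paper's own evaluation: it assumes Rayleigh fading with exponentially distributed power gains, base-2 logarithms, spatially independent channels, and a grid search over the transmission rate at a fixed power. The function name `key_rate_estimate` and all parameter values are hypothetical.

```python
import numpy as np

def key_rate_estimate(rate, power, h_bob, h_eve):
    """Estimate E[ 1{Bob decodes} * (rate - Eve's capacity)^+ ] by averaging."""
    cap_bob = np.log2(1.0 + power * h_bob)
    cap_eve = np.log2(1.0 + power * h_eve)
    acked = cap_bob >= rate                       # frames Bob decodes
    advantage = np.maximum(rate - cap_eve, 0.0)   # information Eve misses
    return float(np.mean(acked * advantage))

rng = np.random.default_rng(1)
n = 200_000
h_bob = rng.exponential(1.0, n)          # Bob: unit average power gain
h_eve_equal = rng.exponential(1.0, n)    # Eve: same average SNR as Bob
h_eve_strong = rng.exponential(2.0, n)   # Eve: twice Bob's average SNR

rates = np.linspace(0.05, 4.0, 80)
symmetric = [key_rate_estimate(r, 1.0, h_bob, h_eve_equal) for r in rates]
eve_stronger = [key_rate_estimate(r, 1.0, h_bob, h_eve_strong) for r in rates]
best_symmetric = max(symmetric)
best_eve_stronger = max(eve_stronger)
print(best_symmetric, best_eve_stronger)
```

In this sketch the estimate stays strictly positive even when Eve's average SNR exceeds Bob's, because some frames are decoded by Bob while Eve's instantaneous capacity falls below the transmission rate.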
The perfect secrecy requirement used in the information theoretic analysis does not impose any limits on Eve’s decoding complexity. The idea now is to exploit the finite complexity available at Eve in simplifying the secrecy coding scheme. To illustrate the idea, let’s first assume that Eve can only afford maximum likelihood (ML) decoding. Hence, successful decoding at Eve is only possible when
(10) 
for a given transmit power level . Now, using the idealized error detection mechanism, Eve will be able to identify and erase the frames decoded in error resulting in an erasure wiretap channel model. In practice, Eve may be able to go beyond the performance of the ML decoder. For example, Eve can generate a list of candidate codewords and then use the error detection mechanism, or other means, to identify the correct one. In our setup, we quantify the computational complexity of Eve by the amount of side information bits per channel use offered to it by a Genie. With this side information, the erasure probability at Eve is given by
(11) 
since now the channel has to supply only enough mutual information to close the gap between the transmission rate and the side information . The ML performance can be obtained as a special case of (11) by setting .
It is now clear that using this idea we have transformed our ARQ channel into an erasure-wiretap channel, as in Figure 2. In this equivalent model, we have a noiseless link between Alice and Bob, ensured by the idealized error detection algorithm, and an erasure channel between Alice and Eve. The following result characterizes the achievable performance over this channel.
Lemma 2
The secrecy capacity for the equivalent erasure-wiretap channel is
(12)  
In the case of spatially independent channels, the above expression reduces to
(13) 
The proof follows from the classical result on the erasure-wiretap channel [2]. It is intuitively appealing that the expression in (13) is simply the product of the transmission rate per channel use, the probability of successful decoding at Bob, and the probability of erasure at Eve. The main advantage of this equivalent model is that it lends itself to the explicit coset LDPC coding scheme constructed in [5, 6, 7]. In summary, our first low complexity construction is a concatenated coding scheme where the outer code is a coset LDPC for secrecy and the inner one is a capacity achieving Gaussian code. The underlying memoryless ARQ is used to create the erasure-wiretap channel matched to this concatenated coding scheme.
The second step is to limit the decoding delay resulting from the distribution of key bits over an asymptotically large number of ARQ blocks in the previous approach. To avoid this problem, we limit the number of ARQ frames used by the key to a finite number . The implication of this choice is a nonvanishing secrecy outage probability. For example, if we encode the message as the syndrome of the rate parity check code, then Eve will be completely blind about the key if at least one of the ARQ frames is erased [5, 6, 7]. (Here the distilled key is the modulo-2 sum of the key parts received correctly.) The secrecy outage probability, assuming spatially independent channels, is therefore
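The product form described above can be written out as a small numerical sketch. It is an illustration under stated assumptions, not the paper's expression verbatim: unit-mean Rayleigh fading on both links, base-2 logarithms, a fixed transmission rate and power (no maximization), and a genie that gives Eve `side_info` bits per channel use. The function names are hypothetical.

```python
import math

def bob_success_prob(rate, power):
    """Pr{rate <= log2(1 + h_b * power)} for a unit-mean exponential h_b."""
    return math.exp(-(2.0 ** rate - 1.0) / power)

def eve_erasure_prob(rate, side_info, power):
    """Pr{log2(1 + h_e * power) < rate - side_info} for unit-mean exponential h_e."""
    gap = max(rate - side_info, 0.0)   # information the channel must still supply
    return 1.0 - math.exp(-(2.0 ** gap - 1.0) / power)

def erasure_wiretap_rate(rate, side_info, power):
    """Rate times Bob's success probability times Eve's erasure probability."""
    return rate * bob_success_prob(rate, power) * eve_erasure_prob(rate, side_info, power)

ml_eve = erasure_wiretap_rate(rate=1.0, side_info=0.0, power=1.0)
genie_eve = erasure_wiretap_rate(rate=1.0, side_info=0.5, power=1.0)
print(ml_eve, genie_eve)
```

Setting `side_info` to zero corresponds to an eavesdropper limited to ML decoding; increasing it lowers Eve's erasure probability and therefore the achievable rate in this sketch.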
(14) 
where ,…, are i.i.d. random variables drawn according to the marginal distribution of Eve’s channel. Assuming a Rayleigh fading distribution, we get
(15) 
Under the same assumption, it is straightforward to see that the average number of Bernoulli trials required to transfer ARQ frames successfully to Bob is given by
(16) 
resulting in a key rate
(17) 
Therefore, for a given and , one can obtain a tradeoff between and by varying . Our third, and final, step is to relax the assumption of a capacity achieving inner code. Section V reports numerical results with practical coding schemes, including uncoded transmission, with a finite frame length . Overall, these results demonstrate the ability of the proposed protocols to achieve near-optimal key rates, under very mild assumptions, with realizable encoding/decoding complexity and bounded delay.
IV Correlated Fading
IV-A Dumb Antennas for Secrecy
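The finite-delay construction above can be sketched in a few lines. The code below is illustrative and rests on assumptions that are not spelled out in the extracted text: unit-mean Rayleigh fading, base-2 logarithms, a key formed as the bitwise XOR (modulo-2 sum) of the successfully received frames, and a key rate defined as the transmission rate divided by the expected number of transmissions. All function names are hypothetical.

```python
import math

def distill_key(acked_frames):
    """XOR the payloads of the ACKed frames; missing any one frame hides the key."""
    key = bytes(len(acked_frames[0]))
    for frame in acked_frames:
        key = bytes(a ^ b for a, b in zip(key, frame))
    return key

def secrecy_outage_prob(k, rate, side_info, power):
    """Probability that Eve decodes all k frames (none erased), Rayleigh fading."""
    gap = max(rate - side_info, 0.0)
    return math.exp(-k * (2.0 ** gap - 1.0) / power)

def expected_transmissions(k, rate, power):
    """Mean number of trials needed for k successes at Bob (negative binomial mean)."""
    return k * math.exp((2.0 ** rate - 1.0) / power)

def finite_delay_key_rate(k, rate, power):
    """Key bits per channel use when one frame's worth of key costs k ACKed frames."""
    return rate / expected_transmissions(k, rate, power)

frames = [b"\x01\x02", b"\x03\x04", b"\x10\x20"]
key = distill_key(frames)
tradeoff = [(k, secrecy_outage_prob(k, 1.0, 0.0, 1.0), finite_delay_key_rate(k, 1.0, 1.0))
            for k in (1, 2, 4, 8)]
for k, outage, key_rate in tradeoff:
    print(k, outage, key_rate)
```

Increasing the number of frames drives the outage probability down exponentially while the key rate falls only inversely with the number of frames, which is the tradeoff described in the text.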
One of the important insights revealed by Theorem 1 is the negative relation between the achievable key rate and the spatial correlation between the main and eavesdropper channels. In fact, one can easily verify that the key rate collapses to zero in the fully correlated case (i.e., with probability one) independent of the marginal distribution of . In this section, we propose a solution to this problem based on a novel utilization of “dumb antennas.” The concept of dumb antennas was introduced in [9] as a means to create artificial channel fluctuations in slow fading environments. These fluctuations are used to harness opportunistic performance gains in multiuser cellular networks. As indicated by the name, one of the attractive features of this approach is that the receiver(s) can be oblivious to the presence of multiple transmit antennas [9]. We use dumb transmit antennas to decorrelate the main and eavesdropper channels as follows. Alice is equipped with transmit antennas, whereas both Bob and Eve will still have only one receive antenna. In order to simplify the presentation, we focus on the case of the symmetric fully correlated line of sight channels, whereby the magnitudes of the channel gains are all equal to one. The rest of our modeling assumptions remain as detailed in Section II. The same data stream is transmitted from all transmit antennas after applying an i.i.d. uniform phase to each of the signals. Also, Bob is assumed to perturb its location in each ARQ frame, resulting in a random and independent phase shift (from that experienced by Eve). Our multiple transmit antenna scenario, therefore, reduces to a single antenna fading wiretap channel with the following equivalent channel gains
(18)  
(19) 
where , , and are i.i.d. and uniform over that remain fixed within one ARQ frame and change randomly from one frame to the next. One can now easily see that as increases, the marginal distribution of each equivalent channel gain approaches a zero-mean complex Gaussian with unit variance (by the Central Limit Theorem (CLT) [11]). It is worth noting that the correlation coefficient between the two channels’ equivalent power gains depends on the instantaneous channels’ phases ’s and ’s for . It can be easily shown that, in the limit of , this correlation coefficient between the two channels’ power gains converges in a mean-square sense to zero (please refer to Appendix B for the proof). Therefore, in the asymptotic limit of a large , our dumb antennas approach has successfully transformed our fully correlated line of sight channel into a symmetric and spatially independent Rayleigh wiretap channel, whose secrecy capacity (assuming Gaussian inputs) is reported in Theorem 1. The numerical results reported in the sequel demonstrate that this result is not limited to line of sight channels, and that this asymptotic behavior can be observed for a relatively small number of transmit antennas.
IV-B Temporal Correlation
Thus far, we have assumed that the channel gains affecting different frames are independent. This assumption renders optimal the stationary rate allocation strategy of Theorem 1. In this section, we relax this assumption by introducing temporal correlation between the channel gains experienced by successive frames. Under high temporal correlation, if a stationary rate strategy is employed and the rate is less than Eve’s channel capacity, all the transmitted information will be leaked to Eve. On the other hand, if the rate is much less than Bob’s channel capacity, additional gains in the secrecy capacity will not be harnessed. Hence, we employ a rate adaptation strategy in which the optimal rate used in each frame is determined based on the past history of ACK/NACK feedback and the rates used in previous blocks. More specifically, following in the footsteps of [10], the optimal rate allocation policy can be formulated as follows (assuming a short-term average power constraint and a Gaussian input distribution).
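The decorrelation effect can be checked empirically with a short simulation sketch. It is an illustration under assumptions, not the paper's experiment: unit-magnitude line of sight paths, path phases to Bob and Eve drawn once and held fixed (mirroring the conditioning on the channel phases), fresh uniform transmit phases in every frame, and a 1/sqrt(N) normalization so that the average power gain is one. The function `power_gain_stats` is hypothetical.

```python
import numpy as np

def power_gain_stats(num_antennas, num_frames, rng):
    """Empirical correlation of Bob's and Eve's power gains, and Bob's mean gain."""
    two_pi = 2.0 * np.pi
    # Assumption: fixed path phases per antenna toward Bob and toward Eve.
    phi_bob = rng.uniform(0.0, two_pi, num_antennas)
    phi_eve = rng.uniform(0.0, two_pi, num_antennas)
    # Random phase applied at each transmit antenna, redrawn in every frame.
    theta = rng.uniform(0.0, two_pi, (num_frames, num_antennas))
    g_bob = np.exp(1j * (theta + phi_bob)).sum(axis=1) / np.sqrt(num_antennas)
    g_eve = np.exp(1j * (theta + phi_eve)).sum(axis=1) / np.sqrt(num_antennas)
    h_bob = np.abs(g_bob) ** 2
    h_eve = np.abs(g_eve) ** 2
    corr = float(np.corrcoef(h_bob, h_eve)[0, 1])
    return corr, float(h_bob.mean())

rng = np.random.default_rng(2)
results = {n: power_gain_stats(n, 20_000, rng) for n in (2, 8, 64)}
for n, (corr, mean_gain) in results.items():
    print(n, round(corr, 3), round(mean_gain, 3))
```

With two antennas the correlation depends strongly on the particular path phases drawn, whereas for larger arrays it concentrates near zero and the mean power gain stays close to one.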
(20) 
where
where is the vector of previous transmission rates and is the vector of previously received ACKs and NACKs. The basic idea is that, after frame , the a posteriori distribution of is updated using and . The expected secrecy rate, in future transmissions, is then maximized based on this updated distribution. It is worth noting that the above expression assumes no spatial correlation between and . This assumption represents the worst case scenario since it prevents Alice from learning the channel gains impairing Eve through the ARQ feedback. Since the channel gain is not observed directly, but through an indicator in the form of ARQ feedback, the optimal rate assignment, when the channel is Markovian, is a Partially Observable Markov Decision Process (POMDP). The solution of this POMDP is computationally intractable except for trivial cases. This motivates the following greedy rate allocation policy
(21) 
Interestingly, the numerical results reported in the following section demonstrate the ability of this simple strategy to harness significant performance gains in first-order Markov channels. Note that the performance of any rate allocation policy can be upper-bounded by the ergodic capacity with transmitter CSI (and short-term average power constraint ), i.e.,
(22) 
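The greedy policy of this section can be sketched with a particle approximation of Alice's belief about Bob's channel. This is one possible realization under loudly stated assumptions, not the authors' algorithm: a unit-variance first-order Markov model for the complex gains with correlation parameter `alpha`, unit-mean Rayleigh statistics for Eve, base-2 logarithms, a finite grid of candidate rates, and a rejection-style belief update from the ACK/NACK bit. Every function name below is hypothetical.

```python
import numpy as np

def complex_gaussian(size, rng):
    """Zero-mean, unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2.0)

def evolve(gain, alpha, rng):
    """One first-order Markov step that keeps unit variance (assumed model)."""
    innovation = complex_gaussian(np.shape(gain), rng)
    return (1.0 - alpha) * gain + np.sqrt(2.0 * alpha - alpha ** 2) * innovation

def greedy_rate(particles, cap_eve_samples, power, rates):
    """Pick the rate maximizing the expected secrecy rate of the next frame only."""
    cap_bob = np.log2(1.0 + power * np.abs(particles) ** 2)
    scores = [np.mean(cap_bob >= r) * np.mean(np.maximum(r - cap_eve_samples, 0.0))
              for r in rates]
    return float(rates[int(np.argmax(scores))])

def update_belief(particles, rate, ack, power, alpha, rng):
    """Keep particles consistent with the feedback bit, resample, then propagate."""
    cap = np.log2(1.0 + power * np.abs(particles) ** 2)
    consistent = particles[(cap >= rate) == ack]
    if consistent.size == 0:
        # No particle explains the feedback: restart from the prior.
        consistent = complex_gaussian(particles.size, rng)
    resampled = rng.choice(consistent, size=particles.size)
    return evolve(resampled, alpha, rng)

def simulate(num_frames, alpha, power, rng, num_particles=2000):
    """Average secrecy rate collected by the greedy policy over num_frames frames."""
    rates = np.linspace(0.1, 4.0, 40)
    cap_eve_samples = np.log2(1.0 + power * rng.exponential(1.0, 5000))
    particles = complex_gaussian(num_particles, rng)
    g_bob = complex_gaussian(1, rng)
    g_eve = complex_gaussian(1, rng)
    total = 0.0
    for _ in range(num_frames):
        rate = greedy_rate(particles, cap_eve_samples, power, rates)
        ack = bool(np.log2(1.0 + power * np.abs(g_bob[0]) ** 2) >= rate)
        if ack:
            cap_eve = np.log2(1.0 + power * np.abs(g_eve[0]) ** 2)
            total += max(rate - cap_eve, 0.0)
        particles = update_belief(particles, rate, ack, power, alpha, rng)
        g_bob = evolve(g_bob, alpha, rng)
        g_eve = evolve(g_eve, alpha, rng)
    return float(total / num_frames)

average_rate = simulate(300, alpha=0.1, power=1.0, rng=np.random.default_rng(3))
print(average_rate)
```

The policy is greedy in the sense that it ignores the value of the information a rate choice reveals about future channel states; it only maximizes the one-step expected secrecy rate under the current belief.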
V Numerical Results
Throughout this section, we focus on the symmetric scenario, where the average SNRs experienced by both Bob and Eve are the same, i.e., = 1. We further assume Rayleigh fading channels, for both Bob and Eve. Assuming spatially and temporally independent channels, the achievable secrecy rate in (9) becomes
(23) 
where .
Figure 3 gives the variation of and with SNR under different constraints on the decoding capabilities of Eve, captured by the genie-given side information, . It is clear from the figure that can be greater than for certain and SNR values. For instance, in the case of , a packet received in error at Eve will be discarded without any further attempts at decoding. Therefore, the instantaneous secrecy rate becomes , which is larger than that used in (9) where are the instantaneous secrecy rate, and Eve’s channel power gain, respectively. Averaging over all fading realizations, we get a greater than . It is worth noting that, under the assumptions of the symmetric scenario and the Rayleigh fading model, the scheme proposed in [4] is not able to achieve any positive secrecy rate (i.e., probability of secrecy outage is one).
Next, we turn our attention to the delay-limited coding constructions proposed in Section III-B. Figures 4 and 5 show, for different and , the tradeoff between the secrecy outage probability and key rate for the proposed rate coset secrecy coding scheme assuming an optimal inner Gaussian channel coding. Figure 4 gives the key rate corresponding to a desired secrecy outage probability, given some values for and . Figure 5, on the other hand, quantifies the reduction in key rate, corresponding to a certain outage probability, as increases. In Figure 6, we relax the optimal channel coding assumption and plot key rates for practical coding schemes and finite frame lengths (i.e., finite ). The code used in the simulation is a punctured convolutional code derived from a basic code with a constraint length of and generator polynomials and (in octal). We assume that Eve is genie-aided and can correct an additional erroneous symbols (beyond the error correction capability of the channel code). From the figure, we see that the key rate increases with increasing SNR and then drops after reaching a peak value. Note that the transmission rate is fixed and independent of the SNR. Therefore, a low SNR means more transmissions to Bob and a consequent low key rate. As the SNR increases, while keeping the transmission rate fixed, the key rate increases. However, increasing the SNR also means an increased ability of Eve to correctly decode the codeword-carrying packets. This explains why the key rate curves peak and then decay with SNR. In practice, one can always operate at the optimal value of the SNR by adjusting the transmit power level. We also observe that for a certain modulation and channel coding scheme, decreasing the packet size in bits lowers the key rate. Reducing the packet size increases the probability of correct decoding by Bob and, thus, decreases the number of transmissions. However, it also increases the probability of correct decoding by Eve and the overall effect is a decreased key rate.
The role of dumb antennas in increasing the secrecy capacity of spatially correlated ARQ channels is investigated in the next set of figures. In our simulations, we assume that the channel gains are fully correlated, but the channel phases are independent. The independence assumption for the phases is justified as a small change in distance between Bob and Eve in the order of several electromagnetic wavelengths translates to a significant change in phase. Under these assumptions, it is easy to see that with one transmit antenna the secrecy capacity is zero. In Figure 7, it is shown that as the number of antennas increases, the secret key rate approaches the upper bound given by (9) which assumes that the main and eavesdropper channels are independent. The same trend is observed in Figures 8, 9, and 10 which generate the channel gains using a chi-square distribution with different degrees of freedom. Overall, this set of results validates the theoretical claim of Appendix B, indicating that dumb antennas can be used to decorrelate the main and eavesdropper channels, even for a relatively small number of transmit antennas.
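The rise-and-fall behavior of the key rate with SNR at a fixed transmission rate can be reproduced qualitatively with an idealized sketch. It does not model the punctured convolutional code of Figure 6; instead it assumes a capacity achieving inner code, unit-mean Rayleigh fading, base-2 logarithms, and a genie-aided Eve with `side_info` bits per channel use. The parameter values are hypothetical.

```python
import numpy as np

rate = 2.0        # fixed transmission rate, bits per channel use (assumed)
side_info = 0.5   # genie-given side information at Eve (assumed)

snr_db = np.linspace(-10.0, 30.0, 81)
power = 10.0 ** (snr_db / 10.0)

# Low SNR: Bob rarely decodes, so many retransmissions are needed.
p_bob_success = np.exp(-(2.0 ** rate - 1.0) / power)
# High SNR: Eve rarely suffers an erasure, so little secrecy is left.
p_eve_erasure = 1.0 - np.exp(-(2.0 ** (rate - side_info) - 1.0) / power)

key_rate = rate * p_bob_success * p_eve_erasure
peak = int(np.argmax(key_rate))
print(f"peak key rate {key_rate[peak]:.3f} at {snr_db[peak]:.1f} dB")
```

The peak lies strictly inside the SNR range, which is consistent with the observation that the transmit power can be tuned to operate at the best SNR for a given rate.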
Figure 11 reports the performance of the greedy rate adaptation algorithm for temporally correlated channels. The channel is assumed to follow the first order Markov model:
(24) 
where is the innovation process following distribution. As expected, it is shown that as decreases, the key rate increases. For the extreme points when or , we get an upper bound, which is the ergodic secrecy under the main-channel transmit CSI assumption, and a lower bound, which is the ARQ secrecy capacity in the case of an independent block fading channel, respectively.
VI Conclusions
This paper develops a novel overlay approach for sharing secret keys using existing ARQ protocols. The underlying idea is to distribute the key bits over multiple ARQ frames and then use the authenticated ACK/NACK feedback to create an equivalent degraded channel at the eavesdropper. Our results establish the achievability of nonzero secrecy rates even when the eavesdropper is experiencing a higher average SNR than the legitimate receiver and shed light on the structure of optimal ARQ secrecy protocols. It is worth noting that our approach does not assume any prior knowledge about the instantaneous CSI; only prior knowledge of the average SNRs seen by the eavesdropper and the legitimate receiver is needed. Inspired by our information theoretic analysis, we have constructed low complexity secrecy coding schemes by transforming our channel to an erasure-wiretap channel which lends itself to explicit coset coding approaches. Our secrecy capacity characterization reveals the negative impact of spatial correlation and the positive impact of temporal correlation on the achievable key rates. The former phenomenon is mitigated via a novel “dumb antennas” technique, whereas the latter is exploited via a greedy rate adaptation policy. Finally, our theoretical claims have been validated via numerical examples that demonstrate the efficiency of the proposed schemes. The most interesting part of our work is, perhaps, the demonstration of the possibility of sharing secret keys in wireless networks via rather simple modifications of the existing infrastructure which, in our case, corresponds to the ARQ mechanism. This observation motivated our follow-up work on developing secrecy protocols for WiFi networks [13].
Appendix A Proof of Theorem 1
In this appendix, we are going to prove both the achievability and converse of (8).
A-A Achievability Proof
The proof is given for a fixed average power and transmission rate . The key rate is then obtained by the appropriate maximization. Let for some small and . We first generate all binary sequences of length and then independently assign each of them randomly to one of groups, according to a uniform distribution. This ensures that any of the sequences is equally likely to be within any of the groups. Each secret message is then assigned a group . We then generate a Gaussian codebook consisting of codewords, each of length symbols. The codebooks are then revealed to Alice, Bob, and Eve. To transmit the codeword, Alice first selects a random group of bits, and then transmits the corresponding codeword, drawn from the chosen Gaussian codebook. If Alice receives an ACK bit from Bob, both store this group of bits, and Alice selects another group of bits to send in the next coherence interval in the same manner. If a NACK is received, this group of bits is discarded and another is generated in the same manner. This process is repeated until both Alice and Bob have shared the same key corresponding to bits. We observe that the channel coding theorem implies the existence of a Gaussian codebook where the fraction of successfully decoded frames is given by
(25) 
as . The equivocation rate at the eavesdropper can then be lower bounded as follows.
(26)  
In the above derivation, (a) results from the independent choice of the codeword symbols transmitted in each ARQ frame which does not allow Eve to benefit from the observations corresponding to the NACKed frames, (b) follows from the memoryless property of the channel and the independence of the ’s, (c) is obtained by removing all those terms which correspond to the coherence intervals , where , where is a binary random variable and indicates that an ACK was received, and (d) follows from the ergodicity of the channel as . Now we show that the term vanishes as by using a list decoding argument. In this list decoding, at coherence interval , the wiretapper first constructs a list such that if are jointly typical. Let . Given , the wiretapper declares that was transmitted, if is the only codeword such that , where is the set of codewords corresponding to the message . If the wiretapper finds none or more than one such sequence, then it declares an error. Hence, there are two types of error events: 1) : the transmitted codeword is not in , 2) : such that . Thus the error probability . Based on the Asymptotic Equipartition Property (AEP) [12], we know that . In order to bound , we first bound the size of . We let
(27) 
Now
(28) 
Hence
(29)  
(30)  
where (a) follows from the uniform distribution of the codewords in . Now as and , we get
where . Thus, by choosing , the error probability as . Now using Fano’s inequality, we get
Combining this with (26), we get the desired result.
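The random binning step used in the achievability argument can be illustrated at toy scale. The sketch below is only meant to show the mechanics: every binary sequence of a given length is assigned independently and uniformly to one of a fixed number of groups, and the key is the index of the group containing the sequence that Alice and Bob end up agreeing on. The sequence length, the number of groups, and the function name `assign_bins` are hypothetical choices, far smaller than the asymptotic regime of the proof.

```python
import itertools
import random
from collections import Counter

def assign_bins(seq_len, num_bins, rng):
    """Map every binary sequence of length seq_len to a uniformly random bin."""
    return {bits: rng.randrange(num_bins)
            for bits in itertools.product((0, 1), repeat=seq_len)}

rng = random.Random(0)
seq_len, num_bins = 8, 4
bin_of = assign_bins(seq_len, num_bins, rng)   # revealed to Alice, Bob, and Eve

# Sequence agreed on by Alice and Bob through the ACKed frames (drawn at random here).
agreed = tuple(rng.randrange(2) for _ in range(seq_len))
key = bin_of[agreed]                           # the shared key is the bin index

bin_sizes = Counter(bin_of.values())
print(key, dict(bin_sizes))
```

Because each bin holds many sequences, an eavesdropper who can only narrow the transmitted sequence down to a list remains uncertain about the bin index as long as the list spreads across bins, which is the intuition behind the list decoding argument above.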
A-B Converse Proof
We now prove the converse part by showing that for any perfect secrecy rate with equivocation rate as , there exists a transmission rate , such that
Consider any sequence of codes with perfect secrecy rate and equivocation rate , such that as . We note that the equivocation only depends on the marginal distribution of , and thus does not depend on whether is a physically or stochastically degraded version of or vice versa. Hence we assume in the following derivation that for any fading state, either is a physically degraded version of or vice versa (since the noise processes are Gaussian). Thus we have
(31) 
where
In the above derivation, (a) results from the independent choice of the codeword symbols transmitted in each ARQ frame which does not allow Eve to benefit from the observations corresponding to the NACKed frames, (b) follows from Fano’s inequality, (c) follows from the data processing inequality since forms a Markov chain, (d) follows from the fact that conditioning reduces entropy and from the memoryless property of the channel, (e) follows from the fact that as shown in [1], (f) follows from ergodicity of the channel as . The claim is thus proved.
Appendix B Proof of Decorrelation
In this appendix, we show that employing multiple transmit antennas makes the correlation between Eve’s and Bob’s channel power gains converge to zero, in a mean-square sense, as the number of antennas goes to infinity. Let
Assuming all ’s to be uniformly distributed in the interval , we get,
(32)  
Similarly for ,
(33) 
Now, taking the expectation of (32) and (33) with respect to the random phases applied on the transmit antenna array for given values of ’s and ’s, we get,
(34)  
(35)  
(36) 
So, the variance of and is given by,
(37) 
Therefore, the correlation coefficient between the channels’ power gains is given by
(38)  
where
(39) 
Assuming are all independent, and uniformly distributed in the interval , and taking the expectation of over them, we get,
(40) 
The divergence of around its mean is given by,