Cognitive MAC Protocols for General Primary Network Models
Abstract
We consider the design of cognitive Medium Access Control (MAC) protocols enabling a secondary (unlicensed) transmitter-receiver pair to communicate over the idle periods of a set of primary (licensed) channels. More specifically, we propose cognitive MAC protocols optimized for both slotted and unslotted primary networks. For the slotted structure, the objective is to maximize the secondary throughput while maintaining synchronization between the secondary pair and not causing interference to the primary network. Our investigations differentiate between two sensing scenarios. In the first, the secondary transmitter is capable of sensing all the primary channels, whereas it senses only a subset of the primary channels in the second scenario. In both cases, we propose blind MAC protocols that efficiently learn the statistics of the primary traffic online and asymptotically achieve the throughput obtained when prior knowledge of primary traffic statistics is available. For the unslotted structure, the objective is to maximize the secondary throughput while satisfying an interference constraint on the primary network. Sensing-dependent periods are optimized for each primary channel, yielding a MAC protocol which outperforms previously proposed techniques that rely on a single sensing period optimization.
1 Introduction
The radio spectrum resource is of fundamental importance to wireless communication. Recent reports show that most available spectrum has been allocated. However, most licensed spectrum resources are underutilized. This observation has encouraged the emergence of dynamic and opportunistic spectrum access concepts, where secondary (unlicensed) users (SU) equipped with cognitive radios are allowed to opportunistically access the spectrum as long as they do not interfere with primary (licensed) users (PU). To achieve this goal, the secondary users must monitor the primary traffic in order to identify spectrum holes or opportunities which can be exploited to transfer data [1].
The main goal of a cognitive MAC protocol is to sense the radio spectrum, detect the occupancy state of different primary channels, and then opportunistically communicate over unused channels (spectrum holes). Specifically, the cognitive MAC protocol should continuously make efficient decisions on which channels to sense and access in order to obtain the most benefit from the available spectrum opportunities. Previous work on the design of cognitive MAC protocols has considered two distinct scenarios. In the first, the primary network is slotted (e.g., [2], [4], [5], [7], [8] and [13]) whereas a continuous structure (unslotted) of the primary channels is adopted in the second set of works (e.g., [3], [10], [11] and [14]). In this work we propose decentralized cognitive MAC protocols for each of the two models.
For the slotted structure, two cases are considered. The first assumes that the secondary transmitter can sense all the available primary channels before making the decision on which one to access. The secondary receiver, however, does not participate in the sensing process and waits to decode on only one channel. This is the model adopted in [4] and is intended to limit the decoding complexity needed by the secondary receiver. In the sequel, we propose an efficient algorithm that optimizes the online learning capabilities of the secondary transmitter and ensures perfect synchronization between the secondary pair. The proposed protocol does not assume a separate control channel, and hence, piggybacks the synchronization information on the same data packet.
In the second scenario, the secondary transmitter can only sense a subset of the available primary channels at the beginning of each time slot. This model was studied in [5] where the optimal algorithm was obtained by formulating the problem as a Partially Observable Markov Decision Process (POMDP). Unfortunately, finding a computationally efficient version of the optimal solution for this problem remains an elusive task. By recasting the problem as a restless multi-armed bandit problem, a near-optimal index policy was proposed in [7]. The authors of [5] and [7], however, assumed that the primary traffic statistics (i.e., Markov chain transition probabilities) were available a priori to the secondary users. Here, we develop blind MAC protocols where the protocol must learn the transition probabilities online. This can be viewed as the Whittle index strategy of [7] augmented with a learning phase similar to the one proposed in [8] for the multi-armed bandit scenario. Our numerical results show that the performance of this protocol converges to that of the Whittle index strategy with known transition probabilities [7].
Under the unslotted primary network setup, we first assume that the SU radio can be tuned to any combination of the primary channels at the same time. This can be achieved by an Orthogonal Frequency Division Multiplexing (OFDM) technique with adaptive and selective allocation of OFDM subcarriers to utilize any subset of licensed channels at the same time. The SU aims at maximizing its throughput (i.e., maximizing the opportunities discovered and accessed in all primary channels) while imposing minimal interference to the primary network. A similar model was adopted in [3], where the authors developed an optimal sensing period for each of the primary channels by optimizing the tradeoff between the sensing overhead resulting from frequent sensing of the channels and the missed opportunities in the primary channels due to infrequent sensing. However, it was assumed that if a primary transmission is resumed on a channel, the SU will discover this return, with the help of a Genie, and immediately evacuate the channel, thereby causing no interference to the primary transmissions. In this work, we relax this Genie-aided assumption and impose an interference/outage constraint on each primary channel. In [11], an optimal sensing period satisfying a primary network interference constraint was developed. However, the approximations made in [11] deviate considerably from the true values. More importantly, we show that by introducing two different sensing periods, a period if the channel is sensed free and a different period if the channel is sensed busy, the performance can be substantially improved. In particular, this performance improvement becomes more significant when there is a large difference between the expected time a primary channel is busy and the expected time it remains idle.
Finally, we consider the scenario when the SU radio can be tuned to only one channel. An SU in this case shall try to access a primary channel as long as it is free. When this channel switches to busy, the SU shall search other primary channels until a free channel is identified. A similar model was adopted in [10], where an optimal sequence of primary channels to be sensed was proposed. This optimal sequence aimed at minimizing the average delay in finding a free channel. Here, we extend this work by finding the period a free channel shall be accessed in order to satisfy an interference/outage constraint on the primary network, which was not considered in [10].
The rest of the paper is organized as follows. Section 2 presents our modeling assumptions. The proposed cognitive MAC protocols for slotted primary networks are developed in Section 3, whereas the unslotted scenario is investigated in Section 4. Numerical results for our proposed strategies are reported in Section 5. Finally, Section 6 summarizes our conclusions.
2 Network Model
2.1 Primary Network
We consider a primary network consisting of independent channels. The presence or absence of primary users in each channel can be modeled as alternating time intervals of busy and free states with random durations. For channel , we model the sojourn time of a busy period as a random variable with the probability density function (p.d.f.) . Similarly, the p.d.f. of the sojourn time in a free period is given as . Busy and free periods are assumed to be independent and identically distributed (i.i.d.). We also assume that busy and free periods are independent of each other. The state of channel at time , , is equal to if the channel is free and if busy.
2.2 Secondary Pair
The SU can sense any of the primary channels in order to identify the presence of a PU. After sensing the channel, the SU applies the channel access strategies as described in Sections 3 and 4. The performance of the sensing stage is limited by two types of errors. If the secondary transmitter decides that a free channel is busy, it will refrain from transmitting, and a spectrum opportunity is overlooked. This is the false alarm situation, which is characterized by the probability of false alarm . On the other hand, if the detector fails to classify a busy channel as busy, a missed detection occurs, resulting in interference with the primary user. The probability of missed detection is denoted by . If energy detection is used as a sensing method [12], the minimum required sensing time that satisfies a certain desired and is given by:
(1) 
where is the sampling frequency, is the tail probability of a zero-mean unit-variance Gaussian random variable, and is the PU signal-to-noise ratio [12].
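As an illustration of how such a sensing time can be evaluated numerically, the following minimal Python sketch uses one commonly quoted closed form for energy detection, T = (1 / (snr^2 f_s)) [Q^{-1}(P_FA) - Q^{-1}(1 - P_MD) sqrt(2 snr + 1)]^2. This specific form, the function names, and the parameter names (p_fa, p_md, snr, f_s) are assumptions made for illustration; they are not taken from the text above and may differ from the exact expression in [12]. The SNR is a linear ratio, not a value in dB.

```python
from statistics import NormalDist


def q_inv(p: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x)."""
    return NormalDist().inv_cdf(1.0 - p)


def min_sensing_time(p_fa: float, p_md: float, snr: float, f_s: float) -> float:
    """Sensing time (seconds) under the ASSUMED closed form stated in the lead-in.

    p_fa: target false alarm probability
    p_md: target missed detection probability
    snr:  PU signal-to-noise ratio (linear scale)
    f_s:  sampling frequency in Hz
    """
    term = q_inv(p_fa) - q_inv(1.0 - p_md) * (2.0 * snr + 1.0) ** 0.5
    return term * term / (snr * snr * f_s)


if __name__ == "__main__":
    # Hypothetical operating point: 10% error targets, SNR of 0.1, 1 MHz sampling.
    print(min_sensing_time(p_fa=0.1, p_md=0.1, snr=0.1, f_s=1e6))
```

Under this assumed form, tightening either error target or lowering the SNR lengthens the required sensing time, while a higher sampling frequency shortens it proportionally.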
3 Slotted Primary Network
In this section, we consider the case of discrete probability distributions for the free and busy periods. We model the duration of these intervals (in terms of the number of time slots they occupy) as geometrically distributed random variables. From the memoryless property of the geometric distribution, we can model the primary users’ traffic in each channel by the two-state Markov chain depicted in Figure 1. The channel state transition matrix of the Markov chain is given by . We assume that remains fixed for a block of time slots. We use to refer to the time-slot index .
We assume that the secondary transmitter can sense channels and can transmit on only one channel in each slot if the channel it chooses to access is sensed to be free. In this section, denotes the state of channel at time slot as sensed by the transmitter, which might not be the actual channel state . The secondary receiver does not participate in channel sensing and is assumed to be capable of accessing only one channel [4]. This assumption is intended to limit the decoding complexity needed by the secondary receiver. Another motivation behind restricting channel sensing to the transmitter is the potentially different sensing outcomes at the secondary transmitter and receiver due to the spatial diversity of the primary traffic, which can lead to the breakdown of the secondary transmitter-receiver synchronization. Overall, successful communication between the secondary transmitter and receiver occurs only when: 1) they both decide to access the same channel, and 2) the channel is sensed to be free and is actually free from primary transmissions.
Our proposed cognitive MAC protocol can be decomposed into the following stages:

Decision stage: The secondary transmitter decides which channels to sense. Also, both transmitter and receiver decide which channel to access.

Sensing stage: The transmitter senses the selected primary channels.

Learning stage: Based on the sensing results from the sensing stage, the transmitter updates the estimated primary channels’ probability transition matrix .

Access stage: If the access channel is sensed to be free, a data packet is transmitted to the secondary receiver. This packet contains the information needed to sustain synchronization between secondary terminals, and hence, synchronization does not require a dedicated control channel. The length of the packet is assumed to be large enough such that the loss of throughput resulting from the synchronization overhead is marginal.

ACK stage: The receiver sends an ACK to the transmitter upon successful reception of sent data.
3.1 Full Sensing Capability
In this subsection we assume that the secondary transmitter can sense all primary channels at the beginning of each time slot. Since the receiver does not participate in sensing, and in order to sustain the transmitter-receiver synchronization, the secondary pair must share the same variables which are used to decide upon the channel to be accessed. We refer to as the belief vector at slot , where is the probability that . Given the sensing outcomes in slot , the belief state in slot can be obtained recursively as follows:
(2) 
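The recursive belief update can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the exact recursion of equation (2): it assumes perfect sensing, encodes a free channel as 1 and a busy channel as 0, and uses the hypothetical names p01 (busy-to-free transition probability) and p11 (free-to-free transition probability). A channel that was not sensed in the current slot is marked with None and has its belief propagated one step through the Markov chain; that case does not arise under full sensing and is included only for completeness.

```python
from typing import List, Optional, Sequence


def update_beliefs(
    beliefs: Sequence[float],
    observations: Sequence[Optional[int]],
    p01: Sequence[float],
    p11: Sequence[float],
) -> List[float]:
    """Return the probability that each channel is free in the next slot.

    beliefs[i]      : current probability that channel i is free
    observations[i] : 1 (sensed free), 0 (sensed busy), or None (not sensed)
    p01[i], p11[i]  : assumed busy->free and free->free transition probabilities
    """
    updated = []
    for w, obs, q01, q11 in zip(beliefs, observations, p01, p11):
        if obs is None:
            # No observation: propagate the belief through the chain.
            updated.append(w * q11 + (1.0 - w) * q01)
        elif obs == 1:
            # Channel known to be free now.
            updated.append(q11)
        else:
            # Channel known to be busy now.
            updated.append(q01)
    return updated


if __name__ == "__main__":
    print(update_beliefs([0.3, 0.5, 0.9], [1, 0, None],
                         [0.2, 0.1, 0.4], [0.8, 0.7, 0.6]))
```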
Due to the different sensing roles between the secondary transmitter and receiver, we introduce the vector as the common or shared belief vector between the secondary transmitter and receiver. The initial packet sent to the receiver includes estimates for the transition probabilities, and . Once the initial communication is established, the secondary transmitter and receiver implement the same spectrum access strategy described below for .

Decision: At the beginning of time slot , and using belief vector , the secondary transmitter and receiver decide to access channel:
(3) 
Sensing: The secondary transmitter senses all channels and captures the sensing vector , where if the i-th channel is sensed to be free, and if it is found busy. might differ from due to sensing errors. Note that the decision stage precedes the sensing stage in order to maintain the synchronization between the secondary terminals.

Learning: Based on the sensing results, the transmitter updates the estimates and for all primary channels as explained below.

Access: If , the transmitter sends its data packet to the receiver. The packet includes , and . In addition, if the transmission at slot has failed, the transmitter sends , which is the belief vector computed at the transmitter based on its observations. If the receiver successfully receives the packet, it sends an ACK back to the transmitter. Parameter is equal to unity if an ACK is received by the transmitter, and zero otherwise. If the channel is free, the forward transmission and the feedback channel are assumed to be error-free.

Finally, the transmitter and receiver update the common belief vector such that:
If :
(4) 
If :
(5) 
where:
(6) 
(7) 
(8) 
and are the most recent shared estimates of the i-th channel's transition probabilities. Obviously, in case of perfect sensing, , and .
In addition, the transmitter computes another belief vector, , based on its observations:
If ,
If :
(9) 
where , , and are the same as , and with replaced by . Note that , and differs from only when . If transmission succeeds at the j-th time slot after one or more failures, the transmitter and receiver set before computing .
Note that although is the updated belief vector which is available to the transmitter, the transmitter and receiver use the degraded in the decision stage instead, in order to maintain synchronization. As an analytical benchmark, we therefore have the following upper bound on the achievable throughput in this scenario. Assuming that the delayed side information of all the primary channels’ states is known to the secondary receiver as well as the transmitter, the expected throughput per slot is given by:
(10) 
where denotes the state transition probability for channel from state to the free state, and is the Markov steady state probability of channel being free or busy. The first term in the summation corresponds to the probability that the channels are in one of the states, and the second term represents the highest expected throughput given the current joint state for the channels. The loss in throughput resulting from the use of instead of in the decision is illustrated in Figure 3.
Since we assume that traffic statistics on primary channels () are unknown to the secondary user a priori, the secondary user needs to estimate these probabilities. When continuous observations of each channel are available, each channel can be modeled as a hidden Markov model (HMM). An optimal learning algorithm for HMMs is described in [6], with which the transition probabilities, , and can be estimated. Here, however, we adopt a simple Bayesian learning method. Assume that and are random variables with distributions and defined on , respectively. After sensing all the primary channels at the beginning of each time slot, and depending on the previous state of the channel, the posterior distribution of can be updated according to Bayes’ rule; i.e.,
(11)  
(12) 
where the event represents the state transition from busy at time to free at time . Also, the event represents the state transition from busy at time to busy at time . The posterior distribution of can be updated similarly. In addition, after sensing all the primary channels at the beginning of each time slot, the secondary transmitter shall keep track of the following metrics for each channel:

Number of state transitions from busy to busy:
(13) 
Number of state transitions from busy to free:
(14) 
Number of state transitions from free to busy:
(15) 
Number of state transitions from free to free:
(16)
Thus, if we assume that at , is uniformly distributed in (i.e., ), or in other words no prior information about is available, then using equation (11) it can be shown that satisfies the following Beta distribution:
(18)  
Finally, the expected value of , obtained from equation (18), gives the following best estimate for at time :
(19) 
Using the same approach, the best estimate for at time is given by:
(20) 
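The counting-based estimator can be illustrated with a short Python sketch. It relies on a standard fact: with a uniform prior on a probability and k occurrences in n trials, the posterior is Beta(k + 1, n - k + 1) with mean (k + 1) / (n + 2). Applying this to the transition counts gives the estimates below. The state encoding (1 for free, 0 for busy) and all function and variable names are assumptions made for this illustration, and the expressions are offered as a plausible reading of the estimates, not as a verbatim copy of equations (19) and (20).

```python
from typing import Dict, Sequence, Tuple


def count_transitions(states: Sequence[int]) -> Dict[Tuple[int, int], int]:
    """Count (previous, current) state pairs in a sensed sequence (1 = free, 0 = busy)."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(states, states[1:]):
        counts[(prev, cur)] += 1
    return counts


def estimate_transition_probs(states: Sequence[int]) -> Tuple[float, float]:
    """Posterior-mean estimates of (busy->free, free->free) under uniform priors."""
    c = count_transitions(states)
    # Mean of Beta(k + 1, n - k + 1) is (k + 1) / (n + 2).
    p01 = (c[(0, 1)] + 1) / (c[(0, 0)] + c[(0, 1)] + 2)
    p11 = (c[(1, 1)] + 1) / (c[(1, 0)] + c[(1, 1)] + 2)
    return p01, p11


if __name__ == "__main__":
    sensed = [0, 0, 1, 1, 1, 0, 1]  # hypothetical sensing history of one channel
    print(estimate_transition_probs(sensed))
```

With no observations at all, both estimates equal 0.5, which matches the mean of the uniform prior. A sliding-window variant can be obtained by passing only the most recent samples to `estimate_transition_probs`.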
This learning strategy can be easily applied to the situation where the primary traffic statistics () change with time. One simple idea is to consider only a fixed number of previous sensing samples in estimating (i.e., using a sliding window of samples for the estimation).
In order to share the channel transition probabilities between the secondary transmitter and receiver, as dictated by the proposed strategy, any updates in the values of , , and must be sent within the transmitted packet. If , the transmitter and receiver update and . Otherwise, the transmitter only updates , , and , but uses the old values, available at the last successful transmission, in the decision phase.
In a nutshell, the proposed algorithm uses the full sensing capability of the secondary transmitter to decouple the exploration (i.e., learning) task from the exploitation task. After an ACK is received, both nodes use the common observation-based belief vector to make the optimal access decision. On the other hand, in the absence of the ACK, both nodes cannot use the optimal belief vector in order to maintain synchronization. In this case, the proposed algorithm opts for a greedy strategy in order to minimize the time between two successive ACKs.
A final remark is now in order. Assuming that , the probability of channel being free, , becomes independent of the previous state, i.e., . In this case, the optimal strategy, assuming that the transition probabilities are known, is for the secondary transmitter to access the channel and the expected throughput becomes: [8]. Assuming, however, that the transition probabilities are unknown but both nodes know that , one can estimate each channel free probability as , where is the number of times channel was sensed to be free until time slot . In Section 5, we quantify the value of this side information by comparing the performance of this strategy with our proposed universal algorithm that does not make any prior assumption about the transition probabilities.
3.2 The Restless Bandit Scenario
Here, we assume that the secondary transmitter can sense only a subset of the primary channels at the beginning of each time slot. Obviously, the problem here is not as simple as the case since the secondary transmitter has to decide on which channels to sense at each time slot in addition to the channel the transmitter and receiver decide to access. This opportunistic spectrum access network can be modeled as a partially observable Markov decision process (POMDP) where the channel sensing and access of a MAC protocol correspond to a policy for this POMDP [5]. The design objective is to determine, in each slot, which channel to sense so that the expected total reward: obtained in slots is maximized, where . It has been shown that for any , the belief vector is a sufficient statistic for the design of the optimal action in slot [16]. A policy for a POMDP is thus given by a sequence of functions, each mapping from the current belief vector to the sensing and access action to be taken in slot . Unfortunately, finding the optimal policy for a general POMDP is computationally prohibitive. In [5], the authors proposed a reduced complexity strategy based on the greedy approach that maximizes the per-slot throughput based on already known information (i.e., at time slot , transmit on channel ). In a more recent work [7], the problem was recast as a restless bandit problem.
Restless Multi-armed Bandit Processes (RMBP) are generalizations of the classical Multi-armed Bandit Processes (MBP). In the MBP, a player, with full knowledge of the current state of each arm, chooses one out of arms to activate at each time and receives a reward determined by the state of the activated arm. Only the activated arm changes its state while the states of passive arms are frozen. The objective is to maximize the long-run reward over the infinite horizon by choosing which arm to activate at each time. The solution to the multi-armed bandit problem should be able to maintain a balance between exploration and exploitation in order to maximize the total reward. Whittle [9] introduced the RMBP, which allows multiple arms to be activated simultaneously and passive arms to also change states. In each slot, the user chooses one of two possible actions to make a particular arm passive or active. Whittle’s index measures how attractive it is to activate an arm based on the concept of subsidy for passivity. In other words, Whittle’s index for a channel is the minimum subsidy that is needed to move a state from the active set to the passive set. In [7], the Whittle index was obtained in closed form and was used to construct a more efficient medium access policy than the greedy approach. The given in [7] can be viewed as a combination of the immediate reward represented by and a learning reward obtained from observing the state of the channel. Based on this Whittle index formulation, the maximum reward obtained at each time slot is given by , where:
(21) 
and represents the set of channels with the largest values of not including channel . Knowing the states of the set of channels gives the largest observation reward (i.e., exploration) which enhances future access decisions.
Here, we relax the assumption that the transition probabilities are available a priori at the secondary transmitter/receiver. This adds another interesting dimension to the problem since the blind cognitive MAC protocol must now learn this statistical information online in order to make the appropriate access decisions. Inspired by the previous results of Lai et al. in the multi-armed bandit setup [8], we propose the following simple strategy. The primary channels are divided into channel groups. Then, at the beginning of the slots, each of the channel groups is continuously monitored for an initial learning period () to get an estimate for and . In summary, the proposed strategy works as follows:

Initial learning period: Each group of channels is continuously sensed for time slots. At the end of the learning period, the transition probabilities are estimated as ,

Decision: At the beginning of any time slot (), the secondary transmitter and receiver decide to access channel .

Sensing: The secondary transmitter senses the channels in the set in addition to channel . The sensing vector for the selected channels is captured.

Learning: If , update , , , , , and .

Access: If , the transmitter sends its data packet to the receiver. The packet includes , and . In addition, if the transmission at slot has failed, the transmitter sends

The transmitter and receiver calculate , while the transmitter calculates :
If :
(22) 
If :
(23) 
(24) 
where and are the latest successfully shared and between the secondary transmitter-receiver pair. Finally, is used to update Whittle’s index of each channel as detailed in [7].
In the case of time-independent channel states, i.e., , the problem reduces to the multi-armed bandit scenario considered in [8]. The difference here is the lack of the dedicated control channel between the cognitive transmitter and receiver that is assumed in [8]. Assuming , the following strategy, which is applied as soon as the initial synchronization is established, avoids this drawback by ensuring synchronization using the ACK feedback over the same data channel.

Decision: At the beginning of any time slot , the secondary transmitter and receiver decide to access the channel , where , is the number of time slots where successful communication occurs on channel , and is the number of time slots where channel is chosen to sense and access [8].

Sensing: The secondary transmitter senses channel .

Access: If , the transmitter sends its data packet to the receiver. If the receiver successfully receives a packet, it sends an ACK back to the transmitter.

The transmitter and receiver update the following:
, if
, if
4 Unslotted Primary Network
In this section, we consider the case of continuous probability distributions for the free and busy periods. Whenever the channel enters the busy or free state, the time until the next state transition is governed by the continuous p.d.f. or , respectively. Without loss of generality, we will use the following exponentially distributed busy/free periods for each channel as an illustrative example:
(25)  
(26) 
The channel utilization in this case is given by:
(27)  
(28) 
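For intuition, the utilization of an alternating busy/free channel can be checked numerically. The sketch below assumes the usual definition of utilization as the long-run fraction of time the channel is busy, i.e., the mean busy duration divided by the sum of the mean busy and mean free durations. That definition, the parameterization by mean durations, and the function names are assumptions made for this illustration. The closed form is compared against a simple simulation of exponentially distributed busy and free periods.

```python
import random


def utilization(mean_busy: float, mean_free: float) -> float:
    """Long-run fraction of time the channel is busy (assumed definition)."""
    return mean_busy / (mean_busy + mean_free)


def simulate_utilization(mean_busy: float, mean_free: float,
                         n_cycles: int, seed: int = 0) -> float:
    """Empirical busy fraction over n_cycles alternating exponential periods."""
    rng = random.Random(seed)
    # expovariate takes the rate, which is the reciprocal of the mean.
    busy = sum(rng.expovariate(1.0 / mean_busy) for _ in range(n_cycles))
    free = sum(rng.expovariate(1.0 / mean_free) for _ in range(n_cycles))
    return busy / (busy + free)


if __name__ == "__main__":
    # Hypothetical channel: busy for 2 time units and free for 3 on average.
    print(utilization(2.0, 3.0))
    print(simulate_utilization(2.0, 3.0, n_cycles=20000))
```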
We also assume that the SU is equipped with a single antenna that can be used for either sensing or transmission.
4.1 Multiple Channel Access
Here, the SU can transmit on any combination of the primary channels simultaneously. However, the SU can sense only one channel at a time. Thus, in order to sense any channel, the transmission taking place on any other channels is paused until the end of the sensing event. The goal is to find the optimal access strategy that maximizes the throughput for the SU while satisfying the PU interference/outage constraints for each channel.
Since the SU depends only on sensing a channel at specific times to identify the channel’s state, it cannot track the exact state transition of each channel. Hence, the free portion of time between the actual state transition from busy to free until the SU discovers this transition cannot be utilized. In addition, some free periods may remain entirely undiscovered if sensing is infrequent. These Unexplored Opportunities are quantified by , which is defined as the average fraction of time during which channel i’s vacancy is not discovered by the SU [3]. On the other hand, the transition of primary activity from free to busy on a channel utilized by the SU causes interference to the primary and secondary receivers until the SU realizes this transition. This Interference Ratio is quantified by , which is defined as the average fraction of time during which channel is in the busy state but interrupted by SU transmission. Finally, we note that blindly increasing the sensing frequency to reduce interference and discover more opportunities is not desirable because the SU must suspend the use of the discovered channel(s) when it senses other channels. This is due to the assumption that data transmission and sensing cannot take place at the same time with one antenna. Thus the Sensing Overhead is defined as the average fraction of time during which channel discovered opportunities are interrupted due to the need for sensing any of the channels [3]. This tradeoff will be captured in the construction of our objective function which is used to find the optimal sensing frequencies/periods. The proposed algorithm relies on the novel idea of using two sensing periods for each channel: free sensing period if , and busy sensing period if . Therefore, our optimization task is to identify the optimal sensing periods , for each channel, that maximize the total throughput for the SU on the channels while satisfying the PU interference constraint on each channel.
The channel as seen by the SU can be modeled by a two-state (free/busy) Markov chain, where the transition probabilities from the free or busy state to the free state are and , where is the most recent sensing time. For exponentially distributed busy/free periods, and are given by [3]:
(29)  
(30) 
The ratio of the average number of times the channel is sensed free to the total number of times the channel is sensed can be obtained from the steady state probability that the Markov chain is in the free state:
(31) 
In case of perfect sensing (i.e., and ), represents the probability that the SU senses channel with sensing-dependent periods and , and finds it free. In the presence of sensing errors, the probability of finding channel free is:
Note that the average time between sensing events on channel is given by:
(32)  
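The sensed-channel Markov chain can be made concrete with a short Python sketch, restricted to perfect sensing. For exponentially distributed periods the channel is a two-state continuous-time Markov chain, whose transition probabilities over an interval tau have the standard closed form used below. The chain observed at sensing instants then moves from the free state after the free sensing period and from the busy state after the busy sensing period, and its stationary free probability follows from the two-state balance equation. The names t_free and t_busy for the two sensing periods, the parameterization by mean durations, and the function names are assumptions made for this illustration.

```python
import math
from typing import Tuple


def transition_probs(mean_busy: float, mean_free: float, tau: float) -> Tuple[float, float]:
    """Return (P[free after tau | free now], P[free after tau | busy now])."""
    a = 1.0 / mean_free  # rate of leaving the free state
    b = 1.0 / mean_busy  # rate of leaving the busy state
    stationary_free = b / (a + b)
    decay = math.exp(-(a + b) * tau)
    p_ff = stationary_free + (1.0 - stationary_free) * decay
    p_bf = stationary_free * (1.0 - decay)
    return p_ff, p_bf


def sensed_free_fraction(mean_busy: float, mean_free: float,
                         t_free: float, t_busy: float) -> float:
    """Stationary probability that a sensing event finds the channel free.

    Both sensing periods are assumed to be strictly positive.
    """
    p_ff, _ = transition_probs(mean_busy, mean_free, t_free)
    _, p_bf = transition_probs(mean_busy, mean_free, t_busy)
    return p_bf / (1.0 - p_ff + p_bf)


def mean_sensing_interval(mean_busy: float, mean_free: float,
                          t_free: float, t_busy: float) -> float:
    """Average time between sensing events under perfect sensing."""
    pi_free = sensed_free_fraction(mean_busy, mean_free, t_free, t_busy)
    return pi_free * t_free + (1.0 - pi_free) * t_busy


if __name__ == "__main__":
    # Hypothetical channel and sensing periods.
    print(sensed_free_fraction(2.0, 3.0, t_free=0.2, t_busy=1.0))
    print(mean_sensing_interval(2.0, 3.0, t_free=0.2, t_busy=1.0))
```

When the two sensing periods are equal, the sensed chain is just the channel sampled uniformly in time, so the free fraction reduces to the channel's own stationary free probability.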
We define the Secondary Utilization as the expected fraction of time during which channel is sensed or utilized by the SU,
(33) 
The total SU uninterrupted transmission time is equivalent to the expected throughput that can be achieved by the SU on all channels, and is given by:
(34)  
Figure 2 illustrates the sensing-dependent periods per channel, the interference ratio, the unexplored opportunities, the sensing overhead, the secondary utilization, and the secondary achieved throughput for a model with two primary channels.
Now, in order to find expressions for the expected unexplored opportunities and interference ratio, we first need to find expressions for and , which are defined as the expected time in which a channel is free during the time between and provided that or , respectively. Based on the theory of alternating renewal processes, the remaining time for a channel to be in the same state from any sampling time can be shown to have the p.d.f. [15], where is the c.d.f. of the free or busy period. Therefore, it can be easily shown that:
(35)  
(36) 
where and are the same as and if the change in state happens exactly at . That is,
(37)  
(38) 
Using the Laplace transform, and for exponentially distributed busy/free periods can be obtained as (see Appendix A for a complete derivation):
(39)  
(40) 
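For exponentially distributed periods, the expected free time within an interval can also be cross-checked numerically without the Laplace transform. Because the exponential distribution is memoryless, the expected free time over an interval of length tau equals the integral, from 0 to tau, of the probability that the channel is free at each instant given its state at the start. The Python sketch below evaluates the resulting closed form and compares it with a midpoint-rule integration. The parameterization by mean durations and all function names are assumptions made for this illustration, and the closed form is not claimed to be written in the same notation as equations (39) and (40).

```python
import math


def _stationary_and_rate(mean_busy: float, mean_free: float):
    a = 1.0 / mean_free  # rate of leaving the free state
    b = 1.0 / mean_busy  # rate of leaving the busy state
    return b / (a + b), a + b


def prob_free(mean_busy: float, mean_free: float, tau: float, start_free: bool) -> float:
    """Probability that the channel is free tau after a known starting state."""
    s, c = _stationary_and_rate(mean_busy, mean_free)
    decay = math.exp(-c * tau)
    return s + (1.0 - s) * decay if start_free else s * (1.0 - decay)


def expected_free_time(mean_busy: float, mean_free: float, tau: float, start_free: bool) -> float:
    """Closed-form integral of prob_free over [0, tau]."""
    s, c = _stationary_and_rate(mean_busy, mean_free)
    transient = (1.0 - math.exp(-c * tau)) / c
    return s * tau + ((1.0 - s) if start_free else -s) * transient


def expected_free_time_numeric(mean_busy: float, mean_free: float, tau: float,
                               start_free: bool, steps: int = 20000) -> float:
    """Midpoint-rule check of the closed form."""
    h = tau / steps
    return h * sum(prob_free(mean_busy, mean_free, (k + 0.5) * h, start_free)
                   for k in range(steps))


if __name__ == "__main__":
    # Hypothetical channel: mean busy time 2, mean free time 3, interval 1.5.
    for start_free in (True, False):
        print(expected_free_time(2.0, 3.0, 1.5, start_free),
              expected_free_time_numeric(2.0, 3.0, 1.5, start_free))
```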
The unexplored opportunities can now be obtained as:
(42)  
Similarly, is given by:
(44)  
Finally, the sensing overhead is given by:
(45) 
It is worth noting that the first term represents the average fraction of time channel is utilized by the SU without interference from the PU. This is the useful time that is interrupted by the need for sensing. The second term represents the aggregate sensing overhead given by the ratio of the sensing time to the average sensing period.
In summary, given a maximum interference constraint per primary channel, , our optimization problem can be expressed as follows: