Repeated Auctions with Learning for Spectrum Access in Cognitive Radio Networks
In this paper, spectrum access in cognitive radio networks is modeled as a repeated auction game subject to monitoring and entry costs. For secondary users, sensing costs are incurred as the result of primary users’ activity. Furthermore, each secondary user pays the cost of transmissions upon successful bidding for a channel. Knowledge regarding other secondary users’ activity is limited due to the distributed nature of the network. The resulting formulation is thus a dynamic game with incomplete information. In this paper, an efficient bidding learning algorithm is proposed based on the outcome of past transactions. As demonstrated through extensive simulations, the proposed distributed scheme outperforms a myopic one-stage algorithm, and can achieve a good balance between efficiency and fairness.
Recent studies have shown that despite claims of spectral scarcity, the actual licensed spectrum remains unoccupied for long periods of time. Thus, cognitive radio (CR) systems have been proposed in order to efficiently exploit these spectral holes. CRs or secondary users (SUs) are wireless devices that can intelligently monitor and adapt to their environment; hence, they are able to share the spectrum with the licensed primary users (PUs), operating whenever the PUs are idle. Three key design challenges are active topics of research in cognitive radio networks, namely, distributed implementation, spectral efficiency, and the tradeoff between sensing and spectrum access.
Previous studies have tackled various aspects of spectrum sensing and spectrum access. In , the performance of spectrum sensing, in terms of throughput, is investigated when the SUs share their instantaneous knowledge of the channel. The work in  studies the performance of different detectors for spectrum sensing, while in  spatial diversity methods are proposed for improving the probability of detecting the PU by the SUs. Other aspects of spectrum sensing are discussed in  and . Furthermore, spectrum access has also received increased attention, e.g., [8, 9, 10, 11, 12]. In , a dynamic programming approach is proposed to allow the SUs to maximize their channel access time while taking into account a penalty factor from any collision with the PU. The work in  (and the references therein) establishes that, in practice, the sensing time of CR networks is large and affects the access performance of the SUs. In , the authors model the spectrum access problem as a non-cooperative game, and propose learning algorithms to find the correlated equilibria of the game. Non-cooperative solutions for dynamic spectrum access are also proposed in  while taking into account changes in the SUs’ environment such as the arrival of new PUs, among others.
When multiple SUs compete for spectral opportunities, the issues of fairness and efficiency arise. On one hand, it is desirable for an SU to access a channel with high availability. On the other hand, the effective achievable rate of an SU decreases when contending with many SUs over the most available channel. Consequently, the efficiency of spectrum utilization in the system decreases. Therefore, an SU should explore transmission opportunities in other channels if available and refrain from transmitting in the same channel all the time. Intuitively, diversifying spectrum access in both frequency (exploring more channels) and time (refraining from continuous transmission attempts) would be beneficial to achieving fairness among multiple SUs, in that SUs experiencing poorer channel conditions are not starved in the long run.
The objective of this paper is to design a mechanism that enables fair and efficient sharing of spectral resources among SUs. We model spectrum access in cognitive radio networks as a repeated auction game with entry and monitoring costs. The spectral opportunities are auctioned repeatedly. At the beginning of each period, each SU that wishes to participate in the spectrum access submits a bid to a coordinator based on its view of the channel and past auction history. Knowledge regarding other secondary users’ activities is limited due to the distributed nature of the network. The resulting formulation is thus a dynamic game with incomplete information. The bidder with the highest bid gains spectrum access. Entry fees are charged to all bidders who participate in the auction, irrespective of its outcome. An SU can also choose to stay out (SO) of the current round, in which case no entry fee is incurred. At the end of each auction period, information regarding bidding and allocation is made available to all SUs, and in turn a monitoring fee is incurred.
To achieve efficient bidding, a learning algorithm is proposed based on the outcome of past transactions. Each SU decides on local actions with the objective of increasing its long-term cost effectiveness. As demonstrated through extensive simulations, in both single-channel and multi-channel networks the proposed distributed scheme outperforms a myopic one-stage algorithm in which an SU always participates in the spectrum access game.
A comment is in order on the feasibility of such an auction-based approach to spectrum access in practice. Due to commercial and industrial exploitation and different stakeholders’ interests, the functional architectures and cognitive signaling schemes are currently under discussion within standardization forums, including IEEE SCC 41 and ETSI TC RRS (Reconfigurable Radio Systems). The cognitive pilot channel (CPC) has gained attention as a potential enabler of data-aided mitigation techniques between secondary and primary communication systems, as well as a mechanism to support optimized radio resource and data management across heterogeneous networks. In CPC, a common control channel is used to provide information on the operators, radio access technologies, and frequencies allocated in a given area. We can thus leverage the intelligence of the CPC coordinator and the control channel to solicit bids and broadcast the outcome of auctions.
The main contributions of this paper are:
We have formulated the spectrum access problem in cognitive radio networks as a repeated auction game.
A distributed learning algorithm is proposed for single-channel networks, and a non-regret learning algorithm is investigated for multi-channel networks.
The rest of the paper is organized as follows. In Section II, the system model and terminology are introduced. Mechanism design of the repeated auction with learning is presented in Section III. Simulation results are given in Section IV followed by conclusions and a discussion of future work in Section V.
II Physical Layer and System Model
We consider a cognitive radio network consisting of channels to be occupied by SUs who compete repeatedly for access to the channels at each discrete time . At time , the SU can reasonably estimate its channel rate in the channel while having no knowledge of that of other SUs. In other words, each SU has imperfect information. We assume that both and are known to all . The primary user’s activity follows a Bernoulli distribution, i.e., at time , the th channel is occupied with probability at time . Without loss of generality, all secondary users use a common transmit power with a thermal noise level at the base station. The channel gain for the secondary user is assumed to be , where is the propagation path loss and follows the Rayleigh fading distribution. The rate for the user on the channel at time can be written as
where is the bandwidth for each channel. All channels are assumed to have the same bandwidth for ease of exposition.
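As a rough numerical sketch of the rate model above, the snippet below draws a Rayleigh-faded channel gain and evaluates a Shannon-type rate. The specific forms used here (a distance-power path loss, a unit-mean exponential power fading term, and a rate of the form bandwidth times log2(1 + SNR)), as well as all function names, the distance, and the path loss exponent, are assumptions made for illustration rather than the exact expressions of the paper.

```python
import math
import random


def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)


def channel_gain(distance_m, path_loss_exp, rng):
    """Assumed gain model: path loss times Rayleigh fading power.

    A Rayleigh-distributed amplitude corresponds to an exponentially
    distributed power gain, drawn here with unit mean.
    """
    fading_power = rng.expovariate(1.0)
    return fading_power * distance_m ** (-path_loss_exp)


def shannon_rate(bandwidth, tx_power_w, gain, noise_w):
    """Assumed rate model: bandwidth * log2(1 + received SNR)."""
    return bandwidth * math.log2(1.0 + tx_power_w * gain / noise_w)


rng = random.Random(0)
# Hypothetical geometry: 50 m link, path loss exponent 3.
gain = channel_gain(distance_m=50.0, path_loss_exp=3.0, rng=rng)
# 100 mW transmit power and -90 dBm noise, unit bandwidth.
rate = shannon_rate(1.0, 0.1, gain, dbm_to_watt(-90.0))
```

Because the fading term is redrawn in every slot, the valuation of an SU fluctuates over time even when its position is fixed, which is what makes selective participation worthwhile.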
At time , an SU may incur two types of costs, namely, the cost of accessing , which accounts for the energy expenditure needed for spectrum access; and the cost of monitoring , which is the cost of sensing and subscribing to the control channel (e.g., CPC) to obtain information regarding past auctions. The spectrum access among SUs is modeled as a repeated auction. The access cost, also called the entry fee, is charged only when the user decides to participate in the auction at time . On the other hand, SUs always pay the monitoring cost regardless of their decisions. At the beginning of a slot , an SU chooses whether to stay out (SO) or participate in spectrum access. If the latter option is chosen, the SU (or bidder) sends a confidential message to the coordinator (or auctioneer) containing its bid. Let the bid submitted by SU be , which is a vector with component for the channel. We define the set of actions of user ’s opponents as
The cost incurred is thus
where is the indicator function.
In each round, the bidder with the maximum bid wins and is granted the spectrum access opportunity. A payment is incurred accordingly. There are two key differences compared to existing spectrum access models, where upon sensing an idle channel, all SUs contend for spectrum access. First, an SU may choose to stay out if participating in the spectrum access game is deemed unfavorable (because of low data rate or large number of contending SUs). Second, the transmission opportunity in each available channel is granted only to a single SU after the auction. Therefore, no further contention will occur. Each SU is assumed to follow a symmetric strategy based on its local state and information learned.
Figure 1 gives an illustration of the system model. An analogy can be drawn with casinos, in which gamblers choose which table to play at and how much to bet. Similarly, each secondary user must decide which channel to sense and bid for, and how much to bid. These two issues will be addressed in the following sections for single-channel (i.e., ) and multi-channel (i.e., ) cognitive radio networks, respectively.
III Repeated Auction with Learning
In this section, we investigate the spectrum access problem among multiple SUs in cognitive radio networks. We first discuss the auction mechanism and then define the utility function. Finally, an efficient bidding mechanism in repeated auctions with learning is proposed.
Recall from Section II that at the beginning of a slot , an SU decides to either place a bid, or stay out and monitor the results. Based on the SUs’ actions, the allocation strategy at time for channel can be written as
If SU chooses to stay out, then its allocation equals zero, i.e.,
The SU with the highest bid would win the right to access the channel, i.e.,
The winner would pay
for its bid, where
For ease of presentation, a second price auction is assumed in the remaining discussion of the paper. The auction mechanism can be written as follows:
SU observes its current valuation (rate) ;
SU decides ;
The mechanism implements and ; and
SU observes and .
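One round of the mechanism described above can be sketched as follows. This is a minimal illustration, assuming that a stay-out decision is encoded as `None`, that ties are broken in favor of the SU listed first, and that a lone bidder pays nothing (no reserve price); the function name and the dictionary-based interface are hypothetical.

```python
def second_price_auction(bids):
    """Run one second-price round for a single channel.

    bids maps an SU identifier to its bid, or to None if the SU stays out.
    Returns (winner, payment); winner is None when nobody participates.
    """
    active = {su: bid for su, bid in bids.items() if bid is not None}
    if not active:
        return None, 0.0
    # sorted() is stable, so equal bids keep their insertion order.
    ranked = sorted(active.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    # The winner pays the second-highest submitted bid, if there is one.
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, payment


winner, payment = second_price_auction({"su1": 3.0, "su2": 5.0, "su3": None})
```

In this sketch, entry and monitoring fees are charged outside the auction function, since they do not depend on the allocation.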
Mechanisms and results for “one-shot” auctions with and without entry fees have been well established in the literature. Typically, a symmetric and known strategy is assumed. In our formulation, at each stage of the auction, an SU decides on its action according to the bidding history monitored thus far. The number of participants varies from stage to stage depending on each SU’s valuation and its knowledge regarding other players.
III-B Utility Function
To assess the expenditure in the course of the game, we define the accumulated cost for SU at time as
where is the bidding history observed by SU up to time . The cost includes payment for the spectrum access opportunity, entry and monitoring fees over the history and across the different channels.
The accumulated reward of SU is given by
The utility is thus defined as the accumulated reward over the total cost, i.e.,
The utility function is essentially the revenue to cost ratio of the SU’s actions over time. An SU will try to maximize its utility. Intuitively, when an SU’s valuation is low compared with others, it is beneficial for the SU to stay out so that the entry cost is not incurred unnecessarily. On the other hand, staying out all the time leads to zero accumulation of revenue and starvation of the SU, and thus should be avoided. Optimizing (9) is difficult even in a centralized manner due to the large decision space. Therefore, distributed heuristic learning algorithms are warranted to determine at each SU individually.
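The reward-to-cost ratio can be tracked per SU with a small ledger, sketched below. The sketch assumes that the monitoring fee is charged in every slot, the entry fee only when the SU bids, and the auction payment only when it wins, and that the reward in a winning slot equals the SU's rate; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Accumulated reward and cost of one SU over the auction history."""
    reward: float = 0.0
    cost: float = 0.0

    def record(self, participated, won, rate, payment, entry_fee, monitoring_fee):
        # The monitoring fee is paid whether or not the SU bids.
        self.cost += monitoring_fee
        if participated:
            # The entry fee is paid by every bidder, winner or not.
            self.cost += entry_fee
            if won:
                self.cost += payment
                self.reward += rate

    def utility(self):
        # Ratio of accumulated reward to accumulated cost.
        return self.reward / self.cost if self.cost > 0 else 0.0


ledger = Ledger()
ledger.record(True, True, rate=4.0, payment=1.0, entry_fee=0.5, monitoring_fee=0.5)
ledger.record(False, False, rate=3.0, payment=0.0, entry_fee=0.5, monitoring_fee=0.5)
```

The second call shows why staying out is not free: the monitoring fee still increases the denominator, so an SU that never bids ends with zero utility.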
At time , an SU decides whether to participate in the auction and, if so, what to bid. If SU ’s decision is to participate (or ), it is straightforward to prove that SU ’s dominant strategy is to bid its own valuation in the second price auction. More formally, we have
The equilibrium of the repeated auction with utility function (9) consists of each bidder using the following strategy at time :
where is a function of SU ’s current valuation and bidding history. The above strategy implies a thresholding criterion for participating in the game. The form of differs in the single channel and multi-channel scenarios, and will be discussed in more detail in subsequent sections.
III-C Repeated Auction in a Single Channel
When there is only a single channel, we can drop in the notation. An SU stays out of bidding if it deems that participation is likely to reduce its payoff. Formally, , if
In (11), the expectation is taken over all possible valuations of SU ’s opponents.
In the first auction, no past history is available. The same thresholding function is applied at each SU under the assumption that the valuations of SUs are independent and identically distributed with cumulative distribution function (CDF) and probability density function (PDF) . Therefore, the CDF and PDF of the second largest valuation among users are and , respectively. Let , for all . The strategy for the first auction is stated as follows.
In the first auction, if and only if , where
Since is the lowest valuation of any SU to participate in the auction, only when all other SUs have a valuation less than will SU with valuation win the auction. Therefore,
To satisfy (11), we have
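The order-statistic quantities used in this argument are standard and can be evaluated directly. The sketch below assumes `n` i.i.d. valuations with CDF value `F_x` and PDF value `f_x` at the point of interest; whether `n` counts the bidding SU itself is left to the caller, and the function names are hypothetical.

```python
def second_largest_cdf(F_x, n):
    """CDF of the second-largest of n i.i.d. values (n >= 2)."""
    return n * F_x ** (n - 1) - (n - 1) * F_x ** n


def second_largest_pdf(F_x, f_x, n):
    """PDF of the second-largest of n i.i.d. values (n >= 2)."""
    return n * (n - 1) * F_x ** (n - 2) * (1.0 - F_x) * f_x


def all_opponents_below(F_threshold, n_opponents):
    """Probability that every one of n_opponents i.i.d. values is below a threshold."""
    return F_threshold ** n_opponents


# Two uniform valuations on [0, 1], evaluated at x = 0.5.
cdf_value = second_largest_cdf(0.5, 2)
```

The last function corresponds to the event described above, in which an SU whose valuation sits exactly at the participation threshold wins only if all of its opponents fall below that threshold.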
Direct evaluation of (11) is difficult after the first auction. This is because the accumulated reward, cost and current valuation are only available to each SU individually (although the auctioneer provides the information regarding the highest bid and associated payment at the end of each stage). Next, we introduce a simple heuristic to approximate the right-hand side of (11). SU maintains a private threshold value , initialized according to Proposition 2. At time , SU updates . Furthermore,
SU ’s action is thus,
At the end of stage , the SU obtains the largest bid and associated payment. If SU chooses to stay out, but the payment of the winner is less than , its is set to the payment amount. Otherwise, remains the same. On the other hand, if SU participates in the auction but either loses the auction or is required to make a payment higher than , its is set to the payment amount. To avoid fluctuation of estimates, a moving average of old and new values can be applied.
The above mechanism is summarized in Algorithm 1.
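The threshold heuristic described above can be sketched as follows. This is an illustrative reading rather than a transcription of Algorithm 1: the smoothing weight, the function names, and the direct comparison of the valuation with the threshold in `decide_bid` are assumptions.

```python
def update_threshold(gamma, participated, won, payment, smoothing=0.5):
    """Return the private threshold after observing the winner's payment.

    smoothing=1.0 sets the threshold to the payment outright; smaller
    values give a moving average of the old and new values.
    """
    if not participated:
        # Stayed out although the channel sold for less than the threshold.
        adjust = payment < gamma
    else:
        # Bid, but lost or had to pay more than the threshold.
        adjust = (not won) or payment > gamma
    if not adjust:
        return gamma
    return (1.0 - smoothing) * gamma + smoothing * payment


def decide_bid(valuation, gamma):
    """Bid the true valuation when it clears the threshold, else stay out (None)."""
    return valuation if valuation >= gamma else None


gamma = 2.0
gamma = update_threshold(gamma, participated=True, won=False, payment=4.0)
```

With a smoothing weight below one, a single unusually high or low payment moves the threshold only part of the way, which is the purpose of the moving average mentioned above.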
III-D Non-Regret Algorithm for the Multi-channel Case
In this section, we will address the spectrum access problem in multi-channel networks. A class of algorithms called regret-matching  is explored. The resulting stationary solution of the learning algorithm exhibits no regret by setting the probability of a particular action proportional to the “regrets” for not having played other actions. In particular, for any two distinct actions at every time , the regret of SU at time for not playing is
with denoting the size of time window. has the interpretation of average payoff that SU would have obtained, if it had bid every time in the past instead of choosing . The expression can be viewed as a measure of the average regret. In the context of spectrum access in multi-channel networks, the alternative actions correspond to participating in the auction game in different channels (each user decides which channel to bid on, and then uses the value for bidding on that channel). The probability for SU to join auctions in channel is a linear function of the regret. Here is a -by- probability vector with . Define as the indication vector for whether or not SU competes in the th channel. The detailed regret-matching algorithm is given in Algorithm 2. The complexity of the algorithm is , and the algorithm can be implemented in a distributed manner. Furthermore, its convergence has been proved in the literature. Once SU chooses the channel to access, its action is decided by Algorithm 1. Note that even though an SU can access only one channel at a time, the bidding history on all data channels is made available through the control channel to all SUs.
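A channel-selection rule of this kind can be sketched in the style of the regret-matching procedure of Hart and Mas-Colell. The sketch below is not Algorithm 2 itself: it averages regrets over the full history instead of a sliding window, it assumes the SU can compute the payoff it would have obtained on every channel from the broadcast bidding history, and the normalization constant `mu` and all names are hypothetical.

```python
import random


class RegretMatcher:
    """Regret matching over channels for one SU (illustrative sketch)."""

    def __init__(self, num_channels, mu, rng=None):
        self.num_channels = num_channels
        self.mu = mu          # assumed normalization constant (> 0)
        self.slots = 0        # number of observed slots
        self.current = 0      # channel chosen in the latest slot
        # diff_sum[j][k]: summed payoff gain of k over j in slots where j was played.
        self.diff_sum = [[0.0] * num_channels for _ in range(num_channels)]
        self.rng = rng or random.Random()

    def observe(self, payoffs):
        """payoffs[k] is the payoff channel k would have given in this slot."""
        j = self.current
        self.slots += 1
        for k in range(self.num_channels):
            self.diff_sum[j][k] += payoffs[k] - payoffs[j]

    def probabilities(self):
        """Switching probabilities proportional to positive average regret."""
        j = self.current
        probs = [0.0] * self.num_channels
        if self.slots > 0:
            for k in range(self.num_channels):
                if k != j:
                    avg_regret = max(self.diff_sum[j][k] / self.slots, 0.0)
                    probs[k] = avg_regret / self.mu
        switch_mass = sum(probs)
        if switch_mass > 1.0:
            # Guard for a mu chosen too small: rescale to a valid distribution.
            probs = [p / switch_mass for p in probs]
            switch_mass = 1.0
        probs[j] = 1.0 - switch_mass  # remaining mass stays on the current channel
        return probs

    def choose(self):
        weights = self.probabilities()
        self.current = self.rng.choices(range(self.num_channels), weights=weights)[0]
        return self.current


matcher = RegretMatcher(num_channels=2, mu=10.0, rng=random.Random(1))
matcher.observe([1.0, 3.0])  # channel 1 would have paid 2.0 more than channel 0
probs = matcher.probabilities()
```

A larger `mu` makes the SU more reluctant to leave its current channel, which trades adaptation speed for stability.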
In this section, we investigate the performance of the proposed schemes by simulations. We construct a network of dimensions m-by-m, in which the SUs are randomly placed. All SUs transmit to a base station at a fixed location m away from the center of the network. The propagation loss exponent is set to be . The common transmission power level of all SUs is set to 100 mW and the noise level to -90 dBm. A unit bandwidth is assumed with frame length at 100s, and Doppler frequency 100 Hz. We set and . The proposed schemes are compared to a myopic scheme, in which SUs always participate in the auctions.
First, we consider a simple 2-user case to understand the convergence of the proposed algorithm. Figure 2 shows a snapshot of the change of the utility function for user over time. The entry and monitoring fees are fixed at and , respectively. From the figure, we can see that the proposed scheme converges quickly and then tracks the changes in the channel. Furthermore, compared to the myopic scheme in which the SUs always bid, the utilities attained are higher for both users. This is because the SUs can decide whether to bid or not based on their valuations and the outcome of past auctions. Between time and , a primary user is active, and all SUs stay out of the auction but still pay the monitoring fee. The average value of gamma decreases during that period of time. After the primary user stops transmitting, the auction game resumes. Since the effects of a primary user are very predictable, in the remaining simulations we assume the primary user is always idle.
In Figure 3, we demonstrate the effects of entry and monitoring costs on performance. The proposed scheme is shown to achieve better performance in all cases and the average gain is up to 15%. We can see that when the monitoring fee is fixed, as the entry fee () increases, the average utility decreases. This is expected as it is more expensive to participate in the auction. Furthermore, the gap between the utilities attained by the proposed scheme and the myopic scheme also increases. This is because in the proposed scheme SUs are selective and would participate in the auction only if they are likely to win. The myopic scheme would incur high losses in revenue as the result of a higher entry cost.
Clearly, is between and . The larger the value, the better the fairness is. We can see that the proposed scheme results in fairer resource allocation compared with the myopic bidding scheme. As the entry cost increases, the fairness of the myopic scheme also decreases. This is because users experiencing worse channels are repeatedly penalized by losing the game and paying entry fees. In comparison, when and , the proposed scheme achieves slightly better fairness as the entry cost increases.
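Assuming the fairness measure discussed here is the index from the cited report by Jain, Chiu, and Hawe, it can be computed from the per-user utilities as sketched below. The function name and the convention of returning 1 for an all-zero allocation are assumptions.

```python
def jain_fairness(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1 when all users receive the same amount and 1/n when a
    single user receives everything.
    """
    total = sum(values)
    squares = sum(v * v for v in values)
    if squares == 0:
        return 1.0  # assumed convention: an all-zero allocation is treated as fair
    return total * total / (len(values) * squares)


equal_share = jain_fairness([1.0, 1.0, 1.0, 1.0])
single_winner = jain_fairness([1.0, 0.0, 0.0, 0.0])
```

The two calls illustrate the extremes of the index for four users.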
In the next set of experiments, we set the number of SUs to be . Figure 5 () compares the SUs’ average valuation , average bid and instantaneous bid at time , respectively. Several observations are in order. First, users with a higher average valuation generally also have a higher average bid, though not always. This is because the average bid also includes the case in which an SU stays out (treated as a zero bid). Second, as expected, not all the users bid in each slot; only the users with low cost and a high chance of winning participate.
In Figure 6 and Figure 7, we show the utility and fairness as a function of the number of SUs varying from 2 to 16. The costs are set as and , respectively. We can see that as the number of SUs increases, the utility decreases due to limited radio resources. The fairness also decreases since there might be more chances for users to dominate when the number of users is large. The proposed scheme has better performance in both utility and fairness, compared with the myopic scheme. The gain in utility ranges from 12% to 25%.
In this set of experiments, we study the performance and convergence of a two-user two-channel case. The parameters are set as , , and . Three schemes are compared, namely, Best Channel Bidding (BCB), Genie-Aided (GA), and Non-Regret Learning (NRL). In BCB, the users choose to bid on the channel with the highest channel gain. In GA, a genie tells the SUs not to bid on a channel that they would not win and instead to bid on the other channels. The GA solution is thus the performance upper bound for practical systems. We can see that BCB has the worst performance since the SUs might bid on the same channel while the other channels are vacant. The proposed NRL solution, on the other hand, performs close to the GA solution, and can be easily implemented in a distributed manner.
In this paper, we have investigated the problem of spectrum access in single and multi-channel cognitive radio networks. A repeated auction based framework has been adopted. In single-channel spectrum access, SUs selectively participate in the auction based on their valuation and past auction history. This scheme has been shown to outperform a myopic scheme in which SUs always compete. In multi-channel networks, a non-regret approach has been proposed. Its performance has been shown to be significantly better than a naive greedy solution and to come close to that of the genie-aided solution. As future work, we plan to improve on the convergence speed and optimality of the proposed scheme. Also of interest is the study of robust mechanisms for situations in which the monitored information may be inaccurate.
This research was supported in part by the Air Force Office of Scientific Research under Grant FA 9550-08-1-0480 and the National Science Foundation under Grant CNS-0832084.
[1] Federal Communications Commission, “Spectrum policy task force report,” Report ET Docket no. 02-135, Nov. 2002.
[2] S. Haykin, “Cognitive radio: Brain-empowered wireless communications,” IEEE J. Select. Areas Commun., vol. 23, pp. 201–220, Feb. 2005.
[3] K. Lee and A. Yener, “Throughput enhancing cooperative spectrum sensing strategies for cognitive radios,” in Proc. Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2007.
[4] D. Cabric, S. M. Mishra, and R. W. Brodersen, “Implementation issues in spectrum sensing for cognitive radios,” in Proc. Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2004.
[5] W. Zhang and K. Ben Letaief, “Cooperative spectrum sensing with transmit and relay diversity in cognitive networks,” IEEE Trans. Wireless Commun., vol. 7, pp. 4761–4766, Dec. 2008.
[6] J. Unnikrishnan and V. Veeravalli, “Cooperative sensing for primary detection in cognitive radio,” IEEE J. Select. Topics Signal Processing, vol. 2, no. 1, pp. 18–27, Feb. 2008.
[7] B. Wang, K. J. R. Liu, and T. Clancy, “Evolutionary game framework for behavior dynamics in cooperative spectrum sensing,” in Proc. IEEE Global Commun. Conf., New Orleans, LA, Dec. 2008.
[8] S. Huang, X. Liu, and Z. Ding, “Optimal sensing-transmission structure for dynamic spectrum access,” in Proc. Int. Conf. on Computer Communications (INFOCOM), Rio de Janeiro, Brazil, Apr. 2009.
[9] M. Maskery, V. Krishnamurthy, and Q. Zhao, “Decentralized dynamic spectrum access for cognitive radios: Cooperative design of a non-cooperative game,” IEEE Trans. Commun., vol. 57, no. 2, pp. 459–469, Feb. 2009.
[10] S. Subramani, T. Başar, S. Armour, D. Kaleshi, and Z. Fan, “Noncooperative equilibrium solutions for spectrum access in distributed cognitive radio networks,” in Proc. IEEE DySPAN, Chicago, IL, Oct. 2008.
[11] M. Bloem, T. Alpcan, and T. Başar, “A Stackelberg game for power control and channel allocation in cognitive radio networks,” in Proc. ICST/ACM Gamecomm Workshop, Nantes, France, Oct. 2007.
[12] D. I. Kim, L. B. Le, and E. Hossain, “Joint rate and power allocation for cognitive radios in dynamic spectrum access environment,” IEEE Trans. Wireless Commun., vol. 7, no. 12, Dec. 2008.
[13] E. W. M. Wong and C. Foh, “Analysis of cognitive radio spectrum access with finite user population,” IEEE Commun. Letters, vol. 13, no. 5, pp. 294–296, May 2009.
[14] L. B. Le and E. Hossain, “OSA-MAC: A multi-channel MAC protocol for opportunistic spectrum access in cognitive radio networks,” in Proc. IEEE Wireless Commun. and Networking Conf., Las Vegas, NV, Apr. 2008.
[15] A. Danak and S. Mannor, “Bidding efficiently in repeated auctions with entry and observation costs,” in Proc. International Conference on Game Theory for Networks (Gamenets), Istanbul, Turkey, May 2009.
[16] V. Krishna, Auction Theory, Elsevier Science, Amsterdam, Netherlands, 2002.
[17] Z. Han and K. J. R. Liu, Resource Allocation for Wireless Networks: Basics, Techniques, and Applications, Cambridge University Press, Cambridge, UK, Apr. 2008.
[18] E. Hossain, D. Niyato, and Z. Han, Dynamic Spectrum Access in Cognitive Radio Networks, Cambridge University Press, Cambridge, UK, 2009.
[19] S. Hart and A. Mas-Colell, “A simple adaptive procedure leading to correlated equilibrium,” Econometrica, vol. 68, no. 5, pp. 1127–1150, Sept. 2000.
[20] S. A. Matthews, “A technical primer on auction theory I: Independent private values,” Discussion Papers 1096, Northwestern University, Center for Mathematical Studies in Economics and Management Science, Evanston, IL, May 1995.
[21] R. Jain, D. M. Chiu, and W. Hawe, “A quantitative measure of fairness and discrimination for resource allocation in shared systems,” DEC Research Report TR-301, Sept. 1984.