# Spectrum Coordination in Energy Efficient Cognitive Radio Networks

## Abstract

Device coordination in open spectrum systems is a challenging problem, particularly since users experience varying spectrum availability over time and location. In this paper, we propose a game-theoretic approach that allows cognitive radio pairs, namely the primary user (PU) and the secondary user (SU), to update their transmission powers and frequencies simultaneously. Specifically, we address a Stackelberg game model in which individual users attempt to access the wireless spectrum hierarchically while maximizing their energy efficiency. A thorough analysis of the existence, uniqueness and characterization of the Stackelberg equilibrium is conducted. In particular, we show that spectrum coordination naturally occurs when both actors in the system decide sequentially about their powers and their transmitting carriers. As a result, spectrum sensing in such a situation reduces to a simple detection of the presence or absence of a transmission on each sub-band. We also show that when users experience very different channel gains on their two carriers, they may choose to transmit on the same carrier at the Stackelberg equilibrium, as this contributes enough energy efficiency to outweigh the interference degradation caused by the mutual transmission. We then provide an algorithmic analysis of how the PU and the SU can reach such spectrum coordination using an appropriate learning process. We validate our results through extensive simulations and compare the proposed algorithm to some typical scenarios, including the non-cooperative case in [1] and throughput-based-utility systems. Typically, it is shown that the proposed Stackelberg decision approach optimizes the energy efficiency while still maximizing the throughput at the equilibrium.

## Index Terms

Cognitive Radio Networks; Multi-carrier systems; Energy Efficiency; Spectrum Coordination; Game Theory; Learning; Sensing.

## 1 Introduction

Cognitive radio technology was first proposed to increase the throughput of mobile terminals in the next generation of wireless technologies [2]. This enhancement is made possible by an efficient use of the wireless spectrum, and specifically of spectrum holes. Indeed, PUs that have specific, licensed access to the spectrum leave part of it unused at different times and geographic locations. Much work has been done on optimizing the behavior of SUs in cognitive radio networks (CRNs); see [3] for a survey. However, most previous works focus on spectrum sharing [4] or on CRN and interference avoidance [5]. Consequently, the energy efficiency aspect in this setting has been largely ignored. Green communications are attracting growing attention for various economic and environmental reasons. This has led the research community to focus more on reducing energy consumption by introducing enhanced networking technologies [6], [7]. Motivated by the limited battery life of mobile terminals, irrespective of the transmission rate, green networking has spurred great interest in recent years. In the literature, an energy efficient power control game was first proposed by Goodman et al. in [8] for flat fading channels and re-used by [1] for multi-carrier code-division multiple access (CDMA) systems and by [9] for relay-assisted DS/CDMA. Most of these works do not consider cognitive radio technology and therefore the capabilities of the secondary users.

In CRNs, interference management is very important since the interference due to spectrum sharing can significantly degrade the overall performance. In existing work, various resource allocation methods have been proposed to either improve energy efficiency or alleviate interference. However, very little research has addressed their joint interaction. In [10], the authors considered that primary and secondary users' signals coexist in the same frequency band, and the transmit powers of the SUs are constrained so that the interference from the whole secondary network to each PU does not exceed a prescribed threshold. They formulated the problem as a non-cooperative power control game and proved the existence of a unique Nash equilibrium (NE). The work in [11] provides an energy efficient game perspective on the problem of contention-based synchronization in orthogonal frequency-division multiple access (OFDMA) communication systems. Each user trades off its available resources so as to selfishly maximize its own revenue (in terms of probability of correct detection) while saving as much energy as possible and satisfying quality-of-service (QoS) requirements (in terms of probability of false alarm and timing estimation accuracy). In [12], the authors study the gradual removal problem in energy efficient wireless networks. That is, any transmitting user whose required transmit power for reaching its target SIR exceeds its maximum power is temporarily removed, but resumes its transmission if its required transmit power goes below a given threshold obtained in a distributed manner. Thus all transmitting users reach their target rate while consuming the minimum aggregate transmit power.

We consider in this paper a hierarchical (Stackelberg) game model of a CRN in which the PU is the leader and the SU is the follower of the game. It is noteworthy that in our paper we consider the spectrum underlay concept in which the PU experiences interference from the SU. Most of the current work has been focusing on the spectrum sharing between cognitive radio pairs, where cognitive radio nodes dynamically detect spectrum holes of primary spectrum users and opportunistically utilize them in frequency and time [3]. We formally prove that the hierarchical structure of the game induces a spectrum coordination between the different components of the network in such a way that they transmit on distinct carriers. This coordination property across the multiple interfering devices is particularly appealing not only from an implementation perspective, but also due to its low complexity, smaller overhead, and ability for radio resource management (see as an example [13] for open spectrum ad-hoc networks, [14] for multi-cell MIMO systems and [15] for cellular downlink networks).

There are many motivations for studying wireless networks with hierarchical structures, but the most important ones concern network efficiency and modeling. The Stackelberg game was first proposed for economic problems and has also been used in biology for modeling optimal behaviors against nature [16]. It is in fact a natural mechanism for wireless networks in which some nodes have priority access to the medium whereas others are equipped with cognitive sensors, as in CRNs (see [17], one of the first references to address a multi-leader and multi-follower game-theoretic model for CR spectrum sharing networks). This is also a natural setting for heterogeneous wireless networks due to the absence of coordination among the small cells and between small cells and macro cells [18, 19]. At the core lies the idea that the utility of the leader at the Stackelberg equilibrium is higher than his utility at the NE, where the two users play simultaneously. This is due to the Stackelberg mechanism, in which the leader anticipates the follower's action. It has been proved in [20] that this result also holds for the follower. The goal is then to find a Stackelberg equilibrium in this two-step game [21].

The original contributions of this paper are threefold:

• Introducing the concept of hierarchy in power control games for energy efficient multi-carrier systems,

• Characterizing the Stackelberg equilibrium completely and analytically, and comparing the results obtained in the proposed hierarchical game with those obtained in the non-cooperative game in [1],

• Showing, as our main result, that we always obtain an equilibrium (contrary to the work in [1]) where, in the most general cases, the two users transmit on distinct carriers, delivering a binary channel assignment.

The organization of the paper is the following. First, we introduce in Section 2 the CRN context and the different decision makers of the system. In Section 3, we define the energy efficiency framework which is used throughout the paper and present the game theoretic model in Section 4. Next, in Section 5, we characterize the Stackelberg equilibrium by providing a thorough analysis on the existence and uniqueness of such an equilibrium. Having these results, we then address the important property of spectrum coordination in Section 6. Section 7 provides an analysis of the implementation issues including a learning algorithm that ensures convergence to the Stackelberg equilibrium in 7.1 and the sensing issue in 7.2. Section 8 illustrates some numerical results and Section 9 concludes the paper.

## 2 The Cognitive Radio System Model

We consider a network composed of a PU (the leader, indexed by $1$), having priority access to the medium, and an SU (the follower, indexed by $2$) that accesses the medium after observing the action of the PU, subject to mutual interference. We assume slotted transmissions over two carriers for both the PU and the SU. The equivalent baseband signal received by the base station can be written as

$$y^k = h_1^k x_1^k + h_2^k x_2^k + z^k, \quad \text{for } k = 1, 2 \tag{1}$$

where $h_n^k$ stands for the block fading process of user $n$ on sub-band $k$, $x_n^k$ is the signal transmitted by user $n$ on sub-band $k$, and $z^k$ is the additive Gaussian noise at the $k$th sub-band. Throughout, the subscript indexes the user and the superscript indexes the carrier. We denote by $g_n^k = |h_n^k|^2$ the fading channel gain, which is assumed to stay constant over each block fading length (i.e., coherent communication). We model the fading coefficients as i.i.d. Rayleigh distributed. The transmitted signal $x_n^k$ carries the data of user $n$ with transmit power $p_n^k$, so that $\mathbb{E}\left[|x_n^k|^2\right] = p_n^k$. The additive Gaussian noise at the receiver is i.i.d. circularly symmetric with variance $\sigma^2$ for $k = 1, 2$. For any user $n$, the received signal-to-interference-plus-noise ratio (SINR) over carrier $k$ is expressed as

$$\gamma_n^k = \frac{g_n^k p_n^k}{\sigma^2 + \sum_{m=1,\, m \neq n}^{2} g_m^k p_m^k} := p_n^k \hat{h}_n^k. \tag{2}$$

In the remainder, we refer to the ratio between the SINR and the transmission power as the effective channel gain $\hat{h}_n^k$. It follows from the above SINR expression that the strategy chosen by a user (i.e., the power vector $\mathbf{p}_n = (p_n^1, p_n^2)$) may affect the performance of the other user in the network through the multiple-access interference reflected by the effective channel gain.
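As an illustration of Eq. (2), the following minimal sketch computes the SINR and the effective channel gain for two users on two carriers. The function names, the zero-based indexing (`g[0]` for the PU, `g[1]` for the SU) and all numerical values are assumptions made for this example only; they are not taken from the paper.

```python
def effective_gain(g, p, sigma2):
    """Effective channel gain h_hat[n][k]: SINR of user n on carrier k per unit of power."""
    users = range(len(g))
    carriers = range(len(g[0]))
    h_hat = []
    for n in users:
        row = []
        for k in carriers:
            # Interference seen by user n on carrier k comes from all other users.
            interference = sum(g[m][k] * p[m][k] for m in users if m != n)
            row.append(g[n][k] / (sigma2 + interference))
        h_hat.append(row)
    return h_hat


def sinr(g, p, sigma2):
    """SINR gamma[n][k] = p[n][k] * h_hat[n][k], as in Eq. (2)."""
    h_hat = effective_gain(g, p, sigma2)
    return [[p[n][k] * h_hat[n][k] for k in range(len(g[0]))] for n in range(len(g))]


# Hypothetical channel gains and powers: the PU uses carrier 1, the SU uses carrier 2.
g = [[1.0, 0.5], [0.8, 0.4]]
p = [[2.0, 0.0], [0.0, 1.0]]
gamma = sinr(g, p, sigma2=0.1)
```

Because the two users occupy different carriers in this example, neither SINR contains an interference term, while the effective gain of the SU on carrier 1 is reduced by the PU's transmission there.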

## 3 Network Energy Efficiency Analysis

Our system model is based on the seminal paper [8], which defines the energy efficiency framework. In order to formulate the power control problem as a game, we first need to define a utility function suitable for data applications. Increasing the transmit power clearly favors the packet success rate and therefore the throughput. However, as the packet success rate tends to one, further increasing the power yields only marginal throughput gains relative to the extra power used. The following utility function measures the corresponding trade-off between the transmission benefit (total throughput over both carriers) and cost (total power over both carriers):

$$u_n(\mathbf{p}_1, \mathbf{p}_2) = \frac{R_n \cdot \left(f(\gamma_n^1) + f(\gamma_n^2)\right)}{p_n^1 + p_n^2} \tag{3}$$

where $R_n$ is the transmission rate of user $n$ and $f(\cdot)$ is an increasing, continuous and S-shaped efficiency function which measures the packet success rate. A more detailed discussion of the efficiency function can be found in [22]. The utility function, which has units of bits per joule, captures the trade-off between throughput and battery life and is particularly suitable for applications where energy efficiency is crucial, such as sensors and mobile terminals.
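The utility of Eq. (3) can be sketched in a few lines. The efficiency function used below, $f(x) = (1 - e^{-x})^M$, is an assumption: it is a common sigmoidal choice in the energy-efficient power control literature, but this section does not fix it. The names `efficiency` and `utility`, the rate and the channel values are likewise hypothetical.

```python
import math


def efficiency(x, M=100):
    # Assumed sigmoidal efficiency function f(x) = (1 - exp(-x))^M.
    return (1.0 - math.exp(-x)) ** M


def utility(n, g, p, sigma2, rate, f=efficiency):
    """Energy efficiency of user n in bits per joule, following Eq. (3)."""
    total_power = p[n][0] + p[n][1]
    if total_power == 0.0:
        return 0.0  # a silent user delivers no bits and spends no energy
    users = range(len(g))
    throughput = 0.0
    for k in range(2):
        interference = sum(g[m][k] * p[m][k] for m in users if m != n)
        gamma = g[n][k] * p[n][k] / (sigma2 + interference)
        throughput += f(gamma)
    return rate * throughput / total_power


# Hypothetical values: the PU alone on carrier 1, the SU alone on carrier 2.
g = [[1.0, 0.5], [0.8, 0.4]]
p = [[2.0, 0.0], [0.0, 1.0]]
u_pu = utility(0, g, p, sigma2=0.1, rate=1e6)
```

With these numbers the PU's SINR is 20, so its packet success rate is essentially one and its utility is close to the rate divided by the transmit power.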

## 4 The Game Theoretic Framework

### 4.1 The non-cooperative game problem

An important solution concept for the game under consideration is the NE [23], which is a fundamental concept in non-cooperative strategic games. It is a vector of strategies (or actions in our case) $\mathbf{p}^{NE} = (\mathbf{p}_1^{NE}, \mathbf{p}_2^{NE})$, one for each player, such that no player has an incentive to deviate unilaterally, i.e., $u_n(\mathbf{p}_n^{NE}, \mathbf{p}_{-n}^{NE}) \geq u_n(\mathbf{p}_n, \mathbf{p}_{-n}^{NE})$ for every action $\mathbf{p}_n$, where the subscript $-n$ on a vector stands for "except user $n$". In [8], the authors showed that, under certain conditions, the NE of the game with utility (3) exists.

### 4.2 The hierarchical game formulation

In this work, we consider a Stackelberg game framework in which the PU first decides his power control vector $\mathbf{p}_1 = (p_1^1, p_1^2)$ and, based on this, the SU adapts his power control vector $\mathbf{p}_2 = (p_2^1, p_2^2)$.

###### Definition 1.

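(Stackelberg equilibrium): A vector of actions $\tilde{\mathbf{p}} = (\tilde{\mathbf{p}}_1, \tilde{\mathbf{p}}_2)$ is called a Stackelberg equilibrium (SE) if and only if: $\tilde{\mathbf{p}}_1 = \arg\max_{\mathbf{p}_1} u_1(\mathbf{p}_1, \bar{\mathbf{p}}_2(\mathbf{p}_1))$, where $\bar{\mathbf{p}}_2(\mathbf{p}_1) = \arg\max_{\mathbf{p}_2} u_2(\mathbf{p}_1, \mathbf{p}_2)$ and $\tilde{\mathbf{p}}_2 = \bar{\mathbf{p}}_2(\tilde{\mathbf{p}}_1)$.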

An SE can be determined using a bi-level approach [21]. Given the action of the PU, we compute the best-response function of the SU (the function $\bar{\mathbf{p}}_2(\cdot)$), i.e., the action of the SU which maximizes his utility given the action of the PU. This best-response function is characterized using a result from [1], and depends on the PU's power control on carrier $k$ through the following expression:

$$\forall k \in \{1, 2\}, \quad \hat{h}_2^k(p_1^k) = \frac{\gamma_2^k}{p_2^k} = \frac{g_2^k}{\sigma^2 + g_1^k p_1^k}.$$

## 5 Characterization of the Stackelberg Equilibrium

In order to determine the SE, a standard approach is backward induction. We therefore first determine the best-response function of the SU as a function of the action of the PU. This result comes directly from Proposition 1 of [1]. To keep this paper self-contained, we restate that proposition here.

### 5.1 The secondary user’s power control vector

###### Proposition 1.

(Given in [1])

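Given the power control vector $\mathbf{p}_1$ of the PU, the best-response function of the SU is given by

$$\bar{p}_2^k(\mathbf{p}_1) = \begin{cases} \dfrac{\gamma^*}{\hat{h}_2^k(p_1^k)}, & \text{for } k = L_2, \\ 0, & \text{for all } k \neq L_2, \end{cases} \tag{4}$$

with $L_2 = \arg\max_k \hat{h}_2^k(p_1^k)$, and $\gamma^*$ is the unique (positive) solution of the first order equation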

$$x f'(x) = f(x). \tag{5}$$

Equation (5) has a unique solution if the efficiency function is sigmoidal [24], and we will use this assumption throughout our paper.
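To make Eq. (5) concrete, the sketch below solves it numerically under one specific assumption: the efficiency function is taken to be $f(x) = (1 - e^{-x})^M$, a common sigmoidal choice that is not fixed by the text at this point. For this $f$, the condition $x f'(x) = f(x)$ reduces to $e^x - 1 = Mx$, whose positive root can be found by bisection. The function name `gamma_star` is hypothetical.

```python
import math


def gamma_star(M=100, tol=1e-12):
    """Positive root of exp(x) - 1 = M * x, i.e. of x f'(x) = f(x) for f(x) = (1 - exp(-x))^M."""
    if M <= 1:
        raise ValueError("M must exceed 1 for a positive root to exist")

    def h(x):
        return math.exp(x) - 1.0 - M * x

    # h is convex with its minimum at x = ln(M), where it is negative,
    # so the positive root lies to the right of ln(M).
    lo = math.log(M)
    hi = 2.0 * lo + 1.0
    while h(hi) <= 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


gs = gamma_star(100)
```

Under this assumed $f$ and $M = 100$, the root lies between 6.4 and 6.5, i.e. roughly 8.1 dB.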

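Proposition 1 states that there are two regions, depending on the PU's power control, which yield different best-response functions for the SU. Below, we define the two regions:

$$\begin{aligned} A &= \left\{(p_1^1, p_1^2) \,\middle|\, \hat{h}_2^2 \geq \hat{h}_2^1\right\} \\ &= \left\{(p_1^1, p_1^2) \,\middle|\, p_1^2 \leq p_1^1 \frac{g_1^1 g_2^2}{g_1^2 g_2^1} + \frac{\sigma^2 (g_2^2 - g_2^1)}{g_1^2 g_2^1}\right\} \end{aligned} \tag{6}$$

and

$$\begin{aligned} B &= \left\{(p_1^1, p_1^2) \,\middle|\, \hat{h}_2^2 < \hat{h}_2^1\right\} \\ &= \left\{(p_1^1, p_1^2) \,\middle|\, p_1^2 > p_1^1 \frac{g_1^1 g_2^2}{g_1^2 g_2^1} + \frac{\sigma^2 (g_2^2 - g_2^1)}{g_1^2 g_2^1}\right\}. \end{aligned} \tag{7}$$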
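The best response of the SU and the two regions of Eqs. (6) and (7) can be sketched as follows. This is an illustrative reading of Proposition 1, in which the SU transmits only on the carrier with the larger effective channel gain and sets its power so that its SINR equals the target. The function names, the zero-based indexing (`g[0]` for the PU, `g[1]` for the SU) and the numerical values, including the target SINR of 6.4, are assumptions.

```python
def su_effective_gains(p1, g, sigma2):
    """Effective gains of the SU on both carriers, given the PU powers p1 = (p11, p12)."""
    return [g[1][k] / (sigma2 + g[0][k] * p1[k]) for k in range(2)]


def region(p1, g, sigma2):
    """Region A if the SU's second carrier is at least as good as its first, else Region B."""
    h1, h2 = su_effective_gains(p1, g, sigma2)
    return "A" if h2 >= h1 else "B"


def region_closed_form(p1, g, sigma2):
    """Same test, written with the explicit boundary of Eqs. (6) and (7)."""
    (g11, g12), (g21, g22) = g
    boundary = p1[0] * g11 * g22 / (g12 * g21) + sigma2 * (g22 - g21) / (g12 * g21)
    return "A" if p1[1] <= boundary else "B"


def su_best_response(p1, g, sigma2, gamma_star):
    """Transmit on the best carrier only, with power gamma_star / effective gain."""
    h = su_effective_gains(p1, g, sigma2)
    best = 1 if h[1] >= h[0] else 0
    p2 = [0.0, 0.0]
    p2[best] = gamma_star / h[best]
    return tuple(p2)


g = [[1.0, 0.5], [0.8, 0.6]]  # hypothetical channel gains
p2 = su_best_response((0.64, 0.0), g, sigma2=0.1, gamma_star=6.4)
```

In this example the PU occupies the first carrier, so the SU's effective gain is larger on the second carrier (Region A) and the SU transmits there.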

### 5.2 The primary user’s power control vector

So far, we have seen that the best-response function of the SU is to use only one carrier, the one with the best effective channel gain. Let us now study the optimal power control for the PU knowing the best-response function of the SU. The following proposition, which is our first main result, gives the existence and uniqueness of the optimal power control of the PU at the SE knowing the best-response function of the SU. Notice that uniqueness of the SE is a desirable property for a Stackelberg game. If there exists exactly one equilibrium, we can predict the equilibrium strategy of the players and resulting performance of the system.

###### Proposition 2.

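(First main result) Existence and uniqueness of the PU's power control at the SE
There exists a unique power control vector for the PU which maximizes his energy efficiency over Region $A$. It is defined by:

$$\tilde{p}_1^2 = 0, \quad \text{and} \quad \tilde{p}_1^1 = \begin{cases} \dfrac{\sigma^2 \gamma^*}{g_1^1}, & \text{if } \dfrac{g_2^2}{g_2^1} \geq \dfrac{1}{1 + \gamma^*}, \\[2ex] \dfrac{\sigma^2 (g_2^1 - g_2^2)}{g_1^1 g_2^2}, & \text{otherwise.} \end{cases}$$

There exists a unique power control vector for the PU which maximizes his energy efficiency over Region $B$. It is defined by:

$$\tilde{p}_1^1 = 0, \quad \text{and} \quad \tilde{p}_1^2 = \begin{cases} \dfrac{\sigma^2 \gamma^*}{g_1^2}, & \text{if } \dfrac{g_2^2}{g_2^1} \leq 1 + \gamma^*, \\[2ex] \dfrac{\sigma^2 (g_2^2 - g_2^1)}{g_1^2 g_2^1}, & \text{otherwise.} \end{cases}$$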

For the clarity of the exposition, this proposition is proven in Appendix .1.

This result combined with the result of Prop. 1 yields the existence of a SE.
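The two candidate power vectors of Prop. 2 translate directly into code. The sketch below is only an illustration: the function names, the zero-based indexing (`g[0]` for the PU, `g[1]` for the SU) and the numerical values, including the target SINR of 6.4, are assumptions.

```python
def pu_power_region_a(g, sigma2, gamma_star):
    """PU's optimal power vector over Region A: only the first carrier is used."""
    (g11, _), (g21, g22) = g
    if g22 / g21 >= 1.0 / (1.0 + gamma_star):
        return (sigma2 * gamma_star / g11, 0.0)
    # Otherwise the PU sits on the boundary between Regions A and B.
    return (sigma2 * (g21 - g22) / (g11 * g22), 0.0)


def pu_power_region_b(g, sigma2, gamma_star):
    """PU's optimal power vector over Region B: only the second carrier is used."""
    (_, g12), (g21, g22) = g
    if g22 / g21 <= 1.0 + gamma_star:
        return (0.0, sigma2 * gamma_star / g12)
    return (0.0, sigma2 * (g22 - g21) / (g12 * g21))


# Hypothetical gains: similar SU carriers, then a deep fade on the SU's second carrier.
similar = [[1.0, 0.5], [0.8, 0.6]]
deep_fade = [[1.0, 0.5], [1.0, 0.1]]
pa = pu_power_region_a(similar, sigma2=0.1, gamma_star=6.4)
pb = pu_power_region_b(similar, sigma2=0.1, gamma_star=6.4)
pa_fade = pu_power_region_a(deep_fade, sigma2=0.1, gamma_star=6.4)
```

In the deep-fade example, the returned power is exactly the one that equalizes the SU's effective channel gains on the two carriers, which is the boundary between the two regions.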

###### Corollary 1.

At the Stackelberg equilibrium, when the channel gains of the SU satisfy

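$$\frac{1}{1 + \gamma^*} \leq \frac{g_2^1}{g_2^2} \leq 1 + \gamma^*, \tag{8}$$

the power control vector which maximizes the PU's utility is unique and is given by

$$\tilde{p}_1^k = \begin{cases} \dfrac{\sigma^2 \gamma^*}{g_1^k}, & \text{for } k = \tilde{k}, \\ 0, & \text{for all } k \neq \tilde{k}, \end{cases} \tag{9}$$

where $\tilde{k}$ denotes the "best" carrier of the PU, i.e., $\tilde{k} = \arg\max_k g_1^k$.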

###### Proof.

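The proof makes use of the results of Prop. 2 for the PU's power control in Regions $A$ and $B$. The utility of the PU within Region $A$ is maximized when $p_1^2 = 0$, yielding

$$\max_{p_1^1, p_1^2} u_1^A(p_1^1, p_1^2) = \max_{p_1^1} u_1^A(p_1^1, 0) = \max_{p_1^1} \frac{R_1 f\!\left(\frac{g_1^1 p_1^1}{\sigma^2}\right)}{p_1^1} \tag{10}$$

which implies that the maximum utility over Region $A$ is given by $\tilde{u}_1^A = \frac{R_1 g_1^1 f(\gamma^*)}{\sigma^2 \gamma^*}$.

Within Region $B$, the utility of the PU is maximized when $p_1^1 = 0$, yielding

$$\max_{p_1^1, p_1^2} u_1^B(p_1^1, p_1^2) = \max_{p_1^2} u_1^B(0, p_1^2) = \max_{p_1^2} \frac{R_1 f\!\left(\frac{g_1^2 p_1^2}{\sigma^2}\right)}{p_1^2} \tag{11}$$

which implies that the maximum utility within Region $B$ is $\tilde{u}_1^B = \frac{R_1 g_1^2 f(\gamma^*)}{\sigma^2 \gamma^*}$. Combining the above results for Region $A$ (in Eq. (10)) and Region $B$ (in Eq. (11)), the maximization problem of the PU's utility becomes

$$\max_{p_1^1, p_1^2} u_1(p_1^1, p_1^2) = \max\left(\tilde{u}_1^A, \tilde{u}_1^B\right) = \begin{cases} \tilde{u}_1^A, & \text{if } g_1^1 \geq g_1^2, \\ \tilde{u}_1^B, & \text{if } g_1^1 < g_1^2, \end{cases}$$

where we use the fact that the maximum utility in each region is a strictly increasing function of the corresponding channel gain. This completes the proof. ∎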

Condition (8) means that a given user experiences approximately the same channel characteristics over his two carriers. Note that this is typically the case when the two carriers are close enough [25]. Corollary 1 says that the utility of the PU is maximized when he transmits only over his best carrier. Accordingly, we observe that the carrier which does not provide enough energy efficiency to outweigh the interference degradation caused by the SU's transmission is switched "off". Notice that this result contrasts with throughput-based-utility systems, which lead to a water-filling power control [26] where only a certain number of carriers are exploited depending on the channel gains.

To summarize, Prop. 1 and Prop. 2 suggest that, at the SE, both the SU and the PU transmit on only one carrier depending on their channel gains. In the next section, we will show that hierarchy "pushes" users towards coordinating their actions in such a way that they transmit on distinct carriers.

## 6 Spectrum Coordination

### 6.1 General result

A necessary and sufficient condition on the SU's channel gains such that the best response of the SU is to transmit over a different carrier from the PU is given in the following proposition.

###### Proposition 3.

At the Stackelberg equilibrium, if the PU transmits over only one carrier, the SU transmits over a distinct carrier if and only if Condition (8) is satisfied.

The proof of Prop. 3 is given in Appendix .2. Prop. 3 claims that Condition (8) is a necessary and sufficient condition to obtain spectrum coordination. We will see in the next proposition that spectrum coordination can occur even if Condition (8) is not satisfied. In this case, the SE is not unique, as the SU obtains the same utility by choosing to transmit either on a different carrier from the PU (coordination case) or on the same carrier as the PU (non-coordination case).

###### Proposition 4.

(Second main result) Spectrum Coordination
Introducing hierarchy between users in a two-carrier energy efficient power control game induces a natural coordination pattern where users have incentive to choose their transmitting carriers in such a way that they transmit on orthogonal channels.

###### Proof.

To show this important result, we will determine the Stackelberg equilibria of the users depending on their channel gains. As far as the proposed hierarchical model is concerned, the SE can be computed by considering the following possibilities:

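• (a) If $\frac{1}{1 + \gamma^*} \leq \frac{g_2^1}{g_2^2} \leq 1 + \gamma^*$ (i.e., the SU experiences approximately the same radio conditions over his two carriers),

• (i) if $g_1^1 < g_1^2$, then, by Corollary 1, the PU transmits on his second carrier, which is his best one, and the SU transmits on the first carrier.
The SE is then given by:

$$(\tilde{p}_1^1, \tilde{p}_1^2, \tilde{p}_2^1, \tilde{p}_2^2) = \left(0, \frac{\gamma^* \sigma^2}{g_1^2}, \frac{\gamma^* \sigma^2}{g_2^1}, 0\right), \tag{12}$$
• (ii) otherwise, $g_1^1 \geq g_1^2$, then the PU transmits on his first carrier and the SU transmits on the second carrier.
The SE is then given by:

$$(\tilde{p}_1^1, \tilde{p}_1^2, \tilde{p}_2^1, \tilde{p}_2^2) = \left(\frac{\gamma^* \sigma^2}{g_1^1}, 0, 0, \frac{\gamma^* \sigma^2}{g_2^2}\right). \tag{13}$$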
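• (b) If $\frac{g_2^1}{g_2^2} > 1 + \gamma^*$ (i.e., the SU experiences deep fade on his second carrier compared to his first carrier),

• (i) if $g_1^1 < g_1^2$, then the Stackelberg equilibrium is

$$(\tilde{p}_1^1, \tilde{p}_1^2, \tilde{p}_2^1, \tilde{p}_2^2) = \left(0, \frac{\gamma^* \sigma^2}{g_1^2}, \frac{\gamma^* \sigma^2}{g_2^1}, 0\right). \tag{14}$$
• (ii) otherwise, $g_1^1 \geq g_1^2$, the power control vector of the PU at the SE is

$$(\tilde{p}_1^1, \tilde{p}_1^2) = \begin{cases} \left(\dfrac{\sigma^2 (g_2^1 - g_2^2)}{g_1^1 g_2^2}, 0\right), & \text{if Condition (15) holds}, \\[2ex] \left(0, \dfrac{\sigma^2 \gamma^*}{g_1^2}\right), & \text{otherwise,} \end{cases}$$

where Condition (15) is

$$\frac{g_1^1}{g_1^2} \geq \frac{f(\gamma^*)}{\gamma^*} \cdot \frac{\frac{g_2^1}{g_2^2} - 1}{f\!\left(\frac{g_2^1}{g_2^2} - 1\right)}. \tag{15}$$

The SU transmits on the carrier which is left idle by the PU if Condition (15) is not satisfied. In this case

$$(\tilde{p}_2^1, \tilde{p}_2^2) = \left(\frac{\sigma^2 \gamma^*}{g_2^1}, 0\right).$$

If Condition (15) is satisfied, we have the following best-response function for the SU:

$$\bar{\mathbf{p}}_2\!\left(\frac{\sigma^2 (g_2^1 - g_2^2)}{g_1^1 g_2^2}, 0\right) = \left\{\left(\frac{\sigma^2 \gamma^*}{g_2^2}, 0\right) \text{ or } \left(0, \frac{\sigma^2 \gamma^*}{g_2^2}\right)\right\},$$

because the effective channel gains are equal for both carriers, i.e., $\hat{h}_2^1 = \hat{h}_2^2$. The best-response function is then not unique in this case and the two players can use the same carrier, the first one here. As the SU plays after observing the action of the primary user, the SU can decide, in order to optimize spectrum utilization, to transmit over the carrier left idle by the PU. Moreover, the SU's power is inversely proportional to the channel gain over the second carrier. It is then more convenient for him to transmit over this second carrier.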

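• (c) If $\frac{g_2^1}{g_2^2} < \frac{1}{1 + \gamma^*}$ (i.e., the SU experiences deep fade on his first carrier compared to his second carrier), we have similar results:

• (i) if $g_1^1 \geq g_1^2$, then the SE is

$$(\tilde{p}_1^1, \tilde{p}_1^2, \tilde{p}_2^1, \tilde{p}_2^2) = \left(\frac{\gamma^* \sigma^2}{g_1^1}, 0, 0, \frac{\gamma^* \sigma^2}{g_2^2}\right). \tag{16}$$
• (ii) otherwise, $g_1^1 < g_1^2$, the power control vector of the PU at the SE is

$$(\tilde{p}_1^1, \tilde{p}_1^2) = \begin{cases} \left(0, \dfrac{\sigma^2 (g_2^2 - g_2^1)}{g_1^2 g_2^1}\right), & \text{if Condition (17) holds}, \\[2ex] \left(\dfrac{\sigma^2 \gamma^*}{g_1^1}, 0\right), & \text{otherwise,} \end{cases}$$

where Condition (17) is

$$\frac{g_1^1}{g_1^2} < \frac{\gamma^*}{f(\gamma^*)} \cdot \frac{f\!\left(\frac{g_2^2 - g_2^1}{g_2^1}\right)}{\frac{g_2^2 - g_2^1}{g_2^1}}. \tag{17}$$

The SU transmits on the carrier which is left idle by the PU if Condition (17) is not satisfied. In this case

$$(\tilde{p}_2^1, \tilde{p}_2^2) = \left(0, \frac{\sigma^2 \gamma^*}{g_2^2}\right).$$

If Condition (17) is satisfied, we have the following best-response function for the SU:

$$\bar{\mathbf{p}}_2\!\left(0, \frac{\sigma^2 (g_2^2 - g_2^1)}{g_1^2 g_2^1}\right) = \left\{\left(0, \frac{\sigma^2 \gamma^*}{g_2^1}\right) \text{ or } \left(\frac{\sigma^2 \gamma^*}{g_2^1}, 0\right)\right\},$$

because the effective channel gains are equal for both carriers, i.e., $\hat{h}_2^1 = \hat{h}_2^2$. The best-response function is then not unique in this case and the two players can use the same carrier, the second one here. In this particular case, the SU can decide to transmit over the first carrier in order to optimize the spectrum utilization. Again, as the SU's power is inversely proportional to the channel gain on the first carrier, it is more convenient for him to transmit over this first carrier.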
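The case analysis above can be collected into a single routine. The following sketch is one possible reading of cases (a) to (c), not the authors' implementation. It assumes the efficiency function $f(x) = (1 - e^{-x})^M$, reads the case boundaries from Condition (8), breaks ties in favor of the carrier left idle by the PU as suggested in the text, and writes Conditions (15) and (17) in an equivalent multiplied-out form. All names and numerical values are hypothetical, with `g[0]` holding the PU gains and `g[1]` the SU gains.

```python
import math


def f(x, M=100):
    # Assumed sigmoidal efficiency function.
    return (1.0 - math.exp(-x)) ** M


def stackelberg_equilibrium(g, sigma2, gs):
    """Return (PU power vector, SU power vector) following cases (a)-(c)."""
    (g11, g12), (g21, g22) = g
    ratio = g21 / g22
    pu_on_1, pu_on_2 = (sigma2 * gs / g11, 0.0), (0.0, sigma2 * gs / g12)
    su_on_1, su_on_2 = (sigma2 * gs / g21, 0.0), (0.0, sigma2 * gs / g22)
    if 1.0 / (1.0 + gs) <= ratio <= 1.0 + gs:  # case (a), Condition (8)
        return (pu_on_1, su_on_2) if g11 >= g12 else (pu_on_2, su_on_1)
    if ratio > 1.0 + gs:  # case (b): SU in deep fade on carrier 2
        if g11 < g12:
            return pu_on_2, su_on_1
        x = ratio - 1.0
        if g11 * f(x) / x >= g12 * f(gs) / gs:  # equivalent to Condition (15)
            return (sigma2 * (g21 - g22) / (g11 * g22), 0.0), su_on_2
        return pu_on_2, su_on_1
    # case (c): SU in deep fade on carrier 1
    if g11 >= g12:
        return pu_on_1, su_on_2
    y = (g22 - g21) / g21
    if g12 * f(y) / y > g11 * f(gs) / gs:  # equivalent to Condition (17)
        return (0.0, sigma2 * (g22 - g21) / (g12 * g21)), su_on_1
    return pu_on_1, su_on_2


pu_a, su_a = stackelberg_equilibrium([[1.0, 0.5], [0.8, 0.6]], 0.1, 6.4)
pu_b, su_b = stackelberg_equilibrium([[1.0, 0.5], [1.0, 0.1]], 0.1, 6.4)
pu_c, su_c = stackelberg_equilibrium([[0.5, 1.0], [0.1, 1.0]], 0.1, 6.4)
```

In all three examples the two users end up on different carriers, which is the coordination pattern described by Prop. 4.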

Having treated the case of spectrum coordination, let us now present a particular case (on the fading channel ) where the two players gain by transmitting on the same carrier at the SE.

### 6.2 Extreme Case

In a Stackelberg game, if the leader decides to play a Nash action, then the follower plays the Nash action too, as it is the best response to the Nash action. Then, depending on the ratio , it could be beneficial for the PU to transmit over the same channel as the SU. We will show in the next proposition that this case can arise, essentially when the target SINR at the SE is very low, i.e., and under some conditions on the channel gains.

###### Proposition 5.

At the Stackelberg equilibrium, in Region (resp. Region ), if , both the PU and the SU transmit on the first (resp. second) carrier if

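$$\frac{g_n^1}{g_n^2} \geq \frac{1}{1 - \gamma^*}, \quad \left(\text{resp. } \frac{g_n^1}{g_n^2} \leq 1 - \gamma^*\right), \quad \text{for } n \in \{1, 2\}. \tag{18}$$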

The proof of Prop. 5 is given in Appendix .3. Prop. 5 claims that the probability of the extreme case turns out to be the probability of no coordination between users. Specifically, in the extreme case of Region (), the PU decides to transmit on the same carrier (the second one here), as the second carrier is much better than the first one. In the extreme case of Region (), the channel gain is very bad on the second carrier compared with that on the first carrier, and both users then choose to transmit on the first carrier. Note that, in this case, the SU and the PU transmit over the same carrier using an optimal power control given by the Stackelberg model proposed in [20].

Notice that, in the case of Rayleigh fading channels, the probability of being in the extreme case is given by:

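$$\begin{aligned} \psi(\gamma^*) &= \Pr\left\{\frac{g_1^1}{g_1^2} \geq \frac{1}{1 - \gamma^*}\right\} \cdot \Pr\left\{\frac{g_2^1}{g_2^2} \geq \frac{1}{1 - \gamma^*}\right\} + \Pr\left\{\frac{g_1^1}{g_1^2} \leq 1 - \gamma^*\right\} \cdot \Pr\left\{\frac{g_2^1}{g_2^2} \leq 1 - \gamma^*\right\} \\ &= \left[\int_0^\infty \int_{\frac{y}{1 - \gamma^*}}^\infty e^{-(x + y)} \, dx \, dy\right]^2 + \left[\int_0^\infty \int_0^{(1 - \gamma^*) y} e^{-(x + y)} \, dx \, dy\right]^2 = 2 \cdot \left(\frac{\gamma^* - 1}{\gamma^* - 2}\right)^2. \end{aligned}$$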

Figure 1 depicts the probability of being in the extreme case – which is the probability of no coordination – when . It is shown that the probability of being in the extreme case is always lower than . As increases, the extreme region shrinks resulting in a decrease of the probability of no coordination.
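The closed-form probability above can be checked by simulation. The sketch below assumes, as in the integrals, that the channel gains are independent unit-mean exponential random variables (the power of a Rayleigh fading coefficient) and that the target SINR lies in $[0, 1)$. The function names, the number of trials and the seed are arbitrary choices for this example.

```python
import random


def psi_closed_form(gs):
    """Closed-form probability of the extreme (no coordination) case."""
    return 2.0 * ((gs - 1.0) / (gs - 2.0)) ** 2


def psi_monte_carlo(gs, trials=100_000, seed=1):
    """Empirical frequency of Condition (18) holding for both users at once."""
    rng = random.Random(seed)
    hi, lo = 1.0 / (1.0 - gs), 1.0 - gs
    hits = 0
    for _ in range(trials):
        g11, g12, g21, g22 = (rng.expovariate(1.0) for _ in range(4))
        # Ratios are compared in product form to avoid dividing by a gain.
        both_on_first = g11 >= hi * g12 and g21 >= hi * g22
        both_on_second = g11 <= lo * g12 and g21 <= lo * g22
        hits += both_on_first or both_on_second
    return hits / trials


estimate = psi_monte_carlo(0.5)
exact = psi_closed_form(0.5)
```

For a target SINR of 0.5 the closed form gives 2/9, and the simulated frequency should fall close to that value.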

A global overview of the occupation of the carriers at the SE, as a function of the users' channel gain ratios, is depicted in Figure 2. It illustrates the main contributions of the paper, namely

• we have proved the existence and uniqueness of an equilibrium when a user can observe the action of the other user before deciding his own action, whatever the channel gains are. This result is not true in the case when the two users play a NE (see for instance [1]),

• although we have formulated the problem of energy efficiency maximization by allowing that a carrier could be shared by both users, we have obtained a spectrum coordination pattern in which, to refrain from mutual interference, users have incentive to choose their carriers orthogonally (exactly like in OFDMA systems).

## 7 Implementation Issues

Although Prop. 1 and Prop. 2 guarantee the existence of an SE, it is still not clear whether users will be able to compute this equilibrium in a decentralized environment where only partial/local information is available at the mobile terminal. Consequently, our goal in this section is to study implementation issues related to convergence to the equilibrium and its speed, along with the sensing problem. So far, we have assumed that the channels are static. If the channels fluctuate stochastically over time, the associated game still admits an equilibrium, but the learning process is no longer deterministic; nevertheless, by employing the theory of stochastic approximation, it can be shown that users still converge to equilibrium [27]. In the next subsection, we propose a temporal difference learning algorithm that ensures convergence to the SE within a limited time.

### 7.1 Learning-based approach

The interaction between the PU and the SU gives both agents an incentive to base their decisions on their respective perceived payoffs. In practice, determining the equilibrium strategy of both the primary and the secondary users requires knowledge of several pieces of information that cannot be observed in a realistic scenario [28]. In this section, we propose an on-policy learning-based algorithm that allows the PU and the SU to determine their strategies on the fly. Machine learning is a powerful technique in which learning is accomplished through real-time interactions with the environment and proper utilization of past experience. In particular, we consider well-known temporal difference learning, where each user maintains state-value functions as lookup tables in order to determine the optimal action in the current time slot [29]. To cope with the hierarchical decision process between the PU and the SU, we further set an iteration scale parameter $N_{iter}$, which reflects how frequently the SU updates its state-value function and sets new power values relative to the PU. The PU's state-value function is given by

$$q(g_{t-1}, p_{t-1}) \leftarrow (1 - \beta_t) \, q(g_{t-1}, p_{t-1}) + \beta_t \left(u_1 + \kappa \, q(g_t, p_t)\right),$$

whereas the SU's state-value function is

$$Q(g_{t-1}, p_{t-1}) \leftarrow (1 - \alpha_t) \, Q(g_{t-1}, p_{t-1}) + \alpha_t \left(u_2 + \kappa \, Q(g_t, p_t)\right),$$

where is the discount factor, and and are the learning rate factors satisfying and , respectively and .

The pseudo-code for the proposed algorithm is given in Algorithm 1. Specifically, we consider an effective balancing between exploration and exploitation. Note that with a probability we explore new actions, while we choose the already established action with a probability . Indeed, the trade-off between exploration and exploitation remains a challenging issue in stochastic learning process.

Algorithm 1: Learning-based Algorithm for Energy Efficient Cognitive Radio Networks.

• Initialize the state-value functions $q$ and $Q$ for all channel gains and transmit powers.

• Initialize the remaining parameters of the algorithm.

• While true:

  • Observe the new channel gains.

  • Select the transmit power vector that maximizes the state-value function with the exploitation probability; otherwise choose a random transmit power vector.

  • For each SU iteration:

    • Observe the new channel gains.

    • Select the transmit power vector that maximizes the state-value function with the exploitation probability; otherwise choose a random power vector.

  • Use the transmit power vector and observe the rewards $u_1$ and $u_2$ given by Eq. (3).
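A simplified sketch of the two-time-scale learning loop is given below. It is not the paper's Algorithm 1: it assumes static channels (so the lookup tables are indexed by the action only), a discrete grid of power levels, an epsilon-greedy rule, a learning rate equal to the inverse visit count, a rate normalized to one and the efficiency function $f(x) = (1 - e^{-x})^M$. Every name and parameter value here is hypothetical.

```python
import math
import random


def f(x, M=100):
    # Assumed sigmoidal efficiency function.
    return (1.0 - math.exp(-x)) ** M


def utility(n, g, p, sigma2):
    # Energy efficiency of user n as in Eq. (3), with the rate normalized to 1.
    total_power = p[n][0] + p[n][1]
    if total_power == 0.0:
        return 0.0
    m = 1 - n  # index of the other user
    gammas = (g[n][k] * p[n][k] / (sigma2 + g[m][k] * p[m][k]) for k in range(2))
    return sum(f(x) for x in gammas) / total_power


def pick(table, eps, rng):
    # Epsilon-greedy: explore with probability eps, otherwise exploit.
    if rng.random() < eps:
        return rng.randrange(len(table))
    return max(range(len(table)), key=table.__getitem__)


def td_update(table, visits, prev, reward, nxt, kappa):
    # table[prev] <- (1 - a) * table[prev] + a * (reward + kappa * table[nxt])
    visits[prev] += 1
    a = 1.0 / visits[prev]  # assumed decaying learning rate
    table[prev] = (1.0 - a) * table[prev] + a * (reward + kappa * table[nxt])


def learn(g, sigma2, levels, pu_steps=300, n_iter=5, eps=0.1, kappa=0.5, seed=0):
    rng = random.Random(seed)
    actions = [(a, b) for a in levels for b in levels]  # (carrier 1, carrier 2) powers
    q, Q = [0.0] * len(actions), [0.0] * len(actions)   # PU and SU lookup tables
    vq, vQ = [0] * len(actions), [0] * len(actions)
    prev1 = prev2 = None
    u1 = u2 = 0.0
    for _ in range(pu_steps):
        a1 = pick(q, eps, rng)
        if prev1 is not None:
            td_update(q, vq, prev1, u1, a1, kappa)
        for _ in range(n_iter):  # the SU updates n_iter times per PU update
            a2 = pick(Q, eps, rng)
            if prev2 is not None:
                td_update(Q, vQ, prev2, u2, a2, kappa)
            u2 = utility(1, g, [actions[a1], actions[a2]], sigma2)
            prev2 = a2
        u1 = utility(0, g, [actions[a1], actions[a2]], sigma2)
        prev1 = a1
    return actions, q, Q


actions, q, Q = learn([[1.0, 0.5], [0.8, 0.6]], sigma2=0.1, levels=[0.0, 0.32, 0.64, 1.28])
```

The inner loop is what implements the hierarchy: the SU adapts several times to a fixed PU action before the PU observes its own reward and updates its table.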

The following proposition proves that the learning-based algorithm for energy efficient cognitive radio networks converges to the optimal policy.

###### Proposition 6.

The learning-based algorithm converges w.p.1 to the optimal $Q$-function.

The proof of Prop. 6 is given in Appendix .4. The convergence time is addressed in the following proposition.

###### Proposition 7.

Let and be the value of the learning-based algorithm for the SU and the PU respectively. Then, we have with probability at least , given that

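$$T = \Omega\left(N_{iter} \cdot (L + \Phi \cdot L + 1)^{\frac{1}{\beta} \ln\left(\frac{V_{max}}{\epsilon}\right)} \cdot \frac{V_{max}^2 \ln\left(\frac{|S| |A| V_{max}}{\delta \beta \epsilon \Phi}\right)}{(\Phi \beta \epsilon)^2}\right) \tag{19}$$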

where , , is the maximum reward obtained, and and are the number of possible states and strategies respectively. For a sequence of state-action pairs let the covering time, denoted by , be an upper bound on the number of state-action pairs starting from any pair, until all state-action appear in the sequence. Indeed, the convergence speed of the proposed algorithm depends on the iteration scale parameter . The notation implies that there are constants and such that .

The proof of Prop. 7 is given in Appendix .5.

### 7.2 Spectrum Sensing

In the current Stackelberg model, Proposition 4 claims that the SU transmits over a certain frequency carrier in order to reach only when the PU does not. This enables public access to the new spectral ranges without sacrificing the transmission quality of the actual license owners. Typically, the PU comes first in the system, estimates his channel gains over his two carriers and adapts his transmit power using Prop. 2. The SU comes later in the system, estimates his channel links over his two carriers and chooses his transmit power using Prop. 1. Such an assumption could be further justified by the fact that, in an asynchronous context, the probability that two users decide to transmit at the same moment is negligible as the number of users is limited. Thus, within this setting, the PU is assumed to be oblivious to the presence of the SU. The PU communicates with his BS while the SU listens to the wireless channel. The SU only has to reliably detect the carrier used by the PU, and not the PU's transmit power as is the case in the single-carrier context in [20]. Many well-known techniques have been developed to detect holes in the spectrum band (energy detection [30], feature detection [31], etc.).

## 8 Numerical illustration

In this section, we present a comprehensive Matlab-based simulation of the CRN described in the previous sections. We consider the energy efficiency function proposed in most papers dealing with power control games, that is , where is the block length in bits. This results in ( dB), and the rate Mbps for .

### 8.1 Energy Efficiency as a function of the SNR

This section is devoted to performance comparison of the proposed Stackelberg scheme with respect to traditional schemes. As far as sum energy efficiency comparison is concerned, this can be conducted by considering the four following schemes:

• the Stackelberg model: the one proposed in this paper,

• the Nash model: each user chooses his power level according to [1],

• the best channel model: each user chooses to transmit on his ”best” channel (i.e., the one with the best channel gain) without sensing,

• the best channel with sensing: the PU chooses the ”best” channel to transmit on. The SU senses the spectrum and transmits on the vacant sub-band. Here we assume perfect sensing of the idle sub-band by the SU.

In Figure 4, we plot the energy efficiency at equilibrium as function of the SNR. Interestingly, we see that the energy efficiency of the PU at the SE performs the same than in the sensing scenario till dB, while the energy efficiency of the SU at the SE is always the same than in the scheme where sensing is done by the SU. Moreover, the Stackelberg model outperforms all the other strategies. This is due to the Stackelberg mechanism in which the PU anticipates the SU’s action. In particular, we found out that the PU achieves an energy efficiency gain up to with respect to the Nash strategy at dB. As expected, results in Figure 4 also show that the energy efficiency for the SU at SE is less than the one obtained at NE. This is due to the fact that in Nash model, the PU does not anticipate the SU’s action. Notice that, as the SNR decreases, all configurations tend towards having the same (zero) energy efficiency. This can be justified by the fact that, at low SNR regime, whatever the power control strategy each user chooses, the signal is overwhelmed by the noise.

Figure 4 depicts the throughput at the equilibrium. We observe approximately the same trends as in Figure 4. Of particular interest is the fact that the PU still outperforms all the other strategies up to dB, whereas the throughput of the SU at the SE is still lower than the one obtained at the Nash equilibrium. That is, the proposed Stackelberg scheme achieves a flexible and desirable trade-off between energy efficiency and throughput maximization.

### 8.2 Learning the Equilibria

To proceed further with the analysis, we simulate how the PU and the SU converge to the equilibria according to Algorithm 7.1 presented in Section 7.1. The noise variance is which corresponds to a dB. We consider an iteration scale , which means that the SU runs iterations for iteration of the PU.

#### Static Channels

In Figures 6 and 6, we consider static channel gains , , and . We observe from Figure 6 that the optimal power control decision of the PU is to transmit on the first carrier whereas the SU chooses to transmit on the second carrier as claimed by Prop. 4. Indeed, we have and which is in the interval . This means that the SE is given by Prop. 4-a-ii yielding the following SE:

In Figure 8 and 8, we change the second carrier’s PU channel gain to and the second carrier’s SU channel gain to . The SE changes accordingly. In fact, we have that and which corresponds to the case (b-i) of Prop. 4, where the PU decides to transmit on the second carrier and the SU transmits on the first carrier, yielding the following SE:

In Figure 6 and 8, we look at the energy efficiency of the PU and the SU. In the general case, the PU outperforms the SU since the PU anticipates the SU’s action (see Fig. 6). However, Fig. 8 illustrates that, although he plays first, the PU performs worse than the SU at the equilibrium, as the SU’s best carrier () is much better than the PU’s best carrier ().

In Figure 10, we plot the energy efficiency of the PU and the SU at the NE proposed in [1] as a function of time. It is clear that both the PU and the SU converge to the same energy efficiency since the Nash game is a one-shot game. We also observe that both the PU and the SU converge to exactly the same energy efficiency of Mbit/Joule as the one obtained in Figure 4 at dB. Next, we plot in Figure 10 the convergence of the energy efficiency at the SE for both the PU and the SU. Again, we observe that the PU and the SU converge to the same energy efficiencies, of Mbit/Joule and of Mbit/Joule respectively, as those obtained in Figure 4 at dB. Moreover, as expected, the energy efficiency of the PU at the SE is higher than that of the SU. Note that the variance of the energy efficiency in Figures 10 and 10 is due to the fact that the fading channel states of the PU and the SU vary every time slot. Nevertheless, the algorithm still converges to the equilibrium of an averaged game whose payoff functions correspond to the users’ achievable ergodic rates.

## 9 Conclusion

In this paper, we have proposed a hierarchical concept in a power control game for energy efficient multi-carrier cognitive radio systems. We have first fully characterized, analytically, the Stackelberg equilibrium of such a game. Interestingly, we have shown that, although each user is prone to interference from the other transmitter on the same carrier, in the vast majority of cases there exists a natural coordination pattern in which the PU and the SU have an incentive to choose their transmitting carriers orthogonally (as in OFDMA systems). The proposed system moves toward the vision of a fully coordinated cognitive radio multi-carrier network, whereby transmit powers are coordinated across the users. Then, we have compared the users’ energy efficiency in the proposed hierarchical game with that obtained in a standard non-cooperative setting. In addition to allowing coordination of the spectrum usage, the proposed power control game provides additional functionalities that can be used in energy efficient CRN. In particular, the proposed Stackelberg scheme achieves a flexible and desirable trade-off between energy efficiency and throughput maximization. For implementation purposes, the SU only has to reliably sense the spectral environment (and not the PU’s transmit power, as is the case in the single carrier context in [20]) and then decide to transmit only on the best carrier left idle by the PU. Finally, with extensive measurement-driven simulations, we have shown that the proposed game model converges to the desired equilibria in a small number of steps, and hence is amenable to practical implementation.

### .1 Proof of Prop. 2: Existence and uniqueness of the PU’s power control at the SE

###### Proof.

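Given Proposition 1, the power control vectors of the SU in Regions $A$ and $B$ are given, respectively, by

$$\mathbf{p}_2^{A}(p_1^2) = \left(0,\ \frac{\gamma^*\left(\sigma^2 + g_1^2 p_1^2\right)}{g_2^2}\right)$$

and

$$\mathbf{p}_2^{B}(p_1^1) = \left(\frac{\gamma^*\left(\sigma^2 + g_1^1 p_1^1\right)}{g_2^1},\ 0\right).$$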

Based on the above equations, we can compute the explicit expression of the PU’s SINR on each carrier for both regions, namely

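$$\gamma_1^1 = \begin{cases} \dfrac{g_1^1 p_1^1}{\sigma^2}, & \text{in Region } A, \\[2ex] \dfrac{g_1^1 p_1^1}{\sigma^2(1+\gamma^*) + \gamma^* g_1^1 p_1^1}, & \text{in Region } B \end{cases}$$

$$\gamma_1^2 = \begin{cases} \dfrac{g_1^2 p_1^2}{\sigma^2(1+\gamma^*) + \gamma^* g_1^2 p_1^2}, & \text{in Region } A, \\[2ex] \dfrac{g_1^2 p_1^2}{\sigma^2}, & \text{in Region } B \end{cases}$$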

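It follows that the utility function of the PU given by Equation (3) for Region $A$ can be expressed as

$$\begin{aligned} u_1^A(p_1^1,p_1^2) &= \frac{R_1 f(\gamma_1^1) + R_1 f(\gamma_1^2)}{p_1^1+p_1^2} \\ &= \frac{R_1 f\!\left(\dfrac{g_1^1 p_1^1}{\sigma^2}\right) + R_1 f\!\left(\dfrac{g_1^2 p_1^2}{\sigma^2(1+\gamma^*) + \gamma^* g_1^2 p_1^2}\right)}{p_1^1+p_1^2} \end{aligned}$$

Similarly, in Region $B$, the PU’s utility function is

$$u_1^B(p_1^1,p_1^2) = \frac{R_1 f\!\left(\dfrac{g_1^1 p_1^1}{\sigma^2(1+\gamma^*) + \gamma^* g_1^1 p_1^1}\right) + R_1 f\!\left(\dfrac{g_1^2 p_1^2}{\sigma^2}\right)}{p_1^1+p_1^2}$$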

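Without loss of generality, the analysis is given only for Region $A$. A similar approach can be adopted for Region $B$. We first differentiate the utility of the PU with respect to $p_1^1$. We obtain

$$\frac{\partial u_1^A(p_1^1,p_1^2)}{\partial p_1^1} = R_1 \cdot \frac{f'(\gamma_1^1) \cdot \dfrac{g_1^1}{\sigma^2} \cdot (p_1^1+p_1^2) - \left[f(\gamma_1^1) + f(\gamma_1^2)\right]}{(p_1^1+p_1^2)^2}$$

Now, let us compute the derivative of the PU’s utility in Region $A$ with respect to $p_1^2$. We have

$$\frac{\partial u_1^A(p_1^1,p_1^2)}{\partial p_1^2} = R_1 \cdot \frac{f'(\gamma_1^2) \cdot \dfrac{\partial \gamma_1^2}{\partial p_1^2} \cdot (p_1^1+p_1^2) - \left[f(\gamma_1^2) + f(\gamma_1^1)\right]}{(p_1^1+p_1^2)^2}$$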
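The two partial derivatives can be sanity-checked against central finite differences. The sketch below is not the authors' simulation code. It assumes the efficiency function $f(x) = (1 - e^{-x})^M$, a common choice in the power control literature, with a hypothetical block length `M = 10`. It also uses the closed form of $\partial\gamma_1^2/\partial p_1^2$ obtained by differentiating the Region A expression of $\gamma_1^2$. All names and parameter values are illustrative.

```python
import math

M = 10  # hypothetical block length, used only for this check


def f(x):
    """Assumed efficiency function f(x) = (1 - exp(-x))^M."""
    return (1.0 - math.exp(-x)) ** M


def f_prime(x):
    """Derivative of the assumed efficiency function."""
    return M * math.exp(-x) * (1.0 - math.exp(-x)) ** (M - 1)


def sinrs_a(p11, p12, g11, g12, s2, gs):
    """PU SINRs on both carriers in Region A."""
    return g11 * p11 / s2, g12 * p12 / (s2 * (1 + gs) + gs * g12 * p12)


def utility_a(p11, p12, g11, g12, s2, gs, r1):
    """PU utility in Region A: rate-weighted efficiency per unit of power."""
    y11, y12 = sinrs_a(p11, p12, g11, g12, s2, gs)
    return r1 * (f(y11) + f(y12)) / (p11 + p12)


def gradient_a(p11, p12, g11, g12, s2, gs, r1):
    """Analytical partial derivatives of the Region A utility."""
    y11, y12 = sinrs_a(p11, p12, g11, g12, s2, gs)
    total = p11 + p12
    both = f(y11) + f(y12)
    dy12 = g12 * s2 * (1 + gs) / (s2 * (1 + gs) + gs * g12 * p12) ** 2
    d11 = r1 * (f_prime(y11) * (g11 / s2) * total - both) / total ** 2
    d12 = r1 * (f_prime(y12) * dy12 * total - both) / total ** 2
    return d11, d12


def numeric_gradient_a(p11, p12, g11, g12, s2, gs, r1, h=1e-6):
    """Central finite differences of the Region A utility."""
    d11 = (utility_a(p11 + h, p12, g11, g12, s2, gs, r1)
           - utility_a(p11 - h, p12, g11, g12, s2, gs, r1)) / (2 * h)
    d12 = (utility_a(p11, p12 + h, g11, g12, s2, gs, r1)
           - utility_a(p11, p12 - h, g11, g12, s2, gs, r1)) / (2 * h)
    return d11, d12


params = dict(g11=1.0, g12=0.5, s2=0.1, gs=6.4, r1=1.0)  # hypothetical values
print(gradient_a(0.5, 0.3, **params))
print(numeric_gradient_a(0.5, 0.3, **params))
```

The two printed pairs should agree to several significant digits, which is a quick way to catch a sign or index slip when transcribing the derivatives.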

where . Knowing that and after some simple simplifications, we obtain that .
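For reference, the derivative of the PU's SINR on the second carrier can be worked out directly from the Region A expression of $\gamma_1^2$. The following is a sketch derived here from that expression, not a restatement of the authors' intermediate steps. The shorthand $D$ is introduced only for this derivation, and the superscript on $g_1^2$ and $p_1^2$ is read as a carrier index, not a square.

$$D = \sigma^2(1+\gamma^*) + \gamma^* g_1^2 p_1^2, \qquad \gamma_1^2 = \frac{g_1^2 p_1^2}{D}$$

$$\frac{\partial \gamma_1^2}{\partial p_1^2} = \frac{g_1^2 \left(D - \gamma^* g_1^2 p_1^2\right)}{D^2} = \frac{g_1^2\, \sigma^2 (1+\gamma^*)}{D^2}$$

Since $1 - \gamma^* \gamma_1^2 = \sigma^2(1+\gamma^*)/D$, the same derivative can also be written as

$$\frac{\partial \gamma_1^2}{\partial p_1^2} = \frac{\gamma_1^2}{p_1^2}\left(1 - \gamma^* \gamma_1^2\right),$$

which is positive for every $p_1^2 > 0$ because $\gamma^* \gamma_1^2 < 1$ in Region $A$.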

We shall now look for a couple such that . It follows from the above results that a couple is solution of the following system

 Unknown environment '%

with and .

The solutions of the above system are given by

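$$p_1^1 = \frac{\sigma^2 \gamma_1^1}{g_1^1} \tag{20}$$

and

$$p_1^2 = \frac{\sigma^2}{g_1^2} \cdot \frac{\gamma_1^2 (1+\gamma^*)}{1 - \gamma^* \gamma_1^2}. \tag{21}$$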
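Equations (20) and (21) invert the Region A SINR expressions, which a round trip from powers to SINRs and back makes easy to confirm. The sketch below uses hypothetical function names and parameter values. The guard on $\gamma^* \gamma_1^2 < 1$ reflects the denominator of (21) and is an addition made here for robustness.

```python
import math


def sinrs_from_powers(p11, p12, g11, g12, sigma2, gamma_star):
    """Region A SINRs of the PU as functions of its two powers."""
    gamma11 = g11 * p11 / sigma2
    gamma12 = g12 * p12 / (sigma2 * (1 + gamma_star) + gamma_star * g12 * p12)
    return gamma11, gamma12


def p11_from_sinr(gamma11, g11, sigma2):
    """Eq. (20): PU power on carrier 1 for a given SINR."""
    return sigma2 * gamma11 / g11


def p12_from_sinr(gamma12, g12, sigma2, gamma_star):
    """Eq. (21): PU power on carrier 2 for a given SINR."""
    if gamma_star * gamma12 >= 1.0:
        raise ValueError("Eq. (21) requires gamma_star * gamma12 < 1")
    return (sigma2 / g12) * gamma12 * (1 + gamma_star) / (1 - gamma_star * gamma12)


# Hypothetical parameters for the round trip.
g11, g12, sigma2, gamma_star = 1.0, 0.5, 0.1, 6.4
p11, p12 = 0.3, 0.2

gamma11, gamma12 = sinrs_from_powers(p11, p12, g11, g12, sigma2, gamma_star)
print(math.isclose(p11_from_sinr(gamma11, g11, sigma2), p11))
print(math.isclose(p12_from_sinr(gamma12, g12, sigma2, gamma_star), p12))
```

Any positive $p_1^2$ maps to a SINR strictly below $1/\gamma^*$, so the guard only triggers for SINR targets that are unreachable in Region $A$.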

In Region $A$, Eq. (6) yields the following relation between the powers of the PU:

 p