Smart Jammer and LTE Network Strategies in an Infinite-Horizon Zero-Sum Repeated Game with Asymmetric and Incomplete Information
Abstract
LTE/LTE-Advanced networks are known to be vulnerable to denial-of-service (DoS) and loss-of-service attacks from smart jammers. In this article, the interaction between a smart jammer and an LTE network (eNode B) is modeled as an infinite-horizon, zero-sum, repeated game with asymmetric and incomplete information. The smart jammer and the eNode B are modeled as the informed and the uninformed player, respectively. The main purpose of this article is to construct efficient suboptimal strategies for both players that can be used to solve the above-mentioned infinite-horizon repeated game with asymmetric and incomplete information. It has been shown in the game-theoretic literature that security strategies provide an optimal solution in zero-sum games. It has also been shown that both players’ security strategies in an infinite-horizon asymmetric game depend only on the history of the informed player’s actions. However, fixed-sized sufficient statistics are needed for both players to solve the above-mentioned game efficiently. The smart jammer (informed player) uses its evolving belief state as the fixed-sized sufficient statistics for the repeated game, whereas the LTE network (uninformed player) uses the worst-case regret of its security strategy and its anti-discounted update as the fixed-sized sufficient statistics. Although fixed-sized sufficient statistics are employed by both players, optimal security strategy computation in discounted asymmetric games is still hard to perform because of non-convexity. Hence, the problem is convexified in this article by devising “approximated” security strategies for both players that are based on an approximated optimal game value. However, “approximated” strategies require full monitoring. Therefore, a simplistic yet effective “expected” strategy that does not require full monitoring is also constructed for the LTE network (uninformed player).
The simulation results show that the smart jammer plays non-revealing and misleading strategies against the network for its own long-term advantage.
I Introduction
LTE/LTE-A ([2, 3]) networks have been deployed around the world, providing advanced data, Voice-over-LTE (VoLTE), multimedia and location-based services to more than 3.2 billion subscribers via 681 commercial networks [4]. However, it has been previously shown that Long Term Evolution (LTE) and LTE-Advanced (LTE-A) networks are vulnerable to control-channel jamming attacks from smart jammers who can “learn” network parameters and “synchronize” themselves with the network even when they are not attached to it (cf. [5, 6, 7, 8, 9, 10]). It is shown in the above-referenced articles that such a smart jammer can launch very effective denial-of-service (DoS) and loss-of-service attacks without even hacking the network or its components. Recently, Garnaev and Trappe [11] also looked into the possibility that the rival is not smart in a jamming game with incomplete information. Hence, pursuing autonomous techniques to address the potentially devastating wireless jamming problem has become an active research topic.
In this article, the interaction between the LTE network and the smart jammer is modeled as an infinite-horizon zero-sum repeated Bayesian game with asymmetric and incomplete information. (A repeated game results when an underlying stage game is played over many stages, and a player may take into account its observations about all previous stages when making a decision at each stage [12].) This article is similar to previously published articles by the authors ([6, 8, 10]), with the exception of the zero-sum formulation and discounted utility. The main purpose of this article is to construct efficient suboptimal strategies for both players to solve the above-mentioned infinite-horizon game with asymmetric and incomplete information. Asymmetric information games (cf. [12]-[16]) provide a rich framework to model situations in which one player lacks complete knowledge about the “state of the nature”. The player who possesses complete knowledge about the state of the nature is known as the informed player and the one who lacks this knowledge is called the uninformed player. The smart jammer is modeled as the informed row player, whereas the LTE eNode B is modeled as the uninformed column player. The informed player faces a subtle tradeoff: exploiting its superior information comes at the cost of revealing that information via its actions or some other (unavoidable) signals during repeated interactions with other players (cf. [12, 14]). In most of the game-theoretic literature on repeated games with asymmetric information, the informed player’s strategy is computed based on how much information it should reveal for an optimal or suboptimal policy. Furthermore, many informed player zero-sum formulations model the uninformed player as a Bayesian player in order to solve asymmetric games (cf. [17]-[21]).
However, relatively little work has been done to address the optimal strategy computation of the uninformed player in an infinite-horizon repeated zero-sum game with asymmetric information [22]. The main difficulty arises from the fact that the uninformed player lacks complete knowledge about the state of nature and the informed player’s belief state, which plays a crucial role in determining the players’ payoffs and strategies. However, it has been shown in [15] that the uninformed player’s security strategy does exist in finite-horizon and infinite-horizon games with discounted and bounded average cost formulations. Furthermore, it has been shown in [23] that the uninformed player’s security strategy does not depend on the history of its own actions.
This article attempts to solve the above-mentioned LTE network vs. smart jammer game by constructing efficient linear programming (LP) formulations for both players’ “approximated” strategy computation and a unique “expected” strategy computation for the eNode B. The informed player’s (the smart jammer’s) security strategy (the optimal strategy in the worst-case scenario) depends only on the history of its own actions and is independent of the other player’s actions. The smart jammer models the uninformed player as a Bayesian player, making Bayesian updates with an evolving belief state. However, in order to solve the infinite-horizon game efficiently, both players need fixed-sized sufficient statistics that do not grow with the horizon. The evolving belief state serves as the sufficient statistics for the informed player in a discounted asymmetric repeated game. On the other hand, the uninformed player’s (the eNode B’s) security strategy does not depend on the history of its own actions, but rather depends on the history of the informed player’s actions. However, the uninformed player does not have access to the informed player’s belief state and needs to find different fixed-sized sufficient statistics. Fortunately, the uninformed player’s security strategy in the dual game depends only on fixed-sized sufficient statistics that are fully available to it. Furthermore, the uninformed player’s security strategy in the dual game, with the initial worst-case regret vector, also serves as its security strategy in the primal game. Therefore, the initial worst-case regret of its security strategy and its anti-discounted update (which has the same size as the cardinality of the system state) are used as the fixed-sized sufficient statistics for the uninformed player.
Although the above-mentioned sufficient statistics are fixed-sized for both players in an infinite-horizon game, computing the optimal security strategy in a discounted asymmetric game is still hard because of non-convexity [24]. Consequently, “approximated” security strategies with guaranteed performance, based on an approximated optimal game value, are computed for both players following recent work by Li and Shamma [21].
The above-mentioned “approximated” security strategies require full monitoring, which means that all players are capable of observing the previous actions of their opponents with certainty after each stage [12]. Since the eNode B cannot observe the smart jammer’s actions with complete certainty, a unique “expected” strategy formulation that does not require full monitoring is also presented for the uninformed player (the eNode B). Both the smart jammer and the LTE eNode B exploit the “approximated” and “expected” formulations to compute their suboptimal yet efficient strategies in order to maximize their corresponding utilities. The main idea of the zero-sum formulation is to find an (almost) optimal yet tractable strategy for both players in the worst-case scenario. Note that the smart jammer is the maximizer in the zero-sum game and the eNode B is the minimizer. Also, full monitoring is limited to observing the actions of the opponent and, hence, does not reveal any private information of ordinary UEs in the network to the smart jammer.
I-A Related Work
Game theory (cf. [25, 12, 13, 14, 15, 16, 26]) provides a rich set of mathematical tools to analyze and address conflict and cooperation scenarios in multi-player situations, and as such has been applied to a multitude of real-world situations in economics, biology, cyber security, multi-agent networks, wireless networks (cf. [27, 28, 29]) and more. In this article, the interaction between the LTE network and the smart jammer is modeled as an infinite-horizon zero-sum repeated game with asymmetric information.
Zero-sum repeated game formulations have been studied extensively in the game-theoretic literature, including asymmetric information cases, such as Chapter 5 of [12], Chapter 4 of [14], Chapters 2-4 of [15], and Chapter 2 of [26]. However, most of the prior work on asymmetric zero-sum repeated games deals with the informed player’s viewpoint. For example, [12] and [14] pointed out that the informed player might reveal its superior information implicitly by its actions and, hence, may want to refrain from certain actions in order not to reveal that information. In the case of full monitoring, playing a non-revealing strategy is equivalent, for the informed player, to not using its superior information [12]. Furthermore, [15] showed that the informed player’s belief state (the conditional probability of the game being played given the history of the informed player’s actions) is its sufficient statistics to make long-run decisions. Hence, many informed player’s strategies (cf. [17]-[21]) use the belief state as their sufficient statistics. However, [24] showed that computing the optimal value of the infinite-horizon repeated game is a non-convex problem and identified the computational complexities involved in solving infinite-horizon games. Therefore, the above-mentioned articles approximate the optimal game value via linear programming. On the other hand, limited work has been done on the uninformed player’s optimal strategy computation as compared to the vast research done for the informed player [22]. It is, however, known that the uninformed player’s security strategy exists in infinite-horizon repeated zero-sum games, and that it does not depend on the history of its own actions (cf. [20, 23]). However, efficient computation of the uninformed player’s optimal security strategy is still an open problem. Recently, [22] suggested that the uninformed player could use its expected payoff for each candidate game as sufficient statistics since it is unaware of the game being played.
Similarly, [20] used the realized vector payoff as the uninformed player’s sufficient statistics to compute its efficient but suboptimal strategy in finite-horizon zero-sum repeated games. However, note that all of these formulations are based on the commonly assumed notion of full monitoring, in which players can perfectly observe their opponent’s actions. This article also utilizes the notion of full monitoring for its “approximated” strategy computation.
Although there has been quite a lot of work done on infinite-horizon repeated zero-sum games with asymmetric information, there do not exist any tractable non-zero-sum formulations for the uninformed player that can be used for its optimal strategy computation in infinite-horizon asymmetric repeated games. Most of the classic general-sum (non-zero-sum) game-theoretic literature, like Chapter 6 of [12] and Chapters V and IX of [16], focuses on the characterization and existence of equilibria in repeated games with asymmetric information, and deals with the optimal strategy construction for the full monitoring case. Chapter V of [16] also suggests using approachability theory for the construction of the uninformed player’s strategy for the full monitoring case. However, none of these formulations result in efficient computation of the uninformed player’s optimal strategy. This problem gets further complicated for general-sum (non-zero-sum) games with imperfect monitoring. For example, [30] pointed out that the solution of a general-sum (non-zero-sum) stochastic game with both incomplete knowledge and imperfect monitoring is an open problem and there is no well-established solution available so far. To the best of our knowledge, that is still the case for repeated as well as stochastic general-sum (non-zero-sum) games (e.g., see [22, 35]). This is one of the main reasons that the LTE network and smart jammer interaction is not modeled as a general-sum (non-zero-sum) game in this paper.
Bayesian approaches have been widely used to solve asymmetric information problems in which the updated belief state can be used as sufficient statistics for the informed player. The belief state often serves as a tool for updating the internal notion of a player’s knowledge related to another. For example, [17]-[21] modeled the uninformed player as a Bayesian player in order to compute the informed player’s suboptimal strategies in repeated zero-sum games. Similarly, [31]-[33] used Bayesian approaches to devise an uninformed player’s strategy based on expected payoff, and [30] employed Bayesian Nash-Q learning in an incomplete information stochastic game and used Bayes’ formula to update the belief of an Intrusion Detection System (IDS). Another commonly used technique to address lack of information problems is state estimation. For example, [34] used a Kalman filter to estimate the state of an observable, linear, stochastic dynamic system in an infrastructure security game. However, the system of interest and the game dynamics in this article are non-linear and may not be completely observable. Therefore, the applicability of state estimation techniques is very limited.
To the best of our knowledge, there do not exist any explicit formulations for the optimal strategy computation of the uninformed player in an infinite-horizon repeated zero-sum game with asymmetric information [22]. However, it has been shown in [15] that the uninformed player’s security strategy does exist in finite-horizon and infinite-horizon games with discounted and bounded average cost formulations. Moreover, it has been shown in [23] that the uninformed player’s security strategy does not depend on the history of its own actions. Nevertheless, a recent LP formulation in [21] provides an efficient technique for the explicit “approximated” strategy computation of the uninformed player in infinite-horizon asymmetric repeated zero-sum games, but with the assumption of full monitoring.
Furthermore, there are multiple differences between this article and [10], which is focused only on estimating the jammer type at the beginning of the game; as such, the strategies computed in that article cannot be used for long-term interaction. On the other hand, this article is focused on computing both players’ strategies for very long-term interaction. Moreover, the smart jammer in [10] is modeled as a myopic player (i.e., it only cares about short-term utility) as opposed to being modeled as a strategic player in this article. In addition, the interaction between the smart jammer and the eNode B in [10] is modeled as a general-sum (non-zero-sum) game without perfect monitoring, as opposed to the zero-sum formulation in this article. The eNode B exploits the “jamming sense” part of the algorithm presented in [10] to invoke the strategy computation algorithms discussed in this article. On the other hand, the article [21] written by co-authors Li and Shamma is focused on an approximated yet efficient LP formulation for both players’ strategies in an abstract repeated zero-sum game with perfect monitoring. This “approximated” strategy construction technique is further extended to realistic smart jammer and LTE network interaction in this article. On top of that, a unique “expected” strategy formulation is also explored in the article.
The smart jamming problem in LTE networks has been studied extensively lately. However, to the best of our knowledge, none of the articles published so far has studied the smart jamming problem in LTE networks in a game-theoretic manner.
II Smart Jamming in LTE Networks
Potential smart jamming attacks and suggested network countermeasures are the same as described in [6]. They are briefly discussed here for the sake of completeness.
II-A Smart Jamming Attacks on an LTE Network
The set of the smart jammer’s pure actions consists of the following jamming attacks (see [2] or [3] for the description of the various LTE channels):

Inactive (no jamming): corresponds to default UE operation when no jammer is active in the network.

Jam CS-RS: corresponds to OFDM pilot jamming/nulling extensively studied in the literature. This action prevents all UEs from demodulating data channels by prohibiting them from performing coherent demodulation, degrades cell quality measurements and blocks initial cell acquisition in the jamming area.

Jam CS-RS + PUCCH: corresponds to jamming PUCCH in the UL in addition to DL CS-RS jamming. This action could be more catastrophic for Connected mode UEs than jamming the CS-RS alone, due to the eNode B’s loss of critical UL control information, but it requires more sophisticated dual-band jamming.

Jam CS-RS + PBCH + PRACH: corresponds to jamming the DL broadcast channel PBCH and the UL random access channel PRACH in addition to pilot nulling. This action is intended to block reselection/handover of UEs from neighboring cells and block synchronization of idle mode and out-of-sync UEs.

Jam CS-RS + PCFICH + PUCCH + PRACH: corresponds to jamming CS-RS and PCFICH in the DL and jamming PUCCH and PRACH in the UL. This action is intended to cause loss of DL and UL grants, radio link failures and loss of UE synchronization, mostly in Connected mode UEs.
Although jamming individual control channels may also cause denial-of-service (DoS) attacks, the jamming effects may not have the specific consequences desired by the smart jammer. For example, the effect of jamming CS-RS alone could be limited to a specific part of a cell, depending on the jammer location and its transmit power, and jamming PBCH alone may only prevent a small fraction of UEs from reselecting/handing over to the cell if they have not visited that particular cell recently. Even though jamming multiple control channels together requires distributing the jammer’s transmit power among all jamming activities, it allows the smart jammer to target specific aspects of network operation, such as cell reselection/handover, data download and upload, etc.
In addition to the above-mentioned pure actions, the smart jammer uses its probability of jamming and its transmit power to decide when to jam the network and how much power to use during a particular jamming attack. The duty cycle of each action is also implicitly modeled in the utility function of the jammer. Thus, the smart jammer launches denial-of-service (DoS) and loss-of-service attacks on the LTE network by employing these actions, which can be easily implemented using a software-defined radio (SDR) and a colluding UE.
II-B Suggested Network Countermeasures
It is proposed that the network can use the following (pure) countermeasures in case of a jamming attack:

Normal (default action): corresponds to default network operation.

Increase CS-RS Transmit Power: corresponds to pilot boosting in order to alleviate CS-RS jamming at the expense of transmitting other channels at lower transmit power than in normal operation.

Throttle: corresponds to a specific threat mechanism in which all active UEs’ DL/UL grants (and hence throughputs) are throttled.

Change eNode B $f_c$ + SIB 2: corresponds to a specific interference avoidance mechanism in which the network “relocates” its carrier frequency to a different carrier within its allocated band/channel and rearranges itself into a lower occupied bandwidth configuration. It also changes its PRACH configuration parameters in SIB 2 to alleviate PRACH jamming.

Change eNode B Timing: corresponds to a specific interference avoidance mechanism in which the network “resets” its frame/subframe/slot/symbol timing and SIB 2 parameters. Ongoing data sessions would be handed over to neighboring cells before the “reset”, and the cell would not be available during the transition.
Note that the above-mentioned countermeasures do not require any exogenous information or significant changes in 3GPP specifications and can be implemented easily with current technology. Furthermore, the network is not aware of the jammer’s location, jamming waveform, or probability of jamming. Also, the average duty cycle and the eNode B’s transmit power determine the power consumption of the network, which is modeled in its utility function. The curious reader is encouraged to see [10, 35] for further details on smart jammer actions and network countermeasures.
III LTE Network & Smart Jammer Dynamics
III-A Network Model
The network model used in this article is the same as developed in [10] and is briefly discussed here for the sake of completeness.
UEs arrive in the cell according to a homogeneous 2D Stationary Spatial Poisson Point Process (SPPP) with a constant rate per unit area and are uniformly distributed over the entire cell conditioned on the total number of users N. The large-scale path loss is modeled using the Simplified Path Loss Model [36]:
$$P_r = P_t K \left(\frac{d_0}{d}\right)^{\gamma} \tag{1}$$
where $P_r$ is the received power, $P_t$ is the transmitted power, $K$ is a constant, $\gamma$ is the path loss exponent, $d$ is the distance between the transmitter and receiver, and $d_0$ is the outdoor reference distance for antenna far field. The small-scale multipath fading is modeled using exponentially distributed Rayleigh-faded channel gains at each subcarrier. Thus, the instantaneous SINR of a particular OFDM subcarrier is modeled as follows:
$$\mathrm{SINR} = \frac{P_D \, h_D \, d_D^{-\gamma}}{\sigma^2 + P_J \, h_J \, d_J^{-\gamma}} \tag{2}$$
where $P_D$ and $P_J$ are the desired and jammer transmit powers, $h_D$ and $h_J$ are exponentially distributed Rayleigh-faded channel gains, $d_D$ and $d_J$ are the large-scale distances from the desired transmitter and the jammer, respectively, $\gamma$ is the path loss exponent, and $\sigma^2$ is the noise variance at the receiver. It is assumed that Inter-Cell Interference (ICI) is independent of jamming and, hence, any residual ICI can be lumped together in the noise variance for the scope of this article. It is further assumed that $\sigma^2$ is the same at all receivers.
The SINR in (2) can be rewritten in terms of the Carrier-to-Jammer ratio as follows:
(3) 
Equations (2) and (3) are used to model the SINR of narrowband flat-faded signals and channels like CS-RS, PCFICH, PUCCH, etc. However, wideband channels like PDSCH and PUSCH cannot be modeled using (2) or (3). Furthermore, SINR estimation is done in the frequency domain.
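The narrowband SINR model described above can be sketched numerically as follows. This is a minimal illustration, not the simulator used in this article: the function names, the unit-mean fading gains, and the definition of the carrier-to-jammer ratio as the ratio of transmit powers are all assumptions made for the example.

```python
import math
import random


def path_loss_gain(distance_m, ref_distance_m=1.0, exponent=3.5, k_const=1.0):
    """Simplified path loss gain K * (d0 / d) ** gamma (dimensionless)."""
    return k_const * (ref_distance_m / distance_m) ** exponent


def sinr_from_powers(p_d, p_j, g_d, g_j, loss_d, loss_j, noise_var):
    """SINR from desired/jammer transmit powers, fading gains and path losses."""
    return (p_d * g_d * loss_d) / (noise_var + p_j * g_j * loss_j)


def sinr_from_cj(cj_ratio, p_j, g_d, g_j, loss_d, loss_j, noise_var):
    """Same SINR after dividing numerator and denominator by the jammer power.

    Assumption: cj_ratio is the ratio of desired to jammer transmit power.
    """
    return (cj_ratio * g_d * loss_d) / (noise_var / p_j + g_j * loss_j)


def draw_sinr(p_d, p_j, d_d, d_j, noise_var, exponent=3.5, rng=random):
    """Draw one SINR sample with unit-mean exponential (Rayleigh power) gains."""
    g_d = rng.expovariate(1.0)
    g_j = rng.expovariate(1.0)
    loss_d = path_loss_gain(d_d, exponent=exponent)
    loss_j = path_loss_gain(d_j, exponent=exponent)
    return sinr_from_powers(p_d, p_j, g_d, g_j, loss_d, loss_j, noise_var)
```

Both SINR functions return the same number for the same inputs, which reflects the fact that the carrier-to-jammer form is only an algebraic rewrite of the power form under the assumption stated above.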
In addition, the DL PDSCH throughput of each user of the LTE network in each resource block (RB) during each subframe is modeled as a fraction of Shannon’s AWGN channel capacity, as described in (4).
(4) 
where is the bandwidth of a single RB i.e. 180 kHz. For the purposes of this article, it is assumed that .
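A minimal sketch of the per-RB throughput model is given below. The 180 kHz RB bandwidth is taken from the text; the value of the capacity fraction (`capacity_fraction`, default 0.75) is a placeholder chosen only for illustration, and the function names are hypothetical.

```python
import math

RB_BANDWIDTH_HZ = 180e3  # bandwidth of a single resource block (from the text)


def db_to_linear(value_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def rb_throughput_bps(sinr_linear, capacity_fraction=0.75):
    """Throughput of one RB as a fraction of the Shannon AWGN capacity.

    capacity_fraction is an illustrative placeholder, not a value from the article.
    """
    if sinr_linear < 0.0:
        raise ValueError("SINR must be non-negative on a linear scale")
    if not 0.0 < capacity_fraction <= 1.0:
        raise ValueError("capacity_fraction must lie in (0, 1]")
    return capacity_fraction * RB_BANDWIDTH_HZ * math.log2(1.0 + sinr_linear)
```

For example, `rb_throughput_bps(db_to_linear(10.0))` evaluates the model at an SINR of 10 dB; a user's throughput in a subframe is then the sum of this quantity over the RBs assigned to it.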
Each user’s total throughput in a given subframe is the sum of the throughputs of its assigned RBs for that particular subframe. The eNode B is modeled as using a Proportional Fair Scheduling (PFS) [37] algorithm to allocate resources to its users. A user is allocated a resource block during a subframe if the ratio of its achievable instantaneous data rate to its long-term average throughput in (5) is the highest among all the active users in the network. The long-term average throughput of each user during each subframe is computed using the recursive equations (5) and (6) below:
(5) 
(6) 
where represents fairness time window, and is an indicator function.
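The scheduling rule described above can be sketched as follows. This is a textbook form of proportional fair scheduling with an exponentially weighted average, written under assumptions: the function name, the `window` parameter for the fairness time window, and the small `floor` that avoids division by zero are not taken from the article.

```python
def pfs_schedule(inst_rates, avg_tput, window=100.0, floor=1e-9):
    """Run one subframe of proportional fair scheduling.

    inst_rates[k][n]: achievable instantaneous rate of user k on RB n.
    avg_tput[k]: long-term average throughput of user k before this subframe.
    Returns (allocation, new_avg), where allocation[n] is the user served on RB n.
    """
    n_users = len(inst_rates)
    n_rbs = len(inst_rates[0])
    allocation = []
    served = [0.0] * n_users  # rate actually delivered to each user this subframe
    for n in range(n_rbs):
        # Pick the user with the highest instantaneous-to-average rate ratio.
        best = max(range(n_users),
                   key=lambda k: inst_rates[k][n] / max(avg_tput[k], floor))
        allocation.append(best)
        served[best] += inst_rates[best][n]
    # Recursive average: unscheduled users have served[k] == 0 (indicator is 0).
    new_avg = [(1.0 - 1.0 / window) * avg_tput[k] + served[k] / window
               for k in range(n_users)]
    return allocation, new_avg
```

A user with a low long-term average wins the RB even when its instantaneous rate is lower, which is the fairness effect that the time window controls.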
In general, the overall LTE network dynamics can be modeled as a highly nonlinear dynamical system described by:
(7) 
Here, the state of the network (not to be confused with the game-theoretic state of nature) has one row per user, containing elements such as the SINRs of that user’s control and data channels and its average throughput. The remaining arguments are the game-theoretic state of nature (jammer type) described in the next section, the eNode B’s action, the jammer’s action, and a term that characterizes the randomness in the network induced by the channel, arbitrary user locations, varying transmit power levels, PFS scheduling and other sources of randomness in the network. Thus, the network dynamics are modeled as a Partially-Observable Markov Decision Process (POMDP).
Evidently, it is non-trivial and intractable to model the LTE network and smart jammer dynamics analytically. Hence, these abstracted dynamics are simulated in MATLAB without losing any modeling fidelity. Although this article can be used as a building block for more complicated scenarios, multi-cell and multi-jammer scenarios are beyond its scope.
III-B Game-Theoretic Model
Notations: For the rest of this paper, the following mathematical notations are used. Let $K$ be a finite set. Its cardinality is denoted by $|K|$, and the set of all probability distributions over $K$ is indicated by $\Delta(K)$.
III-B1 Game Model
The interaction between the LTE network and the smart jammer is modeled as a strictly competitive infinite-horizon zero-sum repeated Bayesian game with asymmetric and incomplete information, with the smart jammer as the informed (row) player and the eNode B as the uninformed (column) player. Infinite-horizon games are used to model situations in which the horizon length is not fixed in advance and there is a non-zero probability at the end of each stage that the game will continue to the next stage (e.g., see [13]). The game is described by

the set of players, {smart jammer, eNode B},

the set of states of nature (jammer types),

the prior probability distribution on the set of states of nature, which is common knowledge,

the sets of pure actions of the smart jammer and the eNode B, respectively, as described in Section II,

the set of sequences such that each sequence is a history of observations,

the information partition of each player, and

the single-stage utility function of each player, given by that player’s utility matrix for each jammer type, whose elements are indexed by the pure actions of the smart jammer and the eNode B.
Following the convention used in the game-theoretic literature, including [21], the informed player, i.e., the smart jammer, plays as the maximizer (row player), whereas the uninformed player, i.e., the eNode B, plays as the minimizer (column player). Note that ordinary user equipments (UEs) are not modeled as players in this article.
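The continuation-probability view of the infinite horizon can be illustrated with a short sketch. If the game continues after each stage with a fixed probability, the expected total payoff equals a discounted sum with that probability as the discount factor. The function below evaluates such a sum over a finite prefix of stage payoffs; the `(1 - discount)` normalization, which keeps the result on the scale of a single-stage payoff, is a common convention assumed here rather than taken from the article.

```python
def discounted_payoff(stage_payoffs, discount):
    """Normalized discounted sum (1 - d) * sum_t d**t * u_t over a finite prefix.

    discount plays the role of the per-stage continuation probability.
    """
    if not 0.0 < discount < 1.0:
        raise ValueError("discount must lie strictly between 0 and 1")
    total = 0.0
    weight = 1.0  # d**t for the current stage t
    for payoff in stage_payoffs:
        total += weight * payoff
        weight *= discount
    return (1.0 - discount) * total
```

With this normalization, a constant stage payoff yields approximately the same discounted payoff once the prefix is long enough for the remaining weight to be negligible.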
III-B2 Jammer Types
The type of smart jammer is classified as:

Type I: Cheater

Type II: Saboteur
The type Cheater is used to model a jammer whose intent is to get more resources for itself as a result of reduced competition among UEs. Thus, a cheating UE is always present in the network with an active data session. On the other hand, the type Saboteur is used to model a jammer whose intent is to cause the highest possible damage to the network resources. Thus, a sabotaging UE may be unattached to the network. Note that the “Normal (inactive)” jammer type is not modeled here because the jammer is not present in that state and it is in the best interest of the network to play its default normal action in that case. The strategy algorithms presented in this article are invoked when the jammer is present in the network and/or when jamming is sensed by the network using the “jamming sense” part of the algorithm presented in [10]. Note also that the absence of jamming does not imply the jammer’s absence, as an active smart jammer may decide to play the “inactive” action as part of its strategy during an attack.
III-B3 Strategies
Both the network and the jammer are modeled as rational and strategic. By definition, a pure strategy of a player is a mapping from each non-terminal history to a pure action, and a mixed strategy is a probability measure over the set of its pure strategies. A behavioral strategy specifies a probability measure over a player’s available actions at each stage when an action needs to be taken [13]. Also, the best response (BR) is the strategy (or strategies) that produces the most favorable outcome for a player given the other players’ strategies [25]. Two types of suboptimal security strategies for the infinite-horizon game are presented in this article, as discussed in Section V.
III-B4 Information Partitions
The jammer is informed of its own type. However, the eNode B is only informed about the prior probability distribution over jammer types. This results in a game with asymmetric information, with lack of information on the network side, making the eNode B the uninformed player.
III-B5 Observable Signals
For the “approximated” strategy computation, it is assumed that players can observe each other’s actions with certainty after each stage, i.e., the full monitoring requirements are satisfied. This is a very widely used assumption in the classic and modern game-theoretic literature (e.g., Chapter 6 of [12]). The network can distinguish between the smart jammer’s different actions at high SNR and can make reasonable estimates at low SNR. However, the imperfect monitoring case is beyond the scope of the “approximated” formulation presented in this article. On the other hand, the “expected” strategy formulation for repeated games does not require full monitoring.
III-B6 Utilities
Both players’ utility functions are based on their key performance indicators (KPIs) and are defined to reflect a strictly competitive (zero-sum) setting, i.e., one player’s gain is the other player’s loss as described by:
(8) 
When the system state is Cheater, the zero-sum utility function is simplified as
(9) 
Here, the terms of (9) are the change in the Cheater’s normalized average throughput from the baseline scenario and the normalized average number of Connected mode UEs in the network when the Cheater is present; each term is scaled by its corresponding weight, and the expectation is taken with respect to the randomness in the network dynamics mentioned in (7).
The Cheater tries to maximize (9) in order to increase its throughput from the baseline scenario and reduce the number of Connected mode UEs in the network, which at the same time reduces the competition for limited network resources. The eNode B, on the other hand, tries to minimize (9) to do the opposite, hence creating a proper zero-sum game.
Similarly, the zero-sum utility function is defined in (10) when the system state is Saboteur.
(10) 
Here, the terms of (10) are the normalized average number of Connected mode UEs in the network when the Saboteur is present and the eNode B’s normalized average throughput per UE; each term is scaled by its corresponding weight, and the expectation is, again, taken with respect to the randomness in the network dynamics mentioned above.
The Saboteur tries to maximize the negative of the eNode B utility, which is defined in terms of the average number of Connected mode users and the average throughput per UE, hence defining the zero-sum game.
Note that there are no “unilateral” fixed costs associated with either player in the above-mentioned zero-sum construction. This means that the game is played without modeling higher “fidelity” parameters like the players’ duty cycles and the implicit cost associated with eNode B actions like ‘f Change’ and ‘Timing Change’. However, this “fidelity” loss does not affect the inherent nature of the smart jammer and network interaction and, hence, can be disregarded. Note also that the utility functions are different for different jammer types, which is a common phenomenon in Bayesian games. Furthermore, the key performance indicators (KPIs) are functions of observable parameters only; for example, the eNode B’s utility is a function of parameters observed from Connected mode UEs.
Interestingly, none of the players need to compute their utilities explicitly as it is not used to make strategy decisions in repeated games as discussed later in Section V. The payoffs are received by both players as a result of the interaction between the smart jammer and the network. Even though the network does not know the jammer type, it can compute expected utility for minimization based on the prior (and updated belief) probability of specific jammer type presence.
III-B7 Game Play
At the beginning of the game, nature flips a coin and selects (jammer type) according to , which remains fixed for the rest of the game. The jammer is informed about its selected type but eNode B is not. However, in a repeated game, eNode B’s history of interaction with the jammer evolves with time which may affect its belief about .
IV Single-Shot Game
The single-shot game is played between the smart jammer as the maximizer (row player) and the network as the minimizer (column player). The maxmin value for the row player for given state is denoted by ; whereas the minmax value for the column player is denoted by . It is well known that the maxmin value never exceeds the minmax value. When the two are equal, the game is said to have a value . Von Neumann’s celebrated Minmax Theorem states that any matrix game has a value in mixed strategies and that the players have optimal strategies [15], i.e., the minmax solution of a zero-sum game is the same as the Nash equilibrium. Both players play their security strategies in a zero-sum game to guarantee the best outcome under the worst conditions, due to the game’s strictly competitive (zero-sum) nature.
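As a minimal illustration of the maxmin and minmax values described above, the following Python sketch computes both security levels in pure strategies and checks whether they coincide (a pure saddle point). The function names and both payoff matrices are hypothetical and are not the utility matrices obtained from the LTE simulations; the row player is taken to be the maximizer, as in the text.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_maxmin(M: Matrix) -> Tuple[float, int]:
    """Row (maximizing) player's pure security level and a row attaining it."""
    worst_per_row = [min(row) for row in M]
    best_row = max(range(len(M)), key=lambda i: worst_per_row[i])
    return worst_per_row[best_row], best_row


def pure_minmax(M: Matrix) -> Tuple[float, int]:
    """Column (minimizing) player's pure security level and a column attaining it."""
    n_cols = len(M[0])
    worst_per_col = [max(row[j] for row in M) for j in range(n_cols)]
    best_col = min(range(n_cols), key=lambda j: worst_per_col[j])
    return worst_per_col[best_col], best_col


def has_pure_saddle_point(M: Matrix) -> bool:
    """True when maxmin equals minmax, i.e. the game has a value in pure strategies."""
    return pure_maxmin(M)[0] == pure_minmax(M)[0]


# Hypothetical payoff matrices (illustration only).
WITH_SADDLE = [[3.0, 1.0, 4.0],
               [5.0, 2.0, 6.0],
               [0.0, -1.0, 7.0]]
NO_SADDLE = [[1.0, -1.0],
             [-1.0, 1.0]]

if __name__ == "__main__":
    print(pure_maxmin(WITH_SADDLE), pure_minmax(WITH_SADDLE))
    print(has_pure_saddle_point(WITH_SADDLE), has_pure_saddle_point(NO_SADDLE))
```

When the check fails, as for the second matrix, the value is attained only in mixed strategies, which is the situation the Minmax Theorem covers.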
The single-shot game simulation results are obtained from a Monte Carlo simulation of LTE network and smart jammer dynamics as discussed in Section III. The following parameters are used for our simulations:

Carriertojammer power ratio: dB,

Probability of jamming: ,

Weight of no. of Connected UEs for Type I: ,

Weight of no. of Connected UEs for Type II: ,

Weight of average throughput for Type I: ,

Weight of average throughput for Type II: .
The following simulation results are obtained for the single-shot game when the jammer type is Cheater. The first element of each utility matrix represents the baseline scenario, in which no jammer is active in the network and the network is playing its default normal action.
Similarly, the simulation results for the singleshot game when the jammer type is Saboteur are presented below.
For the complete information case when the network is aware of the jammer type Cheater, the game has a single pure strategy Nash Equilibrium, (’Jam CSRS + PUCCH’, ’Throttling’), with the game value , satisfying the following equation.
For the complete information case when the network is aware of the jammer type Saboteur, the game does not have any pure strategy Nash Equilibrium. If the players are allowed to use mixed strategies, i.e., a probability distribution over a player’s action set, then there exists a mixed strategy Nash Equilibrium , where , and with the game value , satisfying the following equation. Loosely speaking, at this equilibrium the jammer plays ’Jam CSRS’ and ’Jam CSRS + PCFICH + PUCCH + PRACH’ with equal probability, and the eNode B plays ’Normal’ and ’Timing Change’ with equal probability.
where is the expected value of the single-stage utility given mixed strategies and . Given the utility matrix, linear programming is used to compute the Nash Equilibrium [13] with and , and the game value .
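The equilibrium condition referred to above can be checked numerically once candidate mixed strategies are available. The sketch below is a hedged illustration: it does not solve the linear program, it only verifies that neither player can gain by deviating to a pure action. The 2x2 matrix is a hypothetical matching-pennies-style game whose equilibrium also mixes two actions with equal probability; it is not the Saboteur utility matrix.

```python
from typing import List

Matrix = List[List[float]]


def expected_payoff(M: Matrix, x: List[float], y: List[float]) -> float:
    """Expected single-stage payoff x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def is_equilibrium(M: Matrix, x: List[float], y: List[float], tol: float = 1e-9) -> bool:
    """Check the saddle-point condition of a zero-sum matrix game.

    The row player maximizes and the column player minimizes. (x, y) is an
    equilibrium if no pure row does better than x against y, and no pure
    column does better (i.e. lower) than y against x.
    """
    v = expected_payoff(M, x, y)
    row_payoffs = [sum(M[i][j] * y[j] for j in range(len(y))) for i in range(len(M))]
    col_payoffs = [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]
    return max(row_payoffs) <= v + tol and min(col_payoffs) >= v - tol


# Hypothetical game with no pure equilibrium (illustration only).
M_EXAMPLE = [[1.0, -1.0],
             [-1.0, 1.0]]

if __name__ == "__main__":
    print(is_equilibrium(M_EXAMPLE, [0.5, 0.5], [0.5, 0.5]))  # equal mixing by both players
    print(is_equilibrium(M_EXAMPLE, [1.0, 0.0], [0.5, 0.5]))  # a pure row can be exploited
```

A check of this kind is useful as a sanity test on the output of whichever LP solver is used to compute the equilibrium.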
However, in the asymmetric information case, eNode B only knows the probability distribution over jammer’s types which is public information, while the jammer knows exactly its own type. Knowing its own type, the jammer can use a different strategy for different states . Therefore, in the asymmetric game, jammer’s mixed strategy is a mapping from to . The singleshot asymmetric game still has a mixed strategy Nash Equilibrium , where and satisfy
where is the expected value of the single-stage utility given the initial probability and mixed strategies and . It is to be noted here that the utility functions are common knowledge. Although eNode B is unaware of the jammer type, it knows that the utility function is either or given that the jammer is present in the network. For a given prior probability , the eNode B can evaluate its expected utility, whose minmax value (i.e., the game value) is a function of the prior probability . Since is fixed, the game value also remains fixed. The Nash Equilibrium for the asymmetric information game can be computed by solving an LP with the time horizon set to a single stage [38].
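To make the expected utility of the asymmetric single-shot game concrete, the sketch below averages the per-type payoffs over the prior, with the jammer allowed a different mixed strategy for each type and the eNode B restricted to one strategy for all types. All names and numbers are hypothetical; two types and two actions per player are used only to keep the example small.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def expected_utility(prior: Sequence[float],
                     payoffs: Sequence[Matrix],
                     jammer_strategies: Sequence[Sequence[float]],
                     enb_strategy: Sequence[float]) -> float:
    """Expected single-stage utility: sum over types k of p_k * x_k^T M_k y.

    prior[k]             -- prior probability of jammer type k
    payoffs[k]           -- payoff matrix when the jammer is of type k
    jammer_strategies[k] -- mixed strategy the jammer uses when it is type k
    enb_strategy         -- single mixed strategy of the uninformed eNode B
    """
    total = 0.0
    for p_k, M_k, x_k in zip(prior, payoffs, jammer_strategies):
        for i, x_i in enumerate(x_k):
            for j, y_j in enumerate(enb_strategy):
                total += p_k * x_i * M_k[i][j] * y_j
    return total


# Hypothetical data (illustration only).
PRIOR = [0.25, 0.75]
M_TYPE_1 = [[2.0, 0.0], [0.0, 1.0]]
M_TYPE_2 = [[1.0, -1.0], [-1.0, 1.0]]
X_TYPE_1 = [1.0, 0.0]   # type 1 plays a pure action
X_TYPE_2 = [0.5, 0.5]   # type 2 mixes equally
Y_ENB = [0.5, 0.5]

if __name__ == "__main__":
    print(expected_utility(PRIOR, [M_TYPE_1, M_TYPE_2], [X_TYPE_1, X_TYPE_2], Y_ENB))
```

Because the eNode B's strategy does not depend on the type, its minimization is over a single vector, whereas the jammer's maximization is over one vector per type.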
V Infinite-Horizon Asymmetric Repeated Game Strategy Algorithms
The repetition of a zerosum game in its basic form does not warrant further study as the players can play their optimal security strategies i.i.d. at each stage to guarantee an optimal game value [15]. However, in repeated asymmetric games, playing the optimal strategy in a singlestage asymmetric game i.i.d. at each stage does not guarantee the player the optimal game value [15]. Therefore, the repeated game needs to be studied further.
It is assumed that both players’ actions are publicly known at the end of each stage. The jammer’s action history at stage is , and denotes the set of all possible action histories of the jammer at stage . Similarly, eNode B’s action history at stage is , and the set of all possible action histories of eNode B at stage is denoted by .
Since the optimal strategies of both players do not depend on the action history of the uninformed player eNode B [15], the behavior strategy of the jammer at stage is defined as a mapping from to , and the behavior strategy of eNode B at stage is a mapping from to . The behavior strategies of jammer and eNode B are denoted by and , and the set of all possible behavior strategies are denoted by and , respectively.
This article considers a discounted utility function as the overall utility function in the infinite horizon game. The discounted utility function is commonly used in both classic and modern gametheoretic literature (cf. [12, 14, 15, 16, 19]). The main reason for selecting discounted utility formulation is its guarantee of convergence in infinitehorizon games. If average utility formulation is selected then the overall payoff is taken as a limit, which may or may not exist. Also, uniformity conditions are required for equilibrium in average utility formulations [12]. It is also shown in [15] that as , the game value of a discounted infinitehorizon game converges to that of an average reward game. It is to be noted here that is merely a mathematical constant dictating the discount factor and does not represent any physical quantity like received signal strength or SNR etc. The discounted utility function is defined as follows:
(11) 
Discounted utility represents the idea that players often focus more on current reward and apply a discount factor of to future reward. Based on the discounted utility function, jammer has a security level which is the maximum utility it can get in the game if eNode B always plays the best response strategy. The strategy that guarantees this value no matter what strategy eNode B plays is called the jammer’s security strategy. Similarly, eNode B’s security level is defined as , and the strategy that guarantees the security level no matter what strategy jammer plays is called eNode B’s security strategy. If the security levels of both players are the same, which is true in our case, it can be said that the game has a value , and there exists a Nash Equilibrium.
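A short numerical sketch may help to fix the idea of discounting. It assumes the normalized form commonly used for discounted repeated games, in which the payoff at stage t is weighted by lam * (1 - lam)**(t - 1); this weighting and the symbol `lam` are assumptions made for illustration and may differ in notation from (11). The payoff streams are hypothetical.

```python
from typing import Iterable


def discounted_utility(stage_payoffs: Iterable[float], lam: float) -> float:
    """Normalized discounted sum: sum over t >= 1 of lam * (1 - lam)**(t - 1) * u_t.

    Assumes 0 < lam <= 1. The weights sum to one over an infinite horizon,
    so a constant payoff stream has a discounted utility equal to that constant.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    return sum(lam * (1.0 - lam) ** t * u for t, u in enumerate(stage_payoffs))


if __name__ == "__main__":
    lam = 0.3  # hypothetical discount constant
    constant_stream = [2.0] * 2000           # long truncation of an infinite stream
    front_loaded = [2.0] + [0.0] * 1999      # reward only at the first stage
    print(discounted_utility(constant_stream, lam))
    print(discounted_utility(front_loaded, lam))
```

Under this normalization a smaller `lam` spreads weight over more stages, which is consistent with the discounted value approaching the average-payoff value as the discount constant goes to zero.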
This article is concerned with the security strategies of both players. However, in our case, the system has multiple states and the game is played with lack of information on one side. Li et al. showed that the security strategies for both players in finite-horizon asymmetric-information repeated zero-sum games depend only on the informed player’s action history [20]. For infinite-horizon games, this would imply using a large amount of memory to record the action history. It is, therefore, necessary for the players to find fixed-size sufficient statistics for decision making in discounted infinite-horizon games. However, it is still nontrivial to compute optimal security strategies even with fixed-size sufficient statistics. Therefore, Li & Shamma provided approximated security strategies with guaranteed performance to solve infinite-horizon games [21].
V-A The Smart Jammer’s Approximated Security Strategy Algorithm
Since jammer’s behavior strategy depends on the type of the jammer, the type of the jammer may be revealed through the action history of the jammer. The revelation is characterized by the conditional probability over conditioned on jammer’s action history , which is updated as follows
(12) 
with and represents weighted average of . The conditional probability is also called the belief state which is the sufficient statistics for the informed player, the jammer, to make a decision at stage . It was shown in [15] that the informed player has a stationary security strategy that only depends on , and the game value satisfies the following recursive equation
(13) 
where represents jammer’s behavioral strategy given state , represents eNode B’s behavioral strategy, and . Although the game value from a stage to the end of the game changes as evolves with time, the game value from the first stage to the end of the game remains the same as is fixed.
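The belief update in (12) is a Bayes update of the type probabilities given the jammer's observed action. The sketch below assumes the standard form, in which the posterior of type k is proportional to the current belief of type k times the probability that type k plays the observed action, normalized by the belief-weighted average of that probability. Function and variable names, and all numbers, are hypothetical.

```python
from typing import List, Sequence


def update_belief(belief: Sequence[float],
                  strategies: Sequence[Sequence[float]],
                  action: int) -> List[float]:
    """Bayes update of the belief over jammer types after observing `action`.

    belief[k]        -- current probability that the jammer is of type k
    strategies[k][a] -- probability that a type-k jammer plays action a
    """
    joint = [p * x[action] for p, x in zip(belief, strategies)]
    total = sum(joint)  # belief-weighted average probability of the observed action
    if total == 0.0:
        raise ValueError("observed action has zero probability under the current belief")
    return [q / total for q in joint]


if __name__ == "__main__":
    belief = [0.5, 0.5]
    # Type-dependent strategies: the observed action carries information.
    print(update_belief(belief, [[0.5, 0.5], [0.25, 0.75]], action=0))
    # Identical strategies for both types: the belief does not move.
    print(update_belief([0.3, 0.7], [[0.0, 1.0], [0.0, 1.0]], action=1))
```

The second call illustrates why a strategy that is the same for every type leaves the uninformed player's belief unchanged, however many stages are observed.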
Computing the game value and the corresponding security strategies is a nonconvex problem [24]; therefore, an approximated strategy is proposed. The basic idea is to use the game value of a stage discounted asymmetric repeated game to approximate the game value , and to compute the approximated security strategy based on the approximated game value . Define the jammer’s stationary behavior strategy as . The approximated stationary security strategy is
(14) 
where is an matrix whose th column is .
Furthermore, Li & Shamma constructed a linear program to compute the approximated game value and corresponding approximated security strategy . It was shown that satisfies the following linear program in the discounted zerosum asymmetric game :
(15) 
(16)  
(17)  
(18)  
(19)  
(20) 
where is a set of all properly dimensioned real vectors, is a properly dimensioned real space, and corresponds to a concatenation. The approximated security strategy is
(21) 
where is the optimal solution of the linear program in (15)-(20). The curious reader is encouraged to see [21] for further details.
V-A1 The Algorithm
The LPbased algorithm for the informed player to compute its approximated security strategy and update belief state in discounted asymmetric repeated game is presented as follows [21]:
1. Initialization:
   a. Read payoff matrices , prior probability , and system state .
   b. Set receding horizon length .
   c. Let , and .
2. Compute the informed player’s approximated security strategy based on (21) with .
3. Choose an action according to the probability , and announce it publicly.
4. Update the belief state according to (12).
5. Update and go to step 2.
V-B The eNode B’s Approximated Security Strategy Algorithm
The uninformed player does not have access to the informed player’s strategy or belief state , therefore, cannot serve as its sufficient statistics. The sufficient statistics of the uninformed player, eNode B, was shown to be the antidiscounted regret which will be explained further. The regret in state is defined as the difference between the expected realized utility so far and the security level of eNode B’s security strategy, given state , i.e.,
where
is eNode B’s security strategy, indicates jammer’s behavior strategy given , and is the corresponding set including all . The antidiscounted regret is defined as
and is updated according to
(22) 
where is the th row of matrix .
Computation of the security level of eNode B’s security strategy is nonconvex [21]. Therefore, an approximated security level is used, which is the security level of eNode B’s security strategy given state in stage discounted asymmetric repeated game. The approximated security level is computed according to the following linear program:
(23) 
(24) 
(25) 
(26) 
where is properly-dimensioned real space. The approximated security level is , where is the optimal solution to the LP problem (23)-(26).
The eNode B has a stationary security strategy that only depends on the antidiscounted regret [21]. Define eNode B’s stationary behavior strategy as . Computation of the stationary security strategy of eNode B is nonconvex [21]. Therefore, an approximated stationary security strategy of eNode B is proposed in [21], which can be computed by solving the following LP problem.
(27) 
(28) 
(29) 
(30) 
(31) 
where is properlydimensioned real space. The uninformed player’s approximated security strategy is . The curious reader is encouraged to see [21] for further details.
V-B1 The Algorithm
The LPbased algorithm for the uninformed player to compute its approximated security strategy in discounted asymmetric repeated game is presented as follows [21]:

Choose an action according to the probability , and announce it publicly.

Read the informed player’s action, and update the antidiscounted regret according to (22).

Update and go to Step 2.
It is to be noted here that jamming sense is still required by the network, i.e., the network has to first decide whether it is under jamming attack or not in order to invoke the abovementioned algorithm.
V-C The eNode B’s Expected Security Strategy Algorithm
The “expected strategy” algorithm for the eNode B is defined as a mixture (convex combination) of its complete-information single-shot game security strategies, weighted by the prior . In other words, the eNode B plays the complete-information single-shot security strategies and with the probabilities and , respectively. Since the prior is common knowledge, this relieves the eNode B of “learning” and of full monitoring in a repeated game. Thus, the eNode B essentially plays a single-shot strategy in a repeated game but without the requirement of full monitoring, which may not be such a bad idea if the jammer plays “nonrevealing” strategies. Furthermore, the network does not need to observe the jammer’s action with certainty, which leads to more practical implementations. Both discounted and average payoff formulations can be used with this algorithm. It is to be noted here that the expected security strategy algorithm is novel and not based on prior work.
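The construction just described can be sketched in a few lines of Python. The two security strategies below follow the complete-information single-shot results reported in Section IV (’Throttling’ against the Cheater; roughly equal mixing of ’Normal’ and ’Timing Change’ against the Saboteur), while the prior value, the function name, and the dictionary representation are hypothetical choices made for illustration. The sketch computes the resulting per-stage action distribution.

```python
from typing import Dict, List, Sequence

Strategy = Dict[str, float]  # action name -> probability


def expected_strategy(prior: Sequence[float],
                      security_strategies: Sequence[Strategy]) -> Strategy:
    """Mix the per-type single-shot security strategies with the prior as weights."""
    actions: List[str] = sorted({a for s in security_strategies for a in s})
    return {a: sum(p * s.get(a, 0.0) for p, s in zip(prior, security_strategies))
            for a in actions}


# Single-shot security strategies of the eNode B (approximate values from the text).
VS_CHEATER: Strategy = {"Throttling": 1.0}
VS_SABOTEUR: Strategy = {"Normal": 0.5, "Timing Change": 0.5}

if __name__ == "__main__":
    prior = [0.4, 0.6]  # hypothetical prior over (Cheater, Saboteur)
    mixed = expected_strategy(prior, [VS_CHEATER, VS_SABOTEUR])
    for action, prob in mixed.items():
        print(f"{action}: {prob:.2f}")
```

Since the mixture depends only on the prior, it can be computed once before the game starts and reused at every stage without observing the jammer's actions.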
In the next section, the approximated security strategy and expected security strategy algorithms are used to design strategies for both the smart jammer and the LTE network. The discounted cost formulation is used for both the algorithms in the infinitehorizon game.
VI Performance Analysis of Repeated Game Strategy Algorithms
The zero-sum game-theoretic algorithms presented earlier are used to devise “approximated” strategy formulations for both players in a discounted utility sense. However, these algorithms require full monitoring, i.e., the network has to observe the jammer’s action at every stage with certainty. Therefore, the “expected” formulation is devised, in which the network, being the uninformed player, simply plays its single-shot best response in an expected sense, i.e., it plays its single-shot Best Response (BR) with the same probability distribution as the prior probability (which is common knowledge) of the jammer occurrence. This relieves the network of the full monitoring requirement, i.e., the network does not have to observe the jammer’s action with certainty, which leads to more practical implementations.
The performance of both “approximated” and “expected” algorithms for discounted utility formulations is characterized in the following section. However, not all of the simulation results can be shared here due to space constraints. The following parameters were used for both the players in repeated game simulations (in addition to the singleshot case): discount factor and receding horizon length . It is to be noted here that the receding horizon length of is chosen for simulation efficiency purposes and almost the same results are obtained at higher values of T.
VI-A eNode B vs. Cheater
VI-A1 Jammer Strategy
When the Cheater () is in the network, it always uses its “approximated” algorithm to devise its repeated game strategy against the network. Also, because it is the informed player, there is no ambiguity about the system state, so the Cheater can decide to reveal its superior information as much as it suits it. The Cheater’s steady-state belief state and repeated game strategy vs. prior probability are shown in Figs. 1 and 2, respectively, where and represent updated belief (probability) about the states and , respectively, and represents the kth pure action of the Cheater. It is interesting to note that the Cheater always plays the same security (pure) strategy (play = ’Jam CSRS + PUCCH’) that it uses for a single-shot game, independent of the prior probability. It is also interesting to know that the Cheater’s strategies are nonrevealing (see footnote 4), even at a relatively low prior probability of its occurrence when . This means that the network does not “learn” anything new about the jammer type from the jammer’s repeated actions despite full monitoring when and the Cheater takes full advantage of its superior information. At relatively low prior probability of Cheater’s occurrence (), the jammer reveals very little information in the first stage when the belief state gets updated to but it remains the same after that. For instance, Fig. 3 shows the evolution of the Cheater’s belief state and its strategy at every stage when the prior probability is . This puts the network at a disadvantageous position in the game if the network plays as a Bayesian player, even when it can observe the jammer’s actions perfectly at every stage.
Footnote 4: The informed player is said to play nonrevealing at stage when the posterior probabilities in (12) do not change at that stage, if its mixed move at stage is independent of the state for all values of for which . When full monitoring is assumed, not revealing the information is equivalent to not using that information [16].
VI-A2 eNode B Strategies
The eNode B’s steady-state “approximated” and “expected” security strategies vs. prior probability are plotted in Figs. 4 and 5, respectively, where represents the kth pure action of the eNode B. The network’s strategies (both “expected” and “approximated”) evolve with varying prior probability levels as it is the uninformed player. The “approximated” strategy relies on full monitoring and switches to a different strategy at , when it starts playing = ‘Throttling’ (its security strategy against Cheater in the complete-information single-shot game) in addition to playing = ‘Change ’. On the other hand, the “expected” algorithm does not rely on full monitoring and, hence, uses an expectation of its single-shot strategies, which involves playing a mixed strategy over ‘Normal’, ‘Throttling’ and ‘Change Timing’. The “expected” strategy is precomputed based on the prior probability and does not change as the game proceeds, whereas the “approximated” algorithm converges in around 12 stages. The “expected” strategy algorithm may work well enough for the network as the jammer’s strategies are mostly nonrevealing and the “approximated” algorithm requires full monitoring.
VI-A3 eNode B’s Discounted Utilities
A snapshot of both eNode B and Cheater’s actions and eNode B’s utility at every stage is shown in Fig. 6 for and . It is apparent that the eNode B’s (hence, Cheater’s) utility stabilizes very quickly at the beginning of the game, a trend that is observed throughout the repeated game. The eNode B’s “approximated” and “expected” discounted utility values against Cheater are plotted in Fig. 7 at different prior probability levels. The “approximated security” algorithm performs almost optimally when , whereas the “expected” algorithm performs poorly as compared to the “approximated” algorithm with the exception of low prior values. The “approximated” algorithm uses full monitoring and a repeated game linear programming (LP) formulation to compute its strategy and, hence, performs much better than its counterpart. On the other hand, the “expected” algorithm only relies on the prior probability and does not observe the jammer’s actions and, hence, ends up underperforming even when the jammer uses its single-shot security strategy. When the prior probability for Cheater’s occurrence is low (i.e., ), eNode B strategies fail to even come close to the complete-information single-shot value. This happens because it is rather unlikely for the Cheater to be present in the network at such a low prior value, and the eNode B strategy algorithms are not robust enough to address this problem.
VI-B eNode B vs. Saboteur
VI-B1 Jammer Strategy
Similar to the eNode B vs. Cheater game, Saboteur’s steady-state belief states and “approximated security” strategies vs. prior probability of its occurrence are shown in Figs. 8 and 9, respectively. It is very interesting to note that being the informed player, Saboteur plays nonrevealing and “misleading” strategies even at prior probability values as high as (this value goes up to for ). It plays its type (Cheater) dominant security strategy (play = ’Jam CSRS + PUCCH’) while actually being a type (Saboteur) jammer. For example, Fig. 10 shows the evolution of Saboteur’s belief state and its strategy at every stage when the prior probability is . At high prior probability values of , the jammer’s belief state goes through a transition period because the network forces it to reveal its true identity by playing with certainty. Hence, the belief state eventually settles down to the completely revealing state of . During the transition period when the jammer’s belief state converges to , it plays its single-shot security strategy for state , i.e., play = ’Jam CSRS’ and = ’Jam CSRS + PUCCH + PCFICH + PRACH’ with almost the same probability. At very high prior probability levels of , the state information () is completely revealed and the jammer plays its single-shot security strategy for state as mentioned above. Hence, the jammer uses its superior information to its complete advantage even when full monitoring is allowed. This is a good example of the strength of superior information and how it can be exploited in asymmetric games against an adversary.
VI-B2 eNode B Strategies
Similar to the repeated game against Cheater, the eNode B adapts its repeated game strategy against Saboteur as the game proceeds. From the simulations, eNode B’s strategy seems to converge in 12 stages. The “expected” strategy is shown in Fig. 11 and is deployed similarly to the game against Cheater. Since the “expected” strategy algorithm is oblivious to the actual jammer type and does not use full monitoring, its mixed strategy does not depend on the system state and is played solely based on the prior probability value.
On the other hand, the “approximated” security strategy algorithm relies on the repeated game and full monitoring to adapt its strategy. The network’s steady state “approximated” strategy vs. prior probability is plotted in Fig. 12. As discussed above, the jammer plays completely nonrevealing and misleading strategies for and, hence, eNode B gets tricked into believing that it is playing against Cheater (), when in fact it is playing against the Saboteur (). This leads the network to play the same strategy that it played against Cheater until . At , the network plays = ’Change frequency’ with certainty to force the jammer to reveal its state. When the jammer starts playing its security strategy for state at high prior values, the network switches to its own security strategy against Saboteur and plays = ’Normal’ + = ’Change Timing’. The network also plays = ’Pilot Boosting’ with a very low probability. This trend continues whenever the network observes (due to full monitoring) the jammer playing its state security strategies. It is curious to see how the network gets tricked by the jammer even with full monitoring because it lacks information about the system state.
VI-B3 eNode B’s Discounted Utilities
A snapshot of both eNode B and Saboteur’s actions and eNode B’s utility at every stage is shown in Fig. 13 for and . Similar to the game against Cheater, eNode B’s (hence, Saboteur’s) utility stabilizes very quickly at the beginning of the game. The network’s discounted utility values for both “approximated” and “expected” security strategy algorithms are plotted against prior probability in Fig. 14. The jammer strategies are mostly nonrevealing and, hence, eNode B does not seem to “learn” much about the jammer type from its repeated interaction. Therefore, the “approximated security strategy” formulation seems to perform very poorly until . At , the eNode B switches its strategy to playing = ’Change ’ and catches up to the optimal value at . Obviously, the jammer also uses full monitoring and is forced to come out and play revealing strategy at .
On the other hand, the “expected strategy” algorithm seems to perform better than the “approximated security strategy” as it does not get tricked by the jammer’s nonrevealing strategies, precisely because it does not observe the jammer’s actions. It appears that the “expected strategy” algorithm outperforms its counterpart when (or equivalently, ) given that the Cheater () is present in the network and (or equivalently, ) when Saboteur () is present in the network. Thus, it performs better in low prior probability regions, when eNode B does not expect a certain jammer type in the network.
Nevertheless, it becomes clear that the network is at a very disadvantageous position in the game against the smart jammer due to its lack of information and can be easily misled by the jammer. Furthermore, the “approximated” and “expected” strategy algorithms work in a complementary sense in favor of the network.
VII Conclusion
In this article, the smart jammer and eNode B dynamics are modeled as a strictly competitive (zero-sum) repeated asymmetric game with incomplete information and lack of information on the network side. The solution of a complete-information single-shot game is based on very familiar security strategies that lead to a Nash equilibrium. However, tractable optimal strategy formulations for infinite-horizon asymmetric repeated games do not exist in game-theoretic literature, especially for the uninformed player. Therefore, efficient LP formulations from a recent work are used for “approximated” security strategy computation for both players. However, these “approximated” strategies require full monitoring. Therefore, a simplistic yet effective “expected” security strategy algorithm is also devised for the network that does not require full monitoring.
This article also presents and discusses performance characterization of the abovementioned algorithms. It turns out that the jammer is able to play nonrevealing strategies most of the time, which implies that the network is unable to learn any new information about the jammer type in repeated games. Hence, at low prior values, the network performs worse (or equivalently, smart jammer performs better) in repeated games as compared to the completeinformation singleshot game. In the game against the Cheater, the “approximated security strategy” algorithm is able to strategize against the Cheater rather quickly and achieves its optimal utility because the jammer plays its singleshot game security strategy in repeated game. However, this advantage goes away in the game against the Saboteur, when the jammer plays misleading strategies for a wide range of prior probabilities. Nevertheless, the network’s algorithm eventually catches up and forces the jammer to reveal its true type.
The unique “expected security strategy” algorithm performs equally well or sometimes better than its counterpart “approximated security strategy” algorithm against the type Saboteur. This is because it does not use full monitoring and, hence, does not get duped by the “misinformation” spread by the jammer, which works to its advantage. However, the former algorithm performs better than the latter at low prior probability values against the type Cheater because the smart jammer always plays its single-shot game security strategy. Nevertheless, the biggest advantage of the “expected strategy” algorithm comes from the fact that it does not require full monitoring and, hence, can be easily deployed in practical networks.
References
 [1]
 [2] 3rd Generation Partnership Project (3GPP): Technical Specifications; LTE (Evolved UTRA) and LTEAdvanced Radio Technology Series (Rel 14) [Online]. Available: http://www.3gpp.org/ftp/Specs/latest/Rel14/
 [3] S. Sesia, I. Toufik, and M. Baker (Eds.), LTE  The UMTS Long Term Evolution: From Theory to Practice. (2nd ed.) West Sussex, UK: Wiley, 2011.
 [4] The Global mobile Suppliers Association (GSA). (2018, Aug.) Status of the LTE Ecosystem  August 2018. [Online]. Available: http://gsacom.com/
 [5] R. P. Jover, J. Lackey, and A. Raghavan, “Enhancing the security of LTE networks against jamming attacks,” EURASIP Journal on Information Security, 2014, 2014:7.
 [6] F. M. Aziz, J. S. Shamma, and G. L. Stüber, “Resilience of LTE networks against smart jamming attacks,” in Proc. 2014 IEEE Globecom, Austin, TX, pp. 734739, Dec. 2014.
 [7] C. Shahriar, M. L. Pan, M. Lichtman, T. C. Clancy, R. McGwier, R. Tandon, S. Sodagari, and J. H. Reed, “PHYlayer resiliency in OFDM communications: a tutorial,” IEEE Comm. Surveys & Tutorials, vol. 17, no. 1, pp. 292314, Jan. 2015.
 [8] F. M. Aziz, J. S. Shamma, and G. L. Stüber, “Resilience of LTE networks against smart jamming attacks: wideband model,” in Proc. 2015 IEEE 26th International Symposium on PIMRC, Hong Kong, China, pp. 15341538, Aug  Sep. 2015.
 [9] M. Lichtman, R. P. Jover, M. Labib, R. Rao, V. Marojevic, and J. H. Reed, “LTE/LTEA jamming, spoofing, and sniffing: threat assessment and mitigation,” IEEE Comm. Magazine, vol. 54, no. 4, pp. 5461, Apr. 2016.
 [10] F. M. Aziz, J. S. Shamma, and G. L. Stüber, “Jammer type estimation in LTE with a smart jammer repeated game,” Vehicular Technology, IEEE Trans. on, vol. 66, no. 8. 74227432, Aug. 2017.
 [11] A. Garnaev, and W. Trappe, “The rival might not be smart: revising a CDMA jamming game,” in Proc. 2018 IEEE Wireless Communications and Networking Conference (WCNC), Barcelona, Spain, pp. 16, Apr. 2018.
 [12] R. J. Aumann, and S. Hart (Eds.), Handbook of Game Theory with Economic Applications. vol. 1. Amsterdam, The Netherlands: Elsevier, 1992.
 [13] M. J. Osborne, and A. Rubinstein, A Course in Game Theory. Cambridge: The MIT Press, 1994.
 [14] R. J. Aumann, and M. Maschler, Repeated Games with Incomplete Information. Cambridge: The MIT Press, 1995.
 [15] S. Sorin, A First Course on ZeroSum Repeated Games. vol. 37. Berlin Heidelberg: SpringerVerlag, 2002.
 [16] JF. Mertens, S. Sorin, and S. Zamir, Repeated Games. New York: Cambridge University Press, 2015.
 [17] M. Jones, and J. S. Shamma, “Policy improvement for repeated zerosum games with asymmetric information,” in Proc. 51st IEEE Conference on Decision and Control (CDC) 2012, pp. 77527757, Dec. 2012.
 [18] L. Li, and J. S. Shamma, “LP formulation of asymmetric zerosum stochastic games,” in Proc. 53rd IEEE Conference on Decision and Control (CDC) 2014, pp. 19301935, Dec. 2014.
 [19] L. Li, and J. S. Shamma, “Efficient computation of discounted asymmetric information zerosum stochastic games,” in Proc. 54th IEEE Conference on Decision and Control (CDC) 2015, pp. 45314536, Dec. 2015.
 [20] L. Li, E. Feron, and J. S. Shamma, “Finite stage asymmetric repeated games: Both players’ viewpoints,” in Proc. 55th IEEE Conference on Decision and Control (CDC) 2016, pp. 53105315, Dec. 2016.
 [21] L. Li, and J. S. Shamma, “Efficient strategy computation in zerosum asymmetric repeated games,” Automatic Control, IEEE Trans. on, accepted for publication, [Online] Available: arXiv preprint arXiv:1703.01952, 2017.
 [22] V. Kamble, Games with vector payoff: a dynamic programming approach. PhD dissertation, UC Berkeley, CA, Fall 2015.
 [23] B. De Meyer, “Repeated games and partial differential equations,” Mathematics of Operations Research, vol. 21, no. 1, pp. 209-236, 1996.
 [24] T. Sandholm, “The state of solving large incomplete-information games, and application to poker,” AI Magazine, vol. 31, no. 4, pp. 13-32, 2010.
 [25] D. Fudenberg, and J. Tirole, Game Theory. Cambridge: The MIT Press, 1991.
 [26] H. P. Young, and S. Zamir (Eds.), Handbook of Game Theory with Economic Applications. vol. 4. Amsterdam, The Netherlands: Elsevier, 2015.
 [27] A. B. Mackenzie, and L. A. DaSilva, Game Theory for Wireless Engineers. San Rafael, California: Morgan & Claypool Publishers, 2006.
 [28] E. Altman, T. Boulogne, R. El-Azouzi, T. Jimenez, and L. Wynter, “A survey on networking games in telecommunications,” Computers & Operations Research, vol. 33, no. 2, pp. 286-311, 2006.
 [29] Z. Han, D. Niyato, W. Saad, T. Basar, and A. Hjorungnes, Game Theory in Wireless and Communication Networks: Theory, Models, and Applications. New York: Cambridge University Press, 2011.
 [30] X. He, H. Dai, P. Ning, and R. Dutta, “Dynamic IDS configuration in the presence of intruder type uncertainty,” in Proc. 2015 IEEE Global Communications Conference (GLOBECOM), Dec. 2015.
 [31] A. Garnaev, M. Baykal-Gürsoy, and H. V. Poor, “Incorporating attack-type uncertainty into network protection,” Information Forensics and Security, IEEE Trans. on, vol. 9, no. 8, pp. 1278-1287, Aug. 2014.
 [32] A. Garnaev, M. Baykal-Gürsoy, and H. V. Poor, “Security games with unknown adversarial strategies,” Cybernetics, IEEE Transactions on, vol. 46, no. 10, pp. 2291-2299, Oct. 2016.
 [33] A. Garnaev, and W. Trappe, “A bandwidth monitoring strategy under uncertainty of the adversary’s activity,” Information Forensics and Security, IEEE Trans. on, vol. 11, no. 4, pp. 837-849, Apr. 2016.
 [34] M. Baykal-Gürsoy, Z. Duan, H. V. Poor, and A. Garnaev, “Infrastructure security games,” European Journal of Operational Research, vol. 239, no. 2, pp. 469-478, 2014.
 [35] F. M. Aziz, Resilience of LTE networks against smart jamming attacks: a game-theoretic approach. PhD dissertation, Georgia Institute of Technology, Atlanta, GA, Summer 2017.
 [36] A. Goldsmith, Wireless Communications. New York: Cambridge University Press, 2005.
 [37] P. Viswanath, D. N. C. Tse, and R. Laroia, “Opportunistic beamforming using dumb antennas,” Info. Theory, IEEE Trans. on, vol. 48, no. 6, pp. 1277-1294, Jun. 2002.
 [38] J.-P. Ponssard, and S. Sorin, “The LP formulation of finite zero-sum games with incomplete information,” Game Theory, International Journal of, vol. 9, no. 2, pp. 99-105, Jun. 1980.