# Asymptotic Properties of Likelihood-Based Linear Modulation Classification Systems

## Abstract

The problem of linear modulation classification using likelihood-based methods is considered. Asymptotic properties of the most commonly used classifiers in the literature are derived. These classifiers are based on the hybrid likelihood ratio test (HLRT) and the average likelihood ratio test (ALRT), respectively. Both a single-sensor setting and a multi-sensor setting that uses a distributed decision fusion approach are analyzed. For a modulation classification system using a single sensor, it is shown that HLRT achieves an asymptotically vanishing probability of error ($P_e$), whereas the same result cannot be proven for ALRT. In a multi-sensor setting using soft decision fusion, conditions are derived under which $P_e$ vanishes asymptotically. Furthermore, an asymptotic analysis of the fusion rule that assumes independent sensor decisions is carried out.

**Index Terms**: Automatic modulation classification, maximum likelihood classifier, decision fusion.

## I Introduction

Automatic modulation classification (AMC) is a signal processing technique used to estimate the modulation scheme of a received noisy communication signal. It plays a crucial role in various civilian and military applications, such as spectrum monitoring and adaptive demodulation. AMC methods can be divided into two general classes (see the survey paper [1]): 1) likelihood-based (LB) and 2) feature-based (FB) methods. In this paper, we focus on the former, which is based on the likelihood function of the received signal under each modulation scheme, where the decision is made using a Bayesian hypothesis testing framework. The solution obtained by the LB method is optimal in the Bayesian sense, i.e., it minimizes the probability of incorrect classification. In the last two decades, extensive research has been conducted on AMC methods, mainly limited to methods based on receptions at a single sensor (communication receiver). A detailed survey of single-sensor AMC techniques can be found in [1]. For a single sensor tasked with AMC, the classification performance depends heavily on the channel quality, which directly affects the received signal strength. In non-cooperative communication environments, additional challenges stemming from unknown parameters such as the signal-to-noise ratio (SNR) and phase offset further complicate the problem. In order to alleviate classification performance degradation in non-cooperative environments, network-centric collaborative AMC approaches have been proposed in [2, 3, 4, 5, 6]. It has been shown that the use of multiple sensors has the potential of boosting the effective SNR, thereby improving the probability of correct classification.

In this paper, we focus on the likelihood-based classification of linearly modulated signals, i.e., PSK and QAM signals. This is a composite hypothesis testing problem due to unknown signal parameters, i.e., uncertainty in the parameters of the probability density functions (pdfs) associated with the different hypotheses. Various likelihood-ratio-based automatic modulation classification techniques have been proposed in the literature. An underlying assumption in all of these techniques is that the hypotheses have equal priors, in which case the classifiers reduce to maximum likelihood (ML) classifiers. These techniques take the form of a generalized likelihood ratio test (GLRT), an average likelihood ratio test (ALRT), or a hybrid likelihood ratio test (HLRT). A thorough review of these techniques can be found in [7]. In the GLRT approach, all the unknown parameters are estimated using ML methods, and a likelihood ratio test (LRT) is then carried out by plugging these estimates into the pdfs under both hypotheses. In addition to its complexity, GLRT has been shown to provide poor performance in classifying nested constellation schemes such as QAM [8]. In the ALRT approach [7], the unknown signal parameters are marginalized out assuming certain priors, converting the problem into a simple hypothesis testing problem. In the HLRT approach [7], the likelihood function (LF) is marginalized over the unknown constellation symbols, and the resulting average LF is used to find the ML estimates of the remaining unknown parameters. These estimates are then plugged into the average LFs to carry out the LRT. There are also several variations of HLRT, called quasi-HLRT (QHLRT), in which the ML estimates are replaced with alternatives such as moment-based estimators. We do not discuss the details here and refer the interested reader to [7].
Our goal in this paper is to derive asymptotic (in the number of observations $N$) properties of modulation classification methods. We consider both single-sensor and multi-sensor approaches. Although there has been extensive work on developing various methods for modulation classification, to the best of our knowledge, except for the work in [9], there is no work in the literature that investigates the asymptotic properties of modulation classification systems in single-sensor or multi-sensor settings. In [9], the authors consider a coherent scenario where the only unknown variables are the constellation symbols. In this scenario, they analyze the asymptotic behavior of ML classifiers for linear modulation schemes. Using the Kolmogorov-Smirnov (K-S) distance, they show that the ML classification error probability vanishes as $N \to \infty$. Our contributions in this paper are as follows. We start with a single-sensor system and analyze the asymptotic properties of two AMC scenarios: 1) a coherent scenario with known signal-to-noise ratio (SNR), and 2) a noncoherent scenario with unknown SNR. Although the first scenario is the same as the one considered in [9], we provide a much simpler proof, which is then utilized to obtain the results for our second scenario. We analyze both the HLRT and ALRT approaches. We do not consider GLRT due to its poor performance in classifying nested constellations. After analyzing single-sensor approaches, we consider a multi-sensor setting as shown in Fig. 1. Under this framework, we analyze a specific multi-sensor approach, namely distributed decision fusion for multi-hypothesis modulation classification, where each sensor uses the LB approach to make its local decision. In this setting, there are $K$ sensors observing the same unknown signal. Each sensor employs its own LB classifier and sends its soft decision to a fusion center where a global decision is made.
We analyze the asymptotic properties of ALRT and HLRT in this multi-sensor setting in the asymptotic regimes $N \to \infty$ and $K \to \infty$. We also discuss the implications of a large number of observations for the fusion rule at the fusion center.

The rest of the paper is organized as follows. In Section II, we introduce the system model and lay out our assumptions. In Section III, we formulate the likelihood-based modulation classification problem and summarize HLRT and ALRT approaches. We consider the single sensor case in Section IV and analyze the asymptotic probability of classification error under various settings. Similarly, the asymptotic probability of classification error in the multi-sensor case is analyzed in Section V. We provide numerical results that corroborate our analyses in Section VI. Finally, concluding remarks along with avenues for future work are provided in Section VII.

## II System Model and Assumptions

We consider a general linear modulation reception scenario with multiple receiving sensors, assuming that the wireless communication channel between the unknown transmitter and each sensor undergoes flat block fading, i.e., the channel impulse response is constant over the observation interval. After preprocessing, the received complex baseband signal at each sensor can be expressed as [1]:

(1) |

(2) |

where the message signal is time-varying and parameterized by the unknown signal parameter vector; $a$ and $\phi$ are the channel gain (or the signal amplitude) and the channel (or the signal) phase, respectively; the additive noise is zero-mean white Gaussian; the transmitted pulse has symbol period $T$; the complex information sequence $\{s_n\}$ is the constellation symbol sequence; and the residual time and frequency offsets, together with the propagation time delay within a symbol period, complete the parameter set. Throughout the paper, we assume that the residual time and frequency offsets are perfectly known; therefore, without loss of generality, we set them to zero. The representation in (2) carries the implicit assumption that phase jitter is negligible. Without loss of generality, we further assume that the constellation symbols have unit power, i.e., $\mathbb{E}[|s_n|^2] = 1$, where $\mathbb{E}[\cdot]$ denotes statistical expectation. Note that the unknown phase term $\phi$ in (2) subsumes both the unknown channel phase and the unknown carrier phase. Similarly, the unknown amplitude $a$ subsumes both the transmitted signal amplitude and the unknown channel gain.

After filtering the received signal with a pulse-matched filter and sampling at an integer multiple of the symbol rate, the following discrete-time observation sequence is obtained [10]:

(3) |

(4) |

where the composite pulse is the convolution of the transmitted pulse with the matched filter, and $N$ is the total number of observed information symbols, with an integer number of samples per symbol. For simplicity, we assume a rectangular transmitted pulse, one sample per symbol, and independent identically distributed (i.i.d.) circularly symmetric complex Gaussian noise $w_n$ whose real and imaginary parts each have variance $\sigma^2$. Our analysis in this paper can be easily generalized to other pulse shapes and to multiple samples per symbol. Under these assumptions, the received observation sequence can be written as:
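To make the discrete-time model concrete, the following Python sketch generates synthetic observations of the form $r_n = a e^{j\phi} s_n + w_n$ for a QPSK symbol stream. The parameter names (`a`, `phi`, `sigma2`) and all numeric values are illustrative assumptions, not values fixed by the paper.

```python
import numpy as np

# Minimal sketch of the discrete-time model r_n = a*exp(j*phi)*s_n + w_n,
# assuming one sample per symbol, a rectangular pulse, and QPSK symbols.
rng = np.random.default_rng(0)

def received_samples(n_symbols, a=1.0, phi=0.3, sigma2=0.1):
    """Generate n_symbols noisy QPSK observations under flat block fading."""
    # Unit-power QPSK constellation (E[|s|^2] = 1).
    constellation = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
    s = rng.choice(constellation, size=n_symbols)
    # Circularly symmetric complex Gaussian noise, variance sigma2 per real dim.
    w = rng.normal(0, np.sqrt(sigma2), n_symbols) \
        + 1j * rng.normal(0, np.sqrt(sigma2), n_symbols)
    return a * np.exp(1j * phi) * s + w, s

r, s = received_samples(10_000)
```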

$$r_n = a e^{j\phi} s_n + w_n, \qquad n = 1, \ldots, N \tag{5}$$

The above signal model is commonly used in the modulation classification literature [1, 11, 12, 13]. Note that $a$, $\phi$, and the constellation symbols are the unknown signal parameters. In a general modulation classification scenario, the noise power may also be unknown, in which case the unknown parameter vector can be written as $\boldsymbol{\theta} = [a, \phi, \sigma^2]^T$.

## III Likelihood-Based Linear Modulation Classification

Our goal throughout this paper is to gain insights into the modulation classification problem using the assumptions commonly made in the modulation classification literature. Suppose there are $C$ candidate modulation formats under consideration. Let $\mathbf{r} = [r_1, \ldots, r_N]$ denote the observation vector, and let $s_n^{(i)}$ denote the constellation symbol at time $n$ corresponding to modulation $i$. The conditional pdf of $\mathbf{r}$ conditioned on the unknown modulation format and the unknown parameter vector, i.e., the likelihood function (LF), is given by

(6) |

If the transmitted signal is an M-PSK signal, the constellation symbols are uniformly spaced on the unit circle; if it is an M-QAM signal, the constellation symbols lie on a square grid and are normalized to unit average power.

Note that the LF in (6) is parameterized by the modulation scheme under consideration, and the only difference between the conditional pdfs of different modulation schemes comes from the constellation symbols. In a Bayesian setting, the optimal classifier in terms of minimum probability of classification error is the maximum a posteriori (MAP) classifier. If no a priori information on the modulation scheme employed by the transmitter is available, which is usually the case in a noncooperative environment, one can use a non-informative prior, i.e., each modulation scheme is assigned an identical prior probability. This is the scenario assumed in this paper. In this case, the optimal classifier takes the form of the maximum likelihood (ML) classifier.

Let us first consider the HLRT approach, where the LF is averaged over the unknown constellation symbols and then maximized over the remaining unknown parameters. The modulation scheme that maximizes the resulting LF is selected as the final decision, i.e.,

(7) |

where denotes the expectation operator with respect to the random variable , and is the unknown constellation symbol for modulation format .

In the ALRT approach, the unknown parameters are all marginalized out resulting in the marginal likelihood function which is used to make the final decision as

(8) |

In the next section, we analyze the probability of classification error starting with a single sensor setting followed by a multi-sensor setting.

## IV Asymptotic Probability of Error Analysis: Single Sensor Case

### IV-A Scenario 1: Coherent Reception with Known SNR

In this scenario, the only unknown variables are the data symbols $s_n$, $n = 1, \ldots, N$. In this case, without loss of generality, the received complex signal can be expressed as

$$r_n = s_n + w_n, \qquad n = 1, \ldots, N \tag{9}$$

Assuming independent information symbols and white sensor noise, the LF averaged over the unknown constellation symbols under modulation format is given as

(10) |

where

(11) |

In (11), $M_i$ denotes the number of constellation symbols and $s^{(i,m)}$ the $m$-th constellation symbol of modulation class $i$. In general, the constellation symbols are assumed to have equal a priori probabilities, i.e., probability $1/M_i$ each, which results in

(12) |

where

(13) |

In this case, the pdf in (12) represents a complex Gaussian mixture model (GMM), or a complex Gaussian mixture distribution, with homoscedastic components, where each component has an identical occurrence probability (weight) as well as an identical variance, and the mean of each component is one of the unique constellation symbols in modulation format $i$. Let us revisit the generic expression for a complex GMM:
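The averaged per-sample likelihood in (12) can be evaluated numerically as a complex Gaussian mixture with equal weights, a common variance, and component means at the constellation points. A minimal Python sketch, assuming a noise-variance convention of $\sigma^2$ per real dimension:

```python
import numpy as np

def gmm_loglik(r, constellation, sigma2):
    """Log-likelihood of complex samples r under a constellation GMM:
    equal weights 1/M, common variance, means at the constellation points."""
    r = np.asarray(r)[:, None]              # shape (N, 1)
    c = np.asarray(constellation)[None, :]  # shape (1, M)
    # Circularly symmetric complex Gaussian component densities.
    comp = np.exp(-np.abs(r - c) ** 2 / (2 * sigma2)) / (2 * np.pi * sigma2)
    return np.sum(np.log(np.mean(comp, axis=1)))
```

For example, two noiseless samples at $\pm 1$ score higher under a BPSK mixture than under a QPSK mixture, since the QPSK components off the real axis receive no data.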

(14) |

where

(15) |

We know that a GMM given by (14) and (15) is completely parameterized by the set of its component weights, means, and variances [14].

###### Remark 1

For a given modulation format $i$, the Gaussian mixture model (GMM) in (12) is completely parameterized by the means of the components in the mixture, i.e., by the constellation symbol set. In other words, if two modulation formats have different constellation symbol sets, then they correspond to two different GMMs.

Let us now define the test statistics

$$L_j \triangleq \frac{1}{N} \sum_{n=1}^{N} \log p(r_n \mid H_j) \tag{16}$$

Then, the ML classifier is given as

$$\hat{i} = \arg\max_{j} L_j \tag{17}$$

The classifier performance can be quantified in terms of the average probability of error ($P_e$), given as

(18) |

where $P_{e|i}$ is the probability of error under hypothesis $H_i$, i.e., given that modulation $i$ is the true modulation,

(19) |

Now, we can state the following theorem, which shows that the probability of error of the ML classifier vanishes asymptotically as $N \to \infty$. Note that the same result was also obtained in [9] using the Kolmogorov-Smirnov (K-S) distance; here, we provide a simpler proof than the one in [9].

###### Theorem 1

The ML classifier in (17) asymptotically attains zero probability of error for classifying digital amplitude-phase modulations regardless of the received SNR, i.e.,

$$\lim_{N \to \infty} P_e = 0 \tag{20}$$

###### Proof:

Suppose $H_i$ is the true hypothesis. In order to study the asymptotic behavior of the test statistics under $H_i$, we follow the same technique as in [15] and write the following using the law of large numbers:

(21) |

(22) |

(23) |

where $\mathbb{E}_i$ denotes expectation under $H_i$; $D(p_i \| p_j)$ is the Kullback-Leibler (KL) distance between $p_i$ and $p_j$, defined as $D(p_i \| p_j) = \mathbb{E}_i[\log(p_i/p_j)]$; and $h(p_i)$ is the differential entropy, defined as $h(p_i) = -\mathbb{E}_i[\log p_i]$ [16]. Note that $h(p_i)$ does not depend on the candidate modulation $j$. Therefore, under $H_i$, the only difference between the test statistics is the KL distance $D(p_i \| p_j)$, which is equal to zero if and only if $p_i = p_j$. Now, let us revisit the ML classification rule given in (17),

(24) |

Since the second term in (23) is common to all candidate modulations, the only difference between the test statistics results from the first term in (23), which is the KL distance $D(p_i \| p_j)$. If $D(p_i \| p_j) > 0$ for $j \neq i$ and $D(p_i \| p_j) = 0$ for $j = i$, the ML classifier in (24) will always decide

(25) |

Therefore, (25) implies that perfect classification is obtained for any given SNR in the limit as $N \to \infty$ if and only if $p_i \neq p_j$ for all $j \neq i$. For digital phase-amplitude modulations, we know from (12) that each $p_i$ represents a GMM and that each modulation format corresponds to a unique GMM (see Remark 1). Therefore, $D(p_i \| p_j) > 0$ for all $j \neq i$, which is the only condition needed for the asymptotically vanishing error probability of the ML classifier. \qed
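The KL-distance argument above can be checked numerically: as $N$ grows, the per-sample averaged log-likelihood of the true format dominates and the ML decision settles on the true hypothesis. The constellations, noise variance, and sample size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 0.5  # noise variance per real dimension (about 0 dB SNR here)

def loglik(r, const):
    """Log-likelihood of samples r, averaged over equally likely symbols."""
    d = np.abs(r[:, None] - const[None, :]) ** 2
    comp = np.exp(-d / (2 * sigma2)) / (2 * np.pi * sigma2)
    return np.sum(np.log(np.mean(comp, axis=1)))

bpsk = np.array([1.0, -1.0])
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
formats = [bpsk, qpsk]

def classify(n):
    """Draw n noisy QPSK symbols and return the ML format decision."""
    s = rng.choice(qpsk, size=n)  # QPSK is the true format
    w = rng.normal(0, np.sqrt(sigma2), n) \
        + 1j * rng.normal(0, np.sqrt(sigma2), n)
    return int(np.argmax([loglik(s + w, c) for c in formats]))

decision = classify(5000)  # settles on the true format (index 1) for large n
```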

### IV-B Scenario 2: Noncoherent Reception with Unknown SNR

In this scenario, the received complex signal is expressed as

$$r_n = a e^{j\phi} s_n + w_n, \qquad n = 1, \ldots, N \tag{26}$$

In this case, in addition to the unknown constellation symbols, there are three more unknown parameters: the channel amplitude ($a$), the channel phase ($\phi$), and the noise power ($\sigma^2$). We will denote these additional unknown parameters in vector form as $\boldsymbol{\theta} = [a, \phi, \sigma^2]^T$.

Let us first consider the HLRT approach, where the unknown data symbols are marginalized out and the remaining unknown parameters are estimated using an ML estimator. In HLRT, these ML estimates are plugged into the likelihood function to perform the ML classification task. In practice, the complex channel gain can be either random or deterministic depending on the application. In deep-space communications, the channel gain can be assumed to be a deterministic time-independent constant [17], whereas in urban wireless communications, the channel gain is often assumed to be random due to multipath effects resulting in fading. In fading channels, the duration over which the channel gain remains constant depends on the coherence time of the channel. Nevertheless, in HLRT, the channel gain is always treated as a deterministic unknown regardless of the application, and ML estimation is employed to estimate the unknown parameters. The resulting likelihood function for modulation $i$ can be written as

(27) |

where

(28) |

(29) |

In order to be explicit, we re-write (28) as

(30) |

From (30), we can see that the averaged likelihood represents a complex GMM with homoscedastic components, where each component has an identical occurrence probability as well as an identical variance, and the mean of each component is one of the unique constellation symbols in modulation format $i$ multiplied by the complex channel gain $a e^{j\phi}$.

We can define the new test statistics, which now include the estimates of the unknown parameters, as

(31) |

Then (29) can be equivalently written as

(32) |

and the ML classifier is given as

(33) |

We start the analysis by making the following observations. In practice, there is always some a priori knowledge of bounds on the unknown parameters $a$ and $\sigma^2$. In other words, the search space for the maximization of the likelihood function with respect to $a$ and $\sigma^2$ can be confined to known closed intervals. Regarding the unknown phase $\phi$, the search space depends on the modulation class under consideration. For M-PSK modulations, it suffices to limit the search space of $\phi$ to one period of length $2\pi/M$, because the likelihood function averaged over the unknown constellation symbols is periodic in $\phi$ with period $2\pi/M$: rotating the constellation map by $2\pi/M$ yields the same constellation map, and hence the same averaged likelihood. Similarly, for M-QAM modulations, it suffices to limit the search space of $\phi$ to one period of length $\pi/2$, since a square QAM constellation is invariant to rotations by $\pi/2$. We now make the following assumption, which simplifies the mathematical analysis: the unknown parameters lie in the interior of the resulting closed cube (for M-PSK or M-QAM, respectively). These assumptions are almost always satisfied in practice. Let us denote this closed Euclidean parameter space by $\Omega$, with the phase interval chosen according to whether the modulation is M-PSK or M-QAM.
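The bounded search described above can be implemented as a coarse grid search over $(a, \phi, \sigma^2)$, with the phase grid confined to one period of the averaged LF. A minimal Python sketch; the grid ranges, resolutions, constellations, and true parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def avg_loglik(r, const, a, phi, sigma2):
    """LF averaged over equally likely symbols, for a trial (a, phi, sigma2)."""
    means = a * np.exp(1j * phi) * const
    d = np.abs(r[:, None] - means[None, :]) ** 2
    comp = np.exp(-d / (2 * sigma2)) / (2 * np.pi * sigma2)
    return np.sum(np.log(np.mean(comp, axis=1)))

def hlrt_stat(r, const, phi_period):
    """Grid-search surrogate for the HLRT maximization over (a, phi, sigma2)."""
    best = -np.inf
    for a in np.linspace(0.5, 2.0, 16):          # amplitude bounds assumed known
        for phi in np.linspace(0.0, phi_period, 16, endpoint=False):
            for s2 in np.linspace(0.2, 1.0, 8):  # noise-power bounds assumed known
                best = max(best, avg_loglik(r, const, a, phi, s2))
    return best

# BPSK data with (a, phi, sigma2) = (1.2, 0.4, 0.5), unknown to the classifier.
const_bpsk = np.array([1.0, -1.0])
const_qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
s = rng.choice(const_bpsk, size=2000)
w = rng.normal(0, np.sqrt(0.5), 2000) + 1j * rng.normal(0, np.sqrt(0.5), 2000)
r = 1.2 * np.exp(1j * 0.4) * s + w
stat_bpsk = hlrt_stat(r, const_bpsk, np.pi)      # phase period pi for BPSK
stat_qpsk = hlrt_stat(r, const_qpsk, np.pi / 2)  # phase period pi/2 for QPSK
```

The HLRT decision picks the format with the larger maximized statistic; here the true (BPSK) statistic dominates.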

###### Lemma 1

Let denote the set of PSK and QAM modulation classes. Define . Let , , . If , then

(34) |

###### Proof:

See Appendix A. \qed

The following theorem states that the probability of error of the HLRT classifier vanishes asymptotically as $N \to \infty$.

###### Theorem 2

The ML classifier in (33) asymptotically attains zero probability of error for classifying digital amplitude-phase modulations regardless of the received SNR.

###### Proof:

Suppose $H_i$ is the true hypothesis and $\boldsymbol{\theta}_0$ denotes the true value of the unknown parameter vector. We start by noting that the maximum likelihood estimator (MLE) is consistent under some mild regularity conditions [18], which are satisfied by the likelihood functions of digital amplitude-phase modulations. In other words, if $H_i$ is the true hypothesis and $\boldsymbol{\theta}_0$ is the true value of the unknown parameter vector $\boldsymbol{\theta}$, then

(35) |

Under $H_i$, we write the following using the law of large numbers

(36) |

where the expectation is taken with respect to the true distribution under $H_i$. Then, (36) can be written as

(37) |

(38) |

where the second term is the differential entropy of the true distribution. The proof follows from Lemma 1 and the same reasoning as in Theorem 1. \qed

From (38), we can make the following observation. Under $H_i$ and the true parameter $\boldsymbol{\theta}_0$,

(39) |

(40) |

As $N \to \infty$, the MLE minimizes the KL distance between the true and the assumed distributions. This was observed by Akaike [19] in the area of maximum likelihood estimation under misspecified models (see also [20]). We should also emphasize that the consistency of the ML estimator is necessary for $P_e$ to vanish as $N \to \infty$, since otherwise one cannot deduce (38) from (37). As one would expect, the result in Theorem 2 is useful in practice only when the channel gain remains constant over a large observation interval. Channels that exhibit such behavior include deep-space communication channels as well as slowly varying fading channels.

Next, we consider a variation of the HLRT approach where, in addition to the unknown data symbols, a subset of the remaining unknown parameters is marginalized out, and the maximization is then carried out over the rest. Let $\boldsymbol{\theta}_1$ denote the subset of unknown parameters that are marginalized out, with joint a priori distribution $p(\boldsymbol{\theta}_1)$, and let $\boldsymbol{\theta}_2$ denote the vector of remaining unknown parameters over which the maximization is carried out. Then, the ML classifier is given as

(41) |

(42) |

where

(43) |

Since the marginalized unknowns stay constant over the observation interval, it is clear from (43) that the observations become dependent after averaging (conditional independence is no longer valid), i.e.,

(44) |

Due to this dependence, the law of large numbers cannot be invoked. Therefore, these classifiers do not have a provably vanishing $P_e$ in the asymptotic regime as $N \to \infty$. This is also the case for the ALRT approach, where all the unknowns are marginalized out before classification. In practice, ALRT may be preferred over HLRT since the latter requires multi-dimensional maximization of the LF, which is generally a non-convex optimization problem. In order to alleviate this problem, a suboptimal HLRT called quasi-HLRT (QHLRT) was proposed in [8, 12], where the MLEs of the unknown parameters are replaced with moment-based estimators. In general, QHLRT does not guarantee a provably asymptotically vanishing $P_e$, since such estimators are not guaranteed to be consistent.
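As one concrete example of a moment-based alternative, the classical M2/M4 estimator recovers the signal and noise powers from the second and fourth absolute moments of the received samples: for constant-modulus (PSK) symbols, $M_2 = S + \nu$ and $M_4 = S^2 + 4S\nu + 2\nu^2$, where $S = a^2$ and $\nu$ is the total complex noise power, so that $S = \sqrt{2M_2^2 - M_4}$. A minimal Python sketch; the BPSK test signal and its parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def m2m4(r):
    """M2/M4 moment estimates of (signal power, noise power) for PSK inputs."""
    m2 = np.mean(np.abs(r) ** 2)
    m4 = np.mean(np.abs(r) ** 4)
    S = np.sqrt(max(2 * m2 ** 2 - m4, 0.0))  # signal-power estimate a^2
    return S, m2 - S                         # noise-power estimate nu = M2 - S

# BPSK with amplitude 1.5 (S = 2.25) and total noise power nu = 0.6.
s = rng.choice(np.array([1.0, -1.0]), size=100_000)
w = rng.normal(0, np.sqrt(0.3), 100_000) + 1j * rng.normal(0, np.sqrt(0.3), 100_000)
S_hat, nu_hat = m2m4(1.5 * s + w)
```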

## V Asymptotic Probability of Error Analysis: Multi-Sensor Case

In this section, we consider a multi-sensor setting where each sensor transmits its soft decision to a fusion center where a global decision is made. We start our analyses assuming soft decision fusion where each sensor sends its unquantized local likelihood value to the fusion center.

In a multiple-sensor scenario, the set of unknown parameters corresponding to each sensor is independent of that of the other sensors. However, care must be taken when analyzing this scenario, as the independence of these unknowns does not guarantee the independence of the sensor observations. In the following, we investigate the multiple-sensor scenario and derive conditions under which the asymptotic error probability goes to zero.

### V-A Scenario 1: Coherent Reception with Known SNR

We first consider the general case for the coherent and synchronous environment, where there are $K$ sensors and each sensor makes $N$ observations. Let us define the vector of observations at sensor $k$ as $\mathbf{r}_k$, $k = 1, \ldots, K$. We also define the set of indices of the complex information sequence that sensor $k$ observes as

(45) |

Similar to (10)-(12), the likelihood function at sensor $k$ is

(46) |

(47) |

Consider two arbitrary likelihood functions for sensors $k$ and $l$, where $k \neq l$. Assuming independent sensor noises, it is important to see that these two likelihood functions are independent if and only if

(48) |

The condition in (48) is required for independence since the data symbols are marginalized out in the likelihood function. We should note that the implicit assumption in (48) is that the data symbols are i.i.d. in time, which is a common assumption in the communications literature. From (48), we can deduce the general condition for independence: all sensor observations are independent (across sensors) if and only if

(49) |

Physically, the condition in (49) implies that the sensor observations, or the underlying baseband symbol sequences, must not overlap in time for independence to hold. This condition may or may not be realized in practice. One possible way of obtaining independent sensor observations is to send a pilot signal to each sensor to initiate data collection, leaving enough time between two consecutive pilot signals so that each sensor observes a different, non-overlapping time window of the same signal.
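The non-overlapping-window condition in (49) amounts to simple index bookkeeping. A minimal Python sketch, where the contiguous-window assignment is one illustrative scheduling choice:

```python
def sensor_windows(num_sensors, symbols_per_sensor):
    """Assign each sensor a contiguous, non-overlapping block of symbol indices."""
    n = symbols_per_sensor
    return [set(range(k * n, (k + 1) * n)) for k in range(num_sensors)]

windows = sensor_windows(num_sensors=4, symbols_per_sensor=100)
# Pairwise-disjoint index sets: after marginalizing over the i.i.d. data
# symbols, the corresponding sensor likelihoods are independent.
```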

Suppose the condition in (49) is satisfied, and let the likelihood function at the fusion center for modulation $i$ be defined as

(50) |

We can now define

(51) |

Note that the independence condition is necessary in order for the second equality in (51) to hold. Then, the ML classifier is given as

(52) |

###### Theorem 3

As $N \to \infty$, the ML classifier in (52) attains zero probability of error for classifying digital amplitude-phase modulations, regardless of the received SNR.

###### Proof:

The proof follows the same steps as in Theorem 1 and is omitted here for brevity. \qed

### V-B Scenario 2: Noncoherent Reception with Unknown SNR

In this scenario, the received complex signal at sensor $k$ can be expressed as

(53) |

The vector of unknowns for sensor $k$ is $\boldsymbol{\theta}_k = [a_k, \phi_k, \sigma_k^2]^T$. Let us first consider the HLRT approach, where sensor $k$ computes its likelihood by first marginalizing over the unknown symbols and then plugging in the MLE of $\boldsymbol{\theta}_k$. Let us define the vector of observations at the fusion center as the concatenation of the sensor observation vectors. Suppose that the independence condition in (49) is satisfied, and let the likelihood function at the fusion center for the HLRT be given as

(54) |

Following the same reasoning as in the single-sensor scenario, we can claim that $P_e \to 0$ as $N \to \infty$ using Theorem 1. However, the same result cannot be claimed for finite $N$ even when $K \to \infty$, due to the different unknown parameters at different sensors.

If a subset of the unknowns is marginalized out in the HLRT approach (see Section IV-B, eqs. (41)-(44)), the distribution at the fusion center takes the following form:

(55) |

where the plug-in value denotes the ML estimate of the remaining unknown parameters of sensor $k$ under $H_i$, i.e.,

(56) |

where

(57) |

Then, the ML classifier is given as

(58) |

Similar to (44), since the unknowns at each sensor stay constant over that sensor's observation interval, it is clear from (57) that the observations become dependent after averaging, i.e.,

(59) |

Therefore, these classifiers do not have a provably vanishing $P_e$ in the asymptotic regime, either as $N \to \infty$ (due to the dependence) or as $K \to \infty$ (due to the different unknown parameters at different sensors).

Let us now consider the ALRT approach, where all the unknowns are marginalized out. Denote the joint a priori distribution of the unknown parameters at sensor $k$ as $p(\boldsymbol{\theta}_k)$, and let the likelihood function at the fusion center for ALRT be defined as

(60) |

where

(61) |

Now, define the following

(62) |

The ML classifier is given as

(63) |

For ALRT, we consider a special case where the noise power is known:

(64) |

###### Lemma 2

Let denote the set of PSK and QAM modulation classes. Define as given in (64). For , if and , then .

###### Proof:

See Appendix B. \qed

###### Theorem 4

Suppose the noise power is known, the channel amplitude is Rayleigh distributed, and the channel phase is uniformly distributed over $[0, 2\pi)$. Then the ML classifier in (63) achieves zero probability of error as $K \to \infty$.

Theorem 4 ensures that an asymptotically vanishing $P_e$ is guaranteed in the number of sensors $K$ if ALRT is used at each sensor, provided that the sensors have independent observations, i.e., each sensor observes a non-overlapping time window of the transmitted signal. In other words, using a multi-sensor approach ensures an asymptotically vanishing $P_e$ for ALRT, which is not provably the case for a single sensor, as explained in Section IV-B.
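A minimal Python sketch of the soft-decision fusion pipeline under the independence condition: each sensor computes its local averaged log-likelihood over its own symbol window, and the fusion center sums the sensor statistics (the log of the product in (50)) before taking the arg-max. The coherent, known-SNR local statistic and all numeric values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 0.5  # noise variance per real dimension, assumed known

def loglik(r, const):
    """Local averaged log-likelihood of one sensor's window of samples."""
    d = np.abs(r[:, None] - const[None, :]) ** 2
    comp = np.exp(-d / (2 * sigma2)) / (2 * np.pi * sigma2)
    return np.sum(np.log(np.mean(comp, axis=1)))

bpsk = np.array([1.0, -1.0])
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
formats = [bpsk, qpsk]

def fuse(num_sensors, n_per_sensor):
    """Sum per-sensor log-likelihoods (non-overlapping windows) and decide."""
    fused = np.zeros(len(formats))
    for _ in range(num_sensors):
        s = rng.choice(bpsk, size=n_per_sensor)  # BPSK is the true format
        w = rng.normal(0, np.sqrt(sigma2), n_per_sensor) \
            + 1j * rng.normal(0, np.sqrt(sigma2), n_per_sensor)
        fused += [loglik(s + w, c) for c in formats]
    return int(np.argmax(fused))

decision = fuse(num_sensors=8, n_per_sensor=200)
```

Because each sensor draws its own symbol window, the fused statistic is a sum of independent terms, which is what makes the $K \to \infty$ analysis tractable.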

### V-C Fusion Rule

In this section, we analyze the implications of the independence condition in (49) for decision-fusion-based modulation classification. For a finite number of observations $N$, if (49) is not satisfied, there are sensors observing the same baseband sequence, resulting in dependent observations due to the averaging over the unknown constellation symbols. In that case, even though the sensor noises are independent, the joint conditional distribution at the fusion center cannot be written as a product of the individual conditional distributions, i.e.,

(65) |

However, in the asymptotic regime as $N \to \infty$, we have the following theorem.

###### Theorem 5

Suppose there are two groups of sensors observing the same signal with unknown modulation: the sensors in the first group have arbitrary overlaps in their observations, while the sensors in the second group have no overlaps. Let the likelihood function of each sensor under $H_i$ represent either a coherent scenario with known SNR as in (46), or a noncoherent scenario with unknown SNR in the form of the hybrid LF as in (27) or (57), or the average LF as in (61). Suppose both groups use the same fusion rule to classify the unknown modulation, given as:

(66) |

(67) |

Let $P_e^{(1)}$ and $P_e^{(2)}$ denote the probabilities of classification error for the fusion rules in (66) and (67), respectively. As $N \to \infty$, we have the following result:

(68) |

###### Proof:

Sensor observations in the first group are dependent. This dependence results solely from overlapping sensor observations, regardless of the scenario under consideration and regardless of which classification algorithm is employed (hybrid or average LF). Suppose $H_i$ is the hypothesis under consideration, let the constellation of modulation $i$ contain $M_i$ symbols, and let $s_1, \ldots, s_N$ denote the constellation symbol sequence received by an arbitrary sensor. For a fixed constellation point, define the indicator function that equals one if a received symbol equals that point and zero otherwise. Now, define

(69) |

which represents the number of occurrences of a given constellation symbol in the received symbol sequence. Now, take the limit

(70) |

where the first equality results from applying the law of large numbers and the second from the fact that each symbol in the constellation set is equally likely. We can rewrite (70) as

(71) |

which implies that, as $N \to \infty$, the fraction of occurrences of each constellation symbol becomes identical. Therefore, in the asymptotic regime ($N \to \infty$), each sensor observes an equal proportion of the different constellation symbols, whether those symbols overlap across sensors or not.

Now, consider sensor $k$ and let $s_{k,n}$ denote the $n$-th symbol received by sensor $k$. Note that the likelihood function of sensor $k$ is permutation invariant with respect to the received symbol sequence, because the symbols are i.i.d. and the background noise is white. In other words, the likelihood is invariant to the order of the received symbol sequence. Let us define a virtual sensor and suppose that it observes an i.i.d. symbol sequence that does not overlap with those observed by the other sensors. As we let $N \to \infty$, the numbers of occurrences of each symbol in the two sequences become identical from (71). This implies that