# Structure-Aware Bayesian Compressive Sensing for Frequency-Hopping Spectrum Estimation with Missing Observations

## Abstract

In this paper, we address the problem of spectrum estimation of multiple frequency-hopping (FH) signals in the presence of random missing observations. The signals are analyzed within the bilinear time-frequency (TF) representation framework, where a TF kernel is designed by exploiting the inherent FH signal structures. The designed kernel permits effective suppression of cross-terms and artifacts due to missing observations while preserving the FH signal auto-terms. The kernelled results are represented in the instantaneous autocorrelation function domain, which are then processed using a re-designed structure-aware Bayesian compressive sensing algorithm to accurately estimate the FH signal TF spectrum. The proposed method achieves high-resolution FH signal spectrum estimation even when a large portion of data observations is missing. Simulation results verify the effectiveness of the proposed method and its superiority over existing techniques.

Frequency hopping, spectrum estimation, missing observations, Bayesian compressive sensing, time-frequency distribution, kernel design.

## 1 Introduction

\IEEEPARstartFrequency-hopping (FH) signals are generated by varying the carrier frequencies according to a certain hopping pattern, which is typically pseudo-random. Due to their inherent capability of low probability of intercept, reduced interference to/from other users, resistance to jamming and multipath fading, and desirable ambiguity function property, FH signals have become a favorable choice in a wide range of communication and radar applications, particularly in the context of multiple-input multiple-output (MIMO) operations [3, 4, 5, 6, 7]. For a variety of tasks ranging from interception of non-cooperative emitters to exploitation of signals of opportunity for passive sensing, estimating and tracking the instantaneous spectrum of FH signals are an important yet challenging task when the hopping patterns of the constituent signals are unavailable. The problem becomes even more difficult when the hopping period is time-varying [9].

In this paper, we consider the spectrum estimation of multi-emitter FH signals with unknown and time-varying hopping periods in the context of Bayesian compressive sensing (BCS). In particular, we focus on the case where the received signal waveform is subject to missing observations. The specific FH signal structures are utilized to design time-frequency (TF) kernels and BCS structure priors to achieve reliable and high-resolution FH spectrum estimation.

The continuous-time noisy multi-emitter FH signal considered in this paper is expressed as [8]

(1) |

where , and represents a normalized boxcar function, which is equal to one for and otherwise. In addition, denotes the duration of each hop of the -th individual FH emitter, and is the number of FH emitters. Moreover, and are the complex amplitude and carrier frequency of the -th tone in the -th system-wise dwell, respectively. The number of tones, , may vary with because of emitter (de)activation or bandwidth mismatch [9]. represents the additive circularly-symmetric complex white Gaussian noise. Let and respectively denote the sampling rate and the sampling interval. Then, the sampled discrete-time FH signal can be derived from (1) as

(2) |

In practice, the measured data may experience missing samples due to channel distortion/fading, line-of-sight obstruction, removal of samples contaminated by impulsive noise, and collecting/storage equipment failures [10]. Denote as signal with missing data, and as the set of missing time instants with cardinality , where is the total length of signal and , and represents the missing-sample ratio. Then, can be interpreted as modulated by a sum of Dirac delta functions (impulses) [11], i.e.,

(3) |

Random missing observations induce noise-like artifacts in the time-frequency distributions (TFD) [17], which makes the problem even more intractable.

### 1.1 Related Work

Time-varying spectrum signatures of non-stationary signals, such as FH signals, can be revealed in the joint TF domain representations. As FH signals generally exhibit sparsity in the joint TF domain, compressive sensing (CS) and sparse reconstruction techniques [12, 13, 14] enable effective FH spectrum representation and parameter estimation. In [9, 15], this problem is solved by formulating the problem as an underdetermined linear regression with a dual sparsity penalty, i.e., a penalty function that controls both the intrinsic sparsity and smoothness of the estimation. However, this approach requires appropriate tuning of the parameters, and obtaining the optimum solution still requires considerable effort. Another limitation of the approach proposed in [9, 15] is that they do not provide robust estimation performance due to the sensitivity of the differential operator used in fused least absolute shrinkage and selection operator (LASSO). To improve the parameter estimation performance, particularly in low signal-to-noise ratio (SNR) conditions, a BCS method was adopted in [16], where a logistic stick breaking process is employed to encourage the temporal clustering over each hopping interval. BCS algorithm enables, through the proper design of priors, the incorporation of the contiguity property of typical TF signatures and thus enhances sparse optimization solutions. However, all the aforementioned approaches are based on linear TF analyses, and do not account for the effect of missing observations. Actually, linear approaches fail in the case of missing samples, as we will show in Section 4.

As described in [17, 18, 19, 10, 11], the effect of artifacts induced by random missing samples can be substantially reduced by applying proper TF kernels, which involves developing FH spectrum estimation methods in the bilinear time-frequency representation (TFR) framework. Sparse reconstruction of TFRs using different CS methods can also be found therein. It is known that bilinear (quadratic) TFDs provide high-resolution time-varying spectrum representations. The Wigner-Ville distribution (WVD) is considered as a prototype of bilinear TFDs, which offers highest TF energy concentration for single-component linear frequency modulated signals. However, because of the bilinear nature, it causes cross-terms between different components that constitute false energy distributions. To resolve this problem, various reduced-interference distributions have been developed for cross-term reduction through the design of appropriate TF kernels in the general Cohen’s class [20, 21]. Such TF kernels can be signal-independent or signal-dependent. The latter performs parameter tuning via optimization, and thus generally provides better performance in trading off the cross-term suppression and the auto-term preservation. In particular, the adaptive optimal kernel (AOK), which is based on the optimization of radial Gaussian functions in the ambiguity function (AF) domain [22, 23], is a commonly used signal-dependent kernel.

Recently, such approaches have been adopted to estimate FH spectrum from data with missing observations, and an orthogonal matching pursuit (OMP) algorithm based approach was developed to achieve both artifact mitigation and high-resolution FH signal spectrum estimation [1]. The filtering capability of TF kernels offers bilinear TFR unique advantages over its linear counterpart [9, 15, 16] to effectively suppress the artifacts induced by missing observations. In the underlying problem that deals with FH spectrum estimation, however, separately reconstructing the TFR in each time instant as in [1] does not utilize an important signal characteristic relevant to the contiguous structure of the FH signatures. In particular, the approach may likely generate isolated or sporadic entries in the reconstructed TFR in the presence of missing data and/or measurement noise. In [24], a novel continuous structure based BCS approach [25] is proposed for the sparse reconstruction of nonstationary signals with missing observations. On this basis, a re-designed BCS-based scheme that exploits the contiguous structure of the FH signal is applied in [2, 26] to provide additional robustness in the FH spectrum estimation. Compared with the FH spectrum estimation via OMP [1], the BCS-based approach is proved capable to achieve an improved sparse solution [25]. The BCS methods approach sparse solutions that are close to -norm optimization and support structure-aware sparse problems through the use of adequate priors.

### 1.2 Contributions

The main novelty of this paper lies in the development of a *Comprehensive Structure-Aware Spectrum Estimation* technique for FH signals which is more advantageous than existing techniques. In particular, it is the first time to investigate the spectrum estimation for FH signals *in the presence of missing observations*. The concept of structure-awareness contains two major components, namely, structure-aware TF kernel design and structure-aware TFR reconstruction. (a) A structure-aware TF kernel is first developed in the AF domain to perform effective suppression of cross-terms and artifacts due to missing observations while preserving the FH signal auto-terms. In particular, we propose a new waveform-adaptive TF kernel design which combines an automatically optimized pre-filtering window and the data-dependent AOK. The pre-filtering window function exploits the prior information of the FH waveform characteristics, whereas the AOK further optimizes the kernel for effective cross-term and artifact reduction while preserving the signal auto-terms. The kernelled AF is then transformed to the instantaneous autocorrelation function (IAF) domain through a one-dimensional (1-D) Fourier transform with respect to (w.r.t.) the frequency difference (Doppler) domain. The IAF results are then processed using sparse reconstruction methods for high-resolution reconstruction of the FH signal TF spectrum. (b) In the sparse reconstruction process, a re-designed BCS approach is developed to estimate the TFD of the signal from the IAF. A novel structure prior for the TFD is imposed to enforce the unique horizontal continuity of the TFR, that characterize the underlying FH signals. Compared with [24, 27], in addition to designing new structure-aware patterns, we also propose nonlinear updating rules associating the hyper-parameters with the TF patterns, rather than simply select the hyper-parameters from fixed categories. As such, the proposed approach can robustly estimate the FH spectrum in the presence of a high number of missing samples and when the *a priori* information of the hopping patterns is unavailable. As we will show in Section 4, while existing methods coping with the FH parameter estimation problem developed in [9, 15, 16] deteriorate sharply when treating data with missing observations, the proposed approach achieves superior performance in such challenging scenarios.

Notations: Lower-case (upper-case) bold characters are used to denote vectors (matrices). returns the modulus of a given complex number. denotes Hadamard product. represents a diagonal matrix that uses the entries of a vector as its diagonal entries, and denotes an identity matrix. and denote the 1-D discrete Fourier transform (DFT) and inverse discrete Fourier transform (IDFT) matrices w.r.t. the dimension, respectively, and denotes a two-dimensional (2-D) DFT w.r.t. the and dimensions. , and respectively denote complex conjugate, transpose and Hermitian operations of a matrix. represents the -norm of a vector, and denotes the cardinality of a set. denotes the probability density function (PDF). , , , and denote Bernoulli, complex Gaussian, Beta, and Gamma distributions, respectively.

## 2 Structure-Aware Adaptive Kernel Design

The main stages of the proposed structure-aware scheme are summarized in the flowchart depicted in Fig. 1. In this section, we first present a detailed description of the proposed signal-dependent kernel design. A joint-variable representation of the FH spectrum in the presence of missing samples is first described in Section 2.1, and the adaptive kernel design is introduced in Section 2.2. Section 2.4 discusses the optimization of the pre-filtering parameters.

### 2.1 Joint-Variable Representations of Missing-Sample FH Spectrum

The discrete-time IAF of signal is defined as [21]

(4) |

where denotes the time-lag index. Stacking corresponding to all values of and results in an IAF matrix . Then, the AF matrix of signal vector , expressed w.r.t. lag and Doppler frequency , can be obtained by performing 1-D IDFT on the IAF w.r.t. the time index , i.e.,

(5) |

where the notation is used to emphasize that the matrix is constructed w.r.t. variables and . Similarly, the WVD can be obtained by performing 1-D DFT on the IAF w.r.t. the lag index , i.e.,

(6) |

*Remarks:* Note that we use in the above expression to perform the DFT because integer lags are adopted. This is a common practice in computing the discrete WVD.

Substituting (3) and (4) into (5), the AF of the observed FH signal with missing samples can be obtained as follows

(7) |

where and respectively denote the IAF and AF of the original FH signal without missing samples. The term in (7) contains auto-terms and cross-terms.

It can be observed from (7) that, the missing-sample AF consists of two parts, i.e., the full-data AF of and the artifacts due to missing samples. The latter contains the auto-terms of the missing samples and the cross-terms between the signal and the missing samples. The artifacts expressed in (7) resemble noise in the sense that they spread over the entire ambiguity domain. The noise pattern of the first two artifact terms in the ambiguity domain depends on the values of the missing observations and their positions, whereas the third artifact term is only affected by the missing-sample positions. As pointed out in references [10, 11], careful attention should be paid to the last artifact term, which is always located at , i.e., along the Doppler frequency axis. This discourages the use of conventional kernels which, due to the required marginal properties, capture all values along the axis. A typical AF magnitude plot of an FH signal is depicted in Fig. 2(a).

### 2.2 Adaptive Kernel Design

With the use of the *a priori* information on the TF structure of the FH signal, i.e., its piecewise constant frequency TF signature, we can apply a proper pre-filtering window before optimizing the AOK so as to prevent the artifacts from being falsely identified as desired signal components and misguiding the AOK optimization process. Generally, for signals whose auto-terms are nearly parallel to either the lag or the Doppler axis, which are exactly the case with the FH signals considered in this paper, the extended compact support kernel (ECSK) outperforms the other kernels in terms of artifact suppression and auto-term preservation [28, 21, 29]. The ECSK also provides flexibility to independently adjust the shape and the size of the kernel. In this paper, we modify the ECSK such that different shape control parameters are used for the two branches, i.e., the lag and Doppler, to offer better flexibility. The modified ECSK is formulated as

(8) |

where

(9) |

and

(10) |

respectively represent lag/Doppler window branches. In the above expressions, and denote the shape control parameters of the two branches, and and represent their respective sizes. Larger values of and result in a steeper kernel shape in the corresponding branch, whereas larger values of and imply a larger kernel size.

In our proposed method, prior to the radial kernel optimization procedure, the short-time AF is pre-filtered by utilizing the modified ECSK *in a time-localized, short-time manner*, as illustrated in Fig. 2(b), where the preserved support for the auto-terms is a sufficiently small region where the Doppler frequency is nearly zero. In doing so, the vertical TFD stripes due to impulsive missing samples, whose AF components spread in the Doppler domain, and noise-like artifacts, whose AF components spread in the entire ambiguity domain, are effectively eliminated.

### 2.3 Pre-filtering Parameter Optimization

Enhanced TFD concentration generally yields sharp TF representations and reduced vicinal interference. To achieve an optimal pre-filtering performance that simultaneously maximizes the TFD concentration and minimizes the TFD artifacts, parameter pairs and as well as and should be tuned to their optima based on a proper criterion. Several optimization criteria are available in the literature for the evaluation of the concentration performance. Among these criteria, distribution norm-based measures [30, 31] and entropy-based measures [32, 33] are commonly used. However, norm-based measures tend to discriminate poorly concentrated components [34], whereas entropy-based measures are sensitive to amplitude and phase variations [35]. In this context, an efficient energy concentration measure is introduced in [34] that overcomes the aforementioned drawbacks, and has been applied to automatic determination of the best window length in the computation of spectrogram. In this paper, an automatic parameter tuning method is proposed based on this energy concentration measure. The discrete-time expression of this energy concentration measure can be written as

(11) |

We define the cost function in our pre-filtering parameter optimization process as

(12) |

The constraints in (12) are set according to the domain of definition and can are applicable to different types of FH signals.

To achieve a fully automated optimization of the kernel parameters, an adaptive differential evolution algorithm [36] is adopted. When the *a priori* knowledge about the distribution of potential optima is available, we can further exploit it to arrange the initial population settings. In the simulation examples provided in Section 4, we assume that the potential optima follow a uniform distribution. To better illustrate the parameter optimization process, we provide a -point scatter plot of the optimized parameters and in our numerical trials. It can be observed that the optimal values of vary within a relatively narrower range than .

### 2.4 AOK After Pre-filtering

After applying the modified ECSK as a pre-filtering window, AOK is then employed to further mitigate the effect of artifacts due to missing samples. As discussed in Section 2.1, such artifacts spread over the entire ambiguity domain. The AOK is a well-known data-dependent kernel, which is designed by solving the following optimization problem [23]:

(13) |

where denotes the kernel volume constraint, represents the AF of the signal in polar coordinates, and and denote the radius and radial angle variables, respectively. Equation (13) is optimized in the sense that the signal auto-terms are preserved to the maximum extent within the low-pass Gaussian filter, while the pass-band area of the filter is limited to a total volume of so as to filter out the cross-terms which are located away from the origin, and to reduce the artifacts and noise that spread over the entire ambiguity domain. The desired resolution and the cross-term attenuation are determined by a proper selection of .

For signals with time-varying characteristics, AOK is usually implemented with a time-localized short-time AF. At time instant , a time-adaptive kernel is produced by substituting the short-time AF for in (13) and then following the polar-coordinate Gaussian kernel optimization procedure for each individual TFD slice [23]. Denoting the rectangular-coordinate short-time AF as , the pre-filtered short-time AF can be expressed as

(14) |

where represents a rectangular short-time sliding window. Stacking for all and results in the short-time AF matrix . Then, the TFD corresponding to the kernelled AF is obtained as its 2-D DFT w.r.t. and , expressed as

(15) |

where is the time-localized AOK matrix represented in the rectangular coordinate system.

*Remarks:* It is worth emphasizing that, when compared to references [9, 15, 16], which consider FH spectrum estimation in the context of linear short-time Fourier transform (STFT), the utilization of the bilinear TFR in this paper enables us to better address the missing-sample problem because kernel design and its capability to filter out undesired signal components can be utilized *only* in bilinear TF analysis. This is a key novel contribution of this paper since so far only the linear TF analysis has been used in sparse FH spectrum estimation, and no missing samples have been considered in the literature.

## 3 BCS-Based FH Spectrum Estimation

### 3.1 CS Model for FH Spectrum Estimation

In this section, we consider a CS based approach which yields a high-resolution TFR. The IAF matrix corresponding to the kernelled AF is obtained as the 1-D IDFT of w.r.t. , i.e.,

(16) |

On the other hand, the bilinear TFR matrix is associated with the IAF matrix by the following 1-D Fourier relationship:

(17) |

The CS approach obtains by exploiting the above Fourier transform relationship but through a sparse reconstruction operation. Denote as the -th column of the IAF matrix , and as the -th column of the bilinear TFR matrix . Then, their relationship conforms to the following standard linear model commonly used in CS and sparse reconstruction:

(18) |

Therefore, the TFR can be obtained from sparse reconstruction, in lieu of conventional Fourier transform, by repeating the procedure for each time instant. Various CS algorithms can be used for this purpose. In the following, we consider this problem from a BCS perspective [25], and the structure of the FH spectrum is utilized for improved spectrum estimation. BCS methods are known for their capability to flexibly model sparse signals that not only promote the sparsity of its solution, but also exploit additionally known structures of the sparse signal [37]. For notational convenience, we simplify the notations , and as , and , respectively, i.e.,

(19) |

### 3.2 Sparsity Prior

(26) |

with

(27) |

(28) |

The BCS is a nonparametric solver of sparse linear inverse problems imposing a conditional Gaussian prior with its precision (reciprocal of the variance) guided by a hyperprior of Gamma distribution, i.e., . The BCS assumes the following likelihood model [38]

(20) |

where is the variance. To encourage sparsity of the FH signal TFR, a Dirichlet process prior with a spike-and-slab centering distribution [24, 39] is employed to , i.e., the -th entry of , which allows different predictors to have identical coefficients while performing variable selection. That is,

(21) |

where is a mixing weight standing for the prior probability of a nonzero entry, and represents the delta function with a unit point measure concentrated at zero. Also, we assign a Gamma prior to the precision as .

To make the inference analytical, a product of two latent variables and , i.e., , is introduced to follow the PDF in (21), where , and is a binary variable that follows the Bernoulli distribution . implies that the -th entry is nonzero, whereas implies a zero entry. Denote and . The overall prior on w.r.t. and can be evaluated analytically through the integration over , and it corresponds to the Student-t distribution [40]. With an appropriate choice of and , the Student-t distribution is strongly peaked about , and thus the overall prior on favors sparseness [39]. In practice, the hyper-parameters , , , are usually assigned to small values to make the corresponding priors flat.

### 3.3 Structure Prior

The FH spectrum shows sparse piecewise constant frequencies. This structure characteristic can be exploited to improve the accuracy and robustness of the sparse learning performance. For the underlying FH signals, a continuous structure prior that encourages the FH spectrum to have a longer horizontally linear structure in the TFR is desired. With a slightly increased computational complexity, we extend the model to size , i.e., the neighborhood entries that are within a Euclidean distance of , and the vertical pixels from the proximate frequency rows are taken into consideration when decision is made to the TF entry under test. It is evident from Fig. 4 that the utilization of the structure model with a higher dimension enables more comprehensive characterization and treatment of pixel patterns. In this case, simply dividing various patterns into a fixed number of categories does not adequately characterize the relationship between the neighboring pixels. In addition, in the situations with a low SNR, there will be more artifact residue and a higher level of spectrum distortion. As a result, simply rejecting all entries with vertical non-zero neighbors will degrade the robustness of the algorithm.

In this context, we propose a new structure prior which is related to each individual pattern with a proper nonlinear relationship. We first define the neighborhood of index as

(22) |

where is the Euclidean distance between and . We then define the deleted neighborhood of index , i.e., the neighborhood of index with itself excluded, as

(23) |

The number of nonzero entries at a location and its neighborhood is denoted as . Note that during the pattern classification process, three rows of are investigated, and we denote the location in the -th and -th rows of as and , respectively. In Bayesian probability theory, if the posterior distributions belong to the same family as the prior probability distribution, then the prior and posterior are termed conjugate distributions. The Beta distribution is conjugate to Bernoulli likelihood, so is assumed to follow the Beta distribution. For a certain structure prior, the posterior distribution of is derived as

(24) |

The distribution tends to draw small values of when , and a large value when , while it has no tendency when . By choosing proper values of and , therefore, we can encourage or discourage the sparsity of the pixel under test, depending on the sparsity support in the neighboring pixels. The value of hyper-parameters and should be decimal fractions between and . In the previous three-decision-category based method [24, 27], these hyper-parameters are multiples of , and is chosen to be integer power of for computation efficiency. Also, as a rational nonlinear relationship associating the hyper-parameters with the TF structure should encourage longer horizontal structures while discourage high vertical-to-horizontal non-zero neighborhood ratios, we choose a straightforward formula to express the nonlinear relationship between hyper-parameter and the TF structure patterns, and let . On the other hand, some modifications should be made to the formula to ensure that the value of corresponds to the boundary values of , whereas is constrained to a reasonable range and to avoid zero denominator. As a result, the hyper-parameter can be derived as

(39) |

*Remarks:* The statistical properties of Beta distribution, such as mode, mean, and variance are closely related to the weight of each shape parameters in their summation. We set these parameters and in order to encode the structure prior beliefs. For example, the mean of the Beta distribution in this paper is set to . A similar parameter setting has also been adopted in several existing references (c.f., e.g., [24, 27]). We set and keep both hyperparameters for better interpretation and consistency with the existing references.

In the above expressions, the vertical structure factor is assigned different weights and respectively to indirectly and directly adjacent structures as shown in Fig. 5. The reason we discriminate between indirectly and directly adjacent structures is that directly adjacent structures tend to broaden the signal bandwidth. This is contradictory to the fact that the underlying FH signals are instantaneously narrowband. In contrast, indirectly adjacent structures may be formed by the distortion of the desired signal component, noise, or artifact residue, so the weight should be relatively smaller. On the other hand, the horizontal structure factor is obtained by counting the number of continuous adjacent entries. Note that, if the entry under test is located in an isolated line, i.e., both the left and right edge pixels are , the value of in (28) will be set to 0. Because the codomain of and can be derived as and , respectively, according to (27) and (28), we can further obtain the domain of as .

Assume that . According to the proposed structure prior formation method, the hyper-parameter pairs for all the patterns are listed under each case in Fig. 4. These hyper-parameters better reflect the corresponding cluttering situation, by automatically assigning a moderate value to the pattern where a long straight line is present whereas the vertical pixels in the nearby rows take a small value. For those cases where nonzero entries extend in the frequency domain or occur isolatedly, a discouraging value will be asserted to prevent or restrain the structure.

### 3.4 Bayesian Inference

Since no closed-form expressions of the Bayesian estimators can be derived, Markov-chain Monte Carlo sampling is used to implement the inference. The maximum likelihood estimation of and from (21) will generally lead to severe overfitting. To obviate the overfitting problem, a smoother inference model is formulated by defining an automatic relevance determination Gaussian prior over the weights [40]:

(29) |

where is a vector consisting of hyper-parameters that independently control the prior variance of each weight. We can then acquire the posterior distribution of by combining (29) with the observation likelihood in (20), i.e.,

(30) |

A Gibbs sampler is adopted to implement the Bayesian inference as following. Let be the -th column of . Then, the paired Gibbs sampler iteratively samples the observations from the following conditional PDF [24, 39]

(31) |

where the notation denotes the subvector excluding the -th entry. The probability is acquired as

(32) |

where and are respectively updated as

(33) |

(34) |

The conditional distribution of can be expressed as

(35) |

For the case, as the value of does not affect the result of , we directly draw the value of from its prior. Subsequently, the Gibbs sampler updates the mixing weight according to (24).

Next, we update the precision variable . By utilizing the conjugate property of the Gaussian and Gamma distributions, we analytically acquire the posterior distribution of as

(36) |

After completing all the iterations, the posterior distribution of the noise precision is updated as

(37) |

The maximum a posteriori (MAP) estimator is adopted to infer the estimation of as

(38) |

where marginal distribution can be obtained by integrating out the hyper-parameters , , and in (30), as expressed in (39) [39], where denotes a Gamma function.

This completes the sparse reconstruction result of (18) for one time instant. The estimation of the entire FH spectrum is rendered by repeating the BCS-based estimation for each column of .

*Remarks:* The proposed method in this paper differs from that of [2] in two aspects: (a) In [2] a threshold-based post-AOK window was adopted, whereas in this paper we pre-filter the running AF with a new ECSK. ECSK is known as the best TF kernel for signals with axially distributed auto-terms [28, 21, 29], and it facilitates independent controlling of the shape and size according to the *a priori* knowledge on the signal structure. An automatic parameter optimization approach for the pre-filtering ECSK kernel is also proposed in this paper. As a result, the cleanest possible running AF is delivered to the AOK optimization process, so that the resultant adaptive kernel design significantly improves the desired TF filtering performance. (b) Unlike in [2] where the structure priors for BCS-based TF reconstruction were designed based on a fixed three-category pattern, in this paper we associate the hyper-parameters with a nonlinear relationship of the TF structure. The modified structure prior is designed to more adequately model the diversified relationship with neighboring TF entries.

### 3.5 Computational Complexity

In this subsection, we analyze the computational complexity of the proposed scheme and compare it with the existing approaches for FH spectrum estimation. Three methods are compared, namely, the STFT, sparse linear regression (SLR) [9, 15], and sparse Bayesian learning (SBL) [16] based approaches. Note that, although the terms SBL and BCS are used interchangeably for the same algorithm in the literature, we use SBL and BCS in the sequel to respectively refer to the algorithms developed in the linear and bilinear TF frameworks for convenience.

Let be the length of the short-time slide window. The computational complexity of the STFT-based method is , which is the least among all the existing approaches. In comparison, the complexity of the SLR-based method is [9], where is the number of frequency bins. Similar to the STFT-based method, the SBL approach [16] also partitions the input signals into overlapped segments through a sliding window. The computational complexity of the SBL approach is then [16], where is the number of latent parameters, which is normally truncated to a value close to for a tractable Bayesian inference, and denotes the cardinality of the sampled time set in the temporal kernel basis vector, which is typically smaller than . As stated in [16], the computational complexities of both linear TF based methods [9, 15, 16] are actually in a very similar order. In our proposed scheme, the complexities of the pre-filtering parameter optimization, pre-filtering plus AOK processing, as well as BCS reconstruction stages are [41], [23], and [40], respectively, where is the total number of generations, and is the dimension of the problem, i.e., the number of the parameters to be optimized. When considering the overall computational complexity, which includes multiple terms, its order is determined by that of the fastest growing term (with the highest order of ). As such, the overall asymptotic computational complexity of the proposed scheme is . As such, the computational complexity of the proposed method is much higher than the STFT-based method, but is only slightly higher than the SLR and SBL approaches. This is the price we pay in order to achieve robust and accurate spectrum estimation with missing observations, as we will demonstrate in the next Section.

## 4 Numerical Experiments and Analysis

In this section, numerical experiments are conducted to evaluate the performance of the proposed algorithm in comparison with those reported in the literature. In this section, the input SNR is defined as [9, 16]

(40) |

where is the signal vector, and denotes the power of additive white Gaussian noise.

In particular, two performance measures are defined for the evaluation of the hopping time and the instantaneous frequency (IF) detection performance, respectively. The ratio of correct hopping time detection is defined as [16]

(41) |

where is the number of Monte Carlo trails and is the ratio of correct detections in the -th Monte Carlo trial. A correct hopping time detection is declared if the estimated hopping instant is less than 3 observations away from the associated true hopping instant. The hopping time statistic is defined as . The same definition is used in references [9, 16]. The ratio of incorrect IF detection is defined as [16]

(42) |

where is the ratio of correct frequency detections in the -th Monte Carlo trial.

Simulation results are provided to demonstrate the effectiveness of the proposed approach. First, an illustrative example is given in Fig. 6(a), where the FH signals are identical to those used in reference [16]. The signals are generated as follows: The first FH component is active with a carrier frequency of KHz within the range of time index and the carrier frequency hops to KHz within the range of time index . The second hopping component is active with a carrier frequency of KHz within the range of time index and the carrier frequency hops to KHz within the range of time index . The third hopping component is active with a carrier frequency of KHz within the range of time index and the carrier frequency hops to KHz within the range of time index . The sampling frequency is KHz. The model hyper-parameters for the priors are set as follows: , the value of is assigned as in (25), and . The initial conditions are set as , and , where yields the scalar variance of a vector. Fig. 6(a) shows the true TF trajectories of the generated FH signals. The TF analysis of such multi-component FH signals, particularly at a low input SNR, is a challenging problem. Fig. 6(b) shows the spectrogram with missing samples and input SNR of . It is evident that the TF signatures can be hardly recognized with linear approach even in the case where the missing-sample rate is low and the input SNR is high.

In the following, we show the superior performance achieved by the proposed method for the situation where the SNR is set to dB, and the missing-sample rate is . The joint-variable representations of the missing-sample FH spectrum and their kernelled versions are presented in Fig. 7. In Figs. 7(a) through 7(c), no kernel is adopted. The impact of missing samples can be clearly observed from the IAF showing in Fig. 7(b), and the auto-terms can hardly be identified from both AF and WVD in Figs. 7(a) and 7(c). Figs. 7(d) through 7(f) show the corresponding joint-variable representations when the AOK is applied.

In this case, because of the low SNR and the missing samples as well as the required marginal properties, the optimization process in the AOK is severely distorted. As the result, although the AF plane is much cleaner compared with Fig. 7(a), a satisfactory kernelled result cannot be achieved. Rather, the estimated TFR in Fig. 7(f) shows strong vertical strips. In Figs. 7(g) to 7(i), the proposed revised ECSK plus AOK scheme is adopted. It can be observed from Fig. 7(g) that the auto-term energy in the AF is integrally preserved, while nearly all the undesired terms are suppressed. Nevertheless, direct estimation of the instantaneous frequencies from this plot is still difficult because of the low TF resolution. Therefore, we use the structure-aware BCS to obtain an improved FH spectrum estimation with a finer resolution. The yielding result and the comparison between true and estimated hopping time statistics are respectively depicted in Figs. 8 (a) and (b), which showcase a significant improvement as compared to all the above results depicted in Fig. 7.

To better demonstrate the effectiveness of the proposed method with statistical results, Monte Carlo trials are conducted with the input SNRs varying from dB to dB. In Fig. 9, comparisons are drawn among different existing approaches. It can be summarized from Fig. 9 that an improved performance is obtained by using the proposed method, and the advantage is more remarkable in the low SNR cases. Regarding the influence of missing samples and SNR on the algorithm performance, the statistical results are provided in Fig. 10. Note that the detection performance of the other methods are very poor and thus are not included in Fig. 10 when we compare the performance in the presence of missing observations. As stated above, existing linear TF analysis based approaches cannot robustly perform spectrum estimation with missing observations. Hence, in the presence of missing samples as studied in this paper, these methods yield a detection ratio which is very close to for all SNR values being investigated.

To explore the possibility to skip the pre-filtering parameter optimization process by adopting the average values after collecting sufficient estimations, we conduct numerical trials using the above simulation settings. As the simulation results shown in Fig. 11 indicate, this inevitably affect the pre-filtering performance and consequently slightly degrade the reconstruction accuracy.

*Remarks:* Unlike the method proposed in [16] which considers FH signal recovery using linear TF analysis (i.e., STFT), the proposed work utilizes the bilinear TF methods. As the bilinear TF methods can use kernel designs to filter out undesired signal components, the proposed method can better utilize the known properties of FH signatures to design the kernels, thus enhancing the FH signal before applying BCS-based sparse reconstruction. This is particularly important in the presence of strong artifacts and noise. Note that the design of such kernels in the structure-aware context is a core contribution of this paper. Such kernel design is not offered in the linear STFT-based approaches. As such, the proposed work is very different to that in [16] and its advantages can be easily understood in concept and are clearly demonstrated through the above simulation results.

## 5 Conclusion

In this paper, a novel structure-aware FH spectrum estimation approach with the consideration of missing observations was proposed in the sparse reconstruction framework. In particular, a TF kernel was designed to effectively utilize the inherent FH signal structure. The kernelled joint-variable representation over time and lag was used to provide the TF signal representation through sparse reconstruction. In the sparsity-based spectrum estimation process, the structure of the entry under test and its neighborhood is exploited to impose a structure prior on the Bayesian inference. It was shown that this approach significantly outperforms existing approaches devised for the same problem.

## Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable comments, which have helped improve the quality and clarity of this paper.

[]Shengheng Liu (S’14-M’17) is currently a Postdoctoral Fellow at the Institute for Digital Communications, School of Engineering, The University of Edinburgh, UK. Prior to joining UoE, he received the B.Eng. and Ph.D. degrees in Electronics Engineering from the School of Information and Electronics, Beijing Institute of Technology, China, in 2010 and 2017 respectively. He also worked as a Visiting Research Associate from 2015 to 2016 at the Department of Electrical and Computer Engineering, Temple University, Philadelphia, PA, USA, under the support of the China Scholarship Council. His research interests include compressive sensing and time-frequency analyses for non-stationary signals, interference cancellation and coherent integration for passive bistatic radars, as well as image reconstruction in electrical impedance tomography. He is a frequent reviewer for several top-tier journals, including IEEE Transactions on Signal Processing, IEEE Transactions on Audio, Speech, and Language Processing, and IEEE Transactions on Instrumentation and Measurement.

[]Yimin D. Zhang (SM’01) received his Ph.D. degree from the University of Tsukuba, Tsukuba, Japan, in 1988.

He joined the faculty of the Department of Radio Engineering, Southeast University, Nanjing, China, in 1988. He served as a Director and Technical Manager at the Oriental Science Laboratory, Yokohama, Japan, from 1989 to 1995, and a Senior Technical Manager at the Communication Laboratory Japan, Kawasaki, Japan, from 1995 to 1997. He was a Visiting Researcher at the ATR Adaptive Communications Research Laboratories, Kyoto, Japan, from 1997 to 1998. From 1998 to 2015, he was with the Villanova University, Villanova, PA, where he was a Research Professor at the Center for Advanced Communications, and was the Director of the Wireless Communications and Positioning Laboratory and the Director of the Radio Frequency Identification (RFID) Laboratory. Since August 2015, he has been with the Department of Electrical and Computer Engineering, College of Engineering, Temple University, Philadelphia, PA, where is an Associate Professor. His general research interests lie in the areas of statistical signal and array processing for radar, communications, and satellite navigation applications, including compressive sensing, convex optimization, nonstationary signal and time-frequency analysis, MIMO systems, radar imaging, target localization and tracking, wireless and cooperative networks, and jammer suppression.

He has 12 book chapters and more than 300 journal articles and peer-reviewed conference papers. Dr. Zhang is an Associate Editor for the IEEE Transactions on Signal Processing, and an Editor for the *Signal Processing* journal. He was an Associate Editor for the IEEE Signal Processing Letters during 2006–2010, and an Associate Editor for the *Journal of the Franklin Institute* during 2007–2013. Dr. Zhang is a member of the Sensor Array and Multichannel (SAM) Technical Committee of the IEEE Signal Processing Society, and a Technical Co-chair of the 2018 IEEE Sensor Array and Multichannel Signal Processing Workshop.

[]Tao Shan (M’15) received his B.S. degree from Xidian University, Xi’an, in 1991 and Ph.D. degree from Beijing Institute of Technology in 2004. Currently, he is an Associate Professor with the School of Information and Electronics, Beijing Institute of Technology. From 2014 to 2015, he was a Senior Visiting Scholar at the Center for Advanced Communications, Villanova University, PA. He was a recipient of the first prize of science and technology progress awarded by the Ministry of Education in 2006 and 2007 respectively. His research interests include radar signal processing and time-frequency analysis for non-stationary signals.

[]Ran Tao (M’00-SM’04) received the B.S. degree from Electronic Engineering Institute of PLA, Hefei, in 1985 and the M.S. and Ph.D. degrees from Harbin Institute of Technology, Harbin, in 1990 and 1993, respectively. He has been a senior visiting scholar at the University of Michigan, Ann Arbor, MI, and the University of Delaware, DE, in 2001 and 2016, respectively. He is currently a Professor with the School of Information and Electronics, Beijing Institute of Technology, Beijing, China. He is a Fellow of the Institute of Engineering and Technology (IET), and a Fellow of the Chinese Institute of Electronics (CIE).

Dr. Tao was a recipient of National Science Foundation of China for Distinguished Young Scholars in 2006, and a Distinguished Professor of Changjiang Scholars Program in 2009. He has been a Chief Professor of the Creative Research Groups of the National Natural Science Foundation of China since 2014, and he was a Chief Professor of the Program for Changjiang Scholars and Innovative Research Team in University during 2010 to 2012. He is currently the Vice Chair of IEEE China Council. He is also the Vice Chair of the International Union of Radio Science (URSI) China Council and a Member of Wireless Communication and Signal Processing Commission of URSI. He was a recipient of the first prize of science and technology progress in 2006, 2007, respectively, and the first prize of natural science in 2013, both awarded by the Ministry of Education. His current research interests include fractional Fourier transform and its applications, theory and technology for radar and communication systems. He has 3 books and more than 100 peer-reviewed journal articles.

### References

- S.-H. Liu, Y. D. Zhang, and T. Shan, “Sparsity-based frequency-hopping spectrum estimation with missing samples,” in Proc. 2016 IEEE Radar Conference, Philadelphia, PA, USA, May 2016.
- S.-H. Liu, Y. D. Zhang, and T. Shan, “Structure-aware Bayesian compressive sensing for frequency-hopping spectrum estimation,” in Proc. SPIE 9857, Compressive Sensing V: From Diverse Modalities to Big Data Analytics, Baltimore, MD, USA, pp. 98570N, May 2016.
- S. V. Maric and E. L. Titlebaum, “A class of frequency hop codes with nearly ideal characteristics for use in multiple-access spread-spectrum communications and radar and sonar systems,” IEEE Trans. Commun., vol. 40, no. 9, pp. 1442–1447, Sept. 1992.
- Y. Zhang and M. G. Amin, “MIMO radar exploiting narrowband frequency-hopping waveforms,” in Proc. European Signal Process. Conf., Lausanne, Switzerland, pp. 1–5, Aug. 2008.
- C. Y. Chen and P. P. Vaidyanathan, “MIMO radar ambiguity properties and optimization using frequency-hopping waveforms,” IEEE Trans. Signal Process., vol. 56, no. 12, pp. 5926–5936, Dec. 2008.
- A. R. Hunt, “Use of a frequency-hopping radar for imaging and motion detection through walls,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 5, pp. 1402–1408, May 2009.
- S. Gogineni, A. Nehorai, “Frequency-hopping code design for MIMO radar estimation using sparse modeling,” IEEE Trans. Signal Process., vol. 60, no. 6, pp. 3022–3035, June 2012.
- M. K. Simon, U. Cheng, L. Aydin, A. Polydoros, and B. K. Levitt, “Hop timing estimation for noncoherent frequency-hopped M-FSK intercept receivers,” IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 1144–1154, Feb./Mar./April 1995.
- D. Angelosante , G. B. Giannakis, and N. D. Sidiropoulos, “Estimating multiple frequency-hopping signal parameters via sparse linear regression,” IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5044–5056, Oct. 2010.
- M. G. Amin, B. Jokanović, Y. D. Zhang, and F. Ahmad, “A sparsity-perspective to quadratic time-frequency distributions,” Digital Signal Process., vol. 46, pp. 175–190, Nov. 2015.
- B. Jokanović and M. G. Amin, “Reduced interference sparse time-frequency distributions for compressed observations,” IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6698–6709, Dec. 2015.
- D. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
- E. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pur. Appl. Math., vol. 59, no. 8, pp. 1207–1223, Aug. 2006.
- P. Flandrin and P. Borgnat, “Time-frequency energy distributions meet compressed sensing,” IEEE Trans. Signal Process., vol. 58, no. 6, pp. 2974-2982, June 2010.
- D. Angelosante, G. Giannakis, and N. Sidiropoulos, “Sparse parametric models for robust nonstationary signal analysis: Leveraging the power of sparse regression,” IEEE Signal Process. Mag., vol. 30, no. 6, pp. 64–73, Nov. 2013.
- L. Zhao, L. Wang, G. Bi, L. Zhang, and H. Zhang, “Robust frequency-hopping spectrum estimation based on sparse Bayesian method,” IEEE Trans. Wireless Commun., vol. 14, no. 2, pp. 781–793, Feb. 2015.
- Y. D. Zhang, M. G. Amin, and B. Himed, “Reduced interference time-frequency representations and sparse reconstruction of undersampled data,” in Proc. 21st European Signal Process. Conf. (EUSIPCO), Marrakech, Morocco, pp. 1–5, Sep. 2013.
- B. Jokanović, M. G. Amin, Y. D. Zhang, and F. Ahmad, “Time-frequency kernel design for sparse joint-variable signal representations,” in Proc. 22nd European Signal Process. Conf. (EUSIPCO), Lisbon, Portugal, pp. 1–5, Sep. 2014.
- L. Stanković, S. Stanković, I. Orović, and Y. D. Zhang, “Time-frequency analysis of micro-Doppler signals based on compressive sensing,” in M. Amin (ed.), Compressive Sensing for Urban Radars, CRC Press, 2014.
- L. Cohen, Time-Frequency Analysis. Prentice Hall, 1995.
- B. Boashash, Time-Frequency Signal Analysis and Processing: A Comprehensive Reference, Academic Press, 2015.
- D. L. Jones and R. G. Baraniuk, “Signal-dependent time-frequency analysis using a radially Gaussian kernel,” Signal Process., vol. 32, no. 3, pp. 263–284, June 1993.
- R. G. Baraniuk and D. L. Jones, “An adaptive optimal-kernel time-frequency representation,” IEEE Trans. Signal Process., vol. 43, no. 10, pp. 2361–2371, Oct. 1995.
- Q. Wu, Y. D. Zhang, and M. G. Amin, “Continuous structure based Bayesian compressive sensing for sparse reconstruction of time-frequency distributions,” in Proc. Int. Conf. Digital Signal Process., Hong Kong, China, Aug. 2014, pp. 831–836.
- S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, June 2008.
- S.-H. Liu, Y. D. Zhang, and T. Shan, “Detection of weak astronomical signals with frequency-hopping interference suppression,” Digital Signal Process., vol. 72, pp. 1–8, Jan. 2018.
- L. Yu, H. Sun, J. P. Barbot, and G. Zheng, “Bayesian compressive sensing for cluster structured sparse signals,” Signal Process., vol. 92, no. 1, pp. 259–269, Jan. 2012.
- B. Boashash, N. A. Khan, and T. Ben-Jabeur, “Time-frequency features for pattern recognition using high-resolution TFDs: A tutorial review,” Digital Signal Process., vol. 40, pp. 1–30, May 2015.
- M. Abed, A. Belouchrani, M. Cheriet, and B. Boashash, “Time-frequency distributions based on compact support kernels: Properties and performance evaluation,” IEEE Trans. Signal Process., vol. 60, no. 6, pp. 2814–2827, June 2012.
- D. L. Jones and T. W. Parks, “A high resolution data-adaptive time-frequency representation,” IEEE Trans. Acoust. Speech Signal Process., vol. 38, no. 12, pp. 2127–2135, Dec. 1990.
- D. L. Jones and R. Baraniuk, “A simple scheme for adapting time-frequency representations,” IEEE Trans. Signal Process., vol. 42, no. 12, pp. 3530–3535, Dec. 1994.
- R. G. Baraniuk, P. Flandrin, A. J. E. M. Janssen, and O. J. J. Michel, “Measuring time-frequency information content using the Rényi entropies,” IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1391–1409, May 2001.
- S. Aviyente and W. J. Williams, “Minimum entropy time-frequency distributions,” IEEE Signal Process. Lett., vol. 12, no. 1, pp. 37–40, Jan. 2005.
- L. J. Stanković, “A measure of some time-frequency distributions concentration,” Signal Process., vol. 81, no. 3, pp. 621–631, Mar. 2001.
- O. Michel, R. G. Baraniuk, and P. Flandrin, “Time-frequency based distance and divergence measures,” in Proc. IEEE-SP Int. Symp. Time-Freq. Time-Scale Anal., Philadelphia, PA, USA, pp. 64–67, Oct. 1994.
- J. Zhang and A. C. Sanderson, “JADE: adaptive differential evolution with optional external archive,” IEEE Trans. Evol. Comput., vol. 13, no. 5, pp. 945–958, Oct. 2009.
- Y. Zai, L. Xie, and C. Zhang, “Variational Bayesian algorithm for quantized compressed sensing,” IEEE Trans. Signal Process., vol. 61, no. 11, pp. 2815–2824, June 2013.
- D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2153–2164, Aug. 2004.
- L. Yu, J. P. Barbot, G. Zheng, and H. Sun, “Compressive sensing for cluster structured sparse signals: Variational Bayes approach,” Technical Report, 2011. Available at http://hal.archives-ouvertes.fr/docs/00/57/39/53/PDF/cluss␣vb.pdf.
- M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, Sept. 2001.
- J.-Q. Zhang and A. C. Sanderson, Adaptive Differential Evolution: A Robust Approach to Multimodal Problem Optimization. Springer Berlin Heidelberg, 2009.