Dictionary Learning for Adaptive GPR Landmine Classification
Abstract
Ground penetrating radar (GPR) target detection and classification is a challenging task. Here, we consider online dictionary learning (DL) methods to obtain sparse representations (SR) of the GPR data to enhance feature extraction for target classification via support vector machines. Online methods are preferred because traditional batch DL methods like K-SVD are not scalable to high-dimensional training sets and are infeasible for real-time operation. We also develop Drop-Off MINibatch Online Dictionary Learning (DOMINODL), which exploits the fact that much of the training data may be correlated. The DOMINODL algorithm iteratively considers elements of the training set in small batches and drops off samples that become less relevant. For the case of abandoned anti-personnel landmine classification, we compare the performance of K-SVD with three online algorithms: classical Online Dictionary Learning, its correlation-based variant, and DOMINODL. Our experiments with real data from L-band GPR show that online DL methods reduce learning time by 36-93% and increase mine detection by 4-28% over K-SVD. DOMINODL is the fastest and retains classification performance similar to that of the other two online DL approaches. We use a Kolmogorov-Smirnov test distance and the Dvoretzky-Kiefer-Wolfowitz inequality to select the DL input parameters, leading to enhanced classification results. To further compare with state-of-the-art classification approaches, we evaluate a convolutional neural network (CNN) classifier, which performs worse than the proposed approach. Moreover, when the acquired samples are randomly reduced by 25%, 50%, and 75%, sparse-decomposition-based classification with DL remains robust, while the CNN accuracy is drastically compromised.
ground penetrating radar, online dictionary learning, adaptive radar, deep learning, radar target classification, sparse decomposition
I Introduction
A ground penetrating radar (GPR, hereafter) is used for probing the underground by transmitting radio waves from an antenna held close to the surface and acquiring the echoes reflected from subsurface anomalies or buried objects. As the electromagnetic wave travels through the subsurface, its velocity changes due to the physical properties of the materials in the medium. By recording such changes in velocity and measuring the travel time of the radar signals, a GPR generates profiles of scattering responses from the subsurface. The interest in GPR is due to its ability to reveal buried objects non-invasively and to detect non-metallic scatterers with increased sensitivity to dielectric contrast [1]. From recordings of previously observed regions, GPR surveys can also extrapolate subsurface knowledge to inaccessible or unexcavated areas. This sensing technique is, therefore, attractive for several applications such as geophysics, archeology, forensics, and defense (see, e.g., [2, 1] for some surveys). Over the last decade, there has been a spurt in GPR research because of advances in electronics and computing resources. GPR has now moved beyond traditional ground applications and become a more general ultra-wideband remote sensing system, with new uses such as through-the-wall imaging, building construction, food safety monitoring, and vegetation observation.
In this work, we consider the application of detecting buried landmines using GPR. This is one of the most extensively investigated GPR applications due to its obvious security and humanitarian importance [3]. Mine detection GPR usually operates in the L-band (1-2 GHz) with ultra-wideband (UWB) transmission in order to achieve both sufficient resolution to detect small targets and penetration of solid media at shallow depths [4].
Even though much progress has been made on GPR for landmine detection, discriminating landmines from natural and man-made clutter remains a critical challenge. In such applications, signal distortion due to inhomogeneous soil clutter, surface roughness, and antenna ringing hampers target recognition. Moreover, many landmine models are made largely of plastic, which has a very weak radar response because of its low dielectric contrast with the soil [2]. Finally, a major problem arises from the low radar cross section (RCS) of some landmine models [5]. A variety of signal processing algorithms have been proposed for detecting low-metal-content landmines in realistic scenarios; approaches based on feature extraction and classification are found to be the most successful (see, e.g., [6, 7, 8]), yet false-alarm rates remain very high.
Sparse representation (SR) is effective in extracting mid- or high-level features in image classification [9, 10]. In the context of anti-personnel landmine recognition using GPR, our prior work [11, 12] has shown that SR-based frameworks improve the performance of support vector machine (SVM) classifiers in distinguishing different types of mines and clutter in highly corrupted GPR signals. In this approach, the signal of interest is transformed into a domain where it can be expressed as a linear combination of only a few atoms chosen from a collection called the dictionary matrix [13, 14]. The dictionary may be learned from the data it is going to represent. Dictionary learning (DL) techniques (or sparse coding in machine learning parlance) aim to create adaptive dictionaries that provide the sparsest reconstruction for given training sets, i.e., a representation with a minimum number of constituent atoms. DL methods are critical building blocks in many applications such as deep learning, image denoising, and super-resolution; see [15, 16, 17] for further applications.
Classical DL algorithms such as the Method of Optimal Directions (MOD) [18] and K-SVD [13] operate in batches, dealing with the entire training set in each iteration. Although extremely successful, these methods are computationally demanding and not scalable to high-dimensional training sets. An efficient alternative is the Online Dictionary Learning (ODL) algorithm [19], which converges faster than batch DL methods. In this work, we develop a new classification approach based on an online DL-SR framework for the specific case of GPR-based landmine identification.
The main contributions of this paper are as follows:
1) Faster DL for GPR landmine classification. (The conference precursor of this work was presented at the IEEE International Geoscience and Remote Sensing Symposium, 2017 [12].) We investigate the application of DL to GPR-based landmine classification. To the best of our knowledge, this has not been investigated previously. Furthermore, online DL methods have not been studied more generally in GPR. (We use the term “online DL” to imply any algorithm that operates in online mode. From here on, we reserve the term ODL solely to refer to the method described in [19].) Only one other previous study has employed DL (K-SVD) using GPR signals [20], although for the application of identifying bedrock features. We employ online DL methods and use the coefficients of the resulting sparse vectors as input to an SVM classifier to distinguish mines from clutter. Our comparison of K-SVD and online DL using real data from L-band GPR shows that online DL algorithms offer distinct advantages in speed and low false-alarm rates.
We propose a new Drop-Off MINibatch Online Dictionary Learning (DOMINODL) algorithm, which processes the training data in minibatches and avoids unnecessary updates of irrelevant atoms in order to reduce computational complexity. The intuition for the drop-off step is that some training samples are highly correlated and can therefore be dropped during training, saving processing time without significantly affecting performance.
2) Better statistical metrics for improved classification. Unlike previous studies [20], which determine DL parameters (number of iterations, atoms, etc.) from bulk statistics such as the normalized root-mean-square error (NRMSE), we use statistical inference for parameter analysis. Our methods, based on the Kolmogorov-Smirnov test distance [21] and the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [22, 23], fine-tune model selection and thereby improve mine classification performance.
3) Experimental validation for different landmine sizes. Our comparison of K-SVD with three online DL algorithms (ODL, its correlation-based variant [24], and DOMINODL) shows that online methods successfully detect mines with very small RCS buried deep in clutter and noise. Some recent studies [25, 26, 27, 28] employ state-of-the-art deep learning approaches such as convolutional neural networks (CNNs) to classify GPR-based mine data. Our comparison shows that a CNN performs worse than our online DL approaches in detecting small mines. This may be partly caused by the relatively small size of our training set, which, although perfectly adequate for DL, may not meet the requirements of a CNN [29]. We also show that the classification performance of online DL methods does not deteriorate significantly when the signal samples are reduced.
The rest of the paper is organized as follows. In the next section, we formally describe the classification problem and GPR-specific challenges. In Section III, we explain the DL algorithms used in our methodology and describe DOMINODL. We provide an overview of the GPR system and the field campaign to collect GPR data sets in Section IV. In Section V, we introduce our techniques for DL parameter selection. Section VI presents classification and reconstruction results using real radar data. We conclude in Section VII.
Throughout this paper, we reserve boldface lowercase and uppercase letters for vectors and matrices, respectively. The $i$th element of a vector $\mathbf{y}$ is $y_i$, while the $(i,j)$th entry of the matrix $\mathbf{Y}$ is $Y_{i,j}$. We denote the transpose by $(\cdot)^T$. We represent the sets of real and complex numbers by $\mathbb{R}$ and $\mathbb{C}$, respectively. Other sets are represented by calligraphic letters. The notation $\|\cdot\|_p$ stands for the $\ell_p$ norm of its argument, and $\|\cdot\|_F$ is the Frobenius norm. A subscript in parentheses, such as $(\cdot)_{(t)}$, denotes the value of the argument in the $t$th iteration. The convolution product is denoted by $*$. The function $\mathrm{diag}(\cdot)$ outputs a diagonal matrix with the input vector along its main diagonal. We use $\Pr(\cdot)$ to denote probability, $\mathbb{E}\{\cdot\}$ is the statistical expectation, and $|\cdot|$ denotes the absolute value. The functions $\max(\cdot)$ and $\sup(\cdot)$ output the maximum and supremum value of their arguments, respectively.
II Problem Formulation
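A pulsed GPR transmits a signal into the ground and receives its echo for each point in the radar coverage area. The digital samples of this echo constitute a range profile designated by the signal vector $\mathbf{y} \in \mathbb{R}^{M}$, where $M$ is the number of range cells. Formally, the SR of $\mathbf{y}$ can be described by $\mathbf{y} = \mathbf{D}\mathbf{x}$, where $\mathbf{D} \in \mathbb{R}^{M \times K}$, with $K$ column vectors or atoms $\mathbf{d}_k$, is a redundant or overcomplete ($K > M$) dictionary, and $\mathbf{x} \in \mathbb{R}^{K}$ is the sparse representation vector. The SR process finds the decomposition that uses the minimum number of atoms to express the signal. If $L$ range profiles are available, then the SR of the data $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_L] \in \mathbb{R}^{M \times L}$ is described as $\mathbf{Y} = \mathbf{D}\mathbf{X}$, where $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_L] \in \mathbb{R}^{K \times L}$.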
Our goal is to classify different mines (including those with small RCS) and clutter based on the SR of range profiles using fast, online DL methods. GPR range profiles from successive scans are highly correlated, and we intend to exploit this property during the DL process. In the following, we describe SR-based classification and its challenges.
II-A GPR Target Classification Method
We use an SVM to classify sparsely represented GPR range profiles using the learned dictionary $\mathbf{D}$. Given a predefined collection of labeled observations, the SVM searches for a function $f$ that maps any new observation $\mathbf{x}$ to a class $c = f(\mathbf{x})$. A binary classifier with linearly separable data, for example, would have $f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^T\mathbf{x} + b)$ for some weight vector $\mathbf{w}$ and bias $b$, with $c \in \{-1, +1\}$. In our work, we use $\mathbf{X}$, i.e., the sparse decomposition of a given set of signals (the ‘training’ signals) using the learned dictionary $\mathbf{D}$, as the set of labeled observations for the SVM. The SVM implicitly maps the data into a high-dimensional feature space where it is easier to separate the classes. The kernel function that we use to compute the high-dimensional operations in the feature space is the Gaussian radial basis function (RBF), $k(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|_2^2)$, where $\gamma$ is the free parameter that decides the influence of a support vector $\mathbf{x}_j$ on the class of the vector $\mathbf{x}_i$ in the original space, and $C$ is the parameter of the soft-margin cost function, which controls the influence of each individual support vector [30]. To optimally select the SVM input parameters $\gamma$ and $C$, we partition the original classification set into training and validation vectors in different ways (10-fold cross-validation) and compute the mean cross-classification accuracy on the validation vectors. The folds were randomly selected, and their number was found empirically as the limit above which the accuracy improvement was negligible. We refer the reader to [30] for more details on SVMs.
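To make the parameter-selection loop concrete, the following sketch (plain Python, standard library only) implements the Gaussian RBF kernel, a randomized 10-fold split of sample indices, and a grid search over $(\gamma, C)$ pairs. It does not train an SVM: `cv_accuracy` is a hypothetical callback standing in for a trainer (for instance, scikit-learn's `SVC(kernel="rbf", gamma=..., C=...)` fitted on each training split and scored on the matching validation split), and all function names are illustrative assumptions rather than the authors' code.

```python
import math
import random

def rbf_kernel(x_i, x_j, gamma):
    """Gaussian RBF kernel k(x_i, x_j) = exp(-gamma * ||x_i - x_j||_2^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Randomly partition sample indices into n_folds (train, validation) splits."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    splits = []
    for f in range(n_folds):
        val = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, val))
    return splits

def grid_search(cv_accuracy, gammas, costs):
    """Return the (gamma, C) pair with the highest mean cross-validation accuracy.

    cv_accuracy(gamma, C) is a hypothetical user-supplied callback that trains an
    RBF-kernel SVM on every training split and returns the mean validation accuracy.
    """
    return max(((g, c) for g in gammas for c in costs), key=lambda p: cv_accuracy(*p))
```

Choosing $(\gamma, C)$ by mean validation accuracy rather than training accuracy is what protects a small labeled set against overfitting.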
II-B The Basic Framework of DL
In many applications, the dictionary is unknown and has to be learned from training signals coming from the desired class. A DL algorithm finds an overcomplete dictionary $\mathbf{D}$ that sparsely represents $L$ measurements $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_L]$ such that $\mathbf{y}_l \approx \mathbf{D}\mathbf{x}_l$. Each of the vectors $\mathbf{x}_l$ is a sparse representation of $\mathbf{y}_l$ with only $S$ nonzero entries. An intractable formulation of this problem is
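$$\min_{\mathbf{D}, \mathbf{X}} \|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 \quad \text{subject to} \quad \|\mathbf{x}_l\|_0 \le S, \quad l = 1, \ldots, L. \tag{1}$$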
Since both $\mathbf{D}$ and $\mathbf{X}$ are unknown, a common approach is to use alternating minimization, in which we start with an initial guess of $\mathbf{D}$ and then obtain the solution iteratively by alternating between two stages, sparse coding [31] and dictionary update [32], as follows:
1) Sparse Coding: Obtain $\mathbf{X}_{(t)}$ as:
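$$\mathbf{X}_{(t)} = \arg\min_{\mathbf{X}} \|\mathbf{Y} - \mathbf{D}_{(t-1)}\mathbf{X}\|_F^2 \quad \text{subject to} \quad \|\mathbf{x}_l\|_0 \le S, \quad l = 1, \ldots, L, \tag{2}$$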
where $\mathbf{X}_{(t)}$ is the SR in the $t$th iteration. This can be solved using greedy algorithms such as orthogonal matching pursuit (OMP) ($\ell_0$), convex relaxation methods like basis pursuit denoising (BPDN) ($\ell_1$), or the focal underdetermined system solver (FOCUSS) ($\ell_p$, $p \le 1$).
2) Dictionary Update: Given $\mathbf{X}_{(t)}$, update $\mathbf{D}_{(t)}$ such that
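$$\mathbf{D}_{(t)} = \arg\min_{\mathbf{D} \in \mathcal{D}} \|\mathbf{Y} - \mathbf{D}\mathbf{X}_{(t)}\|_F^2, \tag{3}$$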
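where $\mathcal{D} = \{\mathbf{D} \in \mathbb{R}^{M \times K} : \|\mathbf{d}_k\|_2 = 1 \text{ for } k = 1, \ldots, K\}$ is the set of all dictionaries with unit column norms. This subproblem is solved by methods such as singular value decomposition or gradient descent [31, 19].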
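The sparse-coding stage can be illustrated with a minimal orthogonal matching pursuit (OMP) sketch in plain Python. It assumes unit-norm atoms stored as a list of columns (`D[k]` is atom $\mathbf{d}_k$), greedily adds the atom most correlated with the current residual, refits all selected coefficients by least squares, and stops after $S$ atoms (`max_atoms`) or when the squared residual falls below a threshold. The names `omp` and `_solve` are hypothetical; practical implementations (such as batch OMP) use Cholesky updates instead of re-solving the normal equations at every step.

```python
def _solve(a, b):
    """Solve the small dense system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def omp(D, y, max_atoms, eps=1e-10):
    """Greedy approximation of min ||x||_0 s.t. ||y - D x||_2^2 <= eps, with at most max_atoms atoms.

    D: list of K unit-norm atoms (each a list of M floats); y: signal of M samples.
    Assumes max_atoms <= K and linearly independent selected atoms.
    """
    K, M = len(D), len(y)
    support, coef = [], []
    residual = list(y)
    while len(support) < max_atoms and sum(r * r for r in residual) > eps:
        # Select the unused atom most correlated with the current residual.
        k = max((j for j in range(K) if j not in support),
                key=lambda j: abs(sum(D[j][i] * residual[i] for i in range(M))))
        support.append(k)
        # Least-squares refit of all selected coefficients via the normal equations.
        gram = [[sum(D[a][i] * D[b][i] for i in range(M)) for b in support] for a in support]
        rhs = [sum(D[a][i] * y[i] for i in range(M)) for a in support]
        coef = _solve(gram, rhs)
        residual = [y[i] - sum(c * D[a][i] for c, a in zip(coef, support)) for i in range(M)]
    x = [0.0] * K
    for c, a in zip(coef, support):
        x[a] = c
    return x
```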
Classical methods such as MOD [18] and K-SVD [13] retain a guess for $\mathbf{D}$ and $\mathbf{X}$ and iteratively update either $\mathbf{X}$ using basis/matching pursuit (BP/MP) or $\mathbf{D}$ using a least squares solver. Both MOD and K-SVD operate in batches, i.e., they deal with the entire training set in each iteration; they solve the same dictionary learning model but differ in the optimization method. Since the initial guesses of $\mathbf{D}$ or $\mathbf{X}$ can be far removed from the actual dictionary, the BP step may behave erratically. While several state-of-the-art results outline DL algorithms with concrete performance guarantees, e.g., [33, 34, 16], they require stronger assumptions on the observed data. In practice, heuristic DL methods such as MOD and K-SVD do yield overcomplete dictionaries, although provable guarantees for such algorithms are difficult to come by.
Several extensions of batch DL have been proposed; e.g., label consistent (LC) K-SVD [35] and discriminative K-SVD [36] introduce label information into the dictionary learning procedure to make the dictionaries more discriminative. The performance of K-SVD can be improved in terms of both computational complexity and dictionary incoherence if the learning process enforces constraints such as hierarchical tree sparsity [37], structured group sparsity (StructDL) [38], Fisher discrimination (FDDL) [39], and low-rank-and-Fisher (DLR) [40]. Objects belonging to different classes often share common features. This has been exploited to improve K-SVD, yielding methods such as DL with structured incoherence and shared features (DLSI) [41], separating the commonality and the particularity (COPAR) [42], convolutional sparse DL (CSDL) [43], shift-invariant DL [44], principal component analysis DL [45], convolutional DL [46], and low-rank shared DL (LRSDL) [47]. A recent review of various DL algorithms can be found in [48].
In general, batch DL methods are computationally demanding at test time and not scalable to high-dimensional training sets. On the other hand, online methods such as ODL [19] converge fast and process the training data in small subsets. A few improvements to ODL have already been proposed. For example, [49] considered a faster Online Sparse Dictionary Learning (OSDL) to efficiently handle larger training sets using a double-sparsity model. A recent study [24] notes that even though online processing reduces computational complexity compared to batch methods, ODL performance can be further improved if useful information from previous data is not ignored when updating the atoms. That study proposed a new online DL method called Correlation-Based Weighted Least Square Update (CBWLSU), which employs only the part of the previous data that is correlated with the current data in the update step. CBWLSU is relevant to GPR because GPR data often contain highly correlated range profiles.
In this paper, our focus is to investigate such fast DL methods in the context of GPR-based landmine detection and classification. We also propose a new online DL method that exploits range profile correlation as in CBWLSU but is faster than both ODL and CBWLSU. Our inspiration is the K-SVD variant called incremental structured DL (ISDL), which was used earlier in the context of SAR imaging [50]. In ISDL, at each iteration, a small batch of samples is randomly drawn from the training set. Let $\mathcal{S}_{(t)}$ be the set of indices of the minibatch training elements chosen uniformly at random at the $t$th iteration. Then, ISDL updates the dictionary using the minibatch $\{\mathbf{y}_l\}_{l \in \mathcal{S}_{(t)}}$ and the corresponding representation coefficients $\{\mathbf{x}_l\}_{l \in \mathcal{S}_{(t)}}$. The fast iterative shrinkage-thresholding algorithm (FISTA) [51] and block coordinate descent methods solve the sparse coding and dictionary update, respectively. As we will see in later sections, the minibatch strategy that we employ in DOMINODL reduces computational time without degrading performance.
III Online DL
We now describe the DL techniques used for GPR target classification and then develop DOMINODL in order to address challenges of long training times in the context of our problem.
III-A K-SVD, LRSDL, ODL and CBWLSU
As mentioned earlier, the popular K-SVD algorithm [52] sequentially updates all the atoms during the dictionary update step using all training set elements. For the sparse coding step at iteration $t$, K-SVD employs OMP with the formulation:
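$$\mathbf{x}_{l,(t)} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{subject to} \quad \|\mathbf{y}_l - \mathbf{D}_{(t-1)}\mathbf{x}\|_2^2 \le \epsilon, \quad l = 1, \ldots, L, \tag{4}$$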
where $\epsilon$ is the maximum residual error used as a stopping criterion. For the dictionary update at iteration $t$, K-SVD solves the global minimization problem (3) via $K$ sequential minimization problems, wherein every column $\mathbf{d}_k$ of $\mathbf{D}$ and its corresponding row of coefficients $\mathbf{x}_T^k$ of $\mathbf{X}$ are updated as follows:
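$$(\mathbf{d}_k, \mathbf{x}_T^k) = \arg\min_{\mathbf{d}, \mathbf{x}} \|\mathbf{E}_k - \mathbf{d}\,\mathbf{x}\|_F^2, \qquad \mathbf{E}_k = \mathbf{Y} - \sum_{j \neq k} \mathbf{d}_j \mathbf{x}_T^j. \tag{5}$$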
The update process employs the SVD to find the closest rank-1 approximation (in Frobenius norm) of the error term $\mathbf{E}_k$ subject to the constraint $\|\mathbf{d}_k\|_2 = 1$.
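The K-SVD atom update in (5) amounts to a rank-1 approximation of $\mathbf{E}_k$. The sketch below computes it with power iteration instead of a full SVD, which yields the same leading singular pair when the top singular value is distinct. As in the original K-SVD algorithm, `E_k` is assumed to be restricted to the columns of the training signals whose codes actually use atom $k$, so the sparsity pattern is preserved. The function name and the fixed iteration count are illustrative choices, not part of the method described above.

```python
import math

def ksvd_atom_update(E_k, n_iter=100):
    """Rank-1 K-SVD atom update: E_k ~ d x^T with ||d||_2 = 1.

    E_k: restricted error matrix as a list of M rows, one column per training
    signal that uses atom k (the residual with atom k's contribution added back).
    Power iteration on E_k E_k^T converges to the leading left singular vector d
    (assuming E_k != 0 and a distinct top singular value); the new coefficient
    row is x = E_k^T d, which equals sigma_1 * v_1.
    """
    M, N = len(E_k), len(E_k[0])
    d = [1.0] * M  # simple start vector; a random one avoids orthogonality corner cases
    for _ in range(n_iter):
        u = [sum(E_k[i][j] * d[i] for i in range(M)) for j in range(N)]  # E_k^T d
        w = [sum(E_k[i][j] * u[j] for j in range(N)) for i in range(M)]  # E_k E_k^T d
        norm = math.sqrt(sum(v * v for v in w))
        d = [v / norm for v in w]
    x = [sum(E_k[i][j] * d[i] for i in range(M)) for j in range(N)]
    return d, x
```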
Another recent batch method of interest is low-rank shared DL (LRSDL) [47]. This is a discriminative batch DL algorithm (others being D-KSVD [36] and LC-KSVD [35]) that promotes a dictionary separated into blocks of atoms associated with different classes, $\mathbf{D} = [\mathbf{D}_1, \ldots, \mathbf{D}_C]$, where $C$ is the number of classes present in the training set $\mathbf{Y}$. The resulting coefficient matrix $\mathbf{X}$ has a block-diagonal structure. The assumption of non-overlapping subspaces is often unrealistic in practice. Techniques such as COPAR [42], JDL [53], and CSDL [43] exploit common patterns among different classes, even though different objects possess distinct class-specific features. These methods produce an additional constituent $\mathbf{D}_0$, shared among all classes, so that $\mathbf{D} = [\mathbf{D}_1, \ldots, \mathbf{D}_C, \mathbf{D}_0]$. The drawback of these strategies is that the shared dictionary may also contain class-discriminative features. To avoid this problem, LRSDL requires that the shared dictionary have a low-rank structure and that its sparse coefficients be almost identical. Once the data are sparsely represented with such dictionaries, a sparse-representation-based classifier (SRC) is used to predict the class of new data. LRSDL employs the alternating direction method of multipliers (ADMM) [54] in its update process and FISTA for the sparse decomposition step.
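The SRC decision rule mentioned above can be sketched as follows: given a sparse code of a test signal over the block dictionary, each class is scored by how well its own block reconstructs the signal, after optionally removing the contribution of a shared block $\mathbf{D}_0$. This is the generic SRC rule written in plain Python under our own assumptions (function name, list-of-atoms storage); the exact LRSDL classifier in [47] may weight or regularize the residuals differently.

```python
import math

def src_predict(y, blocks, coeffs, shared=None, shared_coeffs=None):
    """Return (predicted class, per-class residuals) using the SRC rule.

    blocks[c]: atoms of class c (each a list of M floats); coeffs[c]: their
    coefficients in a sparse code of y over the full dictionary. If a shared
    block (D_0) and its coefficients are given, their contribution is removed first.
    """
    M = len(y)
    target = list(y)
    if shared is not None:
        for atom, a in zip(shared, shared_coeffs):
            target = [t - a * atom[i] for i, t in enumerate(target)]
    residuals = []
    for atoms, cs in zip(blocks, coeffs):
        # Reconstruction of the (shared-free) signal from class c's atoms only.
        recon = [sum(a * atom[i] for atom, a in zip(atoms, cs)) for i in range(M)]
        residuals.append(math.sqrt(sum((t - r) ** 2 for t, r in zip(target, recon))))
    best = min(range(len(blocks)), key=residuals.__getitem__)
    return best, residuals
```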
ODL is an interesting alternative for inferring a dictionary from large training sets or ones that change over time [19]. ODL also updates the entire dictionary sequentially, but uses one element of training data at a time for the dictionary update. Assuming that the training set is composed of independent and identically distributed samples of a distribution $p(\mathbf{y})$, ODL first draws an example $\mathbf{y}_{(t)}$ of the training set from $p(\mathbf{y})$. Then, sparse coding is performed with the Cholesky-based implementation of the LARS-LASSO algorithm [55], which solves an $\ell_1$-regularized least-squares problem for each column of $\mathbf{X}$. In the dictionary update, we consider all the training set elements analyzed so far, namely, $\mathbf{y}_{(1)}, \ldots, \mathbf{y}_{(t)}$:
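$$\mathbf{D}_{(t)} = \arg\min_{\mathbf{D} \in \mathcal{D}} \frac{1}{t} \sum_{i=1}^{t} \left( \frac{1}{2} \|\mathbf{y}_{(i)} - \mathbf{D}\mathbf{x}_{(i)}\|_2^2 + \lambda \|\mathbf{x}_{(i)}\|_1 \right), \tag{6}$$
where $\lambda$ is the regularization parameter of the $\ell_1$-regularized sparse coding step.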
In the next step, each column of $\mathbf{D}$ is sequentially updated via gradient descent using the dictionary computed in the previous iteration. Before the next training element is received, the dictionary update is repeated multiple times until convergence.
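For the ODL dictionary update, [19] keeps two running statistics, $\mathbf{A} = \sum_i \mathbf{x}_{(i)}\mathbf{x}_{(i)}^T$ and $\mathbf{B} = \sum_i \mathbf{y}_{(i)}\mathbf{x}_{(i)}^T$, so that (6) can be minimized without storing past samples; each atom is then refreshed in turn and projected back onto the unit ball. The sketch below follows that column-by-column (block-coordinate) scheme as formulated in [19], in plain Python with one pass per call. The function names, list-based storage, and the choice to skip atoms that have never been used are our assumptions.

```python
import math

def odl_accumulate(A, B, x, y):
    """Add one sample's statistics in place: A += x x^T (K x K), B += y x^T (M x K)."""
    K, M = len(x), len(y)
    for a in range(K):
        for b in range(K):
            A[a][b] += x[a] * x[b]
    for i in range(M):
        for k in range(K):
            B[i][k] += y[i] * x[k]

def odl_dictionary_update(D, A, B, n_passes=1):
    """Update atoms one at a time: u_k = (b_k - D a_k) / A_kk + d_k, then project onto ||d_k|| <= 1.

    D is stored column-wise (D[k] is atom k, M floats) and modified in place.
    """
    K, M = len(D), len(D[0])
    for _ in range(n_passes):
        for k in range(K):
            if A[k][k] == 0.0:
                continue  # atom never used so far: leave it unchanged
            Da_k = [sum(D[j][i] * A[j][k] for j in range(K)) for i in range(M)]  # D a_k
            u = [(B[i][k] - Da_k[i]) / A[k][k] + D[k][i] for i in range(M)]
            scale = max(math.sqrt(sum(v * v for v in u)), 1.0)
            D[k] = [v / scale for v in u]
    return D
```

Because only $\mathbf{A}$ and $\mathbf{B}$ are stored, the memory cost of this update does not grow with the number of samples processed.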
CBWLSU is an online method that introduces an interesting alternative for the dictionary update step [24]. Like ODL, CBWLSU evaluates one new training element $\mathbf{y}_{(t)}$ at a time. However, to update the dictionary, it searches among all previous training data and uses only those that share at least one atom with $\mathbf{y}_{(t)}$. Let $\mathcal{Y}_{(t)} = \{\mathbf{y}_{(i)}\}_{i=1}^{t-1}$ be the set of previous training elements at iteration $t$. Define $\Omega_{(t)}$ as the set of indices of all previous training elements that are correlated with the new element, i.e., whose sparse codes have at least one nonzero entry in common with that of $\mathbf{y}_{(t)}$. The new training set is $\{\mathbf{y}_{(t)}\} \cup \{\mathbf{y}_{(i)}\}_{i \in \Omega_{(t)}}$. CBWLSU then employs a weighting matrix to evaluate the influence of the selected previous elements on the dictionary update and solves the resulting optimization problem via weighted least squares. Unlike K-SVD and ODL, CBWLSU does not require a dictionary pruning step to replace unused or rarely used atoms with training data. The sparse coding in CBWLSU is achieved via batch OMP.
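The correlation test that CBWLSU uses to select previous elements reduces to an intersection of supports. The helper below is a minimal sketch (function names hypothetical) that returns the indices of previous sparse codes sharing at least one atom with the new code; the weighting and weighted-least-squares steps are not shown.

```python
def support(x):
    """Indices of the atoms used (nonzero entries) in a sparse code x."""
    return {k for k, v in enumerate(x) if v != 0.0}

def correlated_previous(new_code, previous_codes):
    """Indices of previous training elements whose codes share at least one atom with new_code."""
    atoms = support(new_code)
    return [i for i, x in enumerate(previous_codes) if atoms & support(x)]

new = [0.0, 1.2, 0.0, 0.0]
prev = [[1.0, 0.0, 0.0, 0.0], [0.0, -0.5, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0], [0.0, 0.1, 3.0, 0.0]]
print(correlated_previous(new, prev))  # -> [1, 3]
```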
III-B Drop-Off Mini-Batch Online Dictionary Learning
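TABLE I: Comparison of K-SVD, LRSDL, ODL, CBWLSU, and DOMINODL

| DL step | K-SVD | LRSDL | ODL | CBWLSU | DOMINODL |
|---|---|---|---|---|---|
| Training method | Batch | Batch | Online | Online | Online |
| Sparse coding method | OMP | FISTA | LARS | Batch OMP | Entropy-thresholded batch OMP |
| Dictionary update | Entire, atom-wise | Entire | Entire, group-wise | Entire, atom-wise | Partial, adaptively group-wise |
| Training samples per iteration | Entire | Entire | 1 new | 1 new + correlated previous | Minibatch of new + correlated previous minibatch |
| Optimization method | SVD | ADMM | Gradient descent | Weighted least squares | Weighted least squares |
| Post-update dictionary pruning | Yes | No | Yes | No | No |
| Training-set drop-off | No | No | No | No | Yes |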
We now introduce our DOMINODL approach for online DL, which not only leads to a dictionary $\mathbf{D}$ tuned to sparsely represent the training set $\mathbf{Y}$ but is also faster than other online algorithms. The key idea of DOMINODL is as follows. When sequentially analyzing the training set, it is pertinent to leverage the memory of previous data in the dictionary update step. However, algorithms such as CBWLSU consider all previous elements. Using all previous training samples is computationally expensive and may also slow down convergence. Samples that have already contributed to the dictionary update do not need to be considered again. Moreover, in some real-time applications (such as GPR with highly correlated range profiles), their contribution may no longer be relevant for updating the dictionary.
In DOMINODL, we save computations by considering only a small batch of previous elements that are correlated with the new elements. Two sets are considered correlated if their sparse decompositions share at least one nonzero element, i.e., they use at least one common atom. The time gained by considering fewer previous training elements is used to process a minibatch of new training data (instead of a single element as in ODL and CBWLSU). The sparse coding step of DOMINODL employs batch OMP and selects the maximal residual error in (4) with a data-driven, entropy-based strategy described later in this section. At the end of each iteration, DOMINODL also drops off those previous training set elements that have not been picked up after a certain number of iterations. The combination of minibatch drawing, dropping off training elements, and an entropy-based criterion to control sparsity results in an extremely fast online DL algorithm that is beneficial for real-time radar operation.
We initialize the dictionary using a collection of training set samples that are randomly chosen from . We then perform a sparse decomposition of with the dictionary . Let the iteration count indicate the element of the training set. We define the minibatch of new training elements as , where with and . When , we simply take the remaining new elements to constitute this minibatch. (In numerical experiments, we observed that the condition rarely occurs because DOMINODL updates the dictionary and converges in very few iterations. The algorithm also ensures that the number of previous samples before the dictionary update. If this condition is not fulfilled, then it considers all previous training samples.) We store the set of dictionary atoms participating in the SR of the signals in as . We define with as the set of previous training elements at iteration . We consider a randomly selected minibatch with such that . Let where such that be a subset of previous training elements that are correlated with the minibatch of new elements. In order to avoid multiple occurrences of the same element in consecutive minibatches, DOMINODL ensures that . Let be the set of dictionary atoms used for SR of . Our new training set is . Both minibatches of new and previous elements are selected such that the entire training set size () is still smaller than that of CBWLSU, where it is .
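The bookkeeping behind the minibatch drawing and drop-off steps can be sketched as follows. Because the exact sizes, counters, and thresholds of DOMINODL are given by symbols not reproduced here, every name in this block (`n_new`, `n_prev`, `max_idle`, the `idle` counters, and both functions) is a hypothetical stand-in. The sketch only shows one plausible way to (i) take the next minibatch of new elements, (ii) draw a random minibatch of still-eligible previous elements that excludes the previous iteration's batch, (iii) keep those sharing an atom with the new minibatch, and (iv) drop elements that have stayed unused for too many iterations.

```python
import random

def _atoms(x):
    """Atoms used (nonzero entries) in a sparse code."""
    return {k for k, v in enumerate(x) if v != 0.0}

def select_minibatches(codes, next_new, n_new, pool, n_prev, prev_batch, rng):
    """One illustrative selection step; returns (new_ids, correlated_prev_ids).

    codes: sparse codes of all training elements, indexed by element id.
    next_new: id of the first unprocessed element; up to n_new new ids are taken
      (fewer if the training set is exhausted).
    pool: previous element ids not yet dropped off; prev_batch: ids used as
      previous elements in the last iteration, excluded here so the same
      element does not appear in consecutive minibatches.
    """
    new_ids = list(range(next_new, min(next_new + n_new, len(codes))))
    new_atoms = set().union(*(_atoms(codes[i]) for i in new_ids))
    candidates = [i for i in pool if i not in prev_batch]
    drawn = rng.sample(candidates, min(n_prev, len(candidates)))
    correlated = [i for i in drawn if _atoms(codes[i]) & new_atoms]
    return new_ids, correlated

def drop_off(pool, idle, picked, max_idle):
    """Update per-element idle counters and drop elements idle for more than max_idle iterations."""
    for i in pool:
        idle[i] = 0 if i in picked else idle.get(i, 0) + 1
    return [i for i in pool if idle[i] <= max_idle]
```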
The dictionary update subproblem reduces to considering only the sets , and :
(7) 
Assume that the sparse coding for each example is known and define the errors as
(8) 
We can update , such that the above error is minimized, with the assumption of fixed . A similar problem is considered in MOD where error minimization is achieved through least squares. Here, we employ weighted least squares inspired by the fact that it has shown improvement in convergence over standard least squares [24]. We compute the weighting matrix using the sparse representation error :
(9) 
We then solve the optimization problem
(10) 
This leads to the weighted least squares solution
(11) 
The dictionary is then updated with the atoms and its columns are normalized by their norms.
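A generic weighted least squares dictionary update has the closed form $\mathbf{D} = \mathbf{Y}\mathbf{W}\mathbf{X}^T(\mathbf{X}\mathbf{W}\mathbf{X}^T)^{-1}$ for a diagonal weighting matrix $\mathbf{W}$, followed by column normalization; with $\mathbf{W} = \mathbf{I}$ it reduces to the MOD least squares update. The sketch below implements that generic formula in plain Python for small matrices, under the assumption that (11) takes this form on the selected samples and atoms. It does not reproduce the weights of (9): the weights `w` are simply an input, and the names and row-major storage are our assumptions. It also assumes $\mathbf{X}\mathbf{W}\mathbf{X}^T$ is invertible, i.e., every atom is used by at least one weighted sample.

```python
import math

def _solve(a, b):
    """Solve the small dense system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def weighted_ls_dictionary(Y, X, w):
    """D = Y W X^T (X W X^T)^{-1} with W = diag(w), followed by unit-norm columns.

    Y: M x L data and X: K x L sparse codes, both as lists of rows; w: L positive
    weights. With all weights equal this is the MOD least squares update.
    """
    M, K, L = len(Y), len(X), len(w)
    G = [[sum(X[a][l] * w[l] * X[b][l] for l in range(L)) for b in range(K)] for a in range(K)]
    R = [[sum(Y[i][l] * w[l] * X[k][l] for l in range(L)) for k in range(K)] for i in range(M)]
    # Row i of D solves d_i G = r_i; G is symmetric, so solve G d_i^T = r_i^T.
    D = [_solve(G, R[i]) for i in range(M)]
    for k in range(K):
        norm = math.sqrt(sum(D[i][k] ** 2 for i in range(M)))
        for i in range(M):
            D[i][k] /= norm
    return D
```

When the data are exactly $\mathbf{Y} = \mathbf{D}\mathbf{X}$, any positive weights recover $\mathbf{D}$; the weights only matter when the representation is inexact, where they decide which samples the fit favors.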
The updated dictionary is next used to update the sparse codes via batch OMP. Selecting a value for the maximal residual error in (4) is usually not straightforward. This value can be related to the amount of noise in the observed data, but this information is usually unknown. The samples of our training set can be seen as realizations of a statistical process with an unknown distribution; therefore, one can associate the concept of statistical entropy with these realizations. We compute the normalized entropy of the mean vector of all the training set samples as
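$$H(\bar{\mathbf{y}}) = -\frac{1}{\log F} \sum_{f=1}^{F} p(\bar{y}_f) \log p(\bar{y}_f), \tag{12}$$
where $\bar{\mathbf{y}}$ is the mean vector of all training samples, $F$ is the number of features of each training sample, and $p(\cdot)$ is the probability mass function. In our case, $p(\cdot)$ is obtained as the normalized histogram of $\bar{\mathbf{y}}$. Here, $H(\bar{\mathbf{y}})$ is an indicator of the randomness of the data due to noise. We use $\epsilon = H(\bar{\mathbf{y}})$ as the maximal residual error when applying batch OMP in DOMINODL. Algorithm 1 summarizes all major steps of DOMINODL.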
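A minimal sketch of the entropy-based threshold, under our reading of (12): the pmf is taken as the normalized histogram of the $F$ entries of the mean vector $\bar{\mathbf{y}}$ using equal-width bins (by default $F$ of them; the bin count is our assumption), and the Shannon entropy is divided by the logarithm of the number of bins so that it lies in $[0, 1]$. The function name is hypothetical.

```python
import math

def normalized_entropy(training_set, n_bins=None):
    """Normalized Shannon entropy (in [0, 1]) of the mean training vector.

    training_set: list of samples, each a list of F features. The pmf is the
    normalized histogram of the F entries of the mean vector, using n_bins
    equal-width bins (default F, assumed >= 2); the entropy is divided by log(n_bins).
    """
    F = len(training_set[0])
    mean = [sum(s[f] for s in training_set) / len(training_set) for f in range(F)]
    n_bins = n_bins or F
    lo, hi = min(mean), max(mean)
    if hi == lo:
        return 0.0  # all entries fall in a single bin
    counts = [0] * n_bins
    for v in mean:
        counts[min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)] += 1
    probs = [c / F for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(n_bins)
```

Under this reading, a flat mean vector gives 0 and a mean vector whose entries spread evenly over the bins gives 1; the returned value would then serve as $\epsilon$ in the batch-OMP stopping rule.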
Table I summarizes the important differences between DOMINODL and other related algorithms. Like CBWLSU, DOMINODL uses a weighted least squares solution in the dictionary update, a weighted variant of the least squares update in MOD. The proof of convergence of the alternating minimization method in MOD was provided in [34], where it is shown that alternating minimization converges linearly as long as the following assumptions hold: the sparse coefficients have bounded values, the sparsity level is on the order of , and the dictionary satisfies the restricted isometry property (RIP). In [24], these assumptions were applied to argue the convergence of CBWLSU. Compared to CBWLSU, the improvements in DOMINODL include minibatch-based data selection and data reduction via a drop-off strategy, but the update algorithms remain the same. Numerical experiments in Section VI suggest that DOMINODL usually converges in far fewer iterations than CBWLSU.
Although we developed and tested DOMINODL on a highly correlated GPR dataset (see Section IV), this technique may be employed in other applications where real-time learning is necessary and the signals are correlated. Our tests demonstrate that DOMINODL converges faster than other online DL approaches (see Section VI) because of the combined strategy of drawing more new elements in each iteration, considering fewer previous elements in the search for correlation, and dropping off unused previous elements. The entropy-based calculation of $\epsilon$, although not exclusive to DL applications, also helps improve the SR of the data and thus yields a more representative dictionary.
The computational complexity of DOMINODL is very low compared to that of other online approaches. As mentioned earlier, there are $K$ atoms in the dictionary. Assume that every signal is represented by a linear combination of $S$ atoms, $S \ll K$. Empirically, among all possible combinations of $S$ atoms from $K$, the probability of having a common atom in the sparse representation is $S/K$. Given $L$ training elements, the number of training data that have a specific atom in their representation is proportional to $LS/K$. Suppose our minibatch has elements that reduce the number of training data by a factor (depending on the values of and ). Further, assume that the drop-off step reduces the training set elements by a factor . The number of training data in the iteration is proportional to . Then, the worst-case estimate of DOMINODL's computational complexity is due to the sparse coding batch OMP, which is of order . This is much smaller than the complexity of ODL () or CBWLSU ().
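The proportionality used in the complexity argument can be checked numerically: if each sparse code uses $S$ of $K$ atoms chosen uniformly at random, a given atom appears in a code with probability $S/K$, so about $LS/K$ of $L$ training elements share it. The Monte Carlo sketch below (function name and parameter values are illustrative) draws random supports and compares the count with $LS/K$; real GPR codes are not uniformly random, so this only illustrates the idealized model.

```python
import random

def count_sharing_atom(L, K, S, atom=0, seed=0):
    """Draw L random S-of-K supports and count how many include the given atom."""
    rng = random.Random(seed)
    return sum(atom in rng.sample(range(K), S) for _ in range(L))

L, K, S = 20000, 100, 5
print(count_sharing_atom(L, K, S), "vs. L*S/K =", L * S / K)
```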
Figure 1 illustrates the computational complexity of the online DL approaches. Figure 1(a) shows that, for a fixed number of iterations, the general trend of complexity with respect to the number of atoms ($K$) is similar for all algorithms. However, the complexity of ODL is higher than that of CBWLSU and DOMINODL, the latter being the least complex. When the number of iterations is increased, the complexities of ODL and CBWLSU follow a similar increasing trend (see Fig. 1(b)). In the case of DOMINODL, its complexity initially follows the increasing trend of CBWLSU and is determined largely by . When DOMINODL iterations begin accounting for previous elements, its complexity stays constant. The value of changes at every iteration, while depends on the data itself. In general, after a few dozen iterations, DOMINODL's complexity always stays lower than that of CBWLSU.
IV Measurement Campaign
In this section, we first provide details of our GPR system and the field measurement campaign. We then describe the procedure to organize the entire dataset for our application.
IV-A Ground Penetrating Radar System
Our GPR (see Fig. 2) is the commercially available SPRScan system manufactured by ERA Technology. It is an L-band, impulse-waveform, ultra-wideband (UWB) radar mounted on a movable trolley platform. Pulsed GPRs are more effective than standard stepped-frequency continuous-wave (SFCW) systems in terms of penetration depth and wide bandwidth. Pulsed systems are also more robust to electronic interference and do not suffer from unequal balancing of antenna signals [56].
Table II lists the salient technical parameters of the system. The radar uses a dual bow-tie dipole antenna for both transmit (Tx) and receive (Rx), sealed in a metallic shield filled with an internal absorber. The central frequency of the system ($f_c$) and its bandwidth ($B$) are both 2 GHz. The pulse repetition frequency (PRF) and the sampling rate of the receiver ADC are both 1 MHz. The scanning system has a resolution of 1 cm along the perpendicular broadside (X direction) and 4 cm across the beam (Y direction). In our field campaigns, the SPRScan system moves along the survey area on a rail system, which allows accurate positioning of the sensor head in order to obtain the aforementioned resolution in X and Y (see also Section IV).
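Table II: Salient technical parameters of the SPRScan GPR system.

| Parameter | Value |
|---|---|
| Operating frequency | 2 GHz |
| Pulse repetition frequency | 1 MHz |
| Pulse length | 0.5 ns |
| Sampling time | 25 ps |
| Spatial sampling along the beam | 1 cm |
| Crossbeam resolution | 4 cm |
| Antenna height | 59 cm |
| Antenna configuration | Perpendicular broadside |
| Samples per A-scan | 512 |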
The transmit pulse of the GPR system is a monocycle. Given the Gaussian waveform
(13) 
where is the central frequency, is the peak amplitude and , the monocycle waveform is its first derivative [57]
(14) 
In these UWB systems both the central frequency and the bandwidth are approximately the reciprocal of the pulse length.
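The link between the Gaussian width, the central frequency and the pulse spectrum can be checked numerically. The sketch below is a minimal illustration, not the system's actual transmit pulse: it assumes the Gaussian g(t) = A exp(-(t - t0)^2 / (2σ^2)) with σ = 1/(2π f_c), a parameterization we chose because it places the spectral peak of the first derivative exactly at f_c (it may differ from the normalization in (13)). The names `gaussian_monocycle`, `FC`, `DT` and `N` are illustrative; the numerical values come from Table II.

```python
import numpy as np

FC = 2e9      # central frequency (Table II)
DT = 25e-12   # equivalent-time sampling interval (Table II)
N = 512       # samples per A-scan (Table II)

def gaussian_monocycle(t, fc=FC, t0=1e-9, amplitude=1.0):
    """First derivative of a Gaussian whose width is tied to fc.

    Assumption: g(t) = A*exp(-(t - t0)**2 / (2*sigma**2)) with sigma = 1/(2*pi*fc),
    so that the amplitude spectrum of dg/dt peaks exactly at fc.
    """
    sigma = 1.0 / (2.0 * np.pi * fc)
    x = t - t0
    g = amplitude * np.exp(-x**2 / (2.0 * sigma**2))
    return -x / sigma**2 * g          # analytical derivative dg/dt

t = np.arange(N) * DT
pulse = gaussian_monocycle(t)

# One-sided power spectrum; the bin spacing is 1/(N*DT) = 78.125 MHz
freqs = np.fft.rfftfreq(N, d=DT)
power = np.abs(np.fft.rfft(pulse)) ** 2
f_peak = freqs[np.argmax(power)]

# Half-power (-3 dB) bandwidth, measured on the FFT grid
band = freqs[power >= 0.5 * power.max()]
bw_3db = band[-1] - band[0]
print(f"peak {f_peak / 1e9:.2f} GHz, -3 dB bandwidth ~ {bw_3db / 1e9:.2f} GHz")
```

Under these assumptions the spectral peak lands on the FFT bin nearest to 2 GHz and the half-power bandwidth is of the same order as the central frequency, in line with the reciprocal relation stated above.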
The scattering of UWB radar signals from complex targets that are composed of a finite number of scattering centers can be described in terms of the channel impulse response (CIR). Here, the CIR is considered as a linear, time-invariant, causal system which is a function of the target shape, size, constituent materials, and scan angle. The CIR of a GPR target, with scatterers, is expressed as a series of time-delayed and weighted Gaussian pulses [58]
(15) 
where each scatterer located at range from the radar is characterized by the reflectivity , duration , relative time shift , where is the speed of the electromagnetic wave in the soil, m/s is the speed of light, and is the dielectric constant which depends on the soil composition and moisture.
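The two-way time shift of each scatterer follows from the propagation speed in the soil, v = c/√ε_r. The minimal sketch below evaluates this delay for the extreme mean dielectric constants measured at LIAG (4.6 and 10.1, Section IV-B) and builds a CIR as a sum of weighted Gaussian pulses. The Gaussian shape exp(-((t - τ_i)/w_i)^2), the 10 cm scatterer depth, the 50 ps pulse duration, and the helper names `two_way_delay` and `gaussian_cir` are hypothetical choices for illustration.

```python
import numpy as np

C0 = 3e8      # speed of light in vacuum (m/s)
DT = 25e-12   # equivalent-time sampling interval (Table II)
N = 512       # samples per range profile (Table II)

def two_way_delay(depth_m, eps_r):
    """Two-way travel time (s) to a scatterer at depth_m in soil with dielectric constant eps_r."""
    v = C0 / np.sqrt(eps_r)                       # propagation speed in the soil
    return 2.0 * depth_m / v

def gaussian_cir(t, reflectivity, delays, durations):
    """Sum of time-shifted, weighted Gaussian pulses (assumed CIR shape)."""
    t = np.asarray(t, dtype=float)[:, None]       # (N, 1), broadcasts against K scatterers
    a = np.asarray(reflectivity, dtype=float)     # (K,)
    tau = np.asarray(delays, dtype=float)         # (K,)
    w = np.asarray(durations, dtype=float)        # (K,)
    return np.sum(a * np.exp(-((t - tau) / w) ** 2), axis=1)

# Hypothetical scatterer 10 cm below the surface
for eps in (4.6, 10.1):
    print(f"eps_r = {eps:>4}: two-way delay = {two_way_delay(0.10, eps) * 1e9:.2f} ns")

t = np.arange(N) * DT
tau = two_way_delay(0.10, 4.6)
h = gaussian_cir(t, reflectivity=[1.0], delays=[tau], durations=[50e-12])
```

With these assumptions, the echo of the same object moves by roughly 0.69 ns (about 28 samples at 25 ps) across the measured range of dielectric constants, which illustrates how soil variability shifts target responses in range.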
The response of the target to the Gaussian monocycle is the received signal
(16) 
also regarded as the target image, or range profile. For each X/Y position, the system receives a radar echo (range profile) from the transmitted pulse. To cope with the exponential signal attenuation during propagation through the soil, the dynamic range of the signal is enhanced via stroboscopic sampling [2, 59, 60]. This technique integrates receiver samples (generated by transmitting a sequence of pulses) at the ADC sampling rate, but with a small time offset for each of them. To achieve the desired stroboscopic sampling rate , the time offset must be selected accordingly, i.e., [60]. Our GPR system employs stroboscopic sampling to reach a pseudo-sampling frequency of 40 GHz (far above the Nyquist rate) to yield the discrete-time signal .
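Stroboscopic (equivalent-time) sampling can be illustrated with an idealized, noise-free simulation: the receiver takes one sample per transmitted pulse and delays the sampling instant by a further 25 ps on each successive pulse, so that after 512 pulses the echo has been sampled as if at 40 GHz. This sketch assumes a perfectly repeatable echo and no timing jitter, and it omits the integration of several samples per point performed by the real receiver; the `echo` waveform is a hypothetical Gaussian monocycle.

```python
import numpy as np

PRF = 1e6       # pulse repetition frequency (Table II)
FS_EQ = 40e9    # equivalent (pseudo) sampling rate, i.e. 1 / 25 ps
N = 512         # samples per range profile (Table II)

def echo(t, t0=2e-9, fc=2e9):
    """Hypothetical noise-free echo within one repetition period (Gaussian monocycle)."""
    sigma = 1.0 / (2.0 * np.pi * fc)
    x = t - t0
    return -x / sigma * np.exp(-x**2 / (2.0 * sigma**2))

T = 1.0 / PRF         # interval between transmitted pulses
dt = 1.0 / FS_EQ      # extra delay added to the sampling instant on each new pulse
k = np.arange(N)

# One sample per pulse: the k-th sample is taken k*dt after the k-th pulse.
# The echo repeats every T, so only the time modulo T matters.
t_abs = k * T + k * dt
strobe = echo(np.mod(t_abs, T))

direct = echo(k * dt)  # the same echo sampled directly at 40 GHz, for reference
acq_time = N * T       # real time needed to assemble one profile (no averaging)
print(np.allclose(strobe, direct), f"{acq_time * 1e6:.0f} us per profile")
```

The cost of the high equivalent rate is acquisition time: at a 1 MHz PRF, one 512-sample profile needs 512 pulses, i.e., about 0.5 ms, before any averaging.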
The receiver can acquire a maximum of 195 profiles per second, each consisting of 512 range samples. Prior to the A/D conversion, the signal is averaged to improve the signal-to-noise ratio (SNR). A time-varying gain correction can be applied to compensate for the soil attenuation and increase the overall dynamic range of the system. The receiver averages 100 range profiles for each antenna position.
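Averaging helps because the echo repeats from one acquisition to the next while the receiver noise does not. Under the simplifying assumption of independent, zero-mean white noise, averaging 100 profiles reduces the noise standard deviation by a factor of √100 = 10, i.e., an SNR gain of 20 dB. The sketch below checks this with a hypothetical sinusoidal "profile"; real ground clutter is correlated between acquisitions and is not reduced in the same way.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_avg, n_samples, noise_std = 100, 512, 1.0

# Hypothetical noise-free profile, acquired n_avg times with independent white Gaussian noise
clean = np.sin(2 * np.pi * np.arange(n_samples) / 64)
acquisitions = clean + noise_std * rng.standard_normal((n_avg, n_samples))

averaged = acquisitions.mean(axis=0)       # average of the 100 acquisitions at one position
residual_std = np.std(averaged - clean)    # noise left after averaging
gain_db = 20 * np.log10(noise_std / residual_std)
print(f"noise std {noise_std:.2f} -> {residual_std:.3f}, SNR gain ~ {gain_db:.1f} dB")
```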
IV-B Test Field Measurements
We evaluated the proposed approach with measurement data from a 2013 field campaign at the Leibniz Institute for Applied Geophysics (LIAG) in Hannover, Germany [6]. Fig. 3 shows the test field; detailed ground-truth information is given in [6]. The soil texture was sandy and highly inhomogeneous (due to the presence of materials such as organic matter and stones), leading to a high variability in the electrical parameters. We measured the dielectric constant at three different locations of the testbed with a time-domain reflectometer (TDR) to estimate its mean value and variability. The average value ranged between 4.6 and 10.1, with standard deviation and correlation length [6] of cm. These large variations in the soil dielectric characteristics make mine detection more difficult.
During the field tests, the SPRScan system moved on two plastic rails, with scan resolutions of 1 cm and 4 cm in the X and Y directions, respectively. The entire survey lane was divided into 1 m × 1 m sections (see Fig. 3), each containing two targets in the center. The targets on the left and right sides of the lane were buried at approximately and cm depths, respectively.
Our testbed contains standard test targets (STTs) and simulant landmines (SIMs) of different sizes and shapes. An STT is a surrogate target used for testing landmine detection equipment; it is intended to interact with the equipment in the same manner as a real landmine. A SIM has the representative characteristics of a specific landmine class, although it is not a replica of any specific model. In this paper, we study three STTs (PMA2, PMN and Type72) and one SIM (ERA). All of these test objects are buried at a depth of  cm in the test field [61]. For classification purposes, we group PMN and PMA2 together as the largest targets, while the T72 mines are the smallest (Fig. 4).
IV-C Dataset Organization
The entire LIAG dataset consists of 27 of the aforementioned survey sections (or simply, “surveys”) of size 1 m × 1 m. Every survey consists of 2500 range profiles. We arranged the data into a training set, used for both DL and classification (as explained in Section II-A), and a test set, used to evaluate the performance of the proposed algorithms.
The training set is a matrix whose columns consist of sampled range profiles of range profiles each. The profiles are selected from different surveys and contain almost exclusively either a particular class of landmine or clutter. In total, we have , , and range profiles for clutter, PMA2/PMN, ERA and Type72, respectively. An accurate separation of these classes was very challenging because the contributions from the nonhomogeneous soil clutter often masked the target responses completely. A poor selection would lead DL to learn a dictionary suited to sparsely representing clutter instead of landmines. The test set is a matrix whose columns are sampled range profiles from 6 surveys, two for each target class. The test and training sets contain data from separate surveys to enable a fair assessment of the classification performance.
We denote by the matrices and the SRs of and , respectively and by the number of atoms of the learned dictionary .
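Keeping training and test profiles in separate surveys prevents the classifier from being evaluated on echoes it has effectively already seen. A minimal sketch of such a survey-disjoint split is shown below; the array layout (one 512-sample range profile per column), the helper name `split_by_survey`, and the toy survey identifiers are assumptions for illustration, and the hand-picking of class-specific training profiles described above is not reproduced.

```python
import numpy as np

def split_by_survey(profiles, survey_ids, test_surveys):
    """Split column-wise range profiles so that no survey contributes to both sets."""
    survey_ids = np.asarray(survey_ids)
    is_test = np.isin(survey_ids, list(test_surveys))
    return (profiles[:, ~is_test], profiles[:, is_test],
            survey_ids[~is_test], survey_ids[is_test])

# Toy data: 27 surveys with 10 profiles of 512 samples each (a real survey holds 2500 profiles)
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(27), 10)
Y = rng.standard_normal((512, ids.size))
Y_train, Y_test, id_train, id_test = split_by_survey(Y, ids, test_surveys={3, 7, 11, 15, 19, 23})
print(Y_train.shape, Y_test.shape)
```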
V Parametric Analysis
In practice, SR-based classification performance is sensitive to the input parameters of DL algorithms, which makes it difficult to apply DL directly with arbitrary parameter values. Previous works set these parameters by trial and error or resorted to metrics that are unable to discriminate the influence of different parameters [20]. In this section, we propose methods to investigate the effect of the various input parameters on the learning performance, and then set the parameters to optimal values that yield a dictionary (for each DL method) optimized to sparsely represent our GPR data, thereby improving the quality of the features for classification (i.e., the sparse coefficients).
Table III lists these parameters (see Section III): number of iterations , number of trained atoms , and DOMINODL parameters , and . We applied KSVD, LRSDL, ODL, CBWLSU and DOMINODL separately on the training set for different combinations of parameter values. To compare the dictionaries obtained from the various DL algorithms, we use a similarity measure that quantifies the closeness of the original training set to the reconstructed set obtained using the sparse coefficients of the learned dictionary . From these similarity values, we obtain empirical probability density functions (EPDFs) for any combination of parameter values and evaluate them using the statistical metrics described in Section V-B. These metrics efficiently characterize the similarity between and and lead us to an optimal selection of the DL input parameters for our experimental GPR dataset.
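Table III: Input parameters of the DL algorithms.

| DL algorithm | Input parameters |
|---|---|
| KSVD | Number of iterations, number of trained atoms |
| LRSDL | Number of iterations, number of trained atoms |
| ODL | Number of iterations, number of trained atoms |
| CBWLSU | Number of trained atoms |
| DOMINODL | Number of trained atoms, two minibatch dimensions, drop-off iterations |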
V-A Similarity Measure
Consider the cross-correlation between the original training set vector and its reconstruction : .
(17) 
For the vector , we define the similarity measure as
(18) 
where a value of closer to unity indicates greater similarity of the reconstructed data to the original training set. We compute for all vectors , and then obtain the normalized histogram, or empirical probability density function (EPDF), of the similarity measure. Here, the subscript DL denotes the algorithm used for learning, e.g., “K”, “O”, “C” and “D” for KSVD, ODL, CBWLSU and DOMINODL, respectively. Various parameter combinations for a specific DL method result in a collection of EPDFs. For a given DL method, our goal is to compare the EPDFs of the similarity measure as these parameters vary, and to identify the parameter thresholds beyond which the changes in are only incremental.
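A minimal sketch of the similarity measure and its EPDF is given below. It assumes that (18) takes the peak, over all lags, of the cross-correlation normalized by the product of the vector norms; this form is our assumption, chosen because the Cauchy-Schwarz inequality then bounds it by one, with equality for a perfect reconstruction. The function names and the toy data are hypothetical.

```python
import numpy as np

def similarity(y, y_hat):
    """Peak of the normalized cross-correlation between y and its reconstruction y_hat.

    Assumed form: max over all lags of r[l] / (||y|| * ||y_hat||). By the
    Cauchy-Schwarz inequality this never exceeds 1, and it equals 1 when y_hat == y.
    """
    r = np.correlate(y, y_hat, mode="full")             # cross-correlation at every lag
    return float(np.max(r) / (np.linalg.norm(y) * np.linalg.norm(y_hat)))

def similarity_epdf(Y, Y_hat, bins=50):
    """Similarity of each column pair and its normalized histogram (EPDF)."""
    s = np.array([similarity(Y[:, i], Y_hat[:, i]) for i in range(Y.shape[1])])
    s = np.clip(s, -1.0, 1.0)                           # guard against round-off just above 1
    density, edges = np.histogram(s, bins=bins, range=(-1.0, 1.0), density=True)
    return s, density, edges

# Toy check: random "profiles" and slightly perturbed "reconstructions"
rng = np.random.default_rng(0)
Y = rng.standard_normal((512, 20))
Y_hat = Y + 0.1 * rng.standard_normal((512, 20))
s, density, edges = similarity_epdf(Y, Y_hat)
print(f"mean similarity {s.mean():.3f}")
```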
For instance, Fig. 5 shows the EPDFs of learned from the GPR mines data where optimal parameters for different DL methods were determined using statistical methods described in the following subsection. We note that the online DL approaches (, and ) yield distributions that are more skewed towards unity than KSVD ().
V-B Statistical Metrics
We are looking for parameter values for which is skewed towards unity and has a small variance. Individual comparisons of the mean and standard deviation, as used in previous GPR DL studies [20], are not sufficient to quantify the observed dispersion in the EPDFs obtained by varying any of the parameter values. Some DL studies [20, 62, 50] rely on bulk statistics such as the NRMSE, but these quantities are insensitive to large changes in parameter values and therefore unhelpful in fine-tuning the algorithms. For this evaluation, we use three different metrics: the coefficient of variation, the two-sample Kolmogorov-Smirnov (KS) distance, and the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality.
V-B1 Coefficient of variation
We compare both the mean and the variance of a single EPDF simultaneously by using the coefficient of variation, i.e., the ratio of the standard deviation to the mean; in our analysis, it represents the extent of variability relative to the mean of the similarity values.
V-B2 Two-sample Kolmogorov-Smirnov distance
In the context of our application, it is more convenient to work with cumulative distribution functions (CDFs) rather than with PDFs, because the well-developed theory of statistical inference allows a convenient comparison of CDFs. Therefore, our second metric for comparing the similarity measurements obtained by successive changes in parameter values is the two-sample Kolmogorov-Smirnov (KS) distance [21], which is the maximum distance between two given empirical cumulative distribution functions (ECDFs). Larger values of this metric indicate that the samples are drawn from different underlying distributions. Given two random variables and , suppose and are their ECDFs of the same length, corresponding to their EPDFs and , respectively. Then, the KS distance is
(19) 
where denotes the supremum over all distances and is the number of i.i.d. observations (or samples) used to evaluate both distributions. In our case, is the number of range profiles in the training set. We first compute a reference ECDF for each DL algorithm with fixed parameter values, i.e., a particular combination of the input parameters of the selected DL algorithm. Then, we vary the parameter values from this reference and obtain the corresponding ECDF of the similarity measure. Finally, we calculate the KS distance of with respect to as
(20) 
For our evaluation, quantifies how much the selection of certain DL input parameters changes the ECDF of similarity values (i.e., how different the result of DL is) with respect to the reference.
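Both the coefficient of variation and the two-sample KS distance take only a few lines to compute. In the sketch below, `coefficient_of_variation` uses the population standard deviation, and `ks_distance` evaluates the two step ECDFs at every pooled sample point, where the supremum in (19) is attained; for the same inputs the result should match the `statistic` returned by SciPy's `scipy.stats.ks_2samp`. The function names and the toy similarity values are our own.

```python
import numpy as np

def coefficient_of_variation(s):
    """Standard deviation over mean of a set of similarity values (Section V-B1)."""
    s = np.asarray(s, dtype=float)
    return float(np.std(s) / np.mean(s))

def ks_distance(a, b):
    """Two-sample KS distance: sup over x of |F_a(x) - F_b(x)| between the two ECDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])      # step ECDFs only change at pooled sample points
    F_a = np.searchsorted(a, grid, side="right") / a.size
    F_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(F_a - F_b)))

# Toy example: similarity values for a reference run and a run with changed parameters
rng = np.random.default_rng(0)
s_ref = np.clip(rng.normal(0.95, 0.02, 1000), None, 1.0)
s_new = np.clip(rng.normal(0.93, 0.03, 1000), None, 1.0)
print(coefficient_of_variation(s_ref), ks_distance(s_ref, s_new))
```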
V-B3 Dvoretzky-Kiefer-Wolfowitz inequality metric
As a third metric, we exploit the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [22, 23], which precisely characterizes the rate of convergence of an ECDF to the corresponding exact CDF (from which the empirical samples are drawn) for any finite number of samples. (The corresponding asymptotic result, that this distance tends to zero with probability one as the number of samples tends to infinity, is due to the Glivenko-Cantelli theorem [63, 64].) Let be the KS distance between the ECDF and the continuous CDF for a random variable and samples. Since changes with the random samples, is also a random variable. We are interested in the conditions that provide the desired confidence in verifying whether F and G are the same distribution for a given finite . If the two distributions are indeed identical, the DKW inequality bounds the probability that is greater than any number , with as follows:
(21) 
Consider a binary hypothesis testing framework in which we use (21) to test the null hypothesis for a given . The probability of rejecting the null hypothesis when it is true is the type I error probability (the significance level of the test), and it is bounded by the DKW inequality. Requiring this probability to be smaller than a chosen level , the following inequality must hold with probability at least :
(22) 
Our goal is to use the DKW inequality to compare two ECDFs and as in (20), to verify if they are drawn from the same underlying CDF. By the triangle inequality, the KS distance satisfies
(23) 
where F and G are the underlying CDFs corresponding to and . We now bound the right-hand side using the DKW inequality:
(24) 
which is the maximum distance for which and can be considered identical with probability . The DKW metric is the difference
(25) 
Larger values of this metric imply greater similarity between the two ECDFs; a negative value implies that the null hypothesis is rejected.
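The threshold-and-difference logic can be sketched as follows. This is a minimal illustration under an explicit assumption about (24): the threshold is taken as twice the one-sample DKW radius ε = sqrt(ln(2/α)/(2n)), which follows from the triangle inequality in (23) when both ECDFs have n samples; a union bound then gives a probability of at least 1 - 2α rather than 1 - α. The function names and the sample size are hypothetical.

```python
import numpy as np

def dkw_epsilon(n, alpha):
    """Radius eps with 2*exp(-2*n*eps**2) = alpha: one ECDF stays within eps of its CDF w.p. >= 1 - alpha."""
    return float(np.sqrt(np.log(2.0 / alpha) / (2.0 * n)))

def dkw_metric(ks_dist, n, alpha=0.05):
    """Threshold minus observed two-sample KS distance; negative values reject the null hypothesis.

    Assumed threshold: 2*eps. Under the null (identical CDFs), the triangle
    inequality plus a union bound give KS <= 2*eps with probability >= 1 - 2*alpha.
    """
    return 2.0 * dkw_epsilon(n, alpha) - ks_dist

n = 1000   # hypothetical number of range profiles in the training set
print(f"eps = {dkw_epsilon(n, 0.05):.4f}, threshold = {2 * dkw_epsilon(n, 0.05):.4f}")
```

A KS distance computed as in the previous sketch can be passed directly to `dkw_metric`; the sign of the result then reproduces the accept/reject reading described above.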
V-C Parametric Evaluation
We evaluated the performance of the aforementioned DL algorithms by analyzing the influence of the various DL input parameters on the reconstruction of the training set , using the metrics introduced in Section V-B. A landmine-contaminated site may present various soil types and scenarios, and the LIAG test data provide an accurate representation of a practical scenario. Since our metrics are general and derived from widely accepted statistical tools, they are very likely relevant to similar scenarios. As shown in Table III, the number of iterations is not relevant to CBWLSU and DOMINODL, while the latter requires additional parameters to specify the minibatch dimensions and the number of iterations after which unused training set elements are dropped off. We compute the KS distance and the DKW metric for all methods with respect to a reference distribution . This reference, different for each DL algorithm, is obtained using the following parameters as applicable: , , , and .
V-C1 Number of iterations
Figs. 7(a)-(c) show the effect of on the , KS test distance and DKW metric for KSVD, ODL and LRSDL. We have excluded CBWLSU and DOMINODL from this analysis because they do not accept as an input. For ODL, the remains relatively unchanged as increases. However, KSVD exhibits an oscillating behavior and generally high values. In the case of the KS distance, ODL shows a slight increase in , while KSVD oscillates around a mean value that is higher than that of ODL. The DKW metric provides better insight: even though the ODL distributions differ from as the number of iterations increases, the null hypothesis is never rejected because remains positive. The for KSVD is also positive but much smaller than for ODL, and it does not exhibit any specific trend as the number of iterations increases. We observed a similar behavior for the mean of the similarity values. The number of iterations in LRSDL had the same oscillating influence as in KSVD, but with larger variation. We conclude that the number of iterations does not significantly influence the metrics for these algorithms, and choose .
V-C2 Number of trained atoms
Figs. 8(a)-(c) compare all three metrics as the number of trained atoms , a parameter common to all DL methods, changes. We observe that generally decreases as increases, which indicates an improvement in the similarity between the reconstructed and the original training set. KSVD shows an anomalous pattern for lower values of but later converges to a trend identical to that of the other DL approaches. The KS distance exhibits a linear change in the distributions with respect to the reference. Since quantifies the difference between the distributions rather than stating which one is better, combining its behavior with makes it evident that an increase in leads to better distributions of similarity values. The DKW metric , calculated with the same reference, also shows a linear change, as expected. Even a slight change in leads to more negative values of , implying that the null hypothesis is rejected. This shows the significant influence of on the distributions. Interestingly, LRSDL yielded a slightly better coefficient of variation than the other strategies. However, the KS distance and DKW metric indicated that the distributions of similarity values for LRSDL were sensitive to the number of trained atoms only up to a certain value.
V-C3 DOMINODL parameters
It is difficult to evaluate the DOMINODL EPDFs by varying all four parameters together. Instead, we fix the parameter common to all algorithms, i.e., the number of trained atoms , and then determine the optimal values of , and . Figure 6 shows the coefficient of variation of the distribution of similarity values as a function of the DOMINODL parameters. The drop-off value appears to have a greater influence than the minibatch dimensions and . Our analysis of the computational times of DOMINODL showed that they are essentially independent of and but increase slightly with . This is expected because we also increased the number of steps for sparse decomposition (see Algorithm 1), which accounts for the bulk of the computations in DL algorithms [24]. Further, to ensure that the correlation and drop-off steps kick off from the very first iteration, DOMINODL should admit several new samples at each iteration, thereby increasing as well as the number of previous elements accordingly. Taking these observations into account, we choose and .
According to the results of the parametric evaluation, we choose the following combination of “optimal” parameters for testing our DL strategies: , , , , and .
VI Experimental Results and Analysis
After selecting the input parameters of the proposed DL strategies, we use the trained dictionaries for sparse decomposition of both the training and test sets. The resulting sets of sparse coefficients are the input to the SVM classifier. As mentioned in Section II-A, the threshold and the kernel function parameter of the SVM were selected through cross-validation. Our key objective is to demonstrate that online DL algorithms can improve classification performance over batch learning strategies. In particular, we analyze the performance of DOMINODL in terms of classification accuracy and learning speed. For comparison with a popular state-of-the-art classification method, we also show classification results obtained with a deep-learning approach based on a CNN. Finally, we demonstrate the classification performance when the original samples of the range profiles are randomly reduced.
VI-A Classification with Optimal Parameters
For a comprehensive analysis of the classification performance, we provide both classification maps and confusion matrices for the test set using the optimal DL input parameters that we selected following our parametric evaluation in Section V. The classification maps depict the predicted class of each range profile of the survey under test. The pixel dimension of these maps is dictated by the sampling of the GPR in X and Y directions (see Table II). We stitched together 3 of the 6 surveys from the test set where each survey had 2 buried landmines from a specific target class (PMN/PMA2, ERA and Type72).
Figure 9 shows the classification maps for the different DL methods along with the raw data at depth cm. The selected survey area covers a total of range profiles. The raw data in Fig. 9(a) show that only four of the six mines exhibit a strong reflectivity, while the echoes of the other two are so weak that they are not clearly visible. Figs. 9(b)-(d) show the results of the SR-based classification approaches using DL. All methods clearly detect and correctly classify the large PMN/PMA2 mines. The echoes of the medium-size ERA mines are detected as non-clutter, but some of their constituent pixels are incorrectly classified as another mine. Remarkably, the left ERA mine is recognized by our method even though it cannot be discerned visually in the raw data. Most of the false alarms in the map are labeled as the smallest Type72 mines. This is expected because the small size of these mines produces echoes very similar to the ground clutter. On the other hand, when T72 is the ground truth, it is correctly identified.
Using accurate ground-truth information, we defined target halos as the boundaries of the buried landmines. The size of the target halos varied with the mine size. Let the number of pixels and the number of declared mine pixels inside the target halo be and , respectively. Similarly, we denote the number of true and declared clutter pixels outside the target halo by and , respectively. Then, the probabilities of correct classification for each target class and for clutter are, respectively,
(26) 
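The probabilities in (26) are pixel-count ratios. A minimal sketch is shown below, assuming that the classification map is an integer label image (0 for clutter, 1 to 3 for the mine classes, a hypothetical encoding) and that the target halos of each class are given as a boolean mask; the function names are our own.

```python
import numpy as np

CLUTTER = 0  # hypothetical label encoding: 0 = clutter, 1..3 = mine classes

def pcc_mine_class(pred_map, class_halo_mask, true_class):
    """Pixels declared as the true class inside the class's halos / all pixels inside those halos."""
    inside = pred_map[class_halo_mask]
    return float(np.mean(inside == true_class))

def pcc_clutter(pred_map, any_halo_mask):
    """Pixels declared as clutter outside every halo / all pixels outside the halos."""
    outside = pred_map[~any_halo_mask]
    return float(np.mean(outside == CLUTTER))

# Toy 10 x 10 map: one 3 x 3 halo of class 1 with one pixel mislabeled as class 3,
# plus a single isolated false alarm in the clutter region
pred = np.zeros((10, 10), dtype=int)
halo = np.zeros((10, 10), dtype=bool)
halo[2:5, 2:5] = True
pred[2:5, 2:5] = 1
pred[2, 2] = 3
pred[8, 8] = 3

print(pcc_mine_class(pred, halo, true_class=1), pcc_clutter(pred, halo))
```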
The probability of correct classification, being the output of a classifier, should not be confused with the radar's probability of detection, which is the result of a detector. A detector declares the presence of a mine when only a few pixels inside the halo have been declared as mine; the probability of correct classification therefore provides a fairer and more accurate evaluation of the classification result. This per-pixel information can easily be used to improve the final detection result. For instance, the operator could set a threshold on the minimum number of pixels to be detected in a cluster, so that a circle centered at the cluster centroid is declared as the detected mine. However, such a circle may exclude some of the mine pixels, posing a potential danger in the field. The per-pixel classification is then employed to determine the guard area around the mine circle.
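The cluster-based detection rule described above can be sketched as follows. This is a hypothetical post-processing step rather than part of the proposed method: it groups 4-connected non-clutter pixels of the classification map, discards clusters smaller than a user-chosen `min_pixels`, and reports each remaining cluster's centroid in pixel coordinates. Converting centroids to metres requires the X and Y pixel spacing, and the guard area is not modelled; the function name and label encoding are our own.

```python
from collections import deque

import numpy as np

def mine_clusters(pred_map, min_pixels, clutter_label=0):
    """Return (centroid_row, centroid_col, size) for each 4-connected cluster of
    non-clutter pixels that contains at least min_pixels pixels."""
    is_mine = pred_map != clutter_label
    seen = np.zeros(is_mine.shape, dtype=bool)
    rows, cols = is_mine.shape
    clusters = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not is_mine[r0, c0] or seen[r0, c0]:
                continue
            seen[r0, c0] = True
            queue, members = deque([(r0, c0)]), []
            while queue:                                  # breadth-first flood fill
                r, c = queue.popleft()
                members.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and is_mine[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            if len(members) >= min_pixels:
                pts = np.array(members, dtype=float)
                clusters.append((float(pts[:, 0].mean()), float(pts[:, 1].mean()), len(members)))
    return clusters

pred = np.zeros((10, 10), dtype=int)
pred[2:5, 2:5] = 1      # a 3 x 3 block of mine pixels
pred[8, 8] = 3          # an isolated false alarm
print(mine_clusters(pred, min_pixels=4))
```

Because any non-clutter label joins a cluster, pixels declared as "some type of mine" still contribute to the detection, which matches the safety-oriented reading of the maps discussed below.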
A confusion matrix is a quantitative representation of the classifier performance. Each entry is the probability of classifying a given ground truth as a particular class. The columns of the confusion matrix correspond to the ground truths and the rows to the predicted labels. Therefore, the diagonal elements of the matrix are the probabilities of correct classification, while the off-diagonal elements are the probabilities of misclassification.
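Following this convention (columns are ground truths, rows are predictions), a column-normalized confusion matrix can be computed from per-pixel labels as sketched below; the function name and the integer class encoding are our own assumptions. Each non-empty column sums to one, and the diagonal holds the per-class probabilities of correct classification.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    """Entry [p, t] = fraction of samples with ground truth t that were predicted as p."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (np.asarray(pred_labels), np.asarray(true_labels)), 1)  # rows: predicted, cols: truth
    col_sums = counts.sum(axis=0, keepdims=True)
    return np.divide(counts, col_sums, out=np.zeros_like(counts), where=col_sums > 0)

# Toy labels: 0 = clutter, 1..3 = mine classes (hypothetical encoding)
truth = [0, 0, 1, 1, 1, 2]
pred = [0, 1, 1, 1, 0, 2]
cm = confusion_matrix(truth, pred, n_classes=4)
print(np.round(cm, 3))
print("P_cc per class:", np.round(np.diag(cm), 3))
```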
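Table IV: Confusion matrices for the classification maps of Fig. 9 with optimal DL input parameters. Columns are ground-truth classes; rows are predicted labels.

| DL algorithm | Predicted | Clutter | PMN/PMA2 | ERA | Type72 |
|---|---|---|---|---|---|
| KSVD | Clutter | 0.892 | 0.044 | 0.25 | 0.37 |
| | PMN/PMA2 | 0.022 | 0.938 | 0.166 | 0.074 |
| | ERA | 0.021 | 0.017 | 0.472 | 0.018 |
| | Type72 | 0.064 | 0 | 0.111 | 0.537 |
| LRSDL (SRC) | Clutter | 0.435 | 0.061 | 0.111 | 0.351 |
| | PMN/PMA2 | 0.155 | 0.289 | 0.319 | 0.259 |
| | ERA | 0.172 | 0.372 | 0.361 | 0.278 |
| | Type72 | 0.237 | 0.272 | 0.208 | 0.111 |
| LRSDL (SVM) | Clutter | 0.889 | 0.114 | 0.333 | 0.463 |
| | PMN/PMA2 | 0.026 | 0.877 | 0.186 | 0.185 |
| | ERA | 0.027 | 0 | 0.444 | 0.926 |
| | Type72 | 0.058 | 0.008 | 0.041 | 0.426 |
| ODL | Clutter | 0.871 | 0 | 0.194 | 0.333 |
| | PMN/PMA2 | 0.022 | 0.973 | 0.139 | 0 |
| | ERA | 0.018 | 0.026 | 0.583 | 0.018 |
| | Type72 | 0.088 | 0 | 0.083 | 0.648 |
| CBWLSU | Clutter | 0.872 | 0.017 | 0.181 | 0.314 |
| | PMN/PMA2 | 0.023 | 0.973 | 0.153 | 0 |
| | ERA | 0.025 | 0.008 | 0.528 | 0 |
| | Type72 | 0.08 | 0 | 0.138 | 0.685 |
| DOMINODL | Clutter | 0.876 | 0.017 | 0.167 | 0.315 |
| | PMN/PMA2 | 0.023 | 0.974 | 0.138 | 0 |
| | ERA | 0.027 | 0.008 | 0.58 | 0 |
| | Type72 | 0.077 | 0 | 0.11 | 0.685 |

Note: diagonal entries give the probability of correct classification for each class and DL algorithm.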
For the classification maps of Fig. 9, Table IV shows the corresponding confusion matrices for each DL-based classification approach. In general, we observe an excellent classification of PMN/PMA2 landmines (~%), implying that almost every range profile in the test set that belongs to this class is correctly labeled. The probability of correct classification for clutter is also quite high (~%). This can also be seen in the classification maps, where the false alarms within the actual clutter regions are very sparse (i.e., they do not form clusters) and therefore unlikely to be interpreted as an extended target. As noted previously, most of the clutter misclassification is associated with the Type72 class. The ERA test targets are more difficult to classify correctly. However, most of the pixels within their target halos are declared as at least some type of mine, which is useful for issuing safety warnings in the specific field area. This result can be explained by the fact that ERA test targets do not represent a specific mine but have general characteristics common to most landmines. The Type72 mines exhibit a slightly higher probability of correct classification than the ERA targets. This is a remarkable result because Type72 targets were expected to be the most challenging to classify owing to their small size.
Conventionally, as mentioned in [47], LRSDL is used with sparse-representation-based classification (SRC). However, applying this approach to our problem resulted in very low accuracy (an average of ~30% across all classes, as evident from Table IV) and semi-random classification maps (Fig. 9). This can be explained by the extreme similarity between the training set examples of different classes: mine and clutter responses are only slightly dissimilar, and mine responses are generally hidden in the ground reflections. Each learned “block” differed only slightly from the others and, therefore, poor classification results are achieved with this dataset. On the other hand, when we used the dictionary learned with LRSDL with our SVM-based technique, we obtained better classification accuracy (see Table IV and Fig. 9). However, this performance is still inferior to that of KSVD and, therefore, also inferior to that of the online DL approaches.
All DL algorithms used in our sparse classification approach show very similar results for the clutter and PMN/PMA2 classes. However, the online DL methods achieve a higher probability of correct classification than KSVD for the ERA and Type72 targets. From Table IV, the relative detection enhancement of the best online DL algorithm over KSVD is 3.8% for PMN/PMA2. The improvements for ERA and T72, computed similarly, are 23.5% and 27.6%, respectively.
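These enhancement figures are relative changes of the diagonal entries of Table IV. The short computation below reproduces them under the assumption that "enhancement" means (best online value - KSVD value) / KSVD value, a reading consistent with the 4-28% range quoted in the abstract.

```python
# Diagonal entries (probability of correct classification) from Table IV
ksvd = {"PMN/PMA2": 0.938, "ERA": 0.472, "Type72": 0.537}
online = {
    "ODL":      {"PMN/PMA2": 0.973, "ERA": 0.583, "Type72": 0.648},
    "CBWLSU":   {"PMN/PMA2": 0.973, "ERA": 0.528, "Type72": 0.685},
    "DOMINODL": {"PMN/PMA2": 0.974, "ERA": 0.580, "Type72": 0.685},
}

gains = {}
for cls, base in ksvd.items():
    best = max(scores[cls] for scores in online.values())   # best online DL method for this class
    gains[cls] = 100.0 * (best - base) / base                # relative improvement over KSVD (%)
    print(f"{cls}: {best:.3f} vs {base:.3f} -> +{gains[cls]:.1f}%")
```

This yields about 3.8% (DOMINODL), 23.5% (ODL) and 27.6% (CBWLSU and DOMINODL) for PMN/PMA2, ERA and Type72, respectively.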
VI-B Classification with Non-Optimal Parameters
To demonstrate how the quality of the learned dictionary affects the final classification, we now show the confusion matrices for a non-optimal selection of input parameters in the different DL algorithms. Our goal is to emphasize the importance of learning a good dictionary by selecting optimal parameters, rather than to specify how each parameter affects the final classification result. We arbitrarily set the number of trained atoms to only for all DL approaches, reduced the number of iterations to for ODL and KSVD and, for DOMINODL, used =30, =5 and =2. Table V shows the resulting confusion matrices. While the clutter classification accuracy is almost the same as in Table IV, the probability of correct classification for PMN/PMA2 landmines decreased by ~10% for most of the algorithms, except ODL, for which it remains unchanged. The classification accuracy for ERA and Type72 mines changes only slightly for the online DL approaches. However, in the case of KSVD, the probability of correct classification drops by ~% and ~% for ERA and Type72, respectively. Clearly, the reconstruction and correct classification of range profiles using batch algorithms such as KSVD are strongly affected by a non-optimal choice of DL input parameters. As discussed earlier in Section V-C, this degradation is likely due to the influence of rather than .
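Table V: Confusion matrices for a non-optimal selection of DL input parameters. Columns are ground-truth classes; rows are predicted labels.

| DL algorithm | Predicted | Clutter | PMN/PMA2 | ERA | Type72 |
|---|---|---|---|---|---|
| KSVD | Clutter | 0.853 | 0.07 | 0.305 | 0.222 |
| | PMN/PMA2 | 0.037 | 0.851 | 0.222 | 0.111 |
| | ERA | 0.032 | 0 | 0.194 | 0.241 |
| | Type72 | 0.077 | 0.078 | 0.277 | 0.426 |
| ODL | Clutter | 0.86 | 0.017 | 0.181 | 0.444 |
| | PMN/PMA2 | 0.016 | 0.973 | 0.097 | 0 |
| | ERA | 0.022 | 0.008 | 0.638 | 0 |
| | Type72 | 0.1 | 0 | 0.083 | 0.555 |
| CBWLSU | Clutter | 0.887 | 0.078 | 0.319 | 0.352 |
| | PMN/PMA2 | 0.019 | 0.877 | 0.097 | 0 |
| | ERA | 0.018 | 0.043 | 0.541 | 0 |
| | Type72 | 0.074 | 0 | 0.042 | 0.648 |
| DOMINODL | Clutter | 0.888 | 0.078 | 0.319 | 0.352 |
| | PMN/PMA2 | 0.019 | 0.877 | 0.097 | 0 |
| | ERA | 0.018 | 0.043 | 0.54 | 0 |
| | Type72 | 0.074 | 0 | 0.042 | 0.648 |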
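Table VI: Execution times of the DL algorithms with optimal parameters.

| | DOMINODL | CBWLSU | ODL | KSVD | LRSDL |
|---|---|---|---|---|---|
| Time (seconds) | 1.75 | 16.49 | 5.75 | 25.8 | 1057 |

Note: DOMINODL achieves the best (shortest) time among all DL algorithms.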
VI-C Computational Efficiency
We used MATLAB 2016a on an 8-core Windows 7 desktop PC to clock the execution times of the DL algorithms. The ODL algorithm from [19] is implemented as a MEX executable and is therefore already fine-tuned for speed. For KSVD, we employed the efficient implementation from [52] to improve computational speed. Table VI lists the execution times of the five DL approaches, with optimally selected parameters for all algorithms. LRSDL is the slowest of all, while ODL is more than 4 times faster than KSVD. CBWLSU provided better classification results but is nearly three times slower than ODL. This could be because its dictionary update step always considers all previous training set elements that correlate with only one new element (i.e., there is no minibatch strategy), which makes convergence in CBWLSU more challenging.
DOMINODL is the fastest DL method, about 3 times faster than ODL and about 15 times faster than KSVD. This is because DOMINODL updates the dictionary by evaluating only a minibatch of previous elements (instead of all of them, as in CBWLSU) that correlate with a minibatch of several new elements (CBWLSU uses just one new element). Further, DOMINODL drops off the unused elements, leading to faster convergence. We note that, unlike the ODL and KSVD implementations, our DOMINODL implementation is not a MEX executable; using one could further shorten the current execution times. From Table VI, the reduction in DOMINODL computational time over KSVD is 93.2%. The reductions for ODL and CBWLSU, computed similarly, are 77.7% and 36.1%, respectively.
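The speed-ups and time reductions quoted above follow directly from Table VI, taking the percentage reduction as (t_KSVD - t) / t_KSVD. The arithmetic is reproduced below.

```python
# Learning times in seconds (Table VI)
times_s = {"DOMINODL": 1.75, "CBWLSU": 16.49, "ODL": 5.75, "KSVD": 25.8, "LRSDL": 1057.0}

ksvd_time = times_s["KSVD"]
reduction = {m: 100.0 * (ksvd_time - times_s[m]) / ksvd_time    # % time saved vs. KSVD
             for m in ("DOMINODL", "ODL", "CBWLSU")}
speedup_vs_odl = times_s["ODL"] / times_s["DOMINODL"]
speedup_vs_ksvd = ksvd_time / times_s["DOMINODL"]

for m, r in reduction.items():
    print(f"{m}: {r:.1f}% less time than KSVD")
print(f"DOMINODL speed-up: {speedup_vs_odl:.1f}x vs ODL, {speedup_vs_ksvd:.1f}x vs KSVD")
```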
The computational bottleneck of mine classification lies in the training times. In comparison, the common steps of sparse decomposition and SVM-based classification during testing take just 0.4 s and 1 s, respectively, for an entire survey (a 1 m × 1 m area with 2500 range profiles). Thus, the time taken per range profile is ~0.59 ms. The average scan rate of our GPR system is 0.19 m/s (or 1 cm per 52.1 ms). This can go as high as 2.7 m/s (or 1 cm per 3.61 ms) in other GPRs used for landmine applications. Therefore, the test times do not impose a significant computational cost, even at the higher scan rate.
VI-D Comparison with Sparse-Representation-Based Classification
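Table VII: Confusion matrices of SRC [11] and of the proposed DL-based approaches with the coarser training set. Columns are ground-truth classes; rows are predicted labels.

| Method | Predicted | Clutter | PMN/PMA2 | ERA | Type72 |
|---|---|---|---|---|---|
| SRC | Clutter | 0.912 | 0.017 | 0.25 | 0.925 |
| | PMN/PMA2 | 0.037 | 0.456 | 0.042 | 0 |
| | ERA | 0.025 | 0.526 | 0.278 | 0.074 |
| | Type72 | 0.025 | 0 | 0.43 | 0 |
| DOMINODL | Clutter | 0.729 | 0.017 | 0.083 | 0.241 |
| | PMN/PMA2 | 0.041 | 0.982 | 0.139 | 0 |
| | ERA | 0.054 | 0 | 0.667 | 0.074 |
| | Type72 | 0.176 | 0 | 0.111 | 0.685 |
| CBWLSU | Clutter | 0.584 | 0 | 0.125 | 0.185 |
| | PMN/PMA2 | 0.063 | 0.982 | 0.153 | 0.11 |
| | ERA | 0.106 | 0 | 0.625 | 0.185 |
| | Type72 | 0.247 | 0.017 | 0.097 | 0.518 |
| ODL | Clutter | 0.71 | 0.035 | 0.153 | 0.259 |
| | PMN/PMA2 | 0.036 | 0.912 | 0.069 | 0.074 |
| | ERA | 0.088 | 0.008 | 0.667 | 0.074 |
| | Type72 | 0.165 | 0.044 | 0.111 | 0.593 |
| KSVD | Clutter | 0.617 | 0 | 0.111 | 0.148 |
| | PMN/PMA2 | 0.044 | 0.982 | 0.194 | 0.056 |
| | ERA | 0.113 | 0 | 0.667 | 0.241 |
| | Type72 | 0.226 | 0.017 | 0.027 | 0.444 |

Note: diagonal entries give the probability of correct classification for each class and method.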
We compared our proposed DL-based approach with the sparse-representation-based classification (SRC) method proposed in [11]. SRC needs a labeled dictionary, but the dictionary that we learn from no longer carries label information, which makes SRC infeasible here. Therefore, we adopt the following steps for a reasonable comparison of the two methods. We feed SRC with as the dictionary . As indicated in Section IV-C, a meticulously selected collection of mine/clutter responses as is meaningful for comparing different DL approaches, but it does not highlight the benefits of employing DL per se. Therefore, we generate a coarser selection, i.e., with more profiles than the hand-picked case, as for both approaches. Table VII shows the confusion matrices for the residual-based classification along with the proposed DL-based approaches. The DL-based mine classification is consistent with previous results, and even better with DOMINODL, but at the cost of a lower clutter accuracy. The accuracy of the residual-based classifier is severely degraded for all mine classes, dropping by at least 45%, 39% and 55% for PMN/PMA2, ERA and T72, respectively. This makes the higher clutter classification accuracy of this method of little practical use.
VI-E Deep-Learning-Based Classification
Clutter  PMN/PMA2  ERA  Type72  

Clutter  0.909  0.14  0.016  0.574 
PMN/PMA2  0.032  0.807  0.181  0 
ERA  0  0.053  0.319  0.315 
Type72  0.033  0  0.111  0.370 
SR-based classification rests on the assumption that signals are linear combinations of a few dictionary atoms. In practice, this is often not the case, which has led a few recent works [65] to employ deep learning for radar target classification. However, these techniques require very large training datasets.
We compared the classification results of our methods with a deep learning approach. In particular, we constructed a CNN because such networks efficiently exploit structural or locational information in the data and offer comparable learning capacity with far fewer parameters [66]. We cast the CNN as a classification problem in which each class denotes a type of mine or clutter. The CNN is trained on the same training matrix defined in Section IV. Building a synthetic database is a common way of creating (or extending) a training set for deep learning applications. However, accurately modeling a GPR scenario is still an open challenge in the GPR community because of the difficulty of reproducing soil inhomogeneities (and variabilities), surface and underground clutter, antenna coupling and ringing effects, etc. Even though some such efforts have been promising [67], this remains a cumbersome task.
The input layer of our CNN took a one-dimensional sample set of size . It was followed by two convolutional layers with and filters of size and , respectively. The output layer consisted of four units, so that the network classified the given input data as clutter or one of the three mines. A rectified linear unit (ReLU) followed each convolutional layer; the ReLU function is given by f(x) = max(0, x) [68]. The architecture of the CNN was selected through an arduous process of testing many combinations of layers/filters and hyperparameters that led to better accuracy during training. A deeper network slightly increased the accuracy in the training phase but performed worse when classifying new data (i.e. the test set ). Since our data are limited, adding more layers (i.e. more weights) only led to overfitting and made the network unable to generalize to new datasets. A multidimensional CNN formed by clustering 2D and 3D data would have further reduced the training set. We also considered data augmentation, but commonly used transformations such as scaling/rotation are not useful in our case because the mines were always buried at the same inclination and their dimensions define the class itself. Adding different levels of noise did not improve the results either, because the available data are already very noisy.
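The layer ordering described above can be made concrete with the following NumPy sketch. It runs a forward pass through a network with the same structure: two 1D convolutional layers, each followed by a ReLU, and a four-unit output layer. It is not our TensorFlow model. The filter counts and sizes, the profile length of 208 samples, the fully connected output layer and the softmax are hypothetical placeholders, and the weights are random and untrained.

```python
import numpy as np

CLASSES = ("Clutter", "PMN/PMA2", "ERA", "Type-72")

def conv1d_relu(x, w, b):
    """'Valid' 1D convolution (cross-correlation, as in most DL frameworks) + ReLU.
    x: (C_in, T), w: (C_out, C_in, K), b: (C_out,) -> (C_out, T - K + 1)"""
    k = w.shape[2]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)  # (C_in, T-K+1, K)
    out = np.einsum("ctk,fck->ft", windows, w) + b[:, None]
    return np.maximum(out, 0.0)                     # ReLU: f(x) = max(0, x)

def init_params(n_samples, n1=8, k1=7, n2=16, k2=5, seed=0):
    """Random weights; filter counts n1, n2 and sizes k1, k2 are hypothetical."""
    rng = np.random.default_rng(seed)
    t_out = n_samples - k1 + 1 - k2 + 1             # length after two 'valid' convolutions
    return {
        "w1": 0.1 * rng.standard_normal((n1, 1, k1)), "b1": np.zeros(n1),
        "w2": 0.1 * rng.standard_normal((n2, n1, k2)), "b2": np.zeros(n2),
        "w_out": 0.1 * rng.standard_normal((len(CLASSES), n2 * t_out)),
        "b_out": np.zeros(len(CLASSES)),
    }

def forward(profile, params):
    """Class probabilities for one range profile of shape (T,)."""
    h = conv1d_relu(profile[None, :], params["w1"], params["b1"])
    h = conv1d_relu(h, params["w2"], params["b2"])
    logits = params["w_out"] @ h.ravel() + params["b_out"]  # four output units
    p = np.exp(logits - logits.max())               # numerically stable softmax
    return p / p.sum()

params = init_params(n_samples=208)
probs = forward(np.random.default_rng(1).standard_normal(208), params)
label = CLASSES[int(np.argmax(probs))]
print(label, np.round(probs, 3))
```

Each extra convolutional layer adds another weight tensor like `w2`. This illustrates why deeper variants quickly outgrow a small training set.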
25% Reduction  50% Reduction  75% Reduction  

Clutter  PMN/PMA2  ERA  Type72  Clutter  PMN/PMA2  ERA  Type72  Clutter  PMN/PMA2  ERA  Type72  
Clutter  0.892  0.078  0.319  0.389  0.882  0.026  0.291  0.37  0.877  0.061  0.402  0.426  
KSVD  PMN/PMA2  0.021  0.921  0.153  0.055  0.018  0.947  0.153  0.037  0.02  0.912  0.125  0.074 
ERA  0.021  0  0.486  0.018  0.021  0.026  0.5  0  0.021  0.026  0.333  0.018  
Type72  0.065  0  0.041  0.537  0.078  0  0.055  0.592  0.08  0  0.138  0.481  
Clutter  0.872  0.088  0.208  0.315  0.868  0  0.208  0.333  0.862  0.02  0.319  0.296  
ODL  PMN/PMA2  0.021  0.973  0.152  0  0.021  0.965  0.18  0.018  0.023  0.964  0.138  0.018 
ERA  0.018  0.017  0.527  0.018  0.018  0.035  0.5  0  0.021  0.008  0.416  0.074  
Type72  0.087  0  0.111  0.666  0.09  0  0.111  0.648  0.091  0  0.125  0.611  
Clutter  0.871  0.026  0.194  0.351  0.872  0.017  0.25  0.40  0.855  0.088  0.388  0.370  
CBWLSU  PMN/PMA2  0.024  0.956  0.139  0  0.023  0.973  0.111  0  0.024  0.974  0.111  0 
ERA  0.025  0.017  0.541  0  0.02  0.008  0.541  0  0.027  0.017  0.333  0.018  
Type72  0.079  0  0.125  0.648  0.083  0  0.097  0.592  0.091  0  0.125  0.611  
Clutter  0.88  0.017  0.236  0.277  0.868  0.035  0.194  0.296  0.864  0.035  0.278  0.444  
DOMINODL  PMN/PMA2  0.022  0.964  0.138  0  0.023  0.929  0.138  0  0.027  0.938  0.152  0 
ERA  0.018  0.017  0.527  0  0.024  0.035  0.527  0.018  0.026  0.026  0.5  0  
Type72  0.078  0  0.097  0.722  0.083  0  0.138  0.685  0.082  0  0.069  0.556  
Clutter  0.708  0.359  0.236  0.407  0.265  0.166  0.645  0.148  0.162  0.105  0.647  0.129  
CNN  PMN/PMA2  0.026  0.41  0.097  0.018  0.062  0.096  0.069  0.018  0.015  0.061  0.013  0 
ERA  0  0.21  0.5  0.426  0  0.72  0.73  0.75  0  0.71  0.708  0.759  
Type72  0.029  0.017  0.069  0.148  0.027  0.088  0.014  0.074  0.17  0.12  0.12  0.11 
25% Reduction  50% Reduction  75% Reduction  

Clutter  PMN/PMA2  ERA  Type72  Clutter  PMN/PMA2  ERA  Type72  Clutter  PMN/PMA2  ERA  Type72  
Clutter  0.887  0.044  0.236  0.370  0.868  0  0.417  0.556  0.885  0.097  0.444  0.426  
KSVD  PMN/PMA2  0.019  0.912  0.139  0  0.023  0.912  0.153  0.037  0.019  0.746  0.014  0.130 
ERA  0.023  0.044  0.528  0.019  0.010  0.070  0.306  0.019  0.012  0.088  0.319  0  
Type72  0.071  0  0.097  0.611  0.099  0.018  0.125  0.389  0.084  0.070  0.222  0.444  
Clutter  0.865  0.009  0.167  0.315  0.865  0.009  0.375  0.611  0.857  0.070  0.375  0.352  
ODL  PMN/PMA2  0.016  0.939  0.181  0  0.023  0.991  0.111  0  0.027  0.851  0.056  0.093 
ERA  0.020  0.053  0.583  0.019  0.011  0  0.431  0  0.016  0.079  0.444  0  
Type72  0.096  0  0.069  0.667  0.101  0  0.083  0.389  0.101  0  0.125  0.556  
Clutter  0.865  0.035  0.250  0.426  0.874  0.009  0.347  0.407  0.834  0.035  0.500  0.685  
CBWLSU  PMN/PMA2  0.021  0.912  0.167  0  0.033  0.904  0.069  0.241  0.020  0.851  0.056  0 
ERA  0.026  0.053  0.542  0.019  0.014  0.088  0.500  0  0.015  0.105  0.403  0  
Type72  0.088  0  0.042  0.556  0.080  0  0.083  0.352  0.131  0.089  0.042  0.315  
Clutter  0.841  0.026  0.222  0.296  0.864  0.053  0.333  0.574  0.815  0.175  0.431  0.519  
DOMINODL  PMN/PMA2  0.021  0.921  0.167  0  0.023  0.868  0.069  0.074  0.027  0.746  0.069  0 
ERA  0.026  0.053  0.542  0.019  0.014  0.079  0.486  0  0.017  0.079  0.486  0  
Type72  0.112  0  0.070  0.685  0.099  0  0.111  0.352  0.142  0  0.014  0.482  
Clutter  0.838  0.026  0.040  0.333  0.819  0.088  0.047  0.315  0.644  0.035  0.109  0.463  
CNN  PMN/PMA2  0.062  0.868  0.208  0.537  0.059  0.851  0.319  0.463  0.080  0.693  0.181  0.019 
ERA  0  0.053  0.250  0.074  0  0  0.153  0.130  0  0.193  0.306  0.019  
Type72  0.106  0.053  0.208  0.556  0.120  0.061  0.250  0.500  0.105  0.079  0.097  0 
We trained the network with the labeled training set , holding out ~% of the training data for validation. Specifically, the validation set comprised , , , and range profiles for clutter, PMN/PMA2, ERA and Type-72, respectively. We used stochastic gradient descent to update the network parameters, with a learning rate of and a mini-batch size of samples, for epochs.
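The training protocol above (a per-class validation hold-out followed by mini-batch stochastic gradient descent over several epochs) can be sketched as below. To keep it self-contained, the sketch trains a softmax-regression classifier on synthetic, well-separated features instead of the CNN on GPR profiles. The per-class hold-out sizes, learning rate, mini-batch size and epoch count are hypothetical values, not the ones used in our experiments.

```python
import numpy as np

def stratified_split(labels, n_val_per_class, rng):
    """Hold out a fixed number of examples of each class for validation."""
    val_idx = []
    for cls, n_val in n_val_per_class.items():
        idx = np.flatnonzero(labels == cls)
        val_idx.extend(rng.choice(idx, size=n_val, replace=False))
    val_mask = np.zeros(labels.shape[0], dtype=bool)
    val_mask[val_idx] = True
    return np.flatnonzero(~val_mask), np.flatnonzero(val_mask)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)            # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sgd_train(X, y, n_classes, lr, batch_size, epochs, rng):
    """Mini-batch SGD on the cross-entropy loss of a linear softmax model."""
    W, b = np.zeros((X.shape[1], n_classes)), np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        order = rng.permutation(X.shape[0])         # reshuffle every epoch
        for start in range(0, X.shape[0], batch_size):
            batch = order[start:start + batch_size]
            grad = (softmax(X[batch] @ W + b) - onehot[batch]) / batch.size
            W -= lr * X[batch].T @ grad             # gradient w.r.t. weights
            b -= lr * grad.sum(axis=0)              # gradient w.r.t. biases
    return W, b

rng = np.random.default_rng(1)
n_per_class, dim = 100, 20
means = 4.0 * np.eye(4, dim)                        # synthetic, well-separated class centres
X = np.vstack([rng.standard_normal((n_per_class, dim)) + m for m in means])
labels = np.repeat(np.arange(4), n_per_class)
train_idx, val_idx = stratified_split(labels, {0: 20, 1: 10, 2: 10, 3: 10}, rng)
W, b = sgd_train(X[train_idx], labels[train_idx], 4, lr=0.1, batch_size=16, epochs=20, rng=rng)
val_acc = float(np.mean(np.argmax(X[val_idx] @ W + b, axis=1) == labels[val_idx]))
print(f"validation accuracy: {val_acc:.3f}")
```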
We implemented the proposed network in TensorFlow on a Windows 7 PC with an 8-core CPU. The network training took minutes. Figure 10 shows the classification map obtained using the CNN, and the corresponding confusion matrix is listed in Table VIII. The CNN classifier performs worse than our SR-based techniques, particularly for the ERA and Type-72 target classes.
VI-F Classification with Reduced Range Samples
We now analyze the robustness of our DL-based adaptive classification method to a reduction in the number of samples in the raw data. Assuming the collected data are sparse in the learned dictionary, we undersampled the original raw data in range by randomly discarding samples, obtaining a row-undersampled version of the data. We then applied the same random sampling pattern to the rows of the dictionary to obtain the sparse coefficients. We also analyzed the CNN classifier when the signals are randomly reduced in the same way. Figure 11 illustrates the classification map for all DL approaches when the sampling is reduced by %. Table IX collects the confusion matrices for 25%, 50%, and 75% undersampling.
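The reduced-sample coding step can be illustrated with the sketch below. It assumes orthogonal matching pursuit (OMP) as the sparse coder and a random Gaussian dictionary of hypothetical size in place of a learned GPR dictionary. Both are stand-ins, and a learned dictionary is typically more coherent, which makes recovery harder than in this toy case. The essential points are that the same random row mask is applied to the profile and to the dictionary, and that the subsampled atoms are renormalized before coding.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedy k-sparse coding over unit-norm columns of D."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))      # atom most correlated with residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef             # re-fit on the whole support
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def code_from_reduced_samples(D, y, keep, k):
    """Sparse-code y using only the range samples whose indices are in `keep`."""
    D_red, y_red = D[keep], y[keep]                     # same row mask on data and dictionary
    norms = np.linalg.norm(D_red, axis=0)               # subsampled atoms are no longer unit-norm
    return omp(D_red / norms, y_red, k) / norms         # undo the column scaling

rng = np.random.default_rng(0)
n_range, n_atoms = 400, 60                              # hypothetical sizes
D = rng.standard_normal((n_range, n_atoms))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(n_atoms)
x_true[[5, 17, 42]] = [1.5, -2.0, 1.0]
y = D @ x_true                                          # noiseless synthetic range profile

keep = np.sort(rng.choice(n_range, size=n_range // 4, replace=False))  # 75% reduction
x_hat = code_from_reduced_samples(D, y, keep, k=3)
print(np.flatnonzero(x_hat), np.round(x_hat[np.flatnonzero(x_hat)], 3))
```

In this noiseless toy setting OMP typically recovers the correct support from the retained quarter of the samples. When it does, `D @ x_hat` reproduces the full-length profile, so features built from the coefficients are unaffected by the undersampling.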
Compared with the results in Table IV, which used all samples of the raw data, the DL approaches maintain similar classifier performance even when we reduce the samples by 75% (i.e. just 52 samples in total). In contrast, the CNN classifier, whose results are already heavily compromised with a 25% reduction, fails completely at 50% and 75% reduction. When the dictionary yields representations with few nonzero entries, reducing the number of signal samples still permits an accurate reconstruction of the signal (exact in the noiseless case, provided enough samples remain) and, consequently, its correct classification. The features used to classify the traces are therefore robust to the reduction of the original samples. Deep learning strategies, by contrast, use the signal samples directly as classification features and require an enormous amount of training data, so the degradation in their performance is expected. The confusion matrix in Table IX indicates that the CNN has the highest accuracy for ERA. This is misleading, because the network classified almost every pixel as ERA. Overall, DOMINODL and CBWLSU provide excellent results for small mines. However, as seen earlier, CBWLSU is not well suited for real-time operation because of its longer execution times.
We also assessed the performance of the different methods when, instead of keeping fewer samples per range profile, we retain all samples in every range profile but randomly reduce the number of training set elements from 926 to 694, 464 and 232 range profiles (corresponding to 25%, 50% and 75% reduction, respectively). The corresponding confusion matrices, listed in Table X, show that the CNN-based classification results improve with respect to Table IX. However, the classification accuracy of the CNN is still poorer than that of DL-based classification. In general, all methods degrade as the number of training set elements is reduced. Among the DL methods, the online ODL is more robust to the reduction of range profiles than the batch K-SVD.
VII Summary
In this paper, we proposed effective online DL strategies for the sparse decomposition of GPR traces of buried landmines. The online methods outperform K-SVD, which makes them good candidates for SR-based classification. Our algorithm DOMINODL is consistently the fastest, providing near real-time performance and high clutter rejection while maintaining classification performance comparable to that of the other online DL algorithms. DOMINODL and CBWLSU generally classify smaller mines better than ODL and K-SVD. Unlike previous works that rely on RMSE, we used metrics based on statistical inference to tune the DL parameters for enhanced operation.
Fast ODL computations pave the way towards cognition [69, 70, 71] in GPR operation, wherein the system uses previous measurements to optimize its processing performance and is capable of sequential sampling adaptation [72] based on the learned dictionary. For example, in a realistic landmine clearance campaign, an operator could gather the training measurements over a safe area next to the contaminated site, possibly burying some landmine simulants there to obtain a faithful representation of the soil/target interaction beneath the surface. In other words, our work allows the operator to calibrate the acquisition by providing a good training set from which to learn the dictionary.
Acknowledgements
The authors acknowledge valuable assistance from David Mateos-Núñez with Section VI-E.
References
 [1] H. M. Jol, Ed., Ground penetrating radar theory and applications. Elsevier Science, 2009.
 [2] D. J. Daniels, Ground penetrating radar. IET, 2004.
 [3] Landmine Monitor 2017, Geneva, Switzerland, 2017, International Campaign to Ban Landmines-Cluster Munition Coalition.
 [4] F. Giovanneschi, M. A. González-Huici, and U. Uschkerat, “A parametric analysis of time and frequency domain GPR scattering signatures from buried landmine-like targets,” in SPIE Defense, Security, and Sensing, 2013, p. 870914.
 [5] M. A. González-Huici, I. Catapano, and F. Soldovieri, “A comparative study of GPR reconstruction approaches for landmine detection,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 12, pp. 4869–4878, 2014.
 [6] M. A. González-Huici and F. Giovanneschi, “A combined strategy for landmine detection and identification using synthetic GPR responses,” Journal of Applied Geophysics, vol. 99, pp. 154–165, 2013.
 [7] P. A. Torrione, K. D. Morton, R. Sakaguchi, and L. M. Collins, “Histograms of oriented gradients for landmine detection in ground-penetrating radar data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 3, pp. 1539–1550, 2014.
 [8] I. Giannakis, A. Giannopoulos, and A. Yarovoy, “Model-based evaluation of signal-to-clutter ratio for landmine detection using ground-penetrating radar,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3564–3573, 2016.
 [9] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
 [10] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
 [11] F. Giovanneschi and M. A. González-Huici, “A preliminary analysis of a sparse reconstruction based classification method applied to GPR data,” in International Workshop on Advanced Ground Penetrating Radar, 2015, pp. 1–4.
 [12] F. Giovanneschi, K. V. Mishra, M. A. Gonzalez-Huici, Y. C. Eldar, and J. H. G. Ender, “Online dictionary learning aided target recognition in cognitive GPR,” in IEEE International Geoscience and Remote Sensing Symposium, 2017, pp. 4813–4816.
 [13] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
 [14] Y. C. Eldar, Sampling Theory: Beyond Bandlimited Systems. Cambridge University Press, 2015.
 [15] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, 2010.
 [16] S. Arora, R. Ge, and A. Moitra, “New algorithms for learning incoherent and overcomplete dictionaries,” in Conference on Learning Theory, 2014, pp. 779–806.
 [17] Y. C. Eldar and G. Kutyniok, Compressed sensing: Theory and applications. Cambridge University Press, 2012.
 [18] K. Engan, S. O. Aase, and J. H. Husoy, “Method of optimal directions for frame design,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, 1999, pp. 2443–2446.
 [19] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 689–696.
 [20] W. Shao, A. Bouzerdoum, and S. L. Phung, “Sparse representation of GPR traces with application to signal classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 7, pp. 3922–3930, 2013.
 [21] I. M. Chakravarti, R. G. Laha, and J. Roy, Handbook of methods of applied statistics: Volume I. John Wiley and Sons, 2004.
 [22] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator,” The Annals of Mathematical Statistics, pp. 642–669, 1956.
 [23] P. Massart et al., “The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality,” The Annals of Probability, vol. 18, no. 3, pp. 1269–1283, 1990.
 [24] Y. Naderahmadian, S. Beheshti, and M. A. Tinati, “Correlation based online dictionary learning algorithm,” IEEE Transactions on Signal Processing, vol. 64, no. 3, pp. 592–602, 2016.
 [25] J. N. Wilson, P. Gader, W.-H. Lee, H. Frigui, and K. Ho, “A large-scale systematic evaluation of algorithms using ground-penetrating radar for landmine detection and discrimination,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 8, pp. 2560–2572, 2007.
 [26] L. Robledo, M. Carrasco, and D. Mery, “A survey of land mine detection technology,” International Journal of Remote Sensing, vol. 30, no. 9, pp. 2399–2410, 2009.
 [27] S. Lameri, F. Lombardi, P. Bestagini, M. Lualdi, and S. Tubaro, “Landmine detection from GPR data using convolutional neural networks,” in European Signal Processing Conference, 2017, pp. 508–512.
 [28] L. E. Besaw and P. J. Stimac, “Deep convolutional neural networks for classifying GPR B-scans,” Proceedings of SPIE, vol. 9454, p. 945413, 2015.
 [29] E. D. Sontag, “VC dimension of neural networks,” NATO ASI Series F Computer and Systems Sciences, vol. 168, pp. 69–96, 1998.
 [30] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, p. 27, 2011.
 [31] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
 [32] M. S. Lewicki and T. J. Sejnowski, “Learning overcomplete representations,” Neural Computation, vol. 12, no. 2, pp. 337–365, 2000.
 [33] D. A. Spielman, H. Wang, and J. Wright, “Exact recovery of sparsely-used dictionaries,” in Conference on Learning Theory, 2012, pp. 37.1–37.18.
 [34] A. Agarwal, A. Anandkumar, P. Jain, P. Netrapalli, and R. Tandon, “Learning sparsely used overcomplete dictionaries,” in Conference on Learning Theory, 2014, pp. 123–137.
 [35] Z. Jiang, Z. Lin, and L. S. Davis, “Label consistent K-SVD: Learning a discriminative dictionary for recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2651–2664, 2013.
 [36] Q. Zhang and B. Li, “Discriminative K-SVD for dictionary learning in face recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2691–2698.
 [37] K. R. Varshney, M. Çetin, J. W. Fisher, and A. S. Willsky, “Sparse representation in structured dictionaries with application to synthetic aperture radar,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3548–3561, 2008.
 [38] Y. Suo, M. Dao, U. Srinivas, V. Monga, and T. D. Tran, “Structured dictionary learning for classification,” arXiv preprint arXiv:1406.1943, 2014.
 [39] M. Yang, L. Zhang, X. Feng, and D. Zhang, “Sparse representation based Fisher discrimination dictionary learning for image classification,” International Journal of Computer Vision, vol. 109, no. 3, pp. 209–232, 2014.
 [40] L. Li, S. Li, and Y. Fu, “Learning low-rank and discriminative dictionary for image classification,” Image and Vision Computing, vol. 32, no. 10, pp. 814–823, 2014.
 [41] I. Ramirez, P. Sprechmann, and G. Sapiro, “Classification and clustering via dictionary learning with structured incoherence and shared features,” in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3501–3508.
 [42] S. Kong and D. Wang, “A dictionary learning approach for classification: Separating the particularity and the commonality,” in European Conference on Computer Vision, 2012, pp. 186–199.
 [43] S. Gao, I. W.-H. Tsang, and Y. Ma, “Learning category-specific dictionary and shared dictionary for fine-grained image categorization,” IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 623–634, 2014.
 [44] C. Rusu, “On learning with shift-invariant structures,” arXiv preprint arXiv:1812.01115, 2018.
 [45] L. H. Nguyen and T. D. Tran, “Separation of radio-frequency interference from SAR signals via dictionary learning,” in IEEE Radar Conference, 2018, pp. 0908–0913.
 [46] C. GarciaCardona and B. Wohlberg, “Convolutional dictionary learning: A comparative review and new algorithms,” IEEE Transactions on Computational Imaging, vol. 4, no. 3, pp. 366–381, 2018.
 [47] T. H. Vu and V. Monga, “Fast low-rank shared dictionary learning for image classification,” IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5160–5175, 2017.
 [48] B. Dumitrescu and P. Irofti, Dictionary Learning Algorithms and Applications. Springer, 2018.
 [49] J. Sulam, B. Ophir, M. Zibulevsky, and M. Elad, “Trainlets: Dictionary learning in high dimensions,” IEEE Transactions on Signal Processing, vol. 64, no. 12, pp. 3180–3193, 2016.
 [50] J. Chen, L. Jiao, W. Ma, and H. Liu, “Unsupervised high-level feature extraction of SAR imagery with structured sparsity priors and incremental dictionary learning,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 10, pp. 1467–1471, 2016.
 [51] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
 [52] R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit,” CS Technion, vol. 40, no. 8, pp. 1–15, 2008.
 [53] N. Zhou and J. Fan, “Jointly learning visually correlated dictionaries for large-scale visual recognition applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 715–730, 2014.
 [54] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
 [55] M. R. Osborne, B. Presnell, and B. A. Turlach, “A new approach to variable selection in least squares problems,” IMA Journal of Numerical Analysis, vol. 20, no. 3, pp. 389–403, 2000.
 [56] J. Leckebusch, “Comparison of a stepped-frequency continuous wave and a pulsed GPR system,” Archaeological Prospection, vol. 18, no. 1, pp. 15–25, 2011.
 [57] C. Warren, A. Giannopoulos, and I. Giannakis, “An advanced GPR modelling framework: The next generation of gprMax,” in IEEE International Workshop on Advanced Ground Penetrating Radar, 2015, pp. 1–4.
 [58] M. G. M. Hussain, “Principles of high-resolution radar based on nonsinusoidal waves, Part III: Radar-target reflectivity model,” IEEE Transactions on Electromagnetic Compatibility, vol. 32, no. 2, pp. 144–152, 1990.
 [59] D. Pasculli and G. Manacorda, “Real-time, pseudo real-time and stroboscopic sampling in time-domain GPRs,” in IEEE International Workshop on Advanced Ground Penetrating Radar, 2015, pp. 1–4.
 [60] A. Bystrov and M. Gashinova, “Analysis of stroboscopic signal sampling for radar target detectors and range finders,” IET Radar, Sonar & Navigation, vol. 7, no. 4, pp. 451–458, 2013.
 [61] M. A. González-Huici, “Accurate ground penetrating radar numerical modeling for automatic detection and recognition of antipersonnel landmines,” Ph.D. dissertation, Universitäts- und Landesbibliothek Bonn, 2013.
 [62] H. Hongxing, J. M. Bioucas-Dias, and V. Katkovnik, “Interferometric phase image estimation via sparse coding in the complex domain,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2587–2602, 2015.
 [63] V. Glivenko, “Sulla determinazione empirica della legge di probabilità,” Giornale Dell’Istituto Italiano Degli Attuari, no. 4, pp. 92–99, 1933, in Italian.
 [64] F. P. Cantelli, “Sulla determinazione empirica della legge di probabilità,” Giornale Dell’Istituto Italiano Degli Attuari, no. 4, pp. 221–424, 1933, in Italian.
 [65] T. Vu, L. Nguyen, T. Guo, and V. Monga, “Deep network for simultaneous decomposition and classification in UWB-SAR imagery,” in IEEE Radar Conference, 2018, pp. 0553–0558.
 [66] R. Girshick, “Fast R-CNN,” in IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
 [67] I. Giannakis, A. Giannopoulos, and C. Warren, “A realistic FDTD numerical modeling framework of ground penetrating radar for landmine detection,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 1, pp. 37–51, 2016.
 [68] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
 [69] K. V. Mishra and Y. C. Eldar, “Sub-Nyquist radar: Principles and prototypes,” in Compressed Sensing in Radar Signal Processing, A. De Maio, Y. C. Eldar, and A. Haimovich, Eds. Cambridge University Press, 2019, in press.
 [70] K. V. Mishra, E. Shoshan, M. Namer, M. Meltsin, D. Cohen, R. Madmoni, S. Dror, R. Ifraimov, and Y. C. Eldar, “Cognitive sub-Nyquist hardware prototype of a collocated MIMO radar,” in IEEE International Workshop on Compressed Sensing Theory and its Applications to Radar, Sonar and Remote Sensing, 2016.
 [71] K. V. Mishra and Y. C. Eldar, “Performance of time delay estimation in a cognitive radar,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017, pp. 3141–3145.
 [72] K. V. Mishra, A. Kruger, and W. F. Krajewski, “Compressed sensing applied to weather radar,” in IEEE International Geoscience and Remote Sensing Symposium, 2014, pp. 1832–1835.