Dictionary Learning for Adaptive GPR Landmine Classification
Ground penetrating radar (GPR) target detection and classification is a challenging task. Here, we consider online dictionary learning (DL) methods to obtain sparse representations (SR) of the GPR data to enhance feature extraction for target classification via support vector machines. Online methods are preferred because traditional batch DL like K-SVD is not scalable to high-dimensional training sets and infeasible for real-time operation. We also develop Drop-Off MINi-batch Online Dictionary Learning (DOMINODL) which exploits the fact that a lot of the training data may be correlated. The DOMINODL algorithm iteratively considers elements of the training set in small batches and drops off samples which become less relevant. For the classification of abandoned anti-personnel landmines, we compare the performance of K-SVD with three online algorithms: classical Online Dictionary Learning, its correlation-based variant, and DOMINODL. Our experiments with real data from L-band GPR show that online DL methods reduce learning time by 36-93% and increase mine detection by 4-28% over K-SVD. Our DOMINODL is the fastest and retains classification performance similar to that of the other two online DL approaches. We use a Kolmogorov-Smirnov test distance and the Dvoretzky-Kiefer-Wolfowitz inequality for the selection of DL input parameters, leading to enhanced classification results. To further compare with state-of-the-art classification approaches, we evaluate a convolutional neural network (CNN) classifier which performs worse than the proposed approach. Moreover, when the acquired samples are randomly reduced by 25%, 50% and 75%, sparse decomposition based classification with DL remains robust while the CNN accuracy is drastically compromised.
ground penetrating radar, online dictionary learning, adaptive radar, deep learning, radar target classification, sparse decomposition
A ground penetrating radar (GPR, hereafter) is used for probing the underground by transmitting radio waves from an antenna held closely to the surface and acquiring the echoes reflected from subsurface anomalies or buried objects. As the electromagnetic wave travels through the subsurface, its velocity changes due to the physical properties of the materials in the medium. By recording such changes in the velocity and measuring the travel time of the radar signals, a GPR generates profiles of scattering responses from the subsurface. The interest in GPR is due to its ability to reveal buried objects non-invasively and detect non-metallic scatterers with increased sensitivity to dielectric contrast . From the recordings of the previously observed regions, GPR surveys can also extrapolate subsurface knowledge for inaccessible or unexcavated areas. This sensing technique is, therefore, attractive for several applications such as geophysics, archeology, forensics, and defense (see e.g. [2, 1] for some surveys). Over the last decade, there has been a spurt in GPR research because of advances in electronics and computing resources. GPR has now surpassed traditional ground applications and has become a more general ultra-wideband remote sensing system with proliferation to novel avenues such as through-the-wall imaging, building construction, food safety monitoring, and vegetation observation.
In this work, we consider the application of detecting buried landmines using GPR. This is one of the most extensively investigated GPR applications due to its obvious security and humanitarian importance. Mine detection GPR usually operates in the L-band (- GHz) with ultra-wideband (UWB) transmission in order to achieve both sufficient resolution to detect small targets (- cm diameter) and penetrate solid media at shallow depths (- cm) .
Even though a lot of progress has been made on GPR for landmine detection, discriminating them from natural and manmade clutter remains a critical challenge. In such applications, the signal distortion due to inhomogeneous soil clutter, surface roughness and antenna ringing hampers target recognition. Moreover, the constituting material of many models of landmines is largely plastic and has a very weak response to radar signals due to its low dielectric contrast with respect to the soil . Finally, a major problem arises due to low radar cross section (RCS) of some landmine models . A variety of signal processing algorithms have been proposed for detection of low metal-content landmines in realistic scenarios; approaches based on feature extraction and classification are found to be the most successful (see e.g. [6, 7, 8]), yet false-alarm rates remain very high.
Sparse representation (SR) is effective in extracting the mid- or high-level features in image classification [9, 10]. In the context of anti-personnel landmines recognition using GPR, our prior work [11, 12] has shown that frameworks based on SR improve the performance of Support Vector Machine (SVM) classifiers in distinguishing different types of mines and clutter in highly corrupted GPR signals. In this approach, the signal-of-interest is transformed into a domain where it can be expressed as a linear combination of only a few atoms chosen from a collection called the dictionary matrix [13, 14]. The dictionary may be learned from the data it is going to represent. Dictionary learning (DL) techniques (or sparse coding in machine learning parlance) aim to create adaptive dictionaries which provide the sparsest reconstruction for given training-sets, i.e., a representation with a minimum number of constituting atoms. DL methods are critical building blocks in many applications such as deep learning, image denoising, and super-resolution; see [15, 16, 17] for further applications.
Classical DL algorithms such as Method of Optimal Directions (MOD)  and K-SVD  operate in batches - dealing with the entire training set in each iteration. Although extremely successful, these methods are computationally demanding and not scalable to high-dimensional training sets. An efficient alternative is the Online Dictionary Learning (ODL) algorithm  that has faster convergence than batch DL methods. In this work, we develop a new approach toward classification based on online DL-SR framework for the specific case of GPR-based landmine identification.
The main contributions of this paper are as follows:
1) Faster DL for GPR landmine classification. [Footnote 1: The conference precursor of this work was presented at the IEEE International Geoscience and Remote Sensing Symposium, 2017.] We investigate the application of DL towards GPR-based landmine classification. To the best of our knowledge, this has not been investigated previously. Furthermore, online DL methods have not been studied more generally in GPR. [Footnote 2: We use the term "online DL" to imply any algorithm that operates in online mode. From here on, we reserve the term ODL solely to refer to the specific Online Dictionary Learning method cited above.] Only one other previous study has employed DL (K-SVD) using GPR signals, although for the application of identifying bedrock features. We employ online DL methods and use the coefficients of the resulting sparse vectors as input to an SVM classifier to distinguish mines from clutter. Our comparison of K-SVD and online DL using real data from L-band GPR shows that online DL algorithms present distinct advantages in speed and low false-alarm rates. We propose a new Drop-Off MINi-batch Online Dictionary Learning (DOMINODL) which processes the training data in mini-batches and avoids unnecessary updates of irrelevant atoms in order to reduce the computational complexity. The intuition for the drop-off step comes from the fact that some training samples are highly correlated and, therefore, in the interest of processing time, they can be dropped during training without significantly affecting performance.
2) Better statistical metrics for improved classification. Contrary to previous studies  which determine DL parameters (number of iterations, atoms, etc.) based on bulk statistics such as normalized root-mean-square-error (NRMSE), we consider statistical inference for parameter analysis. Our methods based on the Kolmogorov-Smirnov test distance  and the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [22, 23] are able to fine-tune model selection, resulting in improved mine classification performance.
3) Experimental validation for different landmine sizes. Our comparison of K-SVD with three online DL algorithms - ODL, its correlation-based variant  and DOMINODL - shows that online methods successfully detect mines with very small RCS buried deep into clutter and noise. Some recent studies [25, 26, 27, 28] employ state-of-the-art deep learning approaches such as a convolutional neural network (CNN) to classify GPR-based mines data. Our comparison with CNN illustrates that it has poorer performance in detection of small mines than our online DL approaches. This may also be caused by the relatively small dimensions of our training set which, even if perfectly adequate for DL, may not meet the expected requirements for CNN . We also show that the classification performance of online DL methods does not deteriorate significantly when signal samples are reduced.
The rest of the paper is organized as follows. In the next section, we formally describe the classification problem and GPR specific challenges. In Section III, we explain various DL algorithms used in our methodology and also describe DOMINODL. We provide an overview of the GPR system and field campaign to collect GPR data sets in Section IV. In Section V, we introduce our techniques for DL parameter selection. Section VI presents classification and reconstruction results using real radar data. We conclude in Section VII.
Throughout this paper, we reserve boldface lowercase and uppercase letters for vectors and matrices, respectively. The $i$th element of a vector $\mathbf{y}$ is $y_i$ while the $(i,j)$th entry of the matrix $\mathbf{Y}$ is $Y_{i,j}$. We denote the transpose by $(\cdot)^T$. We represent the set of real and complex numbers by $\mathbb{R}$ and $\mathbb{C}$, respectively. Other sets are represented by calligraphic letters. The notation $\|\cdot\|_p$ stands for the $\ell_p$-norm of its argument and $\|\cdot\|_F$ is the Frobenius norm. A subscript in the parenthesis such as $(\cdot)_{(t)}$ is the value of the argument in the $t$th iteration. The convolution product is denoted by $\ast$. The function $\mathrm{diag}(\cdot)$ outputs a diagonal matrix with the input vector along its main diagonal. We use $\Pr\{\cdot\}$ to denote probability, $\mathbb{E}[\cdot]$ is the statistical expectation, and $|\cdot|$ denotes the absolute value. The functions $\max(\cdot)$ and $\sup(\cdot)$ output the maximum and supremum value of their arguments, respectively.
II Problem Formulation
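A pulsed GPR transmits a signal into the ground and receives its echo for each point in the radar coverage area. The digital samples of this echo constitute a range profile designated by the signal vector $\mathbf{y} \in \mathbb{R}^{M}$, where $M$ is the number of range cells. Formally, the SR of $\mathbf{y}$ can be described by $\mathbf{y} = \mathbf{D}\mathbf{x}$, where $\mathbf{D} \in \mathbb{R}^{M \times N}$ with column vectors or atoms $\{\mathbf{d}_k\}_{k=1}^{N}$ is a redundant or overcomplete ($N > M$) dictionary, and $\mathbf{x} \in \mathbb{R}^{N}$ is the sparse representation vector. The SR process finds the decomposition that uses the minimum number of atoms to express the signal. If there are $L$ range profiles available, then the SR of the data $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_L] \in \mathbb{R}^{M \times L}$ is described as $\mathbf{Y} = \mathbf{D}\mathbf{X}$, where $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_L] \in \mathbb{R}^{N \times L}$.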
Our goal is to classify different mines (including the ones with small RCS) and clutter based on the SR of range profiles using fast, online DL methods. The GPR range profiles from successive scans are highly correlated. We intend to exploit this property during the DL process. In the following, we describe the SR-based classification and mention its challenges.
II-A GPR Target Classification Method
We use SVM to classify sparsely represented GPR range profiles using the learned dictionary $\mathbf{D}$. Given a predefined collection of labeled observations, SVM searches for a functional that maps any new observation to a class . A binary classifier with linearly separable data, for example, would have . In our work, we use $\mathbf{X}$, i.e. the sparse decomposition of a given set of signals (the 'training' signals) using the learned dictionary $\mathbf{D}$, as a set of labeled observations for the SVM. SVM transforms the data into a high dimensional feature space where it is easier to separate between different classes. The kernel function that we use to compute the high dimensional operations in the feature space is the Gaussian Radial Basis Function (RBF): $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|_2^2)$, where $\gamma$ is the free parameter that decides the influence of the support data vector $\mathbf{x}_j$ on the class of the vector $\mathbf{x}_i$ in the original space and $C$ is the parameter for the soft margin cost function which controls the influence of each individual support vector . To optimally select the SVM input parameters, we arrange the original classification set into training and validation vectors in different ways ($k$-fold cross-validation with $k$=10) to arrive at a certain mean cross-classification accuracy of the validation vectors. The folds were randomly selected and their number was found empirically by determining the limit above which the accuracy improvement was negligible. We refer the reader to  for more details on SVM.
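The two ingredients named above, the Gaussian RBF kernel and a random 10-fold split, can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the helper names `rbf_kernel` and `kfold_indices` are hypothetical, and the sparse codes are assumed to be stored one observation per row.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2).

    A and B hold one observation (e.g. one sparse code) per row.
    """
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def kfold_indices(n_samples, k=10, seed=0):
    """Random k-fold split: list of (training indices, validation indices)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i]) for i in range(k)]
```

In such a setup, the pair ($\gamma$, $C$) would typically be chosen by training on each training split and averaging the validation accuracy over the folds.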
II-B The Basic Framework of DL
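In many applications, the dictionary $\mathbf{D}$ is unknown and has to be learned from the training signals coming from the desired class. A DL algorithm finds an over-complete dictionary $\mathbf{D}$ that sparsely represents $L$ measurements $\mathbf{Y}$ such that $\mathbf{Y} \approx \mathbf{D}\mathbf{X}$. Each of the vectors $\mathbf{x}_i$ is a sparse representation of $\mathbf{y}_i$ with only $S$ nonzero entries. A non-tractable formulation of this problem is
$$\min_{\mathbf{D},\mathbf{X}} \|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 \quad \text{subject to} \quad \|\mathbf{x}_i\|_0 \le S, \; 1 \le i \le L. \qquad (1)$$
Since both $\mathbf{D}$ and $\mathbf{X}$ are unknown, a common approach is to use alternating minimization in which we start with an initial guess of $\mathbf{D}$ and then obtain the solution iteratively by alternating between two stages - sparse coding and dictionary update - as follows:
1) Sparse Coding: Obtain $\mathbf{X}_{(t)}$ as:
$$\mathbf{X}_{(t)} = \arg\min_{\mathbf{X}} \|\mathbf{Y} - \mathbf{D}_{(t-1)}\mathbf{X}\|_F^2 \quad \text{subject to} \quad \|\mathbf{x}_i\|_p \le S, \; 1 \le i \le L, \qquad (2)$$
where $\mathbf{X}_{(t)}$ is the SR in the $t$th iteration. This can be solved using greedy algorithms such as orthogonal matching pursuit (OMP) ($p = 0$), convex relaxation methods like basis pursuit denoising (BPDN) ($p = 1$) or focal underdetermined system solver (FOCUSS) ($p < 1$).
2) Dictionary Update: Given $\mathbf{X}_{(t)}$, update $\mathbf{D}_{(t)}$ such that
$$\mathbf{D}_{(t)} = \arg\min_{\mathbf{D}} \|\mathbf{Y} - \mathbf{D}\mathbf{X}_{(t)}\|_F^2. \qquad (3)$$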
Classical methods such as MOD  and K-SVD  retain a guess for and and iteratively update either using basis/matched pursuit (BP/MP) or using least squares solver. Both MOD and K-SVD operate in batches, i.e. they deal with the entire training set in each iteration, and solve the same dictionary learning model but differ in the optimization method. Since the initial guesses of or can be far removed from the actual dictionary, the BP step may behave erratically. While there are several state-of-the-art results that outline DL algorithms with concrete performance guarantees, e.g. [33, 34, 16], they require stronger assumptions on the observed data. In practice, heuristic DL such as MOD and K-SVD do yield overcomplete dictionaries although provable guarantees for such algorithms are difficult to come by.
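The alternating scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in our experiments: the function names `omp` and `learn_dictionary`, the initialization from randomly chosen training columns, and the MOD-style least-squares dictionary update are assumptions made for the example.

```python
import numpy as np

def omp(D, y, S):
    """Greedy OMP: approximate y with at most S atoms (columns) of D."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(S):
        k = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if k in support:
            break                                    # no new atom helps
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

def learn_dictionary(Y, n_atoms, S, n_iter=10, seed=0):
    """Alternate sparse coding (OMP) and a least-squares (MOD-style) update."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: random training samples, normalized to unit norm.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        X = np.column_stack([omp(D, y, S) for y in Y.T])    # sparse coding
        D = Y @ np.linalg.pinv(X)                           # dictionary update
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D /= np.where(norms > 0, norms, 1.0)                # keep unit-norm atoms
    X = np.column_stack([omp(D, y, S) for y in Y.T])        # codes for final D
    return D, X
```

Because every training sample is revisited in each iteration, the cost of one pass grows with the full training-set size, which is the scalability issue that motivates the online methods discussed next.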
Several extensions of batch DL have been proposed e.g. label consistent (LC) K-SVD  and discriminative K-SVD  introduced label information into the procedure of learning dictionaries to make them more discriminative. The performance of K-SVD can be improved in terms of both computational complexity and obtaining an incoherent dictionary if the learning process enforces constraints such as hierarchical tree sparsity , structured group sparsity (StructDL) , Fisher discrimination (FDDL) , and low-rank-and-Fisher (DLR) . Often objects belonging to different classes have common features. This has been exploited in improving K-SVD to yield methods such as DL with structured incoherence and shared features (DLSI) , separating the commonality and the particularity (COPAR) , convolutional sparse DL (CSDL) , shift-invariant DL , principal component analysis DL , convolutional DL  and low-rank shared DL (LRSDL) . A recent review of various DL algorithms can be found in .
In general, batch DL methods are computationally demanding at test time and not scalable to high-dimensional training sets. On the other hand, online methods such as ODL  converge fast and process small sets. A few improvements to ODL have already been proposed. For example,  considered a faster Online Sparse Dictionary Learning (OSDL) to efficiently handle bigger training set dimensions using a double-sparsity model. A recent study  notes that even though online processing reduces computational complexity compared to batch methods, ODL performance can be further improved if the useful information from previous data is not ignored in updating the atoms. In this study, a new online DL called Correlation-Based Weighted Least Square Update (CBWLSU) was proposed, which employs only part of the previous data correlated with current data for the update step. The CBWLSU is relevant to GPR because the latter often contains highly correlated range profiles.
In this paper, our focus is to investigate such fast DL methods in the context of GPR-based landmine detection and classification. We also propose a new online DL method that exploits range profile correlation as in CBWLSU but is faster than both ODL and CBWLSU. Our inspiration is the K-SVD variant called incremental structured DL (ISDL) that was used earlier in the context of SAR imaging . In ISDL, at each iteration, a small batch of samples is randomly drawn from the training set. Let be the set of indices of the mini-batch training elements chosen uniformly at random at the iteration. Then, ISDL updates the dictionary using the mini-batch and the corresponding representation coefficient . The fast iterative shrinkage-thresholding algorithm (FISTA)  and block coordinate descent methods solve the sparse coding and dictionary update, respectively. As we will see in later sections, the mini-batch strategy that we employed in our DOMINODL reduces computational time without degrading performance.
III Online DL
We now describe the DL techniques used for GPR target classification and then develop DOMINODL in order to address challenges of long training times in the context of our problem.
III-A K-SVD, LRSDL, ODL and CBWLSU
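As mentioned earlier, the popular K-SVD algorithm  sequentially updates all the atoms during the dictionary update step using all training set elements. For the sparse coding step at iteration $t$, K-SVD employs OMP with the formulation:
$$\mathbf{x}_{i,(t)} = \arg\min_{\mathbf{x}_i} \|\mathbf{x}_i\|_0 \quad \text{subject to} \quad \|\mathbf{y}_i - \mathbf{D}_{(t-1)}\mathbf{x}_i\|_2^2 \le \epsilon, \; 1 \le i \le L, \qquad (4)$$
where $\epsilon$ is the maximum residual error used as a stopping criterion. For the dictionary update at iteration $t$, K-SVD solves the global minimization problem (3) via $N$ sequential minimization problems, wherein every column $\mathbf{d}_k$ of $\mathbf{D}$ and its corresponding row of coefficients $\mathbf{x}_{\mathrm{row},k}$ of $\mathbf{X}$ are updated, as follows
$$\mathbf{d}_k = \arg\min_{\mathbf{d}_k} \|\mathbf{E}_k - \mathbf{d}_k \mathbf{x}_{\mathrm{row},k}\|_F^2, \qquad \mathbf{E}_k = \mathbf{Y} - \sum_{j \neq k} \mathbf{d}_j \mathbf{x}_{\mathrm{row},j}. \qquad (5)$$
The update process employs SVD to find the closest rank-1 approximation (in Frobenius norm) of the error term $\mathbf{E}_k$ subject to the constraint $\|\mathbf{d}_k\|_2 = 1$.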
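The rank-1 update of a single atom can be sketched as follows. This is a hedged illustration: the name `ksvd_atom_update` is hypothetical, and restricting the error term to the samples that actually use the atom follows the commonly published form of K-SVD, which we assume here.

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """Update atom k of D and row k of X in place via a rank-1 SVD."""
    users = np.flatnonzero(X[k, :])      # samples whose code uses atom k
    if users.size == 0:
        return D, X                      # unused atom: left for a pruning step
    # Representation error without the contribution of atom k,
    # restricted to the samples that use it (keeps the codes sparse).
    E_k = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
    U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
    D[:, k] = U[:, 0]                    # new unit-norm atom
    X[k, users] = s[0] * Vt[0, :]        # matching coefficients
    return D, X
```

Since the leading singular pair gives the best rank-1 approximation in Frobenius norm, this step cannot increase the representation error for fixed supports.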
Another recent batch method of interest is the low-rank shared DL (LRSDL) . This is a discriminative batch DL algorithm (others being D-KSVD  and LC-KSVD ) that learns by promoting the generation of a dictionary which is separated into blocks of atoms associated with different classes as where is the number of classes present in the training set . The resultant coefficient matrix has a block diagonal structure. The assumption of non-overlapping subspaces is often unrealistic in practice. Techniques such as COPAR , JDL  and CSDL  exploit common patterns among different classes even though different objects possess distinct class-specific features. These methods produce an additional constituent which is shared among all classes so that . The drawback of these strategies is that the shared dictionary may also contain class-discriminative features. To avoid this problem, LRSDL requires that the shared dictionary have a low-rank structure and that its sparse coefficients be nearly identical. Once the data is sparsely represented with such dictionaries, a sparse-representation-based classifier (SRC) is used to predict the class of new data. The LRSDL update process employs the alternating direction method of multipliers (ADMM)  and FISTA for the sparse decomposition step.
ODL is an interesting alternative for inferring a dictionary from large training sets or ones which change over time . ODL also updates the entire dictionary sequentially, but uses one element of training data at a time for the dictionary update. Assuming that the training set is composed of independent and identically distributed samples of a distribution $p(\mathbf{y})$, ODL first draws an example $\mathbf{y}_t$ of the training set from $p(\mathbf{y})$. Then, the sparse coding is the Cholesky-based implementation of the LARS-LASSO algorithm . The latter solves an $\ell_1$-regularized least-squares problem for each column of $\mathbf{Y}$. In the dictionary update we consider all the training set elements analyzed so far, namely, $\{\mathbf{y}_i\}_{i=1}^{t}$:
$$\mathbf{D}_{(t)} = \arg\min_{\mathbf{D}} \frac{1}{t} \sum_{i=1}^{t} \left( \frac{1}{2}\|\mathbf{y}_i - \mathbf{D}\mathbf{x}_i\|_2^2 + \lambda \|\mathbf{x}_i\|_1 \right), \qquad (6)$$
where $\lambda$ is the regularization parameter of the sparse coding step.
In the next step, each column of $\mathbf{D}$ is sequentially updated via gradient descent using the dictionary computed in the previous iteration. Before receiving the next training data, the dictionary update is repeated multiple times for convergence.
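A compact way to picture the ODL dictionary update is through two running statistics of the samples seen so far. The sketch below is an assumption on our part about how this step is typically implemented: it uses the block-coordinate column update with projection onto the unit ball from the original ODL literature rather than a plain gradient step, and the names `accumulate`, `odl_dictionary_update` and `surrogate` are hypothetical.

```python
import numpy as np

def accumulate(A, B, x, y):
    """Add one sample (code x, signal y) to the running statistics A and B."""
    return A + np.outer(x, x), B + np.outer(y, x)

def odl_dictionary_update(D, A, B, n_passes=1, eps=1e-12):
    """Sequentially update each column of D using A = sum x x^T, B = sum y x^T."""
    D = D.copy()
    for _ in range(n_passes):                 # repeated for convergence
        for j in range(D.shape[1]):
            if A[j, j] < eps:
                continue                      # atom not used so far
            u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
            D[:, j] = u / max(np.linalg.norm(u), 1.0)   # keep ||d_j|| <= 1
    return D

def surrogate(D, A, B):
    """Quadratic part of the objective minimized by the update (up to constants)."""
    return 0.5 * np.trace(D.T @ D @ A) - np.trace(D.T @ B)
```

Only the small matrices `A` and `B` need to be stored, so past samples do not have to be revisited individually.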
CBWLSU is an online method that introduces an interesting alternative for the dictionary update step . Like ODL, CBWLSU evaluates one new training data . However, to update the dictionary, it searches among all previous training data and uses only the ones which share the same atoms with . Let with be the set of previous training elements at iteration . Define as the set of indices of all previous training elements that are correlated with the new element such that . The new training set is . Then, CBWLSU employs a weighting matrix to evaluate the influence of the selected previous elements for the dictionary update step and solves the optimization problem therein via weighted least squares. Unlike K-SVD and ODL, CBWLSU does not require the dictionary pruning step to replace the unused or rarely used atoms with the training data. The sparse coding in CBWLSU is achieved via batch OMP.
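The selection of previous training elements that share atoms with the new one can be sketched as below. The function name `correlated_previous` and the column-wise storage of sparse codes are assumptions made for illustration only.

```python
import numpy as np

def correlated_previous(X_prev, x_new):
    """Indices of previous sparse codes (columns of X_prev) that share
    at least one active atom with the new sparse code x_new."""
    active = x_new != 0                               # atoms used by the new sample
    shares = (X_prev[active, :] != 0).any(axis=0)     # any overlap per previous sample
    return np.flatnonzero(shares)
```

Only the returned columns, together with the new element, would then enter the weighted least squares dictionary update.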
III-B Drop-Off Mini-Batch Online Dictionary Learning
|Algorithm||K-SVD||LRSDL||ODL||CBWLSU||DOMINODL|
|Sparse coding method||OMP||FISTA||LARS||Batch OMP||Entropy-thresholded batch OMP|
|Dictionary update||Entire atom-wise||Entire||Entire group-wise||Entire atom-wise||Partial adaptively group-wise|
|Training samples per iteration||Entire||Entire|
|Optimization method||SVD||ADMM||Gradient descent||Weighted least squares||Weighted least squares|
|Post-update dictionary pruning||Yes||No||Yes||No||No|
We now introduce our DOMINODL approach for online DL which not only leads to a dictionary ($\mathbf{D}$) that is tuned to sparsely represent the training set ($\mathbf{Y}$) but is also faster than other online algorithms. The key idea of DOMINODL is as follows: When sequentially analyzing the training set, it is pertinent to leverage the memory of previous data in the dictionary update step. However, algorithms such as CBWLSU consider all previous elements. Using all previous training set samples is computationally expensive and may also slow down convergence. The samples which have already contributed to the dictionary update do not need to be considered again. Moreover, in some real-time applications (such as highly correlated range profiles of GPR), their contribution may no longer be relevant for updating the dictionary.
In DOMINODL, we save computations by considering only a small batch of previous elements that are correlated with the new elements. The two sets are defined as correlated if, in their sparse decomposition, they have at least one common non-zero element. The time gained from considering fewer previous training elements is used to consider a mini-batch of new training data (instead of a single element as in ODL and CBWLSU). The sparse coding step of DOMINODL employs batch OMP, selecting the maximal residual error in (III-A) using a data-driven entropy-based strategy as described later in this section. At the end of each iteration, DOMINODL also drops off those previous training set elements that have not been picked up after a certain number of iterations, . The mini-batch drawing, combined with dropping off training elements and an entropy-based criterion to control sparsity, results in an extremely fast online DL algorithm that is beneficial for real-time radar operations.
We initialize the dictionary using a collection of training set samples that are randomly chosen from . We then perform a sparse decomposition of with the dictionary . Let the iteration count indicate the element of the training set. We define the mini-batch of new training elements as , where with and . When , we simply take the remaining new elements to constitute this mini-batch. [Footnote 3: In numerical experiments, we observed that the condition rarely occurs because DOMINODL updates the dictionary and converges in very few iterations. The algorithm also ensures that the number of previous samples before the dictionary update. If this condition is not fulfilled, then it considers all previous training samples.] We store the set of dictionary atoms participating in the SR of the signals in as . We define with as the set of previous training elements at iteration . We consider a randomly selected mini-batch with such that . Let where such that be a subset of previous training elements that are correlated with the mini-batch of new elements. In order to avoid multiple occurrences of the same element in consecutive mini-batches, DOMINODL ensures that . Let be the set of dictionary atoms used for SR of . Our new training set is . Both mini-batches of new and previous elements are selected such that the entire training set size () is still smaller than that of CBWLSU where it is .
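The bookkeeping behind the mini-batch selection and the drop-off rule can be sketched as follows. This reflects one possible reading of the procedure above and is not the reference implementation: the function name `select_and_drop`, the per-sample idle counter `ages`, and the parameters `n_prev_batch` and `max_idle` are all hypothetical.

```python
import numpy as np

def select_and_drop(X_prev, ages, X_new, n_prev_batch, max_idle, rng):
    """One bookkeeping step of a drop-off mini-batch scheme.

    X_prev : sparse codes of previous samples (atoms x samples)
    ages   : iterations since each previous sample was last picked
    X_new  : sparse codes of the new mini-batch (atoms x samples)
    Returns (indices of picked previous samples, keep mask, updated ages).
    """
    n_prev = X_prev.shape[1]
    # Random mini-batch of previous samples to examine.
    cand = rng.choice(n_prev, size=min(n_prev_batch, n_prev), replace=False)
    new_atoms = (X_new != 0).any(axis=1)               # atoms used by the new batch
    hit = (X_prev[new_atoms][:, cand] != 0).any(axis=0)
    picked = cand[hit]                                 # correlated previous samples
    ages = ages + 1
    ages[picked] = 0                                   # picked samples stay fresh
    keep = ages <= max_idle                            # drop samples idle too long
    return picked, keep, ages
```

Samples whose `keep` flag is false would simply be removed from the pool of previous elements before the next iteration, which is what keeps the per-iteration training set small.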
The dictionary update subproblem reduces to considering only the sets , and :
Assume that the sparse coding for each example is known and define the errors as
We can update , such that the above error is minimized, with the assumption of fixed . A similar problem is considered in MOD where error minimization is achieved through least squares. Here, we employ weighted least squares inspired by the fact that it has shown improvement in convergence over standard least squares . We compute the weighting matrix using the sparse representation error :
We then solve the optimization problem
This leads to the weighted least squares solution
The dictionary is then updated with the atoms and its columns are normalized by their -norms.
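For illustration, a generic weighted least squares dictionary update followed by column normalization can be written as below. The sketch assumes a diagonal weighting with one positive weight per training element; how the weights are derived from the sparse representation error is not reproduced here, and the name `wls_dictionary_update` is hypothetical.

```python
import numpy as np

def wls_dictionary_update(Y, X, w):
    """Minimize sum_i w[i] * ||Y[:, i] - D @ X[:, i]||^2 over D,
    i.e. D = Y W X^T (X W X^T)^+ with W = diag(w), then normalize columns."""
    XW = X * w                                   # scales column i of X by w[i]
    D = (Y @ XW.T) @ np.linalg.pinv(X @ XW.T)
    norms = np.linalg.norm(D, axis=0)
    return D / np.where(norms > 0, norms, 1.0)   # unit l2-norm atoms
```

When the data are exactly representable by a unit-norm dictionary and the codes have full row rank, this update recovers that dictionary for any choice of positive weights.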
The is next used for updating the sparse coding of using batch OMP. Selecting a value for the maximal residual error in (III-A) is usually not straightforward. This value can be related to the amount of noise in the observed data but this information is not known. The samples of our training set can be seen as realizations of a statistical process with an unknown distribution and therefore one can associate to these realizations the concept of statistical entropy. We compute the normalized entropy of the mean vector of all the training set samples as
where is the mean vector of all training samples, is the number of features for each training sample and is the probability mass function. In our case, is obtained as the normalized histogram of . Here, is an indicator of the randomness of the data due to noise. We use as the maximal residual error while applying batch OMP in DOMINODL. Algorithm 1 summarizes all major steps of DOMINODL.
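The entropy computation can be sketched as follows. Several details are assumptions made only for this example: the histogram uses as many bins as there are features, the entropy is normalized by the logarithm of the number of bins so that it lies in [0, 1], and the name `normalized_entropy` is hypothetical.

```python
import numpy as np

def normalized_entropy(Y, n_bins=None):
    """Normalized entropy of the mean training vector.

    Y has one training sample per column; requires at least 2 bins.
    """
    m = Y.mean(axis=1)                        # mean vector over all samples
    n_bins = n_bins or m.size                 # assumed: one bin per feature
    counts, _ = np.histogram(m, bins=n_bins)
    p = counts[counts > 0] / counts.sum()     # empirical probability mass function
    return float(-(p * np.log(p)).sum() / np.log(n_bins))
```

A value close to 1 indicates a mean vector whose amplitudes are spread evenly over the histogram, whereas a value close to 0 indicates amplitudes concentrated in a few bins.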
Table I summarizes the important differences between DOMINODL and other related algorithms. Like MOD and CBWLSU, DOMINODL uses a weighted least squares solution in the dictionary update. The proof of convergence for the alternating minimization method in MOD was provided in  where it is shown that alternating minimization converges linearly as long as the following assumptions hold true: sparse coefficients have bounded values, sparsity level is on the order of and the dictionary satisfies the RIP property. In , these assumptions have been applied for CBWLSU convergence. Compared to CBWLSU, the improvements in DOMINODL include mini-batch based data selection and data reduction via a drop-off strategy but the update algorithms remain the same. Numerical experiments in Section VI suggest that DOMINODL usually converges in far fewer iterations than CBWLSU.
Although we developed and tested DOMINODL on a highly correlated GPR dataset (see Section IV), this technique may be employed in other applications where real-time learning is necessary and the signals are correlated. Our tests demonstrate that DOMINODL converges faster than other online DL approaches (see Section VI) because of the combined strategy of drawing more new elements for each iteration, considering fewer previous elements in the search for correlation and dropping off the unused previous elements. The entropy-based calculation of , although not exclusive to DL applications as mentioned above, also helps in improving the SR of the data and thus in learning a more representative dictionary.
Computational complexity of DOMINODL is very low compared to other online approaches. As mentioned earlier, there are atoms in the dictionary. Assume that every signal is represented by a linear combination of atoms, . Empirically, among all possible combinations of atoms from , the probability to have a common atom in the sparse representation is . Given training elements, the number of training data which have a specific atom in their representation is proportional to . Suppose our mini-batch has elements that reduce the number of training data by a factor (depending on the values of and ). Further, assume that the dropping off step reduces the training set elements by a factor . The number of training data in the iteration is proportional to . Then, the worst estimate of DOMINODL’s computational complexity is due to the sparse coding batch OMP which is of order . This is much smaller than the complexity of ODL () or CBWLSU ().
Figure 1 illustrates the computational complexity of online DL approaches. Figure 1(a) shows that, for a fixed number of iterations (), the general trend of complexity with respect to the increase in the number of atoms () is similar for all algorithms. However, the complexity of ODL is higher than that of CBWLSU and DOMINODL, the latter being the least complex. When the number of iterations is increased, the complexities of ODL and CBWLSU have a similar increasing trend (see Fig. 1(b)). In the case of DOMINODL, its complexity is similar to the increasing trend of CBWLSU and determined largely by . When DOMINODL iterations begin accounting for previous elements, its complexity stays constant. The value of changes for every iteration, while depends on the data itself. In general, after a few dozen iterations, DOMINODL's complexity always stays lower than that of CBWLSU.
IV Measurement Campaign
In this section, we first provide details of our GPR system and the field measurement campaign. We then describe the procedure to organize the entire dataset for our application.
IV-A Ground Penetrating Radar System
Our GPR (see Fig. 2) is the commercially available SPRScan system manufactured by ERA Technology. It is an L-band, impulse waveform, ultra-wideband (UWB) radar that is mounted on a movable trolley platform. Pulsed GPRs are more effective in terms of offering penetration depth and wide bandwidth with respect to the standard Stepped-Frequency Continuous Wave (SFCW) systems. The former is also more robust to electronic interference and does not suffer from unequal balancing of antenna signals .
Table II lists the salient technical parameters of the system. The radar uses a cm dual bow-tie dipole antenna for both transmit (Tx) and receive (Rx) sealed in a metallic shielding filled with an internal absorber. The central frequency of the system ($f_c$) and its bandwidth ($B$) are both 2 GHz. The pulse repetition frequency (PRF) and the sampling rate of the receiver ADC are both 1 MHz. The scanning system has a resolution of 1 cm towards the perpendicular broadside (or X direction) and 4 cm towards the cross-beam (Y direction). In our field campaigns, the SPRScan system moves along the survey area over a rail system which allows accurate positioning of the sensor head in order to obtain the aforementioned resolution in X and Y (see also Section IV).
|Parameter||Value|
|Operating frequency||2 GHz|
|Pulse repetition frequency||1 MHz|
|Pulse length||0.5 ns|
|Sampling time||25 ps|
|Spatial sampling along the beam||1 cm|
|Cross-beam resolution||4 cm|
|Antenna height||5-9 cm|
|Antenna configuration||Perpendicular broadside|
The transmit pulse of the GPR system is a monocycle. Given the Gaussian waveform
where is the central frequency, is the peak amplitude and , the monocycle waveform is its first derivative 
In these UWB systems both the central frequency and the bandwidth are approximately the reciprocal of the pulse length.
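The monocycle can be sketched numerically as follows. This is a minimal illustration that assumes the Gaussian form $A\exp(-2\pi^2 f_c^2 (t-\tau)^2)$ with $\tau = 1/f_c$, a common convention that may differ from the exact expression used for this system. The function names are illustrative; the 2 GHz and 25 ps values are taken from Table II.

```python
import math

def gaussian(t, fc, amp=1.0):
    """Assumed Gaussian pulse centered at tau = 1/fc (hypothetical form)."""
    tau = 1.0 / fc
    return amp * math.exp(-2.0 * (math.pi * fc) ** 2 * (t - tau) ** 2)

def monocycle(t, fc, amp=1.0):
    """Analytic first derivative of gaussian() with respect to t."""
    tau = 1.0 / fc
    return -4.0 * (math.pi * fc) ** 2 * (t - tau) * gaussian(t, fc, amp)

fc = 2e9      # central frequency from Table II (2 GHz)
dt = 25e-12   # equivalent sampling time from Table II (25 ps)

# Sampled monocycle over the first 2 ns (80 samples at 25 ps).
samples = [monocycle(n * dt, fc) for n in range(80)]
```

Under this assumed form the monocycle crosses zero at $t = \tau$ and is antisymmetric about that instant, which is the bipolar shape expected of a first-derivative Gaussian pulse.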
The scattering of UWB radar signals from complex targets that are composed of a finite number of scattering centers can be described in terms of the channel impulse response (CIR). Here, the CIR is considered as a linear, time invariant, causal system which is a function of the target shape, size, constituent materials, and scan angle. The CIR of a GPR target, with scatterers, is expressed as a series of time-delayed and weighted Gaussian pulses 
where each scatterer located at range from the radar is characterized by the reflectivity , duration , relative time shift , where is the speed of the electromagnetic wave in the soil, m/s is the speed of light, and is the dielectric constant which depends on the soil composition and moisture.
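To give a feel for the time shifts involved, the sketch below computes the wave speed and the round-trip delay for a buried scatterer. It assumes the standard low-loss relations $v = c/\sqrt{\epsilon_r}$ and a two-way delay of $2R/v$; the depth and dielectric constants are illustrative values, not measurements.

```python
import math

C = 3.0e8  # speed of light in vacuum, m/s

def wave_speed(eps_r):
    """Propagation speed in a low-loss soil with dielectric constant eps_r."""
    return C / math.sqrt(eps_r)

def two_way_delay(depth_m, eps_r):
    """Round-trip travel time to a scatterer at the given depth (assumed 2R/v)."""
    return 2.0 * depth_m / wave_speed(eps_r)

for eps_r in (4.0, 9.0):  # illustrative dielectric constants
    delay_ns = two_way_delay(0.10, eps_r) * 1e9
    print(f"eps_r={eps_r}: v={wave_speed(eps_r):.2e} m/s, delay={delay_ns:.2f} ns")
```

A larger dielectric constant slows the wave, so the same target appears later in the range profile; this is one reason why soil variability complicates the interpretation of echoes.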
The response of the target to the Gaussian monocycle is the received signal
also regarded as the target image, or range profile. For each X/Y position, the system receives a radar echo (range profile) from the transmitted pulse. To cope with the exponential signal attenuation during propagation through the soil, the dynamic range of the signal is enhanced via stroboscopic sampling [2, 59, 60]. This technique integrates receiver samples (generated by transmitting a sequence of pulses) at the ADC receiver sampling rate but with a small time offset for each of them. To achieve the desired stroboscopic sampling rate , the time offset must be selected accordingly, i.e., . Our GPR system employs stroboscopic sampling to reach a pseudo sampling frequency of 40 GHz (much above the Nyquist rate) to yield the discrete-time signal .
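The relation between the time offset and the equivalent sampling rate can be checked with a few lines of arithmetic. The sketch below assumes that the receiver captures one sample per transmitted pulse and that the offset equals the 25 ps sampling time of Table II; the function name and the assumption of one sample per pulse are illustrative.

```python
def equivalent_sampling(offset_s, n_samples, prf_hz):
    """Equivalent-time sampling figures, assuming one ADC sample per pulse."""
    f_equiv = 1.0 / offset_s            # pseudo sampling frequency, Hz
    window = n_samples * offset_s       # time window covered by a profile, s
    acquisition = n_samples / prf_hz    # time needed to build one profile, s
    return f_equiv, window, acquisition

# 25 ps offset, 512 range samples, 1 MHz PRF
f_equiv, window, acquisition = equivalent_sampling(25e-12, 512, 1e6)
print(f_equiv / 1e9, "GHz;", window * 1e9, "ns;", acquisition * 1e3, "ms")
```

The design choice is a trade of acquisition time for bandwidth: a slow ADC clocked at the PRF can still resolve a nanosecond-scale echo, provided the scene is static over the pulses that make up one profile.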
The receiver can acquire a maximum of 195 profiles per second, each consisting of 512 range samples. Prior to the A/D conversion, the signal is averaged to improve the signal-to-noise ratio (SNR). A time-varying gain correction can be applied to compensate for the soil attenuation and increase the overall dynamic range of the system. The receiver averages 100 range profiles for each antenna position.
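The benefit of averaging can be illustrated with a short simulation. The sketch assumes zero-mean noise that is uncorrelated from profile to profile, in which case averaging $N$ profiles reduces the noise standard deviation by $\sqrt{N}$ and raises the SNR by $10\log_{10} N$ dB; real receiver noise may deviate from this ideal.

```python
import math
import random
import statistics

def averaging_gain_db(n_avg):
    """Ideal SNR gain from averaging n_avg profiles with uncorrelated noise."""
    return 10.0 * math.log10(n_avg)

rng = random.Random(0)
n_avg = 100  # profiles averaged per antenna position

# Average n_avg unit-variance noise samples, repeated 2000 times.
averaged = [statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_avg))
            for _ in range(2000)]
residual_std = statistics.pstdev(averaged)  # close to 1 / sqrt(n_avg)

print(averaging_gain_db(n_avg), "dB ideal gain; residual std", residual_std)
```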
IV-B Test Field Measurements
We evaluated the proposed approach with measurement data from a 2013 field campaign at the Leibniz Institute for Applied Geophysics (LIAG) in Hannover, Germany. Fig. 3 shows the test field together with the ground truth information. The soil texture was sandy and highly inhomogeneous (due to the presence of material such as organic matter and stones), thereby leading to a high variability in the electrical parameters. We measured the dielectric constant at three different locations of the testbed with a Time Domain Reflectometer (TDR) to obtain an estimate of its mean value and variability. The average value oscillated between 4.6 and 10.1 with standard deviation and correlation length of cm. These large variations in soil dielectric characteristics pose difficulties in mine detection.
During the field tests, the SPRScan system moved on two plastic rails with the scan resolution in the X and Y directions being 1 and 4 cm, respectively. The entire survey lane was divided into 1 m × 1 m sections (see Fig. 3), each containing two targets in the center. The targets on the left and right sides of the lane were buried at approximately and cm depths, respectively.
Our testbed contains standard test targets (STT) and simulant landmines (SIM) of different sizes and shapes. An STT is a surrogate target used for testing landmine detection equipment. It is intended to interact with the equipment in the same manner as a real landmine. A SIM has the representative characteristics of a specific landmine class although it is not a replica of any specific model. In this paper, we study three STTs (PMA2, PMN and Type-72) and one SIM (ERA). All of these test objects are buried at a depth of - cm in the test field. For classification purposes, we group PMN and PMA2 together as the largest targets while Type-72 mines are the smallest (Fig. 4).
IV-C Dataset Organization
The entire LIAG dataset consists of the 27 aforementioned survey sections (or simply, “surveys”) of size 1 m × 1 m. Every survey consists of 2500 range profiles. We arranged the data into a training set to be used for both DL and classification (as explained in subsection II-A) and a test set to evaluate the performance of the proposed algorithms.
The training set is a matrix whose columns consist of sampled range profiles of range samples each. The profiles are selected from different surveys and contain almost exclusively either a particular class of landmine or clutter. In total, we have , , and range profiles for clutter, PMA2/PMN, ERA and Type-72, respectively. An accurate separation of these classes was very challenging because the contributions from the non-homogeneous soil clutter often masked the target responses completely. A poor selection would lead the DL to learn a dictionary that is appropriate for sparsely representing clutter instead of landmines. The test set is a matrix with columns that correspond to sampled range profiles from 6 surveys, two for each target class. The test and training sets contain data from separate surveys to enable a fair assessment of the classification performance.
We denote by the matrices and the SRs of and , respectively and by the number of atoms of the learned dictionary .
V Parametric Analysis
In practice, the SR-based classification performance is sensitive to the input parameters of DL algorithms, which makes it difficult to directly apply DL with arbitrary parameter values. Previous works set these parameters through trial and error or by resorting to metrics that are unable to discriminate the influence of different parameters. In this section, we propose methods to investigate the effect of the various input parameters on the learning performance and then preset the parameters to optimal values. These values yield, for each DL method, a dictionary optimized to sparsely represent our GPR data, thereby improving the quality of the features for classification (i.e. the sparse coefficients).
Table III lists these parameters (see Section III): number of iterations , number of trained atoms , and DOMINODL parameters , and . We applied K-SVD, LRSDL, ODL, CBWLSU and DOMINODL separately on the training set for different combinations of parameter values. In order to compare the dictionaries obtained from various DL algorithms, we use a similarity measure that quantifies the closeness of the original training set to the reconstructed set obtained using the sparse coefficients of the learned dictionary. From these similarity values, empirical probability density functions (EPDFs) for any combination of parameter values are obtained; we evaluate these EPDFs using the statistical metrics described in Section V-B. These metrics efficiently characterize the similarity between and and lead us to an optimal selection of various DL input parameters for our experimental GPR dataset.
|DL algorithm||Input parameters|
|DOMINODL||, , ,|
V-A Similarity Measure
Consider the cross-correlation between the original training set vector and its reconstruction : . The normalized cross-correlation is defined as
For the vector , we define the similarity measure as
where a value of closer to unity demonstrates greater similarity of the reconstructed data with the original training set. We compute for all vectors , and then obtain the normalized histogram or empirical probability density function (EPDF) of the similarity measure. Here, the subscript DL represents the algorithm used for learning, e.g. “K”, “O”, “C” and “D” for K-SVD, ODL, CBWLSU and DOMINODL, respectively. Various parameter combinations for a specific DL method result in a collection of EPDFs. For a given DL method, our goal is to compare the EPDFs of the similarity measure by varying these parameters, and arrive at the thresholds of parameter values after which the changes in are only incremental.
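The procedure of computing a similarity value per training vector and collecting the values into an EPDF can be sketched as follows. The sketch assumes that the similarity is the peak magnitude of the cross-correlation normalized by the product of the vector norms, which is bounded by one; this choice, and all function names, are assumptions made for illustration and may differ from the exact definition used here.

```python
import math

def normalized_xcorr(y, y_hat):
    """Cross-correlation over all lags, scaled by the product of l2 norms."""
    n = len(y)
    norm = math.sqrt(sum(a * a for a in y) * sum(b * b for b in y_hat))
    out = []
    for lag in range(-(n - 1), n):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += y[i] * y_hat[j]
        out.append(acc / norm)
    return out

def similarity(y, y_hat):
    """Assumed similarity: peak magnitude of the normalized cross-correlation."""
    return max(abs(v) for v in normalized_xcorr(y, y_hat))

def epdf(values, n_bins=20):
    """Normalized histogram of similarity values on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return [c / len(values) for c in counts]

# Toy check: a perfect reconstruction has similarity 1.
profile = [0.0, 1.0, -2.0, 0.5]
print(similarity(profile, profile))
```

The vectors are assumed to have equal length and non-zero energy. With this definition a reconstruction that is merely a scaled copy of the original still scores one, so the measure reflects shape agreement rather than amplitude error.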
For instance, Fig. 5 shows the EPDFs of learned from the GPR mines data where optimal parameters for different DL methods were determined using statistical methods described in the following subsection. We note that the online DL approaches (, and ) yield distributions that are more skewed towards unity than K-SVD ().
V-B Statistical Metrics
We are looking for parameter values for which is skewed towards unity and has small variance. The individual comparisons of mean ($\mu$) and standard deviation ($\sigma$), as used in previous GPR DL studies, are not sufficient to quantify the observed dispersion in the EPDFs obtained by varying any of the parameter values. Some DL studies [20, 62, 50] rely on bulk statistics such as NRMSE but these quantities are insensitive to large changes in parameter values and, therefore, unhelpful in fine-tuning the algorithms. For this evaluation, we use three different metrics: the coefficient of variation, the two-sample Kolmogorov-Smirnov (K-S) distance and the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality.
V-B1 Coefficient of variation
We simultaneously compare both the mean ($\mu$) and the variance ($\sigma^2$) of a single EPDF by using the coefficient of variation, $CV = \sigma/\mu$; in our analysis, it represents the extent of variability in relation to the mean of the similarity values.
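As a minimal sketch, the coefficient of variation of a set of similarity values is the ratio of their standard deviation to their mean. The population standard deviation is used below as an assumption; the sample version would differ only marginally for large training sets. The numerical values are illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Ratio of (population) standard deviation to mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

tight = [0.95, 0.96, 0.97, 0.98]    # similarity values clustered near unity
spread = [0.55, 0.70, 0.85, 0.98]   # more dispersed similarity values
print(coefficient_of_variation(tight), coefficient_of_variation(spread))
```

A smaller value indicates an EPDF that is both concentrated and located near unity, which is the behavior sought from a well-tuned dictionary.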
V-B2 Two-sample Kolmogorov-Smirnov distance
In the context of our application, it is more convenient to work with the cumulative distribution functions (CDFs) rather than with PDFs because the well-developed statistical inference theory allows for convenient comparison of CDFs. Therefore, our second metric to compare similarity measurements obtained by successive changes in parameter values is the two-sample Kolmogorov-Smirnov (K-S) distance, which is the maximum distance between two given empirical cumulative distribution functions (ECDFs). Larger values of this metric indicate that the samples are drawn from different underlying distributions. Given two random variables and , suppose and are their ECDFs of the same length and correspond to their EPDFs and , respectively. Then, the K-S distance is
where denotes the supremum over all distances and is the number of i.i.d. observations (or samples) to evaluate both distributions. In our case, is the number of range profiles in the training set. We first compute a reference ECDF () for each DL algorithm and fixed parameter values. For our purposes, this reference ECDF will be obtained by a particular combination of input parameters of the selected DL algorithm. Then, we vary parameter values from this reference and obtain the corresponding ECDF of similarity measure. Finally, we calculate the K-S distance of with respect to as
For our evaluation, states how much the selection of certain input parameters of DL changes the ECDFs of similarity values (i.e. how different is the result of DL) with respect to the reference.
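The two-sample K-S distance can be computed directly from the two sets of similarity values. The sketch below builds right-continuous ECDFs and evaluates their absolute difference at every observed value, which is where the supremum of two step functions is attained; the function names are illustrative.

```python
import bisect

def ecdf(sample):
    """Return the empirical CDF of the sample as a callable."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def ks_distance(sample_a, sample_b):
    """Maximum absolute difference between the two ECDFs."""
    f, g = ecdf(sample_a), ecdf(sample_b)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(f(x) - g(x)) for x in points)

reference = [0.90, 0.92, 0.95, 0.97]   # toy similarity values, reference parameters
candidate = [0.80, 0.85, 0.93, 0.96]   # toy similarity values, modified parameters
print(ks_distance(reference, candidate))
```

The distance is symmetric and lies between zero (identical ECDFs) and one (non-overlapping samples), so it quantifies how different two parameter settings are without indicating which one is better.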
V-B3 Dvoretzky-Kiefer-Wolfowitz inequality metric
As a third metric, we exploit the Dvoretzky-Kiefer-Wolfowitz inequality (DKW) [22, 23] which precisely characterizes the rate of convergence of an ECDF to a corresponding exact CDF (from which the empirical samples are drawn) for any finite number of samples. Let be the K-S distance between ECDF and the continuous CDF for a random variable and samples. Since changes with the change in the random samples, is also a random variable. We are interested in the conditions that provide desired confidence in verifying if F and G are the same distributions for a given finite . If the two distributions are indeed identical, then the DKW inequality bounds the probability that is greater than any number , with as follows. (The corresponding asymptotic result that as , with probability is due to the Glivenko-Cantelli theorem [63, 64].)
Consider a binary hypothesis testing framework where we use (21) to test the null hypothesis for a given . The probability of rejecting the null hypothesis when it is true is called the p-value of the test and is bounded by the DKW inequality. Assuming the p-value is smaller than a certain confidence level , the following inequality must hold with probability at least :
Our goal is to use the DKW inequality to compare two ECDFs and as in (20), to verify if they are drawn from the same underlying CDF. By the triangle inequality, the K-S distance satisfies
where and are the underlying CDFs corresponding to and . We now bound the right side using DKW
which is the maximum distance for which and are identical with probability . The DKW metric is the difference
Larger values of this metric imply greater similarity between the two ECDFs; a negative value implies that the null hypothesis is not true.
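The DKW-based metric can be sketched as follows. The standard DKW bound $2e^{-2n\epsilon^2}$ on the probability that an ECDF deviates from its CDF by more than $\epsilon$ is inverted for a chosen level $\alpha$, and the triangle inequality gives a threshold of $2\epsilon$ for two ECDFs of $n$ samples each under the null hypothesis. The form of the metric as threshold minus K-S distance, and the value $\alpha = 0.05$, are assumptions made for illustration.

```python
import math

def dkw_epsilon(n, alpha):
    """Deviation eps for which the DKW bound 2*exp(-2*n*eps**2) equals alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def dkw_metric(ks_dist, n, alpha=0.05):
    """Assumed form: two-sample threshold 2*eps minus the K-S distance."""
    return 2.0 * dkw_epsilon(n, alpha) - ks_dist

n = 1000  # hypothetical number of range profiles in the training set
print(dkw_metric(0.01, n))  # positive: consistent with the null hypothesis
print(dkw_metric(0.20, n))  # negative: the ECDFs differ beyond the bound
```

Because the threshold shrinks as $1/\sqrt{n}$, larger training sets make the metric more sensitive to small changes between the two distributions.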
V-C Parametric Evaluation
We evaluated the performance of the aforementioned DL algorithms by analyzing the influence of the various DL input parameters using the metrics introduced in Section V-B for the reconstruction of the training set . There are various soil types and scenarios for a landmine contaminated site. The LIAG test data provides an accurate representation of a practical scenario. Our metrics are general and derived from widely accepted statistical studies. Thus, their relevance to similar scenarios is very likely. As shown in Table III, the number of iterations is not relevant to CBWLSU and DOMINODL while the latter requires additional parameters to specify the mini-batch dimensions and the iterations required to drop off unused training set elements. We compute the K-S distance and the DKW metric for all methods with respect to a reference distribution . This reference, different for each DL algorithm, is obtained using the following parameters as applicable: , , , and .
V-C1 Number of iterations
Figures 7(a)-(c) show the effect of on the , K-S test distance and the DKW metric for K-SVD, ODL and LRSDL. We have omitted CBWLSU and DOMINODL from this analysis because they do not accept as an input. For ODL, the remains relatively unchanged with an increase in . However, the K-SVD exhibits an oscillating behavior and generally high values. In the case of the K-S distance, ODL shows a slight increase in while K-SVD oscillates around a mean value that is higher than ODL. The DKW metric provides better insight: even though the ODL distributions differ from with an increase in the iterations, the null hypothesis always holds because remains positive. The for K-SVD is also positive but much smaller than ODL. It also does not exhibit any specific trend with an increase in iterations. We also observed a similar behavior with the mean of similarity values. The influence of the number of iterations in LRSDL had the same oscillating behavior as in K-SVD but with larger variation. We conclude that the number of iterations does not significantly influence the metrics for these algorithms, and choose .
V-C2 Number of trained atoms
Figs. 8(a)-(c) compare all three metrics with change in the number of trained atoms , a parameter that is common to all DL methods. We observe that generally decreases with an increase in . This indicates an improvement in the similarity between the reconstructed and the original training set. K-SVD shows an anomalous pattern for lower values of but later converges to a trend that is identical to other DL approaches. The K-S distance exhibits a linear change in the distributions with respect to the reference. Since quantifies the difference between the distributions rather than stating which one is better, combining its behavior with makes it evident that an increase in leads to better distributions of similarity values. The DKW metric , calculated with the same reference, expectedly also shows a linear change. Clearly, even a slight change in leads to more negative values of , implying that the null hypothesis does not hold true. This shows the significant influence of the parameter on the distributions. It was interesting to see a slight improvement in the coefficient of variation when using LRSDL with respect to the other strategies. However, the K-S distance and the DKW metric indicated that the distributions of similarity values for LRSDL were sensitive to the number of trained atoms only up to a certain value.
V-C3 DOMINODL parameters
It is difficult to evaluate DOMINODL EPDFs by varying all four parameters together. Instead, we fix the parameter that is common to all algorithms, i.e. the number of trained atoms , and then determine optimal values of , and . Figure 6 shows the coefficient of variation of the distribution of similarity values as a function of DOMINODL parameters. The drop-off value appears to have a greater influence than mini-batch dimensions and . Our analysis of the computational times of DOMINODL showed that it is essentially independent of and but slightly increases with . This is expected because we also increased the number of steps for sparse decomposition (see Algorithm 1), which is the source of the bulk of computations in DL algorithms. Further, in order to ensure that the correlation and the drop-off steps start from the very first iteration, DOMINODL should admit several new samples for each iteration, thereby increasing as well as the number of previous elements accordingly. Taking into account these observations, we choose and .
According to the results of the parametric evaluation, we choose the following combination of “optimal” parameters for testing our DL strategies: , , , , and .
VI Experimental Results and Analysis
After selecting the input parameters of the proposed DL strategies, we proceed with the trained dictionaries for sparse decomposition of both training and test sets. The resulting sets of sparse coefficients are the input to the SVM classifier. As mentioned in Section II-A, the threshold and the kernel function parameter for SVM have been selected through cross validation. Our key objective is to demonstrate that online DL algorithms may lead to an improvement in the classification performance over batch learning strategies. In particular, we want to analyze the performance of DOMINODL in terms of classification accuracy and learning speed. As a comparison with a popular state-of-the-art classification method, we also show the classification results with a deep-learning approach based on CNN. Finally, we demonstrate classification performance when the original samples of the range profiles are randomly reduced.
VI-A Classification with Optimal Parameters
For a comprehensive analysis of the classification performance, we provide both classification maps and confusion matrices for the test set using the optimal DL input parameters that we selected following our parametric evaluation in Section V. The classification maps depict the predicted class of each range profile of the survey under test. The pixel dimension of these maps is dictated by the sampling of the GPR in X and Y directions (see Table II). We stitched together 3 of the 6 surveys from the test set where each survey had 2 buried landmines from a specific target class (PMN/PMA2, ERA and Type-72).
Figure 9 shows the classification maps for different DL methods along with the raw data at depth cm. The selected survey area covers a total of range profiles. The raw data in Fig. 9(a) shows that only four of the six mines exhibit a strong reflectivity while the other two mines have echoes so weak that they are not clearly visible in the raw data. Figures 9(b)-(d) show the results of the SR-based classification approaches using DL. All methods clearly detect and correctly classify the large PMN/PMA2 mines. In the case of the medium-size ERA, the echoes are certainly detected as non-clutter but some of its constituent pixels are incorrectly classified as another mine. It is remarkable that the left ERA mine is recognized by our method even though it cannot be discerned visually in the raw data. Most of the false alarms in the map belong to the smallest Type-72 mines. This is expected because their small sizes produce echoes very similar to the ground clutter. On the other hand, when Type-72 is the ground truth, it is correctly identified.
Using accurate ground truth information, we defined target halos as the boundaries of the buried landmines. The dimension of the target halos varied depending on the mine size. Let the number of pixels and the declared mine pixels inside the target halo be and , respectively. Similarly, we denote the number of true and declared clutter pixels outside the target halo by and , respectively. Then, the probabilities of correct classification () for each target class and clutter are, respectively,
The being the output of a classifier should not be mistaken as the radar’s probability of detection which is the result of a detector. A detector declares the presence of a mine when only a few pixels inside the halo have been declared as mine; provides a fairer and more accurate evaluation of the classification result. This per-pixel information can be easily used to improve the final detection result. For instance, the operator could set a threshold for the minimum number of pixels to be detected in a cluster so that a circle with center at the cluster centroid could be used as the detected mine. However, such a circle may exclude some of the mine pixels leading to a potential field danger. The per-pixel classification is then employed to determine the guard area around the mine circle.
A confusion matrix is a quantitative representation of the classifier performance. The matrix lists the probability of classifying the ground truth as a particular class. The classes listed column-wise in the confusion matrix are the ground truths while the row-wise classes are their predicted labels. Therefore, the diagonal of the matrix is the while off-diagonal elements are probabilities of misclassification.
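A confusion matrix with this layout can be assembled from per-pixel labels as sketched below. Columns correspond to ground truth and rows to predicted labels, and each column is normalized by the number of ground truth pixels of that class, so that the diagonal holds the probability of correct classification. The labels in the example are toy values, not results from the test field.

```python
def confusion_matrix(truth, predicted, classes):
    """Column-normalized matrix: columns are ground truth, rows are predictions."""
    idx = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(truth, predicted):
        counts[idx[p]][idx[t]] += 1
    totals = [sum(row[j] for row in counts) for j in range(len(classes))]
    return [[counts[i][j] / totals[j] if totals[j] else 0.0
             for j in range(len(classes))]
            for i in range(len(classes))]

classes = ["clutter", "PMN/PMA2", "ERA", "Type-72"]
# Hypothetical per-pixel labels for six range profiles.
truth = ["clutter"] * 4 + ["ERA"] * 2
predicted = ["clutter", "clutter", "clutter", "Type-72", "ERA", "PMN/PMA2"]

cm = confusion_matrix(truth, predicted, classes)
for name, row in zip(classes, cm):
    print(name, row)
```

In this toy example one clutter pixel is mislabeled as Type-72 and one ERA pixel as PMN/PMA2, so the off-diagonal entries in the clutter and ERA columns are non-zero while each populated column still sums to one.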
Gray denotes the value for a specified class and DL algorithm
For the classification map of Fig. 9, Table IV shows the corresponding confusion matrices for each DL-based classification approach. In general, we observe an excellent classification of PMN/PMA2 landmines (~%), implying that almost every range profile in the test set which belongs to this class is correctly labeled. The for the clutter is also quite high (~%). This can also be concluded from the classification maps where the false alarms within the actual clutter regions are very sparse (i.e. they do not form a cluster) and, therefore, unlikely to be interpreted as an extended target. As noted previously, most of the clutter misclassification is associated with the Type-72 class. The ERA test targets show some difficulty with correct classification. But most of the pixels within its target halo are declared at least as some type of mine (which is quite useful in terms of issuing safety warnings in the specific field area). This result can be explained by the fact that ERA test targets do not represent a specific mine but have general characteristics common to most landmines. The Type-72 mines exhibit a which is slightly higher with respect to ERA targets. This is a remarkable result because Type-72 targets were expected to be the most challenging to classify due to their small size.
Conventionally, LRSDL is used with a sparse-representation-based classification (SRC). However, applying this approach to our problem resulted in very low accuracy (an average of ~% across all classes as evident from Table IV) and semi-random classification maps (Fig. 9). This can be explained by the extreme similarity between the training set examples of different classes; mines and clutter are only slightly dissimilar in their responses and mine responses are generally hidden in the ground reflections. Each learned “block” differed only slightly from the others and, therefore, poor classification results are achieved with this dataset. On the other hand, when we used the dictionary learned with LRSDL with our SVM-based technique, we obtained better classification accuracy (see Table IV and Fig. 9). However, this performance is still inferior to K-SVD and, hence, even worse than the other online DL approaches.
All DL algorithms used for our sparse classification approach show very similar results for the clutter and PMN/PMA2 classes. However, online DL methods show higher for the ERA and Type-72 targets than K-SVD. From Table IV, the detection enhancement using the best of the online DL algorithms for PMN/PMA2 over K-SVD is %. The improvements for ERA and Type-72 are computed similarly as % and %, respectively.
VI-B Classification with Non-Optimal Parameters
In order to demonstrate how the quality of the learned dictionary affects the final classification, we now show the confusion matrices for a non-optimal selection of input parameters in different DL algorithms. Our goal is to emphasize the importance of learning a good dictionary by selecting the optimal parameters rather than specifying how each parameter affects the final classification result. We arbitrarily selected the number of trained atoms to be only for all DL approaches, reduced the number of iterations to for ODL and K-SVD and, for DOMINODL, used =30, =5 and =2. Table V shows the resulting confusion matrix. While the clutter classification accuracy is almost the same as in Table IV, the for PMN/PMA2 landmines decreased by ~% for most of the algorithms except ODL where it remains unchanged. The classification accuracy for ERA and Type-72 mines is only slightly worse for online DL approaches. However, in the case of K-SVD, the reduces by ~% and ~% for ERA and Type-72, respectively. Clearly, the reconstruction and correct classification of range profiles using batch algorithms such as K-SVD is strongly affected by a non-optimal choice of DL input parameters. As discussed earlier in Section V-C, this degradation is likely due to the influence of rather than .
Blue denotes the best performance among all DL algorithms
VI-C Computational Efficiency
We used the MATLAB 2016a platform on an 8-core CPU Windows 7 desktop PC to clock the times for DL algorithms. The ODL algorithm is implemented as a MEX executable, and is therefore already fine-tuned for speed. For K-SVD, we employed an efficient implementation to improve computational speed. Table VI lists the execution times of the five DL approaches. Here, the parameters were optimally selected for all the algorithms. The LRSDL is the slowest of all while ODL is more than 4 times faster than K-SVD. The CBWLSU provided better classification results but is three times slower than ODL. This could be because the dictionary update step always considers all previous training set elements that correlate with only one new element (i.e. there is no mini-batch strategy). This makes the convergence in CBWLSU more challenging.
The DOMINODL is the fastest DL method, running 3x faster than ODL and 15x faster than K-SVD. This is because DOMINODL updates the dictionary by evaluating only a mini-batch of previous elements (instead of all of them as in CBWLSU) that correlate with a mini-batch of several new elements (CBWLSU uses just one new element). Further, DOMINODL drops off the unused elements, leading to a faster convergence. We note that, unlike the ODL and K-SVD implementations, we did not use MEX executables of DOMINODL, which could further shorten current execution times. From Table VI, the reduction in DOMINODL computational time over K-SVD is %. The reductions for ODL and CBWLSU are computed similarly as % and %, respectively.
The computational bottleneck of mine classification lies in the training times. In comparison, the common steps of sparse decomposition and SVM-based classification during testing take just 0.4 s and 1 s, respectively, for an entire survey (1 m × 1 m area with 2500 range profiles). Thus, the time taken per range profile is ~0.59 ms. The average scan rate of our GPR system is 0.19 m/s (or 1 cm/52.1 ms). This can go as high as 2.7 m/s (or 1 cm/3.61 ms) in other GPRs used for landmine applications. Therefore, the test times do not impose much computational cost.
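The real-time argument can be checked with simple arithmetic: the per-profile test time must stay below the time the radar spends on each 1 cm step. The sketch below uses the rounded figures quoted above, so its outputs can differ slightly from the values in the text; the function names are illustrative.

```python
def per_profile_ms(decomposition_s, svm_s, n_profiles):
    """Average test time per range profile, in milliseconds."""
    return 1e3 * (decomposition_s + svm_s) / n_profiles

def dwell_ms_per_cm(scan_speed_m_per_s):
    """Time spent on each 1 cm step at the given scan speed, in milliseconds."""
    return 1e3 * 0.01 / scan_speed_m_per_s

budget = per_profile_ms(0.4, 1.0, 2500)   # rounded timings for one survey
slow = dwell_ms_per_cm(0.19)              # scan rate of the system used here
fast = dwell_ms_per_cm(2.7)               # faster GPRs for landmine applications

print(budget, slow, fast)
print("real-time at both speeds:", budget < fast < slow)
```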
VI-D Comparison with Sparse-Representation-Based Classification
Gray denotes the value for a specified class and DL algorithm
We compared our proposed DL-based approach with the sparse-representation-based classification (SRC) method. The SRC needs a labeled dictionary but the dictionary that we learn from no longer has label information, thereby making SRC infeasible here. Therefore, we adopt the following steps for a reasonable comparison of the two methods. We feed SRC with as the dictionary . As indicated in Section IV-C, a meticulously selected collection of mines/clutter responses as is meaningful for comparing different DL approaches. But it does not highlight the benefits of employing DL per se. Therefore, we generate a coarser selection, i.e. more profiles than the handpicked case, as for both approaches. Table VII shows the confusion matrix for the residual-based classification along with the proposed DL-based approaches. The DL-based mine classification is consistent with previous results, and even better with DOMINODL, but with some trade-off of decreasing clutter accuracy. The accuracy of the residual-based classifier is severely degraded for all mine classes, dropping by at least 45%, 39% and 55% for PMN/PMA2, ERA and Type-72, respectively. This renders the increase in clutter classification accuracy of this method of little practical use.
VI-E Deep-Learning-Based Classification
The core idea of SR-based classification is largely based on the assumption that signals are linear combinations of a few atoms. In practice, this is often not the case. This has led to a few recent works that suggest employing deep learning for radar target classification. However, these techniques require significantly large datasets for training.
We compared classification results of our methods with a deep learning approach. In particular, we constructed a CNN because these networks are known to efficiently exploit structural or locational information in the data and yield comparable learning potential with far fewer parameters. We modeled our proposed CNN framework as a classification problem wherein each class denotes the type of mine or clutter. The training data set for our CNN structure is the matrix (see Section IV). Building up a synthetic database is usually an option for creating (or extending) a training set for deep learning applications. However, accurately modeling a GPR scenario is still an ongoing challenge in the GPR community because of the difficulties in accurately reproducing the soil inhomogeneities (and variabilities), the surface and underground clutter, the antenna coupling and ringing effects, etc. Even though some applications have been promising, this remains a cumbersome task.
The input layer of our CNN took a one-dimensional sample set of size . It was followed by two convolutional layers with and filters of size and , respectively. The output layer consisted of four units wherein the network classifies the given input data as clutter or one of the three mines. There were rectified linear units (ReLU) after each convolutional layer; the ReLU function is given by $f(x) = \max(0, x)$. The architecture of the CNN was selected through an arduous process of testing many combinations of layers/filters and hyperparameters which would lead to better accuracy during training. A deeper network slightly increased the accuracy in the training phase but led to poorer performance when classifying new data (i.e. the test set ). Since our data are limited, adding more layers (i.e. more weights) only led to overfitting and made the network incapable of generalizing to new datasets. A multi-dimensional CNN formed by clustering 2D and 3D data would have further reduced the training set. Augmenting the data was also envisioned but commonly used transformations such as scaling/rotations are not useful in our case because the mines were always in the same inclination and their dimension defines the class itself. We also attempted adding different levels of noise but this did not lead to better results considering the available data are already very noisy.
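The basic building block of such a network, a one-dimensional convolution followed by a ReLU, can be sketched in a few lines. The kernel, bias and input below are toy values chosen for illustration; they do not reproduce the filter counts, filter sizes or trained weights of the network described above.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def conv1d_relu(signal, kernel, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    return [relu(sum(signal[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(signal) - k + 1)]

# Toy range profile and a two-tap difference filter (hypothetical values).
profile = [0.0, 1.0, -2.0, 3.0, 1.0]
feature_map = conv1d_relu(profile, [1.0, -1.0])
print(feature_map)
```

Each additional layer of this kind adds weights to be learned, which is why deeper variants tend to overfit when the number of labeled range profiles is small.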
We trained the network with the labeled training set , selecting ~% of the training data for validation. Specifically, the validation set employed , , , and range profiles for clutter, PMN/PMA2, ERA and Type-72, respectively. We used a stochastic gradient descent algorithm for updating the network parameters with the learning rate of and mini-batch size of samples for epochs.
We realized the proposed network in TensorFlow on a Windows 7 PC with 8-core CPU. The network training took minutes. Figure 10 shows the classification map obtained using CNN. The corresponding confusion matrix is listed in Table VIII. We note that the CNN classifier performs worse than our SR-based techniques, particularly for the ERA and Type-72 target classes.
VI-F Classification with Reduced Range Samples
We now analyze the robustness of our DL-based adaptive classification method to a reduction in the number of samples in the raw data. Assuming the collected data is sparse in dictionary , we undersampled the original raw data in range to obtain its row-undersampled version by randomly reducing the samples. We then applied the same random sampling pattern to the dictionary for obtaining the sparse coefficients. We also analyzed the CNN classifier when the signals are randomly reduced in the same way. Figure 11 illustrates the classification map for all DL approaches when the sampling is reduced by %. Table IX collects the confusion matrices obtained when undersampling by 25%, 50%, and 75%.
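The key step, applying one random row pattern to both the signal and the dictionary, can be sketched as follows. This is a simplified illustration under stated assumptions: the dictionary is random rather than learned, the profile length of 208 samples is hypothetical (chosen so that a 75% reduction leaves 52 samples), and a single-atom correlation search stands in for the actual sparse coding step.

```python
import math
import random


def undersample_rows(n_rows, keep_fraction, rng):
    """Pick a sorted random subset of row (range-sample) indices to keep."""
    n_keep = max(1, round(n_rows * keep_fraction))
    return sorted(rng.sample(range(n_rows), n_keep))


def take_rows(vector, rows):
    """Restrict a signal or a dictionary atom to the retained rows."""
    return [vector[i] for i in rows]


def best_atom(signal, atoms):
    """Index of the atom with the largest normalized correlation to the signal.

    Stand-in for a full sparse solver; it is enough for a 1-sparse signal.
    """
    def score(atom):
        norm = math.sqrt(sum(a * a for a in atom))
        return abs(sum(s * a for s, a in zip(signal, atom))) / norm
    return max(range(len(atoms)), key=lambda k: score(atoms[k]))


rng = random.Random(0)
N_ROWS, N_ATOMS = 208, 20  # hypothetical sizes

# Toy dictionary stored atom by atom (each atom is one column of the dictionary).
atoms = [[rng.gauss(0.0, 1.0) for _ in range(N_ROWS)] for _ in range(N_ATOMS)]

# A 1-sparse signal: a scaled copy of a single atom.
true_atom = 7
signal = [2.5 * v for v in atoms[true_atom]]

# The same random pattern is applied to the signal and to every atom.
rows = undersample_rows(N_ROWS, keep_fraction=0.25, rng=rng)
signal_reduced = take_rows(signal, rows)
atoms_reduced = [take_rows(atom, rows) for atom in atoms]

full_choice = best_atom(signal, atoms)
reduced_choice = best_atom(signal_reduced, atoms_reduced)
```

Because the signal and the dictionary lose the same rows, the reduced signal is still an exact multiple of the reduced atom, so the support found from the reduced system matches the one found from the full system in this toy case.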
In comparison to the results in Table IV, which used all samples of the raw data, the DL approaches maintain similar classifier performance even when we reduce the samples by 75% (i.e. just 52 samples in total). In contrast, the CNN classifier result, which is already heavily compromised with a reduction of 25%, fails completely for reductions of 50% and 75%. Reducing the number of signal samples when using a dictionary which minimizes the number of non-zero entries in the sparse representation still assures an exact reconstruction of the signal itself and, consequently, its correct classification. The features for classifying the traces are therefore robust to the reduction of the original samples. Deep learning strategies, in contrast, use the signal samples directly as classification features. They also require an enormous amount of data for training. Therefore, the degradation in their performance is expected. The confusion matrix in Table IX indicates that CNN has the highest value for ERA. This is misleading because the network mis-classified almost every pixel as ERA. Overall, DOMINODL and CBWLSU provide excellent results for small mines. However, as seen earlier, CBWLSU is not well-suited for real-time operation because of its longer execution times.
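The caveat about ERA can be made concrete with a toy confusion matrix. The labels below are invented for illustration only (10 pixels per class and a degenerate classifier that always outputs ERA); they are not taken from Table IX, and the per-class rates computed here are generic recall and precision rather than the exact metric reported in the tables.

```python
CLASSES = ["clutter", "PMN/PMA2", "ERA", "Type-72"]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns index the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was labeled correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


def per_class_precision(matrix):
    """Fraction of each predicted class that was actually correct."""
    columns = list(zip(*matrix))
    return [col[i] / sum(col) if sum(col) else 0.0
            for i, col in enumerate(columns)]


# Toy data: 10 pixels per class, every pixel classified as ERA.
true_labels = [name for name in CLASSES for _ in range(10)]
predicted_labels = ["ERA"] * len(true_labels)

matrix = confusion_matrix(true_labels, predicted_labels, CLASSES)
recall = per_class_recall(matrix)
precision = per_class_precision(matrix)
```

In this toy case the ERA recall is perfect while the recall of every other class is zero and only a quarter of the ERA decisions are correct, which is why a high correct-classification rate for a single class should be read together with the rest of the matrix.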
We also assessed the performance of the different methods when, instead of retaining fewer range samples per profile, we include all samples in every range profile but randomly reduce the overall number of training set elements from 926 to 694, 464 and 232 range profiles (which respectively correspond to 25%, 50% and 75% reduction). From the corresponding confusion matrices listed in Table X, we note that the CNN-based classification results have improved with respect to Table IX. However, the classification accuracy of CNN is still poorer than that of the DL-based classification. In general, all methods show performance degradation as the training set elements are reduced. Among the DL methods, ODL is more robust to the range profile reduction than K-SVD.
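The second experiment removes whole range profiles instead of samples within a profile. A minimal sketch of that selection is given below; the profile contents and labels are placeholders, the helper name `reduce_training_set` is hypothetical, and only the set sizes (926, 694, 464, 232) come from the text.

```python
import random


def reduce_training_set(profiles, labels, n_keep, rng):
    """Keep n_keep randomly chosen range profiles; each kept profile is left intact."""
    keep = sorted(rng.sample(range(len(profiles)), n_keep))
    return [profiles[i] for i in keep], [labels[i] for i in keep]


rng = random.Random(0)
N_TOTAL = 926

# Placeholder training set: one dummy value per profile and a dummy class label.
profiles = [[float(i)] for i in range(N_TOTAL)]
labels = [i % 4 for i in range(N_TOTAL)]

summary = []
for n_keep in (694, 464, 232):
    kept_profiles, kept_labels = reduce_training_set(profiles, labels, n_keep, rng)
    reduction_percent = 100.0 * (1.0 - len(kept_profiles) / N_TOTAL)
    summary.append((n_keep, reduction_percent))
```

The three set sizes correspond to the nominal 25%, 50% and 75% reductions only approximately, since 926 is not divisible by four. A purely random draw such as this one also does not preserve the class proportions; whether the selection was stratified by class is not stated here.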
In this paper, we proposed effective online DL strategies for sparse decomposition of GPR traces of buried landmines. The online methods outperform K-SVD, thereby making them good candidates for SR-based classification. Our algorithm DOMINODL is always the fastest, providing near real-time performance and high clutter rejection while also maintaining a classifier performance that is comparable to the other online DL algorithms. DOMINODL and CBWLSU generally classify smaller mines better than ODL and K-SVD. Unlike previous works that rely on RMSE, we used metrics based on statistical inference to tune the DL parameters for enhanced operation.
Fast ODL computations pave the way towards cognition [69, 70, 71] in GPR operation, wherein the system uses previous measurements to optimize the processing performance and is capable of sequential sampling adaptation based on the learned dictionary. For example, in a realistic landmine clearance campaign, an operator could gather the training measurements over a safe area next to the contaminated site, hypothetically placing some buried landmine simulants over it in order to have a faithful representation of the soil/targets interaction beneath the surface. In other words, our work allows the operator to calibrate the acquisition by providing a good training set to learn the dictionary.
The authors acknowledge valuable assistance from David Mateos-Núñez in Section VI-E.
-  H. M. Jol, Ed., Ground penetrating radar theory and applications. Elsevier Science, 2009.
-  D. J. Daniels, Ground penetrating radar. IET, 2004.
-  Landmine Monitor 2017, Geneva, Switzerland, 2017, International Campaign to Ban Landmines - Cluster Munition Coalition.
-  F. Giovanneschi, M. A. González-Huici, and U. Uschkerat, “A parametric analysis of time and frequency domain GPR scattering signatures from buried landmine-like targets,” in SPIE Defense, Security, and Sensing, 2013, p. 870914.
-  M. A. González-Huici, I. Catapano, and F. Soldovieri, “A comparative study of GPR reconstruction approaches for landmine detection,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 12, pp. 4869–4878, 2014.
-  M. A. González-Huici and F. Giovanneschi, “A combined strategy for landmine detection and identification using synthetic GPR responses,” Journal of Applied Geophysics, vol. 99, pp. 154–165, 2013.
-  P. A. Torrione, K. D. Morton, R. Sakaguchi, and L. M. Collins, “Histograms of oriented gradients for landmine detection in ground-penetrating radar data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 3, pp. 1539–1550, 2014.
-  I. Giannakis, A. Giannopoulos, and A. Yarovoy, “Model-based evaluation of signal-to-clutter ratio for landmine detection using ground-penetrating radar,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3564–3573, 2016.
-  J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
-  J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
-  F. Giovanneschi and M. A. González-Huici, “A preliminary analysis of a sparse reconstruction based classification method applied to GPR data,” in International Workshop on Advanced Ground Penetrating Radar, 2015, pp. 1–4.
-  F. Giovanneschi, K. V. Mishra, M. A. Gonzalez-Huici, Y. C. Eldar, and J. H. G. Ender, “Online dictionary learning aided target recognition in cognitive GPR,” in IEEE International Geoscience and Remote Sensing Symposium, 2017, pp. 4813–4816.
-  M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
-  Y. C. Eldar, Sampling Theory: Beyond Bandlimited Systems. Cambridge University Press, 2015.
-  M. Elad, Sparse and Redundant Representations - From Theory to Applications in Signal and Image Processing. Springer, 2010.
-  S. Arora, R. Ge, and A. Moitra, “New algorithms for learning incoherent and overcomplete dictionaries,” in Conference on Learning Theory, 2014, pp. 779–806.
-  Y. C. Eldar and G. Kutyniok, Compressed sensing: Theory and applications. Cambridge University Press, 2012.
-  K. Engan, S. O. Aase, and J. H. Husoy, “Method of optimal directions for frame design,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, 1999, pp. 2443–2446.
-  J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 689–696.
-  W. Shao, A. Bouzerdoum, and S. L. Phung, “Sparse representation of GPR traces with application to signal classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 7, pp. 3922–3930, 2013.
-  I. M. Chakravarti, R. G. Laha, and J. Roy, Handbook of methods of applied statistics: Volume I. John Wiley and Sons, 2004.
-  A. Dvoretzky, J. Kiefer, and J. Wolfowitz, “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator,” The Annals of Mathematical Statistics, pp. 642–669, 1956.
-  P. Massart et al., “The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality,” The Annals of Probability, vol. 18, no. 3, pp. 1269–1283, 1990.
-  Y. Naderahmadian, S. Beheshti, and M. A. Tinati, “Correlation based online dictionary learning algorithm,” IEEE Transactions on Signal Processing, vol. 64, no. 3, pp. 592–602, 2016.
-  J. N. Wilson, P. Gader, W.-H. Lee, H. Frigui, and K. Ho, “A large-scale systematic evaluation of algorithms using ground-penetrating radar for landmine detection and discrimination,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 8, pp. 2560–2572, 2007.
-  L. Robledo, M. Carrasco, and D. Mery, “A survey of land mine detection technology,” International Journal of Remote Sensing, vol. 30, no. 9, pp. 2399–2410, 2009.
-  S. Lameri, F. Lombardi, P. Bestagini, M. Lualdi, and S. Tubaro, “Landmine detection from GPR data using convolutional neural networks,” in European Signal Processing Conference, 2017, pp. 508–512.
-  L. E. Besaw and P. J. Stimac, “Deep convolutional neural networks for classifying GPR B-scans,” Proceedings of SPIE, vol. 9454, p. 945413, 2015.
-  E. D. Sontag, “VC dimension of neural networks,” NATO ASI Series F Computer and Systems Sciences, vol. 168, pp. 69–96, 1998.
-  C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, p. 27, 2011.
-  M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
-  M. S. Lewicki and T. J. Sejnowski, “Learning overcomplete representations,” Neural Computation, vol. 12, no. 2, pp. 337–365, 2000.
-  D. A. Spielman, H. Wang, and J. Wright, “Exact recovery of sparsely-used dictionaries,” in Conference on Learning Theory, 2012, pp. 37.1–37.18.
-  A. Agarwal, A. Anandkumar, P. Jain, P. Netrapalli, and R. Tandon, “Learning sparsely used overcomplete dictionaries,” in Conference on Learning Theory, 2014, pp. 123–137.
-  Z. Jiang, Z. Lin, and L. S. Davis, “Label consistent K-SVD: Learning a discriminative dictionary for recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2651–2664, 2013.
-  Q. Zhang and B. Li, “Discriminative K-SVD for dictionary learning in face recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2691–2698.
-  K. R. Varshney, M. Çetin, J. W. Fisher, and A. S. Willsky, “Sparse representation in structured dictionaries with application to synthetic aperture radar,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3548–3561, 2008.
-  Y. Suo, M. Dao, U. Srinivas, V. Monga, and T. D. Tran, “Structured dictionary learning for classification,” arXiv preprint arXiv:1406.1943, 2014.
-  M. Yang, L. Zhang, X. Feng, and D. Zhang, “Sparse representation based Fisher discrimination dictionary learning for image classification,” International Journal of Computer Vision, vol. 109, no. 3, pp. 209–232, 2014.
-  L. Li, S. Li, and Y. Fu, “Learning low-rank and discriminative dictionary for image classification,” Image and Vision Computing, vol. 32, no. 10, pp. 814–823, 2014.
-  I. Ramirez, P. Sprechmann, and G. Sapiro, “Classification and clustering via dictionary learning with structured incoherence and shared features,” in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3501–3508.
-  S. Kong and D. Wang, “A dictionary learning approach for classification: Separating the particularity and the commonality,” in European Conference on Computer Vision, 2012, pp. 186–199.
-  S. Gao, I. W.-H. Tsang, and Y. Ma, “Learning category-specific dictionary and shared dictionary for fine-grained image categorization,” IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 623–634, 2014.
-  C. Rusu, “On learning with shift-invariant structures,” arXiv preprint arXiv:1812.01115, 2018.
-  L. H. Nguyen and T. D. Tran, “Separation of radio-frequency interference from SAR signals via dictionary learning,” in IEEE Radar Conference, 2018, pp. 0908–0913.
-  C. Garcia-Cardona and B. Wohlberg, “Convolutional dictionary learning: A comparative review and new algorithms,” IEEE Transactions on Computational Imaging, vol. 4, no. 3, pp. 366–381, 2018.
-  T. H. Vu and V. Monga, “Fast low-rank shared dictionary learning for image classification,” IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5160–5175, 2017.
-  B. Dumitrescu and P. Irofti, Dictionary Learning Algorithms and Applications. Springer, 2018.
-  J. Sulam, B. Ophir, M. Zibulevsky, and M. Elad, “Trainlets: Dictionary learning in high dimensions,” IEEE Transactions on Signal Processing, vol. 64, no. 12, pp. 3180–3193, 2016.
-  J. Chen, L. Jiao, W. Ma, and H. Liu, “Unsupervised high-level feature extraction of SAR imagery with structured sparsity priors and incremental dictionary learning,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 10, pp. 1467–1471, 2016.
-  A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
-  R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit,” CS Technion, vol. 40, no. 8, pp. 1–15, 2008.
-  N. Zhou and J. Fan, “Jointly learning visually correlated dictionaries for large-scale visual recognition applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 715–730, 2014.
-  S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
-  M. R. Osborne, B. Presnell, and B. A. Turlach, “A new approach to variable selection in least squares problems,” IMA Journal of Numerical Analysis, vol. 20, no. 3, pp. 389–403, 2000.
-  J. Leckebusch, “Comparison of a stepped-frequency continuous wave and a pulsed GPR system,” Archaeological Prospection, vol. 18, no. 1, pp. 15–25, 2011.
-  C. Warren, A. Giannopoulos, and I. Giannakis, “An advanced GPR modelling framework: The next generation of gprMax,” in IEEE International Workshop on Advanced Ground Penetrating Radar, 2015, pp. 1–4.
-  M. G. M. Hussain, “Principles of high-resolution radar based on nonsinusoidal waves - Part III: Radar-target reflectivity model,” IEEE Transactions on Electromagnetic Compatibility, vol. 32, no. 2, pp. 144–152, 1990.
-  D. Pasculli and G. Manacorda, “Real-time, pseudo real-time and stroboscopic sampling in time-domain GPRs,” in IEEE International Workshop on Advanced Ground Penetrating Radar, 2015, pp. 1–4.
-  A. Bystrov and M. Gashinova, “Analysis of stroboscopic signal sampling for radar target detectors and range finders,” IET Radar, Sonar & Navigation, vol. 7, no. 4, pp. 451–458, 2013.
-  M. A. González-Huici, “Accurate ground penetrating radar numerical modeling for automatic detection and recognition of antipersonnel landmines,” Ph.D. dissertation, Universitäts-und Landesbibliothek Bonn, 2013.
-  H. Hongxing, J. M. Bioucas-Dias, and V. Katkovnik, “Interferometric phase image estimation via sparse coding in the complex domain,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2587–2602, 2015.
-  V. Glivenko, “Sulla determinazione empirica della legge di probabilità,” Giornale Dell’Istituto Italiano Degli Attuari, no. 4, pp. 92–99, 1933, in Italian.
-  F. P. Cantelli, “Sulla determinazione empirica della legge di probabilità,” Giornale Dell’Istituto Italiano Degli Attuari, no. 4, pp. 221–424, 1933, in Italian.
-  T. Vu, L. Nguyen, T. Guo, and V. Monga, “Deep network for simultaneous decomposition and classification in UWB-SAR imagery,” in IEEE Radar Conference, 2018, pp. 0553–0558.
-  R. Girshick, “Fast R-CNN,” in IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
-  I. Giannakis, A. Giannopoulos, and C. Warren, “A realistic FDTD numerical modeling framework of ground penetrating radar for landmine detection,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 1, pp. 37–51, 2016.
-  N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
-  K. V. Mishra and Y. C. Eldar, “Sub-Nyquist radar: Principles and prototypes,” in Compressed Sensing in Radar Signal Processing, A. D. Maio, Y. C. Eldar, and A. Haimovich, Eds. Cambridge University Press, 2019, in press.
-  K. V. Mishra, E. Shoshan, M. Namer, M. Meltsin, D. Cohen, R. Madmoni, S. Dror, R. Ifraimov, and Y. C. Eldar, “Cognitive sub-Nyquist hardware prototype of a collocated MIMO radar,” in IEEE International Workshop on Compressed Sensing Theory and its Applications to Radar, Sonar and Remote Sensing, 2016.
-  K. V. Mishra and Y. C. Eldar, “Performance of time delay estimation in a cognitive radar,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017, pp. 3141–3145.
-  K. V. Mishra, A. Kruger, and W. F. Krajewski, “Compressed sensing applied to weather radar,” in IEEE International Geoscience and Remote Sensing Symposium, 2014, pp. 1832–1835.