Localized Dictionary Design for Geometrically Robust Sonar ATR
Advancements in Sonar image capture have opened the door to powerful classification schemes for automatic target recognition (ATR). Recent work has seen the application of sparse reconstruction-based classification (SRC) to Sonar ATR, which provides compelling accuracy even in the presence of noise and blur. However, existing sparsity-based Sonar ATR techniques assume that test images exhibit a geometric pose consistent with the training set. This work addresses the open challenge of handling Sonar images posed inconsistently with respect to training. We develop a new localized block-based dictionary design that enables geometric robustness, and we incorporate a dictionary learning method to increase performance and efficiency. The proposed SRC with localized pose management (LPM) is shown to outperform a state-of-the-art SIFT-feature SVM approach, owing to its power to discern background clutter in Sonar images.
J. McKay, V. Monga (supported by Office of Naval Research, Arlington, VA, Grant 0401531)
Pennsylvania State University
Department of Electrical Engineering
University Park, PA
R. Raj
U.S. Naval Research Laboratory
The threat of mines and other harmful underwater devices has made object identification via automated underwater vehicles (AUVs) a vital area of study for both military and commercial parties. These machines, which offer greater mobility and safety than a human-piloted submersible, can be used to detect targets of interest [stack2011automation]. Two ways for AUVs to do this are active target recognition, where a human assists in the classification, and automated target recognition (ATR). While the former has its benefits, there are times when an AUV cannot incorporate human interaction into its classification. For this reason, we investigate Sonar ATR.
Recent work in Sonar ATR has demonstrated the potential of sparse reconstruction-based classification (SRC) in this field. SRC has been widely popular in computer vision circles since its introduction in 2009 because of its ability to perform even when pressed with occlusion and noise. This characteristic makes SRC particularly attractive to Sonar ATR given how common noise is in this setting. In addition, SRC can thrive even under significantly constrained training settings, making it all the more fitting given how large the training sample must be in Sonar ATR to capture the different looks objects have at different angles.
One issue with SRC methods is their inability to handle images whose targets are not aligned in the same manner as the training images. As previously executed, SRC schemes require targets to have the same location and dimension, reducing flexibility in real-world scenarios. We address this issue of inconsistently posed test images via localized pose management (LPM), which exploits localized geometric information present in the image by sampling the global image with multiple sub-windows and uses these to initialize the input to a well-known dictionary learning algorithm. It has been empirically established in diverse application domains [srinivas2013shirc] that localized features (such as corners and edges) reveal more discriminatory information than their global geometric counterparts. Additionally, geometric manipulations, such as transformations between geometric structures, are easier to handle at the local level.
In the following section, we summarize what SRC methods are and how they are implemented. Next, we describe LPM in detail with a focus on its dictionary learning step. Lastly, we present the results of several experiments comparing our SRC with LPM against a popular SIFT-feature SVM using a dataset of Sonar images provided by the Naval Surface Warfare Center.
Building on the success of sparsity-based methods in compressive sensing, SRC is a linear modeling framework that requires essentially no formal training and delivers robust classification rates even when pressed with noise and blurring. The approach starts by constructing a class-specific dictionary D using the available training images, i.e.

D = [D_1, D_2, ..., D_C],   (1)

where D_i represents the dictionary of vectorized training images corresponding to class i. With D we can classify a vectorized test image y by solving for

x̂ = argmin_x ||x||_1 subject to ||y - Dx||_2 ≤ ε,   (2)

where the ℓ1 norm typically induces the sparsest solution for x. There are several options with which to solve the above problem, and comprehensive overviews are available in the literature. We found an L1LS method to produce the most satisfactory results for our work.
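The SRC decision rule of (1)-(2) can be sketched in a few lines of numpy. The ISTA solver below is a simple stand-in for the L1LS method the paper uses, and the class decision by per-class reconstruction residual follows the standard SRC recipe; names and the regularization weight `lam` are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def ista_l1(D, y, lam=0.1, n_iter=500):
    """Approximately solve min_x 0.5*||y - Dx||^2 + lam*||x||_1 via ISTA.

    A simple stand-in for the L1LS solver used in the paper.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

def src_classify(D, labels, y):
    """Assign y the class whose training columns best reconstruct it."""
    x = ista_l1(D, y)
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)
```

With a toy two-class dictionary whose columns are unit-normalized, a test vector lying near one class's columns is reconstructed with a small residual by that class alone, which is exactly the discrimination mechanism SRC relies on.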
Prior work showed that it is possible to produce compelling classification rates on consistently-posed Sonar test images using SRC. That said, in real-world cases the ability to collect targets all arranged in geometrically ideal positions is difficult if not infeasible. For this reason, we present the following algorithm to handle pose diversity for SRC in Sonar ATR.
3 SRC with Localized Pose Management
To use SRC in geometrically diverse Sonar ATR settings, we develop a localized block-based approach. Images are segmented into several m-by-n blocks, which are then used as the training images for the dictionary D of (1). Each of the small blocks is assigned the label of the class of the original larger image. The test image is then also segmented into blocks, and each one is tested against D. We use the term "block" instead of "patch" to highlight the fact that we have no intention of applying any feature transformation to the sub-images.
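The block extraction step can be sketched as follows. The 60-by-20 block size matches the experiments later in the paper; the uniform random sampling of block positions is one plausible scheme, since the paper does not spell out how block locations are chosen.

```python
import numpy as np

def extract_blocks(img, block_shape=(60, 20), n_blocks=18, seed=None):
    """Sample fixed-size blocks from a Sonar image and vectorize them.

    Returns a (block_pixels x n_blocks) matrix whose columns can be
    concatenated across training images to form the dictionary D.
    """
    rng = np.random.default_rng(seed)
    h, w = block_shape
    H, W = img.shape
    blocks = []
    for _ in range(n_blocks):
        r = rng.integers(0, H - h + 1)       # top-left row of the block
        c = rng.integers(0, W - w + 1)       # top-left column of the block
        blocks.append(img[r:r + h, c:c + w].reshape(-1))
    return np.stack(blocks, axis=1)
```

Every column inherits the class label of its source image, so a two-class training set of K images yields 18K labeled atoms before any dictionary learning.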
There are several routes to determining the class of the test image once all of its blocks have individually been classified via SRC. A majority vote, in which the test image is assigned the most common class among its blocks, is one of the simplest, but it is highly susceptible to misclassification in cluttered images with prominent background features. Another method, which has yielded the best results in our own experiments, is a tailored maximal likelihood approach [kittler1998combining]. To understand it, consider the following: let y_i be an extracted test block and x̂_i be its coefficient vector found via SRC with respect to the dictionary D. Define δ_c(x̂_i) as the vector that holds all the values of x̂_i corresponding to class c and presents zero for all other entries. The probability that y_i belongs to any of the C classes is

P(c | y_i) = ||δ_c(x̂_i)||_1 / ||x̂_i||_1.   (3)

Thus, the maximal likelihood estimate of the class of the test image is

ĉ = argmax_c Σ_{i=1}^{N} log P(c | y_i),   (4)

where N is the number of blocks extracted from the test image.
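The combining rule of (3)-(4) reduces to a few lines of numpy: per-block class probabilities come from the ℓ1 mass of each class's coefficients, and the image-level label maximizes the summed log-probabilities. This is a sketch under the reading of (3) given above; the small `eps` guarding log(0) is an implementation detail we add, not part of the paper.

```python
import numpy as np

def block_probabilities(x, labels):
    """P(c | block) from the l1 mass of the block's SRC coefficients (Eq. 3)."""
    classes = np.unique(labels)
    mass = np.array([np.abs(x[labels == c]).sum() for c in classes])
    return classes, mass / mass.sum()

def ml_vote(coeffs, labels, eps=1e-12):
    """Maximal-likelihood image label over all N test blocks (Eq. 4)."""
    classes = np.unique(labels)
    log_lik = np.zeros(len(classes))
    for x in coeffs:                       # one coefficient vector per block
        _, p = block_probabilities(x, labels)
        log_lik += np.log(p + eps)         # eps guards against log(0)
    return classes[np.argmax(log_lik)]
```

Unlike a majority vote, a block whose coefficient mass is split 51/49 contributes almost nothing to the decision, which is what makes the scheme more robust to ambiguous background blocks.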
This strategy as a whole provides a straightforward way to use SRC with test images whose targets are not aligned with the training set and/or differ in dimension from the training set. In the context of Sonar ATR, this approach offers a translationally invariant method by which to use SRC without introducing rotational-invariance confusion. We note this because Sonar image capture can render quite different images of the same shaped object depending on the angle of the object relative to the device collecting the data, as figure 2 shows. SRC with LPM is therefore structured to adhere to the constraints that Sonar imaging imposes.
An outcome of a dictionary design that concatenates many local block images is the very large matrix D it produces. The computational stress of running SRC with every m-by-n training block can make the process untenable for most machines, so any approach that reduces the size of D is valuable. Additionally, it is reasonable to believe that there is redundancy within each class's blocks. Prior work found that SRC can work in Sonar even when the dictionary lacks such redundancy, so these essentially repeated blocks are unnecessary. For these reasons, we adopt a dictionary learning procedure as a justifiable strategy to condense D.
There are several dictionary learning methods we could consider. One implementation has proven to yield highly robust dictionaries with relatively modest computational stress: a random assortment of training samples is selected and then fed into the Online Dictionary Learning (ODL) algorithm of [mairal2009online] for further refinement. ODL specializes in reducing dictionaries to a condensed, discriminative form intended for sparsity-based applications. The whole process serves as a structured means of overcoming our dictionary redundancy problem and, as we will see in section 4, performs well in seeking out mines in Sonar images.
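To make the condensation step concrete, the sketch below alternates sparse coding with a least-squares dictionary update. This is a simplified batch stand-in for the ODL algorithm of [mairal2009online], not that algorithm itself (true ODL processes samples online with per-atom block-coordinate updates); all names and parameters here are illustrative.

```python
import numpy as np

def learn_dictionary(Y, n_atoms, lam=0.1, n_iter=20, seed=0):
    """Condense training blocks Y (features x samples) into n_atoms atoms.

    Alternates a soft-thresholded sparse coding step with a least-squares
    dictionary update, renormalizing atoms each pass.
    """
    rng = np.random.default_rng(seed)
    # initialize atoms from randomly chosen training blocks
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_iter):
        # sparse coding step: one ISTA pass over all samples
        L = np.linalg.norm(D, 2) ** 2
        Z = X - D.T @ (D @ X - Y) / L
        X = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
        # dictionary update step: least-squares fit to the current codes
        D = Y @ X.T @ np.linalg.pinv(X @ X.T + 1e-8 * np.eye(n_atoms))
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D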
Figure 1 provides a diagram of how the LPM strategy follows from the block extraction to the final classification of a test image.
To test SRC with LPM on Sonar ATR, we used a dataset provided by the Naval Surface Warfare Center of authentic synthetic aperture Sonar (SAS) captures of 13 backgrounds with 4 separate shapes simulated in various arrangements. Based on similarities, we divided the data into two categories: mine-like and non-mine-like. We used 40 inconsistently posed test images, twenty per class, for our experiments.
First, we sought to show how much the dictionary learning step of SRC with LPM impacts classification. We did so by running a similar block-based scheme that randomly chose every element of its dictionary without filtering through a dictionary learning step. Additionally, we show how SRC alone performs on our test images to give a baseline understanding of why a procedure to handle geometric diversity is needed.
All the SRC methods used 18 training images. The two block-based approaches used 18 samples of 60-by-20-pixel blocks from each training image and employed the maximal likelihood scheme of (4), implemented using the results of 30 test blocks extracted from each test image.
As figure 3 depicts, the dictionary learning step provides a 15% increase in accuracy over random sampling alone (a 79% classification rate vs. 64%). The benefits are not class-specific, either, as both mines and non-mines saw jumps in performance with dictionary learning. This non-trivial result demonstrates how powerfully ODL can increase the viability of SRC with LPM.
That said, even the randomized sampling approach performed better than the straightforward application of SRC to mis-aligned targets. SRC alone performed poorly, yielding a 33% accuracy rate. Given how SRC works, this poor performance is expected: without LPM or a similar method, SRC is ill equipped to tackle real-world classification problems.
Next, we present how SRC with LPM performs compared to a popular image classification technique, the SIFT-feature SVM. Prior work has applied SIFT-feature SVMs to Sonar ATR, in part to handle pose diversity. For this reason, we ran this algorithm on our test images to provide context for SRC with LPM. Experiments involved 25 training images for SRC with LPM and 50 for the SIFT-feature SVM. Our approach used 15 samples of 60-by-20-pixel blocks for the dictionary and 30 blocks taken from each test image for classification.
Figure 4 shows that SRC with LPM outperforms the SIFT SVM method in overall classification rate (79% vs. 68%) and in correctly identifying mines (90% vs. 38%). It appears that the SIFT SVM has a hard time discerning background clutter from the rounded edges of the non-mines, producing a high rate of false negatives. For Sonar ATR, this tendency to miss potentially threatening objects could be disastrous. By contrast, SRC with LPM achieved high mine classification rates with half the training of the SIFT SVM, further confirming that SRC can thrive even with limited training.
Lastly, we considered images with noise. The process of Sonar image capture, especially SAS, can be susceptible to fair amounts of noise. Sonar ATR methods must therefore show a degree of resiliency to this hindrance to prove their merit in real-world settings. For the following, we added varying intensities of salt-and-pepper noise to the test images and observed how each method performed.
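A typical way to apply the salt-and-pepper corruption used in this experiment is sketched below; the 25% fraction mirrors the harshest level reported here, while the even salt/pepper split is a common convention we assume rather than a detail stated in the paper.

```python
import numpy as np

def salt_and_pepper(img, frac=0.25, seed=None):
    """Corrupt a fraction of pixels with salt (max) or pepper (min) values."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    mask = rng.random(img.shape) < frac     # which pixels get corrupted
    salt = rng.random(img.shape) < 0.5      # half the hits become salt
    out[mask & salt] = img.max()
    out[mask & ~salt] = img.min()
    return out
```

Because whole pixels are replaced rather than perturbed, this noise model directly stresses the per-block reconstruction in (2), which is where SRC's robustness to gross corruption is expected to show.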
Figure 5 shows that the SIFT-feature SVM has great difficulty classifying the mine-like objects, while SRC with LPM retains classification rates above 50% for the same targets even under 25% pixel corruption. The overall rate for the SIFT-feature SVM is buoyed by its non-mine classification, but its trouble with mines is highly problematic. SRC with LPM sees some impact from noise, but its resiliency in avoiding substantial false negatives gives it a great deal of value in real-world settings.
-  Naveen Kumar, Qun Feng Tan, and Shrikanth S Narayanan, “Object classification in sidescan sonar images with sparse representation techniques,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012, pp. 1333–1336.
-  John McKay, Raghu Raj, Vishal Monga, and Jason Isaacs, “Discriminative sparsity in sonar atr,” Oceans 2015 Washington, DC, 2015.
-  R. Fandos, L. Sadamori, and A.M. Zoubir, “Sparse representation based classification for mine hunting using synthetic aperture sonar,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, March 2012, pp. 3393–3396.
-  John Wright, Allen Y Yang, Arvind Ganesh, Shankar S Sastry, and Yi Ma, “Robust face recognition via sparse representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 2, pp. 210–227, 2009.
-  Tiep H Vu, Hojjat S Mousavi, Vishal Monga, UK Rao, and Ganesh Rao, “Dfdl: Discriminative feature-oriented dictionary learning for histopathological image classification,” IEEE Transactions on Medical Imaging, vol. 33, no. 5, pp. 1163–1179, May 2015.
-  Daniel D. Lee and H. Sebastian Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
-  Emmanuel J Candès and Michael B Wakin, “An introduction to compressive sampling,” Signal Processing Magazine, IEEE, vol. 25, no. 2, pp. 21–30, 2008.
-  David L Donoho, “For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution,” Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797–829, 2006.
-  Allen Y Yang, S Shankar Sastry, Arvind Ganesh, and Yi Ma, “Fast ℓ1-minimization algorithms and an application in robust face recognition: A review,” in Image Processing (ICIP), 2010 17th IEEE International Conference on. IEEE, 2010, pp. 1849–1852.
-  Tiep Vu, Hojjat Mousavi, Vishal Monga, Ganesh Rao, and Arvind Rao, “Histopathological image classification using discriminative feature-oriented dictionary learning,” Medical Imaging, IEEE Transactions on.
-  Zhaotong Zhu, Xiaomei Xu, Liangliang Yang, Huicheng Yan, Shibao Peng, and Jia Xu, “A model-based sonar image atr method based on sift features,” in OCEANS 2014 - TAIPEI, April 2014, pp. 1–4.
-  Michael P Hayes and Peter T Gough, “Synthetic aperture sonar: a review of current status,” Oceanic Engineering, IEEE Journal of, vol. 34, no. 3, pp. 207–224, 2009.
-  H.S. Mousavi, V. Monga, and T.D. Tran, “Iterative convex refinement for sparse recovery,” Signal Processing Letters, IEEE, vol. 22, no. 11, pp. 1903–1907, Nov 2015.
-  Umamahesh Srinivas, Yuanming Suo, Minh Dao, Vishal Monga, and Trac D Tran, “Structured sparse priors for image classification,” Image Processing, IEEE Transactions on, June 2015.
-  Raghu G. Raj and A.C. Bovik, “A hierarchical Bayesian-MAP approach to computational imaging,” IEEE International Conference on Image Processing, 2014.