Localized Dictionary Design for Geometrically Robust Sonar ATR



Advancements in Sonar image capture have opened the door to powerful classification schemes for automatic target recognition (ATR). Recent work has seen the application of sparse reconstruction-based classification (SRC) to Sonar ATR, which provides compelling accuracy rates even in the presence of noise and blur. However, existing sparsity-based Sonar ATR techniques assume that the test images exhibit a geometric pose consistent with that of the training set. This work addresses the open challenge of handling Sonar images that are posed inconsistently relative to training. We develop a new localized block-based dictionary design that enables geometric robustness. Further, a dictionary learning method is incorporated to increase performance and efficiency. The proposed SRC with Localized Pose Management (LPM) is shown to outperform the state-of-the-art SIFT feature and SVM approach, owing to its ability to discern background clutter in Sonar images.


J. McKay, V. Monga*
Pennsylvania State University, Department of Electrical Engineering, University Park, PA

R. Raj
U.S. Naval Research Laboratory, Washington, DC

* Supported by Office of Naval Research, Arlington, VA, Grant 0401531

1 Introduction

Figure 1: SRC with localized pose management (LPM) for Sonar ATR.

The threat of mines and other harmful underwater devices has made object identification via automated underwater vehicles (AUVs) a vital area of study for both military and commercial parties. These machines, which offer greater mobility and safety than a human-piloted submersible, can be used to detect targets of interest [stack2011automation]. Two ways for AUVs to do this are active target recognition, where a human assists in the classification, and automated target recognition (ATR). While the former has its benefits, there are times when AUVs cannot incorporate human interaction into their classification. For this reason, we investigate Sonar ATR.

Recent work in Sonar ATR has demonstrated the potential of sparse reconstruction-based classification (SRC) in this field [1] [2] [3]. SRC has been widely popular in computer vision circles since its introduction in [4] in 2009 because of its ability to perform well even under occlusion and noise. The authors of [2] show how this characteristic makes SRC particularly attractive for Sonar ATR given how common noise is in this setting. In addition, they show that SRC can thrive even under significantly constrained training settings, which makes it even more fitting given how large the training sample must otherwise be in Sonar ATR to cover the different looks objects have at different angles.

One issue with SRC methods is their limited ability to handle images whose targets are not aligned in the same manner as those in the training images. In the way [4] and [2] execute their SRC schemes, targets need to have the same location and dimensions, which reduces flexibility in real-world scenarios. We address this issue of inconsistently-posed test images via localized pose management (LPM), which exploits localized geometric information present in the image by sampling the global image with multiple sub-windows and using these to initialize the input to a well-known dictionary learning algorithm. It has been empirically established in diverse application domains [srinivas2013shirc] [6] that localized features (such as corners and edges) reveal more discriminatory information than their global geometric counterparts. Additionally, geometric manipulations, such as transformations between geometric structures, are easier to handle at the local level.

In the following section, we give a summary of what SRC methods are and how they are implemented. Next, we describe LPM in detail with focus on its dictionary learning step. Lastly, we present the results of several experiments comparing our SRC with LPM method against a popular SIFT feature SVM using a dataset of Sonar images provided by the Naval Surface Warfare Center.

2 SRC

Building on the success of sparsity-based methods in compressive sensing [7], the authors of [4] present SRC, a linear modeling framework that requires essentially no formal training and offers robust classification rates even in the presence of noise and blurring. Their work starts by constructing a dictionary, $A$, from the available training images, i.e.

$$A = [A_1, A_2, \ldots, A_K] \qquad (1)$$

where $A_i$ represents the dictionary of vectorized training images corresponding to class $i$ and $K$ is the number of classes. With $A$ we can classify a vectorized test image $y$ by solving for

$$\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad y = Ax$$

where the $\ell_1$ norm typically induces the sparsest solution for $x$ [8]. There are several options for solving the above problem, and the review in [9] presents a comprehensive overview. We found an L1LS method to produce the most satisfactory results for our work.
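The classification rule described above can be illustrated with a small sketch. This is not the authors' implementation: it swaps the L1LS solver for a plain iterative soft-thresholding (ISTA) loop on the relaxed problem of minimizing 0.5‖y − Ax‖² + λ‖x‖₁, and then assigns the class whose coefficients give the smallest reconstruction residual, as in [4]. The function names (`ista`, `src_classify`), the parameters `lam` and `n_iter`, and the toy data are all assumptions made for illustration.

```python
import math

def synthesize(atoms, x):
    """Return A x, where each entry of `atoms` is one column of A."""
    return [sum(xj * atom[i] for xj, atom in zip(x, atoms))
            for i in range(len(atoms[0]))]

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the l1 norm)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(atoms, y, lam=0.01, n_iter=500):
    """Approximately minimize 0.5*||y - A x||_2^2 + lam*||x||_1 (assumed relaxation)."""
    # The squared Frobenius norm upper-bounds ||A||_2^2, so 1/lipschitz is a safe step.
    lipschitz = sum(a * a for atom in atoms for a in atom)
    x = [0.0] * len(atoms)
    for _ in range(n_iter):
        residual = [yi - pi for yi, pi in zip(y, synthesize(atoms, x))]
        step = [sum(a * r for a, r in zip(atom, residual)) / lipschitz
                for atom in atoms]
        x = [soft_threshold(xj + sj, lam / lipschitz) for xj, sj in zip(x, step)]
    return x

def src_classify(atoms, labels, y, lam=0.01):
    """Return (predicted class, per-class residuals) for the test vector y."""
    x = ista(atoms, y, lam)
    residuals = {}
    for c in sorted(set(labels)):
        # Keep only the coefficients that belong to class c.
        x_c = [xj if lab == c else 0.0 for xj, lab in zip(x, labels)]
        approx = synthesize(atoms, x_c)
        residuals[c] = math.sqrt(sum((yi - ai) ** 2 for yi, ai in zip(y, approx)))
    return min(residuals, key=residuals.get), residuals

# Hypothetical toy data: two atoms per class in a 3-dimensional space.
atoms = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],   # class 0
         [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]   # class 1
labels = [0, 0, 1, 1]
predicted, residuals = src_classify(atoms, labels, [1.0, 0.05, 0.0])
```

A dedicated solver would be preferable for real images; the fixed-step loop here is only meant to make the two stages (sparse coding, then class-wise residual comparison) concrete.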

The authors of [2] showed that it is possible to produce compelling classification rates on consistently-posed Sonar test images using SRC. That said, in real-world cases collecting targets that are all arranged in geometrically ideal positions is difficult if not infeasible. For this reason, we present the following algorithm to handle pose diversity for SRC in Sonar ATR.

3 SRC with Localized Pose Management

To use SRC in geometrically diverse Sonar ATR settings, we develop a localized block-based approach. In our method, training images are segmented into several $m$ by $n$ blocks, which are then used as the training images for the dictionary $A$ of (1). Each of the small blocks is assigned the label of the class to which the original larger image belongs. The test image is then also segmented into blocks, and each one is tested against $A$. We use the term “block” instead of “patch” to highlight the fact that we have no intention of applying any feature transformation to the sub-images.
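The block extraction step can be sketched as follows. This is a minimal illustration, assuming that block positions are drawn uniformly at random and that images are stored as lists of pixel rows; the function names, the `seed` parameter, and the row-major vectorization are assumptions rather than details taken from the text.

```python
import random

def extract_blocks(image, block_h, block_w, n_blocks, seed=0):
    """Sample n_blocks vectorized block_h x block_w sub-images at random positions."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    if block_h > rows or block_w > cols:
        raise ValueError("block is larger than the image")
    blocks = []
    for _ in range(n_blocks):
        top = rng.randint(0, rows - block_h)    # randint is inclusive at both ends
        left = rng.randint(0, cols - block_w)
        # Vectorize the block in row-major order; no feature transform is applied.
        blocks.append([image[r][c]
                       for r in range(top, top + block_h)
                       for c in range(left, left + block_w)])
    return blocks

def build_block_dictionary(images, labels, block_h, block_w, n_blocks):
    """Collect blocks from every training image; each block inherits its image's label."""
    atoms, atom_labels = [], []
    for k, (image, label) in enumerate(zip(images, labels)):
        for block in extract_blocks(image, block_h, block_w, n_blocks, seed=k):
            atoms.append(block)
            atom_labels.append(label)
    return atoms, atom_labels

# Hypothetical 6 x 8 image whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(8)] for r in range(6)]
blocks = extract_blocks(image, 3, 2, 5)
atoms, atom_labels = build_block_dictionary(
    [image, image], ["mine", "non-mine"], 3, 2, 4)
```

The same `extract_blocks` routine could be applied to a test image, so that training and test blocks share dimensions.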

There are several routes to determining the class of the test image once all of its blocks have been individually classified via SRC. A majority vote, in which the test image is assigned the most common class among its blocks, is one of the simplest, but it is highly susceptible to misclassification in cluttered images with prominent background features. Another method, which yielded the best results in our experiments, is a tailored maximal likelihood approach [kittler1998combining]. To understand it, consider the following: let $y_j$ be an extracted test block and $x_j$ be its coefficient vector found via SRC corresponding to the dictionary $A$. Define $r_i(y_j)$ as

$$r_i(y_j) = \|y_j - A\,\delta_i(x_j)\|_2$$

where $\delta_i(x_j)$ is a vector that holds all the values of $x_j$ corresponding to class $i$ and presents zero for all other entries. The probability that $y_j$ belongs to class $i$, written $P(i \mid y_j)$, is obtained from these residuals. Thus, the maximal likelihood estimate of the class of the test image $y$ is

$$\hat{c}(y) = \arg\max_{i} \prod_{j=1}^{N} P(i \mid y_j) \qquad (4)$$

where $N$ is the number of blocks extracted from the test image.
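To make the difference between majority voting and likelihood-based fusion concrete, the sketch below combines per-block class residuals in both ways. The mapping from residuals to probabilities is not recoverable from this text, so the inverse-residual normalization used in `block_probabilities` is purely an assumption, as are the function names and the toy residual values. The product over blocks is evaluated as a sum of logarithms for numerical stability.

```python
import math
from collections import Counter

def block_probabilities(residuals, eps=1e-12):
    """Map class residuals to probabilities (ASSUMED inverse-residual weighting)."""
    inverse = {c: 1.0 / (r + eps) for c, r in residuals.items()}
    total = sum(inverse.values())
    return {c: v / total for c, v in inverse.items()}

def fuse_max_likelihood(per_block_residuals):
    """Pick the class that maximizes the product of block probabilities."""
    log_likelihood = {}
    for residuals in per_block_residuals:
        for c, p in block_probabilities(residuals).items():
            log_likelihood[c] = log_likelihood.get(c, 0.0) + math.log(p)
    return max(log_likelihood, key=log_likelihood.get)

def fuse_majority_vote(per_block_residuals):
    """Pick the class that wins the most individual blocks."""
    votes = Counter(min(res, key=res.get) for res in per_block_residuals)
    return votes.most_common(1)[0][0]

# Hypothetical residuals: two blocks weakly resemble background, one strongly a mine.
per_block = [
    {"mine": 1.00, "non-mine": 0.90},
    {"mine": 1.00, "non-mine": 0.90},
    {"mine": 0.05, "non-mine": 1.00},
]
ml_class = fuse_max_likelihood(per_block)
vote_class = fuse_majority_vote(per_block)
```

In this toy case the two ambiguous blocks outvote the confident one under majority voting, whereas the likelihood product lets the confident block dominate, which mirrors the sensitivity to background clutter noted above.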

This strategy as a whole provides a straightforward way to use SRC with test images whose target is not aligned with the training images and/or has dimensions different from those in the training images. In the context of Sonar ATR, this approach offers a translation-invariant way to use SRC without the confusion that rotation invariance would introduce. We note this because Sonar image capture can render quite different images of the same object depending on the angle of the object to the device collecting the data, as Figure 2 shows. SRC with LPM is therefore structured to adhere to the constraints that Sonar imaging imposes.

Figure 2: Sonar images of two differently oriented cylinders.

An outcome of a dictionary design that concatenates many local block images is the issue of handling the very large matrix $A$. The computational stress of SRC using every $m$ by $n$ training block can make the process untenable for most machines, so any approach that reduces the size of $A$ is valuable. Additionally, it is reasonable to believe that there is redundancy within each class’ blocks. In [2], the authors found that SRC can work in Sonar even when the dictionary lacks such redundancy, so these essentially repeated blocks are unnecessary. For these reasons, we regard a dictionary learning procedure as a justifiable strategy to condense $A$.

There are several dictionary learning methods that we could consider. The implementation that [10] outlines has proven to yield highly robust dictionaries with relatively modest computational stress. Their approach entails selecting a random assortment of training samples that are then fed into the Online Dictionary Learning (ODL) algorithm from [mairal2009online] to be further refined. ODL specializes in reducing dictionaries to a condensed, discriminative form intended for sparsity-based applications. The whole process serves as a structured means of overcoming our dictionary redundancy problem and, as we will see in Section 4, can perform well in seeking out mines in Sonar images.
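For reference, the objective that ODL minimizes, as formulated in [mairal2009online], can be written as below. The symbols are notation introduced here for illustration only: $D$ is the learned dictionary with columns $d_k$, $b_t$ is the $t$-th vectorized training block, $\alpha_t$ is its sparse code, $T$ is the number of blocks, and $\lambda$ is the sparsity weight. How the objective is applied to each class within LPM is not specified in this section.

```latex
\min_{D \in \mathcal{C},\; \alpha_1,\ldots,\alpha_T}\;
\frac{1}{T}\sum_{t=1}^{T}
\left( \tfrac{1}{2}\,\lVert b_t - D\alpha_t \rVert_2^2
       + \lambda\,\lVert \alpha_t \rVert_1 \right),
\qquad
\mathcal{C} = \left\{ D : \lVert d_k \rVert_2 \le 1 \ \text{for every column } d_k \right\}
```

The norm constraint on the columns keeps the dictionary from growing arbitrarily large to shrink the sparse codes, and the number of columns of $D$ sets the size of the condensed dictionary.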

Figure 1 provides a diagram of how the LPM strategy proceeds from block extraction to the final classification of a test image.

4 Experiments

To test SRC with LPM on Sonar ATR, we used a dataset provided by the Naval Surface Warfare Center of authentic synthetic aperture Sonar (SAS) captures of 13 backgrounds with 4 separate shapes simulated in various arrangements. Based on similarities, we divided the data into two categories: mine-like and non-mine-like. We used 40 inconsistently-posed test images, twenty per class, for our experiments.

First, we looked to show how much the dictionary learning step of our SRC with LPM impacts classification. We did so by performing a similar block-based scheme that randomly chose every element of its dictionary without filtering through a dictionary learning step. Additionally, we show how the SRC method alone performs on our test images to give a baseline understanding of why a procedure to handle geometric diversity is needed.

All the SRC methods used 18 training images. The two block-based approaches used 18 samples of 60 by 20 pixel blocks from each training image and involved the maximal likelihood scheme of (4). This was implemented using the results of 30 test blocks extracted from each test image.

As Figure 3 depicts, the dictionary learning step provides a 15 percentage point increase in accuracy (79% classification rate vs. 64%) over random sampling alone. The benefits are not class-specific either, as both mines and non-mines saw jumps in performance with dictionary learning. This non-trivial result demonstrates how strongly ODL can increase the viability of SRC with LPM.

That said, even the randomized sampling approach performed better than the straightforward application of SRC to misaligned targets. SRC alone performed poorly, yielding a 33% accuracy rate. Given that SRC assumes the test target shares the location and dimensions of the training targets, this outcome is expected. Without LPM or a similar method, SRC is poorly equipped to tackle real-world classification problems alone.

Figure 3: SRC with LPM (left), LPM without dictionary learning (center), and without LPM (right); standard deviations shown.

Next, we present how well SRC with LPM performs when compared to a popular image classification technique, the SIFT feature SVM. In [11] and [2] the authors applied SIFT feature SVMs to Sonar ATR, the former in order to handle pose diversity. For this reason, we used this algorithm on our test images to provide context for SRC with LPM. Experiments involved 25 training images for the SRC with LPM and 50 for the SIFT feature SVM. Our approach used 15 samples of 60 by 20 pixel blocks for the dictionary and 30 blocks taken from each test image for classification.

Figure 4: SRC with LPM (left) vs. SIFT Feature SVM (right). SRC used 25 training samples and the SVM used 50. Standard deviation of trials shown.

Figure 4 shows that our SRC with LPM outperforms the SIFT SVM method both in overall classification rate (79% vs. 68%) and in correctly identifying mines (90% vs. 38%). It appears that the SIFT SVM has a hard time discerning background clutter from the rounded edges of the non-mines, which makes for a high rate of false negatives. For Sonar ATR, this tendency to miss potentially threatening objects could be disastrous. On the other hand, the SRC with LPM was able to deliver high mine-object classification rates with half the training data of the SIFT SVM. This further confirms the work of [2] in showing how SRC can thrive even with limited training.

Lastly, we considered images with noise. The process of Sonar image capture, especially SAS, can be susceptible to fair amounts of noise [12]. Thus, Sonar ATR methods have to show a certain degree of resiliency to this hindrance in order to prove their merit in real-world settings. For the following, we added varying intensities of salt and pepper noise to the test images and observed how each method performed.
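The corruption step can be sketched as follows. This is a minimal illustration that assumes an 8-bit intensity range (0 for pepper, 255 for salt), an equal chance of salt or pepper at each corrupted pixel, and images stored as lists of rows; the function name and its parameters are hypothetical.

```python
import random

def add_salt_and_pepper(image, fraction, low=0, high=255, seed=0):
    """Return a copy of image with the given fraction of pixels set to low or high."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    noisy = [row[:] for row in image]          # leave the input image untouched
    n_corrupt = round(fraction * rows * cols)
    # Choose distinct pixel positions so exactly n_corrupt pixels are replaced.
    for index in rng.sample(range(rows * cols), n_corrupt):
        r, c = divmod(index, cols)
        noisy[r][c] = high if rng.random() < 0.5 else low
    return noisy

# Hypothetical flat gray 10 x 10 test image with 25% of its pixels corrupted.
clean = [[128] * 10 for _ in range(10)]
noisy = add_salt_and_pepper(clean, 0.25)
```

Sweeping `fraction` over a range of values and re-running each classifier on the corrupted test images would produce a comparison of the kind reported next.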

Figure 5: SRC with LPM vs. SIFT feature SVM with noise.

Figure 5 shows that the SIFT feature SVM has great difficulty classifying the mine-like objects, while the SRC with LPM still retains classification rates above 50% for the same targets, even under 25% pixel corruption. The overall rate for the SIFT feature SVM is buoyed by its non-mine classification, but its trouble with mines is highly problematic. The SRC with LPM sees some impact from noise, but its resilience in avoiding substantial false negatives gives it a great deal of value in real-world settings.


  • [1] Naveen Kumar, Qun Feng Tan, and Shrikanth S Narayanan, “Object classification in sidescan sonar images with sparse representation techniques,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012, pp. 1333–1336.
  • [2] John McKay, Raghu Raj, Vishal Monga, and Jason Isaacs, “Discriminative sparsity in sonar atr,” Oceans 2015 Washington, DC, 2015.
  • [3] R. Fandos, L. Sadamori, and A.M. Zoubir, “Sparse representation based classification for mine hunting using synthetic aperture sonar,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, March 2012, pp. 3393–3396.
  • [4] John Wright, Allen Y Yang, Arvind Ganesh, Shankar S Sastry, and Yi Ma, “Robust face recognition via sparse representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 2, pp. 210–227, 2009.
  • [5] Tiep H Vu, Hojjat S Mousavi, Vishal Monga, UK Rao, and Ganesh Rao, “Dfdl: Discriminative feature-oriented dictionary learning for histopathological image classification,” IEEE Transactions on Medical Imaging, vol. 33, no. 5, pp. 1163–1179, May 2015.
  • [6] Daniel D. Lee and H. Sebastian Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
  • [7] Emmanuel J Candès and Michael B Wakin, “An introduction to compressive sampling,” Signal Processing Magazine, IEEE, vol. 25, no. 2, pp. 21–30, 2008.
  • [8] David L Donoho, “For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution,” Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797–829, 2006.
  • [9] Allen Y Yang, S Shankar Sastry, Arvind Ganesh, and Yi Ma, “Fast ℓ1-minimization algorithms and an application in robust face recognition: A review,” in Image Processing (ICIP), 2010 17th IEEE International Conference on. IEEE, 2010, pp. 1849–1852.
  • [10] Tiep Vu, Hojjat Mousavi, Vishal Monga, Ganesh Rao, and Arvind Rao, “Histopathological image classification using discriminative feature-oriented dictionary learning,” Medical Imaging, IEEE Transactions on.
  • [11] Zhaotong Zhu, Xiaomei Xu, Liangliang Yang, Huicheng Yan, Shibao Peng, and Jia Xu, “A model-based sonar image atr method based on sift features,” in OCEANS 2014 - TAIPEI, April 2014, pp. 1–4.
  • [12] Michael P Hayes and Peter T Gough, “Synthetic aperture sonar: a review of current status,” Oceanic Engineering, IEEE Journal of, vol. 34, no. 3, pp. 207–224, 2009.
  • [13] H.S. Mousavi, V. Monga, and T.D. Tran, “Iterative convex refinement for sparse recovery,” Signal Processing Letters, IEEE, vol. 22, no. 11, pp. 1903–1907, Nov 2015.
  • [14] Umamahesh Srinivas, Yuanming Suo, Minh Dao, Vishal Monga, and Trac D Tran, “Structured sparse priors for image classification,” Image Processing, IEEE Transactions on, June 2015.
  • [15] Raghu G. Raj and A.C. Bovik, “A hierarchical bayesian-map approach to computational imaging,” IEEE International Conference on Image Processing, 2014.