Gaussian Affine Feature Detector
A new method is proposed for extracting the geometric information of image features. Using a Gaussian as the input signal, a theoretically optimal solution for calculating a feature’s affine shape is derived. Because it is based on the analytic result of a feature model, the method differs from conventional iterative approaches. From the model, feature parameters such as position, orientation, background luminance, contrast, area and aspect ratio can be extracted.
Tested with synthesized and benchmark data, the method matches or outperforms existing approaches in terms of accuracy, speed and stability. The method can detect small, long or thin objects precisely, and works well under general conditions, such as low-contrast, blurred or noisy images.
LoG, DoG, differential geometry, Hessian, Harris, affine, Fourier, Laplacian.
Detecting two-dimensional signals is more difficult than detecting one-dimensional ones, and many heuristic algorithms have been proposed to deal with it. Following the appearance of scale space theory [12, 7], however, many effective feature detectors have come into being.
Scale space theory was originally proposed by physicists and further developed by computer scientists. It has been studied thoroughly from the viewpoints of vision and mathematics, and a consistent way to find new detectors has been built. Many successful feature detectors are built upon scale space, including [3, 15].
Scale space introduces an additional dimension, namely the scale dimension. In order to get image information such as affine shape parameters, some methods [9, 6, 4] search iteratively along the scale dimension. They are based on fixed-point theory: they will eventually reach a solution if one exists. In practice, however, these methods have several drawbacks, including:
they waste many candidate features;
they produce many duplicated or false features.
To overcome these drawbacks, we propose a new method based on an analytic solution. It matches or outperforms iterative methods with far fewer computational resources.
A feature’s position, orientation, background luminance, contrast, area and aspect ratio can also be extracted from images. Until recently, information such as background luminance and contrast was not commonly used in feature extraction. Others, including area, orientation and aspect ratio, have been studied extensively, but with limited accuracy.
In this paper, a feature model is proposed, and the above-mentioned parameters are calculated from it.
2 Gaussian Affine Shape
In this section, a feature model is first proposed, and then an analytic result is derived from the model to obtain its parameters.
2.1 Feature Model
From the viewpoint of systems theory, images, the feature extractor and features correspond to input, system and output. We need to build a system that transforms input to output; in other words, the image is the system input and the feature parameters are the output. In order to study the behavior of the system, we need to define the input signals. As it is not possible to build an all-purpose feature extractor, we concentrate on some specific image signals. Since the (two-dimensional) Gaussian has nice analytic properties and a simple form, it is chosen as the input signal. As will be shown later, a Gaussian-based model filters out high-frequency signals and is hence well suited to noisy images.
Based on this idea, the image surface is modeled as a Gaussian function, as shown in Fig. 1. In this way, image feature parameters are related to Gaussian parameters, including orientation, long and short radii, baseline height and contrast. Traditionally, baseline height and contrast are not considered in feature extraction; they are included here for completeness. The signal is defined in Equ. 1. Parameters and model variables are listed in Tab. I.
Tab. I. Parameters and model variables (surviving entry: LoG detected scale).
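The feature model can be made concrete with a short sketch. It assumes an unnormalized Gaussian whose peak value equals baseline height plus contrast; Equ. 1 may use a different normalization. The function and parameter names (`gaussian_feature`, `render`, `sigma_long`, `sigma_short` and so on) are illustrative and are not the paper's notation.

```python
import math


def gaussian_feature(x, y, x0, y0, sigma_long, sigma_short, theta, contrast, baseline):
    """Intensity of the modeled feature at pixel (x, y).

    Assumed model: baseline + contrast * exp(-(u^2/sigma_long^2 + v^2/sigma_short^2) / 2),
    where (u, v) are the pixel offsets expressed in the feature's own axes.
    """
    dx, dy = x - x0, y - y0
    # Rotate the offset by -theta so that u runs along the long axis.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return baseline + contrast * math.exp(-0.5 * ((u / sigma_long) ** 2 + (v / sigma_short) ** 2))


def render(size, **params):
    """Sample the model on a size x size pixel grid (row index is y)."""
    return [[gaussian_feature(x, y, **params) for x in range(size)] for y in range(size)]


# Example: an elongated blob rotated by 30 degrees on a 65x65 grid.
image = render(65, x0=32, y0=32, sigma_long=8.0, sigma_short=3.0,
               theta=math.pi / 6, contrast=100.0, baseline=20.0)
```

At the feature center the sample equals baseline plus contrast, and far from the center it decays to the baseline, which is why these two parameters can be treated separately from the shape parameters.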
Before continuing, it is helpful to clarify one fact: rotating an image does not affect our discussion. This fact greatly simplifies the derivation and is proved in Appendix A.
Given this fact, a two-dimensional axis-aligned Gaussian is used as the input signal, as shown in Equ. 2. For this function, we need to recover its two radii, its contrast and its baseline height.
2.2 Feature Detection
Before computing parameters, we need to detect the feature’s position. Many feature detectors exist; we need to choose one that has good performance and a solid mathematical foundation. It is chosen from the family of rotation-invariant differential operators. As defined in Equ. 3, the LoG detector is a good candidate: it is very stable, has a fast implementation, and is Gaussian based. The last property is the most important, because the input signal is also a Gaussian and the two may be closely related.
As shown in Equ. 7, convolving I with G yields another Gaussian, which is called the (Gaussian) scale space. For a zero-shifted signal, its Laplacian attains an extremum at the origin. Normalizing this value gives the normalized Laplacian of Gaussian operation on I, which is the basis of some feature extractors.
Applying LoG to the image yields extreme points; from the information available at these points, we need to recover the radii (standard deviations) of the original input Gaussian.
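A minimal sketch of scale selection at the blob center follows. It assumes the unnormalized axis-aligned Gaussian model and a normalized isotropic Gaussian kernel of standard deviation `t`, for which the blurred signal stays Gaussian with variances `sigma_x**2 + t**2` and `sigma_y**2 + t**2`. The closed form used here is derived from that assumption rather than copied from Equ. 3 to Equ. 7, and the brute-force scan over `t` is for illustration only.

```python
import math


def normalized_log_at_center(sigma_x, sigma_y, contrast, t):
    """Scale-normalized Laplacian t^2 * (L_xx + L_yy) at the center of the blob."""
    u = sigma_x ** 2 + t ** 2
    v = sigma_y ** 2 + t ** 2
    # Height of the blurred blob above the baseline.
    amplitude = contrast * sigma_x * sigma_y / math.sqrt(u * v)
    # For a Gaussian, L_xx = -amplitude / u and L_yy = -amplitude / v at the center.
    return -(t ** 2) * amplitude * (1.0 / u + 1.0 / v)


def detected_scale(sigma_x, sigma_y, contrast=1.0, t_min=0.1, t_max=50.0, steps=20000):
    """Scan t and return the scale with the strongest normalized LoG magnitude."""
    best_t, best_mag = t_min, 0.0
    for i in range(steps + 1):
        t = t_min + (t_max - t_min) * i / steps
        mag = abs(normalized_log_at_center(sigma_x, sigma_y, contrast, t))
        if mag > best_mag:
            best_t, best_mag = t, mag
    return best_t


t_isotropic = detected_scale(4.0, 4.0)   # close to 4.0: an isotropic blob is detected at its own sigma
t_elongated = detected_scale(8.0, 4.0)   # lies between the short and the long radius
```

The baseline height does not appear in this response because it is removed by differentiation, so the detected scale depends on the radii alone.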
2.3 Parameters Calculation
As shown before, an image can be considered as a surface in three-dimensional space. From differential geometry, we know that its Hessian matrix relates directly to the principal curvatures and principal directions, and that for a Gaussian function the principal curvatures are connected with its standard deviations. In short, the eigenvalues of the Hessian matrix relate to the radii and the eigenvectors relate to the directions. We also know that the two principal directions are perpendicular to one another. Based on these facts, we derive formulas for the parameters.
Obviously, convolving with an isotropic Gaussian does not change the principal directions. At an extreme point, we can therefore use the principal direction of the scale-space function as the principal direction of the input signal. For an axis-aligned Gaussian the principal directions are already known; otherwise, we compute the eigenvectors.
The remaining question is how, given the scale-space information, to get the radii of the input signal, its contrast and its baseline height.
Here we exploit the fact that LoG detects a Gaussian at one and only one scale. In other words, every pair of radii produces one and only one detected scale, as shown in Equ. 8. If the analytic form of this relation is determined, we can relate the radii back to the detected scale.
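The principal directions can be read from the 2x2 Hessian in closed form. The sketch below is a generic symmetric eigen-decomposition, not code from the paper; the name `hessian_principal_axes` and the argument names `lxx`, `lxy`, `lyy` (second derivatives of the scale-space function at the extreme point) are assumptions.

```python
import math


def hessian_principal_axes(lxx, lxy, lyy):
    """Eigen-decompose the symmetric Hessian [[lxx, lxy], [lxy, lyy]].

    Returns (lam_plus, theta_plus, lam_minus, theta_minus) with lam_plus >= lam_minus;
    each theta is the direction (radians) of the matching eigenvector.
    """
    mean = 0.5 * (lxx + lyy)
    radius = math.hypot(0.5 * (lxx - lyy), lxy)
    lam_plus, lam_minus = mean + radius, mean - radius
    # Direction of the eigenvector that belongs to lam_plus.
    theta_plus = 0.5 * math.atan2(2.0 * lxy, lxx - lyy)
    # The two principal directions are perpendicular.
    theta_minus = theta_plus + 0.5 * math.pi
    return lam_plus, theta_plus, lam_minus, theta_minus


# For a bright blob both eigenvalues are negative, so lam_plus has the smaller
# magnitude (weaker curvature) and theta_plus points along the long axis.
lam_plus, theta_plus, lam_minus, theta_minus = hessian_principal_axes(-1.0, 0.0, -4.0)
```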
For any input image, is fixed. LoG will detect extreme point in a fixed scale . Let us denote , and .
Apply normalized LoG operator to I, and substitute and , and let , , we get Equ. 9.
Let be constant and plot this expression in Fig. 2. The figure clearly shows that, for , the extreme of is located on a smooth ridge.
For a fixed , the first-order derivative of the formula is zero at the extreme point. After some calculation, we get Equ. 10.
To solve this equation, let and , and we get the quadratic equation Equ. 11.
It is easy to solve, as shown in Equ. 12.
This gives a constraint on the two radii, but we need more information to get their values. As mentioned above, the eigenvalues of the Hessian matrix relate closely to the radii. We calculate the Hessian matrix over scale space, as shown in Equ. 13.
As before, we calculate the eigenvalues of this matrix, and let , . Since our discussion is based on a Gaussian, we can get an analytic solution for the two eigenvalues, shown in Equ. 14.
These two eigenvalues have complicated forms, but their ratio is simpler, as shown in Equ. 15.
We plot this relation in Fig. 3, which shows that the detected scale tends to a constant as the shape becomes elongated. Simply put, elongating a shape contributes little to its detected scale.
With these quantities known, the radii can be solved directly.
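A hedged sketch of this step follows. It assumes the unnormalized axis-aligned Gaussian model and a normalized isotropic kernel of standard deviation t. Writing s = t^2, u = sigma_short^2 + s and v = sigma_long^2 + s, setting the scale derivative of the normalized LoG response at the center to zero gives 1/s - (3/2)(1/u + 1/v) + 2/(u + v) = 0, and the ratio of Hessian eigenvalue magnitudes at the center is r = v/u. These two relations are derived here from the model; they are not copied from Equ. 10 to Equ. 17, whose exact form may differ. The name `recover_radii` is hypothetical.

```python
import math


def recover_radii(t, ratio):
    """Recover (sigma_long, sigma_short) from the detected scale and eigenvalue ratio.

    t     : scale (standard deviation) at which the normalized LoG is extremal
    ratio : |larger eigenvalue| / |smaller eigenvalue| of the Hessian, >= 1
    """
    s = t * t
    r = ratio
    # Substitute v = r * u into the stationarity condition and solve for u.
    u = s * (1.5 * (1.0 + r) / r - 2.0 / (1.0 + r))
    v = r * u
    # u and v are the blurred variances; subtract the kernel variance s.
    return math.sqrt(v - s), math.sqrt(u - s)


sigma_long, sigma_short = recover_radii(t=5.0, ratio=2.5)
```

For `ratio = 1` the expression reduces to `sigma_long = sigma_short = t`, the familiar isotropic case.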
The contrast and baseline height can also be solved in analytic form. Equ. 9 is used to get the contrast. Because the baseline height is a constant component of scale space, it vanishes under differentiation; therefore it can only be solved from the scale space itself. Evaluating the scale-space function at the extreme point gives Equ. 18, from which the baseline height can be solved.
So far, we have calculated all parameters of the zero-shifted, axis-aligned Gaussian. Because the axes can be shifted or rotated, the discussion applies to a Gaussian of any position or rotation. The steps of the algorithm are summarized as follows.
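The following sketch shows one way the last two parameters could be recovered once the radii are known. It again assumes the unnormalized Gaussian model and a normalized isotropic kernel, so the formulas are derived from that assumption and may differ in form from Equ. 9 and Equ. 18. The names `recover_contrast_and_baseline`, `log_value` (normalized LoG response at the extreme point) and `l_value` (scale-space value at the same point) are hypothetical.

```python
import math


def recover_contrast_and_baseline(t, sigma_long, sigma_short, log_value, l_value):
    """Recover (contrast, baseline) at an extreme point detected at scale t."""
    s = t * t
    u = sigma_short ** 2 + s
    v = sigma_long ** 2 + s
    # Normalized LoG at the center is -s * amplitude * (1/u + 1/v); invert it.
    amplitude = -log_value / (s * (1.0 / u + 1.0 / v))
    # Blurring scales the peak by sigma_long * sigma_short / sqrt(u * v); undo it.
    contrast = amplitude * math.sqrt(u * v) / (sigma_long * sigma_short)
    # The baseline survives only in the scale space itself, not in its derivatives.
    baseline = l_value - amplitude
    return contrast, baseline
```

A bright blob yields a negative `log_value` and therefore a positive contrast; a dark blob yields the opposite signs.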
Detect extreme points in normalized LoG space and record the detected scale of each.
Compute the Hessian matrix at the extreme point in the corresponding scale space.
Compute the eigenvectors as the principal directions of the point.
Compute the eigenvalues and divide the one with the larger absolute value by the other to get their ratio.
2.4 Data Transformation
So far, the signal’s radii and angle have been extracted. In order to compare with the results of other methods, we depend on some publicly available tools. Therefore the radii and angle need to be transformed to a common form, such as a symmetric positive definite matrix, as shown in Equ. 19.
If let be signal’s orientation, and , in a similar way as before, we get Equ. 20.
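As an illustration of this transformation, the sketch below builds the symmetric positive definite matrix R diag(1/sigma_long^2, 1/sigma_short^2) R^T for an ellipse whose long axis makes angle `theta` with the x axis. The parameterization a*x^2 + 2*b*x*y + c*y^2 = 1 is a common convention in affine-region evaluation tools and is assumed here; Equ. 19 and Equ. 20 may scale or order the entries differently.

```python
import math


def ellipse_matrix(sigma_long, sigma_short, theta):
    """Return (a, b, c) of the matrix [[a, b], [b, c]] describing the feature ellipse."""
    ct, st = math.cos(theta), math.sin(theta)
    inv_long = 1.0 / sigma_long ** 2
    inv_short = 1.0 / sigma_short ** 2
    a = ct * ct * inv_long + st * st * inv_short
    b = ct * st * (inv_long - inv_short)
    c = st * st * inv_long + ct * ct * inv_short
    return a, b, c


a, b, c = ellipse_matrix(sigma_long=8.0, sigma_short=3.0, theta=math.pi / 6)
```

The determinant a*c - b^2 equals 1/(sigma_long^2 * sigma_short^2), so the matrix is positive definite for any nonzero radii.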
3 Implementation Details
In this section, some important implementation details are outlined.
3.1 Approximation and Adjustment
As shown in Equ. 21, LoG can be approximated by DoG, which, together with the pyramid algorithm, makes the proposed method ready for application. We use a DoG pyramid similar to Lowe’s. The extremum of DoG should be adjusted by a constant multiplier, because its value is used in the later parameter calculation.
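The constant multiplier can be illustrated with the standard first-order relation G(k*sigma) - G(sigma) ≈ (k - 1) * sigma^2 * Laplacian(G), so dividing a DoG value by (k - 1) approximates the scale-normalized LoG. The sketch below checks this on the kernels themselves; the function names are illustrative, and the exact multiplier used in the paper's implementation is not stated in the text.

```python
import math


def gaussian2d(x, y, sigma):
    """Normalized isotropic 2D Gaussian kernel."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def normalized_log(x, y, sigma):
    """Scale-normalized Laplacian of Gaussian: sigma^2 * (G_xx + G_yy)."""
    r2 = x * x + y * y
    return sigma ** 2 * gaussian2d(x, y, sigma) * (r2 / sigma ** 4 - 2.0 / sigma ** 2)


def dog(x, y, sigma, k):
    """Difference of Gaussians between scales k * sigma and sigma."""
    return gaussian2d(x, y, k * sigma) - gaussian2d(x, y, sigma)


def dog_as_normalized_log(x, y, sigma, k):
    """Rescale a DoG value by the constant 1 / (k - 1) to approximate normalized LoG."""
    return dog(x, y, sigma, k) / (k - 1.0)


approx = dog_as_normalized_log(1.0, 1.0, sigma=2.0, k=1.01)
exact = normalized_log(1.0, 1.0, sigma=2.0)
```

The approximation is first order in (k - 1), so it degrades for the larger scale steps typically used in a pyramid; a practical implementation may need a slightly different constant.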
3.2 Removal of False Features
Testing with synthesized data, we found one problem common to several (affine) feature detectors: for a single Gaussian signal, several features are often detected. Some of them have similar radii and orientations and are located around the true position, as shown in Fig. 5 and Fig. 9. Others are false features arising from noise, as shown in Fig. 9 and Fig. 10.
In practice, we found that a large part of the false features come from the sampling and digitization process; that is, they are small, low-contrast features. True features seldom have such properties. Therefore features with small values of these parameters are considered noise.
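A minimal sketch of such a filter is given below. The dictionary keys, the threshold values and the choice to reject a feature only when it is both small and low in contrast are all assumptions made for illustration; the text does not give the thresholds or the exact rule.

```python
def remove_weak_features(features, min_radius=1.5, min_contrast=5.0):
    """Drop features that are both small and low in contrast.

    features     : list of dicts with hypothetical keys 'sigma_short' and 'contrast'
    min_radius   : hypothetical size threshold in pixels
    min_contrast : hypothetical contrast threshold in gray levels
    """
    kept = []
    for feature in features:
        is_small = feature["sigma_short"] < min_radius
        is_faint = abs(feature["contrast"]) < min_contrast
        if not (is_small and is_faint):
            kept.append(feature)
    return kept


candidates = [
    {"sigma_short": 4.0, "contrast": 60.0},   # large and strong
    {"sigma_short": 0.8, "contrast": 2.0},    # small and faint
    {"sigma_short": 0.9, "contrast": 40.0},   # small but strong
]
filtered = remove_weak_features(candidates)
```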
3.3 Detector Threshold
Like SIFT, we use the ratio of principal curvatures (the ratio of the Hessian’s eigenvalues) to remove points on a valley or ridge. To accept more features, the ratio needs to be refined. Combining Equ. 17 and Equ. 16, with , we have Equ. 22.
In order to evaluate the performance of our method, we first test it with synthesized data. In this way, the true parameters are known and can therefore be compared with the calculated ones. We compare the results of our method with those of others, including Harris-Affine, Hessian-Affine and MSER. Only common parameters such as orientation and long and short scale can be compared, because contrast and baseline height are provided only by our method. Nevertheless, we show those results on their own.
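For reference, the SIFT-style curvature test can be sketched as follows. It uses the trace and determinant of the Hessian so that the eigenvalues need not be computed; a point passes when trace^2 / det is below (r + 1)^2 / r, with r = 10 being the value Lowe suggests. The refined threshold of Equ. 22 is not reproduced here, and the function name is an assumption.

```python
def passes_curvature_ratio(lxx, lxy, lyy, max_ratio=10.0):
    """Return True if the principal-curvature ratio is below max_ratio."""
    trace = lxx + lyy
    det = lxx * lyy - lxy * lxy
    if det <= 0.0:
        # Curvatures of opposite sign (or a flat direction): not a blob.
        return False
    return trace * trace / det < (max_ratio + 1.0) ** 2 / max_ratio


# Raising max_ratio accepts more elongated features.
mild = passes_curvature_ratio(-1.0, 0.0, -4.0)
elongated_default = passes_curvature_ratio(-1.0, 0.0, -20.0)
elongated_relaxed = passes_curvature_ratio(-1.0, 0.0, -20.0, max_ratio=30.0)
```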
Gaussian will be used as test image. Image size is 256x256, and gray scale level is . Our method can detect a large range of parameters, and Tab. II lists parameters used in experiments.
4.1 Results of ideal signals
As demonstrated in Fig. 5, Hessian-Affine and Harris-Affine tend to detect duplicated features. Fig. 6 and Fig. 7 show that, for a noise-free Gaussian signal, MSER has the highest accuracy in detecting position and aspect ratio. Our method achieves results similar to MSER’s. Hessian-Affine gets better results than Harris-Affine. Both MSER and our method can detect signals of high aspect ratio, but Hessian-Affine and Harris-Affine are limited to signals of low aspect ratio.
Our method computes the original parameters from the blurred output image. For very long and thin shapes, it may slightly underestimate the true aspect ratio, as shown in Fig. 7.
As shown in Fig. 8, our method and MSER achieve the highest accuracy in detecting short radii. However, in addition to true signals, MSER often finds small concentric signals.
In conclusion, for an ideal noise-free Gaussian, MSER gets the best results, and ours are similar to MSER’s. Hessian-Affine and Harris-Affine are not as stable as MSER and ours.
4.2 Results of noisy signals
Fig. 9 is a typical noisy image. MSER is the most sensitive to noise: even a small amount of noise can affect it seriously. Fig. 10 shows the distance between true and detected points. It is difficult for MSER to differentiate noise from true signals. Therefore we compare only Hessian-Affine, Harris-Affine and our method on noisy images.
Using Mikolajczyk’s evaluation images and toolbox, we get the repeatability shown in Fig. 13. For these noise-free images, MSER gets the highest accuracy, and Hessian-Affine, Harris-Affine and our method have similar results. The two variants of our method are results of different thresholds.
Fig. 14 shows detection results on graffiti under different view angles. Compared with Hessian-Affine and Harris-Affine, MSER and our method detect fewer features. It seems that the former two detect many redundant features. Compared with ours, MSER tends to detect many small features.
In this paper, we have proposed a new feature detector. Compared with other methods, it is very stable, accurate and quick. Tested with Gaussians, our method produces one of the best results for ideal noise-free signals, and for noisy signals it outperforms the others significantly. The proposed method can also extract parameters unavailable to other methods, such as contrast and baseline height.
Tested with benchmark images, the method gets repeatability similar to that of Harris-Affine and Hessian-Affine.
Appendix A Proof of Rotation Invariance for Image Surface
Let be Fourier operator, and be an input function; Fourier transform is shown in Equ. 24.
If the input function rotates by an angle in space, its Fourier transform rotates by the same angle, as shown in Equ. 25.
Convolution in the space domain can be implemented by multiplication in the frequency domain, as shown in Equ. 26.
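For reference, the two properties used above can be written in standard form. The notation below (f for the input function, h for the kernel, R_theta for the rotation matrix, G for the isotropic Gaussian) is assumed for illustration and may not match the symbols of Equ. 24 to Equ. 26.

```latex
% Fourier transform of f
\mathcal{F}\{f\}(\boldsymbol{\omega})
  = \int_{\mathbb{R}^2} f(\mathbf{x})\, e^{-i\,\boldsymbol{\omega}^{\top}\mathbf{x}}\, d\mathbf{x}

% Rotation: substitute y = R_\theta x, with |\det R_\theta| = 1 and R_\theta^{-1} = R_\theta^{\top}
g(\mathbf{x}) = f(R_\theta \mathbf{x})
  \;\Longrightarrow\;
\mathcal{F}\{g\}(\boldsymbol{\omega}) = \mathcal{F}\{f\}(R_\theta \boldsymbol{\omega})

% Convolution theorem
\mathcal{F}\{f * h\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{h\}

% Consequence for an isotropic Gaussian G, whose transform depends only on |\omega|
(f \circ R_\theta) * G = (f * G) \circ R_\theta
```

The last line says that blurring a rotated image gives the rotated blurred image, which is why the derivation can be carried out for an axis-aligned Gaussian without loss of generality.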
The authors would like to thank Dr. Andrea Vedaldi for his excellent open-source code, and Professor Bart ter Haar Romeny for his freely distributed electronic book.
[1] J.J. Koenderink and A.J. van Doorn, "Generic Neighborhood Operators," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 14, No. 6 (1992), pp. 597-605.
[2] B.M. ter Haar Romeny, Front-end vision and multi-scale image analysis. Berlin, Germany: Springer Verlag, 2003.
[3] D.G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, Vol. 60, No. 2 (1 November 2004), pp. 91-110.
[4] K. Mikolajczyk and C. Schmid, "Scale and Affine Invariant Interest Point Detectors," International Journal of Computer Vision, Vol. 60, No. 1 (1 October 2004), pp. 63-86.
[5] T. Lindeberg, "Feature Detection with Automatic Scale Selection," International Journal of Computer Vision, Vol. 30, No. 2 (1 November 1998), pp. 79-116.
[6] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10 (October 2005), pp. 1615-1630.
[7] J.J. Koenderink, "The structure of images," Biological Cybernetics, Vol. 50, No. 5 (August 1984), pp. 363-370, doi:10.1007/BF00336961.
[8] T. Tuytelaars and L. Van Gool, "Matching Widely Separated Views Based on Affine Invariant Regions," International Journal of Computer Vision, Vol. 59, No. 1 (August 2004), pp. 61-85.
[9] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool, "A comparison of affine region detectors," International Journal of Computer Vision, Vol. 65, No. 1 (13 November 2005), pp. 43-72.
[10] J. Matas, O. Chum, M. Urban and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," In Proceedings of the British Machine Vision Conference, Vol. 1 (2002), pp. 384-393.
[11] X. Xu and J. Yang, "Directional SIFT – An Improved Method Using Elliptical Gaussian Pyramid," Chinese Conference on Pattern Recognition (CCPR ’10), pp. 1-5, 2010, doi:10.1109/CCPR.2010.5659135.
[12] A.P. Witkin, "Scale-Space Filtering," In 8th Int. Joint Conf. Artificial Intelligence, Vol. 2 (August 1983), pp. 1019-1022.
[13] A. Vedaldi and B. Fulkerson, "VLFeat: An Open and Portable Library of Computer Vision Algorithms," http://www.vlfeat.org/
[14] T. Kadir, A. Zisserman and M. Brady, "An Affine Invariant Salient Region Detector," In Computer Vision - ECCV 2004, pp. 228-241.
[15] D.G. Lowe, "Object Recognition from Local Scale-Invariant Features," IEEE International Conference on Computer Vision, Vol. 2 (6 August 1999), pp. 1150-1157.