Distributed High Dimensional Information Theoretical Image Registration via Random Projections^{✩}
Abstract
Information theoretical measures, such as entropy, mutual information, and various divergences, exhibit robust characteristics in image registration applications. However, the estimation of these quantities is computationally intensive in high dimensions. On the other hand, consistent estimation from pairwise distances of the sample points is possible, which suits random projection (RP) based low dimensional embeddings. We adapt the RP technique to this task by means of a simple ensemble method. To the best of our knowledge, this is the first distributed, RP based information theoretical image registration approach. The efficiency of the method is demonstrated through numerical examples.
Eötvös Loránd University, Department of Software Technology and Methodology Pázmány Péter sétány 1/C, Budapest, H1117, Hungary
Keywords: random projection, information theoretical image registration, high dimensional features, distributed solution
1 Introduction
Machine learning methods are notoriously limited by the high dimensional nature of the data. This problem may be alleviated via the random projection (RP) technique, which has been successfully applied, e.g., in the fields of classification Fradkin and Madigan (2003); Deegalla and Boström (2006); Goel et al. (2005), clustering Fern and Brodley (2003), independent subspace analysis Szabó and Lőrincz (2009), search for approximate nearest neighbors Ailon and Chazelle (2006), dimension estimation of manifolds Hegde et al. (2007), estimation of geodesic paths Mahmoudi et al. (2008), learning mixtures of Gaussians Dasgupta (2000), compression of image and text data Bingham and Mannila (2001), data stream computation Li et al. (2007); Menon et al. (2007) and reservoir computing Lukoševičius and Jaeger (2009). For a recent RP review, see Vempala (2005). We note that the RP technique is closely related to the signal processing method of compressed sensing Baraniuk et al. (2008).
As shown recently in a number of works Özuysal et al. (2010); Kokiopoulou et al. (2009); Akselrod-Ballin et al. (2009); Healy and Rohde (2007), the RP approach has potential in patch classification and image registration. For example, Özuysal et al. (2010) combine the votes of random binary feature groups (ferns) for the classification of random patches in a naive Bayes framework. Promising registration methods using Euclidean distance and correlation based measures have been introduced in Kokiopoulou et al. (2009) and Akselrod-Ballin et al. (2009); Healy and Rohde (2007), respectively.
Information theoretical cost functions, however, exhibit more robust properties in multimodal image registration Neemuchwala et al. (2007); Kybic (2004); Bardera et al. (2006). Neemuchwala et al. (2007) and Kybic (2004) apply k-nearest neighbor based estimation. However, the computation of these quantities is costly in high dimensions Arya et al. (1998), and the different image properties (e.g., colors, intensities of neighborhood pixels, gradient information, output of spatial filters, texture descriptors) may easily lead to high dimensional representations. In Bardera et al. (2006), the task is formulated as the estimation of discrete mutual information, and the solution is accomplished by equidistant sampling of points from randomly positioned straight lines. The method estimates a histogram whose size grows with the number of bins of the image, which may considerably limit computational efficiency.
Here we address the problem of information theoretical image registration in the case of high dimensional features. In particular, we demonstrate that Shannon's multidimensional differential entropy can be efficiently estimated for high dimensional image registration purposes through RP methods. Our solution enables distributed evaluation. The presented approach extends the method presented in Szabó and Lőrincz (2009) in the context of independent subspace analysis (ISA) Cardoso (1998), where we exploited the fact that, under certain conditions, ISA can be formulated as the optimization of a sum of entropies Szabó et al. (2007). To the best of our knowledge, we present here the first distributed RP based information theoretical image registration approach.
2 Background
First, we describe the image registration task (Section 2.1) followed by low distortion embeddings and random projections (Section 2.2).
2.1 The Image Registration Problem
In image registration one has two images, $I_{\mathrm{ref}}$ and $I_{\mathrm{test}}$, as well as a family of geometrical transformations, such as scaling, translation, affine transformations, and warping. We assume that the transformations can be described by a parameter $\theta$, and let $\Theta$ denote the set of possible parameters. Let the transformation with parameter $\theta$ applied to $I_{\mathrm{test}}$ produce $I_{w(\theta)}$. The goal of image registration is to find the transformation (parameter $\theta$) for which the warped test image $I_{w(\theta)}$ is the ‘closest’ possible to the reference image $I_{\mathrm{ref}}$. Formally, the task is
$$\hat{\theta} = \arg\max_{\theta\in\Theta} S\left(I_{\mathrm{ref}}, I_{w(\theta)}\right),$$
where the similarity of two images is given by the similarity measure $S$. Registration depends on the choice of the similarity measure; one may use, among others, the $\ell_1$ or $\ell_2$ norm, or various information theoretical similarity measures.
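The maximization over the parameter set can be illustrated by a brute-force search. The sketch below is not the method of this paper; it assumes, purely for illustration, that images are small 2D lists of floats, that the transformation family consists of integer circular translations $\theta = (dy, dx)$, and that the similarity is the negative sum of squared differences (an $\ell_2$-type measure). The names `shift`, `neg_ssd` and `register` are hypothetical.

```python
import itertools
import random


def shift(img, theta):
    """Hypothetical warp: circularly translate a 2D list of floats by theta = (dy, dx)."""
    dy, dx = theta
    h, w = len(img), len(img[0])
    return [[img[(i - dy) % h][(j - dx) % w] for j in range(w)] for i in range(h)]


def neg_ssd(a, b):
    """Similarity: negative sum of squared pixel differences (an l2-type measure)."""
    return -sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def register(reference, test, thetas, similarity=neg_ssd, warp=shift):
    """Exhaustive search for the theta maximizing similarity(reference, warp(test, theta))."""
    return max(thetas, key=lambda theta: similarity(reference, warp(test, theta)))


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [[rng.random() for _ in range(16)] for _ in range(16)]
    test = shift(ref, (2, 1))  # test image = reference translated by (2, 1)
    grid = list(itertools.product(range(-3, 4), repeat=2))
    print(register(ref, test, grid))  # (-2, -1): the translation undoing (2, 1)
```

Replacing `neg_ssd` by a (negated) entropy estimate of the joint features would turn the same search into an information theoretical variant; only the similarity function changes.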
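Let $\mathbf{f}(I,p)$ denote the feature of image $I$ associated with pixel $p$. In the simplest case, the feature is the pixel itself, but one can choose a neighborhood of the pixel, edge information at and around the pixel, the RGB values for colored images, or combinations of these. For registration based on the $\ell_q$ norm ($q\in\{1,2\}$), the cost function to be minimized takes the form
$$J_q(\theta) = \sum_{p}\left\|\mathbf{f}(I_{\mathrm{ref}},p)-\mathbf{f}(I_{w(\theta)},p)\right\|_q^q,$$
where $\|\mathbf{v}\|_q^q = \sum_i |v_i|^q$ for a vector $\mathbf{v}$. Instead of measuring the similarity of features by the $\ell_1$ or $\ell_2$ norm, one may consider similarity by means of information theoretical concepts. For example, we may take the negative of the joint entropy of the features of images $I_{\mathrm{ref}}$ and $I_{w(\theta)}$ as our objective function García-Arteaga and Kybic (2008):
$$J(\theta) = -H\left(\left[\mathbf{f}(I_{\mathrm{ref}},p);\,\mathbf{f}(I_{w(\theta)},p)\right]\right), \tag{1}$$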
where $H$ denotes Shannon's multidimensional differential entropy Cover and Thomas (1991), evaluated on the concatenated feature samples indexed by the pixels $p$. One may replace the entropy in (1) by other quantities, e.g., by Rényi's entropy, mutual information, or divergences, to mention some of the candidate similarity measures Neemuchwala et al. (2007).
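A minimal sketch of how the joint feature samples entering (1) could be assembled. As an assumption for illustration, the feature of a pixel is its $(2r+1)\times(2r+1)$ neighborhood, with a hypothetical radius `r` and circular handling of image borders; `patch_feature` and `joint_samples` are hypothetical names. Each resulting vector has dimension $2(2r+1)^2$ for grayscale images, which shows how quickly neighborhood features become high dimensional.

```python
def patch_feature(img, i, j, r):
    """f(I, p): flattened (2r+1) x (2r+1) neighborhood of pixel p = (i, j).

    Pixels outside the image wrap around (circular indexing), an illustrative choice.
    """
    h, w = len(img), len(img[0])
    return [img[(i + di) % h][(j + dj) % w]
            for di in range(-r, r + 1) for dj in range(-r, r + 1)]


def joint_samples(ref, warped, r):
    """One sample per pixel p: the concatenation [f(I_ref, p); f(I_w, p)]."""
    h, w = len(ref), len(ref[0])
    return [patch_feature(ref, i, j, r) + patch_feature(warped, i, j, r)
            for i in range(h) for j in range(w)]


if __name__ == "__main__":
    img = [[float(3 * i + j) for j in range(3)] for i in range(3)]
    samples = joint_samples(img, img, r=1)
    print(len(samples), len(samples[0]))  # 9 18
```

An entropy estimator applied to `samples` would then give $-J(\theta)$ for the current warp.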
2.2 Low Distortion Embeddings, Random Projection
Low distortion embeddings and random projections are relevant for our purposes. A low distortion embedding aims to map a set of points of a high dimensional Euclidean space to a much lower dimensional one while approximately preserving the distances between the points. Such a low dimensional, approximately isometric embedding exists according to the Johnson–Lindenstrauss Lemma Johnson and Lindenstrauss (1984):
Lemma (Johnson–Lindenstrauss).
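Let $0<\epsilon<1$ and let $S\subset\mathbb{R}^{D}$ be a point set of $N$ elements. Then for $d = O\left(\frac{\ln N}{\epsilon^{2}}\right)$ there exists a Lipschitz mapping $f:\mathbb{R}^{D}\to\mathbb{R}^{d}$ such that
$$(1-\epsilon)\|\mathbf{u}-\mathbf{v}\|^{2} \le \|f(\mathbf{u})-f(\mathbf{v})\|^{2} \le (1+\epsilon)\|\mathbf{u}-\mathbf{v}\|^{2} \tag{2}$$
for any $\mathbf{u},\mathbf{v}\in S$.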
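The lemma is often stated with the constant hidden in the $O(\cdot)$. One explicit form known from the literature (the elementary proof of Dasgupta and Gupta) requires $d \ge 4\ln N / (\epsilon^2/2 - \epsilon^3/3)$; the sketch below merely evaluates that bound, and `jl_min_dim` is a hypothetical helper name. Other proofs give different constants.

```python
import math


def jl_min_dim(n_points, eps):
    """Smallest integer d with d >= 4 ln N / (eps^2/2 - eps^3/3).

    One explicit form of the Johnson-Lindenstrauss bound; note that it depends
    only on the number of points N and on eps, not on the original dimension D.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    return math.ceil(4 * math.log(n_points) / (eps ** 2 / 2 - eps ** 3 / 3))


if __name__ == "__main__":
    print(jl_min_dim(10_000, 0.5))  # 443
```

Such worst-case bounds are pessimistic for our purposes: the experiments below suggest that much smaller RP dimensions can already give reliable entropy estimates.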
Over the years, a number of explicit constructions of $f$ have been proposed. Notably, one can show that the property in (2) is satisfied with probability approaching 1 for a random linear mapping $f(\mathbf{u}) = \mathbf{W}\mathbf{u}$ ($\mathbf{W}\in\mathbb{R}^{d\times D}$), provided that $\mathbf{W}$, suitably scaled, projects onto a random $d$ dimensional subspace Frankl and Maehara (1987).
Less strict conditions on $\mathbf{W}$ also suffice, and many of them decrease computational costs. Introducing the notation $\mathbf{W} = \frac{1}{\sqrt{D}}\mathbf{R}$, i.e., $\mathbf{R} = \sqrt{D}\,\mathbf{W}$, it is sufficient that the elements of matrix $\mathbf{R}$ are drawn independently from the standard normal distribution Indyk and Motwani (1998).^{1} Other explicit constructions for $\mathbf{R}$ include Rademacher and (very) sparse distributions Achlioptas (2003); Li et al. (2006). More general methods based on weak moment constraints are also available Arriga and Vempala (2006); Matoušek (2008).
^{1} The multiplier $\frac{1}{\sqrt{D}}$ in the expression $\mathbf{W} = \frac{1}{\sqrt{D}}\mathbf{R}$ means that the rows of matrix $\mathbf{W}$ do not strictly have unit length; it is sufficient if their lengths are 1 on average.
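A pure-Python sketch of the Gaussian construction, checking distance preservation empirically. It assumes the scaling $f(\mathbf{u}) = \mathbf{R}\mathbf{u}/\sqrt{d}$, under which squared distances are preserved in expectation; any other constant scale only rescales all distances. `gaussian_rp_matrix`, `project` and `distortion_ratios` are hypothetical helper names.

```python
import itertools
import math
import random


def gaussian_rp_matrix(d, D, rng):
    """d x D matrix R with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]


def project(R, u, scale):
    """Linear map f(u) = scale * R u."""
    return [scale * sum(r_i * u_i for r_i, u_i in zip(row, u)) for row in R]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion_ratios(points, d, seed=0):
    """Ratios ||f(u) - f(v)||^2 / ||u - v||^2 over all pairs, with f(u) = R u / sqrt(d)."""
    rng = random.Random(seed)
    R = gaussian_rp_matrix(d, len(points[0]), rng)
    proj = [project(R, p, 1.0 / math.sqrt(d)) for p in points]
    return [sq_dist(proj[i], proj[j]) / sq_dist(points[i], points[j])
            for i, j in itertools.combinations(range(len(points)), 2)]


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [[rng.random() for _ in range(300)] for _ in range(20)]
    ratios = distortion_ratios(pts, d=200)
    print(min(ratios), max(ratios))  # both near 1; the spread shrinks as d grows
```

Replacing `rng.gauss(0.0, 1.0)` by a random choice of $\pm 1$ would give the Rademacher variant mentioned above.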
3 Method
In image registration, information theoretical similarity measures show robust characteristics when compared with $\ell_1$ and $\ell_2$ measures^{2}, e.g., in Kybic (2004) directly for (1), and for entropy, mutual information, and divergence in Neemuchwala et al. (2007). However, these estimations have a high computational burden, since the features in the cited references are high dimensional. Here, we deal with the efficient estimation of cost function (1), which from now on we denote by $J$. We note that the idea of efficient estimation can be used for a number of information theoretical quantities, provided that they can be estimated by means of pairwise Euclidean distances of the samples.
^{2} The family of $\ell_2$ measures includes the correlation defined by the scalar product.
Central to our RP based distributed method are the following:

The computational load can be decreased by

dividing the samples into groups and then

computing the averages of the group estimates Kybic (2004).
We call this the ensemble approach.

Taking into account that the low dimensional approximate isometric embedding of points of a high dimensional Euclidean space can be addressed by the Johnson–Lindenstrauss Lemma and the related random projection methods, we suggest the following procedure for distributed RP based entropy (and thus $J$) estimation:

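divide the $T$ feature samples $\{\mathbf{v}(p)\}_{p=1}^{T}$^{3} into $G$ groups indexed by sets $I_1,\ldots,I_G$ so that each group contains $g$ samples,
^{3} In the image registration task the set of feature samples is $\{\mathbf{v}(p)\}_{p} = \{[\mathbf{f}(I_{\mathrm{ref}},p);\mathbf{f}(I_{w(\theta)},p)]\}_{p}$, where $p$ is the running index and $[\mathbf{u};\mathbf{v}]$ denotes the concatenation of vectors $\mathbf{u}$ and $\mathbf{v}$.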

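for each group $k=1,\ldots,G$, take the random projection of $\{\mathbf{v}(p)\}_{p\in I_k}$ as
$$\mathbf{y}_k(p) = \mathbf{R}_k\mathbf{v}(p)\in\mathbb{R}^{d}, \qquad p\in I_k,$$
where the matrices $\mathbf{R}_k\in\mathbb{R}^{d\times D}$ are drawn independently as described in Section 2.2.
Note: the normalization factor of the projection matrix can be dropped, since for the differential entropy it only contributes an additive constant: $H(c\,\mathbf{y}) = H(\mathbf{y}) + d\ln|c|$ for a $d$ dimensional random vector $\mathbf{y}$ and $c\neq 0$.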

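average the estimated entropies $\hat{H}_k$ of the RPed groups over the $G$ groups to get the estimate
$$\hat{H} = \frac{1}{G}\sum_{k=1}^{G}\hat{H}_k, \tag{3}$$
and hence $\hat{J} = -\hat{H}$.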
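The three steps of the procedure can be sketched as follows. This is an illustrative pure-Python sketch, not the authors' implementation; `random_groups`, `ensemble_rp_entropy` and the `entropy_fn` parameter are hypothetical names, and for simplicity it assumes that the group size $g$ divides the number of samples $T$. Any entropy estimator working on the projected points can be passed as `entropy_fn`.

```python
import random


def random_groups(n_samples, g, rng):
    """Randomly partition the indices 0..n_samples-1 into groups of exactly g samples."""
    if n_samples % g:
        raise ValueError("this sketch assumes g divides the number of samples")
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[k:k + g] for k in range(0, n_samples, g)]


def ensemble_rp_entropy(samples, g, d, entropy_fn, seed=0):
    """Ensemble RP entropy estimate: the mean of the per-group estimates, as in (3).

    samples    -- list of D-dimensional feature vectors
    g, d       -- group size and RP dimension
    entropy_fn -- any entropy estimator taking a list of d-dimensional points
    The normalization of the projection matrix is omitted, since a constant scale
    only shifts the differential entropy by an additive constant.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    estimates = []
    for group in random_groups(len(samples), g, rng):
        # independent Gaussian RP matrix R_k (d x D) for this group
        R = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(d)]
        projected = [[sum(r * x for r, x in zip(row, samples[p])) for row in R]
                     for p in group]
        # groups do not interact, so this loop could be distributed across workers
        estimates.append(entropy_fn(projected))
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    data = [[random.random() for _ in range(50)] for _ in range(40)]
    # placeholder "estimator" returning the projected dimension, only to show the flow
    print(ensemble_rp_entropy(data, g=10, d=3, entropy_fn=lambda pts: len(pts[0])))  # 3.0
```

In practice `entropy_fn` would be one of the distance based estimators studied below, and the negated result estimates $J$.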
In the next section we illustrate the efficiency of the proposed RP based approach in image registration.
4 Illustrations
In our illustrations, we show examples that enable quantitative evaluation and reproduction:

we chose to evaluate the objective function (1) for angles from to by steps and in interval by steps . In the ideal case the optimal degree is . Our performance measure is the deviation from the optimal value.
In our simulations,

we chose the rectangle around each pixel as the feature of that pixel.

for each individual parameter setting, random runs were averaged. Our parameters included the linear size of the neighborhood, which determines the dimension $D$ of the feature; $g$, the size of the randomly projected groups; and $d$, the dimension of RP.

performance statistics are summarized by means of notched box plots, which show the quartiles ($Q_1$, $Q_2$, $Q_3$), depict the outliers, i.e., the points falling outside the interval $[Q_1-1.5(Q_3-Q_1),\,Q_3+1.5(Q_3-Q_1)]$, by circles, and whose whiskers represent the largest and smallest non-outlier data points.

we studied the efficiency of five different entropy estimating methods in (3) including

the recursive k-d partitioning scheme Stowell and Plumbley (2009),

the k-nearest neighbor method Leonenko et al. (2008),

generalized k-nearest neighbor graphs Pál et al. (2010),

minimum spanning tree based estimation Hero et al. (2002),

the weighted k-nearest neighbor method Sricharan and Hero (2011).
The methods will be referred to as kdp, kNN, kNN$_S$, MST and wkNN, respectively. kdp is a plug-in type method estimating the underlying density directly; hence it is especially efficient for low dimensional problems. Pál et al. (2010) extend the approach of Leonenko et al. (2008) ($S=\{k\}$) to an arbitrary neighborhood subset $S$. Instead of k-nearest neighbor graphs, the MST method minimizes the total sum of pairwise distances over spanning trees. The kNN, kNN$_S$ and MST constructions belong to the general umbrella of quasi-additive functionals Yukich (1998), providing statistically consistent estimation for the Rényi entropy ($H_\alpha$) Rényi (1961). The Shannon entropy is a special case of this family, since $\lim_{\alpha\to1}H_\alpha = H$. In our simulations, $S$, $\alpha$ and $k$, the number of neighbors in kNN and kNN$_S$, were held fixed. Finally, the wkNN technique makes use of a weighted combination of k-nearest neighbor estimators for different $k$ values.


, the neighborhood parameter was selected from the set .

, the size of groups and , the RP dimension took values , , , , , and , respectively.

the feature points were distributed randomly into groups of size $g$ in order to increase the diversity of the individual groups.
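For concreteness, a brute-force sketch of a k-nearest neighbor Shannon entropy estimator in the form given by Leonenko et al. (2008) (going back to Kozachenko and Leonenko, 1987): $\hat{H} = \ln(N-1) - \psi(k) + \ln V_d + \frac{d}{N}\sum_{i=1}^{N}\ln\rho_k(i)$, where $\rho_k(i)$ is the distance of sample $i$ to its $k$-th nearest neighbor, $\psi$ is the digamma function and $V_d$ is the volume of the $d$ dimensional unit ball. This is an illustrative stand-in, not the implementation used in our experiments; the function names are hypothetical.

```python
import math
import random


def digamma_int(k):
    """Digamma function at a positive integer k: -gamma + sum_{j=1}^{k-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, k))


def log_unit_ball_volume(d):
    """ln V_d, where V_d = pi^(d/2) / Gamma(d/2 + 1) is the volume of the unit d-ball."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)


def knn_shannon_entropy(points, k=3):
    """k-NN Shannon entropy estimate (in nats).

    Brute force, O(N^2 d); assumes distinct points, since a zero distance breaks the log.
    """
    n, d = len(points), len(points[0])
    if n <= k:
        raise ValueError("need more than k points")
    log_rho_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_rho_sum += math.log(dists[k - 1])  # distance to the k-th nearest neighbor
    return (math.log(n - 1) - digamma_int(k) + log_unit_ball_volume(d)
            + d * log_rho_sum / n)


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(500)]
    # true entropy of a 2D standard normal: ln(2*pi*e), approximately 2.838
    print(knn_shannon_entropy(pts), math.log(2 * math.pi * math.e))
```

Since only pairwise distances enter, scaling all points by $c>0$ shifts this estimate by exactly $d\ln c$, matching the behavior of the differential entropy; this is why the normalization of the RP matrix is immaterial.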
In the first set of experiments, we focused on the precision of the estimations on the Lena dataset. According to our experience,

there is no visible difference in the precision of the estimations for . The estimation is of high precision even for , as we illustrate in Fig. 2(a)–(b) for the kdp technique. The estimation errors are quite similar for and ; the latter case is shown in Fig. 2(c)–(d). Here, one can notice a small uncertainty in the estimations for smaller RP dimensions (), which is moderately present for larger values (), except for the largest studied group size .
In the second set of experiments, we dealt with the mandrill dataset, where different modalities of the same image (pixel intensities and an edge filtered version) had to be registered. Here,

the kdp approach gradually deteriorates as the dimension of the underlying feature representation increases, i.e., as a function of . For , the method gives precise estimations for and small group sizes (); other parameter choices result in uncertain estimations, see Fig. 5(a)–(b). By increasing the size of the neighborhood (), the estimations gradually break down. For , the precisions are depicted in Fig. 5(c)–(d); the estimations are still acceptable. For we did not obtain useful estimations with the kdp technique.

in contrast to the kdp method, the kNN, kNN$_S$, MST and wkNN techniques are all capable of coping with the and values, as illustrated in Fig. 6, Fig. 7, Fig. 8 and Fig. 9, respectively. It can also be observed that the RP dimension must be here, and in the case of one obtains highly precise estimations.

the only method that could cope with the increased neighborhood size was the technique. This result could be achieved for RP dimension using small group sizes (), see Fig. 10.
The computation times of the kdp method are illustrated for the Lena () and mandrill () datasets in Fig. 11(a) and Fig. 11(b), respectively. As can be seen, the ensemble approach with group size may speed up computations by several orders of magnitude; similar trends hold for the other estimators, too. Among the studied methods, the kdp technique was the most competitive in terms of computation time. We also present the computation times for the largest studied problem, Lena with ; compared to kdp,

the kNN and kNN$_S$ techniques were within a factor of in terms of computation time,

the wkNN method was () times slower than the kdp approach in the case of (), and

the MST based estimator was within a factor of compared to kdp in case of , and more than times slower for .
As can be seen in Fig. 11, the application of a reduced RP dimension can be advantageous in terms of computation time. Moreover, compared to schemes without dimensionality reduction ( and , ), i.e., working directly on raw data, the presented RP based dimensionality reduction approach can considerably speed up computations. This behaviour is already present for , as illustrated for on the mandrill dataset in Table 1.
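A rough operation count indicates where the savings can come from. It assumes, for illustration only, that the cost of the entropy estimator is dominated by brute-force pairwise distance computations (tree based schemes such as kdp scale differently), with $T$ samples of dimension $D$, groups of size $g$ and RP dimension $d$:

$$
\underbrace{O\!\left(T^{2}D\right)}_{\text{all samples, raw features}}
\;\longrightarrow\;
\underbrace{O\!\left(TDd\right)}_{\text{projections}}
+\underbrace{\frac{T}{g}\,O\!\left(g^{2}d\right)}_{\text{groups}}
= O\!\left(Td\,(D+g)\right),
\qquad
\text{speed-up}\approx\frac{TD}{d\,(D+g)}.
$$

For instance, with the hypothetical values $T=256^2=65536$, $D=50$, $g=512$ and $d=5$ (chosen for illustration, not taken from the experiments), the ratio is $65536\cdot 50/(5\cdot 562)\approx 1.2\cdot 10^{3}$, before any additional gain from processing the $T/g$ groups in parallel.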
Considering the possible choices, according to our numerical experience,

often, small RP dimensions give rise to reliable estimations for several entropy methods,

it is necessary to slowly increase $d$ as a function of the dimension of the feature representation (parameterized by ),

in the studied parameter domain, group sizes of could provide precise estimations and, at the same time, open the door to massive speed-ups by distributed solutions.
These results demonstrate the efficiency of our RP based approach.
5 Conclusions
We have shown that the random projection (RP) technique can be adapted to distributed information theoretical image registration. Our extensive numerical experiments, including five different entropy estimators, demonstrated that the proposed approach (i) can offer speed-ups of orders of magnitude in computation time, and (ii) provides robust estimation for high dimensional features.
The approach is very promising, since it is parallel and fits multicore architectures, including graphics processors. Since information theoretical measures are robust, our method may be useful in diverse signal processing areas with the advance of multicore hardware.
Acknowledgments
The European Union and the European Social Fund have provided financial support to the project under the grant agreement no. TÁMOP 4.2.1./B09/1/KMR20100003. The research has also been supported by the ‘European Robotic Surgery’ EC FP7 grant (no.: 288233). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of other members of the consortium or the European Commission.
The authors would like to thank Kumar Sricharan for making available the implementation of the wkNN method.
References
 Fradkin and Madigan (2003) D. Fradkin, D. Madigan, Experiments with random projections for machine learning, in: International Conference on Knowledge Discovery and Data Mining (KDD 2003), pp. 517–522.
 Deegalla and Boström (2006) S. Deegalla, H. Boström, Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification, in: International Conference on Machine Learning and Applications (ICMLA 2006), pp. 245–250.
 Goel et al. (2005) N. Goel, G. Bebis, A. V. Nefian, Face recognition experiments with random projections, in: SPIE Conference on Biometric Technology for Human Identification, 2005, volume 5779, pp. 426–437.
 Fern and Brodley (2003) X. Z. Fern, C. E. Brodley, Random projection for high dimensional data clustering: A cluster ensemble approach, in: International Conference on Machine Learning (ICML2003), pp. 186–193.
 Szabó and Lőrincz (2009) Z. Szabó, A. Lőrincz, Fast parallel estimation of high dimensional information theoretical quantities with low dimensional random projection ensembles, in: Independent Component Analysis and Signal Separation (ICA 2009), volume 5441 of Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, 2009, pp. 146–153.
 Ailon and Chazelle (2006) N. Ailon, B. Chazelle, Approximate nearest neighbors and the fast Johnson–Lindenstrauss transform, in: Annual ACM Symposium on Theory of Computing, 2006, pp. 557–563.
 Hegde et al. (2007) C. Hegde, M. B. Wakin, R. G. Baraniuk, Random projections for manifold learning, in: Neural Information Processing Systems (NIPS 2007), pp. 641–648.
 Mahmoudi et al. (2008) M. Mahmoudi, P. Vandergheynst, M. Sorci, On the estimation of geodesic paths on sampled manifolds under random projections, in: International Conference on Image Processing (ICIP2008), pp. 1840–1843.
 Dasgupta (2000) S. Dasgupta, Experiments with random projection, in: Conference on Uncertainty in Artificial Intelligence (UAI2000), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2000, pp. 143–151.
 Bingham and Mannila (2001) E. Bingham, H. Mannila, Random projection in dimensionality reduction: applications to image and text data, in: International Conference on Knowledge Discovery and Data Mining (KDD 2001), pp. 245–250.
 Li et al. (2007) P. Li, T. J. Hastie, K. W. Church, Nonlinear estimators and tail bounds for dimension reduction in $\ell_1$ using Cauchy random projections, Journal of Machine Learning Research 8 (2007) 2497–2532.
 Menon et al. (2007) A. K. Menon, A. Pham, S. Chawla, A. Viglas, An incremental data-stream sketch using sparse random projections, in: SIAM International Conference on Data Mining (SDM 2007), pp. 563–568.
 Lukoševičius and Jaeger (2009) M. Lukoševičius, H. Jaeger, Reservoir computing approaches to recurrent neural network training, Computer Science Review 3 (2009) 127–149.
 Vempala (2005) S. S. Vempala, The Random Projection Method (DIMACS Series in Discrete Math), volume 65, 2005.
 Baraniuk et al. (2008) R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices, Constructive Approximation 28 (2008) 253–263.
 Özuysal et al. (2010) M. Özuysal, M. Calonder, V. Lepetit, P. Fua, Fast keypoint recognition using random ferns, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010) 448–461.
 Kokiopoulou et al. (2009) E. Kokiopoulou, D. Kressner, P. Frossard, Optimal image alignment with random measurements, in: European Signal Processing Conference (EUSIPCO 2009), pp. 1304–1308.
 Akselrod-Ballin et al. (2009) A. Akselrod-Ballin, D. Bock, R. C. Reid, S. K. Warfield, Accelerating feature based registration using the Johnson–Lindenstrauss lemma, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI 2009), volume 5761 of Lecture Notes in Computer Science, Springer, 2009, pp. 632–639.
 Healy and Rohde (2007) D. M. Healy, G. K. Rohde, Fast global image registration using random projections, in: International Symposium on Biomedical Imaging (ISBI 2007), pp. 476–479.
 Neemuchwala et al. (2007) H. Neemuchwala, A. Hero, S. Zabuawala, P. Carson, Image registration methods in high dimensional space, International Journal of Imaging Systems and Technology 16 (2007) 130–145.
 Kybic (2004) J. Kybic, High-dimensional mutual information estimation for image registration, in: IEEE International Conference on Image Processing (ICIP’04), IEEE Computer Society, 2004, pp. 1779–1782.
 Bardera et al. (2006) A. Bardera, M. Feixas, I. Boada, M. Sbert, High-dimensional normalized mutual information for image registration using random lines, in: Third International Workshop on Biomedical Image Registration (WBIR 2006), volume 4057 of Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, 2006, pp. 264–271.
 Arya et al. (1998) S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, A. Y. Wu, An optimal algorithm for approximate nearest neighbor searching in fixed dimensions, Journal of the ACM (JACM) 45 (1998) 891–923.
 Cardoso (1998) J. Cardoso, Multidimensional independent component analysis, in: International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’98), volume 4, pp. 1941–1944.
 Szabó et al. (2007) Z. Szabó, B. Póczos, A. Lőrincz, Undercomplete blind subspace deconvolution, Journal of Machine Learning Research 8 (2007) 1063–1095.
 García-Arteaga and Kybic (2008) J. D. García-Arteaga, J. Kybic, Regional image similarity criteria based on the Kozachenko–Leonenko entropy estimator, in: Computer Vision and Pattern Recognition Workshops (CVPRW 2008), pp. 1–8.
 Cover and Thomas (1991) T. M. Cover, J. A. Thomas, Elements of information theory, John Wiley and Sons, New York, USA, 1991.
 Johnson and Lindenstrauss (1984) W. B. Johnson, J. Lindenstrauss, Extensions of Lipschitz maps into a Hilbert space, Contemporary Mathematics 26 (1984) 189–206.
 Frankl and Maehara (1987) P. Frankl, H. Maehara, The Johnson–Lindenstrauss Lemma and the sphericity of some graphs, Journal of Combinatorial Theory Series A 44 (1987) 355–362.
 Indyk and Motwani (1998) P. Indyk, R. Motwani, Approximate nearest neighbors: Towards removing the curse of dimensionality, in: ACM Symposium on Theory of Computing, 1998, pp. 604–613.
 Achlioptas (2003) D. Achlioptas, Database-friendly random projections: Johnson–Lindenstrauss with binary coins, Journal of Computer and System Sciences 66 (2003) 671–687.
 Li et al. (2006) P. Li, T. J. Hastie, K. W. Church, Very sparse random projections, in: International Conference on Knowledge Discovery and Data Mining (KDD 2006), ACM, New York, NY, USA, 2006, pp. 287–296.
 Arriga and Vempala (2006) R. I. Arriga, S. Vempala, An algorithmic theory of learning: Robust concepts and random projections, Machine Learning 63 (2006) 161–182.
 Matoušek (2008) J. Matoušek, On variants of the Johnson–Lindenstrauss lemma, Random Structures and Algorithms 33 (2008) 142–156.
 Kozachenko and Leonenko (1987) L. F. Kozachenko, N. N. Leonenko, On statistical estimation of entropy of random vector, Problems of Information Transmission 23 (1987) 95–101.
 Hero et al. (2002) A. Hero, B. Ma, O. Michel, J. Gorman, Applications of entropic spanning graphs, Signal Processing 19 (2002) 85–95.
 Leonenko et al. (2008) N. Leonenko, L. Pronzato, V. Savani, A class of Rényi information estimators for multidimensional densities, Annals of Statistics 36 (2008) 2153–2182.
 Stowell and Plumbley (2009) D. Stowell, M. D. Plumbley, Fast multidimensional entropy estimation by k-d partitioning, IEEE Signal Processing Letters 16 (2009) 537–540.
 Pál et al. (2010) D. Pál, B. Póczos, C. Szepesvári, Estimation of Rényi entropy and mutual information based on generalized nearest-neighbor graphs, in: Neural Information Processing Systems (NIPS 2010), pp. 1849–1857.
 Yukich (1998) J. E. Yukich, Probability Theory of Classical Euclidean Optimization Problems, Springer, 1998.
 Sricharan and Hero (2011) K. Sricharan, A. O. Hero, Weighted k-NN graphs for Rényi entropy estimation in high dimensions, in: IEEE Workshop on Statistical Signal Processing (SSP 2011), pp. 773–776.
 Rényi (1961) A. Rényi, On measures of entropy and information, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1961, volume 1, pp. 547–561.
About the author–ZOLTÁN SZABÓ (Applied Mathematics M.Sc. 2006, Ph.D. 2012,
Informatics Ph.D. 2009) is a research fellow at the Eötvös Loránd University. In 2007, he won the Scientist of the Year
Award of the Faculty of Informatics. In 2008, he obtained the Bronze Medal of the Pro
Patria et Scientia Award of Hungarian Ph.D. Students. He is a reviewer at the IEEE Transactions
on Neural Networks and Learning Systems, Signal, Image and Video Processing, Neurocomputing and
IEEE Transactions on Signal Processing journals. His research interests include Independent Subspace Analysis and its extensions,
information theory, kernel methods, group-structured dictionary learning and collaborative filtering.
About the author–ANDRÁS LŐRINCZ (Physics M.Sc. 1975, Solid
State Physics Ph.D. 1978, Molecular Physics C.Sc. 1986, Laser Physics
habilitation, 1998, Information Technology habilitation, 2009) is a senior
researcher of Information Science at Eötvös Loránd University.
He is a Fellow of the European Coordinating Committee for Artificial
Intelligence. He has published more than 140 peer-reviewed journal and 80 peer-reviewed
conference papers on his research areas. He has been leading a
group working on different aspects of intelligent systems.

