The Shortlist Method for Fast Computation of the Earth Mover’s Distance and Finding Optimal Solutions to Transportation Problems
Finding solutions to the classical transportation problem is of great importance, since this optimization problem arises in many engineering and computer science applications. Especially the Earth Mover’s Distance is used in a plethora of applications ranging from content-based image retrieval, shape matching, fingerprint recognition, object tracking and phishing web page detection to computing color differences in linguistics and biology. Our starting point is the well-known revised simplex algorithm, which iteratively improves a feasible solution to optimality. The Shortlist Method that we propose substantially reduces the number of candidates inspected for improving the solution, while at the same time balancing the number of pivots required. Tests on simulated benchmarks demonstrate a considerable reduction in computation time for the new method as compared to the usual revised simplex algorithm implemented with state-of-the-art initialization and pivot strategies. As a consequence, the Shortlist Method facilitates the computation of large scale transportation problems in viable time. In addition we describe a novel method for finding an initial feasible solution which we coin Modified Russell’s Method.
Transportation problem, revised simplex method, earth mover’s distance, EMD, Wasserstein distance, Monge-Kantorovich problem
Finding solutions to the classical transportation problem is of great importance, since this optimization problem arises in various guises in many real world and theoretical situations. They occur as subproblems in larger problems, e.g. the warehouse location problem or the traveling salesperson problem and also in a variety of engineering and computer science applications, such as content based image retrieval , automatic scene analysis  or for the discrimination between real and artificial fingerprints . A more extensive discussion of such applications is given in Section 6.
The problem was first described by Monge in 1781  in somewhat different form and has been analyzed by many researches including Kantorovich, Hitchcock, Koopmans and especially Dantzig [7, 8], the father of the simplex algorithm. The solution of this problem is the fundamental ingredient for computing the Earth Mover’s Distance  in computer science and the Wasserstein distance, also known as Mallows or Kantorovich distance in statistics and physics, see Chapter 6 in .
In order to give a quick and intuitive description of the various facets of the transportation problem and the revised simplex algorithm we often use an economic interpretation, which of course will not reduce the scope of the described algorithms and their applications in any way. The problem can be summarized as follows.
Consider a consortium of production and consumption facilities of a certain good. For simplicity these are also referred to as origins and destinations. Suppose that there is a certain supply of available at origin , and there is a certain demand of at destination . The cost for transporting a unit of the good from to shall be given by arbitrary . Borrowing the illustration from Chapter 3 in , the production facilities might be Parisian bakeries cooperating with cafés (consumption facilities), where the good transported are baguettes, and the cost incurred is the actual transportation cost. It is assumed that total supply equals total demand, i.e. . The objective is then to determine a transportation plan such that all producers and consumers are satisfied and that the total cost is minimized. In other words
A dual formulation can be obtained as follows. Suppose that a carrying company offers to take over the good from the consortium for a price of per unit at origin and to hand it back at destination for a price of (any prices may be negative). In order for the carrier to be competitive, it needs to set prices so that for all , . Following  we refer to the difference as relative cost incurred when the consortium takes over the transportation from to itself rather than commissioning the carrier. The carrier would like to maximize its profit subject to the price constraint. Standard duality theory, e.g. Chapter 4 in  relates the solutions of the two problems to one another (provided one of them exists) and shows that the optimal values of the objective functions are the same.
The rest of the paper is organized as follows. In Section 2 we first give a non-technical description of the revised simplex algorithm for solving the transportation problem; for a more detailed presentation see . Then we discuss crucial aspects in various subsections, starting with pivot strategies (Subsection 2.1), and passing from cycle finding to treating initialization methods (Subsections 2.2 and 2.3, respectively). In Section 3, we introduce the new Shortlist Method for solving the transportation problem. Benchmark tests reported in Section 4 clearly show the advantage of the proposed method over the existing ones. We conclude with a discussion of the results in Section 5 and review relevant application scenarios in Section 6.
2 The Transportation Algorithm
Using the simplex approach the transportation algorithm consists of two stages: first, an initial transportation plan is constructed such that Equations (2–4) are satisfied. Second, the initial plan is iteratively improved until the optimal solution is obtained.
At any time the current feasible plan consists of “active” origin/destination pairs between which a positive amount is transported (in a degenerate case there might be pairs with zero amount, but we exclude this case in our description). We will refer to them as basis pairs or basis entries.
For each iteration in the second stage a basis entry is replaced by a “better one”. For this we first compute the “dual” prices and . In the context of the simplex method, these are also known as simplex multipliers. Starting with an arbitrary value, e.g. setting , all other prices are determined by solving the equations , where are basis entries. A property well-known as basis triangularity sees to it that every origin and every destination gets a price assigned in this way.
A new basis entry is then selected as a so-called pivot element by finding a non-basis pair that has negative relative cost , meaning that the consortium can transport goods more cheaply from to by itself than by commissioning the carrier.
Next, a cycle of changes starting in is determined by alternately scanning rows and columns for basis entries until a cycle is complete, which again is bound to happen by basis triangularity. Assuming that all amounts at basis entries are positive (the non-degenerate case), there is a maximal positive amount which we can alternately add and subtract from the values when following the cycle, starting with addition for the first value . Since the cycle alternates between following rows and columns, the procedure preserves Equations (2–3).
After this, one of the has been reduced to and we remove the corresponding pair from the basis (if several values have been reduced to zero, we remove the first such entry, but are then dealing with a degenerate case). The basis still has exactly entries, and we proceed with the next iteration, continuing until there are no entries with negative relative cost any more. In this case we have reached an optimum.
2.1 Pivot Strategies
When selecting a pivot element to enter the basis, all non-basis entries with relative cost are candidates. According to Dantzig’s criterion, the most negative one is chosen. To the best of our knowledge it is an open question whether a better criterion for selecting one of these candidates can be formulated in order to minimize the number of pivot operations until optimality is reached.
If the algorithm is applied to solve real-world transportation problems, the goal of a practical implementation is typically to minimize the runtime on a computer. Our analysis has shown that two key factors determine the runtime: the number of pivot operations and the number of elements for which relative costs are computed in order to select pivot elements.
The former can be made small by computing the relative costs for all non-basis entries which in turn maximizes the latter (‘matrix most negative’ strategy). The other extreme is to perform the pivot operation immediately after discovering the first candidate (‘first negative’ strategy). In this way, the second factor is minimized at the cost of an increase of the first. A more balanced strategy is to compute the relative costs for all non-basis entries of a row and then choose the most negative among these candidates (‘modified row most negative strategy’) or go on with the subquent row, if no candidate has been discovered. In the next iteration of the algorithm, continue with the first row not considered in the previous one. The latter strategy outperformed the others in our tests, which corroborates earlier findings reported by  and by .
2.2 Finding Cycles
The procedure of finding cycles of changes can be translated into a depth-first search (DFS)  on the following directed graph (see also Figure 1): Each basis entry corresponds to two vertices: one vertex with the basis entries in the same row as incoming edges and the basis entries in the same column as outgoing edges, and a second vertex with the basis entries in the same column as incoming edges and the basis entries in the same row as outgoing edges. The graph is weakly connected, acyclic and bipartite. By adding the (two copies of the) pivot element, the graph becomes cyclic and DFS is an efficient method for discovering the (up to mirroring) unique cycle. Since each basis entry is connected to all other basis entries in the same row and the same column, no other data structure is needed to store the graph than a list of basis entries for each row and for each column.
Considering the example shown in Figure 1, we begin with the transport plan and graph on the left. Next, we insert F as pivot element (right) and discover the cycle starting in F1 with depth-first search. Along the cycle, the minimum of all nodes on the right side of the graph determines the amount of change which is substracted from B and E (red) and added to D and F (green). One of the two elements B or E will leave the basis. F2 was not required during the pivot operation, but alternatively it would have been possible to use the complementary cycle F2 B1 D2 E1 F2 instead, leading to the same result. Basis elements A and C remain unchanged during this pivot operation.
2.3 Initialization Methods
In the subsequent comparison of methods for constructing an initial feasible solution (stage one in the transportation algorithm), we take the following established procedures into account. If a method generates fewer than basis entries (degenerate case), we complement them by adding the right number of entries in such a way that all basis entries are connected, i.e. there are other basis entries in the same row or the same column, but no cycles are formed and their values remain zero.
Northwest Corner Rule. Suppose we list all origins from to as rows and all destinations from to as columns. This rather naive rule starts in the top left corner and allocates the maximum possible amount to , i.e. the minimum of and . If there remains supply at origin , we move to the right and assign to maximum possible amount. Otherwise if the demand at destination was larger than the supply, we move one cell down and continue with assignment . And in case that is equal to , we move directly to . In this way, we iterate over all origins and destinations, and we obtain a solution satisfying Equations (2-4).
Least Cost Rule or Matrix Minimum Rule. This simple rule determines in each iteration the minimum cost entry among all origins with remaining supply and among all destinations with remaining demand, and assigns the maximum possible amount to until all requirements are met.
Houthakker’s Method of Mutually Preferred Flows. The idea of Houthakker’s mutually preferred flows  is somewhat similar to the least cost rule. For all origins that have any supply left, the minimum cost of the corresponding row is determined, and likewise for all destinations that have any demand left, the minimum cost of the corresponding column is detected. If an entry is both row and column mimimum, the maximum feasible amount is assigned to . A difference to the least cost method is that more than one entry can enter the basis in each iteration.
Vogel’s Approximation Method. The basic idea of Vogel’s approximation method  is to compute the opportunity costs: for each not yet exhausted origin and for each remaining destination, take the difference between its smallest cost and its second smallest cost. This idea is also the key ingredient for computing bids and raising prices in the auction algorithm . In each iteration of Vogel’s approximation method, the row or column with the maximum opportunity cost is selected and for the minimum in that row or column, the maximum possible value is allocated.
Russell’s Method. Russell  proposed an approach to approximate Dantzig’s criterion. In each iteration denote by the set of origins that have any supply left and by the set of destinations that have any demand left. Then determine for every and for every . The quantities and are supposed to approximate the simplex multipliers and (see Section 2). Using these estimates, Russell computes in each iteration and allocates the maximum possible amount to .
Modified Russell Method. In this paper, we propose a modification of Russell’s method which outperforms the original version on our benchmarks: instead of updating and , we compute these values once at the start. Next, we compute a cost matrix with and then, we apply the least cost rule to this matrix . The proposed modification saves a lot of computational time in each iteration by not updating and and performs much better in comparison to the orginal Russell method (see Figure 3).
Weighted Frequency Method. Eight years before Russell, Habr  proposed a related method which he callled weighted frequency method. Let be the mean cost of row and the mean cost of column . According to Habr’s method, we define a matrix with cost entries . The transportation plan is established by choosing in each iteration pursuing the matrix minimum rule applied to and assigning the maximum possible amount to . Habr provides a nice theoretical justification for his method: suppose for each possible entry we consider each possible combination with and . The question whether it is beneficial to include in the transportation plan is answered by comparing the costs with the costs for all combinations . Habr showed that summing up the differences over all possible combinations is equivalent (up to a constant) to computing the matrix .
Row Minimum Rule and Modified Row Minimum Rule. These two rules  iterate over the rows (origins) and determine for each row the column (destination) with positive unassigned demand which has the minimum transportation cost . The difference between both rules is that modified row minimum rule assigns at most one entry per row and then resumes with the next row. The row minimum rule in contrast repeatedly determines the minimum for row until the supply of origin is completely distributed and only then it continues with the next row.
Column Minimum Rule and Modified Column Minimum Rule. These two rules work exactly as the two previous described methods with rows and columns exchanged.
Alternating Row Column Minimum Rule. This initialization method combines the modified row minimum rule and the modified column minimum rule by alternating between rows and colums.
Two Smallest in Row Rule. The two smallest in row rule  can be regarded as a variant of the modified row minimum rule that assigns two instead of one entries per row and iteration.
3 The Shortlist Method
As described in the previous section, the simplex-based transportation algorithm consists of two stages: an initialization phase to find a feasible solution and a convergence phase in which the current solution is iteratively improved to optimality. The Shortlist Method introduces an additional phase in between these two. The main steps of the Shortlist Method can be outlined as follows:
A shortlist is created for each origin containing only a small fraction of all possible destinations.
An initial feasible transportation plan is derived from these shortlists (for an example see Figure 2, left).
The transportation plan is improved towards optimality based on the shortlists.
The transportation plan is improved to global optimality based on the complete matrix (for an example see Figure 2, right).
The crucial part is the third step in which the shortlist search for a new basis entry balances the computional burden between the number of elements for which relative costs are calculated and the number of pivot operations performed.
More precisely the Shortlist Method uses as parameters the length of the shortlists and two decision criteria and . The four steps are carried out as follows.
At the beginning, for each origin , a list of destinations with the lowest transportation costs is created, containing the index of the destination and the corresponding costs. This shortlist is sorted ascendingly according to costs by QuickSort .
Next, we iterate over all not yet exhausted origins and assign the maximum feasible amount to with the smallest costs among all destinations in the shortlist of . If no such destination is available any more, the minimum over the remaining is chosen. The latter is usually only necessary for very small shortlist lengths.
In the third phase, we improve the transportation plan iteratively considering batches of consecutive shortlists. Starting from the first shortlist not considered in the previous iteration, we compute relative costs for non-basis entries until candidate entries with negative have been discovered or percent of all shortlists have been searched. Then the batch ends. We choose the entry with the most negative relative cost for performing a pivot operation, i.e. we add the entry to the basis, compute a cycle of changes and remove another entry from the basis as detailed in Section 2. Then we go the next iteration. Whenever the last shortlist has been used, we continue by reusing the first one. If at any point no more candidates are discovered, phase three is terminated.
In the final phase, complete rows are searched instead of shortlists and if a row contains at least one candidate, the most negative one is chosen; i.e. we perform the simplex-based transportation algorithm as described in Section 2 with the ‘modified row most negative’ pivot strategy until the optimum is reached.
4 Simulation Results
In order to evaluate the performance of the described initialization methods as a function of the number of origins and destinations, a benchmark was generated in the following way: On an empty grid of size , the - and -coordinates of locations for origins and destinations were chosen independently and uniformly at random while avoiding double allocations. Amounts and were chosen independently and uniformly at random between 0 and 255. A final adjustment step ensures the equality of the sum over all and the sum over all . The cost matrix contains as entry the Euclidean distance between origin and destination . 100 examples are generated for each number of origins and destinations from 100 to 3000 in steps of 100.
We make the generated transportation problem examples available for download, so that other researchers can reproduce the results and test other methods on this benchmark. The URL for the download is: www.stochastik.math.uni-goettingen.de/TransportBenchmark
All initialization methods were implemented to the best of our knowledge and optimal solutions were computed using the same revised simplex implementation for all methods. In Figure 3, we report total runtimes including the runtime for finding an initial basis and the runtime for the simplex iterations. The total runtimes are averaged over the 100 examples for each . The implementations are written in Java and were tested using one core of an Intel Core i7 CPU with 3.20GHz.
We observe that the Shortlist Method outperforms the other methods by a rather large margin. While for other initialization methods it is clearly preferable to use the “modified row most negative pivot strategy” (compare the remark in Subsection 2.1), this makes hardly a difference for the Shortlist Method. We may attribute this to the fact that this choice of the pivot only enters in step 4 of the Shortlist Method. However, by the end of step 3 the solution is already so close to optimality that step 4 does not have much influence on the total computation time.
The aforementioned parameters of the Shortlist Method were chosen in the following way. An additional set of examples was created with 30 examples for each . For each parameter, a set of a few possible choices were defined, and in total about thirty of their combinations were used for computing initial bases and optimal solutions on this training set. In this way, we obtained the following rough rule of thumb:
Shortlist length: for , then an increase of by another for each doubling of . More precisely, for .
Stop criteria: (i) candidates. (ii) of shortlists are searched at most in one iteration.
Although these parameter values have been trained, we consider them to be rather ad hoc, as they were chosen informally and by considering a few choices only. We understand this as a proof of concept of the Shortlist Method and as a first step towards determining good universal parameters that only depend on the problem size. There are clearly situations, where one has the opportunity to train the method to more specific features of the problem at hand, e.g. when comparing images to a larger database. Then we expect our method to perform even considerably better than suggested by the above simulations.
Table 1 gives a comparison of our implementation of the shortlist method with two other programs: the original C code by Rubner used via the R package emdist  and lp_solve  by Berkelaar and others, a general purpose mixed integer linear programming solver (which accounts to some extent for its long runtime).
Last but not least, let us note that we have also compared the different approaches on various collections of real and randomly generated images, and the respective performances were largely confirmed.
|lp_solve ||emdist ||Shortlist|
In this paper, we have presented a novel method for solving the classical transportation problem in its full generality (with an arbitrary cost matrix) based on the simplex algorithm and we have compared the proposed Shortlist Method to best methods known in the literature. The results for various problem sizes demonstrate the potential of the novel approach, which clearly outperforms all the other methods on the considered benchmark and indicates that the performance difference between the Shortlist Method and the second best alternative grows with increasing problem size.
To substantiate this conjecture we have simulated additional sets of 10 examples for each of the six best performing methods in the lower panel of Figure 3 in combination with each of the problem sizes , , , , , and . Based on the literature we have expected polynomial growth of the time complexity of the problem with an exponent that is somewhat below , i.e. a runtime of roughly the form for some . Since this implies that one can expect a roughly linear relation, when drawing the logartihm of the runtime as a function of the log problem size.
As we can see from Figure 4 this idea works out quite well. We only plot the results for the Shortlist Method and two competing methods as the other four competitors would overlap large parts of the two that are given. The circles indicate the results from our simulations, whereas the lines have been fitted by least-squares regression. Note that the lines fit the simulation data very well. The slopes of the lines provide estimates for the exponents . These are given numerically in Table 2 for all seven methods, together with p-values for testing whether the slope is different from the obtained for the Shortlist Method. The p-values are based on statistical tests for comparing slopes in an ANCOVA model, see [9, Chapter 13]. Since they are so small, it seems highly likely that the Shortlist Method has in fact a better time complexity than the other methods.
|Alternating Row Col. Minimum Rule||2.5510||0.009090|
|Modified Column Minimum Rule||2.5667||0.002624|
|Modified Row Minimum Rule||2.5915||0.000312|
|Least Cost Rule||2.6362||0.000005|
|Weighted Frequency Method||2.6574|
Let us also compare the performances of the best six competitors for our original benchmark to earlier performance studies from the literature. Based on the results considered in the lower panel of Figure 3, i.e. based on problem sizes up to 3000, several initialization methods performed similarly: the modified column minimum rule, Houthakker’s method and the alternating row column minimum rule, followed by the modified row minimum rule, the least cost rule and the weighted frequency method.
These results confirm earlier findings reported in  and in  on other benchmarks, with one exception: the least cost rule (also known as matrix mimimum rule) performed among the best competitors in our test and finished among the slowest methods in . A possible explanation is our implementation which sorts all matrix entries once ascendingly by transportation costs and then iterates over the list until the initial solution is obtained. This procedure is more efficient than determining the matrix cost minimum in each iteration by scanning all remaining origins and destinations. Analogously, we can explain the advantage of the proposed modified Russell’s method over the original Russell’s method. The speedup gained by the avoidance of scanning large parts of the complete matrix in each iteration clearly outweighs a possible quality loss of the initial solution by not updating the quantities and which are supposed to approximate the simplex multipliers and (see Section 2).
Further research includes a systematic large-scale simulation study to determine good universal parameter settings depending only on easy-to-determine features of the problem such as problem size. Also we would like to investigate to what extent computation times and orders of complexity can be improved when comparing images within a homogeneous database, where one has the possibility to train the parameters to the expected type of transportation problem.
In either case we believe that there is still much room for improvement of the results obtained above. We expect these findings to prepare the ground for applications in pattern recognition, computer vision and image processing, where solving the transportation problem has so far been considered as intractable due to the problem size and the runtimes of existing methods when applied to (smaller) raw gray scale images or feature descriptors like curved Gabor filter bank responses . A selection of further applications is contained in the next section.
6 Applications of the Transportation Problem
Solving transportation problems efficiently is of great importance in many different fields of application. We would like to give an idea of the relevance of fast algorithms by discussing a selection of specific examples.
Detection of Phishing Web Pages
The earth mover’s distance (EMD) has been applied for the detection of phishing web pages by Fu et al. . Screenshots are taken from banking websites and potential phishing sites and the visual similarity is measured using the EMD. If an antiphishing system automatically compares thousands or millions of websites, the speed of each comparison is an important factor and can become the bottleneck of the system. In this application scenario, the speedup by the shortlist method can make a huge difference. E.g. if web sites are compared at a resolution of pixels, this correponds to a problem with an approximate dimension of 5000 origins and 5000 destinations.
The EMD has been applied as a measure of dissimilarity when comparing the distribution of color names among 110 different languages . Notably, computation of EMDs for 2300 language vectors took the authors about one week using an industrial strength LP solver . Due to the computational complexity, they refrained from evaluating the 23,982 speaker response vectors.
Content-based Image Retrieval
Since the early days of retrieving images from large databases, the EMD has been applied for comparing histograms and signatures . Pele and Werman proposed a thresholded ground distance which is an EMD variant . For content-based image retrieval, thresholding the ground distance has a positive effect on the retrieval accuracy .
In the area of fingerprint recognition, the EMD has been applied for discriminating between real and synthetic fingerprint images based on minutiae histograms . These 2-dimensional minutiae histograms capture the minutiae distribution as a fixed-length feature vector which is invariant to rotation, translation and the variations in the number of minutiae. Scale invariance can be achieved by scaling input fingerprint images or minutiae templates to the size of adult fingerprints at a fixed resolution, e.g. 500 DPI. Fingerprints of adolescents can be enlarged using an age-dependent scaling factor as described in .
Performance Evaluation of Multi-Object Filters
In  and  the transport idea was used to evaluate the performance of multi-object filtering and control algorithms. Using a simulated ground truth of a varying number of objects moving through space, the online predictions by an algorithm that had only a cluttered version of the ground truth available was judged by performance curves over time. These curves at any one time were defined as the cost of the optimal transport between predicted configuration and ground truth.
Perceived Plant Color
The EMD was applied for computing color differences between images of different plant species by Kendal et al. . Comparisons showed that these results were largely consistent with qualitative assessments by human experts.
A fast approximation of the EMD for shape matching was introduced by Grauman and Darrell in . Similar shapes are retrieved by embedding the mimimum weight matching of the contour features of a query contour and performing an approximate nearest neighbors search with locality-sensitive hashing. Ling and Okada proposed a method  that reduces the computational complexity for computing the EMD between histograms and they show its usefulness for shape matching and histogram feature matching. However, the method is restricted to the taxicab metric ( distance).
Qiu  considered the two-class problem of classifying cells represented by multi-dimensional flow cytometry data into cells from heathly donors and cells from patients with acute myeloid leukemia. The EMD was used by Qiu to compare cell distributions and to derive features for classification.
Complex Scene Analysis
Ricci et al. apply the EMD idea for analyzing complex scenes such as frames from videos which change dynamically and they propose to learn a sparse set of prototypes with EMD .
Visual Object Tracking
Zhao et al. address the problem of visual object tracking . They argue that the EMD is suited for capturing the perceptual differences between images, however, its computational complexity is too large for many potential applications. They propose a differential EMD for tracking which has a reduced computational complexity.
Squared Euclidean Distances and the Interpolation of Shapes and Images
In the last two decades, numerical schemes were proposed for the special situation that the ground distance is the square of the Euclidean distance between origins and destinations. Aurenhammer et al.  proposed an algorithm which uses power diagrams to transform the transportation problem into an unconstrained convex minimization problem. Recently, Mérigot  improved this algorithm by solving this optimization problem via a multiscale approach and applied it to the interpolation of images. Further methods for solving transportation problems with a squared Euclidean ground distance were proposed by Benamou and Brenier , by Angenent et al. , by Loeper and Rapetti  and by Benamou et al. .
An important special case of the transportation problem is the assignment problem, where the numbers of origins and destinations are the same and the mass at each origin and destination is equal to one.
There exists a multitude of applications in computer science and electrical engineering as well as in operations research: e.g. assigning persons to jobs, or computational tasks to nodes in a network.
For geographical coordinates obtained at different points in time for objects like airplanes from radar or satellites, target tracking can be viewed as an assignment problem by matching moving targets observed at two points in time. However, if more than two points in time are considered simultaneously, the problem becomes a multi index assignment problem which is a NP-hard problem .
Further potential applications can arise in the area of future public transportation systems: in case of a prevalence of electric drive vehicles and autonomous driving, the proposed method can be used to optimally assign cars to recharging locations, using for recharging e.g. a wireless transmission by electromagnetic induction.
C. Gottschlich gratefully acknowledges the support of the Felix-Bernstein-Institute for Mathematical Statistics in the Biosciences and the Volkswagen Foundation.
-  S. Angenent, S. Haker, and A. Tannenbaum. Minimizing flows for the Monge-Kantorovich problem. SIAM Journal on Mathematical Analysis, 35(1):61–97, January 2003.
-  F. Aurenhammer, F. Hoffmann, and B. Aronov. Minkowski-type theorems and least-squares clustering. Algorithmica, 20(1):61–76, January 1998.
-  J.-D Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000.
-  J.-D Benamou, B.D. Froese, and A.M. Oberman. Numerical solution of the optimal transportation problem using the Monge-Ampère equation. Journal of Computational Physics, 260(1):107–126, March 2014.
-  M. Berkelaar et al. lp_solve v5.5. Available: http://lpsolve.sourceforge.net/5.5/, 2014.
-  D. P. Bertsekas and D. A. Castanon. The auction algorithm for the transportation problem. Annals of Operations Research, 20(1):67–96, January 1989.
-  G. B. Dantzig. Linear Programming and Extensions. Princeton Univ. Press, Princeton, NJ, 1963.
-  G. B. Dantzig. The diet problem. Interfaces, 20(4):43–47, July 1990.
-  J. J. Faraway. Linear Models with R. Chapman & Hall/CRC, Boca Raton, FL, 2004.
-  A.Y. Fu, L. Wenyin, and X. Deng. Detecting phishing web pages with visual similarity assessment based on earth mover’s distance (EMD). IEEE Transactions on Dependable and Secure Computing, 3(4):301–311, October 2006.
-  F. Glover, D. Karney, D. Klingman, and A. Napier. A computation study on start procedures, basis change criteria, and solution algorithms for transportation problems. Management Science, 20(5):793–813, January 1974.
-  C. Gottschlich. Curved-region-based ridge frequency estimation and curved Gabor filters for fingerprint image enhancement. IEEE Transactions on Image Processing, 21(4):2220–2227, April 2012.
-  C. Gottschlich, T. Hotz, R. Lorenz, S. Bernhardt, M. Hantschel, and A. Munk. Modeling the growth of fingerprints improves matching for adolescents. IEEE Transactions on Information Forensics and Security, 6(3):1165–1169, September 2011.
-  C. Gottschlich and S. Huckemann. Separating the real from the synthetic: Minutiae histograms as fingerprints of fingerprints. IET Biometrics, to appear.
-  K. Grauman and T. Darrell. Fast contour matching using approximate earth mover’s distance. In Proc. CVPR, pages 220 –227, Washington, DC, USA, June 2004.
-  J. Habr. Die Frequenzmethode zur Lösung der Transportprobleme und verwandter linearer Programmierungsprobleme. Wissenschaftliche Zeitung der Universität Dresden, 10(5):1069–1071, May 1961.
-  J.R. Hoffman and R.P.S. Mahler. Multitarget miss distance via optimal assignment. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 34(3):327–336, May 2004.
-  H. S. Houthakker. On the numerical solution of the transportation problem. Operations Research, 3(2):210–214, May 1955.
-  D. Kendal, C.E. Hauser, G.E. Garrard, S. Jellinek, K.M. Giljohann, and J.L. Moore. Quantifying plant colour and colour difference as perceived by humans using digital images. PLOS ONE, 8(8):e72296, August 2013.
-  H. Ling and K. Okada. An efficient earth mover’s distance algorithm for robust histogram comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):840–853, May 2007.
-  G. Loeper and F. Rapetti. Numerical solution of the Monge-Ampère equation by a Newton’s algorithm. Comptes Rendus Mathematique, 340(4):319–324, February 2005.
-  D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, New York, NY, 2008.
-  Q. Lv, M. Charikar, and K. Li. Image similarity search with compact data structures. In Proc. CIKM, pages 208–217, Washington, DC, USA, November 2004.
-  Q. Mérigot. A multiscale approach to optimal transport. Computer Graphics Forum, 30(5):1583–1592, August 2011.
-  G. Monge. Mémoire sur la théorie des déblais et des remblais. De l’Imprimerie Royale, 1781.
-  O. Pele and M. Werman. Fast and robust earth mover’s distances. In Proc. ICCV, pages 460–467, Kyoto, Japan, September 2009.
-  P. Qiu. Inferring phenotypic properties from single-cell characteristics. PLOS ONE, 7(5):e37038, May 2013.
-  N. V. Reinfeld and W. R. Vogel. Mathematical Programming. Prentice-Hall, Englewood Cliffs, NJ, 1958.
-  E. Ricci, G. Zen, N. Sebe, and S. Messelodi. A prototype learning framework using EMD: Application to complex scenes analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3):513–526, March 2013.
-  Y. Rubner, C. Tomasi, and L. J. Guibas. The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, November 2000.
-  E. J. Russell. Extension of Dantzig’s algorithm to finding an initial near-optimal basis for the transportation problem. Operations Research, 17(1):187–191, February 1969.
-  D. Schuhmacher, B.-T. Vo, and B.-N. Vo. A consistent metric for performance evaluation of multi-object filters. IEEE Transactions on Signal Processing, 56(8):3447–3457, August 2008.
-  R. Sedgewick. Algorithms in Java. Addison-Wesley, Boston, MA, 2003.
-  F.C.R. Spieksma and G.J. Woeginger. Geometric three-dimensional assignment problems. European Journal of Operational Research, 91(3):611–618, June 1996.
-  V. Srinivasan and G. L. Thompson. Benefit-cost analysis of coding techniques for the primal transportation algorithm. Journal of the ACM, 20(2):194–213, April 1973.
-  S. Urbanek and Y. Rubner. emdist. R package version 0.3-1. Available: http://CRAN.R-project.org/package=emdist, 2012.
-  CPLEX v12.5.1. High-performance mathematical programming engine. International Business Machines corp., 2010.
-  M. Vejdemo-Johansson, S. Vejdemo, and C.-H. Ek. Comparing distributions of color words: Pitfalls and metric choices. PLOS ONE, 9(2):e89184, February 2014.
-  C. Villani. Optimal Transport, Old and New. Springer, Berlin, Germany, 2008.
-  Q. Zhao, Z. Yang, and H. Tao. Differential earth mover’s distance with its applications to visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2):274–287, February 2010.