# Similar Elements and Metric Labeling on Complete Graphs

We consider a problem that involves finding similar elements in a collection of sets. The problem is motivated by applications in machine learning and pattern recognition (see, e.g., [2]). Intuitively, we would like to discover something in common among a collection of sets, even when the sets have empty intersection. A solution involves selecting an element from each set such that the selected elements are close to each other under an appropriate metric. We formulate an optimization problem that captures this notion and give an efficient approximation algorithm that finds a solution whose cost is within a factor of 2 of the optimum.

The similar elements problem is a special case of the metric labeling problem defined in [1]. We also give an efficient 2-approximation algorithm for the metric labeling problem on complete graphs, which generalizes the similar elements problem to include costs for selecting elements in each set.

Beyond producing solutions with good theoretical guarantees, the algorithms described here are also practical. The algorithm for the similar elements problem has been implemented and used to find similar objects in a collection of photographs [3].

## 1 Similar Elements

Let $X$ be a (possibly infinite) set and $d$ be a metric on $X$. Let $S_1, \ldots, S_n$ be $n$ finite subsets of $X$. The goal of the similar elements problem is to select an element from each set such that the selected elements are close to each other under the metric $d$. One motivation is to discover something in common among the sets even when they have empty intersection.

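We formalize the problem as the minimization of the sum of pairwise distances among selected elements. Let $x = (x_1, \ldots, x_n)$ with $x_i \in S_i$. Define the similar elements objective as

$$c(x) = \sum_{1 \le i < j \le n} d(x_i, x_j). \tag{1}$$

Let $x^* = \operatorname*{argmin}_x c(x)$ be an optimal solution for the similar elements problem.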

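Optimizing $c(x)$ appears to be difficult, but we can define easier problems if we ignore some of the pairwise distances in the objective. In particular we define $n$ different “star-graph” objective functions as follows. For each $1 \le r \le n$ define the objective $c^r(x)$ to account only for the terms in $c(x)$ involving $x_r$,

$$c^r(x) = \sum_{j \neq r} d(x_r, x_j). \tag{2}$$

Let $x^r = \operatorname*{argmin}_x c^r(x)$ be an optimal solution for the optimization problem defined by $c^r(x)$. We can compute $x^r$ efficiently using a simple form of dynamic programming, by first computing $x^r_r$ and then computing $x^r_j$ for $j \neq r$.

$$x^r_r = \operatorname*{argmin}_{x_r \in S_r} \sum_{j \neq r} \min_{x_j \in S_j} d(x_r, x_j) \tag{3}$$

$$x^r_j = \operatorname*{argmin}_{x_j \in S_j} d(x^r_r, x_j) \tag{4}$$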

Each of the $n$ “star-graph” objective functions leads to a possible solution. We then select from among the $n$ solutions by evaluating the original objective,

$$\hat{x} = \operatorname*{argmin}_{x \in \{x^1, \ldots, x^n\}} c(x). \tag{5}$$
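The two stages described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation referenced in [3]: the function names (`objective`, `star_solution`, `similar_elements`) and the representation of the sets as Python lists with the metric passed as a callable are assumptions made for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def objective(x: Sequence[T], d: Callable[[T, T], float]) -> float:
    """Sum of pairwise distances among the selected elements, as in (1)."""
    n = len(x)
    return sum(d(x[i], x[j]) for i in range(n) for j in range(i + 1, n))


def star_solution(sets: Sequence[Sequence[T]], d: Callable[[T, T], float], r: int) -> List[T]:
    """Minimize the star objective rooted at index r, following (3) and (4)."""
    n = len(sets)

    def star_cost(root: T) -> float:
        # Cost of the best completion when `root` is selected from sets[r].
        return sum(min(d(root, y) for y in sets[j]) for j in range(n) if j != r)

    root = min(sets[r], key=star_cost)  # equation (3)
    # Equation (4): every other set contributes its element closest to the root.
    return [root if j == r else min(sets[j], key=lambda y: d(root, y)) for j in range(n)]


def similar_elements(sets: Sequence[Sequence[T]], d: Callable[[T, T], float]) -> List[T]:
    """Return the best of the n star solutions under the full objective, as in (5)."""
    candidates = [star_solution(sets, d, r) for r in range(len(sets))]
    return min(candidates, key=lambda x: objective(x, d))


# Hypothetical example: points on the real line with d(a, b) = |a - b|.
sets = [[0, 10], [1, 9], [2, 20]]
d = lambda a, b: abs(a - b)
best = similar_elements(sets, d)
print(best, objective(best, d))  # [0, 1, 2] 4
```

The sketch recomputes distances rather than caching them, which matches the assumption that each distance is available in constant time.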

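###### Theorem 1.

The algorithm described above finds a 2-approximate solution to the similar elements problem. That is, $c(\hat{x}) \le 2c(x^*)$.

###### Proof.

Since the minimum of a set of values is at most the average,

$$c(\hat{x}) = \min_{1 \le r \le n} c(x^r) \le \frac{1}{n} \sum_{r=1}^{n} c(x^r) = \frac{1}{n} \sum_{r=1}^{n} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} d(x^r_i, x^r_j).$$

By the triangle inequality we have

$$\begin{aligned}
c(\hat{x}) &\le \frac{1}{2n} \sum_{r=1}^{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d(x^r_i, x^r_r) + d(x^r_r, x^r_j) \right) \\
&= \frac{1}{2n} \sum_{r=1}^{n} 2n \, c^r(x^r) = \sum_{r=1}^{n} c^r(x^r) \\
&\le \sum_{r=1}^{n} c^r(x^*) = 2c(x^*).
\end{aligned}$$

The last inequality holds because $x^r$ minimizes $c^r$, and the final equality holds because each pairwise distance appears in exactly two of the star objectives.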
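The argument rests on two facts about any selection $x$: summing the star objectives over all roots counts every pairwise distance exactly twice, and the triangle inequality bounds the full objective by $n$ times any single star objective. The following sketch checks both numerically on a random selection; the helper names `c` and `c_star` and the choice of integer points on a line are assumptions made for this illustration.

```python
import random


def d(a: int, b: int) -> int:
    """Absolute difference, a metric on the integers."""
    return abs(a - b)


def c(x):
    """Full objective: sum of d over all unordered pairs."""
    n = len(x)
    return sum(d(x[i], x[j]) for i in range(n) for j in range(i + 1, n))


def c_star(x, r):
    """Star objective rooted at index r: distances from x[r] to all others."""
    return sum(d(x[r], x[j]) for j in range(len(x)) if j != r)


random.seed(0)
x = [random.randint(0, 100) for _ in range(6)]
n = len(x)

# Each pair {i, j} is counted once with root i and once with root j.
assert sum(c_star(x, r) for r in range(n)) == 2 * c(x)

# Triangle inequality through the root: c(x) <= n * c^r(x) for every r.
assert all(c(x) <= n * c_star(x, r) for r in range(n))
```

Integer points are used so that both checks are exact rather than subject to floating-point rounding.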

To analyze the running time of the approximation algorithm we assume the distances between pairs of elements in $X$ are either pre-computed and given as part of the input, or they can each be computed in $O(1)$ time.

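Let $k = \max_{1 \le i \le n} |S_i|$. The first stage of the algorithm involves $n$ optimization problems that can be solved in $O(nk^2)$ time each, since each of the at most $k$ candidates for the root element is compared against at most $k$ elements in each of the other $n-1$ sets. The second stage of the algorithm involves evaluating $c$ for $n$ solutions, at $O(n^2)$ time per evaluation, and takes $O(n^3)$ time.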

###### Remark 2.

If each of the sets $S_1, \ldots, S_n$ has size at most $k$, the running time of the approximation algorithm for the similar elements problem is $O(n^2k^2 + n^3)$.

## 2 Metric Labeling on Complete Graphs

Let $G = (V, E)$ be an undirected simple graph on $n$ nodes $V = \{1, \ldots, n\}$. Let $L$ be a finite set of labels with $|L| = k$ and $d$ be a metric on $L$. For $i \in V$ let $m_i$ be a non-negative function mapping labels to real values. The unweighted metric labeling problem on $G$ is to find a labeling $x = (x_1, \ldots, x_n) \in L^n$ minimizing

$$c(x) = \sum_{i \in V} m_i(x_i) + \sum_{\{i,j\} \in E} d(x_i, x_j). \tag{6}$$

Let $x^* = \operatorname*{argmin}_x c(x)$. This optimization problem can be solved in polynomial time using dynamic programming if $G$ is a tree. Here we consider the case when $G$ is the complete graph and give an efficient 2-approximation algorithm based on the solution of several metric labeling problems on star graphs.

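For each $r \in V$ define a different objective function, $c^r$, corresponding to a metric labeling problem on a star graph with vertex set $V$ rooted at $r$,

$$c^r(x) = \sum_{i \in V} \frac{m_i(x_i)}{n} + \sum_{j \in V \setminus \{r\}} d(x_r, x_j). \tag{7}$$

Let $x^r = \operatorname*{argmin}_x c^r(x)$. We can solve this optimization problem in $O(nk^2)$ time using a simple form of dynamic programming. First compute an optimal label for the root vertex using one step of dynamic programming,

$$x^r_r = \operatorname*{argmin}_{x_r \in L} \left( \frac{m_r(x_r)}{n} + \sum_{j \in V \setminus \{r\}} \min_{x_j \in L} \left( \frac{m_j(x_j)}{n} + d(x_r, x_j) \right) \right). \tag{8}$$

Then compute $x^r_j$ for $j \in V \setminus \{r\}$,

$$x^r_j = \operatorname*{argmin}_{x_j \in L} \left( \frac{m_j(x_j)}{n} + d(x^r_r, x_j) \right). \tag{9}$$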

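Optimizing each $c^r$ separately leads to $n$ possible solutions $x^1, \ldots, x^n$. We select one of them using the original metric labeling objective on the complete graph,

$$\hat{x} = \operatorname*{argmin}_{x \in \{x^1, \ldots, x^n\}} c(x). \tag{10}$$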
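The procedure above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names, the representation of each node cost $m_i$ as a callable, and in particular the scaling of node costs by $1/n$ inside the star objective are choices made for this example (the scaling is one choice under which the sum of the star objectives is at most twice the full objective).

```python
from typing import Callable, List, Sequence, TypeVar

Label = TypeVar("Label")


def labeling_cost(x, m, d) -> float:
    """Metric labeling objective on the complete graph: node costs plus all pairwise distances."""
    n = len(x)
    unary = sum(m[i](x[i]) for i in range(n))
    pairwise = sum(d(x[i], x[j]) for i in range(n) for j in range(i + 1, n))
    return unary + pairwise


def star_labeling(labels: Sequence[Label], m: Sequence[Callable], d: Callable, r: int) -> List[Label]:
    """Optimal labeling for the star rooted at node r (node costs scaled by 1/n, an assumption)."""
    n = len(m)

    def leaf_cost(j: int, root: Label, a: Label) -> float:
        # Contribution of leaf j taking label a when the root has label `root`.
        return m[j](a) / n + d(root, a)

    def root_cost(root: Label) -> float:
        # Root node cost plus the best achievable cost of every leaf.
        return m[r](root) / n + sum(
            min(leaf_cost(j, root, a) for a in labels) for j in range(n) if j != r
        )

    root = min(labels, key=root_cost)
    return [
        root if j == r else min(labels, key=lambda a: leaf_cost(j, root, a))
        for j in range(n)
    ]


def metric_labeling_complete(labels, m, d) -> List[Label]:
    """Solve the n star problems and keep the labeling with the lowest full cost."""
    candidates = [star_labeling(labels, m, d, r) for r in range(len(m))]
    return min(candidates, key=lambda x: labeling_cost(x, m, d))


# Hypothetical example: three nodes, labels on a line, node costs measuring
# the distance of a label from a per-node observation.
labels = [0, 1, 2]
d = lambda a, b: abs(a - b)
observations = [0, 0, 2]
m = [lambda a, o=o: abs(a - o) for o in observations]
x_hat = metric_labeling_complete(labels, m, d)
print(x_hat, labeling_cost(x_hat, m, d))
```

Setting $m_i$ to zero on a subset $S_i$ of the labels and to a large value elsewhere recovers the similar elements setting as a special case.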

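###### Theorem 3.

The algorithm described above finds a 2-approximate solution to the metric labeling problem on a complete graph. That is, $c(\hat{x}) \le 2c(x^*)$.

###### Proof.

Since the minimum of a set of values is at most the average,

$$c(\hat{x}) = \min_{r \in V} c(x^r) \le \frac{1}{n} \sum_{r \in V} c(x^r) = \frac{1}{n} \sum_{r \in V} \left( \sum_{i \in V} m_i(x^r_i) + \frac{1}{2} \sum_{i \in V} \sum_{j \in V} d(x^r_i, x^r_j) \right).$$

Since $d$ is a metric and $m_i$ is non-negative,

$$\begin{aligned}
c(\hat{x}) &\le \frac{1}{n} \sum_{r \in V} \left( \sum_{i \in V} m_i(x^r_i) + \frac{1}{2} \sum_{i \in V} \sum_{j \in V} \left( d(x^r_i, x^r_r) + d(x^r_r, x^r_j) \right) \right) \\
&= \frac{1}{n} \sum_{r \in V} \left( \sum_{i \in V} m_i(x^r_i) + n \sum_{j \in V \setminus \{r\}} d(x^r_r, x^r_j) \right) = \sum_{r \in V} c^r(x^r) \\
&\le \sum_{r \in V} c^r(x^*) = \sum_{i \in V} m_i(x^*_i) + 2 \sum_{1 \le i < j \le n} d(x^*_i, x^*_j) \le 2c(x^*).
\end{aligned}$$

The step from $x^r$ to $x^*$ holds because $x^r$ minimizes $c^r$, and the final inequality uses the non-negativity of the $m_i$.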

The first stage of the algorithm involves $n$ optimization problems that can be solved in $O(nk^2)$ time each. The second stage of the algorithm involves evaluating $n$ solutions, and takes $O(n^3)$ time.

###### Remark 4.

The running time of the approximation algorithm for the metric labeling problem on complete graphs is $O(n^2k^2 + n^3)$.

### Acknowledgments

We thank Caroline Klivans, Sarah Sachs, Robert Kleinberg and Yang Yuan for helpful discussions about the contents of this report. This material is based upon work supported by the National Science Foundation under Grant No. 1447413.

## References

- [1] Jon Kleinberg and Éva Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Journal of the ACM, 49(5):616–639, 2002.
- [2] Oded Maron and Aparna Lakshmi Ratan. Multiple-instance learning for natural scene classification. In International Conference on Machine Learning, volume 98, pages 341–349, 1998.
- [3] Sarah Sachs. Similar-part approximation using invariant feature descriptors. Undergraduate Honors Thesis, Brown University, 2016.