A Quadratic Penalty Method for Hypergraph Matching
@Springer Science+Business Media, LLC 2017
Hypergraph matching is a fundamental problem in computer vision. Mathematically, it maximizes a polynomial objective function subject to assignment constraints. In this paper, we reformulate the hypergraph matching problem as a sparse constrained optimization problem. By dropping the sparse constraint, we show that the resulting relaxation problem can recover the global minimizer of the original problem. This property depends heavily on the special structures of hypergraph matching. The critical step in solving the original problem is to identify the location of nonzero entries (referred to as the support set) in a global minimizer. Inspired by this observation, we apply the quadratic penalty method to solve the relaxation problem. Under reasonable assumptions, we show that the support set of the global minimizer in a hypergraph matching problem can be correctly identified when the number of iterations is sufficiently large. A projected gradient method is applied as a subsolver to solve the quadratic penalty subproblem. Numerical results demonstrate that the exact recovery of the support set indeed happens, and that the proposed algorithm is efficient in terms of both accuracy and CPU time.
Keywords: Hypergraph matching · Sparse optimization · Quadratic penalty method · Projected gradient method
Recently, hypergraph matching has become a popular tool for establishing correspondence between two sets of points. It is a central problem in computer vision, and has been applied in several areas, including object detection berg2005shape (), image retrieval yan2005efficient (), image stitching zaragoza13 (); zaragoza14 (), and bioinformatics wu2012prl ().
From the point of view of graph theory, hypergraph matching belongs to bipartite matching. Traditional graph matching models use only point-to-point features or pair-to-pair features, which can be handled by linear assignment algorithms jiang2007 (); maciel2003 () or quadratic assignment algorithms egozi2013 (); Jiang2016 (); lee2011 (); litman2014 (); yan2015 (), respectively. To use more geometric information such as angles, lines, and areas, triple-to-triple graph matching was proposed in 2008 zass2008probabilistic (), and was further studied in duchenne2011tensor (); lee2011hyper (); nguyen2016efficient (). Since three vertices are associated with one edge, it is also termed hypergraph matching. Numerical experiments in the literature duchenne2011tensor (); lee2011hyper (); nguyen2016efficient (); zass2008probabilistic () show that hypergraph matching is more effective than traditional graph matching. The aim of this paper is to study the hypergraph matching problem in terms of both theory and algorithms.
The mathematical model of hypergraph matching is to maximize a multi-linear objective function, subject to the row permutation constraints in (1.1), under which each point in the first point set is matched to exactly one point in the second, or the permutation constraints in (1.2), under which the matching is one-to-one in both directions.
Most existing algorithms for hypergraph matching relax the binary constraints into bound constraints and solve a continuous optimization problem. For instance, the probabilistic Hypergraph Matching method (HGM) zass2008probabilistic () reformulated the constraints as the intersection of three convex sets, and successively projected the variables onto these sets until convergence. The Tensor Matching method (TM) duchenne2011tensor () solved the optimization problem using the power iteration algorithm. The Hypergraph Matching method via Reweighted Random Walks (RRWHM) lee2011hyper () dealt with the problem by reweighted random walks between feasible vectors. Different from the above algorithms, Block Coordinate Ascent Graph Matching (BCAGM) nguyen2016efficient () applied a block coordinate ascent framework, in which the binary constraints are kept, the multi-linear objective function is reformulated into a linear one, and the resulting problem is solved by linear assignment algorithms. All the existing algorithms require the equality constraints in (1.1) or (1.2) to be satisfied strictly at each iteration. In fact, we only expect one element to be significantly larger than the others in each row or column of the assignment matrix. That is, the equality constraints are only soft constraints, which allow violations to some extent. Therefore, in our algorithm we penalize the equality constraint violations as part of the objective function.
The hypergraph matching problem can also be reformulated equivalently as a nonlinear optimization problem with a sparse constraint. During the last few years, there has been significant progress in the optimization community on solving sparse constrained nonlinear problems, particularly on optimality conditions and numerical algorithms in different situations. Recent developments in optimality conditions can be found in Pan2017 (), where, based on decomposition properties of the normal cones, the authors characterized different kinds of stationary points and investigated in detail the relations among local minimizers, global minimizers, and several types of stationary points. Other related work includes Bauschke2014Restricted (); Burdakov2015MATHEMATICAL (); C2016Constraint (); Li2015 (); Pan2015On (). The related algorithms can be summarized into two approaches. One is the direct approach, which deals with the sparse constraint directly, such as hard-thresholding type algorithms Beck2012Sparsity (); Pan2016A () and penalty based algorithms Lu2012Sparse (). The other is the relaxation approach, such as regularization based algorithms ChenLeiLuYe (); Jiang2016 (). In particular, an efficient regularization algorithm was proposed in Jiang2016 (), which deals with problems over the permutation matrix constraints (1.2). It can be applied to solve the hypergraph matching problem subject to (1.2).
Motivation. Noting that hypergraph matching is essentially a mixed integer program, most existing methods relax the integer constraints into box constraints and solve the relaxed continuous optimization problem. A natural question is: what is the relation between the hypergraph matching problem and its relaxation? Furthermore, the key step in solving this problem is actually to identify the support set of a global minimizer. None of the existing algorithms has taken this fact into account. This leads to the second question: can we make use of this insight to design our algorithm?
Our Contributions. In this paper, by reformulating hypergraph matching equivalently as a sparse constrained optimization problem, we study it from the following aspects.
Relaxation problem. By dropping the sparse constraint, we show that the relaxation problem can recover the solution of the original problem in the sense that the former problem shares at least one global minimizer with the latter one (Theorem 3.1). This result depends heavily on the special structures of hypergraph matching. Furthermore, we show that Theorem 3.1 can be extended to more general problems (Corollary 2). For any global minimizer of the relaxation problem, we propose a procedure to reduce the number of its nonzero entries until a global minimizer of the original problem is reached.
Quadratic penalty method. Our aim is to identify the support set of a global minimizer of the original problem, so the equality constraints need not be satisfied strictly. This motivates us to penalize the equality constraint violations and solve the relaxation problem by a quadratic penalty method. We show that, under reasonable assumptions, the support set of a global minimizer of the original problem can be recovered exactly when the number of iterations is sufficiently large (Theorems 4.2 and 4.3).
Projected gradient method. For the quadratic penalty subproblem, which is a nonlinear problem with simple box constraints, we choose one of the active set based methods, the projected gradient method, as a subsolver. The advantage of an active set based method is that it fits our motivation well: we aim to identify the support set of the solution rather than the magnitudes of its entries. Numerical results demonstrate that the exact recovery of the support set indeed happens, and that the proposed algorithm is particularly suitable for large-scale problems.
Organization. The rest of the paper is organized as follows. In Section 2, we introduce the reformulation of the hypergraph matching problem, and discuss several preliminary properties. In Section 3, we study the properties of the relaxation problem by dropping the sparse constraint. In Section 4, we study the quadratic penalty method by penalizing the equality constraint violations and establish the convergence results in terms of support set under different situations. An existing projected gradient method is also discussed to solve the quadratic penalty subproblem. Numerical experiments are reported in Section 5. Final conclusions are drawn in Section 6.
Notations. For a vector, define the active set as the index set of its zero entries and the support set as the index set of its nonzero entries; the corresponding sets at particular points of interest are denoted analogously. Let |T| be the number of elements in a set T. We write ‖x‖ for the Euclidean norm of x, ‖x‖_0 for the number of nonzero entries in x, and ‖x‖_∞ for the infinity norm of x.
2 Problem Reformulation
In this section, we will reformulate hypergraph matching as a sparse constrained optimization problem, and discuss several preliminary properties.
2.1 Hypergraph matching problem
In this part, we will give the mathematical formulation for hypergraph matching, including its objective function and constraints.
Consider two hypergraphs , and , where and are sets of points with , , and and are sets of hyperedges. In this paper, we always suppose that , and that each point in is matched to exactly one point in , while each point in can be matched to an arbitrary number of points in . That is, we focus on (1.1). For each hypergraph, we consider three-uniform hyperedges; namely, the three points involved in each hyperedge are distinct, for example, . Our aim is to find the best correspondence (also referred to as a 'matching') between and with the maximum matching score.
Let X be the assignment matrix between the two point sets, i.e., X_{ij} = 1 if the i-th point of the first set is matched to the j-th point of the second set, and X_{ij} = 0 otherwise.
Two hyperedges and are said to be matched if are assigned to , respectively. This can be represented equivalently by . Let be the matching score between and . Then is a sixth-order tensor. Assume is given, satisfying if and , and , otherwise.
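As a concrete illustration of the assignment matrix and the matching condition above, the following sketch builds a small 0-1 matrix and tests whether two hyperedges are matched. The helper names and the toy matching are ours, not the paper's:

```python
import numpy as np

def assignment_matrix(match, n1, n2):
    """Build the 0-1 assignment matrix X from a mapping match[i] = j,
    meaning point i in the first set is matched to point j in the second."""
    X = np.zeros((n1, n2), dtype=int)
    for i, j in enumerate(match):
        X[i, j] = 1
    return X

def hyperedges_matched(X, e1, e2):
    """Hyperedges e1 = (i1, i2, i3) and e2 = (j1, j2, j3) are matched
    exactly when X[i1, j1] = X[i2, j2] = X[i3, j3] = 1."""
    (i1, i2, i3), (j1, j2, j3) = e1, e2
    return bool(X[i1, j1] * X[i2, j2] * X[i3, j3])

X = assignment_matrix([2, 0, 1], n1=3, n2=4)          # 0 -> 2, 1 -> 0, 2 -> 1
print(X.sum(axis=1))                                  # [1 1 1]: one match per row
print(hyperedges_matched(X, (0, 1, 2), (2, 0, 1)))    # True
```

Each row of X sums to one, reflecting that each point in the first set is matched exactly once, as in (1.1).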
Given hypergraphs , , and the matching score , the hypergraph matching problem takes the following form
Note that (2.1) is a matrix optimization problem, which can be reformulated as a vector optimization problem as follows.
Let , be the vectorization of , that is
Here, is the -th block of . In the following, for any vector , we always assume it has the same partition as . Define as
Consequently, (2.1) can be reformulated as
where is a vector with all entries equal to one, and .
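The vectorization just described can be illustrated as follows, assuming (as the block structure above suggests) that the i-th block of the vector is the i-th row of the assignment matrix:

```python
import numpy as np

# Row-wise vectorization of the assignment matrix: the i-th block of x
# is the i-th row of X, so x[i * n2 + j] = X[i, j] in 0-based indexing.
X = np.array([[0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])
n1, n2 = X.shape
x = X.reshape(-1)                            # length n1 * n2
block = lambda i: x[i * n2:(i + 1) * n2]     # the i-th block of x
assert np.array_equal(block(1), X[1])        # block 1 equals row 1 of X
```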
2.2 Preliminary properties
In this subsection, we will discuss several properties of , , and . We begin with properties of .
for all and ;
If , then , , and are distinct. If , then , , and are also distinct;
For any permutation operator , suppose and . There is
The above properties of result in the following properties of directly.
, for all ;
For nonzero entries of , say , , and come from different blocks of ;
Suppose is any permutation of . Then
In other words, is nonnegative and symmetric.
Proof. (i) follows directly from the nonnegativity of . In terms of (ii), by the definition of , there exist and such that (2.2) and (2.3) hold. Further, we know that is the -th entry in the -th block of , i.e., . Similarly, and . By (ii) in Proposition 1, are distinct, which implies that , , come from different blocks of . In terms of (iii), since , , again by the definition of and (2.3), there is . Together with (2.5) and (2.2), there is (2.6). ∎
Different from other nonlinear problems, the homogeneous polynomial enjoys special structures. To see this, for the -th block , denote
Rewrite as follows:
For each block , , is a linear function of , i.e., is independent of ;
Proof. In terms of (i), by the definition of , we only need to consider the term , where is nonzero. Due to (ii) in Proposition 2, is linear in each related block , , and . Therefore, is a linear function of , .
In terms of (ii), the elements of gradient take the following form
Rewrite in (2.7) as
Hence, , which gives (2.8). ∎
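The block-wise linearity established above can be checked numerically on a toy multilinear function; the tensor and blocks below are random stand-ins, not the paper's matching score:

```python
import numpy as np

# Toy multilinear score f(u, v, w) = sum_{i,j,k} A[i,j,k] * u_i * v_j * w_k,
# where u, v, w play the role of three different blocks. Fixing all blocks
# but one, f is linear in the remaining block.
rng = np.random.default_rng(0)
A = rng.random((3, 3, 3))
v, w = rng.random(3), rng.random(3)
f = lambda u, v, w: np.einsum('ijk,i,j,k->', A, u, v, w)

a, b = rng.random(3), rng.random(3)
lhs = f(2.0 * a + 3.0 * b, v, w)             # evaluate at a combination
rhs = 2.0 * f(a, v, w) + 3.0 * f(b, v, w)    # combine the evaluations
assert abs(lhs - rhs) < 1e-9                 # linear in the first block
```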
2.3 Sparse constrained optimization problem
Problem (2.4) is a 0-1 mixed integer program; 0-1 integer programming is one of Karp's 21 NP-complete problems Karp1972 (). In this subsection, we will reformulate (2.4) into a sparse constrained optimization problem.
By direct computations, (2.4) can be reformulated as the following sparse constrained minimization problem
where and is the -th column of the -by- identity matrix.
To see this, for each satisfying the equality constraints, we have . Together with , we actually have .
Note that the dimension of is , which can be large even for moderate and . For instance, if and , then , and the number of elements in will be around . Hence, algorithms capable of dealing with large-scale problems are highly in demand.
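A back-of-the-envelope count makes the scale concrete; the values of n1 and n2 below are hypothetical, and we assume the sixth-order score tensor is indexed by three (i, j) pairs, so it has (n1 * n2)^3 entries:

```python
# Back-of-the-envelope size count for the vectorized problem.
# n1 and n2 are hypothetical values chosen only for illustration.
n1, n2 = 20, 30
n = n1 * n2                 # dimension of the vectorized variable x
tensor_entries = n ** 3     # entries when the tensor is indexed by 3 pairs
print(n)                    # 600
print(tensor_entries)       # 216000000, on the order of 1e8
```

Even for such moderate point-set sizes the tensor has hundreds of millions of entries, which is why exploiting sparsity and scalability matters.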
Problem (2.9) is essentially a 0-1 mixed integer program. Each feasible point is an isolated feasible point, which means that it is a strict local minimizer and, of course, a stationary point of (2.9). For a theoretical verification from the optimality point of view, please see Theorems 1 and 3 in an earlier version of our paper CuiLiQiYan2017 ().
3 Relaxation Problem of (2.9)
By dropping the sparse constraint in (2.9), we obtain the following problem (referred to as the relaxation problem)
As we will show later in Theorem 3.1, although we drop the sparse constraint, the relaxation problem (3.1) still admits a global minimizer with sparsity due to the special structures of (2.9). That is, the relaxation problem (3.1) recovers a global minimizer of (2.9).
which are equivalent to
for all . Define the active set and the support set for the -th block as
The KKT conditions can be reformulated as
for all . The above analysis gives the following lemma.
Let be a stationary point of (3.1), and be the Lagrange multiplier corresponding to the equality constraints. For all , we have
Proof. (i) can be obtained directly from the KKT conditions (3.3).
In terms of (ii), by (2.8), there is
where the last equality is due to . This completes the proof.∎
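The block-wise active and support sets used throughout this section can be computed with a short helper (illustrative only; a small tolerance stands in for exact zeros in floating point):

```python
import numpy as np

def active_support_per_block(x, n1, n2, tol=1e-10):
    """For each block l, return the active set {j : x_j^(l) = 0} and the
    support set {j : x_j^(l) != 0}. Helper names and tolerance are ours."""
    sets = []
    for l in range(n1):
        blk = x[l * n2:(l + 1) * n2]
        active = [j for j in range(n2) if abs(blk[j]) <= tol]
        support = [j for j in range(n2) if abs(blk[j]) > tol]
        sets.append((active, support))
    return sets

x = np.array([0.0, 0.5, 0.5,     # block 0: active {0}, support {1, 2}
              1.0, 0.0, 0.0])    # block 1: active {1, 2}, support {0}
print(active_support_per_block(x, n1=2, n2=3))
```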
Proof. Without loss of generality, let be a global minimizer of (3.1) with . Find the first block of , denoted as , such that . Now we choose one index from , and define a new point as follows:
Next, we will show that . Indeed,
This gives that is a feasible point with . In other words, is another global minimizer of (3.1) with . If , let . Otherwise, by repeating the above process, we can obtain a finite sequence , which are all feasible points for (3.1) satisfying
Note that there are blocks in . After at most steps, the process will stop. In other words, . The final point will satisfy . One can obtain a global minimizer of (3.1) with nonzero elements.
Next, we will show that is also a global minimizer of (2.9). Note that the feasible region of (2.9) is a subset of the feasible region of (3.1). implies that is also a feasible point for (2.9). Together with the fact that attains the global minimum of (3.1), we conclude that is a global minimizer of (2.9). ∎
From the proof of Theorem 3.1, one can start from any global minimizer of (3.1) to reach a point , which is a global minimizer of both (2.9) and (3.1). We only need to choose one index as the location of nonzero entry in each block . Assume is chosen from . Let . This will give the support set in the -th block, which in turn determines the global minimizer of (2.9) by
for each and . One particular method to choose is to choose the index with the largest value within the block. This is actually the projection of onto the feasible set of (2.9). Here, we summarize the process in Algorithm 1.
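The block-wise rounding described above (choosing the index with the largest value within each block) can be sketched as follows; the function name and the first-index tie-breaking rule are ours:

```python
import numpy as np

def project_to_binary(x, n1, n2):
    """Round a feasible point of the relaxation to a feasible point of the
    sparse problem: in each block, keep a single 1 at the largest entry.
    Ties are broken by the first maximal index."""
    y = np.zeros_like(x)
    for l in range(n1):
        blk = x[l * n2:(l + 1) * n2]
        j = int(np.argmax(blk))
        y[l * n2 + j] = 1.0
    return y

x = np.array([0.2, 0.7, 0.1,    # block 0: largest entry at index 1
              0.5, 0.3, 0.2])   # block 1: largest entry at index 0
print(project_to_binary(x, n1=2, n2=3))   # [0. 1. 0. 1. 0. 0.]
```

The output has exactly one nonzero entry per block, i.e., exactly n1 nonzero entries in total.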
Note that HGM zass2008probabilistic () also solves the relaxation problem (3.1), whereas TM duchenne2011tensor () and RRWHM lee2011hyper () solve the relaxation problem with the permutation constraints (1.2). However, none of them analyzes the connection between the original problem and the relaxation problem in terms of global minimizers. In contrast, the result in Theorem 3.1 reveals for the first time the connection between the original problem (2.9) and the relaxation problem (3.1), which is one of the main differences between our work and existing algorithms for hypergraph matching.
Theorem 3.1 reveals an interesting connection between the original problem (2.9) and the relaxation problem (3.1) in terms of global minimizers. The result relies heavily on the property of in Proposition 3, as well as on the equality constraints in (2.9). It can be extended to the following general case.
4 The Quadratic Penalty Method
In this section, we will consider the quadratic penalty method for the relaxation problem (3.1). It contains three parts. The first part is devoted to motivating the quadratic penalty problem and its preliminary properties. The second part mainly focuses on the quadratic penalty method and the convergence in terms of the support set. In the last part, we apply an existing projected gradient method for the quadratic penalty subproblem.
4.1 The quadratic penalty problem
Note that (3.1) is a nonlinear problem with separable simplex constraints, which can be solved by many traditional nonlinear optimization solvers such as fmincon in MATLAB. As mentioned in Section 1, existing algorithms for hypergraph matching require the equality constraints in (3.1) to be satisfied strictly. In contrast, our aim here is to identify the support set of a global minimizer of (3.1) rather than the magnitudes of its entries. Once the support set is found, we can follow the method in Remark 3 to obtain a global minimizer of (2.9). Inspired by these observations, we penalize the equality constraint violations as part of the objective function. This is another main difference between our method and existing algorithms. It leads to the following quadratic penalty problem
where is a penalty parameter. However, this problem is not well defined in general, since for a fixed penalty parameter the global minimizer can approach infinity. We therefore add an upper bound to make the feasible set bounded, which gives the following problem
where is a given number. (4.1) is actually the quadratic penalty problem of the following problem
which is equivalent to (3.1).
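A minimal sketch of the quadratic penalty objective, assuming the subproblem minimizes the relaxation objective plus a weighted sum of squared block-sum violations (the exact weighting, e.g. a factor 1/2, may differ from the paper's formulation; f and sigma are supplied by the caller):

```python
import numpy as np

def penalty_objective(x, f, sigma, n1, n2):
    """Quadratic penalty for the block-sum constraints sum_j x_j^(l) = 1:
    returns f(x) + sigma * sum_l (block-l sum - 1)^2. Sketch only."""
    viol = np.array([x[l * n2:(l + 1) * n2].sum() - 1.0 for l in range(n1)])
    return f(x) + sigma * float(viol @ viol)

# Toy usage with a stand-in objective (not the matching score of the paper):
f = lambda x: -np.sum(x ** 2)
x_feasible = np.array([1.0, 0.0, 0.0,
                       0.0, 1.0, 0.0])       # block sums are 1: no penalty
print(penalty_objective(x_feasible, f, sigma=10.0, n1=2, n2=3))   # -2.0
```

For feasible points the penalty term vanishes and the value reduces to f(x); as sigma grows, infeasible points are penalized ever more heavily.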
The Lagrangian function of (4.1) is
where and are the Lagrange multipliers corresponding to the inequality constraints in (4.1). The KKT conditions are
The KKT conditions are equivalent to the following, for each ,
Define the violations of the equality constraints as
The above analysis can be stated in the following lemma.
Let be a stationary point of (4.1). We have for all .
Proof. For each