Safe, Optimal, Real-time Trajectory Planning with a Parallel Constrained Bernstein Algorithm


To move through the world, mobile robots typically use a receding-horizon strategy, wherein they execute an old plan while computing a new plan to incorporate new sensor information. A plan should be dynamically feasible, meaning it obeys constraints like the robot’s dynamics and obstacle avoidance; it should have liveness, meaning the robot does not stop to plan so frequently that it cannot accomplish tasks; and it should be optimal, meaning that the robot tries to satisfy a user-specified cost function such as reaching a goal location as quickly as possible. Reachability-based Trajectory Design (RTD) is a planning method that can generate provably dynamically-feasible plans. However, RTD solves a nonlinear polynomial optimization program at each planning iteration, preventing optimality guarantees; furthermore, RTD can struggle with liveness because the robot must brake to a stop when the solver finds local minima or cannot find a feasible solution. This paper proposes RTD*, which certifiably finds the globally optimal plan (if such a plan exists) at each planning iteration. This method is enabled by a novel Parallelized Constrained Bernstein Algorithm (PCBA), which is a branch-and-bound method for polynomial optimization. The contributions of this paper are: the implementation of PCBA; proofs of bounds on the time and memory usage of PCBA; a comparison of PCBA to state-of-the-art solvers; and the demonstration of PCBA/RTD* on a mobile robot. RTD* outperforms RTD in terms of optimality and liveness for real-time planning in a variety of environments with randomly-placed obstacles.

I Introduction

Fig. 1: A Segway RMP mobile robot using the proposed PCBA/RTD* method to autonomously navigate a tight obstacle blockade. The executed trajectory is shown fading from light to dark blue as time passes, and the robot is shown at four different time instances. The top right plot shows the Segway’s (blue circle with triangle indicating heading) view of the world at one planning iteration, with obstacles detected by a planar lidar (purple points). The top left plot shows the optimization program solved at the same planning iteration; the decision variable is , which parameterizes the velocity and yaw rate of a trajectory plan; the pink regions are infeasible with respect to constraints generated by the obstacle points in the right plot; and the blue contours with number labels depict the cost function, which is constructed to encourage the Segway to reach a waypoint (the star in the top right plot). The optimal solution found by PCBA is shown as a star on the left plot, in the non-convex feasible area (white). This optimal solution generates a provably-safe trajectory for the Segway to track, shown as a blue dashed line in the right plot. A video is available at

For mobile robots to operate successfully in unforeseen environments, they must plan their motion as new sensor information becomes available. This receding-horizon strategy requires iteratively generating a plan while simultaneously executing a previous plan. Typically, this requires solving an optimization program in each planning iteration (see, e.g., [19]).

The present work considers a mobile ground robot tasked with reaching a global goal location in an arbitrary, static environment. To assess receding-horizon planning performance, we consider the following characteristics of plans. First, a plan should be dynamically feasible, meaning that it obeys the dynamic description of the robot and obeys constraints such as actuator limits and obstacle avoidance. Second, a plan should maintain liveness, meaning that it keeps the robot moving through the world without stopping frequently to replan, which can prevent a robot from achieving a task in a user-specified amount of time. Third, a plan should be optimal with respect to a user-specified cost function, such as reaching a goal quickly.

Ensuring that plans have these characteristics is challenging for several reasons. First, robots typically have nonlinear dynamics; this means that creating a dynamically-feasible plan typically requires solving a nonlinear program (NLP) at runtime. However, it is difficult to certify that an NLP can be solved in a finite amount of time, meaning that the robot may have to sacrifice liveness. Furthermore, even if a robot has linear dynamics, the cost function may be nonlinear; in this case, it is challenging to certify optimality due to the presence of local minima.

This paper extends prior work on Reachability-based Trajectory Design (RTD). RTD is able to provably generate dynamically-feasible trajectory plans in real time, but cannot guarantee optimality or liveness (the robot will generate plans in real time, but may brake to a stop often). We address this gap by proposing the Parallel Constrained Bernstein Algorithm (PCBA), which provably finds globally-optimal solutions to Polynomial Optimization Problems (POPs), a special type of NLP. We apply PCBA to RTD to produce an optimal version of RTD, which we call RTD* in the spirit of the well-known RRT* algorithm [17]. We show on hardware that RTD* demonstrates liveness (in comparison to RTD) for trajectory optimization. For the remainder of this section, we discuss related work, then state our contributions.

I-A Related Work

Receding-horizon Planning

A variety of methods exist that attempt receding-horizon planning while maintaining dynamic feasibility, liveness, and optimality. These methods can be broadly classified by whether they perform sampling or solve an optimization program at each planning iteration. Sampling-based methods typically either attempt to satisfy liveness and dynamic feasibility by choosing samples offline [34, 24], or attempt to satisfy optimality at the potential expense of liveness and dynamic feasibility [17]. Optimization-based methods attempt to find a single optimal trajectory. These methods typically have to sacrifice dynamic feasibility (e.g., by linearizing the dynamics) to ensure liveness [29], or sacrifice liveness to attempt to satisfy dynamic feasibility [46, 8] (also see [18, §9] and [37]).

Reachability-based Trajectory Design

Reachability-based Trajectory Design (RTD) is an optimization-based receding horizon planner that requires solving a POP at each planning iteration [18]. RTD specifies plans as parameterized trajectories. Since these trajectories cannot necessarily be perfectly tracked by the robot, RTD begins with an offline computation of a Forward Reachable Set (FRS). The FRS contains every parameterized plan, plus the tracking error that results from the robot not tracking any plan perfectly. At runtime, in each planning iteration, the FRS is intersected with sensed obstacles to identify all parameterized plans that could cause a collision (i.e., be dynamically infeasible). This set of unsafe plans is represented as a (finite) list of polynomial constraints, and the user is allowed to specify an arbitrary (not necessarily convex) polynomial cost function, resulting in a POP. At each planning iteration, either the robot successfully solves the POP to get a new plan, or it continues executing its previously-found plan. While the decision variable is typically only two- or three-dimensional, each POP often has hundreds of constraints, making it challenging to find a feasible solution in real-time [18]. Each plan includes a braking maneuver, ensuring that the robot can always come safely to a stop if the POP cannot be solved quickly enough in any planning iteration.

Note, for RTD, optimality means finding the optimal solution to a POP at each planning iteration. The cost function in RTD’s POPs typically encodes behavior such as reaching a waypoint between the robot’s current position and the global goal (e.g., [18, §9.2.1]); RTD attempts to find the best dynamically feasible trajectory to the waypoint. RTD does not attempt to find the best waypoints themselves (best, e.g., with respect to finding the shortest path to the global goal). Such waypoints can be generated quickly by algorithms such as A* or RRT* by ignoring dynamic feasibility [17, 18].

Polynomial Optimization Problems

POPs require minimizing (or maximizing) a polynomial objective function, subject to polynomial equality or inequality constraints. As a fundamental class of problems in non-convex optimization, POPs arise in various applications, including signal processing [35, 43, 27], quantum mechanics [4, 12], control theory [16, 18, 2], and robotics [39, 26]. This paper presents a novel parallelized constrained Bernstein Algorithm for solving POPs.

The difficulty of solving a POP increases with the dimension of the cost and constraints, with the number of constraints, and with the number of optima [33]. Existing methods attempt to solve POPs while minimizing time and memory usage (i.e., complexity). Doing so typically requires placing limitations on one of these axes of difficulty to make solving a POP tractable. These methods broadly fall into the following categories: derivative-based, convex relaxation, and branch-and-bound.

Derivative-based methods use derivatives (and sometimes Hessians) of the cost and constraint functions, along with first- or second-order optimality conditions [33, §12.3, §12.5], to attempt to find optimal, feasible solutions to nonlinear problems such as POPs. These methods can find local minima of POPs rapidly despite high dimension, a large number of constraints, and high degree cost and constraints [33, Chapter 19.8]. However, these methods do not typically converge to global optima without requiring assumptions on the problem and constraint structure (e.g., [36]).

Convex relaxation methods attempt to find global optima by approximating the original problem with a hierarchy of convex optimization problems. These methods can be scaled to high-dimensional problems (up to 10 dimensions), at the expense of limits on the degree and sparse structure of the cost function; furthermore, they typically struggle to handle large numbers of constraints (e.g., the hundreds of constraints that arise in RTD’s POPs), unless the problem has low-rank or sparse structure [39]. Well-known examples include the lift-and-project linear program procedure [1], reformulation-linearization technique [41], and Semi-Definite Program (SDP) relaxations [21, 39]. By assuming structure such as homogeneity of the cost function or convexity of the domain and constraints, one can approximate solutions to a POP in polynomial-time, with convergence to global optima in the limit [5, 22, 23, 42, 15]. Convergence within a finite number of convex hierarchy relaxations is possible under certain assumptions (e.g., a limited number of equality constraints [32, 20]).

Branch-and-bound methods perform an exhaustive search over the feasible region. These methods are typically limited to up to four dimensions, but can handle large numbers of constraints and high degree cost and constraints. Examples include interval analysis techniques [14, 45] and the Bernstein Algorithm (BA) [10, 31, 40]. Interval analysis requires cost and constraint function evaluations in each iteration, and therefore can be computationally slow. BA, on the other hand, does not evaluate the cost and constraint functions; instead, BA represents the coefficients of the cost and constraints in the Bernstein basis, as opposed to the monomial basis. The coefficients in the Bernstein basis provide lower and upper bounds on the polynomial cost and constraints over box-shaped subsets of Euclidean space by using a subdivision procedure [9, 30]. Note, one can use the Bernstein basis to transform a POP into a linear program (LP) on each subdivided portion of the problem domain, which allows one to find tighter solution bounds than those given by the Bernstein coefficients alone [40]. Since subdivision can be parallelized [6], the time required to solve a POP can be greatly reduced by implementing BA on a Graphics Processing Unit (GPU). However, neither a parallelized implementation of BA with constraints nor bounds on its rate of convergence have yet been shown in the literature. Furthermore, to the best of our knowledge, BA has not been shown as a practical method for solving problems in real-time robotics applications.

I-B Contributions and Paper Organization

In this paper, we make the following contributions. First, we propose a Parallel Constrained Bernstein Algorithm (PCBA) (§III). Second, we prove that PCBA always finds an optimal solution (if one exists), and prove bounds on PCBA’s time and memory usage (§IV). Third, we evaluate PCBA on a suite of well-known POPs in comparison to the Bounded Sums-of-Squares (BSOS) [20] solver and a generic nonlinear solver (MATLAB’s fmincon) (§V). Fourth, we apply PCBA to RTD to make RTD*, a provably-safe, optimal, and real-time trajectory planning algorithm for mobile robots (§VI), thereby demonstrating dynamic feasibility and liveness. The remainder of the paper is organized as follows. §II defines notation for RTD and POPs. §VII draws conclusions. Appendix B lists benchmark problems for PCBA.

II Preliminaries

This section introduces notation, RTD, POPs, the Bernstein form, and subdivision.

II-A Notation

Polynomial Notation

We follow the notation in [31]. Let $x = (x_1, \dots, x_n)$ be a real variable of dimension $n$. A multi-index is defined as $I = (i_1, \dots, i_n) \in \mathbb{N}^n$, and the corresponding multi-power is defined as $x^I = x_1^{i_1} \cdots x_n^{i_n}$. Given another multi-index $J$ of the same dimension, an inequality $I \le J$ should be understood component-wise. An $n$-variate polynomial $p$ in canonical (monomial) form can be written as

$$p(x) = \sum_{I \le D} a_I\, x^I, \tag{1}$$

with coefficients $a_I \in \mathbb{R}$ and some multi-index $D$. The space of polynomials of degree $d$, with variable $x$, is denoted $\mathbb{R}_d[x]$.

Definition 1.

We call $D$ the multi-degree of a polynomial $p$; each $i$th element $d_i$ of $D$ is the maximum degree of the variable $x_i$ out of all of the monomials of $p$. We call $d$ the degree of $p$; $d$ is the maximum sum, over all monomials of $p$, of the powers of the variable $x$. That is, $d = \max\{\, |I| : a_I \neq 0 \,\}$, where $|I|$ is the sum of the elements of a multi-index $I$.

Point and Set Notation

Let $X = [\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]$ denote a general $n$-dimensional box in $\mathbb{R}^n$, with $\underline{x}_i \le \overline{x}_i$ for each $i$. Let $[0,1]^n$ be the $n$-dimensional unit box. Denote by $w(X)$ the maximum width of a box $X$, i.e. $w(X) = \max_i (\overline{x}_i - \underline{x}_i)$. For any point $x \in \mathbb{R}^n$, denote by $\|x\|$ the Euclidean norm of $x$, and denote by $B(x, r)$ the closed Euclidean ball centered at $x$ with radius $r$.

RTD Notation

Let denote the time interval of a single plan (in a single receding-horizon iteration). Let denote the robot’s state space, and denote the robot’s control inputs where . The robot is described by dynamics , which we call a high-fidelity model. The parameterized trajectories are described by , where is the space of trajectory parameters. A point parameterizes a trajectory . For any , the robot uses a feedback controller to track the trajectory parameterized by (we say it tracks for short).

II-B Polynomial Optimization Problems

We denote a POP as follows:

$$\begin{aligned} \min_{x \in X} \ & p(x) \\ \text{s.t.} \ & g_i(x) \le 0, \quad i = 1, \dots, m, \\ & h_j(x) = 0, \quad j = 1, \dots, \ell. \end{aligned} \tag{P}$$

The decision variable is $x \in X$, where $X \subset \mathbb{R}^n$ is a compact, box-shaped domain and $n$ is the dimension of the program. The cost function is $p$, and the constraints are $g_i$ and $h_j$ ($i = 1, \dots, m$; $j = 1, \dots, \ell$). We assume for convenience that $d$ is the greatest degree amongst the cost and constraint polynomials; we call $d$ the degree of the problem.
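For concreteness, a small hypothetical instance of (P) in two dimensions (our example, not one of the paper’s benchmarks) is

$$\min_{x \in [0,1]^2} \ (x_1^2 + x_2^2 - 1)^2 - x_1 x_2 \quad \text{s.t.} \quad x_1 + x_2 - \tfrac{3}{2} \le 0,$$

for which the dimension of the program is $n = 2$ and the degree of the problem is $d = 4$, the greatest degree among the cost and constraint polynomials.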

II-C Reachability-based Trajectory Design

Recall that RTD begins by computing an FRS offline for a robot tracking a parameterized set of trajectories. At runtime, in each receding-horizon planning iteration, the FRS is used to construct a POP as in (P); solving this POP is equivalent to generating a new trajectory plan. We now briefly describe how this POP is constructed (see [18] for details).

We define the FRS as the set of all states reachable by the robot when tracking any parameterized trajectory:


where is the set of valid initial conditions for the robot in any planning iteration. To implement RTD, we conservatively approximate the FRS using sums-of-squares programming. In particular, we compute a polynomial ( or ) for which the 0-superlevel set contains the FRS:


See [18, Section 3.2] for details on how to compute .

At runtime, we use to identify unsafe trajectory plans. If is a collection of points on obstacles, then we solve the following program in each planning iteration:


where is a user-constructed polynomial cost function (see (28) in §VI as an example). Note that, by (3), the set of safe trajectory plans is open, so we implement the constraints in (4) as , . Critically, any feasible solution to (4) is provably dynamically feasible (and collision-free) [18, Theorem 68]. To understand RTD’s online planning in more detail, see [18, Algorithm 2].

In this work, instead of using a derivative-based method to solve (4), we use our proposed PCBA method, which takes advantage of the polynomial structure of and . Next, we discuss Bernstein polynomials.

II-D Bernstein Form

A polynomial in monomial form (1) can be expanded into Bernstein form over an arbitrary $n$-dimensional box $X$ as

$$p(x) = \sum_{I \le D} b_I(X)\, B_I(x), \tag{5}$$

where $B_I$ is the $I$th multivariate Bernstein polynomial of multi-degree $D$ over $X$, and $b_I(X)$ are the corresponding Bernstein coefficients of $p$ over $X$. A detailed definition of Bernstein form is available in [13]. Note that the Bernstein form of a polynomial can be determined quickly [44], by using a matrix multiplication on a polynomial’s monomial coefficients, with the matrix determined by the polynomial degree and dimension. This matrix can be precomputed, and the conversion from monomial to Bernstein basis only needs to happen once for the proposed method (see Algorithm 1 in §III).

For convenience, we collect all such Bernstein coefficients in a multi-dimensional array $B(p, X)$, which is called a patch. We denote by $\min B(p, X)$ (resp. $\max B(p, X)$) the minimum (resp. maximum) element in the patch $B(p, X)$. The range of the polynomial $p$ over $X$ is contained within the interval spanned by the extrema of $B(p, X)$, formally stated as the following theorem:

Theorem 2 ([31, Lemma 2.2]).

Let $p$ be a polynomial defined as in (5) over a box $X$. Then, the following property holds for the patch $B(p, X)$ of Bernstein coefficients:

$$\min B(p, X) \ \le\ p(x) \ \le\ \max B(p, X) \quad \text{for all } x \in X.$$
This theorem provides a means to obtain enclosure bounds of a multivariate polynomial over a box by transforming the polynomial to Bernstein form. This range enclosure can be further improved either by degree elevation or by subdivision. This work uses subdivision, discussed next, to refine the bounds.
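As a concrete illustration of Theorem 2, the following Python sketch converts a univariate polynomial from the monomial to the Bernstein basis on $[0,1]$ and uses the patch extrema as range bounds; the function name and example polynomial are ours, not from the paper’s implementation.

```python
from math import comb

def bernstein_coeffs(a):
    """Bernstein coefficients on [0, 1] of p(x) = sum_j a[j] * x**j.

    Uses the basis change x**j = sum_{i >= j} [C(i,j)/C(n,j)] * B_{i,n}(x),
    so that b_i = sum_{j <= i} C(i,j)/C(n,j) * a[j].
    """
    n = len(a) - 1
    return [sum(comb(i, j) / comb(n, j) * a[j] for j in range(i + 1))
            for i in range(n + 1)]

# p(x) = x^2 - x + 0.3, whose true range on [0, 1] is [0.05, 0.3]
b = bernstein_coeffs([0.3, -1.0, 1.0])  # [0.3, -0.2, 0.3]

# Theorem 2: min(b) <= p(x) <= max(b) for all x in [0, 1]; here the
# enclosure [-0.2, 0.3] contains the true range, and the endpoint
# coefficients are exact: b[0] = p(0) and b[-1] = p(1).
```

The enclosure is generally not tight away from the interval endpoints, which is exactly why BA refines it by subdivision.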

II-E Subdivision Procedure

Consider an arbitrary box $X \subseteq [0,1]^n$. The range enclosure in Theorem 2 is improved by subdividing $X$ into subboxes and computing the Bernstein patches over these subboxes. A subdivision in the $r$th direction ($1 \le r \le n$) is a bisection of $X$ perpendicular to this direction. That is, let

$$X = [\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_r, \overline{x}_r] \times \cdots \times [\underline{x}_n, \overline{x}_n]$$

be an arbitrary box over which the Bernstein patch is already computed. By subdividing in the $r$th direction we obtain two subboxes $X_L$ and $X_R$, defined as

$$X_L = [\underline{x}_1, \overline{x}_1] \times \cdots \times \left[\underline{x}_r, \tfrac{1}{2}(\underline{x}_r + \overline{x}_r)\right] \times \cdots \times [\underline{x}_n, \overline{x}_n],$$
$$X_R = [\underline{x}_1, \overline{x}_1] \times \cdots \times \left[\tfrac{1}{2}(\underline{x}_r + \overline{x}_r), \overline{x}_r\right] \times \cdots \times [\underline{x}_n, \overline{x}_n].$$

Note that we have subdivided $X$ by halving its width in the $r$th direction; we choose $\tfrac{1}{2}$ as the subdivision parameter in this work, but one can choose a different value in $(0, 1)$ (see [31, Equation (10)]).

The new Bernstein patches, $B(p, X_L)$ and $B(p, X_R)$, can be computed from $B(p, X)$ by a finite number of linear transformations [31, §2.2]:

$$B(p, X_L) = \mathcal{A}_r\, B(p, X), \qquad B(p, X_R) = \mathcal{B}_r\, B(p, X),$$

where $\mathcal{A}_r$ and $\mathcal{B}_r$ are constant linear operators, which can be precomputed, for each $r$ (notice that [31, Equation (10)] obtains $B(p, X_R)$ with linear operations on $B(p, X_L)$). The patches and one iteration of the subdivision procedure are shown in Figure 2.
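In the univariate case, the linear transformations above reduce to de Casteljau’s algorithm evaluated at the subdivision parameter $\tfrac{1}{2}$; here is a minimal Python sketch (our own helper, not the paper’s GPU kernel):

```python
def subdivide_patch(b):
    """Split a univariate Bernstein patch over an interval into the patches
    over its two halves, via de Casteljau's algorithm at t = 1/2.

    Returns (left, right): Bernstein coefficients of the same polynomial
    over the left and right half-intervals.
    """
    left, right, cur = [], [], list(b)
    while cur:
        left.append(cur[0])    # first entry of each de Casteljau row
        right.append(cur[-1])  # last entry of each row
        cur = [(cur[k] + cur[k + 1]) / 2 for k in range(len(cur) - 1)]
    return left, right[::-1]

# Patch of p(x) = x^2 - x + 0.3 over [0, 1] is [0.3, -0.2, 0.3].
left, right = subdivide_patch([0.3, -0.2, 0.3])
# left  ~ [0.3, 0.05, 0.05]   (patch over [0, 0.5])
# right ~ [0.05, 0.05, 0.3]   (patch over [0.5, 1])
# The enclosure tightens from [-0.2, 0.3] to [0.05, 0.3] on each half,
# consistent with the convergence result in Theorem 4 below.
```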

Remark 3.

To reduce wordiness, we say that we subdivide a patch to mean the subdivision of a single subbox into two subboxes and the computation of the corresponding Bernstein patches for the POP cost and constraints.

By repeatedly applying the subdivision procedure and Theorem 2, the bounds on the range of a polynomial over a subbox can be improved. In fact, such bounds become exact in the limit if the subdivision is applied evenly in all directions:

Theorem 4 ([25, Theorem 2]).

Let $X \subseteq [0,1]^n$ be a box of maximum width $w(X)$ and let $B(p, X)$ be the corresponding Bernstein patch of a given polynomial $p$. Then

$$\left| \min B(p, X) - \min_{x \in X} p(x) \right| \ \le\ C \cdot w(X)^2,$$

where $C$ is a non-negative constant that can be given explicitly and is independent of $X$.

Notice from the proof of Theorem 4 that changing the sign of $p$ does not change the value of $C$. By substituting $p$ with $-p$ in Theorem 4, one can easily show that a similar result holds for the maximum of $p$ over $X$:

Corollary 5.

Let $X$ and $p$ be as in Theorem 4. Then,

$$\left| \max B(p, X) - \max_{x \in X} p(x) \right| \ \le\ C \cdot w(X)^2,$$

where $C$ is the same non-negative constant as in Theorem 4.

Theorem 4 and Corollary 5 provide shrinking bounds for values of a polynomial over subboxes as the subdivision process continues. By comparing the bounds over all subboxes, one can argue that the minimizers of a polynomial may appear in only a subset of the subboxes. This idea underlies the Bernstein Algorithm for solving POPs [30, 31], discussed next.

III Parallel Constrained Bernstein Algorithm

Fig. 2: The 3rd iteration of PCBA (Algorithm 1) on a one-dimensional polynomial cost (top) with one inequality constraint (middle) and one equality constraint (bottom). The rectangles represent Bernstein patches (as in §II-E), where the horizontal extent of each patch corresponds to an interval of the decision variable over which Bernstein coefficients are computed. The top and bottom of each patch represent the maximum and minimum Bernstein coefficients, which bound the cost and constraint polynomials on the corresponding interval. As per Definition 9, the green patch is feasible, the pink patches are infeasible, and the grey patches are undecided; the purple dashed lines show the inequality constraint cut-off (zero) and the equality constraint tolerance (note that is chosen to be this large only for illustration purposes). Per Definition 11, the light blue patch is suboptimal; the blue dashed line in the top plot is the current solution estimate (Definition 10). The infeasible and suboptimal patches are each marked with for elimination (Algorithm 5), since they cannot contain the global optimum (Theorem 12); the feasible and undecided patches are kept for the next iteration.

This section proposes the Parallel Constrained Bernstein Algorithm (PCBA, Algorithm 1) for solving a general POP. We extend the approach in [31]. This approach utilizes Bernstein form to obtain upper and lower bounds of both objective and constraint polynomials (Theorem 2), iteratively improves such bounds using subdivision (Theorem 4 and Corollary 5), and removes patches that cannot contain a solution (Theorem 12). We discuss the algorithm, the list used to store patches, tolerances and stopping criteria, subdivision, a cut-off test for eliminating patches, and the advantages and disadvantages of PCBA. The next section, §IV, proves that PCBA finds globally optimal solutions to POPs, up to user-specified tolerances.

Inputs: Polynomials , , as in (P), of dimension ; optimality tolerance , step tolerance , and equality constraint tolerance ; and maximum number of patches and of iterations .

Outputs: Estimate of optimal solution, and subbox containing optimal solution.


1:Initialize patches of , , and over -dimensional initial domain box as in [44] InitPatches
2:Initialize lists of undecided patches and patch extrema on the GPU
3:Initialize iteration count and subdivision direction ,
4:Test for sufficient memory (iteration begins here) if then go to 12 else continue end if
5: (Parallel) Subdivide each patch in in the th direction to create two new patches using Algorithm 2 Subdivide()
6: (Parallel) Find bounds of , , and on each new patch using Algorithm 3 FindBounds
7:Estimate upper bound of the global optimum as the least upper bound of all feasible patches, and determine which patches to eliminate using Algorithm 4 CutOffTest;
8:Test if problem is feasible if then go to 13
9:Test stopping criteria for all if    and    and then go to 12 end if
10: (Parallel) Eliminate infeasible, suboptimal patches using Algorithm 5 Eliminate;
11:Prepare for next iteration if then end if if then go to Step 12 else go to Step 4 end if
12:Return current best approximate solution for which return ,
13:No solution found (problem infeasible)
Algorithm 1 Parallel Constrained Bernstein Algorithm (PCBA)

Before proceeding, we make an assumption for notational convenience, and to make the initial computation of Bernstein patches easier.

Assumption 6.

Without loss of generality, the domain of the decision variable is the unit box (i.e., $X = [0,1]^n$), since any nonempty box in $\mathbb{R}^n$ can be mapped affinely onto $[0,1]^n$ [44].

III-A Algorithm Summary

We now summarize PCBA, implemented in Algorithm 1. The algorithm is initialized by computing the Bernstein patches of the cost and constraints on the domain (Line 1). Subsequently, PCBA subdivides each patch as in Remark 8 (Line 5 and Algorithm 2). Then, PCBA finds the upper and lower bounds of each new patch (Line 6 and Algorithm 3). These bounds are used to determine which patches are feasible, infeasible, and undecided as in Definition 9 (Line 7; see Algorithm 4 and Theorem 12). Algorithm 4 also determines the current solution estimate (the smallest upper bound over all feasible patches), and marks any patches that are suboptimal as in Definition 11. If every patch is infeasible (Line 8), PCBA returns that the problem is infeasible (Line 13); otherwise, PCBA checks if the current solution estimate meets user-specified tolerances (Line 9). If the tolerances are met, PCBA returns the solution estimate (Line 12). Otherwise, PCBA eliminates all infeasible and suboptimal patches (Line 10 and Algorithm 5), then moves to the next iteration (Line 11). Note that Algorithms 2, 3, and 5 are parallelized.
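To make the loop concrete, here is a serial, one-dimensional Python sketch of the same convert–subdivide–bound–eliminate cycle, handling inequality constraints only; the function names, item layout, and default tolerances are our own choices, and the actual implementation runs the subdivision, bounding, and elimination steps as parallel GPU kernels.

```python
from math import comb

def bernstein_coeffs(a):
    """Bernstein coefficients on [0, 1] of p(x) = sum_j a[j] * x**j."""
    n = len(a) - 1
    return [sum(comb(i, j) / comb(n, j) * a[j] for j in range(i + 1))
            for i in range(n + 1)]

def subdivide(b):
    """de Casteljau split of a univariate Bernstein patch at t = 1/2."""
    left, right, cur = [], [], list(b)
    while cur:
        left.append(cur[0]); right.append(cur[-1])
        cur = [(cur[k] + cur[k + 1]) / 2 for k in range(len(cur) - 1)]
    return left, right[::-1]

def cba_1d(cost, ineqs, max_iter=30, step_tol=1e-6):
    """Minimize cost(x) over [0, 1] subject to g(x) <= 0 for each g in
    ineqs, with all polynomials given as monomial coefficient lists.
    Returns an interval (lo, hi) containing a global minimizer, or None
    if the problem is infeasible."""
    # an item pairs a subinterval with its cost patch and constraint patches
    items = [((0.0, 1.0), bernstein_coeffs(cost),
              [bernstein_coeffs(g) for g in ineqs])]
    for _ in range(max_iter):
        new_items = []
        for (lo, hi), qc, qgs in items:      # subdivide every kept item
            cl, cr = subdivide(qc)
            gsplits = [subdivide(g) for g in qgs]
            mid = 0.5 * (lo + hi)
            new_items.append(((lo, mid), cl, [s[0] for s in gsplits]))
            new_items.append(((mid, hi), cr, [s[1] for s in gsplits]))
        # solution estimate: least upper cost bound over feasible items
        feas = [it for it in new_items if all(max(g) <= 0 for g in it[2])]
        est = min((max(it[1]) for it in feas), default=float('inf'))
        # cut-off test: remove infeasible and suboptimal items
        items = [it for it in new_items
                 if not any(min(g) > 0 for g in it[2])   # infeasible
                 and min(it[1]) <= est]                  # suboptimal
        if not items:
            return None                      # problem is infeasible
        if items[0][0][1] - items[0][0][0] <= step_tol:
            break
    best = min(items, key=lambda it: min(it[1]))
    return best[0]

# minimize (x - 0.2)^2 subject to 0.5 - x <= 0; the optimum is x* = 0.5
lo, hi = cba_1d(cost=[0.04, -0.4, 1.0], ineqs=[[0.5, -1.0]])
# (lo, hi) is a narrow interval containing 0.5
```

On the sample problem the returned interval brackets the constrained minimizer $x^* = 0.5$, which lies on the boundary of the feasible set.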

III-B Items and The List

Denote an item as the tuple , where (resp. ) is shorthand for the set of patches (resp. ). We use the following notation for items. If , then , , , and .

We denote the list , , indexed by . PCBA adds and removes items from by assessing the feasibility and optimality of each item.

III-C Tolerances and Stopping Criteria

Recall that, by Theorem 2, Bernstein patches provide upper and lower bounds for polynomials over a box. From Theorem 4 and Corollary 5, as we subdivide into smaller subboxes, the bounds of the Bernstein patches on each subbox more closely approximate the actual bounds of the polynomial. While the bounds will converge to the value of the polynomial in the limit as the maximum width of subboxes goes to zero, to ensure the algorithm terminates, we must set tolerances on optimality and equality constraint satisfaction (the equality constraints may not be satisfied for all points in certain boxes). During optimization one is also usually interested in finding the optimizer of the cost function up to some resolution. In our case, this resolution corresponds to the maximum allowable subbox width which we refer to as the step tolerance.

Definition 7.

We denote the optimality tolerance as , the equality constraint tolerance as , and the step tolerance as . We terminate Algorithm 1 either when is empty (the problem is infeasible) or when there exists an item that satisfies all of the following conditions:

  1. ,

  2. for all ,

  3. for all , and

  4. for all .

We discuss feasibility in more detail in §III-E. Note that, to implement the step tolerance, since we subdivide by halving the width of each subbox, we need only ensure that sufficiently many iterations have passed.

Note that we do not set a tolerance on inequality constraints, since these are “one-sided” constraints; for any inequality constraint and subbox , we satisfy the constraint if (see Definition 9 and Theorem 17).

III-D Subdivision

Recall that subdivision is presented in §II-E. We implement subdivision with Algorithm 2. Since the subdivision of one Bernstein patch is computationally independent of another, each subdivision task is assigned to an individual GPU thread, making Algorithm 2 parallel.

Note that the subdivision of Bernstein patches can be done in any direction, leading to the question of how to select the direction in practice. Example rules are available in the literature, such as maximum width [38, §3], derivative-based [47, §3], or a combination of the two [38, §3]. In the context of constrained optimization, the maximum width rule is usually favored over derivative-based rules for two reasons: first, computing the partial derivatives of all constraint polynomials can introduce significant computational burden, especially when the number of constraints is large (see §V-C); second, the precision of Bernstein patches as bounds to the polynomials depends on the maximum width of each subbox (Theorem 4 and Corollary 5), so it is beneficial to subdivide along the direction of maximum width for better convergence results.

In the $l$th iteration of PCBA, we subdivide in a single direction $r$, cycling through the directions in the order $1, 2, \dots, n, 1, 2, \dots$. We halve the width of each subbox each time we subdivide, leading to the following remark.

Remark 8.

In the $l$th iteration, the maximum width of any subbox in the list is $2^{-\lfloor l/n \rfloor}$.

2:parfor  do
3:     ;
4:     Subdivide along the th direction into and
5:     Compute patches and
6:     Compute patches and
7:     Compute patches and
10:end parfor
Algorithm 2 Subdivision (Parallel)

III-E Cut-Off Test

Subdivision would normally occur for every patch in every iteration, leading to exponential memory usage ($2^l$ patches in iteration $l$). However, by using a cut-off test, some patches can be deleted, reducing both the time and memory usage of PCBA (see §IV for complexity analysis). To decide which patches are to be eliminated, we require the following definitions.

Definition 9.

An item is feasible if both of the following hold:

  1. for all , and

  2. for all .

An item is infeasible if any of the following hold:

  1. for at least one , or

  2. for at least one , or

  3. for at least one .

An item is undecided if it is neither feasible nor infeasible.

Notice, in particular, that a feasible item cannot also be infeasible.

Definition 10.

The solution estimate is the smallest upper bound of the cost over all feasible items in :


where if .

Definition 11.

An item is suboptimal if


Note that Definitions 10 and 11 are dependent on ; that is, for the purposes of PCBA, optimality is defined in terms of the elements of . We show in Corollary 13 below how this notion of optimality coincides with optimality of the POP itself.

Feasible, infeasible, undecided, and suboptimal patches are illustrated in Figure 2. Any item that is infeasible or suboptimal can be eliminated from , because the corresponding subboxes cannot contain the solution to the POP (formalized in the following Theorem). We call checking for infeasible and suboptimal items the cut-off test.

Theorem 12 (Cut-Off Test).

Let be an item. If the item is infeasible (as in Definition 9) or suboptimal (as in Definition 11), then does not contain a global minimizer of (P). Such an item can be removed from the list .


Let be an item in . We only need to show:

  1. if is feasible, then all points in are feasible (up to the tolerance ), and

  2. if is infeasible, then all points in are infeasible (up to the tolerance ), and

  3. if is suboptimal, then all points in are not optimal.

Note that 1 and 2 follow directly from Theorem 2. To prove 3, let be a subbox on which the solution estimate is achieved; that is, is feasible and


Let be arbitrary, then it follows from Theorem 2 and the definition of suboptimality that


for all . Since such a point is necessarily feasible (obtained from condition 2), it cannot be a global minimum of the POP.

Corollary 13.

Suppose there exists a (feasible) global minimizer of the POP (P). Then, while executing Algorithm 1, there always exists an item such that .


This result is the contrapositive of Theorem 12.

Algorithm 3 FindBounds (Parallel): for each item in the list, in parallel, find the maximum and minimum Bernstein coefficients of the cost patch, and of each inequality and equality constraint patch, by parallel reduction.

Algorithm 4 CutOffTest: update the solution estimate using the bounds from Algorithm 3; then, for each patch, append its index to the list of patches to save if it passes the infeasibility and suboptimality tests, and to the list of patches to eliminate otherwise; return both lists.

Algorithm 5 Eliminate (Parallel): remove the patches marked for elimination from the list, in parallel.

We implement the cut-off tests as follows. Algorithm 3 (FindBounds) computes the maximum and minimum element of each Bernstein patch; Algorithm 4 (CutOffTest) implements the cut-off tests and marks all subboxes to be eliminated with a list ; and Algorithm 5 (Eliminate) eliminates the marked subboxes from the list . Algorithms 3 and 5 are parallelizable, whereas Algorithm 4 must be computed serially.
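As an illustration, the cut-off cycle can be sketched serially in Python. The data layout (each item holds a subbox plus lists of Bernstein coefficients for the cost and each inequality constraint) and the function names are ours, not the paper's API; equality constraints are omitted for brevity, and in the actual implementation Algorithms 3 and 5 run as parallel GPU kernels.

```python
def patch_bounds(coeffs):
    """Bernstein coefficients bound the polynomial on its subbox:
    min(coeffs) <= p(x) <= max(coeffs) for all x in the subbox (Theorem 2)."""
    return min(coeffs), max(coeffs)

def cutoff_test(items, eps=0.0):
    """Serial sketch of Algorithm 4. Each item is a tuple
    (subbox, cost_coeffs, [coeffs of g_i for constraints g_i <= 0])."""
    # Solution estimate: the smallest cost-patch *maximum* over items whose
    # constraint patches certify feasibility (every coefficient <= eps).
    upper = float("inf")
    for _, cost, cons in items:
        if all(max(g) <= eps for g in cons):
            upper = min(upper, max(cost))
    keep, cut = [], []
    for idx, (_, cost, cons) in enumerate(items):
        # Infeasible: some constraint is positive on the whole subbox.
        infeasible = any(min(g) > eps for g in cons)
        # Suboptimal: the cost lower bound exceeds the solution estimate.
        suboptimal = min(cost) > upper
        (cut if infeasible or suboptimal else keep).append(idx)
    return keep, cut

def eliminate(items, keep):
    """Algorithm 5: retain only the saved patches."""
    return [items[i] for i in keep]
```

For example, a patch whose cost coefficients all exceed the solution estimate is cut even if it is feasible, matching the suboptimality test in Theorem 12.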

III-F Advantages and Disadvantages of PCBA

PCBA has several advantages. First, it always finds a global optimum (if one exists), subject to tolerances. PCBA does not require an initial guess and does not converge to local minima, unlike generic nonlinear solvers (e.g., fmincon [28]). It also does not require tuning hyperparameters. As we show in §IV, PCBA has bounded time and memory complexity under certain assumptions. Finally, due to parallelization, PCBA is fast enough to enable RTD* for real-time, safe, optimal trajectory planning, which we demonstrate in §VI.

However, PCBA also has several limitations in comparison to traditional approaches to solving POPs. First, to prove the bounds on time and memory usage, we require that, at any global minimum, the active constraints are linearly independent and the Hessian of the cost function is positive definite (see Theorems 14 and 17). Furthermore, because the number of Bernstein patches grows exponentially with the dimension of the decision variable, we have not yet applied PCBA to problems of dimension greater than four.

IV Complexity Analysis

In this section, we prove that Algorithm 1 terminates by bounding the number of iterations of PCBA for both unconstrained and constrained POPs. We also prove the number of Bernstein patches (i.e., the length of the list in Algorithm 1) is bounded after sufficiently many iterations, under certain assumptions. For convenience, in the remainder of this section we use as a shorthand notation for where . The proofs of Theorems 16, 17, and 18 are in the Appendix.

IV-A Unconstrained Case

We first consider unconstrained POPs whose optimal solutions are not on the boundary of . Note that we can treat optimal solutions on the boundary of as having active linear constraints; see §IV-B for the corresponding complexity analysis. In the unconstrained case, all points in are feasible, and we are interested in solving


where is an -dimensional multivariate polynomial. Given an optimality tolerance and step tolerance , we bound the number of iterations to solve (16) with PCBA as follows:

Theorem 14.

Let in (16) be a multivariate polynomial of dimension with Lipschitz constant . Then the maximum number of iterations needed to solve (16) up to accuracy and is


where is the constant in Theorem 4 corresponding to polynomial , and rounds up to the nearest integer.


Let be the current iteration number. It is sufficient to show that, for all , there exists a subbox of such that the following conditions hold:

  1. ; and

  2. for all .

Let be a minimizer of (16). According to Corollary 13, there exists a subbox such that . To prove such satisfies Condition 1, notice from Remark 8 that


To prove Condition 2, first notice for any ,


where (19) follows from Theorem 4; and (20) follows from the definition of . Therefore


where (21) follows from Corollary 5 and (20); (22) is true because for all ; and (23) follows from (17) and the assumption that .

According to (17), the rate of convergence with respect to (WRT) the decision variables is quadratic (1st term); the rate of convergence WRT the objective function is either quadratic (2nd term) or linear (3rd term), depending on which term dominates. However, a tighter bound exists if one of the global minimizers satisfies the second-order sufficient condition for optimality [33, Theorem 2.4], which is shown in the following theorem:

Theorem 15.

Let in (16) be a multivariate polynomial of dimension . Let the Hessian be positive definite at some global minimizer of (16), where is not on the boundary of . Then the maximum number of iterations needed to solve (16) up to accuracy and is


where and are constants that only depend on .

Theorem 15 proves a quadratic rate of convergence WRT the objective function once the number of iterations exceeds a threshold , given that the second-order optimality condition is satisfied at some global minimizer. We now discuss the number of patches remaining after sufficiently many iterations, which gives an estimate of memory usage when Algorithm 1 is applied to solve (16).

Theorem 16.

Suppose there are global minimizers of (16), and none of them are on the boundary of the unit box . Let the Hessian be positive definite at these minimizers. Then after sufficiently many iterations of Algorithm 1, the number of Bernstein patches remaining (i.e., length of the list in Algorithm 1) is bounded by a constant.

The constant bound in Theorem 16 scales exponentially with the problem dimension, and is a function of the condition number of the cost function’s Hessian at the minimizers.

IV-B Constrained Case

Theorem 17.

Suppose that the linear independence constraint qualification (LICQ) [33, Definition 12.4] is satisfied at all global minimizers of the constrained POP (P), and at least one constraint is active (i.e., the active set [33, Definition 12.1] is nonempty) at some minimizer . Then the maximum number of iterations needed to solve (P) up to accuracy , , and equality constraint tolerance is


where , , are constants.

Theorem 17 gives a bound on the number of PCBA iterations needed to solve a POP up to specified tolerances. In particular, (25) shows the rate of convergence is linear in step tolerance (2nd term), equality constraint tolerance (3rd term), and objective function (4th term), once the number of iterations is larger than a constant (1st term). We next prove a bound on the number of items in the list after sufficiently many iterations.

Theorem 18.

Suppose there are () global minimizers of the constrained problem (P), and none of them are on the boundary of the unit box . Let the critical cone (see [33, (12.53)]) be nonempty for () as in the proof of Theorem 17. Then after sufficiently many iterations of Algorithm 1, the number of Bernstein patches remaining (i.e., length of the list ) is bounded by a constant.

The constant proved in Theorem 18 scales exponentially with respect to the dimension of the problem.

IV-C Memory Usage Implementation

We now state the amount of GPU memory required to store a single item , given the degree and dimension of the cost and constraint polynomials. Note that, in our implementation, all numbers in an item are represented using 4 bytes of space, as either floats or unsigned integers.

For a multi-index , let , and let for . Let be the multi-degree of the cost . Let be a multi-degree large enough for all inequality constraints , and a multi-degree large enough for all equality constraints . By “large enough” we mean that, if is the multi-degree of any then (and similarly for ). Then, as per [44, §4.1], an item can be stored in memory as an array with the following number of entries:


where the first entries store the upper and lower bounds (in each dimension) of the subbox .
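The entry count above can be sketched as follows, assuming (as a reading of [44, §4.1]) that a patch of multi-degree l stores a coefficient for every multi-index below it, i.e., a product of (l_i + 1) terms, plus 2n entries for the subbox bounds. The function names and the exact bookkeeping are illustrative, not the paper's implementation.

```python
from math import prod

def item_entries(n, deg_cost, deg_ineq, deg_eq, n_ineq, n_eq):
    """Approximate number of 4-byte entries needed to store one PCBA item:
    the subbox bounds plus one Bernstein patch per polynomial."""
    def patch_size(deg):
        # A patch of multi-degree (l_1, ..., l_n) has prod(l_i + 1) coefficients.
        return prod(d + 1 for d in deg)
    return (2 * n                          # upper/lower bounds of the subbox
            + patch_size(deg_cost)         # cost patch
            + n_ineq * patch_size(deg_ineq)  # inequality-constraint patches
            + n_eq * patch_size(deg_eq))     # equality-constraint patches

def item_bytes(*args, **kwargs):
    """Each entry is stored in 4 bytes (float or unsigned integer)."""
    return 4 * item_entries(*args, **kwargs)
```

For instance, a 2-D problem with a degree-(2,2) cost and three degree-(2,2) inequality constraints needs 2·2 + 9 + 3·9 = 40 entries, i.e., 160 bytes per item under these assumptions.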

IV-D Summary

We have shown that PCBA will find a solution to (P), if one exists, in bounded time. We have also shown that the number of Bernstein patches is bounded after a finite number of iterations, which implies that the memory usage is bounded; and we have provided a way to compute how much memory is required to store the list . Next, we benchmark PCBA on a variety of problems and compare it to two other solvers.

V PCBA Evaluation

In this section, we compare PCBA against a derivative-based solver (MATLAB’s fmincon [28]) and a convex relaxation method (Lasserre’s BSOS [20]). First, we test all three solvers on eight Benchmark Evaluation problems with dimension less than or equal to 4. Second, we compare all three solvers on several Increasing Number of Constraints problems, to assess how each solver scales on a variety of difficult objective functions [11].

All of the solvers/problems in this section are run on a computer with a 3.7GHz processor, 16 GB of RAM, and an Nvidia GTX 1080 Ti GPU. PCBA is implemented with MATLAB R2017b executables and CUDA 10.2. Our code is available on GitHub:

V-A Parameter Selection

To set up a fair comparison, we scale each problem to the box, where is the problem dimension. For PCBA, we use the stopping criteria in §III. To choose , we first compute the patch , then set


We set the maximum number of iterations to . We do not set , which determines the minimum number of iterations; is only needed to prove complexity bounds in §IV.

BSOS [20, §4] requires the user to specify the size of the semidefinite matrix associated with the convex relaxation of the POP. This is done by selecting a pair of parameters, and (note these are different from our use of and ). Though one must increase and gradually to ensure convergence, larger values of and correspond to larger semidefinite programs, which can be difficult to solve. We chose and separately for the Benchmark Evaluation. We used for the Increasing Number of Constraints.

For fmincon [28], we set the OptimalityTolerance option to in (27). We set and . We also provide fmincon with the analytic gradients of the cost and constraints.

V-B Benchmark Evaluation


We tested PCBA, BSOS, and fmincon on eight benchmark POPs [31], listed as P1 through P8; the problems are reported in the Appendix. We ran each solver 50 times on each problem, and report the median solution error and time required to find a solution. Since fmincon may or may not converge to the global optimum depending on its initial guess, we used random initial guesses for each of the 50 attempts.
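The evaluation protocol above can be sketched as a small harness: run a solver many times from random initial guesses and report the median error and solve time. The `solve` callback and its return convention are stand-ins, not any solver's actual API.

```python
import random
import statistics
import time

def benchmark(solve, true_min, n_trials=50, dim=2, seed=0):
    """Run `solve(x0)` from random initial guesses in the unit box and
    report (median error, median solve time), mirroring the protocol of
    50 trials per problem. `solve` returns an objective value."""
    rng = random.Random(seed)
    errors, times = [], []
    for _ in range(n_trials):
        x0 = [rng.uniform(-1, 1) for _ in range(dim)]  # random initial guess
        t0 = time.perf_counter()
        value = solve(x0)
        times.append(time.perf_counter() - t0)
        errors.append(value - true_min)  # solver's result minus true global minimum
    return statistics.median(errors), statistics.median(times)
```

Reporting the median (rather than the mean) keeps the occasional trial in which a local solver diverges from dominating the summary statistics.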


The results are summarized in Table I. For additional results (e.g., to plot the results of any of the problems), see the GitHub repository.

In terms of solution quality, PCBA always found the solution to within the desired optimality tolerance , except on P1, where PCBA stopped at the maximum allowed number of iterations (28); PCBA always used between 22 and 28 iterations. BSOS always found a lower bound to the solution, as expected. While fmincon converged to the global optimum at least once on every problem, it often converged to local minima, hence the large error values on some problems. In terms of solve time, fmincon solves the fastest (in 10–20 ms), PCBA is about twice as slow as fmincon, and BSOS is one to two orders of magnitude slower than PCBA.

For PCBA, the memory usage (computed with (26)) increases roughly by one order of magnitude for each additional dimension of the decision variable (Table I reports the peak GPU memory used by PCBA on each benchmark problem). Notice that PCBA never uses more than several MB of GPU memory, which is much less than the 11 GB available on the Nvidia GTX 1080 Ti GPU. Figure 3 shows the number of patches and the amount of GPU memory used on P4. We see that the memory usage peaks, then stays below a constant, as predicted by Theorem 18.

Fig. 3: The maximum number of patches (left axis) and corresponding GPU memory used (right axis) at each iteration of PCBA, for P4 of the benchmark problems (see §V). This problem took 24 iterations to solve. Notice that the number of patches peaks in iteration 5, then stays under 400 patches at every iteration from iteration 9 onwards; this visualizes Theorem 18.
          |   | PCBA                                              | BSOS [20]                   | fmincon [28]
  Problem | n | tol. (27)   error       time [s]  iters  memory   | (d,k)  error       time [s] | error       time [s]
  P1      | 2 | 7.0000e-07  7.9002e-07  0.0141    28     23 kB    | (3,3)  -0.0068     1.2264   | 1.0880      0.0083
  P2      | 2 | 0.1369      0.0731      0.0131    26     60 kB    | (2,2)  -3.4994e-04 0.7671   | -0.4447     0.0180
  P3      | 2 | 2.0000e-06  1.9879e-06  0.0128    26     57 kB    | (2,2)  -3.2747     0.5065   | -3.0000     0.0220
  P4      | 3 | 1.7000e-06  6.5565e-07  0.0204    24     416 kB   | (4,4)  -0.9455     6.6546   | 3.2000e-05  0.0130
  P5      | 3 | 1.0000e-06  0           0.0312    28     3 MB     | (2,2)  -4.4858e-07 0.8462   | 7.9985e-06  0.0062
  P6      | 4 | 4.2677e-04  1.8685e-04  0.0315    23     2 MB     | (3,3)  -36.6179    6.2434   | 0.0040      0.0065
  P7      | 4 | 5.0000e-07  2.5280e-07  0.0374    26     282 kB   | (1,1)  -1.0899     0.1839   | 2.4002e-07  0.0139
  P8      | 4 | 1.3445e-04  6.7803e-05  0.0440    22     6 MB     | (2,2)  -3.2521     1.8295   | 9.9989e-04  0.0169
TABLE I: Results for PCBA, BSOS, and fmincon on eight benchmark problems with 2, 3, and 4 dimensional (column ) decision variables (see the Appendix for more details). The error columns report each solver’s result minus the true global minimum. For all three solvers, the reported error and time to find a solution are the median over 50 trials (with random initial guesses for fmincon). For PCBA, we also report the optimality tolerance (as in (27)), number of iterations to convergence, and peak GPU memory used. Note that, on P1 and P5, PCBA stopped at the maximum number of iterations (28).

V-C Increasing Constraint Problems

Next, we tested each solver on problems with an increasing number of constraints, to assess each solver's suitability for use with RTD; we find in practice that a robot running RTD must handle between 30 and 300 constraints at each planning iteration.


We first choose an objective function with either many local minima or a nearly flat gradient near the global optimum (the global optimizer is known for each function). In particular, we tested on the ElAttar-Vidyasagar-Dutta, Powell, Wood, Dixon-Price (with ), Beale, Bukin02, and Deckkers-Aarts problems (see the Appendix and [11]).

For each objective function, we generate 200 random constraints in total, while ensuring at least one global optimizer stays feasible (if there are multiple global optimizers, we choose one at random that will be feasible for all constraints). To generate a single constraint , we first create a polynomial as a sum of the monomials of the decision variable with maximum degree 2, with random coefficients in the range . To ensure is feasible, we evaluate on , then subtract the resulting value from to produce (i.e., ).
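The constraint-generation procedure above can be sketched as follows. The function names are ours, and the coefficient range of (-1, 1) is an assumption where the paper's exact range was not recoverable here; the key step is the shift g(x) = p(x) - p(x*), which guarantees the chosen optimizer x* satisfies g(x*) = 0 and hence remains feasible for g(x) <= 0.

```python
import itertools
import random

def _monomial(x, exponents):
    """Evaluate one monomial: prod(x_i ** e_i)."""
    out = 1.0
    for xi, ei in zip(x, exponents):
        out *= xi ** ei
    return out

def random_feasible_constraint(x_star, max_deg=2, coeff_range=(-1.0, 1.0), rng=None):
    """Return a callable g with g(x_star) = 0, built from a random polynomial
    of maximum total degree `max_deg` (assumed coefficient range)."""
    rng = rng or random.Random()
    n = len(x_star)
    # All exponent tuples with total degree <= max_deg (including the constant).
    exps = [e for e in itertools.product(range(max_deg + 1), repeat=n)
            if sum(e) <= max_deg]
    coeffs = {e: rng.uniform(*coeff_range) for e in exps}
    def p(x):
        return sum(c * _monomial(x, e) for e, c in coeffs.items())
    shift = p(x_star)               # subtract p(x_star) so x_star is feasible
    return lambda x: p(x) - shift   # g(x) = p(x) - p(x_star)
```

Repeating this 10 to 200 times yields the randomized constraint sets used in the trials, with the chosen optimizer feasible for every constraint by construction.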

We run PCBA, BSOS, and fmincon on each objective function for 20 trials, with 10 random constraints in the first trial, adding 10 more constraints in each subsequent trial. As before, we run fmincon 50 times for each trial with random initial guesses, since its performance depends on the initial guess.


To illustrate the results, data for the Powell objective function are shown in Figure 4. The data (and plots) for the other objective functions are available in the GitHub repository.

In terms of solution quality, all three algorithms often converge to the global optimum when the number of constraints is low, but fmincon converges to suboptimal solutions more frequently as the number of constraints increases. Only PCBA finds the global optimum regardless of the number of constraints, unlike BSOS (which runs out of memory on the larger problems) or fmincon (which converges to local minima).

All three solvers require an increasing amount of solve time as the number of constraints increases. PCBA is comparable in speed to fmincon on 2-D problems, but is typically slower on higher-dimensional problems. Regardless of the number of constraints, BSOS takes three to four orders of magnitude more time to solve than PCBA or fmincon.

More details on PCBA are presented in Table II. PCBA’s time to find a solution increases roughly by an order of magnitude when the decision variable dimension increases by 1; however, PCBA solves all of the increasing constraint POPs within 0.5 s. The memory usage increases by 1–3 orders of magnitude with each additional dimension; however, PCBA never uses more than 650 MB of GPU memory, well below the 11 GB available. Figure 5 shows PCBA’s GPU memory usage versus the number of constraints for the Powell objective function.

V-D Summary

As expected from the complexity bounds in §IV-B, the results in this section indicate that PCBA is practical for quickly solving 2-D POPs with hundreds of constraints. We leverage this next by applying PCBA to solve RTD’s POP (4) for real-time receding-horizon trajectory optimization.

Fig. 4: Results for an increasing number of constraints on the Powell objective function (see the Appendix) for the PCBA, BSOS, and fmincon solvers. The top plot shows the time required to solve the problem as the number of constraints increases. The bottom plot shows the error between each solver’s solution and the true global optimum. For both time and error, fmincon is shown as a box plot over 50 trials with random initial guesses; the central red line indicates the median, the top and bottom of the red box indicate the 25th and 75th percentiles, the black whiskers are the most extreme values not considered outliers, and the outliers are red plus signs. PCBA solves the fastest in general; fmincon typically solves slightly slower than PCBA for more than 40 constraints; and BSOS is the slowest solver. PCBA and BSOS always find the global optimum, as does fmincon when there are not many constraints, because the Powell objective function is convex. Above 30 constraints, fmincon frequently has large error due to convergence to local minima.
Fig. 5: The approximate peak GPU memory used by PCBA for the Powell problem, as a function of the number of constraints. Since the amount of memory required per item in the list grows linearly with the number of constraints, the overall memory usage also grows linearly. However, at 160 constraints, we see a drop in the memory usage; this is because the additional constraints render more parts of the problem domain infeasible, resulting in more items being eliminated per PCBA iteration. Note that the maximum memory usage is well under the several GB of memory available on a typical GPU.
  Problem  | n | max time [s] | max items | max memory
  E-V-D    | 2 | 0.0360       | 66        | 171 kB
  Powell   | 2 | 0.0447       | 1200      | 6.24 MB
  Wood     | 2 | 0.0532       | 54        | 220 kB
  D-P 2-D  | 2 | 0.0259       | 90        | 433 kB
  D-P 3-D  | 3 | 0.0675       | 356       | 3.47 MB
  D-P 4-D  | 4 | 0.402        | 4994      | 193 MB
  Beale    | 2 | 0.0302       | 106       | 259 kB
  Bukin02  | 4 | 0.393        | 10550     | 110 MB
  D-A      | 4 | 0.389        | 51886     | 647 MB
TABLE II: Results for the increasing constraints PCBA evaluation. Abbreviated problem names (as in the Appendix) are on the left, along with each problem’s decision variable dimension . Over all 20 trials (with between 10 and 200 constraints), we report the maximum time spent finding a solution, the maximum number of items in the list , and the maximum amount of GPU memory used. Note that all problems solved in under 0.5 s regardless of the number of constraints, and no problem required more than 650 MB of memory.

VI Hardware Demonstrations

Recall from §II-C that RTD enables real-time, provably collision-free trajectory planning via solving a POP every planning iteration. RTD is provably collision-free regardless of the POP solver used, meaning that we are able to test PCBA safely on hardware; when PCBA is applied to RTD, we call the resulting trajectory planning algorithm RTD*. See Figure 1 and the video (click for link).

In this section, we apply RTD* to a Segway robot navigating a hallway with static obstacles. Recall that RTD always produces dynamically feasible trajectory plans [18]. As proven in Corollary 13 and demonstrated in §V, PCBA always finds optimal solutions. This section shows that PCBA/RTD* improves the liveness of a robot by successfully navigating a variety of scenarios, and outperforming fmincon/RTD.

VI-A Overview


We ran two demonstrations in a m hallway. In the first demonstration, we filled the hallway with random static obstacles and ran RTD*. In the second demonstration, we constructed two difficult scenarios and ran both PCBA/RTD* and fmincon/RTD on each.


We use a Segway differential-drive robot with a planar Hokuyo UTM-30LX LIDAR for mapping and obstacle detection (see Figure 1). Mapping, localization, and trajectory optimization run onboard, on a 4.0 GHz laptop with an Nvidia GeForce GTX 1080 GPU.


As in §II-C (also see Figure 1), the robot plans by optimizing over a set of parameterized trajectories as in [18, (16)]. The parameters are speed m/s and yaw rate rad/s, so . The trajectories are between 1 and 2 s long, depending on the robot’s initial speed (since each trajectory includes a braking maneuver). The robot must find a new plan (i.e., PCBA must solve a new POP) every 0.5 s, or else it begins executing the braking maneuver associated with its previously-computed plan [18, Remark 70]. In other words, we require PCBA to either return a feasible solution or report that the problem is infeasible. PCBA is given a time limit of 0.4 s to find a solution, because the robot requires 0.1 s for other onboard processes.
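The timing logic of one planning iteration can be sketched as follows. The callbacks `solve_pop`, `execute`, and `brake` are illustrative stand-ins, not the paper's software interface; the 0.5 s replanning period and 0.4 s solver budget are taken from the text.

```python
import time

PLAN_PERIOD = 0.5    # s: a new plan must be ready every 0.5 s
SOLVER_BUDGET = 0.4  # s: PCBA's time limit (0.1 s reserved for other processes)

def planning_iteration(solve_pop, execute, brake):
    """One receding-horizon iteration: attempt to solve the POP within the
    budget; on success, track the new trajectory; otherwise fall back to the
    braking maneuver of the previously-computed plan."""
    start = time.perf_counter()
    plan = solve_pop(timeout=SOLVER_BUDGET)  # None if infeasible or timed out
    elapsed = time.perf_counter() - start
    if plan is not None and elapsed <= PLAN_PERIOD:
        execute(plan)   # each plan already ends in its own braking maneuver
        return "tracking"
    brake()             # safe by construction: the previous plan ends in a stop
    return "braking"
```

This structure is why liveness matters: every failed or late solve forces the robot into the braking branch, so a solver that reliably returns a (globally optimal) feasible plan within the budget keeps the robot moving.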

At each planning iteration, obstacles are converted into constraints as in [18, Sections 6 and 7]; thi