Safe, Optimal, Real-time Trajectory Planning with a Parallel Constrained Bernstein Algorithm
Abstract
To move through the world, mobile robots typically use a receding-horizon strategy, wherein they execute an old plan while computing a new plan to incorporate new sensor information. A plan should be dynamically feasible, meaning it obeys constraints like the robot’s dynamics and obstacle avoidance; it should have liveness, meaning the robot does not stop to plan so frequently that it cannot accomplish tasks; and it should be optimal, meaning that the robot tries to satisfy a user-specified cost function such as reaching a goal location as quickly as possible. Reachability-based Trajectory Design (RTD) is a planning method that can generate provably dynamically-feasible plans. However, RTD solves a nonlinear polynomial optimization program at each planning iteration, preventing optimality guarantees; furthermore, RTD can struggle with liveness because the robot must brake to a stop when the solver finds local minima or cannot find a feasible solution. This paper proposes RTD*, which certifiably finds the globally optimal plan (if such a plan exists) at each planning iteration. This method is enabled by a novel Parallelized Constrained Bernstein Algorithm (PCBA), which is a branch-and-bound method for polynomial optimization. The contributions of this paper are: the implementation of PCBA; proofs of bounds on the time and memory usage of PCBA; a comparison of PCBA to state-of-the-art solvers; and the demonstration of PCBA/RTD* on a mobile robot. RTD* outperforms RTD in terms of optimality and liveness for real-time planning in a variety of environments with randomly-placed obstacles.
I Introduction
For mobile robots to operate successfully in unforeseen environments, they must plan their motion as new sensor information becomes available. This receding-horizon strategy requires iteratively generating a plan while simultaneously executing a previous plan. Typically, this requires solving an optimization program in each planning iteration (see, e.g., [19]).
The present work considers a mobile ground robot tasked with reaching a global goal location in an arbitrary, static environment. To assess receding-horizon planning performance, we consider the following characteristics of plans. First, a plan should be dynamically feasible, meaning that it obeys the dynamic description of the robot and obeys constraints such as actuator limits and obstacle avoidance. Second, a plan should maintain liveness, meaning that it keeps the robot moving through the world without stopping frequently to replan, which can prevent a robot from achieving a task in a user-specified amount of time. Third, a plan should be optimal with respect to a user-specified cost function, such as reaching a goal quickly.
Ensuring that plans have these characteristics is challenging for several reasons. First, robots typically have nonlinear dynamics; this means that creating a dynamically-feasible plan typically requires solving a nonlinear program (NLP) at runtime. However, it is difficult to certify that an NLP can be solved in a finite amount of time, meaning that the robot may have to sacrifice liveness. Furthermore, even if a robot has linear dynamics, the cost function may be nonlinear; in this case, it is challenging to certify optimality due to the presence of local minima.
This paper extends prior work on Reachability-based Trajectory Design (RTD). RTD is able to provably generate dynamically-feasible trajectory plans in real time, but cannot guarantee optimality or liveness (the robot will generate plans in real time, but may brake to a stop often). We address this gap by proposing the Parallel Constrained Bernstein Algorithm (PCBA), which provably finds globally-optimal solutions to Polynomial Optimization Problems (POPs), a special type of NLP. We apply PCBA to RTD to produce an optimal version of RTD, which we call RTD* in the spirit of the well-known RRT* algorithm [17]. We show on hardware that RTD* demonstrates liveness (in comparison to RTD) for trajectory optimization. For the remainder of this section, we discuss related work, then state our contributions.
I-A Related Work
Receding-horizon Planning
A variety of methods exist that attempt receding-horizon planning while maintaining dynamic feasibility, liveness, and optimality. These methods can be broadly classified by whether they perform sampling or solve an optimization program at each planning iteration. Sampling-based methods typically either attempt to satisfy liveness and dynamic feasibility by choosing samples offline [34, 24], or attempt to satisfy optimality at the potential expense of liveness and dynamic feasibility [17]. Optimization-based methods attempt to find a single optimal trajectory. These methods typically have to sacrifice dynamic feasibility (e.g., by linearizing the dynamics) to ensure liveness [29], or sacrifice liveness to attempt to satisfy dynamic feasibility [46, 8] (also see [18, §9] and [37]).
Reachability-based Trajectory Design
Reachability-based Trajectory Design (RTD) is an optimization-based receding-horizon planner that requires solving a POP at each planning iteration [18]. RTD specifies plans as parameterized trajectories. Since these trajectories cannot necessarily be perfectly tracked by the robot, RTD begins with an offline computation of a Forward Reachable Set (FRS). The FRS contains every parameterized plan, plus the tracking error that results from the robot not tracking any plan perfectly. At runtime, in each planning iteration, the FRS is intersected with sensed obstacles to identify all parameterized plans that could cause a collision (i.e., be dynamically infeasible). This set of unsafe plans is represented as a (finite) list of polynomial constraints, and the user is allowed to specify an arbitrary (not necessarily convex) polynomial cost function, resulting in a POP. At each planning iteration, either the robot successfully solves the POP to get a new plan, or it continues executing its previously-found plan. While the decision variable is typically only two- or three-dimensional, each POP often has hundreds of constraints, making it challenging to find a feasible solution in real-time [18]. Each plan includes a braking maneuver, ensuring that the robot can always come safely to a stop if the POP cannot be solved quickly enough in any planning iteration.
Note, for RTD, optimality means finding the optimal solution to a POP at each planning iteration. The cost function in RTD’s POPs typically encodes behavior such as reaching a waypoint between the robot’s current position and the global goal (e.g., [18, §9.2.1]); RTD attempts to find the best dynamically feasible trajectory to the waypoint. RTD does not attempt to find the best waypoints themselves (best, e.g., with respect to finding the shortest path to the global goal). Such waypoints can be generated quickly by algorithms such as A* or RRT* by ignoring dynamic feasibility [17, 18].
Polynomial Optimization Problems
POPs require minimizing (or maximizing) a polynomial objective function, subject to polynomial equality or inequality constraints. As a fundamental class of problems in nonconvex optimization, POPs arise in various applications, including signal processing [35, 43, 27], quantum mechanics [4, 12], control theory [16, 18, 2], and robotics [39, 26]. This paper presents a novel parallelized constrained Bernstein Algorithm for solving POPs.
The difficulty of solving a POP increases with the dimension of the cost and constraints, with the number of constraints, and with the number of optima [33]. Existing methods attempt to solve POPs while minimizing time and memory usage (i.e., complexity). Doing so typically requires placing limitations on one of these axes of difficulty to make solving a POP tractable. These methods broadly fall into the following categories: derivativebased, convex relaxation, and branchandbound.
Derivative-based methods use derivatives (and sometimes Hessians) of the cost and constraint functions, along with first- or second-order optimality conditions [33, §12.3, §12.5], to attempt to find optimal, feasible solutions to nonlinear problems such as POPs. These methods can find local minima of POPs rapidly despite high dimension, a large number of constraints, and high-degree cost and constraints [33, Chapter 19.8]. However, these methods do not typically converge to global optima without requiring assumptions on the problem and constraint structure (e.g., [36]).
Convex relaxation methods attempt to find global optima by approximating the original problem with a hierarchy of convex optimization problems. These methods can be scaled to high-dimensional problems (up to 10 dimensions), at the expense of limits on the degree and sparse structure of the cost function; furthermore, they typically struggle to handle large numbers of constraints (e.g., the hundreds of constraints that arise in RTD’s POPs), unless the problem has low-rank or sparse structure [39]. Well-known examples include the lift-and-project linear program procedure [1], the reformulation-linearization technique [41], and Semi-Definite Program (SDP) relaxations [21, 39]. By assuming structure such as homogeneity of the cost function or convexity of the domain and constraints, one can approximate solutions to a POP in polynomial time, with convergence to global optima in the limit [5, 22, 23, 42, 15]. Convergence within a finite number of convex hierarchy relaxations is possible under certain assumptions (e.g., a limited number of equality constraints [32, 20]).
Branch-and-bound methods perform an exhaustive search over the feasible region. These methods are typically limited to four or fewer dimensions, but can handle large numbers of constraints and high-degree cost and constraints. Examples include interval analysis techniques [14, 45] and the Bernstein Algorithm (BA) [10, 31, 40]. Interval analysis requires cost and constraint function evaluations in each iteration, and therefore can be computationally slow. BA, on the other hand, does not evaluate the cost and constraint functions; instead, BA represents the coefficients of the cost and constraints in the Bernstein basis, as opposed to the monomial basis. The coefficients in the Bernstein basis provide lower and upper bounds on the polynomial cost and constraints over box-shaped subsets of Euclidean space by using a subdivision procedure [9, 30]. Note, one can use the Bernstein basis to transform a POP into a linear program (LP) on each subdivided portion of the problem domain, which allows one to find tighter solution bounds than those given by the Bernstein coefficients alone [40]. Since subdivision can be parallelized [6], the time required to solve a POP can be greatly reduced by implementing BA on a Graphics Processing Unit (GPU). However, a parallelized implementation of BA with constraints, or bounds on its rate of convergence, has not yet been shown in the literature. Furthermore, to the best of our knowledge, BA has not been demonstrated as a practical method for solving problems in real-time robotics applications.
I-B Contributions and Paper Organization
In this paper, we make the following contributions. First, we propose a Parallel Constrained Bernstein Algorithm (PCBA) (§III). Second, we prove that PCBA always finds an optimal solution (if one exists), and prove bounds on PCBA’s time and memory usage (§IV). Third, we evaluate PCBA on a suite of well-known POPs in comparison to the Bounded Sums-of-Squares (BSOS) [20] solver and a generic nonlinear solver (MATLAB’s fmincon) (§V). Fourth, we apply PCBA to RTD to make RTD*, a provably-safe, optimal, and real-time trajectory planning algorithm for mobile robots (§VI), thereby demonstrating dynamic feasibility and liveness. The remainder of the paper is organized as follows. §II defines notation for RTD and POPs. §VII draws conclusions. Appendix B lists benchmark problems for PCBA.
II Preliminaries
This section introduces notation, RTD, POPs, the Bernstein form, and subdivision.
II-A Notation
Polynomial Notation
We follow the notation in [31]. Let be a real variable of dimension . A multi-index is defined as , and the corresponding multi-power is defined as . Given another multi-index of the same dimension, an inequality should be understood componentwise. A multivariate polynomial in canonical (monomial) form can be written as
(1) 
with coefficients and some multi-index . The space of polynomials of degree , with variable , is .
Definition 1.
We call the multidegree of a polynomial ; each element of is the maximum degree of the corresponding variable out of all of the monomials of . We call the degree of ; is the maximum sum, over all monomials of , of the powers of the variables. That is, , where is the sum of the elements of a multi-index.
Point and Set Notation
Let denote a general dimensional box in , with for each . Let be the dimensional unit box. Denote by the maximum width of a box , i.e. . For any point , denote by the Euclidean norm of , and denote by the closed Euclidean ball centered at with radius .
RTD Notation
Let denote the time interval of a single plan (in a single receding-horizon iteration). Let denote the robot’s state space, and denote the robot’s control inputs where . The robot is described by dynamics , which we call a high-fidelity model. The parameterized trajectories are described by , where is the space of trajectory parameters. A point parameterizes a trajectory . For any , the robot uses a feedback controller to track the trajectory parameterized by (we say it tracks for short).
II-B Polynomial Optimization Problems
We denote a POP as follows:
(P) 
The decision variable is , where is a compact, box-shaped domain and is the dimension of the program. The cost function is , and the constraints are (). We assume for convenience that is the greatest degree amongst the cost and constraint polynomials; we call the degree of the problem.
II-C Reachability-based Trajectory Design
Recall that RTD begins by computing an FRS offline for a robot tracking a parameterized set of trajectories. At runtime, in each receding-horizon planning iteration, the FRS is used to construct a POP as in (P); solving this POP is equivalent to generating a new trajectory plan. We now briefly describe how this POP is constructed (see [18] for details).
We define the FRS as the set of all states reachable by the robot when tracking any parameterized trajectory:
(2) 
where is the set of valid initial conditions for the robot in any planning iteration. To implement RTD, we conservatively approximate the FRS using sums-of-squares programming. In particular, we compute a polynomial ( or ) for which the 0-superlevel set contains the FRS:
(3) 
See [18, Section 3.2] for details on how to compute .
At runtime, we use to identify unsafe trajectory plans. If is a collection of points on obstacles, then we solve the following program in each planning iteration:
(4) 
where is a user-constructed polynomial cost function (see (28) in §VI as an example). Note that, by (3), the set of safe trajectory plans is open, so we implement the constraints in (4) as , . Critically, any feasible solution to (4) is provably dynamically feasible (and collision-free) [18, Theorem 68]. To understand RTD’s online planning in more detail, see [18, Algorithm 2].
In this work, instead of using a derivative-based method to solve (4), we use our proposed PCBA method, which takes advantage of the polynomial structure of and . Next, we discuss Bernstein polynomials.
II-D Bernstein Form
A polynomial in monomial form (1) can be expanded into Bernstein form over an arbitrary dimensional box as
(5) 
where is the corresponding multivariate Bernstein polynomial of multidegree over , and are the corresponding Bernstein coefficients of over . A detailed definition of Bernstein form is available in [13]. Note that the Bernstein form of a polynomial can be determined quickly [44], by using a matrix multiplication on a polynomial’s monomial coefficients, with the matrix determined by the polynomial degree and dimension. This matrix can be precomputed, and the conversion from monomial to Bernstein basis only needs to happen once for the proposed method (see Algorithm 1 in §III).
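As an illustration of this conversion, the sketch below (our own minimal example, not the paper’s implementation; function names are hypothetical) builds the one-dimensional monomial-to-Bernstein matrix on the unit interval, whose entries have the standard closed form binom(i, j)/binom(n, j) for j ≤ i, and checks that the resulting Bernstein form reproduces the polynomial:

```python
from math import comb

def mono_to_bern_matrix(n):
    # Entry (i, j) is binom(i, j) / binom(n, j) for j <= i, else 0. Multiplying
    # this matrix by the monomial coefficients of a degree-n polynomial on
    # [0, 1] yields its Bernstein coefficients. The matrix depends only on n,
    # so, as noted in the text, it can be precomputed once.
    return [[comb(i, j) / comb(n, j) if j <= i else 0.0 for j in range(n + 1)]
            for i in range(n + 1)]

def bernstein_eval(b, x):
    # Evaluate sum_i b_i * binom(n, i) * x^i * (1 - x)^(n - i).
    n = len(b) - 1
    return sum(bi * comb(n, i) * x**i * (1 - x)**(n - i) for i, bi in enumerate(b))

# Example: p(x) = 1 - 3x + 2x^2 on [0, 1] (coefficients listed lowest degree first).
a = [1.0, -3.0, 2.0]
M = mono_to_bern_matrix(2)
b = [sum(M[i][j] * a[j] for j in range(3)) for i in range(3)]  # b = M a

# The Bernstein form reproduces p exactly; also b[0] = p(0) and b[-1] = p(1).
p = lambda x: 1 - 3 * x + 2 * x**2
max_err = max(abs(bernstein_eval(b, x / 10) - p(x / 10)) for x in range(11))
```
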
For convenience, we collect all such Bernstein coefficients in a multidimensional array , which is called a patch. We denote by (resp. ) the minimum (resp. maximum) element in the patch . The range of polynomial over is contained within the interval spanned by the extrema of , formally stated as the following theorem:
Theorem 2 ([31, Lemma 2.2]).
Let be a polynomial defined as in (5) over a box . Then, the following property holds for a patch of Bernstein coefficients
(6) 
This theorem provides a means to obtain enclosure bounds of a multivariate polynomial over a box by transforming the polynomial to Bernstein form. This range enclosure can be further improved either by degree elevation or by subdivision. This work uses subdivision, discussed next, to refine the bounds.
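As a concrete check of Theorem 2, consider p(x) = (2x − 1)² on the unit interval, whose true range is [0, 1]. The sketch below (a self-contained illustration assuming the standard univariate conversion on [0, 1]) shows that the Bernstein coefficients enclose the range, though not tightly before subdivision:

```python
from math import comb

def bernstein_coeffs(a):
    # Bernstein coefficients on [0, 1] of a polynomial given by monomial
    # coefficients a (lowest degree first): b_i = sum_j binom(i,j)/binom(n,j) a_j.
    n = len(a) - 1
    return [sum(comb(i, j) / comb(n, j) * a[j] for j in range(i + 1))
            for i in range(n + 1)]

# p(x) = 4x^2 - 4x + 1 = (2x - 1)^2; its true range on [0, 1] is [0, 1].
a = [1.0, -4.0, 4.0]
b = bernstein_coeffs(a)  # b = [1, -1, 1]

# Theorem 2: min(b) <= p(x) <= max(b) for all x in [0, 1]. Here the enclosure
# [-1, 1] contains, but overestimates, the true range -- subdivision tightens it.
samples = [i / 100 for i in range(101)]
p_vals = [1 - 4 * x + 4 * x * x for x in samples]
enclosed = min(b) <= min(p_vals) and max(p_vals) <= max(b)
```
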
IiE Subdivision Procedure
Consider an arbitrary box . The range enclosure in Theorem 2 is improved by subdividing into subboxes and computing the Bernstein patches over these subboxes. A subdivision in a given direction is a bisection of the box perpendicular to that direction. That is, let
(7) 
be an arbitrary box over which the Bernstein patch is already computed. By subdividing in this direction we obtain two subboxes and , defined as
(8) 
Note that we have subdivided by halving its width in that direction; we choose as the subdivision parameter in this work, but one can choose a different value in (see [31, Equation (10)]).
The new Bernstein patches, and , can be computed by a finite number of linear transformations [31, §2.2]:
(9) 
where and are constant matrices, which can be precomputed, for each (notice that [31, Equation (10)] obtains with linear operations on ). The patches and one iteration of the subdivision procedure are shown in Figure 2.
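In one dimension with the midpoint subdivision parameter, these linear maps reduce to repeated pairwise averaging (the de Casteljau scheme). The following sketch is our own illustration under that assumption, not the paper’s GPU implementation:

```python
def subdivide(b):
    # Midpoint de Casteljau subdivision of a 1-D Bernstein patch: the left
    # patch collects the first entry of each averaging row, the right patch
    # the last entry -- a finite sequence of linear operations, matching the
    # structure described in the text.
    left, right = [b[0]], [b[-1]]
    row = list(b)
    while len(row) > 1:
        row = [(row[i] + row[i + 1]) / 2 for i in range(len(row) - 1)]
        left.append(row[0])
        right.insert(0, row[-1])
    return left, right

# Patch of p(x) = (2x - 1)^2 on [0, 1]; its enclosure is [min(b), max(b)] = [-1, 1].
b = [1.0, -1.0, 1.0]
left, right = subdivide(b)  # patches on [0, 0.5] and [0.5, 1]
```

After one subdivision, each half-patch bounds the range by [0, 1], which is the true range of p; this illustrates how subdivision refines the enclosure of Theorem 2.
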
Remark 3.
To reduce wordiness, we say that we subdivide a patch to mean the subdivision of a single subbox into two subboxes and the computation of the corresponding Bernstein patches for the POP cost and constraints.
By repeatedly applying the subdivision procedure and Theorem 2, the bounds on the range of polynomial in a subbox can be improved. In fact, such bounds can be exact in the limiting sense if the subdivision is applied evenly in all directions:
Theorem 4 ([25, Theorem 2]).
Let be a box of maximum width () and let be the corresponding Bernstein patch of a given polynomial , then
(10) 
where is a nonnegative constant that can be given explicitly independent of .
Notice from the proof of Theorem 4 that changing the sign of does not change the value of . By substituting with in Theorem 4, one can easily show a similar result holds for the maximum of over :
Corollary 5.
Theorem 4 and Corollary 5 provide shrinking bounds for values of a polynomial over subboxes as the subdivision process continues. By comparing the bounds over all subboxes, one can argue that the minimizers of a polynomial may appear in only a subset of the subboxes. This idea underlies the Bernstein Algorithm for solving POPs [30, 31], discussed next.
III Parallel Constrained Bernstein Algorithm
This section proposes the Parallel Constrained Bernstein Algorithm (PCBA, Algorithm 1) for solving a general POP. We extend the approach in [31]. This approach utilizes the Bernstein form to obtain upper and lower bounds on both the objective and constraint polynomials (Theorem 2), iteratively improves such bounds using subdivision (Theorem 4 and Corollary 5), and removes patches that cannot contain a solution (Theorem 12). We discuss the algorithm, the list used to store patches, tolerances and stopping criteria, subdivision, a cut-off test for eliminating patches, and the advantages and disadvantages of PCBA. The next section, §IV, proves that PCBA finds globally optimal solutions to POPs, up to user-specified tolerances.
Before proceeding, we make an assumption for notational convenience, and to make the initial computation of Bernstein patches easier.
Assumption 6.
Without loss of generality, the domain of the decision variable is the unit box (i.e., ), since any nonempty box in can be mapped affinely onto [44].
III-A Algorithm Summary
We now summarize PCBA, implemented in Algorithm 1. The algorithm is initialized by computing the Bernstein patches of the cost and constraints on the domain (Line 1). Subsequently, PCBA subdivides each patch as in Remark 8 (Line 5 and Algorithm 2). Then, PCBA finds the upper and lower bounds of each new patch (Line 6 and Algorithm 3). These bounds are used to determine which patches are feasible, infeasible, and undecided as in Definition 9 (Line 7; see Algorithm 4 and Theorem 12). Algorithm 4 also determines the current solution estimate (the smallest upper bound over all feasible patches) and marks any patches that are suboptimal as in Definition 11. If every patch is infeasible (Line 8), PCBA returns that the problem is infeasible (Line 13); otherwise, PCBA checks if the current solution estimate meets user-specified tolerances (Line 9). If the tolerances are met, PCBA returns the solution estimate (Line 12). Otherwise, PCBA eliminates all infeasible and suboptimal patches (Line 10 and Algorithm 5), then moves to the next iteration (Line 11). Note, Algorithms 2, 3, and 5 are parallelized.
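This loop can be sketched serially in one dimension. The code below is an illustrative toy, not the paper’s GPU implementation: helper names are ours, the tolerances are collapsed into a single optimality gap, and only one inequality constraint is handled.

```python
from math import comb

def bern(a):
    # Bernstein coefficients on [0, 1] from monomial coefficients a
    # (lowest degree first): b_i = sum_{j<=i} binom(i,j)/binom(n,j) a_j.
    n = len(a) - 1
    return [sum(comb(i, j) / comb(n, j) * a[j] for j in range(i + 1))
            for i in range(n + 1)]

def halve(b):
    # Midpoint de Casteljau subdivision of a 1-D Bernstein patch.
    left, right = [b[0]], [b[-1]]
    row = list(b)
    while len(row) > 1:
        row = [(row[k] + row[k + 1]) / 2 for k in range(len(row) - 1)]
        left.append(row[0])
        right.insert(0, row[-1])
    return left, right

def cba_1d(cost, ineq, eps=1e-3, iters=30):
    # Items are (interval, cost patch, constraint patch) on the domain [0, 1].
    items = [((0.0, 1.0), bern(cost), bern(ineq))]
    best, estimate = None, float("inf")
    for _ in range(iters):
        # Subdivide every item (the step PCBA runs in parallel on a GPU).
        new = []
        for (lo, hi), pb, gb in items:
            mid = (lo + hi) / 2
            pl, pr = halve(pb)
            gl, gr = halve(gb)
            new += [((lo, mid), pl, gl), ((mid, hi), pr, gr)]
        # Solution estimate: smallest cost upper bound over feasible items,
        # i.e. items whose constraint patch is entirely nonpositive.
        feas = [it for it in new if max(it[2]) <= 0]
        if feas:
            best = min(feas, key=lambda it: max(it[1]))
            estimate = max(best[1])
        # Cut-off test: drop items that are infeasible (constraint lower
        # bound positive) or suboptimal (cost lower bound above the estimate).
        items = [it for it in new if min(it[2]) <= 0 and min(it[1]) <= estimate]
        if best is not None and estimate - min(best[1]) < eps:
            break
    if best is None:
        return None  # every subbox was found infeasible
    return (best[0][0] + best[0][1]) / 2, estimate

# Minimize (x - 0.75)^2 subject to x - 0.5 <= 0; the solution is x = 0.5, f = 0.0625.
x_opt, f_opt = cba_1d([0.5625, -1.5, 1.0], [-0.5, 1.0])
```
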
III-B Items and the List
Denote an item as the tuple , where (resp. ) is shorthand for the set of patches (resp. ). We use the following notation for items. If , then , , , and .
We denote the list , , indexed by . PCBA adds and removes items from by assessing the feasibility and optimality of each item.
III-C Tolerances and Stopping Criteria
Recall that, by Theorem 2, Bernstein patches provide upper and lower bounds for polynomials over a box. From Theorem 4 and Corollary 5, as we subdivide into smaller subboxes, the bounds of the Bernstein patches on each subbox more closely approximate the actual bounds of the polynomial. While these bounds converge to the value of the polynomial in the limit as the maximum width of the subboxes goes to zero, to ensure the algorithm terminates we must set tolerances on optimality and on equality constraint satisfaction (the equality constraints may not be satisfied at every point of a given subbox). During optimization, one is also usually interested in finding the optimizer of the cost function up to some resolution. In our case, this resolution corresponds to the maximum allowable subbox width, which we refer to as the step tolerance.
Definition 7.
We denote the optimality tolerance as , the equality constraint tolerance as , and the step tolerance as . We terminate Algorithm 1 either when is empty (the problem is infeasible) or when there exists an item that satisfies all of the following conditions:

,

for all ,

for all , and

for all .
We discuss feasibility in more detail in §III-E. Note that, to implement the step tolerance , since we subdivide by halving the width of each subbox, we need only ensure that sufficiently many iterations have passed.
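For example, because each full subdivision pass halves the subbox widths, the number of passes needed to reach a given step tolerance can be computed directly (a small sketch; the function name and the unit-box normalization are our assumptions):

```python
from math import ceil, log2

def passes_for_step_tolerance(delta):
    # After k full subdivision passes on the unit box, every subbox has
    # maximum width 2**(-k), so the width is at most delta once
    # k >= log2(1 / delta).
    return max(0, ceil(log2(1.0 / delta)))

k = passes_for_step_tolerance(1e-3)  # 2**(-10) is about 9.8e-4
```
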
III-D Subdivision
Recall that subdivision is presented in §II-E. We implement subdivision with Algorithm 2. Since the subdivision of one Bernstein patch is computationally independent of another, each subdivision task is assigned to an individual GPU thread, making Algorithm 2 parallel.
Note that the subdivision of Bernstein patches can be done in any direction, leading to the question of how to select the direction in practice. Example rules are available in the literature, such as maximum width [38, §3], derivative-based [47, §3], or a combination of the two [38, §3]. In the context of constrained optimization, the maximum-width rule is usually favored over derivative-based rules for two reasons: first, computing the partial derivatives of all constraint polynomials can introduce significant computational burden, especially when the number of constraints is large (see §V-C); second, the precision of Bernstein patches as bounds on the polynomials depends on the maximum width of each subbox (Theorem 4 and Corollary 5), so it is beneficial to subdivide along the direction of maximum width for better convergence results.
In each iteration of PCBA, we subdivide in each direction, one after another. We halve the width of each subbox each time we subdivide, leading to the following remark.
Remark 8.
In each iteration, the maximum width of every subbox in is halved.
III-E Cut-Off Test
Subdivision would normally occur for every patch in every iteration, leading to exponential memory usage ( patches in iteration ). However, by using a cut-off test, some patches can be deleted, reducing both the time and memory usage of PCBA (see §IV for complexity analysis). To decide which patches are to be eliminated, we require the following definitions.
Definition 9.
An item is feasible if both of the following hold:

for all , and

for all .
An item is infeasible if any of the following hold:

for at least one , or

for at least one , or

for at least one .
An item is undecided if it is neither feasible nor infeasible.
Notice, in particular, that a feasible item cannot also be infeasible.
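A literal reading of Definition 9 can be sketched as follows (our own rendering; the placement of the equality tolerance is an assumption, since the printed tolerance symbols are not legible here):

```python
def classify(ineq_patches, eq_patches, eq_tol=1e-6):
    # Infeasible: some inequality patch lies entirely above zero, or some
    # equality patch lies entirely above +eq_tol or entirely below -eq_tol.
    if any(min(g) > 0 for g in ineq_patches):
        return "infeasible"
    if any(min(h) > eq_tol or max(h) < -eq_tol for h in eq_patches):
        return "infeasible"
    # Feasible: every inequality patch is entirely nonpositive and every
    # equality patch lies inside the tolerance band.
    if all(max(g) <= 0 for g in ineq_patches) and \
       all(-eq_tol <= min(h) and max(h) <= eq_tol for h in eq_patches):
        return "feasible"
    # Otherwise the item cannot yet be decided and must be subdivided further.
    return "undecided"

labels = [classify([[-0.3, -0.1]], []),   # inequality patch entirely nonpositive
          classify([[0.2, 0.5]], []),     # inequality patch entirely positive
          classify([[-0.1, 0.1]], [])]    # patch straddles zero
```
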
Definition 10.
The solution estimate is the smallest upper bound of the cost over all feasible items in :
(12) 
where if .
Definition 11.
An item is suboptimal if
(13) 
Note that Definitions 10 and 11 are dependent on ; that is, for the purposes of PCBA, optimality is defined in terms of the elements of . We show in Corollary 13 below how this notion of optimality coincides with optimality of the POP itself.
Feasible, infeasible, undecided, and suboptimal patches are illustrated in Figure 2. Any item that is infeasible or suboptimal can be eliminated from , because the corresponding subboxes cannot contain the solution to the POP (formalized in the following theorem). We call checking for infeasible and suboptimal items the cut-off test.
Theorem 12 (Cut-Off Test).
Proof.
Let be an item in . We only need to show:

if is feasible, then all points in are feasible (up to the tolerance ), and

if is infeasible, then all points in are infeasible (up to the tolerance ), and

if is suboptimal, then all points in are not optimal.
Note that 1 and 2 follow directly from Theorem 2. To prove 3, let be a subbox on which the solution estimate is achieved; that is, is feasible and
(14) 
Let be arbitrary, then it follows from Theorem 2 and the definition of suboptimality that
(15) 
for all . Since such a point is necessarily feasible (by condition 2), cannot be a global minimum of the POP.
Corollary 13.
Proof.
This result is the contrapositive of Theorem 12.
We implement the cut-off tests as follows. Algorithm 3 (FindBounds) computes the maximum and minimum element of each Bernstein patch; Algorithm 4 (CutOffTest) implements the cut-off tests and marks all subboxes to be eliminated in a list ; and Algorithm 5 (Eliminate) eliminates the marked subboxes from the list . Algorithms 3 and 5 are parallelizable, whereas Algorithm 4 must be computed serially.
III-F Advantages and Disadvantages of PCBA
PCBA has several advantages. First, it always finds a global optimum (if one exists), subject to tolerances. PCBA does not require an initial guess, and does not converge to local minima, unlike generic nonlinear solvers (e.g., fmincon [28]). It also does not require tuning hyperparameters. As we show in §IV, PCBA has bounded time and memory complexity under certain assumptions. Finally, due to parallelization, PCBA is fast enough to enable RTD* for real-time, safe, optimal trajectory planning, which we demonstrate in §VI.
However, PCBA also has several limitations in comparison to traditional approaches to solving POPs. First, to prove the bounds on time and memory usage, we require that, at any global minimum, the active constraints are linearly independent and the Hessian of the cost function is positive definite (see Theorems 14 and 17). Furthermore, because the number of Bernstein patches grows exponentially with the decision variable dimension, we have not yet applied PCBA to problems of more than four dimensions.
IV Complexity Analysis
In this section, we prove that Algorithm 1 terminates by bounding the number of iterations of PCBA for both unconstrained and constrained POPs. We also prove the number of Bernstein patches (i.e., the length of the list in Algorithm 1) is bounded after sufficiently many iterations, under certain assumptions. For convenience, in the remainder of this section we use as a shorthand notation for where . The proofs of Theorems 16, 17, and 18 are in the Appendix.
IV-A Unconstrained Case
We first consider unconstrained POPs whose optimal solutions are not on the boundary of . Note, we can treat optimal solutions on the boundary of as having active linear constraints; see §IV-B for the corresponding complexity analysis. In the unconstrained case, all points in are feasible, and we are interested in solving
(16) 
where is a multivariate polynomial. Given an optimality tolerance and step tolerance , we bound the number of iterations to solve (16) with PCBA as follows:
Theorem 14.
Proof.
Let be the current iteration number. It is sufficient to show that, for all , there exists a subbox of such that the following conditions hold:

; and

for all .
According to (17), the rate of convergence with respect to (WRT) the decision variables is quadratic (first term); the rate of convergence WRT the objective function is either quadratic (second term) or linear (third term), depending on which term dominates. However, a tighter bound exists if one of the global minimizers satisfies the second-order sufficient condition for optimality [33, Theorem 2.4], as shown in the following theorem:
Theorem 15.
Theorem 15 proves a quadratic rate of convergence WRT the objective function once the number of iterations exceeds a threshold , given that the second-order optimality condition is satisfied at some global minimizer. We now discuss the number of patches remaining after sufficiently many iterations, which gives an estimate of memory usage when Algorithm 1 is applied to solve (16).
Theorem 16.
Suppose there are global minimizers of (16), and none of them are on the boundary of the unit box . Let the Hessian be positive definite at these minimizers. Then after sufficiently many iterations of Algorithm 1, the number of Bernstein patches remaining (i.e., length of the list in Algorithm 1) is bounded by a constant.
The constant bound in Theorem 16 scales exponentially with the problem dimension, and is a function of the condition number of the cost function’s Hessian at the minimizers.
IV-B Constrained Case
Theorem 17.
Suppose that the linear independence constraint qualification (LICQ) [33, Definition 12.4] is satisfied at all global minimizers of the constrained POP (P), and at least one constraint is active (i.e., the active set [33, Definition 12.1] is nonempty) at some minimizer . Then the maximum number of iterations needed to solve (P) up to accuracy , , and equality constraint tolerance is
(25) 
where , , are constants.
Theorem 17 gives a bound on the number of PCBA iterations needed to solve a POP up to specified tolerances. In particular, (25) shows the rate of convergence is linear in the step tolerance (second term), equality constraint tolerance (third term), and objective function (fourth term), once the number of iterations is larger than a constant (first term). We next prove a bound on the number of items in the list after sufficiently many iterations.
Theorem 18.
Suppose there are () global minimizers of the constrained problem (P), and none of them are on the boundary of the unit box . Let the critical cone (see [33, (12.53)]) be nonempty for () as in the proof of Theorem 17. Then after sufficiently many iterations of Algorithm 1, the number of Bernstein patches remaining (i.e., length of the list ) is bounded by a constant.
The constant proved in Theorem 18 scales exponentially with respect to the dimension of the problem.
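The cut-off test that keeps the list small can be sketched as follows (a hypothetical item layout, not our GPU implementation): a patch is discarded if its cost lower bound exceeds the best upper bound found so far, or if some inequality constraint is provably violated on the whole patch.

```python
def prune(patches, incumbent):
    """Branch-and-bound cut-off test on Bernstein patches. Each patch stores
    the Bernstein coefficients of the cost and of each inequality constraint
    on its sub-box; the coefficients' min/max bound the polynomials' ranges."""
    keep = []
    for p in patches:
        if min(p["cost"]) > incumbent:
            continue  # cost exceeds the incumbent everywhere: cannot be optimal
        if any(min(g) > 0 for g in p["ineq"]):
            continue  # some constraint g(x) <= 0 is violated on the whole patch
        keep.append(p)
    return keep
```

Once the surviving patches cluster around the finitely many minimizers, their number stops growing, which is the intuition behind Theorems 16 and 18.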
IV-C Memory Usage Implementation
We now state the amount of GPU memory required to store a single item, given the degree and dimension of the cost and constraint polynomials. Note that, for our implementation, all numbers in an item are represented using 4 B of space, as either floats or unsigned integers.
For a multi-index $\delta = (\delta_1, \ldots, \delta_n) \in \mathbb{N}^n$, let $|\delta| = \prod_{i=1}^{n}(\delta_i + 1)$, and write $\delta \leq \delta'$ if $\delta_i \leq \delta'_i$ for each $i$. Let $\delta_{\mathrm{cost}}$ be the multi-degree of the cost. Let $\delta_{\mathrm{ineq}}$ be a multi-degree large enough for all $n_{\mathrm{ineq}}$ inequality constraints, and $\delta_{\mathrm{eq}}$ a multi-degree large enough for all $n_{\mathrm{eq}}$ equality constraints. By “large enough” we mean that, if $\delta$ is the multi-degree of any inequality constraint, then $\delta \leq \delta_{\mathrm{ineq}}$ (and similarly for the equality constraints). Then, as per [44, §4.1], an item can be stored in memory as an array with the following number of entries:
$2n + |\delta_{\mathrm{cost}}| + n_{\mathrm{ineq}}|\delta_{\mathrm{ineq}}| + n_{\mathrm{eq}}|\delta_{\mathrm{eq}}|$, (26)
where the first $2n$ entries store the upper and lower bounds (in each dimension) of the sub-box.
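As a sanity check of this accounting, the entry count can be computed as follows (assuming an item holds the 2n sub-box bounds plus one Bernstein coefficient per multi-index up to each multi-degree; the names are ours, not from the released code):

```python
from math import prod

def patch_entries(n, deg_cost, deg_ineq, n_ineq, deg_eq, n_eq):
    """Number of array entries needed to store one item: 2n sub-box bounds
    plus the Bernstein coefficient arrays of the cost and of every
    constraint; a multi-degree d yields prod(d_i + 1) coefficients."""
    size = lambda deg: prod(d + 1 for d in deg)
    return 2 * n + size(deg_cost) + n_ineq * size(deg_ineq) + n_eq * size(deg_eq)

# e.g., a 2-D problem with cost multi-degree (3, 3), three inequality
# constraints of multi-degree (2, 2), and no equality constraints:
entries = patch_entries(2, (3, 3), (2, 2), 3, (2, 2), 0)  # 4 + 16 + 27 = 47
bytes_per_item = 4 * entries  # 4 B per number, as in our implementation
```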
IV-D Summary
We have shown that PCBA will find a solution to (P), if one exists, in bounded time. We have also shown that the number of patches stored by PCBA is bounded after a finite number of iterations, which implies that its memory usage is bounded; and we have provided a way to compute how much memory is required to store the list. Next, we benchmark PCBA on a variety of problems and compare it to two other solvers.
V PCBA Evaluation
In this section, we compare PCBA against a derivative-based solver (MATLAB’s fmincon [28]) and a convex relaxation method (Lasserre’s BSOS [20]). First, we test all three solvers on eight Benchmark Evaluation problems with dimension less than or equal to 4. Second, we compare all three solvers on several Increasing Number of Constraints problems, to assess how each solver scales on a variety of difficult objective functions [11].
All of the solvers/problems in this section are run on a computer with a 3.7 GHz processor, 16 GB of RAM, and an Nvidia GTX 1080 Ti GPU. PCBA is implemented with MATLAB R2017b executables and CUDA 10.2. Our code is available on GitHub: https://github.com/ramvasudevan/GlobOptBernstein.
V-A Parameter Selection
To set up a fair comparison, we scale each problem to the unit box $[0,1]^n$, where $n$ is the problem dimension. For PCBA, we use the stopping criteria in §III. To choose the optimality tolerance, we first compute the Bernstein patch of the cost over the whole unit box, then set
(27) 
We set the maximum number of iterations to 28. We do not set a minimum number of iterations, which is only needed to prove the complexity bounds in §IV.
BSOS [20, §4] requires the user to specify the size of the semidefinite matrix associated with the convex relaxation of the POP. This is done by selecting a pair of parameters (note that BSOS’s symbols for these overlap with notation we use elsewhere, but they are unrelated). Though one has to increase these parameters gradually to ensure convergence, larger values correspond to larger semidefinite programs, which can be difficult to solve. We chose the parameters separately for each problem in the Benchmark Evaluation, and used fixed values for the Increasing Number of Constraints problems.
V-B Benchmark Evaluation
Setup
We tested PCBA, BSOS, and fmincon on eight benchmark POPs [31], listed as P1 through P8; the problems are reported in the Appendix. We ran each solver 50 times on each problem, and report the median solution error and time required to find a solution. Since fmincon may or may not converge to the global optimum depending on its initial guess, we used random initial guesses for each of the 50 attempts.
Results
The results are summarized in Table I. For additional results (e.g., to plot the results of any of the problems), see the GitHub repository.
In terms of solution quality, PCBA always found the solution to within the desired optimality tolerance, except on P1, where PCBA stopped at the maximum allowed number of iterations (28); PCBA always used between 22 and 28 iterations. BSOS always found a lower bound to the solution, as expected. While fmincon converged to the global optimum at least once on every problem, it often converged to local minima, hence the large error values on some problems. In terms of solve time, fmincon solves the fastest (in 10–20 ms), PCBA is about twice as slow as fmincon, and BSOS is one to two orders of magnitude slower than PCBA.
For PCBA, the memory usage (computed with (26)) increases roughly by one order of magnitude with each additional dimension of the decision variable (Table I reports the peak GPU memory used by PCBA on each benchmark problem). Notice that PCBA never uses more than several MB of GPU memory, which is much less than the 11 GB available on the Nvidia GTX 1080 Ti GPU. Figure 3 shows the number of patches and the amount of GPU memory used on P4. We see that the memory usage peaks, then stays below a constant, as predicted by Theorem 18.
| Problem | n | PCBA ε | PCBA error | PCBA time [s] | PCBA iterations | PCBA memory | BSOS params | BSOS error | BSOS time [s] | fmincon error | fmincon time [s] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P1 | 2 | 7.0000e-07 | 7.9002e-07 | 0.0141 | 28 | 23 kB | (3,3) | 0.0068 | 1.2264 | 1.0880 | 0.0083 |
| P2 | 2 | 0.1369 | 0.0731 | 0.0131 | 26 | 60 kB | (2,2) | 3.4994e-04 | 0.7671 | 0.4447 | 0.0180 |
| P3 | 2 | 2.0000e-06 | 1.9879e-06 | 0.0128 | 26 | 57 kB | (2,2) | 3.2747 | 0.5065 | 3.0000 | 0.0220 |
| P4 | 3 | 1.7000e-06 | 6.5565e-07 | 0.0204 | 24 | 416 kB | (4,4) | 0.9455 | 6.6546 | 3.2000e-05 | 0.0130 |
| P5 | 3 | 1.0000e-06 | 0 | 0.0312 | 28 | 3 MB | (2,2) | 4.4858e-07 | 0.8462 | 7.9985e-06 | 0.0062 |
| P6 | 4 | 4.2677e-04 | 1.8685e-04 | 0.0315 | 23 | 2 MB | (3,3) | 36.6179 | 6.2434 | 0.0040 | 0.0065 |
| P7 | 4 | 5.0000e-07 | 2.5280e-07 | 0.0374 | 26 | 282 kB | (1,1) | 1.0899 | 0.1839 | 2.4002e-07 | 0.0139 |
| P8 | 4 | 1.3445e-04 | 6.7803e-05 | 0.0440 | 22 | 6 MB | (2,2) | 3.2521 | 1.8295 | 9.9989e-04 | 0.0169 |
V-C Increasing Constraint Problems
Next, we tested each solver on problems with an increasing number of constraints, to assess each solver’s suitability for use with RTD; we find in practice that a robot running RTD must handle between 30 and 300 constraints at each planning iteration.
Setup
We first choose an objective function with either many local minima or a nearly flat gradient near the global optimum (the global optimizer is known for each function). In particular, we tested on the El-Attar-Vidyasagar-Dutta, Powell, Wood, Dixon-Price (in 2, 3, and 4 dimensions), Beale, Bukin02, and Deckkers-Aarts problems (see the Appendix and [11]).
For each objective function, we generate 200 random constraints in total, while ensuring at least one global optimizer stays feasible (if there are multiple global optimizers, we choose one at random that will be feasible for all constraints). To generate a single constraint $g$, we first create a polynomial $g_0$ as a sum of the monomials of the decision variable with maximum degree 2, with random coefficients drawn from a fixed range. To ensure $g$ is feasible, we evaluate $g_0$ at the chosen optimizer $x^*$, then subtract the resulting value from $g_0$ to produce $g$ (i.e., $g = g_0 - g_0(x^*)$).
We run PCBA, BSOS, and fmincon on each objective function for 20 trials, with 10 random constraints in the first trial and 10 more added in each subsequent trial. As before, we run fmincon 50 times for each trial with random initial guesses, since its performance depends on the initial guess.
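The constraint-generation step above can be sketched as follows (the coefficient range and function names are illustrative assumptions, not taken from the paper’s code):

```python
import itertools
import random
from math import prod

def random_feasible_constraint(x_star, coeff_range=(-1.0, 1.0)):
    """Build one random constraint g(x) <= 0 that is guaranteed feasible at
    x_star: draw random coefficients for all monomials of total degree <= 2,
    then shift by the value at x_star so that g(x_star) = 0 (feasible, and
    in fact active)."""
    n = len(x_star)
    # exponent tuples of all monomials with total degree <= 2
    monomials = [e for e in itertools.product(range(3), repeat=n) if sum(e) <= 2]
    coeffs = {e: random.uniform(*coeff_range) for e in monomials}
    def g0(x):
        return sum(c * prod(xi ** ei for xi, ei in zip(x, e))
                   for e, c in coeffs.items())
    shift = g0(x_star)
    return lambda x: g0(x) - shift

# the chosen global optimizer remains feasible for every constraint built this way
g = random_feasible_constraint((0.3, 0.7))
```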
Results
To illustrate the results, data for the Powell objective function are shown in Figure 4. The data (and plots) for the other objective functions are available in the GitHub repository.
In terms of solution quality, all three algorithms often converge to the global optimum when the number of constraints is low, but fmincon converges to suboptimal solutions more frequently as the number of constraints increases. PCBA and BSOS find the optimal solution whenever they return a result; however, only PCBA finds the global optimum regardless of the number of constraints, unlike BSOS (which runs out of memory on large problems) or fmincon (which converges to local minima).
All three solvers require an increasing amount of solve time as the number of constraints increases. PCBA is comparable in speed to fmincon on 2D problems, but is typically slower on higher-dimensional problems. Regardless of the number of constraints, BSOS takes three to four orders of magnitude more time to solve than PCBA or fmincon.
More details on PCBA are presented in Table II. PCBA’s time to find a solution increases roughly by an order of magnitude when the decision variable dimension increases by 1; however, PCBA solves all of the increasing constraint POPs within 0.5 s. The memory usage increases by 1–3 orders of magnitude with each additional dimension; however, PCBA never uses more than 650 MB of GPU memory, well below the 11 GB available. Figure 5 shows PCBA’s GPU memory usage versus the number of constraints for the Powell objective function.
V-D Summary
As expected from the complexity bounds in §IV-B, the results in this section indicate that PCBA is practical for quickly solving 2D POPs with hundreds of constraints. We leverage this next by applying PCBA to solve RTD’s POP (4) for real-time receding-horizon trajectory optimization.
| Problem | n | max time [s] | max items | max memory |
|---|---|---|---|---|
| EVD | 2 | 0.0360 | 66 | 171 kB |
| Powell | 2 | 0.0447 | 1200 | 6.24 MB |
| Wood | 2 | 0.0532 | 54 | 220 kB |
| DP 2D | 2 | 0.0259 | 90 | 433 kB |
| DP 3D | 3 | 0.0675 | 356 | 3.47 MB |
| DP 4D | 4 | 0.402 | 4994 | 193 MB |
| Beale | 2 | 0.0302 | 106 | 259 kB |
| Bukin02 | 4 | 0.393 | 10550 | 110 MB |
| DA | 4 | 0.389 | 51886 | 647 MB |
VI Hardware Demonstrations
Recall from §II-C that RTD enables real-time, provably collision-free trajectory planning by solving a POP at every planning iteration. RTD is provably collision-free regardless of the POP solver used, meaning that we are able to test PCBA safely on hardware; when PCBA is applied to RTD, we call the resulting trajectory planning algorithm RTD*. See Figure 1 and the accompanying video.
In this section, we apply RTD* to a Segway robot navigating a hallway with static obstacles. Recall that RTD always produces dynamically feasible trajectory plans [18]. As proven in Corollary 13 and demonstrated in §V, PCBA always finds optimal solutions. This section shows that PCBA/RTD* improves a robot’s liveness by successfully navigating a variety of scenarios and outperforming fmincon/RTD.
VI-A Overview
Demonstrations
We ran two demonstrations in a hallway. In the first demonstration, we filled the hallway with random static obstacles and ran RTD*. In the second demonstration, we constructed two difficult scenarios and ran both PCBA/RTD* and fmincon/RTD on each.
Hardware
We use a Segway differential-drive robot with a planar Hokuyo UTM-30LX LIDAR for mapping and obstacle detection (see Figure 1). Mapping, localization, and trajectory optimization run onboard, on a 4.0 GHz laptop with an Nvidia GeForce GTX 1080 GPU.
POPs
As in §II-C (also see Figure 1), the robot plans by optimizing over a set of parameterized trajectories as in [18, (16)]. The parameters are a commanded speed (in m/s) and yaw rate (in rad/s), so the decision variable is two-dimensional. The trajectories are between 1 and 2 s long, depending on the robot’s initial speed (since each trajectory includes a braking maneuver). The robot must find a new plan (i.e., PCBA must solve a new POP) every 0.5 s, or else it begins executing the braking maneuver associated with its previously-computed plan [18, Remark 70]. In other words, we require PCBA to either return a feasible solution or report that the problem is infeasible. PCBA is given a time limit of 0.4 s to find a solution, because the robot requires 0.1 s for other onboard processes.
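The timing logic can be sketched as a simple guard around the solver call (a minimal illustration with hypothetical names; in our implementation the budget is enforced inside the solver itself):

```python
import time

BRAKE = "brake"  # stands in for the braking maneuver of the previous plan

def plan_step(solve_pop, t_plan=0.4):
    """One receding-horizon planning iteration: give the POP solver a hard
    time budget, and fall back to the stored braking maneuver if it times
    out, overruns the budget, or reports the problem infeasible (None)."""
    t0 = time.monotonic()
    try:
        plan = solve_pop(t_plan)  # the solver is expected to respect the budget
    except TimeoutError:
        plan = None
    if plan is None or time.monotonic() - t0 > t_plan:
        return BRAKE
    return plan
```

Because RTD plans always end in a braking maneuver, this fallback is safe by construction: the robot never executes an unverified trajectory.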
At each planning iteration, obstacles are converted into constraints as in [18, Sections 6 and 7]; thi