Rule-based Optimal Control for Autonomous Driving
Abstract.
We develop optimal control strategies for Autonomous Vehicles (AVs) that are required to meet complex specifications imposed by traffic laws and cultural expectations of reasonable driving behavior. We formulate these specifications as rules, and specify their priorities by constructing a priority structure. We propose a recursive framework, in which the satisfaction of the rules in the priority structure is iteratively relaxed based on their priorities. Central to this framework is an optimal control problem, where convergence to desired states is achieved using Control Lyapunov Functions (CLFs), and safety is enforced through Control Barrier Functions (CBFs). We also show how the proposed framework can be used for after-the-fact, pass/fail evaluation of trajectories: a given trajectory is rejected if we can find a controller producing a trajectory that leads to less violation of the rule priority structure. We present case studies with multiple driving scenarios to demonstrate the effectiveness of the proposed framework.
1. Introduction
With the development and integration of cyber-physical and safety-critical systems in various engineering disciplines, there is an increasing need for computational tools for verification and control of such systems according to rich and complex specifications. A prominent example is autonomous driving, which has received a lot of attention during the last decade. Besides common objectives in optimal control problems, such as minimizing the energy consumption and travel time, and constraints on control variables, such as maximum acceleration, autonomous vehicles (AVs) should follow complex and possibly conflicting traffic laws with different priorities. They should also meet cultural expectations of reasonable driving behavior (Nolte et al., 2017; Shalev-Shwartz et al., 2017; Parseh et al., 2019; Ulbrich and Maurer, 2013; Qian et al., 2014; ISO, 2019; Collin et al., 2020). For example, an AV has to avoid collisions with other road users (high priority), drive faster than the minimum speed limit (low priority), and maintain longitudinal clearance with the lead car (medium priority). We formulate these behavior specifications as a set of rules with a priority structure that captures their importance (Censi et al., 2019).
To accommodate the rules, we formulate the AV control problem as an optimal control problem, in which the satisfaction of the rules and some vehicle limitations is enforced by Control Barrier Functions (CBFs) (Ames et al., 2017), and convergence to desired states is achieved through Control Lyapunov Functions (CLFs) (Freeman and Kokotovic, 1996). To minimize violation of the set of rules, we formulate iterative rule relaxation according to the preorder on the rules.
CLFs (Freeman and Kokotovic, 1996; Artstein, 1983) have been used to stabilize systems to desired states. CBFs enforce set forward invariance (Tee et al., 2009; Wisniewski and Sloth, 2013), and have been adopted to guarantee the satisfaction of safety requirements (Ames et al., 2017; Wang et al., 2017; Lindemann and Dimarogonas, 2019). In (Ames et al., 2017; Glotfelter et al., 2017), the constraints induced by CBFs and CLFs were used to formulate quadratic programs (QPs) that could be solved in real time to stabilize affine control systems while optimizing quadratic costs and satisfying state and control constraints. The main limitation of this approach is that the resulting QPs can easily become infeasible, especially when bounds on control inputs are imposed in addition to the safety specifications and the state constraints, or for constraints with high relative degree (Xiao and Belta, 2019). Relaxations of the (hard) CLF (Ames et al., 2012, 2017) and CBF (Xiao and Belta, 2019) constraints have been proposed to address this issue.
The approaches described above do not consider the (relative) importance of the safety constraints during their relaxations. With particular relevance to the application considered here, AVs often deal with situations where there are conflicts among some of the traffic laws or other requirements. For instance, consider a scenario where a pedestrian walks into the lane in which the AV is driving: it is impossible for the AV to avoid a collision with the pedestrian or other vehicles, stay in lane, and drive faster than the minimum speed limit at the same time. Given the relative priorities of these specifications, a reasonable AV behavior would be to avoid a collision with the pedestrian or other vehicles (high priority), and instead violate low or medium priority rules, e.g., by reducing speed to a value lower than the minimum speed limit, and/or deviating from its lane. The maximum satisfaction and minimum violation of a set of rules expressed in temporal logic were studied in (Dimitrova et al., 2018; Tůmová et al., 2013) and solved by assigning positive numerical weights to formulas based on their priorities (Tůmová et al., 2013). In (Censi et al., 2019), the authors proposed rulebooks, a framework in which relative priorities were captured by a preorder. In conjunction with rule violation scores, rulebooks were used to rank vehicle trajectories. These works either do not consider the vehicle dynamics or assume very simple forms, such as finite transition systems. The violation scores are example-specific, or are simply the quantitative semantics of the logic used to formulate the rules. In their current form, they capture worst-case scenarios and are non-differentiable, and cannot be used for generating controllers for realistic vehicle dynamics.
In this paper, we draw inspiration from Signal Temporal Logic (STL) (Maler and Nickovic, 2004) to formalize traffic laws and other driving rules and to quantify the degree of violation of the rules by AV trajectories. We build on the rulebooks from (Censi et al., 2019) to construct a rule priority structure. The main contribution of this paper is an iterative procedure that uses the rule priority to determine a control strategy that minimizes rule violation globally. We show how this procedure can be adapted to develop transparent and reproducible rule-based pass/fail evaluation of AV trajectories in test scenarios. Central to these approaches is an optimization problem based on (Xiao and Belta, 2019), which uses detailed vehicle dynamics, ensures the satisfaction of "hard" vehicle limitations (e.g., acceleration constraints), and can accommodate rule constraints with high relative degree. Another key contribution of this work is the formal definition of a speed-dependent, optimal over-approximation of a vehicle footprint that ensures differentiability of clearance-type rules, which enables the use of powerful approaches based on CBFs and CLFs. Finally, the proposed architecture and algorithms were implemented in a user-friendly software tool and tested in various driving scenarios.
2. Preliminaries
2.1. Vehicle Dynamics
Consider an affine control system given by:
(1) $\dot{x} = f(x) + g(x)u,$
where $x \in X \subset \mathbb{R}^n$ ($X$ is the state constraint set), $\dot{(\cdot)}$ denotes differentiation with respect to time, $f: \mathbb{R}^n \to \mathbb{R}^n$ and $g: \mathbb{R}^n \to \mathbb{R}^{n \times q}$ are globally Lipschitz, and $u \in U \subset \mathbb{R}^q$, where $U$ is the control constraint set defined as:
(2) $U := \{u \in \mathbb{R}^q : u_{min} \leq u \leq u_{max}\},$
with $u_{min}, u_{max} \in \mathbb{R}^q$, and the inequalities are interpreted componentwise. We use $x(t)$ to refer to a trajectory of (1) at a specific time $t$, and we use $\mathcal{X}$ to denote a whole trajectory starting at time 0 and ending at a final time specified by a scenario. Note that most vehicle dynamics, such as "traditional" dynamics defined with respect to an inertial frame (Ames et al., 2017) and dynamics defined along a given reference trajectory (Rucco et al., 2015) (see (18)), are in the form (1). Throughout the paper, we will refer to the vehicle with dynamics given by (1) as ego.
Definition 0 ().
(Relative degree (Nguyen and Sreenath, 2016)) The relative degree of a (sufficiently many times) differentiable function $b: \mathbb{R}^n \to \mathbb{R}$ with respect to system (1) is the number of times it needs to be differentiated along its dynamics (Lie derivatives) until the control $u$ explicitly shows in the corresponding derivative.
In this paper, since function $b$ is used to define a constraint $b(x) \geq 0$, we will also refer to the relative degree of $b$ as the relative degree of the constraint.
2.2. High Order Control Barrier Functions
Definition 0 ().
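(Class $\mathcal{K}$ function (Khalil, 2002)) A continuous function $\alpha: [0, a) \to [0, \infty)$, $a > 0$, is said to belong to class $\mathcal{K}$ if it is strictly increasing and $\alpha(0) = 0$.
Given $b: \mathbb{R}^n \to \mathbb{R}$ and a constraint $b(x) \geq 0$ with relative degree $m$, we define $\psi_0(x) := b(x)$ and a sequence of functions $\psi_i: \mathbb{R}^n \to \mathbb{R}$, $i \in \{1, \dots, m\}$:
(3) $\psi_i(x) := \dot{\psi}_{i-1}(x) + \alpha_i(\psi_{i-1}(x)), \quad i \in \{1, \dots, m\},$
where $\alpha_i(\cdot)$, $i \in \{1, \dots, m\}$, denotes a $(m-i)^{th}$ order differentiable class $\mathcal{K}$ function. We further define a sequence of sets $C_i$, $i \in \{1, \dots, m\}$, associated with (3) in the following form:
(4) $C_i := \{x \in \mathbb{R}^n : \psi_{i-1}(x) \geq 0\}, \quad i \in \{1, \dots, m\}.$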
Definition 0 ().
(High Order Control Barrier Function (HOCBF) (Xiao and Belta, 2019)) Let $C_1, \dots, C_m$ be defined by (4) and $\psi_1(x), \dots, \psi_m(x)$ be defined by (3). A function $b: \mathbb{R}^n \to \mathbb{R}$ is a High Order Control Barrier Function (HOCBF) of relative degree $m$ for system (1) if there exist $(m-i)^{th}$ order differentiable class $\mathcal{K}$ functions $\alpha_i$, $i \in \{1, \dots, m-1\}$, and a class $\mathcal{K}$ function $\alpha_m$ such that
(5) $\sup_{u \in U} \left[ L_f^m b(x) + L_g L_f^{m-1} b(x) u + S(b(x)) + \alpha_m(\psi_{m-1}(x)) \right] \geq 0,$
for all $x \in C_1 \cap \dots \cap C_m$. $L_f^m$ ($L_g$) denotes Lie derivatives along $f$ ($g$) $m$ (one) times, and $S(\cdot)$ denotes the remaining Lie derivatives along $f$ with degree less than or equal to $m-1$ (see (Xiao and Belta, 2019) for more details).
The HOCBF is a general form of the relative degree one CBF (Ames et al., 2017), (Glotfelter et al., 2017), (Lindemann and Dimarogonas, 2019) (setting $m = 1$ reduces the HOCBF to the common CBF form in (Ames et al., 2017), (Glotfelter et al., 2017), (Lindemann and Dimarogonas, 2019)), and is also a general form of the exponential CBF (Nguyen and Sreenath, 2016).
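As an illustration of how the sequence (3) turns a relative degree two constraint into a bound on the control, the following is a minimal sketch. It assumes a double integrator (position `p`, speed `v`, control `u`), a position limit `p_max`, and linear class K functions with gains `k1`, `k2`; all of these names and values are hypothetical and are not taken from the paper.

```python
def hocbf_accel_bound(p, v, p_max, k1=1.0, k2=1.0):
    """Upper bound on u implied by psi_2 >= 0 for b(x) = p_max - p.

    Assumed dynamics: p_dot = v, v_dot = u, so b has relative degree 2.
      psi_0 = b                      = p_max - p
      psi_1 = psi_0_dot + k1 * psi_0 = -v + k1 * (p_max - p)
      psi_2 = psi_1_dot + k2 * psi_1 = -u - k1 * v + k2 * psi_1
    Requiring psi_2 >= 0 gives u <= -k1 * v + k2 * psi_1.
    """
    psi_1 = -v + k1 * (p_max - p)
    return -k1 * v + k2 * psi_1


def simulate(p0=0.0, v0=0.0, p_max=10.0, u_des=1.0, dt=0.01, steps=2000):
    """Apply the desired acceleration u_des unless the HOCBF bound is tighter."""
    p, v = p0, v0
    history = [p]
    for _ in range(steps):
        u = min(u_des, hocbf_accel_bound(p, v, p_max))
        # Explicit Euler step with the control held constant over the interval.
        p, v = p + dt * v, v + dt * u
        history.append(p)
    return history
```

In this sketch the initial state satisfies both $\psi_0 \geq 0$ and $\psi_1 \geq 0$, which is the condition under which the position limit is expected to hold along the simulated trajectory.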
Theorem 5 ().
Definition 0 ().
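(Control Lyapunov Function (CLF) (Ames et al., 2012)) A continuously differentiable function $V: \mathbb{R}^n \to \mathbb{R}$ is an exponentially stabilizing control Lyapunov function (CLF) if there exist positive constants $c_1, c_2, c_3$ such that $\forall x \in X$, $c_1 \|x\|^2 \leq V(x) \leq c_2 \|x\|^2$, the following holds:
(6) $\inf_{u \in U} \left[ L_f V(x) + L_g V(x) u + c_3 V(x) \right] \leq 0.$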
Theorem 7 ((Ames et al., 2012)).
Recent works (Ames et al., 2017), (Lindemann and Dimarogonas, 2019), (Nguyen and Sreenath, 2016) combined CBFs and CLFs with quadratic costs to formulate an optimization problem that stabilizes a system using CLFs subject to safety constraints given by CBFs. In this work, we follow a similar approach. Time is discretized, and the CBF and CLF constraints are considered at each discrete time step. Note that these constraints are linear in the control since the state value is fixed at the beginning of the discretization interval. Therefore, in every interval, the optimization problem is a QP. The optimal control obtained by solving each QP is applied at the current time step and held constant for the whole interval. The next state is found by integrating the dynamics (1). The usefulness of this approach is conditioned upon the feasibility of the QP at every time step. In the case of constraints with high relative degrees, which are common in autonomous driving, the CBFs can be replaced by HOCBFs.
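The discretize, solve, hold, and integrate loop described above can be sketched on a toy problem. The example below is only an illustration under strong assumptions: a scalar single integrator $\dot{x} = u$, a quadratic CLF toward a hypothetical target `x_ref`, a CBF for a hypothetical limit `x_max`, and a relaxation weight `p`. Because the control is scalar, the QP is solved in closed form rather than with a QP solver.

```python
def solve_scalar_clf_cbf_qp(a_clf, c_clf, u_lo, u_hi, p=10.0):
    """Minimize u**2 + p * delta**2 subject to
         a_clf * u + c_clf <= delta   (relaxed CLF constraint)
         u_lo <= u <= u_hi            (CBF constraint merged with control bounds)
    Returns (u, delta), or None if the feasible interval is empty.
    """
    if u_lo > u_hi:
        return None
    if c_clf <= 0.0:
        u_free = 0.0  # CLF constraint already holds at u = 0 with delta = 0
    else:
        # Stationary point of u**2 + p * (a_clf * u + c_clf)**2
        u_free = -p * a_clf * c_clf / (1.0 + p * a_clf ** 2)
    # The reduced cost is convex in the scalar u, so clipping is optimal.
    u = min(max(u_free, u_lo), u_hi)
    delta = max(0.0, a_clf * u + c_clf)
    return u, delta


def simulate(x0=0.0, x_ref=2.0, x_max=1.0, eps=1.0, alpha=1.0,
             u_min=-2.0, u_max=2.0, dt=0.01, steps=1000):
    """One QP per time step; the control is held constant over each interval."""
    x, xs = x0, [x0]
    for _ in range(steps):
        V = (x - x_ref) ** 2
        a_clf = 2.0 * (x - x_ref)   # L_g V for x_dot = u (L_f V = 0)
        c_clf = eps * V
        # CBF b = x_max - x: -u + alpha * b >= 0, i.e. u <= alpha * b
        u_hi = min(u_max, alpha * (x_max - x))
        sol = solve_scalar_clf_cbf_qp(a_clf, c_clf, u_min, u_hi)
        if sol is None:
            raise RuntimeError("QP infeasible at this time step")
        u, _ = sol
        x = x + dt * u              # integrate the dynamics over the interval
        xs.append(x)
    return xs
```

Here the target lies outside the safe set, so the CLF constraint is relaxed (positive `delta`) while the CBF constraint stays hard, and the state is expected to approach the limit without crossing it.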
2.3. Rulebooks
As defined in (Censi et al., 2019), a rule specifies a desired behavior for autonomous vehicles. Rules can be derived from traffic laws, local culture, or consumer expectation, e.g., “stay in lane for all times”, “maintain clearance from pedestrians for all times”, “obey the maximum speed limit for all times”, “reach the goal”. A rulebook as introduced in (Censi et al., 2019) defines a priority on rules by imposing a preorder that can be used to rank AV trajectories:
Definition 0 ().
(Rulebook (Censi et al., 2019)) A rulebook is a tuple $\langle R, \leq \rangle$, where $R$ is a finite set of rules and $\leq$ is a preorder on $R$.
A rulebook can be represented by a directed graph, where each node is a rule and an edge between two rules means that the first rule has higher priority than the second. Formally, $r_1 \to r_2$ in the graph means that $r_1 \geq r_2$ ($r_1$ has a higher priority than $r_2$). Note that, using a preorder, two rules can be in one of three relations: comparable (one has a higher priority than the other), incomparable, or equivalent (each has a higher priority than the other).
Example 0 ().
Consider the rulebook shown in Fig. 1, which consists of 6 rules. In this example, and are incomparable, and both have a higher priority than and . Rules and are equivalent ( and ), but are incomparable to . Rule has the lowest priority among all rules.
Rules are evaluated over vehicle trajectories (i.e., trajectories of system (1)). A violation metric is a function specific to a rule that takes as input a trajectory and outputs a violation score that captures the degree of violation of the rule by the trajectory (Censi et al., 2019). For example, if the AV crosses the lane divider and reaches within the left lane by a maximum distance of 1m along a trajectory, then the violation score for that trajectory against the “stay in lane for all times” rule can be 1m.
3. Problem Formulation
For a vehicle with dynamics given by (1) and starting at a given state $x(0) = x_0$, consider an optimal control problem in the form:
(7) $\min_{u(t)} \int_0^T J(\|u(t)\|) \, dt,$
where $\|\cdot\|$ denotes the 2-norm of a vector, $T > 0$ denotes a bounded final time, and $J$ is a strictly increasing function of its argument (e.g., an energy consumption function $J(\|u(t)\|) = \|u(t)\|^2$). We consider the following additional requirements:
Trajectory tracking: We require the vehicle to stay as close as possible to a desired reference trajectory (e.g., middle of its current lane).
State constraints: We impose a set of constraints (componentwise) on the state of system (1) in the following form:
(8) $x_{min} \leq x(t) \leq x_{max}, \quad \forall t \in [0, T],$
where $x_{max}$ and $x_{min}$ denote the maximum and minimum state vectors, respectively. Examples of such constraints for a vehicle include maximum acceleration, maximum braking, and maximum steering rate.
Priority structure: We require the system trajectory $\mathcal{X}$ of (1) starting at $x(0) = x_0$ to satisfy a priority structure $\langle R, \sim_p, \leq_p \rangle$, i.e.:
(9) $\mathcal{X} \models \langle R, \sim_p, \leq_p \rangle,$
where $\sim_p$ is an equivalence relation over a finite set of rules $R$ and $\leq_p$ is a total order over the equivalence classes. Our priority structure is related to the rulebook from Sec. 2.3, but it requires that any two rules from $R$ are either comparable or equivalent. Informally, (9) means that $\mathcal{X}$ is the "best" trajectory that (1) can produce, considering the violation metrics of the rules in $R$ and the priorities captured by $\sim_p$ and $\leq_p$. A formal definition of a priority structure and its satisfaction is given in Sec. 4.2.
Control bounds: We impose control bounds as given in (2). Examples include jerk and steering acceleration.
Formally, we can define the optimal control problem as follows:
Problem 1 ().
Our approach to Problem 1 can be summarized as follows: We use CLFs for tracking the reference trajectory and HOCBFs to implement the state constraints (8). For each rule in $R$, we define violation metrics. We show that satisfaction of the rules can be written as forward invariance for sets described by differentiable functions, and enforce them using HOCBFs. The control bounds (2) are considered as constraints. We provide an iterative solution to Problem 1, where each iteration involves solving a sequence of QPs. In the first iteration, all the rules from $R$ are considered. If the corresponding QPs are feasible, then an optimal control is found. Otherwise, we iteratively relax the satisfaction of rules from subsets of $R$ based on their priorities, and minimize the corresponding relaxations by including them in the cost function.
4. Rules and priority structures
In this section, we extend the rulebooks from (Censi et al., 2019) by formalizing the rules and defining violation metrics. We introduce a priority structure in which all rules are comparable, which makes it particularly suited for the hierarchical control framework proposed in Sec. 5.3.
4.1. Rules
In the definition below, an instance $i \in S_p$ is a traffic participant or artifact that is involved in a rule, where $S_p$ is the set of all instances involved in the rule. For example, in a rule to maintain clearance from pedestrians, a pedestrian is an instance, and there can be many instances encountered by ego in a given scenario. Instances can also be traffic artifacts like the road boundary (of which there is only one), lane boundaries, or stop lines.
Definition 0 ().
(Rule) A rule is composed of a statement and three violation metrics. A statement is a formula that is required to be satisfied for all times. A formula is inductively defined as:
(10) $\varphi := \mu \mid \neg \varphi \mid \varphi_1 \wedge \varphi_2,$
where $\varphi, \varphi_1, \varphi_2$ are formulas, and $\mu := (h(x) \geq 0)$ is a predicate on the state vector $x$ of system (1) with $h: \mathbb{R}^n \to \mathbb{R}$. $\wedge, \neg$ are Boolean operators for conjunction and negation, respectively. The three violation metrics for a rule $r$ are defined as:
instantaneous violation metric $\varrho_{r,i}(x(t))$,
instance violation metric $\rho_{r,i}(\mathcal{X})$, and
total violation metric $P_r$,
where $i \in S_p$ is an instance, $x(t)$ is a trajectory at time $t$, and $\mathcal{X}$ is a whole trajectory of ego. The instantaneous violation metric $\varrho_{r,i}(x(t))$ quantifies violation by a trajectory at a specific time $t$ with respect to a given instance $i$. The instance violation metric $\rho_{r,i}(\mathcal{X})$ captures violation with respect to a given instance $i$ over the whole time of a trajectory, and is obtained by aggregating $\varrho_{r,i}(x(t))$ over the entire time of a trajectory $\mathcal{X}$. The total violation metric $P_r$ is the aggregation of the instance violation metric $\rho_{r,i}(\mathcal{X})$ over all instances $i \in S_p$.
The aggregations in the above definitions can be implemented through selection of a maximum or a minimum, integration over time, summation over instances, or by using general norms. A zero value for a violation score shows satisfaction of the rule. A strictly positive value denotes violation: the larger the score, the more ego violates the rule. Throughout the paper, for simplicity, we use $\varrho_r$ and $\rho_r$ instead of $\varrho_{r,i}$ and $\rho_{r,i}$ if there is only one instance. Examples of rules (statements, violation metrics, and scores) are given in Sec. 6 and in the Appendix.
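The three levels of aggregation can be illustrated with a small sketch. It assumes a hypothetical clearance rule whose instantaneous violation is the shortfall of the measured distance below a required clearance, a maximum over time for the instance metric, and a sum over instances for the total metric. These choices are only one of the options listed above, and the function names are illustrative.

```python
def instantaneous_violation(distance, required_clearance):
    """Hypothetical instantaneous metric: clearance shortfall, 0 if satisfied."""
    return max(0.0, required_clearance - distance)


def instance_violation(distances, required_clearance):
    """Aggregate over time with a maximum (worst shortfall on the trajectory)."""
    return max(
        (instantaneous_violation(d, required_clearance) for d in distances),
        default=0.0,
    )


def total_violation(distances_by_instance, required_clearance):
    """Aggregate over instances with a sum of the instance violations."""
    return sum(
        instance_violation(ds, required_clearance)
        for ds in distances_by_instance.values()
    )


# Sampled distances (in meters) from ego to two hypothetical pedestrians.
samples = {"ped_1": [3.0, 1.5, 2.5], "ped_2": [4.0, 2.0]}
score = total_violation(samples, required_clearance=2.0)
```

A score of zero then indicates satisfaction, and any positive score indicates violation, in line with the convention stated above.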
We divide the set of rules into two categories: (1) clearance rules, i.e., safety-relevant rules enforcing that ego maintains a minimal distance to other traffic participants and to the side of the road or lane; and (2) non-clearance rules, i.e., rules that are not contained in the first category, such as speed limit rules. In Sec. 5.2, we provide a general methodology to express clearance rules as inequalities involving differentiable functions, which will allow us to enforce their satisfaction using HOCBFs.
Remark 1 ().
The violation metrics from Def. 1 are inspired by Signal Temporal Logic (STL) robustness (Maler and Nickovic, 2004; Donzé and Maler, 2010; Mehdipour et al., 2019), which quantifies how a signal (trajectory) satisfies a temporal logic formula. In this paper, we focus on rules that we aim to satisfy for all times. Therefore, the rules in (10) can be seen as (particular) STL formulas, which all start with an "always" temporal operator (omitted here).
4.2. Priority Structure
The preorder rulebook in Def. 8 defines a “base” preorder that captures relative priorities of some (comparable) rules, which are often similar in different states and countries. A preorder rulebook can be made more precise for a specific legislation by adding rules and/or priority relations through priority refinement, rule aggregation and augmentation (Censi et al., 2019). This can be done through empirical studies or learning from local data to construct a total order rulebook. To order trajectories, authors of (Censi et al., 2019) enumerated all the total orders compatible with a given preorder. In this paper, motivated by the hierarchical control framework described in Sec. 5.3, we require that any two rules are in a relationship, in the sense that they are either equivalent or comparable with respect to their priorities.
Definition 0 (Priority Structure).
A priority structure is a tuple $\langle R, \sim_p, \leq_p \rangle$, where $R$ is a finite set of rules, $\sim_p$ is an equivalence relation over $R$, and $\leq_p$ is a total order over the set of equivalence classes determined by $\sim_p$.
Equivalent rules (i.e., rules in the same class) have the same priority. Given two equivalence classes $\mathcal{O}_1$ and $\mathcal{O}_2$ with $\mathcal{O}_1 \leq_p \mathcal{O}_2$, every rule $r_1 \in \mathcal{O}_1$ has lower priority than every rule $r_2 \in \mathcal{O}_2$. Since $\leq_p$ is a total order, any two rules $r_1, r_2 \in R$ are comparable, in the sense that exactly one of the following three statements is true: (1) $r_1$ and $r_2$ have the same priority, (2) $r_1$ has higher priority than $r_2$, and (3) $r_2$ has higher priority than $r_1$. Given a priority structure $\langle R, \sim_p, \leq_p \rangle$, we can assign numerical (integer) priorities to the rules. We assign priority 1 to the equivalence class with the lowest priority, priority 2 to the next one, and so on. The rules inside an equivalence class inherit the priority from their equivalence class. Given a priority structure $\langle R, \sim_p, \leq_p \rangle$ and violation scores for the rules in $R$, we can compare trajectories:
Definition 0 (Trajectory Comparison).
A trajectory $\mathcal{X}_1$ is said to be better (less violating) than another trajectory $\mathcal{X}_2$ if the highest priority rule(s) violated by $\mathcal{X}_1$ has a lower priority than the highest priority rule(s) violated by $\mathcal{X}_2$. If both trajectories violate an equivalent highest priority rule(s), then the one with the smaller (maximum) total violation score is better. In this case, if the trajectories have equal violation scores, then they are equivalent.
It is easy to see that, by following Def. 3, given two trajectories, one can be better than the other, or they can be equivalent (i.e., two trajectories cannot be incomparable).
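The comparison can be sketched as a small function. The sketch assumes that each trajectory is summarized by a dictionary of total violation scores per rule and that integer priorities are assigned as described above (larger means higher priority). The rule names and scores in the usage example are hypothetical.

```python
def compare_trajectories(scores_a, scores_b, priority):
    """Return -1 if trajectory A is better, 1 if B is better, 0 if equivalent.

    scores_*: dict mapping rule name -> total violation score (0 = satisfied)
    priority: dict mapping rule name -> integer priority (larger = higher)
    """
    def summary(scores):
        violated = [r for r, s in scores.items() if s > 0.0]
        if not violated:
            return (0, 0.0)  # nothing violated: below every real priority
        top = max(priority[r] for r in violated)
        worst = max(scores[r] for r in violated if priority[r] == top)
        return (top, worst)

    key_a, key_b = summary(scores_a), summary(scores_b)
    # Lexicographic order: first the highest violated priority,
    # then the maximum score among rules of that priority.
    return (key_a > key_b) - (key_a < key_b)


priority = {"r1": 3, "r2": 2, "r3": 2, "r4": 1}
traj_a = {"r1": 0.2, "r2": 0.0, "r3": 0.0, "r4": 0.1}
traj_b = {"r1": 0.0, "r2": 0.3, "r3": 0.1, "r4": 0.0}
traj_c = {"r1": 0.0, "r2": 0.4, "r3": 0.0, "r4": 0.0}
```

With these made-up scores, `traj_b` is better than `traj_a` (it violates only lower priority rules), and `traj_b` is better than `traj_c` (same highest violated priority, smaller maximum score).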
Example 0 ().
Consider the driving scenario from Fig. 2(a) and a priority structure in Fig. 2(b), where , and : “No collision”, : “Lane keeping”, : “Speed limit” and : “Comfort”. There are 3 equivalence classes given by , and . Rule has priority 1, and have priority 2, and has priority 3. Assume the instance (same as total, as there is only one instance for each rule) violation scores of rule by trajectories are given by as shown in Fig. 2(b). Based on Def. 3, trajectory is better (less violating) than trajectory since the highest priority rule violated by () has a lower priority than the highest priority rule violated by (). The same argument holds for trajectories and , i.e., is better than . The highest priority rules violated by trajectories and have the same priorities. Since the maximum violation score of the highest priority rules violated by is smaller than that for , i.e., , , trajectory is better than .
Definition 0 ().
Def. 5 is central to our solution to Problem 1 (see Sec. 5.3), which is based on an iterative relaxation of the rules according to their satisfaction of the priority structure.
5. Rule-Based Optimal Control
In this section, we present our approach to solve Problem 1.
5.1. Trajectory Tracking
As discussed in Sec. 2.1, Eqn. (1) can define "traditional" vehicle dynamics with respect to an inertial reference frame (Ames et al., 2017), or dynamics defined along a given reference trajectory (Rucco et al., 2015) (see (18)). The case study considered in this paper falls in the second category (the middle of ego's current lane is the default reference trajectory). We use the model from (Rucco et al., 2015), in which part of the state of (1) captures the tracking errors with respect to the reference trajectory. The tracking problem then becomes stabilizing the error states to 0. Suppose the error state vector is $y \in \mathbb{R}^{n_0}$, $n_0 \leq n$ (the components in $y$ are part of the components in $x$). We define a CLF $V(x) = \|y\|^2$ ($c_3 = \epsilon > 0$ in Def. 6). Any control $u$ that satisfies the relaxed CLF constraint (Ames et al., 2017) given by:
(11) $L_f V(x) + L_g V(x) u + \epsilon V(x) \leq \delta_e,$
exponentially stabilizes the error states to 0 if $\delta_e(t) = 0, \forall t \in [0, T]$, where $\delta_e$ is a relaxation variable that compromises between stabilization and feasibility. Note that the CLF constraint (11) only works for $V(x)$ with relative degree one. If the relative degree is larger than $1$, we can use input-to-state linearization and state feedback control (Khalil, 2002) to reduce the relative degree to one (Xiao et al., 2020a).
5.2. Clearance and Optimal Disk Coverage
Satisfaction of a priority structure can be enforced by formulating real-time constraints on ego state that appear in the violation metrics. Satisfaction of the non-clearance rules can be easily implemented using HOCBFs (see Sec. 5.3 and Appendix A). For clearance rules, we define a notion of clearance region around ego and around the traffic participants in $S_p$ that are involved in the rule (e.g., pedestrians and other vehicles). The clearance region for ego is defined as a rectangle with tunable speed-dependent lengths (i.e., we may choose to have a larger clearance from pedestrians when ego is driving with higher speeds) and defined based on ego footprint and functions $h_f(x), h_b(x), h_l(x), h_r(x)$ that determine the front, back, left, and right clearances as illustrated in Fig. 3, where $h_f, h_b, h_l, h_r: \mathbb{R}^n \to \mathbb{R}_{\geq 0}$. The clearance regions for participants (instances) are defined such that they comply with their geometry and cover their footprints, e.g., (fixed-length) rectangles for other vehicles and (fixed-radius) disks for pedestrians, as shown in Fig. 3.
To satisfy a clearance rule involving traffic participants, we need to avoid any overlaps between the clearance regions of ego and traffic participants. We define a function $d_{min}(x, x_i): \mathbb{R}^{n + n_i} \to \mathbb{R}$ to determine the signed distance between the clearance regions of ego and participant $i \in S_p$ ($x_i \in \mathbb{R}^{n_i}$ denotes the state of participant $i$), which is negative if the clearance regions overlap. Therefore, satisfaction of a clearance rule can be imposed by having a constraint on $d_{min}(x, x_i)$ to be non-negative. For the clearance rules "stay in lane" and "stay in drivable area", we require that ego clearance region be within the lane and the drivable area, respectively.
However, finding $d_{min}(x, x_i)$ can be computationally expensive. For example, the distance between two rectangles could be from corner to corner, corner to edge, or edge to edge. Since each rectangle has $4$ corners and $4$ edges, there are 64 possible cases. More importantly, this computation leads to a non-smooth function, which cannot be used to enforce clearance using a CBF approach. To address these issues, we propose an optimal coverage of the rectangles with disks, which allows us to map the satisfaction of the clearance rules to a set of smooth HOCBF constraints (i.e., there will be one constraint for each pair of centers of disks pertaining to different traffic participants).
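We use $l$ and $w$ to denote the length and width of ego's footprint, respectively. Assume we use $z \in \mathbb{N}$ disks with centers located on the center line of the clearance region to cover it (see Fig. 4). Since all the disks have the same radius, the minimum radius to fully cover ego's clearance region, denoted by $r > 0$, is given by:
(12) $r = \sqrt{\left(\frac{w + h_l(x) + h_r(x)}{2}\right)^2 + \left(\frac{l + h_f(x) + h_b(x)}{2z}\right)^2},$
where $h_f(x), h_b(x), h_l(x), h_r(x)$ are the front, back, left, and right clearances. The minimum radius $r_i$ of the rectangular clearance region for a traffic participant $i \in S_p$ with $z_i$ disks is defined in a similar way using the length and width of its footprint and setting $h_l, h_r, h_b, h_f = 0$.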
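The radius in (12) follows from splitting the clearance rectangle into equal segments along its length, each covered by one disk centered on the center line. The sketch below computes the radius and the disk centers for this equal-spacing layout. The function names are hypothetical, and the centers are expressed along the center line measured from the rear edge of the clearance rectangle, which is an assumption made for illustration.

```python
import math


def min_cover_radius(length, width, num_disks):
    """Smallest common radius so that num_disks equally spaced disks on the
    center line cover a length x width rectangle."""
    return math.hypot(width / 2.0, length / (2.0 * num_disks))


def disk_centers(length, num_disks):
    """Longitudinal positions of the disk centers, measured from the rear edge
    of the rectangle along its center line."""
    segment = length / num_disks
    return [(j + 0.5) * segment for j in range(num_disks)]


def ego_clearance_radius(l, w, h_f, h_b, h_l, h_r, z):
    """Radius for ego: footprint (l, w) enlarged by the four clearances."""
    return min_cover_radius(l + h_f + h_b, w + h_l + h_r, z)


# Hypothetical numbers: a 4 m x 2 m footprint, no extra clearance, 2 disks.
r = ego_clearance_radius(4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2)
centers = disk_centers(4.0, 2)
```

Increasing the number of disks shrinks the radius toward half the rectangle width, which reduces the lateral over-approximation at the cost of more pairwise constraints.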
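Assume the center of the disk $j \in \{1, \dots, z\}$ for ego, and the center of the disk $k \in \{1, \dots, z_i\}$ for the instance $i \in S_p$ are given by $(x_{e,j}, y_{e,j}) \in \mathbb{R}^2$ and $(x_{i,k}, y_{i,k}) \in \mathbb{R}^2$, respectively (see Appendix B). To avoid any overlap between the corresponding disks of ego and the instance $i$, we impose the following constraints:
(13) $\sqrt{(x_{e,j} - x_{i,k})^2 + (y_{e,j} - y_{i,k})^2} \geq r + r_i, \quad \forall j \in \{1, \dots, z\}, \ \forall k \in \{1, \dots, z_i\},$
where $r$ and $r_i$ are the disk radii of ego and of instance $i$. Since disks fully cover the clearance regions, enforcing (13) also guarantees that the signed distance between the clearance regions satisfies $d_{min}(x, x_i) \geq 0$. For the clearance rules "stay in lane" and "stay in drivable area", we can get similar constraints as (13) to make the disks that cover ego's clearance region stay within them (e.g., we can consider $h_l, h_r = 0$ and formulate (13) such that the distance between ego disk centers and the line in the middle of ego's current lane is less than $\frac{w_l}{2} - r$, where $w_l$ denotes the lane width). Thus, we can formulate satisfaction of all the clearance rules as continuously differentiable constraints (13), and implement them using HOCBFs.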
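The pairwise disk constraints can be checked numerically with a short sketch. It only evaluates the constraint margin at one time instant for given disk centers and radii (all values hypothetical); it does not implement the HOCBF that enforces the constraint in the controller.

```python
import math


def clearance_margin(ego_centers, ego_radius, other_centers, other_radius):
    """Smallest value of (center distance - sum of radii) over all disk pairs.

    A non-negative margin means that no ego disk overlaps any disk of the
    other participant, so the covered clearance regions do not overlap.
    """
    return min(
        math.hypot(xe - xo, ye - yo) - (ego_radius + other_radius)
        for xe, ye in ego_centers
        for xo, yo in other_centers
    )


# Hypothetical layout: two ego disks and one pedestrian disk, radius 1 m each.
ego = [(0.0, 0.0), (2.0, 0.0)]
margin_far = clearance_margin(ego, 1.0, [(5.0, 0.0)], 1.0)
margin_near = clearance_margin(ego, 1.0, [(3.0, 0.0)], 1.0)
```

Each disk pair contributes one smooth constraint, so the number of constraints per participant is the product of the two disk counts.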
To efficiently formulate the proposed optimal disk coverage approach, we need to find the minimum number of disks that fully cover the clearance regions, as it determines the number of constraints in (13). Moreover, we need to minimize the lateral approximation error, since large errors imply overly conservative constraints (see Fig. 4). This can be formally defined as an optimization problem, and solved offline to determine the numbers and radii of the disks in (13) (the details are provided in Appendix B).
5.3. Optimal Control
In this section, we present our complete framework to solve Problem 1. We propose a recursive algorithm to iteratively relax the satisfaction of the rules in the priority structure (if needed) based on the total order over the equivalence classes.
Let $O$ be the set of equivalence classes in $\langle R, \sim_p, \leq_p \rangle$, and $N_O$ be the cardinality of $O$. We construct the power set of equivalence classes denoted by $S := 2^O$, and incrementally (from low to high priority) sort the sets in $S$ based on the highest priority of the equivalence classes in each set according to the total order and denote the sorted set by $S_{sorted} = \{S_1, S_2, \dots, S_{2^{N_O}}\}$, where $S_1 = \{\emptyset\}$. We use this sorted set in our optimal control formulation to obtain satisfaction of the higher priority classes, even at the cost of relaxing satisfaction of the lower priority classes. Therefore, from Def. 5, the solution of the optimal control will satisfy the priority structure.
Example 0 ().
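Reconsider Exm. 4. We define $O = \{\mathcal{O}_1, \mathcal{O}_2, \mathcal{O}_3\}$. Based on the given total order $\mathcal{O}_1 \leq_p \mathcal{O}_2 \leq_p \mathcal{O}_3$, we can write the sorted power set as $S_{sorted} = \{\{\emptyset\}, \{\mathcal{O}_1\}, \{\mathcal{O}_2\}, \{\mathcal{O}_1, \mathcal{O}_2\}, \{\mathcal{O}_3\}, \{\mathcal{O}_1, \mathcal{O}_3\}, \{\mathcal{O}_2, \mathcal{O}_3\}, \{\mathcal{O}_1, \mathcal{O}_2, \mathcal{O}_3\}\}$.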
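The construction of the sorted power set can be sketched as follows. Equivalence classes are represented by their integer priorities. The primary sort key (highest priority contained in each subset) follows the description above; the tie-breaking by subset size and then by content is an assumption made here to obtain a deterministic order.

```python
from itertools import combinations


def sorted_power_set(priorities):
    """Power set of equivalence classes, sorted from low to high by the highest
    priority class contained in each subset.

    priorities: iterable of distinct integer class priorities (1 = lowest).
    Ties are broken by subset size and then by content (an assumption).
    """
    classes = sorted(priorities)
    subsets = [
        frozenset(combo)
        for size in range(len(classes) + 1)
        for combo in combinations(classes, size)
    ]

    def key(subset):
        # The empty set (nothing relaxed) gets key 0 and therefore comes first.
        return (max(subset, default=0), len(subset), sorted(subset))

    return sorted(subsets, key=key)


order = [sorted(s) for s in sorted_power_set([1, 2, 3])]
```

For three classes, the first entry is the empty set (no rule relaxed) and the last entry contains all classes (every rule relaxed).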
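In order to find a trajectory that satisfies a given priority structure, we first assume that all the rules are satisfied. Starting from $S_1 = \{\emptyset\}$ in the sorted set $S_{sorted}$, we solve Problem 1 given that no rules are relaxed, i.e., all the rules must be satisfied. If the problem is infeasible, we move to the next set $S_2 \in S_{sorted}$, and relax all the rules of all the equivalence classes in $S_2$ while enforcing satisfaction of all the other rules in the equivalence class set denoted by $O \setminus S_2$. This procedure is done recursively until we find a feasible solution of Problem 1. Formally, at $k = 1, 2, \dots, 2^{N_O}$ for $S_k \in S_{sorted}$, we relax all the rules $i \in \mathcal{O}$ for all the equivalence classes $\mathcal{O} \in S_k$ and reformulate Problem 1 as the following optimal control problem:
(14) $\min_{u, \delta_e, \delta_i, i \in \mathcal{O}, \mathcal{O} \in S_k} \int_0^T \left[ J(\|u\|) + p_e \delta_e^2 + \sum_{i \in \mathcal{O}, \mathcal{O} \in S_k} p_i \delta_i^2 \right] dt$
subject to:
dynamics (1), control bounds (2), CLF constraint (11),
(15) $L_f^{m_i} b_i(x) + L_g L_f^{m_i - 1} b_i(x) u + S(b_i(x)) + \alpha_{m_i}(\psi_{m_i - 1}(x)) \geq \delta_i, \quad \forall i \in \mathcal{O}, \ \forall \mathcal{O} \in S_k,$
(16) $L_f^{m_j} b_j(x) + L_g L_f^{m_j - 1} b_j(x) u + S(b_j(x)) + \alpha_{m_j}(\psi_{m_j - 1}(x)) \geq 0, \quad \forall j \in \mathcal{O}, \ \forall \mathcal{O} \in O \setminus S_k,$
(17) $L_f^{m_l} b_{lim,l}(x) + L_g L_f^{m_l - 1} b_{lim,l}(x) u + S(b_{lim,l}(x)) + \alpha_{m_l}(\psi_{m_l - 1}(x)) \geq 0, \quad \forall l \in \{1, \dots, 2n\},$
where $p_e > 0$ and $p_i > 0$, $i \in \mathcal{O}$, $\mathcal{O} \in S_k$, assign the trade-off between the CLF relaxation $\delta_e$ (used for trajectory tracking) and the HOCBF relaxations $\delta_i$. $m_i, m_j, m_l$ denote the relative degrees of $b_i(x), b_j(x), b_{lim,l}(x)$, respectively. The functions $b_i(x)$ and $b_j(x)$ are HOCBFs for the rules in $\langle R, \sim_p, \leq_p \rangle$, and are implemented directly from the rule statement for non-clearance rules or by using the optimal disk coverage framework for clearance rules. At relaxation step $k$, HOCBFs corresponding to the rules in $\mathcal{O}$, $\forall \mathcal{O} \in S_k$, are relaxed by adding $\delta_i$ in (15), while for other rules in $R$ in (16) and the state constraints (17), regular HOCBFs are used. We assign $p_i$ according to their relative priorities, i.e., we choose a larger $p_i$ for the rule $i$ that belongs to a higher priority class. The functions $b_{lim,l}(x)$, $l \in \{1, \dots, 2n\}$, are HOCBFs for the state limitations (8). The functions $\psi_{m_i}(x), \psi_{m_j}(x), \psi_{m_l}(x)$ are defined as in (3). $\alpha_{m_i}, \alpha_{m_j}, \alpha_{m_l}$ can be penalized to improve the feasibility of the problem above (Xiao and Belta, 2019; Xiao et al., 2020b).
If the above optimization problem is feasible for all $t \in [0, T]$, we can specifically determine which rules (within an equivalence class) are relaxed based on the values of $\delta_i$, $i \in \mathcal{O}$, $\mathcal{O} \in S_k$, in the optimal solution (i.e., if $\delta_i(t) = 0$, $\forall t \in [0, T]$, then rule $i$ does not need to be relaxed). This procedure is summarized in Alg. 1.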
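The outer loop of the relaxation procedure can be sketched independently of the optimal control solver. In the sketch below, `solve_with_relaxed` is a hypothetical callback that stands in for solving the sequence of QPs with a given set of relaxed classes and returns `None` when some QP is infeasible. This is an illustration of the iteration order only, not a reproduction of Alg. 1.

```python
def relax_until_feasible(sorted_sets, solve_with_relaxed):
    """Try each set of relaxed equivalence classes in sorted order and return
    the first (relaxed_set, solution) pair that is feasible, or None."""
    for relaxed in sorted_sets:
        solution = solve_with_relaxed(relaxed)
        if solution is not None:
            return relaxed, solution
    return None


def stub_solver(relaxed):
    """Toy stand-in for the QP sequence: feasible only if class 2 is relaxed."""
    if 2 in relaxed:
        return {"relaxed_classes": sorted(relaxed)}
    return None


# Sorted power set for three classes with priorities 1 < 2 < 3.
sets_in_order = [frozenset(s) for s in
                 ([], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3])]
result = relax_until_feasible(sets_in_order, stub_solver)
```

With this stub, the loop stops at the set containing only class 2, so the higher priority class 3 is never relaxed.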
Remark 2 (Complexity).
The optimization problem (14) is solved using QPs introduced in Sec. 2. The complexity of the QP is , where is the dimension of decision variables. It usually takes less than to solve each QP in Matlab. The total time for each iteration depends on the final time and the length of the reference trajectory . The computation time can be further improved by running the code in parallel over multiple processors.
5.4. Pass/Fail Evaluation
As an extension to Problem 1, we formulate and solve a pass/fail (P/F) procedure, in which we are given a vehicle trajectory, and the goal is to accept (pass, P) or reject (fail, F) it based on the satisfaction of the rules. Specifically, given a candidate trajectory $\mathcal{X}_c$ of system (1), and given a priority structure $\langle R, \sim_p, \leq_p \rangle$, we pass (P) $\mathcal{X}_c$ if we cannot find a better trajectory according to Def. 3. Otherwise, we fail (F) $\mathcal{X}_c$. We proceed as follows: We find the total violation scores of the rules in $\langle R, \sim_p, \leq_p \rangle$ for the candidate trajectory $\mathcal{X}_c$. If no rules in $R$ are violated, then we pass the candidate trajectory. Otherwise, we investigate the existence of a better (less violating) trajectory. We take the middle of ego's current lane as the reference trajectory and reformulate the optimal control problem in (14) to recursively relax rules such that, if the optimization is feasible, the generated trajectory is better than the candidate trajectory $\mathcal{X}_c$. Specifically, assume that the highest priority rule(s) that the candidate trajectory violates belongs to $\mathcal{O}_H$, $H \in \mathbb{N}$. Let $O_H \subseteq O$ denote the set of equivalence classes with priorities not larger than $H$, and $N_{O_H}$ denote the cardinality of $O_H$. We construct a power set $S_H := 2^{O_H}$, and then apply Alg. 1, in which we replace $S$ by $S_H$.
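The decision logic of the pass/fail procedure can be sketched as below. The function `find_better_trajectory` is a hypothetical callback that stands in for the reformulated optimal control problem restricted to the relaxable classes; it is assumed to return `None` when no better trajectory is found. Rule names, priorities, and scores are illustrative.

```python
def pass_fail(candidate_scores, priority, find_better_trajectory):
    """Return "P" (pass) or "F" (fail) for a candidate trajectory.

    candidate_scores: dict rule -> total violation score of the candidate
    priority:         dict rule -> integer priority (larger = higher)
    find_better_trajectory(relaxable, candidate_scores): hypothetical search
        over classes whose priority does not exceed the highest violated one.
    """
    violated = [r for r, s in candidate_scores.items() if s > 0.0]
    if not violated:
        return "P"  # nothing is violated, so no better trajectory can exist
    top = max(priority[r] for r in violated)
    # Only classes with priority not larger than the highest violated priority
    # may be relaxed when searching for a better trajectory.
    relaxable = sorted({p for p in priority.values() if p <= top})
    better = find_better_trajectory(relaxable, candidate_scores)
    return "F" if better is not None else "P"


priority = {"r1": 3, "r2": 2, "r3": 2, "r4": 1}
candidate = {"r1": 0.0, "r2": 0.4, "r3": 0.0, "r4": 0.2}
seen = []


def stub_search(relaxable, scores):
    seen.append(relaxable)
    return {"r2": 0.1}  # pretend a less violating trajectory was found


verdict = pass_fail(candidate, priority, stub_search)
```

In this toy run the candidate violates a priority 2 rule, so only classes 1 and 2 are offered for relaxation, and the candidate fails because the stub reports a better trajectory.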
Remark 3.
The procedure described above would fail a candidate trajectory even if only a slightly better alternate trajectory (i.e., violating rules of the same highest priority but with slightly smaller violation scores) can be found by solving the optimal control problem. In practice, this might lead to an undesirably high failure rate. One way to deal with this, which we will consider in future work (see Sec. 7), is to allow for more classification categories, e.g., “Provisional Pass” (PP), which can then trigger further investigation of .
Example 0.
Reconsider Exm. 4 and assume trajectory is a candidate trajectory which violates rules , thus, the highest priority rule that is violated by trajectory belongs to . We construct . The power set is then defined as , and is sorted based on the total order as .
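The construction of a sorted power set of equivalence classes can be sketched as follows. The specific total order used here is an assumption made for illustration: subsets are compared by their member priorities listed from highest to lowest, so every subset whose highest member has a lower priority comes first. Equivalence classes are represented by hypothetical integer priority levels, and `sorted_power_set` is a hypothetical name.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def sorted_power_set(priorities: Iterable[int]) -> List[FrozenSet[int]]:
    """Return all subsets of the given priority levels in a total order.

    Assumed order: compare subsets by their members listed from highest to
    lowest priority; the empty set (relax nothing) comes first.
    """
    levels = sorted(set(priorities))
    subsets = [
        frozenset(combo)
        for size in range(len(levels) + 1)
        for combo in combinations(levels, size)
    ]
    return sorted(subsets, key=lambda s: tuple(sorted(s, reverse=True)))


for subset in sorted_power_set([1, 2, 3]):
    print(sorted(subset))
```

For three classes this enumerates eight subsets, starting with the empty set and ending with the set that relaxes all three classes.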
6. Case Study
In this section, we apply the methodology developed in this paper to specific vehicle dynamics and various driving scenarios. Ego dynamics (1) are defined with respect to a reference trajectory (Rucco et al., 2015), which measures the along-trajectory distance and the lateral distance of the vehicle Center of Gravity (CoG) with respect to the closest point on the reference trajectory as follows:
(18) 
where is the vehicle local heading error determined by the difference of the global vehicle heading in (33) and the tangent angle of the closest point on the reference trajectory (i.e., ); , denote the vehicle linear speed and acceleration; , denote the steering angle and steering rate, respectively; is the curvature of the reference trajectory at the closest point; is the length of the vehicle from the tail to the CoG; and , denote the two control inputs for jerk and steering acceleration as shown in Fig. 5. where is the length of the vehicle from the head to the CoG.
We consider the cost function in (14) as:
(19) 
The reference trajectory is the middle of ego’s current lane, and is assumed to be given as an ordered sequence of points , , , , where ( denotes the number of points). We can find the reference point , at time as:
(20) 
where denotes ego’s location. , and for a is chosen such that . Once we get , we can update the progress , the error states and the curvature in (18). The trajectory tracking in this case is to stabilize the error states ( in (11)) to 0, as introduced in Sec. 5.1. We also wish ego to achieve a desired speed (otherwise, ego may stop in curved lanes). We achieve this by redefining the CLF in (11) as . As the relative degree of w.r.t. (18) is larger than 1, as mentioned in Sec. 5.1, we use input-to-state linearization and state feedback control (Khalil, 2002) to reduce the relative degree to one (Xiao et al., 2020a). For example, for the desired speed part in the CLF ( (18) is in linear form from to , so no linearization is needed), we can find a desired state feedback acceleration . Then we can define a new CLF in the form whose relative degree is just one w.r.t. in (18). We proceed similarly for driving to 0 in the CLF as the relative degrees of are also larger than one.
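The lookup of the reference point on an ordered sequence of points can be illustrated with a short sketch. It assumes a plain nearest-point search over a forward window of indices starting at the previously found index. The function name `closest_reference_index`, the `window` parameter, and the sample points are hypothetical, and the sketch is not a transcription of (20).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def closest_reference_index(
    points: Sequence[Point], ego_xy: Point, start: int, window: int
) -> int:
    """Index of the reference point nearest to ego within a forward window.

    The search covers indices start, ..., start + window, clipped to the
    valid range, so the reference point never moves backwards.
    """
    if not points:
        raise ValueError("reference trajectory must contain at least one point")
    lo = min(max(start, 0), len(points) - 1)
    hi = min(lo + window + 1, len(points))
    return min(range(lo, hi), key=lambda i: math.dist(points[i], ego_xy))


# Hypothetical straight reference lane sampled every metre.
lane: List[Point] = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
idx = closest_reference_index(lane, ego_xy=(1.9, 0.5), start=0, window=3)
print(idx, lane[idx])
```

Restricting the search to a window keeps the cost of each lookup independent of the length of the reference trajectory, at the price of assuming ego does not skip more than `window` points between updates.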
We consider the priority structure from Fig. 6, with rules , where is a pedestrian clearance rule; and are clearance rules for staying in the drivable area and lane, respectively; and are non-clearance rules specifying maximum and minimum speed limits, respectively; is a comfort non-clearance rule; and and are clearance rules for parked and moving vehicles, respectively. The formal rule definitions (statements, violation metrics) are given in Appendix A. These metrics are used to compute the scores for all the trajectories in the three scenarios below. The optimal disk coverage from Sec. 5.2 is used to compute the optimal controls for all the clearance rules, which are implemented using HOCBFs.
In the following, we consider three common driving scenarios in our tool (see Appendix C). For each of them, we solve the optimal control Problem 1 and perform pass/fail evaluation. In all three scenarios, in the pass/fail evaluation, an initial candidate trajectory is drawn “by hand” using the tool described in the Appendix. We use CLFs to generate a feasible trajectory which tracks the candidate trajectory subject to the vehicle dynamics (1), control bounds (2) and state constraints (8).
6.1. Scenario 1
Assume there is an active vehicle, a parked (inactive) vehicle and a pedestrian, as shown in Fig. 7.
Optimal control: We solve the optimal control problem (14) by starting the rule relaxation from (i.e., without relaxing any rules). This problem is infeasible in the given scenario since ego cannot maintain the required distance from both the active and the parked vehicles, as the clearance rules are speed-dependent. Therefore, we relax the next lowest priority equivalence class set in , i.e., the minimum speed limit rule in , for which we are able to find a feasible trajectory as illustrated in Fig. 7. By checking for from (14), we find that it is positive in some time intervals in , and thus, is indeed relaxed. The total violation score for rule from (26) for the generated trajectory is 0.539, and all other rules in are satisfied. Thus, by Def. 5, the generated trajectory satisfies in Fig. 6.
Pass/Fail: The candidate trajectory is shown in Fig. 8. This candidate trajectory only violates rule with total violation score 0.682. Following Sec. 5.4, we can either relax or relax no rules at all to find a possibly better trajectory. As shown in the above optimal control problem for this scenario, we cannot find a feasible solution if we do not relax rule . Since the violation score of the candidate trajectory (0.682) is larger than that of the optimal one (0.539), we fail this candidate trajectory.
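The pass/fail decision for this scenario can be reproduced with a simplified comparison. The sketch below is not Def. 3: it assumes one aggregated violation score per priority level and compares two trajectories lexicographically from the highest priority level downwards. The names `is_better` and `pass_fail` and the integer priority levels are hypothetical. Only the scores 0.682 and 0.539 are taken from the text.

```python
from typing import Dict, Optional

# Hypothetical representation: priority level (larger = more important)
# mapped to a total violation score; missing levels count as 0.
Scores = Dict[int, float]


def is_better(new: Scores, candidate: Scores) -> bool:
    """Simplified stand-in for Def. 3: lexicographic comparison by priority."""
    for level in sorted(set(new) | set(candidate), reverse=True):
        new_score = new.get(level, 0.0)
        cand_score = candidate.get(level, 0.0)
        if new_score < cand_score:
            return True
        if new_score > cand_score:
            return False
    return False  # identical scores: not strictly better


def pass_fail(candidate: Scores, best_found: Optional[Scores]) -> str:
    """Fail ("F") if a strictly better trajectory was found, else pass ("P")."""
    if best_found is not None and is_better(best_found, candidate):
        return "F"
    return "P"


# Scenario 1: both trajectories violate only the minimum speed rule, which is
# placed at a hypothetical lowest priority level 1 here.
verdict = pass_fail(candidate={1: 0.682}, best_found={1: 0.539})
print(verdict)
```

Under the same simplified comparison, a trajectory that avoids a violation at a higher priority level is judged better even if it has a larger score at a lower level.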
6.2. Scenario 2
Assume there is an active vehicle, two parked (inactive) vehicles and two pedestrians, as shown in Fig. 9.
Optimal control: Similar to Scenario 1, the optimal control problem (14) starting from (without relaxing any rules in ) is infeasible. We relax the next lowest priority rule set in , i.e., the minimum speed rule in , for which we are able to find a feasible trajectory as illustrated in Fig. 9. Again, the for is positive in some time intervals in , and thus, is indeed relaxed. The total violation score of the rule for the generated trajectory is 0.646, and all the other rules in are satisfied.
Pass/Fail: The candidate trajectory shown as the red-dashed curve in Fig. 10 violates rules and with total violation scores 0.01, 0.23, 0.22 found from (22), (24), (29), respectively. In this scenario, we know that ego can change lanes (where the lane keeping rule is in a lower priority equivalence class than ) to get a reasonable trajectory. Thus, we show the case of relaxing the rules in the equivalence classes and to find a feasible trajectory that is better than the candidate one. The optimal control problem (14) generates a trajectory as the red-solid curve shown in Fig. 10, and only the for is 0 for all . Thus, does not need to be relaxed. The generated trajectory violates rules and with total violation scores 0.124 and 0.111, respectively, but satisfies all the other rules including the highest priority rule . According to Def. 3 for the given in Fig. 6, the new generated trajectory is better than the candidate one; thus, we fail the candidate trajectory. Note that although this trajectory violates the lane keeping rule, it has a smaller violation score for compared to the trajectory obtained from the optimal control in Fig. 9 (0.111 vs. 0.646), i.e., the average speed of ego in the red-solid trajectory in Fig. 10 is larger.
6.3. Scenario 3
Assume there is an active vehicle, a parked vehicle and two pedestrians (one of whom has just exited the parked vehicle), as shown in Fig. 11.
Optimal control: Similar to Scenario 1, the optimal control problem (14) starting from (without relaxing any rules in ) is infeasible. We relax the lowest priority rule set in , i.e., the minimum speed rule , and solve the optimal control problem. In the (feasible) generated trajectory, ego stops before the parked vehicle, which satisfies all the rules in except . Thus, by Def. 5, the generated trajectory satisfies the priority structure . However, this might not be a desirable behavior; thus, we further relax the lane keeping and comfort rules and find the feasible trajectory shown in Fig. 11. for is 0 for all , and, therefore, does not need to be relaxed. The total violation scores for the rules and are 0.058 and 0.359, respectively, and all other rules in are satisfied.
Pass/Fail: The candidate trajectory shown as the red-dashed curve in Fig. 12 violates rules and with total violation scores 0.025 and 0.01, respectively. In this scenario, from the optimal control in Fig. 11 we know that ego can change lanes (where the lane keeping rule is in a lower priority equivalence class than ). We show the case of relaxing the rules in the equivalence classes and (all have lower priorities than ). The optimal control problem (14) generates the red-solid curve shown in Fig. 12. By checking for , we found that is indeed not relaxed. The generated trajectory violates rules and with total violation scores 0.028 and 0.742, respectively, but satisfies all other rules including . According to Def. 3 and Fig. 6, the new generated trajectory is better than the candidate one: although it violates more than the candidate trajectory does, it does not violate , which has a higher priority. Thus, we fail the candidate trajectory.
7. Conclusions and Future Work
We developed a framework to design optimal control strategies for autonomous vehicles that are required to satisfy a set of traffic rules with a given priority structure, while following a reference trajectory and satisfying control and state limitations. We showed that, for commonly used traffic rules, by using control barrier functions and control Lyapunov functions, the problem can be cast as an iteration of optimal control problems, where each iteration involves a sequence of quadratic programs. We also showed that the proposed algorithms can be used to pass / fail possible autonomous vehicle behaviors against prioritized traffic rules. We presented multiple case studies for an autonomous vehicle with realistic dynamics and conflicting rules. Future work will be focused on learning priority structures from data, improving the feasibility of the control problems, and refinement of the pass / fail procedure.
Appendix
Appendix A Rule definitions
Here we give definitions for the rules used in Sec. 6. According to Def. 1, each rule statement should be satisfied for all times.
(22)  Maintain clearance with pedestrians  
where denotes the distance between footprints of ego and the pedestrian , and the clearance threshold is given based on a fixed distance and increases linearly by based on ego speed ( and are determined empirically), denotes the index set of all pedestrians, and