Planning under non-rational perception of uncertain spatial costs

Abstract

This work investigates the design of motion-planning strategies that can incorporate non-rational perception of risks associated with uncertain spatial costs. Our proposed method employs the Cumulative Prospect Theory (CPT) model to generate a perceived risk function across a given environment, and it scales to high-dimensional spaces. Using this, CPT-like perceived risks and path-length metrics are combined to define a cost function that complies with the requirements for asymptotic optimality of sampling-based motion planners (RRT*). The modeling capabilities of CPT are demonstrated in simulation by producing a rich set of meaningful paths, capturing a range of different risk perceptions in a custom environment. Furthermore, using a simultaneous perturbation stochastic approximation (SPSA) method, we investigate the capacity of these CPT-based risk-perception planners to approximate arbitrary paths drawn in the environment. We compare this adaptability with that of Conditional Value at Risk (CVaR), another popular risk perception model. Our simulations show that CPT is richer and able to capture a larger class of paths than CVaR and expected risk in our setting.

I Introduction

Motivation

Autonomous robots, ranging from industrial manipulators to robotic swarms [1, 2, 3], are becoming less isolated and increasingly interactive in today’s world. Arguably, most environments where these robots operate have an associated degree of spatial cost, which can lead to loss of or damage to the robots. For example, by moving onto an oily surface, a robot may collide with a nearby obstacle, resulting in a (mild or hard) crash. In more complex scenarios, a decision maker (DM) may be directly involved with the motion of an autonomous system, such as in robotic surgery, search and rescue operations, or autonomous car driving. The risk perceived from these costs or losses could vary among different DMs. In such cases, the decision making of an autonomous system with respect to perceived losses may influence the confidence that a DM has in the robot. This motivates us to consider richer models that incorporate non-rational perception of these spatial costs into planning algorithms for robotic systems. We aim to incorporate one such model, known as Cumulative Prospect Theory (CPT) [4], into path planning, and to compare the qualitative behavior of its output paths with those obtained from other risk perception models.

(a) Motion plan with expected risk
(b) Motion plan with CPT risks
Fig. 1: Environment perception and sampling-based motion planning using a) Rational environment perception using expected risk, b) DM’s Risk-Averse environment perception with the chosen path in white.

Related Work

Traditional risk-aware path planning considers risk explicitly in forms such as motion and state uncertainty [5], collision time [6], or sensing uncertainty [7]. However, how these risks are perceived or relatively weighted has been an overlooked topic. A few recent works [8] contemplate risk perception models, but assume rational DMs and use coherent risk measures like Conditional Value at Risk (CVaR) [9]. In contrast to what CPT suggests, these risk measures are built on axioms that assume rationality and linearity of the DM’s risk perception [10]. CPT has been used extensively in engineering applications such as traffic routing [11], network protection [12], stochastic optimization [13], and safe shipping policies [14] to model non-rational decision making. However, these notions are yet to be applied in the context of robotic planning and control.

Regarding planning algorithms themselves, RRT* [15] has been the basis for many motion planners owing to its asymptotic optimality properties and its ability to solve complex problems [16]. Risk [17] and uncertainty [18] have been an ingredient of motion planning problems involving a human, but have been mainly modeled in a probabilistic manner [19] with discrete obstacles. CC-RRT* [20] handles both agent and environment uncertainty in a robust manner, but uncertainty or risk perception models are not used and discrete polyhedral obstacles are considered which cannot incorporate continuous spatial costs. Very few of these works have considered modeling planning environments via continuous cost maps [21, 22], while, to the best of our knowledge, the simultaneous treatment of cost and uncertainty perception to model a DM’s spatial risk profile has largely been ignored.

Contributions

Our contributions lie in three main areas. First, we adapt CPT into path planning to model non-rational perception of the spatial cost embedded in an environment. With this, we can capture a larger variety of risk perception models, extending the existing literature. Second, we generate desirable paths using a sampling-based (RRT*-based) planning algorithm on the perceived risky environment. Instead of using path length alone, our planner combines a continuous risk profile with path length to calculate the costs, enabling us to plan in the perceived environment setting above. Furthermore, we prove that our chosen cost satisfies the sufficient conditions for asymptotic optimality of the planner, leading to reliable and consistent paths according to a specified risk profile in the perceived environment. Finally, using SPSA, we measure the adaptability of risk perception models in the planning algorithms to express a given arbitrary path in an environment. This can be a useful first step towards a larger learning scheme for capturing human path planning using frameworks like inverse reinforcement learning (which is currently out of scope for this work). We clarify that in this work we examine CPT-based environment perception models for planning, and leave the validation of CPT-based models with human user studies for future work.

Figure 1 shows a preview of how a DM’s nonlinear perception of the environment influences the path produced to reach a goal. While Figure 1(a) shows a rational perception of the environment using expected risk, Figure 1(b) illustrates a nonlinearly deformed and scaled surface that reflects the perception of a certain DM using CPT. In particular, as discussed later, we observe a richer path behavior of CPT-based planners as compared with others.

II Preliminaries

In this section, we describe some basic notations used in the paper along with a concise description of Cumulative Prospect Theory. More details about CPT can be found in [23].

Notation

We let $\mathbb{R}$ denote the space of real numbers, $\mathbb{Z}_{>0}$ the space of positive integers, and $\mathbb{R}_{\ge 0}$ the space of nonnegative real numbers. Also, $\mathbb{R}^n$ and $\mathcal{C}$ denote the $n$-dimensional real vector space and the configuration space used for planning. We use $\|\cdot\|$ for the Euclidean norm and $f \circ g$ for the composition of two functions $f$ and $g$, that is, $(f \circ g)(x) = f(g(x))$. We model a tree by a directed graph $G = (V, E)$, where $V$ denotes the set of sampled points (vertices of the graph) and $E \subseteq V \times V$ denotes the set of edges of the graph.

Cost and uncertainty weighting using CPT

CPT is a non-rational decision making model which incorporates non-linear perception of uncertain costs. Traditionally it has been used in scenarios of monetary outcomes such as lotteries [4] and the stock market [23].

Let us suppose a DM is presented with a set of prospects , representing potential outcomes and their probabilities, . More precisely, there are possible outcomes associated with a prospect , given by , for , which can happen with a probability . The outcomes are arranged in a decreasing order denoted by with their corresponding probabilities, which satisfy . The outcomes of prospect may be interpreted as the random cost1 of choosing prospect .

We define a utility function $v$, modeling a DM’s perceived cost, and a probability weighting function $w$, which represents the DM’s perceived uncertainty. While the previous literature has used various forms for these functions, here we focus on a CPT utility function of the form

$$v(\rho) = \lambda \rho^{\gamma}, \qquad (1)$$

where $0 < \gamma < 1$ and $\lambda > 1$. Tversky and Kahneman [4] suggest the use of $\gamma = 0.88$ and $\lambda = 2.25$ to parametrize an average human in the scenario of monetary lotteries; however, this may not hold for our application scenario. The parameter $\lambda$ represents the coefficient of cost aversion, with greater values implying stronger aversion, indicative of higher perceived costs, as shown in Figure 2(a). The parameter $\gamma$ represents the coefficient of cost sensitivity, with lower values implying greater indifference towards cost, as shown in Figure 2(b).

(a) Change in risk aversion with $\lambda$
(b) Change in risk sensitivity with $\gamma$
(c) Change in $w$ with $\alpha$
(d) Change in $w$ with $\beta$
Fig. 2: Variation of risk aversion, risk sensitivity and uncertainty perception using CPT. (a)-(b) show risk perception, with the x-axis indicating the associated risk, $\rho$, and the y-axis showing the perceived risk, $v(\rho)$. The dotted line indicates the line $v(\rho) = \rho$. (c)-(d) show uncertainty perception, with the x-axis indicating probabilities $p$ and the y-axis showing their perception $w(p)$, with the dotted line depicting $w(p) = p$.

We use Prelec’s popular probability weighting function [23, 24], indicative of perceived uncertainty, which takes the form

$$w(p) = e^{-\beta(-\log p)^{\alpha}}, \quad \alpha > 0,\ \beta > 0. \qquad (2)$$

Figures 2(c) and 2(d) show the changing uncertainty perception resulting from varying $\alpha$ and $\beta$, respectively. By choosing low $\alpha$ and $\beta$ values, one obtains “uncertainty averse” behavior, with $w(p) > p$ implying that unlikely outcomes are perceived to be more certain, as seen in Figures 2(c) and 2(d). On the other hand, when $w(p) < p$, “uncertainty insensitive” behavior is obtained, implying that the DM only considers more certain outcomes, which can be observed with high $\alpha$ and $\beta$ values.

These concepts illustrate the nonlinear perception of cost and uncertainty. A DM under consideration can be categorized by the parameters $\Theta = \{\alpha, \beta, \gamma, \lambda\}$. Using the nonlinear parametric perception functions $v$ and $w$, CPT calculates a value function indicating the perceived risk value of a prospect. This calculation is detailed in Section IV using our planning setting.
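As a minimal sketch of the two perception functions, assuming the power utility v(ρ) = λρ^γ for nonnegative costs and Prelec's weighting w(p) = exp(−β(−ln p)^α), the code below evaluates both. The function and argument names are illustrative assumptions, not identifiers from the paper.

```python
import math


def cpt_utility(cost: float, lam: float, gamma: float) -> float:
    """Perceived cost v(rho) = lam * rho**gamma (assumed form, rho >= 0)."""
    if cost < 0.0:
        raise ValueError("cost must be nonnegative")
    return lam * cost ** gamma


def prelec_weight(p: float, alpha: float, beta: float) -> float:
    """Prelec weighting w(p) = exp(-beta * (-ln p)**alpha), with w(0)=0, w(1)=1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p == 0.0:
        return 0.0
    if p == 1.0:
        return 1.0
    return math.exp(-beta * (-math.log(p)) ** alpha)


# With alpha = beta = 1 the weighting is the identity (rational perception);
# low alpha, beta inflate unlikely outcomes, high values deflate them.
identity = prelec_weight(0.3, 1.0, 1.0)
inflated = prelec_weight(0.1, 0.5, 0.5)
deflated = prelec_weight(0.5, 2.0, 2.0)
```

Setting `lam = gamma = alpha = beta = 1` recovers the rational (expected-value) perception, which is a convenient sanity check when tuning the four parameters.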

III Environment Setup and Problem Statement

In this work, we consider spatial sources of risk embedded in the space. Our starting point is an uncertain cost function that aims to quantify objectively the (negative) consequences of being at a location or adopting a certain decision under uncertainty at a point .

For example, suppose a robot moves to a location from where there is an obstacle with certain probability. Then, we can define a cost measurement as the possible damage to the robot by moving from to under action applied at . A cost value can be defined depending on i) the type of robot (flexible robot or rigid robot), ii) the probability of having an obstacle in the said location, and iii) the type of action applied at to get to (e.g. slow/fast velocity). For simplicity, adopting a worst-case scenario, we may reduce the previous cost function to a function of the state 2.

As another example, consider a drone navigating in a building which is ablaze. In this case, the cost function can be proportional to the temperature profile. As sensors are noisy, the temperature profile is uncertain, resulting into a noisy spatial cost value . Similarly, environmental conditions that affect the robot’s motion may lead to underperformance. When moving over an icy road, the dynamics of the robot may behave unpredictably, resulting in a temporary loss of control and departure from an intended goal state. In this case, the uncertain cost may be quantified as the state disturbance under a given action over a given period of time. For example, for a simple second-order and fully actuated vehicle dynamics with acceleration input which is subject to a locally constant “ice” disturbance , in a small neighborhood of , we have for a small time . Thus, the difference with an intended state can be measured by the random variable , which encodes information about and the unit time of actuation, . Here is uncertain, and can be modeled with prior data or measured with a noisy sensor as in the temperature profile case.

Prior knowledge in the form of expert inputs and data collected from sensors can be used to get information about the cost , environmental uncertainty, and the robot’s capabilities. In this way, icy roads pose a much lower cost in the previous sense to a 4WD car with snow tires than to a 2WD car with summer tires; hence, the same cost at a given location could be scaled differently depending on the robot’s capabilities.

In this work, we will assume that the cost at a location has been characterized as a random variable with a mean and standard deviation , for each . In particular, it is reasonable to approximate via a “bump function,” a concept extensively used in differential geometry [25]. To fix ideas and be more specific, consider the previous case where a vehicle moves through an “icy” environment, and assume . Then, a mean disturbance over a subset should result approximately into a disturbance , where is the portion of the trajectory from to that is inside the icy section . As is farther from , the disturbance reduces its effect on , and the value of should decrease to zero. In other words, there is a such that , where is an enlarged region whose boundary delimits the uncertain cost area from the certain one (i.e. outside the cost is zero with low uncertainty). The effect of is thus similar to that of a bump function defined with respect to and . Bump functions are infinitely smooth, take a positive constant value over , which smoothly decreases and becomes zero outside . There are many ways of defining bump functions, such as via convolutions, which works in arbitrary dimensions as described the following3. Let denote the indicator function of a subset , and, given , define , with . Then, a bump function based on and can be given by the convolution , . This function takes a value of outside , inside and a value between and at the points . Another example of bump function construction is provided in Section VII.

In this work, the notions of “risk” and “risk perception” relate to the way in which the values of are scaled and averaged in expectation. That is, risk is a moment of a given uncertain function (either or a composition with ). For example, the risk of being at a location can be measured via expected cost; that is , which may represent “expected damage to robot” with respect to uncertainty. However, there are other ways of weighting the outcomes to define alternative risk functions, such as using CPT. With this in mind, we proceed to define the following three main problems:

Problem 1

(CPT environment generator). Given the configuration space containing the uncertain cost along with the DM’s CPT parameters , obtain a DM’s (non-rational) perceived risk consistent with CPT theory.

Problem 2

(Planning with perceived risk). Given a start and goal points and , compute a desirable path from to in accordance with the DM’s perceived risk .

Problem 3

(CPT planner evaluation). Given the configuration space containing an uncertain cost along with a drawn path , evaluate the CPT planner as a model approximator to generate the perceived risk representing the drawn path .

We address these three problems in Sections IV, V, and VI, respectively.

IV Risk perception using CPT

In this section, we will generate a DM’s perceived risk in the setting of Section III, thus addressing Problem 1. We consider an uncertain cost is given at every point , which we approximate via its first two moments, a mean value and a standard deviation . In what follows, we use a discrete approximation4 of by considering bins, to obtain a set of possible cost values such that with their corresponding probabilities , such that . Further, we will assume that . In other words, even though cost values , , may change from point to point in , the probabilities remain the same for different . Note that we can do this wlog by discretizing the continuous RV appropriately, see Algorithm 1. The function finds such that with as a convention.

Now, the expected Risk at a point is

(3)

That is, from (3) we have an expected risk associated with the configuration space, which is shown in Figure 1(a) and corresponds to a standard or rational notion of risk.

Next, we use the CPT notions developed in Section II to provide a non-rational perception model of the cost . According to CPT [4], there exist decision weight functions that represent the non-rational perception of the probabilities in a cumulative fashion. Defining a partial sum function, we have

(4)

where we employ the weighting function from (2).

With this, a DM’s CPT risk associated to the configuration is given by:

(5)

We note that both functions and are differentiable, which is important for the good behavior of the planner and which will be used for the analysis in Section V.
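For concreteness, the expected risk, decision weights, and CPT risk discussed in this section can be written in the standard CPT form for costs. The symbols below ($\rho_i(x)$, $p_i$, $M$, $S_j$, $\pi_j$, $R^e$, $R^c$) and the increasing ordering $\rho_1(x) \le \dots \le \rho_M(x)$ are labeling assumptions made for this sketch; they are not necessarily the authors' exact notation or ordering convention.

```latex
\begin{align*}
R^{e}(x) &= \sum_{i=1}^{M} \rho_i(x)\, p_i, \\
S_j(p) &= \sum_{i=j}^{M} p_i, \qquad S_{M+1}(p) = 0, \\
\pi_j(p) &= (w \circ S_j)(p) - (w \circ S_{j+1})(p), \qquad j = 1, \dots, M, \\
R^{c}(x) &= \sum_{j=1}^{M} (v \circ \rho_j)(x)\, \pi_j(p).
\end{align*}
```

Under this convention $S_j(p)$ is the probability of incurring a cost of at least $\rho_j(x)$, and the weights telescope to $\sum_j \pi_j = w(1) - w(0) = 1$. With $w(p) = p$ and $v(\rho) = \rho$, we get $\pi_j = p_j$ and $R^c$ reduces to $R^e$.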

Algorithm 1 CPT Environment (CPT-Env). Input: the uncertain cost at a point and the DM’s CPT parameters. Output: the CPT perceived risk at that point.

The above concepts are illustrated in Figures 1 and 3. Given an uncertain spatial cost with the first moment (Figure 3(b)) and second moment (Figure 3(c)) across an environment, the DM’s perception in the risk domain can vary from being rational (i.e., using expected risk in Figure 1(a)) to non-rational (i.e., using CPT risk ). By varying , CPT risk can be tuned to represent risk averse (Figure 1(b)) and risk indifferent (Figure 3(d)) perception, as well as uncertainty indifferent (Figure 3(e)) to uncertainty averse (Figure 3(f)) perception.

This process gives us the CPT perceived risk at a point; it is summarized in Algorithm 1. Algorithm 1 does not depend on the dimensionality of the space, which makes it scalable to high-dimensional spaces.
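A hedged sketch of a pointwise perceived-risk computation in the spirit of Algorithm 1 follows. It is not the authors' pseudocode. It assumes that the cost at a point equals its mean plus a half-normal deviation scaled by the standard deviation, that the distribution is split into equal-probability bins (so the probabilities are the same at every point), and that the utility and weighting take the power and Prelec forms. The name `cpt_risk` and the default `bins=20` are hypothetical.

```python
import math
from statistics import NormalDist


def cpt_risk(mean, std, lam, gamma, alpha, beta, bins=20):
    """Perceived risk at one point from the cost's mean and standard deviation."""

    def w(p):  # Prelec weighting with w(0) = 0 and w(1) = 1
        if p <= 0.0:
            return 0.0
        if p >= 1.0:
            return 1.0
        return math.exp(-beta * (-math.log(p)) ** alpha)

    # Assumed discretization: each bin has probability 1/bins and is
    # represented by the mid-quantile of a half-normal deviation, so the
    # cost values increase with the bin index j.
    z = NormalDist()
    costs = [mean + std * z.inv_cdf(0.5 + 0.5 * (j + 0.5) / bins)
             for j in range(bins)]

    risk = 0.0
    for j, rho in enumerate(costs):
        tail = (bins - j) / bins            # P(cost >= rho_j)
        next_tail = (bins - j - 1) / bins   # P(cost >  rho_j)
        weight = w(tail) - w(next_tail)     # cumulative decision weight
        risk += lam * rho ** gamma * weight
    return risk


rational = cpt_risk(2.0, 0.0, lam=1.0, gamma=1.0, alpha=1.0, beta=1.0)
averse = cpt_risk(2.0, 0.5, lam=2.25, gamma=0.88, alpha=0.74, beta=1.0)
neutral = cpt_risk(2.0, 0.5, lam=1.0, gamma=0.88, alpha=0.74, beta=1.0)
```

Because the function only needs the two moments at a point, evaluating it over a grid or at sampled configurations is independent of the dimension of the space, in line with the scalability remark above.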

V Sampling-based Planning using perceived risk

Here, we will use CPT notions to derive new cost functions, which will be used for planning in the DM’s perceived environment generated in Section IV. In traditional RRT* optimal planning is achieved using path length as the metric. In our setting, the notion of path length is insufficient as it does not capture the risk in the configuration space. Thus, we define cost functions that a) take into account risk and path length of a path, and b) satisfy the requirements that guarantee the asymptotic performance of an RRT*-based planner.

Path cost functions

Let two points be arbitrarily close. A decrease in risk is a desirable trait; hence, it is reasonable to add an additional term in the cost only if , which indicates an increase in the DM’s perceived risk by traveling from to . Consider the set of parameterized paths . First, we define the cost of a path . Consider a discretization of given by with , for all . Then, a discrete approximation of the cost over should be:

where denotes the arc-length of the curve , and is a constant encoding an urgency versus risk tradeoff. The greater the value, the greater is the urgency and hence path length is more heavily weighted whereas, smaller indicates greater prominence towards risk. The choice of will be discussed in Section VII. By taking limits in the previous expression, and due to the continuity and integrability of , we can express as:

(6)
(7)

From here, the cost of traveling from to is given by

Similarly, the path cost using expected risk can be obtained by replacing the CPT cost in (7) with the expected risk as calculated in (3).
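The discrete approximation described above can be sketched as follows, under the reading that the tradeoff constant multiplies the path length (greater values weight length more heavily) and that only increases in perceived risk between consecutive points are penalized. The names `path_cost`, `risk`, and `delta` are illustrative assumptions.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def path_cost(path: Sequence[Point],
              risk: Callable[[Point], float],
              delta: float) -> float:
    """Sum over segments of delta * length + max(0, risk increase)."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        total += delta * math.dist(a, b)      # urgency (length) term
        total += max(0.0, risk(b) - risk(a))  # risk decreases cost nothing
    return total


# Toy perceived risk that grows with the first coordinate.
toy_risk = lambda q: q[0]
out_and_back = path_cost([(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)], toy_risk, 1.0)
```

In the toy example the outbound segment pays for both length and the unit risk increase, while the return segment pays for length only, giving a total of 3. Passing the expected risk instead of the CPT risk as `risk` gives the expected-risk variant of the cost.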

Remark 1

(Monotonicity). From the above definitions, it can be verified that the costs and satisfy monotonic properties in the sense that 1) they assign a positive cost to any path in , and 2) given two paths and , and their concatenation , in the space , it holds that (resp. ), (due to the additive property of the integrals) and 3) (resp. ) are bounded over a bounded .

Algorithm 2 CPT-RRT*. Input: the start point, the number of iterations, and the DM’s perceived risk. Output: a tree rooted at the start point.

Proposed Algorithm

Now we have all the elements to adapt RRT* to our problem setting. Given , a number of iterations and a start point , we wish to produce graph , which represents a tree rooted at whose nodes are sample points in the configuration space and the edges represent the path between the nodes in . Let be a function that maps to the cumulative cost to reach a point from the root of the tree using the CPT cost metric (7). Similarly we define for the expected cost function .

Remark 2

(Additivity). The cumulative costs and are additive with respect to costs and in the sense that: for any we have and similarly .

The other basic functional components of our algorithm CPT-RRT* (Algorithm 2) are similar to those of RRT*, and we briefly outline them here for the sake of completeness:

  • : Returns a pseudo-random sample drawn from a uniform distribution across . Other risk-averse sampling schemes as in [21] may be employed. However, such schemes lead to conservative plans, which may not be suitable for all risk profiles.

  • : Returns the nearest node according to the Euclidean distance metric from in tree .

  • returns

  • : returns a set of nodes around , which are within a radius as given in [15].

  • : Returns the parent node of in the tree .

  • : Returns the list of children of in .

  • : Returns the path from the nearest node to in to .

We note that in order to compute for each path, we approximate the cost as the sum of costs over its edges, , and for each edge we compute the cost as the differences , where the latter is just the length of the edge. Then, this approximation will approach the computation of the real cost in the limit as the number of samples goes to infinity. The values are evaluated according to Algorithm 1. Our proposed CPT-RRT* algorithm augments RRT* algorithm in the following aspects: we consider a general continuous cost profile which leads to no obstacle collision checking. We also consider both path length and CPT costs for choosing parents and rewiring with the parameter which serves as relative weighting between CPT costs and Euclidean path length.
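The components listed above can be assembled into a compact RRT*-style loop, sketched below for a 2D square workspace. This is an illustration rather than the authors' Algorithm 2. It assumes a fixed connection radius instead of the shrinking radius of [15], an edge cost of `delta` times length plus the nonnegative risk increase, and no collision checking because the cost map is continuous. All names and default values are hypothetical.

```python
import math
import random


def edge_cost(a, b, risk, delta):
    """Weighted length plus the nonnegative increase in perceived risk."""
    return delta * math.dist(a, b) + max(0.0, risk(b) - risk(a))


def cpt_rrt_star(start, goal, risk, delta, iters=500, step=0.5,
                 radius=1.0, bounds=(0.0, 10.0), seed=0):
    rng = random.Random(seed)
    parent, cost, children = {start: None}, {start: 0.0}, {start: []}
    for _ in range(iters):
        x_rand = (rng.uniform(*bounds), rng.uniform(*bounds))      # sample
        x_near = min(parent, key=lambda v: math.dist(v, x_rand))   # nearest
        d = math.dist(x_near, x_rand)
        if d == 0.0:
            continue
        t = min(1.0, step / d)                                     # steer
        x_new = (x_near[0] + t * (x_rand[0] - x_near[0]),
                 x_near[1] + t * (x_rand[1] - x_near[1]))
        if x_new in parent:
            continue
        near = [v for v in parent if math.dist(v, x_new) <= radius]
        # Choose the parent that minimizes the cumulative cost to x_new.
        best = min(near + [x_near],
                   key=lambda v: cost[v] + edge_cost(v, x_new, risk, delta))
        parent[x_new] = best
        cost[x_new] = cost[best] + edge_cost(best, x_new, risk, delta)
        children[x_new] = []
        children[best].append(x_new)
        # Rewire neighbours through x_new when that lowers their cost,
        # then propagate the new costs to their descendants.
        for v in near:
            c_new = cost[x_new] + edge_cost(x_new, v, risk, delta)
            if c_new < cost[v]:
                children[parent[v]].remove(v)
                parent[v] = x_new
                children[x_new].append(v)
                stack = [(v, c_new)]
                while stack:
                    node, c = stack.pop()
                    cost[node] = c
                    for ch in children[node]:
                        stack.append((ch, c + edge_cost(node, ch, risk, delta)))
    # Return the tree path to the node closest to the goal.
    end = min(parent, key=lambda v: math.dist(v, goal))
    path, node = [], end
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], cost[end]


hill = lambda q: math.exp(-((q[0] - 5.0) ** 2 + (q[1] - 5.0) ** 2))
demo_path, demo_cost = cpt_rrt_star((1.0, 1.0), (9.0, 9.0), hill,
                                    delta=0.1, iters=300)
```

The nearest-neighbour and near-set queries here are linear scans, which keeps the sketch short but makes each iteration linear in the tree size; a spatial index would be the natural replacement for larger runs.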

Remark 3

(ER-RRT*). We can obtain the expected risk version of Algorithm 2 by replacing cost function by and following the same procedure as Algorithm 2.

Lemma 1

(Asymptotic Optimality). Assuming compactness of and the choice of according to Theorem 38 in [15], the CPT-RRT* algorithm is asymptotically optimal.

Proof.

It follows from the application of Theorem 38 in [15], and the conditions required for the result to hold. More precisely, the cost functions are monotonic (which follows from Remark 1), it holds that iff reduces to a single point (resp. the same for ), and the cost of any path is bounded. The latter follows from the compactness of and continuity of the cost functions. In addition, the costs are also cumulative, due to the additivity property in Remark 2. Finally, the result also requires the condition of the zero measure of the set of points of an optimal trajectory. This holds because both costs include a term for path length.

Simulation results of CPT-RRT* algorithm are presented in Section VII-A. Next we describe our proposed method to evaluate and compare risk perception models in our setting.

VI CPT-planner parameter adaptation

In this section, we describe an algorithm that can adapt the CPT parameters of the CPT planner to approximate arbitrary paths in the environment. By doing so, we aim to evaluate the expressive power of the CPT planner, that is, its capability to approximate single and arbitrary paths in the environment, versus other approaches which use different risk perception models.

If successful, this method could be used as a first ingredient in a larger scheme aimed at learning the risk function of a human decision maker using techniques such as inverse reinforcement learning (IRL). We recall that IRL requires either discrete state and action spaces or, if carried out over infinite-dimensional state and action spaces, a class of parameterized functions that can be used to approximate system outputs. Since our planning problem is defined over a continuous state and action space, the class of CPT planners for a parameter set could play the role of the function approximation class required to apply IRL. Then, as is done in IRL, a larger collection of path examples can be used to learn the best weighted combination of specific CPT planners in the class. While certainly of interest, this IRL question is out of the scope of this work, and we focus on analyzing the expressive power of the proposed class of CPT planners. Good expressive power is a necessary prerequisite for the class of CPT planners to constitute a viable function approximation class.

Toward this end, let us suppose that we have an arbitrary example path drawn in the environment. If the class of CPT planners is expressive enough, we should be able to find a set of parameters that is able to exactly mimic this drawn path. Since an arbitrary path belongs to a very high dimensional space and the planner parameters are typically finite, any amount of parametric tuning may not produce good approximations. This is what we evaluate in the following. In what follows, we use the term to denote the area enclosed between the given path and another path . This value measures the closeness between and .
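One simple way to compute an area-between-paths measure for 2D polylines is the shoelace formula applied to the polygon formed by one path followed by the other in reverse. The sketch below assumes that both paths share their start and end points and do not cross each other; for crossing paths the signed areas of the enclosed lobes partially cancel, so a more careful decomposition would be needed. The function name is an assumption.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def area_between(path_a: Sequence[Point], path_b: Sequence[Point]) -> float:
    """Shoelace area of the polygon path_a + reversed(path_b)."""
    polygon = list(path_a) + list(reversed(path_b))
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


straight = [(0.0, 0.0), (1.0, 0.0)]
detour = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
enclosed = area_between(straight, detour)
```

Here the detour encloses the unit square with the straight path, so `enclosed` evaluates to 1, and identical paths give an area of 0.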

A path produced by a CPT planner can be represented by the CPT parameters . In order to find the closest possible path to we have to evaluate

(8)

where is the path produced by CPT-RRT* with CPT parameters , and is the set of all possible values of . Directly evaluating (8) is computationally not feasible as the set is infinite and resides in space.

An alternative to (8) is to use parameter estimation algorithms to determine which characterizes the path with as a loss/cost function. We note that neither can be computed directly (without running CPT-RRT* first), nor is the gradient of with respect to accessible. This limits the use of standard gradient descent algorithms to estimate . To address this problem, we use SPSA [26] with as the loss function to estimate the parameters . Next, we briefly explain the main idea and adaptation of SPSA to our setting, and refer the reader to [27, 26] for a more detailed treatment and analysis of the SPSA algorithm.

We start with an initial estimate and iterate to produce estimates , using the loss function measurements . The main idea is to perturb the estimate according to  [26] to get and , for the iteration. These perturbations are then used to generate the perturbed paths using Algorithm 2. With these perturbed paths, the loss function measurements are evaluated and used to update our parameter according to [26]. To test the goodness of the updated parameter, we determine the corresponding path and measure . If the area is within a tolerance , that is, if , the iteration stops and is returned. We followed the guidelines from [27, 26] for choosing the parameters used in SPSA. The results of this adaptation are evaluated and compared with the results that employ other risk perception models in Section VII.
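The iteration described above can be sketched as a generic SPSA loop with Bernoulli ±1 perturbations and the gain-sequence exponents 0.602 and 0.101 commonly recommended for SPSA. The gain constants, the function names, and the quadratic placeholder loss are assumptions made for illustration. In the setting of this section, the loss would instead run the planner with the given parameters and return the area between the produced path and the drawn path.

```python
import random
from typing import Callable, List, Optional, Sequence


def spsa_minimize(loss: Callable[[Sequence[float]], float],
                  theta0: Sequence[float],
                  iters: int = 2000,
                  a: float = 0.1, c: float = 0.1, big_a: float = 10.0,
                  a_exp: float = 0.602, c_exp: float = 0.101,
                  tol: Optional[float] = None,
                  seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iters):
        a_k = a / (k + 1 + big_a) ** a_exp   # step-size gain
        c_k = c / (k + 1) ** c_exp           # perturbation gain
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss measurements give a gradient estimate in every coordinate.
        diff = loss(plus) - loss(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
        if tol is not None and loss(theta) <= tol:
            break  # the estimate is within the requested tolerance
    return theta


# Placeholder loss: squared distance to a hypothetical target parameter vector.
target = [0.74, 1.0, 0.88, 2.25]
toy_loss = lambda th: sum((t - g) ** 2 for t, g in zip(th, target))
start = [1.74, 1.5, 0.58, 2.95]
estimate = spsa_minimize(toy_loss, start)
```

Each iteration needs only two loss evaluations regardless of the number of parameters, which matters when every evaluation requires a full planner run. In practice the parameter estimates would also need to be kept inside their admissible ranges, which this sketch does not do.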

VII Results and Discussion

In this section, we illustrate the solutions to the problems addressed in Sections IV, V, and VI, considering a specific scenario with given risk and uncertainty profiles.

VII-A Environment Perception and Planning

We consider a hypothetical scenario where an agent needs to navigate in a room during a fire emergency. Here, the 2D configuration space for planning becomes . The agent is shown a rough floor map (Figure 3(a)) with obstacles (which are thought to be ablaze) in the environment and with a blot of ink/torn patch, making that region unclear and hard to decipher. This results in the spatial uncertain cost with first moment () represented by the cost associated with the obstacles and fire sources, and second moment () represented by the uncertainty associated with the ink spot/tear.

The blue objects are the obstacles, whose locations are known to within some tolerance (dark green borders), and the light orange ellipses indicate that these objects have caught fire. The grey ellipse indicates a possible tear/ink spot on the map, which makes that particular region hard to read. The start and goal positions are indicated by a blue spot and a green cross, respectively. We use a scaled sum of bivariate Gaussian distributions to model the sources of continuous cost (orange ellipses), with appropriate means and variances to depict the scenario in Figure 3(a). We utilize bump functions from differential geometry [25] to create smooth “bumps” depicting the discrete obstacles. One approach to do this is described in Section III. An alternative procedure is briefly described as follows. Consider the maximum cost value imparted to the obstacles as and let be the inner (blue rectangle) and outer (dark green borders) measurements of the obstacles from the center . Let be a point in the configuration space with being real valued scalar functions given by , and . Then, can be calculated by :

(9)

This procedure produces smooth “bumps” in the cost profile, which are visualized in Figure 3(b) using . This approach can be easily generalized to arbitrarily high dimensions by simply multiplying up to terms in (9) to create a bump function in the dimension. To generate the second moment of cost , we use a scaled bivariate Gaussian distribution with appropriate means and variances to depict the ink spot/tear in Figure 3(a). Next, we illustrate the results of implementing Algorithm 2 in this environment.
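As an illustration of a smooth obstacle bump built as a product of one-dimensional factors, the sketch below uses a standard smooth transition function from differential geometry. It is one common construction and is not claimed to coincide with (9). The names `smooth_step` and `bump_cost`, and the per-axis inner and outer half-widths, are assumptions.

```python
import math
from typing import Sequence


def _f(t: float) -> float:
    """Smooth function that vanishes, with all derivatives, for t <= 0."""
    return math.exp(-1.0 / t) if t > 0.0 else 0.0


def smooth_step(t: float) -> float:
    """Infinitely smooth ramp: 0 for t <= 0, 1 for t >= 1."""
    return _f(t) / (_f(t) + _f(1.0 - t))


def bump_cost(x: Sequence[float], center: Sequence[float],
              inner: Sequence[float], outer: Sequence[float],
              rho_max: float) -> float:
    """rho_max inside the inner box, 0 outside the outer box, smooth between."""
    value = rho_max
    for xi, ci, r_in, r_out in zip(x, center, inner, outer):
        value *= smooth_step((r_out - abs(xi - ci)) / (r_out - r_in))
    return value


box = dict(center=(5.0, 5.0), inner=(1.0, 1.0), outer=(2.0, 2.0), rho_max=10.0)
at_center = bump_cost((5.0, 5.0), **box)
halfway = bump_cost((6.5, 5.0), **box)
outside = bump_cost((8.0, 5.0), **box)
```

Because the loop multiplies one factor per coordinate, the same function works unchanged in higher dimensions by passing longer tuples.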

Simulations and discussions

With the uncertain cost with moments and from the previous paragraph, we use a half-normal distribution and discretization factor to generate the costs and their corresponding from Section IV. The result of applying Algorithm 1 to every point in to generate the perceived environment is shown in Figures 1 and 3. The level of risk at a point or is indicated by the color map. Figure 1(a) shows a rationally perceived environment using expected risk . In contrast, Figure 1(b) indicates a non-rational, highly risk averse perception using CPT () with having a high value. A risk indifferent profile (Figure 3(d)) is generated by having a low risk sensitivity value. Similarly, an uncertainty indifferent profile (Figure 3(e)) and an uncertainty averse profile (Figure 3(f)) are generated by fixing and having high and low values, respectively.

(a) Rough sketch of environment.
(b) Mean cost
(c) Uncertainty
(d) Risk indifferent profile with
(e) Uncertainty indifferent profile with
(f) Uncertainty averse profile
Fig. 3: Environment perception using CPT.

After the perceived environment is generated, Algorithm 2 is used to plan a path from the start point to the goal point shown in Figure 3(a). We use iterations for the CPT-RRT* algorithm with . The same random seed was used for all executions for consistency. The path planning results are illustrated in Figure 4. As expected, we see that the path depends on the perceived risk profile. Figure 4(a) indicates a circuitous path due to the highly risk averse perception, whereas Figure 4(d) indicates a shorter and more direct path for a rational DM using expected risk. Increasing the uncertainty sensitivity (lowering ) and reducing risk aversion (lowering ) makes the planner avoid the highly uncertain ink spot/tear in the top-right region and take a riskier path in the lower region, as shown in Figure 4(b). With medium risk aversion and lower uncertainty sensitivity (increasing ), the planner produces a different path through the medium-risk and uncertain middle region, as shown in Figure 4(c).

(a) High cost and uncertainty aversion
(b) Medium cost and high uncertainty aversion
(c) Medium cost and low uncertainty aversion
(d) Expected risk
Fig. 4: Paths produced by CPT-RRT* under different perception models. White lines indicate the tree grown from the start position; the red line indicates the optimal path to the goal after the final iteration. The background color map depicts the CPT costs in (a)-(c) and the expected costs in (d).

Solution quality

Figure 5 illustrates the empirical convergence and solution quality of the paths produced by our algorithm. For the convergence test, we ran CPT-RRT* repeatedly with the same parameters and initial conditions, and measured the area between the paths produced at successive checkpoints, taken at a fixed iteration interval over the whole run. The results are shown in Figure 5(a). In the early iterations, the output path changes noticeably as the space is still being explored. Later in the run, we consistently see minimal path changes, indicating that the algorithm is converging towards a desirable path. We also checked the solution quality by computing the cost of the output path at a fixed iteration interval over several trials, as shown in Figure 5(b). The path cost decreases consistently in all trials, and the decrease eventually starts to plateau, indicating that the algorithm is close to a high-quality (low-cost) solution. Based on these observations from Figure 5, we recommend running the planner at least up to this plateau to achieve smooth and consistent paths in our setting.

(a) Convergence over iterations
(b) Path cost over iterations
Fig. 5: (a) Empirical convergence analysis: distance between paths produced at successive checkpoints (y-axis) against the number of iterations in thousands (x-axis). (b) Cost of the output path (y-axis), evaluated at a fixed iteration interval, against the number of iterations (x-axis).
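(a) Paths produced by CPT-RRT*.
(b) Paths produced by T-RRT* using IC.
(c) Paths produced by T-RRT* using MW.
(d) Paths produced by varying the path-length weight.
Fig. 6: (a)-(c) Paths produced in a cluttered environment, using a fixed number of iterations for CPT-RRT* and a fixed number of nodes for T-RRT* (see footnote 7). (d) Paths produced by CPT-RRT* with a varying path-length weight.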

Comparison in narrow and cluttered environments

Here, we compare the performance of our RRT* framework with that of T-RRT* [21], another algorithm operating on continuous cost spaces, in a cluttered environment with narrow passages, as shown in Figure 6. To construct this environment, we placed 100 small objects at random on the right half and two big objects separated by a narrow passage on the left half. The start point is in the top-right corner and the goal is at the center of the narrow passage. Bump functions similar to those in the previous paragraphs were used to construct a smooth spatial cost from the obstacles. Since T-RRT* does not have risk perception capabilities, for a fair comparison we use this continuous cost in both algorithms, which lets us compare their planning capabilities in the same continuous cost environment. The same settings of the shared parameters were used for both algorithms. Figure 6(a) shows that our algorithm is able to sample and generate paths in the narrow passage, as well as avoid obstacles in the cluttered region. In comparison, T-RRT* with the integral cost (IC) in Figure 6(b) and with the minimum work (MW) cost in Figure 6(c) cannot generate paths in the narrow, costly region fast enough, irrespective of the parameter value used, due to its sampling bias towards low-cost regions. The T-RRT* paths also do not appear to be as smooth as the paths from our framework, irrespective of the cost (IC or MW) used. Finally, the cluttered, high-cost environment induces a high failure rate of the transition test, so T-RRT* requires longer run times to build the same number of nodes as our algorithm, especially for high parameter values.

Variation in the path-length weight

Using the previous environment (Figure 3(a)) and a cost- and uncertainty-averse profile (Figure 4(a)), we run CPT-RRT* with the path-length weight varied over a range of values. The results are shown in Figure 6(d). As the weight increases, the output path changes, reflecting an increase in urgency over risk and thus a preference for shorter paths. When the weight is comparable to the risk values, the paths no longer avoid the high-risk area and can even go through the soft obstacles. From this study, we observe that the weight needs to be rather small compared to the given risk profile in order to ensure meaningful consideration of risk in the planning process. If explicit obstacle avoidance is a necessity, then a standard collision check can be performed prior to adding a node to the tree.
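
The trade-off between risk and path length can be illustrated with a small sketch. It assumes a hypothetical path cost formed as a line integral of perceived risk plus a weight times the path length; this additive form, the Gaussian soft obstacle, and all names are stand-ins chosen for illustration and are not the cost function used by CPT-RRT*.

```python
import math


def path_cost(path, risk, length_weight, samples_per_segment=200):
    """Hypothetical cost: integral of risk along a polyline + weight * length."""
    total_risk = 0.0
    total_length = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        step = seg_len / samples_per_segment
        # Midpoint rule for the line integral of risk along the segment.
        for k in range(samples_per_segment):
            t = (k + 0.5) / samples_per_segment
            total_risk += risk(x0 + t * (x1 - x0), y0 + t * (y1 - y0)) * step
        total_length += seg_len
    return total_risk + length_weight * total_length


def soft_obstacle_risk(x, y):
    """Illustrative smooth risk peak centred at (0.5, 0)."""
    return 5.0 * math.exp(-((x - 0.5) ** 2 + y ** 2) / 0.02)


direct = [(0.0, 0.0), (1.0, 0.0)]              # short, crosses the risk peak
detour = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]  # longer, avoids the peak

for weight in (0.1, 5.0):
    cheaper = min((direct, detour), key=lambda p: path_cost(p, soft_obstacle_risk, weight))
    print(weight, "direct" if cheaper is direct else "detour")
```

In this toy setting, a small weight favors the detour around the risk peak, while a weight comparable to the risk values favors the direct path through it, which mirrors the qualitative behavior described above.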

Overall, our adaptation of CPT to the planning setting produces paths that are logically consistent with a given risk scenario. Additionally, our planning framework can explore narrow corridors and cluttered environments and produce smooth paths quickly.

VII-B CPT planner expressive power evaluation

We now use the SPSA framework proposed in Section VI to gauge the adaptability of CPT as a perception model for representing a drawn path. We compare CPT with Conditional Value at Risk (CVaR) [10], also known as “expected shortfall”, another popular risk perception model in the financial decision-making community. CVaR uses a single parameter, representing the fraction of worst-case outcomes to consider when evaluating the expected risk of an uncertain cost. A parameter value close to zero therefore considers only the worst-case outcome, while a value of one considers all outcomes, making the CVaR value equal to the expected risk.
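
The behavior of CVaR at the two ends of its parameter range can be checked with a minimal sketch for a discrete cost distribution. The function name `cvar` and the parameter name `q` (the fraction of worst-case outcomes) are hypothetical and chosen for illustration.

```python
def cvar(costs, probs, q):
    """Mean cost over the worst q-fraction of outcomes of a discrete cost.

    q is assumed to lie in (0, 1]; larger costs are treated as worse.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    pairs = sorted(zip(costs, probs), key=lambda cp: cp[0], reverse=True)
    remaining = q  # probability mass still to be collected from the worst tail
    total = 0.0
    for cost, p in pairs:
        take = min(p, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / q


# Illustrative values only: four equally likely cost levels.
costs = [1.0, 2.0, 3.0, 4.0]
probs = [0.25, 0.25, 0.25, 0.25]
print(cvar(costs, probs, 1.0))   # all outcomes: equals the expected cost
print(cvar(costs, probs, 0.25))  # worst quarter: equals the largest cost
```

Because a single parameter controls how much of the tail is averaged, this model has one degree of freedom for shaping the perceived risk.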

To implement SPSA, we follow the guidelines from [27]. The perturbation vector is drawn from a two-point Bernoulli distribution with equal probabilities, and the update uses a learning rate and a perturbation parameter.
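
A minimal sketch of an SPSA loop of this kind is given below, assuming constant gains and a ±1 perturbation in every coordinate. The names `a` (learning rate), `c` (perturbation size), `tol` and `max_iters` are hypothetical, and a quadratic stand-in loss replaces the planner-based loss so that the example stays self-contained.

```python
import random


def spsa_minimize(loss, theta0, a=0.1, c=0.05, max_iters=200, tol=1e-9, seed=0):
    """Basic SPSA with symmetric +/-1 Bernoulli perturbations and constant gains."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(max_iters):
        if loss(theta) <= tol:
            break
        # Perturb all coordinates simultaneously with equally likely +/-1 entries.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c * d for t, d in zip(theta, delta)]
        minus = [t - c * d for t, d in zip(theta, delta)]
        # Two loss evaluations give a gradient estimate for every coordinate.
        diff = loss(plus) - loss(minus)
        grad = [diff / (2.0 * c * d) for d in delta]
        theta = [t - a * g for t, g in zip(theta, grad)]
    return theta


# Stand-in loss: squared distance to a hypothetical target parameter vector.
target = [1.0, -2.0]


def quadratic_loss(theta):
    return sum((t - g) ** 2 for t, g in zip(theta, target))


estimate = spsa_minimize(quadratic_loss, [0.0, 0.0])
print(estimate, quadratic_loss(estimate))
```

Each iteration needs only two loss evaluations regardless of the number of parameters, which is attractive when every evaluation involves running a planner.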

For CPT, we choose the nominal parameters from [4] throughout the simulation, together with a corresponding choice for the CVaR variant. We use the same environment as in Figure 3 for all the simulations. Four different paths are drawn by hand on the expected risk profile (Figure 4(d)) using a computer mouse, as shown in Figure 7(a). One path is similar to a path generated with expected risk perception (Figure 4(d)), while two others are similar to paths generated with high risk aversion (Figure 4(a)) and uncertainty insensitivity (Figure 4(c)), respectively. The remaining path is more challenging to represent, as it shows an initial aversion to risk and uncertainty and then takes a seemingly costlier turn at the top.

We then use the SPSA approach described in Section VI, with a loss tolerance and a maximum number of SPSA iterations per trial. In each SPSA iteration, Algorithm 2 is run with a fixed iteration budget to determine the planned path, from which the loss is computed. For the CVaR variant, the planner (Algorithm 2) replaces the CPT perceived risk with the CVaR perceived risk, while the rest of the RRT* framework remains unchanged. At the end of each trial, we obtain the area (loss) between the returned path and the drawn path.
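
One simple way to compute an area-type loss between two planar paths is sketched below, assuming both paths are polylines that share their start and goal points and do not cross each other. The shoelace construction and the function name are illustrative assumptions, not necessarily the computation used in the experiments; for crossing paths the formula returns a net area, so the region would have to be split at the crossings first.

```python
def area_between_paths(path_a, path_b):
    """Area enclosed by two polylines with common endpoints (shoelace formula)."""
    # Walk along path_a, then back along path_b to close the polygon.
    polygon = list(path_a) + list(reversed(path_b))
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# Illustrative paths around a unit square, sharing start (0, 0) and goal (1, 1).
lower = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
upper = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(area_between_paths(lower, upper))
```

A loss of zero corresponds to the planned path coinciding with the drawn one, so smaller values indicate a perception model that reproduces the drawn path more closely.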

(a) Four paths are drawn in blue, orange, green and red respectively.
(b) Boxplots showing the cost (area) returned after using SPSA with CPT and CVaR to capture the risk profile of the drawn paths.
Fig. 7: Result of using CPT and CVaR to model drawn paths.

We present the statistics of the returned cost as boxplots in Figure 7(b). Each boxplot represents the distribution of cost (area) values returned over the trials for one path and perception model. The y-axis shows the cost (area) on a logarithmic scale. A few sample areas are computed to give the reader a quantitative idea of this measure in our scenario. The median value of each boxplot is indicated on the top row, the mean value of the distribution is marked with a star, and the black lines above and below the box represent the range; outliers are marked separately.

From Figure 7(b), we observe that the paths resembling the expected-risk and risk-averse paths are captured equally well by CVaR and CPT, with low area values. Since both CPT and CVaR are generalizations of expected risk, paths close to those generated from expected risk can be easily mimicked. Similarly, since CPT and CVaR are designed to capture risk aversion, paths close to risk-averse paths (Figure 4(a)) can also be easily captured.

However, we see a contrast in performance for the other two paths. In both cases, CPT is able to track the drawn path reasonably well, with low area values, whereas CVaR returns consistently higher values (by an order of magnitude), indicating an inability to capture the risk perception leading to these paths. This is because CPT can handle uncertainty perception independently from the cost (compare Figure 3(e) and Figure 3(f)), an ability that is needed to capture such paths and that is lacking in models like CVaR and expected risk. This illustrates the richer modeling capability of CPT compared with CVaR.

VIII Conclusions and Future Work

In this work, we have proposed a novel adaptation of CPT to model a DM’s non-rational perception of a risky environment in the context of path planning. First, we demonstrated a DM’s non-rational perception of a 2D environment embedded with an uncertain spatial cost using CPT, and provided a tuning knob to model various cost and uncertainty perceptions. Next, we proposed and demonstrated a novel embedding of non-rational risk perception into a sampling-based planner, CPT-RRT*, which uses the DM’s perceived environment to plan asymptotically optimal paths. Finally, we evaluated CPT as an approximator of the risk perception behind arbitrary drawn paths by comparing it against CVaR, and showed that CPT is the richer model in our setting. Future work will analyze how CPT can be used to learn the risk profile of a decision maker through learning frameworks such as IRL and through user studies.

Footnotes

  1. CPT has an alternate perception model for random rewards [23], which is not used here since we are interested in cost perception.
  2. Instead of a operation, one may use an expected operation wrt .
  3. Bump functions can also be defined and formalized in arbitrary Riemannian manifolds.
  4. The discretization of the random cost function is used to be able to use CPT directly with discrete random variables. However, it is possible to generalize what follows to the continuous random variable case.
  5. Just for offline planning, or in situations where the human does not update the environment online as new information is found.
  6. An arbitrary path can be modeled as a curve defined by a large number of parameters (possibly infinite).
  7. We used nodes instead of iterations for T-RRT* to maintain an equal number of nodes in the tree, as a node does not get added if it fails the transition test.

References

  1. H. Choset, S. Hutchinson, K. M. Lynch, G. Kantor, W. Burgard, L. E. Kavraki, and S. Thrun, Principles of Robot Motion: Theory, Algorithms and Implementations.   The MIT Press, 2005.
  2. C. Wu, A. Bayen, and A. Mehta, “Stabilizing traffic with autonomous vehicles,” in IEEE Int. Conf. on Robotics and Automation, 2018, pp. 1–7.
  3. A. Suresh and S. Martínez, “Gesture-based human-swarm interactions for formation control using interpreters,” in IFAC Conf. on Cyber Physical and Human Systems, vol. 51, no. 34, Miami, FL, USA, 2018, pp. 83–88.
  4. A. Tversky and D. Kahneman, “Advances in prospect theory: Cumulative representation of uncertainty,” Journal of Risk and Uncertainty, vol. 5, no. 4, pp. 297–323, 1992.
  5. H. Kurniawati, T. Bandyopadhyay, and N. M. Patrikalakis, “Global motion planning under uncertain motion, sensing, and environment map,” Autonomous Robots, vol. 33, no. 3, pp. 1–18, 2012.
  6. J. Song, S. Gupta, and T. Wettergren, “T*: Time-optimal risk-aware motion planning for curvature-constrained vehicles,” IEEE Robotics and Automation Letters, vol. 4, no. 1, pp. 33–40, 2019.
  7. B. Burns and O. Brock, “Sampling-based motion planning with sensing uncertainty,” in IEEE Int. Conf. on Robotics and Automation, 2007, pp. 3313–3318.
  8. S. Singh, Y. Chow, A. Majumdar, and M. Pavone, “A framework for time-consistent, risk-sensitive model predictive control: Theory and algorithms,” IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2905–2912, 2019.
  9. A. Hakobyan, G. Kim, and I. Yang, “Risk-aware motion planning and control using CVaR-constrained optimization,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 3924–3931, 2019.
  10. P. Artzner, F. Delbaen, J. Eber, and D. Heath, “Coherent measures of risk,” Mathematical Finance, vol. 9, no. 3, pp. 203–228, 1999.
  11. S. Gao, E. Frejinger, and M. Ben-Akiva, “Adaptive route choices in risky traffic networks: A prospect theory approach,” Transportation Research Part C: Emerging Technologies, vol. 18, no. 5, pp. 727–740, 2010.
  12. A. R. Hota and S. Sundaram, “Game-Theoretic Protection against Networked SIS Epidemics by Human Decision-Makers,” IFAC Papers Online, vol. 51, no. 34, pp. 145–150, 2019.
  13. C. Jie, P. L. A., M. Fu, S. Marcus, and C. Szepesvari, “Stochastic optimization in a Cumulative Prospect Theory framework,” IEEE Transactions on Automatic Control, vol. 63, no. 9, pp. 2867–2882, 2018.
  14. L. Wang, Q. Liu, and T. Yin, “Decision-making of investment in navigation safety improving schemes with application of Cumulative Prospect Theory,” Journal of Risk and Reliability, vol. 232, no. 6, pp. 710–724, 2018.
  15. S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, 2011.
  16. B. Boardman, T. Harden, and S. Martínez, “Limited range spatial load balancing in non-convex environments using sampling-based motion planners,” Autonomous Robots, vol. 42, no. 8, pp. 1731–1748, 2018.
  17. W. Chi and M. Q. Meng, “Risk-RRT*: A robot motion planning algorithm for the human robot coexisting environment,” in Int. Conf. on Advanced Robotics, 2017, pp. 583–588.
  18. B. Englot, T. Shan, S. D. Bopardikar, and A. Speranzon, “Sampling-based min-max uncertainty path planning,” in IEEE Int. Conf. on Decision and Control, 2016, pp. 6863–6870.
  19. E. Schmerling, K. Leung, W. Vollprecht, and M. Pavone, “Multimodal Probabilistic Model-based planning for Human-Robot Interaction,” in IEEE Int. Conf. on Robotics and Automation, 2018, pp. 1–9.
  20. B. Luders, S. Karaman, and J. How, “Robust sampling-based motion planning with asymptotic optimality guarantees,” in AIAA Conf. on Guidance, Navigation and Control, August 2013.
  21. D. Devaurs, T. Simeon, and J. Cortes, “Optimal Path Planning in Complex Cost Spaces with Sampling-based Algorithms,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 2, pp. 415–424, 2016.
  22. T. Shan and B. Englot, “Sampling-based minimum risk path planning in multiobjective configuration spaces,” in IEEE Int. Conf. on Decision and Control, 2015, pp. 814–821.
  23. S. Dhami, The Foundations of Behavioral Economic Analysis.   Oxford University Press, 2016.
  24. D. Prelec, “The probability weighting function,” Econometrica, vol. 66, no. 3, pp. 497–527, 1998.
  25. L. Tu, An Introduction to Manifolds.   Springer, 2011.
  26. J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Transactions on Automatic Control, vol. 37, no. 3, pp. 332–341, 1992.
  27. J. C. Spall, Introduction to Stochastic Search and Optimization.   New York, 2003.