Planning under nonrational perception of uncertain spatial costs
Abstract
This work investigates the design of motion-planning strategies that can incorporate nonrational perception of risks associated with uncertain spatial costs. Our proposed method employs the Cumulative Prospect Theory (CPT) model to generate a perceived risk function across a given environment, and the construction scales to high-dimensional spaces. Using this, CPT-like perceived risks and path-length metrics are combined to define a cost function that complies with the requirements for asymptotic optimality of sampling-based motion planners (RRT*). The modeling capabilities of CPT are demonstrated in simulation by producing a rich set of meaningful paths, capturing a range of different risk perceptions in a custom environment. Furthermore, using a simultaneous perturbation stochastic approximation (SPSA) method, we investigate the capacity of these CPT-based risk-perception planners to approximate arbitrary paths drawn in the environment. We compare this adaptability with Conditional Value at Risk (CVaR), another popular risk perception model. Our simulations show that CPT is richer and able to capture a larger class of paths as compared to CVaR and expected risk in our setting.
I Introduction
Motivation
Autonomous robots, ranging from industrial manipulators to robotic swarms [1, 2, 3], are becoming less isolated and increasingly interactive in today’s world. Arguably, most environments where these robots operate have an associated degree of spatial cost, which can lead to loss of or damage to the robots. For example, by moving onto an oily surface, a robot may collide with a nearby obstacle, resulting in a (mild or hard) crash. In more complex scenarios, a decision maker (DM) may be directly involved with the motion of an autonomous system, such as in robotic surgery, search and rescue operations, or autonomous car driving. The risk perceived from these costs or losses could vary among different DMs. In such cases, the decision making of an autonomous system with respect to perceived losses may influence the confidence that a DM has in the robot. This motivates us to consider richer models that incorporate nonrational perception of these spatial costs into planning algorithms for robotic systems. We aim to incorporate one such model, known as Cumulative Prospect Theory (CPT) [4], into path planning, and to compare the qualitative behavior of its output paths with those obtained from other risk perception models.
Related Work
Traditional risk-aware path planning considers risk explicitly in forms such as motion and state uncertainty [5], collision time [6], or sensing uncertainty [7]. However, how these risks are perceived or relatively weighted has been an overlooked topic. A few recent works [8] contemplate risk perception models, but assume rational DMs and use coherent risk measures like Conditional Value at Risk (CVaR) [9]. In contrast to what CPT suggests, these risk measures are built on axioms that assume rationality and linearity of the DM’s risk perception [10]. CPT has been used extensively in engineering applications such as traffic routing [11], network protection [12], stochastic optimization [13], and safe shipping policies [14] to model nonrational decision making. However, these notions are yet to be applied in the context of robotic planning and control.
Regarding planning algorithms themselves, RRT* [15] has been the basis for many motion planners owing to its asymptotic optimality properties and its ability to solve complex problems [16]. Risk [17] and uncertainty [18] have been ingredients of motion planning problems involving a human, but have mainly been modeled in a probabilistic manner [19] with discrete obstacles. CC-RRT* [20] handles both agent and environment uncertainty in a robust manner, but it does not use uncertainty or risk perception models, and it considers discrete polyhedral obstacles, which cannot incorporate continuous spatial costs. Very few of these works have considered modeling planning environments via continuous cost maps [21, 22], while, to the best of our knowledge, the simultaneous treatment of cost and uncertainty perception to model a DM’s spatial risk profile has largely been ignored.
Contributions
Our contributions lie in three main areas. Firstly, we adapt CPT into path planning to model nonrational perception of spatial cost embedded in an environment. With this, we can capture a larger variety of risk perception models, extending the existing literature. Secondly, we generate desirable paths using a sampling-based (RRT*-based) planning algorithm on the perceived risky environment. Instead of using only path length for costs, our planner combines a continuous risk profile with path length, enabling us to plan in the perceived environment setting above. Furthermore, we prove that our chosen cost satisfies the sufficient conditions for asymptotic optimality of the planner, leading to reliable and consistent paths according to a specified risk profile in the perceived environment. Finally, using SPSA, we measure the adaptability of risk perception models in the planning algorithms to express a given arbitrary path in an environment. This can be a useful first step towards a larger learning scheme for capturing human path planning using frameworks like inverse reinforcement learning (which is currently out of scope for this work). We would like to clarify that in this work we examine CPT-based environment perception models for planning, and leave the validation of CPT-based models with human user studies for future work.
Figure 1 shows a preview of how a DM’s nonlinear perception of the environment influences the path produced to reach a goal. While Figure 1(a) shows a rational perception of the environment using expected risk, Figure 1(b) illustrates a nonlinearly deformed and scaled surface that reflects the perception of a certain DM using CPT. In particular, as discussed later, we observe a richer path behavior of CPT-based planners as compared with others.
II Preliminaries
In this section, we describe some basic notations used in the paper along with a concise description of Cumulative Prospect Theory. More details about CPT can be found in [23].
Notation
We let $\mathbb{R}$ denote the space of real numbers, $\mathbb{Z}_{>0}$ the space of positive integers, and $\mathbb{R}_{\ge 0}$ the space of nonnegative real numbers. Also, $\mathbb{R}^n$ and $\mathcal{C}$ denote the $n$-dimensional real vector space and the configuration space used for planning. We use $\|\cdot\|$ for the Euclidean norm and $\circ$ for the composition of two functions $f$ and $g$, that is, $f \circ g(x) = f(g(x))$. We model a tree by a directed graph $G = (V, E)$, where $V$ denotes the set of sampled points (vertices of the graph), and $E \subseteq V \times V$ denotes the set of edges of the graph.
Cost and uncertainty weighting using CPT
CPT is a nonrational decision making model which incorporates nonlinear perception of uncertain costs. Traditionally it has been used in scenarios of monetary outcomes such as lotteries [4] and the stock market [23].
Let us suppose a DM is presented with a set of prospects $\{\rho^1, \dots, \rho^K\}$, representing potential outcomes and their probabilities, $\rho^k = \{(\rho^k_i, p^k_i)\}_{i=1}^{M}$. More precisely, there are $M$ possible outcomes associated with a prospect $\rho^k$, given by $\rho^k_i \in \mathbb{R}_{\ge 0}$, for $i \in \{1, \dots, M\}$, which can happen with a probability $p^k_i$. The outcomes are arranged in decreasing order, denoted by $\rho^k_M < \dots < \rho^k_1$, with their corresponding probabilities, which satisfy $\sum_{i=1}^{M} p^k_i = 1$. The outcomes of prospect $\rho^k$ may be interpreted as a random cost.
We define a utility function $v:\mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$, modeling a DM’s perceived cost, and $w:[0,1] \to [0,1]$ as the probability weighting function, which represents the DM’s perceived uncertainty. While previous literature has used various forms for these functions, here we focus on the CPT utility function taking the form:
$$v(\rho) = \lambda\, \rho^{\gamma}, \qquad (1)$$
where $0 < \gamma < 1$ and $\lambda > 1$. Tversky and Kahneman [4] suggest the use of $\gamma = 0.88$ and $\lambda = 2.25$ to parametrize an average human in the scenario of monetary lotteries; however, this may not hold for our application scenario. The parameter $\lambda$ represents the coefficient of cost aversion, with greater values implying stronger aversion, indicative of higher perceived costs, as indicated in Figure 2(a). The parameter $\gamma$ represents the coefficient of cost sensitivity, with lower values implying greater indifference towards cost, as indicated in Figure 2(b).
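As a minimal illustration of the utility function, the following Python sketch assumes the power-law form $v(\rho) = \lambda \rho^{\gamma}$ that is standard in CPT for costs. The function name `cpt_utility` and the argument names are illustrative only; the default values are the Tversky and Kahneman estimates for monetary lotteries and are not tuned to the planning scenario.

```python
def cpt_utility(cost: float, lam: float = 2.25, gamma: float = 0.88) -> float:
    """Perceived cost v(rho) = lam * rho**gamma for a nonnegative cost.

    lam   : cost aversion (assumed > 1); larger values inflate perceived cost.
    gamma : cost sensitivity (assumed in (0, 1)); smaller values flatten the curve.
    """
    if cost < 0:
        raise ValueError("cost must be nonnegative")
    return lam * cost ** gamma


if __name__ == "__main__":
    # A more averse DM (larger lam) perceives the same cost as larger.
    for lam in (1.0, 2.25, 4.0):
        print(lam, cpt_utility(0.5, lam=lam))
```

Raising `lam` scales the whole curve upward, while lowering `gamma` compresses differences between large costs, which matches the qualitative roles of cost aversion and cost sensitivity described above.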
We will use the popular Prelec probability weighting function [23, 24], indicative of perceived uncertainty, which takes the form:
$$w(p) = e^{-\beta (-\log p)^{\alpha}}, \quad \alpha > 0,\ \beta > 0. \qquad (2)$$
Figures 2(c) and 2(d) show changing uncertainty perception resulting from varying $\alpha$ and $\beta$, respectively. By choosing low $\alpha$ and $\beta$ values, one can get “uncertainty averse” behavior, with $w(p) > p$ implying that unlikely outcomes are perceived to be more certain, as seen in Figures 2(c) and 2(d). On the other hand, when $w(p) < p$, “uncertainty insensitive” behavior is obtained, implying that the DM only considers more certain outcomes, which can be observed with high $\alpha$ and $\beta$ values.
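The probability weighting can be sketched in a few lines of Python. This assumes the two-parameter Prelec form $w(p) = \exp(-\beta(-\log p)^{\alpha})$; the function name `prelec_weight` and the sample parameter values are hypothetical.

```python
import math


def prelec_weight(p: float, alpha: float, beta: float) -> float:
    """Two-parameter Prelec weighting w(p) = exp(-beta * (-log p)**alpha)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if p in (0.0, 1.0):
        return p  # the endpoints are fixed points of the weighting
    return math.exp(-beta * (-math.log(p)) ** alpha)


if __name__ == "__main__":
    # Low alpha and beta overweight an unlikely outcome (w(p) > p),
    # high values underweight it (w(p) < p).
    p = 0.05
    print(prelec_weight(p, alpha=0.5, beta=0.5), prelec_weight(p, alpha=1.5, beta=1.5))
```

With `alpha = beta = 1` the function reduces to the identity, which corresponds to an undistorted perception of probability.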
These concepts illustrate the nonlinear perception of cost and uncertainty. A DM under consideration can be characterized by the parameters $\Theta = \{\alpha, \beta, \gamma, \lambda\}$. Using the nonlinear parametric perception functions $v$ and $w$, CPT calculates a value function indicating the perceived risk value of a prospect. This calculation is detailed in Section IV using our planning setting.
III Environment Setup and Problem Statement
In this work, we consider spatial sources of risk embedded in the space. Our starting point is an uncertain cost function that aims to quantify objectively the (negative) consequences of being at a location or adopting a certain decision under uncertainty at a point .
For example, suppose a robot moves to a location from where there is an obstacle with certain probability. Then, we can define a cost measurement as the possible damage to the robot by moving from to under action applied at . A cost value can be defined depending on i) the type of robot (flexible robot or rigid robot), ii) the probability of having an obstacle in the said location, and iii) the type of action applied at to get to (e.g. slow/fast velocity). For simplicity, adopting a worst-case scenario, we may reduce the previous cost function to a function of the state
As another example, consider a drone navigating in a building which is ablaze. In this case, the cost function can be proportional to the temperature profile. As sensors are noisy, the temperature profile is uncertain, resulting in a noisy spatial cost value . Similarly, environmental conditions that affect the robot’s motion may lead to underperformance. When moving over an icy road, the dynamics of the robot may behave unpredictably, resulting in a temporary loss of control and departure from an intended goal state. In this case, the uncertain cost may be quantified as the state disturbance under a given action over a given period of time. For example, for a simple second-order and fully actuated vehicle dynamics with acceleration input which is subject to a locally constant “ice” disturbance , in a small neighborhood of , we have for a small time . Thus, the difference with an intended state can be measured by the random variable , which encodes information about and the unit time of actuation, . Here is uncertain, and can be modeled with prior data or measured with a noisy sensor as in the temperature profile case.
Prior knowledge in the form of expert inputs and data collected from sensors can be used to get information about the cost , environmental uncertainty, and the robot’s capabilities. In this way, icy roads pose a much lower cost in the previous sense to a 4WD car with snow tires than to a 2WD car with summer tires; hence the same cost at a given location could be scaled differently depending upon the robot’s capabilities.
In this work, we will assume that the cost at a location has been characterized as a random variable with a mean and standard deviation , for each . In particular, it is reasonable to approximate via a “bump function,” a concept extensively used in differential geometry [25]. To fix ideas and be more specific, consider the previous case where a vehicle moves through an “icy” environment, and assume . Then, a mean disturbance over a subset should result approximately into a disturbance , where is the portion of the trajectory from to that is inside the icy section . As is farther from , the disturbance reduces its effect on , and the value of should decrease to zero. In other words, there is a such that , where is an enlarged region whose boundary delimits the uncertain cost area from the certain one (i.e. outside the cost is zero with low uncertainty). The effect of is thus similar to that of a bump function defined with respect to and . Bump functions are infinitely smooth, take a positive constant value over , which smoothly decreases and becomes zero outside . There are many ways of defining bump functions, such as via convolutions, which works in arbitrary dimensions as described the following
In this work, the notions of “risk” and “risk perception” relate to the way in which the values of are scaled and averaged in expectation. That is, risk is a moment of a given uncertain function (either or a composition with ). For example, the risk of being at a location can be measured via expected cost; that is , which may represent “expected damage to robot” with respect to uncertainty. However, there are other ways of weighting the outcomes to define alternative risk functions, such as using CPT. With this in mind, we proceed to define the following three main problems:
Problem 1
(CPT environment generator). Given the configuration space containing the uncertain cost along with the DM’s CPT parameters , obtain a DM’s (nonrational) perceived risk consistent with CPT theory.
Problem 2
(Planning with perceived risk). Given start and goal points and , compute a desirable path from to in accordance with the DM’s perceived risk .
Problem 3
(CPT planner evaluation). Given the configuration space containing an uncertain cost along with a drawn path , evaluate the CPT planner as a model approximator to generate the perceived risk representing the drawn path .
IV Risk perception using CPT
In this section, we will generate a DM’s perceived risk in the setting of Section III, thus addressing Problem 1. We consider that an uncertain cost is given at every point , which we approximate via its first two moments, a mean value and a standard deviation . In what follows, we use a discrete approximation
Now, the expected risk at a point is
(3) 
That is, from (3) we have an expected risk associated with the configuration space, which is shown in Figure 1(a) and corresponds to a standard or rational notion of risk.
Next, we use the CPT notions developed in Section II to provide a nonrational perception model of the cost . According to CPT [4], there exist decision-weight functions that represent the nonrational perception of the probabilities in a cumulative fashion. Defining a partial sum function we have
(4) 
where we employ the weighting function from (2).
With this, a DM’s CPT risk associated to the configuration is given by:
(5) 
We note that both functions and are differentiable, which is important for the good behavior of the planner and which will be used for the analysis in Section V.
The above concepts are illustrated in Figure 1. Given an uncertain spatial cost with the first moment (Figure 3(b)) and second moment (Figure 3(c)) across an environment, the DM’s perception in the risk domain can vary from being rational (i.e. using expected risk in Figure 1(a)) to nonrational (i.e. using CPT risk ). By varying , CPT risk can be tuned to represent risk averse (Figure 1(b)) and risk indifferent (Figure 3(d)) perception, as well as uncertainty indifferent (Figure 3(e)) to uncertainty averse (Figure 3(f)) perception.
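To make the computation of this section concrete, the following Python sketch evaluates the expected risk and a CPT risk for one location, given a discrete set of cost outcomes and probabilities. It assumes the standard rank-dependent construction of CPT for costs: outcomes are ranked from worst to best, and each outcome receives the increment of the weighted cumulative probability. The power-law utility, the Prelec weighting, and all function and parameter names are assumptions of this sketch, and the exact indexing used in (4) and (5) may differ.

```python
import math
from typing import Sequence


def _prelec(p: float, alpha: float, beta: float) -> float:
    """Prelec weighting with fixed points at 0 and 1."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-beta * (-math.log(p)) ** alpha)


def expected_risk(costs: Sequence[float], probs: Sequence[float]) -> float:
    """Rational risk: probability-weighted average of the cost outcomes."""
    return sum(c * p for c, p in zip(costs, probs))


def cpt_risk(costs: Sequence[float], probs: Sequence[float],
             alpha: float, beta: float, gamma: float, lam: float) -> float:
    """Perceived risk of a discrete prospect with nonnegative costs."""
    if len(costs) != len(probs):
        raise ValueError("costs and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    # Rank outcomes from the largest (worst) cost to the smallest.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    risk, cumulative, w_prev = 0.0, 0.0, 0.0
    for i in order:
        cumulative = min(1.0, cumulative + probs[i])
        w_cum = _prelec(cumulative, alpha, beta)
        # Decision weight: increment of the weighted cumulative probability.
        risk += lam * costs[i] ** gamma * (w_cum - w_prev)
        w_prev = w_cum
    return risk


if __name__ == "__main__":
    costs, probs = [1.0, 2.0, 3.0], [0.2, 0.5, 0.3]
    print(expected_risk(costs, probs))
    print(cpt_risk(costs, probs, alpha=0.7, beta=0.8, gamma=0.88, lam=2.25))
```

With `alpha = beta = gamma = lam = 1` the decision weights reduce to the probabilities themselves, so `cpt_risk` coincides with `expected_risk`; this is a convenient sanity check when tuning the parameters.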
V Sampling-based Planning using perceived risk
Here, we will use CPT notions to derive new cost functions, which will be used for planning in the DM’s perceived environment generated in Section IV. In traditional RRT*, optimal planning is achieved using path length as the metric. In our setting, the notion of path length is insufficient as it does not capture the risk in the configuration space. Thus, we define cost functions that a) take into account risk and path length of a path, and b) satisfy the requirements that guarantee the asymptotic performance of an RRT*-based planner.
Path cost functions
Let two points be arbitrarily close. A decrease in risk is a desirable trait; hence it is reasonable to add an additional term in the cost only if , which indicates an increase in the DM’s perceived risk by traveling from to . Consider the set of parameterized paths . We first define the cost of a path . Consider a discretization of given by with , for all . Then, a discrete approximation of the cost over should be:
where denotes the arclength of the curve , and is a constant encoding an urgency versus risk tradeoff. The greater its value, the greater the urgency, and hence path length is more heavily weighted, whereas a smaller value gives greater prominence to risk. The choice of will be discussed in Section VII. By taking limits in the previous expression, and due to the continuity and integrability of , we can express as:
(6) 
(7) 
From here, the cost of traveling from to is given by
Similarly, the path cost using expected risk can be obtained by replacing the CPT cost in (7) with the expected risk as calculated in (3).
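The discrete approximation of the path cost can be sketched as follows. The sketch assumes that the cost of a discretized path is the sum of the positive risk increments between consecutive points plus the path length scaled by the urgency constant (called `delta` here). The names `path_cost`, `risk`, and `delta` are illustrative, and `risk` stands for whichever perceived risk function (CPT or expected) is in use.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def path_cost(path: Sequence[Point], risk: Callable[[Point], float],
              delta: float) -> float:
    """Sum of risk increases along the path plus delta times its length."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        # Only an increase in perceived risk is penalized; decreases are free.
        total += max(0.0, risk(b) - risk(a))
        # Urgency term: Euclidean length of the segment weighted by delta.
        total += delta * math.dist(a, b)
    return total


if __name__ == "__main__":
    uphill = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    # Hypothetical risk field that grows linearly with the first coordinate.
    print(path_cost(uphill, risk=lambda q: q[0], delta=0.5))
    print(path_cost(uphill[::-1], risk=lambda q: q[0], delta=0.5))
```

Traversing the same segment in the direction of decreasing risk only pays the length term, which is why the cost is not symmetric in the direction of travel.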
Remark 1
(Monotonicity). From the above definitions, it can be verified that the costs and satisfy monotonic properties in the sense that 1) they assign a positive cost to any path in , and 2) given two paths and , and their concatenation , in the space , it holds that (resp. ), (due to the additive property of the integrals) and 3) (resp. ) are bounded over a bounded .
Proposed Algorithm
Now we have all the elements to adapt RRT* to our problem setting. Given , a number of iterations and a start point , we wish to produce a graph , which represents a tree rooted at whose nodes are sample points in the configuration space and whose edges represent the paths between the nodes in . Let be a function that maps to the cumulative cost to reach a point from the root of the tree using the CPT cost metric (7). Similarly, we define for the expected cost function .
Remark 2
(Additivity). The cumulative costs and are additive with respect to costs and in the sense that: for any we have and similarly .
The other basic functional components of our algorithm CPT-RRT* (Algorithm 2) are similar to those of RRT*, and we briefly outline them here for the sake of completeness:

: Returns a pseudorandom sample drawn from a uniform distribution across . Other riskaverse sampling schemes as in [21] may be employed. However, such schemes lead to conservative plans, which may not be suitable for all risk profiles.

: Returns the nearest node according to the Euclidean distance metric from in tree .

returns

: returns a set of nodes around , which are within a radius as given in [15].

: Returns the parent node of in the tree .

: Returns the list of children of in .

: Returns the path from the nearest node to in to .
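Some of the primitives listed above can be sketched in Python as follows. This is a generic illustration of RRT*-style helpers rather than the exact routines of Algorithm 2: the names `nearest`, `steer`, `near_radius`, and `near`, the step size `eta`, and the constant `gamma_rrt` are assumptions of this sketch. The shrinking radius follows the usual RRT* rule, the minimum of `gamma_rrt * (log(n) / n) ** (1 / d)` and `eta`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(nodes: Sequence[Point], x: Point) -> Point:
    """Node of the tree closest to x in Euclidean distance."""
    return min(nodes, key=lambda v: math.dist(v, x))


def steer(x_from: Point, x_to: Point, eta: float) -> Point:
    """Point on the segment from x_from towards x_to, at most eta away."""
    d = math.dist(x_from, x_to)
    if d <= eta:
        return tuple(x_to)
    scale = eta / d
    return tuple(a + scale * (b - a) for a, b in zip(x_from, x_to))


def near_radius(n_nodes: int, dim: int, gamma_rrt: float, eta: float) -> float:
    """Connection radius that shrinks as the tree grows."""
    if n_nodes < 2:
        return eta
    return min(gamma_rrt * (math.log(n_nodes) / n_nodes) ** (1.0 / dim), eta)


def near(nodes: Sequence[Point], x: Point, radius: float) -> List[Point]:
    """All tree nodes within the given radius of x."""
    return [v for v in nodes if math.dist(v, x) <= radius]


if __name__ == "__main__":
    tree = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
    x_rand = (3.0, 4.0)
    x_new = steer(nearest(tree, x_rand), x_rand, eta=1.0)
    print(x_new, near(tree, x_new, near_radius(len(tree), 2, 5.0, 2.0)))
```

In a full planner, the nodes returned by `near` are the candidates for parent selection and rewiring, each evaluated with the cumulative cost described in this section.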
We note that in order to compute for each path, we approximate the cost as the sum of costs over its edges, , and for each edge we compute the cost as the differences , where the latter is just the length of the edge. This approximation approaches the real cost in the limit as the number of samples goes to infinity. The values are evaluated according to Algorithm 1. Our proposed CPT-RRT* algorithm augments the RRT* algorithm in the following aspects: we consider a general continuous cost profile, which removes the need for obstacle collision checking, and we consider both path length and CPT costs for choosing parents and rewiring, with the parameter serving as the relative weighting between CPT costs and Euclidean path length.
Remark 3
Lemma 1
(Asymptotic Optimality). Assuming compactness of and the choice of according to Theorem 38 in [15], the CPT-RRT* algorithm is asymptotically optimal.
It follows from the application of Theorem 38 in [15], and the conditions required for the result to hold. More precisely, the cost functions are monotonic (which follows from Remark 1), it holds that iff reduces to a single point (resp. the same for ), and the cost of any path is bounded. The latter follows from the compactness of and continuity of the cost functions. In addition, the costs are also cumulative, due to the additivity property in Remark 2. Finally, the result also requires the condition of the zero measure of the set of points of an optimal trajectory. This holds because both costs include a term for path length.
Simulation results of the CPT-RRT* algorithm are presented in Section VII-A. Next we describe our proposed method to evaluate and compare risk perception models in our setting.
VI CPT-planner parameter adaptation
In this section, we describe an algorithm that can adapt the CPT parameters of the CPT planner to approximate arbitrary paths in the environment. By doing so, we aim to evaluate the expressive power of the CPT planner, that is, its capability to approximate single and arbitrary paths in the environment, versus other approaches which use different risk perception models.
If successful, this method could be used as a first ingredient in a larger scheme aimed at learning the risk function of a human decision maker.
Toward this end, let us suppose that we have an arbitrary example path drawn in the environment. If the class of CPT planners is expressive enough, we should be able to find a set of parameters that is able to exactly mimic this drawn path. Since an arbitrary path belongs to a very high dimensional space
A path produced by a CPT planner can be represented by the CPT parameters . In order to find the closest possible path to we have to evaluate
(8) 
where is the path produced by CPT-RRT* with CPT parameters , and is the set of all possible values of . Directly evaluating (8) is computationally infeasible, as the set is infinite and resides in space.
An alternative to (8) is to use parameter estimation algorithms to determine which characterizes the path with as a loss/cost function. We note that neither can be computed directly (without running CPT-RRT* first), nor is the gradient of with respect to accessible. This rules out standard gradient descent algorithms for estimating . To address this problem, we use SPSA [26] with as the loss function to estimate the parameters . Next, we briefly explain the main idea and adaptation of SPSA to our setting and refer the reader to [27, 26] for a more detailed treatment and analysis of the SPSA algorithm.
We start with an initial estimate and iterate to produce estimates , using the loss function measurements . The main idea is to perturb the estimate according to [26] to get and , for the iteration. These perturbations are then used to generate the perturbed paths using Algorithm 2. With these perturbed paths, the loss function measurements are evaluated and used to update our parameter according to [26]. To test the goodness of the updated parameter, we determine the corresponding path and measure . If the area is within a tolerance , that is, if , the iteration stops and is returned. We followed the guidelines from [27, 26] for choosing the parameters used in SPSA. The results of this adaptation are evaluated and compared with the results that employ other risk perception models in Section VII.
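The iteration described above can be sketched generically in Python. This is a textbook SPSA loop with Bernoulli plus/minus one perturbations and decaying gain sequences, not the exact implementation used with the planner: the function name `spsa_minimize`, the gain constants, and the quadratic stand-in loss in the usage example are all assumptions. In the planning setting, `loss` would run the planner with the given parameters and return the area between the planned and drawn paths.

```python
import random
from typing import Callable, List, Sequence


def spsa_minimize(loss: Callable[[List[float]], float], theta0: Sequence[float],
                  iterations: int = 200, a: float = 0.1, c: float = 0.1,
                  stability: float = 10.0, alpha: float = 0.602,
                  gamma: float = 0.101, tol: float = 0.0,
                  seed: int = 0) -> List[float]:
    """Gradient-free minimization using two loss evaluations per update."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        if loss(theta) <= tol:
            break  # estimate is already within the tolerance
        a_k = a / (k + 1 + stability) ** alpha  # step-size gain
        c_k = c / (k + 1) ** gamma              # perturbation gain
        # Simultaneous perturbation: every coordinate moves by +/- c_k.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        theta_plus = [t + c_k * d for t, d in zip(theta, delta)]
        theta_minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(theta_plus) - loss(theta_minus)
        # Gradient estimate for coordinate i is diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


if __name__ == "__main__":
    quadratic = lambda th: sum((t - 1.0) ** 2 for t in th)
    print(spsa_minimize(quadratic, [0.0, 3.0]))
```

Each update needs only two loss evaluations regardless of the number of parameters, which is what makes the scheme attractive when every evaluation requires a full planner run.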
VII Results and Discussion
In this section we illustrate the results of the solutions to the problem statement proposed in Sections IV, V, and VI, considering a specific scenario with some risk and uncertainty profiles.
VII-A Environment Perception and Planning
We consider a hypothetical scenario where an agent needs to navigate in a room during a fire emergency. In this, the 2D configuration space for planning becomes . The agent is shown a rough floor map (Figure 3(a)) with obstacles (which are thought to be ablaze) in the environment with a blot of ink/torn patch, making that region unclear and hard to decipher. This results in the spatial uncertain cost with first moment () represented by cost associated to obstacles and fire source and second moment () represented by the uncertainty associated to the ink spot/tear.
The blue colored objects are the obstacles whose location is known to be within some tolerance (dark green borders) and the light orange ellipses illustrate that these objects have caught fire. The grey ellipse indicates a possible tear/ink spot on the map, which makes that particular region hard to read. The start and goal positions are indicated as a blue spot and a green cross, respectively. We use a scaled sum of bivariate Gaussian distributions to model the sources of continuous cost (orange ellipses) with appropriate means and variances to depict the scenario in Figure 3(a). We utilize bump functions from differential geometry [25] to create smooth “bumps” depicting the discrete obstacles. One approach to do this is described in Section III. An alternative procedure is briefly described as follows. Consider the maximum cost value imparted to the obstacles as and let be the inner (blue rectangle) and outer (dark green borders) measurements of the obstacles from the center . Let be a point in the configuration space with being real valued scalar functions given by , and . Then, can be calculated by:
(9) 
This procedure produces smooth “bumps” in the cost profile which are visualized in Figure 3(b) using . This approach can be easily generalized to arbitrary high dimensions by simply multiplying up to terms in (9) to create a bump function in the dimension. To generate the second moment of cost , we use a scaled bivariate Gaussian distribution with appropriate means and variances to depict the ink spot/tear in Figure 3(a). Now we will illustrate the results of implementing Algorithm 2 in this environment.
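A smooth obstacle cost of this kind can be sketched in Python as follows. The sketch does not reproduce (9); it uses the standard construction based on $f(t) = e^{-1/t}$ for $t > 0$ (and zero otherwise) to build a smooth step, and multiplies one-dimensional bumps across coordinates. The names `obstacle_cost`, `r_in`, `r_out`, and `rho_max` are hypothetical stand-ins for the inner and outer obstacle measurements and the maximum cost.

```python
import math
from typing import Sequence


def _f(t: float) -> float:
    """Smooth function that vanishes for t <= 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0


def smooth_step(t: float) -> float:
    """Smooth transition: 0 for t <= 0, 1 for t >= 1."""
    return _f(t) / (_f(t) + _f(1.0 - t))


def bump_1d(x: float, center: float, r_in: float, r_out: float) -> float:
    """Equals 1 within r_in of the center, 0 beyond r_out, smooth in between."""
    t = (abs(x - center) - r_in) / (r_out - r_in)
    return 1.0 - smooth_step(t)


def obstacle_cost(point: Sequence[float], center: Sequence[float],
                  r_in: Sequence[float], r_out: Sequence[float],
                  rho_max: float) -> float:
    """Product of per-coordinate bumps, scaled by the maximum cost."""
    value = rho_max
    for x, c, ri, ro in zip(point, center, r_in, r_out):
        value *= bump_1d(x, c, ri, ro)
    return value


if __name__ == "__main__":
    args = ((0.0, 0.0), (1.0, 1.0), (2.0, 2.0), 5.0)
    for p in [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0)]:
        print(p, obstacle_cost(p, *args))
```

Because the per-coordinate factors are multiplied, the same code handles any dimension: the cost is `rho_max` inside the inner box, zero outside the outer box, and smooth in the band between them.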
Simulations and discussions
With the uncertain cost with moments and from the previous paragraph, we use a half-normal distribution and discretization factor to generate the costs and their corresponding from Section IV. The results of applying Algorithm 1 to every point in to generate the perceived environment are shown in Figures 1 and 3. The level of risk at a point or is indicated by the color map. Figure 1(a) shows a rationally perceived environment using expected risk . In contrast, Figure 1(b) indicates a nonrational, highly risk averse perception using CPT () with having a high value. A risk indifferent profile (Figure 3(d)) is generated by having a low risk sensitivity value. Similarly, the uncertainty indifferent profile (Figure 3(e)) and the uncertainty averse profile (Figure 3(f)) are generated by fixing and having high and low values, respectively.
After the perceived environment is generated, Algorithm 2 is used to plan a path from the start point to the goal point shown in Figure 3(a). We use iterations for the CPT-RRT* algorithm with . The same random seed was used for all executions for consistency. The path planning results are illustrated in Figure 4. As expected, we see that the path depends on the perceived risk profile. Figure 4(a) indicates a circuitous path due to the highly risk averse perception, whereas Figure 4(d) indicates a shorter and more direct path for a rational DM using expected risk. Increasing the uncertainty sensitivity (lowering ) and reducing risk aversion (lowering ) makes the planner avoid the highly uncertain ink spot/tear in the top-right region and take a riskier path in the lower region, as shown in Figure 4(b). With a medium risk aversion and lower uncertainty sensitivity (increasing ), the planner produces a different path through the moderately risky and uncertain middle region, as shown in Figure 4(c).
Solution quality
Figure 5 illustrates the empirical convergence and solution quality of the paths produced by our algorithm. We performed empirical convergence tests by running CPT-RRT* times with the same parameters and initial conditions and measuring the area between paths produced after every iterations for a total of iterations. The results are shown in Figure 5(a). We see that initially ( iterations) the output path changes as the space is being explored. After iterations we consistently see minimal path changes, indicating that the algorithm is converging towards a desirable path. We also checked the solution quality by computing the cost of the output path every iterations, as shown in Figure 5(b) for trials consisting of iterations. We see a consistent decrease in path cost throughout all the trials. We also note that after iterations the cost decrease starts to plateau, indicating that the algorithm is close to a high quality (low cost) solution. From these observations of Figure 5, we recommend upwards of iterations to achieve smooth and consistent paths in our setting.
Comparison in narrow and cluttered environments
Here, we will illustrate and compare the performance between our RRT* framework and T-RRT* [21] (another algorithm operating on continuous cost spaces) in a cluttered environment with narrow passages, as shown in Figure 6. To construct this environment, we used 100 randomly placed small objects on the right half and two big objects separated by a narrow passage on the left half. The start point is in the top right corner and the goal is at the center of the narrow passage. Bump functions similar to those in the previous paragraphs were used to construct a smooth spatial cost from the obstacles. Since T-RRT* does not have risk perception capabilities, for a fair comparison we use the continuous cost to implement both algorithms. In this way, we are able to specifically compare the planning capabilities of both algorithms in the same continuous cost environment. We used and for both algorithms. From Figure 6(a) we can see that our algorithm is able to sample and generate paths in the narrow passage, as well as avoid obstacles in a cluttered environment. In comparison, T-RRT* employing integral cost (IC) in Figure 6(b) and minimum work (MW) in Figure 6(c) cannot generate paths in the narrow costly region fast enough, irrespective of the used, due to the sampling bias towards low-cost regions. Also, T-RRT* paths do not appear to be as smooth as the paths from our framework, irrespective of the cost (IC or MW) used. We also note that the cluttered and high cost environment induces a high failure rate of the transition test, resulting in longer run times for T-RRT* to build the same number of nodes as our algorithm, especially for high values.
Variation in
Using the previous environment (Figure 3(a)) and a cost and uncertainty averse profile (Figure 4(a)), we run CPT-RRT* with varying from to for iterations. The results are shown in Figure 6(d). We can see that when the path output changes reflecting an increase in urgency over risk and thus choosing shorter paths. When is comparable to the risk values (in this case ), we see that the paths no longer avoid the high risk area and can even go through the soft obstacles. From this study, we observe that needs to be rather small as compared to the given risk profile in order to ensure meaningful consideration of risk in the planning process. If explicit obstacle avoidance is a necessity, then a standard collision check can be performed prior to adding a node in the tree .
Overall, our adaptation of CPT to the planning setting produces paths that are logically consistent with a given risk scenario. Additionally, our planning framework can explore narrow corridors and cluttered environments and produce smooth paths quickly.
VII-B CPT planner expressive power evaluation
We now discuss the proposed SPSA framework in Section VI to gauge the adaptability of CPT as a perception model to depict a drawn path . We compare CPT with Conditional Value at Risk (CVaR) [10], also known as “expected shortfall”, another popular risk perception model in the financial decision making community. CVaR uses a single parameter representing the fraction of worst case outcomes to consider for evaluating expected risk of an uncertain cost . We will use to denote the perceived risk by CVaR model with . So a considers the worst case outcome of and a considers all the outcomes thus making the CVaR value equal to expected risk ().
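For comparison with the CPT computation, CVaR of a discrete cost distribution can be sketched in Python as the average of the worst fraction `q` of the probability mass. The function name `cvar` and the parameter name `q` are illustrative choices for this sketch, which treats larger costs as worse outcomes.

```python
from typing import Sequence


def cvar(costs: Sequence[float], probs: Sequence[float], q: float) -> float:
    """Average cost over the worst q fraction of the probability mass."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    if len(costs) != len(probs):
        raise ValueError("costs and probs must have the same length")
    # Visit outcomes from the largest (worst) cost downwards.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    remaining, total = q, 0.0
    for i in order:
        mass = min(probs[i], remaining)  # take only what is left of the tail
        total += costs[i] * mass
        remaining -= mass
        if remaining <= 0.0:
            break
    return total / q


if __name__ == "__main__":
    costs, probs = [1.0, 2.0, 3.0], [0.2, 0.5, 0.3]
    for q in (0.1, 0.8, 1.0):
        print(q, cvar(costs, probs, q))
```

A small `q` returns the worst-case outcome and `q = 1` recovers the expected risk, so the single parameter interpolates between the two extremes mentioned above.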
To implement SPSA, we follow guidelines from [27]. We consider a Bernoulli distribution of with support and equal probabilities, learning rate and perturbation parameter .
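The following is a minimal sketch of an SPSA iteration with symmetric Bernoulli perturbations, written in plain Python. The constant gains a and c, the function name spsa_minimize, and the quadratic stand-in loss are assumptions made for illustration only; in the setting above, the loss would instead come from running the planner and measuring the area between the returned and drawn paths.

```python
import random


def spsa_minimize(loss, theta0, a=0.1, c=0.1, iterations=200, tol=None, seed=0):
    """Minimize `loss` with SPSA using constant gains (an illustrative choice)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(iterations):
        # Bernoulli perturbation: each component is -1 or +1 with equal probability.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c * d for t, d in zip(theta, delta)]
        minus = [t - c * d for t, d in zip(theta, delta)]
        # Two loss evaluations give a gradient estimate for every component at once.
        diff = loss(plus) - loss(minus)
        grad = [diff / (2.0 * c * d) for d in delta]
        theta = [t - a * g for t, g in zip(theta, grad)]
        # Optional early stop once the loss falls below a tolerance.
        if tol is not None and loss(theta) < tol:
            break
    return theta


# Stand-in loss (assumed): squared distance to a known target parameter vector.
target = [1.0, -2.0]


def quadratic_loss(theta):
    return sum((t - m) ** 2 for t, m in zip(theta, target))


estimate = spsa_minimize(quadratic_loss, [0.0, 0.0])
```

Each iteration needs only two loss evaluations regardless of the number of parameters, which is what makes SPSA attractive when every evaluation requires a full planner run.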
We choose for CPT throughout the simulation, which are the nominal parameters from [4] and for the CVaR variant. We use the same environment as in Figure 3 for all the simulations. Four different paths are drawn by hand on the expected risk profile (Figure 3(d)) using a computer mouse, as shown in Figure 6(a). Path is similar to a path generated with expected risk perception (Figure 3(d)), whereas paths and are similar to paths generated with high risk aversion (Figure 3(a)) and uncertainty insensitivity (Figure 3(c)), respectively. Path is more challenging to represent, as it shows an initial aversion to risk and uncertainty and then takes a seemingly costlier turn at the top.
We then use the SPSA approach described in Section VI with a tolerance and a maximum of SPSA iterations per trial. We run Algorithm 2 with iterations and to determine , from which the loss is computed during each SPSA iteration. For the CVaR variant, the planner (Algorithm 2) replaces with in order to use perceived risk according to CVaR, while the rest of the RRT* framework remains unchanged. At the end of each trial, we get the area (loss) between the returned and the drawn path .
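One simple way to compute such an area loss, assuming both paths are polylines that share their start and goal points and do not cross each other, is the shoelace formula applied to the closed polygon formed by one path followed by the other in reverse. This is only a sketch under those assumptions; the text does not specify how the area is computed, and the names below are hypothetical.

```python
def area_between_paths(path_a, path_b):
    """Area enclosed by two non-crossing polylines with common endpoints."""
    # Walk along path_a, then back along path_b, to close the polygon.
    polygon = list(path_a) + list(reversed(path_b))
    twice_area = 0.0
    for i, (x0, y0) in enumerate(polygon):
        x1, y1 = polygon[(i + 1) % len(polygon)]
        # Shoelace term for the edge from (x0, y0) to (x1, y1).
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# Illustrative paths (assumed): a drawn detour and a straight planned path.
drawn = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
planned = [(0.0, 0.0), (2.0, 0.0)]
area = area_between_paths(drawn, planned)
```

If the two paths cross, the signed contributions on either side of the crossing cancel, so a practical implementation would need to split the paths at their intersections and sum the absolute areas of the resulting pieces.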
We represent the statistics of the returned cost as box plots, as shown in Figure 6(b). Each box plot represents the distribution of cost/area values returned after each trial for each path and perception model. The y-axis represents the cost/area in a base log scale. We calculate a few sample areas: and to give the reader a quantitative idea of the measure in this scenario. The median value for each box plot is indicated in the top row. The mean value of the distribution is indicated by “stars”, the black lines above and below the box represent the range, and indicates outliers.
We observe from Figure 6(b) that both Path and Path were captured equally well by CVaR and CPT, with low values. Since both CPT and CVaR are generalizations of expected risk, paths (like ) close to paths generated from expected risk can be easily mimicked. Similarly, since CPT and CVaR are designed to capture risk aversion, paths (like ) close to risk averse paths (Figure 3(a)) can also be easily captured.
However, we see a contrast in performance for path and path . CPT, on both occasions, is able to track the drawn paths reasonably well with low values, whereas CVaR has consistently higher values (by an order of magnitude), indicating its inability to capture the risk perception leading to path and path . This is because CPT can handle uncertainty perception independently from the cost (as seen between Figure 2(e) and Figure 2(f)). This ability, which is lacking in models like CVaR and expected risk, is needed to capture paths like and . This shows that CPT generalizes beyond CVaR and has a richer modeling capability.
VIII Conclusions and Future Work
In this work, we have proposed a novel adaptation of CPT to model a DM’s nonrational perception of a risky environment in the context of path planning. First, we have demonstrated a DM’s nonrational perception of a 2D environment embedded with an uncertain spatial cost using CPT, and provided a tuning knob to model various cost and uncertainty perceptions. Next, we have proposed and demonstrated a novel embedding of nonrational risk perception into a sampling-based planner, the CPTRRT*, which utilizes the DM’s perceived environment to plan asymptotically optimal paths. Finally, we have evaluated CPT as an approximator of the risk perception underlying arbitrary drawn paths by comparing it against CVaR, and shown that CPT is the richer model. Future work will analyze how CPT can be used to learn the risk profile of a decision maker by using learning frameworks such as inverse reinforcement learning (IRL) and by conducting user studies.
Footnotes
 CPT has an alternate perception model for random rewards [23], which is not used here since we are interested in cost perception.
 Instead of a operation, one may use an expected operation wrt .
 Bump functions can also be defined and formalized in arbitrary Riemannian manifolds.
 The discretization of the random cost function is used to be able to use CPT directly with discrete random variables. However, it is possible to generalize what follows to the continuous random variable case.
 Just for offline planning, or in situations where the human does not update the environment online as new information is found.
 An arbitrary path can be modeled as a curve defined by a large number of parameters (possibly infinite).
 We used nodes instead of iterations for TRRT* to maintain an equal number of nodes in the tree, as a node does not get added if it fails the transition test.
 Figure 5(d): Paths produced by CPTRRT* with varying with
References
[1] H. Choset, S. Hutchinson, K. M. Lynch, G. Kantor, W. Burgard, L. E. Kavraki, and S. Thrun, Principles of Robot Motion: Theory, Algorithms and Implementations. The MIT Press, 2005.
[2] C. Wu, A. Bayen, and A. Mehta, “Stabilizing traffic with autonomous vehicles,” in IEEE Int. Conf. on Robotics and Automation, 2018, pp. 1–7.
[3] A. Suresh and S. Martínez, “Gesture-based human-swarm interactions for formation control using interpreters,” in IFAC Conf. on Cyber Physical and Human Systems, vol. 51, no. 34, Miami, FL, USA, 2018, pp. 83–88.
[4] A. Tversky and D. Kahneman, “Advances in prospect theory: Cumulative representation of uncertainty,” Journal of Risk and Uncertainty, vol. 5, no. 4, pp. 297–323, 1992.
[5] H. Kurniawati, T. Bandyopadhyay, and N. M. Patrikalakis, “Global motion planning under uncertain motion, sensing, and environment map,” Autonomous Robots, vol. 33, no. 3, pp. 1–18, 2012.
[6] J. Song, S. Gupta, and T. Wettergren, “T*: Time-optimal risk-aware motion planning for curvature-constrained vehicles,” IEEE Robotics and Automation Letters, vol. 4, no. 1, pp. 33–40, 2019.
[7] B. Burns and O. Brock, “Sampling-based motion planning with sensing uncertainty,” in IEEE Int. Conf. on Robotics and Automation, 2007, pp. 3313–3318.
[8] S. Singh, Y. Chow, A. Majumdar, and M. Pavone, “A framework for time-consistent, risk-sensitive model predictive control: Theory and algorithms,” IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2905–2912, 2019.
[9] A. Hakobyan, G. Kim, and I. Yang, “Risk-aware motion planning and control using CVaR-constrained optimization,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 3924–3931, 2019.
[10] P. Artzner, F. Delbaen, J. Eber, and D. Heath, “Coherent measures of risk,” Mathematical Finance, vol. 9, no. 3, pp. 203–228, 1999.
[11] S. Gao, E. Frejinger, and M. Ben-Akiva, “Adaptive route choices in risky traffic networks: A prospect theory approach,” Transportation Research Part C: Emerging Technologies, vol. 18, no. 5, pp. 727–740, 2010.
[12] A. R. Hota and S. Sundaram, “Game-Theoretic Protection against Networked SIS Epidemics by Human Decision-Makers,” IFAC Papers Online, vol. 51, no. 34, pp. 145–150, 2019.
[13] C. Jie, P. L. A., M. Fu, S. Marcus, and C. Szepesvari, “Stochastic optimization in a Cumulative Prospect Theory framework,” IEEE Transactions on Automatic Control, vol. 63, no. 9, pp. 2867–2882, 2018.
[14] L. Wang, Q. Liu, and T. Yin, “Decision-making of investment in navigation safety improving schemes with application of Cumulative Prospect Theory,” Journal of Risk and Reliability, vol. 232, no. 6, pp. 710–724, 2018.
[15] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, 2011.
[16] B. Boardman, T. Harden, and S. Martínez, “Limited range spatial load balancing in non-convex environments using sampling-based motion planners,” Autonomous Robots, vol. 42, no. 8, pp. 1731–1748, 2018.
[17] W. Chi and M. Q. Meng, “Risk-RRT*: A robot motion planning algorithm for the human robot coexisting environment,” in Int. Conf. on Advanced Robotics, 2017, pp. 583–588.
[18] B. Englot, T. Shan, S. D. Bopardikar, and A. Speranzon, “Sampling-based min-max uncertainty path planning,” in IEEE Int. Conf. on Decision and Control, 2016, pp. 6863–6870.
[19] E. Schmerling, K. Leung, W. Vollprecht, and M. Pavone, “Multimodal Probabilistic Model-based planning for Human-Robot Interaction,” in IEEE Int. Conf. on Robotics and Automation, 2018, pp. 1–9.
[20] B. Luders, S. Karaman, and J. How, “Robust sampling-based motion planning with asymptotic optimality guarantees,” in AIAA Conf. on Guidance, Navigation and Control, August 2013.
[21] D. Devaurs, T. Simeon, and J. Cortes, “Optimal Path Planning in Complex Cost Spaces with Sampling-based Algorithms,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 2, pp. 415–424, 2016.
[22] T. Shan and B. Englot, “Sampling-based minimum risk path planning in multiobjective configuration spaces,” in IEEE Int. Conf. on Decision and Control, 2015, pp. 814–821.
[23] S. Dhami, The Foundations of Behavioral Economic Analysis. Oxford University Press, 2016.
[24] D. Prelec, “The probability weighting function,” Econometrica, vol. 66, no. 3, pp. 497–527, 1998.
[25] L. Tu, An Introduction to Manifolds. Springer, 2011.
[26] J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Transactions on Automatic Control, vol. 37, no. 3, pp. 332–341, 1992.
[27] ——, Introduction to Stochastic Search and Optimization. New York, 2003.