Between-Ride Routing for Private Transportation Services
Spurred by the growth of transportation network companies and increasing data capabilities, vehicle routing and ride-matching algorithms can improve the efficiency of private transportation services. However, existing routing solutions do not address where drivers should travel after dropping off a passenger and before receiving the next passenger ride request, i.e., during the between-ride period. We address this problem by developing an efficient algorithm to find the optimal policy for drivers between rides in order to maximize driver profits. We model the road network as a graph, and we show that the between-ride routing problem is equivalent to a stochastic shortest path problem, an infinite dynamic program with no discounting. We prove under reasonable assumptions that an optimal routing policy exists that avoids cycles; policies of this type can be efficiently found. We present an iterative approach to find an optimal routing policy. Our approach can account for various factors, including the frequency of passenger ride requests at different locations, traffic conditions, and surge pricing. We demonstrate the effectiveness of the approach by implementing it on road network data from Boston and New York City.
Advances in information technology and decision theory are helping revolutionize the market for private transportation services. Transportation Network Companies (TNCs) like Lyft and Uber utilized internet-enabled ride requests to quickly grow their market share in private transportation. From 2014 to 2016, Uber’s market share for ride-hailing services rose from 18% to 75% in the United States [Martha:2016]. Policy efforts increasingly support the concept of mobility as a service [jittrapirom2017mobility], increasing the economic importance of private transportation services. However, TNCs are also associated with concerns related to increased congestion and low driver wages.
Tools from decision theory can utilize new sources of consumer and driver data to help improve the efficiency of private transportation services. TNCs typically feature automated passenger ride matching and prices that vary geographically and temporally. Higher prices during peak demand periods are commonly referred to as surge prices. There is new opportunity to develop efficient algorithms that optimize surge pricing, improve system efficiency, and increase driver wages.
In this paper, we consider the situation where a driver seeks to maximize the expected value of their profits by optimizing routing decisions during the between-ride period, the time period after a driver drops off a passenger and before they receive their next ride request.
We formulate this problem as an undiscounted dynamic program with an uncertain number of decision stages; our formulation is equivalent to a stochastic shortest path problem. We prove that the between-ride problem has special characteristics; given reasonable assumptions, it can be efficiently solved as a dynamic program with a finite number of stages. We implement our algorithm using road network data from Boston and New York City, demonstrating that our approach is practical and scalable. Our implemented algorithm can directly advise drivers between rides, which could bring substantial benefits to transportation efficiency. It can also help increase driver wages and reduce costs.
Current research on algorithms for private transportation services focuses on the issue of driver-passenger matching. It encompasses a wide variety of scenarios, including but not limited to: matching algorithms for ride-sharing and carpooling services [agatz2012optimization] [SCHREIECK2016272] [kleiner2011mechanism], private transportation services with ride-sharing [alonso2017demand], and generalizations of the ride-matching problem to cases where passengers might be asked to transfer to another vehicle in the middle of their journey [masoud2017decomposition]. Algorithms for driver-passenger matching typically focus on the period after passengers have made a trip request. Dispatch algorithms can move drivers to higher value areas, but they do not provide routing suggestions to private drivers to optimize their personal profits.
Our approach complements the aforementioned work on ride-matching: our algorithm can be used to navigate drivers towards areas with high probability of new passenger requests by providing routing directions that align with their profit motive. These results can improve the effectiveness of existing algorithms by helping move unmatched drivers to higher value locations where they are more likely to be matched to a ride when a ride request is issued.
The issue of location-based and distance-based pricing policy is also an active area of research. Research demonstrates how spatial pricing policy can be implemented to achieve better matching between the supply of drivers and the demand of passengers [bimpikis2016spatial]. Another study investigates the differences between dynamic and static pricing strategies [banerjee2015pricing]. Each of these papers focuses on pricing strategies, and drivers are assumed to move efficiently towards areas with higher demand and prices. It is unlikely that drivers are able to make optimal between-ride decisions without the assistance of routing technology; the between-ride routing problem is complicated, requiring the synthesis of multiple data sources.
Again, our proposed algorithm complements existing research: it provides optimal paths for drivers between rides to maximize driver profits, given an existing pricing policy. If a pricing policy is efficient, our approach can ensure that the benefits of the pricing policy are attained. The combination of research areas can help create a more coordinated network system, reducing waiting time for passengers and allowing more driver-passenger matches to occur.
Section II describes the model and formulates the between-ride routing problem as a stochastic shortest path problem, a type of dynamic program. Section III proves that we can solve the between-ride routing problem by using a finite-horizon dynamic program, where the number of stages is no greater than the number of nodes in the transportation network. Section IV presents a practical algorithm for finding an optimal solution to the between-ride routing problem, and Section V describes an implementation of our approach using road network data from Boston and New York City.
II Technical Model
This section details the model for optimizing a between-ride route in order to maximize the expected value of profits for a driver. First, we detail the relevant parameters. Second, we explain the probability model whereby a driver receives ride requests at a particular location, with exponentially distributed waiting times, and we specify the expected value of a route through points on a 2-dimensional map. In the following Subsection II-A, we present a model of the road network as a directed, connected graph, taking the parameters of interest to be constant along each edge, and we state the driver's objective function and optimization problem in terms of discrete decisions on the graph.
Consider a specific driver that does not currently have a passenger. Let $w$ be the driver’s wage rate, i.e., the value of their time, and let $c$ be their fuel and vehicle cost per unit distance driven.
For each location $x$ on the map, let $R(x)$ be the expected value of the profit from a ride request the driver receives at location $x$. We assume that the random profit from the ride accounts for its various features, including length, price, and duration. For tractability, we assume that rides are undifferentiated aside from the expected value of their profit. In practice, ride opportunities can vary in other ways as well. For instance, rides at some locations could be more likely to end at high-value locations, which would increase the expected value of profit from subsequent rides. In practice, this could be incorporated into these results by adjusting $R(x)$ to account for relevant characteristics, but a proper formulation would be non-myopic with regard to the value of subsequent rides. Future research could focus on the case where the value of subsequent rides is directly incorporated into the model; this would lead to an interesting formulation over multiple potential rides.
Furthermore, let $\lambda(x)$ be the pickup rate at location $x$, i.e., the expected number of ride requests per minute. The values $R(x)$ and $\lambda(x)$ are indexed by $x$ because they can vary at different points in the transportation network. These quantities can change over time, but the algorithm assumes that they are static over the course of the between-ride decision making. This is a reasonable assumption given the short duration of between-ride routing; research from Denver suggests that the average between-ride period is less than 12 minutes, with a median of 7.5 minutes [henao2017impacts]. If inputs change or shocks occur, the algorithm can be rerun to optimize the remaining route during the between-ride period, allowing drivers to respond to real-time changes in demand and congestion.
Our model assumes that $R(x)$ and $\lambda(x)$ are not influenced by the driver’s route. It would also be interesting to consider the case where drivers' actions directly influence the price and demand for rides, for instance in a game-theoretic or mean-field model.
At any position $x$, we assume that ride requests arrive according to a Poisson process, so the time until the next passenger ride request is exponentially distributed. Consider a driver at position $x = \pi(0)$ who travels along the trajectory $\pi$ in continuous time, and let $T_\pi$ be the random time at which the driver receives their first ride request along route $\pi$. Then

$\Pr(T_\pi > t) = \exp\!\big({-\int_0^t \lambda(\pi(s))\,ds}\big). \qquad (1)$
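The arrival model above can be illustrated with a short simulation. The sketch below (hypothetical helper names; the piecewise-constant discretization of $\lambda$ along the route is an assumption made for illustration, not part of the model) samples the first-request time along a route and lets us check the survival probability against the exponential formula.

```python
import math
import random

def sample_first_request(segments, rng=random.random):
    """Sample the time of the first ride request along a route.

    `segments` is a list of (duration, rate) pairs: the route is divided
    into stretches where the request rate lambda is constant (a
    hypothetical discretization of the trajectory).  Returns the request
    time, or math.inf if no request arrives before the route ends.
    """
    elapsed = 0.0
    for duration, rate in segments:
        # Time to the next event of a Poisson process with this rate.
        wait = -math.log(rng()) / rate if rate > 0 else math.inf
        if wait < duration:
            return elapsed + wait
        elapsed += duration
    return math.inf  # no match before the route ends
```

For a route with segments of rates 0.5 and 1.0 and durations 1 and 2 minutes, the probability of finishing unmatched is $e^{-(0.5 + 2.0)} = e^{-2.5}$, which Monte Carlo sampling reproduces.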
In general, we assume that drivers must accept their first ride request, as is commonly required of drivers by TNCs. Let $J(\pi)$ be the expected value of the profit through the next ride period for the between-ride driver on trajectory $\pi$. $J(\pi)$ is given by

$J(\pi) = \mathbb{E}\big[R(\pi(T_\pi))\big] - w\,\mathbb{E}[T_\pi] - c\,\mathbb{E}\Big[\int_0^{T_\pi} \|\dot{\pi}(s)\|\,ds\Big]. \qquad (2)$
In Equation (2), the first term corresponds to the expected revenue earned at the match location $\pi(T_\pi)$, taking into account the likelihood of receiving a passenger match there. The second term is the cost due to time spent waiting, while the third term represents fuel costs. The trajectory $\pi$ is differentiable, so $\dot{\pi}(t)$ exists and is finite for all $t$. It represents the speed traveled along the route, so it accounts for local vehicle speeds and congestion.
This expression is intuitive: if a driver waits at location $x$ until they receive a passenger match, they receive a ride match eventually with probability 1. Thus, their expected revenue is $R(x)$. The cost of waiting is their wage rate $w$ times the expected amount of time until they receive a passenger match, $1/\lambda(x)$, so the expected profit from waiting at $x$ is

$R(x) - \frac{w}{\lambda(x)}. \qquad (3)$
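The waiting intuition above reduces to a one-line computation. A minimal sketch, assuming the symbols just described (ride profit, pickup rate, and wage rate; the function name is hypothetical):

```python
def waiting_value(R, lam, w):
    """Expected profit from waiting at a location until matched.

    R   -- expected profit of a ride requested here
    lam -- pickup rate (requests per minute); must be positive
    w   -- driver's wage rate (value of time per minute)

    Waiting incurs no fuel cost, so the expected profit is the ride
    profit minus the wage cost of the mean exponential wait, 1/lam.
    """
    return R - w / lam
```

For example, a $10 expected ride profit, a pickup rate of 0.5 requests per minute, and a wage rate of $0.40 per minute give an expected waiting profit of $9.20.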
In general, a driver at location $x$ seeks to choose a trajectory $\pi$ to maximize (2). In the following section, we will model the road network as a graph and formulate the decision problem as a dynamic program in discrete time and space. As we will show, this leads to a tractable decision problem that can be efficiently solved.
II-A Road Network Model
To solve the between-ride routing problem, we model the road network as a directed, connected graph $G = (V, E)$. Each edge $(i, j) \in E$ represents a section of a road, while the set of nodes $V$ includes, but is not limited to, all road intersections. (Nodes can also be used to model a specified point along a road segment where there is no intersection with other roads. These additional nodes are equivalent to intersections with only one or two options for directed travel: continue straight, or possibly make a u-turn.) Let $n = |V|$. Note that $E$ includes all loops; i.e., $(i, i) \in E$, $\forall i \in V$. A driver on the edge $(i, i)$ corresponds to the action of a driver waiting at node $i$.
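As a concrete illustration of this graph model, the sketch below builds a three-node toy network with explicit self-loops for waiting. All node and edge data are illustrative assumptions (invented rates, profits, times, and distances), not values from the paper.

```python
# A toy road network as a directed graph with explicit self-loops.
# Each node carries a local ride-request rate and expected ride profit
# (illustrative numbers); each non-loop edge carries a travel time and
# distance.  A loop (i, i) models the action of waiting at node i.

nodes = {
    "A": {"lam": 0.2, "R": 8.0},
    "B": {"lam": 0.6, "R": 12.0},
    "C": {"lam": 0.1, "R": 5.0},
}

edges = {
    ("A", "A"): None,                        # wait at A
    ("B", "B"): None,                        # wait at B
    ("C", "C"): None,                        # wait at C
    ("A", "B"): {"time": 4.0, "dist": 2.0},
    ("B", "C"): {"time": 3.0, "dist": 1.5},
    ("C", "A"): {"time": 5.0, "dist": 3.0},
}

def successors(i):
    """Admissible next locations from node i (including waiting at i)."""
    return [j for (a, j) in edges if a == i]
```

Because every node has a self-loop, waiting is always an admissible choice alongside travel to adjacent nodes.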
The driver seeks to choose a route to maximize their expected profit over an infinite horizon:

$\max_{\mu}\; \mathbb{E}\Big[\sum_{k=0}^{\infty} g\big(s_k, \mu_k(s_k), s_{k+1}\big)\Big], \qquad (4)$

where $s_k$ is the driver's state at decision stage $k$, and the policy $\mu$, the transition values $g$, and the transition probabilities are defined below.
We can formulate the between-ride routing problem (4) as a stochastic shortest path problem, which is an infinite-horizon dynamic program and a type of Markov decision process (MDP). (Stochastic shortest path problems were first formulated by [eaton1962optimal]. Existing research [bertsekas1991analysis] extends the analysis to the case where transition values may be positive or negative, which is helpful for our analysis. The two-volume textbook [Bertv1][Bertv2] provides additional information on stochastic shortest path problems.) Let the state space be $S = V \cup \{\tau\}$, where we augment the set of nodes with a terminal state $\tau$. When a driver moves to state $\tau$, we say that they are no longer in the between-ride routing period, because they have either found a rider or stopped offering rides. Once the driver reaches state $\tau$, they derive zero additional cost or value, so the decision problem equivalently terminates.
The function $g(i, \mu(i), j)$ represents the value of transitioning to state $j$ from state $i$ under policy $\mu$, which prescribes an action for each state in each stage of the decision problem. It includes the potential value of receiving a ride match, less the costs associated with the driver’s time and fuel. The transition probabilities are stationary, and they are given by $p_{ij}(u) = \Pr(s_{k+1} = j \mid s_k = i,\ u_k = u)$.
At a specific state $i$, the driver chooses the next location $j$ with edge $(i, j) \in E$. For example, at node $i$, the driver could choose to stay at node $i$, turn right to travel to node $j$, or turn left to travel to node $k$ if $(i, i), (i, j), (i, k) \in E$. If $j \neq i$, then the time to traverse the edge from $i$ to $j$ is not a decision variable; it is given by the speed of traffic. If $j = i$, then we say that the driver is waiting at node $i$, and the driver can choose exactly how long to wait at node $i$ before making a subsequent decision.
Formally, the driver chooses an action $u \in U(i)$, where $U(i)$ is the set of admissible actions at state $i$. The action is a double, i.e., $u = (j, t)$. The driving decision $j$ is the driver’s choice of the next location. The waiting or travel time $t$ is chosen from the set $(0, \infty]$ if $j = i$. Otherwise, $t = t_{ij}$, the time required to travel from node $i$ to node $j$ along edge $(i, j)$ at the current speed of traffic, taking into account traffic and congestion.
Additionally, for all $i \in V$, there is an admissible action with $j = \tau$. This describes the case where the driver stops searching for another ride request and stops offering rides. This can happen, for instance, at the end of a shift or if prices are too low for the driver to keep searching for the next ride request.
Consider the transition probabilities and profits for a driver at node $i$. Let the driver elect action $u = (j, t)$ with $j \neq \tau$. If $j = i$, then $t$ is a decision variable; otherwise, $t = t_{ij}$. Along edge $(i, j)$, the constant $\lambda_{ij}$ represents the arrival rate for ride requests. Then, from (1),

$p_{i\tau}(u) = 1 - e^{-\lambda_{ij} t}, \qquad p_{ij}(u) = e^{-\lambda_{ij} t}. \qquad (5)$
Remember that if $j \neq i$, then $t = t_{ij}$, because drivers must move at the speed of traffic along a particular edge. (The original map can be augmented with a node in the middle of edges with curbside parking or waiting zones, in order to model the case where drivers can stop and wait along some edges.) In this case, there is a probability $1 - e^{-\lambda_{ij} t}$ that the driver will be matched with a ride; otherwise, they will move to node $j$ for the next decision stage after a duration $t$.
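The transition probabilities above, together with the conditional mean match time used in the profit expressions that follow, can be sketched directly from the exponential model. The function names are illustrative; the conditional-expectation identity is the standard one for an exponential random variable truncated at $t$.

```python
import math

def match_probability(lam, t):
    """Probability of receiving a request while spending time t on an
    edge with arrival rate lam, under the exponential model."""
    if math.isinf(t):
        return 1.0 if lam > 0 else 0.0
    return 1.0 - math.exp(-lam * t)

def mean_match_time(lam, t):
    """Expected request time T conditional on T <= t, for T ~ Exp(lam).

    Standard identity:
        E[T | T <= t] = 1/lam - t * exp(-lam*t) / (1 - exp(-lam*t)).
    As t -> infinity this tends to the unconditional mean 1/lam.
    """
    if math.isinf(t):
        return 1.0 / lam
    p = 1.0 - math.exp(-lam * t)
    return 1.0 / lam - t * math.exp(-lam * t) / p
```

For instance, two minutes on an edge with rate 0.5 gives a match probability of $1 - e^{-1} \approx 0.63$, and the conditional mean match time is strictly below the unconditional mean $1/\lambda$.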
A driver making the decision $u = (j, t)$ at node $i$ receives profit

$g(i, u, s_{k+1}) = \begin{cases} R_{ij} - \big(w + c\,d_{ij}/t\big)\,\bar{t}_{ij} & \text{if } s_{k+1} = \tau, \\ -\big(w + c\,d_{ij}/t\big)\,t & \text{if } s_{k+1} = j. \end{cases} \qquad (6)$
As before, the values $w$ and $c$ refer to the driver’s fixed wage rate and fuel/vehicle cost. The expected value of the ride revenue along each edge $(i, j)$ is known and stationary in the time period of interest, and is given by $R_{ij}$. The driver drives at a constant speed along each edge, given directly by the edge distance $d_{ij}$ divided by $t$.
The variable $\bar{t}_{ij}$ is the expected time until a match occurs along edge $(i, j)$, when the edge is traversed over a time period $t$, conditional on the fact that a match does occur in $[0, t]$; that is, $\bar{t}_{ij} = \mathbb{E}[T \mid T \le t]$, where $T$ is exponentially distributed with rate $\lambda_{ij}$.
From (6), we can write the expected value of the profit at any stage according to the transition probabilities and duration associated with the chosen action $u = (j, t)$, again with $j \neq \tau$:

$\mathbb{E}\big[g(i, u, s_{k+1})\big] = \big(1 - e^{-\lambda_{ij} t}\big)\Big(R_{ij} - \big(w + c\,d_{ij}/t\big)\,\bar{t}_{ij}\Big) - e^{-\lambda_{ij} t}\big(w + c\,d_{ij}/t\big)\,t. \qquad (7)$
Note that $d_{ii} = 0$ for loops, i.e., edges of the form $(i, i)$. The idea is that the only cost for drivers when they are waiting is due to their time, not due to gas or other per-distance vehicle costs. When $t = \infty$, we evaluate the associated transition probabilities and profits as the limits of the provided equations as $t$ goes to infinity; these limits exist for each of the provided expressions in (5), (6), and (7).
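The expected stage profit can be sketched as a single function covering both branches, including the $t \to \infty$ waiting limit. This is a sketch consistent with the terms described in the text (match branch, waiting-cost branch, zero distance on loops); the function name and argument order are assumptions.

```python
import math

def expected_stage_profit(lam, t, R, w, c, dist):
    """Expected profit of one decision stage: traverse (or wait on) an
    edge for time t, with request rate lam, expected ride profit R,
    wage rate w, fuel cost c per unit distance, and edge length dist
    (dist = 0 for waiting loops).

    With probability p the driver is matched at the conditional mean
    time tbar and collects R; otherwise they pay the full time and fuel
    cost of the stage.  A sketch of the expectation described in the
    text, not a verbatim transcription of the paper's expression.
    """
    if math.isinf(t):             # waiting until matched: p = 1, tbar = 1/lam
        return R - w / lam
    p = 1.0 - math.exp(-lam * t)
    tbar = 1.0 / lam - t * math.exp(-lam * t) / p
    cost_rate = w + c * dist / t  # time + fuel cost per unit time on the edge
    return p * (R - cost_rate * tbar) + (1.0 - p) * (-cost_rate * t)
```

In the waiting limit the expression collapses to the waiting profit from (3), since the match probability tends to 1 and the conditional mean match time tends to $1/\lambda$.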
Since $\tau$ is the terminal node, $p_{\tau\tau}(u) = 1$ and $g(\tau, u, \tau) = 0$ for any $u$. The driver receives value $R_{ij}$ when they transition to $\tau$ randomly by receiving a ride request. Once a driver reaches $\tau$, they have accepted a ride request or stopped offering rides. They receive no additional value, and the decision problem ends. As explained previously, at any node $i \in V$, the driver can elect to stop offering rides; mechanistically, this is performed as an action that moves directly to the terminal node without any reward. For $u = (\tau, t)$, $p_{i\tau}(u) = 1$ and $g(i, u, \tau) = 0$. The idea is that the driver can elect to stop offering rides at any time, which incurs no further cost but eliminates the opportunity of collecting revenue from a potential ride.
Going forward, we assume that local maxima of the expected waiting profit are attained at nodes. Specifically, consider an edge $(i, j) \in E$ with $i \neq j$; we require that the expected waiting profit at any point along the edge is no greater than its value at one of the endpoint nodes $i$ or $j$. (8)
From any input data, it is straightforward to ensure that a map meets this assumption by adding a node at any point along an edge whose waiting value is greater than that of each adjacent node.
The policy $\mu = \{\mu_0, \mu_1, \dots\}$ defines actions for each state in each decision period $k$. In principle, the policy can vary in each decision stage; the action taken at node $i$ in stage $k$ under policy $\mu$ is $\mu_k(i)$. However, we focus on stationary policies, where $\mu_k = \mu$ at each stage. As we will show, at least one of these stationary policies is optimal, justifying our narrow focus. Due to the focus on stationary policies, in the subsequent section we drop the stage subscript from our notation. Instead, $\mu$ refers to a stationary policy, and $\mu^d(i)$ and $\mu^t(i)$ refer to the actions (direction and waiting time) associated with policy $\mu$ at node $i$.
III Finite-Horizon Solution of the Between-Ride Problem
Optimal policies for infinite-horizon dynamic programs like (4) are typically found via the convergence of value iteration (VI) algorithms [Bertv1]. In general, this can lead to sub-optimal results despite extensive computation; this could limit their value for the between-ride routing problem, since driving suggestions should ideally be provided to multiple drivers in a network, and should be available very quickly after a driver drops off their previous passenger and enters the between-ride period. However, in this section, we show that the between-ride routing problem has special structure that allows it to be efficiently solved as a finite-horizon dynamic program with $n$ stages.
Since every node and edge has a non-zero probability of a match (we could relax this assumption, but in practice a match is possible anywhere, because drivers can be matched with passengers that are near their location; a driver can receive a match in a nearby residential neighborhood, for instance, while traveling along a highway), for any policy there is always a positive probability that the termination state will be reached.
Then, as explained by [Bertv1], the optimal values at nodes $i \in V$ satisfy Bellman’s equation:

$J^*(i) = \max_{u = (j, t) \in U(i)} \Big\{ \mathbb{E}\big[g(i, u, s_{k+1})\big] + p_{ij}(u)\,J^*(j) \Big\}. \qquad (9)$
First, we will show that there exists an optimal stationary policy where, for all states $i$, the action $\mu(i)$ attains the maximum of (9), and where the driver waits until they receive a ride request at any node where they choose to wait. Then we will show that this optimal stationary policy has no cycles, i.e., $s_k \neq s_{k'}$ with probability 1 for any distinct stages $k \neq k'$ before termination. Together, these results imply that an optimal policy can be found using a type of value iteration algorithm requiring at most $n$ steps.
Proposition 1. There exists a stationary policy $\mu^*$ that is optimal for the decision problem (4). Under this stationary policy, whenever a driver waits at a specific node, they plan to remain at that node until they receive a ride request, i.e., $\mu^d(i) = i$ implies $\mu^t(i) = \infty$.
First, let $\bar{U}(i)$ define a restricted action space for node $i$. We say that $(j, t) \in \bar{U}(i)$ if $(j, t) \in U(i)$ and $j \neq i$, and $(i, t) \in \bar{U}(i)$ only if $t = \infty$. Note that $\bar{U}(i) \subseteq U(i)$, $\forall i \in V$. This restricted action space implies that whenever a driver chooses to wait at a node, they will wait at that node until they receive a ride request.
The values of $J^*$ given by (9) are unchanged if we replace the constraint sets $U(i)$ with $\bar{U}(i)$. To see this, fix $J^*$ according to (9), and for each state $i$ fix an action $\mu(i)$ that attains the optimal value of (9). Define the set $W = \{i \in V : \mu^d(i) = i,\ \mu^t(i) < \infty\}$. Define an additional policy $\bar{\mu}$ such that $\bar{\mu}(i) = \mu(i)$ for $i \notin W$, and $\bar{\mu}^d(i) = i$ but $\bar{\mu}^t(i) = \infty$ for $i \in W$.
For an arbitrary policy $\mu$, suppose there is some node $i$ with $\mu^d(i) = i$ and $\mu^t(i) = t < \infty$. Then, with $u = (i, t)$, the memorylessness of the exponential waiting time implies that waiting at $i$ for a finite time $t$ and then continuing optimally achieves the same value as waiting at $i$ until a request arrives; this computation yields (10) and (11).
Subsequently, this implies that the optimal values are unchanged for all $i \in V$, since for each $i \notin W$, the objective function in (9) is unchanged. Therefore, we can restrict the decision space to $\bar{U}(i)$, and the same values $J^*$ as above satisfy

$J^*(i) = \max_{u = (j, t) \in \bar{U}(i)} \Big\{ \mathbb{E}\big[g(i, u, s_{k+1})\big] + p_{ij}(u)\,J^*(j) \Big\}. \qquad (12)$
For each state $i$, the optimization problem in (12) has a finite and discrete choice set: one action per outgoing edge, plus the option to stop offering rides. Therefore, there exists some $\mu^*$ such that, for all $i$, $\mu^*(i)$ attains the maximum in (12). This describes a stationary policy, because the objective function and constraints in (12) do not change across decision periods. We showed that $\mu^*$ also attains the maximum of (9). As shown by [Bertv1], this implies that $\mu^*$ is an optimal policy for the decision problem (4). ∎
Proposition 2. There exists an optimal stationary policy that meets the characteristics of Proposition 1 and has no cycles with probability 1, i.e., $s_k \neq s_{k'}$ for any state and any distinct decision stages $k \neq k'$ before termination.
Let $\mu^*$ refer to an optimal stationary policy that meets the criteria of Proposition 1. By examining (5), we see that $s_k = s_{k'}$ can occur under policy $\mu^*$ only if the policy admits a cycle (possibly of length 1): nodes $i_1, \dots, i_m$ such that under $\mu^*$ we have $\mu^d(i_1) = i_2$, $\mu^d(i_2) = i_3$, …, $\mu^d(i_m) = i_1$. For each node $i_l$ in the cycle, consider its waiting value, the expected profit from waiting at $i_l$ until matched, from (3).
Assume the policy admits a cycle. Without loss of generality, let $i_1$ be the node in the cycle with the maximum waiting value. (13)
Then, we can write Bellman’s equation for the policy around the cycle, expressing $J^*(i_1)$ in terms of the expected profit collected during one traversal of the cycle and the value $J^*(i_1)$ recovered upon returning to $i_1$ unmatched. (14)
Let $q$ be the probability of receiving a ride request anywhere along one traversal of the cycle, i.e., $q = 1 - \prod_{l=1}^{m} e^{-\lambda_{i_l i_{l+1}} t_{i_l i_{l+1}}}$, with indices taken modulo $m$. Then, solving the cycle equation for $J^*(i_1)$ yields (15).
Since $q < 1$, this implies that $J^*(i_1)$ is no greater than the expected profit from waiting at $i_1$ until a request arrives.
Let $\bar{\mu}$ define a policy with $\bar{\mu}(i) = \mu^*(i)$ for all $i \neq i_1$, and $\bar{\mu}^d(i_1) = i_1$ and $\bar{\mu}^t(i_1) = \infty$. Then, as shown in (11), the value of $\bar{\mu}$ at $i_1$ equals the expected profit from waiting at $i_1$ until matched. Therefore, considering also the optimality of policy $\mu^*$, the two policies achieve the same value at $i_1$. Due to this equality, and since $\bar{\mu}(i) = \mu^*(i)$ for all $i \neq i_1$, the values of the two policies agree at every node. Therefore, $\bar{\mu}$ is an optimal stationary policy. If $\bar{\mu}$ still contains a cycle (for instance, if $\mu^*$ contained multiple such cycles), this procedure can be repeated until the resulting policy has no such cycles; this requires at most $n$ repetitions.
Let $\bar{\mu}^*$ be the first policy constructed using (potentially multiple) iterations of the above procedure, starting from the original optimal stationary policy $\mu^*$, such that $\bar{\mu}^*$ has no cycles. Then, under $\bar{\mu}^*$, there do not exist nodes $i_1, \dots, i_m$ with $m \ge 2$ such that $\bar{\mu}^{*d}(i_1) = i_2$, $\bar{\mu}^{*d}(i_2) = i_3$, …, $\bar{\mu}^{*d}(i_m) = i_1$. That is, the construction eliminates all policy cycles with two or more nodes.
Furthermore, under the constructed policy, every node $i$ with $\mu^d(i) = i$ has $\mu^t(i) = \infty$: a driver who waits does so until matched, so waiting nodes generate no further decision stages. Thus, for any subset of nodes, containing an arbitrary number of nodes, no cycle can occur. Therefore, under the constructed policy, $s_k \neq s_{k'}$ with probability 1 for any distinct stages $k \neq k'$ before termination. Equivalently, there exists an optimal policy that has no cycles with probability 1. ∎
The resulting policy is an optimal stationary policy with no cycles. Therefore, every decision node is visited at most once, which implies that there are at most $n$ decision stages. Furthermore, there is an optimal stationary policy over these decision stages. Therefore, from any starting node $i$, we have that (4) is exactly solved by the finite-horizon recursion

$J_{k+1}(i) = \max_{u = (j, t)} \Big\{ \mathbb{E}\big[g(i, u, s_{k+1})\big] + p_{ij}(u)\,J_k(j) \Big\}, \qquad J_0(i) = R(i) - \frac{w}{\lambda(i)}, \qquad J^*(i) = J_n(i), \qquad (17)$

where the maximum is taken over the admissible actions of (12).
This recursion can be solved exactly by $n$ iterations of a Value Iteration (VI) algorithm, which we present in the following section.
IV Path-Finding Algorithm
This section describes the Value Iteration (VI) algorithm we developed for the between-ride problem to solve for the optimal values in equation (17) and, therefore, equation (4). First, we pre-process the network graph to ensure that it satisfies condition (8). This section then describes the algorithm and proves that it is optimal for the appropriately pre-processed map.
The main algorithmic steps can be described as follows:
Initially, we calculate the expected driver profit $R(i) - w/\lambda(i)$ from waiting at every location $i$, from (3).
We “relax” each edge iteratively to check whether traveling through it provides a more profitable path for drivers at the connecting vertices. We do this by iteratively applying (12) to every edge.
We terminate the algorithm when no better path is found after iterating through all edges.
The algorithm returns a provably optimal solution; this result is proven in the next subsection. As an added benefit, the algorithm simultaneously solves for the optimal path for all drivers in the road network, so the total runtime remains the same even if more drivers are added to the network.
The first algorithm finds the optimal next node for every node. After computing the values for all the nodes, we can easily find the optimal path from any starting node using the second algorithm, GETPATH, which traverses the next-node pointers in order.
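The two routines described above can be sketched as follows. This is a sketch of the relaxation scheme, not the paper's Algorithm 1 verbatim: function names, the dictionary-based graph encoding, and the use of the destination node's ride profit for matches along an edge are illustrative assumptions.

```python
import math

def optimal_policy(nodes, edges, w, c):
    """Bellman-Ford-style relaxation for the between-ride problem.

    nodes: {i: {"lam": pickup rate, "R": expected ride profit}}
    edges: {(i, j): {"lam": edge request rate, "time": t_ij, "dist": d_ij}}
           for non-loop edges; waiting loops are implicit in the
           initialization below.
    Returns (value, nxt): the optimal expected profit at each node and
    the optimal next node (nxt[i] == i means wait until matched).
    """
    # Initialization: the value of waiting at each node until matched.
    value = {i: d["R"] - w / d["lam"] for i, d in nodes.items()}
    nxt = {i: i for i in nodes}

    for _ in range(len(nodes)):              # at most n relaxation rounds
        changed = False
        for (i, j), e in edges.items():
            lam, t, dist = e["lam"], e["time"], e["dist"]
            p = 1.0 - math.exp(-lam * t)     # match probability on the edge
            tbar = 1.0 / lam - t * math.exp(-lam * t) / p
            cost_rate = w + c * dist / t     # time + fuel cost per unit time
            # Expected stage profit plus continuation value if unmatched.
            cand = (p * (nodes[j]["R"] - cost_rate * tbar)
                    + (1.0 - p) * (value[j] - cost_rate * t))
            if cand > value[i] + 1e-12:      # strictly better: relax the edge
                value[i], nxt[i] = cand, j
                changed = True
        if not changed:
            break                            # early termination
    return value, nxt

def get_path(nxt, start):
    """GETPATH: follow next-node pointers until the driver waits."""
    path, i = [start], start
    while nxt[i] != i and len(path) <= len(nxt):
        i = nxt[i]
        path.append(i)
    return path
```

On a toy two-node network where one node has a much higher waiting value, the relaxation redirects the low-value node toward its high-value neighbor, and GETPATH recovers the corresponding route.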
IV-A Proof of Correctness
From Section III, we have that from any starting node, there is an optimal policy that results in a path traversing at most $n$ edges in $G$ with probability 1. This result is used to establish a proof of the optimality of the path returned by the described algorithm.
From Section III, we see that there exists an optimal policy of (17) that is also optimal for (4). Therefore, we can focus on policies that are optimal over $n$ decision stages. These policies traverse a path of length less than or equal to $n$. This motivates the following definition:
Let $J_k(i)$ denote the optimal value attainable if you start from node $i$ with $k$ decision stages, i.e., you start at $i$ and travel through a path of length at most $k$.
Recall that $J^*(i)$ is the optimal value attainable, starting from $i$, considering all potential policies and paths of potentially infinite length. By definition of the optimal policy, $J_k(i) \le J^*(i)$ for all $i$ and $k$. Proposition 2 implies that $J_n(i) = J^*(i)$ for all $i$. Note also that each node is initialized with its waiting value $J_0(i)$, according to our initialization. This leads to the following Proposition:
After $k$ iterations of the for loops in line 4 of Algorithm 1 (i.e., after we relax all of the edges, repeatedly, $k$ times), the stored value $\hat{J}(i)$ of each node $i$ satisfies the following:
1. It is the value obtained from a valid policy.
2. It is larger than or equal to $J_k(i)$, the optimal value attainable from $i$ within $k$ decision stages.
To prove the first property, we use induction and consider what happens when a single edge is relaxed. For the base case, at the start each stored value equals $J_0(i)$, and hence it is the value of a valid policy (waiting at the node until a ride request is received). In the inductive case, suppose we relax edge $(i, j)$, changing the stored value of node $i$ to the candidate value defined in line 9 of Algorithm 1. From the previous step, the stored value at $j$ is associated with a valid policy. It thus follows that the new value at $i$ is the value of a valid policy, namely traversing edge $(i, j)$ and then following the policy associated with the stored value at $j$.
To prove the second property, we induct on the value of $k$. In our base case $k = 0$, we initialized each node with its waiting value $J_0(i)$, so the second property holds immediately. Then, for our inductive case, suppose the value $J_{k+1}(i)$ is achieved by a valid policy $\mu$. After we relax all of the edges, we aim to show through the inductive step that the stored value of node $i$ is at least $J_{k+1}(i)$. Without loss of generality, assume the action associated with node $i$ under the optimal policy with $k+1$ decision stages is to travel towards node $j$, i.e., the chosen direction from node $i$ in stage 1 is towards node $j$. Then, by the principle of optimality, $J_k(j)$ must be the value achieved by the same policy starting at $j$; otherwise, we would be able to achieve a better value at $i$ by selecting $j$ and then following the policy that corresponds to $J_k(j)$, which violates the optimality of $\mu$.
From this, we know that $J_{k+1}(i)$ combines the expected stage profit of edge $(i, j)$ with the continuation value $J_k(j)$. By our assumption in the induction step, the value stored at node $j$ after $k$ iterations of the for loop satisfies $\hat{J}(j) \ge J_k(j)$, because our algorithm only ever increases the stored values.
Then, in the $(k+1)$-th iteration, when we relax edge $(i, j)$ with $u = (j, t_{ij})$, we have the inequality

$\hat{J}(i) \;\ge\; \mathbb{E}\big[g(i, u, s_{k+1})\big] + p_{ij}(u)\,\hat{J}(j) \;\ge\; \mathbb{E}\big[g(i, u, s_{k+1})\big] + p_{ij}(u)\,J_k(j) \;=\; J_{k+1}(i),$
which completes the proof. ∎
After $n$ iterations of relaxing all edges, the value stored in each node satisfies $\hat{J}(i) \ge J_n(i)$; combined with the first property, which states that each stored value is achieved by a valid policy, this gives $\hat{J}(i) = J_n(i)$. As demonstrated in Proposition 2, $J_n(i) = J^*(i)$.
Finally, observe that the stored next-node decisions for each node provide the optimal stationary policy at each node $i \in V$. Therefore, by tracing the sequence of decisions, we can find a path that follows the optimal policy, starting from an initial node $i$.
The algorithm has a worst-case runtime of $O(n|E|)$, since relaxing each edge takes constant time, and we perform $n|E|$ total relaxations in lines 4 and 5 of Algorithm 1. We can assume a constant upper bound on the number of edges connected to any particular vertex in a transportation map, because the number of roads converging at any particular intersection has some upper bound that does not depend on the number of nodes, i.e., $|E| = O(n)$. Therefore, the worst-case runtime is $O(n^2)$. (In practice, there are additional improvements that can be used to decrease the runtime. Firstly, we can terminate the algorithm if the values are unchanged after a full iteration. Our runtime will then be $O(Ln)$, where $L$ is the maximum length of an optimal path in the road network. This greatly reduces the runtime, especially for large networks, since each optimal path typically spans only a fraction of the set of all nodes.)
To demonstrate the feasibility and scalability of our algorithm, we implemented it using road network data from Boston, MA, and New York, NY. We used open-source data from OpenStreetMap [openstreetmap] to obtain the road network data, and we used posted speed limits to set initial values for edge travel times. We divide the map into a grid and set experimental values for $R$ and $\lambda$ in each grid box. Future research could utilize TNC data to analyze the results and profit improvements for this algorithm with realistic values of $R$ and $\lambda$.
Figure 3 demonstrates an example output for the city of Boston. We recorded the time taken to run the algorithm, using a standard laptop with an Intel Core i7-5500U CPU @ 2.40GHz and 8GB RAM. Overall, the program took less than 3 seconds to run the optimization algorithm, demonstrating the feasibility of our approach. The relevant information is shown in Figure 4.
City: Boston, MA
  Network size: 12541 edges, 9072 vertices
  Top-right coordinates: 42.38N, 71.03W
  Bottom-left coordinates: 42.33N, 71.12W

City: New York, NY
  Network size: 20428 edges, 13973 vertices
  Top-right coordinates: 40.82N, 73.92W
  Bottom-left coordinates: 40.70N, 74.02W
We can also compare the optimal driver value obtained by our approach to the value obtained by other routing decisions. For example, we can compare the optimal value to the value associated with a route where the driver takes the shortest path towards the node with the highest expected value (we call this route the “shortest-route”). The shortest-route represents a reasonable heuristic route for drivers in the between-ride period: head towards the highest value location. In the base case, the expected value of profit from the next ride, averaged across all nodes in Boston, is $6.78 when driving along the optimal route, while that of the shortest-route is $6.44. The optimal between-ride solution provides a 5% average improvement. At certain nodes, the optimal route increases the expected value of profit by 25-50%.
The relative value of the optimal solution is significantly higher in periods of congestion. If we assume that average vehicle speeds are a fixed fraction of the posted speed limits, then the average nodal value for the optimal policy is $5.61, versus $4.99 for a policy that takes the shortest path to the highest value node. In this case, the presented algorithm allows a 12% improvement in driver profits. These calculations represent a back-of-the-envelope effort to test the value of our algorithm. Future research could use real-world price and traffic data in order to more accurately measure the benefits of our between-ride algorithm and to understand the conditions that influence its value.
This paper models the between-ride routing problem for private transportation providers. We seek to optimize routing to maximize the expected value of profits for a driver that does not currently have a passenger and who is awaiting their next ride request. Our algorithm can account for various factors, including the pickup rate at different locations, surge pricing, fuel costs, and traffic conditions.
We model the decision problem as a dynamic program with an uncertain number of stages before termination; this is equivalent to a stochastic shortest path problem. We show that under reasonable conditions, the between-ride problem can be solved to optimality by solving a simpler finite-horizon dynamic program. We present an algorithm using an iterative technique related to Value Iteration, and illustrate the feasibility of this algorithm by implementing it on road networks from Boston and New York City.
There are several interesting areas for future research related to the between-ride routing problem. Our algorithm focuses on a single between-ride period. Future research could focus on the driver’s optimization problem over multiple rides, considering variability in the probability distribution of ride destinations from different origins.
Our algorithm focuses on the case where the behavior of individual drivers does not substantively change the rate of passenger ride requests or the value of rides in different locations. When there are multiple drivers in the network, driver behavior could impact prices and pickup rates at different locations. In this case, the optimal between-ride behavior of drivers would anticipate the behavior and trajectory of other drivers. Extensions could use tools from game theory or mean field theory to develop optimal driver strategies that anticipate the decisions of other drivers.