Stochastic Model Predictive Control
for Autonomous Mobility on Demand
Abstract
This paper presents a stochastic model predictive control (MPC) algorithm that leverages short-term probabilistic forecasts for dispatching and rebalancing Autonomous Mobility-on-Demand (AMoD) systems, i.e., fleets of self-driving vehicles. We first present the core stochastic optimization problem in terms of a time-expanded network flow model. Then, to improve its tractability, we present two key relaxations. First, we replace the original stochastic problem with a Sample Average Approximation (SAA), and provide its performance guarantees. Second, we divide the controller into two submodules: the first assigns vehicles to existing customers, and the second redistributes vacant vehicles throughout the city. This enables the problem to be solved as two totally unimodular linear programs, allowing the controller to scale to large problem sizes. Finally, we test the proposed algorithm in two scenarios based on real data and show that it outperforms prior state-of-the-art algorithms. In particular, in a simulation using customer data from the ride-sharing company DiDi Chuxing, the algorithm presented here exhibits a 62.3 percent reduction in customer waiting time compared to state-of-the-art non-stochastic algorithms.
I Introduction
The last decade has witnessed a rapid transformation in urban mobility. On the one hand, Mobility-on-Demand (MoD) services such as ride-sharing (e.g., Uber and Lyft) and car-sharing (e.g., Zipcar, Car2Go) have become ubiquitous due to the convenience and flexibility of their services. On the other hand, the advent of self-driving vehicles promises to further revolutionize urban transportation. Indeed, some expect that the junction of these two paradigms, Autonomous Mobility-on-Demand (AMoD), will have such a profound impact that it will dramatically reduce personal vehicle ownership [1].
In particular, AMoD systems present a unique opportunity to address the widespread problem of vehicle imbalance: any uncontrolled MoD system will inevitably accumulate vehicles in some areas and deplete others [2], hampering the quality of service. Unlike existing MoD systems, an AMoD operator can order empty self-driving vehicles to rebalance themselves.
Accordingly, this opportunity has spurred the development of controllers that attempt to optimally rebalance AMoD systems in real time. However, as we discuss in the literature review, most of the existing controllers either ignore future demand, assume deterministic future demand, or do not scale to large systems. In particular, while travel demand follows relatively predictable patterns, it is subject to significant uncertainty due to externalities such as weather and traffic. Successful AMoD systems must cope with this uncertainty. Thus, the goal of this paper is to propose a stochastic model predictive control approach for vehicle rebalancing that leverages short-term travel demand forecasts while accounting for their uncertainty.
Literature Review. To keep this paper concise, we limit our review to work that specifically addresses AMoD systems, although similar ideas can be found in the MoD literature. We categorize prior work in real-time control of AMoD systems into two broad classes: i) reactive control methods that do not make assumptions about future demand, and ii) Model Predictive Control (MPC) algorithms that are able to leverage signals about future demand. Reactive, time-invariant methods range from simple bipartite matching to control methods based on fluidic frameworks. A good comparison of different reactive controllers can be found in [3] and [4], where, notably, both studies show that the controller first proposed in [5] performs competitively across tests. However, these controllers do not provide a natural way to leverage travel demand forecasts.
In contrast, time-varying MPC algorithms, such as those proposed in [3, 6, 7], provide a natural way to leverage travel forecasts. However, [3] suffers from growing computational complexity as the fleet size grows, and [3, 6] do not account for uncertainty in the forecasts. While the authors of [6] show impressive results in their experiments, it can be shown that the difference between the stochastic optimum and the certainty-equivalent one can be arbitrarily large. To address the stochasticity of demand, [7] proposes a distributionally robust approach leveraging semidefinite programming. However, their model makes a restrictive Markovian assumption that exchanges fidelity for tractability. Moreover, the authors do not address how to recover integer rebalancing tasks from the fractional strategy provided by the controller.
To the best of our knowledge, no existing AMoD controller i) exploits travel demand forecasts while accounting for their stochasticity, ii) produces actionable integer solutions for real-time control of AMoD systems, and iii) scales to large AMoD systems.
Statement of Contributions. The contributions of this paper are threefold. First, we develop a stochastic MPC algorithm that leverages travel demand forecasts and their uncertainties to assign and reposition empty self-driving vehicles in an AMoD system. Second, we provide high-probability bounds on the suboptimality of the proposed algorithm when competing against an oracle controller which knows the true distribution of customer demand. Third, we demonstrate through experiments that the proposed algorithm outperforms the aforementioned deterministic counterparts when the demand distribution has significant variance. In particular, on a DiDi Chuxing dataset, our controller yields a 62.3 percent reduction in customer waiting time compared to the work presented in [6].
Organization. The remainder of the paper is organized as follows. We introduce the AMoD rebalancing problem in Section II and formulate it as an explicit stochastic integer program using a Sample Average Approximation (SAA) approach in Section III. In Section IV we discuss approximation algorithms to rapidly solve such an integer program, while in Section V we leverage the presented results to design a stochastic MPC scheme for AMoD systems. In Section VI we compare the proposed MPC scheme against state-of-the-art algorithms using numerical simulations. Section VII concludes the paper with a brief discussion and remarks on future research directions.
II Model and Problem Formulation
In this section, we first present a stochastic, time-varying network flow model for AMoD systems that will serve as the basis for our control algorithms. Unlike in [6], the model does not assume perfect information about the future; instead, it assumes that customer travel demand follows an underlying distribution, which we may estimate from historical and recent data. Then, we present the optimization problem of interest: how to minimize vehicle movements while satisfying as much travel demand as possible. Finally, we end with a discussion of the merits and challenges of the model and problem formulation.
II-A Model
Let $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ be a weighted graph representing a road network, where $\mathcal{N}$ is the set of discrete regions (also referred to as stations), and the directed edges $\mathcal{E}$ represent the shortest routes between pairs of stations. We consider $\mathcal{G}$ to be fully connected, so there is a path between any pair of regions. Accordingly, let $n := |\mathcal{N}|$ denote the number of stations. We represent time in discrete intervals of fixed size $\Delta t$. The time it takes for a vehicle to travel from station $i$ to station $j$, denoted $\tau_{ij}$, is an integer multiple of $\Delta t$ for all pairs $(i, j)$.
At the current time, we consider a planning horizon of $T$ consecutive time intervals, i.e., $\mathcal{T} = \{1, \dots, T\}$. For notational convenience and without loss of generality, we will always assume that the beginning of the planning horizon is at time $t = 1$. For each time interval $t$ in $\mathcal{T}$, $d_{ij}^t$ represents the number of future passengers that want to go from station $i$ to station $j$ at time interval $t$. However, the travel demand is a random process. Thus, we assume that the travel demand within the time window is characterized by a probability distribution $p$. Additionally, $w_{ij}$ denotes the number of outstanding passengers who have already issued a request to travel from $i$ to $j$ at some time in the past but have not yet been serviced. Note that it is safe to assume that $w_{ij}$ is always known (since keeping track of waiting customers is relatively trivial) and, therefore, deterministic.
Within the same time window, there are $m$ self-driving vehicles which are either idling, serving a customer, or executing a rebalancing task. Thus, the availability of these vehicles is location- and time-dependent. Specifically, $v_i^1$ is the number of idle vehicles at the beginning of the time window at station $i$, and $v_i^t$ is the number of vehicles which are currently busy but will finish their current task and become available at time $t$ at station $i$. Thus, the total number of available vehicles in the system as a function of location and time is given by the cumulative sum $\sum_{s=1}^{t} v_i^s$.
Vehicle movements are captured by the decision variable $x$, i.e., $x_{ij}^t$ is the number of cars, rebalancing or serving customers, which depart from $i$ at time $t$ and travel to $j$ (with $x_{ii}^t$, where $\tau_{ii} = \Delta t$, representing vehicles that idle at station $i$). Note that vehicles must satisfy flow conservation: the number of vehicles arriving at a station at a particular time, plus those becoming available there, equals the number of departing vehicles. Formally:

$$v_i^t + \sum_{j \in \mathcal{N}} x_{ji}^{t - \tau_{ji}} = \sum_{j \in \mathcal{N}} x_{ij}^{t}, \qquad \forall i \in \mathcal{N},\; t \in \mathcal{T}. \tag{1}$$
Finally, the decision variable $y$ captures outstanding customers, such that $y_{ij}^t$ is the number of outstanding customers who wait until time $t$ to be transported from station $i$ to station $j$. All outstanding customers must be served within the planning horizon:

$$\sum_{t \in \mathcal{T}} y_{ij}^t = w_{ij}, \qquad \forall i, j \in \mathcal{N}. \tag{2}$$
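To make the bookkeeping behind the flow conservation constraint concrete, the following sketch checks that, at every station and time step, vehicles becoming available plus vehicles arriving equal vehicles departing, with idling modeled as a one-step self-loop. The variable names and the hand-built two-station plan are illustrative assumptions, not code or data from the paper.

```python
def conserves(x, avail, tau):
    """Check flow conservation at every (station, time) pair:
    newly available + arrivals == departures."""
    T, N = len(x), len(x[0])
    for t in range(T):
        for i in range(N):
            # vehicles that departed j at time t - tau[j][i] arrive at i at time t
            arrivals = sum(x[t - tau[j][i]][j][i]
                           for j in range(N) if t - tau[j][i] >= 0)
            if avail[t][i] + arrivals != sum(x[t][i]):
                return False
    return True

T, N = 3, 2
tau = [[1, 2], [2, 1]]            # travel times in whole time steps; tau[i][i] = 1 models idling
avail = [[2, 0], [0, 0], [0, 0]]  # vehicles becoming available per (time, station)
x = [[[0] * N for _ in range(N)] for _ in range(T)]
x[0][0][0] = 1   # one vehicle idles at station 0
x[0][0][1] = 1   # one vehicle drives 0 -> 1 (takes 2 steps)
x[1][0][0] = 1   # the idler idles again
x[2][0][0] = 1   # ... and again
x[2][1][1] = 1   # the 0 -> 1 vehicle arrives at t = 2 and idles
print(conserves(x, avail, tau))  # this hand-built plan is feasible
```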
II-B Problem Formulation
Our objective is to minimize a combination of i) the operational cost based on vehicle movement, ii) the waiting time for outstanding customers, and iii) the expected number of customers who upon arrival do not find an available vehicle in their region. Given the demand distribution $p$ and a vehicle availability state $v$, the goal is to solve the following optimization problem:
$$\min_{x, y} \; \sum_{t, i, j} \rho_{ij}\, x_{ij}^t + \sum_{t, i, j} q_{ij}^t\, y_{ij}^t + \mathbb{E}_{d \sim p}\Big[ \sum_{t, i, j} \ell_{ij}^t \max\big\{0,\; d_{ij}^t - (x_{ij}^t - y_{ij}^t)\big\} \Big] \tag{3}$$

s.t. (1), (2), and $x_{ij}^t, y_{ij}^t \in \mathbb{Z}_{\geq 0}$.
The first term in the objective, where $\rho_{ij} \geq 0$ is the cost of traversing edge $(i, j)$, is the operational cost, i.e., the cost of operating the fleet (including, e.g., fuel, maintenance, and depreciation) in proportion to total distance traveled. Similarly, the second term penalizes customer waiting times through a cost vector $q$, where $q_{ij}^t$ is the cost of making an outstanding customer wanting to travel between stations $i$ and $j$ wait until time interval $t$ to be served. The last term penalizes the expected mismatch between customer demand and vehicle supply; that is, $\ell_{ij}^t$ is the cost of not being able to serve a customer wanting to travel between $i$ and $j$ at time $t$. Finally, in addition to the previously mentioned constraints, $x$ and $y$ must be nonnegative integers, since fractional vehicles and customers are nonphysical.
II-C Discussion
There are two key challenges in solving (3). First, $p$ is a time-varying, high-dimensional probability distribution which is generally not known; hence, one cannot evaluate the objective function explicitly. Second, due to the integer constraints on $x$ and $y$, (3) is an instance of integer programming, which is NP-hard; thus, no polynomial-time algorithms are known, and the problem becomes computationally intractable for large inputs.
In the following sections, we present a series of relaxations that allow us to efficiently obtain solutions to a surrogate problem that approximates (3). Specifically, to address the unknown distribution in the objective function, we fit a conditional generative model on historical data to predict future demand given recent realizations of demand. To address the computational complexity of integer programming, we perform several relaxations to arrive at a linear programming surrogate problem. Finally, we present bounds on the optimality gap induced by making these relaxations.
III Sample Average Approximation Techniques
Since $p$ is an unknown, time-varying distribution, we cannot explicitly evaluate the objective in (3). To address this issue, we present an SAA problem whose objective function approximates the objective of (3) in Section III-A. In Section III-B we give sufficient conditions under which the solution to the SAA problem from Section III-A is near optimal for the original problem. We address the trade-off between solution accuracy and problem complexity in Section III-C.
III-A Sample Average Approximation for AMoD Control
Despite not knowing $p$, nor being able to sample from it, we have historical data drawn from $p$ that we use to train a conditional generative model $\hat{p}$ to mimic the behavior of $p$. With a generative model in hand, one can consider solving (3) with $\hat{p}$ in place of $p$.
However, in many cases solving a stochastic optimization problem exactly is not possible if the underlying distribution does not have a computationally tractable form. Many popular probabilistic generative models, such as Bayesian networks and Bayesian neural networks, fall into this category. To overcome this issue, we can sample from the generative model and replace expectations with Monte Carlo estimates to get approximate solutions, a method commonly referred to as Sample Average Approximation (SAA) [8, 9]. To this end, we generate $k$ samples $d^{(1)}, \dots, d^{(k)} \sim \hat{p}$ and approximate expectations under $\hat{p}$ with Monte Carlo estimates, i.e., $\mathbb{E}_{d \sim \hat{p}}[f(d)] \approx \frac{1}{k} \sum_{\ell=1}^{k} f(d^{(\ell)})$.
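As a toy illustration of this Monte Carlo substitution, the expected-shortfall term for a single region can be approximated by a sample average. The demand sampler below is a stand-in (a binomial draw), not the paper's trained generative model, and the numbers are purely illustrative.

```python
import random

random.seed(0)

def unserved(demand, supply):
    """Per-region penalty term: customers who find no available vehicle."""
    return max(0, demand - supply)

def sample_demand():
    # Stand-in generative model: Binomial(100, 0.1) demand, mean 10
    # (an assumption for illustration; the paper uses a trained model).
    return sum(random.random() < 0.1 for _ in range(100))

supply = 8
k = 2000
samples = [sample_demand() for _ in range(k)]
# SAA: replace E[max(0, d - s)] with the sample average over k draws.
saa_estimate = sum(unserved(d, supply) for d in samples) / k
print(round(saa_estimate, 2))
```

Larger `k` shrinks the Monte Carlo error at the usual $O(1/\sqrt{k})$ rate, which is the trade-off Section III-C returns to.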
Using this approximation, we consider the following SAA surrogate problem:

$$\min_{x, y, u} \; \sum_{t, i, j} \rho_{ij}\, x_{ij}^t + \sum_{t, i, j} q_{ij}^t\, y_{ij}^t + \frac{1}{k} \sum_{\ell=1}^{k} \sum_{t, i, j} \ell_{ij}^t\, u_{ij}^{t, \ell} \tag{4}$$

s.t. (1), (2), $u_{ij}^{t, \ell} \geq d_{ij}^{t, (\ell)} - (x_{ij}^t - y_{ij}^t)$, $u_{ij}^{t, \ell} \geq 0$, and $x_{ij}^t, y_{ij}^t \in \mathbb{Z}_{\geq 0}$,
where, in addition to the Monte Carlo estimate, we include a series of inequalities to make the objective function linear. Specifically, minimizing $\max\{0, z\}$ is equivalent to minimizing an auxiliary variable $u$ subject to the constraints $u \geq z$ and $u \geq 0$. The surrogate SAA problem (4) is directly solvable by off-the-shelf mixed integer linear programming (MILP) solvers.
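The epigraph trick can be checked on a toy single-station, single-period instance (illustrative numbers, not data from the paper): replacing each max term with an auxiliary variable that sits at its lower bound leaves the optimal value unchanged.

```python
# Tiny SAA instance: choose an integer supply x against 4 sampled demands.
samples = [3, 5, 8, 10]      # sampled demands d^(l)
rho, penalty = 1.0, 4.0      # per-vehicle cost and per-unserved-customer cost

def saa_objective(x):
    # rho * x + (1/k) * sum_l penalty * max(0, d_l - x)
    k = len(samples)
    return rho * x + penalty * sum(max(0, d - x) for d in samples) / k

def epigraph_objective(x):
    # Replace each max{0, d_l - x} with an auxiliary u_l >= d_l - x, u_l >= 0;
    # at the optimum u_l sits at max(0, d_l - x), so both objectives agree.
    k = len(samples)
    us = [max(0.0, d - x) for d in samples]
    return rho * x + penalty * sum(us) / k

best_x = min(range(0, 12), key=saa_objective)
print(best_x, saa_objective(best_x))
```

A MILP solver performs this same substitution implicitly: the `u` variables are free to decrease, so they settle exactly at the max terms.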
III-B Oracle Inequality Performance Guarantees for SAA
Sample Average Approximation is not guaranteed in general to provide asymptotically optimal solutions to the population problem as the number of samples goes to infinity. While the objective of an SAA problem converges pointwise to the population objective, if the convergence is not uniform, SAA may return solutions whose values do not converge to the optimal population value even as the number of samples goes to infinity. In this section, we compare the quality of the solutions to (3) and (4) when evaluated by the objective in (3). Specifically, we present a result stating that if $\hat{p}$ is close to $p$ in an appropriate sense and we use enough samples for the SAA in (4), then the obtained solution is, with high probability, provably near optimal for the original problem in (3) that we would have solved had we known $p$. Such a result is called an oracle inequality. We begin with the following lemma.
Lemma 1 (Continuity of Function Minima)

Let $f, g$ denote two real-valued functions that have finite global minima, i.e., both $\min_z f(z)$ and $\min_z g(z)$ exist. Then,

$$\big| \min_z f(z) - \min_z g(z) \big| \;\leq\; \sup_z \big| f(z) - g(z) \big|.$$
See Section A-A for a proof.
Applying this idea to the AMoD setting, let $(x^\star, y^\star)$ be a solution to (3), and $(\hat{x}, \hat{y})$ a solution to (4). If the uniform distance between the two objective functions is small, then $(\hat{x}, \hat{y})$ will be at most twice that distance worse than $(x^\star, y^\star)$ when evaluated by the objective of (3). It is then of interest to understand the conditions under which the SAA objective will be uniformly close to the true objective. Since the SAA objective is a random object, its error in estimating the true objective has two contributors: stochastic error and model error. Specifically, the stochastic error is the error induced by estimating expectations under $\hat{p}$ using SAA, and the model error is the error incurred when estimating the true distribution $p$ using $\hat{p}$. For the analysis, we will need the following definition.
Definition 1 (Subexponential Random Variables)

A random vector $X$ is subexponential with parameters $(\nu, b)$ if, for any unit vector $u$ and any $\lambda$ satisfying $|\lambda| \leq 1/b$, the following inequality holds:

$$\mathbb{E}\big[ e^{\lambda \langle u,\, X - \mathbb{E}[X] \rangle} \big] \;\leq\; e^{\nu^2 \lambda^2 / 2}.$$
Intuitively, a random variable is subexponential if its tails decay at least as fast as that of an exponential random variable.
Lemma 2 (Uniform Convergence for SAA)

Let $p$ be the true distribution of customer demand, $\hat{p}$ be the distribution of predicted customer demand, and let $J$ and $\hat{J}_k$ denote the objective of (3) under $p$ and the SAA objective of (4) formed with $k$ samples from $\hat{p}$, respectively. Assuming that $d \sim \hat{p}$ is subexponential, then for any $\delta \in (0, 1)$, with probability $1 - \delta$, the following holds:

$$\sup_{x, y} \big| J(x, y) - \hat{J}_k(x, y) \big| \;\leq\; O\Big( \sqrt{\tfrac{\log(1/\delta)}{k}} \Big) + C \cdot D(p, \hat{p}), \tag{5}$$

where $C > 0$ is a constant, and $D(\cdot, \cdot)$ represents a divergence between probability distributions which is nonnegative and zero if and only if its arguments are equal.
See Section A-B for a proof.
Note that the assumption of subexponential demand is not very restrictive. Indeed, many common distributions, including Gaussian, Poisson, chi-squared, exponential, geometric, and any bounded random variables, are all subexponential [11]. If we denote the solution to (3) as $(x^\star, y^\star)$ and the solution to (4) as $(\hat{x}, \hat{y})$, then, applying Lemmas 1 and 2, the following holds with probability at least $1 - \delta$:

$$J(\hat{x}, \hat{y}) \;\leq\; J(x^\star, y^\star) + 2 \Big( O\Big( \sqrt{\tfrac{\log(1/\delta)}{k}} \Big) + C \cdot D(p, \hat{p}) \Big).$$
III-C Computational Complexity
As shown in Lemma 2, the sampling error of (4) is $O(1/\sqrt{k})$, where $k$ is the number of samples used to form the SAA objective. On the other hand, the computational complexity of (4) is an increasing function of $k$, so in this section we discuss how the problem size of (4) depends on $k$. A naive implementation of (4) would allocate one decision variable $u_{ij}^{t, \ell}$ per sample, a linear dependence on $k$ which would lead to scalability issues, since integer programming is NP-hard in the worst case. However, note that in an optimal solution, we will have $u_{ij}^{t, \ell} = \max\{0, d_{ij}^{t, (\ell)} - (x_{ij}^t - y_{ij}^t)\}$. Thus, if for two samples $\ell, \ell'$ we have $d_{ij}^{t, (\ell)} = d_{ij}^{t, (\ell')}$, then the optimal solution has $u_{ij}^{t, \ell} = u_{ij}^{t, \ell'}$. In this case, solving (4) with the additional constraint $u_{ij}^{t, \ell} = u_{ij}^{t, \ell'}$ will still yield the same optimal value while reducing the number of decision variables by one. Therefore, for each trip type $(i, j, t)$, instead of needing $k$ decision variables, we only need $k_{ij}^t$ decision variables, where $k_{ij}^t$ is the number of unique values in the set $\{d_{ij}^{t, (1)}, \dots, d_{ij}^{t, (k)}\}$. The following lemma demonstrates the reduction in complexity achievable by this variable elimination procedure.
Lemma 3 (SAA Problem Size for Subexponential Demand)

Assume that $d \sim \hat{p}$ is subexponential with parameters $(\nu, b)$. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$, the number of distinct realizations of the customer demand among $k$ samples is $O(\log(k/\delta))$. Thus, as long as $1/\delta$ is not exponentially larger than $k$, the variable elimination procedure ensures that the number of decision variables scales as $O(\log k)$, as opposed to the linear scaling that the naive implementation would suggest.
See Section A-C for a proof.
Thus, with high probability, the number of decision variables will be logarithmic in $k$, an exponential improvement over the linear dependence of the naive implementation. This is especially important since using a large $k$ gives an objective function with less variance.
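The variable elimination procedure amounts to deduplicating sample values and weighting each distinct value by its frequency. The sketch below (hypothetical numbers, one trip type only) shows that the collapsed penalty matches the naive per-sample penalty exactly while using far fewer terms.

```python
from collections import Counter

# k sampled demands for one (origin, destination, time) triple; many repeats,
# since discrete demand counts concentrate around their mean.
samples = [4, 5, 5, 6, 5, 4, 7, 5, 6, 4, 5, 6]

# Naive SAA: one auxiliary variable u_l per sample (k variables).
# Collapsed SAA: one variable per *distinct* value, weighted by its frequency.
counts = Counter(samples)

def naive_penalty(x, per_customer_cost=1.0):
    k = len(samples)
    return per_customer_cost * sum(max(0, d - x) for d in samples) / k

def collapsed_penalty(x, per_customer_cost=1.0):
    k = len(samples)
    # identical samples share one max term, scaled by their multiplicity n
    return per_customer_cost * sum(n * max(0, d - x) for d, n in counts.items()) / k

print(len(counts), "variables instead of", len(samples))
```

Per Lemma 3, for subexponential demand the number of distinct values grows only logarithmically in the number of samples, which is what makes large `k` affordable.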
IV Scalable Integer Solutions via Totally Unimodular Linear Relaxations
Recall from (5) that increasing the number of samples $k$ used for the Monte Carlo estimate reduces the standard deviation of the random objective in (4), thereby increasing the quality of the algorithm's output. While we showed that the number of decision variables is only logarithmic in the sample size $k$, the problem is still NP-hard. Thus, increasing the number of samples used in (4) may not be tractable in large-scale settings. In this section, we propose a modified algorithm that solves a convex relaxation of (4) and scales to large problem sizes.
Our relaxation separately addresses the tasks of servicing existing customers and rebalancing vacant vehicles, which are jointly solved in (4). Note that, in the optimal solution, information about future customers can affect the scheduling of waiting customers and vice versa. Servicing existing customers and rebalancing vacant vehicles with two separate algorithms prevents this sharing of information and can lead to suboptimal solutions. Nevertheless, the procedure runs in polynomial time, as opposed to integer programming. It is important to note, however, that solutions to convex relaxations of combinatorial problems need not be integral, and in this case naive rounding techniques can lead to violations of the network flow constraints. We obtain integer solutions by showing that our convex relaxations are totally unimodular linear programs. A linear program with a totally unimodular constraint matrix and integer right-hand sides always has optimal solutions that are integer valued [12], which can thus be obtained using standard linear programming methods.
Network flow minimization problems are linear programs with constraints of the form (1), which preserve total unimodularity. In the case of problem (4), however, the inclusion of the constraints (2) breaks this totally unimodular structure; hence, solving a relaxation of (4) with the integrality constraint removed is not guaranteed to return an integer solution. Alternatively, if we first assign vehicles to service existing customers, then the problem of rebalancing the empty vehicles no longer has constraints of type (2) and becomes totally unimodular. Inspired by this fact, in Section IV-A we discuss a bipartite matching algorithm used to assign vacant vehicles to waiting customers, and in Section IV-B we solve a totally unimodular version of (4) to determine rebalancing tasks.
IV-A Bipartite Matching for Servicing Waiting Customers
We use a bipartite matching algorithm to pick up waiting customers in a way that minimizes the total waiting time. Specifically, the current state of the system is $(v, w)$, where $v_i$ is the number of vehicles currently available at station $i$, and $w_i$ is the number of outstanding customers at station $i$. The decision variable is a vector $u \in \mathbb{Z}_{\geq 0}^{n^2}$, where $u_{ij}$ is the number of vehicles sent from station $i$ to station $j$. Let $A = \mathbf{1}_n^\top \otimes I_n$, where $\mathbf{1}_n$ is the vector of all $1$'s in $\mathbb{R}^n$, $I_n$ is the identity matrix of size $n$, and $\otimes$ is the matrix Kronecker product; thus $Au$ is the vector of vehicle arrivals at each station under action $u$. To satisfy the customers, we want $Au \geq w$ elementwise. If this is not possible, we pay a cost of $\ell_j$ for every customer we do not pick up at station $j$; to capture this, we define a drop vector $s \geq 0$ satisfying $Au + s \geq w$. The cost vector $c$ is defined so that $c_{ij}$ is the travel time between stations $i$ and $j$. Thus, the optimal solution to the bipartite matching problem is obtained by solving the following linear program:

$$\min_{u \geq 0,\, s \geq 0} \; c^\top u + \ell^\top s \qquad \text{s.t.} \quad Au + s \geq w, \quad \textstyle\sum_{j} u_{ij} \leq v_i \;\; \forall i \in \mathcal{N}. \tag{6}$$
It can be shown that bipartite matching has the totally unimodular property, and, therefore, will return integer solutions when the constraints are also integer.
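For intuition, the following sketch brute-forces a two-station toy instance of the matching problem (hypothetical numbers; a real implementation would hand the LP to a solver, with total unimodularity guaranteeing that the LP optimum is already integral, so the enumeration and the LP agree).

```python
from itertools import product

# Toy instance: stations A, B hold idle vehicles; stations C, D hold customers.
v = [2, 1]            # idle vehicles at A, B
w = [1, 2]            # waiting customers at C, D
travel = [[3, 8],     # travel time from A to {C, D}
          [5, 2]]     # travel time from B to {C, D}
drop_cost = 100       # cost per customer left unserved (the drop vector's price)

def cost(u):
    served = [min(w[j], sum(u[i][j] for i in range(2))) for j in range(2)]
    dropped = sum(w[j] - served[j] for j in range(2))
    return sum(travel[i][j] * u[i][j] for i in range(2) for j in range(2)) \
        + drop_cost * dropped

best = None
for flat in product(range(3), repeat=4):          # enumerate integer flows
    u = [flat[:2], flat[2:]]
    if all(sum(u[i]) <= v[i] for i in range(2)):  # can't exceed the idle fleet
        c = cost(u)
        if best is None or c < best[0]:
            best = (c, u)
print(best)
```

The optimum sends one A vehicle to C, one A vehicle to D, and B's vehicle to D, serving everyone; dropping any customer would cost far more than any detour.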
IV-B Network Flow Optimization for Rebalancing Vehicles
To rebalance vacant vehicles in anticipation of future demand, we now solve (4) with $w = 0$ to obtain a rebalancing flow. We set $w = 0$ because the task of picking up outstanding customers is delegated to the bipartite matching algorithm and hence does not need to be considered here. In this case, we can relax the integer constraints on $x$ to obtain a totally unimodular linear program, as stated in Lemma 4.
Lemma 4 (Totally Unimodular SAA Rebalancing Problem)

Consider the following convex relaxation of (4), where $w = 0$ (and hence $y = 0$) and the integrality constraints are dropped:

$$\min_{x \geq 0,\, u \geq 0} \; \sum_{t, i, j} \rho_{ij}\, x_{ij}^t + \frac{1}{k} \sum_{\ell=1}^{k} \sum_{t, i, j} \ell_{ij}^t\, u_{ij}^{t, \ell} \qquad \text{s.t.} \quad (1), \quad u_{ij}^{t, \ell} \geq d_{ij}^{t, (\ell)} - x_{ij}^t. \tag{7}$$

This problem is totally unimodular.
See Section A-D for a proof.
Thus, in the setting where $w = 0$, the convex relaxation from (4) to (7) is tight, in the sense that the solution to the latter is feasible and optimal for the former. For practical use, the control strategy is to perform the tasks specified by the solutions to (6) and (7). The main strength of this approach is that both subproblems are efficiently solvable linear programs, as opposed to integer programs like (4), which can take orders of magnitude longer to solve in practice.
V Stochastic Optimization for Model Predictive Control of AMoD Systems
When controlling an autonomous fleet of cars in real time, a receding-horizon framework allows the controller to take advantage of new information as it is observed in the system. We propose a model predictive control approach to control AMoD systems online, whereby a controller periodically issues commands obtained from solutions to optimization problems. Algorithm 1 outlines the details of the MPC controller for one timestep. At every invocation, the controller queries the system to obtain information about the current vehicle state $v$, the current number of waiting customers $w$, and recent demand measurements. The controller then draws $k$ samples from $\hat{p}$ and uses those samples to solve a stochastic optimization problem. The solve mode specifies whether a solution results from integer programming (cf. Section III-A) or linear programming (cf. Section IV): in the former case, the controller solves the integer program specified by (4); otherwise, it solves the convex relaxation specified by (6) and (7). The controller executes the plan resulting from the optimization until the next invocation, at which point it repeats this process with updated information.
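A drastically simplified, self-contained sketch of one receding-horizon step is shown below. The two-station system, the mock forecaster, and the greedy one-move rebalancing rule are all stand-ins for illustration only; the paper's controller instead solves (4), or (6) and (7), at this point.

```python
import random

random.seed(1)
N = 2                      # stations
v = [4, 0]                 # idle vehicles per station

def sample_demand():
    # mock forecaster: one sampled arrival count per station for the next period
    return [random.randint(1, 3), random.randint(0, 2)]

def rebalance(v, demand_samples):
    # SAA-flavored stand-in for the rebalancing step: move vehicles toward the
    # station with the largest sample-average shortfall.
    k = len(demand_samples)
    mean_d = [sum(s[i] for s in demand_samples) / k for i in range(N)]
    shortfall = [max(0, mean_d[i] - v[i]) for i in range(N)]
    moves = [[0] * N for _ in range(N)]
    src = shortfall.index(min(shortfall))
    dst = shortfall.index(max(shortfall))
    if src != dst:
        moves[src][dst] = min(v[src], round(shortfall[dst]))
    return moves

# One MPC step: sample k forecasts, plan, execute; a real controller then
# waits for the next invocation and repeats with updated information.
k = 100
samples = [sample_demand() for _ in range(k)]
moves = rebalance(v, samples)
for i in range(N):
    for j in range(N):
        v[i] -= moves[i][j]
        v[j] += moves[i][j]
print(v, moves)
```

The essential structure (observe state, sample forecasts, optimize, execute, repeat) is the receding-horizon loop; only the inner optimizer changes between the MILP and LP variants.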
VI Numerical Experiments
In this section, we evaluate the performance of Algorithm 1 in an MPC framework. We simulate the operation of an AMoD system servicing requests from two different datasets and compare its performance to recent state-of-the-art algorithms. The AMoD system services trip requests in Hangzhou, China, from a dataset by the ride-sharing company DiDi Chuxing in the first experiment, and requests from the New York City Taxi and Limousine Commission dataset in the second experiment.
VI-A Scenarios
For Hangzhou, we leveraged a dataset provided by the Chinese ride-sharing company DiDi Chuxing. The dataset contains all trips requested by users from January 1 to January 21, 2016, for a total of around eight million trips. The dataset separates Hangzhou into 793 discretized regions; however, it contains only trips that started in a core subset of 66 regions. For simplicity, we disregard trips that do not start and end in this core subset (approximately one million trips). For each trip, the records provide the origin region, destination region, a unique customer ID, a unique driver ID, the start timestamp, and the price paid. The dataset contains neither geographic information about the location of the individual districts nor information on the duration of the trips. Thus, we used RideGuru [13] to estimate the travel time of each trip from the trip price, which in turn allowed us to infer average travel times between regions. For the simulation, we used the first 15 days to train the forecasting model, and the last day to test in simulation by "playing back" the historical demand.
The second scenario is based on the well-known New York City Taxi and Limousine Commission dataset (http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). It contains, among other data, all yellow cab taxi trips from 2009 to 2018 in New York City. For each trip, the start and end coordinates and timestamps are provided. For our simulation, we considered only the trips that started and ended in Manhattan. Additionally, we partitioned the island into 50 regions. We used the trips between December 22, 2011 and February 29, 2012 to train the forecasting model, and used the evening rush hour (18:00 to 20:00) of March 1, 2012 for testing in simulation.
VI-B Experimental Design
For each scenario, we simulate the operation of an AMoD system by "playing back" the historical demand at a 6-second resolution. That is, vehicle and customer states are updated in 6-second timesteps. If a customer arrives in a region where there is an available vehicle, the customer is assigned to that vehicle; otherwise, the customer joins the region's customer queue. A customer's trip duration corresponds to the travel time recorded in the dataset, while vehicle speeds are such that the travel time between any two region centroids corresponds to the average travel time between those respective regions in the dataset.
At fixed intervals, the simulation invokes an AMoD controller, which returns the rebalancing tasks for each region. These tasks, in turn, are assigned to idle vehicles as they become available. At the next invocation, unused tasks are discarded and the controller is invoked anew. We tested the following controllers:

- Reactive: a time-invariant reactive controller presented in [5], which rebalances vehicles in order to track uniform vehicle availability at all stations.

- MPC-LSTM-MILP: the model predictive controller presented in [6], which relies on point forecasts and mixed integer linear programming.

- MPC-LSTM-LP: a relaxation of the MPC-LSTM-MILP controller obtained via the ideas described in Section IV, i.e., by running two linear programs.

- MPC-LSTM-SAA: the controller implementing Algorithm 1 with sampled demand forecasts.

- MPC-Perfect: a noncausal gold standard where the MPC controller is given perfect forecasts instead of samples of predicted demand.

All MPC controllers use a planning horizon of 4 hours.
VI-C Forecasting
In these experiments, the generative model for Algorithm 1 first estimates the mean of the future demand using a Long Short-Term Memory (LSTM) neural network. The LSTM networks were trained on a subset of the data that does not include the test day; we trained a different network for each of the scenarios. Specifically, the LSTM takes as input the past 4 hours of observed customer demand and then predicts the expected demand for the next 2 hours. We assume that the demand follows a Poisson distribution. Moreover, to account for model uncertainty, we sample from the LSTM with dropout, a standard procedure to approximate Bayesian inference [14]. Thus, we draw mean-demand estimates from the LSTM using dropout, and then sample the demand predictions from a Poisson distribution whose mean is given by the corresponding LSTM output.
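The two-stage sampling scheme (a dropout forward pass for the mean, then a Poisson draw around it) can be sketched as follows. Here `lstm_mean_with_dropout` is a mock that jitters a base rate, standing in for the trained network with dropout left active; it is an assumption for illustration, not the paper's model.

```python
import random

random.seed(0)

def poisson(lam):
    # Knuth's algorithm; adequate for the small rates used in this sketch.
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def lstm_mean_with_dropout(base_rate=10.0):
    # Mock dropout pass: each call returns a slightly different mean rate,
    # reflecting model uncertainty (the real model is the trained LSTM).
    return base_rate * random.uniform(0.8, 1.2)

# Draw k demand samples: dropout pass -> rate, then Poisson draw around it.
k = 500
samples = [poisson(lstm_mean_with_dropout()) for _ in range(k)]
mean = sum(samples) / k
print(round(mean, 1))
```

The resulting samples carry both sources of spread: the Poisson noise of the demand itself and the dropout-induced variation of the predicted rate.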
VI-D Results
In the Hangzhou scenario, the MPC-LSTM-SAA controller, based on Algorithm 1, greatly outperforms the other controllers: it provides a 62.3% reduction in mean customer waiting time over the MPC-LSTM-MILP controller from [6], and a 96.7% reduction over Reactive (see Table I). Qualitatively, Figure 1 shows that the MPC-LSTM-SAA controller achieves its greatest improvement over MPC-LSTM-MILP at times of the day with relatively high variance in day-to-day travel demand. This suggests that the proposed algorithm's rebalancing strategy is better at handling future demand with high uncertainty than prior work. Naturally, handling uncertainty requires being prepared for a large variety of demand realizations. Thus, it is not unexpected that, as seen in Figure 1 and Table I, MPC-LSTM-SAA rebalances slightly more than MPC-LSTM-MILP; nonetheless, it still issues fewer rebalancing tasks than Reactive.
Moreover, the performance of the MPC-LSTM-MILP and MPC-LSTM-LP controllers is essentially the same, which suggests that the relaxations described in Section IV yield reliable runtimes without significantly sacrificing solution quality.
Table I: Results for the Hangzhou scenario.

Wait Times:      Mean     Median   99th Percentile   Reb. Tasks
Reactive         276.284  72.0     1890.0            139927
MPC-LSTM-MILP    24.149   0.0      582.0             40097
MPC-LSTM-LP      23.305   0.0      558.0             39687
MPC-LSTM-SAA     9.0799   0.0      264.0             68150
MPC-Perfect      5.527    0.0      168.0             32950
The New York City scenario also demonstrates the benefits of using stochastic optimization in the controller. Table II summarizes the results of this case study. While the 99th percentile wait time of the deterministic algorithm MPC-LSTM-LP is 32 percent smaller than that of Reactive, its mean waiting time is larger by 16 percent. Leveraging stochastic optimization, MPC-LSTM-SAA further improves the 99th percentile wait time of MPC-LSTM-LP by 17 percent and offers a 22 percent reduction in mean waiting time over Reactive. In summary, MPC-LSTM-SAA outperforms both Reactive and MPC-LSTM-LP in both mean and 99th percentile wait times. As a trade-off, both MPC-LSTM-LP and MPC-LSTM-SAA issue more rebalancing tasks than Reactive.
Table II: Results for the New York City scenario.

Wait Times:      Mean   Median   99th Percentile   Reb. Tasks
Reactive         19.15  0.0      732.0             7196
MPC-LSTM-LP      22.70  0.0      504.0             7907
MPC-LSTM-SAA     15.07  0.0      420.0             10952
MPC-Perfect      10.8   0.0      384.0             8356
VII Conclusions
In this paper, we developed a stochastic Model Predictive Control algorithm for AMoD systems that leverages uncertain travel demand forecasts. We discussed two variants of the proposed algorithm: one using integer programming, and a relaxed, linear programming approach that trades optimality for scalability. Through experiments, we showed that the latter algorithm outperforms state-of-the-art approaches in the presence of uncertainty.
Future work will incorporate traffic congestion by modeling a detailed road network with finite capacities, as is done in [15]. This, in turn, can be coupled with models for public transit to provide multimodal [16], real-time stochastic control of AMoD systems. Similarly, ongoing research is studying coordination between the power network and electric AMoD systems [17]. Such a setting would be particularly interesting since both travel and power demand are stochastic. Due to its central role in the proposed algorithm, another area of interest is the development of principled algorithms for predicting short-term demand. In particular, we will tackle the challenge of capturing spatio-temporal demand distributions in the face of myriad factors, such as weather, traffic, and vehicle availability. Finally, accurate forecasts will enable robust and risk-averse objectives.
References
 [1] D. Neil. Could self-driving cars spell the end of ownership? The Wall Street Journal. Available at http://www.wsj.com/articles/could-self-driving-cars-spell-the-end-of-ownership-1448986572.
 [2] D. K. George, “Stochastic modeling and decentralized control policies for large-scale vehicle sharing systems via closed queueing networks,” Ph.D. dissertation, The Ohio State University, 2012.
 [3] R. Zhang, F. Rossi, and M. Pavone, “Model predictive control of Autonomous Mobility-on-Demand systems,” in Proc. IEEE Conf. on Robotics and Automation, 2016.
 [4] S. Hörl, C. Ruch, F. Becker, E. Frazzoli, and K. W. Axhausen, “Fleet control algorithms for automated mobility: A simulation assessment for Zurich,” in 97th Annual Meeting of the Transportation Research Board, 2018.
 [5] M. Pavone, S. L. Smith, E. Frazzoli, and D. Rus, “Robotic load balancing for Mobility-on-Demand systems,” Int. Journal of Robotics Research, vol. 31, no. 7, pp. 839–854, 2012.
 [6] R. Iglesias, F. Rossi, K. Wang, D. Hallac, J. Leskovec, and M. Pavone, “Data-driven model predictive control of autonomous mobility-on-demand systems,” in Proc. IEEE Conf. on Robotics and Automation, 2018, in press.
 [7] F. Miao, S. Han, A. M. Hendawi, M. E. Khalefa, J. A. Stankovic, and G. J. Pappas, “Data-driven distributionally robust vehicle balancing using dynamic region partitions,” in Int. Conf. on Cyber-Physical Systems, 2017.
 [8] T. Homem-de-Mello and G. Bayraksan, “Monte Carlo sampling-based methods for stochastic optimization,” Surveys in Operations Research and Management Science, vol. 19, no. 1, pp. 56–85, 2014.
 [9] J. R. Birge and F. Louveaux, Introduction to Stochastic Programming. Springer Science & Business Media, 2011.
 [10] M. Tsao, R. Iglesias, and M. Pavone, “Stochastic model predictive control for autonomous mobility on demand,” extended version of ITSC 2018 paper. Available at https://arxiv.org/pdf/1804.11074.pdf.
 [11] R. Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018. [Online]. Available: http://www-personal.umich.edu/~romanv/papers/HDP-book/HDP-book.pdf
 [12] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows: Theory, Algorithms and Applications. Prentice Hall, 1993.
 [13] RideGuru. Fare estimates, rideshare questions & answers. Available at https://ride.guru/.
 [14] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in Proceedings of the 33rd International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger, Eds. PMLR, Jun. 2016, pp. 1050–1059. [Online]. Available: http://proceedings.mlr.press/v48/gal16.pdf
 [15] R. Zhang, F. Rossi, and M. Pavone, “Routing autonomous vehicles in congested transportation networks: Structural properties and coordination algorithms,” in Robotics: Science and Systems, 2016.
 [16] M. Salazar, F. Rossi, M. Schiffer, C. H. Onder, and M. Pavone, “On the interaction between autonomous mobility-on-demand and the public transportation systems,” in Proc. IEEE Int. Conf. on Intelligent Transportation Systems, 2018, submitted. Extended version available at https://arxiv.org/abs/1804.11278.
 [17] F. Rossi, “On the interaction between Autonomous Mobility-on-Demand systems and the built environment: Models and large scale coordination algorithms,” Ph.D. dissertation, Stanford University, Dept. of Aeronautics and Astronautics, 2018.
Appendix
A Proof of Lemma 1
Given define and . Let . We then see that
which is the desired result.
B Proof of Lemma 2
First recall the definition of the divergence between two probability distributions .
The function is nonnegative and is zero if and only if . Since is Lipschitz, we have:
Define , we have:
Since for any constant , we let .
The first inequality is due to Cauchy-Schwarz in , and the second inequality is due to the fact that for any random variable by the following calculations:
The second inequality is because is a 1-Lipschitz function. It is also possible to control the model error using the RMSE of the generative model , but that bound is weaker than the one presented here. For the standard deviation bound, we use the concentration of measure for subexponential random variables. A random variable is subexponential if there exist parameters so that for any , we have
The following probability bounds for subexponential random variables are well known: If is subexponential, then:
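Since the inline equations were lost in extraction, it may help to state the standard formulation explicitly (cf. [11]); the following is the usual (ν, b) parameterization, reproduced here as a reference rather than the paper's exact notation:

```latex
% Sub-exponential definition via the moment generating function:
\mathbb{E}\!\left[e^{\lambda (X - \mathbb{E}[X])}\right]
  \le e^{\nu^2 \lambda^2 / 2}
  \quad \text{for all } |\lambda| \le \tfrac{1}{b}.

% Resulting two-sided (Bernstein-type) tail bound:
\mathbb{P}\!\left(|X - \mathbb{E}[X]| \ge t\right)
  \le 2 \exp\!\left(-\tfrac{1}{2}\min\!\left\{\tfrac{t^2}{\nu^2},\, \tfrac{t}{b}\right\}\right),
  \quad t \ge 0.
```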
Let be samples from used to form the objective function. Let . 1-Lipschitz functions of subexponential random variables are also subexponential with the same parameters; thus are i.i.d. subexponential random variables. Hence, the objective of (4), is subexponential. Applying the first part of the probability bound, we see that:
for any . For any error tolerance, setting , for sufficiently large the bound evaluates to . However this inequality only applies to a particular pair of . Since and , can take at most many distinct values. Note that if we can always set without affecting performance because the system cannot pick up more than waiting customers at any time. Thus, we also have and hence can take at most values. Thus there are at most possible plans . Taking a union bound over all possible gives, with probability at least ,
Applying lemma 1 with this bound yields the desired result.
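The sample-average concentration argument above can be illustrated numerically: replacing an expectation with an empirical mean over i.i.d. draws yields an estimate whose error shrinks as the sample size grows. This is a toy sketch with an assumed exponential demand distribution and an arbitrary 1-Lipschitz cost, not the paper's actual demand model or objective:

```python
import math
import random

random.seed(0)

def cost(demand: float) -> float:
    """An arbitrary 1-Lipschitz cost: |cost(x) - cost(y)| <= |x - y|."""
    return min(demand, 5.0)

def saa_estimate(n_samples: int, rate: float = 1.0) -> float:
    """Sample Average Approximation of E[cost(D)] for D ~ Exp(rate)."""
    draws = [random.expovariate(rate) for _ in range(n_samples)]
    return sum(cost(d) for d in draws) / n_samples

# Closed form for comparison: E[min(D, 5)] = (1 - e^{-5*rate}) / rate.
truth = (1.0 - math.exp(-5.0)) / 1.0

small, large = saa_estimate(50), saa_estimate(50_000)
print(f"true {truth:.4f}  N=50: {small:.4f}  N=50000: {large:.4f}")
```

With more samples the SAA objective concentrates around the true expectation, which is exactly what the union-bound argument quantifies uniformly over all candidate plans.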
C Proof of Lemma 3
Here we use subexponential concentration inequalities to obtain a bound on the maxima and minima of i.i.d. subexponential random variables. Lemma 3 is related to the distribution of maxima of subexponential random variables. Let be i.i.d. zero-mean subexponential random variables. We proceed by the standard Chernoff bounding technique. For any , we have:
If then setting gives the tightest upper bound, in which case we have:
but recall that , meaning
thus for any , setting , the upper bound is equal to . Hence,
The concentration of the minimum is analogous, by noting that is also subexponential, and applying the above argument. Applying this to our problem, if the demand for each is subexponential, and samples are observed, then by the above argument, with probability at least all samples fall in the interval
But since these samples are integer valued, if they lie in an interval of size , then there can be at most distinct samples of the demand. Taking a union bound over all tuples , we have that with probability at least , for each tuple , the number of unique elements in is at most . Summing over all we then have that the total number of decision variables is at most . The number of decision variables is trivially at most , therefore taking the better of the two bounds yields the result.
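Since the displayed equations in the derivation above were lost, the standard Chernoff/union-bound chain for the maximum is reproduced here under the (ν, b) sub-exponential parameterization; this is the textbook form of the argument, not the paper's exact notation:

```latex
% Chernoff bound for a single zero-mean subexponential X_i, valid for 0 < \lambda \le 1/b:
\mathbb{P}(X_i \ge t) \le e^{-\lambda t}\,\mathbb{E}\!\left[e^{\lambda X_i}\right]
  \le \exp\!\left(-\lambda t + \tfrac{\nu^2 \lambda^2}{2}\right).

% For t \ge \nu^2 / b, the choice \lambda = 1/b gives
\mathbb{P}(X_i \ge t) \le \exp\!\left(-\tfrac{t}{b} + \tfrac{\nu^2}{2b^2}\right)
  \le \exp\!\left(-\tfrac{t}{2b}\right),

% and a union bound over the n variables yields
\mathbb{P}\!\left(\max_{1 \le i \le n} X_i \ge t\right) \le n \exp\!\left(-\tfrac{t}{2b}\right),

% which equals \delta for t = 2b \log(n / \delta).
```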
D Proof of Lemma 4
Define so that the decision variable for (7) is . To show that (7) is totally unimodular, it is necessary and sufficient to show that all extreme points of the constraint polyhedron are integer vectors. Recall that a point is an extreme point if and only if the matrix of active constraints has rank , where the active constraint matrix is the matrix whose rows are the equality constraints and active inequality constraints of the problem at . Since , a point is extreme if and only if has full column rank. We can express the active constraints as:
where are chosen so that is equivalent to the network flow constraints specified by (1), and are chosen so that represents the active inequality constraints , and/or . Noting that
has full column rank only if has full column rank, meaning must be an extreme point of the polyhedral constraints defined by . However, recalling that arises from network flow constraints and is a totally unimodular matrix, this immediately implies that must be an integer vector. For each tuple , the decision variable is subject to exactly two constraints: and . In any extreme point, at least one of these constraints is active. This can be shown via contradiction: if is extreme and, for some , both (i.e., all) constraints are inactive, then define I to be the index of in . Since there are no active constraints involving , the I-th column of is zero, and hence cannot have full column rank. Therefore . Since we showed is integer, and are integer, this implies that must be integer, finally implying that is integer. Since all extreme points are integer-valued, (7) is a totally unimodular linear program.
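The total-unimodularity property invoked above can be verified directly on a small instance: every square submatrix of the node-arc incidence matrix of a directed graph has determinant in {-1, 0, 1}, which is what guarantees integral extreme points. A brute-force check on a hypothetical 3-node network (illustrative only, not the paper's model):

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(n)
    )

def is_totally_unimodular(a):
    """True if every square submatrix of `a` has determinant in {-1, 0, 1}."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# Node-arc incidence matrix of a 3-node directed graph with arcs
# (1,2), (2,3), (1,3), (3,1): +1 at the tail node, -1 at the head node.
incidence = [
    [ 1,  0,  1, -1],
    [-1,  1,  0,  0],
    [ 0, -1, -1,  1],
]
print(is_totally_unimodular(incidence))
```

Incidence matrices of directed graphs are always totally unimodular, so the check prints True; a matrix such as [[1, 1], [1, -1]] fails it, since that 2x2 block has determinant -2.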