Data-Driven Robust Taxi Dispatch under Demand Uncertainties

Fei Miao, Shuo Han, Shan Lin, Qian Wang, John Stankovic, Abdeltawab Hendawi, Desheng Zhang, Tian He, George J. Pappas

This work was supported by NSF CPS-1239152, Project Title: CPS: Synergy: Collaborative Research: Multiple-Level Predictive Control of Mobile Cyber Physical Systems with Correlated Context, NSF (CNS-1239224) and TerraSwarm. Part of the results of this work appeared at the 54th IEEE Conference on Decision and Control, Osaka, Japan, December 2015 [19].

F. Miao is with the Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA 06269. S. Han is with the Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, Illinois, USA 60607. S. Lin is with the Department of Electrical and Computer Engineering, Stony Brook University, Long Island, NY, USA 11794. Q. Wang is with ADVANCE.AI, Beijing, China. J. Stankovic and A. Hendawi are with the Department of Computer Science, University of Virginia, Charlottesville, VA, USA, 22904. Email: {stankovic, hendawi} D. Zhang is with the Department of Computer Science at Rutgers University, NJ, USA 08854. T. He is with the Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA. G. J. Pappas is with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA 19014.

In modern taxi networks, large amounts of taxi occupancy status and location data are collected from networked in-vehicle sensors in real-time. They provide knowledge of system models on passenger demand and mobility patterns for efficient taxi dispatch and coordination strategies. Such approaches face new challenges: how to deal with uncertainties of predicted customer demand while fulfilling the system’s performance requirements, including minimizing taxis’ total idle mileage and maintaining service fairness across the whole city; how to formulate a computationally tractable problem. To address this problem, we develop a data-driven robust taxi dispatch framework to consider spatial-temporally correlated demand uncertainties. The robust vehicle dispatch problem we formulate is concave in the uncertain demand and convex in the decision variables. Uncertainty sets of random demand vectors are constructed from data based on theories in hypothesis testing, and provide a desired probabilistic guarantee level for the performance of robust taxi dispatch solutions. We prove equivalent computationally tractable forms of the robust dispatch problem using the minimax theorem and strong duality. Evaluations on four years of taxi trip data for New York City show that by selecting a probabilistic guarantee level at 75%, the average demand-supply ratio error is reduced by 31.7%, and the average total idle driving distance is reduced by 10.13% or about 20 million miles annually, compared with non-robust dispatch solutions.

I Introduction

Modern transportation systems are equipped with various sensing technologies for passenger and vehicle tracking, such as radio-frequency identification (RFID) and global positioning system (GPS). Sensing data collected from transportation systems provides us opportunities for understanding spatial-temporal patterns of passenger demand. Methods of predicting taxi-passenger demand [28, 22], travel time [15, 27, 3] and traveling speed [13, 2] according to traffic monitoring data have been developed.

Based on such rich spatial-temporal information about passenger mobility patterns and demand, many control and coordination solutions have been designed for intelligent transportation systems. Robotic mobility-on-demand systems that minimize the number of re-balancing trips [24, 30], and smart parking systems that allocate resources based on a driver’s cost function [14] have been proposed. Dispatch algorithms that aim to minimize customers’ waiting time [26, 17] or to reduce cruising miles [29] have been developed. In our previous work [20, 21], we designed a receding horizon control (RHC) framework that incorporates a predicted demand model and real-time sensing data. Considering future demand when making the current dispatch decisions helps to reduce autonomous vehicle balancing costs [30] and taxis’ total idle distance [21, 20]. Strategies for resource allocation depend on the model of demand in general, and the knowledge and assumptions about the demand affect the performance of the supply-providing approaches [9][23]. These works rely on precise passenger-demand models to make dispatch decisions.

However, passenger-demand models have intrinsic uncertainties that result from many factors, such as weather, passengers’ working schedules, and city events. Algorithms that do not consider these uncertainties can lead to inefficient dispatch services, resulting in imbalanced workloads and increased taxi idle mileage. Although robust optimization aims to minimize the worst-case cost under all possible random parameters, it sacrifices average system performance [1]. For a taxi dispatch system, it is essential to address the trade-off between the worst-case and the average dispatch costs under uncertain demand. A promising yet challenging approach is a robust dispatch framework with an uncertain demand model, called an uncertainty set, that captures spatial-temporal correlations of demand uncertainties, such that the robust optimal solution under this set provides a probabilistic guarantee for the dispatch cost (as defined in problem (12)).

In this work, we consider two aspects of a robust vehicle dispatch model given a taxi-operational records dataset: (1) how to formulate a robust resource allocation problem that dispatches vacant vehicles towards predicted uncertain demand, and (2) how to construct spatial-temporally correlated uncertain demand sets for this robust resource allocation problem without sacrificing too much average performance of the system. We first develop the objective and constraints of a robust dispatch problem considering spatial-temporally correlated demand uncertainties. The objective of a system-level optimal dispatch solution is to balance the workload of taxis in each region of the entire city with minimum total current and expected future idle cruising distance. We define an approximation of the balanced vehicle objective in this work, such that the robust vehicle dispatch problem is concave in the uncertain demand and convex in the decision variables. We then design a data-driven algorithm for constructing uncertain demand sets without assumptions about the true model of the demand vector. The construction algorithm is based on hypothesis testing theories [6] [11] [25]. However, how to apply these theories to spatial-temporally correlated transportation data and to the uncertainty sets of a robust vehicle resource allocation problem has not been explored before. To the best of our knowledge, this is the first work to design a robust vehicle dispatch model that provides a desired probabilistic guarantee using predictable and realistic demand uncertainty sets.

Furthermore, we explicitly design an algorithm to build demand uncertainty sets from data according to different probabilistic guarantee levels for the cost. With two types of uncertainty sets, the box type and the second-order-cone (SOC) type, we prove equivalent computationally tractable forms of the robust dispatch problem under these uncertain demand models via the minimax theorem and the strong duality theorem. The robust dispatch problem formulated in this work is convex over the decision variables and concave over the constructed uncertainty sets, with decision variables in the denominators. This form is not a standard form (i.e., linear programming (LP) or semi-definite programming (SDP) problems) that has already been covered by previous work [4, 6, 10]. With the proofs shown in this work, both system performance and computational tractability are guaranteed under spatial-temporal demand uncertainties. In the evaluations based on data, the average performance of the robust taxi dispatch solutions with the SOC type of uncertain demand set is better than that with the box (range) type of uncertainty set. Hence, it is critical to use a more complex type of uncertainty set, the SOC type, and the corresponding robust dispatch model we design in this work. The contributions of this work are:

  • We develop a robust optimization model for taxi dispatch systems under spatial-temporally correlated uncertainties of predicted demand, and define an approximation of the balanced vehicle objective. The robust optimization problem of approximately balancing vacant taxis with least total idle distance is concave in the uncertain demand, convex in the decision variables, and computationally tractable under multiple types of uncertainties.

  • We design a data-driven algorithm to construct uncertainty sets that provide a desired level of probabilistic guarantee for the robust taxi dispatch solutions.

  • We prove that there exist equivalent computationally tractable convex optimization forms for the robust dispatch problem with both polytope and second-order-cone (SOC) types of uncertainty sets constructed from data.

  • Evaluations on four years of taxi trip data in New York City show that the SOC type of uncertainty set provides a smaller average dispatch cost than the polytope type. The average demand-supply ratio mismatch is reduced by 31.7%, and the average total idle distance is reduced by 10.13% or about 20 million miles annually with robust dispatch solutions under the SOC type of uncertainty set.

The rest of the paper is organized as follows. The taxi dispatch problem is described and formulated as a robust optimization problem given a closed and convex uncertainty set in Section II. We design an algorithm for constructing uncertain demand sets based on taxi operational records data in Section III. Equivalent computationally tractable forms of the robust taxi dispatch problem given different forms of uncertainty sets are proved in Section IV. Evaluation results based on a real data set are shown in Section V. Concluding remarks are provided in Section VI.

II Problem Formulation

The goal of taxi dispatch is to direct vacant taxis towards current and predicted future requests with minimum total idle mileage. There are two objectives. One is to send more taxis towards regions with more requests, in order to reduce the mismatch between supply and demand across all regions in the city. The other is to reduce the total idle driving distance for picking up passengers in order to save cost. Involving predicted future demand when making current decisions helps to increase total profits, since drivers are able to travel to regions with better chances of picking up future passengers. In this section, we formulate a taxi dispatch problem with uncertainties in the predicted spatial-temporal patterns of demand. A typical monitoring and dispatch infrastructure is shown in Figure 1. The dispatch center periodically collects and stores real-time information such as GPS location, occupancy status and road conditions; dispatch solutions are sent to taxis via cellular radio.

Figure 1: A prototype of the taxi dispatch system

II-A Problem description

| Parameters of (11) | Description |
| $n$ | the number of regions |
| $\tau$ | model predicting time horizon |
| $r^k$ | the uncertain total number of requests at each region during time $k$ |
| $W$ | weight matrix, $W_{ij}$ is the distance from region $i$ to region $j$ |
| $P^k$ | probability matrix that describes taxi mobility patterns during one time slot |
| $L^1$ | the initial number of vacant taxis at each region provided by GPS and occupancy status data |
| $m$ | the upper bound of distance each taxi can drive idly for picking up a passenger |
| $\alpha$ | the power on the denominator of the cost function |
| $\beta$ | the weight factor of the objective function |
| Variables of (11) | Description |
| $X^k_{ij}$ | the number of taxis dispatched from region $i$ to region $j$ during time $k$ |
| $L^k$ | the number of vacant taxis at each region before dispatching at the beginning of time $k$ |
| Parameters of Algorithm 1 | Description |
| $r_c$ | the uncertain concatenated demand vector of $\tau$ consecutive time slots |
| $\tilde{r}_c(d_l, t, I_p)$ | one sample of $r_c$ according to sub-dataset $I_p$, records of date $d_l$ |
| $\alpha_h$ | significance level of a hypothesis testing |

Table I: Parameters and variables of taxi dispatch problem (11).

For computational efficiency, we assume that the entire city is divided into $n$ regions, and the time of one day is discretized into time slots indexed by $t$. The taxi dispatch decision is calculated in a receding horizon process, since considering future demand when making the current dispatch decisions helps to reduce resource allocation costs [30] and taxis’ total idle distance [20]. At time $t$, we consider the effects of the current decision on the following $\tau$ time slots. Only the dispatch solution for the first time slot of the horizon is implemented, and solutions for the remaining time slots are not materialized. When the time horizon rolls forward by one step from $t$ to $t+1$, information about vehicle locations and occupancy status is observed and updated, and we calculate a new dispatch solution for $t+1$.
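The receding horizon process described above can be sketched as a short loop. This is only a minimal illustration: the names `receding_horizon`, `observe`, `solve`, and `no_dispatch` are hypothetical, and the placeholder solver simply plans zero dispatch instead of solving the robust problem formulated later in this section.

```python
from typing import Callable, List

Matrix = List[List[float]]

def receding_horizon(
    num_slots: int,
    horizon: int,
    observe: Callable[[int], List[float]],
    solve: Callable[[List[float], int], List[Matrix]],
) -> List[Matrix]:
    """Return the dispatch matrices that are actually implemented."""
    implemented = []
    for t in range(num_slots):
        vacant = observe(t)            # refresh vacant-taxi counts from sensing data
        plan = solve(vacant, horizon)  # one dispatch matrix per slot of the horizon
        implemented.append(plan[0])    # implement only the first slot, drop the rest
    return implemented

def no_dispatch(vacant: List[float], horizon: int) -> List[Matrix]:
    """Placeholder solver (assumption): plan zero dispatch for every slot."""
    n = len(vacant)
    return [[[0.0] * n for _ in range(n)] for _ in range(horizon)]

# Toy run: 3 time slots, horizon of 4 slots, 2 regions.
plans = receding_horizon(3, 4, lambda t: [5.0, 2.0], no_dispatch)
```

In a real deployment the `solve` callback would be replaced by a solver for the robust dispatch problem; the loop structure stays the same.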

We define $r^k_i$ as the total number of requests within region $i$ during time $k$, where $k = 1, \dots, \tau$ and $\tau$ is the model predicting time horizon. We relax the integer constraint on $r^k_i$ to positive real values, since the integer constraint would make the robust dispatch problem in this section computationally intractable. The total number of requests at region $i$ may have patterns similar to those of its neighbors; for instance, during busy hours, several downtown regions may all have peak demand. Meanwhile, demands during several consecutive time slots $k = 1, \dots, \tau$ are temporally correlated. Typically, it is difficult to predict a deterministic value of the passenger demand of a region during a specific time. We define the spatial-temporally correlated uncertain demand by one closed and convex, or compact, set $\Delta$ as

$$ r_c = [(r^1)^T, (r^2)^T, \dots, (r^\tau)^T]^T \in \Delta, $$

where $r_c$ is called the concatenated demand vector and $(r^k)^T$ means the transpose of $r^k$. The closed, bounded, and convex form of $\Delta$ depends on the method used to construct the uncertainty set, which we describe in detail in Section III. Since $r^k$ is one component of $r_c$, the uncertainty set for demand at time $k$ is defined as a closed, convex set $\Delta_k$ that is a projection of $\Delta$:

$$ r^k \in \Delta_k := \{ r^k \mid \exists\, r^1, \dots, r^{k-1}, r^{k+1}, \dots, r^\tau \ \text{such that}\ r_c \in \Delta \}. $$
Note that the projection of a convex set onto some of its coordinates is also convex [8, Chapter 2.3.2].
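As a small illustration of the concatenation and projection described above, the sketch below stacks per-slot demand vectors into one concatenated vector and projects a box-shaped set onto the coordinates of a single time slot. The helper names `concatenate` and `project_box` are hypothetical, and the box shape is chosen only because its projection is a simple slice; general convex sets need more work.

```python
from typing import List, Tuple

def concatenate(demands: List[List[float]]) -> List[float]:
    """Stack per-slot demand vectors r^1, ..., r^tau into one long vector."""
    return [value for slot in demands for value in slot]

def project_box(
    lower: List[float], upper: List[float], k: int, n: int
) -> Tuple[List[float], List[float]]:
    """Project a box on the concatenated vector onto slot k (1-based).

    For a box, the projection keeps the bounds of the n coordinates
    that belong to slot k and discards the others.
    """
    start = (k - 1) * n
    return lower[start:start + n], upper[start:start + n]

# Two regions, two time slots.
r_c = concatenate([[3.0, 1.0], [4.0, 2.0]])
slot_2_bounds = project_box([0.0, 0.0, 1.0, 1.0], [5.0, 5.0, 9.0, 9.0], k=2, n=2)
```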

A robust dispatch model that decides the number of vacant taxis sent between each node pair according to the demand at each node and practical constraints is described by the network flow model in Figure 2. The edge weight of the graph represents the distance between two regions. Specifically, each region has an initial number of vacant taxis provided by real-time sensing information and an uncertain predicted demand. We define a non-negative decision variable matrix $X^k \in \mathbb{R}^{n \times n}$, $X^k_{ij} \ge 0$, where $X^k_{ij}$ is the number of vehicles dispatched from region $i$ to region $j$. We relax the integer constraint on $X^k_{ij}$ to a non-negative real constraint, since mixed integer programming is not computationally tractable with uncertain parameters. Each resource allocation decision is made by solving the following robust optimization problem

$$ \min_{X^k} \max_{r^k \in \Delta_k} \ J = J_D(X^k) + \beta J_E(X^k, r^k) \quad \text{s.t.} \quad X^k \in \mathcal{D}_c, \tag{1} $$

where $J_D$ is a convex cost function for allocating resources, $J_E$ is a function concave in $r^k$ and convex in $X^k$ that measures the service fairness of the resource allocation strategy, and $\mathcal{D}_c$ is a convex domain of the decision variables that describes the constraints. We define specific formulations of the objective and constraint functions in the rest of this section.

Figure 2: A network flow model of the robust taxi dispatch problem. A circle represents a region with region ID $i$. We omit the superscript $k$ of time since every parameter is for one time slot only. Uncertain demand is denoted by $r_i$, $L_i$ is the original number of vacant taxis before dispatch at region $i$, and $X_{ij}$ is a dispatch solution that sends $X_{ij}$ vacant taxis from region $i$ to region $j$ over the distance $W_{ij}$.

II-B Robust taxi dispatch problem formulation

Estimated cross-region idle-driving distance: When traversing from region $i$ to region $j$, taxi drivers bear the cost of cruising on the road without picking up a passenger until they reach the target region. Hence, we aim to minimize this kind of idle driving distance while dispatching taxis. We define the weight matrix of the network in Fig. 2 as $W \in \mathbb{R}^{n \times n}$, where $W_{ij}$ is the distance between region $i$ and region $j$. The across-region idle driving cost according to $X^k$ is

$$ J_D(X^k) = \sum_{i=1}^{n} \sum_{j=1}^{n} X^k_{ij} W_{ij}. \tag{2} $$

We assume that the region division method is time-invariant in this work, and $W$ is a constant matrix for the optimization problem formulation; for instance, the value of $W_{ij}$ represents the length of the shortest path on streets from the center of region $i$ to the center of region $j$. (Footnote 1: For control algorithms with a dynamic region division method, the distance matrix can be generalized to a time dependent matrix as well.)
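The across-region idle driving cost is a weighted sum of the dispatch matrix and the distance matrix. A minimal sketch, assuming both are given as square nested lists (the function name `idle_distance_cost` and the toy numbers are illustrative only):

```python
from typing import List

def idle_distance_cost(X: List[List[float]], W: List[List[float]]) -> float:
    """Total idle driving distance: sum over all pairs of X[i][j] * W[i][j]."""
    n = len(X)
    return sum(X[i][j] * W[i][j] for i in range(n) for j in range(n))

# Two regions that are 3 distance units apart:
# 2 taxis go from region 0 to region 1, and 1 taxi goes the other way.
X = [[0.0, 2.0], [1.0, 0.0]]
W = [[0.0, 3.0], [3.0, 0.0]]
cost = idle_distance_cost(X, W)
```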

The distance every taxi can drive during a limited time should be bounded by a threshold parameter $m$:

$$ X^k_{ij} W_{ij} \le m X^k_{ij}, \tag{3} $$

which is equivalent to

$$ X^k_{ij} \ge 0, \quad X^k_{ij} = 0 \ \text{ if } \ W_{ij} > m. \tag{4} $$

To explain this, assume the constraint (3) holds. If $X^k_{ij} > 0$ and $W_{ij} > m$, we have $X^k_{ij} W_{ij} > m X^k_{ij}$, which contradicts (3). The threshold $m$ is related to the length of the time slot and traffic conditions on streets. For instance, given an estimated average speed of cars in one city during time $k$ and a required upper limit on the idle driving time to reach a dispatched region, the value of $m$ should be the distance one taxi can drive within that time limit at the current average speed on the road.
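The relation between the threshold, the average speed, and the allowed idle driving time can be made concrete with a short sketch. The function names and the numbers (12 distance units per hour, a 10-minute limit) are assumptions made purely for illustration and are not values taken from the paper.

```python
from typing import List

def distance_threshold(avg_speed_per_hour: float, max_idle_minutes: float) -> float:
    """Distance one taxi can cover within the idle-time limit at average speed."""
    return avg_speed_per_hour * max_idle_minutes / 60.0

def respects_distance_bound(
    X: List[List[float]], W: List[List[float]], m: float
) -> bool:
    """Check that X is non-negative and dispatches only between pairs with W[i][j] <= m."""
    n = len(X)
    return all(
        X[i][j] >= 0 and (X[i][j] == 0 or W[i][j] <= m)
        for i in range(n)
        for j in range(n)
    )

m = distance_threshold(12.0, 10.0)   # hypothetical speed and time limit
near = respects_distance_bound([[0.0, 1.0], [0.0, 0.0]], [[0.0, 1.5], [1.5, 0.0]], m)
far = respects_distance_bound([[0.0, 1.0], [0.0, 0.0]], [[0.0, 3.0], [3.0, 0.0]], m)
```

Here `near` is true because the only used pair is within the threshold, while `far` is false because a taxi would be sent farther than `m`.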

Metric of service quality: For computational efficiency, we design the metric of service quality as a function concave in $r^k$ and convex in $X^k$ [4]. Besides the vacant taxis that traverse to region $i$ according to the matrix $X^k$, we define $L^k_i$ as the number of vacant taxis at region $i$ before dispatching at the beginning of time $k$, and $L^1_i$ is provided by real-time sensing information. We assume that the total number of vacant taxis is greater than the number of regions, i.e., $\sum_{i=1}^{n} L^k_i > n$, and each region should have at least one vacant taxi after dispatch. Then the total number of vacant taxis at region $i$ during time $k$ satisfies

$$ \sum_{j=1}^{n} X^k_{ji} - \sum_{j=1}^{n} X^k_{ij} + L^k_i \ge 1. \tag{5} $$
One service metric is fairness, meaning that the demand-supply ratio of each region equals that of the whole city. A balanced distribution of vacant taxis is an indication of good system performance from the perspective that a customer’s expected waiting time is short, as shown by a queuing theoretic model [30]. Meanwhile, a balanced demand-supply ratio means that regions with less demand will get fewer resources, and idle driving distance will be reduced in regions with more supply than demand if we pre-allocate possibly redundant supply to those regions in need. We aim to minimize the mismatch value, or the total difference between the local demand-supply ratio of each region and the global demand-supply ratio of the whole city, similar to the objective defined in [21, 20]:

$$ \sum_{i=1}^{n} \left| \frac{r^k_i}{\sum_{j=1}^{n} X^k_{ji} - \sum_{j=1}^{n} X^k_{ij} + L^k_i} - \frac{\sum_{j=1}^{n} r^k_j}{\sum_{j=1}^{n} L^k_j} \right|. \tag{6} $$
However, the function (6) is not concave in $r^k$ for any $X^k$. It is worth noting that we need a function concave in $r^k$ for any $X^k$, and convex in $X^k$ for any $r^k$, to make sure the robust optimization problem is computationally tractable. Hence, we define

$$ J_E(X^k, r^k) = \sum_{i=1}^{n} \frac{r^k_i}{\left( \sum_{j=1}^{n} X^k_{ji} - \sum_{j=1}^{n} X^k_{ij} + L^k_i \right)^{\alpha}}, \quad \alpha > 0, \tag{7} $$

as a service fairness metric to minimize. This is because we approximately minimize (6) by minimizing (7) under the constraints (4) and (5) with an $\alpha$ value chosen according to the desired approximation level, and the following Lemma explains this approximation.

Lemma 1

Given deterministic demand vectors and initial number of vacant vehicles before dispatch that satisfy , , , for any , any , , there exists an , such that the optimal solution by minimizing (7) under constraints (4) and (5) satisfies


Proof: See Appendix A-A. According to the proof, we can always choose $\alpha$ to be small enough (or close enough to $0$) in order to obtain a desired level of approximation. Hence, in the experiments of Section V, we numerically choose $\alpha$ based on simulation results. Therefore, with function (7), we map the objective of balancing supply according to demand across every region in the city to a computationally tractable function that is concave in the uncertain parameters and convex in the decision variables for a robust optimization problem.
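To make the relation between the ratio mismatch and its surrogate more tangible, the following sketch evaluates both on a toy two-region instance. It assumes the surrogate sums demand divided by post-dispatch supply raised to a small positive power; the function names, the demand values, and the choice of 0.1 for the power are illustrative assumptions, not values from the paper.

```python
from typing import List

def supply_after_dispatch(X: List[List[float]], L: List[float]) -> List[float]:
    """Vacant taxis per region after dispatch: inflow minus outflow plus initial."""
    n = len(L)
    return [
        sum(X[j][i] for j in range(n)) - sum(X[i][j] for j in range(n)) + L[i]
        for i in range(n)
    ]

def ratio_mismatch(r: List[float], S: List[float]) -> float:
    """Total gap between local demand-supply ratios and the global ratio."""
    global_ratio = sum(r) / sum(S)
    return sum(abs(ri / si - global_ratio) for ri, si in zip(r, S))

def surrogate_cost(r: List[float], S: List[float], alpha: float) -> float:
    """Surrogate fairness cost: linear in demand, convex in (positive) supply."""
    return sum(ri / si ** alpha for ri, si in zip(r, S))

r = [6.0, 2.0]                      # demand per region
L = [4.0, 4.0]                      # initial vacant taxis per region
stay = [[0.0, 0.0], [0.0, 0.0]]     # no dispatch
move = [[0.0, 0.0], [2.0, 0.0]]     # send 2 taxis from region 1 to region 0

S_stay = supply_after_dispatch(stay, L)
S_move = supply_after_dispatch(move, L)
```

On this instance the balanced dispatch `move` removes the mismatch entirely and also has the lower surrogate cost for the small power 0.1, which matches the intuition behind Lemma 1. With a power of 1 the two dispatches tie on the surrogate, which hints at why a small power is preferred.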

The number of initial vacant taxis $L^{k+1}$ depends on the number of vacant taxis at each region after dispatch during time $k$ and the mobility patterns of passengers during time $k$, while we do not directly control the latter. We define $P^k_{ij}$ as the probability that a taxi traverses from region $i$ to region $j$ and turns vacant again (after one or several drop-off events) at the beginning of time $k+1$, provided it is vacant at the beginning of time $k$. Methods of obtaining $P^k$ from data include, but are not limited to, modeling trip patterns of taxis [21] and autonomous mobility-on-demand systems [30]. Then the number of vacant taxis within each region by the end of time $k$ satisfies

$$ L^{k+1}_j = \sum_{i=1}^{n} P^k_{ij} \left( \sum_{l=1}^{n} X^k_{li} - \sum_{l=1}^{n} X^k_{il} + L^k_i \right), \quad j = 1, \dots, n. \tag{9} $$
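As a hedged illustration of this update, the sketch below propagates the post-dispatch supply through a mobility probability matrix to obtain the vacant-taxi counts of the next time slot. The function names and the toy matrix are assumptions; rows of the matrix are taken to sum to one so that the total number of vacant taxis is preserved.

```python
from typing import List

def supply_after_dispatch(X: List[List[float]], L: List[float]) -> List[float]:
    """Vacant taxis per region after dispatch: inflow minus outflow plus initial."""
    n = len(L)
    return [
        sum(X[j][i] for j in range(n)) - sum(X[i][j] for j in range(n)) + L[i]
        for i in range(n)
    ]

def next_vacant(
    X: List[List[float]], L: List[float], P: List[List[float]]
) -> List[float]:
    """Expected vacant taxis per region at the next slot.

    P[i][j] is read as the probability that a taxi vacant in region i
    is vacant in region j at the start of the next slot.
    """
    S = supply_after_dispatch(X, L)
    n = len(L)
    return [sum(S[i] * P[i][j] for i in range(n)) for j in range(n)]

no_move = [[0.0, 0.0], [0.0, 0.0]]
P = [[0.5, 0.5], [0.25, 0.75]]      # hypothetical mobility pattern
L_next = next_vacant(no_move, [4.0, 4.0], P)
```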


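Weighted-sum objective function: Since there exists a trade-off between the two objectives, we define a weighted sum with parameter $\beta$ of the two objectives, $J_D$ defined in (2) and $J_E$ defined in (7), as the objective function. Let $X^{1:\tau}$ and $L^{2:\tau}$ represent the decision variables $(X^1, \dots, X^\tau)$ and $(L^2, \dots, L^\tau)$. Without considering model uncertainties corresponding to $r^k$, a convex optimization form of the taxi dispatch problem is

$$ \min_{X^{1:\tau},\, L^{2:\tau}} \ J = \sum_{k=1}^{\tau} \left( J_D(X^k) + \beta J_E(X^k, r^k) \right) \quad \text{s.t.} \quad (4), (5), (9). \tag{10} $$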
Robust taxi dispatch problem formulation: We aim to find out a dispatch solution robust to an uncertain demand model in this work. For time , uncertain demand only affects the dispatch solutions of time , and dispatch solution at is related to uncertain demand at , similar to the multi-stage robust optimization problem in [7]. However, the control laws considered in [7] are polynomial in past-observed uncertainties; in this work, we do not restrict the decision variables to be any forms of previous-observed uncertain demands. The dispatch decisions are numerical optimal solution of a robust optimization problem. With a list of parameters and variables shown in Table I, considering both the current and future dispatch costs when making the current decisions, we define a robust taxi dispatch problem as the following


After obtaining an optimal solution of (11), we adjust the solution by rounding methods to get an integer number of taxis to be dispatched towards the corresponding regions. Rounding does not affect the optimality of the result much in practice, since the objective or cost function is related to the demand-supply ratio of each region. A feasible integer solution of (11) always exists, since $X^k_{ij} = 0$ is feasible. Although we cannot provide any theoretical guarantee on the suboptimality of the rounded integer solution, in the numerical experiments the costs under the integer solution after rounding and under the original real-valued optimal solution are comparable.
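The text does not specify which rounding method is used. As one possible choice, and purely as an assumption, the sketch below applies largest-remainder rounding to each row of a real-valued dispatch matrix, so that the number of taxis leaving each region stays as close as possible to the relaxed solution and zero entries remain zero.

```python
import math
from typing import List

def round_dispatch(X: List[List[float]]) -> List[List[int]]:
    """Round each row with the largest-remainder rule (hypothetical choice)."""
    rounded = []
    for row in X:
        floors = [math.floor(x) for x in row]
        fractions = [x - f for x, f in zip(row, floors)]
        # Number of extra taxis needed to match the rounded row total.
        leftover = max(0, round(sum(row)) - sum(floors))
        # Hand the extra taxis to the entries with the largest fractional parts.
        order = sorted(range(len(row)), key=lambda j: fractions[j], reverse=True)
        for j in order[:leftover]:
            floors[j] += 1
        rounded.append(floors)
    return rounded

integer_plan = round_dispatch([[0.0, 1.6, 0.4]])
```

Other rounding rules are equally possible; this one is shown only because it keeps row totals stable.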

III Algorithm for Constructing Uncertain Demand Sets

With many factors affecting taxi demand during different times within different areas of a city, explicitly describing the demand model is a strict requirement, and errors of the model will affect the performance of dispatch frameworks. Considering future demand and demand uncertainties helps to minimize the worst-case demand-supply ratio mismatch error and idle distance [21, 20]. It is then essential to construct a model that captures the spatial-temporal demand uncertainties and provides a probabilistic guarantee about the vehicle resource allocation cost. We construct demand uncertainty sets via Algorithm 1 in three steps: obtaining a sample set of $r_c$ from the original dataset and partitioning the sample set, bootstrapping a threshold for the test statistics according to the required probabilistic guarantee, and calculating the model of the uncertainty sets based on the thresholds.

III-A An uncertainty set with probabilistic guarantee

For convenience, we concisely denote all the variables of the taxi dispatch problem as $x$. Assume that we do not have knowledge about the true distribution $\mathbb{P}^*$ of the random demand vector $r_c$. With the objective function of problem (11), the probabilistic guarantee for the event that the true dispatch cost is smaller than the optimal dispatch cost is defined as the following chance-constrained problem


The constraint and objective function are concave in $r_c$ for any $x$, and convex in $x$ for any $r_c$. Without loss of generality about the objective and constraint functions, equivalently we aim to find solutions for


When it is difficult to explicitly estimate $\mathbb{P}^*$, we solve the following robust problem such that its optimal solutions satisfy the probabilistic guarantee requirement for (13)


Then $r_c$ of problem (14) can be any vector in the uncertainty set $\Delta$ instead of a random vector as in (13). The uncertainty set that keeps the optimal solution of (14) satisfying the constraints of problem (13) is defined as the following:

Problem 1

Construct an uncertainty set $\Delta$, given $\epsilon$ and samples of the random vector $r_c$, such that

(P1). The robust constraint (14) is computationally tractable.

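(P2). The set $\Delta$ implies a probabilistic guarantee for the true distribution $\mathbb{P}^*$ of the random vector $r_c$ at level $\epsilon$, that is, for any optimal solution $x^*$ and for any function $f(r_c, x)$ concave in $r_c$, we have the implication:

$$ \text{if } f(r_c, x^*) \le 0 \ \ \forall r_c \in \Delta, \ \text{ then } \ \mathbb{P}^*\big( f(r_c, x^*) \le 0 \big) \ge 1 - \epsilon. \tag{15} $$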
The given probabilistic guarantee level is related to the degree of conservativeness of the robust optimization problem.

III-B Aggregating demand and partitioning the sample set

The demands of every $\tau$ consecutive discretized time slots are concatenated into a vector $r_c$. The first step is to transform the original taxi operational data into a dataset of sampled vectors $\tilde{r}_c(d_l, t, I_p)$ of different dates $d_l$ for each time index $t$. For instance, assume we choose the length of each time slot as one hour, and the dataset records all trip information of taxis during each day. According to the start time and GPS coordinates of each pick-up event, we aggregate the total number of pick-up events during one hour at each region to get the samples.
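The aggregation step can be sketched as a simple count over pick-up events. The record layout used here, a tuple of date string, hour index, and region index, is an assumption for illustration; real trip records carry timestamps and GPS coordinates that first have to be mapped to a slot and a region. For simplicity the sketch builds one vector per date that covers all slots of the day rather than a window of consecutive slots.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def aggregate_demand(
    pickups: Iterable[Tuple[str, int, int]],
    n_regions: int,
    slots_per_day: int = 24,
) -> Dict[str, List[int]]:
    """Count pick-ups per (slot, region) and return one demand vector per date.

    The vector is ordered slot by slot, with n_regions entries per slot.
    """
    samples: Dict[str, List[int]] = defaultdict(
        lambda: [0] * (slots_per_day * n_regions)
    )
    for date, slot, region in pickups:
        samples[date][slot * n_regions + region] += 1
    return dict(samples)

# Hypothetical records: (date, slot index, region index).
records = [
    ("2013-01-01", 0, 1),
    ("2013-01-01", 0, 1),
    ("2013-01-01", 1, 0),
    ("2013-01-02", 0, 0),
]
daily = aggregate_demand(records, n_regions=2, slots_per_day=2)
```

Each value of `daily` is then one sample of the concatenated demand vector for its date.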

It is always possible to describe the support of the distribution of all samples contained in the dataset, even if they do not follow the same distribution, as explained in Figure 3. When there is prior knowledge or categorical information with which to partition the dataset into several subsets, we get a more accurate uncertainty set for each sub-dataset that provides the same probabilistic guarantee level, compared with the uncertainty set built from the entire dataset. Clustering algorithms with categorical information [16] are applicable for dataset partition when information besides pick-up events is available, such as weekdays/weekends, weather or traffic conditions. It is worth noting that if the uncertainty sets are built for a set of categories, then the robust dispatch problem requires the same categorical information to be available in real time, so that we can apply the uncertainty set of a category $I_p$ to find solutions when the current situation is considered to belong to $I_p$.

Figure 3: Intuition for partitioning the whole dataset. When the dataset includes data from three distributions and no prior knowledge is available, we can build a larger uncertainty set that describes the range of all samples in the dataset. The problem is that this uncertainty set is not accurate enough.

III-C Uncertainty Modeling

The basic idea for defining an uncertainty set is to find a threshold for a hypothesis test that is acceptable with respect to the given dataset and a required probabilistic guarantee level; the formula of the uncertainty set is then determined by the threshold value of the accepted hypothesis test. Given the original data, the null hypothesis $H_0$, and the test statistics $T$, we need to find a threshold that accepts $H_0$ at significance level $\alpha_h$ for each subset of sampled demand vectors. Since we do not assume that the marginal distributions of the elements of the vector $r_c$ are independent of each other, we apply two models from the robust optimization literature [6] [11] [25] that make no assumptions about the true distribution to the spatial-temporally correlated demand data.

III-C1 Box type of uncertainty demand sets built from marginal samples

One intuitive description about a random vector is to define a range for each element. For instance, consider the following multivariate hypothesis holds simultaneously for with given thresholds  [11]


Assume that we have $N$ random samples for each component of $r_c$, ordered in increasing value regardless of the original sampling order. We define the index $s$ by


and let if the corresponding set is empty. The test is rejected if . To construct an uncertainty set, we need an accepted hypothesis test. Hence, we set and . The following uncertainty set is then applied in this work based on the range hypothesis testing (16).

Proposition 1 ([6][11])

If defined by equation (17) satisfies that , then, with probability at least over the sample, the set


implies a probabilistic guarantee for at level .
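A box set of this kind can be sketched by taking a pair of order statistics per component. The sketch assumes that the order-statistic index `s` is already known and that the lower and upper bounds are the (N - s + 1)-th and s-th smallest sampled values of each component; both the function name and this exact pairing are assumptions made for illustration, since the index formula is not reproduced in the sketch.

```python
from typing import List, Tuple

def box_from_order_statistics(
    samples: List[List[float]], s: int
) -> Tuple[List[float], List[float]]:
    """Per-component bounds taken from the sorted samples.

    samples: N sampled demand vectors of equal length.
    s: 1-based order-statistic index with N - s + 1 < s <= N (assumed given).
    """
    N = len(samples)
    if not (N - s + 1 < s <= N):
        raise ValueError("s must satisfy N - s + 1 < s <= N")
    lower, upper = [], []
    for i in range(len(samples[0])):
        column = sorted(sample[i] for sample in samples)
        lower.append(column[N - s])   # (N - s + 1)-th smallest value
        upper.append(column[s - 1])   # s-th smallest value
    return lower, upper

demand_samples = [[3.0, 30.0], [1.0, 10.0], [5.0, 50.0], [2.0, 20.0], [4.0, 40.0]]
box_lower, box_upper = box_from_order_statistics(demand_samples, s=4)
```

Because each component is bounded separately, such a box does not express how a change in one component relates to the others.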

III-C2 SOC type of uncertainty set motivated by moment hypothesis testing

It is not easy to tell directly from the uncertainty set (18) how the other components are affected when the range of one component changes. To directly show the spatial-temporal correlations of the demand, we also apply hypothesis testing related to both the first and second moments of the true distribution of the random vector [25].


where and are the (unknown) true mean and covariance of , and are the estimated mean and covariance from data. Without knowledge of and , is rejected when the difference among the estimation of mean or covariance according to multiple times of samples is greater than the threshold, i.e., or , where is the estimated mean value of one experiment, and are the estimated mean and covariance from multiple experiments, and are the thresholds. The remaining problem is then to find the values of the thresholds such that hypothesis testing (19) holds given the dataset. The uncertainty set derived based on the moment hypothesis testing is defined in the following proposition.

Proposition 2 ([6][25])

With probability at least with respect to the sampling, the following uncertainty set implies a probabilistic guarantee level of for


where is a Cholesky decomposition.

When one component of $r_c$ increases or decreases, the expression (20) gives an intuition of how it affects the values of the other components of $r_c$.
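To illustrate how a covariance-based set couples the components, the sketch below checks whether a demand vector lies within a given radius of the estimated mean in the norm induced by the estimated covariance, using a Cholesky factor. This is a simplified ellipsoid test and not the full set of Proposition 2: the mean-shift threshold is ignored, and the radius is a hypothetical parameter supplied by the caller.

```python
import math
from typing import List

def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular factor C with A = C C^T (A must be positive definite)."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][j] = math.sqrt(A[i][i] - partial)
            else:
                C[i][j] = (A[i][j] - partial) / C[j][j]
    return C

def covariance_norm(r: List[float], mean: List[float], cov: List[List[float]]) -> float:
    """Norm of (r - mean) weighted by the inverse covariance."""
    C = cholesky(cov)
    d = [ri - mi for ri, mi in zip(r, mean)]
    z: List[float] = []
    for i in range(len(d)):  # forward substitution solves C z = d
        z.append((d[i] - sum(C[i][k] * z[k] for k in range(i))) / C[i][i])
    return math.sqrt(sum(v * v for v in z))

def in_ellipsoid(r, mean, cov, radius: float) -> bool:
    return covariance_norm(r, mean, cov) <= radius

mean = [10.0, 10.0]
cov = [[4.0, 2.0], [2.0, 3.0]]           # positively correlated demands
joint_rise = in_ellipsoid([12.0, 11.0], mean, cov, radius=1.5)
lone_rise = in_ellipsoid([16.0, 10.0], mean, cov, radius=1.5)
```

With positively correlated components, a joint rise of both demands stays inside the set, while a large rise of one component alone falls outside it.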

III-D Algorithm

With a threshold of the test statistics calculated via the given dataset, we then apply the formula (18) for constructing a box type of uncertainty set, and the formula (20) for an SOC type of uncertainty set, respectively. The following Algorithm 1 describes the complete process for constructing uncertain demand sets based on the original dataset.

Input: A dataset of taxi operational records
1. Demand aggregating and sample set partition
   Aggregate demand to get a sample set of the random demand vector $r_c$ from the original dataset. Partition the sample set into one subset for each time index $t$ and each category $I_p$, according to either prior knowledge or categorical information.
2. Bootstrapping thresholds for test statistics
   for each subset do
     Initialization: test statistics $T$, a null hypothesis $H_0$, the probabilistic guarantee level $\epsilon$, a significance level $\alpha_h$, the number of bootstrap iterations.
     Estimate the mean and covariance of the vector $r_c$ based on the subset.
     for each bootstrap iteration do
       (1). Re-sample data points from the subset with replacement.
       (2). Get the value of the test statistics based on the re-sampled data.
     end for
     (3). Get the thresholds of the significance level $\alpha_h$ for $H_0$.
   end for
3. Calculate the model of uncertainty sets
   Get the box type and the SOC type of uncertainty sets according to (18) and (20), respectively, for each subset.
Output: Uncertainty sets for problem (11)

Algorithm 1 Constructing uncertain demand sets

We do not restrict the method of estimating the mean and covariance matrices of a subset in step 2, and bootstrap is one method. For step (2), the process for the box type of uncertainty sets is: calculate the index $s$ that satisfies (17) with the given significance level, sort each component of the sampled vectors, and get the order statistics of each re-sampled set. For the SOC type, we calculate the mean and covariance of the re-sampled vectors in each bootstrap iteration.

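In step (3), the thresholds for the box type of uncertainty sets are, for each component, the order statistics of the bootstrapped upper bounds and of the bootstrapped lower bounds at the rank determined by the significance level $\alpha_h$. For the SOC type of uncertainty sets, we compare the mean and covariance of each bootstrap iteration with the mean and covariance estimated from the subset, and take the order statistics of the resulting differences at the rank determined by $\alpha_h$ as the thresholds for the mean and the covariance, respectively.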
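The bootstrap step for a mean-based threshold can be sketched as follows. The choice of test statistic (Euclidean distance between the re-sampled mean and the subset mean), the quantile rule, and all function names are assumptions for illustration; the covariance threshold would be obtained in the same way with a matrix norm.

```python
import math
import random
from typing import List

def mean_vector(samples: List[List[float]]) -> List[float]:
    """Component-wise mean of a list of equal-length vectors."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]

def bootstrap_mean_threshold(
    samples: List[List[float]], num_bootstrap: int, alpha_h: float, seed: int = 0
) -> float:
    """Bootstrap the (1 - alpha_h) quantile of the deviation of the re-sampled mean."""
    rng = random.Random(seed)
    center = mean_vector(samples)
    statistics = []
    for _ in range(num_bootstrap):
        # Re-sample with replacement, keeping the sample size unchanged.
        resample = [rng.choice(samples) for _ in samples]
        statistics.append(math.dist(mean_vector(resample), center))
    statistics.sort()
    index = max(0, math.ceil(num_bootstrap * (1.0 - alpha_h)) - 1)
    return statistics[min(index, num_bootstrap - 1)]

data = [[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [3.0, 3.0]]
loose = bootstrap_mean_threshold(data, 200, alpha_h=0.1)
tight = bootstrap_mean_threshold(data, 200, alpha_h=0.5)
```

A smaller significance level picks a higher quantile of the bootstrapped statistics, so `loose` is never smaller than `tight` for the same seed, which in turn leads to a larger and more conservative uncertainty set.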

In summary, to construct a spatial-temporal uncertain demand model for problem (11), in this section, we consider the taxi operational record of each day as one independent and identically distributed (i.i.d.) sample for the concatenated demand vector . By partitioning the entire dataset into several subsets according to categorical information such as weekdays and weekends, we are able to build uncertainty sets for each subset of data without additional assumptions about the true distribution of the spatial-temporal demand profile. We then design Algorithm 1 to construct box type and SOC type uncertainty sets from data that provide a desired probabilistic guarantee for the robust solutions.
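The effect of partitioning by categorical information can be illustrated with a small sketch. It is a simplification under stated assumptions: the box here is the plain component-wise minimum and maximum of each subset rather than the order-statistic bounds of (18), the data are synthetic, and the helper names `partition_by_day_type`, `box_bounds`, and `total_range` are hypothetical.

```python
from datetime import date, timedelta


def partition_by_day_type(daily_samples):
    """Split {date: demand_vector} into weekday and weekend subsets."""
    subsets = {"weekday": [], "weekend": []}
    for day, demand in daily_samples.items():
        key = "weekend" if day.weekday() >= 5 else "weekday"
        subsets[key].append(demand)
    return subsets


def box_bounds(samples):
    """Component-wise (lower, upper) bounds of a list of demand vectors."""
    dims = range(len(samples[0]))
    lower = [min(s[j] for s in samples) for j in dims]
    upper = [max(s[j] for s in samples) for j in dims]
    return lower, upper


def total_range(lower, upper):
    """Sum of the ranges of every component, a scalar size measure of a box."""
    return sum(u - l for l, u in zip(lower, upper))


# Two synthetic weeks of two-region demand (hypothetical numbers).
start = date(2013, 1, 7)
daily = {}
for i in range(14):
    day = start + timedelta(days=i)
    if day.weekday() >= 5:
        daily[day] = [60 + i, 90 + 2 * i]
    else:
        daily[day] = [100 + i, 50 + i]

subsets = partition_by_day_type(daily)
range_all = total_range(*box_bounds(list(daily.values())))
range_weekday = total_range(*box_bounds(subsets["weekday"]))
range_weekend = total_range(*box_bounds(subsets["weekend"]))
```

With min/max boxes, a subset can never produce a wider box than the full dataset, which is the intuition behind building one uncertainty set per subset.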

IV Computationally Tractable Formulations

In this section, we build equivalent computationally tractable formulations of problem (11) for the different uncertainty sets computed by Algorithm 1, so that the robust taxi dispatch problem considered in this work can be solved efficiently. Computational tractability of robust linear programming problems with ellipsoidal uncertainty sets is discussed in [4]. The process is to reformulate the constraints of the original problem as equivalent convex constraints that must hold for every element of the uncertainty set. The objective function of problem (11) is concave in the uncertain parameters and convex in the decision variables, with the decision variables appearing in the denominators; it is therefore not in the standard form of the linear programming (LP) or semi-definite programming (SDP) problems already covered by previous work [4, 6]. Hence, we prove one equivalent computationally tractable form of problem (11) for each uncertainty set constructed in Section III.

Only the components of objective functions in (11) include uncertain parameters, and the decision variables of the function are in the denominator of the function . The box type uncertainty set defined in (18) is a special form of polytope; hence, we first prove an equivalent standard-form convex optimization problem for (11) under a polytope uncertainty set, as follows.

Theorem 1

(Next step dispatch) Suppose the uncertainty set of problem (11) when is defined as the non-empty polytope , where we omit the superscripts for variables and parameters when no confusion arises. Then problem (11) with is equivalent to the following convex optimization problem


See Appendix A-B.
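The kind of duality step that a polytope reformulation relies on can be illustrated on a generic linear term. The notation below ($c$, $A$, $b$, $\lambda$, $t$) is illustrative and is not taken from the paper; in a dispatch setting, $c$ would collect coefficients that depend on the decision variables. For a non-empty bounded polytope, LP strong duality turns the inner maximization into a minimization, so a worst-case constraint becomes a finite set of linear constraints:

```latex
% Illustrative notation only: c, A, b, lambda, t are not the paper's symbols.
\max_{r \,:\, A r \le b} \; c^{\top} r
  \;=\; \min_{\lambda \ge 0,\; A^{\top}\lambda = c} \; b^{\top}\lambda ,
\qquad\text{hence}\qquad
\max_{r \,:\, A r \le b} c^{\top} r \le t
  \;\Longleftrightarrow\;
\exists\, \lambda \ge 0 :\; A^{\top}\lambda = c,\;\; b^{\top}\lambda \le t .
```

The right-hand side has no inner optimization, which is what makes the robust problem solvable with a standard convex solver.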

To directly use the demand uncertainty set that describes the spatial-temporal correlation of like (18) and (20) for the concatenated demand in problem (11), we first group the maximization over each together, which avoids the projection step for each individual . Furthermore, we can find the dual (a minimization problem) of the cost maximization problem over , and then efficiently solve (11), which minimizes the total cost during time under uncertain demand . Hence, we first prove that the minimax equality holds for the maximin problem over each pair of and for problem (11), and that (11) is equivalent to the robust optimization problem shown in the following lemma.

Lemma 2

(Minimax equality) Assume that the uncertainty sets and are compact and convex. Then the robust dispatch problem (11) is equivalent to the following robust dispatch problem


See Appendix A-C.
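A toy numerical check can make the minimax equality concrete. The scalar cost below is a hypothetical stand-in, not the dispatch objective: it is convex in the decision `x` (which appears in a denominator), linear and hence concave in the uncertain parameter `r`, and both variables range over compact intervals. Under these assumptions the min-max and max-min values coincide.

```python
def cost(x, r):
    """Toy cost: convex in x > 0, linear (hence concave) in r."""
    return r / x + (1.0 - r) * x


# Grids over the compact intervals x in [0.5, 2.0] and r in [0.0, 1.0].
xs = [0.5 + i / 100 for i in range(151)]
rs = [j / 100 for j in range(101)]

# Robust problem: choose x against the worst-case r.
minimax = min(max(cost(x, r) for r in rs) for x in xs)

# Swapped order: the adversary commits to r first, then x is chosen.
maximin = max(min(cost(x, r) for x in xs) for r in rs)

gap = abs(minimax - maximin)
```

Both values equal 1 here, attained at `x = 1` and `r = 0.5`. For a cost that is not convex-concave, the two orders can give different values, which is why the equality has to be proved before the order of optimization is exchanged.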

For the robust optimization problem (11), the computationally tractable convex form depends on the definition of the uncertainty sets. When the conditions of Lemma 2 hold, equivalent convex optimization forms of problem (11) are derived based on problem (22). For a multi-stage robust optimization problem that restricts the near-optimal control input of a linear dynamical system to be a polynomial of a certain degree in the previously observed uncertainties, an approximate semidefinite programming method for calculating the time-dependent control input is proposed in [7]. That method does not require the minimax equality to hold for the robust optimal control problem.

The box type uncertainty set (18) is a special form of polytope in which the uncertain demand model at different times of a day is described separately. The process of converting problem (11) to an equivalent computationally tractable convex form is similar to that for the one-stage robust optimization problem. The result is stated in the following lemma.

Lemma 3

If the uncertainty set for describes each demand vector separately as a non-empty polytope with the form


problem (11) is equivalent to the following convex optimization problem


See Appendix A-D1.

For the more general case in which the uncertainty sets for are temporally correlated, the following theorem and its proof give the equivalent computationally tractable convex form of (11).

Theorem 2

When is defined as the following non-empty polytope set


problem (11) is equivalent to the following convex optimization problem


See Appendix A-D2.

With an uncertain demand model defined as (20) for concatenated , the following theorem derives the equivalent computationally tractable form of problem (11).

Theorem 3

When the uncertainty set for is defined as the SOC form of (20), problem (11) is equivalent to the following convex optimization problem (27).


where is the concatenation of .


See Appendix A-E.
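To see why second-order cone uncertainty leads to a tractable form, consider a simplified ellipsoidal set rather than the exact set (20), which may contain additional terms. The symbols $c$, $\mu$, $\Sigma$, and $\Gamma$ below are illustrative assumptions, with $\Sigma$ positive definite. Substituting $r = \mu + \Sigma^{1/2} u$ with $\|u\|_2 \le \Gamma$ and applying the Cauchy-Schwarz inequality gives a closed-form worst case for a linear term:

```latex
% Simplified ellipsoidal set; symbols are illustrative, not the paper's set (20).
\max_{r \,:\, \|\Sigma^{-1/2}(r-\mu)\|_2 \le \Gamma} c^{\top} r
  \;=\; c^{\top}\mu + \max_{\|u\|_2 \le \Gamma} \big(\Sigma^{1/2} c\big)^{\top} u
  \;=\; c^{\top}\mu + \Gamma \,\big\|\Sigma^{1/2} c\big\|_2 .
```

The worst case is therefore a linear term plus a Euclidean norm, which is a second-order cone representable expression.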

It is worth noting that any optimal solution for problem (10) has a special form between any pair of regions .

Proposition 3

Assume is an optimal solution of (10), then any satisfies that for any pair of , at least one value of the two elements and is .


We prove by contradiction. Assume that one optimal solution has the form such that and . Without loss of generality, we assume that , and let

other elements of equal to . Then

Hence, we have All constraints are satisfied and is also a feasible solution for (11).

Next, we compare and . With , and , we have

Thus the partial cost , which contradicts the assumption that is an optimal solution. To summarize, we show that an optimal solution cannot have at the same time, and at least one of and should be .
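The argument can be illustrated numerically. The sketch below assumes that the idle-distance cost is a nonnegative weighted sum of the dispatch entries and that the remaining cost terms depend only on each region's net flow; the matrices `X` and `W` and the helper names are hypothetical. Cancelling the common part of two opposing flows keeps every region's net outflow unchanged while never increasing the idle distance.

```python
def cancel_opposing_flows(X):
    """Remove the common part of X[i][j] and X[j][i] for every region pair."""
    n = len(X)
    Y = [row[:] for row in X]
    for i in range(n):
        for j in range(i + 1, n):
            common = min(Y[i][j], Y[j][i])
            Y[i][j] -= common
            Y[j][i] -= common
    return Y


def net_outflow(X):
    """Taxis leaving region i minus taxis arriving at region i."""
    n = len(X)
    return [sum(X[i][j] for j in range(n)) - sum(X[j][i] for j in range(n))
            for i in range(n)]


def idle_distance(X, W):
    """Total idle distance for dispatch X under pairwise distances W."""
    n = len(X)
    return sum(X[i][j] * W[i][j] for i in range(n) for j in range(n))


# Hypothetical dispatch matrix with opposing flows between regions 0-1 and 0-2.
X = [[0, 5, 1],
     [2, 0, 0],
     [4, 0, 0]]
# Hypothetical symmetric distance matrix.
W = [[0, 3, 2],
     [3, 0, 4],
     [2, 4, 0]]

Y = cancel_opposing_flows(X)
```

Here the idle distance drops from 31 to 15 while the net outflow of every region stays the same, so the original dispatch with two-way flows could not have been optimal.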

With equivalent convex optimization forms under different uncertainty sets, the robust taxi dispatch problem (11) is computationally tractable and can be solved efficiently.

V Data-Driven Evaluations

We conduct data-driven evaluations based on four years of taxi trip data of New York City [12]. A summary of this data set is shown in Table II. In this data set, every record represents an individual taxi trip, which includes the GPS coordinates of the pick-up and drop-off locations, and the date and time (with precision of seconds) of pick-up and drop-off. The dispatch solutions based on different granularities of equal-area region partitions have been compared in [20], and other region partition methods are discussed in [18]. In the following experiments, we use an equal-area grid partition as a baseline, and compare the robust and non-robust solutions based on the same region partition method. One partition example for the map of the Manhattan area is shown in Figure 4, where we visualize the density of taxi passenger demand with the data we use for large-scale data-driven evaluations. The lighter the region, the higher the daily demand density, and the middle regions typically have higher density than the uptown and downtown regions. In this section, we construct uncertainty sets according to Algorithm 1, discuss factors that affect the modeling of the uncertainty set, and compare the optimal costs of the robust dispatch formulation (11) and the non-robust optimization form (10).
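The aggregation of trip records into regional demand can be sketched as follows. This is an illustration under assumptions, not the evaluation code: the bounding box `BOUNDS` is a hypothetical rough rectangle around Manhattan, equal latitude and longitude spacing is used as an approximation of equal-area cells at city scale, and the record layout `(pickup_time, lat, lon)` and function names are invented for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical bounding box: (lat_min, lat_max, lon_min, lon_max).
BOUNDS = (40.70, 40.88, -74.02, -73.93)


def grid_cell(lat, lon, bounds, n_rows, n_cols):
    """Map a GPS point to a grid cell index, or None if outside the box."""
    lat_min, lat_max, lon_min, lon_max = bounds
    if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
        return None
    row = min(int((lat - lat_min) / (lat_max - lat_min) * n_rows), n_rows - 1)
    col = min(int((lon - lon_min) / (lon_max - lon_min) * n_cols), n_cols - 1)
    return row * n_cols + col


def hourly_demand(trips, bounds, n_rows, n_cols):
    """Count pick-ups per (date, hour, cell) from (pickup_time, lat, lon) records."""
    counts = Counter()
    for pickup_time, lat, lon in trips:
        cell = grid_cell(lat, lon, bounds, n_rows, n_cols)
        if cell is not None:
            counts[(pickup_time.date(), pickup_time.hour, cell)] += 1
    return counts


# Synthetic trip records (hypothetical values).
trips = [
    (datetime(2013, 1, 7, 8, 15, 0), 40.75, -73.99),
    (datetime(2013, 1, 7, 8, 40, 30), 40.76, -73.98),
    (datetime(2013, 1, 7, 9, 5, 0), 40.85, -73.94),
    (datetime(2013, 1, 7, 9, 10, 0), 41.00, -73.90),  # outside the box, ignored
]
demand = hourly_demand(trips, BOUNDS, n_rows=2, n_cols=2)
```

Collecting the counts of one day over all cells and hours gives one sample of the concatenated demand vector used to build the uncertainty sets.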

How vacant taxis are balanced across regions with different values: Figure 5 shows mismatch between supply and demand defined as (6) for different optimal solutions of minimizing defined in (7) for . With closer to , the optimal value of (6) is smaller. We choose for calculating optimal solutions of (11) and (10) in this section.

Figure 4: Map of Manhattan area in New York City.
Figure 5: Comparison of demand and supply mismatch values defined as (6) with different solutions for minimizing defined in (7) with in range . The value of function (6) under an optimal solution of is smaller with an closer to , which means the dispatch solution tends to be more balanced throughout the entire city.
Taxi Trip Data set Format
Collection Period Data Size Record Number ID Trip Time Trip Location
01/01/2010-12/31/2013 about million Date Start and end time GPS coordinates of start and end
Table II: New York City data in the evaluation section.
Figure 6: Comparison of box type of uncertainty sets constructed from all data and those constructed only based on trip records of weekdays. When keeping all parameters the same, by applying data of weekdays, the range of uncertainty set for each is smaller than that based on the whole dataset.
Figure 7: Comparison of box type of uncertainty sets constructed from all data and uncertainty sets constructed only based on trip records of weekends.

V-A Box type of uncertainty set

For all box type uncertainty sets shown in this subsection, constructed with the model described in Subsection III-C1, we set the confidence level of the hypothesis tests as , the number of bootstrap iterations as , and the number of randomly sampled data points (with replacement) for each bootstrap iteration as .

Partitioned dataset compared with non-partitioned dataset: We show the effects of partitioning the trip record dataset by weekdays and weekends in Figures 6 and 7. The whole city is partitioned into regions, the prediction time horizon is , where one time instant means one hour, , and every . Figures 6 and 7 show the lower and upper bounds of each region during one time slot of (18). By applying data of weekdays and weekends separately, the range of each component is reduced. To measure the uncertainty level, we define the sum of the ranges of every component for as


For the box type of uncertainty sets, when values of the dimension of , i.e., , and are fixed, a smaller means a smaller area of the uncertainty set, or a more accurate model. We denote calculated via records of weekdays and weekends as and