Submitted to 
manuscript (Please, provide the manuscript number!) 
[10pt] Authors are encouraged to submit new papers to INFORMS journals by means of a style file template, which includes the journal title. However, use of a template does not certify that the paper has been accepted for publication in the named journal. INFORMS journal templates are for the exclusive purpose of submitting to an INFORMS journal and should not be used to distribute the papers in print or online or to submit the papers to another publication.
Progressive Focus Search for the Static and Stochastic VRPTW with both Random Customers and Reveal Times
Michael SaintGuillain
Université catholique de Louvain, Belgium, michael.saint@uclouvain.be
Christine Solnon
Institut National des Sciences Appliquées de Lyon, France, christine.solnon@insalyon.fr
Yves Deville
Université catholique de Louvain, Belgium
Static stochastic VRPs aim at modeling reallife VRPs by considering uncertainty on data. In particular, the SSVRPTWCR considers stochastic customers with time windows and does not make any assumption on their reveal times, which are stochastic as well. Based on customer request probabilities, we look for an a priori solution composed preventive vehicle routes, minimizing the expected number of unsatisfied customer requests at the end of the day. A route describes a sequence of strategic vehicle relocations, from which nearby requests can be rapidly reached. Instead of reoptimizing online, a socalled recourse strategy defines the way the requests are handled, whenever they appear. In this paper, we describe a new recourse strategy for the SSVRPTWCR, improving vehicle routes by skipping useless parts. We show how to compute the expected cost of a priori solutions, in pseudopolynomial time, for this recourse strategy. We introduce a new metaheuristic, called Progressive Focus Search (PFS), which may be combined with any localsearch based algorithm for solving static stochastic optimization problems. PFS accelerates the search by using approximation factors: from an initial rough simplified problem, the search progressively focuses to the actual problem description. We evaluate our contributions on a new, realworld based, public benchmark.
In the Vehicle Routing Problem with Time Windows, a set of customers must be serviced by a homogeneous fleet of capacitated vehicles, while reconciling each customer’s time windows and vehicle travel times, as well as cumulated customers’ demands and vehicle capacities. Whereas deterministic VRP(TW)s assume perfect information on input data, in realworld applications some input data may be uncertain when computing a solution. In this paper, we focus on cases where the customer presence is a priori unknown. Furthermore, we assume to be provided with some probabilistic knowledge on the missing data. In many situations, the probability distributions can be obtained from historical data. In order to handle new customers who appear dynamically, the current solution must be adapted as such random events occur. Depending on the operational context, we distinguish two fundamentally different assumptions. If the currently unexecuted part of the solution can be arbitrarily redesigned, then we are facing a Dynamic and Stochastic VRP(TW) (DSVRP(TW)). In that case, the solution is adapted by reoptimizing the new current problem while fixing the executed partial routes.
If the routes can only be adapted by following some predefined scheme, then we are facing a Static and Stochastic VRP(TW) (SSVRP(TW)). In the SSVRP(TW), whenever a bit of information is revealed, the current solution is adapted by applying a recourse strategy. Based on the probabilistic information, we seek a first stage (also called a priori) solution that minimizes its a priori cost, plus the expected sum of penalties caused by the recourse strategy. In order for the evaluation function to remain tractable, the recourse strategy must be efficiently computable, hence simple enough to avoid reoptimization. For example, in Bertsimas (1992) the customers are known, whereas their demands are revealed online. Two different assumptions are considered, leading to different recourse strategies, as illustrated in Fig. 1. In strategy a, each demand is assumed to be revealed when the vehicle arrives at the customer place. If the vehicle reaches its maximal capacity, then the first stage solution is adapted by adding a round trip to the depot. In strategy b, each demand is revealed when leaving the previous customer, allowing to skip customers having null demands.
In the recent review of Gendreau et al. (2016), the authors argue for new recourse strategies: With the increasing use of ICT, customer information is likely to be revealed on a very frequent basis. In this context, the chronological order in which this information is transmitted no longer matches the planned sequences of customers on the vehicle routes. In particular, the authors consider as paradoxical the fact that the existing literature on SSVRPs with random Customers (SSVRPC) assumes full knowledge on the presence of customers at the beginning of the operational period.
In this paper, we focus on the SSVRPTW with both random Customers and Reveal times (SSVRPTWCR) introduced by SaintGuillain, Solnon, and Deville (2017), in which no assumption is made on the moment at which a request is known. The goal is to compute the firststage solution that minimizes the expected number of rejected requests, while avoiding assumptions on the moment at which customer requests are revealed. To handle uncertainty on the reveal times, waiting (re)locations are part of firststage solutions.
Let us consider the problem of managing a team of onduty doctors, operating at patient home places during nights and weekends. Oncall periods start with all the doctors at a central depot, where each get assigned a taxi cab for efficiency and safety. Patient requests arrive dynamically. We know from historical data the probability that a request appears depending on the region and time of day. Each online request comes with a hard deadline, and a recourse strategy can be used to decide whether the request can be satisfied in time and how to adapt the routes accordingly. If it cannot be handled in time, the request is rejected and entrusted to an (expensive) external service provider. The goal is to minimize the expected number of rejected requests. In such context, involving very short deadlines, relocating idle doctors anticipatively is often critical. Modeled as a SSVRPTWCR, it is possible to compute a firststage solution composed of (sequences of) waiting locations, optimizing the expected quality of service.
We introduce an improved recourse strategy for the SSVRPTWCR that optimizes routes by skipping some useless parts. Closedform expressions are provided to efficiently compute expected costs for the new recourse strategy. Another contribution is a new metaheuristic, called Progressive Focus Search (PFS), for solving static stochastic optimization problems. PFS accelerates the solution process by using approximation factors, both reducing the size of the search space and the complexity of the objective function. These factors are progressively decreased during the search process so that, from an initial rough approximation of the problem, the search gradually focuses to the actual problem. We also introduce a new public benchmark for the SSVRPTWCR, based on realword data coming from the city of Lyon. Experimental results on this benchmark show that PFS obtains better results than a classical search. Important insights are brought to light. By comparing with a basic (yet realistic) policy which does not exploit stochastic knowledge, we show that our stochastic models are particularly beneficial when the number of vehicles increases and when time windows are tight.
In Section LABEL:sec:relatedwork, we review existing studies on VRPs with stochastic customers and clearly position the SSVRPTWCR with respect to them. In Section LABEL:sec:Problemdescription, we formally define the general SSVRPTWCR. Section LABEL:sec:recourse presents a new recourse strategy, which we present as both a generalization and an improvement over the recourse strategy previously proposed in SaintGuillain, Solnon, and Deville (2017). In Section LABEL:sec:PFS, we describe the Progressive Focus Search metaheuristic for static stochastic optimization problems and show how to instantiate it to solve the SSVRPTWCR. In Section LABEL:sec:bench_experimental_plan, we describe a new public benchmark for the SSVRPTWCR, derived from realworld data, and describe the experimental settings. The experimental results are analyzed in Section LABEL:sec:small_instances for small instances and in Section LABEL:sec:big_instances for larger instances. Finally, further research directions are discussed in Section LABEL:sec:conclusions.
By definition, the SSVRPTWCR is a static problem. Decisions are made a priori. The reader interested in online decision making should refer to Dynamic and Stochastic VRPs, such as the DSVRPTW. A recent literature review about DSVRPs can be found in Ritzinger, Puchinger, and Hartl (2016). Gendreau et al. (2016) provides a literature review on Static and Stochastic VRPs (SSVRP). According to Pillac et al. (2013), the most studied cases in SSVRPs are:

Stochastic customers (SSVRPC), where customer presences are described by random variables. Since the SSVRPTWCR belongs to this category, this nonexhaustive literature review is limited to to studies involving customer uncertainty.
The Traveling Salesman Problem (TSP) is a special case of the VRP with only one uncapacitated vehicle. Jaillet (1985) formally introduced the TSP with stochastic Customers (SSTSPC), a.k.a. the probabilistic TSP (PTSP) or TSPSC in the literature, and provided mathematical formulations and a number of properties and bounds of the problem (see also Jaillet, 1988). In particular, he showed that an optimal solution for the deterministic problem may be arbitrarily bad in case of uncertainty. Laporte, Louveaux, and Mercure (1994) developed the first exact solution method for the SSTSPC.Heuristics for the SSTSPC have then been proposed in Jezequel (1985), Rossi and Gavioli (1987), Bertsimas, Chervi, and Peterson (1995), and Bianchi and Campbell (2007) as well as metaheuristics, such as simulated annealing (Bowler, Fink, and Ball (2003)) or ant colony optimization (Bianchi, Gambardella, and Dorigo, 2002).
Particularly close to the SSVRPTWCR is the SSTSPC with Deadlines introduced by Campbell and Thomas (2008). Unlike the SSVRPTWCR, authors assume that customer presences are revealed all at once at the beginning of the day. They showed that deadlines are particularly challenging when considered in a stochastic context, and proposed two recourse strategies to address deadline violations. A survey on the SSTSPC may be found in Henchiri, Bellalouna, and Khaznaji (2014).
The first SSVRPC has been studied by Jezequel (1985) as a generalization of the SSTSPC. Bertsimas (1992) considered a VRP with stochastic Customers and Demands (SSVRPCD), as described in the introduction section. Gendreau, Laporte, and Séguin (1995) developed the first exact algorithm for solving the SSVRPCD for instances up to 70 customers, by means of an integer Lshaped method, and Gendreau, Laporte, and Séguin (1996) later proposed a tabu search algorithm.A preventive restocking strategy for the SSVRP with random demands has been proposed by Yang, Mathur, and Ballou (2000). Biesinger, Hu, and Raidl (2016) later introduced a variant for the Generalized SSVRP with random demands.
Sungur and Ren (2010) considered the Courier Delivery Problem with Uncertainty. Potential customers have deterministic soft time windows but are present probabilistically, with uncertain service times. The objective is to construct an a priori solution, to be used every day as a basis, then adapted to daily requests. Unlike the SSVRPTWCR, the set of customers is revealed at the beginning of the operations. Heilporn, Cordeau, and Laporte (2011) introduced the DialaRide Problem (DARP) with stochastic customer delays. Each customer is present at its pickup location with a stochastic delay. A customer is then skipped if it is absent when the vehicle visits the corresponding location, involving the cost of fulfilling the request by an alternative service (e.g., a taxi). In a sense, stochastic delays imply that each request is revealed at some uncertain time during the planning horizon. That study is thus related to our problem, except that in the SSVRPTWCR, part of the requests will reveal to never appear.
This section recalls the definition of the SSVRPTWCR, initially provided in SaintGuillain, Solnon, and Deville (2017). In fact, it contains parts taken from section 3 of the aforementioned paper.
We consider a complete directed graph and a discrete time horizon , where the interval denotes the set of all integer values such that . A travel time (or distance) is associated with every arc . The set of vertices is composed of a depot , a set of waiting locations , and a set of customer vertices . We note and . The fleet is composed of vehicles of maximum capacity . Let be the set of potential requests. An element of represents a potential request revealed at time at customer vertex . It is associated the following deterministic attributes: a demand , a service duration and a time window with . We note the probability that appears on vertex at time and assume independence between requests. Although our formalism imposes for all potential requests, in practice a request may be known with probability , leading to a deterministic request. Finally, different customers in can share the same geographical location, making it possible to consider different types of requests in terms of deterministic attributes. To simplify notations, we use to denote the reveal time of a request and for its customer vertex. Furthermore, a request may be written in place of its own vertex . For instance, the distance may also be written as . Table 1 summarizes the main notations.
Complete directed graph  Set of potential requests  
Set of vertices (depot is )  Reveal time of request  
Waiting vertices  Customer vertex hosting request  
Customer vertices  Service time of request  
Travel time of arc  Time window of request  
Number of vehicles  Demand of request  
Vehicle capacity  Probability associated with request  
Discrete time horizon 
The firststage solution is computed offline, before the beginning of the time horizon. It consists of a set of vehicle routes visiting a subset of the waiting vertices, together with duration variables denoted by indicating how long a vehicle should wait on each vertex. More specifically, we denote by a firststage solution to the SSVRPTWCR. defines a set of disjoint sequences of waiting vertices of , each starting and ending with the depot. Each vertex of occurs at most once in . We note , the set of waiting vertices visited in . The vector associate a waiting time with every waiting vertex . For each sequence , the vehicle is back at the depot by the end of the time horizon:
In other words, defines a solution to a Team Orienteering Problem (TOP, see Chao, Golden, and Wasil (1996)) to which each visited location is assigned a waiting time by . Given a firststage solution , we define for each vertex such that (resp. ) is the arrival (resp. departure) time at . In a sequence in , we then have and for and assume that .
Given a first stage solution , a recourse strategy states how the requests, which appear dynamically, are handled by the vehicles. In other words, it defines how the secondstage solution is gradually constructed, based on and depending on these online requests. A more formal description of recourse strategies in the context of the SSVRPTWCR is provided in SaintGuillain, Solnon, and Deville (2017). Let a recourse strategy . An optimal firststage solution to the SSVRPTWCR minimizes the expected cost of the secondstage solution:
(SSVRPTWCR)  (1)  
(2) 
The objective function , which is nonlinear in general, is the expected number of rejected requests, i.e., requests that fail to be visited under recourse strategy and firststage solution .Note that actually represents an expected quality of service, which does not take travel costs into account. In fact, in most practical applications that could be formulated as an SSVRPTWCR, quality of service prevails whenever the number of vehicles is fixed, as travel costs are usually negligible compared to the labor cost of the mobilized mobile units.
Formulation (1)(2) states the problem in general terms, hiding two nontrivial issues. Given a recourse strategy , finding a computationally tractable way to evaluate constitutes the first challenge. We address it in Section LABEL:sec:recourse, based on a new recourse strategy we propose. The second problem naturally concerns the minimization problem, or how to deal with the solution space. This is addressed in Section LABEL:sec:PFS. For completeness, a mathematical formulation of the constraints involved by (2) is provided in Appendix LABEL:sec:sip.
The strategy we introduce, called , is a generalization and an improvement of strategy introduced in SaintGuillain, Solnon, and Deville (2017). First, it generalizes by taking vehicle capacities into account. Second, improves by saving operational time when possible, by avoiding some pointless round trips from waiting vertices.
For the sake of completeness and since generalizes , part of this section includes elements from section 4 of SaintGuillain, Solnon, and Deville (2017), which are common to both strategies. We emphasize the common points and differences between these two strategies at the end of this section.
Informally, the recourse strategy accepts a request revealed at time if the assigned vehicle is able to adapt its firststage tour to visit the customer, given its set of previously accepted requests. Time window and capacity constraints should be respected, and already accepted requests should not be disturbed.
Ideally, whenever a request appears and prior to determine whether it can be accepted, a vehicle should be selected to minimize objective function (1). Furthermore, if several requests appear at the same time unit and amongst the subset of these that are possibly acceptable, some may not contribute optimally to (1). Given a set of accepted requests, the order in which they are handled also plays a critical role. Unfortunately, none of these decisions can be made optimally without reducing to a NPhard problem. In order for to remains efficiently computable, they are necessarily made heuristically.
The solution proposed in SaintGuillain, Solnon, and Deville (2017) makes these decisions beforehand. Before the start of the operations and in order to avoid reoptimization, the set of potential requests is ordered. Each potential request is also preassigned to exactly one planned waiting vertex in , and therefore one vehicle, based on geographical considerations.
The ordering heuristic is independent of the current firststage solution. Different orders may be considered, provided that the order is total, strict, and , if the reveal time of is smaller than the reveal time of then must be smaller than in the request order. We order by increasing reveal time first, end of time window second, and lexicographic order to break further ties.
Given a firststage solution , we assign each request of either to a waiting vertex visited in or to to denote that is not assigned. We note this assignment. It is computed for each firststage solution before the application of the recourse strategy. To compute this assignment, for each request , we first compute the set of waiting vertices from which satisfying is possible, if appears:
where and are defined as follows. Time is the earliest time at which the vehicle can possibly leave vertex in order to satisfy request . Time , where is the waiting vertex that directly follows in the firststage solution , is the latest time at which a vehicle can leave vertex to handle and arrive at in time. Given the set of feasible waiting vertices for , we define the waiting vertex associated with as follows:

If , then ( is always rejected as it has no feasible waiting vertex);

Otherwise, is set to the feasible vertex of that has the least number of requests already assigned to it (further ties are broken with respect to vertex number). This heuristic rule aims at evenly distributing potential requests on waiting vertices.
Once finished, the request assignment ends up with a partition of , where is the set of requests assigned to the waiting vertices visited by vehicle and is the set of unassigned requests (such that ). We note , the set of requests assigned to a waiting vertex .
At each time step , the recourse strategy is applied to decide whether to accept or reject the new incoming requests, if any, and determine the appropriate vehicle actions. Let be the set of accepted requests up to time . Note that is likely to contain some requests that have been accepted but are not yet satisfied (i.e. not yet visited).
The decision to accept or reject a request appearing at time depends on when vehicle will be available for . By available, we mean that it has finished serving all its accepted requests that precede , according to the predefined order on . This time is denoted by . It is only defined when all the accepted requests, that must be served before by the same vehicle, are known. If is the first request of its waiting vertex, the first of , then:
Otherwise, let be the request that directly precedes in . As the requests assigned to are ordered by increasing reveal time, we know all these accepted requests for sure when . Given current time , function is defined in as:
If is the first request of its waiting vertex, the location from which the vehicle travels towards request is necessarily the waiting vertex . Otherwise, depends on whether reveals by the time the vehicle finishes to satisfy the last accepted request:
is the set of requests accepted up to time . It is initialized with as all previously accepted requests must still be accepted at time . Then incoming requests (i.e. revealed at time ) are considered in increasing order with respect to . is either accepted (added to ) or rejected (not added to ). A request is accepted if (i) it is assigned to a waiting location, (ii) the vehicle is available, and (iii) its capacity is not exceeded. Formally, is added to if and only if:
(3) 
Once has been computed, vehicle operations for time unit must be decided. Each vehicle operates independently from all other vehicles. If vehicle is traveling between a waiting vertex and a customer vertex, or if it is serving a request, then its operation remains unchanged. Otherwise, its operations are defined in Algorithm 1.
In strategy the vehicles handle requests by performing systematic round trips for their current waiting locations. In , a vehicle travels directly towards a revealed request from a previously satisfied one ’, provided that appears by the time the service of gets completed. Furthermore, a vehicle is now allowed to travel directly from a customer vertex to the next planned waiting vertex, without passing by the waiting vertex associated with . Figure 2 illustrates, informally, the differences between and .
Given a recourse strategy and a firststage solution to the SSVRPTWCR, a naive approach for computing would be to follow the strategy described by in order to confront with each and every possible scenario . Since there can be up to possible scenarios, this naive approach is not affordable in practice. This section gives an overview of how we efficiently compute the expected number of rejected requests under the recourse strategy . Further developments of the closedform expressions are then provided in Appendix LABEL:g1_Rplus.
Recall that we assume that request probabilities are independent of each other; i.e., for any couple of requests , the probability that both requests will appear is given by . is equal to the expected number of rejected requests, which in turn is equal to the expected number of requests that are found to appear minus the expected number of accepted requests. Under the independence hypothesis, the expected number of revealed requests is given by the sum of all request probabilities, whereas the expected number of accepted requests is equal to the cumulative sum, for every request , of the probability that it belongs to , i.e.,
(4) 
In the case of , the satisfiability of a request depends on the current time and vehicle load, but also on the vertex from which the vehicle would leave to serve it. The candidate vertices are necessarily either the current waiting location or any vertex hosting one of the previous requests associated with . Consequently, under , the probability is decomposed over all the possible time, load, and vertex configurations in which the vehicle can satisfy :
(5) 
where:
Each tuple in the summation (5), where is either or a previously visited customer vertex , represents a possible configuration for accepting . The probability to accept is then equivalent to the probability to fall into one of those states. In particular, note that represents the probability that, if is accepted, the vehicle leaves its current position at time in order to satisfy it. The calculus of is further developed in Appendix LABEL:g1_Rplus. Given customer vertices, a horizon of length and vehicle capacity of size , the computational complexity of computing the whole expected cost is in , as detailed in the appendix.
A naive implementation of equation (5) would basically fill up an array. We draw attention to the fact that even a small instance with and would then lead to a memory consumption of floating point numbers. Using a common eightbyte representation requires more than seven gigabytes. Like strategy , important savings are obtained by noticing that the computation of functions for a given request under only relies on the previous potential request . By computing while only keeping in memory the expectations of (instead of all potential requests), the memory requirement is reduced by a factor . This however comes at the price of making any incremental computation, based on probabilities belonging to a similar firststage solution, impossible.
Solving a static stochastic optimization problem, such as the SSVRPTWCR, involves finding values for a set of firststage decision variables that optimize an expected cost with respect to some recourse strategy:
Solving this kind of problem is always challenging. Besides the exponential size of the (firststage) solution space , the nature of the objective function , an expectation, is usually computationally demanding. Because enumerating all possible scenarios is usually impossible in practice, some approaches tend to circumvent this bottleneck by restricting the set of considered scenarios, using for example the sample average approximation method (Ahmed and Shapiro, 2002). In some cases, expectations may be directly computed in (pseudo) polynomial time, by reasoning on the random variables themselves rather than on the scenarios. However, the required computational effort depends on the recourse strategy and usually remains very demanding, as it is the case for the SSVRPTWCR.
The Progressive Focus Search (PFS) metaheuristic aims at addressing these issues with two approximation factors, intended to reduce the size of the solution space and the complexity of the objective function. The initial problem is simplified into a problem having simplified objective function and solution space. Parameters and define the approximation factors of the objective function and of the solution space, respectively, and when . Whenever or , the optimal cost of is an approximation of that of . Starting from some initial positive values for and , the idea of PFS is to progressively decrease these values using an update policy. The simplified problem is iteratively optimized for every valuation of , using the best solution found at the end of one iteration as starting point in the solution space for the next iteration.
The definition of the simplified problem depends on the problem to be solved. In Sections LABEL:sec:alpha and LABEL:sec:beta, we give some general principles concerning and and describe how to apply them to the case of the SSVRPTWCR. In Section LABEL:sec:PFS_general, we describe the generic PFS metaheuristic.
We assume the expected cost to be computed by filling matrices in several dimensions. In order to reduce the complexity, some of these dimensions must be scaled down. This is achieved by changing the scale of the input data and the decision variable domains related to the selected dimensions, dividing the values by the scale factor and rounding to integer if necessary.
For example, in the SSVRPTWCR the dimensions considered at computing the objective function are: the number of waiting vertices , the vehicle capacity , and the time horizon . Let be the time horizon in the initial problem, corresponding to five hours in units of one second. If we choose to reduce the time dimension with respect to a scale factor , then all durations in the input data (travel times, service times, time windows, etc.) are rounded to the nearest multiple of 60. Thus, the time horizon in the simplified problem is of , corresponding to a fivehour time horizon in units of one minute. The domains of waiting times decision variables are reduced accordingly, scaled from in to in .
Similarly, if we choose to reduce the vehicle capacity dimension with respect to a scale factor , and if the vehicle capacity in the initial problem is , e.g. 500 kg in steps of 1 g, then all demands must be rounded to multiples of 1000. The capacity in becomes , thus 500 kg in units of 1 kg. When scaling dimensions of different nature, such as time and capacity, different scale factors should be considered, leading to a vector .
Experiments have shown us that the closer is to 1, the more accurate the approximation of the actual objective function is. Progressively reducing during the search process allows us to quickly compute rough approximations at the beginning of the search process, when candidate solutions are usually far from being optimal, and spend more time computing more accurate approximations at the end of the search process, when candidate solutions get closer to optimality.
When applying a scaling factor , for consistency reasons the nature of the scaled input data may impose to the domains of some decision variables to be reduced accordingly. Yet the solution space can further simplified by reducing the domains of (part of) the remaining decision variables, or even by further reducing the same decision variables. Let be the initial set of values that may be assigned to , that is, the domain of a decision variable . Domain reduction is not necessarily done for all decision variables, but only for a selected subset of them, denoted as . The simplified problem is obtained by selecting values and only considering these candidate values when searching for solutions, for each decision variable . Ideally, the selection of this subset of values should be done in such a way that the selected values are evenly distributed within the initial domain . We note , the domain of a decision variable in the simplified problem .
For example, in the SSVRPTWCR a subset of decision variables defines the waiting times on the visited waiting vertices: defines the waiting time on , with . If the temporal dimension is not scaled with respect to , or if , then is reduced to a subset of that contains values. To ensure that these values are evenly distributed in , we may keep multiples of . However, if the temporal dimension is scaled with respect to , the selected values must thereafter be scaled.
Another subset of decision variables in the SSVRPTWCR defines the waiting vertices to be visited by the vehicles. The initial decision variable domains are then equivalent to . Reducing the domains of these decision variables can be achieved by restricting to a subset of that contains waiting vertices. To ensure that these values are evenly distributed in the space, we may use geographical clustering techniques.
Progressively decreasing the value of allows us to progressively move from diversification to intensification: at the beginning of the search process, there are fewer candidate values for the decision variables of . The solution method is therefore able to move quickly towards more fruitful regions of the search space. For minimization (resp. maximization) problems, we can easily show that the optimal solution of a simplified problem is an upper (resp. lower) bound of the optimal solution of the problem ; this is a direct consequence of the fact that every candidate solution of is also a candidate solution of .
PFS requires the following input parameters:

An initial problem ;

Initial values for and , as well as final values ;

An update policy that returns the new values and given and ;

A computation time policy such that returns the time allocated for optimizing ;

A solution algorithm such that, given a problem , an initial solution , and a time limit , returns a possibly improved solution for .
PFS is described in Algorithm 2. At each iteration , the simplified problem is built (line 3), and the current solution is updated accordingly (line 3): every value assigned to a decision variable which is concerned by the scale factor is updated with respect to the new scale , and if a value assigned to a decision variable does not belong to the current domain associated with and , then it is replaced with the closest available value. Note that the updated solution may not be a feasible solution of (because of value replacements and rounding operations on input data). Therefore the optimizer must support starting with infeasible solutions.
Algorithm is then used to improve with respect to the simplified problem within a CPU time limit defined by the computation time policy (line 4). Finally, new values for and are computed, according to the update policy (line 5). This iterative optimization process stops when and , i.e., when the last optimization of with has been done with respect to the targeted level of accuracy defined by . To ensure termination, we assume that the update policy eventually returns after a finite number of calls. Finally, if the final value of is larger than 1, so that is a scaled solution, then is scaled down to become a solution of the initial problem (line 8).
In this section, we introduce the new benchmark as well as the experimental concepts and tools used for experimentations reported in Sections LABEL:sec:small_instances and LABEL:sec:big_instances.
We derive our test instances from the benchmark described in Melgarejo, Laborie, and Solnon (2015) for the TimeDependent TSP with Time Windows (TDTSPTW). This benchmark has been created using real accurate delivery and travel time data obtained from the city of Lyon, France. It is available at http://becool.info.ucl.ac.be/resources/ssvrptwcroptimodlyon, as well as the solution files and detailed result tables of the experiments conducted in the following sections.
The benchmark contains two different kinds of instances: instances with separated waiting locations and instances without separated waiting locations. Each instance with separated waiting locations is denoted by cw, where is the number of customer vertices, is the number of waiting vertices, and is the random seed. Each instance without separated waiting locations is denoted by c+w. In these instances, every customer vertex is also a waiting vertex, . Instances sharing the same number of customers and the same random seed (e.g. 50c30w1, 50c50w1 and 50c+w1) always share exactly the same set of customer vertices . In all instances, the duration of an operational day is eight hours and the time horizon is , which corresponds to oneminute time steps.
To each potential request is assigned a time window , where is taken uniformly from . Note that the time window always starts with the reveal time . This aims at simulating operational contexts similar to the practical application example described in introduction, the ondemand health care service at home, requiring immediate responses within small time windows. See the ecompanion LABEL:sec:instance_generation for more details on the complete process used to generate instances.
Experiments have been done on a cluster composed of 64bit AMD Opteron 1.4GHz cores. The code is developed in C++11 with GCC4.9, using O3 optimization flag. The current source code of our library for (SS)VRPs is available from the online repository: bitbucket.org/mstguillain/vrplib.
We consider both recourse strategies and , a generalized version of for capacitated vehicles. We compare their respective contribution and applicability, then we combine them to take the best of each, using several variations of PFS. An exact method allows us to measure optimality gaps, in order to assess the quality of the solutions found by PFS. In order to evaluate the interest of exploiting stochastic knowledge, that is by modeling the problem as a SSVRPTWCR, the solutions are also compared with a waitandserve policy which does not anticipate, i.e. in which vehicles are never relocated.
Recourse strategy is designed to be able to cope with vehicle maximal capacity constraints. In order to compare both strategies and , and since part of our experiments involve limited vehicle capacities, an adapted version of is required. We call this generalization . Vehicles behave under exactly as under , but are limited by their capacity. Its request acceptance rule follows the condition in (3), except that the definition of and are those stated in SaintGuillain, Solnon, and Deville (2017) for , and that .
In Appendix LABEL:sec:R_Q_expectation, we explain how to efficiently compute . We also show how the resulting equations naturally reduce to the ones proposed in Bertsimas (1992), when particularized to the special case of the SSVRPC. We found that, given customer vertices, a horizon of length and vehicle capacity of size , computing is of complexity . This is significantly lower than under , which requires operations in the worst case. However, such a lower complexity naturally comes at the price of a significantly higher expected cost in average, motivating the need for an adequate tradeoff. We empirically address this question in Section LABEL:sec:small_instances.
We have considered different update and computation time policies and in our experiments. In this section, we only describe the optimizer and the approximation factors and used when conducting experiments with PFS.
The optimizer is the local search (LS) introduced in SaintGuillain, Solnon, and Deville (2017) to solve the SSVRPTWCR. Starting from an initial randomly generated firststage solution, LS iteratively modifies it by using a set of neighborhood operators: four classical ones for the VRP, i.e., relocate, swap, inverted 2opt, and crossexchange (see Kindervater and Savelsbergh (1997), Taillard et al. (1997)), and five new operators dedicated to waiting vertices: insertion/deletion of a randomly chosen waiting vertex in/from , increase/decrease of the waiting time of a randomly chosen vertex , and transfer of a random waiting duration from one waiting vertex to another. After each modification of the firststage solution, its expected cost is updated using the appropriate equations, depending on whether strategy or is considered. The acceptance criterion follows the Simulated Annealing (SA) metaheuristic of Kirkpatrick, Gelatt, and Vecchi (1983): improved solutions are always accepted, while degrading solutions are accepted with a probability which depends on the degradation and temperature. Temperature is initialized to and progressively decreased by a factor after each iteration of the LS. A restart strategy resets the temperature to its initial value each time it reaches a lower limit . In all experiments, SA parameters were set to , and .
In the initial problem , temporal data is expressed with a resolution of oneminute time units. The factor is used to scale down this temporal dimension. The time horizon is scaled down to , so that each time step in has a duration of minutes. Every temporal input value (travel times , reveal times , service times , and time windows ) is scaled from its initial value to . Rounding operations are chosen in such a way that the desired quality of service is never underestimated by scaled data: is rounded down while all other values are rounded up. This ensures that a feasible first stage of a simplified problem always remains feasible once adapted to .
The decision variables concerned by domain reductions are waiting time variables: . In , we have . Domains are reduced by selecting a subset of values, evenly distributed in . As the temporal dimension is also scaled with respect to , selected values are scaled down: .
It is both meaningless (for vehicle drivers) and too expensive (for the optimization process) to design firststage solutions with waiting times that are precise to the minute. Hence, in our experiments the domain of every waiting time decision variable is always reduced by a factor . When , waiting times are multiples of 10 minutes. When and , we have , but temporal data (travel and service times, time windows, etc.) are precise to the minute.
In order to assess the ability of our algorithms to find (near) optimal solutions, we devise a simple enumerative optimization method which is able to compute optimal solutions on small instances. To that end, the solution space is restricted to the solutions that (a) use all available vehicles and (b) use all the available waiting time. Indeed, if , then on the basis of any optimal solution which uses only a subset of the available vehicles, a solution of the same cost can be obtained by assigning an idle vehicle to either a nonvisited waiting vertex (if any) or the last visited vertex of any nonempty route (visiting at least two waiting locations), so that (a) does not remove any optimal solution. Furthermore, if an optimal firststage solution contains a route for which the vehicle returns to the depot before the end of the horizon, adding the remaining time to the last visited waiting vertex will never increase the expected cost of the solution, so that (b) is also valid. The resulting solution space is then recursively enumerated in order to find the firststage solution with the optimal expected cost.
In order to assess the contribution of our recourse strategies, we compare them with a policy ignoring anticipative actions. This waitandserve (w&s) policy takes place as follows. Vehicles begin the day at the depot. Whenever an online request appears, it is accepted if at least one of the vehicles is able to satisfy it, otherwise it is rejected. If accepted, it is assigned to the closest such vehicle which then visits it as soon as it becomes idle. If there are several closest candidates, the least loaded vehicle is chosen. After servicing (which lasts time units), the vehicle simply stays idle at ’s location until it is assigned another request or until it must return to the depot. Note that a request cannot be assigned to a vehicle if satisfying it prevents the vehicle from returning to the depot before the end of the horizon.
Note that, whereas our recourse strategies for the SSVRPTWCR generalize to requests such that the time window starts later than the reveal time, in our instances we consider only requests where . Doing it the other way would in fact require a more complex waitandserve policy, since the current version would be far less efficient and unrealistic in the case of requests with significantly greater than .
In what follows, average results are always reported for the w&s policy. We randomly generate scenarios according to the probabilities. For each scenario, we apply the w&s approach to compute a number of rejected requests; finally, the average number of rejected requests is reported. The results of PFS and the exact method are always reported by means of average relative gains, in percentages, with respect to the w&s policy: the gain of a firststage solution computed with PFS or the exact method is , where is the expected cost of and is the average cost under the w&s policy.
We consider small test instances, having customer vertices. Furthermore, PFS is here instantiated such that we perform only a single optimization step (lines 27 of Algorithm 2): and . The simplified problem is therefore first optimized for a duration of seconds, and the returned solution is adapted with respect to the initial problem , ensuring that all results are expressed according to the original input data. This limited experimental setting, while ignoring the impact of performing several optimization steps in PFS, aims at determining:

Whether the loss of precision, introduced by and , is counterbalanced by the fact that the approximation is easier to solve than the initial problem.

The impact of avoiding pointless trips in recourse strategy , compared with simpler (but computationally less demanding) strategy .

The interest of exploiting stochastic knowledge, by comparing the expected costs of SSVRPTWCR solutions with their average costs under the w&s policy.

The quality of the solutions computed by the LS algorithm under different scale factors. These are compared with optimal solutions obtained with the exact method. When or , the exact method solves , and the results are reported according to the final solution, scaled back to .
Exact (% gain after 30 minutes)  PFS (% gain after 5 minutes)  
w&s  
10c5w1  12.8  8.9*  15.4  7.3*  12.8*  4.4*  9.5*  8.9  14.1  7.3  13.1  4.4  9.2 
10c5w2  10.8  4.8*  7.4  4.8*  7.4*  8.8*  0.5*  4.8  0.2  4.8  4.9  8.8  0.5 
10c5w3  8.0  46.9*  26.5  46.9*  26.5*  55.9*  43.2*  46.9  29.9  43.1  30.6  43.1  32.9 
10c5w4  10.5  10.9*  0.9  10.9*  0.9*  10.9*  0.9*  10.9  8.7  10.9  4.9  10.9  2.2 
10c5w5  8.4  17.9*  2.5  17.9*  2.5  20.5*  0.5*  17.9  6.1  18.9  2.8  19.5  1.1 
#eval  
10c+w1  12.8  35.3  34.4  35.3  34.4  32.7  26.5  39.1  30.3  38.3  36.7  34.9  34.1 
10c+w2  10.8  28.1  19.1  30.1  21.5  30.1  29.7  32.3  18.8  32.1  25.2  32.3  25.8 
10c+w3  8.0  14.4  17.1  18.8  17.1  13.3  13.7  26.1  18.8  27.6  20.3  23.1  23.9 
10c+w4  10.5  7.8  11.0  12.4  11.6  7.8  11.4  22.6  12.3  23.3  16.4  18.8  16.5 
10c+w5  8.4  3.5  8.9  8.4  6.6  23.7  1.7  31.6  15.8  32.7  21.1  29.3  28.5 
#eval 
Table 2 shows the average gains, in percentages, of using an SSVRPTWCR solution instead of the w&s policy, for small instances composed of customer vertices with uncapacitated vehicles. We consider three different values for . When (resp. , ), the time horizon is (resp. , ) and each time unit corresponds to one minute (resp. two and five minutes). In all cases, the domain reduction factor is set to : waiting times are restricted to multiples of 60 minutes.
Unlike the recourse strategies, which must to deal with a limited set of predefined waiting locations, the w&s policy makes direct use of the customer vertices. Therefore, the relative gain of using an optimized SSVRPTWCR firststage solution is highly dependent on the locations of the waiting vertices. Gains are always greater for 10c+w instances, where any customer vertex can be used as a waiting vertex: for these instances, gains with the bestperforming strategy are always greater than , whereas for 10c5wi instances, the largest gain is , and is negative in some cases.
The results obtained on instance 10c5w3 are quite interesting: gains are always negative; i.e., waiting strategies always lead to higher expected numbers of rejected requests than the w&s policy. By looking further into the average travel times in each instance, in Table 3, we find that the average travel time between customer vertices in instance 10c5w3 is rather small (12.5), and very close to the average duration of time windows (12.3). In this case, anticipation is of less importance and the w&s policy appears to perform better. Furthermore, average travel time between waiting and customer vertices (19.5) is much larger than the average travel time between customer vertices.
Instance:  10c5w1  10c5w2  10c5w3  10c5w4  10c5w5 
Travel time within :  19.6’  16.8’  12.5’  18.0’  13.0’ 
Travel time between and :  23.7’  19.9’  19.5’  20.9’  18.2’ 
Time window duration:  11.6’  12.7’  12.3’  13.2’  12.6’ 
We note that the exact enumerative method runs out of time under for all instances, when . Increasing to 2 speeds up the solution process and makes it possible to prove optimality on all 10c5w instances except instance 5. Setting allows to find all optimal solutions. However, optimizing with coarser scales may degrade the solution quality. This is particularly true for 10c5w instances which are easier, in terms of solution space, than 10c+wx instances as they have half the number of waiting locations: for 10c5w instances, gains are often decreased when is increased because, whatever the scale is, the search finds optimal or nearoptimal solutions.
For PFS, gains with recourse strategy are always greater than gains with recourse strategy on 10c5w instances. However, we observe the opposite on 10c+wi instances. This comes from the fact that expected costs are much more expensive to compute under than under . Table 2 displays the average number of times the objective function is evaluated (#eval), that is the number of solutions considered by either the local search or the exact method, in which case it corresponds to the size of the solution space (when enumeration is complete and under assumptions (a) and (b) discussed in Section LABEL:sec:expe_plan). We note that the number of LS iterations is 10 times smaller when using compared to . As 10c5w instances are easier than 10c+wi instances, around iterations is enough to allow the LS optimizer of PFS to find nearoptimal solutions. In this case, gains obtained with are much larger than those obtained with . However, on 10c+wi instances, iterations are not enough to find nearoptimal solutions. For these instances, better results are obtained with .
When optimality has been proven by Exact, we note that PFS often finds solutions with the same gain. With , PFS may even find better solutions: this is due to the fact that optimality is only proven for the simplified problem , whereas the final gain is computed after scaling back to the original horizon at scale 1. When optimality has not been proven, PFS often finds better solutions (with larger gains).
Results obtained from Table 2 show that although it leads to larger gains, the computation of expected costs is much more expensive under recourse strategy than under , which eventually penalizes the optimization process as it performs fewer iterations within the same time limit (for both Exact and PFS).
We now introduce a pseudostrategy that we call , which combines and . For both Enum and PFS, strategy refers to the process that uses as the evaluation function during all the optimization process. When stopping at a final solution, we reevaluate it using . Table 4 reports the gains obtained by applying on instances 10c5w and 10c+w. By using , we actually use to guide the LS optimization, which permits the algorithm to consider a significantly bigger part of the solution space. For both Enum and PFS, always leads to better results than . From now on, we will only consider strategies and in the next experiments.
Exact (% gain after 30 minutes)  PFS (% gain after 5 minutes)  
w&s  
10c5w1  12.8  8.9  14.0  7.3  12.8  4.4  9.5  8.9  14.0  7.3  12.8  4.4  9.5 
10c5w2  10.8  4.8  7.4  4.8  7.4  8.8  0.5  4.8  7.1  4.8  7.4  8.8  0.5 
10c5w3  8.0  46.9  26.5  46.9  26.5  55.9  37.3  46.9  26.5  43.1  22.5  43.1  22.5 
10c5w4  10.5  10.9  0.9  10.9  0.9  10.9  0.9  10.9  0.9  10.9  0.9  10.9  0.9 
10c5w5  8.4  17.9  2.5  17.9  2.5  20.5  0.5  17.9  2.5  18.9  1.8  19.5  1.1 
10c+w1  12.8  35.3  37.7  35.3  37.7  32.7  34.0  39.1  40.9  38.3  39.5  34.9  36.2 
10c+w2  10.8  28.1  33.2  30.1  34.6  30.1  34.6  32.3  35.6  32.1  35.5  32.3  34.4 
10c+w3  8.0  14.4  24.4  18.8  29.9  13.3  21.9  26.1  35.0  27.6  35.6  23.1  30.3 
10c+w4  10.5  7.8  13.5  12.4  20.0  7.8  13.5  22.6  29.2  23.3  29.3  18.8  24.1 
10c+w5  8.4  3.5  10.6  8.4  13.3  23.7  31.0  31.6  39.8  32.7  40.7  29.3  35.0 
Table 5 considers instances involving 20 customer vertices and either 10 separated waiting locations (20c10w) or one waiting location at each customer vertex (20c+w). It compares results obtained by PFS for two different computation time limits, with . When (resp. and ), domains of waiting time variables contain 48 (resp. 16 and 8) values, corresponding to multiples of 10 (resp. 30 and 60) minutes. In all cases, the scale factor is set to 2.
When considering the recourse strategy with a fiveminute computation time limit, we observe that better results are obtained with , as domains are much smaller. When the computation time is increased to 30 minutes, or when considering strategy , which is cheaper to compute, then better results are obtained with , as domains contain finergrained values.
We observe that always provides better results than pure , whatever the waiting time multiple used. Except when switching to significantly greater computational times, seems more adequate as it combines the limited computational cost incurred by with the nicer expected performances of the cleverer strategy .
Exact (% gain after 30 minutes)  PFS (% gain after 5 minutes)  
w&s  
20c10w1  22.6  9.3  12.1  9.7  11.4  12.6  16.0  10.8  5.5  12.0  5.6  15.2  6.4 
20c10w2  19.8  11.8  27.9  5.0  29.1  3.7  31.4  5.7  12.2  3.8  10.2  1.9  8.7 
20c10w3  21.1  0.5  16.7  5.6  15.9  6.4  21.3  1.1  2.8  7.7  0.9  8.1  0.2 
20c10w4  25.3  4.6  4.3  5.2  6.9  5.4  9.2  5.7  4.3  5.5  3.8  5.1  4.7 
20c10w5  20.9  10.7  25.9  1.0  24.6  0.2  23.8  9.1  14.0  0.0  7.0  0.8  6.4 
20c+w1  22.6  15.4  2.8  17.2  2.2  17.4  0.2  17.9  13.5  19.4  11.4  20.2  13.0 
20c+w2  19.8  7.6  10.0  5.6  16.8  6.9  14.3  12.2  2.7  10.7  2.1  12.3  3.5 
20c+w3  21.1  2.8  11.1  3.9  14.1  3.1  13.7  4.8  1.2  6.4  0.4  7.8  0.3 
20c+w4  25.3  14.3  5.2  15.3  2.6  14.0  3.0  15.9  12.4  18.6  13.5  19.6  14.2 
20c+w5  20.9  13.6  2.6  14.0  11.2  16.7  7.7  15.7  9.8  19.0  11.0  18.5  12.0 
We now consider instances with customer vertices. Instances 50c30w and 50c50w have and separated waiting locations, respectively. Instances 50c+w have waiting vertices which correspond to the customer vertices. Each class is composed of 15 instances such that, for each seed , the three instances classes 50c30wi, 50c50wi, and 50c+wi contain the same set of 50 customer vertices and thus only differ in terms of the number and/or positions of waiting vertices. For each instance, the vehicle’s capacity is set to , and we consider three different numbers of vehicles . In total, we thus have 45 3 = 135 different configurations.
We first compare and discuss the behaviors of different instantiations of PFS. Then, based on the PFS variant that appears to perform best, further experiments (Section LABEL:sec:big_instances_results) measure the contribution of a twostage stochastic model, through the use of a SSVRPTWCR formulation and our recourse strategies.
All runs of PFS are limited to seconds (three hours). We compare seven instantiations of PFS, which have different update and computation time policies and , while all other parameters are set as described in Section 6.2.1. Strategy is used for all experiments. The different instantiations are:

PFS*10: the scale factor is progressively decreased from 5 to 2 and 1 while the domain reduction factor remains fixed to . More precisely, , , and . The update policy successively returns and , while . The computation time policy always returns 3600 seconds, so that the three LS optimizations have the same CPU time limit of one hour.

PFS1*: remains fixed to 1 while is progressively decreased from 60 to 30 and 10. More precisely, , , and . The update policy successively returns and , while . The computation time policy always returns 3600 seconds.

PFS**: both and are progressively decreased. We set , , , and . The update policy returns the following couples of values for : (2, 60), (1, 60), (5, 30), (2, 30), (1, 30), (5, 10), (2, 10), (1, 10). The computation time policy always returns 1200 seconds. The PFS optimization process is hence composed of nine LS optimizations of 20 minutes each.

PFSab which performs only a single LS optimization step with and and , as experimented in Section 7. We consider two different values for , i.e., , and two different values for , i.e., , thus obtaining four different instantiations.
The performances of the seven PFS instantiations and the baseline w&s approach are compared in Figure 3 by using performance profiles. Performance profiles (Dolan and Moré, 2002) provide, for each considered approach, a cumulative distribution of its performance compared to other approaches. For a given method A, a point on A’s curve means that in of the instances, A performed at most times worse than the best method on each instance taken separately. A method A is strictly better than another method B if A’s curve always stays above B’s curve.
According to Figure 3 (left), algorithms PFS*10 and PFS** show the best performances when tested on the 15 instances of class 50c+w with vehicles. More experiments are conducted and reported in Figure 3 (right) in order to distinguish between the algorithms PFS*10, PFS1* and PFS** on all 135 instances. In comparison to the other approaches, algorithms PFS*10 and PFS** clearly obtain the best performances on average over the 135 configurations.
Figure 4 illustrates, on a single instance (50c50w1 with vehicles), the evolution through time of the gain of the expected cost of the current solution , with respect to the average cost of the w&s policy, during a single run of PFS110, PFS*<