Zhao, Haskell, and Cardin
An Approximate Dynamic Programming Algorithm for Multi-Stage Capacity Investment Problems
Sixiang Zhao, Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore 117576, zhaosixiang@u.nus.edu
William B. Haskell, Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore 117576, isehwb@nus.edu.sg
Michel-Alexandre Cardin, Dyson School of Design Engineering, Imperial College London, London SW7 2AZ, United Kingdom, m.cardin@imperial.ac.uk
This paper studies a dynamic multi-facility capacity investment problem (MCIP) with discrete capacity. In this setting, capacity adjustment decisions are made sequentially based on observations of demand. First, we formulate this problem as a Markov decision process (MDP). Then, we design a customized fitted value iteration (FVI) algorithm. In particular, we approximate the value functions with a two-layer neural network with piecewise linear activation functions. However, the action selection procedure of FVI for MCIP can be time-consuming since the action space is discrete and high-dimensional. To speed up the action selection, we recast the action selection problem as a two-stage stochastic programming problem. The resulting recourse function comes from the two-layer neural network, and it is solved with a specialized multi-cut decomposition algorithm. Numerical studies show that our algorithm provides high-quality solutions to the MCIP, and also that the multi-cut algorithm can significantly speed up the action selection problem of FVI in comparison to the brute-force method and the integer L-shaped algorithm. Finally, we show that the techniques developed here for MCIP are applicable to many other finite-horizon MDPs with finite but high-dimensional action spaces.
real options analysis, multi-facility capacity investment problem, discrete capacity, approximate dynamic programming, decomposition algorithm
1 Introduction
Strategic capacity decisions are important to companies because of the high expenditures they entail and the uncertainty of the business environment. To deal with this uncertainty, it is wiser to adjust capacity periodically as new information about uncertain parameters, such as demand, is revealed, rather than to establish facilities with a large amount of capacity at the outset. Real options analysis provides an efficient framework for evaluating the value of a system with dynamically adjusted capacity (Trigeorgis 1996). It views capacity investment decisions as a series of options, i.e. capacity adjustment options, that can be exercised dynamically. Namely, in each decision epoch, decision makers have the right, but not the obligation, to invest in or salvage capacity once new demand information is revealed. A key question in real options analysis is to evaluate the economic performance of a system with capacity adjustment options relative to a system without such options (i.e., the value of flexibility). Evaluating the performance of a system with such options is a multi-stage decision-making problem under uncertainty, whose optimal capacity investment policy is a function of the observed uncertain parameters. In a capacity investment problem with a single facility, the capacity investment policy determines when and by how much to adjust the capacity as uncertain customer demands are observed over time. In a multi-facility problem, the system has not only options to adjust capacity but also options to switch between facilities; for example, if one facility runs out of capacity, the excess demand can be transported to adjacent facilities by paying extra fees. In this context, evaluating the economic performance becomes harder as the decisions are multi-dimensional. Though multi-dimensional, the form of the optimal policy can still be found when the capacity investment problem is convex—i.e., the capacity is continuous and the capacity adjustment costs are convex (Eberly and Van Mieghem 1997). However, these assumptions are too strict in practice, as the capacity of a production system is usually modular in design; that is, the capacity is discrete.
Motivated by the issues outlined above, we study a multi-facility capacity investment problem (MCIP) with discrete capacity. Such problems have been widely studied in many fields, such as transportation systems (Sun and Schonfeld 2015), the semiconductor industry (Huang and Ahmed 2009), and energy systems (Singh et al. 2009). One prevalent solution method is multi-stage stochastic programming, where the evolution of the uncertain parameters is modeled as a scenario tree and the problem is solved as a large-scale mixed-integer programming problem. However, the scenario tree grows exponentially with the number of stages and the dimension of the uncertain parameters, so this method can easily become intractable. Another method that can solve MCIP is decision rule-based multi-stage stochastic programming, where the policy space of the problem is approximated by a preset functional space. Zhao et al. (2018) designed an if-then decision rule to solve a multi-facility capacity expansion problem with uncertain demands, but the proposed decision rule may require redesign once capacity contraction or other types of uncertainty, such as price, are considered.
As an alternative, the MCIP can be formulated as a Markov decision process (MDP) and solved via dynamic programming (DP)-based algorithms. However, DP-based methods are subject to the curse of dimensionality; that is, the complexity of the algorithm increases exponentially with the number of customers and facilities. To alleviate the burden brought by the high-dimensional state space, we can use approximate dynamic programming (ADP)—more specifically, the fitted value iteration (FVI) algorithm—to approximate the value functions of the problem. In FVI, the value function of the last stage is approximated first, and the algorithm then proceeds backward in time. In each time period, for each sample generated from the state space, one chooses the optimal action by solving an action selection problem. Since the action space is finite, this selection can be done by enumeration, but enumeration can be extremely time-consuming in MCIP, whose action space is multi-dimensional.
In this paper, we propose a neural network-based fitted value iteration (NNFVI) algorithm to solve MCIPs in which capacity can be either expanded or contracted. We then propose a decomposition algorithm to accelerate the action selection procedure of NNFVI. A theoretical guarantee for the algorithm is provided, and the numerical studies herein show that it speeds up NNFVI significantly in comparison with the enumeration method. More specifically, the contributions of this paper are summarized below:

We analyze the functional form of the value functions of MCIP, and show that the value functions have a piecewise linear structure. Then, we approximate the value functions by two-layer neural networks with rectified linear unit (ReLU) activation functions. We also prove that the NNFVI algorithm with ReLU is consistent; that is, an optimal solution can be achieved with probability one under some mild conditions.

Since the action space of MCIP is multi-dimensional, selecting the optimal action via enumeration can be time-consuming. To accelerate the action selection procedure, we formulate it as a two-stage stochastic programming problem with a nonconvex recourse function, where the recourse function is derived from the two-layer neural network. This type of stochastic programming problem has been widely studied (Laporte and Louveaux 1993, Sen and Sherali 2006, Sen et al. 2010), but our method differs from the literature in that we design a multi-cut method that makes use of the gradient information of the recourse function. We also show that the proposed multi-cut algorithm converges to the global optimum in a finite number of steps, and the numerical studies herein show that it speeds up the algorithm significantly in comparison with the enumeration method and the integer L-shaped algorithm.

We explain how to extend our method to other dynamic programming problems, such as MCIP with lead times or MCIP with uncertain rewards/costs; the proposed algorithm works as long as the problem has a combinatorial action selection step and its value functions are approximated via two-layer neural networks with ReLU activation functions.
The remainder of the paper is organized as follows. Related work on multi-stage capacity planning problems and their solution methods is reviewed in Section 2. In Section 3, an MCIP model with discrete capacity is presented. The proposed NNFVI, along with an analysis of the value functions, is presented in Section 4. Section 5 introduces the proposed multi-cut decomposition algorithm for the action selection procedure. Numerical studies are presented in Section 6, and we discuss extensions of the proposed method in Section 7. In the last section, we summarize the major findings of this study.
2 Related Work
2.1 Capacity Investment Problems
Since the seminal work of Manne (1961), capacity investment problems with stochastic demands have been widely studied. Analytical results for the capacity expansion problem with a single facility and discrete capacity were derived by Angelus et al. (2000), who found that the optimal policy of the problem is similar to the well-known policy in inventory management. More recently, papers have investigated problems where contraction of capacity is allowed (Angelus and Porteus 2002). Ye and Duenyas (2007) studied a capacity investment problem with two-sided fixed costs, and characterized the structure of the optimal policy. This result was further extended by considering risk aversion of decision makers (Lu and Yan 2016). For multi-facility problems, Eberly and Van Mieghem (1997) studied a multi-factor capacity investment problem and derived the structure of the optimal policy, but the result relies on the assumptions that the capacity is continuous and the capacity adjustment costs are convex. Two-stage multi-facility capacity expansion models with discrete capacity have been studied in the semiconductor industry (Geng et al. 2009) and in transportation systems (Dong et al. 2015), where the first stage sets up the capacity plan and the second stage includes the wait-and-see allocation decisions among the installed facilities. However, two-stage models are inflexible, as the capacity investment plan remains unchanged over the planning horizon regardless of the realizations of uncertainty. Huang and Ahmed (2009) studied a multi-facility capacity expansion problem with discrete capacity, and derived an analytical bound on the value of the multi-stage model over that of the two-stage model. Truong and Roundy (2011) developed a multi-dimensional approximation algorithm to solve capacity expansion problems by decomposing the problem via a cost-separation scheme.
Similar applications of multi-stage multi-facility capacity expansion models with discrete capacity can also be found in waste-to-energy systems (Zhao et al. 2018) and in mobility on-demand transportation (Cardin et al. 2017a). For comprehensive reviews, interested readers may refer to Luss (1982), Van Mieghem (2003), and Martínez-Costa et al. (2014).
2.2 Stochastic Programming Methods
One prevalent method for solving multi-period optimization problems under uncertainty is multi-stage stochastic programming. Ahmed et al. (2003) solved a multi-facility capacity expansion problem via scenario tree-based multi-stage stochastic programming. Huang and Ahmed (2009) and Taghavi and Huang (2016) decomposed the scenario tree-based model across facilities, and then solved the subproblems via a linear programming (LP)-based approximation algorithm. Taghavi and Huang (2018) considered a similar problem with budget constraints, and solved it with a heuristic Lagrangian relaxation method. Another algorithm that can solve the MCIP with discrete capacity is stochastic dual dynamic integer programming (Zou et al. 2018); it formulates the uncertain parameters as a scenario tree and solves the problem via a nested decomposition scheme. However, when solving MCIP, the number of nodes in the scenario tree increases exponentially not only with the number of stages but also with the dimension of the uncertain parameters, which makes the size of the tree astronomical and the convergence of the algorithm slow.
Decision rule-based methods have been proposed to solve multi-stage capacity planning problems. A decision rule is a function that maps the realized uncertainty to the decisions; it approximates the policy space of a multi-stage problem by specifying the functional form of the policy (Shapiro et al. 2009). In previous studies, if-then decision rules have been proposed to solve multi-stage capacity expansion problems. An if-then decision rule can be stated as: if the gap of capacity exceeds a certain threshold, then expand the capacity to a certain level (Cardin et al. 2017b). Zhao et al. (2018) extended the if-then decision rule to a multi-facility problem by introducing a weighted matrix, and designed a decomposition algorithm to optimize the control parameters of the rule. Unfortunately, these methods work only for systems with capacity expansion options. To extend the if-then decision rules to problems with capacity contraction, the proposed rules require redesign: another if-then statement for capacity contraction is needed, which may complicate the problem because additional binary variables are introduced. In addition, the proposed if-then rules do not work if uncertain parameters other than demand are introduced.
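As a concrete illustration of such a rule, the threshold logic can be written in a few lines; the parameter names and the expand-to-level form below are our own illustrative choices, not the exact rule of Cardin et al. (2017b):

```python
def if_then_expansion_rule(capacity, demand, threshold, target_level):
    """Illustrative if-then expansion rule for a single facility:
    if the capacity gap (demand minus installed capacity) exceeds a
    threshold, expand the capacity to a target level; otherwise keep
    the current capacity. Names are hypothetical."""
    gap = demand - capacity
    if gap > threshold:
        return max(capacity, target_level)
    return capacity
```

A contraction option would require a second if-then statement of the same shape, which is exactly the redesign burden noted above.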
2.3 Dynamic Programming Methods
As another alternative, approximate dynamic programming (ADP), or reinforcement learning, incorporates the capacity decisions and uncertain parameters into an underlying MDP. The basic idea of ADP is to approximate the value functions or Q-functions via state aggregation (Bertsekas 2012), basis function approximation (Munos and Szepesvári 2008, Antos et al. 2008, Powell 2011), or sample average approximation (Haskell et al. 2016). Many ADP methods in the literature assume that the action space of the MDP is finite, so that all possible actions can be enumerated in the action selection procedure. However, finding the optimal action via enumeration can be extremely time-consuming when the action space is multi-dimensional. Policy gradient algorithms have been proposed to find the optimal action for problems with continuous and high-dimensional action spaces (Lillicrap et al. 2015, Schulman et al. 2015), but this type of method may not work for derivative-free problems such as MCIPs with discrete capacity. Other methods for derivative-free problems include direct policy search, where the policy is approximated, for example, via neural networks and then optimized by a black-box optimizer (Hu et al. 2017). Jain and Varaiya (2010) approximated the policies via sampling and proposed a simulation-based optimization framework for policy improvement. However, the number of samples required by this type of method can be large. The NNFVI algorithm studied in this paper was partly presented in a conference paper (Zhao et al. 2017), but Zhao et al. (2017) studied a problem with capacity expansion only, and the proposed algorithm selects the optimal action via enumeration.
3 Problem Description
Notation. Throughout this paper, we denote as an -dimensional vector of zeros. Denote as the set of integers and as the set of nonnegative integers. We further denote as the support of a random variable. Let and denote the dimensions and extreme points of a set, respectively. We denote as the norm for vectors; if , we use instead of . Finally, for a real-valued measurable function and a probability distribution defined over , we define the norm by .
Consider a multi-stage stochastic capacity investment problem with customers and facilities . In each time period, customer demand is allocated to and satisfied by the facilities. The objective is to maximize the net present value (NPV) by determining the optimal capacity investment plan and demand allocation plan over the finite planning horizon . Denote as the random demand from customer in time period , and denote its vector form as . Denote as a realization of , with vector form . Let be the installed capacity vector at the end of time period , and let be the initial capacity vector. Define to be the domain of the demand, i.e. Define as the domain of the capacity and as a vector of capacity limits (we assume the capacity is bounded), so that
Our main assumptions are listed below. {assumption} The process, , is a Markov process with components. Without loss of generality, we assume is independent of the installed capacity for all .
Given Assumption 3, if we denote the conditional probability function by , then we have for all . If demand is continuous, then is the conditional density function.
The demand is nonnegative and bounded, i.e. there exists such that for all . If the demand is continuous, we further assume the conditional densities for all are Lipschitz continuous. More specifically, there exists such that
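For intuition, a demand path satisfying these assumptions (Markovian, nonnegative, bounded) can be simulated as follows; the AR(1)-style dynamics, the clipping bounds, and the parameter values are our own illustrative assumptions, not the paper's demand model:

```python
import random

def simulate_markov_demand(d0, T, rho=0.7, sigma=2.0, d_max=50.0, seed=0):
    """Illustrative Markov demand path: the next demand depends only on
    the current demand (Markov property), and is clipped to stay
    nonnegative and bounded by d_max, as assumed in the text."""
    rng = random.Random(seed)
    path = [d0]
    for _ in range(T):
        nxt = rho * path[-1] + rng.gauss(0.0, sigma)  # AR(1) step
        path.append(min(max(nxt, 0.0), d_max))        # enforce bounds
    return path
```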
The system has the option to expand/contract the installed capacity from to at the end of each period , and capacity adjustments are instantaneous. {assumption} The expansion cost and salvage value are linear with respect to the capacity, and the per unit expansion cost is not smaller than the per unit salvage value.
Assumption 3 is standard, since real-world demands are always finite and their variation from one period to the next does not surge to infinity. In addition, we assume that the lead time of capacity adjustment (compared to the length of each time period) is negligible in our strategic capacity investment problem. However, the lead time for capacity adjustment in some industries can be long; for example, in the semiconductor industry, the lead time for purchasing machines can vary from six to eighteen months (Truong and Roundy 2011).
In this paper, we consider an MCIP with linear expansion costs and salvage values, but the proposed method can also handle more general cases in which the expansion cost/salvage value is nonlinear. In addition, the resale value of an asset is usually smaller than its purchase price, so the second statement of Assumption 3 is realistic. The notation for our model is summarized in Table 1.
Indices and sets  

Index for customers  
Index for facility  
Index for time period  
Set of customers, , and  
Set of facilities, , and  
Set of time periods, , and  
Parameters  
Amount of demand generated from customer in time ; the vector form is denoted as for all  
Discount factor of time value of money,  
Unit revenue from satisfying customer with facility in time  
Unit penalty cost for unsatisfied customer in time  
Coefficient parameters of per unit expansion cost of facility in time  
Coefficient parameters of per unit salvage value of facility in time  
Variables  
Capacity of facility in time ; the vector form is  
Initial capacity of facility ; the vector form is  
Amount of demand allocated from customer to facility in time 
In each period, we can allocate the realized demand to the facilities, subject to the currently installed capacity . A penalty with unit cost is incurred if demand from customer is unsatisfied. Denote as the amount of demand allocated from customer to facility in time period and as its corresponding revenue. Denote the operating profit in time ; it is given by:
\begin{align}
\pi_t(k_{t-1}, d_t) = \max_{x}\ & \sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}} r_{ijt}\, x_{ijt} - \sum_{i\in\mathcal{I}} p_{it}\Big(d_{it} - \sum_{j\in\mathcal{J}} x_{ijt}\Big) \tag{1}\\
\text{s.t. }\ & \sum_{i\in\mathcal{I}} x_{ijt} \le k_{j,t-1}, \quad \forall j\in\mathcal{J}, \tag{2}\\
& \sum_{j\in\mathcal{J}} x_{ijt} \le d_{it}, \quad \forall i\in\mathcal{I}, \tag{3}\\
& x_{ijt} \ge 0, \quad \forall i\in\mathcal{I},\ \forall j\in\mathcal{J}. \tag{4}
\end{align}
In the above, for all are the decision variables that allocate realized demands to the installed facilities, and the objective function (1) maximizes the current reward, which consists of the revenues and the penalties for unsatisfied demands. Constraints (2) and (3) are capacity and demand constraints, respectively. Note that the allocation decisions depend on the current state only and do not affect future transitions.
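As a sketch, the per-period allocation problem (1)–(4) is a small transportation-type LP and can be handed to an off-the-shelf solver; the function name, array layout, and use of SciPy's HiGHS backend are our own illustrative choices:

```python
import numpy as np
from scipy.optimize import linprog

def allocate_demand(r, p, d, k):
    """Solve the allocation LP: maximize revenue from allocations x[i,j]
    minus penalties for unmet demand, subject to installed capacity k[j]
    and realized demand d[i].  r: (n_cust, n_fac) unit revenues,
    p: (n_cust,) unit penalties."""
    n, m = r.shape
    # Unmet demand for customer i is d[i] - sum_j x[i,j]; folding the
    # penalty into the objective gives coefficients -(r[i,j] + p[i])
    # (linprog minimizes, so signs are flipped).
    c = -(r + p[:, None]).ravel()
    A_cap = np.zeros((m, n * m))            # sum_i x[i,j] <= k[j]
    for j in range(m):
        A_cap[j, j::m] = 1.0
    A_dem = np.zeros((n, n * m))            # sum_j x[i,j] <= d[i]
    for i in range(n):
        A_dem[i, i * m:(i + 1) * m] = 1.0
    res = linprog(c, A_ub=np.vstack([A_cap, A_dem]),
                  b_ub=np.concatenate([k, d]), bounds=(0, None),
                  method="highs")
    x = res.x.reshape(n, m)
    profit = float((r * x).sum() - (p * (d - x.sum(axis=1))).sum())
    return x, profit
```

Because the allocation decisions do not affect future transitions, this LP can be solved independently in every period once the state is observed.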
Denote as the capacity adjustment costs in time . Denote and as the unit expansion cost and unit salvage value for facility respectively. According to Assumption 3, we have for all . The cost function is then
which is convex in .
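For a single facility, the cost just described can be sketched as a small helper (names are ours); a negative value represents net salvage revenue, and the assumption that the unit expansion cost is at least the unit salvage value is what makes this function convex in the adjustment:

```python
def adjustment_cost(k_prev, k_new, a_expand, b_salvage):
    """Capacity adjustment cost for one facility: pay a_expand per unit
    of expansion, receive b_salvage per unit salvaged.  With
    a_expand >= b_salvage, this piecewise linear function of the
    adjustment k_new - k_prev is convex."""
    delta = k_new - k_prev
    return a_expand * max(delta, 0) - b_salvage * max(-delta, 0)
```

The total cost over facilities is then the sum of these per-facility terms, which preserves convexity.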
Based on the aforementioned assumptions, the MCIP can be modeled as an MDP. The state in time period can be represented by a two-tuple —i.e. the installed capacity in time and the realized demands—and the action is the adjusted capacity . The state space of our problem, therefore, is and the action space is for all . Denote as a Markov decision rule for time . We denote the class of Markov policies as . Without loss of generality, we assume the initial state for the MCIP is , and that the system salvages all installed capacity at the end of period , so . The MCIP can then be formulated as the following dynamic optimization problem:
(5) 
where is the discount factor. The objective of the above problem is to find an optimal policy, i.e. , such that the expected total rewards are maximized. We can solve Problem (5) via its dynamic programming equations. {theorem} Let and define the following dynamic equations for all :
(6)  
(7) 
Given Assumption 3, Eqs. (6)–(7) recover the optimal policy of Problem (5).
Above, we model MCIP as a finite-horizon MDP with a finite action space. This problem can be solved by DP-based algorithms, such as value iteration (VI), but VI is not applicable to MCIP when the dimensions of the state/action spaces are high. More specifically, if demands are discrete, the complexity of VI is , where is of dimensions and is of dimensions. For example, consider a system with four customers and three facilities and . If and the demands of each customer are integer values ranging from to , then the complexity of VI is . Therefore, VI is intractable even for a medium-sized problem. If the demands are continuous, the state space is infinite and exact VI cannot be performed.
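A back-of-envelope computation makes the curse of dimensionality concrete; the problem sizes below are illustrative assumptions, not the exact instance in the text:

```python
def vi_complexity(T, d_levels, n_cust, k_levels, n_fac):
    """Tabular VI cost O(T * |S| * |A|): the state space couples demand
    (d_levels ** n_cust) with capacity (k_levels ** n_fac), and each
    Bellman update enumerates the action space (k_levels ** n_fac)."""
    n_states = (d_levels ** n_cust) * (k_levels ** n_fac)
    n_actions = k_levels ** n_fac
    return T * n_states * n_actions
```

Even modest per-dimension grids (e.g. a few dozen demand and capacity levels with four customers and three facilities) push the count beyond 10^12 Bellman evaluations, which is why we turn to approximation.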
3.1 The FVI Algorithm
Since the state space is large or continuous, evaluating the exact value functions in Eqs. (6)–(7) is intractable because of the curse of dimensionality. To respond to this challenge, we use FVI to fit the value functions using a finite number of samples generated from the state space; the required number of samples is generally much smaller than the size of the original state space. Then, we can approximately solve the Bellman equations in Eqs. (6)–(7) by using the approximate value functions in place of the exact ones.
Define and as the sets of indices for the samples generated from the state space and from the state transitions, respectively. First, a set of state samples is drawn from in the last period , and their values are calculated according to Eq. (6). Then, a set of samples, i.e. , can be generated. Denote as the parametric approximate value functions given adjustable parameters , where and is finite (a simple example is to approximate the value functions via linear basis functions, i.e. ). Then, can be trained by solving the following regression problem:
where is the regularization parameter (regularizing the objective function can improve the generalization of the function fit and avoid overfitting). Subsequently, the algorithm proceeds backwards in time . In time , we draw samples of current states. Since the transitions of demands are independent of the actions, for each state sample , we generate the future transitions given via Monte Carlo simulation; that is
Then, we can calculate the estimated values given the trained function , and solve . The detailed procedure of the algorithm is summarized below.

Step 0: Initialize , , initial state , and the MDP parameters. Set .

Step 1: Draw samples independently from the state space and generate samples of future transitions given each for all .

Step 2: Compute for ,
(8) 
Step 3: Given samples , fit the approximate value functions
(9) 
Step 4: If , set and go to Step 1; otherwise, terminate and return
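The backward loop in Steps 0–4 can be sketched as a generic skeleton; every callable below is a placeholder we introduce for the problem-specific components (samplers, reward, action set, regression), and the inner enumeration over actions is the step that the decomposition algorithm of Section 5 is designed to accelerate:

```python
def fitted_value_iteration(T, gamma, sample_states, sample_transitions,
                           reward, actions, fit, predict):
    """Skeleton of FVI: go backward in time; at each sampled state,
    select the best action by enumeration against the Monte Carlo
    average of next-stage fitted values, then regress the resulting
    targets to obtain the stage-t approximate value function."""
    V_hat = {T + 1: lambda s: 0.0}          # terminal continuation value
    for t in range(T, 0, -1):               # backward induction
        X, y = [], []
        for s in sample_states(t):          # Step 1: state samples
            vals = []
            for a in actions(t, s):         # Step 2: enumerate actions
                nxt = sample_transitions(t, s, a)
                cont = sum(V_hat[t + 1](s2) for s2 in nxt) / len(nxt)
                vals.append(reward(t, s, a) + gamma * cont)
            X.append(s)
            y.append(max(vals))
        theta = fit(X, y)                   # Step 3: regression
        V_hat[t] = (lambda th: lambda s: predict(th, s))(theta)
    return V_hat
```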
There are two questions that need to be answered about the above FVI algorithm. First, what is an appropriate approximator such that can fit the true value function with arbitrary precision? Second, the action selection problem, i.e. Eq. (8), involves a multi-dimensional action space, making it difficult to solve by enumeration; how can we speed it up? We answer both questions: we discuss the choice of approximator in Section 4, and then introduce a decomposition algorithm to accelerate Problem (8) in Section 5.
4 Neural Network-based Fitted Value Iteration Algorithm
Munos and Szepesvári (2008, Theorem 2 & Corollary 4) have shown that FVI can achieve an optimal solution with probability when: (i) the functional family used for approximation is sufficiently rich, and (ii) the sample complexity increases polynomially in the scale of the problem instance. We therefore first verify that the value functions of MCIP are Lipschitz, and then discuss the choice of approximator that is rich enough to handle Lipschitz functions, so that FVI is able to derive an optimal solution with high probability.
4.1 Fitting the Value Function
The value functions for all are defined over , which is not connected since is finite. In addition, if demand is discrete, then the state space is finite and discrete. To have a better understanding of the structure of the value functions, we wish to extend to its smallest connected superset. Then, we will construct a set of extended value functions.
First we define
where . Note that can be either continuous or discrete in the above definitions. Now, we may define extended value functions for all . If can recover the exact values of on , then we can analyze the characteristics of to learn about . The study of is more amenable since is a connected set.
The dynamic programming equations for the extended value functions at are given by and
(10)  
(11) 
Note that the values of and are attained given any .
Next we show that the value functions can be recovered from the extended functions on .
for all .
We now show that the extended value functions of MCIP, i.e. , are bounded and Lipschitz. First, we show that the reward function is bounded by a constant
We have for all .
Proof. Since is a bounded set and for all , upper and lower bounds on follow by considering the cases of unlimited and no capacity:
Therefore,
Similarly, bounds on follow by assuming the capacity is changed from zero to or the reverse:
Since we have assumed , it follows that
Then, for all , we have
which concludes the proof. ∎
Next, we show that the extended value functions are Lipschitz where
This property is extremely important because it allows us to show that our neural net approximation architecture is rich enough.
The functions for all in Eq. (10) are Lipschitz, i.e. for all and .
Proof. We only provide the proof for the continuous-demand case with density function ; the discrete case can be proved similarly. To simplify the notation, we denote the state variable as and the action as . First, we show that for all are bounded. According to Lemma 4.1, we have for all . For , we have for all . Note that is independent of , and we have
Next, we prove the Lipschitz condition for . First, for all , compute
For , we have
In general, for , we have
(12)  
(13)  
(14) 
using the fact that for are bounded according to Assumption 3. Thus, there exists such that are Lipschitz functions.
FVI proceeds via backward induction: the value function in the last period is solved first, then approximated, and then the remaining value functions are calculated by going backwards in time. If the approximation in the last period is poor, the error can be passed on to the approximation in earlier stages. In the next result, we use backward induction to understand the structure of .
The extended value functions, i.e. Eqs. (10)–(11), have the following properties:

is a piecewise linear function;

for all are piecewise linear functions if is finite.
Proof. First, we know that piecewise linearity is preserved under finite summation and under the max/min of a finite collection. Now, observe that is a piecewise linear function when defined over . This result follows by transforming into its dual. Since is a linear programming problem and the optimal value is finite and attained for all , strong duality holds (see, e.g., Bertsimas and Tsitsiklis 1997, Chapter 4). Thus, we have
s. t. 
Let denote the feasible set of for the above problem, where and . Since the dual is also a linear programming problem, the optimal solutions can be chosen from the extreme points of their feasible regions (Bertsimas and Tsitsiklis 1997, Chapter 3), and so the above problem is equivalent to
Therefore, is piecewise linear and concave in since it is the min of a finite collection of linear functions. Since and is convex in , is piecewise linear for all and concave in .
To prove (ii), as we have shown that is piecewise linear and concave, by backward induction, we have
Since is piecewise linear and is finite, is piecewise linear as it is a finite sum of piecewise linear functions. Therefore, is piecewise linear as the max of a finite set of piecewise linear functions. ∎
Note that for all are nonconcave in given any . As is concave in given any , is concave in given (see the proof in Proposition 4.1), may be nonconcave as it is a finite max of concave functions, and the result follows by backward induction.
According to Proposition 4.1(i), we know that is a piecewise linear function if the capacity is defined over a connected space . However, for , the value functions , according to Proposition 4.1(ii), are piecewise linear only when the demand is discrete. Fortunately, if the demands are continuous, we can use Monte Carlo simulation to generate finitely many samples of future transitions to approximate the expectation in Eqs. (6)–(7). In this setting, the approximate value functions are still piecewise linear. However, given Remark 4.1, we may need to solve nonconvex optimization problems in each .
4.2 Two-layer Neural Network with ReLU
To solve the MCIP, we approximate the value functions of the problem using neural networks with piecewise linear activation functions. Neural networks are powerful approximators and can incorporate structure of the target function (Jain et al. 1996). In our case, we will use two-layer neural networks. As we will see later in this section, a two-layer network is powerful enough to approximate our value functions arbitrarily well.
A two-layer neural network consists of inputs, one hidden layer for intermediate computations, and an output. Let index the neurons in the hidden layer of our network. The general form of a two-layer neural network is then
where and are the adjustable weights of the input layer and the hidden layer, respectively, and is the activation function for neuron . In MCIP, the inputs of the network are the states and the outputs are the approximate values. Based on Proposition 4.1, we choose the activation functions to be ReLU, which are themselves piecewise linear:
The architecture of the neural network is shown in Figure 1.
The adjustable weights for the input of neuron in include and an vector of adjustable coefficients
Denote as the vector of all adjustable weights for the input layer and
as a vector of the adjustable parameters for the output of the hidden layer. The neural network at time can then be represented by a function given the input and the adjustable weights :
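In code, the forward pass of this two-layer ReLU architecture can be sketched as follows; the names W, c, beta, and beta0 are our own labels for the adjustable input-layer weights, neuron offsets, and hidden-layer output weights just described:

```python
import numpy as np

def relu_network(s, W, c, beta, beta0):
    """Forward pass of a two-layer ReLU approximator: q hidden neurons,
    each computing relu(W[k] @ s + c[k]), combined linearly by beta with
    intercept beta0.  As a max/sum of affine pieces, the resulting
    function of the state s is piecewise linear, matching the structure
    identified in Proposition 4.1."""
    hidden = np.maximum(W @ s + c, 0.0)     # ReLU activations, shape (q,)
    return float(beta @ hidden + beta0)     # linear output layer
```

Training then amounts to fitting (W, c, beta, beta0) in the regression step of the FVI algorithm.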