Stockout Prediction in Multi-Echelon Networks
Abstract
In multi-echelon inventory systems, the performance of a given node is affected by events that occur at many other nodes and in many other time periods. For example, a supply disruption upstream will affect downstream, customer-facing nodes several periods later as the disruption "cascades" through the system. There is very little research on stockout prediction in single-echelon systems and (to the best of our knowledge) none on multi-echelon systems. However, in the real world, there is clearly significant interest in techniques for this sort of stockout prediction. Our research therefore aims to fill this gap by using deep neural networks (DNNs) to predict stockouts in multi-echelon supply chains.
Afshin Oroojlooyjadid, Lehigh University, Bethlehem, PA 18015, oroojlooy@lehigh.edu
Lawrence Snyder, Lehigh University, Bethlehem, PA 18015, larry.snyder@lehigh.edu
Martin Takáč, Lehigh University, Bethlehem, PA 18015, takac.mt@gmail.com
1 Introduction
A multi-echelon network is a network of nodes that aims to provide a product or service to its customers. Each network consists of production and assembly lines, warehouses, transportation systems, retail processes, etc., and each node is connected to at least one other node. The most downstream nodes of the network face the customers, who typically generate an external stochastic demand. The most upstream nodes interact with third-party vendors, which offer an unlimited source of raw materials and goods. An example of a multi-echelon network is shown in Figure 1, which depicts a distribution network, e.g., a retail supply chain.
The supply chain manager's goal is to strike a balance between profit and the service level (i.e., a number between zero and one giving the fraction of customer orders that are satisfied on time) offered to its customers. For example, a retail network may decide to increase the number of retail stores to improve availability and create more sales, which also results in a higher cost for the system. In this case, the relevant decisions are how many stores to open or close, and where and when, in order to maximize profit. Facility location and network design are the standard mathematical programming problems for optimizing these decisions. Similarly, production and inventory systems must decide where, when, how, and how much of each item to produce or order; scheduling and capacity management are common problems in this area. Distribution systems must decide when, where, how, and how much of each item should be moved; the transportation problem is the most famous problem answering these questions. In well-run companies, multiple systems optimize these problems to provide the best possible balance between service level and profit. In this paper, we focus on inventory management systems and provide an algorithm that answers some of these questions in an environment with stochastic demand.
Balancing service level and profit in an inventory system amounts to balancing stockout costs against the cost of holding safety stock. Stockouts are expensive and common in supply chains; for example, distribution systems regularly face stockouts for both promoted and non-promoted items (Gartner Inc., 2011). Stockouts result in significant lost revenue for the supply chain: when a company faces a stockout, roughly 70% of customers do not wait for inventory to be replenished but instead purchase the item from a competitor (Gruen et al., 2002). Thus, in order not to lose customers and to maximize profit, companies need an inventory management system that provides a high service level at a small cost.
Supply chains have various tools for balancing service levels and stockout costs, and all of these tools use some kind of optimization or decision-making procedure. For example, some companies that produce huge products, such as ships, cannot hold inventory and must balance their service level against their service costs. Others can hold inventory, and in this case the optimization problem finds a compromise between holding and stockout costs. Usually these models assume a given service level and minimize the corresponding costs. As mentioned above, the other relevant questions for an inventory management system are when, where, and how much of each item should be ordered, moved, stored, or transported. These questions can be answered optimally by finding the optimal inventory policy for each node of the network.
One category of models for multi-echelon inventory optimization is the Stochastic Service Model (SSM) approach, which considers stochastic demand and stochastic lead times due to upstream stockouts. The optimal base-stock levels for serial systems without fixed costs can be found by solving a sequence of single-variable convex problems (Clark and Scarf, 1960). Similarly, by converting an assembly system (in which each node has at most one successor) to an equivalent serial system, the optimal solution can be obtained (Rosling, 1989). For more general network topologies, no efficient algorithm exists for finding optimal base-stock levels, and in some cases the form of the optimal inventory policy is not even known (Zipkin, 2000).
Another approach for dealing with multi-echelon problems is the Guaranteed Service Model (GSM) approach. GSM assumes that demand is bounded above, or equivalently that excess demand can be satisfied from outside the system, e.g., by a third-party vendor. It assumes a Committed Service Time (CST) for each node, which is the latest time by which the node will satisfy the demand of its successor nodes. Under this definition, instead of optimizing the inventory level directly, the GSM model optimizes the CST for each node, or equivalently finds the base-stock level for each node of the network that minimizes the holding costs. This approach can handle more general supply chain topologies, typically using either dynamic programming (Graves, 1988; Graves and Willems, 2000) or MIP techniques (Magnanti et al., 2006).
For a review of the GSM and SSM approaches, see Eruguz et al. (2016), Simchi-Levi and Zhao (2012), and Snyder and Shen (2018).
The sense among (at least some) supply chain practitioners is that the current set of inventory optimization models is sufficient to optimize most systems as they function normally. What keeps these practitioners up at night are the deviations from "normal" that occur on a daily basis and pull the system away from its steady state. In other words, there is less need for new inventory optimization models and more need for tools that can help when the real system deviates from the practitioners' original assumptions.
Our algorithm takes a snapshot of the supply chain at a given point in time and makes predictions about how individual components of the supply chain will perform, i.e., whether they will face stockouts in the near future. We assume an SSM-type system, i.e., a system in which demands follow a known probability distribution and stages within the supply chain may experience stockouts, thus generating stochastic lead times for their downstream stages. The stages may follow any arbitrary inventory policy (e.g., base-stock). Classical inventory theory can provide long-run statistics about stockout probabilities and levels (see, e.g., Snyder and Shen (2018); Zipkin (2000)), at least for certain network topologies and inventory policies. However, this theory does not make predictions about the specific points in time at which a stockout may occur. Since stockouts are expensive, such predictions can be very valuable to companies, allowing them to take measures to prevent or mitigate impending stockouts.
Note that systems whose base-stock levels were optimized using the GSM approach may also face stockouts, even though the GSM model itself assumes they do not. The GSM approach assumes a bound on the demand; when the real-world demand exceeds that bound, it may not be possible or desirable to satisfy the excess demand externally, as the GSM model assumes, and therefore stockouts may occur in these systems. Hence, stockout prediction can be useful for practitioners of both the SSM and GSM approaches.
In a single-node network, one can compute the stockout probability and make stockout predictions if the probability distribution of the demand is known (see Appendix A). However, to the best of our knowledge, there are no algorithms for making stockout predictions in multi-echelon networks. To address this need, we propose an algorithm that provides stockout predictions for each node of a multi-echelon network and works for any network topology (as long as it contains no directed cycles) and any inventory policy.
The remainder of the paper is organized as follows. In Section 2, we introduce our algorithm. Section 3 describes three naive algorithms for predicting stockouts. To demonstrate the efficiency of the proposed algorithm in terms of solution quality, we compare our results with those of the naive algorithms in Section 4. Finally, Section 5 concludes the paper and proposes future research.
2 Stockout Prediction Algorithm
We develop an approach that provides stockout predictions for multi-echelon networks with available data features. Our algorithm is based on deep learning, i.e., a deep neural network (DNN). A DNN is a non-parametric machine learning model, meaning that it does not make strong assumptions about the functional relationship between the input and output variables. In the supply chain domain, DNNs have been applied to demand prediction (Efendigil et al., 2009; Vieira, 2015; Ko et al., 2010) and quantile regression (Taylor, 2000; Kourentzes and Crone, 2010; Cannon, 2011; Xu et al., 2016). They have also been successfully applied to the newsvendor problem with data features (Oroojlooyjadid et al., 2016). The basics of deep learning are covered in Goodfellow et al. (2016).
Consider a multi-echelon supply chain network with $N$ nodes and arbitrary topology. For each node of the network, we know the history of the inventory level (IL), i.e., the on-hand inventory minus backorders, and of the inventory in transit (IT), i.e., the items that have been shipped to the node but have not yet arrived; the values of these quantities at node $i$ in period $t$ are denoted $IL_{it}$ and $IT_{it}$, respectively. In addition, we know the stockout status of the node, given as a True/False Boolean, where True indicates that the node experienced a stockout. (We use 1 and 0 interchangeably with True and False.) The historical stockout information is not used to make predictions at time $t$ but is used to train the model. The demand distribution can be known or unknown; in either case, we assume historical demand information is available. The goal is to provide a stockout prediction for each node of the network for the next period.
The available information that can be provided as input to the DNN includes the values of the available features (e.g., day of week, month of year, weather information), along with the historical observations of IL and IT at each node. Therefore, the available information at time $t$ can be written as:

$$\big( f_t;\; IL_{i1},\ldots,IL_{it};\; IT_{i1},\ldots,IT_{it} \;:\; i = 1,\ldots,N \big), \qquad (1)$$

where $f_t$ denotes the vector of feature values at time $t$.
However, DNN algorithms are designed for inputs of fixed size; in contrast, the vector in (1) grows at every time step. Therefore, we consider only the historical information from the most recent $m$ periods instead of the full history. Although this omits some potentially useful information about the network, it unifies and reduces the input size, which has computational advantages, and selecting a large enough $m$ provides a good level of information about the system. Therefore, the input to the DNN is:

$$\big( f_t;\; IL_{i,t-m+1},\ldots,IL_{it};\; IT_{i,t-m+1},\ldots,IT_{it} \;:\; i = 1,\ldots,N \big). \qquad (2)$$
The output of the DNN is the vector of stockout predictions for period $t+1$, denoted $\hat{y}_{t+1}$, of length $N$; its $i$th element $\hat{y}_{i,t+1}$ equals 1 if node $i$ is predicted to have a stockout in period $t+1$ and 0 otherwise.
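To make the input construction concrete, the following sketch assembles the fixed-size input vector of (2) from per-node histories. The function name `build_input`, the window length `m`, and the toy histories are illustrative, not from the paper.

```python
# Sketch: assembling the fixed-size DNN input of (2) from node histories.
# Names (build_input, m, IL, IT, features_t) are illustrative assumptions.
def build_input(features_t, IL_hist, IT_hist, m):
    """Concatenate current feature values with the last m periods of IL and
    IT for every node. IL_hist/IT_hist: dict node -> list of per-period values."""
    x = list(features_t)
    for node in sorted(IL_hist):
        x.extend(IL_hist[node][-m:])   # last m inventory levels for this node
        x.extend(IT_hist[node][-m:])   # last m in-transit quantities
    return x

# Toy two-node example with one exogenous feature and a window of m = 3:
IL = {1: [5, 3, -2, 4], 2: [7, 6, 5, 5]}
IT = {1: [0, 2, 2, 0], 2: [1, 0, 0, 1]}
x = build_input([0.4], IL, IT, m=3)
# length = 1 feature + 2 nodes * (3 IL + 3 IT) = 13
```

The resulting vector has the same length in every period, which is what the fully connected input layer requires.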
A DNN is a network of nodes, beginning with an input layer (representing the inputs, i.e., (2)), ending with an output layer (representing the prediction vector), with one or more hidden layers in between. Each node applies a mathematical function, called an activation function, to transform the inputs it receives into outputs that it sends to the next layer, with the ultimate goal of approximating the relationship between the overall inputs and outputs. In a fully connected network, each node of each layer is connected to every node of the next layer through coefficients, called weights, which are initialized randomly. "Training" the network consists of determining good values for these weights, typically using nonlinear optimization methods. (A more thorough explanation of DNNs is outside the scope of this paper; see, e.g., Goodfellow et al. (2016).)
A loss function is used to evaluate the quality of a given set of weights. The loss function measures the distance between the predicted values and the known values of the outputs. We consider the following loss functions, which are commonly used for binary outputs such as ours:
- Hinge loss function
- Euclidean loss function
- Softmax loss function
The hinge and Euclidean loss functions are reviewed in Appendix C. The softmax loss function uses the softmax function, which is a generalization of logistic regression and is given by
$$p_j = \frac{\exp\big( \sum_{i=1}^{n_{L-1}} w_{ij}^{L-1} a_i^{L-1} \big)}{\sum_{c=1}^{C} \exp\big( \sum_{i=1}^{n_{L-1}} w_{ic}^{L-1} a_i^{L-1} \big)}, \qquad j = 1,\ldots,C, \qquad (3)$$

where $C$ is the number of possible categories (in our case, $C = 2$), $L$ is the number of layers in the DNN network, $a_i^l$ is the activation value of node $i$ in layer $l$, $w_{ij}^l$ is the weight between node $i$ in layer $l$ and node $j$ in layer $l+1$, and $n_l$ represents the number of nodes in layer $l$. Then the softmax loss function is given by
$$E = -\frac{1}{n} \sum_{m=1}^{n} \sum_{j=1}^{C} \mathbb{1}\{ y_m = j \} \log p_{mj}, \qquad (4)$$

where $n$ is the total number of training samples, $\mathbb{1}\{\cdot\}$ is the indicator function, $y_m$ is the true class of sample $m$, $p_{mj}$ is the probability (3) that the network assigns to class $j$ for sample $m$, and $E$ is the loss function value, which evaluates the quality of a given classification (i.e., prediction). In essence, the loss function (4) penalizes predictions whose softmax probabilities (3) place little mass on the true class.
The hinge and softmax functions provide a score for each of the possible classes; we then take the argmax over these scores to choose the predicted class. In our case there are $C = 2$ classes, i.e., the True and False values required by the prediction procedure. The Euclidean function, on the other hand, provides a continuous value, which must be converted to a binary output; we round it to the nearest of 0 and 1.
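The prediction step above can be sketched as follows: convert output-layer scores to softmax probabilities and take the argmax. This is a minimal, self-contained illustration, not the paper's implementation.

```python
import math

# Sketch: softmax probabilities as in (3), followed by an argmax prediction.
def softmax(z):
    mx = max(z)                         # subtract max for numerical stability
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(z):
    p = softmax(z)                      # p[1] = predicted stockout probability
    return 1 if p[1] > p[0] else 0      # argmax over the two classes

p = softmax([1.0, 2.0])
pred = predict([1.0, 2.0])              # class 1 (stockout) has the larger score
```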
Choosing the weights of the neural network involves solving a nonlinear optimization problem whose objective function is the loss function and whose decision variables are the network weights. We therefore need the gradients of the loss function with respect to the weights; these are usually obtained using backpropagation or automatic differentiation. The weights are then updated using a first- or second-order algorithm, such as gradient descent, stochastic gradient descent (SGD), SGD with momentum, L-BFGS, etc. The procedure repeats iteratively until one of the following stopping criteria is met:
- The loss function value falls below the tolerance Tol, or
- The number of passes over the training data reaches MaxEpoch.

Tol and MaxEpoch are parameters of the algorithm; in our experiments we use MaxEpoch = 3 (see Section 4).
The loss function provides a measure for monitoring the improvement of the DNN over the iterations. However, it cannot be used to measure prediction quality and is not meaningful by itself. Since the prediction output is binary, the test error (the number of wrong predictions divided by the number of samples) is an appropriate measure. Moreover, statistics on false positives (type I errors, incorrectly predicting a stockout) and false negatives (type II errors, failing to predict an actual stockout) are helpful, and we use them to gain more insight into how the algorithm works.
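These evaluation measures can be computed as in the following sketch (function and variable names are illustrative):

```python
# Sketch: test error plus type I (false positive) and type II (false
# negative) counts for binary stockout predictions.
def error_stats(y_true, y_pred):
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    err = (fp + fn) / len(y_true)       # fraction of wrong predictions
    return err, fp, fn

err, fp, fn = error_stats([0, 1, 1, 0, 1], [0, 1, 0, 1, 1])
# one false positive and one false negative out of five samples
```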
The DNN algorithm provides a single prediction in which false positive and false negative errors are weighted equally. However, the modeler should be able to control the likelihood of a stockout prediction, i.e., the balance between false positive and false negative errors. To this end, we need a loss function that provides control over this balance, since the DNN's output is directly shaped by its loss function.
The loss functions mentioned above contain no weighting coefficients and place equal weight on outputs of 0 (predicting no stockout) and 1 (predicting stockout). To correct this, we propose weighting the loss incurred for each output using coefficients $c_p$ and $c_n$, which represent the costs of false positive and false negative errors, respectively. In this way, when $c_n > c_p$, the DNN tries to reduce the number of cases in which it returns False but a stockout actually occurs; it therefore predicts more stockouts, resulting in fewer false negative errors and more false positive errors. Similarly, when $c_p > c_n$, the DNN predicts fewer stockouts to avoid cases in which it returns True but no stockout occurs, making fewer false positive errors and more false negative errors. If $c_p = c_n$, the revised loss functions behave like the original ones.
Using this approach, the weighted hinge, Euclidean, and softmax loss functions are as follows, where $c_p$ and $c_n$ are the cost coefficients for false positive and false negative errors, $z_{m0}$ and $z_{m1}$ are the output-layer scores of sample $m$ for the no-stockout and stockout classes, $\hat{y}_m$ is the continuous output used by the Euclidean loss, and $p_{mj}$ is the softmax probability (3).

Hinge:

$$E_m = c_n \max(0,\, 1 + z_{m0} - z_{m1}) \quad \text{if } y_m = 1, \qquad (5a)$$
$$E_m = c_p \max(0,\, 1 + z_{m1} - z_{m0}) \quad \text{if } y_m = 0. \qquad (5b)$$

Euclidean:

$$E_m = c_n (\hat{y}_m - 1)^2 \quad \text{if } y_m = 1, \qquad (6a)$$
$$E_m = c_p\, \hat{y}_m^2 \quad \text{if } y_m = 0. \qquad (6b)$$

Softmax:

$$E = -\frac{1}{n} \sum_{m=1}^{n} \Big( c_n \mathbb{1}\{y_m = 1\} \log p_{m1} + c_p \mathbb{1}\{y_m = 0\} \log p_{m0} \Big), \qquad (7)$$

where $c_p, c_n \geq 0$. Thus, these loss functions allow one to manage the balance between false positive and false negative errors.
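As an illustration of this per-class weighting, the following sketch computes a weighted cross-entropy over softmax outputs, scaling the loss of each sample by the cost of the error that a wrong prediction on it would cause. The function and argument names are ours, not the paper's.

```python
import math

# Sketch of a weighted softmax (cross-entropy) loss: the log-probability of
# the true class is scaled by c_n when a stockout occurred (so false
# negatives are penalized) and by c_p otherwise. Names are illustrative.
def weighted_softmax_loss(probs, labels, c_p, c_n):
    total = 0.0
    for p, y in zip(probs, labels):     # p = (P(no stockout), P(stockout))
        w = c_n if y == 1 else c_p
        total -= w * math.log(p[y])
    return total / len(labels)

lo = weighted_softmax_loss([(0.9, 0.1)], [1], c_p=1.0, c_n=1.0)
hi = weighted_softmax_loss([(0.9, 0.1)], [1], c_p=1.0, c_n=5.0)
# raising c_n makes a missed stockout five times as costly
```

During training, gradients of this weighted loss push the network toward predicting more (or fewer) stockouts as the cost ratio changes.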
3 Naive Approaches
In this section, we propose three naive approaches to predict stockouts. These algorithms are used as baselines for measuring the quality of the DNN algorithm. They are easy to implement, but they do not consider the system state at any nodes other than the node for which we are predicting stockouts. (The proposed DNN approach, in contrast, uses the state at all nodes to provide a more effective prediction.)
In all of the naive algorithms, we use $IP_t$ to denote the inventory position (inventory level plus inventory in transit) in period $t$. The training and testing sets contain $n_{tr}$ and $n_{te}$ records, respectively, and $d_t$ denotes the customer demand in period $t$ of the training set. Finally, the function approximator() takes a list of numbers, fits a normal distribution to it, and returns the corresponding parameters of that normal distribution.
Naive Algorithm 1 first identifies all periods in the training data in which a stockout occurred and builds a list of the inventory positions in the period immediately preceding each such stockout. It then fits a normal distribution to the values in this list and calculates the $\alpha$th quantile of that distribution, for a given value of $\alpha$. Finally, it predicts a stockout in period $t+1$ if $IP_t$ is less than that quantile. The value of $\alpha$ is chosen by the modeler.
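A minimal sketch of Naive Algorithm 1, using Python's `statistics.NormalDist` in place of the approximator() routine (helper names and the toy data are ours):

```python
from statistics import NormalDist

# Sketch of Naive Algorithm 1: fit a normal distribution to the inventory
# positions observed in the period just before each historical stockout,
# then flag the next period as a stockout whenever the current inventory
# position falls below the alpha-th quantile of that fitted distribution.
def naive1_threshold(pre_stockout_ips, alpha):
    fit = NormalDist.from_samples(pre_stockout_ips)  # plays approximator()
    return fit.inv_cdf(alpha)

def naive1_predict(ip_t, threshold):
    return 1 if ip_t < threshold else 0

thr = naive1_threshold([4.0, 5.0, 6.0, 5.0, 4.0, 6.0], alpha=0.5)
pred_low = naive1_predict(2.0, thr)    # low inventory position -> stockout
pred_high = naive1_predict(9.0, thr)   # comfortable position -> no stockout
```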
Naive Algorithm 2 groups the inventory positions into a set of ranges, calculates the frequency of stockouts in the training data for each range, and then predicts a stockout in period $t+1$ if the range into which $IP_t$ falls experienced stockouts more than $\beta$ of the time in the training data.
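Naive Algorithm 2 can be sketched as follows, assuming equal-width ranges of inventory positions (the bucket width and helper names are illustrative assumptions):

```python
# Sketch of Naive Algorithm 2: bucket inventory positions into equal-width
# ranges, record each bucket's historical stockout frequency, and predict a
# stockout when the current bucket stocked out more than beta of the time.
def bucket_frequencies(ips, stockouts, width):
    counts, hits = {}, {}
    for ip, s in zip(ips, stockouts):
        b = int(ip // width)                    # index of the range
        counts[b] = counts.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + s
    return {b: hits[b] / counts[b] for b in counts}

def naive2_predict(ip_t, freqs, width, beta):
    f = freqs.get(int(ip_t // width), 0.0)      # unseen bucket -> frequency 0
    return 1 if f > beta else 0

freqs = bucket_frequencies([1, 2, 6, 7, 8], [1, 1, 0, 0, 1], width=5)
pred = naive2_predict(1.5, freqs, width=5, beta=0.5)
```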
Finally, Naive Algorithm 3 uses classical inventory theory, which says that the inventory level in period $t+L$ equals $IP_t$ minus the lead-time demand, where $L$ is the lead time (Zipkin, 2000; Snyder and Shen, 2018). The algorithm estimates the lead-time demand distribution by fitting a normal distribution to the training data, then predicts a stockout if $IP_t$ is less than or equal to the $\gamma$th quantile of the estimated lead-time demand distribution, where $\gamma$ is a parameter chosen by the modeler.
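A sketch of Naive Algorithm 3, under the assumption that per-period demand is i.i.d. normal, so that lead-time demand over L periods is normal with mean L times the per-period mean and standard deviation scaled by the square root of L (helper names are ours):

```python
from statistics import NormalDist

# Sketch of Naive Algorithm 3: estimate the lead-time demand distribution
# from per-period demand samples (assumed i.i.d. normal) and predict a
# stockout when the inventory position is at or below its gamma-th quantile.
def naive3_predict(ip_t, demands, L, gamma):
    d = NormalDist.from_samples(demands)             # per-period demand fit
    lt_demand = NormalDist(L * d.mean, (L ** 0.5) * d.stdev)
    return 1 if ip_t <= lt_demand.inv_cdf(gamma) else 0

# Median lead-time demand over L = 2 periods is about 10 here, so an
# inventory position of 8 triggers a stockout prediction:
pred = naive3_predict(8.0, [5.0, 4.0, 6.0, 5.0, 5.0, 5.0], L=2, gamma=0.5)
```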
In Naive Algorithms 1 and 3, the values of $\alpha$ and $\gamma$ (and hence the quantile thresholds), and in Naive Algorithm 2 the value of $\beta$, are selected by the modeler. A small value of $\alpha$ results in a small threshold, so that the algorithm predicts fewer stockouts; the same is true for a small $\beta$ or $\gamma$. Generally, as these parameters decrease, the number of false positive errors decreases relative to the number of false negative errors, and vice versa. Thus, selecting appropriate values is important and directly affects the output of the algorithm. Indeed, these values should be selected according to the preferences of the company running the algorithm. For example, a company with very expensive stockouts may choose a large value so that the algorithm predicts frequent stockouts, with many more false positive errors, and then check the flagged cases one by one to prevent the stockouts. In this situation the number of false positive errors increases; however, the company faces fewer of the costlier false negative errors. In order to determine appropriate parameter values, the modeler should consider the costs of false positive and negative errors (i.e., the weighting coefficients introduced in Section 2).
4 Numerical Experiments
To check the validity and accuracy of our algorithm, we conducted a series of numerical experiments. Since there is no publicly available data of the type needed for our algorithm, we built a simulation model that assumes each node follows a base-stock policy and can place an order only if its predecessor has enough stock to satisfy it, so that only the retailer nodes face stockouts. The simulation records several state variables for each node and each time period. Figure 2 shows a flowchart of the simulation algorithm.
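As a toy illustration of the simulation logic at a single retailer, the following sketch applies an order-up-to (base-stock) policy with zero lead time and records a stockout whenever demand exceeds on-hand stock; the actual simulation additionally handles multiple nodes, lead times, and upstream stock availability.

```python
# Minimal sketch of one retailer under a base-stock policy: each period,
# demand is realized, a stockout is recorded if demand exceeds on-hand
# stock, the inventory level is updated (backorders allowed), and the node
# orders back up to the base-stock level S. Zero lead time is assumed here.
def simulate_retailer(S, demands):
    IL, records = S, []                 # start at the base-stock level
    for d in demands:
        stockout = d > max(IL, 0)       # demand exceeds on-hand stock
        IL = IL - d                     # negative IL represents backorders
        records.append((IL, stockout))
        IL = S                          # order-up-to S arrives immediately
    return records

recs = simulate_retailer(S=10, demands=[4, 12, 7])
# only the period with demand 12 > 10 produces a stockout
```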
To see how our algorithm performs for different network topologies, we conducted multiple tests on five supply chain network topologies, ranging from a simple serial system to complex networks containing (undirected) cycles and little or no symmetry. These tests are intended to explore the robustness of the DNN approach on both simple and very complex networks. The five supply chain networks we used are:
- Serial network with 11 nodes.
- One-warehouse, multiple-retailer (OWMR) network with 11 nodes.
- Distribution network with 13 nodes.
- Complex network I with 11 nodes, including one retailer and two warehouses.
- Complex network II with 11 nodes, including three retailers and one node at the farthest upstream echelon (which we refer to as a warehouse).
We simulated each of the networks, with 75% of the resulting data used for training (and validation) and the remaining 25% for testing. For all of the problems we used a fully connected DNN with 350 and 150 sigmoid nodes in the first and second hidden layers, respectively. The inputs are the inventory levels and on-order inventories for each node in each of the most recent periods (as given in (2)), and the output is the binary stockout prediction for each node. Figure 3 shows a general view of the DNN. Among the loss functions reviewed in Section 2, the softmax loss function had the best accuracy in initial numerical experiments; we therefore selected it and report its results. To this end, we implemented the weighted softmax function and its gradient (see Appendix B) in the DNN framework Caffe (Jia et al., 2014), and all tests were run on machines with 16 AMD cores and 32 GB of memory. To optimize the network, we used SGD with momentum and batches of 50, and each problem was run with MaxEpoch = 3, where each epoch is one pass over the training data. Finally, we tested 118 combinations of the cost coefficients.
The DNN algorithm is scale dependent, meaning that its hyperparameters (such as the learning rate, momentum, etc.; see Goodfellow et al. (2016)) depend on the values of the cost coefficients. Thus, a set of hyperparameters that is appropriate for one pair of cost coefficients does not necessarily work well for another. Ideally, we should therefore retune the hyperparameters, i.e., retrain the network, for each pair of cost coefficients. However, the tuning procedure is computationally expensive, so in our experiments we tuned the hyperparameters for a single pair of cost coefficients and used the resulting values for all other pairs, in all network topologies. In complex network II, this approach did not yield good convergence, so we tuned the network for a second pair of cost coefficients to ensure a non-diverging DNN for every pair. To summarize, our experiments use minimal tuning (except for complex network II), still far less than the total number of possible experiments, and even so the algorithm performs very well; better tuning could further improve our results.
In what follows, we demonstrate the results of the DNN and three naive algorithms in seven experiments. Sections 4.1–4.5 present the results of the serial, OWMR, distribution, complex I, and complex II networks, respectively. Section 4.7 extends these experiments: Section 4.7.1 provides threshold prediction, Section 4.7.2 analyzes the results of a distribution network with multiple items with dependent demand, and Section 4.7.3 shows the results of predicting stockouts multiple periods ahead in a distribution network. In each of the network topologies, we plot the false positive vs. false negative errors for all algorithms to compare their performance. In addition, two other figures in each section show the accuracy vs. false positive and negative errors to provide better insights into the way that the DNN algorithm (weighted and unweighted) works compared to the naive algorithms.
4.1 Results: Serial Network
Figure 4 shows the serial network with 11 nodes. The training dataset is used to train all five algorithms and the corresponding results are shown in Figures 5 and 6.
Figure 5 plots the false-negative errors vs. the false-positive errors for each approach, over a range of quantile values for the naive approaches and a range of weights for the weighted DNN approach. Points closer to the origin indicate more desirable solutions. Since there is just one retailer, each algorithm makes one stockout prediction per testing period; therefore, the number of errors in both figures should be compared to the number of testing periods.
The DNN approach always dominates the naive approaches, with the unweighted version providing slightly better accuracy and the weighted version providing more flexibility. For any given number of false-positive errors, the numbers of false-negative errors of the DNN and WDNN algorithms are smaller than those of the naive approaches, and similarly for a given number of false-negative errors. The naive approaches perform similarly to one another, with Naive1 and Naive3 outperforming Naive2 for most quantile values. Figure 6 plots the errors vs. the accuracy of the predictions and shows that, for a given number of false positives or negatives, the DNN approaches attain a much higher accuracy than the naive approaches. In conclusion, the naive algorithms perform similarly to each other and worse than DNN, since they do not use all of the available historical information; DNN, in contrast, learns the relationship between the state inputs and stockouts and predicts stockouts very well.
4.2 Results: OWMR Network
Figure 7 shows the OWMR network with 11 nodes, and Figures 8 and 9 present the experimental results for this network. Since there are 10 retailers, prediction is more challenging than for the serial network, as the algorithms make 10 stockout predictions per testing period; the number of errors in both figures should therefore be compared to 10 times the number of testing periods.
Figure 8 shows the false-negative errors vs. the false-positive errors for each approach, over a range of quantile values for the naive approaches and a range of weights for the weighted DNN approach. DNN and weighted DNN dominate the naive approaches. The three naive approaches perform similarly, with Naive2 somewhat worse than the other two. Figure 9 plots the errors vs. the accuracy of the predictions and confirms that, for the same number of errors, DNN attains higher accuracy than the naive approaches. It is also apparent that all methods are less accurate for the OWMR system than for the serial system, since there are many more predictions to make; however, DNN still provides better accuracy than the naive approaches.
4.3 Results: Distribution Network
Figure 10 shows the distribution network with 13 nodes, and Figure 11 provides the corresponding results of the five algorithms.
As Figure 11 shows, the DNN approach mostly dominates the naive approaches. However, it does not perform as well as in the serial or OWMR networks, which we attribute to the tuning of the DNN hyperparameters. Among the three naive approaches, Naive3 dominates Naive1: the demand data come from a normal distribution without any noise, and Naive3 also fits a normal distribution, which needs only around 12 samples to obtain a good estimate of the mean and standard deviation. The experiment is therefore biased in favor of Naive3. Plots of the errors vs. the accuracy of the predictions are similar to those in Figure 9 and are omitted to save space.
Compared to the OWMR network, the distribution network has fewer retailer nodes and therefore requires fewer stockout predictions; however, the network is also more complex, and as a result the DNN is less accurate than it is for the OWMR network. We conclude that the accuracy of the DNN depends more on the number of echelons in the system than on the number of retailers. Nevertheless, DNN achieves greater accuracy than any of the naive approaches.
4.4 Results: Complex Network I
Figure 12 shows a complex network with two warehouses (i.e., two nodes at the farthest echelon upstream), and Figure 13 presents the corresponding results of the five algorithms.
Figure 13 plots the false-negative errors vs. the false-positive errors for each approach, over a range of quantile values for the naive approaches and a range of weights for the weighted DNN approach. The DNN approach dominates the naive approaches in most cases but does worse when false positives are tolerated in favor of reducing false negatives. The average accuracy rates for this system are 91% for WDNN and 97% for DNN, which shows the importance of tuning the hyperparameters for each weight of the weighted DNN approach; tuning them for each weight individually would improve the results significantly (but increase the computation time). Plots of the errors vs. the accuracy of the predictions are similar to those in Figure 9 and are omitted to save space.
As in the serial network, there is just one retailer node; however, since the network is more complex, DNN produces less accurate predictions for complex network I than for the serial network or for the other tree networks (OWMR and distribution). The added complexity of this topology affects the accuracy of our model, though the algorithm is still quite accurate.
4.5 Results: Complex Network II
Figure 14 shows the complex network with three retailers and Figure 15 presents the corresponding results of each algorithm.
Figure 15 plots the false-negative errors vs. the false-positive errors for each approach, over a range of quantile values for the naive approaches and a range of weights for the weighted DNN approach, and Figure 16 plots the errors vs. the accuracy of the predictions. As for the other network topologies, we tuned the DNN hyperparameters for a single pair of cost coefficients and used the resulting hyperparameters for all other pairs. However, the hyperparameters obtained in this way did not work well for 46 of the pairs. In these cases the training did not converge, i.e., after 3 epochs of training the network generally predicted 0 (or 1) for every data instance, even in the training set, and the loss failed to decrease to an acceptable level. Thus, we also tuned the hyperparameters for a second pair of cost coefficients and used them for these 46 cases. Both sets of hyperparameters are given in Table 1; we used the first set for 72 of the 118 combinations of cost coefficients and the second set for the remaining 46 combinations. Additional hyperparameter tuning would further strengthen the dominance of the DNN approach.
Table 1. Learning rate (lr) and other DNN hyperparameters used for each network.

Network                    lr
Serial                     0.001     0.0005     0.0001
Distribution               0.0005    0.001      0.0005
OWMR                       0.001     0.0005     0.0005
Complex I                  0.05      0.000005   0.000005
Complex II (first set)     0.05      0.05       0.05
Complex II (second set)    0.005     0.005      0.005
Complex network II is the most complex network among all those we analyzed, since it is a non-tree network with multiple retailers. As Figure 16 shows, WDNN performs worse than the naive approaches for a few values of the weight, which shows the difficulty of the problem and the need to tune the network's hyperparameters for each set of cost coefficients.
4.6 Results: Comparison
To gain more insight, the average accuracy of each algorithm on each of the networks is presented in Table 2. The average is taken over all instances of a given network type, i.e., over all cost parameters. In the column headers, N1, N2, and N3 stand for the Naive1, Naive2, and Naive3 algorithms, respectively. The corresponding hyperparameters used to obtain these results are presented in Table 1.
DNN provides the best accuracy among the algorithms. WDNN is equally good for the serial and OWMR networks and slightly worse for the distribution and complex II networks. The difference is larger for complex I; this is a result of the fact that we did not retune the DNN network for each value of the cost parameters, as discussed in Section 4.4. We conclude that DNN is the method of choice if the user wants to ensure high accuracy, while WDNN is useful if the user wants to control the balance between false-positive and false-negative errors.
The second-to-last column shows the number of cost-parameter settings in which one of Naive3's predictions has both fewer false-positive and fewer false-negative errors than at least one of the predictions of Naive1. This happens often for some networks, since the simulated demand data are normally distributed and Naive3 happens to assume a normal distribution. We would expect Naive3 to perform worse if the simulated data came from a different distribution.
The last column shows a similar comparison for the Naive3 and WDNN algorithms. In particular, Naive3 never dominates WDNN in this way.
| Network | N1 | N2 | N3 | WDNN | DNN | N3 dom. N1 | N3 dom. WDNN |
|---|---|---|---|---|---|---|---|
| Serial | 0.94 | 0.97 | 0.95 | 0.99 | 0.99 | 0 | 0 |
| Distribution | 0.91 | 0.93 | 0.99 | 0.95 | 0.95 | 91 | 0 |
| OWMR | 0.91 | 0.93 | 0.91 | 0.95 | 0.98 | 19 | 0 |
| Complex I | 0.86 | 0.94 | 0.92 | 0.91 | 0.97 | 22 | 0 |
| Complex II | 0.86 | 0.94 | 0.92 | 0.94 | 0.97 | 0 | 0 |
4.7 Extended Results
In this section we present results for several extensions of our original model and analysis. In Section 4.7.1, we examine the ability of the algorithms to predict whether the inventory level will fall below a given threshold that is not necessarily 0. In Section 4.7.2, we apply our method to problems with dependent demands. Finally, in Section 4.7.3, we explore multiple-period-ahead predictions.
4.7.1 Threshold Prediction
The models discussed above aim to predict whether a stockout will occur; that is, whether the inventory level will fall below 0. However, it is often desirable for inventory managers to have more complete knowledge about inventory levels; in particular, we would like to be able to predict whether the inventory level will fall below a given threshold that is not necessarily 0. In order to see how well our proposed algorithms perform at this task, in this section we provide results for the case in which we aim to predict whether the inventory level will fall below 10.
A similar procedure is used to obtain the results for all algorithms. In particular, we changed the way that the data labels are applied so that we assign a label of 1 when the inventory level falls below the threshold (here, 10) and a label of 0 otherwise. We exclude the results of the DNN and Naive2 algorithms, since they are dominated by the WDNN and Naive3 algorithms. Figures 17–21 present the results for the serial, OWMR, distribution, complex I, and complex II networks. As before, WDNN outperforms the naive algorithms. Table 3 provides the overall accuracy of all algorithms and the comparisons among them; the columns are the same as those in Table 2. As before, WDNN performs better than or equal to the other algorithms for all networks. The accuracy figures for this case are provided in Appendix E.
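The relabeling step described above can be sketched in a few lines; the function name and the list-based representation are our own:

```python
def label_threshold(inventory_levels, threshold=10):
    """Assign label 1 when the ending inventory level falls below
    the given threshold, and 0 otherwise. With threshold=0 this
    reduces to the original stockout labeling."""
    return [1 if il < threshold else 0 for il in inventory_levels]
```

Setting `threshold=0` recovers the original stockout labels, so the same training pipeline can be reused unchanged for threshold prediction.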
| Network | N1 | N3 | WDNN | N3 dom. N1 | N3 dom. WDNN |
|---|---|---|---|---|---|
| Serial | 0.88 | 0.96 | 0.99 | 0 | 0 |
| Distribution | 0.90 | 0.92 | 0.93 | 89 | 0 |
| OWMR | 0.91 | 0.92 | 0.96 | 97 | 0 |
| Complex I | 0.85 | 0.87 | 0.97 | 13 | 0 |
| Complex II | 0.82 | 0.87 | 0.96 | 0 | 0 |
4.7.2 Multi-Item Dependent-Demand Multi-Echelon Problem
The data sets we have used so far assume that the demands are statistically independent. However, in the real world, demands for multiple items are often dependent on each other. Moreover, this dependence provides additional information to the DNN and might help it produce more accurate stockout predictions. To analyze this, we generated data for seven items with dependent demands, some positively and some negatively correlated. The mean demand of the seven items for the seven days of the week is shown in Figure 22. For more details, see Appendix D, which provides the demand mean and standard deviation for each item on each day.
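As a hedged illustration of how such correlated demands can be generated, the sketch below draws demand pairs for two items via a linear combination of standard normals; the function name, parameters, and truncation rule are our own assumptions, not the paper's exact generator:

```python
import random

def correlated_demands(mu1, sigma1, mu2, sigma2, rho, n, seed=0):
    """Generate n demand pairs for two items whose demands are
    correlated with coefficient rho (negative rho gives negatively
    correlated demands). Demands are rounded and truncated at zero,
    since demands are nonnegative integers."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Standard construction: z2 is standard normal with corr(z1, z2) = rho.
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        d1 = max(0, round(mu1 + sigma1 * z1))
        d2 = max(0, round(mu2 + sigma2 * z2))
        pairs.append((d1, d2))
    return pairs
```

The same construction extends to seven items by replacing the two-variable step with a Cholesky factor of the desired 7x7 correlation matrix.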
We tested this approach using the distribution network (Figure 10). Figure 23 plots the false-negative errors vs. the false-positive errors for each approach, for a range of parameter values for the naive approaches and a range of weights for the weighted DNN approach. WDNN attains an average accuracy of 99% for this system, compared to 95% in the independent-demand case, which shows how the DNN is able to make more accurate predictions by exploiting the demand dependence it learns. Finally, Figure 24 plots the errors vs. the accuracy of the predictions. DNN and WDNN provide much more accurate predictions than the naive methods.
4.7.3 Multi-Period Prediction
To see how well our algorithm can make stockout predictions multiple periods ahead, we revised the DNN structure so that it has $m$ output values, where $m$ is the number of prediction periods. We tested this approach using the distribution network (Figure 10).
We tested the algorithm on three different problems. The first predicts stockouts for each of the next two days; the second and third do the same for the next three and seven days, respectively. The accuracy of the predictions for each day is plotted in Figure 25. For example, the blue curve shows the accuracy of the predictions made for each of the next 3 days when we predict over a horizon of 3 days. The one-day prediction accuracy is plotted as a reference.
Not surprisingly, it is harder to predict stockouts multiple days in advance. For example, the accuracy for days 4–7 is below 90% when predicting 7 days ahead. Moreover, when predicting over a longer horizon, the predictions for earlier days are also less accurate. For example, the accuracy for predictions 2 days ahead is roughly 99% if we use a 2-day horizon, 95% if we use a 3-day horizon, and 94% if we use a 7-day horizon. Therefore, if we wish to make predictions for each of the next $m$ days, it is more accurate (though slower) to run $m$ separate single-period DNN models rather than a single model that predicts all $m$ days at once.
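The revised labeling for multi-period prediction can be sketched as follows; the function name and the convention that entry h of the label vector refers to a prediction h+1 periods ahead are our own assumptions:

```python
def multi_period_labels(inventory_levels, m):
    """For each period t, build an m-dimensional binary label vector
    whose h-th entry is 1 if a stockout (inventory level below 0)
    occurs h+1 periods ahead. Periods near the end of the series,
    which lack m future observations, are skipped."""
    labels = []
    for t in range(len(inventory_levels) - m):
        labels.append([1 if inventory_levels[t + 1 + h] < 0 else 0
                       for h in range(m)])
    return labels
```

A DNN with m output nodes is then trained against these vectors, one sigmoid output per future period; with m = 1 this reduces to the single-period setup used in the earlier experiments.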
5 Conclusion and Future Work
We studied stockout prediction in multi-echelon supply chain networks. For single-node networks, classical inventory theory provides tools for making such predictions when the demand distribution is known; however, no existing algorithm predicts stockouts in multi-echelon networks. To address this gap, we proposed an algorithm based on deep learning. We also introduced three naive algorithms to serve as benchmarks for stockout prediction. None of the algorithms requires knowledge of the demand distribution; they use only historical data.
Extensive numerical experiments show that the DNN algorithm performs well compared to the three naive algorithms. The results suggest that our method holds significant promise for predicting stockouts in complex, multi-echelon supply chains. It attains an average accuracy of 99% on serial networks and 95% on OWMR and distribution networks. Even for complex, non-tree networks, it attains an average accuracy of at least 91%. It also performs well when predicting inventory levels below a given nonzero threshold, when the demands are correlated, and when making predictions multiple periods ahead.
Several research directions are now evident, including extending the current approach to handle other types of uncertainty, e.g., lead times, supply disruptions, etc. Improving the model's ability to make accurate predictions more than one period ahead is another interesting direction: our current model appears able to make accurate predictions up to roughly 3 periods ahead, but its accuracy degrades quickly after that. Finally, the model can be extended to take into account other supply chain state variables in addition to current inventory and in-transit levels.
6 Acknowledgment
This research was supported in part by NSF grant #CMMI1663256. This support is gratefully acknowledged.
References
 Cannon [2011] Alex J Cannon. Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers & Geosciences, 37(9):1277–1284, 2011.
 Clark and Scarf [1960] Andrew J Clark and Herbert Scarf. Optimal policies for a multiechelon inventory problem. Management Science, 6(4):475–490, 1960.
 Efendigil et al. [2009] Tuğba Efendigil, Semih Önüt, and Cengiz Kahraman. A decision support system for demand forecasting with artificial neural networks and neurofuzzy models: A comparative analysis. Expert Systems with Applications, 36(3):6697–6707, 2009.
 Eruguz et al. [2016] Ayse Sena Eruguz, Evren Sahin, Zied Jemai, and Yves Dallery. A comprehensive survey of guaranteedservice models for multiechelon inventory optimization. International Journal of Production Economics, 172:110–125, 2016.
 Gartner Inc. [2011] Gartner Inc. Improving onshelf availability for retail supply chains requires the balance of process and technology, gartner group. https://www.gartner.com/doc/1701615/improvingonshelfavailabilityretailsupply, 2011. Accessed: 20160804.
 Goodfellow et al. [2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
 Graves [1988] Stephen C. Graves. Safety stocks in manufacturing systems. Journal of Manufacturing and Operations Management, 1:67–101, 1988.
 Graves and Willems [2000] Stephen C. Graves and Sean P. Willems. Optimizing strategic safety stock placement in supply chains. Manufacturing and Service Operations Management, 2(1):68–83, 2000.
 Gruen et al. [2002] Thomas W Gruen, Daniel S Corsten, and Sundar Bharadwaj. Retail outofstocks: A worldwide examination of extent, causes and consumer responses. Grocery Manufacturers of America Washington, DC, 2002.
 Jia et al. [2014] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
 Ko et al. [2010] Mark Ko, Ashutosh Tiwari, and Jörn Mehnen. A review of soft computing applications in supply chain management. Applied Soft Computing, 10(3):661–674, 2010.
 Kourentzes and Crone [2010] Nikolaos Kourentzes and Sven Crone. Advances in forecasting with artificial neural networks. 2010.
 Magnanti et al. [2006] Thomas L. Magnanti, ZuoJun Max Shen, Jia Shu, David SimchiLevi, and ChungPiaw Teo. Inventory placement in acyclic supply chain networks. Operations Research Letters, 34:228–238, 2006.
 Oroojlooyjadid et al. [2016] Afshin Oroojlooyjadid, Lawrence Snyder, and Martin Takáč. Applying deep learning to the newsvendor problem. http://arxiv.org/abs/1607.02177, 2016.
 Rosling [1989] Kaj Rosling. Optimal inventory policies for assembly systems under random demands. Operations Research, 37(4):565–579, 1989.
 SimchiLevi and Zhao [2012] David SimchiLevi and Yao Zhao. Performance evaluation of stochastic multiechelon inventory systems: A survey. Advances in Operations Research, vol. 2012, 2012.
 Snyder and Shen [2018] Lawrence V Snyder and ZuoJun Max Shen. Fundamentals of Supply Chain Theory. John Wiley & Sons, 2nd edition, 2018.
 Taylor [2000] James W Taylor. A quantile regression neural network approach to estimating the conditional density of multiperiod returns. Journal of Forecasting, 19(4):299–311, 2000.
 Vieira [2015] Armando Vieira. Predicting online user behaviour using deep learning algorithms. Computing Research Repository  arXiv.org, abs/1511.06247, 2015. URL http://arxiv.org/abs/1511.06247.
 Xu et al. [2016] Qifa Xu, Xi Liu, Cuixia Jiang, and Keming Yu. Quantile autoregression neural network model with applications to evaluating value at risk. Applied Soft Computing, 49:1–12, 2016.
 Zipkin [2000] Paul H. Zipkin. Foundations of Inventory Management. McGrawHill, Irwin, 2000.
Appendix A Stockout Prediction for a Single-Stage Supply Chain Network
Consider a single-stage supply chain network. The goal is to obtain the stockout probability and, as a result, make a stockout prediction; i.e., we want to obtain the probability

$P(IL_{t+L} < 0),$

where $IL_{t+L}$ is the ending inventory level in period $t+L$. Classical inventory theory (see, e.g., Snyder and Shen [2018], Zipkin [2000]) tells us that

$IL_{t+L} = IP_t - D_t,$

where $L$ is the lead time, $IP_t$ is the inventory position (inventory level plus on-order inventory) after placing a replenishment order in period $t$, and $D_t$ is the lead-time demand, i.e., the total demand in periods $t+1$ through $t+L$. Since we know $IP_t$ and we know the probability distribution of $D_t$, we can determine the probability distribution of $IL_{t+L}$ and use this to calculate $P(IL_{t+L} < 0)$. Then we can predict a stockout if this probability is larger than $\alpha$, for some desired threshold $\alpha$.
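For example, if the lead-time demand is normally distributed with mean mu_L and standard deviation sigma_L, the calculation can be sketched as follows (the function names and the normal-demand assumption are ours):

```python
import math

def stockout_probability(ip, mu_L, sigma_L):
    """P(IL_{t+L} < 0) = P(D_t > IP_t) for normally distributed
    lead-time demand with mean mu_L and std dev sigma_L."""
    z = (ip - mu_L) / sigma_L
    # 1 - Phi(z), using the standard normal CDF via math.erf.
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predict_stockout(ip, mu_L, sigma_L, alpha=0.5):
    """Predict a stockout when the probability exceeds the threshold alpha."""
    return stockout_probability(ip, mu_L, sigma_L) > alpha
```

This is exactly the single-stage calculation the classical theory supports; the difficulty addressed in the paper is that no such closed form exists for multi-echelon networks.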
Appendix B Gradient of Weighted Softmax Function
The gradient of the weighted softmax loss function (7) is:
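As a hedged reconstruction, if (7) is the usual class-weighted cross-entropy $L = -w_y \log p_y$ applied to softmax outputs $p_j = e^{z_j} / \sum_k e^{z_k}$ (an assumption on our part, since (7) appears earlier in the paper), the standard derivation gives:

```latex
\frac{\partial L}{\partial z_j}
  = -w_y \frac{\partial \log p_y}{\partial z_j}
  = w_y \left( p_j - \mathbb{1}\{j = y\} \right),
```

i.e., the ordinary softmax cross-entropy gradient scaled by the class weight $w_y$.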
Appendix C Activation and Loss Functions
The most common loss functions are the hinge (8), logistic (9), and Euclidean (10) loss functions, given (respectively) by:

$\ell(y_i, \hat{y}_i) = \max\{0,\ 1 - y_i \hat{y}_i\}$  (8)

$\ell(y_i, \hat{y}_i) = \log\left(1 + e^{-y_i \hat{y}_i}\right)$  (9)

$\ell(y_i, \hat{y}_i) = \tfrac{1}{2}(y_i - \hat{y}_i)^2$  (10)

where $y_i$ is the observed value of sample $i$, and $\hat{y}_i$ is the output of the DNN. The hinge loss function is appropriate for classification. The logistic loss function is also used for classification; however, it is a smooth convex function that is easier to optimize than the hinge function. The Euclidean loss function minimizes the squared difference between the observed and predicted values and penalizes close predictions much less than distant ones.
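These three losses can be implemented directly; a minimal sketch, assuming the labels y are in {-1, +1} as the hinge and logistic forms require:

```python
import math

def hinge_loss(y, y_hat):
    """Hinge loss; labels y are in {-1, +1}."""
    return max(0.0, 1.0 - y * y_hat)

def logistic_loss(y, y_hat):
    """Logistic loss; labels y are in {-1, +1}."""
    return math.log(1.0 + math.exp(-y * y_hat))

def euclidean_loss(y, y_hat):
    """Squared (Euclidean) loss, one half the squared error."""
    return 0.5 * (y - y_hat) ** 2
```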
Each node of the DNN has an activation function. The most commonly used activation functions are the sigmoid, tanh, and inner-product functions, given (respectively) by:

$f(x) = \dfrac{1}{1 + e^{-x}}$  (11)

$f(x) = \tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$  (12)

$f(x) = w^{\top} x + b$  (13)

where $w$ and $b$ in (13) are the node's weights and bias.
Appendix D Dependent Demand Data Generation
This section provides the details of data generation for dependent demands. In the dependent-demand case, there are seven items, and the mean demand of each item differs across the days of the week. Tables 4 and 5 provide the mean and standard deviation of the normal demand distribution for each item on each day of the week.
| Item | Mon | Tue | Wed | Thu | Fri | Sat | Sun |
|---|---|---|---|---|---|---|---|
| 1 | 12 | 10 | 9 | 11 | 14 | 9 | 11 |
| 2 | 14 | 12 | 11 | 9 | 16 | 7 | 9 |
| 3 | 8 | 7 | 6 | 14 | 10 | 13 | 14 |
| 4 | 7 | 6 | 5 | 15 | 9 | 14 | 15 |
| 5 | 6 | 5 | 4 | 16 | 7 | 15 | 16 |
| 6 | 8 | 7 | 6 | 14 | 10 | 13 | 13 |
| 7 | 10 | 9 | 8 | 12 | 12 | 11 | 12 |
| Item | Mon | Tue | Wed | Thu | Fri | Sat | Sun |
|---|---|---|---|---|---|---|---|
| 1 | 3 | 2 | 4 | 1 | 2 | 3 | 2 |
| 2 | 4 | 3 | 4 | 1 | 3 | 2 | 1 |
| 3 | 1 | 1 | 2 | 2 | 2 | 4 | 3 |
| 4 | 1 | 1 | 1 | 3 | 1 | 4 | 3 |
| 5 | 1 | 1 | 1 | 2 | 1 | 3 | 3 |
| 6 | 2 | 1 | 1 | 3 | 1 | 3 | 3 |
| 7 | 3 | 2 | 4 | 1 | 2 | 3 | 2 |