Stock-out Prediction in Multi-echelon Networks

Afshin Oroojlooyjadid
Lehigh University
Bethlehem, PA 18015
Lawrence Snyder
Lehigh University
Bethlehem, PA 18015
Martin Takáč
Lehigh University
Bethlehem, PA 18015

In multi-echelon inventory systems the performance of a given node is affected by events that occur at many other nodes and in many other time periods. For example, a supply disruption upstream will affect downstream, customer-facing nodes several periods later as the disruption "cascades" through the system. There is very little research on stock-out prediction in single-echelon systems and (to the best of our knowledge) none on multi-echelon systems. However, in the real world, it is clear that there is significant interest in techniques for this sort of stock-out prediction. Therefore, our research aims to fill this gap by using deep neural networks (DNN) to predict stock-outs in multi-echelon supply chains.



1 Introduction

A multi-echelon network is a chain of nodes that aims to provide a product or service to its customers. Each network consists of production and assembly lines, warehouses, transportation systems, retail processes, etc., and each node is connected to at least one other node. The most downstream nodes of the network face the customers, who usually present an external stochastic demand. The most upstream nodes interact with third-party vendors, which offer an unlimited source of raw materials and goods. An example of a multi-echelon network is shown in Figure 1, which depicts a distribution network, e.g., a retail supply chain.

Figure 1: A multi-echelon network with 10 nodes

The supply chain manager’s goal is to find a compromise between profit and the service level provided to customers (i.e., a number between zero and one that gives the fraction of customer orders that are satisfied on time). For example, a retail network may decide to change the number of retail stores to increase its service availability and create more sales, which also results in a higher cost for the system. In this case, the relevant decisions are how many stores to open or close, where, and when, in order to maximize the profit. Facility location and network design are the common mathematical programming problems used to answer those questions optimally. Similarly, the questions in production and inventory systems are where, when, how, and how much of which item to produce or order. Scheduling and capacity management are common problems in this area. Also, distribution systems must decide when, where, how, and how much of which item should be moved. The transportation problem is the most famous problem that answers these questions. In well-run companies, there are multiple systems that optimize those problems to provide the best possible balance between service level and profit. In this paper, we focus on inventory management systems and provide an algorithm that answers some of these questions in an environment with stochastic demand.

Balancing between the service level and profit in an inventory system is equivalent to balancing the stock-out level and holding safety stock. Stock-outs are expensive and common in supply chains. For example, distribution systems face stock-out for non-promoted items and for promoted items (Gartner Inc., 2011). Stock-outs result in significant lost revenue for the supply chain. When a company faces a stock-out, roughly 70% of customers do not wait for inventory to be replenished, but instead, purchase the items from a competitor (Gruen et al., 2002). Thus, in order to not lose customers and maximize profit, companies should have an inventory management system to provide high service level at a small cost.

Supply chains have different tools to balance the service level against stock-out costs, and all of those tools use some kind of optimization or decision-making procedure. For example, some companies that produce very large products, such as ships, cannot hold inventory and have to balance their service level against their service costs. Others can hold inventory, and in this case the optimization problem finds a compromise between the holding and stock-out costs. Usually these models assume a given service level and minimize the corresponding costs. As mentioned, the other relevant questions of an inventory management system are when, where, and how much of each item should be ordered, moved, stored, or transported. These questions can be answered optimally by finding the optimal inventory policy for each node of the network.

One category of models for multi-echelon inventory optimization is called the Stochastic Service Model (SSM) approach, which considers stochastic demand and stochastic lead times due to upstream stockouts. The optimal base-stock level can be found for serial systems without fixed costs by solving a sequence of single-variable convex problems (Clark and Scarf, 1960). Similarly, by converting an assembly system (in which each node has at most one successor) to an equivalent serial system, the optimal solution can be achieved (Rosling, 1989). For more general network topologies, no efficient algorithm exists for finding optimal base-stock levels, and in some cases the form of the optimal inventory policy is not even known (Zipkin, 2000).

Another approach for dealing with multi-echelon problems is the Guaranteed Service Model (GSM) approach. GSM assumes the demand is bounded above, or equivalently the excess demand can be satisfied from outside of the system, e.g., by a third party vendor. It assumes a Committed Service Time (CST) for each node, which is the latest time that the node will satisfy the demand of its successor nodes. By this definition, instead of optimizing the inventory level, the GSM model optimizes the CST for each node, or equivalently it finds the base stock for each node of the network to minimize the holding costs. This approach can handle more general supply chain topologies, typically using either dynamic programming (Graves, 1988; Graves and Willems, 2000) or MIP techniques (Magnanti et al., 2006).

For a review of GSM and SSM models, see Eruguz et al. (2016), Simchi-Levi and Zhao (2012), and Snyder and Shen (2018).

The sense among (at least some) supply chain practitioners is that the current set of inventory optimization models is sufficient to optimize most systems as they function normally. What keeps these practitioners up at night is the deviations from “normal” that occur on a daily basis and that pull the system away from its steady state. In other words, there is less need for new inventory optimization models and more need for tools that can help when the real system deviates from the practitioners’ original assumptions.

Our algorithm takes a snapshot of the supply chain at a given point in time and makes predictions about how individual components of the supply chain will perform, i.e., whether they will face stock-outs in the near future. We assume an SSM-type system, i.e., a system in which demands follow a known probability distribution, and stages within the supply chain may experience stock-outs, thus generating stochastic lead times to their downstream stages. The stages may follow any arbitrary inventory policy, e.g., base-stock or . Classical inventory theory can provide long-term statistics about stock-out probabilities and levels (see, e.g., Snyder and Shen (2018); Zipkin (2000)), at least for certain network topologies and inventory policies. However, this theory does not make predictions about specific points in time at which a stock-out may occur. Since stock-outs are expensive, such predictions can be very valuable to companies so that they may take measures to prevent or mitigate impending stock-outs.

Note that systems whose base-stock levels were optimized using the GSM approach may also face stock-outs, even though the GSM model itself assumes they do not. The GSM approach assumes a bound on the demand value; when the real-world demand exceeds that bound, it may not be possible or desirable to satisfy the excess demand externally, as the GSM model assumes, so stock-outs may occur in these systems. Therefore, stock-out prediction can be useful for users of both the SSM and GSM approaches.

In a single-node network, one can obtain the stock-out probability and make stock-out predictions if the probability distribution of the demand is known (see Appendix A). However, to the best of our knowledge, there are no algorithms to provide stock-out predictions in multi-echelon networks. To address this need, in this paper, we propose an algorithm to provide stock-out predictions for each node of a multi-echelon network, which works for any network topology (as long as it contains no directed cycles) and any inventory policy.

The remainder of the paper is organized as follows. In Section 2, we introduce our algorithm. Section 3 describes three naive algorithms to predict stock-outs. To demonstrate the quality of the solutions produced by the proposed algorithm, we compare our results with those of the best naive algorithms in Section 4. Finally, Section 5 concludes the paper and proposes future studies.

2 Stock-out Prediction Algorithm

We develop an approach to provide stock-out predictions for multi-echelon networks with available data features. Our algorithm is based on deep learning, or a deep neural network (DNN). DNN is a non-parametric machine learning algorithm, meaning that it does not make strong assumptions about the functional relationship between the input and output variables. In the area of supply chain, DNN has been applied to demand prediction (Efendigil et al., 2009; Vieira, 2015; Ko et al., 2010) and quantile regression (Taylor, 2000; Kourentzes and Crone, 2010; Cannon, 2011; Xu et al., 2016). It has also been successfully applied to the newsvendor problem with data features (Oroojlooyjadid et al., 2016). The basics of deep learning are available in Goodfellow et al. (2016).

Consider a multi-echelon supply chain network with nodes, with arbitrary topology. For each node of the network, we know the history of the inventory level (IL), i.e., the on-hand inventory minus backorders, and of the inventory-in-transit (IT), i.e., the items that have been shipped to the node but have not yet arrived; the values of these quantities in period are denoted and , respectively. In addition, we know the stock-out status for the node, given as a True or False Boolean, where True indicates that the node experienced a stock-out. (We use 1 and 0 interchangeably with True and False.) The historical stock-out information is not used to make predictions at time but is used to train the model. The demand distribution can be known or unknown; in either case, we assume historical demand information is available. The goal is to provide a stock-out prediction for each node of the network for the next period.

The available information that can be provided as input to the DNN algorithm includes the values of the available features (e.g., day of week, month of year, weather information), along with the historical observations of IL and IT at each node. Therefore, the available information for node at time can be written as:


where denotes the value of the features at time .

However, DNN algorithms are designed for inputs whose size is fixed; in contrast, the vector in (1) changes size at every time step. Therefore, we only consider historical information from the most recent periods instead of the full history. Although this omits some potentially useful information from the network, it unifies and reduces the input size, which has computational advantages, and selecting a large enough provides a good level of information about the system. Therefore, the input of the DNN is:


The output of the DNN is the stock-out prediction for time , for each node of the network, denoted , a vector of length . Each of the , , equals 1 if the node in period has stock-out and 0 otherwise.
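The fixed-size input described above can be assembled as in the following minimal sketch. The function name `build_input` and the arguments `features`, `il_history`, `it_history`, and `k` are illustrative assumptions, not the authors' notation or code; the sketch only shows how keeping a fixed window of recent periods makes the input length independent of the current time.

```python
from typing import List, Sequence


def build_input(features: Sequence[float],
                il_history: Sequence[Sequence[float]],
                it_history: Sequence[Sequence[float]],
                k: int) -> List[float]:
    """Concatenate the current features with the last k IL/IT values of every node.

    il_history[j] and it_history[j] hold the per-period history of node j,
    oldest period first. All names here are hypothetical.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    x = list(features)
    for il, it in zip(il_history, it_history):
        if len(il) < k or len(it) < k:
            raise ValueError("each node needs at least k periods of history")
        # Keep only the k most recent periods, interleaving IL and IT.
        for level, transit in zip(il[-k:], it[-k:]):
            x.extend([level, transit])
    return x


# Two nodes, five periods of history, a window of k = 3 periods, two features.
x = build_input(features=[1.0, 0.0],
                il_history=[[5, 4, 3, 2, 1], [9, 8, 7, 6, 5]],
                it_history=[[0, 1, 0, 1, 0], [2, 2, 2, 2, 2]],
                k=3)
print(len(x))  # 2 features + 2 nodes * 3 periods * 2 quantities = 14
```

The input length depends only on the number of features, the number of nodes, and the window size, so it stays constant as more history accumulates.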

A DNN is a network of nodes, beginning with an input layer (representing the inputs, i.e., (2)), ending with an output layer (representing the vector), and one or more layers in between. Each node uses a mathematical function, called an activation function, to transform the inputs it receives into outputs that it sends to the next layer, with the ultimate goal of approximating the relationship between the overall inputs and outputs. In a fully connected network, each node of each layer is connected to each node of the next layer through some coefficients, called weights, which are initialized randomly. “Training” the network consists of determining good values for those weights, typically using nonlinear optimization methods. (A more thorough explanation of DNN is outside the scope of this paper; see, e.g., Goodfellow et al. (2016).)
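As a rough illustration of the layer-by-layer computation described above, the sketch below propagates an input through a small fully connected network with sigmoid activations and randomly initialized weights. It is a toy under stated assumptions (no bias terms, uniform initialization in a small range, hypothetical names `init_layer` and `forward`), not the network used in the experiments.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random initial weights; row i holds the weights leaving input node i."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def forward(x: List[float], layers: List[Matrix]) -> List[float]:
    """Propagate x through fully connected sigmoid layers (biases omitted)."""
    a = x
    for w in layers:
        # Each node of the next layer applies the activation to a weighted sum.
        a = [sigmoid(sum(a[i] * w[i][j] for i in range(len(a))))
             for j in range(len(w[0]))]
    return a


rng = random.Random(0)
layers = [init_layer(4, 3, rng), init_layer(3, 2, rng)]
out = forward([0.5, -1.0, 2.0, 0.0], layers)
print(out)  # two activations, each strictly between 0 and 1
```

Training would then adjust the entries of `layers` so that the outputs match the observed stock-out labels.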

A loss function is used to evaluate the quality of a given set of weights. The loss function measures the distance between the predicted values and the known values of the outputs. We consider the following loss functions, which are commonly used for binary outputs such as ours:

  • Hinge loss function

  • Euclidean loss function

  • Soft-max loss function

The hinge and Euclidean loss functions are reviewed in Appendix C. The soft-max loss function uses the soft-max function, which is a generalization of logistic regression and is given by


where is the number of possible categories (in our case, ),

is the number of layers in the DNN network, is the activation value of node in layer , is the weight between node in layer and node in layer , and represents the number of nodes in layer . Then the soft-max loss function is given by


where is the total number of training samples, is the indicator function, and is the loss function value, which evaluates the quality of a given classification (i.e., prediction). In essence, the loss function (4) penalizes predictions that differ from the value given by the loss function (3).
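The following sketch shows the standard soft-max function and the usual soft-max (cross-entropy) loss, i.e., the average negative log-probability assigned to the true class. It assumes that each training sample comes with one raw score per class and an integer class label; the names `softmax` and `softmax_loss` are illustrative, and the normalization may differ in detail from the paper's equations.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Map raw class scores to a probability distribution."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average negative log-probability of the true class over all samples."""
    total = 0.0
    for z, y in zip(scores, labels):
        total -= math.log(softmax(z)[y])
    return total / len(labels)


probs = softmax([0.0, 0.0])  # equal scores give equal probabilities
loss = softmax_loss([[2.0, 0.0], [0.0, 2.0]], [0, 1])
print(probs, loss)
```

Confident correct predictions drive the loss toward zero, while confident wrong predictions are penalized heavily.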

The hinge and soft-max loss functions produce a score for each possible class (for soft-max, a probability distribution over the classes); we then take the argmax over the classes to choose the predicted class. In our case there are two classes, i.e., the True and False values required in the prediction procedure. On the other hand, the Euclidean loss function provides a continuous value, which must be converted to a binary output. In our case, we round the continuous output obtained with the Euclidean loss function to the nearest of 0 and 1.

Choosing weights for the neural network involves solving a nonlinear optimization problem whose objective function is the loss function and whose decision variables are the network weights. Therefore, we need gradients of the loss function with respect to the weights; these are usually obtained using back-propagation or automatic differentiation. The weights are then updated using a first- or second-order algorithm, such as gradient descent, stochastic gradient descent (SGD), SGD with momentum, or L-BFGS. Our procedure repeats iteratively until one of the following stopping criteria is met:

  • The loss function value is less than Tol

  • The number of passes over the training data reaches MaxEpoch

Tol and MaxEpoch are parameters of the algorithm; we use Tol and MaxEpoch
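The two stopping criteria can be sketched on a toy problem as follows. The example trains a one-variable logistic model with SGD with momentum and stops as soon as the loss falls below `tol` or the number of passes reaches `max_epoch`. The model, the data, the learning rate, the momentum, and the values passed for `tol` and `max_epoch` are all illustrative assumptions, not the settings used in the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_loss(data: Sequence[Sample], w: float, b: float) -> float:
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= math.log(max(p if y == 1 else 1.0 - p, 1e-12))
    return total / len(data)


def train(data: Sequence[Sample], tol: float, max_epoch: int,
          lr: float = 0.1, momentum: float = 0.9, seed: int = 0):
    """SGD with momentum; stops when loss < tol or after max_epoch passes."""
    rng = random.Random(seed)
    samples: List[Sample] = list(data)
    w = b = vw = vb = 0.0
    loss, epochs = float("inf"), 0
    while loss >= tol and epochs < max_epoch:
        rng.shuffle(samples)
        for x, y in samples:
            err = sigmoid(w * x + b) - y  # gradient of the loss w.r.t. the score
            vw = momentum * vw - lr * err * x
            vb = momentum * vb - lr * err
            w, b = w + vw, b + vb
        loss = mean_loss(samples, w, b)
        epochs += 1  # one epoch is one pass over the training data
    return w, b, loss, epochs


toy = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
_, _, loss_a, epochs_a = train(toy, tol=1.0, max_epoch=100)  # stops on tol
_, _, loss_b, epochs_b = train(toy, tol=0.0, max_epoch=3)    # stops on max_epoch
print(epochs_a, epochs_b)
```

With a loose tolerance the loop ends after the first pass, while a tolerance of zero can never be met, so the epoch limit ends the run.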

The loss function provides a measure for monitoring the improvement of the DNN algorithm through the iterations. However, it is not meaningful by itself and cannot be used to measure the quality of the predictions. Since the prediction output is a binary value, the test error, i.e., the number of wrong predictions divided by the number of samples, is an appropriate measure. Moreover, statistics on false positives (type I errors, the incorrect rejection of a true null hypothesis) and false negatives (type II errors, the failure to reject a false null hypothesis) are helpful, and we use them to gain more insight into how the algorithm works.
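These measures can be computed from the binary predictions and the observed stock-outs as in the short sketch below, where 1 denotes a stock-out. The function name `error_stats` and the dictionary keys are illustrative choices.

```python
from typing import Dict, Sequence


def error_stats(predicted: Sequence[int], actual: Sequence[int]) -> Dict[str, float]:
    """Count false positives/negatives and derive test error and accuracy."""
    # False positive: a stock-out was predicted but none occurred.
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    # False negative: a stock-out occurred but was not predicted.
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    n = len(actual)
    test_error = (fp + fn) / n
    return {"false_positives": fp, "false_negatives": fn,
            "test_error": test_error, "accuracy": 1.0 - test_error}


stats = error_stats(predicted=[1, 0, 1, 0, 0], actual=[1, 1, 0, 0, 0])
print(stats)
```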

The DNN algorithm provides one prediction, in which the false positive and negative errors are weighted equally. However, the modeler should be able to control the likelihood of a stock-out prediction, i.e., the balance between false positive and false negative errors. To this end, we would benefit from a loss function that can provide control over the likelihood of a stock-out prediction, since the DNN’s output is directly affected by its loss function.

The loss functions mentioned above do not have any weighting coefficient, and place equal weight between selecting 0 (predicting no stock-out) and 1 (predicting stock-out). To correct this, we propose weighing the loss function value that is incurred for each output, 0 and 1, using weights and , which represent the costs of false positive and negative errors, respectively. In this way, when , the DNN tries to have a smaller number of cases in which it returns False but in fact , so it predicts more stock-outs to result in a smaller number of false negative errors and a larger number of false positive errors. Similarly, when , the DNN predicts fewer stock-outs to avoid cases in which it returns True but in fact . Therefore, it makes a smaller number of false positive errors and a larger number of false negative errors. If , our revised loss function works similarly to the original loss functions.

Using this approach, the weighted hinge, Euclidean, and soft-max loss functions are as follows.







where , , and . Thus, these loss functions allow one to manage the number of false positive and negative errors.
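One way to realize the weighting idea for the soft-max loss is sketched below: the loss of each sample is multiplied by the cost of the error that a wrong prediction on that sample would cause. A sample whose true label is 1 (stock-out) can only produce a false negative, so it is weighted by the false-negative cost, and a sample whose true label is 0 is weighted by the false-positive cost. This is a hedged reading of the description above, with hypothetical names `cost_fp` and `cost_fn`; the exact form of the paper's weighted loss may differ.

```python
import math
from typing import Sequence


def weighted_softmax_loss(scores: Sequence[Sequence[float]],
                          labels: Sequence[int],
                          cost_fp: float,
                          cost_fn: float) -> float:
    """Class-weighted soft-max loss for binary labels (1 = stock-out)."""
    total = 0.0
    for z, y in zip(scores, labels):
        m = max(z)
        # Log of the soft-max normalizer, computed stably.
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        log_prob_true = z[y] - log_norm
        # Missing a stock-out costs cost_fn; a false alarm costs cost_fp.
        weight = cost_fn if y == 1 else cost_fp
        total -= weight * log_prob_true
    return total / len(labels)


# A single sample with a true stock-out that the scores favor missing.
base = weighted_softmax_loss([[2.0, 0.0]], [1], cost_fp=1.0, cost_fn=1.0)
heavy = weighted_softmax_loss([[2.0, 0.0]], [1], cost_fp=1.0, cost_fn=3.0)
print(base, heavy)
```

With equal costs the function reduces to the ordinary soft-max loss; raising the false-negative cost makes missed stock-outs proportionally more expensive during training.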

3 Naive Approaches

In this section, we propose three naive approaches to predict stock-outs. These algorithms are used as baselines for measuring the quality of the DNN algorithm. They are easy to implement, but they do not consider the system state at any nodes other than the node for which we are predicting stockouts. (The proposed DNN approach, in contrast, uses the state at all nodes to provide a more effective prediction.)

In all of the naive algorithms, we use to denote the inventory position in period . Also, and are the numbers of the training and testing records, respectively, and is the demand of the customers in each period of the training set. Finally, the function approximator() takes a list of numbers, fits a normal distribution to it, and returns the corresponding parameters of the normal distribution.

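procedure Naive-1
    quantile level given as an input;
    build the list of inventory positions in the periods that immediately precede a stock-out in the training data;    ▷ Training procedure
    (mean, standard deviation) = approximator(list);
    threshold = the quantile of the fitted normal distribution at the given level;
    for each testing period do    ▷ Testing procedure
        if the inventory position is less than the threshold then
            prediction = 1
        else
            prediction = 0
        end if
    end for
    return prediction
end procedure
Algorithm 1 Naive Algorithm 1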

Naive Algorithm 1 first determines all periods in the training data in which a stock-out occurred and builds a list of the inventory positions in the preceding period for each. Then it fits a normal distribution to the values in and calculates the th quantile of that distribution, for a given value of . Finally, it predicts a stock-out in period if is less than that quantile. The value of is determined by the modeler.
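A minimal Python sketch of this procedure is given below. It assumes that the inventory positions and the 0/1 stock-out flags of the training data are aligned by period, and it uses `statistics.NormalDist` in place of the approximator() function; the name `naive_1` and the argument `alpha` (the quantile level) are illustrative.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Sequence


def naive_1(train_ip: Sequence[float],
            train_stockout: Sequence[int],
            test_ip: Sequence[float],
            alpha: float) -> List[int]:
    """Predict a stock-out when the inventory position is below a fitted quantile."""
    # Inventory positions in the period just before each observed stock-out.
    preceding = [train_ip[t - 1]
                 for t in range(1, len(train_ip))
                 if train_stockout[t] == 1]
    # Fit a normal distribution (needs at least two distinct values).
    dist = NormalDist(mean(preceding), stdev(preceding))
    threshold = dist.inv_cdf(alpha)
    return [1 if ip < threshold else 0 for ip in test_ip]


pred = naive_1(train_ip=[10, 4, 9, 2, 8, 3, 7],
               train_stockout=[0, 0, 1, 0, 1, 0, 1],
               test_ip=[1.0, 2.5, 5.0],
               alpha=0.5)
print(pred)  # [1, 1, 0]
```

In this toy data the positions preceding stock-outs are 4, 2, and 3, so the fitted median is 3 and only test positions below 3 trigger a prediction.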

Naive Algorithm 2 groups the inventory positions into a set of ranges, calculates the frequency of stock-outs in the training data for each range, and then predicts a stock-out in period if the range that falls into experienced stock-outs is times more than of the time in the training data.
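The binning idea can be sketched as follows. The sketch assumes equal-width intervals over the observed training range and, mirroring Naive Algorithm 1, pairs each inventory position with the stock-out status of the following period; both choices, as well as the names `naive_2`, `n_bins`, and `gamma` (the frequency threshold), are assumptions made for illustration.

```python
from typing import List, Sequence


def naive_2(train_ip: Sequence[float],
            train_stockout: Sequence[int],
            test_ip: Sequence[float],
            n_bins: int,
            gamma: float) -> List[int]:
    """Predict a stock-out when the bin's historical stock-out frequency exceeds gamma."""
    lo, hi = min(train_ip), max(train_ip)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def bin_of(ip: float) -> int:
        # Clamp so that out-of-range test values fall into the edge bins.
        return min(max(int((ip - lo) / width), 0), n_bins - 1)

    hits = [0] * n_bins    # stock-outs that followed a position in this bin
    counts = [0] * n_bins  # all observations in this bin
    for t in range(1, len(train_ip)):
        b = bin_of(train_ip[t - 1])
        counts[b] += 1
        hits[b] += train_stockout[t]
    freq = [h / c if c else 0.0 for h, c in zip(hits, counts)]
    return [1 if freq[bin_of(ip)] > gamma else 0 for ip in test_ip]


pred = naive_2(train_ip=[1, 9, 0, 10, 2, 8],
               train_stockout=[0, 1, 0, 1, 0, 1],
               test_ip=[1.5, 7.0],
               n_bins=2,
               gamma=0.5)
print(pred)  # [1, 0]
```

Here every low inventory position in the training data was followed by a stock-out and no high position was, so only the low test position triggers a prediction.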

Finally, Naive Algorithm 3 uses classical inventory theory, which says the inventory level in period equals minus the lead-time demand, where is the lead time (Zipkin, 2000; Snyder and Shen, 2018). The algorithm estimates the lead-time demand distribution by fitting a normal distribution based on the training data, then predicts a stockout in period if is less than or equal to the th quantile of the estimated lead-time demand distribution, where is a parameter chosen by the modeler.
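A possible implementation of this rule is sketched below. It assumes independent, identically distributed per-period demands, so that the lead-time demand over a given number of periods is approximated by a normal distribution whose mean and variance scale with the lead time; this scaling, the names `naive_3`, `lead_time`, and `beta` (the quantile level), and the use of `statistics.NormalDist` are assumptions rather than details taken from the paper.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import List, Sequence


def naive_3(train_demand: Sequence[float],
            test_ip: Sequence[float],
            lead_time: int,
            beta: float) -> List[int]:
    """Predict a stock-out when the inventory position is at or below a
    quantile of the estimated lead-time demand distribution."""
    mu, sigma = mean(train_demand), stdev(train_demand)
    # Sum of lead_time i.i.d. demands: mean scales linearly, std with sqrt.
    ltd = NormalDist(lead_time * mu, sqrt(lead_time) * sigma)
    threshold = ltd.inv_cdf(beta)
    return [1 if ip <= threshold else 0 for ip in test_ip]


pred = naive_3(train_demand=[8, 10, 12, 10],
               test_ip=[15.0, 30.0],
               lead_time=2,
               beta=0.5)
print(pred)  # [1, 0]
```

With a mean demand of 10 per period and a lead time of 2, the median lead-time demand is 20, so an inventory position of 15 is flagged and one of 30 is not.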

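procedure Naive-2
    initialize the stock-out and no-stock-out counters of every interval to zero;
    threshold given as an input;
    divide the range of inventory positions into equal intervals;
    for each training period do    ▷ Training procedure
        find the interval that contains the inventory position
        if a stock-out occurred then
            increment the stock-out counter of that interval
        else
            increment the no-stock-out counter of that interval
        end if
    end for
    for each testing period do    ▷ Testing procedure
        find the interval that contains the inventory position
        if the stock-out frequency recorded for that interval exceeds the threshold then
            prediction = 1
        else
            prediction = 0
        end if
    end for
    return prediction
end procedure
Algorithm 2 Naive Algorithm 2

procedure Naive-3
    (mean, standard deviation) = approximator(training demands);    ▷ Training procedure
    quantile level given as an input;
    threshold = the quantile of the estimated lead-time demand distribution at the given level;
    for each testing period do    ▷ Testing procedure
        if the inventory position is less than or equal to the threshold then
            prediction = 1
        else
            prediction = 0
        end if
    end for
    return prediction
end procedure
Algorithm 3 Naive Algorithm 3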

In Naive Algorithms 1 and 3, the value of (and hence ) and the value of in Naive Algorithm 2 is selected by the modeler. A small value of results in a small so that the algorithm predicts fewer stock-outs. The same is true for a small . Generally, as or decreases, the number of false positive errors decreases compared to the number of false negative errors, and vice versa. Thus, selecting an appropriate value of or is important and directly affects the output of the algorithm. Indeed, the value of or has to be selected according to the preferences of the company running the algorithm. For example, a company may have very expensive stock-outs. So, it may choose a very large or so that the algorithm predicts frequent stock-outs, along with many more false positive errors, and then checks them one by one to prevent the stock-outs. In this situation the number of false positive errors increases; however, the company faces fewer false negative errors, which are costly. In order to determine an appropriate value of or , the modeler should consider the costs of false positive and negative errors, i.e., and , respectively.

4 Numerical Experiments

In order to check the validity and accuracy of our algorithm, we conducted a series of numerical experiments. Since there is no publicly available data of the type needed for our algorithm, we built a simulation model that assumes each node follows a base-stock policy and can make an order only if its predecessor has enough stock to satisfy it so that only the retailer nodes face stock-outs. The simulation records several state variables for each of the nodes and for each of the time periods. Figure 2 shows the flowchart of the simulation algorithm used.

Figure 2: The simulation algorithm used to simulate a supply network

To see how our algorithm works with different network topologies, we conducted multiple tests on five supply chain network topologies, ranging from a simple series system to complex networks containing (undirected) cycles and little or no symmetry. These tests are intended to explore the robustness of the DNN approach on simple or very complex networks. The five supply chain networks we used are:

  • Serial network with 11 nodes.

  • One warehouse, multiple retailer (OWMR) network with 11 nodes.

  • Distribution network with 13 nodes.

  • Complex network I with 11 nodes, including one retailer and two warehouses.

  • Complex network II with 11 nodes, including three retailers and one node at the farthest echelon upstream (which we refer to as a warehouse).

We simulated each of the networks for periods, with 75% of the resulting data used for training (and validation) and the remaining 25% for testing. For all of the problems we used a fully connected DNN network with 350 and 150 sigmoid nodes in the first and second layers, respectively. The inputs are the inventory levels and on-order inventories for each node from each of the most recent periods (as given in (2)), and the output is the binary stock-out predictor for each of the nodes. Figure 3 shows a general view of the DNN network. Among the loss functions reviewed in Section 2, the soft-max loss function had the best accuracy in initial numerical experiments. Thus, the soft-max loss function was selected and its results are provided. To this end, we implemented the weighted soft-max function and its gradient (see Appendix B) in the DNN computation framework Caffe (Jia et al., 2014), and all of the tests were done on machines with 16 AMD cores and 32 GB of memory. In order to optimize the network, the SGD algorithm—with batches of 50—with momentum is used, and each problem is run with MaxEpoch=3. Each epoch defines one pass over the training data. Finally, we tested 99 values of and 118 values of , such that .

Figure 3: A network used to predict stock-outs of two nodes. For each of the networks, we used a similar network with soft-max outputs.

The DNN algorithm is scale dependent, meaning that the algorithm hyper-parameters (such as , learning rate, momentum, etc.; see Goodfellow et al. (2016)) are dependent on the values of and . Thus, a set of appropriate hyper-parameters of the DNN network for a given set of cost coefficients does not necessarily work well for another set . This means that, ideally, for each set of , we should re-tune the DNN hyper-parameters, i.e., re-train the network. However, the tuning procedure is computationally expensive, so in our experiments we tuned the hyper-parameters for and and used the resulting value for other sets of costs, in all network topologies. However, in complex network II, we did not get good convergence using this method, so we tuned the network for another set of cost coefficients to make sure that we get a not-diverging DNN for each set of coefficients. To summarize, our experiments use minimal tuning, except for complex network II—still far less than the total numbers of possible experiment—, but even so, the algorithm performs very well; however, better tuning could further improve our results.

In what follows, we demonstrate the results of the DNN and three naive algorithms in seven experiments. Sections 4.1 through 4.5 present the results of the serial, OWMR, distribution, complex I, and complex II networks, respectively. Section 4.7 extends these experiments: Section 4.7.1 provides threshold prediction, Section 4.7.2 analyzes the results of a distribution network with multiple items with dependent demand, and Section 4.7.3 shows the results of predicting stock-outs multiple periods ahead in a distribution network. In each of the network topologies, we plot the false positive vs. false negative errors for all algorithms to compare their performance. In addition, two other figures in each section show the accuracy vs. false positive and negative errors to provide better insights into the way that the DNN algorithm (weighted and unweighted) works compared to the naive algorithms.

4.1 Results: Serial Network

Figure 4 shows the serial network with 11 nodes. The training dataset is used to train all five algorithms and the corresponding results are shown in Figures 5 and 6.

Figure 4: The serial network

Figure 5 plots the false-negative errors vs. the false-positive errors for each approach and for a range of values for the naive approaches and a range of weights for the weighted DNN approach. Points closer to the origin indicate more desirable solutions. Since there is just one retailer, the algorithms each make stock-out predictions (one in each of the testing periods); therefore, the number of errors in both figures should be compared to .

The DNN approach always dominates the naive approaches, with the unweighted version providing slightly better accuracy but the weighted version providing more flexibility. For any given number of false-positive errors, the numbers of false-negative errors of the DNN and WDNN algorithms are smaller than those of the naive approaches, and similarly for a given number of false-negative errors. The results of the naive approaches are similar to each other, with Naive-1 and Naive-3 outperforming Naive-2 for most values. Similarly, Figure 6 plots the errors vs. the accuracy of the predictions and shows that for a given number of false positives or negatives, the DNN approaches attain a much higher level of accuracy than the naive approaches do. In conclusion, the naive algorithms perform similarly to each other and worse than DNN, since they do not use the available historical information. In contrast, DNN learns the relationship between state inputs and stock-outs and can predict stock-outs very well.

Figure 5: False positives vs. false negatives for the serial network
Figure 6: Accuracy of each algorithm for the serial network

4.2 Results: OWMR Network

Figure 7 shows the OWMR network with 11 nodes and Figures 8 and 9 present the experimental results for this network. Since there are 10 retailers, prediction is more challenging than for the serial network, as the algorithms each make stock-out predictions; the number of errors in both figures should be compared to .

Figure 7: The OWMR network

Figure 8 shows the false-negative errors vs. the false-positive errors for each approach and for a range of values for the naive approaches and a range of weights for the weighted DNN approach. DNN and weighted DNN dominate the naive approaches. The three naive approaches are similar to each other, with Naive-2 somewhat worse than the other two. Figure 9 plots the errors vs. the accuracy of the predictions and confirms that DNN can attain higher accuracy levels for the same number of errors than the naive approaches. It is also apparent that all methods are less accurate for the OWMR system than they are for the serial system since there are many more predictions to make. However, DNN still provides better accuracy compared to the naive approaches.

Figure 8: False positives vs. false negatives for the OWMR network
Figure 9: Accuracy of each algorithm for the OWMR network

4.3 Results: Distribution Network

Figure 10 shows the distribution network with 13 nodes, and Figure 11 provides the corresponding results of the five algorithms.

Figure 10: The distribution network

As Figure 11 shows, the DNN approach mostly dominates the naive approaches. However, it does not perform as well as it does for the serial or OWMR networks; this is due to the tuning of the DNN hyper-parameters. Among the three naive approaches, Naive-3 dominates Naive-1, since the demand data come from a normal distribution without any noise, and the algorithm also fits a normal distribution, which needs around 12 samples to get a good estimate of the mean and standard deviation. Therefore, the experiment is biased in favor of Naive-3. Plots of the errors vs. the accuracy of the predictions are similar to those in Figure 9; they are omitted to save space.

Figure 11: False positives vs. false negatives for the distribution network

Compared to the OWMR network, the distribution network includes fewer retailer nodes and therefore fewer stock-out predictions; however, the network is also more complex, and as a result the DNN is less accurate than it is for the OWMR network. This suggests that the accuracy of the DNN depends more on the number of echelons in the system than it does on the number of retailers. Nevertheless, DNN obtains greater accuracy than any of the naive approaches.

4.4 Results: Complex Network I

Figure 12 shows a complex network with two warehouses (i.e., two nodes at the farthest echelon upstream), and Figure 13 presents the corresponding results of the five algorithms.

Figure 12: The complex network, two warehouses

Figure 13 plots the false-negative errors vs. the false-positive errors for each approach and for a range of values for the naive approaches and a range of weights for the weighted DNN approach. The DNN approach dominates the naive approaches for most cases, but does worse when false-positives are tolerated in favor of reducing false-negatives. The average accuracy rates for this system are 91% for WDNN and 97% for DNN, which show the importance of hyper-parameter tuning for each weight of the weighted DNN approach. Tuning it for each weight individually would improve the results significantly (but increase the computation time). Plots of the errors vs. the accuracy of the predictions are similar to those in Figure 9; they are omitted to save space.

Figure 13: False positives vs. false negatives for complex network I

As in the serial network, there is just one retailer node; however, since the network is more complex, DNN produces less accurate predictions for complex network I than it does for the serial network, or for the other tree networks (OWMR and distribution). The added complexity of this network topology has an effect on the accuracy of our model, though the algorithm is still quite accurate.

4.5 Results: Complex Network II

Figure 14 shows the complex network with three retailers and Figure 15 presents the corresponding results of each algorithm.

Figure 14: The complex network, three retailers

Figure 15 plots the false-negative errors vs. the false-positive errors for each approach and for a range of values for the naive approaches and a range of weights for the weighted DNN approach. Figure 16 plots the errors vs. the accuracy of the predictions. As we did for the other network topologies, for complex network II we tuned the DNN network hyper-parameters for the case of and and used the resulting hyper-parameters for all other values of . However, the hyper-parameters obtained in this way did not work well for 46 sets of values, mostly those with . In these cases, the training network did not converge, i.e., after 3 epochs of training, the network generally predicted 0 (or 1) for every data instance, even in the training set, and the loss values failed to decrease to an acceptable level. Thus, we also tuned the hyper-parameters for and and used them to obtain the results for these 46 cases. The hyper-parameters obtained using and are all given in Table 1. We used the first set of hyper-parameters for 72 of the 118 combinations of values and the second set for the remaining 46 combinations. Additional hyper-parameter tuning would result in further improved dominance of the DNN approach.

Network lr
Serial 0.001 0.0005 0.0001
Distribution 0.0005 0.001 0.0005
OWMR 0.001 0.0005 0.0005
Complex-I 0.05 0.000005 0.000005
Complex-II, 0.05 0.05 0.05
Complex-II, 0.005 0.005 0.005
Table 1: The hyper-parameters used for each network

Figure 15: False positives vs. false negatives for complex network II

Figure 16: Accuracy of each algorithm for complex network II

Complex network II is the most complex network among all the networks we analyzed, since it is a non-tree network with multiple retailers. As Figure 16 shows, WDNN performs worse than the naive approaches for a few values of the weight, which shows the difficulty of the problems and the need to tune the network’s hyper-parameters for each set of cost coefficients.

4.6 Results: Comparison

In order to get more insight, the average accuracy of each algorithm for each of the networks is presented in Table 2. The average is taken over all instances of a given network type, i.e., over all cost parameters. In the column headers, N1, N2, and N3 stand for the Naive-1, Naive-2, and Naive-3 algorithms. The corresponding hyper-parameters that we used to obtain these results are also presented in Table 1.

DNN provides the best accuracy compared to the other algorithms. WDNN is equally good for the serial and OWMR networks and slightly worse for the distribution and complex II networks. The difference is larger for complex I; this is a result of the fact that we did not re-tune the DNN network for each value of the cost parameters, as discussed in Section 4.4. We conclude that DNN is the method to choose if the user wants to ensure high accuracy; and WDNN is useful if the user wants to control the balance between false positive and false negative errors.

The second-to-last column of Table 2 shows the number of cost-parameter values for which one of Naive-3’s predictions has fewer false-positive and fewer false-negative errors than at least one of the predictions of Naive-1. This happens often for some networks, since the simulated data are normally distributed and Naive-3 happens to assume a normal distribution. We would expect the method to perform worse if the simulated data came from a different distribution.

The last column shows a similar comparison for the Naive-3 and WDNN algorithms. In particular, Naive-3 never dominates WDNN in this way.
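
The dominance comparison described above can be sketched as follows. This is an illustration under assumptions: each algorithm is represented by a list of (false positives, false negatives) pairs, one per parameter setting, and the helper names `dominates` and `any_dominates` as well as the numbers are hypothetical.

```python
def dominates(a, b):
    """True if error pair a = (fp, fn) has fewer errors of both kinds than b."""
    return a[0] < b[0] and a[1] < b[1]


def any_dominates(points_a, points_b):
    """True if at least one point of A dominates at least one point of B."""
    return any(dominates(a, b) for a in points_a for b in points_b)


# Illustrative (false positives, false negatives) pairs, not experimental results.
naive1 = [(8, 25), (15, 8)]
naive3 = [(5, 20), (12, 9)]
wdnn = [(2, 4), (3, 2)]

print(any_dominates(naive3, naive1))  # True: (5, 20) beats (8, 25) on both errors
print(any_dominates(naive3, wdnn))    # False
```

Counting the cost-parameter values for which `any_dominates` returns True gives one count per network of the kind reported in the last two columns of Table 2.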

Network       N1    N2    N3    WDNN  DNN   N3 dominates N1  N3 dominates WDNN
Serial        0.94  0.97  0.95  0.99  0.99  0                0
Distribution  0.91  0.93  0.99  0.95  0.95  91               0
OWMR          0.91  0.93  0.91  0.95  0.98  19               0
Complex I     0.86  0.94  0.92  0.91  0.97  22               0
Complex II    0.86  0.94  0.92  0.94  0.97  0                0
Table 2: Average accuracy of each algorithm

4.7 Extended Results

In this section we present results on some extensions of our original model and analysis. In Section 4.7.1, we examine the ability of the algorithms to predict whether the inventory level will fall below a given threshold that is not necessarily 0. In Section 4.7.2, we apply our method to problems with dependent demands. Finally, in Section 4.7.3, we explore multiple-period-ahead predictions.

4.7.1 Threshold Prediction

The models discussed above aim to predict whether a stock-out will occur; that is, whether the inventory level will fall below 0. However, it is often desirable for inventory managers to have more complete knowledge about inventory levels; in particular, we would like to be able to predict whether the inventory level will fall below a given threshold that is not necessarily 0. In order to see how well our proposed algorithms perform at this task, in this section we provide results for the case in which we aim to predict whether the inventory level will fall below 10.

We applied the same procedure as before to obtain the results for all algorithms. In particular, we changed the way that the data labels are applied so that we assign a label of 1 when the inventory level is less than 10 and a label of 0 otherwise. We exclude the results of the DNN and Naive-2 algorithms, since they are dominated by the WDNN and Naive-3 algorithms. Figures 17–21 present the results for the serial, OWMR, distribution, complex I, and complex II networks, respectively. As before, WDNN outperforms the naive algorithms. Table 3 provides the overall accuracy of all algorithms and the comparisons between them; the columns are the same as those in Table 2. As before, WDNN performs better than or equal to the other algorithms for all networks. The accuracy figures for this case are provided in Appendix E.
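
The relabeling step can be sketched as below. The function name `threshold_labels` and the sample inventory levels are illustrative; the sketch only assumes that the label is 1 exactly when the inventory level is strictly below the threshold.

```python
def threshold_labels(inventory_levels, threshold=0):
    """Label a period 1 if its inventory level is strictly below the threshold."""
    return [1 if il < threshold else 0 for il in inventory_levels]


levels = [25, 12, 9, 0, -3, 10]       # illustrative ending inventory levels
print(threshold_labels(levels, 0))    # [0, 0, 0, 0, 1, 0]  (stock-out labels)
print(threshold_labels(levels, 10))   # [0, 0, 1, 1, 1, 0]  (threshold of 10)
```

With a threshold of 0 this reproduces the stock-out labels used in the earlier sections, so the same training pipeline can be reused unchanged.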

Figure 17: False positives vs. false negatives for serial network
Figure 18: False positives vs. false negatives for OWMR network
Figure 19: False positives vs. false negatives for distribution network
Figure 20: False positives vs. false negatives for complex network I
Figure 21: False positives vs. false negatives for complex network II

Network       N1    N3    WDNN  N3 dominates N1  N3 dominates WDNN
Serial        0.88  0.96  0.99  0                0
Distribution  0.90  0.92  0.93  89               0
OWMR          0.91  0.92  0.96  97               0
Complex I     0.85  0.87  0.97  13               0
Complex II    0.82  0.87  0.96  0                0
Table 3: Average accuracy of each algorithm for predicting inventory level less than 10

4.7.2 Multi-Item Dependent Demand Multi-Echelon Problem

The data sets we have used so far assume that the demands are statistically independent. However, in the real world, the demands for multiple items often depend on each other. Moreover, this dependence provides additional information for the DNN and might help it to provide more accurate stock-out predictions. To analyze this, we generated data for seven items with dependent demands, some positively and some negatively correlated. The mean demand of the seven items for the seven days of the week is shown in Figure 22. For more details, see Appendix D, which provides the demand mean and standard deviation for each item and each day.

Figure 22: The demand of seven items in each day

We tested this approach using the distribution network (Figure 10). Figure 23 plots the false-negative errors vs. the false-positive errors for each approach and for a range of values for the naive approaches and a range of weights for the weighted DNN approach. WDNN produces an average accuracy rate of 99% for this system, compared to 95% for the independent-demand case, which shows how DNN is able to make more accurate predictions by taking advantage of information it learns about the demand dependence. Finally, Figure 24 plots the errors vs. the accuracy of the predictions. DNN and WDNN provide much more accurate predictions than the naive methods.

Figure 23: False positives vs. false negatives for distribution network with multi-item dependent demand
Figure 24: Accuracy of each algorithm for distribution network with multi-item dependent demand

4.7.3 Multi-Period Prediction

In order to see how well our algorithm can make stock-out predictions multiple periods ahead, we revised the DNN structure, such that there are output values in the DNN algorithm, where is the number of prediction periods. We tested this approach using the distribution network (Figure 10).

We tested the algorithm on three different problems. The first predicts stock-outs for each of the next two days; the second and third do the same for the next three and seven days, respectively. The accuracy of the predictions for each day is plotted in Figure 25. For example, the blue curve shows the accuracy of the predictions made for each of the next 3 days when we make predictions over a horizon of 3 days. The one-day prediction accuracy is plotted as a reference.

Not surprisingly, it is harder to predict stock-outs multiple days in advance. For example, the accuracy for days 4–7 is below 90% when predicting 7 days ahead. Moreover, when predicting over a longer horizon, the predictions for earlier days are less accurate. For example, the accuracy for predictions 2 days ahead is roughly 99% if we use a 2-day horizon, 95% if we use a 3-day horizon, and 94% if we use a 7-day horizon. Therefore, if we wish to make predictions for each of the next days, it is more accurate (though slower) to run separate DNN models rather than a single model that predicts the next days.
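
To illustrate how training targets for a multi-output model of this kind might be built, the sketch below constructs, for a single node, one 0/1 label per future day in the horizon. The function name `multi_period_labels`, the single-node simplification, and the sample data are assumptions made for illustration; they are not a description of our implementation.

```python
def multi_period_labels(inventory_levels, horizon, threshold=0):
    """For each period t, return `horizon` labels for periods t+1, ..., t+horizon.

    A label is 1 if the inventory level in that future period is below
    `threshold` (a stock-out when threshold is 0) and 0 otherwise.
    """
    rows = []
    # Only periods with a full horizon of future observations get a target row.
    for t in range(len(inventory_levels) - horizon):
        future = inventory_levels[t + 1 : t + 1 + horizon]
        rows.append([1 if il < threshold else 0 for il in future])
    return rows


levels = [5, 2, -1, 3, -2, 4]  # illustrative ending inventory levels of one node
print(multi_period_labels(levels, horizon=2))
# [[0, 1], [1, 0], [0, 1], [1, 0]]
```

Running separate single-output models, as suggested above, corresponds to training one model per column of this target matrix.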

Figure 25: Average accuracy over seven days in multi-period prediction

5 Conclusion and Future Works

We studied stock-out prediction in multi-echelon supply chain networks. In single-node networks, classical inventory theory provides tools for making such predictions when the demand distribution is known. However, to the best of our knowledge, there is no algorithm to predict stock-outs in multi-echelon networks. To address this need, we proposed an algorithm based on deep learning. We also introduced three naive algorithms to provide a benchmark for stock-out prediction. None of the algorithms requires knowledge of the demand distribution; they use only historical data.

Extensive numerical experiments show that the DNN algorithm works well compared to the three naive algorithms. The results suggest that our method holds significant promise for predicting stock-outs in complex, multi-echelon supply chains. It obtains an average accuracy of 99% in serial networks and 95% for OWMR and distribution networks. Even for complex, non-tree networks, it attains an average accuracy of at least 91%. It also performs well when predicting inventory levels below a given threshold (not necessarily 0), making predictions when the demand is correlated, and making predictions multiple periods ahead.

Several research directions are now evident, including expanding the current approach to handle other types of uncertainty, e.g., lead times, supply disruptions, etc. Improving the model’s ability to make accurate predictions for more than one period ahead is another interesting research direction. Our current model appears to be able to make predictions accurately up to roughly 3 periods ahead, but its accuracy degrades quickly after that. Finally, the model can be extended to take into account other supply chain state variables in addition to current inventory and in-transit levels.

6 Acknowledgment

This research was supported in part by NSF grant #CMMI-1663256. This support is gratefully acknowledged.


  • Cannon [2011] Alex J Cannon. Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers & geosciences, 37(9):1277–1284, 2011.
  • Clark and Scarf [1960] Andrew J Clark and Herbert Scarf. Optimal policies for a multi-echelon inventory problem. Management science, 6(4):475–490, 1960.
  • Efendigil et al. [2009] Tuğba Efendigil, Semih Önüt, and Cengiz Kahraman. A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis. Expert Systems with Applications, 36(3):6697–6707, 2009.
  • Eruguz et al. [2016] Ayse Sena Eruguz, Evren Sahin, Zied Jemai, and Yves Dallery. A comprehensive survey of guaranteed-service models for multi-echelon inventory optimization. International Journal of Production Economics, 172:110–125, 2016.
  • Gartner Inc. [2011] Gartner Inc. Improving on-shelf availability for retail supply chains requires the balance of process and technology. Gartner Group, 2011. Accessed: 2016-08-04.
  • Goodfellow et al. [2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
  • Graves [1988] Stephen C. Graves. Safety stocks in manufacturing systems. Journal of Manufacturing and Operations Management, 1:67–101, 1988.
  • Graves and Willems [2000] Stephen C. Graves and Sean P. Willems. Optimizing strategic safety stock placement in supply chains. Manufacturing and Service Operations Management, 2(1):68–83, 2000.
  • Gruen et al. [2002] Thomas W Gruen, Daniel S Corsten, and Sundar Bharadwaj. Retail out-of-stocks: A worldwide examination of extent, causes and consumer responses. Grocery Manufacturers of America Washington, DC, 2002.
  • Jia et al. [2014] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
  • Ko et al. [2010] Mark Ko, Ashutosh Tiwari, and Jörn Mehnen. A review of soft computing applications in supply chain management. Applied Soft Computing, 10(3):661–674, 2010.
  • Kourentzes and Crone [2010] Nikolaos Kourentzes and Sven Crone. Advances in forecasting with artificial neural networks. 2010.
  • Magnanti et al. [2006] Thomas L. Magnanti, Zuo-Jun Max Shen, Jia Shu, David Simchi-Levi, and Chung-Piaw Teo. Inventory placement in acyclic supply chain networks. Operations Research Letters, 34:228–238, 2006.
  • Oroojlooyjadid et al. [2016] Afshin Oroojlooyjadid, Lawrence Snyder, and Martin Takáč. Applying deep learning to the newsvendor problem. 2016.
  • Rosling [1989] Kaj Rosling. Optimal inventory policies for assembly systems under random demands. Operations Research, 37(4):565–579, 1989.
  • Simchi-Levi and Zhao [2012] David Simchi-Levi and Yao Zhao. Performance evaluation of stochastic multi-echelon inventory systems: A survey. Advances in Operations Research, vol. 2012, 2012.
  • Snyder and Shen [2018] Lawrence V Snyder and Zuo-Jun Max Shen. Fundamentals of Supply Chain Theory. John Wiley & Sons, 2nd edition, 2018.
  • Taylor [2000] James W Taylor. A quantile regression neural network approach to estimating the conditional density of multiperiod returns. Journal of Forecasting, 19(4):299–311, 2000.
  • Vieira [2015] Armando Vieira. Predicting online user behaviour using deep learning algorithms. Computing Research Repository, abs/1511.06247, 2015.
  • Xu et al. [2016] Qifa Xu, Xi Liu, Cuixia Jiang, and Keming Yu. Quantile autoregression neural network model with applications to evaluating value at risk. Applied Soft Computing, 49:1–12, 2016.
  • Zipkin [2000] Paul H. Zipkin. Foundations of Inventory Management. McGraw-Hill, Irwin, 2000.

Appendix A Stock-Out Prediction for Single-Stage Supply Chain Network

Consider a single-stage supply chain network. The goal is to obtain the stock-out probability and, as a result, make a stock-out prediction, i.e., we want to obtain the probability

$$P(IL_t < 0),$$

where $IL_t$ is the ending inventory level in period $t$. Classical inventory theory (see, e.g., Snyder and Shen [2018], Zipkin [2000]) tells us that

$$IL_{t+L} = IP_t - D_L,$$

where $L$ is the lead time, $IP_t$ is the inventory position (inventory level plus on-order inventory) after placing a replenishment order in period $t$, and $D_L$ is the lead-time demand. Since we know $IP_t$ and we know the probability distribution of $D_L$, we can determine the probability distribution of $IL_{t+L}$ and use this to calculate $P(IL_{t+L} < 0)$. Then we can predict a stock-out if this probability is larger than $\alpha$, for some desired threshold $\alpha$.
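
As a concrete illustration of this calculation, the sketch below assumes that the lead-time demand is normally distributed with known mean and standard deviation; the normality assumption, the function names, and the numbers are illustrative only. Under that assumption, the stock-out probability is the probability that the lead-time demand exceeds the inventory position.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def stockout_probability(inventory_position, ltd_mean, ltd_std):
    """P(lead-time demand > inventory position), assuming normal lead-time demand."""
    return 1.0 - normal_cdf(inventory_position, ltd_mean, ltd_std)


def predict_stockout(inventory_position, ltd_mean, ltd_std, alpha):
    """Predict a stock-out if its probability exceeds the threshold alpha."""
    return stockout_probability(inventory_position, ltd_mean, ltd_std) > alpha


# Illustrative numbers: lead-time demand with mean 40 and standard deviation 5.
print(round(stockout_probability(50, 40, 5), 4))  # 0.0228
print(predict_stockout(50, 40, 5, alpha=0.05))    # False
print(predict_stockout(42, 40, 5, alpha=0.05))    # True
```

Lowering the threshold makes the rule flag more periods as stock-outs, trading additional false positives for fewer false negatives.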

Appendix B Gradient of Weighted Soft-max Function


Then the gradient of the soft-max loss function (4) is:

and the gradient of weighted soft-max loss function (7) is:

Appendix C Activation and Loss Functions

The most common loss functions are the hinge (8), logistic (9), and Euclidean (10) loss functions, given (respectively) by:


Each of these loss functions compares the observed value of a sample with the corresponding output of the DNN. The hinge loss function is appropriate for classification. The logistic loss function is also used for classification; however, it is a smooth convex function, which makes it easier to optimize than the hinge function, which is not differentiable everywhere. The Euclidean loss function minimizes the difference between the observed and calculated values and penalizes closer predictions much less than farther predictions.
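
For reference, the following sketch implements common textbook forms of the three loss functions. The averaging over samples, the label encoding in {-1, +1} for the hinge and logistic losses, and the function names are assumptions; the exact forms in (8)–(10) may differ in normalization.

```python
import math


def hinge_loss(y, y_hat):
    """Mean hinge loss; assumes labels y in {-1, +1} and real-valued outputs."""
    return sum(max(0.0, 1.0 - yi * yh) for yi, yh in zip(y, y_hat)) / len(y)


def logistic_loss(y, y_hat):
    """Mean logistic loss; assumes labels y in {-1, +1} and real-valued outputs."""
    return sum(math.log(1.0 + math.exp(-yi * yh)) for yi, yh in zip(y, y_hat)) / len(y)


def euclidean_loss(y, y_hat):
    """Mean squared difference between observed and predicted values."""
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / len(y)


y = [1, -1]          # illustrative observed labels
y_hat = [0.5, -2.0]  # illustrative network outputs
print(hinge_loss(y, y_hat))      # 0.25
print(euclidean_loss(y, y_hat))  # 0.625
print(logistic_loss(y, y_hat))
```

In this example the second sample is classified with a margin larger than 1, so it contributes nothing to the hinge loss but still contributes a small positive amount to the logistic loss.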

Each node of the DNN network has an activation function. The most commonly used activation functions are sigmoid, tanh, and inner product, given (respectively) by:
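
The three activation functions can be sketched as follows. These are the standard definitions; treating the inner-product activation as a weighted sum of the inputs plus an optional bias is an assumption, as are the function names.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """Hyperbolic tangent, which maps the input to the interval (-1, 1)."""
    return math.tanh(z)


def inner_product(weights, inputs, bias=0.0):
    """Linear activation: the weighted sum of the inputs plus a bias term."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


print(sigmoid(0.0))                                    # 0.5
print(tanh(0.0))                                       # 0.0
print(inner_product([1.0, 2.0], [3.0, 4.0], bias=0.5)) # 11.5
```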


Appendix D Dependent Demand Data Generation

This section provides the details of data generation for dependent demands. In the case of dependent demand, there are seven items, and the mean demand of each item differs across the days of the week. Tables 4 and 5 provide the mean ($\mu$) and standard deviation ($\sigma$) of the normal demand distribution for each item on each day of the week.

Item  Mon  Tue  Wed  Thu  Fri  Sat  Sun
1     12   10   9    11   14   9    11
2     14   12   11   9    16   7    9
3     8    7    6    14   10   13   14
4     7    6    5    15   9    14   15
5     6    5    4    16   7    15   16
6     8    7    6    14   10   13   13
7     10   9    8    12   12   11   12
Table 4: The mean demand ($\mu$) of each item on each day of the week.

Item  Mon  Tue  Wed  Thu  Fri  Sat  Sun
1     3    2    4    1    2    3    2
2     4    3    4    1    3    2    1
3     1    1    2    2    2    4    3
4     1    1    1    3    1    4    3
5     1    1    1    2    1    3    3
6     2    1    1    3    1    3    3
7     3    2    4    1    2    3    2
Table 5: The standard deviation ($\sigma$) of the demand of each item on each day of the week.
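
One simple way to draw positively and negatively correlated normal demands with given means and standard deviations is a shared random factor, sketched below. This is an illustration only and not necessarily the procedure used to generate our data: the common-factor construction, the loading values, and the function name `sample_demands` are all assumptions. The means and standard deviations are the Monday values of items 1, 2, and 5 from Tables 4 and 5.

```python
import math
import random


def sample_demands(means, stds, loadings, rng):
    """Draw one vector of correlated normal demands.

    Each item i uses z_i = b_i * common + sqrt(1 - b_i^2) * own_i, so z_i is
    standard normal and corr(z_i, z_j) = b_i * b_j for i != j.
    Loadings b_i must lie in [-1, 1]. Negative draws are not truncated here.
    """
    common = rng.gauss(0.0, 1.0)
    demands = []
    for mu, sigma, b in zip(means, stds, loadings):
        own = rng.gauss(0.0, 1.0)
        z = b * common + math.sqrt(1.0 - b * b) * own
        demands.append(mu + sigma * z)
    return demands


# Monday parameters of items 1, 2, and 5 (Tables 4 and 5).
means = [12, 14, 6]
stds = [3, 4, 1]
# Hypothetical loadings: items 1 and 2 move together, item 5 moves against them.
loadings = [0.8, 0.8, -0.8]

rng = random.Random(0)
print(sample_demands(means, stds, loadings, rng))
```

With these loadings, the demands of the first two items have correlation 0.64, and each has correlation -0.64 with the third item.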

Appendix E Results of Threshold-Prediction Case

This section provides the accuracy results for the problem described in Section 4.7.1, in which we wish to predict whether the inventory level will fall below 10. Figures 26–30 show the results for the serial, OWMR, distribution, complex I, and complex II networks, respectively.

Figure 26: Accuracy of each algorithm for serial network
Figure 27: Accuracy of each algorithm for OWMR network
Figure 28: Accuracy of each algorithm for distribution network
Figure 29: Accuracy of each algorithm for complex network I
Figure 30: Accuracy of each algorithm for complex network II