Optimal Policies for the Sequential Stochastic Threshold Assignment Problem
The Stochastic Sequential Threshold Assignment Problem (SSTAP) addresses the optimal assignment of arriving tasks (jobs) to available resources (workers) so as to maximize a reward function that consists of indicator functions incorporating threshold constraints. We present an optimal assignment policy for SSTAP that is independent of the probability distribution of the job values and of the number of arriving jobs. We show through an example that this type of reward function can model aviation security problems. We analyze the performance limitations of systems that use the SSTAP optimal assignment policy. Finally, we study the multiple levels SSTAP and the SSTAP with uncertainties in the workers' performance rates.
The stochastic sequential assignment problem (SSAP) addresses the assignment of entities (jobs) to available resources (workers) under uncertainty in the problem parameters. The uncertainties are typically modeled as probability distributions that govern the random parameters of the problem . SSAP arises naturally in the passenger screening process for aviation security and in Internet services, where online requests must be optimally assigned to available servers.
SSAP was introduced in , where an optimal assignment policy is proven for independent and identically distributed (i.i.d.) random job values. The policy is based on a partition of the domain of the job values into subintervals, and each job is assigned according to the subinterval that contains its value. Kennedy  presents an updated optimal assignment policy for random job values that are not necessarily i.i.d.
Sakaguchi  discusses a generalization of the SSAP for an unknown total number of jobs, and Nikolaev and Jacobson  study the case of a random total number of jobs. Other variations of the problem include optimal sequential assignment with random arrival times and reward functions with discount factors , the optimal policy for SSAP with random deadlines , and the SSAP with uncertainty in the job value distribution .
Markov chains are commonly used to model sequential random processes. Nakai  discusses SSAP for partially observed Markov chains. Baharian and Jacobson  provide a Markov decision process approach for the assignment of tasks under a threshold criterion, which minimizes the probability that the total reward fails to achieve a target value. Furthermore, Baharian and Jacobson  obtain stationary policies that achieve the optimal expected reward per task as the number of tasks approaches infinity, when the task distributions are governed by an ergodic Markov chain.
Apart from the uncertainties in the job values, uncertainties in the performance rates of the workers may also occur. For human workers, the performance rates are uncertain due to human error and fatigue. For machines, they are uncertain due to measurement disturbances and equipment aging. This issue led to the study of SSAP under uncertainty in the workers' performance rates. The study of optimal assignment policies for SSAP with random worker performance rates  and with time-dependent performance rates  has led to the formulation and solution of the doubly stochastic sequential assignment problem .
SSAPs have found applications in numerous areas. Nikolaev et al.  address the sequential stochastic security design problem (SSSDP). The SSSDP models passenger and carry-on baggage screening operations in an aviation security system. Its goal is to maximize the total security of all passenger-screening decisions over a fixed time period, given passenger risk levels and security device parameters.
McLay et al.  introduce the Sequential Stochastic Passenger Screening Problem (SSPSP), which allows passengers to be optimally assigned (in real time) to aviation security resources. Lee et al.  study a real-time sequential binary passenger assignment model, formulated as a discrete-time difference equation and manipulated via nonlinear control techniques. Nikolaev et al.  address the multistage sequential passenger screening problem (MSPSP). The MSPSP models passenger and carry-on baggage screening operations in an aviation security system that can dynamically update the perceived risk of passengers. SSAP also applies to financial problems such as optimal stochastic sequential investment  and investment decisions under uncertainty . Other applications appear in computer science, for reservation systems , and in game theory, for online mechanism design .
This paper introduces a version of SSAP based on a new type of reward function defined using indicator functions, which capture threshold constraints. The new version is called the stochastic sequential threshold assignment problem (SSTAP). We use order-preserving functions in the inequality conditions of the indicator functions. We prove the optimality of an assignment policy based on a greedy algorithm, which assigns each job to the available resource of smallest value that can satisfy the threshold. We provide an example that illustrates the application of SSTAP in aviation security. The example uses an order-preserving function that captures the characteristics of the aviation security problem.
Given the performance rates of the workers, the threshold and the order-preserving function, we provide a performance analysis of a system that uses the SSTAP optimal assignment policy. We study the maximum and minimum job load that an SSTAP system can service while achieving its maximum reward. In SSTAP, the optimal sequential assignment algorithm does not depend on the distributions of the job values. Therefore, we can look for a probability distribution of the i.i.d. random job values that maximizes the reward function (6) for the maximum and minimum job load, respectively. The optimal probability mass function is provided, and the corresponding density function is approximated by a sum of Gaussians.
In passenger screening for aviation security, workers are organized in more than one level. We analyze the multiple levels SSTAP, in which the workers are partitioned into levels according to their performance rates. This model reflects the aviation security process more accurately. Deficiencies of human workers and faults of machine workers motivate the SSTAP with uncertainties in the workers' performance rates, which we term the doubly stochastic sequential threshold assignment problem (DSSTAP). We provide an optimal assignment policy for the DSSTAP.
We present the necessary mathematical background from graph theory and probability theory. We then give an overview of the basic results on SSAP and define the reward function using indicator functions that capture threshold constraints for order-preserving functions.
2.1 Mathematical background
We start with basic definitions from graph theory. A graph is a pair $G=(V,E)$, where $V$ is a finite set called the vertex set and $E$ is the edge set, which consists of unordered pairs of vertices. A bipartite graph is a graph whose vertex set is decomposed into two disjoint sets $X$ and $Y$ such that no two vertices within the same set are adjacent. A matching is a subset of edges $M\subseteq E$ such that each vertex in $V$ appears in at most one edge in $M$. We provide the notion of maximum weight bipartite matching, which is used in Section 6 .
(Maximum weight bipartite matching). Consider a bipartite graph $G=(X\cup Y,E)$ with bipartition $(X,Y)$ and weight function $w:E\to\mathbb{R}$. The maximum weight bipartite matching is the matching $M^{*}$ that maximizes the matching weight:
$$M^{*}=\arg\max_{M}\ \sum_{e\in M} w(e).$$
We also require some basic notions from probability theory. A sequence of random variables can converge to a random variable in four ways: almost surely, in probability, in the mean square sense, and in distribution. We provide the definition of convergence in distribution, which is used in Section 4 .
(Convergence in distribution). A sequence of random variables $(X_n)_{n\ge1}$ converges in distribution to a random variable $X$ if $\lim_{n\to\infty}F_{X_n}(x)=F_X(x)$ at all continuity points $x$ of $F_X$. Convergence in distribution is denoted by $X_n\xrightarrow{d}X$.
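The restriction to continuity points matters. As an illustrative example that is not taken from the source, the normal distribution $N(\mu,\sigma^2)$ converges in distribution to the point mass at $\mu$ as $\sigma\to0$. This is plausibly the kind of limit behind the Gaussian approximation in Section 4. The sketch below uses Python's `math.erf` to evaluate the normal CDF. Away from $\mu$, the CDF approaches the step function. At the jump $x=\mu$, however, it stays at $1/2$ while the limit CDF equals $1$.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of the normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def point_mass_cdf(x: float, mu: float) -> float:
    """CDF of the degenerate random variable that always equals mu."""
    return 1.0 if x >= mu else 0.0


mu = 1.0
for sigma in (1.0, 0.1, 0.01, 0.001):
    # At continuity points of the limit (here x = 0.9 and x = 1.1) the gap shrinks to 0.
    gap = max(abs(normal_cdf(x, mu, sigma) - point_mass_cdf(x, mu)) for x in (0.9, 1.1))
    # At the jump x = mu the normal CDF equals 1/2 for every sigma, yet the limit CDF is 1.
    print(f"sigma={sigma}: gap={gap:.2e}, F(mu)={normal_cdf(mu, mu, sigma)}")
```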
2.2 Results on SSAP
Suppose that there are $n$ workers available to perform $n$ sequentially arriving tasks. The value $X_j$ of task $j$ is a random variable with a known cumulative distribution function $F_j$ and domain $\mathcal{X}$. If $X_1,\dots,X_n$ are independent and identically distributed (i.i.d.) random variables, then $F_j=F$ for all $j$. Each worker $i$ is characterized by a deterministic performance rate $p_i$, $i=1,\dots,n$. Once task $j$ arrives, its value is revealed. The goal of SSAP is to find the optimal assignment policy that maximizes the total expected reward,
$$\max_{\pi\in\Pi_n}\ \mathbb{E}\Big[\sum_{j=1}^{n} p_{\pi(j)}X_j\Big],$$
where $\Pi_n$ is the set of all permutations of the integers $1,\dots,n$ and $\pi(j)$ refers to the worker assigned to the $j$-th arriving task. A worker cannot be reassigned.
(Hardy’s Lemma )
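If $x_1\le x_2\le\dots\le x_n$ and $y_1\le y_2\le\dots\le y_n$ are sequences of non-negative numbers, then
$$\sum_{i=1}^{n}x_i\,y_{\sigma(i)}\le\sum_{i=1}^{n}x_i\,y_i,$$
where $\sigma$ is any permutation of $\{1,\dots,n\}$.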
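Hardy's lemma says that pairing two sequences sorted in the same order maximizes the sum of products. The brute-force check below tries every permutation for a small, hypothetical pair of sorted sequences. It is an illustration only, not a proof.

```python
from itertools import permutations


def pairing_sum(x, y, sigma):
    """Sum of x[i] * y[sigma[i]] for a permutation sigma of the indices."""
    return sum(xi * y[s] for xi, s in zip(x, sigma))


# Hypothetical non-negative sequences, both sorted in increasing order.
x = [0.5, 1.0, 2.0, 3.5]
y = [0.1, 0.4, 0.7, 0.9]

identity = tuple(range(len(x)))
best = max(permutations(range(len(x))), key=lambda s: pairing_sum(x, y, s))
print(best == identity)  # True: pairing in the same order is the best pairing
```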
The optimal assignment policies for i.i.d. and for general random job values are given in the following two theorems.
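For each $n\ge1$, there exist numbers $-\infty=a_{0,n}\le a_{1,n}\le\dots\le a_{n,n}=+\infty$ with the following property. Whenever there are $n$ i.i.d. job values and performance rates $p_1\le p_2\le\dots\le p_n$, the optimal choice is to use $p_i$ if the value $X_1$ of the first job lies in the interval $(a_{i-1,n},a_{i,n}]$. The $a_{i,n}$ are independent of the performance rates but depend on $F$ according to the following recursive relation:
$$a_{i,n+1}=\int_{a_{i-1,n}}^{a_{i,n}}x\,dF(x)+a_{i-1,n}F(a_{i-1,n})+a_{i,n}\big(1-F(a_{i,n})\big),$$
for $i=1,\dots,n$, where $-\infty\cdot0$ and $+\infty\cdot0$ are defined to be 0.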
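The recursion is easy to evaluate numerically. For illustration, we assume Uniform(0,1) job values; this choice is an assumption and not part of the theorem. The interval endpoints are written $-\infty=a_{0,n}\le\dots\le a_{n,n}=+\infty$, following Derman, Lieberman and Ross. For the uniform law, $\int_a^b x\,dF(x)=(\bar b^2-\bar a^2)/2$, where $\bar a$ and $\bar b$ are the endpoints clipped to $[0,1]$. The sketch builds the thresholds from $n=1$ upward and uses the convention $\pm\infty\cdot0=0$. The helper names are hypothetical.

```python
import math


def uniform_cdf(x: float) -> float:
    """CDF F of the Uniform(0, 1) distribution (the assumed job-value law)."""
    return min(max(x, 0.0), 1.0)


def partial_mean(a: float, b: float) -> float:
    """Integral of x dF(x) over (a, b] for Uniform(0, 1)."""
    lo, hi = uniform_cdf(a), uniform_cdf(b)  # clip the endpoints to [0, 1]
    return (hi * hi - lo * lo) / 2.0


def times(a: float, w: float) -> float:
    """a * w with the convention (+/-inf) * 0 = 0."""
    return 0.0 if w == 0.0 else a * w


def dlr_thresholds(n: int) -> list:
    """Return [a_{0,n}, ..., a_{n,n}] computed by the recursion, starting from n = 1."""
    a = [-math.inf, math.inf]  # a_{0,1} = -inf, a_{1,1} = +inf
    for m in range(1, n):
        nxt = [-math.inf]
        for i in range(1, m + 1):
            lo, hi = a[i - 1], a[i]
            nxt.append(partial_mean(lo, hi)
                       + times(lo, uniform_cdf(lo))
                       + times(hi, 1.0 - uniform_cdf(hi)))
        nxt.append(math.inf)
        a = nxt
    return a


def dlr_choice(x: float, n: int) -> int:
    """1-based index i of the rate p_i to use when x lies in (a_{i-1,n}, a_{i,n}]."""
    a = dlr_thresholds(n)
    return next(i for i in range(1, n + 1) if a[i - 1] < x <= a[i])


print(dlr_thresholds(3))  # [-inf, 0.375, 0.625, inf]
```

With three jobs to go, a job value of 0.2 is therefore matched with the smallest rate $p_1$, and a value of 0.9 with the largest rate $p_3$.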
 Let , , be any (not necessarily i.i.d.) random variables. For any and , define random variables such that:
, for ; (2) , for ; (3) ; (4) , for , . where , , is a sigma-field over all possible realizations of vector , , denotes the maximum, and denotes the minimum.
If the random performance rates of the workers are revealed at the beginning of the process, then the problem reduces to the original case of known performance rates. A similar problem was studied for the SSAP in , with i.i.d. random performance rates that follow a distribution at each stage of the problem. The main result is presented below.
 The Greedy algorithm, which assigns the arriving task to the worker with the maximum performance rate value at each stage, achieves the maximum total expected reward in the DSSAP with i.i.d. random performance rates. Moreover, the maximum total expected reward is given by
where denotes the expected value of the random variables and denotes the cdf of the workers’ random performance rates at stage of the process where we have i.i.d. random performance rates.
2.3 Threshold Reward Function
We introduce a new type of reward function (6) defined using indicator functions, which capture threshold constraints. Again, we consider $n$ jobs arriving sequentially, where the value $x_j$ of job $j$ follows a distribution $F_j$. There are $m$ workers with performance rates $p_i$, $i=1,\dots,m$. When job $j$ arrives, its value $x_j$ is randomly generated following $F_j$, and the job is either assigned online to a worker or rejected. The reward function is
$$R(\pi)=\sum_{j=1}^{n}\mathbb{1}\big\{f\big(x_j,p_{\pi(j)}\big)\ge\tau\big\},\qquad(6)$$
where $\tau$ is the threshold and $f:X\times P\to\mathbb{R}$ is the two-variable threshold function. Here $X$ is the set of job values, $P$ is the set of the workers' performance rates, and $\pi$ represents the assignment policy. A rejected job contributes zero to the sum. The objective is to find an optimal policy $\pi$ that assigns jobs to workers so that no worker is assigned to more than one job and the reward function (6) is maximized.
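For a fixed assignment, the reward (6) counts the jobs whose threshold constraint holds. The minimal sketch below makes several assumptions. The constraint is taken to read $f(x_j,p_{\pi(j)})\ge\tau$. A rejected job is encoded as `None`. The function name `threshold_reward` and the example threshold function $f(x,p)=p-x$ are hypothetical.

```python
from typing import Callable, Optional, Sequence


def threshold_reward(
    jobs: Sequence[float],
    rates: Sequence[float],
    assignment: Sequence[Optional[int]],
    f: Callable[[float, float], float],
    tau: float,
) -> int:
    """Number of jobs j with f(x_j, p_pi(j)) >= tau; a rejected job (None) adds 0."""
    used = [w for w in assignment if w is not None]
    if len(used) != len(set(used)):
        raise ValueError("a worker cannot be assigned to more than one job")
    return sum(
        1
        for x, w in zip(jobs, assignment)
        if w is not None and f(x, rates[w]) >= tau
    )


def f(x: float, p: float) -> float:
    """Hypothetical threshold function: worker capability minus job risk."""
    return p - x


print(threshold_reward([0.2, 0.8, 0.4], [0.5, 1.0, 0.3], [0, 1, None], f, tau=0.1))  # 2
```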
For SSTAP we perform a stronger type of optimization. Rather than maximizing the expected value of the reward function, we maximize the reward function for each randomly generated sequence of job values. We focus on a general class of threshold functions: those that are order-preserving on the argument of the worker performance rate.
(Order-preserving function) Consider the two-variable function $f:X\times P\to\mathbb{R}$, where $X$ and $P$ are discrete sets of positive real numbers. The function $f$ is order-preserving on the argument $p\in P$ if the order of the values $f(x,p_1),f(x,p_2),\dots,f(x,p_m)$ is independent of $x\in X$.
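For finite sets, the definition can be checked pairwise. We read it as follows, which is our own reformulation: $f$ is order-preserving on $p$ exactly when, for every pair of rates $p_a,p_b$, the sign of $f(x,p_a)-f(x,p_b)$ does not depend on $x$. The sketch below applies this check to two hypothetical example functions.

```python
from itertools import combinations
from typing import Callable, Sequence


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def is_order_preserving(
    f: Callable[[float, float], float], xs: Sequence[float], ps: Sequence[float]
) -> bool:
    """True if the order of f(x, p_1), ..., f(x, p_m) is the same for every x in xs."""
    for pa, pb in combinations(ps, 2):
        # The relative order of p_a and p_b must not change with x (ties included).
        if len({sign(f(x, pa) - f(x, pb)) for x in xs}) > 1:
            return False
    return True


xs = [0.1, 0.5, 0.9]  # hypothetical job values
ps = [0.2, 0.6, 1.0]  # hypothetical performance rates
print(is_order_preserving(lambda x, p: p / x, xs, ps))            # True
print(is_order_preserving(lambda x, p: -((p - x) ** 2), xs, ps))  # False
```

The second function fails because it ranks the rate 0.2 above 0.6 for $x=0.1$, but below it for $x=0.9$.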
3 Optimal Assignment Policy
We present the optimal assignment policy given a threshold, job values arriving sequentially, and worker performance rates. The number of jobs is fixed. The optimal policy is provided for threshold functions that are order-preserving on the argument of the worker performance rate. Each time a job arrives, the policy considers the non-assigned workers that satisfy the threshold. Among them, the job is assigned to the worker with the smallest threshold-function value. If no such worker exists, then the job is rejected. First, we give a lemma used in the proof of the theorem.
where is any performance rate such that .
Proof: By definition, since then for any such that , it holds that . Since is order-preserving in the argument of the performance rates, we have that for any .
Although the order-preserving function guarantees for any , it does not imply that for any . We provide the main theorem of the paper.
Consider a set of job values arriving sequentially, where each job value is randomly generated following its distribution. Consider also a set of performance rates, an order-preserving function and a threshold value. The optimal assignment policy that maximizes the reward function (6)
is defined as follows. Among the non-assigned workers whose performance rates satisfy the threshold, each arriving job is assigned to the worker with the smallest threshold-function value. If no such worker exists, then the job is rejected.
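A minimal Python sketch of the policy in the theorem follows. It assumes that the threshold constraint is $f(x,p)\ge\tau$. It reads "smallest value" as the smallest value of $f(x_j,\cdot)$ among the available workers that meet the threshold; for an order-preserving $f$, this ranking is the same for every job. The names `sstap_greedy` and `offline_optimum` and the example $f(x,p)=p-x$ are illustrative assumptions. The exhaustive `offline_optimum` sees the whole job sequence in advance and is included only to check the online policy on tiny instances.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence


def sstap_greedy(
    jobs: Sequence[float],
    rates: Sequence[float],
    f: Callable[[float, float], float],
    tau: float,
) -> List[Optional[int]]:
    """Online policy: each job, in arrival order, goes to the available worker that
    meets the threshold with the smallest value of f(x, p); otherwise None (rejected)."""
    available = set(range(len(rates)))
    assignment: List[Optional[int]] = []
    for x in jobs:
        feasible = [w for w in available if f(x, rates[w]) >= tau]
        if feasible:
            # Weakest sufficient worker; ties broken by worker index.
            w = min(feasible, key=lambda k: (f(x, rates[k]), k))
            available.remove(w)
            assignment.append(w)
        else:
            assignment.append(None)
    return assignment


def offline_optimum(jobs, rates, f, tau) -> int:
    """Best reward over all one-to-one assignments, by exhaustive search.
    Sees the whole job sequence in advance; tiny instances with len(jobs) <= len(rates)."""
    return max(
        sum(f(x, rates[w]) >= tau for x, w in zip(jobs, perm))
        for perm in permutations(range(len(rates)))
    )


def f(x: float, p: float) -> float:
    """Hypothetical threshold function, order-preserving (increasing) in p."""
    return p - x


jobs, rates = [0.3, 0.1, 0.6, 0.2], [0.4, 0.9, 0.5, 0.8]
print(sstap_greedy(jobs, rates, f, tau=0.15))     # [2, 0, 3, 1]
print(offline_optimum(jobs, rates, f, tau=0.15))  # 4
```

In the example, the first job ($x=0.3$) could be served by the rates 0.5, 0.8 or 0.9. It receives the weakest of them, 0.5, which keeps the stronger workers free for later, riskier jobs.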
Proof: Suppose that jobs have arrived. Define the following sets: contains the first jobs that appeared, contains the performance rates of the already assigned workers, contains the performance rates of the non-assigned workers and contains the jobs assigned to some worker (not rejected).
The proof proceeds by induction on the number of jobs. For a single job, we assign it to a worker that satisfies the threshold and obtain the maximum possible reward, which is 1. If no such worker exists, then the job is rejected and we get zero reward. Thus, using the suggested policy, we maximize the reward. We now assume that the claim holds for $k$ jobs and prove that it holds for $k+1$ jobs.
Let , where are ordered and assigned to workers with performance rates , respectively. Let be the job value. If there is a performance rate such that , then we assign the job to the worker.
Let us assume that such a performance rate does not exist. Assume also that there is an alternate assignment of the preceding jobs such that all the jobs pass the threshold. We compare two assignments, the initial one and the alternate one. In the initial assignment, we follow the suggested policy, and the jobs are assigned to the performance rates , respectively. In the alternate assignment, the jobs are assigned to the performance rates respectively, and to ; let , where is a mapping from to .
If a job is successfully assigned to a worker in the initial assignment, then it is also assigned to a worker in the alternate assignment, though not necessarily the same one. This is because there exist suitable performance rates for which the order-preserving function, evaluated at this job, satisfies the threshold. There is no benefit in ignoring a job that passes the threshold. Doing so reduces the reward by one unit, and an upcoming job may not recover that unit. Even if an upcoming job does recover it, we could have kept the job we initially ignored without changing the optimal reward.
Since in the alternate assignment we are able to assign job to a worker, the reward , provided by the alternate assignment, is larger by one than the reward , provided by the initial assignment, i.e., . We describe a process for all jobs. Under this process, the performance rates of the alternate assignment can be swapped with those of the initial assignment without reducing the . We give a detailed exposition of the process for job ; it is the same for the jobs .
If , we can assign to since it passes the threshold in the initial assignment, i.e., ; and is placed in . If , then there exists such that , where is assigned to job in the alternate assignment. From the alternate assignment we have and . We claim that we can swap the performance rates , and the thresholds are still satisfied, i.e., and .
In the initial assignment, we apply the suggested policy which implies:
For job , from the alternate assignment we have
This is the end of the process. Equations (11) and (13) prove the claim. We continue this process for the performance rates of all jobs: ,,. When we finish with all jobs, we get the following assignment: , respectively, and to . Here is not necessarily equal to , due to the swaps that take place in the process; in any case . Therefore , which is a contradiction, since we assumed that does not exist. Hence, it is optimal for the arriving job to be assigned to a performance rate according to the suggested policy. The claim holds for . This concludes the induction step. The suggested policy is optimal.
We provide two examples. The first one is an application of the optimal policy. The second example shows that the order-preserving property is necessary for the suggested policy to be optimal.
Consider the SSTAP for four workers and four jobs with the order-preserving function , threshold and performance rates , , , that imply the following ordering . For the job values , , , , arriving sequentially, we get the following assignment: rejected, assigned to , assigned to and assigned to .
We provide an example that highlights the necessity of an order-preserving function in SSTAP. For the performance rates , , and the job values , , arriving sequentially in this order, we have the following function in the threshold constraints:
The function is not order-preserving, since for while for . For threshold , the suggested policy assigns the first job to . The second job is rejected, since only could satisfy the threshold, but it is already assigned. The third job is assigned to . The reward is 2. However, we could have assigned to , to and to , and obtained a reward of 3.
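The same failure mode can be reproduced with a hypothetical two-job, two-worker table; the numeric values of the example above are not reproduced here. The threshold-function values rank the two workers differently for the two jobs, so the function is not order-preserving. As a result, the greedy rule of the theorem loses one unit of reward compared with the best assignment chosen with both jobs known in advance.

```python
from itertools import permutations

# Hypothetical threshold-function table f[(job, worker)]. It is NOT order-preserving:
# for job "x1" worker "B" ranks below "A", while for job "x2" the order is reversed.
table = {
    ("x1", "A"): 2.0, ("x1", "B"): 1.0,
    ("x2", "A"): 0.0, ("x2", "B"): 5.0,
}
tau = 1.0
jobs, workers = ["x1", "x2"], ["A", "B"]


def greedy(jobs, workers):
    """Smallest sufficient threshold-function value among the available workers."""
    available, reward = set(workers), 0
    for j in jobs:
        feasible = [w for w in available if table[(j, w)] >= tau]
        if feasible:
            available.remove(min(feasible, key=lambda k: table[(j, k)]))
            reward += 1
    return reward


# Best assignment when both jobs are known in advance.
best = max(sum(table[(j, w)] >= tau for j, w in zip(jobs, perm)) for perm in permutations(workers))
print(greedy(jobs, workers), best)  # 1 2
```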
3.1 Illustrative Example
We give an example of the optimal assignment policy for a reward function with threshold constraints inspired by aviation security applications. We assume that the job values stand for the risk values of the passengers and that the performance rates quantify the capabilities of the workers, for . A job with a low value stands for a low-risk passenger, who must be assigned to an officer of lower capabilities. Similarly, a job with a high value represents a high-risk passenger, who must be assigned to a high-capability officer. To this end, we introduce the following threshold function,
We observe that the threshold function is order-preserving in the argument of , so we can apply the optimal policy algorithm. Consider a threshold value in . If the job value is low, the job passes the threshold with very high probability and is served by a low-performance-rate officer, which is what we expect for a low-risk passenger. If the job value is high, a higher-performance-rate officer is needed to pass the threshold, which also describes the problem appropriately.
We provide a figure that depicts the number of passengers who pass the threshold out of a total of 200 passengers, as the threshold varies from 0.1 to 5. The job values follow the uniform distribution . The performance rates of the 200 workers are given by the expression , for .
For threshold , all 200 jobs pass the threshold. For threshold , 193 out of 200 jobs pass the threshold. For threshold , 58 out of 200 jobs pass the threshold.
3.2 Infinite Number of Jobs and Workers Cycling Back
In passenger screening systems for aviation security, the total number of passengers to be examined is a random variable. In Internet transactions, the stream of sequentially arriving tasks is unbounded, and the available servers must handle all of them. The optimal policy applies to a random, even infinite, number of jobs, as long as there are available workers. This is because the assignment decision at each arrival does not depend on the number or values of future jobs. A shortage of available workers can be resolved by allowing workers to cycle back.
In cycling back, workers can be reused after the completion of a task. Each worker cycles back with cycle rate for , where is the number of workers. By definition, if a worker never cycles back, then . The set of available workers is updated whenever a worker returns. We apply the policy given by Theorem 3.2 at the arrival of each new job, using the updated list of available workers as input.
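The cycling-back mechanism can be sketched as an event loop. The source does not specify how a cycle rate translates into a return time. Here we assume, hypothetically, that a worker with cycle rate $r>0$ returns $1/r$ time units after being assigned, and that $r=0$ means the worker never returns. The arrival times, the rates, the function name `greedy_with_cycling` and $f(x,p)=p-x$ are all illustrative.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def greedy_with_cycling(
    arrivals: Sequence[Tuple[float, float]],
    rates: Sequence[float],
    cycle_rates: Sequence[float],
    f: Callable[[float, float], float],
    tau: float,
) -> int:
    """arrivals: (time, job_value) pairs sorted by time. Assumption: a worker with
    cycle rate r > 0 returns 1/r time units after being assigned; r == 0 never returns."""
    available = set(range(len(rates)))
    returning: List[Tuple[float, int]] = []  # min-heap of (return_time, worker)
    reward = 0
    for t, x in arrivals:
        # Put back every worker that has returned by the arrival time t.
        while returning and returning[0][0] <= t:
            _, w = heapq.heappop(returning)
            available.add(w)
        feasible = [w for w in available if f(x, rates[w]) >= tau]
        if not feasible:
            continue  # job rejected
        w = min(feasible, key=lambda k: (f(x, rates[k]), k))
        available.remove(w)
        reward += 1
        if cycle_rates[w] > 0:
            heapq.heappush(returning, (t + 1.0 / cycle_rates[w], w))
    return reward


def f(x: float, p: float) -> float:
    """Hypothetical order-preserving threshold function."""
    return p - x


arrivals = [(0.0, 0.2), (0.5, 0.3), (1.0, 0.1), (2.5, 0.4)]
print(greedy_with_cycling(arrivals, [0.6, 0.9], [1.0, 0.0], f, tau=0.1))  # 4
print(greedy_with_cycling(arrivals, [0.6, 0.9], [0.0, 0.0], f, tau=0.1))  # 2
```

With two workers and no cycling, only two of the four jobs are served. If the weaker worker returns after each task, it serves the last two jobs as well.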
4 Performance analysis
In this section, we provide a performance analysis of a system that uses the SSTAP optimal assignment policy. We study the maximum and minimum job load that an SSTAP system can service while achieving its maximum reward. These are critical values: if the job load exceeds the maximum or falls below the minimum, the maximum reward is no longer achieved.
We define the job load of the set of job values $x=(x_1,\dots,x_n)$ as the Euclidean norm
$$\|x\|_2=\Big(\sum_{j=1}^{n}x_j^{2}\Big)^{1/2}.$$
Given the performance rates and an order-preserving function , we compute the set . The is the maximum job load for reward equal to . If we increase the job load, the reward will be reduced. By computing the set , the is the minimum job load for reward equal to . If we further decrease the job load, the reward will be reduced.
We consider the set of job values , a threshold value , an order-preserving function and the sets , . Then
Proof: We proceed by contradiction. Let us assume that ; this implies that all jobs with values are assigned to workers with performance rates , respectively.
a) If we assume that then there exists such that , , which is a contradiction.
b) If we assume that then there exists such that , , which is a contradiction.
In the SSAP, the distribution of the random job values determines the subintervals of the job-value domain used for the optimal sequential assignment. In SSTAP, the optimal sequential assignment algorithm does not depend on the distributions of the job values. Taking advantage of this, we can look for a probability distribution of the i.i.d. random job values that maximizes the reward function (6) for the maximum and minimum job load, respectively.
We take , for a small positive number . Given that we use the optimal policy, the probability mass function then maximizes the reward function. In case we have equal values , . However, we are interested in continuous probability distributions. To this end, we approximate the probability mass function by a sum of Gaussian distributions with mean values , very small standard deviations, and peak value at .
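The exact expression is not reproduced here, so the following is a minimal sketch under one assumption. Each atom $m_k$ of the probability mass function, with mass $q_k$, is replaced by the weighted Gaussian density $q_k\,\mathcal{N}(m_k,\sigma^2)$. The resulting density integrates to one. Each peak is approximately $q_k/(\sigma\sqrt{2\pi})$ when the atoms are well separated. As $\sigma\to0$, the mixture converges in distribution to the original probability mass function. The atoms, the masses and $\sigma$ below are hypothetical.

```python
import math
from typing import Callable, Sequence


def gaussian_sum_density(
    support: Sequence[float], probs: Sequence[float], sigma: float
) -> Callable[[float], float]:
    """Density that replaces each atom support[k] (mass probs[k]) of a pmf by the
    weighted Gaussian probs[k] * N(support[k], sigma^2)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return sum(
            q * norm * math.exp(-0.5 * ((x - m) / sigma) ** 2)
            for m, q in zip(support, probs)
        )

    return density


# Hypothetical pmf with three equally likely atoms, and a small sigma.
support, probs, sigma = [0.2, 0.5, 0.8], [1 / 3, 1 / 3, 1 / 3], 0.01
g = gaussian_sum_density(support, probs, sigma)

# Riemann-sum check that the approximation integrates to about 1 over [0, 1].
h = 1e-4
total = sum(g(k * h) for k in range(int(round(1 / h)) + 1)) * h
print(round(total, 4))
```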
5 Multiple Levels SSTAP
The screening of airline passengers is performed in multiple levels. This fact motivates the study of the multiple levels SSTAP, a generalization of SSTAP in which workers are organized into multiple levels. In SSTAP with -levels, each level is operated by a subset of the workers. In the level, , there are workers with performance rates , threshold and order-preserving function . The reward function of the level is given by
For an arriving job , the optimal policy is applied at level . If the job is rejected in level , it proceeds to level . The process continues until the last level is reached; a job rejected there is never assigned. The multiple levels SSTAP is characterized by a priority property from lower to higher levels: if a job can be assigned to a worker of the level , then it must be assigned there without proceeding to the next levels. The goal of the multiple levels SSTAP is to maximize the total reward under the priority property. We provide the theorem that describes the optimal assignment policy for the -level SSTAP.
The optimal assignment policy for the -level SSTAP, is the vector , where is the optimal policy given by the algorithm described in Theorem 3.2 for the level of the problem, .
Proof: Under the priority property, we apply the policy described in Theorem 3.2 at each level and obtain the vector of the optimal assignment policy for the -levels problem . If a job fails to be assigned at its current level, it moves to the next level. If a job fails to be assigned at the final level, it is rejected.
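The level-by-level rule can be sketched by running the single-level greedy rule at each level and passing rejected jobs down. The worker split, $f(x,p)=p-x$ and $\tau$ are hypothetical, as is the function name `multilevel_sstap`. The example also illustrates the lemma that follows. When the stronger worker sits in the first level, the priority property forces the first, low-risk job onto it. As a result, the two-level system serves fewer jobs than a single level containing the same workers.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# One level: (performance rates, threshold function, threshold).
Level = Tuple[Sequence[float], Callable[[float, float], float], float]


def multilevel_sstap(
    jobs: Sequence[float], levels: Sequence[Level]
) -> Tuple[int, List[Optional[Tuple[int, int]]]]:
    """Apply the single-level greedy rule level by level (priority property):
    a job moves to level k+1 only if it is rejected at level k."""
    available = [set(range(len(rates))) for rates, _, _ in levels]
    placement: List[Optional[Tuple[int, int]]] = []  # (level, worker) or None
    for x in jobs:
        spot = None
        for k, (rates, f, tau) in enumerate(levels):
            feasible = [w for w in available[k] if f(x, rates[w]) >= tau]
            if feasible:
                w = min(feasible, key=lambda i: (f(x, rates[i]), i))
                available[k].remove(w)
                spot = (k, w)
                break  # priority property: stop at the first level that serves the job
        placement.append(spot)
    return sum(s is not None for s in placement), placement


def f(x: float, p: float) -> float:
    """Hypothetical order-preserving threshold function."""
    return p - x


jobs = [0.1, 0.7]
two_levels = [([0.9], f, 0.1), ([0.5], f, 0.1)]  # hypothetical split of two workers
one_level = [([0.9, 0.5], f, 0.1)]
print(multilevel_sstap(jobs, two_levels)[0], multilevel_sstap(jobs, one_level)[0])  # 1 2
```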
The following lemma states that the partition of workers into multiple levels under the priority property may result in a smaller reward compared to the single level case.
Consider the -levels SSTAP, , with the same order-preserving function in every level, , and the same threshold , . We consider the induced single-level SSTAP with reward function
For the total reward of the -levels SSTAP under the optimal policy and the reward of the induced -level SSTAP under the optimal policy , it holds
where the last inequality comes from the optimality of .
The multiple levels SSTAP allows us to organize the workers into groups according to their performance rates. For example, in a three-level SSTAP, we can follow the ordering indicated by the order-preserving function . We place the first of the workers in the first level, the next in the second level, and the remaining in the third level.
6 SSTAP with Random Performance Rates
We extend result (2.6) to SSTAP. The new problem is called the doubly stochastic sequential threshold assignment problem (DSSTAP). The reward function we maximize is
Case I: We consider i.i.d. job values that follow the distribution , and worker performance rates, not i.i.d., that follow the distribution . The reward function is independent of the policy we apply:
Case II: We consider job values, not i.i.d., that follow the distribution , and worker performance rates, not i.i.d., that follow the distribution . The reward function depends on the policy we apply. The assignment problem reduces to a maximum weight matching of the bipartite graph between the disjoint sets and , with edge weights . The maximum weight bipartite matching can be computed with the Hungarian algorithm in $O(n^3)$ time for $n$ jobs and $n$ workers.
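As a sketch of Case II, below is a compact version of the Hungarian (Kuhn-Munkres) algorithm. It uses the widely used formulation with row and column potentials and shortest augmenting paths, and runs in $O(n^2m)$ time, i.e., $O(n^3)$ for a square matrix. The algorithm minimizes total cost, so the weights are negated to obtain a maximum weight matching. The edge weights below are hypothetical placeholders for the weights of Case II. The function names are illustrative.

```python
from typing import List, Sequence


def hungarian_min_cost(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost assignment for an n x m cost matrix with n <= m.
    Returns col, where col[i] is the column assigned to row i (0-based)."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-based)
    v = [0.0] * (m + 1)  # column potentials (1-based)
    p = [0] * (m + 1)    # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)  # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    col = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            col[p[j] - 1] = j - 1
    return col


def max_weight_matching(weights: Sequence[Sequence[float]]) -> List[int]:
    """Maximum weight perfect matching of a complete bipartite graph (jobs x workers)."""
    return hungarian_min_cost([[-w for w in row] for row in weights])


# Hypothetical edge weights w[j][i] between job j and worker i.
w = [[0.9, 0.2, 0.4],
     [0.8, 0.1, 0.2],
     [0.1, 0.7, 0.6]]
match = max_weight_matching(w)
print(match)  # [2, 0, 1]
```

In practice, one might instead call SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`, which solves the same rectangular assignment problem.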
We introduce SSTAP, a variation of SSAP defined using indicator functions that capture threshold constraints. We prove the optimality of an assignment policy that is independent of the number of jobs. Via an illustrative example, we show that SSTAP can accurately model aviation security problems. We provide a performance analysis of systems that use the SSTAP optimal policy for their sequential assignment problems. Finally, we analyze the multiple levels SSTAP and study the SSTAP with random performance rates.
Acknowledgement: I would like to thank Professor Sheldon Howard Jacobson (Department of Computer Science, University of Illinois at Urbana-Champaign) for the insightful discussions and for providing the research questions addressed in this paper.
- Nikolaev, Alexander G., and Sheldon H. Jacobson. "Stochastic sequential decision-making with a random number of jobs." Operations Research 58.4-part-1 (2010): 1023-1027.
- Lee, Adrian J., and Sheldon H. Jacobson. "Sequential stochastic assignment under uncertainty: estimation and convergence." Statistical Inference for Stochastic Processes 14.1 (2011): 21-46.
- Baharian, Golshid, and Sheldon H. Jacobson. "Stochastic sequential assignment problem with threshold criteria." Probability in the Engineering and Informational Sciences 27.3 (2013): 277-296.
- Baharian, Golshid, and Sheldon H. Jacobson. "Limiting behavior of the target-dependent stochastic sequential assignment problem." Journal of Applied Probability 51.4 (2014): 943-953.
- Albright, S. Christian. "Optimal sequential assignments with random arrival times." Management Science 21.1 (1974): 60-67.
- Billingsley, Patrick. "Probability and Measure." John Wiley and Sons, Inc., New York (1979).
- Derman, Cyrus, Gerald J. Lieberman, and Sheldon M. Ross. "A sequential stochastic assignment problem." Management Science 18.7 (1972): 349-355.
- Derman, Cyrus, Gerald J. Lieberman, and Sheldon M. Ross. "A stochastic sequential allocation model." Operations Research 23.6 (1975): 1120-1130.
- Hardy, G. H., J. E. Littlewood, and G. Polya. "Inequalities." Cambridge University Press (1934).
- Kennedy, D. P. "Optimal sequential assignment." Mathematics of Operations Research 11.4 (1986): 619-626.
- Nakai, Toru. "A sequential stochastic assignment problem in a partially observable Markov chain." Mathematics of Operations Research 11.2 (1986): 230-240.
- Prastacos, Gregory P. "Optimal sequential investment decisions under conditions of uncertainty." Management Science 29.1 (1983): 118-134.
- Righter, Rhonda. "The stochastic sequential assignment problem with random deadlines." Probability in the Engineering and Informational Sciences 1.2 (1987): 189-202.
- Sakaguchi, M. "A sequential stochastic assignment problem with an unknown number of jobs." Mathematica Japonica 31 (1984): 141-152.
- Van Hentenryck, Pascal, Russell Bent, and Yannis Vergados. "Online stochastic reservation systems." International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming. Springer, Berlin, Heidelberg (2006).
- Khatibi, Arash, et al. "The sequential stochastic assignment problem with random success rates." IIE Transactions 46.11 (2014): 1169-1180.
- Baharian, Golshid, Arash Khatibi, and Sheldon H. Jacobson. "Sequential stochastic assignment problem with time-dependent random success rates." Journal of Applied Probability 53.4 (2016): 1052-1063.
- Khatibi, Arash, and Sheldon H. Jacobson. "Doubly stochastic sequential assignment problem." Naval Research Logistics (NRL) 63.2 (2016): 124-137.
- Nikolaev, Alexander G., Sheldon H. Jacobson, and Laura A. McLay. "A sequential stochastic security system design problem for aviation security." Transportation Science 41.2 (2007): 182-194.
- McLay, Laura A., Sheldon H. Jacobson, and Alexander G. Nikolaev. "A sequential stochastic passenger screening problem for aviation security." IIE Transactions 41.6 (2009): 575-591.
- Lee, Adrian J., Laura A. McLay, and Sheldon H. Jacobson. "Designing aviation security passenger screening systems using nonlinear control." SIAM Journal on Control and Optimization 48.4 (2009): 2085-2105.
- Nikolaev, Alexander G., Adrian J. Lee, and Sheldon H. Jacobson. "Optimal aviation security screening strategies with dynamic passenger risk updates." IEEE Transactions on Intelligent Transportation Systems 13.1 (2012): 203-212.
- Parkes, David C., and Satinder P. Singh. "An MDP-based approach to online mechanism design." Advances in Neural Information Processing Systems (2004).
- West, Douglas B. "Introduction to Graph Theory." 2nd ed., Prentice Hall (2000).
- Hajek, Bruce. "Random Processes for Engineers." Cambridge University Press, Cambridge, United Kingdom (2015).