Discriminative Data-driven Self-adaptive Fraud Control Decision System with Incomplete Information
Abstract
While E-commerce has been growing explosively and online shopping has become popular and even dominant in the present era, online transaction fraud control has drawn considerable attention in business practice and academic research. Conventional fraud control considers mainly the interactions of the two major decision parties involved, i.e. merchants and fraudsters, to make fraud classification decisions, without paying much attention to the dynamic looping effect arising from the decisions made by other profit-related parties. This paper proposes a novel fraud control framework that can quantify the interactive effects of decisions made by different parties and can adjust fraud control strategies using data analytics, artificial intelligence, and dynamic optimization techniques. Three control models, Naive, Myopic and Prospective Controls, were developed based on the availability of data attributes and levels of label maturity. The proposed models are purely data-driven and self-adaptive in a real-time manner. The field test on real Microsoft online transaction data suggested that the new systems could sizably improve the company’s profit.
Keywords:
E-commerce, transaction fraud risk, optimal control, artificial intelligence, data-driven decision support, incomplete information.
1 Introduction
As E-commerce has grown explosively in recent years, many merchants have been providing centralized platforms for consumers to buy products with “One-Click”. Although online (card-not-present) transactions offer the great benefit of consumer convenience, they also increase the risk of transaction fraud. As a result, merchants unavoidably have to devote many resources to developing an effective and efficient mechanism for fraud detection and transaction risk control. These control systems usually consist of two core engines: a risk scoring engine and a risk control engine.
The risk scoring engine is designed to measure the risk level of each transaction. Instead of assigning a transaction an explicit 0-1 (legitimacy/fraud) classification, the majority of merchants calculate a risk score for each transaction based on its attributes, such as purchase price, order quantity, payment information, product market, etc. The higher the score of a transaction, the more likely it is to be fraudulent. With the help of big data and machine learning technologies, modern scoring models have been significantly improved using streaming historical data.
The risk control engine gets involved once a risk score is calculated. Some transactions that violate predetermined policies or rules are rejected instantly. These rules and policies stem from regulations set by governments and merchants, or are needed when obvious frauds require an immediate block. However, the majority of frauds are not caught by these rules, so the risk control engine needs to step in and prevent further fraudulent transactions using the risk scores. Conventional risk controls apply static risk score cutoff thresholds: approve transactions with risk scores lower than the low threshold; reject transactions with scores higher than the high threshold; and use human intelligence (manual review) to further investigate transactions with risk scores in between. The cutoff thresholds are set so that the inline fraud detection system can optimally prevent fraudsters’ attacks. This threshold band method is widely applied by e-commerce merchants and financial institutions. Although risk score evaluation has improved significantly during the past few years, decisions based on risk scores are still not always reliable, for three main reasons: 1) rapid changes in fraudsters’ behavior patterns; 2) loss of fraud signals from rejected transactions; and 3) long data maturity lead time. Because of these issues, the conventional fraud control engine lacks flexibility and the capability of real-time self-adjustment, and hence cannot always provide the most accurate risk decisions.
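The static threshold band rule described above can be sketched in a few lines. This is only an illustration: the function name, the `Decision` enum and the numeric thresholds are hypothetical and are not taken from any production system.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REVIEW = "review"
    REJECT = "reject"


def threshold_band_decision(risk_score: float, low: float, high: float) -> Decision:
    """Static two-threshold rule: approve below `low`, reject above `high`,
    and send everything in between to manual review."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if risk_score < low:
        return Decision.APPROVE
    if risk_score > high:
        return Decision.REJECT
    return Decision.REVIEW


if __name__ == "__main__":
    # Hypothetical thresholds on a 0-100 score scale.
    for score in (10, 50, 90):
        print(score, threshold_band_decision(score, low=30, high=70).value)
```

Because `low` and `high` are fixed, every transaction with the same score receives the same decision regardless of its other attributes or of recent changes in the decision environment, which is the rigidity the paper sets out to remove.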
Our research motivation for this paper stemmed not only from the drawbacks of current fraud control systems but also from the broader view of the various risk control parties who contribute to the final decisions in different transaction flows. Merchants’ risk control decision making should not be isolated from the entire decision environment, where payment issuing banks and manual review teams make follow-up decisions that constitute the final decision on every transaction. Figure 1 depicts how a transaction is processed through different decision stations until it reaches its final decision.
When a transaction arrives, the risk scoring engine calculates its risk score based on all its associated features. The risk control engine then makes a decision (approval, rejection, or manual review) using some important attributes of this transaction, including its risk score. If the transaction is approved by the risk control engine, it is sent to the bank for the follow-up decision (a bank-authorized transaction is marked as Final Approval, and a bank-declined transaction is marked as Final Rejection). If the transaction is rejected by the risk control engine, it is directly marked as Final Rejection. If the transaction is neither approved nor rejected by the risk control engine, it is also sent to the bank first. Only if the bank authorizes the transaction does it reach the manual review (MR) agents for further investigation and its final decision (a transaction that is authorized by the bank and approved by MR is marked as Final Approval, and as Final Rejection otherwise). The blue box indicates the target of this research, and the grey boxes indicate the other decision-making parties involved.
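The routing just described can be written as a small function. This is a minimal sketch of the flow in Figure 1 as described in the text; the function name, argument names and string labels are hypothetical.

```python
def final_decision(engine_decision: str, bank_authorizes: bool, mr_approves: bool = False) -> str:
    """Combine the risk control engine, bank and manual review (MR) decisions.

    engine_decision: "approve", "review" or "reject" (risk control engine output).
    bank_authorizes: whether the bank authorizes the transaction.
    mr_approves: whether the MR agents approve it (only consulted for "review").
    """
    if engine_decision not in ("approve", "review", "reject"):
        raise ValueError(f"unknown engine decision: {engine_decision!r}")
    # A rejection by the risk control engine is final and never reaches the bank.
    if engine_decision == "reject":
        return "final_rejection"
    # Approved and review-bound transactions both go to the bank first.
    if not bank_authorizes:
        return "final_rejection"
    if engine_decision == "approve":
        return "final_approval"
    # Only bank-authorized transactions reach the MR agents.
    return "final_approval" if mr_approves else "final_rejection"


if __name__ == "__main__":
    print(final_decision("review", bank_authorizes=True, mr_approves=True))
```

Note that the bank decision sits between the engine and the MR agents, so a review decision can still end in rejection before any human looks at the transaction.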
Banks are regarded as a single decision party for simplicity. From the data, we found that when the risk control engine approved and submitted transactions that included more frauds (false negatives: wrongful approvals) to the banks, the banks sensed it, became more conservative and rejected more good transactions (false positives: wrongful rejections). The data also showed that when the risk control engine submitted transactions that included fewer frauds (true negatives: rightful approvals) to the manual review (MR) teams, these teams had a much harder time making accurate risk decisions, since fraud patterns were less prevalent and less recognizable. The interactions of the different decision parties, legitimate customers and fraudsters are illustrated in Figure 2.
Considering the high total dollar amount of e-commerce transactions taking place in such a rapidly changing risk decision environment, there is a strong need to design a fraud control engine that can overcome all the aforementioned challenges and optimize decision accuracy so that higher profit can be reached. In this paper, the proposed control framework is designed to achieve the following:

Adaptive learning: the proposed risk control engine is trained using streaming transaction records which might include some incomplete information such as the immature label, and it can adaptively recognize the new decision environment;

Discriminative control: instead of using static uniform cutoff thresholds, the proposed control system can assign an inline decision (Approve, Reject or Manual Review) in real time based on the attributes of each incoming transaction;

Data-driven: the risk control is entirely data-driven, which helps avoid unreliable, ad hoc, human-made hard-coded rules for risk decisions.
The field test on real Microsoft online transaction data suggested that the proposed control system could significantly improve the company’s profit by reducing the loss caused by inaccurate decisions (including both wrongful approvals and wrongful rejections).
The rest of this paper is structured as follows. In Section 2, previous research related to fraud control is outlined and the research gap is discussed. In Section 3, the Perfect State Dynamic Model is introduced with a rigorous mathematical formulation, and the intractability of the model is discussed. Three approximate dynamic control models are proposed in Section 4, and the test results of their performance are reported in Section 5. Section 6 concludes this paper.
2 Related Research
Research on online shopping fraud detection using machine learning started in the early 1990s, right after the emergence of E-commerce, and its major task was to evaluate the fraud risk levels of transactions. Fraud risk level was measured using risk scores, and thus research on risk scoring gained widespread attention. These scoring engines were based on neural networks [GhoshReilly1994FraudDetectionNN; Aleskerov1997FraudDetectionNN; Dorronsoro1997FraudDetectionNN], decision trees [mena2002FraudDetectionBook; Kirkos2007FraudDetectionDT; Sahin2013FraudDetectionDT], random forests [Bhattacharyya2011FraudDetectionRF], network approaches [APATE2015FraudDetectionNW] and deep neural networks [Kang2016FraudDetectionCNN]. Readers who are interested in this topic may also refer to [Review2011DSS] and references therein for other papers that discuss different scoring methods. Although current research acknowledges that fraud patterns keep changing and that fraud risk scores are not always reliable, no existing papers discuss how to optimally utilize these scores in fraud control operations. Moreover, data mining papers provide weak guidance for detailed operations, as a risk score is only a blurred expression of fraud. There is currently no literature demonstrating how to deal with transactions in the “gray zone”, where the risk score of a transaction is neither too low nor too high. Additionally, no literature has addressed the interactions of decisions made by multiple parties in transaction risk control. The main reason for this lack of literature is that e-commerce data are strictly confidential, and thus very limited access is granted for academic research. This paper fills the gap between transaction fraud evaluation and systematic risk control operations.
Dynamic control research started in the 1940s. We suggest [MDP1994] and [DP1995] for a comprehensive introduction to dynamic optimal control methods, as well as their applications in communication, inventory control, production planning, quality control, etc. In this research, we build on an important segment of dynamic control research, dynamic optimal control with incomplete information, as the main technical foundation, because fraud control systems can only obtain and utilize partially mature data for modeling. One related line of research is the Partially Observed Markovian Decision Process (POMDP). A POMDP is a sequential decision-making model that deals with inaccurate and incomplete observations of the system state or decision environment. It models or infers the transition probability matrix and the underlying relationship between the partially observed information and the true (fully observed) states. However, POMDPs bring significant computational challenges and often require carefully designed heuristic algorithms to achieve suboptimal solutions. Structural properties of the reward function and computational algorithms for POMDPs are available in [POMDP1973Finite], [POMDP1978InfiniteDiscount], [POMDP1980Platzman], [POMDP1989SolutionProcedureWhite], [POMDP1991WhiteSurvey], [POMDP1994AAAI94], [POMDP1998HeuristicWhite], [POMDP2004HeuriticWhite] and [POMDP2010Geometric]. Another related line of research in dynamic control with incomplete information is Adaptive Dynamic Programming (ADP). ADP assumes that perfect information is not known a priori and needs to be gradually learned from historical data or feedback signals of the dynamic system. ADP concepts date from the 1970s and became one of the core methods in reinforcement learning.
In this paper, we highlight only a number of papers addressing the Actor-Critic structure, the branch of ADP research that is closest in depth and breadth to our research. Readers who are interested in ADP should refer to [ADP2009Survey] for a comprehensive review of ADP with respect to theoretical developments as well as application studies. The Actor-Critic structure was first proposed in [ACD1983Sutton], which suggested an optimal control scheme that learns while improving. The Actor-Critic structure involves two steps: an actor applies an action to the environment and receives feedback from a critic; action improvement is then guided by the evaluation signal that is fed back. The decision environment feedback is recognized and reinforced after receiving feedback rewards using neural networks ([ADPNN1989Werbos], [NNcontrol1991MSW] and [NeuroDP1996]), probabilistic models (see [RL1998] for bandit, Monte Carlo and approximate MDP methods) and other stochastic models [SLO2009Cao]. There are two main challenges in solving ADP: (1) the curse of dimensionality: as the dimensions of the state space and action space become extremely high, a large amount of information must be stored, which makes the computational cost grow explosively [ADPCoD2009]; (2) the implicit form of objective functions: the reward/cost function in dynamic control does not have an explicit form and needs to be carefully approximated [OLADP2013]. Powell introduced several parametric approximation methods to mitigate the curse of dimensionality in [ADPCoD2009]. Powell et al. in [OLADP2013] proposed general dynamic control heuristics in ADP, including myopic control and lookahead control with different approximation schemes for the cost function and the decision environment transition probabilities, where the decision environment is learned using local search, regression or Bayesian methods in either an offline or an online fashion.
Our research is motivated by the current research gap in the risk management literature. The problem formulation in Section 3 is supported by the POMDP literature, and the heuristic solution algorithms are inspired by ideas from the ADP literature. We studied realistic issues in the fraud control domain, and adapted the general POMDP models and ADP heuristics to fit the structure of the fraud control problem. The model and algorithms proposed in this paper are not limited to transaction fraud control, and can be easily extended to other fraud control and defense applications in finance, healthcare, electrical systems, robotics, and homeland security.
3 Problem Formulation
In this section, we rigorously formulate the dynamic control model, assuming that the state information and the state transition information can be exactly characterized. However, the state information and the state transition probabilities in the perfect state model, called (Perfect) in Section 3.1, are not explicit and need to be approximated from incomplete streaming data. Section 3.2 discusses the challenges in solving the dynamic model.
3.1 Perfect State Dynamic Model
We first investigate the expected profit at the transaction level, which is the building block of the control system. Each transaction is described by its risk score, its profit margin and its costs (cost of goods, manual review costs, chargeback fine, etc.). The risk score has finite integral support with a fixed upper bound, and the margin and costs are real numbers. According to the system logic shown in Figure 1, the profits of approval, review and rejection of a transaction can be formulated as follows:
Here the review profit includes a unit labor cost for each manual review, and the indicator function of an event equals one when the event occurs and zero otherwise.
Because the risk score is a comprehensive evaluation of the risk level, estimated using thousands of transaction attributes, we assume that for any two transactions with the same risk score the interactive effects of the bank and MR are identical, which can be expressed in mathematical form as
(1) 
With Eq. (1), the expected profit of each risk operation on a transaction can be derived as
(2a)  
(2b)  
(2c) 
The G functions in Eqs. (2a)-(2c) are probabilities of different events given the risk score. “G function” is short for gold function, whose values represent profit-related probabilities associated with different risk decisions.
We further consider a realistic dynamic system, in which bank and MR decision behaviors change dynamically. We consider a discrete-time dynamic control model with an infinite time horizon. In each period a set of transactions occurs, and each element of this set consists of the risk score, margin and costs of one transaction in that period. The total number of transactions occurring in a period is a random variable. We can then formally define the dynamic control model as follows.

State space: the set of values of the five G functions at all risk scores. In each period, the state consists of the values of these five functions in that period.

Action space in each period: the set of feasible decision sequences for the transactions of that period. A feasible action sequence assigns one action to every transaction: for each transaction, the risk control engine can choose to approve, review or reject.

State transition probability matrix: each entry is the probability that the system moves from one state to another when a given action sequence is taken. We assume that this matrix is fixed but implicit throughout this paper.
Given the reward-to-go function at the beginning of each period, this stochastic dynamic model can be formulated with Bellman’s equation as
(Perfect) 
where future rewards are weighted by a discount factor, and the reward function can be formulated as
Throughout the paper, we assume that a finite number of transactions occurs in each period and that the reward of each transaction is bounded. Theorem 3.1 gives the conditions under which Model (Perfect) has a unique optimal solution.
Theorem 3.1.
If (1) the number of transactions occurring in each period is finite and the margin/loss from each transaction is bounded, and (2) the arrival process of transactions is stationary, then there exists an optimal profit satisfying
and there is a unique solution to this equation.
Theorem 3.1 is guaranteed by a contraction mapping argument and follows directly from Theorem 6.2.3 and Theorem 6.2.5 of [MDP1994].
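To illustrate the contraction mapping argument in general terms, the following sketch runs value iteration on a generic finite discounted Markov decision process. It is a toy example under stated assumptions, not Model (Perfect): the function name, the small reward and transition tables and the tolerance are all hypothetical. With bounded rewards and a discount factor below one, the Bellman operator is a contraction, so the iteration reaches the same fixed point from any starting values.

```python
def value_iteration(rewards, transitions, discount, initial=None, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman optimality operator until successive values agree.

    rewards[s][a]         : immediate reward of action a in state s
    transitions[s][a][s2] : probability of moving from s to s2 under action a
    """
    n_states = len(rewards)
    values = list(initial) if initial is not None else [0.0] * n_states
    for _ in range(max_iter):
        new_values = [
            max(
                rewards[s][a]
                + discount * sum(p * values[s2] for s2, p in enumerate(transitions[s][a]))
                for a in range(len(rewards[s]))
            )
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    return values


if __name__ == "__main__":
    # Hypothetical two-state, two-action example.
    R = [[1.0, 0.0], [0.0, 2.0]]
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
    print(value_iteration(R, P, discount=0.9))
    print(value_iteration(R, P, discount=0.9, initial=[100.0, -50.0]))
```

The two printed value vectors agree up to the tolerance, which is the uniqueness property that the theorem relies on. The difficulty discussed next is not the existence of this fixed point but the fact that the state, reward and transition inputs of the real problem are not available in explicit form.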
3.2 Incomplete Information and Intractability of (Perfect) Model
Although Theorem 3.1 provides solid guidance for finding the optimal control strategy, there are several issues in implementing Model (Perfect) in practice.

Exact state information is unavailable: State information, i.e. the G functions, can only be inferred using partially mature data, since the data maturity lead time is a latent random variable with a bounded range. We have no way to obtain the true time point of maturity for each transaction until the transaction is eventually marked as a chargeback. However, from analyzing the historical data we do know that after a known maximum number of periods the fraud status (having a chargeback or not) is fully mature;

Reward functions are not entirely exact: Reward functions, , are based on estimations of the G functions, and is not known a priori. Therefore, reward functions could vary due to the different estimated G functions and the different ;

Transition probability matrix does not have an explicit form: the state space has extremely high dimension (five G functions estimated at every risk score), and the action space grows exponentially with the number of transactions (with three possible actions per transaction, a period with n transactions has 3^n possible decision sequences).
The lag of data maturity and the curse of dimensionality make Model (Perfect) intractable. We therefore propose three approximate dynamic heuristics to obtain suboptimal control decisions. Details of these control algorithms are presented in Section 4.
All dynamic control heuristics require a base module that utilizes incomplete information, such as the mature old data and the partially mature recent data, to infer the future G functions. Data mining results suggest that correlations exist between the recent period’s partially mature chargeback rate and bank/MR behavior patterns. This implies that we should track the partially mature chargeback rate of the transaction portfolio in each period, so that the G functions can be properly calibrated. This agrees with business intuition in multi-party fraud control: if the bank and MR learn that recently received transactions have a high chargeback rate, they will become more conservative in their decision making, reducing the number of authorization/approval decisions to prevent more undesirable chargebacks. Two decision environment modules, the Current Environment Inference (CEI) module and the Future Environment Inference (FEI) module, are adopted from [GFunctionEstimation2018TKDE]. A discussion of these two modules is beyond the scope of the current paper; we refer readers to [GFunctionEstimation2018TKDE] for details. CEI and FEI use historical data to produce G function estimates, which contributes to the data-driven property of our risk control framework.
4 Dynamic Risk Control Algorithms
In this section, we propose three different dynamic risk control algorithms: Naive, Myopic and Prospective control. Naive control is the simplest heuristic algorithm and uses only the data that are fully mature at the beginning of the current period. Myopic control estimates the current decision environment using the CEI module with both mature and immature data. The most complex control model, Prospective control, further takes into account that the current decision influences not only the current profit but also the near-future profit. The three models are presented in Sections 4.1-4.3.
4.1 Naive control
Figure 3 depicts the decision flow of Naive control. At the beginning of each period, the decision engine uses the data that have already matured to estimate the G functions.
In each period, the estimated current state consists of the five estimated G functions, and a feasible action sequence assigns one of the three actions (approve, review or reject) to each transaction of the period. The Naive model disregards future effects. For the transactions taking place in a period, we need to solve the following model to obtain the action sequence.
(Naivet)  
Naive control repeats this procedure for each period. Theorem 4.1 states that (Naivet) can be solved by greedily choosing, for each incoming transaction, the decision option that yields the highest expected reward. Details of the Naive control policy are summarized in Algorithm 1.
Theorem 4.1.
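The optimal action sequence of (Naivet) can be obtained by the greedy algorithm, i.e. for each transaction of the period in turn, sequentially set its action to the option (approve, review or reject) with the highest expected profit under the estimated G functions.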
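The greedy rule can be sketched as follows. Because the Naive model holds the state estimate fixed within a period and ignores future effects, the period objective is a sum of per-transaction expected profits, so maximizing each term separately maximizes the sum. The expected-profit functions below are hypothetical stand-ins chosen only to make the example run; they are not the expressions of Eqs. (2a)-(2c), and the field names `score`, `margin` and `cost` are assumptions.

```python
from typing import Callable, Dict, List, Sequence

ProfitFn = Callable[[dict], float]


def greedy_actions(transactions: Sequence[dict], expected_profit: Dict[str, ProfitFn]) -> List[str]:
    """Assign to each transaction the action with the highest expected profit."""
    actions = []
    for txn in transactions:
        best = max(expected_profit, key=lambda action: expected_profit[action](txn))
        actions.append(best)
    return actions


# Hypothetical profit model: fraud probability grows linearly with a 0-100 score.
def _fraud_prob(txn: dict) -> float:
    return txn["score"] / 100.0


TOY_PROFIT: Dict[str, ProfitFn] = {
    "approve": lambda t: t["margin"] * (1 - _fraud_prob(t)) - t["cost"] * _fraud_prob(t),
    "review": lambda t: 0.8 * t["margin"] * (1 - _fraud_prob(t)) - 1.0,
    "reject": lambda t: 0.0,
}

if __name__ == "__main__":
    batch = [
        {"score": 5, "margin": 10.0, "cost": 50.0},
        {"score": 40, "margin": 10.0, "cost": 50.0},
        {"score": 95, "margin": 10.0, "cost": 50.0},
    ]
    print(greedy_actions(batch, TOY_PROFIT))
```

In the paper's setting the three profit functions would be re-estimated from mature data at the start of each period, while the selection step itself stays this simple.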
4.2 Myopic control
Figure 4 shows the decision flow of myopic control.
This control model is designed to resolve the pattern recognition lag caused by the delay of data maturity. We adopt the CEI module from [GFunctionEstimation2018TKDE] to infer the decision environment of the current period. Mathematically, CEI maps the matured G function trajectories and the partially mature chargeback rate to estimates of the G functions in the current period.
(CEI) 
where is calculated by
Then, for the transactions occurring in the period, the Myopic control model solves the following model to obtain the action sequence.
(Myopict)  
The CEI module is updated at the beginning of each period and (Myopict) is solved during each period to provide optimal control actions. Theorem 4.2 provides the theoretical guarantee that (Myopict) can be solved by the greedy method. Details of the Myopic control policy are summarized in Algorithm 2.
Theorem 4.2.
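The optimal action sequence of (Myopict) can be obtained by the greedy algorithm, i.e. for each transaction of the period in turn, sequentially set its action to the option (approve, review or reject) with the highest expected profit under the G functions estimated by CEI.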
4.3 Prospective control
Figure 5 depicts decision flow of prospective control.
The Prospective control model has a similar CEI module that can diminish the pattern recognition lag. In addition, the FEI module is adopted from [GFunctionEstimation2018TKDE] to estimate the change in the future decision environment caused by the actions taken in the current period. These environments are characterized by the G functions of the current and the next period. As in Myopic control, in each period we use the output of the CEI module as the state estimate, and the action space is unchanged. Unlike the previous two control models, however, Prospective control considers the future effects caused by current decisions: the action sequence influences the behavior patterns of the bank and MR in the next period. For the transactions occurring in the current period, we need to solve the following model to obtain the action sequence.
(Prospectivet)  
where the future term is weighted by a discount factor and is based on a reference future profit of the next period. A reference sample is bootstrapped from the mature control group in the mature data set in order to provide this reference future profit; the reference sample is a transaction set of fixed size. The FEI module includes two subprocedures:

Calculate the estimated chargeback rate of the current period: at a given time point during the period, suppose we have received a number of transaction requests and have assigned a decision to each of them; we can then estimate the chargeback rate of the period as
(3) where is the indicator function.

Predict the future G functions from the matured G function trajectories and the estimated full weekly chargeback rate. FEI is trained with mature data and
(FEI)
(Prospectivet) is hard to solve because of its high dimension and the non-analytic form of its future profit term. A similar greedy heuristic with real-time updates is introduced to obtain a suboptimal solution for (Prospectivet). This Real-time Greedy Heuristic (RGH) allows us to update the estimates of the current-period chargeback rate and the future G functions on the fly and to adjust our strategy within the period. Figure 6 illustrates the logic of RGH within a period.
Consider a decision time point in the current period at which a transaction occurs and the risk team needs to decide whether to approve, reject or manually review it. Suppose that a number of transactions have been observed from the starting point of the period up to the current decision point. We can then estimate the chargeback rate of the period if we approve, review or reject the current transaction using Eq. (4).
(4) 
We further estimate the expected reward of approval, review or rejection of this transaction. Note that the future effect is first averaged to a reward per transaction and then discounted by the discount factor.
(5a)  
(5b)  
(5c) 
and for ,
(6)  
where the chargeback rate estimates are calculated using Eq. (4), and the future G functions are derived by (FEI). RGH sequentially assigns to each incoming transaction the action that has the largest prospective reward. For each transaction in turn, we sequentially set
(ProspectiveRGH) 
The Prospective control algorithm is summarized in Algorithm 3.
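The structure of the Real-time Greedy Heuristic can be sketched as below. This is a schematic under strong assumptions: the three callables stand in for the immediate expected reward, the running chargeback rate estimate and the FEI-based future reward per transaction, and their toy definitions are hypothetical placeholders rather than the expressions of Eqs. (4)-(6).

```python
ACTIONS = ("approve", "review", "reject")


def realtime_greedy(transactions, immediate_reward, chargeback_estimate,
                    future_reward_per_txn, discount):
    """For each incoming transaction, pick the action with the largest
    prospective reward = immediate reward + discount * per-transaction future reward,
    where the future reward depends on the chargeback rate implied by the action."""
    history = []  # (transaction, chosen action) pairs seen so far in the period
    for txn in transactions:
        scores = {}
        for action in ACTIONS:
            tentative = history + [(txn, action)]
            cb_rate = chargeback_estimate(tentative)
            scores[action] = immediate_reward(txn, action) + discount * future_reward_per_txn(cb_rate)
        best = max(ACTIONS, key=lambda a: scores[a])
        history.append((txn, best))
    return [action for _, action in history]


# Toy components (assumptions for illustration only).
def toy_immediate_reward(txn, action):
    if action == "approve":
        return txn["margin"] * (1 - txn["p"]) - txn["cost"] * txn["p"]
    if action == "review":
        return -0.5
    return 0.0


def toy_chargeback_estimate(decisions):
    approved = [t["p"] for t, a in decisions if a == "approve"]
    return sum(approved) / len(approved) if approved else 0.0


def toy_future_reward(cb_rate):
    # Higher chargeback rates make downstream parties more conservative.
    return -100.0 * cb_rate


if __name__ == "__main__":
    batch = [{"p": 0.1, "margin": 10.0, "cost": 20.0}]
    for gamma in (0.0, 1.0):
        print(gamma, realtime_greedy(batch, toy_immediate_reward,
                                     toy_chargeback_estimate, toy_future_reward, gamma))
```

With a zero discount factor the heuristic reduces to the per-transaction greedy rule of the Myopic model; with a positive discount factor, a transaction that looks profitable in isolation can be declined because of its estimated effect on the next period.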
5 Field Tests on Microsoft E-commerce
Field tests were conducted to examine the performance of the three dynamic models. The testing data set was extracted from a subunit of the Microsoft E-commerce business. We sampled no more than 3% of total transactions as the testing data set. For transactions in the testing set, we recorded the decisions in our database while flipping all final rejections to final approvals, so that we could obtain unbiased chargeback signals for model training and profit calculation. We set the length of a testing period to one week and tested all dynamic control models in parallel with the current Microsoft inline decision engine.
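The test-set construction described above can be sketched as follows. This is an illustrative reading of the procedure, not the actual experiment pipeline: the function names, the seed and the string labels are hypothetical.

```python
import random


def sample_test_set(transaction_ids, max_rate=0.03, seed=0):
    """Draw a random test set containing at most `max_rate` of all transactions."""
    ids = list(transaction_ids)
    k = int(len(ids) * max_rate)  # rounding down keeps the share at or below max_rate
    return set(random.Random(seed).sample(ids, k))


def executed_decision(final_decision, in_test_set):
    """Record the model decision elsewhere, but execute test-set rejections as
    approvals so that their true chargeback outcome can be observed."""
    if in_test_set and final_decision == "final_rejection":
        return "final_approval"
    return final_decision


if __name__ == "__main__":
    test_ids = sample_test_set(range(1000))
    print(len(test_ids), executed_decision("final_rejection", in_test_set=True))
```

Letting rejected test transactions through is what makes the chargeback labels unbiased: without it, the outcome of a rejected transaction is never observed, which is the loss of fraud signals listed among the weaknesses of conventional control.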
Our data indicated that the maximum lead time for data maturity was , and the recent partially mature reference time was . The testing time window was 14 weeks, and the bank and MR decisions for each transaction were kept identical across the different control methods to ensure an apples-to-apples comparison. The historical data continued to mature as the testing time moved forward. For Naive and Myopic controls, the risk decision engine was updated weekly (the G functions and the (CEI) module were retrained at the beginning of each period). For Prospective control, the risk decision engine refreshed its belief of the current G functions, the (CEI) module and the (FEI) module once a week, while the estimates of the current-week chargeback rate and the future G functions were updated in real time. Due to Microsoft’s confidentiality requirements, the name of the E-commerce subunit is withheld, and this section only includes summarized feature values aggregated over the 14-week transaction period to demonstrate the usability of the Naive, Myopic and Prospective control models. The discount factor in the Prospective control model was tuned using fold cross validation at the beginning of the testing and was held at a fixed value, , throughout the 14-week testing period.
We first studied the numbers of the different risk control operations (approve, review, and reject) among all testing transactions. Figure 7 summarizes the counts of the risk control decisions made by Microsoft’s current decision engine and the three proposed dynamic control engines. Over the 14-week testing period, the dynamic control engines gradually captured the decision accuracy of the manual review group. All three models learned that manual review agents overly rejected non-fraud transactions, and thus started to cut the volume of transactions submitted to the manual review team. Figure 7(a) suggests that all three dynamic models approved more transactions than the current decision engine did. As shown later in Figure 8, these dynamic models also enhanced decision accuracy significantly: they not only approved more non-fraud transactions but also approved fewer fraud transactions. All three models suggest sending fewer transactions for manual review. Naive control aggressively decreased the review volume to only 10% of the review volume suggested by Microsoft’s current decision engine, while Myopic and Prospective control decreased the review volume more mildly, to roughly 30% of the original volume. As for rejections, Naive control increased the rejection volume by about 12%, while Myopic and Prospective control decreased the rejection volume by 12.5% and 9% respectively. Figure 8 also shows that the Myopic and Prospective control models enhanced decision accuracy by rejecting far fewer non-fraud transactions but more fraud transactions.
A number of performance measures were used to validate the decision quality of a risk control engine. First, we investigated decision quality by comparing the losses caused by wrong decisions. Two common performance metrics for this are false negative (FN) loss and false positive (FP) loss. FN loss measures the total loss from approving fraud transactions (wrongful approvals), which consists of the cost of goods and all chargeback-related fees. FP loss, on the other hand, measures the total loss from rejecting non-fraud transactions (wrongful rejections), and it includes all the margins that should have been earned but were not. We then checked the manual review (MR) cost, which is the total labor cost of the human review team. We found that when the risk engine submitted transactions that included fewer frauds (true negatives: rightful approvals) to the manual review teams, these teams had a much more difficult time making accurate risk decisions, since fraud patterns were less prevalent and less recognizable. Therefore, as more transactions are sent to manual review teams, not only do labor costs rise, but decision accuracy is also likely to become less stable. Figure 8 summarizes the aggregated improvements in FN loss, FP loss and MR cost on the selected testing data set.
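The three measures can be computed from per-transaction outcomes as in the sketch below. The record layout (`Outcome` and its field names) and the unit review cost are assumptions made for illustration; only the definitions of FN loss, FP loss and MR cost follow the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Outcome:
    final_approved: bool      # final decision after engine, bank and MR
    is_fraud: bool            # mature fraud label (chargeback or not)
    margin: float             # profit margin of the order
    cost_of_goods: float
    chargeback_fees: float
    manually_reviewed: bool


def decision_losses(outcomes: Iterable[Outcome], review_unit_cost: float) -> Dict[str, float]:
    """FN loss: goods and chargeback fees lost on approved frauds.
    FP loss: margins forgone on rejected non-fraud orders.
    MR cost: unit labor cost times the number of reviewed transactions."""
    outcomes = list(outcomes)
    fn_loss = sum(o.cost_of_goods + o.chargeback_fees
                  for o in outcomes if o.final_approved and o.is_fraud)
    fp_loss = sum(o.margin for o in outcomes if not o.final_approved and not o.is_fraud)
    mr_cost = review_unit_cost * sum(1 for o in outcomes if o.manually_reviewed)
    return {"fn_loss": fn_loss, "fp_loss": fp_loss, "mr_cost": mr_cost}


if __name__ == "__main__":
    sample = [
        Outcome(True, True, 10.0, 100.0, 25.0, False),    # wrongful approval
        Outcome(False, False, 10.0, 100.0, 25.0, False),  # wrongful rejection
        Outcome(True, False, 10.0, 100.0, 25.0, True),    # rightful approval after review
        Outcome(False, True, 10.0, 100.0, 25.0, True),    # rightful rejection after review
    ]
    print(decision_losses(sample, review_unit_cost=2.0))
```

Computing the same three totals for each control method on an identical transaction set gives the kind of side-by-side comparison summarized in Figure 8.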
Figure (a)a shows the fact that all three dynamic control methods made better ”approval” decisions by producing fewer FN losses. Naive, Myopic and Prospective control model decreases FN loss by 8.48%, 7.32%, and 7.55% respectively. Figure (b)b suggests that Naive control model is relatively aggressive which rejected more nonfraud transactions and yielded 9.49% more FP losses. Meanwhile, Myopic and Prospective control mildly decrease FP loss by 4.73% and 3.05% respectively, and these two dynamic control methods make more correct rejections. As mentioned earlier, all dynamic decision engine found that MR had limited accuracy in detecting fraud. In this way, Naive, Myopic and Prospective control model deceased transactions submit for review by 93.0%, 64.2%, and 64.7%.
Second, we compared the total profits and total chargeback rates of the three dynamic control methods against Microsoft’s current risk control method. Delivering higher profit is the ultimate goal of business operations. On the other hand, the risk control team also needs to ensure that the new dynamic control methods do not escalate the chargeback rate for merchants. We therefore need to ensure that the proposed dynamic control methods produce higher profit without increasing (and ideally while lowering) the chargeback rate.
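                                        Naive          Myopic          Prospective
Profit improvement on testing set       + $79,962      + $97,863       + $96,693
Estimated annual profit improvement     + $9,900,071   + $12,116,318   + $11,971,568
Relative decrease in chargeback rate    0.72%          1.64%           2.98%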
Table 1 summarizes the improvements in overall profit and chargeback rate. The first row reports the profit improvement on the testing set, i.e. the additional profit of each dynamic control model over Microsoft’s current risk control. The second row extrapolates the testing-set improvement to an estimated annual improvement on the selected subunit. The third row reports the relative difference in chargeback rate, expressed as a proportion of Microsoft’s current chargeback rate.
Over the 14-week testing period, Naive control contributed $79,962 more profit on the testing portfolio while maintaining a chargeback rate similar to that of the current Microsoft risk decision engine: it decreased the chargeback rate by only 0.72% of Microsoft’s current chargeback rate. Myopic control contributed the largest profit improvement, $97,863 on the testing set, while decreasing the chargeback rate by a relative 1.64%. Finally, Prospective control produced $96,693 more profit on the testing transaction set and provided the largest improvement in chargeback rate, a decrease of 2.98%. By extrapolation, the estimated annual improvements for Naive, Myopic and Prospective control on the selected subunit were $9,900,071, $12,116,318, and $11,971,568 respectively.
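The text does not state the extrapolation rule, but the reported figures allow a simple consistency check. The sketch below computes the ratio of each annual estimate to the corresponding testing-set gain. It also includes one plausible reading of the relative chargeback difference, namely the decrease divided by the baseline rate. That formula, the function names, and the dictionary layout are assumptions made for illustration and are not taken from the paper.

```python
# Profit figures (USD) as reported for the three control models.
TEST_SET_GAIN = {"Naive": 79_962, "Myopic": 97_863, "Prospective": 96_693}
ANNUAL_GAIN = {"Naive": 9_900_071, "Myopic": 12_116_318, "Prospective": 11_971_568}


def implied_scaling_factors(test_gain, annual_gain):
    """Ratio of the annual estimate to the testing-set gain, per model."""
    return {name: annual_gain[name] / test_gain[name] for name in test_gain}


def relative_chargeback_decrease(baseline_rate, new_rate):
    """Decrease in chargeback rate as a fraction of the baseline rate.

    Assumed formula: (baseline - new) / baseline.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


factors = implied_scaling_factors(TEST_SET_GAIN, ANNUAL_GAIN)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}")
```

The three ratios agree to two decimal places (about 123.81), which suggests that the annual estimates were obtained by applying a single common multiplier to the testing-set gains, so the ranking of the three models is unchanged by the extrapolation.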
We conclude this section with a few business takeaways. All three models have the potential to significantly improve company profit while slightly decreasing chargeback rates. All three dynamic models enhanced decision quality by decreasing FN losses and MR costs, and Myopic and Prospective control also decreased FP losses. Although the Naive control model was relatively aggressive in rejecting transactions, Myopic and Prospective control made better rejection decisions by rejecting fewer non-fraud transactions. All three dynamic methods performed well in approving more non-fraud transactions and rejecting more fraudulent transactions. The artificial intelligence modules in these dynamic control models were well developed and outperformed human review agents on most of the fraud decisions. Manual review volumes decreased as expected, and hence MR labor costs were reduced significantly.
6 Conclusion and Future Study
To minimize ad hoc human-made decisions and to improve the accuracy and robustness of risk decision making, we investigated how to reach the optimal action when complete information is available. We defined our problem rigorously, characterized all profit-related components in the current system, and investigated decision interactions among three different decision-making parties. We acknowledged that perfect information is unavailable in reality, and thus we designed three data-driven dynamic optimal control models: Naive control, Myopic control, and Prospective control. These control models are 100% data-driven and self-trained/adapted in real time. As demonstrated, these dynamic control models helped increase profit significantly by reducing false negative loss, false positive loss, and manual review costs while employing incomplete information, including long-term and short-term mature and partially-mature data. Meanwhile, the proposed control models also slightly lowered chargeback rates, as desired. The field test on a subunit of Microsoft E-commerce suggested that the discriminative dynamic control models had better fraud detection performance than the current general score cutoff control.
The research proposed in this paper can contribute to both theoretical and applied research on fraud detection for systems that face incomplete information and decision looping effects due to multiple decision parties. Its application is not limited to financial risk systems; it can also be used for application and research in cybersecurity, homeland security, contagious disease screening, etc. Our future research will include information sharing and information fusion. We will extend the current research to more complex and realistic settings, where information sources are shared at different levels among different risk control decision parties.
Acknowledgments and Funding Sources
This research was supported by Microsoft, Redmond, WA. The authors are thankful to researchers and members from Microsoft Knowledge and Growth group for providing data and their knowledge of the system.
References
 (1) S. Ghosh, D. L. Reilly, Credit card fraud detection with a neural-network, in: Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences, Vol. 3, IEEE, 1994, pp. 621–630.
 (2) E. Aleskerov, B. Freisleben, B. Rao, Cardwatch: a neural network based database mining system for credit card fraud detection, in: Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering, IEEE, 1997.
 (3) J. R. Dorronsoro, F. Ginel, C. Sanchez, C. S. Cruz, Neural fraud detection in credit card operations, IEEE Transactions on Neural Networks 8 (4) (1997) 827–834.
 (4) J. Mena, Investigative data mining for security and criminal detection, Butterworth-Heinemann, 2002.
 (5) E. Kirkos, C. Spathis, Y. Manolopoulos, Data mining techniques for the detection of fraudulent financial statements, Expert Systems with Applications 32 (4) (2007) 995–1003.
 (6) Y. Sahin, S. Bulkan, E. Duman, A cost-sensitive decision tree approach for fraud detection, Expert Systems with Applications 40 (2013) 5916–5923.
 (7) S. Bhattacharyya, S. Jha, K. Tharakunnel, J. C. Westland, Data mining for credit card fraud: a comparative study, Decision Support Systems 50 (2011) 602