# Discriminative Data-driven Self-adaptive Fraud Control Decision System with Incomplete Information

Junxuan Li, Yung-wen Liu, Yuting Jia, Jay Nanduri

Dynamics 365 Fraud Protection, Microsoft, Redmond, WA 98052

###### Abstract

E-commerce has grown explosively and online shopping has become popular, even dominant, so online transaction fraud control has drawn considerable attention in business practice and academic research. Conventional fraud control mainly considers the interactions of two major decision parties, i.e. merchants and fraudsters, when making fraud classification decisions, and pays little attention to the dynamic looping effect arising from the decisions made by other profit-related parties. This paper proposes a novel fraud control framework that can quantify the interactive effects of decisions made by different parties and can adjust fraud control strategies using data analytics, artificial intelligence, and dynamic optimization techniques. Three control models, Naive, Myopic and Prospective Controls, were developed based on the availability of data attributes and levels of label maturity. The proposed models are purely data-driven and self-adaptive in a real-time manner. A field test on real Microsoft online transaction data suggested that the new systems could sizably improve the company's profit.

###### keywords:
E-commerce, transaction fraud risk, optimal control, artificial intelligence, data-driven decision support, incomplete information.
journal: Decision Support Systems

## 1 Introduction

As E-commerce has grown explosively in recent years, many merchants have built centralized platforms where consumers can buy products with "One-Click". Although online (card-not-present) transactions offer great convenience to consumers, they also raise the risk of transaction fraud. As a result, merchants have to commit substantial resources to developing effective and efficient mechanisms for fraud detection and transaction risk control. These control systems usually consist of two core engines: a risk scoring engine and a risk control engine.

The risk scoring engine is designed to measure the risk level of each transaction. Instead of assigning each transaction an explicit 0-1 (legitimate - fraudulent) classification, the majority of merchants calculate a risk score for each transaction based on its attributes, such as purchase price, order quantity, payment information, product market, etc. The higher the score, the more likely the transaction is fraudulent. With the help of big data and machine learning technologies, modern scoring models have improved significantly by using streaming historical data.

The risk control engine gets involved once a risk score is calculated. Some transactions that violate predetermined policies or rules are rejected instantly. These rules and policies stem from regulations set by governments and merchants, or are needed when obvious frauds require an immediate block. However, the majority of frauds are not caught by these rules, so the risk control engine needs to step in and prevent further fraudulent transactions using the risk scores. Conventional risk controls apply static cut-off score thresholds: approve transactions with risk scores lower than the low score threshold; reject transactions with scores higher than the high score threshold; and use human intelligence (manual review) to further investigate transactions with risk scores in between. The cut-off thresholds are set so that the inline fraud detection system can optimally prevent fraudsters' attacks. This threshold band method is widely applied by e-commerce merchants and financial institutions. Although risk score evaluation has improved significantly during the past few years, decisions based on risk scores are still not always reliable, for three main reasons: 1) rapid changes in fraudsters' behavior patterns; 2) loss of fraud signals from rejected transactions; and 3) long data maturity lead time. Because of these issues, the conventional fraud control engine lacks the flexibility and the capability for real-time self-adjustment, and hence cannot always provide the most accurate risk decisions.

Our research motivation for this paper stemmed not only from the drawbacks of the current fraud control systems but also from the broader view of various risk control parties who contribute to the final decisions in different transaction flows. Merchants’ risk control decision making should not be isolated from the entire decision environment, where payment issuing banks and manual review teams make follow-up decisions that constitute the final decisions on every transaction. Figure 1 depicts how a transaction is processed through different decision stations until it reaches its final decision.

When a transaction arrives, the risk scoring engine calculates its risk score based on all its associated features. The risk control engine then makes a decision (approval, rejection, or manual review) using some important attributes of this transaction (including its risk score). If the transaction is approved by the risk control engine, it is sent to the bank for the follow-up decision (a bank-authorized transaction is marked as Final Approval, and a bank-declined transaction is marked as Final Rejection). If the transaction is rejected by the risk control engine, it is directly marked as Final Rejection. If the transaction is neither approved nor rejected by the risk control engine, it is also sent to the bank first. Only if the bank authorizes the transaction does it reach the manual review (MR) agents for further investigation and its final decision (a transaction that is authorized by the bank and approved by MR is marked as Final Approval, and is marked as Final Rejection otherwise). In Figure 1, the blue box indicates the target of this research, and the grey boxes indicate the other decision-making parties involved.

Banks are regarded as a single decision party for simplicity. From the data, we found that when the risk control engine approved and submitted transactions that included more frauds (false negative: wrongful approval) to the banks, the banks sensed it, became more conservative, and rejected more good transactions (false positive: wrongful rejection). Data also showed that when the risk control engine submitted transactions that included fewer frauds (true negative: rightful approval) to the manual review (MR) teams, the teams tended to have a much harder time making accurate risk decisions, since fraud patterns were less prevalent and less recognizable. Interactions of the different decision parties, legitimate customers and fraudsters are shown in Figure 2.

Considering the high total dollar amount of e-commerce transactions taking place in such a rapidly changing risk decision environment, there is a strong need to design a fraud control engine that can overcome all the aforementioned challenges and optimize decision accuracy so that higher profit can be reached. In this paper, the proposed control framework is designed to achieve the following:

1. Adaptive learning: the proposed risk control engine is trained using streaming transaction records which might include some incomplete information such as the immature label, and it can adaptively recognize the new decision environment;

2. Discriminative control: instead of using static uniform cut-off thresholds, the proposed control system can assign inline decision (Approve, Reject or Manual Review) in a real-time manner based on the attributes of each incoming transaction;

3. Data-driven: the risk control is entirely data-driven which helps avoid unreliable ad hoc human-made hard-coding rules on risk decisions.

The field test on Microsoft real online transaction data suggested that the proposed control system could significantly improve the company’s profit by reducing the loss caused by inaccurate decisions (including both wrongful approvals and wrongful rejections).

The rest of this paper is structured as follows. Section 2 outlines previous research related to fraud control and discusses the research gap. Section 3 introduces the Perfect State Dynamic Model with a rigorous mathematical formulation and then discusses its intractability. Three approximate dynamic control models are proposed in Section 4, and the test results of their performance are presented in Section 5. Section 6 concludes this paper.

## 2 Related Research

Research on online shopping fraud detection using machine learning started in the early 1990s, right after the emergence of E-commerce, and its major task was to evaluate the fraud risk levels of transactions. Fraud risk level was measured using risk scores, and thus research on risk scoring gained widespread attention. These scoring engines were based on neural networks GhoshReilly1994FraudDetectionNN (); Aleskerov1997FraudDetectionNN (); Dorronsoro1997FraudDetectionNN (), decision trees mena2002FraudDetectionBook (); Kirkos2007FraudDetectionDT (); Sahin2013FraudDetectionDT (), random forests Bhattacharyya2011FraudDetectionRF (), network approaches APATE2015FraudDetectionNW () and deep neural networks Kang2016FraudDetectionCNN (). Readers who are interested in this topic may also refer to Review2011DSS () and the references therein for other papers that discuss different scoring methods. Although current research acknowledges that fraud patterns keep changing and that fraud risk scores are not always reliable, no existing paper discusses how to optimally use these scores in fraud control operations. Data mining papers also provide weak guidance on detailed operations, as a risk score is only a blurred expression of fraud. There is currently no literature demonstrating how to deal with transactions in the "gray zone", where the risk score of a transaction is neither too low nor too high. Additionally, no literature has addressed the interactions of decisions made by multiple parties in transaction risk control. The main reason for this lack of literature is that e-commerce data are strictly confidential, so academic researchers are granted very limited access. This paper fills the gap between transaction fraud evaluation and systematic risk control operations.

Our research is motivated by the current gap in the risk management literature. The problem formulation in Section 3 is supported by the POMDP literature, and the heuristic solution algorithms are inspired by ideas in the ADP literature. We studied realistic issues in the fraud control domain, and adapted the general POMDP models and ADP heuristics to fit the structure of the fraud control problem. The model and algorithms proposed in this paper are not limited to transaction fraud control, and can be extended to other fraud control and defense applications in finance, healthcare, electrical systems, robotics, and homeland security.

## 3 Problem Formulation

In this section, we rigorously formulate the dynamic control model assuming that the state information and the state transition information can be exactly characterized. However, the state information and the state transition probabilities in the perfect state model, called (Perfect) in Section 3.1, are not explicit and need to be approximated from incomplete streaming data. Section 3.2 discusses the challenges in solving the dynamic model.

### 3.1 Perfect State Dynamic Model

We first investigate the expected profit at the transaction level, which is the building block of the control system. Let $s$, $m$ and $c$ denote the risk score, profit margin and costs (cost of goods, manual review costs, chargeback fine, etc.) of a transaction $w = (s, m, c)$, respectively. $s$ has a finite integer support with a fixed upper bound, and $m$, $c$ are real numbers. According to the system logistics shown in Figure 1, the profits of approval ($R_{app}$), review ($R_{rev}$) and rejection ($R_{rej}$) of this transaction can be formulated as follows:

$$
\begin{aligned}
R_{app}(w) &= \delta_w(\text{Bank Auth.} \cap \text{Non-fraud}) \cdot m - \delta_w(\text{Bank Auth.} \cap \text{Fraud}) \cdot c \\
R_{rev}(w) &= \delta_w(\text{Bank Auth.} \cap \text{MR App.} \cap \text{Non-fraud}) \cdot m - \delta_w(\text{Bank Auth.} \cap \text{MR App.} \cap \text{Fraud}) \cdot c - \delta_w(\text{Bank Auth.}) \cdot c_0 \\
R_{rej}(w) &= 0
\end{aligned}
$$

where $c_0$ is the unit labor cost for each manual review, and $\delta(\cdot)$ is the indicator function (written $\delta_w$ when evaluated for transaction $w$), i.e. given event $H$,

$$
\delta(H) = \begin{cases} 1 & \text{if } H \text{ is true;} \\ 0 & \text{if } H \text{ is false.} \end{cases}
$$

Given that the risk score is a comprehensive evaluation of the risk level, estimated using thousands of transaction attributes, we assume that for any two transactions that have the same risk score $s$, i.e. $w = (s, m, c)$ and $w' = (s, m', c')$, the interactive effects of bank and MR are identical, which can be expressed mathematically as

$$
\Pr(H \mid w) = \Pr(H \mid s) = \Pr(H \mid w') \qquad \text{(1)}
$$

With Eq. (1), the expected profit of each risk operation for transaction $w$ can be derived as

$$
\begin{aligned}
E[R_{app}(w)] &= \Pr(\text{Bank Auth.} \cap \text{Non-fraud} \mid s) \cdot m - \Pr(\text{Bank Auth.} \cap \text{Fraud} \mid s) \cdot c \\
&= g_1(s) \cdot m - g_2(s) \cdot c \qquad \text{(2a)} \\
E[R_{rev}(w)] &= \Pr(\text{Bank Auth.} \cap \text{MR App.} \cap \text{Non-fraud} \mid s) \cdot m - \Pr(\text{Bank Auth.} \cap \text{MR App.} \cap \text{Fraud} \mid s) \cdot c - \Pr(\text{Bank Auth.} \mid s) \cdot c_0 \\
&= g_3(s) \cdot m - g_4(s) \cdot c - g_5(s) \cdot c_0 \qquad \text{(2b)} \\
E[R_{rej}(w)] &= 0. \qquad \text{(2c)}
\end{aligned}
$$

The $g$-functions in Eq. (2a)-(2c) are probabilities of different events given risk score $s$. "$g$-function" is short for gold function, whose values represent profit-related probabilities associated with different risk decisions.
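As a minimal sketch of Eq. (2a)-(2c), the expected profit of each action can be computed directly from the five $g$-function values at a transaction's risk score. The container `GValues`, the function name `expected_profits` and all numbers below are illustrative assumptions, not values or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class GValues:
    """Values of the five g-functions at one risk score s (hypothetical container)."""
    g1: float  # Pr(bank auth. and non-fraud | s)
    g2: float  # Pr(bank auth. and fraud | s)
    g3: float  # Pr(bank auth. and MR approval and non-fraud | s)
    g4: float  # Pr(bank auth. and MR approval and fraud | s)
    g5: float  # Pr(bank auth. | s)


def expected_profits(g: GValues, m: float, c: float, c0: float) -> dict:
    """Expected profit of approve / review / reject, following Eq. (2a)-(2c)."""
    return {
        "app": g.g1 * m - g.g2 * c,              # Eq. (2a)
        "rev": g.g3 * m - g.g4 * c - g.g5 * c0,  # Eq. (2b)
        "rej": 0.0,                              # Eq. (2c)
    }


# Made-up inputs: margin 20, cost 100, manual review labor cost 2.
profits = expected_profits(
    GValues(g1=0.80, g2=0.10, g3=0.75, g4=0.02, g5=0.90), m=20.0, c=100.0, c0=2.0
)
print(profits)
```

With these assumed inputs, review has the highest expected profit because manual review is assumed to filter out most of the fraud mass at a small labor cost.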

We further delve into a realistic dynamic system, in which the decision behaviors of banks and MR change dynamically. We consider a discrete-time dynamic control model with an infinite time horizon, indexed by period $t$. Consider the set of transactions that occur during period $t$. Its elements $w_j^{(t)} = (s_j^{(t)}, m_j^{(t)}, c_j^{(t)})$ consist of the risk score $s_j^{(t)}$, margin $m_j^{(t)}$ and costs $c_j^{(t)}$ of the $j$th transaction in period $t$. Let $N^{(t)}$ be the total number of transactions that occur during period $t$, so $N^{(t)}$ is a random variable. We can then formally define the dynamic control model as follows.

• State space: the set of values of the five $g$-functions at all risk scores. In period $t$, the state can be expressed as $S^{(t)} = (g_1^{(t)}, g_2^{(t)}, g_3^{(t)}, g_4^{(t)}, g_5^{(t)})$.

• Action space in period $t$: $A^{(t)} = \{\text{app}, \text{rev}, \text{rej}\}^{N^{(t)}}$, which has $3^{N^{(t)}}$ feasible decision sequences. Let $a^{(t)} = (a_1^{(t)}, \dots, a_{N^{(t)}}^{(t)})$ be one feasible action sequence in period $t$; for the $j$th transaction, the risk control engine can choose action $a_j^{(t)} \in \{\text{app}, \text{rev}, \text{rej}\}$.

• State transition probability matrix: $Q$, where $Q_{S,S'}(a)$ is the probability that the system moves from state $S$ to state $S'$ when taking action sequence $a$. We assume that $Q$ is fixed but implicit throughout this paper.

Let $u(S^{(t)})$ be the reward-to-go function at the beginning of period $t$. This stochastic dynamic model can then be formulated with Bellman's equation as

$$
u(S^{(t)}) = \max_{a^{(t)} \in A^{(t)}} \left\{ E\left[ \sum_{j=1}^{N^{(t)}} R_{a_j^{(t)}}\big(w_j^{(t)}, S^{(t)}\big) \right] + \alpha \cdot \sum_{S^{(t+1)}} Q_{S^{(t)}, S^{(t+1)}}\big(a^{(t)}\big) \cdot u\big(S^{(t+1)}\big) \right\} \qquad \text{(Perfect)}
$$

where $\alpha$ is a discount factor of future rewards, and the reward function $R_a(w, S)$ takes the form of the expected profits in Eq. (2a)-(2c), with the $g$-functions given by state $S$.

Throughout the paper, we assume that a finite number of transactions occur in each period, and that the reward of each transaction is bounded. Theorem 3.1 gives the condition under which Model (Perfect) has a unique optimal solution.

###### Theorem 3.1.

If (1) the number of transactions that occur in each period is finite and the margin/loss from each transaction is bounded, and (2) the arrival process of transactions is stationary, then there exists an optimal profit $u^*$ satisfying

$$
u^*(S) = \max_{a} \left\{ E\left[ \sum_{j=1}^{N} R_{a_j}(w_j, S) \right] + \alpha \cdot \sum_{S'} Q_{S,S'}(a) \cdot u^*(S') \right\},
$$

and there is a unique solution to this equation.

Theorem 3.1 follows from a contraction mapping argument and directly from Theorem 6.2.3 and Theorem 6.2.5 of MDP1994 ().
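The contraction mapping argument can be illustrated on a toy problem. The sketch below runs value iteration on a small, fully known two-state, two-action MDP with a made-up reward table and transition matrix. It is an assumption-laden illustration of why the Bellman equation has a unique fixed point, and does not represent the fraud control state space, which is far too large for this approach.

```python
def bellman_update(u, rewards, trans, alpha):
    """Apply the Bellman operator once on a finite toy MDP.

    rewards[s][a] is the one-period reward; trans[s][a][s2] is the
    probability of moving from state s to state s2 under action a.
    """
    n_states = len(u)
    new_u = []
    for s in range(n_states):
        best = max(
            rewards[s][a]
            + alpha * sum(trans[s][a][s2] * u[s2] for s2 in range(n_states))
            for a in range(len(rewards[s]))
        )
        new_u.append(best)
    return new_u


def value_iteration(u0, rewards, trans, alpha, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator until successive values differ by < tol."""
    u = list(u0)
    for _ in range(max_iter):
        new_u = bellman_update(u, rewards, trans, alpha)
        if max(abs(x - y) for x, y in zip(new_u, u)) < tol:
            return new_u
        u = new_u
    return u


# Toy inputs (illustrative only).
rewards = [[1.0, 0.5], [0.0, 2.0]]
trans = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
alpha = 0.9

# Two very different starting points converge to the same fixed point.
u_a = value_iteration([0.0, 0.0], rewards, trans, alpha)
u_b = value_iteration([100.0, -50.0], rewards, trans, alpha)
print(u_a, u_b)
```

Because the discount factor is below one, each update shrinks the distance between any two value vectors, which is why both runs end at the same solution.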

### 3.2 Incomplete Information and Intractableness of (Perfect) Model

Although Theorem 3.1 provides solid guidance for finding the optimal control strategy, there are several issues in implementing Model (Perfect) in practice.

1. Exact state information is unavailable: State information, i.e. the $g$-functions, can only be inferred using partially mature data, since the data maturity lead time is a latent random variable with a bounded range. We have no way to obtain the true time point of maturity for each transaction until the transaction is eventually marked as a chargeback. However, from analyzing the historical data we do know that after $L$ periods the fraud status (having a chargeback or not) of all transactions should be mature;

2. Reward functions are not entirely exact: Reward functions, , are based on estimations of -functions, and is not known a priori. Therefore, reward functions could vary due the different estimated -functions and the different ;

3. Transition probability matrix does not have an explicit form: The state space has extremely high dimension (five $g$-functions estimated at every risk score); the action space has exponential dimension that grows explosively as the number of transactions increases ($A^{(t)}$ has $3^{N^{(t)}}$ possible decision sequences).
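As a rough illustration of the action-space growth in the last item, take an assumed period volume of $N^{(t)} = 100$ transactions (a made-up figure, not one reported in the paper):

$$
|A^{(t)}| = 3^{N^{(t)}} = 3^{100} \approx 5.15 \times 10^{47},
$$

so exhaustive enumeration of decision sequences is impractical even for a very small period.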

The lag of data maturity and the curse of dimensionality lead to the fact that Model (Perfect) is intractable. Thus we propose three approximate dynamic heuristics to obtain suboptimal control decisions. Details of these different control algorithms will be demonstrated in Section 4.

All dynamic control heuristics require a base module that uses incomplete information, such as the mature old data and the partially mature recent data, to infer the future $g$-functions. Data mining results suggest that correlations exist between a recent period's partially mature chargeback rate and bank/MR behavior patterns. This implies that we should track the partially mature chargeback rate of the transaction portfolio in period $t-l$, so that the $g$-functions can be properly calibrated. This agrees with business intuition in multi-party fraud control: if bank and MR learn that recently received transactions have a high chargeback rate, they become more conservative and reduce the number of authorization/approval decisions to prevent more undesirable chargebacks. Two decision environment modules, the Current Environment Inference (CEI) module and the Future Environment Inference (FEI) module, are adopted from GFunctionEstimation2018TKDE (). A discussion of these two modules is outside the scope of the current paper; we refer readers to GFunctionEstimation2018TKDE () for details. CEI and FEI use historical data to produce $g$-function estimates, which contributes to the data-driven property of our risk control framework.

## 4 Dynamic Risk Control Algorithms

In this section, we propose three different dynamic risk control algorithms: Naive, Myopic and Prospective control. Naive control is the simplest heuristic algorithm and only uses fully mature data, i.e. data that is at least $L$ periods old. Myopic control estimates the current decision environment using the CEI module with both mature data and partially mature recent data. The most complex control model, Prospective control, further takes into account that the current decision influences not only the current profit but also the near-future profit. The three models are presented in Sections 4.1 - 4.3.

### 4.1 Naive control

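Figure 3 depicts the decision flow of naive control. At the beginning of period $t$, the decision engine uses mature data up to period $t-L$ to estimate the $g$-functions.

In period $t$, let $\hat{S}^{(t)} = (g_1^{(t-L)}, g_2^{(t-L)}, g_3^{(t-L)}, g_4^{(t-L)}, g_5^{(t-L)})$ be the estimated current state, and $A^{(t)}$ be the action space of period $t$. A feasible action sequence then has the form $a^{(t)} = (a_1^{(t)}, \dots, a_{N^{(t)}}^{(t)})$, where $a_j^{(t)} \in \{\text{app}, \text{rev}, \text{rej}\}$. The Naive model disregards future effects. For transactions that take place in period $t$, we need to solve the following model to get the action sequence $a^{(t)*}$.

$$
\begin{aligned}
\max_{a^{(t)} \in A^{(t)}} \quad & E\left[ \sum_{j=1}^{N^{(t)}} \hat{R}_{a_j^{(t)}}\big(w_j^{(t)}\big) \right] \qquad \text{(Naive-t)} \\
\text{s.t.} \quad & E\big[\hat{R}_{app}(w_j^{(t)})\big] = g_1^{(t-L)}(s_j^{(t)}) \cdot m_j^{(t)} - g_2^{(t-L)}(s_j^{(t)}) \cdot c_j^{(t)}, \quad \forall j \\
& E\big[\hat{R}_{rev}(w_j^{(t)})\big] = g_3^{(t-L)}(s_j^{(t)}) \cdot m_j^{(t)} - g_4^{(t-L)}(s_j^{(t)}) \cdot c_j^{(t)} - g_5^{(t-L)}(s_j^{(t)}) \cdot c_0, \quad \forall j \\
& E\big[\hat{R}_{rej}(w_j^{(t)})\big] = 0, \quad \forall j \\
& A^{(t)} = \{\text{app}, \text{rev}, \text{rej}\}^{N^{(t)}}
\end{aligned}
$$

Naive control repeats this procedure for each period $t$. Theorem 4.1 states that (Naive-t) can be solved by greedily choosing, for each incoming transaction, the decision option that yields the highest expected reward. Details of the Naive control policy are summarized in Algorithm 1.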

###### Theorem 4.1.

The optimal action sequence of (Naive-t) can be obtained by the greedy algorithm, i.e. for $j = 1, \dots, N^{(t)}$, sequentially set

$$
a_j^{(t)*} = \arg\max_{a_j^{(t)} \in \{\text{app}, \text{rev}, \text{rej}\}} E\big[\hat{R}_{a_j^{(t)}}(w_j^{(t)})\big]
$$

###### Proof.

The rewards of different transactions are independent, and for period $t$, (Naive-t) can be decomposed into $N^{(t)}$ sub-maximization problems. Thus the greedy algorithm solves (Naive-t) exactly. ∎
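The decomposition used in the proof can be sketched in a few lines: pick the best action per transaction, then check on a small example that no full decision sequence does better. The table `G_TABLE`, the transactions and the helper names are hypothetical and chosen only for illustration. The same greedy rule applies to (Myopic-t) once the table holds CEI estimates instead of lagged mature values.

```python
import itertools

ACTIONS = ("app", "rev", "rej")


def expected_rewards(g_at_s, m, c, c0):
    """Expected reward of each action given the five g-values at score s."""
    g1, g2, g3, g4, g5 = g_at_s
    return {
        "app": g1 * m - g2 * c,
        "rev": g3 * m - g4 * c - g5 * c0,
        "rej": 0.0,
    }


def greedy_decisions(transactions, g_table, c0):
    """Choose, per transaction (s, m, c), the action with the largest expected reward."""
    decisions = []
    for s, m, c in transactions:
        rewards = expected_rewards(g_table[s], m, c, c0)
        decisions.append(max(ACTIONS, key=rewards.__getitem__))
    return decisions


def total_reward(transactions, decisions, g_table, c0):
    """Sum of expected rewards for one full decision sequence."""
    return sum(
        expected_rewards(g_table[s], m, c, c0)[a]
        for (s, m, c), a in zip(transactions, decisions)
    )


def brute_force_best(transactions, g_table, c0):
    """Best total reward over all 3^N sequences (only feasible for tiny N)."""
    return max(
        total_reward(transactions, seq, g_table, c0)
        for seq in itertools.product(ACTIONS, repeat=len(transactions))
    )


# Hypothetical g-values (g1..g5) at three risk scores, and three transactions (s, m, c).
G_TABLE = {
    10: (0.95, 0.01, 0.90, 0.005, 0.96),
    50: (0.70, 0.20, 0.65, 0.03, 0.90),
    90: (0.10, 0.60, 0.08, 0.30, 0.70),
}
TXNS = [(10, 20.0, 100.0), (50, 20.0, 100.0), (90, 20.0, 100.0)]

decisions = greedy_decisions(TXNS, G_TABLE, c0=2.0)
print(decisions)
```

In this made-up example the low-score transaction is approved, the mid-score one is sent to review, and the high-score one is rejected.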

### 4.2 Myopic control

Figure 4 shows the decision flow of myopic control.

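This control model is designed to resolve the pattern recognition lag caused by the delay of data maturity. We adopt the CEI module from GFunctionEstimation2018TKDE () to infer the decision environment of the current period. Mathematically, CEI maps the matured $g$-function trajectories ($g_1^{(t')}$, $g_2^{(t')}$, $g_3^{(t')}$, $g_4^{(t')}$ and $g_5^{(t')}$ over mature periods $t'$) and the partially mature chargeback rate $\rho_{PCB}^{(t-l)}$ to estimates of the $g$-functions ($\hat{g}_1^{(t)}, \dots, \hat{g}_5^{(t)}$) at the current period $t$.

$$
\begin{bmatrix}
\hat{g}_1^{(t)}(s) \\ \hat{g}_2^{(t)}(s) \\ \hat{g}_3^{(t)}(s) \\ \hat{g}_4^{(t)}(s) \\ \hat{g}_5^{(t)}(s)
\end{bmatrix}
=
\begin{bmatrix}
\hat{\Phi}_1^{(t)}\big(s, g_1^{(t')}(s), \rho_{PCB}^{(t-l)}\big) \\
\hat{\Phi}_2^{(t)}\big(s, g_2^{(t')}(s), \rho_{PCB}^{(t-l)}\big) \\
\hat{\Phi}_3^{(t)}\big(s, g_3^{(t')}(s), \rho_{PCB}^{(t-l)}\big) \\
\hat{\Phi}_4^{(t)}\big(s, g_4^{(t')}(s), \rho_{PCB}^{(t-l)}\big) \\
\hat{\Phi}_5^{(t)}\big(s, g_5^{(t')}(s), \rho_{PCB}^{(t-l)}\big)
\end{bmatrix}
\qquad \text{(CEI)}
$$

where $\rho_{PCB}^{(t-l)}$ is calculated by

$$
\rho_{PCB}^{(t-l)} = \frac{\#\text{ of transactions in week } t-l \text{ with a chargeback that occurred before week } t}{\#\text{ of finally approved transactions in week } t-l}.
$$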
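The partially mature chargeback rate is a simple ratio over one week's finally approved transactions. A minimal sketch follows, assuming a hypothetical record layout with keys `week`, `final_approved` and `chargeback_week` (`None` when no chargeback has been observed). The field names, the data, and the choice to return 0.0 for a week with no approvals are all assumptions made for illustration.

```python
def partial_chargeback_rate(transactions, week, now_week):
    """Share of finally approved transactions from `week` whose chargeback
    was already observed before `now_week`."""
    approved = [
        x for x in transactions if x["week"] == week and x["final_approved"]
    ]
    if not approved:
        return 0.0  # assumed convention for an empty week
    charged = sum(
        1
        for x in approved
        if x["chargeback_week"] is not None and x["chargeback_week"] < now_week
    )
    return charged / len(approved)


# Illustrative records: four approvals in week 8, one chargeback seen before week 10.
records = [
    {"week": 8, "final_approved": True, "chargeback_week": 9},
    {"week": 8, "final_approved": True, "chargeback_week": 11},
    {"week": 8, "final_approved": True, "chargeback_week": None},
    {"week": 8, "final_approved": True, "chargeback_week": None},
    {"week": 8, "final_approved": False, "chargeback_week": None},
    {"week": 9, "final_approved": True, "chargeback_week": 9},
]
rate = partial_chargeback_rate(records, week=8, now_week=10)
print(rate)
```

The chargeback that arrives in week 11 is not counted at week 10, which is what makes the rate "partially mature".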

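Then, for transactions that occur in period $t$, the Myopic Dynamic Control model solves the following model to get the action sequence $a^{(t)*}$.

$$
\begin{aligned}
\max_{a^{(t)} \in A^{(t)}} \quad & E\left[ \sum_{j=1}^{N^{(t)}} \hat{R}_{a_j^{(t)}}\big(w_j^{(t)}\big) \right] \qquad \text{(Myopic-t)} \\
\text{s.t.} \quad & E\big[\hat{R}_{app}(w_j^{(t)})\big] = \hat{g}_1^{(t)}(s_j^{(t)}) \cdot m_j^{(t)} - \hat{g}_2^{(t)}(s_j^{(t)}) \cdot c_j^{(t)}, \quad \forall j \\
& E\big[\hat{R}_{rev}(w_j^{(t)})\big] = \hat{g}_3^{(t)}(s_j^{(t)}) \cdot m_j^{(t)} - \hat{g}_4^{(t)}(s_j^{(t)}) \cdot c_j^{(t)} - \hat{g}_5^{(t)}(s_j^{(t)}) \cdot c_0, \quad \forall j \\
& E\big[\hat{R}_{rej}(w_j^{(t)})\big] = 0, \quad \forall j \\
& A^{(t)} = \{\text{app}, \text{rev}, \text{rej}\}^{N^{(t)}}
\end{aligned}
$$

The CEI module is updated at the beginning of each period, and (Myopic-t) is solved during each period to provide optimal control actions. Theorem 4.2 guarantees that (Myopic-t) can be solved by the greedy method. Details of the Myopic control policy are summarized in Algorithm 2.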

###### Theorem 4.2.

The optimal action sequence of (Myopic-t) can be obtained by the greedy algorithm, i.e. for $j = 1, \dots, N^{(t)}$, sequentially set

$$
a_j^{(t)*} = \arg\max_{a_j^{(t)} \in \{\text{app}, \text{rev}, \text{rej}\}} E\big[\hat{R}_{a_j^{(t)}}(w_j^{(t)})\big]
$$

The proof of Theorem 4.2 is similar to the proof of Theorem 4.1 and is thus omitted.

### 4.3 Prospective control

Figure 5 depicts decision flow of prospective control.

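The Prospective control model has a similar CEI module that diminishes the pattern recognition lag. In addition, the FEI module is adopted from GFunctionEstimation2018TKDE () to estimate the change in the future decision environment caused by the actions taken in the current period. These environments are characterized by the $g$-functions of periods $t$ and $t+l$. As in Myopic control, in period $t$ we use the output of the CEI module as the state estimate, i.e. $\hat{S}^{(t)} = (\hat{g}_1^{(t)}, \hat{g}_2^{(t)}, \hat{g}_3^{(t)}, \hat{g}_4^{(t)}, \hat{g}_5^{(t)})$. The action space of period $t$ is still $A^{(t)} = \{\text{app}, \text{rev}, \text{rej}\}^{N^{(t)}}$. Unlike the previous two control models, prospective control considers the future effects of the current decisions: the action sequence $a^{(t)}$ influences the behavior patterns of bank and MR in period $t+l$. For transactions that occur in period $t$, we need to solve the following model to get our action sequence $a^{(t)*}$.

$$
\begin{aligned}
\max_{a^{(t)} \in A^{(t)}} \quad & E\left[ \sum_{j=1}^{N^{(t)}} \hat{R}_{a_j^{(t)}}\big(w_j^{(t)}\big) \right] + \lambda \cdot \Delta \qquad \text{(Prospective-t)} \\
\text{s.t.} \quad & E\big[\hat{R}_{app}(w_j^{(t)})\big] = \hat{g}_1^{(t)}(s_j^{(t)}) \cdot m_j^{(t)} - \hat{g}_2^{(t)}(s_j^{(t)}) \cdot c_j^{(t)}, \quad \forall j \\
& E\big[\hat{R}_{rev}(w_j^{(t)})\big] = \hat{g}_3^{(t)}(s_j^{(t)}) \cdot m_j^{(t)} - \hat{g}_4^{(t)}(s_j^{(t)}) \cdot c_j^{(t)} - \hat{g}_5^{(t)}(s_j^{(t)}) \cdot c_0, \quad \forall j \\
& E\big[\hat{R}_{rej}(w_j^{(t)})\big] = 0, \quad \forall j \\
& A^{(t)} = \{\text{app}, \text{rev}, \text{rej}\}^{N^{(t)}}
\end{aligned}
$$

where $\lambda$ is a discount factor, and $\Delta$ is a reference future profit of period $t+l$. A reference sample is bootstrapped from the mature data set of the control group in order to provide the reference future profit $\Delta$. Let this reference transaction sample be $\{\tilde{w}_k^{(t+l)}\}_{k=1}^{m}$ with $m$ elements (here $m$ denotes the sample size, not a margin). The FEI module includes two sub-procedures: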

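1. Calculate the estimated chargeback rate of period $t$, $\hat{\rho}_{CB}^{(t)}$: at a given time point during period $t$, suppose we have received $n'$ transaction requests and our decision action sequence is $(a_1^{(t)}, \dots, a_{n'}^{(t)})$. We can then estimate the chargeback rate of period $t$ as

$$
\hat{\rho}_{CB}^{(t)} = \frac{1}{\sum_{j=1}^{n'} \delta\big(a_j^{(t)} \neq \text{rej}\big)} \left( \sum_{j=1}^{n'} \hat{g}_2^{(t)}(s_j^{(t)}) \cdot \delta\big(a_j^{(t)} = \text{app}\big) + \sum_{j=1}^{n'} \hat{g}_4^{(t)}(s_j^{(t)}) \cdot \delta\big(a_j^{(t)} = \text{rev}\big) \right) \qquad \text{(3)}
$$

where $\delta(\cdot)$ is the indicator function.

2. Predict the future $g$-functions ($\hat{g}_1^{(t+l)}$, $\hat{g}_2^{(t+l)}$, $\hat{g}_3^{(t+l)}$, $\hat{g}_4^{(t+l)}$ and $\hat{g}_5^{(t+l)}$) from the matured $g$-function trajectories ($g_1^{(t')}$, $g_2^{(t')}$, $g_3^{(t')}$, $g_4^{(t')}$ and $g_5^{(t')}$ over mature periods $t'$) and the estimated weekly full chargeback rate $\hat{\rho}_{CB}^{(t)}$. FEI is trained with mature data and takes the form

$$
\begin{bmatrix}
\hat{g}_1^{(t+l)}(s) \\ \hat{g}_2^{(t+l)}(s) \\ \hat{g}_3^{(t+l)}(s) \\ \hat{g}_4^{(t+l)}(s) \\ \hat{g}_5^{(t+l)}(s)
\end{bmatrix}
=
\begin{bmatrix}
\hat{\Psi}_1^{(t)}\big(s, g_1^{(t')}(s), \hat{\rho}_{CB}^{(t)}\big) \\
\hat{\Psi}_2^{(t)}\big(s, g_2^{(t')}(s), \hat{\rho}_{CB}^{(t)}\big) \\
\hat{\Psi}_3^{(t)}\big(s, g_3^{(t')}(s), \hat{\rho}_{CB}^{(t)}\big) \\
\hat{\Psi}_4^{(t)}\big(s, g_4^{(t')}(s), \hat{\rho}_{CB}^{(t)}\big) \\
\hat{\Psi}_5^{(t)}\big(s, g_5^{(t')}(s), \hat{\rho}_{CB}^{(t)}\big)
\end{bmatrix}
\qquad \text{(FEI)}
$$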
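The first sub-procedure, Eq. (3), averages the estimated fraud probability over the transactions that were not rejected. The sketch below assumes the estimates are available as lookup tables keyed by risk score. The tables `g2_hat` and `g4_hat`, the sample data, and the 0.0 return value when everything is rejected are illustrative assumptions.

```python
def estimated_chargeback_rate(scores, actions, g2_hat, g4_hat):
    """Eq. (3): expected chargeback share among non-rejected transactions.

    scores[j] is the risk score and actions[j] the decision ("app", "rev"
    or "rej") for transaction j; g2_hat / g4_hat map a score to the
    estimated fraud probability under approval / review.
    """
    not_rejected = sum(1 for a in actions if a != "rej")
    if not_rejected == 0:
        return 0.0  # assumed convention when every transaction is rejected
    fraud_mass = 0.0
    for s, a in zip(scores, actions):
        if a == "app":
            fraud_mass += g2_hat[s]
        elif a == "rev":
            fraud_mass += g4_hat[s]
    return fraud_mass / not_rejected


# Made-up estimates at three risk scores.
g2_hat = {10: 0.01, 50: 0.20, 90: 0.60}
g4_hat = {10: 0.005, 50: 0.03, 90: 0.30}

rho_hat = estimated_chargeback_rate(
    scores=[10, 50, 90], actions=["app", "rev", "rej"], g2_hat=g2_hat, g4_hat=g4_hat
)
print(rho_hat)
```

The same function covers Eq. (4) when it is called with the transactions observed so far plus the candidate decision for the incoming one.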

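(Prospective-t) is hard to solve due to the high dimension of $A^{(t)}$ and the non-analytic form of $\Delta$. A similar greedy heuristic with real-time updates is introduced to obtain a suboptimal solution for (Prospective-t). This Real-time Greedy Heuristic (RGH) allows us to update the estimate of the current-period chargeback rate on the fly and to adjust our strategy within period $t$. Figure 6 illustrates the logic of RGH within period $t$.

Let time $\tau$ be a decision time point in period $t$ at which transaction $w_{n_1+1}^{(t)}$ occurs and the risk team needs to decide whether to approve, reject or manually review it. Suppose that from the starting point of period $t$ to the current decision point $\tau$ we have observed $n_1$ transactions. We can then estimate the chargeback rate of period $t$ if we approve, review or reject $w_{n_1+1}^{(t)}$ using Eq. (4).

$$
\hat{\rho}_{CB}^{(t)}(\tau) = \frac{1}{\sum_{j=1}^{n_1+1} \delta\big(a_j^{(t)} \neq \text{rej}\big)} \left( \sum_{j=1}^{n_1+1} \hat{g}_2^{(t)}(s_j^{(t)}) \cdot \delta\big(a_j^{(t)} = \text{app}\big) + \sum_{j=1}^{n_1+1} \hat{g}_4^{(t)}(s_j^{(t)}) \cdot \delta\big(a_j^{(t)} = \text{rev}\big) \right) \qquad \text{(4)}
$$

We further estimate the expected reward of approval, review or rejection of $w_{n_1+1}^{(t)}$. Note that the future effect is first averaged to a reward per transaction and then discounted by a factor of $\lambda$.

$$
\begin{aligned}
RF_{app}\big(w_{n_1+1}^{(t)}\big) &= E\big[\hat{R}_{app}(w_{n_1+1}^{(t)})\big] + \frac{\lambda}{m} \Delta_{\tau, app} \qquad \text{(5a)} \\
RF_{rev}\big(w_{n_1+1}^{(t)}\big) &= E\big[\hat{R}_{rev}(w_{n_1+1}^{(t)})\big] + \frac{\lambda}{m} \Delta_{\tau, rev} \qquad \text{(5b)} \\
RF_{rej}\big(w_{n_1+1}^{(t)}\big) &= E\big[\hat{R}_{rej}(w_{n_1+1}^{(t)})\big] + \frac{\lambda}{m} \Delta_{\tau, rej} \qquad \text{(5c)}
\end{aligned}
$$

and for $a \in \{\text{app}, \text{rev}, \text{rej}\}$,

$$
\begin{aligned}
\Delta_{\tau, a} = \max_{a^{(t+l)} \in \tilde{A}^{(t+l)}} \quad & E\left[ \sum_{k=1}^{m} \hat{R}_{a_k^{(t+l)}}\big(\tilde{w}_k^{(t+l)}\big) \right] \qquad \text{(6)} \\
\text{s.t.} \quad & E\big[\hat{R}_{app}(\tilde{w}_k^{(t+l)})\big] = \hat{g}_1^{(t+l)}(\tilde{s}_k^{(t+l)}) \cdot \tilde{m}_k^{(t+l)} - \hat{g}_2^{(t+l)}(\tilde{s}_k^{(t+l)}) \cdot \tilde{c}_k^{(t+l)}, \\
& E\big[\hat{R}_{rev}(\tilde{w}_k^{(t+l)})\big] = \hat{g}_3^{(t+l)}(\tilde{s}_k^{(t+l)}) \cdot \tilde{m}_k^{(t+l)} - \hat{g}_4^{(t+l)}(\tilde{s}_k^{(t+l)}) \cdot \tilde{c}_k^{(t+l)} - \hat{g}_5^{(t+l)}(\tilde{s}_k^{(t+l)}) \cdot c_0, \\
& E\big[\hat{R}_{rej}(\tilde{w}_k^{(t+l)})\big] = 0, \\
& \tilde{A}^{(t+l)} = \{\text{app}, \text{rev}, \text{rej}\}^{m}
\end{aligned}
$$

where $\hat{\rho}_{CB}^{(t)}(\tau)$ is calculated using Eq. (4), and the $\hat{g}_i^{(t+l)}$, $i = 1, \dots, 5$, are derived by (FEI). RGH sequentially assigns to each incoming transaction the action that has the largest prospective reward. For $j = 1, \dots, N^{(t)}$, we sequentially set

$$
a_j^{(t)*} = \arg\max_{a_j^{(t)} \in \{\text{app}, \text{rev}, \text{rej}\}} RF_{a_j^{(t)}}\big(w_j^{(t)}\big). \qquad \text{(Prospective-RGH)}
$$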
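The RGH loop can be sketched as follows. For each incoming transaction it tries each action, recomputes the running chargeback rate as in Eq. (4), asks a future-environment model for the predicted $g$-functions, scores the reference sample greedily as in Eq. (6), and adds the averaged, discounted future term as in Eq. (5a)-(5c). The real FEI module is a trained model described in the cited companion paper; `toy_fei` below is a made-up stand-in, and all tables and numbers are hypothetical.

```python
ACTIONS = ("app", "rev", "rej")


def expected_rewards(g_at_s, m, c, c0):
    """Expected reward of each action given the five g-values at score s."""
    g1, g2, g3, g4, g5 = g_at_s
    return {"app": g1 * m - g2 * c, "rev": g3 * m - g4 * c - g5 * c0, "rej": 0.0}


def chargeback_rate(history, g_now):
    """Eq. (4): history is a list of (score, action) pairs decided so far."""
    kept = [(s, a) for s, a in history if a != "rej"]
    if not kept:
        return 0.0  # assumed convention when nothing has been let through
    fraud = sum(g_now[s][1] if a == "app" else g_now[s][3] for s, a in kept)
    return fraud / len(kept)


def reference_profit(reference, g_future, c0):
    """Eq. (6): greedy optimum over the reference sample under predicted g-values."""
    return sum(
        max(expected_rewards(g_future[s], m, c, c0).values())
        for s, m, c in reference
    )


def rgh(transactions, g_now, fei, reference, c0, lam):
    """Real-time greedy heuristic; `fei` maps a chargeback rate to future g-values."""
    history, decisions = [], []
    for s, m, c in transactions:
        now = expected_rewards(g_now[s], m, c, c0)
        prospective = {}
        for a in ACTIONS:
            rho = chargeback_rate(history + [(s, a)], g_now)
            delta = reference_profit(reference, fei(rho), c0)
            # Future effect averaged per reference transaction, then discounted.
            prospective[a] = now[a] + lam * delta / len(reference)
        best = max(ACTIONS, key=prospective.__getitem__)
        history.append((s, best))
        decisions.append(best)
    return decisions


G_NOW = {
    10: (0.95, 0.01, 0.90, 0.005, 0.96),
    50: (0.70, 0.20, 0.65, 0.03, 0.90),
    90: (0.10, 0.60, 0.08, 0.30, 0.70),
}
TXNS = [(10, 20.0, 100.0), (50, 20.0, 100.0), (90, 20.0, 100.0)]
REFERENCE = [(10, 20.0, 100.0), (50, 20.0, 100.0)]


def toy_fei(rho):
    """Hypothetical stand-in for FEI: higher chargeback rate shrinks all g-values."""
    scale = max(0.0, 1.0 - 5.0 * rho)
    return {s: tuple(scale * x for x in g) for s, g in G_NOW.items()}


decisions = rgh(TXNS, G_NOW, toy_fei, REFERENCE, c0=2.0, lam=0.5)
print(decisions)
```

Setting `lam=0.0`, or passing a future-environment model that ignores the chargeback rate, reduces this loop to the per-transaction greedy rule of Myopic control, which is a useful sanity check for any implementation.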

Prospective control algorithm is summarized in Algorithm 3.

## 5 Field Tests on Microsoft E-commerce

Field tests were conducted to examine the performance of the three dynamic models. The testing dataset was extracted from a sub-unit of the Microsoft E-commerce business. We sampled no more than 3% of total transactions as the testing data set. For transactions in the testing set, we recorded the decisions in our database while flipping all final rejected transactions to final approval, so that we could obtain unbiased chargeback signals for model training and profit calculation. We set the length of each testing period to one week and tested all dynamic control models in parallel with the current Microsoft inline decision engine.

Our data determined the maximum lead time for data maturity, $L$, and the recent partially mature reference time, $l$. The testing time window was 14 weeks, and the bank and MR decisions for each transaction were kept identical across control methods to ensure an apples-to-apples comparison. The historical data continued to mature as the testing time moved forward. For Naive and Myopic controls, the risk decision engine was updated weekly (the $g$-functions and the (CEI) module were retrained at the beginning of each period). For Prospective control, the risk decision engine refreshed its belief of the current $g$-functions, the (CEI) module and the (FEI) module once a week, while the estimates of the current week's chargeback rate and the future $g$-functions were updated in real time. Due to Microsoft's confidentiality requirements, the name of the E-commerce sub-unit is withheld, and this section only includes summarized feature values aggregated over the 14-week transaction period to demonstrate the usability of the Naive, Myopic and Prospective control models. The discount factor $\lambda$ in the Prospective control model was tuned using cross validation at the beginning of the testing and was held fixed throughout the 14-week testing period.

We first studied the numbers of different risk control operations (approve, review, and reject) among the testing transactions. Figure 7 summarizes the counts of the different risk control decisions made by the current Microsoft decision engine and the three proposed dynamic control engines. Over the 14-week testing period, the dynamic control engines gradually captured the decision accuracy of the manual review group. All three models learned that manual review agents rejected too many non-fraud transactions, and thus started to cut the volume of transactions submitted to the manual review team. Figure 7(a) suggests that all three dynamic models approved more transactions than the current decision engine did. As shown later in Figure 8, these dynamic models also enhanced decision accuracy significantly: they not only approved more non-fraud transactions but also approved fewer fraud transactions. All three models sent fewer transactions to manual review. Naive control aggressively decreased the review volume to only 10% of the review volume suggested by the current Microsoft decision engine, while Myopic and Prospective control mildly decreased the review volume to roughly 30% of the original volume. As for rejections, Naive control increased the rejection volume by about 12%, while Myopic and Prospective control decreased the rejection volume by 12.5% and 9% respectively. Figure 8 also shows that Myopic and Prospective control enhanced decision accuracy by rejecting far fewer non-fraud transactions but more fraud transactions.

A number of performance measures were used to validate the decision quality of a risk control engine. First, we investigated the decision quality by comparing the losses caused by wrong decisions. Two common performance metrics for this are false negative (FN) loss and false positive (FP) loss. FN loss measures the total loss from approving fraud transactions (wrongful approval), which consists of the cost of goods and all related chargeback fees. FP loss measures the total loss from rejecting non-fraud transactions (wrongful rejection), and it includes all the margins that should have been earned but were not. We then checked the manual review (MR) cost, which is the total labor cost of the human review team. We found that when the risk engine submitted transactions that included fewer frauds (true negative: rightful approval) to the manual review teams, the teams tended to have a much more difficult time making accurate risk decisions, since fraud patterns were less prevalent and less recognizable. Therefore, with more transactions sent to manual review teams, not only do labor costs rise, but decision accuracy is also likely to become less stable. Figure 8 summarizes the aggregated improvement in FN loss, FP loss and MR cost on the selected testing data set.

Figure 8(a) shows that all three dynamic control methods made better "approval" decisions by producing lower FN losses. The Naive, Myopic and Prospective control models decreased FN loss by 8.48%, 7.32%, and 7.55% respectively. Figure 8(b) suggests that the Naive control model was relatively aggressive: it rejected more non-fraud transactions and yielded 9.49% more FP loss. Meanwhile, Myopic and Prospective control mildly decreased FP loss by 4.73% and 3.05% respectively, so these two dynamic control methods made more correct rejections. As mentioned earlier, all dynamic decision engines found that MR had limited accuracy in detecting fraud. Accordingly, the Naive, Myopic and Prospective control models decreased the transactions submitted for review by 93.0%, 64.2%, and 64.7% respectively.

Second, we compared total profits and total chargeback rates among the three dynamic control methods and Microsoft’s current risk control method. Delivering higher profit is the ultimate goal of business operations. On the other hand, the risk control team also needs to ensure that the new dynamic control methods do not escalate the chargeback rate for merchants. In other words, the proposed dynamic control methods should produce higher profit without increasing (and ideally while lowering) the chargeback rate.

Table 1 summarizes the improvements in overall profit and chargeback rate. The first row of Table 1 reports the profit improvement of each dynamic control method over the current Microsoft risk decision engine on the testing set. The second row extrapolates total profit from training set to an estimated annual improvement on the selected sub-unit. The third row reports the relative difference in chargeback rates, calculated by

$$\frac{\text{chargeback rate(Dynamic)} - \text{chargeback rate(Microsoft)}}{\text{chargeback rate(Microsoft)}} \times 100\%.$$
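As a small illustration of this relative measure, the following Python sketch computes the percentage change of a dynamic method's chargeback rate against a baseline. The function name and the input rates are hypothetical and are not taken from the field test.

```python
def relative_chargeback_change(dynamic_rate: float, baseline_rate: float) -> float:
    """Relative difference in chargeback rate, in percent of the baseline.

    A negative value means the dynamic method lowered the chargeback rate.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (dynamic_rate - baseline_rate) / baseline_rate * 100.0


# Hypothetical rates for illustration only: 0.500% baseline vs 0.485% dynamic.
change = relative_chargeback_change(dynamic_rate=0.00485, baseline_rate=0.00500)
print(f"{change:.2f}%")
```

Because the measure is relative to the baseline, a reported decrease such as 0.72% is a fraction of the baseline chargeback rate, not a drop of 0.72 percentage points.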

Over the 14-week testing period, Naive control contributed $79,962 more profit on the testing portfolio while maintaining a chargeback rate similar to that of the current Microsoft risk decision engine: it decreased the chargeback rate by only 0.72% relative to Microsoft’s current chargeback rate. Myopic control contributed the largest profit improvement, $97,863 on the testing set, and decreased the chargeback rate by 1.64% in relative terms. Finally, Prospective control produced $96,693 more profit on the testing transaction set and provided the largest improvement in chargeback rate, decreasing it by 2.98%. By extrapolation, the estimated annual improvements for Naive, Myopic and Prospective control on the selected sub-unit were $9,900,071, $12,116,318, and $11,971,568 respectively.
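As a simple arithmetic check on the figures quoted above, the sketch below divides each estimated annual improvement by the corresponding testing-period improvement. The dollar amounts are those reported in the text; the helper name is hypothetical, and how the resulting factor was derived is not stated in this section, so any interpretation of it would be an assumption.

```python
# Profit improvements reported in the text, in US dollars.
test_period_gain = {"Naive": 79_962, "Myopic": 97_863, "Prospective": 96_693}
annual_estimate = {"Naive": 9_900_071, "Myopic": 12_116_318, "Prospective": 11_971_568}


def implied_scaling_factors(test_gain: dict, annual: dict) -> dict:
    """Ratio of the annual estimate to the testing-period gain, per model."""
    return {name: annual[name] / test_gain[name] for name in test_gain}


factors = implied_scaling_factors(test_period_gain, annual_estimate)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}")
```

The three ratios agree to within rounding, which suggests that a single common multiplier was applied to all three models, so their ranking by annual improvement matches their ranking on the testing set.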

We conclude this section with a few business takeaways. All three models showed the potential to significantly improve company profit while slightly decreasing chargeback rates. All three dynamic models decreased FN losses and MR costs, and Myopic and Prospective control also decreased FP losses. Although the Naive control model was relatively aggressive in rejecting transactions, Myopic and Prospective control made better rejection decisions by rejecting fewer non-fraud transactions. Overall, the dynamic methods performed well at approving more non-fraud transactions and rejecting more fraud transactions. The artificial intelligence modules in these dynamic control models were well developed and outperformed human review agents on most of the fraud decisions. Manual review volumes decreased as expected, and hence MR labor costs were reduced significantly.

## 6 Conclusion and Future Study

To minimize ad hoc human decisions and to improve the accuracy and robustness of risk decision making, we investigated how to reach the optimal action when complete information is available. We defined our problem rigorously, characterized all profit-related components in the current system, and investigated decision interactions among three different decision-making parties. Acknowledging that perfect information is unavailable in reality, we designed three data-driven dynamic optimal control models: Naive control, Myopic control, and Prospective control. These control models are fully data-driven and are self-trained and self-adapted in real time. As demonstrated, by employing incomplete information, including long-term and short-term mature and partially mature data, these dynamic control models helped increase profit significantly by reducing false negative loss, false positive loss, and manual review costs. Meanwhile, the proposed control models also slightly lowered chargeback rates, as desired. The field test on a sub-unit of Microsoft E-commerce suggested that the discriminative dynamic control models had better fraud detection performance than the current general score cut-off control.

The research proposed in this paper can contribute to both theoretical and applied research on fraud detection for systems that face incomplete information and decision looping effects caused by multiple decision parties. Its application is not limited to financial risk systems; it can also be used in applications and research in cyber-security, homeland security, contagious disease screening, etc. Our future research will include information sharing and information fusion. We will extend the current research to more complex and realistic settings, where information sources are shared at different levels among different risk control decision parties.

## Acknowledgments and Funding Sources

This research was supported by Microsoft, Redmond, WA. The authors are thankful to the researchers and members of the Microsoft Knowledge and Growth group for providing data and their knowledge of the system.
