Catching Loosely Synchronized Behavior in Face of Camouflage
Abstract.
The problem of online fraud detection can often be formulated as mining a bipartite graph of users and objects for suspicious patterns. The edges in the bipartite graph represent the interactions between users and objects (e.g., reviewing or following). However, smart fraudsters use sophisticated strategies to influence the ranking algorithms used by existing methods. Based on these considerations, we propose FraudTrap, a fraud detection system that addresses the problem from a new angle. Unlike existing solutions, FraudTrap works on the object similarity graph (OSG) inferred from the original bipartite graph. This approach has several advantages. First, it effectively catches loosely synchronized behavior in the face of different types of camouflage. Second, it has two operating modes, unsupervised and semi-supervised: when partially labeled data is available, it is naturally incorporated to further improve performance. Third, it easily leverages additional features to accommodate complicated application scenarios. Fourth, all the algorithms we design have near-linear time complexity and scale to large real-world datasets. For each of these characteristics, we design corresponding experiments showing that FraudTrap outperforms other state-of-the-art methods on both synthetic and real-world datasets.
1. Introduction
Fraud has severely detrimental impacts on the business of social networks and other online applications (CrimeReport, ). A user can become a fake celebrity by purchasing “zombie followers” on Twitter. A merchant can boost his reputation through fake reviews on Amazon. This phenomenon also conspicuously exists on Facebook, Yelp and TripAdvisor, etc. In all the cases, fraudsters try to manipulate the platform’s ranking mechanism by faking interactions between the fake accounts they control (fraud users) and the target customers (fraud objects).
These scenarios are often formulated as a bipartite graph of objects and users. We define an object as the target a user can interact with on a platform. Depending on the application, an object can be a followee, a product or a page. An edge corresponds to an interaction from a user to an object (e.g., reviewing or following). Detecting fraud in the bipartite graph has been explored by many methods. Since fraudsters rely on fraudulent user accounts, which are often limited in number, to create fraudulent edges for fraud objects' gain (CATCHSYNC, ), previous methods are mainly based on two observations: (1) fraud groups tend to form dense subgraphs in the bipartite graph (the high-density signal), and/or (2) the subgraphs induced by fraud groups have unusually surprising connectivity structure (the structure signal). These methods mine the bipartite graph directly for dense subgraphs or rare structural patterns. Their performance varies on real-world datasets.
Unfortunately, smart fraudsters use more sophisticated strategies to avoid such patterns. First, by multiplexing a larger pool of fraud users, a fraudster can effectively reduce the density of the subgraph induced by a fraud group. This is called loosely synchronized behavior, and it limits the performance of the methods (FRAUDAR, ; CROSSSPOT, ; KCORE, ; SPOKEN, ; NETPROBE, ) that depend on the high-density signal. Another commonly used technique is to create edges pointing to normal objects to disguise fraud users as normal ones. This strategy, often called camouflage, alters the connectivity structure of the bipartite graph and weakens the effectiveness of many approaches targeting such structure, such as HITS (CATCHSYNC, ; ECommerce, ) and belief propagation (BP) (FRAUDEAGLE, ; NETPROBE, ). Fig. 1 illustrates these two strategies.
The problem of fraud detection can also be handled with supervised or semi-supervised approaches when (partially) labeled data are available. (ADOA, ; SYBILINFER, ) provide better performance using a subset of labeled frauds. (Abdulhayoglu2017HinDroid, ; 2010Uncovering, ; Egele2017Towards, ) build machine learning classifiers to detect anomalies. These approaches, however, have a number of limitations. First, it is often very difficult to obtain enough labeled data in fraud detection, due to the scale of the problem and the cost of investigation. Second, they require substantial feature engineering, which is tedious and demands a high level of expertise. Third, they often fail to detect new fraud patterns. Finally, even though some labeled data can provide potentially valuable information for fraud detection, it is not straightforward to incorporate them into existing unsupervised or semi-supervised solutions such as (ADOA, ; SYBILINFER, ).
In this paper, we propose FraudTrap, a graph-based fraud detection algorithm that overcomes these limitations with a novel change of the target of analysis. Instead of mining the bipartite graph directly, FraudTrap analyzes the Object Similarity Graph (OSG) derived from the original bipartite graph. This design has two main advantages: (1) because fraud objects can hardly gain edges from normal users, they exhibit highly similar behavior patterns, which endows FraudTrap with inherent camouflage resistance (Sec. 4.4); (2) since the number of objects is typically smaller than the number of users (amazon_data, ), working with the OSG reduces computation cost without sacrificing effectiveness. In addition, although FraudTrap works well without any labels, it can easily switch to a semi-supervised mode and improve performance with partial labels.
In summary, our main contributions include:
1) [Metric C-score]. We build the Object Similarity Graph (OSG) using a novel similarity metric, the C-score, which transforms the sparse subgraphs induced by fraud groups in the bipartite graph into much denser subgraphs in the OSG, merging information from unlabeled and labeled (if available) data.
2) [Algorithm LPA-TK]. We propose a similarity-based clustering algorithm, LPA-TK, that fits the OSG naturally and outperforms the baseline (LPA) in the face of noisy edges (camouflage).
3) [Metric]. Given the candidate groups returned by LPA-TK, we propose an interpretable suspiciousness metric that satisfies all the basic “axioms” proposed in (CROSSSPOT, ).
4) [Effectiveness]. Our method FraudTrap can operate in two modes: unsupervised and semi-supervised. The unsupervised mode outperforms other state-of-the-art methods in catching synchronized behavior in the face of camouflage. The semi-supervised mode naturally takes advantage of partially labeled data to further improve performance.
2. Related work
To maximize their financial gains, fraudsters have to share or multiplex certain resources (e.g., phone numbers, devices). To achieve an “economy of scale”, fraudsters often reuse many fraudulent user accounts across objects and campaigns.
Unsupervised. Unsupervised methods achieve varying performance on fraud detection. There are two types of unsupervised detection methods in the literature.
The first type is based on the high-density subgraphs formed by fraud groups. Mining dense subgraphs in the bipartite graph (KCORE, ; SPOKEN, ; DCUBE, ) is effective in detecting fraud groups of users and objects connected by a massive number of edges. Fraudar (FRAUDAR, ) tries to find a subgraph with the maximal average degree using a greedy algorithm. CrossSpot (CROSSSPOT, ) focuses on detecting dense blocks in a multidimensional tensor and gives several basic axioms that a suspiciousness metric should meet. People have also adopted singular-value decomposition (SVD) to capture abnormally dense user blocks (INFERRING, ; FBOX, ). However, fraudsters can easily evade detection by reducing the synchrony of their actions (details in Sec. 3).
The second type is based on the rare subgraph structures of fraud groups. Such structures may include the sudden creation of a massive number of edges to an object. BP (FRAUDEAGLE, ; NETPROBE, ) and HITS (COMBATING, ; CATCHSYNC, ; UNDERSTAN, ) are intended to catch such signals in the bipartite graph. FraudEagle (FRAUDEAGLE, ) uses loopy belief propagation to assign labels to the nodes in a network represented as a Markov Random Field (MRF). (Shah2017EdgeCentric, ) ranks the abnormality of nodes based on edge-attribute behavior patterns by leveraging minimum description length. (Kumar2017FairJudge, ; Hooi2015BIRDNEST, ) use Bayesian approaches to address the rating-fraud problem. SynchroTrap (SYNCHROTRAP, ) works on the user similarity graph. In all these cases, it is relatively easy for fraudsters to manipulate edges from fraud users to conceal such structural patterns (details in Sec. 3). The common requirement of parameter tuning is also problematic in practice, as the distribution of fraudsters changes often.
Compared methods: Fraudar (FRAUDAR, ), Spoken (SPOKEN, ), CopyCatch (COPYCATCH, ), CatchSync (CATCHSYNC, ), CrossSpot (CROSSSPOT, ), Fbox (FBOX, ), FraudEagle (FRAUDEAGLE, ), Mzoom (MZOOM, ), and FraudTrap, along four criteria: handling loose synchrony, camouflage resistance, use of side information, and semi-supervised operation.
(Semi-)supervised. When partially labeled data are available, semi-supervised methods can be applied to anomaly detection. The fundamental idea is to use the graph structure to propagate known information to unknown nodes. (SEMIGRAPH, ; SYBILBELIEF, ) model graphs as MRFs and label the potential suspiciousness of each node with BP. (INTEGRO, ; KEEPFRIENDS, ; SYBILINFER, ) use random walks to detect Sybils. ADOA (ADOA, ) clusters observed anomalies into groups and classifies unlabeled data into these groups according to both isolation degree and similarity. When adequate labeled data are available, people have shown success with classifiers such as multi-kernel learning (Abdulhayoglu2017HinDroid, ), support vector machines (Tang2009Machine, ) and nearest neighbors (KNEAR, ). However, it is rare to have enough fraud labels in practice.
3. Design Considerations
We explain in detail why fraudsters can easily evade existing detection methods, and present the key ideas behind the design of FraudTrap.
3.1. How Do Smart Fraudsters Evade Detection?
Reducing synchrony in fraud activities. One of the key signals that existing fraud detection methods rely on is the high density of a subgraph. A naive fraud campaign may reuse some of its resources, such as accounts or phone numbers, resulting in high-density subgraphs. However, experience shows that fraudsters now control larger resource pools and thus can adopt smarter strategies to reduce synchrony by rotating the fraud users each time. For example, (CATCHSYNC, ) reports that on Weibo, the Chinese Twitter, a fraud campaign used 3 million fraud accounts, a.k.a. zombie fans, to follow only 20 followees (fraud objects). Each followee gains edges from a different subset of the followers (CATCHSYNC, ). The edge density (the ratio of the number of edges to the maximum number of possible edges given the nodes) of the subgraph induced by the fraud group is extremely low, very close to a legitimate value. This strategy, as our experiments in Sec. 5.2 will show, effectively reduces synchrony and deceives many subgraph-density-based methods (FRAUDAR, ; KCORE, ; SPOKEN, ; DCUBE, ; CROSSSPOT, ). Even Fraudar (FRAUDAR, ) is susceptible to synchrony reduction (details in Sec. 5.2).
Adding camouflage. Fraudsters also try to confuse the detection algorithm by creating camouflage edges to normal objects, making the fraud users behave less unusually (Fig. 1(2)). According to (FRAUDAR, ), there are four types of camouflage: 1) random camouflage: adding camouflage edges to normal objects randomly; 2) biased camouflage: creating camouflage edges to normal objects with high in-degree; 3) hijacked accounts: hijacking honest accounts to add fraudulent edges to fraud objects; 4) reverse camouflage: tricking normal users into adding edges to fraud objects.
Camouflage severely affects graph-structure-based methods (FRAUDEAGLE, ; NETPROBE, ; COMBATING, ; CATCHSYNC, ; UNDERSTAN, ), as fraudsters can reshape the structure without many resources. For example, our experiments in Sec. 5.2 demonstrate that the degree and HITS scores from CatchSync (CATCHSYNC, ) stop working even with a moderate number of camouflage edges.
3.2. Our Key Ideas
The fundamental reason the above two strategies succeed in deceiving existing methods is that these methods analyze the original bipartite graph. Fraudsters can easily manipulate the graph (both its density and its structure) with a large number of fraud users. Unfortunately, the current black market makes fraud accounts easy to obtain in large numbers.
We propose to attack the problem from a different angle. Objects that pay for fraud activities are similar to each other, because fraudsters must use their fraud user pool to serve many objects to make a profit. Thus, instead of analyzing the user-object bipartite graph directly, we work on the similarity among different objects, which we capture as an object similarity graph (OSG) whose nodes are the objects and whose edges represent the similarity among them. As we will show, with a carefully designed similarity score, a fraud object is much more similar to other fraud objects than to normal ones, and it is much harder for fraudsters to manipulate the OSG than the original bipartite graph. This is because, in the OSG, the subgraph formed by loosely synchronized behavior is much denser than the corresponding subgraph in the original user-object bipartite graph, and its density cannot be altered by camouflage. Figure 3 shows an intuitive example.
Furthermore, we want to leverage side information available in different applications instead of letting the algorithm limit the choices. Specifically, we allow optionally including two types of information: (partial) fraud labels, which enable a semi-supervised mode, and side information about the activities, such as timestamps and star ratings. As we will show, the similarity score we design is additive over both labels and extra dimensions, so it is easy to incorporate all available information into a uniform framework.
4. Methods
In this section, we first provide an overview of the workflow (Figure 2), and then detail each of the three steps: OSG construction (the C-score), clustering on the OSG (LPA-TK), and spotting suspicious groups. Finally, we provide the intuitions and proofs.
Problem definition and workflow. Consider a bipartite graph of a user set and an object set, and another bipartite graph formed by a subset of labeled fraud users and the same object set. We use an edge pointing from a user to an object to represent an interaction between them, be it a follow, a comment or a purchase. FraudTrap works in three stages:

OSG construction: The OSG captures object similarity, and we design a metric, the C-score, to capture the similarity between two objects based on user interactions. If some labeled fraud users are available, the C-score incorporates that data too.

Clustering on OSG: We propose a similarity-based clustering algorithm that clusters each object into a group based on its most similar neighbors in the OSG.

Spot suspicious groups: Given the candidate groups, it is important to use an interpretable metric to capture how suspicious an object/user group is relative to other groups. We design a suspiciousness score for this purpose.
We elaborate these three stages in the rest of this section.
Symbols  Definition

U  The set of users
U_L  The set of labeled fraud users, U_L ⊆ U
O  The set of objects
G  The bipartite graph on U and O
G_L  The bipartite graph on U_L and O
e_{u,i}  An edge from user u ∈ U to object i ∈ O
E_i  The set of edges in G pointing to object i
E_i^L  The set of edges in G_L pointing to object i
OSG  Object Similarity Graph
C(i, j)  Object Similarity Score (C-score) between objects i and j
A  A subgraph of the OSG induced by an object group
4.1. Stage I: OSG Construction (C-score)
The OSG captures the similarity between object pairs, and thus the first step is to define the similarity metric, the C-score. The C-score has two parts: similarity computed from the unlabeled bipartite graph G and similarity computed from the labeled bipartite graph G_L. Formally, we define the similarity score between objects i and j as

(1) C(i, j) = C_U(i, j) + C_L(i, j),

where C_U(i, j) is the similarity score calculated from the unlabeled graph G, and C_L(i, j) is obtained from the labeled graph G_L.
In G, let E_i be the set of edges pointing to object i. Following the definition of the Jaccard similarity (JACCARD, ), we define the similarity between i and j in G as

(2) C_U(i, j) = |E_i ∩ E_j| / |E_i ∪ E_j|.
In G_L, let E_i^L represent the set of edges pointing to object i. Then the labeled similarity score C_L(i, j) between i and j is given by:
(3) 
where the normalizing term is the mean of the corresponding set.
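As a minimal sketch of the unlabeled component, the Jaccard-style similarity of Eq. 2 can be computed over the sets of users that interacted with each object (the function name and the set-of-user-IDs representation are our own illustrative choices; the labeled component and edge attributes are omitted):

```python
def jaccard_similarity(users_i, users_j):
    """Unlabeled similarity of two objects, given the sets of users
    that created edges to each of them (Jaccard index of the sets)."""
    union = len(users_i | users_j)
    return len(users_i & users_j) / union if union else 0.0

# Two fraud objects served by the same user pool look very similar,
# while an unrelated normal object scores zero.
fraud_i = {"u1", "u2", "u3", "u4"}
fraud_j = {"u1", "u2", "u3", "u5"}
normal_k = {"u6", "u7"}
```

Here `jaccard_similarity(fraud_i, fraud_j)` is 3/5 = 0.6, while `jaccard_similarity(fraud_i, normal_k)` is 0.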
Leverage side information. If side information describing additional properties of the user-object interaction is available, we want to include it in the detection. For example, (COPYCATCH, ) reports that the time feature is essential for fraud detection. To do so, we can augment each edge, in both G and G_L, with an attribute tuple:

(u, i, a_1, a_2, ...),

where each attribute a_k can be a timestamp, a star rating, etc. We can append as many attributes as we need to the tuple and fold the synchronized behavior into the single score C(i, j). We give the following simple example.
Example 1: In a collection of reviews on Amazon, a review action indicates that a user u reviewed product i at time t from IP address p. We then denote the review action by the tuple (u, i, t, p). We discard the product identifier for the comparisons in Eq. 2 and Eq. 3.
Approximate comparisons. Furthermore, we apply a customizable approximate-matching operator in the set intersection and set union of Eq. 2 and Eq. 3. For example, consider two edge-attribute tuples with timestamps t_1 and t_2, and let Δt denote a time range; the tuples match if |t_1 − t_2| ≤ Δt. To make the computation fast, we quantize timestamps (e.g., into hours) and use the exact equality operator.
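A sketch of the quantization trick: mapping each edge's timestamp to an hour bucket turns approximate time matching into exact equality of keys (the helper name and tuple layout are assumptions for illustration):

```python
def quantize_edge(user, timestamp, bucket_seconds=3600):
    """Reduce an edge-attribute tuple to a key such that two edges
    'match' exactly when they fall into the same time bucket."""
    return (user, timestamp // bucket_seconds)

# Two actions by the same user ten minutes apart share a bucket,
# while an action a few hours later does not.
e1 = quantize_edge("u1", 7300)
e2 = quantize_edge("u1", 7900)
e3 = quantize_edge("u1", 3 * 3600)
```

After quantization, the set operations of Eq. 2 and Eq. 3 can run on these keys directly.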
Reducing the C-score computation complexity. In the worst case, computing C(i, j) for every object pair during the OSG construction takes time quadratic in the number of objects. In practice, we only need to compute the object pairs with positive C-scores.
We use a key-value approach to compute the C-score. The key corresponds to a user u, and the value, denoted by V(u), is the set of all objects that u has edges pointing to. Because each pair of objects in V(u) shares an edge from u, we increase the co-occurrence count of every pair i, j ∈ V(u) by one. Thus, we compute all intersection sizes |E_i ∩ E_j| by scanning the key-value pairs, and the union size |E_i ∪ E_j| is the difference between the sum of the in-degrees of i and j and the intersection size. Therefore, we can compute C_U(i, j) by finding all key-value pairs containing both i and j.
Naively, it is expensive to enumerate all key-value pairs and build the OSG. However, we expect the OSG to be sparse, because an object only has a positive C-score with a very small subset of the objects. Empirically, we evaluated the edge density in several datasets and found it quite low in all cases. Section 5.3 provides more details.
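The key-value computation can be sketched as follows: invert the bipartite edges into per-user object sets, count co-occurrences pair by pair, and recover each union size from the in-degrees, so only pairs with a positive score are ever materialized (function and variable names are illustrative):

```python
from collections import defaultdict
from itertools import combinations

def build_osg(edges):
    """edges: iterable of (user, object) pairs.
    Returns {(i, j): similarity} for object pairs with positive score."""
    objs_by_user = defaultdict(set)   # key-value store: user -> objects
    indeg = defaultdict(int)          # in-degree of each object
    for user, obj in edges:
        objs_by_user[user].add(obj)
        indeg[obj] += 1
    common = defaultdict(int)         # (i, j) -> |E_i ∩ E_j|
    for objs in objs_by_user.values():
        for i, j in combinations(sorted(objs), 2):
            common[(i, j)] += 1
    # |E_i ∪ E_j| = indeg(i) + indeg(j) - |E_i ∩ E_j|
    return {(i, j): c / (indeg[i] + indeg[j] - c)
            for (i, j), c in common.items()}

edges = [("u1", "a"), ("u1", "b"), ("u2", "a"),
         ("u2", "b"), ("u3", "b"), ("u3", "c")]
osg = build_osg(edges)
```

Only the pairs ("a", "b") and ("b", "c") receive an OSG edge; ("a", "c") never appears because no user connects both objects.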
Furthermore, due to the Zipf law, many real datasets contain a few objects with extremely high in-degrees in the bipartite graph. For example, a celebrity on Twitter (or a popular store on Amazon) has a vast number of followers (or customers). In our preprocessing step, we delete these nodes and their incoming edges, as the most popular objects are usually not fraudulent. This preprocessing significantly reduces the sizes of the value sets and the number of object pairs, and thus the overall computation time for OSG construction.
4.2. Stage II: Clustering on OSG (LPA-TK)
We propose an algorithm, the Label Propagation Algorithm based on Top-K Neighbors on a Weighted Graph (LPA-TK), to cluster the nodes of the OSG into groups in the face of camouflage. The algorithm is inspired by LPA (SEMILPA, ; raghavan2007near, ), which has proven effective in detecting communities with dense connectivity structure; however, LPA only works on unweighted graphs and does not resist noise/camouflage edges.
LPA-TK takes the OSG as input and outputs multiple object groups based on similarity. Algorithms 1 and 2 describe LPA-TK.
Initialization (Lines 1–3)
First, we assign each node in the OSG a unique label. Second, we color all nodes so that no adjacent nodes share the same color. The coloring process is efficient and parallelizable, taking only a small number of synchronous parallel steps (Barenb, ). The number of colors, denoted by K_c, is upper bounded by Δ + 1, where Δ denotes the maximum degree over all nodes in the OSG.
Iterations (Lines 4–8)
In the t-th iteration, each node updates its label via a function f of the labels of its neighboring nodes. Since the update of a node's label is based only on its neighbors, we can simultaneously update all the nodes sharing the same color, so each iteration needs at most one round of parallel updates per color. The iteration continues until it meets the stop condition:
where the condition compares each node's label in the t-th iteration with its label in the previous one, and tie represents the case where a node's label changes only because f yields more than one equally good label choice (line 8).
Return Groups (Lines 9–11)
After the iteration terminates, we partition the OSG into groups of nodes sharing the same final label.
[Update Criteria: Sum]. Obviously, the design of f is significant, since it determines the final results. Based on the update criterion in (SEMILPA, ), which only works on unweighted graphs, we first define f in the following form:
(4) f(v) = argmax_ℓ Σ_{u ∈ N(v)} w(u, v) · 1(L(u) = ℓ)

where N(v) is the set of neighbors of v, w(u, v) is the weight (C-score) of the edge between u and v, L(u) is the current label of u, and 1(·) is an indicator function that equals 1 when its argument holds and 0 otherwise.
According to Eq. 4, the label of a node is determined by the sum of the edge weights of each distinct label among its neighbors. Unfortunately, the clustering results deteriorate as camouflage edges increase. Fig. 4(a) gives a concrete example.
[Update Criteria: Max]. To minimize the influence of camouflage, we propose another form of f:

(5) f(v) = argmax_ℓ max_{u ∈ N(v)} w(u, v) · 1(L(u) = ℓ)

where N(v) is the set of neighbors of v, w(u, v) is the edge weight, and L(u) is the current label of u.
Based on Eq. 5, the label of a node is determined by the maximal edge weight of each distinct label among its neighbors. Although Eq. 5 can eliminate the influence of camouflage, because the most similar neighbor of a fraud object should also be fraudulent, the clustering result is poor: a group of objects is often divided into multiple parts. Fig. 4(b) gives an example.
[Update Criteria: Top K]. Based on the considerations above, we propose our final form of f, which eliminates the influence of camouflage while keeping ideal clustering results, following Algorithm 2.
In Algorithm 2, the label of a node is determined by the sum of the Top-K maximal edge weights of each distinct label among its neighbors. Empirically, we set K to a small integer in our experiments. Not only does LPA-TK resist camouflage (camouflage edges do not change the Top-K most similar neighbors of a fraud object), but it also has good clustering performance (it eliminates the possibility that a node's label is determined by a single neighbor). With LPA-TK, the node in Fig. 4(a) will be labeled 'A', and the node in Fig. 4(b) will also be labeled 'A'.
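A single LPA-TK-style update step can be sketched as: bucket the weighted neighbor labels, score each label by the sum of its K largest edge weights, and take the best label (a simplified single-node step under our own naming; the coloring, tie handling, and iteration of Algorithm 1 are omitted):

```python
from collections import defaultdict

def topk_label(neighbors, k=3):
    """neighbors: list of (label, edge_weight) pairs for one node.
    Returns the label maximizing the sum of its top-k edge weights."""
    weights_by_label = defaultdict(list)
    for label, weight in neighbors:
        weights_by_label[label].append(weight)
    return max(weights_by_label,
               key=lambda l: sum(sorted(weights_by_label[l], reverse=True)[:k]))

# Many weak camouflage edges toward label 'B' cannot outweigh a few
# strong similarity edges toward label 'A' once only the top-k count.
nbrs = [("A", 0.9), ("A", 0.8), ("B", 0.2)] + [("B", 0.1)] * 20
```

With k = 3 the node keeps label 'A' (0.9 + 0.8 beats 0.2 + 0.1 + 0.1), whereas summing over all neighbors, as in Eq. 4, would flip it to 'B'.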
The algorithm is deterministic: it always generates the same graph partitions whenever it starts with the same initial node labels. Furthermore, the algorithm converges provably. We formally prove its convergence with the following theorem:
Theorem 4.1.
Given a weighted graph, the algorithm with updating criterion (4) and the stop condition above converges.
Proof.
Let M_t denote the total weight of monochromatic edges (edges whose two endpoints share the same label) after the t-th iteration step. If the stop condition is not met in the t-th step, at least one vertex changes its label, and by criterion (4) each change can only increase the total weight of monochromatic edges incident to that vertex. This indicates that M_t strictly increases during step t, i.e., M_{t+1} > M_t. Since M_t is bounded above by the total edge weight, the algorithm converges. ∎
4.3. Stage III: Spot Suspicious Groups
After generating all candidate groups, how do we spot the fraud groups? In this section, we propose an interpretable suspiciousness metric to score each group and find the top suspicious groups. Given a candidate group A (returned by LPA-TK), let G_A be the subgraph of the OSG induced by A. Then the suspiciousness score of A takes the form:
(6) 
where ,
and
Intuitively, the first component is the average C-score over all edges of the induced subgraph, and the second is the average number of edges pointed from the same users over all object pairs of the group.
The advantage of our suspiciousness score is that it obeys the following good properties, including the axioms proposed in (CROSSSPOT, ) that all good metrics should satisfy. For comparison, we also consider the well-known edge density metric. Below, keeping a quantity fixed means it does not change.

AXIOM 1. [Object Size]. Keeping everything else fixed, a larger object group is more suspicious than a smaller one.

AXIOM 2. [Object Similarity]. Keeping everything else fixed, a group with more similar object pairs is more suspicious.

AXIOM 3. [User Size]. Keeping everything else fixed, a fraud object group connected with more fraud users is more suspicious.

AXIOM 4. [Edge Density]. Keeping everything else fixed, a denser group is more suspicious.

AXIOM 5. [Concentration]. With the same total suspiciousness, a smaller group is more suspicious.
Note that naive metrics do not meet all the axioms. For example, the edge density is not a good metric because it does not satisfy AXIOMS 1–3 and 5.
Therefore, leveraging the suspiciousness score, we can sort groups in descending order of suspiciousness and catch the top suspicious groups.
Given a suspicious object group A comprised of fraud objects, we catch the candidate fraud users as

(7) U_A = ∪_{i ∈ A} U_i

where U_i is the set of users having edges to object i.
To reduce false alarms in the candidate user set, we filter out users with low out-degrees in the subgraph induced by the candidate users and the object group, because a normal user should not interact with many fraud objects in a group, though they may interact with a few by accident.
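This user-catching step with the out-degree filter can be sketched as follows (names are illustrative; the threshold of 3 follows the setting of the Amazon experiments in Sec. 5):

```python
from collections import defaultdict

def catch_fraud_users(edges, fraud_objects, min_out_degree=3):
    """edges: iterable of (user, object) pairs. Returns the users with
    at least min_out_degree edges into the suspicious object group."""
    out_degree = defaultdict(int)
    for user, obj in edges:
        if obj in fraud_objects:
            out_degree[user] += 1
    return {u for u, d in out_degree.items() if d >= min_out_degree}

group = {"o1", "o2", "o3"}
edges = ([("f1", o) for o in group] + [("f2", o) for o in group]
         + [("n1", "o1")])   # n1 touched one fraud object by accident
```

Here both fraud users are caught, while the accidental reviewer n1 is filtered out.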
4.4. Analysis
There are four advantages of FraudTrap (C-score + LPA-TK + the suspiciousness score):

[Camouflage-resistance]. The combination of the C-score and LPA-TK is inherently resistant to camouflage (see Theorem 4.2). In contrast, the group detection results of LPA (SEMILPA, ; raghavan2007near, ) can be easily destroyed by camouflage (demonstrated in Sec. 5.1).

[Capture Loose Synchrony]. FraudTrap focuses on catching loosely synchronized behavior, because a fraud object's Top-K most similar neighbors in the OSG do not change. In contrast, the density signal can be decreased significantly by synchrony reduction (FRAUDAR, ; CROSSSPOT, ; SPOKEN, ) (demonstrated in Sec. 5.2).
Time complexity. The OSG construction stage is fast in practice, based on the optimization in Sec. 4.1. In Stage II, the time cost is the product of the number of iterations and the number of colors, where the former has been experimentally shown to grow logarithmically with graph size (SEMILPA, ) and the latter is bounded by the maximum degree plus one. In Stage III, computing the suspiciousness score of each group and catching its fraud users takes time linear in the size of the group's subgraph. Thus, FraudTrap has near-linear time complexity.
Capturing loosely synchronized behavior. We use a concrete example to show why the algorithm can handle loosely synchronized behaviors.
Consider a fraud group with 100 fraud users and 50 fraud objects, where each fraud user creates 30 edges to random fraud objects. Let A denote the subgraph induced in the original user-object bipartite graph, and let B denote the subgraph formed by the fraud objects in the OSG. We compute the edge density for both A and B. The edge density of A is 3000 / (100 × 50) = 0.6, while in the OSG essentially every pair of fraud objects shares common users (each object receives about 60 edges, with an expected overlap of roughly 36 users per pair), so B is nearly a complete graph. Obviously, the subgraph in the OSG is much denser than the one in the original bipartite graph. Now let us reduce the synchrony of the fraud group by doubling the number of fraud users while keeping the same number of edges. The density of A halves to 3000 / (200 × 50) = 0.3, whereas fraud object pairs still share common users in expectation (roughly 17 per pair), so the density of B barely changes. This shows that B is only slightly affected by the reduction of synchrony, compared to A. Furthermore, as normal users hardly exhibit synchronized behavior, the C-scores of normal object pairs are close to zero. Thus, FraudTrap is inherently more effective than approaches relying on density (FRAUDAR, ; SPOKEN, ; FBOX, ).
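The arithmetic above can be checked with a small simulation that injects the group at random and measures both densities (a sketch under the stated setup; an OSG edge between two objects is approximated as "the two objects share at least one common user"):

```python
import random
from itertools import combinations

def densities(n_users, n_objects, edges_per_user, seed=0):
    """Edge density of the injected bipartite subgraph A and of the
    OSG subgraph B (object pairs sharing at least one common user)."""
    rng = random.Random(seed)
    objects = list(range(n_objects))
    users_of = {o: set() for o in objects}
    for user in range(n_users):
        for obj in rng.sample(objects, edges_per_user):
            users_of[obj].add(user)
    density_a = edges_per_user / n_objects  # = |edges| / (n_users * n_objects)
    pairs = list(combinations(objects, 2))
    linked = sum(1 for i, j in pairs if users_of[i] & users_of[j])
    return density_a, linked / len(pairs)

dense = densities(100, 50, 30)   # original fraud group
loose = densities(200, 50, 15)   # doubled users, same total edges
```

Halving the synchrony halves the bipartite density (0.6 to 0.3) while the OSG subgraph stays essentially complete in both cases.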
Camouflage-resistance. FraudTrap is robust against different types of camouflage, for two reasons. First, the C-scores within subgraphs induced by fraud object groups do not decrease when camouflage edges are added. Formally, we give the following theorem and proof.
Theorem 4.2.
Let A denote a subgraph induced by a group of fraud objects, and let U_f denote the corresponding fraud users, where A and U_f come from a single fraud group. The C-scores within A do not change when users in U_f add camouflage edges to non-fraud objects.
Proof.
Let i and j denote two fraud objects in the group. Camouflage only introduces edges between fraud users and normal objects; it does not add or remove edges pointing to i or j, so neither term of Eq. (1) changes. Thus C(i, j) does not change for any pair of fraud objects. ∎
Second, in the OSG, a camouflage edge between a fraud user and a normal object produces only a very small C-score, due to the denominator in Eq. (1). Fig. 3(b) provides a typical case. For a fraud object, this means that camouflage edges do not change its top most similar neighbors. Thus, the subgraphs induced by fraud groups can still be effectively detected by LPA-TK.
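The camouflage-invariance argument can be sanity-checked with sets: adding camouflage edges enlarges a normal object's user set but leaves the fraud objects' edge sets, and hence their mutual Jaccard similarity, untouched (a toy sketch; the sizes are arbitrary):

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

fraud_users = {f"u{i}" for i in range(10)}
obj_i = set(fraud_users)                # users that reviewed fraud object i
obj_j = set(fraud_users)                # users that reviewed fraud object j
normal = {f"n{i}" for i in range(200)}  # a normal object's reviewers

before = jaccard(obj_i, obj_j)          # fraud pair similarity: 1.0

# Camouflage: every fraud user also reviews the normal object.
normal |= fraud_users

after = jaccard(obj_i, obj_j)           # unchanged: 1.0
cam = jaccard(obj_i, normal)            # diluted: 10 / 210
```

The fraud pair's similarity is unchanged, while each camouflage-induced pair scores only 10/210 ≈ 0.048, too small to alter a fraud object's top neighbors.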
Effectiveness of the semi-supervised mode. Given a subset of labeled fraud users, FraudTrap switches to the semi-supervised mode. By the design of the C-score, the partially labeled data enhances the similarities between the fraud objects in a group and increases the density of the induced subgraph in the OSG. Thus, unsurprisingly, LPA-TK clusters fraud objects into groups more accurately. The experiments in Section 5.2 demonstrate this conclusion.
5. Experiments and results
We want to answer the following questions in the evaluation:

How does FraudTrap handle loosely synchronized behavior?

Is FraudTrap robust to different types of camouflage?

Does the semisupervised mode improve the performance?

Is FraudTrap scalable to large realworld datasets?
Table 3 gives the details of the datasets used in the paper.
datasets  edges  datasets  edges 

AmazonOffice(amazon_data, )  53K  YelpChi(YELP, )  67K 
AmazonBaby(amazon_data, )  160K  YelpNYC(YELP, )  359K 
AmazonTools(amazon_data, )  134K  YelpZip(YELP, )  1.14M 
AmazonFood(amazon_data, )  1.3 M  DARPA(DARPA, )  4.55M 
AmazonVideo(amazon_data, )  583K  Registration  26k 
AmazonPet(amazon_data, )  157K 
Implementation and existing methods in comparison. We implemented FraudTrap in Python and ran all experiments on a server with two 10-core Intel Xeon E5 CPUs. We compared FraudTrap with the following three state-of-the-art methods that focus on synchronized behavior with application to fraud detection.

Fraudar (FRAUDAR, ) finds the subgraph with the maximal average degree in the bipartite graph using an approximate greedy algorithm. It is designed to be camouflage-resistant.

CatchSync (CATCHSYNC, ) specializes in catching the rare connectivity structures of fraud groups that exhibit synchronized behavior; it proposes synchronicity and normality features based on the degree and HITS score of each user.

CrossSpot(CROSSSPOT, ) detects the dense blocks which maximize the suspiciousness metric in the multidimensional dataset.
We did our best to fine-tune the hyperparameters to achieve their best performance. For CrossSpot, we set the random initialization seeds to 400, 500 and 600, and chose the run with the maximal F1 score. Fraudar detects the subgraph with the maximal average degree, and finds multiple subgraphs by deleting previously detected nodes and their edges. For all methods, we test performance according to the rank of the suspiciousness scores. We compare performance using the standard F1 score (the harmonic mean of precision and recall) across all algorithms.
FraudTrap and FraudTrap+. We run FraudTrap in two modes: the unsupervised mode (FraudTrap) and the semi-supervised mode (FraudTrap+), the latter assuming that 5% of fraud users are randomly labeled. In all experiments, we use the same small K for LPA-TK. In the experiments on the [Amazon] datasets, let A be a fraud object group returned by FraudTrap and U_A the fraud user group returned by Eq. 7; we then filter out a user u ∈ U_A if the out-degree of u is less than 3 in the subgraph induced by U_A and A.
Fraud Group Formulation. To simulate the attack models of smart fraudsters, we used the same method as (FRAUDAR, ; CATCHSYNC, ) to generate labels: inject fraud groups into the Amazon datasets (the [Amazon] datasets contain six collections of reviews for different types of commodities on Amazon, listed in Table 3). To accurately describe the injection, we formulate the fraud group as follows.
Definition 5.1 (Synchrony).
Given a subgraph induced by a fraud group in the bipartite graph, consisting of a set of fraud users and a set of fraud objects: (1) each fraud user creates edges to a subset of the fraud objects, and we define the synchronization ratio s as the mean fraction of fraud objects that a fraud user connects to; (2) each fraud user additionally creates edges to normal objects, and we define c as the mean number of such camouflage edges per user.
Thus, we use the pair (s, c) to represent a fraud group, where s represents how loose its synchronized behavior is and c denotes the average number of camouflage edges per user. Naturally, the injected users and objects are labeled as 'fraudulent'.
Before evaluating the performance of FraudTrap, we first verify the effectiveness of LPA-TK.
5.1. Performance of LPA-TK
We claim that LPA-TK offers the best clustering performance and camouflage resistance, and we design this experiment to demonstrate it. We injected a fraud group into AmazonOffice, in which each fraud user reviews 15 fraud objects and creates c camouflage edges on average. We varied c specifically to examine each clustering algorithm's resistance to camouflage. We built the OSG of the bipartite graph formed by the injected AmazonOffice dataset using the method in Section 4.1. We ran each algorithm on the OSG, evaluated the clustering performance and the performance of detecting fraud objects, and used our suspiciousness metric to score the detected groups. Note that we injected only one group, so the fraud objects should be clustered into a single group. LPA denotes the algorithm of (SEMILPA, ) that treats every edge weight equally, LPA-Sum denotes Algorithm 1 + Eq. 4, and LPA-Max denotes Algorithm 1 + Eq. 5.
Table 4. Clustering performance under varying camouflage c (Num = number of detected groups, AUC = fraud-object detection).

           c = 0        c = 5        c = 10       c = 20
           Num   AUC    Num   AUC    Num   AUC    Num   AUC
LPA         1    1.0     1    0.787   1    0.727   1    0.731
LPA-Sum     1    1.0     1    1.0     1    0.787   1    0.761
LPA-Max    14    0.998  14    0.996  13    0.998  10    0.991
LPA-TK      1    1.0     1    1.0     1    0.999   1    0.998
Table 4 presents the clustering performance of each algorithm. We expect ‘AUC’ = 1.0 for the best fraud-object detection and ‘Num’ = 1 for the best clustering result. We make the following observations: (1) without camouflage, LPA performs ideally; however, once camouflage is added, its performance collapses because LPA clusters all objects, fraudulent and legitimate, into one group (hence ‘Num’ = 1 but low AUC). (2) LPA-Sum shows weak camouflage resistance, and its performance deteriorates as camouflage edges increase. (3) LPA-Max clearly resists camouflage; however, it splits the fraud group into multiple groups, which hinders group analysis and inspection. (4) Our algorithm LPA-TK performs best: it clusters all fraud objects into a single group and separates that group from legitimate objects even in the face of camouflage. The experiment thus demonstrates the advantages of LPA-TK.
5.2. Performance of FraudTrap
For the [Amazon] datasets, we designed two fraud-group injection schemes: the first examines in detail the performance of detecting loosely synchronized behavior and resisting camouflage; the second provides a more general evaluation.
[Injection Scheme 1]. We chose AmazonOffice as the representative dataset and injected a fraud group into it with varying configurations. We introduced two perturbations according to the strategies of smart fraudsters: (1) reduce synchrony by decreasing the synchrony parameter λ; (2) add camouflage edges, setting c so that the number of a node's camouflage edges equals the number of its edges within the fraud group. We used all four types of camouflage described in Sec. 3.
Figure 6 and Figure 5 summarize the performance of detecting fraud objects and fraud users, respectively, under varying λ: the X-axes show the synchronization ratio λ (varying from 0 to 0.5), and the Y-axes show the F1 scores. We make the following observations. 1) Without camouflage and with a high synchronization ratio, both FraudTrap and CatchSync catch all injected frauds. 2) At lower λ's, even without camouflage, the performance of Fraudar decreases significantly, but FraudTrap maintains its performance; even when λ is so small that the edge density of the fraud group is very low, FraudTrap still achieves an F1 score of 0.97. These results confirm the robustness of our approach (OSG + LPA-TK + the suspiciousness metric). 3) Camouflage significantly decreases the performance of CatchSync, but both FraudTrap and Fraudar are resistant to it; not surprisingly, FraudTrap performs much better when camouflage and loose synchronization occur together. 4) As shown in Fig. 6, without camouflage and loose synchronization, CatchSync and Fraudar perform perfectly, but their performance degrades quickly when λ decreases under camouflage. 5) CrossSpot performs poorly for any λ.
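The "edge density" of a group referred to above is the standard bipartite density, i.e. the fraction of possible user-object edges that actually exist. A minimal sketch (our own helper, not part of FraudTrap):

```python
def edge_density(edges, users, objects):
    """Edge density of the subgraph induced by (users, objects):
    |induced edges| / (|users| * |objects|)."""
    users, objects = set(users), set(objects)
    m = sum(1 for u, o in edges if u in users and o in objects)
    return m / (len(users) * len(objects))
```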
[Injection Scheme 2]. In this experiment, we injected 5 fraud groups into each of AmazonOffice, AmazonBaby, AmazonTools, AmazonFood, AmazonVideo, and AmazonBook, with λ randomly chosen for each group. Of the 5 fraud groups, one has no camouflage and the other four are augmented with the four types of camouflage, respectively. Table 5 and Table 6 show the performance of detecting fraud objects and fraud users, respectively. Overall, FraudTrap is the most robust and accurate across all variations and camouflage types. The semi-supervised FraudTrap+, given a random 5% of the fraud user labels, matches or further improves this performance, verifying the conclusion in Section 4.4.
Table 5. F1 scores of detecting fraud objects.

            AmazonOffice  AmazonBaby  AmazonTools  AmazonFood  AmazonVideo  AmazonBook
Fraudar     0.8915        0.8574      0.8764       0.6915      0.7361       0.8923
CatchSync   0.8512        0.8290      0.8307       0.7612      0.7990       0.7634
CrossSpot   0.8213        0.8342      0.7923       0.7732      0.7854       0.8324
FraudTrap   0.9987        0.9495      0.9689       0.8458      0.8651       0.9534
FraudTrap+  0.9987        0.9545      0.9675       0.8758      0.8951       0.9644
Table 6. F1 scores of detecting fraud users.

            AmazonOffice  AmazonBaby  AmazonTools  AmazonFood  AmazonVideo  AmazonBook
Fraudar     0.9015        0.8673      0.8734       0.7213      0.7451       0.8815
CatchSync   0.8732        0.8391      0.8304       0.7234      0.8243       0.7763
CrossSpot   0.8113        0.8422      0.7823       0.7653      0.7913      0.8532
FraudTrap   1             0.9795      0.9796       0.8637      0.8843      0.9572
FraudTrap+  1             0.9845      0.9855       0.8818      0.9111      0.9579
[Yelp] (YELP, ). YelpChi, YelpNYC, and YelpZip are three datasets collected by (YELP2, ) and (YELP, ), containing different numbers of reviews of restaurants on Yelp. Each review records the user who wrote it and the restaurant it targets, so the three datasets can be represented as bipartite graphs of (users, restaurants). All three datasets include labels indicating whether each review is fake. Detecting fraudulent users has been studied in (YELP, ), but using review text information; here we evaluate catching fraudulent restaurants that bought fake reviews using only the (user, restaurant) interactions. Intuitively, the more fake reviews a restaurant receives, the more suspicious it is. We therefore label a restaurant as “fraudulent” if the number of fake reviews it receives exceeds 40 (since legitimate restaurants may also receive a few fake reviews). Table 7 shows the results: FraudTrap and FraudTrap+ achieve the best accuracy on all three datasets.
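The labeling rule is a simple threshold over per-restaurant fake-review counts. A minimal sketch (our own helper; the input layout is assumed):

```python
from collections import Counter

def label_restaurants(reviews, threshold=40):
    """Label a restaurant 'fraudulent' if it received more than
    `threshold` fake reviews; legitimate places may still have a few.
    `reviews` is an iterable of (restaurant_id, is_fake) pairs."""
    fake = Counter(rest for rest, is_fake in reviews if is_fake)
    return {rest: fake[rest] > threshold for rest, _ in reviews}
```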
Table 7. Accuracy on the Yelp datasets.

            YelpChi  YelpNYC  YelpZip
Fraudar     0.9905   0.8531   0.7471
CatchSync   0.9889   0.8458   0.7779
CrossSpot   0.9744   0.7965   0.7521
FraudTrap   0.9905   0.8613   0.7793
FraudTrap+  0.9905   0.8653   0.7953
[DARPA] DARPA (DARPA, ) was collected by the Cyber Systems and Technology Group in 1998. It is a collection of network connections, some of which are TCP attacks. Each connection records a source IP and a destination IP, so the dataset can be modeled as a bipartite graph of (source IPs, destination IPs); we evaluate each method on detecting malicious source IPs and destination IPs separately. The dataset includes labels indicating whether each connection is malicious, and we label an IP as ‘malicious’ if it is involved in any malicious connection.
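The modeling step above can be sketched directly: build the bipartite edge set from the connection log and propagate connection labels to IPs. This is our own illustrative helper, not the paper's code:

```python
def build_ip_graph(connections):
    """Model connections as a bipartite graph (source IPs, destination IPs);
    an IP is 'malicious' if it appears in any malicious connection.
    `connections` is an iterable of (src_ip, dst_ip, is_attack)."""
    edges, malicious = set(), set()
    for src, dst, is_attack in connections:
        edges.add((src, dst))
        if is_attack:
            malicious.update((src, dst))
    return edges, malicious
```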
Table 8 presents the corresponding accuracies. All baselines perform poorly at detecting malicious IPs, whereas FraudTrap and FraudTrap+ exhibit near-perfect accuracy thanks to our OSG-based approach.
Table 8. Accuracy of detecting malicious source/destination IPs on DARPA.

            source IP  destination IP
Fraudar     0.7420     0.7298
CatchSync   0.8069     0.8283
CrossSpot   0.7249     0.6784
FraudTrap   0.9968     0.9920
FraudTrap+  0.9968     0.9920
[Registration] is a real-world user registration dataset with 26k log entries from a large e-commerce website. Each entry contains a user ID, two further features (IP subnet and phone prefix), and a timestamp. The dataset includes labels indicating whether each entry (user ID) is fraudulent; the labels were obtained by tracking the accounts for several months to see whether they conducted fraud after registration, and each fraud user carries a group ID derived from obvious attribute sharing. Of all user accounts, 10k are fraudulent. Note that the registration records contain no user-object interactions. We can nevertheless adapt them to the FraudTrap framework by treating each registered account as an “object” that has many followers identified by a feature value, IP subnet or phone prefix (the “users”). Intuitively, an IP subnet can be used in many registrations, which we model the same way as a user following multiple objects in a social network. Moreover, we use FraudTrap* to denote the mode of FraudTrap that additionally uses the timestamp side information.
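The adaptation above amounts to projecting each log entry into a (feature value, account) edge. A minimal sketch, with hypothetical field names (`user_id` and the feature key are our assumptions about the log layout):

```python
def registrations_to_bipartite(entries, feature):
    """Treat each registered account as an 'object' and the chosen
    feature value (e.g., IP subnet or phone prefix) as the 'user'
    that follows it, yielding a FraudTrap-style bipartite edge set."""
    return {(entry[feature], entry["user_id"]) for entry in entries}
```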
Table 9. Accuracy on [Registration], using each feature as the “user” side.

            Feature: IP  Feature: phone
Fraudar     0.7543       0.8742
CatchSync   0.7242       0.8435
CrossSpot   0.6976       0.8231
FraudTrap   0.7658       0.8979
FraudTrap+  0.7826       0.9113
FraudTrap*  0.7724       0.9215
In our first experiment, we used only the IP subnet feature as the “user” side of the bipartite graph. The left half of Table 9 summarizes the results. We make the following observations: 1) FraudTrap, FraudTrap+, and FraudTrap* outperformed all the other existing methods by a small margin. Looking closer at the detection results, we found that FraudTrap captured a fraud group of 75 fraud users that all other methods missed. The group is quite loosely synchronized, with an edge density of only 0.14 in the original bipartite graph; however, its edge density of 1.0 in the OSG makes it highly suspicious to FraudTrap. 2) FraudTrap+ performed better than the unsupervised version, even with only 5% of the fraud labels.
In the second experiment, we used the phone-prefix feature as the “user” side of the bipartite graph. The right half of Table 9 summarizes the results. The key observations are: 1) FraudTrap, FraudTrap+, and FraudTrap* still outperformed the other methods. The baselines performed worse because they flagged a group containing 125 false positives (along with 75 true positives): they rely only on edge density in the bipartite graph, and that density is not distinctive enough to separate this group from normal behavior.
As an additional benefit, FraudTrap can provide insight into the grouping of fraud users/objects by their similarity. Fig. 7 plots a 2-D projection of the [Registration] data using t-SNE (Van2017Visualizing, ). We labeled the fraud and normal groups according to the score ranking and plotted users in the same group with the same color, expecting points from the same group to cluster together. The results from FraudTrap are much better than those of Fraudar and CrossSpot: users with the same color cluster more tightly, closely matching the clustering given by the ground-truth labels.
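The visualization can be reproduced with the standard scikit-learn t-SNE implementation. A sketch under our own assumptions about the input (one feature vector per user; group ids are only used for coloring when plotting):

```python
import numpy as np
from sklearn.manifold import TSNE

def project_to_2d(features, seed=0):
    """Project per-user feature vectors to 2-D with t-SNE; each point
    is then colored by its detected group id when plotting."""
    x = np.asarray(features, dtype=float)
    return TSNE(n_components=2, perplexity=5.0, init="random",
                random_state=seed).fit_transform(x)
```

Note that `perplexity` must be smaller than the number of samples; for datasets the size of [Registration], larger values (e.g., 30) are typical.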
5.3. Scalability
Sparsity of OSG edges. The datasets above all have low OSG edge densities. We also studied several public datasets by constructing the OSG and computing its edge density: for example, three datasets in (amazon_data, ) and one dataset in (Leskovec2010Signed, ) have edge densities of 0.0027, 0.0027, 0.0028, and 0.0013, respectively. At such densities, the time and space complexities of FraudTrap are both near-linear in the number of edges in the graph.
Based on the AmazonFood dataset, we varied the number of edges via downsampling and verified that the running time of FraudTrap is indeed near-linear, as shown in Fig. 8.
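The downsampling experiment can be sketched as a small harness that times any detector on progressively larger edge samples; the function and its interface are ours, for illustration only:

```python
import random
import time

def runtime_curve(edges, detector, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Downsample the edge list and time the detector at each size,
    yielding (num_edges, seconds) points to check near-linear scaling.
    `detector` is any callable that takes an edge list."""
    rng = random.Random(seed)
    points = []
    for f in fractions:
        sample = rng.sample(edges, int(f * len(edges)))
        t0 = time.perf_counter()
        detector(sample)
        points.append((len(sample), time.perf_counter() - t0))
    return points
```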
6. Conclusion
Fraud detection is a battle with smart adversaries; this is its key difference from typical data mining tasks that work on naturally generated patterns, such as user profiling and recommendation. Fraudsters can adapt their behavior to avoid detection: specifically, they can reduce their synchronized behavior and add camouflage to defeat the state-of-the-art methods in the literature. The reason is that graph features on the original user-object bipartite graph are not high-level enough to capture the real semantics of group fraud, i.e., many similar (fraud) users interacting with many similar (fraud) objects. We propose FraudTrap to capture the more fundamental similarity among fraud objects and to work on the edge density of the Object Similarity Graph (OSG) instead. We designed FraudTrap with many practical considerations for general fraud detection scenarios, such as supporting both unsupervised and semi-supervised learning modes as well as multiple features. We believe the metrics of FraudTrap are much harder for fraudsters to manipulate.
As future work, we will explore more on graph embedding methods to better capture the similarities of objects and users, especially the similarity over a longer period of time. We will also expand the approach to more scenarios, such as social community detection and interest management applications.
Footnotes
 journalyear: 2019
 To be succinct, we use fraud users to refer to these accounts.
References
 U.S. Department of Justice, Federal Bureau of Investigation, “2015 Internet Crime Report,” 2015, https://pdf.ic3.gov/2015_IC3Report.pdf.
 M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang, “Catchsync: catching synchronized behavior in large directed graphs,” in ACM SIGKDD, 2014, pp. 941–950.
 B. Hooi, H. A. Song, A. Beutel, N. Shah, K. Shin, and C. Faloutsos, “Fraudar: Bounding graph fraud in the face of camouflage,” in ACM SIGKDD, 2016, pp. 895–904.
 M. Jiang, A. Beutel, P. Cui, B. Hooi, S. Yang, and C. Faloutsos, “Spotting suspicious behaviors in multimodal data: A general metric and algorithms,” IEEE TKDE, vol. 28, no. 8, pp. 2187–2200, 2016.
 K. Shin, T. Eliassi-Rad, and C. Faloutsos, “Patterns and anomalies in k-cores of real-world graphs with applications,” Knowledge & Information Systems, vol. 54, no. 3, pp. 677–710, 2018.
 B. A. Prakash, A. Sridharan, M. Seshadri, S. Machiraju, and C. Faloutsos, “Eigenspokes: Surprising patterns and scalable community chipping in large graphs,” in PAKDD, 2010.
 S. Pandit, D. H. Chau, S. Wang, and C. Faloutsos, “Netprobe: a fast and scalable system for fraud detection in online auction networks,” in WWW, 2007, pp. 201–210.
 H. Weng, Z. Li, S. Ji, C. Chu, H. Lu, T. Du, and Q. He, “Online e-commerce fraud: a large-scale detection and analysis,” in ICDE, 2018.
 L. Akoglu, R. Chandy, and C. Faloutsos, “Opinion fraud detection in online reviews by network effects.” in ICWSM. The AAAI Press, 2013.
 Y.L. Zhang, L. Li, J. Zhou, X. Li, and Z.H. Zhou, “Anomaly detection with partially observed anomalies.” ACM WWW, 2018, pp. 639–646.
 G. Danezis and P. Mittal, “SybilInfer: Detecting sybil nodes using social networks,” in NDSS. The Internet Society, 2009.
 S. Hou, Y. Ye, Y. Song, and M. Abdulhayoglu, “Hindroid: An intelligent android malware detection system based on structured heterogeneous information network,” in ACM SIGKDD, 2017, pp. 1507–1515.
 K. Lee, J. Caverlee, and S. Webb, “Uncovering social spammers: social honeypots + machine learning,” in ACM SIGIR, 2010, pp. 435–442.
 M. Egele, G. Stringhini, C. Kruegel, and G. Vigna, “Towards detecting compromised accounts on social networks,” IEEE TDSC, vol. 14, no. 4, pp. 447–460, 2017.
 J. McAuley, “Amazon product data,” http://jmcauley.ucsd.edu/data/amazon/.
 Q. Cao, X. Yang, J. Yu, and C. Palow, “Uncovering large groups of active malicious accounts in online social networks,” in ACM CCS, 2014, pp. 477–488.
 A. Beutel, W. Xu, V. Guruswami, C. Palow, and C. Faloutsos, “Copycatch: Stopping group attacks by spotting lockstep behavior in social networks,” in WWW. New York, NY, USA: ACM, 2013, pp. 119–130.
 M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, “Finding deceptive opinion spam by any stretch of the imagination,” in ACL. Association for Computational Linguistics, 2011, pp. 309–319.
 K. Shin, B. Hooi, J. Kim, and C. Faloutsos, “D-cube: Dense-block detection in terabyte-scale tensors,” in ACM WSDM, New York, NY, USA, 2017, pp. 681–689.
 M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang, “Inferring lockstep behavior from connectivity pattern in large graphs,” Knowledge & Information Systems, vol. 48, no. 2, pp. 399–428, 2016.
 N. Shah, A. Beutel, B. Gallagher, and C. Faloutsos, “Spotting suspicious link behavior with fbox: An adversarial perspective,” in IEEE International Conference on Data Mining, 2014, pp. 959–964.
 H. GarciaMolina and J. Pedersen, “Combating web spam with trustrank,” in VLDB, 2004, pp. 576–587.
 S. Ghosh, B. Viswanath, F. Kooti, N. K. Sharma, G. Korlam, F. Benevenuto, N. Ganguly, and K. P. Gummadi, “Understanding and combating link farming in the twitter social network,” in WWW, 2012.
 N. Shah, A. Beutel, B. Hooi, L. Akoglu, S. Gunnemann, D. Makhija, M. Kumar, and C. Faloutsos, “Edgecentric: Anomaly detection in edgeattributed networks,” in IEEE ICDM, 2017, pp. 327–334.
 S. Kumar, B. Hooi, D. Makhija, M. Kumar, C. Faloutsos, and V. S. Subrahmanian, “Fairjudge: Trustworthy user prediction in rating platforms,” CoRR, 2017.
 B. Hooi, N. Shah, A. Beutel, S. Günnemann, L. Akoglu, M. Kumar, D. Makhija, and C. Faloutsos, “Birdnest: Bayesian inference for ratings-fraud detection,” in Proceedings of the 2016 SIAM International Conference on Data Mining. SIAM, 2016, pp. 495–503.
 K. Shin, B. Hooi, and C. Faloutsos, “M-zoom: Fast dense-block detection in tensors with quality guarantees,” in ECML PKDD, 2016.
 Y. Li, Y. Sun, and N. Contractor, “Graph mining assisted semi-supervised learning for fraudulent cash-out detection,” in Proceedings of the 2017 IEEE/ACM ASONAM, 2017, pp. 546–553.
 N. Z. Gong, M. Frank, and P. Mittal, “Sybilbelief: A semi-supervised learning approach for structure-based sybil detection,” IEEE Transactions on Information Forensics & Security, pp. 976–987, 2013.
 Y. Boshmaf, D. Logothetis, G. Siganos, J. Leria, J. Lorenzo, M. Ripeanu, and K. Beznosov, “Integro: Leveraging victim prediction for robust fake account detection in osns,” in Network and Distributed System Security Symposium, 2015, pp. 142–168.
 A. Mohaisen, N. Hopper, and Y. Kim, “Keep your friends close: Incorporating trust into social network-based sybil defenses,” in IEEE INFOCOM, 2011, pp. 1943–1951.
 H. Tang and Z. Cao, “Machine learningbased intrusion detection algorithms,” Journal of Computational Information Systems, pp. 1825–1831, 2009.
 M. Y. Su, “Real-time anomaly detection systems for denial-of-service attacks by weighted k-nearest-neighbor classifiers,” Expert Systems with Applications, vol. 38, no. 4, pp. 3492–3498, 2011.
 P. Jaccard, “The distribution of the flora in the alpine zone,” New Phytologist, vol. 11, no. 2, pp. 37–50, 1912.
 G. Cordasco and L. Gargano, “Community detection via semi-synchronous label propagation algorithms,” in 2010 IEEE International Workshop on: BASNA, Dec 2010, pp. 1–8.
 U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E, vol. 76, no. 3, p. 036106, 2007.
 L. Barenboim and M. Elkin, “Distributed (Δ+1)-coloring in linear (in Δ) time,” in ACM Symposium on Theory of Computing, 2009, pp. 111–120.
 S. Rayana and L. Akoglu, “Collective opinion spam detection: Bridging review networks and metadata,” in ACM SIGKDD, 2015.
 R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. R. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman, “Evaluating intrusion detection systems: the 1998 DARPA offline intrusion detection evaluation,” in Proceedings DARPA Information Survivability Conference and Exposition (DISCEX’00), vol. 2, Jan 2000, pp. 12–26.
 A. Mukherjee, V. Venkataraman, B. Liu, and N. Glance, “What yelp fake review filter might be doing?” in Proceedings of the 7th International Conference on Weblogs and Social Media, ICWSM 2013. AAAI press, 1 2013, pp. 409–418.
 L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
 J. Leskovec, D. Huttenlocher, and J. Kleinberg, “Signed networks in social media,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2010, pp. 1361–1370.