An Improved Naive Bayes Classifier-based Noise Detection Technique for Classifying User Phone Call Behavior
Abstract
The presence of noisy instances in mobile phone data is a fundamental issue for classifying user phone call behavior (i.e., accept, reject, missed and outgoing), with many potential negative consequences. The classification accuracy may decrease and the complexity of the classifiers may increase due to the number of redundant training samples. To detect such noisy instances in a training dataset, researchers use the naive Bayes classifier (NBC), as it identifies misclassified instances by taking into account the independence assumption and the conditional probabilities of the attributes. However, some of these misclassified instances might represent the usage behavior patterns of individual mobile phone users. Existing naive Bayes classifier based noise detection techniques have not considered this issue and thus suffer in classification accuracy.
In this paper, we propose an improved noise detection technique based on the naive Bayes classifier for effectively classifying users' phone call behaviors. In order to improve the classification accuracy, we effectively identify noisy instances in the training dataset by analyzing the behavioral patterns of individuals. We dynamically determine a noise threshold according to the individual's unique behavioral patterns by using both the naive Bayes classifier and the Laplace estimator, and use this threshold to identify noisy instances. To measure the effectiveness of our technique in classifying user phone call behavior, we employ a popular classification algorithm, the decision tree. Experimental results on real phone call log datasets show that our proposed technique more accurately identifies the noisy instances in the training datasets, which leads to better classification accuracy.
Iqbal H. Sarker and Muhammad Ashad Kabir and Alan Colman and Jun Han 
Department of Computer Science and Software Engineering
School of Software and Electrical Engineering
Swinburne University of Technology,
Melbourne, VIC3122, Australia.
Email: {msarker, acolman, jhan}@swin.edu.au
School of Computing and Mathematics,
Charles Sturt University, NSW, Australia.
Email: akabir@csu.edu.au
Keywords: Mobile Data Mining, Noisy Data, Naive Bayes Classifier, Decision Tree, Classification, Laplace Estimator.
1 Introduction
Nowadays, mobile phones have become part of our daily life. The number of mobile cellular subscriptions is almost equal to the number of people on the planet (?) and the phones are, for most of the day, with their owners as they go through their daily routines (?). People use mobile phones for various activities such as voice communication, Internet browsing, app usage, email, online social networking, instant messaging, etc. (?). In recent years, researchers have used various types of mobile phone data such as phone call logs (?), app usage logs (?), mobile phone notification histories (?), web logs (?), and context logs (?) for different personalized applications. For instance, the phone call log is used to predict users' behavior in order to build an automated call firewall or call reminder system (?).
In the data mining area, classification is a function that describes and distinguishes data classes or concepts (?). The goal of classification is to accurately predict the class labels of instances whose attribute values are known, but whose class values are unknown. Accurately classifying user phone call behavior from log data using machine learning techniques (e.g., decision trees) is challenging as it requires a dataset free from outliers or noise (?). However, real-world datasets may contain noise, which is anything that obscures the relationship between the features of an instance and its behavior class (?). Such noisy instances may reduce the classification accuracy and increase the complexity of the classification process. It is also evident that decision trees are badly impacted by noise (?). Hence, we summarize the effects of noisy instances for classifying user phone call behavior as follows:

- They create unnecessary classification rules that are not interesting to the users and make the ruleset larger.
- The complexity of the classifiers and the number of necessary training samples may increase.
- The presence of noisy training instances is more likely to cause overfitting for the decision tree classifier and thus decrease its accuracy.
According to (?), the performance of a classifier depends on two significant factors: (1) the quality of the training data, and (2) the competence of the learning algorithm. Therefore, identification and elimination of the noisy instances from a training dataset are required to ensure the quality of the training data before applying a learning technique, in order to achieve better classification accuracy.
NBC is the most popular technique to detect noisy instances in a training dataset, owing to its independence assumption and use of conditional probabilities (?) (?). Farid et al. (?) have proposed a naive Bayes classifier based noise detection technique for multi-class classification tasks. This technique finds the noisy instances in a training dataset using a naive Bayes classifier and removes these instances from the training set before constructing a decision tree for making decisions. In their approach, they identify all the misclassified instances in the training dataset using NBC and consider these instances as noise. However, some of these misclassified instances might represent true behavioral patterns of individuals. Therefore, such a strong assumption regarding noisy instances is likely to decrease the classification accuracy when mining phone call behavior.
In this paper, we address the above mentioned issue in identifying noisy instances and propose an improved noise detection technique based on the naive Bayes classifier for effectively classifying mobile users' phone call behaviors. In our approach, we first calculate the conditional probability for all the instances using the naive Bayes classifier and the Laplace estimator. After that, we dynamically determine a noise threshold according to the individual's unique behavioral patterns. Finally, the (misclassified) instances that cannot satisfy this threshold are selected as noise. As individuals' phone call behavioral patterns are not identical in real life, this threshold for identifying noisy instances changes dynamically according to the behavior of each individual. To measure the effectiveness of our technique for classifying user phone call behavior, we employ a prominent classification algorithm, the decision tree. Our approach aims to improve the existing naive Bayes classifier based noise detection technique (?) for classifying the phone call behavior of individuals.
The contributions are summarized as follows:

- We determine a noise threshold dynamically according to the individual's unique behavioral patterns.
- We propose an improved noise detection technique based on the naive Bayes classifier for effectively classifying mobile users' phone call behaviors.
- Our experiments on real mobile phone datasets show that this technique is more effective than the existing technique for classifying user phone call behavior.
The rest of the paper is organized as follows. We review the naive Bayes classifier and the Laplacian estimator in Section 2 and Section 3, respectively. We present our approach in Section 4 and report the experimental results in Section 5. Finally, Section 6 concludes the paper and highlights future work.
Table 1: An example mobile phone dataset.

Day[TimeSegment]  Location  Situation  Relationship  User Behavior

Fri[S1]  Office  Meeting  Friend  Reject
Fri[S1]  Office  Meeting  Colleague  Reject
Fri[S1]  Office  Meeting  Boss  Accept
Fri[S1]  Office  Meeting  Friend  Reject
Fri[S2]  Home  Dinner  Friend  Accept
Wed[S1]  Office  Seminar  Unknown  Reject
Wed[S1]  Office  Seminar  Colleague  Reject
Wed[S1]  Office  Seminar  Mother  Accept
Wed[S2]  Home  Dinner  Unknown  Accept
Table 2: Prior probabilities for each behavior class.

Probability  Value

P(behavior = Reject)  5/9
P(behavior = Accept)  4/9
Table 3: Conditional probabilities for each attribute value.

Probability  Value

P(Day[TimeSegment] = Fri[S1] | Reject)  3/5
P(Day[TimeSegment] = Fri[S1] | Accept)  1/4
P(Day[TimeSegment] = Fri[S2] | Reject)  0/5
P(Day[TimeSegment] = Fri[S2] | Accept)  1/4
P(Day[TimeSegment] = Wed[S1] | Reject)  2/5
P(Day[TimeSegment] = Wed[S1] | Accept)  1/4
P(Day[TimeSegment] = Wed[S2] | Reject)  0/5
P(Day[TimeSegment] = Wed[S2] | Accept)  1/4
P(Location = Office | Reject)  5/5
P(Location = Office | Accept)  2/4
P(Location = Home | Reject)  0/5
P(Location = Home | Accept)  2/4
P(Situation = Meeting | Reject)  3/5
P(Situation = Meeting | Accept)  1/4
P(Situation = Seminar | Reject)  2/5
P(Situation = Seminar | Accept)  1/4
P(Situation = Dinner | Reject)  0/5
P(Situation = Dinner | Accept)  2/4
P(Relationship = Friend | Reject)  2/5
P(Relationship = Friend | Accept)  1/4
P(Relationship = Colleague | Reject)  2/5
P(Relationship = Colleague | Accept)  0/4
P(Relationship = Boss | Reject)  0/5
P(Relationship = Boss | Accept)  1/4
P(Relationship = Mother | Reject)  0/5
P(Relationship = Mother | Accept)  1/4
P(Relationship = Unknown | Reject)  1/5
P(Relationship = Unknown | Accept)  1/4
2 Naive Bayes Classifier
A naive Bayes classifier (NBC) is a simple probabilistic method, which can predict the class membership probabilities (?) (?). It has two main advantages: (a) it is easy to use, and (b) only one scan of the training data is required for probability estimation. A naive Bayes classifier can easily handle missing attribute values by simply omitting the corresponding probabilities for those attributes when calculating the likelihood of membership for each class. It assumes class conditional independence, i.e., that the effect of an attribute on a given class is independent of the values of the other attributes.
Let $D$ be a training set of data instances and their associated class labels. Each instance is represented by an $n$-dimensional attribute vector, $X = (x_1, x_2, \ldots, x_n)$, depicting $n$ measurements made on the instance from $n$ attributes, respectively, $A_1, A_2, \ldots, A_n$. Suppose that there are $m$ classes, $C_1, C_2, \ldots, C_m$. For a test instance $X$, the classifier will predict that $X$ belongs to the class with the highest conditional probability, conditioned on $X$. That is, the naive Bayes classifier predicts that the instance $X$ belongs to the class $C_i$ if and only if

$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i.$$

The class $C_i$ for which $P(C_i \mid X)$ is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,

$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)} \qquad (1)$$
In Bayes' theorem, shown in Equation (1), since $P(X)$ is a constant for all classes, only $P(X \mid C_i)P(C_i)$ needs to be maximized. If the class prior probabilities are not known, it is commonly assumed that the classes are equally likely, that is, $P(C_1) = P(C_2) = \cdots = P(C_m)$, and we would therefore maximize $P(X \mid C_i)$; otherwise, we maximize $P(X \mid C_i)P(C_i)$. The class prior probabilities are calculated by $P(C_i) = |C_{i,D}| / |D|$, where $|C_{i,D}|$ is the number of training instances of class $C_i$ in $D$. Computing $P(X \mid C_i)$ directly in a dataset with many attributes is extremely computationally expensive. Thus, the naive assumption of class-conditional independence is made in order to reduce the computation in evaluating $P(X \mid C_i)$. This presumes that the attributes' values are conditionally independent of one another given the class label of the instance, i.e., there are no dependence relationships among the attributes. Thus, Equations (2) and (3) are used to produce $P(C_i \mid X)$:

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) \qquad (2)$$

$$P(C_i \mid X) = \frac{P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)}{P(X)} \qquad (3)$$
In Equation (2), $x_k$ refers to the value of attribute $A_k$ for instance $X$. These probabilities can therefore be easily estimated from the training instances. If the attribute value $x_k$ is categorical, then $P(x_k \mid C_i)$ is the number of instances of class $C_i$ in $D$ having the value $x_k$ for $A_k$, divided by $|C_{i,D}|$, i.e., the number of instances belonging to class $C_i$ in $D$.
To predict the class label of instance $X$, $P(X \mid C_i)P(C_i)$ is evaluated for each class $C_i$. The naive Bayes classifier predicts that the class label of instance $X$ is the class $C_i$ if and only if

$$P(X \mid C_i)P(C_i) > P(X \mid C_j)P(C_j) \quad \text{for } 1 \le j \le m,\ j \ne i.$$

In other words, the predicted class label is the class $C_i$ for which $P(X \mid C_i)P(C_i)$ is the maximum.
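As an illustration of how these probabilities are estimated and combined, the following Python sketch reproduces the prior and conditional probabilities of Tables 2 and 3 from the Table 1 data and predicts a class label. This is our own illustrative code, not the authors' implementation; all function names are ours.

```python
from collections import Counter, defaultdict

def train_nb(instances, labels):
    """Estimate prior and conditional probabilities from a training set."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: cnt / n for c, cnt in class_counts.items()}
    # counts[(attr_index, value, cls)] = number of class-cls instances with that value
    counts = defaultdict(int)
    for x, c in zip(instances, labels):
        for k, v in enumerate(x):
            counts[(k, v, c)] += 1
    cond = {key: cnt / class_counts[key[2]] for key, cnt in counts.items()}
    return priors, cond

def score(x, cls, priors, cond):
    """Unnormalized P(cls) * prod_k P(x_k | cls), i.e. Equation (3) without P(X)."""
    p = priors[cls]
    for k, v in enumerate(x):
        p *= cond.get((k, v, cls), 0.0)  # unseen attribute value -> zero probability
    return p

# Table 1 rows: (Day[TimeSegment], Location, Situation, Relationship) -> behavior
data = [
    (("Fri[S1]", "Office", "Meeting", "Friend"),    "Reject"),
    (("Fri[S1]", "Office", "Meeting", "Colleague"), "Reject"),
    (("Fri[S1]", "Office", "Meeting", "Boss"),      "Accept"),
    (("Fri[S1]", "Office", "Meeting", "Friend"),    "Reject"),
    (("Fri[S2]", "Home",   "Dinner",  "Friend"),    "Accept"),
    (("Wed[S1]", "Office", "Seminar", "Unknown"),   "Reject"),
    (("Wed[S1]", "Office", "Seminar", "Colleague"), "Reject"),
    (("Wed[S1]", "Office", "Seminar", "Mother"),    "Accept"),
    (("Wed[S2]", "Home",   "Dinner",  "Unknown"),   "Accept"),
]
X, y = zip(*data)
priors, cond = train_nb(X, y)
print(priors["Reject"])               # 5/9, as in Table 2
print(cond[(3, "Friend", "Reject")])  # 2/5 = 0.4, as in Table 3
test = ("Fri[S1]", "Office", "Meeting", "Friend")
pred = max(priors, key=lambda c: score(test, c, priors, cond))
print(pred)                           # "Reject"
```

For the test instance, the Reject score is (5/9)(3/5)(5/5)(3/5)(2/5) = 0.08 while the Accept score is (4/9)(1/4)(2/4)(1/4)(1/4) ≈ 0.0035, so Reject is predicted.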
3 Laplacian Estimation
In the naive Bayes classifier, we calculate $P(X \mid C_i)$ as the product of the probabilities $P(x_k \mid C_i)$, based on the independence assumption and the class conditional probabilities. As a consequence, we end up with a probability value of zero for some $P(x_k \mid C_i)$ if the attribute value $x_k$ is never observed in the training data for class $C_i$. Equation (3) then becomes zero for such an attribute value, regardless of the values of the other attributes, and the naive Bayes classifier cannot predict the class of such a test instance. The Laplace estimate (?) is usually employed to scale up the values by a smoothing factor. With the Laplace estimate, the probability is defined as:

$$P = \frac{n_c + k}{N + k \times m} \qquad (4)$$

where $n_c$ is the number of instances satisfying the condition of interest, $N$ is the number of training instances, $m$ is the number of classes (or, for a conditional probability, the number of distinct values of the attribute), and $k = 1$.
Let us consider a phone call behavior example: for the behavior class 'reject' in a training dataset containing 1000 instances, suppose one attribute takes three values, observed in 0, 990 and 10 instances, respectively. The probabilities of these values are 0, 0.990 (from 990/1000), and 0.010 (from 10/1000), respectively. According to Equation (4), the probabilities of these values would instead be

$$\frac{0+1}{1000+3} \approx 0.001, \qquad \frac{990+1}{1000+3} \approx 0.988, \qquad \frac{10+1}{1000+3} \approx 0.011.$$

In this way, we obtain nonzero probabilities (rounded to three decimal places) using Laplacian estimation. The "new" probability estimates are close to their "previous" counterparts, and these values can be used for further processing.
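A minimal sketch of Equation (4) applied to this example, assuming $k = 1$ and $m = 3$ distinct attribute values:

```python
def laplace(n_c, N, m, k=1):
    """Laplace-smoothed probability estimate, Equation (4)."""
    return (n_c + k) / (N + k * m)

# Counts 0, 990 and 10 for the three values of one attribute in class 'reject'
counts, N, m = [0, 990, 10], 1000, 3
probs = [laplace(c, N, m) for c in counts]
print([round(p, 3) for p in probs])  # [0.001, 0.988, 0.011]
```

The zero count becomes a small nonzero probability while the large counts are barely perturbed, which is exactly the behavior the text describes.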
4 Noise Detection Technique
In this section, we discuss our noise detection technique in order to effectively classify user phone call behavior. Figure 1 shows the block diagram of our noise detection technique.
In order to detect noise, we use the naive Bayes classifier (NBC) (?) as the basis for noise identification. Using NBC, we first calculate the conditional probability for each attribute by scanning the training data. Table 1 shows an example of the mobile phone dataset. Each instance contains four attribute values (i.e., time, location, situation, and the relationship between caller and callee) and the corresponding phone call behavior. Table 2 and Table 3 report the prior probabilities for each behavior class and the conditional probabilities for each attribute value, respectively, for this dataset. Using these probabilities, we calculate the conditional probability for each instance. As NBC is built on the independence assumption, it estimates a zero probability for an instance whenever the conditional probability of a single attribute value is zero. In such cases, we use the Laplace estimator (?) to estimate the conditional probability of that attribute value.
Once we have calculated the conditional probability for each instance, we differentiate between purely classified instances and misclassified instances using these values. "Purely classified" instances are those for which the predicted class and the original class are the same; if a different class is found, they are "misclassified" instances. After that, we generate instance groups by treating each distinct probability value as a separate group. Figure 2 shows an example of such instance groups, where the first group consists of 5 instances sharing one probability value, the second group of 3 instances sharing another, and the third group of 3 instances sharing a third. We then identify the group among the purely classified instances for which the probability is minimum; this minimum probability is taken as the "noise threshold". Finally, the instances in the misclassified list whose probabilities are less than the noise threshold are identified as noise.
The process for identifying noise is set out in Algorithm 1. The input is the training dataset $D$, which contains a set of training instances and their associated class labels, and the output is the list of noisy instances. For each class, we calculate the prior probability (line 2). After that, for each attribute value, we calculate the class conditional probability (line 5). For each training instance, we calculate the conditional probability (line 8) and check whether it is non-zero; if we get a zero probability, we recalculate the conditional probability using the Laplacian estimator (line 11). Based on these probability values, we then check whether each instance is misclassified or purely classified, and store all misclassified instances (line 14) with their corresponding probabilities (line 15). Similarly, we store all purely classified instances (line 18) with their corresponding probabilities (line 19). We then identify the minimum probability among the purely classified instances as the noise threshold (line 22). As we aim to identify the noise list, we check the conditional probabilities of all misclassified instances; if an instance fails to satisfy the threshold, we store it as noise (line 26). Finally, the algorithm returns the set of noisy instances (line 29) for the given dataset.
Rather than arbitrarily setting the threshold, our algorithm dynamically identifies the noise threshold according to the individual's behavioral patterns and identifies noisy instances based on this threshold. As individuals' phone call behavioral patterns are not identical in real life, this noise threshold changes dynamically according to each individual's unique behavioral patterns.
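The steps above can be sketched as follows. This is an illustrative Python sketch, not the authors' code: `prob_of` and `predict` are assumed stand-ins for the NBC and Laplace-estimator routines of Sections 2 and 3.

```python
def detect_noise(instances, labels, prob_of, predict):
    """Sketch of Algorithm 1.

    prob_of(x, c): (Laplace-smoothed) conditional probability of class c given x.
    predict(x):    class predicted by the naive Bayes classifier for x.
    Returns the indices of the instances identified as noise.
    """
    pure, misclassified = [], []
    for i, (x, c) in enumerate(zip(instances, labels)):
        p = prob_of(x, c)
        # Purely classified: predicted class equals the original class
        (pure if predict(x) == c else misclassified).append((i, p))
    # Dynamic noise threshold: minimum probability among purely classified instances
    threshold = min(p for _, p in pure)
    # Only misclassified instances that fall below the threshold are noise;
    # the rest are kept as genuine behavioral patterns
    return [i for i, p in misclassified if p < threshold]
```

A misclassified instance whose probability is at or above the threshold is retained, which is precisely where this technique departs from treating every misclassified instance as noise.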
5 Experiments
In this section, we describe our experimental setup and the phone log datasets used in the experiments. We also present an experimental evaluation comparing our proposed noise detection technique with the existing naive Bayes classifier based noise detection technique (?) for classifying user phone call behavior.
5.1 Experimental Setup
We have implemented our noise detection technique (Algorithm 1) and the existing naive Bayes classifier based technique (?) in the Java programming language and executed them on a Windows PC with an Intel Core i5 CPU (3.20GHz) and 8GB memory. In order to measure the classification accuracy, we first eliminate the noisy instances identified by the noise identification technique from the training dataset, and then apply the decision tree classifier (?) to the noise-free dataset. The reason for choosing the decision tree as the classifier is that it is the most popular classification algorithm in data mining (?) (?). The code for the basic version of the decision tree classifier is adopted from Weka, an open source data mining tool (?).
5.2 Dataset
We have conducted experiments on the phone log datasets of five individual mobile phone users, randomly selected from the Massachusetts Institute of Technology (MIT) Reality Mining dataset (?). From these datasets, we extract 7-tuple call record information for each phone user: date of call, time of call, call type, call duration, location, relationship, and call ID. The datasets contain three types of phone call behavior: incoming, missed and outgoing. As such, the user's behavior in accepting and rejecting calls is not directly distinguishable among the incoming calls in the dataset. We therefore derive accept and reject calls using the call duration: if the call duration is greater than 0, the call was accepted; if it is equal to 0, the call was rejected (?). We also preprocess the temporal data in the mobile phone log, as it is continuous and numeric; for this, we use the BOTS technique (?) for producing behavior-oriented time segments. Table 4 describes each individual mobile phone user's dataset.
Table 4: Description of each individual mobile phone user's dataset.

Dataset  Contexts  Instances  Behavior Classes

User 04  temporal, location, relationship  5119  accept, reject, missed, outgoing
User 23  temporal, location, relationship  1229  accept, reject, missed, outgoing
User 26  temporal, location, relationship  3255  accept, reject, missed, outgoing
User 33  temporal, location, relationship  635  accept, reject, missed, outgoing
User 51  temporal, location, relationship  2096  accept, reject, missed, outgoing
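The derivation of accept and reject calls from the raw log can be expressed as a small helper. This is a sketch; the field names and values are ours, not those of the Reality Mining schema.

```python
def behavior(call_type, duration_sec):
    """Map a raw call record to one of the four behavior classes.

    Incoming calls are split into accept/reject by call duration:
    a positive duration means the call was answered.
    """
    if call_type == "incoming":
        return "accept" if duration_sec > 0 else "reject"
    return call_type  # "missed" and "outgoing" are kept as-is

print(behavior("incoming", 42))  # accept
print(behavior("incoming", 0))   # reject
print(behavior("missed", 0))     # missed
```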
5.3 Evaluation Metric
In order to measure the classification accuracy, we compare the classified response with the actual response (i.e., the ground truth) and compute the accuracy in terms of:

Precision: the ratio of the number of phone call behaviors that are correctly classified to the total number of behaviors that are classified (both correctly and incorrectly). If TP and FP denote true positives and false positives, then precision is defined as:

$$Precision = \frac{TP}{TP + FP} \qquad (5)$$

Recall: the ratio of the number of phone call behaviors that are correctly classified to the total number of behaviors that are relevant. If TP and FN denote true positives and false negatives, then recall is defined as:

$$Recall = \frac{TP}{TP + FN} \qquad (6)$$

F-measure: a measure that combines precision and recall as their harmonic mean:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (7)$$
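Equations (5) through (7) can be computed directly from the confusion counts; the following is an illustrative sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-measure per Equations (5)-(7)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = precision_recall_f1(tp=80, fp=20, fn=20)
print(p, r, round(f, 2))  # 0.8 0.8 0.8
```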
5.4 Evaluation Results
To evaluate our approach, we employ 10-fold cross-validation on each dataset. In $k$-fold cross-validation, the initial data are randomly partitioned into $k$ mutually exclusive subsets or "folds", $D_1, D_2, \ldots, D_k$, each of approximately equal size. Training and testing are performed $k$ times: in iteration $i$, the partition $D_i$ is reserved as the test set and the remaining partitions are collectively used to train the classifier. Therefore, 10-fold cross-validation breaks the data into 10 sets of size $N/10$, trains the classifier on 9 sets and tests it on the remaining set, repeated 10 times, and we take the mean accuracy rate. For classification, the accuracy estimate is the total number of correct classifications over the $k$ iterations, divided by the total number of instances in the initial dataset. To show the effectiveness of our technique, we compare the accuracy of the existing naive Bayes classifier based noise detection approach (NBC) (?) and our proposed dynamic threshold based approach, in terms of precision, recall and F-measure.
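The 10-fold procedure can be sketched as follows. This is illustrative code; `train_fn` stands for the assumed pipeline of noise removal followed by decision tree induction and is not part of the original text.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(instances, labels, train_fn, k=10):
    """Train on k-1 folds, test on the held-out fold; return mean accuracy.

    train_fn(train_pairs) must return a model: a callable mapping an
    instance to a predicted class label.
    """
    folds = kfold_indices(len(instances), k)
    accuracies = []
    for i in range(k):
        test_idx = set(folds[i])
        train = [(x, y) for j, (x, y) in enumerate(zip(instances, labels))
                 if j not in test_idx]
        model = train_fn(train)  # e.g. noise removal + decision tree
        correct = sum(model(instances[j]) == labels[j] for j in folds[i])
        accuracies.append(correct / len(folds[i]))
    return sum(accuracies) / k
```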
Table 5: Experimental results using the existing NBC-based noise detection approach.

Dataset  Precision  Recall  F-measure

User 04  0.91  0.30  0.45
User 23  0.83  0.84  0.83
User 26  0.89  0.51  0.65
User 33  0.80  0.85  0.80
User 51  0.78  0.78  0.78
Table 6: Experimental results using our proposed dynamic threshold based approach.

Dataset  Precision  Recall  F-measure

User 04  0.89  0.70  0.78
User 23  0.84  0.84  0.84
User 26  0.92  0.72  0.80
User 33  0.82  0.86  0.81
User 51  0.86  0.85  0.85
Table 5 and Table 6 show the experimental results on the five individual mobile phone users' datasets using the existing naive Bayes classifier based noise detection approach and our dynamic threshold based approach, respectively. From Table 5 and Table 6, we find that our approach outperforms the previous NBC-based approach for every individual in terms of F-measure, while matching or improving recall. In addition to the individual-level comparison, we show the relative comparison of average precision, average recall and average F-measure over the five datasets in Figure 3.
The experimental results over a collection of users show that our approach consistently outperforms the NBC-based approach. The reason is that, instead of treating all misclassified instances as noise, we identify the truly noisy instances from the misclassified list using a noise threshold. We determine this noise threshold separately for each individual dataset, as it varies according to the individual's unique behavioral patterns. As a result, our technique improves the classification accuracy when classifying the phone call behavior of individual mobile phone users.
6 Conclusion and Future Work
In this paper, we have presented an approach to detecting and eliminating noisy instances from mobile phone data in order to improve classification accuracy. Our approach dynamically determines the noise threshold according to the individual's behavioral patterns, employing both the naive Bayes classifier and the Laplacian estimator. Experimental results on multi-contextual phone call log datasets indicate that, compared to the NBC-based approach, our approach improves the classification accuracy in terms of precision, recall and F-measure.
In future work, we plan to investigate the effect of noise on the confidence threshold used to produce association rules. We will extend our noise detection technique to produce confidence-based association rules for individual mobile phone users in multi-dimensional contexts.
References
 [1] Cestnik, B. et al. (1990), Estimating probabilities: a crucial task in machine learning, in 'ECAI', Vol. 90, pp. 147–149.
 [2] Chen, J., Huang, H., Tian, S. & Qu, Y. (2009), 'Feature selection for text classification with naïve Bayes', Expert Systems with Applications 36(3), 5432–5435.
 [3] Daza, L. & Acuna, E. (2007), An algorithm for detecting noise on supervised classification, in 'Proceedings of WCECS'07, the 1st World Conference on Engineering and Computer Science', pp. 701–706.
 [4] Eagle, N., Pentland, A. & Lazer, D. (2006), 'Inferring social network structure using mobile phone data', Proceedings of the National Academy of Sciences.
 [5] Farid, D. M., Zhang, L., Rahman, C. M., Hossain, M. A. & Strachan, R. (2014), 'Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks', Expert Systems with Applications 41(4), 1937–1946.
 [6] Frénay, B. & Verleysen, M. (2014), 'Classification in the presence of label noise: a survey', IEEE Transactions on Neural Networks and Learning Systems 25(5), 845–869.
 [7] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. & Witten, I. H. (2009), 'The WEKA data mining software: an update', ACM SIGKDD Explorations Newsletter 11(1), 10–18.
 [8] Halvey, M., Keane, M. T. & Smyth, B. (2005), Time based segmentation of log data for user navigation prediction in personalization, in 'Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence', IEEE Computer Society, pp. 636–640.
 [9] Han, J., Pei, J. & Kamber, M. (2011), Data Mining: Concepts and Techniques, Elsevier.
 [10] John, G. H. & Langley, P. (1995), Estimating continuous distributions in Bayesian classifiers, in 'Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence', Morgan Kaufmann Publishers Inc., pp. 338–345.
 [11] Mehrotra, A., Hendley, R. & Musolesi, M. (2016), PrefMiner: mining user's preferences for intelligent mobile notification management, in 'Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing', ACM, pp. 1223–1234.
 [12] Ozer, M., Keles, I., Toroslu, H., Karagoz, P. & Davulcu, H. (2016), 'Predicting the location and time of mobile phone users by using sequential pattern mining techniques', The Computer Journal 59(6), 908–922.
 [13] Pejovic, V. & Musolesi, M. (2014), InterruptMe: designing intelligent prompting mechanisms for pervasive applications, in 'Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing', ACM, pp. 897–908.
 [14] Phithakkitnukoon, S., Dantu, R., Claxton, R. & Eagle, N. (2011), 'Behavior-based adaptive call predictor', ACM Transactions on Autonomous and Adaptive Systems (TAAS) 6(3), 21.
 [15] Quinlan, J. R. (1993), 'C4.5: Programs for machine learning', Machine Learning.
 [16] Sarker, I. H., Colman, A., Kabir, M. A. & Han, J. (2016), Behavior-oriented time segmentation for mining individualized rules of mobile phone users, in '2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, Canada', IEEE, pp. 488–497.
 [17] Sarker, I. H., Kabir, M. A., Colman, A. & Han, J. (2017), An effective call prediction model based on noisy mobile phone data, in 'Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct', ACM.
 [18] Srinivasan, V., Moghaddam, S. & Mukherji, A. (2014), MobileMiner: Mining your frequent patterns on your phone, in 'ACM International Joint Conference on Pervasive and Ubiquitous Computing', ACM.
 [19] Witten, I. H. & Frank, E. (2005), Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann.
 [20] Witten, I. H., Frank, E., Trigg, L. E., Hall, M. A., Holmes, G. & Cunningham, S. J. (1999), 'WEKA: Practical machine learning tools and techniques with Java implementations'.
 [21] Wu, C.-C., Chen, Y.-L., Liu, Y.-H. & Yang, X.-Y. (2016), 'Decision tree induction with a constrained number of leaf nodes', Applied Intelligence 45(3), 673–685.
 [22] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G. J., Ng, A., Liu, B., Philip, S. Y. et al. (2008), 'Top 10 algorithms in data mining', Knowledge and Information Systems 14(1), 1–37.
 [23] Zhu, H. & Chen, E. (2014), 'Mining mobile user preferences for personalized context-aware recommendation', ACM Transactions on Intelligent Systems and Technology 5(4).
 [24] Zhu, X. & Wu, X. (2004), 'Class noise vs. attribute noise: A quantitative study', Artificial Intelligence Review 22(3), 177–210.