An anomaly prediction framework for financial IT systems using hybrid machine learning methods


In the financial field, a robust IT system is vital to the smooth operation of financial transactions. However, many financial corporations still depend on operators to identify and eliminate system failures when financial IT systems break down. This traditional operation method is time-consuming and inefficient. To improve the efficiency and accuracy of system failure detection, and thereby reduce the impact of system failures on financial services, we propose a novel machine learning-based framework to predict the occurrence of system exceptions and failures in a financial IT system. In particular, we first extract rich information from system logs and eliminate noise in the data. The cleaned data is then used as the input of our proposed anomaly prediction framework, which consists of three modules: a key performance indicator (KPI) data prediction module, an anomaly identification module and a severity classification module. Notably, we design a hierarchical architecture of alarm classifiers to alleviate the influence of the class-imbalance problem on the overall performance. Empirically, the experimental results demonstrate the superior performance of our proposed method on a real-world financial IT system log data set.

Keywords: Anomaly prediction, Time series prediction, Hierarchical classifier, Imbalanced learning

1 Introduction

IT systems are widely used in many areas and play important roles in different corporations. The IT systems in financial companies handle large volumes of financial transactions every day, providing reliable and effective services to traders. However, system exceptions and failures can have a severe, even critical, impact on financial IT systems and disrupt the smooth operation of financial transactions. In traditional operation methods, operators need to monitor servers all the time, and fault location and troubleshooting are laborious. Hence, it is worthwhile to automatically detect system anomalies and report alarms to operators, which saves time and thereby improves the efficiency of operation.

Recently, extensive research has been conducted on automatically detecting and predicting anomalies in IT systems. Pellegrini et al. (2015) proposed a machine learning based framework to calculate the remaining time to system failure. However, several procedures require manual intervention and parameters need to be set in advance, so the goal of fully automatic operation is still not achieved. Naveiro et al. (2018) presented a framework that monitors KPI time series and detects anomalies in a completely automatic way. The main drawback of this work is that the thresholds need to be preset empirically.

Although aforementioned methods have been proposed, predicting anomalies in IT systems still faces the following challenges:

  • Inaccuracy of labelling anomalies: Most of the anomalies are labelled by operators. Different operators may produce different label results for the same sample.

  • Incomplete anomaly cases: System anomalies occur rarely, and the collected data set cannot cover all kinds of anomalous cases.

  • Feature extraction and selection: The performance of the anomaly prediction model depends on the input features to a large extent, and the raw log files contain lots of redundant information. Hence, it is essential to utilize appropriate features which can accurately capture the characteristics of system anomalies when training the model.

  • Class imbalance: The class imbalance problem means the number of anomalous cases is far exceeded by the number of normal cases in the collected data set. This problem can severely affect the performance of anomaly prediction methods.

The first two challenges are inevitable in the data collection process; therefore, we mainly focus on addressing the latter two. As for the feature selection challenge, we combine multiple KPI features from the noisy system logs to depict the system running state as comprehensively as possible, and then use these features to predict the existence of system anomalies. Previous works Bertero et al. (2017); Stearley and Oliner (2008) focused only on text analysis approaches to reveal anomalies, which are prone to be affected by redundant information and noise in the text of system logs. In contrast, we do not focus on exploiting textual features from system logs; we leverage more expressive KPI features to build the proposed framework.

The class imbalance problem has attracted great attention in both academia and industry. Recently, many imbalanced learning algorithms have been proposed, including sampling methods, cost-sensitive learning methods, kernel-based methods and active learning methods He and Garcia (2009). The previous work of Liu et al. (2015) used a dynamically adjusted threshold method to address the class imbalance problem, but this method is sensitive to changes in the data, which can affect its performance. To reduce the impact of the class imbalance problem, we design a robust multi-level framework based on hybrid machine learning methods, in which the KPI data prediction module predicts the KPI features of future time steps, the anomaly identification module distinguishes anomalous samples from normal samples, and the alarm severity classification module finally identifies the severity level of those anomalies. Specifically, in the alarm severity classification module, we also propose a weighted sampling strategy to alleviate the class-imbalance problem among anomalous samples. Hence, the proposed framework is able to reduce the impact of the class imbalance problem on the overall performance.

Empirically, we conduct experiments on a real-world system log data set from a financial IT system. The experimental results demonstrate that it is effective to use KPI time series as key features of the system running state and thereby predict the existence of system anomalies. Furthermore, we analyze the experimental results of different choices of algorithms in each module, further demonstrating the effectiveness and rationality of the proposed framework. Note that our proposed framework is an automatic anomaly prediction framework for financial IT systems. When using our framework, the operators do not need to preset parameters, such as the thresholds in the existing methods Pellegrini et al. (2015); Naveiro et al. (2018).

In summary, the main contributions of our paper are as follows.

  • We propose an anomaly prediction framework which can accurately predict the KPI features and then utilize the predicted features to further predict system anomalies in a totally automatic way.

  • We combine time series data of KPIs and text data of system exception logs to build our framework, instead of directly analyzing the text of system logs.

  • We conduct experiments on a real-world system log data set from a financial IT system, and demonstrate the effectiveness and robustness of our proposed framework.

  • We deploy the framework into the real financial IT system, and prove that our framework can meet the requirements of the real-time anomaly prediction.

The rest of the paper is organized as follows. In Sect. 2, we introduce some preliminary knowledge relevant to the actual server operation business. In Sect. 3, we review the related work. In Sect. 4, we provide a general view of the proposed framework and introduce each module in detail. We report the experimental results in Sect. 5 and summarize the paper in Sect. 6.

2 Background and Preliminaries

In this section, we first present some essential notions related to system anomaly prediction. Then we introduce the generation process of system log data in the financial IT system.

2.1 Notions

In order to detect and predict system exceptions and failures in real applications, system performance indexes are useful measures that reflect the running state of servers. We define these system performance indexes, such as the usage of CPU, disk and memory, as key performance indicators (KPIs). KPI data is then collected and used to predict system anomalies. Note that system anomalies refer to system exceptions and failures, rather than anomalies in the time series of KPI data. If a system exception occurs, it triggers an alarm to remind operators to resolve it. Hence, an alarm is a signal indicating the existence and severity of a system anomaly. In other words, predicting system anomalies is equivalent to predicting alarms.

Figure 1: The basic background of server operations and the generation process of the system log data set.

2.2 Generation of system logs

As shown in Fig. 1, server clusters are commonly utilized in financial corporations. While servers are running, KPI data recorders will record the usage of CPU, memory and disks for each server all the time, thereby yielding time series data of KPIs. Meanwhile, if system exceptions or failures occur in some servers, alarm monitors will record detailed information of system anomalies and produce alarms to operators. Then the operators will label the severity level of system anomalies according to the content of corresponding alarms. Therefore, the time series data of server KPIs and the labelled alarm data compose the whole system log data set.

3 Related work

Extensive research has been conducted on detecting and predicting anomalies in IT systems. The type of anomaly considered differs across application scenarios. To the best of our knowledge, the anomalies in previous works can be categorized into two kinds: anomalies in KPI time series, and anomalies in system log content.

Most previous works focus on detecting anomalies in time series. Among the traditional time-series anomaly detection methods, threshold-based methods are classical and practical. Lee et al. (2012) proposed a scalable threshold-based solution, called Threshold Compression, to eliminate anomalies and thereby accurately capture spatial-temporal network dynamics. Liu et al. (2015) used a dynamically adjusted threshold method to reveal anomalies in time-series data, and deployed this anomaly detection algorithm in a network monitoring system of an Internet-based service. Some machine learning based methods Salfner and Malek (2007); Lee et al. (2018) are also employed to detect anomalies in time series. Lee et al. (2018) developed a novel time-series anomaly detection system which combined state-of-the-art machine learning and data management approaches. In addition, other signal processing methods and hybrid methods have been demonstrated to be useful in practical applications Lu and Ghorbani (2009); Laptev et al. (2015); Taylor and Letham (2018). For example, Lu and Ghorbani (2009) proposed a new network signal modelling technique based on wavelet analysis for detecting anomalies on networks. Laptev et al. (2015) presented a generic and scalable framework, called EGADS, for automated anomaly detection on large scale time-series data. It used blended approaches, including time-series decomposition, change point detection and time-series clustering, to reveal different subtypes of anomaly.

Some other works detect anomalies based on the content of system log data. Text processing and analysis methods are commonly used to first extract features from system logs Stearley and Oliner (2008), which supports the subsequent anomaly detection methods. After the feature extraction process, machine learning methods can be leveraged to distinguish system anomalies. Fulp et al. (2008) presented a support vector machine (SVM) based method which distinguished system failures from normal messages based on the frequency of message sequences. Juvonen and Hamalainen (2014) employed random projection techniques on the preprocessed numeric matrix of system logs and then used the Mahalanobis distance to find outliers. Note that unsupervised learning methods are also widely used to reveal anomalies Stearley (2004); Hu et al. (2018); Du and Cao (2015). For example, Du and Cao (2015) presented a new detection method which generated pattern sets based on an effective hierarchical clustering algorithm and detected anomalies according to the relation between the log sequences and the patterns in the pattern sets. Hu et al. (2018) designed a similarity clustering algorithm on system logs and then determined the anomalies according to a distance measure.

As for anomaly prediction in IT systems, most system anomaly prediction frameworks consist of several main parts, such as a feature selection phase and a prediction model training phase, sometimes with an additional optimal model selection phase. For instance, Pellegrini et al. (2015) proposed a machine learning based framework to calculate the remaining time to system failure. It used a regularization algorithm to select different sets of features, and then predicted system failures based on the different generated prediction models. Naveiro et al. (2018) presented a framework with a class of models that aimed to monitor KPI time series and detect anomalies in a completely automatic way. Sipos et al. (2014) proposed a data-driven approach based on multiple-instance learning for predicting equipment failures by mining equipment event logs. It extracted features from system logs and used a bootstrapped feature selection algorithm to select relevant features. After feature selection, it trained a sparse linear classification model by multiple-instance learning and generated predictive scores for test instances.

4 Methods

In this section, we propose a Real-Time Anomaly Prediction framework, namely RTAP, to predict system anomalies and report severity levels by employing hybrid machine learning methods. The architecture of RTAP includes four modules: the data preprocessing module, the KPI data prediction (time series prediction) module, the anomaly identification module and the severity classification module, as illustrated in Fig. 2.

Figure 2: The architecture of the Real-Time Anomaly Prediction(RTAP) framework.

First, the raw system log data at time step $t$ comes as the input of the data preprocessing module, which removes data noise and yields cleaned KPI data. Then the KPI data prediction module uses the cleaned KPI data to produce the predicted KPI data at time step $t+1$. In essence, this module is the core part of our framework, as it determines the accuracy of the subsequent severity level prediction results. Once the predicted KPI data at time step $t+1$ is generated, it is necessary to infer the existence of an anomaly and identify its severity level from the predicted KPI data. Due to the data imbalance problem, we design a hierarchical architecture of classifiers, instead of a single multi-class classifier. The anomaly identification module trains two-layer stacking classifiers to infer the existence of an anomaly at time step $t+1$. For the samples identified as anomalies, we use the severity classification module to identify the severity level, which is low, medium or high.
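The control flow described above can be sketched as a small function that chains the three prediction stages. This is only an illustration of the hierarchy: the function name `predict_severity` and the three toy stand-ins (`last_value`, `cpu_above_90`, `by_memory`) are hypothetical and are not the models used in the paper.

```python
from typing import Callable, Optional, Sequence

Features = Sequence[float]


def predict_severity(
    kpi_history: Sequence[Features],
    forecast_kpi: Callable[[Sequence[Features]], Features],
    is_anomaly: Callable[[Features], bool],
    classify_severity: Callable[[Features], str],
) -> Optional[str]:
    """Return None when the next step is predicted normal, else a severity level."""
    predicted = forecast_kpi(kpi_history)   # KPI data prediction module
    if not is_anomaly(predicted):           # anomaly identification module (binary)
        return None
    return classify_severity(predicted)     # severity classification module


# Hypothetical stand-ins, used only to exercise the control flow.
def last_value(history: Sequence[Features]) -> Features:
    return history[-1]


def cpu_above_90(x: Features) -> bool:
    return x[0] > 90.0


def by_memory(x: Features) -> str:
    if x[1] > 95.0:
        return "high"
    return "medium" if x[1] > 80.0 else "low"


alarm = predict_severity([[50.0, 40.0], [95.0, 97.0]], last_value, cpu_above_90, by_memory)
quiet = predict_severity([[50.0, 40.0]], last_value, cpu_above_90, by_memory)
```

Because the severity classifier only ever sees samples that passed the binary stage, the large number of normal samples does not enter the multi-class decision.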

4.1 Data Preprocessing Module

The raw system log data contains noise and missing KPI values, which can degrade the anomaly prediction performance of our proposed framework. In this module, we use data standardization, data cleaning and missing value filling techniques to eliminate errors and reduce noise as much as possible.
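As a minimal sketch of two of these steps, the snippet below fills missing values and standardizes one KPI series. The text does not specify which filling or scaling technique is used, so forward filling and z-score standardization are assumptions made for illustration, and the helper names are hypothetical.

```python
import math
from typing import List, Optional


def forward_fill(series: List[Optional[float]]) -> List[float]:
    """Replace each missing value (None) with the most recent observed value.

    Leading gaps are filled with the first observed value.
    """
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled, last = [], observed[0]
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def standardize(series: List[float]) -> List[float]:
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    if std == 0.0:
        return [0.0 for _ in series]  # a constant series carries no scale
    return [(v - mean) / std for v in series]


cpu_usage = [None, 20.0, None, 40.0, 60.0]  # toy hourly CPU usage with gaps
cleaned = forward_fill(cpu_usage)
scaled = standardize(cleaned)
```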

4.2 KPI Data Prediction Module

The KPI data prediction module is an essential part of the proposed framework, since the subsequent two modules depend on its output to infer the existence of system anomalies and predict their severity levels. The accuracy of the predicted KPI data at time step $t+1$ is crucial for the overall performance of the RTAP framework.

Considering the tradeoff between high accuracy and real-time capacity, we adopt random forest regression (RFR) Segal (2004), a regression version of random forest (RF) Breiman (2001), as the KPI data predictor. It has shown significant gains in various time series prediction tasks Zarei et al. (2013). The evaluation of this module is reported in Sect. 5.

For clarity, we introduce some basic ideas of RF. RF is an ensemble classifier consisting of a number of decision trees. The key idea of RF is to use decision trees with low correlation to produce accurate ensemble predictions. There are two important points in RF: first, RF uses a technique called bagging, which draws random samples with replacement from the training set; second, when splitting nodes in decision trees, only a random subset of features is considered when searching for the best split, instead of evaluating all features. These properties help RF generalize well to new data and reduce the risk of overfitting. In contrast to RF, RFR uses a slightly different cost function for splitting nodes in decision trees. The commonly used node splitting cost function in RFR is the mean squared error (MSE), instead of the information gain or Gini index Hssina et al. (2014) used in RF.
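To make the MSE splitting criterion and the bagging step concrete, the sketch below scores every candidate threshold on a single feature by the size-weighted MSE of the two child nodes, and draws a bootstrap sample of indices. It illustrates the two ideas in isolation and is not a full random forest; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def mse(values: Sequence[float]) -> float:
    """Mean squared error of predicting the mean of `values`."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, cost) of the split x <= threshold with the lowest weighted child MSE."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_threshold, best_cost = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = (len(left) * mse(left) + len(right) * mse(right)) / n
        if cost < best_cost:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_cost = cost
    if best_threshold is None:
        raise ValueError("no valid split: all feature values are equal")
    return best_threshold, best_cost


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Bagging: draw n sample indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


# Toy data: the target jumps when the feature crosses roughly 6.5.
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
target = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, cost = best_split(feature, target)
sample = bootstrap_indices(len(feature), random.Random(0))
```

In a forest, each tree would be grown on its own bootstrap sample and would evaluate `best_split` only on a random subset of the features at every node.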

4.3 Anomaly Identification Module

As system anomalies appear rarely while servers are running, and anomalies with a high severity level are even rarer, it is challenging to distinguish anomalous cases from normal cases in such class-imbalanced data. Hence, we design a pipeline that first identifies the existence of anomalies and then recognizes the severity levels of those anomalies, instead of using a single multi-class classifier to discriminate normal cases and all kinds of anomalous cases. The anomaly identification module is the first part of the pipeline and can be regarded as a binary classification module.

Figure 3: The hierarchical structure of stacking classifiers in the anomaly identification module. This structure has two layers: the base layer includes four single classifiers(DT, RF, kNN and GBDT) while the meta layer includes an LR classifier.

To increase the accuracy of this module, we propose a hierarchical structure involving several classifiers, as illustrated in Fig. 3. The structure contains two layers of classifiers: the base layer comprises a decision tree (DT), a random forest (RF), k-Nearest Neighbours (kNN) and a gradient boosting decision tree (GBDT) Friedman (2001, 2002), and the meta layer is a logistic regression (LR) model that takes the outputs of the base classifiers as its input features Sulzmann and Fürnkranz (2011). According to the literature, a simple linear classifier usually works well in the meta layer Witten et al. (2016). Note that this structure is derived from the stacking method, an ensemble technique widely used in various applications to improve the performance of classification algorithms Wolpert (1992).
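The two-layer idea can be sketched in plain Python: base classifiers map a KPI sample to scores, and a logistic regression fitted on those scores makes the final binary decision. Everything below is a toy under stated assumptions: the two threshold rules stand in for trained base models such as DT, RF, kNN and GBDT, the data are invented, and the meta learner is a hand-written batch gradient descent. In practice the meta-features are usually out-of-fold predictions of trained base models.

```python
import math
from typing import Callable, List, Sequence

BaseClassifier = Callable[[Sequence[float]], float]  # returns a score in [0, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def meta_features(x: Sequence[float], base: List[BaseClassifier]) -> List[float]:
    """Base-layer outputs become the input features of the meta layer."""
    return [clf(x) for clf in base]


def fit_logistic_regression(rows: List[List[float]], labels: List[int],
                            lr: float = 1.0, epochs: int = 5000) -> List[float]:
    """Batch gradient descent on the log-loss; returns [bias, w_1, ..., w_m]."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))
            grad[0] += p - y
            for j, v in enumerate(row):
                grad[j + 1] += (p - y) * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def stacked_predict(x: Sequence[float], base: List[BaseClassifier],
                    weights: List[float]) -> int:
    row = meta_features(x, base)
    p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))
    return int(p >= 0.5)


# Hypothetical base layer: two threshold rules on (cpu, memory) usage.
base_layer: List[BaseClassifier] = [
    lambda x: 1.0 if x[0] > 80.0 else 0.0,
    lambda x: 1.0 if x[1] > 80.0 else 0.0,
]
# Invented training data: an anomaly (1) only when both usages are high.
train_x = [(90.0, 90.0), (90.0, 50.0), (50.0, 90.0), (50.0, 50.0), (95.0, 85.0), (85.0, 40.0)]
train_y = [1, 0, 0, 0, 1, 0]
meta_rows = [meta_features(x, base_layer) for x in train_x]
meta_weights = fit_logistic_regression(meta_rows, train_y)
```

Here the meta layer learns that neither rule alone is sufficient evidence, which is the kind of combination a single base classifier cannot express.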

4.4 Severity Classification Module

In this module, we only consider how to distinguish system anomalies with different severity levels. We present a k-Nearest Neighbors (kNN)-based method, which is commonly employed in many tasks due to its simplicity and its tolerance of high-dimensional and incomplete data Ban et al. (2013).

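Formally, in the kNN model, given a new KPI sample $x$, the output severity level $y$ depends on the k-nearest neighbor samples of $x$ in the training data set. The distance between $x$ and all training samples is based on the Euclidean distance, which is defined as:

$$ d(x_i, x_j) = \sqrt{\sum_{m=1}^{n} \left( x_{im} - x_{jm} \right)^2} \tag{1} $$

where $x_i$ and $x_j$ represent the two samples and $n$ is the dimension of the feature vector $x$. Let $N_k(x)$ denote the set containing the indices of the k-nearest neighbors of the sample $x$; the target value of the alarm level is then given by Eq. 2. As the severity of system anomalies only has three levels, $C$ is set to 3.

$$ y = \arg\max_{c \in \{1, \dots, C\}} \sum_{i \in N_k(x)} \mathbb{I}(y_i = c) \tag{2} $$

Here $y_i$ is the severity level of the $i$-th training sample and $\mathbb{I}(\cdot)$ is the indicator function.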
Since anomalies with a high severity level are also rare in the system log data, the class-imbalance problem still needs to be addressed in this module. To alleviate this problem, we propose a weighted sampling strategy that is applied before training the kNN model. Anomaly samples are replicated in the training data set according to the weights of their severity levels. In our experiments, we give higher weights to anomalies with higher severity levels.
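The effect of replicating rare classes before a kNN vote can be seen in a few lines. The sketch assumes a majority vote over Euclidean nearest neighbours and integer replication weights; the weight values, the toy samples and the function names are hypothetical, since the text only states that higher severity levels receive higher weights.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (KPI feature vector, severity level)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def replicate_by_weight(samples: List[Sample], weights: Dict[str, int]) -> List[Sample]:
    """Weighted sampling: copy each sample weights[label] times (default 1)."""
    replicated: List[Sample] = []
    for features, label in samples:
        replicated.extend([(features, label)] * weights.get(label, 1))
    return replicated


def knn_predict(train: List[Sample], x: Sequence[float], k: int) -> str:
    """Majority vote among the k training samples closest to x."""
    neighbours = sorted(train, key=lambda s: euclidean(s[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.9), "low"),
    ((0.9, 1.1), "low"),
    ((2.0, 2.0), "high"),  # the only high-severity sample
]
query = (1.8, 1.8)

plain = knn_predict(train, query, k=3)
weighted = knn_predict(replicate_by_weight(train, {"high": 3}), query, k=3)
```

Without replication the single nearby high-severity sample is outvoted by two more distant low-severity samples; after replication its copies fill the neighbourhood and the vote flips.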

5 Experiments

In this section, we first describe the system log data set, and then present the evaluation experiments for each module and for the overall framework. Because the source code of similar anomaly prediction frameworks is difficult to obtain, we do not compare the RTAP framework with other frameworks. However, we provide insights into the contribution of each module of the RTAP framework through a detailed comparison of various design choices.

5.1 Data Set

The data set used in this paper is part of the actual system log files of server clusters in a financial company. The system logs come from four different types of financial business servers. For ease of presentation, we denote these types as Biz, Mon, Ora and Trd, respectively. More specifically, the system logs contain two types of data: KPI time series, which record the system performance states of 258 servers for approximately 5 months, and alarm log data, which record the alarm status of the entire system in detail, including server name, alarm time, alarm content, alarm severity level, etc. After the data preprocessing stage, the cleaned data contains approximately 640000 records of KPI time series in total, but only about 9000 alarm records.

To exploit the information in system logs and thereby make better anomaly predictions, we select several features, such as the maximum, minimum and average of CPU usage, memory usage and the usage of several disks, to depict the running state of servers at a certain time. Due to technical limitations at the time the system logs were collected, the obtained system logs are coarse-grained time-series data in which the minimal time interval is one hour. Hence, we also include the KPI features of previous hours in the training process.
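One common way to include the KPI features of previous hours is a sliding window over the hourly series. The sketch below is an assumption about how such inputs could be built, not the paper's exact construction; the function name `make_lagged_samples` and the window length are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_lagged_samples(
    series: Sequence[Sequence[float]], n_lags: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build (X, Y) from an hourly KPI series.

    X[i] concatenates the KPI vectors of n_lags consecutive hours,
    Y[i] is the KPI vector of the hour that follows them.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    for t in range(n_lags, len(series)):
        window = series[t - n_lags:t]
        inputs.append([value for hour in window for value in hour])
        targets.append(list(series[t]))
    return inputs, targets


# Toy series: each row is one hour of (max CPU usage, max memory usage).
hourly = [[10.0, 50.0], [12.0, 52.0], [15.0, 51.0], [11.0, 55.0]]
X, Y = make_lagged_samples(hourly, n_lags=2)
```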

5.2 Evaluation Metrics

In our experiments, we evaluate the performance of each module in the RTAP framework and the overall performance of the RTAP framework. To evaluate the KPI data prediction module, we use the root mean squared error (RMSE) as the evaluation metric to measure the time-series prediction error. For clarity, we give the definition of RMSE in Eq. 3:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2} \tag{3} $$

where $n$ is the sample size, and $\hat{y}_i$ and $y_i$ are the predicted KPI value and the real KPI value, respectively.

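For experiments in the anomaly identification module, we evaluate the binary classification results by three evaluation metrics: precision, recall and $F_{0.5}$ score, which are given by:

$$ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_{0.5} = \frac{(1 + 0.5^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{0.5^2 \cdot \mathrm{Precision} + \mathrm{Recall}} $$

where $TP$, $FP$ and $FN$ refer to the numbers of true positive, false positive and false negative samples. We choose the $F_{0.5}$ score as the final evaluation metric, as precision often outweighs recall in the actual server operation and maintenance scenario.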
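The metrics can be computed with a few lines of Python. The sketch assumes the score is the F-measure with $\beta = 0.5$, which weights precision more heavily than recall; this assumption agrees with the third numeric column of Table 2 (for DT on Biz, precision 0.6906 and recall 0.7319 give approximately 0.6985). The function names are hypothetical.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-measure; beta < 1 favours precision, beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Reproduce two rows of Table 2 from their precision and recall.
dt_biz = f_beta(0.6906, 0.7319)
rf_biz = f_beta(0.8744, 0.7080)

# Toy counts: 8 correct alarms, 2 false alarms, 4 missed anomalies.
p, r = precision_recall(tp=8, fp=2, fn=4)
```

With precision above recall, as in the toy counts, the $\beta = 0.5$ score is higher than the balanced $F_1$ score, which reflects the preference for few false alarms.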

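For other experiments, we use the Macro $F_1$ score and the Micro $F_1$ score as the evaluation metrics for the multi-class classification results. The Macro $F_1$ score is a metric which gives equal weight to each class. Its definition is as follows:

$$ \text{Macro-}F_1 = \frac{1}{|L|} \sum_{l \in L} F_1(l) $$

where $L$ is the overall label set and $F_1(l)$ is the $F_1$ score of label $l$. The Micro $F_1$ score is a metric which gives equal weight to each instance. Its definition is as follows:

$$ \text{Micro-}F_1 = \frac{2 \cdot P \cdot R}{P + R}, \qquad P = \frac{\sum_{l \in L} TP(l)}{\sum_{l \in L} \left( TP(l) + FP(l) \right)}, \qquad R = \frac{\sum_{l \in L} TP(l)}{\sum_{l \in L} \left( TP(l) + FN(l) \right)} $$

We choose the Macro $F_1$ score and the Micro $F_1$ score as the final evaluation metrics for these experiments.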
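The difference between the two averages is easiest to see on imbalanced toy data. The sketch below assumes both scores are built on the $F_1$ measure and that each sample has exactly one label; the labels and predictions are invented for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def per_label_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> Dict[str, Dict[str, int]]:
    """Count true positives, false positives and false negatives for each label."""
    counts = {label: {"tp": 0, "fp": 0, "fn": 0} for label in labels}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            counts[truth]["tp"] += 1
        else:
            counts[pred]["fp"] += 1
            counts[truth]["fn"] += 1
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Average of per-label F1 scores: every class counts equally."""
    counts = per_label_counts(y_true, y_pred, labels)
    return sum(f1(**c) for c in counts.values()) / len(labels)


def micro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """F1 of the pooled counts: every instance counts equally."""
    counts = per_label_counts(y_true, y_pred, labels)
    totals = {key: sum(c[key] for c in counts.values()) for key in ("tp", "fp", "fn")}
    return f1(**totals)


labels: List[str] = ["normal", "low", "high"]
y_true = ["normal"] * 8 + ["low", "high"]
y_pred = ["normal"] * 8 + ["low", "low"]  # the single high-severity case is missed

macro = macro_f1(y_true, y_pred, labels)
micro = micro_f1(y_true, y_pred, labels)
```

One missed high-severity case barely moves the micro score, because normal samples dominate, but it pulls the macro score down sharply. This is why the macro score is the more informative of the two on class-imbalanced data.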

5.3 Evaluation of KPI Data Prediction Module

In this experiment, several traditional methods, including support vector regression (SVR), moving average (MA) and exponential smoothing (ES), are used for comparison. To objectively evaluate the performance of each method, we select a naive method as the baseline: the real KPI value at time step $t$ is used as the predicted KPI value at time step $t+1$. As the baseline is a weak predictor, comparing against it reflects the actual prediction capacity of each method.
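The naive baseline and the two smoothing methods are simple enough to write down directly, together with the RMSE used to score them. The window size, the smoothing factor `alpha` and the toy series are assumptions for illustration; the paper does not report the settings it used.

```python
import math
from typing import List, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def naive_forecast(series: Sequence[float]) -> List[float]:
    """Baseline: the value at step t is the prediction for step t+1.

    The result aligns with series[1:].
    """
    return list(series[:-1])


def moving_average_forecast(series: Sequence[float], window: int) -> List[float]:
    """Predict step t+1 as the mean of the last `window` values up to step t.

    The result aligns with series[window:].
    """
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]


def exponential_smoothing_forecast(series: Sequence[float], alpha: float) -> List[float]:
    """Simple exponential smoothing; the level after step t predicts step t+1.

    The result aligns with series[1:].
    """
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)
        level = alpha * value + (1.0 - alpha) * level
    return forecasts


kpi = [10.0, 12.0, 11.0, 13.0, 12.0]  # toy hourly KPI values

naive = naive_forecast(kpi)
ma = moving_average_forecast(kpi, window=2)
es = exponential_smoothing_forecast(kpi, alpha=0.5)

naive_rmse = rmse(naive, kpi[1:])
ma_rmse = rmse(ma, kpi[2:])
es_rmse = rmse(es, kpi[1:])
```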

Business Type   Method     Maximum   Minimum   Average
Biz             baseline   1.62      0.33      0.22
                MA         1.35      0.37      0.35
                ES         1.46      0.31      0.23
                SVR        1.29      0.31      0.24
                RFR        1.31      0.33      0.26
Mon             baseline   1.00      0.23      0.17
                MA         0.82      0.30      0.31
                ES         0.88      0.22      0.17
                SVR        1.06      0.62      0.50
                RFR        0.89      0.27      0.20
Ora             baseline   1.30      0.10      0.09
                MA         1.04      0.15      0.15
                ES         1.17      0.09      0.08
                SVR        1.11      0.11      0.10
                RFR        1.07      0.10      0.11
Trd             baseline   0.20      0.09      0.07
                MA         0.21      0.14      0.14
                ES         0.18      0.09      0.08
                SVR        0.17      0.10      0.09
                RFR        0.16      0.09      0.09

Table 1: Comparison of RMSE values among all compared KPI data prediction methods. The predicted features include the maximum, minimum and average of the KPI features.

We then evaluate the KPI data prediction performance of all compared methods, as shown in Table 1. We can observe that the prediction errors of RFR are lower than or close to those of the other methods over the three KPI features (maximum, minimum and average). Note that RFR performs consistently across the four business types, whereas some other methods degrade on individual types (for example, SVR on Mon), which suggests that RFR is an effective and robust method for predicting the KPI data of this data set.

Figure 4: The KPI data prediction results of the RFR model. For simplicity, the KPI features shown here only include the maximum and average of CPU and memory usage.

To further evaluate the prediction accuracy of RFR over time, we select a server called alarmsvr1 as an example to evaluate the long-term time-series prediction performance. Fig. 4 shows the real KPI data curves and the corresponding predicted KPI data curves. It indicates that RFR performs well on the prediction of each feature, since RFR not only captures the trends of the KPI data but also predicts the KPI values accurately.

5.4 Evaluation of Anomaly Identification Module

Business   Method     Precision   Recall   $F_{0.5}$
Biz        DT         0.6906      0.7319   0.6985
           RF         0.8744      0.7080   0.8351
           k-NN       0.7401      0.6290   0.7300
           GBDT       0.8303      0.6206   0.7777
           Stacking   0.8803      0.7017   0.8376
Mon        DT         0.5707      0.6660   0.5875
           RF         0.8792      0.6427   0.8189
           k-NN       0.7323      0.5373   0.6827
           GBDT       0.8568      0.4339   0.7170
           Stacking   0.8754      0.6961   0.8325
Ora        DT         0.4482      0.5731   0.4686
           RF         0.7799      0.4901   0.6974
           k-NN       0.5658      0.2549   0.4548
           GBDT       0.6133      0.2194   0.4513
           Stacking   0.8511      0.5653   0.7729
Trd        DT         0.4109      0.5887   0.4373
           RF         0.8621      0.5319   0.7669
           k-NN       0.8103      0.3333   0.6300
           GBDT       0.7282      0.5319   0.6781
           Stacking   0.8542      0.5190   0.7564

Table 2: The classification results of the proposed stacking classifier and its base classifiers.

All the algorithms are trained and tested on the four types of business data, namely Biz, Mon, Ora and Trd. Table 2 reports the overall performance of the stacking classifier and its base classifiers. We can observe that the stacking classifier outperforms the other algorithms in most cases, although RF performs best on the Trd data. This indicates that the stacking classifier can distinguish normal cases from anomalous cases more accurately than its base classifiers. In addition, the stacking classifier behaves more stably across the various types of business data, with all $F_{0.5}$ scores greater than 0.75. Note that the test data used in this module contain normal and anomalous samples in a highly imbalanced proportion for each of Biz, Mon, Ora and Trd. The results in Table 2 suggest that the stacking classifier can effectively reduce the false positive rate in such highly imbalanced data. Overall, we conclude that our proposed stacking classifier is robust on all types of business data, and can accurately predict the existence of system anomalies based on the predicted KPI features.

5.5 Evaluation of Severity Classification Module

In this module, we evaluate the effectiveness of our proposed weighted sampling strategy. We compare the performance of the kNN model with the weighted sampling strategy against its counterpart without the sampling strategy. As shown in Table 3, when the sampling strategy is used, the kNN model consistently performs better on all four types of business data. We also observe that the kNN model with the sampling strategy achieves an absolute gain of at least 0.18 in Macro score (the smallest gain, on Ora, is from 0.4482 to 0.6305), which suggests that the weighted sampling strategy can alleviate the impact of the class-imbalance problem on the performance of this module. It further demonstrates that this module can distinguish anomalies with different severity levels more accurately.

Business   Sampling   Macro    Micro
Biz        Yes        0.7680   0.9940
           No         0.5272   0.9924
Mon        Yes        0.7102   0.9941
           No         0.4998   0.9876
Ora        Yes        0.6305   0.9874
           No         0.4482   0.9750
Trd        Yes        0.8591   0.9979
           No         0.5607   0.9976

Table 3: The classification results of the kNN model with and without the weighted sampling strategy. The test data used in this table contain system anomalies with all three severity levels.

5.6 Evaluation of the RTAP Framework

To evaluate the overall performance of our RTAP framework, we use the data from 201X-01-01 00:00 to 201X-05-31 23:00 to train the framework and test its performance on the data from 201X-06-01 00:00 to 201X-06-30 23:00. As RF is a commonly used ensemble classifier which is robust to noise and performs well in many scenarios, we replace the anomaly identification module and the subsequent severity classification module with a single RF classifier to obtain a comparable baseline method, namely RTAP-C.

As shown in Table 4, both RTAP and the baseline identify normal cases in the test data almost perfectly. However, RTAP almost consistently outperforms the baseline on anomalous cases across all types of business data. Specifically, RTAP can predict high level anomalies quite accurately in Biz data, while the baseline cannot even predict the existence of high level anomalies in Biz and Ora data. This suggests that our framework can effectively alleviate the class imbalance problem in the system log data and thereby produce accurate results when predicting different severity levels of system anomalies.

Business   Method   Normal   Low      Medium   High     Macro    Micro
Biz        RTAP-C   0.9955   0.6559   0.4301   0.0000   0.5118   0.9904
           RTAP     0.9963   0.8107   0.2476   0.8333   0.6644   0.9939
Mon        RTAP-C   0.9945   0.6208   0.3508   0.1999   0.5091   0.9874
           RTAP     0.9961   0.8580   0.3845   0.4545   0.6121   0.9940
Ora        RTAP-C   0.9900   0.4968   0.6250   0.0000   0.4827   0.9759
           RTAP     0.9920   0.7536   0.8333   0.4762   0.6711   0.9876
Trd        RTAP-C   0.9981   0.5960   -        -        0.4917   0.9977
           RTAP     0.9984   0.6219   -        -        0.7227   0.9976

Table 4: The overall performance of the RTAP framework and the corresponding baseline method. The table reports the score on each type of data (normal cases and the three severity levels of anomalous cases), as well as the Macro score and the Micro score. A dash (-) means that the test data do not include the corresponding type of case.

6 Conclusion

In this paper, we propose an anomaly prediction framework to predict the existence and the severity level of anomalies in financial IT systems. In contrast to previous works, we combine time series data of KPIs and text data of system exception logs to build the anomaly prediction framework. To select useful features from raw data, we integrate a number of KPI features to depict the system running state and further predict system anomalies. As the class imbalance problem would severely affect the overall performance, we design a hierarchical architecture of alarm classifiers and use an effective data sampling strategy to alleviate the problem. Empirically, the experimental results demonstrate the effectiveness and rationality of the proposed framework. The results also give some insights for extending the framework to other financial IT systems, as long as similar KPI features can be recorded in the data collection process.

One limitation of this work is that the obtained KPI time series are coarse-grained, due to technical limitations at the time the system logs were collected. Therefore, KPI changes within small time intervals cannot be captured by our framework. Our future work is to optimize the KPI data prediction model on fine-grained KPI time series, and to make the proposed framework predict anomaly severity levels more accurately.

7 Acknowledgement

This work is supported by Shanghai Financial Futures Information Technology Co., Ltd. We acknowledge Shanghai Financial Futures Information Technology for proposing the research demand of predicting server anomalies based on time series data of server KPIs. We also thank the reviewers for their careful reading and insightful comments on our manuscript.


  24. Exact date information cannot be provided due to the protection of business information.


  1. Ban T, Zhang R, Pang S, Sarrafzadeh A, Inoue D (2013) Referential knn regression for financial time series forecasting. In: International Conference on Neural Information Processing, Springer, pp 601–608
  2. Bertero C, Roy M, Sauvanaud C, Trédan G (2017) Experience report: Log mining using natural language processing and application to anomaly detection. In: 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE), IEEE, pp 351–360
  3. Breiman L (2001) Random forests. Machine learning 45(1):5–32
  4. Du S, Cao J (2015) Behavioral anomaly detection approach based on log monitoring. In: 2015 International Conference on Behavioral, Economic and Socio-cultural Computing (BESC), IEEE, pp 188–194
  5. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Annals of statistics pp 1189–1232
  6. Friedman JH (2002) Stochastic gradient boosting. Computational statistics & data analysis 38(4):367–378
  7. Fulp EW, Fink GA, Haack JN (2008) Predicting computer system failures using support vector machines. WASL 8:5–5
  8. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Transactions on knowledge and data engineering 21(9):1263–1284
  9. Hssina B, Merbouha A, Ezzikouri H, Erritali M (2014) A comparative study of decision tree ID3 and C4.5. International Journal of Advanced Computer Science and Applications 4(2):0–0
  10. Hu S, Xiao Z, Rao Q, Liao R (2018) An anomaly detection model of user behavior based on similarity clustering. In: 2018 IEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC), IEEE, pp 835–838
  11. Juvonen A, Hamalainen T (2014) An efficient network log anomaly detection system using random projection dimensionality reduction. In: 2014 6th International Conference on New Technologies, Mobility and Security (NTMS), IEEE, pp 1–5
  12. Laptev N, Amizadeh S, Flint I (2015) Generic and scalable framework for automated time-series anomaly detection. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp 1939–1947
  13. Lee SB, Pei D, Hajiaghayi M, Pefkianakis I, Lu S, Yan H, Ge Z, Yates J, Kosseifi M (2012) Threshold compression for 3g scalable monitoring. In: INFOCOM, 2012 Proceedings IEEE, IEEE, pp 1350–1358
  14. Lee TJ, Gottschlich J, Tatbul N, Metcalf E, Zdonik S (2018) Greenhouse: A zero-positive machine learning system for time-series anomaly detection. arXiv preprint arXiv:1801.03168
  15. Liu D, Zhao Y, Xu H, Sun Y, Pei D, Luo J, Jing X, Feng M (2015) Opprentice: Towards practical and automatic anomaly detection through machine learning. In: Proceedings of the 2015 Internet Measurement Conference, ACM, pp 211–224
  16. Lu W, Ghorbani AA (2009) Network anomaly detection based on wavelet analysis. EURASIP Journal on Advances in Signal Processing 2009:4
  17. Naveiro R, Rodríguez S, Insua DR (2018) Large scale automated forecasting for monitoring network safety and security. arXiv preprint arXiv:1802.06678
  18. Pellegrini A, Di Sanzo P, Avresky DR (2015) A machine learning-based framework for building application failure prediction models. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), IEEE, pp 1072–1081
  19. Salfner F, Malek M (2007) Using hidden semi-markov models for effective online failure prediction. In: 2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007), IEEE, pp 161–174
  20. Segal MR (2004) Machine learning benchmarks and random forest regression
  21. Sipos R, Fradkin D, Moerchen F, Wang Z (2014) Log-based predictive maintenance. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 1867–1876
  22. Stearley J (2004) Towards informatic analysis of syslogs. In: cluster, IEEE, pp 309–318
  23. Stearley J, Oliner AJ (2008) Bad words: Finding faults in spirit. In: Eighth IEEE International Symposium on Cluster Computing and the Grid, IEEE, pp 765–770
  24. Sulzmann JN, Fürnkranz J (2011) Rule stacking: An approach for compressing an ensemble of rule sets into a single classifier. In: International Conference on Discovery Science, Springer, pp 323–334
  25. Taylor SJ, Letham B (2018) Forecasting at scale. The American Statistician 72(1):37–45
  26. Witten IH, Frank E, Hall MA, Pal CJ (2016) Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann
  27. Wolpert DH (1992) Stacked generalization. Neural networks 5(2):241–259
  28. Zarei N, Ghayour MA, Hashemi S (2013) Road traffic prediction using context-aware random forest based on volatility nature of traffic flows. In: Asian Conference on Intelligent Information and Database Systems, Springer, pp 196–205