A hybrid approach for risk assessment of loan guarantee network

Zhibin Niu (corresponding author), Jiawan Zhang: {zniu,jwzhang}@tju.edu.cn, Tianjin University, China
Dawei Cheng, Liqing Zhang: {dawei.cheng, zhang-lq}@sjtu.edu.cn, Shanghai Jiao Tong University, China
Junchi Yan: yanjc@cn.bim.com, IBM Research – China
Hongyuan Zha: zha@cc.gatech.edu, Georgia Institute of Technology, US

Abstract

Groups of Small and Medium Enterprises (SMEs) back each other and form guarantee networks to obtain loans from banks. Risk in such networked enterprises may cause significant contagious damage. To mitigate such risks, we propose a hybrid feature representation, which is fed into a gradient boosting model for credit risk assessment of guarantee networks. An empirical study is performed on ten years of guarantee loan records from commercial banks. We find that often hundreds or thousands of enterprises back each other and constitute a sparse complex network. We study the risk of various structures of loan guarantee networks, and observe a high correlation of defaults with centrality and with the communities of the network. In particular, our quantitative risk evaluation model shows promising prediction performance on real-world data, which can be useful to both regulators and stakeholders.


1 Introduction

Amid financial innovation, financial safety, and risk management in particular, has attracted the attention of governments and banks. In general, firms raise money either by going public or by taking bank loans. For Small and Medium Enterprises (SMEs) that have difficulty in financing, governments in developed countries such as the US and UK once acted as guarantors to help them obtain funds from banks [40, 33, 26, 47, 31]. More recently, in emerging economies such as Korea [22] and China [34], corporations can also act as guarantors for each other when they try to secure loans from lending institutions. This has led to a noticeable new phenomenon: a large number of corporations back each other and form complex Guarantee Networks (GN). When we examine the bank loan records that span the loan guarantee relationships, we obtain thousands of independent local graphs disconnected from each other. As will be shown later in the paper, these networks manifest various structures.

On the one hand, such a cross-guarantee practice helps SMEs grow by reducing their financing cost. On the other hand, when one corporation gets into trouble, it may affect others in the network in a contagious way. The existence of a guarantee network in fact exposes the involved corporations to risk, especially during periods of economic slowdown [40]. A well-performing enterprise can be jeopardized once it is involved in a risky guarantee network.

To effectively mitigate the risks of GNs, especially during economic slowdowns and in emerging economies such as China and Korea, credit risk assessment in GNs is more urgent than ever. In particular, several challenges need to be addressed:

  1. The existing mechanism for loan decision making falls behind business demand. The number of SMEs and the number of guarantee loans are both increasing rapidly. Guarantee loans have become one of the main channels for SMEs in China to raise money in recent years [34]. It is reported that a quarter of the $13 trillion in total outstanding loans in China in 2014 were guaranteed loans [40], with an 18% year-on-year increase [37]. In fact, current bank loan criteria are designed not for SMEs but for major players, which calls for risk evaluation methods tailored to SMEs.

  2. The veracity and volume of the information is another challenge. On the one hand, some SMEs are motivated to provide inaccurate or even manipulated data in order to satisfy the strict bank regulations originally designed for big companies, and such data are difficult for banks to detect. On the other hand, the bank needs to examine a wide variety of data. For instance, the original evaluation criteria we obtained from the bank consist of more than 3000 entries, which places a considerable burden on specialists.

  3. In the banking industry, the dependency among borrowers in the guarantee network is hardly considered, and only node-wise profiles are taken into account. This may not suit the case in which firms form a risk-binding community by guaranteeing each other. In practice, thousands of such graphs of different complexity coexist for long periods. Using only node-wise profiles for credit risk assessment is limited in such a complex situation. Our study verifies this conjecture with quantitative results, as shown later in the paper.

Figure 1: A real-world guarantee loan network formed from bank records, with each node representing an SME. The guarantee loan relationship is illustrated on the right: enterprise A (guarantor) guarantees B and C (borrowers) so that they can obtain loans from the bank (lender).

In this paper, we propose a principled approach for credit risk evaluation in loan guarantee networks derived from bank loan records. Based on records spanning more than ten years from a large commercial bank, we empirically study how graph structure measurements relate to a node’s default probability. We design a hybrid feature representation for risk evaluation in GNs. Based on these input features, we employ XGBoost (https://github.com/dmlc/xgboost) to predict the default rates on a sliding time window basis, and the prediction performance shown later in the paper suggests the effectiveness of our approach.

In a nutshell, the paper’s main contributions are:

  1. We identify, and provide a practical solution to, the problem of credit risk evaluation in loan guarantee networks, which is driven by emerging demands of the finance industry. We believe this is an important research problem for the data mining community.

  2. We propose a hybrid feature representation with a sliding-window based prediction paradigm for this problem, whose efficacy is verified by empirical studies on a real-world dataset.

  3. We draw findings by investigating the relationship between loan guarantee network measurements and default rate. Nodes with a higher hub score have a lower default rate, while nodes with a higher authority score have a higher default rate. The former can be seen as players with a good reputation, to whom many other players turn for guarantees; the latter are players who turn to many others for guarantees and are often at risk of default.

The paper is organized as follows. Section 2 describes work related to different aspects of our problem and approach, including credit risk evaluation, network based analytics in the financial industry, and the gradient boosting learning algorithm. Section 3 first introduces how the data are preprocessed, and then illustrates, based on statistics, our findings on how graph centralities relate to defaults. Section 4 explains our hybrid representation for enterprises in the guarantee network and introduces the prediction model. Experimental results are given in Section 5. Conclusions and future work are described in Section 6.

Figure 2: The default prediction pipeline.

2 Related Work

As mentioned above, there is an emerging trend in China that more and more enterprises are involved in loan guarantee networks [34]. The demand for credit evaluation in guarantee networks is urgent, yet the problem has been little studied. We review relevant approaches in traditional credit risk evaluation and in network analytics in the financial domain. We also discuss the boosting algorithms used in our study.

Credit risk evaluation Consumer credit risk evaluation is often addressed in a data-driven fashion and has been extensively investigated [29, 7]. Since the seminal “Partial Credit” model [39], numerous statistical approaches have been introduced for credit scoring, for example logistic regression [48], k-NN [30], neural networks [21], and support vector machines [32]. More recently, [6] presents an in-depth analysis of how to interpret the knowledge embedded in learned neural networks using explanatory rules, and discusses how to visualize these rules. The authors of [35] combine debt-to-income ratio with consumer banking transactions, and use a linear regression model on time-windowed data to predict default rates in the near future. They claim an 85% default prediction accuracy and cost savings between 6% and 25%.

Financial network analytics Financial crises and systemic risk have always been a major concern for financial companies and governments, and have been studied extensively [12, 25]. A network, or graph, of interconnected nodes and the links between them is a good representation of complex topologies, including solid models [44], social relations, genetic interactions, transportation, the internet, and ecological systems [43]. Modern financial systems can also be regarded as complex networks because of their complex internal interdependence and connections [2]. The relationship between network structure and financial system risk has been carefully studied and several insights have been drawn: network structure has little impact on system welfare, but plays an important role in determining systemic risk and welfare under short-term debt [3]. The same study also reports that rollover occurs less often in the clustered network than in the less clustered one [3]. After the 2008 global financial crisis, network theory attracted more attention: the crisis triggered by Lehman Brothers spread across connected corporations in an infectious way similar to the Severe Acute Respiratory Syndrome (SARS) epidemic of 2002; in both cases a small shock hit a networked system and caused serious events [10, 15]. The journal Nature Physics organized a special issue on how to understand fundamental economic issues using network theory [1]. These publications suggest the applicability of network based financial models. For example, the dynamic network produced by overnight interbank loans may serve as an alert of a crisis [15]. Contrary to the conventional stereotype that large institutions are “too big to fail”, the position of an institution in the network is equally, and sometimes more, important than its size [8]. The more central a vertex is in the graph, the more influential it is on the whole economic network when a default occurs [15].
Moreover, research that aims to understand individual behavior and interactions in social networks has also attracted extensive attention [45, 23, 9, 50, 49, 57, 46]. Advances in social network analysis have also been applied to financial system analysis. Although preliminary efforts have been made to use network theory to understand fundamental problems in financial systems [52, 20, 14], there is little work on systemic risk analysis in guarantee loan networks apart from the preliminary work of [41]. Perhaps its most important contribution is the use of k-shell decomposition to predict the default rate; a positive correlation between the k-shell value in the network and default rates was reported [41]. To the best of our knowledge, ours is the first work to address the guarantee network credit risk evaluation problem by considering both graph structure and individual node profiles under the supervised learning paradigm.

Gradient boosting Gradient boosting is a highly effective machine learning technique for classification and regression. The idea is to produce a strong prediction model by combining a large number of weak prediction models [42, 19]. The approach builds on a series of seminal theoretical studies [11, 27, 28, 13]. Specifically, XGBoost (short for “Extreme Gradient Boosting”) is an optimized distributed gradient boosting library [18, 17]. XGBoost is widely used in machine learning challenges and practical industry problems, and has achieved state-of-the-art results [16]. We use XGBoost in our testbed and omit the fine details of the approach here; readers are referred to the references above.

3 Data Engineering

3.1 Data Description and Preprocessing

We collect loan records spanning ten years from a major commercial bank in China. The names of the customers in the records are encrypted and replaced by IDs; we can access basic profile information such as enterprise scale, and loan information such as the guarantee ID and loan credit. We first introduce the loan process, and then explain how the information is extracted and cleaned.

Figure 3: Guarantee loan process. An SME (borrower) wishing to get a loan from the bank first needs to sign guarantee contracts with guarantors before signing the loan contract. After the company receives the loan from the bank, it repays the loan in installments.

To obtain a loan, the borrower needs to open an account and provide detailed information to the bank. Banks are reluctant to issue loans to SMEs, as it is very difficult for SMEs to meet the existing bank criteria, which are intended for big companies. Criteria designed for SMEs are more or less a blank area because SMEs lack security. Thus, a small business finds other corporations to endorse it. To reduce risk, the bank needs to collect as much fine-grained information as possible concerning the repayment ability of the enterprise. In our case, the information falls into four categories: transaction information, customer information, asset information such as mortgage status, and historical loan approval records on the bank side. The tables most relevant to guarantee loans are the following nine: customer profile, loan account information, repayment status, guarantee profile, customer credit, loan contract, guarantee relationship, guarantee contract, and default status. Their attributes are listed in detail in Figure SM1 of the supplementary material. There is often more than one guarantor for a loan transaction, as Figure 3 shows, and a single guarantor may be involved in several loan transactions in a period. Once the loan is approved, the SME can usually obtain the full amount immediately, and then repays the bank regularly under an installment plan until the end of the loan contract. In the record preprocessing phase, by joining the nine tables as Figure 4 shows, we obtain records related to the corporation ID and loan contracts. We then construct the guarantee network and compute the network related measurements.
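
The network construction step can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes the joined records have already been reduced to (guarantor ID, borrower ID) pairs, and the function names and sample IDs are hypothetical. Edges are directed from guarantor to borrower, and the network is then split into disjoint subgraphs by ignoring edge direction.

```python
from collections import defaultdict

def build_guarantee_network(relations):
    """Return (successors, nodes) for directed guarantor -> borrower edges."""
    successors = defaultdict(set)
    nodes = set()
    for guarantor, borrower in relations:
        successors[guarantor].add(borrower)
        nodes.update((guarantor, borrower))
    return successors, nodes

def connected_subgraphs(successors, nodes):
    """Split the network into disjoint subgraphs, ignoring edge direction."""
    undirected = defaultdict(set)
    for u, targets in successors.items():
        for v in targets:
            undirected[u].add(v)
            undirected[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in undirected[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components

# Hypothetical records: (guarantor ID, borrower ID).
relations = [("A", "B"), ("A", "C"), ("D", "E")]
successors, nodes = build_guarantee_network(relations)
components = connected_subgraphs(successors, nodes)
```

Here A guarantees B and C, as in Figure 1, while D and E form a second, independent subgraph. A graph library could replace the hand-written traversal; the plain-Python version is used only to keep the sketch self-contained.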

Figure 4: Overview of data association and cleaning.

3.2 Guarantee Data Exploration

We now report the observations derived from the data.

Overall statistics The data contain 11,000 loan customers, connected by 60,948 guarantee relationships derived from 36,618 loan contracts. There are 5,911 defaults over the ten years, out of a total of 87,307 repayments, which gives an overall default rate of 6.77% of repayments.

Figure 5: Left: Loan period distribution; Right: Default period distribution over time.

Figure 5 shows the distribution of the loan period and the default period. 71.27% of loans are short one-year loans, about 13% are medium-term loans of two or three years, and about 16% are loans of eight years or longer. We also observe that 99.01% of defaults occur in the first year, with periodic growth in each quarter.

Figure 6: (a) Default rate over month. (b) Two complexity metrics over month

Figure 6 (a) gives the average default rate for each month from 2012 to 2015. The overall trend of the default rate is increasing. For most of 2012 there are no defaults; however, from the end of 2012 through the following six months there is a sudden jump in defaults. This indicates that large-scale defaults occurred, which is corroborated by news reports from the following year [58]. Two complexity metrics are plotted in Figure 6 (b): (i) the distance code centric index as defined in [51], and (ii) the average diameter. Both increase as more corporations join the network.

We next explore the relationship between the default rate and the number of borrowers in a guarantee network, which we use as a measure of guarantee complexity. As Figure 7 shows, the number of borrowers ranges from 1 to 466. The distribution shows a weak dichotomy: 85.1% of the graphs have fewer than 50 vertices, about 6.6% have more than 300 vertices, and guarantee networks with between 50 and 300 vertices are rare. The default rate has roughly the opposite shape to this vertex-number distribution: guarantee networks with a medium or large number of vertices tend to show a much higher default rate than those with a small number of vertices.

Figure 7: Distribution of vertex number against default rate and customer ratio.

Centrality indicators help identify the relative importance of vertices in a graph. Figure 9 shows how default rates are distributed over different centrality indicator values.

Authority and hub scores were proposed by Jon Kleinberg to analyze the importance of web links [36]. The authority score estimates the value of a page’s content, and the hub score estimates the value of its links to other pages. As both are importance measurements of a node, we investigate their relationship with defaults in the guarantee network.

In particular, Figure 8 (a, b) shows the authority and hub scores for GN32, the 32nd connected subgraph in the whole network; all subgraphs are disjoint from each other. It is a typical independent subgraph constructed from the bank loan records, involving 106 vertices with an average default ratio of 14.2%. Node sizes are proportional to the score values. The graph shows that the vertex with the largest hub value is the one in the middle, which acts as a “bridge” connecting the others, while the vertices with large authority values surround the “hub” vertices. Figure 9 gives histograms, for several of the most complex subgraphs, of how defaults are distributed over authority and hub values. Defaults occur more often on vertices with large authority values and small hub values. This is consistent with intuition: an enterprise acting as a hub backs a large number of other corporations and is expected to be relatively stable and in good operating condition. In contrast, an enterprise acting as an authority accepts guarantees from many other corporations, which suggests that it lacks financial security and has a higher risk of running into trouble. The statistics suggest that lenders should watch the status of high-authority nodes in the guarantee network.
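
To make the hub and authority roles concrete, the following is a minimal power-iteration sketch of Kleinberg's scores on a toy guarantee graph. It is an illustration under assumptions, not the computation used for Figure 8: the edge list, node names, and iteration count are hypothetical, and edges run from guarantor to borrower.

```python
import math

def hits(edges, iterations=50):
    """Power iteration for hub and authority scores on directed edges (u -> v)."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of v: sum of the hub scores of nodes pointing to v.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub of u: sum of the authority scores of nodes u points to.
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        norm = math.sqrt(sum(h * h for h in new_hub.values())) or 1.0
        hub = {n: h / norm for n, h in new_hub.items()}
    return hub, auth

# Hypothetical graph: G1 backs three borrowers, G2 backs one of them.
edges = [("G1", "B1"), ("G1", "B2"), ("G1", "B3"), ("G2", "B1")]
hub, auth = hits(edges)
top_hub = max(hub, key=hub.get)     # the guarantor backing the most firms
top_auth = max(auth, key=auth.get)  # the borrower backed by the most firms
```

With this edge direction, the guarantor G1 receives the largest hub score and the doubly guaranteed borrower B1 the largest authority score, which matches the reading of hubs as guarantors and authorities as heavily guaranteed borrowers.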

Figure 8: Authority (a), Hub score (b) visualization for the connected subgraph GN32. Note the arrows indicate the guarantee relation from guarantor to borrowers. Graph communities and default rate (c) for graph GN32. Zoom in for better view.

PageRank is a well-known ranking algorithm used by Google to determine the importance of a webpage on the internet. Websites that receive more links from other websites are given higher PageRank values. Although the underlying assumption is similar to that of the authority score, we did not observe a similar correlation between PageRank values and default rates (see Figure 9).

K-shell decomposition finds subgraphs in which every node has degree at least k within the subgraph [5]. It is used extensively in many areas of complex network analysis, including social network influence analysis [24] and bioinformatics [4]. In financial risk analytics, a positive correlation with default rates is reported in recent work [41]; we also observe this empirically, as illustrated in Figure 9.
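
The peeling procedure behind k-shell decomposition can be sketched in a few lines. This is a generic illustration, not the authors' code: it treats guarantee edges as undirected (an assumption), and the example graph is hypothetical.

```python
from collections import defaultdict

def k_shell(edges):
    """Assign each node its k-shell index by iteratively peeling low-degree nodes."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    degree = {n: len(nbs) for n, nbs in neighbors.items()}
    remaining = set(neighbors)
    shell = {}
    k = 0
    while remaining:
        # The next shell index is at least the smallest remaining degree.
        k = max(k, min(degree[n] for n in remaining))
        queue = [n for n in remaining if degree[n] <= k]
        while queue:
            node = queue.pop()
            if node not in remaining:
                continue  # already peeled via another path
            remaining.discard(node)
            shell[node] = k
            for nb in neighbors[node]:
                if nb in remaining:
                    degree[nb] -= 1
                    if degree[nb] <= k:
                        queue.append(nb)
    return shell

# Hypothetical graph: a triangle A-B-C with one pendant node D attached to A.
shells = k_shell([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")])
```

The pendant node D is peeled first and lands in shell 1, while the mutually connected triangle survives into shell 2; nodes in higher shells sit deeper in the core of the network.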

Eigenvector centrality, betweenness centrality, and closeness centrality are also typical graph centrality measurements. We observe that the larger the centrality, the higher the default rate (see the statistical results on typical guarantee networks in Figure 9).

Communities with default rate Based on the conjecture that defaults occur in groups, we perform community detection on several GNs with significant default rates. As an example, Figure 8 (c) shows the community detection results on the GN32 subgraph. The communities are marked with separate background colors and labeled with their average default rates. There are 9 communities, but defaults occur in only four of them, with average default rates of 41.7%, 31.6%, 25%, and 20% respectively; the other 5 communities have no defaults during the existence of the guarantee network. A similar phenomenon is observed with the random walk, edge betweenness, and spinglass community detection methods. We adopt the average default rates of detected communities as features in our representation.

Figure 9: Overdue rates for different graph metric values. From left to right, each column corresponds to one graph metric: authority score, hub score, PageRank value, k-shell value, eigenvector centrality, betweenness centrality, and closeness centrality. From top to bottom, each row corresponds to one of the most complex independent subgraphs.

4 The Hybrid Feature based Method

The loan records reveal that the guarantee network and default rates are both growing, and that the network structure shows strong correlation with defaults. We construct a feature vector consisting of hybrid information and employ a supervised learning approach to train the prediction model. In what follows, we discuss the hybrid features used in our model.

4.1 Feature Categorization and Extraction

In order to build a highly representative feature that reliably reflects the statistical relationship between customer information and repayment ability, we clean the data and construct features from the basic profile, credit behavior, active loan information, and network information. The detailed features fall into the following groups:

Basic Profile (BP) refers to the essential company registration information, which reflects character, capital, collateral, capability, condition, and stability [41]. We use business nature, registered capital, enterprise scale, employee number, and others as the corporation’s basic profile. Most banks require a company to update its basic information when it applies for a loan, and we use the latest information as the basic profile features of the loan.

Credit Behavior (CB) covers historical behavior, e.g. credit history, default records, default amount, total loan amount and loan count, total loan frequency (if any), and total default rate. These are calculated from all loan records before the active loan contract.

Active Loan (AL) refers to the loan contract in its execution period. It contains the active loan amount, active loan times, the type of capital return and interest return, etc.

Network Structure (NS) Network features such as centralities are extracted as NS. As discussed above, the basic profile may not be completely trustworthy, since SMEs may provide out-of-date or even fake information to the bank. The guarantee network, however, is trustworthy information because the bank can build it from its own record systems.

Community Behavior (CB) As Figure 8 (c) shows, defaults occur in clusters: a default may spread like a disease within groups. We compute the default rate of each community and use it as the CB feature.
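
The community behavior feature can be sketched as a per-community default rate that is then attached to every member of the community. This is a simplified illustration: the community assignment is assumed to come from a separate detection step, and the node IDs, community labels, and function names are hypothetical. In a predictive setting the defaults would presumably be taken from records preceding the prediction window, so that a node's own future label does not leak into its feature.

```python
from collections import defaultdict

def community_default_rates(membership, defaulted):
    """Default rate per community: defaulted members / all members."""
    totals, defaults = defaultdict(int), defaultdict(int)
    for node, community in membership.items():
        totals[community] += 1
        defaults[community] += 1 if node in defaulted else 0
    return {c: defaults[c] / totals[c] for c in totals}

def community_behavior_feature(membership, defaulted):
    """Attach to each node the default rate of the community it belongs to."""
    rates = community_default_rates(membership, defaulted)
    return {node: rates[community] for node, community in membership.items()}

# Hypothetical assignment: four firms in community 0, two in community 1.
membership = {"A": 0, "B": 0, "C": 0, "D": 0, "E": 1, "F": 1}
defaulted = {"A"}  # firms with a recorded default
feature = community_behavior_feature(membership, defaulted)
```

Every firm in community 0 receives the value 0.25, including those that never defaulted themselves, which is how the feature carries group-level risk to individual nodes.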

4.2 Modeling

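The prediction of default for a customer’s guarantee loan can be modeled as a supervised learning problem. We use gradient boosted trees [27] with a logistic regression objective for the prediction. The tree ensemble model, which uses additive functions to predict the output, can be represented as:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \tag{4.1}$$

In Eq. 4.1, $f_k$ is a decision tree, $x_i$ is the training feature, and $\hat{y}_i$ is the prediction result [17].

In practice, finding the parameters of the tree model is turned into the problem of minimizing an objective function, and the model can be trained in an additive manner [17]:

$$\mathcal{L} = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k) \tag{4.2}$$

where

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 \tag{4.3}$$

where $l$ is a training loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$, and $\Omega$ is a regularization term that helps to smooth the final learnt weights to avoid overfitting. Following the formulation of [17], $K$ is the number of trees, $\mathcal{F}$ is the space of regression trees, $T$ is the number of leaves in a tree, $w$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are regularization coefficients.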
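
A small worked example may help connect the additive model to its objective. The sketch below assumes the standard formulation from [17]: the raw prediction is the sum of the tree outputs, the training loss is the logistic loss, and each tree is penalized by gamma times its number of leaves plus half of lambda times the squared norm of its leaf weights. The two depth-1 trees, the feature values, and the coefficient values are all hypothetical.

```python
import math

def stump(feature, threshold, left_weight, right_weight):
    """A depth-1 regression tree: returns a leaf weight for input x."""
    def f(x):
        return left_weight if x[feature] < threshold else right_weight
    f.leaf_weights = (left_weight, right_weight)
    return f

def predict_margin(trees, x):
    """Additive ensemble: the raw score is the sum of the tree outputs."""
    return sum(tree(x) for tree in trees)

def logistic_loss(margin, y):
    """Log loss for a label y in {0, 1} given a raw margin."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def regularization(tree, gamma, lam):
    """gamma * (number of leaves) + 0.5 * lam * (sum of squared leaf weights)."""
    w = tree.leaf_weights
    return gamma * len(w) + 0.5 * lam * sum(v * v for v in w)

def objective(trees, X, y, gamma=1.0, lam=1.0):
    """Total training loss plus the regularization of every tree."""
    loss = sum(logistic_loss(predict_margin(trees, x), t) for x, t in zip(X, y))
    return loss + sum(regularization(tree, gamma, lam) for tree in trees)

# Two hypothetical stumps and two hypothetical samples (label 1 = default).
trees = [stump(0, 0.5, -1.0, 1.0), stump(1, 2.0, -0.5, 0.5)]
X = [[0.2, 1.0], [0.9, 3.0]]
y = [0, 1]
margins = [predict_margin(trees, x) for x in X]
total = objective(trees, X, y)
```

The first sample falls into the left leaf of both stumps and gets a margin of -1.5, the second falls into both right leaves and gets 1.5. Boosting adds one tree at a time so that this objective decreases; the sketch only evaluates it for a fixed ensemble.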

5 Empirical Study

We perform empirical experiments to compare the importance of the features and the effectiveness of the risk assessment models.

The risk assessment framework is illustrated in Figure 2. Firstly, the loan records (see Section 3.1 for details) are extracted from the data warehouse and stored in a customer data management (CDM) system. Then, five categories of features (basic profile, credit behavior, active loan, community behavior and network structure) are extracted from the loan records in the given sliding window. These features are the model input.

Specifically, in this paper we use three-month windows for training, observation, prediction, and evaluation. As Figure 10 shows, in the training stage, for all customers who obtain bank loans in 2013 Q1 (the first quarter of 2013), features are extracted from that period, and the repayment status in 2013 Q2 provides the labels to train the model. In the testing stage, we use the trained model to predict defaults for customers who obtain loans in 2013 Q2, and use the real repayment status from 2013 Q3 to evaluate the performance once the end of September 2013 is reached. The reasons for this sliding window setting are twofold:

  1. Prediction should adapt to a dynamic setting with regularly updated forecasts. In fact, a sliding window is a typical way to perform rolling prediction, as commonly adopted in event prediction practice such as [55, 54].

  2. The business often runs on a quarterly basis, as can also be seen from Figure 5 (Right), where defaults concentrate at the end of each quarter. Thus, from a business perspective, it is helpful to know on a quarterly basis which borrowers may default.
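
The quarterly protocol described above can be sketched as a generator of window labels. This is an illustration under the assumption that the windows advance by one quarter per round, which is consistent with the per-quarter results reported later; the function and key names are hypothetical.

```python
def shift_quarter(year, quarter, offset):
    """Move a (year, quarter) pair forward by `offset` quarters."""
    index = year * 4 + (quarter - 1) + offset
    return index // 4, index % 4 + 1

def rolling_windows(start, n_rounds):
    """Yield the four window labels used in each prediction round."""
    for step in range(n_rounds):
        train = shift_quarter(*start, step)
        yield {
            "train_features": train,                       # training window
            "train_labels": shift_quarter(*train, 1),      # observation window
            "predict_features": shift_quarter(*train, 1),  # prediction window
            "evaluate_labels": shift_quarter(*train, 2),   # evaluation window
        }

# First round: train on 2013 Q1, label with 2013 Q2, evaluate on 2013 Q3.
rounds = list(rolling_windows((2013, 1), 10))
```

The observation window of one round doubles as the prediction window, so each quarter's features are used once for testing and its repayment status once for labeling.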

Figure 10: Illustration of the rolling sliding window protocol. Features are extracted in the training window, and the corresponding default labels are collected in the observation window. The features and default outcomes are then used to train the model. The trained model is applied to the input features collected during the prediction window, and its performance is verified when the end of the evaluation window is reached.

5.1 Prediction Performance

Here, we compare the prediction performance of the proposed hybrid representation via an ablation test on the four feature representations described below.

Period     NW      NW,CB   NW,N    H
2013 Q3    0.910   0.924   0.917   0.925
2013 Q4    0.905   0.926   0.920   0.931
2014 Q1    0.901   0.929   0.923   0.930
2014 Q2    0.907   0.931   0.928   0.933
2014 Q3    0.908   0.935   0.933   0.937
2014 Q4    0.910   0.933   0.939   0.941
2015 Q1    0.908   0.937   0.946   0.946
2015 Q2    0.902   0.938   0.942   0.945
2015 Q3    0.911   0.935   0.946   0.952
2015 Q4    0.907   0.935   0.954   0.959
Table 1: AUC of forecasting models

We define the Node-wise (NW) feature as the vector composed of basic profile, credit behavior, and active loan information; the Network (N) feature as the network structure features only; the Community Behavior (CB) feature as the loan history behavior associated with the graph community; and the Hybrid (H) feature as the combination of the node-wise, network, and community behavior features. In Table 1, CB denotes Community Behavior. The same gradient boosting regression tree model is used to predict the default probability from one or several of these categories of features.
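
The four representations compared in the ablation can be viewed as different concatenations of the same feature groups. The sketch below is illustrative only: the group names, the mapping, and the numeric values are hypothetical, and real feature groups would contain many more entries.

```python
# Which feature groups make up each representation in the ablation.
FEATURE_SETS = {
    "NW": ("node_wise",),
    "NW,CB": ("node_wise", "community"),
    "NW,N": ("node_wise", "network"),
    "H": ("node_wise", "network", "community"),
}

def assemble(groups, feature_set):
    """Concatenate the selected feature groups into one flat vector."""
    vector = []
    for name in FEATURE_SETS[feature_set]:
        vector.extend(groups[name])
    return vector

# Hypothetical values for one customer.
groups = {
    "node_wise": [5.0, 1.0, 0.0],  # basic profile, credit behavior, active loan
    "network": [0.12, 0.40],       # e.g. authority and hub scores
    "community": [0.25],           # community default rate
}
hybrid_vector = assemble(groups, "H")
```

Training one model per entry of `FEATURE_SETS` on otherwise identical data isolates the contribution of each feature group.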

The AUC (Area Under the Curve) of the models for the different sliding windows is listed in Table 1. As expected, the model using the Hybrid feature always outperforms the model using only the naive node-wise feature, and attains the highest AUC in every window (tied with NW,N in 2015 Q1). It is worth noting that before 2014 Q4 the node-wise plus community behavior feature (NW,CB) performs better than the node-wise plus network feature (NW,N), whereas the latter performs better from 2014 Q4 onward. The recall curves for these models in Figure 11 also reveal this phenomenon, which is perhaps attributable to the increase of guarantee network complexity over time.
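
For readers less familiar with the metric, AUC can be computed directly from its pairwise definition: the probability that a randomly chosen defaulting loan receives a higher score than a randomly chosen non-defaulting one. The sketch below is a generic quadratic-time illustration with hypothetical labels and scores, not the evaluation code behind Table 1.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            # A tie between a positive and a negative counts as half a win.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))

# Hypothetical outcomes (1 = default) and predicted default scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
value = auc(labels, scores)
```

In this toy case three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75. Because AUC depends only on ranking, it is not distorted by the low share of defaults in the data.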

Figure 11: Recall of forecasting models using different feature representation over time. Refer to Section 4.1 for the abbreviations.

The recall curves of the four feature based prediction models are shown in Figure 11. All of them can perfectly predict the large-scale defaults that happened in the middle of 2013. One may note that we set the start date to January 2013 and obtain the first test result from July 2013. We discard 2012 because it is in fact an infancy phase for this network: there are no defaults (refer to Figure 6) in the first half. Since the end of 2012, the guarantee network has become more complex and the default rate has also increased. The prediction recall values also increase and become rather stable. The average recall values of the four representations are 0.801, 0.866, 0.873, and 0.899.

5.2 Feature Importance

This section compares the importance of the node-wise, network, and community behavior features within our hybrid feature representation. By counting the number of times each feature is used to split a branch of a decision tree in the XGBoost model, we obtain the relative importance of the features.
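
The split-counting idea can be sketched on a toy ensemble. The nested-dictionary tree format, feature names, and group mapping below are hypothetical and chosen only for illustration; in the XGBoost Python package, `Booster.get_score(importance_type="weight")` reports the per-feature split count, which could then be aggregated by group in the same way.

```python
from collections import Counter

def count_splits(tree, counts):
    """Recursively count how often each feature is used as a split."""
    if "feature" not in tree:  # leaf node
        return
    counts[tree["feature"]] += 1
    count_splits(tree["left"], counts)
    count_splits(tree["right"], counts)

def group_importance(trees, feature_group):
    """Share of all splits that fall on each feature group."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    grouped = Counter()
    for feature, n in counts.items():
        grouped[feature_group[feature]] += n
    total = sum(grouped.values())
    return {group: n / total for group, n in grouped.items()}

# Two hypothetical trees over three hypothetical features.
trees = [
    {"feature": "registered_capital",
     "left": {"leaf": 0.1},
     "right": {"feature": "hub_score",
               "left": {"leaf": -0.2}, "right": {"leaf": 0.3}}},
    {"feature": "hub_score",
     "left": {"leaf": 0.0},
     "right": {"feature": "community_default_rate",
               "left": {"leaf": -0.1}, "right": {"leaf": 0.4}}},
]
feature_group = {
    "registered_capital": "NW",
    "hub_score": "N",
    "community_default_rate": "CB",
}
importance = group_importance(trees, feature_group)
```

Normalizing the counts to shares makes the groups comparable across time windows even when the number of trees or splits differs.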

Firstly, we use Figure 6 (b) to illustrate that the guarantee network becomes more complex over time. Specifically, in the beginning the average diameter is small: in this period, the majority of enterprises are independent and the graph is sparse. Three years later, the average diameter has grown, which means more and more enterprises are involved. Note that in November 2012 there is a jump in both network complexity and diameter. This is because, when a large-scale default happens (see Figure 6 (a)), corporations have to establish new guarantees to obtain funds from the bank.

We compare the relative importance of the node-wise, network, and community behavior features, computing the average importance of the three from 2014Q3 to 2015Q4. As Figure 12 shows, the node-wise and community behavior features on one side and the network feature on the other take opposite trends over time. Initially, the node-wise and community behavior features have similar weights, about four times that of the network feature. As the network structure becomes more and more complex, the importance of the network feature increases, accounting for nearly one-third of the total importance in 2015Q4. This is consistent with the above observation that as the guarantee network becomes more complex over time, the network centrality related features become more important. Moreover, since the node-wise feature assumes that customers are independent, it has weak discriminative power when enterprises are involved in a complex network.

Figure 12: Feature importance score from 2014Q3 to 2015Q4. Refer to Section 4.1 for the abbreviations.

6 Conclusion

This paper presents a hybrid representation for guarantee network credit risk assessment in the financial industry, which shows promising default prediction performance. Network structure and active loan information exhibit strong correlation with default in a short time window. In particular, we highlight that the authority and hub scores bear strong discriminative ability for the default prediction task. Future work will involve adapting diffusion network analysis models from social networks to the guarantee network risk analytics problem. One particular framework is point process based learning algorithms, which have recently been used for event modeling and relation discovery [56, 53, 38].

References

  • [1] Net gains. Nat Phys, 9(3):119–119, 03 2013.
  • [2] F. Allen and A. Babus. Networks in finance. 2008.
  • [3] F. Allen, A. Babus, and E. Carletti. Financial connections and systemic risk. Technical report, National Bureau of Economic Research, 2010.
  • [4] M. Altaf-Ul-Amine, K. Nishikata, T. Korna, T. Miyasato, Y. Shinbo, M. Arifuzzaman, C. Wada, M. Maeda, T. Oshima, H. Mori, et al. Prediction of protein functions based on k-cores of protein-protein interaction networks and amino acid sequences. Genome Informatics, 14:498–499, 2003.
  • [5] J. I. Alvarez-Hamelin, L. Dall’Asta, A. Barrat, and A. Vespignani. k-core decomposition: A tool for the visualization of large scale networks. arXiv preprint cs/0504107, 2005.
  • [6] B. Baesens, R. Setiono, C. Mues, and J. Vanthienen. Using neural network rule extraction and decision tables for credit-risk evaluation. Management science, 49(3):312–329, 2003.
  • [7] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens, and J. Vanthienen. Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the operational research society, 54(6):627–635, 2003.
  • [8] S. Battiston, M. Puliga, R. Kaushik, P. Tasca, and G. Caldarelli. Debtrank: Too central to fail? financial networks, the fed and systemic risk. Scientific reports, 2, 2012.
  • [9] S. P. Borgatti, A. Mehra, D. J. Brass, and G. Labianca. Network analysis in the social sciences. science, 323(5916):892–895, 2009.
  • [10] S. Bougheas and A. Kirman. Complex financial networks and systemic risk: A review. Springer, 2015.
  • [11] L. Breiman. Arcing the edge. Technical report, Technical Report 486, Statistics Department, University of California at Berkeley, 1997.
  • [12] M. K. Brunnermeier and M. Oehmke. Bubbles, financial crises, and systemic risk. Technical report, National Bureau of Economic Research, 2012.
  • [13] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd international conference on Machine learning, pages 89–96. ACM, 2005.
  • [14] W. K. Carroll, C. Carson, M. Fennema, E. Heemskerk, J. Sapinski, et al. The Making of a Transnational Capitalist Class: Corporate power in the twenty-first century. Zed books, 2010.
  • [15] M. Catanzaro and M. Buchanan. Network opportunity. Nature Physics, 9(3):121–123, 2013.
  • [16] T. Chen and C. Guestrin. Machine learning challenge winning solutions, 2016. Retrieved 11 May 2016.
  • [17] T. Chen and C. Guestrin. Xgboost: A scalable tree boosting system. arXiv preprint arXiv:1603.02754, 2016.
  • [18] T. Chen and T. He. xgboost: extreme gradient boosting. R package version 0.4-2, 2015.
  • [19] C. Cheng, F. Xia, T. Zhang, I. King, and M. R. Lyu. Gradient boosting factorization machines. In Proceedings of the 8th ACM Conference on Recommender systems, pages 265–272. ACM, 2014.
  • [20] W. S. Chow and L. S. Chan. Social network, social trust and shared goals in organizational knowledge sharing. Information & Management, 45(7):458–465, 2008.
  • [21] V. S. Desai, J. N. Crook, and G. A. Overstreet. A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research, 95(1):24–37, 1996.
  • [22] T. Doh and K. Ryu. Analysis of loan guarantees among the korean chaebol affiliates. International Economic Journal, 18(2):161–178, 2004.
  • [23] N. B. Ellison et al. Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13(1):210–230, 2007.
  • [24] P. E. B. J. Feng. Measuring user influence on twitter using modified k-shell decomposition. 2011.
  • [25] G. H. Fischer and I. W. Molenaar. Rasch models: Foundations, recent developments, and applications. Springer Science & Business Media, 2012.
  • [26] W. S. Frame, A. Srinivasan, and L. Woosley. The effect of credit scoring on small-business lending. Journal of Money, Credit and Banking, pages 813–825, 2001.
  • [27] J. H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
  • [28] J. H. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.
  • [29] D. J. Hand and W. E. Henley. Statistical classification methods in consumer credit scoring: a review. Journal of the Royal Statistical Society: Series A (Statistics in Society), 160(3):523–541, 1997.
  • [30] W. Henley and D. J. Hand. A k-nearest-neighbour classifier for assessing consumer credit risk. The Statistician, pages 77–95, 1996.
  • [31] HMRC, Department for Business Innovation & Skills. 2010 to 2015 government policy: business enterprise, 2016. Retrieved 11 May 2016.
  • [32] C.-L. Huang, M.-C. Chen, and C.-J. Wang. Credit scoring with a data mining approach based on support vector machines. Expert systems with applications, 33(4):847–856, 2007.
  • [33] M. Jian and M. Xu. Determinants of the guarantee circles: The case of chinese listed firms. Pacific-Basin Finance Journal, 20(1):78–100, 2012.
  • [34] M. Jian and M. Xu. Determinants of the guarantee circles: The case of chinese listed firms. Pacific-Basin Finance Journal, 20(1):78–100, 2012.
  • [35] A. E. Khandani, A. J. Kim, and A. W. Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.
  • [36] J. M. Kleinberg. Hubs, authorities, and communities. ACM Computing Surveys (CSUR), 31(4es):5, 1999.
  • [37] J. Lee. China faces default chain reaction as credit guarantees backfire, 2015. Retrieved 11 May 2016.
  • [38] X. Liu, J. Yan, X. Wang, H. Zha, and S. Chu. On predictive patent valuation: Forecasting patent citations and their types. In AAAI, 2017.
  • [39] G. N. Masters. A rasch model for partial credit scoring. Psychometrika, 47(2):149–174, 1982.
  • [40] D. McMahon. Loan ‘guarantee chains’ in China prove flimsy, 2016. Retrieved 11 May 2016.
  • [41] X. L. X. Meng. Credit risk evaluation for loan guarantee chain in china. 2015.
  • [42] A. Natekin and A. Knoll. Gradient boosting machines, a tutorial. Frontiers in neurorobotics, 7, 2013.
  • [43] M. E. Newman. The structure and function of networks. Computer Physics Communications, 147(1-2):40–45, 2002.
  • [44] Z. Niu, R. R. Martin, F. C. Langbein, and M. A. Sabin. Rapidly finding cad features using database optimization. Computer-Aided Design, 69:35–50, 2015.
  • [45] J.-P. Onnela et al. Complex networks in the study of financial and social systems. Helsinki University of Technology, 2006.
  • [46] J. Qiu, Y. Li, J. Tang, Z. Lu, H. Ye, B. Chen, Q. Yang, and J. E. Hopcroft. The lifecycle and cascade of wechat social messaging groups. In Proceedings of the 25th International Conference on World Wide Web, pages 311–320. International World Wide Web Conferences Steering Committee, 2016.
  • [47] SBA team. U.S. Small Business Administration (SBA), 2016. Retrieved 11 May 2016.
  • [48] A. Steenackers and M. Goovaerts. A credit scoring model for personal loans. Insurance: Mathematics and Economics, 8(1):31–34, 1989.
  • [49] L. Tang and H. Liu. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 817–826. ACM, 2009.
  • [50] L. Tang and H. Liu. Community detection and mining in social media. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1):1–137, 2010.
  • [51] R. Todeschini, V. Consonni, and R. Mannhold. Handbook of molecular descriptors. 2002.
  • [52] V. Van Vlasselaer, J. Meskens, D. Van Dromme, and B. Baesens. Using social network knowledge for detecting spider constructions in social security fraud. In Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on, pages 813–820. IEEE, 2013.
  • [53] S. Xiao, J. Yan, H. Zha, X. Yang, and S. Chu. Modeling the intensity function of point process via recurrent neural networks. In IJCAI, 2016.
  • [54] J. Yan, M. Gong, C. Sun, J. Huang, and S. Chu. Sales pipeline win propensity prediction: a regression approach. In IFIP/IEEE International Symposium on Integrated Network Management, 2015.
  • [55] J. Yan, Y. Wang, K. Zhou, J. Huang, C. Tian, H. Zha, and W. Dong. Towards effective prioritizing water pipe replacement and rehabilitation. In IJCAI, 2013.
  • [56] J. Yan, S. Xiao, C. Li, B. Jin, X. Wang, B. Ke, X. Yang, and H. Zha. Modeling contagious merger and acquisition via point processes with a profile regression prior. In IJCAI, 2016.
  • [57] R. Zafarani and H. Liu. Evaluation without ground truth in social media research. Communications of the ACM, 58(6):54–60, 2015.
  • [58] W. Zhao. Resolving the “guarantee circle” crisis needs a multi pronged (news in Chinese), 2016. Retrieved 11 May 2016.