
Assessing systemic risk due to fire sales spillover through maximum entropy network reconstruction


Assessing systemic risk in financial markets is of great importance, but it often requires data that are unavailable or available only at very low frequency. For this reason, systemic risk assessment with partial information is potentially very useful for regulators and other stakeholders. In this paper we consider systemic risk due to fire sales spillover and portfolio rebalancing, using the risk metrics defined by Greenwood et al. (2015). Using the Maximum Entropy principle, we propose a method to assess aggregate and single-bank systemicness and vulnerability, and to statistically test for a change in these variables, when only the size of each bank and the capitalization of the investment assets are available. We prove the effectiveness of our method on 2001-2013 quarterly data of US banks for which portfolio composition is available.

JEL codes: C45; C80; G01; G33.

Keywords: systemic risk; maximum entropy; fire sales; bank vulnerability; bank systemicness.

1 Introduction

After the recent troubled years for the global economy, in which two severe crises (the financial markets crisis and the sovereign debt crisis) put the whole economic system in dramatic distress, the vulnerability of banks to systemic events is now the main focus of a growing number of investigations by the academic community, across different disciplines. Simultaneously, many research efforts are devoted to understanding the role of banks or, broadly speaking, of financial institutions in the creation and subsequent spreading of systemic risk. Given the prominent importance of the topic and its multifaceted nature, the literature on the evaluation and anticipation of systemic events is huge (see Demirgüç-Kunt and Detragiache, 1998; Kaminsky and Reinhart, 1999; Harrington, 2009; Scheffer et al., 2009; Barrell et al., 2010; Duttagupta and Cashin, 2011; Kritzman et al., 2011; Allen et al., 2012; Arnold et al., 2012; Bisias et al., 2012; Scheffer et al., 2012; Merton et al., 2013; Oet et al., 2013, among many contributions).

There are several channels through which financial distress may propagate from one institution to another and, eventually, affect a vast portion of the global economy. Fire sales spillovers due to assets' illiquidity and common portfolio holdings are definitely one of the main drivers of systemic risk. Shared investments create a significant overlap between the portfolios of pairs of financial institutions. Such (indirect) financial interconnectedness is an important source of contagion, since partial liquidation of assets by a single market player is expected to affect all other market participants that share with it a large fraction of their own investments (see Corsi et al., 2013; Huang et al., 2013; Caccioli et al., 2014; Lillo and Pirino, 2015, for a survey of the role of portfolio overlap in spreading financial distress). Fire sales move prices because of the finite liquidity of assets and of market impact; in a perfectly liquid market there would be no fire sale contagion at all (see Adrian and Shin, 2008, for a review of the role of liquidity in financial contagion). Finally, leverage amplifies such feedbacks: as described in detail by Adrian and Shin (2010, 2014), levered institutions continuously rebalance their positions, inflating positive and, most importantly, negative asset price variations.

Assessing and monitoring systemic risk due to fire sales spillover is therefore of paramount importance for regulators, policy makers, and other participants in the financial markets. Greenwood et al. (2015) recently introduced a stylized model of fire sales in which illiquidity, target leverage, and portfolio overlap are the constituent bricks, and used it to propose some systemic risk metrics. Specifically, systemicness and vulnerability of a bank are defined in Greenwood et al. (2015) as, respectively, the total percentage loss induced on the system by the distress of the bank and the total percentage loss experienced by the bank when the whole system is in distress. The peculiarity of the systemic measures of Greenwood et al. (2015) (and of any other similar approach) is that, in order to estimate both systemicness and vulnerability, full knowledge of the banks' balance sheets is needed. In practice, one needs to know the amount of dollars that each bank invests in each asset class, i.e. the matrix of bank portfolio holdings, whose estimation from partial information is the main focus of our analysis.

Greenwood et al. (2015) applied their method to the EBA data on the July 2011 European stress tests, which provide detailed balance sheets for the largest banks in the European Union. Duarte and Eisenbach (2013) exploit a publicly available dataset of balance sheets of US bank holding companies to apply the framework of Greenwood et al. (2015). They derive a measure of aggregate vulnerability that “[…] reaches a peak in the fall of 2008 but shows a notable increase starting in 2005, ahead of many other systemic risk indicators”. Nevertheless, the detailed information set required to compute such indicators is not guaranteed to be easily accessible, and its collection can be a difficult task, especially at frequencies higher than quarterly.

This discussion leads to the main topic of this paper: how to estimate systemic risk due to fire sales spillover in the absence of data on the portfolio composition of financial intermediaries. Two possible approaches have been proposed in the literature. The first approach (such as those proposed by Adrian and Brunnermeier, 2011; Acharya et al., 2012; Banulescu and Dumitrescu, 2015; Corsi et al., 2015) is purely econometric and is typically based on publicly available data on asset prices and on the market equity value of publicly quoted financial institutions. Generically, the method consists in estimating conditional variables, such as conditional Value-at-Risk or conditional Expected Shortfall. The econometric approach circumvents the unavailability of data on portfolio holdings, but pays for this advantage by introducing a strong stationarity assumption: estimates based on past information are assumed to be always good predictors of the future behavior of the system. Nevertheless, due to the nature of a global financial crisis, it is exactly at the onset of a period of distress that the stationarity assumption may fail. Moreover, the approach is often restricted to publicly quoted institutions for which equity values are available at daily frequency.

A second possible approach, followed in the present paper, consists in inferring the matrix of portfolio holdings using only a reduced information set, and/or deriving a probability distribution for the portfolio weights according to some criterion. This is typically achieved by invoking the maximum entropy principle, which postulates that (Anand et al., 2013) “[…] subject to known constraints […] the probability distribution that best represents our current knowledge and that is least biased is the one with maximal entropy”. The Maximum Entropy approach is not new in systemic risk studies (Mistrulli, 2011; Anand et al., 2013; Musmeci et al., 2013; Squartini et al., 2013; Bargigli et al., 2015). It is widely used to infer the structure of the interbank network when only data on total interbank lending and borrowing (plus possibly other information) are available. In this context, the maximum entropy approach has been criticized (Mistrulli, 2011; Mastromatteo et al., 2012) because it infers a fully connected interbank network, while real interbank networks are typically very sparse. A seminal contribution by Mistrulli (2011), comparing the empirical Italian interbank network with that reconstructed via an optimization procedure, has shown that, as a consequence of this misestimation, the reconstructed network underestimates risk contagion.

In this paper we propose to apply the maximum entropy approach to the inference of the network of portfolio weights, in order to estimate metrics of systemic risk due to fire sales spillovers. Specifically, we show how the indirect vulnerability and systemicness of US commercial banks (as defined by Greenwood et al., 2015) and the aggregate systemic risk of the system can be estimated when only partial information (the size of each bank and the capitalization of each asset) is available. Unlike Mistrulli (2011), Mastromatteo et al. (2012), and Anand et al. (2015), we deal with bipartite networks, namely graphs whose nodes can be divided into two sharply distinguished sets that, in our case, are commercial banks and asset classes. More specifically, we analyze the quarterly networks of US commercial banks' exposures in the period 2001-2013, using the Call Report files collected by the Federal Financial Institutions Examination Council (FFIEC). We compute, for each quarter, the systemicness and vulnerability of each bank and the aggregate vulnerability of the system, and we compare them with the values inferred assuming the balance sheet compositions of the banks were not known. In this sense our paper is similar to Mistrulli (2011), but applied to systemic risk due to fire sale spillover rather than to cascades in the interbank network. Differently from the interbank case, we find that maximum entropy methods are very accurate in assessing systemic risk due to fire sales spillover when partial information is available.

The contribution of this paper develops along three main lines. First, following a practice that is widely diffused among researchers of both academic institutions and central banks (see, among others, Sheldon and Maurer, 1998; Upper and Worms, 2004; Wells, 2004; Mistrulli, 2011; Sachs, 2014), we reconstruct the matrix of portfolio holdings as the one that minimizes the Kullback-Leibler divergence from a “naive” guess. We show that this approach, while underestimating systemic risk in interbank networks (Mistrulli, 2011), does a very good job in our case, providing unbiased estimates of the aggregate vulnerability as defined by Greenwood et al. (2015). Besides, we show that the reconstructed matrix corresponds to the one implied by the Capital Asset Pricing Model, hence it possesses a clear economic meaning.

Second, we define a statistical ensemble, that is, a set of graphs and a probability mass function defined on it, such that the expected value of the generic element of the matrix equals the value implied by the Capital Asset Pricing Model. Although sizes and capitalizations are the only information required to construct the ensemble, we show that it does a very good job in predicting systemic risk metrics, not only at the aggregate level, but also for each bank. This newly proposed approach is shown to outperform standard maximum entropy approaches (Saracco et al., 2015), i.e. the bipartite extensions of the models proposed by Mastrandrea et al. (2014) for unipartite networks.

Finally, we show how the statistical ensemble is potentially useful for financial regulators or supervisory authorities in their monitoring activities. As a matter of fact, the statistical ensemble implies a probability distribution for any risk metric, and hence it can be used to statistically test whether a specific institution has increased its systemicness with respect to a date in the past.

We structure our paper as follows. Section 2 introduces some nomenclature and briefly describes the risk metrics of Greenwood et al. (2015). The dataset of US commercial banks provided by the FFIEC is discussed in Section 3. In Section 4 we derive the solution of the Cross-Entropy problem and apply it to our dataset. Section 5 is dedicated to the definition of the statistical ensembles, whose performances in estimating risk metrics are investigated in a comparative analysis reported in Section 6. A statistical test useful for surveillance activities by central banks and other regulatory institutions is presented in Section 7. Finally, Section 8 summarizes the main contributions of the paper. The Appendices provide additional information on the construction of the dataset of bank portfolio holdings and all the analytical computations omitted in the main text.

2 Systemic risk metrics: Vulnerability and Systemicness

In this paper we use some metrics of systemic risk, of individual banks and in aggregate, which have been recently introduced by Greenwood et al. (2015). There the authors present a model in which fire sales propagate shocks across the balance sheets of banks. More specifically, they consider a system composed of $N$ banks and $M$ asset classes.

First of all, we introduce the $N \times M$ matrix $W$ of portfolio holdings, whose element $W_{ij}$ is the dollar amount of $j$-type assets held by bank $i$. The corresponding matrix $\Pi$ of portfolio weights is thus written as
$$ \Pi_{ij} \,=\, \frac{W_{ij}}{\sum_{k=1}^{M} W_{ik}}. \qquad (1)$$

In what follows, we introduce a discretization of the elements of $W$, in such a way that the matrix belongs to the space of integer valued matrices. In the empirical application we will use the resolution of the dataset, which is one thousand dollars.

The total asset size $A_i$ of the $i$-th bank and the total capitalization $C_j$ of the $j$-th asset class are easily computed as, respectively, the row and column sums of the matrix $W$, in formula
$$ A_i(W) \,=\, \sum_{j=1}^{M} W_{ij}, \qquad C_j(W) \,=\, \sum_{i=1}^{N} W_{ij}, $$
where we have explicitly expressed the dependence of $A_i$ and $C_j$ on $W$.

The rectangular matrix $W$ can be naturally associated to a bipartite network, i.e. a graph whose vertices can be divided into two disjoint sets such that every edge connects a vertex in one set to one in the other set, the two sets being the banks and the asset classes. In the network jargon, $\{A_i\}_{i=1}^{N}$ and $\{C_j\}_{j=1}^{M}$ are called the strength sequences for, respectively, the top (banks) and bottom (assets) nodes of the network represented by the matrix $W$.

A relevant piece of information concerning the balance sheet of each bank is the total equity $E_i$, from which one can compute the leverage as $B_i = (A_i - E_i)/E_i$ (as in Greenwood et al., 2015). Finally, each asset class is characterized by an illiquidity parameter $\ell_j$, with $\ell_j \geq 0$, defined as the return per dollar of net purchase of asset $j$.

This setting is used in Greenwood et al. (2015) to define some metrics of systemic risk, capturing the effect of fire sales in response to a shock on the price of the assets. The shock is described by the $M$-dimensional vector $F$, whose components $F_j$ are the assets' shocks. They define:

  • Aggregate vulnerability $AV$ as “[…] the percentage of aggregate bank equity that would be wiped out by bank deleveraging if there was a shock […] to asset returns”.

  • Bank systemicness $S_i$ as the contribution of bank $i$ to the aggregate vulnerability.

  • Bank's indirect vulnerability $IV_i$ as “[…] the impact of the shock on its equity through the deleveraging of other banks”.

By assuming that banks follow the practice of leverage targeting and that, in response to a negative asset shock, they sell assets proportionally to their pre-shock holdings, Greenwood et al. (2015) show that the systemicness $S_i$ of bank $i$ can be decomposed as
$$ S_i \,=\, \frac{B_i\, R_i}{E} \sum_{j=1}^{M} \ell_j\, W_{ij}\, C_j, \qquad (2)$$
where $E = \sum_{i=1}^{N} E_i$ is the total equity of the system and $R_i = \sum_{j=1}^{M} \Pi_{ij} F_j$ is the $i$-th element of the vector $\Pi F$, i.e. the portfolio return of bank $i$ due to the shock $F$.
The aggregate vulnerability is computed simply as
$$ AV \,=\, \sum_{i=1}^{N} S_i. \qquad (3)$$
Finally, the indirect vulnerability of bank $i$ is
$$ IV_i \,=\, \frac{A_i}{E_i} \sum_{j=1}^{M} \Pi_{ij}\, \ell_j \sum_{k=1}^{N} \Pi_{kj}\, A_k\, B_k\, R_k. \qquad (4)$$
In what follows we assume, as in Duarte and Eisenbach (2013), that $F_j = F$ for all $j$, which in turn implies that $R_i = F$ in equations (2) and (4). Note however that, if all the assets are shocked by the same amount, our results do not depend on its value, since the systemic risk measures will only have a different pre-factor. Note also that both systemicness and indirect vulnerability must be thought of as quantities expressed in percentage. Finally, we set the illiquidity parameter at $\ell_j = 10^{-13}$ for all asset classes except for cash, for which we put $\ell_j = 0$ (as in Greenwood et al., 2015; Duarte and Eisenbach, 2013).
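As a worked illustration of the metrics of this section, the following sketch computes systemicness, aggregate vulnerability, and indirect vulnerability from a holdings matrix with NumPy. It is not the authors' code: the function name is ours, the leverage convention $B_i = (A_i - E_i)/E_i$ is an assumption, and the formulas follow the Greenwood et al. (2015) decomposition as summarized above.

```python
import numpy as np

def fire_sale_metrics(W, equity, ell, shock):
    """Systemicness S_i, aggregate vulnerability AV and indirect
    vulnerability IV_i for a banks-by-assets holdings matrix W."""
    A = W.sum(axis=1)            # bank sizes A_i (row sums)
    C = W.sum(axis=0)            # asset capitalizations C_j (column sums)
    E = equity.sum()             # total equity of the system
    B = (A - equity) / equity    # leverage (debt over equity), an assumption
    Pi = W / A[:, None]          # portfolio weights
    R = Pi @ shock               # portfolio return R_i of each bank
    # S_i = (B_i R_i / E) * sum_j ell_j W_ij C_j
    S = (B * R / E) * (W * (ell * C)).sum(axis=1)
    AV = S.sum()                 # aggregate vulnerability
    # IV_i = (A_i / E_i) * sum_j Pi_ij ell_j * (dollar sales of asset j)
    sales = Pi.T @ (A * B * R)
    IV = (A / equity) * (Pi @ (ell * sales))
    return S, AV, IV
```

With a uniform shock, the portfolio weights sum to one and every $R_i$ equals the shock size, so changing the shock only rescales the metrics, consistent with the remark above.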

In the next section we provide a detailed description of the dataset that we adopt in our analysis to measure systemic risk, as captured by the metrics of Greenwood et al. (2015), in the US banking sector. Such a dataset allows us to obtain quarterly estimates of systemicness, aggregate, and indirect vulnerability, and to compare these estimates with those inferred from the Cross-Entropy approach and the Maximum Entropy principle. Since we have to deal with both real and reconstructed (or sampled from a statistical ensemble) networks, in order to avoid ambiguity, from now on we follow the convention of adding a superscript to any variable whenever it refers to the real (observed) network, while the variable is written without the superscript every time it refers to a reconstructed network (e.g. one sampled from a statistical ensemble as described in Section 5).

3 Data

All regulated financial institutions in the United States are required to file periodic financial information with their incumbent regulators. The Federal Financial Institutions Examination Council (FFIEC) is the regulatory institution responsible for collecting and maintaining the data used in our analysis. The financial institutions subject of our investigation are Commercial Banks and Savings and Loan Associations. The FFIEC officially defines a commercial bank as “[…] a financial institution that is owned by stockholders, operates for a profit, and engages in various lending activities”. The FFIEC requires commercial banks to file the quarterly Consolidated Report of Condition and Income, generally referred to as the Call Report. Each bank is required to fill in a form with detailed information on its financial status, in particular its balance sheet. The specific reporting requirements depend upon the size of the bank and whether or not it has any foreign office: form FFIEC031 is used for banks with both domestic (U.S.) and foreign (non-U.S.) offices, while form FFIEC041 is designed for banks with domestic (U.S.) offices only. A Savings and Loan Association is a financial institution that accepts deposits primarily from individuals and channels its funds primarily into residential mortgage loans. From the first quarter of 2012, all Savings and Loan Associations are required to file the same reports, thus they are included in the dataset since then.

Figure 1: In the left panel we report the percentage of total assets held by the top 10, top 100, top 1000, and the remaining banks, in shaded areas of different colors. A vast portion of total assets is controlled by the top 10 banks. In the right panel we report, for each quarter, the contribution of the top seven asset classes (in terms of capitalization) to the total capitalization. A large percentage of total asset capitalization is due to Loans Secured by Real Estate in Domestic Offices.

The data provided by the Call Reports are publicly available since 1986, although the form changed considerably throughout the years, with an increasing level of detail requested. To have a good compromise between the fine structure of the data and a reasonably long sample, we considered the time period going from March 2001 to September 2013, for a total of 51 quarters. The number of financial institutions present in the data is fairly stable across quarters. The asset categories have been created as coherent sums of codes. We describe the procedure adopted to form asset classes in Appendix A, along with some data statistics. In particular, we aggregate data in a set of 20 asset classes following the rationale of Duarte and Eisenbach (2013), that is, each of the 20 asset classes is composed in such a way that, in case of a fire sale of assets belonging to a specific class, the price impact would be restricted mainly to the assets in the same class. In other words, it is reasonable to assume that the co-illiquidity of two different asset classes is negligible. The twenty macro asset classes used to build the network are described in Table 1 of Appendix A, which also documents in detail how they have been formed. In the left panel of Figure 1 we show how the total asset value is concentrated on the top tiered banks. The right panel of Figure 1 shows the relative importance of the top seven asset classes (in terms of total capitalization), revealing that a large portion of the total capitalization is due to Loans secured by real estate in domestic offices.

Hence, we are able to construct, for each quarter from the first of 2001 to the third of 2013, a matrix $W$ of bank holdings whose element $W_{ij}$ is the total dollar amount invested by the $i$-th bank in the $j$-th asset class. Finally, it is important to note that the matrix $W$ has around $50\%$ of zero entries. Thus the network is relatively dense, but far from being fully connected: it is not true that each bank invests in all asset classes.
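The descriptive statistics just mentioned (the concentration shown in Figure 1 and the fraction of zero entries) are straightforward to compute once a quarterly holdings matrix is in memory; a minimal sketch of our own, with a hypothetical function name:

```python
import numpy as np

def holdings_summary(W, top=10):
    """Share of total assets held by the `top` largest banks and the
    fraction of zero entries of the holdings matrix W (banks x assets)."""
    A = W.sum(axis=1)                                  # bank sizes
    top_share = np.sort(A)[::-1][:top].sum() / A.sum() # concentration
    zero_fraction = np.mean(W == 0)                    # network sparsity
    return top_share, zero_fraction
```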

4 The Cross-Entropy Approach

As anticipated in the introduction, Cross-Entropy is a popular method, largely adopted by scholars and researchers of central banks, to reconstruct a target matrix (typically the interbank matrix) from partial knowledge of its properties. The idea is to select an a priori guess for the matrix and then to find the closest matrix to it subject to some constraints. The guess matrix is chosen either by using some randomness assumption or by using some economic intuition. The constraints are used to enforce all the partial knowledge available on the target matrix to be estimated. In the simplest case, such constraints are non-negativity conditions on the matrix elements and the total row and column sums. Finally, as a measure of the distance to be minimized between the guess and the target matrix, one uses the Kullback-Leibler divergence.

For the specific case of the system of bank holdings for US commercial banks, we assume to have at our disposal, for each quarter, only the information on the total asset size $A_i$ of the $i$-th bank and the total capitalization $C_j$ of the $j$-th asset class, as provided by the FFIEC data. The Cross-Entropy approach derives the target matrix as the solution $X$ of the optimization problem
$$ \min_{X}\; \sum_{i=1}^{N}\sum_{j=1}^{M} X_{ij} \ln \frac{X_{ij}}{Q_{ij}} \quad \text{subject to} \quad X_{ij} \geq 0, \quad \sum_{j=1}^{M} X_{ij} = A_i, \quad \sum_{i=1}^{N} X_{ij} = C_j, \qquad (5)$$
where the $Q_{ij}$'s are the entries of the guess matrix. Note that the cases analyzed in the interbank lending literature (Mistrulli, 2011) typically have an additional constraint on the diagonal elements, required to avoid a single institution being simultaneously a borrower and a lender to itself (see, for example, Appendix B in Mistrulli, 2011). The matrix of portfolio holdings analyzed here does not require any restriction of this kind, a feature that greatly simplifies the solution of problem (5). In fact, the initial guess as proposed, for example, by Mistrulli (2011) coincides, in our setting, with the one implied by the capital asset pricing model (CAPM). In a standard CAPM, investors choose their portfolio in such a way that the weight on each stock is the fraction of that stock's market value relative to the total market value of all stocks (Sharpe, 1964; Lintner, 1965; Mossin, 1966). Since $A_i$ is the total asset size of the $i$-th bank and since the total market value of all stocks is given by $\sum_{k=1}^{M} C_k$, the CAPM portfolio holdings are given by
$$ W^{\mathrm{CAPM}}_{ij} \,=\, \frac{A_i\, C_j}{\sum_{k=1}^{M} C_k}. \qquad (6)$$
The CAPM-implied holdings are a natural choice for the initial guess $Q$, since they are the simplest combination of the strength sequences that verifies the constraints. Moreover, given that in (5) the condition on the diagonal elements is absent and since the Kullback-Leibler divergence is always non-negative, the optimal solution of the Cross-Entropy problem (5) when $Q = W^{\mathrm{CAPM}}$ is nothing but $W^{\mathrm{CAPM}}$ itself. Hence, thanks to the bipartite nature of the network under study, we do not have to resort to numerical routines to solve problem (5). If other constraints are added to the problem (e.g. that some banks cannot invest in some asset classes), one could solve problem (5) numerically with the additional constraints.
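Equation (6) is just an outer product of the two strength sequences, so the Cross-Entropy solution can be built in one line. The sketch below (ours, not the paper's code) also records why both margins are matched: the row sums equal $A_i$ by construction, and the column sums equal $C_j$ because total assets and total capitalization coincide when both are margins of the same holdings matrix.

```python
import numpy as np

def capm_matrix(A, C):
    """CAPM-implied holdings W_ij = A_i * C_j / sum_k C_k.
    Matches both margins whenever sum(A) == sum(C), which holds when
    A and C are the row and column sums of one holdings matrix."""
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    return np.outer(A, C) / C.sum()
```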

We now test empirically the validity of the CAPM-implied portfolio holdings on our data. Notice that the true matrix $W$ is quite different from $W^{\mathrm{CAPM}}$, since the latter has no vanishing elements, while in the former half of the matrix elements are zero. Therefore it is not a priori obvious that the CAPM-implied matrix is able to capture the dynamics of systemic risk. This is however the case, as clearly witnessed by the plot in Figure 2, where we compare the true value of the aggregate vulnerability (formula (3) of the previous section) as found in the dataset described in Section 3 with that computed on the CAPM-implied portfolio holdings.

The main implication of Figure 2 is that, at least for the dataset under analysis, it is not necessary to know the matrix $W$ to assess the systemic risk as measured by the aggregate vulnerability. The knowledge of banks' sizes and assets' capitalizations is enough to infer the matrix $W^{\mathrm{CAPM}}$, which reproduces very well the aggregate behavior (in terms of systemicness) of the system. This is different from the result of Mistrulli (2011) for the interbank network, since he finds that the Cross-Entropy approach significantly underestimates systemic risk, while in our case the bias is negligible.

Figure 2: This figure reports as a black continuous line the aggregate vulnerability, as defined by equation (3), computed on the matrix of portfolio holdings provided by the FFIEC dataset of US commercial bank holdings described in Section 3. The red dotted line is the aggregate vulnerability computed on the CAPM-implied matrix $W^{\mathrm{CAPM}}$ defined in equation (6), which requires only the information on the strength sequences $\{A_i\}$ and $\{C_j\}$.

At this stage, two main issues remain to be addressed. The first is how accurate the estimates of the systemicness and indirect vulnerability of each single bank provided by the matrix $W^{\mathrm{CAPM}}$ are; the second is which kind of model can be adopted to statistically validate a variation of systemicness (or indirect vulnerability) for a specific bank. In what follows, we turn to these issues.

5 Network Statistical Models

While providing a very accurate approximation of the aggregate vulnerability, the matrix $W^{\mathrm{CAPM}}$ cannot say much about the statistical significance of the systemicness and indirect vulnerability of banks. For this purpose a statistical null model is required.

We define a statistical network model following the theoretical framework of Kolaczyk (2009). A network (statistical) model is defined by a set of graphs $\mathcal{G}$, called the ensemble, and a probability mass function $P_{\theta}$ indexed by a vector of model parameters $\theta$. In formula, it is expressed as the triplet
$$ \left\{ \mathcal{G},\; P_{\theta},\; \theta \in \Theta \right\}, $$
where $\Theta$ is a convex subset of $\mathbb{R}^{p}$, with $p$ the total number of parameters of the model. The set $\mathcal{G}$ is a countable set whose elements are called graphs. In what follows, we will not distinguish between a graph and the associated matrix $W$, i.e. the probability mass function is defined on the space of integer valued matrices. Moreover, the probability mass function is a function defined on $\mathcal{G}$ with values in $[0,1]$,
$$ P_{\theta} : \mathcal{G} \to [0,1], $$
and such that $\sum_{W \in \mathcal{G}} P_{\theta}(W) = 1$. Such a probability is allowed to depend on a vector of real parameters $\theta$. For an arbitrary (regular enough) network function $X$ defined on the set $\mathcal{G}$ (see for example the functions defined in equations (1)-(19)), the expected value of $X$ on the ensemble is defined as
$$ \langle X \rangle_{\theta} \,=\, \sum_{W \in \mathcal{G}} X(W)\, P_{\theta}(W). $$
A model can be defined by explicitly giving the ensemble and the probability mass function, along with the space of the parameters, or by deriving them through the recursive application of some generative mechanism or rule, either starting from an empty graph or by applying a randomization procedure to a reference graph.

We define our network statistical models in the next subsections by invoking the Maximum Entropy principle. However, before proceeding, it is important to stress that, once a network model has been defined, there are at least two possible applications. First, in every situation in which solely the partial information used in the maximization constraints is available (and hence the true matrix is unknown), the network model can be used to estimate the true (unobserved) value of any network function (e.g. the systemicness and indirect vulnerability of banks), by proxying it via, for example, its expected value over the ensemble. Second, as mentioned before, when the entire matrix of bank holdings is known, we can use the associated probability mass function to statistically validate any network function as observed on real data, for example by testing whether the systemicness of a specific bank has statistically grown in time. The first application is discussed in Section 6, the second in Section 7.

5.1 The Maximum Entropy Principle

One way to construct ensembles of networks is by invoking the Maximum Entropy (ME) principle. In the general setting, the probability mass function is the one that maximizes Shannon's entropy
$$ S[P] \,=\, -\sum_{W \in \mathcal{G}} P(W) \ln P(W), $$
with the normalization constraint
$$ \sum_{W \in \mathcal{G}} P(W) \,=\, 1, $$
and further additional constraints may be added to define a specific ensemble (see below). In our case, given the peculiar role of the CAPM in reproducing the aggregate vulnerability, we derive the probability mass function by solving the maximization problem
$$ \max_{P}\; S[P] \quad \text{subject to} \quad \sum_{W \in \mathcal{G}} P(W) = 1, \quad \langle W_{ij} \rangle \,=\, \frac{A_i\, C_j}{\sum_{k=1}^{M} C_k} \quad \forall\, i,j. \qquad (7)$$
We call this model the Maximum Entropy Capital Asset Pricing Model (MECAPM henceforth). In Appendix B.1 we prove that the MECAPM problem has the unique solution
$$ P(W) \,=\, \prod_{i=1}^{N}\prod_{j=1}^{M} \frac{1}{1+\mu_{ij}} \left( \frac{\mu_{ij}}{1+\mu_{ij}} \right)^{W_{ij}}, \qquad \mu_{ij} \,=\, \frac{A_i\, C_j}{\sum_{k=1}^{M} C_k}, \qquad (8)$$
hence each single matrix entry $W_{ij}$ is geometrically distributed with mean $\mu_{ij}$.
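Because the entries are independent geometric variables, sampling from the MECAPM ensemble is immediate. A sketch (our illustration, not the paper's code; note that NumPy's geometric distribution is supported on $\{1,2,\dots\}$, hence the shift by one):

```python
import numpy as np

def sample_mecapm(A, C, n_samples=1000, seed=0):
    """Draw integer matrices whose entry (i, j) is geometric with
    mean mu_ij = A_i * C_j / sum_k C_k, as in the MECAPM ensemble."""
    rng = np.random.default_rng(seed)
    mu = np.outer(A, C) / np.sum(C)
    p = 1.0 / (1.0 + mu)  # success probability yielding mean mu
    # rng.geometric counts trials until the first success (support {1,2,...}),
    # so subtract 1 to get support {0,1,...} with mean (1-p)/p = mu
    return rng.geometric(p, size=(n_samples,) + mu.shape) - 1
```

Sampled matrices can then be passed through any network function (e.g. systemicness) to obtain its distribution under the ensemble.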

Since other specifications of maximum entropy are quite popular in the network reconstruction literature, for comparison purposes only we take into consideration two other ensembles, mainly inspired by the papers by Mastrandrea et al. (2014) and Saracco et al. (2015). Each of them is characterized by different constraints imposed on the maximization of Shannon's entropy.

In the first, whose probability mass function is found in Appendix B.3 and which we refer to as the Bipartite Weighted Configuration Model (BIPWCM), the constrained maximization is modified into
$$ \max_{P}\; S[P] \quad \text{subject to} \quad \sum_{W \in \mathcal{G}} P(W) = 1, \quad \langle A_i(W) \rangle = A_i \;\; \forall\, i, \quad \langle C_j(W) \rangle = C_j \;\; \forall\, j. $$
The BIPWCM imposes weaker constraints than the MECAPM, while exploiting the same information set. Despite this, the MECAPM has a fully analytical solution while the BIPWCM does not. As shown in the Appendix, it is possible to write a system of equations for the Lagrange multipliers of the optimization problem, and this system is easily solved numerically.
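For the reader's convenience, the structure of these multiplier equations can be sketched as follows (standard maximum-entropy algebra; the symbols $x_i$, $y_j$ for the exponentiated Lagrange multipliers are our notation, not necessarily the Appendix's):

```latex
% Entropy maximization with strength constraints factorizes into a
% geometric form, with x_i = e^{-\alpha_i}, y_j = e^{-\beta_j}:
P(W) \;=\; \prod_{i,j} \left(1 - x_i y_j\right)\left(x_i y_j\right)^{W_{ij}},
\qquad 0 < x_i y_j < 1,
% and the multipliers solve the N + M nonlinear equations
\sum_{j=1}^{M} \frac{x_i y_j}{1 - x_i y_j} \;=\; A_i \quad \forall\, i,
\qquad
\sum_{i=1}^{N} \frac{x_i y_j}{1 - x_i y_j} \;=\; C_j \quad \forall\, j.
```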

Finally, we consider another (richer) statistical ensemble, whose probability mass function, derived in Appendix B.3, corresponds in our bipartite framework to the enhanced configuration model of Mastrandrea et al. (2014). This newly defined ensemble, which we refer to as the Bipartite Enhanced Configuration Model (BIPECM), is obtained via Maximum Entropy by imposing both the mean value of the strengths (as in the BIPWCM) and the mean value of the degrees, that is, the number of edges incident to each vertex. In other words, we reconstruct the matrix by assuming the knowledge of the number of assets in which each bank invests as well as the number of banks investing in each asset. Despite the fact that this information is typically not known, we consider this ensemble to show that even with an information set significantly larger than the one used in the MECAPM it is very difficult to outperform it. Mathematically, the BIPECM is obtained by solving the optimization problem
$$ \max_{P}\; S[P] \quad \text{subject to} \quad \sum_{W \in \mathcal{G}} P(W) = 1, \quad \langle A_i(W) \rangle = A_i, \quad \langle C_j(W) \rangle = C_j, \quad \langle k_i(W) \rangle = k_i, \quad \langle d_j(W) \rangle = d_j, $$
where $\{k_i\}$ and $\{d_j\}$ are, respectively, the row and column degree sequences (see Appendix B.3 for more details). The peculiarity of the BIPECM is the addition of the information on the degree sequences, which is absent in both the BIPWCM and the MECAPM. Note that the three ensembles can be used not only for statistical inference, but also to produce estimates of any function defined on the network, which is the topic of the next section.

6 Assessing systemic risk for individual banks

In this Section we compare the performance of the three different ensembles presented above in assessing systemic risk for individual banks. The three estimators are

  • MECAPM estimator: only the information on the strength sequences is used; systemicness and indirect vulnerability are estimated through their expected values according to the probability mass function in (8).

  • BIPWCM estimator: as in the MECAPM case, only the information on the strength sequences is used, but the constraints imposed on the maximization are weaker. Systemicness and indirect vulnerability are estimated through their expected values according to the probability mass function in (17).

  • BIPECM estimator: the information on the strength and degree sequences is used, systemicness and indirect vulnerability are estimated through their expected values according to the probability mass function in (23).
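As a point of reference for what these estimators target, the CAPM-implied matrix that the MECAPM reproduces on average can be sketched as below. This is an illustrative implementation, assuming (as stated in the paper) that each bank spreads its total assets across asset classes in proportion to their capitalization; the function name is ours:

```python
import numpy as np

def capm_implied_matrix(bank_sizes, asset_caps):
    """CAPM-implied holdings: each bank spreads its total assets across
    asset classes in proportion to their market capitalization weights."""
    a = np.asarray(bank_sizes, dtype=float)   # bank sizes (total assets)
    c = np.asarray(asset_caps, dtype=float)   # asset class capitalizations
    return np.outer(a, c / c.sum())           # W[i, j] = a_i * c_j / sum_k c_k

# toy example: three banks, two asset classes
W = capm_implied_matrix([100.0, 50.0, 10.0], [300.0, 100.0])
```

By construction the row sums of the reconstructed matrix equal the bank sizes, so the only information required is the size of each bank and the capitalization of each asset class.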

Notice that, for each estimator, there are at least two possible points of comparison. Suppose, for instance, that we want to estimate the systemicness of a given bank. We could either proxy it by computing the metric on the expected adjacency matrix, or by taking the expected value of the metric itself, where expectations are taken according to the network probability distribution.

A peculiarity of the MECAPM is that the computation of the distribution of each single entry does not require any numerical procedure, and hence we can provide explicit formulas for both quantities. In Appendix C we derive these formulas and, in particular, we show that the expected values of systemicness and indirect vulnerability are well approximated by the corresponding metrics computed on the expected matrix, which, for the MECAPM case, coincide with the systemicness and indirect vulnerability of each bank as returned by the CAPM-implied matrix. This result shows that the Cross-Entropy approach and the MECAPM are almost equivalent, at least as far as the estimates of the average risk metrics are concerned.

In evaluating the performance of each class of estimators it is important to have good estimates for the most systemic/vulnerable banks, since the remaining ones are not particularly relevant in the context of systemic risk. For this reason we proceed as follows. Suppose we want to assess the performance of an estimator that produces estimates of the systemicness and the indirect vulnerability of each bank. For each bank and for each quarter we compute the relative error of both estimates as


We then divide the whole sample of banks into four quartiles, according to the true bank systemicness (or the true indirect vulnerability) in the investigated quarter, and we compute the relative errors (10) for all the banks in each quartile according to each of the three estimation methods. We finally plot the median of the relative error and, as a measure of dispersion, we record the interquartile range, i.e. the difference between the upper and lower quartiles.12 Clearly, a median well centered around zero is an indication that the estimator is unbiased.
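The bucketing procedure just described can be sketched as follows; the function names are illustrative and not taken from the paper:

```python
import numpy as np

def relative_error(estimate, true):
    # relative error of the estimate with respect to the true metric
    estimate, true = np.asarray(estimate, float), np.asarray(true, float)
    return (estimate - true) / true

def error_by_quartile(estimate, true):
    """Split banks into quartiles of the true metric and report the
    median and interquartile range of the relative error in each."""
    err = relative_error(estimate, true)
    true = np.asarray(true, float)
    edges = np.quantile(true, [0.25, 0.5, 0.75])
    bucket = np.searchsorted(edges, true)  # 0 = least systemic quartile
    stats = []
    for q in range(4):
        q1, med, q3 = np.percentile(err[bucket == q], [25, 50, 75])
        stats.append({"median": med, "iqr": q3 - q1})
    return stats
```

An unbiased estimator yields a median close to zero in every bucket, which is exactly the diagnostic plotted in Figures 3 and 4.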

Figure 3: Time series of the relative error (see Eq. 10) of bank systemicness with respect to real data as estimated by the three ensembles BIPWCM (red and squares), BIPECM (blue and circles), and MECAPM (grey and dashed line). The thick lines indicate the median and the colored areas the interquartile range (red cross for BIPWCM, blue squares for BIPECM and dotted line for MECAPM). The four panels refer to four quartiles of banks according to their systemicness. The first quartile includes the banks with the smallest systemicness.

Figure 3 shows the results for bank systemicness and Figure 4 shows the results for indirect vulnerability. The three different colors refer to the three different estimation methods (see the figure captions for more details). We observe, in all panels and for each quarter, that the BIPWCM strongly underestimates individual bank systemicness and indirect vulnerability. The median relative error is substantially negative and the interquartile range includes zero only for the first quartile, i.e. for the least systemic/vulnerable banks, those of least interest in systemic risk assessment. The estimator based on the BIPECM (which uses the additional information on degrees) gives slightly better results, even if a strong underestimation is still present; again, the interquartile range includes zero only for banks in the first quartile.

On the contrary, the estimator based on the MECAPM performs much better: the median relative error remains small and the interquartile range is almost always centered around zero. The most notable exceptions are the first quartile (top left panels), which includes the least important banks, and the fourth quartile of indirect vulnerability. Even if we do not have an explanation for the behavior in this important case, it should be noticed that the MECAPM strongly outperforms the other two estimators.

In summary, the estimates of systemicness and indirect vulnerability for each single bank as provided by the CAPM-implied matrix are almost identical to those obtained as the corresponding expected values on the MECAPM ensemble. Moreover, they are satisfactorily accurate and certainly more reliable than those provided by standard maximum entropy ensembles.

Figure 4: Time series of the relative error (see Eq. 10) of bank indirect vulnerability with respect to real data as estimated by the three ensembles BIPWCM (red and squares), BIPECM (blue and circles), and MECAPM (grey and dashed line). The lines with symbols indicate the median and the colored areas the interquartile range (red cross for BIPWCM, blue squares for BIPECM and dotted line for MECAPM). The four panels refer to four quartiles of banks according to their vulnerability. The first quartile includes the banks with the smallest indirect vulnerability.

Once more, the important message is that fairly accurate estimates of systemic risk metrics due to fire sales spillover, at the aggregate or individual institution level, can be achieved without full knowledge of the portfolio holdings of financial institutions.

7 Monitoring and testing changes in systemicness

As another application of the ensembles of graphs obtained with the Maximum Entropy method, we consider the problem of assessing whether the systemicness of a given bank (or of the whole system) has changed in a statistically significant way. Answering this question requires a null hypothesis, and we propose to use network ensembles to this end. Since the MECAPM shows superior performance in estimating risk metrics, in this section we use it to illustrate a possible application to statistical validation. Our objective here is not to study all the banks and all the quarters, but only to show how the testing method can be implemented.

In particular, imagine a regulator who monitors a given bank, measuring its systemicness and searching for evidence of a significant increase. Taking a given quarter as reference, the regulator can extract the distribution of the bank's systemicness and, in the subsequent quarters, identify when the systemicness falls outside a given confidence interval around the reference period. As a special case, we select four banks that are among the top fifty in the first quarter and that exist for the entire time period (i.e. they do not exit the dataset). For each quarter we compute the true bank systemicness and the confidence bands according to the MECAPM ensemble (see Figure 5). We then add a magenta square in each quarter in which the true systemicness is above the confidence band of the first quarter, used as reference. Hence, a magenta square indicates a quarter in which the systemicness of the bank is statistically larger (according to the MECAPM) than at the beginning of 2001. We show two banks for which a statistically significant change in systemicness is observed (top row) and two for which no change is observed (bottom row). Notably, for the former we find that the systemicness of the banks analyzed increased significantly well before the onset of the 2007-2008 financial crisis. This phenomenon persisted along the entire period of the crisis and did not vanish before the end of 2009. This suggests that network statistical models could be of valuable help in the surveillance activity of central banks and other supervisory authorities, both as monitoring tools and in constructing early warning indicators.
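The monitoring procedure can be sketched as follows, assuming only that systemicness values can be sampled from the reference-quarter ensemble; here the ensemble draws are replaced by synthetic ones, and the function name and band level are illustrative:

```python
import numpy as np

def flag_quarters(observed, reference_samples, upper_pct=95.0):
    """Flag quarters whose observed systemicness exceeds the upper
    confidence bound computed from the reference-quarter ensemble."""
    upper = np.percentile(reference_samples, upper_pct)
    return [t for t, s in enumerate(observed) if s > upper]

# synthetic stand-in for ensemble draws of a bank's systemicness
rng = np.random.default_rng(0)
reference = rng.normal(1.0, 0.1, size=10_000)

observed = [1.00, 1.05, 1.40, 1.50, 1.10]  # observed values in later quarters
flagged = flag_quarters(observed, reference)
```

Each flagged index plays the role of a magenta square in Figure 5: a quarter in which the observed systemicness lies above the reference-quarter band.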

Figure 5: We report, for four selected banks, the true systemicness (thick dotted lines) and the confidence bands according to the MECAPM ensemble. A magenta square is added in every quarter in which the systemicness of the bank is above the confidence band of the first quarter of 2001.

8 Conclusions

In this paper we focused on the problem of estimating metrics of systemic risk due to fire sale spillover in the presence of limited information on the composition of the portfolios of financial institutions. Full knowledge of the portfolio holdings of each institution in the economy is required to obtain a precise estimate of any risk metric that, like those proposed by Greenwood et al. (2015), is based on the mechanism of portfolio rebalancing through fire sales. Nevertheless, such detailed information may not be available, especially at frequencies higher than quarterly, making the estimation of systemic risk quite difficult. In this paper we circumvent the problem by providing accurate estimates of systemic risk metrics based on a partial knowledge of the system, more precisely only on the sizes of balance sheets and the capitalization of assets (or asset classes), which are much easier to trace. In this respect, we have shown that the widely used method of Cross-Entropy minimization returns, for the system under analysis, the Capital Asset Pricing Model, and does a very good job in estimating aggregate vulnerability without requiring any knowledge of the underlying matrix of bank portfolio holdings. Furthermore, we introduced a Maximum Entropy (ME) ensemble that reproduces, on average, the CAPM and performs quite well in estimating the systemicness and indirect vulnerability of single institutions, outperforming standard ME competitors. The estimation of systemic risk metrics could provide valuable information to any policy maker, but variations in systemicness and indirect vulnerability are difficult to interpret in the absence of a statistical validation. For this reason, as a final contribution, we have proposed the ME ensemble as a tool to assess the statistical significance of changes in systemic risk metrics.
For a selection of banks in our dataset we documented that their systemicness significantly increased, with respect to the level observed at the beginning of 2001, well before the onset of the 2007-2008 financial crisis. Even if deeper investigations are required in this direction, we believe that this approach could easily be implemented as an early warning indicator of systemic risk.

Appendix A Data Description and Dataset Creation

This appendix provides some descriptive features of the data along with the method adopted to build the asset classes of the bank-asset network analyzed in the paper. The left panel (first row) of Figure 6 reports, on a log-log scale, the kernel density of the bank sizes (i.e. the total amount of assets held by each bank) pooled across all quarters. It is evident that bank sizes are quite heterogeneous. The right panel (first row) of Figure 6 reports the density of the bank leverages pooled across all quarters. In this case we observe a much less heterogeneous distribution, with the leverage of most banks concentrated around a single typical value. Finally, the second row of Figure 6 reports the relation between size and leverage. The plot is obtained by sorting all records of bank size from the smallest to the largest and then applying a moving-window procedure. As expected from the density plots, there is no relation between leverage and bank size, with most banks sharing a similar leverage despite highly heterogeneous sizes.
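The moving-window procedure can be sketched as follows; the window length and shift below are illustrative placeholders, since the values used in the paper are not recoverable from this text:

```python
import numpy as np

def rolling_leverage_stats(sizes, leverages, window=500, shift=100):
    """Sort bank records by size, slide a fixed-length window, and record
    the mean size together with the mean and std of leverage per window."""
    order = np.argsort(sizes)
    s = np.asarray(sizes, dtype=float)[order]
    l = np.asarray(leverages, dtype=float)[order]
    stats = []
    for start in range(0, len(s) - window + 1, shift):
        w = slice(start, start + window)
        stats.append((s[w].mean(), l[w].mean(), l[w].std()))
    return stats
```

Plotting mean and standard deviation of leverage against the mean size in each window reproduces the kind of diagnostic shown in the bottom panel of Figure 6.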

In what follows we provide details on how the asset classes used in the main text have been created. As mentioned in the main text, the focus of the paper is on commercial banks, whose precise definition is given by the FFIEC as: “[…] every national bank, state member bank, insured state nonmember bank, and savings association is required to file a consolidated Call Report normally as of the close of business on the last calendar day of each calendar quarter, i.e., the report date. The specific reporting requirements depend upon the size of the bank and whether it has any ‘foreign offices’ […]”. This is the set of institutions referred to as Commercial Banks throughout the paper.

Forms FFIEC031 and FFIEC041 are dedicated to, respectively, banks with both domestic and foreign offices and banks with only domestic offices. In both forms, however, the same coding system is adopted. More specifically, there are only two prefixes, RCON and RCFD, each followed by a four-character alphanumerical code that identifies the budget item; for example, the code 2170 refers to the total assets of the bank. The prefix RCON is used for financial items relative to domestic offices, while RCFD encompasses both domestic and foreign offices. Hence RCON2170 is the code for the total assets held in U.S. offices, while RCFD2170 refers to the total assets held in U.S. offices plus offices abroad. Of course, for banks with only domestic offices the two codes RCON and RCFD report the same value if they share the same alphanumerical code.
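The prefix convention can be illustrated with a small hypothetical helper; the function name and interface are ours, not part of the FFIEC forms:

```python
def field_code(item, foreign_offices):
    """Build a full Call Report field code from a four-character item code.
    RCFD covers domestic plus foreign offices, RCON domestic offices only."""
    prefix = "RCFD" if foreign_offices else "RCON"
    return prefix + item

# total assets (item 2170) for a bank with offices abroad
code = field_code("2170", foreign_offices=True)
```

For a bank with only domestic offices the same item would be read under the RCON prefix instead.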

Table 1 documents the detailed composition of each asset class. For each asset class (first column) we report a short variable name in the second column and the composition in terms of FFIEC items in the third. The abbreviation is needed since some asset classes, e.g. “loans to consumers in foreign offices”, are assembled by subtracting previously defined asset classes from the FFIEC codes. There is a one-to-one correspondence between asset classes and variable names, apart from the case of “loans secured by real estates in domestic offices”, which is computed as the sum of five variables, from “construction loans” to “non farm, non residential”. The composition of the FFIEC formula reported in the third column may vary over time, hence we report in bold the period of validity of each formula. In this respect, note that the date 12/99 refers to the last available quarter, that is, the third quarter of 2013. In reporting the FFIEC formulas, we adopt the convention that the prefix is omitted whenever RCON is used solely for banks with only domestic offices and RCFD solely for those that have at least one office abroad. On the contrary, when the prefix is specified, it means that only the code with that particular prefix is being used; for example, a code specified in its domestic (RCON) version is not replaced by its RCFD counterpart for banks with offices abroad.

Figure 6: This figure reports some descriptive features of the data analyzed. The top left panel plots, on a log-log scale, the kernel density of bank sizes (defined as total assets) while the top right panel shows the kernel density of the bank leverages. Both densities are computed using all records pooled across the entire time span. For the sake of visualization, we put a cut-off on the maximum leverage displayed, although larger leverages are (rarely) observed. The bottom panel shows that there is no relation between leverage and size. The procedure adopted to draw the plot is the following: all records of bank size are sorted from the smallest to the largest and a rolling window of fixed length is moved, with a fixed incremental shift, from the first to the last record. In each window we compute the mean leverage (black continuous line) and the standard deviation of leverage (red dotted line) of the banks that fall in the window. Mean and standard deviation are plotted as a function of the mean size in the window, which is reported on the horizontal axis.
Asset Class | Variable Name | FFIEC Formula
Total assets | tot_ass | 03/01-12/99: 2170+2123+3123
Equity | equity | 03/01-03/09: 3210+3000; 03/09-12/99: G105
Cash and balances due from depository institutions | cahab | 03/01-12/99: 0081+0071
U.S. treasury securities | ust_sec | 03/01-12/99: 0211+1287+RCON3531
U.S. agency securities | agency_sec | 03/11-12/99: 1289+1294+1293+1298+RCON3532
Securities issued by state and local governments | state_sec | 03/01-12/99: 8496+8499+RCON3533
Mortgage backed securities | mbs | 03/01-03/09: 1698+1702+1703+1707+1709+1713+1714+1717+1718+1732+; 06/09-12/10: G300+G303+G304+G307+G308+G311+G312+G315+G316+; 03/11-12/99: G300+G303+G304+G307+G308+G311+G312+G315+G316+
Asset backed securities | | 03/01-12/05: B838+B841+B842+B845+B846+B849+B850+B853+B854+; 03/06-03/09: C026+C027; 06/09-12/99: C026+C027+G336+G340+G344+G339+G343+G347
Other domestic debt securities | dom_debt_oth_sec | 03/01-12/99: 1737+1741
Foreign debt securities | for_debt_sec | 03/01-12/99: 1742+1746
Residual securities | res_sec | 03/01-12/99: A511
Futures, forwards sold and securities purchased under agreement to resell (asset) | | 03/01-12/01: 1350; 03/02-12/99: RCONB987+B989
Loans secured by real estates in domestic offices (sum of the following five variables):
  Construction loans | ln_const | 03/01-12/07: RCON1415; 03/08-12/99: RCONF158+RCONF159
  Secured by farmland | ln_farm | 03/01-12/99: RCON1420
  1-4 Family real estate | ln_rre | 03/01-12/99: RCON5367+RCON5368+RCON1797
  Multifamily property loans | ln_multi | 03/01-12/99: RCON1460
  Non farm, non residential | ln_nfnr | 03/01-12/07: RCON1480; 03/08-12/99: RCONF160+RCONF161
Loans secured by real estate in foreign offices | | 03/01-12/99: (if present) RCFD1410-ln_const-ln_farm-ln_rre-ln_multi-ln_nfnr, (otherwise) zero
Commercial and industrial loans in domestic offices | ln_ci_dom | 03/01-12/99: RCON1766
Commercial and industrial loans in foreign offices | ln_ci_for | 03/01-12/99: (if present) RCFD1763+RCFD1764-RCON1766, (otherwise) zero
Loans to consumers in domestic offices | ln_cons_dom | 03/01-12/10: RCON2011+RCONB538+RCONB539; 03/11-12/99: +RCONB538+RCONB539+RCONK137+RCONK207
Loans to consumers in foreign offices | ln_cons_for | 03/01-12/10: (if present) RCFD2011+RCFDB538+RCFDB539-ln_cons_dom, (otherwise) zero; 03/11-12/99: (if present) RCFDB538+RCFDB539+RCFDK137+RCFDK207-ln_cons_dom, (otherwise) zero
Loans to depository institutions and acceptances of other banks | ln_dep_inst_banks | 03/01-12/99: (if present) RCFDB532+RCFDB533+RCFDB534+RCFDB536+RCFDB537, (otherwise) RCON1288
Other loans | oth_loans | 03/01-12/99: 2122+2123-ln_const-ln_farm-ln_rre-ln_multi-ln_nfnr-
Equity securities that do not have readily determinable fair value | equ_sec_nondet | 03/01-12/99: 1752
Other assets | oth_ass | 03/01-12/99: tot_ass - all preceding asset classes
Table 1: Composition of Asset Classes

Appendix B The Maximum Entropy Approach

In this appendix we provide the details of the derivation of the probability mass functions for the MECAPM, BIPWCM and BIPECM ensembles. In what follows we indicate with $S[P]$ the entropy function

S[P] = -\sum_{\mathbf{W}} P(\mathbf{W}) \ln P(\mathbf{W}),

where the sum runs over all the weighted matrices in the ensemble.
B.1 Maximum Entropy Capital Asset Pricing Model

The maximum entropy distribution is obtained by finding the distribution that solves the constrained maximization


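The constraints of the MECAPM are not rendered above; a plausible reconstruction, consistent with the fact that the ensemble reproduces the CAPM-implied matrix on average (Section 6), fixes the expected value of every entry of the holdings matrix. In notation chosen here for illustration ($a_i$ is the size of bank $i$, $c_j$ the capitalization of asset $j$):

```latex
\max_{P} \; S[P]
\quad \text{subject to} \quad
\langle w_{ij} \rangle
\;=\; \sum_{\mathbf{W}} w_{ij}\, P(\mathbf{W})
\;=\; a_i \, \frac{c_j}{\sum_{k} c_k}
\quad \forall\, i,j,
\qquad
\sum_{\mathbf{W}} P(\mathbf{W}) = 1.
```

Since each constraint involves a single entry, the resulting distribution factorizes over the entries, which is why no numerical procedure is needed for the single-entry distributions (cf. Section 6).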
The Lagrangian associated with the problem is written as

where the coefficients are Lagrange multipliers. Taking the first derivative we get

whose solution is

where the function is defined as

and the normalizing factor is given by



Note that the partition function in (11) is such that


Hence the Lagrange multipliers are determined by

which gives



B.2 Bipartite Weighted Configuration Model

The maximum entropy distribution is obtained by finding the distribution that solves the constrained maximization


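The constraints are not rendered above; in notation chosen here for illustration, with $s_i^{B} = \sum_j w_{ij}$ the strength of bank $i$ and $s_j^{A} = \sum_i w_{ij}$ the strength of asset $j$, the BIPWCM problem can be sketched as:

```latex
\max_{P} \; S[P]
\quad \text{subject to} \quad
\langle s_i^{B} \rangle = s_i^{B} \;\; \forall i,
\qquad
\langle s_j^{A} \rangle = s_j^{A} \;\; \forall j,
\qquad
\sum_{\mathbf{W}} P(\mathbf{W}) = 1.
```

Only the row and column sums are constrained here, in contrast with the entry-wise constraints of the MECAPM.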
Using one Lagrange multiplier for each row (bank) constraint, one for each column (asset) constraint, and one for the normalization of the total probability, the Lagrangian function to be maximized is written as (Park and Newman, 2004)


Taking the first derivatives, we get

whose solution is


where the function is defined as

and the normalizing factor is given by


We can proceed with the computation by explicitly writing these expressions in terms of the elements of the matrix, obtaining


where we have introduced the exponentiated Lagrange multipliers, whence


The values of the Lagrange multipliers are determined by imposing that the expected values of the row and column strengths on the ensemble are equal to their observed counterparts. As for the MECAPM case, note that the partition function in (16) is such that

and similarly

Hence we can compute the expected strengths explicitly as functions of the Lagrange multipliers, that is

Therefore the Lagrange multipliers are determined by numerically solving the non-linear system of equations


B.3 Bipartite Enhanced Configuration Model

The only difference from the Weighted model described in Appendix B.2 is the addition of the constraints on the degree of each node. Before proceeding, we thus need some additional definitions.

The binary projection of the holdings matrix is the adjacency matrix whose entries $a_{ij}$ equal one whenever the corresponding holding $w_{ij}$ is strictly positive and zero otherwise. Accordingly, the number of assets in which the $i$-th bank invests and the number of banks that own the $j$-th asset class are computed as

D_i^B = \sum_j a_{ij}, \qquad D_j^A = \sum_i a_{ij},
where the capital letter stands for degree, as it is common practice in network theory13.

The maximization problem for the BIPECM case is hence stated as


The Lagrangian function is written accordingly as

with an obvious extension of the number of Lagrange multipliers with respect to the Lagrangian in (14). The first order condition now reads

Proceeding as before, we obtain, as a solution of the first order condition, an expression for the probability mass function identical to that obtained in (15) for the Weighted model, that is


with the caveat that the function and the normalizing factor are now, respectively, given by