Assessing systemic risk due to fire sales spillover through maximum entropy network reconstruction
Abstract
Assessing systemic risk in financial markets is of great importance, but it often requires data that are unavailable or available only at a very low frequency. For this reason, systemic risk assessment with partial information is potentially very useful for regulators and other stakeholders. In this paper we consider systemic risk due to fire sales spillover and portfolio rebalancing, using the risk metrics defined by Greenwood et al. (2015). Building on the Maximum Entropy principle, we propose a method to assess aggregate and single-bank systemicness and vulnerability, and to statistically test for a change in these variables, when only information on the size of each bank and on the capitalization of the investment assets is available. We demonstrate the effectiveness of our method on 2001-2013 quarterly data of US banks for which portfolio composition is available.
JEL codes: C45; C80; G01; G33.
Keywords: systemic risk; maximum entropy; fire sales; bank vulnerability; bank systemicness.
1 Introduction
After the recent troubled years for the global economy, in which two severe crises (the financial crisis and the sovereign debt crisis) put the whole economic system in dramatic distress, the vulnerability of banks to systemic events has become the main focus of a growing number of investigations by the academic community, across different disciplines. At the same time, many research efforts are devoted to understanding the role of banks, or more broadly of financial institutions, in the creation and subsequent spreading of systemic risk. Given the prominence of the topic and its multifaceted nature, the literature on the evaluation and anticipation of systemic events is huge (see Demirgüç-Kunt and Detragiache, 1998; Kaminsky and Reinhart, 1999; Harrington, 2009; Scheffer et al., 2009; Barrell et al., 2010; Duttagupta and Cashin, 2011; Kritzman et al., 2011; Allen et al., 2012; Arnold et al., 2012; Bisias et al., 2012; Scheffer et al., 2012; Merton et al., 2013; Oet et al., 2013, among many contributions).
There are several channels through which financial distress may propagate from one institution to another and, eventually, affect a vast portion of the global economy. Fire sales spillovers due to asset illiquidity and common portfolio holdings are certainly one of the main drivers of systemic risk. Shared investments create a significant overlap of portfolios between pairs of financial institutions. Such (indirect) financial interconnectedness is an important source of contagion, since the partial liquidation of assets by a single market player is expected to affect all other market participants that share a large fraction of their own investments with it (see Corsi et al., 2013; Huang et al., 2013; Caccioli et al., 2014; Lillo and Pirino, 2015, for a survey of the role of portfolio overlap in spreading financial distress). Fire sales move prices because of the finite liquidity of assets and of market impact: in a perfectly liquid market there would be no fire sale contagion at all (see Adrian and Shin, 2008, for a review of the role of liquidity in financial contagion). Finally, leverage amplifies such feedbacks. In fact, as described in detail by Adrian and Shin (2010, 2014), levered institutions continuously rebalance their positions, inflating positive and, most importantly, negative asset price variations.
Assessing and monitoring systemic risk due to fire sales spillover is therefore of paramount importance for regulators, policy makers, and other financial market participants. Greenwood et al. (2015) recently introduced a stylized model of fire sales whose building blocks are illiquidity, target leverage, and portfolio overlap, and used it to propose some systemic risk metrics. Specifically, the systemicness and the vulnerability of a bank are defined in Greenwood et al. (2015) as, respectively, the total percentage loss induced on the system by the distress of the bank and the total percentage loss experienced by the bank when the whole system is in distress. A peculiarity of the systemic measures of Greenwood et al. (2015) (and of any similar approach) is that estimating both systemicness and vulnerability requires full knowledge of the banks' balance sheets. In practice, one needs to know the amount of dollars that each bank invests in each asset class, that is, the matrix of bank portfolio holdings, whose estimation from partial information is the main focus of our analysis.
Greenwood et al. (2015) applied their method to the EBA data on the July 2011 European stress tests, which provide detailed balance sheets for the largest banks in the European Union. Duarte and Eisenbach (2013) exploit a publicly available dataset of balance sheets of US bank holding companies to apply the framework of Greenwood et al. (2015). They derive a measure of aggregate vulnerability that […] reaches a peak in the fall of 2008 but shows a notable increase starting in 2005, ahead of many other systemic risk indicators. Nevertheless, the detailed information set required to compute such indicators is not guaranteed to be easily accessible, and its collection can be difficult, especially at frequencies higher than quarterly.
This discussion leads to the main topic of this paper: how to estimate systemic risk due to fire sales spillover in the absence of data on the portfolio composition of financial intermediaries. Two possible approaches have been proposed in the literature. The first approach (such as those proposed by Adrian and Brunnermeier, 2011; Acharya et al., 2012; Banulescu and Dumitrescu, 2015; Corsi et al., 2015) is purely econometric and is typically based on publicly available data on asset prices and on the market equity value of publicly quoted financial institutions. Generally, the method consists in estimating conditional variables, such as conditional Value-at-Risk or conditional Expected Shortfall. The econometric approach circumvents the unavailability of data on portfolio holdings, but pays for this advantage by introducing a strong stationarity assumption: estimates based on past information are assumed to always be good predictors of the future behavior of the system. However, given the nature of a global financial crisis, it is precisely at the onset of a period of distress that the stationarity assumption may fail. Moreover, this approach is often restricted to publicly quoted institutions, for which equity values are available at daily frequency.
A second possible approach
In this paper we propose to apply the maximum entropy approach to the inference of the network of portfolio weights in order to estimate metrics of systemic risk due to fire sales spillovers. Specifically, we show how indirect vulnerability, systemicness (as defined by Greenwood et al., 2015) and the aggregate systemic risk of US commercial banks can be estimated when only partial information (the size of each bank and the capitalization of each asset) is available. Unlike Mistrulli (2011), Mastromatteo et al. (2012), and Anand et al. (2015), we deal with bipartite networks, namely graphs
The contribution of this paper is threefold. First, following a practice that is widespread among researchers of both academic institutions and central banks (see, among others, Sheldon and Maurer, 1998; Upper and Worms, 2004; Wells, 2004; Mistrulli, 2011; Sachs, 2014), we reconstruct the matrix of portfolio holdings as the one that minimizes the Kullback-Leibler divergence from a "naive" guess. We show that this approach, while underestimating systemic risk in interbank networks (Mistrulli, 2011), performs very well in our case, providing unbiased estimates of the aggregate vulnerability as defined by Greenwood et al. (2015). Moreover, we show that the reconstructed matrix corresponds to the one implied by the Capital Asset Pricing Model, and hence it has a clear economic meaning.
Second, we define a statistical ensemble, that is, a set of graphs together with a probability mass function defined on it, such that the expected value of each element of the matrix equals the value implied by the Capital Asset Pricing Model. Although sizes and capitalizations are the only information required to construct the ensemble, we show that it predicts systemic risk metrics very well, not only at the aggregate level but also for each bank. This newly proposed approach is shown to outperform standard maximum entropy approaches (Saracco et al., 2015), i.e. the bipartite extensions of the models proposed by Mastrandrea et al. (2014) for unipartite networks.
Finally, we show how the statistical ensemble is potentially useful for financial regulators and supervisory authorities in their monitoring activities. Indeed, the statistical ensemble implies a probability distribution for any risk metric, and hence it can be used to test statistically whether a specific institution has increased its systemicness with respect to a date in the past.
We structure our paper as follows. Section 2 introduces some nomenclature and briefly describes the risk metrics of Greenwood et al. (2015). The dataset of US commercial banks provided by the FFIEC is discussed in Section 3. In Section 4 we derive the solution of the Cross-Entropy problem and apply it to our dataset. Section 5 is dedicated to the definition of the statistical ensembles, whose performance in estimating risk metrics is investigated in the comparative analysis reported in Section 6. A statistical test useful for surveillance activities by central banks and other regulatory institutions is presented in Section 7. Finally, Section 8 summarizes the main contributions of the paper. The appendices provide additional information on the construction of the dataset of bank portfolio holdings and all the analytical computations omitted in the main text.
2 Systemic risk metrics: Vulnerability and Systemicness
In this paper we use some metrics of systemic risk, for individual banks and in aggregate, recently introduced by Greenwood et al. (2015). There, the authors present a model in which fire sales propagate shocks across the balance sheets of banks. More specifically, they consider a system composed of $N$ banks and $K$ asset classes.
First of all, we introduce the matrix of portfolio holdings $X$, with $N$ rows (banks) and $K$ columns (asset classes), whose element $X_{nk}$ is the dollar amount of type-$k$ assets held by bank $n$. The corresponding matrix of portfolio weights $M$ is thus written as
$$ M_{nk} = \frac{X_{nk}}{\sum_{k'=1}^{K} X_{nk'}} . $$
In what follows, we discretize the elements of $X$, in such a way that $X$ belongs to the space of integer-valued matrices. In the empirical application we use the resolution of the dataset as the unit of discretization.
The total asset size of the $n$-th bank and the total capitalization of the $k$-th asset class are, respectively,
$$ A_n(X) = \sum_{k=1}^{K} X_{nk}, \qquad C_k(X) = \sum_{n=1}^{N} X_{nk}, \qquad (1) $$
where we have explicitly expressed the dependence of $A_n$ and $C_k$ on $X$.
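As a minimal sketch of equation (1) and of the weights $M_{nk}$, assuming the holdings matrix is stored as a list of rows of integer counts (one row per bank, one column per asset class, in units of the dataset resolution), the strength sequences are simply row and column sums. The function names and the toy matrix are hypothetical and serve only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def strength_sequences(X: Matrix) -> Tuple[List[int], List[int]]:
    """Return (A, C): bank sizes A_n(X) (row sums) and asset capitalizations C_k(X) (column sums), eq. (1)."""
    A = [sum(row) for row in X]
    C = [sum(col) for col in zip(*X)]
    return A, C


def portfolio_weights(X: Matrix) -> List[List[float]]:
    """Return M with M_nk = X_nk / A_n(X); assumes every bank has positive total assets."""
    A, _ = strength_sequences(X)
    return [[x / a for x in row] for row, a in zip(X, A)]


# Hypothetical toy system: 3 banks (rows) and 2 asset classes (columns).
X = [[4, 0], [1, 3], [2, 2]]
A, C = strength_sequences(X)
M = portfolio_weights(X)
print(A, C)  # [4, 4, 4] [7, 5]
print(M)     # [[1.0, 0.0], [0.25, 0.75], [0.5, 0.5]]
```

Both sequences add up to the same total (here 12), since each counts every entry of $X$ exactly once.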
The rectangular matrix $X$ can be naturally associated with a bipartite network, i.e. a graph whose vertices can be divided into two disjoint sets such that every edge connects a vertex in one set to a vertex in the other, the two sets being the banks and the asset classes. In network jargon, $A=(A_1,\dots,A_N)$ and $C=(C_1,\dots,C_K)$ are called the strength sequences of, respectively, the top (banks) and bottom (assets) nodes of the network represented by the matrix $X$.
A relevant piece of information concerning the balance sheet of each bank $n$ is its total equity $E_n$, from which one can compute the leverage, defined as the debt-to-equity ratio $B_n = (A_n - E_n)/E_n$ (as in Greenwood et al., 2015).
Finally, each asset class $k$ is characterized by an illiquidity parameter $\ell_k \geq 0$, defined as the return per dollar of net purchase of asset $k$.
Greenwood et al. (2015) use this setting to define some metrics of systemic risk that capture the effect of fire sales in response to a shock on asset prices. The shock is described by the $K$-dimensional vector $F$, whose components $F_k$ are the asset shocks. They define:

Aggregate vulnerability as […] the percentage of aggregate bank equity that would be wiped out by bank deleveraging if there was a shock […] to asset returns.

Bank systemicness $S_n$ as the contribution of bank $n$ to aggregate vulnerability.

Bank $n$'s indirect vulnerability as […] the impact of the shock on its equity through the deleveraging of other banks.
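By assuming that banks follow the practice of leverage targeting and that, in response to a negative asset shock, they sell assets proportionally to their pre-shock holdings, Greenwood et al. (2015) show that the systemicness of bank $n$ can be written as
$$ S_n = \frac{1}{E}\left(\sum_{k=1}^{K} C_k\, \ell_k\, M_{nk}\right) B_n A_n r_n , \qquad (2) $$
where $E = \sum_{n=1}^{N} E_n$ is the total equity, $r_n$ is the $n$-th element of the vector $r = MF$, i.e. the portfolio return of bank $n$ due to the shock $F$, and $C_k$ is the total capitalization of asset class $k$ defined in (1).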
The aggregate vulnerability is then simply
$$ AV = \sum_{n=1}^{N} S_n . \qquad (3) $$
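Finally, the indirect vulnerability of bank $n$ is
$$ IV_n = \frac{A_n}{E_n} \sum_{k=1}^{K} M_{nk}\, \ell_k \sum_{m=1}^{N} M_{mk} B_m A_m r_m . \qquad (4) $$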
In what follows we assume, as in Duarte and Eisenbach (2013), the same shock $F_k = \bar{F}$ for all $k$, which in turn implies $r_n = \bar{F}$ in equations (2) and (4), since the portfolio weights of each bank sum to one. Note, however, that as long as all the assets are shocked by the same amount, our results do not depend on its value, since the systemic risk measures only change by a prefactor. Note also that both systemicness and indirect vulnerability must be thought of as quantities expressed in percentage. Finally, we set the illiquidity parameter to a common value $\ell_k = \ell$ for all asset classes except cash, for which we put $\ell_k = 0$ (as in Greenwood et al., 2015; Duarte and Eisenbach, 2013).
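Equations (2)-(4) can be evaluated directly once holdings, equities, leverages, illiquidity parameters and shocks are given. The following pure-Python sketch follows the notation used above ($A_n$, $C_k$, $M_{nk}$, $r = MF$, $E$); the function name and the toy inputs are hypothetical, and leverage is passed in as the debt-to-equity ratio $B_n$ rather than computed internally.

```python
from typing import List, Sequence, Tuple


def greenwood_metrics(
    X: Sequence[Sequence[float]],
    equity: Sequence[float],
    leverage: Sequence[float],
    ell: Sequence[float],
    F: Sequence[float],
) -> Tuple[List[float], float, List[float]]:
    """Return (S, AV, IV) following eqs. (2), (3) and (4).

    X: N x K holdings; equity: E_n; leverage: debt-to-equity B_n;
    ell: illiquidity parameters ell_k; F: asset shocks F_k.
    """
    N, K = len(X), len(X[0])
    A = [sum(X[n]) for n in range(N)]                              # bank sizes A_n
    C = [sum(X[n][k] for n in range(N)) for k in range(K)]         # capitalizations C_k
    M = [[X[n][k] / A[n] for k in range(K)] for n in range(N)]     # weights M_nk
    r = [sum(M[n][k] * F[k] for k in range(K)) for n in range(N)]  # returns r = M F
    E = sum(equity)                                                 # total equity

    # Eq. (2): S_n = (1/E) * (sum_k C_k ell_k M_nk) * B_n * A_n * r_n
    S = [
        sum(C[k] * ell[k] * M[n][k] for k in range(K)) * leverage[n] * A[n] * r[n] / E
        for n in range(N)
    ]
    AV = sum(S)  # Eq. (3)

    # Deleveraging flow in asset k: sum_m M_mk B_m A_m r_m (negative, i.e. sales, when r < 0)
    flow = [sum(M[m][k] * leverage[m] * A[m] * r[m] for m in range(N)) for k in range(K)]
    # Eq. (4): IV_n = (A_n / E_n) * sum_k M_nk ell_k flow_k
    IV = [A[n] / equity[n] * sum(M[n][k] * ell[k] * flow[k] for k in range(K)) for n in range(N)]
    return S, AV, IV


# Hypothetical toy system: the second asset class plays the role of cash (ell = 0),
# and every asset receives the same -1% shock.
S, AV, IV = greenwood_metrics(
    X=[[2, 2], [0, 4]], equity=[1, 1], leverage=[3, 3], ell=[0.01, 0.0], F=[-0.01, -0.01]
)
```

In the toy example the leverage values are consistent with $B_n = (A_n - E_n)/E_n = 3$. The output also illustrates an identity that follows from (2)-(4): the aggregate vulnerability equals the equity-weighted average of the indirect vulnerabilities, $AV = \sum_n (E_n/E)\, IV_n$; here both equal $-0.0006$, i.e. a loss of 0.06% of aggregate equity.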
In the next section we describe in detail the dataset that we use to measure systemic risk in the US banking sector, as captured by the metrics of Greenwood et al. (2015). This dataset allows us to obtain quarterly estimates of systemicness and of aggregate and indirect vulnerability, and to compare them with those inferred from the Cross-Entropy approach and the Maximum Entropy principle. Since we deal with both real and reconstructed (or sampled from a statistical ensemble) networks, to avoid ambiguity we adopt from now on the convention of adding a superscript $*$ to any variable that refers to a real (observed) network, while the variable is written without the superscript whenever it refers to a reconstructed network (e.g. one sampled from a statistical ensemble as described in Section 5).
3 Data
All regulated financial institutions in the United States are required to file periodic financial information with their respective regulators.
The Federal Financial Institutions Examination Council (FFIEC) is the regulatory institution responsible for collecting and maintaining the data used in our analysis.
The financial institutions under investigation are commercial banks and savings and loan associations.
The FFIEC defines officially a commercial bank as
The data provided by the Call reports are publicly available
Hence, for each quarter from the first quarter of 2001 to the third quarter of 2014, we are able to construct a matrix $X^*$ of bank holdings whose element $X^*_{nk}$ is the total dollar amount invested by the $n$-th bank in the $k$-th asset class. Finally, it is important to note that around half of the entries of $X^*$ are zero. Thus the network is relatively dense, but far from fully connected: not every bank invests in all asset classes.
4 The CrossEntropy Approach
As anticipated in the introduction, Cross-Entropy is a popular method, widely adopted by academic scholars and central bank researchers, to reconstruct a target matrix (typically the interbank matrix) from partial knowledge of its properties. The idea is to select an a priori guess for the matrix and then to find the matrix closest to it that satisfies some constraints. The guess matrix is chosen either on the basis of some randomness assumption or on the basis of economic intuition. The constraints enforce all the partial knowledge available on the target matrix. In the simplest case, they are the non-negativity of the matrix elements and the total row and column sums. Finally, the Kullback-Leibler divergence is used as the measure of distance between the guess and the target matrix to be minimized.
For the specific case of the holdings of US commercial banks, we assume that, for each quarter, we only know the total asset size $A_n^*$ of the $n$-th bank and the total capitalization $C_k^*$ of the $k$-th asset class, as provided by the FFIEC data. The Cross-Entropy approach derives the target matrix as the solution of the optimization problem
$$ \min_{X} \sum_{n=1}^{N}\sum_{k=1}^{K} X_{nk} \ln\frac{X_{nk}}{X^0_{nk}} \quad \text{s.t.} \quad \sum_{k=1}^{K} X_{nk} = A_n^*, \quad \sum_{n=1}^{N} X_{nk} = C_k^*, \quad X_{nk} \geq 0, \qquad (5) $$
where $X^0_{nk}$ are the entries of the guess matrix $X^0$.
Note that the cases analyzed in the interbank lending literature (Mistrulli, 2011) typically involve an additional constraint on the diagonal elements, required to prevent an institution from being simultaneously borrower and lender to itself (see, for example, Appendix B in Mistrulli, 2011). The matrix of portfolio holdings analyzed here does not require any restriction of this kind, a feature that greatly simplifies the solution of problem (5).
In fact, the initial guess proposed, for example, by Mistrulli (2011) coincides, in our setting, with the Capital Asset Pricing Model (CAPM). In a standard CAPM, investors choose their portfolio in such a way that the weight on each stock is that stock's market value as a fraction of the total market value of all stocks (Sharpe, 1964; Lintner, 1965; Mossin, 1966).
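Since $A_n^*$ is the total asset size of the $n$-th bank and the total market value of all asset classes is $C^* = \sum_{k=1}^{K} C_k^*$, the CAPM portfolio weights and the corresponding holdings are
$$ M^{\mathrm{CAPM}}_{nk} = \frac{C_k^*}{C^*}, \qquad X^{\mathrm{CAPM}}_{nk} = \frac{A_n^*\, C_k^*}{C^*} . \qquad (6) $$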
The CAPM-implied holdings are a natural choice for the initial guess $X^0$, since they are the simplest combination of the strength sequences that satisfies the constraints: indeed, $\sum_k X^{\mathrm{CAPM}}_{nk} = A_n^*$ and $\sum_n X^{\mathrm{CAPM}}_{nk} = C_k^*$, because $\sum_n A_n^* = \sum_k C_k^* = C^*$. Moreover, since problem (5) has no constraint on the diagonal elements, and the Kullback-Leibler divergence is non-negative and vanishes only when the two matrices coincide, the optimal solution of the Cross-Entropy problem (5) with $X^0 = X^{\mathrm{CAPM}}$ is $X^{\mathrm{CAPM}}$ itself. Hence, thanks to the bipartite nature of the network under study, we do not have to resort to numerical routines to solve problem (5). If other constraints are added (e.g. that some banks cannot invest in some asset classes), problem (5) can be solved numerically with the additional constraints.
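The argument above can be checked numerically on a toy example. The sketch below, with hypothetical function names and margins, builds $X^{\mathrm{CAPM}}$ from the strength sequences as in (6), verifies that it satisfies the margin constraints of (5), and evaluates the objective of (5) with the convention $0 \ln 0 = 0$. Because the two matrices compared always share the same total, the log-sum inequality makes the objective non-negative: it is zero for $X^{\mathrm{CAPM}}$ itself and strictly positive for another feasible matrix with the same margins.

```python
import math
from typing import List, Sequence


def capm_matrix(A: Sequence[float], C: Sequence[float]) -> List[List[float]]:
    """X^CAPM_nk = A_n C_k / sum_k C_k, as in eq. (6)."""
    total = sum(C)
    return [[a * c / total for c in C] for a in A]


def cross_entropy(X: Sequence[Sequence[float]], X0: Sequence[Sequence[float]]) -> float:
    """Objective of problem (5): sum_nk X_nk ln(X_nk / X0_nk), with 0 ln 0 = 0."""
    return sum(
        x * math.log(x / x0)
        for row, row0 in zip(X, X0)
        for x, x0 in zip(row, row0)
        if x > 0
    )


# Hypothetical margins: bank sizes and asset capitalizations with equal totals.
A, C = [4, 4, 4], [7, 5]
X_capm = capm_matrix(A, C)
X_other = [[4, 0], [1, 3], [2, 2]]  # another matrix with the same row and column sums

row_sums = [sum(row) for row in X_capm]
col_sums = [sum(col) for col in zip(*X_capm)]
print(row_sums, col_sums)
print(cross_entropy(X_capm, X_capm), cross_entropy(X_other, X_capm))
```

Note that `X_other` contains a zero entry while `X_capm` has none, which mirrors the difference between the observed and the CAPM-implied matrices discussed next.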
We now test empirically the validity of the CAPM-implied portfolio holdings on our data. Notice that the true matrix $X^*$ is quite different from $X^{\mathrm{CAPM}}$: the latter has no vanishing elements, while in the former half of the matrix elements are zero. Therefore it is not obvious a priori that the CAPM-implied matrix is able to capture the dynamics of systemic risk. This is, however, the case, as clearly shown in Figure 2, where we compare the true aggregate vulnerability (equation (3) of Section 2), computed on the dataset described in Section 3, with the one computed on the CAPM-implied portfolio holdings.
The main implication of Figure 2 is that, at least for the dataset under analysis, it is not necessary to know the matrix $X^*$ to assess systemic risk as measured by aggregate vulnerability. Knowledge of bank sizes and asset capitalizations is enough to infer the matrix $X^{\mathrm{CAPM}}$, which reproduces very well the aggregate behavior of the system as measured by aggregate vulnerability, i.e. by the sum of bank systemicness. This differs from the finding of Mistrulli (2011) that, for the interbank network, the Cross-Entropy approach significantly underestimates systemic risk; in our case the bias is negligible.
At this stage two main issues remain to be addressed. The first is how accurate the estimates of systemicness and indirect vulnerability of each single bank provided by $X^{\mathrm{CAPM}}$ are; the second is which kind of model can be adopted to statistically validate a variation of systemicness (or indirect vulnerability) for a specific bank. We turn to these issues in what follows.
5 Network Statistical Models
While providing a very accurate approximation of aggregate vulnerability, the matrix $X^{\mathrm{CAPM}}$ says little about the statistical significance of the systemicness and indirect vulnerability of banks. For this purpose a statistical null model is required.
We define a statistical network model following the theoretical framework of Kolaczyk (2009). A (statistical) network model is defined by a set of graphs $\mathcal{G}$, called the ensemble, and a probability mass function $P_\theta$ indexed by a vector of model parameters $\theta$. In formulas, it is expressed as the triplet
$$ \left(\mathcal{G},\; P_\theta,\; \theta \in \Theta\right), $$
where $\Theta$ is a convex subset of $\mathbb{R}^q$, with $q$ the total number of parameters of the model. The set $\mathcal{G}$ is a countable set whose elements are called graphs. In what follows, we will not distinguish between a graph $G$ and the associated matrix $X$, i.e. the probability mass function is defined on the space of integer-valued matrices. Moreover, the probability mass function $P_\theta$ is a function defined on $\mathcal{G}$ with values in $[0,1]$ and such that $\sum_{G \in \mathcal{G}} P_\theta(G) = 1$; it is allowed to depend on the vector of real parameters $\theta$. For an arbitrary (regular enough) network function $f$ defined on the set $\mathcal{G}$ (see for example the functions defined in equations (1)-(19)), the expected value of $f$ on the ensemble is defined as
$$ \langle f \rangle_\theta = \sum_{G \in \mathcal{G}} f(G)\, P_\theta(G) . $$
A model can be defined either by explicitly giving the ensemble and the probability mass function along with the parameter space, or by deriving $P_\theta$ through the repeated application of some generative mechanism or rule, starting either from an empty graph or from a reference graph to which a randomization procedure is applied.
We define our network statistical models in the next subsections by invoking the Maximum Entropy principle. Before proceeding, however, it is important to stress that, once a network model has been defined, there are at least two possible applications. First, whenever only the partial information imposed through the maximization constraints is available (and hence the true matrix $X^*$ is unknown), the network model can be used to estimate the true (unobserved) value $f(X^*)$ of any network function $f$ (e.g. the systemicness and indirect vulnerability of banks), approximating it, for example, by $\langle f \rangle_\theta$. Second, when the entire matrix of bank holdings is known, we can use the associated $P_\theta$ to statistically validate any network function as observed on real data, for example by testing whether the systemicness of a specific bank has grown significantly over time. The first application is discussed in Section 6 and the second in Section 7.
5.1 The Maximum Entropy Principle
One way to construct ensembles of networks is to invoke the Maximum Entropy (ME) principle, i.e. to select the probability mass function that maximizes the Shannon entropy
$$ \mathcal{S}[P] = -\sum_{X \in \mathcal{G}} P(X) \ln P(X) $$
subject to the normalization constraint
$$ \sum_{X \in \mathcal{G}} P(X) = 1 , $$
to which further constraints may be added to define a specific ensemble (see below). In our case, given the peculiar role of the CAPM in reproducing the aggregate vulnerability, we derive the probability mass function by solving the maximization problem
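$$ \max_{P} \; \mathcal{S}[P] \quad \text{s.t.} \quad \sum_{X \in \mathcal{G}} P(X) = 1, \qquad \langle X_{nk} \rangle = X^{\mathrm{CAPM}}_{nk} \quad \forall\, n, k . \qquad (7) $$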
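We call this model the Maximum Entropy Capital Asset Pricing Model (MECAPM henceforth). In Appendix B.1 we prove that the MECAPM has the unique solution
$$ P(X) = \prod_{n=1}^{N}\prod_{k=1}^{K} \left(1 - y_{nk}\right) y_{nk}^{X_{nk}}, \qquad y_{nk} = \frac{X^{\mathrm{CAPM}}_{nk}}{1 + X^{\mathrm{CAPM}}_{nk}}, \qquad (8) $$
hence each matrix entry is independently and geometrically distributed with mean $X^{\mathrm{CAPM}}_{nk}$.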
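Because (8) factorizes into independent geometric laws, drawing a matrix from the MECAPM ensemble only requires sampling each entry separately. The sketch below uses inverse-transform sampling, $X_{nk} = \lfloor \ln U / \ln y_{nk} \rfloor$ with $U$ uniform on $(0,1]$, which gives $P(X_{nk} \geq x) = y_{nk}^{x}$ and hence the law in (8); the function name and the toy means are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_mecapm(X_capm: Sequence[Sequence[float]], rng: random.Random) -> List[List[int]]:
    """Draw one integer matrix from eq. (8): entry (n, k) is geometric on {0, 1, ...} with mean X_capm[n][k]."""
    sample = []
    for row in X_capm:
        new_row = []
        for mean in row:
            if mean <= 0:
                new_row.append(0)  # a zero mean forces a zero entry
                continue
            y = mean / (1.0 + mean)  # geometric parameter y_nk of eq. (8)
            u = 1.0 - rng.random()   # uniform on (0, 1], avoids log(0)
            new_row.append(math.floor(math.log(u) / math.log(y)))
        sample.append(new_row)
    return sample


# Usage: the Monte Carlo average of many draws should approach X^CAPM.
rng = random.Random(42)
draws = [sample_mecapm([[3.0, 0.5]], rng) for _ in range(20000)]
mean_first = sum(d[0][0] for d in draws) / len(draws)
mean_second = sum(d[0][1] for d in draws) / len(draws)
```

Averaging any network function over such draws gives a Monte Carlo estimate of its ensemble expectation $\langle f \rangle$; unlike $X^{\mathrm{CAPM}}$, individual draws contain zeros.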
Since other specifications of maximum entropy are quite popular in the network reconstruction literature, for comparison purposes only we consider two other ensembles, mainly inspired by Mastrandrea et al. (2014) and Saracco et al. (2015). Each of them is characterized by different constraints imposed on the maximization of Shannon's entropy.
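In the first, which we refer to as the Bipartite Weighted Configuration Model (BIPWCM) and whose probability mass function is given in Appendix B.3, the constrained maximization becomes
$$ \max_{P} \; \mathcal{S}[P] \quad \text{s.t.} \quad \sum_{X \in \mathcal{G}} P(X) = 1, \qquad \langle A_n(X) \rangle = A_n^* \;\; \forall\, n, \qquad \langle C_k(X) \rangle = C_k^* \;\; \forall\, k . $$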
The BIPWCM imposes weaker constraints than the MECAPM, while exploiting the same information set. Nevertheless, the MECAPM has a fully analytical solution, while the BIPWCM does not. As shown in the Appendix, it is possible to write a system of equations for the Lagrange multipliers of the optimization problem, which is easily solved numerically.
Finally, we consider another (richer) statistical ensemble whose probability mass function, derived in Appendix B.3, corresponds in our bipartite framework to the enhanced configuration model of Mastrandrea et al. (2014). This newly defined ensemble, which we call the Bipartite Enhanced Configuration Model (BIPECM), is obtained via Maximum Entropy by imposing both the mean values of the strengths (as in BIPWCM) and the mean values of the degrees, i.e. the number of edges incident on each vertex. In other words, we reconstruct the matrix assuming knowledge of the number of assets in which each bank invests as well as of the number of banks investing in each asset. Although this information is typically not available, we consider this ensemble to show that even with an information set significantly larger than the one used in MECAPM it is very difficult to outperform it. Mathematically, the BIPECM is obtained by solving the optimization problem
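$$ \max_{P} \; \mathcal{S}[P] \quad \text{s.t.} \quad \sum_{X \in \mathcal{G}} P(X) = 1, \quad \langle A_n(X) \rangle = A_n^*, \quad \langle C_k(X) \rangle = C_k^*, \quad \langle d_n(X) \rangle = d_n^*, \quad \langle h_k(X) \rangle = h_k^* \quad \forall\, n, k, \qquad (9) $$
where $d_n(X) = \sum_{k} \mathbb{1}[X_{nk} > 0]$ and $h_k(X) = \sum_{n} \mathbb{1}[X_{nk} > 0]$ are, respectively, the row and column degree sequences (see Appendix B.3 for more details). The distinctive feature of BIPECM is the information on the degree sequences, which is absent in both BIPWCM and MECAPM. Note that the three ensembles can be used not only for statistical inference, but also to produce estimates of any function defined on the network, which is the topic of the next section.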
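The degree sequences entering (9) are counts of non-zero entries. A minimal sketch, with a hypothetical function name and toy matrix, that also returns the fraction of zero entries (the quantity that, in the real data, is close to one half):

```python
from typing import List, Sequence, Tuple


def degree_sequences(X: Sequence[Sequence[int]]) -> Tuple[List[int], List[int], float]:
    """Row degrees d_n, column degrees h_k and fraction of zero entries of a holdings matrix."""
    K = len(X[0])
    d = [sum(1 for x in row if x > 0) for row in X]            # asset classes held by each bank
    h = [sum(1 for row in X if row[k] > 0) for k in range(K)]  # banks holding each asset class
    zero_fraction = 1.0 - sum(d) / (len(X) * K)
    return d, h, zero_fraction


d, h, zero_fraction = degree_sequences([[4, 0], [1, 3], [2, 2]])
print(d, h, zero_fraction)
```

Since $\sum_n d_n = \sum_k h_k$ equals the number of non-zero entries, constraining the expected degrees in (9) also fixes the expected number of zero entries, something neither MECAPM nor BIPWCM controls.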
6 Assessing systemic risk for individual banks
In this section we compare the performance of the three ensembles presented above in assessing systemic risk for individual banks. The three estimators are

MECAPM estimator: only the information on the strength sequences is used, systemicness and indirect vulnerability are estimated through their expected values according to the probability mass function in (8).

BIPWCM estimator: as in the MECAPM case, only the information on the strength sequences is used, but the constraints imposed on the maximization are weaker. Systemicness and indirect vulnerability are estimated through their expected values according to the probability mass function in (17).

BIPECM estimator: the information on the strength and degree sequences is used, systemicness and indirect vulnerability are estimated through their expected values according to the probability mass function in (23).
Notice that, for each estimator, there are at least two possible proxies. Suppose, for instance, that we want to estimate $S_n^*$, the systemicness of bank $n$. We could proxy $S_n^*$ either by $\langle S_n \rangle$ or by $S_n(\langle X \rangle)$, where $\langle X \rangle$ is the expected matrix of holdings under the network probability distribution.
A peculiarity of the MECAPM is that the distribution of each single entry is known in closed form, without any numerical procedure, and hence we can provide explicit formulas for both $\langle S_n \rangle$ and $\langle IV_n \rangle$. In Appendix C we derive these formulas and, in particular, we show that the expected values $\langle S_n \rangle$ and $\langle IV_n \rangle$ are well approximated by, respectively, $S_n(\langle X \rangle)$ and $IV_n(\langle X \rangle)$, which, for the MECAPM, coincide with the systemicness and indirect vulnerability of bank $n$ computed on the CAPM-implied matrix, since $\langle X \rangle = X^{\mathrm{CAPM}}$. This result shows that the Cross-Entropy approach and the MECAPM are almost equivalent, at least as far as the estimates of the average risk metrics are concerned.
In evaluating the performance of each class of estimators, it is important to have good estimates for the most systemic or vulnerable banks, since the remaining ones are not particularly relevant in the context of systemic risk. For this reason we proceed as follows. Suppose we want to assess the performance of an estimator that produces estimates $\hat{S}_n$ and $\widehat{IV}_n$ of, respectively, the systemicness and indirect vulnerability of the $n$-th bank. For each bank and each quarter we compute the relative errors in estimating $S_n^*$ and $IV_n^*$ as
$$ \epsilon^{S}_n = \frac{\hat{S}_n - S_n^*}{S_n^*}, \qquad \epsilon^{IV}_n = \frac{\widehat{IV}_n - IV_n^*}{IV_n^*} . \qquad (10) $$
We then divide the whole sample of banks into four quartiles according to the true bank systemicness (or the true indirect vulnerability) in the investigated quarter, and we compute the relative errors (10) for all the banks in each quartile and for each of the three estimation methods. Finally, we plot the median of the relative error and, as a measure of dispersion, we record the interquartile range, i.e. the difference between the upper and lower quartiles.
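The evaluation procedure based on (10) can be sketched as follows. Banks are ranked by the magnitude of the true metric and split into four groups of (nearly) equal size; within each group we take the median of the relative errors and the interquartile range computed with Python's `statistics.quantiles`. The function names are hypothetical, the sketch assumes non-zero true values and at least eight banks, and the quantile convention (`method="inclusive"`) is our choice, not necessarily the one used for Figures 3 and 4.

```python
import statistics
from typing import List, Sequence, Tuple


def relative_errors(estimates: Sequence[float], truth: Sequence[float]) -> List[float]:
    """Eq. (10): (estimate - true) / true; assumes every true value is non-zero."""
    return [(e - t) / t for e, t in zip(estimates, truth)]


def errors_by_quartile(
    estimates: Sequence[float], truth: Sequence[float]
) -> List[Tuple[float, float]]:
    """(median, interquartile range) of the relative errors in each quartile of |true metric|.

    Group 0 holds the least systemic/vulnerable banks, group 3 the most systemic/vulnerable.
    Requires at least two banks per group (i.e. at least 8 banks).
    """
    N = len(truth)
    errors = relative_errors(estimates, truth)
    order = sorted(range(N), key=lambda n: abs(truth[n]))  # rank banks by magnitude of the true value
    groups: List[List[float]] = [[] for _ in range(4)]
    for rank, n in enumerate(order):
        groups[4 * rank // N].append(errors[n])
    summary = []
    for g in groups:
        q1, _, q3 = statistics.quantiles(g, n=4, method="inclusive")
        summary.append((statistics.median(g), q3 - q1))
    return summary


# Hypothetical example: an estimator that is 10% too small for every bank.
truth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
estimates = [0.9 * t for t in truth]
summary = errors_by_quartile(estimates, truth)
```

Ranking by magnitude keeps the first group equal to the least systemic banks whether losses are expressed as positive or negative numbers; with the sign convention of (10), a negative median signals underestimation.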
Figure 3 shows the results for bank systemicness and Figure 4 those for indirect vulnerability. The three colors refer to the three estimation methods (see the figure caption for more details). We observe in all panels and for each quarter that BIPWCM strongly underestimates individual bank systemicness and indirect vulnerability. The median relative error ranges between and and the interquartile range includes zero only for the first quartile, i.e. for the least systemic/vulnerable banks, which are of less interest in systemic risk assessment. The estimator based on BIPECM (using the additional information on degrees) gives slightly better results, even though a strong underestimation is still present. The median relative error ranges between and and again the interquartile range includes zero only for banks in the first quartile.
By contrast, the estimator based on MECAPM performs much better. The median relative error never goes below and almost always the interquartile range is centered around zero. The most notable exceptions are the first quartile (top left panels), which includes the least important banks, and the fourth quartile of indirect vulnerability. In the latter, important case the median relative error is around . Even though we do not have an explanation for this behavior, it is important to notice that MECAPM strongly outperforms the other two estimators.
In summary, the estimates of systemicness and indirect vulnerability for each single bank provided by the CAPM-implied matrix are almost identical to the corresponding expected values on the MECAPM ensemble. Moreover, they are satisfactorily accurate and clearly more reliable than those provided by standard maximum entropy ensembles.
Once more, the important message is that fairly accurate estimates of metrics of systemic risk due to fire sales spillover can be achieved, at both the aggregate and the individual institution level, without full knowledge of the portfolio holdings of financial institutions.
7 Monitoring and testing changes in systemicness
As another application of the ensembles of graphs obtained with the Maximum Entropy method, we consider here the problem of assessing whether the systemicness of a given bank (or of the whole system) has changed in a statistically significant way. Answering this question requires a null hypothesis, and we propose to use network ensembles to this end. Since the MECAPM shows superior performance in estimating risk metrics, in this section we use it to illustrate a possible application to statistical validation. Our objective here is not to study all the banks and all the quarters, but only to show how the testing method can be implemented.
In particular, imagine a regulator who monitors a given bank, measuring its systemicness and searching for evidence of a significant increase. Taking a given quarter as reference, the regulator can extract the distribution of the bank's systemicness and, in the subsequent quarters, identify when the systemicness falls outside a given confidence interval around the reference period. As an example, we select four banks that are among the top fifty in the first quarter and that exist for the entire time period (i.e. they do not exit the dataset). For each quarter we compute the true bank systemicness and the  confidence bands according to the MECAPM ensemble (see Figure 5). We then add a magenta square in each quarter in which the true systemicness is above the confidence band of the first quarter, used as reference. Hence, a magenta square indicates a quarter in which the systemicness of the bank is statistically larger (according to the MECAPM) than at the beginning of 2001. We show two banks for which a statistically significant change in systemicness is observed (top row) and two for which no change is observed (bottom row). Notably, in the former case we find that the systemicness of the banks increased significantly well before the onset of the 2007-2008 financial crisis. This phenomenon persisted throughout the crisis and did not vanish before the end of 2009. This suggests that network statistical models could be of valuable help to central banks and other supervisory authorities in their surveillance activity, both as monitoring tools and in the construction of early warning indicators.
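The monitoring procedure described above can be sketched as a Monte Carlo test. The code below is a minimal illustration under several assumptions that are ours, not necessarily the authors': the reference distribution is obtained by sampling holdings from the MECAPM of the reference quarter while leverage, total equity and illiquidity are held fixed; systemicness is expressed as a positive loss under a uniform shock of size $|\bar F|$, using $M_{nk} A_n = X_{nk}$ in (2); and the upper band is an empirical quantile at a user-chosen level (0.975 and a 1% shock are only illustrative defaults). All function names and numbers are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_mecapm(X_capm: Sequence[Sequence[float]], rng: random.Random) -> List[List[int]]:
    """One draw from eq. (8): independent geometric entries with means X_capm[n][k]."""
    return [
        [
            0 if m <= 0 else math.floor(math.log(1.0 - rng.random()) / math.log(m / (1.0 + m)))
            for m in row
        ]
        for row in X_capm
    ]


def systemicness_loss(
    X: Sequence[Sequence[int]], n: int, leverage: Sequence[float],
    ell: Sequence[float], total_equity: float, shock_size: float,
) -> float:
    """|S_n| from eq. (2) with a uniform shock: (|F| B_n / E) * sum_k C_k(X) ell_k X_nk."""
    K = len(X[0])
    C = [sum(row[k] for row in X) for k in range(K)]
    return shock_size * leverage[n] / total_equity * sum(C[k] * ell[k] * X[n][k] for k in range(K))


def flag_significant_increases(
    observed: Sequence[float], X_capm_ref: Sequence[Sequence[float]], n: int,
    leverage: Sequence[float], ell: Sequence[float], total_equity: float,
    shock_size: float = 0.01, level: float = 0.975, draws: int = 5000, seed: int = 0,
) -> List[int]:
    """Indices of quarters whose observed |S_n| exceeds the `level` quantile under the reference MECAPM."""
    rng = random.Random(seed)
    reference = sorted(
        systemicness_loss(sample_mecapm(X_capm_ref, rng), n, leverage, ell, total_equity, shock_size)
        for _ in range(draws)
    )
    upper = reference[max(0, math.ceil(level * draws) - 1)]  # empirical upper band
    return [q for q, s in enumerate(observed) if s > upper]


# Hypothetical reference quarter: 3 banks, 2 asset classes (the second behaves as cash).
X_capm_ref = [[7 / 3, 5 / 3] for _ in range(3)]
flags = flag_significant_increases(
    observed=[0.0, 1e6], X_capm_ref=X_capm_ref, n=0,
    leverage=[3, 3, 3], ell=[0.01, 0.0], total_equity=3.0,
)
```

In practice `observed` would hold the systemicness computed on the real matrices $X^*$ of the subsequent quarters; using the lower quantile instead would detect significant decreases.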
8 Conclusions
In this paper we focused on the problem of estimating metrics of systemic risk due to fire sale spillover in the presence of limited information on the composition of the portfolios of financial institutions. Full knowledge of the portfolio holdings of each institution in the economy is required to obtain a precise estimate of any risk metric that, like those proposed by Greenwood et al. (2015), is based on the mechanism of portfolio rebalancing through fire sales. Such detailed information, however, may not be available, especially at frequencies higher than quarterly, which makes the estimation of systemic risk quite difficult. We circumvent the problem by providing accurate estimates of systemic risk metrics that rely only on partial knowledge of the system, namely the sizes of balance sheets and the capitalization of assets (or asset classes), which are much easier to trace. In this respect, we have shown that the widely used method of Cross-Entropy minimization returns, for the system under analysis, the Capital Asset Pricing Model (CAPM), and that it does a very good job of estimating aggregate vulnerability without requiring any knowledge of the underlying matrix of bank portfolio holdings. Furthermore, we introduced a Maximum Entropy (ME) ensemble that reproduces the CAPM on average and performs quite well in estimating the systemicness and indirect vulnerability of single institutions, outperforming standard ME competitors. The estimation of systemic risk metrics could provide valuable information to any policy maker, but variations in systemicness and indirect vulnerability are difficult to interpret without statistical validation. For this reason, as a final contribution, we have proposed the ME ensemble as a tool to assess the statistical significance of changes in systemic risk metrics.
On a selection of banks from our dataset we documented that their systemicness increased significantly, with respect to the level observed at the beginning of 2001, well before the onset of the 2007-2008 financial crisis. Although deeper investigations are required in this direction, we believe that this approach could easily be implemented as an early warning indicator of systemic risk.
Appendix A Data Description and Dataset Creation
This appendix provides some descriptive features of the data, along with the method adopted to build the asset classes of the bank-asset network analyzed in the paper. The left panel (first row) of Figure 6 reports, on a log-log scale, the kernel density of bank sizes (i.e. the total amount of assets held by the bank) pooled across all quarters. Bank sizes are clearly very heterogeneous. The right panel (first row) of Figure 6 reports the density of bank leverages pooled across all quarters. In this case we observe a much less heterogeneous distribution, with most banks showing a leverage around . Finally, the second row of Figure 6 reports the relation between size and leverage. The plot is obtained by sorting all records of bank size from the smallest to the largest and then applying a moving-window procedure. As expected from the density plots, there is no relation between leverage and bank size, since most banks have a leverage of and highly heterogeneous sizes.
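A minimal sketch of the moving-window procedure follows. It assumes that records are bank-quarter pairs and that each window is summarized by the plain mean of size and of leverage; the window length and the choice of the mean are assumptions, since the text does not specify them.

```python
import numpy as np

def size_leverage_profile(sizes, leverages, window=50):
    """Sort bank-quarter records by size and average size and leverage over a
    moving window of `window` consecutive records (window length is hypothetical)."""
    sizes = np.asarray(sizes, dtype=float)
    leverages = np.asarray(leverages, dtype=float)
    if not 1 <= window <= sizes.size:
        raise ValueError("window must be between 1 and the number of records")
    order = np.argsort(sizes)                 # smallest to largest bank size
    kernel = np.full(window, 1.0 / window)    # equal weights inside the window
    # mode="valid" keeps only windows lying entirely inside the sorted sample
    mean_size = np.convolve(sizes[order], kernel, mode="valid")
    mean_leverage = np.convolve(leverages[order], kernel, mode="valid")
    return mean_size, mean_leverage

s, lev = size_leverage_profile([4.0, 1.0, 3.0, 2.0], [40.0, 10.0, 30.0, 20.0], window=2)
print(s, lev)  # -> [1.5 2.5 3.5] [15. 25. 35.]
```

Plotting `mean_leverage` against `mean_size` on a logarithmic size axis gives a smoothed size-leverage relation of the kind described for Figure 6.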
Concerning the asset classes used in the main text, we now describe how they have been constructed. As mentioned in the main text, the focus of the paper is on commercial banks, whose precise definition is given by the FFIEC as "[…] every national bank, state member bank, insured state nonmember bank, and savings association is required to file a consolidated Call Report normally as of the close of business on the last calendar day of each calendar quarter, i.e., the report date. The specific reporting requirements depend upon the size of the bank and whether it has any “foreign” offices […]". This is the set of institutions referred to as Commercial Banks throughout the paper.
Forms FFIEC 031 and FFIEC 041 are dedicated to, respectively, banks with domestic and foreign offices and banks with only domestic offices. Both forms, however, adopt the same coding system. More specifically, there are only two types of codes, RCON and RCFD, each followed by a four-character alphanumeric code. The alphanumeric code identifies the balance-sheet item; for example, refers to total assets of the bank. The prefix RCON is used for items relative to domestic offices, while RCFD encompasses both domestic and foreign offices. Hence is the code for the total assets held by the bank in U.S. offices, while refers to the sum of total assets held in U.S. offices and in offices abroad. Of course, for banks that file the FFIEC 041, the RCON and RCFD codes with the same alphanumeric part report the same value.
Table 1 documents the detailed composition of each asset class. For each asset class (first column) we report a short variable name in the second column and its composition in terms of FFIEC items in the third column. Such abbreviations are needed since some asset classes, e.g. “loans to consumers in foreign offices”, are assembled by subtracting previously defined asset classes from FFIEC codes. There is a one-to-one correspondence between asset classes and variable names, apart from the case of “loans secured by real estate in domestic offices”, which is computed as the sum of five variables, from “construction loans” to “non farm, non residential”. The FFIEC formula reported in the third column may vary over time, hence we report in bold the period of validity of the formula adopted. In this respect, note that the date 12/99 refers to the last available quarter, that is the third quarter of . In reporting the FFIEC formula, we adopt the convention that the prefix is omitted whenever RCON is used for banks with only domestic offices and RCFD for those that have at least one office abroad. Conversely, when the prefix is specified, only the code with that particular prefix is used. For example, the code is used only in its domestic version, hence we do not use for banks with offices abroad.
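The prefix convention can be made concrete with a small helper. The sketch below is hypothetical: it represents one bank's Call Report as a Python dictionary keyed by full field names (e.g. `RCFD0211`), which is not the format of the FFIEC bulk files, and it only handles formulas that are plain sums of items.

```python
def resolve_code(item, has_foreign_offices):
    """Return the Call Report field read for a Table 1 item.

    Items with an explicit prefix (e.g. 'RCON3531') are used as written;
    bare items (e.g. '0211') take RCON for banks with only domestic offices
    and RCFD for banks with at least one office abroad."""
    if item.startswith(("RCON", "RCFD")):
        return item
    return ("RCFD" if has_foreign_offices else "RCON") + item

def evaluate_sum(formula, report, has_foreign_offices):
    """Evaluate a '+'-separated Table 1 formula on one bank's report.

    `report` maps field names to values; fields absent from the report are
    treated as zero, which is an assumption of this sketch."""
    return sum(report.get(resolve_code(item.strip(), has_foreign_offices), 0.0)
               for item in formula.split("+"))

report = {"RCFD0211": 5.0, "RCFD1287": 2.0, "RCON3531": 1.0}
print(evaluate_sum("0211+1287+RCON3531", report, has_foreign_offices=True))  # -> 8.0
```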
| Asset Class | Variable Name | FFIEC Formula |
|---|---|---|
| Total assets | tot_ass | 03/01-12/99: 2170+2123+3123 |
| Equity | equity | 03/01-03/09: 3210+3000; 03/09-12/99: G105 |
| Cash and balances due from depository institutions | cahab | 03/01-12/99: 0081+0071 |
| U.S. treasury securities | ust_sec | 03/01-12/99: 0211+1287+RCON3531 |
| U.S. agency securities | agency_sec | 03/11-12/99: 1289+1294+1293+1298+RCON3532 |
| Securities issued by state and local governments | state_sec | 03/01-12/99: 8496+8499+RCON3533 |
| Mortgage-backed securities | mbs | 03/01-03/09: 1698+1702+1703+1707+1709+1713+1714+1717+1718+1732+1733+1736+RCON3534+RCON3535+RCON3536; 06/09-12/10: G300+G303+G304+G307+G308+G311+G312+G315+G316+G319+G320+G323+G324+G327+G328+G331+RCONG379+RCONG380+RCONG381+RCONG382; 03/11-12/99: G300+G303+G304+G307+G308+G311+G312+G315+G316+G319+G320+G323+K142+K146+K145+K149+K150+K154+K153+K157+RCONG379+RCONG380+RCONG381+RCONK197+RCONK198 |
| Asset-backed securities | abs | 03/01-12/05: B838+B841+B842+B845+B846+B849+B850+B853+B854+B857+B858+B861; 03/06-03/09: C026+C027; 06/09-12/99: C026+C027+G336+G340+G344+G339+G343+G347 |
| Other domestic debt securities | dom_debt_oth_sec | 03/01-12/99: 1737+1741 |
| Foreign debt securities | for_debt_sec | 03/01-12/99: 1742+1746 |
| Residual securities | res_sec | 03/01-12/99: A511 |
| Federal funds sold and securities purchased under agreements to resell (asset) | ffrepo_ass | 03/01-12/01: 1350; 03/02-12/99: RCONB987+B989 |
| Loans secured by real estate in domestic offices | Construction loans | 03/01-12/07: RCON1415; 03/08-12/99: RCONF158+RCONF159 |
| | Secured by farmland | 03/01-12/99: RCON1420 |
| | 1-4 family real estate | 03/01-12/99: RCON5367+RCON5368+RCON1797 |
| | Multifamily property loans | 03/01-12/99: RCON1460 |
| | Non farm, non residential | 03/01-12/07: RCON1480; 03/08-12/99: RCONF160+RCONF161 |
| Loans secured by real estate in foreign offices | ln_re_for | 03/01-12/99: (if present) RCFD1410 - ln_const - ln_farm - ln_rre - ln_multi - ln_nfnr, (otherwise) zero |
| Commercial and industrial loans in domestic offices | ln_ci_dom | 03/01-12/99: RCON1766 |
| Commercial and industrial loans in foreign offices | ln_ci_for | 03/01-12/99: (if present) RCFD1763+RCFD1764 - RCON1766, (otherwise) zero |
| Loans to consumers in domestic offices | ln_cons_dom | 03/01-12/10: RCON2011+RCONB538+RCONB539; 03/11-12/99: RCONB538+RCONB539+RCONK137+RCONK207 |
| Loans to consumers in foreign offices | ln_cons_for | 03/01-12/10: (if present) RCFD2011+RCFDB538+RCFDB539 - ln_cons_dom, (otherwise) zero; 03/11-12/99: (if present) RCFDB538+RCFDB539+RCFDK137+RCFDK207 - ln_cons_dom, (otherwise) zero |
| Loans to depository institutions and acceptances of other banks | ln_dep_inst_banks | 03/01-12/99: (if present) RCFDB532+RCFDB533+RCFDB534+RCFDB536+RCFDB537, (otherwise) RCON1288 |
| Other loans | oth_loans | 03/01-12/99: 2122+2123 - ln_const - ln_farm - ln_rre - ln_multi - ln_nfnr - ln_re_for - ln_ci_dom - ln_ci_for - ln_cons_dom - ln_cons_for - ln_dep_inst_banks |
| Equity securities that do not have readily determinable fair value | equ_sec_nondet | 03/01-12/99: 1752 |
| Other assets | oth_ass | 03/01-12/99: tot_ass - all preceding assets |
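The entries of Table 1 that contain subtractions and the "(if present)" clause can be sketched as below. Two readings are assumptions of this sketch: "(if present)" is taken to mean that the RCFD field appears in the bank's report, and "all preceding assets" is taken to exclude equity, which is not an asset. Variable names follow Table 1; the dictionary layout is hypothetical.

```python
# Names of the five domestic real-estate sub-classes as they appear in the
# Table 1 formulas (their order follows the table).
RE_DOMESTIC = ("ln_const", "ln_farm", "ln_rre", "ln_multi", "ln_nfnr")

def loans_re_foreign(report, classes):
    """ln_re_for: RCFD1410 minus the domestic real-estate classes if RCFD1410
    is reported, zero otherwise."""
    if "RCFD1410" not in report:
        return 0.0
    return report["RCFD1410"] - sum(classes[name] for name in RE_DOMESTIC)

def other_assets(tot_ass, asset_classes):
    """oth_ass: total assets minus all previously computed asset classes.
    `asset_classes` must contain asset classes only (not equity)."""
    return tot_ass - sum(asset_classes.values())

domestic = {"ln_const": 10.0, "ln_farm": 5.0, "ln_rre": 50.0, "ln_multi": 5.0, "ln_nfnr": 20.0}
print(loans_re_foreign({"RCFD1410": 100.0}, domestic))        # -> 10.0
print(loans_re_foreign({}, domestic))                         # -> 0.0
print(other_assets(200.0, {"cahab": 50.0, "ust_sec": 30.0}))  # -> 120.0
```

Computing the residual classes last guarantees that the asset classes of each bank add up exactly to its total assets.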
Appendix B The Maximum Entropy Approach
In this appendix we provide the details of the derivation of the probability mass functions for the ME-CAPM, BIP-WCM and BIP-ECM ensembles. In what follows we denote by the entropy function
B.1 Maximum Entropy Capital Asset Pricing Model
The maximum entropy distribution is obtained by finding the distribution that solves the constrained maximization
s.t.  
The Lagrangian associated to the problem is written as
where and are Lagrange multipliers. Taking the first derivative we get
whose solution is
where is the function defined as
and is a normalizing factor given by
(11)  
Hence
Note that the partition function in (11) is such that
(12) 
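The property of the partition function invoked here is the standard one for maximum entropy (exponential-family) distributions. In generic notation assumed only for this illustration (observables $O_a(W)$ whose expected values are constrained, with multipliers $\theta_a$), it reads:

```latex
P(W) = \frac{e^{-H(W)}}{Z}, \qquad
H(W) = \sum_a \theta_a\, O_a(W), \qquad
Z = \sum_{W} e^{-H(W)},

-\frac{\partial \ln Z}{\partial \theta_a}
  = \frac{1}{Z} \sum_{W} O_a(W)\, e^{-H(W)}
  = \sum_{W} O_a(W)\, P(W)
  = \langle O_a \rangle .
```

Setting each $\langle O_a \rangle$ equal to its observed value therefore yields one equation per Lagrange multiplier.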
Hence the Lagrange multipliers are determined by
which gives
(13) 
where
B.2 Bipartite Weighted Configuration Model
The maximum entropy distribution is obtained by finding the distribution that solves the constrained maximization
s.t.  
Using one Lagrange multiplier for each quantity indexed by , one Lagrange multiplier for each quantity indexed by and one, , for the constraint on the total probability , the Lagrangian function that has to be maximized is written as (Park and Newman, 2004)
(14)  
whose first derivatives give
whose solution is
(15) 
where is the function defined as
and is a normalizing factor given by
(16) 
We can proceed with the computation by explicitly writing the expressions of and in terms of the elements of the matrix , obtaining
and
where and , whence
(17)  
The values of the Lagrange multipliers are determined by imposing that the expected values of and on the ensemble are equal to, respectively, and . As for the ME-CAPM case, note that the partition function in (16) is such that
and similarly
Hence we can compute and explicitly as functions of the Lagrange multipliers, that is
Therefore the Lagrange multipliers are determined by numerically solving the nonlinear system of equations
(18) 
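A numerical solution of the system (18) can be sketched as follows. Since the equations are not reproduced here, the notation is our own assumption: weights take non-negative integer values, as in the weighted model of Park and Newman (2004), so that each entry of the matrix is geometrically distributed with expected value $x_i y_k/(1 - x_i y_k)$, where $x_i = e^{-\theta_i}$ and $y_k = e^{-\eta_k}$; $A_i$ denotes bank sizes and $C_k$ asset capitalizations. The solver (alternating exact updates of each multiplier by bisection) is our choice and not necessarily the one used by the authors; any root finder for (18) would serve.

```python
import numpy as np

def expected_weights(x, y):
    """<W_ik> = x_i y_k / (1 - x_i y_k) for geometric (integer) weights."""
    p = np.outer(x, y)
    return p / (1.0 - p)

def _solve_one(target, others, max_iter=200):
    """Bisection for v in (0, 1/max(others)) such that
    sum_j v*o_j / (1 - v*o_j) = target; the left side increases from 0 to +inf."""
    lo, hi = 0.0, 1.0 / np.max(others)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        p = mid * others
        if np.sum(p / (1.0 - p)) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo <= 1e-15 * hi:
            break
    return 0.5 * (lo + hi)

def solve_bip_wcm(A, C, max_sweeps=10_000, tol=1e-10):
    """Find x, y such that the row sums of <W> equal the bank sizes A
    and the column sums equal the asset capitalizations C."""
    A, C = np.asarray(A, dtype=float), np.asarray(C, dtype=float)
    if not np.isclose(A.sum(), C.sum()):
        raise ValueError("sum of A must equal sum of C")
    x, y = np.full(A.size, 0.5), np.full(C.size, 0.5)
    for _ in range(max_sweeps):
        # Exact update of each bank multiplier given the asset multipliers...
        for i in range(A.size):
            x[i] = _solve_one(A[i], y)
        # ...then of each asset multiplier given the bank multipliers.
        for k in range(C.size):
            y[k] = _solve_one(C[k], x)
        W = expected_weights(x, y)
        # Column sums hold after the y step, so only row sums are checked.
        if np.abs(W.sum(axis=1) - A).max() < tol:
            break
    return x, y

x, y = solve_bip_wcm([4.0, 2.0], [3.0, 3.0])
print(np.round(expected_weights(x, y), 6))  # expected matrix close to [[2, 2], [1, 1]]
```

Each update keeps $x_i y_k < 1$ by construction, so the geometric distributions stay well defined throughout the iteration.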
B.3 Bipartite Enhanced Configuration Model
The only difference from the Weighted model described in Appendix B.2 is the addition of constraints on the degree of each node. Before proceeding, we thus need some additional definitions.
The binary projection of is defined as the matrix . Accordingly, the number of assets in which the th bank invests and the number of banks that own the th asset class are computed as
(19) 
where the capital letter stands for degree, as is common practice in network theory.
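The binary projection and the degrees just defined correspond to the following operations on a bank-by-asset matrix. The function names are illustrative, and a strictly positive holding is assumed to define a link.

```python
import numpy as np

def binary_projection(W):
    """a_ik = 1 if bank i holds a positive amount of asset class k, else 0."""
    return (np.asarray(W) > 0).astype(int)

def degrees(W):
    """Bank degrees (asset classes held by each bank) and asset degrees
    (banks holding each asset class): row and column sums of the projection."""
    a = binary_projection(W)
    return a.sum(axis=1), a.sum(axis=0)

W = np.array([[5.0, 0.0, 2.0],
              [0.0, 0.0, 3.0]])
bank_deg, asset_deg = degrees(W)
print(bank_deg, asset_deg)  # -> [2 1] [1 0 2]
```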
The maximization problem for the BIP-ECM case is thus stated as
s.t.  
The Lagrangian function is written accordingly as
with an obvious extension of the number of Lagrange multipliers with respect to the Lagrangian in (14). The first-order condition now reads
Since (similarly, ), we obtain, as a solution of the first-order condition, an expression for the probability mass function identical to that obtained in (15) for the Weighted Model, that is
(20) 
with the caveat that now the function and the normalizing factor are given, respectively, by
(21) 
and