
A Regression Discontinuity Design for Ordinal Running Variables: Evaluating Central Bank Purchases of Corporate Bonds

Fan Li, Andrea Mercatanti, Taneli Mäkinen, Andrea Silvestrini


We propose a regression discontinuity design which can be employed when assignment to treatment is determined by an ordinal variable. The proposal first requires estimating an ordered probit model for the ordinal running variable. The estimated probability of being assigned to treatment is then adopted as a latent continuous running variable and used to identify a covariate-balanced subsample around the threshold. Assuming local unconfoundedness of the treatment in the subsample, an estimate of the effect of the program is obtained by employing a weighted estimator of the average treatment effect. Three types of balancing weights—overlap weights, inverse probability weights and ATT weights—are considered. An empirical M-estimator for the variance of the weighting estimator is derived. We apply the method to evaluate the causal effect of the Corporate Sector Purchase Programme of the European Central Bank on bond spreads.

Key words: asset purchase programs, balance, local unconfoundedness, ordered probit, regression discontinuity design, weighting

1 Introduction

Regression discontinuity (RD) design is a quasi-experimental design for causal inference. In the conventional sharp RD setting, the treatment status is a deterministic step function of a pretreatment variable, commonly referred to as the running variable. All units with a realized value of the running variable on one side of a pre-fixed threshold are assigned to one regime and all units on the other side are assigned to the other regime. The basic idea of RD is that one can compare units with similar values of the running variable, but different levels of treatment, to draw causal inference of the treatment at or around the threshold. First introduced in 1960 (Thistlethwaite and Campbell, 1960), RD has become increasingly popular since the late 1990s in economics and policy, with many influential applications (e.g., Angrist and Krueger, 1991; Imbens and van der Klaauw, 1995; Angrist and Lavy, 1999; Lee, 2001; van der Klaauw, 2002, among others).

In the standard RD setting, the running variable is continuous; one usually assumes continuity (namely, potential outcomes are continuous functions of the running variable at the threshold), and then employs local linear regressions or polynomials to extrapolate the counterfactual potential outcome under the opposite treatment status and estimate the causal effects at the threshold (Hahn et al., 2001; Imbens and Lemieux, 2008). A recent strand of research instead contends that RD designs lead to locally randomized experiments around the threshold (Lee, 2008; Lee and Lemieux, 2010). Building on this interpretation, several recent works provide formal identification conditions and inferential strategies to estimate the causal effects (e.g., Cattaneo et al., 2015; Li et al., 2015).

RD methods have been mostly developed in the context of continuous running variables. However, in many empirical applications, assignment to treatment is determined by covariates which are inherently discrete or only take on a limited number of values (Lee and Card, 2008; Kolesár and Rothe, 2017). A categorical running variable poses challenges for RD estimation for two reasons. First, RD estimation involves measuring the distance of each unit to the threshold. When the running variable is categorical, ordered or not, values of the running variable provide little information on the distance to the threshold. Consequently, one can no longer compare outcomes within arbitrarily small neighborhoods of the threshold to identify the causal effects, and thus has to account for the uncertainty about the relationship between the running variable and the outcomes (Lee and Card, 2008). Second, if the number of categories is small, even considering only units in the two categories bordering the threshold may lead to misleading results, particularly when the units within the two categories differ considerably from each other. Indeed, existing literature provides limited insights on how to apply RD techniques in such settings. Lee and Card (2008) assume a parametric functional form relating the outcome to the running variable and account for the uncertainty in the choice of this functional form. Dong (2015) considers a setting in which the running variable is discrete due to rounding; the author shows that in this case standard RD estimation leads to biased estimates of the average treatment effect and provides formulas to correct for this discretization bias.

Two recent works shed further light on the issues related to discrete running variables. Kolesár and Rothe (2017) show that the confidence intervals proposed by Lee and Card (2008) have poor coverage properties and suggest calculating alternative confidence intervals under suitable restrictions on the functional form of the relationship between the outcome and the running variable. Imbens and Wager (2018) propose a general optimization-based approach that minimizes the worst-case conditional mean-squared error among all linear estimators, which is applicable to both continuous and discrete running variables. However, these recent advances are not directly applicable to problems in which the running variable is ordinal, rather than discretized from an underlying continuous variable.

In this paper we develop a framework for conducting RD inference when the running variable is ordered categorical. Our methodological innovation is motivated by our interest in the evaluation of the European Central Bank’s Corporate Sector Purchase Programme (CSPP), which illustrates the challenges posed by ordinal running variables in an RD context. The CSPP entails the acquisition of corporate bonds, with the aim of strengthening the pass-through of unconventional monetary policy measures to the financing conditions of the real economy. Under the CSPP, the Eurosystem purchases investment-grade bonds issued by non-bank corporations. Cast into an RD framework, the assignment to treatment in the CSPP is determined by an ordinal running variable, the rating of the bond. Specifically, only bonds with an investment-grade rating (i.e., BBB- or above) receive the treatment, taking the form of being eligible for purchase by the Eurosystem.

The rating of a bond is determined by the financial strength of its issuer and bond-specific characteristics. This observation motivates us to develop a new approach in which we quantify the distance of each unit to the threshold in terms of a continuous latent variable which determines the assignment of each unit to a category. That is, we use supplementary pretreatment information to estimate a latent continuous running variable. Specifically, we adopt the local randomization perspective to RD design and propose a three-step procedure with several new features. First, we postulate an ordered probit model for the ordinal running variable, i.e., the bond rating, employing as predictors issuer and bond characteristics, and take the estimated probability of being assigned an investment-grade rating as the surrogate continuous running variable. Second, based on the estimated probability, we identify a subset of units in which the covariates in the treatment and control groups are similar. Third, within such a subset, we invoke a local unconfoundedness assumption and use the estimated probability to construct a weighted sample to estimate the causal effect of the treatment. The weighted sample represents a population of interest, namely units which could conceivably have been assigned to either treatment status. This is the population which we consider to be close to the threshold. This strategy is similar to propensity score weighting in causal inference (Hahn, 1998; Hirano and Imbens, 2001; Hirano et al., 2003; Li et al., 2018), which, to our knowledge, has not been previously discussed in the RD literature. We also derive an M-estimator for the variance of the causal effect that incorporates the uncertainty arising from both the design and analysis stages.

2 Methods

2.1 Basic setup and assumptions

We proceed under the potential outcomes framework to causal inference (Rubin, 1974; Imbens and Rubin, 2015). Consider a sample of $N$ units indexed by $i = 1, \ldots, N$ drawn from a super-population $\mathcal{P}$. Let $R_i$ be the ordinal running variable with categories $r_j$, $j = 1, \ldots, J$, and $r_j < r_{j+a}$ for any integer $a \geq 1$. Based on $R_i$, a binary treatment $Z_i$ is assigned according to an RD rule: if a unit has a value of $R_i$ falling above (or below, depending on the specific application) a pre-specified threshold, $r_t$, then that unit is assigned to treatment; otherwise, that unit is assigned to control. That is, the treatment status is given by $Z_i = \mathbf{1}(R_i \geq r_t)$, where $\mathbf{1}(\cdot)$ is the indicator function. For each unit, besides the running variable, a set of pretreatment covariates $X_i$ is also observed. Each unit has a potential outcome $Y_i(z)$ corresponding to each treatment level $z \in \{0, 1\}$, and only the one corresponding to the observed treatment status, $Y_i = Y_i(Z_i)$, is observed. Define the propensity score $e(X_i)$ as the probability of unit $i$ receiving the treatment conditional on the covariates: $e(X_i) = \Pr(Z_i = 1 \mid X_i) = \Pr(R_i \geq r_t \mid X_i)$.

For valid causal inference, we focus on the subpopulations whose units all have non-zero probability of being assigned to either treatment condition. Formally, we make the assumption of local overlap.

Assumption 1 (Local overlap)

There exists a subpopulation $\mathcal{U} \subset \mathcal{P}$ such that, for each $i$ in $\mathcal{U}$, we have $0 < e(X_i) < 1$.

Within the subpopulation $\mathcal{U}$, we further make the following two assumptions.

Assumption 2 (Local SUTVA)

For each unit $i$ in $\mathcal{U}$, consider two realizations of the running variable, $r'$ and $r''$, with possibly $r' \neq r''$, and let $Z_i' = \mathbf{1}(r' \geq r_t)$ and $Z_i'' = \mathbf{1}(r'' \geq r_t)$. If $Z_i' = Z_i''$, that is, if either $r' \geq r_t$ and $r'' \geq r_t$, or $r' < r_t$ and $r'' < r_t$, then $Y_i(Z_i') = Y_i(Z_i'')$, irrespective of the realized value of the running variable $R_k$ for any other unit $k \neq i$ in $\mathcal{U}$.

Assumption 3 (Local unconfoundedness)

For each unit $i$ in $\mathcal{U}$, the treatment assignment is unconfounded given $X_i$:

$$\Pr(Z_i \mid Y_i(1), Y_i(0), X_i) = \Pr(Z_i \mid X_i).$$

Local SUTVA implies (i) the absence of interference between units, and (ii) independence of the potential outcome from the running variable given the treatment status for the same unit. Local unconfoundedness forms the basis for causal inference under RD: it entails the existence of a subpopulation around the threshold for which the assignment to treatment is unconfounded given the observed pretreatment variables. We will elaborate on the selection of this subpopulation in Section 2.2.3. Local unconfoundedness extends the local randomization assumption in Lee and Card (2008), and is similar to the bounded conditional independence assumption in Angrist and Rokkanen (2012). As explained by Lee and Card (2008, p. 655), the local randomization assumption means that “it may be plausible to think that treatment status is ‘as good as randomly assigned’ among the subsample of observations that fall just above and just below the threshold.” For instance, suppose that a policy grants a sum of money to households when their income (i.e., a continuous running variable) is below a given threshold. Local randomization means that, for each household with an income level inside a small window around the threshold, the observed income is assumed to be governed by chance. Given that the probability of entering this window depends on household characteristics (e.g., households with high levels of education and wealth are plausibly much less likely to have an income in an interval around the threshold than households with low levels of education and wealth) and given the continuity of the running variable, the local randomization assumption implies that, for each unit in the window, the probability of observing an income above the threshold is 0.5 in the limit.
Our local unconfoundedness assumption extends the local randomization assumption by allowing the probability to be assigned to the treatment to depart from 0.5 and to depend on the pretreatment variables (education and wealth in this example). Therefore, relaxation of the local randomization hypothesis allows us to enlarge the subsample of units around the threshold for which randomization can be assumed to hold, given that it lets units for which the probability progressively departs from 0.5 enter the subsample. However, our unconfoundedness assumption is still “local” in nature: indeed, we maintain the RD hypothesis according to which, for each unit in the population, treatment assignment also depends on the unobserved units’ characteristics. As a result, the interval around the threshold for which the unconfoundedness hypothesis holds has to be bounded.

2.2 Design and analysis

The key to our proposal is to treat the probability $e(X_i)$ as a latent continuous running variable instead of using the observed category $R_i$ as an ordinal running variable. Under this perspective, we will define the causal estimand as a weighted average treatment effect on a subpopulation of particular policy interest, namely, the overlap population.

Our estimation strategy consists of three steps. First, we fit an ordered probit model for the ordinal running variable conditional on the observed covariates and take the estimated probability of being assigned to treatment as the latent continuous running variable. Second, based on the estimated probability, we identify a subset of units in which the local unconfoundedness assumption is plausible by checking the covariate balance. Third, within this subpopulation, we estimate the average treatment effect for the target population.

2.2.1 Probit model for the ordered running variable

We postulate an ordered probit model for the distribution of the ordered running variable, $R_i$, and consequently for the propensity score $e(X_i)$. Specifically, we assume that each unit’s observed category is determined by a latent normally distributed variable $R_i^*$ as follows:

$$R_i^* = X_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1), \tag{1}$$

$$R_i = r_j \quad \text{if} \quad c_{j-1} < R_i^* \leq c_j, \qquad j = 1, \ldots, J, \tag{2}$$

where $c_0 < c_1 < \cdots < c_J$ is a series of cutoff points, with $c_0 = -\infty$ and $c_J = \infty$. That is, $R_i$ falls in category $r_j$ when the latent variable $R_i^*$ falls in the interval between $c_{j-1}$ and $c_j$. A probit model for $R_i$ is plausible in contexts where the category classifies units by ordered levels of “quality”, which can be for example the grade a student achieves in a subject, or in our case the credit quality of a bond. In these examples, the quality of a unit is supposed to be a continuous variable (e.g., the student’s level of knowledge in a subject, or the issuer’s capacity to honor its debts) that we cannot observe, but for which we can observe the interval where it falls. Based on the ordered probit model (1)–(2), we have

$$e(X_i) = \Pr(R_i \geq r_t \mid X_i) = \Pr(R_i^* > c_{t-1} \mid X_i) = \Phi(X_i'\beta - c_{t-1}),$$

where $\Phi$ denotes the standard normal CDF.

The ordered probit model belongs to the class of generalized linear models suitable for ordinal responses. The link function is the inverse of the normal CDF, which implies that the probability of response is a monotonic function of the linear transformation $X_i'\beta$ (Agresti, 2013), namely, for any two units $i$ and $k$ and any category $r_j$, $\Pr(R_i \geq r_j \mid X_i) > \Pr(R_k \geq r_j \mid X_k)$ when $X_i'\beta > X_k'\beta$. Given the deterministic relationship $Z_i = \mathbf{1}(R_i \geq r_t)$, the monotonicity also holds for the propensity score $e(X_i)$. Therefore, we expect the estimated propensity scores, $\hat{e}(X_i)$, to be close to 1 for units for which we observe high values of $R_i$, while being close to 0 for units for which we observe low values of $R_i$.
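The mapping from the linear index to category probabilities and to the propensity score can be illustrated with a minimal sketch. The function names, the toy cutoffs and the choice of the third category as the threshold are assumptions made here for illustration; they are not estimates or code from the paper.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_probs(xb, cutoffs):
    """Probabilities of the J ordered categories given the linear index xb.

    `cutoffs` holds the J - 1 finite cutoff points in increasing order;
    the two outer cutoffs are taken to be -inf and +inf.
    """
    bounds = [-math.inf] + list(cutoffs) + [math.inf]
    return [norm_cdf(bounds[j] - xb) - norm_cdf(bounds[j - 1] - xb)
            for j in range(1, len(bounds))]

def propensity(xb, cutoffs, t):
    """P(category >= t | xb); t is the 1-based threshold category (t >= 2)."""
    return 1.0 - norm_cdf(cutoffs[t - 2] - xb)

# Toy values (hypothetical): four categories, threshold at the third one.
toy_cutoffs = [-1.0, 0.0, 1.5]
toy_probs = category_probs(0.3, toy_cutoffs)
toy_e = propensity(0.3, toy_cutoffs, 3)
```

In this sketch the propensity score equals the sum of the probabilities of the categories at or above the threshold, and it increases with the linear index, which is the monotonicity property discussed above.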

Moreover, given the monotonicity of $e(X_i)$ in $X_i'\beta$, and provided that $X_i'\hat{\beta}$ is a good predictor of the ordinal responses, we expect the average $\hat{e}(X_i)$ to be below 0.5 for units whose value of $R_i$ is just below the threshold $r_t$, i.e., $R_i = r_{t-1}$, and above or equal to 0.5 for units whose value of $R_i$ is at the threshold $r_t$, i.e., $R_i = r_t$. Therefore, values of the propensity score around 0.5 pertain to units which fall in categories around the threshold. These units form a target population of policy interest because they can be assigned with non-negligible probability to either treatment condition and are therefore the most affected by even small changes in the policy. This target population can be formally defined using the concept of “overlap weights”, as described in Section 2.2.2.

In practice, a well-specified ordered probit model would produce in-sample predictions of $e(X_i)$ that satisfy the above patterns, which can be verified by inspecting the box plots of the estimated $\hat{e}(X_i)$ in each category of the observed running variable.
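A numerical counterpart to inspecting box plots is to summarize the estimated scores by category and check that the summaries rise with the category. The sketch below uses medians; the helper names and the toy ratings and scores are made up for illustration and are not data from the paper.

```python
from collections import defaultdict
from statistics import median

def median_score_by_category(categories, scores, order):
    """Median estimated score per category, listed in the given category order."""
    groups = defaultdict(list)
    for cat, e in zip(categories, scores):
        groups[cat].append(e)
    return [(cat, median(groups[cat])) for cat in order if cat in groups]

def is_nondecreasing(values):
    """True when each value is at least as large as the previous one."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Toy data (hypothetical), ordered from lowest to highest category.
toy_order = ["BB", "BB+", "BBB-", "BBB"]
toy_cats = ["BB", "BB", "BB+", "BB+", "BBB-", "BBB-", "BBB"]
toy_scores = [0.05, 0.15, 0.30, 0.45, 0.55, 0.80, 0.95]
toy_medians = median_score_by_category(toy_cats, toy_scores, toy_order)
```

With these toy numbers the medians increase across categories and cross 0.5 between the category just below the threshold and the threshold category, which is the pattern one would hope to see.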

2.2.2 Causal estimands

Within a subpopulation $\mathcal{U}$ where Assumptions 1–3 hold, we can define a class of average treatment effect estimands, each corresponding to a different target population. To formalize, we assume that the marginal distribution of the pretreatment variables in $\mathcal{U}$, $f(x)$, exists. Denote the density of the pretreatment variables in the entire, treated and control population in $\mathcal{U}$ by $f(x)$, $f_1(x)$ and $f_0(x)$, respectively. Representing the target population density by $f(x)h(x)$, where $h(x)$ is a pre-specified function of $x$, we can define a general class of weighted average treatment effect (WATE) estimands (Hirano et al., 2003):

$$\tau_h = \frac{\int \tau(x) f(x) h(x)\, dx}{\int f(x) h(x)\, dx}, \tag{3}$$

where $\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x]$ is the conditional average treatment effect.
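It is easy to show that for any $h(x)$,

$$f_1(x)\, w_1(x) \propto f_0(x)\, w_0(x) \propto f(x) h(x), \qquad w_1(x) = \frac{h(x)}{e(x)}, \quad w_0(x) = \frac{h(x)}{1 - e(x)}.$$

This implies that applying the balancing weights ($w_1(x)$ for the treated units and $w_0(x)$ for the controls) balances the distribution of the pretreatment variables between the treatment groups, and thus enables inferring the causal effect defined on the target population $f(x)h(x)$. A consistent moment estimator of $\tau_h$ is the sample difference in the weighted average outcomes between treatment groups:

$$\hat{\tau}_h = \frac{\sum_i w_1(X_i) Z_i Y_i}{\sum_i w_1(X_i) Z_i} - \frac{\sum_i w_0(X_i)(1 - Z_i) Y_i}{\sum_i w_0(X_i)(1 - Z_i)}. \tag{4}$$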


Among the general class of balancing weights, of particular relevance to our application are the overlap weights (Li et al., 2018), $(w_1(x), w_0(x)) = (1 - e(x), e(x))$, corresponding to $h(x) = e(x)(1 - e(x))$, the maximum of which is attained at $e(x) = 0.5$. This defines a target population whose pretreatment characteristics could appear with substantial probability in either treatment group, i.e., with the most overlap in covariates. The corresponding causal estimand is called the average treatment effect for the overlap population (ATO). Arguably, the overlap population consists of the units whose treatment assignment might be most responsive to a policy shift as new information is obtained. In our RD framework, this overlap population is exactly the subpopulation around the threshold: with overlap weights, the units are smoothly downweighted as their latent running variable moves away from the threshold, i.e., from $e(x) = 0.5$. Crump et al. (2006) and Li et al. (2018) show that the overlap weights lead to the minimal asymptotic variance of $\hat{\tau}_h$ among all functions $h(x)$ under mild regularity conditions.

Two other estimands relevant to our application are the average treatment effect (ATE) and the average treatment effect for the treated (ATT). The ATE corresponds to $h(x) = 1$ and the balancing weights $(w_1(x), w_0(x)) = (1/e(x),\, 1/(1 - e(x)))$, while for the ATT $h(x) = e(x)$ and $(w_1(x), w_0(x)) = (1,\, e(x)/(1 - e(x)))$. Though the ATE and ATT do not have a natural connection to the ordinal RD setting here, they are estimands of common interest in the economics literature and we will compare them with the ATO in our empirical application.
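The three sets of balancing weights and the weighted difference in mean outcomes can be sketched as follows. The function names and the toy data are hypothetical; the code is only meant to show how the weights enter the estimator, assuming propensity scores strictly between 0 and 1 have already been estimated.

```python
def balancing_weight(e, z, kind="ato"):
    """Weight of one unit with propensity score e and treatment indicator z."""
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if kind == "ato":   # overlap weights: treated get 1 - e, controls get e
        return 1.0 - e if z == 1 else e
    if kind == "ate":   # inverse probability weights
        return 1.0 / e if z == 1 else 1.0 / (1.0 - e)
    if kind == "att":   # treated keep weight 1, controls get the odds e / (1 - e)
        return 1.0 if z == 1 else e / (1.0 - e)
    raise ValueError(f"unknown weight type: {kind}")

def weighted_effect(y, z, e, kind="ato"):
    """Difference between the weighted mean outcomes of treated and control units."""
    w = [balancing_weight(ei, zi, kind) for ei, zi in zip(e, z)]

    def wmean(group):
        num = sum(wi * yi for wi, yi, zi in zip(w, y, z) if zi == group)
        den = sum(wi for wi, zi in zip(w, z) if zi == group)
        return num / den

    return wmean(1) - wmean(0)

# Toy data (hypothetical) with a constant effect of 2 built in.
toy_z = [1, 1, 0, 0]
toy_e = [0.7, 0.5, 0.4, 0.2]
toy_y = [3.0, 3.0, 1.0, 1.0]
```

Because the toy outcomes differ by exactly 2 between groups, all three weighting schemes return 2 here; with heterogeneous effects they would generally differ, since each scheme targets a different population.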

2.2.3 Select the subpopulation

An important issue in practice is how to select the subpopulation where Assumption 3 holds. There can be many choices of the shape of the subpopulation. Following the convention in the literature, we first focus on the symmetric intervals around the threshold: $\mathcal{U}_\delta = \{i : \hat{e}(X_i) \in (0.5 - \delta,\, 0.5 + \delta)\}$. To select the bandwidth $\delta$, we adopt the idea of balancing tests (Cattaneo et al., 2015; Cattaneo and Vazquez-Bare, 2016). Specifically, given the “local” nature of Assumption 3, we expect the pretreatment covariates to be balanced between treatment groups close to the threshold, but the balance will break down when moving away from the threshold. Therefore, starting from a small $\delta$, we check the covariate balance of units in the interval $(0.5 - \delta,\, 0.5 + \delta)$ and gradually increase $\delta$ until significant imbalance is detected. The “optimal” bandwidth will be set to the maximum $\delta$ such that the covariates are balanced. We also consider subpopulations defined by asymmetric intervals. This allows us to find covariate-balanced subsamples with a larger number of units. As a result, asymmetric intervals allow us to increase the external validity of our findings.
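The widening search over symmetric windows can be sketched as a simple loop. Everything named below is hypothetical: `select_bandwidth`, the candidate grid, and in particular `toy_balance_ok`, which uses a crude tolerance on the difference in covariate means as a stand-in for a proper balancing test.

```python
def select_bandwidth(scores, balance_ok, grid):
    """Largest bandwidth in `grid` reached before the first detected imbalance.

    `scores` are estimated propensity scores; `balance_ok` takes the list of
    unit indices inside the window and returns True when covariates look balanced.
    """
    best = None
    for delta in sorted(grid):
        window = [i for i, e in enumerate(scores) if abs(e - 0.5) < delta]
        if not window:
            continue   # no unit is this close to the threshold yet
        if not balance_ok(window):
            break      # first detected imbalance: stop widening
        best = delta
    return best

# Toy data (hypothetical): one covariate, six units.
toy_scores = [0.45, 0.55, 0.40, 0.60, 0.20, 0.80]
toy_z = [0, 1, 0, 1, 0, 1]
toy_x = [1.0, 1.1, 0.9, 1.2, 0.0, 3.0]

def toy_balance_ok(window, tol=0.5):
    """Stand-in balance check: absolute difference in covariate means below tol."""
    treated = [toy_x[i] for i in window if toy_z[i] == 1]
    control = [toy_x[i] for i in window if toy_z[i] == 0]
    if not treated or not control:
        return False
    diff = sum(treated) / len(treated) - sum(control) / len(control)
    return abs(diff) < tol

toy_delta = select_bandwidth(toy_scores, toy_balance_ok, [0.06, 0.11, 0.35])
```

In the toy example the two widest-apart units break the balance, so the search stops at the middle bandwidth. An asymmetric variant would track separate lower and upper limits instead of a single half-width.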

2.2.4 Variance estimation

We derive an M-estimation-based sandwich variance estimator (Huber, 1964; van der Vaart, 1998; Stefanski and Boos, 2002) of the moment estimator (4), which accounts for the uncertainty in estimating the propensities from the ordered probit model (2) (an outline of the derivation is given in the Appendix). Denote the parameter vector in the ordered probit model as $\theta$. The empirical M-estimation variance of $\hat{\tau}_h$ with the overlap weights calculated from the ordered probit model combines the estimating function of (4) with three quantities from the probit fit: the information matrix of $\theta$, the individual contribution to the gradient of the log-likelihood function, and the gradient of the propensity score with respect to $\theta$.
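As a rough sketch of how a sandwich estimator of this kind can be assembled for the overlap weights, one can stack the estimating equations for the two weighted means with the score equations of the ordered probit model and apply standard M-estimation arguments. The symbols $\tau_1$, $\tau_0$, $\psi_i$, $H_\theta$, $\mathcal{I}_{\theta\theta}$ and $S_{\theta,i}$ below are notation assumed here for illustration; the exact expression and sign conventions in the Appendix may differ.

```latex
\begin{align*}
&\sum_{i} Z_i (Y_i - \tau_1)(1 - e_i) = 0, \qquad
 \sum_{i} (1 - Z_i)(Y_i - \tau_0)\, e_i = 0, \qquad
 \sum_{i} S_{\theta,i} = 0, \\[4pt]
&\hat{\tau} = \hat{\tau}_1 - \hat{\tau}_0, \qquad
 \widehat{\mathrm{Var}}(\hat{\tau})
   = \frac{\sum_{i} \hat{\psi}_i^{\,2}}
          {\bigl\{\sum_{i} \hat{e}_i (1 - \hat{e}_i)\bigr\}^{2}}, \\[4pt]
&\hat{\psi}_i = Z_i (Y_i - \hat{\tau}_1)(1 - \hat{e}_i)
   - (1 - Z_i)(Y_i - \hat{\tau}_0)\, \hat{e}_i
   - \hat{H}_\theta^{\top} \hat{\mathcal{I}}_{\theta\theta}^{-1} \hat{S}_{\theta,i}, \\[4pt]
&\hat{H}_\theta = \frac{1}{N} \sum_{i}
   \bigl\{ Z_i (Y_i - \hat{\tau}_1) + (1 - Z_i)(Y_i - \hat{\tau}_0) \bigr\}
   \frac{\partial \hat{e}_i}{\partial \theta}.
\end{align*}
```

Here $e_i$ is the propensity score of unit $i$, $S_{\theta,i}$ is its contribution to the gradient of the log-likelihood, and $\hat{\mathcal{I}}_{\theta\theta}$ is the average (per-unit) information matrix. The last term of $\hat{\psi}_i$ is the correction for having estimated the propensity scores; dropping it would treat the scores as known.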

3 The Corporate Sector Purchase Programme

On March 10, 2016, the European Central Bank (ECB) announced the Corporate Sector Purchase Programme (CSPP). The CSPP consists of purchases of investment-grade corporate bonds issued by euro-area non-bank corporations, and is a part of the ECB’s expanded Asset Purchase Programme (APP). The purchases can occur both in the primary and the secondary market. In order to be eligible for purchase under the CSPP, debt instruments must satisfy the following conditions: (i) have a remaining maturity between 6 months and 31 years at the time of purchase; (ii) be denominated in euro; (iii) have a first-best credit assessment of at least BBB- or equivalent (i.e., investment-grade) obtained from an external and independent credit assessment institution; (iv) provide a yield to maturity, which can also be negative, above the deposit facility rate.

In addition, the bond issuer has to comply with the following requirements: (i) is a corporation established in the euro area; (ii) is not a credit institution supervised under the Single Supervisory Mechanism; (iii) does not have a parent undertaking that is also a credit institution; (iv) is not an investment firm, an asset management vehicle or a national asset management fund created in order to support financial sector restructuring; (v) has not issued an asset-backed security, a ‘multi cedula’ or a structured covered bond; (vi) must not have a parent company which is under banking supervision inside or outside the euro area, and must not be a subsidiary of a supervised entity or a supervised group; (vii) is not an eligible issuer for the Public Sector Purchase Programme (PSPP).

There are a few earlier works analyzing the CSPP. Zaghini (2019) assesses the effects of the program in the context of the primary bond market, by controlling for many possible determinants of bond spreads. Looking instead at the secondary market, Abidi and Flores (2017) employ differences in credit rating standards between investors and the ECB to shed light on the market’s reaction to the announcement of the program. We complement these works by providing estimates of the effect of the program which rely on a formal statistical framework of causal inference.

4 Empirical application

We employ the methods proposed in Section 2 to evaluate the effects of the CSPP on bond spreads in the primary market. More specifically, we assess how the eligibility for purchase under the CSPP affects bond spreads at the time of their issuance. We define the treatment as the eligibility for purchase rather than the actual purchase of the bond for the following reasons. First, purchases under the CSPP are not pre-announced, making it impossible for market participants to react to them. Second, given that most eligible bonds issued after the program was announced have been purchased by the Eurosystem, market participants are likely to take the eligibility for purchase into consideration when pricing a bond at its issuance. Indeed, of the 346 eligible bonds that we ultimately use in our analysis, more than 85 per cent had been purchased by the Eurosystem as of the time of writing (January 26, 2018). Finally, due to the relatively low liquidity of the secondary bond market, the effect of the actual purchase can be expected to be highly bond-specific and potentially only short-lived. Any permanent effect of the program on spreads of eligible bonds is, instead, likely to be largely observed already at issuance. Defining the treatment in this manner implies that its effect can be evaluated using a sharp RD design.

Having defined the treatment as the eligibility for purchase, we classify all bonds whose highest rating is equal to or greater than BBB- as treated units and the remaining bonds as control units. It should be pointed out that this does not imply that the treatment is equivalent to being assigned an investment-grade rating. This is due to the fact that market participants employ either the average or the minimum rating to identify investment-grade bonds (Abidi and Flores, 2017). Therefore, the threshold employed by market participants is above that defining eligibility for purchase under the CSPP.
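The classification rule based on the highest rating can be sketched as follows. The rating scale below is an assumed S&P-style ordering, and the sketch assumes that ratings from all agencies have already been mapped to that common scale; the names `SCALE`, `first_best` and `is_eligible` are hypothetical.

```python
# Assumed common rating scale, ordered from lowest to highest credit quality.
SCALE = ["D", "C", "CC", "CCC-", "CCC", "CCC+", "B-", "B", "B+",
         "BB-", "BB", "BB+", "BBB-", "BBB", "BBB+",
         "A-", "A", "A+", "AA-", "AA", "AA+", "AAA"]
RANK = {rating: k for k, rating in enumerate(SCALE)}
THRESHOLD = "BBB-"

def first_best(ratings):
    """Highest available rating across agencies; None entries mean 'not rated'."""
    rated = [r for r in ratings if r is not None]
    return max(rated, key=RANK.__getitem__) if rated else None

def is_eligible(ratings):
    """Treated if the highest rating is at or above the threshold."""
    best = first_best(ratings)
    return best is not None and RANK[best] >= RANK[THRESHOLD]

# A split-rated bond (hypothetical): eligible by its best rating,
# although its lowest rating is below investment grade.
split_rated = ["BB+", "BBB-", None]
```

The split-rated example shows why the treatment differs from an investment-grade label based on the average or minimum rating: the bond is eligible on its best rating while its lowest rating is still high-yield.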

4.1 Data

We employ two sources of proprietary data. First, we obtained from Bloomberg all the corporate bonds satisfying the eligibility criteria of the program with the exception of that pertaining to ratings, and issued between March 10, 2016 and September 30, 2017. The choice of the start date is motivated by the fact that already when the program was announced it became known that only investment-grade bonds would be eligible for purchase. A total of 899 such bonds were found. We consider bonds issued after the program was announced as we wish to focus on the primary market. This is motivated by the relatively low liquidity of the secondary corporate bond markets in Europe (Biais et al., 2006; Gündüz et al., 2017), rendering secondary market quotes noisy indicators of going prices. Primary market prices, on the contrary, provide accurate information about the market valuation of bonds at the time of their issuance.

For each bond, we obtained from Bloomberg the following information: International Securities Identification Number (ISIN), coupon rate (cpn), maturity type, issue date, original maturity (mat), amount sold, coupon type, rating at issuance by Standard & Poor’s, Moody’s, Fitch and DBRS along with its option-adjusted spread. Maturity type refers to any embedded options the bond contains (callable, putable, convertible) or it being a bullet bond (at maturity). Coupon type is one of the following: fixed, zero-coupon, pay-in-kind or variable. The option-adjusted spread (OAS) compares the yield to maturity of the bond to the yield to maturity of a government bond with a similar maturity, and further accounts for any embedded option features of the bond. For the OAS, the first available value between the issue date and the subsequent eight days was employed. We also obtained from Bloomberg the country of incorporation and the industry (as given by the Bloomberg Industry Classification System) of the issuer of each bond. Due to the difficulty of comparing bonds with variable coupon rates to fixed rate bonds, we excluded the former (6 bonds) from the analysis.

We illustrate, in Figure 1, how the option-adjusted spreads vary across bonds issued during the program with different ratings (right in each pair). For the sake of comparison, the distributions of the OAS are presented also for bonds issued before the announcement of the CSPP, between March 13, 2014 and March 9, 2016. This time frame was chosen to obtain a similar number of bonds as in the program data. For all rating categories, apart from the highest two, the option-adjusted spreads were lower during the program than before it. A particularly notable difference is observed for the lowest investment-grade category, BBB-.

Figure 1: OAS by rating, before (left in each pair) and during the program.

The second source of data that we employ is S&P Capital IQ, from which we obtained balance sheet (BS) and income statement (IS) data for the bond issuers. More specifically, we first identified the ultimate parent company of each subsidiary issuer. Then, for these ultimate parents and the issuers with no parent companies, we obtained the following BS and IS items for the fiscal year 2015: earnings before interest and taxes (EBIT), total revenue, cash from operations, total assets, total liabilities, interest expenses, total debt, common equity and long-term debt. In addition, we recorded the year in which the company had been founded. When no data existed for the ultimate parent company, for instance due to it being a private company, we obtained data for the parent on the highest level in the corporate structure for which data was available. From the recorded data, we constructed the following variables: profitability (prof), cash flow (cf), liquidity (liq), interest coverage (cov), leverage (lev), solvency (solv), size, age and long-term debt (ltdebt). They are described in Table 1. We chose these variables as they are known to be determinants of credit quality (Blume et al., 1998; Mizen and Tsoukas, 2012). Units for which we obtained anomalous variable values, suggesting erroneously recorded BS or IS items, were excluded from the calculation of the summary statistics and the rest of the analysis. More specifically, we excluded the bonds issued by companies for which interest coverage exceeded 250 (3 companies), leverage exceeded 1 (3 companies) and solvency was below -1 (1 company). These exclusions led to the removal of 29 bonds.
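The exclusion rule for anomalous values can be written as a small filter. The dictionary keys mirror the variable names used in the text, but the data layout, the function name and the toy issuers are assumptions made for illustration; missing values are left in place rather than treated as anomalous.

```python
def is_anomalous(issuer):
    """Flag issuers whose ratios suggest erroneously recorded statement items."""
    cov, lev, solv = issuer.get("cov"), issuer.get("lev"), issuer.get("solv")
    return ((cov is not None and cov > 250)
            or (lev is not None and lev > 1)
            or (solv is not None and solv < -1))

# Toy issuers (hypothetical values).
toy_issuers = [
    {"name": "A", "cov": 3.6, "lev": 0.35, "solv": 0.28},
    {"name": "B", "cov": 400.0, "lev": 0.20, "solv": 0.30},  # coverage too high
    {"name": "C", "cov": 5.0, "lev": 1.40, "solv": 0.10},    # leverage above 1
    {"name": "D", "cov": None, "lev": 0.50, "solv": -1.50},  # solvency below -1
]
kept = [issuer["name"] for issuer in toy_issuers if not is_anomalous(issuer)]
```

All bonds issued by a flagged company would then be dropped, which is why a handful of excluded companies can remove a larger number of bonds.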

variable  definition            mean   sd     Q1     Q2     Q3     N
prof                            0.14   0.27   0.046  0.098  0.17   766
cf                              0.055  0.095  0.033  0.067  0.096  699
liq                             0.10   0.11   0.049  0.095  0.15   699
cov                             7.2    17     1.4    3.6    6.9    727
lev                             0.37   0.20   0.24   0.35   0.49   746
solv                            0.29   0.20   0.17   0.28   0.41   756
size                            3.6    1.0    3.0    3.8    4.4    772
age       2017 - year founded   77     76     22     61     115    709
ltdebt                          0.33   0.35   0.16   0.26   0.40   747

NOTES: The variable size is calculated with total revenue recorded in millions of euros.

Table 1: Summary statistics for the issuer characteristics.

In the following analysis, we restrict attention to bonds for which data about their coupon rate, original maturity and all the characteristics of their issuers, i.e., the pretreatment variables, is available. There are 591 such bonds, of which 29 are convertible, 351 callable and 211 bullet bonds. However, the convertible bonds are not used in assessing the effect of the program as OAS is not available for them. In what follows, we denote by call the indicator variable equal to 1 if the bond is callable and 0 otherwise.

4.2 Design

In the design phase, our first objective is to obtain a well-specified ordered probit model for the running variable conditional on the pretreatment variables. In particular, we are concerned about how well the ordered probit model predicts ratings around the BBB- eligibility threshold. Good predictive power around the threshold ensures that the subset of units which we ultimately employ to evaluate the program are close to the threshold in terms of their ratings.

Relying on substantive knowledge, we first include the economically most relevant pretreatment variables which help predict ratings. This leads to the following seven variables: cpn, mat, prof, cov, size, ltdebt and call. Then, we form all the possible interaction and quadratic terms from these variables and include a combination of them which yields a model specification with adequate predictive power.

To assess the predictive power of the model, we inspect how well it predicts the probability of being assigned to the treatment group. Figure 2 illustrates the distribution of the estimated propensity scores for each rating category. For high-yield bonds with a rating lower than BB, the model predicts assignment to the control group with high probability; for investment-grade bonds with a rating higher than BBB, it predicts assignment to the treatment group with high probability. Moreover, even for the four rating categories from BB to BBB around the threshold, the model correctly predicts the treatment status of most units. Specifically, the estimated propensity score is less than 0.5 for 60% of the BB+ and BB bonds. For the lowest investment-grade categories, BBB- and BBB, the estimated propensity score is greater than 0.5 for 98% of the bonds. Most importantly, Figure 2 shows that all the bonds with estimated propensity scores around 0.5 have ratings that are close to the investment-grade threshold BBB-, suggesting that the probit model is well specified.

NOTES: The ordered probit specification contains: cpn, mat, prof, cov, size, ltdebt, call, cpn × size, cpn × ltdebt, cpn × call, mat × prof, mat × ltdebt, prof × prof, prof × call, cov × call, size × ltdebt, ltdebt × ltdebt.

Figure 2: Estimated propensity scores by rating.
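
How an ordered probit fit translates into an estimated propensity score can be illustrated with a minimal Python sketch. It assumes the usual latent-variable form of the ordered probit, in which a bond is rated BBB- or higher when a linear index plus a standard normal error exceeds the cutpoint separating BB+ from BBB-. The function names, coefficients and cutpoint are hypothetical and are not estimates from our model.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def propensity_score(x, beta, cutpoint):
    """P(rating >= BBB- | x) under an ordered probit.

    Assumes latent rating = x'beta + eps with eps ~ N(0, 1), and that the
    bond is eligible when the latent rating exceeds `cutpoint`, the
    threshold separating BB+ from BBB- (hypothetical parameterization).
    """
    index = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 - normal_cdf(cutpoint - index)

# Hypothetical coefficients for two covariates and a hypothetical cutpoint.
beta = [0.8, -0.5]
cutpoint = 0.3
covariates = [[1.0, 1.0], [2.0, 0.5], [0.0, 2.0]]
scores = [propensity_score(x, beta, cutpoint) for x in covariates]
```

A unit whose index equals the cutpoint receives a score of 0.5, which is why scores near 0.5 flag bonds that the model places close to the eligibility threshold.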

Our second objective in the design phase is to identify subsamples in which the distributions of the covariates are balanced between the treatment and control groups. Following the procedure in Section 2.2.3, we construct subsets of units in which the estimated propensity score of each unit falls in the interval $[0.5 - h, 0.5 + h]$, for some $h \in (0, 0.5)$. Then, in each subsample and for each pretreatment variable $X$, we assess covariate balance as measured by the standardized bias (SB):

$$\mathrm{SB} = \left( \frac{\sum_i w_i Z_i X_i}{\sum_i w_i Z_i} - \frac{\sum_i w_i (1 - Z_i) X_i}{\sum_i w_i (1 - Z_i)} \right) \Bigg/ \sqrt{\frac{s_1^2}{N_1} + \frac{s_0^2}{N_0}},$$

where $w_i$ is the weight of unit $i$, $Z_i$ is the treatment indicator, $s_z^2$ is the sample variance of the unweighted covariate and $N_z$ the sample size in group $z \in \{0, 1\}$. When each unit is assigned a weight of unity, the SB is simply the two-sample $t$-statistic. We employ the unweighted standard errors in the denominator to be able to compare the values of the statistic across different sets of weights. We first calculate the SB using the overlap weights. Our goal is to find subsamples in which all the covariates are well balanced while ensuring that the number of units in them is not too small. We identify five such values of $h$, and present the corresponding SBs of the covariates in Panel A of Table 2. All of the absolute values of the SBs are smaller than 1.96 (the critical value of the two-sample $t$-statistic at the 0.05 level), suggesting overall satisfactory covariate balance between the treatment and control groups. This supports the plausibility of local unconfoundedness in the subsamples under consideration.
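
The standardized bias described above, a difference in weighted covariate means divided by the unweighted two-sample standard error, can be sketched in a few lines of Python. The function name and the toy data, including the propensity scores, are hypothetical; the overlap weights are taken to be one minus the propensity score for treated units and the propensity score for control units.

```python
import math

def standardized_bias(x, z, w):
    """Weighted mean difference divided by the unweighted standard error.

    x: covariate values, z: 0/1 treatment indicators, w: balancing weights.
    """
    def weighted_mean(group):
        num = sum(wi * xi for xi, zi, wi in zip(x, z, w) if zi == group)
        den = sum(wi for zi, wi in zip(z, w) if zi == group)
        return num / den

    def unweighted_var_over_n(group):
        vals = [xi for xi, zi in zip(x, z) if zi == group]
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / (n - 1)
        return var / n

    diff = weighted_mean(1) - weighted_mean(0)
    se = math.sqrt(unweighted_var_over_n(1) + unweighted_var_over_n(0))
    return diff / se

# Toy data (hypothetical): one covariate, three control and three treated units.
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
z = [0, 0, 0, 1, 1, 1]
e = [0.2, 0.4, 0.5, 0.5, 0.6, 0.8]  # hypothetical estimated propensity scores
unit = [1.0] * len(x)
overlap = [1.0 - ei if zi == 1 else ei for ei, zi in zip(e, z)]

sb_unit = standardized_bias(x, z, unit)        # two-sample t-statistic
sb_overlap = standardized_bias(x, z, overlap)  # SB under overlap weights
```

Because the denominator does not depend on the weights, the two values are directly comparable; in this toy example the overlap weights shrink the standardized bias.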

h N cpn mat prof cf liq cov lev solv size age ltdebt call
Panel A. ATO weights
0.34 27
0.35 28
0.36 32
0.37 33
0.38 36
Panel B. ATE weights
0.34 27
0.35 28
0.36 32
0.37 33
0.38 36
Panel C. ATT weights
0.34 27
0.35 28
0.36 32
0.37 33
0.38 36
Panel D. Unitary weights
0.34 27
0.35 28
0.36 32
0.37 33
0.38 36
Table 2: Standardized bias of the covariates when the estimated propensity score lies in $[0.5 - h, 0.5 + h]$.

We further investigate whether covariate balance is sensitive to the choice of weighting scheme. Specifically, we calculate the SB for each covariate in each of the identified subsamples when employing, instead of the overlap weights, the weights corresponding to the two alternative estimands of interest, the ATE and the ATT. The SBs obtained in this manner can be found in Panels B and C of Table 2. For both the ATE and ATT weighting schemes, the covariates remain balanced in the two subsamples with the fewest units. However, for the subsamples defined by $h \geq 0.36$, the SBs of two covariates exceed 1.96 in absolute value, signaling that local unconfoundedness is less likely to hold. For this reason, when we estimate the ATE and ATT, we focus on the first two subsamples ($h = 0.34$ and $h = 0.35$).
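
The three weighting schemes compared here can be written down compactly. The sketch below uses the standard definitions of balancing weights in terms of the propensity score, as in Li et al. (2018); the function name and scheme labels are hypothetical.

```python
def balancing_weight(e, z, scheme):
    """Balancing weight for a unit with propensity score e and treatment z.

    scheme is one of "ATO" (overlap weights), "ATE" (inverse probability
    weights) or "ATT" (treated units unweighted, controls weighted by odds).
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if scheme == "ATO":
        return 1.0 - e if z == 1 else e
    if scheme == "ATE":
        return 1.0 / e if z == 1 else 1.0 / (1.0 - e)
    if scheme == "ATT":
        return 1.0 if z == 1 else e / (1.0 - e)
    raise ValueError(f"unknown scheme: {scheme}")

# A control unit with a high propensity score (hypothetical value).
w_ato = balancing_weight(0.9, 0, "ATO")
w_ate = balancing_weight(0.9, 0, "ATE")
w_att = balancing_weight(0.9, 0, "ATT")
```

The overlap weights are bounded between 0 and 1, whereas the ATE and ATT weights grow without bound as the propensity score of a control unit approaches 1. This is one reason why balance under the latter two schemes can deteriorate as the interval around 0.5 widens.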

We also assess covariate balance in the five subsamples considered so far when all the units are weighted equally. This allows us to evaluate whether applying the three sets of weights materially improves covariate balance. That is, for each variable, we conduct a $t$-test for the equality of the unweighted means of the variable in the two groups, the results of which are shown in Panel D of Table 2. They indicate significant imbalance in some of the covariates. Specifically, the $t$-statistics for cpn and solv exceed the relevant 5% critical values in all five subsamples. Taken together, the results in Table 2 suggest that applying any of the three sets of weights to the samples under consideration improves the overall covariate balance, even though not for each individual covariate. The greatest improvement is observed when employing the overlap weights. Note that, unlike in Li et al. (2018), here the balancing tests are conducted in subsamples, while the weights are estimated using the whole sample. Consequently, covariate balance is not an immediate consequence of applying the overlap weights.

Finally, we investigate whether covariate-balanced subsamples with a larger number of units can be found by making the intervals in which the estimated propensity scores are required to lie asymmetric. Specifically, for each of the three weighting schemes, we first identify from Table 2 the largest value of $h$ for which all the SBs are smaller than 1.96 in absolute value, indicating satisfactory covariate balance. Then, starting from these symmetric intervals ($h = 0.38$ for ATO and $h = 0.35$ for ATE and ATT), we gradually increase the length of the interval on the right or left of 0.5 until significant imbalance emerges. In the case of both the ATO and ATT weighting schemes, we are able to identify subsamples with a considerably larger number of units, allowing us to estimate the effect of the program more precisely, as well as to improve the external validity of our results.
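
The widening procedure just described can be sketched as a simple loop. This is only an illustration under stated assumptions: the function name, the step size of 0.01 and the callable `is_balanced` are hypothetical, with `is_balanced(lo, hi)` standing in for the check that every SB in the subsample with estimated propensity scores in [lo, hi] is below 1.96 in absolute value.

```python
def widen_interval(lower, upper, is_balanced, step=0.01, side="left"):
    """Extend one bound of [lower, upper] for as long as balance is retained.

    is_balanced(lo, hi) is a user-supplied (hypothetical) callable returning
    True when all covariates are balanced in the subsample defined by [lo, hi].
    Returns the widest balanced interval found on the chosen side.
    """
    while True:
        if side == "left":
            candidate = (round(lower - step, 10), upper)
        else:
            candidate = (lower, round(upper + step, 10))
        out_of_range = candidate[0] <= 0.0 or candidate[1] >= 1.0
        if out_of_range or not is_balanced(*candidate):
            return lower, upper
        lower, upper = candidate

# Toy balance rule (hypothetical): balance is lost once the lower bound
# drops below 0.045.
lo, hi = widen_interval(0.12, 0.88, lambda a, b: a >= 0.045, side="left")
```

In practice `is_balanced` would recompute the standardized biases of all covariates in each candidate subsample, using the weights of the scheme under consideration.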

4.3 Results

Having identified the subsamples in which local unconfoundedness plausibly holds, we can proceed to estimate the average causal effect of being eligible for purchase under the CSPP for the overlap population, namely the effect for units which could conceivably have been assigned to either the treatment or control group. Specifically, we use the moment estimator in (4) for point estimates and the M-estimator in (5) for standard errors.
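
For intuition, a weighted estimator of this kind can be sketched as a difference of weighted mean outcomes. The sketch assumes the normalized (Hájek-type) form commonly used with balancing weights, which may differ in detail from the estimator in (4); the function name and the toy data are hypothetical, and no standard error is computed.

```python
def weighted_effect(y, z, w):
    """Difference between weighted mean outcomes of treated and control units."""
    treated_num = sum(wi * yi for yi, zi, wi in zip(y, z, w) if zi == 1)
    treated_den = sum(wi for zi, wi in zip(z, w) if zi == 1)
    control_num = sum(wi * yi for yi, zi, wi in zip(y, z, w) if zi == 0)
    control_den = sum(wi for zi, wi in zip(z, w) if zi == 0)
    return treated_num / treated_den - control_num / control_den

# Toy data (hypothetical): spreads in basis points for two control and
# two treated bonds, with hypothetical estimated propensity scores.
y = [100.0, 120.0, 80.0, 90.0]
z = [0, 0, 1, 1]
e = [0.4, 0.6, 0.5, 0.7]
overlap = [1.0 - ei if zi == 1 else ei for ei, zi in zip(e, z)]
tau_ato = weighted_effect(y, z, overlap)
```

Swapping in inverse probability or ATT weights changes only the list of weights passed to the function, which is how the same estimator targets different populations.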

h      N    N    estimate   se     (p-val.)
Panel A. ATO
0.34   10   17   37.1       23.6   (0.116)
0.35   11   17   39.5       23.1   (0.088)
0.36   13   19   42.7       21.4   (0.046)
0.37   14   19   45.9       21.1   (0.029)
0.38   16   20   38.4       23.1   (0.096)
Panel B. ATE
0.34   10   17   42.2       23.2   (0.069)
0.35   11   17   45.3       22.5   (0.044)
Panel C. ATT
0.34   10   17   36.4       27.6   (0.186)
0.35   11   17   39.9       26.2   (0.128)
NOTES: N_z is the sample size of group z.
Table 3: Estimates of the weighted treatment effect. Symmetric intervals.

Table 3 contains the estimated effects in the five covariate-balanced subsamples identified above. The estimates of the ATO suggest that eligibility for purchase under the CSPP had a statistically significant and negative effect on bond spreads (a reduction in the range of 35–50 basis points). This is slightly lower than the 70 basis point reduction in the primary market found by Zaghini (2019). However, the difference could simply reflect the more “local” nature of our estimates compared to those in Zaghini (2019), which are based on all the bonds issued in the primary market. Relative to the announcement effect of the program in the secondary bond market, of 15 basis points according to Abidi and Flores (2017), our estimates are slightly higher. This difference could reflect the higher liquidity of the bonds which are actively traded in the secondary market.

Given the weighted average maturity of 7.5 years in the subsample defined by , a 35–50 basis point reduction in yield to maturity corresponds approximately to a 2.6–3.8 per cent increase in the price of a zero-coupon bond at issuance. Relative to the weighted average amount sold of 620 million euros in the subsample under consideration (), this represents a significant decrease in the funding costs faced by the issuers of the eligible bonds.
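
The price effect quoted above follows from a first-order approximation, which can be checked with a short calculation. The sketch assumes that the relative price change of a zero-coupon bond is approximately its maturity times the fall in yield, and also reports the exact change under continuous compounding for comparison; the function names are hypothetical.

```python
import math

def approx_price_change(maturity_years, yield_drop_bp):
    """First-order relative price change: maturity times the fall in yield."""
    return maturity_years * yield_drop_bp / 10_000.0

def exact_price_change(maturity_years, yield_drop_bp):
    """Exact relative price change, assuming continuous compounding,
    so that price = exp(-yield * maturity)."""
    return math.exp(maturity_years * yield_drop_bp / 10_000.0) - 1.0

low = approx_price_change(7.5, 35)    # 7.5 years, 35 basis points
high = approx_price_change(7.5, 50)   # 7.5 years, 50 basis points
low_exact = exact_price_change(7.5, 35)
high_exact = exact_price_change(7.5, 50)
```

The approximation gives roughly 2.6 and 3.8 per cent for the two ends of the range; the exact figures under continuous compounding are marginally larger.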

As the effect of the program on bond spreads at issuance could have been due to higher expected liquidity of the eligible bonds, it is instructive to compare the effect that we have estimated to liquidity premia of corporate bonds. Dick-Nielsen et al. (2012) estimate the liquidity premia of BBB US corporate bonds to lie in the range of 4–93 basis points. Also relative to these additional yields required by investors to compensate for the illiquidity of corporate bonds, our estimates of the effect of the program are sizable.

We next examine how the estimates of the treatment effect vary when changing the target population. Namely, we calculate the estimates of the ATE and ATT. The former refers to the effect of the program for all units irrespective of their treatment status, without downweighting units further away from the threshold. The latter, on the other hand, is the effect for the units effectively treated. The estimates can be found in Table 3, along with those of the ATO. For both the ATE and ATT, we only consider the first two subsamples, defined by $h = 0.34$ and $h = 0.35$, given that covariate imbalance emerges for larger values of $h$. The results suggest that the ATE is slightly larger in absolute value than the ATT. In other words, the effect of the program on investment-grade bonds was slightly lower than the effect on high-yield bonds that would have been observed had they also been treated.

lower   upper   N    N    estimate   se     (p-val.)
Panel A. ATO
0.10    0.88    21   16   39.6       22.8   (0.082)
0.09    0.88    23   16   41.3       22.3   (0.064)
0.08    0.88    24   16   42.7       22.1   (0.054)
0.07    0.88    27   16   44.8       21.5   (0.037)
0.06    0.88    29   16   45.0       21.2   (0.033)
0.05    0.88    31   16   45.5       20.9   (0.029)
Panel B. ATE
0.15    0.86    17   13   48.0       21.1   (0.023)
Panel C. ATT
0.13    0.85    19   11   40.6       25.7   (0.114)
0.11    0.85    20   11   41.2       25.5   (0.106)
0.09    0.85    22   11   42.2       25.2   (0.094)
0.08    0.85    23   11   43.0       25.1   (0.087)
0.07    0.85    26   11   44.2       24.8   (0.074)
0.06    0.85    28   11   44.3       24.6   (0.071)
NOTES: N_z is the sample size of group z. The first two columns give the lower and upper bounds of the interval for the estimated propensity score.
Table 4: Estimates of the weighted treatment effect. Asymmetric intervals.

The estimates of the ATO, ATE and ATT in Table 3 are based on subsamples which are rather limited in size. For this reason, we also estimate the effect of the program employing the subsamples defined by asymmetric intervals, identified in Section 4.2. The estimates are presented in Table 4. The magnitude of the estimates changes little when considering these subsamples with a larger number of units. The estimates of the ATO appear to settle around 45 basis points and the positive difference between the ATE and ATT decreases slightly. Not surprisingly, the standard errors of the estimates decrease as the sample sizes grow.

4.4 Alternative approach

An influential alternative approach is due to Angrist and Rokkanen (2015) (AR hereafter), who propose to identify causal effects away from the threshold by relying on a conditional independence assumption. In particular, they leverage a set of predictors of the dependent variable which does not contain the running variable. Conditional on this set of predictors, potential outcomes are assumed to be mean-independent of the running variable. Given the similarity between this and our local unconfoundedness assumption, we also estimate the effect of being eligible for purchase under the CSPP employing the framework of AR. Writing $Y_i(z)$ for the potential outcome under treatment $z$, $R_i$ for the running variable, $X_i$ for the predictors and $r_0$ for the threshold, note that the basic conditional independence assumption, $E[Y_i(z) \mid R_i, X_i] = E[Y_i(z) \mid X_i]$, $z = 0, 1$, is unlikely to be satisfied in our application. Therefore, we invoke their alternative assumption, the bounded conditional independence assumption (BCIA): there exists $d > 0$ such that $E[Y_i(z) \mid R_i, X_i, |R_i - r_0| < d] = E[Y_i(z) \mid X_i, |R_i - r_0| < d]$, meaning that conditional mean-independence holds in a $d$-neighborhood of the threshold.

However, the measure of distance in the definition of the BCIA is not directly applicable to ordinal running variables. Instead, we take advantage of the ordered nature of the running variable and identify the set of units around the threshold as those with a rating BB+ (the highest category in the control group) or BBB- (the lowest category in the treatment group). We then assume that conditional independence holds in this subset of units around the threshold. Moreover, AR propose to assess the BCIA by testing the coefficients in regressions of the outcome on the running variable and the pretreatment variables on either side of the threshold. This procedure is however not applicable to our selected subsample because we are only considering one category on each side of the threshold. Therefore, we cannot assess the validity of the conditional independence assumption in our subsample.

AR propose to identify the subset of units to analyze based on the value of the running variable. In our application, this leads to the subsample of all bonds whose rating is either BB+ (43 units) or BBB- (26 units). Based on this sample, we find that the WLS estimate is -36.7 (p-value 0.367) with ATT weights and -51.8 (p-value 0.205) with ATE weights. These estimates are in line with those obtained using our framework, even though the two approaches rely on rather different assumptions. However, the covariate distribution of this subsample can be imbalanced between the treated and the control groups; indeed, in our case, nearly all 12 covariates have a larger SB than in our method, and 4 of the SBs are larger than 1.96. This speaks to a strength of our approach: we define the candidate subsamples based on richer covariate information, encoded in the estimated propensity scores obtained from the ordered probit model, which helps identify subsets of units with better covariate balance between the treated and the control groups. Covariate balance lends powerful support to the validity of local unconfoundedness, being a stronger consequence of this assumption than the regression independence assessed by AR.

5 Conclusion and discussion

In this paper we have developed a regression discontinuity design applicable when the running variable, determining assignment to treatment, is ordinal. The estimation strategy is based on the following steps. We first estimate an ordered probit model for the ordinal running variable conditional on pretreatment variables. The estimated probability of being assigned to treatment is then adopted as a continuous surrogate running variable. In order to provide external validity to the analysis, we move away from the standard inference at the threshold by assuming local unconfoundedness of the treatment in an interval around the surrogate threshold. Then, once this interval has been identified via an overlap-weighted balancing assessment of the preprogram variables across treatments, an estimate of the effect of the program in the interval is obtained employing a weighted estimator of the average treatment effect.

We have applied our methodology to estimate the causal effect of the European Central Bank’s Corporate Sector Purchase Programme (CSPP) on corporate bond spreads. We have estimated the effect of the program in a subpopulation defined by the estimated conditional probability of being eligible for purchase. This subpopulation is composed of bonds that can be assigned with non-negligible probability to either eligibility status, and that are therefore the most affected by even small changes in the program. Our results suggest that eligibility for purchase under the CSPP had a negative effect, of the order of 35–50 basis points, on bond spreads at issuance. This is somewhat higher than previous estimates of the announcement effect of the program on bonds traded in the secondary market (Abidi and Flores, 2017). Given that in the sample which is used to conduct inference the average amount issued exceeded 600 million euros, the 35–50 basis point reduction in the yield to maturity corresponds to a non-negligible decrease in the funding costs of the eligible issuers.

There are several limitations of our work. First, though our probit specification appears to perform well in the empirical application, it may be improved by a more objective procedure for choosing it. Specifically, our approach may give rise to a trade-off between variance and bias. Namely, when the model for the ordinal variable provides a good in-sample fit, the estimated propensity scores of most units are close to either 0 or 1. Consequently, covariate-balanced subsamples, identified using the estimated propensity scores, are likely to have moderate sample sizes. This may lead to elevated standard errors of the estimates of the treatment effect. One direction is to develop a cross-validation criterion based on an objective function that achieves the right balance between bias and variance. Another direction is to conduct some sensitivity analysis on the model specification, e.g., in the vein of Rosenbaum and Rubin (1983). Second, it is possible that our analysis captures not only the causal effect of the CSPP program but also the effect related to the rating of the bond. To be able to attribute the effects to the program, one possibility is to conduct a negative control analysis (Rosenbaum, 2002). That is, to evaluate the effect of eligibility on spreads during a pretreatment period (i.e., “no treatment” evaluation) to assess the potential effect due to the rating of the bond. Third, our method relies partially on the local SUTVA assumption, which rules out interference between units as well as any “externality effects”. One could borrow from the recent advances to tackle the interference problem in causal inference to relax this assumption.


Derivation of the M-estimator of variance in Section 2.2.4. The log likelihood function of the ordered probit model (1) is

where . Maximum likelihood estimates of the model parameters are obtained by solving the score function (first-order derivative of the log likelihood),

Let be the information matrix; the stochastic expansion for estimating is then . The propensity score is , and its gradient is . Recall that the moment estimator with estimated overlap weights is

From this, we can view as the solution to the unbiased estimating equation . Denote , we expand the estimating equation around and the true propensity score to obtain . Similarly, if we define , we have .

Differencing the above two expansions we obtain

where and . Replacing the expectation by its empirical counterpart leads to the empirical sandwich variance estimator in Section 2.2.4.
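
For reference, the log likelihood of an ordered probit model and the implied propensity score take the standard form sketched below. The notation is an assumption made for illustration and need not coincide with that of model (1): $R_i \in \{1, \dots, J\}$ is the ordinal running variable, $X_i$ the vector of pretreatment variables, $\beta$ the coefficient vector, $-\infty = \alpha_0 < \alpha_1 < \dots < \alpha_J = \infty$ the cutpoints, $\Phi$ the standard normal distribution function, and $r$ the lowest category assigned to treatment.

```latex
\ell(\beta, \alpha) = \sum_{i=1}^{N} \sum_{j=1}^{J} \mathbf{1}\{R_i = j\}
  \log\left[ \Phi(\alpha_j - X_i^\top \beta) - \Phi(\alpha_{j-1} - X_i^\top \beta) \right],
\qquad
\Pr(R_i \geq r \mid X_i) = 1 - \Phi(\alpha_{r-1} - X_i^\top \beta).
```

Under this parameterization, the propensity score depends on the parameters only through $\beta$ and the single cutpoint $\alpha_{r-1}$, so its gradient with respect to the remaining cutpoints is zero.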


  1. FL is associate professor, Department of Statistical Science, Duke University, Durham, NC, 27708 (email: fl35@duke.edu); AM is researcher (andrea.mercatanti@liser.lu), Bank of Italy and Luxembourg Institute of Socio-Economic Research; TM (email: taneli.makinen@bancaditalia.it) and AS (email: andrea.silvestrini@bancaditalia.it) are researchers, Bank of Italy. The authors are grateful to Federico Apicella, Johannes Breckenfelder, Federico Cingano, Riccardo De Bonis, Alfonso Flores-Lagunes, Frank Li, Fabrizia Mealli, Santiago Pereda Fernández, Stefano Rossi and Stefano Siviero for helpful comments and suggestions. Part of this work was done while TM was visiting the Einaudi Institute for Economics and Finance, whose hospitality is gratefully acknowledged. The views expressed herein are those of the authors and not necessarily those of Bank of Italy. All remaining errors are ours.


  1. Abidi, N. and Flores, I. M. (2017), “Who Benefits from the Corporate QE? A Regression Discontinuity Design Approach,” Unpublished Working Paper.
  2. Agresti, A. (2013), Categorical Data Analysis, Hoboken, NJ: John Wiley & Sons, 3rd edition.
  3. Angrist, J. D. and Krueger, A. B. (1991), “Does Compulsory School Attendance Affect Schooling and Earnings?” The Quarterly Journal of Economics, 106, 979–1014.
  4. Angrist, J. D. and Lavy, V. (1999), “Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement,” The Quarterly Journal of Economics, 114, 533–575.
  5. Angrist, J. D. and Rokkanen, M. (2012), “Wanna Get Away? RD Identification Away from the Cutoff,” NBER Working Papers 18662, National Bureau of Economic Research.
  6. — (2015), “Wanna Get Away? Regression Discontinuity Estimation of Exam School Effects Away from the Cutoff,” Journal of the American Statistical Association, 110, 1331–1344.
  7. Biais, B., Declerck, F., Dow, J., Portes, R., and von Thadden, E.-L. (2006), “European Corporate Bond Markets: Transparency, Liquidity, Efficiency,” CEPR Research Report, City of London.
  8. Blume, M. E., Lim, F., and Mackinlay, A. C. (1998), “The Declining Credit Quality of U.S. Corporate Debt: Myth or Reality?” The Journal of Finance, 53, 1389–1413.
  9. Cattaneo, M. D., Frandsen, B. R., and Titiunik, R. (2015), “Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate,” Journal of Causal Inference, 3, 1–24.
  10. Cattaneo, M. D. and Vazquez-Bare, G. (2016), “The Choice of Neighborhood in Regression Discontinuity Designs,” Observational Studies, 2, 134–146.
  11. Crump, R. K., Hotz, V. J., Imbens, G. W., and Mitnik, O. A. (2006), “Moving the Goalposts: Addressing Limited Overlap in the Estimation of Average Treatment Effects by Changing the Estimand,” NBER Working Papers 330, National Bureau of Economic Research.
  12. Dick-Nielsen, J., Feldhütter, P., and Lando, D. (2012), “Corporate bond liquidity before and after the onset of the subprime crisis,” Journal of Financial Economics, 103, 471–492.
  13. Dong, Y. (2015), “Regression Discontinuity Applications with Rounding Errors in the Running Variable,” Journal of Applied Econometrics, 30, 422–446.
  14. Gündüz, Y., Ottonello, G., Pelizzon, L., Schneider, M., and Subrahmanyam, M. G. (2017), “Lighting up the dark: A preliminary analysis of liquidity in the German corporate bond market,” Unpublished Working Paper.
  15. Hahn, J. (1998), “On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects,” Econometrica, 66, 315–331.
  16. Hahn, J., Todd, P., and van der Klaauw, W. (2001), “Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design,” Econometrica, 69, 201–209.
  17. Hirano, K. and Imbens, G. W. (2001), “Estimation of Causal Effects Using Propensity Score Weighting: An Application to Data on Right Heart Catheterization,” Health Services and Outcomes Research Methodology, 2, 259–278.
  18. Hirano, K., Imbens, G. W., and Ridder, G. (2003), “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 71, 1161–1189.
  19. Huber, P. J. (1964), “Robust Estimation of a Location Parameter,” The Annals of Mathematical Statistics, 35, 73–101.
  20. Imbens, G. and van der Klaauw, W. (1995), “Evaluating the Cost of Conscription in The Netherlands,” Journal of Business & Economic Statistics, 13, 207–215.
  21. Imbens, G. W. and Lemieux, T. (2008), “Regression Discontinuity Design: A Guide to Practice,” Journal of Econometrics, 142, 615–635.
  22. Imbens, G. W. and Rubin, D. B. (2015), Causal Inference for Statistics, Social, and Biomedical Sciences, Cambridge, UK: Cambridge University Press.
  23. Imbens, G. W. and Wager, S. (2018), “Optimized Regression Discontinuity Designs,” arXiv: 1705.01677v2 [stat.AP].
  24. Kolesár, M. and Rothe, C. (2017), “Inference in a Regression Discontinuity Design with a Discrete Running Variable,” arXiv: 1606.04086v4 [stat.AP].
  25. Lee, D. S. (2001), “The Electoral Advantage to Incumbency and Voters’ Valuation of Politicians’ Experience: A Regression Discontinuity Analysis of Elections to the U.S. House,” NBER Working Papers 8441, National Bureau of Economic Research.
  26. — (2008), “Randomized experiments from non-random selection in U.S. House elections,” Journal of Econometrics, 142, 675–697.
  27. Lee, D. S. and Card, D. (2008), “Regression discontinuity inference with specification error,” Journal of Econometrics, 142, 655–674.
  28. Lee, D. S. and Lemieux, T. (2010), “Regression Discontinuity Designs in Economics,” Journal of Economic Literature, 48, 281–355.
  29. Li, F., Mattei, A., and Mealli, F. (2015), “Evaluating the Causal Effect of University Grants on Student Dropout: Evidence from a Regression Discontinuity Design Using Principal Stratification,” Annals of Applied Statistics, 9, 1906–1931.
  30. Li, F., Morgan, K. L., and Zaslavsky, A. M. (2018), “Balancing Covariates Via Propensity Score Weighting,” Journal of the American Statistical Association, 113, 390–400.
  31. Mizen, P. and Tsoukas, S. (2012), “Forecasting US Bond Default Ratings Allowing for Previous and Initial State Dependence in an Ordered Probit Model,” International Journal of Forecasting, 28, 273–287.
  32. Rosenbaum, P. R. (2002), Observational Studies, New York, NY: Springer-Verlag.
  33. Rosenbaum, P. R. and Rubin, D. B. (1983), “Assessing Sensitivity to an Unobserved Binary Covariate in an Observational Study with Binary Outcome,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 45, 212–218.
  34. Rubin, D. B. (1974), “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies,” Journal of Educational Psychology, 66, 688–701.
  35. Stefanski, L. A. and Boos, D. D. (2002), “The Calculus of M-Estimation,” The American Statistician, 56, 29–38.
  36. Thistlethwaite, D. L. and Campbell, D. T. (1960), “Regression-discontinuity analysis: An alternative to the ex post facto experiment,” Journal of Educational Psychology, 51, 309–317.
  37. van der Klaauw, W. (2002), “Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression-Discontinuity Approach,” International Economic Review, 43, 1249–1287.
  38. van der Vaart, A. W. (1998), Asymptotic Statistics, Cambridge, UK: Cambridge University Press.
  39. Zaghini, A. (2019), “The CSPP at work: Yield heterogeneity and the portfolio rebalancing channel,” Journal of Corporate Finance, in press.