Experimental Evaluation of Residential Demand Response in California


We evaluate the causal effect of hour-ahead price interventions on the reduction of residential electricity consumption, using a large-scale experiment on 7,000 households in California. In addition to this experimental approach, we also develop a non-experimental framework that allows for an estimation of the desired treatment effect on an individual level by estimating user-level counterfactuals using time-series prediction. This approach crucially eliminates the need for a randomized experiment. Both approaches estimate a reduction of 0.10 kWh (11%) per Demand Response event and household. Using different incentive levels, we find a weak price elasticity of reduction. We also evaluate the effect of an adaptive targeting scheme, which discriminates users based on their estimated responses in order to increase the per-dollar reduction ratio by 30%. Lastly, we find that households with smart home automation devices reduce significantly more than households without, namely 0.28 kWh (37%).


1 Introduction

This paper studies the causal effect of incentivizing residential households to participate in Demand Response (DR) to temporarily reduce electricity consumption. DR has been promoted by the introduction of demand-side management programs (DSM) after the 1970s energy crisis [1], enabled by the integration of information and communications technology in the electric grid. The rationale behind DSM is the inelasticity of energy supply due to the slowness of power plants’ output adjustment, which causes small increases and decreases in demand to result in a price boom or bust, respectively. Since utilities are obligated to provide end-users with electricity at a quasi-fixed tariff at all times [2], e.g. Time-of-Use pricing, they have to bear price risks. Therefore, DSM attempts to protect utilities against such price risks by partially relaying them to end-users, which increases market efficiency according to the economic consensus [3].

In 2015, the California Public Utilities Commission (CPUC) launched a Demand Response Auction Mechanism (DRAM) [4], requiring utilities to procure a certain amount of reduction capacity from DR providers. These aggregators incentivize their customers (also called “Proxy Demand Resource” (PDR) [5]) under contract to temporarily reduce their consumption relative to their projected usage without intervention, referred to in this context as counterfactual or baseline, based on which compensations for (non-)fulfilled reductions are settled: If the consumer uses less (more) energy than the baseline, she receives a reward (incurs a penalty). Figure 1 illustrates the interactions between agents.

[Figure 1 (diagram) labels: Wholesale Market; DR Provider; Scheduling Coordinator; Electric Utility; End-Use Customers]


Figure 1: Interactions of Agents in Residential Demand Response

The estimation of the materialized reduction arguably is the most critical component of the DR bidding process. If the reductions are estimated with a biased counterfactual, either the DR provider or the utility clearing the bids is systematically discriminated against. If the baseline is unbiased but plagued by high variance, the profit settlement is highly volatile. Existing baselines employed by major power grid operators in the United States (e.g. California Independent System Operator (CAISO), New York ISO) are calculated with simple arithmetic averages of previous observations [5] and therefore are inaccurate. The estimation of more accurate baselines is a significant contribution of this paper.

1.1 Contributions

We estimate the average treatment effect (ATE) of hour-ahead notifications on the reduction of electricity consumption by evaluating a Randomized Controlled Trial (RCT) on residential households in California serviced by the three main electric utilities (PG&E, SDG&E, SCE). The experiment is funded by the California Energy Commission and, to the best of our knowledge, is the first to test hour-ahead notifications at the residential household level. We estimate an ATE corresponding to a reduction of about 0.10 kWh (11%) per DR event and user, and further discover notable geographic and temporal heterogeneity among users: the largest estimated reductions occur in summer months and in regions with warmer climate.

In addition to this experimental approach, we also develop a non-experimental method for estimating this causal effect on an individual user level, which is easily aggregated into an ATE. Importantly, we find that the identified ATEs in both cases are close to each other. Interestingly, the non-experimental approach even achieves tighter confidence intervals of the estimated causal effect. This suggests that our methodology is capable of identifying the causal impact of any intervention in settings with high-frequency data, thereby circumventing financial and ethical constraints which frequently arise in clinical trials, transportation, or education.

Lastly, we design an adaptive targeting method to exploit the heterogeneity in users’ responses to incentive signals to assign differing price levels to different subsets of the treatment population. Specifically, we separate users based on their previous responses into two distinct groups, each of which either only receives low or high incentives. This method yields an increase of the per-dollar yield of 30%.

This paper is structured as follows: Section 2 explains the experimental setup and provides summary statistics on the RCT data. We then develop the non-experimental estimation framework in Section 3, where we pay particular attention to estimation bias and empirical de-biasing methods (Section 3.3). Non-experimental estimation results are provided in Section 4. Next, in Section 5 we estimate the ATE using a classical Fixed-Effects Estimator [6]. Section 5.6 compares the estimates obtained by both approaches. Lastly, the effect of adaptive targeting is discussed in Section 6. Section 7 concludes. Additional figures and numeric data are relegated to the appendix.

1.2 Related Work

Causal inference seeks to extrapolate the effect of interventions to general settings. However, many experiments are infeasible because of budget or ethical constraints. With the rapid growth of collected user data, non-experimental estimates become increasingly valuable, as they may be used in place of experiments. These facts have spurred research at the intersection of machine learning and economics, whose general idea is to partition observations into treatment and control sets, fit a nominal model on the latter, and apply it to the treatment set to obtain the treatment effect of interest.

Examples for such models are [7], who evaluates welfare effects of home automation by calculating the Kolmogorov-Smirnov Statistic between users, or [8], who constructs a convex combination of US states as the counterfactual estimate for tobacco consumption to estimate the effect of a tobacco control program in California on tobacco consumption. In [9], the estimators are random forests trained by recursive partitioning of the feature space and novel cross-validation criteria. [10] develops Bayesian structural time series models combined with a Monte-Carlo sampling method for treatment effect inference of market interventions.

Fitting an estimator on smart meter time-series is essentially a short-term load forecasting (STLF) problem, whose goal is to fit estimators on observed data to predict future consumption with the highest possible accuracy. Within STLF, tools employed are ARIMA models with a seasonal component [11] and classic regression models where support vector regression (SVR) and neural networks yield the highest accuracy [12, 13]. A comprehensive comparison between ML techniques for forecasting and differing levels of load aggregation is provided in [14].

In the context of smart meter data mining, much of the existing work focuses on disaggregation of energy consumption to identify contributions of discrete appliances from the total observed consumption [15] and to learn consumption patterns [16, 17]. Studies in applied economics typically emphasize the estimation of ATEs of experimental interventions. To increase the precision of the estimates, the regression models often employ unit-level fixed effects [18, 19], which is an implicit way of training models for the consumption of individual consumers. In this work, we make these user-level models explicit, allowing for more general ML techniques. Importantly, our approach is original in that it permits causal inference at the level of individual treatment effects in a straightforward fashion by employing estimators from STLF. To the best of our knowledge, this paper is the first of its kind to analyze the potential of Demand Response interventions on a residential level, combining ideas from causal inference in econometrics and from Machine Learning for estimation.

2 Experimental Setup and Data Characteristics

2.1 Setup of the Experiment

The experiment is carried out by OhmConnect, Inc., using funds provided by the California Energy Commission. Figure 2 draws a flowchart of the experimental setup.

[Figure 2 (flowchart) labels: All Participants; Treatment Encouraged; Treatment Non-Encouraged; Estimate Responses; Phase 1 (90 days); Phase 2 (90 days)]

Figure 2: Setup of Experiment

Over the course of the experimental time period (Nov. 2016 - Dec. 2017), each consumer that signs up for the study is randomly assigned to one of the following groups:

  • Treatment-Encouraged: The user receives an average of 25 DR events in the 90 days following signup, with incentive levels chosen at random from the set $\{0.05, 0.25, 0.50, 1.00, 3.00\}$ (the incentive levels reported in Table 2). Additionally, the user is given a rebate for purchasing a smart home automation device.

  • Treatment-Non-Encouraged: Same as Treatment-Encouraged, but without the smart home automation rebate.

  • Control: Users do not receive any notifications for DR events for a period of 90 days after signup.

These three groups form Phase 1 of the experiment. Users in the control group that have reached 90 days of age are removed from the study. Users in either the Treatment-Encouraged or the Treatment-Non-Encouraged group that have reached 90 days of age are pooled and systematically assigned to one of the following groups for Phase 2 interventions:

  • Targeted-High: The user receives an average number of 25 DR events for a period of 90 days after being rolled over into Phase 2. Each reward level is randomly drawn from the set .

  • Targeted-Low: Same as in Targeted-High, but rewards are randomly drawn from .

  • Non-Targeted: Same as in targeted groups, with rewards drawn from .

Users with completed Phase 2 are removed from the study. In Sections 3-5, we evaluate Phase 1 of the experiment whereas Section 6 is dedicated to adaptive targeting (Phase 2). In the remainder of this paper, we use the term “treatment users” to refer to users in the “Treatment-Encouraged” and “Treatment-Non-Encouraged” groups. Users are notified of a DR event and its incentive level up to 20 minutes into an hour; the event lasts until the end of that hour.

2.2 Summary Statistics

Table 1 reports the number of users by experiment group and the proportion of users for which we were able to scrape historical smart meter readings. The table shows that the randomized assignment of users to groups roughly follows a 1:2:2 ratio (Control vs. Treatment-Encouraged vs. Treatment-Non-Encouraged).

Historical Smart Meter Data Availability by Group
Group # Enrolled # With Data # With DR
Table 1: Number of Total Users Enrolled by Group, Data Availability, and Users with DR Events.

Users without DR events or for which we were unable to scrape historical data are omitted from the study. Since the assignment of users into the different experimental groups was randomized (see Section 2.4), dropping such users does not affect the evaluation of the experiment. Figure 3 shows the geographic distribution of the remaining users.

Figure 3: Geographic Distribution of Users

2.3 Weather Data

Hourly measurements of ambient air temperature are scraped from the publicly accessible California Irrigation Management Information System [20]. As there are fewer weather stations than distinct user ZIP codes, we linearly interpolate user-specific temperatures at their ZIP codes from the two closest weather stations in latitude and longitude by calculating geodesic distances with Vincenty’s formulae [21].
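The interpolation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the great-circle (haversine) distance is swapped in for Vincenty's formulae to keep the example short, and "linear interpolation" is read as inverse-distance weighting between the two nearest stations. The function names and the station tuples are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km (a simpler stand-in for Vincenty's formulae)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def interpolate_temperature(user_latlon, stations):
    """Interpolate between the two closest stations.

    stations: list of (lat, lon, temperature) tuples (hypothetical layout).
    Assumption: weights are inversely proportional to distance.
    """
    ranked = sorted(stations, key=lambda s: haversine_km(*user_latlon, s[0], s[1]))
    (lat1, lon1, temp1), (lat2, lon2, temp2) = ranked[:2]
    d1 = haversine_km(*user_latlon, lat1, lon1)
    d2 = haversine_km(*user_latlon, lat2, lon2)
    if d1 + d2 == 0.0:
        # Both stations coincide with the user location.
        return 0.5 * (temp1 + temp2)
    w1 = d2 / (d1 + d2)  # the closer station receives the larger weight
    return w1 * temp1 + (1.0 - w1) * temp2

# Illustrative stations only; the third one is far away and is ignored.
stations = [(0.0, 0.0, 20.0), (0.0, 1.0, 30.0), (5.0, 5.0, 99.0)]
midpoint_temp = interpolate_temperature((0.0, 0.5), stations)
at_station_temp = interpolate_temperature((0.0, 0.0), stations)
```

A user located exactly at a station inherits that station's reading, and a user halfway between two stations receives their average.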

2.4 Balance Checks

To verify that users were randomly assigned to control and treatment groups, we perform a balance check on the distribution of observed air temperatures and electricity consumptions across both groups. Notice that the relatively large sample size renders a classical differences-in-means $t$-test inappropriate. Therefore, we utilize Cohen’s $d$ to estimate the effect size based on the differences between means, which is insensitive to the large sample size. Given two discrete distributions $P$ and $Q$ with sample sizes $n_P$ / $n_Q$ and means $\bar{x}_P$ / $\bar{x}_Q$, Cohen’s $d$ is defined as

$$d = \frac{\bar{x}_P - \bar{x}_Q}{s}, \qquad s = \sqrt{\frac{(n_P - 1)\, s_P^2 + (n_Q - 1)\, s_Q^2}{n_P + n_Q - 2}}, \tag{1}$$

where $s_P$ and $s_Q$ are the sample standard deviations for distributions $P$ and $Q$, respectively. In addition, we use the Hellinger Distance $H$ as a nonparametric comparison to quantify the similarity between the distributions [22]:

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}, \tag{2}$$

where $P = (p_1, \ldots, p_k)$ and $Q = (q_1, \ldots, q_k)$. To compute (1) and (2), we discretize the temperature and consumption distributions appropriately. Table 6 in the Appendix provides these metrics together with the differences in means for a selected subset of hours of the day, which was chosen to coincide with those hours of the day for which DR events were observed (see Figure 10). We omit the metrics for the remaining hours of the day as they are very similar to the listed ones. As the Hellinger Distance $H \in [0, 1]$, with 0 corresponding to a perfect similarity and 1 to total dissimilarity, we can assume that the assignment of users into treatment and control group is as good as random.
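Both balance metrics are straightforward to compute. The sketch below assumes Cohen's $d$ with the pooled sample standard deviation and a Hellinger distance computed on a shared histogram grid; the bin count and the synthetic samples are illustrative choices, not values from the study.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n_x, n_y = len(x), len(y)
    pooled_var = ((n_x - 1) * x.var(ddof=1) + (n_y - 1) * y.var(ddof=1)) / (n_x + n_y - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

def hellinger(x, y, bins=50):
    """Hellinger distance after discretizing both samples on common bin edges."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    p = np.histogram(x, bins=edges)[0] / len(x)
    q = np.histogram(y, bins=edges)[0] / len(y)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Synthetic hourly consumption (kWh) for two groups drawn from the same distribution.
rng = np.random.default_rng(0)
control = rng.normal(0.9, 0.5, size=5000)
treated = rng.normal(0.9, 0.5, size=10000)
d = cohens_d(treated, control)
h = hellinger(treated, control)
```

For two samples from the same distribution, both `d` and `h` should be close to zero, which is the pattern a successful randomization is expected to produce.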

3 Nonexperimental Treatment Effect Estimation

3.1 Potential Outcomes Framework

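To estimate the effect of the DR intervention program, we adopt the potential outcomes framework introduced by Rubin (1974) [23]. Let $\mathcal{I} = \{1, \ldots, n\}$ denote the set of users. The indicator $D_{it} \in \{0, 1\}$ encodes whether or not user $i$ received DR treatment at time $t$. Each user is endowed with a consumption time series $\mathbf{y}_i = \{y_{i1}, \ldots, y_{i\tau}\}$ and associated covariates $X_i = \{\mathbf{x}_{i1}, \ldots, \mathbf{x}_{i\tau}\}$, $\mathbf{x}_{it} \in \mathcal{X} \subseteq \mathbb{R}^{n_x}$, where time is indexed by $t \in \mathbb{T} = \{1, \ldots, \tau\}$ and $n_x$ is the dimension of the covariate space $\mathcal{X}$. Let $y_{it}^0$ and $y_{it}^1$ denote user $i$’s electricity consumption at time $t$ for $D_{it} = 0$ and $D_{it} = 1$, respectively. Let $\mathcal{C}_i$ and $\mathcal{T}_i$ denote the set of control and treatment times for user $i$. That is,

$$\mathcal{C}_i = \{ t \in \mathbb{T} \mid D_{it} = 0 \}, \qquad \mathcal{T}_i = \{ t \in \mathbb{T} \mid D_{it} = 1 \}. \tag{3}$$

The number of treatment hours is much smaller than the number of non-treatment hours. Thus $|\mathcal{T}_i| \ll |\mathcal{C}_i|$.

Further, let $\mathcal{D}_i^{\mathcal{T}}$ and $\mathcal{D}_i^{\mathcal{C}}$ denote user $i$’s covariate-outcome pairs of treatment and control times, respectively. That is,

$$\mathcal{D}_i^{\mathcal{T}} = \{ (\mathbf{x}_{it}, y_{it}^1) \mid t \in \mathcal{T}_i \}, \qquad \mathcal{D}_i^{\mathcal{C}} = \{ (\mathbf{x}_{it}, y_{it}^0) \mid t \in \mathcal{C}_i \}. \tag{4}$$

The one-sample estimate of the treatment effect on user $i$ at time $t$, given the covariates $\mathbf{x}_{it}$, is

$$\beta_{it}(\mathbf{x}_{it}) := y_{it}^1(\mathbf{x}_{it}) - y_{it}^0(\mathbf{x}_{it}), \tag{5}$$

which varies across time, the covariate space, and the user population. Marginalizing this one-sample estimate over the set of treatment times and the covariate space yields the user-specific Individual Treatment Effect (ITE)

$$\beta_i := \mathbb{E}_{\mathcal{X}}\, \mathbb{E}_{t \in \mathcal{T}_i} \left[ \beta_{it}(\mathbf{x}_{it}) \right]. \tag{6}$$

The average treatment effect on the treated (ATT) follows from (6):

$$\text{ATT} := \mathbb{E}_{i \in \mathcal{I}} \left[ \beta_i \right]. \tag{7}$$

Since users were put into different experimental groups in a randomized fashion, the ATT and the average treatment effect (ATE) are identical [24]. Lastly, the conditional average treatment effect (CATE) on $\tilde{\mathbf{x}}$ is obtained by marginalizing the conditional distribution of one-sample estimates (5) on $\tilde{\mathbf{x}}$ over all users and treatment times, where $\tilde{\mathbf{x}}$ is a subvector of $\mathbf{x}$:

$$\text{CATE}(\tilde{\mathbf{x}}) := \mathbb{E}_{i \in \mathcal{I}}\, \mathbb{E}_{t \in \mathcal{T}_i} \left[ \beta_{it}(\mathbf{x}_{it}) \mid \tilde{\mathbf{x}}_{it} = \tilde{\mathbf{x}} \right]. \tag{8}$$

The CATE captures heterogeneity among users, e.g. with respect to specific hours of the day, the geographic distribution of users, the extent to which a user possesses “smart home” appliances, group or peer effects, etc. To rule out the existence of unobserved factors that could influence the assignment mechanism generating the complete observed data set $\mathcal{D} = \bigcup_{i \in \mathcal{I}} \left( \mathcal{D}_i^{\mathcal{T}} \cup \mathcal{D}_i^{\mathcal{C}} \right)$, we make the following standard assumptions:

Assumption 1 (Unconfoundedness of Treatment Assignment).

Given the covariates $\mathbf{x}_{it}$, the potential outcomes are independent of treatment assignment:

$$\left( y_{it}^0, y_{it}^1 \right) \perp D_{it} \mid \mathbf{x}_{it}. \tag{9}$$

Assumption 2 (Stationarity of Potential Outcomes).

Given the covariates $\mathbf{x}_{it}$, the potential outcomes are independent of time, that is,

$$\left( y_{it}^0, y_{it}^1 \right) \perp t \mid \mathbf{x}_{it}. \tag{10}$$

Assumption 1 is the “ignorable treatment assignment” assumption introduced by Rosenbaum and Rubin [25]. Under this assumption, the assignment of DR treatment to users is implemented in a randomized fashion, which allows the calculation of unbiased ATEs (7) and CATEs (8). Assumption 2, motivated by the time-series nature of the observational data, ensures that the set of observable covariates can capture seasonality effects in the estimation of the potential outcomes. That is, the conditional distribution of the potential outcomes, given covariates, remains constant.

The fundamental problem of causal inference [26] refers to the fact that either the treatment or the control outcome can be observed, but never both (granted there are no missing observations). That is,

$$y_{it} = y_{it}^0 + D_{it} \left( y_{it}^1 - y_{it}^0 \right). \tag{11}$$

Thus, the ITE (6) is not identified, because one and only one of both potential outcomes is observed, namely $y_{it}^1$ for the treatment times $t \in \mathcal{T}_i$ and $y_{it}^0$ for the control times $t \in \mathcal{C}_i$. It therefore becomes necessary to estimate counterfactuals.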

3.2 Non-Experimental Estimation of Counterfactuals

Consider the following model for the estimation of such counterfactuals:

$$y_{it} = f_i(\mathbf{x}_{it}) + D_{it}\, \beta_{it}(\mathbf{x}_{it}) + \varepsilon_{it}, \tag{12}$$

where $\varepsilon_{it}$ denotes noise uncorrelated with covariates and treatment assignment. $f_i(\cdot): \mathcal{X} \to \mathbb{R}$ is the conditional mean function and pertains to $D_{it} = 0$. To obtain an estimate for $f_i$, denoted with $\hat{f}_i$, control outcomes $\{y_{it}^0\}_{t \in \mathcal{C}_i}$ are first regressed on $\{\mathbf{x}_{it}\}_{t \in \mathcal{C}_i}$, namely their observable covariates. In a second step, the counterfactual $\hat{y}_{it}^0$ for any $t \in \mathcal{T}_i$ can be estimated by evaluating $\hat{f}_i$ on its associated covariate vector $\mathbf{x}_{it}$. Finally, subtracting $\hat{y}_{it}^0$ from $y_{it}^1$ isolates the one-sample estimate $\hat{\beta}_{it}$, from which the user-specific ITE (6) can be estimated. Figure 4 illustrates this process of estimating the reduction during a DR event by subtracting the actual consumption $y_{it}^1$ from the predicted counterfactual $\hat{y}_{it}^0$. Although consumption can be predicted for horizons longer than a single hour, we restrict our estimators to a single-hour prediction horizon as DR events are at most one hour long.












[Figure 4 (plot) labels: actual consumption; ambient air temperature; Covariates for Estimation; estimated counterfactual; materialized consumption; DR Event; Estimated Reduction]

Figure 4: Estimation of the Counterfactual using Treatment Covariates and Predicted Reduction
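The two-step procedure (fit on control hours, predict on treatment hours, subtract) can be illustrated for a single synthetic user. This is a sketch under stated assumptions: the data are simulated, temperature is the only covariate, and plain least squares stands in for the conditional mean estimator. The effect size of -0.10 kWh is planted for the demonstration and is not an estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hours = 2000

# Synthetic user: consumption depends linearly on temperature (assumption).
temp = rng.uniform(10.0, 35.0, size=n_hours)
treated = np.zeros(n_hours, dtype=bool)
treated[rng.choice(n_hours, size=25, replace=False)] = True  # 25 DR hours
planted_effect = -0.10  # kWh, used only to generate the synthetic data
y = 0.3 + 0.02 * temp + rng.normal(0.0, 0.05, size=n_hours) + planted_effect * treated

# Step 1: fit the conditional mean function on control hours only.
X = np.column_stack([np.ones(n_hours), temp])
coef, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)

# Step 2: predict counterfactuals for the treatment hours.
y0_hat = X[treated] @ coef

# Step 3: one-sample estimates (observed minus counterfactual) and their mean, the ITE.
beta_hat = y[treated] - y0_hat
ite_hat = float(beta_hat.mean())
```

With 25 events and the noise level chosen here, `ite_hat` recovers the planted effect closely; real smart meter data are considerably noisier.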

To estimate $f_i$, we use the following classical regression methods [27], referred to as estimators:

  • E1: Ordinary Least Squares Regression (OLS)

  • E2: L1 Regularized (LASSO) Linear Regression (L1)

  • E3: L2 Regularized (Ridge) Linear Regression (L2)

  • E4: $k$-Nearest Neighbors Regression (KNN)

  • E5: Decision Tree Regression (DT)

  • E6: Random Forest Regression (RF)

DT (E5:) and RF (E6:) follow the procedure of Classification and Regression Trees [28]. We compare estimators (E1:)-(E6:) to the CAISO 10-in-10 Baseline (BL) [5], which, for any given hour on a weekday, is calculated as the mean of the hourly consumptions on the 10 most recent business days during the selected hour. For weekend days and holidays, the mean of the 4 most recent observations is calculated. This BL is further adjusted with a Load Point Adjustment, which corrects the BL by a factor proportional to the consumption three hours prior to a DR event [5].
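The averaging rule of the 10-in-10 baseline can be sketched as below. The sketch makes several simplifying assumptions that should be kept in mind: holidays are not modeled (only Saturday and Sunday count as non-business days), the Load Point Adjustment is omitted, and the function name, the lookback limit, and the dictionary layout are hypothetical.

```python
from datetime import datetime, timedelta

def ten_in_ten_baseline(consumption, event_time, max_lookback_days=60):
    """Mean consumption at the event hour over the most recent comparable days.

    consumption: dict mapping hourly datetime -> kWh (hypothetical layout).
    Business days use the 10 most recent business days, other days the 4 most
    recent non-business days. No Load Point Adjustment is applied here.
    """
    is_business = event_time.weekday() < 5
    needed = 10 if is_business else 4
    values = []
    day = event_time - timedelta(days=1)
    for _ in range(max_lookback_days):
        if (day.weekday() < 5) == is_business and day in consumption:
            values.append(consumption[day])
            if len(values) == needed:
                break
        day -= timedelta(days=1)
    if len(values) < needed:
        raise ValueError("not enough history for the baseline")
    return sum(values) / len(values)

# Synthetic history at 18:00: 1.0 kWh on business days, 2.0 kWh on weekends.
start = datetime(2017, 6, 1, 18)
history = {}
for k in range(45):
    ts = start + timedelta(days=k)
    history[ts] = 1.0 if ts.weekday() < 5 else 2.0

candidates = [start + timedelta(days=k) for k in range(45, 52)]
weekday_event = next(ts for ts in candidates if ts.weekday() < 5)
weekend_event = next(ts for ts in candidates if ts.weekday() >= 5)
weekday_baseline = ten_in_ten_baseline(history, weekday_event)
weekend_baseline = ten_in_ten_baseline(history, weekend_event)
```

Because the baseline is a plain average of recent same-hour readings, it cannot react to temperature or to the consumption immediately preceding the event, which is the limitation that the regression-based estimators address.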

Since users tend to exhibit a temporary increase in consumption in the hours following the DR intervention [1], we remove hourly observations following each DR event in order to prevent estimators (E1:)-(E6:) from learning from such spillover effects. This process is illustrated in Figure 5.


Figure 5: Separation of consumption time series into training set (green), DR Events (grey), and Periods of Spillover Effects (blue)

Hence the training data used to estimate the conditional mean function $f_i$ in (12) consists of all observations leading up to a DR event, excluding those that are within 8 hours of any DR event. To estimate user $i$’s counterfactual outcome $\hat{y}_{it}^0$ during a DR event $t \in \mathcal{T}_i$, we use the following covariates:

  • 5 hourly consumption values preceding time $t$

  • Air temperature at time $t$ and 4 preceding measurements

  • Hour of the day, an indicator variable for (non-)business days, and month of the year as categorical variables

Thus, the covariate vector $\mathbf{x}_{it}$ writes

$$\mathbf{x}_{it} = \left[ y_{i,t-1}, \ldots, y_{i,t-5},\; T_{it}, \ldots, T_{i,t-4},\; C(\text{HoD}_{it}) : C(\text{is\_Bday}_{it}) : C(\text{MoY}_{it}) \right]. \tag{13}$$

In (13), $T_{it}$ denotes temperature, $\text{HoD}_{it}$ hour of day, $\text{is\_Bday}_{it}$ an indicator variable for business days, and $\text{MoY}_{it}$ the month of year (all for user $i$ at time $t$). “C” denotes dummy variables and “:” their interaction.
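The numeric part of the covariate vector and the exclusion of observations near DR events can be sketched as follows. Two assumptions are made for illustration: the 8-hour exclusion is applied symmetrically around each event, and the categorical dummies are left out for brevity. The function names are hypothetical.

```python
import numpy as np

def build_covariates(y, temp, t, n_lags=5):
    """Numeric covariates for hour t: the 5 preceding consumption values,
    followed by the temperature at t and its 4 preceding measurements.
    Categorical dummies (hour, business day, month) are omitted in this sketch."""
    if t < n_lags:
        raise ValueError("not enough history before hour t")
    lagged_y = y[t - n_lags:t][::-1]             # y[t-1], ..., y[t-5]
    temps = temp[t - n_lags + 1:t + 1][::-1]     # T[t], ..., T[t-4]
    return np.concatenate([lagged_y, temps])

def training_mask(n_hours, event_hours, window=8):
    """True for hours usable as training data, i.e. more than `window`
    hours away from every DR event (symmetric window is an assumption)."""
    mask = np.ones(n_hours, dtype=bool)
    for e in event_hours:
        mask[max(0, e - window):e + window + 1] = False
    return mask

y = np.arange(10, dtype=float)            # toy consumption series
temp = 100.0 + np.arange(10, dtype=float) # toy temperature series
x_6 = build_covariates(y, temp, t=6)
mask = training_mask(30, event_hours=[10])
```

Here `x_6` stacks the five lagged consumption values and the five temperature readings for hour 6, and `mask` removes hours 2 through 18 around the event at hour 10.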

3.3 Placebo Treatments and De-biasing of Estimators

As previously mentioned, a crucial element of an estimator is unbiasedness. If an estimator systematically predicts counterfactuals that are too large (small), users receive an excess reward (are paid less) proportional to the amount of prediction bias. For a fair economic settlement, it is thus desirable to minimize the amount of bias. In our application, such prediction bias is caused by the following two factors:

  • Inherent bias of estimators: With the exception of OLS (E1:), (E2:)-(E6:) are inherently biased, which is justified due to the well-known bias-variance tradeoff.

  • Seasonal and temporal bias: Due to the experimental design, DR events for a particular user are concentrated within a period of 180 days after signing up. Further, DR events are called only in the afternoon and early evening (see Figure 10). Thus, fitting an estimator on all available historical data is likely to introduce bias during these time periods of interest.

To deal with these challenges, we use the de-biasing procedure presented in Algorithm 1, which was first introduced in [29].

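Input: Treatment data $\mathcal{D}_i^{\mathcal{T}}$, control data $\mathcal{D}_i^{\mathcal{C}}$, Estimator, weighting constant

1:  Split $\mathcal{D}_i^{\mathcal{C}}$ into training data $\mathcal{D}_i^{\text{tr}}$ and placebo data $\mathcal{D}_i^{\text{pl}}$ according to the empirical distribution of $\mathcal{T}_i$. Split control times $\mathcal{C}_i$ into training times $\mathcal{C}_i^{\text{tr}}$ and placebo times $\mathcal{P}_i$
2:  Compute weights for $\mathcal{D}_i^{\text{tr}}$ according to (15a)-(15c), using the weighting constant
3:  Fit conditional mean function $\hat{f}_i$ on $\mathcal{D}_i^{\text{tr}}$ with weights
4:  Estimate placebo counterfactuals $\hat{y}_{it}^0 = \hat{f}_i(\mathbf{x}_{it})$, $t \in \mathcal{P}_i$
5:  Compute bias on placebo treatment set: $b_i = \frac{1}{|\mathcal{P}_i|} \sum_{t \in \mathcal{P}_i} \left( \hat{y}_{it}^0 - y_{it}^0 \right)$
6:  Estimate treatment counterfactuals $\hat{y}_{it}^0 = \hat{f}_i(\mathbf{x}_{it})$, $t \in \mathcal{T}_i$
7:  Subtract placebo treatment bias from estimated treatment counterfactuals: $\hat{y}_{it}^0 \leftarrow \hat{y}_{it}^0 - b_i$, $t \in \mathcal{T}_i$
Algorithm 1 Unbiased Estimation of Counterfactuals

We first separate a subset of non-DR events from user $i$’s control data $\mathcal{D}_i^{\mathcal{C}}$, which we call the placebo set $\mathcal{D}_i^{\text{pl}}$ with associated placebo treatment times $\mathcal{P}_i$ (we chose $\mathcal{P}_i$ to be of size 25). This placebo set is drawn according to user $i$’s empirical distribution of Phase 1 DR events by hour of day and month of year. Next, the non-experimental estimator of choice is fitted (using cross-validation to find hyperparameters to minimize the mean squared prediction error) on the training set $\mathcal{D}_i^{\text{tr}}$. Importantly, to account for temporal bias, we assign weights to the training samples, ensuring that samples in “similar” hours or seasons as actual DR events are assigned larger weights. Specifically, the weights are determined as follows: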


where is a constant to be chosen a-priori.

Then, the fitted model is used to predict counterfactuals associated with placebo events. This yields a set of paired samples from which we can obtain a proxy of the estimation bias that remains even after assigning sample weights according to the previous step. Finally, to obtain an empirically de-biased estimate of actual Phase 1 DR events, we simply subtract this proxy of the estimation bias from predicted Phase 1 DR event outcomes.
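The placebo-based bias correction can be sketched independently of the estimator. The example below is a simplified illustration: sample weights are left out, the placebo hours are drawn uniformly rather than according to the empirical distribution of DR events, and the estimator is a deliberately biased "shrunken mean" chosen so that the correction is visible. All function names are hypothetical.

```python
import numpy as np

def debias_counterfactuals(fit, predict, train, placebo, X_treat):
    """Fit on training data, measure the mean prediction error on placebo
    hours, and subtract that bias from the treatment counterfactuals."""
    X_tr, y_tr = train
    X_pl, y_pl = placebo
    model = fit(X_tr, y_tr)
    bias = float(np.mean(predict(model, X_pl) - y_pl))
    return predict(model, X_treat) - bias, bias

# A deliberately biased toy estimator: predicts 80% of the training mean.
def fit_shrunk_mean(X, y, shrink=0.8):
    return shrink * float(np.mean(y))

def predict_constant(model, X):
    return np.full(len(X), model)

rng = np.random.default_rng(2)
y_control = rng.normal(1.0, 0.1, size=525)   # synthetic control-hour consumption
X_control = np.zeros((525, 1))               # covariates unused by the toy estimator
is_placebo = np.zeros(525, dtype=bool)
is_placebo[rng.choice(525, size=25, replace=False)] = True  # 25 placebo hours

y0_true = rng.normal(1.0, 0.1, size=25)      # unobserved counterfactuals (known here)
y0_hat, bias = debias_counterfactuals(
    fit_shrunk_mean,
    predict_constant,
    (X_control[~is_placebo], y_control[~is_placebo]),
    (X_control[is_placebo], y_control[is_placebo]),
    np.zeros((25, 1)),
)
residual_bias = float(np.mean(y0_hat - y0_true))
```

The raw estimator under-predicts by roughly 0.2 kWh, whereas the corrected counterfactuals are centered on the true values up to sampling noise.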

3.4 Estimation of Individual Treatment Effects

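To obtain point estimates for user $i$’s ITE $\beta_i$, we simply average all one-sample estimates (5) according to (6). To assess whether or not a given user has actually reduced consumption, we utilize a nonparametric permutation test with the null hypothesis of a zero ITE:

$$H_0: \beta_i = 0, \qquad H_1: \beta_i < 0. \tag{16}$$

Given user $i$’s paired samples $\{ (\hat{y}_{it}^0, y_{it}^1) \}_{t \in \mathcal{T}_i}$ during DR periods, the $p$-value associated with (16) is

$$p = \frac{1}{|\mathcal{S}_i|} \sum_{\Delta \in \mathcal{S}_i} \mathbb{1}\left( \bar{\Delta} \le \hat{\beta}_i \right). \tag{17}$$

In (17), $\bar{\Delta}$ denotes the mean of $\Delta$. $\mathcal{S}_i$ denotes the set of all possible assignments of signs to the pairwise differences in the set $\{ y_{it}^1 - \hat{y}_{it}^0 \}_{t \in \mathcal{T}_i}$. That is,

$$\mathcal{S}_i = \left\{ \{ s_t\, ( y_{it}^1 - \hat{y}_{it}^0 ) \}_{t \in \mathcal{T}_i} \;:\; s_t \in \{-1, +1\} \right\}, \tag{18}$$

which is of size $2^{|\mathcal{T}_i|}$. Finally, the $p$-value from (16) is calculated as the fraction of all possible assignments whose means are less than or equal to the estimated ITE $\hat{\beta}_i$. In practice, as the number of DR events per user in Phase 1 is about 25 (see Figure 2), enumerating all possible assignments becomes computationally infeasible. Thus, we randomly generate a subset of assignments from $\mathcal{S}_i$ to compute the $p$-value in (17). Moreover, we use the percentile bootstrap method [30] to compute a confidence interval of the estimated ITE for user $i$ around the point estimate $\hat{\beta}_i$.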
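A Monte Carlo version of the sign-flip permutation test and the percentile bootstrap can be sketched as follows. The number of random draws, the seeds, and the synthetic one-sample estimates are illustrative assumptions; the sketch treats the test as one-sided (reductions correspond to negative differences).

```python
import numpy as np

def sign_flip_pvalue(deltas, n_draws=10000, seed=0):
    """Fraction of random sign assignments whose mean is <= the observed mean."""
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, dtype=float)
    observed = deltas.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_draws, len(deltas)))
    perm_means = (signs * deltas).mean(axis=1)
    return float(np.mean(perm_means <= observed))

def percentile_bootstrap_ci(deltas, level=0.95, n_boot=10000, seed=0):
    """Percentile bootstrap confidence interval for the mean of the differences."""
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, dtype=float)
    idx = rng.integers(0, len(deltas), size=(n_boot, len(deltas)))
    boot_means = deltas[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot_means, [alpha, 1.0 - alpha])
    return float(lower), float(upper)

# Synthetic one-sample estimates (observed minus counterfactual) for 25 events.
rng = np.random.default_rng(4)
deltas = rng.normal(-0.10, 0.05, size=25)
p_value = sign_flip_pvalue(deltas)
ci_low, ci_high = percentile_bootstrap_ci(deltas)

# A user with no systematic response, for contrast.
p_null = sign_flip_pvalue([-1.0, 1.0] * 10)
```

For the synthetic reducer the p-value is close to zero and the interval lies below zero, while the symmetric series yields a p-value near one half.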

4 Nonexperimental Estimation Results

4.1 Average Treatment Effect

Figure 6 shows ATE point estimates and their 99% bootstrapped confidence intervals conditional on differing reward levels for all estimators as well as the CAISO BL. Due to the empirical de-biasing procedure (see Section 3.3), the point estimates for estimators E1:-E6: are close to each other. BL appears to be biased in favor of the DR provider (DRP), as it systematically predicts smaller reductions than E1:-E6:.

Figure 6: CATEs by Incentive Level with Bootstrapped Confidence Intervals

The ATE averaged over the predictions of estimators E1:-E6: corresponds to a reduction of about 0.10 kWh (11%). The slope of the fitted demand curve implies that users reduce an additional 0.013 kWh per dollar offered, which is only a small change. Because the CATE at an individual incentive level can be idiosyncratic, the slope and intercept have to be interpreted with caution. However, the results indicate a notable correlation between incentive levels and reductions.

To compare the prediction accuracy of the estimators, Table 2 reports the width of the confidence intervals for each method and incentive level. The inferiority of the CAISO baseline compared to the non-experimental estimators, among which RF achieves the tightest confidence intervals, becomes apparent. Therefore, in the remainder of this paper, we restrict all results achieved with non-experimental estimators to those obtained with RF.

Width of CATE Confidence Intervals (kWh) by Incentive Level
Estimator   0.05     0.25     0.50     1.00     3.00
RF          0.0211   0.0210   0.0212   0.0211   0.0205
Table 2: Width of 99% Confidence Intervals around ATE Point Estimate Conditional on Incentive Level, All Estimators

4.2 Individual Treatment Effects

Figure 7 plots ITEs for a randomly selected subset of 800 users who received at least 10 DR events in Phase 1, estimated with RF. Users are sorted by their point estimates (blue), whose 95% bootstrapped confidence intervals are drawn in black. Yellow lines represent users with at least one active smart home automation device. By marginalizing the point estimates over all users with at least 10 events, we obtain an ATE of kWh (11.4%), which is close to kWh as reported earlier. The difference ensues from only considering users with at least 10 DR events. The 99% ATE confidence interval is kWh.

Figure 7: Distribution of ITEs with Bootstrapped Confidence Intervals

Table 3 reports estimated ATEs for users with or without active smart home automation devices, which are obtained by aggregating the relevant estimated ITEs from Figure 7. We notice larger responses as well as a larger percentage of estimated reducers among automated users.

ATEs Conditional on Automation Status for Users with 10 DR Events
# Users % Reducers ATE (kWh) ATE (%)
Table 3: Estimated CATEs by Automation Status, RF Estimator (E6:)

Table 4 reports the percentage of significant reducers for different confidence levels, obtained with the permutation test under the null (16). From Tables 3 and 4, it becomes clear that automated users show larger reductions than non-automated ones, which agrees with expectations.

Fraction of Significant Reducers (among sample of size )
# Automated
% of Total
# Non-Automated
% of Total
# All
% of Total
Table 4: Estimated Percentage of Significant Reducers according to Permutation Test, RF Estimator (E6:)

5 ATE Estimation with Fixed Effects Models

To estimate the ATE of DR interventions on electricity consumption, we consider the following fixed-effects model with raw consumption (kWh) as the dependent variable:

$$y_{it} = X_{it}\, \beta + \alpha_{it} + u_{it}. \tag{19}$$

In (19), subscripts $i$ and $t$ refer to user $i$ at time $t$, respectively. $X_{it}$ is a row vector of observable covariates, $\alpha_{it}$ are unobserved fixed effects, and $u_{it}$ is the noise term, which is assumed to be uncorrelated with the regressors and Gaussian distributed with zero mean and finite variance. The fixed effects term $\alpha_{it}$ removes persistent differences across users in their hourly and monthly consumption interacted with a business day indicator variable:


5.1 Estimation by Incentive Level

To estimate the CATE by incentive level, the covariate matrix in (19) is specified as follows:


In (21a), is an indicator set to one for all treatment users (and zero for all control users). is the CAISO baseline for user at time , which is necessary to control for the non-random assignment of reward levels to users, is the ambient air temperature, and is the reward level.

5.2 Estimation by Hour of the Day

To estimate the CATE by hour of the day, we pool all reward levels into the indicator variable , which is one if user received treatment at time and zero otherwise:


5.3 Estimation by Month of the Year

The CATE by month of the year is found in a similar fashion to the CATE by hour of the day:


5.4 Role of Smart Home Automation

The CATE by automation status is determined by introducing the indicator :


5.5 Effect of Automation Uptake Encouragement

Lastly, the effect of incentivizing users to purchase a smart home automation device on energy consumption during DR events is determined as follows:


In (25), the indicators and are for all users in the “Treatment-Encouraged” and in “Treatment-Non-Encouraged”, respectively, and zero otherwise.
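Fixed-effects specifications of this kind are commonly estimated with a within transformation, i.e. by demeaning the outcome and the covariates inside each fixed-effect cell before running least squares. The sketch below illustrates that generic technique on synthetic data; it uses one fixed effect per user rather than the richer cells of the model above, and the covariates, effect sizes, and function name are assumptions made for the example.

```python
import numpy as np

def within_estimator(y, X, groups):
    """Least squares after subtracting group means from y and each column of X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    _, inverse = np.unique(np.asarray(groups), return_inverse=True)
    counts = np.bincount(inverse)
    y_dm = y - (np.bincount(inverse, weights=y) / counts)[inverse]
    X_dm = X.copy()
    for j in range(X.shape[1]):
        col_means = np.bincount(inverse, weights=X[:, j]) / counts
        X_dm[:, j] -= col_means[inverse]
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Synthetic panel: 50 users, 200 hours each, user-specific consumption levels.
rng = np.random.default_rng(3)
n_users, n_hours = 50, 200
user = np.repeat(np.arange(n_users), n_hours)
alpha = rng.normal(1.0, 0.5, size=n_users)[user]           # fixed effects
temp = rng.uniform(10.0, 35.0, size=n_users * n_hours)
treated = (rng.random(n_users * n_hours) < 0.05).astype(float)
y = alpha + 0.02 * temp - 0.10 * treated + rng.normal(0.0, 0.05, size=n_users * n_hours)

beta_hat = within_estimator(y, np.column_stack([treated, temp]), user)
```

The first entry of `beta_hat` estimates the planted treatment coefficient of -0.10 kWh and the second the temperature slope, without ever estimating the user-level intercepts explicitly.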

5.6 Comparison of Estimation Methods

We now benchmark the results obtained from the best non-experimental estimator (RF) to those from the fixed effects model with specification (20).

Figure 8 compares the point CATEs by reward levels and their 95% confidence intervals. It can be seen that the point estimates are close to each other ( kWh aggregated for fixed effects vs. for non-experimental approach with RF, a less than difference), a finding that suggests that our non-experimental estimation technique produces reliable estimates comparable to the experimental gold standard. The fact that the confidence intervals are notably tighter for RF corroborates this notion.

Figure 8: CATEs by Incentive Level with Confidence Intervals, Comparison Fixed Effects Estimators and Non-Experimental Estimators

6 Effect of Adaptive Targeting

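The goal of adaptive targeting is to maximize the reduction per dollar paid to the users, which is achieved by minimizing the payout and/or maximizing users’ reductions. We evaluate the reduction-per-reward ratios $\rho$ for the targeted and non-targeted groups by averaging the per-event reductions (5) normalized by the reward $r_{it}$:

$$\rho_{\text{T}} = \frac{\sum_{i \in \mathcal{I}_{\text{T}}} \sum_{t \in \mathcal{T}_{i,2}} \hat{\beta}_{it} / r_{it}}{\sum_{i \in \mathcal{I}_{\text{T}}} |\mathcal{T}_{i,2}|}, \tag{26a}$$

$$\rho_{\text{NT}} = \frac{\sum_{i \in \mathcal{I}_{\text{NT}}} \sum_{t \in \mathcal{T}_{i,2}} \hat{\beta}_{it} / r_{it}}{\sum_{i \in \mathcal{I}_{\text{NT}}} |\mathcal{T}_{i,2}|}, \tag{26b}$$

where $\mathcal{I}_{\text{T}}$ and $\mathcal{I}_{\text{NT}}$ denote the targeted and non-targeted users and $\mathcal{T}_{i,2}$ denotes user $i$’s set of Phase 2 DR events.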
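The ratio can be computed by pooling all Phase 2 events of a group. The sketch below assumes that every event carries equal weight (rather than averaging within users first) and uses a hypothetical dictionary layout; the numbers are made up for illustration.

```python
def reduction_per_reward(events_by_user):
    """Average of per-event effect divided by the reward offered for that event.

    events_by_user: dict mapping user id -> list of (effect_kwh, reward) pairs
    (hypothetical layout). All events are pooled with equal weight (assumption).
    """
    ratios = [
        effect / reward
        for events in events_by_user.values()
        for effect, reward in events
    ]
    if not ratios:
        raise ValueError("no events to average")
    return sum(ratios) / len(ratios)

# Illustrative values only: negative effects denote reduced consumption.
group = {
    "user_a": [(-0.2, 1.0), (-0.3, 3.0)],
    "user_b": [(-0.05, 0.05)],
}
ratio = reduction_per_reward(group)
```

The example shows why low rewards can dominate the ratio: the small reduction obtained at a reward of 0.05 contributes far more per dollar than the larger reductions bought at 1.00 or 3.00.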

6.1 Targeting Assignment Algorithm

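Algorithm 2 describes the targeting assignment algorithm on a given set of users, which we denote with $\mathcal{I}$.

Input: Set of users $\mathcal{I}$ with completed Phase 1
Output: Groups $\mathcal{I}_{\text{NT}}$, $\mathcal{I}_{\text{low}}$, and $\mathcal{I}_{\text{high}}$ of proportion 2:1:1

1:  Randomly split $\mathcal{I}$ into $\mathcal{I}_{\text{NT}}$ and $\mathcal{I}_{\text{T}}$, with $|\mathcal{I}_{\text{NT}}| = |\mathcal{I}_{\text{T}}|$
2:  Estimate ITE $\hat{\beta}_i$ with E6: for each user $i \in \mathcal{I}_{\text{T}}$.
3:  Sort $\{\hat{\beta}_i\}_{i \in \mathcal{I}_{\text{T}}}$ in ascending order
4:  Split $\mathcal{I}_{\text{T}}$ into $\mathcal{I}_{\text{low}}$ and $\mathcal{I}_{\text{high}}$, such that $|\mathcal{I}_{\text{low}}| = |\mathcal{I}_{\text{high}}|$ and $\hat{\beta}_i \le \hat{\beta}_j$ for all $i \in \mathcal{I}_{\text{low}}$, $j \in \mathcal{I}_{\text{high}}$
5:  Assign users in $\mathcal{I}_{\text{low}}$ and $\mathcal{I}_{\text{high}}$ to low- and high-targeted, respectively
Algorithm 2 Adaptive Targeting of Users

Users are transitioned into Phase 2 on a weekly basis. That is, for a particular week, all users who have reached 90 days of age in Phase 1 form the current weekly cohort, which is randomly split into a non-targeted group $\mathcal{I}_{\text{NT}}$ and a targeted group $\mathcal{I}_{\text{T}}$ of equal size (ties are broken randomly). For each user in $\mathcal{I}_{\text{T}}$, we calculate the ITE $\hat{\beta}_i$ based on Phase 1 events. These ITEs are then sorted in ascending order. The 50% largest reducers (with the most negative ITEs) are defined to be the low-targeted group $\mathcal{I}_{\text{low}}$, whereas the other half is assigned to the high-targeted group $\mathcal{I}_{\text{high}}$. This targeting scheme appears to be a double-edged sword: On the one hand, the DR provider (DRP) pays less money to large reducers and also achieves larger reductions for previously small reducers, increasing the desired ratio. On the other hand, previously large reducers now reduce less (in response to smaller rewards) and previously small reducers are paid more money for increased reductions, thereby counteracting the desired goal. However, the latter factors are dominated by the gains from the former ones, as we show in Section 6.3.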
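The assignment of one weekly cohort can be sketched in a few lines. The sketch follows the description above (random half non-targeted, remaining users sorted by estimated ITE, larger reducers receive low incentives); the function name, the seed handling, and the example ITEs are hypothetical, and the ITEs are taken as given rather than estimated.

```python
import random

def adaptive_targeting(ite_by_user, seed=0):
    """Split a cohort into non-targeted, low-targeted and high-targeted users.

    ite_by_user: dict mapping user id -> estimated Phase 1 ITE in kWh
    (more negative means a larger reduction).
    """
    rng = random.Random(seed)
    users = list(ite_by_user)
    rng.shuffle(users)                       # random split; also randomizes ties
    half = len(users) // 2
    non_targeted, targeted = users[:half], users[half:]
    targeted.sort(key=lambda u: ite_by_user[u])   # ascending: largest reducers first
    cut = len(targeted) // 2
    low_targeted = targeted[:cut]            # largest reducers receive low incentives
    high_targeted = targeted[cut:]           # smallest reducers receive high incentives
    return non_targeted, low_targeted, high_targeted

# Illustrative cohort of eight users with made-up ITEs.
ites = {f"u{k}": -0.1 * k for k in range(1, 9)}
non_targeted, low_targeted, high_targeted = adaptive_targeting(ites, seed=0)
```

Because Python's sort is stable and the list is shuffled beforehand, users with identical ITEs end up in a random relative order, which matches the random tie-breaking mentioned above.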

6.2 Validation of Adaptive Targeting

Algorithm 2 exploits the fact that users are relatively price inelastic (indeed, Figure 8 shows only a weak negative slope of the demand curve): it assigns large incentives to low reducers and small incentives to high reducers in order to minimize the total payout from the DRP to users. The attentive reader might wonder why the estimated ITE, rather than some other quantity, was chosen as the targeting criterion. To answer this, we performed a targeting exercise to determine which criterion is most suitable for maximizing (26a). The idea is to assign users to one of two targeted groups, based on one of the following criteria estimated from Phase 1 responses:

  • ITE

  • ITE normalized by average reward level received

  • Intercept of estimated individual demand curve

  • Slope of estimated individual demand curve

Each of these four criteria is computed in both kWh and percentage values. After a sufficient number of iterations, the ITE was found to be the criterion that maximizes (26a).
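To make the four candidate criteria concrete, the sketch below computes them for a single user from hypothetical Phase 1 data. It assumes that the ITE is the mean of the per-event effects and that the individual demand curve is an ordinary least-squares line of the per-event effect on the reward level; both are simplifications, and the function names are illustrative only.

```python
def fit_demand_curve(rewards, effects):
    """Least-squares line: effect is approximately intercept + slope * reward."""
    n = len(rewards)
    if n < 2 or n != len(effects):
        raise ValueError("need at least two paired observations")
    mean_x = sum(rewards) / n
    mean_y = sum(effects) / n
    sxx = sum((x - mean_x) ** 2 for x in rewards)
    if sxx == 0:
        raise ValueError("reward levels must vary to identify a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rewards, effects))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def targeting_criteria(rewards, effects):
    """Return the four candidate criteria for one user (kWh values)."""
    intercept, slope = fit_demand_curve(rewards, effects)
    ite = sum(effects) / len(effects)
    avg_reward = sum(rewards) / len(rewards)
    return {
        "ite": ite,
        "ite_per_reward": ite / avg_reward,
        "intercept": intercept,
        "slope": slope,
    }

# Hypothetical Phase 1 events of one user: reward level and estimated effect (kWh).
print(targeting_criteria([0.5, 1.0, 3.0], [-0.08, -0.10, -0.13]))
```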

6.3 Results of Adaptive Targeting

Table 5 reports the targeting metrics together with the CATE by treatment group, as well as the number of observations for targeted and non-targeted users. We restrict our analysis to samples obtained after June 27, 2017, as we observe larger effects of targeting in summer months.

Targeting Metrics for Phase 2
Table 5: Targeting Results for users between June 27, 2017 and December 31, 2017.

RF predicts an increase of about 30% in the reduction-per-reward ratio compared to the non-targeted scheme. For BL, this increase is smaller (15%). However, because the BL estimator is biased (see Figure 6), the RF estimate is more reliable. We can observe the tradeoff between smaller reductions (indeed, RF predicts a CATE for targeted users that is 34% smaller than for non-targeted users) and a reduced average payout, which decreases by 85% (not reported in Table 5). The latter effect dominates the decrease in net reductions, resulting in the 30% increase of the reduction-per-reward ratio (26a).

7 Conclusion

We analyzed Residential Demand Response as a human-in-the-loop cyber-physical system that incentivizes users to curtail electricity consumption during designated hours. Utilizing data collected from a Randomized Controlled Trial funded by the CEC and conducted by a Demand Response provider in the San Francisco Bay Area, we estimated the causal effect of hour-ahead price interventions on electricity reduction. To the best of our knowledge, this is the first major study to investigate DR on such short time scales.

We developed a non-experimental estimation framework and benchmarked its estimates against those obtained from an experimental Fixed-Effects Linear Regression Model. Importantly, the former does not depend on the existence of an experimental control group to construct the counterfactuals that are necessary to estimate the treatment effect. Instead, we employ off-the-shelf regression models to learn a consumption model on non-DR periods, which can then be used to predict counterfactuals during DR hours of interest. We find that the estimated treatment effects from both approaches are close to each other. The estimated ATE corresponds to a reduction of 0.10 kWh (11%) per Demand Response event and household. Further, we observe a weak positive correlation between the incentive level and the estimated reductions, suggesting that users are only weakly elastic in response to incentives.

The fact that the estimates obtained from both approaches are close to each other is encouraging, as our non-experimental framework goes a step further than the experimental method in that it allows for an estimation of individual treatment effects. From an economic perspective, being able to differentiate low from high responders allows for an adaptive targeting scheme, whose goal is to minimize the total payout to users while maximizing total reductions. We utilize this fact to achieve an increase of the reduction-per-reward ratio of 30%.

Lastly, we emphasize that the non-experimental estimation framework presented in this paper has the potential to generalize to similar human-in-the-loop cyber-physical systems that require the incentivization of users to achieve a desired objective. This is because our non-experimental framework, whose techniques are general rather than specific to Demand Response, admits results on an individual user level, which could be of particular interest in the incentivization of users in transportation or financial systems.

Future work includes the analysis of adversarial user behavior (baseline gaming) and advanced effects including peer and network effects influencing Demand Response. Also, we intend to investigate the effect of moral suasion and other non-monetary incentives on the reduction of electricity consumption of residential households.


.1 Summary Statistics

Figures 9-12 illustrate the distribution of the number of DR events received among users with completed Phase 1, as well as the total number of DR events broken out by hour of the day, day of the week, and month of the year.

Figure 9: Distribution of Number of Phase 1 DR Events Across Users with Completed Phase 1
Figure 10: Distribution of DR Events by Hour of the Day Across Users with Completed Phase 1
Figure 11: Distribution of DR Events by Day of the Week Across Users with Completed Phase 1
Figure 12: Distribution of DR Events by Month of the Year Across Users with Completed Phase 1

.2 Balance Checks

Table 6 provides the balance metrics introduced in Section 2.4.

Balance Metrics for Control and Treatment Group
Cohen’s D Hellinger Dist. Diff. Mean
air_temp, 0.005
historical obs. (hours)
Table 6: Balance Checks for Users in Control and Treatment Group
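
For readers who want to reproduce balance metrics of this kind, a minimal sketch of two of them follows. It assumes the pooled-standard-deviation form of Cohen's d and the Hellinger distance between two discrete (binned) distributions; the definitions in Section 2.4 may differ in detail, and the sample values are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same bins")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2)

# Hypothetical covariate samples (e.g. air temperature) for the two groups.
control = [15.2, 17.8, 21.1, 24.9, 19.3]
treatment = [15.0, 18.1, 20.7, 25.3, 19.6]
print(f"Cohen's d: {cohens_d(treatment, control):.3f}")
print(f"Hellinger: {hellinger([0.2, 0.5, 0.3], [0.25, 0.45, 0.3]):.3f}")
```

Values close to zero for both metrics indicate that the covariate is similarly distributed in the two groups.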

.3 Fixed Effects Regression Tables

Tables 7-10 provide the results of the Fixed Effects Regressions presented in Section 5. The point estimates of interest are printed in boldface and are accompanied by their standard errors as well as their 95% confidence intervals. The t-value of the regression gives rise to the p-value, which indicates statistical significance at the respective confidence levels.
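
The relationship between the reported quantities can be sketched as follows. The example assumes a standard normal reference distribution, which is only an approximation: the regressions in this paper may rely on a t distribution or clustered standard errors, so the intervals in the tables need not match this computation exactly. The coefficient used is hypothetical.

```python
import math
from statistics import NormalDist

def summarize_estimate(estimate, std_err, level=0.95):
    """Return (t_value, confidence_interval, p_value) for one coefficient.

    Assumption: normal approximation for both the interval and the p-value.
    """
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    t_value = estimate / std_err
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    ci = (estimate - z * std_err, estimate + z * std_err)
    p_value = math.erfc(abs(t_value) / math.sqrt(2))  # two-sided
    return t_value, ci, p_value

# Hypothetical coefficient, not taken from the regression tables.
t_value, ci, p_value = summarize_estimate(-0.10, 0.02)
print(f"t = {t_value:.2f}, CI = [{ci[0]:.3f}, {ci[1]:.3f}], p = {p_value:.2g}")
```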

Effect of DR by Incentive Level on Electricity Consumption
Parameter  Estimate (Std. Err.)  t-Value  95% Conf. Int.    p-value
           (0.003)               -2.100   [-0.013, 0]       0.047
           (0.010)               88.89    [0.859, 0.900]    0.001
           (0.002)               10.79    [0.017, 0.024]    0.001
           (0.014)               -8.532   [-0.148, -0.091]  0.001
           (0.018)               -6.910   [-0.157, -0.085]  0.001
           (0.016)               -7.369   [-0.147, -0.083]  0.001
           (0.020)               -6.219   [-0.166, -0.083]  0.001
           (0.010)               -12.95   [-0.157, -0.114]  0.001
Table 7: Fixed Effect Regression Results by Incentive Level
Effect of DR by Month of Year on Electricity Consumption
Parameter  Estimate (Std. Err.)  t-Value  95% Conf. Int.    p-value
           (0.003)               -1.962   [-0.014, 0.001]   0.078
           (0.016)               55.52    [0.844, 0.915]    0.001
           (0.006)               3.326    [0.007, 0.034]    0.008
           (0.010)               -4.298   [-0.063, -0.020]  0.002
           (0.009)               -2.571   [-0.041, -0.003]  0.028
           (0.004)               -18.62   [-0.085, -0.067]  0.002
           (0.004)               -15.14   [-0.071, -0.053]  0.001
           (0.005)               -20.07   [-0.104, -0.083]  0.001
           (0.008)               -20.25   [-0.172, -0.138]  0.001
           (0.007)               -32.68   [-0.242, -0.211]  0.001
           (0.008)               -19.39   [-0.177, -0.141]  0.001
           (0.014)               -9.071   [-0.218, -0.142]  0.001
           (0.009)               -3.055   [-0.050, -0.008]  0.012
           (0.010)               -2.172   [-0.045, 0.001]   0.055
Table 8: Fixed Effect Regression Results by Month of Year
Effect of Home Automation on Electricity Consumption
Parameter  Estimate (Std. Err.)  t-Value  95% Conf. Int.    p-value
           (0.003)               -2.101   [-0.013, 0]       0.047
           (0.010)               88.94    [0.859, 0.900]    0.001
           (0.002)               10.79    [0.017, 0.024]    0.001
           (0.042)               -7.800   [-0.418, -0.243]  0.001
           (0.014)               -7.310   [-0.132, -0.074]  0.001
Table 9: Fixed Effect Regression Results by Automation Status
Effect of Automation Uptake Incentive on Electricity Consumption
Parameter  Estimate (Std. Err.)  t-Value  95% Conf. Int.    p-value
           (0.003)               -1.422   [-0.012, 0.002]   0.168
           (0.003)               -2.485   [-0.015, -0.001]  0.021
           (0.024)               38.38    [0.886, 0.987]    0.001
           (0.002)               10.794   [0.017, 0.024]    0.001
           (0.016)               -7.703   [-0.153, -0.088]  0.001
           (0.015)               -8.304   [-0.156, -0.094]  0.001
Table 10: Fixed Effect Regression Results by Automation Uptake Encouragement

.4 Comparison of Estimation Methods

Figure 8 visually compares the CATEs broken out by incentive level, and it can be seen that both methods produce similar estimates. Figure 13 does the same for month of the year. Agreeing with intuition, the reductions are notably larger in summer months than in winter periods. Conditional on the automation status, the reductions that Table 9 reports for automated and non-automated users are close to those calculated by the non-experimental approach. Lastly, no significant difference in the magnitude of reductions can be found between encouraged and non-encouraged users.

Figure 13: CATEs by Month of Year with Confidence Intervals, Comparison Fixed Effects Estimators and Non-Experimental Estimators

.5 Correlation of Temperature and ITE

As mentioned in the previous subsection, larger reductions are estimated in warm summer months. To test whether such a correlation exists, Figure 14 plots the estimated ITEs against the average ambient air temperature observed during the relevant DR events. We observe a notable positive correlation between ambient air temperature and the magnitude of reductions. Indeed, a subsequent hypothesis test rejects the null hypothesis of a zero slope.
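
A zero-slope test of this kind can be sketched with a simple least-squares fit. The example below is illustrative only: the data are hypothetical, reductions are expressed as positive magnitudes in kWh, and the p-value uses a normal approximation rather than the exact t distribution.

```python
import math

def slope_test(x, y):
    """OLS fit of y on x; returns (slope, std_err, t_value, p_value)."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    if std_err == 0:
        raise ValueError("perfect fit: the test statistic is undefined")
    t_value = slope / std_err
    p_value = math.erfc(abs(t_value) / math.sqrt(2))  # two-sided, normal approx.
    return slope, std_err, t_value, p_value

# Hypothetical data: average air temperature (deg C) and reduction magnitude (kWh).
temperature = [15, 18, 22, 25, 30, 33]
reduction = [0.05, 0.07, 0.08, 0.12, 0.15, 0.16]
slope, std_err, t_value, p_value = slope_test(temperature, reduction)
print(f"slope = {slope:.4f} kWh per deg C, t = {t_value:.1f}, p = {p_value:.2g}")
```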

Figure 14: Correlation between Average Ambient Air Temperature and ITEs.

To support this notion, we marginalize ITEs for each ZIP code to obtain the geographic distribution of CATEs by location (see Figure 15). It is visually striking that users in coastal areas of California show smaller reductions than users in the Central Valley, where the climate is hotter.
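
The marginalization step amounts to averaging the user-level ITEs within each ZIP code. A minimal sketch follows, assuming one record per user; the ZIP codes and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cate_by_zip(records):
    """Average ITEs per ZIP code; `records` holds (zip_code, ite_kwh) pairs."""
    grouped = defaultdict(list)
    for zip_code, ite in records:
        grouped[zip_code].append(ite)
    return {zip_code: mean(ites) for zip_code, ites in grouped.items()}

# Hypothetical user-level ITEs (negative values denote reductions).
records = [
    ("94110", -0.04), ("94110", -0.06),   # coastal example
    ("93721", -0.15), ("93721", -0.21),   # Central Valley example
]
for zip_code, cate in sorted(cate_by_zip(records).items()):
    print(zip_code, round(cate, 3))
```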

Figure 15: Correlation between Average Ambient Air Temperature and CATEs.


  1. P. Palensky and D. Dietrich, “Demand Side Management: Demand Response, Intelligent Energy Systems, and Smart Loads,” IEEE Transactions on Industrial Informatics, vol. 7, no. 3, pp. 381–388, 2011.
  2. Federal Energy Regulatory Commission, “Assessment of Demand Response and Advanced Metering,” Tech. Rep., 2016.
  3. S. Borenstein, “The Long-Run Efficiency of Real-Time Electricity Pricing,” The Energy Journal, 2005.
  4. “Public Utilities Commission of the State of California: Resolution E-4728. Approval with Modifications to the Joint Utility Proposal for a Demand Response Auction Mechanism Pilot,” July 2015.
  5. “California Independent System Operator Corporation (CAISO): Fifth Replacement FERC Electric Tariff,” 2014.
  6. P. J. Diggle, P. Heagerty, K.-Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data.   Oxford University Press, 2013, vol. 2.
  7. B. Bollinger and W. R. Hartmann, “Welfare Effects of Home Automation Technology with Dynamic Pricing,” Stanford University, Graduate School of Business Research Papers, 2015.
  8. A. Abadie, A. Diamond, and J. Hainmueller, “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program,” Journal of the American Statistical Association, vol. 105, no. 490, pp. 493–505, 2010.
  9. S. Athey and G. W. Imbens, “Recursive Partitioning for Heterogeneous Causal Effects,” Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 27, pp. 7353–7360, 2016.
  10. K. Brodersen, F. Gallusser, J. Koehler, N. Remy, and S. Scott, “Inferring Causal Impact Using Bayesian Structural Time-Series Models,” The Annals of Applied Statistics, vol. 9, no. 1, pp. 247–274, 2015.
  11. J. W. Taylor and P. E. McSharry, “Short-Term Load Forecasting Methods: An Evaluation Based on European Data,” IEEE Transactions on Power Systems, vol. 22, no. 4, pp. 2213–2219, 2007.
  12. T. Senjyu, H. Takara, K. Uezato, and T. Funabashi, “One-Hour-Ahead Load Forecasting Using Neural Network,” IEEE Transactions on Power Systems, vol. 17, no. 1, pp. 113–118, 2002.
  13. E. E. Elattar, J. Goulermas, and Q. H. Wu, “Electric Load Forecasting Based on Locally Weighted Support Vector Regression,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 40, no. 4, 2010.
  14. P. Mirowski, S. Chen, T. K. Ho, and C.-N. Yu, “Demand Forecasting in Smart Grids,” Bell Labs Technical Journal, 2014.
  15. F. Chen, J. Dai, B. Wang, S. Sahu, M. Naphade, and C.-T. Lu, “Activity Analysis Based on Low Sample Rate Smart Meters,” Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 240–248, 2011.
  16. A. Molina-Markham, P. Shenoy, K. Fu, E. Cecchet, and D. Irwin, “Private Memoirs of a Smart Meter,” Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building, pp. 61–66, 2010.
  17. D. Zhou, M. Balandat, and C. Tomlin, “A Bayesian Perspective on Residential Demand Response Using Smart Meter Data,” 54th Allerton Conference on Communication, Control, and Computing, 2016.
  18. H. Allcott, “Rethinking Real-Time Electricity Pricing,” Resource and Energy Economics, vol. 33, no. 4, pp. 820–842, 2011.
  19. K. K. Jessoe, D. L. Miller, and D. S. Rapson, “Can High-Frequency Data and Non-Experimental Research Designs Recover Causal Effects?” Working Paper, 2015.
  20. “California Irrigation Management Information System,” 2017.
  21. T. Vincenty, “Geodetic Inverse Solution Between Antipodal Points,” Tech. Rep., 1975.
  22. M. S. Nikulin, “Hellinger distance,” Encyclopedia of Mathematics, http://www.encyclopediaofmath.org/index.php?title=Hellinger_distance&oldid=16453.
  23. D. B. Rubin, “Estimating Causal Effects of Treatments in Randomized and Non-Randomized Studies,” Journal of Educational Psychology, vol. 66, no. 5, pp. 688–701, 1974.
  24. J.-S. Pischke and J. D. Angrist, Mostly Harmless Econometrics, 1st ed.   Princeton University Press, 2009.
  25. P. R. Rosenbaum and D. B. Rubin, “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, vol. 70, no. 1, pp. 41–55, 1983.
  26. P. W. Holland, “Statistics and Causal Inference,” Journal of the American Statistical Association, vol. 81, no. 396, pp. 945–960, 1986.
  27. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning.   Springer New York, 2009.
  28. L. Breiman, J. Friedman, C. Stone, and R. A. Olshen, “Classification and Regression Trees,” CRC Press, 1984.
  29. M. Balandat, “New Tools for Econometric Analysis of High-Frequency Time Series Data - Application to Demand-Side Management in Electricity Markets,” University of California, Berkeley, PhD Dissertation, 2016.
  30. B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap.   CRC Press, 1994.