Building Model Identification during Regular Operation
– Empirical Results and Challenges
Abstract
The intertemporal consumption flexibility of commercial buildings can be harnessed to improve the energy efficiency of buildings, or to provide ancillary services to the power grid. To do so, a predictive model of the building’s thermal dynamics is required. In this paper, we identify a physics-based model of a multipurpose commercial building including its heating, ventilation and air conditioning system during regular operation. We present our empirical results and show that large uncertainties in internal heat gains, due to occupancy and equipment, present several challenges in utilizing the building model for long-term prediction. In addition, we show that by learning these uncertain loads online and dynamically updating the building model, prediction accuracy is improved significantly.
I Introduction
Commercial buildings account for more than 35% of electricity consumption in the U.S., 39% of which is due to heating, ventilation and air conditioning (HVAC) systems [1]. Energy consumption of HVAC systems can be partly shifted in time without compromising occupant comfort, because of buildings’ inherent thermal capacity. As a result, there has been extensive research, using frameworks such as model predictive control (MPC), trying to harness this intertemporal consumption flexibility and minimize energy usage of buildings [2, 3]. More recently, the feasibility of using buildings to provide ancillary services, such as frequency regulation, to the power grid has also been studied [4, 5, 6, 7, 8]. Such applications require accurate models describing the thermal dynamic behavior of the buildings.
I-A Desired Model Features and Challenges
The building models should be identified using actual experimental data and capture realistic disturbances, such as internal gains. Furthermore, bilinear, multizone models may quantify buildings’ electricity consumption flexibility more precisely: the bilinear thermal dynamics naturally arise from the physics of the HVAC system (refer to Section III for details); controllers designed using multizone models can allow the temperature of a room to fluctuate when it is unoccupied, for instance, hence achieving energy savings which are not possible when simplified models that approximate the building as a single zone are used.
However, there are many challenges in identifying such a building model. First, actual buildings often have different types of spaces that are subject to very different uncertainties, e.g. occupancy, which are difficult to capture. Second, buildings are often not sufficiently excited, as they must satisfy strict regulatory requirements during regular operation, which limit the type and duration of excitation experiments that can be conducted [9].
To circumvent these difficulties, various approaches have been taken in the research community. In [9, 10, 11], the authors identify data-driven linear models for a single type of building space. Lin et al. [12] conduct frequency regulation experiments under a controlled environment without disturbances such as solar radiation and occupants, thus simplifying the building model required to design the controller. Faced with insufficient excitation of buildings, Mehdi et al. [13] reduce the required model complexity by carrying out their experiments in a single room, whereas others [14, 15, 16] use lumped thermal models that approximate the building as a single zone, with an average building temperature. Finally, the authors of [17, 18] identify and test multizone models for a single floor and an entire building, respectively, using simulated data where uncertainties are precisely controlled or removed, and arbitrary excitations can be simulated.
These approaches are valuable in providing estimates of a building’s consumption flexibility; however, none of them delivers a model that satisfies all the aforementioned desired features for a more precise quantification of the building’s flexibility.
I-B Contributions
In this paper, we identify a semiparametric model for a multizone commercial building during regular operation. Our main contributions are the following:

We propose a procedure to identify a physics-based model of a multizone building that is easy to implement with the building in regular operation and captures internal gains such as occupancy. This procedure uses excitation experiments that actively perturb the building and generate data that can be used for more accurate parameter identification.

We provide an analysis of the model’s prediction accuracy versus its prediction horizon, and show that this model is limited in making long-term predictions, partly due to large disturbances such as internal gains, which are uncertain.

Finally, we propose to dynamically update the estimate of internal gains based on current temperature measurements, and show that this significantly improves the building model’s prediction accuracy.
This paper is organized as follows: We begin by describing the building and the excitation experiments in Section II. We then present the building model in Section III. Section IV describes our identification method for a fixed model and reports the prediction results. A model that is updated online is presented in Section V. Finally, we provide a discussion of uses and challenges related to building models.
Notation: Unless stated otherwise, subscripts in italics, as in $x_i$, denote instances of variables. Upright subscripts, as in $q_{\mathrm{IG}}$, denote variable names.
II Building and Excitation Experiments
II-A Building
Sutardja Dai Hall (SDH) is a building located on the University of California, Berkeley campus. For ease of presentation, we focus on the entire 4th floor of SDH, which has a total floor area of 1300 m$^2$. As shown in Figure 1, we aggregate the rooms into 6 zones: Northwest (NW), West (W), South (S), East (E), Northeast (NE) and Center (C). The northside rooms are grouped into 2 zones because of their distinct characteristics: rooms in zone Northwest are offices with windows, whereas zone Northeast includes elevators, restrooms and utility rooms, and does not have windows.
This building is equipped with a variable air volume (VAV) HVAC system, that is common to 30% of all U.S. commercial buildings [19]. The system contains large supply fans that drive air over cooling coils, cooling it down to a desired supply air temperature, and then distribute the air to VAV boxes that govern the airflow to different building zones. The supply air may be reheated at the VAV box before entering the room. The 4th floor of SDH is served by 21 VAV boxes.
II-B Excitation Experiments and Data
Data was collected during 11 nonconsecutive weeks between September 2014 and June 2015. This time span includes periods when the building was under normal operation as well as periods with excitation experiments. Recorded data points include room temperatures measured at all VAV boxes on the 4th floor of SDH, air inflow rates from each VAV box, HVAC system’s supply air temperature, outside ambient air temperature and solar radiation data recorded from a nearby weather station [20].
For accurate parameter identification, temperatures of neighboring zones should not be strongly correlated [21]. For buildings in regular operation, this is generally achievable through forced response experiments. Because of commercial buildings’ large thermal inertia, each forced excitation must last sufficiently long before temperature changes are observed. With these points in mind, we conducted our experiment as follows: Starting at 8am, every 2 hours, the supply airflow rate to one zone is set to its maximum value, minimum airflow rates are set for each of its neighboring zones and a random airflow rate is chosen for each remaining zone. This is repeated for each of the 6 zones. This experiment is performed during weekends as (a) it minimizes effects due to building occupancy on our data, and thus the subsequent parameter identification; (b) temporary violation of comfort constraints during the weekend was allowed.
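The excitation sequence described above can be sketched as a simple schedule generator. This is only an illustration under stated assumptions: the adjacency map `NEIGHBORS` is hypothetical (the real one follows from the floor plan in Figure 1), airflow setpoints are expressed as normalized fractions between the minimum (0.0) and maximum (1.0) flow, and the function name `excitation_schedule` is ours.

```python
import random

ZONES = ["NW", "W", "S", "E", "NE", "C"]

# Hypothetical adjacency map, for illustration only; the actual neighbors
# follow from the floor plan of the building.
NEIGHBORS = {
    "NW": ["W", "C", "NE"],
    "W": ["NW", "S", "C"],
    "S": ["W", "E", "C"],
    "E": ["S", "NE", "C"],
    "NE": ["NW", "E", "C"],
    "C": ["NW", "W", "S", "E", "NE"],
}


def excitation_schedule(start_hour=8, step_hours=2, seed=0):
    """Return one (start_hour, target_zone, setpoints) tuple per zone.

    Setpoints are normalized airflow fractions (assumption):
    1.0 = maximum airflow, 0.0 = minimum airflow.
    """
    rng = random.Random(seed)
    schedule = []
    for n, target in enumerate(ZONES):
        setpoints = {}
        for zone in ZONES:
            if zone == target:
                setpoints[zone] = 1.0           # excited zone: maximum airflow
            elif zone in NEIGHBORS[target]:
                setpoints[zone] = 0.0           # neighbors: minimum airflow
            else:
                setpoints[zone] = rng.random()  # remaining zones: random airflow
        schedule.append((start_hour + n * step_hours, target, setpoints))
    return schedule
```

Driving the excited zone and its neighbors in opposite directions is what decorrelates neighboring zone temperatures; the random setpoints elsewhere add further variation at no extra experimental cost.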
III Building Model
III-A RC Modeling and the BRCM Toolbox
We derive a Resistance-Capacitance (RC) model for our building using the Building Resistance-Capacitance Modeling (BRCM) MATLAB toolbox [17]. The RC modeling methodology first decomposes a building into building elements (BE), such as the bulk volume of air in each room, walls, floors and ceilings. Then, an electric analogy is used to derive an equivalent electrical circuit whose resistances and capacitances represent thermal resistances and thermal capacitances of the BEs, and whose voltages and currents represent temperatures of BEs and heat transfers between them. The thermal resistances and capacitances of BEs are completely characterized by their geometry and construction data such as density, convection coefficient and specific heat capacity.
In the BRCM toolbox, a building model consists of two parts: a thermal submodel and external heat flux submodels (EHFM). The thermal submodel describes passive heat transfer between the BEs and the EHFMs capture heat gain or loss due to external inputs and disturbances such as the outside environment. The BRCM toolbox semi-automates the derivation of an RC model by automatically computing the thermal submodel using the electrical circuit analogy and an input file that contains geometry and construction data of all BEs (e.g. we use an EnergyPlus file developed for the 4th floor of SDH as our input file). The EHFMs can be user defined, as different buildings may be subject to distinct inputs and disturbances.
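To make the electrical analogy concrete, the sketch below simulates a two-node RC network (one room air node and one wall node) with forward Euler integration. It is not the BRCM toolbox and does not use its API; the function name, the network layout and all numerical values are illustrative assumptions.

```python
def simulate_rc(t_room, t_wall, t_amb, r_rw, r_wa, c_room, c_wall, dt, steps):
    """Forward-Euler simulation of a toy two-node RC thermal network.

    Temperatures play the role of voltages, heat flows the role of currents.
    r_rw: thermal resistance room <-> wall [K/W]
    r_wa: thermal resistance wall <-> ambient [K/W]
    c_room, c_wall: thermal capacitances [J/K]
    dt: time step [s]
    """
    for _ in range(steps):
        q_rw = (t_wall - t_room) / r_rw   # heat flow from wall into room [W]
        q_wa = (t_amb - t_wall) / r_wa    # heat flow from ambient into wall [W]
        t_room += dt * q_rw / c_room
        t_wall += dt * (q_wa - q_rw) / c_wall
    return t_room, t_wall
```

With no external heat flux, both nodes relax to the ambient temperature; the EHFMs enter such a network as additional current sources at the corresponding nodes.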
III-B Building Model
In this section, we first describe the EHFMs for our building and then present the final state-space model of the building.
There are 3 EHFMs:

Building hull: convective heat transfer and solar radiation gains across exterior walls and windows.

HVAC: heat gain from the HVAC system.

Internal gains (IG): heat gain due to occupancy, electrical appliances and other unmodeled disturbances.
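Let $x \in \mathbb{R}^{289}$ be the state vector whose elements are the temperatures of all BEs on the 4th floor of SDH, $u \in \mathbb{R}^{21}$ be the air inflow rates from the 21 VAV boxes on this floor, and $v := [v_{\mathrm{Ta}}, v_{\mathrm{Ts}}, v_{\mathrm{sol}}^\top]^\top \in \mathbb{R}^{6}$ be the disturbance vector whose elements represent ambient air temperature, supply air temperature from the HVAC system and solar radiation from the four geographical directions, respectively. (Footnote 1: In the EnergyPlus file for the 4th floor of SDH, each wall, floor and ceiling is decomposed into 2 to 3 BEs. In RC building models, the temperature of each BE is modeled by one state variable, thus the model of the 4th floor of SDH has a large number of states: 289.)
Building Hull: If the $i$th BE is connected to the building hull, e.g. a room adjacent to the exterior wall, then the external heat fluxes acting on it due to the outside environment are modeled as:
$$Q_{\mathrm{BH},i}(x, v) = a_{\mathrm{wall},i} \left[ \gamma_{\mathrm{EW}} (v_{\mathrm{Ta}} - x_i) + \alpha_{\mathrm{EW}} v_{\mathrm{sol},i} \right] + a_{\mathrm{win},i} \left[ U_{\mathrm{win}} (v_{\mathrm{Ta}} - x_i) + \alpha_{\mathrm{win}} v_{\mathrm{sol},i} \right] \tag{1}$$
where $a_{\mathrm{wall},i}$ and $a_{\mathrm{win},i}$ are the total areas of the exterior wall and the window respectively, and $v_{\mathrm{sol},i}$ is the solar radiation affecting this BE. $\gamma_{\mathrm{EW}}$, $\alpha_{\mathrm{EW}}$, $U_{\mathrm{win}}$ and $\alpha_{\mathrm{win}}$ are tuning parameters of the model and their descriptions are given in Table I.
HVAC: If the $i$th BE is a room equipped with at least one VAV box, then the heat flux acting on it is:
$$Q_{\mathrm{HVAC},i}(x, u, v) = c_{\mathrm{p}} \sum_{j \in \mathcal{V}_i} u_j \, (v_{\mathrm{Ts}} - x_i) \tag{2}$$
where $c_{\mathrm{p}}$ is the specific heat capacity of air, $\mathcal{V}_i$ is the set of VAV boxes serving the $i$th room and $u_j$ is the $j$th element of $u$. Note that due to the lack of temperature measurements of the supply air at the outlet of VAV boxes, $v_{\mathrm{Ts}}$ is the supply air temperature upstream of the VAV boxes’ heating coils, i.e., heat gains due to reheating at the VAV boxes are not modeled by (2), but are captured by the internal gains EHFM in our model.
Internal Gains: If the $i$th BE is a room, then it is also subject to internal gains, which are modeled as:
$$Q_{\mathrm{IG},i}(t) = a_{\mathrm{floor},i} \left( \bar{q}_{\mathrm{IG},i} + q_{\mathrm{IG},i}(t) \right) \tag{3}$$
where $a_{\mathrm{floor},i}$ is the room’s floor area. $\bar{q}_{\mathrm{IG}} \in \mathbb{R}^{6}$ is an unknown constant vector representing a background time-invariant heat gain per unit area in each of the 6 zones, due to idle appliances. The function $q_{\mathrm{IG}}(\cdot)$ is an unknown function that captures time-varying internal gains in different zones. Finally, $\bar{q}_{\mathrm{IG},i}$ and $q_{\mathrm{IG},i}(t)$ are the relevant elements of $\bar{q}_{\mathrm{IG}}$ and $q_{\mathrm{IG}}(t)$ that correspond to the $i$th room.
After defining all EHFMs, the BRCM toolbox automatically generates the following model:
$$\begin{aligned} \dot{x}(t) &= A_{\mathrm{t}} x(t) + B_{\mathrm{t}} \left( Q_{\mathrm{BH}}(x(t), v(t)) + Q_{\mathrm{HVAC}}(x(t), u(t), v(t)) + Q_{\mathrm{IG}}(t) \right) \\ &= A^{\mathrm{c}} x(t) + B_{v}^{\mathrm{c}} v(t) + B_{\mathrm{IG}}^{\mathrm{c}} \left( \bar{q}_{\mathrm{IG}} + q_{\mathrm{IG}}(t) \right) + \sum_{j=1}^{21} \left( B_{xu,j}^{\mathrm{c}} x(t) + B_{vu,j}^{\mathrm{c}} v(t) \right) u_j(t) \end{aligned} \tag{4}$$
where in the first equality, $A_{\mathrm{t}}$, $B_{\mathrm{t}}$ represent the thermal submodel, and $Q_{\mathrm{BH}}$, $Q_{\mathrm{HVAC}}$ and $Q_{\mathrm{IG}}$ are the EHFMs, which are vector-valued functions as follows: if the $i$th BE is not subject to a specific EHFM, say HVAC, then $Q_{\mathrm{HVAC},i} = 0$, otherwise $Q_{\mathrm{HVAC},i}$ is given by Equation (2). The second equality is obtained by expressing $Q_{\mathrm{BH}}$, $Q_{\mathrm{HVAC}}$ and $Q_{\mathrm{IG}}$ in terms of $x$, $u$, $v$ and $q_{\mathrm{IG}}$, using (1) to (3). The bilinearities in (4) naturally arise from the physics of the HVAC system (refer to Equation (2)).
Finally, we discretized the model using a fixed time step of 15 min to obtain its approximate discrete-time model, which is semiparametric and bilinear:
$$\begin{aligned} x(k+1) &= A x(k) + B_{v} v(k) + B_{\mathrm{IG}} \left( \bar{q}_{\mathrm{IG}} + q_{\mathrm{IG}}(k) \right) + \sum_{j=1}^{21} \left( B_{xu,j} x(k) + B_{vu,j} v(k) \right) u_j(k) \\ y(k) &= C x(k) \end{aligned} \tag{5}$$
where $x$, $u$ and $v$ are as defined before, and $y \in \mathbb{R}^{6}$ represents the measured average temperature of each zone.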
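The structure of (2), (3) and (5) can be sketched in a few lines of plain Python. This is a toy illustration, not the BRCM-generated model: the function names are ours, matrices are nested lists, and the argument `q_ig` stands for the total internal gain per unit area (background plus time-varying part), which is a simplifying assumption of this sketch.

```python
def matvec(m, vec):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * b for a, b in zip(row, vec)) for row in m]


def hvac_flux(c_p, flows, t_supply, t_room):
    """HVAC heat flux into one room, in the form of (2) [W].

    flows: air inflow rates of the VAV boxes serving the room.
    """
    return c_p * sum(flows) * (t_supply - t_room)


def internal_gain_flux(floor_area, background, time_varying):
    """Internal gains heat flux into one room, in the form of (3) [W]."""
    return floor_area * (background + time_varying)


def bilinear_step(x, u, v, q_ig, A, B_v, B_ig, B_xu, B_vu):
    """One step of a discrete-time bilinear model shaped like (5).

    B_xu and B_vu are lists holding one matrix per input u_j.
    """
    nxt = [a + b + c for a, b, c in
           zip(matvec(A, x), matvec(B_v, v), matvec(B_ig, q_ig))]
    for j, u_j in enumerate(u):
        # Bilinear term: input u_j multiplies both states and disturbances.
        bil = [p + q for p, q in zip(matvec(B_xu[j], x), matvec(B_vu[j], v))]
        nxt = [n + u_j * b for n, b in zip(nxt, bil)]
    return nxt
```

The bilinear term is visible in `hvac_flux`: the airflow (an input) multiplies the difference between the supply temperature (a disturbance) and the room temperature (a state).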
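Table I: Model parameters and identified values

| Parameter | Description | Value [Unit] |
|---|---|---|
| $\gamma_{\mathrm{EW}}$ | exterior wall convection coeff. | 10.5 [W/(m$^2$K)] |
| $\gamma_{\mathrm{IW}}$ | interior wall convection coeff. | 29.4 [W/(m$^2$K)] |
| $\gamma_{\mathrm{floor}}$ | floor convection coeff. | 51.5 [W/(m$^2$K)] |
| $\gamma_{\mathrm{ceil}}$ | ceiling convection coeff. | 44.3 [W/(m$^2$K)] |
| $\alpha_{\mathrm{EW}}$ | ext. wall solar absorption coeff. | 0.75 [-] |
| $\alpha_{\mathrm{win}}$ | window solar absorption coeff. | 0.03 [-] |
| $U_{\mathrm{win}}$ | window heat transmission coeff. | 0.63 [W/(m$^2$K)] |
| $\bar{q}_{\mathrm{IG,NW}}$ | background heat gain in zone NW | 0.3 [W/m$^2$] |
| $\bar{q}_{\mathrm{IG,W}}$ | background heat gain in zone W | 8.0 [W/m$^2$] |
| $\bar{q}_{\mathrm{IG,S}}$ | background heat gain in zone S | 18.8 [W/m$^2$] |
| $\bar{q}_{\mathrm{IG,E}}$ | background heat gain in zone E | 8.0 [W/m$^2$] |
| $\bar{q}_{\mathrm{IG,NE}}$ | background heat gain in zone NE | 11.0 [W/m$^2$] |
| $\bar{q}_{\mathrm{IG,C}}$ | background heat gain in zone C | 8.0 [W/m$^2$] |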
IV Model Identification
The model identification process is carried out in two steps. In Section IV-A, we identify the model parameters listed in Table I. To simplify the parameter identification process, we use the approximation $q_{\mathrm{IG}}(\cdot) = 0$ for the time-varying internal gains during weekend days, in order to reduce (5) to a purely parametric model. With the optimal parameter values in hand, we then identify the function $q_{\mathrm{IG}}(\cdot)$ in Section IV-B.
IV-A Parameter Identification
The model parameters are estimated using data collected during two weekends in spring and summer, when excitation experiments were carried out. The identified model is then validated on data collected during a weekend in fall (using $q_{\mathrm{IG}}(\cdot) = 0$).
Let $\theta$ be the parameter vector whose elements are those parameters listed in Table I. We choose the optimal $\hat{\theta}$ that solves the following optimization problem:
$$\hat{\theta} = \arg\min_{\theta} \sum_{k=1}^{N} \left\| y(k) - \bar{y}(k) \right\|_2^2 \tag{6}$$
s.t. $x(0) = \hat{x}(0)$, and $x(k+1)$, $y(k)$ given by (5) with parameters $\theta$, $u(k) = \bar{u}(k)$, $v(k) = \bar{v}(k)$ and $q_{\mathrm{IG}}(\cdot) = 0$,
where $N$ is the number of samples; $\bar{y}$, $\bar{u}$ and $\bar{v}$ are measured zone temperatures, inputs and disturbances, respectively; and $\hat{x}(0)$ is the initial state estimated using a Kalman filter. Estimating $\hat{x}(0)$ is necessary since not all states can be measured (for example, the wall temperatures are not). In other words, we choose the set of parameters such that when the model is simulated with this set of parameters and the measured inputs and disturbances, the sum of squared errors between the measured temperatures and the simulated temperatures is minimized. We can use prior knowledge about the building to compensate for limited excitation of the system, e.g. we use initial guesses for parameter values that are physically plausible.
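The idea behind (6), choosing parameters that minimize the simulation error, can be illustrated on a one-parameter toy model. The sketch below is an assumption-laden stand-in: it replaces the bilinear model by a scalar cooling model $x(k+1) = x(k) + \theta\,(T_a(k) - x(k))$, and it uses a golden-section search, which we chose only to keep the example dependency-free; the optimizer actually used for (6) is not specified here. All names are hypothetical.

```python
def simulate(theta, x0, ambient):
    """Simulate the toy model x(k+1) = x(k) + theta * (T_a(k) - x(k))."""
    x, traj = x0, []
    for t_a in ambient:
        x = x + theta * (t_a - x)
        traj.append(x)
    return traj


def sse(theta, x0, ambient, measured):
    """Sum of squared errors between simulated and measured temperatures."""
    sim = simulate(theta, x0, ambient)
    return sum((y - m) ** 2 for y, m in zip(sim, measured))


def golden_section(cost, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar cost on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if cost(c) < cost(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


# Usage: recover a known parameter from noise-free synthetic data.
ambient = [10.0] * 48                      # constant ambient temperature
measured = simulate(0.1, 22.0, ambient)    # "measurements" with theta = 0.1
theta_hat = golden_section(lambda th: sse(th, 22.0, ambient, measured), 0.0, 1.0)
```

For the full model, the cost is non-convex in the parameters, which is why physically plausible initial guesses matter.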
The identified parameter values are reported in Table I. The root-mean-square (RMS) errors between the model’s simulated temperature for different zones and the measured temperatures for the training data are shown in Table II. This table also shows the RMS prediction errors when the identified model is used to predict a validation dataset (also on weekend data, using $q_{\mathrm{IG}}(\cdot) = 0$).
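Table II: RMS errors [°C] on the training and validation data

| Dataset | NW | W | S | E | NE | C | Mean |
|---|---|---|---|---|---|---|---|
| Training | 0.46 | 0.41 | 0.24 | 0.34 | 0.27 | 0.28 | 0.333 |
| Validation | 0.62 | 0.57 | 0.31 | 0.28 | 0.39 | 0.31 | 0.413 |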
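As a small worked check, the RMS error metric and the "Mean" column of Table II can be reproduced as follows. The helper names are ours; the per-zone values are copied from the table.

```python
import math


def rms_error(simulated, measured):
    """Root-mean-square error between two equally long sequences."""
    n = len(measured)
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured)) / n)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Per-zone RMS errors [°C] from Table II.
TRAINING = {"NW": 0.46, "W": 0.41, "S": 0.24, "E": 0.34, "NE": 0.27, "C": 0.28}
VALIDATION = {"NW": 0.62, "W": 0.57, "S": 0.31, "E": 0.28, "NE": 0.39, "C": 0.31}

mean_training = mean(list(TRAINING.values()))      # about 0.333
mean_validation = mean(list(VALIDATION.values()))  # about 0.413
```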
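IV-B Identification of the Time-Varying Internal Gains
A random subset of 8 weeks is selected from the entire dataset and used as training data for estimating the time-varying internal gains function $q_{\mathrm{IG}}(\cdot)$, and the remaining 3 weeks of data are used as a validation set. For each week $w$ in the training set, we estimate an instance of this function, $\hat{q}_{\mathrm{IG}}^{w}(\cdot)$. The final estimate of the function, $\hat{q}_{\mathrm{IG}}(\cdot)$, is defined as the average of all estimates $\hat{q}_{\mathrm{IG}}^{w}(\cdot)$.
More specifically, at each time $k$, let $\tilde{x}(k+1)$ and $\tilde{y}(k+1)$ denote the simulated state and measurement vectors obtained from (5) with $q_{\mathrm{IG}}(k) = 0$, i.e.:
$$\begin{aligned} \tilde{x}(k+1) &= A x(k) + B_{v} v(k) + B_{\mathrm{IG}} \bar{q}_{\mathrm{IG}} + \sum_{j=1}^{21} \left( B_{xu,j} x(k) + B_{vu,j} v(k) \right) u_j(k) \\ \tilde{y}(k+1) &= C \tilde{x}(k+1) \end{aligned} \tag{7}$$
By noting $y(k+1) - \tilde{y}(k+1) = C B_{\mathrm{IG}} q_{\mathrm{IG}}(k)$, we can estimate $q_{\mathrm{IG}}^{w}(k)$ by solving the following set of linear equations using ordinary least-squares:
$$\bar{y}^{w}(k+1) - \tilde{y}^{w}(k+1) = C B_{\mathrm{IG}} \, \hat{q}_{\mathrm{IG}}^{w}(k) \tag{8}$$
where $\bar{y}^{w}(k)$ is the measured zone temperature at time $k$ from the $w$th training week. Finally, the estimate $\hat{q}_{\mathrm{IG}}(\cdot)$ is obtained by:
$$\hat{q}_{\mathrm{IG}}(k) = \frac{1}{8} \sum_{w=1}^{8} \hat{q}_{\mathrm{IG}}^{w}(k) \tag{9}$$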
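The two steps in (8) and (9), backing out the gains from temperature residuals and averaging over training weeks, can be sketched as follows. To stay dependency-free, the sketch assumes that the matrix mapping gains to zone temperatures is diagonal, so that (8) decouples into one scalar equation per zone; this is a simplification for illustration, not a property claimed for the actual model. Function and argument names are hypothetical.

```python
def estimate_gains(measured, simulated, sensitivity):
    """Per-zone solution of (8) under a diagonal-sensitivity assumption.

    measured:    measured zone temperatures at time k+1
    simulated:   temperatures simulated with zero time-varying gains
    sensitivity: temperature rise per unit gain for each zone (toy values)
    """
    return [(y - y_sim) / s
            for y, y_sim, s in zip(measured, simulated, sensitivity)]


def average_profiles(weekly):
    """Element-wise mean over weekly gain profiles, as in (9).

    weekly[w][k][z] is the estimated gain of zone z at time k in week w.
    """
    n_weeks = len(weekly)
    n_steps = len(weekly[0])
    n_zones = len(weekly[0][0])
    avg = []
    for k in range(n_steps):
        avg.append([sum(week[k][z] for week in weekly) / n_weeks
                    for z in range(n_zones)])
    return avg
```

In the general case the sensitivity matrix is not diagonal, and a least-squares solver would replace the element-wise division.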
IV-C Impact of Internal Gains
The estimated average increase in room temperature due to internal gains, i.e., $C B_{\mathrm{IG}} \hat{q}_{\mathrm{IG}}(\cdot)$, is shown in red in Figure 2. It can be observed that internal gains profiles vary on both long and short time horizons from approximately 0°C to 1°C. A slightly larger increase in temperature of approximately 1.4°C is reported in [9] for a similar office space. This may be because the internal gains term in their model also includes heat gain from solar radiation, whereas our model captures the effects of solar radiation separately. Observe that the internal gains profiles increase during the day, peaking in the early afternoon and then slowly decrease, reaching a minimum at night. In addition, the profiles’ peaks are slightly lower during the weekends. These patterns coincide with when building occupants typically come into and leave the office.
The gray lines in Figure 2 show the same quantity estimated for each training week, i.e., $C B_{\mathrm{IG}} \hat{q}_{\mathrm{IG}}^{w}(\cdot)$. It can be observed that different zones experience different variations in internal gains across the training weeks. The zones West, East and Center are workspaces of students who tend to have regular schedules and hence, more regular internal gains patterns. On the other hand, the remaining three zones experience more uncertainty in internal gains, possibly due to the presence of windows, elevators and staircases, and known inaccuracies in the EnergyPlus input file for zone Northwest.
The identified model with $\hat{q}_{\mathrm{IG}}(\cdot)$ (a fixed function) is used to make 24-hour predictions of zone temperatures for all 3 weeks in the validation set, i.e., the state vector is estimated by a Kalman filter every 24 hours. Figure 4 shows the results for one of these weeks and the average RMS prediction error for all validation weeks is reported in Table III. Furthermore, Figure 3 shows that the model’s prediction accuracy decreases with increasing prediction horizon, which could be explained by uncertainties in internal gains as well as model inaccuracies. It is interesting to observe that the RMS prediction error for zone Northwest is the largest and it also increases the fastest as the prediction horizon increases, which is in accordance with this zone experiencing large variations in internal gains (Figure 2) and its geometry data in the EnergyPlus input file being inaccurate.
V Online Update of the Internal Gains Function
The previous section shows that the large time-varying internal gains are difficult to capture a priori; nevertheless, they can significantly affect our model’s prediction accuracy. In light of this, we consider a learning-based approach in this section, where we update the internal gains function online using past observations.
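In other words, instead of estimating a fixed function $\hat{q}_{\mathrm{IG}}(\cdot)$ a priori, at every time $k$, we estimate $\hat{q}_{\mathrm{IG}}(k-1)$ using (7) and (8), and then use the following simple model
$$\hat{q}_{\mathrm{IG}}(k) = \hat{q}_{\mathrm{IG}}(k-1) \tag{10}$$
to obtain an online estimate of $q_{\mathrm{IG}}(k)$, which is then used in (5) to predict $x(k+1)$ and $y(k+1)$. The intuition for (10) is that internal gains do not change significantly from time $k-1$ to $k$ (i.e., 15 min).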
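The persistence update in (10) can be illustrated on a scalar toy model $y(k+1) = a\,y(k) + b\,q(k)$, which is a deliberately stripped-down stand-in for (5); the coefficients `a` and `b` and the function name are assumptions made for this sketch.

```python
def predict_online(measured, a, b):
    """One-step-ahead predictions with an online-updated gain estimate.

    Toy model (assumption): y(k+1) = a * y(k) + b * q(k).
    Returns predictions of measured[2], measured[3], ...
    """
    preds = []
    for k in range(1, len(measured) - 1):
        # Back out the gain over the previous step from the latest residual.
        q_hat = (measured[k] - a * measured[k - 1]) / b
        # Persistence model (10): reuse that estimate for the current step.
        preds.append(a * measured[k] + b * q_hat)
    return preds


# Usage: with a constant gain, the persistence model predicts exactly.
a, b, q_true = 0.9, 0.1, 5.0
measured = [20.0]
for _ in range(20):
    measured.append(a * measured[-1] + b * q_true)
preds = predict_online(measured, a, b)
```

When the gain drifts slowly relative to the 15 min step, the one-step error is driven only by the change in gain between consecutive steps rather than by its absolute level.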
This model is simulated for all 11 weeks of data, one of which is shown in Figure 5. The average RMS prediction errors are reported in Table III. Thus, by dynamically updating $\hat{q}_{\mathrm{IG}}$ online, the model’s prediction accuracy is improved by 36% on average compared with when a fixed internal gains function was used.
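Table III: Average RMS prediction errors [°C] with a fixed and an online-updated internal gains function

| Model | NW | W | S | E | NE | C | Mean |
|---|---|---|---|---|---|---|---|
| Fixed | 0.84 | 0.42 | 0.48 | 0.43 | 0.36 | 0.38 | 0.485 |
| Online Updated | 0.50 | 0.31 | 0.15 | 0.41 | 0.32 | 0.16 | 0.308 |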
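The reported 36% average improvement follows directly from the "Mean" column of Table III. The short check below recomputes it from the per-zone values in the table; the helper names are ours.

```python
# Per-zone RMS prediction errors [°C] from Table III.
FIXED = {"NW": 0.84, "W": 0.42, "S": 0.48, "E": 0.43, "NE": 0.36, "C": 0.38}
ONLINE = {"NW": 0.50, "W": 0.31, "S": 0.15, "E": 0.41, "NE": 0.32, "C": 0.16}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def improvement_percent(before, after):
    """Relative error reduction in percent."""
    return 100.0 * (before - after) / before


mean_fixed = mean(list(FIXED.values()))      # about 0.485
mean_online = mean(list(ONLINE.values()))    # about 0.308
overall = improvement_percent(mean_fixed, mean_online)   # about 36 %
per_zone = {z: improvement_percent(FIXED[z], ONLINE[z]) for z in FIXED}
```

The per-zone figures show that the gain is uneven: zones South and Center improve the most, while zone East changes little.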
VI Discussion
In Section IV, we conducted excitation experiments to actively perturb our building. Data collected during the experiments and additional weekends is used with the approximation that the time-varying internal gains are zero, to identify the model parameters. Then, we estimate a fixed internal gains function, $\hat{q}_{\mathrm{IG}}(\cdot)$, using 8 weeks of measurements. The resulting model is used to make 24-hour predictions of the building’s temperature profile for 3 additional weeks, and an average RMS error of 0.48°C is achieved. Figures 2 and 3 suggest that our building is subject to large uncertain internal gains which make accurate long-term predictions difficult. For buildings that are subject to fewer uncertainties, a model identified using this procedure may achieve better prediction accuracy.
In addition, there are several approaches that can be taken to further enhance the model’s prediction accuracy. When weekend data is used to identify the model parameters, more sophisticated approximations of the occupancy function, such as sinusoids, can be used. Sensors can be installed in the VAV boxes to measure the temperature of the supply air, from the HVAC system, downstream of the heating coils. Moreover, occupancy sensors can be used to improve the estimate of internal gains and hence the model’s prediction quality.
In Section V, we dynamically updated our estimate of the internal gains function using current temperature measurements. More specifically, we assume the current heating load, due to internal gains, remains constant during the next time step. We apply this model to make 24-hour predictions and demonstrate that its prediction accuracy is significantly improved (compare Figure 4 with Figure 5). In addition, using more sophisticated regression techniques and taking into account other factors such as past heating loads and room temperatures may further improve the prediction accuracy and extend the prediction horizon.
For the frequency regulation application, a model is first applied to estimate the building’s power consumption for the next 24 hours, in order to determine its reserve capacity for the day-ahead reserve market. MPC can then be used to provide these reserves without violating comfort constraints. Thus, by learning $q_{\mathrm{IG}}(\cdot)$ online and dynamically updating it in an MPC controller, errors from the 24-hour prediction may be corrected during reserve provision.
VII Conclusions
We describe an approach to construct a physics-based model of a multizone commercial building, which uses experimental data measured during regular building operation. We show that large uncertainties in internal gains present several challenges in applying the model for long-term prediction of a building’s thermal dynamics. In addition, we show that by dynamically updating the estimates of internal gains, the model’s prediction accuracy is improved significantly.
Future work will investigate the tradeoff between uncertainty in internal gains and prediction accuracy, as well as the necessary model complexity for a good control performance, in particular for harnessing building flexibility. In addition, we are working on experimentally verifying the performance of controllers designed using our building model.
Acknowledgment
The authors thank Rongxin Yin for providing the EnergyPlus model, David Sturzenegger for his assistance with using the BRCM Toolbox and the SDH building manager Domenico Caramagno for his help in facilitating our experiments.
References
 [1] “The annual energy outlook 2013,” US Energy Information Administration, Tech. Rep., 2013.
 [2] A. Parisio, L. Fabietti, M. Molinari, D. Varagnolo, and K. H. Johansson, “Control of HVAC Systems via Scenario-Based Explicit MPC,” IEEE Conference on Decision and Control, 2014.
 [3] J. Široký, F. Oldewurtel, J. Cigler, and S. Prívara, “Experimental Analysis of Model Predictive Control for an Energy Efficient Building Heating System,” Applied Energy, vol. 88, pp. 3079–3087, 2011.
 [4] F. Baccino, F. Conte, S. Massucco, F. Silvestro, and S. Grillo, “Frequency regulation by management of building cooling systems through model predictive control,” 18th power systems computation conference, pp. 1–7, August 2014.
 [5] M. Maasoumy, C. Rosenberg, A. Sangiovanni-Vincentelli, and D. Callaway, “Model predictive control approach to online computation of demand-side flexibility of commercial buildings HVAC systems for supply following,” American Control Conference, pp. 1082–1089, June 2014.
 [6] E. Vrettos, F. Oldewurtel, F. Zhu, and G. Andersson, “Robust provision of frequency reserves by office building aggregations,” Proceedings of the 19th IFAC World Congress, 2014.
 [7] E. Vrettos, F. Oldewurtel, and G. Andersson, “Robust energy-constrained frequency reserves from aggregations of commercial buildings,” IEEE Transactions on Power Systems, 2016.
 [8] M. Balandat, F. Oldewurtel, M. Chen, and C. Tomlin, “Contract design for frequency regulation by aggregations of commercial buildings,” 52nd Annual Allerton Conference on Communication, Control and Computing, September 2014.
 [9] A. Aswani, N. Master, J. Taneja, V. Smith, A. Krioukov, D. Culler, and C. Tomlin, “Identifying models of HVAC systems using semiparametric regression,” American Control Conference, 2012.
 [10] A. Aswani, N. Master, J. Taneja, A. Krioukov, D. Culler, and C. Tomlin, “Energy-efficient building HVAC control using hybrid system LBMPC,” 4th IFAC Nonlinear Model Predictive Control Conference, August 2012.
 [11] D. Zhou, Q. Hu, and C. J. Tomlin, “Model comparison of a data-driven and a physical model for simulating HVAC systems,” arXiv:1603.05951, 2016.
 [12] Y. Lin, P. Barooah, S. Meyn, and T. Middelkoop, “Experimental evaluation of frequency regulation from commercial building HVAC system,” IEEE Transactions on Smart Grid, vol. 6, no. 2, pp. 776–783, 2015.
 [13] M. Maasoumy, M. Razmara, M. Shahbakhti, and A. Sangiovanni-Vincentelli, “Handling model uncertainty in model predictive control for energy efficient buildings,” Energy and Buildings, vol. 77, pp. 377–392, 2014.
 [14] F. Oldewurtel, A. Parisio, C. Jones, M. Morari, D. Gyalistras, M. Gwerder, V. Stauch, B. Lehmann, and K. Wirth, “Energy efficient building climate control using stochastic model predictive control and weather predictions,” American Control Conference, pp. 5100–5105, July 2010.
 [15] Y. Ma, F. Borrelli, B. Hencey, B. Coffey, S. Bengea, and P. Haves, “Model predictive control for the operation of building cooling systems,” IEEE Transactions on control systems technology, pp. 796–803, 2011.
 [16] H. Hao, T. Middelkoop, P. Barooah, and S. Meyn, “How demand response from commercial buildings will provide the regulation needs of the grid,” 50th Annual Allerton Conference on communication, control and computing, pp. 1908–1913, October 2012.
 [17] D. Sturzenegger, D. Gyalistras, M. Morari, and R. Smith, “Semi-automated modular modeling of buildings for model predictive control,” BuildSys 2012 – Workshop of ACM SenSys Conference, 2012.
 [18] B. Sun, P. Luh, Q. Jia, Z. Jiang, F. Wang, and C. Song, “Building energy management: integrated control of active and passive heating, cooling, lighting, shading, and ventilation systems,” IEEE Transactions on automation science and engineering, pp. 588–602, 2012.
 [19] “Commercial buildings energy consumption survey (cbecs): Overview of commercial buildings, 2003,” Energy Information Administration, U.S. Department of Energy, Tech. Rep., 2008. [Online]. Available: http://www.eia.doe.gov/emeu/cbecs/cbecs2003/overview1.html
 [20] “CIMIS station reports,” California Irrigation Management Information System, Tech. Rep., 2015. [Online]. Available: http://www.cimis.water.ca.gov/
 [21] Y. Lin, T. Middelkoop, and P. Barooah, “Issues in identification of control-oriented thermal models of zones in multi-zone buildings,” 51st IEEE Conference on Decision and Control, December 2012.