Transfer Learning for HVAC System Fault Detection
Abstract
Faults in HVAC systems degrade thermal comfort and energy efficiency in buildings and have received significant attention from the research community, with data-driven methods gaining in popularity. Yet the lack of labeled data, such as normal versus faulty operational status, has slowed the application of machine learning to HVAC systems. In addition, for any particular building, there may be an insufficient number of observed faults over a reasonable amount of time for training. To overcome these challenges, we present a transfer methodology for a novel Bayesian classifier designed to distinguish between normal operations and faulty operations. The key is to train this classifier on a building with a large amount of sensor and fault data (for example, via simulation or standard test data) and then transfer the classifier to a new building using a small amount of normal operations data from the new building. We demonstrate a proof-of-concept for transferring a classifier between architecturally similar buildings in different climates and show that few samples are required to maintain classification precision and recall.
I Introduction
Buildings account for roughly 40% of electrical demand in the United States [2] and climate control is one of the largest sources of power consumption in many buildings. The normal operation of heating, ventilation, and air conditioning (HVAC) systems is therefore critical for simultaneously maintaining energy efficiency and thermal comfort. Because of the widespread deployment of sensors, multiple data-driven algorithms have consequently been developed to detect faulty operation of HVAC systems [19, 11, 14]. A fundamental challenge of applying these types of machine learning algorithms, however, is the lack of labeled data.
Data-driven fault detection algorithms rely on having data about both the normal and faulted operation of HVAC systems [11, 23]. Despite the growth in buildings equipped with a large number of sensors that can generate high-resolution measurements [4], it is nontrivial and labor-intensive to correctly label each data point as coming from faulty or normal operation. In addition, HVAC systems mostly operate (fortunately) under fairly normal conditions. Therefore operators may need to either 1) wait for a long time to collect enough fault data (even if they can be correctly labeled) to train a useful algorithm, or 2) rely on established industry standards [22] and exhaustively simulate potential scenarios. Neither option takes direct and immediate advantage of the rich stream of data from well-equipped buildings.
Transfer learning [13] provides a potential solution to these challenges. A predictor trained on an existing, labeled data set can be used as a starting point to train a predictor for the same task in which labeled data is limited but known to be in a similar embedding [12]. Transfer learning has been used successfully in image classification [15, 25]. An image classifier, for example, is trained to recognize a set of image labels; to transfer the classifier to new images, initializing with the previously learned classifier requires far fewer examples of the new labels to achieve good accuracy. Transfer learning has only very recently begun to be applied to, for example, predicting energy consumption in buildings [14, 18].
In this work we develop a transferable, naive Bayesian framework for detecting faults and failures resulting from component degradation in three key steps:

1) We derive a novel log-likelihood classifier that depends only on building normal operations data and an estimated state transition matrix.

2) For a building with a large, labeled data set of HVAC component operations and weather data, we learn a normal operations state transition matrix.

3) With the same model parameters, we transfer the learned state transition matrix with a limited number of samples from a similar building.
We accomplish item (1) by specifying a matrix normal prior to derive a novel log-likelihood classifier that determines whether a series of HVAC system state observations was generated by the learned state transition matrix or by some other faulty state transition matrix having arisen as a random perturbation. Item (3) employs weighted least squares to transfer the learned state transition matrix to a new building for which labeled data is limited but the feature space is similar (e.g. type and number of relevant HVAC components), using model parameters learned in item (2).
To test our framework we first perform simple, motivating numerical simulations. We then proceed to transfer an hourly state transition model trained on a standard medium office building simulated by EnergyPlus [3] to a physically monitored [9] testbed site, the Systems Engineering Building (SEB) located at Pacific Northwest National Laboratory (PNNL) [6, 7] in Richland, WA. We separately compare how effectively a classifier can be transferred between similar office buildings simulated by EnergyPlus in different climates subject to known fault conditions. In Sec. II we introduce the model and Sec. III describes the classifier and transfer procedure. In Sec. IV we present our results, and we conclude with Sec. V.
II Model
The state of an HVAC system can be described by a linear transformation with a polynomial kernel [1]. Let be a dependent state variable (e.g. fan power, indoor temperature, pump status) and be an independent state variable (e.g. time, outdoor temperature, humidity). Furthermore, let (the vector componentwise raised to the ’th power for and concatenated) and the kernelized, concatenated dependent and independent state vectors of the HVAC system of dimension . We will assume that if an HVAC system is operating normally, then a finite sequence of observations of will have been generated by
(1) 
where and is the true state transition matrix, which can be estimated via kernel regression. A primary advantage of using kernel regression is that the state estimator is readily interpretable and easy to use in transfer learning, while in general also requiring fewer samples to parameterize and train than a more general model like a neural network [1, 18, 21].
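As a rough illustration of the model described above, the following sketch builds polynomial features by concatenating componentwise powers and generates observations from a linear state transition with additive noise. The names `poly_features`, `simulate_normal_operation`, `A_true`, the dimensions, and the use of unit-variance Gaussian noise are assumptions made for this example only.

```python
import numpy as np

def poly_features(v, degree):
    """Concatenate the componentwise powers v**1, ..., v**degree."""
    v = np.asarray(v, dtype=float)
    return np.concatenate([v ** p for p in range(1, degree + 1)])

def simulate_normal_operation(A, Z, rng):
    """Return y = A z + noise for each row z of Z (unit-variance Gaussian noise, assumed)."""
    noise = rng.standard_normal((Z.shape[0], A.shape[0]))
    return Z @ A.T + noise

rng = np.random.default_rng(0)
raw = rng.standard_normal((200, 3))                       # 200 samples of 3 raw state variables
Z = np.stack([poly_features(r, degree=2) for r in raw])   # kernelized inputs, shape (200, 6)
A_true = rng.standard_normal((2, Z.shape[1]))             # hypothetical 2 x 6 transition matrix
Y = simulate_normal_operation(A_true, Z, rng)             # dependent states, shape (200, 2)
```

Because the model stays linear in the kernelized features, every entry of the matrix ties one feature to one output, which is what makes the estimator easy to inspect.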
III Fault Detection and Model Transfer
III-A Bayesian Fault and Degradation Detection
An entry of determines the relationship between the input and output states for explicit components of an HVAC system directly—we leverage this to derive a Bayesian classifier for determining if an HVAC system is operating in a faulty state without assuming explicitly what the faulty state transition matrix should look like.
By a faulty state, we mean that the HVAC is governed by some other transition matrix, denoted . For an observation of , there are two distinct probabilities: either the current dependent state of the HVAC system was generated by the normal state transition matrix , or by some faulty state transition matrix .
To compute these probabilities and derive a classifier, we make two assumptions: 1) we assume a Gaussian prior probability on the entries such that (i.e. a matrix normal distribution [8] centered at ) and 2) initially, an HVAC system is equally likely to be operating in a faulty state or a normal state at any given time. Note these assumptions are used to derive the detection algorithm and provide some intuition, but the actual simulation study in Sec. IV uses real data which may not follow them. The probabilities of observing conditioned on the state transition matrices and are
(2a)  
(2b) 
Eqn. (8a) is the probability a sample was generated by some other state transition matrix assumed to have a Bayesian prior probability distribution of a matrix normal random variable with mean and identity row- and column-wise covariance (explained in Sec. IV). This approach fundamentally differs from many HVAC fault detection systems that are rule-based [17, 22], or from algorithms that are learned from rule-based scenario generation [20], as no modes of failure are assumed beforehand. These methods can be combined to produce more informative priors for entries of that describe the relationship between fan power consumption and air flow volume, for example. Furthermore, directly incorporating failure statistics with informed HVAC component failure rates [10] is also possible but outside the scope of this paper as we seek to naively transfer a state transition matrix. By naive, we mean that no ground truth fault data is required to train. Once these probabilities are computed we can state a simple classification rule as
(3) 
where a positive difference in log-likelihoods indicates the system is operating normally and a negative value indicates the observed data was generated by faulty operations.
We show that (3) depends only on the entries of the normal operations state transition matrix and the sequence of observations . Indeed, for a single point , the classification rule (3) simplifies to,
(4) 
where
(5a)  
(5b) 
For convenience, let us define the binary classification output of (4) to be the function .
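A minimal sketch of such a single-point rule follows, under the two assumptions stated earlier (unit-variance Gaussian noise, and a faulty matrix whose entries are independent unit-variance Gaussians around the normal matrix). Under those assumptions the output given the input is Gaussian with covariance scaled by one plus the squared norm of the input, which gives the closed form used below. The names `A_hat`, `z`, `y` and both function names are illustrative, and whether this expression matches (4) term by term is not verified here.

```python
import numpy as np

def loglik_difference(A_hat, z, y):
    """Log-likelihood of normal operation minus log-likelihood of a random fault.

    Assumes y = A z + unit-variance noise under normal operation, and a faulty
    matrix with i.i.d. unit-variance Gaussian entries centered at A_hat.
    """
    r = y - A_hat @ z          # residual under the normal model
    s = float(z @ z)           # squared norm of the input
    m = y.shape[0]             # number of dependent states
    return 0.5 * m * np.log1p(s) - 0.5 * float(r @ r) * s / (1.0 + s)

def is_normal(A_hat, z, y):
    """True when the observation is more likely under normal operation."""
    return loglik_difference(A_hat, z, y) > 0

A_hat = np.eye(2)
z = np.array([1.0, 0.0])
print(is_normal(A_hat, z, np.array([1.0, 0.0])))    # small residual
print(is_normal(A_hat, z, np.array([10.0, 0.0])))   # large residual
```

Note that the rule needs only the estimated normal matrix and the observation; no faulty matrix is ever specified.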
III-B Model Transfer
Since the classification rule only depends on the observed data and the true state transition matrix , a reliable empirical estimation of can be used in the classifier to distinguish normal operations from previously unseen faulty operations. The process of transferring a learned classifier from one building to another is outlined in Alg. 1. We use data collected from building 1, and generated by a building which has been certified to be running or simulated at normal operations to estimate by solving the least squares problem:
(6) 
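A least squares estimate of this kind can be sketched in closed form as below. The function name `fit_transition_matrix` and the optional `ridge` argument (a penalty on the Frobenius norm of the estimate) are assumptions for illustration; rows of `Z` are kernelized inputs and rows of `Y` are the corresponding outputs.

```python
import numpy as np

def fit_transition_matrix(Z, Y, ridge=0.0):
    """Least-squares estimate of A in Y ~ Z A^T, with an optional Frobenius-norm penalty."""
    n_features = Z.shape[1]
    gram = Z.T @ Z + ridge * np.eye(n_features)   # regularized Gram matrix
    return np.linalg.solve(gram, Z.T @ Y).T       # shape (n_outputs, n_features)

rng = np.random.default_rng(1)
Z = rng.standard_normal((500, 6))
A_true = rng.standard_normal((2, 6))              # hypothetical ground truth
Y = Z @ A_true.T + 0.1 * rng.standard_normal((500, 2))
A_hat = fit_transition_matrix(Z, Y, ridge=1e-3)
```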
To transfer the classifier to an architecturally similar building 2, a new set of samples and is collected from the building. Rather than solving (6) over again, however, we use the previously learned as the initial value when solving the new kernel regression problem with weighted least squares [16] (WLS) to find the new estimation of the state transition matrix . Data from building 1, and are given low weight, and new data and are given higher weight. The building 1 data set serves to constrain the degrees of freedom of the new model, while the building 2 data set updates the operating levels (e.g. temperatures, power consumption) of the various HVAC components under consideration.
Because the thermodynamic laws that govern the HVAC systems in building 1 and building 2 are the same (only the climate, operating characteristics of the HVAC components, and the building materials may differ), we assume the span of and to be similar sets; thus initializing gradient descent at to learn will require fewer samples from building 2. Cross-validation is used to determine optimal choices of weights for WLS.
This algorithm requires that building 1 and building 2 have comparable HVAC systems, where each system component represented in a row or column in has an analogous entry (possibly aggregated, e.g. sum of supply and exhaust fan power) in the true state transition matrix we wish to transfer our estimate to. Note that this requirement can be relaxed for if only a subset of components are of interest. That is, building 1 only needs to have similar components to those we wish to detect faulty operation for in building 2.
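The transfer step can be sketched as a weighted least squares fit over the pooled data, with a low weight on building 1 rows and a higher weight on building 2 rows. This is a simplified stand-in: it solves the weighted problem in closed form instead of initializing gradient descent at the building 1 estimate, and the function name `transfer_wls`, the weight values, and the synthetic data are all assumptions.

```python
import numpy as np

def transfer_wls(Z1, Y1, Z2, Y2, w1, w2, ridge=0.0):
    """Weighted least squares over pooled data from building 1 (weight w1) and building 2 (weight w2)."""
    Z = np.vstack([Z1, Z2])
    Y = np.vstack([Y1, Y2])
    w = np.concatenate([np.full(len(Z1), w1), np.full(len(Z2), w2)])
    Zw = Z * w[:, None]                               # each row scaled by its weight
    gram = Z.T @ Zw + ridge * np.eye(Z.shape[1])      # sum_i w_i z_i z_i^T
    return np.linalg.solve(gram, Zw.T @ Y).T          # solves the weighted normal equations

rng = np.random.default_rng(3)
A1 = rng.standard_normal((2, 6))                      # hypothetical building 1 matrix
A2 = A1 + 0.3 * rng.standard_normal((2, 6))           # similar, shifted building 2 matrix
Z1 = rng.standard_normal((500, 6)); Y1 = Z1 @ A1.T
Z2 = rng.standard_normal((4, 6));   Y2 = Z2 @ A2.T    # only a few building 2 samples
A2_hat = transfer_wls(Z1, Y1, Z2, Y2, w1=0.01, w2=1.0)
```

With fewer building 2 samples than features, a fit on building 2 alone is underdetermined; the low-weight building 1 rows fill in the directions the new data does not cover.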
IV Results
Here we state how the classifier is computed from the probabilities found in (4). Then we present illustrative numerical results on the number of additional samples required to transfer a classifier based on a matrix learned from data simulated by EnergyPlus to a new matrix based on data from PNNL’s SEB. Then we conclude with transferring a classifier designed to detect EnergyPlus simulated degradation in a variable air volume (VAV) box supply fan due to a fouling filter from an office building operating in a Seattle winter climate to a similar building operating in a Seattle summer climate.
IV-A Posterior probability of fault matrix
In order to derive the equation found in (4), we need to compute (8a): the posterior probability of the fault matrix . An matrix normal random variable centered at has a PDF of the form
(7a) 
This is a random matrix where each element is normally distributed around , with row-wise covariance matrix () and column-wise covariance matrix (). If we assume the covariance matrices and are identity, then each element of is independent of the others.
With an abuse of notation on the indefinite integral, and a matrix normal prior on with identity row- and column-wise covariance, the conditional probability of observed given is given by,
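For reference, the matrix normal density in its standard textbook form can be written as follows. The symbols are illustrative choices and not necessarily those of (7a): X is the n × p random matrix, M its mean, U the n × n row-wise covariance, and V the p × p column-wise covariance.

```latex
p(X \mid M, U, V) =
  \frac{\exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\!\left[V^{-1}(X-M)^{\top}U^{-1}(X-M)\right]\right)}
       {(2\pi)^{np/2}\,|V|^{n/2}\,|U|^{p/2}}
\qquad\text{and, for } U = I_n,\ V = I_p:\qquad
p(X \mid M) = (2\pi)^{-np/2}\exp\!\left(-\tfrac{1}{2}\,\|X-M\|_F^{2}\right).
```

In the identity case the density factorizes into a product of independent unit-variance normals, one per matrix entry.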
(8a) 
We can compute this probability by combining the exponential terms, completing the square, and rescaling the integrand such that the integral evaluates to 1 by definition. We therefore have that (8a) equals
(9a)  
(9b)  
(9c) 
and results in a square in terms of . Letting matrix and matrix defined as (5a) and (5b) respectively we have that
(10a)  
(10b) 
Note that the first trace term in (10b) contains the only term with . We can factor the second and third trace terms out of the integrand in (8a). Indeed,
(11a) 
Let,
(12) 
is analogous to the column-wise covariance size . Noting that is always invertible and also size , we rescale the integral by such that it evaluates to 1 (by definition in (7a)), thus we have that,
(13a)  
(13b)  
(13c) 
Thus we have the probability that the observed samples were not generated by the true state transition matrix ,
(14) 
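A shorter route to a result of this kind, under the identity-covariance assumption, is sketched below. The notation is assumed for this sketch only: A is the normal m × n matrix, B the faulty matrix with independent unit-variance Gaussian entries around A, z the kernelized input, y the output, and ε unit-variance Gaussian noise. It is not a reproduction of steps (9a) to (13c).

```latex
\begin{aligned}
y &= Bz + \varepsilon = Az + Ez + \varepsilon,
  \qquad B = A + E,\quad E_{ij} \sim \mathcal{N}(0,1)\ \text{i.i.d.},\\
Ez &\sim \mathcal{N}\!\left(0,\ \|z\|^{2} I_m\right)
  \;\Longrightarrow\;
  y \mid z \sim \mathcal{N}\!\left(Az,\ (1+\|z\|^{2})\, I_m\right),\\
\log p(y \mid z, A) &= -\tfrac{m}{2}\log 2\pi - \tfrac{1}{2}\,\|y - Az\|^{2},\\
\log p(y \mid z, \text{fault}) &= -\tfrac{m}{2}\log 2\pi
  - \tfrac{m}{2}\log\!\left(1+\|z\|^{2}\right)
  - \frac{\|y - Az\|^{2}}{2\,(1+\|z\|^{2})},\\
\log p(y \mid z, A) - \log p(y \mid z, \text{fault}) &=
  \tfrac{m}{2}\log\!\left(1+\|z\|^{2}\right)
  - \tfrac{1}{2}\,\|y - Az\|^{2}\,\frac{\|z\|^{2}}{1+\|z\|^{2}}.
\end{aligned}
```

Since 1 + ‖z‖² = det(I_n + z zᵀ), the first term is a log-determinant, and the remaining terms depend only on the residual under A; no faulty matrix appears in the final expression.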
IV-B Classifier
Combining probabilities (2a) and (14), we can use the difference of their respective log-likelihoods to derive the classifier by substituting the computed probabilities into (3). For classifier values greater than 0, an observation is more likely to have come from an HVAC system operating normally, while values less than 0 indicate otherwise. Simplifying gives us the state classification rule in (4).
The joint probability of output values conditioned on the input values factorizes across samples since the noise is i.i.d. This means the log-likelihood is additive, and for multiple samples, (3) can be written as,
(15) 
and a sequence of observations can be used to increase the confidence of the classifier . This assumes that both the noise in the data and the degree of perturbation in faulty state transitions have unit variance. This is not the case in our EnergyPlus simulations, let alone in practice. To overcome this, in Sec. IV-E we use the log-det term and the separated trace terms of (4) as input features of a logistic regression classifier. In practice, as we demonstrate below, this lets us sidestep for now the problem of estimating the variance of the elements of when a fault occurs, while still showing that the terms of (4) are a valid featurization of the data and the estimated for naive fault classification. We will denote this modification of the classifier as .
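One way to picture this featurization is sketched below: each sample is mapped to a log-det term and two residual terms, whose plain sum is the unit-variance log-likelihood difference and which could instead be handed to any logistic regression implementation. The exact grouping of terms in (4) is not reproduced here, so the three columns, like the names `loglik_features` and `sequence_score`, are assumptions based on unit-variance noise and a unit-variance perturbation prior.

```python
import numpy as np

def loglik_features(A_hat, Z, Y):
    """Per-sample features: log-det term, normal-model residual term, fault-model residual term."""
    R = Y - Z @ A_hat.T                 # residuals under the normal model
    s = np.sum(Z * Z, axis=1)           # squared input norms
    r2 = np.sum(R * R, axis=1)          # squared residual norms
    m = Y.shape[1]
    return np.column_stack([
        0.5 * m * np.log1p(s),          # log-det term
        -0.5 * r2,                      # residual term of the normal log-likelihood
        0.5 * r2 / (1.0 + s),           # residual term of the fault log-likelihood
    ])

def sequence_score(A_hat, Z, Y):
    """Sum of per-sample log-likelihood differences; positive suggests normal operation."""
    return float(loglik_features(A_hat, Z, Y).sum())

A_hat = np.eye(2)
Z = np.array([[1.0, 0.0], [1.0, 0.0]])
Y = np.array([[1.0, 0.0], [10.0, 0.0]])
features = loglik_features(A_hat, Z, Y)   # shape (2, 3), usable as classifier inputs
score = sequence_score(A_hat, Z, Y)
```

Letting a logistic layer learn the weight on each column plays the role of rescaling the terms when the true variances are unknown.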
IV-C Numerical Results: Classification
As an initial demonstration of the effectiveness of via Monte Carlo, we consider a fixed matrix with diagonal entries and and elsewhere. For each Monte Carlo iteration, we perturb the entries of such that to generate an unseen faulty operations matrix . We then generate 1000 i.i.d. samples of input-output pairs and using (1) where input values are also normally distributed with zero mean and unit variance.
Using (15) as our classification rule , we compute the F1 score for increasing numbers of sample pairs (“lag”) per classification. Figure 1 illustrates the F1 score as a function of for each realization of , with simple lines of best fit to illustrate the trend. As the Frobenius norm of the difference between and increases, the bases of and are more likely to diverge and span increasingly different sets. Once more than a very small difference between and emerges, the F1 score of approaches 1.
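A small Monte Carlo experiment in this spirit can be sketched as follows. The identity normal matrix, the 5 × 5 dimension, the unit-variance perturbation, and the helper names are all assumptions for illustration and do not reproduce the experiment behind Figure 1. Windows of `lag` consecutive samples are scored by summing per-sample log-likelihood differences, and a window is flagged as faulty when the sum is negative.

```python
import numpy as np

def window_scores(A_hat, Z, Y, lag):
    """Sum per-sample log-likelihood differences over consecutive windows of length lag."""
    R = Y - Z @ A_hat.T
    s = np.sum(Z * Z, axis=1)
    r2 = np.sum(R * R, axis=1)
    per_sample = 0.5 * Y.shape[1] * np.log1p(s) - 0.5 * r2 * s / (1.0 + s)
    n_windows = len(per_sample) // lag
    return per_sample[: n_windows * lag].reshape(n_windows, lag).sum(axis=1)

def f1_score(pred_fault, true_fault):
    """F1 score with 'fault' as the positive class."""
    tp = np.sum(pred_fault & true_fault)
    fp = np.sum(pred_fault & ~true_fault)
    fn = np.sum(~pred_fault & true_fault)
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))

def monte_carlo_f1(lag, n_samples=1000, dim=5, seed=0):
    rng = np.random.default_rng(seed)
    A = np.eye(dim)                                   # assumed normal-operations matrix
    B = A + rng.standard_normal((dim, dim))           # randomly perturbed faulty matrix
    Z = rng.standard_normal((n_samples, dim))         # zero-mean, unit-variance inputs
    Y_normal = Z @ A.T + rng.standard_normal((n_samples, dim))
    Y_fault = Z @ B.T + rng.standard_normal((n_samples, dim))
    scores = np.concatenate([window_scores(A, Z, Y_normal, lag),
                             window_scores(A, Z, Y_fault, lag)])
    n_windows = n_samples // lag
    true_fault = np.concatenate([np.zeros(n_windows, dtype=bool),
                                 np.ones(n_windows, dtype=bool)])
    return f1_score(scores < 0, true_fault)

for lag in (1, 5, 10):
    print(lag, monte_carlo_f1(lag))
```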
IV-D Transfer Results: Feasibility and samples savings performance
Here we demonstrate that a state transition model learned on one building can be transferred to another with similar HVAC system components using a limited number of samples. Building 1 is simulated with EnergyPlus (v. 9.1.0) using a preconfigured example medium office building and Seattle, WA weather data. Building 2 is the SEB on PNNL’s main campus in Richland. For both buildings we collect hourly total building power demand, primary air handling unit (AHU) supply and exhaust fan power demand, total lighting power demand, main floor internal zone temperature, as well as outdoor temperature, humidity, day of week and hour of day. All training data is normalized by feature according to the minimum and maximum training data values and time data is embedded as coordinates on a unit circle.
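The preprocessing described above can be sketched as follows: min-max scaling fitted on training data only, and cyclic time features mapped to coordinates on the unit circle. The function names and the choice of a 24-hour and 7-day period are assumptions for illustration.

```python
import numpy as np

def fit_minmax(train):
    """Per-feature minimum and range computed from training data only."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return lo, span

def apply_minmax(data, lo, span):
    """Scale features using the training minimum and range."""
    return (data - lo) / span

def embed_cyclic(value, period):
    """Map a cyclic quantity (hour of day, day of week) to (cos, sin) on the unit circle."""
    angle = 2.0 * np.pi * np.asarray(value, dtype=float) / period
    return np.column_stack([np.cos(angle), np.sin(angle)])

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
lo, span = fit_minmax(train)
scaled = apply_minmax(train, lo, span)
hours = embed_cyclic(np.array([0, 6, 23]), period=24)   # hour of day
days = embed_cyclic(np.array([0, 3, 6]), period=7)      # day of week
```

The unit-circle embedding keeps hour 23 close to hour 0, which a raw integer encoding would not.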
The buildings differ in a small number of ways: the medium office building configuration in EnergyPlus has 3 floors, with three AHUs and 3 VAV boxes (1 VAV box per floor); hot and chilled water are provided on site. The SEB has 23 VAV boxes served by two AHUs and 2 additional AHUs with constant-speed fans serving one constant air volume box each. For the purpose of the transfer we select a single, centrally located VAV box on the 1st floor of the EnergyPlus simulated medium office building, and on the SEB main floor (VAV100), and take the indoor zone temperature measured near each central VAV box. Hot water for SEB is provided on site but chilled water is supplied by a central campus plant.
We learn a state transition matrix for building 1, , using 6 months (January–June) of operations training data simulated by EnergyPlus. Cross-validation for the selection of the polynomial degree parameter (), as well as a weight regularization parameter () on the Frobenius norm of , was performed with 6 months of validation data (July–December). Using identical model parameters, a state transition matrix is learned for building 2 (the SEB). Fig. 2 illustrates the averaged mean squared error (MSE) of when trained on an increasing number of consecutive hourly samples both with the full training data set from building 1 (“Transfered ”) and without data from building 1 (“Scratch ”). The baseline in Fig. 2 is the best possible validation performance when is trained on the full SEB training data set.
In each case (transfer and scratch), 100 training sequences of SEB data two weeks in length were randomly sampled from the training data set (March–June), and the validation data (July–October) performance was averaged across training instances. The dramatically increased performance of the transferred given approximately 3 days' worth of samples stems from the inclusion of data from building 1, which constrains the degrees of freedom of the model through WLS (with weights and for building 1 and SEB data respectively). The transferred model achieves an MSE of 0.041 (on 0-1 normalized data) vs an MSE of 0.267 for a model learned from scratch, with the best possible performance of an MSE of 0.019 on validation data for a model trained on all SEB training data.
As a comparison, the same procedure is then used to learn a simple feedforward neural network with 1 hidden layer, twice the dimension of the input and equipped with a sigmoid activation function, to demonstrate that the transfer procedure is not specific to the use of weighted least squares or linear methods. The network is trained with respect to MSE loss on the same input/output pairs, data, and featurization. A neural network (NN) is learned for building 1, and is transferred to a model of SEB by initializing SGD at the weights learned for building 1 and by using the same data selection procedure as the WLS transfer method. We achieve near baseline MSE with two days' worth of samples when transferring, and notably better validation performance overall due to using a more complex model than a single matrix. While more accurate at predicting future states on the same data sets, this neural network lacks interpretability and compatibility with our naive classifier (3).
IV-E EnergyPlus Simulation Results: Transfer
We again use EnergyPlus to simulate true HVAC system operations for a preconfigured medium office building. For normal operations during weekday working hours, we learn a state transition matrix that maps input containing only outdoor and indoor temperature (on each of 3 floors), outdoor humidity, and VAV box supply fan power consumption (3 fans for 3 VAV boxes total) for 8 consecutive hours and polynomial kernel to a single output of VAV box supply fan power consumption in the next hour.
First we simulate the office building for an entire year in typical Seattle weather. is computed by solving the least squares problem (6) using normal operations data only. Using EnergyPlus’ built-in fault simulation, we then simulate a single VAV box supply fan’s filter in the building as becoming increasingly fouled [24].
The row- and column-wise covariance of the true, normal operations state transition matrix are not identity. Without knowing how the covariance in the simulation data scales the terms of the traces in (4), we use these terms as inputs to a logistic regression classifier which we then train after solving for on normal, non-faulty operations data. For a full year of simulated normal and fault building operations, without covariance information achieves 58.8% validation precision and 56.13% validation recall using only indoor and outdoor temperature, humidity, and fan power consumption data on individual samples. Fig. 4 illustrates net fault (+1) vs. no fault (-1) classifications for a continuous sequence of simulated validation data.
To test transferring a state transition matrix from an office building operating in a Seattle winter climate to a building operating in a Seattle summer climate, we solve (6) with only two weeks of normal operation data in the winter to find . When tested on two weeks of winter validation fault/no fault data, trained on two weeks of winter fault/no fault data achieves a precision of 55.15% and recall of 54.59%. To transfer to we use two weeks of summer normal operations data, and following Alg. 1, initialize (6) with and apply WLS to compute . trained on two weeks of summer fault/no fault data achieves a precision of 56.37% and recall of 58.67% on summer validation data according to eqn. (15).
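The input-output pairing described above (8 consecutive hours of features mapped to a target in the next hour) can be sketched as a sliding window over an hourly series. The function name `make_lagged_pairs` and the toy arrays are assumptions; filtering to weekday working hours and applying the polynomial kernel are left out.

```python
import numpy as np

def make_lagged_pairs(features, target, n_lags=8):
    """Stack n_lags consecutive hourly feature rows into one input; the output is the target in the next hour."""
    n_hours = features.shape[0]
    inputs = np.stack([features[t - n_lags:t].ravel() for t in range(n_lags, n_hours)])
    outputs = target[n_lags:n_hours]
    return inputs, outputs

# Toy hourly series: 10 hours, 2 features per hour, 1 target value per hour.
features = np.arange(20, dtype=float).reshape(10, 2)
target = np.arange(10, dtype=float)
X, y = make_lagged_pairs(features, target, n_lags=8)   # X has shape (2, 16)
```

Each row of `X` can then be passed through the polynomial featurization before solving the least squares problem.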
V Conclusion
In this work we have demonstrated a novel, transferable Bayesian classifier for detecting faults due to HVAC component degradation. We employed a matrix normal prior distribution on the grounds that if a linear, time-dynamic system described by a matrix under normal operations begins to fail, the failure process is generated by an unseen matrix . Our classifier depends only on and the observed data, and transferring via weighted least squares to a new building is sample efficient. Future work in this direction is rich: utilizing informed priors in the classifier with system knowledge, known failure rates, and fault rules; accounting for differences in empirical covariance between buildings to eliminate the need for a logistic layer around and thus the requirement for sample fault data; and learning control-theory-compatible state transition matrices via, for example, regularization and eigenvalue constraints.
Footnotes
 All code and EnergyPlus configurations used for this paper are available in our Git repository (github.com/cpatdowling/building_transfer/notebooks/build_sys_2019_current.ipynb).
 Further details about and floor plan of the building can be found in [5].
 EnergyPlus Version 8.9.0 Documentation Engineering Reference Section 11.2.4 ’Air Filter Fouling’ https://energyplus.net/sites/all/modules/custom/nrel_custom/pdfs/pdfs_v8.9.0/EngineeringReference.pdf
References
[1] (2012) Kernel regression for real-time building energy analysis. Journal of Building Performance Simulation 5 (4), pp. 263–276.
[2] (2016) Building energy-consumption status worldwide and the state-of-the-art technologies for zero-energy buildings during the past decade. Energy and Buildings 128, pp. 198–213.
[3] (2001) EnergyPlus: creating a new-generation building energy simulation program. Energy and Buildings 33 (4), pp. 319–331.
[4] (2011) Smart meters for power grid – challenges, issues, advantages and status. In 2011 IEEE/PES Power Systems Conference and Exposition, pp. 1–7.
[5] (2019) Online learning for commercial buildings. In Proceedings of the Tenth ACM International Conference on Future Energy Systems, pp. 522–530.
[6] (2018) Online verification of transactive control for commercial buildings. In 2018 IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5.
[7] (2019) Design and implementation of a test bed for building controls. Building Services Engineering Research and Technology, pp. 0143624419846775.
[8] (2018) Matrix variate distributions. Chapman and Hall/CRC.
[9] (2013) VOLTTRON™: an agent platform for integrating electric vehicles and smart grid. In 2013 International Conference on Connected Vehicles and Expo (ICCVE), pp. 81–86.
[10] (2000) Survey of reliability and availability information for power distribution, power generation, and HVAC components for commercial, industrial, and utility installations. In 2000 IEEE Industrial and Commercial Power Systems Technical Conference. Conference Record (Cat. No. 00CH37053), pp. 31–54.
[11] (2016) Fault detection and diagnosis for building cooling system with a tree-structured learning method. Energy and Buildings 127, pp. 540–551.
[12] (2008) Spectral domain-transfer learning. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 488–496.
[13] (2017) Deep transfer learning with joint adaptation networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 2208–2217.
[14] (2019) Machine learning methods to assist energy system optimization. Applied Energy 243, pp. 191–205.
[15] (2007) Self-taught learning: transfer learning from unlabeled data. In Proceedings of the 24th International Conference on Machine Learning, pp. 759–766.
[16] (1994) Multivariate locally weighted least squares regression. The Annals of Statistics, pp. 1346–1370.
[17] (2006) A hierarchical rule-based fault detection and diagnostic method for HVAC systems. HVAC&R Research 12 (1), pp. 111–125.
[18] (2018) Deep-learning neural-network architectures and methods: using component-based models in building-design energy prediction. Advanced Engineering Informatics 38, pp. 81–90.
[19] (2017) Advanced detection of HVAC faults using unsupervised SVM novelty detection and Gaussian process models. Energy and Buildings 149, pp. 216–224.
[20] (2011) A dynamic machine learning-based technique for automated fault detection in HVAC systems. ASHRAE Transactions 117 (2).
[21] (2017) Deep reinforcement learning for building HVAC control. In Proceedings of the 54th Annual Design Automation Conference 2017, pp. 22.
[22] (2012) RP-1312 – tools for evaluating fault detection and diagnostic methods for air-handling units. Technical report, ASHRAE.
[23] (2011) Automated fault detection and diagnosis of HVAC subsystems using statistical machine learning. In 12th International Conference of the International Building Performance Simulation Association.
[24] (2016) Modeling and simulation of operational faults of HVAC systems using EnergyPlus.
[25] (2018) Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697–8710.