Transfer Learning for Non-Intrusive Load Monitoring

Michele D’Incecco, Stefano Squartini, and Mingjun Zhong

M. D’Incecco was with the Dipartimento di Ingegneria dell’Informazione, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy. E-mail: micheledincecco@yahoo.it.
S. Squartini was with the Dipartimento di Ingegneria dell’Informazione, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy. E-mail: s.squartini@univpm.it.
M. Zhong was with the School of Computer Science, University of Lincoln, Lincoln LN6 7TS, UK. E-mail: mzhong@lincoln.ac.uk.
Abstract

Non-intrusive load monitoring (NILM) is a technique for recovering the power consumption of individual appliances from only the recorded mains of a household. NILM is unidentifiable, and hence a challenging problem, because the power of an appliance inferred from the mains alone is not unique. To mitigate this unidentifiability, various methods incorporating domain knowledge into NILM have been proposed and shown to be effective experimentally. Among these methods, deep neural networks have recently been shown to perform best. Arguably, the recently proposed sequence-to-point (seq2point) learning is promising for NILM. However, its results were obtained only within a single data domain; it is not clear whether the method generalises or transfers to different domains, e.g., when the test data are drawn from a different country than the training data. We address this issue in this paper and propose two transfer learning schemes: appliance transfer learning (ATL) and cross-domain transfer learning (CTL). For ATL, our results show that the latent features learnt on a ‘complex’ appliance, e.g., washing machine, can be transferred to a ‘simple’ appliance, e.g., kettle. For CTL, our conclusion is that seq2point learning is transferable. Precisely, when the training and test data are in similar domains, seq2point learning can be applied directly to the test data without fine-tuning; when they are in different domains, seq2point learning needs fine-tuning before being applied to the test data. Interestingly, we show that only the fully connected layers need fine-tuning for transfer learning.

Index Terms: NILM, Non-Intrusive Load Monitoring, Energy Disaggregation, Deep Neural Networks, Transfer Learning, Sequence-to-point Learning.

I Introduction

As the climate changes, governments have committed to reducing CO2 emissions, which drives the demand to reduce energy consumption globally [1, 29]. Meeting this demand requires understanding how energy is used, which in turn enables optimised energy management and, ultimately, more efficient energy consumption. Energy management has therefore become a growing research field, and it has drawn particular attention from the machine learning community due to the emerging large-scale data that need to be understood in a principled way.

In household energy management, appliance load monitoring is a research topic that investigates how to retrieve the real-time energy consumption of each appliance within a household [37, 5]. Since the purpose is to monitor all the appliances in a household, their instantaneous power readings need to be recorded. Perhaps the easiest way to obtain these readings is to deploy a smart sensor on each appliance to record its power consumption. Such data could be used to retrieve complete information about the energy consumption of the household. However, this is an intrusive load monitoring approach, and each device is expensive to install and hard to maintain. To avoid the use of per-appliance sensors, an alternative approach called Non-Intrusive Load Monitoring (NILM) [17] was proposed. The aim of NILM is to extract the energy consumption of each appliance in a household by using only the mains readings.

When NILM is applied to domestic buildings, it can inform end-users of their power consumption and consequently help householders reduce their energy consumption. Studies have shown that feedback to users is a useful aid for understanding how energy is used, acting as a “self-teaching tool” [11]. The feedback provided by smart meters can positively affect energy consumption behaviour; research has shown that NILM could help household users reduce energy consumption by 15% [15]. NILM is even more useful for energy providers, which can optimise their smart grid operations on one side and propose tariffs tailored to users’ consumption habits on the other.

Recently, machine learning methods, typically deep learning algorithms, have developed greatly with the help of big data. In energy management specifically, this encourages researchers to develop machine learning approaches to NILM, since energy data from hundreds of households have been and are being collected in different countries. Two kinds of machine learning algorithms, i.e., unsupervised and supervised learning, are the main methods for NILM. Among these, deep learning approaches have achieved state-of-the-art performance. In particular, it has been shown in the literature that convolutional neural networks (CNN) can extract meaningful latent features of appliances that are particularly useful for NILM, and thus achieve the best performance [32]. Our hypothesis is that the features extracted by CNNs are invariant across appliances as well as across data domains. To investigate this hypothesis, we propose appliance transfer learning (ATL) and cross-domain transfer learning (CTL). For ATL, the target is to evaluate whether the features learnt on one appliance can be transferred to other appliances. For CTL, the target is to evaluate whether the algorithms can be transferred from one data domain to another. Transfer learning is interesting for NILM for the following reasons: 1) the ground-truth active power data for each appliance are expensive to obtain, and transfer learning could reduce the number of per-appliance sensors to be installed, since available models can be transferred to other appliances or domains; 2) transfer learning also offers remarkable computational savings, as pre-trained models can be reused for other appliances or domains.

The structure of this paper is as follows: Section II presents the background of NILM; Section III presents the seq2point learning algorithm as well as the transfer learning schemes; the experimental setup and results are presented in Sections IV and V; finally, we draw our conclusions in Section VI.

II Background

This section introduces the NILM model in detail and reviews the existing machine learning approaches to NILM. These approaches are mainly classified into unsupervised and supervised learning methods.

Firstly we summarise the problem of non-intrusive load monitoring. Suppose that the mains are the aggregation of the active power consumption of all the individual appliances in a household. Let $Y_t$ denote the mains reading at time $t$. Then the mains can be represented by the following formula,

$$Y_t = \sum_{i=1}^{I} X_{it} + \epsilon_t, \qquad (1)$$

where $X_{it}$ represents the power reading of appliance $i$ at time $t$, $I$ is the number of appliances, and $\epsilon_t$ is the variable representing the model noise. Quite often the noise variable is assumed to follow a Gaussian distribution with mean $0$ and variance $\sigma^2$, i.e., $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$. Given the observed mains $Y_t$, our task is then to recover the readings $X_{it}$ of all the appliances. Clearly, this is a single-channel blind source separation problem: given only a single observation, we want to recover more than one source. It is well known that this problem is non-identifiable. Various approaches have been proposed to tackle NILM by alleviating the identifiability issue. Naturally, the problem can be alleviated if domain knowledge is incorporated into the model. In the literature, there are mainly two approaches to NILM, namely unsupervised and supervised learning. We now briefly review the existing algorithms belonging to these two approaches.
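To make Eq. (1) concrete, the following minimal Python sketch (our own illustration, not from the paper; all signal shapes and values are made up) simulates three appliances and forms the mains as their sum plus Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 100          # number of time steps
I = 3            # number of appliances
sigma = 5.0      # noise standard deviation

# Synthetic per-appliance power readings X[i, t] (simple ON/OFF shapes).
X = np.zeros((I, T))
X[0, 10:40] = 2000.0   # a kettle-like burst
X[1, 30:90] = 150.0    # a fridge-like plateau
X[2, 50:70] = 500.0    # a microwave-like burst

# Eq. (1): the mains is the sum of the appliances plus Gaussian noise.
epsilon = rng.normal(0.0, sigma, size=T)
Y = X.sum(axis=0) + epsilon

# NILM asks the inverse question: given only Y, recover each X[i, :],
# which is non-identifiable without further assumptions.
print(Y[:5])
```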

II-1 Unsupervised learning

The dominant approach in unsupervised learning is the additive factorial hidden Markov model (AFHMM) [24, 35, 27, 7]. This is a natural way to model the NILM problem if we accept the following hypotheses: 1) each appliance is a realisation of a hidden Markov model; 2) the power reading of the mains at a time is the sum of the appliance readings at that time. Under these assumptions, NILM can be represented as an AFHMM, and the aim is then to infer the most probable states of the appliances. However, the problem is still unidentifiable.

To improve the performance of the AFHMM, constraints have been imposed on the model from different aspects. The main approach is to constrain the model with domain knowledge. For example, local information such as appliance power levels, ON-OFF state changes, and durations can be incorporated into the model [24, 27, 28, 33, 31]. The methods in [24, 34] proposed a change-one-at-a-time constraint on the appliances. Global information such as total energy consumption, total number of cycles, and the total time an appliance is occupied has also been incorporated into the model [35, 36, 4, 13]. Signal processing methods have likewise been proposed to incorporate useful information into NILM [28, 33, 8, 18]. However, the main limitation of these approaches is that the domain knowledge needs to be extracted manually from the observed data, which makes the methods difficult to use.

II-2 Supervised learning

Recently, many household electricity data sets have been published. These data contain both the mains and the corresponding appliance power readings, which makes it possible to formulate NILM as a supervised learning problem. Precisely, we have observed the data pairs $\{(X_t, Y_t)\}_{t=1}^{T}$, where $X_t$ and $Y_t$ denote respectively the power reading of an appliance and the mains at time $t$. Because there are plenty of observations, it is possible to train a model to represent the relationship between $X_t$ and $Y_t$. It is thus reasonable to learn a function $f$ to represent this relationship so that

$$X_t = f(Y_t) + \epsilon_t, \qquad (2)$$

which can be viewed as a non-linear regression problem. Deep neural networks are a natural approach to learning the function $f$, and recently they have been successfully applied to NILM [19, 32, 30, 10, 2, 3, 6, 26, 12, 9, 14, 21, 16]. From the point of view of statistical methods, it has been pointed out in [32] that, given the pairs $(X_t, Y_t)$, we could train a model to represent the conditional distribution $p(X_t \mid Y_t)$. Recently, both sequence-to-sequence and sequence-to-point learning methods were proposed to approximate this distribution [32]. Interestingly, the desired features, for example appliance power levels, ON-OFF state changes, and durations, could be learnt by the neural networks; these features do not need to be extracted manually.

As NILM is posed as a non-linear regression problem, the function $f$ can take various forms. In the recent literature, various neural network architectures representing $f$ have been proposed, including de-noising auto-encoders [19, 6], convolutional neural networks (CNN) [32, 30, 10, 26], and recurrent neural networks (RNN) [19]. Among these methods, CNNs are believed to perform best [32, 30]. However, the hyper-parameters of these architectures, e.g., the number of layers, can affect their performance.

III Methods

In this section, the methodologies employed in this paper are described. Most deep learning methods for NILM have been assessed only on the same domain. Little work has been done on generalisability, except for [26], in which the authors considered cross-domain transfer learning; however, their work studied neither fine-tuning nor appliance transfer learning. In this paper we investigate the transferability (generalisability) of deep neural networks applied to NILM in wider aspects. Among neural techniques for NILM, sequence-to-point learning is well suited to transfer learning strategies due to its architectural characteristics; we therefore consider the transferability of seq2point learning. Two transfer learning approaches are considered: appliance transfer learning and cross-domain transfer learning. Firstly, seq2point learning is presented as follows.

III-A Sequence-to-point learning

The main idea of seq2point learning is to train a neural network to predict the midpoint of an appliance window given a window of the mains as the input (see Figure 1 for the architecture). Precisely, the input of the network is a mains window $Y_{t:t+W-1}$, and the output is the midpoint element $x_{\tau}$ of the corresponding window of the target appliance, where $\tau = t + \lfloor W/2 \rfloor$ and $W$ is the window length. The representation assumes that the midpoint element is a non-linear function of the mains window. The intuition behind this assumption is that the state of the appliance at the midpoint should relate to the information in the mains both before and after that point.

Instead of mapping sequence to sequence, the seq2point architecture defines a neural network that maps sliding windows $Y_{t:t+W-1}$ of the input to the midpoints $x_{\tau}$ of the corresponding windows of the output, i.e., the model is $x_{\tau} = f(Y_{t:t+W-1}) + \epsilon_{\tau}$. The loss function used for training has the form

$$L = \sum_{t=1}^{T-W+1} \log p(x_{\tau} \mid Y_{t:t+W-1}, \theta), \qquad (3)$$

where $\theta$ denotes the network parameters. To deal with the endpoints of the sequence, given a full input sequence $Y = (y_1, \dots, y_T)$, we first pad the sequence with $\lfloor W/2 \rfloor$ zeros at the beginning and end. The advantage of the seq2point model is that there is a single prediction for every $x_t$, rather than an average of predictions over each window.

It has been shown that seq2point learning performs well on the UK-DALE and REDD data [32]. For UK-DALE, the method was trained on houses 1, 3, 4, and 5 and tested on house 2; for REDD, it was trained on houses 2 to 6 and tested on house 1. This means that seq2point learning was only evaluated within the same domain, and its ability to generalise is unclear. To deploy a system employing the seq2point method, it is crucial to assess its generalisability. In the following sections, we consider appliance transfer learning and cross-domain transfer learning.

Fig. 1: Architecture for the sequence-to-point learning.
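As an illustration of the architecture in Figure 1, the following sketch builds a seq2point network in TensorFlow/Keras. The layer configuration (five 1-D convolutions followed by a 1024-unit dense layer) follows the seq2point architecture reported in [32]; the exact Keras details here are our assumptions rather than the authors' code:

```python
import tensorflow as tf

def build_seq2point(window_length=599):
    """Seq2point sketch: a CNN feature extractor followed by fully
    connected layers, mapping a mains window to the midpoint power
    of the target appliance."""
    return tf.keras.Sequential([
        # Convolutional feature extractor (filter counts/sizes as in [32]).
        tf.keras.layers.Conv1D(30, 10, activation='relu', padding='same',
                               input_shape=(window_length, 1)),
        tf.keras.layers.Conv1D(30, 8, activation='relu', padding='same'),
        tf.keras.layers.Conv1D(40, 6, activation='relu', padding='same'),
        tf.keras.layers.Conv1D(50, 5, activation='relu', padding='same'),
        tf.keras.layers.Conv1D(50, 5, activation='relu', padding='same'),
        # Fully connected head regressing the window midpoint.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation='relu'),
        tf.keras.layers.Dense(1),
    ])

model = build_seq2point()
model.summary()
```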

III-B Appliance transfer learning

We first consider appliance transfer learning (ATL). Our question is whether the signatures learnt on one appliance can be transferred to other appliances. We observed that the features learnt by different appliances have similar patterns; mainly, the features are ON-OFF changes, power levels, and activity durations. For example, Figure 2 shows the features produced by models trained on kettle, microwave, fridge, dish washer, and washing machine data, where all five models were given the same input window. Our observation is that the features learnt by different appliances are often different, although they capture similar signatures. Some appliances, e.g., washing machine and dish washer, extract more information than others, e.g., kettle. Most of the feature channels of the kettle model were not active; in Figure 2, only two channels show significant signatures. Compared to the kettle, the washing machine model learned more information, with many more channels showing interesting signatures.

Since all the appliances learn similar signatures from the data, it may be possible to share the same feature channels across all appliances. If this were feasible, the approach could greatly reduce training costs and the number of sensors to be installed in a household. Our approach is therefore to train a model on a single appliance; the trained CNN layers are then applied directly to other appliances, and only the fully connected layers are retrained for each new appliance (a code sketch is given after Figure 3). We choose to use the CNN layers trained on washing machines.

Fig. 2: The features learnt by using seq2point learning. The first row shows the input to the networks and the ground truth of an appliance. The other rows show the outputs of the last convolutional layer of seq2point models trained by using kettle, microwave, fridge, dish washer, and washing machine.
Fig. 3: Transfer learning used in this paper. For the appliance transfer learning, the CNN layers trained by appliance A are transferred to appliance B and frozen; only the fully connected layers are trained for appliance B. For cross-domain transfer learning, if domain A and domain B are similar, both CNN and fully connected layers trained in domain A are transferred to B; if they are different, the CNN layers are transferred from A to B and only fully connected layers are trained on domain B.
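A minimal sketch of ATL, assuming the `build_seq2point` helper from the earlier sketch; the checkpoint file name and data variables are hypothetical placeholders:

```python
import tensorflow as tf

# Donor model trained on washing machine data (hypothetical checkpoint).
donor = build_seq2point()
donor.load_weights('seq2point_washing_machine.h5')

# Target model for a new appliance, e.g. kettle: copy and freeze the
# donor's CNN layers; only the dense layers will be trained.
target = build_seq2point()
for src, dst in zip(donor.layers, target.layers):
    if isinstance(src, tf.keras.layers.Conv1D):
        dst.set_weights(src.get_weights())   # reuse donor CNN features
        dst.trainable = False                # freeze them

target.compile(optimizer='adam', loss='mse')
# target.fit(kettle_windows, kettle_midpoints, ...)  # placeholder data
```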

III-C Cross-domain transfer learning

We are also interested in cross-domain transfer learning (CTL) for seq2point learning. In this paper, a different domain means that the data were collected in a different region, either in the same or a different country. The model can be trained on one domain and tested on another. Our aim is to evaluate the model and investigate how to transfer it from one domain to another, using as little information as possible from the test domain. We employ fine-tuning for transfer learning.

The seq2point model is first trained on a large data set. Two transfer learning approaches are then investigated: the first is to apply the trained model directly to test data from a different domain; the second is to fine-tune the pre-trained model using a small subset of the test data. Note that we do not tune the CNN layers in either approach, because our experiments suggested that tuning the CNN layers does not improve performance; therefore, only the fully connected layers are tuned in our transfer learning methods. Both ATL and CTL are illustrated in Figure 3.
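The two CTL schemes can be sketched as follows, again under the assumptions of the earlier sketches (the checkpoint name and data variables are hypothetical placeholders):

```python
import tensorflow as tf

model = build_seq2point()
model.load_weights('seq2point_refit_kettle.h5')  # pre-trained on REFIT (hypothetical file)

# Scheme 1: similar domains (e.g. REFIT -> UK-DALE): apply directly.
# predictions = model.predict(ukdale_windows)

# Scheme 2: different domains (e.g. REFIT -> REDD): freeze the CNN
# layers and fine-tune only the dense layers on a small target subset.
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.Conv1D):
        layer.trainable = False
model.compile(optimizer='adam', loss='mse')
# model.fit(redd_windows_small, redd_midpoints_small, ...)
```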

IV Experimental Setup

IV-A Data sets

Several open-source data sets are available for energy disaggregation. These data were measured in household buildings in different countries. The sensors installed in these buildings read active power, and some also record other information, for example reactive power, current, and voltage. In NILM, the active power data are used. The main difference between the data sets is the sampling frequency, so pre-processing to align the readings needs to be done before NILM algorithms are applied. In the literature, five appliances are usually considered for disaggregation [32, 19]: kettle, microwave, fridge, dish washer, and washing machine. In our experiments, three household electricity data sets are used: REFIT [25], UK-DALE [20], and REDD [23].

IV-A1 REFIT

The REFIT data were collected from 20 buildings in England (Loughborough area) and cover a period between 2013 and 2015 [25]. The data contain active power sampled every 8 seconds for both mains and individual appliances.

This data set is the largest of the three used in our study and is therefore used for training the deep learning models. We expected that the large amount of electricity readings would help the trained models generalise to unseen houses. Before training, the whole data set was inspected; we found that houses 13 and 21 used solar panel energy production and therefore excluded them from the current research.

IV-A2 UK-DALE

UK-DALE (UK Domestic Appliance-Level Electricity) contains 5 buildings in the UK, covering the period from 2013 to 2015. The sampling periods for mains and appliances were 1 second and 6 seconds, respectively. Further details and statistics can be found in [20].

IV-A3 REDD

The Reference Energy Disaggregation Data Set (REDD) contains measurements for 6 buildings in the US. Measurements include the mains with a 1-second sampling period and several appliances with a 3-second sampling period; high-frequency current and voltage measurements are also available at a 15 kHz sampling frequency. The lengths of the observations are between 3 and 19 days.

IV-A4 Preparing training and test data

The deep learning models were trained for each appliance individually, i.e., the model for an appliance was trained using only the data of that appliance. When preparing the training data, we first inspected the mains and appliance readings by visualising them. Quite often, both the mains and the appliance readings were missing large chunks of data, which may happen when the hardware fails, for example when sensors are turned off or batteries run out. These large chunks of missing data were removed from the training data. In other cases, the gaps were small, from seconds to minutes; these were kept in the data because they can be treated as noise, which could lead to a better regularised model. The splits of the REFIT data, used for all our experiments, are shown in Table I.

Appliance    | Training: houses / samples (M) / time (Y)      | Validation: house / samples (M) / time (Y) | Test: house / samples (M) / time (Y)
Kettle       | 3, 4, 6, 7, 8, 9, 12, 13, 19, 20 / 59.19 / 15  | 5 / 7.43 / 1.9                             | 2 / 5.73 / 1.5
Microwave    | 10, 12, 19 / 18.22 / 4.6                       | 17 / 5.43 / 1.4                            | 4 / 6.76 / 1.7
Fridge       | 2, 5, 9 / 19.33 / 4.9                          | 12 / 5.86 / 1.5                            | 15 / 6.23 / 1.6
Dish w.      | 5, 7, 9, 13, 16 / 30.82 / 9.8                  | 18 / 5 / 1.3                               | 20 / 5.17 / 1.3
Washing m.   | 2, 5, 7, 9, 15, 16, 17 / 43.47 / 11            | 18 / 5 / 1.3                               | 8 / 6 / 1.5
TABLE I: Distribution of the REFIT data set. M: millions; Y: years.

Buildings 1 and 2 from UK-DALE were used for our experiments, since the data in the other buildings are small. All five appliances from buildings 1 and 2 were used for testing. There were 4.01 million data samples, recorded over around 12 months, used for our experiments. Since the sampling period of the appliance data in UK-DALE is 6 seconds, compared to 8 seconds in REFIT, the UK-DALE data were down-sampled to 8 seconds.

In the REDD data, four appliances (microwave, fridge, dish washer, and washing machine) from houses 1, 2, and 3 were used for our experiments. Each appliance has around 1.2 million data samples, recorded over about 4 months. Note that the kettle was not recorded in this data set.

All the data need to be preprocessed before being used for training and testing. Firstly, the readings were normalised using the following formula,

$$\bar{x}_t = \frac{x_t - \mu}{\sigma}, \qquad (4)$$

where $x_t$ denotes a reading at time $t$, $\mu$ denotes the mean value of an appliance or the mains, and $\sigma$ denotes the corresponding standard deviation. The mean and standard deviation values used for normalisation are shown in Table III. These values were used only for normalising the data, not as estimates of the actual means and variances of the appliances. After normalisation, the data can be fed into the models for training.

Appliance       | Mean | Standard deviation
Aggregate       | 522  | 814
Kettle          | 700  | 1000
Microwave       | 500  | 800
Fridge          | 200  | 400
Dishwasher      | 700  | 1000
Washing machine | 400  | 700
TABLE III: Parameters (in watts) used for normalising the data.
Hyper-parameters for training:
Input window size (samples)             | 599
Number of maximum epochs                | 50
Batch size                              | 1000
Minimum early-stopping epochs           | 5
Patience of early-stopping (in epochs)  | 5
Parameters used for the ADAM optimiser:
Learning rate | 0.001
Beta1         | 0.9
Beta2         | 0.999
Epsilon       | $10^{-8}$
TABLE II: Parameters used for training the models.
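As a worked example of Eq. (4), the sketch below normalises power readings using the constants of Table III (a small Python helper of ours, not the authors' code):

```python
import numpy as np

# Normalisation constants (mean, std) from Table III, in watts.
NORM = {
    'aggregate':       (522.0,  814.0),
    'kettle':          (700.0, 1000.0),
    'microwave':       (500.0,  800.0),
    'fridge':          (200.0,  400.0),
    'dishwasher':      (700.0, 1000.0),
    'washing machine': (400.0,  700.0),
}

def normalise(readings, name):
    """Eq. (4): subtract the fixed mean and divide by the fixed std."""
    mean, std = NORM[name]
    return (np.asarray(readings, dtype=np.float64) - mean) / std

mains = normalise([0.0, 2500.0, 90.0], 'aggregate')
print(mains)
```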

IV-B Metrics

Three metrics are employed in this paper to evaluate the algorithms. The first is the mean absolute error (MAE), which evaluates the absolute difference between the prediction $\hat{x}_t$ and the ground truth $x_t$ at every time point and takes the mean:

$$\text{MAE} = \frac{1}{T}\sum_{t=1}^{T} |\hat{x}_t - x_t|. \qquad (5)$$

The second metric is the normalised signal aggregate error (SAE), which indicates the relative error of the total energy. Denoting by $r$ the total energy consumption of the appliance and by $\hat{r}$ the predicted total energy, SAE is defined as

$$\text{SAE} = \frac{|\hat{r} - r|}{r}. \qquad (6)$$

The third metric is the energy per day (EpD),

$$\text{EpD} = \frac{1}{D}\sum_{d=1}^{D} |\hat{e}_d - e_d|, \qquad (7)$$

where $e_d$ denotes the energy consumed in day $d$, $\hat{e}_d$ the corresponding predicted energy, and $D$ the total number of days. This metric measures the absolute error of the predicted energy used in a day, which is particularly useful when household users are interested in the total energy consumed over a period.
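The three metrics can be computed directly from their definitions; the following is our own minimal implementation (for EpD, the inputs are assumed to be per-sample energies, e.g. power in W multiplied by the sampling period in hours to give Wh):

```python
import numpy as np

def mae(pred, truth):
    """Eq. (5): mean absolute error over all time points."""
    return np.mean(np.abs(pred - truth))

def sae(pred, truth):
    """Eq. (6): relative error of the total energy."""
    r, r_hat = np.sum(truth), np.sum(pred)
    return np.abs(r_hat - r) / r

def epd(pred, truth, samples_per_day):
    """Eq. (7): mean absolute error of the energy used per day."""
    D = len(truth) // samples_per_day               # number of whole days
    e = np.reshape(truth[:D * samples_per_day], (D, -1)).sum(axis=1)
    e_hat = np.reshape(pred[:D * samples_per_day], (D, -1)).sum(axis=1)
    return np.mean(np.abs(e_hat - e))
```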

IV-C Settings for training neural networks

In accordance with the architecture of sequence-to-point learning, a fixed-length window of the aggregate active power signal is given as input. For the neural networks employed in this paper, a sample window has 599 data points. Other window lengths could also be evaluated, and the best length could be selected experimentally; for example, [30] studied the best window length for a sequence-to-sequence model. In our experiments, sample windows were generated by sliding the window forward one data point at a time, so all possible windows were used for training. The windows of the mains were used as inputs to the neural networks, whilst the midpoints of the corresponding appliance windows were used as targets.
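A sketch of the window generation described above (our illustration; variable names are placeholders). The mains are zero-padded so that every appliance sample becomes the midpoint target of exactly one window:

```python
import numpy as np

def make_windows(mains, appliance, window=599):
    """Build (window, midpoint) training pairs by sliding the window
    forward one sample at a time. The mains are padded with
    floor(window/2) zeros at both ends, so each appliance reading is
    the midpoint of exactly one window."""
    half = window // 2
    padded = np.pad(mains, (half, half), mode='constant')
    # Materialising all windows is for clarity only; in practice a
    # strided view or a data generator avoids the memory blow-up.
    inputs = np.stack([padded[t:t + window] for t in range(len(mains))])
    targets = np.asarray(appliance)
    return inputs[..., None], targets   # add a channel axis for Conv1D
```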

TensorFlow was used to train the models. The ADAM optimiser [22] was used for training, and an early-stopping criterion was employed to reduce overfitting. The training hyper-parameters, as well as the ADAM optimiser parameters, are shown in Table II. Note that dropout was not used in our experiments; this could be explored in future work.
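Putting the settings of Table II together, a training run might look like the sketch below (our assumption of the setup; the "minimum early-stopping epochs" setting has no direct equivalent in this plain Keras callback and is omitted; data variables are placeholders):

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001,
                                       beta_1=0.9, beta_2=0.999,
                                       epsilon=1e-8),
    loss='mse')  # least squares, consistent with the Gaussian noise model

model.fit(train_windows, train_midpoints,
          batch_size=1000, epochs=50,
          validation_data=(val_windows, val_midpoints),
          callbacks=[early_stop])
```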

V Experimental Results

Sequence-to-point learning was originally applied to the relatively small UK-DALE and REDD data sets [32]. The model was only evaluated within the same domain, meaning that it was trained on UK-DALE and then tested on UK-DALE, and likewise for REDD. It is therefore unclear whether seq2point learning generalises across domains. It is well known that deep neural networks often overfit to a domain, particularly when the training domain has a significantly different distribution from the test domain. In NILM applications, the household data were collected in different countries, and the appliances may exhibit significantly different patterns in different countries; even within the same country, appliance patterns may differ between households. Our experiments investigate the transferability of seq2point learning and show how it can be used for transfer learning.

Firstly, seq2point learning is trained and tested in the same domain. The first columns of Tables IV, V, and VI report the results of training and testing on the same domain. Note that the results on REDD and UK-DALE were taken from [32].

Secondly, seq2point was trained on REFIT and tested on UK-DALE and REDD. The second columns of Tables V and VI show the results on UK-DALE and REDD, respectively. In terms of MAE and SAE, when testing on UK-DALE, this transfer interestingly improves the performance of seq2point learning; however, when testing on REDD, it does not. The reason could be that REFIT and UK-DALE are in similar domains, because the data were collected in the same country; REFIT is a much larger data set than UK-DALE, so the model generalises better. REDD is not in a similar domain to REFIT, which might be why transfer did not improve the performance of seq2point learning there.

V-1 Appliance transfer learning

According to the analysis in Section III-B, the CNN layers trained on the washing machine are used for all the other appliances. The CNN layers were trained on REFIT. In appliance transfer learning, these trained CNN layers are used directly as the CNN layers for the other appliances; only the fully connected layers are trained for each of the other appliances, i.e., kettle, microwave, fridge, and dish washer. Note that the training hyper-parameters in Table II were used for training the dense layers. Since the CNN layers were not trained, the training time was reduced to about half of that for training all the network parameters.

Table IV shows the results of ATL, where the CNN layers were trained on REFIT using the washing machine and tested on REFIT. The results are comparable to standard training. Figures 4, 5, 6, 7, and 8 plot the predicted power traces for each appliance, where the standard prediction is produced by the model trained on that appliance, whilst the prediction with transferred CNN is produced by ATL. We then examined whether the CNN layers trained on REFIT can be transferred to UK-DALE and REDD. Two approaches were examined. Firstly, the CNN layers were trained individually on REFIT for each appliance, and the fully connected layers were then trained on UK-DALE and REDD; the model was tested on UK-DALE and REDD, with results shown in the third columns of Tables V and VI. Secondly, the CNN layers trained on REFIT using the washing machine were used for all the other appliances, and the fully connected layers were then trained on UK-DALE and REDD; the results are shown in the fourth columns of Tables V and VI.

Comparing the two approaches for transferring the CNN layers, our results suggest that it is reasonable to use the CNN layers trained only on the washing machine for all the appliances. This suggests that we may only need to collect data for the washing machine, which could reduce the hardware cost.

             | Trained on REFIT,        | CNN trained on REFIT (using W.M.),
             | tested on REFIT          | tested on REFIT
Appliance    | MAE    SAE    EpD [Wh]   | MAE    SAE    EpD [Wh]
Kettle       | 6.830  0.130  153.92     | 12.690 0.050  96.04
Microwave    | 12.660 0.170  95.78      | 13.380 0.050  107.730
Fridge       | 20.020 0.330  216.85     | 20.560 0.440  270.48
Dish w.      | 12.260 0.260  180.00     | 12.360 0.610  206.15
Washing m.   | 16.850 2.610  319.11     | 16.850 2.610  319.11
Overall mean | 13.724 0.702  193.13     | 15.168 0.752  149.62
Overall std  | 4.477  0.957  74.32      | 3.135  0.955  87.765
TABLE IV: Appliance transfer learning for sequence-to-point learning. W.M. denotes washing machine.

V-2 Cross-domain transfer learning

In this section, we evaluate seq2point learning under cross-domain transfer learning (CTL). CTL here means that the neural networks are trained on a large data set and then used as a pre-trained model for other data sets, where the networks are fine-tuned on the new domain.

The seq2point models were trained on REFIT and then tested on both UK-DALE and REDD. The second column of Table V shows the CTL results on UK-DALE; the first column of the same table shows the results when the model was trained and tested on UK-DALE. The performances of the two schemes are similar, which suggests that the model trained on REFIT can be applied directly to UK-DALE. The first and second columns of Table VI show the corresponding results for REDD; here, the performance of seq2point learning was much worse when trained on REFIT than when trained on REDD. This suggests that the model cannot be directly transferred from REFIT to REDD. The intuition is that the signatures of the appliances in REDD differ from those in REFIT, because the data come from different countries.

We then consider fine-tuning, where the CNN layers are trained on REFIT and the fully connected layers are then trained on a small subset of UK-DALE or REDD, respectively. The results are shown in the third columns of Tables V and VI. Interestingly, the performance of seq2point increased on REDD but decreased on UK-DALE. Note that the domains of UK-DALE and REFIT are similar, whereas the domain of REDD is different from REFIT. This may suggest that fine-tuning on a similar domain can cause overfitting, whilst fine-tuning does help to improve the performance of seq2point learning across different domains.

We conclude that when the domains are different, fine-tuning does help to improve the performance of seq2point learning; in contrast, when the domains are similar, fine-tuning may not be needed.

All schemes tested on UK-DALE: (1) trained on UK-DALE [32]; (2) trained on REFIT; (3) CNN layers trained on REFIT, dense layers trained on UK-DALE; (4) CNN layers trained on REFIT (using W.M.), dense layers trained on UK-DALE.
Appliance    | (1) MAE  SAE   | (2) MAE  SAE   EpD [Wh] | (3) MAE  SAE   EpD [Wh] | (4) MAE  SAE   EpD [Wh]
Kettle       | 7.439   0.069  | 6.260   0.060  41.08    | 16.879  0.043  72.92    | 16.159  0.205  121.03
Microwave    | 8.661   0.486  | 4.770   0.080  32.01    | 10.973  0.019  63.41    | 27.56   0.40   280.27
Fridge       | 20.894  0.121  | 17.000  0.090  113.87   | 33.078  0.266  297.75   | 28.379  0.011  107.09
Dish w.      | 27.704  0.645  | 16.490  0.130  165.06   | 41.106  0.516  647.46   | 23.537  0.192  298.89
Washing m.   | 12.663  0.284  | 14.840  0.500  229.82   | 22.941  0.899  329.83   | 20.209  0.388  304.11
Overall mean | 15.472  0.321  | 11.800  0.172  116.37   | 24.995  0.349  282.27   | 23.169  0.239  222.278
Overall std  | 7.718   0.217  | 5.222   0.166  74.88    | 10.8775 0.329  213.35   | 4.57    0.144  88.824
TABLE V: Sequence-to-point learning tested on UK-DALE. W.M. denotes washing machine.
All schemes tested on REDD: (1) trained on REDD [32]; (2) trained on REFIT; (3) CNN layers trained on REFIT, dense layers trained on REDD; (4) CNN layers trained on REFIT (using W.M.), dense layers trained on REDD.
Appliance    | (1) MAE  SAE   | (2) MAE  SAE   EpD [Wh] | (3) MAE  SAE   EpD [Wh] | (4) MAE  SAE   EpD [Wh]
Microwave    | 28.199  0.059  | 23.106  0.357  208.02   | 27.792  0.023  247.57   | 13.806  0.144  80.90
Fridge       | 28.104  0.180  | 38.637  0.022  205.48   | 34.906  0.057  118.02   | 35.932  0.285  303.24
Dish w.      | 20.048  0.567  | 29.677  0.711  499.52   | 25.002  0.007  274.62   | 13.597  0.122  234.85
Washing m.   | 18.423  0.277  | 36.832  0.736  750.85   | 17.991  0.128  181.31   | 43.775  0.692  826.08
Overall mean | 23.693  0.270  | 32.063  0.457  415.98   | 26.422  0.054  205.38   | 26.778  0.311  361.27
Overall std  | 4.494   0.187  | 6.162   0.292  227.29   | 6.061   0.047  60.80    | 13.367  0.229  280.18
TABLE VI: Sequence-to-point learning tested on REDD. W.M. denotes washing machine.
Fig. 4: Prediction for washing machine using seq2point learning.
Fig. 5: Standard prediction using seq2point model trained on kettle data and the prediction using appliance transfer learning where the CNNs were trained on washing machine data.
Fig. 6: Standard prediction using seq2point model trained on microwave data and the prediction using appliance transfer learning where the CNNs were trained on washing machine data.
Fig. 7: Standard prediction using seq2point model trained on fridge data and the prediction using appliance transfer learning where the CNNs were trained on washing machine data.
Fig. 8: Standard prediction using seq2point model trained on dish washer data and the prediction using appliance transfer learning where the CNNs were trained on washing machine data.

VI Conclusions

This work presents transfer learning for NILM; specifically, we considered the transferability of seq2point learning. In previous studies, seq2point learning was trained and tested only within the same domain. In contrast, in this paper we learn the model in one domain and test it in a different domain. Our hypothesis was that the appliance features extracted by convolutional neural networks are invariant across appliances as well as across data domains, and our transfer learning experiments support this hypothesis. Two transfer learning approaches were studied experimentally: appliance transfer learning and cross-domain transfer learning.

Seq2point models were trained on REFIT and then tested on REDD and UK-DALE. Our conclusions are as follows. Firstly, concerning ATL, the CNN layers trained on the washing machine can be applied to other appliances, which indicates that all the appliances share similar signatures for NILM purposes. Therefore, we can use a large data set to train the CNN layers and then apply them to unseen data to extract the signatures; the fully connected layers can then be trained solely on the unseen data. Secondly, regarding CTL, we found that if the domains of the training and test data are similar, the trained model may not require fine-tuning; if the domains are different, fine-tuning does help to improve the performance of the trained model. The intuition is that if the domains are similar, fine-tuning may lead to overfitting, since only a small subset is used for fine-tuning; on the other hand, if the domains are different, fine-tuning helps the model to adapt to the unseen domain.

The benefits introduced by adopting a transfer learning strategy are twofold: firstly, it could reduce the number of per-appliance sensors to be installed in households, since the trained features can be transferred to other appliances or domains, thus reducing financial cost; secondly, transfer learning offers remarkable computational savings, since pre-trained models can be reused for other appliances or domains.

References

  • [1] D. Archer. Global Warming: Understanding the Forecast. Wiley, 2012.
  • [2] Kaibin Bao, Kanan Ibrahimov, Martin Wagner, and Hartmut Schmeck. Enhancing neural non-intrusive load monitoring with generative adversarial networks. Energy Informatics, 1(1):18, 2018.
  • [3] Karim Said Barsim and Bin Yang. On the feasibility of generic deep disaggregation for single-load extraction. arXiv preprint arXiv:1802.02139, 2018.
  • [4] Nipun Batra, Amarjeet Singh, and Kamin Whitehouse. Gemello: Creating a detailed energy breakdown from just the monthly electricity bill. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 431–440. ACM, 2016.
  • [5] R. Bonfigli, S. Squartini, M. Fagiani, and F. Piazza. Unsupervised algorithms for non-intrusive load monitoring: An up-to-date overview. In 2015 IEEE 15th International Conference on Environment and Electrical Engineering (EEEIC), pages 1175–1180, June 2015.
  • [6] Roberto Bonfigli, Andrea Felicetti, Emanuele Principi, Marco Fagiani, Stefano Squartini, and Francesco Piazza. Denoising autoencoders for non-intrusive load monitoring: Improvements and comparative evaluation. Energy and Buildings, 158:1461–1474, 2018.
  • [7] Roberto Bonfigli, Emanuele Principi, Marco Fagiani, Marco Severini, Stefano Squartini, and Francesco Piazza. Non-intrusive load monitoring by using active and reactive power in additive factorial hidden Markov models. Applied Energy, 208:1590 – 1607, 2017.
  • [8] Aggelos S. Bouhouras, Paschalis A. Gkaidatzis, Evangelos Panagiotou, Nikolaos Poulakis, and Georgios C. Christoforidis. A NILM algorithm with enhanced disaggregation scheme under harmonic current vectors. Energy and Buildings, 183:392 – 407, 2019.
  • [9] Cillian Brewitt and Nigel Goddard. Non-intrusive load monitoring with fully convolutional networks. arXiv preprint arXiv:1812.03915, 2018.
  • [10] Kunjin Chen, Qin Wang, Ziyu He, Kunlong Chen, Jun Hu, and Jinliang He. Convolutional sequence to sequence non-intrusive load monitoring. arXiv preprint arXiv:1806.02078, 2018.
  • [11] S. Darby. The effectiveness of feedback on energy consumption: A review of the literature on metering, billing and direct displays. pages 1–21, 2006.
  • [12] Leen De Baets, Joeri Ruyssinck, Chris Develder, Tom Dhaene, and Dirk Deschrijver. Appliance classification using VI trajectories and convolutional neural networks. Energy and Buildings, 158:32–36, 2018.
  • [13] C. Dinesh, S. Makonin, and I. V. Bajic. Incorporating time-of-day usage patterns into non-intrusive load monitoring. In 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 1110–1114, Nov 2017.
  • [14] Ahmed F Ebrahim and Osama A Mohammed. Energy disaggregation based deep learning techniques: A pre-processing stage to enhance the household load forecasting. In 2018 IEEE Industry Applications Society Annual Meeting (IAS), pages 1–8. IEEE, 2018.
  • [15] Corinna Fischer. Feedback on household electricity consumption: a tool for saving energy? Energy efficiency, 1(1):79–104, 2008.
  • [16] A. Harell, S. Makonin, and I. V. Bajic. WaveNILM: A causal neural network for power disaggregation from the complex power signal. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
  • [17] G. W. Hart. Nonintrusive appliance load monitoring. Proceedings of the IEEE, 80(12):1870–1891, Dec 1992.
  • [18] K. He, L. Stankovic, J. Liao, and V. Stankovic. Non-intrusive load disaggregation using graph signal processing. IEEE Transactions on Smart Grid, 9(3):1739–1747, May 2018.
  • [19] Jack Kelly and William Knottenbelt. Neural NILM: Deep neural networks applied to energy disaggregation. In ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, pages 55–64. ACM, 2015.
  • [20] Jack Kelly and William Knottenbelt. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Scientific data, 2:150007, 2015.
  • [21] Jihyun Kim, Thi-Thu-Huong Le, and Howon Kim. Nonintrusive load monitoring based on advanced deep learning and novel signature. Computational intelligence and neuroscience, 2017, 2017.
  • [22] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. ArXiv e-prints, December 2014.
  • [23] J Zico Kolter and Matthew J Johnson. REDD: A public data set for energy disaggregation research. In Workshop on Data Mining Applications in Sustainability (SIGKDD), San Diego, CA, volume 25, pages 59–62. Citeseer, 2011.
  • [24] Zico Kolter and Tommi S Jaakkola. Approximate inference in additive factorial HMMs with application to energy disaggregation. In AISTATS, volume 22, pages 1472–1482, 2012.
  • [25] David Murray, Lina Stankovic, and Vladimir Stankovic. An electrical load measurements dataset of united kingdom households from a two-year longitudinal study. Scientific data, 4:160122, 2017.
  • [26] David Murray, Lina Stankovic, Vladimir Stankovic, Srdjan Lulic, and Srdjan Sladojevic. Transferability of neural networks approaches for low-rate energy disaggregation. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
  • [27] O. Parson, S. Ghosh, M. Weal, and A. Rogers. Non-intrusive load monitoring using prior models of general appliance types. In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12), pages 356–362, July 2012.
  • [28] Sundeep Pattem. Unsupervised disaggregation for non-intrusive load monitoring. In International Conference on Machine Learning and Applications (ICMLA), volume 2, pages 515–520. IEEE, 2012.
  • [29] C. Rosenzweig, D. Karoly, M. Vicarelli, P. Neofotis, Q. Wu, G. Casassa, A. Menzel, T. L. Root, N. Estrella, B. Seguin, P. Tryjanowski, C. Liu, S. Rawlins, and A. Imeson. Attributing physical and biological impacts to anthropogenic climate change. Nature, 453:353–357, 2008.
  • [30] Changho Shin, Sunghwan Joo, Jaeryun Yim, Hyoseop Lee, Taesup Moon, and Wonjong Rhee. Subtask gated networks for non-intrusive load monitoring. Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.
  • [31] S. M. Tabatabaei, S. Dick, and W. Xu. Toward non-intrusive load monitoring via multi-label classification. IEEE Transactions on Smart Grid, 8(1):26–40, Jan 2017.
  • [32] Chaoyun Zhang, Mingjun Zhong, Zongzuo Wang, Nigel Goddard, and Charles Sutton. Sequence-to-point learning with neural networks for nonintrusive load monitoring. In The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018.
  • [33] B. Zhao, L. Stankovic, and V. Stankovic. Blind non-intrusive appliance load monitoring using graph-based signal processing. In 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 68–72, Dec 2015.
  • [34] M. Zhong, N. Goddard, and C. Sutton. Interleaved factorial non-homogeneous hidden Markov models for energy disaggregation. In Neural Information Processing Systems, Workshop on Machine Learning for Sustainability, Lake Tahoe, Nevada, USA, 2013.
  • [35] Mingjun Zhong, Nigel Goddard, and Charles Sutton. Signal aggregate constraints in additive factorial HMMs, with application to energy disaggregation. In Advances in Neural Information Processing Systems, pages 3590–3598, 2014.
  • [36] Mingjun Zhong, Nigel Goddard, and Charles Sutton. Latent Bayesian melding for integrating individual and population models. In Advances in Neural Information Processing Systems, pages 3618–3626, 2015.
  • [37] Ahmed Zoha, Alexander Gluhak, Muhammad Ali Imran, and Sutharshan Rajasegarar. Non-intrusive load monitoring approaches for disaggregated energy sensing: A survey. Sensors, 12(12):16838–16866, 2012.