Privacy-Preserving Distributed Parameter Estimation for Probability Distribution of Wind Power Forecast Error


Abstract

Building the conditional probability distribution of wind power forecast errors benefits both wind farms (WFs) and independent system operators (ISOs). Establishing the joint probability distribution of the wind power and corresponding forecast data of spatially correlated WFs is the foundation for deriving the conditional probability distribution. Traditional parameter estimation methods for probability distributions require collecting the historical data of all WFs. However, in multi-regional interconnected grids, neither regional ISOs nor WFs can collect the raw data of WFs in other regions due to privacy or competition concerns. Therefore, based on the Gaussian mixture model, this paper first proposes a privacy-preserving distributed expectation-maximization algorithm to estimate the parameters of the joint probability distribution. This algorithm builds on two original methods: (1) a privacy-preserving distributed summation algorithm and (2) a privacy-preserving distributed inner product algorithm. Then, we derive each WF’s conditional probability distribution of forecast error from the joint one. With the proposed algorithms, WFs need only local calculations and privacy-preserving neighboring communications to complete the whole parameter estimation. The algorithms are verified using the wind integration data set published by the National Renewable Energy Laboratory (NREL).

Wind power forecast error, Probability distribution, Distributed parameter estimation, Data privacy, Gaussian mixture model, Expectation-maximization algorithm

I Introduction

As the penetration of wind power continues to increase in multi-regional interconnected grids [22], a better understanding of wind power forecast error is highly desirable [4]. Building the probability distribution of wind power forecast error benefits both wind farms (WFs) and regional independent system operators (ISOs). WFs can bid better in electricity markets by quantifying the stochastic features of their forecast errors [8], while regional ISOs can make optimal stochastic economic dispatch decisions [25] or schedule sufficient reserves to meet demand [29]. Moreover, accounting for the correlation between WFs when estimating the distribution parameters makes the forecast-error distributions more precise and can significantly reduce reserve costs [24].

To take the wind power correlation into account, one should first establish the joint probability distribution of the wind power and the corresponding forecast data of correlated WFs, and then derive from it the conditional probability distribution of the forecast error under a given forecast value [26, 11]. Both steps require an accurate distribution model. We therefore choose the Gaussian mixture model (GMM) [26], which can approximate multivariate random variables with arbitrary distributions with remarkable accuracy [11].

The expectation-maximization (EM) algorithm is the most commonly used method for estimating the parameters of a GMM [20, 33]. The centralized EM algorithm requires collecting all historical wind power and forecast data to train the GMM-based joint probability distribution, from which the conditional probability distribution of forecast error under a given forecast value can then be derived. However, the centralized EM algorithm may be impractical. For example, a multi-regional interconnected grid is actually controlled by multiple regional ISOs, each responsible for its own region [7]. A central operator with access to all data of the whole grid may not exist for political and technical reasons [1]. Moreover, regional ISOs cannot collect raw data from other areas due to privacy considerations [23]. Furthermore, WFs owned by different stakeholders will not share their forecast values with others, as doing so may leak commercial secrets used in market bidding [11]. Beyond these potential data barriers, the centralized architecture also faces the risk of single-point failures [18] and requires high bandwidth [5]. To address these dilemmas, a privacy-preserving distributed (PPD) EM algorithm is a better choice. Here, “distributed” means that each WF needs only local calculations and neighboring communications with surrounding WFs, and “privacy-preserving” means that the raw data of a WF cannot be deduced by others during the whole estimation process.

In data mining, considerable effort has been devoted to PPD EM algorithms [6, 14, 13, 32]. The authors of [6] and [14] propose PPD EM algorithms based on the secure sum technique, which accurately computes the sum of the parties' data without revealing any party's private data; a cyclic communication topology is adopted to run these algorithms. Using the additive homomorphic encryption technique, Kaleb et al. present a PPD EM algorithm that encodes the raw data into cryptographic messages [13]. To prevent privacy leakage when an adversary controls multiple parties, they enforce one-way communication over a ring topology, which makes the algorithm corruption-resistant. Similar to [13], Yang et al. also utilize additive homomorphic encryption to protect the raw data [32]. The differences are that (1) Yang et al. design a local-global secure summation protocol, and (2) the cryptographic messages in [32] are sent through a spanning-tree communication topology.

However, neither the PPD EM algorithms in [6, 14, 13, 32] nor the privacy-free distributed EM algorithms in [28, 27, 17, 10] can be applied to estimate the joint probability distribution of correlated WFs, because all of them are designed for “horizontally partitioned data”, whereas wind power and forecast data are “vertically partitioned” among correlated WFs. Consider three parties and 100 samples of a 3-dimensional random variable. Horizontally partitioned data refers to the situation where party A owns 30 samples of the 3-dimensional data, party B owns 40 samples, and party C owns 30 samples; vertically partitioned data means that all parties own all 100 samples but each holds only one dimension of the 3-dimensional data. Since each WF has only its own historical wind power and forecast data, i.e., two dimensions of the full multidimensional data, wind power and forecast data are vertically partitioned among correlated WFs.

Moreover, the PPD EM algorithms in [6, 14, 13, 32] are not fully distributed. Both the preselected cyclic communication topology in [6, 14, 13] and the preselected spanning-tree topology in [32] require a global view of the network for preselection. In addition, the failure of any communication line on the preselected path makes the whole algorithm fail.

In this paper, we aim to solve the above problems by developing a PPD EM algorithm for vertically partitioned wind power and forecast data. This algorithm should enable each WF to obtain the GMM-based joint probability distribution in a fully distributed and privacy-preserving manner, and it should be robust to communication line failures. Then, using the PPD derivation algorithm proposed in [12], each WF can obtain its conditional probability distribution of forecast error considering the correlation of all WFs. The original contributions of this paper are as follows:

  1. We formulate a distributed framework for the EM algorithm by splitting the algorithm into local and global parts. This framework identifies the key to developing a PPD EM algorithm for vertically partitioned wind power and forecast data: PPD methods for calculating summations and inner products among the statistics of the correlated WFs.

  2. We propose a distributed summation algorithm by leveraging the average consensus algorithm. Then, this algorithm is modified with an additive homomorphic encryption technique to become a PPD summation algorithm. Moreover, we also propose a PPD inner product algorithm on the basis of the randomized binary hash mapping and the average consensus algorithm.

  3. Combining the proposed PPD summation and inner product algorithms, we finally propose a PPD EM algorithm to estimate the GMM-based joint probability distribution. This algorithm is fully distributed and strictly protects the raw data of the correlated WFs from leakage. Meanwhile, this algorithm is robust to communication line failures.

The remainder of this paper is organized as follows. In Section 2, the GMM-based joint and conditional probability distributions are described. In Section 3, the PPD framework for the EM algorithm is formulated to point out the keys to the realization of the PPD EM algorithm. The PPD summation algorithm is developed in Section 4, and the PPD inner product algorithm is designed in Section 5. In Section 6, the PPD EM algorithm is finally proposed. Case studies are provided in Section 7. Section 8 concludes this paper.

II GMM-based Probability Distributions

This section introduces the GMM-based joint and conditional probability distributions. For $M$ spatially correlated WFs, their wind power and wind power forecasts constitute a $2M$-dimensional random variable $\mathbf{x}$, defined as $\mathbf{x}=[p_1,\dots,p_M,f_1,\dots,f_M]^T$. Elements $p_m$ and $f_m$ represent the wind power and the forecast of the $m$-th WF, respectively. The GMM-based joint probability distribution function (PDF) of $\mathbf{x}$ is a convex combination of $J$ multivariate Gaussian distributions, where the $j$-th component has weighted coefficient $\omega_j$, mean vector $\boldsymbol{\mu}_j$ and covariance matrix $\boldsymbol{\Sigma}_j$, as given in (1):

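$$f(\mathbf{x}\mid\Theta)=\sum_{j=1}^{J}\omega_j\,\mathcal{N}_j(\mathbf{x}\mid\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j) \tag{1}$$

where $\mathcal{N}_j(\cdot)$ is a $2M$-dimensional Gaussian distribution and $\Theta$ is the parameter set of the GMM-based joint PDF. Their details are given as follows: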

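$$\mathcal{N}_j(\mathbf{x}\mid\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j)=\frac{1}{(2\pi)^{M}\lvert\boldsymbol{\Sigma}_j\rvert^{1/2}}\exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_j)^T\boldsymbol{\Sigma}_j^{-1}(\mathbf{x}-\boldsymbol{\mu}_j)\right] \tag{2}$$
$$\Theta=\{\omega_j,\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j\}_{j=1}^{J},\quad \sum_{j=1}^{J}\omega_j=1,\ \omega_j\ge 0 \tag{3}$$
$$\boldsymbol{\mu}_j=[\mu_{j,1},\dots,\mu_{j,2M}]^T \tag{4}$$
$$\boldsymbol{\Sigma}_j=\left[\sigma_{j,ab}\right]_{a,b=1}^{2M} \tag{5}$$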

Define $e_m$ as the forecast error of the $m$-th WF, i.e., the difference between its wind power $p_m$ and its forecast $f_m$. Given a forecast value, the conditional PDF of $e_m$ can be derived from (1), as detailed in (6), where the weighted coefficient is given in (7), the mean in (8) and the variance in (9).

(6)
(7)
(8)
(9)
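To make the conditioning step concrete, the following Python sketch computes a one-dimensional conditional GMM of one WF's forecast error from the joint GMM parameters. It is not a transcription of (6)-(9): it assumes the ordering $\mathbf{x}=[p_1,\dots,p_M,f_1,\dots,f_M]^T$, the sign convention $e_m=p_m-f_m$, and conditioning on the full forecast vector, and it relies only on standard Gaussian conditioning within each component. The function names are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x | mean, cov) at a single point x."""
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** x.shape[0] * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)


def conditional_error_gmm(weights, means, covs, f, m):
    """1-D GMM of e_m = p_m - f_m conditioned on the forecast vector f.

    Assumes x = [p_1..p_M, f_1..f_M]; means has shape (J, 2M) and covs
    has shape (J, 2M, 2M). Returns (weights, means, variances).
    """
    M = f.shape[0]
    fs = slice(M, 2 * M)                      # indices of the forecast block
    w_out, mu_out, var_out = [], [], []
    for w, mu, S in zip(weights, means, covs):
        S_ff = S[fs, fs]
        s_pf = S[m, fs]                       # Cov(p_m, f), shape (M,)
        gain = np.linalg.solve(S_ff, s_pf)    # S_ff^{-1} Cov(f, p_m)
        mu_p = mu[m] + gain @ (f - mu[fs])    # E[p_m | f, component j]
        var_p = S[m, m] - s_pf @ gain         # Var[p_m | f, component j]
        # component weight is rescaled by how well it explains the forecasts
        w_out.append(w * gaussian_pdf(f, mu[fs], S_ff))
        mu_out.append(mu_p - f[m])            # e_m = p_m - f_m, f_m is known
        var_out.append(var_p)
    w_out = np.asarray(w_out)
    return w_out / w_out.sum(), np.asarray(mu_out), np.asarray(var_out)
```

For example, with one component, zero means, unit variances and a correlation of 0.5 between $p_1$ and $f_1$, conditioning on $f_1=1$ gives an error mean of $-0.5$ and a variance of $0.75$.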

III PPD Framework for the EM Algorithm

In this section, the centralized EM algorithm for estimating the parameters of the GMM is first introduced. Then, we formulate a distributed framework for the EM algorithm by splitting it into local and global parts: the local parts can be calculated by each WF, whereas the global parts require the results of all local parts. The key to designing a PPD EM algorithm therefore lies in computing the global parts in a PPD manner, and the distributed framework points out the direction for this design.

III-A Centralized EM Algorithm

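The training set consists of $N$ historical observations of $\mathbf{x}$. The $n$-th observation is $\mathbf{x}_n=[p_{1,n},\dots,p_{M,n},f_{1,n},\dots,f_{M,n}]^T$, where $p_{m,n}$ is the $n$-th wind power observation of the $m$-th WF and $f_{m,n}$ is its $n$-th forecast observation. The centralized EM algorithm has a closed form consisting of the expectation step (E-step) and the maximization step (M-step). In the $k$-th iteration, the centralized E-step is given in (10), where $\gamma_{n,j}^{(k)}$ is the posterior probability that $\mathbf{x}_n$ belongs to the $j$-th Gaussian component, and the centralized M-step is given in (11):

$$\gamma_{n,j}^{(k)}=\frac{\omega_j^{(k)}\,\mathcal{N}_j\big(\mathbf{x}_n\mid\boldsymbol{\mu}_j^{(k)},\boldsymbol{\Sigma}_j^{(k)}\big)}{\sum_{l=1}^{J}\omega_l^{(k)}\,\mathcal{N}_l\big(\mathbf{x}_n\mid\boldsymbol{\mu}_l^{(k)},\boldsymbol{\Sigma}_l^{(k)}\big)} \tag{10}$$
$$\omega_j^{(k+1)}=\frac{1}{N}\sum_{n=1}^{N}\gamma_{n,j}^{(k)} \tag{11a}$$
$$\boldsymbol{\mu}_j^{(k+1)}=\frac{\sum_{n=1}^{N}\gamma_{n,j}^{(k)}\mathbf{x}_n}{\sum_{n=1}^{N}\gamma_{n,j}^{(k)}} \tag{11b}$$
$$\boldsymbol{\Sigma}_j^{(k+1)}=\frac{\sum_{n=1}^{N}\gamma_{n,j}^{(k)}\big(\mathbf{x}_n-\boldsymbol{\mu}_j^{(k+1)}\big)\big(\mathbf{x}_n-\boldsymbol{\mu}_j^{(k+1)}\big)^T}{\sum_{n=1}^{N}\gamma_{n,j}^{(k)}} \tag{11c}$$

where $^T$ denotes the transpose of a vector or matrix. After convergence, the estimate of $\Theta$ is obtained and the GMM-based joint PDF is established. For the detailed derivation and proof, see [3]. Since the calculations are the same for every Gaussian component in every iteration, we omit the iteration index $k$ and the component index $j$ in later derivations when this causes no ambiguity.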
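For reference, one centralized EM iteration, i.e., (10) followed by (11a)-(11c), can be sketched in NumPy as follows. The helper names are illustrative, and evaluating the E-step in log space is a common numerical safeguard that the equations themselves do not prescribe.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Row-wise log-density of N(mean, cov) for the rows of X (N x D)."""
    D = X.shape[1]
    diff = X - mean
    L = np.linalg.cholesky(cov)              # cov = L L^T
    z = np.linalg.solve(L, diff.T)           # whitened residuals, (D, N)
    quad = np.sum(z ** 2, axis=0)            # (x - mu)^T cov^{-1} (x - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + quad)


def em_step(X, weights, means, covs):
    """One centralized EM iteration for a GMM: E-step (10), M-step (11)."""
    N, J = X.shape[0], weights.shape[0]
    # E-step: responsibilities gamma[n, j], normalized per row
    log_r = np.stack([np.log(weights[j]) + log_gaussian(X, means[j], covs[j])
                      for j in range(J)], axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)
    gamma = np.exp(log_r)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step
    Nj = gamma.sum(axis=0)                           # effective counts
    new_w = Nj / N                                   # (11a)
    new_mu = (gamma.T @ X) / Nj[:, None]             # (11b)
    new_cov = np.empty_like(covs)
    for j in range(J):
        diff = X - new_mu[j]
        new_cov[j] = (gamma[:, j, None] * diff).T @ diff / Nj[j]   # (11c)
    return gamma, new_w, new_mu, new_cov
```

With a single component, one iteration returns the sample mean and the (biased) sample covariance, which is a quick sanity check of (11b) and (11c).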

III-B Distributed Framework for the E-step

The E-step calculates the statistics $\gamma_n$ in (10) using the parameters $\Theta$ updated by each WF in the last iteration. Therefore, in the E-step, $\Theta$ is public knowledge for all WFs. The only part that involves the data $\mathbf{x}_n$ is the exponential term of (12), given in (13).

$$\mathcal{N}(\mathbf{x}_n\mid\boldsymbol{\mu},\boldsymbol{\Sigma})=\frac{1}{(2\pi)^{M}\lvert\boldsymbol{\Sigma}\rvert^{1/2}}\exp\!\left(-\frac{1}{2}q_n\right) \tag{12}$$
$$q_n=(\mathbf{x}_n-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu}) \tag{13}$$

Equation (13) shows that, for vertically partitioned data, the E-step can be divided into two summations among WFs: the first is given in (14), and the second in (15).

(14)
(15)

The local part of the first summation for the $m$-th WF is defined in (16); the $m$-th WF can compute it from the known $\Theta$ and its own data. The global part of the first summation is (14) itself.

(16)

The local part of the second summation for the $m$-th WF is defined in (17), and the global part of the second summation is likewise (15) itself.

(17)

The relationship between the local and global parts is the same for (14) and (15). We therefore express it in the unified form (18), where the global part is the sum of the local parts of all $M$ WFs.

(18)

To obtain the global part, one must collect the local parts of the other WFs. However, sharing its local part might leak information about the $m$-th WF's raw data: because the wind power $p_{m,n}$ is close to its forecast value $f_{m,n}$, other WFs can estimate the $m$-th WF's data to some extent from either of its local parts. Therefore, calculating (18) in a distributed manner while preserving data privacy is the key to realizing the PPD E-step.
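The sketch below illustrates one way the quadratic form (13) can split into two sums of per-WF local parts for vertically partitioned data. The decomposition used here (a vector sum $\mathbf{z}_n=\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu})$ assembled from each WF's two columns, followed by a scalar sum) is our assumption for illustration and may differ in detail from (14)-(17). Plain Python `sum` stands in for the PPD summation of Section IV, and all function names are hypothetical.

```python
import numpy as np


def local_first(m, M, y_p, y_f, Lam):
    """Assumed first local part of WF m: its two columns of Lam times its residuals."""
    return Lam[:, m] * y_p + Lam[:, M + m] * y_f


def local_second(m, M, y_p, y_f, z):
    """Assumed second local part of WF m, using the global vector z of the first sum."""
    return y_p * z[m] + y_f * z[M + m]


def distributed_quadratic_form(p, f, mu, Sigma):
    """q = (x - mu)^T Sigma^{-1} (x - mu) built only from per-WF local parts."""
    M = p.shape[0]
    Lam = np.linalg.inv(Sigma)             # public: built from shared parameters
    y_p, y_f = p - mu[:M], f - mu[M:]      # WF m only needs entries m of these
    # first summation: a 2M-vector, one local contribution per WF
    z = sum(local_first(m, M, y_p[m], y_f[m], Lam) for m in range(M))
    # second summation: a scalar, one local contribution per WF
    return sum(local_second(m, M, y_p[m], y_f[m], z) for m in range(M))
```

The point of the exercise is that neither summation needs any WF's raw residuals in one place; only the sums of local parts are required, which is exactly what (18) abstracts.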

III-C Distributed Framework for the M-step

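The M-step updates $\Theta$ in (11) using the $\gamma_n$ calculated in the E-step. For (11a), since $\gamma_n$ is already public knowledge for all WFs, every WF can directly compute (11a) to update $\omega$. For (11b), its $d$-th element is reformulated in (19), where $x_{d,n}$ denotes the $d$-th element of $\mathbf{x}_n$. The $m$-th WF can compute $\mu_m$ and $\mu_{M+m}$ by itself. Since no WF can deduce the $N$ observations from the result of (19), the $m$-th WF can share $\mu_m$ and $\mu_{M+m}$ with other WFs without sacrificing data privacy. Finally, each WF can update $\boldsymbol{\mu}$ using the results of (19) from the others.

$$\mu_d=\frac{\sum_{n=1}^{N}\gamma_n x_{d,n}}{\sum_{n=1}^{N}\gamma_n} \tag{19}$$

For (11c), the diagonal and nondiagonal elements of $\boldsymbol{\Sigma}$ are reformulated in (20) and (21), respectively. The $m$-th WF can directly compute $\sigma_{mm}$ and $\sigma_{(M+m)(M+m)}$ in (20). Neither of them contains private information, because no WF can deduce raw data from them. Thus, the $m$-th WF can share $\sigma_{mm}$ and $\sigma_{(M+m)(M+m)}$ with others.

$$\sigma_{dd}=\frac{\sum_{n=1}^{N}\gamma_n\,(x_{d,n}-\mu_d)^2}{\sum_{n=1}^{N}\gamma_n} \tag{20}$$
$$\sigma_{ab}=\frac{\sum_{n=1}^{N}\gamma_n\,(x_{a,n}-\mu_a)(x_{b,n}-\mu_b)}{\sum_{n=1}^{N}\gamma_n},\quad a\neq b \tag{21}$$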
(22)
(23)
(24)

However, the situation is completely different for the nondiagonal elements in (21) whose two dimensions belong to two different WFs, say the $m$-th and the $i$-th: computing them requires an inner product between two vectors, given in (22). The local part of the inner product is defined in (23) for the $m$-th WF and in (24) for the $i$-th WF, and each of these WFs can compute its local part directly from its own data. The global part of the inner product is (22). To obtain the global part, one must collect the local parts, i.e., the vectors of both WFs. However, since the statistics of (10) and the means of (11b) are public knowledge, collecting the local vector of the $m$-th WF is essentially collecting its raw data, which violates privacy. Therefore, calculating (22) for any two WFs in a distributed manner while protecting data privacy is the key to realizing the PPD M-step.
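The next sketch checks that a nondiagonal covariance element in (21) is an inner product of two local vectors, one held by each WF, divided by the sum of the statistics. Weighting each entry by $\sqrt{\gamma_n}$ is our assumption, chosen so that both local vectors have the same form; the function names are hypothetical.

```python
import numpy as np


def weighted_centered(col, gamma, mu_d):
    """Assumed local vector for dimension d: sqrt(gamma_n) * (x_{d,n} - mu_d)."""
    return np.sqrt(gamma) * (col - mu_d)


def cov_entry(col_a, col_b, gamma):
    """sigma_ab of one component, written as an inner product / sum(gamma)."""
    Nj = gamma.sum()
    mu_a = gamma @ col_a / Nj                     # element-wise form of (11b)
    mu_b = gamma @ col_b / Nj
    u_a = weighted_centered(col_a, gamma, mu_a)   # held by the WF owning dim a
    u_b = weighted_centered(col_b, gamma, mu_b)   # held by the WF owning dim b
    return (u_a @ u_b) / Nj
```

Because the statistics and means are public, each local vector is an invertible function of that WF's raw column, which is why the inner product cannot simply be computed by exchanging the vectors.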

IV PPD Summation Algorithm

This section proposes a PPD summation algorithm to calculate (18) in a fully distributed and privacy-preserving manner.

IV-A Average Consensus Algorithm

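To calculate (18) in a fully distributed manner, the average consensus algorithm is an effective approach [15]. We first present some definitions. The communication topology of the $M$ spatially correlated WFs is represented by a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}=\{1,\dots,M\}$ denotes the set of nodes and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ denotes the set of edges. Two nodes are connected if the distance between them is less than a preset distance threshold. The neighbors of node $m$ are denoted by $\mathcal{N}_m=\{i\mid(m,i)\in\mathcal{E}\}$. The weighted adjacency matrix is $\mathbf{A}=[a_{mi}]\in\mathbb{R}^{M\times M}$, with adjacency coefficients $a_{mi}$ given in (25), where $d_m$ and $d_i$ denote the degrees of nodes $m$ and $i$. $\mathbf{A}$ is a symmetric matrix, and $\sum_{i=1}^{M}a_{mi}=1$ for $m=1,\dots,M$.

$$a_{mi}=\begin{cases}\dfrac{1}{1+\max(d_m,d_i)}, & i\in\mathcal{N}_m\\[4pt] 1-\sum_{l\in\mathcal{N}_m}a_{ml}, & i=m\\[4pt] 0, & \text{otherwise}\end{cases} \tag{25}$$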

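The discrete form of the average consensus algorithm is presented in (26), where $x_m(t)$ is the state of the $m$-th WF at iteration $t$ and $x_m(0)$ is its local part in (18). After convergence, each WF obtains the average value of (18) in a distributed manner, as given in (27).

$$x_m(t+1)=a_{mm}x_m(t)+\sum_{i\in\mathcal{N}_m}a_{mi}x_i(t) \tag{26}$$
$$\lim_{t\to\infty}x_m(t)=\frac{1}{M}\sum_{i=1}^{M}x_i(0),\quad m=1,\dots,M \tag{27}$$

To compute (26), the $m$-th WF only needs to collect $x_i(t)$ ($i\in\mathcal{N}_m$) from its neighbors to calculate a local summation in each iteration, i.e.,

$$\sum_{i\in\mathcal{N}_m}a_{mi}x_i(t) \tag{28}$$

However, in the first iteration, the neighbors' local parts $x_i(0)$ ($i\in\mathcal{N}_m$) are revealed to the $m$-th WF. Thus, the plain average consensus algorithm is not privacy-preserving.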
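A minimal NumPy sketch of the average consensus iteration (26) follows. The Metropolis-Hastings form of the weights is an assumption consistent with the description of (25) (symmetric, rows summing to one, depending on the degrees $d_m$ and $d_i$); the function names are illustrative.

```python
import numpy as np


def metropolis_weights(adj):
    """Symmetric, row-stochastic weights built from node degrees (assumed form)."""
    M = adj.shape[0]
    deg = adj.sum(axis=1)
    A = np.zeros((M, M))
    for m in range(M):
        for i in range(M):
            if adj[m, i] and m != i:
                A[m, i] = 1.0 / (1.0 + max(deg[m], deg[i]))
        A[m, m] = 1.0 - A[m].sum()   # self-weight makes the row sum to one
    return A


def average_consensus(A, x0, tol=1e-10, max_iter=10_000):
    """Iterate x(t+1) = A x(t); row m only mixes node m with its neighbours."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = A @ x
        if np.max(np.abs(x_next - x)) < tol:
            return x_next
        x = x_next
    return x
```

On a connected graph every entry converges to the average of the initial values, so multiplying the converged value by $M$ recovers the global sum (18) at every WF.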

IV-B PPD Summation Algorithm

To compute the local summation in (28) while protecting privacy, we leverage an additive homomorphic encryption technique, the Paillier cryptosystem, which has been widely used for privacy-preserving computation in social networks [32] and on the Internet [31].

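Let $u$ denote a plaintext, $c$ a ciphertext, and $n$ a prespecified large integer that is the product of two large primes. The encryption with a public key $pk=(n,g)$ and a random number $r\in\mathbb{Z}_n^{*}$ is given in (29), and the decryption with a secret key $sk=(\lambda,\eta)$ is given in (30), where $L(x)=(x-1)/n$.

$$c=E_{pk}(u)=g^{u}\,r^{n}\bmod n^{2} \tag{29}$$
$$u=D_{sk}(c)=L\!\left(c^{\lambda}\bmod n^{2}\right)\cdot\eta\bmod n \tag{30}$$

To compute the sum of $M$ plaintexts, the decrypter only needs to multiply the corresponding $M$ ciphertexts and then decrypt the product, as given in (31). The individual plaintexts are never exposed during this process. See [16] for more details.

$$D_{sk}\!\left(\prod_{m=1}^{M}E_{pk}(u_m)\bmod n^{2}\right)=\sum_{m=1}^{M}u_m\bmod n \tag{31}$$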
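The additive property (31) can be checked with a toy Paillier implementation using Python's built-in big integers. This sketch is for illustration only: the two fixed Mersenne primes are far too small for real security, $g=n+1$ is a common simplification, and the fixed-point `encode`/`decode` helpers (scale $10^6$, negative values mapped to the upper half of $\mathbb{Z}_n$) are our assumption for handling real-valued data, which the text does not specify.

```python
import math
import random

_RNG = random.SystemRandom()
SCALE = 10 ** 6  # fixed-point scale for real values (assumption)


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair; the primes are far too small for real use."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                      # common simplification of the generator
    eta = pow(lam, -1, n)          # valid because g = n + 1 (usually called mu)
    return (n, g), (lam, eta, n)


def encrypt(pub, u):
    """c = g^u * r^n mod n^2 with a fresh random r coprime to n, as in (29)."""
    n, g = pub
    n2 = n * n
    r = _RNG.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = _RNG.randrange(1, n)
    return (pow(g, u, n2) * pow(r, n, n2)) % n2


def decrypt(sec, c):
    """u = L(c^lambda mod n^2) * eta mod n with L(x) = (x - 1) // n, as in (30)."""
    lam, eta, n = sec
    return ((pow(c, lam, n * n) - 1) // n * eta) % n


def encode(v, n):
    """Map a real value to Z_n; negative values land in the upper half."""
    return round(v * SCALE) % n


def decode(u, n):
    """Inverse of encode for sums that stay within (-n/2, n/2) after scaling."""
    return (u - n if u > n // 2 else u) / SCALE
```

Multiplying the ciphertexts of 0.25, -1.5 and 3.125 modulo $n^2$ and decrypting the product yields the encoded value of their sum, 1.875, without decrypting any individual ciphertext.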

Inspired by the secure summation protocol in [32], we utilize the Paillier cryptosystem to compute (28), which yields a privacy-preserving average consensus algorithm. Specifically, in the first iteration of the average consensus algorithm, the neighbors of the $m$-th WF, numbered from 1 to $\lvert\mathcal{N}_m\rvert$, encrypt their weighted initial values $a_{mi}x_i(0)$ using the 1st neighbor's public key $pk_1$ in (32). Meanwhile, the $m$-th WF encrypts a secret random number $r_m$ by (33). The neighbors then send their ciphertexts to the $m$-th WF, which multiplies all ciphertexts by (34) and sends the product $C_m$ to the 1st neighbor. The 1st neighbor decrypts $C_m$ into the masked summation in (35) and sends it back to the $m$-th WF. Finally, the $m$-th WF subtracts $r_m$ to obtain the result of (28). No encryption is needed in the subsequent iterations of the average consensus algorithm. Details of the PPD summation algorithm are given in Algorithm 1.

$$c_i=E_{pk_1}\!\left(a_{mi}x_i(0)\right),\quad i\in\mathcal{N}_m \tag{32}$$
$$c_0=E_{pk_1}(r_m) \tag{33}$$
$$C_m=c_0\prod_{i\in\mathcal{N}_m}c_i\bmod n^{2} \tag{34}$$
$$D_{sk_1}(C_m)=r_m+\sum_{i\in\mathcal{N}_m}a_{mi}x_i(0) \tag{35}$$

Please note that although the Paillier cryptosystem is introduced in the average consensus algorithm, it does not affect the convergence of the average consensus algorithm. In fact, the PPD summation algorithm and the average consensus algorithm are mathematically equivalent. See [30] for the convergence proof of the original average consensus algorithm.

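Input: WF m (m = 1, ..., M) with its local part x_m(0) in (18).
Output: WF m obtains the global part (18).
Set t = 0;
while convergence criterion is not met do
    for m = 1 to M do
        if t = 0 then
            WF i in N_m computes (32);
            WF m computes (33) and (34) and sends C_m to its 1st neighbor;
            The 1st neighbor computes (35) and sends it to WF m;
            WF m subtracts r_m and obtains the sum of a_mi x_i(0) over its neighbors;
            WF m completes the calculation of (28);
            WF m updates x_m(1) by (26);
        else
            WF i in N_m sends x_i(t) to WF m;
            WF m completes the calculation of (28);
            WF m updates x_m(t+1) by (26);
        end if
    end for
    t = t + 1;
end while
for m = 1 to M do
    WF m computes M x_m(t), i.e., the global part (18);
end for
Algorithm 1 The PPD summation algorithm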
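The first-iteration exchange of Algorithm 1, i.e., (32)-(35), can be simulated for a single WF as below. The sketch re-implements a toy Paillier scheme (tiny fixed primes, $g=n+1$, and fixed-point encoding with scale $10^6$), all of which are assumptions for illustration; for brevity, the 1st neighbor's key pair is generated inside the function and all parties run in one process. The function `masked_local_sum` is hypothetical.

```python
import math
import random

_RNG = random.SystemRandom()
SCALE = 10 ** 6  # fixed-point scale for real values (assumption)


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier keys: public n (with g = n + 1) and secret (lam, eta, n)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n), n)


def encrypt(n, u):
    n2 = n * n
    r = _RNG.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = _RNG.randrange(1, n)
    return (pow(n + 1, u, n2) * pow(r, n, n2)) % n2


def decrypt(sec, c):
    lam, eta, n = sec
    return ((pow(c, lam, n * n) - 1) // n * eta) % n


def encode(v, n):
    return round(v * SCALE) % n


def decode(u, n):
    return (u - n if u > n // 2 else u) / SCALE


def masked_local_sum(neighbor_terms):
    """First-iteration local summation (28) of WF m via steps (32)-(35).

    neighbor_terms: the values a_mi * x_i(0) held privately by each neighbour.
    WF m learns only their sum; the 1st neighbour sees only a masked value.
    """
    n, sec = keygen()                     # key pair of the 1st neighbour
    # (32): each neighbour encrypts its own term with the 1st neighbour's key
    ciphers = [encrypt(n, encode(t, n)) for t in neighbor_terms]
    # (33): WF m encrypts a secret random mask r_m
    r_m = _RNG.randrange(0, n)
    c_total = encrypt(n, r_m)
    # (34): WF m multiplies all ciphertexts (homomorphic addition)
    for c in ciphers:
        c_total = (c_total * c) % (n * n)
    # (35): the 1st neighbour decrypts r_m + sum, which looks random to it
    masked = decrypt(sec, c_total)
    # WF m removes its own mask to obtain the local summation
    return decode((masked - r_m) % n, n)
```

Because the mask $r_m$ is uniform over $\mathbb{Z}_n$, the decrypted value in (35) reveals nothing about the neighbors' terms to the 1st neighbor, while the $m$-th WF never sees an individual term.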

V PPD Inner Product Algorithm

In this section, a PPD inner product algorithm is proposed to calculate (22) for any two WFs in a fully distributed manner while protecting privacy.

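For the inner product in (22), once the angle $\theta$ in (36) between the two vectors $\mathbf{a}$ and $\mathbf{b}$ is obtained, every WF can calculate the inner product directly as $\mathbf{a}^T\mathbf{b}=\|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta$, provided that the norms $\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are shared among WFs, which does not reveal any raw data. Therefore, the problem of computing (22) reduces to the following one: how can the angle between two vectors in (36) be computed without revealing any raw element of the vectors?

$$\theta=\arccos\frac{\mathbf{a}^T\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|} \tag{36}$$

In an $N$-dimensional space, the probability that a random hyperplane separates two vectors $\mathbf{a}$ and $\mathbf{b}$ is proportional to the angle $\theta$ between them [9]. To estimate this probability, a publicly known random matrix $\mathbf{R}=[\mathbf{r}_1,\dots,\mathbf{r}_L]\in\mathbb{R}^{N\times L}$ is first defined, where each column $\mathbf{r}_l$ is a random vector drawn from a rotationally symmetric distribution (e.g., with i.i.d. standard Gaussian entries) and serves as the normal of a random hyperplane. Then, the probability is given by (37).

$$\Pr\left[\operatorname{sgn}(\mathbf{r}_l^T\mathbf{a})\neq\operatorname{sgn}(\mathbf{r}_l^T\mathbf{b})\right]=\frac{\theta}{\pi} \tag{37}$$

For further demonstration, the randomized binary hash mapping function is defined in (38),

$$h(\mathbf{a})=\operatorname{sgn}(\mathbf{R}^T\mathbf{a})\in\{0,1\}^{L} \tag{38}$$

where the function $\operatorname{sgn}(\cdot)$ encodes the $L$-dimensional real vector $\mathbf{R}^T\mathbf{a}$ into an $L$-dimensional binary vector according to the sign of each element (1 for a nonnegative element and 0 otherwise). Thus, $h(\mathbf{a})$ represents the sign information of the products between each $\mathbf{r}_l$ and $\mathbf{a}$.

Once $h(\mathbf{a})$ and $h(\mathbf{b})$ are obtained, (37) can be estimated by counting the positions where their signs differ. This count is exactly the Hamming distance between $h(\mathbf{a})$ and $h(\mathbf{b})$, i.e., $d_H\big(h(\mathbf{a}),h(\mathbf{b})\big)$ [19]. Therefore, based on the randomized binary hash mapping function and the Hamming distance, the angle can be estimated by (39).

$$\theta\approx\frac{\pi}{L}\,d_H\big(h(\mathbf{a}),h(\mathbf{b})\big) \tag{39}$$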
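The angle estimate (39) and the inner-product recovery via (36) can be sketched in NumPy as follows. Drawing the columns of $\mathbf{R}$ from a standard Gaussian is an assumption (any rotationally symmetric distribution supports the hyperplane argument); in practice all WFs would generate $\mathbf{R}$ from a shared seed. The function names are illustrative.

```python
import numpy as np


def binary_hash(v, R):
    """Randomized binary hash (38): sign bits of R^T v (1 if >= 0, else 0)."""
    return (R.T @ v >= 0).astype(np.uint8)


def estimate_angle(h_a, h_b):
    """Angle estimate (39): pi * Hamming distance / code length L."""
    L = h_a.shape[0]
    return np.pi * np.count_nonzero(h_a != h_b) / L


def estimate_inner_product(h_a, h_b, norm_a, norm_b):
    """Inner product from the estimated angle and the shared norms, via (36)."""
    return norm_a * norm_b * np.cos(estimate_angle(h_a, h_b))
```

Only the binary codes and the two norms leave each WF. The estimated collision rate has a standard deviation of roughly $\sqrt{p(1-p)/L}$ with $p=\theta/\pi$, which is why a longer code length $L$ reduces the estimation error.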

It should be emphasized that our goal is not only to calculate the inner product of two vectors while protecting privacy but also to obtain all the inner products between any two WFs in a fully distributed manner. Computing all the inner products requires the set of hash codes of all WFs. Thus, based on (39) and the average consensus algorithm, we propose the PPD inner product algorithm in Algorithm 2.

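Input: WF m with the local vectors of its wind power and forecast dimensions.
Output: WF m obtains the inner products between all pairs of local vectors
1 WF m computes the hash codes of its two local vectors by (38) and converts them into decimal numbers;
2 WF m formulates a 2M-dimensional vector, where the decimal code of its wind power vector is the m-th element and that of its forecast vector is the (M+m)-th element. Other elements are 0;
3 WF m computes (26) with this vector as input until convergence;
4 WF m multiplies the convergence result by M to obtain the decimal codes of all WFs;
5 WF m converts all decimal codes into binary hash codes;
6 WF m computes the norms of its two local vectors;
7 WF m repeats steps 2 to 4, replacing the decimal codes with the norms;
8 WF m computes (39) to obtain the angle between every pair of local vectors using their hash codes;
9 WF m computes (36) to obtain every inner product using the angles and the norms;
Algorithm 2 The PPD inner product algorithm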
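Steps 2 to 5 of Algorithm 2 rely on a simple property: if every WF places its own codes in its own slots of an otherwise zero vector, $M$ times the consensus average equals the stacked vector of all codes. The sketch below simulates this with a dense consensus loop. Splitting each code into 16-bit words is our assumption, made so that the floating-point averages can be rounded back to exact integers (a single floating-point number cannot hold a long code exactly); all names are hypothetical.

```python
import numpy as np

WORD = 16  # bits per chunk; chunking is an assumption to keep floats exact


def bits_to_words(bits):
    """Pack a 0/1 array (length divisible by WORD) into integers, MSB first."""
    chunks = bits.reshape(-1, WORD)
    return chunks @ (1 << np.arange(WORD - 1, -1, -1))


def words_to_bits(words):
    """Unpack integers back into a flat 0/1 array, MSB first."""
    shifts = np.arange(WORD - 1, -1, -1)
    return ((words[:, None] >> shifts) & 1).reshape(-1).astype(np.uint8)


def share_codes(A, codes_p, codes_f, iters=2000):
    """Steps 2-5 of Algorithm 2 using the average consensus iteration (26)."""
    M = len(codes_p)
    W = bits_to_words(codes_p[0]).shape[0]
    X = np.zeros((M, 2 * M, W))               # X[m]: the vector held by WF m
    for m in range(M):
        X[m, m] = bits_to_words(codes_p[m])   # own wind power code in slot m
        X[m, M + m] = bits_to_words(codes_f[m])  # own forecast code in slot M+m
    for _ in range(iters):
        X = np.einsum('mi,ijw->mjw', A, X)    # each WF mixes neighbours' vectors
    words = np.rint(M * X).astype(np.int64)   # M * average = stacked codes
    return [[words_to_bits(words[m, j]) for j in range(2 * M)] for m in range(M)]
```

After the loop, every WF holds identical copies of all $2M$ hash codes, which is exactly what step 8 needs to evaluate (39) for every pair.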

VI PPD EM Algorithm

Since the two key problems identified in Section 3 are solved by the proposed PPD summation and inner product algorithms, the PPD EM algorithm for estimating the GMM-based joint PDF of multiple spatially correlated WFs is finally developed in Algorithm 3.

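Initialization:
  01 Set the initial weights, mean vectors and covariance matrices for j = 1, ..., J;
  02 Set k = 1;
The PPD E-step:
  for n = 1 to N and j = 1 to J do
    03 Define the quantities required by the local parts in (16) and (17);
    04 Input: WF m with its local part (16). Run Algorithm 1 and obtain the global part (14);
    05 Input: WF m with its local part (17). Run Algorithm 1 and obtain the global part (15);
    06 WF m computes (12) via the result of (15), and then updates the statistic in (10) via the result of (12);
  end for
The PPD M-step:
  for j = 1 to J do
    07 WF m updates its two mean elements in (19);
    08 WF m updates its two diagonal covariance elements in (20);
    09 WF m updates its two local vectors used in (22);
    10 WF m formulates a 2M-dimensional vector, where its first mean element is the m-th element and its second mean element is the (M+m)-th element. Other elements are 0;
    11 WF m computes (26) with this vector as input until convergence;
    12 WF m multiplies the convergence result by M to obtain the mean elements of all WFs;
    13 Repeat step 10 to step 12 while replacing the two mean elements by the two diagonal covariance elements;
    14 Input: WF m with its two local vectors. Run Algorithm 2 and obtain the inner products between all pairs of local vectors;
    15 WF m obtains the nondiagonal covariance elements by (21) and (22);
    16 WF m obtains the weight by (11a) directly;
    17 WF m updates the covariance matrix;
    18 WF m obtains the updated parameters of the j-th component;
  end for
  19 Set k = k + 1;
  20 Loop the PPD E-step and M-step until convergence;
Algorithm 3 The PPD EM algorithm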

The advantages of the proposed algorithm are as follows:

  • Strict privacy preservation. For the summation calculation in the PPD E-step, the Paillier cryptosystem is used to protect the raw data; for the inner product calculation in the PPD M-step, the randomized binary hash mapping function is used to prevent privacy disclosure. Together, these two techniques protect the raw data of each WF.

  • Fully distributed. As we introduce the average consensus algorithm into the PPD E-step and M-step, each WF only needs to communicate with its neighbors. Thus, we avoid the assumption made in [31, 6, 21] that any two nodes are connected, and we improve the scalability of the proposed algorithm. Meanwhile, the preselected communication paths in [6, 14, 13, 32] are no longer required. Thus, the proposed algorithm is fully distributed.

  • Robust. Because communication between neighbors may fail, robustness to communication failures is necessary. Since the proposed PPD EM algorithm is built on the average consensus algorithm, communication failures have essentially no effect on the final estimation results as long as the communication topology remains connected, owing to the consensus property [2].

Please note that once the joint PDF is established, each WF can derive its conditional PDF of the forecast error in (6) via the PPD derivation algorithm presented in [12].

VII Case Study

The historical wind power and forecast data are from the “eastern wind integration data set” published by the National Renewable Energy Laboratory (NREL) of the U.S. We choose 9 WFs in Maryland, numbered 4401, 5405, 6211, 6359, 6526, 6812, 6931, 7187, and 7460, whose communication topology is shown in Fig. 1(a). Each WF has 20 days of hourly wind power and forecast data; thus, $M=9$ and $N=480$. We then build the joint PDF of the wind power and forecasts of the nine spatially correlated WFs with the proposed PPD EM algorithm and, leveraging the PPD derivation algorithm presented in [12], derive the conditional PDF of the forecast error of each WF from the joint PDF. Since the privacy-preserving feature of the proposed algorithm has been discussed in the previous sections, this section mainly verifies the correctness and robustness of the proposed algorithm.

VII-A Correctness of the PPD Summation Algorithm

We use the wind power data of 9 WFs at 2004/1/1-01:00 as input, and use the proposed PPD summation algorithm to estimate the summation of the 9 data points. To show more details, we illustrate the iterative process of each WF’s estimation for the summation in Fig. 1(b). It can be observed that after 30 iterations, all WFs achieve consensus on the exact summation value, showing the correctness of the proposed PPD summation algorithm.

Fig. 1: (a) Communication topology of the nine WFs; (b) the iterative process of each WF’s estimation for the summation

VII-B Correctness of the PPD Inner Product Algorithm

To verify the correctness of the proposed PPD inner product algorithm, we use it to calculate the inner products between every two WFs’ private vectors, where the private vector of a WF consists of its 20-day historical wind power data. Fig. 2 shows the average relative errors of the proposed algorithm with respect to the true inner products under different binary code lengths $L$. The error decreases significantly as $L$ increases but nearly stabilizes beyond a certain length, which we therefore adopt as the final value of $L$. At this length, the average relative error is small, confirming the correctness of the proposed algorithm.

Fig. 2: Average relative error and its corresponding binary code length

VII-C Verification of the PPD EM Algorithm

We use the Bayesian information criterion (BIC) to set the number of Gaussian components to $J=5$. We then build the joint probability distribution of the wind power and forecasts of the nine spatially correlated WFs using the proposed PPD EM algorithm, with the distribution constructed by the centralized EM algorithm as the benchmark. Since the 18-dimensional distribution cannot be drawn, we derive several 1-dimensional and 2-dimensional distributions from it based on the linear invariance property of the GMM [11].

First, the 1-dimensional PDF and the 1-dimensional cumulative distribution function (CDF) are shown in Fig. 3. Only the first dimension is provided.

Fig. 3: The marginal PDFs and CDFs comparisons

In Fig. 3, the empirical distributions are obtained from the corresponding original historical data, the benchmarks are built by the centralized EM algorithm, and the distribution of each WF is constructed by the proposed PPD EM algorithm. It can be observed that (1) the benchmark and the distributions built by the WFs match the corresponding empirical distributions; (2) the distributions built by the WFs are coincident with each other, indicating that the consensus among WFs is achieved by the proposed algorithm; and (3) the distribution built by each WF is coincident with the benchmark, indicating the correctness of the proposed algorithm.

Then, the benchmark and the distribution built by each WF are further compared using the relative standard error (RSE) defined in (40), where the first function represents the PDF or CDF built by each WF, the second represents the benchmark PDF or CDF, and the bar denotes the mean value. The RSE results are provided in Table I. First, all RSE values are tiny, which means that the difference between the benchmark and the distribution of each WF is negligible. Second, the RSE values are identical across WFs, reflecting the consensus achieved by the proposed algorithm. Third, the RSE values of the CDFs are much smaller than those of the PDFs. Since the CDF is what is ultimately needed for optimal decisions, e.g., calculating quantiles, the RSE values of the CDFs best reflect the practical accuracy of the proposed algorithm.

(40)
Wind Farm 1 2 3 4 5 6 7 8 9
PDF () 2.4 2.4 2.4 2.4 2.4 2.4 2.4 2.4 2.4
CDF () 4.8 4.8 4.8 4.8 4.8 4.8 4.8 4.8 4.8
TABLE I: The RSEs between the distributions built by the centralized EM algorithm and the proposed algorithm

Furthermore, we also choose two dimensions from the 18-dimensional joint distribution to form 2-dimensional PDF and CDF. The 2-dimensional benchmarks built by the centralized EM algorithm and the 2-dimensional joint distributions built by the 1st WF are illustrated in Fig. 4.

Fig. 4: The 2-dimensional PDFs and CDFs comparison

Thereafter, the Kullback–Leibler divergences (KLDs) between the joint distribution built by the 1st WF and those built by the other WFs are given in Table II. Since all KLDs are tiny, the distributions built by the 1st WF and by the others are essentially the same, so using the result of the 1st WF as a representative is reasonable. As Fig. 4 shows, the 2-dimensional PDF and CDF built by the 1st WF match the 2-dimensional benchmarks well. Therefore, the correctness of the proposed PPD EM algorithm is verified.

Wind Farm 1 2 3 4 5 6 7 8 9
KLD () 0 0.02 0.19 2.19 1.09 0.33 1.11 2.06 1.01
TABLE II: KL divergences between the distributions built by the 1st WF and those built by other WFs

VII-D Robustness of the PPD EM Algorithm

Since the proposed PPD EM algorithm is built on the average consensus algorithm, communication failures have essentially no effect on the final estimation results as long as the communication network topology remains connected, owing to the consensus property.

Fig. 5: The CDFs built after communication failures

To verify the robustness of the proposed PPD EM algorithm, we cut off communication lines to simulate communication failures and then inspect the variations of the CDFs built by the proposed algorithm after the failures. Since the consensus of the proposed algorithm has already been verified, we again use the estimation result of the 1st WF as the representative. The CDFs built by the 1st WF after the communication failures are shown in Fig. 5, where the benchmarks are the CDFs built by the centralized EM algorithm. The legend ‘Original’ denotes the CDFs built by the 1st WF when no failure occurs, and the legend ‘line m-i’ denotes the CDFs built by the 1st WF when the communication between the m-th WF and the i-th WF fails. For example, ‘line 1-3’ means that the communication between the 1st WF and the 3rd WF fails while the other communication lines operate normally. As Fig. 5 shows, the CDFs built by the 1st WF under different communication failures still coincide with the corresponding benchmarks and original CDFs. This shows that the proposed PPD EM algorithm maintains high accuracy after communication failures, verifying its robustness.

VII-E Verification of the Conditional Distribution

Based on the established joint distribution via the proposed PPD EM algorithm, we derive each WF’s conditional distribution of forecast error in (6) by the PPD derivation algorithm proposed in [12].

Fig. 6: The conditional CDF comparison between the centralized manner and the proposed PPD manner

The PPD derivation algorithm enables each WF to obtain its conditional distribution from the joint one in a PPD manner. Fig. 6 illustrates the conditional CDFs obtained by the 1st, 3rd, 5th, and 7th WFs, where all benchmarks are built in a centralized manner. The close match between the benchmarks and the CDFs built by the WFs indicates that the joint distribution obtained via the proposed PPD EM algorithm is correct, and hence that the conditional distributions derived from it are correct as well.
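The centralized benchmarks in Fig. 6 can be sketched as follows. The code conditions a joint GMM on the forecast data and then evaluates the CDF of the forecast error. It assumes that the joint GMM is given as weights, mean vectors and covariance matrices, and that the target WF's own forecast is one of the conditioning variables. The function names and the example parameters are hypothetical. The sketch reproduces only the centralized calculation, not the PPD derivation algorithm of [12].

```python
import math
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm = math.sqrt((2 * math.pi) ** len(mean) * np.linalg.det(cov))
    return math.exp(-0.5 * quad) / norm

def conditional_gmm(weights, means, covs, y_idx, x_val):
    """Condition a joint GMM on every variable except y_idx, fixed at x_val.

    Returns the 1-D mixture of z[y_idx]: (component weights, means, std devs).
    """
    x_idx = [j for j in range(means.shape[1]) if j != y_idx]
    betas, mus, sds = [], [], []
    for w, mu, cov in zip(weights, means, covs):
        s_xx = cov[np.ix_(x_idx, x_idx)]
        s_yx = cov[y_idx, x_idx]
        gain = np.linalg.solve(s_xx, s_yx)  # Sigma_xx^{-1} Sigma_xy
        betas.append(w * gaussian_pdf(x_val, mu[x_idx], s_xx))
        mus.append(mu[y_idx] + gain @ (x_val - mu[x_idx]))
        sds.append(math.sqrt(cov[y_idx, y_idx] - s_yx @ gain))
    betas = np.array(betas)
    return betas / betas.sum(), np.array(mus), np.array(sds)

def error_cdf(e, forecast, betas, mus, sds):
    """P(y - forecast <= e) under the conditional mixture of y."""
    t = forecast + e
    return sum(b * 0.5 * (1.0 + math.erf((t - m) / (s * math.sqrt(2.0))))
               for b, m, s in zip(betas, mus, sds))

# Made-up single-component joint GMM over (actual power y, forecast f), in p.u.
weights = np.array([1.0])
means = np.array([[0.5, 0.5]])
covs = np.array([[[0.04, 0.03], [0.03, 0.04]]])
betas, mus, sds = conditional_gmm(weights, means, covs, y_idx=0, x_val=np.array([0.6]))
print(mus, sds ** 2)  # conditional mean 0.575, variance 0.0175
```

Evaluating `error_cdf` over a grid of error values gives one curve of the kind plotted in Fig. 6. With more WFs, the joint vector simply gains more forecast entries in `x_val`.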

VIII Conclusion

When wind power correlation is taken into account, estimating the conditional probability distribution of a WF’s forecast error requires the historical wind power and forecast data of all WFs. However, for correlated WFs spread across a multi-regional interconnected grid, data barriers may exist among WFs belonging to different regions. Therefore, we propose a PPD EM algorithm to estimate the joint probability distribution of the vertically partitioned wind power and forecast data, and then derive each WF’s conditional probability distribution of forecast error from the joint one. To achieve this, we first formulate a distributed framework for the EM algorithm by splitting it into local and global parts. Next, we identify the two key operations for developing a PPD EM algorithm: computing summations and inner products among the statistics of the correlated WFs in a PPD manner. We then design a PPD summation algorithm based on additive homomorphic encryption and the average consensus algorithm, and develop a PPD inner product algorithm by leveraging a randomized binary hash mapping function and the average consensus algorithm. Combining the PPD summation and inner product algorithms, we obtain the PPD EM algorithm, which enables each WF to estimate the joint probability distribution of the wind power and forecast data of all WFs in a fully distributed and privacy-preserving manner.
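The additive homomorphic property behind the PPD summation can be shown with a toy Paillier cryptosystem [16]. Multiplying two ciphertexts modulo $n^2$ yields an encryption of the sum of the plaintexts. The sketch below is insecure by design: it uses tiny primes, the simplified generator $g = n + 1$, and no fixed-point encoding of real-valued statistics. It does not reproduce how the paper combines encryption with average consensus, and all function names are hypothetical.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation with g = n + 1 (p, q distinct primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)

def encrypt(n, m, rng):
    """c = g^m * r^n mod n^2 with a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: E(a) * E(b) mod n^2 decrypts to a + b mod n."""
    return (c1 * c2) % (n * n)

rng = random.Random(0)
n, priv = keygen(293, 433)  # toy primes only; real keys use primes of 1024+ bits
c_sum = add_encrypted(n, encrypt(n, 1234, rng), encrypt(n, 5678, rng))
print(decrypt(n, priv, c_sum))  # 6912
```

Any party can combine the ciphertexts without seeing the individual values, so only the key holder learns the aggregate. Real-valued EM statistics would first have to be scaled to integers below $n$.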

Compared with the centralized EM algorithm, the proposed algorithm achieves comparably high accuracy while being fully distributed, since it requires only local communication between neighboring WFs. Moreover, it protects the data privacy of every WF during communication. Furthermore, owing to its consensus-based design, it remains robust to communication failures as long as the communication network stays connected.

References

  1. A. Ahmadi-Khatir, A. Conejo and R. Cherkaoui (2014-07) Multi-area unit scheduling and reserve allocation under wind power uncertainty. In 2014 IEEE PES General Meeting | Conference & Exposition, pp. 1–1.
  2. T. C. Aysal, M. J. Coates and M. G. Rabbat (2008-10) Distributed average consensus with dithered quantization. IEEE Trans. Signal Process. 56 (10), pp. 4905–4918.
  3. J. Bilmes (1997) A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report ICSI-TR-97-02, University of Berkeley.
  4. H. Bludszuweit, J. A. Dominguez-Navarro and A. Llombart (2008-08) Statistical analysis of wind power forecast error. IEEE Trans. Power Syst. 23 (3), pp. 983–991.
  5. H. Chen, X. Wang, Z. Li, W. Chen and Y. Cai (2019) Distributed sensing and cooperative estimation/detection of ubiquitous power internet of things. Protection and Control of Modern Power Systems 4 (1), pp. 13.
  6. C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin and M. Y. Zhu (2002-12) Tools for privacy preserving distributed data mining. SIGKDD Explor. Newsl. 4 (2), pp. 28–34.
  7. J. Contreras, A. Losi, M. Russo and F. F. Wu (2002-02) Simulation and evaluation of optimization problem solutions in distributed energy management systems. IEEE Trans. Power Syst. 17 (1), pp. 57–62.
  8. A. Fabbri, T. G. S. Roman, J. R. Abbad and V. H. M. Quezada (2005-08) Assessment of the cost associated with wind generation prediction errors in a liberalized electricity market. IEEE Trans. Power Syst. 20 (3), pp. 1440–1446.
  9. M. X. Goemans and D. P. Williamson (1995-11) Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42 (6), pp. 1115–1145.
  10. D. Gu (2008-07) Distributed EM algorithm for Gaussian mixtures in sensor networks. IEEE Trans. Neural Netw. 19 (7), pp. 1154–1166.
  11. M. Jia, C. Shen and Z. Wang (2018) A distributed probabilistic modeling algorithm for the aggregated power forecast error of multiple newly built wind farms. IEEE Trans. Sustain. Energy, pp. 1–10.
  12. M. Jia, C. Shen and Z. Wang (2019) A distributed privacy-preserving incremental update algorithm for probability distribution of wind power forecast error. arXiv:1905.06420 [Online]. Available: https://arxiv.org/abs/1905.06420.
  13. K. L. Leemaqz, S. X. Lee and G. J. McLachlan (2017-08) Corruption-resistant privacy preserving distributed EM algorithm for model-based clustering. In 2017 IEEE Trustcom/BigDataSE/ICESS, pp. 1082–1089.
  14. X. Lin, C. Clifton and M. Y. Zhu (2005-07-01) Privacy-preserving clustering with distributed EM mixture modeling. Knowledge and Information Systems 8 (1), pp. 68–81.
  15. R. Olfati-Saber, J. A. Fax and R. M. Murray (2007-01) Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE 95 (1), pp. 215–233.
  16. P. Paillier (1999) Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology – EUROCRYPT ’99, J. Stern (Ed.), Berlin, Heidelberg, pp. 223–238.
  17. S. S. Pereira, S. Barbarossa and A. Pagés-Zamora (2010-10) Consensus for distributed EM-based clustering in WSNs. In 2010 IEEE Sensor Array and Multichannel Signal Processing Workshop, pp. 45–48.
  18. M. S. Rahman and A. M. T. Oo (2017) Distributed multi-agent based coordinated power management and control strategy for microgrids with distributed energy resources. Energy Conversion and Management 139, pp. 20–32.
  19. S. Marukatat and I. Methasate (2013) Fast nearest neighbor retrieval using randomized binary codes and approximate Euclidean distance. Pattern Recognition Letters 34 (9), pp. 1101–1107.
  20. M. Sun, C. Feng and J. Zhang (2019) Conditional aggregated probabilistic wind power forecasting based on spatio-temporal correlation. Appl. Energy 256, pp. 113842.
  21. J. Vaidya and C. Clifton (2002) Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, New York, NY, USA, pp. 639–644.
  22. X. Wang, J. Chang, X. Meng and Y. Wang (2018) Short-term hydro-thermal-wind-photovoltaic complementary operation of interconnected power systems. Appl. Energy 229, pp. 945–962.
  23. Z. Wang, F. Liu, S. H. Low, C. Zhao and S. Mei (2019-01) Distributed frequency control with operational constraints, part II: network power balance. IEEE Trans. Smart Grid 10 (1), pp. 53–64.
  24. Z. Wang, C. Shen, F. Liu, J. Wang and X. Wu (2018-10) An adjustable chance-constrained approach for flexible ramping capacity allocation. IEEE Trans. Sustain. Energy 9 (4), pp. 1798–1811.
  25. Z. Wang, C. Shen, F. Liu, X. Wu, C. Liu and F. Gao (2017-11) Chance-constrained economic dispatch with non-Gaussian correlated wind power uncertainty. IEEE Trans. Power Syst. 32 (6), pp. 4880–4893.
  26. Z. Wang, C. Shen and F. Liu (2018) A conditional model of wind power forecast errors and its application in scenario generation. Appl. Energy 212, pp. 771–785.
  27. Y. Weng, W. Xiao and L. Xie (2011) Diffusion-based EM algorithm for distributed estimation of Gaussian mixtures in wireless sensor networks. Sensors 11 (6), pp. 6297–6316.
  28. Y. Weng, L. Xie and W. Xiao (2009-12) Diffusion scheme of distributed EM algorithm for Gaussian mixtures over random networks. In 2009 IEEE International Conference on Control and Automation, pp. 1529–1534.
  29. Z. Wu, P. Zeng, X. Zhang and Q. Zhou (2016-11) A solution to the chance-constrained two-stage stochastic program for unit commitment with wind energy integration. IEEE Trans. Power Syst. 31 (6), pp. 4185–4196.
  30. L. Xiao, S. Boyd and S. Kim (2007) Distributed average consensus with least-mean-square deviation. Journal of Parallel and Distributed Computing 67 (1), pp. 33–46.
  31. F. Xu, S. Zeng, S. Luo, C. Wang, Y. Xin and Y. Guo (2010-09) Research on secure scalar product protocol and its’ application. In 2010 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM), pp. 1–4.
  32. B. Yang, I. Sato and H. Nakagawa (2012) Privacy-preserving EM algorithm for clustering on social network. In Advances in Knowledge Discovery and Data Mining, Berlin, Heidelberg, pp. 542–553.
  33. J. Zhang, J. Yan, D. Infield, Y. Liu and F. Lien (2019) Short-term forecasting and uncertainty analysis of wind turbine power based on long short-term memory network and Gaussian mixture model. Appl. Energy 241, pp. 229–244.