Fast Low-Rank Bayesian Matrix Completion with Hierarchical Gaussian Prior Models
Abstract
The problem of low-rank matrix completion is considered in this paper. To exploit the underlying low-rank structure of the data matrix, we propose a hierarchical Gaussian prior model, where columns of the low-rank matrix are assumed to follow a Gaussian distribution with zero mean and a common precision matrix, and a Wishart distribution is specified as a hyperprior over the precision matrix. We show that such a hierarchical Gaussian prior has the potential to encourage a low-rank solution. Based on the proposed hierarchical prior model, we develop a variational Bayesian matrix completion method which embeds the generalized approximate message passing (GAMP) technique to circumvent cumbersome matrix inverse operations. Simulation results show that our proposed method demonstrates superiority over some state-of-the-art matrix completion methods.
Matrix completion, low-rank Bayesian learning, generalized approximate message passing.
1 Introduction
The problem of recovering a partially observed matrix, which is referred to as matrix completion, arises in a variety of applications, including recommender systems [1, 2, 3], genotype prediction [4, 5], image classification [6, 7], network traffic prediction [8], and image imputation [9]. Low-rank matrix completion, which exploits the fact that many real-world data lie in an intrinsically low-dimensional subspace, has attracted much attention over the past few years. Mathematically, a canonical form of the low-rank matrix completion problem can be presented as
$$\min_{X}\ \operatorname{rank}(X) \quad \text{s.t.} \quad O \odot X = O \odot Y, \qquad (1)$$
where $X \in \mathbb{R}^{M \times L}$ is an unknown low-rank matrix, $O \in \{0,1\}^{M \times L}$ is a binary matrix that indicates which entries of $X$ are observed, $\odot$ denotes the Hadamard product, and $Y$ is the observed matrix. It has been shown that the low-rank matrix can be exactly recovered from (1) under some mild conditions [10]. Nevertheless, minimizing the rank of a matrix is an NP-hard problem, and no known polynomial-time algorithms exist. To overcome this difficulty, alternative low-rank promoting functionals were proposed. Among them, the most popular alternative is the nuclear norm $\|X\|_{*}$, defined as the sum of the singular values of a matrix. Replacing the rank function with the nuclear norm yields the following convex optimization problem
$$\min_{X}\ \|X\|_{*} \quad \text{s.t.} \quad O \odot X = O \odot Y. \qquad (2)$$
It was proved that the nuclear norm is the tightest convex envelope of the matrix rank, and theoretical recovery guarantees for (2) under both noiseless and noisy cases were provided in [10, 11, 12, 13]. To solve (2), a number of computationally efficient methods were developed. A well-known method is the singular value thresholding method proposed in [14]. Another efficient method was proposed in [15], in which an augmented Lagrange multiplier technique was employed. Apart from convex relaxation, nonconvex surrogate functions, such as the log-determinant function, were also introduced to replace the rank function [16, 17, 18, 19]. Nonconvex methods usually claim better recovery performance, since nonconvex surrogate functions behave more like the rank function than the nuclear norm does. It is noted that for both convex and nonconvex methods, one needs to meticulously select regularization parameters to properly control the trade-off between the matrix rank and the data fitting error when noise is involved. However, due to the lack of knowledge of the noise variance and the rank, it is usually difficult to determine appropriate regularization parameters.
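To make the singular value thresholding idea concrete, the following is a minimal sketch of a generic SVT-style iteration; it is not the exact algorithm of [14] or [15], and the step size, threshold, and iteration count are illustrative choices.

```python
import numpy as np

def svt_complete(Y, O, lam=5.0, step=1.2, n_iter=200):
    """Minimal singular value thresholding sketch for O * X = O * Y.

    Y: observed matrix (zeros at unobserved entries); O: binary mask.
    lam, step, n_iter: illustrative algorithm parameters.
    """
    Z = np.zeros_like(Y)                      # dual-like iterate
    for _ in range(n_iter):
        # Shrink the singular values of the current iterate (nuclear-norm proximal step).
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
        # Gradient step on the data-fit term over the observed entries only.
        Z = Z + step * O * (Y - X)
    return X
```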
Another important class of low-rank matrix completion methods are Bayesian methods [20, 21, 22, 23, 24], which model the problem in a Bayesian framework and have the ability to achieve an automatic balance between the low-rankness and the fitting error. Specifically, in [20], the low-rank matrix is expressed as a product of two factor matrices, and the matrix completion problem is translated into searching for these two factor matrices. To encourage a low-rank solution, sparsity-promoting priors [25] are placed on the columns of the two factor matrices, which aims to promote structured-sparse factor matrices with only a few nonzero columns, and in turn leads to a low-rank matrix. Nevertheless, this Bayesian method updates the factor matrices in a row-by-row fashion and needs to perform a number of matrix inverse operations at each iteration. To address this issue, a bilinear generalized approximate message passing (GAMP) method was developed to learn the two factor matrices [22, 23], without involving any matrix inverse operations. This method, however, cannot automatically determine the matrix rank and needs to try out all possible values of the rank. Recently, a new Bayesian prior model was proposed in [24], in which columns of the low-rank matrix follow a zero-mean Gaussian distribution with an unknown deterministic covariance matrix that can be estimated via Type II maximum likelihood. It was shown that maximizing the marginal likelihood function yields a low-rank covariance matrix, which implies that the prior model has the ability to promote a low-rank solution. A major drawback of this method is that it requires the inverse of a large matrix at each iteration, and thus has a cubic complexity in terms of the problem size. This high computational cost prohibits its application to many practical problems.
In this paper, we develop a new Bayesian method for low-rank matrix completion. To exploit the underlying low-rank structure of the data matrix, a low-rank promoting hierarchical Gaussian prior model is proposed. Specifically, columns of the low-rank matrix are assumed to be mutually independent and follow a common Gaussian distribution with zero mean and a precision matrix. The precision matrix is treated as a random parameter, with a Wishart distribution specified as a hyperprior over it. We show that such a hierarchical Gaussian prior model has the potential to encourage a low-rank solution. The GAMP technique is employed and embedded in the variational Bayesian (VB) inference, which results in an efficient VB-GAMP algorithm for matrix completion. Note that due to the non-factorizable form of the prior distribution, the GAMP technique cannot be directly used. To address this issue, we construct a carefully devised surrogate problem whose posterior distribution is exactly the one required for VB inference. Meanwhile, the surrogate problem has factorizable prior and noise distributions such that GAMP can be directly applied to obtain an approximate posterior distribution. Such a trick helps achieve a substantial reduction in computational complexity, and makes it possible to successfully apply the proposed method to large-scale matrix completion problems.
The rest of the paper is organized as follows. In Section 2, we introduce a hierarchical Gaussian prior model for low-rank matrix completion. Based on this hierarchical model, a variational Bayesian method is developed in Section 3. In Section 4, a VB-GAMP method is proposed to reduce the computational complexity of the proposed algorithm. Simulation results are provided in Section 5, followed by concluding remarks in Section 6.
2 Bayesian Modeling
In the presence of noise, the canonical form of the matrix completion problem can be formulated as
$$\min_{X}\ \operatorname{rank}(X) \quad \text{s.t.} \quad O \odot Y = O \odot (X + E), \qquad (3)$$
where $E$ denotes the additive noise, and $O \in \{0,1\}^{M \times L}$ is a binary matrix that indicates which entries are observed. Without loss of generality, we assume $M \le L$ (otherwise the matrix can be transposed). As indicated earlier, minimizing the rank of a matrix is an NP-hard problem. In this paper, we consider modeling the matrix completion problem within a Bayesian framework.
We assume entries of $E$ are independent and identically distributed (i.i.d.) random variables following a Gaussian distribution with zero mean and variance $\gamma^{-1}$. To learn $\gamma$, a Gamma hyperprior is placed over $\gamma$, i.e.
$$p(\gamma) = \operatorname{Gamma}(\gamma \mid a, b) = \frac{b^{a}\gamma^{a-1}e^{-b\gamma}}{\Gamma(a)}, \qquad (4)$$
where $\Gamma(a)$ is the Gamma function. The parameters $a$ and $b$ are set to small values, which makes the Gamma distribution a non-informative prior.
To promote the low-rankness of $X$, we propose a two-layer hierarchical Gaussian prior model (see Fig. 1). Specifically, in the first layer, the columns of $X$ are assumed mutually independent and follow a common Gaussian distribution:
$$p(x_l \mid \Lambda) = \mathcal{N}\big(x_l \mid 0, \Lambda^{-1}\big), \qquad l = 1, \dots, L, \qquad (5)$$
where $x_l$ denotes the $l$-th column of $X$, and $\Lambda$ is the precision matrix. The second layer specifies a Wishart distribution as a hyperprior over the precision matrix $\Lambda$:
$$p(\Lambda) = \operatorname{Wishart}(\Lambda \mid W, \nu) \propto |\Lambda|^{\frac{\nu - M - 1}{2}}\exp\!\Big(-\tfrac{1}{2}\operatorname{tr}\big(W^{-1}\Lambda\big)\Big), \qquad (6)$$
where $\nu$ and $W$ denote the degrees of freedom and the scale matrix of the Wishart distribution, respectively. Note that the constraint $\nu > M - 1$ for the standard Wishart distribution can be relaxed to any real $\nu$ if an improper prior is allowed, e.g. [26]. In Bayesian inference, improper prior distributions can often be used provided that the corresponding posterior distribution can be correctly normalized [27].
This hierarchical Gaussian-Wishart prior has the potential to encourage a low-rank solution. To illustrate this low-rankness promoting property, we integrate out the precision matrix $\Lambda$ and obtain the marginal distribution of $X$ as (details of the derivation can be found in Appendix 7)
$$p(X) \propto \big|X X^{T} + W^{-1}\big|^{-\frac{\nu + L}{2}}. \qquad (7)$$
From (7), we have
$$\ln p(X) = -\frac{\nu + L}{2}\ln\big|X X^{T} + W^{-1}\big| + \text{const}. \qquad (8)$$
If we choose $W^{-1} = \epsilon I$, and let $\epsilon$ be a small positive value, the log-marginal distribution becomes
$$\ln p(X) = -\frac{\nu + L}{2}\sum_{m=1}^{M}\ln\big(\lambda_{m} + \epsilon\big) + \text{const}, \qquad (9)$$
where $\lambda_{m}$ denotes the $m$-th eigenvalue of $X X^{T}$. Clearly, in this case, the prior encourages a low-rank solution $X$. This is because maximizing the prior distribution is equivalent to minimizing $\sum_{m}\ln(\lambda_{m} + \epsilon)$ with respect to $X$. It is well known that the log-sum function is an effective sparsity-promoting functional which encourages a sparse solution $\{\lambda_{m}\}$ [28, 29, 30]. As a result, the resulting matrix $X$ has a low-rank structure.
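A quick numerical illustration of this low-rank promoting behavior is given below. The two test matrices are arbitrary examples, and the constant factor in (9) is omitted; the point is only that the log-sum of eigenvalues is much smaller for a low-rank matrix than for a full-rank matrix of comparable energy.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, eps = 20, 30, 1e-3

# A rank-2 matrix and a full-rank matrix, rescaled to the same Frobenius norm.
X_low = rng.standard_normal((M, 2)) @ rng.standard_normal((2, L))
X_full = rng.standard_normal((M, L))
X_full *= np.linalg.norm(X_low) / np.linalg.norm(X_full)

def log_sum_penalty(X, eps):
    # sum_m log(lambda_m + eps), with lambda_m the eigenvalues of X X^T, as in (9).
    lam = np.linalg.eigvalsh(X @ X.T)
    return np.sum(np.log(np.maximum(lam, 0.0) + eps))

print(log_sum_penalty(X_low, eps))    # much smaller (more negative) value
print(log_sum_penalty(X_full, eps))
```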
In addition to the above choice, the parameter $W$ can otherwise be devised in order to exploit additional prior knowledge about $X$. For example, in some applications such as image inpainting, there is a spatial correlation among neighboring coefficients of $X$. To capture the smoothness between neighboring coefficients, $W$ can be set as [31]
$$W = F^{T} F, \qquad (10)$$
where $F$ is a second-order difference operator with its $(i,j)$-th entry given by
$$F_{ij} = \begin{cases} 2, & i = j, \\ -1, & |i - j| = 1, \\ 0, & \text{otherwise}. \end{cases} \qquad (11)$$
Another choice of $W$ to promote a smooth solution is the Laplacian matrix [32], i.e.
$$W = D - A + \epsilon I, \qquad (12)$$
where $A$ is the adjacency matrix of a graph with its entries given by
$$A_{ij} = \begin{cases} 1, & \text{if nodes } i \text{ and } j \text{ are adjacent}, \\ 0, & \text{otherwise}, \end{cases} \qquad (13)$$
$D$, referred to as the degree matrix, is a diagonal matrix with $D_{ii} = \sum_{j} A_{ij}$, and $\epsilon$ is a small positive value which ensures that $W$ is full rank.
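To make the two smoothness-promoting choices of $W$ concrete, the snippet below builds a second-order difference operator and a chain-graph Laplacian. The boundary treatment and the neighborhood definition are simple illustrative choices; the exact constructions are the ones given by (10)-(13).

```python
import numpy as np

def second_order_difference(M):
    """Square second-order difference operator with stencil [-1, 2, -1];
    the boundary rows here are one simple choice."""
    F = 2.0 * np.eye(M)
    F -= np.eye(M, k=1)
    F -= np.eye(M, k=-1)
    return F

def chain_laplacian(M, eps=1e-3):
    """Graph Laplacian W = D - A + eps*I for a chain graph whose adjacency
    matrix connects neighboring indices."""
    A = np.zeros((M, M))
    for i in range(M - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    D = np.diag(A.sum(axis=1))
    return D - A + eps * np.eye(M)

M = 8
F = second_order_difference(M)
W_smooth = F.T @ F               # the choice (10)
W_laplace = chain_laplacian(M)   # the choice (12)
```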
It can be shown that $W$ defined in (10) and (12) promotes low-rankness as well as smoothness of $X$. To illustrate this, we first introduce the following lemma.
Lemma 1
For a positive-definite matrix $W$, the following equality holds valid
$$\big|X X^{T} + W^{-1}\big| = |W|^{-1}\,\big|W^{1/2} X X^{T} W^{1/2} + I\big| \qquad (14)$$
for any $X$.
See Appendix 8.
From Lemma 1, we have
$$\ln p(X) = -\frac{\nu + L}{2}\sum_{m=1}^{M}\ln\big(\lambda_{m} + 1\big) + \text{const}, \qquad (15)$$
where $\lambda_{m}$ is the $m$-th eigenvalue associated with $W^{1/2} X X^{T} W^{1/2}$. We see that maximizing the prior distribution is equivalent to minimizing $\sum_{m}\ln(\lambda_{m} + 1)$ with respect to $X$. As discussed earlier, this log-sum functional is a sparsity-promoting functional which encourages a sparse solution $\{\lambda_{m}\}$. As a result, the matrix $W^{1/2} X X^{T} W^{1/2}$ has a low rank. Since $W$ is full rank, this implies that $X$ has a low-rank structure. On the other hand, notice that $\sum_{m}\lambda_{m} = \operatorname{tr}(W X X^{T}) = \sum_{l} x_{l}^{T} W x_{l}$ is the first-order approximation of $\sum_{m}\ln(\lambda_{m} + 1)$. Therefore minimizing $\sum_{m}\ln(\lambda_{m} + 1)$ will also reduce the value of $\sum_{l} x_{l}^{T} W x_{l}$. Clearly, for $W$ defined in (10) and (12), a smoother solution results in a smaller value of $\sum_{l} x_{l}^{T} W x_{l}$. Therefore when $W$ is chosen to be (10) or (12), the resulting prior distribution has the potential to encourage a low-rank and smooth solution.
Remarks: Our proposed hierarchical Gaussian prior model can be considered a generalization of the prior model in [24]. Notice that in [24], the precision matrix in the prior model is assumed to be a deterministic parameter, whereas it is treated as a random variable and assigned a Wishart prior distribution in our model. This generalization offers more flexibility in modeling the underlying latent matrix. As discussed earlier, the parameter $W$ can be devised to capture additional prior knowledge about the latent matrix, and such a careful choice of $W$ can help substantially improve the recovery performance, as corroborated by our experimental results.
3 Variational Bayesian Inference
3.1 Review of The Variational Bayesian Methodology
Before proceeding, we first provide a brief review of the variational Bayesian (VB) methodology. In a probabilistic model, let $y$ and $\theta$ denote the observed data and the hidden variables, respectively. It is straightforward to show that the marginal probability of the observed data can be decomposed into two terms [27]
$$\ln p(y) = L(q) + \operatorname{KL}(q \,\|\, p), \qquad (16)$$
where
$$L(q) = \int q(\theta)\ln\frac{p(y, \theta)}{q(\theta)}\,d\theta \qquad (17)$$
and
$$\operatorname{KL}(q \,\|\, p) = -\int q(\theta)\ln\frac{p(\theta \mid y)}{q(\theta)}\,d\theta, \qquad (18)$$
where $q(\theta)$ is any probability density function, and $\operatorname{KL}(q \,\|\, p)$ is the Kullback-Leibler divergence [33] between $q(\theta)$ and $p(\theta \mid y)$. Since $\operatorname{KL}(q \,\|\, p) \ge 0$, it follows that $L(q)$ is a rigorous lower bound on $\ln p(y)$. Moreover, notice that the left-hand side of (16) is independent of $q(\theta)$. Therefore maximizing $L(q)$ is equivalent to minimizing $\operatorname{KL}(q \,\|\, p)$, and thus the posterior distribution $p(\theta \mid y)$ can be approximated by $q(\theta)$ through maximizing $L(q)$.
The significance of the above transformation is that it circumvents the difficulty of computing the posterior probability $p(\theta \mid y)$ when it is computationally intractable. For a suitable choice of the distribution $q(\theta)$, the quantity $L(q)$ may be more amenable to computation. Specifically, we could assume some specific parameterized functional form for $q(\theta)$ and then maximize $L(q)$ with respect to the parameters of the distribution. A particular form of $q(\theta)$ that has been widely used with great success is the factorized form over the component variables $\{\theta_i\}$ in $\theta$ [34], i.e. $q(\theta) = \prod_{i} q_i(\theta_i)$. We therefore can compute the posterior distribution approximation by finding $q(\theta)$ of the factorized form that maximizes the lower bound $L(q)$. The maximization can be conducted in an alternating fashion for each latent variable, which leads to [34]
$$q_i(\theta_i) = \frac{\exp\big(\langle \ln p(y, \theta)\rangle_{k \ne i}\big)}{\int \exp\big(\langle \ln p(y, \theta)\rangle_{k \ne i}\big)\,d\theta_i}, \qquad (19)$$
where $\langle \cdot \rangle_{k \ne i}$ denotes the expectation with respect to the distributions $q_k(\theta_k)$ for all $k \ne i$. By taking the logarithm on both sides of (19), it can be equivalently written as
$$\ln q_i(\theta_i) = \langle \ln p(y, \theta)\rangle_{k \ne i} + \text{const}. \qquad (20)$$
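The alternating updates (19)-(20) are easiest to see on a small conjugate example. The following toy problem (a univariate Gaussian with unknown mean and precision under independent priors) is not part of the proposed model; it merely illustrates how each factor is refreshed using the current moments of the other.

```python
import numpy as np

# Toy mean-field VB: data x_i ~ N(mu, 1/tau), priors mu ~ N(0, s0^2), tau ~ Gamma(a0, b0).
rng = np.random.default_rng(1)
x = rng.normal(2.0, 0.5, size=200)
N, s02, a0, b0 = x.size, 100.0, 1e-3, 1e-3

E_tau = 1.0                                   # initialization of <tau>
for _ in range(50):
    # q(mu) = N(m, v): expectation of the log joint w.r.t. q(tau), cf. (20)
    v = 1.0 / (1.0 / s02 + N * E_tau)
    m = v * E_tau * x.sum()
    # q(tau) = Gamma(aN, bN): expectation of the log joint w.r.t. q(mu)
    aN = a0 + 0.5 * N
    bN = b0 + 0.5 * (np.sum((x - m) ** 2) + N * v)
    E_tau = aN / bN

print(m, 1.0 / np.sqrt(E_tau))                # close to the true mean 2.0 and std 0.5
```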
3.2 Proposed Algorithm
We now proceed to perform variational Bayesian inference for the proposed hierarchical model. Let $\theta \triangleq \{X, \Lambda, \gamma\}$ denote all hidden variables. Our objective is to find the posterior distribution $p(X, \Lambda, \gamma \mid Y)$. Since this posterior is usually computationally intractable, we, following the idea of [34], approximate it by $q(X, \Lambda, \gamma)$, which has a factorized form over the hidden variables $X$, $\Lambda$, and $\gamma$, i.e.
$$q(X, \Lambda, \gamma) = q_x(X)\,q_{\Lambda}(\Lambda)\,q_{\gamma}(\gamma). \qquad (21)$$
As mentioned in the previous subsection, the maximization of $L(q)$ can be conducted in an alternating fashion for each latent variable, which leads to updates of the form (20): each factor is obtained by taking the expectation of the logarithm of the joint distribution $p(Y, X, \Lambda, \gamma)$ with respect to (w.r.t.) the remaining factors (details of the derivation can be found in [34]). Details of this Bayesian inference scheme are provided next.
1). Update of $q_x(X)$: The calculation of $q_x(X)$ can be decomposed into a set of independent tasks, with each task computing the posterior distribution approximation for each column of $X$, i.e. $q_x(x_l)$. We have
$$\ln q_x(x_l) = \big\langle \ln p(y_l \mid x_l, \gamma) + \ln p(x_l \mid \Lambda)\big\rangle_{q_\Lambda, q_\gamma} + \text{const}, \qquad (22)$$
where $y_l$ denotes the $l$-th column of $Y$, and the likelihood involves only those entries of $y_l$ indicated by $o_l$, with $o_l$ being the $l$-th column of $O$. From (22), it can be seen that $q_x(x_l)$ follows a Gaussian distribution
$$q_x(x_l) = \mathcal{N}\big(x_l \mid \mu_l, \Sigma_l\big), \qquad (23)$$
with $\mu_l$ and $\Sigma_l$ given as
$$\mu_l = \langle\gamma\rangle\,\Sigma_l\,(o_l \odot y_l), \qquad (24)$$
$$\Sigma_l = \big(\langle\gamma\rangle\operatorname{diag}(o_l) + \langle\Lambda\rangle\big)^{-1}. \qquad (25)$$
We see that to calculate $\Sigma_l$, we need to perform an inverse operation on an $M \times M$ matrix, which involves a computational complexity of $\mathcal{O}(M^{3})$.
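The per-column update can be sketched as follows, assuming the standard conjugate form reconstructed in (24)-(25); function and variable names are illustrative and are not taken from the released code.

```python
import numpy as np

def update_column_posterior(y_l, o_l, E_gamma, E_Lambda):
    """Gaussian variational posterior of one column x_l, following (24)-(25):
    one M x M matrix inverse per column, hence O(M^3) per column."""
    precision = E_gamma * np.diag(o_l) + E_Lambda    # posterior precision matrix
    Sigma_l = np.linalg.inv(precision)               # covariance, the O(M^3) step
    mu_l = E_gamma * Sigma_l @ (o_l * y_l)           # posterior mean
    return mu_l, Sigma_l
```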
2). Update of $q_{\Lambda}(\Lambda)$: The approximate posterior $q_{\Lambda}(\Lambda)$ can be obtained as
$$\ln q_{\Lambda}(\Lambda) = \Big\langle \sum_{l=1}^{L}\ln p(x_l \mid \Lambda) + \ln p(\Lambda)\Big\rangle_{q_x} + \text{const}. \qquad (26)$$
From (26), it can be seen that $q_{\Lambda}(\Lambda)$ follows a Wishart distribution, i.e.
$$q_{\Lambda}(\Lambda) = \operatorname{Wishart}(\Lambda \mid \widehat{W}, \hat{\nu}), \qquad (27)$$
where
$$\widehat{W} = \Big(W^{-1} + \sum_{l=1}^{L}\big\langle x_l x_l^{T}\big\rangle\Big)^{-1}, \qquad (28)$$
$$\hat{\nu} = \nu + L. \qquad (29)$$
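A corresponding sketch of the precision-matrix update, assuming the conjugate Wishart form in (27)-(29); again, the names and interface are illustrative.

```python
import numpy as np

def update_precision_posterior(mu, Sigma, W, nu):
    """Wishart posterior of the precision matrix, following (28)-(29).
    mu: M x L matrix of posterior means; Sigma: stack of L posterior covariances."""
    L = mu.shape[1]
    second_moment = mu @ mu.T + Sigma.sum(axis=0)    # sum_l <x_l x_l^T>, cf. (36)
    W_hat = np.linalg.inv(np.linalg.inv(W) + second_moment)
    nu_hat = nu + L
    E_Lambda = nu_hat * W_hat                        # <Lambda>, cf. (35)
    return W_hat, nu_hat, E_Lambda
```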
3). Update of $q_{\gamma}(\gamma)$: The variational optimization of $q_{\gamma}(\gamma)$ yields
$$\ln q_{\gamma}(\gamma) = \Big\langle \ln p(\gamma) + \sum_{(m,l)\in\Omega}\ln p(y_{ml} \mid x_{ml}, \gamma)\Big\rangle_{q_x} + \text{const}, \qquad (30)$$
where $y_{ml}$ and $x_{ml}$ denote the $(m,l)$-th entry of $Y$ and $X$, respectively, $\Omega \triangleq \{(m,l) : O_{ml} = 1\}$ is an index set consisting of the indices of the observed entries, and $|\Omega|$ is the cardinality of the set $\Omega$, in which $O_{ml}$ denotes the $(m,l)$-th entry of $O$.
It is easy to verify that $q_{\gamma}(\gamma)$ follows a Gamma distribution
$$q_{\gamma}(\gamma) = \operatorname{Gamma}(\gamma \mid \tilde{a}, \tilde{b}), \qquad (31)$$
with the parameters $\tilde{a}$ and $\tilde{b}$ given respectively by
$$\tilde{a} = a + \frac{|\Omega|}{2}, \qquad \tilde{b} = b + \frac{1}{2}\sum_{(m,l)\in\Omega}\big\langle (y_{ml} - x_{ml})^{2}\big\rangle, \qquad (32)$$
where
$$\big\langle (y_{ml} - x_{ml})^{2}\big\rangle = \big(y_{ml} - \langle x_{ml}\rangle\big)^{2} + \Sigma_l(m,m). \qquad (33)$$
Some of the expectations and moments used during the updates are summarized as
$$\langle\gamma\rangle = \tilde{a}/\tilde{b}, \qquad (34)$$
$$\langle\Lambda\rangle = \hat{\nu}\,\widehat{W}, \qquad (35)$$
$$\big\langle x_l x_l^{T}\big\rangle = \mu_l\mu_l^{T} + \Sigma_l, \qquad (36)$$
where $\Sigma_l(m,m)$ in (33) denotes the $m$-th diagonal entry of $\Sigma_l$.
For clarity, we summarize our algorithm as follows.
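The three updates above can be arranged into an iterative procedure; the following Python sketch shows one possible implementation, reusing the hypothetical helper functions sketched earlier. Hyperparameter values and the iteration count are illustrative, and this is not the authors' released code.

```python
import numpy as np

def bmc_gp_vb(Y, O, W, nu, a=1e-6, b=1e-6, n_iter=100):
    """Plain VB sketch (no GAMP): alternate the updates of q_x, q_Lambda, q_gamma."""
    M, L = Y.shape
    E_gamma, E_Lambda = 1.0, np.eye(M)
    mu = np.zeros((M, L))
    Sigma = np.stack([np.eye(M)] * L)
    for _ in range(n_iter):
        # 1) column-wise Gaussian posteriors: O(L M^3) with explicit inverses
        for l in range(L):
            mu[:, l], Sigma[l] = update_column_posterior(
                Y[:, l], O[:, l], E_gamma, E_Lambda)
        # 2) Wishart posterior of the precision matrix
        _, _, E_Lambda = update_precision_posterior(mu, Sigma, W, nu)
        # 3) Gamma posterior of the noise precision, cf. (31)-(34)
        resid = O * (Y - mu)
        var_obs = np.einsum('lmm->ml', Sigma) * O     # Var(x_ml) at observed entries
        a_t = a + 0.5 * O.sum()
        b_t = b + 0.5 * (np.sum(resid ** 2) + var_obs.sum())
        E_gamma = a_t / b_t
    return mu
```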
It can be easily checked that the computational complexity of our proposed method is dominated by the update of the posterior distribution $q_x(X)$, which requires computing an $M \times M$ matrix inverse $L$ times and therefore has a computational complexity scaling as $\mathcal{O}(L M^{3})$. This makes the application of our proposed method to large data sets impractical. To address this issue, in the following, we develop a computationally efficient algorithm which obtains an approximation of $q_x(X)$ by resorting to the generalized approximate message passing (GAMP) technique [35].
4 VB-GAMP
GAMP is a low-complexity Bayesian iterative technique recently developed in [36, 35] for obtaining approximate marginal posteriors. Note that the GAMP algorithm requires that both the prior distribution and the noise distribution have factorized forms [35]. Nevertheless, in our model, the prior distribution has a non-factorizable form, in which case the GAMP technique cannot be directly applied. To address this issue, we first construct a surrogate problem which aims to recover $x \in \mathbb{R}^{M}$ from linear measurements $u \in \mathbb{R}^{M}$:
$$u = A x + e, \qquad (37)$$
where $A \triangleq D^{1/2} U^{T}$ is obtained by performing a singular value decomposition of a given positive semidefinite matrix $B = U D U^{T}$ (so that $A^{T} A = B$), $U$ is a unitary matrix, $D$ is a diagonal matrix with its diagonal elements equal to the singular values of $B$, and $e$ denotes the additive Gaussian noise with zero mean and covariance matrix $\beta^{-1} I$. We assume that the entries of $x$ are mutually independent and follow the distribution:
$$p(x_m) = \begin{cases} \mathcal{N}\big(x_m \mid \theta_m, \alpha_m^{-1}\big), & \alpha_m > 0, \\ \eta, & \alpha_m = 0, \end{cases} \qquad (38)$$
where $x_m$, $\theta_m$, and $\alpha_m$ denote the $m$-th entry of $x$, $\theta$, and $\alpha$, respectively, $\eta$ is a constant, and $\theta$ and $\alpha$ are known parameters. It is noted that although the flat prior $p(x_m) = \eta$ is an improper prior distribution, it can often be used provided the corresponding posterior distribution can be correctly normalized [27]. Considering the surrogate problem (37), the posterior distribution of $x$ can be calculated as
$$p(x \mid u) = \frac{p(u \mid x)\,\prod_{m=1}^{M} p(x_m)}{\int p(u \mid x)\,\prod_{m=1}^{M} p(x_m)\,dx}, \qquad (39)$$
where $p(u \mid x) = \mathcal{N}\big(u \mid A x, \beta^{-1} I\big)$.
Taking the logarithm of $p(x \mid u)$, we have
$$\ln p(x \mid u) = -\frac{1}{2}\Big[\beta\,(u - A x)^{T}(u - A x) + (x - \theta)^{T} C\,(x - \theta)\Big] + \text{const}, \qquad (40)$$
where $C$ is a diagonal matrix with its $m$-th diagonal entry equal to $\alpha_m$. Clearly, $p(x \mid u)$ follows a Gaussian distribution with its mean and covariance matrix given by
$$\mu = \Sigma\big(\beta A^{T} u + C\theta\big), \qquad (41)$$
$$\Sigma = \big(\beta A^{T} A + C\big)^{-1}. \qquad (42)$$
Comparing (24)-(25) with (41)-(42), we can readily verify that when $u = 0$, $\beta = 1$, $B = \langle\Lambda\rangle$ (i.e. $A^{T} A = \langle\Lambda\rangle$), and $\alpha_m = \langle\gamma\rangle O_{ml}$, $\theta_m = y_{ml}$, the distribution $p(x \mid u)$ is exactly the desired posterior distribution $q_x(x_l)$. Meanwhile, notice that for the surrogate problem (37), both the prior distribution and the noise distribution are factorizable. Hence the GAMP algorithm can be directly applied to (37) to find an approximation of the posterior distribution $p(x \mid u)$. By setting $u = 0$, $\beta = 1$, $B = \langle\Lambda\rangle$, $\alpha_m = \langle\gamma\rangle O_{ml}$, and $\theta_m = y_{ml}$, an approximation of $q_x(x_l)$ in (23) can be efficiently obtained. We now proceed to derive the GAMP algorithm for the surrogate problem (37).
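One plausible way to form the matrix $A$ required above (so that $A^{T}A = \langle\Lambda\rangle$) is sketched below; whether this is exactly the construction used in (37) should be checked against the original derivation.

```python
import numpy as np

def precision_sqrt_factor(E_Lambda):
    """Form A = D^{1/2} U^T with A^T A = <Lambda>.  Since <Lambda> is symmetric
    positive semidefinite, its SVD coincides with its eigendecomposition."""
    U, d, _ = np.linalg.svd(E_Lambda)
    return np.diag(np.sqrt(d)) @ U.T
```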
4.1 Solving (37) via GAMP
GAMP was developed in a message passing-based framework. By using central-limit-theorem approximations, message passing between variable nodes and factor nodes can be greatly simplified, and loopy belief propagation on the underlying factor graph can be performed efficiently. As noted in [35], the central-limit-theorem approximations become exact in the large-system limit under an i.i.d. zero-mean sub-Gaussian measurement matrix.
Firstly, GAMP approximates the true marginal posterior distribution $p(x_m \mid u)$ by
$$\hat{p}(x_m \mid u) \propto p(x_m)\,\mathcal{N}\big(x_m \mid \hat{r}_m, \tau^{r}_{m}\big), \qquad (43)$$
where $\hat{r}_m$ and $\tau^{r}_{m}$ are quantities iteratively updated during the iterative process of the GAMP algorithm. Here, we have dropped their explicit dependence on the iteration number for simplicity. For the case $\alpha_m > 0$, substituting the prior distribution (38) into (43), it can be easily verified that the approximate posterior follows a Gaussian distribution with its mean and variance given respectively as
$$\hat{x}_m = \frac{\tau^{r}_{m}\alpha_m\theta_m + \hat{r}_m}{\tau^{r}_{m}\alpha_m + 1}, \qquad (44)$$
$$\tau^{x}_{m} = \frac{\tau^{r}_{m}}{\tau^{r}_{m}\alpha_m + 1}. \qquad (45)$$
Similarly, for the case $\alpha_m = 0$, substituting the prior distribution (38) into (43), the approximate posterior follows a Gaussian distribution with its mean and variance given respectively as
$$\hat{x}_m = \hat{r}_m, \qquad (46)$$
$$\tau^{x}_{m} = \tau^{r}_{m}. \qquad (47)$$
Another approximation is made to the noiseless output $z_i \triangleq a_i^{T} x$, where $a_i^{T}$ denotes the $i$-th row of $A$. GAMP approximates the true marginal posterior $p(z_i \mid u)$ by
$$\hat{p}(z_i \mid u) \propto p(u_i \mid z_i)\,\mathcal{N}\big(z_i \mid \hat{p}_i, \tau^{p}_{i}\big), \qquad (48)$$
where $\hat{p}_i$ and $\tau^{p}_{i}$ are quantities iteratively updated during the iterative process of the GAMP algorithm. Again, here we have dropped their explicit dependence on the iteration number. Under the additive white Gaussian noise assumption, we have $p(u_i \mid z_i) = \mathcal{N}\big(u_i \mid z_i, \beta^{-1}\big)$, where $\beta^{-1}$ denotes the $i$-th diagonal element of the noise covariance matrix $\beta^{-1} I$. Thus $\hat{p}(z_i \mid u)$ also follows a Gaussian distribution with its mean and variance given by
$$\hat{z}_i = \frac{\tau^{p}_{i}\beta u_i + \hat{p}_i}{\tau^{p}_{i}\beta + 1}, \qquad (49)$$
$$\tau^{z}_{i} = \frac{\tau^{p}_{i}}{\tau^{p}_{i}\beta + 1}. \qquad (50)$$
With the above approximations, we can now define the two scalar functions $g_{\mathrm{in}}(\cdot)$ and $g_{\mathrm{out}}(\cdot)$ that are used in the GAMP algorithm. The input scalar function $g_{\mathrm{in}}(\cdot)$ is simply defined as the posterior mean $\hat{x}_m$, i.e.
$$g_{\mathrm{in}}\big(\hat{r}_m, \tau^{r}_{m}\big) = \hat{x}_m. \qquad (51)$$
The scaled partial derivative of $g_{\mathrm{in}}(\cdot)$ with respect to $\hat{r}_m$ is the posterior variance $\tau^{x}_{m}$, i.e.
$$\tau^{r}_{m}\,\frac{\partial g_{\mathrm{in}}\big(\hat{r}_m, \tau^{r}_{m}\big)}{\partial \hat{r}_m} = \tau^{x}_{m}. \qquad (52)$$
The output scalar function $g_{\mathrm{out}}(\cdot)$ is related to the posterior mean $\hat{z}_i$ as follows
$$g_{\mathrm{out}}\big(\hat{p}_i, \tau^{p}_{i}\big) = \frac{\hat{z}_i - \hat{p}_i}{\tau^{p}_{i}}. \qquad (53)$$
The partial derivative of $g_{\mathrm{out}}(\cdot)$ is related to the posterior variance $\tau^{z}_{i}$ in the following way
$$\frac{\partial g_{\mathrm{out}}\big(\hat{p}_i, \tau^{p}_{i}\big)}{\partial \hat{p}_i} = \frac{1}{\tau^{p}_{i}}\Big(\frac{\tau^{z}_{i}}{\tau^{p}_{i}} - 1\Big). \qquad (54)$$
Given the above definitions of $g_{\mathrm{in}}(\cdot)$ and $g_{\mathrm{out}}(\cdot)$, the GAMP algorithm tailored to the considered problem (37) can now be summarized as follows (details of the derivation of the GAMP algorithm can be found in [35]), in which $a_{im}$ denotes the $(i,m)$-th entry of $A$.
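To make the recursion concrete, the snippet below sketches one sum-product GAMP iteration for the surrogate problem, using the AWGN output channel and the input prior (38) as reconstructed above. It follows the generic recursions of [35]; the un-damped scheduling and variable names are illustrative choices.

```python
import numpy as np

def gamp_iteration(A, u, beta, alpha, theta, x_hat, tau_x, s_hat):
    """One sum-product GAMP iteration for u = A x + e, e ~ N(0, (1/beta) I),
    with per-entry prior N(theta_m, 1/alpha_m) if alpha_m > 0 and a flat
    (improper) prior if alpha_m = 0."""
    A2 = A ** 2
    # output linear step
    tau_p = A2 @ tau_x
    p_hat = A @ x_hat - tau_p * s_hat
    # output nonlinear step (AWGN channel), cf. (49)-(50) and (53)-(54)
    z_hat = (tau_p * beta * u + p_hat) / (tau_p * beta + 1.0)
    tau_z = tau_p / (tau_p * beta + 1.0)
    s_hat = (z_hat - p_hat) / tau_p
    tau_s = (1.0 - tau_z / tau_p) / tau_p
    # input linear step
    tau_r = 1.0 / (A2.T @ tau_s)
    r_hat = x_hat + tau_r * (A.T @ s_hat)
    # input nonlinear step, cf. (44)-(47); alpha_m = 0 reduces to x_hat = r_hat
    x_hat = (tau_r * alpha * theta + r_hat) / (tau_r * alpha + 1.0)
    tau_x = tau_r / (tau_r * alpha + 1.0)
    return x_hat, tau_x, s_hat
```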
4.2 Discussions
We have now derived an efficient algorithm to obtain an approximate posterior distribution of $x$ for (37). Specifically, the true marginal posterior distribution of $x_m$ is approximated by a Gaussian distribution with its mean and variance given by (44)-(45) or (46)-(47), depending on the value of $\alpha_m$. The joint posterior distribution can be approximated as a product of approximate marginal posterior distributions:
$$p(x \mid u) \approx \prod_{m=1}^{M}\hat{p}(x_m \mid u). \qquad (55)$$
As indicated earlier, by setting $u = 0$, $\beta = 1$, $B = \langle\Lambda\rangle$, $\alpha_m = \langle\gamma\rangle O_{ml}$, and $\theta_m = y_{ml}$, the posterior distribution obtained via the GAMP algorithm can be used to approximate $q_x(x_l)$ in (23).
We see that to approximate $q_x(X)$ by using GAMP, we first need to perform a singular value decomposition (SVD) of $\langle\Lambda\rangle$, which has a computational complexity of $\mathcal{O}(M^{3})$ and needs to be carried out only once per outer iteration. The GAMP algorithm used to approximate each $q_x(x_l)$ involves only simple matrix-vector multiplications, with a computational complexity scaling as $\mathcal{O}(M^{2})$. Therefore the overall computational complexity for updating $q_x(X)$ is of order $\mathcal{O}(M^{3} + L M^{2})$. In contrast, using (24)-(25) to update $q_x(X)$ requires a computational complexity of $\mathcal{O}(L M^{3})$. Thus the GAMP technique can help achieve a significant reduction in computational complexity as compared with a direct calculation of $q_x(X)$.
For clarity, the VB-GAMP algorithm for matrix completion is summarized as follows.
The proposed method proceeds in a double-loop manner: the outer loop calculates the variational posterior distributions $q_{\Lambda}(\Lambda)$ and $q_{\gamma}(\gamma)$, and the inner loop computes an approximation of $q_x(X)$. It is noted that there is no need to wait until the GAMP iterations converge. Experimental results show that GAMP provides a reliable approximation of $q_x(X)$ even if only a few iterations are performed. In our experiments, only one inner iteration is used to implement GAMP.
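The double-loop structure described above can be sketched as follows, reusing the hypothetical helpers from the previous sections (precision_sqrt_factor, gamp_iteration) and performing a single inner GAMP iteration per outer pass; hyperparameter values are illustrative.

```python
import numpy as np

def bmc_gp_gamp(Y, O, W, nu, a=1e-6, b=1e-6, n_outer=100):
    """Sketch of the VB-GAMP double loop: outer VB updates, inner GAMP pass."""
    M, L = Y.shape
    E_gamma, E_Lambda = 1.0, np.eye(M)
    X_hat, tau_x = np.zeros((M, L)), np.ones((M, L))
    s_hat = np.zeros((M, L))
    for _ in range(n_outer):
        A = precision_sqrt_factor(E_Lambda)           # one SVD per outer iteration
        for l in range(L):                             # inner loop: one GAMP iteration
            alpha, theta = E_gamma * O[:, l], Y[:, l]
            X_hat[:, l], tau_x[:, l], s_hat[:, l] = gamp_iteration(
                A, np.zeros(M), 1.0, alpha, theta,
                X_hat[:, l], tau_x[:, l], s_hat[:, l])
        # outer loop: refresh q(Lambda) and q(gamma) from the GAMP moments
        second_moment = X_hat @ X_hat.T + np.diag(tau_x.sum(axis=1))
        E_Lambda = (nu + L) * np.linalg.inv(np.linalg.inv(W) + second_moment)
        resid2 = np.sum((O * (Y - X_hat)) ** 2) + np.sum(O * tau_x)
        E_gamma = (a + 0.5 * O.sum()) / (b + 0.5 * resid2)
    return X_hat
```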
5 Experiments
In this section, we carry out experiments to illustrate the performance of our proposed GAMP-assisted Bayesian matrix completion method with hierarchical Gaussian priors (referred to as BMC-GP-GAMP). Throughout our experiments, the parameters $a$, $b$, and $\nu$ used in our method are set to small values; the choice of the matrix parameter $W$ is specified in each experiment. Here we choose a small $\nu$ in order to encourage a low-rank precision matrix. We compare our method with several state-of-the-art methods, namely, the variational sparse Bayesian learning method (also referred to as VSBL) [20], which models the low-rankness of the matrix as the structural sparsity of its two factor matrices, the bilinear GAMP-based matrix completion method (also referred to as BiGAMP-MC) [23], which implements VSBL using bilinear GAMP, the inexact version of the augmented Lagrange multiplier based matrix completion method (also referred to as ALM-MC) [15], and a low-rank matrix fitting method (also referred to as LMaFit) [37], which iteratively minimizes the fitting error and estimates the rank of the matrix. It should be noted that both VSBL and LMaFit require setting an overestimated rank. Note that we did not include [24] in our experiments due to its prohibitive computational complexity when the matrix dimension is large. Codes of our proposed algorithm along with the other competing algorithms are available at http://www.junfang-uestc.net/codes/LRMC.rar, in which the codes of the other competing algorithms are obtained from their respective websites.
5.1 Synthetic data
We first examine the effectiveness of our proposed method on synthetic data. We generate a rank-$r$ test matrix $X$ of size $M \times L$ by multiplying a factor matrix $A$ by the transpose of a factor matrix $B$, i.e. $X = A B^{T}$. All the entries of $A$ and $B$ are sampled from a normal distribution. We consider two scenarios with different fractions of observed entries of $X$, where $\rho$ denotes the sampling ratio. The success rates as well as the run times of the respective algorithms as a function of the rank $r$ are plotted in Fig. 2. Results are averaged over 25 independent trials. A trial is considered successful if the relative error $\|\hat{X} - X\|_{F}/\|X\|_{F}$ is smaller than a prescribed threshold, where $\hat{X}$ denotes the estimated matrix. For our proposed method, the matrix parameter $W$ is set to the default choice described in Section 2. The predefined overestimated rank for VSBL and LMaFit is set to twice the true rank. For one of the sampling ratios considered, VSBL and BiGAMP-MC present the same recovery performance, with their curves overlapping each other. From Fig. 2, we can see that:

1) Our proposed method presents the best performance for both sampling ratios. Meanwhile, it has a moderate computational complexity. When the sampling ratio is set to 0.5, our proposed method has a run time similar to that of the ALM-MC method, while providing a clear performance improvement over ALM-MC.

2) The LMaFit method is the most computationally efficient, but its performance is not as good as that of our proposed method.

3) The proposed method outperforms the other two Bayesian methods, namely VSBL and BiGAMP-MC, by a big margin in terms of both recovery accuracy and computational complexity. Since BiGAMP cannot automatically determine the matrix rank, it needs to try all possible values of the rank, which makes running BiGAMP-MC time-consuming.
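The synthetic-data setup described above can be reproduced in spirit with the short sketch below; the matrix size, rank, sampling ratio, and success threshold are illustrative placeholders rather than the exact values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, r, rho = 100, 100, 5, 0.5          # illustrative size, rank, and sampling ratio

# Rank-r test matrix X = A B^T with standard normal factors, plus a random mask.
A, B = rng.standard_normal((M, r)), rng.standard_normal((L, r))
X = A @ B.T
O = (rng.random((M, L)) < rho).astype(float)
Y = O * X                                # observed entries (zeros elsewhere)

def relative_error(X_hat, X):
    return np.linalg.norm(X_hat - X, 'fro') / np.linalg.norm(X, 'fro')

# X_hat = some_matrix_completion_method(Y, O)
# success = relative_error(X_hat, X) < threshold   # threshold as specified in the text
```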
Table 1. Allelic-imputation error rates on the Chr22 dataset under the two observation ratios (fewer observed entries / more observed entries).

Method | Fewer observed entries | More observed entries
BMC-GP-GAMP | 0.0567 | 0.0233
VSBL | 0.0587 | 0.0249
LMaFit | 0.2525 | 0.2472
BiGAMP-MC | 0.0573 | 0.0282
ALM-MC | 0.0550 | 0.0246
5.2 Gene data
We carry out experiments on gene data for genotype estimation. The dataset [4] is a matrix provided by the Wellcome Trust Case Control Consortium (WTCCC) and contains genetic information from chromosome 22. The dataset, which is referred to as "Chr22", has been shown in [4] to be approximately low-rank. We randomly select a fraction of the entries of the dataset as observations (two observation ratios are considered), and recover the remaining entries using low-rank matrix completion methods. Again, for our proposed method, the matrix parameter $W$ is set to the default choice described in Section 2. The predefined ranks used for VSBL and LMaFit are both set to an overestimated value. Following [4], we use a metric termed the "allelic-imputation error rate" to evaluate the performance of the respective methods. The error rate is defined as
$$\text{error rate} = \frac{\big\|\bar{O} \odot \big(\operatorname{round}(\hat{X}) - X\big)\big\|_{0}}{N_{u}}, \qquad (56)$$
where $X$ and $\hat{X}$ denote the true and the estimated matrices, respectively, the operation $\operatorname{round}(\cdot)$ returns a matrix with each entry rounded to its nearest integer, $\|\cdot\|_{0}$ counts the number of nonzero entries, $\bar{O}$ indicates the unobserved entries, and $N_{u}$ denotes the number of unobserved entries. We report the average error rates of the respective algorithms in Table 1. From Table 1, we see that all methods, except the LMaFit method, present similar results, and the proposed method slightly outperforms the other methods when a larger fraction of the entries is observed. Despite its superior performance on synthetic data, the LMaFit method incurs large estimation errors on this dataset.
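The allelic-imputation error rate of (56) can be computed, for instance, as follows; the function name and interface are illustrative.

```python
import numpy as np

def allelic_imputation_error_rate(X_true, X_hat, O):
    """Fraction of unobserved entries whose rounded estimate differs from the truth."""
    unobserved = (O == 0)
    mismatches = np.round(X_hat[unobserved]) != X_true[unobserved]
    return np.count_nonzero(mismatches) / np.count_nonzero(unobserved)
```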
Table 2. NMAE of the respective methods on the MovieLens 100k dataset under the two settings (fewer observed ratings / more observed ratings).

Method | Fewer observed ratings | More observed ratings
BMC-GP-GAMP | 0.1931 | 0.1851
VSBL | 0.2004 | 0.1847
LMaFit | 0.2677 | 0.2354
BiGAMP-MC | 0.2009 | 0.1856
ALM-MC | 0.2002 | 0.1893
5.3 Collaborative Filtering
In this experiment, we study the performance of the respective methods on the task of collaborative filtering. We use the MovieLens 100k dataset, which contains 100,000 ratings (integers from 1 to 5) given by 943 users to 1682 movies. A subset of the available ratings is used as observations, and the prediction accuracy on the remaining available ratings is measured by the normalized mean absolute error (NMAE), defined as
$$\text{NMAE} = \frac{\sum_{(m,l)\in\Gamma}\big|\hat{X}_{ml} - X_{ml}\big|}{|\Gamma|\,(r_{\max} - r_{\min})}, \qquad (57)$$
where $\Gamma$ is a set containing the indices of those unobserved available ratings, and $r_{\max}$ and $r_{\min}$ denote the maximum and minimum ratings, respectively. The results of NMAE are shown in Table 2, from which we see that the proposed method achieves the most accurate rating prediction when the number of observed ratings is small.
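The NMAE of (57) can be computed as in the sketch below; the 1-5 rating range is the MovieLens convention, and the function interface is illustrative.

```python
import numpy as np

def nmae(X_true, X_hat, test_mask, r_max=5.0, r_min=1.0):
    """NMAE over the held-out available ratings indicated by test_mask."""
    diff = np.abs(X_hat[test_mask] - X_true[test_mask])
    return diff.sum() / (np.count_nonzero(test_mask) * (r_max - r_min))
```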
Table 3. PSNR/SSIM of the respective methods for image inpainting on the Monarch and Lena images, each under two observation ratios (fewer / more observed pixels).

Method | Monarch (fewer) | Monarch (more) | Lena (fewer) | Lena (more)
BMC-GP-GAMP-I | 19.3328/0.4942 | 23.8965/0.7066 | 23.4235/0.4946 | 25.6866/0.5991
BMC-GP-GAMP-II | 20.8628/0.6960 | 25.6955/0.8705 | 25.3337/0.7479 | 28.0434/0.8328
BMC-GP-GAMP-III | 21.1797/0.6710 | 25.6665/0.8422 | 25.2876/0.7388 | 27.9684/0.8213
VSBL | 16.5478/0.3625 | 19.9965/0.5468 | 21.5168/0.4999 | 23.8180/0.5964
LMaFit | 17.6527/0.4018 | 19.2834/0.5117 | 21.5525/0.4494 | 22.3862/0.5504
BiGAMP-MC | 18.9154/0.4618 | 22.3883/0.6338 | 22.6557/0.4869 | 24.8173/0.5997
ALM-MC | 19.6250/0.5149 | 23.6854/0.7253 | 23.1028/0.5164 | 25.5310/0.6433
5.4 Image Inpainting
Lastly, we evaluate the performance of different methods on image inpainting. The objective of image inpainting is to complete an image with missing pixels. We conduct experiments on the benchmark images Monarch (butterfly) and Lena. In our experiments, we examine the performance of our proposed method under different choices of $W$. As usual, we can set $W^{-1} = \epsilon I$; such a choice of $W$ is referred to as BMC-GP-GAMP-I. We can also set $W$ according to (10) and (12), which are respectively referred to as BMC-GP-GAMP-II and BMC-GP-GAMP-III. The parameters in (12) and (13) are set to fixed values in all experiments. As discussed earlier in our paper, the latter two choices exploit both the low-rankness and the smoothness of the signal. For each image, we consider two cases with different fractions of observed pixels. We report the peak signal-to-noise ratio (PSNR) as well as the structural similarity (SSIM) index of each algorithm in Table 3. The original images with missing pixels and the images reconstructed by the respective algorithms are shown in Figs. 3, 4, 5, and 6. From Table 3, we see that with a common choice of $W$, our proposed method BMC-GP-GAMP-I outperforms the other methods in most cases. When $W$ is more carefully devised, our proposed methods BMC-GP-GAMP-II and BMC-GP-GAMP-III surpass the other methods by a substantial margin in terms of both the PSNR and SSIM metrics. This result indicates that a careful choice of $W$ that captures both the low-rank structure and the smoothness of the latent matrix can help substantially improve the recovery performance. From the reconstructed images, we also see that our proposed method, especially BMC-GP-GAMP-II and BMC-GP-GAMP-III, provides the best visual quality among all these methods.
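For reference, PSNR and SSIM for a reconstructed gray-scale image can be computed with scikit-image as sketched below; this is a generic way of evaluating the metrics, not necessarily the exact evaluation code used for Table 3.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def inpainting_scores(img_true, img_rec):
    """PSNR and SSIM for a gray-scale image with pixel values in [0, 255]."""
    img_true = img_true.astype(np.float64)
    img_rec = np.clip(img_rec, 0.0, 255.0).astype(np.float64)
    psnr = peak_signal_noise_ratio(img_true, img_rec, data_range=255)
    ssim = structural_similarity(img_true, img_rec, data_range=255)
    return psnr, ssim
```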
6 Conclusions
The problem of low-rank matrix completion was studied in this paper. A hierarchical Gaussian prior model was proposed to promote the low-rank structure of the underlying matrix, in which columns of the low-rank matrix are assumed to be mutually independent and follow a common Gaussian distribution with zero mean and a precision matrix. The precision matrix is treated as a random parameter, with a Wishart distribution specified as a hyperprior over it. Based on this hierarchical prior model, we developed a variational Bayesian method for matrix completion. To avoid cumbersome matrix inverse operations, the GAMP technique was used and embedded in the variational Bayesian inference, which resulted in an efficient VB-GAMP algorithm. Empirical results on synthetic and real datasets show that our proposed method offers competitive performance for matrix completion, and meanwhile achieves a significant reduction in computational complexity.
7 Detailed Derivation of (7)
We provide a detailed derivation of (7). We have