Robust Matrix Completion via Maximum Correntropy Criterion and Half-Quadratic Optimization

Yicong He, Fei Wang, Yingsong Li, Jing Qin, Badong Chen

Yicong He, Fei Wang and Badong Chen are with the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, China; e-mails: heyicong@stu.xjtu.edu.cn, wfx@xjtu.edu.cn, chenbd@xjtu.edu.cn. Yingsong Li is with the College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China, and the National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China; e-mail: liyingsong@ieee.org. Jing Qin is with the Center of Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China; e-mail: harryqinjingcn@gmail.com.
Abstract

Robust matrix completion aims to recover a low-rank matrix from a subset of noisy entries perturbed by complex noises, where traditional matrix completion methods may perform poorly because they rely on the $\ell_2$ error norm in optimization. In this paper, we propose a novel and fast robust matrix completion method based on the maximum correntropy criterion (MCC). The correntropy based error measure is utilized instead of the $\ell_2$-based error norm to improve the robustness to noises. Using the half-quadratic optimization technique, the correntropy based optimization can be transformed into a weighted matrix factorization problem. Two efficient algorithms are then derived: an alternating minimization based algorithm and an alternating gradient descent based algorithm. The proposed algorithms do not need to calculate the singular value decomposition (SVD) at each iteration. Further, an adaptive kernel selection strategy is proposed to accelerate the convergence speed as well as improve the performance. Simulations comparing the new methods with existing robust matrix completion algorithms show that they achieve better performance than existing state-of-the-art algorithms.

I Introduction

Matrix completion is a technique that fills in the missing entries of a partially observed low-rank matrix [1, 2, 3]. It takes advantage of the low-rank property of the matrix, which often holds in fields such as computer vision [4, 5], recommender systems [6] and machine learning [7], where the data can usually be well approximated by a low-rank representation or structure. A typical application of matrix completion is the recommender system [8]. In a recommender system, how to guide users’ behavior based on existing data is a crucial problem. For example, Netflix, the world’s largest online movie rental service, wants to recommend movies that might be of interest to users based on their behavior (ratings of various types of movies). The consistency of user behavior can be expressed by a low-rank property, so matrix completion can be applied to predict the missing ratings of the users [9, 10].

Although matrix completion is, in general, an NP-hard problem, in the last decades various algorithms have been proposed to tackle it and have shown accurate reconstruction performance. In particular, by formulating the problem as a constrained rank (or nuclear norm) minimization problem, several algorithms have been developed, including normalized iterative hard thresholding (NIHT) [11], iterative soft thresholding (IST) [12], singular value thresholding (SVT) [13] and fixed point continuation (FPC) [14]. These algorithms need to calculate a full or truncated singular value decomposition (SVD) at each iteration, which may cause high computational complexity, especially when the data scale is large. To avoid the heavy SVD computation, the matrix factorization model has recently been applied to the matrix completion problem [3, 15, 16, 17]. In matrix factorization, the target matrix is represented as the product of two matrices so that the low-rank property is guaranteed. Algorithms based on matrix factorization thus naturally overcome the low efficiency caused by SVD computation. Representative algorithms include PowerFactorization (PF) [18], Low-rank Matrix Fitting (LMaFit) [19] and alternating steepest descent (ASD) [20].

Traditional matrix completion often utilizes the $\ell_2$ error norm in optimization, which performs well under the Gaussian noise assumption. However, in real applications, the observations may contain outliers. For example, in recommender systems, the ratings may contain human errors and thus be unreliable. In this case, the $\ell_2$ error norm may not properly represent the error statistics, and the performance of traditional matrix completion algorithms may degrade. To improve the robustness against outliers, several robust matrix completion algorithms have been proposed. In [21], the authors utilize an $\ell_p$ error norm based cost function and solve the optimization problem by $\ell_p$ regression and the alternating direction method of multipliers (ADMM). In [22], the authors propose two new robust loss functions and utilize a distributed optimization framework [23] to solve the problem in parallel.

In recent years, an information theoretic learning (ITL) [24] measure called correntropy has been proposed to deal with robust learning problems [25, 26]. Correntropy is a smooth local similarity measure, which has its root in Renyi’s entropy [27]. With a Gaussian kernel, correntropy involves all the even moments of the error and is insensitive to large outliers [28]. Compared with conventional norm based measures, correntropy based methods can achieve better performance, especially when the outliers are large [29, 30]. Applying correntropy to matrix completion, correntropy based iterative soft and hard thresholding strategies have been proposed [31]. However, as mentioned before, the iterative thresholding based algorithms need to compute the SVD and suffer from a high computational cost when the data scale is large.

In the present paper, we combine correntropy with the matrix factorization method and propose a new cost function for robust matrix completion. The correntropy measure is utilized instead of the $\ell_2$ norm, so the negative effects of outliers can be alleviated. Using matrix factorization, there is no need to compute the SVD at each iteration. Further, to efficiently solve the correntropy based optimization, the half-quadratic (HQ) technique is adopted [32]. Using HQ, the complex optimization problem can be transformed into a quadratic optimization, and traditional quadratic optimization methods can be applied.

Based on HQ, we propose two algorithms for robust matrix completion. The first algorithm utilizes the traditional alternating minimization method [17]. At each minimization step, the correntropy based optimization is transformed into a weighted least squares problem so that the solution can be obtained iteratively. The second algorithm directly transforms the correntropy based cost into a weighted matrix completion problem and then utilizes the alternating steepest descent (ASD) method [20]. Both algorithms utilize the HQ technique, but in different ways for optimization. Moreover, taking advantage of the properties of correntropy, we propose an adaptive kernel width selection strategy for the proposed algorithms to improve the convergence speed as well as the reconstruction accuracy. In summary, the main contributions of this paper are:

1) A new cost function for robust matrix completion is proposed.

2) Two efficient algorithms are developed via HQ techniques.

3) Extensive simulations demonstrate the superior performance of the proposed algorithms compared with other state-of-the-art algorithms.

The paper is organized as follows. In Section II we briefly review the concepts of matrix completion and the maximum correntropy criterion. In Section III we propose the new correntropy based matrix completion cost and derive two HQ based algorithms. In Section IV, simulation results are presented to demonstrate the reconstruction performance. Finally, the conclusion is given in Section V.

II Preliminaries

II-A Matrix completion

Consider a low-rank matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ of which only a subset of entries can be observed. In particular, define the observation matrix $\mathbf{\Omega} \in \{0,1\}^{m \times n}$ where

$[\mathbf{\Omega}]_{ij}=\begin{cases}1, & \text{if entry } (i,j) \text{ is observed}\\ 0, & \text{otherwise}\end{cases}$   (1)

the observed matrix can be represented as $\mathbf{\Omega} \odot \mathbf{M}$, where $\odot$ denotes the Hadamard (element-wise) product. The goal of matrix completion is to recover all entries of $\mathbf{M}$ based on the observed entries and the low-rank property. In detail, matrix completion can be formulated as the following constrained minimization problem

$\min_{\mathbf{X}} \ \operatorname{rank}(\mathbf{X}) \quad \text{s.t.} \quad \mathbf{\Omega} \odot \mathbf{X} = \mathbf{\Omega} \odot \mathbf{M}$   (2)

where $\mathbf{X}$ is the recovered matrix.

The above optimization is an NP-hard, non-convex problem. In the last decades, various methods have been proposed to deal with this matrix completion problem. Typically, there are three approaches:

1) Direct approach: Although Eq.(2) is NP-hard, methods based on the iterative hard thresholding (IHT) technique [33] can still be directly applied to the optimization problem. Similar to compressive sensing, at each iteration the IHT approach takes a gradient descent step to decrease a measurement fidelity objective and then performs the best rank-$r$ approximation. Note that to obtain the largest $r$ singular values at each iteration, a truncated SVD has to be performed. Moreover, the normalized IHT (NIHT) [11] has also been introduced to matrix completion and shows better performance than IHT.

2) Convex relaxation: A popular method for solving a non-convex optimization problem is to relax it into a convex problem. In particular, the convex nuclear norm is often used to replace the non-convex rank in the minimization, i.e.

$\min_{\mathbf{X}} \ \|\mathbf{X}\|_* \quad \text{s.t.} \quad \mathbf{\Omega} \odot \mathbf{X} = \mathbf{\Omega} \odot \mathbf{M}$   (3)

where $\|\mathbf{X}\|_*$ denotes the sum of all singular values of $\mathbf{X}$. Semidefinite programming (SDP) [34] and iterative soft thresholding (IST) [12] algorithms can be applied to solve Eq.(3). Note that to obtain the singular values, the SVD still has to be performed at each iteration.

3) Matrix factorization: The above two approaches are both SVD based and may suffer from high computational complexity when dealing with large scale data. Matrix factorization is a simple way to tackle this problem. Specifically, the recovered matrix is factorized as the product of two matrices $\mathbf{U} \in \mathbb{R}^{m \times r}$ and $\mathbf{V} \in \mathbb{R}^{r \times n}$, where $r$ is the rank of $\mathbf{X}$. Matrix factorization then solves the matrix completion problem by utilizing the following objective function

$\min_{\mathbf{U},\mathbf{V}} \ \|\mathbf{\Omega} \odot (\mathbf{M} - \mathbf{U}\mathbf{V})\|_F^2$   (4)

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Eq.(4) can be solved via alternating optimization methods. Representative algorithms include PowerFactorization (PF) [18], Low-rank Matrix Fitting (LMaFit) [19] and alternating steepest descent (ASD) [20].
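As a rough illustration of this family of methods (not the exact PF, LMaFit or ASD implementations), a minimal NumPy sketch of one alternating least squares sweep for Eq.(4) is given below; the function and variable names are ours, and a small ridge term eps is added only for numerical stability:

```python
import numpy as np

def pf_step(M, Omega, U, V, eps=1e-8):
    """One alternating least squares sweep for min ||Omega .* (M - U V)||_F^2."""
    m, n = M.shape
    r = U.shape[1]
    # Update each row of U with V fixed, using only the observed columns of that row.
    for i in range(m):
        idx = np.where(Omega[i] > 0)[0]
        if idx.size:
            Vi = V[:, idx]                                   # r x |idx|
            U[i] = np.linalg.solve(Vi @ Vi.T + eps * np.eye(r), Vi @ M[i, idx])
    # Update each column of V with U fixed, using only the observed rows of that column.
    for j in range(n):
        idx = np.where(Omega[:, j] > 0)[0]
        if idx.size:
            Uj = U[idx]                                      # |idx| x r
            V[:, j] = np.linalg.solve(Uj.T @ Uj + eps * np.eye(r), Uj.T @ M[idx, j])
    return U, V
```

Repeating such sweeps until the fit stops improving is essentially the alternating minimization strategy that the robust algorithms in Section III build upon.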

II-B Maximum Correntropy Criterion

Correntropy is a local and nonlinear similarity measure between two random variables within a "window" in the joint space determined by the kernel width. Given two random variables $X$ and $Y$, the correntropy is defined by [28]

$V(X,Y) = \mathrm{E}\left[\kappa(X,Y)\right] = \int \kappa(x,y)\, dF_{XY}(x,y)$   (5)

where $\kappa(\cdot,\cdot)$ is a shift-invariant Mercer kernel, and $F_{XY}(x,y)$ denotes the joint distribution function of $(X,Y)$. Given a finite number of samples $\{(x_i, y_i)\}_{i=1}^{N}$, the correntropy can be approximated by

$\hat{V}(X,Y) = \frac{1}{N}\sum_{i=1}^{N} \kappa(x_i, y_i)$   (6)

In general, the kernel function of correntropy is the Gaussian kernel

$\kappa_\sigma(x,y) = \exp\left(-\frac{e^2}{2\sigma^2}\right)$   (7)

where $e = x - y$ and $\sigma$ is the kernel width.

Compared with the $\ell_2$ norm based second-order statistics of the error, correntropy involves all the even moments of the difference between $X$ and $Y$ and is insensitive to outliers. Replacing the second-order measure with the correntropy measure leads to the maximum correntropy criterion (MCC) [35]. The MCC solution is obtained by maximizing the following utility function

$\max \ \frac{1}{N}\sum_{i=1}^{N} \kappa_\sigma(e_i)$   (8)

Moreover, in practice, the MCC can also be formulated as minimizing the following correntropy-induced loss (C-loss) function [36, 37]

$J_{\mathrm{C}} = \frac{1}{N}\sum_{i=1}^{N}\left[1 - \exp\left(-\frac{e_i^2}{2\sigma^2}\right)\right]$   (9)

The above cost function is closely related to Welsch’s cost function originally introduced in [38]. The C-loss functions with different kernel widths are shown in Fig.1. One can observe that the C-loss function can effectively alleviate the impact of large errors. Moreover, selecting different kernel widths adjusts the range of sensitivity to outliers, while the error measure near zero is not strongly affected.
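To make the measures concrete, the following sketch (assuming the Gaussian kernel of Eq.(7); the function and variable names are ours) evaluates the sample correntropy of Eq.(6) and the corresponding C-loss for an error vector:

```python
import numpy as np

def correntropy(e, sigma):
    """Sample correntropy of an error vector e = x - y under a Gaussian kernel."""
    return np.mean(np.exp(-e**2 / (2.0 * sigma**2)))

def c_loss(e, sigma):
    """Correntropy-induced loss: grows with small errors, saturates for large outliers."""
    return np.mean(1.0 - np.exp(-e**2 / (2.0 * sigma**2)))

# A single large outlier dominates the squared error but barely moves the C-loss:
e = np.array([0.1, -0.2, 0.05, 50.0])
print(c_loss(e, sigma=1.0), np.mean(e**2))
```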

Fig. 1: Curves of loss value versus error for different cost functions.

III Proposed Algorithms

In this section, we combine matrix factorization with the correntropy measure and propose a new objective function. We then derive two efficient algorithms to solve the resulting optimization problem.

Eq.(4) can be further rewritten as the sum of weighted squared residuals

$\min_{\mathbf{U},\mathbf{V}} \ \sum_{i,j} \Omega_{ij}\left(M_{ij} - [\mathbf{U}\mathbf{V}]_{ij}\right)^2$   (10)

where $M_{ij}$ denotes the $(i,j)$-th entry of matrix $\mathbf{M}$. When the observed entries contain large outliers, the $\ell_2$ error measure may not work well because the outliers may heavily bias the optimization. To improve the robustness, in this paper we introduce correntropy as the error measure. In particular, by replacing the $\ell_2$ error measure with correntropy, one obtains the following new optimization problem:

$\max_{\mathbf{U},\mathbf{V}} \ \sum_{i,j} \Omega_{ij}\exp\left(-\frac{\left(M_{ij} - [\mathbf{U}\mathbf{V}]_{ij}\right)^2}{2\sigma^2}\right)$   (11)

The above maximization can equivalently be written as the following minimization problem

$\min_{\mathbf{U},\mathbf{V}} \ \sum_{i,j} \Omega_{ij}\left[1 - \exp\left(-\frac{\left(M_{ij} - [\mathbf{U}\mathbf{V}]_{ij}\right)^2}{2\sigma^2}\right)\right]$   (12)

The formulation of the above correntropy based optimization is closely related to [39] and [40]. In particular, when matrix $\mathbf{M}$ is fully observed (i.e. $\Omega_{ij} = 1$ for all $i,j$), Eq.(12) equals the optimization of robust PCA based on MCC [39]. Further, when one additionally imposes the constraints that both $\mathbf{U}$ and $\mathbf{V}$ are non-negative matrices, the optimization in Eq.(12) becomes equivalent to the correntropy based nonnegative matrix factorization problem [40]. However, due to the existence of the observation matrix $\mathbf{\Omega}$ in Eq.(12), the solutions in [39] and [40] are no longer suitable for matrix completion. Thus one should seek new approaches to solve Eq.(12).

III-A Optimization via half-quadratic

In general, the correntropy based objective function in Eq.(12) is difficult to minimize directly. To tackle this problem, the half-quadratic (HQ) technique has been applied to optimize such correntropy based cost functions [39, 40, 41, 42]. By introducing an additional auxiliary variable, HQ reformulates a nonquadratic cost function as an augmented objective function in an enlarged parameter space.

According to half-quadratic theory [43], for the Gaussian function there exists a convex conjugate function such that

(13)

where and the maximum is reached at . Eq.(13) can be further rewritten as

(14)

By defining and , the above equation can be further derived as

(15)

Thus, as shown above, minimizing the nonconvex C-loss function is equivalent to minimizing an augmented cost function in an enlarged parameter space. Therefore, by substituting Eq.(15) into Eq.(11), the correntropy based objective function can be further formulated as

(16)

Further, by defining the augmented cost function

(17)

where , we have the following relation

(18)

and the correntropy based optimization problem can be formulated as a half-quadratic based optimization. Similar to [21], treating $\mathbf{U}$ and $\mathbf{V}$ as a whole, the optimization can be solved with the following alternating minimization procedure:

1) Optimizing the auxiliary variables: from Eq.(13) and Eq.(15) one can observe that, for a given residual, the optimum is reached at a closed-form value of the auxiliary variable. Therefore, given fixed $\mathbf{U}$ and $\mathbf{V}$, the optimal auxiliary weights can be obtained as

$w_{ij} = \exp\left(-\frac{\left(M_{ij} - [\mathbf{U}\mathbf{V}]_{ij}\right)^2}{2\sigma^2}\right)$   (19)

We should note that when $\Omega_{ij} = 0$, the optimal $w_{ij}$ is unavailable because the entry is not observed. However, computing $w_{ij}$ for $\Omega_{ij} = 0$ does not affect the solution of Eq.(12) and Eq.(17) because of the multiplication with $\Omega_{ij}$. To simplify the expression, in the following we directly use $\mathbf{W}$ for the full weighting matrix. In practical applications, one only needs to calculate $w_{ij}$ for $\Omega_{ij} = 1$.

2) Given a fixed weighting matrix $\mathbf{W}$, Eq.(17) becomes a weighted matrix completion optimization problem

$\min_{\mathbf{U},\mathbf{V}} \ \sum_{i,j} \Omega_{ij}\, w_{ij}\left(M_{ij} - [\mathbf{U}\mathbf{V}]_{ij}\right)^2$   (20)

The weighting matrix $\mathbf{W}$ assigns different weights based on the error residuals. According to the property of the Gaussian function, a large error is assigned a small weight, such that the negative impact of outliers is greatly reduced. So far, there are no existing algorithms that directly solve Eq.(17). In the following, we adapt two methods of traditional matrix completion and propose two efficient algorithms to solve the above weighted matrix completion optimization.
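In code, one HQ re-weighting step of Eq.(19) amounts to evaluating a Gaussian of each observed residual; a minimal NumPy sketch (with our variable names) is:

```python
import numpy as np

def hq_weights(M, Omega, U, V, sigma):
    """HQ auxiliary weights: Gaussian of the current residuals on observed entries.
    Large residuals (likely outliers) receive weights close to zero."""
    E = Omega * (M - U @ V)                      # residuals on observed entries only
    return Omega * np.exp(-E**2 / (2.0 * sigma**2))
```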

III-B Correntropy based PowerFactorization algorithm

In this part, we utilize the alternating minimization method to solve the correntropy based matrix completion problem. Alternating minimization is a widely used method for solving matrix factorization based optimization problems, and the representative algorithm is PowerFactorization (PF). In particular, for Eq.(4), PF alternately minimizes over $\mathbf{U}$ and $\mathbf{V}$ at each iteration, i.e. it fixes one factor matrix and optimizes the other. Similar to traditional PF, for the correntropy based optimization in Eq.(12), we can also alternately optimize $\mathbf{U}$ and $\mathbf{V}$ as follows

$\mathbf{U}^{t+1} = \arg\min_{\mathbf{U}} \sum_{i,j} \Omega_{ij}\left[1-\exp\left(-\frac{\left(M_{ij}-[\mathbf{U}\mathbf{V}^{t}]_{ij}\right)^2}{2\sigma^2}\right)\right]$   (21)
$\mathbf{V}^{t+1} = \arg\min_{\mathbf{V}} \sum_{i,j} \Omega_{ij}\left[1-\exp\left(-\frac{\left(M_{ij}-[\mathbf{U}^{t+1}\mathbf{V}]_{ij}\right)^2}{2\sigma^2}\right)\right]$   (22)

where $t$ denotes the iteration number. The HQ technique can then be utilized for each minimization step in Eq.(21) or Eq.(22). In particular, similar to Eq.(18), given a fixed $\mathbf{V}$, Eq.(21) can be further rewritten as

(23)

The above minimization problem can be solved by the alternating minimization described in the previous subsection. To distinguish the iteration procedures, we denote the alternating minimization for Eq.(23) as the inner iteration, while the iterations in Eq.(21) and Eq.(22) are the outer iterations. Therefore, at each inner iteration we first obtain the optimal weights from Eq.(19) and then solve

(24)

Due to the existence of the observation matrix $\mathbf{\Omega}$, Eq.(24) does not have an explicit solution. To solve this problem, similar to [21], by defining

(25)

Eq.(24) can be optimized by solving the following subproblems:

(26)

where the diagonal matrix has entries composed of the elements of the corresponding weight vector. For each subproblem, by dropping the zero diagonal entries, one can finally obtain

(27)

where denotes the index set of non-zero entries of , such that and where is the cardinality of . is a diagonal matrix with entries

(28)

Eq.(27) is essentially a weighted least squares problem and has an explicit solution

(29)

Thus each row can be obtained by alternately computing Eq.(28) and Eq.(29) until convergence. The same iteration procedure can be applied to Eq.(22) with $\mathbf{U}$ fixed.
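A minimal sketch of this inner weighted least squares update for a single row of $\mathbf{U}$, with the other factor and the HQ weights fixed, could look as follows; the function name, variable names and the small regularizer eps are ours and not the paper's notation:

```python
import numpy as np

def weighted_row_update(m_row, omega_row, w_row, V, eps=1e-8):
    """Solve min_u  sum_j omega_j * w_j * (m_j - u^T V[:, j])^2  for one row of U."""
    idx = np.where(omega_row > 0)[0]          # observed columns of this row
    Vi = V[:, idx]                            # r x |idx|
    d = w_row[idx]                            # HQ weights of the observed entries
    A = (Vi * d) @ Vi.T                       # r x r weighted Gram matrix
    b = (Vi * d) @ m_row[idx]                 # r-dimensional weighted right-hand side
    r = V.shape[0]
    return np.linalg.solve(A + eps * np.eye(r), b)
```

Because each row (and, symmetrically, each column of $\mathbf{V}$) is solved independently, these updates can be dispatched to different workers in a distributed setting.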

The above algorithm is called the half-quadratic based PowerFactorization (HQ-PF) algorithm. In the following, we give two propositions for HQ-PF.

Proposition 1: For a non-increasing kernel width, the cost sequence generated by HQ-PF will converge.

Proof: According to the properties of alternating minimization, for a fixed kernel width we can obtain

(30)

Moreover, considering the following function in terms of

(31)

Taking the derivative of Eq.(31), we can obtain

(32)

The (a) holds since for . Therefore monotonically increases as increases, and then for

will always be satisfied. Thus, for a non-increasing kernel width, the cost sequence is monotonically non-increasing. It can also be verified that the cost is always bounded below. Thus, the sequence will converge.

Proposition 2: When the kernel width tends to infinity, HQ-PF is equivalent to PF.

Proof: One can observe that when the kernel width tends to infinity, the equation

(33)

will hold. Therefore, for a sufficiently large kernel width, the correntropy based optimization in Eq.(21) and Eq.(22) becomes equivalent to the $\ell_2$ error norm based optimization

$\mathbf{U}^{t+1} = \arg\min_{\mathbf{U}} \sum_{i,j} \Omega_{ij}\left(M_{ij}-[\mathbf{U}\mathbf{V}^{t}]_{ij}\right)^2$   (34)
$\mathbf{V}^{t+1} = \arg\min_{\mathbf{V}} \sum_{i,j} \Omega_{ij}\left(M_{ij}-[\mathbf{U}^{t+1}\mathbf{V}]_{ij}\right)^2$   (35)

which is the typical iteration procedure of PF. Moreover, as the kernel width tends to infinity, all weights equal 1 and the diagonal weighting matrix becomes the identity matrix. Then $\mathbf{U}$ and $\mathbf{V}$ can be directly obtained by

(36)
(37)

and the number of inner iterations becomes 1. The solution coincides with the solution of PF [18].

As shown in Fig.1, the kernel width of the Gaussian kernel function affects the range of sensitivity to outliers. Many works in the literature have shown that a relatively small kernel width can offer more accurate performance but also suffers from a low convergence speed [44]. A practical way is to use an adaptive kernel width [44, 27]. On the other hand, in the field of online adaptive filtering, many algorithms utilize the LS method at the first several iterations to speed up the convergence. In this work, to improve both the efficiency and the accuracy, we combine the above two ideas and propose a new kernel width selection strategy for HQ-PF. In particular, by defining the error residual matrix at iteration $t$ as

$\mathbf{E}^{t} = \mathbf{\Omega} \odot \left(\mathbf{M} - \mathbf{U}^{t}\mathbf{V}^{t}\right)$   (38)

we measure the convergence speed using the relative change of the residual matrix between consecutive iterations. At the initial iterations, we directly utilize the $\ell_2$ norm based PF solution in Eq.(36) and Eq.(37) to update $\mathbf{U}$ and $\mathbf{V}$ and speed up the initial convergence. When the convergence speed becomes slow, i.e. when the relative change falls below a free threshold parameter, the optimization is switched to the correntropy based optimization in Eq.(21) and Eq.(22). The adaptive kernel width at the $t$-th iteration is then selected using the following strategy

(39)

where the quantile is taken over the vector composed of the residuals at all observed entries, a scaling parameter controls the kernel width, and a lower bound prevents the kernel width from becoming too small. Finally, if the relative change is less than a sufficiently small value, we consider the algorithm to have converged to a local minimum, and the iteration procedure terminates.
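A minimal sketch of one plausible realization of this adaptive rule is given below; the scaling factor gamma, the quantile level p and the lower bound sigma_min are our placeholders for the parameters in Eq.(39), not the paper's tuned values:

```python
import numpy as np

def adaptive_sigma(E, Omega, gamma=2.0, p=0.9, sigma_min=0.1):
    """Set the kernel width from a quantile of the current observed residuals.
    gamma, p and sigma_min are illustrative placeholders."""
    res = np.abs(E[Omega > 0])                    # residuals on observed entries only
    return max(gamma * np.quantile(res, p), sigma_min)
```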

For the optimization of Eq.(24), the selection of the kernel width for the inner iteration also affects the performance. A too small kernel width at the initial inner iterations may lead to near-zero weights, which may cause a singularity problem. Therefore, we also utilize the adaptive kernel selection method to update the kernel width at each inner iteration. In particular, the inner kernel width is initialized to a sufficiently large value (e.g., 10000) and then updated as follows:

(40)

where a threshold parameter controls the switch. The norm of the relative error vector is also utilized as the stopping criterion for the inner iteration.

The pseudocode of the HQ-PF algorithm with adaptive kernel selection is summarized in Algorithm 1. Note that in each alternating minimization step, the row (or column) subproblems are independent of each other. Thus one can further utilize a distributed system to solve the subproblems in parallel and speed up the computation.

Input: the observed entries, the observation matrix and the rank r
  % Initialization
  initialize matrices U and V
  % Computation using the ℓ2 norm based PF solution
  repeat
     solve U using (36)
     solve V using (37)
  until the relative change of the residual falls below the switching threshold
  % Computation using the correntropy based solution
  repeat
     compute the kernel width according to (39)
     solve U by alternately computing (28) and (29) until convergence
     solve V using the same method until convergence
  until the relative change of the residual falls below the stopping threshold
  Output: X = UV
Algorithm 1 HQ-PF for robust matrix completion

III-C Correntropy based alternating steepest descent algorithm

HQ-PF is an extension of the traditional PF algorithm. Although HQ-PF is a distributable algorithm whose computation can be parallelized, its overall computational cost is still much higher than that of gradient based algorithms, since at each iteration a weighted LS optimization has to be solved. Recently, an alternating steepest descent (ASD) method was proposed for the matrix completion task. ASD directly applies the gradient descent method and converges faster than alternating minimization based algorithms. Inspired by ASD, in this section we introduce the gradient descent method to solve Eq.(12) and derive a more efficient algorithm.

As described in subsection III-A, we first optimize the auxiliary weights according to Eq.(19). Then, unlike alternating minimization, we directly apply the gradient descent method to alternately update $\mathbf{U}$ and $\mathbf{V}$ with only one step at each iteration. For further derivation, we add a scaling coefficient to Eq.(20) so that the minimization becomes

(41)

Then, based on Eq.(25), for a fixed $\mathbf{V}$, Eq.(41) can be rewritten as a function of $\mathbf{U}$:

(42)

where the notation follows Eq.(25). Thus the gradient of Eq.(42) with respect to each element can be derived as

(43)

hence the gradient descent direction along $\mathbf{U}$ can be obtained as

(44)

Further, the gradient descent step size is selected by solving the following optimization problem

(45)

Similar to Eq.(44) and Eq.(45), for a fixed $\mathbf{U}$, we can obtain the gradient descent direction along $\mathbf{V}$ and the corresponding step size as

(46)
(47)

Therefore, the matrices $\mathbf{U}$ and $\mathbf{V}$ can be alternately updated using the gradient descent method, i.e. at each iteration

(48)

The algorithm with the above update is called the half-quadratic alternating steepest descent (HQ-ASD) algorithm. The following proposition guarantees the convergence of the above gradient descent method.

Proposition 3: For a non-increasing kernel width, the cost sequence generated by HQ-ASD will converge.

Proof: According to the properties of alternating descent, for a fixed kernel width we can obtain

(49)

Since is bounded below, one can obtain [32]

Moreover, according to the proof of Proposition 1, for a non-increasing kernel width

will always be satisfied. Thus, the cost sequence generated by HQ-ASD will converge for a non-increasing kernel width.

Proposition 4: When the kernel width tends to infinity, HQ-ASD is equivalent to ASD.

Proof: As the kernel width tends to infinity, all weights equal 1, and thus all the entries of the weighting matrix become 1. The weights in Eq.(19) then no longer need to be optimized, and Eq.(44)-(47) become the same as the ASD algorithm in [20].

In [20], the authors also proposed the scaled ASD (ScaledASD) algorithm to improve the convergence speed and recovery performance. Similar to ScaledASD, we can further scale the gradient descent directions in Eq.(44) and Eq.(46) by the corresponding Gram matrix inverses, i.e.

(50)

the corresponding step sizes are obtained as

(51)

and the gradient update of and can be then derived as

(52)

ScaledASD has been shown to perform better than ASD [20]. Thus, we can expect the scaled version of HQ-ASD to also perform better than the unscaled one. Therefore, for simplicity, in the following we directly utilize Eq.(52) as the update of HQ-ASD.
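For illustration, one weighted scaled alternating gradient step in the spirit of Eq.(52) can be sketched as follows. The step sizes below come from an exact line search of the weighted quadratic objective along the scaled directions; the paper's exact expressions in Eq.(50)-(52) may differ in detail, and all function and variable names are ours:

```python
import numpy as np

def hq_asd_step(M, Omega, W, U, V, eps=1e-12):
    """One scaled alternating gradient step for
    0.5 * sum_ij Omega_ij * W_ij * (M_ij - [UV]_ij)^2  (a sketch, not the paper's exact update)."""
    S = Omega * W                                   # combined observation mask and HQ weights
    r = U.shape[1]

    # Update U along a scaled descent direction with an exact line search.
    R = S * (M - U @ V)                             # weighted residual
    D_u = (R @ V.T) @ np.linalg.inv(V @ V.T + eps * np.eye(r))
    DV = D_u @ V
    t_u = np.sum(R * DV) / (np.sum(S * DV**2) + eps)
    U = U + t_u * D_u

    # Update V analogously with the new U fixed.
    R = S * (M - U @ V)
    D_v = np.linalg.inv(U.T @ U + eps * np.eye(r)) @ (U.T @ R)
    UD = U @ D_v
    t_v = np.sum(R * UD) / (np.sum(S * UD**2) + eps)
    V = V + t_v * D_v
    return U, V
```

Each step costs only a few sparse-like products with the weighted residual plus two small r-by-r inversions, which is why this route is much cheaper per iteration than the weighted least squares solves of HQ-PF.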

Similar to HQ-PF, we also apply the adaptive kernel width selection to improve the convergence speed and performance of HQ-ASD. In particular, at the first several iterations, the kernel width is fixed to a sufficiently large value (or, equivalently, the weighting matrix is set to an all-ones matrix and the ASD update procedure is used). When the convergence speed becomes slow, the optimization is switched to the correntropy based optimization, and HQ-ASD with the adaptive kernel width in Eq.(39) is applied.

The pseudocode of HQ-ASD is summarized in Algorithm 2.

Input: the observed entries, the observation matrix and the rank r
  % Initialization
  initialize matrices U and V
  % Computation using ASD (i.e. a sufficiently large kernel width)
  repeat
     compute U and V using (52) with all weights equal to 1
  until the relative change of the residual falls below the switching threshold
  % Computation using the correntropy based solution
  repeat
     compute the kernel width according to (39)
     compute U and V using (52)
  until the relative change of the residual falls below the stopping threshold
  Output: X = UV
  
Algorithm 2 HQASD for robust matrix completion

III-D Complexity analysis

In this part, we discuss the complexity of the two proposed algorithms. For HQ-PF, each outer minimization step of Eq.(21) and Eq.(22) requires solving a set of weighted least squares subproblems whose cost depends on the number of observed entries and the rank. For the inner iteration of HQ-PF, we consider two cases. When PF is used at the first several iterations, the least squares solution can be obtained directly, so only a single inner iteration is needed. When the weighted least squares of HQ-PF is applied, an iterative procedure has to be performed over several inner iterations. The overall complexity of HQ-PF is therefore determined by the numbers of outer and inner iterations together with the per-step cost. One can observe that the complexity is closely related to the percentage of observations and the rank: a larger rank or a larger number of observed entries increases the computational cost of HQ-PF. Moreover, as mentioned in Section III-B, HQ-PF is friendly to multicore and distributed systems. In particular, the subproblems for solving the rows of $\mathbf{U}$ and the columns of $\mathbf{V}$ are independent and can be solved in parallel. Therefore, for a distributed system with multiple workers, the complexity of each worker is reduced proportionally to the number of workers.

The complexity of HQ-ASD is similar to that of ASD [20]. In particular, the per-iteration complexity without the weight computation can be obtained directly from the complexity of ASD. When the computation of the weights in Eq.(19) is taken into account, the per-iteration complexity increases only by the cost of evaluating a Gaussian at each observed entry, and the overall complexity is proportional to the number of iterations. As can be seen, the complexity of HQ-ASD is also positively correlated with the percentage of observations and the rank. Moreover, compared with HQ-PF, the per-iteration complexity of HQ-ASD is much smaller, especially when the rank or the matrix size is large. Certainly, the gradient descent based HQ-ASD may need a larger number of iterations than HQ-PF. The final computational cost comparison will be verified by simulations.

IV Simulations

In this section, we carry out simulations to verify the performance of the proposed two algorithms.

We compare the performance with existing state-of-the-art robust matrix completion methods, including $\ell_p$ norm based alternating minimization via PowerFactorization ($\ell_p$-PF), quadratic programming (QP) with a robust loss function (BMFC), and correntropy based iterative hard thresholding (CIHT). In particular, to ensure fairness in comparison, the kernel width adaptation method proposed in this paper is also applied to CIHT in the simulations. All the algorithms are implemented in MATLAB R2017b on a computer with a 2.6-GHz CPU and 16 GB of memory, without any acceleration. The completion performance is evaluated by the normalized mean square error (NMSE), defined by

$\mathrm{NMSE} = \mathrm{E}\left[\frac{\|\mathbf{X} - \mathbf{M}\|_F^2}{\|\mathbf{M}\|_F^2}\right]$   (53)

In the simulations, the expectation in Eq.(53) is approximated by the sample average over the Monte Carlo runs.

In the simulation, the typical two-component Gaussian mixture model (GMM) is utilized as the non-Gaussian noise model. The probability density function (PDF) of GMM is defined as

$p_v(v) = (1-c)\,N(0,\sigma_A^2) + c\,N(0,\sigma_B^2)$   (54)

where $N(0,\sigma_A^2)$ represents the general noise disturbance with variance $\sigma_A^2$, and $N(0,\sigma_B^2)$ stands for outliers that occur occasionally with a large variance $\sigma_B^2 \gg \sigma_A^2$. The variable $c$ controls the occurrence probability of outliers.
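A minimal sketch for drawing samples from such a two-component GMM is given below; the variances var_a and var_b and the probability c are illustrative placeholders rather than the values used in the experiments:

```python
import numpy as np

def gmm_noise(shape, var_a=0.01, var_b=100.0, c=0.1, seed=0):
    """Two-component Gaussian mixture: dense small noise plus occasional large outliers.
    var_a, var_b and c are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    outlier = rng.random(shape) < c
    noise = rng.normal(0.0, np.sqrt(var_a), shape)
    noise[outlier] = rng.normal(0.0, np.sqrt(var_b), np.count_nonzero(outlier))
    return noise
```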

For all algorithms, similar to HQ-PF and HQ-ASD, we utilize the relative change between the current and previous iterations as the stopping criterion, and the threshold parameter is set specifically for each algorithm. During all the simulations, unless explicitly mentioned otherwise, the stopping threshold is set to one value for BMFC, $\ell_p$-PF and HQ-PF, and to another for HQ-ASD and CIHT. The threshold parameter for the adaptive kernel width strategy is set individually for HQ-PF, HQ-ASD and CIHT. The inner iteration threshold for the weighted LS is set to the same value for both $\ell_p$-PF and HQ-PF. Other parameters are tuned to achieve the best performance in each simulation.

Fig. 2: Curves of NMSE with different (dB).
Fig. 3: Curves of NMSE with different (dB).
Fig. 4: Curves of NMSE with different .

IV-A Random matrix completion

We first compare the performance on synthetic random data. A rank-$r$ matrix $\mathbf{M}$ is generated by multiplying two random factor matrices. The observation matrix $\mathbf{\Omega}$ with a given percentage of observations is generated by randomly assigning the value 1 to the corresponding fraction of its entries. In this part, unless explicitly mentioned otherwise, the matrix is set to be square, and the matrix dimension, the parameters of the GMM noise, the rank and the percentage of observations are fixed. The parameter for BMFC is set to 6. For each simulation, the NMSE is obtained via 200 Monte Carlo runs with different realizations of the factor matrices, the observation pattern and the noise.
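The data generation described above can be sketched as follows; the dimensions, rank, observation ratio and seed are placeholders, not the values used in the experiments:

```python
import numpy as np

def synthetic_problem(m=400, n=400, r=10, obs_ratio=0.5, seed=0):
    """Rank-r ground truth M = A @ B with a uniformly random observation mask."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    Omega = (rng.random((m, n)) < obs_ratio).astype(float)
    return M, Omega
```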

First, we compare the performance under different noise environments. We gradually increase the noise variances and the outlier probability and calculate the NMSE values. The curves of NMSE under different noise distributions are shown in Fig.2-4. One can observe that the performance of all algorithms degrades as the noise variances increase, while the performance changes only slightly with different levels of outliers. In particular, all algorithms achieve comparable performance except $\ell_p$-PF, which is mainly caused by the non-smoothness of the $\ell_p$ norm near zero error.

Second, the performance with different matrix sizes is investigated. Fig.5 shows the curves of NMSE for different matrix sizes, and Fig.6 shows the corresponding average running times for a single matrix completion procedure. As can be seen, as the size of the matrix increases, the NMSE values of all algorithms decrease, while the time costs increase significantly for all algorithms. The proposed HQ-PF and HQ-ASD achieve comparably lower NMSE than the other algorithms, and HQ-ASD runs much faster than the other algorithms. In particular, HQ-ASD runs about 3 orders of magnitude faster than $\ell_p$-PF and BMFC when the matrix size is larger than 700.

Third, we explore the largest recoverable rank under different percentages of observed entries, which is also known as the phase transition. The rank and the percentage are set within [2,30] and [0,100], respectively. For each selection of rank and percentage, 200 Monte Carlo runs are performed, and a recovery is judged to be successful if its NMSE is lower than a prescribed threshold. The phase transition images for the different algorithms are shown in Fig.7. The shade of each color block represents the probability of success, i.e. the percentage of successful recoveries in 200 Monte Carlo runs. One can observe that the white region of HQ-ASD and HQ-PF is larger than that of the other algorithms, which shows that the proposed algorithms can afford a larger rank or a lower percentage of observations. Meanwhile, the corresponding average running times and NMSE curves for each algorithm are shown in Fig.8 and Fig.9. Note that only the data corresponding to white blocks in Fig.7 are shown in Fig.8 and Fig.9. As seen, the proposed HQ-ASD achieves the lowest NMSE as well as the lowest computational cost among all algorithms. HQ-PF and CIHT achieve similarly low NMSE to HQ-ASD but need more time for completion. Moreover, among all distributable algorithms, HQ-PF performs the best.

It is known that for correntropy based algorithms, the kernel width strongly affects the performance. Thus, we also analyze the sensitivity to the kernel parameters. As shown in Eq.(39), in the kernel adaptation strategy, the kernel width is determined by the choice of the scaling parameter and the quantile level. Thus we select different values of these two parameters and obtain the corresponding NMSE over 200 Monte Carlo runs. The NMSE surfaces are shown in Fig.10. As can be seen, the algorithms work well over a wide range of values of both parameters. In particular, the NMSE increases when one of the two parameters is chosen too large, while different values of the other parameter do not strongly affect the NMSE. When both parameters are selected too small, the kernel width becomes too small and the algorithms may fail to converge to good local minima.

Fig. 5: Curves of NMSE under different matrix sizes.
Fig. 6: Curves of average running times versus matrix size.
Fig. 7: Phase transition images for different algorithms. (a) BMFC, (b) $\ell_p$-PF, (c) CIHT, (d) HQ-PF, (e) HQ-ASD.
Fig. 8: Surfaces of NMSE with different rank and observation percentage.
Fig. 9: Surfaces of average running times with different rank and observation percentage.
Fig. 10: Surfaces of NMSE with different kernel adaptation parameters.

IV-B Image Inpainting

In this part, we compare the performance of the algorithms on image inpainting tasks with non-Gaussian noise. Image inpainting aims to fill in the unknown pixels of an incomplete image. Because most images can be well approximated by a low-rank representation, image inpainting can be cast as a matrix completion task. Moreover, to evaluate the performance under non-Gaussian noise, we select a mixture of Gaussian and salt-and-pepper noise as the noise model. Gaussian noise is a typical image noise caused by electronic components. Salt-and-pepper noise is produced by errors in the analog-to-digital converter or in bit transmission caused by sudden intense interference, such that pixels sparsely take the minimum or maximum intensity value. We utilize the popular peak signal-to-noise ratio (PSNR) to evaluate the performance, which is defined as

$\mathrm{PSNR} = 10\log_{10}\left(\frac{P_{\max}^2}{\mathrm{MSE}}\right)$, where $P_{\max}$ is the peak pixel value and MSE is the mean squared error over the image. A higher PSNR represents better recovery performance.

We select the Lena and Palace images as test images. The Lena image (Fig.11(a)) is a popular image for performance evaluation, while the Palace image (Fig.12(a)) contains repeated patterns and is often used for image inpainting tests. Each image is compressed via its best rank-50 approximation (see Fig.11(b) and Fig.12(b)) so that the low-rank property is guaranteed. The two images are then masked with a "cross" pattern and a "stamp mark", respectively. Finally, the observed pixels are corrupted with Gaussian noise, and a fraction of the observed pixels is further disturbed by salt-and-pepper noise. During the simulations, 100 Monte Carlo runs are performed for each recovery task. The algorithm parameters for HQ-PF and HQ-ASD are tuned accordingly. For BMFC, to obtain better performance, the parameter is set to 6 and 20 for the Gaussian and non-Gaussian noise cases, respectively.
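The noise injection on the observed pixels can be sketched as follows; the Gaussian standard deviation, the salt-and-pepper fraction and the 8-bit pixel range are illustrative assumptions rather than the experiment's exact settings:

```python
import numpy as np

def corrupt_observed(img, Omega, gauss_std=10.0, sp_ratio=0.1, seed=0):
    """Add Gaussian noise to observed pixels and flip a fraction of them to 0 or 255."""
    rng = np.random.default_rng(seed)
    noisy = img + Omega * rng.normal(0.0, gauss_std, img.shape)
    sp = (rng.random(img.shape) < sp_ratio) & (Omega > 0)
    noisy[sp] = rng.choice([0.0, 255.0], size=np.count_nonzero(sp))
    return np.clip(noisy, 0.0, 255.0)
```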

Table I lists the average recovery PSNR and the corresponding average running times under the noiseless and noisy environments. One can see that the proposed HQ-ASD algorithm achieves the best performance in all simulations. In particular, HQ-ASD obtains the highest PSNR as well as the lowest running times. Moreover, HQ-PF obtains a comparable PSNR for the Palace image, while its performance is worse than HQ-ASD for the Lena image. Nevertheless, HQ-PF still achieves the best performance among the distributable algorithms.

To further demonstrate the recovery performance, samples of images recovered by the different algorithms under non-Gaussian noise are shown in Fig.11 and Fig.12. Enlarged views of parts of the recovered images are also depicted to clearly show the differences. One can see that when filling the missing entries of the face region in the Lena image, fringes are produced in all of the recovered images. In particular, BMFC and CIHT have the most obvious fringes, while HQ-ASD has the least visible fringes. Moreover, $\ell_p$-PF and HQ-PF fail to accurately recover the left eye, which is probably caused by converging to a wrong local minimum perturbed by the non-Gaussian noise during alternating minimization. Furthermore, for the Palace image, the images recovered by BMFC and CIHT still have visible reconstruction errors, whereas $\ell_p$-PF, HQ-PF and HQ-ASD successfully recover the image. From the enlarged view, one can see that the images recovered by HQ-PF and HQ-ASD are slightly clearer than that of $\ell_p$-PF, especially at object edges.

Fig. 11: Image inpainting sample of Lena image under mixture of Gaussian and Salt-and-pepper noise.
Fig. 12: Image inpainting sample of Palace image under mixture of Gaussian and Salt-and-pepper noise.
Method | Lena + low rank | Lena + low rank + noise | Palace + low rank | Palace + low rank + noise
 | PSNR(dB) Time(s) | PSNR(dB) Time(s) | PSNR(dB) Time(s) | PSNR(dB) Time(s)
BMFC | 13.81 1459.8 | 35.18 1374.3 | 18.71 706.3 | 22.2 1331.1
$\ell_p$-PF | 76.90 1752.9 | 34.65 2575.3 | 87.59 568.3 | 42.5 1839.7
CIHT | 60.33 188.8 | 32.81 62.58 | 26.55 194.23 | 31.6 62.2
HQ-PF | 75.33 465.2 | 36.93 181.34 | 88.60 77.8 | 45.3 130.3
HQ-ASD | 92.88 27.5 | 42.44 5.75 | 88.84 3.7 | 45.3 3.6
TABLE I: Image Inpainting Performance Comparison: PSNR and average running times

IV-C Experiments on MovieLens dataset

In this part we evaluate the proposed algorithms on real data sets. MovieLens is a widely used dataset for recommender systems. First, similar to the experiments in [22, 21], we carry out an experiment on the MovieLens-100K data set. MovieLens-100K consists of 100,000 ratings (1-5) from 943 users on 1682 movies, so about 6.3% of the entries are observed. It also provides 5 splits into training data and testing data, which account for 80% and 20% of the observed data, respectively. We perform the test in both a noiseless case and a noisy case; in the noisy case, a fraction of the rating values are replaced by outlying values. The performance is evaluated using the root mean square error (RMSE) defined as [22]

where a logical test mask indicates whether each entry belongs to the testing data. The expectation is approximated by averaging over 10 Monte Carlo realizations.
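The masked RMSE over the test entries can be computed as in the following sketch (variable names are ours):

```python
import numpy as np

def masked_rmse(X_hat, M_test, Omega_test):
    """RMSE between the recovered matrix and the test ratings, over test entries only."""
    diff = Omega_test * (X_hat - M_test)
    return np.sqrt(np.sum(diff**2) / np.count_nonzero(Omega_test))
```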

In this experiment, the parameters of HQ-PF and HQ-ASD are set as described above, and the thresholds of CIHT and HQ-ASD are set to fixed values. Fig.13 and Fig.14 depict the RMSE results for all algorithms with different values of the rank under the noiseless and noisy cases, respectively. As seen, all algorithms work well for small ranks in both the noiseless and noisy cases, while for larger ranks all algorithms suffer from different degrees of performance degradation. In particular, HQ-ASD maintains good performance when the rank is as large as 5. On the contrary, the performance of the alternating minimization based $\ell_p$-PF and HQ-PF algorithms degrades seriously for larger ranks. Furthermore, we list the average RMSE and the corresponding average running times in Table II. Here we also add the traditional $\ell_2$ based PF algorithm for comparison. One can observe that HQ-ASD achieves the lowest RMSE in both the noiseless and noisy cases. HQ-ASD also runs much faster than the other robust methods.

To further demonstrate the advantage of the proposed algorithms in computational cost, we also carry out the comparison on the more challenging MovieLens-1M dataset. The MovieLens-1M dataset contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 users. The observation percentage is only about 4%, and the matrix size is about 15 times larger than that of MovieLens-100K. We evaluate the performance of all algorithms on MovieLens-1M under the noiseless and noisy cases, similarly to the experiments on MovieLens-100K. The noise and algorithm settings are the same as in the previous simulation. Table II shows the average RMSE and the corresponding average running times. One can observe that the running times increase greatly for all algorithms compared with MovieLens-100K, and the RMSE also increases. In particular, HQ-ASD still runs much faster than the other robust algorithms, and HQ-PF obtains the best performance among the distributable algorithms.

Fig. 13: RMSE curves with different rank without noise.
Fig. 14: RMSE curves with different rank under noisy environment.
Method | MovieLens-100K | MovieLens-100K + noise | MovieLens-1M | MovieLens-1M + noise
 | RMSE Time(s) | RMSE Time(s) | RMSE Time(s) | RMSE Time(s)
$\ell_2$-PF | 0.9494 9.1 | 0.9776 18.6 | 1.1419 162.7 | 1.1457 122.4
BMFC | 0.9849 104.2 | 1.0080 118.7 | 1.1516 2185.7 | 1.1751 2042.3
$\ell_p$-PF | 0.9912 242.4 | 1.0009 213.6 | 1.1955 5734.8 | 1.2045 5024.1
CIHT | 0.9916 434.4 | 1.0012 412.8 | 1.1411 18694.0 | 1.1417 17295.2
HQ-PF | 0.9506 42.4 | 0.9745 64.6 | 1.1441 3058.3 | 1.1437 2550.0
HQ-ASD | 0.9395 25.6 | 0.9541 18.5 | 1.1342 445.5 | 1.1356 358.7
TABLE II: MovieLens Dataset Performance Comparison: RMSE and average running times

V Conclusion

In this paper, we proposed two novel, efficient and robust matrix completion algorithms. The algorithms apply correntropy as the error measure to improve robustness. To overcome the complicated computation of the non-quadratic correntropy based optimization, we utilize the half-quadratic technique to efficiently solve the problem. The two proposed algorithms, HQ-PF and HQ-ASD, adopt the same half-quadratic method but are developed in different ways. HQ-PF is derived from the traditional alternating minimization method and can be processed in parallel. HQ-ASD is obtained by the gradient descent method and has a much lower computational cost. Further, an adaptive kernel width selection strategy is proposed to speed up the convergence of the new algorithms. Extensive simulations and real-world data experiments demonstrate that the proposed algorithms achieve better performance than existing state-of-the-art methods.

Acknowledgements

This work was supported by 973 Program (No.2015CB351703), National NSF of China (No.61273366 and No.91648208) and National Science and Technology support program of China (No.2015BAH31F01).

References

  • [1] E. J. Candes and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, p. 717, 2009.
  • [2] E. J. Candes and Y. Plan, “Matrix completion with noise,” Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, 2010.
  • [3] R. H. Keshavan, A. Montanari, and S. Oh, “Matrix completion from a few entries,” IEEE transactions on information theory, vol. 56, no. 6, pp. 2980–2998, 2010.
  • [4] H. Ji, C. Liu, Z. Shen, and Y. Xu, “Robust video denoising using low rank matrix completion,” in Computer Vision and Pattern Recognition, 2010.
  • [5] L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang, and Y. Ma, “Robust photometric stereo via low-rank matrix completion and recovery,” in Asian Conference on Computer Vision.   Springer, 2010, pp. 703–717.
  • [6] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, no. 8, pp. 30–37, 2009.
  • [7] R. Cabral, F. De la Torre, J. P. Costeira, and A. Bernardino, “Matrix completion for weakly-supervised multi-label image classification,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 1, pp. 121–135, 2015.
  • [8] F. Isinkaye, Y. Folajimi, and B. Ojokoh, “Recommendation systems: Principles, methods and evaluation,” Egyptian Informatics Journal, vol. 16, no. 3, pp. 261–273, 2015.
  • [9] S. Funk, “Netflix update: Try this at home,” 2006. [Online]. Available: http://sifter.org/~simon/journal/20061211.html
  • [10] Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan, “Large-scale parallel collaborative filtering for the netflix prize,” in International Conference on Algorithmic Applications in Management.   Springer, 2008, pp. 337–348.
  • [11] J. Tanner and K. Wei, “Normalized iterative hard thresholding for matrix completion,” SIAM Journal on Scientific Computing, vol. 35, no. 5, pp. S104–S125, 2013.
  • [12] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2479–2493, 2009.
  • [13] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
  • [14] S. Ma, D. Goldfarb, and L. Chen, “Fixed point and bregman iterative methods for matrix rank minimization,” Mathematical Programming, vol. 128, no. 1-2, pp. 321–353, 2011.
  • [15] R. Sun and Z. Q. Luo, “Guaranteed matrix completion via non-convex factorization,” in Foundations of Computer Science, 2015.
  • [16] M. Hardt, “Understanding alternating minimization for matrix completion,” in Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on.   IEEE, 2014, pp. 651–660.
  • [17] P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” in Proceedings of the forty-fifth annual ACM symposium on Theory of computing.   ACM, 2013, pp. 665–674.
  • [18] J. P. Haldar and D. Hernando, “Rank-constrained solutions to linear matrix equations using powerfactorization,” IEEE Signal Processing Letters, vol. 16, no. 7, pp. 584–587, 2009.
  • [19] Z. Wen, W. Yin, and Y. Zhang, “Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm,” Mathematical Programming Computation, vol. 4, no. 4, pp. 333–361, 2012.
  • [20] J. Tanner and K. Wei, “Low rank matrix completion by alternating steepest descent methods,” Applied and Computational Harmonic Analysis, vol. 40, no. 2, pp. 417–429, 2016.
  • [21] W. J. Zeng and H. C. So, “Outlier-robust matrix completion via -minimization,” IEEE Transactions on Signal Processing, vol. 66, no. 5, pp. 1125–1140, 2018.
  • [22] L. Zhao, P. Babu, and D. Palomar, “Efficient algorithms on robust low-rank matrix completion against outliers,” IEEE Transactions on Signal Processing, vol. 64, no. 18, pp. 4767–4780, 2016.
  • [23] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, “Decomposition by partial linearization: Parallel optimization of multi-agent systems,” IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 641–656, 2014.
  • [24] J. C. Principe, Information theoretic learning: Renyi’s entropy and kernel perspectives.   Springer Science & Business Media, 2010.
  • [25] S. Zhao, B. Chen, and J. C. Príncipe, “Kernel adaptive filtering with maximum correntropy criterion,” in International Joint Conference on Neural Networks, 2011, pp. 2012–2017.
  • [26] B. Chen, X. Liu, H. Zhao, and J. C. Principe, “Maximum correntropy kalman filter,” Automatica, vol. 76, pp. 70–77, 2017.
  • [27] J. C. Principe, Information Theoretic Learning: Renyi’s Entropy and Kernel Perspectives.   Springer Publishing Company, Incorporated, 2010.
  • [28] W. Liu, P. P. Pokharel, and J. C. Príncipe, “Correntropy: Properties and applications in non-gaussian signal processing,” IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
  • [29] B. Chen, L. Xing, H. Zhao, N. Zheng, and J. C. Principe, “Generalized correntropy for robust adaptive filtering,” IEEE Transactions on Signal Processing, vol. 64, no. 13, pp. 3376–3387, 2016.
  • [30] W. Ma, H. Qu, G. Gui, L. Xu, J. Zhao, and B. Chen, “Maximum correntropy criterion based sparse adaptive filtering algorithms for robust channel estimation under non-gaussian environments,” Journal of the Franklin Institute, vol. 352, no. 7, pp. 2708–2727, 2015.
  • [31] Y. Yang, Y. Feng, and J. Suykens, “Correntropy based matrix completion,” Entropy, vol. 20, no. 3, p. 171, 2018.
  • [32] M. Nikolova and M. K. Ng, “Analysis of half-quadratic minimization methods for signal and image recovery,” SIAM Journal on Scientific Computing, vol. 27, no. 3, pp. 937–966, 2005.
  • [33] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Applied and computational harmonic analysis, vol. 27, no. 3, pp. 265–274, 2009.
  • [34] Z. Liu and L. Vandenberghe, “Interior-point method for nuclear norm approximation with application to system identification,” Siam Journal on Matrix Analysis & Applications, vol. 31, no. 3, pp. 1235–1256, 2009.
  • [35] A. Singh and J. C. Principe, “Using correntropy as a cost function in linear adaptive filters,” in International Joint Conference on Neural Networks, IJCNN 2009, Atlanta, Georgia, Usa, 14-19 June, 2009, pp. 2950–2955.
  • [36] A. Singh, R. Pokharel, and J. Principe, “The c-loss function for pattern classification,” Pattern Recognition, vol. 47, no. 1, pp. 441–453, 2014.
  • [37] G. Xu, B.-G. Hu, and J. C. Principe, “Robust c-loss kernel classifiers,” IEEE transactions on neural networks and learning systems, vol. 29, no. 3, pp. 510–522, 2018.
  • [38] J. E. Dennis Jr. and R. E. Welsch, “Techniques for nonlinear least squares and robust regression,” Communications in Statistics - Simulation and Computation, vol. 7, no. 4, pp. 345–359, 1978.
  • [39] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, “Robust principal component analysis based on maximum correntropy criterion,” IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
  • [40] L. Du, X. Li, and Y.-D. Shen, “Robust nonnegative matrix factorization via half-quadratic minimization,” in 2012 IEEE 12th International Conference on Data Mining.   IEEE, 2012, pp. 201–210.
  • [41] R. He, W.-S. Zheng, T. Tan, and Z. Sun, “Half-quadratic-based iterative minimization for robust sparse representation,” IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 2, pp. 261–275, 2014.
  • [42] Y. Wang, Y. Y. Tang, and L. Li, “Correntropy matching pursuit with application to robust digit and face recognition,” IEEE transactions on cybernetics, vol. 47, no. 6, pp. 1354–1366, 2017.
  • [43] M. Nikolova and R. H. Chan, “The equivalence of half-quadratic minimization and the gradient linearization iteration,” IEEE Transactions on Image Processing, vol. 16, no. 6, pp. 1623–1627, 2007.
  • [44] F. Huang, J. Zhang, and S. Zhang, “Adaptive filtering under a variable kernel width maximum correntropy criterion,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 64, no. 10, pp. 1247–1251, 2017.