Solving Principal Component Pursuit in Linear Time via Filtering
Abstract
In the past decades, exactly recovering the intrinsic data structure from corrupted observations, which is known as robust principal component analysis (RPCA), has attracted tremendous interests and found many applications in computer vision. Recently, this problem has been formulated as recovering a lowrank component and a sparse component from the observed data matrix. It is proved that under some suitable conditions, this problem can be exactly solved by principal component pursuit (PCP), i.e., minimizing a combination of nuclear norm and norm. Most of the existing methods for solving PCP require singular value decompositions (SVD) of the data matrix, resulting in a high computational complexity, hence preventing the applications of RPCA to very large scale computer vision problems. In this paper, we propose a novel algorithm, called filtering, for exactly solving PCP with an complexity, where is the size of data matrix and is the rank of the matrix to recover, which is supposed to be much smaller than and . Moreover, filtering is highly parallelizable. It is the first algorithm that can exactly solve a nuclear norm minimization problem in linear time (with respect to the data size). Experiments on both synthetic data and real applications testify to the great advantage of filtering in speed over stateoftheart algorithms.
Keywords:
Robust Principal Component Analysis Principal Component Pursuit filtering singular value decomposition nuclear norm minimization norm minimization.∎
1 Introduction
Robustly recovering the intrinsic lowdimensional structure of highdimensional visual data, which is known as robust principal component analysis (RPCA), plays a fundamental role in various computer vision tasks, such as face image alignment and processing, video denoising, structure from motion, background modeling, photometric stereo and texture representation (see e.g., Wright et al (2009), Ji et al (2010), De la Torre and Black (2003), Peng et al (2010), Wu et al (2010), and Zhang et al (2012), to name just a few). Through the years, a large number of approaches have been proposed for solving this problem. The representative works include De la Torre and Black (2003), Nie et al (2011), Aanes et al (2002), Baccini et al (1996), Ke and Kanade (2005), Skocaj et al (2007), and Storer et al (2009). The main limitation of above mentioned methods is that there is no theoretical guarantee for their performance. Recently, the advances in compressive sensing have led to increasingly interests in considering RPCA as a problem of exactly recovering a lowrank matrix from corrupted observations , where is known to be sparse (Wright et al (2009), Candés et al (2011)). Its mathematical model is as follows:
(1) 
where is the norm of a matrix, i.e., the number of nonzero entries in the matrix.
Unfortunately, problem (1) is known to be NPhard. So Candés et al (2011) proposed using principal component pursuit (PCP) to solve (1), which is to replace the rank function and norm with the nuclear norm (which is the sum of the singular values of a matrix, denoted as ) and norm (which is the sum of the absolute values of the entries), respectively. More specifically, PCP is to solve the following convex problem instead:
(2) 
They also rigorously proved that under fairly general conditions and , PCP can exactly recover the lowrank matrix (namely the underlying lowdimensional structure) with an overwhelming probability, i.e., the difference of the probability from 1 decays exponentially when the matrix size increases. This theoretical analysis makes PCP distinct from previous methods for RPCA.
All the existing algorithms for RPCA need to compute either SVD or matrixmatrix multiplications on the whole data matrix. So their computation complexities are all at least quadratic w.r.t. the data size, preventing the applications of RPCA to largescale problems when the time is critical. In this paper, we address the largescale RPCA problem and propose a truly linear cost method to solve the PCP model (2) when the data size is very large while the target rank is relatively small. Such kind of data is ubiquitous in computer vision.
1.1 Main Idea
Our algorithm fully utilizes the properties of lowrankness. The main idea is to apply PCP to a randomly selected submatrix of the original noisy matrix and compute a low rank submatrix. Using this low rank submatrix, the true low rank matrix can be estimated efficiently, where the low rank submatrix is part of it.
Specifically, our method consists of two steps (illustrated in Figure 1). The first step is to recover a submatrix^{1}^{1}1Note that the “submatrix” here does not necessarily mean that we have to choose consecutive rows and columns from . (Figure 1 (e)) of . We call this submatrix the seed matrix because all other entries of can be further calculated by this submatrix. The second step is to use the seed matrix to recover two submatrices and (Figures 1 (f)(g)), which are on the same rows and columns as in , respectively. They are recovered by minimizing the distance from the subspaces spanned by the columns and rows of , respectively. Hence we call this step filtering. The remaining part (Figure 1 (h)) of can be represented by , and , using the generalized Nyström method (Wang et al (2009)). As analyzed in Section 3.4, our method is of linear cost with respect to the data size. Besides the advantage of linear time cost, the proposed algorithm is also highly parallel: the columns of and the rows of can be recovered fully independently. We also prove that under suitable conditions, our method can exactly recover the underling lowrank matrix with an overwhelming probability. To our best knowledge, this is the first algorithm that can exactly solve a nuclear norm minimization problem in linear time.
(a)  (b)  (c)  (d) 
(e)  (f)  (g)  (h) 
2 Previous Works
In this section, we review some previous algorithms for solving PCP. The existing solvers can be roughly divided into three categories: classic convex optimization, factorization and compressed optimization.
For small sized problems, PCP can be reformulated as a semidefinite program and then be solved by standard interior point methods. However, this type of methods cannot handle even moderate scale matrices due to their complexity in each iteration. So people turned to firstorder algorithms, such as the dual method (Ganesh et al (2009)), the accelerated proximal gradient (APG) method (Ganesh et al (2009)) and the alternating direction method (ADM) (Lin et al (2009)), among which ADM is the most efficient. All these methods require solving the following kind of subproblem in each iteration
(3) 
where is the Frobenious norm. Cai et al (2010) proved that the above problem has a closed form solution
(4) 
where is the singular value decomposition of and is the soft shrinkage operator. Therefore, these methods all require computing SVDs for some matrices, resulting in complexity, where is the matrix size.
As the most expensive computational task required by solving (2) is to perform SVD, Lin et al (2009) adopted partial SVD (Larsen (1998)) to reduce the complexity at each iteration to , where is the target rank. However, such a complexity is still too high for very large data sets. Drineas et al (2006) developed a fast Monte Carlo algorithm, named linear time SVD (LTSVD), which can be used for solving SVDs approximately (also see Halko et al (2011)). The main drawback of LTSVD is that it is less accurate than the standard SVD as it uses random sampling. So the whole algorithm needs more iterations to achieve the same accuracy. As a consequence, the speed performance of LTSVD quickly deteriorates when the target rank increases (see Figure 2). Actually, even adopting LTSVD the whole algorithm is still quadratic w.r.t. the data size because it still requires matrixmatrix multiplication in each iteration.
To address the scalability issue of solving largescale PCP problems, Shen et al (2011) proposed a factorization based method, named lowrank matrix fitting (LMaFit). This approach represents the lowrank matrix as a product of two matrices and then minimizes over the two matrices alternately. Although they do not require nuclear norm minimization (hence the SVDs), the convergence of the proposed algorithm is not guaranteed as the corresponding problem is nonconvex. Moreover, both the matrixmatrix multiplication and the QR decomposition based rank estimation technique require complexity. So this method does not essentially reduce the complexity.
Inspired by compressed optimization, Mu et al (2011) proposed reducing the problem scale by random projection (RP). However, this method is highly unstable – different random projections may lead to radically different results. Moreover, the need to introduce additional constraint to the problem slows down the convergence. And actually, the complexity of this method is also , where is the size of the random projection matrix and . So this method is still not of linear complexity with respect to the matrix size.
3 The Filtering Algorithm
Given an observed data matrix , which is the sum of a lowrank matrix and a sparse matrix , PCP is to recover from . What our approach differs from traditional ones is that the underlying lowrank matrix is reconstructed from a seed matrix. As explained in Section 1.1, our filtering algorithm consists of two steps: first recovering a seed matrix, second performing filtering on corresponding rows and columns of the data matrix. Below we provide details of these two steps.
3.1 Seed Matrix Recovery
Suppose that the target rank is very small compared with the data size: . We first randomly sample an submatrix from , where and are the row and column oversampling rates, respectively. Then the submatrix of the underlying matrix can be recovered by solving a small sized PCP problem:
(5) 
e.g., using ADM (Lin et al (2009)), where .
By Theorem 1.1 in (Candés et al (2011)), the seed matrix can be exactly recovered from with an overwhelming probability when and increases. In fact, by that theorem and should be chosen at the scale of . For the experiments conducted in this paper, whose ’s are very small, we simply choose .
3.2 Filtering
For ease of illustration, we assume that is the top left submatrix of . Then accordingly , and can be partitioned into:
(6) 
Since , there must exist matrices and , such that
(7) 
As is sparse, so are and . Therefore, and can be found by solving the following problems:
(8) 
and
(9) 
respectively. The above two problems can be easily solved by ADM.
With and computed, and are obtained as (7). Again by , the generalized Nyström method (Wang et al (2009)) gives:
(10) 
where is the MoorePenrose pseudo inverse of .
In real computation, as the SVD of is readily available when solving (5), due to the singular value thresholding operation (4), it is more convenient to reformulate (8) and (9) as
(11) 
and
(12) 
respectively, where is the skinny SVD of obtained from (4) in the iterations. Such a reformulation has multiple advantages. First, as , it is unnecessary to compute the inverse of and when updating and in the iterations of ADM. Second, computing (10) also becomes easy if one wants to form explicitly because now
(13) 
To make the algorithm description complete, we sketch in Algorithm 1 the ADM for solving (11) and (12), which are both of the following form:
(14) 
where and are known matrices and has orthonormal columns, i.e., . The ADM for (14) is to minimize on the following augmented Lagrangian function
(15) 
with respect to and , respectively, by fixing other variables, and then update the Lagrange multiplier and the penalty parameter .^{2}^{2}2The ADM for solving PCP follows the same methodology. As a reader can refer to (Lin et al (2009), Yuan and Yang (2009)) for details, we omit the pseudo code for using ADM to solve PCP.
Note that it is easy to see that (11) and (12) can also be solved in full parallelism as the columns and rows of and can computed independently, thanks to the decomposability of the problems. So the recovery of and is very efficient if one has a parallel computing platform, such as a general purpose graphics processing unit (GPU).
3.3 The Complete Algorithm
Now we are able to summarize in Algorithm 2 our filtering method for solving PCP, where steps 3 and 4 can be done in parallel.
3.4 Complexity Analysis
Now we analyze the computational complexity of the proposed Algorithm 2. For the step of seed matrix recovery, the complexity of solving (5) is only . For the filtering step, it can be seen that the complexity of solving (11) and (12) is and , respectively. So the total complexity of this step is . As the remaining part of can be represented by , and , using the generalized Nyström method (Wang et al (2009))^{3}^{3}3Of course, if we explicitly form then this step costs no more than complexity. Compared with other methods, our rest computations are all of complexity at the most, while those methods all require at least complexity in each iteration, which results from matrixmatrix multiplication. and recall that , we conclude that the overall complexity of Algorithm 2 is , which is only of linear cost with respect to the data size.
3.5 Exact Recoverability of Filtering
The exact recoverability of using our filtering method consists of two factors. First, exactly recovering from . Second, exactly recovering and . If all , , and can be exactly recovered, is exactly recovered.
The exact recoverability of from is guaranteed by Theorem 1.1 of (Candés et al (2011)). When and are sufficiently large, the chance of success is overwhelming.
To analyze the exact recoverability of and , we first observe that it is equivalent to the exact recoverability of and . By multiplying annihilation matrices and to both sides of (11) and (12), respectively, we may recover and by solving
(16) 
and
(17) 
respectively. If the oversampling rates and are large enough, we are able to choose and that are close to Gaussian random matrices. Then we may apply the standard theory in compressed sensing (Candés and Wakin (2007)) to conclude that if the oversampling rates and are large enough and and are sparse enough^{4}^{4}4As the analysis in the compressed sensing theories is qualitative and the bounds are actually pessimistic, copying those inequalities here is not very useful. So we omit the mathematical descriptions for brevity., and can be exactly recovered with an overwhelming probability.
We also present an example in Figure 1 to illustrate
the exact recoverability of filtering. We first truncate the
SVD of an image “Water” ^{5}^{5}5The image is
available at
http://www.petitcolas.net/fabien/
watermarking/image_database/.
to get a matrix of rank 30 (Figure 1 (b)). The
observed image (Figure 1 (a)) is obtained from
Figure 1 (b) by adding large noise to 30 of the
pixels uniformly sampled at random (Figure 1 (c)).
Suppose we have the topleft submatrix as the
seed (Figure 1 (e)), the lowrank image
(Figure 1 (d)) can be exactly recovered by
filtering. Actually, the relative reconstruction errors in
is only .
3.6 Target Rank Estimation
The above analysis and computation are all based on a known value of the target rank . For some applications, we could have an estimate on . For example, for the background modeling problem (Wright et al (2009)), the rank of the background video should be very close to one as the background hardly changes; and for the photometric stereo problem (Wu et al (2010)) the rank of the surface normal map should be very close to three as the normals are three dimensional vectors. However, the rank of the underlying matrix might not always be known. So we have to provide a strategy to estimate .
As we assume that the size of submatrix is , where and should be sufficiently large in order to ensure the exact recovery of from , after we have computed by solving (5), we may check whether
(18) 
are satisfied, where is the rank of . If yes, is accepted as a seed matrix. Otherwise, it implies that may be too small with respect to the target rank . Then we may increase the size of the submatrix to and repeat the above procedure until (18) is satisfied or
(19) 
We require (19) because the speed advantage of our filtering algorithm will quickly lost beyond this size limit (see Figure 2). If we have to use a submstrix whose size should be greater than , then the target rank should be comparable to the size of data, hence breaking our lowrank assumption. In this case, we may resort to the usual method to solve PCP.
Of course, we may sample one more submatrix to cross validate the estimated target rank . When is indeed very small, such a cross validation is not a big overhead.
4 Experimental Results
In this section, we present experiments on both synthetic data and real vision problems (structure from motion and background modeling) to test the performance of filtering. All the experiments are conducted and timed on the same PC with an AMD Athlon^{®} II X4 2.80GHz CPU that has 4 cores and 6GB memory, running Windows 7 and Matlab (Version 7.10).
4.1 Comparison Results for Solving PCP
We first test the performance of filtering on solving PCP problem (2). The experiments are categorized into the following three classes:

Compare with factorization based solver (e.g., LMaFit Shen et al (2011)) on recovering either randomly generated or deterministic lowrank matrix from its sum with a random sparse matrix.

Compare with random projection based solver (e.g., random projection Mu et al (2011)) on recovering randomly generated lowrank and sparse matrices.
In the experiments synthetic data, we generate random test data in the following way: an observed data matrix is synthesized as the sum of a lowrank matrix and a sparse matrix . The rank matrix is generated as a product of two matrices whose entries are i.i.d. Gaussian random variables with zero mean and unit variance. The matrix is generated as a sparse matrix whose support is chosen uniformly at random, and whose nonzero entries are i.i.d. uniformly in . The rank ratio and sparsity ratio are denoted as and , respectively.
4.1.1 Filtering vs. Classic Convex Optimization
Firstly, we compare our approach with ADM on the whole matrix^{6}^{6}6The Matlab code of ADM is provided by the authors of (Lin et al (2009)) and all the parameters in this code are set to their default values., which we call the standard ADM, and its variation, which uses lineartime SVD (LTSVD)^{7}^{7}7The Matlab code of lineartime SVD is available in the FPCA package at http://www.columbia.edu/sm2756/FPCA.htm. for solving the partial SVD, hence we call the LTSVD ADM. We choose these two approaches because the standard ADM is known to be the most efficient classic convex optimization algorithm to solve PCP exactly and LTSVD ADM has a linear time cost in solving SVD^{8}^{8}8However, LTSVD ADM is still of complexity as it involves matrixmatrix multiplication in each iteration. See also Section 2.. For LTSVD ADM, in each time to compute the partial SVD we uniformly oversample columns of the data matrix without replacement. Such an oversampling rate is important for ensuring the numerical accuracy of LTSVD ADM at high probability. For all methods in comparison, the stopping criterion is .
Table 1 shows the detailed comparison among the three methods, where is the relative error to the true lowrank matrix . It is easy to see that our filtering approach has the highest numerical accuracy and is also much faster than the standard ADM and LTSVD ADM. Although LTSVD ADM is faster than the standard ADM, its numerical accuracy is the lowest among the three methods because it is probabilistic.
Size ()  Method  RelErr  Time  
2000  , , ,  
SADM  1.46  20  39546  39998  998105  84.73  
LADM  4.72  20  39546  40229  998105  27.41  
1.66  20  39546  40000  998105  5.56 = 2.24 + 3.32  
5000  , , ,  
SADM  7.13  50  249432  249995  6246093  1093.96  
LADM  4.28  50  249432  250636  6246158  195.79  
5.07  50  249432  250000  6246093  42.34=19.66 + 22.68  
10000  , , ,  
SADM  1.23  100  997153  1000146  25004071  11258.51  
LADM  4.26  100  997153  1000744  25005109  1301.83  
2.90  100  997153  1000023  25004071  276.54 = 144.38 + 132.16 
We also present in Figure 2 the CPU times of the three methods when the rank ratio and sparsity ratio increases, respectively. The observed matrices are generated using the following parameter settings: , vary from 0.005 to 0.05 with fixed and vary from 0.02 to 0.2 with fixed . It can be seen from Figure 2 (a) that LTSVD ADM is faster than the standard ADM when . However, the computing time of LTSVD ADM grows quickly when increases. It even becomes slower than the standard ADM when . This is because LTSVD cannot guarantee the accuracy of partial SVD in each iteration. So its number of iterations is larger than that of the standard ADM. In comparison, the time cost of our filtering method is much less than the other two methods for all the rank ratios. However, when further grows the advantage of filtering will be lost quickly, because filtering has to compute the PCP on the submatrix . In contrast, Figure 2 (b) indicates that the CPU time of these methods grows very slowly with respect to the sparsity ratio.
(a)  (b) 
4.1.2 Filtering vs. Factorization Method
We then compare the proposed filtering with a factorization method (i.e., LMaFit^{9}^{9}9The Matlab code of LMaFit is provided by the authors of (Shen et al (2011)) and all the parameters in this code are set to their default values.) on solving (2). To test the ability of these algorithms in coping with corruptions with large magnitude, we multiply a scale to the sparse matrix, i.e., . We fix other parameters of the data (, and ) and vary the scale parameter from 1 to 10 to increase the magnitude of the sparse errors.
The computational comparisons are presented in Figure 3. Besides the CPU time and relative error, we also measure the quality of the recovered by its maximum difference (MaxDif) and average difference (AveDif) to the true lowrank matrix , which are respectively defined as and . One can see that the performance of LMaFit dramatically decreases when . This experiment suggests that the factorization method fails when the sparse matrix dominates the lowrank one in magnitude. This is because a sparse matrix with large magnitudes makes rank estimation difficult or impossible for LMaFit. Without a correct rank, the lowrank matrix cannot be recovered exactly. In comparison, our filtering always performs well on the test data.
(a)  (b)  (c)  (d) 
In the following, we consider the problem of recovering deterministic lowrank matrix from corruptions. We generate an “checkerboard” image (see Figure 4), whose rank is 2, and corrupt it by adding 10 impulsive noise to it. The corruptions (nonzero entries of the sparse matrix) are sampled uniformly at random. The image size ranges from 1000 to 5000 with an increment 500.
The results for this test are shown in Figure 4, where the first image is the corrupted checkerboard image, the second image is recovered by LMaFit and the third by filtering. A more complete illustration for this test can be seen from Figure 4(d), where the CPU time corresponding to all tested data matrix sizes are plotted. It can be seen that the images recovered by LMaFit and filtering are visually comparable in quality. The speeds of these two methods are very similar when the data size is small, while filtering runs much faster than LMaFit when the matrix size increases. This concludes that our approach has significant speed advantage over the factorization method on large scale data sets.
(a) Corrupted  (b) LMaFit  (c)  (d) CPU Time 
4.1.3 Filtering vs. Compressed Optimization
Now we compare filtering with a compressed optimization method (i.e., random projection^{10}^{10}10The Matlab code of random projection (RP) is provided by the author of (Mu et al (2011)) and all the parameters in this code are set to their default values.). This experiment is to study the performance of these two methods with respect to the rank of the matrix and the data size. The parameters of the test matrices are set as follows: , varying from 0.05 to 0.15 with fixed , and varying from 1000 to 5000 with fixed . For the dimension of the projection matrix (i.e., ), we set it as for all the experiments.
As shown in Figure 5, in all cases the speed and the numerical accuracy of filtering are always much higher than those of random projection.
(a)  (b) 
(c)  (d) 
4.2 Structure from Motion
In this subsection, we apply filtering to a real world vision application, namely structure from motion (SfM). The problem of SfM is to automatically recover the 3D structure of an object from a sequence of images of the object. Suppose that the object is rigid, there are frames and tracked feature points (i.e., ), and the camera intrinsic parameters do not change. As shown in (Rao et al (2010)), the trajectories of feature points from a single rigid motion of the camera all lie in a liner subspace of , whose dimension is at most four (i.e., ). It has been shown that can be factorized as , where recovers the rotations and translations while the first three rows of encode the relative 3D positions for each feature point in the reconstructed object. However, when there exist errors (e.g., occlusion, missing data or outliers) the feature matrix is no longer of rank 4. Then recovering the full 3D structure of the object can be posed as a lowrank matrix recovery problem.
For this experiment, we first generate the 2D feature points by applying an affine camera model (with rotation angles between 0 and , with a step size , and uniformly randomly generated translations) to the 3D “wolf” object ^{11}^{11}11The 3D “wolf” data is available at: http://tosca.cs.technion.ac.il/., which contains 3D points. Then we add impulsive noises (the locations of the nonzero entries are uniformly sampled at random) to part (e.g., or ) of the feature points (see Figure 6). In this way, we obtain corrupted observations with a size .
We apply our filtering to remove outliers (i.e., ) and compute the affine motion matrix and the 3D coordinates from the recovered features (i.e., ). For comparison, we also include the results from the robust subspace learning (RSL) (De la Torre and Black (2003))^{12}^{12}12The Matlab code of RSL is available at http://www.salleurl.edu/ftorre/papers/rpca/rpca.zip and the parameters in this code are set to their default values. and standard PCP (i.e., SADM based PCP). In Figure 7, we show the original 3D object, SfM results based on noisy trajectories and trajectories recovered by RSL, standard PCP and filtering, respectively. It is easy to see that the 3D reconstruction of RSL fails near the front legs and tail. In contrast, the standard PCP and filtering provide results with almost the same quality. Table 2 further compares the numerical behaviors of these methods. We measure the quantitative performance for SfM by the wellknown mean 2D reprojection error, which is denoted as “ReprojErr” and defined by the mean distance of the ground truth 2D feature points and their reprojections. We can see that the standard PCP provides the highest numerical accuracy while its time cost is extremely high (9 times slower than RSL and more than 100 times slower than filtering). Although the speed of RSL is faster than standard PCP, its numerical accuracy is the worst among these methods. In comparison, our filtering achieves almost the same numerical accuracy as standard PCP and is the fastest.
(a) Original  (b) Corrupted  (c) RSL  (d) SPCP  (e) filtering 
Noisy Level  Method  RelErr  Time  MaxDif()  AveDif()  ReprojErr  

,  
RSL  0.0323  4 (fixed)  15384229  93.05  32.1731  0.4777  0.9851  
SPCP  4  869200  848.09  
4  869644  6.46  
,  
RSL  0.0550  4 (fixed)  16383294  106.65  38.1621  0.9285  1.8979  
SPCP  4  1738410  991.40  
4  1739912  6.48 
4.3 Background Modeling
In this subsection, we consider the problem of background modeling from video surveillance. The background of a group of video surveillance frames are supposed to be exactly the same and the foreground on each frame is recognized as sparse errors. Thus this vision problem can be naturally formulated as recovering the lowrank matrix from its sum with sparse errors (Candés et al (2011)). We compare our filtering with the baseline median filter^{13}^{13}13Please refer to http://en.wikipedia.org/wiki/Median_filter. and other stateoftheart robust approaches, such as RSL and SPCP. When median filtering using all the frames, the complexity is actually also quadratic and the results are not good. So we only buffer 20 frames when using median filter to compute the background. For filtering, we set the size of the seed matrix as .
For quantitative evaluation, we perform all the compared methods on the “laboratory” sequence from a public surveillance database (Benedek and Szirányi (2008)) which has ground truth foreground. Both the false negative rate (FNR) and the false positive rate (FPR) are calculated in the sense of foreground detection. FNR indicates the ability of the method to correctly recover the foreground while the FPR represents the power of a method on distinguishing the background. These two scores correspond to the Type I and Type II errors in the statistical test theory^{14}^{14}14Please refer to http://en.wikipedia.org/wiki/Type_I_and_type_II_errors. and are judged by the criterion that the smaller the better. One can see from Table 3 that RSL has the lowest FNR but the highest FPR among the compared methods. This reveals that RSL could not exactly distinguish the background. Although the speed of our filtering is slightly slower than median filtering on 20 frames, its performance is as good as SPCP, which achieves the best results but with the highest time cost.
Video    Median  RSL  SPCP  
“laboratory”  Resolution: , No. Frames: 887  
FNR  9.85  7.31  8.61  8.62  
FPR  9.18  10.83  8.72  8.76  
Time  42.90  3159.92  10897.96  48.99  
“meeting”  Resolution: , No. Frames: 700  
Time  179.19  N.A.  N.A.  178.74 
To further test the performance of filtering on large scale data set, we also collect a video sequence (named “meeting”) of 700 frames, each of which has a resolution . So the data matrix is of size greater than , which cannot be fit into the memory of our PC. As a result, we cannot use the standard ADM to solve the corresponding PCP problem. As for RSL, we have found that it did not converge on this data. Thus we only compare the performance of median filter and filtering. The time cost is reported in Table 3 and the qualitative comparison is shown in Figure 8. We can see that filtering is as fast as median filtering with 20 frames, and median filter fails on this data set. This is because the mechanism of median filter is based on the (local) frame difference. Thus when the scene contains slowly moving objects (such as that in the “meeting” video sequence), median filter will not give good results. In contrast, the background and the foreground can be separated satisfactorily by filtering. This makes sense because our filtering can exactly recover the (global) lowrank structure for the background and remove the foreground as sparse errors.
(a) Original  (b) Background (Median)  (c) Foreground (Median)  (d) Background ()  (e) Foreground () 
5 Conclusion and Further Work
In this paper, we propose the first linear time algorithm, named the filtering method, for exactly solving very large PCP problems, whose ranks are supposed to be very small compared to the data size. It first recovers a seed matrix and then uses the seed matrix to filter some rows and columns of the data matrix. It avoids SVD on the original data matrix, and the filtering step can be done in full parallelism. As a result, the time cost of our filtering method is only linear with respect to the data size, making applications of RPCA to extremely large scale problems possible. The experiments on both synthetic and real world data demonstrate the high accuracy and efficiency of our method. It is possible that the proposed technique can be applied to other large scale nuclear norm minimization problems, e.g., matrix completion (Cai et al (2010)) and lowrank representation (Liu et al (2010)). This will be our future work.
Acknowledgements.
The authors would like to thank Prof. Zaiwen Wen and Dr. Yadong Mu for sharing us their codes for LMaFit (Shen et al (2011)) and random projection (Mu et al (2011)), respectively. This work is partially supported by the grants of the National Nature Science Foundation of ChinaGuangdong Joint Fund (No. U0935004), the National Nature Science Foundation of China Fund (No. 60873181, 61173103) and the Fundamental Research Funds for the Central Universities. The first author would also like to thank the support from China Scholarship Council.References
 Aanes et al (2002) Aanes H, Fisker R, Astrom K, Carstensen J (2002) Robust factorization. IEEE Trans on PAMI 24(9):359–368
 Baccini et al (1996) Baccini A, Besse P, de Falguerolles A (1996) An norm PCA and a heuristic approach. In: Proceedings of the International Conference on Ordinal and Symbolic Data Analysis, pp 359–368
 Benedek and Szirányi (2008) Benedek C, Szirányi T (2008) Bayesian foreground and shadow detection in uncertain frame rate surveillance videos. IEEE Trans on Image Processing 17(4):608–621
 Cai et al (2010) Cai J, Candés E, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20(4):1956–1982
 Candés and Wakin (2007) Candés E, Wakin M (2007) An introduction to compressive sampling. IEEE Signal Processing Magazine 25(2):21–30
 Candés et al (2011) Candés E, Li X, Ma Y, Wright J (2011) Robust Principal Component Analysis? Journal of the ACM 58(3):11
 De la Torre and Black (2003) De la Torre F, Black M (2003) A framework for robust subspace learning. IJCV 54(1–3):117–142
 Drineas et al (2006) Drineas P, Kannan R, Mahoney M (2006) Fast Monte Carlo algorithms for matrices II: Computing a low rank approximation to a matrix. SIAM Journal on Computing 36(1):158–183
 Ganesh et al (2009) Ganesh A, Lin Z, Wright J, Wu L, Chen M, Ma Y (2009) Fast algorithms for recovering a corrupted lowrank matrix. In: Proceedings of International Workshop on Computational Advances in MultiSensor Adaptive Processing
 Halko et al (2011) Halko N, Martinsson P, Tropp J (2011) Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53(2):217–288
 Ji et al (2010) Ji H, Liu C, Shen Z, Xu Y (2010) Robust video denoising using lowrank matrix completion. In: CVPR
 Ke and Kanade (2005) Ke Q, Kanade T (2005) Robust norm factorization in the presence of outliers and missing data by alternative convex programming. In: CVPR
 Larsen (1998) Larsen R (1998) Lanczos bidiagonalization with partial reorthogonalization. Department of Computer Science, Aarhus University, Technical report, DAIMI PB357
 Lin et al (2009) Lin Z, Chen M, Wu L, Ma Y (2009) The augmented Lagrange multiplier method for exact recovery of corrupted lowrank matrices. UIUC Technical Report UILUENG092215
 Liu et al (2010) Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by lowrank representation. In: ICML
 Mu et al (2011) Mu Y, Dong J, Yuan X, Yan S (2011) Accelerated lowrank visual recovery by random projection. In: CVPR
 Nie et al (2011) Nie F, Huang H, Ding C, Luo D, Wang H (2011) Robust principal component analysis with nongreedy norm maximization. In: IJCAI
 Peng et al (2010) Peng Y, Ganesh A, Wright J, Xu W, Ma Y (2010) RASL: Robust alignment by sparse and lowrank decomposition for linearly correlated images. In: CVPR
 Rao et al (2010) Rao S, Tron R, Vidal R, Ma Y (2010) Motion segmentation in the presence of outlying, incomplete, and corrupted trajectories. IEEE Trans on PAMI 32(10):1832–1845
 Shen et al (2011) Shen Y, Wen Z, Zhang Y (2011) Augmented Lagrangian alternating direction method for matrix separation based on lowrank factorization. preprint
 Skocaj et al (2007) Skocaj D, Leonardis A, Bischof H (2007) Weighted and robust learning of subspace representations. Pattern Recognition 40(5):1556–1569
 Storer et al (2009) Storer M, Roth P, Urschler M, Bischof H (2009) Fastrobust PCA. In: Proc. 16th Scandinavian Conference on Image Analysis (SCIA)
 Wang et al (2009) Wang J, Dong Y, Tong X, Lin Z, Guo B (2009) Kernel Nyström method for light transport. ACM Transactions on Graphics 28(3)
 Wright et al (2009) Wright J, Ganesh A, Rao S, Peng Y, Ma Y (2009) Robust principal component analysis: Exact recovery of corrupted lowrank matrices via convex optimization. In: NIPS
 Wu et al (2010) Wu L, Ganesh A, Shi B, Matsushita Y, Wang Y, Ma Y (2010) Robust photometric stereo via lowrank matrix completion and recovery. preprint
 Yuan and Yang (2009) Yuan X, Yang J (2009) Sparse and lowrank matrix decomposition via alternating direction methods. preprint
 Zhang et al (2012) Zhang Z, Ganesh A, Liang X, Ma Y (2012) TILT: Transforminvariant lowrank textures. accepted by IJCV