# Compressive Online Robust Principal Component Analysis with Optical Flow for Video Foreground-Background Separation

###### Abstract

In the context of online Robust Principal Component Analysis (RPCA) for video foreground-background separation, we propose a compressive online RPCA with optical flow that recursively separates a sequence of frames into sparse (foreground) and low-rank (background) components. Our method considers a small set of measurements taken per data vector (frame), in contrast to conventional batch RPCA, which processes all the data directly. The proposed method also incorporates multiple prior information, namely previous foreground and background frames, to improve the separation, and then updates the prior information for the next frame. Moreover, the foreground prior frames are improved by estimating motions between the previous foreground frames using optical flow and compensating these motions to obtain a higher-quality foreground prior. The proposed method is applied to online video foreground and background separation from compressive measurements. The visual and quantitative results show that our method outperforms the existing methods.

Robust principal component analysis, video separation, compressive measurements, optical flow, prior information

## I Introduction

The separation of the background and foreground of a video sequence is of great importance in a number of computer vision applications, e.g., visual surveillance and object detection. This separation makes video analysis more efficient, and the extracted regions of interest can serve as a preprocessing step for further identification and classification. In video separation, a video sequence is separated into the slowly-changing background (modeled as a low-rank component) and the foreground (modeled as a sparse component). Robust Principal Component Analysis (RPCA) [1, 2] has been shown to be a robust method for separating low-rank and sparse components. RPCA decomposes a data matrix $\mathbf{M}$ into the sum of an unknown sparse matrix $\mathbf{S}$ and an unknown low-rank matrix $\mathbf{L}$ by solving the Principal Component Pursuit (PCP) [1] problem:

$$\min_{\mathbf{L},\mathbf{S}} \|\mathbf{L}\|_{*} + \lambda\|\mathbf{S}\|_{1} \quad \text{subject to} \quad \mathbf{M} = \mathbf{L} + \mathbf{S}, \tag{1}$$

where $\|\mathbf{L}\|_{*} = \sum_i \sigma_i(\mathbf{L})$ is the matrix nuclear norm (sum of singular values), $\|\mathbf{S}\|_{1} = \sum_{i,j}|S_{ij}|$ is the $\ell_1$-norm, and $\lambda > 0$ balances the two terms. RPCA has found many applications in computer vision, web data analysis, and recommender systems. However, batch RPCA processes all data samples at once, e.g., all frames of a video, which incurs high computational and memory requirements.
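Problem (1) can be solved with several convex solvers. As an illustration only, the following minimal sketch uses the inexact augmented Lagrange multiplier (ALM) method, a standard PCP solver that alternates soft thresholding for $\mathbf{S}$ and singular value thresholding for $\mathbf{L}$. It is not necessarily the solver used in [1]; the default $\lambda = 1/\sqrt{\max(m,n)}$, the growth factor $\rho = 1.5$, and the initial penalty $\mu$ are common choices assumed here.

```python
import numpy as np

def soft_threshold(X, tau):
    # Proximal operator of tau * ||.||_1 (elementwise shrinkage)
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    # Singular value thresholding: proximal operator of tau * ||.||_*
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp_inexact_alm(M, lam=None, tol=1e-7, max_iter=500):
    """Sketch of PCP (1) via inexact ALM; returns (L, S) with M ~= L + S."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm_M = np.linalg.norm(M, "fro")
    spec = np.linalg.norm(M, 2)                      # largest singular value
    Y = M / max(spec, np.abs(M).max() / lam)         # dual variable initialization
    mu, rho = 1.25 / spec, 1.5                       # assumed penalty schedule
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        S = soft_threshold(M - L + Y / mu, lam / mu)  # sparse update
        L = svt(M - S + Y / mu, 1.0 / mu)             # low-rank update
        R = M - L - S                                 # constraint residual
        Y = Y + mu * R
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R, "fro") / norm_M < tol:
            break
    return L, S
```

The low-rank step keeps only singular values above $1/\mu$, while the sparse step zeroes entries smaller than $\lambda/\mu$; increasing $\mu$ gradually tightens the equality constraint.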

Moreover, owing to the inherent characteristics of video, correlations among consecutive frames can be exploited to improve the separation. These correlations can be captured in the form of motions that describe how information changes from one frame to the next. Detecting motion is an integral part of the human visual system, and one of the dominant techniques for estimating motion in computer vision is optical flow computed by variational methods [3, 4, 5]. Optical flow estimates the motion vector of every pixel in a given frame that results from the relative motion between frames. In particular, the motion vector at each pixel can be estimated by minimizing a gradient-based matching term on pixel gray values combined with a smoothness criterion [3]. The computed motion vectors in the horizontal and vertical directions [6] can then be used to compensate motion and predict the information in the next frame. To produce highly accurate motions and correct large-displacement correspondences, large displacement optical flow [7] combines coarse-to-fine optimization with descriptor matching. Therefore, large displacement optical flow [7] can be exploited in video separation to estimate the motions among previously separated frames in support of the current frame separation.

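In order to deal with video separation in an online manner, we consider an online RPCA algorithm that recursively processes a sequence of frames (a.k.a. the column-vectors of $\mathbf{M}$), one per time instance. Additionally, we aim at recovering the foreground and background from a small set of measurements rather than from the full frame data, leveraging information from a set of previously separated frames. In particular, at time instance $t$, we wish to separate $\mathbf{M}_t = \mathbf{X}_t + \mathbf{V}_t$ into $\mathbf{X}_t = [\mathbf{x}_1\ \cdots\ \mathbf{x}_t]$ and $\mathbf{V}_t = [\mathbf{v}_1\ \cdots\ \mathbf{v}_t]$, where $[\cdot]$ denotes a matrix and $\mathbf{x}_t, \mathbf{v}_t \in \mathbb{R}^{n}$ are column-vectors of $\mathbf{X}_t$ and $\mathbf{V}_t$, respectively. We assume that $\mathbf{X}_{t-1}$ and $\mathbf{V}_{t-1}$ have been recovered at time instance $t-1$ and that at time instance $t$ we have access to compressive measurements of the full frame, a.k.a. vector $\mathbf{x}_t + \mathbf{v}_t$; that is, we observe $\mathbf{y}_t = \mathbf{\Phi}(\mathbf{x}_t + \mathbf{v}_t)$, where $\mathbf{\Phi} \in \mathbb{R}^{m\times n}$ ($m < n$) is a random projection. The recovery problem at time instance $t$ is thus written [8] as

$$\min_{\mathbf{x}_t,\mathbf{v}_t} \|[\mathbf{V}_{t-1}\ \mathbf{v}_t]\|_{*} + \lambda\|\mathbf{x}_t\|_{1} \quad \text{subject to} \quad \mathbf{y}_t = \mathbf{\Phi}(\mathbf{x}_t + \mathbf{v}_t), \tag{2}$$

where $\mathbf{y}_t$, $\mathbf{V}_{t-1}$, and $\mathbf{\Phi}$ are given.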
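As a concrete picture of the observation model $\mathbf{y}_t = \mathbf{\Phi}(\mathbf{x}_t + \mathbf{v}_t)$, the sketch below draws a random projection and measures one vectorized frame at a rate $m/n$. The text only states that $\mathbf{\Phi}$ is a random projection; the i.i.d. Gaussian entries scaled by $1/\sqrt{m}$ and the function name `compressive_measurements` are assumptions for illustration.

```python
import numpy as np

def compressive_measurements(x_t, v_t, rate, seed=0):
    """Hypothetical helper: return (Phi, y_t) with y_t = Phi @ (x_t + v_t)."""
    n = x_t.size
    m = max(1, int(round(rate * n)))        # number of measurements m = rate * n
    rng = np.random.default_rng(seed)
    # Assumption: Gaussian random projection; the paper does not fix the distribution.
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    y_t = Phi @ (x_t + v_t)                 # only y_t is available to the decoder
    return Phi, y_t
```

For example, a vectorized frame with $n = 4800$ pixels measured at rate $0.4$ yields $m = 1920$ measurements.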

Several works have addressed the separation problem [9, 10, 11, 12, 13] by extending RPCA [1]. Incremental PCP [9] processes one column-vector of $\mathbf{M}$ at a time; however, it assumes access to the complete data (e.g., full frames) rather than to compressive measurements. Compressive PCP [14], on the other hand, is a counterpart of batch RPCA that operates on compressive measurements. The studies in [10, 11, 12, 13] addressed online estimation of low-dimensional subspaces from randomly subsampled data to model the background. The work in [15] proposed an algorithm to recover the sparse component in (2); however, the low-rank component in (2) is not recovered per time instance from a small number of measurements. The alternative methods in [16] and [17] estimate the number of compressive measurements required to recover the foreground per time instance by assuming that the background does not vary. This assumption does not hold in realistic scenarios with illumination variations or moving backgrounds.

Using prior information to separate a sequence of time-varying frames brings significant improvements in the context of online RPCA [15, 18, 19]. Several studies on recursive recovery from low-dimensional measurements leverage prior information [16, 15, 18, 20]; the study in [20] provides a comprehensive overview of the domain, reviewing a class of recursive algorithms. The studies in [15, 18] used modified-CS [21] to leverage prior knowledge under the condition of slowly varying support and signal values. However, these methods, as well as those in [10, 11, 13], do not exploit the correlations between the current frame and multiple previously separated frames. Our recent work [8] leverages correlations across the previously separated foreground frames; however, it does not take into account displacements between the previous foreground frames and the current frame, which can degrade the separation performance.

Contribution. We propose a compressive online robust PCA with optical flow (CORPCA-OF) method, building on our previous work [8], that leverages information from previously separated foreground frames via optical flow [7]. The novelty of CORPCA-OF over CORPCA [8] is that optical flow is used to estimate and compensate motions between the foreground frames to generate new prior foreground frames. These new prior frames are highly correlated with the current frame and thus improve the separation. We also exploit the slowly-changing characteristics of backgrounds, modeled as low-rank components, via an incremental SVD method [22]. The compressive separation problem in (2) is solved in an online manner by minimizing not only an $n$-$\ell_1$-norm cost function [23] for the sparse foreground but also the rank (via the nuclear norm) of a matrix for the low-rank backgrounds. Thereafter, the newly separated foreground and background frames are used to update the prior knowledge for the next processing instance.

The rest of this paper is organized as follows. In Sec. II-A, we summarize the CORPCA algorithm [8], on which our proposed method builds, and state our problem. The proposed method is described in detail in Sec. II-B. In Sec. III, we test our proposed method in an online compressive video separation application on real video sequences and evaluate both visual and quantitative results.

## II Video foreground-background separation using Compressive Online Robust PCA with Optical Flow

In this section, we first review the CORPCA algorithm [8] for online compressive video separation and state our problem. Thereafter, we propose the CORPCA-OF method, which is summarized in Algorithm 1.

### II-A Compressive Online Robust PCA (CORPCA) for Video Separation

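The CORPCA algorithm [8] for video separation is based on the RAMSIA algorithm [23], which solves an $n$-$\ell_1$ minimization problem with adaptive weights to recover a sparse signal $\mathbf{x}$ from low-dimensional random measurements $\mathbf{y} = \mathbf{\Phi}\mathbf{x}$ with the aid of multiple prior information signals $\mathbf{z}_j$, $j \in \{0, 1, \ldots, J\}$, with $\mathbf{z}_0 = \mathbf{0}$. The objective function of RAMSIA [23] is given by

$$\min_{\mathbf{x}}\Big\{H(\mathbf{x}) = \frac{1}{2}\|\mathbf{\Phi}\mathbf{x} - \mathbf{y}\|_2^2 + \lambda\sum_{j=0}^{J}\beta_j\|\mathbf{W}_j(\mathbf{x} - \mathbf{z}_j)\|_1\Big\}, \tag{3}$$

where $\lambda > 0$ is a regularization parameter, $\beta_j > 0$ are weights across the prior information signals, and $\mathbf{W}_j$ is a diagonal matrix with weights for each element of the prior information signal $\mathbf{z}_j$; namely, $\mathbf{W}_j = \mathrm{diag}(w_{j1}, w_{j2}, \ldots, w_{jn})$, with $w_{ji} > 0$ being the weight for the $i$-th element of $\mathbf{z}_j$.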
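To make the role of the adaptive weights concrete, the sketch below computes the diagonals of $\mathbf{W}_j$ and the weights $\beta_j$ with an inverse-distance rule in the spirit of RAMSIA [23]: elements, and whole priors, that lie closer to the current estimate of $\mathbf{x}$ receive larger weights. The constant `eta` and the normalizations (each $\mathbf{W}_j$ summing to $n$, the $\beta_j$ summing to one) are assumptions of this sketch and may differ from [23].

```python
import numpy as np

def adaptive_weights(x, Z, eta=1e-3):
    """Inverse-distance weights (assumed form, in the spirit of RAMSIA [23]).

    x: current estimate, shape (n,)
    Z: priors stacked as rows, shape (J+1, n), with Z[0] = 0
    Returns W (J+1, n), the diagonals of W_j, and beta (J+1,).
    """
    n = x.shape[0]
    dist = np.abs(x[None, :] - Z)                 # |x_i - z_ji| for every prior j
    inv = 1.0 / (eta + dist)
    W = n * inv / inv.sum(axis=1, keepdims=True)  # each row sums to n (assumption)
    cost = np.sum(W * dist, axis=1)               # ||W_j (x - z_j)||_1
    inv_cost = 1.0 / (eta + cost)
    beta = inv_cost / inv_cost.sum()              # beta_j sum to one (assumption)
    return W, beta
```

A prior that matches the estimate exactly has zero weighted distance and therefore dominates the $\beta_j$.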

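The CORPCA algorithm processes one data vector per time instance by leveraging prior information for both its sparse and its low-rank components. At time instance $t$, we observe $\mathbf{y}_t = \mathbf{\Phi}(\mathbf{x}_t + \mathbf{v}_t)$ with $\mathbf{y}_t \in \mathbb{R}^{m}$. Let $\mathcal{Z}_{t-1} := \{\mathbf{z}_1, \ldots, \mathbf{z}_J\}$, a set of $\mathbf{z}_j \in \mathbb{R}^{n}$, and $\mathbf{B}_{t-1} \in \mathbb{R}^{n\times d}$ denote the prior information for $\mathbf{x}_t$ and $\mathbf{v}_t$, respectively. The prior information $\mathcal{Z}_{t-1}$ and $\mathbf{B}_{t-1}$ is formed from the already reconstructed vectors $\{\hat{\mathbf{x}}_1, \ldots, \hat{\mathbf{x}}_{t-1}\}$ and $\{\hat{\mathbf{v}}_1, \ldots, \hat{\mathbf{v}}_{t-1}\}$.

To solve Problem (2), CORPCA minimizes the objective function

$$\min_{\mathbf{x}_t,\mathbf{v}_t}\Big\{H(\mathbf{x}_t,\mathbf{v}_t\,|\,\mathbf{y}_t,\mathcal{Z}_{t-1},\mathbf{B}_{t-1}) = \frac{1}{2}\|\mathbf{\Phi}(\mathbf{x}_t + \mathbf{v}_t) - \mathbf{y}_t\|_2^2 + \lambda\mu\sum_{j=0}^{J}\beta_j\|\mathbf{W}_j(\mathbf{x}_t - \mathbf{z}_j)\|_1 + \mu\|[\mathbf{B}_{t-1}\ \mathbf{v}_t]\|_{*}\Big\}, \tag{4}$$

where $\mu > 0$. When $\mathbf{v}_t$ is static (not changing), Problem (4) reduces to Problem (3). Furthermore, when $\mathbf{x}_t$ and $\mathbf{v}_t$ are batch variables and we disregard the prior information, $\mathcal{Z}_{t-1}$ and $\mathbf{B}_{t-1}$, and the projection $\mathbf{\Phi}$, Problem (4) becomes Problem (1).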
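Evaluating the objective (4) directly is useful for monitoring the iterations. The sketch below computes $H$ for given $\mathbf{x}_t$ and $\mathbf{v}_t$, storing the priors $\mathbf{z}_j$ and the diagonals of $\mathbf{W}_j$ as rows of arrays; this array layout and the function name are assumptions for illustration.

```python
import numpy as np

def corpca_objective(x, v, y, Phi, Z, W, beta, B, lam, mu):
    """Evaluate H(x_t, v_t | y_t, Z_{t-1}, B_{t-1}) as in (4).

    Z, W: (J+1, n) arrays holding z_j and the diagonals of W_j (Z[0] = 0);
    beta: (J+1,); B: (n, d) low-rank prior B_{t-1}.
    """
    fit = 0.5 * np.sum((Phi @ (x + v) - y) ** 2)                       # data fidelity
    prior = np.sum(beta * np.sum(np.abs(W * (x[None, :] - Z)), axis=1))  # n-l1 term
    nuc = np.linalg.norm(np.column_stack([B, v]), ord="nuc")           # ||[B_{t-1} v_t]||_*
    return fit + lam * mu * prior + mu * nuc
```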

The CORPCA algorithm (the code, the test sequences, and the corresponding outcomes are available at https://github.com/huynhlvd/corpca) solves Problem (4) given that $\mathcal{Z}_{t-1}$ and $\mathbf{B}_{t-1}$ are known, i.e., obtained from the previous time instance (recursion) $t-1$. Thereafter, we update $\mathcal{Z}_t$ and $\mathbf{B}_t$, which are used in the following time instance.

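Let us denote $\mathbf{x} := \mathbf{x}_t$, $\mathbf{v} := \mathbf{v}_t$, and $f(\mathbf{v},\mathbf{x}) := \frac{1}{2}\|\mathbf{\Phi}(\mathbf{x} + \mathbf{v}) - \mathbf{y}_t\|_2^2$. As shown in the CORPCA algorithm [8], $\mathbf{x}$ and $\mathbf{v}$ are computed at iteration $k+1$ via the soft thresholding operator [24] for $\mathbf{x}$ and the singular value thresholding operator [25] for $\mathbf{v}$:

$$\mathbf{v}^{(k+1)} = \arg\min_{\mathbf{v}}\Big\{\mu\|[\mathbf{B}_{t-1}\ \mathbf{v}]\|_{*} + \Big\|\mathbf{v} - \Big(\mathbf{v}^{(k)} - \frac{1}{2}\nabla_{\mathbf{v}} f(\mathbf{v}^{(k)},\mathbf{x}^{(k)})\Big)\Big\|_2^2\Big\}, \tag{5}$$

$$\mathbf{x}^{(k+1)} = \arg\min_{\mathbf{x}}\Big\{\lambda\mu\sum_{j=0}^{J}\beta_j\|\mathbf{W}_j(\mathbf{x} - \mathbf{z}_j)\|_1 + \Big\|\mathbf{x} - \Big(\mathbf{x}^{(k)} - \frac{1}{2}\nabla_{\mathbf{x}} f(\mathbf{v}^{(k)},\mathbf{x}^{(k)})\Big)\Big\|_2^2\Big\}. \tag{6}$$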
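The singular value thresholding operator [25], $\mathcal{D}_\tau(\mathbf{X}) = \mathbf{U}\max(\mathbf{\Sigma} - \tau, 0)\mathbf{V}^{\mathrm{T}}$, is the proximal operator of $\tau\|\cdot\|_*$. Because the $\mathbf{v}$-update penalizes the nuclear norm of the augmented matrix $[\mathbf{B}_{t-1}\ \mathbf{v}]$ while only $\mathbf{v}$ is free, one simple approximation, assumed here and not necessarily identical to [8], is to threshold $[\mathbf{B}_{t-1}\ \mathbf{u}]$ with $\mathbf{u} = \mathbf{v}^{(k)} - \frac{1}{2}\nabla_{\mathbf{v}} f$ and keep its last column. The threshold $\mu/2$ follows from dividing the $\mathbf{v}$-update objective by two; `background_step` is a hypothetical helper name.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding D_tau(X) = U max(Sigma - tau, 0) V^T [25]."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def background_step(B_prev, v_k, grad_v, mu):
    """Hypothetical helper: approximate v-update by thresholding [B_{t-1} u]."""
    u = v_k - 0.5 * grad_v                              # gradient step with step size 1/2
    D = svt(np.column_stack([B_prev, u]), mu / 2.0)     # prox of (mu/2)||.||_* on the full matrix
    return D[:, -1]                                     # keep only the column belonging to v
```

With $f$ as defined above, $\nabla_{\mathbf{v}} f = \mathbf{\Phi}^{\mathrm{T}}(\mathbf{\Phi}(\mathbf{x} + \mathbf{v}) - \mathbf{y}_t)$, which coincides with $\nabla_{\mathbf{x}} f$.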

Problem statement. Using prior information as in CORPCA [8] significantly improves the separation of the current frame. However, displacements between consecutive frames can deteriorate the separation performance. Fig. 1 illustrates an example with three previous foreground frames, $\hat{\mathbf{x}}_{t-3}$, $\hat{\mathbf{x}}_{t-2}$, and $\hat{\mathbf{x}}_{t-1}$. Owing to the natural correlations between these frames and $\mathbf{x}_t$, CORPCA [8] uses them directly as prior information to recover the foreground and background. However, the last row of Fig. 1 shows that motions exist among these three prior foreground frames and the current frame $\mathbf{x}_t$. By estimating motion with optical flow [7], we can obtain the motions between the previous foreground frames, which are visualized in Fig. 1 using the color code for motion flow fields [7]. These motions can be compensated to generate prior frames of higher quality (compare the compensated frames with the original ones in Fig. 1) that are more correlated with $\mathbf{x}_t$. In this work, we propose CORPCA with Optical Flow (CORPCA-OF), whose work flow, shown in Fig. 1, uses optical flow [7] to improve the prior foreground frames.

### II-B The Proposed CORPCA-OF Method

Compressive Separation Model with CORPCA-OF. Fig. 2 depicts a compressive separation model using the proposed CORPCA-OF method. At time instance $t$, the inputs consist of the compressive measurements $\mathbf{y}_t = \mathbf{\Phi}(\mathbf{x}_t + \mathbf{v}_t)$ and the prior information from time instance $t-1$, $\mathcal{Z}_{t-1}$ and $\mathbf{B}_{t-1}$. The model outputs the foreground and background estimates $\hat{\mathbf{x}}_t$ and $\hat{\mathbf{v}}_t$ by solving the CORPCA minimization problem in (4). The outputs $\hat{\mathbf{x}}_t$ and $\hat{\mathbf{v}}_t$ are then used to generate better prior foreground information via the Prior Generation using Optical Flow block and to update $\mathcal{Z}_t$ and $\mathbf{B}_t$ for the next instance via the Prior Update block. The novel block of CORPCA-OF compared with CORPCA [8] is Prior Generation using Optical Flow, where the prior foreground information is improved by exploiting large displacement optical flow [7]. The CORPCA-OF method is further described in Algorithm 1.

Prior Generation using Optical Flow. CORPCA-OF aims at improving the foreground prior frames via optical flow. In Algorithm 1, the prior frames are initialized with the three most recent foreground estimates, $\hat{\mathbf{x}}_{t-3}$, $\hat{\mathbf{x}}_{t-2}$, and $\hat{\mathbf{x}}_{t-1}$. Optical flow is used to compute the motions between $\hat{\mathbf{x}}_{t-2}$ and $\hat{\mathbf{x}}_{t-1}$ (and between $\hat{\mathbf{x}}_{t-3}$ and $\hat{\mathbf{x}}_{t-1}$), yielding an optical flow field for each frame pair; these fields are shown in color code in the CORPCA-OF work flow diagram in Fig. 1. The optical flow function of Algorithm 1 computes these motions between prior foreground frames using large displacement optical flow, as formulated in [7]; each resulting flow field consists of a horizontal and a vertical component. The estimated flow fields are then used to predict the next frame by compensating the forward motions on $\hat{\mathbf{x}}_{t-1}$, and the new prior frames are generated by this motion compensation, as shown in Algorithm 1.

Considering a point $(x, y)$ in a given frame with intensity $I(x, y, t)$, the horizontal and vertical flow components $o^{h}$ and $o^{v}$ at that point are related, as described in [26], by the optical flow constraint

$$I_x\,o^{h} + I_y\,o^{v} + I_t = 0, \tag{7}$$

where $I_x$ and $I_y$ are the intensity changes in the horizontal ($x$) and vertical ($y$) directions, respectively, constituting the spatial gradients of the intensity level $I(x, y, t)$, and $I_t$ is the temporal gradient, which measures the temporal change of the intensity level at $(x, y)$. Since (7) is a single equation in two unknowns, additional constraints are needed, and various methods [3, 7, 4, 5] exist to determine $o^{h}$ and $o^{v}$. Our solution is based on large displacement optical flow [7], which combines global and local approaches to estimate both small and large motions. It integrates descriptor (feature) matching into conventional variational optical flow estimation and minimizes the resulting energy with a continuation method to obtain the flow field. We stack the flow components of all points in the image into two vectors, the horizontal and vertical components of the flow field between $\hat{\mathbf{x}}_{t-2}$ and $\hat{\mathbf{x}}_{t-1}$; similarly, we obtain the flow field between $\hat{\mathbf{x}}_{t-3}$ and $\hat{\mathbf{x}}_{t-1}$.
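Large displacement optical flow [7] is considerably more involved than can be shown here. To illustrate how a smoothness prior resolves the single constraint (7), the sketch below implements the classical Horn-Schunck iteration [3] instead; it is a swapped-in, simpler method and not the flow estimator used in CORPCA-OF. The regularization weight `alpha`, the 4-neighbour averaging (a simplification of the original weighted stencil), and the iteration count are assumptions.

```python
import numpy as np

def _local_mean(f):
    # 4-neighbour average with edge replication (smoothness term of Horn-Schunck)
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(I1, I2, alpha=1.0, n_iter=200):
    """Return flow (o_h, o_v) from I1 to I2 (columns = horizontal, rows = vertical)."""
    I1 = I1.astype(float)
    I2 = I2.astype(float)
    Iy, Ix = np.gradient(0.5 * (I1 + I2))   # spatial gradients: d/drow, d/dcol
    It = I2 - I1                            # temporal gradient
    oh = np.zeros_like(I1)
    ov = np.zeros_like(I1)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        oh_bar, ov_bar = _local_mean(oh), _local_mean(ov)
        # Residual of constraint (7) at the smoothed flow, scaled by the denominator
        r = (Ix * oh_bar + Iy * ov_bar + It) / denom
        oh = oh_bar - Ix * r
        ov = ov_bar - Iy * r
    return oh, ov
```

Where the image gradient is strong, each update essentially projects the locally averaged flow onto the constraint line (7); in flat regions the flow is filled in from neighbours.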

The estimated flow fields are used together with the most recent foreground frame $\hat{\mathbf{x}}_{t-1}$ to produce new prior frames that form the updated prior information. Linear interpolation along columns and along rows is used to generate the new frames, as represented by the motion compensation function in Algorithm 1. Using the flow field between $\hat{\mathbf{x}}_{t-2}$ and $\hat{\mathbf{x}}_{t-1}$ to predict the motion into the next frame and compensating it on $\hat{\mathbf{x}}_{t-1}$ yields one new prior frame; the flow field between $\hat{\mathbf{x}}_{t-3}$ and $\hat{\mathbf{x}}_{t-1}$ yields a second one, for which only half of the motion is compensated, since this flow spans two frame intervals. These generated frames are more correlated with the current frame $\mathbf{x}_t$ than $\hat{\mathbf{x}}_{t-3}$ and $\hat{\mathbf{x}}_{t-2}$ are. We also keep the most recent frame $\hat{\mathbf{x}}_{t-1}$ itself as one of the prior frames.
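The motion compensation step can be sketched with backward warping and bilinear interpolation (linear interpolation along columns, then along rows). This minimal sketch assumes that the flow estimated between two past frames stays constant over the next frame interval, so the predicted frame samples the most recent foreground frame at $p - \mathbf{o}(p)$; the `scale` argument (0.5 for the flow between $\hat{\mathbf{x}}_{t-3}$ and $\hat{\mathbf{x}}_{t-1}$) mirrors the half-motion compensation described above. The exact compensation in Algorithm 1 may differ, and the function names are hypothetical.

```python
import numpy as np

def bilinear_sample(img, rows, cols):
    """Sample img at real-valued (rows, cols) with bilinear interpolation."""
    h, w = img.shape
    rows = np.clip(rows, 0, h - 1)
    cols = np.clip(cols, 0, w - 1)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    dr, dc = rows - r0, cols - c0
    top = (1 - dc) * img[r0, c0] + dc * img[r0, c1]   # interpolate along columns
    bot = (1 - dc) * img[r1, c0] + dc * img[r1, c1]
    return (1 - dr) * top + dr * bot                  # then along rows

def compensate(frame, oh, ov, scale=1.0):
    """Hypothetical helper: predict the next frame by moving `frame` along the flow."""
    h, w = frame.shape
    rr, cc = np.mgrid[0:h, 0:w].astype(float)
    # Backward warping: each output pixel looks back along the (scaled) motion vector
    return bilinear_sample(frame, rr - scale * ov, cc - scale * oh)
```

Backward warping is chosen here because every output pixel receives exactly one value, avoiding the holes and overlaps that forward splatting would produce.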

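Thereafter, $\mathbf{x}_t$ and $\mathbf{v}_t$ are iteratively computed as in Algorithm 1 to solve Problem (4). The proximal operator $\Gamma_{\tau g_1}(\cdot)$ used in Algorithm 1 is defined [8] as

$$\Gamma_{\tau g_1}(\mathbf{X}) = \arg\min_{\mathbf{V}}\Big\{\tau g_1(\mathbf{V}) + \frac{1}{2}\|\mathbf{V} - \mathbf{X}\|_2^2\Big\}, \tag{8}$$

where $g_1(\cdot) = \lambda\sum_{j=0}^{J}\beta_j\|\mathbf{W}_j(\cdot - \mathbf{z}_j)\|_1$. The weights $\beta_j$ and $\mathbf{W}_j$ are updated at every iteration of the algorithm. As suggested in [2], convergence of the iterations in Algorithm 1 is determined by evaluating the stopping criterion of [2]. After convergence, we update the prior information for the next instance, $\mathcal{Z}_t$ and $\mathbf{B}_t$.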
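For $J \ge 1$ the proximal operator (8) of the weighted multi-prior $\ell_1$ term no longer reduces to plain soft thresholding. Because it separates over elements, each element solves a one-dimensional convex problem whose objective is quadratic between consecutive prior values $z_{ji}$. The sketch below is a generic exact solver written for illustration, not the closed form used in [8]: it evaluates the clipped stationary point of every such interval and keeps the best one. Here `C[j, i]` stands for $\beta_j w_{ji}$ and `tau` absorbs $\tau\lambda$; both names and the array layout are assumptions.

```python
import numpy as np

def prox_multi_prior_l1(u, Z, C, tau):
    """Elementwise argmin_x  tau * sum_j C[j] * |x - Z[j]| + 0.5 * (x - u)**2.

    u: (n,) point to be shrunk; Z: (K, n) prior values (row 0 may be zeros);
    C: (K, n) nonnegative weights, e.g. beta_j * w_ji (assumed layout).
    """
    u = np.asarray(u, dtype=float)
    order = np.argsort(Z, axis=0)
    zs = np.take_along_axis(Z, order, axis=0)       # priors sorted per element
    cs = np.take_along_axis(C, order, axis=0)       # weights in the same order
    below = np.vstack([np.zeros_like(u), np.cumsum(cs, axis=0)])  # weight of priors left of region k
    slope = 2.0 * below - cs.sum(axis=0)            # sum_j c_j * sign(x - z_j) inside region k
    lo = np.vstack([np.full_like(u, -np.inf), zs])  # region k = [lo[k], hi[k]]
    hi = np.vstack([zs, np.full_like(u, np.inf)])
    cand = np.clip(u - tau * slope, lo, hi)         # minimiser of each quadratic piece
    obj = (tau * np.sum(C[None] * np.abs(cand[:, None, :] - Z[None]), axis=1)
           + 0.5 * (cand - u) ** 2)
    best = np.argmin(obj, axis=0)                   # best region per element
    return np.take_along_axis(cand, best[None, :], axis=0)[0]
```

With a single all-zero prior and unit weights, the function reduces to ordinary soft thresholding.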

Prior Update. The update of $\mathcal{Z}_t$ and $\mathbf{B}_t$ [8] is carried out after each time instance (see Algorithm 1). Due to the correlation between subsequent frames, we update the foreground prior information using the latest recovered sparse components, i.e., $\mathcal{Z}_t := \{\hat{\mathbf{x}}_{t-J+1}, \ldots, \hat{\mathbf{x}}_t\}$. For $\mathbf{B}_t$, we consider an adaptive update that operates on a fixed number $d$ of columns. To this end, the incremental singular value decomposition (SVD) method [22] is used. Note that the update $\mathbf{B}_t = [\mathbf{B}_{t-1}\ \hat{\mathbf{v}}_t]$ increases the dimension of $\mathbf{B}_t$ to $n \times (d+1)$ after each instance; to keep the number of columns at $d$, we retain a truncated, $d$-column version of the updated matrix. The computational cost of the incremental SVD is lower than that of a conventional SVD [22, 9] because we only compute the full SVD of a middle matrix of size $(d+1)\times(d+1)$, where $d \ll n$, instead of the $n \times (d+1)$ matrix.

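The incremental SVD is computed as follows. The goal is to compute the SVD of the augmented matrix, $[\mathbf{B}_{t-1}\ \hat{\mathbf{v}}_t] = \mathbf{U}_t\mathbf{\Sigma}_t\mathbf{V}_t^{\mathrm{T}}$. Given the SVD $\mathbf{B}_{t-1} = \mathbf{U}_{t-1}\mathbf{\Sigma}_{t-1}\mathbf{V}_{t-1}^{\mathrm{T}}$, we can derive $\mathbf{U}_t$, $\mathbf{\Sigma}_t$, and $\mathbf{V}_t$ from $\mathbf{U}_{t-1}$, $\mathbf{\Sigma}_{t-1}$, $\mathbf{V}_{t-1}$, and $\hat{\mathbf{v}}_t$. We write the augmented matrix as

$$[\mathbf{B}_{t-1}\ \hat{\mathbf{v}}_t] = \Big[\mathbf{U}_{t-1}\ \ \frac{\boldsymbol{\delta}_t}{\|\boldsymbol{\delta}_t\|_2}\Big]\begin{bmatrix}\mathbf{\Sigma}_{t-1} & \mathbf{e}_t\\ \mathbf{0}^{\mathrm{T}} & \|\boldsymbol{\delta}_t\|_2\end{bmatrix}\begin{bmatrix}\mathbf{V}_{t-1}^{\mathrm{T}} & \mathbf{0}\\ \mathbf{0}^{\mathrm{T}} & 1\end{bmatrix}, \tag{9}$$

where $\mathbf{e}_t = \mathbf{U}_{t-1}^{\mathrm{T}}\hat{\mathbf{v}}_t$ and $\boldsymbol{\delta}_t = \hat{\mathbf{v}}_t - \mathbf{U}_{t-1}\mathbf{e}_t$. Taking the SVD of the middle matrix on the right-hand side of (9), $\begin{bmatrix}\mathbf{\Sigma}_{t-1} & \mathbf{e}_t\\ \mathbf{0}^{\mathrm{T}} & \|\boldsymbol{\delta}_t\|_2\end{bmatrix} = \tilde{\mathbf{U}}\tilde{\mathbf{\Sigma}}\tilde{\mathbf{V}}^{\mathrm{T}}$, we finally obtain $\mathbf{U}_t = \big[\mathbf{U}_{t-1}\ \ \boldsymbol{\delta}_t/\|\boldsymbol{\delta}_t\|_2\big]\tilde{\mathbf{U}}$, $\mathbf{\Sigma}_t = \tilde{\mathbf{\Sigma}}$, and $\mathbf{V}_t = \begin{bmatrix}\mathbf{V}_{t-1} & \mathbf{0}\\ \mathbf{0}^{\mathrm{T}} & 1\end{bmatrix}\tilde{\mathbf{V}}$.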
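The column-append update of Brand [22] translates directly into a few lines of NumPy. The sketch below forms $\mathbf{e}_t = \mathbf{U}_{t-1}^{\mathrm{T}}\hat{\mathbf{v}}_t$, the residual $\boldsymbol{\delta}_t = \hat{\mathbf{v}}_t - \mathbf{U}_{t-1}\mathbf{e}_t$, and the $(d+1)\times(d+1)$ middle matrix, and then rotates the outer factors. The truncation back to $d$ columns is omitted, and the guard for $\|\boldsymbol{\delta}_t\|_2 \approx 0$ is an assumption of this sketch.

```python
import numpy as np

def inc_svd(U, S, V, v_new, eps=1e-12):
    """Append column v_new to B = U @ diag(S) @ V.T and return the SVD of [B, v_new].

    U: (n, d) orthonormal, S: (d,), V: (c, d) with orthonormal columns.
    """
    d = S.size
    e = U.T @ v_new                      # e_t: coordinates of v_new in span(U)
    delta = v_new - U @ e                # delta_t: residual orthogonal to span(U)
    p = np.linalg.norm(delta)
    q = delta / p if p > eps else np.zeros_like(delta)  # guard (assumption)
    K = np.zeros((d + 1, d + 1))         # middle matrix
    K[:d, :d] = np.diag(S)
    K[:d, d] = e
    K[d, d] = p
    Uk, Sk, Vkt = np.linalg.svd(K)       # only a (d+1) x (d+1) SVD is needed
    U_new = np.column_stack([U, q]) @ Uk
    V_ext = np.zeros((V.shape[0] + 1, d + 1))
    V_ext[:-1, :d] = V
    V_ext[-1, d] = 1.0
    V_new = V_ext @ Vkt.T
    return U_new, Sk, V_new
```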

## III Experimental Results

We evaluate the performance of our proposed CORPCA-OF (Algorithm 1) and compare it against the existing methods RPCA [1], GRASTA [10], and ReProCS [15]. RPCA [1] is a batch method assuming full access to the data, while GRASTA [10] and ReProCS [15] are online methods that can recover either the (low-rank) background component (GRASTA) or the (sparse) foreground component (ReProCS) from compressive measurements. In this work, we test two sequences [27], Bootstrap (60×80 pixels) and Curtain (64×80 pixels), which have a static and a dynamic background, respectively.

### III-A Prior Information Evaluation

We evaluate the prior information of CORPCA-OF in comparison with that of CORPCA [8], which uses the previously separated foreground frames directly. For CORPCA-OF, we generate the prior information by estimating and compensating motions among the previous foreground frames. Fig. 3 shows a few examples of the prior information generated for the sequences Bootstrap and Curtain. In Fig. 3(a), it can be observed that frames #2210’, #2211’, and #2212’ (of CORPCA-OF) are better priors for the current frame #2213 than the corresponding frames #2210, #2211, and #2212 (of CORPCA); similar observations hold in Figs. 3(b), 3(c), and 3(d). In particular, in Fig. 3(c), the generated frames #448’ and #449’ are significantly improved owing to the compensation of dense motions. In Fig. 3(d), the movements of the person are clearly well compensated in #2771’ and #2772’ by CORPCA-OF compared to #2771 and #2772 of CORPCA, respectively, leading to better correlation with the foreground of the current frame #2774.

### III-B Compressive Video Foreground and Background Separation

We assess our CORPCA-OF method in the application of compressive video separation and compare it against the existing methods CORPCA [8], RPCA [1], GRASTA [10], and ReProCS [15]. We run all methods on the test video sequences. In this experiment, we use frames as training vectors for the proposed CORPCA-OF, CORPCA [8], GRASTA [10], and ReProCS [15]. The three most recent foreground frames are used as the foreground prior for CORPCA, whereas CORPCA-OF uses them to refine the foreground prior using optical flow [7].

#### III-B1 Visual Evaluation

We first consider background and foreground separation with full access to the video data ($m/n = 1$); the visual results of the various methods are illustrated in Fig. 4. For both video sequences, CORPCA-OF delivers better visual results than the other methods, which suffer from missing foreground details and noisy background images. We can also observe improvements over CORPCA.

Additionally, we compare the visual results of CORPCA-OF, CORPCA, and ReProCS for the frames Bootstrap #2213 (Fig. 5) and Curtain #2866 (Fig. 6) under compressive measurements. The results are shown for various measurement rates $m/n$, i.e., the ratio of the number of measurements $m$ to the dimension $n$ of the data (the size of the vectorized frame). Comparing CORPCA-OF with CORPCA, we observe in Figs. 5 and 6 that CORPCA-OF gives less noisy foregrounds and background frames of higher visual quality. Our algorithm also significantly outperforms ReProCS. At low rates, for instance in Fig. 5(a) and Fig. 6(a), the foreground frames extracted by CORPCA-OF are better than those of CORPCA and ReProCS. Even at a high rate, the sparse components (foreground frames) recovered by ReProCS are noisy and of poor visual quality. The Bootstrap sequence requires more measurements than Curtain owing to its more complex foreground information. Overall, Figs. 5 and 6 show that CORPCA-OF yields visual results of superior quality to ReProCS and significant improvements over CORPCA.

#### III-B2 Quantitative Results

We evaluate the separation performance quantitatively via the receiver operating characteristic (ROC) curve [28]; true positives and false positives are defined as in [28]. Fig. 7 illustrates the ROC results of CORPCA-OF, CORPCA, RPCA, GRASTA, and ReProCS assuming full data access, i.e., $m/n = 1$. The results show that CORPCA-OF achieves higher performance than the other methods.
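ROC points are obtained by sweeping a threshold over the recovered foreground and counting pixelwise detections against a ground-truth mask. The sketch below assumes the standard definitions (true positive rate over ground-truth foreground pixels, false positive rate over background pixels) and thresholds the magnitude of the recovered foreground; the exact definitions in [28] may differ, and `roc_points` is a hypothetical helper name.

```python
import numpy as np

def roc_points(fg_est, gt_mask, thresholds):
    """Return (fpr, tpr) arrays for a recovered foreground and a binary ground truth."""
    score = np.abs(fg_est).ravel()          # assumption: magnitude as detection score
    gt = gt_mask.ravel().astype(bool)
    P, N = gt.sum(), (~gt).sum()            # numbers of foreground / background pixels
    fpr, tpr = [], []
    for th in thresholds:
        pred = score > th                   # pixels declared foreground
        tpr.append(np.sum(pred & gt) / P)   # true positive rate
        fpr.append(np.sum(pred & ~gt) / N)  # false positive rate
    return np.array(fpr), np.array(tpr)
```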

Furthermore, we compare the foreground recovery performance of CORPCA-OF against CORPCA and ReProCS for different compressive measurement rates $m/n$. The ROC results in Figs. 8 and 9 show that CORPCA-OF achieves higher performance than ReProCS and CORPCA. In particular, with a small number of measurements, CORPCA-OF produces better curves than CORPCA, e.g., for Bootstrap [see Fig. 8(a)] and for Curtain [see Fig. 9(a)]. The ROC performance of ReProCS degrades quickly, even at a high compressive measurement rate [see Fig. 9(c)].

## IV Conclusion

This paper proposed a compressive online robust PCA algorithm with optical flow (CORPCA-OF) that processes one frame per time instance using compressive measurements. CORPCA-OF efficiently incorporates multiple prior frames based on the $n$-$\ell_1$ minimization problem. The proposed method exploits motion estimation and compensation using optical flow to refine the prior information and improve its quality. We have tested our method in a compressive online video separation application using real video data. The visual and quantitative results showed improved prior generation and superior performance of CORPCA-OF compared to existing methods, including the CORPCA baseline.

## References

- [1] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” J. ACM, vol. 58, no. 3, pp. 11:1–11:37, Jun. 2011.
- [2] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, “Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization,” in Advances in Neural Information Processing Systems 22, 2009.
- [3] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artif. Intell., vol. 17, no. 1-3, pp. 185–203, Aug. 1981.
- [4] A. Bruhn, J. Weickert, and C. Schnörr, “Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods,” International Journal of Computer Vision, vol. 61, no. 3, pp. 211–231, Feb. 2005.
- [5] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski, “A database and evaluation methodology for optical flow,” International Journal of Computer Vision, vol. 92, no. 1, pp. 1–31, Mar. 2011.
- [6] A. Bruhn, J. Weickert, and C. Schnörr, “Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods,” International Journal of Computer Vision, vol. 61, no. 3, pp. 211–231, 2005.
- [7] T. Brox and J. Malik, “Large displacement optical flow: Descriptor matching in variational motion estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 3, pp. 500–513, March 2011.
- [8] H. V. Luong, N. Deligiannis, J. Seiler, S. Forchhammer, and A. Kaup, “Compressive online robust principal component analysis with multiple prior information,” in IEEE Global Conference on Signal and Information Processing, Montreal, Canada (e-print in arXiv), Nov. 2017.
- [9] P. Rodriguez and B. Wohlberg, “Incremental principal component pursuit for video background modeling,” Journal of Mathematical Imaging and Vision, vol. 55, no. 1, pp. 1–18, 2016.
- [10] J. He, L. Balzano, and A. Szlam, “Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video,” in IEEE Conference on Computer Vision and Pattern Recognition, June 2012.
- [11] J. Xu, V. K. Ithapu, L. Mukherjee, J. M. Rehg, and V. Singh, “GOSUS: Grassmannian online subspace updates with structured-sparsity,” in IEEE International Conference on Computer Vision, Dec. 2013.
- [12] J. Feng, H. Xu, and S. Yan, “Online robust PCA via stochastic optimization,” in Advances in Neural Information Processing Systems 26, 2013.
- [13] H. Mansour and X. Jiang, “A robust online subspace estimation and tracking algorithm,” in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, April 2015.
- [14] J. Wright, A. Ganesh, K. Min, and Y. Ma, “Compressive principal component pursuit,” Information and Inference, vol. 2, no. 1, pp. 32–68, 2013.
- [15] H. Guo, C. Qiu, and N. Vaswani, “An online algorithm for separating sparse and low-dimensional signal sequences from their sum,” IEEE Trans. Signal Process., vol. 62, no. 16, pp. 4284–4297, 2014.
- [16] J. F. Mota, N. Deligiannis, A. C. Sankaranarayanan, V. Cevher, and M. R. Rodrigues, “Adaptive-rate reconstruction of time-varying signals with application in compressive foreground extraction,” IEEE Trans. Signal Process., vol. 64, no. 14, pp. 3651–3666, 2016.
- [17] G. Warnell, S. Bhattacharya, R. Chellappa, and T. Basar, “Adaptive-rate compressive sensing using side information,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3846–3857, 2015.
- [18] C. Qiu, N. Vaswani, B. Lois, and L. Hogben, “Recursive robust PCA or recursive sparse recovery in large but structured noise,” IEEE Trans. Inf. Theory, vol. 60, no. 8, pp. 5007–5039, 2014.
- [19] J. Zhan and N. Vaswani, “Robust PCA with partial subspace knowledge,” in IEEE Int. Symposium on Information Theory, 2014.
- [20] N. Vaswani and J. Zhan, “Recursive recovery of sparse signal sequences from compressive measurements: A review,” IEEE Trans. Signal Process., vol. 64, no. 13, pp. 3523–3549, 2016.
- [21] N. Vaswani and W. Lu, “Modified-CS: Modifying compressive sensing for problems with partially known support,” IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4595–4607, Sep. 2010.
- [22] M. Brand, “Incremental singular value decomposition of uncertain data with missing values,” in European Conference on Computer Vision, 2002.
- [23] H. V. Luong, J. Seiler, A. Kaup, and S. Forchhammer, “Sparse signal reconstruction with multiple side information using adaptive weights for multiview sources,” in IEEE Int. Conf. on Image Process., Phoenix, Arizona, Sep. 2016.
- [24] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
- [25] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM J. on Optimization, vol. 20, no. 4, pp. 1956–1982, Mar. 2010.
- [26] R. Szeliski, Computer vision: algorithms and applications. Springer Science & Business Media, 2010.
- [27] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian, “Statistical modeling of complex backgrounds for foreground object detection,” IEEE Trans. Image Process., vol. 13, no. 11, pp. 1459–1472, 2004.
- [28] M. Dikmen, S. F. Tsai, and T. S. Huang, “Base selection in estimating sparse foreground in video,” in 16th IEEE International Conference on Image Processing, Nov 2009.