Full Reference Objective Quality Assessment
for Reconstructed Background Images
With an increased interest in applications that require a clean background image, such as video surveillance, object tracking, street view imaging and location-based services on web-based maps, multiple algorithms have been developed to reconstruct a background image from cluttered scenes. Traditionally, statistical measures and existing image quality techniques have been applied for evaluating the quality of the reconstructed background images. Though these quality assessment methods have been widely used in the past, their performance in evaluating the perceived quality of the reconstructed background image has not been verified. In this work, we discuss the shortcomings in existing metrics and propose a full reference Reconstructed Background image Quality Index (RBQI) that combines color and structural information at multiple scales using a probability summation model to predict the perceived quality in the reconstructed background image given a reference image. To compare the performance of the proposed quality index with existing image quality assessment measures, we construct two different datasets consisting of reconstructed background images and corresponding subjective scores. The quality assessment measures are evaluated by correlating their objective scores with human subjective ratings. The correlation results show that the proposed RBQI outperforms all the existing approaches. Additionally, the constructed datasets and the corresponding subjective scores provide a benchmark to evaluate the performance of future metrics that are developed to evaluate the perceived quality of reconstructed background images.
A clean background image has great significance in multiple applications. It can be used for video surveillance, activity recognition, object detection and tracking, street view imaging and location-based services on web-based maps [5, 6], and texturing 3D models obtained from multiple photographs or videos. However, acquiring a clean photograph of a scene is seldom possible; there are almost always unwanted objects occluding the background of interest. The technique of acquiring a clean background image by removing the occlusions using frames from a video or multiple views of a scene is known as background reconstruction or background initialization. Many algorithms have been proposed for initializing the background image from videos, for example, [8, 9, 10, 11, 12, 13, 14], and also from multiple images, such as [15, 16, 17].
Background initialization or reconstruction is crippled by multiple challenges. The pseudo-stationary background (e.g., waving trees, waves in water, etc.) poses additional challenges in separating the moving foreground objects from the relatively stationary background pixels. The illumination conditions can vary across the images thus changing the global characteristics of each image. The illumination changes cause local phenomena such as shadows, reflections and shading, which change the local characteristics of the background across the images or frames in a video. Finally, the removal of "foreground" objects from the scene creates holes in the background that need to be filled in with pixels that maintain the continuity of the background texture and structures in the recovered image. Thus the background reconstruction algorithms can be characterized by two main tasks: 1. foreground detection, in which the foreground is separated from the background by classifying pixels as foreground or background; 2. background recovery, in which the holes formed due to foreground removal are filled.
The performance of a background extraction algorithm depends on two factors: 1. its ability to detect the foreground objects in the scene and completely eliminate them; and 2. the perceived quality of the reconstructed background image. Traditional statistical techniques such as Peak Signal to Noise Ratio (PSNR), Average Gray-level Error (AGE), total number of Error Pixels (EPs), percentage of EPs (pEPs), number of Clustered Error Pixels (CEPs) and percentage of CEPs (pCEPs)  quantify, to a certain extent, the performance of an algorithm in its ability to remove foreground objects from a scene, but they do not give an indication of the perceived quality of the generated background image. On the other hand, the existing Image Quality Assessment (IQA) techniques such as the Multi-Scale Structural Similarity index (MS-SSIM)  and the Color image Quality Measure (CQM)  used by the authors in  to compare different background reconstruction algorithms are not designed to identify any residual foreground objects in the scene. The lack of a quality metric that can reliably assess the performance of background reconstruction algorithms by quantifying both aspects of a reconstructed background image motivated the development of the proposed Reconstructed Background Quality Index (RBQI). RBQI uses contrast, structure and color information to determine the presence of any residual foreground objects in the reconstructed background image as compared to the reference background image, and to detect any unnaturalness introduced by the reconstruction algorithm that affects the perceived quality of the reconstructed background image.
This paper also presents two datasets that are constructed to assess the performance of the proposed as well as popular existing objective quality assessment methods in predicting the perceived visual quality of the reconstructed background images. The datasets consist of reconstructed background images generated using different background reconstruction algorithms in the literature along with the corresponding subjective ratings. Some of the existing datasets such as video surveillance datasets (Wallflower , I2R ), background subtraction datasets (UCSD , CMU ) and object tracking evaluation dataset (“Performance Evaluation of Tracking and Surveillance (PETS)”) are not suited for this application as they do not provide reconstructed background images but just the foreground masks as ground-truth. The more recent database “Scene Background Modeling Net” (SBMNet)  is targeted at comparing the performance of the background initialization algorithms but it does not provide any subjective ratings for the reconstructed background images. Hence the SBMNet database  is not suited for benchmarking the performance of objective background visual quality assessment. The datasets proposed in this work are the first and currently the only datasets that can be used for benchmarking existing and future metrics developed to assess the quality of reconstructed background images.
The rest of the paper is organized as follows. In Section II we highlight the limitations of existing popular assessment methods. We introduce the new benchmarking datasets in Section III along with the details of the subjective tests. In Section IV, we propose a new index that makes use of a probability summation model to combine structure and color characteristics at multiple scales for quantifying the perceived quality of reconstructed background images. Performance evaluation results for the existing and proposed objective visual quality assessment methods are presented in Section V for reconstructed background images. Finally, we conclude the paper in Section VI and also provide directions for future research.
II Existing Full Reference Background Quality Assessment Techniques and Their Limitations
Existing background reconstruction quality metrics can be classified into two categories: statistical and image quality assessment (IQA) techniques, depending on the type of features used for measuring the similarity between the reconstructed background image and reference background image.
II-A Statistical Techniques
Statistical techniques use intensity values at co-located pixels in the reference and reconstructed background images to measure the similarity. Popular statistical techniques  that have been traditionally used for judging the performance of background initialization algorithms are briefly explained here.
(i) Average Gray-level Error (AGE): AGE is calculated as the mean absolute difference between the gray levels of the co-located pixels in the reference and reconstructed background images.
(ii) Error Pixels (EPs): EPs gives the total number of error pixels. A pixel is classified as an error pixel if the absolute difference between the corresponding pixels in the reference and reconstructed background images is greater than an empirically selected threshold τ.
(iii) Percentage Error Pixels (pEPs): the percentage of error pixels, calculated as pEPs = (EPs/N) × 100, where N is the total number of pixels in the image.
(iv) Clustered Error Pixels (CEPs): CEPs gives the total number of clustered error pixels. A clustered error pixel is defined as an error pixel whose 4-connected neighbors are also classified as error pixels.
(v) Percentage Clustered Error Pixels (pCEPs): the percentage of clustered error pixels, calculated as pCEPs = (CEPs/N) × 100, where N is the total number of pixels in the image.
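As a concrete illustration, the five statistical measures above can be sketched in a few lines of NumPy. The threshold value τ and the function name are illustrative assumptions, not part of the cited definitions:

```python
import numpy as np

def statistical_measures(ref, rec, tau=20):
    """Return AGE, EPs, pEPs, CEPs, pCEPs for a reference/reconstructed pair
    of equally sized grayscale images."""
    ref = ref.astype(np.float64)
    rec = rec.astype(np.float64)
    abs_err = np.abs(ref - rec)

    age = abs_err.mean()                      # (i) average gray-level error
    err = abs_err > tau                       # pixels exceeding the threshold
    eps = int(err.sum())                      # (ii) number of error pixels
    n = ref.size
    peps = 100.0 * eps / n                    # (iii) percentage of error pixels

    # (iv) clustered error pixels: error pixels whose four 4-connected
    # neighbours are also error pixels (border pixels cannot qualify).
    cep = np.zeros_like(err)
    cep[1:-1, 1:-1] = (err[1:-1, 1:-1] & err[:-2, 1:-1] & err[2:, 1:-1]
                       & err[1:-1, :-2] & err[1:-1, 2:])
    ceps = int(cep.sum())
    pceps = 100.0 * ceps / n                  # (v) percentage of CEPs
    return age, eps, peps, ceps, pceps
```

For example, a 3×3 patch of wrongly reconstructed pixels yields nine error pixels but only one clustered error pixel (the patch center), which is exactly the kind of isolated-versus-clustered distinction these measures are meant to capture.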
Though these techniques have been used to judge the quality of reconstructed background images, their performance has not been previously evaluated. As we show in Section V, and as noted by the authors in , the statistical techniques do not correlate well with the subjective quality scores.
II-B Image Quality Assessment
The existing Full Reference Image Quality Assessment (FR-IQA) techniques use perceptually inspired features for measuring the similarity between two images. Though these techniques have been shown to work reasonably well when assessing images affected by distortions such as blur, compression artifacts and noise, they have not been designed for assessing the quality of reconstructed background images. In , popular FR-IQA techniques including Peak Signal to Noise Ratio (PSNR), the Multi-Scale Structural Similarity index (MS-SSIM)  and the Color image Quality Measure (CQM)  were adopted for objectively comparing the performance of different background reconstruction algorithms; however, no performance evaluation was carried out to support the choice of these techniques. Other popular IQA techniques include the Structural Similarity index (SSIM) , visual signal-to-noise ratio (VSNR) , visual information fidelity (VIF) , pixel-based VIF (VIFP) , universal quality index (UQI) , image fidelity criterion (IFC) , noise quality measure (NQM) , weighted signal-to-noise ratio (WSNR) , feature similarity index (FSIM) , FSIM with color (FSIMc) , spectral residual based similarity (SR-SIM)  and saliency-based SSIM (SalSSIM) . The suitability of these techniques for evaluating the quality of reconstructed background images remains unexplored.
As the first contribution of this paper we present two benchmarking datasets that can be used for comparing the performance of different techniques in objectively assessing the perceived quality of the reconstructed background images. These datasets contain reconstructed background images along with their subjective ratings, details of which are discussed in Section III-A. When the statistical and IQA techniques were tested on these datasets, none of the techniques were found to correlate well with the subjective scores as discussed in Section V. This motivated our second contribution, the objective Reconstructed Background Quality Index (RBQI) that is shown to outperform all the existing techniques in assessing the perceived visual quality of reconstructed background images.
III Subjective Quality Assessment of Reconstructed Background Images
In this section we present two different datasets constructed as part of this work to serve as benchmarks for comparing existing and future techniques developed for assessing the quality of reconstructed background images. The images and subjective experiments for both datasets are described in the subsequent subsections.
Each dataset contains the original sequence of images or videos that are used as inputs to the different reconstruction algorithms, the background images reconstructed by the different algorithms and the corresponding subjective scores.
III-A1 Reconstructed Background Quality (ReBaQ) Dataset
This database consists of sequences of multiple images for eight different scenes. Every image sequence contains eight different views, such that the background is visible at every pixel in at least one of the views. A reference background image that is free of any foreground objects was also captured for every scene. Figure 1 shows the reference images corresponding to each of the eight scenes in this database.
Each of the image sequences is used as input to twelve different background reconstruction algorithms [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. The 144 background images generated by these algorithms, along with the corresponding reference images for each scene, are then used for the subjective evaluation. Each of the scenes poses a different challenge for the background reconstruction algorithms. For example, "Street" and "Wall" are outdoor sequences with textured backgrounds, while "Hall" is an indoor sequence with a textured background. The "WetFloor" sequence challenges the underlying principle of many background reconstruction algorithms, with water appearing as a low-contrast foreground object. The "Escalator" sequence has large motion in the background due to the moving escalator, while "Park" has smaller motion in the background due to waving trees. The "Illumination" sequence exhibits changing light sources, directions and intensities, while the "Building" sequence has changing reflections in the background. Broadly, the dataset contains two categories based on the scene characteristics: (i) Static, the scenes for which all the pixels in the background are stationary; and (ii) Dynamic, the scenes for which there are non-stationary background pixels (e.g., moving escalator, waving trees, varying reflections). Four out of the eight scenes in the ReBaQ dataset are categorized as Static and the remaining four are categorized as Dynamic scenes. The reference background images corresponding to the static scenes are shown in Figure 1(a). Although there are reflections on the floor in the "WetFloor" sequence, they did not vary at the time of recording, and hence the scene is categorized as a static background scene. The reference background images corresponding to the dynamic background scenes are shown in Figure 1(b).
III-A2 SBMNet-based Reconstructed Background Quality (S-ReBaQ) Dataset
This dataset is created from the videos in the Scene Background Modeling Net (SBMNet) dataset  used for the Scene Background Modeling Challenge (SBMC) 2016 . SBMNet consists of image sequences corresponding to a total of 79 scenes. These image sequences are representative of typical indoor and outdoor visual data captured in surveillance, smart environment, and video database scenarios. The spatial resolutions of the sequences corresponding to different scenes vary from 240×240 to 800×600. The length of the sequences also varies, from 6 to 9,370 images. The authors of SBMNet categorize these scenes into eight different classes based on the challenges posed : (a) the Basic category represents a mixture of mild challenges typical of the Shadows, Dynamic Background, Camera Jitter and Intermittent Object Motion categories; (b) the Background Motion category includes scenes with strong (parasitic) background motion; for example, in the "Advertisement Board" sequence the advertisement board in the scene periodically changes; (c) the Intermittent Motion category includes sequences with scenarios known for causing "ghosting" artifacts in the detected motion; (d) the Jitter category contains indoor and outdoor sequences captured by unstable cameras; (e) the Clutter category includes sequences containing a large number of moving foreground objects occluding a large portion of the background; (f) the Illumination Changes category contains indoor sequences with strong and mild illumination changes; (g) the Very Long category contains sequences each with more than 3,500 images; (h) the Very Short category contains sequences with a limited number of images (less than 20). The authors of SBMNet  provide reference background images for only 13 of the 79 scenes, with at least one scene per category having a reference background image available. We use only these 13 scenes for which the reference background images are provided.
Figure 2 shows the reference background images corresponding to the scenes in this database, with the categories from SBMNet  in brackets. Background images reconstructed by 14 algorithms submitted to SBMC [16, 12, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48] for the selected 13 scenes were used in this work for conducting the subjective tests. As a result, a total of 182 (13 scenes × 14 algorithms) reconstructed background images along with their corresponding subjective scores form the S-ReBaQ dataset.
III-B Subjective Evaluation
The subjective ratings are obtained by asking the human subjects to rate the similarity of the reconstructed background images to the reference background images. The subjects had to score the images based on three aspects: 1) overall perceived visual image quality; 2) visibility or presence of foreground objects, and 3) perceived background reconstruction quality. The subjects had to score the image quality on a 5-point scale, with 1 being assigned to the lowest rating of "Bad" and 5 assigned to the highest rating of "Excellent". The second aspect was determining the presence of foreground objects. For our application, we defined a foreground object as any object that is not present in the reference image. The foreground visibility was scored on a 5-point scale marked as: "1-All foreground visible", "2-Mostly visible", "3-Partly visible but annoying", "4-Partly visible but not annoying" and "5-None visible". The background reconstruction quality was also measured using a 5-point scale similar to that of the image quality, but the choices were limited based on how the first two aspects of an image were scored. For example, as illustrated in Figure 4, if the image quality was rated as excellent but the foreground object visibility was rated 1 (all visible), the reconstructed background quality cannot be scored very high. The background reconstruction quality scores, referred to as raw scores in the rest of the paper, are used for calculating the Mean Opinion Score (MOS).
We adopted a double-stimulus technique in which the reference and the reconstructed background images were presented side-by-side  to each subject, as shown in Figures 3 and 4. Though the same testing strategy and setup were used for the ReBaQ and S-ReBaQ datasets described in Section III-A, the tests for each dataset were conducted in separate sessions.
As discussed in , the subjective experiments were carried out on a 23-inch Alienware monitor with a resolution of 1920×1080. Before the experiment, the monitor was reset to its factory settings. The setup was placed in a laboratory under normal office illumination conditions. Subjects were asked to sit at a viewing distance of 2.5 times the monitor height.
Seventeen subjects participated in the subjective test for the ReBaQ dataset, while sixteen subjects participated in the subjective test for the S-ReBaQ dataset. The subjects were tested for vision and color blindness using the Snellen chart  and the Ishihara color vision test , respectively. A training session was conducted before the actual subjective testing, in which the subjects were shown a few images covering different quality levels and distortions of the reconstructed background images, and their responses were noted to confirm their understanding of the tests.
Since the number of participating subjects was less than 20 for each of the datasets, the raw scores obtained by subjective evaluation were screened using the procedure in ITU-R BT.500-13 . The kurtosis of the scores is determined as the ratio of the fourth-order moment and the square of the second-order moment. If the kurtosis lies between 2 and 4, the distribution of the scores can be assumed to be normal. If more than 5% of the scores given by a particular subject lie outside the range of 2 standard deviations from the mean scores in the case of normally distributed scores, that subject is rejected. For scores that are not normally distributed, the range is determined as √20 times the standard deviation. In our study, two subjects were found to be outliers and the corresponding scores were rejected for the ReBaQ dataset, while no subject was rejected for the S-ReBaQ dataset. MOS scores were calculated as the average of the raw scores retained after outlier removal. The raw scores and MOS scores with the standard deviations are provided along with the database.
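The screening procedure above can be sketched as follows. The (subjects × images) array layout and the 5% rejection threshold follow the text; the function name and the per-image application of the kurtosis test are our own reading of the BT.500-style procedure:

```python
import numpy as np

def screen_subjects(scores, frac=0.05):
    """Return indices of subjects retained after BT.500-style screening.

    scores: array of shape (num_subjects, num_images) with raw ratings.
    """
    mean = scores.mean(axis=0)                      # per-image mean score
    std = scores.std(axis=0, ddof=1)                # per-image std deviation
    # Kurtosis = m4 / m2^2, computed per image across subjects.
    m2 = ((scores - mean) ** 2).mean(axis=0)
    m4 = ((scores - mean) ** 4).mean(axis=0)
    kurt = m4 / np.maximum(m2 ** 2, 1e-12)
    # +/- 2 sigma when approximately normal (2 <= kurtosis <= 4),
    # +/- sqrt(20) sigma otherwise.
    k = np.where((kurt >= 2) & (kurt <= 4), 2.0, np.sqrt(20.0))
    lo, hi = mean - k * std, mean + k * std
    # Fraction of each subject's ratings falling outside the range.
    outlier_frac = ((scores < lo) | (scores > hi)).mean(axis=1)
    return np.where(outlier_frac <= frac)[0]
```

The MOS for each image is then simply the mean of the retained subjects' raw scores, e.g. `scores[kept].mean(axis=0)`.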
Figure 5 shows an input sequence for a scene in the ReBaQ dataset together with reconstructed background images using different algorithms and corresponding MOS scores.
IV Proposed Reconstructed Background Quality Index
In this section we propose a full-reference quality index that can automatically assess the perceived quality of the reconstructed background images. The proposed Reconstructed Background Quality Index (RBQI) uses a probability summation model to combine visual characteristics at multiple scales and quantify the deterioration in the perceived quality of the reconstructed background image due to the presence of any residual foreground objects or unnaturalness that may be introduced by the background reconstruction algorithm. The motivation for RBQI comes from the fact that the quality of a reconstructed background image depends on two factors namely: (i) the visibility of the foreground objects, and (ii) the visible artifacts introduced while reconstructing the background image.
A block diagram of the proposed quality index (RBQI) is shown in Figure 6. An L-level multi-scale decomposition of the reference and reconstructed background images is obtained through lowpass filtering using an averaging filter  and downsampling, where l = 1 corresponds to the finest scale and l = L corresponds to the coarsest scale. For each level l, contrast, structure and color differences are computed locally at each pixel to produce a contrast-structure difference map and a color difference map. The difference maps are combined in local regions within each scale and later across scales using a 'probability summation model' to predict the perceived quality of the reconstructed background image. More details about the computation of the difference maps and the proposed RBQI based on a probability summation model are provided below.
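As a rough sketch of the decomposition step, assuming a 2×2 block-averaging filter followed by downsampling by 2 (the text specifies only "an averaging filter and downsampling", so the kernel size is an assumption):

```python
import numpy as np

def pyramid(img, levels=3):
    """Return [level 1 (finest), ..., level L (coarsest)] of a grayscale image."""
    out = [img.astype(np.float64)]
    for _ in range(levels - 1):
        a = out[-1]
        # Crop to even dimensions so 2x2 blocks tile exactly.
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        a = a[:h, :w]
        # Average non-overlapping 2x2 blocks: lowpass + downsample in one step.
        a = (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0
        out.append(a)
    return out
```

Both the reference and the reconstructed image would be passed through the same decomposition so that the difference maps at each level compare images of identical size.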
IV-A Structure Difference Map
An image can be decomposed into three different components: luminance, contrast and structure . By comparing these components, similarity between two images can be calculated [28, 19]. A reconstructed background image is formed by mosaicing together parts of different input images, hence, preservation of the local luminance from the reference background image is of low relevance as long as the structure continuity is maintained. Any sudden variation in the local luminance across the reconstructed background image manifests itself as contrast or structure deviation from the reference image. Thus, in our application we consider only contrast and structure for comparing the reference and reconstructed background images while leaving out the luminance component. These contrast and structure differences between the reference and the reconstructed background images, calculated at each pixel, give us the ‘contrast-structure difference map’ referred to as ‘structure map’ for short in the rest of the paper.
First, the contrast-structure similarity between the reference and the reconstructed background images, referred to as the Structure Index (SI), is calculated as :

SI(i, j) = (2·σ_RD(i, j) + C) / (σ_R²(i, j) + σ_D²(i, j) + C)        (1)

where R is the reference background image, D is the reconstructed background image, σ_R and σ_D are the local standard deviations of the reference and reconstructed background images, respectively, and σ_RD is the cross-correlation between the reference and reconstructed background images at location (i, j). C is a small constant included to avoid instability and is calculated as C = (KL)², where K is a small constant and L is the maximum possible value of the pixel intensity (255 in this case) . A higher SI value indicates higher similarity between the pixels in the reference and reconstructed background images.
Background scenes often contain pseudo-stationary objects, such as waving trees and escalators, as well as local and global illumination changes. Even though these pseudo-stationary pixels belong to the background, the presence of motion makes them likely to be classified as foreground pixels. For this reason, pseudo-stationary backgrounds pose an additional challenge for quality assessment algorithms. Since just comparing co-located pixel neighborhoods in the two images is not sufficient in the presence of such dynamic backgrounds, our algorithm uses a search window of size W×W centered at the current pixel in the reconstructed image, where W is an odd value. The SI is calculated between the pixel at location (i, j) in the reference image and the pixels within the search window centered at pixel (i, j) in the reconstructed image. The resulting matrix of SI values is of size W×W. Equation (1), modified to calculate SI for every pixel location (p, q) in the window centered at (i, j), is given as:

SI(i, j, p, q) = (2·σ_RD((i, j), (p, q)) + C) / (σ_R²(i, j) + σ_D²(p, q) + C)        (2)
The maximum value of this matrix is taken to be the final SI value for the pixel at location (i, j), as given below:

SI(i, j) = max over (p, q) in the W×W window of SI(i, j, p, q)        (3)
The SI map takes on values in [-1, 1].
In the proposed method, the SI map is computed at different scales, denoted as SI_l. The maps generated at three different scales for the background image shown in Figure 5 and reconstructed using the method in  are shown in Figure 7. The darker regions in these images indicate larger structure differences between the reference and the reconstructed background images, while the lighter regions indicate higher similarities.
The structure difference map SDM_l is calculated from the SI_l map at each scale as follows:

SDM_l(i, j) = (1 − SI_l(i, j)) / 2        (4)
SDM_l takes on values in [0, 1], where 0 corresponds to no difference and 1 corresponds to the largest difference.
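The search-window computation and the mapping to a structure difference map can be sketched as below. The SSIM-style contrast-structure term, the patch size, the window size and the constant K = 0.03 are illustrative assumptions:

```python
import numpy as np

def cs_term(p, q, C=(0.03 * 255) ** 2):
    """SSIM-style contrast-structure term between two equally sized patches."""
    sp, sq = p.std(), q.std()
    cov = ((p - p.mean()) * (q - q.mean())).mean()
    return (2 * cov + C) / (sp ** 2 + sq ** 2 + C)

def structure_difference_map(ref, rec, n=3, w=5):
    """SDM in [0, 1]: 0 = no difference. n: patch size, w: search window size."""
    H, W = ref.shape
    r, s = n // 2, w // 2
    sdm = np.zeros((H, W))
    for i in range(r, H - r):
        for j in range(r, W - r):
            patch_ref = ref[i - r:i + r + 1, j - r:j + r + 1]
            best = -1.0
            # Search the w x w neighbourhood in the reconstructed image
            # for the best-matching patch (handles dynamic backgrounds).
            for p in range(max(r, i - s), min(H - r, i + s + 1)):
                for q in range(max(r, j - s), min(W - r, j + s + 1)):
                    patch_rec = rec[p - r:p + r + 1, q - r:q + r + 1]
                    best = max(best, cs_term(patch_ref, patch_rec))
            sdm[i, j] = (1.0 - best) / 2.0   # map SI in [-1, 1] to [0, 1]
    return sdm
```

When the reconstructed image equals the reference, the best match at every interior pixel is the co-located patch with SI = 1, so the SDM is identically zero.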
IV-B Color Distance
The SI map is vulnerable to failures when detecting differences in areas of background images with no texture or structural information and/or with objects of the same luminance but different color. Hence, we incorporate color information at every scale while calculating the RBQI. The reference and the reconstructed images are converted to the CIELAB color space and filtered using a lowpass Gaussian filter. The color difference between the filtered reference and reconstructed background images at each scale l is then calculated as the Euclidean distance between the Lab values of co-located pixels as follows:

ΔE_l(i, j) = √[ (L_R(i, j) − L_D(i, j))² + (a_R(i, j) − a_D(i, j))² + (b_R(i, j) − b_D(i, j))² ]        (5)
In (5), the scale index l was dropped from the notation of the color space components for convenience.
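A minimal sketch of the per-pixel color distance, assuming the inputs are H×W×3 arrays already converted to CIELAB and lowpass filtered:

```python
import numpy as np

def color_difference_map(lab_ref, lab_rec):
    """Per-pixel Euclidean distance between co-located CIELAB triplets."""
    return np.sqrt(((lab_ref - lab_rec) ** 2).sum(axis=2))
```

The RGB-to-Lab conversion and the Gaussian prefilter are left out here; in practice they would come from an image-processing library applied before this step.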
IV-C Computation of the Reconstructed Background Quality Index (RBQI) Based on Probability Summation
As indicated previously, the reference and reconstructed background images are each decomposed into a multi-scale pyramid with L levels. Structure difference maps SDM_l and color difference maps ΔE_l are computed at every level l as described in Equations (4) and (5), respectively. These difference maps are pooled together within each scale and then across all scales using a probability summation model  to give the final RBQI.
The probability summation model as described in  considers an ensemble of independent difference detectors at every pixel location in the image. These detectors predict the probability of perceiving the difference between the reference and the reconstructed background images at the corresponding pixel location based on its neighborhood characteristics in the reference image. Using this model, the probability of the structure difference detector signaling the presence of a structure difference at pixel location (i, j) at level l can be modeled as an exponential of the form:

P_s^l(i, j) = 1 − exp( −(SDM_l(i, j) / α_s(i, j))^β_s )        (6)
where β_s is a parameter chosen to increase the correspondence of RBQI with experimentally determined MOS scores on a training dataset, and α_s(i, j) is a parameter whose value depends upon the texture characteristics of the neighborhood centered at (i, j) in the reference image. The value of α_s is chosen to take into account that differences in structure are less perceptible in textured areas than in non-textured areas.
In order to determine the value of α_s, every pixel in the reference image is classified as textured or non-textured using the technique in . This method first calculates the local variance at each pixel using a 3×3 window centered around it. Based on the computed variances, a pixel is classified as edge, texture or uniform. By considering the number of edge, texture and uniform pixels in the 8×8 neighborhood of the pixel, it is further classified into one of six types: uniform, uniform/texture, texture, edge/texture, medium edge and strong edge. For our application, we label the pixels classified as 'texture' and 'edge/texture' as 'textured' pixels and label the rest as 'non-textured' pixels.
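The texture labelling step can be sketched as follows. The variance thresholds and the majority-vote rule are illustrative stand-ins for the exact classification rules of the cited method:

```python
import numpy as np

def local_variance(img):
    """Variance in a 3x3 window centered at each interior pixel."""
    img = img.astype(np.float64)
    H, W = img.shape
    var = np.zeros((H, W))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            var[i, j] = img[i - 1:i + 2, j - 1:j + 2].var()
    return var

def texture_mask(img, t_low=25.0, t_high=500.0):
    """True where a pixel is labelled 'textured'. Thresholds are assumptions."""
    v = local_variance(img)
    # 0 = uniform (low variance), 1 = texture (moderate), 2 = edge (high).
    kind = np.digitize(v, [t_low, t_high])
    H, W = img.shape
    mask = np.zeros((H, W), dtype=bool)
    for i in range(H):
        for j in range(W):
            block = kind[max(0, i - 4):i + 4, max(0, j - 4):j + 4]
            # Promote to 'textured' when texture pixels dominate the
            # 8x8 neighbourhood (a simplified vote, not the cited 6-way rule).
            mask[i, j] = (block == 1).mean() > 0.5
    return mask
```

A uniform image produces an all-False mask, while a fine checkerboard pattern is flagged as textured away from the borders, which is the qualitative behavior the α_s selection relies on.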
Let T(i, j) be the flag indicating that a pixel is textured. The value of α_s(i, j) can then be expressed as a function of T(i, j), taking one of two constant values depending on whether the pixel is textured or non-textured.
In our implementation, we chose the value of α_s for textured pixels such that the resulting detection probability P_s^l is close to zero when a pixel is classified as textured.
Similarly, the probability of the color difference detector signaling the presence of a color difference at pixel location (i, j) at level l can be modeled as:

P_c^l(i, j) = 1 − exp( −(ΔE_l(i, j) / α_c(i, j))^β_c )        (8)
where α_c(i, j) is found in a similar way to α_s(i, j) and corresponds to the Adaptive Just Noticeable Color Difference (AJNCD) calculated at every pixel in the CIELAB color space as given in :

AJNCD(i, j) = JNCD_Lab · S_C(i, j) · S_L(E(i, j), G(i, j))        (9)
where JNCD_Lab is set to 2.3 , E(i, j) is the mean background luminance of the pixel at (i, j), and G(i, j) is the maximum luminance gradient across pixel (i, j). In Equation (9), S_C(i, j) is the scaling factor used for adjusting the dimension of the ellipsoid along the chroma axis and is given by :

S_C(i, j) = 1 + 0.045·√( a(i, j)² + b(i, j)² )        (10)
where a(i, j) and b(i, j) correspond to the a and b color values of the pixel located at (i, j) in the Lab color space, respectively. S_L(E(i, j), G(i, j)) is the scaling factor that simulates local luminance texture masking; it is computed from the mean background luminance E(i, j), the maximum luminance gradient G(i, j), and a weighting factor w as described in . Thus, AJNCD varies at every pixel location based on the distance between the chroma values and the texture masking properties of its neighborhood.
A pixel at the l-th level is said to have no distortion if and only if neither the structure difference detector nor the color difference detector at location (i, j) signals the presence of any differences. Thus, the probability of detecting no difference between the reference and reconstructed background images at pixel (i, j) and level l can be written as:

P_nd^l(i, j) = (1 − P_s^l(i, j)) · (1 − P_c^l(i, j))        (12)
A less localized probability of difference detection can be computed by adopting the probability summation hypothesis, which pools the localized detection probabilities over a region R. The probability summation hypothesis is based on the following two assumptions: 1) no difference is detected if none of the detectors in the region sense the presence of distortion, and 2) the probabilities of detection at all locations in the region are independent. Then the probability of no difference detection over the region R is given by:

P_nd^l(R) = ∏ over (i, j) ∈ R of P_nd^l(i, j)        (13)
Substituting Equation (12) in the above equation gives:

P_nd^l(R) = exp( −Σ over (i, j) ∈ R of [ (SDM_l(i, j)/α_s(i, j))^β_s + (ΔE_l(i, j)/α_c(i, j))^β_c ] )        (14)
In the human visual system, the highest visual acuity is limited to the size of the foveal region, which covers approximately 2° of visual angle. In our work, we consider the image regions R to be foveal regions approximated by non-overlapping image blocks.
The probability of no distortion detection over the l-th level is obtained by pooling the no-detection probabilities over all regions R and is given by:

P_nd^l = ∏ over R of P_nd^l(R)
Thus, the final probability of detecting no distortion in a reconstructed background image is obtained by pooling the no-detection probabilities over all scales l, 1 ≤ l ≤ L, as follows:

P_nd = ∏ over l = 1, …, L of P_nd^l
From Equation (24), it can be seen that the pooled structure and color difference terms take the form of a Minkowski metric with exponents β_s and β_c, respectively.
By substituting the preceding expressions in Equation (23) and simplifying, we get:

P_nd = exp( −Σ over l Σ over R Σ over (i, j) ∈ R of [ (SDM_l(i, j)/α_s(i, j))^β_s + (ΔE_l(i, j)/α_c(i, j))^β_c ] )
Thus, the probability of detecting a difference between the reference image and a reconstructed background image is given by:
As can be seen from Equation (28), a lower value of the pooled difference term results in a lower probability of difference detection, while a higher value results in a higher probability of difference detection. This probability can therefore be used to assess the perceived quality of the reconstructed background image, with a lower value corresponding to a higher perceived quality.
The final Reconstructed Background Quality Index (RBQI) for a reconstructed background image is calculated using the logarithm of this detection probability as follows:
As the detection probability increases, the value of RBQI increases, implying more perceived distortion and thus lower quality of the reconstructed background image. The logarithmic mapping models the saturation effect: beyond a certain point the maximum annoyance level is reached and additional distortion does not further affect the quality.
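The monotonicity and saturation behavior described above can be illustrated with a toy mapping. This is an assumption-laden sketch: the form `log10(1 + k*p)` and the constant `k` below are hypothetical stand-ins, not the paper's published mapping.

```python
import math

# Illustrative sketch only: a logarithmic mapping from the detection
# probability to a quality index. The functional form and the constant k
# are assumptions, chosen to exhibit the monotone, saturating behavior
# described in the text.

def rbqi_sketch(p_detect, k=100.0):
    """Hypothetical logarithmic mapping: monotone in p_detect, with
    diminishing increments as p_detect grows (annoyance saturation)."""
    return math.log10(1.0 + k * p_detect)

scores = [rbqi_sketch(p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

The scores increase with the detection probability, but each additional increment of probability adds less to the index, mimicking the saturation of perceived annoyance.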
In this section we analyze the performance of RBQI in terms of its ability to predict the subjective ratings for the perceived quality of reconstructed background images. We evaluate the performance of the proposed quality index in terms of its prediction accuracy, prediction monotonicity and prediction consistency, and provide comparisons with the existing statistical and IQA techniques. In our implementation, we set the model parameters to the values selected in Section V-B. We also evaluate the performance of RBQI for different scales and neighborhood search windows. We conduct a series of hypothesis tests based on the prediction residuals (errors in predictions) after nonlinear regression. These tests help in making statistically meaningful conclusions on the index's performance.
We use the two databases ReBaQ and S-ReBaQ described in Section III-A to quantify and compare the performance of RBQI. For performance evaluation, we employ the three most commonly used metrics: (i) Spearman rank-order correlation coefficient; (ii) Pearson correlation coefficient; and (iii) root mean squared error (RMSE). A 4-parameter regression function  is applied to the IQA metrics to provide a non-linear mapping between the objective scores and the subjective mean opinion scores (MOS):
where $Q_i$ denotes the predicted quality for the $i$-th image, $\hat{Q}_i$ denotes the quality score after fitting, and $\beta_1, \beta_2, \beta_3, \beta_4$ are the regression model parameters.
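A common instance of such a mapping is the 4-parameter logistic used in VQEG-style evaluations; the form below is that commonly used function, while the parameter values in the example are illustrative only (in practice they are obtained by fitting to the subjective MOS).

```python
import math

# The commonly used 4-parameter logistic mapping (e.g., in VQEG-style
# evaluations). The parameter values below are illustrative; in practice
# b1..b4 are fitted to the subjective MOS.

def logistic4(q, b1, b2, b3, b4):
    """Monotonic nonlinear mapping from objective score q to predicted MOS."""
    return (b1 - b2) / (1.0 + math.exp(-(q - b3) / b4)) + b2

# Example: map raw scores in [0, 10] to MOS-like values in roughly [1, 5].
mapped = [logistic4(q, 5.0, 1.0, 5.0, 1.5) for q in (0.0, 5.0, 10.0)]
```

Because the mapping is monotonic, it changes the Pearson correlation and RMSE of a metric without affecting its Spearman rank-order correlation.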
V-A Performance Comparison
Image Quality Assessment Metrics
Tables I and II show the performance evaluation results of the proposed RBQI technique on the ReBaQ and S-ReBaQ datasets, respectively, as compared to the existing statistical and FR-IQA algorithms. The results show that the proposed quality index yields higher correlation with the subjective scores than any other existing technique. The statistical techniques are shown to correlate poorly with the subjective scores on both datasets. Among the FR-IQA algorithms, the performance of NQM  comes close to that of the proposed technique for the scenes with static backgrounds in the ReBaQ dataset, as it considers the effects of contrast sensitivity, luminance variations, contrast interaction between spatial frequencies and contrast masking while weighting the SNR between the ground truth and the reconstructed image. The more popular MS-SSIM  technique is shown to correlate poorly with the subjective scores for the ReBaQ database. This is because MS-SSIM calculates the final quality index by simply averaging over the entire image; in background reconstruction, the error may occupy a relatively small area compared to the image size, thereby under-penalizing the residual foreground. RBQI provides a higher correlation with the subjective scores by a margin of 6% over MS-SSIM on S-ReBaQ. None of the FR-IQA or statistical techniques were found to correlate with the scores for the dynamic-background scenes in the ReBaQ dataset. This is because the assumption of pixel-to-pixel correspondence is no longer valid in the presence of a pseudo-stationary background. The proposed RBQI technique uses a neighborhood window to handle such backgrounds, thereby improving the performance over NQM  by a margin of 10% and over MS-SSIM  by 30%.
CQM , which was used in the Scene Background Modeling Challenge 2016 (SBMC)  and  to compare the performance of the algorithms, is shown to perform very poorly on both datasets; hence it is not a good choice for evaluating the quality of reconstructed background images, nor for comparing the performance of background reconstruction algorithms.
The P-value is the probability of obtaining a correlation as large as the observed value by random chance when the variables are independent; if the P-value is less than 0.05, the correlation is considered significant. The P-values reported in Tables I and II indicate that most of the correlation scores are statistically significant.
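The three evaluation criteria used in this comparison (SROCC, PCC, RMSE) can be sketched in pure Python. This is an illustrative implementation assuming no tied scores; the example data are hypothetical, not values from the paper's tables.

```python
import math

# Sketch of the three evaluation criteria: Spearman rank-order correlation,
# Pearson correlation, and RMSE. Assumes no tied values in the rank step.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    # Rank transform (no ties assumed in this sketch), then Pearson.
    rank = lambda v: [sorted(v).index(e) + 1 for e in v]
    return pearson(rank(x), rank(y))

def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

obj = [1.0, 2.0, 3.0, 4.0]       # objective scores (illustrative)
mos = [1.5, 2.5, 3.1, 4.2]       # subjective MOS (illustrative)
```

For perfectly monotone data such as this example, SROCC is exactly 1 even when PCC is slightly below 1, which is why both criteria are reported.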
V-B Model Parameter Selection
The proposed quality index accepts four parameters: 1) the dimensions of the neighborhood search window centered around the current pixel; 2) the number of multi-scale levels $L$; 3) $\beta_s$, used in the calculation of the structure difference in Equation (6); and 4) $\beta_c$, used in the calculation of the color difference in Equation (8). In Table III, we evaluate our algorithm with different values for these parameters. These simulations were run only on the ReBaQ dataset. Table III(a) shows the effect of varying the neighborhood search window size on the performance of RBQI. As expected, the performance on the static-background scenes did not change significantly with an increase in the window size, but the performance on the dynamic-background scenes increased drastically with the window size before starting to drop; we therefore chose the intermediate window size for all our experiments. Table III(b) gives the performance results for different numbers of scales. As a tradeoff between computational complexity and prediction accuracy, we chose the number of scales accordingly. The probability summation model parameters $\beta_s$ and $\beta_c$ were chosen to maximize the correlation between RBQI and the MOS on a training set of randomly selected images from the ReBaQ dataset; the selected values were found to correlate well with the subjective scores.
These parameters remained unchanged for the experiments conducted on the S-ReBaQ dataset to obtain the values in Table II.
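The training procedure for the probability summation exponents can be sketched as a simple grid search. Everything below is a hypothetical stand-in: the scorer, the candidate grid, the agreement measure (RMSE rather than correlation, for a tiny deterministic example) and the toy data are not the paper's actual training setup.

```python
import math

# Sketch: choosing a probability-summation exponent beta on a training set
# by maximizing agreement (here, minimizing RMSE) with subjective scores.
# The scorer, grid, and data are hypothetical stand-ins.

def score_with_beta(diffs_per_image, beta):
    """Stand-in objective score: Minkowski pooling of per-image differences."""
    return [sum(abs(d) ** beta for d in diffs) for diffs in diffs_per_image]

def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def select_beta(diffs_per_image, mos, candidates):
    """Return the candidate exponent whose pooled scores best match MOS."""
    return min(candidates,
               key=lambda b: rmse(score_with_beta(diffs_per_image, b), mos))

# Toy training set constructed so that beta = 2.0 reproduces the MOS exactly.
train_diffs = [[1.0], [2.0], [3.0]]
train_mos = [1.0, 4.0, 9.0]
best = select_beta(train_diffs, train_mos, [1.0, 2.0, 3.0])
```

In practice the objective would be the correlation with MOS over the training images rather than a direct RMSE fit, but the selection loop has the same shape.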
In this paper we addressed the problem of quality evaluation of reconstructed background images. We first proposed two different datasets for benchmarking the performance of existing and future techniques proposed to evaluate the quality of reconstructed background images. Then we proposed the first full-reference Reconstructed Background Quality Index (RBQI) to objectively measure the perceived quality of the reconstructed background images.
The RBQI uses the probability summation model to combine visual characteristics at multiple scales to quantify the deterioration in the perceived quality of the reconstructed background image due to the presence of any foreground objects or unnaturalness that may be introduced by the background reconstruction algorithm. The use of a neighborhood search window while calculating the contrast and structure differences provides a further boost in performance in the presence of pseudo-stationary backgrounds, while not affecting the performance on scenes with static backgrounds. The probability summation model penalizes only the perceived differences between the reference and reconstructed background images, while unperceived differences do not affect the RBQI, thereby giving better correlation with the subjective scores. Experimental results on the benchmarking datasets showed that the proposed measure outperformed all the existing statistical and IQA techniques in estimating the perceived quality of reconstructed background images.
-  R. M. Colque and G. Cámara-Chávez, “Progressive background image generation of surveillance traffic videos based on a temporal histogram ruled by a reward/penalty function,” in Conference on Graphics, Patterns and Images, Aug 2011, pp. 297–304.
-  C. Stauffer and W. E. L. Grimson, “Learning patterns of activity using real-time tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747–757, Aug 2000.
-  L. Li, W. Huang, I. Y. H. Gu, and Q. Tian, “Statistical modeling of complex backgrounds for foreground object detection,” IEEE Transactions on Image Processing, vol. 13, no. 11, pp. 1459–1472, Nov 2004.
-  F. Fleuret, J. Berclaz, R. Lengagne, and P. Fua, “Multicamera people tracking with a probabilistic occupancy map,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 267–282, Feb 2008.
-  A. Flores and S. Belongie, “Removing pedestrians from google street view images,” in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2010, pp. 53–58.
-  W. D. Jones, “Microsoft and google vie for virtual world domination,” IEEE Spectrum, vol. 43, no. 7, pp. 16–18, 2006.
-  E. Zheng, Q. Chen, X. Yang, and Y. Liu, “Robust 3d modeling from silhouette cues,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, April 2009, pp. 1265–1268.
-  L. Maddalena and A. Petrosino, “A self-organizing approach to background subtraction for visual surveillance applications,” IEEE Transactions on Image Processing, vol. 17, no. 7, pp. 1168–1177, July 2008.
-  S. Varadarajan, L. Karam, and D. Florencio, “Background subtraction using spatio-temporal continuities,” in Proc. European Workshop on Visual Information Processing, July 2010, pp. 144–148.
-  D. Farin, P. de With, and W. Effelsberg, “Robust background estimation for complex video sequences,” in Proc. IEEE International Conference on Image Processing, vol. 1, Sept 2003, pp. 145–148.
-  H.-H. Hsiao and J.-J. Leou, “Background initialization and foreground segmentation for bootstrapping video sequences,” EURASIP Journal on Image and Video Processing, vol. 2013, p. 12.
-  V. Reddy, C. Sanderson, and B. Lovell, “A low-complexity algorithm for static background estimation from cluttered image sequences in surveillance contexts,” EURASIP Journal on Image and Video Processing, pp. 1:1–1:14, Oct 2010.
-  J. Yao and J. Odobez, “Multi-layer background subtraction based on color and texture,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2007, pp. 1–8.
-  A. Colombari and A. Fusiello, “Patch-based background initialization in heavily cluttered video,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 926–933, April 2010.
-  C. Herley, “Automatic occlusion removal from minimum number of images,” in Proc. IEEE International Conference on Image Processing, vol. 2, Sept 2005, pp. II-1046–II-1049.
-  A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, “Interactive digital photomontage,” ACM Transactions on Graphics, vol. 23, no. 3, pp. 294–302, Aug 2004.
-  A. Shrotre and L. Karam, “Background recovery from multiple images,” in Proc. IEEE Digital Signal Processing and Signal Processing Education Meeting, Aug 2013, pp. 135–140.
-  L. Maddalena and A. Petrosino, “Towards benchmarking scene background initialization,” in Proc. International Conference on Image Analysis and Processing, Sept 2015, pp. 469–476.
-  Z. Wang, E. Simoncelli, and A. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov 2003, pp. 1398–1402.
-  Y. Yalman and İ. Ertürk, “A new color image quality measure based on YUV transformation and PSNR for human vision system,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 21, no. 2, pp. 603–612, 2013.
-  T. Bouwmans, L. Maddalena, and A. Petrosino, “Scene background initialization: A taxonomy,” Pattern Recognition Letters, vol. 96, pp. 3–11, Sept 2017.
-  K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, “Wallflower: principles and practice of background maintenance,” in Proc. IEEE International Conference on Computer Vision, vol. 1, 1999, pp. 255–261.
-  L. Li, W. Huang, I.-H. Gu, and Q. Tian, “Statistical modeling of complex backgrounds for foreground object detection,” IEEE Transactions on Image Processing, vol. 13, no. 11, pp. 1459–1472, Nov 2004.
-  V. Mahadevan and N. Vasconcelos, “Spatiotemporal saliency in dynamic scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 171–177, Jan 2010.
-  Y. Sheikh and M. Shah, “Bayesian modeling of dynamic scenes for object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 11, pp. 1778–1792, Nov 2005.
-  P. Jodoin, L. Maddalena, and A. Petrosino. (2016) Scene Background Modeling dataset. [Online]. Available: www.SceneBackgroundModeling.net
-  A. Shrotre and L. Karam, “Visual quality assessment of reconstructed background images,” in Proc. International Conference on Quality of Multimedia Experience, June 2016, pp. 1–6.
-  Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.
-  D. Chandler and S. Hemami, “VSNR: A wavelet-based visual signal-to-noise ratio for natural images,” IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2284–2298, Sept 2007.
-  H. Sheikh and A. Bovik, “Image information and visual quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430–444, Feb 2006.
-  Z. Wang and A. Bovik, “A universal image quality index,” IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81–84, March 2002.
-  H. Sheikh, A. Bovik, and G. de Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117–2128, Dec 2005.
-  N. Damera-Venkata, T. Kite, W. Geisler, B. Evans, and A. Bovik, “Image quality assessment based on a degradation model,” IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 636–650, April 2000.
-  T. Mitsa and K. Varkur, “Evaluation of contrast sensitivity functions for the formulation of quality measures incorporated in halftoning algorithms,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, April 1993, pp. 301–304.
-  L. Zhang, D. Zhang, X. Mo, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, Aug 2011.
-  L. Zhang and H. Li, “SR-SIM: A fast and high performance iqa index based on spectral residual,” in Proc. IEEE International Conference on Image Processing, Sept 2012, pp. 1473–1476.
-  W. Akamine and M. Farias, “Incorporating visual attention models into video quality metrics,” in SPIE Proceedings, vol. 9016, 2014, pp. 90160O-1–90160O-9.
-  (2016) Scene Background Modeling Challenge. [Online]. Available: http://www.icpr2016.org/site/session/scene-background-modeling-sbmc2016/
-  B. Laugraud, S. Piérard, and M. Van Droogenbroeck, “LaBGen-P: A pixel-level stationary background generation method based on LaBGen,” in Proc. International Conference on Pattern Recognition, Dec 2016, pp. 107–113.
-  L. Maddalena and A. Petrosino, “Extracting a background image by a multi-modal scene background model,” in Proc. International Conference on Pattern Recognition, Dec 2016, pp. 143–148.
-  S. Javed, S. K. Jung, A. Mahmood, and T. Bouwmans, “Motion-Aware Graph Regularized RPCA for background modeling of complex scenes,” in Proc. International Conference on Pattern Recognition, Dec 2016, pp. 120–125.
-  W. Liu, Y. Cai, M. Zhang, H. Li, and H. Gu, “Scene background estimation based on temporal median filter with gaussian filtering,” in Proc. International Conference on Pattern Recognition, Dec 2016, pp. 132–136.
-  G. Ramirez-Alonso, J. A. Ramirez-Quintana, and M. I. Chacon-Murguia, “Temporal weighted learning model for background estimation with an automatic re-initialization stage and adaptive parameters update,” Pattern Recognition Letters, vol. 96, no. Supplement C, pp. 34–44, 2017.
-  T. Minematsu, A. Shimada, and R. I. Taniguchi, “Background initialization based on bidirectional analysis and consensus voting,” in Proc. International Conference on Pattern Recognition, Dec 2016, pp. 126–131.
-  M. Piccardi, “Background subtraction techniques: A review,” in Proc. IEEE International Conference on Systems, Man and Cybernetics, vol. 4, Oct 2004, pp. 3099–3104.
-  I. Halfaoui, F. Bouzaraa, and O. Urfalioglu, “CNN-based initial background estimation,” in Proc. International Conference on Pattern Recognition, Dec 2016, pp. 101–106.
-  M. I. Chacon-Murguia, J. A. Ramirez-Quintana, and G. Ramirez-Alonso, “Evaluation of the background modeling method auto-adaptive parallel neural network architecture in the SBMnet dataset,” in Proc. International Conference on Pattern Recognition, Dec 2016, pp. 137–142.
-  D. Ortego, J. C. SanMiguel, and J. M. Martínez, “Rejection based multipath reconstruction for background estimation in SBMnet 2016 dataset,” in Proc. International Conference on Pattern Recognition, Dec 2016, pp. 114–119.
-  “Methodology for the subjective assessment of the quality of television pictures,” International Telecommunications Union, Tech. Rep. ITU-R BT.500-13, Jan 2012.
-  H. Snellen, “Probebuchstaben zur Bestimmung der Sehschärfe,” Utrecht, 1862.
-  Ishihara colour vision test. [Online]. Available: http://colorvisiontesting.com/ishihara.htm
-  J. Robson and N. Graham, “Probability summation and regional variation in contrast sensitivity across the visual field,” Vision Research, vol. 21, no. 3, pp. 409–418, 1981.
-  J. Su and R. Mersereau, “Post-processing for artifact reduction in JPEG-compressed images,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 1995, pp. 2363–2366.
-  C.-H. Chou and K.-C. Liu, “Colour image compression based on the measure of just noticeable colour difference,” IET Image Processing, vol. 2, no. 6, pp. 304–322, Dec 2008.
-  M. Mahy, L. Eycken, and A. Oosterlinck, “Evaluation of uniform color spaces developed after the adoption of cielab and cieluv,” Color Research & Application, vol. 19, no. 2, pp. 105–121, 1994.
-  Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, VQEG Std., 2000.