Mean Deviation Similarity Index: Efficient and Reliable Full-Reference Image Quality Evaluator

Mean Deviation Similarity Index: Efficient and Reliable Full-Reference Image Quality Evaluator

Hossein Ziaei Nafchi, Atena Shahkolaei, Rachid Hedjam, and Mohamed Cheriet, H. Ziaei Nafchi, A. Shahkolaei and M. Cheriet are with the Synchromedia Laboratory for Multimedia Communication in Telepresence, École de technologie supérieure, Montreal (QC), Canada H3C 1K3; Tel.: +1-514-396-8972; Fax: +1-514-396-8595; Emails: hossein.zi@synchromedia.ca, atena.shahkolaei.1@ens.etsmtl.ca, mohamed.cheriet@etsmtl.caR. Hedjam is with the Department of Geography, McGill University, 805 Sherbrooke Street West, Montreal, QC H3A 2K6, Canada (email: rachid.hedjam@mcgill.ca)
Abstract

Applications of perceptual image quality assessment (IQA) in image and video processing, such as image acquisition, image compression, image restoration and multimedia communication, have led to the development of many IQA metrics. In this paper, a reliable full reference IQA model is proposed that utilize gradient similarity (GS), chromaticity similarity (CS), and deviation pooling (DP). By considering the shortcomings of the commonly used GS to model human visual system (HVS), a new GS is proposed through a fusion technique that is more likely to follow HVS. We propose an efficient and effective formulation to calculate the joint similarity map of two chromatic channels for the purpose of measuring color changes. In comparison with a commonly used formulation in the literature, the proposed CS map is shown to be more efficient and provide comparable or better quality predictions. Motivated by a recent work that utilizes the standard deviation pooling, a general formulation of the DP is presented in this paper and used to compute a final score from the proposed GS and CS maps. This proposed formulation of DP benefits from the Minkowski pooling and a proposed power pooling as well. The experimental results on six datasets of natural images, a synthetic dataset, and a digitally retouched dataset show that the proposed index provides comparable or better quality predictions than the most recent and competing state-of-the-art IQA metrics in the literature, it is reliable and has low complexity. The MATLAB source code of the proposed metric is available at https://www.mathworks.com/matlabcentral/fileexchange/59809.

Image quality assessment, gradient similarity, chromaticity similarity, deviation pooling, synthetic image, Human visual system.

I Introduction

Automatic image quality assessment (IQA) plays a significant role in numerous image and video processing applications. IQA is commonly used in image acquisition, image compression, image restoration, multimedia streaming, etc [1]. IQA models (IQAs) mimic the average quality predictions of human observers. Full reference IQAs (FR-IQAs), which fall within the scope of this paper, evaluate the perceptual quality of a distorted image with respect to its reference image. This quality prediction is an easy task for the human visual system (HVS) and the result of the evaluation is reliable. Automatic quality assessment, e.g. objective evaluation, is not an easy task because images may suffer from various types and degrees of distortions. FR-IQAs can be employed to compare two images of the same dynamic range (usually low dynamic range) [2] or different dynamic ranges [3, 4]. This paper is dedicated to the IQA for low dynamic range images.

Among IQAs, the conventional metric mean squared error (MSE) and its variations are widely used because of their simplicity. However, in many situations, MSE does not correlate with the human perception of image fidelity and quality [5]. Because of this limitation, a number of IQAs have been proposed to provide better correlation with the HVS [6, 2, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. In general, these better performing metrics measure structural information, luminance information and contrast information in the spatial and frequency domains.

The most successful IQA models in the literature follow a top-down strategy [25]. They calculate a similarity map and use a pooling strategy that converts the values of this similarity map into a single quality score. For example, the luminance, contrast and structural information constitute a similarity map for the popular SSIM index [2]. SSIM then uses average pooling to compute the final similarity score. Different feature maps are used in the literature for calculation of this similarity map. Feature similarity index (FSIM) uses phase congruency and gradient magnitude features. GS [17] uses a combination of some designated gradient magnitudes and image contrast for this end, while the GMSD [19] uses only the gradient magnitude. SR_SIM [18] uses saliency features and gradient magnitude. VSI [21] likewise benefits from saliency-based features and gradient magnitude. SVD based features [26], features based on the Riesz transform [11], features in the wavelet domain [10, 27, 28] and sparse features [20] are used as well in the literature. Among these features, gradient magnitude is an efficient feature, as shown in [19]. In contrast, phase congruency and visual saliency features in general are not fast enough features to be used. Therefore, the features being used play a significant role in the efficiency of IQAs.

As we mentioned earlier, the computation of the similarity map is followed by a pooling strategy. The state-of-the-art pooling strategies for perceptual image quality assessment (IQA) are based on the mean and the weighted mean [2, 7, 15, 17, 16, 11]. They are robust pooling strategies that usually provide a moderate to high performance for different IQAs. Minkowski pooling [29], local distortion pooling [29, 30, 12], percentile pooling [31] and saliency-based pooling [18, 21] are other possibilities. Standard deviation (SD) pooling was also proposed and successfully used by GMSD [19]. The image gradients are sensitive to image distortions. Different local structures in a distorted image suffer from different degrees of degradations. This is the motivation that the authors in [19] used to explore the standard variation of the gradient-based local similarity map for overall image quality prediction. In general, features that constitute the similarity map and the pooling strategy are very important factors in designing high performance IQA models.

Here, we propose an IQA model called the mean deviation similarity index (MDSI) that shows very good compromise between prediction accuracy and model complexity. The proposed index is efficient, effective and reliable at the same time. It also shows consistent performance for both natural and synthetic images. The proposed metric follows a top-down strategy. It uses gradient magnitude to measure structural distortions and use chrominance features to measure color distortions. These two similarity maps are then combined to form a gradient-chromaticity similarity map. We then propose a novel deviation pooling strategy and use it to compute the final quality score. Both image gradient [32, 33, 17, 16, 19, 21] and chrominance features [16, 21] have been already used in the literature. The proposed MDSI uses a new gradient similarity which is more likely to follow HVS. Also, MDSI uses a new chromaticity similarity map which is efficient and shows good performance when used with the proposed metric. The proposed index uses the summation over similarity maps to give independent weights to them. Also, less attention has been paid to the deviation pooling strategy, except for a special case of this type of pooling, namely, standard deviation pooling [19]. We therefore provide a general formulation for the deviation pooling strategy and show its power in the case of the proposed IQA model. In the following, the main contributions of the paper as well as its differences with respect to the previous works are briefly explained.

Unlike previous researches [32, 33, 17, 16, 19, 21] that use a similar gradient similarity map, a new gradient similarity map is proposed in this paper which is more likely to follow the human visual system (HVS). This statement is supported by visual examples and experimental results.

This paper proposes a new chormaticity similarity map with the following advantages over the previously used chromaticity similarity maps [16, 21]. Its complexity is lower and it provides slightly better quality predictions when used with the proposed metric.

Motivated by a previous study that proposed to use standard deviation pooling [19], we propose a systematic and general formulation of the deviation pooling which has a comprehensive scope.

The rest of the paper is organized as follows. The proposed mean deviation similarity index is presented in section II. Extensive experimental results and discussion on six natural datasets, a synthetic dataset, and a digitally retouched dataset are provided in section III. Section IV presents our conclusions.

Ii Mean Deviation Similarity Index

The proposed IQA model uses two similarity maps. Image gradient, which is sensitive to structural distortions, is used as the main feature to calculate the first similarity map. Then, color distortions are measured by a chromaticity similarity map. These similarity maps are combined and pooled by a proposed deviation pooling strategy. In this paper, conversion to luminance is done through the following formula: . In addition, two chromaticity channels of a Gaussian color model [34] are used:

(1)

Ii-a Gradient Similarity

It is very common that gradient magnitude in the discrete domain is calculated on the basis of some operators that approximate derivatives of the image function using differences. These operators approximate vertical and horizontal gradients of an image using convolution: and , where and are horizontal and vertical gradient operators and denotes the convolution. The first derivative magnitude is defined as . The Sobel operator [35], the Scharr operator, and the Prewitt operator are common gradient operators that approximate first derivatives. Within the proposed IQA model, these operators perform almost the same.

Through this paper, Prewitt operator is used to compute gradient magnitudes of luminance channels of reference and distorted images, and . From which, gradient similarity (GS) is computed by the following SSIM induced equation:

(2)

where, parameter is a constant to control numerical stability. The gradient similarity (GS) is widely used in the literature [32, 33, 17, 16, 19, 21] and its usefulness to measure image distortions was extensively investigated in [19].

In many scenarios, human visual system (HVS) disagrees with the judgments provided by the GS for structural distortions. In fact, in such a formulation, there is no difference between an added edge to or a removed edge from the distorted image with respect to the reference image. An extra edge in bring less attention of HVS if its color is close to the relative pixels of that edge in . Likewise, HVS pays less attention to a removed edge from that is replaced with pixels of the same or nearly the same color. In another scenario, suppose that edges are preserved in but with different colors than in . In this case, GS is likely to fail at providing a good judgment “on the edges”. These shortcomings of the GS motivated us to propose a new GS map.

(JPEG compression) GS GCS ()

(color saturation) GS GCS ()
Fig. 1: Complementary behavior of the gradient similarity (GS) and chromaticity similarity () maps.

Ii-B The Proposed Gradient Similarity

The aforementioned shortcomings of the conventional gradient similarity map (equation 2) are mainly because and are computed independent of each other. In the following, we propose a fusion technique to include the correlation between and images into computation of the gradient similarity map.

We fuse the luminance channels of the and by a simple averaging: = 0.5 ( + ). Two extra GS maps are computed as follows:

(3)
(4)

where, is the gradient magnitude of the fused image , and is used for numerical stability. Note that , and that and can or can not be equal. The proposed gradient similarity () is computed by:

(5)

The added term , will put more emphasize on removed edges from than added edges to the . For weak added/removed edges, it is likely that weak edges smooth out in . Therefore, always put less emphasize on weak edges.

Comparing visually some outputs of the GS and at this step might not be fair because they have different numerical scales. GS is bounded between 0 and 1, while might have negative values greater than -1, and/or positive values smaller than +2. Therefore, this comparison is performed on the final similarity map and is presented in subsection II-E as well as more explanation on how the proposed works.

Ii-C Chromaticity Similarity

For the case of color changes and especially when the structure of the distorted image remains unchanged, the gradient similarity (GS) and the proposed may lead to inaccurate quality predictions. Therefore, previous researches such as [16, 21] used a color similarity map to measure color differences. Let and denote two chromaticity channels regardless of the type of the color space. In [16, 21], for each channel a color similarity is computed and their result is combined as:

(6)

where is a constant to control numerical stability. In this paper, we propose a new formulation to calculate color similarity. The proposed formulation calculates a color similarity map using both chromaticity channels at once:

(7)

Similar to the CS in equation (6), the above joint color similarity () formulation gives equal weight to both chromaticity channels and . It is clear that is more efficient than CS. CS needs 7 multiplications, 6 summations, 2 divisions, and 2 shift operations (multiplications by 2), while needs 6 multiplications, 6 summations, 1 division, and 1 shift operation. Note that CS can also be computed through 8 multiplications, 6 summations, 1 division, and 2 shift operations. In experimental results section, an experiment is conducted to compare usefulness of the CS and along with the proposed metric.

The gradient similarity maps (GS or ) can be combined with the joint color similarity map through the following summation (weighted average) scheme:

(8)
(9)

where the parameter adjusts the relative importance of the gradient and chromaticity similarity maps. The proposed metric MDSI uses equation (9). Equation (8) is included to be compared with equation (9). An alternative combination scheme which is very popular in state-of-the-art is through multiplication in the form of , where the parameters and are used to adjust the relative importance of the two similarity maps. For several reasons, the proposed index uses the summation scheme (refer to subsection III-E).

In Fig 1, two examples are provided to show that these two similarity maps, e.g. GS and , are complementary. In the first example, there is a considerable difference between the gradient maps of the reference and the distorted images. Hence, the GS map is enough for a good judgment. However, this difference in the second example (second row) is trivial, which leads to a wrong prediction by using GS as the only similarity map. The examples in Fig 1 show that the gradient similarity and chromaticity similarity are complementary.

Ii-D Deviation Pooling

The motivation of using the deviation pooling is that HVS is sensitive to both magnitude and the spread of the distortions across the image. Other pooling strategies such as Minkowski pooling and percentile pooling adjust the magnitude of distortions or discard the less/non distorted pixels. These pooling strategies and the mean pooling do not take into account the spread of the distortions. It is shown in [19] by case examples and experimental results that a common wrong prediction by mean pooling is where it calculates the same quality scores for two distorted images of different type. In such cases, deviation pooling is likely to provide good judgments over their quality through spread of the distortions. This is the reason why mean pooling have good inter-class (one distortion type) quality prediction but its performance might be degraded for intra-class (whole dataset) quality prediction. While this statement can be verified from the experimental results provided in [19], an example is also provided in subsection III-D to support this statement. Human visual system penalizes severer distortions much more than the distortion-free regions, and these pixels may constitute different fractions of distorted images. Mean pooling, however, depending on this fraction, is likely to nullify the impact of the severer distortions by inclusion of distortion-free regions into the average computation. Fig. 2 shows overlapped histograms of two similarity maps corresponding to two distorted images. While mean pooling indicate that image #1 is of better quality than image #2 (), deviation pooling provides an opposite assessment (). Given that , and that image #1 has more severe distortions compared to image #2 with their values farther from than , there are larger deviations in similarity map of image #1 than that of image #2. Therefore, deviation pooling is an alternative to the mean pooling that can also measure different levels of distortions. In the following, we propose the deviation pooling (DP) strategy and provide a general formulation of this pooling.

Fig. 2: Overlapped histograms of two similarity maps corresponding to two distorted images. Lower values of similarity maps indicate to more severe distortions, while higher values refer to less/non distorted pixels.

DP for IQAs is rarely used in the literature, except the standard deviation used in GMSD [19], which is a special case of DP. A deviation can be seen as the Minkowski distance of order between vector x and its MCT (Measure of Central Tendency):

(10)

where indicates the type of deviation. The only MCT that is used in this paper is mean. Though other MCTs such as median and mode can be used, we found that these MCTs do not provide satisfactory quality predictions.

Several researches shown that more emphasis on the severer distortions can lead to more accurate predictions [29, 31]. The Minkowski pooling [29] and the percentile pooling [31] are two examples. As mentioned before, these pooling strategies follow a property of HVS that penalize severer distortions much more than the less distorted ones even though they constitute a small portion of total distortions. Hence, they try to moderate the weakness of the mean pooling through adjusting magnitudes of distortions [29] or discarding the less/non distorted regions [31]. The deviation pooling can be generalized to consider the aforementioned property of HVS:

(11)

where, adjusts the emphasis of the values in vector x, and MCT is calculated through x values. Furthermore, we propose to use power pooling in conjunction with the deviation pooling to control numerical behavior of the final quality scores:

(12)

where, is the power pooling applied on the final value of the deviation. The power pooling can be used to make an IQA model more linear versus the subjective scores or might be used for better visualization of the scores. Linearity might not be a significant advantage of an IQA, but it is pointed to be of interest in [19]. Also, according to [36], linearity against subjective data is one of the measures for validation of IQAs that should be examined111Though linearity is measured after a nonlinear analysis.. The power pooling can also have small impact on the values of Pearson linear Correlation Coefficient (PCC) and Root Mean Square Error (RMSE). Note that the above deviation pooling is equal to the Minkowski pooling [29] when MCT , and . It is equal to the mean absolute deviation (MAD) to the power of for , and equal to the standard deviation (SD) to the power of for . The three parameters should be set according to the IQA model. More analysis on these three parameters can be found in experimental results section. For the proposed index MDSI, we set , and . Therefore, the proposed IQA model can be written as:

(13)


(GS)

GCS GCS
Fig. 3: The difference between similarity maps GCS and that use conventional gradient similarity and the proposed gradient similarity, respectively.

Note that possible interval for is , where and . It is worth to mention that values of mostly remain in . Also, are highly distorted pixels, while refer to less/non-distorted pixels, where is a very small number. The global variations of is computed by mean absolute deviation, which is followed by power pooling. Note that since absolute of deviations is computed, the quality scores are positive. Larger values of the quality predictions provided by the proposed index indicate to the more severe distorted images, while an image with perfect quality is assessed by a quality score of zero since there is no variation in its similarity map. The important point on the use of the Minkowski pooling on final similarity maps is that terms like “more emphasize” and “less emphasize”, regardless of the values have been used, depends also on the pooling strategy and underlying similarity map. For example, placing more emphasize on highly distorted regions by Minkowski pooling will decrease the quality score computed by the mean pooling, but the quality score provided by the deviation pooling might become larger or smaller depending on the spread of the distortions which is directly related to the underlying similarity map.

Ii-E Analysis and Examples of GCS Maps

In this section, final similarity maps after applying the Minkowski pooling, e.g. GCS and , are compared along with sufficient explanations. The difference between these two similarity map is their use of gradient similarity. GCS uses conventional GS, while uses the proposed gradient similarity . The best way to analyze the effect of the proposed gradient similarity is through step by step explanation and visualization of different examples. In subsection II-A, several disadvantages of the traditional GS was mentioned. Here, each of them are explained and examples are provided.

Case 1 (Removed edge)

Missing edges in distorted image with respect to its original image means that structural information are removed, hence this disappearance bring attention of the HVS. These regions has to be strongly highlighted in the similarity map.

Case 2 (A weak added/removed edge)

An extra edge in or a removed edge from bring less attention of HVS if its color is close to the relative pixels of that edge in (), or simply it is a weak edge.

Fig. 3 shows how the proposed gradient similarity map performs for case 1 and case 2 as a part of the compared to the GS for GCS. We can see that (GS) highlighted differences with details. The edges corresponding to the location of ropes in original image are mainly replaced with pixels of another color (dark replaced with green), but many other edges with smaller strengths in are replaced with pixels having the same color (green). This latter holds for added edges to the distorted image. In fused image (), some of these weaker edges are smoothed. This can be seen by comparing and . Both and indicate to high differences at the location of the ropes. will also put high emphasize on this location, but less emphasize on the weaker edges. The results is then subtracted by which in turn again less emphasize is placed on the weak edges (relevant to the darker pixels in ). Note that GCS and have different numerical behavior, so it is fair to compare them by looking at the GCS and . Compared to the GCS, indicate to larger differences at the location of ropes, but smaller differences elsewhere.



GCS
Fig. 4: The difference between similarity maps GCS and for the case of the inverted edges. Note that some intermediate outputs are not shown.

Case 3 (Preserved edge but with different color)

Although a color similarity map should measure color differences at the location of the inverted edges, edges constitute a small fraction of total pixels in images, and it is common to give smaller weights to a color similarity map than structural similarities such as gradient similarity. While traditional gradient similarity does not work well in this situation, the proposed gradient similarity can partially solve this problem. Fig. 4 provides an example in which most of the edges are inverted in the distorted image. We can see that highlighted much more differences than GCS at these locations, thanks to the added term to the traditional gradient similarity. In fact, is likely to be different than in this case because these edges in are likely to become closer to their surrounding pixels in either or images.

Iii Experimental results and discussion

In the experiments, eight datasets were used. The LIVE dataset [37] contains 29 reference images and 779 distorted images of five categories. The TID2008 [38] dataset contains 25 reference images and 1700 distorted images. For each reference image, 17 types of distortions of 4 degrees are available. CSIQ [12] is another dataset that consists of 30 reference images; each is distorted using six different types of distortions at four to five levels of distortion. The large TID2013 [39] dataset contains 25 reference images and 3000 distorted images. For each reference image, 24 types of distortions of 5 degrees are available. VCL@FER database [40] consists of 23 reference images and 552 distorted images, with four degradation types and six degrees of degradation. In addition to these five datasets, contrast distorted images of the CCID2014 dataset [41] are used in the experiments. This dataset contains 655 contrast distorted images of five types. Gamma transfer, convex and concave arcs, cubic and logistic functions, mean shifting, and a compound function are used to generate these five types of distortions. We also used the ESPL synthetic image database [42] which contains 25 synthetic images of video games and animated movies. It contains 500 distorted images of 5 categories. Fig. 5 shows an example of a reference and a distorted synthetic image. Finally, the digitally retouched image quality (DRIQ) dataset [43] was used in the experiments. It contains 26 reference images and 3 enhanced images for each reference image.



(Gaussian noise)
Fig. 5: An example of reference and distorted image in the ESPL synthetic images database [42].
MSSSIM VIF MAD IWSSIM SR_SIM FSIM GMSD SFF VSI DSCSI ADD-GSIM SCQI MDSI
SRC 0.8542 0.7491 0.8340 0.8559 0.8913 0.8840 0.8907 0.8767 0.8979 0.8634 0.9094 0.9051 0.9208
TID PCC 0.8451 0.8084 0.8290 0.8579 0.8866 0.8762 0.8788 0.8817 0.8762 0.8445 0.9120 0.8899 0.9160
2008 KRC 0.6568 0.5861 0.6445 0.6636 0.7149 0.6991 0.7092 0.6882 0.7123 0.6651 0.7389 0.7294 0.7515
RMSE 0.7173 0.7899 0.7505 0.6895 0.6206 0.6468 0.6404 0.6333 0.6466 0.7187 0.5504 0.6120 0.5383
SRC 0.9133 0.9195 0.9467 0.9213 0.9319 0.9310 0.9570 0.9627 0.9423 0.9417 0.9422 0.9434 0.9569
\multirow2*CSIQ PCC 0.8991 0.9277 0.9500 0.9144 0.9250 0.9192 0.9541 0.9643 0.9279 0.9313 0.9342 0.9268 0.9531
KRC 0.7393 0.7537 0.7970 0.7529 0.7725 0.7690 0.8129 0.8288 0.7857 0.7787 0.7894 0.7870 0.8130
RMSE 0.1149 0.0980 0.0820 0.1063 0.0997 0.1034 0.0786 0.0695 0.0979 0.0956 0.0937 0.0986 0.0795
SRC 0.9513 0.9636 0.9669 0.9567 0.9618 0.9645 0.9603 0.9649 0.9524 0.9487 0.9681 0.9406 0.9667
\multirow2*LIVE PCC 0.9489 0.9604 0.9675 0.9522 0.9553 0.9613 0.9603 0.9632 0.9482 0.9434 0.9657 0.9344 0.9659
KRC 0.8044 0.8282 0.8421 0.8175 0.8299 0.8363 0.8268 0.8365 0.8058 0.7982 0.8474 0.7835 0.8395
RMSE 8.6188 7.6137 6.9072 8.3472 8.0812 7.5296 7.6214 7.3460 8.6817 9.0635 7.0925 9.7355 7.0790
SRC 0.7859 0.6769 0.7807 0.7779 0.8073 0.8510 0.8044 0.8513 0.8965 0.8744 0.8285 0.9052 0.8899
TID PCC 0.8329 0.7720 0.8267 0.8319 0.8663 0.8769 0.8590 0.8706 0.9000 0.8782 0.8807 0.9071 0.9085
2013 KRC 0.6047 0.5147 0.6035 0.5977 0.6406 0.6665 0.6339 0.6581 0.7183 0.6862 0.6646 0.7327 0.7123
RMSE 0.6861 0.7880 0.6976 0.6880 0.6193 0.5959 0.6346 0.6099 0.5404 0.5930 0.5871 0.5219 0.5181
SRC 0.9227 0.8866 0.9061 0.9163 0.9021 0.9323 0.9177 0.7738 0.9317 0.9289 0.9366 0.9083 0.9318
VCL@ PCC 0.9232 0.8938 0.9053 0.9191 0.9023 0.9329 0.9176 0.7761 0.9320 0.9338 0.9339 0.9107 0.9349
FER KRC 0.7497 0.6924 0.7213 0.7372 0.7183 0.7643 0.7406 0.5779 0.7633 0.7588 0.7731 0.7316 0.7629
RMSE 9.4398 11.014 10.433 9.6788 10.589 8.8480 9.7643 15.488 8.9051 8.7902 8.7819 10.147 8.7136
SRC 0.7770 0.8349 0.7451 0.7811 0.7363 0.7657 0.8077 0.6859 0.7734 0.7400 0.8698 0.7811 0.8128
CCID PCC 0.8278 0.8588 0.7516 0.8342 0.7834 0.8204 0.8521 0.7575 0.8209 0.7586 0.8935 0.8200 0.8576
2014 KRC 0.5845 0.6419 0.5490 0.5898 0.5372 0.5707 0.6100 0.5012 0.5735 0.5468 0.6840 0.5812 0.6181
RMSE 0.3668 0.3350 0.4313 0.3606 0.4064 0.3739 0.3422 0.4269 0.3734 0.4260 0.2936 0.3734 0.3363
SRC 0.7247 0.7488 0.8624 0.8270 0.8802 0.8766 0.8209 0.8127 0.8717 0.7263 0.7828 0.8292 0.8806
\multirow2*ESPL PCC 0.7322 0.7423 0.8677 0.8300 0.8732 0.8738 0.8234 0.8179 0.8726 0.7302 0.7902 0.8356 0.8802
KRC 0.5208 0.5565 0.6720 0.6221 0.6932 0.6853 0.6178 0.6127 0.6765 0.5222 0.5814 0.6243 0.6895
RMSE 9.4519 9.2985 6.8985 7.7404 6.7646 6.7482 7.8753 7.9844 6.7791 9.4815 8.5053 7.6241 6.5862
SRC 0.6692 0.8078 0.6867 0.6903 0.7551 0.7751 0.7762 0.8342 0.8222 0.8167 0.7661 0.8482 0.8508
\multirow2*DRIQ PCC 0.7058 0.8496 0.6967 0.7155 0.8027 0.7989 0.8001 0.8420 0.8477 0.8463 0.8053 0.8638 0.8702
KRC 0.4739 0.5997 0.4898 0.4952 0.5604 0.5771 0.5758 0.6477 0.6177 0.6104 0.5618 0.6490 0.6557
RMSE 1.4450 1.0759 1.4631 1.4249 1.2165 1.2268 1.2235 1.1004 1.0820 1.0864 1.2092 1.0277 1.0050
\multirow3*
Direct
Avg.
SRC 0.8248 0.8234 0.8411 0.8408 0.8583 0.8725 0.8669 0.8453 0.8860 0.8550 0.8754 0.8826 0.9013
PCC 0.8394 0.8516 0.8493 0.8569 0.8743 0.8824 0.8807 0.8591 0.8907 0.8583 0.8894 0.8860 0.9108
KRC 0.6418 0.6466 0.6649 0.6595 0.6834 0.6960 0.6909 0.6689 0.7067 0.6708 0.7051 0.7023 0.7303
\multirow3*
Weighted
Avg.
SRC 0.8335 0.7783 0.8374 0.8387 0.8578 0.8769 0.8626 0.8585 0.8974 0.8698 0.8783 0.8977 0.9066
PCC 0.8521 0.8287 0.8546 0.8626 0.8810 0.8877 0.8838 0.8729 0.8963 0.8679 0.8995 0.8967 0.9160
KRC 0.6511 0.6112 0.6626 0.6587 0.6880 0.6999 0.6913 0.6791 0.7206 0.6855 0.7140 0.7231 0.7375
TABLE I: Performance comparison of the proposed IQA model, MDSI, and twelve popular/competing indices on eight benchmark datasets. Note that top three IQA models are highlighted.

For objective evaluation, four popular evaluation metrics were used in the experiments: the Spearman Rank-order Correlation coefficient (SRC), the Pearson linear Correlation Coefficient (PCC) after a nonlinear regression analysis (equation 14), the Kendall Rank Correlation coefficient (KRC) and the Root Mean Square Error (RMSE). The SRC, PCC, and RMSE metrics measure prediction monotonicity, prediction linearity, and prediction accuracy, respectively. The KRC was used to evaluate the degree of similarity between quality scores and MOS. In addition, Pearson linear Correlation Coefficient without nonlinear analysis is used and denoted by LPCC.

Twelve state-of-the-art IQA models were chosen for comparison [7, 9, 12, 15, 18, 16, 19, 20, 21, 44, 22, 23] including the most recent indices in literature [20, 19, 21, 44, 22, 23]. It should be noted that the five indices SFF [20], GMSD [19], VSI [21], [22], and SCQI [23] have shown superior performance over state-of-the-art indices.

Iii-a Performance comparison

In Table I, the overall performance of thirteen IQA models on eight benchmark datasets, e.g. TID2008, CSIQ, LIVE, TID2013, VCL@FER, CCID2014, ESPL, and DRIQ, is listed. For each dataset and evaluation metric, the top three IQA models are highlighted. On eight datasets, MDSI is 31 times among the top indices, followed by ADD-GSIM (16 times), SCQI (12 times), SFF/FSIM/VIF (6 times), VSI (5 times), GMSD/SR_SIM/MAD222Note the conflict between ‘MAD’ [12] as an IQA model, and ‘MAD’ as a pooling strategy. (4 times), DSCSI (2 times), and MSSSIM/IWSSIM (0 times). To provide a conclusion on the overall performance of these indices, direct and weighted333The dataset size-weighted average is commonly used in the literature [15, 20, 19, 21]. overall performances on the eight datasets (8150 images) are also listed in Table I. It can be seen that MDSI has the best overall performance on the eight datasets, while metrics VSI and SCQI are the second, and third best, respectively.

Iii-B Visualization and statistical evaluation

TID2008 1 2 3 4 5 6 7 8 9 10 sum
1 VIF - -1 -1 -1 -1 -1 -1 -1 -1 -1 -9
2 MAD +1 - -1 -1 -1 -1 -1 -1 -1 -1 -7
3 SR_SIM +1 +1 - +1 +1 +1 +1 -1 -1 -1 +3
4 FSIM +1 +1 -1 - -1 -1 +1 -1 -1 -1 -3
5 GMSD +1 +1 -1 +1 - -1 +1 -1 -1 -1 -1
6 SFF +1 +1 -1 +1 +1 - +1 -1 -1 -1 +1
7 VSI +1 +1 -1 0 -1 -1 - -1 -1 -1 -4
8 ADD-GSIM +1 +1 +1 +1 +1 +1 +1 - +1 -1 +7
9 SCQI +1 +1 +1 +1 +1 +1 +1 -1 - -1 +5
10 MDSI +1 +1 +1 +1 +1 +1 +1 +1 +1 - +9
CSIQ 1 2 3 4 5 6 7 8 9 10 sum
1 VIF - -1 +1 +1 -1 -1 0 -1 0 -1 -3
2 MAD +1 - +1 +1 -1 -1 +1 +1 +1 -1 +3
3 SR_SIM -1 -1 - +1 -1 -1 -1 -1 -1 -1 -7
4 FSIM -1 -1 -1 - -1 -1 -1 -1 -1 -1 -9
5 GMSD +1 +1 +1 +1 - -1 +1 +1 +1 0 +6
6 SFF +1 +1 +1 +1 +1 - +1 +1 +1 +1 +9
7 VSI 0 -1 +1 +1 -1 -1 - -1 0 -1 -3
8 ADD-GSIM +1 -1 +1 +1 -1 -1 +1 - +1 -1 +1
9 SCQI 0 -1 +1 +1 -1 -1 0 -1 - -1 -3
10 MDSI +1 +1 +1 +1 0 -1 +1 +1 +1 - +6


LIVE 1 2 3 4 5 6 7 8 9 10 sum
1 VIF - -1 +1 0 0 -1 +1 -1 +1 -1 -1
2 MAD +1 - +1 +1 +1 +1 +1 0 +1 0 +7
3 SR_SIM -1 -1 - -1 -1 -1 +1 -1 +1 -1 -5
4 FSIM 0 -1 +1 - 0 0 +1 -1 +1 -1 0
5 GMSD 0 -1 +1 0 - -1 +1 -1 +1 -1 -1
6 SFF +1 -1 +1 0 +1 - +1 -1 +1 -1 +2
7 VSI -1 -1 -1 -1 -1 -1 - -1 +1 -1 -7
8 ADD-GSIM +1 0 +1 +1 +1 +1 +1 - +1 0 +7
9 SCQI -1 -1 -1 -1 -1 -1 -1 -1 - -1 -9
10 MDSI +1 0 +1 +1 +1 +1 +1 0 +1 - +7
TID2013 1 2 3 4 5 6 7 8 9 10 sum
1 VIF - -1 -1 -1 -1 -1 -1 -1 -1 -1 -9
2 MAD +1 - -1 -1 -1 -1 -1 -1 -1 -1 -7
3 SR_SIM +1 +1 - -1 +1 -1 -1 -1 -1 -1 -3
4 FSIM +1 +1 +1 - +1 +1 -1 -1 -1 -1 +1
5 GMSD +1 +1 -1 -1 - -1 -1 -1 -1 -1 -5
6 SFF +1 +1 +1 -1 +1 - -1 0 -1 -1 0
7 VSI +1 +1 +1 +1 +1 +1 - +1 -1 -1 +5
8 ADD-GSIM +1 +1 +1 +1 +1 +1 -1 - -1 -1 +3
9 SCQI +1 +1 +1 +1 +1 +1 +1 +1 - -1 +7
10 MDSI +1 +1 +1 +1 +1 +1 +1 +1 +1 - +9


\multirow2*
VCL
 @FER
\multirow2*1 \multirow2*2 \multirow2*3 \multirow2*4 \multirow2*5 \multirow2*6 \multirow2*7 \multirow2*8 \multirow2*9 \multirow2*10 \multirow2*sum
1 VIF - -1 -1 -1 -1 +1 -1 -1 -1 -1 -7
2 MAD +1 - +1 -1 -1 +1 -1 -1 -1 -1 -3
3 SR_SIM +1 -1 - -1 -1 +1 -1 -1 -1 -1 -5
4 FSIM +1 +1 +1 - +1 +1 0 0 +1 0 +6
5 GMSD +1 +1 +1 -1 - +1 -1 -1 +1 -1 +1
6 SFF -1 -1 -1 -1 -1 - -1 -1 -1 -1 -9
7 VSI +1 +1 +1 0 +1 +1 - 0 +1 -1 +5
8 ADD-GSIM +1 +1 +1 0 +1 +1 0 - +1 0 +6
9 SCQI +1 +1 +1 -1 -1 +1 -1 -1 - -1 -1
10 MDSI +1 +1 +1 0 +1 +1 +1 0 +1 - +7
CCID2014 1 2 3 4 5 6 7 8 9 10 sum
1 VIF - +1 +1 +1 +1 +1 +1 -1 +1 0 +6
2 MAD -1 - -1 -1 -1 -1 -1 -1 -1 -1 -9
3 SR_SIM -1 +1 - -1 -1 +1 -1 -1 +1 -1 -3
4 FSIM -1 +1 +1 - -1 +1 0 -1 +1 -1 0
5 GMSD -1 +1 +1 +1 - +1 +1 -1 +1 -1 +3
6 SFF -1 +1 -1 -1 -1 - -1 -1 0 -1 -6
7 VSI -1 +1 +1 0 -1 +1 - -1 +1 -1 0
8 ADD-GSIM +1 +1 +1 +1 +1 +1 +1 - +1 +1 +9
9 SCQI -1 +1 -1 -1 -1 0 -1 -1 - -1 -6
10 MDSI 0 +1 +1 +1 +1 +1 +1 -1 +1 - +6


ESPL 1 2 3 4 5 6 7 8 9 10 sum
1 VIF - -1 -1 -1 -1 -1 -1 -1 -1 -1 -9
2 MAD +1 - -1 -1 +1 +1 -1 +1 +1 -1 +1
3 SR_SIM +1 +1 - 0 +1 +1 +1 +1 +1 -1 +6
4 FSIM +1 +1 0 - +1 +1 +1 +1 +1 -1 +6
5 GMSD +1 -1 -1 -1 - +1 -1 +1 -1 -1 -3
6 SFF +1 -1 -1 -1 -1 - -1 +1 -1 -1 -5
7 VSI +1 +1 -1 -1 +1 +1 - +1 +1 -1 +3
8 ADD-GSIM +1 -1 -1 -1 -1 -1 -1 - -1 -1 -7
9 SCQI +1 -1 -1 -1 +1 +1 -1 +1 - -1 -1
10 MDSI +1 +1 +1 +1 +1 +1 +1 +1 +1 - +9
DRIQ 1 2 3 4 5 6 7 8 9 10 sum
1 VIF - +1 +1 +1 +1 +1 +1 +1 0 -1 +5
2 MAD -1 - -1 -1 -1 -1 -1 -1 -1 -1 -9
3 SR_SIM -1 +1 - 0 0 -1 -1 -1 -1 -1 -5
4 FSIM -1 +1 0 - 0 -1 -1 -1 -1 -1 -5
5 GMSD -1 +1 0 0 - -1 -1 -1 -1 -1 -5
6 SFF -1 +1 +1 +1 +1 - 0 +1 -1 -1 +2
7 VSI -1 +1 +1 +1 +1 0 - +1 -1 -1 +2
8 ADD-GSIM -1 +1 +1 +1 +1 -1 -1 - -1 -1 -1
9 SCQI 0 +1 +1 +1 +1 +1 +1 +1 - -1 +6
10 MDSI +1 +1 +1 +1 +1 +1 +1 +1 +1 - +9
TABLE II: The results of statistical significance test for ten IQA models on eight datasets. The result of the F-test is equal to +1 if a metric is significantly better than another metric, it is equal to -1 if that metric is statistically inferior to another metric, and the result is equal to 0 if two metrics are statistically indistinguishable. The cumulative sum of individual tests for each metric is listed in the last column with top three IQA models being highlighted in the same column.

For the purpose of visualizing quality scores of the proposed index, the scatter plots of the proposed IQA model MDSI with and without using power pooling are shown in Fig. 6. The logistic function suggested in [45] was used to fit a curve on each plot:

(14)

where , , , and are fitting parameters computed by minimizing the mean square error between quality predictions and subjective scores MOS. It should be noted that reported PCC and RMSE values in this paper are computed after mapping quality scores to MOS based on above function.

LPCC = 0.8324 PCC = 0.9626 LPCC = 0.9618 PCC = 0.9659
Fig. 6: Scatter plots of quality scores against the subjective MOS on the LIVE dataset for the proposed model MDSI with and without using the power pooling. Comparison of LPCC and PCC values indicate that MDSI becomes more linear with respect to MOS (the right plot) by using the power pooling.

The reported results in Table I show the difference between different IQA models. As suggested in [46, 45], we use F-test to decide whether a metric is statistically superior to another index. The F-test is based on the residuals between the quality scores given by an IQA model after applying nonlinear mapping of equation (14), and the mean subjective scores MOS. The ratio of variance between residual errors of an IQA model to another model at 95% significance level is used by F-test. The result of the test is equal to 1 if we can reject the null hypothesis and 0 otherwise. The results of F-test on eight datasets are listed in Table II. In this Table, +1/-1 indicate that corresponding index is statistically superior/inferior to the other index being compared to. If the difference between two indices is not significant, the result is shown by 0. We note that type I error might be occurred, specially when quality scores of IQA models are not Gaussian. However, even existence of possible errors is very unlikely to result in another conclusion about the superiority of the proposed index because there is a considerable gap between the proposed index and the other metrics as discussed in the following.

From the results of Table II, we can see that MDSI is significantly better than the other indices on TID2008, TID2013, ESPL, and DRIQ datasets. Therefore, its sum value in the last column is +9 for these four datasets. SCQI is statistically superior to the other indices on the TID2013 dataset except for MDSI. On the LIVE dataset, indices MAD, MDSI, and ADD-GSIM are significantly better than the other indices. On the CSIQ dataset, only SFF performs significantly better than MDSI. On the CCID2014 dataset, ADD-GSIM is significantly better than the other indices, while the statistically indistinguishable indices VIF and MDSI show promising results. Considering all eight datasets used in this experiment, with a minimum sum value of +6, the proposed index MDSI performs very well in comparison with the other indices. We can simply add the eight cumulative sum values of each metric for the eight datasets to have an overall comparison based on the statistical significance test. This score indicates how many times a metric is statistically superior to the other metrics. The results show that MDSI is the best performing index by a score of +62 (out of maximum +72), followed by ADD-GSIM (+25), VSI (+1), SCQI (-2), FSIM (-4), GMSD (-5), SFF (-6), SR_SIM (-19), MAD (-24), and VIF (-26). The results based on the statistical significance test verify that unlike other IQA models, the proposed metric MDSI is among the best performing indices on different datasets.

Iii-C Performance comparison on individual distortions

A good IQA model should perform not only accurate quality predictions for a whole dataset; it should provide good judgments over individual distortion types. We list in Table III the average SRC, and PCC values of thirteen IQA models for 61 sets of distortions available in the six datasets of TID2008, CSIQ, LIVE, TID2013, VCL@FER, and ESPL. The minimum value for each evaluation metric and standard deviation of these 61 values are also listed. These two evaluations indicate to the reliability of an IQA model. An IQA model should provide good prediction accuracy for all of the distortion types. If a metric fails at assessing one or more types of distortions, that index can not be reliable.

The proposed index MDSI, has the best SRC, and PCC average on distortion types. MDSI, SCQI and FSIM in the worst case perform better than the other IQA models, as can be seen in the min column for each evaluation metric. This shows the reliability of the proposed index. The negative min values and close to zero min values in Table III indicate the unreliability of related models when dealing with some distortion types. The standard deviation of 61 values for each evaluation metric is another reliability factor. According to Table III, MDSI, SCQI and FSIM have the lowest variation. Therefore, we can conclude that indices MDSI, SCQI and FSIM are more reliable than the other indices.

\multirow2*IQA model SRC (Distortions) PCC (Distortions)
avg min std avg min std
MSSSIM 0.8343 -0.4099 0.1989 0.8560 -0.4448 0.1944
VIF 0.8537 -0.3099 0.1811 0.8760 -0.3443 0.1812
MAD 0.8111 -0.0575 0.2315 0.8296 0.0417 0.2108
IWSSIM 0.8329 -0.4196 0.2019 0.8568 -0.4503 0.1962
SR_SIM 0.8609 -0.2053 0.1806 0.8785 -0.3162 0.1839
FSIM 0.8775 0.4679 0.1041 0.8967 0.5488 0.0880
GMSD 0.8542 -0.2948 0.1954 0.8785 -0.3625 0.1851
SFF 0.8538 0.1786 0.1472 0.8721 0.0786 0.1441
VSI 0.8779 0.1713 0.1360 0.8969 0.4875 0.1044
DSCSI 0.8722 0.3534 0.1242 0.8908 0.5166 0.1093
ADD-GSIM 0.8650 -0.2053 0.1686 0.8799 -0.2190 0.1691
SCQI 0.8826 0.4479 0.1057 0.9010 0.6493 0.0841
MDSI 0.8903 0.4378 0.1030 0.9095 0.6899 0.0805
TABLE III: Overall performance comparison of the proposed IQA model MDSI and twelve popular/competing indices on individual distortion types of six datasets (TID2008, CSIQ, LIVE, TID2013, VCL@FER, and ESPL). The six datasets contain 61 distortion set, therefore results on distortion types are reported based on average of 61 correlation values. Top three IQA models are highlighted.
\multirow2*Pooling Weighted avg. SRC (8 datasets) Avg. SRC (61 Distortions)
Mean MAD SD Mean MAD SD
q = 1/4 0.8864 0.9066 0.8776 0.8919 0.8903 0.8828
q = 1/2 0.8833 0.9067 0.8820 0.8912 0.8898 0.8826
q = 1 0.8730 0.9041 0.8899 0.8899 0.8890 0.8820
q = 2 0.8519 0.8928 0.8972 0.8888 0.8866 0.8820
q = 4 0.8301 0.8766 0.8922 0.8869 0.8780 0.8753
TABLE IV: Performance of the proposed index MDSI with different pooling strategies and values of parameter .

Iii-D Parameters of deviation pooling (, , )

Considering the formulation of deviation pooling in equation (12), we used the mean absolute deviation (MAD), e.g. , for the proposed metric. Standard deviation (SD), e.g. , is another option that can be used for deviation pooling. In addition, the Minkowski power () of the deviation pooling can have significant impact on the proposed index. In Table IV, the SRC performance of the proposed index is analyzed for different values of and . Mean pooling is also used in this experiment. The results show that MAD pooling with is a better choice for the proposed index. Also, the performance of the mean pooling on 61 distortion set confirms our statement that mean pooling has a good performance for inter-class quality prediction.

The impact of the proposed power pooling of the deviation pooling on the proposed metric was shown in Fig. 6. Power pooling can be also used to increase linearity of other indices as well. For example, LPCC and PCC values of VSI [21] for TID2013 dataset, by setting , can be increased from 0.8373 to 0.8928, and 0.9000 to 0.9011, respectively.

Iii-E Summation vs. Multiplication

Two options for combination of the two similarity maps GS/ and are summation and multiplication as explained in subsection II-C. Deciding whether one approach is superior to another for an index depends on many factors. These factors might be the pooling strategy being used, overall performance, performance on individual distortions, reliability, efficiency, simplicity, etc. In an experiment, the performance of the MDSI using the multiplication approach was examined. Based on the many set of parameters were tested, we found that and are good parameters to combine and via the multiplication scheme. The observation was that summation is a better choice for TID2008, TID2013, VCL@FER, and DRIQ datasets, while multiplication is a better choice for ESPL dataset, and that both approaches show almost the same performance on other datasets. Overall, the summation approach provides better performance on individual distortions. This experiment also shown that MDSI is more reliable through summation than multiplication based on the reliability measures introduced in this paper. Based on this experiment, the simplicity of the summation combination approach and its efficiency over multiplication, the former was used along with MDSI. Table V justifies our choice.

Property Summation Multiplication
Statistically superior over more considered datasets
Better dataset-weighted average
Better performance on individual distortions
Reliability
Simplicity
Efficiency
TABLE V: Different criteria used to choose the combination scheme.

Iii-F Parameters of model

The proposed IQA model MDSI has four parameters to be set. The four parameters of MDSI are , , and . To further simplify the MDSI, we set . Therefore, MDSI has only two parameters to set, e.g. and . For an example, we refer to the SSIM index [2] that also uses such a simplification. Note that gradient similarities and chromaticity similarity have different dynamic ranges, therefore, these parameters should be set such that the relation between these maps also be taken into account.

In Fig. 7, the impact of these two parameters on the performance of the MDSI is shown. Even though the parameters , and are set approximately, it can be seen that MDSI is very robust under different setup of parameters. MDSI has greater weighted average SRC than 0.90 for any and . Note that many other possible setup of parameters are not included in this plot. In the experiments, we set , , , and .

Fig. 7: The weighted SRC performance of MDSI for different values of and on eight datasets (TID2008, CSIQ, LIVE, TID2013, VCL@FER, CCID2014, ESPL, and DRIQ).

Iii-G Effect of chromaticity similarity maps CS and

In this section, the impact of using CS [21] and proposed on the performance of the proposed index is studied through the following experiment. Contrast distorted images of the CCID2014 dataset [41] were chosen. The reason of choosing this dataset is to evaluate the ability of measuring color changes by CS and . We analyzed the SRC performance of the CS and as a part of the proposed index for wide range of values. Three pooling strategies were used in this experiment, e.g. mean pooling, mean absolute deviation (MAD) pooling and the standard deviation (SD) pooling. Fig. 8 shows the SRC performance of the proposed index for different scenarios. From the plot in Fig. 8, the following conclusions can be drawn. MAD pooling and both CS and are good choices for MDSI. For almost every pooling strategy and parameter of , the proposed performs better than CS. This advantage is at the same time that the proposed is more efficient than the existing CS.

Fig. 8: The SRC performance of the proposed index MDSI with two chromaticity similarity maps CS and (proposed) for different values of and three pooling strategies on CCID2014 dataset [41].

Iii-H Implementation and efficiency

Another very important factor of a good IQA model is its efficiency. The proposed index has a very low complexity. It first applies average filtering of size on each channel of the and images, downsample them by a factor of and convert the results to a luminance and two chromaticity channels [47]. The value of is set to [48], where and are image height and width, and is the round operator. Then, the proposed index calculates the gradient magnitudes of luminance channel, the chromaticity similarity map, and apply deviation pooling. All these steps are computationally efficient. Table VI lists the run times of fifteen IQA models when applied on images of size 384512 and 10801920. The experiments were performed on a Corei7 3.40GHz CPU with 16 GB of RAM. The IQA models were implemented in MATLAB 2013b running on Windows 7. It can be seen that MDSI is among top five fastest indices. The proposed index is less than 2 times slower than the competing GMSD index. The reason for this is that GMSD only uses the luminance feature. Compared to the other competing indices, SCQI, VSI, ADD-GSIM, SFF, and FSIM, the proposed index MDSI is about 3 to 6 times, 3 to 9 times, 4 to 5 times, 4 to 5 times, and 4 to 11 times faster, respectively. Another observation from the Table VI is that the ranking of indices might not be the same when they are tested on images of different size. For example, SSIM performs slower than the proposed index on smaller images, but faster on larger images.

IQA model 384512 10801920
PSNR 5.69 37.85
GMSD [19] 8.90 78.22
MDSI 12.21 152.85
SSIM [2] 14.97 80.23
SR_SIM [18] 17.02 100.06
MSSSIM [7] 52.16 413.70
ADD-GSIM [22] 59.58 566.99
SFF [20] 64.22 588.57
SCQI [23] 71.68 524.01
VSI [21] 106.87 492.85
FSIM [16] 145.02 590.84
IWSSIM [15] 244.00 2538.43
DSCSI [44] 423.73 4599.83
VIF [9] 635.22 6348.67
MAD [12] 847.54 8452.50
TABLE VI: Run time comparison of IQA models in terms of milliseconds

Iv Conclusion

We propose an effective, efficient, and reliable full reference IQA model based on the new gradient and chromaticity similarities. The gradient similarity was used to measure local structural distortions. In a complementary way, a chromaticity similarity was proposed to measure color distortions. The proposed metric, called MDSI, use a novel deviation pooling to compute the quality score from the two similarity maps. Extensive experimental results on natural and synthetic benchmark datasets prove that the proposed index is effective and reliable, has low complexity, and is fast enough to be used in real-time FR-IQA applications.

Acknowledgments

The authors thank the NSERC of Canada for their financial support under Grants RGPDD 451272-13 and RGPIN 138344-14.

References

  • [1] Z. Wang, “Applications of objective image quality assessment methods [applications corner],” IEEE Signal Processing Magazine, vol. 28, no. 6, pp. 137–142, Nov 2011.
  • [2] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.
  • [3] H. Yeganeh and Z. Wang, “Objective quality assessment of tone-mapped images,” IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 657–667, Feb 2013.
  • [4] H. Ziaei Nafchi, A. Shahkolaei, R. Farrahi Moghaddam, and M. Cheriet, “FSITM: A Feature Similarity Index For Tone-Mapped Images,” IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1026–1029, Aug 2015.
  • [5] Z. Wang and A. Bovik, “Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, Jan 2009.
  • [6] N. Damera-Venkata, T. Kite, W. Geisler, B. Evans, and A. Bovik, “Image quality assessment based on a degradation model,” IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 636–650, Apr 2000.
  • [7] Z. Wang, E. Simoncelli, and A. Bovik, “Multiscale structural similarity for image quality assessment,” in Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov 2003, pp. 1398–1402.
  • [8] H. Sheikh, A. Bovik, and G. de Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117–2128, Dec 2005.
  • [9] H. Sheikh and A. Bovik, “Image information and visual quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430–444, Feb 2006.
  • [10] D. Chandler and S. Hemami, “VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images,” IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2284–2298, Sept 2007.
  • [11] L. Zhang, D. Zhang, and X. Mou, “RFSIM: A feature based image quality assessment metric using Riesz transforms,” in 17th IEEE International Conference on Image Processing (ICIP), Sept 2010, pp. 321–324.
  • [12] E. C. Larson and D. M. Chandler, “Most apparent distortion: full-reference image quality assessment and the role of strategy,” Journal of Electronic Imaging, vol. 19, no. 1, p. 011006, 2010.
  • [13] M. Narwaria and W. Lin, “Objective image quality assessment based on support vector regression,” IEEE Transactions on Neural Networks, vol. 21, no. 3, pp. 515–519, March 2010.
  • [14] C. Li and A. C. Bovik, “Content-partitioned structural similarity index for image quality assessment,” Signal Processing: Image Communication, vol. 25, no. 7, pp. 517–526, 2010, special Issue on Image and Video Quality Assessment.
  • [15] Z. Wang and Q. Li, “Information content weighting for perceptual image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 5, pp. 1185–1198, May 2011.
  • [16] L. Zhang, D. Zhang, X. Mou, and D. Zhang, “FSIM: A Feature Similarity Index for Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, Aug 2011.
  • [17] A. Liu, W. Lin, and M. Narwaria, “Image quality assessment based on gradient similarity,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1500–1512, April 2012.
  • [18] L. Zhang and H. Li, “SR-SIM: A fast and high performance IQA index based on spectral residual,” in 19th IEEE International Conference on Image Processing (ICIP), Sept 2012, pp. 1473–1476.
  • [19] W. Xue, L. Zhang, X. Mou, and A. Bovik, “Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index,” IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 684–695, Feb 2014.
  • [20] H.-W. Chang, H. Yang, Y. Gan, and M.-H. Wang, “Sparse Feature Fidelity for Perceptual Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 4007–4018, Oct 2013.
  • [21] L. Zhang, Y. Shen, and H. Li, “VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 23, no. 10, pp. 4270–4281, Oct 2014.
  • [22] K. Gu, S. Wang, G. Zhai, W. Lin, X. Yang, and W. Zhang, “Analysis of distortion distribution for pooling in image quality prediction,” IEEE Transactions on Broadcasting, vol. 62, no. 2, pp. 446–456, June 2016.
  • [23] S. H. Bae and M. Kim, “A novel image quality assessment with globally and locally consilient visual quality perception,” IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 2392–2406, 2016.
  • [24] M. Oszust, “Decision fusion for image quality assessment using an optimization approach,” IEEE Signal Processing Letters, vol. 23, no. 1, pp. 65–69, Jan 2016.
  • [25] W. Lin and C.-C. J. Kuo, “Perceptual visual quality metrics: A survey,” Journal of Visual Communication and Image Representation, vol. 22, no. 4, pp. 297 – 312, 2011.
  • [26] A. Shnayderman, A. Gusev, and A. Eskicioglu, “An SVD-based grayscale image quality measure for local and global assessment,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 422–429, Feb 2006.
  • [27] S. Li, F. Zhang, L. Ma, and K. N. Ngan, “Image quality assessment by separately evaluating detail losses and additive impairments,” IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 935–949, Oct 2011.
  • [28] M. Sampat, Z. Wang, S. Gupta, A. Bovik, and M. Markey, “Complex wavelet structural similarity: A new image similarity index,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2385–2401, Nov 2009.
  • [29] Z. Wang and X. Shang, “Spatial pooling strategies for perceptual image quality assessment,” in IEEE International Conference on Image Processing, Oct 2006, pp. 2945–2948.
  • [30] A. Moorthy and A. Bovik, “Visual importance pooling for image quality assessment,” IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 2, pp. 193–201, April 2009.
  • [31] A. K. Moorthy and A. C. Bovik, “Perceptually significant spatial pooling techniques for image quality assessment,” Proc. SPIE, vol. 7240, pp. 724 012–724 012–11, 2009.
  • [32] G.-H. Chen, C.-L. Yang, and S.-L. Xie, “Gradient-based structural similarity for image quality assessment,” in IEEE International Conference on Image Processing, Oct 2006, pp. 2929–2932.
  • [33] D.-O. Kim, H.-S. Han, and R.-H. Park, “Gradient information-based image quality metric,” IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 930–936, May 2010.
  • [34] J.-M. Geusebroek, R. Van den Boomgaard, A. Smeulders, and H. Geerts, “Color invariance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 12, pp. 1338–1350, Dec 2001.
  • [35] G. F. I. Sobel, “A 3x3 isotropic gradient operator for image processing,” 1968, presented at a talk at the Stanford Artificial Project.
  • [36] I.-T. P. 1401, “Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models,” https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-P.1401-201207-I!!PDF-E&type=items, July 2012.
  • [37] H. Sheikh, Z. Wang, L. Cormack, and A. Bovik, “Live image quality assessment database release 2,” Online: http://live.ece.utexas.edu/research/quality.
  • [38] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti, “TID2008 - A Database for Evaluation of Full-Reference Visual Quality Assessment Metrics,” Advances of Modern Radioelectronics, vol. 10, pp. 30–45, 2009.
  • [39] N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, L. Jin, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. Kuo, “Color image database TID2013: Peculiarities and preliminary results,” in 4th European Workshop on Visual Information Processing (EUVIP), June 2013, pp. 106–111.
  • [40] N. B. H. H. M. L. E. D. S. G. A. Zaric, N. Tatalovic, “VCL@FER Image Quality Assessment Database,” AUTOMATIKA, vol. 53, no. 4, pp. 344–354, 2012.
  • [41] K. Gu, G. Zhai, W. Lin, and M. Liu, “The analysis of image contrast: From quality assessment to automatic enhancement,” IEEE Transactions on Cybernetics, vol. 46, no. 1, pp. 284–297, 2015.
  • [42] D. Kundu and B. L. Evans, “Full-reference visual quality assessment for synthetic images: A subjective study,” in Proc. IEEE Int. Conf. on Image Processing, 2015, pp. 2374–2378.
  • [43] C. Vu, T. Phan, P. Singh, , and D. M. Chandler, “Digitally Retouched Image Quality (DRIQ) Database,” Online: http://vision.okstate.edu/driq/, 2012.
  • [44] D. Lee and K. N. Plataniotis, “Towards a full-reference quality assessment for color images using directional statistics,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3950–3965, Nov 2015.
  • [45] H. Sheikh, M. Sabir, and A. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440–3451, Nov 2006.
  • [46] “Final report from the video quality experts group on the validation of objective models of video quality assessment,” ftp://vqeg.its.bldrdoc.gov/Documents/Meetings/Hillsboro_VQEG_Mar_03/VQEGIIDraftReportv2a.pdf, 2003.
  • [47] H. Ziaei Nafchi and M. Cheriet, “A note on efficiency of downsampling and color transformation in image quality assessment,” CoRR, vol. abs/1606.06152, 2016. [Online]. Available: http://arxiv.org/abs/1606.06152
  • [48] https://ece.uwaterloo.ca/~z70wang/research/ssim/.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
6027
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description