Mean Deviation Similarity Index: Efficient and Reliable Full-Reference Image Quality Evaluator
Abstract
Applications of perceptual image quality assessment (IQA) in image and video processing, such as image acquisition, image compression, image restoration, and multimedia communication, have led to the development of many IQA metrics. In this paper, a reliable full-reference IQA model is proposed that utilizes gradient similarity (GS), chromaticity similarity (CS), and deviation pooling (DP). Considering the shortcomings of the commonly used GS in modeling the human visual system (HVS), a new GS is proposed through a fusion technique that is more likely to follow the HVS. We propose an efficient and effective formulation to calculate the joint similarity map of two chromatic channels for the purpose of measuring color changes. In comparison with a commonly used formulation in the literature, the proposed CS map is shown to be more efficient and to provide comparable or better quality predictions. Motivated by a recent work that utilizes standard deviation pooling, a general formulation of the DP is presented in this paper and used to compute a final score from the proposed GS and CS maps. This proposed formulation of DP benefits from Minkowski pooling and from a proposed power pooling as well. The experimental results on six datasets of natural images, a synthetic dataset, and a digitally retouched dataset show that the proposed index provides comparable or better quality predictions than the most recent and competing state-of-the-art IQA metrics in the literature, while being reliable and of low complexity. The MATLAB source code of the proposed metric is available at https://www.mathworks.com/matlabcentral/fileexchange/59809.
I. Introduction
Automatic image quality assessment (IQA) plays a significant role in numerous image and video processing applications. IQA is commonly used in image acquisition, image compression, image restoration, multimedia streaming, etc. [1]. IQA models (IQAs) mimic the average quality predictions of human observers. Full-reference IQAs (FR-IQAs), which fall within the scope of this paper, evaluate the perceptual quality of a distorted image with respect to its reference image. This quality prediction is an easy task for the human visual system (HVS), and the result of the evaluation is reliable. Automatic quality assessment, i.e., objective evaluation, is not an easy task because images may suffer from various types and degrees of distortion. FR-IQAs can be employed to compare two images of the same dynamic range (usually low dynamic range) [2] or of different dynamic ranges [3, 4]. This paper is dedicated to IQA for low dynamic range images.
Among IQAs, the conventional metric mean squared error (MSE) and its variations are widely used because of their simplicity. However, in many situations, MSE does not correlate with the human perception of image fidelity and quality [5]. Because of this limitation, a number of IQAs have been proposed to provide better correlation with the HVS [6, 2, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. In general, these better performing metrics measure structural information, luminance information and contrast information in the spatial and frequency domains.
The most successful IQA models in the literature follow a top-down strategy [25]. They calculate a similarity map and use a pooling strategy that converts the values of this similarity map into a single quality score. For example, luminance, contrast, and structural information constitute a similarity map for the popular SSIM index [2]; SSIM then uses average pooling to compute the final similarity score. Different feature maps are used in the literature for the calculation of this similarity map. The feature similarity index (FSIM) uses phase congruency and gradient magnitude features. GS [17] uses a combination of designated gradient magnitudes and image contrast to this end, while GMSD [19] uses only the gradient magnitude. SR_SIM [18] uses saliency features and gradient magnitude. VSI [21] likewise benefits from saliency-based features and gradient magnitude. SVD-based features [26], features based on the Riesz transform [11], features in the wavelet domain [10, 27, 28], and sparse features [20] are used in the literature as well. Among these features, gradient magnitude is an efficient feature, as shown in [19]. In contrast, phase congruency and visual saliency features are in general not fast enough to compute. Therefore, the features being used play a significant role in the efficiency of IQAs.
As mentioned earlier, the computation of the similarity map is followed by a pooling strategy. The state-of-the-art pooling strategies for perceptual IQA are based on the mean and the weighted mean [2, 7, 15, 17, 16, 11]. They are robust pooling strategies that usually provide moderate to high performance for different IQAs. Minkowski pooling [29], local distortion pooling [29, 30, 12], percentile pooling [31], and saliency-based pooling [18, 21] are other possibilities. Standard deviation (SD) pooling was also proposed and successfully used by GMSD [19]. Image gradients are sensitive to image distortions, and different local structures in a distorted image suffer from different degrees of degradation. This is the motivation that the authors in [19] used to explore the standard deviation of the gradient-based local similarity map for overall image quality prediction. In general, the features that constitute the similarity map and the pooling strategy are very important factors in designing high performance IQA models.
Here, we propose an IQA model called the mean deviation similarity index (MDSI) that offers a very good compromise between prediction accuracy and model complexity. The proposed index is efficient, effective, and reliable at the same time. It also shows consistent performance for both natural and synthetic images. The proposed metric follows a top-down strategy. It uses gradient magnitude to measure structural distortions and chrominance features to measure color distortions. These two similarity maps are then combined into a gradient-chromaticity similarity map. We then propose a novel deviation pooling strategy and use it to compute the final quality score. Both image gradient [32, 33, 17, 16, 19, 21] and chrominance features [16, 21] have already been used in the literature. The proposed MDSI uses a new gradient similarity which is more likely to follow the HVS. Also, MDSI uses a new chromaticity similarity map which is efficient and shows good performance when used with the proposed metric. The proposed index uses summation over the similarity maps to give them independent weights. Moreover, little attention has been paid to the deviation pooling strategy, except for a special case of this type of pooling, namely standard deviation pooling [19]. We therefore provide a general formulation of the deviation pooling strategy and show its power in the case of the proposed IQA model. In the following, the main contributions of the paper, as well as its differences with respect to previous works, are briefly explained.
Unlike previous works [32, 33, 17, 16, 19, 21] that use a similar gradient similarity map, a new gradient similarity map is proposed in this paper which is more likely to follow the human visual system (HVS). This statement is supported by visual examples and experimental results.
This paper proposes a new chromaticity similarity map with the following advantages over the previously used chromaticity similarity maps [16, 21]: its complexity is lower, and it provides slightly better quality predictions when used with the proposed metric.
Motivated by a previous study that proposed to use standard deviation pooling [19], we propose a systematic and general formulation of the deviation pooling which has a comprehensive scope.
The rest of the paper is organized as follows. The proposed mean deviation similarity index is presented in section II. Extensive experimental results and discussion on six natural datasets, a synthetic dataset, and a digitally retouched dataset are provided in section III. Section IV presents our conclusions.
II. Mean Deviation Similarity Index
The proposed IQA model uses two similarity maps. Image gradient, which is sensitive to structural distortions, is used as the main feature to calculate the first similarity map. Then, color distortions are measured by a chromaticity similarity map. These similarity maps are combined and pooled by the proposed deviation pooling strategy. In this paper, conversion to luminance is done through the following formula: L = 0.2989 R + 0.5870 G + 0.1140 B. In addition, two chromaticity channels of a Gaussian color model [34] are used:

H = 0.30 R + 0.04 G − 0.35 B
M = 0.34 R − 0.60 G + 0.17 B    (1)
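As a concrete illustration, the color conversion above can be sketched in a few lines of NumPy. The coefficients are the standard luma weights and the Gaussian opponent-color values as reconstructed here; treat them as assumptions of this sketch and check them against the released MATLAB code.

```python
import numpy as np

def rgb_to_lhm(rgb):
    """Split an RGB image (H x W x 3, float) into luminance L and the two
    chromaticity channels H, M of the Gaussian color model.
    Coefficients are assumptions of this sketch, not authoritative values."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    L = 0.2989 * R + 0.5870 * G + 0.1140 * B   # luminance
    H = 0.30 * R + 0.04 * G - 0.35 * B         # first opponent channel
    M = 0.34 * R - 0.60 * G + 0.17 * B         # second opponent channel
    return L, H, M
```

For an achromatic pixel (R = G = B), both chromaticity channels are near zero, as expected of opponent-color channels.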
II-A. Gradient Similarity
Gradient magnitude in the discrete domain is commonly calculated using operators that approximate derivatives of the image function by differences. These operators approximate the horizontal and vertical gradients of an image f using convolution: G_x = h_x * f and G_y = h_y * f, where h_x and h_y are the horizontal and vertical gradient operators and * denotes convolution. The first derivative magnitude is defined as G = sqrt(G_x^2 + G_y^2). The Sobel operator [35], the Scharr operator, and the Prewitt operator are common gradient operators that approximate first derivatives. Within the proposed IQA model, these operators perform almost the same.
Throughout this paper, the Prewitt operator is used to compute the gradient magnitudes of the luminance channels of the reference and distorted images, G_R and G_D. From these, the gradient similarity (GS) is computed by the following SSIM-induced equation:

GS(x) = (2 G_R(x) G_D(x) + C1) / (G_R(x)^2 + G_D(x)^2 + C1)    (2)

where the constant C1 controls numerical stability. The gradient similarity (GS) is widely used in the literature [32, 33, 17, 16, 19, 21], and its usefulness for measuring image distortions was extensively investigated in [19].
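The two steps above can be sketched as follows; the 1/3-scaled Prewitt kernels and the value of C1 are assumptions of this sketch, not values taken from the paper.

```python
import numpy as np

def prewitt_gradient(img):
    """Gradient magnitude via 3x3 Prewitt kernels (1/3-scaled variant).
    Plain cross-correlation is used: the sign flip relative to true
    convolution is irrelevant for the magnitude."""
    hx = np.array([[1.0, 0.0, -1.0]] * 3) / 3.0   # horizontal derivative
    hy = hx.T                                      # vertical derivative
    pad = np.pad(img, 1, mode='edge')
    H, W = img.shape
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for i in range(3):
        for j in range(3):
            win = pad[i:i + H, j:j + W]
            gx += hx[i, j] * win
            gy += hy[i, j] * win
    return np.sqrt(gx**2 + gy**2)

def gradient_similarity(g_r, g_d, c1=140.0):
    """SSIM-style similarity of two gradient-magnitude maps, eq. (2).
    The default c1 is a placeholder assumption."""
    return (2.0 * g_r * g_d + c1) / (g_r**2 + g_d**2 + c1)
```

For identical images the map equals 1 everywhere, since the numerator and denominator of eq. (2) coincide when G_R = G_D.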
In many scenarios, the human visual system (HVS) disagrees with the judgments provided by the GS for structural distortions. In fact, in such a formulation, there is no difference between an edge added to the distorted image D and an edge removed from the reference image R. An extra edge in D attracts less attention from the HVS if its color is close to that of the corresponding pixels of that edge in R. Likewise, the HVS pays less attention to an edge removed from R that is replaced with pixels of the same or nearly the same color. In another scenario, suppose that edges are preserved in D but with different colors than in R. In this case, GS is likely to fail at providing a good judgment "on the edges". These shortcomings of the GS motivated us to propose a new GS map.
II-B. The Proposed Gradient Similarity
The aforementioned shortcomings of the conventional gradient similarity map (equation 2) arise mainly because G_R and G_D are computed independently of each other. In the following, we propose a fusion technique to include the correlation between the R and D images in the computation of the gradient similarity map.
We fuse the luminance channels of R and D by a simple averaging: F = 0.5 (R + D). Two extra GS maps are computed as follows:
GS12(x) = (2 G_R(x) G_F(x) + C2) / (G_R(x)^2 + G_F(x)^2 + C2)    (3)

GS13(x) = (2 G_D(x) G_F(x) + C2) / (G_D(x)^2 + G_F(x)^2 + C2)    (4)

where G_F is the gradient magnitude of the fused image F, and C2 is used for numerical stability. Note that GS12 and GS13 are in general not equal, and that C1 and C2 may or may not be set equal. The proposed gradient similarity (ĜS) is computed by:

ĜS(x) = GS(x) + GS13(x) − GS12(x)    (5)
The added term GS13 − GS12 puts more emphasis on edges removed from R than on edges added to D. For weak added/removed edges, it is likely that these weak edges smooth out in the fused image F. Therefore, ĜS always puts less emphasis on weak edges.
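A minimal sketch of eq. (5), reusing the conventional SSIM-style similarity; the constants c1 and c2, and the toy gradient values in the usage check, are assumptions for illustration only.

```python
import numpy as np

def grad_sim(a, b, c):
    # SSIM-style similarity between two gradient-magnitude maps.
    return (2.0 * a * b + c) / (a**2 + b**2 + c)

def proposed_gradient_similarity(g_r, g_d, g_f, c1=140.0, c2=55.0):
    """Proposed map of eq. (5): GS + GS13 - GS12, where g_r, g_d, g_f
    are the gradient magnitudes of the reference, distorted, and fused
    (F = 0.5*(R + D)) luminance images. c1, c2 are placeholder values."""
    gs   = grad_sim(g_r, g_d, c1)   # conventional GS, eq. (2)
    gs12 = grad_sim(g_r, g_f, c2)   # reference vs. fused, eq. (3)
    gs13 = grad_sim(g_d, g_f, c2)   # distorted vs. fused, eq. (4)
    return gs + gs13 - gs12
```

As a sanity check of the asymmetry described above: with a toy removed edge (g_r = 10, g_d = 0) versus a toy added edge (g_r = 0, g_d = 10), both with g_f = 5, the removed edge receives a lower similarity, i.e., a stronger penalty.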
Comparing some outputs of GS and ĜS visually at this step might not be fair because they have different numerical scales: GS is bounded between 0 and 1, while ĜS might have negative values greater than −1 and/or positive values smaller than +2. Therefore, this comparison is performed on the final similarity map and is presented in subsection II-E, along with more explanation of how the proposed ĜS works.
II-C. Chromaticity Similarity
In the case of color changes, and especially when the structure of the distorted image remains unchanged, the gradient similarity (GS) and the proposed ĜS may lead to inaccurate quality predictions. Therefore, previous works such as [16, 21] used a color similarity map to measure color differences. Let (H_R, M_R) and (H_D, M_D) denote the two chromaticity channels of the reference and distorted images, regardless of the type of the color space. In [16, 21], a color similarity is computed for each channel and the results are combined as:

CS(x) = [(2 H_R(x) H_D(x) + C3) / (H_R(x)^2 + H_D(x)^2 + C3)] · [(2 M_R(x) M_D(x) + C3) / (M_R(x)^2 + M_D(x)^2 + C3)]    (6)

where C3 is a constant to control numerical stability. In this paper, we propose a new formulation to calculate color similarity. The proposed formulation calculates a color similarity map using both chromaticity channels at once:

ĈS(x) = (2 (H_R(x) H_D(x) + M_R(x) M_D(x)) + C3) / (H_R(x)^2 + H_D(x)^2 + M_R(x)^2 + M_D(x)^2 + C3)    (7)
Similar to the CS in equation (6), the above joint color similarity (ĈS) formulation gives equal weight to both chromaticity channels H and M. It is clear that ĈS is more efficient than CS: CS needs 7 multiplications, 6 summations, 2 divisions, and 2 shift operations (multiplications by 2), while ĈS needs 6 multiplications, 6 summations, 1 division, and 1 shift operation. Note that CS can also be computed with 8 multiplications, 6 summations, 1 division, and 2 shift operations. In the experimental results section, an experiment is conducted to compare the usefulness of CS and ĈS along with the proposed metric.
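The joint formulation of eq. (7) can be sketched as a single fraction over both channels; the value of c3 here is a placeholder assumption.

```python
import numpy as np

def joint_chromaticity_similarity(h_r, h_d, m_r, m_d, c3=550.0):
    """Proposed joint CS map of eq. (7): both chromaticity channels are
    handled in one fraction (6 multiplications, 1 division, and one
    doubling per pixel). c3 is a placeholder assumption."""
    num = 2.0 * (h_r * h_d + m_r * m_d) + c3
    den = h_r**2 + h_d**2 + m_r**2 + m_d**2 + c3
    return num / den
```

When the chromaticity channels of the two images are identical, the numerator and denominator coincide and the map equals 1 everywhere.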
The gradient similarity maps (GS or ĜS) can be combined with the joint color similarity map ĈS through the following summation (weighted averaging) scheme:

GCS(x) = α GS(x) + (1 − α) ĈS(x)    (8)

ĜCS(x) = α ĜS(x) + (1 − α) ĈS(x)    (9)
where the parameter α adjusts the relative importance of the gradient and chromaticity similarity maps. The proposed metric MDSI uses equation (9); equation (8) is included for comparison with equation (9). An alternative combination scheme, which is very popular in the state of the art, is multiplication in the form GS(x)^β · ĈS(x)^γ, where the exponents β and γ adjust the relative importance of the two similarity maps. For several reasons, the proposed index uses the summation scheme (refer to subsection III-E).
In Fig. 1, two examples are provided to show that these two similarity maps, i.e., GS and ĈS, are complementary. In the first example, there is a considerable difference between the gradient maps of the reference and the distorted images; hence, the GS map is enough for a good judgment. However, this difference in the second example (second row) is trivial, which leads to a wrong prediction when GS is used as the only similarity map. The examples in Fig. 1 show that the gradient similarity and the chromaticity similarity are complementary.
II-D. Deviation Pooling
The motivation for using the deviation pooling is that the HVS is sensitive to both the magnitude and the spread of the distortions across the image. Other pooling strategies, such as Minkowski pooling and percentile pooling, adjust the magnitude of distortions or discard the less/non-distorted pixels. These pooling strategies and the mean pooling do not take into account the spread of the distortions. It is shown in [19], by case examples and experimental results, that a common wrong prediction by mean pooling occurs when it computes the same quality scores for two distorted images of different types. In such cases, deviation pooling is likely to provide good judgments of their quality through the spread of the distortions. This is the reason why mean pooling has good intra-class (one distortion type) quality prediction, but its performance might be degraded for inter-class (whole dataset) quality prediction. While this statement can be verified from the experimental results provided in [19], an example is also provided in subsection III-D to support it. The human visual system penalizes more severe distortions much more than distortion-free regions, and these pixels may constitute different fractions of distorted images. Mean pooling, however, depending on this fraction, is likely to nullify the impact of the more severe distortions by including distortion-free regions in the average computation. Fig. 2 shows the overlapped histograms of two similarity maps corresponding to two distorted images. While mean pooling indicates that image #1 is of better quality than image #2, deviation pooling provides the opposite assessment. Image #1 has more severe distortions than image #2, with similarity values lying farther from the mean; hence, there are larger deviations in the similarity map of image #1 than in that of image #2. Therefore, deviation pooling is an alternative to mean pooling that can also measure different levels of distortions.
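A toy numeric example makes the point concrete: two similarity maps with identical means receive the same mean-pooled score, while their mean-absolute-deviation scores differ. The values below are illustrative, not taken from the paper.

```python
import numpy as np

# Image #1: a few severely distorted pixels among perfect ones.
s1 = np.array([1.0, 1.0, 1.0, 0.2, 0.2])
# Image #2: mild distortion spread uniformly over the image.
s2 = np.array([0.68, 0.68, 0.68, 0.68, 0.68])

mean1, mean2 = s1.mean(), s2.mean()   # identical means: 0.68
mad1 = np.mean(np.abs(s1 - mean1))    # large spread -> worse quality
mad2 = np.mean(np.abs(s2 - mean2))    # zero spread
```

Mean pooling cannot distinguish the two maps, whereas the deviation-based score separates the image with a few severe distortions from the uniformly, mildly distorted one.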
In the following, we propose the deviation pooling (DP) strategy and provide a general formulation of this pooling.
DP for IQAs is rarely used in the literature, except for the standard deviation used in GMSD [19], which is a special case of DP. A deviation can be seen as the Minkowski distance of order p between the vector x and its MCT (measure of central tendency):

D_p(x) = ( (1/n) Σ_{i=1}^{n} | x_i − MCT(x) |^p )^{1/p}    (10)

where p indicates the type of deviation. The only MCT used in this paper is the mean. Though other MCTs, such as the median and the mode, can be used, we found that they do not provide satisfactory quality predictions.
Several studies have shown that placing more emphasis on the more severe distortions can lead to more accurate predictions [29, 31]; the Minkowski pooling [29] and the percentile pooling [31] are two examples. As mentioned before, these pooling strategies follow a property of the HVS that penalizes severe distortions much more than less distorted regions, even though they may constitute a small portion of the total distortions. Hence, they try to moderate the weakness of the mean pooling by adjusting the magnitudes of distortions [29] or by discarding the less/non-distorted regions [31]. The deviation pooling can be generalized to take this property of the HVS into account:

D_p^q(x) = ( (1/n) Σ_{i=1}^{n} | x_i^q − MCT(x^q) |^p )^{1/p}    (11)
where q adjusts the emphasis on the values in the vector x, and the MCT is calculated over the x^q values. Furthermore, we propose to use power pooling in conjunction with the deviation pooling to control the numerical behavior of the final quality scores:

DP(x) = [ ( (1/n) Σ_{i=1}^{n} | x_i^q − MCT(x^q) |^p )^{1/p} ]^ρ    (12)
where ρ is the power pooling applied to the final value of the deviation. The power pooling can be used to make an IQA model more linear versus the subjective scores, or might be used for better visualization of the scores. Linearity might not be a significant advantage of an IQA, but it is pointed out as being of interest in [19]. Also, according to [36], linearity against subjective data is one of the measures for validation of IQAs that should be examined (though linearity is measured after a nonlinear analysis). The power pooling can also have a small impact on the values of the Pearson linear Correlation Coefficient (PCC) and the Root Mean Square Error (RMSE). Note that the above deviation pooling is equal to the Minkowski pooling [29] when MCT = 0, p = 1, and ρ = 1. It is equal to the mean absolute deviation (MAD) to the power of ρ for p = 1, and equal to the standard deviation (SD) to the power of ρ for p = 2. The three parameters should be set according to the IQA model; more analysis on these three parameters can be found in the experimental results section. For the proposed index MDSI, we set p = 1, q = 1/4, and ρ = 1/4. Therefore, the proposed IQA model can be written as:
MDSI = ( (1/n) Σ_{i=1}^{n} | ĜCS_i^{1/4} − ( (1/n) Σ_{j=1}^{n} ĜCS_j^{1/4} ) | )^{1/4}    (13)
Note that the possible interval of ĜCS values is bounded, and it is worth mentioning that the values of ĜCS mostly remain in [0, 1]. Values close to 0 correspond to highly distorted pixels, while values close to 1 refer to less/non-distorted pixels. The global variation of ĜCS is computed by the mean absolute deviation, which is followed by power pooling. Note that since the absolute value of the deviations is computed, the quality scores are positive. Larger values of the quality predictions provided by the proposed index indicate more severely distorted images, while an image with perfect quality is assessed by a quality score of zero, since there is no variation in its similarity map. The important point about the use of the Minkowski pooling on final similarity maps is that terms like "more emphasis" and "less emphasis", regardless of the values used, also depend on the pooling strategy and the underlying similarity map. For example, placing more emphasis on highly distorted regions through Minkowski pooling will decrease the quality score computed by the mean pooling, but the quality score provided by the deviation pooling might become larger or smaller depending on the spread of the distortions, which is directly related to the underlying similarity map.
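Under the reading above (mean as the MCT, with p = 1, q = 1/4, and ρ = 1/4 for MDSI, which should be checked against the released code), the generalized deviation pooling can be sketched as:

```python
import numpy as np

def deviation_pool(x, p=1.0, q=0.25, rho=0.25):
    """Generalized deviation pooling of eq. (12) with the mean as MCT.
    p selects the deviation type (1 -> MAD, 2 -> SD), q re-emphasizes
    the similarity values, and rho is the final power pooling. x is
    assumed non-negative (a similarity map), so x**q is well defined."""
    xq = np.asarray(x, dtype=float) ** q
    dev = np.mean(np.abs(xq - xq.mean()) ** p) ** (1.0 / p)
    return dev ** rho
```

A constant (distortion-free) map scores exactly zero, and a map with a wider spread of similarity values scores higher, i.e., is predicted to be of worse quality.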
II-E. Analysis and Examples of GCS Maps
In this section, the final similarity maps after applying the Minkowski pooling, i.e., GCS and ĜCS, are compared along with sufficient explanations. The difference between these two similarity maps is their use of gradient similarity: GCS uses the conventional GS, while ĜCS uses the proposed gradient similarity ĜS. The best way to analyze the effect of the proposed gradient similarity is through step-by-step explanation and visualization of different examples. In subsection II-A, several disadvantages of the traditional GS were mentioned. Here, each of them is explained and examples are provided.
Case 1 (Removed edge)
Missing edges in the distorted image D with respect to its original image R mean that structural information has been removed; hence, this disappearance attracts the attention of the HVS. These regions have to be strongly highlighted in the similarity map.
Case 2 (A weak added/removed edge)
An extra edge in D, or an edge removed from R, attracts less attention from the HVS if its color is close to that of the surrounding pixels of that edge, or simply if it is a weak edge.
Fig. 3 shows how the proposed gradient similarity performs for case 1 and case 2 as a part of ĜCS, compared to the conventional GS used in GCS. We can see that the GS map highlights the differences in detail. The edges corresponding to the location of the ropes in the original image are mainly replaced with pixels of another color (dark replaced with green), but many other edges of smaller strength in R are replaced with pixels having the same color (green); the latter also holds for edges added to the distorted image. In the fused image F, some of these weaker edges are smoothed out, which can be seen by comparing the corresponding gradient maps. Both GS and GS12 indicate large differences at the location of the ropes. GS13 also puts high emphasis on this location, but less emphasis on the weaker edges. The result is then reduced by GS12, which in turn places even less emphasis on the weak edges. Note that GCS and ĜCS have different numerical behavior, so it is fair to compare them through the final pooled maps. Compared to GCS, ĜCS indicates larger differences at the location of the ropes, but smaller differences elsewhere.
Case 3 (Preserved edge but with different color)
Although a color similarity map should measure color differences at the location of the inverted edges, edges constitute a small fraction of the total pixels in images, and it is common to give smaller weights to a color similarity map than to structural similarities such as gradient similarity. While the traditional gradient similarity does not work well in this situation, the proposed gradient similarity can partially solve this problem. Fig. 4 provides an example in which most of the edges are inverted in the distorted image. We can see that ĜCS highlights many more differences than GCS at these locations, thanks to the term GS13 − GS12 added to the traditional gradient similarity. In fact, GS12 is likely to be different from GS13 in this case, because these inverted edges are likely to become closer to their surrounding pixels in either the R or the D image.
III. Experimental Results and Discussion
In the experiments, eight datasets were used. The LIVE dataset [37] contains 29 reference images and 779 distorted images of five categories. The TID2008 [38] dataset contains 25 reference images and 1700 distorted images. For each reference image, 17 types of distortions of 4 degrees are available. CSIQ [12] is another dataset that consists of 30 reference images; each is distorted using six different types of distortions at four to five levels of distortion. The large TID2013 [39] dataset contains 25 reference images and 3000 distorted images. For each reference image, 24 types of distortions of 5 degrees are available. VCL@FER database [40] consists of 23 reference images and 552 distorted images, with four degradation types and six degrees of degradation. In addition to these five datasets, contrast distorted images of the CCID2014 dataset [41] are used in the experiments. This dataset contains 655 contrast distorted images of five types. Gamma transfer, convex and concave arcs, cubic and logistic functions, mean shifting, and a compound function are used to generate these five types of distortions. We also used the ESPL synthetic image database [42] which contains 25 synthetic images of video games and animated movies. It contains 500 distorted images of 5 categories. Fig. 5 shows an example of a reference and a distorted synthetic image. Finally, the digitally retouched image quality (DRIQ) dataset [43] was used in the experiments. It contains 26 reference images and 3 enhanced images for each reference image.
MS-SSIM  VIF  MAD  IW-SSIM  SR_SIM  FSIM  GMSD  SFF  VSI  DSCSI  ADD-GSIM  SCQI  MDSI  

SRC  0.8542  0.7491  0.8340  0.8559  0.8913  0.8840  0.8907  0.8767  0.8979  0.8634  0.9094  0.9051  0.9208  
TID  PCC  0.8451  0.8084  0.8290  0.8579  0.8866  0.8762  0.8788  0.8817  0.8762  0.8445  0.9120  0.8899  0.9160  
2008  KRC  0.6568  0.5861  0.6445  0.6636  0.7149  0.6991  0.7092  0.6882  0.7123  0.6651  0.7389  0.7294  0.7515  
RMSE  0.7173  0.7899  0.7505  0.6895  0.6206  0.6468  0.6404  0.6333  0.6466  0.7187  0.5504  0.6120  0.5383  
SRC  0.9133  0.9195  0.9467  0.9213  0.9319  0.9310  0.9570  0.9627  0.9423  0.9417  0.9422  0.9434  0.9569  
CSIQ  PCC  0.8991  0.9277  0.9500  0.9144  0.9250  0.9192  0.9541  0.9643  0.9279  0.9313  0.9342  0.9268  0.9531  
KRC  0.7393  0.7537  0.7970  0.7529  0.7725  0.7690  0.8129  0.8288  0.7857  0.7787  0.7894  0.7870  0.8130  
RMSE  0.1149  0.0980  0.0820  0.1063  0.0997  0.1034  0.0786  0.0695  0.0979  0.0956  0.0937  0.0986  0.0795  
SRC  0.9513  0.9636  0.9669  0.9567  0.9618  0.9645  0.9603  0.9649  0.9524  0.9487  0.9681  0.9406  0.9667  
LIVE  PCC  0.9489  0.9604  0.9675  0.9522  0.9553  0.9613  0.9603  0.9632  0.9482  0.9434  0.9657  0.9344  0.9659  
KRC  0.8044  0.8282  0.8421  0.8175  0.8299  0.8363  0.8268  0.8365  0.8058  0.7982  0.8474  0.7835  0.8395  
RMSE  8.6188  7.6137  6.9072  8.3472  8.0812  7.5296  7.6214  7.3460  8.6817  9.0635  7.0925  9.7355  7.0790  
SRC  0.7859  0.6769  0.7807  0.7779  0.8073  0.8510  0.8044  0.8513  0.8965  0.8744  0.8285  0.9052  0.8899  
TID  PCC  0.8329  0.7720  0.8267  0.8319  0.8663  0.8769  0.8590  0.8706  0.9000  0.8782  0.8807  0.9071  0.9085  
2013  KRC  0.6047  0.5147  0.6035  0.5977  0.6406  0.6665  0.6339  0.6581  0.7183  0.6862  0.6646  0.7327  0.7123  
RMSE  0.6861  0.7880  0.6976  0.6880  0.6193  0.5959  0.6346  0.6099  0.5404  0.5930  0.5871  0.5219  0.5181  
SRC  0.9227  0.8866  0.9061  0.9163  0.9021  0.9323  0.9177  0.7738  0.9317  0.9289  0.9366  0.9083  0.9318  
VCL@  PCC  0.9232  0.8938  0.9053  0.9191  0.9023  0.9329  0.9176  0.7761  0.9320  0.9338  0.9339  0.9107  0.9349  
FER  KRC  0.7497  0.6924  0.7213  0.7372  0.7183  0.7643  0.7406  0.5779  0.7633  0.7588  0.7731  0.7316  0.7629  
RMSE  9.4398  11.014  10.433  9.6788  10.589  8.8480  9.7643  15.488  8.9051  8.7902  8.7819  10.147  8.7136  
SRC  0.7770  0.8349  0.7451  0.7811  0.7363  0.7657  0.8077  0.6859  0.7734  0.7400  0.8698  0.7811  0.8128  
CCID  PCC  0.8278  0.8588  0.7516  0.8342  0.7834  0.8204  0.8521  0.7575  0.8209  0.7586  0.8935  0.8200  0.8576  
2014  KRC  0.5845  0.6419  0.5490  0.5898  0.5372  0.5707  0.6100  0.5012  0.5735  0.5468  0.6840  0.5812  0.6181  
RMSE  0.3668  0.3350  0.4313  0.3606  0.4064  0.3739  0.3422  0.4269  0.3734  0.4260  0.2936  0.3734  0.3363  
SRC  0.7247  0.7488  0.8624  0.8270  0.8802  0.8766  0.8209  0.8127  0.8717  0.7263  0.7828  0.8292  0.8806  
ESPL  PCC  0.7322  0.7423  0.8677  0.8300  0.8732  0.8738  0.8234  0.8179  0.8726  0.7302  0.7902  0.8356  0.8802  
KRC  0.5208  0.5565  0.6720  0.6221  0.6932  0.6853  0.6178  0.6127  0.6765  0.5222  0.5814  0.6243  0.6895  
RMSE  9.4519  9.2985  6.8985  7.7404  6.7646  6.7482  7.8753  7.9844  6.7791  9.4815  8.5053  7.6241  6.5862  
SRC  0.6692  0.8078  0.6867  0.6903  0.7551  0.7751  0.7762  0.8342  0.8222  0.8167  0.7661  0.8482  0.8508  
DRIQ  PCC  0.7058  0.8496  0.6967  0.7155  0.8027  0.7989  0.8001  0.8420  0.8477  0.8463  0.8053  0.8638  0.8702  
KRC  0.4739  0.5997  0.4898  0.4952  0.5604  0.5771  0.5758  0.6477  0.6177  0.6104  0.5618  0.6490  0.6557  
RMSE  1.4450  1.0759  1.4631  1.4249  1.2165  1.2268  1.2235  1.1004  1.0820  1.0864  1.2092  1.0277  1.0050  
Direct average
SRC  0.8248  0.8234  0.8411  0.8408  0.8583  0.8725  0.8669  0.8453  0.8860  0.8550  0.8754  0.8826  0.9013  
PCC  0.8394  0.8516  0.8493  0.8569  0.8743  0.8824  0.8807  0.8591  0.8907  0.8583  0.8894  0.8860  0.9108  
KRC  0.6418  0.6466  0.6649  0.6595  0.6834  0.6960  0.6909  0.6689  0.7067  0.6708  0.7051  0.7023  0.7303  
Weighted average
SRC  0.8335  0.7783  0.8374  0.8387  0.8578  0.8769  0.8626  0.8585  0.8974  0.8698  0.8783  0.8977  0.9066  
PCC  0.8521  0.8287  0.8546  0.8626  0.8810  0.8877  0.8838  0.8729  0.8963  0.8679  0.8995  0.8967  0.9160  
KRC  0.6511  0.6112  0.6626  0.6587  0.6880  0.6999  0.6913  0.6791  0.7206  0.6855  0.7140  0.7231  0.7375 
For objective evaluation, four popular evaluation metrics were used in the experiments: the Spearman rank-order correlation coefficient (SRC), the Pearson linear correlation coefficient (PCC) after a nonlinear regression analysis (equation 14), the Kendall rank correlation coefficient (KRC), and the root mean square error (RMSE). The SRC, PCC, and RMSE metrics measure prediction monotonicity, prediction linearity, and prediction accuracy, respectively. The KRC was used to evaluate the degree of similarity between quality scores and MOS. In addition, the Pearson linear correlation coefficient without nonlinear analysis is used and denoted by LPCC.
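For reference, the SRC is simply the Pearson correlation of rank-transformed scores; the minimal sketch below ignores tie correction, which production implementations (e.g. scipy.stats.spearmanr) handle.

```python
import numpy as np

def spearman_rank_corr(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks.
    Minimal sketch; ties are not corrected for."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)
        return r
    rx = ranks(np.asarray(x, dtype=float))
    ry = ranks(np.asarray(y, dtype=float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))
```

A perfectly monotonic relation between objective scores and MOS yields SRC = 1, and a perfectly reversed one yields SRC = −1, regardless of any nonlinear (but monotonic) mapping between the two.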
Twelve state-of-the-art IQA models were chosen for comparison [7, 9, 12, 15, 18, 16, 19, 20, 21, 44, 22, 23], including the most recent indices in the literature [20, 19, 21, 44, 22, 23]. It should be noted that the five indices SFF [20], GMSD [19], VSI [21], ADD-GSIM [22], and SCQI [23] have shown superior performance over state-of-the-art indices.
III-A. Performance Comparison
In Table I, the overall performance of thirteen IQA models on eight benchmark datasets, i.e., TID2008, CSIQ, LIVE, TID2013, VCL@FER, CCID2014, ESPL, and DRIQ, is listed. For each dataset and evaluation metric, the top three IQA models are highlighted. On the eight datasets, MDSI is 31 times among the top indices, followed by ADD-GSIM (16 times), SCQI (12 times), SFF/FSIM/VIF (6 times), VSI (5 times), GMSD/SR_SIM/MAD (4 times; note the conflict between "MAD" [12] as an IQA model and "MAD" as a pooling strategy), DSCSI (2 times), and MS-SSIM/IW-SSIM (0 times). To provide a conclusion on the overall performance of these indices, direct and weighted (the dataset-size-weighted average is commonly used in the literature [15, 20, 19, 21]) overall performances on the eight datasets (8150 images) are also listed in Table I. It can be seen that MDSI has the best overall performance on the eight datasets, while the metrics VSI and SCQI are the second and third best, respectively.
III-B. Visualization and Statistical Evaluation
TID2008  1  2  3  4  5  6  7  8  9  10  sum  
1  VIF  ·  -1  -1  -1  -1  -1  -1  -1  -1  -1  -9 
2  MAD  +1  ·  -1  -1  -1  -1  -1  -1  -1  -1  -7 
3  SR_SIM  +1  +1  ·  +1  +1  +1  +1  -1  -1  -1  +3 
4  FSIM  +1  +1  -1  ·  -1  -1  +1  -1  -1  -1  -3 
5  GMSD  +1  +1  -1  +1  ·  -1  +1  -1  -1  -1  -1 
6  SFF  +1  +1  -1  +1  +1  ·  +1  -1  -1  -1  +1 
7  VSI  +1  +1  -1  0  -1  -1  ·  -1  -1  -1  -4 
8  ADD-GSIM  +1  +1  +1  +1  +1  +1  +1  ·  +1  -1  +7 
9  SCQI  +1  +1  +1  +1  +1  +1  +1  -1  ·  -1  +5 
10  MDSI  +1  +1  +1  +1  +1  +1  +1  +1  +1  ·  +9 
CSIQ  1  2  3  4  5  6  7  8  9  10  sum  
1  VIF  ·  -1  +1  +1  -1  -1  0  -1  0  -1  -3 
2  MAD  +1  ·  +1  +1  -1  -1  +1  +1  +1  -1  +3 
3  SR_SIM  -1  -1  ·  +1  -1  -1  -1  -1  -1  -1  -7 
4  FSIM  -1  -1  -1  ·  -1  -1  -1  -1  -1  -1  -9 
5  GMSD  +1  +1  +1  +1  ·  -1  +1  +1  +1  0  +6 
6  SFF  +1  +1  +1  +1  +1  ·  +1  +1  +1  +1  +9 
7  VSI  0  -1  +1  +1  -1  -1  ·  -1  0  -1  -3 
8  ADD-GSIM  +1  -1  +1  +1  -1  -1  +1  ·  +1  -1  +1 
9  SCQI  0  -1  +1  +1  -1  -1  0  -1  ·  -1  -3 
10  MDSI  +1  +1  +1  +1  0  -1  +1  +1  +1  ·  +6 
LIVE          1   2   3   4   5   6   7   8   9  10  sum
 1 VIF        –  -1  +1   0   0  -1  +1  -1  +1  -1   -1
 2 MAD       +1   –  +1  +1  +1  +1  +1   0  +1   0   +7
 3 SR-SIM    -1  -1   –  -1  -1  -1  +1  -1  +1  -1   -5
 4 FSIM       0  -1  +1   –   0   0  +1  -1  +1  -1    0
 5 GMSD       0  -1  +1   0   –  -1  +1  -1  +1  -1   -1
 6 SFF       +1  -1  +1   0  +1   –  +1  -1  +1  -1   +2
 7 VSI       -1  -1  -1  -1  -1  -1   –  -1  +1  -1   -7
 8 ADD-GSIM  +1   0  +1  +1  +1  +1  +1   –  +1   0   +7
 9 SCQI      -1  -1  -1  -1  -1  -1  -1  -1   –  -1   -9
10 MDSI      +1   0  +1  +1  +1  +1  +1   0  +1   –   +7
TID2013       1   2   3   4   5   6   7   8   9  10  sum
 1 VIF        –  -1  -1  -1  -1  -1  -1  -1  -1  -1   -9
 2 MAD       +1   –  -1  -1  -1  -1  -1  -1  -1  -1   -7
 3 SR-SIM    +1  +1   –  -1  +1  -1  -1  -1  -1  -1   -3
 4 FSIM      +1  +1  +1   –  +1  +1  -1  -1  -1  -1   +1
 5 GMSD      +1  +1  -1  -1   –  -1  -1  -1  -1  -1   -5
 6 SFF       +1  +1  +1  -1  +1   –  -1   0  -1  -1    0
 7 VSI       +1  +1  +1  +1  +1  +1   –  +1  -1  -1   +5
 8 ADD-GSIM  +1  +1  +1  +1  +1  +1  -1   –  -1  -1   +3
 9 SCQI      +1  +1  +1  +1  +1  +1  +1  +1   –  -1   +7
10 MDSI      +1  +1  +1  +1  +1  +1  +1  +1  +1   –   +9
VCL@FER       1   2   3   4   5   6   7   8   9  10  sum
 1 VIF        –  -1  -1  -1  -1  +1  -1  -1  -1  -1   -7
 2 MAD       +1   –  +1  -1  -1  +1  -1  -1  -1  -1   -3
 3 SR-SIM    +1  -1   –  -1  -1  +1  -1  -1  -1  -1   -5
 4 FSIM      +1  +1  +1   –  +1  +1   0   0  +1   0   +6
 5 GMSD      +1  +1  +1  -1   –  +1  -1  -1  +1  -1   +1
 6 SFF       -1  -1  -1  -1  -1   –  -1  -1  -1  -1   -9
 7 VSI       +1  +1  +1   0  +1  +1   –   0  +1  -1   +5
 8 ADD-GSIM  +1  +1  +1   0  +1  +1   0   –  +1   0   +6
 9 SCQI      +1  +1  +1  -1  -1  +1  -1  -1   –  -1   -1
10 MDSI      +1  +1  +1   0  +1  +1  +1   0  +1   –   +7
CCID2014      1   2   3   4   5   6   7   8   9  10  sum
 1 VIF        –  +1  +1  +1  +1  +1  +1  -1  +1   0   +6
 2 MAD       -1   –  -1  -1  -1  -1  -1  -1  -1  -1   -9
 3 SR-SIM    -1  +1   –  -1  -1  +1  -1  -1  +1  -1   -3
 4 FSIM      -1  +1  +1   –  -1  +1   0  -1  +1  -1    0
 5 GMSD      -1  +1  +1  +1   –  +1  +1  -1  +1  -1   +3
 6 SFF       -1  +1  -1  -1  -1   –  -1  -1   0  -1   -6
 7 VSI       -1  +1  +1   0  -1  +1   –  -1  +1  -1    0
 8 ADD-GSIM  +1  +1  +1  +1  +1  +1  +1   –  +1  +1   +9
 9 SCQI      -1  +1  -1  -1  -1   0  -1  -1   –  -1   -6
10 MDSI       0  +1  +1  +1  +1  +1  +1  -1  +1   –   +6
ESPL          1   2   3   4   5   6   7   8   9  10  sum
 1 VIF        –  -1  -1  -1  -1  -1  -1  -1  -1  -1   -9
 2 MAD       +1   –  -1  -1  +1  +1  -1  +1  +1  -1   +1
 3 SR-SIM    +1  +1   –   0  +1  +1  +1  +1  +1  -1   +6
 4 FSIM      +1  +1   0   –  +1  +1  +1  +1  +1  -1   +6
 5 GMSD      +1  -1  -1  -1   –  +1  -1  +1  -1  -1   -3
 6 SFF       +1  -1  -1  -1  -1   –  -1  +1  -1  -1   -5
 7 VSI       +1  +1  -1  -1  +1  +1   –  +1  +1  -1   +3
 8 ADD-GSIM  +1  -1  -1  -1  -1  -1  -1   –  -1  -1   -7
 9 SCQI      +1  -1  -1  -1  +1  +1  -1  +1   –  -1   -1
10 MDSI      +1  +1  +1  +1  +1  +1  +1  +1  +1   –   +9
DRIQ          1   2   3   4   5   6   7   8   9  10  sum
 1 VIF        –  +1  +1  +1  +1  +1  +1  +1   0  -1   +5
 2 MAD       -1   –  -1  -1  -1  -1  -1  -1  -1  -1   -9
 3 SR-SIM    -1  +1   –   0   0  -1  -1  -1  -1  -1   -5
 4 FSIM      -1  +1   0   –   0  -1  -1  -1  -1  -1   -5
 5 GMSD      -1  +1   0   0   –  -1  -1  -1  -1  -1   -5
 6 SFF       -1  +1  +1  +1  +1   –   0  +1  -1  -1   +2
 7 VSI       -1  +1  +1  +1  +1   0   –  +1  -1  -1   +2
 8 ADD-GSIM  -1  +1  +1  +1  +1  -1  -1   –  -1  -1   -1
 9 SCQI       0  +1  +1  +1  +1  +1  +1  +1   –  -1   +6
10 MDSI      +1  +1  +1  +1  +1  +1  +1  +1  +1   –   +9
For the purpose of visualizing the quality scores of the proposed index, the scatter plots of the proposed IQA model MDSI with and without power pooling are shown in Fig. 6. The logistic function suggested in [45] was used to fit a curve on each plot:

f(x) = \frac{\beta_1 - \beta_2}{1 + e^{-(x - \beta_3)/\beta_4}} + \beta_2,   (14)

where \beta_1, \beta_2, \beta_3, and \beta_4 are fitting parameters computed by minimizing the mean square error between the quality predictions and the subjective scores (MOS). It should be noted that the PCC and RMSE values reported in this paper are computed after mapping the quality scores to MOS using the above function.
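The mapping step can be sketched as follows, assuming a four-parameter monotone logistic (matching the four fitting parameters mentioned above) and synthetic data; the actual fitted parameters depend on the dataset and metric.

```python
# Sketch: map objective scores to the MOS scale with a 4-parameter logistic
# before computing PCC/RMSE.  Data and initial guess are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def logistic4(x, b1, b2, b3, b4):
    # Monotone S-shaped mapping from metric scores to the subjective scale.
    return (b1 - b2) / (1.0 + np.exp(-(x - b3) / b4)) + b2

x = np.linspace(0.1, 0.9, 30)                 # hypothetical metric scores
rng = np.random.default_rng(0)
mos = logistic4(x, 5.0, 1.0, 0.5, 0.15) + 0.01 * rng.standard_normal(30)

# Least-squares fit of the four parameters, then RMSE after the mapping.
params, _ = curve_fit(logistic4, x, mos,
                      p0=[mos.max(), mos.min(), x.mean(), 0.1], maxfev=10000)
mapped = logistic4(x, *params)
rmse = np.sqrt(np.mean((mapped - mos) ** 2))
```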
The results reported in Table I show performance differences among the IQA models. As suggested in [46, 45], we use the F-test to decide whether one metric is statistically superior to another. The F-test is based on the residuals between the quality scores given by an IQA model, after applying the nonlinear mapping of equation (14), and the mean subjective scores (MOS). The F-test uses the ratio of the variances of the residual errors of two IQA models at the 95% significance level; the result of the test is 1 if the null hypothesis can be rejected and 0 otherwise. The results of the F-test on the eight datasets are listed in Table II, where +1/-1 indicates that the corresponding index is statistically superior/inferior to the index it is compared against, and 0 indicates that the difference between the two indices is not significant. We note that a type I error might occur, especially when the quality scores of the IQA models are not Gaussian. However, such possible errors are very unlikely to change the conclusion about the superiority of the proposed index, because there is a considerable gap between the proposed index and the other metrics, as discussed in the following.
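The variance-ratio test described above can be sketched as follows; the residual vectors are synthetic stand-ins for the differences between mapped quality scores and MOS.

```python
# Sketch of the F-test on residuals: metric A is judged superior to metric B
# when the ratio of B's residual variance to A's exceeds the critical F value
# at the 95% level.  The residuals below are illustrative only.
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
res_a = 0.3 * rng.standard_normal(200)   # residuals of metric A after mapping
res_b = 0.6 * rng.standard_normal(200)   # residuals of metric B (noisier)

ratio = np.var(res_b, ddof=1) / np.var(res_a, ddof=1)
critical = f.ppf(0.95, len(res_b) - 1, len(res_a) - 1)
a_superior = ratio > critical            # a "+1" entry in Table II terms
```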
From the results of Table II, we can see that MDSI is significantly better than the other indices on the TID2008, TID2013, ESPL, and DRIQ datasets; its sum value in the last column is therefore +9 for these four datasets. SCQI is statistically superior to all other indices on the TID2013 dataset except MDSI. On the LIVE dataset, MAD, MDSI, and ADD-GSIM are significantly better than the other indices. On the CSIQ dataset, only SFF performs significantly better than MDSI. On the CCID2014 dataset, ADD-GSIM is significantly better than the other indices, while the statistically indistinguishable VIF and MDSI both show promising results. Considering all eight datasets used in this experiment, with a minimum sum value of +6, the proposed index MDSI performs very well in comparison with the other indices. We can simply add the eight per-dataset sum values of each metric to obtain an overall comparison based on the statistical significance test; this score indicates how many times a metric is statistically superior to the other metrics. The results show that MDSI is the best performing index with a score of +62 (out of a maximum of +72), followed by ADD-GSIM (+25), VSI (+1), SCQI (-2), FSIM (-4), GMSD (-5), SFF (-6), SR-SIM (-19), MAD (-24), and VIF (-26). The results of the statistical significance test verify that, unlike the other IQA models, the proposed metric MDSI is among the best performing indices on all of the different datasets.
III-C Performance comparison on individual distortions
A good IQA model should not only provide accurate quality predictions for a whole dataset; it should also provide good judgments on individual distortion types. We list in Table III the average SRC and PCC values of thirteen IQA models over the 61 sets of distortions available in the six datasets TID2008, CSIQ, LIVE, TID2013, VCL@FER, and ESPL. The minimum and the standard deviation of these 61 values are also listed for each evaluation metric. These two statistics indicate the reliability of an IQA model: a model should provide good prediction accuracy for all distortion types, and a metric that fails at assessing one or more types of distortions cannot be considered reliable.
The proposed index MDSI has the best average SRC and PCC over the distortion types. As can be seen in the min column of each evaluation metric, MDSI, SCQI, and FSIM perform better than the other IQA models in the worst case, which shows the reliability of the proposed index; negative or near-zero min values in Table III indicate the unreliability of the corresponding models on some distortion types. The standard deviation of the 61 values for each evaluation metric is another reliability factor, and according to Table III, MDSI, SCQI, and FSIM have the lowest variation. We can therefore conclude that MDSI, SCQI, and FSIM are more reliable than the other indices.
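The avg/min/std reliability summary used here reduces to simple statistics over the per-distortion correlation values. A sketch with made-up SRC values (not taken from Table III):

```python
# Sketch: summarize per-distortion SRC values into the avg/min/std
# reliability statistics of Table III.  Values below are illustrative.
import numpy as np

per_distortion_src = np.array([0.91, 0.88, 0.86, 0.93, 0.41, 0.89])
avg = per_distortion_src.mean()
mn = per_distortion_src.min()
sd = per_distortion_src.std(ddof=1)
# A low (or negative) minimum flags a distortion type the metric fails on,
# and a large standard deviation flags inconsistent behaviour across types.
```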
IQA model     SRC (distortions)            PCC (distortions)
              avg      min      std        avg      min      std
MS-SSIM       0.8343   0.4099   0.1989     0.8560   0.4448   0.1944
VIF           0.8537   0.3099   0.1811     0.8760   0.3443   0.1812
MAD           0.8111   0.0575   0.2315     0.8296   0.0417   0.2108
IW-SSIM       0.8329   0.4196   0.2019     0.8568   0.4503   0.1962
SR-SIM        0.8609   0.2053   0.1806     0.8785   0.3162   0.1839
FSIM          0.8775   0.4679   0.1041     0.8967   0.5488   0.0880
GMSD          0.8542   0.2948   0.1954     0.8785   0.3625   0.1851
SFF           0.8538   0.1786   0.1472     0.8721   0.0786   0.1441
VSI           0.8779   0.1713   0.1360     0.8969   0.4875   0.1044
DSCSI         0.8722   0.3534   0.1242     0.8908   0.5166   0.1093
ADD-GSIM      0.8650   0.2053   0.1686     0.8799   0.2190   0.1691
SCQI          0.8826   0.4479   0.1057     0.9010   0.6493   0.0841
MDSI          0.8903   0.4378   0.1030     0.9095   0.6899   0.0805
Pooling    Weighted avg. SRC (8 datasets)    Avg. SRC (61 distortions)
           Mean     MAD      SD              Mean     MAD      SD
q = 1/4    0.8864   0.9066   0.8776          0.8919   0.8903   0.8828
q = 1/2    0.8833   0.9067   0.8820          0.8912   0.8898   0.8826
q = 1      0.8730   0.9041   0.8899          0.8899   0.8890   0.8820
q = 2      0.8519   0.8928   0.8972          0.8888   0.8866   0.8820
q = 4      0.8301   0.8766   0.8922          0.8869   0.8780   0.8753
III-D Parameters of deviation pooling
Considering the formulation of deviation pooling in equation (12), we used the mean absolute deviation (MAD) for the proposed metric. The standard deviation (SD) is another option that can be used for deviation pooling. In addition, the Minkowski power q of the deviation pooling can have a significant impact on the proposed index. In Table IV, the SRC performance of the proposed index is analyzed for different values of q and different pooling strategies; mean pooling is also included in this experiment. The results show that MAD pooling with q = 1/4 is a better choice for the proposed index. Also, the performance of mean pooling on the 61 distortion sets confirms our statement that mean pooling performs well for inter-class quality prediction.
The impact of the proposed power pooling within the deviation pooling on the proposed metric was shown in Fig. 6. Power pooling can also be used to increase the linearity of other indices. For example, by applying power pooling, the LPCC and PCC values of VSI [21] on the TID2013 dataset can be increased from 0.8373 to 0.8928 and from 0.9000 to 0.9011, respectively.
III-E Summation vs. multiplication
Two options for combining the two similarity maps GS and CS are summation and multiplication, as explained in subsection II-C. Deciding whether one approach is superior to the other for a given index depends on many factors, such as the pooling strategy being used, the overall performance, the performance on individual distortions, reliability, efficiency, and simplicity. In an experiment, the performance of MDSI using the multiplication approach was examined. Among the many parameter sets tested, we found good parameters for combining GS and CS via the multiplication scheme. The observation was that summation is a better choice for the TID2008, TID2013, VCL@FER, and DRIQ datasets, while multiplication is a better choice for the ESPL dataset, and that both approaches show almost the same performance on the other datasets. Overall, the summation approach provides better performance on individual distortions. This experiment also showed that, based on the reliability measures introduced in this paper, MDSI is more reliable with summation than with multiplication. Given these results, the simplicity of the summation approach, and its efficiency over multiplication, the former was used in MDSI. Table V justifies our choice.
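The two combination schemes can be sketched as follows; the weight and exponents are illustrative placeholders, not the parameters used by MDSI.

```python
# Sketch: summation vs. multiplication of two similarity maps gs and cs.
# Weight alpha and exponents gamma/beta are hypothetical placeholders.
import numpy as np

def combine_sum(gs, cs, alpha=0.6):
    return alpha * gs + (1.0 - alpha) * cs       # weighted summation

def combine_mul(gs, cs, gamma=0.2, beta=0.1):
    return (gs ** gamma) * (cs ** beta)          # weighted multiplication

rng = np.random.default_rng(0)
gs = rng.uniform(0.0, 1.0, size=(32, 32))        # hypothetical gradient map
cs = rng.uniform(0.0, 1.0, size=(32, 32))        # hypothetical chroma map
m_sum = combine_sum(gs, cs)
m_mul = combine_mul(gs, cs)
```

Summation is also slightly cheaper: it needs one multiply-add per pixel, whereas the multiplication scheme requires two element-wise power operations.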
Property                                                    Summation   Multiplication
Statistically superior on more of the considered datasets       ✓
Better dataset-weighted average                                 ✓
Better performance on individual distortions                    ✓
Reliability                                                     ✓
Simplicity                                                      ✓
Efficiency                                                      ✓
III-F Parameters of the model
The proposed IQA model MDSI has four parameters to be set. To further simplify MDSI, its constants are tied together, in the same spirit as the simplification used in the SSIM index [2], so that only two parameters remain to be set. Note that the gradient similarities and the chromaticity similarity have different dynamic ranges; therefore, these parameters should be set such that the relation between the maps is also taken into account.
In Fig. 7, the impact of these two parameters on the performance of MDSI is shown. Even though the parameters are set only approximately, it can be seen that MDSI is very robust to different parameter setups: its weighted average SRC remains greater than 0.90 over the whole tested range. Note that many other possible parameter setups are not included in this plot. The fixed parameter values used in all experiments can be found in the released MATLAB source code.
III-G Effect of chromaticity similarity maps
In this section, the impact of the existing chromaticity similarity map CS [21] and the proposed chromaticity similarity map on the performance of the proposed index is studied through the following experiment. Contrast-distorted images of the CCID2014 dataset [41] were chosen; the reason for choosing this dataset is to evaluate the ability of the two maps to measure color changes. We analyzed the SRC performance of each chromaticity map, used as a part of the proposed index, over a wide range of parameter values. Three pooling strategies were used in this experiment, i.e., mean pooling, mean absolute deviation (MAD) pooling, and standard deviation (SD) pooling. Fig. 8 shows the SRC performance of the proposed index for the different scenarios, from which the following conclusions can be drawn. MAD pooling combined with either chromaticity map is a good choice for MDSI, and for almost every pooling strategy and parameter value, the proposed map performs better than CS. At the same time, the proposed map is more efficient to compute than the existing CS.
III-H Implementation and efficiency
Another very important attribute of a good IQA model is its efficiency, and the proposed index has very low complexity. It first applies average filtering of size f×f on each channel of the reference and distorted images, downsamples them by a factor of f, and converts the results to one luminance and two chromaticity channels [47]. The value of f is set to max(1, round(min(H, W)/256)) [48], where H and W are the image height and width and round(·) is the rounding operator. Then, the proposed index calculates the gradient magnitudes of the luminance channel and the chromaticity similarity map, and applies deviation pooling. All of these steps are computationally efficient. Table VI lists the run times of fifteen IQA models applied to images of size 384×512 and 1080×1920. The experiments were performed on a Core i7 3.40 GHz CPU with 16 GB of RAM; the IQA models were implemented in MATLAB 2013b running on Windows 7. It can be seen that MDSI is among the top five fastest indices. The proposed index is less than two times slower than the competing GMSD index, which is explained by the fact that GMSD only uses the luminance channel. Compared to the other competing indices SCQI, VSI, ADD-GSIM, SFF, and FSIM, the proposed index MDSI is about 3 to 6 times, 3 to 9 times, 4 to 5 times, 4 to 5 times, and 4 to 11 times faster, respectively. Another observation from Table VI is that the ranking of the indices is not necessarily the same for images of different sizes; for example, SSIM is slower than the proposed index on the smaller images, but faster on the larger ones.
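A rough sketch of this pipeline (average filtering, downsampling, gradient magnitude, deviation pooling) on a grayscale stand-in is given below; the constant c, the Sobel-based gradient, and the similarity form are placeholders for illustration, not the released MATLAB implementation or the paper's parameter values.

```python
# Sketch of an MDSI-like pipeline: f x f average filtering and downsampling,
# luminance gradient magnitudes, a similarity map, and MAD deviation pooling.
# Constant c and the similarity form are hypothetical placeholders.
import numpy as np
from scipy.ndimage import uniform_filter, sobel

def preprocess(img):
    h, w = img.shape[:2]
    f = max(1, round(min(h, w) / 256))       # downsampling factor, as in [48]
    img = uniform_filter(img.astype(float), size=f)   # f x f average filter
    return img[::f, ::f]                     # subsample by a factor of f

def gradient_magnitude(lum):
    return np.hypot(sobel(lum, axis=0), sobel(lum, axis=1))

def mdsi_like_score(ref, dst, c=100.0, q=0.25):
    g_r = gradient_magnitude(preprocess(ref))
    g_d = gradient_magnitude(preprocess(dst))
    sim = (2 * g_r * g_d + c) / (g_r ** 2 + g_d ** 2 + c)   # similarity map
    sq = sim ** q                            # Minkowski power
    return np.mean(np.abs(sq - sq.mean()))   # MAD deviation pooling

rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, size=(384, 512))               # stand-in reference
dst = np.clip(ref + 20 * rng.standard_normal(ref.shape), 0, 255)
score = mdsi_like_score(ref, dst)            # higher score = more distortion
```

Identical images yield a perfectly uniform similarity map and hence a score of zero, which matches the intuition behind deviation pooling.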
IQA model        384×512    1080×1920
PSNR                5.69        37.85
GMSD [19]           8.90        78.22
MDSI               12.21       152.85
SSIM [2]           14.97        80.23
SR-SIM [18]        17.02       100.06
MS-SSIM [7]        52.16       413.70
ADD-GSIM [22]      59.58       566.99
SFF [20]           64.22       588.57
SCQI [23]          71.68       524.01
VSI [21]          106.87       492.85
FSIM [16]         145.02       590.84
IW-SSIM [15]      244.00      2538.43
DSCSI [44]        423.73      4599.83
VIF [9]           635.22      6348.67
MAD [12]          847.54      8452.50
IV Conclusion
We proposed an effective, efficient, and reliable full reference IQA model based on new gradient similarity and chromaticity similarity measures. The gradient similarity is used to measure local structural distortions and, in a complementary way, the proposed chromaticity similarity measures color distortions. The proposed metric, called MDSI, uses a novel deviation pooling to compute the quality score from the two similarity maps. Extensive experimental results on natural and synthetic benchmark datasets show that the proposed index is effective and reliable, has low complexity, and is fast enough to be used in real-time FR-IQA applications.
Acknowledgments
The authors thank NSERC of Canada for its financial support under Grants RGPDD 451272-13 and RGPIN 138344-14.
References
 [1] Z. Wang, “Applications of objective image quality assessment methods [applications corner],” IEEE Signal Processing Magazine, vol. 28, no. 6, pp. 137–142, Nov 2011.
 [2] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.
 [3] H. Yeganeh and Z. Wang, “Objective quality assessment of tone-mapped images,” IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 657–667, Feb 2013.
 [4] H. Ziaei Nafchi, A. Shahkolaei, R. Farrahi Moghaddam, and M. Cheriet, “FSITM: A Feature Similarity Index For Tone-Mapped Images,” IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1026–1029, Aug 2015.
 [5] Z. Wang and A. Bovik, “Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, Jan 2009.
 [6] N. Damera-Venkata, T. Kite, W. Geisler, B. Evans, and A. Bovik, “Image quality assessment based on a degradation model,” IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 636–650, Apr 2000.
 [7] Z. Wang, E. Simoncelli, and A. Bovik, “Multiscale structural similarity for image quality assessment,” in Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov 2003, pp. 1398–1402.
 [8] H. Sheikh, A. Bovik, and G. de Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117–2128, Dec 2005.
 [9] H. Sheikh and A. Bovik, “Image information and visual quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430–444, Feb 2006.
 [10] D. Chandler and S. Hemami, “VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images,” IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2284–2298, Sept 2007.
 [11] L. Zhang, D. Zhang, and X. Mou, “RFSIM: A feature based image quality assessment metric using Riesz transforms,” in 17th IEEE International Conference on Image Processing (ICIP), Sept 2010, pp. 321–324.
 [12] E. C. Larson and D. M. Chandler, “Most apparent distortion: full-reference image quality assessment and the role of strategy,” Journal of Electronic Imaging, vol. 19, no. 1, p. 011006, 2010.
 [13] M. Narwaria and W. Lin, “Objective image quality assessment based on support vector regression,” IEEE Transactions on Neural Networks, vol. 21, no. 3, pp. 515–519, March 2010.
 [14] C. Li and A. C. Bovik, “Content-partitioned structural similarity index for image quality assessment,” Signal Processing: Image Communication, vol. 25, no. 7, pp. 517–526, 2010, Special Issue on Image and Video Quality Assessment.
 [15] Z. Wang and Q. Li, “Information content weighting for perceptual image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 5, pp. 1185–1198, May 2011.
 [16] L. Zhang, D. Zhang, X. Mou, and D. Zhang, “FSIM: A Feature Similarity Index for Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, Aug 2011.
 [17] A. Liu, W. Lin, and M. Narwaria, “Image quality assessment based on gradient similarity,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1500–1512, April 2012.
 [18] L. Zhang and H. Li, “SR-SIM: A fast and high performance IQA index based on spectral residual,” in 19th IEEE International Conference on Image Processing (ICIP), Sept 2012, pp. 1473–1476.
 [19] W. Xue, L. Zhang, X. Mou, and A. Bovik, “Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index,” IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 684–695, Feb 2014.
 [20] H.-W. Chang, H. Yang, Y. Gan, and M.-H. Wang, “Sparse Feature Fidelity for Perceptual Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 4007–4018, Oct 2013.
 [21] L. Zhang, Y. Shen, and H. Li, “VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 23, no. 10, pp. 4270–4281, Oct 2014.
 [22] K. Gu, S. Wang, G. Zhai, W. Lin, X. Yang, and W. Zhang, “Analysis of distortion distribution for pooling in image quality prediction,” IEEE Transactions on Broadcasting, vol. 62, no. 2, pp. 446–456, June 2016.
 [23] S. H. Bae and M. Kim, “A novel image quality assessment with globally and locally consilient visual quality perception,” IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 2392–2406, 2016.
 [24] M. Oszust, “Decision fusion for image quality assessment using an optimization approach,” IEEE Signal Processing Letters, vol. 23, no. 1, pp. 65–69, Jan 2016.
 [25] W. Lin and C.-C. J. Kuo, “Perceptual visual quality metrics: A survey,” Journal of Visual Communication and Image Representation, vol. 22, no. 4, pp. 297–312, 2011.
 [26] A. Shnayderman, A. Gusev, and A. Eskicioglu, “An SVD-based grayscale image quality measure for local and global assessment,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 422–429, Feb 2006.
 [27] S. Li, F. Zhang, L. Ma, and K. N. Ngan, “Image quality assessment by separately evaluating detail losses and additive impairments,” IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 935–949, Oct 2011.
 [28] M. Sampat, Z. Wang, S. Gupta, A. Bovik, and M. Markey, “Complex wavelet structural similarity: A new image similarity index,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2385–2401, Nov 2009.
 [29] Z. Wang and X. Shang, “Spatial pooling strategies for perceptual image quality assessment,” in IEEE International Conference on Image Processing, Oct 2006, pp. 2945–2948.
 [30] A. Moorthy and A. Bovik, “Visual importance pooling for image quality assessment,” IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 2, pp. 193–201, April 2009.
 [31] A. K. Moorthy and A. C. Bovik, “Perceptually significant spatial pooling techniques for image quality assessment,” Proc. SPIE, vol. 7240, pp. 724 012–724 012–11, 2009.
 [32] G.-H. Chen, C.-L. Yang, and S.-L. Xie, “Gradient-based structural similarity for image quality assessment,” in IEEE International Conference on Image Processing, Oct 2006, pp. 2929–2932.
 [33] D.-O. Kim, H.-S. Han, and R.-H. Park, “Gradient information-based image quality metric,” IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 930–936, May 2010.
 [34] J.-M. Geusebroek, R. Van den Boomgaard, A. Smeulders, and H. Geerts, “Color invariance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 12, pp. 1338–1350, Dec 2001.
 [35] I. Sobel and G. Feldman, “A 3x3 isotropic gradient operator for image processing,” 1968, presented at a talk at the Stanford Artificial Project.
 [36] ITU-T Recommendation P.1401, “Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models,” https://www.itu.int/rec/dologin_pub.asp?lang=e&id=TRECP.1401201207I!!PDFE&type=items, July 2012.
 [37] H. Sheikh, Z. Wang, L. Cormack, and A. Bovik, “Live image quality assessment database release 2,” Online: http://live.ece.utexas.edu/research/quality.
 [38] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti, “TID2008: A Database for Evaluation of Full-Reference Visual Quality Assessment Metrics,” Advances of Modern Radioelectronics, vol. 10, pp. 30–45, 2009.
 [39] N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, L. Jin, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. Kuo, “Color image database TID2013: Peculiarities and preliminary results,” in 4th European Workshop on Visual Information Processing (EUVIP), June 2013, pp. 106–111.
 [40] A. Zaric, N. Tatalovic, N. Brajkovic, H. Hlevnjak, M. Loncaric, E. Dumic, and S. Grgic, “VCL@FER Image Quality Assessment Database,” AUTOMATIKA, vol. 53, no. 4, pp. 344–354, 2012.
 [41] K. Gu, G. Zhai, W. Lin, and M. Liu, “The analysis of image contrast: From quality assessment to automatic enhancement,” IEEE Transactions on Cybernetics, vol. 46, no. 1, pp. 284–297, 2015.
 [42] D. Kundu and B. L. Evans, “Fullreference visual quality assessment for synthetic images: A subjective study,” in Proc. IEEE Int. Conf. on Image Processing, 2015, pp. 2374–2378.
 [43] C. Vu, T. Phan, P. Singh, and D. M. Chandler, “Digitally Retouched Image Quality (DRIQ) Database,” Online: http://vision.okstate.edu/driq/, 2012.
 [44] D. Lee and K. N. Plataniotis, “Towards a fullreference quality assessment for color images using directional statistics,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3950–3965, Nov 2015.
 [45] H. Sheikh, M. Sabir, and A. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440–3451, Nov 2006.
 [46] “Final report from the video quality experts group on the validation of objective models of video quality assessment,” ftp://vqeg.its.bldrdoc.gov/Documents/Meetings/Hillsboro_VQEG_Mar_03/VQEGIIDraftReportv2a.pdf, 2003.
 [47] H. Ziaei Nafchi and M. Cheriet, “A note on efficiency of downsampling and color transformation in image quality assessment,” CoRR, vol. abs/1606.06152, 2016. [Online]. Available: http://arxiv.org/abs/1606.06152
 [48] https://ece.uwaterloo.ca/~z70wang/research/ssim/.