Compressed Image Quality Assessment Based on Saak Features

Compressed image quality assessment plays an important role in image services, especially in image compression applications, where it can be utilized as guidance to optimize image processing algorithms. In this paper, we propose an objective image quality assessment algorithm to measure the quality of compressed images. The proposed method utilizes a data-driven transform, Saak (Subspace approximation with augmented kernels), to decompose images into a hierarchical structural feature space. We measure the distortions of Saak features and accumulate these distortions according to the importance of each feature to the human visual system. Compared with state-of-the-art image quality assessment methods on widely utilized datasets, the proposed method correlates better with the subjective results. In addition, the proposed method achieves more robust results across different datasets.

Xinfeng Zhang, Sam Kwong and C.-C. Jay Kuo
School of Computer Science and Technology, University of Chinese Academy of Sciences
Department of Computer Science, City University of Hong Kong
Ming Hsieh Department of Electrical Engineering, University of Southern California

Index Terms—  Saak, structural distortion, image quality assessment, compressed image, HVS

1 Introduction

Lossy image compression techniques such as JPEG and JPEG2000 achieve high compression ratios at the cost of perceived degradation in image quality [1, 2, 3]. State-of-the-art image compression systems usually need a quality metric to optimize the image coding procedure by assigning bits to various image contents adaptively. However, the widely utilized peak signal-to-noise ratio (PSNR) metric mainly focuses on the pixel-level difference between the original and compressed images, which is not well correlated with human perceptual quality. This is because human perception is more sensitive to structural distortions than to individual pixel distortions.

In the past decades, to obtain quality measures that are more consistent with human visual perception, numerous image quality assessment (IQA) methods [4, 5, 6, 7, 8, 9, 10] and datasets [11, 12, 13] have been proposed for different distortion types. The existing IQA methods can be divided into three categories according to the availability of reference images, i.e., full reference IQA (FR-IQA), reduced reference IQA (RR-IQA) and no reference IQA (NR-IQA). In most scenarios, the reference images are available for the compressed image quality assessment problem. In [4], Wang et al. proposed the Structural SIMilarity (SSIM) index to calculate image quality according to patch similarity between the reference and distorted images instead of the pixel-level distortions. In addition, SSIM takes the correlation function as its quality metric model to reflect patch structural distortions instead of the mean square error (MSE), which is utilized in PSNR. This simple and effective method achieved results that correlate better with subjective quality on different IQA datasets.

To further improve the performance of SSIM, in [5], Wang et al. measured multi-scale structural similarity (MS-SSIM), which provides more flexibility than the single-scale method by accounting for the varying perceivability of image details. In [6], Wang and Li further utilized an information content weighting strategy to improve the IQA performance in the pooling stage, where the weights are calculated based on the Gaussian scale mixture (GSM) model of natural images. This weighting strategy not only improves the performance of the SSIM metric but also turns the pixel-level metric PSNR into a competitive perceptual quality metric. In [7, 8], Zhang et al. further improved the performance of the SSIM-based quality metric by designing more effective features to capture the local structural information, to which the human visual system (HVS) is more sensitive than to the local pixel correlation used in SSIM.

Different from other distortion types, the major distortions in compressed images are structural distortions, e.g., blocking artifacts in JPEG images and ringing artifacts in JPEG2000 images. Therefore, a highly efficient representation of structural information is especially important for the compressed image quality assessment problem. In [9], Gore and Gupta proposed the LSDBIQ technique, which utilizes the local standard deviation of an image to reflect the structural information and evaluate the quality of JPEG compressed images. In [10], Li et al. evaluated structural distortions, including blockiness and blurring artifacts, by utilizing the Tchebichef moments as features to differentiate the quality of deblocked JPEG-compressed images.

In this paper, we propose a new full-reference image quality assessment method based on an efficient data-driven feature extraction transform, Subspace approximation with augmented kernels (Saak) [14]. The proposed method first applies a low-pass filter to remove random noise that is difficult for the HVS to perceive. Then, the data-driven Saak transform is learnt from the reference image, and both the reference and distorted images are transformed by the learnt Saak transform, which converts images from the pixel domain to the feature domain. Since the feature maps in different spectral components of the Saak transform domain have different influences on perceptual quality, we design a weighting strategy for the qualities of the feature maps. Experiments are conducted on JPEG and JPEG2000 images, and the proposed method achieves very promising results, especially for JPEG images.

The remainder of this paper is organized as follows. Section 2 introduces the data-driven Saak transform. Section 3 provides the detailed introduction on the proposed compressed image quality assessment method using Saak transform features. Extensive experimental results and discussions are reported in Section 4, and finally we conclude this paper in Section 5.

2 Saak Transform

Inspired by convolutional neural networks (CNNs), Kuo and Chen proposed the Saak transform in [14] to support efficient feature extraction for image understanding [15] and image reconstruction. The Saak transform consists of two main ingredients, i.e., subspace approximation and kernel augmentation, and its structure is similar to that of traditional CNNs. Herein, the subspace approximation utilizes data-driven kernels generated from a learning process according to the principle of energy compaction. Different from traditional CNNs, subspace approximation in Saak utilizes the orthonormal eigenvectors of the covariance matrix of the training samples as transform kernels, which are the well-known Karhunen-Loève transform (KLT) kernels. The KLT is the optimal transform in terms of energy compaction and provides the optimal approximation to the input, with the smallest mean-squared error (MSE), when the KLT kernels associated with the smallest eigenvalues are truncated. To deal with various structure scales, the Saak transform cascades two or more transforms directly to cover a larger receptive field.
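The subspace approximation step described above can be illustrated with a minimal NumPy sketch. It only shows how KLT kernels are obtained as eigenvectors of the sample covariance matrix and sorted by energy; it is not the full Saak transform of [14]. The function name `learn_klt_kernels` and the toy data are assumptions made for illustration.

```python
import numpy as np

def learn_klt_kernels(samples, num_kernels=None):
    """Return KLT kernels (one per row) and their eigenvalues, sorted by decreasing energy.

    samples: array of shape (num_samples, dim), one flattened block per row.
    """
    samples = np.asarray(samples, dtype=np.float64)
    # Remove the sample mean before estimating the covariance matrix.
    centered = samples - samples.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(samples) - 1, 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    kernels = eigvecs[:, order].T
    if num_kernels is not None:
        # Truncation keeps the kernels with the largest eigenvalues.
        kernels = kernels[:num_kernels]
        eigvals = eigvals[:num_kernels]
    return kernels, eigvals

# Toy data (hypothetical): 500 correlated 4-dimensional samples.
rng = np.random.default_rng(0)
blocks = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
kernels, energy = learn_klt_kernels(blocks)
```

Because the kernels are orthonormal, projecting onto all of them is invertible, and dropping the kernels with the smallest eigenvalues discards the least energy.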

However, this cascaded operation can result in the “sign confusion” problem [16]. To solve this problem, Kuo proposed to insert the Rectified Linear Unit (ReLU) activation function between the cascaded transforms. Since the ReLU function inevitably introduces a rectification loss, the kernel augmentation strategy was further proposed to eliminate this loss in the Saak transform. That is, each transform kernel is augmented with its negative vector, and both the original and augmented kernels are utilized in the Saak transform. When an input vector is projected onto a positive/negative kernel pair, one of the two responses passes through the ReLU operation while the other is blocked. Therefore, the Saak transform achieves lossless multi-stage conversion between spatial and spectral representations. Fig.1 shows the architecture of the forward and inverse Saak transforms, where the ReLU operation is implemented by sign-to-position (S/P) conversion. In the forward transform, each spectral component is split into positive and negative parts via S/P conversion as shown in Eqn.(1) and (2), and the two parts are merged via position-to-sign (P/S) conversion in the inverse transform stage.


Herein, the two outputs of the S/P conversion are the positive and negative parts of a spectral component. The signed KLT coefficients in each stage are denoted as Saak coefficients. Since these coefficients reflect the dominant structural information of input images, they can serve as discriminative features of input images to carry out various image understanding tasks.
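The splitting and merging of positive and negative parts can be sketched as follows. This is an illustrative reading of the S/P and P/S conversions under the assumption that the positive part is the ReLU of a coefficient and the negative part is the ReLU of its negation; the function names and the choice to concatenate the two halves are hypothetical and not taken from [14].

```python
import numpy as np

def sp_conversion(coeffs):
    """Sign-to-position: split signed coefficients into two non-negative halves."""
    coeffs = np.asarray(coeffs, dtype=np.float64)
    positive = np.maximum(coeffs, 0.0)   # response of the original kernel after ReLU
    negative = np.maximum(-coeffs, 0.0)  # response of the negated kernel after ReLU
    return np.concatenate([positive, negative], axis=-1)

def ps_conversion(augmented):
    """Position-to-sign: merge the two halves back into signed coefficients."""
    augmented = np.asarray(augmented, dtype=np.float64)
    half = augmented.shape[-1] // 2
    return augmented[..., :half] - augmented[..., half:]

signed = np.array([1.5, -0.25, 0.0, -3.0])
augmented = sp_conversion(signed)
restored = ps_conversion(augmented)
```

For every coefficient at most one of the two halves is non-zero, which is why the ReLU loses nothing and the merge recovers the signed value exactly.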

Fig. 1: The block diagram of forward and inverse Saak transforms

3 Compressed Image Quality Assessment Based on Saak Features

In compressed images, the main distortion source is the quantization of transformed coefficients. For JPEG images, the independent quantization of blocks results in blocking artifacts, and for JPEG2000 compression, the bit-plane truncation of wavelet coefficients results in ringing artifacts. Both kinds of distortions are structural distortions, and the HVS is particularly sensitive to the blocking artifacts.

Inspired by the principle of the Saak transform, we utilize it as a structural information extractor to convert images into a feature domain with structural representations, to which the HVS is more sensitive. Fig.2(c) and Fig.2(d) show the Saak coefficients in the 1st and 7th AC spectral components of Monarch, where the main structures are extracted and enhanced in the transform domain, especially in the low frequency spectral components. In the high frequency spectral components, the Saak coefficients show very weak structures.

Fig. 2: The Saak transformed image and the coefficient energy distribution. (a) Monarch; (b) coefficient energy distribution; (c) 1st AC spectral component; (d) 7th AC spectral component.

Since the Saak transform kernels are data-driven, their feature extraction performance improves when the statistical characteristics of the test samples are more consistent with those of the training samples. In the proposed method, we directly extract overlapped blocks from the reference image and train a dedicated Saak transform matrix for each individual image content. To avoid the influence of noisy blocks without meaningful structures, we only utilize the blocks whose pixel standard deviation is larger than 2 as training samples.
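The selection of training samples can be sketched as below. Only the threshold of 2 on the pixel standard deviation comes from the text; the block size, the stride of one pixel, the use of the population standard deviation and the function name `extract_training_blocks` are assumptions for illustration.

```python
import numpy as np

def extract_training_blocks(image, block_size=4, stride=1, min_std=2.0):
    """Collect overlapped blocks whose pixel standard deviation exceeds min_std."""
    image = np.asarray(image, dtype=np.float64)
    height, width = image.shape
    blocks = []
    for top in range(0, height - block_size + 1, stride):
        for left in range(0, width - block_size + 1, stride):
            block = image[top:top + block_size, left:left + block_size]
            # Flat or nearly flat blocks carry no meaningful structure and are skipped.
            if block.std() > min_std:
                blocks.append(block.ravel())
    return np.array(blocks).reshape(-1, block_size * block_size)

# Toy image (hypothetical): flat left half, checkerboard right half.
demo_image = np.zeros((8, 8))
rows, cols = np.indices((8, 4))
demo_image[:, 4:] = 100.0 * ((rows + cols) % 2)
training_blocks = extract_training_blocks(demo_image)
```

In this toy image the blocks that lie entirely in the flat half are discarded, while every block touching the checkerboard is kept as a training sample.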

To calculate the objective quality score for a distorted image, we first apply a low-pass Gaussian filter to both the reference and distorted images to remove the inevitable noisy information, which is usually masked by the HVS. Then, the Saak transform learnt from the corresponding reference image is utilized to convert the reference and distorted images into the spectral domain. Finally, a quality function is formulated based on these Saak coefficients, which is expressed as follows,
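The pre-filtering step can be illustrated with a small separable Gaussian filter. The filter parameters are not given in the text, so the standard deviation `sigma=1.0`, the three-sigma kernel radius and the reflective border handling below are assumptions, as are the function names.

```python
import numpy as np

def gaussian_kernel_1d(sigma=1.0):
    """Normalized 1-D Gaussian kernel truncated at about three standard deviations."""
    radius = int(3 * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()

def gaussian_lowpass(image, sigma=1.0):
    """Separable Gaussian low-pass filter with reflective padding (same output size)."""
    image = np.asarray(image, dtype=np.float64)
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image, radius, mode="reflect")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

# The same filter would be applied to the reference and the distorted image.
noisy = np.random.default_rng(0).normal(size=(16, 16))
smoothed = gaussian_lowpass(noisy)
```

Applying the identical filter to both images keeps them comparable while attenuating the high-frequency noise before the Saak transform is applied.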


There are two popular kinds of quality functions, i.e., MSE-based and correlation-based functions, which are utilized in PSNR-based and SSIM-based quality metrics, respectively. Considering their advantages for different distortion types, we design our quality function as the weighted combination of the two kinds of functions,


Here, the inputs are the transformed reference and distorted images. For each pair of corresponding spectral components of the reference and distorted images, the function uses their MSE and their correlation coefficient. The remaining quantities are the number of spectral components, a constant factor and a weighting factor.

Except for the augmentation, the Saak transform kernels are the same as the KLT kernels, which concentrate the image energy in the low frequency spectral components. As shown in Fig.2(b), the energy of the spectral components decreases rapidly. By jointly analyzing Fig.2(c) and Fig.2(d), we can assume that the dominant image structures are extracted into the low frequency spectral components, while non-structural information is decomposed into the high frequency spectral components. The quality of the low frequency spectral components therefore plays a more important role in the compressed image quality assessment problem. Accordingly, we propose a weighting function based on the energy distribution of the spectral components,


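Here, the energy of a spectral component is the mean square of its coefficients in the transformed reference and distorted images, and the weighting function additionally involves a constant parameter and a normalization factor.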
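The pooling idea described in this section can be sketched as follows. This is only one possible reading and not the paper's exact quality or weighting function: the mapping of the MSE to a similarity via `c / (mse + c)`, the convex combination controlled by `alpha`, and the plain normalization of the energies (which omits the constant parameter of the weighting function) are all assumptions, as are the function names.

```python
import numpy as np

def component_mse(ref, dist):
    """MSE of each spectral component; inputs have shape (components, coefficients)."""
    return np.mean((ref - dist) ** 2, axis=1)

def component_correlation(ref, dist, eps=1e-12):
    """Correlation coefficient of each spectral component."""
    ref_c = ref - ref.mean(axis=1, keepdims=True)
    dist_c = dist - dist.mean(axis=1, keepdims=True)
    num = np.sum(ref_c * dist_c, axis=1)
    den = np.sqrt(np.sum(ref_c ** 2, axis=1) * np.sum(dist_c ** 2, axis=1))
    return num / (den + eps)

def energy_weights(ref, dist):
    """Weights proportional to the mean-square energy of each component (assumed form)."""
    energy = 0.5 * (np.mean(ref ** 2, axis=1) + np.mean(dist ** 2, axis=1))
    return energy / energy.sum()

def pooled_quality(ref, dist, alpha=0.7, c=400.0):
    """Energy-weighted combination of an MSE-based and a correlation-based term."""
    mse_term = c / (component_mse(ref, dist) + c)  # assumed mapping of MSE into (0, 1]
    corr_term = component_correlation(ref, dist)
    per_component = alpha * mse_term + (1.0 - alpha) * corr_term
    return float(np.sum(energy_weights(ref, dist) * per_component))

# Toy coefficients (hypothetical): five components with decaying energy.
rng = np.random.default_rng(0)
scales = np.array([[10.0], [5.0], [2.0], [1.0], [0.5]])
ref_coeffs = rng.normal(size=(5, 50)) * scales
dist_coeffs = ref_coeffs + rng.normal(scale=3.0, size=(5, 50))
score_same = pooled_quality(ref_coeffs, ref_coeffs)
score_dist = pooled_quality(ref_coeffs, dist_coeffs)
```

With this form, identical inputs score close to 1 and any distortion lowers the score, with high-energy (low frequency) components contributing most to the result.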

4 Experimental Results and Analyses

In this section, we evaluate the effectiveness of the proposed method on two popular IQA datasets, LIVE [11] and CSIQ [12], and compare our method with some state-of-the-art IQA algorithms, including IFC [17], IWSSIM [6], MAD [12], MS-SSIM [5], PSNR, PSNR-HVS [18], RFSIM [7], SSIM [4], UQI [19], VIF [20], and VSI [21]. There are 29 reference images in LIVE and 30 in CSIQ, and the reference images in the two datasets have different contents without duplication. Both datasets contain two kinds of compressed images, generated by JPEG and JPEG2000 respectively, at different distortion levels. To evaluate the performance of these IQA methods, we use three widely adopted correlation coefficients, i.e., the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SRCC) and the Kendall rank-order correlation coefficient (KRCC). The objective scores from a better IQA method should have higher correlation coefficients with the subjective scores. Herein, the PLCC metric is calculated between the MOS and the objective scores after nonlinear regression, and the widely used nonlinear regression function proposed in [22] is as follows,

$$Q(x) = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + e^{\beta_2 (x - \beta_3)}} \right) + \beta_4 x + \beta_5, \qquad (6)$$

where $x$ is the objective score and $\beta_1, \dots, \beta_5$ are the regression parameters to be fitted.

Method     LIVE JPEG               LIVE JPEG2000           CSIQ JPEG               CSIQ JPEG2000
           PLCC    SRCC    KRCC    PLCC    SRCC    KRCC    PLCC    SRCC    KRCC    PLCC    SRCC    KRCC
IFC        0.8992  0.8661  0.6756  0.9022  0.8920  0.6967  0.95    0.9412  0.7975  0.9331  0.9252  0.781
IWSSIM     0.9412  0.9074  0.7441  0.9578  0.9501  0.8001  0.9844  0.9662  0.8421  0.9809  0.9683  0.848
MAD        0.9384  0.9055  0.7375  0.9618  0.9531  0.8114  0.9827  0.9615  0.8351  0.9836  0.9752  0.8709
MS-SSIM    0.9429  0.9131  0.7487  0.9572  0.9529  0.8042  0.9823  0.9634  0.8332  0.9777  0.9684  0.8439
PSNR       0.8596  0.8409  0.6359  0.8964  0.8898  0.7037  0.8799  0.8881  0.6936  0.9451  0.9362  0.7667
PSNR-HVS   0.9352  0.9040  0.7215  0.9521  0.9454  0.7932  0.9697  0.9514  0.8032  0.9769  0.9703  0.8449
RFSIM      0.9255  0.8930  0.7084  0.9350  0.9286  0.7635  0.9617  0.9481  0.7954  0.9667  0.9635  0.8306
SSIM       0.9297  0.9028  0.7149  0.9368  0.9317  0.7633  0.942   0.9222  0.7545  0.9236  0.9207  0.754
UQI        0.8518  0.8273  0.6216  0.8487  0.8466  0.6434  0.9182  0.9078  0.728   0.9043  0.8813  0.7088
VIF        0.9302  0.9047  0.7326  0.9576  0.9524  0.8026  0.9228  0.9684  0.8416  0.9782  0.9697  0.8501
VSI        0.9443  0.9089  0.7308  0.9533  0.9480  0.7944  0.9808  0.9618  0.8271  0.9745  0.9694  0.8482
Proposed   0.9441  0.914   0.7517  0.9625  0.9542  0.809   0.9878  0.9691  0.8489  0.9813  0.9737  0.861
Table 1: The PLCC, SRCC and KRCC between the subjective scores and the objective scores from different IQA methods on the LIVE and CSIQ datasets.
Fig. 3: Scatter plots of subjective scores versus the predicted scores from the proposed method on the LIVE dataset: (a) JPEG images, (b) JPEG2000 images.

In our experiments, the constant factor in Eqn.(4) is set to 400, and the constant parameter in Eqn.(5) is set to 100. Since JPEG and JPEG2000 are different compression frameworks, which are suited to different quality functions, we set the weighting factor to 0.7 for JPEG images and 0.2 for JPEG2000 images. A two-stage Saak transform is utilized, and each image is transformed into 496 spectral components after P/S conversion. Table 1 lists all three correlation coefficients on the two datasets, where the top two results are highlighted in boldface. The proposed method ranks first in eight of the twelve columns of Table 1 and second in the remaining four. In particular, the proposed method achieves better performance on JPEG compressed images than on JPEG2000 images. This is because the blocking artifacts in JPEG images are more destructive to image structures than the ringing artifacts in JPEG2000 images. Although MAD achieves better quality assessment performance on CSIQ JPEG2000 images, it is inferior to the proposed method in most of the other cases. The proposed method shows strong robustness across the two datasets, which suggests that the structural information extracted by the Saak transform is meaningful for the HVS.

In Fig.3, we illustrate the scatter plots of the subjective scores and the predicted scores from the proposed method on the LIVE dataset. Herein, the curves shown in Fig.3 are obtained by the nonlinear fitting function in Eqn.(6). We can see that the predictions of the proposed method are highly consistent with the subjective scores.
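The evaluation protocol can be illustrated with a dependency-free sketch of the five-parameter logistic mapping of [22] and the three correlation coefficients. The fitting of the logistic parameters is not shown, the Kendall coefficient is implemented in its simple tau-a form (an assumption, since the variant is not stated in the text), and all function names and sample values are hypothetical.

```python
import math

def logistic5(x, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping applied to an objective score x."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (x - b3)))) + b4 * x + b5

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation coefficient (SRCC)."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall(x, y):
    """Kendall rank-order correlation coefficient (KRCC), tau-a form."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2.0)

objective = [1.0, 2.0, 3.0, 4.0]
subjective = [1.0, 4.0, 9.0, 16.0]
plcc = pearson(objective, subjective)
srcc = spearman(objective, subjective)
krcc = kendall(objective, subjective)
```

In this toy example the relation is monotonic but not linear, so the rank-based coefficients reach 1 while the PLCC stays below 1, which is the reason the objective scores are passed through the nonlinear regression before the PLCC is computed.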

5 Conclusion

In this paper, we propose a new compressed image quality assessment method that utilizes the Saak transform to extract dominant features. The proposed method measures the compressed image quality in the Saak transform domain according to feature distortions instead of pixel-level distortions in the spatial domain. The feature distortions in different spectral components are weighted according to the feature importance, which is measured by the energy of the spectral components in the Saak transform domain. Finally, the MSE-based and correlation-based quality functions are combined to predict the final objective quality score. Experimental results on two popular IQA datasets show that the proposed method achieves better performance and stronger robustness than the widely utilized IQA methods. This work also illustrates the efficiency of the Saak transform in representing image structural information.


  • [1] Xinfeng Zhang, Ruiqin Xiong, Xiaopeng Fan, Siwei Ma, and Wen Gao, “Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4613–4626, 2013.
  • [2] Xinfeng Zhang, Weisi Lin, Ruiqin Xiong, Xianming Liu, Siwei Ma, and Wen Gao, “Low-rank decomposition-based restoration of compressed images via adaptive noise estimation,” IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4158–4171, 2016.
  • [3] Xinfeng Zhang, Shiqi Wang, Yabin Zhang, Weisi Lin, Siwei Ma, and Wen Gao, “High-efficiency image coding via near-optimal filtering,” IEEE Signal Processing Letters, vol. 24, no. 9, pp. 1403–1407, 2017.
  • [4] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [5] Zhou Wang, Eero P Simoncelli, and Alan C Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers. IEEE, 2003, vol. 2, pp. 1398–1402.
  • [6] Zhou Wang and Qiang Li, “Information content weighting for perceptual image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 5, pp. 1185–1198, 2011.
  • [7] Lin Zhang, Lei Zhang, and Xuanqin Mou, “RFSIM: A feature based image quality assessment metric using Riesz transforms,” in IEEE International Conference on Image Processing (ICIP). IEEE, 2010, pp. 321–324.
  • [8] Lin Zhang, Lei Zhang, Xuanqin Mou, David Zhang, et al., “FSIM: a feature similarity index for image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, 2011.
  • [9] Akshay Gore and Savita Gupta, “Full reference image quality metrics for JPEG compressed images,” AEU-International Journal of Electronics and Communications, vol. 69, no. 2, pp. 604–608, 2015.
  • [10] Leida Li, Yu Zhou, Weisi Lin, Jinjian Wu, Xinfeng Zhang, and Beijing Chen, “No-reference quality assessment of deblocked images,” Neurocomputing, vol. 177, pp. 572–584, 2016.
  • [11] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik, “LIVE Image Quality Assessment Database Release 2,” accessed Dec. 12, 2018.
  • [12] Eric Cooper Larson and Damon Michael Chandler, “Most apparent distortion: full-reference image quality assessment and the role of strategy,” Journal of Electronic Imaging, vol. 19, no. 1, pp. 011006, 2010.
  • [13] Xinfeng Zhang, Weisi Lin, Shiqi Wang, Jiaying Liu, Siwei Ma, and Wen Gao, “Fine-grained quality assessment for compressed images,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1163–1175, 2019.
  • [14] C.-C. Jay Kuo and Yueru Chen, “On data-driven Saak transform,” Journal of Visual Communication and Image Representation, vol. 50, pp. 237–246, 2018.
  • [15] Yueru Chen, Zhuwei Xu, Shanshan Cai, Yujian Lang, and C.-C. Jay Kuo, “A Saak transform approach to efficient, scalable and robust handwritten digits recognition,” in Picture Coding Symposium (PCS). IEEE, 2018, pp. 174–178.
  • [16] C.-C. Jay Kuo, “The CNN as a guided multilayer RECOS transform [lecture notes],” IEEE Signal Processing Magazine, vol. 34, no. 3, pp. 81–89, 2017.
  • [17] Hamid R Sheikh, Alan C Bovik, and Gustavo De Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117–2128, 2005.
  • [18] Karen Egiazarian, Jaakko Astola, Nikolay Ponomarenko, Vladimir Lukin, Federica Battisti, and Marco Carli, “New full-reference quality metrics based on HVS,” in Proceedings of the Second International Workshop on Video Processing and Quality Metrics, 2006, vol. 4.
  • [19] Zhou Wang and Alan C Bovik, “A universal image quality index,” IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81–84, 2002.
  • [20] Hamid R Sheikh and Alan C Bovik, “Image information and visual quality,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2004, vol. 3, pp. iii–709.
  • [21] Lin Zhang, Ying Shen, and Hongyu Li, “VSI: A visual saliency-induced index for perceptual image quality assessment,” IEEE Transactions on Image Processing, vol. 23, no. 10, pp. 4270–4281, 2014.
  • [22] Hamid R Sheikh, Muhammad F Sabir, and Alan C Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440–3451, 2006.