Double JPEG Compression Detection by Exploring the Correlations in DCT Domain

Pengpeng Yang, Rongrong Ni, and Yao Zhao
Beijing Jiaotong University, Institute of Information Science, Beijing, China
Beijing Key Laboratory of Advanced Information Science and Network Technology
E-mail: {ppyang, rrni, yzhao}@bjtu.edu.cn
Abstract

In the field of digital image processing, the JPEG compression technique has been widely applied, and numerous image processing software packages support it. Images that have undergone double JPEG compression are likely to have been tampered with. Therefore, double JPEG compression detection schemes can provide an important clue for image forgery detection. In this paper, we propose an effective algorithm to detect double JPEG compression with different quality factors. First, the quantized DCT coefficients at the same frequency are extracted to build new data matrices. Then, considering the effect of direction on the correlation between adjacent positions in the DCT domain, twelve high-pass filter templates with different directions are applied, and a transition probability matrix is calculated for each filtered array. Furthermore, principal component analysis and a support vector machine are applied to reduce the feature dimension and to train a classifier, respectively. Experimental results demonstrate that the proposed method is effective and achieves comparable performance.

I Introduction

With the development of digital image processing technology, a great deal of free and easy-to-use image processing software has spread across the world. The authenticity and integrity of digital images face huge challenges, and seeing is no longer believing [1-3], which may harm social and public security. To address these challenges, digital image forensics has been put forward and has become a vital research topic in recent years.

As an important branch of digital image forensics, double JPEG compression forensics [4,5] has attracted much attention in the forensic community, since it can provide a clue about the authenticity of digital images. According to the quality factors used during double compression, existing studies can be divided into two categories: Q1 = Q2 and Q1 ≠ Q2, where Q1 and Q2 denote the quality factors of the first and second compression, respectively. Since the situation with Q1 ≠ Q2 is common in practice, many researchers have focused on this problem, and a number of algorithms have been proposed. One effective scheme relies on the shape of the histogram of the DCT coefficients [6,7]: the authors observed that double JPEG compression induces periodic artifacts in the histogram of the DCT coefficients. In addition, Benford's Law, or the first-digit phenomenon [8-11], has been used to solve this problem; the authors showed that the distribution of the first digits of the DCT coefficients is disturbed by double JPEG compression. Besides, Chen et al. [12] put forward a machine learning-based method: difference matrices of the DCT coefficients along four directions are first obtained, and transition probability matrices are then calculated to describe the correlations in the DCT domain. Recently, data-driven methods [13-17] have been presented and have achieved good performance.
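The first-digit features mentioned above start from the leading digit of each nonzero quantized DCT coefficient. The sketch below, in Python with NumPy (our choice; the paper itself uses Matlab), computes that first-digit histogram and the standard Benford's law distribution p(d) = log10(1 + 1/d). It is only an illustration under our own assumptions: [8] fits a generalized form of the law and [9] computes the histogram per MODE, neither of which is reproduced here, and the function names are hypothetical.

```python
import numpy as np

def first_digit_distribution(coeffs):
    """Normalized histogram of the leading digit (1..9) of nonzero coefficients."""
    mags = np.abs(np.asarray(coeffs, dtype=np.int64)).ravel()
    mags = mags[mags > 0]  # zero has no leading digit
    if mags.size == 0:
        return np.zeros(9)
    # strip trailing digits with exact integer division until one digit is left
    digits = mags.copy()
    while np.any(digits >= 10):
        digits = np.where(digits >= 10, digits // 10, digits)
    counts = np.bincount(digits, minlength=10)[1:10]  # index 0 <-> digit 1
    return counts / counts.sum()

def benford_pmf():
    """Standard Benford's law: p(d) = log10(1 + 1/d) for d = 1..9."""
    d = np.arange(1, 10)
    return np.log10(1 + 1 / d)

# Example: compare observed leading digits with the ideal law
observed = first_digit_distribution([0, 1, -12, 35, 100, 9, -9, 0])
gap = np.abs(observed - benford_pmf()).sum()  # L1 distance to Benford's law
```

A detector in this family would feed such histograms (or their deviation from the law) to a classifier rather than thresholding a single distance.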

In this paper, focusing on the case Q1 ≠ Q2, we propose an improved algorithm that explores the correlations in the DCT domain. First, inspired by Chen's work, difference matrices of the DCT coefficients along twelve directions are calculated, so that the changes in correlation caused by double JPEG compression are better described. Furthermore, considering the limitations of global features, we extract twenty sub-matrices from the DCT coefficients according to MODE (a MODE consists of the coefficients at the same frequency). The proposed method is evaluated on a public image database, UCID [18], and compared with two previous algorithms. Experimental results demonstrate its effectiveness. The rest of this paper is organized as follows. Section II gives a brief description of JPEG compression. Section III presents the proposed method in detail. Experimental results are discussed in Section IV, and conclusions are drawn in Section V.

II JPEG Compression

Fig. 1: The workflow of single and double JPEG compression and decompression.
Fig. 2: The overall framework of the proposed feature extraction method.

JPEG compression, a popular lossy scheme widely used for image data compression, was proposed by the Joint Photographic Experts Group. A JPEG compression and decompression system consists of six main steps, executed in order: B-DCT (block discrete cosine transform on non-overlapping 8x8 pixel blocks), quantization, entropy coding, entropy decoding, de-quantization, and B-IDCT (block inverse discrete cosine transform).
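To make the B-DCT and quantization steps concrete, here is a minimal Python/NumPy sketch for a single grayscale channel whose sides are multiples of 8. Assumptions not taken from this paper: the quality factor is mapped to a quantization table with the IJG (libjpeg) scaling of the standard luminance table from the JPEG specification, and color conversion, chroma subsampling, entropy coding, and the rounding and clipping of decoded pixels are all omitted. The helper names are hypothetical.

```python
import numpy as np

# Standard JPEG luminance quantization table (JPEG specification, Annex K)
BASE_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.float64)

def quant_table(quality):
    """Scale BASE_LUMA for a quality factor, following the IJG (libjpeg) convention."""
    quality = min(max(int(quality), 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return np.clip(np.floor((BASE_LUMA * scale + 50) / 100), 1, 255)

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that C @ B @ C.T is the 2-D DCT of block B."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def jpeg_quantize(gray, quality):
    """B-DCT + quantization. Returns quantized coefficients of shape (H/8, W/8, 8, 8)."""
    h, w = gray.shape
    C, Q = dct_matrix(), quant_table(quality)
    blocks = gray.astype(np.float64).reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    coeffs = C @ (blocks - 128.0) @ C.T  # matmul broadcasts over all blocks
    return np.round(coeffs / Q).astype(np.int64)

def jpeg_dequantize(qcoeffs, quality):
    """De-quantization + B-IDCT back to an (unrounded, unclipped) pixel array."""
    C, Q = dct_matrix(), quant_table(quality)
    blocks = C.T @ (qcoeffs * Q) @ C + 128.0
    nb_h, nb_w = qcoeffs.shape[:2]
    return blocks.transpose(0, 2, 1, 3).reshape(nb_h * 8, nb_w * 8)
```

Because C is orthonormal, the inverse transform is simply C.T @ coefficients @ C, which keeps the decoder side to a single line.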

The workflow of single and double JPEG compression and decompression is shown in Fig. 1. The path from the raw image to the single compressed image illustrates single JPEG compression with quality factor Q1. The path from the decompressed data to the double compressed image denotes double JPEG compression with quality factor Q2. In the following sections, only the case Q1 ≠ Q2, which is common in practice, is discussed.
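Along the double compression path of Fig. 1, each DCT coefficient is effectively quantized with the step of the first table, de-quantized, and quantized again with the step of the second table. The Python sketch below isolates one coefficient position with hypothetical step sizes q1 and q2 and ignores the pixel-domain rounding between the two compressions (a simplifying assumption). With q1 = 3 and q2 = 2, some quantized values can never be produced after double quantization, which yields the periodic empty bins in the coefficient histogram that the histogram-based methods [6,7] exploit.

```python
import numpy as np

def quantize(x, q):
    """Uniform quantization (round half up) of values x with step q."""
    return np.floor(np.asarray(x, dtype=np.float64) / q + 0.5).astype(np.int64)

def double_quantize(x, q1, q2):
    """Quantize with step q1, de-quantize, then re-quantize with step q2."""
    return quantize(quantize(x, q1) * q1, q2)

x = np.arange(31)                           # hypothetical unquantized DCT values 0..30
single = np.bincount(quantize(x, 2))        # every bin 0..15 is populated
double = np.bincount(double_quantize(x, 3, 2))
empty_bins = np.flatnonzero(double == 0)    # holds 1, 4, 7, 10, 13: a period-3 pattern
```

The period of the empty bins depends on the ratio of the two steps, which is why such artifacts only appear when the quantization tables differ.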

III Proposed Method

Machine learning is an efficient tool for detecting double JPEG compression. Inspired by Chen's machine learning-based scheme [12], our method adopts a two-phase framework of feature extraction and classification. The feature extraction pipeline is presented in Fig. 2; it consists of two branches, shown in the red and green dotted boxes. The red branch introduces the correlations along various directions in the DCT domain: twelve high-pass filters are applied to the quantized DCT coefficients, and a transition probability matrix is calculated for each filtered array to build the feature set. The green branch further improves performance by applying the same idea separately to each of the first twenty AC coefficients in zig-zag order. In addition, principal component analysis (PCA) is used to reduce the feature dimension. In the classification stage, SVM [19] classifiers with a second-degree polynomial kernel are trained on the above features. The feature extraction is described in detail below.
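The PCA step can be sketched with a plain singular value decomposition in Python/NumPy: we assume the projection is learned on the training features only and then applied to both training and test features (for example, from 17010 to 1300 dimensions for the MODE-based feature). This is a generic PCA sketch with hypothetical function names, not the authors' code; the SVM stage is left to LIBSVM [19] and is not shown.

```python
import numpy as np

def fit_pca(x_train, n_components):
    """Learn a PCA projection from training features (rows = samples)."""
    if n_components > min(x_train.shape):
        raise ValueError("n_components cannot exceed min(n_samples, n_features)")
    mean = x_train.mean(axis=0)
    # right singular vectors of the centered data are the principal axes,
    # already sorted by decreasing singular value
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[:n_components].T  # (n_features, n_components)

def apply_pca(x, mean, components):
    """Project features (training or test) onto the learned principal axes."""
    return (x - mean) @ components
```

Note that the number of retained components is bounded by the number of training samples, so 1300 components require at least 1300 training feature vectors.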

Fig. 3: Visualization of the global and local features of single and double JPEG compressed images. The horizontal and vertical axes denote the feature index and the feature value, respectively. (a) shows the original image; (b) shows the feature of the correlations among multiple directions; (c), (d), (e), and (f) show the features of the first four MODEs.

III-A The Correlations Among Multiple Directions

Previous work [12] has shown that the correlations in the DCT domain provide an important clue for double JPEG compression detection, and that the transition probability matrix can effectively describe changes in these correlations. Following this idea, the correlations along multiple directions are explored in this part, as shown in the red dotted box of Fig. 2. First, the quantized DCT coefficients of the luminance channel are obtained with the jpeg_read function in Matlab, and their absolute values form the quantized matrix. Then twelve high-pass filters with different directions are applied to the quantized matrix, and the filtered data are truncated with a threshold.

(1)

where the absolute value function is applied element-wise, the values surrounding each coefficient are taken as shown in Fig. 2, and each high-pass filter uses a 5x5 window.

(2)

where the indicator function equals 1 if and only if its condition holds and 0 otherwise, and the truncation operation clips any filtered value whose magnitude exceeds the threshold. In our experiments, the threshold is set to 4.

(3)
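The directional filtering and truncation can be sketched in Python/NumPy as follows. The twelve 5x5 templates themselves are not specified in the text, so the sketch assumes one plausible family: each template subtracts one neighbor from the center, with one template per neighbor offset in the 5x5 window up to sign, which gives exactly twelve directions. It also assumes, in the style of Chen's scheme [12], that truncation clips values to [-T, T] with T = 4. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

T = 4  # truncation threshold stated in the text

def direction_kernels():
    """Hypothetical 5x5 templates: center minus one neighbor, one per offset up to sign."""
    offsets = [(dy, dx) for dy in range(0, 3) for dx in range(-2, 3) if (dy, dx) > (0, 0)]
    kernels = []
    for dy, dx in offsets:
        k = np.zeros((5, 5))
        k[2, 2] = 1.0
        k[2 + dy, 2 + dx] = -1.0
        kernels.append(k)
    return offsets, kernels  # 12 offsets, 12 kernels

def filter_and_truncate(abs_coeffs, kernel, t=T):
    """'Valid' 2-D correlation with a 5x5 template, then clip the result to [-t, t]."""
    windows = sliding_window_view(abs_coeffs, kernel.shape)  # (H-4, W-4, 5, 5)
    filtered = np.einsum("ijkl,kl->ij", windows, kernel)
    return np.clip(np.rint(filtered), -t, t).astype(np.int64)

# Usage on a hypothetical absolute-valued coefficient array
offsets, kernels = direction_kernels()
F = np.abs(np.random.default_rng(0).integers(-6, 7, size=(32, 32)))
filtered = [filter_and_truncate(F, k) for k in kernels]
```

Truncation keeps the alphabet of the filtered data small (2T + 1 = 9 values), which keeps the later transition matrices compact.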

Next, a transition probability matrix is calculated for each filtered array as follows:

(4)
(5)

where the subscripts denote matrix indices, and the two neighboring positions in Eq. (5) have the same positional relationship as those in Eq. (4). Finally, considering the symmetry of the DCT basis functions and the sign symmetry of the transition matrices, the matrices are combined as follows, generating a 410-dimensional feature of the correlations among multiple directions.

(6)
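A transition probability matrix in the style of [12] estimates, for each value i of a truncated array, the probability that the neighbor at a given offset takes the value j. The Python/NumPy sketch below leaves that neighbor offset as a parameter (our assumption, since the pairing in Eqs. (4) and (5) is not restated in the text), and it does not reproduce the symmetry-based merging of Eq. (6) that yields the 410-dimensional feature. The function name is hypothetical.

```python
import numpy as np

def transition_matrix(r, offset, t=4):
    """Empirical P(r[u+dy, v+dx] = j | r[u, v] = i) for i, j in [-t, t].

    r is a filtered array already truncated to [-t, t]; offset = (dy, dx) with dy >= 0.
    Rows (current values) that never occur are left as all zeros.
    """
    dy, dx = offset
    if dy < 0:
        raise ValueError("use the equivalent offset with dy >= 0")
    h, w = r.shape
    # pair each element with its neighbor at the offset, keeping both inside the array
    src = r[0:h - dy, max(0, -dx):w - max(0, dx)]
    dst = r[dy:h, max(0, dx):w - max(0, -dx)]
    n = 2 * t + 1
    counts = np.zeros((n, n))
    np.add.at(counts, (src.ravel() + t, dst.ravel() + t), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Tiny example: horizontal transitions in a 2x4 truncated array
r = np.array([[0, 1, 0, 1],
              [1, 1, 1, 1]])
m = transition_matrix(r, (0, 1))  # 9x9; row index 4 is value 0, row index 5 is value 1
```

With T = 4 each matrix has (2T + 1)^2 = 81 entries, so flattening one matrix per filtered array and concatenating them gives a raw feature vector before any symmetry merging.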

III-B The Correlations Based on MODE

The above feature represents, to some extent, the global information of an image. Local features, however, can provide a complementary perspective. Therefore, MODE is introduced into our method to improve performance; here a MODE consists of the coefficients at the same frequency. The framework for extracting the MODE-based correlations is shown in the green dotted box of Fig. 2. Unlike the above feature, the coefficients of the same MODE are first extracted from the quantized matrix to construct sub-matrices. Note that only the first twenty MODEs in zig-zag order are used in this work. To verify the effectiveness of the MODE-based feature, we randomly chose one image from the UCID dataset and extracted its features from the quantized DCT coefficients and from the coefficients of the first four MODEs. The differences between the features of the single and double JPEG compressed versions are visualized in Fig. 3, where the MODE-based features make the two versions easier to distinguish. Furthermore, the MODE-based feature has up to 17010 dimensions, and PCA is applied to reduce it to 1300.
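The MODE sub-matrices can be read directly from an image-sized array of quantized DCT coefficients (the layout returned by jpeg_read), since MODE (u, v) is the coefficient at row u and column v of every 8x8 block. The Python/NumPy sketch below generates the standard JPEG zig-zag order and returns the sub-matrices of the first twenty AC MODEs, following the description of the green branch in Section III (we assume the DC MODE is excluded). The function names are hypothetical; each sub-matrix would then be filtered and summarized as in Section III-A.

```python
import numpy as np

def zigzag_positions(n=8):
    """(row, col) positions of an n x n block in JPEG zig-zag order."""
    pos = [(i, j) for i in range(n) for j in range(n)]
    # walk the anti-diagonals: odd ones run down-left, even ones run up-right
    return sorted(pos, key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

def mode_submatrices(coeff_array, num_modes=20):
    """Per-MODE sub-matrices of an image-sized quantized DCT coefficient array.

    MODE (u, v) takes coefficient (u, v) from every 8x8 block, so each sub-matrix
    has one entry per block. The DC MODE is skipped (assumption: AC MODEs only).
    """
    h, w = coeff_array.shape
    if h % 8 or w % 8:
        raise ValueError("array must tile exactly into 8x8 blocks")
    modes = zigzag_positions()[1:num_modes + 1]
    return [coeff_array[u::8, v::8] for (u, v) in modes]
```

Strided slicing returns views, so extracting all twenty MODEs costs no copying until the sub-matrices are filtered.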

IV Experiments

In this section, experiments are conducted on the public image database UCID [18], which contains 1338 raw images, to evaluate the effectiveness of the proposed method. These images are compressed with quality factors Q1 and Q2 in sequence to generate the single and double JPEG compressed images, where Q1, Q2 ∈ {50, 55, ..., 95} and Q1 ≠ Q2. We randomly select the single and double compressed versions of 1138 images as training data to train the SVM classifiers with a second-degree polynomial kernel, and the remaining images are used as testing data. The detection accuracy is averaged over 20 random experiments, and the results are shown in Tables I-IV.
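The evaluation protocol can be sketched as repeated random splits over the 1338 source images. We assume, since the text does not say, that the single and double compressed versions of an image always fall on the same side of the split, so each of the 20 trials trains on 1138 images (2276 samples) and tests on 200 images (400 samples). The function names are hypothetical.

```python
import numpy as np

def random_splits(n_images=1338, n_train=1138, n_trials=20, seed=0):
    """Yield (train_ids, test_ids) of source images for each random trial."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        perm = rng.permutation(n_images)
        yield np.sort(perm[:n_train]), np.sort(perm[n_train:])

def labeled_samples(image_ids):
    """Each source image gives a single (label 0) and a double (label 1) compressed sample."""
    ids = np.repeat(image_ids, 2)
    labels = np.tile([0, 1], len(image_ids))
    return ids, labels
```

Averaging the per-trial test accuracies over the 20 splits gives one accuracy figure per (Q1, Q2) pair.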

Tables I and III show that the proposed algorithm based on the correlations among multiple directions outperforms the machine learning-based method [12], which indicates that introducing multiple directions benefits the detection of double JPEG compression. However, the detection accuracy of both schemes degrades when Q1 > Q2. The reason could be the limitation of global features, as discussed in Section III-B. As shown in Table IV, the local features of the MODE-based correlations typically yield a further performance gain and achieve detection performance comparable to that of the Benford's Law-based method [9] (Table II). In addition, the MODE-based scheme performs better than the Benford's Law-based method in most of the cases with Q1 = 95.

V Conclusions

In this paper, focusing on double JPEG compression detection, we explore features in the DCT domain from two perspectives: the correlations along various directions and the correlations based on MODE. By fusing these two ideas, the proposed method achieves detection performance comparable to that of the Benford's Law-based scheme [9], and higher detection accuracy in more difficult circumstances, such as Q1 = 95.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61672090, 61332012), the National Key Research and Development Program of China (2016YFB0800404), and the Fundamental Research Funds for the Central Universities (2015JBZ002, 2017YJS054).

References

  • [1] H. Farid, “Digital doctoring: how to tell the real from the fake,” Significance, vol. 3, no. 4, pp. 162-166, 2006.
  • [2] B. Zhu, M. Swanson, and A. Tewfik, “When seeing isn’t believing [multimedia authentication technologies],” IEEE Signal Processing Magazine, vol. 21, no. 2, pp. 40-49, 2004.
  • [3] “Photo tampering through history,” 2012, http://www.fourandsix.com/photo-tamper-history/.
  • [4] W. Luo, J. Huang, and G. Qiu, “JPEG error analysis and its applications to digital images,” IEEE Trans. Inf. Forensics Security, vol. 5, no. 3, pp. 27-38, Jan. 2010.
  • [5] A. Piva, “An overview on image forensics,” ISRN Signal Processing, Hindawi, 2013.
  • [6] A. C. Popescu, “Statistical tools for digital image forensics,” Ph.D. thesis, Department of Computer Science, Dartmouth College, Hanover, New Hampshire, Dec. 2004.
  • [7] B. Mahdian and S. Saic, “Detecting double compressed JPEG images,”
  • [8] D. Fu, Y. Q. Shi, and W. Sun, “A generalized Benford’s law for JPEG coefficients and its applications in image forensics,” in Proc. of SPIE, vol. 6505, pp. 39-48, Feb. 2009.
  • [9] B. Li, Y. Q. Shi, and J. Huang, “Detecting doubly compressed JPEG images by using mode based first digit features,” in Proc. of MMSP 2008, Cairns, Queensland, Australia, pp. 730-735, Oct. 2008.
  • [10] S. Milani, M. Tagliasacchi, and S. Tubaro, “Discriminating multiple JPEG compression using first digit features,” in Proc. ICASSP, IEEE, pp. 2253-2256, 2012.
  • [11] I. Amerini and R. Becarelli, “Splicing forgeries localization through the use of first digit features,” in Proc. WIFS, IEEE, 2014.
  • [12] C. H. Chen, Y. Q. Shi, and W. Su, “A machine learning based scheme for double JPEG compression detection,” in Proc. IEEE International Conference on Pattern Recognition, pp. 1814-1817, 2008.
  • [13] Q. Wang and R. Zhang, “Double JPEG compression forensics based on a convolutional neural network,” EURASIP Journal on Information Security, vol. 2016, no. 1, p. 23, 2016.
  • [14] M. Barni, L. Bondi, N. Bonettini, et al., “Aligned and non-aligned double JPEG detection using convolutional neural networks,” Journal of Visual Communication and Image Representation, vol. 49, pp. 153-163, 2017.
  • [15] V. Verma, N. Agarwal, and N. Khanna, “DCT-domain deep convolutional neural networks for multiple JPEG compression classification,” arXiv preprint arXiv:1712.02313, 2017.
  • [16] I. Amerini, T. Uricchio, L. Ballan, et al., “Localization of JPEG double compression through multi-domain convolutional neural networks,” in Proc. IEEE CVPR Workshop on Media Forensics, 2017.
  • [17] B. Li, H. Luo, H. Zhang, S. Tan, and Z. Ji, “A multi-branch convolutional neural network for detecting double JPEG compression,” in The 3rd International Workshop on Digital Crime and Forensics, Nanchang, 1-16, 2017.
  • [18] UCID image database, http://vision.cs.aston.ac.uk/datasets/ucid/ucid.html
  • [19] LIBSVM, https://www.csie.ntu.edu.tw/~cjlin/libsvm/
Q1\Q2   50   55   60   65   70   75   80   85   90   95
50       -   97  100  100  100  100  100  100  100  100
55      96    -   99  100  100  100  100  100  100  100
60     100   99    -   99  100  100  100  100  100  100
65     100  100  100    -   99  100  100  100  100  100
70     100  100  100  100    -  100  100  100  100  100
75      99  100  100  100  100    -  100  100  100  100
80     100  100   99  100  100  100    -  100  100  100
85      98   89   99  100   99  100  100    -  100  100
90      83   97   96   99   96  100   99  100    -  100
95      63   62   76   74   86   77   94   85  100    -
TABLE I: Detection accuracy (%) of the machine learning-based method [12].
Q1\Q2   50   55   60   65   70   75   80   85   90   95
50       -   98  100  100  100  100  100  100  100  100
55      98    -  100  100  100  100  100  100  100  100
60     100   99    -  100  100  100  100  100  100  100
65     100  100   99    -   99  100  100  100  100  100
70      99  100  100  100    -   99  100  100  100  100
75      95   99  100  100  100    -  100  100  100  100
80     100   99  100  100  100  100    -  100  100  100
85      99   98   99   99   99  100  100    -  100  100
90      97   98   98   98   99   99   99  100    -  100
95      76   80   90   95   92   96   98   99   99    -
TABLE II: Detection accuracy (%) of the Benford’s Law-based method [9].
Q1\Q2   50   55   60   65   70   75   80   85   90   95
50       -   97  100  100  100  100  100  100  100  100
55      96    -   99  100  100  100  100  100  100  100
60     100   99    -   99  100  100  100  100  100  100
65     100  100  100    -   99  100  100  100  100  100
70     100  100  100  100    -  100  100  100  100  100
75      99  100  100  100  100    -  100  100  100  100
80     100  100   99  100  100  100    -  100  100  100
85      99   96  100  100   99  100  100    -  100  100
90      91   98   97   99   97  100   99  100    -  100
95      72   79   83   86   92   88   96   94  100    -
TABLE III: Detection accuracy (%) of the proposed method based on the correlations among multiple directions.
Q1\Q2   50   55   60   65   70   75   80   85   90   95
50       -   95  100  100  100  100  100  100  100  100
55      94    -   99  100  100  100  100  100  100  100
60     100   99    -   99  100  100  100  100  100  100
65     100  100   99    -  100  100  100  100  100  100
70     100  100  100   99    -  100  100  100  100  100
75      99  100  100  100  100    -  100  100  100  100
80      99   99   99  100  100  100    -  100  100  100
85      98   97   99   99   99  100  100    -  100  100
90      94   97   99   98   98   99   99  100    -  100
95      79   86   92   94   96   98   98   99  100    -
TABLE IV: Detection accuracy (%) of the proposed method based on MODE.