A Deep Learning Based Attack for The Chaos-based Image Encryption

Chen He, Kan Ming, Yongwei Wang, and Z. Jane Wang

In this letter, as a proof of concept, we propose a deep learning-based approach to attack the chaos-based image encryption algorithm in [1]. The proposed method first projects the chaos-based encrypted images into a low-dimensional feature space, where essential information of the plain images is largely preserved. From these low-dimensional features, a deconvolutional generator regenerates perceptually similar decrypted images that approximate the plain images in the high-dimensional space. Compared with conventional attacks on image encryption, the proposed method does not require manually analyzing the algorithm and inferring keys, which is time-consuming. Instead, we directly attack the chaos-based encryption algorithm in a key-independent manner. Moreover, the proposed method can be trained end-to-end. Given chaos-based encrypted images, a well-trained decryption model automatically reconstructs plain images with high fidelity. In the experiments, we successfully attack the chaos-based algorithm [1], and the decrypted images are visually similar to their ground-truth plain images. Experimental results in both static-key and dynamic-key scenarios verify the efficacy of the proposed method.


chaos-based encryption, image decryption, deep learning.

I Introduction

Characteristics of chaotic systems such as pseudo-randomness, unpredictability of orbits, and sensitivity to the initial state and control parameters agree well with many requirements of cryptography, so chaotic cryptography has been studied extensively. With this motivation, a chaos-based image encryption algorithm was proposed in [1]. The algorithm combines confusion and diffusion from traditional cryptography: it uses Arnold’s cat map [2] to shuffle the positions of plain-image pixels to introduce diffusion, and Chen’s chaotic system [3] to change the grayscale values of the shuffled pixels to introduce confusion. The combined transformation of confusion and diffusion provides greater security than using either separately.

To ensure the security of encryption algorithms, researchers continue to analyze the vulnerabilities of various encryption schemes and try to attack them. A traditional approach is to infer the key by manually analyzing the encryption algorithm. For example, the authors of [4] demonstrated a chosen-plaintext attack and a known-plaintext attack that reveal the secret parameters of the encryption algorithm in [1]. However, this approach is key-dependent, time-consuming, and labor-intensive. Another approach is to search for keys in a key dictionary maintained with special algorithms. For instance, in [5] a key dictionary was constructed by using machine learning algorithms to generate more human-compliant keys, and the attacker searches through the dictionary until the correct key is found. In addition to being time-consuming, this solution has a high chance of not finding the right key.

In this letter, as a proof of concept, we propose a novel deep learning-based image decryption approach to attack the chaos-based image encryption algorithm [1]. First, we extract essential features from cipher-images with a convolutional encoder. Then, with a symmetric deconvolutional generator network, we regenerate decrypted images from the features to approximate the corresponding plain images. Experimental results demonstrate the effectiveness of the proposed method for cracking the algorithm in [1] in both static-key and dynamic-key encryption cases. Compared with previous attacks on encrypted images [4, 5], our method does not require a time-consuming manual analysis of the algorithm to infer the key. Instead, we directly attack the chaos-based encryption algorithm in a key-independent manner. Once trained, the decryption model can quickly and automatically reconstruct the plain image from the encrypted image with high fidelity.

Fig. 1: Illustration of the proposed deep learning-based image decryption scheme for the chaos-based image encryption method [1]. The proposed method enables end-to-end relationship inference between plain images and chaos-based encrypted images. The decryption model can be divided into two parts: the first part projects cipher images into the low-dimensional feature space with a convolutional encoder, and the second part regenerates the decrypted image with a deconvolutional generator.

II The chaos-based image encryption algorithm

There are two steps in the chaos-based image encryption algorithm [1]: first, the positions of image pixels in the plain images are shuffled using Arnold’s cat map [2]. Then, the grayscale values of the shuffled image pixels are changed using Chen’s chaotic system [3].

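II-A Arnold’s cat map

Arnold’s cat map [2] is a chaotic map that repeatedly stretches and folds a finite region, and it is widely used in chaotic multimedia encryption. Without loss of generality, assume that we have an original grayscale image of size $N \times N$ with pixel coordinates $(x, y)$, where $x, y \in \{0, 1, \ldots, N-1\}$. Arnold’s cat map can be expressed as

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = A \begin{bmatrix} x \\ y \end{bmatrix} \bmod N = \begin{bmatrix} 1 & p \\ q & pq+1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod N,$$

where $p$ and $q$ are positive integers and $(x', y')$ is the new coordinate of the original pixel $(x, y)$ after iterating the map once. After Arnold’s cat map has been performed $M$ times, we have

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = A^{M} \begin{bmatrix} x \\ y \end{bmatrix} \bmod N.$$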
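As an illustration, the pixel shuffle can be sketched with NumPy. This is a minimal sketch, assuming the two-parameter form of the map with matrix [[1, p], [q, pq + 1]] applied modulo the image side length; the function names `arnold_cat_map` and `inverse_arnold_cat_map` are hypothetical and not taken from [1] or [2].

```python
import numpy as np


def _mapped_coordinates(n, p, q):
    # Coordinates (x, y) of every pixel and their images (x', y') under the map.
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    new_x = (x + p * y) % n
    new_y = (q * x + (p * q + 1) * y) % n
    return x, y, new_x, new_y


def arnold_cat_map(image, p, q, iterations=1):
    """Shuffle a square N x N image by iterating the cat map."""
    n = image.shape[0]
    assert image.shape[:2] == (n, n), "the map is defined on square images"
    x, y, new_x, new_y = _mapped_coordinates(n, p, q)
    shuffled = image
    for _ in range(iterations):
        moved = np.empty_like(shuffled)
        # The map matrix has determinant 1, so it is a bijection modulo N
        # and every target position is written exactly once.
        moved[new_x, new_y] = shuffled[x, y]
        shuffled = moved
    return shuffled


def inverse_arnold_cat_map(image, p, q, iterations=1):
    """Undo `arnold_cat_map` by reading each pixel back from its target."""
    n = image.shape[0]
    x, y, new_x, new_y = _mapped_coordinates(n, p, q)
    restored = image
    for _ in range(iterations):
        previous = np.empty_like(restored)
        previous[x, y] = restored[new_x, new_y]
        restored = previous
    return restored
```

Because the map only permutes pixel positions, the histogram of the image is unchanged, which is why the algorithm adds a second stage that alters the grayscale values.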




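II-B Chen’s chaotic system

Chen’s chaotic system [3] is described by the set of differential equations

$$\begin{aligned} \dot{x} &= a(y - x), \\ \dot{y} &= (c - a)x - xz + cy, \\ \dot{z} &= xy - bz, \end{aligned}$$

where $a$, $b$, and $c$ are parameters. The system is chaotic when $a = 35$, $b = 3$, and $c \in [20, 28.4]$ [3, 1].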
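A trajectory of Chen’s system can be approximated numerically. The following is a minimal sketch, assuming the classical fourth-order Runge-Kutta scheme and the commonly used parameter values a = 35, b = 3, c = 28; the function names and the initial state in the usage note are hypothetical.

```python
import numpy as np


def chen_derivative(state, a=35.0, b=3.0, c=28.0):
    """Right-hand side of Chen's system for state = (x, y, z)."""
    x, y, z = state
    return np.array([a * (y - x), (c - a) * x - x * z + c * y, x * y - b * z])


def rk4_step(state, h, **params):
    """Advance the state by one fourth-order Runge-Kutta step of size h."""
    k1 = chen_derivative(state, **params)
    k2 = chen_derivative(state + 0.5 * h * k1, **params)
    k3 = chen_derivative(state + 0.5 * h * k2, **params)
    k4 = chen_derivative(state + h * k3, **params)
    return state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def chen_trajectory(initial_state, steps, h=0.001, **params):
    """Return an array of shape (steps, 3) holding the state after each step."""
    state = np.asarray(initial_state, dtype=np.float64)
    trajectory = np.empty((steps, 3))
    for i in range(steps):
        state = rk4_step(state, h, **params)
        trajectory[i] = state
    return trajectory
```

For example, `chen_trajectory((0.1, 0.2, 0.3), steps=1000)` integrates the system for 1000 steps from an arbitrary initial state.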

II-C The chaos-based encryption algorithm

In [1], the secret keys of the encryption algorithm are the parameters $p$ and $q$ and the number of iterations $M$ of Arnold’s cat map, together with the initial value of Chen’s chaotic system, i.e., $x_0$, $y_0$, $z_0$. The specific steps are as follows:
(1) Obtain the shuffled image by using Arnold’s cat map to shuffle the plain image.
(2) Get a pixel sequence by scanning the shuffled image from left to right and then from top to bottom.
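(3) For an $N \times N$ image, iterate Chen’s chaotic system $N_0 = \lceil N^2 / 3 \rceil$ times using the Runge-Kutta method with step size 0.001. Each iteration $i$ yields three values $x_i$, $y_i$, and $z_i$. Processing these values as

$$\begin{aligned} k_{3(i-1)+1} &= \lfloor (|x_i| - \lfloor |x_i| \rfloor) \times 10^{14} \rfloor \bmod 256, \\ k_{3(i-1)+2} &= \lfloor (|y_i| - \lfloor |y_i| \rfloor) \times 10^{14} \rfloor \bmod 256, \\ k_{3(i-1)+3} &= \lfloor (|z_i| - \lfloor |z_i| \rfloor) \times 10^{14} \rfloor \bmod 256 \end{aligned}$$

gives the encryption key sequence $\{k_j\}$, where $|x|$ is the absolute value of $x$ and $\lfloor x \rfloor$ means rounding down, i.e., it returns the largest integer not larger than $x$. In the cryptosystem, all variables have 15-digit precision when expressed in scientific notation, so the decimal fractions of the variables need to be multiplied by $10^{14}$.
(4) Encrypt the shuffled sequence, with $s_j$ denoting its $j$-th pixel, by

$$c_{3(i-1)+m} = s_{3(i-1)+m} \oplus k_{3(i-1)+m}, \quad m = 1, 2, 3,$$

where $i = 1, 2, \ldots, N_0$ and $\oplus$ represents the bitwise exclusive OR operation. This yields the encrypted sequence $\{c_j\}$.
(5) Obtain the cipher-image by reshaping the encrypted sequence into an image.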
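The five steps above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the reference implementation of [1]: it assumes fourth-order Runge-Kutta integration, Chen parameters a = 35, b = 3, c = 28, a key byte formed as floor(fractional part × 10^14) mod 256 (inferred from the 15-digit precision and the 8-bit grayscale range), and one Chen iteration per three pixels. All function names are hypothetical.

```python
import numpy as np


def cat_map_shuffle(image, p, q, iterations):
    """Step (1): shuffle a square 2-D image with Arnold's cat map."""
    n = image.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    new_x, new_y = (x + p * y) % n, (q * x + (p * q + 1) * y) % n
    for _ in range(iterations):
        moved = np.empty_like(image)
        moved[new_x, new_y] = image
        image = moved
    return image


def chen_step(state, h, a, b, c):
    """One fourth-order Runge-Kutta step of Chen's system."""
    def f(s):
        x, y, z = s
        return np.array([a * (y - x), (c - a) * x - x * z + c * y, x * y - b * z])
    k1 = f(state)
    k2 = f(state + 0.5 * h * k1)
    k3 = f(state + 0.5 * h * k2)
    k4 = f(state + h * k3)
    return state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def key_sequence(length, initial_state, h=0.001, a=35.0, b=3.0, c=28.0):
    """Step (3): turn the chaotic orbit into `length` key bytes."""
    state = np.asarray(initial_state, dtype=np.float64)
    keys = []
    for _ in range(-(-length // 3)):  # ceil(length / 3) iterations, 3 bytes each
        state = chen_step(state, h, a, b, c)
        fraction = np.abs(state) - np.floor(np.abs(state))
        # Assumed preprocessing: scale the fractional part and reduce mod 256.
        keys.extend((np.floor(fraction * 1e14) % 256).astype(np.uint8))
    return np.array(keys[:length], dtype=np.uint8)


def encrypt(image, p, q, iterations, initial_state):
    """Encrypt a square uint8 grayscale image following steps (1) to (5)."""
    shuffled = cat_map_shuffle(image, p, q, iterations)   # step (1)
    sequence = shuffled.reshape(-1)                       # step (2): row-major scan
    keys = key_sequence(sequence.size, initial_state)     # step (3)
    cipher = np.bitwise_xor(sequence, keys)               # step (4)
    return cipher.reshape(image.shape)                    # step (5)
```

Since XOR is its own inverse, a holder of the keys can decrypt by XOR-ing the cipher sequence with the same key sequence and then inverting the cat map.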

III Proposed method

To attack the chaos-based image encryption algorithm [1], we need to find a complex mapping function that models the inverse transform from encrypted images to plain images. We employ deep convolutional neural networks (CNN) [6, 7] to model such complex inverse functions [8, 9].

As shown in Fig. 1, the model is divided into convolutional groups and deconvolutional groups. In the convolutional groups, the inputs are the cipher images, described as . We design several convolutional layers to analyze the composition of the input image and obtain low-dimensional features; this operation is defined as . In the deconvolutional groups, we perform the opposite operation to the convolution stage and reconstruct plain images with high fidelity. The inverting operation is described as . The regenerated images are compared with their ground-truth plain images, given as the target , and we use the mean squared error (MSE) as the loss function [10, 11, 12, 13]. After training, the model can be used to attack the chaos-based image encryption algorithm [1].

III-A Network Architecture

As shown in Fig. 2, the convolutional groups are on the left and the deconvolutional groups are on the right; the two parts are basically symmetrical.

Fig. 2: The network structure of the deep learning-based model, which is also mainly composed of the convolutional groups and deconvolution groups. The former mainly includes convolution layers and the latter mainly has deconvolution layers.

In convolutional groups 1 to 5, to ensure that the information at the edge of the image can be utilized, one-dimensional zero padding [6] is first applied around the input image. Then the padded image is convolved with a convolutional layer ( kernel size, stride). Next, we perform batch normalization on the output [14]. Finally, the output is passed through the ReLU [15] function, described as $f(x) = \max(0, x)$, which introduces non-linearity to the neurons. In convolutional group 6, we only construct a convolutional layer ( kernel size, stride) and activation function [6].

In the convolutional groups above, the number of output feature maps of the convolutional layer in the first group is , while each of the other convolutional layers outputs twice as many feature maps as it receives as input. For each of the groups, the convolutional operation is given by


where and denote the weights and biases of convolutional filters, respectively. represents the convolution operator and represents .

Corresponding to the convolutional groups, there are six deconvolutional groups. The first deconvolutional group includes a deconvolutional layer ( kernel size, stride) followed by a layer and a function. In deconvolutional groups 2 to 6, we design a deconvolutional layer ( kernel size, stride) with one-dimensional zero padding, followed by a layer and function except in group 6.

In the above deconvolutional groups, the number of output feature maps for the deconvolution layer in the last group is one, while the number of output feature maps for other deconvolution layers is half of the corresponding input. For a single group , given the deconvolution operator , deconvolutional filter weights and biases , the deconvolutional operation is described as


In the network designed above, we use convolution layers rather than downsampling layers to project the input into a low-dimensional space because downsampling layers lose more of the original information. In addition, the convolution kernel size is used to simply and efficiently project the input into the low-dimensional space and to ensure the symmetry of the projected and reconstructed plain image.
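The symmetry between the two halves of the network can be checked with simple shape arithmetic. The sketch below is illustrative only: the kernel sizes, strides, padding, input size, and first-layer channel count are hypothetical placeholders, not the values used in the paper. It only encodes the rules stated above (six groups per half, channels doubling in the encoder and halving in the decoder, a single output feature map at the end).

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_output_size(size, kernel, stride, padding):
    """Spatial size after a transposed (de)convolution."""
    return (size - 1) * stride - 2 * padding + kernel


# Hypothetical (kernel, stride, padding) settings, chosen only so that the
# decoder mirrors the encoder; the paper's actual values are not assumed.
ENCODER = [(4, 2, 1)] * 5 + [(2, 1, 0)]
DECODER = [(2, 1, 0)] + [(4, 2, 1)] * 5


def trace_sizes(input_size, first_channels):
    """Return the spatial sizes and channel counts after every group."""
    sizes, channels = [input_size], [1]
    for i, (k, s, p) in enumerate(ENCODER):
        sizes.append(conv_output_size(sizes[-1], k, s, p))
        # First group sets the channel count, later groups double it.
        channels.append(first_channels if i == 0 else channels[-1] * 2)
    for i, (k, s, p) in enumerate(DECODER):
        sizes.append(deconv_output_size(sizes[-1], k, s, p))
        # Decoder groups halve the channel count; the last one outputs 1 map.
        channels.append(1 if i == len(DECODER) - 1 else channels[-1] // 2)
    return sizes, channels
```

With these placeholder settings, `trace_sizes(64, 16)` shrinks a 64-pixel side to a single spatial position and then expands it back to 64, so the output has the same size as the input.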

III-B Network Training

To model the inverse transform from encrypted images to plain images, we minimize the distance, measured by the MSE, between the output of the network and the plain images that correspond to the input cipher images. In addition, we introduce weight decay regularization to improve generalization and avoid overfitting [16, 17, 18]. Given the regularization weighting coefficient , the loss function is the MSE term plus the weighted weight decay term.




After defining the loss function, we train the network with the Adam [19] optimization method. As each data batch () is fed in iteratively, the filters are updated in the direction that minimizes the loss function. We set the initial learning rate to , and the coefficients used for computing running averages of the gradient and its square are .
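The training objective and the optimizer update can be sketched in NumPy. This is a minimal sketch under assumptions: the weight decay term is taken to be the sum of squared weights, and the learning rate and running-average coefficients shown are the defaults suggested in the Adam paper [19], not necessarily the values used here. The function names and the toy linear model are hypothetical.

```python
import numpy as np


def regularized_mse(outputs, targets, weights, weight_decay):
    """MSE between outputs and plain images plus an L2 penalty on the weights."""
    mse = np.mean((outputs - targets) ** 2)
    penalty = sum(np.sum(w ** 2) for w in weights)
    return mse + weight_decay * penalty


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running averages of grad and grad ** 2."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)  # bias correction for the first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias correction for the second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


# Toy usage: fit a single linear "filter" w so that x @ w approximates y.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
y = x @ np.array([1.0, -2.0, 0.5, 3.0])
w, m, v = np.zeros(4), np.zeros(4), np.zeros(4)
weight_decay = 1e-4
initial_loss = regularized_mse(x @ w, y, [w], weight_decay)
for t in range(1, 501):
    # Gradient of the regularized loss with respect to w.
    grad = 2.0 * x.T @ (x @ w - y) / y.size + 2.0 * weight_decay * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
final_loss = regularized_mse(x @ w, y, [w], weight_decay)
```

In the actual network the gradient would come from backpropagation through the convolutional and deconvolutional groups; the toy model only illustrates how the loss and the update rule fit together.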

IV Experiments

IV-A Cracking of images encrypted with static-keys

In the static-key case, the MNIST dataset is used for training and cracking evaluations. In detail, we use plain images from the training set of the MNIST dataset, and the remaining plain images from the test set. The precision is set as when encrypting plain images using the chaos-based image encryption algorithm [1]. The secret keys of Arnold’s cat map are chosen as: , , ; the parameters of Chen’s chaotic system are selected as: , , ; and the initial condition of Chen’s system is: , , . For details on selecting the secret parameters, please refer to [1]. Next, we resize all images to pixels and store them in PNG format. Finally, we use the resulting images (i.e., pairs of cipher-plain images) from the training set to train the network. To test the decryption performance, we feed the encrypted images from the test set (i.e., encrypted images) to the well-trained network.

In Fig. 3, we show several representative decryption results. The top row shows ten plain digits that were randomly selected from the ten MNIST digit categories. The cipher and regenerated images corresponding to each digit sample in the first row are shown in the second and third rows, respectively. In the second row, each digit has been heavily shuffled and altered by the chaos-based encryption [1]. Comparing the regenerated images with the plain images, we observe that the regenerated images mostly restore the content of their plain images. One can clearly tell the digit represented by each regenerated image.

Fig. 3: Cracking of images encrypted with static keys. From top to bottom are the plain-images, cipher-images, and regenerated images. The regenerated images restore the plain-images to a very high degree, and each number can be clearly distinguished.
Fig. 4: Cracking results for images encrypted with dynamic-keys. From top to bottom are the plain-images, cipher-images, and regenerated images. The regenerated images still restore most of the information in the plain-images, and the numbers on each image can be easily distinguished, except for individual numbers such as .

IV-B Cracking of images encrypted with dynamic-keys

In this experiment, we evaluate the image cracking performance in the dynamic-key setting. Specifically, we change the selection of the secret parameters of Arnold’s cat map by setting and from a range to . During the encryption process, and are randomly selected within this given range. All other secret key settings remain the same as in the static-key experiment in subsection IV-A. To increase the coverage of our network over the different encryption transformations caused by distinct keys, we encrypt each sample of the MNIST training set four times using a set of dynamic keys. This gives pairs of cipher-plain images to train the network. Similarly, the test set is encrypted with dynamic keys, and the test results are shown in Fig. 4.
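The construction of the dynamic-key training set can be sketched as follows. This is a hypothetical sketch: it assumes the two randomized keys are the two cat-map parameters (called `p` and `q` here), takes the encryption routine as a caller-supplied function `encrypt(image, p, q)`, and leaves the key range as a parameter because the range used in the experiment is not assumed.

```python
import numpy as np


def build_dynamic_key_pairs(plain_images, encrypt, key_range, copies=4, seed=0):
    """Encrypt every plain image `copies` times, each time with fresh keys.

    Returns a list of (cipher_image, plain_image) training pairs.
    """
    rng = np.random.default_rng(seed)
    low, high = key_range
    pairs = []
    for plain in plain_images:
        for _ in range(copies):
            # Draw both keys uniformly from the inclusive range [low, high].
            p, q = rng.integers(low, high + 1, size=2)
            pairs.append((encrypt(plain, int(p), int(q)), plain))
    return pairs
```

With `copies=4`, a training set of n plain images yields 4n cipher-plain pairs, and each plain image appears with several different ciphers, so the network cannot rely on one fixed permutation.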

In the experimental results in Fig. 4, the plain-images, cipher-images, and regenerated images are again arranged from top to bottom. The regenerated images still restore most of the essential information of the plain-images, but there are also a few collapsed results, e.g., the restored image of the number in Fig. 4. Apart from these individual failed cases, the numbers on the images can still be easily distinguished.

V Evaluations

V-A Quantitative Evaluations

To measure the quality of our experimental results, we use the LeNet-5 network [20] to classify the decrypted images. LeNet-5 classifies the MNIST dataset with high precision (its accuracy on the MNIST test set can reach ), and its structure is relatively simple, so it is an efficient evaluation tool. We first test the classification accuracy of LeNet-5 on the original MNIST test set and then use it to evaluate our decrypted images. Table I shows the classification accuracy rates of LeNet-5 on the MNIST test set and on our experimental results. The accuracy rate on the original MNIST test set is 98.77%, which indicates the effectiveness of LeNet-5 for MNIST digit recognition. To quantitatively test our pre-trained decryption model, we first encrypt the MNIST test set using the chaos-based encryption [1] in the static-key scenario. The encrypted MNIST digits are then decrypted by our pre-trained model to approximate the plain images. Finally, the decrypted images are recognized by the LeNet-5 model. The accuracy is calculated by comparing the labels predicted from the decrypted images with the ground-truth labels of their plain-image counterparts. The recognition accuracy is 97.87%, which is only slightly lower than that on the plain images, so the decrypted images can be recognized quite well. Similarly, we test the recognition accuracy of decrypted images in the dynamic-key scenario. As shown in Table I, the accuracy rate is 92.04%. From the recognition results, we conclude that the proposed image decryption method can effectively crack the chaos-based image encryption method [1].

Image Set          Encryption   Encryption case   Classification accuracy rate
original MNIST     no           none              98.77%
decrypted MNIST    yes          static-key        97.87%
decrypted MNIST    yes          dynamic-key       92.04%
TABLE I: Classification accuracy rates of LeNet-5
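The accuracy rates in Table I follow from comparing predicted and ground-truth labels. A minimal sketch of that computation is given below; `classifier` stands for any function that maps an image to a digit label (a trained LeNet-5 in the experiments), and both the function name and the toy classifier in the usage note are hypothetical.

```python
import numpy as np


def recognition_accuracy(classifier, images, labels):
    """Fraction of images whose predicted label equals the ground-truth label."""
    predictions = np.array([classifier(image) for image in images])
    return float(np.mean(predictions == np.asarray(labels)))
```

For example, with a toy classifier `lambda img: int(img[0, 0])` and four constant images filled with 1, 2, 3, and 4, the labels `[1, 2, 3, 0]` give an accuracy of 0.75.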

V-B Comparison

The attack method proposed in [4] relies on manually analyzing the encryption algorithm [1]. Its chosen-plaintext attack requires feeding specifically constructed plaintexts into the encryption algorithm [1], but in many cases this condition is not available. In contrast, our approach does not require specific inputs. For the known-plaintext attack in [4], extracting the secret parameters of Arnold’s cat map is a process of continuously narrowing the set of candidate encryption transformations: the attacker must enumerate the possible transformations until the secret parameters are found, which is an extremely time-consuming and labor-intensive process. In comparison, our method has the following advantages. The decryption method in [4] needs to be carried out separately for each crack, whereas our method does not require cracking the keys before decryption, i.e., it is key-independent. For instance, in the dynamic-key encryption case, it only needs to be trained once to automatically decrypt MNIST digits encrypted with different keys. The experimental results confirm that the proposed method has a certain degree of generalization ability.

VI Conclusion

In this letter, we present a new deep learning-based attack on a chaos-based image encryption algorithm [1]. The proposed method first projects encrypted images into a low-dimensional feature space. Decrypted images are then perceptually reconstructed with the deconvolutional generator. Experimental results verify the decryption accuracy of the proposed method in both static-key and dynamic-key encryption cases. Compared with the previous method, our solution is key-independent and automatic. In future work, we will explore the feasibility of decrypting chaos-based video encryption and other image encryption schemes.


  • [1] Z.-H. Guan, F. Huang, and W. Guan, “Chaos-based image encryption algorithm,” Physics Letters A, vol. 346, no. 1-3, pp. 153–157, 2005.
  • [2] G. Peterson, “Arnold’s cat map,” Math45-Linear algebra, http://online.redwoods.cc.ca.us/instruct/darnold/maw/catmap.htm, 1997.
  • [3] G. Chen and T. Ueta, “Yet another chaotic attractor,” International Journal of Bifurcation and Chaos, vol. 9, no. 07, pp. 1465–1466, 1999.
  • [4] C. Cokal and E. Solak, “Cryptanalysis of a chaos-based image encryption algorithm,” Physics Letters A, vol. 373, no. 15, pp. 1357–1360, 2009.
  • [5] B. Hitaj, P. Gasti, G. Ateniese, and F. Perez-Cruz, “Passgan: A deep learning approach for password guessing,” arXiv preprint arXiv:1709.00440, 2017.
  • [6] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
  • [7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [8] Y. Wang, H. Palangi, Z. J. Wang, and H. Wang, “Revhashnet: Perceptually de-hashing real-valued image hashes for similarity retrieval,” Signal processing: Image communication, vol. 68, pp. 68–75, 2018.
  • [9] Y. Wang, R. Ward, and Z. J. Wang, “Coarse-to-fine image dehashing using deep pyramidal residual learning,” IEEE Signal Processing Letters, vol. PP, pp. 1–1, 05 2019.
  • [10] M. Ishikawa, “Structural learning with forgetting,” Neural networks, vol. 9, no. 3, pp. 509–521, 1996.
  • [11] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in European conference on computer vision.   Springer, 2014, pp. 184–199.
  • [12] W. Lotter, G. Kreiman, and D. Cox, “Deep predictive coding networks for video prediction and unsupervised learning,” arXiv preprint arXiv:1605.08104, 2016.
  • [13] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1334–1373, 2016.
  • [14] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
  • [15] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011, pp. 315–323.
  • [16] J. E. Moody, “The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems,” in Advances in neural information processing systems, 1992, pp. 847–854.
  • [17] W. S. Sarle, “Stopped training and other remedies for overfitting,” Computing science and statistics, pp. 352–360, 1996.
  • [18] I. Loshchilov and F. Hutter, “Fixing weight decay regularization in adam,” arXiv preprint arXiv:1711.05101, 2017.
  • [19] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [20] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner et al., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.