Demoiréing of Camera-Captured Screen Images Using Deep Convolutional Neural Network
Taking photos of optoelectronic displays is a direct and spontaneous way of transferring data and keeping records, and it is widely practiced. However, due to the analog signal interference between the pixel grids of the display screen and the camera sensor array, objectionable moiré (aliasing) patterns appear in captured screen images. As moiré patterns are structured and highly variable, they are difficult to remove completely without affecting the underlying latent image. In this paper, we propose a deep convolutional neural network (DCNN) approach for demoiréing screen photos. The proposed DCNN consists of a coarse-scale network and a fine-scale network. In the coarse-scale network, the input image is first downsampled and then processed by stacked residual blocks to remove the moiré artifacts. After that, the fine-scale network upsamples the demoiréd low-resolution image back to the original resolution. Extensive experimental results demonstrate that the proposed technique can efficiently remove moiré patterns from camera-acquired screen images and that it outperforms existing methods.
1 Introduction
For many people, capturing screen content with a camera has become a spontaneous and convenient way of exchanging information and keeping records across different media platforms. This user behavior is quite natural given the ubiquity and wide use of digital cameras and information displays of all types, particularly those integrated into portable personal devices such as smartphones and tablets. In terms of multimedia interfaces, the alternative of issuing a print-screen command, saving the resulting image file and emailing it is cumbersome and less intuitive. In some situations, such as with public displays, digitally saving the displayed image is not even possible. However, photographing a screen with a camera, an analog process, can severely degrade image quality. As both the display screen (as shown in Fig. 0(a)) and the camera sensor array (as shown in Fig. 0(b)) need to resample an image in order to produce color, the interference between the sampling grids of the display and the camera often generates objectionable moiré (aliasing) patterns.
Despite how common and annoying the problem is, little work has been done on reducing moiré artifacts in camera-captured screen images. In an attempt to suppress moiré patterns, a camera often has a layer of optical low-pass (anti-aliasing) filter placed in front of the sensor array. However, unless image sharpness is sacrificed, this optical approach has limited effectiveness against the strong moiré patterns commonly found in camera-captured screen images. Another approach is to use adaptive digital filtering. While this method shows some improvement over the optical approach, its results are still far from satisfactory. After all, as suggested by the Nyquist-Shannon sampling theorem [15], it is very difficult, if not impossible, for conventional signal processing methods to remove aliases like moiré patterns completely: once high-frequency content is sampled below its Nyquist rate, its aliases become indistinguishable from genuine low-frequency image content.
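The aliasing argument can be made concrete with an idealized one-dimensional example. The sketch below is not part of the proposed method: it assumes the display's periodic subpixel structure can be treated as a single sinusoid of frequency `f` and the camera as sampling it at rate `fs` (both in cycles per sensor-pixel pitch, with made-up values), and it ignores lens blur, the optical low-pass filter and color. A pattern just below the sampling rate yields exactly the same samples as a much coarser pattern, which is the wide band a viewer perceives as moiré.

```python
import math

def alias_frequency(f, fs):
    """Apparent (aliased) frequency of a sinusoid of frequency f sampled at rate fs.

    For f >= 0 the result lies in [0, fs / 2].
    """
    return abs(f - fs * round(f / fs))

def sample_cosine(f, fs, n_samples):
    """Sample cos(2*pi*f*t) at t = n / fs for n = 0 .. n_samples - 1."""
    return [math.cos(2 * math.pi * f * n / fs) for n in range(n_samples)]

# Assumed numbers: a display pattern at 0.95 cycles per sensor pixel,
# sampled once per sensor pixel (fs = 1).
f_display, fs_camera = 0.95, 1.0
f_moire = alias_frequency(f_display, fs_camera)
print(f"apparent frequency: {f_moire:.2f} cycles/pixel, period: {1 / f_moire:.0f} pixels")
# -> apparent frequency: 0.05 cycles/pixel, period: 20 pixels

# The fine pattern and its low-frequency alias produce identical samples.
fine = sample_cosine(f_display, fs_camera, 40)
coarse = sample_cosine(f_moire, fs_camera, 40)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fine, coarse)))  # True
```

No filter applied after sampling can tell `fine` from `coarse`, which is why removal has to rely on prior knowledge of what clean images and moiré patterns look like.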
In this work, we tackle the problem of restoring camera-captured screen images degraded by moiré artifacts using neural network techniques, as shown in Fig. 2. Machine learning methods, deep convolutional neural networks (DCNN) in particular, hold the promise of solving the demoiréing problem satisfactorily, as they can exploit the distinctive statistics of moiré-free images and of moiré patterns learned from suitable training data.
In developing our DCNN technique for demoiréing, we face the challenge of obtaining exactly matched pairs of real moiré-interfered screenshots and their corresponding clean images for training: it is very difficult to produce a moiré-free screenshot and to align the pair of images perfectly in space. Thus, instead of relying solely on real captured images, we carefully model the formation of moiré patterns when an LCD screen is captured with a Bayer color filter array (CFA) camera, and generate a large number of synthetic training images with realistic moiré patterns from original digital images. While data synthesis makes it convenient to build a large training set quickly, synthetic data may still deviate from real camera-captured screen images in certain aspects, such as camera shake, chromatic aberration, dirty screen surfaces and reflections. To improve the robustness of the proposed technique in realistic settings, we introduce a novel two-stage training procedure, which pretrains the neural network with synthetic data and then retrains it using real images together with the results from the first stage. Another contribution of this work is a multi-scale neural network architecture. As moiré patterns exist at different scales and exhibit vastly different characteristics at each, we find that removing moiré artifacts gradually, from coarse to fine, greatly improves the effectiveness of the proposed DCNN technique.
The remainder of the paper is organized as follows. Section 2 provides an overview of related work in the literature. In Section 3, we introduce our method for synthesizing training data, and in Section 4, we present the proposed method in detail. Section 5 presents experimental evaluations of the proposed method on both synthetic and real-world images. Finally, Section 6 concludes the paper.
2 Related Work
To the best of our knowledge, the only existing published work on demoiréing camera-captured screen images is a conventional image processing technique called layer decomposition on polyphase components (LDPC) [21]. The LDPC technique has three major steps: it first subsamples the input image into four polyphase components; next, for each component, it separates the moiré interference layer using a patch-based Gaussian mixture model (GMM) prior; finally, it recombines the four filtered polyphase components into one full-resolution output image. Although LDPC can remove some small moiré artifacts, it tends to over-smooth image details, and its very small patch size prevents it from handling large-scale moiré patterns such as color stripes.
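The first and last steps of LDPC, splitting an image into four polyphase components and interleaving them back, can be sketched as follows. This is a minimal illustration on a grayscale image stored as nested Python lists; the function names are hypothetical, and the GMM-based layer separation that LDPC applies to each component is not reproduced.

```python
def polyphase_split(img):
    """Split a 2D image (list of rows) into its four 2x2 polyphase components.

    Component (dy, dx) holds the pixels at rows dy, dy+2, ... and columns dx, dx+2, ...
    """
    return {(dy, dx): [row[dx::2] for row in img[dy::2]]
            for dy in (0, 1) for dx in (0, 1)}

def polyphase_merge(components, height, width):
    """Interleave four polyphase components back into a height x width image."""
    img = [[None] * width for _ in range(height)]
    for (dy, dx), comp in components.items():
        for i, row in enumerate(comp):
            for j, value in enumerate(row):
                img[dy + 2 * i][dx + 2 * j] = value
    return img

image = [[4 * r + c for c in range(4)] for r in range(4)]
parts = polyphase_split(image)
print(parts[(0, 0)])  # [[0, 2], [8, 10]]
# LDPC would filter each component here; this sketch leaves them untouched,
# so the merge reproduces the input exactly.
print(polyphase_merge(parts, 4, 4) == image)  # True
```

Each component is a half-resolution view of the image, so a fixed patch size covers only a small neighborhood of the full image, consistent with the limitation on large-scale moiré noted above.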
The authors of LDPC also proposed a demoiréing technique for textured images, i.e., images of objects with fine grid patterns, such as fabric [20]. This technique removes moiré artifacts in the green channel using signal decomposition and reconstructs the red and blue channels using a guided image filter with the cleaned green channel as the guide. Another related demoiréing problem is the removal of moiré patterns from scanned images, where the artifacts have net-like structures. The techniques designed for this problem are commonly referred to as descreening. One of the early descreening algorithms [12] employs wavelet-domain filtering to remove moiré patterns. Siddiqui et al. proposed a non-iterative, non-linear descreening filter based on resolution synthesis-based denoising [17]. Shou and Lin presented a technique that employs cellular neural network based texture classification for descreening [16].
The demoiréing problem can also be categorized as an image restoration problem. Quite a few inspiring image restoration techniques have been proposed in the past decade, such as non-local similarity based techniques [3], nuclear norm minimization based techniques [7] and dictionary learning based techniques [4, 2, 19]. Recently, deep neural network based techniques have demonstrated great strength in many different fields and achieved state-of-the-art performance on various image restoration problems. For example, stacked denoising auto-encoders achieve good results for denoising corrupted images [18], and a multi-layer perceptron (MLP) can be used as a denoising framework [1]. For more general image restoration tasks, Mao et al. proposed RED-Net, a very deep convolutional neural network with skip connections [13]. Another example of a general image restoration network is DnCNN. By combining batch normalization and residual learning, DnCNN can tackle various problems, such as Gaussian denoising, single image super-resolution and JPEG deblocking [22]. By retraining these DCNN methods with moiré-interfered images, they can be repurposed for the demoiréing problem as well.
3 Preparation of Training Data
The basic idea of the proposed demoiréing technique is to map a moiré-tainted screen photo to an artifact-free image using an end-to-end neural network trained with a large number of such image pairs. The effectiveness of our technique, like that of any machine learning approach, relies heavily on the availability of a representative and sufficiently large training set. In this section, we discuss how the training images for our technique are prepared.
To help the proposed DCNN technique accurately identify moiré artifacts in real-world scenarios, the training process should ideally use only real photographs of a screen and the corresponding original digital images displayed on it. While obtaining such a pair of images is easy, aligning them perfectly in space is difficult, yet perfect alignment is necessary to prevent mismatched edges from being misidentified as moiré patterns. Because common imaging problems in real photos, such as lens distortion and non-uniform camera shake, reduce the accuracy of image alignment, it is challenging to build a sufficiently large, high-quality training set from real photos.
Considering these drawbacks of real photos, we instead train with synthetic screenshot images containing realistic moiré patterns. The input images of the synthesizer, collected using the print-screen command on computers running Microsoft Windows, cover various types of content, such as dialog boxes, text, web pages, graphics and natural images. To accurately simulate the formation of moiré patterns, we faithfully follow the process of image display on an LCD and the pipeline of optical image capture and digital processing in a camera. The whole simulation procedure can be summarized in the following steps.
1. Resample the input image into a mosaic of RGB subpixels, as in Fig. 0(a), to simulate the image displayed on the LCD;
2. Apply a projective transformation with a certain degree of randomness to the image to simulate different relative positions and orientations of the display and camera;
3. Use a radial distortion function to simulate lens distortion;
4. Apply a flat-top Gaussian filter to simulate the anti-aliasing filter;
5. Resample the image using a Bayer CFA, as in Fig. 0(b), to simulate the raw reading of the camera sensor;
6. Add Gaussian noise to simulate sensor noise;
7. Apply a denoising filter;
8. Compress the image using JPEG to add compression noise;
9. Output the decompressed image as the synthetic image with moiré patterns.
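Steps 1 and 5 above, the two resampling stages whose grid mismatch produces moiré, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' synthesizer: it assumes an RGB-stripe LCD in which each pixel becomes a 3×3 block of R, G and B stripes, and an RGGB Bayer layout. In the full pipeline, steps 2 to 4 (projective transform, lens distortion, anti-aliasing blur) would be applied between the two stages, so the grids would not line up as they do in this toy.

```python
def lcd_subpixels(img):
    """Step 1 sketch: render each RGB pixel as a 3x3 block of vertical R, G, B stripes.

    Assumes an RGB-stripe LCD layout; each stripe carries only its own channel.
    """
    out = []
    for row in img:
        stripe_row = []
        for r, g, b in row:
            stripe_row.extend([(r, 0, 0), (0, g, 0), (0, 0, b)])
        # Repeat the stripe row three times so each display pixel stays square.
        out.extend([list(stripe_row) for _ in range(3)])
    return out

def bayer_rggb(img):
    """Step 5 sketch: sample an RGB image with an RGGB Bayer CFA.

    Each sensor site keeps a single channel, giving a one-channel raw mosaic.
    """
    raw = []
    for y, row in enumerate(img):
        raw_row = []
        for x, (r, g, b) in enumerate(row):
            if y % 2 == 0:
                raw_row.append(r if x % 2 == 0 else g)
            else:
                raw_row.append(g if x % 2 == 0 else b)
        raw.append(raw_row)
    return raw

screen = lcd_subpixels([[(200, 100, 50)]])
print(screen[0])           # [(200, 0, 0), (0, 100, 0), (0, 0, 50)]
print(bayer_rggb(screen))  # [[200, 100, 0], [0, 0, 0], [200, 100, 0]]
```

Even in this tiny example the odd sensor row records no light at all, because its green and blue filter sites fall on the wrong display stripes; at scale, such position-dependent mismatches form the moiré patterns.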
The corresponding groundtruth clean image is generated from the original image using the same projective transformation and lens distortion function. In addition, the groundtruth image is scaled to the same size as the synthetic camera-captured screen image.
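Because the same geometric warp must be applied to both the moiré image and its groundtruth, it is convenient to draw the warp parameters once per training pair and reuse them. The sketch below maps a point through a 3×3 homography and then a one-coefficient radial distortion model $r' = r(1 + k_1 r^2)$. The distortion model, the coefficient `k1` and the matrix values are assumptions for illustration, and coordinates are assumed to be normalized so that the optical center is the origin.

```python
def apply_homography(H, x, y):
    """Map point (x, y) through a 3x3 projective transform H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def radial_distort(x, y, k1, cx=0.0, cy=0.0):
    """One-coefficient radial distortion about (cx, cy): r' = r * (1 + k1 * r^2)."""
    dx, dy = x - cx, y - cy
    scale = 1.0 + k1 * (dx * dx + dy * dy)
    return cx + dx * scale, cy + dy * scale

def warp_point(H, k1, x, y):
    """Projective transform followed by lens distortion (steps 2 and 3)."""
    return radial_distort(*apply_homography(H, x, y), k1)

# Hypothetical parameters drawn once for a training pair; reusing them for the
# moiré image and for the groundtruth keeps the two spatially aligned.
H = [[1.02, 0.01, 0.03], [-0.01, 0.98, 0.02], [0.001, 0.002, 1.0]]
k1 = 0.05
print(warp_point(H, k1, 0.5, 0.5))
```

In practice an image warp is usually computed in the inverse direction (for each output pixel, find its source location), but the requirement is the same: one set of parameters shared by both images of a pair.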
4 Proposed Algorithm
Unlike the degradations in many other image restoration problems, moiré patterns are signal-dependent and structured; they often can be distinguished from true image features only by examining both the big picture and the fine details. Additionally, the moiré patterns in a camera-captured screen image look very different at different scales. As exemplified in Fig. 3, the moiré patterns are thin vertical stripes at the fine scale, whereas they look like curved color bands at the coarse scale. Due to these complex characteristics of moiré patterns, most existing DCNN architectures are unsuitable and ineffective for the demoiréing task.
To deal with these difficulties, the proposed demoiréing technique adopts a multi-scale strategy, as sketched in Fig. 4. The basic idea is as follows. For an input camera-captured screen image $I$, the proposed technique first blurs $I$ with a Gaussian kernel and downsamples it, yielding the image $\tilde{I}$. Next, the technique employs a generator neural network $G$, trained with downsampled screen images, to reduce the moiré patterns in $\tilde{I}$. The resulting image $G(\tilde{I})$ is then upsampled back to the original scale and sent, along with the original input image $I$, to another network $F$ for fine-scale moiré pattern removal. The details of the proposed technique are given below.
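The coarse-to-fine data flow described above can be summarized in a short sketch. The blur-and-downsample step is implemented here as a separable Gaussian filter on a grayscale image stored as nested lists. The value of `sigma` is an assumption (it is not specified in the text), and `coarse_net`, `fine_net` and `upsample` are hypothetical placeholders for $G$, $F$ and the upsampling operator; nearest-neighbor upsampling stands in for the bicubic interpolation used by the method.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at radius ceil(3 * sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Filter a list with a symmetric kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    return [sum(k * values[min(max(i + j - radius, 0), n - 1)] for j, k in enumerate(kernel))
            for i in range(n)]

def blur_and_downsample(img, factor, sigma):
    """Separable Gaussian blur of a grayscale image, then keep every factor-th pixel."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    blurred = [list(row) for row in zip(*cols)]
    return [row[::factor] for row in blurred[::factor]]

def nearest_upsample(img, factor):
    """Stand-in for bicubic upsampling: repeat each pixel factor x factor times."""
    return [[v for v in row for _ in range(factor)] for row in img for _ in range(factor)]

def demoire(img, coarse_net, fine_net, upsample, factor=4, sigma=2.0):
    """Coarse-to-fine pipeline: F(I, up(G(down(I))))."""
    small = blur_and_downsample(img, factor, sigma)   # I -> I~
    clean_small = coarse_net(small)                   # G(I~)
    return fine_net(img, upsample(clean_small, factor))

flat = [[5.0] * 8 for _ in range(8)]
out = demoire(flat, coarse_net=lambda x: x, fine_net=lambda full, up: up,
              upsample=nearest_upsample)
print(len(out), len(out[0]))  # 8 8
```

The Gaussian blur before decimation matters: downsampling without it would itself alias the fine-scale moiré stripes into new low-frequency patterns.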
4.1 Coarse-Scale Demoiréing
Coarse-scale demoiréing is achieved by using a generator network $G$ to map a downsampled camera-captured screen image $\tilde{I}$ to its artifact-free counterpart $\tilde{J}$. Inspired by the work of Ledig et al. [11], we employ stacked residual blocks as the foundation of the architecture of network $G$. Fig. 5 illustrates the structure of a residual block, which consists of two convolutional layers with kernels and 64 feature maps, followed by batch normalization layers [8] and rectified linear units (ReLU) [14]. As shown in Fig. 4, network $G$ contains 16 such residual blocks in total; at the end, a skip connection adds back some details, and a hyperbolic tangent function serves as the final nonlinear operation.
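The residual-block structure of $G$ can be illustrated with a deliberately tiny, single-channel, one-dimensional toy, following the conv, batch normalization, ReLU, conv, batch normalization, identity-skip arrangement common in SRGAN-style blocks (assumed here), with a long skip connection and tanh after the stack. The kernel length of 3 and the weights are assumptions, batch normalization is computed over the positions of a single signal without the learned scale and shift, and the real network's 64 feature maps and head/tail convolutions are omitted. It is a structural sketch, not the authors' implementation.

```python
import math

def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding; output has the same length as x."""
    r = len(kernel) // 2
    n = len(x)
    return [sum(k * x[i + j - r] for j, k in enumerate(kernel) if 0 <= i + j - r < n)
            for i in range(n)]

def batch_norm(x, eps=1e-5):
    """Zero-mean, unit-variance normalization (learned scale and shift omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x, k1, k2):
    """conv -> BN -> ReLU -> conv -> BN, then add the block input (identity skip)."""
    h = relu(batch_norm(conv1d_same(x, k1)))
    h = batch_norm(conv1d_same(h, k2))
    return [a + b for a, b in zip(x, h)]

def coarse_generator_toy(x, blocks):
    """Stacked residual blocks, a long skip connection, and a final tanh."""
    h = x
    for k1, k2 in blocks:
        h = residual_block(h, k1, k2)
    return [math.tanh(a + b) for a, b in zip(x, h)]

signal = [0.2, -0.1, 0.4, 0.0, -0.3]
blocks = [([0.25, 0.5, 0.25], [-0.1, 1.0, -0.1])] * 16  # 16 blocks, hypothetical weights
print(len(coarse_generator_toy(signal, blocks)))  # 5
```

With all-zero kernels each block reduces to the identity, which is the property that makes deep residual stacks easy to train: a block only has to learn a correction to its input.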
With the synthetic screen image set discussed in the previous section, we can use supervised learning to train a generator network, denoted $G_s$, on downsampled versions of the synthetic moiré-interfered and artifact-free image pairs. The loss function of network $G_s$ for the training process is the mean squared error (MSE):
$$\mathcal{L}_{\mathrm{MSE}}(G_s) = \frac{1}{hw}\sum_{y=1}^{h}\sum_{x=1}^{w}\big(G_s(\tilde{I})(x,y) - \tilde{J}(x,y)\big)^2,$$
where $h$ and $w$ are the height and width of the downsampled image $\tilde{I}$, respectively, and $\tilde{J}$ is the downsampled groundtruth image.
4.2 Retraining with Real Images
The synthetic-data-trained network $G_s$ performs well against artificially generated moiré patterns at the coarse scale. However, since the data synthesizer cannot cover every possible characteristic of real camera-captured screen images, the network may fail to fully identify the moiré patterns in a real image, leaving traces of artifacts.
To improve the robustness of the proposed technique on real images, we incorporate the idea of generative adversarial networks (GANs) [5] into our architecture, as shown in Fig. 6, and retrain the coarse-scale generator into a network denoted $G_r$. In a GAN, a discriminative network $D$ is trained jointly with the generative network $G_r$ to distinguish the output images of $G_r$ from a set of unrelated original clean images. This process guides $G_r$ to return images that are statistically similar to artifact-free images. As GAN training does not require paired data, we can train the network $G_r$ using real images even when the corresponding groundtruth images are unavailable.
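Following Goodfellow et al. [5], we train the discriminative network $D$ and the generative network $G_r$ to solve the following minimax problem:
$$\min_{G_r}\max_{D}\;\mathbb{E}_{\tilde{J}}\big[\log D(\tilde{J})\big] + \mathbb{E}_{\tilde{I}}\big[\log\big(1 - D(G_r(\tilde{I}))\big)\big],$$
where $\tilde{J}$ is drawn from the set of clean images and $\tilde{I}$ from the downsampled real camera-captured screen images.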
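In practice, for better gradient behavior, we minimize $-\log D(G_r(\tilde{I}))$ instead of $\log\big(1 - D(G_r(\tilde{I}))\big)$, as proposed in [5]. This introduces an adversarial term in the loss function of the generator network $G_r$:
$$\mathcal{L}_{\mathrm{adv}}(G_r) = -\mathbb{E}_{\tilde{I}}\big[\log D(G_r(\tilde{I}))\big].$$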
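In competition against the generator network $G_r$, the discriminative network $D$ is trained with the binary cross-entropy loss:
$$\mathcal{L}_{D} = -\mathbb{E}_{\tilde{J}}\big[\log D(\tilde{J})\big] - \mathbb{E}_{\tilde{I}}\big[\log\big(1 - D(G_r(\tilde{I}))\big)\big].$$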
Minimizing $\mathcal{L}_{\mathrm{adv}}$ drives network $G_r$ to produce images that network $D$ cannot distinguish from original artifact-free images. As $G_r$ evolves, minimizing $\mathcal{L}_{D}$ increases the discriminative power of network $D$.
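However, optimizing the loss function $\mathcal{L}_{\mathrm{adv}}$ alone does not guarantee that an output image $G_r(\tilde{I})$ is similar in content to the input moiré-interfered screenshot image $\tilde{I}$; a fidelity term is necessary for high-quality restoration. In the proposed technique, instead of using the input image $\tilde{I}$ to regulate $G_r$, we employ $G_s(\tilde{I})$, the result from the synthetic-data-trained network $G_s$, which, unlike $\tilde{I}$, is already largely free of moiré patterns. This yields the following loss function:
$$\mathcal{L}_{\mathrm{fid}}(G_r) = \frac{1}{hw}\sum_{y=1}^{h}\sum_{x=1}^{w}\big(G_r(\tilde{I})(x,y) - G_s(\tilde{I})(x,y)\big)^2.$$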
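Finally, we combine the adversarial loss and the fidelity loss when optimizing the generative network $G_r$, namely,
$$\mathcal{L}(G_r) = \mathcal{L}_{\mathrm{adv}}(G_r) + \lambda\,\mathcal{L}_{\mathrm{fid}}(G_r),$$
where the Lagrange multiplier $\lambda$ is a user-given weight balancing the two loss functions; its value is chosen empirically and kept fixed in all the presented experiments.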
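The generator and discriminator objectives can be evaluated numerically as below. This is a minimal sketch assuming that the discriminator outputs probabilities in (0, 1), that images are flattened into lists of pixel values, that the fidelity term is the MSE between the outputs of $G_r$ and $G_s$, and that $\lambda$ weights the fidelity term; the function names and the value `lam=10.0` are hypothetical, not the paper's settings.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length lists of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def adversarial_loss(d_fake):
    """Non-saturating generator term: mean of -log D(G_r(I~)) over a mini-batch."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

def generator_loss(d_fake, g_r_out, g_s_out, lam):
    """L(G_r) = L_adv + lam * L_fid, with G_s's output as the fidelity anchor."""
    return adversarial_loss(d_fake) + lam * mse(g_r_out, g_s_out)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: clean images labeled 1, generator outputs labeled 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical mini-batch: discriminator scores and flattened 2x2 outputs.
d_real, d_fake = [0.9, 0.8], [0.3, 0.4]
g_r_out, g_s_out = [0.1, 0.2, 0.3, 0.4], [0.1, 0.25, 0.3, 0.35]
print(generator_loss(d_fake, g_r_out, g_s_out, lam=10.0))
print(discriminator_loss(d_real, d_fake))
```

In the alternating schedule, one step lowers `discriminator_loss` with $G_r$ fixed and the next lowers `generator_loss` with $D$ fixed; the fidelity term stops $G_r$ from drifting toward arbitrary clean-looking images unrelated to its input.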
As demonstrated in Fig. 7, the synthetic-data-trained network $G_s$ works well for the synthetic camera-captured screen image, but it fails to remove all the moiré patterns in the real image. In comparison, the network $G_r$, which is retrained using real data, removes the artifacts completely for both the synthetic and the real images.
4.3 Fine-Scale Demoiréing
At this stage, we have the high-resolution moiré-interfered input image $I$ and the low-resolution but demoiréd image $G(\tilde{I})$, and we want to recover a high-resolution demoiréd image from these two images. The problem is akin to super-resolution (SR) of $G(\tilde{I})$, except that extra information is available: $I$, a degraded version of the original image.
Intuitively, we can first upsample $G(\tilde{I})$ to the original resolution using SR and then merge the result with $I$ to produce the final output. Based on this idea, we design a fine-scale demoiréing network $F$, as shown in the bottom half of Fig. 4. The proposed network consists of two stages: residual learning and retrieval learning. In the residual learning stage, the network takes the bicubic interpolation of $G(\tilde{I})$ as input. Then, following the architecture of VDSR [9], it uses 20 cascaded convolutional layers with ReLU activations to recover the missing details. In the first 10 layers, the kernel size is , and in the next 10 layers, the kernel size is . Each convolutional layer uses 64 filters, and a skip connection is employed after the last residual layer.
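The residual learning stage starts from a bicubic interpolation of the coarse result and, as in VDSR, learns only the residual to be added to it. The sketch below shows a 1D cubic-convolution upsampler using the Keys kernel with $a = -0.5$ (a common bicubic choice, assumed here) and the global skip connection that adds a predicted residual; a 2D bicubic resize applies the same kernel separably along rows and columns. The position mapping $i/\text{factor}$ and the border clamping are simplifications, and the function names are hypothetical.

```python
import math

def cubic_kernel(t, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is the common bicubic choice."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0

def cubic_upsample_1d(values, factor):
    """Upsample by an integer factor; output sample i sits at source position i / factor."""
    n = len(values)
    out = []
    for i in range(n * factor):
        pos = i / factor
        base = math.floor(pos)
        acc = 0.0
        for m in range(base - 1, base + 3):        # four nearest source samples
            clamped = min(max(m, 0), n - 1)        # replicate border samples
            acc += values[clamped] * cubic_kernel(pos - m)
        out.append(acc)
    return out

def residual_output(interpolated, predicted_residual):
    """VDSR-style global skip: final estimate = interpolated input + learned residual."""
    return [a + b for a, b in zip(interpolated, predicted_residual)]

coarse = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
up = cubic_upsample_1d(coarse, 2)
print(up[4], up[5])  # 2.0 2.5
```

Starting from the interpolated image means the convolutional layers only need to model the missing high-frequency detail, which is typically small and sparse, rather than the whole image.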
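In the retrieval learning stage, the proposed network concatenates the output of the residual learning stage with the original input $I$ and uses two convolutional layers to refine the final result using both $I$ and the upsampled $G(\tilde{I})$. With synthetic data, we employ the MSE as the loss function of the fine-scale demoiréing network, denoted $F_s$ when trained on synthetic data:
$$\mathcal{L}_{\mathrm{MSE}}(F_s) = \frac{1}{HW}\sum_{y=1}^{H}\sum_{x=1}^{W}\big(\hat{J}(x,y) - J(x,y)\big)^2,$$
where $H$ and $W$ are the height and width of the input image $I$, respectively, $\hat{J}$ is the output of $F_s$, and $J$ is the groundtruth clean image.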
Using the same GAN-based retraining scheme as presented previously, we can also boost the performance of the fine-scale demoiréing network on real camera-captured screen images. Retrained using real data, the improved network $F_r$ works much better than the synthetic-data-trained network $F_s$, as demonstrated in the second row of Fig. 8. For a synthetic input image, the results of $F_s$ and $F_r$ are similar, as shown in the first row of Fig. 8.
5 Experiments
To evaluate the performance of the proposed demoiréing algorithm, we implement the deep convolutional neural network using TensorFlow. All of the experiments reported in this paper are conducted on a computer with an NVIDIA Titan Xp GPU and an Intel i7-4770 CPU.
5.1 Training Details
From 1,000 digital images with various types of content commonly displayed on computers, we create 80,000 image patches with artificial moiré patterns using the data synthesizer presented in Section 3. Of the 80,000 patches, 60,000 are used for training and 20,000 for testing. After Gaussian blurring and downsampling with a scale factor of 4, each patch is resized to for the training of the coarse-scale demoiréing network $G_s$. In the training process, we employ the Adam optimization method [10] with and a learning rate of . The network is trained with update iterations, where in each iteration, 32 patches are randomly picked as a mini-batch.
The synthetic-data-trained network $G_s$ is then used for retraining the GAN-based network $G_r$. The input training data for $G_r$ are real screen images displayed on 3 different screens (Alienware AW2310, Dell P2417H, and Samsung SyncMaster T260) and taken by 3 different smartphones (iPhone 6, iPhone 8, Samsung Galaxy S8). In total, 300 images are taken with different combinations of devices at different distances and angles. From these 300 images, 20,000 patches are extracted for GAN training.
All the weights in the network $G_r$ are retrained with update iterations at a learning rate of . Following the method of Goodfellow et al. [5], we choose and alternately update the discriminator and the generator.
For the fine-scale demoiréing network $F$, we adopt the same two-stage training strategy. First, we pretrain the network $F_s$ using synthetic data. All the input demoiréd synthetic images are upsampled with bicubic interpolation and cropped into patches for the pretraining. The batch size is 64 and the number of update iterations is . The Adam optimizer uses the same setting as in the training of $G_s$. After pretraining, we retrain the network using the real camera-captured data for another iterations at a lower learning rate of , which yields $F_r$.
5.2 Experimental Results
For performance evaluation, two state-of-the-art general-purpose image restoration algorithms, DnCNN [22] and RED-Net [13], are tested along with the proposed algorithm. DnCNN can tackle many different image restoration tasks, such as Gaussian denoising, single image super-resolution and JPEG image deblocking, and RED-Net is designed to preserve the primary image components while eliminating various corruptions.
For the demoiréing task, we train a 20-layer DnCNN (DnCNN20) with a receptive field of using our synthetic training data set. We also employ an extended DnCNN network with 35 layers (DnCNN35) to achieve a larger receptive field that matches the receptive field of our technique. For RED-Net, we train a 20-layer network (RED20) and a 36-layer network (RED36) with the default settings provided by the original authors. The receptive fields for RED20 and RED36 are and , respectively.
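The receptive field of a stack of stride-1 convolutions grows by $k - 1$ pixels per layer of kernel size $k$, and running the stack on an image downsampled by a factor $s$ multiplies its coverage, measured in input pixels, by roughly $s$. The sketch below computes both, assuming 3×3 kernels for DnCNN (as in the DnCNN paper) and ignoring the Gaussian blur applied before downsampling; the function name is hypothetical. It illustrates why a single-scale network needs many more layers to see wide moiré bands than one operating at the coarse scale.

```python
def receptive_field(kernel_sizes, scale=1):
    """Receptive field (in input pixels) of stacked stride-1 convolutions.

    If the stack runs on an image downsampled by `scale`, each feature-map pixel
    covers about `scale` input pixels (blur before downsampling is ignored).
    """
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf * scale

print(receptive_field([3] * 20))           # 41  (20 layers of 3x3, full resolution)
print(receptive_field([3] * 35))           # 71  (35 layers of 3x3, full resolution)
print(receptive_field([3] * 20, scale=4))  # 164 (same 20 layers at 1/4 scale)
```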
We first test all the algorithms on coarse-scale images, which are downsampled from the synthetic moiré-interfered images. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are calculated for evaluation, as listed in Table 1. The proposed generator achieves much better demoiréing performance on the synthetic images, exceeding the other compared methods by almost 3 dB in PSNR and 0.01 in SSIM.
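For reference, PSNR is $10\log_{10}(\text{peak}^2/\text{MSE})$ in dB, so a gain of about 3 dB corresponds to roughly halving the mean squared error. The sketch assumes 8-bit images (peak value 255) flattened into lists; the pixel values are made up for illustration.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)

ref = [100.0, 120.0, 140.0, 160.0]
worse = [104.0, 116.0, 144.0, 156.0]   # every pixel off by 4: MSE = 16
better = [104.0, 116.0, 140.0, 160.0]  # half the pixels off by 4: MSE = 8
print(f"{psnr(ref, better) - psnr(ref, worse):.2f} dB")  # 3.01 dB
```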
We also test the trained models on the downsampled real camera-captured screen images. As shown in Fig. 9, the proposed coarse-scale network works well against different coarse-scale moiré patterns. The other tested general restoration methods, on the other hand, all fail to remove the wide moiré color bands, despite having receptive fields similar to that of the proposed technique.
For fine-scale demoiréing, we retrain all the DnCNN and RED-Net models using the synthetic data at the original resolution. As shown in Table 2, the proposed technique is still substantially better than the competing methods in terms of PSNR and SSIM. The advantage of the proposed technique is also evident to human viewers: as shown in Fig. 11, the results of the proposed technique are much cleaner than those of the other tested methods.
Fig. 10 plots the running time of the tested algorithms as a function of the input image size. Owing to its multi-scale DCNN design and two-stage training strategy, the proposed algorithm delivers much better demoiréing results without significantly increasing the computational cost.
6 Conclusion
Based on the observation that moiré patterns exist at different scales and exhibit vastly different characteristics at each, we purposefully designed a coarse-to-fine DCNN technique for the task of demoiréing. By incorporating a novel retraining strategy, the proposed technique works well for real camera-captured screen images even without paired real images for training. Extensive experimental results demonstrate that the proposed technique can efficiently remove moiré patterns from camera-acquired screen images and that it outperforms existing methods.
References
[1] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2392–2399. IEEE, 2012.
[2] P. Chatterjee and P. Milanfar. Clustering-based denoising with locally learned dictionaries. IEEE Transactions on Image Processing, 18(7):1438–1451, 2009.
[3] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
[4] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
[5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[6] S. Gross and M. Wilber. Training and investigating residual nets. Facebook AI Research, CA, 2016. [Online]. Available: http://torch.ch/blog/2016/02/04/resnets.html
[7] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.
[8] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[9] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.
[10] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
[11] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint, 2016.
[12] J. Luo, R. de Queiroz, and Z. Fan. A robust technique for image descreening based on the wavelet transform. IEEE Transactions on Signal Processing, 46(4):1179–1184, 1998.
[13] X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, pages 2802–2810, 2016.
[14] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010.
[15] C. E. Shannon. Communication in the presence of noise. Proceedings of the IRE, 37(1):10–21, 1949.
[16] Y.-W. Shou and C.-T. Lin. Image descreening by GA-CNN-based texture classification. IEEE Transactions on Circuits and Systems I: Regular Papers, 51(11):2287–2299, 2004.
[17] H. Siddiqui and C. A. Bouman. Training-based descreening. IEEE Transactions on Image Processing, 16(3):789–802, 2007.
[18] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
[19] Z. Wang, Y. Yang, Z. Wang, S. Chang, J. Yang, and T. S. Huang. Learning super-resolution jointly from external and internal examples. IEEE Transactions on Image Processing, 24(11):4359–4371, 2015.
[20] J. Yang, F. Liu, H. Yue, X. Fu, C. Hou, and F. Wu. Textured image demoiréing via signal decomposition and guided filtering. IEEE Transactions on Image Processing, 26(7):3528–3541, 2017.
[21] J. Yang, X. Zhang, C. Cai, and K. Li. Demoiréing for screen-shot images with multi-channel layer decomposition. In Visual Communications and Image Processing (VCIP), 2017 IEEE, pages 1–4. IEEE, 2017.
[22] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.