Content Authentication for Neural Imaging Pipelines: End-to-end Optimization of Photo Provenance in Complex Distribution Channels
Forensic analysis of digital photo provenance relies on intrinsic traces left in the photograph at the time of its acquisition. Such analysis becomes unreliable after heavy post-processing, such as down-sampling and re-compression applied upon distribution in the Web. This paper explores end-to-end optimization of the entire image acquisition and distribution workflow to facilitate reliable forensic analysis at the end of the distribution channel. We demonstrate that neural imaging pipelines can be trained to replace the internals of digital cameras, and jointly optimized for high-fidelity photo development and reliable provenance analysis. In our experiments, the proposed approach increased image manipulation detection accuracy from 55% to nearly 98%. The findings encourage further research towards building more reliable imaging pipelines with explicit provenance-guaranteeing properties.
Ensuring integrity of digital images is one of the most challenging and important problems in multimedia communications. Photographs and videos are commonly used for documentation of important events, and as such, require efficient and reliable authentication protocols. Our current media acquisition and distribution workflows are built with entertainment in mind, and not only fail to provide explicit security features, but actually work against them. Image compression standards exploit heavy redundancy of visual signals to reduce communication payload, but optimize for human perception alone. Security extensions of popular standards lack in adoption .
Two general approaches to assurance and verification of digital image integrity include [23, 29, 31]: (1) pro-active protection methods based on digital signatures or watermarking; (2) passive forensics analysis which exploits inherent statistical properties resulting from the photo acquisition pipeline. While the former provides superior performance and allows for advanced protection features (like precise tampering localization , or reconstruction of tampered content ), it failed to gain widespread adoption due to the necessity to generate protected versions of the photographs, and the lack of incentives of camera vendors to modify camera design to integrate such features .
Passive forensics, on the other hand, relies on our knowledge of the photo acquisition pipeline, and statistical artifacts introduced by its successive steps. While this approach is well suited for potentially analyzing any digital photograph, it often fails short due to the complex nature of image post-processing and distribution channels. Digital images are not only heavily compressed, but also enhanced or even manipulated before, during or after dissemination. Popular images have many online incarnations, and tracing their distribution and evolution has spawned a new field of image phylogeny [12, 13] which relies on visual differences between multiple images to infer their relationships and editing history. However, phylogeny does not provide any tools to reason about the authenticity or history of individual images. Hence, reliable authentication of real-world online images remains untractable .
At the moment, forensic analysis often yields useful results in near-acquisition scenarios. Analysis of native images straight from the camera is more reliable, and even seemingly benign implementation details - like the rounding operators used in the camera’s image signal processor  - can provide useful clues. Most forensic traces quickly become unreliable as the image undergoes further post-processing. One of the most reliable tools at our disposal involves analysis of the imaging sensor’s artifacts (the photo response non-uniformity pattern) which can be used both for source attribution and content authentication problems .
In the near future, the rapid progress in computational imaging will challenge digital image forensics even in near-acquisition authentication. In the pursuit of better image quality and convenience, digital cameras and smartphones employ sophisticated post-processing directly in the camera, and soon few photographs will resemble the original images captured by the sensor(s). Adoption of machine learning has recently challenged many long-standing limitations of digital photography, including: (1) high-quality low-light photography ; (2) single-shot HDR with overexposed content recovery ; (3) practical high-quality digital zoom from multiple shots ; (4) quality enhancement of smartphone-captured images with weak supervision from DSLR photos .
These remarkable results demonstrate tangible benefits of replacing the entire acquisition pipeline with neural networks. As a result, it will be necessary to investigate the impact of the emerging neural imaging pipelines on existing forensics protocols. While important, such evaluation can be see rather as damage assessment and control and not as a solution for the future. We believe it is imperative to consider novel possibilities for security-oriented design of our cameras and multimedia dissemination channels.
In this paper, we propose to optimize neural imaging pipelines to improve photo provenance in complex distribution channels. We exploit end-to-end optimization of the entire photo acquisition and distribution channel to ensure that reliable forensics decisions can be made even after complex post-processing, where classical forensics fails (Fig. 1). We believe that imminent revolution in camera design creates a unique opportunity to address some of the long-standing limitations of the current technology. While adoption of digital watermarking in image authentication was limited by the necessity to modify camera hardware, our approach exploits the flexibility of neural networks to learn relevant integrity-preserving features within the expressive power of the model. We believe that with solid security-oriented understanding of neural imaging pipelines, and with the rare opportunity of replacing the well-established and security-oblivious pipeline, we can significantly improve digital image authentication capabilities.
We aim to inspire discussion about novel camera designs that could improve photo provenance analysis capabilities. We demonstrate that it is possible to optimize an imaging pipeline to significantly improve detection of photo manipulation at the end of a complex real-world distribution channel, where state-of-the-art deep-learning techniques fail. The main contributions of our work include:
The first end-to-end optimization of the imaging pipeline with explicit photo provenance objectives;
The first security-oriented discussion of neural imaging pipelines and the inherent trade-offs;
Significant improvement of forensic analysis performance in challenging, heavily post-processed conditions;
A neural model of the entire photo acquisition and distribution channel with a fully differentiable approximation of the JPEG codec.
To facilitate further research in this direction, and enable reproduction of our results, we will publish our neural imaging toolbox at https://github.com/ upon paper acceptance.
2 Related Work
Trends in Pipeline Design
Learning individual steps of the imaging pipeline (e.g., demosaicing) has a long history  but regained momentum in the recent years thanks to adoption of novel deep learning techniques. Naturally, the research focused on the most difficult operations, i.e., demosaicing [16, 34, 22, 33] and denoising [5, 39, 25]. The newly developed techniques delivered not only improved performance, but also additional features. Grahbi et al. proposed a convolutional neural network (CNN) trained to perform both demosaicing and denoising . A recent work by Syu et al. proposes to exploit CNNs for joint optimization of the color filter array and a corresponding demosaicing filter .
Optimization of digital camera design can go even further. The recently proposed L3 model by Jiang et al. replaces the entire imaging pipeline with a large collection of local linear filters . The L3 model reproduces the entire photo development process, and aims to facilitate research and development efforts for non-standard camera designs. In the original paper, the model was used for learning imaging pipelines for RGBW (red-green-blue-white) and RGB-NIR (red-green-blue-near-infra-red) color filter arrays.
Replacing the entire imaging pipeline with a modern CNN can also overcome long-standing limitations of digital photography. Chen et al. trained a UNet model  to develop high-quality photographs in low-light conditions  by exposing it to paired examples of images taken with short and long exposure. The network learned to develop high-quality well-exposed color photographs from underexposed raw input, and yielded better performance than traditional image post-processing based on brightness adjustment and denoising. Eilertsen et al. also trained a UNet model to develop high-dynamic range images from a single shot . The network not only learned to correctly perform tone mapping, but was also able to recover overexposed highlights. This significantly simplifies HDR photography by eliminating the need for bracketing and dealing with ghosting artifacts.
Trends in Forensics
The current research in forensic image analysis focuses on two main directions: (1) learning deep features relevant to low-level forensic analysis for problems like manipulation detection [2, 40], camera model identification , or detection of artificially generated content ; (2) adoption of high-level vision to automate manual analysis that exposes physical inconsistencies, such as reflections [32, 35], or shadows . To the best of our knowledge, there are currently no efforts to either assess the consequences of the emerging neural imaging pipelines, or to exploit this opportunity to improve photo reliability.
3 End-to-end Optimization of Photo Provenance Analysis
Digital image forensics relies on intrinsic statistical artifacts introduced to photographs at the time of their acquisition. Such traces are later used for reasoning about the source, authenticity and processing history of individual photographs. The main problem is that contemporary media distribution channels employ heavy compression and post-processing which destroy the traces and inhibit forensic analysis.
The core of the proposed approach is to model the entire acquisition and distribution channel, and optimize the neural imaging pipeline (NIP) to facilitate photo provenance analysis after content distribution (Fig. 1). The analysis is performed by a forensic analysis network (FAN) which makes a decision about the authenticity / processing history of the analyzed photograph. In the presented example, the model is trained to perform manipulation detection, i.e., aims to classify input images as either coming straight from the camera, or as being affected by a certain class of post-processing. The distribution channel in the middle mimics the behavior of modern photo sharing services and social networks which habitually down-sample and re-compress the photographs. As will be demonstrated later, forensic analysis in such conditions is severely inhibited.
The parameters of the NIP are updated to guarantee both faithful representation of a desired color photograph ( loss), and accurate decisions of forensics analysis at the end of the distribution channel (cross-entropy loss). Hence, the parameters of the NIP and FAN models are chosen as:
where: are the parameters of the NIP and FAN networks, respectively; are the raw sensor measurements for the -th example patch; is the color RGB image developed by NIP from ; denotes a color image patch processed by manipulation ; is the probability that an image belongs to the -th manipulation class, as estimated by the FAN model.
3.1 The Neural Imaging Pipeline
We replace the entire imaging pipeline with a CNN model which develops raw sensor measurements into color RGB images (Fig. 2). Before feeding the images to the network, we pre-process them by reversing the nonlinear value mapping according to the camera’s LUT, subtracting black levels from the edge of the sensor, normalizing by sensor saturation values, and applying white-balancing according to shot settings. We also standardized the inputs by reshaping the tensors to have feature maps with successive measured color channels. This ensures a well-formed input of shape () with values normalized to [0, 1]. All of the remaining steps of the pipeline are replaced by a NIP. See Section 4.1 for details on the considered pipelines.
3.2 Approximation of JPEG Compression
To enable end-to-end optimization of the entire acquisition and distribution channel, we need to ensure that every processing step remains differentiable. In the considered scenario, the main problem is JPEG compression. We designed a JPEGNet model which approximates the operation of a standard JPEG codec, and expresses successive steps of the codec as matrix multiplications or convolution layers that can be implemented in TensorFlow (see supplementary materials for a detailed network definition):
RGB to/from YCbCr color-space conversions are implemented as convolutions.
Isolation of blocks for independent processing is implemented by a combination of space-to-depth and reshaping operations.
Forward/backward 2D discrete cosine transforms are implemented by matrix multiplication according to where denotes a input, and denotes the transformation matrix.
Division/multiplication of DCT coefficients by the corresponding quantization steps are implemented as element-wise operations with properly tiled and concatenated quantization matrices (for both the luminance and chrominance channels).
The actual quantization is approximated by a continuous function (see details below).
The key problem in making JPEG fully differentiable lies in the rounding of DCT coefficients. We considered two possible approximations. Firstly, we used a Taylor series expansion, which can be made arbitrarily accurate by considering more terms. Finally, we decided to use a smoother, and simpler sinusoidal approximation obtained by matching the phase with the sawtooth function:
Both approximations are shown in Fig. 3a.
We validated our JPEGNet by comparing produced images with a reference codec from libJPEG. The results are shown in Fig. 3bcd for a standard rounding operation, and the two approximations, respectively. We used 5 terms for the harmonic rounding. The developed module produces equivalent compression results with standard rounding, and a good approximation for its differentiable variants. Fig. 3e-h show a visual comparison of an example image patch, and its libJPEG and JPEGNet-compressed counterparts.
In our distribution channel, we used quality level 50.
3.3 The Forensic Analysis Network
The forensic analysis network (FAN) is implemented as a CNN following the most recent recommendations on construction of neural networks for forensics analysis . Bayar and Stamm have proposed a new layer type, which constrains the learned filters to be valid residual filters . Adoption of the layer helps ignore visual content and facilitates extraction of forensically-relevant low-level features. In summary, our network operates on patches in the RGB color space and includes (see supplementary materials for full network definition):
A constrained convolutions layer learning residual filters and with no activation function.
Four convolutional layers with doubling number of output feature maps (starting from 32). The layers use leaky ReLU activation and are followed by max pooling.
A convolutional layer mapping 256 features into 256 features.
A global average pooling layer reducing the number of features to 256.
Two fully connected layers with 512 and 128 nodes activated by leaky ReLU.
A fully connected layer with output nodes and softmax activation.
In total, the network has 1,341,990 parameters.
4 Experimental Evaluation
We started our evaluation by using several NIP models to reproduce the output of a standard imaging pipeline (Sec. 4.1). Then, we used the FAN model to detect popular image manipulations (Sec. 4.2). Initially, we validated that the models work correctly by using it without a distribution channel (Sec. 4.3). Finally, we performed extensive evaluation of the entire acquisition and distribution network (Sec. 4.4).
We collected a data-set with RAW images from 8 cameras (Table 1). The photographs come from two public (Raise  and MIT-5k ) and from one private data-set. For each camera, we randomly selected 150 images with landscape orientation. These images will later be divided into separate training/validation sets.
|Canon EOS 5D||12||864 dng||MIT-5k||RGGB|
|Canon EOS 40D||10||313 dng||MIT-5k||RGGB|
|Nikon D5100||16||288 nef||Private||RGGB|
|Nikon D700||12||590 dng||MIT-5k||RGGB|
|Nikon D7000||16||1k nef||Raise||RGGB|
|Nikon D750||24||312 nef||Private||RGGB|
|Nikon D810||36||205 nef||Private||RGGB|
|Nikon D90||12||1k nef||Raise||GBRG|
|Sensor resolution in Megapixels [Mpx]|
4.1 Neural Imaging Pipelines
We considered three NIP models with various complexity and design principles (Table 2): INet - a simple convolutional network with layers corresponding to successive steps of the standard pipeline; UNet - the well-known UNet architecture  adapted from ; DNet - adaptation of the model originally used for joint demosaicing and denoising . A detailed specification of the networks’ architectures is included in supplementary materials.
We trained a separate model for each camera in our dataset. For training, we used 120 diverse full-resolution images. In each iteration, we extracted randomly located patches and formed a batch with 20 examples (one patch per image). For validation, we used a fixed set of px patches extracted from the remaining 30 images. The models were trained to reproduce color RGB images developed by a standard imaging pipeline. We used our own implementation based on Python and rawkit  wrappers over libRAW . Demosaicing was performed using an adaptive algorithm by Menon et al. .
|Train. speed [it/s]||8.80||1.75||0.66|
|Train. time||17 - 26 min||2-4 h||12 - 22 h|
All NIPs successfully reproduced target images with high fidelity. The resulting color photographs are visually indistinguishable from the targets. Objective fidelity measurements for the validation set are collected in Table 2 (average for all 8 cameras). Interestingly, the trained models often revealed better denoising and demosaicing performance, despite the lack of a denoising step in the simulated pipeline, and the lack of explicit optimization objectives (see Fig. 4).
Of all of the considered models, INet was the easiest to train - not only due to its simplicity, but also because it could be initialized with meaningful parameters that already produced valid results and only needed fine-tuning. We initialized the demosaicing filters with bilinear interpolation, color space conversion with a known multiplication matrix, and gamma correction with a toy model separately trained to reproduce this non-linearity. The UNet model was initialized randomly, but improved rapidly thanks to skip connections. The DNet model took the longest and for a long time had problems with faithful color rendering. The typical training times are reported in Table 2. The measurements were collected on a Nvidia Tesla K80 GPU. The models were trained until the relative change of the average validation loss for the last 5 dropped below . The maximum number of epochs was 50,000. For the DNet model we adjusted the stopping criterion to since the training progress was slow, and often terminated prematurely with incorrect color rendering.
4.2 Image Manipulation
Our experiment mirrors the standard setup for image manipulation detection [15, 2, 4]. We consider four mild post-processing operations: sharpening - implemented as an unsharp mask operator with the following kernel:
applied to the luminance channel in the HSV color space; resampling - implemented as successive 1:2 down-sampling and 2:1 up-sampling using bilinear interpolation; Gaussian filtering - implemented using a convolutional layer with a filter and standard deviation equal 4; JPG compression - implemented using the JPEGNet module with sinusoidal rounding approximation and quality level 80. Fig. 5 shows the post-processed variants of an example image patch: (a) just after manipulation; and (b) after the distribution channel (as seen by the FAN module).
4.3 FAN Model Validation
To validate our FAN model, we initially implemented a simple experiment, where analysis is performed just after image manipulation (no distribution channel distortion, as in ). We used the UNet model to develop larger image patches to guarantee the same size of the inputs for the FAN model ( RGB images). In such conditions, the model works just as expected, and yields classification accuracy of 99% .
4.4 Imaging Pipeline Optimization
In this experiment, we perform forensic analysis at the end of the distribution channel. We consider two optimization modes: (F) only the FAN network is optimized given a fixed NIP model; (F+N) both the FAN and NIP models are optimized jointly. In both cases, the NIPs are pre-initialized with previously trained models (Section 4.1). The training was implemented with two separate Adam optimizers, where the first one updates the FAN (and in the F+N mode also the NIP) and the second one updates the NIP based on the image fidelity objective.
Similarly to previous experiments, we used 120 full-resolution images for training, and the remaining 30 images for validation. From training images, in each iteration we randomly extract new patches. The validation set is fixed at the beginning and includes 100 random patches per each image (3,000 patches in total) for classification accuracy assessment. To speed-up image fidelity evaluation, we used 2 patches per image (60 patches in total). In order to prevent over-representation of empty patches, we bias the selection by outward rejection of patches with pixel variance 0.01, and by 50% chance of keeping patches with variance 0.02. More diverse patches are always accepted.
Due to computational constraints, we performed the experiment for 4 cameras (Canon EOS 40D and EOS 5D, and Nikon D7000, and D90, see Table 1) and for the INet and UNet models only. (Based on preliminary experiments, we excluded the DNet model which rapidly lost and could not regain image representation fidelity.) We ran the optimization for 1,000 epochs, starting with a learning rate of and systematically decreasing by 15% every 100 epochs. Each run was repeated 10 times. The typical progression of validation metrics for the UNet model (classification accuracy and distortion PSNR) is shown in Fig. 6 (sampled every 50 epochs). The distribution channel significantly disrupts forensic analysis, and the classification accuracy drops to for standalone FAN optimization (F). In particular, the FAN struggles to identify three low-pass filtering operations (Gaussain filtering, JPEG compression, and re-sampling; see confusion matrix in Tab. 3a), which bear strong visual resemblance at the end of the distribution channel (Fig. 5b). Optimization of the imaging pipeline significantly increases classification accuracy (up to 98%) and makes all of the manipulation paths easily distinguishable (Tab. 3c).
|(a) standalone FAN optimization (UNet) 55.2%|
|(b) joint FAN+NIP optimization (INet) 67.4%|
|(c) joint FAN+NIP optimization (UNet) 97.8%|
|Camera||PSNR [dB]||SSIM||Acc. [%]|
|EOS 40D||42.9 36.5||0.991 0.969||0.51 0.94|
|EOS 5D||43.0 36.0||0.991 0.966||0.57 0.97|
|D7000||43.7 35.4||0.992 0.963||0.55 0.97|
|D90||43.3 35.8||0.991 0.965||0.56 0.98|
|EOS 40D||42.2 40.9||0.986 0.983||0.50 0.56|
|EOS 5D||40.1 40.6||0.986 0.985||0.55 0.56|
|D7000||41.5 39.7||0.987 0.981||0.54 0.65|
|D90||43.8 40.8||0.992 0.986||0.56 0.65|
Fig. 7 collects the obtained results for both the UNet and INet models. UNet delivers consistent and significant improvement in analysis accuracy, which increases up to 95-98%. INet is much simpler and yields only modest classification improvements - up to . It also lacked consistency and for one of the tested cameras yielded virtually no improvement. However, since the final decision is often taken by boosting results from many patches from a larger image, we believe the model could still be useful given its simplicity and minimal quality degradation.
The observed improvement in forensic classification accuracy comes at the cost of image distortion, and leads to artifacts in the developed photographs. In our experiments, photo development fidelity dropped to 36 dB (PSNR) / 0.96 (SSIM) for the UNet model and 40 dB/0.97 for the INet model. The complete results are collected in Tab. 4. Qualitative illustration of the observed artifacts is shown in Fig. 8. The figure shows diverse image patches developed by several variants of the UNet/INet models (different training runs). The artifacts vary in severity from disturbing to imperceptible, and tend to be well masked by image content. INet’s limited flexibility resulted in virtually imperceptible distortions, which compensates for modest improvements in FAN’s classification accuracy.
5 Discussion and Future Work
We replaced the photo acquisition pipeline with a neural network, and developed a fully-differentiable model of the entire photo acquisition and distribution channel. Such an approach allows for joint optimization of photo analysis and acquisition models. Our results show that it is possible to optimize the imaging pipeline for better provenance analysis capabilities at the end of complex distribution channels. We observed a significant improvement in manipulation detection accuracy w.r.t state-of-the-art classical forensics  (accuracy increased from to nearly 98%).
Competing optimization objectives lead to imaging artifacts which tend to be well masked in textured areas, but are currently too visible in empty and flat regions. Limited artifacts and classification improvement for a simpler pipeline model indicate that better configurations should be achievable by learning to control the fidelity-accuracy trade-off. Further improvements may also possible be exploiting explicit HVS modeling. In the future, would like to consider joint optimization of additional components of the acquisition-distribution channel: e.g., Bayer filters in the camera, or lossy compression during content distribution.
-  S. Agarwal and H. Farid. Photo forensics from jpeg dimples. In Information Forensics and Security (WIFS), 2017 IEEE Workshop on, pages 1–6. IEEE, 2017.
-  B. Bayar and M. Stamm. Constrained convolutional neural networks: A new approach towards general purpose image manipulation detection. IEEE Transactions on Information Forensics and Security, 13(11):2691–2706, 2018.
-  P. Blythe and J. Fridrich. Secure digital camera. In Digital Forensic Research Workshop, pages 11–13, 2004.
-  M. Boroumand and J. Fridrich. Deep learning for detecting processing history of images. Electronic Imaging, 2018(7):1–9, 2018.
-  H. Burger, C. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with bm3d? In Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 2392–2399. IEEE, 2012.
-  V. Bychkovsky, S. Paris, E. Chan, and E. Durand. Learning photographic global tonal adjustment with a database of input / output image pairs. In IEEE Conf. on Computer Vision and Pattern Recognition, 2011.
-  P. Cameron and S. Whited. Rawkit. https://rawkit.readthedocs.io/en/latest/. Accessed 5 Nov 2018.
-  C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. arXiv, 2018. arXiv:1805.01934.
-  M. Chen, J. Fridrich, M. Goljan, and J. Lukas. Determining image origin and integrity using sensor noise. IEEE Trans. Inf. Forensics Security, 3(1):74–90, 2008.
-  D. Cozzolino and L. Verdoliva. Noiseprint: a cnn-based camera model fingerprint. arXiv preprint arXiv:1808.08396, 2018.
-  D. Dang-Nguyen, C. Pasquini, V. Conotter, and G. Boato. RAISE - a raw images dataset for digital image forensics. In Proc. of ACM Multimedia Systems, 2015.
-  Z. Dias, S. Goldenstein, and A. Rocha. Large-scale image phylogeny: Tracing image ancestral relationships. IEEE Multimedia, 20(3):58–70, 2013.
-  Z. Dias, S. Goldenstein, and A. Rocha. Toward image phylogeny forests: Automatically recovering semantically similar image relationships. Forensic science international, 231(1):178–189, 2013.
-  G. Eilertsen, J. Kronander, G. Denes, R. Mantiuk, and J. Unger. HDR image reconstruction from a single exposure using deep CNNs. ACM Transactions on Graphics, 36(6):1–15, nov 2017.
-  W. Fan, K. Wang, and F. Cayre. General-purpose image forensics using patch likelihood under image statistical models. In Proc. of IEEE Int. Workshop on Inf. Forensics and Security, 2015.
-  M. Gharbi, G. Chaurasia, S. Paris, and F. Durand. Deep joint demosaicking and denoising. ACM Transactions on Graphics, 35(6):1–12, nov 2016.
-  A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. Van Gool. Wespe: weakly supervised photo enhancer for digital cameras. arXiv preprint arXiv:1709.01118, 2017.
-  H. Jiang, Q. Tian, J. Farrell, and B. A. Wandell. Learning the image processing pipeline. IEEE Transactions on Image Processing, 26(10):5032–5042, oct 2017.
-  Joint Photographic Experts Group. JPEG Privacy & Security. https://jpeg.org/items/20150910_privacy_security_summary.html. Accessed 24 Oct 2018.
-  O. Kapah and H. Z. Hel-Or. Demosaicking using artificial neural networks. In Applications of Artificial Neural Networks in Image Processing V, volume 3962, pages 112–121. International Society for Optics and Photonics, 2000.
-  E. Kee, J. O’brien, and H. Farid. Exposing photo manipulation from shading and shadows. ACM Trans. Graph., 33(5):165–1, 2014.
-  F. Kokkinos and S. Lefkimmiatis. Deep image demosaicking using a cascade of convolutional residual denoising networks. arXiv, 2018. arXiv:1803.05215v4.
-  P. Korus. Digital image integrity–a survey of protection and verification techniques. Digital Signal Processing, 71:1–26, 2017.
-  P. Korus, J. Bialas, and A. Dziech. Towards practical self-embedding for JPEG-compressed digital images. IEEE Trans. Multimedia, 17(2):157–170, Feb 2015.
-  J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189, 2018.
-  H. Li, B. Li, S. Tan, and J. Huang. Detection of deep network generated images using disparities in color components. arXiv preprint arXiv:1808.07276, 2018.
-  LibRaw LLC. libRAW. https://www.libraw.org/. Accessed 5 Nov 2018.
-  D. Menon, S. Andriani, and G. Calvagno. Demosaicing with directional filtering and a posteriori decision. IEEE Transactions on Image Processing, 16(1):132–141, 2007.
-  A. Piva. An overview on image forensics. ISRN Signal Processing, 2013, 2013. Article ID 496701.
-  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Int. Conf. on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
-  M. Stamm, M. Wu, and K. Liu. Information forensics: An overview of the first decade. IEEE Access, 1:167–200, 2013.
-  Z. H. Sun and A. Hoogs. Object insertion and removal in images with mirror reflection. In Information Forensics and Security (WIFS), 2017 IEEE Workshop on, pages 1–6. IEEE, 2017.
-  N.-S. Syu, Y.-S. Chen, and Y.-Y. Chuang. Learning deep convolutional networks for demosaicing. arXiv, 2018. arXiv:1802.03769.
-  R. Tan, K. Zhang, W. Zuo, and L. Zhang. Color image demosaicking via deep residual learning. In IEEE Int. Conf. Multimedia and Expo (ICME), 2017.
-  E. Wengrowski, Z. Sun, and A. Hoogs. Reflection correspondence for exposing photograph manipulation. In Image Processing (ICIP), 2017 IEEE International Conference on, pages 4317–4321. IEEE, 2017.
-  B. Wronski and P. Milanfar. See better and further with super res zoom on the pixel 3. https://ai.googleblog.com/2018/10/see-better-and-further-with-super-res.html. Accessed on 22 Oct 2018.
-  C. P. Yan and C. M. Pun. Multi-scale difference map fusion for tamper localization using binary ranking hashing. IEEE Transactions on Information Forensics and Security, 12(9):2144–2158, 2017.
-  M. Zampoglou, S. Papadopoulos, and Y. Kompatsiaris. Large-scale evaluation of splicing localization algorithms for web images. Multimedia Tools and Applications, 76(4):4801–4834, 2017.
-  K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
-  P. Zhou, X. Han, V. Morariu, and L. S. Davis. Learning rich features for image manipulation detection. In IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), pages 1053–1061, 2018.