Forgery Blind Inspection for Detecting Manipulations of Gel Electrophoresis Images
Recently, falsified images have been found in papers involved in research misconduct. However, although many image forgery detection methods exist, none of them was designed for molecular-biological experiment images. In this paper, we propose a fast blind inquiry method, named FBI, for checking the integrity of images obtained from two common sorts of molecular experiments, i.e., western blot (WB) and polymerase chain reaction (PCR). Based on an optimized pseudo-background capable of highlighting local residues, FBI can reveal traceable vestiges suggesting inappropriate local modifications on WB/PCR images. Additionally, because the optimized pseudo-background is derived from a closed-form solution, FBI is computationally efficient and thus suitable for large-scale inquiry tasks on WB/PCR image integrity. We applied FBI to several papers questioned by the public on PubPeer, and our results show that figures in those papers indeed contain doubtful unnatural patterns.
Scientific papers frequently contain edited image data resulting from inappropriate local post-processing operations, including deliberately concealed cropping of images, deliberately concealed removal of lanes from gels and blots, or excessive processing to emphasize one region in the image at the expense of others. Although post-processing rules have been clearly stated as guidelines or standards [5, 12], some researchers still cross the line. For example, Bik et al. [1] reported in 2016 that a portion of the 20,261 screened papers published in 40 journals from 1995 to 2014 contained problematic figures, and many research-misconduct and faked-research scandals have still occurred in recent years [16, 3, 18]. Hence, a blind algorithm that can detect vestiges of inappropriate local modifications on experiment images, without prior knowledge of signal characteristics or imaging information, becomes a necessity.
Gel electrophoresis is a primary method for analyzing macromolecules, e.g., DNA/RNA and proteins, in the field of molecular biology. Based on their size and charge, DNA/RNA fragments or proteins can be separated in the agarose gel. The digital images from western blot (WB) analysis and the gel electrophoresis results of polymerase chain reaction (PCR) samples are two of the most common types of data presented in scientific papers, as shown in Figure 1. Western blot (WB) imaging, also known as protein immunoblot imaging, is an analytical technique to detect specific proteins in a sample [29, 4]. Polymerase chain reaction (PCR) [13, 17] is a method for making several copies of a specific DNA segment and is widely used in applications such as DNA cloning for sequencing and detection of pathogens in nucleic acid tests for the diagnosis of infectious diseases. The results of these two techniques can be recorded as digital images via a camera or a scanner and presented as grayscale images. Because the whole imaging process is complex and time-consuming, people may dishonestly create new results without repeating the experiments by editing existing WB/PCR images.
The most common post-processing operation on WB/PCR imagery is cloning, i.e., copy-paste. Hence, although there are several categories of image forgery detection methods [8, 2, 25, 24], only copy-move forgery detectors and feature-based methods, such as [6, 24, 35] and [15], are related to this issue. However, because a user can always copy a WB/PCR band from one unpublished experiment result and paste it onto another, it is challenging to determine whether an unseen WB/PCR band is a clone or not. Moreover, even if two bands are suspected to be clones of the same data due to a high PSNR (peak signal-to-noise ratio) value between them, this is still not sufficient to rule out the possibility that the two bands are merely similar to each other. Consequently, how to reveal the vestige of inappropriate post-processing operations, if any, becomes the primary concern in this molecular-biological image forgery detection problem.
The proposed FBI aims to highlight discontinuities, probably caused by man-made modifications, in the magnitude of the estimated background noise. Because each scanning or photographing procedure ought to have its own noise pattern and noise distribution, post-processing operations are expected to leave some traceable vestiges, i.e., unnatural patterns, on a modified image. Specifically, because natural images usually have curvilinear contours and smooth transition areas, we can detect and even localize modifications by checking whether there exist i) discontinuities or clear-cut boundaries in the background noise, and ii) rectangular contours on an input WB/PCR image. Consequently, FBI follows a common idea adopted in the TFT-LCD mura defect detection problem for estimating a pseudo-background of a very low-contrast image, and adopts a classical concept used in macroeconomics, i.e., the Hodrick-Prescott filter [11], for separating the volatility term from a 1D time series. We design an optimization equation and derive its closed-form solution to estimate the most suitable background trend and the residue component of an input image. Based on this design, we can not only quantitatively describe the condition for revealing unnatural patterns, but also derive the optimized result in a deterministic fashion rather than a stochastic manner. In sum, FBI performs blind detection by making invisible unnatural patterns just noticeable, following the concept described in [34, 33, 26].
The contributions of FBI are therefore threefold.
Extending background analysis to a new realm, FBI is the first blind inquiry method for integrity of gel electrophoresis images.
FBI is nearly parameter-free, and all its inspection results are derived on the same basis. Therefore, FBI can avoid false alarms effectively, as will be described in Section IV.
FBI operates based on the closed-form solution described in Eq. (7). It is computationally fast and thus suitable for large-scale inquiry tasks.
II Related Work
Research integrity is essential to our research community. In order to deter tampering with experiment images, the U.S. Office of Research Integrity has provided a Photoshop-based macro, named Droplets [19], for screening falsified WB/PCR images. However, because Droplets operates based only on histogram equalization, gradient maps, and pseudo-coloring, it will fail to reveal features if i) the inspector sets inappropriate parameters, or ii) the brightness of a falsified WB/PCR image was carefully adjusted. Hence, Droplets is not sufficiently reliable.
Revealing invisible unnatural patterns on WB/PCR images is conceptually similar to detecting “mura defects,” which are uneven patches of luminance change on thin-film-transistor liquid-crystal display (TFT-LCD) panels. However, most mura detectors were designed to identify defect regions whose brightness differs slightly from the low-contrast background. Mura detection methods, such as [33, 26, 31, 32, 14, 28], would be severely disturbed by the foreground bands of WB/PCR images and are thus not suitable for WB/PCR forgery detection.
Therefore, we developed FBI in a very conservative way by i) following the common idea of pseudo-background estimation used in mura detector design, and ii) exploiting a classical concept in macroeconomics, named the Hodrick-Prescott (HP) filter [11]. The HP filter was developed to separate permanent shocks, i.e., the stationary component or the trend component, of a 1D data series, which depicts a business cycle, from the temporary shocks causing volatility, i.e., the non-stationary part or cyclic component. The concept of the HP filter can be summarized as follows.
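For reference, the HP filter is commonly written as the penalized least-squares problem below. This is the standard textbook form, given as a sketch rather than as the authors' exact Eq. (1); the symbols $y_t$ (observed series), $\tau_t$ (trend), $c_t$ (cyclic residue), $T$ (series length), and $\lambda$ (smoothing weight) are assumed notation.

```latex
\min_{\{\tau_t\}} \; \sum_{t=1}^{T} \left( y_t - \tau_t \right)^2
\;+\; \lambda \sum_{t=2}^{T-1} \Bigl[ \left( \tau_{t+1} - \tau_t \right) - \left( \tau_t - \tau_{t-1} \right) \Bigr]^2 ,
\qquad c_t = y_t - \tau_t .
```

The first term keeps the trend close to the data, and the second penalizes the second difference of the trend, so a larger $\lambda$ yields a smoother trend.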
III Fast Blind Inquiry for Integrity of Gel Electrophoresis Imagery
Similar to mura detection methods designed for low-contrast images, the proposed method first needs to extract an estimated pseudo-background from an input WB/PCR image. Inspired by the general idea of extracting trend and volatility components from a 1-dimensional time series, we use the optimization equation shown in Eq. (2) to define the pseudo-background of an input image.
In Eq. (2), the regularization term applies a high-pass kernel to the to-be-estimated trend component, which forces that component to be a smoothed approximation of the input image. When the input image and the trend component reduce to one-dimensional signals and the high-pass kernel is the second-difference operator, Eq. (2) degenerates to the 1D Hodrick-Prescott filter described in Eq. (1).
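One plausible reading of Eq. (2), consistent with the description above, is sketched below. The symbols are assumptions, since the original notation is not available here: $I$ is the input image, $B$ the pseudo-background (trend), $H$ the high-pass kernel, $\ast$ a 2D convolution, and $\lambda$ the regularization weight.

```latex
\hat{B} \;=\; \arg\min_{B} \; \lVert I - B \rVert_F^2 \;+\; \lambda \, \lVert H \ast B \rVert_F^2 .
% 1D special case: with I = y and B = \tau of size 1 \times T and H = [\,1,\,-2,\,1\,],
% (H \ast B)_t = \tau_{t+1} - 2\tau_t + \tau_{t-1},
% which is the second-difference penalty of the HP filter.
```

Under this reading, the data term plays the role of the HP fidelity term and the filtered term plays the role of the HP smoothness penalty.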
However, one primary concern from biologists' point of view in this blind inspection problem is whether the solution can be deterministic. Any randomness in the optimization solver may make it difficult for an investigation committee to rule out possible chance coincidences. Hence, a closed-form solution of Eq. (2) is required. To find the closed-form solution, we first rewrote Eq. (2) by replacing the Frobenius norm with the matrix trace. That is,
where is the Toeplitz matrix of a 1D -tap-long high-pass filter . That is, is now assumed to be separable and , where “” denotes matrix transpose. Then, letting the partial derivative of with respect to be zero, we obtain
Therefore, we have
This equation guarantees that the input image can be expressed as a weighted sum of a low-passed component and a high-passed term.
Because denotes the matrix multiplication form of a 2D convolution , Eq. (5) can be rewritten as
Consequently, by using discrete Fourier transform, the closed-form solution of can be derived as
Here, the element-wise operator denotes the Hadamard product, and the constant matrix has all entries equal to 1. Based on the estimated pseudo-background, the residue pattern of the input image is defined as the absolute difference between the input image and the pseudo-background. Based on this residue pattern, the vestige of man-made modifications can be revealed.
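The closed-form idea can be illustrated with a minimal NumPy sketch. It assumes the objective is of the form ||I - B||_F^2 + lam * ||H * B||_F^2 with circular (FFT) convolution, for which the minimizer is obtained by an element-wise division in the Fourier domain. The function names, the parameter `lam`, and the boundary handling are illustrative assumptions and may differ from the authors' Eq. (7).

```python
import numpy as np

def pseudo_background(image, kernel, lam):
    """Minimize ||I - B||_F^2 + lam * ||H (*) B||_F^2 under circular convolution.

    Assumed form, not necessarily the authors' exact Eq. (7).
    """
    img = np.asarray(image, dtype=float)
    # Zero-pad the kernel to the image size; only |H|^2 is used, so the
    # kernel's spatial offset (a pure phase factor) does not matter.
    H = np.fft.fft2(kernel, s=img.shape)
    # Element-wise (Hadamard) division in the frequency domain.
    B_hat = np.fft.fft2(img) / (1.0 + lam * np.abs(H) ** 2)
    return np.fft.ifft2(B_hat).real

def residue(image, kernel, lam):
    """Absolute difference between the input image and its pseudo-background."""
    img = np.asarray(image, dtype=float)
    return np.abs(img - pseudo_background(img, kernel, lam))
```

Because the result is a single deterministic formula, repeated runs on the same image give the same pseudo-background, which matches the determinism requirement stated earlier in this section.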
III-A Selecting a High-pass Filter
Although the high-pass kernel was assumed to be separable while deriving the closed-form solution, this constraint can be relaxed in practice. Based on the concept described by Eq. (6), namely that the input image is a linear combination of a low-passed component and a high-passed term extracted from the pseudo-background, any high-pass filter, whether separable or not, can result in an optimal pseudo-background. In practice, we construct the high-pass filter from a Gaussian kernel, e.g., the kernel given by the MATLAB built-in function fspecial('gaussian', [3 3], 1.0).
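A small sketch of how such a kernel could be built in NumPy follows. The Gaussian part mirrors what fspecial('gaussian', [3 3], 1.0) returns for a small kernel; the high-pass construction "unit impulse minus Gaussian" is an assumption made for illustration, since the exact expression is not recoverable from the text.

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """Normalized 2D Gaussian, analogous to fspecial('gaussian', [size size], sigma)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def highpass_kernel(size=3, sigma=1.0):
    """Assumed high-pass form: unit impulse minus Gaussian (coefficients sum to zero)."""
    delta = np.zeros((size, size))
    delta[size // 2, size // 2] = 1.0
    return delta - gaussian_kernel(size, sigma)
```

A kernel whose coefficients sum to zero has no response to constant regions, so a flat background contributes nothing to the filtered term.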
III-B Revealing Unnatural Patterns
To check whether any unusual pattern exists, the obtained residue pattern is further processed by the following steps:
Step-1: Normalize all pixel values of the residue pattern into a fixed range and apply hard thresholding to the normalized residue with a user-specified threshold value.
Step-2: Binarize the thresholded, normalized residue to obtain an indicator map, as in the examples shown in Figures 2 and 3.
Step-3: Fuse the input image and the indicator map via alpha blending, after staining the white area of the indicator map yellow, to highlight where unnatural patterns are located, as in the example shown in Figure 3(c).
Empirically, the default value of is ; and, the default value of is , although any value in between and is good for .
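The three steps above can be sketched as follows. The normalization range [0, 1], the parameter names `threshold`, `alpha`, and `max_value`, and the default values are assumptions chosen for illustration and are not the defaults stated in the paper.

```python
import numpy as np

def indicator_map(residue, threshold):
    """Steps 1-2: normalize to [0, 1], hard-threshold, then binarize."""
    r = np.asarray(residue, dtype=float)
    span = r.max() - r.min()
    norm = (r - r.min()) / span if span > 0 else np.zeros_like(r)
    kept = np.where(norm >= threshold, norm, 0.0)  # hard thresholding
    return kept > 0                                 # binary indicator map

def overlay(image, mask, alpha=0.5, max_value=255.0):
    """Step 3: alpha-blend the grayscale image with the yellow-stained map."""
    gray = np.asarray(image, dtype=float) / max_value
    rgb = np.repeat(gray[..., None], 3, axis=-1)
    yellow = np.array([1.0, 1.0, 0.0])
    # White (True) pixels of the map become yellow; black pixels stay black.
    stained = np.asarray(mask, dtype=bool)[..., None] * yellow
    return (1.0 - alpha) * rgb + alpha * stained
```

The returned overlay is an RGB array with values in [0, 1], in which flagged pixels take a yellow tint and unflagged pixels are darkened.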
IV Experiment Result
IV-A Robustness against Compression
We first verify FBI’s robustness against JPEG compression because published images are all compressed. In Figure 2, column-(a) shows mother images, and each of other columns is associated with a different compression quality setting. The upper row of Figure 2 shows inquiry results of simulated unmodified WB/PCR images whose foreground, i.e., the triangle and its bounding box, and background were contaminated by the same level of Gaussian noise (); and, the lower row contains results of copy-pasted forgery simulations whose background and foreground components were affected by different levels of Gaussian noises ( and ).
Based on Figure 2, we make four concluding remarks. First, for an unmodified image with homogeneous noise (upper row), the indicator map shows the same patterns on both background and foreground, no matter how severely the image was compressed. Second, for an edited WB/PCR image (lower row), the indicator map can highlight the difference between the background template and the copy-pasted foreground. Third, a high compression ratio (low compression quality) will not invalidate FBI; instead, a high compression ratio can make the copy-pasted foreground more distinguishable from the background template. Fourth, and most important, by conservatively reporting only the pattern shown in the lower part of Figure 2(f) as an almost-sure falsification, FBI is expected to be capable of avoiding false alarms.
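The simulation design of this subsection can be imitated in a simplified form, without the JPEG step and without the triangle foreground. The sketch below pastes a square patch whose noise level may differ from the background and compares the mean residue inside and outside the patch. All numeric values (noise levels, patch position, `lam`) and the impulse-minus-Gaussian kernel are hypothetical choices, not the settings used for Figure 2.

```python
import numpy as np

def highpass_kernel():
    """Assumed high-pass kernel: unit impulse minus a 3x3 Gaussian (sigma = 1)."""
    ax = np.arange(3) - 1.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / 2.0)
    h = -g / g.sum()
    h[1, 1] += 1.0
    return h

def residue(img, kernel, lam):
    """|I - B| with B from the assumed Fourier-domain closed form."""
    H = np.fft.fft2(kernel, s=img.shape)
    B = np.fft.ifft2(np.fft.fft2(img) / (1.0 + lam * np.abs(H) ** 2)).real
    return np.abs(img - B)

def simulate(sigma_bg, sigma_fg, seed=0, shape=(128, 128)):
    """Flat gray image with background noise and a pasted 48x48 noisy patch."""
    rng = np.random.default_rng(seed)
    img = 128.0 + rng.normal(0.0, sigma_bg, shape)
    patch = (slice(40, 88), slice(40, 88))
    img[patch] = 128.0 + rng.normal(0.0, sigma_fg, (48, 48))
    return img, patch

def contrast(sigma_bg, sigma_fg, lam=5.0):
    """Ratio of mean residue inside the patch to mean residue outside it."""
    img, patch = simulate(sigma_bg, sigma_fg)
    R = residue(img, highpass_kernel(), lam)
    outside = np.ones(img.shape, dtype=bool)
    outside[patch] = False
    return R[patch].mean() / R[outside].mean()
```

With equal noise levels the ratio stays close to one, whereas a patch with stronger noise yields a clearly larger ratio, which is the kind of inhomogeneity the indicator map is meant to expose.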
IV-B Tests on Open Data
Figure 3 shows the analysis results of Figure 4A in [10], one of the retracted papers listed in the investigation report released by Ohio State University [16, 20]. There are 252 western blot (WB) image bands (4 bands of Ctr, 4 bands of L, 4 bands of CG, and other bands) in Figure 3(c); however, both indicator maps shown in Figures 3(a)-(b) reveal that a band in the row labeled p-p70S6K is an empty zone (the horizontal black stripe). In addition, we can observe that although different parameter values result in different indicator maps, the locations and contour shapes of the empty zones in Figures 3(a)-(b) are nearly consistent. Consequently, we deem that this empty zone forms an unnatural pattern, which might be a vestige of man-made modifications, such as erasure or block-wise region removal.
Next, Figure 4 shows the WB images published with [7, 30] and [9]; these three papers have already drawn public attention and been questioned on the following PubPeer discussion pages: [21, 22, 23]. Here, three types of patterns can be observed. Examples of the type-1 pattern include the -Tubulin and p-AMPK rows in Figure 3(c), and the AKT and Tubulin rows in Figure 4(d). We consider this kind of pattern normal and standard because its indicator map is homogeneous and contains no empty zone and no vertical interruption stripe. Furthermore, we define the type-2 pattern to be an almost empty zone. For example, two bands of the IPP5 row in Figure 4(a), one band of the C-Caspase3 row and one band of the Caspase8 row in Figure 4(b), and the three bands (of the P-AKT row) beneath the minus signs in Figure 4(d) are all of this kind. We consider the type-2 pattern unnatural and surmise that it might result from a procedure similar to the one that causes the empty zone demonstrated in Figure 3. Finally, we emphasize that the band of the C-Caspase3 row in Figure 4(b) has a sharp vertical edge at its right-hand border, and this appearance is unlikely to be a chance coincidence.
Finally, the type-3 pattern includes patterns containing block-wise non-empty zones and patterns containing stripe-wise empty zones. For instance, bands in the mTOR row in Figure 4(c) and a band in the CF row in Figure 4(b) are exactly of this kind. Primary features of a type-3 pattern include i) a band (or an area containing multiple bands) that is independently surrounded by a rectangular zone formed by non-zero entries of the indicator map, and ii) non-zero zones that are separated from each other by narrow, vertical stripe-wise empty zones (gray area). Because it does not make sense to modify the gel background where no reaction or response happens, we conjecture that a type-3 pattern results from a copy-paste of a rectangular region. This conjecture is supported by the fact that, by using template matching, the PSNR between one pair of bands of the mTOR row in Figure 4(c) is 23.81 dB, and the PSNR between another pair of bands is 22.08 dB. Besides, because the public also suspected that the mTOR row contains copy-move forgeries [22], we conducted the following experiments to clarify this situation. (For more demonstrations, please refer to our supplemental document and [27].)
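The PSNR comparison mentioned above can be computed with the standard definition shown below. This sketch assumes the two band crops are already aligned and have the same size, so the template-matching step itself is not included; the function name and the `peak` default of 255 for 8-bit images are illustrative assumptions.

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized image crops."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("crops must have the same shape")
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical crops
    return 10.0 * np.log10(peak ** 2 / mse)
```

As noted in the introduction, a high PSNR alone indicates similarity rather than proof of cloning, which is why it is used here only as supporting evidence.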
IV-C Interpretation of Type-2 and Type-3 Patterns
We utilized one additional simulation dataset to clarify the cause of each unnatural pattern we encountered. This simulation dataset consists of thirteen post-processed WB/PCR images, which were designed to reproduce the unnatural patterns demonstrated in the previous subsection.
Figures 1(b) and 5(b) were designed to clarify the causes of type-2 and type-3 patterns. Figure 1(b) was created by copying three rectangular areas independently from other PCR images, pasting them together on the same template, and then adjusting image brightness and contrast properly. Meanwhile, Figure 5(b) was created by removing two of its bands from the source image. Hence, Figure 5(b) is disguised as a new PCR result with negative responses at those two bands, and Figure 1(b) looks like an ordinary experiment result containing six positive bands and one negative band.
Figures 5(a) and (c) show the inquiry results derived by FBI. The black rectangular region in Figure 5(c) denotes an empty zone in the indicator map, and this empty zone corresponds to what we removed from the source image. Figure 5(a) also reveals the way we created Figure 1(b), namely a background template with three copy-pasted rectangular foreground regions. Consequently, a possible way to create a type-2 pattern is erasure or block-wise region removal, and the type-3 pattern can be reproduced by a typical copy-paste procedure. These results also suggest that the questions raised by the public on PubPeer were reasonable and that images in those papers were indeed problematic.
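The link between block-wise removal and an empty zone can be reproduced with a toy example. The sketch below wipes the noise inside a rectangle of a synthetic noisy image and measures how much of the indicator map is non-zero inside and outside that rectangle. The image size, noise level, `lam`, `threshold`, and the impulse-minus-Gaussian kernel are hypothetical choices and do not correspond to the thirteen simulated images described above.

```python
import numpy as np

def highpass_kernel():
    """Assumed high-pass kernel: unit impulse minus a 3x3 Gaussian (sigma = 1)."""
    ax = np.arange(3) - 1.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / 2.0)
    h = -g / g.sum()
    h[1, 1] += 1.0
    return h

def residue(img, kernel, lam):
    """|I - B| with B from the assumed Fourier-domain closed form."""
    H = np.fft.fft2(kernel, s=img.shape)
    B = np.fft.ifft2(np.fft.fft2(img) / (1.0 + lam * np.abs(H) ** 2)).real
    return np.abs(img - B)

def indicator(R, threshold):
    """Normalize the residue to [0, 1] and binarize it at the threshold."""
    norm = (R - R.min()) / (R.max() - R.min())
    return norm >= threshold

def erased_demo(lam=5.0, threshold=0.05, seed=1):
    """Return the non-zero fraction of the map inside and outside an erased block."""
    rng = np.random.default_rng(seed)
    img = 128.0 + rng.normal(0.0, 5.0, (128, 128))
    img[40:80, 30:90] = 128.0            # block-wise removal wipes the noise
    M = indicator(residue(img, highpass_kernel(), lam), threshold)
    inner = M[48:72, 38:82]              # interior, 8 pixels away from the block edge
    outer = M[:30, :]                    # untouched background strip
    return inner.mean(), outer.mean()
```

In this toy setting the interior of the erased block is almost entirely zero in the indicator map while the untouched background is mostly non-zero, which resembles the type-2 empty zone.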
V Concluding Remarks
In this paper, we proposed a fast blind inquiry method, named FBI, for checking the integrity of images obtained from western blot (WB) and polymerase chain reaction (PCR) results. Based on an optimized pseudo-background, FBI can reveal traceable vestiges of inappropriate local modifications on WB/PCR images. Also, FBI is suitable for large-scale inspection tasks on WB/PCR image integrity because it is computationally efficient. Our experiment results show that images in papers questioned by the public on PubPeer are indeed doubtful. Finally, we emphasize two points. First, FBI was not designed for accusing anyone; instead, FBI was developed to help the academic community identify problematic figures and irreproducible experiments. Second, whether an image with an unnatural pattern invalidates its significance in that very field is beyond the scope of the FBI method.
This work is supported by the Ministry of Science and Technology, Taiwan (MOST 107-2320-B-030-012-MY3). The authors thank Prof. Cheng-Ting Chien for providing source WB/PCR images. The authors also thank Prof. Yung-Chang Chen and Dr. Hsiu-Ming Chang for their suggestions on this paper.
Hao-Chiang Shao (Member, IEEE) received his Ph.D. in electrical engineering from National Tsing Hua University, Taiwan, in 2012. He has been an Assistant Professor with the Department of Statistics and Information Science, Fu Jen Catholic University, Taiwan, since 2018. From 2012 to 2017, he was a postdoctoral researcher with the Institute of Information Science, Academia Sinica, involved in a series of Drosophila brain research projects; in 2017–2018, he was an R&D engineer with the Computational Intelligence Technology Center, Industrial Technology Research Institute, Taiwan, in charge of DNN-based automated optical inspection (AOI) projects. His research interests include 2D+Z image atlasing, 3D mesh processing, big industrial image data analysis, and machine learning.
Ya-Jen Cheng received her Ph.D. in molecular and cellular biology from National Tsing Hua University, Taiwan, in 2012. She has been the manager coordinating and executing the schemes planned by the Neuroscience Program of Academia Sinica (NPAS) since 2010. She specializes in strategic management, budgeting, and risk management, working with the leader and team members to align with and support key research and academic expectations. With her research background in genetics and imaging techniques, she is also the manager of the imaging equipment and the related analysis system in the neuroscience core facility. She is responsible for arranging technical training and training in image processing.
Meng-Yun Duh received her B.A. from Fu Jen Catholic University, Taiwan, in 2019. Her research interests include statistics, data science, and biomedical image forgery detection. She is now pursuing her M.A. degree.
Chia-Wen Lin (Fellow, IEEE) received his Ph.D. from National Tsing Hua University (NTHU), Taiwan, in 2000. Dr. Lin is currently Professor with the Department of Electrical Engineering and the Institute of Communications Engineering, NTHU. His research interests include image/video processing, computer vision, and machine learning. He served as Distinguished Lecturer of IEEE Circuits and Systems Society (2018–2019). He is Chair of IEEE ICME Steering Committee. He served as TPC Co-Chair of IEEE ICIP 2019 and IEEE ICME 2010, and General Co-Chair of IEEE VCIP 2018. He was a recipient of Outstanding Electrical Engineer Professor Award presented by the Chinese Institute of Electrical Engineering, Taiwan. He received two best paper awards from VCIP 2010 and 2015. He has served as an Associate Editor of IEEE Transactions on Image Processing, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, and IEEE Multimedia. He served as a Steering Committee member of IEEE Transactions on Multimedia from 2013 to 2015.
- (2016) The prevalence of inappropriate image duplication in biomedical research publications. mBio 7 (3), pp. e00809–16. Cited by: §I.
- (2013) Digital image forgery detection using passive techniques: a survey. Digit. Invest. 10 (3), pp. 226–245. Cited by: §I.
- University of Kentucky researcher under investigation after article retraction. Cited by: §I.
- (1981) “Western blotting”: electrophoretic transfer of proteins from sodium dodecyl sulfate-polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Anal. Biochem. 112 (2), pp. 195–203. Cited by: §I.
- https://www.cell.com/figureguidelines. Cited by: §I.
- (2012) An evaluation of popular copy-move forgery detection approaches. IEEE Trans. Inf. Forensics Security 7 (6), pp. 1841–1854. Cited by: §I.
- (2010) Functional effects of ptpn11 (shp2) mutations causing leopard syndrome on epidermal growth factor-induced phosphoinositide 3-kinase/akt/glycogen synthase kinase 3 signaling. Mol. Cell. Biol. 30 (10), pp. 2498–2507. Cited by: Fig. 4, §IV-B.
- (2009) Image forgery detection. a survey. IEEE Signal Process. Mag., pp. 16–25. Cited by: §I.
- (2009) Induction of apoptosis in human leukemia cells by grape seed extract occurs via activation of c-jun nh2-terminal kinase. Clin. Cancer Res. 15 (1), pp. 140–149. Cited by: Fig. 4, §IV-B.
- (2010) Development of novel adenosine monophosphate-activated protein kinase activators. J. Med. Chem. 53 (6), pp. 2552–2561. Cited by: Fig. 3, §IV-B.
- (1997) Postwar us business cycles: an empirical investigation. Journal of Money, credit, and Banking, pp. 1–16. Cited by: §I, §II.
- https://www.nature.com/nature-research/editorial-policies/image-integrity. Cited by: §I.
- (1971) Studies on polynucleotides: xcvi. repair replication of short synthetic dna’s as catalyzed by dna polymerases. J. Mol. Biol. 56 (2), pp. 341–361. Cited by: §I.
- (2004) Automatic detection of region-mura defect in tft-lcd. IEICE Trans. Inf.& Syst. 87 (10), pp. 2371–2378. Cited by: §II.
- (2017) Image forgery localization via integrating tampering possibility maps. IEEE Trans. Inf. Forensics Security 12 (5), pp. 1240–1252. Cited by: §I.
- doi:10.1126/science.aat7511. Cited by: §I, §IV-B.
- (1994) The polymerase chain reaction (nobel lecture). Angewandte Chemie International Edition in English 33 (12), pp. 1209–1213. Cited by: §I.
- doi:10.1126/science.aba3212. Cited by: §I.
- https://ori.hhs.gov/droplets. Cited by: §II.
- OSU investigation report: https://presspage-production-content.s3.amazonaws.com/uploads/2170/chen-documents-combined.pdf?10000. Cited by: §IV-B.
- https://pubpeer.com/publications/460af825d0f43944898b2c689be51e. Cited by: §IV-B.
- https://pubpeer.com/publications/7e17b85fb91772b61ad5c903b83ca3. Cited by: §IV-B, §IV-B.
- https://pubpeer.com/publications/e0892176ca8d34995b8b8fe8472396. Cited by: §IV-B.
- (2015) Image forgery detection using adaptive over segmentation and feature point matching. IEEE Trans. Inf. Forensics Security 10 (8), pp. 1705–1716. Cited by: §I.
- (2013) Survey on blind image forgery detection. IET Image Process. 7 (7), pp. 660–670. Cited by: §I.
- (2009) Robust segmentation for automatic detection of mura patterns. In IEEE 13th International Symposium on Consumer Electronics, 2009. (ISCE’09), pp. 267–270. Cited by: §I, §II.
- (2018) Unveiling vestiges of man-made modifications on molecular-biological experiment images. 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 534–538. Cited by: §IV-B.
- (2006) A mura detection method. Pattern Recognit. 39 (6), pp. 1044–1052. Cited by: §II.
- (1979) Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proceedings of the National Academy of Sciences 76 (9), pp. 4350–4354. Cited by: §I.
- (2008) IPP5, a novel protein inhibitor of protein phosphatase 1, promotes g1/s progression in a thr-40-dependent manner. J. Biol. Chem. 283 (18), pp. 12076–12084. Cited by: Fig. 4, §IV-B.
- (2005) A standard model for foveal detection of spatial contrast. J. Vis. 5 (9), pp. 6–6. Cited by: §II.
- (2000) Visual detection of spatial contrast patterns: evaluation of five simple models.. Opt. Express 6 (1), pp. 12–33. Cited by: §II.
- (2007) The spatial standard observer: a new tool for display metrology. Information Display 23 (1), pp. 12. Cited by: §I, §II.
- (US2006/0165311 A1) Spatial standard observer. United States Patent Application Publication. Cited by: §I.
- (2016) Iterative copy-move forgery detection based on a new interest point detector. IEEE Trans. Inf. Forensics Security 11 (11), pp. 2499–2512. Cited by: §I.