Hyper-Hue and EMAP on Hyperspectral Images for Supervised Layer Decomposition of Old Master Drawings
Old master drawings were mostly created step by step in several layers using different materials. To art historians and restorers, examination of these layers brings various insights into the artistic work process and helps to answer questions about the object, its attribution and its authenticity. However, these layers typically overlap and are oftentimes difficult to differentiate with the unaided eye. For example, a common layer combination is red chalk under ink.
In this work, we propose an image processing pipeline that operates on hyperspectral images to separate such layers. Using this pipeline, we show that hyperspectral images enable better layer separation than RGB images, and that spectral focus stacking aids the layer separation. In particular, we propose to use two descriptors in hyperspectral historical document analysis, namely hyper-hue and extended multi-attribute profile (EMAP). Our comparative results with other features underline the efficacy of the three proposed improvements.
Copyright 2018 IEEE. Published in the 2018 International Conference on Image Processing (ICIP 2018), scheduled for October 7-10, 2018 in Athens, Greece. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: +Intl.908-562-3966.
Red chalk was a highly popular drawing material until the late nineteenth century [1, 2]. In the artistic work process, it has oftentimes been used for creating a first sketch, in order to later overdraw it with ink. For art historians today, these sketches provide insights into the creation process of the art work. In particular, differences between the underlying sketch and the overdrawn picture can indicate changes in the direction of the work.
In this work, we investigate the particular case where red chalk is overdrawn by ink. A widely used technique to visualize structures below a layer of ink is infrared reflectography (IRR), i.e., imaging the object in the infrared range at wavelengths above 2000 nm. In this regime, ink becomes transparent. However, this approach is not applicable to red chalk. Red chalk consists primarily of natural red clay containing iron oxide, and the reflectance of red chalk at wavelengths above 2000 nm is very similar to that of the image carrier (i.e., the paper or parchment). As a consequence, this range of wavelengths cannot be used to visualize over-painted strata of red chalk [3, 4]. The difficulty of displaying and distinguishing the drawn strata with conventional IRR or remission spectroscopy poses a significant challenge to recovering the underlying layers. This is also shown in the comparative sequence of images from the apocryphal Rembrandt drawings in Munich (visible spectrum versus infrared imaging), published by Burmester and Renger [3, pp. 19-31, Figs. 7a-21b].
In this work, we propose to close this diagnostic gap and visualize red chalk below ink by using hyperspectral imaging together with a pattern recognition pipeline. Many works in the literature have used hyperspectral imaging for document analysis and demonstrated its superiority to RGB imaging [5, 6, 7]. Our contributions are three-fold: First, we propose two descriptors for use in hyperspectral historical document analysis, namely hyper-hue and the extended multi-attribute profile (EMAP). Second, we address a common artifact in hyperspectral imaging called focus shifting and propose spectral focus stacking as its solution. Third, we evaluate the proposed approaches on drawings that were created to exactly mimic the original work process.
II Hyperspectral Descriptors for Sketch Layer Separation
II-A Extended Multi-Attribute Profile (EMAP)
Attribute profiles are popular tools in remote sensing [8, 9]. The idea is to abstract morphological operators like opening or closing from specific shapes of structuring elements. The building blocks of attribute profiles are attribute filters that operate on connected components (CCs) of lower or equal gray level intensities. On each CC in the image, an attribute α (e.g., the area, standard deviation, or diameter of the CC) is computed and compared to a threshold λ. If α ≥ λ, the CC is preserved. Otherwise, the CC is merged with the closest neighboring CC. Analogously to classical morphological operators, attribute thickening (denoted as φ^λ) is the process of merging the CCs of an image I with neighboring CCs of higher gray level. Attribute thinning (denoted as γ^λ) is the process of merging the CCs of an image I with neighboring CCs of lower gray level.
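The filtering principle can be illustrated on a binary image, where merging a sub-threshold component with its background neighbor reduces to removing it. The following sketch (illustrative only, not the implementation used in this work) keeps exactly those components whose area attribute reaches the threshold:

```python
import numpy as np

def binary_area_filter(mask, lam):
    """Keep only connected components (4-connectivity) of a binary image
    whose pixel count (the area attribute) is >= lam; smaller components
    are removed. A simplified, binary analogue of an attribute thinning."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    out = np.zeros_like(mask, dtype=bool)
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                # flood fill to collect one connected component
                stack, comp = [(y, x)], []
                seen[y, x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if len(comp) >= lam:  # compare area attribute against threshold
                    for cy, cx in comp:
                        out[cy, cx] = True
    return out
```

In grayscale attribute filters the sub-threshold CC is merged with a neighboring CC instead of being deleted; the binary case above is the special case where that neighbor is the background.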
The attribute thinning profile of an image I, denoted by Π_γ(I), is generated by concatenating a series of attribute thinnings with an increasing criterion size λ_i:

Π_γ(I) = {γ^{λ_1}(I), γ^{λ_2}(I), …, γ^{λ_n}(I)}.
Analogously, the attribute thickening profile of an image I, denoted by Π_φ(I), is generated by concatenating a series of attribute thickenings with an increasing criterion size λ_i:

Π_φ(I) = {φ^{λ_1}(I), φ^{λ_2}(I), …, φ^{λ_n}(I)}.
The attribute profile (AP) is generated by concatenating the attribute thickening and thinning profiles together with the original image:

AP(I) = {φ^{λ_n}(I), …, φ^{λ_1}(I), I, γ^{λ_1}(I), …, γ^{λ_n}(I)}.
In the case of λ = 0, γ^0(I) = φ^0(I) = I. Therefore, for n threshold values, the attribute profile vector's size will be 2n + 1, i.e., n for attribute thinning, n for attribute thickening, and one for the original image.
By using more than one attribute and concatenating the generated APs, multi-attribute profiles (MAPs) are generated. Finally, stacking the computed MAPs over each spectral channel of a multi-/hyper-spectral image results in the extended multi-attribute profile (EMAP). EMAPs use both spatial and spectral signatures of a hyperspectral image and are capable of modeling and describing an image based on different attributes, e.g., area, standard deviation, and moment of the CCs. In this work, we used the same attributes and threshold values as the work by Ghamisi et al.
II-B Hyper-Hue

In the RGB color space, pixels are 3-dimensional. In this cube, (0, 0, 0) corresponds to the black color and (1, 1, 1) represents the white color. The vector connecting these two points, the diagonal of the cube, is called the achromatic axis. By projecting all points in the RGB cube onto a plane which is perpendicular to the achromatic axis and includes the point (0, 0, 0), the so-called chromatic plane with regular hexagon shaped borders is formed.
For n-dimensional hyperspectral images, the same concept can be extended. Each point is represented by n values. Therefore, a pixel x of an n-dimensional hyperspectral image is defined as

x = (x_1, x_2, …, x_n) ∈ [0, 1]^n.
Let 0 = (0, …, 0) denote black in n dimensions, which we call HyperBlack, and let analogously 1 = (1, …, 1) denote HyperWhite. Let furthermore a denote the achromatic hyper-axis, the vector connecting these two n-dimensional points, which is the normal vector of the hyper-chromatic plane P. In order to mathematically define P, we derive its spanning unit vectors. In an n-dimensional space, P is spanned by n − 1 pairwise perpendicular n-dimensional unit vectors u_1, …, u_{n−1}. The vectors u_i have the properties that (1) they start from the point HyperBlack, (2) they are pairwise perpendicular, (3) they are unit vectors and therefore their norm is 1, (4) the direction of u_i points towards the chromatic hyper-space, and (5) they are orthogonal to a.
Suppose the first i − 1 elements of u_i are 0 and the remaining n − i + 1 elements are non-zero. From these elements, denote the first one as q and the remaining n − i elements as p. As derived in the work of Liu et al., we obtain a basis for P by setting p = 1/√((n − i)² + (n − i)) and q = −(n − i) p. The projection of a hyperspectral point x onto P is then

c = (x · u_1, x · u_2, …, x · u_{n−1}).
Liu et al. defined the hyper-hue h, saturation s and intensity i of a hyperspectral point x via its projection c as

h = c / ‖c‖,   s = ‖c‖,   i = (1/n) ∑_{j=1}^{n} x_j.
In this way, an extension of HSI color space is defined for hyperspectral images.
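Since the achromatic hyper-axis points in the all-ones direction, the chromatic component can equivalently be obtained by subtracting a pixel's mean value from each of its channels, without forming the basis u_1, …, u_{n−1} explicitly; the result is then expressed in the original n-dimensional coordinates rather than in the u_i basis. A minimal sketch of this computation (function name and interface are illustrative):

```python
import numpy as np

def hyper_hue_si(x):
    """Derive hyper-hue h, saturation s and intensity i of an
    n-dimensional spectral pixel x by projecting it onto the chromatic
    hyper-plane P (perpendicular to the achromatic axis a = (1, ..., 1)).
    The projection is computed by removing the achromatic component."""
    x = np.asarray(x, dtype=float)
    i = x.mean()                  # intensity: position along the achromatic axis
    c = x - i                     # chromatic component (projection onto P)
    s = np.linalg.norm(c)         # saturation: distance from the achromatic axis
    h = c / s if s > 0 else np.zeros_like(c)  # hyper-hue: direction within P
    return h, s, i
```

For a gray pixel (all channels equal) the saturation is zero and the hyper-hue is undefined; the sketch returns the zero vector in that case.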
III Processing Pipeline
III-A Sensitivity Normalization
Hyperspectral imaging setups suffer from various limitations and artifacts which need to be corrected. Fig. 1 (a)-(e) show five sample channels of a raw hyperspectral image (HSI): channel 20 (representing the 407.31 nm wavelength), channel 40 (455.41 nm), channel 70 (528.32 nm), channel 130 (676.93 nm), and channel 230 (932.82 nm). As can be observed, the sensitivity of the HS camera sensor is not uniform along the spectrum. Using a white reference, the uneven sensitivity can be corrected. Fig. 2-(a) shows the normalized sensitivity diagram of the sensor measured from a white reference. The inverse of this diagram is used as the sensitivity normalization coefficient. Fig. 2 (b)-(f) show the sensitivity-normalized versions of the channels presented in Fig. 1 (a)-(e), respectively.
Every imaging setup needs good lighting for an acceptable acquisition, and HS imaging is no exception. In a real-world scenario, in a museum for instance, the subject may not be evenly illuminated. To simulate this situation, we sidelit our scene. Using a white reference, we estimate the uneven lighting, as shown in Fig. 3-(a). Fig. 3 (b)-(f) show the illumination-corrected versions of Fig. 2 (b)-(f), respectively.
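Both correction steps amount to dividing the raw data by measurements of a white reference, which capture the per-channel sensor sensitivity and the spatially uneven illumination. A minimal sketch of such a flat-field correction (variable names are illustrative; the pipeline above applies the two corrections as separate steps):

```python
import numpy as np

def correct_cube(raw, white, dark=None, eps=1e-8):
    """Flat-field correction of a hyperspectral cube (rows x cols x channels).
    `white` is a white-reference measurement broadcastable against `raw`;
    dividing by it normalizes both the per-channel sensor sensitivity and
    the uneven illumination. `dark` optionally removes the sensor's dark
    signal from both measurements before the division."""
    raw = raw.astype(float)
    white = white.astype(float)
    if dark is not None:
        raw = raw - dark
        white = white - dark
    # guard against division by (near-)zero white-reference values
    return np.clip(raw / np.maximum(white, eps), 0.0, None)
```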
III-B Focus Stacking
Common HS cameras suffer from focus shifting, a well-known artifact in the field: not all channels are simultaneously in focus in a single acquisition. Fig. 4 shows this behavior for two hyperspectral images, one acquired with the lens focused in the blue range and one with the red spectrum in focus. Fig. 4 (a) shows channel 41, representing the 458.82 nm wavelength, of the blue-focused image. Fig. 4 (b) shows the same channel of the red-focused image. Especially on fine edges, we can observe that (a) is sharper and more in focus. Similarly, Fig. 4 (c) and (d) show channel 200, representing the 854.97 nm wavelength, of the blue-focused and the red-focused image, respectively. This time, the channel of the red-focused image is sharper.
One contribution of this work lies in producing one hyperspectral image with all channels in focus via spectral focus stacking. To this end, we acquire two images with two different focus points, one in the blue spectrum and one in the red spectrum. The final all-in-focus image is generated from the in-focus channels of the two input images: we use the first 75 channels from the blue-focused image and the remaining 183 channels from the red-focused image. We quantitatively compared our all-in-focus HSI with the two input images. The results are presented in Table I and discussed in Sec. IV-C.
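The channel-wise merging described above can be sketched as follows, assuming two geometrically registered cubes with 258 channels (75 + 183, as in the text); the function name and interface are illustrative:

```python
import numpy as np

def spectral_focus_stack(cube_blue, cube_red, split=75):
    """Merge two registered hyperspectral cubes (rows x cols x channels)
    into one all-in-focus cube: the first `split` channels (short, blue
    wavelengths) are taken from the blue-focused acquisition, the
    remaining channels from the red-focused one."""
    assert cube_blue.shape == cube_red.shape
    return np.concatenate([cube_blue[..., :split], cube_red[..., split:]],
                          axis=-1)
```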
In a previous study on this application, we followed an unsupervised approach with k-means and GMM clustering algorithms, which performed weakly, especially for diluted red chalk. In this work, we assume that it is feasible to obtain a limited number of labeled pixels from a specialist, e.g., an art historian. This allows us to use supervised learning to evaluate the proposed features and processing pipeline for layer separation. We consider the three classes red chalk, diluted red chalk, and black ink. Classification is performed using a random forest (RF) with 10 trees. The number of variables for training the trees and bagging is set to the square root of the number of features, as proposed by Breiman. We use 100 random samples per class for training and the rest for testing. We repeat this process 25 times and report the average classification performance metrics and their standard deviation (SD).
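The repeated per-class sampling protocol can be sketched as below; note that a simple nearest-centroid classifier stands in for the random forest solely to keep the example self-contained, and all names are illustrative:

```python
import numpy as np

def evaluate_protocol(X, y, n_train=100, n_runs=25, seed=0):
    """Repeatedly draw n_train random pixels per class for training and
    test on the remaining pixels; return mean and standard deviation of
    the overall accuracy over n_runs repetitions. A nearest-centroid
    classifier replaces the paper's random forest in this sketch."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    accs = []
    for _ in range(n_runs):
        # sample n_train training pixels from each class without replacement
        train_idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=n_train, replace=False)
            for c in classes])
        test_mask = np.ones(len(y), dtype=bool)
        test_mask[train_idx] = False
        # stand-in classifier: assign each test pixel to the nearest class mean
        centroids = np.stack([X[train_idx][y[train_idx] == c].mean(axis=0)
                              for c in classes])
        d = np.linalg.norm(X[test_mask, None, :] - centroids[None], axis=2)
        pred = classes[np.argmin(d, axis=1)]
        accs.append(np.mean(pred == y[test_mask]))
    return float(np.mean(accs)), float(np.std(accs))
```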
IV-A1 Phantom Data
We created a set of sketches with multiple layers of graphite, chalk, and different inks of the same chemical composition as those commonly used in old master drawings. After each layer was drawn, the picture was scanned with a book scanner (Zeutschel OS 12000, in RGB mode). This step-by-step documentation of the controlled creation process allows us to compute ground truth drawing layers by subtracting two subsequent scanned images. A sample sketch from this data is shown in Fig. 5.
IV-A2 Hyperspectral Imaging
For imaging, we use a Specim PFD-CL-65-V10E hyperspectral camera equipped with a CMOS sensor, capable of capturing the spectrum in a wavelength range of 400 nm to 1000 nm. The imaging setup is shown in Fig. 6. We use a lens with 16 mm focal length and the distance between the subject and the camera is 68 cm. The document is illuminated with a 500 W tungsten lamp.
IV-A3 Simulated RGB
In order to compare the effectiveness of hyperspectral images for layer separation with that of RGB images, we simulated RGB images from our hyperspectral images. The blue color in the RGB domain corresponds to the wavelengths between 415 nm and 495 nm (HSI channels 24 to 56). Similarly, the green color corresponds to the wavelength range of 495 nm-570 nm (HSI channels 57 to 87) and the red color lies between 620 nm-750 nm (HSI channels 108 to 156). We generated the red, green and blue channels by taking the average of HSI channels 108-156, 57-87 and 24-56, respectively. The reason that we do not use the RGB image acquired by the board scanner for comparison is that the board scanner has a newer sensor, higher resolution, higher signal to noise ratio (SNR) and better lighting conditions; the comparison would therefore not be fair. Fig. 7 shows the RGB images simulated from the HSIs, before and after pre-processing.
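The band averaging described above can be sketched as follows (channel indices are 1-based, as in the text; the function name is illustrative):

```python
import numpy as np

def simulate_rgb(cube):
    """Simulate an RGB image from a hyperspectral cube
    (rows x cols x channels) by averaging the channels falling into each
    color band: R = channels 108-156 (620-750 nm), G = channels 57-87
    (495-570 nm), B = channels 24-56 (415-495 nm)."""
    r = cube[..., 107:156].mean(axis=-1)   # 1-based channels 108-156
    g = cube[..., 56:87].mean(axis=-1)     # 1-based channels 57-87
    b = cube[..., 23:56].mean(axis=-1)     # 1-based channels 24-56
    return np.stack([r, g, b], axis=-1)
```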
IV-B Evaluation Protocol
IV-B1 Registration of HSI to the Ground Truth
Our ground truth, generated from Fig. 5, is acquired by a board scanner. The HSI images are acquired via a line-scanning hyperspectral camera. Different modalities, resolutions, aspect ratios and the non-flat surface of the paper make the images from these modalities geometrically different. In order to compare the HSI analysis output, the hyperspectral images need to be registered to the board scanner image. In a previous study, we concluded that a non-rigid registration using the residual complexity (RC) similarity measure suits our purpose well. Therefore, we use RC to register our HSIs to the RGB image acquired by the board scanner.
To evaluate the classification performance, we use overall accuracy (OA), average accuracy (AA) and the Kappa coefficient. OA is the number of correctly classified instances divided by the number of all samples, while AA is the mean of the class-based accuracies. The kappa statistic measures how closely the classified samples match the ground truth while correcting for the agreement that a random classifier would achieve by chance.
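All three metrics can be computed from a confusion matrix; a minimal sketch (function name is illustrative):

```python
import numpy as np

def oa_aa_kappa(y_true, y_pred, n_classes):
    """Overall accuracy (fraction of correctly classified samples),
    average accuracy (mean of per-class accuracies) and Cohen's kappa
    (agreement corrected for chance) from integer label vectors."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))
    # chance agreement from the marginal class distributions
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```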
IV-C1 Impact of Spectral Focus Stacking
In order to study the effect of spectral focus stacking, we conducted two sets of experiments, one on simulated RGB images and one on HSIs. In the first experiment, we generated RGB images from the blue-focused, the red-focused, and the all-in-focus HSI. In the second experiment, we carried out the classification on the illumination-corrected blue-focused, red-focused, and all-in-focus HSIs. The results for these two experiments are presented in Table I. It can be seen that in both scenarios, spectral focus stacking yields better AA, OA and Kappa performance.
|Feature||AA% (SD)||OA% (SD)||Kappa (SD)|
|Simulated RGB image from HSI|
|Blue focus||70.63 (1.41)||60.82 (2.51)||0.3515 (0.0227)|
|Red focus||72.32 (1.15)||63.62 (3.09)||0.3777 (0.0313)|
|Focus Stacking||73.72 (1.10)||64.96 (2.40)||0.3980 (0.0257)|
|HSI|
|Blue focus||74.76 (0.94)||64.67 (1.45)||0.3998 (0.0169)|
|Red focus||76.12 (0.96)||66.34 (1.42)||0.4186 (0.0165)|
|Focus Stacking||76.57 (0.94)||67.21 (3.56)||0.4304 (0.0366)|
IV-C2 Layer Separation Performance of the Proposed Features
As spectral focus stacking results in better performance, the remaining computations are performed over all-in-focus images. To study the impact of illumination correction, hyper-hue, and EMAP, we generated the following features.
SimRGB: Simulated RGB image, generated from the illumination-uncorrected all-in-focus HSI,
SimRGB-IC: Simulated RGB image, generated from the illumination-corrected HSI,
SimRGB-IC-EMAP: EMAP computed on SimRGB-IC. We used area as the only EMAP attribute with 20 thresholds. In order to choose the threshold values, we followed an approach similar to that of Ghamisi et al.,
HSI: Illumination-uncorrected all-in-focus HSI,
HSI-IC: Illumination-corrected all-in-focus HSI,
HSI-DR: HSI-IC projected onto its PCA components such that 99.9% of its variance is preserved,
HSI-h: Hyper-hue computed from the illumination-corrected HSI,
HSIhSI: HSI-IC, hyper-hue, saturation (s) and intensity (i) concatenated together,
HSIhSI-DR: Dimensionality reduced HSIhSI via PCA so that 99.9% of its variance is preserved.
HSI-EMAP: EMAP computed on the dimensionality-reduced HSI-IC. EMAP's parameters are chosen as for SimRGB-IC-EMAP,
HSIhSI-EMAP: EMAP computed on dimensionality reduced HSIhSI. EMAP parameters are chosen similar to SimRGB-IC-EMAP.
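Several of the features above rely on PCA-based dimensionality reduction preserving 99.9% of the variance; a minimal sketch of that step (function name is illustrative):

```python
import numpy as np

def pca_reduce(X, var_kept=0.999):
    """Project the samples (rows of X) onto the smallest number of
    principal components whose cumulative explained variance reaches
    var_kept (99.9% in the text)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; squared singular values are proportional
    # to the variance explained by each principal component
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(var), var_kept) + 1)
    return Xc @ Vt[:k].T
```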
The results for the features are presented in Table II. The first observation is that illumination correction always improves the results, both for SimRGB vs. SimRGB-IC and for HSI vs. HSI-IC. Furthermore, comparing the SimRGB-IC with HSI-IC indicates that an illumination-corrected HSI performs better than an RGB image. In HSI-DR, applying PCA further improves the HSI performance.
Hyper-hue computed over the HSI (HSI-h) results in a big jump in performance. Furthermore, the standard deviation of HSI-h is the smallest among all features, which indicates a high stability of this feature. Combining hyper-hue, saturation and intensity with the HSI image (HSIhSI) cannot exceed the performance of hyper-hue alone. Dimensionality reduction on this combination (HSIhSI-DR) also cannot compete with HSI-h, and performs even worse than HSIhSI. EMAP computed on the HSI (HSI-EMAP) results in a performance comparable to HSI-h. Finally, computing EMAP over HSIhSI leads to a slight improvement and yields the overall best layer separation performance.
It is worth mentioning that the threshold values we chose for EMAP by following the approach of Ghamisi et al. are probably not optimal. Observing a competitive performance of EMAP in this work motivates us to study other attributes and threshold values in future work.
|Feature||AA% (SD)||OA% (SD)||Kappa (SD)|
|SimRGB||71.83 (0.79)||62.05 (1.90)||0.3632 (0.0178)|
|SimRGB-IC||73.72 (1.10)||64.96 (2.40)||0.3980 (0.0257)|
|SimRGB-IC-SI||74.29 (0.61)||66.08 (2.57)||0.4119 (0.0261)|
|SimRGB-IC-EMAP||74.63 (0.77)||67.25 (1.84)||0.4251 (0.0170)|
|HSI||75.43 (1.05)||66.94 (2.11)||0.4196 (0.0217)|
|HSI-IC||76.57 (0.94)||67.21 (3.56)||0.4304 (0.0366)|
|HSI-DR||80.35 (0.66)||72.58 (1.53)||0.5019 (0.0183)|
|HSI-h||83.00 (0.47)||77.39 (1.28)||0.5731 (0.0161)|
|HSIhSI||82.86 (0.52)||77.16 (1.53)||0.5701 (0.0213)|
|HSIhSI-DR||79.58 (0.86)||71.00 (2.41)||0.4817 (0.0273)|
|HSI-EMAP||82.61 (1.11)||77.35 (2.53)||0.5719 (0.0350)|
|HSIhSI-EMAP||83.08 (0.89)||77.70 (1.18)||0.5766 (0.0191)|
In order to see the effect of a better sensor, SNR and lighting, we also classified the down-sampled RGB image acquired by the RGB board scanner (see Table III). These results are superior to the HSI-based results, which was expected. Together with the results in Table II, we conclude that multi-/hyper-spectral imaging with suitable processing can outperform RGB imaging when operating on images with identical noise and photon statistics. While the photon statistics are typically bounded by the fact that historic documents may not be exposed to too much light, it will be interesting to investigate multi-spectral imaging with a DSLR camera in future work, due to its improved resolution, SNR, and dynamic range.
|AA% (SD)||OA% (SD)||Kappa (SD)|
|87.71 (0.66)||83.68 (1.31)||0.6773 (0.0206)|
The results are qualitatively compared in Fig. 8. In this figure, (a) represents the ground truth (GT). Red color in the GT corresponds to red chalk, green represents red chalk that is overlaid by black ink, and blue is the black ink class. Black represents the background and is not considered during the classification. As can be observed from this figure, the SimRGB label map contains a high portion of misclassifications, which is strongly reduced by HSI-h, HSI-EMAP and HSIhSI-EMAP.