Hyungjoo Cho* phelahab@gmail.com
Seoul National University
Sungbin Lim sungbin@korea.ac.kr
Korea University
Gunho Choi ghc0311@gmail.com
Yonsei University
Hyunseok Min min6284@gmail.com
* Equal contribution

The performance of data-driven networks for tumor classification varies with the stain-style of histopathological images. This article proposes a stain-style transfer (SST) model based on conditional generative adversarial networks (GANs), which learns not only a given color distribution but also the corresponding histopathological pattern. Our model uses a feature-preserving loss in addition to the well-known GAN loss. Consequently, our model not only transfers initial stain-styles to the desired one but also prevents the degradation of the tumor classifier on transferred images. The model is evaluated on the CAMELYON16 dataset.

Neural Stain-Style Transfer Learning using GAN for Histopathological Images

Keywords: Deep Learning, Stain Normalization, Domain Adaptation, Neural Style Transfer, Generative Adversarial Network

1 Introduction

Deep learning based image recognition has received a lot of attention due to its notable applications in digital histopathology, including automatic tumor classification. Convolutional neural networks (CNNs) have recently achieved state-of-the-art performance in image classification and detection and, in particular, have replaced traditional rule-based methods in several medical image diagnosis contests LeCun et al. (2015); Wang et al. (2016). Such a data-driven approach depends heavily on the quality of the training dataset, hence it requires sensible preprocessing. In histopathology, staining, e.g. with haematoxylin and eosin (H&E), is essential for examining the microscopic presence and characteristics of disease, not only for pathologists but also for neural networks. For digital histopathology, several stain normalization preprocessing methods are well known Ruifrok et al. (2001); Reinhard et al. (2001); Ruifrok et al. (2003); Annadurai (2007); Magee et al. (2009); Macenko et al. (2009); Khan et al. (2014); Li and Plataniotis (2015); Bejnordi et al. (2016).

Figure 1: Samples of tissue tiles from different institutes in the CAMELYON16 cam (2016) and CAMELYON17 cam (2017) datasets. The first row shows normal samples, and the second row shows tumor samples. Samples of institutes 1 and 2 are included in the cam (2016) dataset; the others are included in the cam (2017) dataset.

Standard stain normalization algorithms are based on stain-specific color deconvolution Ruifrok et al. (2001). Stain deconvolution requires prior knowledge of reference stain vectors for every dye present in the whole-slide images (WSI). Ruifrok et al. (2001) suggested a manual approach to estimate the color deconvolution vectors by selecting representative sample pixels from each stain class, and a similar approach was used in Magee et al. (2009) for extracting stain vectors. Such manual estimation of stain vectors, however, strongly limits their applicability in large studies. Khan et al. (2014) modified Magee et al. (2009) by estimating stable stain matrices using an image-specific color descriptor. Combined with a robust color classification framework based on a variety of training data from a particular stain with nonlinear channel mappings, the method ensured smooth color transformation without introducing visual artifacts. Another approach used in Bejnordi et al. (2016) transforms the chromatic and density distributions for each individual stain class in the hue-saturation-density (HSD) color model. See Bejnordi et al. (2016) and references therein.

Stain normalization methods for histopathological images have been studied extensively, and yet they still face challenging issues. Most of the conventional methods use various thresholds to filter out backgrounds and other irrelevant dimensions. However, these methods cannot represent the broad feature distribution of the entire target image, so they require manual tuning of hyper-parameters such as thresholds. Furthermore, since nuclei detection has a significant impact on the performance of color normalization, good performance is unlikely if the nuclei detection stage makes a mistake. Finally, although the major aim of most conventional approaches is to enhance the prediction performance of the classification system, these stain normalization methods and the classifier work separately. It is reported that the performance of a network varies across institutes even when they apply the same staining method Ciompi et al. (2017). To prevent such variation, a domain adaptation method is required.

In this paper, we propose a novel stain-style transfer method using deep learning, together with a special loss function which minimizes the difference between the latent features of the input image and those of the target image, and thus preserves the performance of the classifier. We implement a fully convolutional network (FCN) Long et al. (2015) in the proposed stain-style generator, which learns the color distribution of the dataset used to train the tumor classifier.

Our contributions in this paper are twofold. First, we replace the color normalization methods with a generative model which learns a certain stain-style distribution of the dataset. Second, we introduce a feature-preserving loss which induces the classifier to extract better features than it does with other methods.

2 Stain-Style Transfer with GAN

2.1 Stain-Style of Dataset

In this section, we summarize relevant material on our model. Let $\mathcal{I}$ be a set of institutes and let $\mathcal{X}$ be the dataset of histological samples $x$ with corresponding labels $y$. The class of stained images, or color images with RGB channels, denoted by $\mathcal{C}$, is defined to be the set of $(3 \times H \times W)$-matrices with entries in $[0, 255]$. Under this setting, we define the stain-style of institute $i \in \mathcal{I}$ to be a random variable $S_i : \mathcal{X} \to \mathcal{C}$ with a probability distribution

$$S_i(x) \sim p_i(c), \quad c \in \mathcal{C}.$$

Since $S_i$ admits a certain conditional probability distribution $p_i(c \mid y)$, the definition of different stain-styles with the same label makes sense.

2.2 Tumor Classifier Network

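Suppose we have trained a tumor classifier network $\mathcal{T}$ which infers the histological pattern $y$ of an input image $x$. We write $\mathcal{T}_i$ if the classifier is trained on a dataset which follows stain-style $S_i$. We estimate the performance of $\mathcal{T}_i$ by

$$\mathbb{E}_{(x,y)}\big[\ell(\mathcal{T}_i(x), y)\big] \tag{1}$$

where $\ell$ is a loss function for classification, e.g. cross-entropy. In practice, the classifier learns from stained images $S_i(x)$ rather than from $x$. Hence one can decompose the classifier as $\mathcal{T}_i = T_i \circ S_i$, where $T_i$ is the actual network trained on the dataset with stain-style $S_i$. In this case, we estimate (1) by

$$\mathbb{E}_{(x,y)}\big[\ell(T_i(S_i(x)), y)\big] \tag{2}$$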

2.3 Stain-Style Transfer Network

Figure 2: Overview of the stain-style transfer network. The network is composed of two transformations: gray normalization $N$ and style-generator $G$. $N$ standardizes the stain-style of color images from different institutes, and $G$ colorizes gray images following the stain-style of a certain institute.

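Since the stain-styles of the institutes differ, the histological pattern in an image from a different institute would break up from the viewpoint of the classifier network. Consequently, the classifier would show degraded performance Ciompi et al. (2017):

$$\mathbb{E}_{(x,y)}\big[\ell(T_i(S_j(x)), y)\big] \geq \mathbb{E}_{(x,y)}\big[\ell(T_i(S_i(x)), y)\big], \quad j \neq i,$$

where $T_i$ is the classifier trained on stain-style $S_i$ and $S_j$ is the stain-style of another institute $j$.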

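To overcome this problem, we propose the stain-style transfer (SST) network $\Phi$, which transfers the stain-style $S_j$ to the initial stain-style $S_i$. Precisely, our aim is to find a network $\Phi$ which satisfies

$$p_{\Phi \circ S_j}(c \mid y) = p_i(c \mid y) \quad \text{for all } c \in \mathcal{C}, \tag{3}$$

where $p_{\Phi \circ S_j}(\cdot \mid y)$ and $p_i(\cdot \mid y)$ denote the conditional distributions of $\Phi(S_j(x))$ and $S_i(x)$ given the label $y$. Due to the change of variable formula (Durrett, 2010, Theorem 1.6.9), (3) implies

$$\mathbb{E}_{(x,y)}\big[\ell(T_i(\Phi(S_j(x))), y)\big] = \mathbb{E}_{(x,y)}\big[\ell(T_i(S_i(x)), y)\big],$$

hence the tumor classifier recovers its performance (2).

We emphasize that our SST network does not require the dataset of institute $j$ to train both $T_i$ and $\Phi$. To make $\Phi$ independent of institute $j$, we employ the gray normalization $N$ and train the stain-style generator $G$ such that $\Phi = G \circ N$, as illustrated in Figure 2.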

2.4 Stain-Style Generator by Conditional GAN

Figure 3: Illustration of feature-preserving loss. We use the global average pooled layers of input and generated image as the input of feature preserving loss.

To train the style-generator $G$, we introduce three loss functions: (a) reconstruction loss, (b) GAN loss, and (c) feature-preserving loss.

2.4.1 Reconstruction Loss

Restricted to the initial stain-style $S_i$, the SST network $\Phi$ should be a reconstruction map, i.e. $\Phi \circ S_i = S_i$. Hence we apply a reconstruction loss that minimizes the pixel-wise distance between $\Phi(c)$ and its original image $c$, using the architecture from Quan et al. (2016), which has a very deep structure with short-cut and skip connections. The reconstruction loss is denoted by $\mathcal{L}_{\mathrm{rec}}$.
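A minimal sketch of such a reconstruction loss is given below. The norm used in our experiments is not reproduced in this text, so the exponent `p` is left as a parameter; the function name `reconstruction_loss` and the array layout are assumptions made only for illustration.

```python
import numpy as np

def reconstruction_loss(original, reconstructed, p=2):
    """Mean p-th power pixel distance between two images of equal shape.

    Assumption: p=1 gives a mean absolute error and p=2 a mean squared
    error; which one the paper used is not stated in the surviving text.
    """
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(reconstructed, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean(np.abs(a - b) ** p))

# Hypothetical 3-channel tiles: a perfect reconstruction gives zero loss.
tile = np.zeros((3, 4, 4))
print(reconstruction_loss(tile, tile))             # 0.0
print(reconstruction_loss(tile, tile + 2.0, p=1))  # 2.0
```

A perfect reconstruction map on the training stain-style drives this value to zero, which is the behavior the loss is meant to encourage.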

2.4.2 Conditional GAN Loss

As shown in Pathak et al. (2016), mixing the GAN loss Goodfellow et al. (2014) with a traditional loss, such as a pixel-wise reconstruction loss, improves the performance of the generator. Since we have labeled images, a conditional GAN Mirza and Osindero (2014) is applied instead of the original GAN Goodfellow et al. (2014). By means of the GAN, $G$ learns a mapping from gray-normalized images to color images with stain-style $S_i$ and tries to trick the discriminator $D$. Here $D$ distinguishes between fake and real images, using the architecture from DCGAN Radford et al. (2015). We use the following GAN loss

$$\mathcal{L}_{\mathrm{GAN}}(G, D) = \mathbb{E}_{(c,y)}\big[\log D(c \mid y)\big] + \mathbb{E}_{(c,y)}\big[\log\big(1 - D(G(N(c)) \mid y)\big)\big]$$

While $D$ learns to maximize $\mathcal{L}_{\mathrm{GAN}}$, $G$ tries to minimize it until both arrive at their optimal states. Through the above procedure, every stained image might be transferred to have the desired stain-style. However, this approach often tends to produce the most frequent color images regardless of the histological pattern. This phenomenon is called mode collapse (of GANs), and it may prevent us from achieving (3). Therefore we need an additional loss function.
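The minimax objective described above can be sketched numerically as follows. This is only an illustration of the standard GAN value function of Goodfellow et al. (2014) evaluated on discriminator outputs; the function name `gan_loss` and the clipping constant `eps` are assumptions, and the conditioning of the discriminator is left out for brevity.

```python
import numpy as np

def gan_loss(d_real, d_fake, eps=1e-12):
    """Value of E[log D(real)] + E[log(1 - D(fake))].

    d_real: discriminator probabilities on real color images.
    d_fake: discriminator probabilities on generated color images.
    The discriminator tries to increase this value, the generator
    tries to decrease it.
    """
    d_real = np.clip(np.asarray(d_real, dtype=np.float64), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# An undecided discriminator (0.5 everywhere) yields -2 * log(2).
print(gan_loss([0.5, 0.5], [0.5, 0.5]))
# A confident, correct discriminator yields a larger (better for D) value.
print(gan_loss([0.9, 0.9], [0.1, 0.1]))
```

When the generator fools the discriminator, the second term becomes strongly negative, which is the direction the generator pushes toward.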

2.4.3 Feature-preserving Loss

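As in (3), in the optimal state $G^*$, the output of the SST network should approximate the target stain-style $S_i$. In terms of the Kullback-Leibler divergence, (3) can be restated as

$$D_{\mathrm{KL}}\big(p_i(\cdot \mid y) \,\|\, p_{\Phi \circ S_j}(\cdot \mid y)\big) = 0 \tag{4}$$

To obtain $G^*$, with (4) in mind, we employ the feature-preserving loss

$$\mathcal{L}_{\mathrm{FP}} = \mathbb{E}_{c}\big[D_{\mathrm{KL}}\big(F(c) \,\|\, F(\Phi(c))\big)\big]$$

where $F(c)$ indicates the feature of a given color image $c$ extracted from the classifier $T_i$. As illustrated in Figure 3, the feature vector is taken from the final layer before the activation function, precisely the global average pooled layer.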
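The following sketch illustrates one way a feature-preserving loss on global average pooled features could be computed. The softmax normalization and the use of the Kullback-Leibler divergence as the distance are assumptions made for this illustration, as are the function names; the feature maps here are random arrays standing in for classifier activations.

```python
import numpy as np

def global_average_pool(fmap):
    """Reduce a (C, H, W) feature map to a (C,) feature vector."""
    return np.asarray(fmap, dtype=np.float64).mean(axis=(1, 2))

def softmax(v):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def feature_preserving_loss(fmap_input, fmap_generated, eps=1e-12):
    """KL divergence between normalized pooled features (assumed form)."""
    p = softmax(global_average_pool(fmap_input))
    q = softmax(global_average_pool(fmap_generated))
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Hypothetical feature maps with 4 channels of size 3 x 3.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3, 3))
print(feature_preserving_loss(features, features))        # 0.0
print(feature_preserving_loss(features, 3.0 * features))  # positive
```

The loss vanishes when the generated image yields the same pooled features as the input, so minimizing it discourages the generator from altering the patterns the classifier relies on.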

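Consequently, the overall loss function is

$$\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \alpha \, \mathcal{L}_{\mathrm{GAN}} + \beta \, \mathcal{L}_{\mathrm{FP}}$$

where $\alpha$, $\beta$ are the weights which are used to balance the updates between the different loss functions.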
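As a small illustration of how the two weights balance the three terms, the combination can be sketched as below. Which term carries which weight is an assumption here (unit weight on the reconstruction term), and the function name `overall_loss` and the default values are hypothetical.

```python
def overall_loss(rec, gan, fp, alpha=1.0, beta=1.0):
    """Weighted sum of the three generator losses.

    Assumption: the reconstruction term has unit weight, `alpha` scales
    the GAN term and `beta` scales the feature-preserving term.
    """
    if alpha < 0 or beta < 0:
        raise ValueError("loss weights must be non-negative")
    return rec + alpha * gan + beta * fp

# Setting beta to zero removes the feature-preserving term entirely.
print(overall_loss(1.0, 2.0, 3.0))                        # 6.0
print(overall_loss(1.0, 2.0, 3.0, alpha=0.5, beta=0.0))   # 2.0
```

Raising `beta` relative to `alpha` shifts the update toward preserving classifier features rather than fooling the discriminator.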

3 Experiment

We perform a quantitative experiment on tumor classification to evaluate the SST network. To show the general performance of our method, we compare it with the vanilla setting (original images without normalization) as well as with conventional methods. We have four baseline methods: Reinhard et al. (2001), Macenko et al. (2009), histogram specification (HS) Annadurai (2007), and WSI color standardization (WSICS) Bejnordi et al. (2016).

3.1 Dataset

The CAMELYON16 dataset is composed of 400 slides from two different institutes, Radboud and Utrecht. We use 180,000 patches for training and 20,000 for validation from Radboud, and 140,000 patches for testing from Utrecht. The numbers of tumor and normal patches are the same. Assuming that the training and validation datasets belong to one institute and the test set comes from another, we can merge every stain-style into the same space by applying the gray normalization. Since both the training and validation datasets are labeled, supervised learning can be applied to train the mapping from a gray image to the colored one. We used the gray normalization of the Pillow package for Python, which uses the formula $L = R \cdot 299/1000 + G \cdot 587/1000 + B \cdot 114/1000$.
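The gray normalization can be sketched in NumPy as follows. The weights are the ITU-R 601-2 luma coefficients documented for Pillow's `Image.convert("L")`; the function name `gray_normalize` is hypothetical, and since Pillow itself uses integer arithmetic, individual pixels may differ from its output by one gray level.

```python
import numpy as np

def gray_normalize(rgb):
    """Map an (H, W, 3) uint8 RGB tile to an (H, W) uint8 gray tile.

    Implements L = R * 299/1000 + G * 587/1000 + B * 114/1000 with
    rounding to the nearest integer.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    weights = np.array([299.0, 587.0, 114.0]) / 1000.0
    gray = rgb @ weights  # weighted sum over the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# A hypothetical 2 x 2 tile: white, black, pure red, pure blue.
tile = np.array(
    [[[255, 255, 255], [0, 0, 0]],
     [[255, 0, 0], [0, 0, 255]]],
    dtype=np.uint8,
)
print(gray_normalize(tile))
```

Because the output discards hue, tiles from different institutes land in a common gray space before the generator recolors them.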

3.2 Network Architecture

In this part, we describe the structure of the classifier network and of the stain-style generator used in the SST network.

3.2.1 Classifier Network

The classifier network carries out two tasks in the experiment. First, it acts as a discriminator that evaluates the performance of the stain-style generator $G$. Second, as already explained in subsection 2.4.3, it works as the feature extractor used in the feature-preserving loss $\mathcal{L}_{\mathrm{FP}}$. We use ResNet-34 from the torchvision library in PyTorch.

3.2.2 Stain-Style Generator

The generator network is given an image as input instead of a noise vector. Therefore we can use FCN-type architectures, of which U-Net is one of the best known. However, because of its limited performance, we use FusionNet, which combines the advantages of U-Net and ResNet. The hyperparameters of the network are set as in Quan et al. (2016). We adapt our discriminator architecture from Radford et al. (2015), which is based on VGG-Net without pooling layers. The hyperparameters of the discriminator are the same as those in Radford et al. (2015).

3.3 Result

Figure 4: Comparison between SST and other stain normalization methods: (a) target image for transfer, (b) original input image to be transferred, (c) SST, (d) WSICS, (e) HS, (f) Macenko, (g) Reinhard.

Figure 4 illustrates the result of each stain normalization method on a sample image. The target image comes from Radboud, which is used for training the tumor classifier. The original image is sampled from Utrecht, which is used for testing the tumor classifier. Although the visual differences between the outputs of the methods are small, the classification performance on these color images varies significantly. Given the experimental results in Table 1, the SST network largely avoids the performance degradation. SST achieves the highest tumor classification performance among the methods applied to the original images, with an area under the curve (AUC) of 0.9185. This result shows that there is a difference between visual judgment and the result of the classifier. In the case of WSICS, whose output is visually the most similar to that of SST, the AUC (0.6408) is worse than that of SST by about 30% in relative terms. On the other hand, Macenko, which was visually the worst, performs better than the other conventional methods. Conventional methods consider only the physical features of input images and lose patterns which are key features for the classifier's decision making process. In contrast, SST maintains those key features, namely the input image's own patterns, and also considers the color distribution of the target images as well as the contextual information of the original images.

Metric       Target  Original  SST     WSICS   HS      Macenko  Reinhard
AUC          0.9760  0.8900    0.9185  0.6408  0.4245  0.7169   0.5611
Precision    0.9114  0.8098    0.8440  0.5989  0.4987  0.6983   0.6114
Recall       0.9126  0.8111    0.8460  0.5957  0.4986  0.6956   0.6119
Specificity  0.9583  0.8014    0.8371  0.6010  0.4162  0.6500   0.5471
Table 1: Performance of the tumor classifier network with different stain normalization methods. The SST network shows a clear improvement over direct application to the original (untransferred) images and outperforms the other methods.
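For reference, the metrics reported in Table 1 can be computed from binary labels and classifier scores as sketched below. This is a generic illustration with hypothetical function names and toy inputs, not our evaluation code; the AUC is computed as the probability that a random tumor sample scores higher than a random normal sample, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Precision, recall and specificity for binary labels (1 = tumor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with four tumor and four normal patches.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 1, 1]
print(confusion_metrics(labels, preds))

# Relative AUC gap between SST and WSICS, using the values in Table 1.
relative_gap = (0.9185 - 0.6408) / 0.9185
print(round(relative_gap, 3))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small check but would be replaced by a rank-based computation on the full test set.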

4 Conclusion

In this work, we have presented a stain-style transfer approach to stain normalization for histopathological images. To that end, we replace the stain normalization models with a generative model which learns a certain stain-style distribution of the training dataset. This stain-style transfer network is considerably simpler than contemporaneous work, and produces more realistic results without any additional labeling or annotation for training, and without prior knowledge. Further, unlike conventional stain normalization, which acts independently of the tumor classifier, the proposed feature-preserving loss steers our coloration in a direction that does not affect the tumor classifier. We demonstrate that our model is optimized for the performance of the tumor classifier and allows successful stain-style transfer.

The style of chemical cell staining is mainly affected by the structural information and morphology of cells rather than by factors such as cell brightness. Based on this observation, we converted the test image into a gray image and then performed the stain-style transfer process. While this method has the advantage of making the process simpler, it also loses some information. To resolve this limitation, further investigation will assess a direct stain-style transfer approach from color image to color image. In addition, we hope to examine the parameters of our deep learning approach more closely. Further, we will perform more rounds of hard negative mining and consider the reliability and reproducibility of the deep CNN models.


  • cam (2016) http://camelyon16.grand-challenge.org/, 2016.
  • cam (2017) http://camelyon17.grand-challenge.org/, 2017.
  • Annadurai (2007) S Annadurai. Fundamentals of digital image processing. Pearson Education India, 2007.
  • Bejnordi et al. (2016) Babak Ehteshami Bejnordi, Geert Litjens, Nadya Timofeeva, Irene Otte-Höller, André Homeyer, Nico Karssemeijer, and Jeroen AWM van der Laak. Stain specific standardization of whole-slide histopathological images. IEEE transactions on medical imaging, 35(2):404–415, 2016.
  • Ciompi et al. (2017) Francesco Ciompi, Oscar Geessink, Babak Ehteshami Bejnordi, Gabriel Silva de Souza, Alexi Baidoshvili, Geert Litjens, Bram van Ginneken, Iris Nagtegaal, and Jeroen van der Laak. The importance of stain normalization in colorectal tissue classification with convolutional networks. arXiv preprint arXiv:1702.05931, 2017.
  • Durrett (2010) Rick Durrett. Probability: theory and examples. Cambridge university press, 2010.
  • Goodfellow et al. (2014) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • Khan et al. (2014) Adnan Mujahid Khan, Nasir Rajpoot, Darren Treanor, and Derek Magee. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Transactions on Biomedical Engineering, 61(6):1729–1738, 2014.
  • LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  • Li and Plataniotis (2015) Xingyu Li and Konstantinos N Plataniotis. A complete color normalization approach to histopathology images using color cues computed from saturation-weighted statistics. IEEE Transactions on Biomedical Engineering, 62(7):1862–1873, 2015.
  • Long et al. (2015) Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
  • Macenko et al. (2009) Marc Macenko, Marc Niethammer, JS Marron, David Borland, John T Woosley, Xiaojun Guan, Charles Schmitt, and Nancy E Thomas. A method for normalizing histology slides for quantitative analysis. In Biomedical Imaging: From Nano to Macro, 2009. ISBI’09. IEEE International Symposium on, pages 1107–1110. IEEE, 2009.
  • Magee et al. (2009) Derek Magee, Darren Treanor, Doreen Crellin, Mike Shires, Katherine Smith, Kevin Mohee, and Philip Quirke. Colour normalisation in digital histopathology images. In Proc Optical Tissue Image analysis in Microscopy, Histopathology and Endoscopy (MICCAI Workshop), volume 100. Citeseer, 2009.
  • Mirza and Osindero (2014) Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  • Pathak et al. (2016) Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2536–2544, 2016.
  • Quan et al. (2016) Tran Minh Quan, David GC Hilderbrand, and Won-Ki Jeong. Fusionnet: A deep fully residual convolutional neural network for image segmentation in connectomics. arXiv preprint arXiv:1612.05360, 2016.
  • Radford et al. (2015) Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • Reinhard et al. (2001) Erik Reinhard, Michael Adhikhmin, Bruce Gooch, and Peter Shirley. Color transfer between images. IEEE Computer graphics and applications, 21(5):34–41, 2001.
  • Ruifrok et al. (2001) Arnout C Ruifrok, Dennis A Johnston, et al. Quantification of histochemical staining by color deconvolution. Analytical and quantitative cytology and histology, 23(4):291–299, 2001.
  • Ruifrok et al. (2003) Arnout C Ruifrok, Ruth L Katz, and Dennis A Johnston. Comparison of quantification of histochemical staining by hue-saturation-intensity (hsi) transformation and color-deconvolution. Applied Immunohistochemistry & Molecular Morphology, 11(1):85–91, 2003.
  • Wang et al. (2016) Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, and Andrew H Beck. Deep learning for identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718, 2016.
