Label Super Resolution with Inter-Instance Loss
For the task of semantic segmentation, high-resolution (pixel-level) ground truth is very expensive to collect, especially for high resolution images such as gigapixel pathology images. On the other hand, collecting low resolution labels (labels for a block of pixels) for these high resolution images is much more cost efficient. Conventional methods trained on these low-resolution labels are only capable of giving low-resolution predictions. The existing state-of-the-art label super resolution (LSR) method is capable of predicting high resolution labels, using only low-resolution supervision, given the joint distribution between low resolution and high resolution labels. However, it does not consider the inter-instance variance which is crucial in the ideal mathematical formulation. In this work, we propose a novel loss function modeling the inter-instance variance. We test our method on two real world applications: cell detection in multiplex immunohistochemistry (IHC) images, and infiltrating breast cancer region segmentation in histopathology slides. Experimental results show the effectiveness of our method.
Conventional high-resolution semantic segmentation models require large amounts of high-resolution ground truth data (pixel-level labels) [1, 19, 8]. It is very labor intensive to collect these large scale datasets, especially for datasets of gigapixel images such as pathology images [13, 17]. Weakly supervised semantic segmentation approaches [2, 21, 22, 31, 23] learn to produce pixel-level segmentation results given sparse (e.g., image-level) labels. These approaches require that the set of image-level classes be the same as the set of pixel-level classes. For example, given that an image contains a cat, the network learns to segment the cat. In many applications, however, low-resolution (e.g., block-level) information may correlate with pixel-level labels in a more complex way. For example, a patch in a tissue image may be assigned a probability of containing cancer tissue and may contain high/low amounts of different types of cells [16, 27].
The Label Super Resolution (LSR) method [16] models this problem by utilizing the joint distribution between low-resolution and high-resolution labels, as shown in Fig. 1. The LSR model is trained with a low resolution label $z$ assigned to each group of pixels (i.e., an image block). Let $n_c$ be the number of pixels with high-resolution class label $c$ in an image block; LSR tries to match the actual count of class $c$ in the prediction with the count distribution indicated by $z$.
For each fixed image block, the LSR loss matches the distribution of the predicted count $n_c$ given by the network with the distribution of $n_c$ designated by the low resolution label $z$: $p(n_c \mid z)$. Note that the ground truth $p(n_c \mid z)$ is computed across multiple image blocks with the same low resolution label $z$. On the other hand, the distribution of the predicted $n_c$ is computed on each fixed image block. In other words, the existing LSR loss does not consider variance across image blocks with the same $z$.
To address this problem, we propose new loss functions. The proposed loss functions match the distribution of $n_c$ across a set of image blocks with the same label $z$ to the distribution suggested by the low resolution label $z$. Mathematically, this models the true variance of class/label counts across image blocks, not just within an image block.
We evaluate the proposed loss functions on two image analysis tasks: semantic segmentation to identify different cell types in multiplex immunohistochemistry (IHC) images, and infiltrating breast cancer region segmentation in Hematoxylin and Eosin stained pathology images. The experimental results show that both of the proposed loss functions significantly outperform the LSR loss function. To summarize, our contributions are as follows:
Novel loss functions for label super resolution that take into account the variance across image blocks with the same low-resolution label.
A multi-class cell detection model with low resolution pathologist annotations. The model significantly outperforms color-based baselines and the existing LSR method.
A breast cancer region segmentation model. The model can produce accurate high-resolution cancer segmentation boundary with only low resolution supervision in the training phase.
2 Label Super Resolution
The existing Label Super Resolution (LSR) approach [16] proposed an intra-instance loss function with which it learns to super-resolve low resolution labels. The key source of information it utilizes is the conditional distribution $p(n_c \mid z)$: the probability distribution of the count $n_c$ within an image block with low resolution label $z$. As an example, Table 1 shows this distribution for each high-resolution label $c$ and low-resolution label $z$ for the cancer segmentation task. In this example, $z$ is a binary label indicating whether an image block is a cancer block; $c$ is a binary label indicating whether a pixel is a cancer pixel (i.e., whether the pixel is in a cancer cell). The cancer probability of an image block is provided by a patch-level cancer classifier [17, 13]. The values in Table 1 were computed through manual annotation. For each label $z$, a domain expert examined 10 to 12 image blocks with label $z$ and visually approximated $n_c$ for each image block. The visual approximation process, and the effect of using visually approximated counts rather than exact counts from ground truth masks, are elaborated in Sec. B of the appendices. In total, the domain expert examined 100 to 120 image blocks, instead of painstakingly delineating the precise boundaries of small and large cancer and non-cancer regions in whole slide tissue images. The cost of annotation in LSR is very low compared with conventional per-pixel labeling.
| Image block with low resolution class $z$ (probability% as cancer block) | Count% of high resolution class $c$: cancer | Count% of high resolution class $c$: non-cancer |
All super resolution methods in this paper use the conditional count distribution $p(n_c \mid z)$. We first describe the baseline method [16] as an intra-instance loss. We then formulate two new loss functions. An overview of these three loss functions is shown in Fig. 2.
2.1 Baseline: intra-instance loss
We introduce the intra-instance loss [16] starting with label counting. The classification/segmentation network produces, for each pixel in the image, a probability that the pixel is in class $c$. This is expressed as $p(c \mid x^z_k(i,j))$, where $x^z_k$ is the $k$-th input image block with low resolution label $z$, and $(i,j)$ are the coordinates of a pixel. The LSR approach models the network's output at each pixel as a Bernoulli distribution. If we sampled the model's prediction $\hat{y}(i,j)$ at each pixel $(i,j)$, the value of the count $n_c$ would be

$$ n_c = \sum_{i,j} \mathbb{1}\big[\hat{y}(i,j) = c\big], $$
where $\mathbb{1}[\cdot]$ is the indicator function. Given the set of pixels in $x^z_k$, the count $n_c$ of pixels whose class label is $c$ is approximated by a Gaussian distribution:

$$ n_c \sim \mathcal{N}(\mu_k, \sigma_k^2), \qquad \mu_k = \sum_{i,j} p\big(c \mid x^z_k(i,j)\big), \qquad \sigma_k^2 = \sum_{i,j} p\big(c \mid x^z_k(i,j)\big)\Big(1 - p\big(c \mid x^z_k(i,j)\big)\Big). $$
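This Gaussian approximation of the class count can be checked numerically. A minimal sketch (the per-pixel probabilities here are random placeholders, not network outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(size=64 * 64)  # placeholder per-pixel P(pixel is class c) for one block

# Gaussian approximation of the class-c pixel count (a sum of Bernoullis)
mu = p.sum()
var = (p * (1.0 - p)).sum()

# Monte-Carlo check: sample hard per-pixel labels and count them
samples = (rng.uniform(size=(2000, p.size)) < p).sum(axis=1)
```

The sampled counts concentrate around `mu` with spread close to `var`, as the central limit theorem suggests for a sum of independent Bernoulli variables.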
As shown in Table 1, the ground truth count $n_c$ is also modeled as a Gaussian distribution, depending only on the low resolution class $z$:

$$ n_c \sim \mathcal{N}(\mu_{c,z}, \sigma_{c,z}^2). $$
The LSR method minimizes the distance between $\mathcal{N}(\mu_k, \sigma_k^2)$ and $\mathcal{N}(\mu_{c,z}, \sigma_{c,z}^2)$ for each input $x^z_k$ with label $z$. The distance between the two Gaussian distributions is formulated as their KL divergence:

$$ D\big(\mathcal{N}(\mu_k, \sigma_k^2) \,\big\|\, \mathcal{N}(\mu_{c,z}, \sigma_{c,z}^2)\big) = \log\frac{\sigma_{c,z}}{\sigma_k} + \frac{\sigma_k^2 + (\mu_k - \mu_{c,z})^2}{2\sigma_{c,z}^2} - \frac{1}{2}. $$
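As a concrete instance, the KL divergence between two one-dimensional Gaussians has a closed form; a sketch (the exact distance used in [16] may be parameterized differently):

```python
import math

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for one-dimensional Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
```

The divergence is zero exactly when the two Gaussians coincide, which is the condition the statistics matching loss drives the network toward.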
Drawback of Intra-instance Loss:
Given an instance (an image block) $x^z_k$ with a low-resolution label $z$, the distribution of predicted class counts is computed with the input instance fixed. In other words, $p(n_c \mid x^z_k)$ is computed instead of $p(n_c \mid z)$. By minimizing the dissimilarity between $p(n_c \mid x^z_k)$ and $p(n_c \mid z)$, a classification/segmentation network trained to the absolute optimum of the training error produces the same distribution of class counts for every block, even though the true count varies from block to block.
2.2 Inter-instance loss
Because the distribution of real class counts $p(n_c \mid z)$ is computed across different instances (image blocks) with the same label $z$, we argue that one should also model the distribution of predicted class counts across instances.
We formulate our proposed inter-instance loss as follows. First, we develop a new intra-instance formulation. For each input instance $x^z_k$ with low resolution label $z$, the predicted value of $n_c$ is defined as the expected count under the per-pixel predicted probabilities for high resolution class $c$. In this case, the distribution of the per-block prediction is a point mass:

$$ \hat{n}_c(x^z_k) = \sum_{i,j} p\big(c \mid x^z_k(i,j)\big). $$
In other words, we model the predicted count of each block as a constant, $\hat{n}_c(x^z_k)$, rather than a random variable.
Using this simplified formulation, we model the predicted count across $K$ different instances with the same label $z$ as an approximate Gaussian distribution:

$$ \hat{n}_c \sim \mathcal{N}(\hat{\mu}_{c,z}, \hat{\sigma}_{c,z}^2), $$
where $\hat{\mu}_{c,z}$ and $\hat{\sigma}_{c,z}$ are computed empirically:

$$ \hat{\mu}_{c,z} = \frac{1}{K}\sum_{k=1}^{K} \hat{n}_c(x^z_k), \qquad \hat{\sigma}_{c,z}^2 = \frac{1}{K}\sum_{k=1}^{K} \big(\hat{n}_c(x^z_k) - \hat{\mu}_{c,z}\big)^2. $$
In practice, it may not be possible to compute the exact $\hat{\mu}_{c,z}$ and $\hat{\sigma}_{c,z}$ when the number of image blocks is large and computational resources are limited. We address this problem by estimating $\hat{\mu}_{c,z}$ and $\hat{\sigma}_{c,z}$ on a batch of sampled instances. This strategy is well in line with stochastic (mini-batch) neural network training.
The inter-instance loss is computed as the same Gaussian distance, applied across blocks:

$$ L_{\mathrm{inter}} = D\big(\mathcal{N}(\hat{\mu}_{c,z}, \hat{\sigma}_{c,z}^2) \,\big\|\, \mathcal{N}(\mu_{c,z}, \sigma_{c,z}^2)\big). $$
Our method matches $\mathcal{N}(\hat{\mu}_{c,z}, \hat{\sigma}_{c,z}^2)$ to $\mathcal{N}(\mu_{c,z}, \sigma_{c,z}^2)$ by assuming that the predicted value of $n_c$ is a constant given an input block $x^z_k$.
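Putting the pieces together, a batch-level sketch of the inter-instance loss (KL divergence stands in for the Gaussian distance; `probs` is a hypothetical batch of per-pixel probability maps sharing one low resolution label):

```python
import numpy as np

def inter_instance_loss(probs, mu_z, var_z, eps=1e-6):
    """probs: (K, H, W) predicted P(pixel is class c) for K blocks that share
    low resolution label z. Each block's predicted count is treated as a
    constant (its summed probability); the across-block Gaussian is then
    matched to the target N(mu_z, var_z) via KL divergence."""
    counts = probs.reshape(probs.shape[0], -1).sum(axis=1)  # one count per block
    mu_hat = counts.mean()
    var_hat = counts.var() + eps  # eps keeps the KL finite for identical blocks
    return 0.5 * (np.log(var_z / var_hat)
                  + (var_hat + (mu_hat - mu_z) ** 2) / var_z - 1.0)
```

The loss is near zero when the batch statistics already match the target, and grows as the across-block mean drifts away from $\mu_{c,z}$.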
Drawback of Inter-instance Loss:
The inter-instance loss does not consider intra-image variation, i.e., the confidence of the model's prediction: less confident per-pixel predictions should yield a larger intra-image count variance, which the constant-count assumption ignores.
2.3 Intra + inter-instance loss
Following the intra-instance loss formulation in Sec. 2.1, the predicted label counts vary when the prediction at each pixel is viewed as a Bernoulli random variable.
Our intra+inter-instance loss is based on label count sampling, using the following strategy. Given a low resolution label $z$, we first sample an image block $x^z_k$. We then use the segmentation network to compute $p(c \mid x^z_k(i,j))$ for every pixel. Finally, we sample a class count $\tilde{n}_c(x^z_k)$ according to the intra-block Gaussian $\mathcal{N}(\mu_k, \sigma_k^2)$ of Sec. 2.1, for all sampled blocks. This across-block label count is approximated by the following Gaussian distribution:

$$ \tilde{n}_c \sim \mathcal{N}(\tilde{\mu}_{c,z}, \tilde{\sigma}_{c,z}^2). $$
Here, $\tilde{n}_c(x^z_k)$ is the label count sampled for block $x^z_k$ with low resolution label $z$. We compute $\tilde{\mu}_{c,z}$ and $\tilde{\sigma}_{c,z}$ empirically:

$$ \tilde{\mu}_{c,z} = \frac{1}{K}\sum_{k=1}^{K} \tilde{n}_c(x^z_k), \qquad \tilde{\sigma}_{c,z}^2 = \frac{1}{K}\sum_{k=1}^{K} \big(\tilde{n}_c(x^z_k) - \tilde{\mu}_{c,z}\big)^2. $$
We approximate $\tilde{\mu}_{c,z}$ and $\tilde{\sigma}_{c,z}$ using a batch of image blocks, and use Eq. 9 as the statistics matching loss.
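The sampling strategy above can be sketched as follows (toy probability maps stand in for network outputs):

```python
import numpy as np

def sample_block_counts(prob_blocks, rng):
    """For each block, sample a class count from its intra-block Gaussian
    N(sum(p), sum(p * (1 - p))); return the across-block mean and variance."""
    counts = []
    for p in prob_blocks:
        mu_k = p.sum()
        sigma_k = np.sqrt((p * (1.0 - p)).sum())
        counts.append(rng.normal(mu_k, sigma_k))
    counts = np.asarray(counts)
    return counts.mean(), counts.var()
```

Unlike the inter-instance formulation, the sampled counts carry both the across-block spread and each block's own prediction uncertainty.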
3 Experiments
We evaluated the proposed loss functions on two image analysis tasks in digital pathology: cell detection in multiplex immunohistochemistry (IHC) images, and cancer segmentation in Hematoxylin and Eosin (H&E) stained images.
3.1 Cell detection in multiplex images
Analysis of human patient tissue stained by immunohistochemistry (IHC) provides information on protein expression and distribution [24, 4]. These proteins function as biomarkers that can be used to classify cells that might otherwise be indistinguishable. Furthermore, these assays can give physicians and researchers information that could be used to deliver a more accurate prognosis or treatment to patients, or to identify novel therapeutic targets to investigate.
Traditional IHC techniques are only able to stain one type of protein per slide; if assessment of more than one protein is necessary, multiple individual IHC staining operations must be performed on consecutive slides from the same tissue specimen. Each of these operations can introduce errors, such as inaccurate image registration, into the final IHC results.
Multiplex IHC [30, 34] is a newer technique that allows multiple markers to be stained on the same slide. In our case, the 5 protein markers used were each indicative of a specific immune cell type. Thus, analysis of multiplex IHC digital pathology images gives information on the immune response to cancer. This is an active area of cancer research, as immune response is known to impact patient outcome and response to treatment.
3.1.1 Training data
The training data was extracted from 16 whole slide multiplex IHC images. Each tissue sample was stained with 6 stains, binding to different types of cells, and digitized at high resolution using a digital microscopy scanner. Because the staining and image scanning are done at once in RGB space without any specialized filters, stains may bleed into neighboring cells, and stain colors may overlap and mix in the resulting images. This makes it difficult for methods that depend on explicit color and intensity manipulations to accurately detect different types of cells. Table 2 shows the associations between the 5 types of cells considered in this work and the corresponding stains.
| # of patches with "high" cell count | 422 | 170 | 773 | 229 | 461 |
| # of patches with "low" cell count | 2552 | 2804 | 2201 | 2745 | 2513 |
| # of cells dotted in validation set | 200 | 83 | 169 | 77 | 265 |
| # of cells dotted in testing set | 326 | 183 | 343 | 111 | 478 |
A total of 2974 patches were randomly extracted from the 11 multiplex IHC images at 10X magnification (i.e., a pixel spans 1 micron in each image dimension). Under the supervision of a pathologist, a medical student assigned a high cell count or low cell count label to each patch for each cell type. The definition of high or low cell count was given by a pathologist beforehand. These patch-level labels were used as low resolution training labels.
The unknown high resolution label is the pixel-level classification of each cell type. The LSR model needs the joint distribution between the low resolution labels and the high resolution labels. We assume that, in each patch (image block), the count of a cell type is independent of the counts of the other cell types. To create the joint distribution table for our methods, a graduate student selected 12 patches for each low resolution label and visually approximated the count of each cell type in terms of the number of pixels. The mean and standard deviation of the pixel counts among the 12 patches were used as the ground truth values of $\mu_{c,z}$ and $\sigma_{c,z}$.
3.1.2 Testing and validation data
Ideally, semantic segmentation results are evaluated by the Intersection over Union (IoU) score computed from predicted and ground truth segmentation masks. As mentioned before, pixel-level ground truth masks are very expensive to produce. We therefore use cell locations as weak labels: each cell's center pixel is given a cell type label. In our experiments, 2 undergraduate students labeled 68 patches under the supervision of a pathologist. Of these 68 patches, 12 were labeled by both students to evaluate inter-rater agreement. We used 24 labeled patches as validation data and the remaining 44 patches as testing data.
3.1.3 Evaluation method
We use the F1-score to evaluate the cell segmentation results. The segmentation network outputs a class (cell type) prediction at each pixel in the image. We view each isolated segmented region of class $c$ as a detection result for a cell or a group of cells of class $c$. The recall and precision for each cell type are defined on these region-level detections.
Since the cell types are mutually exclusive, no trivial method (such as predicting every pixel as one cell type) would achieve the best overall precision and recall.
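One plausible instantiation of region-level precision and recall, assuming a predicted region counts as a true positive if it contains at least one same-class dot annotation (a hypothetical matching rule; the paper's exact definition may differ):

```python
def region_f1(regions, dots):
    """regions: list of sets of (row, col) pixels, one set per predicted region
    of a given class; dots: list of (row, col) cell-center annotations.
    A region is a true positive if it contains at least one dot; a dot is
    recalled if it falls inside any region. (Hypothetical matching rule.)"""
    dot_set = set(dots)
    tp_regions = sum(1 for r in regions if r & dot_set)
    recalled = sum(1 for d in dots if any(d in r for r in regions))
    precision = tp_regions / max(len(regions), 1)
    recall = recalled / max(len(dots), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)
```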
3.1.4 Baseline and proposed loss methods
Since the cell types are differentiated by the color of the corresponding stain(s), a straightforward way to detect different cells is color decomposition. Color based methods, however, do not generally work well; they are limited by chromatic overlapping, the bleeding and mixing of stains, and the lack of other important information such as the shape of cells. We compared our approach to two color-based methods.
Color deconvolution:
Color deconvolution [26, 15, 20] is widely used for color decomposition of IHC images with no more than 3 stains. It is limited by the matrix inversion step, which requires that the number of input color channels equal the number of decomposed channels. Multiplex IHC images may contain more than 3 stains encoded in 3-channel (RGB) images; in those cases, color deconvolution is applied to decompose 3 stains at a time.
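A sketch of Ruifrok-style color deconvolution; the stain matrix below uses the commonly cited H&E+DAB optical-density values, which are illustrative rather than the stains used in this work:

```python
import numpy as np

# Illustrative stain optical-density (OD) matrix: rows are unit OD vectors for
# hematoxylin, eosin, and DAB (standard published values, not this paper's stains).
M = np.array([[0.65, 0.70, 0.29],
              [0.07, 0.99, 0.11],
              [0.27, 0.57, 0.78]])
M = M / np.linalg.norm(M, axis=1, keepdims=True)

def color_deconvolve(rgb, stain_matrix):
    """Ruifrok-style deconvolution: map RGB to optical density, then invert
    the 3x3 stain matrix to recover per-stain concentrations."""
    od = -np.log10((rgb.astype(np.float64) + 1.0) / 256.0)  # (H, W, 3)
    return od @ np.linalg.inv(stain_matrix)                 # (H, W, 3) concentrations
```

Because the method hinges on inverting a square matrix, only 3 stains can be separated per pass, which is the limitation discussed above.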
Color separation via L2 distance:
This method decomposes each stain by directly computing the L2 distance between pixels’ RGB values against the standard RGB values of a stain. This color decomposition method does not limit the number of output decomposed stains.
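A minimal sketch of this L2-based separation, assuming reference RGB values are known for each stain:

```python
import numpy as np

def l2_stain_maps(image_rgb, stain_rgbs):
    """image_rgb: (H, W, 3) float image; stain_rgbs: (S, 3) reference colors.
    Returns (H, W, S) maps of negative L2 distance (higher = closer to stain)."""
    diff = image_rgb[:, :, None, :] - np.asarray(stain_rgbs)[None, None, :, :]
    return -np.linalg.norm(diff, axis=-1)
```

Taking an argmax over the stain axis assigns each pixel to its nearest reference color, with no cap on the number of stains.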
Segmentation with limited high res labels:
We use the "color separation via L2 distance" method to derive a small set of high resolution labels for training a semantic segmentation network. In particular, limited, high confidence, high resolution predictions are obtained based on the low resolution ground truth labels and the color decomposition results. These sparse high resolution predictions are used as pseudo labels for training a fully supervised semantic segmentation network. We refer to this method as high res. If a training image patch has label "high" for a stain (type of cell), pixels in its L2 color decomposition with the top 0.25% to 0.5% confidence are selected as high resolution labels for that stain. This is a time-efficient way to generate high resolution ground truth labels for training. We use a U-net-like semantic segmentation network [25]. The output has 6 classes: 5 cell types and 1 background class. We assume that the different types of cells do not overlap spatially; thus, the last layer of the network is a 6-way softmax layer.
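The "top 0.25% to 0.5% confidence" selection can be read as keeping the band between the 99.5th and 99.75th percentiles of the confidence map; a sketch under that reading:

```python
import numpy as np

def select_pseudo_labels(confidence, lo_pct=99.5, hi_pct=99.75):
    """Keep pixels whose confidence lies between the 99.5th and 99.75th
    percentiles (one reading of 'top 0.25% to 0.5%'); the rest stay unlabeled."""
    lo, hi = np.percentile(confidence, [lo_pct, hi_pct])
    return (confidence >= lo) & (confidence <= hi)
```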
Label super resolution:
We build Label Super Resolution (LSR) models that segment multiple types of cells at the pixel level, trained with only low resolution labels. We test the three LSR loss functions described in the previous sections: intra-instance (Sec. 2.1), inter-instance (Sec. 2.2), and intra+inter-instance (Sec. 2.3). The only difference between these three methods is the way the predicted label count distribution is modeled, as illustrated in Fig. 2.
Label super resolution adding limited high res labels:
We trained the same semantic segmentation network with both the limited high resolution labels and the label super resolution loss. The high resolution loss and super resolution loss terms are weighted and averaged to form the final loss. The weights are selected based on visual and quantitative evaluation on the validation set. This yields three methods: Intra-instance & high res, Inter-instance & high res, and Intra+inter-instance & high res.
3.1.5 Training details
We use the RMSprop optimizer [11] with a learning rate of 0.5 (the loss applied at each output pixel is averaged instead of summed) to train all of the networks. For the high resolution only setting (the "high res" method), the batch size is 10. For the networks with the intra-instance label super resolution loss, we also used a batch size of 10. In the inter-instance settings, the loss is first computed within a group of 10 images (recall that the loss requires inter-instance statistics), and two groups are used in a batch. The same holds for the intra+inter-instance settings: the group size is 10, and there are 2 groups in each batch.
3.1.6 Experimental results
The F1-scores on the testing set are shown in Table 3. Color decomposition based methods perform poorly, as they do not consider critical information such as the shapes of different cell types. Using the intra-instance loss together with limited high resolution supervision outperforms the network that uses only limited high resolution supervision. More importantly, the intra+inter-instance loss significantly outperforms the intra-instance loss.
3.1.7 Inter-rater agreement
The inter-rater agreement is the average F1-score across each pair of human raters: if two dots given by two separate raters are within $r$ pixels of each other, we consider them a match; otherwise, we do not. One dot given by a rater can match at most one dot (the closest dot) given by another rater. The F1-score is then computed using one rater's dot annotations as if they were ground truth and the other rater's dot annotations as if they were detection results. The resulting F1-score is shown in Table 3; the matching radius $r$ is roughly the average radius of a cell. We note that the inter-rater F1-scores and the algorithm's F1-scores are not directly comparable due to different evaluation protocols. The purpose of showing inter-rater F1-scores is to stress that cell detection is very hard, even in multiplex images.
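A sketch of the dot-matching F1 computation, assuming greedy nearest-first one-to-one matching within the radius $r$:

```python
import math

def match_f1(dots_a, dots_b, radius):
    """Greedy nearest-first one-to-one matching of two raters' dots:
    a pair matches if within `radius`; each dot matches at most once."""
    pairs = sorted((math.dist(a, b), i, j)
                   for i, a in enumerate(dots_a)
                   for j, b in enumerate(dots_b))
    used_a, used_b, matches = set(), set(), 0
    for d, i, j in pairs:
        if d <= radius and i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches += 1
    precision = matches / max(len(dots_a), 1)
    recall = matches / max(len(dots_b), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)
```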
| Color separation via L2 | 0.2655 | 0.2306 | 0.1911 | 0.2623 | 0.2913 | 0.3521 |
| Intra-instance & high res | 0.5241 | 0.4031 | 0.4339 | 0.4801 | 0.6137 | 0.6894 |
| Inter-instance & high res | 0.5355 | 0.5082 | 0.3825 | 0.5541 | 0.5755 | 0.6574 |
| Intra+inter-instance & high res | 0.5507 | 0.4934 | 0.4394 | 0.5248 | 0.6190 | 0.6772 |
3.2 Breast cancer region segmentation
Automatic cancer segmentation in pathology images has significant applications, such as computer aided diagnosis and scientific studies. Manually annotating pixel-accurate cancer regions is time consuming, costly, and ambiguous. On the other hand, low resolution labels are relatively easy to collect and publicly available. Existing methods utilize low resolution labels to automatically produce low resolution segmentation results. However, high resolution segmentation results have unique advantages, such as showing accurate cancer boundaries, which are important for the analysis of invasive carcinoma and infiltrating patterns of cancer [14, 32]. Our proposed method is able to produce high resolution segmentation results using only low-resolution annotations.
3.2.1 Training data
We applied the proposed method to the task of cancer segmentation in breast carcinoma. Our low resolution labels are automatically generated by a cancer/non-cancer region classifier. The classifier labels one image patch at a time, giving it a probability of being cancer. The probability value is then quantized into 10 bins, which serve as 10 low resolution classes. Using this classifier, we labeled 1,092 breast carcinoma (BRCA) slides in The Cancer Genome Atlas (TCGA) repository [29], patch by patch. From 1,000 slides, we randomly extracted 26,767 patches with their low-resolution labels as training data. The patches from the remaining 92 slides were reserved for validation and testing. For training, the patches were downsampled to 2.4X magnification (4.2 microns per pixel). The classifier has a DICE score of 0.726 on the HASHI cancer segmentation dataset [3], which contains 196 TCGA slides. The details of the classifier are described in Sec. C of the appendices.
3.2.2 Evaluation method
To evaluate our high resolution cancer segmentation results, we collected 49 patches at 2.5X magnification and carefully annotated cancer regions in detail. 42 of them are used as the test set and 7 as the validation set. We use the Intersection over Union (IoU) and DICE coefficient scores as the evaluation metrics.
Since the only difference between low and high resolution cancer maps appears near cancer/non-cancer boundaries, we compute IoU and DICE scores only in areas within a distance of 240 pixels (1000 microns, the width of an input patch) from the ground truth cancer/non-cancer boundaries. We call these metrics masked IoU and masked DICE. These two scores show performance differences only in the regions that matter.
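A sketch of the masked metrics, using a brute-force distance to the ground-truth boundary (adequate for small masks; a distance transform would be used at scale):

```python
import numpy as np

def masked_iou_dice(pred, gt, max_dist):
    """IoU and DICE restricted to pixels within `max_dist` of the ground-truth
    cancer/non-cancer boundary (assumes the boundary is non-empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    # Boundary: pixels whose 4-neighborhood contains the other class.
    b = np.zeros_like(gt)
    b[:-1] |= gt[:-1] != gt[1:]
    b[1:] |= gt[1:] != gt[:-1]
    b[:, :-1] |= gt[:, :-1] != gt[:, 1:]
    b[:, 1:] |= gt[:, 1:] != gt[:, :-1]
    ys, xs = np.nonzero(b)
    rows, cols = np.indices(gt.shape)
    d2 = ((rows[..., None] - ys) ** 2 + (cols[..., None] - xs) ** 2).min(axis=-1)
    mask = d2 <= max_dist ** 2
    inter = (pred & gt & mask).sum()
    union = ((pred | gt) & mask).sum()
    iou = inter / max(union, 1)
    dice = 2 * inter / max((pred & mask).sum() + (gt & mask).sum(), 1)
    return iou, dice
```

Restricting the metric to a band around the boundary keeps large, trivially-correct interior regions from inflating the scores.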
3.2.3 Implementation details
Similar to the multiplex application, we use a U-net-like architecture [25] with the label super resolution losses. We do not use any high resolution data during training: only label super resolution methods are used. We use the RMSprop optimizer [11] to train all networks. In the intra-instance setting, we use a batch size of 30 and a learning rate of 0.00001. For the intra+inter-instance loss, the loss is computed using a group of 15 instances and each batch has 2 groups; the learning rate is 0.001.
3.2.4 Experimental results
We compare our methods to the original low resolution results given by the cancer/non-cancer region classification method. We call this the low resolution model. The quantitative results are shown in Table 4. The proposed intra+inter-instance loss super-resolves the low resolution cancer region boundaries given by the low resolution model. This means that our method can generate finer cancer segmentation results with a very limited amount of additional annotation labor. More importantly, the network with the intra+inter-instance loss outperforms the network with the intra-instance loss.
| | Masked IoU | Masked DICE |
| Low resolution model | 0.5722 | 0.7278 |
4 Conclusions
The high cost of high resolution annotations for training pixel-level classification and segmentation models is a major roadblock to the effective application of deep learning in digital pathology and other domains that generate and analyze very high-resolution images. A label super resolution approach can address this problem by using low resolution annotations, but the current implementations do not take into account variations across image patches. The novel loss functions proposed in this work aim to alleviate this limitation. Our empirical results show that the inter-instance losses better capture and model the variance of high resolution labels across image blocks with the same low resolution label. As a result, they outperform the existing baselines significantly. In the future, we plan to generalize this approach to detection networks, in addition to segmentation.
This work was supported in part by grants 1U24CA180924-01A1, 3U24CA215109-02, and 1UG3CA225021-01 from the National Cancer Institute, R01LM009239 from the U.S. National Library of Medicine, and a gift from Adobe. This study received approval from the Stony Brook University (SBU) Institutional Review Board (IRB), number 94651-31.
-  V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12):2481–2495, 2017.
-  L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2018.
-  A. Cruz-Roa, H. Gilmore, A. Basavanhally, M. Feldman, S. Ganesan, N. Shih, J. Tomaszewski, A. Madabhushi, and F. González. High-throughput adaptive sampling for whole-slide histopathology image analysis (hashi) via convolutional neural networks: Application to invasive breast cancer detection. PloS one, 13(5):e0196828, 2018.
-  D. J. Dabbs. Diagnostic immunohistochemistry e-book. Elsevier Health Sciences, 2013.
-  B. Gecer, S. Aksoy, E. Mercan, L. G. Shapiro, D. L. Weaver, and J. G. Elmore. Detection and classification of cancer in whole slide breast histopathology images using deep convolutional networks. Pattern recognition, 84:345–356, 2018.
-  M. A. Gorris, A. Halilovic, K. Rabold, A. van Duffelen, I. N. Wickramasinghe, D. Verweij, I. M. Wortel, J. C. Textor, I. J. M. de Vries, and C. G. Figdor. Eight-color multiplex immunohistochemistry for simultaneous detection of multiple immune checkpoint molecules within the tumor microenvironment. The Journal of Immunology, 200(1):347–354, 2018.
-  C. P. Hans, D. D. Weisenburger, T. C. Greiner, R. D. Gascoyne, J. Delabie, G. Ott, H. K. Müller-Hermelink, E. Campo, R. M. Braziel, E. S. Jaffe, et al. Confirmation of the molecular classification of diffuse large b-cell lymphoma by immunohistochemistry using a tissue microarray. Blood, 103(1):275–282, 2004.
-  M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle. Brain tumor segmentation with deep neural networks. Medical image analysis, 35:18–31, 2017.
-  K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  G. Hinton, N. Srivastava, and K. Swersky. Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. Cited on, 14, 2012.
-  L. Hou, V. Nguyen, A. B. Kanevsky, D. Samaras, T. M. Kurc, T. Zhao, R. R. Gupta, Y. Gao, W. Chen, D. Foran, et al. Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images. Pattern recognition, 86:188–200, 2019.
-  L. Hou, D. Samaras, T. M. Kurc, Y. Gao, J. E. Davis, and J. H. Saltz. Patch-based convolutional neural network for whole slide tissue image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2424–2433, 2016.
-  J. Jass, Y. Ajioka, J. Allen, Y. Chan, R. Cohen, J. Nixon, M. Radojkovic, A. Restall, S. Stables, and L. Zwi. Assessment of invasive growth pattern and lymphocytic infiltration in colorectal cancer. Histopathology, 28(6):543–548, 1996.
-  A. M. Khan, N. Rajpoot, D. Treanor, and D. Magee. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Transactions on Biomedical Engineering, 61(6):1729–1738, 2014.
-  K. Malkin, C. Robinson, et al. Label super-resolution networks. In International Conference on Learning Representations (ICLR), 2019.
-  Y. Liu, K. Gadepalli, M. Norouzi, G. E. Dahl, T. Kohlberger, A. Boyko, S. Venugopalan, A. Timofeev, P. Q. Nelson, G. S. Corrado, et al. Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442, 2017.
-  J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
-  H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision, pages 1520–1528, 2015.
-  D. Onder, S. Zengin, and S. Sarioglu. A review on color normalization and color deconvolution methods in histopathology. Applied Immunohistochemistry & Molecular Morphology, 22(10):713–719, 2014.
-  G. Papandreou, L.-C. Chen, K. P. Murphy, and A. L. Yuille. Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation. In Proceedings of the IEEE international conference on computer vision, pages 1742–1750, 2015.
-  D. Pathak, P. Krahenbuhl, and T. Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In Proceedings of the IEEE international conference on computer vision, pages 1796–1804, 2015.
-  K. Rakelly, E. Shelhamer, T. Darrell, A. A. Efros, and S. Levine. Few-shot segmentation propagation with guided networks. arXiv preprint arXiv:1806.07373, 2018.
-  J. Ramos-Vara. Technical aspects of immunohistochemistry. Veterinary pathology, 42(4):405–426, 2005.
-  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
-  A. C. Ruifrok, D. A. Johnston, et al. Quantification of histochemical staining by color deconvolution. Analytical and quantitative cytology and histology, 23(4):291–299, 2001.
-  J. Saltz, R. Gupta, L. Hou, T. Kurc, P. Singh, V. Nguyen, D. Samaras, K. R. Shroyer, T. Zhao, R. Batiste, et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell reports, 23(1):181–193, 2018.
-  E. C. Stack, C. Wang, K. A. Roman, and C. C. Hoyt. Multiplexed immunohistochemistry, imaging, and quantitation: a review, with an assessment of tyramide signal amplification, multispectral imaging and multiplex analysis. Methods, 70(1):46–58, 2014.
-  The TCGA team. The Cancer Genome Atlas. https://cancergenome.nih.gov/.
-  T. Tsujikawa, S. Kumar, R. N. Borkar, V. Azimi, G. Thibault, Y. H. Chang, A. Balter, R. Kawashima, G. Choe, D. Sauer, et al. Quantitative multiplex immunohistochemistry reveals myeloid-inflamed tumor-immune complexity associated with poor prognosis. Cell reports, 19(1):203–217, 2017.
-  Y. Wei, X. Liang, Y. Chen, X. Shen, M.-M. Cheng, J. Feng, Y. Zhao, and S. Yan. Stc: A simple to complex framework for weakly-supervised semantic segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(11):2314–2320, 2017.
-  B. Weigelt, F. C. Geyer, R. Natrajan, M. A. Lopez-Garcia, A. S. Ahmad, K. Savage, B. Kreike, and J. S. Reis-Filho. The molecular underpinning of lobular histological growth pattern: a genome-wide transcriptomic analysis of invasive lobular carcinomas and grade-and molecular subtype-matched invasive ductal carcinomas of no special type. The Journal of Pathology: A Journal of the Pathological Society of Great Britain and Ireland, 220(1):45–57, 2010.
-  J. Xu, X. Luo, G. Wang, H. Gilmore, and A. Madabhushi. A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing, 191:214–223, 2016.
-  E. Yanagita, N. Imagawa, C. Ohbayashi, and T. Itoh. Rapid multiplex immunohistochemistry using the 4-antibody cocktail yana-4 in differentiating primary adenocarcinoma from squamous cell carcinoma of the lung. Applied Immunohistochemistry & Molecular Morphology, 19(6):509–513, 2011.
Appendix A Assumption made during the computation of intra-instance variance
In the Intra-instance loss, the variance of the label count is computed using the following equation (also Eq. 3 in the main submission):

$$\mathrm{Var}[c_k] = \sum_{j} p_j (1 - p_j) \tag{14}$$

where $c_k$ is the number of pixels of high resolution class $k$ in an image block and $p_j$ is the predicted probability that pixel $j$ belongs to class $k$.

Eq. 14 assumes that the per-pixel labels are independent of each other. We explain this in detail:

Let $j$ be a condensed index over the pixels of a block, and let $z_j = \mathbb{1}[l_j = k]$, where $l_j$ is the label of pixel $j$ and $\mathbb{1}[\cdot]$ is the indicator function, so that $c_k = \sum_j z_j$. Then

$$\mathrm{Var}\Big[\sum_j z_j\Big] = \sum_j \mathrm{Var}[z_j] + \sum_{j \neq j'} \mathrm{Cov}(z_j, z_{j'}). \tag{15}$$

By assuming that the $z_j$ are independent from each other, we have $\mathrm{Cov}(z_j, z_{j'}) = 0$ for all $j \neq j'$. Thus,

$$\mathrm{Var}\Big[\sum_j z_j\Big] = \sum_j \mathrm{Var}[z_j] = \sum_j p_j (1 - p_j). \tag{16}$$

In practice, the assumption that the $z_j$ are independent from each other is usually not true: nearby pixels tend to share the same label, so $\mathrm{Cov}(z_j, z_{j'}) > 0$ for some $j \neq j'$. As a result, the value of Eq. 16 is strictly smaller than the true variance in Eq. 15.
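This gap can be checked numerically. Below is a small Monte-Carlo sketch (the marginal probability, block size, and correlation mechanism are all made up for illustration): each pixel either copies a shared block-level bit, inducing positive pairwise covariance, or is sampled independently. The independence-assumption variance of Eq. 16 then underestimates the empirical variance of the count.

```python
import random

random.seed(0)

N_PIXELS = 64      # pixels per block
P = 0.3            # marginal probability that a pixel belongs to class k
RHO = 0.4          # probability that a pixel copies the shared block bit
N_BLOCKS = 20000

def sample_block_count():
    # With probability RHO a pixel copies a shared block-level bit, which
    # induces positive covariance between pixels; otherwise the pixel is
    # sampled independently. The marginal of each pixel stays Bernoulli(P).
    shared = 1 if random.random() < P else 0
    count = 0
    for _ in range(N_PIXELS):
        if random.random() < RHO:
            count += shared
        else:
            count += 1 if random.random() < P else 0
    return count

counts = [sample_block_count() for _ in range(N_BLOCKS)]
mean = sum(counts) / N_BLOCKS
empirical_var = sum((c - mean) ** 2 for c in counts) / N_BLOCKS

# Eq. 16: variance under the independence assumption (covariances dropped)
independence_var = N_PIXELS * P * (1 - P)

print(f"Eq. 16 variance (independence): {independence_var:.1f}")
print(f"empirical variance (correlated pixels): {empirical_var:.1f}")
```

With positively correlated pixels the empirical variance is several times larger than the Eq. 16 value, which is exactly the bias the scaling hyperparameter below compensates for.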
The statistics matching process tries to match the variance computed by Eq. 16 to the empirical variance. Since Eq. 16 would be smaller than the empirical variance, directly matching the two would introduce a bias. Let $\tilde\sigma^2$ denote the variance computed by Eq. 16 and denote the empirical variance as $\hat\sigma^2$. We can instead match the value of Eq. 16 with $\beta\hat\sigma^2$, where $\beta \in (0, 1]$ is a scaling hyperparameter. The optimal value of $\beta$ depends on the distribution of the data.
For the Intra-instance LSR baseline, instead of matching the model mean and variance with the empirical mean and empirical variance by Eq. 5 in the original submission, we match them with the empirical mean and with $\beta$ times the empirical variance, where $\beta$ is a scaling hyperparameter selected via experiments. Note that this term is neither presented nor used in the original Intra-instance LSR paper.
In the proposed Intra+inter-instance LSR setting, the variance is computed by Eq. (11) in the original submission as the sum of an intra-instance term and an inter-instance term:

$$\tilde\sigma^2_{\mathrm{total}} = \tilde\sigma^2_{\mathrm{intra}} + \sigma^2_{\mathrm{inter}}.$$

Here $\tilde\sigma^2_{\mathrm{intra}}$ is also computed by Eq. 16. Thus, it is also strictly smaller than the actual intra-instance variance on real world datasets, and $\tilde\sigma^2_{\mathrm{total}}$ is in turn smaller than the actual total variance. In our experiments, we therefore also select a scaling hyperparameter $\beta$ and match the model statistics with the empirical mean and with $\beta$ times the empirical variance. To investigate the influence of $\beta$ on the final performance, we test different values of $\beta$ on the breast cancer segmentation task; the results are shown in Table 5. For the breast cancer segmentation task, the best setting of $\beta$ reaches a Masked IoU of 0.6275.
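The intra+inter decomposition above can be viewed as an instance of the law of total variance: the pooled variance of counts equals the average within-block (intra) variance plus the variance of the per-block means (inter). The sketch below checks this numerically with synthetic data (the per-block mean range and noise scale are invented); the identity is exact for equal-size blocks with population variances.

```python
import random
import statistics

random.seed(1)

N_BLOCKS = 200
N_PER_BLOCK = 100
INTRA_SD = 2.0  # made-up within-block noise scale

all_counts = []
block_sample_means = []
block_intra_vars = []

for _ in range(N_BLOCKS):
    # Each block has its own expected count; blocks sharing the same low
    # resolution label y need not share this mean (inter-instance variance).
    block_mean = random.uniform(10, 30)
    xs = [random.gauss(block_mean, INTRA_SD) for _ in range(N_PER_BLOCK)]
    all_counts.extend(xs)
    block_sample_means.append(statistics.fmean(xs))
    block_intra_vars.append(statistics.pvariance(xs))

total_var = statistics.pvariance(all_counts)
intra_var = statistics.fmean(block_intra_vars)        # avg within-block var
inter_var = statistics.pvariance(block_sample_means)  # var across blocks

print(f"total {total_var:.3f} = intra {intra_var:.3f} + inter {inter_var:.3f}")
```

Matching only the intra term (as the Intra-instance baseline does) ignores the inter term entirely, which is why the computed variance undershoots the empirical one.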
Table 5: Masked IoU of the Intra+inter-instance loss under different settings of the hyperparameter $\beta$.

| Intra+inter-instance loss (Masked IoU) | 0.5656 | 0.5753 | 0.5716 | 0.6275 | 0.6223 |
Appendix B Results using alternative ground truth label counts
The label super resolution network is trained using the conditional distribution $p(c \mid y)$: the distribution of label counts $c$ within an image block given its low resolution label $y$.
B.1 Visually approximating ground truth label counts
In the main submission we show a less accurate but faster way of obtaining $p(c \mid y)$, which we call visual approximation. The process is as follows: a domain expert visually approximates the count of cancer pixels in each image block without drawing an exact cancer mask. The count of the remaining pixels in the block is taken as the count for the non-cancer region. The distribution of these counts for each low resolution class $y$ gives the visually approximated $p(c \mid y)$, as shown in Table 6.
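As an illustration, the visually approximated distribution can be tabulated from a handful of expert estimates. The snippet below uses entirely hypothetical annotations (the class labels and percentages are invented) and summarizes the count distribution per low resolution class by its mean and population variance.

```python
from collections import defaultdict

# Hypothetical expert annotations: (low resolution class y, eyeballed
# percentage of cancer pixels in the block). All values are invented.
annotations = [
    ("80-90% cancer", 85), ("80-90% cancer", 78), ("80-90% cancer", 90),
    ("10-20% cancer", 12), ("10-20% cancer", 18), ("10-20% cancer", 15),
]

counts_per_class = defaultdict(list)
for y, cancer_pct in annotations:
    counts_per_class[y].append(cancer_pct)

def mean_var(xs):
    # Mean and population variance of the approximated counts for one class.
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

summary = {y: mean_var(xs) for y, xs in counts_per_class.items()}
for y, (m, v) in summary.items():
    print(f"{y}: mean cancer% = {m:.1f}, variance = {v:.1f}")
```

These per-class means and variances are exactly the statistics the LSR losses match against.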
B.2 Estimating ground truth label counts using masks
In practice, training with the visually approximated $p(c \mid y)$ saves annotation time but may impede the performance of the model. Thus, we also report the performance of models trained with a mask estimated $p(c \mid y)$. We call this process mask estimation. The process is as follows: a domain expert draws an accurate mask of the cancer regions in each image block. The count of pixels for each class (cancer or non-cancer) is then computed directly from the mask of that image block. Given a low resolution label $y$, the mask estimated distribution $p(c \mid y)$ can then be computed, as shown in Table 7.
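In the mask estimation setting, the per-block counts follow directly from the binary mask. A minimal sketch (the toy mask values are invented; real blocks are far larger):

```python
def class_counts_from_mask(mask):
    """Pixel counts per high resolution class from a binary cancer mask
    (1 = cancer, 0 = non-cancer) for one image block."""
    total = sum(len(row) for row in mask)
    cancer = sum(sum(row) for row in mask)
    return {"cancer": cancer, "non-cancer": total - cancer}

# A toy 3x4 mask for illustration.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
print(class_counts_from_mask(mask))  # {'cancer': 6, 'non-cancer': 6}
```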
For both the visual approximation and mask estimation methods, we extracted 12-20 blocks for each of the low resolution classes, 167 blocks in total. For each low resolution label $y$, we extract at most one image block with label $y$ per WSI.
B.3 Results of visual approximation and mask estimation
The time consumed to visually approximate the counts for all the images and to draw the cancer masks for all the images is shown in Table 8. From the table we can see that the mask drawing time for an image is about 2.4 times the visual approximation time. It should be noted that training a deep model with pixel-level supervision that generalizes well across slides requires many more drawn masks; the time spent drawing the 167 blocks here is still much less than the annotation time needed to train a traditional pixel-level supervised semantic segmentation model. We also show the performance of the model trained with only those 167 blocks as high resolution supervision in the second row of Table 9.
The performance of models trained using the visually approximated $p(c \mid y)$, the mask estimated $p(c \mid y)$, and limited high resolution supervision is shown in Table 9. For Intra-instance LSR, mask estimation does not significantly improve performance, whereas for Intra+inter-instance LSR it does. As a result, with the mask estimated $p(c \mid y)$, the Intra+inter-instance LSR significantly outperforms the Intra-instance LSR. Both LSR methods outperform the low resolution model (the first row of the table), which shows the effectiveness of label super resolution. Both LSR methods also outperform the model trained with the 167 pixel-level supervised blocks; this shows that training a pixel-level supervised semantic segmentation model requires a large amount of pixel-level supervision, and that a limited amount may lead to overfitting.
Table 6: the visually approximated $p(c \mid y)$. Rows: image blocks with low resolution class $y$ (probability% as cancer block); columns: count% of the high resolution classes Cancer and Non-cancer.
Table 7: the mask estimated $p(c \mid y)$. Rows: image blocks with low resolution class $y$ (probability% as cancer block); columns: count% of the high resolution classes Cancer and Non-cancer.
Table 8: annotation time for visual approximation vs. drawing masks.

| | Visual approximation | Drawing masks |
| --- | --- | --- |
| Time for 167 images | 56 min 32 s | 136 min 17 s |
| Average time for 1 image | 20.3 s | 48.96 s |
Table 9: breast cancer segmentation performance.

| Method | Masked IoU | Masked DICE |
| --- | --- | --- |
| Low resolution model | 0.5722 | 0.7279 |
| Model trained with limited high-res supervision | 0.5507 | 0.7103 |
| Intra-instance LSR (visually approximated $p(c \mid y)$) | 0.5827 | 0.7363 |
| Intra-instance LSR (mask estimated $p(c \mid y)$) | 0.5832 | 0.7367 |
| Intra+inter-instance LSR (visually approximated $p(c \mid y)$) | 0.5850 | 0.7381 |
| Intra+inter-instance LSR (mask estimated $p(c \mid y)$) | 0.6315 | 0.7741 |
Appendix C Details of the patch-level breast cancer classifier
In Sec. 3.2.1 of the main submission, we use a patch-level classifier to automatically generate low resolution labels. We show details of the classifier here.
The patch-level breast cancer classifier labels patches with probabilities of containing cancer. These probabilities are quantized into 10 bins as low resolution labels for label super resolution, since the probability of containing cancer given by a classifier is correlated with the percentage of cancer regions. We trained the classifier using 102 Whole Slide Images (WSIs) from the Surveillance, Epidemiology, and End Results (SEER) dataset as training data. A pathologist drew the boundaries of the cancer regions in the WSIs, generating a cancer region mask. To train the patch-level classifier, we extracted patches at 40X magnification. The label for each patch (0 or 1) was set by thresholding the ratio of cancer region in the patch at 0.5. We used ResNet34 as the patch classification network. The resulting classifier was validated on a set of 7 WSIs and tested on a set of 89 WSIs. The DICE score between the predictions of this classifier and the ground truth masks on the test set is 0.791.
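The patch labeling and probability quantization steps above can be sketched as follows. The function names are ours and the uniform bin edges are an assumption; the main submission defines the exact quantization.

```python
def patch_label(cancer_ratio, threshold=0.5):
    """Binary training label for a patch: 1 if the fraction of cancer
    pixels in the patch exceeds the threshold, else 0."""
    return 1 if cancer_ratio > threshold else 0

def quantize_probability(p, n_bins=10):
    """Quantize a classifier probability in [0, 1] into one of n_bins
    low resolution label bins (uniform bin widths assumed here)."""
    return min(int(p * n_bins), n_bins - 1)

print(patch_label(0.8), patch_label(0.2))        # 1 0
print(quantize_probability(0.37))                # bin 3
print(quantize_probability(1.0))                 # bin 9 (clamped)
```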
For label super resolution in Sec. 3.2 of the main submission, we merge four patches into one image block. The low resolution label (quantized cancer probability) of an image block is the maximum probability among its 4 patches. The image block is then resized for training the label super resolution network.
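The block-level label construction can be sketched with a hypothetical helper that takes the four patch probabilities (uniform quantization bins assumed, as above):

```python
def block_low_res_label(patch_probs, n_bins=10):
    """Low resolution label of an image block: the quantized maximum
    cancer probability among its four patches."""
    assert len(patch_probs) == 4
    p = max(patch_probs)
    return min(int(p * n_bins), n_bins - 1)

print(block_low_res_label([0.10, 0.20, 0.95, 0.40]))  # 9
```

Taking the maximum (rather than the mean) makes the block label sensitive to any cancerous patch within the block.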
Appendix D Visual examples of cell detection results in multiplex Immunohistochemistry (IHC) images
Fig. 5 and Fig. 6 show the cell detection results for two patches using different methods. The color based methods (color deconvolution and color separation by L2 distance) work poorly because they do not consider critical information such as the shapes of different cell types. The LSR methods are better able to detect cells with proper shapes. The Inter-instance LSR and Intra+inter-instance LSR tend to produce fewer false positive predictions than the Intra-instance LSR, especially for CD16 and CD8, as shown in Fig. 5.
Fig. 7 shows more cell segmentation results using the proposed intra+inter-instance loss. The model detects the different types of cells reasonably well. Note that the model is trained without any accurate pixel-level annotations drawn by humans.
Appendix E More visual examples of breast cancer segmentation results
Fig. 8 shows more breast cancer segmentation results. The green lines are the segmentation boundaries obtained by thresholding the low resolution probability scores of patches. The red lines are the ground truth cancer boundaries given by pathologists. The blue lines are the cancer segmentation boundaries predicted by the proposed Intra+inter-instance LSR. The cyan lines are the cancer segmentation boundaries predicted by the Intra-instance LSR baseline.
From these figures, we can see that the proposed Intra+inter-instance LSR predicts more continuous boundaries than the Intra-instance LSR baseline. This is because, given the low resolution label $y$ of a block, the Intra-instance LSR tries to match the count of cancer pixels in each block individually, while the proposed Intra+inter-instance method also considers the variance among blocks with the same low resolution label $y$.
The segmentation results with super resolution are much closer to the ground truth than the low resolution results (green lines). It should also be noted that the annotation effort needed to train a model that super resolves low resolution labels is much less than the effort needed to train a pixel-level supervised semantic segmentation model.