Center-Extraction-Based Three Dimensional Nuclei Instance Segmentation of Fluorescence Microscopy Images

This work was partially supported by a George M. O’Brien Award from the National Institutes of Health under grant NIH/NIDDK P30 DK079312 and the endowment of the Charles William Harrison Distinguished Professorship at Purdue University.
Fluorescence microscopy is an essential tool for the analysis of 3D subcellular structures in tissue. An important step in the characterization of tissue involves nuclei segmentation. In this paper, a two-stage method for segmentation of nuclei using convolutional neural networks (CNNs) is described. In particular, since creating labeled volumes manually for training purposes is not practical due to the size and complexity of the 3D data sets, the paper describes a method for generating synthetic microscopy volumes based on a spatially constrained cycle-consistent adversarial network. The proposed method is tested on multiple real microscopy data sets and outperforms other commonly used segmentation techniques.
Optical fluorescence microscopy enables imaging of three dimensional subcellular components in tissue. In particular, two-photon microscopy allows imaging deeper into the tissue with near-infrared excitation light. Three dimensional segmentation of subcellular components, such as nuclei, is required to quantify and analyze the microscopy volumes. Manually creating labeled ground truth volumes for training machine learning methods is tedious, and the task is further complicated when nuclei are touching.
Watershed techniques which select local maxima of a distance transform as markers have been used to segment touching nuclei. Watershed markers have also been selected based on mathematical morphology to segment nuclei in time-lapse microscopy. Watershed approaches generally over-segment nuclei due to their irregular structures. To circumvent this, deformable models such as active surfaces have been investigated. A method using multiple active surfaces was introduced to separate touching nuclei wherein the energy functional includes a penalty term for overlapping nuclei and a constraint term for volume conservation. Alternatively, a method known as Squassh couples image restoration and segmentation by using an energy functional derived from a generalized linear model. A common issue is that these methods frequently cannot distinguish nuclei from other biological structures.
Recently, convolutional neural networks (CNNs), which rely on the availability of large amounts of labeled training images, have been used for many computer vision problems. CNNs have also had a substantial impact on biomedical image analysis. A deep contour-aware network has been described for gland segmentation. The network produces object segmented images and contour segmented images where the contour segmented images are used to separate touching glands. In another approach, weights are assigned to the boundary of nuclei in hematoxylin and eosin (H&E) stained histology images during training to ensure touching nuclei are separated. More recently, a cell detection and segmentation technique using a U-Net architecture was presented.
One challenge of using CNNs in biomedical image analysis is the lack of labeled training images due to the expensive and tedious labeling process. Data augmentation techniques using simple transformations can be used to generate more training images but they still require labeled training images. To address the problem of limited availability of 3D ground truth volumes, we previously described the generation of 3D synthetic microscopy volumes without using any labeled volumes. The synthetic volumes were generated using a statistical model and a simple model of the point spread function of the microscope with ellipsoidal shaped nuclei. The synthetic volumes were then used to train CNNs to segment nuclei in real microscopy volumes. We also presented a 3D detection and segmentation method using synthetic microscopy volumes generated in a manner similar to that previous work.
There has been a great deal of work on generating realistic synthetic training images using generative adversarial networks (GANs). A CycleGAN was introduced where a GAN with a cycle consistency term can produce synthetic images that can be used for training without access to any actual ground truth images. We described a spatially constrained CycleGAN (SpCycleGAN) to generate synthetic images where a spatial constraint term is included in the CycleGAN. We then trained a CNN using the synthetic volumes generated by the SpCycleGAN to produce accurate binary segmentation masks. One limitation of that work is that we could not distinctly label each nucleus accurately.
In this paper, we present a 3D nuclei instance segmentation method using two CNNs for fluorescence microscopy volumes. We define “instance segmentation” as a process where each object is detected and segmented with distinct labels. This paper differs from our earlier detection and segmentation work in two ways. That work detects locations of nuclei using a distance transform, which causes over-detection of irregular nuclei structures, and it segments each nucleus using a CNN trained on a set of blurred and noisy synthetic volumes, which produces inaccurate segmentation masks. In the present paper, we use realistic synthetic training volumes generated by the SpCycleGAN to train one CNN to detect the location of nuclei and a second CNN to segment each nucleus accurately. Note that no actual ground truth volumes are used for generating the synthetic volumes. During detection we extract the central areas of nuclei, which do not overlap with each other even when the surfaces of the nuclei may overlap. We evaluate our method using a ground truth volume generated from a real fluorescence microscopy volume from a rat kidney. Our data are collected using two-photon microscopy where nuclei are labeled with Hoechst 33342 stain.
II Proposed Method
Figure 1 is a block diagram of our proposed method for 3D nuclei instance segmentation. A 3D image volume of size is denoted as and the 2D focal plane image of size along the z-direction is denoted as , where . A subvolume of , whose x-coordinate is , y-coordinate is , z-coordinate is is denoted as , where , , , , , and . It is required that , , and . Lastly, a voxel is denoted as v.
Our method consists of two CNNs as shown in Figure 1. The first CNN, , is used for nuclei detection and binary segmentation and the second CNN, , is used for nuclei instance segmentation. To segment each nucleus using the second CNN, the first CNN produces a set of coordinates of the nuclei center locations, denoted as , and a nuclei mask volume denoted as . Specifically, consists of the centroid coordinates of components in a detection volume, . To accurately select the elements of , especially when multiple nuclei are touching, the components in are chosen to have no touching regions for distinct nuclei. The second CNN segments an individual nucleus in a 3D patch from centered at , and the segmented nucleus is color-coded to produce the final segmentation volume, . Note that color-coding is done to visually label each nucleus in . To train the two CNNs, the SpCycleGAN is used to generate synthetic microscopy volumes, . Our implementation uses PyTorch.
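As a minimal sketch of how center locations could be read off a binary detection volume, the Python snippet below labels connected components with a breadth-first search and returns their centroids. The function name `component_centroids`, the choice of 6-connectivity, and the `min_voxels` parameter are assumptions made for illustration and are not taken from the implementation described in this paper.

```python
from collections import deque

import numpy as np


def component_centroids(detection, min_voxels=1):
    """Return centroids (z, y, x) of 6-connected foreground components.

    `min_voxels` is a hypothetical size threshold: components with fewer
    voxels are skipped.
    """
    det = np.asarray(detection) > 0
    visited = np.zeros(det.shape, dtype=bool)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    centroids = []
    for seed in zip(*np.nonzero(det)):
        if visited[seed]:
            continue
        # Breadth-first search collects every voxel of this component.
        visited[seed] = True
        queue, voxels = deque([seed]), []
        while queue:
            v = queue.popleft()
            voxels.append(v)
            for d in steps:
                n = (v[0] + d[0], v[1] + d[1], v[2] + d[2])
                inside = all(0 <= n[i] < det.shape[i] for i in range(3))
                if inside and det[n] and not visited[n]:
                    visited[n] = True
                    queue.append(n)
        if len(voxels) >= min_voxels:
            centroids.append(tuple(float(m) for m in np.mean(voxels, axis=0)))
    return centroids
```

Because the central regions are designed not to touch, each nucleus should correspond to one component here, even when the nuclei themselves are in contact.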
II-A Synthetic Volume Generation
As indicated above, creating labeled ground truth 3D volumes is tedious. We use our SpCycleGAN to produce synthetic microscopy 3D volumes that we use for training. Note we do not need any actual ground truth volumes to use the approach described in this section. Synthetic microscopy volumes, , nuclei mask ground truth volumes, , and detection ground truth volumes, , need to be generated. We start by generating a random 3D nuclei mask volume and then use it to generate the synthetic volume. To generate we develop two approaches: the first approach produces synthetic spherical nuclei and the second approach produces ellipsoidal nuclei based on nuclei structures in . For the first approach a synthetic nucleus, , is generated as a sphere with a randomly selected radius, , between and , and centered at a randomly selected coordinate, , where . Simultaneously, the central region, , is generated, where the central region of a nucleus is defined as a sphere inside the nucleus whose centroid matches the centroid of the nucleus. We intentionally set the radius of to be so that central regions remain disconnected even when their corresponding synthetic nuclei are touching. Once a synthetic nucleus and its central region are produced, they are added to and sequentially, where and are initialized to zero. If overlaps with any previous synthetic nuclei in , then and are not added to and , respectively.
For the second approach is generated as an ellipsoid with three semi-axes selected randomly and independently between and , randomly rotated about the x, y, and z-axes, and centered at a randomly selected coordinate, . In our experiments we used both approaches for generating synthetic images.
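The spherical approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual generator: the function name `make_synthetic_masks`, the number of candidate nuclei, and the `center_ratio` parameter (the ratio of the central-region radius to the nucleus radius) are hypothetical, and the radius bounds are passed in rather than fixed to the values used in our experiments.

```python
import numpy as np


def make_synthetic_masks(shape, n_candidates, r_min, r_max, center_ratio=0.5, seed=0):
    """Return (nuclei_mask, center_mask) as uint8 volumes of the given shape.

    `center_ratio` is a hypothetical parameter: the central region is a
    concentric sphere with radius center_ratio * r.
    """
    rng = np.random.default_rng(seed)
    nuclei = np.zeros(shape, dtype=np.uint8)
    centers = np.zeros(shape, dtype=np.uint8)
    zz, yy, xx = np.indices(shape)
    for _ in range(n_candidates):
        r = rng.uniform(r_min, r_max)
        c = [rng.uniform(0, s - 1) for s in shape]
        dist2 = (zz - c[0]) ** 2 + (yy - c[1]) ** 2 + (xx - c[2]) ** 2
        nucleus = dist2 <= r ** 2
        if np.any(nuclei[nucleus]):
            continue  # candidate overlaps an earlier nucleus, so skip it
        nuclei[nucleus] = 1
        centers[dist2 <= (center_ratio * r) ** 2] = 1
    return nuclei, centers
```

Rejecting only overlapping candidates still allows adjacent nuclei to touch, while the smaller concentric spheres keep the detection targets separated.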
Once the nuclei mask ground truth volume, , and the detection ground truth volume, , are generated, we use the SpCycleGAN to generate the corresponding synthetic volume, . For our experiments we generated 20 sets of synthetic volumes with a size of . Figure 2 shows examples of a real microscopy volume, a synthetic microscopy volume, and synthetic ground truth volumes, respectively, visualized by Voxx.
II-B Nuclei Detection and Binary Segmentation
Our first CNN, used for nuclei detection and binary segmentation, outputs nuclei center locations, , and a nuclei mask volume, (Figure 1). This CNN is shown in more detail in Figure 3 and uses a modified 3D U-Net architecture. The center locations can be selected by finding centroids of elements of . To avoid false detections, labels in are labeled as background if at the same voxel locations are labeled as background. Also, components with fewer than voxels are discarded in order to remove noise.

A 3D convolutional layer consists of a convolutional operation with a kernel with 1 voxel padding, 3D batch normalization, and a rectified-linear unit (ReLU) activation function. Note that the Sigmoid function is used as the activation function for the last convolutional layers. In the encoder, each 3D max-pooling layer uses a window with a stride of 2. In the decoder, a 3D transposed convolutional layer followed by 3D batch normalization and a ReLU activation function is used. In addition, concatenation transfers feature maps from the encoder to the decoder. The size of the input/output volumes is . If the size of is larger than , then a 3D window with size of is moved in the x, y, and z-directions until the entire is processed.

During training, the Adam optimizer is used with a learning rate of 0.001. The training loss function is the sum of the Binary Cross Entropy (BCE) loss of the detection volume and the BCE loss of the nuclei mask volume. The BCE loss, , is defined as where is the output volume, is the ground truth volume, and is the total number of voxels in the volume. For the training set, we used 160 synthetic volumes with a size of . Each synthetic volume with a size of generated in the synthetic volume generation stage is divided into 8 volumes with a size of .
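For reference, the training loss of the first CNN can be sketched with NumPy. The snippet assumes the standard form of BCE, that is, the negative mean over all voxels of g log(s) + (1 - g) log(1 - s) for an output value s and a ground truth value g. The function names and the clipping constant `eps` are assumptions added for numerical safety and illustration; a PyTorch implementation would normally rely on the library's built-in loss instead.

```python
import numpy as np


def bce_loss(output, target, eps=1e-7):
    """Standard binary cross entropy averaged over all voxels.

    `eps` is a hypothetical clipping constant that keeps log() finite.
    """
    s = np.clip(np.asarray(output, dtype=np.float64), eps, 1.0 - eps)
    g = np.asarray(target, dtype=np.float64)
    return float(-np.mean(g * np.log(s) + (1.0 - g) * np.log(1.0 - s)))


def total_loss(det_out, det_gt, mask_out, mask_gt):
    """Sum of the detection-volume BCE and the nuclei-mask-volume BCE."""
    return bce_loss(det_out, det_gt) + bce_loss(mask_out, mask_gt)
```

Summing the two terms without weights gives the detection and mask outputs equal influence on the shared encoder.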
II-C Nuclei Instance Segmentation
The goal of our method is nuclei instance segmentation, that is, segmenting individual detected nuclei with distinct labels. Therefore, the last step is to segment each nucleus in at a detected coordinate, , using our second CNN shown in Figure 4. First, the nucleus is cropped and included in a 3D patch with a size of from centered at , denoted as . Then the second CNN segments only the nucleus in and removes other nuclei structures partially included in the patch. Here, we denote the segmented nucleus as . Once the nucleus is segmented, it is color-coded and inserted in where the center location of lies at .
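The cropping and re-insertion steps can be illustrated as follows. This is a sketch under assumptions: the helper names `crop_patch` and `insert_label`, the zero padding at the volume border, and the use of integer labels in place of color codes are choices made for this example and are not specified in the text.

```python
import numpy as np


def crop_patch(volume, center, size):
    """Crop a cubic patch of edge `size` centered at `center` (z, y, x).

    The volume is zero-padded so that patches near the border keep their size
    (an assumption made for this sketch).
    """
    half = size // 2
    padded = np.pad(volume, half, mode="constant")
    z, y, x = (int(round(c)) for c in center)
    return padded[z:z + size, y:y + size, x:x + size]


def insert_label(labels, patch_mask, center, label):
    """Write `label` into `labels` wherever the patch-level mask is foreground."""
    half = patch_mask.shape[0] // 2
    z0, y0, x0 = (int(round(c)) - half for c in center)
    for dz, dy, dx in zip(*np.nonzero(patch_mask)):
        z, y, x = z0 + dz, y0 + dy, x0 + dx
        inside = (0 <= z < labels.shape[0] and 0 <= y < labels.shape[1]
                  and 0 <= x < labels.shape[2])
        if inside:  # voxels that fall in the padded border are dropped
            labels[z, y, x] = label
```

In a full pipeline, the second CNN would be applied to the output of `crop_patch`, and its thresholded result passed to `insert_label` with a new label for each detected center.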
The second CNN in Figure 1, , consists of a series of 3D convolutional layers. We use dilated convolutions to obtain a receptive field larger than the size of the patch. From the feature map, , with a convolution filter, , the feature map, , is generated using a -dilated convolution at a voxel v as where is known as the dilation factor. Figure 4 shows the dilation factors for the convolutional layers such that the final receptive field is larger than . Note the kernel size for the last convolutional layer of the second CNN is . During training, the Adam optimizer is used with a learning rate of 0.001. The BCE loss is used as the training loss function. For training, 300 patches from centered at are used.
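To see how dilation enlarges the receptive field, the short Python sketch below applies the usual rule for stride-1 convolutions: each layer with kernel size k and dilation factor l adds (k - 1) * l voxels along every axis. The function name and the dilation factors in the example are hypothetical and do not reproduce the values shown in Figure 4.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field along one axis of stacked stride-1 dilated convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d  # a kernel of size k with dilation d spans (k - 1) * d + 1 voxels
    return rf


# Hypothetical example: three layers with kernel size 3 and dilations 1, 2, 4.
# 1 + 2*1 + 2*2 + 2*4 = 15 voxels per axis.
example_rf = receptive_field([3, 3, 3], [1, 2, 4])
```

The same three layers without dilation would cover only 7 voxels per axis, which is why dilation lets a shallow network see an entire patch.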
III Experimental Results
Our method is tested on three rat kidney data sets. All data sets consist of gray scale images of size . Data-I consists of images, Data-II of , and Data-III of . To match the resolution in the z-direction to the resolution in the x and y-directions, Data-II is downsampled in the z-direction by a factor of 2 and Data-III is linearly interpolated in the z-direction by a factor of 2. , , and with a spherical model and for Data-I, , , and with an ellipsoidal model and for Data-II, and , , and with a spherical model and for Data-III are used, respectively. Note that the size of synthetic nuclei for Data-I is small, so the size of patches during nuclei instance segmentation is reduced to and the fourth convolutional layer in is removed. Figure 5 shows original images and segmented images for Data-I, Data-II, and Data-III.
Our method was compared to other segmentation methods using an object-wise evaluation criterion. The other segmentation methods include Squassh, watershed, our previous detection and segmentation method that we will denote as Purdue1, and our previous segmentation method using a SpCycleGAN that we will denote as Purdue2. Note that Purdue2 generates binary segmentation masks but cannot label nuclei distinctly. To label touching nuclei distinctly, we added a post-processing step to Purdue2 consisting of a 3D erosion, a 3D connected component analysis for color-coding, and a 3D dilation, with a sphere of radius 1 used as the structuring element. For the object-wise evaluation, Precision (), Recall (), and F1 score () are defined as , , and , where , , and are the number of true positive objects, the number of false positive objects, and the number of false negative objects, respectively. A segmented nucleus is defined as a true positive object if it intersects at least 50% of the corresponding ground truth nucleus. Otherwise, it is defined as a false positive object. A ground truth nucleus is defined as a false negative object if it intersects less than 50% of the corresponding segmented nucleus or there is no corresponding segmented nucleus. In our evaluation, we generated a 3D ground truth volume, , using ITK-SNAP from Data-I with size of containing 283 nuclei. Note that any components whose number of voxels is less than 50 are removed on and to remove partially included nuclei on the boundary of the subvolume.
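The object-wise scores follow the standard definitions: Precision is TP / (TP + FP), Recall is TP / (TP + FN), and F1 is the harmonic mean of the two. The Python sketch below computes them from object counts and also shows one way to apply the 50% overlap test to a pair of binary masks. The function names, the zero-division handling, and the counts used in any call are assumptions for illustration and are not results from this paper.

```python
import numpy as np


def is_true_positive(seg_mask, gt_mask, min_overlap=0.5):
    """True if the segmented object covers at least `min_overlap` of the ground truth."""
    seg = np.asarray(seg_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    return bool(gt.any() and (seg & gt).sum() >= min_overlap * gt.sum())


def object_scores(n_tp, n_fp, n_fn):
    """Return (precision, recall, f1) from object counts; empty denominators give 0."""
    precision = n_tp / (n_tp + n_fp) if (n_tp + n_fp) else 0.0
    recall = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Counting objects rather than voxels means that a method which merges two touching nuclei is penalized even if its binary mask is accurate.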
Table I and Figure 6 show the object-based evaluation and the segmentation results visualized by Voxx, respectively, for other methods and our new proposed method on Data-I. Squassh cannot distinguish nuclei from non-nuclei structures and cannot successfully separate touching objects. For watershed, is first binarized by a manually selected threshold value of 64. Thresholding cannot distinguish nuclei from non-nuclei structures, and the watershed technique over-segments the foreground region. Purdue1 can reject non-nuclei structures but still has a poor F1 score. Purdue2 can generate an accurate binary segmentation mask but cannot separate all touching nuclei. Our proposed method, which detects the locations of nuclei and individually segments nuclei in 3D patches using CNNs trained on SpCycleGAN-generated volumes, can successfully segment and separate nuclei.
This paper presented a nuclei instance segmentation method using a center-extraction technique to detect the center locations of nuclei. We individually segmented nuclei in 3D patches surrounding the nuclei. Our method segments nuclei successfully, as shown by both visual inspection and object-wise evaluation. In the future we plan to develop a synthetic volume generation model which can produce synthetic nuclei with other shapes.
Data-I was provided by Malgorzata Kamocka of Indiana University and was collected at the Indiana Center for Biological Microscopy. Data-II was provided by Sherry Clendenon and was collected while she was at the Indiana Center for Biological Microscopy. She is currently at the Department of Intelligent Systems Engineering of Indiana University.
- (2016-06) DCAN: deep contour-aware networks for accurate gland segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2487–2496. Las Vegas, NV.
- (2016-10) 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention, pp. 424–432. Athens, Greece.
- (2002-01) Voxx: a PC-based, near real-time volume rendering system for biological microscopy. American Journal of Physiology-Cell Physiology 282 (1), pp. C213–C218.
- (2015-01) Snakes on a plane: a perfect snap for bioimage analysis. IEEE Signal Processing Magazine 32 (1), pp. 41–48.
- (2005-09) Segmenting and tracking fluorescent cells in dynamic 3-D microscopy with coupled active surfaces. IEEE Transactions on Image Processing 14 (9), pp. 1396–1410.
- (2002-09) Functional studies of the kidney of living animals using multicolor two-photon microscopy. American Journal of Physiology-Cell Physiology 283 (3), pp. C905–C916.
- (2019-01) U-Net: deep learning for cell counting, detection, and morphometry. Nature Methods 16, pp. 67–70.
- (2018-06) Three dimensional fluorescence microscopy image synthesis and segmentation. Proceedings of the Computer Vision for Microscopy Image Analysis workshop at Computer Vision and Pattern Recognition, pp. 2302–2310. Salt Lake City, UT.
- (2014-12) Generative adversarial nets. Proceedings of the Advances in Neural Information Processing Systems, pp. 2672–2680. Montreal, Canada.
- (2018-04) SAMS-NET: stain-aware multi-scale network for instance-based nuclei segmentation in histology images. Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 590–594. Washington, D.C.
- (2017-07) Nuclei segmentation of fluorescence microscopy images using three dimensional convolutional neural networks. Proceedings of the Computer Vision for Microscopy Image Analysis workshop at Computer Vision and Pattern Recognition, pp. 834–842. Honolulu, HI.
- (2018-04) Nuclei detection and segmentation of fluorescence microscopy images using three dimensional convolutional neural networks. Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 418–422. Washington, D.C.
- (2014-12) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, pp. 1–15.
- (2015-05) Deep learning. Nature 521, pp. 436–444.
- (2017-07) A survey on deep learning in medical image analysis. Medical Image Analysis 42, pp. 60–88.
- (2017-12) Automatic differentiation in PyTorch. Proceedings of the Autodiff Workshop at the Advances in Neural Information Processing Systems, pp. 1–4. Long Beach, CA.
- (2011-12) Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation. Journal of Machine Learning Technologies 2 (1), pp. 37–63.
- (2014-02) Segmentation and quantification of subcellular structures in fluorescence microscopy images using Squassh. Nature Protocols 9 (3), pp. 586–596.
- (1991-06) Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (6), pp. 583–598.
- (2006-05) The colored revolution of bioimaging. IEEE Signal Processing Magazine 23 (3), pp. 20–31.
- (2006-11) Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and Kalman filter in time-lapse microscopy. IEEE Transactions on Circuits and Systems I: Regular Papers 53 (11), pp. 2405–2414.
- (2016-04) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, pp. 1–13.
- (2006-07) User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. NeuroImage 31 (3), pp. 1116–1128.
- (2017-10) Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232. Venice, Italy.