Image Segmentation to Distinguish Between Overlapping Human Chromosomes
In medicine, visualizing chromosomes is important for medical diagnostics, drug development, and biomedical research. Unfortunately, chromosomes often overlap and it is necessary to identify and distinguish between the overlapping chromosomes. A segmentation solution that is fast and automated will enable scaling of cost-effective medicine and biomedical research. We apply neural network-based image segmentation to the problem of distinguishing between partially overlapping DNA chromosomes. A convolutional neural network is customized for this problem. The model achieved intersection over union (IOU) scores of 94.7% for the overlapping region and 88-94% on the non-overlapping chromosome regions.
R. Lily Hu UC Berkeley, Salesforce Research firstname.lastname@example.org Jeremy Karnowski Insight Data Science email@example.com Ross Fadely Insight Data Science firstname.lastname@example.org Jean-Patrick Pommier https://dip4fish.blogspot.fr/ email@example.com
31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
Neural networks are a powerful approach to segmenting images, including street scenes and biomedical images of tissue. In medicine, visualizing chromosomes is important for medical diagnostics, drug development, and biomedical research. Unfortunately, chromosomes often overlap and it is necessary to identify and distinguish between the overlapping chromosomes. For example, some diseases are associated with particular chromosomes or with the existence of more or fewer than the expected number of chromosomes. Challenges include that the overlapping objects may be nearly identical and that it is arbitrary which object is considered the first and which the second. Furthermore, overlapping chromosomes may look like one larger chromosome, may criss-cross, or one may lie almost entirely on top of the other. A segmentation solution that is fast and automated will enable scaling of cost-effective medicine and biomedical research. Traditional methods of distinguishing between overlapping chromosomes include printing and cutting out individual chromosomes by hand, thresholding on histogram values of pixels, and geometric analysis of chromosome contours, among others; these methods require human intervention when partial overlaps occur.
In this work, we apply neural network-based image segmentation to the problem of distinguishing between partially overlapping human chromosomes (code available: https://github.com/LilyHu/image_segmentation_chromosomes). A convolutional neural network, based on U-Net, is customized for this problem. The model is designed so that the output segmentation map has the same dimensions as the input image. To reduce computation time and storage, the model is also simplified. This is because the dimensions of the input image, the set of potential objects in the image, and the set of potential chromosome shapes are all small, which reduces the scope of the problem, the required capacity of the model, and thus the modeling needs. Various hyperparameters of the model are explored and tested.
2.1 Cytogenetics and Molecular Cytogenetics
Cytogenetics is the study of chromosomes, including their number and structure down to the nucleotide scale. Pioneering work in species from flies to maize enabled the understanding of genes and their inheritance. Human cytogenetics started in 1956 with the discovery of the exact number of chromosomes in humans, soon followed by the discovery that structural or numerical chromosomal anomalies can be associated with cancer or developmental diseases. Human cytogenetics became a diagnostic tool. Cytogenetics is also used as a biological dosimeter in radiobiology, which is the study of the effect of radiation on living beings.
2.2 Digital Image Processing in Cytogenetics
The advent of molecular cytogenetics and fluorescent probes (FISH, or fluorescence in situ hybridization) yielded insights otherwise inaccessible to stain-based cytogenetics. Computers and dedicated software applications started to replace scissor cutouts of black-and-white pictures of chromosomes for karyotyping. New algorithms and applications were developed to process and interpret fluorescent images, study genomic hybridization, and measure telomere length (Q-FISH). Quantitative methods later became metaphase-free and array-based. Metaphasic chromosomes were used to detect targeted chromosomal anomalies or for Q-FISH.
Computer-based chromosome segmentation and classification is still an open problem, particularly the resolution of overlapping chromosomes. Until now, approaches have relied on geometric methods based on contour analysis or on finding a skeleton. These methods can be rule-based or involve classifiers with hand-crafted features. Even for a case as simple as a pair of crossing chromosomes forming a cross, there is ambiguity when it comes to reassembling the pieces to reconstitute the two chromosomes. Grisan et al. developed a tree search to address this issue.
2.3 Contour-Based Resolution of Crossing Chromosomes
Chromosomes can be stained with DAPI for fluorescence imaging, or with Giemsa in conventional cytogenetics. After adaptive thresholding and labeling of the connected components of the resulting binary image, images of chromosomes can be isolated. These images can contain single chromosomes, touching chromosomes, or overlapping chromosomes.
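The isolation step can be illustrated with a minimal sketch. It is not the pipeline used in the cited work: a fixed global threshold stands in for adaptive thresholding, 4-connectivity is assumed, and the function name `label_components` and the toy image are hypothetical.

```python
from collections import deque


def label_components(image, threshold):
    """Label connected foreground regions of a 2-D list of grey levels.

    Pixels strictly above `threshold` are foreground. Returns a label map
    (0 = background, 1..n = components) and the number of components.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or labels[r][c]:
                continue
            # Start a new component and flood-fill it breadth-first.
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels, current


# Toy image with two bright blobs on a dark background.
image = [
    [9, 9, 0, 0, 0],
    [9, 0, 0, 7, 7],
    [0, 0, 0, 7, 0],
]
labels, count = label_components(image, threshold=5)
```

Each labeled component can then be cropped out as a candidate chromosome image. A component produced this way may still contain touching or overlapping chromosomes, which is the case the rest of this paper addresses.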
In the following emblematic example taken from a metaphase, shown in Figure 1, a polygonal approximation is computed from the chromosome contour and some remarkable points can be isolated. The four points corresponding to the chromosomal crossing determine a polygon containing the pixels belonging to the overlapping domain.
Even for a case as emblematic as a pair of crossing chromosomes forming a four-armed cross, there is ambiguity of a combinatorial nature when it comes to reassembling the pieces to reconstitute the two chromosomes. This ambiguity is illustrated in Figure 2.
This ambiguity necessitates a decision. Grisan et al. developed a tree search on high-resolution Q-banded chromosomes to address this issue. Successful results were reported on resolving chromosome clusters, on limited numbers of chromosome clusters extracted from images of metaphases, and in some cases on synthetic images combining chromosomes using Adobe CS.
2.4 Deep Learning for Image Segmentation
Convolutional neural networks are popular for image segmentation. Approaches include fully convolutional networks, dilated convolutions, and encoder-decoder architectures. We propose to solve the overlapping chromosome problem by replacing geometric algorithms with methods from deep learning.
3.1 Collection and Generation
To create a segmentation solution to resolve overlapping chromosomes, we built a dataset for semantic segmentation using thousands of semi-synthetically generated overlapping chromosomes.
Images of single chromosomes were extracted from an image of a human metaphase hybridized with a Cy3 fluorescent telomeric probe. The blue (DAPI) and orange (Cy3) components of the image of a single chromosome were combined into a greyscale image, as shown in Figure 3. The resolution of the images was then reduced by a factor of two.
From the set of 46 chromosomes, there are 46 × 45 / 2 = 1035 possible pairs of chromosomes. Twelve chromosomes were kept to generate a subset of pairs that combines different chromosomal sizes and morphologies. In each pair, each chromosome was rotated, and one chromosome was translated horizontally and vertically relative to the other. The overlapping chromosomes were generated by taking the pixel-wise mean of the two greyscale images of the chromosomes. The ground-truth labels were generated by adding the masks of the two single chromosomes: by choosing the value 1 for the mask of the first chromosome and the value 2 for the mask of the other, the overlapping domain receives the label 3. Only pairs whose ground truth contained an overlapping domain were kept. Raw images of metaphasic chromosomes, the dataset, and a Jupyter notebook are available from Kaggle or from the dip4fish blog.
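The image and label arithmetic of this generation step can be sketched on tiny hand-made arrays. The sketch assumes the two chromosomes have already been rotated and translated onto a common canvas; the function name `combine_pair` and the toy inputs are hypothetical and are not taken from the published notebook.

```python
def combine_pair(img_a, mask_a, img_b, mask_b):
    """Merge two aligned single-chromosome images into one sample.

    img_* are 2-D lists of grey levels; mask_* are 2-D lists of 0/1.
    Returns (image, label), or None when the chromosomes do not overlap.
    """
    rows, cols = len(img_a), len(img_a[0])
    # Pixel-wise mean of the two greyscale images.
    image = [[(img_a[r][c] + img_b[r][c]) / 2 for c in range(cols)]
             for r in range(rows)]
    # Mask values 1 and 2 add up to 3 where both chromosomes are present.
    label = [[1 * mask_a[r][c] + 2 * mask_b[r][c] for c in range(cols)]
             for r in range(rows)]
    has_overlap = any(3 in row for row in label)
    return (image, label) if has_overlap else None


# Toy example: a horizontal bar crossing a vertical bar on a 3x3 canvas.
mask_a = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
mask_b = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
img_a = [[100 * v for v in row] for row in mask_a]
img_b = [[200 * v for v in row] for row in mask_b]

image, label = combine_pair(img_a, mask_a, img_b, mask_b)
```

Because the two mask values are distinct powers of two, their sum identifies each region unambiguously: 1 and 2 for the two chromosomes alone and 3 for the crossing.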
3.2 Description of the Dataset
The final dataset comprises about thirteen thousand grayscale images (94 x 93 pixels). For each image, there is a ground truth segmentation map of the same size, as shown in Figure 4. In the segmentation map, class labels of 0 (shown as black) correspond to the background, class labels of 1 (shown as red) correspond to non-overlapping regions of one chromosome, class labels of 2 (shown as green) correspond to non-overlapping regions of the second chromosome, and labels of 3 (shown as blue) correspond to overlapping regions.
A few erroneous labels of 4 were corrected to match the label of the surrounding pixels. Mislabels on the non-overlapping regions, which were seen as artifacts in the segmentation map (example in Figure 5), were addressed by assigning them to the background class unless there were at least three neighboring pixels that were in the chromosome class. The images were cropped to 88 x 88 pixels so that the dimensions were divisible by 2, which helped processing in the pooling layers of the neural network.
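A minimal sketch of the neighbor rule and the crop follows. Several details are assumptions rather than statements from the text: "neighboring" is read as the 8 surrounding pixels, every class-1 or class-2 pixel is tested rather than a pre-identified set of artifacts, the crop is centred, and the image is taken to have 94 rows and 93 columns. The function names are hypothetical.

```python
def clean_labels(label, min_neighbors=3):
    """Reset isolated class-1/class-2 pixels to background (0).

    A pixel is kept only if at least `min_neighbors` of its 8 neighbours
    share its class. Counts are taken on the original map, not in place.
    """
    rows, cols = len(label), len(label[0])
    cleaned = [row[:] for row in label]
    for r in range(rows):
        for c in range(cols):
            cls = label[r][c]
            if cls not in (1, 2):
                continue
            same = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and label[rr][cc] == cls):
                        same += 1
            if same < min_neighbors:
                cleaned[r][c] = 0
    return cleaned


def center_crop(grid, size=88):
    """Crop a 2-D list to size x size around its centre (assumed placement)."""
    top = (len(grid) - size) // 2
    left = (len(grid[0]) - size) // 2
    return [row[left:left + size] for row in grid[top:top + size]]


# Toy label map: a solid 3x3 patch of class 1 plus one stray class-1 pixel.
label = [[0] * 93 for _ in range(94)]
for r in range(10, 13):
    for c in range(10, 13):
        label[r][c] = 1
label[50][50] = 1  # isolated artifact

cropped = center_crop(clean_labels(label))
```

Since 88 = 2 x 2 x 2 x 11, the cropped size can be halved exactly three times (88, 44, 22, 11), which keeps the pooling and upsampling stages aligned.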
4 Methods and Model Architecture
One simple solution is to classify pixels based on their intensity. Unfortunately, when histograms of the overlapping region and the single chromosome regions are plotted, as shown in Figure 6, there is significant overlap between the two histograms. Thus, a simple algorithm based on a threshold pixel intensity value would perform poorly.
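One way to quantify how much two intensity histograms overlap is the histogram intersection, sketched below. The grey levels are invented toy values, not measurements from the dataset, and the function names are hypothetical.

```python
def histogram(values, bins=8, lo=0, hi=256):
    """Normalised histogram of grey levels in [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def histogram_overlap(h1, h2):
    """Histogram intersection: 0 = fully separable, 1 = identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Invented grey levels for the two kinds of region.
single_region = [40, 60, 80, 100, 120, 140]
overlap_region = [90, 110, 130, 150, 170, 190]

score = histogram_overlap(histogram(single_region), histogram(overlap_region))
```

In this toy case half of the probability mass is shared between the two histograms, so no single intensity threshold can separate the two kinds of pixel cleanly.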
A convolutional neural network, illustrated in Figure 7, was created for this problem. The deep learning solution was inspired by U-Net, a convolutional neural network for image segmentation that was demonstrated on medical images of cells. The model for overlapping chromosomes was designed so that the output segmentation map has the same length and width as the input image. To reduce computation time and storage, the model was also simplified, with almost a third fewer layers and blocks. This is because the dimensions of the input image are small (an order of magnitude smaller than the input to U-Net) and thus too many pooling layers are undesirable. Furthermore, the set of potential objects in the chromosome images is small and the set of potential chromosome shapes is also quite limited, which reduces the scope of the problem and thus the modeling needs. Also, cropping was not done within the network and padding was set to be ‘same’. This was because, given the small input image, it was undesirable to remove pixels.
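The effect of the padding choice on the output size can be checked with simple shape arithmetic. The sketch below follows one spatial dimension through a generic encoder-decoder with 3x3 stride-1 convolutions and 2x2 pooling; the depth of 2 used for the 88-pixel input is an illustrative assumption, not the configuration of the model in Figure 7.

```python
def conv_out(size, kernel=3, padding="same"):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - (kernel - 1)


def pool_out(size, pool=2):
    """Spatial size after non-overlapping pooling."""
    return size // pool


def trace(size, depth, padding):
    """Follow one spatial dimension through `depth` encoder blocks
    (two convolutions + one pooling each), a two-convolution bottleneck,
    and `depth` decoder blocks (one 2x upsampling + two convolutions each).
    """
    for _ in range(depth):
        size = conv_out(conv_out(size, padding=padding), padding=padding)
        size = pool_out(size)
    size = conv_out(conv_out(size, padding=padding), padding=padding)
    for _ in range(depth):
        size = size * 2
        size = conv_out(conv_out(size, padding=padding), padding=padding)
    return size


same_88 = trace(88, depth=2, padding="same")     # size is preserved
valid_88 = trace(88, depth=2, padding="valid")   # borders are eaten away
unet_572 = trace(572, depth=4, padding="valid")  # original U-Net geometry
```

With unpadded convolutions the original U-Net maps a 572-pixel input to a 388-pixel output, a loss that is tolerable on large images. On an 88-pixel input the same scheme would discard almost half of each dimension, whereas ‘same’ padding keeps the output aligned pixel for pixel with the input.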
Since the problem was not straightforward, various architectures were investigated and the design of the model went through several iterations. These investigations included encoding the class labels as integers, using one-hot encodings, combining the classes of the non-overlapping regions, treating each chromosome separately, using or not using class weights, trying different activation functions, and choosing different loss functions. The model was trained on 64% of the data, validated on 16% of the data, and tested on the last 20% of the data.
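The 64/16/20 proportions are what results from holding out 20% of the data for testing and then 20% of the remainder for validation (0.8 x 0.8 = 0.64 and 0.8 x 0.2 = 0.16). A minimal sketch of such a nested split is given below; the shuffling, the seed, and the round figure of 13,000 samples are assumptions, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n, test_frac=0.20, val_frac_of_rest=0.20, seed=0):
    """Shuffle sample indices and split them into train/val/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    n_val = round((n - n_test) * val_frac_of_rest)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


# Illustrative dataset size; the text gives "about thirteen thousand".
train, val, test = split_indices(13000)
sizes = (len(train), len(val), len(test))
```

For 13,000 samples this yields 8,320 training, 2,080 validation, and 2,600 test images.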
Visualizations of the input, ground truth, and model predictions are shown in Figure 8. To quantitatively assess the results, the intersection over union (IOU, or Jaccard’s index) is calculated. The model is able to achieve an IOU of 94.7% for the overlapping region, and 88.2% and 94.4% on the two chromosomes.
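For reference, the per-class intersection over union can be computed as in the sketch below. It operates on a single pair of toy label maps; how scores were aggregated over the test set is not stated here, and the function name `iou_per_class` is hypothetical.

```python
def iou_per_class(truth, pred, classes=(1, 2, 3)):
    """Intersection over union for each class label.

    truth and pred are 2-D lists of integer class labels of equal shape.
    Returns NaN for a class absent from both maps.
    """
    scores = {}
    for cls in classes:
        inter = union = 0
        for t_row, p_row in zip(truth, pred):
            for t, p in zip(t_row, p_row):
                in_t, in_p = t == cls, p == cls
                inter += in_t and in_p  # pixel is in both maps
                union += in_t or in_p   # pixel is in at least one map
        scores[cls] = inter / union if union else float("nan")
    return scores


# Toy maps: one overlap pixel is mistaken for chromosome 1.
truth = [
    [1, 1, 0, 0],
    [1, 3, 3, 2],
    [0, 0, 2, 2],
]
pred = [
    [1, 1, 0, 0],
    [1, 1, 3, 2],
    [0, 0, 2, 2],
]
scores = iou_per_class(truth, pred)
```

In this toy case the single misclassified pixel lowers the score of both classes involved: class 1 gains a false positive (3/4) and class 3 loses a true positive (1/2), while class 2 is unaffected.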
6 Conclusion and Future Work
The deep learning model resulted in IOU scores of up to 94.7% on overlapping chromosomes. To improve the prediction results, the data set can be supplemented with images of single chromosomes and of more than two overlapping chromosomes. Data augmentation can also include transformations such as rotations, reflections, and stretching. Additional hyperparameters can also be explored, such as sample weights, filter numbers, and layer numbers. Increasing convolution size may reduce misclassification between the red and green chromosomes.
To build a production system that can operate on entire microscope images, the model proposed in this paper can be combined with an object detection algorithm. First, the object detection algorithm can draw bounding boxes around chromosomes in an image. Then, an image segmentation algorithm, based on the model presented here, can identify and separate chromosomes.
-  T. Arora and R. Dhir. A review of metaphase chromosome image selection techniques for automatic karyotype generation. Med Biol Eng Comput, 54(8):1147–1157, Aug 2016.
-  G. Aubert, M. Hills, and P. M. Lansdorp. Telomere length measurement-caveats and a critical assessment of the available technologies and tools. Mutat. Res., 730(1-2):59–67, Feb 2012.
-  Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561, 2015.
-  M. A. Ferguson-Smith. History and evolution of cytogenetics. Mol Cytogenet, 8:19, 2015.
-  J. M. Garcia-Sagredo. Fifty years of cytogenetics: a parallel view of the evolution of cytogenetics and genotoxicology. Biochim. Biophys. Acta, 1779(6-7):363–375, 2008.
-  E. Grisan, E. Poletti, and A. Ruggeri. Automatic segmentation and disentangling of chromosomes in Q-band prometaphase images. IEEE Trans Inf Technol Biomed, 13(4):575–581, Jul 2009.
-  L. Ji. Fully automatic chromosome segmentation. Cytometry, 17(3):196–208, Nov 1994.
-  Pommier JP. Resolving overlapping chromosomes: an emblematic case, 2013.
-  Pommier JP. Generating images of overlapping chromosomes. https://dip4fish.blogspot.fr/2016/06/generating-images-of-overlapping.html, 2016. [Online; accessed 19-2016-06-21].
-  Pommier JP. Overlapping chromosomes. https://www.kaggle.com/jeanpat/overlapping-chromosomes/data, 2016.
-  Pommier JP. Overlapping chromosomes. https://github.com/jeanpat/DeepFISH/tree/master/dataset, 2016.
-  P. M. Lansdorp, N. P. Verwoerd, F. M. van de Rijke, V. Dragowska, M. T. Little, R. W. Dirks, A. K. Raap, and H. J. Tanke. Heterogeneity in telomere length of human chromosomes. Hum. Mol. Genet., 5(5):685–691, May 1996.
-  T. Liehr. Cytogenetically visible copy number variations (CG-CNVs) in banding and molecular cytogenetics of human; about heteromorphisms and euchromatic variants. Mol Cytogenet, 9:5, 2016.
-  Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
-  B. McClintock. A Cytological and Genetical Study of Triploid Maize. Genetics, 14(2):180–222, Mar 1929.
-  Shervin Minaee, Mehran Fotouhi, and Babak Hossein Khalaj. A geometric approach for fully automatic chromosome segmentation, 2011.
-  M. V. Munot, J. Mukherjee, and M. Joshi. A novel approach for efficient extrication of overlapping chromosomes in automated karyotyping. Med Biol Eng Comput, 51(12):1325–1338, Dec 2013.
-  S. S. Poon, U. M. Martens, R. K. Ward, and P. M. Lansdorp. Telomere length measurements using digital fluorescence microscopy. Cytometry, 36(4):267–278, Aug 1999.
-  M. Popescu, P. Gader, J. Keller, C. Klein, J. Stanley, and C. Caldwell. Automatic karyotyping of metaphase cells with overlapping chromosomes. Comput. Biol. Med., 29(1):61–82, Jan 1999.
-  Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
-  B. J. Trask. Human cytogenetics: 46 chromosomes, 46 years and counting. Nat. Rev. Genet., 3(10):769–778, Oct 2002.
-  E. Vera and M. A. Blasco. Beyond average: potential for measurement of short telomeres. Aging (Albany NY), 4(6):379–392, Jun 2012.
-  Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.