Image Segmentation to Distinguish Between Overlapping Human Chromosomes

R. Lily Hu
UC Berkeley, Salesforce Research
lhu@salesforce.com
Jeremy Karnowski
Insight Data Science
jeremy@insightdatascience.com
Ross Fadely
Insight Data Science
ross@insightdatascience.com
Jean-Patrick Pommier
https://dip4fish.blogspot.fr/
jeanpatrick.pommier@gmail.com
Abstract

Visualizing chromosomes is important for medical diagnostics, drug development, and biomedical research. Unfortunately, chromosomes often overlap, and it is necessary to identify and distinguish between the overlapping chromosomes. A fast, automated segmentation solution will enable scaling of cost-effective medicine and biomedical research. We apply neural network-based image segmentation to the problem of distinguishing between partially overlapping DNA chromosomes. A convolutional neural network is customized for this problem. The model achieves intersection over union (IOU) scores of 94.7% for the overlapping region and 88-94% for the non-overlapping chromosome regions.

 


31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

1 Introduction

Neural networks are a powerful approach to segmenting images, including street scenes and biomedical images of tissue. In medicine, visualizing chromosomes is important for diagnostics, drug development, and biomedical research. Unfortunately, chromosomes often overlap, and it is necessary to identify and distinguish between the overlapping chromosomes. For example, some diseases are associated with particular chromosomes or with the presence of more or fewer chromosomes than expected. The problem is challenging because the overlapping objects may be nearly identical and because it is arbitrary which object is considered the first and which the second. Furthermore, overlapping chromosomes may look like one larger chromosome, may criss-cross, or one may lie almost entirely on top of the other. A fast, automated segmentation solution will enable scaling of cost-effective medicine and biomedical research. Traditional methods of distinguishing between overlapping chromosomes included printing and cutting out individual chromosomes by hand, thresholding on histogram values of pixels, and geometric analysis of chromosome contours, among others; these methods required human intervention when partial overlaps occurred.

In this work, we apply neural network-based image segmentation to the problem of distinguishing between partially overlapping human chromosomes. (Code available: https://github.com/LilyHu/image_segmentation_chromosomes) A convolutional neural network, based on U-Net, is customized for this problem. The model is designed so that the output segmentation map has the same dimensions as the input image. To reduce computation time and storage, the model is also simplified. This is because the dimensions of the input image, the set of potential objects in the image, and the set of potential chromosome shapes are all small, which reduces the scope of the problem, the required capacity of the model, and thus the modeling needs. Various hyperparameters of the model are explored and tested.

Section 2 outlines the background, Section 3 describes the data and preprocessing, Section 4 elaborates on the model, Section 5 summarizes the results, and Section 6 concludes with future work.

2 Background

2.1 Cytogenetics and Molecular Cytogenetics

Cytogenetics is the study of chromosomes, including their numbers and structures, down to the nucleotide scale [4] [13]. Pioneering work in species from flies to maize [15] enabled the understanding of genes and their inheritance. Human cytogenetics started in 1956 with the discovery of the exact number of chromosomes in humans [21], soon followed by the discovery that structural or numerical chromosomal anomalies can be associated with cancer or developmental diseases. Human cytogenetics became a diagnostic tool. Cytogenetics is also used as a biological dosimeter in radiobiology, which is the study of the effect of radiation on living beings [5].

2.2 Digital Image Processing in Cytogenetics

The advent of molecular cytogenetics and fluorescent probes (FISH, or fluorescent in-situ hybridization) yielded insights otherwise inaccessible to stain-based cytogenetics. Computers and dedicated software applications started to replace scissor cutouts of black and white pictures of chromosomes for karyotyping. New algorithms and applications were developed to process and interpret fluorescent images, study genomic hybridization, and measure telomere length by Q-FISH [12] [18] [2]. Quantitative methods were developed to become metaphase-free and array-based [4]. Metaphasic chromosomes were used to detect targeted chromosomal anomalies [21] or for Q-FISH [22].

Computer-based chromosome segmentation and classification is still an open problem [1], particularly the resolution of overlapping chromosomes. Up to now, approaches have relied on geometric methods based on contour analysis [7] or on finding a skeleton [19] [17] [16]. These methods can be rule-based or involve classifiers with hand-crafted features. Even for a case as simple as a pair of crossing chromosomes forming a cross, there is ambiguity when it comes to reassembling the pieces to reconstitute the two chromosomes [8]. Grisan et al. developed a tree search to address this issue [6].

2.3 Contour-Based Resolution of Crossing Chromosomes

Chromosomes can be DAPI-stained in fluorescence imaging, or stained with Giemsa in conventional cytogenetics. After adaptive thresholding and labeling of the connected components of binary particles, images of chromosomes can be isolated. Those images can contain single chromosomes, touching chromosomes, or overlapping chromosomes.

In the following emblematic example taken from a metaphase, shown in Figure 1, a polygonal approximation is computed from the chromosome contour and some remarkable points can be isolated. The four points corresponding to the chromosomal crossing determine a polygon containing the pixels belonging to the overlapping domain.

Figure 1: Isolation of crossing domain from contour analysis of two crossing chromosomes. Remarkable points are found from contour (left), crossing domain can be found from four points then used to isolate the different parts of two crossing chromosomes (right).

Even for a case as emblematic as a pair of crossing chromosomes forming a four-armed cross, there is ambiguity of a combinatorial nature when it comes to reassembling the pieces to reconstitute the two chromosomes [8]. This ambiguity is illustrated in Figure 2.

This ambiguity necessitates a decision. Grisan et al. developed a tree search from high-resolution Q-banded chromosomes to address this issue [6]. Successful results were reported on resolving chromosome clusters [17] [16], on limited numbers of chromosome clusters extracted from images of metaphases, and in some cases on synthetic images combining chromosomes using Adobe CS [17].

Figure 2: Combinatorial issue when reassembling segmented parts of two crossing chromosomes. In this case three pairs, mutually exclusive, can be generated.

2.4 Deep Learning for Image Segmentation

Convolutional neural networks are popular for image segmentation. These include fully convolutional networks [14], dilated convolutions [23], and encoder-decoder architectures [20] [3]. We propose to solve the overlapping chromosome problem by replacing geometric algorithms with methods from deep learning.

3 Data

3.1 Collection and Generation

To create a segmentation solution to resolve overlapping chromosomes, we built a dataset for semantic segmentation using thousands of semi-synthetically generated overlapping chromosomes.

Images of single chromosomes were extracted from an image of a human metaphase hybridized with a Cy3 fluorescent telomeric probe [12]. The blue (DAPI) and orange (Cy3) components of the image of a single chromosome were combined into a greyscale image, as shown in Figure 3. The resolution of the images was then reduced by a factor of two.

Figure 3: Combination of DAPI (chromosome) and Cy3 (telomeres) images into a greyscale image

From the set of 46 chromosomes, there are 46 × 45 / 2 = 1035 possible unordered pairs of chromosomes. Twelve chromosomes were kept to generate a subset of pairs that combines different chromosomal sizes and morphologies. In each pair, each chromosome was rotated, and one chromosome was translated horizontally and vertically relative to the other. The overlapping chromosomes were generated by averaging the two greyscale images of the chromosomes. The ground-truth labels were generated by adding the masks of the single chromosomes. By choosing the value 1 for the mask of the first chromosome and the value 2 for the mask of the other chromosome, the label of the overlapping domain has the value 3. Only pairs whose ground truth contained an overlapping domain were kept. Raw images of metaphasic chromosomes, the dataset, and a Jupyter notebook are available from Kaggle or from the dip4fish blog [9], [10], [11].
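The label construction described above can be illustrated with a minimal NumPy sketch. It is not the authors' generation script: the function name `combine_pair` is hypothetical, the rotation and translation steps are omitted, and background pixels are assumed to be exactly zero in each single-chromosome image.

```python
import numpy as np

def combine_pair(img_a, img_b):
    """Overlay two single-chromosome greyscale images of equal shape.

    Assumption: background pixels are exactly 0 in each input image.
    """
    img_a = np.asarray(img_a, dtype=np.float32)
    img_b = np.asarray(img_b, dtype=np.float32)

    # Input image: pixel-wise mean of the two greyscale images.
    overlap_image = (img_a + img_b) / 2.0

    # Masks weighted 1 and 2, then summed:
    # 0 = background, 1 = first only, 2 = second only, 3 = overlap.
    mask_a = (img_a > 0).astype(np.uint8)
    mask_b = (img_b > 0).astype(np.uint8)
    label = mask_a + 2 * mask_b

    # Only pairs that actually overlap are kept in the dataset.
    has_overlap = bool((label == 3).any())
    return overlap_image, label, has_overlap
```

Because the two masks carry distinct weights, their sum identifies each region without any extra bookkeeping, which is why the overlap class takes the value 3.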

3.2 Description of the Dataset

The final data set comprises about thirteen thousand grayscale images (94 x 93 pixels). For each image, there is a ground-truth segmentation map of the same size, as shown in Figure 4. In the segmentation map, class labels of 0 (shown as black) correspond to the background, class labels of 1 (shown as red) correspond to non-overlapping regions of one chromosome, class labels of 2 (shown as green) correspond to non-overlapping regions of the second chromosome, and labels of 3 (shown as blue) correspond to overlapping regions.

Figure 4: Sample of overlapping chromosomes input image and ground-truth label

3.3 Preprocessing

A few erroneous labels of 4 were corrected to match the label of the surrounding pixels. Mislabels on the non-overlapping regions, which were seen as artifacts in the segmentation map (example in Figure 5), were addressed by assigning them to the background class unless there were at least three neighboring pixels that were in the chromosome class. The images were cropped to 88 x 88 pixels so that the dimensions were divisible by 2, which helped processing in the pooling layers of the neural network.
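As a rough illustration of these two preprocessing steps, the sketch below removes isolated chromosome labels and crops to 88 x 88. Several details are assumptions rather than facts from the paper: the crop is taken around the image centre, neighbours are counted in the 8-neighbourhood, and a neighbour counts only if it carries the same class as the pixel under test. The function names are hypothetical.

```python
import numpy as np

def center_crop(arr, size=88):
    """Crop a 2D array to size x size around its centre (assumed crop position)."""
    h, w = arr.shape
    top = (h - size) // 2
    left = (w - size) // 2
    return arr[top:top + size, left:left + size]

def remove_isolated_labels(label, min_neighbors=3):
    """Set class-1 and class-2 pixels to background when fewer than
    min_neighbors of their 8 neighbours carry the same class."""
    cleaned = label.copy()
    padded = np.pad(label, 1, mode="constant", constant_values=0)
    h, w = label.shape
    for cls in (1, 2):
        same = (padded == cls).astype(np.int32)
        # Sum the 8 shifted copies of the class mask to count neighbours.
        neighbors = np.zeros((h, w), dtype=np.int32)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                neighbors += same[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        cleaned[(label == cls) & (neighbors < min_neighbors)] = 0
    return cleaned
```

With these defaults, a pixel on the corner of a solid region has exactly three same-class neighbours and is therefore kept, while a stray single pixel is reassigned to the background.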

Figure 5: An initial data pre-processing step was performed on segmentation maps that had artifacts

4 Methods and Model Architecture

One simple solution is to classify pixels based on their intensity. Unfortunately, when histograms of the overlapping region and the single chromosome regions are plotted, as shown in Figure 6, there is significant overlap between the two histograms. Thus, a simple algorithm based on a threshold pixel intensity value would perform poorly.
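A comparison of this kind can be sketched as follows. This is an illustrative computation and not the code behind Figure 6: it assumes 8-bit intensities in the range 0 to 255, and the overlap coefficient (the sum of bin-wise minima) is one of several possible ways to quantify how much two histograms coincide.

```python
import numpy as np

def class_histograms(images, labels, bins=32, value_range=(0, 255)):
    """Normalised intensity histograms for single-chromosome pixels
    (classes 1 and 2) and for overlap pixels (class 3)."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    single = images[(labels == 1) | (labels == 2)]
    overlap = images[labels == 3]
    h_single, edges = np.histogram(single, bins=bins, range=value_range)
    h_overlap, _ = np.histogram(overlap, bins=bins, range=value_range)
    # Guard against empty classes when normalising.
    h_single = h_single / max(h_single.sum(), 1)
    h_overlap = h_overlap / max(h_overlap.sum(), 1)
    return h_single, h_overlap, edges

def histogram_overlap(h_a, h_b):
    """Overlap coefficient of two normalised histograms, between 0 and 1."""
    return float(np.minimum(h_a, h_b).sum())
```

A coefficient near 0 would mean a single intensity threshold separates the two regions well; a value near 1 means no threshold can.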

Figure 6: Histogram of pixel values

A convolutional neural network was created for this problem, as illustrated in Figure 7. The deep learning solution was inspired by U-Net, a convolutional neural network for image segmentation that was demonstrated on medical images of cells. The model for overlapping chromosomes was designed so that the output segmentation map has the same height and width as the input image. To reduce computation time and storage, the model was also simplified, with almost a third fewer layers and blocks. This is because the dimensions of the input image are small (an order of magnitude smaller than the input to U-Net), so too many pooling layers are undesirable. Furthermore, the set of potential objects in the chromosome images is small and the set of potential chromosome shapes is also quite limited, which reduces the scope of the problem and thus the modeling needs. In addition, cropping was not done within the network and padding was set to ‘same’, because it was undesirable to remove pixels from such a small input image.
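The interaction between input size and pooling depth can be checked with simple arithmetic. The sketch below assumes 2 x 2 pooling with floor division and three pooling steps; the actual number of pooling layers in the model is not stated here, so the value 3 is only illustrative, and the function names are hypothetical.

```python
def pooled_sizes(size, num_pools, pool=2):
    """Spatial size after each of num_pools pooling steps (floor division)."""
    sizes = [size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // pool)
    return sizes

def round_trip_size(size, num_pools, pool=2):
    """Size after num_pools poolings followed by num_pools upsamplings."""
    return pooled_sizes(size, num_pools, pool)[-1] * pool ** num_pools

for side in (94, 93, 88):
    print(side, pooled_sizes(side, 3), round_trip_size(side, 3))
```

Under these assumptions, a side of 88 returns to 88 after three poolings and three upsamplings, whereas sides of 94 or 93 come back as 88, so the decoder output would no longer match the input size or the encoder feature maps used in skip connections.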

Figure 7: Resulting neural network for separating overlapping chromosomes

Since the problem was not straightforward, various architectures were investigated and the design of the model went through several iterations. These investigations included encoding the class labels as integers, using one-hot encodings, combining the classes of the non-overlapping regions, treating each chromosome separately, using or not using class weights, trying different activation functions, and choosing different loss functions. The model was trained on 64% of the data, validated on 16% of the data, and tested on the last 20% of the data.
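Two of the steps mentioned above, the data split and the one-hot encoding of labels, can be sketched in NumPy. The shuffling, the seed, and the function names are assumptions made for illustration; the proportions are consistent with holding out 20% for testing and then using 20% of the remainder for validation (0.8 × 0.8 = 0.64 and 0.8 × 0.2 = 0.16).

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 64% / 16% / 20%."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(0.20 * n_samples))
    n_val = int(round(0.16 * n_samples))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test

def one_hot(labels, num_classes=4):
    """Convert integer label maps of shape (..., H, W) to (..., H, W, C)."""
    return np.eye(num_classes, dtype=np.float32)[labels]
```

Integer labels keep the targets compact, while the one-hot form gives one output channel per class, which suits a per-pixel softmax with a categorical loss.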

5 Results

Visualizations of the input, ground truth, and model predictions are shown in Figure 8. To quantitatively assess the results, the intersection over union (IOU, or Jaccard’s index) is calculated. The model is able to achieve an IOU of 94.7% for the overlapping region, and 88.2% and 94.4% on the two chromosomes.
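For reference, a per-class IOU can be computed from two integer label maps as in the sketch below. It is a generic implementation of the Jaccard index, not necessarily the evaluation code used for the reported numbers, and the function name is hypothetical.

```python
import numpy as np

def iou_per_class(y_true, y_pred, num_classes=4):
    """Intersection over union for each class label in two integer maps."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = {}
    for cls in range(num_classes):
        t = y_true == cls
        p = y_pred == cls
        inter = np.logical_and(t, p).sum()
        union = np.logical_or(t, p).sum()
        # A class absent from both maps has no defined IOU.
        scores[cls] = float(inter) / float(union) if union > 0 else float("nan")
    return scores
```

For each class, the score is the number of pixels assigned to that class in both maps divided by the number assigned to it in at least one of them, so both missed pixels and false positives lower the value.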

Figure 8: Comparison of prediction with ground truth

6 Conclusion and Future Work

The deep learning model achieved IOU scores of up to 94.7% on overlapping chromosomes. To improve the prediction results, the data set can be supplemented with images of single chromosomes and of more than two overlapping chromosomes. Data augmentation can also include transformations such as rotations, reflections, and stretching. Additional hyperparameters can also be explored, such as sample weights, filter numbers, and layer numbers. Increasing the convolution size may reduce misclassification between the red and green chromosomes.

To build a production system that can operate on entire microscope images, the model proposed in this paper can be combined with an object detection algorithm. First, the object detection algorithm can draw bounding boxes around chromosomes in an image. Then, an image segmentation algorithm, based on the model presented here, can identify and separate chromosomes.

References

  • [1] T. Arora and R. Dhir. A review of metaphase chromosome image selection techniques for automatic karyotype generation. Med Biol Eng Comput, 54(8):1147–1157, Aug 2016.
  • [2] G. Aubert, M. Hills, and P. M. Lansdorp. Telomere length measurement-caveats and a critical assessment of the available technologies and tools. Mutat. Res., 730(1-2):59–67, Feb 2012.
  • [3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561, 2015.
  • [4] M. A. Ferguson-Smith. History and evolution of cytogenetics. Mol Cytogenet, 8:19, 2015.
  • [5] J. M. Garcia-Sagredo. Fifty years of cytogenetics: a parallel view of the evolution of cytogenetics and genotoxicology. Biochim. Biophys. Acta, 1779(6-7):363–375, 2008.
  • [6] E. Grisan, E. Poletti, and A. Ruggeri. Automatic segmentation and disentangling of chromosomes in Q-band prometaphase images. IEEE Trans Inf Technol Biomed, 13(4):575–581, Jul 2009.
  • [7] L. Ji. Fully automatic chromosome segmentation. Cytometry, 17(3):196–208, Nov 1994.
  • [8] Pommier JP. Resolving overlapping chromosomes: an emblematic case, 2013.
  • [9] Pommier JP. Generating images of overlapping chromosomes. https://dip4fish.blogspot.fr/2016/06/generating-images-of-overlapping.html, 2016. [Online; accessed 2016-06-21].
  • [10] Pommier JP. Overlapping chromosomes. https://www.kaggle.com/jeanpat/overlapping-chromosomes/data, 2016.
  • [11] Pommier JP. Overlapping chromosomes. https://github.com/jeanpat/DeepFISH/tree/master/dataset, 2016.
  • [12] P. M. Lansdorp, N. P. Verwoerd, F. M. van de Rijke, V. Dragowska, M. T. Little, R. W. Dirks, A. K. Raap, and H. J. Tanke. Heterogeneity in telomere length of human chromosomes. Hum. Mol. Genet., 5(5):685–691, May 1996.
  • [13] T. Liehr. Cytogenetically visible copy number variations (CG-CNVs) in banding and molecular cytogenetics of human; about heteromorphisms and euchromatic variants. Mol Cytogenet, 9:5, 2016.
  • [14] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
  • [15] B. McClintock. A Cytological and Genetical Study of Triploid Maize. Genetics, 14(2):180–222, Mar 1929.
  • [16] Shervin Minaee, Mehran Fotouhi, and Babak Hossein Khalaj. A geometric approach for fully automatic chromosome segmentation, 2011.
  • [17] M. V. Munot, J. Mukherjee, and M. Joshi. A novel approach for efficient extrication of overlapping chromosomes in automated karyotyping. Med Biol Eng Comput, 51(12):1325–1338, Dec 2013.
  • [18] S. S. Poon, U. M. Martens, R. K. Ward, and P. M. Lansdorp. Telomere length measurements using digital fluorescence microscopy. Cytometry, 36(4):267–278, Aug 1999.
  • [19] M. Popescu, P. Gader, J. Keller, C. Klein, J. Stanley, and C. Caldwell. Automatic karyotyping of metaphase cells with overlapping chromosomes. Comput. Biol. Med., 29(1):61–82, Jan 1999.
  • [20] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
  • [21] B. J. Trask. Human cytogenetics: 46 chromosomes, 46 years and counting. Nat. Rev. Genet., 3(10):769–778, 10 2002.
  • [22] E. Vera and M. A. Blasco. Beyond average: potential for measurement of short telomeres. Aging (Albany NY), 4(6):379–392, Jun 2012.
  • [23] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.