A Geometric Approach To Fully Automatic Chromosome Segmentation
Abstract
A fundamental task in human chromosome analysis is chromosome segmentation, which plays an important role in chromosome karyotyping. The first step in segmentation is to remove intrusive objects such as stain debris and other noise. The next step is detection of touching and overlapping chromosomes, and the final step is separation of such chromosomes. Common methods for separating touching and overlapping chromosomes are interactive and require human intervention to separate them correctly. In this paper, a geometric-based method is used for automatic detection and separation of touching and overlapping chromosomes. The proposed scheme performs segmentation in two phases. In the first phase, chromosome clusters are detected using three geometric criteria, and in the second phase, chromosome clusters are separated using a cut-line. Most earlier methods did not work properly for chromosome clusters containing more than two chromosomes. Our method, on the other hand, is quite efficient in separating such clusters: at each step one separation is performed, and the algorithm is repeated until all individual chromosomes are separated. Another important point about the proposed method is that it uses geometric features of chromosomes, which are independent of the image type; it can therefore easily be applied to any type of image, including binary images, and does not require multispectral images. We have applied our method to a database containing 62 clusters of touching and partially overlapping chromosomes and achieved a success rate of 91.9%.
I Introduction
Chromosome karyotyping is an essential task in cytogenetics. It is routinely performed in clinical and cancer cytogenetics labs and can be used in the diagnosis of genetic disorders. The normal human karyotype contains 22 pairs of autosomal chromosomes and one pair of sex chromosomes. Chromosome karyotyping aims to identify each chromosome in the image and assign it to one of the 24 classes. It has three main steps: preprocessing, segmentation and classification. Among these steps, chromosome segmentation is very important, since it affects the performance of classification, which is the final goal. Chromosome images may have defects: chromosomes may be bent, they may touch or overlap, and their bands may be spread. Moreover, since touching and overlapping chromosomes exist in almost every metaphase image, solving this problem is vital. The first step in analyzing a chromosome image is segmenting the chromosomes from the image background; the main methods for this step are based on a global threshold evaluated by the Otsu method [1], or on a re-thresholding scheme [2]. Because long chromosomes may touch and overlap, this first segmentation step is usually unable to identify each chromosome as a single object and instead yields a number of clusters. So far, attempts have been made to deal with clusters of touching (but not overlapping) chromosomes [3], [4], [5], and with clusters of overlapping (but not touching) chromosomes [6], [7], where both geometric and intensity-based features have been used to resolve segmentation ambiguities. Lerner [8] proposed a method that combines the choice of the correct cluster disentanglement with the classification stage, resulting in a classification-driven segmentation. Grisan [9] proposed a similar method. There are many other methods for separating touching and overlapping objects [10], [11].
Schwartzkopf [12] proposed a method for joint segmentation and classification based on maximum-likelihood statistical techniques. Since this method is designed for multispectral chromosome images, it does not work for binary images. So far, most chromosome analysis systems share a common weakness: their poor ability to split touching chromosomes automatically. Most current chromosome segmentation systems are interactive and need human intervention. Note that in this work the original images are assumed to be preprocessed: the chromosomes are segmented from the background, and intrusive objects and noise are removed. Therefore, our main effort is to detect and separate touching or overlapping chromosomes.
It is worth mentioning that there are different approaches to the segmentation and classification of medical images. One main approach is to use the geometric characteristics of the object of interest; another is to use spatial and transform-domain features of the image. The right set of features depends on the application. For example, in biometric recognition there are many works based on spatial and frequency-domain information of images. One such work is presented in [13], where the authors use spatial and wavelet-domain features of images to perform palmprint recognition. However, some of those approaches require a very large dataset to train the model properly, so they may not be applicable to small datasets, where they are prone to overfitting. A good treatment of small datasets is presented in [14], where the authors explain how to jointly maximize model accuracy and reliability.
In this paper, a geometric method for segmentation of touching and partially overlapping chromosomes is presented. First, we introduce an approach to evaluate whether an object is a single chromosome or a chromosome cluster. By a chromosome cluster, we mean a group of chromosomes that touch or overlap each other. Subsequently, for each cluster, we use geometric features of the chromosome boundary to separate the touching or partially overlapping chromosomes. Chromosome segmentation is thus performed in two phases. In the first phase, touching or overlapping chromosomes are detected using the approach introduced in Section II, which deals with the chromosomes' shape and geometric features: if two or more chromosomes overlap, the resulting cluster does not have the usual long and thin shape, and we can use this difference to detect chromosome clusters. In the second phase, we use other geometric features to separate touching or partially overlapping chromosomes. We discuss this step in Section III, which deals with the boundary pattern of chromosomes.
Our method has three advantages over earlier schemes:

First, it can be applied to any type of image, even binary images, and it does not need multispectral or grayscale images. Therefore, it can reduce the cost of image acquisition and the amount of computation.

Second, it can easily separate chromosome clusters that contain more than two chromosomes where most earlier schemes fail.

Third, our method is fully automatic and does not need any human intervention.
II Detection of touching or overlapping chromosomes
In order to detect chromosome clusters, we use three criteria based on the geometry of chromosomes. The first is the surrounding ellipse method (Section II.A), which is based on the ratio of the minor-axis length to the major-axis length of the surrounding ellipse. The second is the convex hull method (Section II.B), which is based on the ratio of the number of pixels in the original chromosome to the number of pixels in its convex hull. An important property of this method is its robustness in handling small chromosomes, which may cause errors in the first method. The third is the skeleton and end points method (Section II.C), which finds the end points of the skeleton of each object (either a single chromosome or a chromosome cluster) and decides based on their number. Each of these methods has some limitations, but with proper integration we can detect all chromosome clusters (either touching or overlapping chromosomes), as shown by our simulation results. Each object passes through these three methods, and if it satisfies the criteria of all three, it is detected as a chromosome cluster. We discuss the details of each method in the following parts.
II.A Surrounding ellipse method
The surrounding ellipse of a shape is an ellipse that encloses it. A single chromosome is usually long and thin (except for those in the 20th, 21st and 22nd pairs), so its surrounding ellipse is elongated, whereas a cluster of overlapping chromosomes has a surrounding ellipse close to a circle. We can use this difference to detect overlapping chromosomes. To do so, we compute the ratio of the minor-axis length to the major-axis length of the surrounding ellipse. If the labeled object is a cluster, we expect this ratio to be close to 1, because its surrounding ellipse is close to a circle; if it is a single chromosome, the ratio will be smaller. Therefore, a threshold can be determined to distinguish between chromosome clusters and single chromosomes. We propose the following algorithm for this step:

Find the surrounding ellipse of each chromosome label

Find the ratio of minor axis length to major axis length of surrounding ellipse for each chromosome (either a single chromosome or chromosome cluster)

Determine a threshold (we simply set this threshold to the average of all ratios, but if we have a large dataset, we can use a training set to determine this threshold)

Compare the ratio of each label with this threshold. If the ratio is less than the threshold, remove the label (it is treated as a single chromosome); if the ratio is greater than the threshold, keep it as a candidate cluster.
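The four steps above can be sketched in plain Python. This is a minimal sketch under an assumption the text does not fix: the surrounding ellipse is taken to be the ellipse with the same second central moments as the object's pixels (a common choice for region ellipses), and each object is a list of `(row, col)` pixel coordinates. The function names `axis_ratio` and `ellipse_candidates` are hypothetical.

```python
import math

def axis_ratio(pixels):
    """Minor/major axis ratio of the moment-based ellipse of a pixel set."""
    n = len(pixels)
    mr = sum(r for r, _ in pixels) / n
    mc = sum(c for _, c in pixels) / n
    # Second central moments; +1/12 is the variance of a unit-width pixel,
    # which keeps one-pixel-thick objects from getting a zero-length axis.
    srr = sum((r - mr) ** 2 for r, _ in pixels) / n + 1 / 12
    scc = sum((c - mc) ** 2 for _, c in pixels) / n + 1 / 12
    src = sum((r - mr) * (c - mc) for r, c in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix are proportional to the squared
    # semi-axis lengths; the common scale factor cancels in the ratio.
    half_trace = (srr + scc) / 2
    root = math.sqrt(((srr - scc) / 2) ** 2 + src ** 2)
    return math.sqrt((half_trace - root) / (half_trace + root))

def ellipse_candidates(objects):
    """Indices of objects kept as possible clusters (ratio above the mean)."""
    ratios = [axis_ratio(obj) for obj in objects]
    threshold = sum(ratios) / len(ratios)   # simple average-based threshold
    return [i for i, ratio in enumerate(ratios) if ratio > threshold]
```

For example, a 4×4 square of pixels gives a ratio of 1.0 and a 2×20 bar gives 0.1, so with these two objects only the square is kept as a candidate cluster.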
This method is very fast, but it has problems with two types of chromosomes:

Small chromosomes

Bent chromosomes
The shape of small chromosomes is different from usual chromosomes as they have a round shape where the ratio of minor axis length to major axis length of their surrounding ellipse will be similar to overlapping chromosomes. Bent chromosomes also have ratios similar to overlapping chromosomes so they may wrongly be detected as overlapping chromosomes, an issue that needs to be addressed properly. As the proposed algorithms are applied to each chromosome in a cascade fashion, each step has to remove those single chromosomes which are not removed in the previous steps.
II.B Convex hull method
In Euclidean space, an object is convex if, for every pair of points within the object, every point on the straight line segment joining them is also within the object. The convex hull of a set C is the smallest convex set that contains C. Convex hulls have been used in several applications in computer vision, image analysis and digital image processing, including object recognition and image and video coding. Since a normal chromosome has a roughly convex shape, its convex hull has approximately the same number of pixels as the chromosome itself. In contrast, the convex hull of a chromosome cluster has many more pixels than the cluster itself. Consequently, we can detect chromosome clusters using this difference. To this end, we compute, for every object, the ratio of the number of pixels in the object to the number of pixels in its convex hull, and compare it with a threshold. If the ratio is less than the threshold, we expect the label to be a chromosome cluster, and vice versa. The proposed algorithm for this method is given below:

Find convex hull of each chromosome label.

Calculate the ratio of the number of pixels in the original chromosome to the number of pixels in its convex hull for each chromosome.

Determine a threshold (it can be determined using a training set, or simply set to the average of these ratios over all chromosomes).

Compare this ratio with the threshold for each chromosome; if the ratio is greater than the threshold, eliminate the label (it is treated as a single chromosome).

The remaining chromosomes will be sent to the next step.
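A minimal sketch of the convex hull criterion, assuming each object is a list of `(row, col)` pixels that form one connected component. The hull is computed with Andrew's monotone chain algorithm (our choice; the text does not name one), and the number of pixels in the hull is counted as the pixel centers lying inside or on the hull polygon. The names `pixel_ratio` and `hull_candidates` are hypothetical.

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; every turn along the result is a left turn."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def pixel_ratio(pixels):
    """Pixels in the object divided by pixel centers inside its convex hull."""
    hull = convex_hull(pixels)
    if len(hull) < 3:          # collinear connected object already fills its hull
        return 1.0
    m = len(hull)
    rows = [r for r, _ in hull]
    cols = [c for _, c in hull]
    inside = 0
    for r in range(min(rows), max(rows) + 1):
        for c in range(min(cols), max(cols) + 1):
            # Inside (or on) the hull if never strictly to the right of an edge.
            if all(_cross(hull[k], hull[(k + 1) % m], (r, c)) >= 0 for k in range(m)):
                inside += 1
    return len(set(pixels)) / inside

def hull_candidates(objects):
    """Indices of objects kept as possible clusters (ratio below the mean)."""
    ratios = [pixel_ratio(obj) for obj in objects]
    threshold = sum(ratios) / len(ratios)
    return [i for i, ratio in enumerate(ratios) if ratio < threshold]
```

A filled 3×5 rectangle gives a ratio of 1.0, while an L-shaped object of 9 pixels, whose hull is a triangle covering 15 pixel centers, gives 0.6 and is kept as a candidate cluster.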
One advantage of this method is that it eliminates small single chromosomes that remain from the previous step: for these chromosomes, the convex hull almost coincides with the original chromosome, so the ratio is greater than the threshold. However, bent chromosomes are still a problem, and they are eliminated in the next step. Fig.1 shows an image of chromosomes with the convex hulls of two chromosomes.
II.C Skeleton and end points
Skeletonization is the transformation of a component in a digital image into a thin subset of the original component, typically one pixel wide, that preserves its shape and connectivity. Skeletons have been used in several applications in computer vision, image analysis and digital image processing.
We use skeletons in one step of the chromosome cluster detection algorithm. If we find the skeleton of each object and then find the end points of this skeleton (the pixels where a skeleton branch terminates, i.e., skeleton pixels with only one neighboring skeleton pixel), we notice that overlapping chromosomes usually have more than two end points. Therefore, we can use this property to detect overlapping chromosomes. Skeletons and end points of a set of chromosomes are shown in Fig.2, where end points are marked with red dots. We observe that all chromosome clusters in this picture have more than two end points.
The proposed algorithm for this method is:

Find the skeleton of each object.

Find the end points of each skeleton.

If more than two end points are found, classify the object as a chromosome cluster.
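A minimal sketch of the end-point test, assuming the skeleton has already been computed by some thinning routine (scikit-image, for instance, provides `skimage.morphology.skeletonize`) and is given as a collection of `(row, col)` pixels. An end point is taken to be a skeleton pixel with exactly one 8-connected skeleton neighbor; this definition and the function names are our assumptions.

```python
def count_end_points(skeleton):
    """Number of skeleton pixels that have exactly one 8-connected neighbor."""
    pix = set(skeleton)
    ends = 0
    for r, c in pix:
        neighbors = sum((r + dr, c + dc) in pix
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                        if (dr, dc) != (0, 0))
        if neighbors == 1:
            ends += 1
    return ends

def is_cluster_by_skeleton(skeleton):
    """Skeleton criterion: more than two end points suggests a cluster."""
    return count_end_points(skeleton) > 2
```

A straight skeleton has two end points and is treated as a single chromosome; a plus-shaped skeleton, as produced by two crossing chromosomes, has four and is flagged as a cluster.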
This method is robust for finding overlapping chromosomes. However, because of the iterative structure of the skeleton algorithm, it is time-consuming, and its computational cost should be reduced. In the next part, we combine these three methods appropriately.
II.D Integration step
To address the computational cost of the skeleton method, we apply it to fewer objects. An image initially contains 40 to 46 objects, some of which are chromosome clusters. First, we apply the convex hull and surrounding ellipse methods, which eliminate a large number of single chromosomes; after these two steps, usually only about 8 to 14 objects remain. The skeleton method is then applied to detect chromosome clusters among the remaining objects. Because the number of objects is reduced from about 46 to between 8 and 14 (a reduction by a factor of roughly 46/14 ≈ 3.3 to 46/8 ≈ 5.8), the time efficiency of this step improves by a factor of about 4. Moreover, since the surrounding ellipse method cannot eliminate small single chromosomes, it is better to apply it after the convex hull method. The block diagram for the detection of overlapping chromosomes is shown in Fig.3.
Each chromosome passes through these three methods and if it satisfies all three criteria, it will be considered as a chromosome cluster. After detection of all chromosome clusters, they will be used as the input of the second phase, which is separation of chromosome clusters.
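The cascade described in this section can be sketched as follows. This is a minimal sketch, assuming the three per-object measurements are supplied as functions (the parameter names `pixel_ratio`, `axis_ratio` and `count_end_points` are hypothetical) and that both thresholds are the averages over all objects, as suggested in Sections II.A and II.B. The expensive skeleton test is evaluated only for objects that survive the two cheap tests.

```python
def detect_clusters(objects, pixel_ratio, axis_ratio, count_end_points):
    """Indices of objects that satisfy all three cluster criteria."""
    if not objects:
        return []
    solidity = [pixel_ratio(obj) for obj in objects]
    roundness = [axis_ratio(obj) for obj in objects]
    t_solidity = sum(solidity) / len(solidity)
    t_roundness = sum(roundness) / len(roundness)
    # Cheap tests first: low pixel/hull ratio and a near-circular ellipse.
    survivors = [i for i in range(len(objects))
                 if solidity[i] < t_solidity and roundness[i] > t_roundness]
    # Costly skeleton test only on the few remaining objects.
    return [i for i in survivors if count_end_points(objects[i]) > 2]
```

Any implementations of the three measurements can be passed in; because an object must satisfy all three criteria, the order of the two cheap tests affects only the running time, not the result.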
III Separation of touching or partially overlapping chromosomes using the cut-line method
As discussed previously, after detection of overlapping chromosomes we need to separate them.
We introduce another geometric-based method for separating touching or partially overlapping chromosomes. First, we find the crosspoints of the cluster. Crosspoints are the points on the boundary of a chromosome cluster where two chromosomes touch or overlap. We discuss methods for finding these crosspoints in Sections III.A and III.B. With the proposed method, various touching or partially overlapping chromosomes can be handled in the same way. Two chromosome clusters and their crosspoints are shown in Fig.4.
Once the crosspoints are found, we separate the chromosome cluster using them. If the cluster consists of two chromosomes, cutting it along the line between the two crosspoints results in two single chromosomes. If it consists of more than two chromosomes, the whole algorithm has to be repeated several times (Section IV).
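The cut itself can be sketched as follows, assuming the cluster is a collection of `(row, col)` pixels and the two crosspoints are already known. The cut-line is rasterized with Bresenham's line algorithm and its pixels are removed; the remaining pixels are then grouped into 4-connected components. Using 4-connectivity after removing an 8-connected line is our choice: it ensures the two sides cannot stay connected through a diagonal step across the line. The names `bresenham` and `cut_cluster` are hypothetical.

```python
from collections import deque

def bresenham(p0, p1):
    """Pixels of the 8-connected digital segment from p0 to p1 (inclusive)."""
    (r0, c0), (r1, c1) = p0, p1
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    sr = 1 if r1 > r0 else -1
    sc = 1 if c1 > c0 else -1
    err = dc - dr
    line = []
    while True:
        line.append((r0, c0))
        if (r0, c0) == (r1, c1):
            return line
        e2 = 2 * err
        if e2 > -dr:          # step along the columns
            err -= dr
            c0 += sc
        if e2 < dc:           # step along the rows
            err += dc
            r0 += sr

def cut_cluster(pixels, cross_a, cross_b):
    """Remove the cut-line between two crosspoints; return 4-connected parts."""
    remaining = set(pixels) - set(bresenham(cross_a, cross_b))
    parts = []
    while remaining:
        seed = remaining.pop()
        part, queue = {seed}, deque([seed])
        while queue:                     # breadth-first flood fill
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    part.add(nb)
                    queue.append(nb)
        parts.append(part)
    return parts
```

Cutting a 3×5 rectangle of pixels along the vertical line from (0, 2) to (2, 2) leaves two parts of 6 pixels each.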
In the following sections, we introduce approaches to find these crosspoints. Since crosspoints lie on the boundary of the cluster, we only need to search on the boundary. Therefore, to reduce computation, we extract the boundary of each cluster and search for crosspoints only among the boundary pixels. Once the boundary is extracted, we sort the boundary pixels in clockwise order. Suppose that the sorted boundary pixels are stored in an N×2 matrix B, whose ith row contains the coordinates $(x_i, y_i)$ of the ith boundary pixel, where N is the total number of boundary pixels. Fig.5 illustrates the result of boundary extraction for a chromosome cluster.
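The boundary extraction and clockwise ordering can be sketched with Moore-neighbor tracing, a standard contour-following technique; the text does not say which method was used, so this is an assumption. The sketch takes a binary mask as a list of lists, returns the ordered outer boundary as a list of `(row, col)` pairs (playing the role of the N×2 matrix B), and stops when the start pixel is about to be left in the same direction as the first move. With rows increasing downward, the neighbor order below traces the boundary clockwise on screen. The name `trace_boundary` is hypothetical.

```python
# Offsets of the 8 neighbors in clockwise order (rows grow downward),
# starting from the west neighbor.
DIRS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def trace_boundary(mask):
    """Ordered outer boundary of the first object met in raster order."""
    rows, cols = len(mask), len(mask[0])

    def is_fg(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    start = next(((r, c) for r in range(rows) for c in range(cols) if mask[r][c]), None)
    if start is None:
        return []
    boundary = [start]
    cur, back = start, (start[0], start[1] - 1)   # west of start is background
    first_step = None
    while True:
        k = DIRS.index((back[0] - cur[0], back[1] - cur[1]))
        nxt = None
        for i in range(1, 9):    # scan clockwise, starting after the backtrack pixel
            dr, dc = DIRS[(k + i) % 8]
            cand = (cur[0] + dr, cur[1] + dc)
            if is_fg(*cand):
                pr, pc = DIRS[(k + i - 1) % 8]
                back = (cur[0] + pr, cur[1] + pc)   # last background pixel seen
                nxt = cand
                break
        if nxt is None:          # isolated single pixel
            return boundary
        if first_step is None:
            first_step = nxt
        elif cur == start and nxt == first_step:
            return boundary[:-1]  # drop the repeated start pixel
        boundary.append(nxt)
        cur = nxt
```

For a 2×2 block of pixels at rows 1 to 2 and columns 1 to 2, the sketch returns `[(1, 1), (1, 2), (2, 2), (2, 1)]`. For one-pixel-wide parts, some pixels appear twice because both sides of the thin part are traced.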
In order to find the crosspoints, we use two criteria based on the geometry of the boundary. The criteria are:

Variations in the Angle of Motion Direction (VAMD)

Sum of Distances among Total Points (SDTP)
The first criterion, VAMD, explained in Section III.A, finds crosspoint candidates based on variations in the angle of motion direction. The second criterion, SDTP, explained in Section III.B, uses the fact that crosspoints are usually located in the middle of a chromosome cluster. All boundary pixels pass through these two criteria; at each step, some boundary points are eliminated, and the final crosspoints are selected with a cost function that takes both criteria into account.
III.A Variations in the Angle of Motion Direction (VAMD)
To understand VAMD, suppose an object moves along the boundary of a chromosome. At each pixel, it moves in a direction, called the motion direction, that leads it to the next pixel. The angle between this direction and the horizontal axis is called the angle of motion direction; it can be calculated as the angle of the line connecting the ith and (i+1)th boundary pixels. We denote this angle by $\theta_i$:
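$$\theta_i = \operatorname{atan2}\left(y_{i+1} - y_i,\; x_{i+1} - x_i\right) \tag{1}$$
where $(x_i, y_i)$ is the ith row of the boundary matrix B.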
Because of noise on the boundary, it is better to use more pixels to obtain a better estimate of the angle of motion direction. One can use the angle $\theta_{ij}$ of the line connecting the ith and jth pixels:
$$\theta_{ij} = \operatorname{atan2}\left(y_j - y_i,\; x_j - x_i\right) \tag{2}$$
Subsequently, we can use a weighted average of $\theta_{ij}$ over different values of j to obtain a robust estimate $\hat{\theta}_i$ of the angle of motion direction:
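$$\hat{\theta}_i = \sum_{j=i+1}^{i+K} \alpha_{ij}\,\theta_{ij} + \sum_{j=i-K}^{i-1} \beta_{ij}\,\theta_{ji}, \qquad \sum_{j}\alpha_{ij} + \sum_{j}\beta_{ij} = 1 \tag{3}$$
where K is the number of neighboring pixels used on each side of the ith pixel.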
The first summation estimates the angle from the forthcoming pixels and the second from the previous pixels. The weights $\alpha_{ij}$ and $\beta_{ij}$ can be set to fixed values or chosen adaptively. The adaptive choice usually works better, and the weights should then be decreasing functions of $d_{ij}$, the Euclidean distance between the ith and jth pixels. In that case, for pixels far from the current pixel, $d_{ij}$ is large and the corresponding weights are very small, which is reasonable.
Based on our simulations, using j = i+4 and j = i+5 gives the most satisfactory results. Therefore, we used the following formula to find $\hat{\theta}_i$:
$$\hat{\theta}_i = \frac{1}{2}\left(\theta_{i,i+4} + \theta_{i,i+5}\right) \tag{4}$$
Fig.6 depicts this method on a curve.
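Equations (2) and (4) can be sketched as below, assuming the ordered boundary is a list of `(row, col)` pairs treated as a closed curve, so indices wrap around. Two choices here are ours: angles are computed with `math.atan2`, using the column difference as the horizontal component and the row difference as the vertical one, and the two angles $\theta_{i,i+4}$ and $\theta_{i,i+5}$ are averaged as unit vectors (a circular mean) rather than arithmetically, which avoids a wrong average when the two angles straddle ±π. The name `motion_angles` is hypothetical.

```python
import math

def motion_angles(boundary, offsets=(4, 5)):
    """Estimated angle of motion direction at every pixel of a closed boundary."""
    n = len(boundary)
    angles = []
    for i, (ri, ci) in enumerate(boundary):
        sin_sum = cos_sum = 0.0
        for k in offsets:
            rj, cj = boundary[(i + k) % n]           # forthcoming pixel j = i + k
            theta_ij = math.atan2(rj - ri, cj - ci)   # column axis is horizontal
            sin_sum += math.sin(theta_ij)
            cos_sum += math.cos(theta_ij)
        angles.append(math.atan2(sin_sum, cos_sum))   # circular mean of the θ_ij
    return angles
```

On the clockwise boundary of a large square, the estimate is 0 along the top edge, π/2 along the right edge (rows grow downward) and ±π along the bottom edge.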
After finding $\hat{\theta}_i$ for all pixels on the boundary, we calculate the variation of the angle at the ith pixel as the difference between the estimated angles at the (i+1)th and ith pixels:
$$\Delta\theta_i = \hat{\theta}_{i+1} - \hat{\theta}_i \tag{5}$$
We expect $|\Delta\theta_i|$ to be larger at the crosspoints than at other points of the boundary. We can use the following algorithm to remove superfluous boundary points:

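Find the variation of angle $\Delta\theta_i$ at each pixel on the boundary.

Calculate the average $\overline{|\Delta\theta|}$ of $|\Delta\theta_i|$ over all boundary pixels.

For each pixel, if $|\Delta\theta_i| < \kappa\,\overline{|\Delta\theta|}$, remove this pixel from the crosspoint candidates (we used $\kappa = 1$, found by trial and error).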
The result of this step is shown in Fig.7.
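A minimal sketch of this pruning step, assuming the estimated angles are given in radians for a closed boundary. Wrapping each difference into [-π, π) before taking its magnitude is our addition, so that a step from just below π to just above -π counts as a small turn rather than a jump of almost 2π. The names `angle_variation` and `vamd_candidates` and the parameter `kappa` are hypothetical.

```python
import math

def angle_variation(angles):
    """|Δθ_i| between consecutive estimated angles on a closed boundary."""
    n = len(angles)
    out = []
    for i in range(n):
        d = angles[(i + 1) % n] - angles[i]
        d = (d + math.pi) % (2 * math.pi) - math.pi   # wrap into [-pi, pi)
        out.append(abs(d))
    return out

def vamd_candidates(angles, kappa=1.0):
    """Indices kept as crosspoint candidates: |Δθ_i| >= kappa * mean |Δθ|."""
    dtheta = angle_variation(angles)
    threshold = kappa * sum(dtheta) / len(dtheta)
    return [i for i, d in enumerate(dtheta) if d >= threshold]
```

For the angle sequence `[0, 0, 0, π/2, π/2, π/2]`, the two turns are at indices 2 and 5 (the second one wraps around the closed boundary), and only those two indices survive.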
III.B Sum of Distances among Total Points (SDTP)
Because of noise in chromosome images, more than two candidate points may remain after the previous step, so we need another criterion to find the crosspoints. Assume that M points remain after the previous step, and denote them by $p_1$ to $p_M$. Crosspoints are usually located in the middle of a cluster, because the cluster is formed by two or more chromosomes. For each remaining pixel, we compute the sum of its distances to the other remaining pixels, and we expect this sum to be smaller at the crosspoints than at the other points. The sum of distances for pixel $p_i$ is:
$$S_i = \sum_{j=1}^{M} d(p_i, p_j), \qquad i = 1, \dots, M \tag{6}$$
where $d(p_i, p_j)$ is the Euclidean distance between the ith and jth remaining boundary pixels.
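A minimal sketch of the SDTP sums in Eq. (6), together with the naive selection of the two points with the smallest sums, assuming the remaining candidates are `(row, col)` pairs. The names `sum_of_distances` and `sdtp_only_crosspoints` are hypothetical; the sketch needs Python 3.8 or later for `math.dist`.

```python
import math

def sum_of_distances(points):
    """S_i: total Euclidean distance from each point to all remaining points."""
    return [sum(math.dist(p, q) for q in points) for p in points]

def sdtp_only_crosspoints(points):
    """Naive SDTP selection: indices of the two points with the smallest S_i."""
    s = sum_of_distances(points)
    order = sorted(range(len(points)), key=lambda i: s[i])
    return order[0], order[1]
```

For four collinear candidates at columns 0, 5, 6 and 11, the sums are 22, 12, 12 and 22, so the two inner points are selected. This illustrates why SDTP favors points in the middle of the cluster.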
We could simply select the two pixels with the smallest $S_i$ as crosspoints. However, in some cases this selects the wrong points; for example, when a small chromosome touches a large chromosome with its end, this selection does not work properly. To avoid such misselections, we use both VAMD and SDTP in the final decision, so that in cases where SDTP alone cannot select the right points, VAMD helps the algorithm select them correctly. We define a cost function that takes both VAMD and SDTP into account, and select the two points with the smallest cost as the crosspoints:
$$\mathrm{Cost}(i) = S_i - \lambda\,\lvert\Delta\theta_i\rvert \tag{7}$$
The parameter $\lambda$ is a positive number that controls the influence of $\Delta\theta_i$ in the cost function. Minimizing this function favors points with a small $S_i$ and a large $|\Delta\theta_i|$. The value of $\lambda$ can be determined by trial and error on a training set; in our simulations, $\lambda = 1000$ produced satisfactory results.
After applying this algorithm, we choose the two points with the smallest values of Cost(i) as the crosspoints, and the cluster is separated along the line between these two points. The result of this step is shown in Fig.8.
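A minimal sketch of the final selection, assuming the cost has the form Cost(i) = S_i - λ|Δθ_i|, with Δθ_i in radians and distances in pixels, and assuming the candidate points and their angle variations come from the two previous steps. The name `select_crosspoints` and the parameter `lam` are hypothetical.

```python
import math

def select_crosspoints(points, dtheta, lam=1000.0):
    """Indices of the two candidates with the smallest combined cost."""
    if len(points) < 2:
        raise ValueError("need at least two candidate points")
    cost = []
    for p, d in zip(points, dtheta):
        s = sum(math.dist(p, q) for q in points)   # SDTP term S_i
        cost.append(s - lam * abs(d))              # VAMD term rewards sharp turns
    order = sorted(range(len(points)), key=lambda i: cost[i])
    return order[0], order[1]
```

With λ = 1000, a difference of 0.1 rad in |Δθ_i| is worth 100 pixels of summed distance. When the candidates are symmetric, so that all S_i are equal, the angle term alone decides: for Δθ = `[0.1, 0.5, 0.2, 0.9]` on the corners of a square, indices 3 and 1 are chosen.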
IV Separation of chromosome clusters with more than two chromosomes
In the previous sections, we concentrated on chromosome clusters consisting of two chromosomes. In some cases, chromosome clusters may contain more than two chromosomes. For binary images, it is difficult to separate such a cluster in one step. However, it can easily be separated by a multistep algorithm: if a chromosome cluster consists of N single chromosomes, all of them can be separated in N-1 steps, since each step splits one object into two. We propose the following algorithm for separating clusters with more than two chromosomes:

Separate each chromosome cluster with the methods of Section III. After separation, we obtain two new objects.

For each new object, check whether it is a single chromosome or a chromosome cluster (Section II).

If both new objects are single chromosomes, the algorithm terminates for this cluster.

If at least one of the new objects is a chromosome cluster, separate it using the procedure of the second phase.

Repeat until all new objects are single chromosomes.
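The loop above can be sketched with a worklist, assuming two hypothetical callables: `is_cluster` (standing in for the detection phase of Section II) and `split` (standing in for the cut-line separation of Section III, returning two objects). The `max_splits` guard is our addition to avoid looping forever if a split fails to make progress. In the toy usage, an object is modeled by the number of chromosomes it contains, which makes the N-1 split count visible; it is an illustration, not real image data.

```python
def separate_all(obj, is_cluster, split, max_splits=1000):
    """Split repeatedly until every object is a single chromosome.

    Returns the single objects and the number of splits performed.
    """
    singles, pending, splits = [], [obj], 0
    while pending:
        current = pending.pop()
        if not is_cluster(current):
            singles.append(current)
            continue
        if splits >= max_splits:
            raise RuntimeError("separation did not converge")
        pending.extend(split(current))   # one cut yields two new objects
        splits += 1
    return singles, splits
```

For example, `separate_all(4, lambda k: k > 1, lambda k: (1, k - 1))` returns `([1, 1, 1, 1], 3)`: a cluster of four chromosomes needs three cuts, whichever way each cut divides the cluster.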
V Results
Using the proposed algorithm, we analyzed 25 images containing a total of 1150 chromosomes. This data set contains 62 clusters of touching or partially overlapping chromosomes. We are interested in assessing the ability of the proposed algorithm to separate clusters into their constituent chromosomes. We tested our algorithm on these 62 clusters and observed that it separates 57 of them correctly, i.e., an accuracy of 57/62 ≈ 91.9%. In Table I, we report the fraction of correct separations of touching and partially overlapping chromosomes with respect to their total number, together with a comparison with similar results reported in the literature.
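Table I. Fraction of correct separations of touching and partially overlapping chromosomes.

Method | Number of touching or partially overlapping chromosomes | Accuracy
Ji (1989) [1], set 1 | |
Ji (1989) [1], set 2 | |
Lerner (1998) [4] | |
Grisan (2009) [9] | |
Proposed method | 62 | 91.9% (57/62)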
Fig.9 presents results of separation between six touching and partially overlapping chromosomes. As we can see, this method provides very efficient separation of touching and partially overlapping chromosomes.
VI Conclusion
In this paper, a geometric-based approach is proposed for chromosome segmentation. It uses three criteria to detect chromosome clusters, and then a novel geometric method to find two crosspoints on the boundary of each cluster, which define the cut-line. Once the cut-line is found, groups of touching and overlapping chromosomes can be decomposed. The algorithm can also decompose clusters of touching or partially overlapping chromosomes that consist of more than two chromosomes. Another advantage of this method is that it can easily be applied to any type of image, even binary chromosome images: because it uses geometric features of chromosomes, which are independent of image type, the proposed scheme does not need multispectral images.
In future work, we will focus on separating completely overlapping chromosomes. For this purpose, we first need to distinguish between touching and overlapping chromosomes and apply the corresponding algorithm to each class, since the separation algorithm for touching or partially overlapping chromosomes differs from the one for completely overlapping chromosomes. The separation algorithm for completely overlapping chromosomes will be based on finding the cross section between the two overlapping chromosomes and using it to separate the cluster.
Acknowledgments
The authors would like to thank Prof. Hamid Aghajan for his invaluable help during this project. We would also like to thank Mr. Payam Delgosha, Mr. Amirali Abdolrashidi and Mr. Ali Hashemi for their useful comments on our project.
References
[1] L. Ji, "Intelligent splitting in the chromosome domain," Pattern Recognition, vol. 22, no. 5, pp. 519-532, 1989.
[2] L. Ji, "Fully automatic chromosome segmentation," Cytometry, vol. 17, pp. 196-208, 1994.
[3] B. Lerner, "Toward a completely automatic neural-network-based human chromosome analysis," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 28, no. 4, pp. 544-552, Aug. 1998.
[4] B. Lerner, H. Guterman, and I. Dinstein, "A classification-driven partially occluded object segmentation (CPOOS) method with application to chromosome analysis," IEEE Trans. Signal Process., vol. 46, no. 10, pp. 2841-2847, Oct. 1998.
[5] X. Shunren, X. Weidong, and S. Yutang, "Two intelligent algorithms applied to automatic chromosome incision," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP '03), Apr. 2003, pp. 697-700.
[6] G. Agam and I. Dinstein, "Geometric separation of partially overlapping nonrigid objects applied to automatic chromosome segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 11, pp. 1212-1222, Nov. 1997.
[7] C. Urdiales Garcia, A. Bandera Rubio, F. Arrebola Pérez, and F. Sandoval Hernández, "A curvature-based multiresolution automatic karyotyping system," Mach. Vis. Appl., vol. 14, pp. 145-156, 2003.
[8] B. Lerner, H. Guterman, and I. Dinstein, "A classification-driven partially occluded object segmentation (CPOOS) method with application to chromosome analysis," IEEE Trans. Signal Process., vol. 46, no. 10, pp. 2841-2847, Oct. 1998.
[9] E. Grisan, E. Poletti, and A. Ruggeri, "Automatic segmentation and disentangling of chromosomes in Q-band prometaphase images," IEEE Trans. Inf. Technol. Biomed., vol. 13, no. 4, Jul. 2009.
[10] M. W. Koch and R. L. Kashyap, "Using polygons to recognize and locate partially occluded objects," IEEE Trans. Pattern Anal. Mach. Intell., vol. 9, no. 4, pp. 483-494, Apr. 1987.
[11] H. J. Wolfson and Y. Lamdan, "Transformation invariant indexing," in Geometric Invariance in Computer Vision, J. L. Mundy and A. Zisserman, Eds. MIT Press, 1992, pp. 335-353.
[12] W. C. Schwartzkopf, A. C. Bovik, and B. L. Evans, "Maximum-likelihood techniques for joint segmentation-classification of multispectral chromosome images," IEEE Trans. Med. Imag., vol. 24, no. 12, Dec. 2005.
[13] S. Minaee and A. Abdolrashidi, "Highly accurate multispectral palmprint recognition using statistical and wavelet features," arXiv preprint arXiv:1408.3772, 2014.
[14] S. Minaee, Y. Wang, and Y. W. Lui, "Prediction of long-term outcome of neuropsychological tests of MTBI patients using imaging features," in Proc. IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2013.