Enhanced Center Coding for Cell Detection with Convolutional Neural Networks
Cell imaging and analysis are fundamental to biomedical research because cells are the basic functional units of life. Among different cell-related analyses, cell counting and detection are widely used. In this paper, we focus on one common step of learning-based cell counting approaches: coding the raw dot labels into maps that are more suitable for learning. Two criteria for coding raw dot labels are discussed, and a new coding scheme is proposed. The two criteria measure how easy it is to train the model with a coding scheme, and how robustly the raw dot labels can be recovered at prediction. The most compelling advantage of the proposed coding scheme is its ability to distinguish neighboring cells in crowded regions. Cell counting and detection experiments are conducted for five coding schemes on four types of cells and two network architectures. The proposed coding scheme improves the counting accuracy over the widely used Gaussian and rectangle kernels by up to 12%, and improves the detection accuracy over the common proximity coding by up to 14%.
Index Terms– Cell counting, Cell detection, Convolutional neural networks
I Introduction
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Dr. Daniel S. Weller is supported by NSF 1759802. Dr. Cedric L. Williams is supported by ARO W911NF-16-C-0104. One Titan X Pascal GPU used in this work is supported by NVIDIA. H. Liang and D. S. Weller are with the Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904 USA (email: email@example.com; firstname.lastname@example.org). A. Naik and J. Kapur are with the Department of Neurology, University of Virginia, Charlottesville, VA 22904 USA (email: email@example.com; firstname.lastname@example.org). C. L. Williams is with the Department of Psychology, University of Virginia, Charlottesville, VA 22904 USA (email: email@example.com).
The cell number and distribution pattern reflect many underlying biomedical mechanisms. For example, the cell number is important for the diagnosis and treatment of breast cancer, and subcellular object counting and localization facilitate large-scale tumor studies with tissue microarrays. With the advances of microscopy imaging in recent years, automatically extracting meaningful information from large amounts of raw imaging data has become necessary in many applications. Cell counting and detection methods can be classified as feature-based methods and learning-based methods. Feature-based methods [4, 5, 6] are built with hand-designed features and parametric models. Common hand-designed features include maximally stable extremal regions (MSER), the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG). Level sets and active contours are two widely used parametric models. Features based on local convexity are processed by a graph model for cell detection. Support vector machines (SVM) are fed with regional proposals for cell detection [4, 5]. However, one limitation of feature-based methods is that the performance varies among different types of cells, and sometimes these features need to be redesigned for different cell targets. Recently, data-driven methods have received more attention [12, 13, 14] for two reasons. First, the success of deep learning in object classification and detection provides practical experience on the architecture design and training of convolutional neural networks (CNNs) that can be adapted to other domains. Packages such as PyTorch, and regularization techniques such as dropout and residual convolution, make building customized CNNs easier. Second, huge amounts of imaging data are generated with advanced microscopy technology. Models trained with more imaging data are more robust and accurate.
Because microscopy data differ from natural scene images in many ways, solutions for natural scene images should be transferred to cell analysis with care. For cell analysis, the objects of interest, cells, are usually smaller than the objects in natural scene images. Very deep network architectures are not necessary for cell detection, since the typical size of a single cell is within pixels in microscopy images. The benefit of designing a deeper network is marginal if the receptive field is large enough to cover a single cell. Deeper networks also have more trainable parameters, which could even hurt the performance through over-fitting. In biomedical research, cell analysis is a highly customized task. The morphology can change dramatically among different types of cells. Even for the same type of cell, the appearance changes with different tissue preparation protocols, imaging equipment and imaging protocols. Instead of training an omnipotent cell analysis framework, the ability to adapt the analysis model to a specific cell type and imaging modality is desirable in practice.
The success of deep learning on numerous computer vision tasks is widely perceived as a result of improved learning architectures and big data. CNNs and many regularization techniques, such as maxout, dropout and residual convolution, are practical components for building neural networks. The other factor, data enhancement, is discussed less, especially for the output data. In this paper, instead of focusing on data enhancement at the input side of a learning network, the coding scheme at the output side is discussed. Compared with the input data, the output data can be designed more freely to suit the application. For cell analysis, one type of raw label is a sparse 2D image where non-zero pixels indicate the centers of cells. The widely used cell density function for cell counting can be taken as an augmentation of the output data. The cell density is usually a smoothed version of the raw dot labels. Similarly, cropping the whole images into patches can be taken as filtering the raw dot labels with a rectangle filter.
To better understand the effect of output data enhancement, two criteria of raw label coding are discussed in this paper: entropy and reversibility. The entropy measures how suitable the coding scheme is for training, and the reversibility measures how robustly the coding can be inverted to raw labels at prediction. Different raw label coding methods trade off these two criteria. For applications with different variations of shape, size and crowdedness of the cells, the coding scheme should be designed accordingly. A coding scheme with both high entropy and high reversibility is always preferred. However, in most cases a trade-off between these two indices has to be made depending on the training loss and the overall performance. For example, if the training loss with a coding scheme converges fast, but the accuracy is not ideal after inverting the outputs to the raw data, then a coding scheme with lower entropy but higher reversibility should be considered. We also propose a new coding scheme that balances these two criteria well in most cases. Fig. 1 shows one example of granule cell detection in the mouse brain with the proposed coding scheme. However, it should be emphasized that we are not proposing an optimal coding scheme for all types of cells. The coding scheme should be carefully designed for each application.
The rest of this paper is organized as follows. Section II reviews recent cell counting and detection works with CNNs, and highlights one common step in these works: raw dot label enhancement. Section III details the proposed coding evaluation criteria, and provides a cell center coding scheme based on the design criteria. Section IV verifies the proposed coding method with four cell datasets and two network architectures. Finally, Section V reviews the proposed method and discusses future work in cell analysis.
II Related works
II-A Object Detection in Natural Scenes and Cell Analysis
Object detection for natural scene images usually outputs a group of bounding boxes to represent the location and the size of detected objects. R-CNN, YOLO and their variants all adopt this box representation. Some widely used detection benchmarks, such as OTB and COCO, also use box labeling. By representing an object with a bounding box described by four numbers, the number of output nodes in a detection network is significantly reduced. For cell analysis, the center dots and contours of cells are more common labeling formats than the bounding box. Since labeling contours for training costs more time, dot labeling is used more often if the cell morphology is not of particular importance. In this paper, we focus on dot labeling for cell counting and detection.
Among the learning-based cell detection works, the raw dot labels are usually pre-processed before being set as the output data for training. Cropping whole images into small patches [14, 25] and transforming the dot labels into a density representation [20, 26] are two common approaches.
Designed for cell counting, the inputs of the CNN are patches of size pixels, and the output is the total number of cells in this patch. The CNN treats number estimation as a regression problem rather than a classification one. The reason is that the appearance difference between patches with similar numbers of cells should be smaller than that between patches with a great disparity in cell numbers. By cropping a whole image into multiple patches, the quantity of training examples is greatly increased. With the implementation of fully convolutional networks [27, 28], another way to understand the pipeline of cropping and counting is that the raw dot labels are filtered with a moving average filter of size . In Walach’s work, raw dot labels convolved with a Gaussian kernel are used for cell detection, and raw dot labels convolved with a human-shape kernel are used for pedestrian detection.
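The equivalence between cropping-and-counting and filtering can be checked on a toy example. The sketch below assumes a binary dot map stored as nested lists, stride-1 windows without padding, and a hypothetical patch size `s`; the helper name `patch_counts` is illustrative and not taken from the cited works.

```python
def patch_counts(dots, s):
    """Count the dots inside every s-by-s window (stride 1, no padding).

    This is the same as filtering the dot map with an s-by-s kernel of
    ones, i.e. an unnormalized moving-average (box) filter.
    """
    h, w = len(dots), len(dots[0])
    out = []
    for top in range(h - s + 1):
        row = []
        for left in range(w - s + 1):
            total = sum(dots[top + di][left + dj]
                        for di in range(s) for dj in range(s))
            row.append(total)
        out.append(row)
    return out


# Toy 4x4 dot map with three cell centers.
dots = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
counts = patch_counts(dots, 2)   # one count per 2x2 patch
whole = patch_counts(dots, 4)    # a single patch covering the image
```

Each entry of `counts` is the regression target that a patch-based counting network would see for the corresponding crop.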
Cell counting and detection are implemented by a CNN followed by a compressive sensing module. The input to the CNN is a patch of size containing multiple cells, and the output is a vector that contains compressed center location information. The compressed location information is computed by applying a learned sensing matrix to the vector of raw dot labels. The compressed vector is much shorter than the vector of raw dot labels. As a result, decoding with a sparse prior is used to recover the exact positions at prediction. The concept of coding the raw dot labels is emphasized, but coding with compressive sensing is used less than simple spatial filter codings for two reasons. First, the sensing matrix is an extra mapping relation learned by the neural network. Because of the size of the sensing matrix, a large amount of training data is required to avoid over-fitting and maintain good accuracy. Second, recovering the location information from the compressed vector can be time-consuming.
where is the Euclidean distance from the pixel to the closest cell center. Because proximity coding preserves local maxima at cell centers, such coding can be used for cell detection as well.
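As an illustration of a distance-based coding with local maxima at cell centers, the following sketch codes each pixel from its Euclidean distance `d` to the closest center. The functional form `1 / (1 + alpha * d)` inside a cut-off `radius`, the parameter names and their default values are assumptions chosen for this example; they are not taken from Eqn. 1.

```python
import math


def proximity_coding(shape, centers, alpha=0.8, radius=3.0):
    """Code dot labels so the response decays with the distance to the
    nearest cell center.

    Assumed form: 1 / (1 + alpha * d) for d <= radius, zero elsewhere.
    `centers` must contain at least one (row, col) pair.
    """
    h, w = shape
    coded = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Euclidean distance from pixel (i, j) to the closest center.
            d = min(math.hypot(i - ci, j - cj) for ci, cj in centers)
            if d <= radius:
                coded[i][j] = 1.0 / (1.0 + alpha * d)
    return coded


coded = proximity_coding((7, 7), [(3, 3)])
```

With this form the response equals one exactly at a cell center and decreases monotonically away from it, which is what makes local maximum detection applicable.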
Two observations can be drawn from recent works on cell analysis. First, transformation of the raw dot labels is necessary before training the network. As a result, a corresponding inverse transformation is required at the prediction phase. For cell counting, integration over the outputs serves this purpose, and for cell detection, local maximum detection is used. Second, cell counting and cell detection share many common components, and the two tasks can share the same pipeline. A more detailed comparison between cell counting and detection is provided in the next part.
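The two inverse transformations mentioned above can be sketched as follows. This is a minimal illustration, assuming the network output is a 2D list of floats; the 8-neighborhood, the strict comparison and the `threshold` value are choices made for this example.

```python
def count_by_integration(density):
    """Estimate the cell count as the sum over a density-style output."""
    return sum(sum(row) for row in density)


def local_maxima(response, threshold=0.5):
    """Return (row, col) positions whose value exceeds `threshold` and is
    strictly greater than all of its (up to 8) neighbors."""
    h, w = len(response), len(response[0])
    peaks = []
    for i in range(h):
        for j in range(w):
            v = response[i][j]
            if v <= threshold:
                continue
            neighbors = [response[a][b]
                         for a in range(max(0, i - 1), min(h, i + 2))
                         for b in range(max(0, j - 1), min(w, j + 2))
                         if (a, b) != (i, j)]
            if all(v > n for n in neighbors):
                peaks.append((i, j))
    return peaks


# A density map that integrates to one cell.
density = [[0.0, 0.125, 0.0],
           [0.125, 0.5, 0.125],
           [0.0, 0.125, 0.0]]
# A detection-style response with two peaks.
response = [[0.1, 0.2, 0.1, 0.0, 0.0],
            [0.2, 1.0, 0.3, 0.2, 0.1],
            [0.1, 0.3, 0.2, 0.9, 0.2],
            [0.0, 0.1, 0.2, 0.3, 0.1]]
n_cells = count_by_integration(density)
peaks = local_maxima(response)
```

Integration discards the positions and keeps only the total, whereas local maximum detection returns one position per detected cell.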
II-B Counting vs. Detection
Object counting is usually preferred over detection in applications where objects are crowded and single objects are not distinguishable. In this case, detection will not provide accurate location information, and texture information is a crucial clue to estimate the object density [30, 31]. However, if each single object can be identified, the detection approach has more advantages. First, detection provides the location information lost in counting. Second, training for counting involves more complexity than training for detection. In cell analysis applications, the size of cells is usually much smaller than the whole image. This indicates that a single cell can be recognized with a smaller receptive field than the whole image required by counting. Larger receptive fields of CNNs usually mean more trainable parameters and deeper networks. As mentioned before, cell analysis is a highly customized application, and retraining a model to fit a specific imaging modality and cell morphology is more effective than an overly general model. For this reason, cell detection enables a smaller network design that is easier to train with limited labeled data. In general, detection is a more appropriate approach if objects are not heavily occluded and each single object is recognizable. The major challenge of detection arises when objects are densely packed or partially occluded. In the next section, two criteria are proposed for the raw dot label coding for cell detection, and a new raw dot coding method, repel coding, is proposed to better tackle this challenge.
III Proposed coding scheme
Fig. 2 panels: (a) Dot labels, (b) Gaussian kernel, (c) Avg. kernel, (d) Proximity coding, (e) Repel coding.
The two proposed criteria for the center coding are entropy and reversibility. At the training phase, entropy characterizes if the coding scheme is easy for the neural network to learn. At the prediction phase, reversibility measures if the coding scheme can recover the raw dot labels robustly.
The entropy of a coding scheme is defined as:
where represents the coded value at position . Zero values in the transformed coding are excluded here, since these regions are usually far from any cell. Entropy measures how evenly the non-zero values are distributed. An ideal coding scheme should distribute the coding values uniformly over a range. By doing so, the gradient backpropagation during the training phase is more robust. A similar concept is mentioned when modeling the counting problem as regression rather than classification. The extreme case of low entropy coding is the raw dot labels, where the entropy is always zero.
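One way to make the entropy criterion concrete is a histogram-based Shannon entropy over the non-zero coded values. The sketch below is an illustration under assumptions: coded values are taken to lie in (0, 1], the bins are equal-width, and the logarithm is base 2. Eight bins are used as the default because that is the number of bins reported for Table I.

```python
import math


def coding_entropy(coded, bins=8):
    """Shannon entropy (in bits) of the histogram of non-zero coded values.

    Assumes coded values lie in (0, 1]; the equal-width binning and the
    log base are choices made for this sketch.
    """
    values = [v for row in coded for v in row if v > 0]
    if not values:
        return 0.0
    hist = [0] * bins
    for v in values:
        k = min(int(v * bins), bins - 1)  # v == 1.0 falls in the last bin
        hist[k] += 1
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in hist if c > 0)


# Raw dot labels: every non-zero value is 1, so one bin holds everything.
dots = [[0, 1, 0], [0, 0, 0], [1, 0, 1]]
# Values spread evenly over the eight bins (one value per bin center).
spread = [[(k + 0.5) / 8 for k in range(8)]]
```

Under these assumptions the raw dot labels give an entropy of zero, and a coding that fills all eight bins equally reaches the maximum of three bits.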
The reversibility of a coding is defined as,
where the mask defines the proximity region of cell centers. The binarized raw dot labels, or a dilated version of the raw dot labels, can be used as the mask. In the prediction phase, the output response is not identical to the ideal coding scheme. A robust coding scheme should be able to recover the original coding, the raw dot labels, in challenging cases. Reversibility is a similarity measurement between the raw dot labels and the coded response. Because local maximum detection is used to recover the raw dot labels at the prediction phase, reversibility here is defined as the degree of energy concentration around the raw dot labels.
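The idea of energy concentration around the raw dot labels can be illustrated with a simple ratio: the coded response that falls inside the mask divided by the total coded response. This ratio is an assumed reading of the criterion for illustration only, not a reproduction of the paper's equation; the function and variable names are hypothetical.

```python
def reversibility(coded, mask):
    """Fraction of the total coded response that lies inside the mask.

    `coded` and `mask` are 2D lists of the same shape; mask entries are
    truthy inside the proximity region of cell centers. The ratio form
    is an assumption made for this sketch.
    """
    total = sum(v for row in coded for v in row)
    if total == 0:
        return 0.0
    inside = sum(v
                 for c_row, m_row in zip(coded, mask)
                 for v, m in zip(c_row, m_row) if m)
    return inside / total


# Raw dot labels used both as the coding and as the mask.
dots = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
# A smoothed coding of the same single cell.
blurred = [[0.0, 0.125, 0.0],
           [0.125, 0.5, 0.125],
           [0.0, 0.125, 0.0]]
```

With this ratio, the raw dot labels reach the maximum value of one, while the smoothed coding keeps only half of its response at the center.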
For cell detection, a coding scheme with large entropy and reversibility indexes is preferred. As an extreme case, the dot label itself has the maximum reversibility index. However, the raw dot label has the smallest entropy index. This means it is hard for the neural network to learn raw dot labels. On the other hand, coding by the Gaussian kernel has a larger entropy but a lower reversibility index. The result is that networks trained with coding by a Gaussian kernel converge fast and robustly in terms of loss value, but center recovery is obscured in the prediction phase. More analysis on the entropy and reversibility trade-off is illustrated with experimental results in Section IV.
III-B Repel Coding
The proposed coding scheme of raw dot labels is based on the proximity coding defined in Eqn. 1. When proximity coding was first proposed, it was designed for cell counting. Because proximity coding produces local maxima at cell centers, it was later used for cell detection as well. However, one common challenge for cell detection is to distinguish two neighboring cells. For cell counting, only a global count is required. In other words, only the entropy is considered when coding raw dot labels for cell counting, but not the reversibility. In practice, we notice that proximity coding does not perform well for detection when cells are crowded. The response valley between two cells is not significant enough, and local maxima do not align accurately with cell centers during prediction. Aiming at increasing the reversibility of proximity coding, the proposed repel coding is defined as,
where is the distance of the pixel to its nearest cell center, and is the distance of the pixel to its second nearest cell center. The intermediate variable can be taken as suppressed by .
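To illustrate how the second nearest center can deepen the valley between neighboring cells, the sketch below inflates the nearest-center distance `d1` to `d1 * (1 + d1 / d2)`, where `d2` is the distance to the second nearest center, and then applies a proximity-style response. Both the inflation rule and the response `1 / (1 + alpha * d)` with a cut-off `radius` are assumptions made for this example; they are one plausible form and should not be read as the paper's exact equations.

```python
import math


def code_value(d, alpha, radius):
    """Proximity-style response for an (effective) distance d."""
    return 1.0 / (1.0 + alpha * d) if d <= radius else 0.0


def code_map(shape, centers, alpha=0.8, radius=5.0, repel=True):
    """Code cell centers on a grid of the given (height, width).

    With repel=False the effective distance is d1, the distance to the
    nearest center. With repel=True, d1 is inflated to d1 * (1 + d1 / d2),
    where d2 is the distance to the second nearest center. Both forms and
    the default alpha and radius are assumptions made for this sketch.
    """
    h, w = shape
    coded = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dists = sorted(math.hypot(i - ci, j - cj) for ci, cj in centers)
            d1 = dists[0]
            d2 = dists[1] if len(dists) > 1 else math.inf
            if repel and d2 > 0:
                d_eff = d1 * (1.0 + d1 / d2)
            else:
                d_eff = d1
            coded[i][j] = code_value(d_eff, alpha, radius)
    return coded


# Two neighboring cells four pixels apart on the same row.
centers = [(3, 2), (3, 6)]
prox = code_map((7, 9), centers, repel=False)
rep = code_map((7, 9), centers, repel=True)
# At the midpoint (3, 4), d1 == d2 == 2, so the effective distance doubles.
```

In this toy case both codings peak at one on the cell centers, but the response at the midpoint drops further with the repel-style suppression. With a single cell, `d2` is infinite and the two codings coincide.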
In Fig. 2, examples of different coding schemes are illustrated. Comparing Fig. 2 (d) and Fig. 2 (e), the proposed repel coding forms a more significant valley between two neighboring cells than proximity coding. Table I provides the entropy and the reversibility of the different coding schemes shown in Fig. 2. The entropy in Table I is calculated by separating the coded non-zero values into eight bins. The reversibility in Table I is calculated by using the raw dot labels as the mask in Eqn. 3. Because it is rare in practice that the centers of two cells are 1 pixel apart, the dilated reversibility in Table I is calculated by dilating the raw dot labels with a disk of diameter of pixels. The meaning of the entropy and the reversibility in Table I can be interpreted by comparing with the illustrations in Fig. 2. The coding scheme with the highest entropy is the Gaussian kernel, and the entropy of the proposed repel coding is slightly less than that of the Gaussian kernel. On visual inspection, the intensity variations of the Gaussian kernel and the repel coding are larger than those of the other codings. The proposed repel coding achieves the highest reversibility index except for raw dot labels. This also aligns with the visual inspection, where cell centers with the repel coding are more prominent than those with proximity coding.
III-C Relation to existing works
Besides the different coding schemes for cell analysis, coding of the raw labeled data is also widely adopted for computer vision tasks with natural scenes. The anchor box introduced in YOLO v2 resembles convolution kernels with different shapes. The watershed transformation for semantic segmentation is similar to the proposed repel coding. The difference is that in the watershed transformation, boundary information is of interest, while for cell detection, center information is the final output. A two-step coding scheme inspired by the watershed algorithm is effective in many semantic segmentation applications. The first step involves coding object boundaries. In this step, the coded response at each pixel is a two-dimensional unit vector that points to the closest boundary pixel. The coding in the first step aims at maximizing the reversibility of the raw labels. In the second step, the coded response at each pixel is the distance from the pixel to its closest boundary pixel. The second step focuses more on entropy maximization. These two steps are cascaded in the watershed transformation pipeline.
In general, different coding schemes are different ways to convert an end-to-end training framework into a stepwise implementation. Another way to understand coding the raw dot labels is to view the neural network as a signal processing system. As in the analog domain, designing filters with sharp responses such as the unit impulse response is challenging. By coding the raw dot labels, we impair the ideal response by smoothing it, but such smoothing is sometimes preferred because it is easier to implement.
In this part, different coding schemes are tested for four types of cells and with two CNN architectures. Experimental results show that the proposed repel coding outperforms existing coding schemes both for cell counting and detection tasks. Discussion of the examples from the four types of cells provides some insights into different coding schemes.
Four datasets are evaluated in the experiments: granule cells in the mouse dentate gyrus (DG), human adipocyte cells (Adip), human bone marrow cells (HBM) and Vgg-generated synthetic cells (Vgg). Fig. 3 shows examples from these four datasets.
IV-A1 DG dataset
The DG dataset comprises mouse brain tissues stained by tdTomato after seizure. These brain tissues include the V-shaped dentate gyrus, and the highlighted cells are granule cells. A Zeiss 780 confocal microscope with a C-Apochromat objective under 10X magnification is used to image the brain tissues. The DG dataset contains 26 high-resolution dentate gyrus images, and the image size is from to . More details about the DG dataset can be found in our previous work.
IV-A2 Adip dataset
The Adip dataset contains human subcutaneous adipose tissues obtained from the Genotype-Tissue Expression (GTEx) consortium. The available images of the Adip dataset are , and the size of each single cell is within . These adipocyte cells are densely packed, as shown in Fig. 3 (b). The Adip dataset contains 200 images.
IV-A3 HBM dataset
IV-A4 Vgg dataset
IV-B Experimental settings
Two network architectures are tested in these experiments. The training settings, including the cost function, the learning rate and the optimization algorithm, are kept the same across experiments. The learning rate is set to , the optimizer is Adam with default parameters, and the training batch size is set to eight for the HBM, Adip and Vgg datasets, and two for the DG dataset, to fit in the 12 GB memory of the NVIDIA TITAN Xp used in the experiments.
IV-B1 Cost function
IV-B2 Network architectures
Two CNNs based on the U-Net architecture are evaluated in the experiments. One CNN is the same as the FCNN-A, and the other replaces the convolutional layers in the FCNN-A with residual convolution blocks. The overall architecture of the FCNN-A is shown in Fig. 4. The activation function used in all layers is the rectified linear unit (ReLU). The receptive fields of these two CNNs are both .
Since coding schemes for both counting and detection are compared in the experiments, two measures are adopted here. For codings with the Gaussian kernel and the rectangle kernel, the integration over the outputs is taken as the total number of cells. For raw dot coding, proximity coding and repel coding, cell centers are extracted by local maximum detection. The F1 score is used as a comprehensive index to evaluate the detection accuracy. Alg. 1 summarizes the F1 score calculation. In Alg. 1, each non-paired detected cell center is matched to the closest non-paired ground truth cell center. Cell centers in the detected list and in the ground truth list can each be paired only once. A match with a distance less than the average radius of the cell is considered successful, since the cell size varies little within a dataset. The average radius is pixels for the DG dataset, pixels for the Adip dataset, pixels for the HBM dataset, and pixels for the Vgg dataset.
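The matching procedure described above can be sketched as a greedy one-to-one pairing followed by the usual F1 computation from precision and recall. This is a minimal illustration rather than a transcription of Alg. 1: the processing order of the detections, the handling of a failed match (the ground truth center stays available) and the function name are assumptions.

```python
import math


def f1_score(detected, truth, radius):
    """Greedy one-to-one matching of detected centers to ground truth.

    Each detection is matched to the closest still-unpaired ground truth
    center; the match counts as a true positive only if the distance is
    below `radius`. Returns the F1 score (0.0 if nothing matches).
    """
    unpaired = list(truth)
    true_pos = 0
    for d in detected:
        if not unpaired:
            break
        nearest = min(unpaired,
                      key=lambda g: math.hypot(d[0] - g[0], d[1] - g[1]))
        if math.hypot(d[0] - nearest[0], d[1] - nearest[1]) < radius:
            unpaired.remove(nearest)  # each center can be paired only once
            true_pos += 1
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(detected)
    recall = true_pos / len(truth)
    return 2 * precision * recall / (precision + recall)


truth = [(10, 10), (20, 20), (30, 30)]
detected = [(11, 10), (20, 22), (50, 50)]
score = f1_score(detected, truth, radius=3)
# A duplicated detection of one cell is penalized through precision.
dup_score = f1_score([(10, 10), (10, 11)], [(10, 10)], radius=3)
```

In the first case two of the three detections match, giving precision and recall of 2/3 each. In the second case the duplicate cannot be paired, so precision drops to 1/2 while recall stays at 1.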
IV-C Results and analysis
With the four datasets, five coding methods and two CNN implementations, 40 sub-experiments are evaluated. Each sub-experiment tests one coding method with one CNN implementation on one dataset. For each dataset, 80% of the data are used for training, and 20% are used for testing. Each sub-experiment is run five times with random training/test splitting, and the average performance is reported in Table II and Table III. The proposed repel coding achieves the best performance in most sub-experiments. The only exception is the sub-experiment on the Vgg dataset with the FCNN-A implementation, where the performance of raw dot labels is slightly better than the proposed method. The reason may be that the cell variance in the Vgg dataset is lower than in the other datasets, which makes it a less challenging dataset. Comparing the results in Table II and Table III, another observation is that the performance of all the coding methods benefits from the residual convolution blocks. This result is expected, since the increased effectiveness of the residual convolution block has been demonstrated in previous cell analysis works [14, 26].
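The evaluation protocol (five runs, each with a fresh random 80/20 split, then averaging) can be outlined as below. This is a sketch under assumptions: `evaluate` is a hypothetical callable that would train and test one coding scheme on the given index lists and return a score, and the run index is used as the random seed only to make the example repeatable.

```python
import random


def random_split(n_items, train_fraction=0.8, seed=None):
    """Shuffle item indices and split them into training and test lists."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n_items))
    return indices[:cut], indices[cut:]


def average_over_runs(n_items, evaluate, runs=5):
    """Average the score returned by `evaluate` over several random splits.

    `evaluate(train_idx, test_idx)` is a placeholder for training and
    testing one sub-experiment; it is not defined by the paper.
    """
    scores = []
    for run in range(runs):
        train_idx, test_idx = random_split(n_items, seed=run)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


# Example with 200 images, the size reported for the Adip dataset.
train_idx, test_idx = random_split(200, seed=0)
# Dummy evaluation that only reports the test fraction.
mean_score = average_over_runs(200, lambda tr, te: len(te) / 200)
```

Averaging over several random splits reduces the dependence of the reported score on one particular partition, which matters for small datasets such as the 26-image DG set.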
To clarify why the proposed repel coding outperforms the others, examples from the four datasets are shown in Figs. 5-7. Fig. 5 shows an example from the DG dataset with the FCNN-A network. When two cells are close, proximity coding tends to merge the two centers. The reason can be found by comparing the prediction results with the illustrations in Fig. 2. The proposed repel coding suppresses the responses of pixels that lie in the middle of two cell centers, and boosts the responses that are close to cell centers. The outputs from the Gaussian kernel coding and the rectangle kernel coding in Fig. 5 are as expected, and do not allow cell centers to be recovered. Fig. 6 shows an example from the Vgg dataset trained by the CNN with residual convolution blocks. The advantage of the repel coding is obvious in the partially occluded regions. The outputs of the raw dot labels in Fig. 6 (a) are unstable, and tend to contain duplicated cell centers. In addition, we find that training with raw dot labels can easily diverge if the training batch size is less than eight on the Adip and Vgg datasets, which does not happen with the other coding schemes. Image sizes of the Adip and Vgg datasets are smaller than those of the DG and HBM datasets. The divergence may be due to the sparsity in the raw dot labels, which leads to insufficient positive training examples. Finally, Fig. 7 compares the performance of proximity coding and repel coding on the Adip and HBM datasets. In these two datasets, occlusion is less common, but cell appearance varies more. Because the proposed repel coding has a larger reversibility index, it generally provides stronger responses around cell centers, resulting in more robust center detection.
Table II column headers: Dot label, Gaus. kernel, Rec. kernel, Proximity, Repel.
Table III column headers: Dot label, Gaus. kernel, Rec. kernel, Proximity, Repel.
In this paper, after reviewing recent learning-based works on cell counting and detection, the common step of coding raw dot labels is extracted and discussed. Two center coding criteria are proposed: entropy and reversibility. These two criteria help predict the performance of a coding scheme at the training and prediction steps. A new coding scheme, repel coding, is proposed to better balance these two center coding criteria. Experimental results verify the effectiveness of repel coding for cell detection on four types of cells. In the future, we would like to further explore the cell activation topology with the detected cell centers.
The authors wish to thank Dr. Suchitra Joshi for her help preparing the brain tissues of the DG dataset, and Anvitha Kambham and Smriti Subedi for their help labeling the DG dataset.
[1] M. Ignatiadis, M. Buyse, and C. Sotiriou, “St Gallen international expert consensus on the primary therapy of early breast cancer: an invaluable tool for physicians and scientists,” Annals of Oncology, vol. 26, no. 8, pp. 1519–1520, 2015.
[2] R. L. Camp, G. G. Chung, and D. L. Rimm, “Automated subcellular localization and quantification of protein expression in tissue microarrays,” Nature Medicine, vol. 8, no. 11, pp. 1323–1328, 2002.
[3] E. Meijering, “Cell segmentation: 50 years down the road,” IEEE Signal Processing Magazine, vol. 29, no. 5, pp. 140–145, 2012.
[4] T. Tikkanen, P. Ruusuvuori, L. Latonen, and H. Huttunen, “Training based cell detection from bright-field microscope images,” in Image and Signal Processing and Analysis, 2015, pp. 160–164.
[5] C. Arteta, V. Lempitsky, J. A. Noble, and A. Zisserman, “Learning to detect cells using non-overlapping extremal regions,” in MICCAI, 2012, pp. 348–356.
[6] E. Bernardis and S. X. Yu, “Pop out many small structures from a very large microscopic image,” Medical Image Analysis, vol. 15, no. 5, pp. 690–707, 2011.
[7] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image Vision Comput., vol. 22, no. 10, pp. 761–767, 2004.
[8] D. G. Lowe, “Object recognition from local scale-invariant features,” in ICCV, 1999.
[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in CVPR, 2005, pp. 886–893.
[10] M. Sussman, P. Smereka, and S. Osher, “A level set approach for computing solutions to incompressible two-phase flow,” Journal of Computational Physics, vol. 114, no. 1, pp. 146–159, 1994.
[11] T. F. Chan and L. A. Vese, “Active contours without edges,” IEEE Transactions on Image Processing, vol. 10, no. 2, pp. 266–277, 2001.
[12] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI, 2015, pp. 234–241.
[13] W. Xie, J. A. Noble, and A. Zisserman, “Microscopy cell counting and detection with fully convolutional regression networks,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, vol. 6, no. 3, pp. 283–292, 2018.
[14] Y. Xue, N. Ray, J. Hugh, and G. Bigras, “Cell counting by regression using convolutional neural network,” in ECCV, 2016, pp. 274–290.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NIPS, 2012, pp. 1097–1105.
[16] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. Devito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS Workshop, 2017.
[17] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[18] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
[19] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, “Maxout networks,” 2013, arXiv 1302.4389.
[20] J. P. Cohen, G. Boucher, C. A. Glastonbury, H. Z. Lo, and Y. Bengio, “Count-ception: Counting by fully convolutional redundant counting,” in ICCV Workshop, 2017, pp. 18–26.
[21] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in CVPR, 2014, pp. 580–587.
[22] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in CVPR, 2016, pp. 779–788.
[23] Y. Wu, J. Lim, and M. H. Yang, “Online object tracking: A benchmark,” in CVPR, 2013, pp. 2411–2418.
[24] T. Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollar, “Microsoft COCO: Common objects in context,” in ECCV, 2014, pp. 740–755.
[25] Y. Xie, F. Xing, X. Kong, H. Su, and L. Yang, “Beyond classification: structured regression for robust cell detection using convolutional neural network,” in MICCAI, 2015, pp. 358–365.
[26] E. Walach and L. Wolf, “Learning to count with CNN boosting,” in ECCV, 2016, pp. 660–676.
[27] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
[28] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” in ICCV, 2017, pp. 2980–2988.
[29] Y. Xue and N. Ray, “Cell detection with deep convolutional neural network and compressed sensing,” 2017, arXiv 1708.03307.
[30] C. Shang, H. Ai, and B. Bai, “End-to-end crowd counting via joint learning local and global count,” in ICIP, 2016, pp. 1215–1219.
[31] H. Idrees, I. Saleemi, C. Seibert, and M. Shah, “Multi-source multi-scale counting in extremely dense crowd images,” in CVPR, 2013, pp. 2547–2554.
[32] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” in CVPR, 2017, pp. 7263–7271.
[33] M. Bai and R. Urtasun, “Deep watershed transform for instance segmentation,” in CVPR, 2017, pp. 2858–2866.
[34] H. Liang, N. Dabrowska, J. Kapur, and D. S. Weller, “Structure-based Intensity Propagation for 3D Brain Reconstruction with Multilayer Section Microscopy,” IEEE Transactions on Medical Imaging, 2018, in press.
[35] P. Kainz, “You Should Use Regression to Detect Cells,” in MICCAI, 2015, pp. 276–283.
[36] V. Lempitsky and A. Zisserman, “Vgg cell dataset from learning to count objects in images,” 2010.
[37] A. Lehmussola, P. Ruusuvuori, J. Selinummi, H. Huttunen, and O. Yli-Harja, “Computational framework for simulating fluorescence microscope images with cell populations,” IEEE Transactions on Medical Imaging, vol. 26, no. 7, pp. 1010–1016, 2007.
[38] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv 1412.6980.
[39] Y. Sasaki, “The truth of the F-measure,” Teach Tutor mater, vol. 1, no. 5, pp. 1–5, 2007.