Cost-efficient segmentation of electron microscopy images using active learning

Acknowledgements: We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research. Y.S. is a Marylou Ingram Scholar.
Over the last decade, electron microscopy has improved to the point that generating high-quality, gigavoxel-sized datasets requires only a few hours. Automated image analysis, particularly image segmentation, has however not evolved at the same pace. Even though state-of-the-art methods such as U-Net and DeepLab have improved segmentation performance substantially, the required amount of labels remains too expensive. Active learning is the subfield of machine learning that aims to mitigate this burden by selecting the samples that require labeling in a smart way. Many techniques have been proposed, particularly for image classification, to increase the steepness of learning curves. In this work, we extend these techniques to deep CNN-based image segmentation. Our experiments on three different electron microscopy datasets show that active learning can improve segmentation quality by 10 to 15% in terms of Jaccard score compared to standard randomized sampling.
Keywords: Electron microscopy, Image segmentation, Active learning.
Semantic image segmentation, the task of assigning pixel-level object labels to an image, is fundamental to many applications and one of the most challenging problems in generic computer vision. This is particularly true in biomedical imaging such as electron microscopy (EM), where annotated data is sparse and the image data is high-resolution (approximately 5 nm) with ultrastructural content. Nevertheless, deep learning has brought significant improvements to this research domain in recent years [6, 11, 8].
Despite these impressive advances, state-of-the-art techniques mostly rely on large annotated datasets. This is an impractical assumption that is only satisfied for particular use cases such as neuron segmentation [2]. For segmentation of other classes, researchers often fall back on manual segmentation or on interactive approaches that rely on shallow segmentation algorithms [14, 3, 1], which is either costly or sacrifices performance.
This work focuses on active learning, a subdomain of machine learning that aims to minimize supervision without sacrificing predictive accuracy. This is achieved by iteratively querying a batch of samples to a label-providing oracle, adding them to the training set and retraining the predictor. The challenge is to come up with a smart selection criterion to query samples and maximize the steepness of the training curve [13].
In this work, we apply state-of-the-art active learning approaches, commonly used for classification, to image segmentation. In particular, we illustrate on three EM datasets that the number of annotated samples can be reduced to a few hundred while obtaining close to fully supervised performance. We start by formally defining the active learning problem in the context of image segmentation in Section 2. In Section 3, we give an overview of commonly used, recent active learning approaches in classification and show how these techniques can be used in segmentation. This is followed by experimental results and a discussion in Section 4. Lastly, the paper is concluded in Section 5.
We consider the task of image segmentation, i.e. given an $H \times W$ pixel image $x$, we aim to compute a pixel-level labeling $y \in \mathcal{Y}^{H \times W}$, where $\mathcal{Y} = \{0, \dots, C-1\}$ is the label space and $C$ is the number of classes. We focus in particular on the case of binary segmentation, i.e. $C = 2$. Let $p_{\theta}(y_j = c \mid x)$ be the class probability distribution of pixel $j$, as predicted by a segmentation algorithm with parameters $\theta$ (for example, an encoder-decoder network such as U-Net [11]).
Consider a large pool of $n$ data points $\{(x_i, y_i)\}_{i \in [n]}$, sampled i.i.d. over the space of images and labelings, where $[n] = \{1, \dots, n\}$, and an initial pool of $m$ randomly chosen distinct data points indexed by $S_0 \subset [n]$. An active learning algorithm initially only has access to the images $\{x_i\}_{i \in [n]}$ and the labels $\{y_i\}_{i \in S_0}$, and iteratively extends the currently labeled pool $S_t$ by querying $k$ samples from the unlabeled set to an oracle. After iteration $t$, the predictor is retrained with the available samples and the labels $\{y_i\}_{i \in S_t}$, thereby improving the segmentation quality. Note that, without loss of generality, the active learning approaches below are described for $k = 1$, as we can also query $k$ samples by running $k$ iterations without retraining. The complete active learning workflow is shown in Figure 1.
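The workflow described above can be sketched as a generic pool-based loop. This is a minimal illustration under stated assumptions, not the implementation used in this work: the names `active_learning_loop`, `oracle`, `train` and `score` are hypothetical, and the model is simply retrained on all labeled samples in every iteration.

```python
import numpy as np

def active_learning_loop(pool, oracle, train, score, n_init, n_query, n_iter, seed=0):
    """Generic pool-based active learning loop (illustrative sketch).

    pool   : array of samples, shape (n, ...)
    oracle : callable, sample index -> label (stands in for the annotator)
    train  : callable, (samples, labels) -> model
    score  : callable, (model, samples) -> one informativeness score per sample
    """
    rng = np.random.default_rng(seed)
    # Initial labeled pool: n_init distinct, randomly chosen indices.
    labeled = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    labels = {i: oracle(i) for i in labeled}
    model = train(pool[labeled], np.array([labels[i] for i in labeled]))
    for _ in range(n_iter):
        unlabeled = np.setdiff1d(np.arange(len(pool)), labeled)
        if len(unlabeled) == 0:
            break
        scores = np.asarray(score(model, pool[unlabeled]))
        # Query the n_query highest-scoring unlabeled samples to the oracle.
        picked = unlabeled[np.argsort(-scores)[:n_query]]
        for i in picked:
            labels[int(i)] = oracle(int(i))
            labeled.append(int(i))
        # Retrain on everything labeled so far.
        model = train(pool[labeled], np.array([labels[i] for i in labeled]))
    return model, labeled
```

Any of the selection criteria in Section 3 can be plugged in through `score`, as long as a higher score means that a sample should be queried earlier.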
3 Active learning
In the following sections, we discuss five well-known and recent active learning approaches for classification: maximum entropy selection [9, 10], least confidence selection [4], Bayesian active learning disagreement [7], k-means sampling [5] and core set active learning [12]. Furthermore, we show how these techniques can be applied to image segmentation.
3.1 Maximum entropy sampling
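Maximum entropy is a straightforward selection criterion that aims to select samples for which the predictions are uncertain [9, 10]. Formally speaking, we adjust the selection criterion to a pixel-wise entropy calculation as follows:

$$x^{*} = \arg\max_{x} \; -\sum_{j} \sum_{c=0}^{C-1} p_{\theta}(y_j = c \mid x) \, \log p_{\theta}(y_j = c \mid x) \qquad (1)$$

where $p_{\theta}(y_j = c \mid x)$ is the predicted probability of class $c$ at pixel $j$, $C$ is the number of classes, and the maximum is taken over the unlabeled samples. In other words, the entropy is calculated for each pixel and summed over the image. Note that a high entropy is obtained when $p_{\theta}(y_j = c \mid x) = 1/C$ for all classes, which is exactly when there is no real consensus on the predicted class (i.e. high uncertainty).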
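As an illustration of the cumulated pixel-wise entropy, the sketch below scores a batch of predicted probability maps and returns the indices of the highest-scoring samples. The function names and the array layout `(n_samples, n_classes, H, W)` are assumptions made for this example.

```python
import numpy as np

def entropy_score(probs, eps=1e-12):
    """Cumulated pixel-wise entropy per sample.

    probs: array of shape (n_samples, n_classes, H, W) whose class
    probabilities sum to 1 along axis 1.
    """
    p = np.clip(probs, eps, 1.0)                   # avoid log(0)
    pixel_entropy = -(p * np.log(p)).sum(axis=1)   # shape (n_samples, H, W)
    return pixel_entropy.sum(axis=(1, 2))          # sum over all pixels

def select_max_entropy(probs, k):
    """Indices of the k samples with the highest cumulated entropy."""
    return np.argsort(-entropy_score(probs))[:k]
```

For binary segmentation, a pixel predicted at 0.5 contributes the maximum of log 2 to the score, whereas a pixel predicted close to 0 or 1 contributes almost nothing.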
3.2 Least confidence sampling
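Similar to maximum entropy sampling, the least confidence criterion selects samples for which the predictions are uncertain:

$$x^{*} = \arg\min_{x} \; \sum_{j} \max_{c} \; p_{\theta}(y_j = c \mid x) \qquad (2)$$

where $p_{\theta}(y_j = c \mid x)$ is the predicted probability of class $c$ at pixel $j$. As the name suggests, the least confidence criterion considers the probability that corresponds to the predicted class. Whenever this probability is small, the predictor is not confident about its decision. For image segmentation, we sum the maximum probabilities over all pixels and select the samples for which this sum is smallest, i.e. the least confident samples.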
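A minimal sketch of this criterion for segmentation is given below. The function names and the array layout `(n_samples, n_classes, H, W)` are assumptions made for this example.

```python
import numpy as np

def least_confidence_score(probs):
    """Sum over all pixels of the probability of the predicted class.

    probs: array of shape (n_samples, n_classes, H, W).
    A low score means that the predictor is not confident about the sample.
    """
    return probs.max(axis=1).sum(axis=(1, 2))

def select_least_confident(probs, k):
    """Indices of the k samples with the lowest cumulated confidence."""
    return np.argsort(least_confidence_score(probs))[:k]
```

Note the direction of the sort: unlike the entropy criterion, the samples with the smallest score are queried first.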
3.3 Bayesian active learning disagreement
The Bayesian active learning disagreement (BALD) approach [7] is specifically designed for convolutional neural networks (CNNs). It makes use of Bayesian CNNs in order to cope with the small amounts of training data that are usually available in active learning workflows. A Bayesian CNN assumes a prior probability distribution $p(\theta)$ placed over the model parameters $\theta$. The uncertainty in the weights induces prediction uncertainty by marginalising over the approximate posterior $q(\theta)$:

$$p(y_j = c \mid x) \approx \int p_{\theta}(y_j = c \mid x) \, q(\theta) \, d\theta \approx \frac{1}{T} \sum_{t=1}^{T} p_{\theta_t}(y_j = c \mid x), \qquad \theta_t \sim q(\theta)$$

where $q(\theta)$ is the dropout distribution, which approximates the posterior distribution over the parameters, and $T$ is the number of stochastic forward passes. In other words, a CNN is trained with dropout, and inference is performed with dropout left on. This causes uncertainty in the outcome that can be used in existing criteria such as maximum entropy (Equation (1)).
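The averaging over stochastic forward passes can be sketched as follows. This only illustrates the Monte Carlo dropout step, not a full BALD implementation: `make_toy_forward` is a hypothetical stand-in for a segmentation network with dropout left on, and all names are assumptions made for this example.

```python
import numpy as np

def mc_dropout_predict(stochastic_forward, x, n_passes=20):
    """Average class probabilities over several stochastic forward passes."""
    samples = np.stack([stochastic_forward(x) for _ in range(n_passes)])
    return samples.mean(axis=0)

def make_toy_forward(weights, drop_rate=0.5, seed=0):
    """Toy per-pixel logistic model whose weights are randomly dropped on
    every call (hypothetical stand-in for a CNN with dropout enabled)."""
    rng = np.random.default_rng(seed)

    def forward(x):
        # x: (n_samples, H, W, n_features); a new dropout mask is drawn per call.
        mask = rng.random(weights.shape) >= drop_rate
        w = weights * mask / (1.0 - drop_rate)
        logit = x @ w                                # (n_samples, H, W)
        p1 = 1.0 / (1.0 + np.exp(-logit))
        return np.stack([1.0 - p1, p1], axis=1)      # (n_samples, 2, H, W)

    return forward
```

The averaged probabilities have the same layout as a single prediction, so they can be passed directly to an uncertainty criterion such as the cumulated pixel-wise entropy.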
3.4 K-means sampling
Uncertainty-based approaches typically sample close to the decision boundary of the classifier. This introduces an implicit bias that does not allow for data exploration. Most explorative approaches that aim to solve this problem transform the input to a more compact and efficient representation (e.g. the feature representation before the fully connected stage in a classification CNN). The representation that we used in our segmentation approach was the bottleneck representation in the U-Net. The k-means sampling approach then finds clusters in this embedding using k-means clustering. The selected samples are the samples in the different clusters that are closest to the centroids.
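The selection step can be sketched as follows, assuming that each sample has already been mapped to a flat feature vector (for instance a flattened or pooled bottleneck representation). The clustering below is a plain Lloyd iteration written for illustration; the function names, the number of iterations and the initialisation are assumptions, and a library implementation of k-means could be used instead.

```python
import numpy as np

def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct, randomly chosen samples.
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = features[assign == c]
            if len(members) > 0:   # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids

def kmeans_sampling(features, k, **kwargs):
    """Indices of the samples closest to each of the k centroids."""
    features = np.asarray(features, dtype=float)
    centroids = kmeans(features, k, **kwargs)
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    # Two centroids can share a nearest sample, so fewer than k indices may result.
    return np.unique(dists.argmin(axis=0))
```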
3.5 Core set active learning
The core set approach [12] is a recently proposed active learning approach for CNNs that is not based on uncertainty or exploratory sampling. Similar to k-means, samples are selected from an embedding, in such a way that a model trained on the selected samples is competitive on the remaining samples. As before, the representation that we used in our segmentation approach was the bottleneck representation in the U-Net. In order to obtain such competitive samples, this approach aims to minimize the so-called core set loss. This is the difference between the average empirical loss over the set of labeled samples and the average empirical loss over the entire dataset, including unlabeled points.
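The core set loss cannot be evaluated directly, because the labels of the unlabeled points are unknown. One common way to approximate the selection is greedy k-center selection in the embedding: repeatedly pick the sample that is farthest from everything labeled or selected so far. The sketch below illustrates that strategy; whether the experiments in this work used this exact variant is not stated, so it should be read as an assumption, and the function name is hypothetical.

```python
import numpy as np

def k_center_greedy(features, labeled_idx, k):
    """Greedy k-center selection in an embedding.

    features    : array of shape (n_samples, n_features)
    labeled_idx : non-empty list of indices that are already labeled
    k           : number of samples to select
    """
    features = np.asarray(features, dtype=float)
    labeled_idx = list(labeled_idx)
    # Distance of every sample to its nearest labeled sample.
    diffs = features[:, None, :] - features[labeled_idx][None, :, :]
    min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    selected = []
    for _ in range(k):
        idx = int(min_dist.argmax())   # farthest sample from the current set
        selected.append(idx)
        new_dist = np.linalg.norm(features - features[idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected
```

Labeled and previously selected samples have distance zero to the current set, so they are only picked again once every remaining sample also has distance zero.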
4 Experiments & discussion
Three public EM datasets were used to validate our approach:
- The EPFL dataset (data available at https://cvlab.epfl.ch/data/data-em/) represents a section taken from the CA1 hippocampus region of the brain. Two subvolumes were manually labeled by experts for mitochondria. The data was acquired by a focused ion-beam scanning EM.
- The VNC dataset (data available at https://github.com/unidesigner/groundtruth-drosophila-vnc/) represents two sections taken from the Drosophila melanogaster third instar larva ventral nerve cord. One stack was manually labeled by experts for mitochondria. The data was acquired by a transmission EM.
- The MiRA dataset (data available at http://220.127.116.11/MiRA/mitochondria31/) [15] represents a section taken from the mouse cortex. The complete volume was manually labeled by experts for mitochondria. The data was acquired by an automated tape-collecting ultramicrotome scanning EM.
To properly validate the discussed approaches, we split the available labeled data into a training and a testing set. In the cases of a single labeled volume (VNC and MiRA), we split the dataset halfway along one axis. A smaller U-Net (with 4 times fewer feature maps) was initially trained for 500 epochs on 20 randomly selected samples in the training volume. Next, we consider a pool of samples in the training data to be queried. In each iteration, a batch of samples is selected from this pool based on one of the discussed selection criteria and added to the labeled set, after which the segmentation network is finetuned for 200 epochs. This procedure is repeated until the training set reaches its maximum size of 500 samples. We validate the segmentation performance with the well-known Jaccard score:

$$J(P, G) = \frac{|P \cap G|}{|P \cup G|}$$

where $P$ and $G$ are the sets of pixels labeled as foreground in the predicted and the ground-truth segmentation, respectively. This segmentation metric is also known as the intersection-over-union (IoU).
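For binary masks, the Jaccard score can be computed as in the sketch below. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention chosen for this example.

```python
import numpy as np

def jaccard_score(pred, target):
    """Intersection-over-union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0   # both masks are empty: treated as a perfect match
    return float(np.logical_and(pred, target).sum() / union)
```

As a worked example, a prediction with 2 foreground pixels and a ground truth with 2 foreground pixels that overlap in 1 pixel gives an intersection of 1 and a union of 3, i.e. a Jaccard score of 1/3.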
The resulting performance curves of the discussed approaches on the three datasets are shown in Figure 2. We additionally show the performance obtained by full supervision (i.e. all labels are available during training), which is the maximum achievable segmentation performance. In comparison to the random sampling baseline, we observe that the maximum entropy, least confidence and BALD approaches perform significantly better. These methods obtain about a 10 to 15% performance increase for the same number of available labels on all datasets. The recently proposed core set approach performs similarly to or slightly better than the baseline. We expect that this method can be improved by considering alternative embeddings. Lastly, we see that k-means performs significantly worse than random sampling. Even though this could also be an embedding problem, as with the core set approach, we think that exploratory sampling alone will not allow the predictor to learn from challenging samples, which are usually outliers. We expect that a hybrid approach based on both exploration and uncertainty might lead to better results, and consider this future work.
Figure 3 shows qualitative segmentation results on the EPFL dataset. In particular, we show results of the random, k-means and maximum entropy sampling methods using 120 samples, and compare these to the fully supervised approach. The maximum entropy sampling technique outperforms the others by a large margin and significantly closes the gap to fully supervised learning.
Lastly, we are interested in what type of samples the active learning approaches select for training. Figure 4 shows 4 samples of the VNC dataset that correspond to the highest prioritized samples, according to the least confidence criterion, that were selected in the first 4 iterations. The top row illustrates the probability predictions of the network at that point in time, whereas the bottom row shows the pixel-wise uncertainty of the sample (i.e. the maximum in Equation (2)). Note that the initial predictions are of poor quality, as the network was only trained on 20 samples. Moreover, the uncertainty is high in regions where the network is uncertain, but it is low in regions where the network is wrong. The latter is a common issue in active learning and is related to the exploration vs. uncertainty trade-off. However, over time, we see that the network performance improves and more challenging samples are queried to the oracle.
Image segmentation is one of the most challenging computer vision tasks, particularly for biomedical data such as electron microscopy, where annotations are sparsely available. In order to be practically usable and scalable, image segmentation algorithms such as U-Net need to be able to cope with smaller amounts of annotated data. In this work, we propose to employ recent active learning approaches to minimize annotation efforts for training segmentation networks. Specifically, several of these approaches (e.g. maximum entropy and least confidence sampling) obtain the same performance as the random sampling baseline, but require 4 times fewer annotations. In future work, we will further minimize labeling efforts by combining this active learning paradigm with weakly supervised approaches (i.e. using partially annotated data).
References
[1] (2017) Trainable Weka Segmentation: A machine learning tool for microscopy pixel classification. Bioinformatics.
[2] (2015) Crowdsourcing the creation of image segmentation algorithms for connectomics. Frontiers in Neuroanatomy 9.
[3] (2016) Microscopy Image Browser: A Platform for Segmentation and Analysis of Multidimensional Datasets. PLoS Biology 14 (1).
[4] (2003) Incorporating Diversity in Active Learning with Support Vector Machines. In Proceedings, Twentieth International Conference on Machine Learning.
[5] (2011) Active Learning with Clustering. Active Learning and Experimental Design @ AISTATS.
[6] (2012) Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images. NIPS, pp. 1–9.
[7] (2017) Deep Bayesian active learning with image data. In International Conference on Machine Learning.
[8] (2018) High-precision automated reconstruction of neurons with flood-filling networks. Nature Methods.
[9] (2009) Multi-class active learning for image classification. In IEEE Conference on Computer Vision and Pattern Recognition.
[10] (2013) Adaptive Active Learning for Image Classification. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 859–866.
[11] (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pp. 234–241.
[12] (2018) Active Learning for Convolutional Neural Networks: A Core-Set Approach. In International Conference on Learning Representations.
[13] (2010) Active Learning Literature Survey. Technical report, University of Wisconsin.
[14] (2011) Ilastik: Interactive learning and segmentation toolkit. In IEEE International Symposium on Biomedical Imaging, pp. 230–233.
[15] (2018) Automatic Mitochondria Segmentation for EM Data Using a 3D Supervised Convolutional Network. Frontiers in Neuroanatomy.