BCCNet: Bayesian classifier combination neural network

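Olga Isupova (olga.isupova@eng.ox.ac.uk)
Yunpeng Li (yunpeng.li@surrey.ac.uk)
Danil Kuzin (dkuzin1@sheffield.ac.uk)
Stephen J Roberts (sjrob@robots.ox.ac.uk)
Katherine Willis (kathy.willis@zoo.ox.ac.uk)
Steven Reece (reece@robots.ox.ac.uk)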
Department of Engineering Science, University of Oxford, UK
Department of Computer Science, University of Surrey, UK
Department of Automatic Control and Systems Engineering, University of Sheffield, UK
Department of Zoology, University of Oxford, UK

Machine learning research for developing countries can demonstrate clear sustainable impact by delivering actionable and timely information to in-country government organisations (GOs) and NGOs in response to their critical information requirements. We co-create products with UK and in-country commercial, GO and NGO partners to ensure the machine learning algorithms address appropriate user needs whether for tactical decision making or evidence-based policy decisions. In one particular case, we developed and deployed a novel algorithm, BCCNet, to quickly process large quantities of unstructured data to prevent and respond to natural disasters. Crowdsourcing provides an efficient mechanism to generate labels from unstructured data to prime machine learning algorithms for large scale data analysis. However, these labels are often imperfect with qualities varying among different citizen scientists, which prohibits their direct use with many state-of-the-art machine learning techniques. We describe BCCNet, a framework that simultaneously aggregates biased and contradictory labels from the crowd and trains an automatic classifier to process new data. Our case studies, mosquito sound detection for malaria prevention and damage detection for disaster response, show the efficacy of our method in the challenging context of developing world applications.




NeurIPS 2018 Workshop on Machine Learning for the Developing World (ML4D), Montréal, Canada.

1 Introduction

Wide area situation awareness or surveillance, for example, following a natural disaster or preempting disease, benefits from rich, up-to-date yet unstructured data, including post-hurricane satellite imagery and malarial mosquito audio signals. A small amount of data labelled by hand through crowdsourcing platforms like Zooniverse (https://www.zooniverse.org) can be used to train machine learning algorithms, such as neural networks (NNs), to label the rest of the data [1]. However, the crowdsourced labels can be noisy and inconsistent, posing enormous challenges for machine learning algorithms to aggregate information and produce the best decisions for policy makers and rescue workers [2]. The Bayesian classifier combination (BCC) algorithm [3] resolves classifier bias and aggregates labels taking classifier consistency into account.

Figure 1: Heatmap of building damage proportion in Northern Dominica after Hurricane Maria in 2017: less than (green), (magenta), greater than (red).

We propose an extension to BCC, the Bayesian classifier combination neural network (BCCNet), which incorporates a neural network object classifier. BCCNet trains the neural network object classifier on crowd labels that have been bias-corrected by BCC. A novel hybrid variational Bayesian and maximum likelihood approach is developed to jointly learn the neural network and BCC parameters. We demonstrate the efficacy of the approach on imbalanced data and biased crowd labels, scenarios common in real applications.

Our algorithm has been developed and deployed in collaboration with Zooniverse and Rescue Global, a UK-based not-for-profit (the Rescue Global, Oxford machine learning and Zooniverse operational response team is collectively called the ‘Planetary Response Network’), to generate damage heatmaps for disaster responders by combining crowd labels of satellite imagery immediately following Hurricanes Irma and Maria (2017) [3, 4, 5] (see Figure 1) and earlier versions following earthquakes in Nepal (2015) and Ecuador (2016). These heatmaps were passed to the UN, FEMA and over 60 NGOs during the response phase of Irma and Maria in a timely manner. This work has led to several research projects in disaster management and environment protection in Africa, South East Asia and South America. Our Zooniverse project on mosquito detection has crowdsourced labels from more than 1200 citizen scientists on data collected in Thailand, Kenya, US and UK.

The rest of the paper is organised as follows: Section 2 describes the BCCNet model. We present two case studies in Section 3 and conclusions in Section 4.

2 The Bayesian Classifier Combination Neural Network Algorithm

BCCNet is a multi-class classifier that combines high dimensional data (e.g., images, audio signals) and noisy, potentially biased crowdsourced labels from a set of imperfect base classifiers (e.g., crowd members). It integrates a neural network with the independent Bayesian classifier combination algorithm [3].


Figure 2: Graphical model of BCCNet

A neural network with parameters $\theta$ takes an object $x_i$, e.g., an image patch of a satellite image, as input and predicts a probability $p(t_i = j \mid x_i, \theta)$ that this object has class $j$, $i \in \{1, \dots, N\}$, $j \in \{1, \dots, J\}$, where $N$ is the number of data points, and $J$ is the number of possible classes.

A label $c_i^{(k)}$ of a base classifier $k$ is drawn from the multinomial distribution depending on the true label $t_i$ for this data point:

$$c_i^{(k)} \mid t_i = j \sim \mathrm{Mult}\big(\pi_j^{(k)}\big), \quad k \in \{1, \dots, K\},$$

where $\pi^{(k)}$ is a confusion matrix for the base classifier $k$, $\pi_j^{(k)}$ is the $j$-th row of the confusion matrix $\pi^{(k)}$, $K$ is the total number of base classifiers, and $L$ is the number of values for the base classifiers’ labels. Our approach tolerates the case when labels from the base classifiers are missing for some objects.
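As an illustration of this generative step, the following Python sketch draws crowd labels from per-classifier confusion matrices. The function name `sample_crowd_labels`, the `observe_prob` parameter used to simulate missing labels, and the toy confusion matrices are assumptions made for this example; none of them come from the paper.

```python
import random

def sample_crowd_labels(true_labels, confusions, observe_prob=0.7, seed=0):
    """Draw one label per (object, base classifier) pair.

    true_labels: list of true class indices, each in range(J).
    confusions:  list of K matrices; confusions[k][j] is row j of the
                 confusion matrix of classifier k, i.e. a probability
                 vector over the L possible label values.
    Returns a list of rows; a missing label is stored as None.
    """
    rng = random.Random(seed)
    labels = []
    for t in true_labels:
        row = []
        for pi_k in confusions:
            if rng.random() > observe_prob:
                row.append(None)  # this classifier did not label the object
                continue
            values = range(len(pi_k[t]))
            row.append(rng.choices(values, weights=pi_k[t], k=1)[0])
        labels.append(row)
    return labels

# Toy setting (hypothetical): J = 2 classes, L = 2 label values, K = 2 classifiers.
reliable = [[0.9, 0.1], [0.2, 0.8]]
biased = [[0.6, 0.4], [0.05, 0.95]]  # tends to answer 1 regardless of the truth
crowd = sample_crowd_labels([0, 1, 1, 0], [reliable, biased])
```

With identity confusion matrices and `observe_prob=1.0`, every sampled label equals the true label, which is a quick way to check the indexing convention (rows are true classes, columns are reported labels).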

We impose a Dirichlet prior with hyperparameters $\alpha_j^{(k)}$ for rows of the confusion matrices:

$$\pi_j^{(k)} \sim \mathrm{Dir}\big(\alpha_j^{(k)}\big).$$

The resulting graphical model is given in Figure 2. BCCNet inference is based on maximisation of the evidence lower bound (ELBO). The ELBO is optimised using coordinate ascent over the NN parameters and the posterior approximating distributions for object class labels and confusion matrices for the base classifiers. The NN parameters are updated via stochastic gradient ascent and the posterior approximating distribution is found using the variational mean-field approach. We iterate between one full pass of the data for the NN parameter update and one iteration for the approximating distribution update. We refer to this algorithm as VB.
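The variational half of this scheme can be sketched in a few lines of Python. The update equations below are the standard mean-field updates for a categorical likelihood with Dirichlet priors; they are an assumption about the form of the updates, since the paper does not spell them out. The network outputs `nn_probs` are held fixed here, whereas in the full algorithm a pass of stochastic gradient training on the network, using the current posterior over labels as soft targets, would sit between successive calls. The names `vb_step`, `digamma`, `nn_probs`, `alpha0` and `q_t` are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def vb_step(nn_probs, crowd, alpha0, q_t):
    """One mean-field pass: update q(pi), then q(t).

    nn_probs[i][j]:  network output for object i and class j (held fixed here).
    crowd[i][k]:     label of classifier k for object i, or None if missing.
    alpha0[k][j][l]: Dirichlet prior hyperparameters.
    q_t[i][j]:       current approximate posterior over the true label.
    """
    N, J, K = len(crowd), len(nn_probs[0]), len(alpha0)
    # q(pi_j^(k)) is Dirichlet with prior counts plus expected label counts.
    alpha = [[row[:] for row in a_k] for a_k in alpha0]
    for i in range(N):
        for k in range(K):
            label = crowd[i][k]
            if label is not None:  # missing labels contribute nothing
                for j in range(J):
                    alpha[k][j][label] += q_t[i][j]
    # Expected log confusion-matrix entries under the Dirichlet posterior.
    e_log_pi = [[[digamma(a) - digamma(sum(row)) for a in row] for row in a_k]
                for a_k in alpha]
    # q(t_i = j) is proportional to
    # exp(log nn_probs[i][j] + sum_k e_log_pi[k][j][crowd[i][k]]).
    new_q = []
    for i in range(N):
        logit = [math.log(nn_probs[i][j]) for j in range(J)]
        for k in range(K):
            label = crowd[i][k]
            if label is not None:
                for j in range(J):
                    logit[j] += e_log_pi[k][j][label]
        top = max(logit)
        weights = [math.exp(v - top) for v in logit]
        total = sum(weights)
        new_q.append([w / total for w in weights])
    return new_q, alpha

# Toy data (hypothetical): 4 objects, 3 classifiers, 2 classes.
crowd = [[0, 0, None], [1, 1, 1], [0, 1, 0], [1, None, 1]]
nn_probs = [[0.5, 0.5] for _ in range(4)]              # uninformative network
alpha0 = [[[2.0, 1.0], [1.0, 2.0]] for _ in range(3)]  # mildly diagonal prior
q_t = [row[:] for row in nn_probs]
for _ in range(20):
    q_t, alpha = vb_step(nn_probs, crowd, alpha0, q_t)
```

The diagonal-heavy prior encodes a weak belief that base classifiers are better than chance; without it, the symmetric initialisation in this toy example would leave the two classes indistinguishable.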

3 Experiments and results

We evaluated our approach on two real case studies, response after a natural disaster and malaria prevention, and compared the proposed algorithm (VB) with two baselines: i) the EM-algorithm [6] (EM) extended to our BCCNet model from Section 2, and ii) the neural network with an added crowd layer that models the confusion matrices [7] (CL). The base neural network for all methods was LeNet-5 [8] with the Adam optimiser [9]. The learning rate was chosen by grid search on validation datasets. We also used validation datasets for early stopping. The results are obtained from trained neural networks on held-out test datasets over Monte Carlo runs with random initialisation.

3.1 Case study 1: damage detection in satellite imagery for disaster response

We analysed crowdsourced labels of damage from Digital Globe (https://www.digitalglobe.com) high resolution (30cm) optical satellite imagery of Dominica before and after Hurricane Maria in 2017. Crowd members were presented with a subset of satellite sub-images after the hurricane and asked, amongst other tasks, to draw bounding boxes around all buildings in their sub-images and also mark building damage.

We extracted image patches from both ‘before’ and ‘after’ imagery corresponding to the bounding boxes as input for a neural network. Image patches were resized as (the size of an average bounding box). Before and after image patches formed different channels of the NN input layer. We also extracted corresponding labels from the crowd as: “background”, “undamaged building”, and “damaged building”. We thus obtain a dataset with objects labelled by volunteers (each object is labelled on average by volunteers). This dataset is challenging because of the high discrepancy between different crowd members’ answers: of the objects were assigned to different classes by the crowd members (for comparison in the second case study, below, the data had only of such objects).
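The two dataset statistics quoted above (the share of objects on which crowd members disagree, and the average number of labels per object) can be computed from a crowd label matrix as in the following sketch. The function name `crowd_summary`, the class encoding and the toy matrix are assumptions for illustration; the actual figures for the Dominica data are not reproduced here.

```python
def crowd_summary(crowd):
    """Return (fraction of labelled objects with disagreement,
    mean number of labels per labelled object).

    crowd[i][k] is the label of volunteer k for object i, or None if missing.
    """
    labelled = disputed = n_labels = 0
    for row in crowd:
        observed = [label for label in row if label is not None]
        if not observed:
            continue  # object nobody labelled
        labelled += 1
        n_labels += len(observed)
        disputed += len(set(observed)) > 1
    if labelled == 0:
        return 0.0, 0.0
    return disputed / labelled, n_labels / labelled

# Hypothetical encoding: 0 = background, 1 = undamaged building, 2 = damaged building.
crowd = [
    [0, 0, None],
    [1, 2, 1],
    [None, None, None],
    [2, None, None],
    [1, 1, 2],
]
disagreement, labels_per_object = crowd_summary(crowd)
```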

The data lacked ground truth labels to validate the algorithms so we defined ground truth as the crowd consensus output inferred using BCC [3] when the whole dataset was processed. We then divided the dataset in the ratio into training, validation and test datasets for evaluation of the algorithms. The classification accuracy is given in Figure 3(a). The crowd layer network has the lowest accuracy. The VB algorithm for BCCNet provides not only the highest accuracy but also the most stable results across Monte Carlo runs, consistently for all three classes.

3.2 Case study 2: mosquito detection in audio for malaria prevention

The HumBug project (http://humbug.ac.uk) aims to detect malaria-vectoring mosquitoes through their flight tones [10]. A malaria epidemic can occur a few weeks after initial impact of the disease and it is crucial to monitor malaria vectors (i.e. Anopheles species) and respond in the early stages [11]. As an initial step, we have launched a crowdsourcing project on the Zooniverse platform (https://www.zooniverse.org/projects/yli/humbug) to label 2-second audio clips as containing “mosquito sound” or “no mosquito sound”. The project has attracted volunteers to date who have labelled audio clips from laboratory recordings collected in UK, US and Kenya and field recordings from Thailand. However, the crowd label matrix is still very sparse, of the matrix values are missing, so we chose data clips that were labelled by at least volunteers as our training dataset to ensure that our objects were assigned a class with some confidence. Consequently, we had and in this case. We used a subset of laboratory recordings with labels provided by the research team of the HumBug project as ground truth labels for test and validation datasets with samples for testing and samples for validation.
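The sparsity measure and the minimum-label filter described above can be expressed as in the sketch below. The helper names `missing_fraction` and `keep_well_labelled`, the threshold value and the toy matrix are assumptions for illustration; the thresholds used in the study are not restated here.

```python
def missing_fraction(crowd):
    """Share of (clip, volunteer) cells in the label matrix with no label."""
    cells = [label for row in crowd for label in row]
    return sum(label is None for label in cells) / len(cells) if cells else 0.0

def keep_well_labelled(crowd, min_labels):
    """Indices of clips labelled by at least `min_labels` volunteers."""
    return [i for i, row in enumerate(crowd)
            if sum(label is not None for label in row) >= min_labels]

# Toy matrix (hypothetical): 4 clips, 5 volunteers, None marks a missing label.
crowd = [
    [1, None, None, 0, None],
    [None, None, 0, None, None],
    [0, 0, None, 0, 1],
    [None, None, None, None, None],
]
sparsity = missing_fraction(crowd)        # 13 of the 20 cells are missing
train_idx = keep_well_labelled(crowd, 2)  # clips 0 and 2 survive the filter
```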

The neural network input comprised sound ‘images’ constructed from audio clips where is the dimension of the mel-spectrum and is the number of windows we used to divide each of the 2-second long audio clips. Mosquito detection audio clips are naturally heavily imbalanced, with most clips containing no mosquito sounds. According to the majority voted labels in the training data there are only of clips containing mosquito sounds. In these settings, the crowd layer neural network always predicts “no mosquito”. Therefore, for the CL algorithm we balanced the training dataset based on majority voted labels. Both the EM and VB algorithms for BCCNet are able to train appropriate networks on the raw data.
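A minimal sketch of balancing by majority-voted labels follows. The text does not say how the balancing was done, so random undersampling of the larger class is an assumption here, as are the tie-breaking rule and the names `majority_vote` and `balanced_indices`.

```python
import random
from collections import Counter

def majority_vote(row):
    """Most frequent observed label in a row; None if the row is empty.

    Ties go to the smallest label value (an arbitrary choice for this sketch).
    """
    votes = Counter(label for label in row if label is not None)
    if not votes:
        return None
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)

def balanced_indices(crowd, seed=0):
    """Undersample so that every majority-voted class has equally many clips."""
    rng = random.Random(seed)
    by_class = {}
    for i, row in enumerate(crowd):
        label = majority_vote(row)
        if label is not None:
            by_class.setdefault(label, []).append(i)
    if not by_class:
        return []
    size = min(len(members) for members in by_class.values())
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, size))
    return sorted(kept)

# Toy matrix (hypothetical): 1 = "mosquito sound", 0 = "no mosquito sound".
crowd = [[0, 0, 1], [0, None, 0], [1, 1, None], [0, 0, 0], [None, None, None]]
votes = [majority_vote(row) for row in crowd]
kept = balanced_indices(crowd)
```

Undersampling discards data from the majority class, which is one reason a method that can train directly on the raw, imbalanced labels is attractive.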

Figure 3(b) provides box plots of F1 measure for the mosquito sound class. We used the F1 measure in this case as the data is highly imbalanced. The crowd layer neural network has the lowest median accuracy and the highest variance among different Monte Carlo runs. The EM-algorithm for BCCNet provides more stable and more accurate results in comparison to the crowd layer network. The proposed VB-algorithm for BCCNet also gives stable results and additionally it has the highest median F1 measure amongst the competitors.
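To make the choice of metric concrete, the sketch below computes the F1 measure for a positive class and contrasts it with accuracy for a classifier that always predicts “no mosquito”. The function names and the toy labels are assumptions for illustration and are unrelated to the reported results.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * TP / (2 * TP + FP + FN) for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels (hypothetical): 1 = "mosquito sound", 0 = "no mosquito sound".
y_true = [1, 0, 0, 1, 0, 0]
always_negative = [0] * len(y_true)
mixed = [1, 0, 0, 0, 1, 0]

acc_trivial = accuracy(y_true, always_negative)  # 4 of 6 correct
f1_trivial = f1_score(y_true, always_negative)   # no true positives
f1_mixed = f1_score(y_true, mixed)               # TP = 1, FP = 1, FN = 1
```

The trivial classifier reaches an accuracy of two thirds while its F1 measure is zero, which is why accuracy alone is uninformative on heavily imbalanced data.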

Figure 3: Performance results. (a) Box plots for accuracy on the damage detection data: for all classes (blue), for the “background” class (red), for the “undamaged building” class (green), and for the “damaged building” class (lavender). (b) Box plots for F1 measure on the mosquito detection data.

4 Conclusions

We present BCCNet, an approach to jointly aggregate noisy crowdsourced labels and train a neural network to process new data. This approach can be rapidly deployed as a solution to challenging problems in the developing world that lack labelled data. We demonstrate that BCCNet is stable, able to work with imbalanced data and contradictory crowd labels. Ongoing operational engagement with disaster responders shows that this technology delivers sustainable impact by providing actionable and timely information to end users.


This work is part-funded by a Google Impact Challenge award, by a grant from the Alan Turing Institute’s Data Centric Engineering programme and also through the UK Space Agency’s International Partnerships Programme. The authors would like to thank Digital Globe, Planet, ESA and the Satellite Applications Catapult for ongoing satellite data provision; Dr. Marianne Sinka at the University of Oxford, UK, Paul I. Howell at the Centers for Disease Control and Prevention (CDC), BEI Resources in Atlanta, USA, Dustin Miller at the CDC Foundation, Centers for Disease Control and Prevention in Atlanta, Dr. Sheila Ogoma, US Army Military Research Unit, Kisumu, Kenya (USAMRU-K), and Dr. Theeraphap Chareonviriyaphap, Kasetsart University, Thailand for their collaborations on data collection and system deployment.


[1] A. Gaunt, D. Borsa, and Y. Bachrach. Training deep neural nets to aggregate crowdsourced responses. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 242–251, Jun. 2016.
[2] M. Poblet, E. García-Cuesta, and P. Casanovas. Crowdsourcing roles, methods and tools for data-intensive disaster management. Information Systems Frontiers, Jan. 2017.
[3] E. Simpson, S.J. Roberts, I. Psorakis, and A. Smith. Dynamic Bayesian combination of multiple imperfect classifiers. In Decision making and imperfection, pages 1–35. Springer, 2013.
[4] E. Simpson, S. Reece, and S.J. Roberts. Bayesian heatmaps: probabilistic classification with multiple unreliable information sources. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 109–125, 2017.
[5] R. Yore. Here’s how citizen scientists assisted with the disaster response in the Caribbean. The Conversation (Science and Technology), 2017.
[6] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab. Aggnet: deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Transactions on Medical Imaging, 35(5):1313–1321, 2016.
[7] F. Rodrigues and F. Pereira. Deep learning from crowds. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1611–1619, 2018.
[8] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[9] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, 2015.
[10] Y. Li, D. Zilli, H. Chan, I. Kiskin, M. Sinka, S.J. Roberts, and K. Willis. Mosquito detection with low-cost smartphones: data acquisition for malaria research. In NIPS Workshop on Machine Learning for the Developing World, Long Beach, USA, Dec. 2017. arXiv:1711.06346.
[11] S. C. Waring and B. J. Brown. The threat of communicable diseases following natural disasters: A public health response. Disaster Management & Response, 3(2):41–47, 2005. ISSN 1540-2487.