###### Abstract

We study pool-based active learning with abstention feedbacks, where a labeler can abstain from labeling a queried example with some unknown abstention rate. Using the Bayesian approach, we develop two new greedy algorithms that learn both the classification problem and the unknown abstention rate at the same time. This is achieved by incorporating the estimated average abstention rate into the greedy criteria. We prove that both algorithms have near-optimality guarantees: they achieve a constant factor approximation of the optimal expected and worst-case value, respectively, of a useful utility function. Our experiments show the algorithms perform well in various practical scenarios.


Bayesian Active Learning With Abstention Feedbacks

Cuong V. Nguyen ^{1}
Lam Si Tung Ho ^{2}
Huan Xu ^{3}
Vu Dinh ^{4}
Binh Nguyen ^{5}

^{†}Work done prior to Amazon and Alibaba

^{1}Amazon AI

^{2}Dalhousie University

^{3}Georgia Institute of Technology and Alibaba

^{4}University of Delaware

^{5}University of Science, Vietnam. Correspondence to: Cuong V. Nguyen <nguycuo@amazon.com>.

2019 ICML Workshop on Human in the Loop Learning (HILL 2019), Long Beach, USA. Copyright by the author(s).

We consider active learning with abstention feedbacks, where a labeler can abstain from labeling queried examples with some unknown abstention rate. This problem is one of several attempts to deal with imperfect labelers in active learning, who may give incorrect or noisy labels (Donmez & Carbonell, 2008; Golovin et al., 2010; Naghshvar et al., 2012; Ni & Ling, 2012; Malago et al., 2014; Zhang & Chaudhuri, 2015; Cuong et al., 2016; Chen et al., 2017) or, in our case, give abstention feedbacks to queries (Fang et al., 2012; Ramirez-Loaiza et al., 2014; Yan et al., 2015). Learning with abstention feedbacks is important in many real-life scenarios, and we discuss two examples below. Although the reasons for the abstention vary across these examples, from the learner’s view they are the same: the learner receives no labels for some queries and the true labels for the others.

Crowdsourcing: In crowdsourcing (Yan et al., 2011; Zhao et al., 2011; Mozafari et al., 2014; Manino et al., 2016; Singla et al., 2016), we have many labelers, each of whom has expertise only in a certain area and therefore can only provide labels for a subset of the input domain. Such labelers have also been called labelers with a knowledge blind spot (Fang et al., 2012). In this case, active learning is a good approach to quickly narrow down the expertise domain of a labeler and focus on querying examples in this region to learn a good model. By adapting active learning algorithms to each labeler, we can also gather representative subsets of labeled data from the labelers and combine them into a final training set.

Learning with Corrupted Labels: In this problem, the abstention feedbacks do not come from the labeler but occur due to corruptions in the labels received by the learner. The corruptions could be caused by bad communication channels that distort the labels or could even be caused by attackers attempting to corrupt the labels (Zhao et al., 2017). The setting in our paper can deal with the case when the corrupted labels are completely lost, i.e. they cannot be recovered and are not converted to incorrect ones.

In this paper, we consider pool-based active learning with a fixed budget setting, where a finite pool of unlabeled examples is given in advance and we need to sequentially select examples from the pool to query their labels. Our setting assumes abstention feedbacks count towards the budget, so we need to be careful when selecting the queried examples. Our work takes a Bayesian approach to the problem and learns both the classification model and the unknown abstention rate at the same time. We call this approach the Bayesian Active Learning with Abstention Feedbacks (BALAF) framework. Our framework can be used to instantiate different algorithms for the problem of active learning with abstention feedbacks.

We also contribute to the understanding of this problem both algorithmically and theoretically. Algorithmically, we develop two novel greedy algorithms for active learning with abstention feedbacks based on our BALAF framework. Each algorithm uses a different greedy criterion to select queried examples that can give information for both the classification model and the abstention rate. Theoretically, we prove that, for a useful utility of the selected examples, our proposed algorithms are guaranteed to be near-optimal compared to the optimal active learning algorithms. To the best of our knowledge, these are the first theoretical results for active learning with abstention feedbacks in the Bayesian pool-based setting.

The first greedy algorithm that we propose aims to maximize the expected version space reduction utility (Golovin & Krause, 2011) of the joint deterministic space deduced from the spaces of possible classification models and abstention rates. Version space reduction was shown to be a useful utility for active learning (Golovin & Krause, 2011; Cuong et al., 2014; Cuong & Xu, 2016), and our algorithm targets this utility when selecting queried examples. In essence, the proposed algorithm is similar to the maximum Gibbs error algorithm (Cuong et al., 2013) except that we incorporate the terms controlling the estimated abstention rate into the greedy criterion. By using previous theoretical results for adaptive submodularity (Golovin & Krause, 2011), we are able to prove that our algorithm has an average-case near-optimality guarantee: the average utility value of its selected examples is always within a constant factor of the optimal average utility value.

In contrast to the first algorithm, the second algorithm that we propose aims to maximize the worst-case version space reduction utility above. This algorithm resembles the least confidence active learning algorithm (Lewis & Gale, 1994) with the main difference that we also incorporate the estimated abstention rate into the greedy criterion. From previous theoretical results for pointwise submodularity (Cuong et al., 2014), we can prove that the proposed algorithm has a worst-case near-optimality guarantee: the worst-case utility value of its selected examples is always within a constant factor of the optimal worst-case utility value.

We conduct experiments to evaluate our proposed algorithms on various binary classification tasks under three different realistic abstention scenarios. The experiments show that our algorithms are useful compared to the passive learning and normal active learning baselines with various abstention rates under these scenarios.

In pool-based active learning, we are given a finite set (called a pool) of unlabeled examples and a budget on the number of queries, and we need to sequentially query the labels of examples from the pool to learn a good classifier. Normal active learning assumes the human labeler would always give labels for queried examples. By contrast, in this paper, we consider active learning with abstention feedbacks where the labeler is allowed to abstain from labeling a queried example. In other words, the labeler may return “no label” to a queried example. Our work considers the case where abstention feedbacks count towards the budget, so we need to select queried examples to obtain as many useful labels as possible.

To define the problem, consider the set of all possible labels. Assume there is an unknown true labeling of the whole pool that is used by the labeler to label queried examples, and the labeler will return the label given by this true labeling for a queried example if he decides to label it. Also assume there is an unknown true abstention pattern used by the labeler to decide whether or not to label a queried example. That is, the pattern is a binary indicator on the pool that records, for each example, whether the labeler abstains from labeling it or decides to label it.

In this setting, an active learning algorithm is a policy for choosing queried examples from the pool, and these chosen examples depend on the labels as well as abstention feedbacks of previously selected examples. By definition, a policy is a mapping from a set of examples and the labeler’s corresponding responses to the next unlabeled example to be queried, and it can be represented by a policy tree (see Figure 1). During the active learning process, a learner (a policy or algorithm) sequentially selects unlabeled training examples one at a time from the pool and asks the labeler for their labels. The labeler would use the true abstention pattern to decide whether or not to give the labels, which in turn are determined by the true labeling. Pool-based active learning with abstention feedbacks aims to design algorithms (policies) for selecting queried examples that can give us as much information about the true labeling (and in some cases, the true abstention pattern) as possible.

We shall take the Bayesian approach to pool-based active learning with abstention feedbacks, which we call the Bayesian active learning with abstention feedbacks (BALAF) framework. In our framework, we consider a (possibly infinite) set of probabilistic hypotheses, where each hypothesis is a random function from the pool to the label set. Formally, each hypothesis assigns to every example in the pool a categorical distribution over the labels. We assume a prior distribution on the hypothesis set. If we observe the label of an example, we can use Bayes’ rule to obtain a posterior distribution, in which the weight of each hypothesis is proportional to its prior weight times the probability it assigns to the observed label.

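To deal with the abstention feedbacks, we also take the Bayesian approach and consider a set of possible abstention hypotheses from the pool to the interval [0, 1], where each function gives us the abstention rate of the examples in the pool. More specifically, the value of an abstention hypothesis at an example is the probability that the labeler abstains from labeling that example, according to that hypothesis. We also assume a prior on the set of abstention hypotheses. Note that we slightly abuse notation by using the same symbol for the priors on both hypothesis sets. In this case, the prior can be thought of as a joint distribution on pairs of a classification hypothesis and an abstention hypothesis where the two elements are independent, i.e., the joint prior of a pair is the product of the priors of its two elements.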

During the active learning process, if we receive a label for an example , we can update the posterior distribution using the following Bayes’ rule:

Otherwise, if the labeler abstains from labeling , we can update the posterior distribution using:
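The two update rules can be illustrated with a minimal sketch, assuming finite hypothesis sets stored as Python lists and the independence between classification and abstention hypotheses stated above. Under these assumptions, a received label reweights each classification hypothesis by the probability it gives that label and each abstention hypothesis by one minus its abstention rate, while an abstention reweights only the abstention hypotheses by their abstention rate. The names `update_posteriors`, `label_prob`, and `abst_rate` are hypothetical and not taken from the paper.

```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("observation has zero probability under the current posterior")
    return [w / total for w in weights]


def update_posteriors(p_h, p_r, label_prob, abst_rate, x, y=None):
    """Update both posteriors after querying example x.

    p_h[i] is the weight of classification hypothesis i, p_r[j] the weight of
    abstention hypothesis j, label_prob[i][x][y] the probability that
    hypothesis i assigns label y to x, and abst_rate[j][x] the abstention rate
    of x under abstention hypothesis j. Pass y=None when the labeler abstained.
    """
    if y is None:
        # An abstention carries no label, so only the abstention posterior moves.
        new_h = list(p_h)
        new_r = normalize([w * abst_rate[j][x] for j, w in enumerate(p_r)])
    else:
        new_h = normalize([w * label_prob[i][x][y] for i, w in enumerate(p_h)])
        new_r = normalize([w * (1.0 - abst_rate[j][x]) for j, w in enumerate(p_r)])
    return new_h, new_r
```

Because the likelihood factorizes, the joint posterior stays a product of the two marginal posteriors after every query, which is why the sketch can keep them in separate lists.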

We summarize the general BALAF framework in Algorithm 1, where examples are chosen sequentially and the posteriors are updated according to the above rules. The framework returns the final posteriors on the classification hypotheses and on the abstention hypotheses, which can be used to make predictions on new examples or to serve as priors in future active learning processes. For example, the label distribution of a new example can be predicted by averaging the hypotheses’ label distributions under the posterior on the classification hypotheses.

The posterior on the abstention hypotheses, on the other hand, can be used as a prior in future active learning processes if the same labeler is employed to give labels. This would enable the learning algorithm to use the prior knowledge about the labeler’s preferences to select the most suitable queried examples while avoiding re-learning his abstention patterns from scratch. The posterior can also be transferred and adapted to other labelers who may have similar abstention patterns.
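The overall loop of the framework might be sketched as follows. This is an illustrative reading of Algorithm 1 under the assumption of finite hypothesis sets, not the authors' implementation; `select` and `labeler` are hypothetical callbacks standing in for the greedy criterion and the human labeler, and `predict` averages the hypotheses' label distributions under the classification posterior.

```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def predict(p_h, label_prob, x):
    """Label distribution of x, averaged over the classification posterior."""
    n_labels = len(label_prob[0][x])
    return [sum(w * label_prob[i][x][y] for i, w in enumerate(p_h))
            for y in range(n_labels)]


def balaf(p_h, p_r, label_prob, abst_rate, labeler, select, budget):
    """Query up to `budget` examples; each query is charged, answered or not."""
    pool = list(range(len(abst_rate[0])))
    for _ in range(min(budget, len(pool))):
        x = select(pool, p_h, p_r)
        pool.remove(x)
        y = labeler(x)  # a label index, or None when the labeler abstains
        if y is None:
            p_r = normalize([w * abst_rate[j][x] for j, w in enumerate(p_r)])
        else:
            p_h = normalize([w * label_prob[i][x][y] for i, w in enumerate(p_h)])
            p_r = normalize([w * (1.0 - abst_rate[j][x]) for j, w in enumerate(p_r)])
    return p_h, p_r


# Toy run (made-up numbers): two hypotheses of each kind, three pool examples,
# and a labeler who answers examples 0 and 1 but abstains on example 2.
label_prob = [
    [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]],
    [[0.1, 0.9], [0.2, 0.8], [0.5, 0.5]],
]
abst_rate = [[0.1, 0.1, 0.9], [0.5, 0.5, 0.5]]
answers = {0: 0, 1: 0, 2: None}
post_h, post_r = balaf([0.5, 0.5], [0.5, 0.5], label_prob, abst_rate,
                       labeler=answers.get,
                       select=lambda pool, p_h, p_r: pool[0],
                       budget=3)
```

The returned `post_r` is the quantity that could be reused as a prior when the same labeler is queried again.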

In this section, we propose two specific instances of the BALAF framework above that can achieve near-optimality guarantees for reducing the hypothesis spaces that contain the true labeling and the true abstention pattern. Our first algorithm provides an average-case near-optimality guarantee, while the second algorithm provides a worst-case near-optimality guarantee. The algorithms differ only in the way we choose the queried data point in Algorithm 1.

In what follows, for any and , we define . We shall assume and are independent for any fixed and . Thus, is also a categorical distribution with probability mass function for all . We call a labeling of as it contains the labels of the examples in .

For any and any distribution on , let be the random variable for the labeling of with respect to the distribution . We note that takes values in with probability mass function:

(1) |

for all . This is also the marginal probability that the labeling of is . As a special case, if is a singleton and is the random variable for the label of , we write for to denote the probability mass function of .
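For a finite hypothesis set, the marginal probability of a labeling can be computed directly: it is the weighted sum, over hypotheses, of the product of the per-example label probabilities, where the product follows from the independence assumption above. The sketch below assumes this finite setting; `labeling_prob` and its arguments are hypothetical names, and the numbers are made up.

```python
from itertools import product
from math import prod


def labeling_prob(p_h, label_prob, examples, labels):
    """Marginal probability that `examples` receive `labels`.

    p_h[i] is the weight of hypothesis i and label_prob[i][x][y] is the
    probability that hypothesis i assigns label y to example x.
    """
    return sum(
        weight * prod(label_prob[i][x][y] for x, y in zip(examples, labels))
        for i, weight in enumerate(p_h)
    )


# Toy setting: two hypotheses, two examples, two labels.
p_h = [0.5, 0.5]
label_prob = [
    [[0.9, 0.1], [0.8, 0.2]],  # hypothesis 0
    [[0.1, 0.9], [0.2, 0.8]],  # hypothesis 1
]
both_zero = labeling_prob(p_h, label_prob, [0, 1], [0, 0])
# Summing over every possible labeling of the two examples gives total mass one.
total = sum(labeling_prob(p_h, label_prob, [0, 1], ys)
            for ys in product(range(2), repeat=2))
```

With a single example in `examples`, the same function returns the predictive probability of one label.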

In this average-case BALAF algorithm, at each iteration in Algorithm 1, we select the queried data point as follows. First, we estimate the average abstention function based on the current posterior :

(2) |

Then we select the example to query using the following greedy criterion:

(3) |

Intuitively, this criterion maximizes the expected one-step utility increment, with the utility function being defined in Equation (4) below. Equation (3) resembles the maximum Gibbs error criterion (Cuong et al., 2013) which selects , except that we incorporate the terms and into the criterion. This new criterion gives less preference to examples whose estimated abstention rate is near .
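One way to picture this selection step is the sketch below. It assumes finite hypothesis sets and assumes that the score is the Gibbs error of the joint outcome (label, abstention indicator) of an example, with the two components independent, which is the form the appendix argument leads to. The score is therefore an assumed form for illustration, and all function names are hypothetical.

```python
def predictive(p_h, label_prob, x):
    """Posterior predictive label distribution of example x."""
    n_labels = len(label_prob[0][x])
    return [sum(w * label_prob[i][x][y] for i, w in enumerate(p_h))
            for y in range(n_labels)]


def mean_abstention(p_r, abst_rate, x):
    """Posterior mean of the abstention rate of example x."""
    return sum(w * abst_rate[j][x] for j, w in enumerate(p_r))


def average_case_score(p_h, p_r, label_prob, abst_rate, x):
    """Gibbs error of the joint (label, abstention) outcome of x (assumed form)."""
    r_bar = mean_abstention(p_r, abst_rate, x)
    label_agreement = sum(q * q for q in predictive(p_h, label_prob, x))
    return 1.0 - (r_bar ** 2 + (1.0 - r_bar) ** 2) * label_agreement


def select_average_case(pool, p_h, p_r, label_prob, abst_rate):
    """Pick the pool example with the largest score."""
    return max(pool, key=lambda x: average_case_score(p_h, p_r, label_prob, abst_rate, x))


# Toy pool of three examples (made-up numbers).
p_h = [0.5, 0.5]
label_prob = [
    [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]],
    [[0.1, 0.9], [0.7, 0.3], [0.5, 0.5]],
]
p_r = [0.5, 0.5]
abst_rate = [[0.4, 0.4, 1.0], [0.6, 0.6, 1.0]]
chosen = select_average_case([0, 1, 2], p_h, p_r, label_prob, abst_rate)
```

In this toy pool, example 2 has the most uncertain label but is certain to be refused, so it scores lowest, while example 0 combines an uncertain label with an uncertain response and is selected.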

Near-optimality Guarantee. We now show the average-case near-optimality guarantee for this algorithm. In the context of this paper, near-optimality means the algorithm can achieve a constant factor approximation to the optimal algorithm with respect to some objective function.

To define an objective function that is useful for active learning with abstention feedbacks, we first induce a deterministic hypothesis space equivalent to the original probabilistic hypothesis space . In particular, consider the hypothesis space consisting of all deterministic functions from to . We induce a new prior on from the original prior such that , the marginal probability that the labeling of the whole pool is . For any and , we can define similarly to Equation (1) with the hypothesis space and distribution .

Also consider the space consisting of all deterministic functions from to . In essence, means the labeler abstains from labeling while means the labeler gives a label for . We will call each an abstention pattern. The prior also induces a probability distribution on where:

is the probability (with respect to the rate ) that the labeler gives or abstains from giving labels to the whole pool according to the abstention pattern . For any and , we can define similarly to Equation (1) with the hypothesis space and distribution , where is the random variable for the abstention pattern of . Note that the induced prior can also be thought of as a joint prior on where the two elements are independent, i.e., .

For , , and , we consider the following utility function:

(4) |

where is the joint marginal probability (with respect to ) that the labeling of is and the abstention pattern of is . This is a useful utility function for active learning because it is the version space reduction utility with respect to the joint prior on the joint space (Golovin & Krause, 2011).
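As a small illustration of this utility, the sketch below computes one minus the joint marginal probability of the observed labels and abstention pattern on a selected set. It assumes finite hypothesis sets, assumes that abstentions on different examples are independent given an abstention hypothesis, and assumes the encoding in which a pattern value of 1 means the labeler abstains. These are assumptions made for the example, and the function names are hypothetical.

```python
from math import prod


def labeling_prob(p_h, label_prob, examples, labels):
    """Marginal probability that `examples` receive `labels`."""
    return sum(
        w * prod(label_prob[i][x][y] for x, y in zip(examples, labels))
        for i, w in enumerate(p_h)
    )


def pattern_prob(p_r, abst_rate, examples, pattern):
    """Marginal probability of an abstention pattern (1 = abstain, 0 = label)."""
    return sum(
        w * prod(abst_rate[j][x] if a == 1 else 1.0 - abst_rate[j][x]
                 for x, a in zip(examples, pattern))
        for j, w in enumerate(p_r)
    )


def utility(p_h, p_r, label_prob, abst_rate, examples, labels, pattern):
    """Version space reduction: prior mass ruled out by the observations."""
    joint = (labeling_prob(p_h, label_prob, examples, labels)
             * pattern_prob(p_r, abst_rate, examples, pattern))
    return 1.0 - joint


# Toy prior (made-up numbers): two classification hypotheses, one abstention hypothesis.
p_h = [0.5, 0.5]
label_prob = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.1, 0.9], [0.2, 0.8]],
]
p_r = [1.0]
abst_rate = [[0.5, 0.2]]
value = utility(p_h, p_r, label_prob, abst_rate, [0, 1], [0, 0], [0, 1])
```

The utility is zero for the empty set and grows as the selected examples rule out more of the prior mass.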

With this utility, our objective function is defined as:

(5) |

where for all and , is the set of examples selected by the policy given that the true labeling is and the true abstention pattern is . This objective function is the average of the above utility with respect to the joint prior . Note that in this objective, and are drawn from the prior since we operate in the Bayesian setting. The following theorem states the guarantee for the average-case BALAF algorithm (proof in appendix).

###### Theorem 1.

For any budget , let be the policy selecting examples using average-case BALAF and let be the optimal policy w.r.t. that selects examples. We have: .

The worst-case BALAF algorithm is essentially similar to the average-case BALAF algorithm, except that we replace the criterion (3) by the following greedy criterion:

(6) |

Intuitively, this criterion maximizes the worst-case one-step utility increment, with the version space reduction utility in Equation (4). The criterion (6) resembles the least confidence criterion (Lewis & Gale, 1994), which selects , except that we also incorporate the terms and into the criterion. This new criterion gives less preference to examples whose estimated abstention rate is near .
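The worst-case selection step can be pictured in the same way. The sketch below assumes finite hypothesis sets and assumes that the score is the probability of the most likely joint outcome (label, abstention indicator) of an example, with the two components independent, so that the example with the smallest such probability is queried. This is an assumed form for illustration, and the function names are hypothetical.

```python
def predictive(p_h, label_prob, x):
    """Posterior predictive label distribution of example x."""
    n_labels = len(label_prob[0][x])
    return [sum(w * label_prob[i][x][y] for i, w in enumerate(p_h))
            for y in range(n_labels)]


def mean_abstention(p_r, abst_rate, x):
    """Posterior mean of the abstention rate of example x."""
    return sum(w * abst_rate[j][x] for j, w in enumerate(p_r))


def worst_case_score(p_h, p_r, label_prob, abst_rate, x):
    """Probability of the most likely joint outcome of x (assumed form)."""
    r_bar = mean_abstention(p_r, abst_rate, x)
    return max(r_bar, 1.0 - r_bar) * max(predictive(p_h, label_prob, x))


def select_worst_case(pool, p_h, p_r, label_prob, abst_rate):
    """Pick the pool example whose most likely outcome is least likely."""
    return min(pool, key=lambda x: worst_case_score(p_h, p_r, label_prob, abst_rate, x))


# Toy pool of three examples (made-up numbers).
p_h = [0.5, 0.5]
label_prob = [
    [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]],
    [[0.1, 0.9], [0.7, 0.3], [0.5, 0.5]],
]
p_r = [0.5, 0.5]
abst_rate = [[0.4, 0.4, 1.0], [0.6, 0.6, 1.0]]
chosen = select_worst_case([0, 1, 2], p_h, p_r, label_prob, abst_rate)
```

Setting every abstention rate to zero makes the score reduce to the usual least confidence score, the largest predictive label probability.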

Near-optimality Guarantee. We now show the worst-case near-optimality guarantee for this algorithm. For this guarantee, we still make use of the version space reduction utility function defined in Equation (4). Using this utility, we define the following worst-case objective function:

(7) |

This objective function is the worst possible utility achieved by the policy . The following theorem states the guarantee for this algorithm (proof in appendix).

###### Theorem 2.

For any budget , let be the policy selecting examples using worst-case BALAF and let be the optimal policy w.r.t. that selects examples. We have: .

In this section, we experimentally evaluate the proposed algorithms. In particular, we compare 4 algorithms: passive learning baseline (PL), active learning baseline using maximum Gibbs error (ALg), average-case BALAF (ALa), and worst-case BALAF (ALw).

For binary classification, ALg is equivalent to other well-known active learning algorithms such as the least confidence (Lewis & Gale, 1994) and maximum entropy (Settles, 2010) algorithms. The PL and ALg baselines do not learn the abstention probability of the examples, i.e., they ignore whether an example would be labeled or not when making a decision. In contrast, the proposed algorithms ALa and ALw take into account the estimated abstention probability when making decisions.

To show the potential of our algorithms further, we also consider two variants of ALa and ALw that are assumed to know a good estimate of the training examples’ abstention rates. In particular, for these versions of ALa and ALw (shown as dashed lines in Figures 2 and 3), we train a logistic regression model using the actual abstention pattern on the whole training set to predict the abstention probability for each example. We keep this classifier fixed throughout the experiments and use it to estimate the abstention rates in these versions of ALa and ALw.

In the algorithms, we use Bayesian logistic regression models for both the classification hypotheses and the abstention hypotheses. That is, each hypothesis and each abstention hypothesis is a logistic regression model. We put an independent Gaussian prior on each parameter of the logistic regression models (for both types of hypotheses). In this case, the posteriors are proportional to the regularized likelihood of the observed data with an ℓ2 penalty. Since we experiment with data sets containing very high dimensional data (more than 61,000 dimensions), running MCMC or even variational inference is very slow. Thus, for efficiency, we use the maximum a posteriori (MAP) hypotheses to estimate the probabilities in our algorithms. Finding the MAP hypotheses is equivalent to maximizing the regularized log-likelihood of the observed data.

Following previous work (Settles & Craven, 2008; Cuong et al., 2013; 2014), we evaluate the algorithms using the area under the accuracy curve (AUAC) scores. For each task, we compute the scores on a separate test set during the first 300 queries and then normalize these scores so that their values are between 0 and 100. The final scores are obtained by averaging 10 runs with different random seeds.
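A score of this kind could be computed as in the sketch below. The exact normalization used in the experiments is not spelled out here, so the sketch assumes a trapezoidal area under the accuracy curve divided by the largest possible area and scaled to the range from 0 to 100; the function name `auac` is hypothetical.

```python
def auac(accuracies):
    """Normalized area under the accuracy curve, between 0 and 100.

    accuracies[i] is the test accuracy (in [0, 1]) after query i + 1.
    """
    if len(accuracies) < 2:
        raise ValueError("need at least two points on the accuracy curve")
    # Trapezoidal rule with unit spacing between consecutive queries.
    area = sum((accuracies[i] + accuracies[i + 1]) / 2.0
               for i in range(len(accuracies) - 1))
    max_area = len(accuracies) - 1  # area of a curve that is always at accuracy 1
    return 100.0 * area / max_area


def mean_auac(runs):
    """Average the score over several runs (e.g., different random seeds)."""
    return sum(auac(run) for run in runs) / len(runs)
```

For instance, a curve that rises linearly from 0.5 to 1.0 over two queries has a score of 75 under this normalization.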

We shall consider three scenarios: (1) the labeler abstains from labeling examples unrelated to the target classification task, (2) the labeler abstains from labeling easy examples, and (3) the labeler abstains from labeling hard examples.

We consider binary text classification between two topics, rec.motorcycles and rec.sport.baseball, from the 20 Newsgroups data (Joachims, 1996). In the pool of unlabeled data, we allow examples from other classes (e.g., in the computer category) that are not related to the two target classes. The labeler always abstains from labeling these redundant examples while always giving labels for examples from the target classes. Thus, the abstention is on examples unrelated to the target task, and this satisfies the independence assumption between the labeling and the abstention pattern (or between the classification and abstention hypotheses) made in the framework and algorithm descriptions above. In the experiment, we fix the pool size to be 1322 and vary the abstention percentage (%) of the labeler by changing the ratio of the redundant examples.

Figure 2 shows the results for various abstention percentages. From the figure, our algorithms ALa and ALw are consistently better than the baselines for abstention percentages above 40%. When a good estimate of the abstention rate is available, our algorithms perform better than all the other algorithms for abstention percentages above 30%. This shows the advantage of modeling the labeler’s abstention pattern, especially for medium to high abstention percentages.

In this scenario, we test with a labeler who abstains from labeling easy data, which are far from the true decision boundary. This setting may seem counter-intuitive, but it is in fact not unrealistic. For example, in the learning with corrupted labels setting discussed in the introduction, easy examples may be considered less important than hard examples and thus be less protected than hard ones. In this case, an attacker may attempt to corrupt the labels of those easy examples to bring down the performance of the learned classifier. Furthermore, under a heavy attack, we may expect a high abstention percentage.

We simulate the abstention pattern for this scenario by first learning a regularized logistic regression model on the whole training data set and then measuring the distance between the model’s prediction probability and 0.5 for each example. The labeler would always abstain from labeling the subset of the training data (with size depending on the abstention percentage) that has the largest such distances, while he would always give labels for the other examples. Figure 3 (first row) shows the results for this setting on the following 4 binary text classification data sets from the 20 Newsgroups data (from left to right): comp.sys.mac.hardware/comp.windows.x, rec.motorcycles/rec.sport.baseball, sci.crypt/sci.electronics, and sci.space/soc.religion.christian.

From the results, ALa and ALw work very well for abstention percentages above 50%. This shows it is useful to learn and take into account the abstention probabilities when the abstention percentage is high (e.g., under heavy attacks), and our algorithms provide a good way to exploit this information. When the abstention percentage is small, the advantages of ALa and ALw diminish. This is expected since in this scenario, learning the abstention pattern is more expensive than simply ignoring it. However, when a good estimate of the abstention rate is available, ALa and ALw perform better than all the other algorithms for most abstention percentages.

In this scenario, we test with a labeler who abstains from labeling hard data, which are near the true decision boundary. This setting is common when the labeler wants to maximize the number of labels given to the learner (e.g., in crowdsourcing where he is paid for each label provided). The abstention pattern in this experiment is generated similarly to the previous scenario, except that the labeler abstains from labeling the examples having the smallest distances above instead of those with the largest distances.
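The simulated abstention patterns of the last two scenarios can be sketched as follows, assuming the prediction probabilities of the reference model are already available as a list. The function name, the `scenario` flag, and the rounding of the subset size are hypothetical choices made for this illustration.

```python
def abstention_pattern(probs, abstain_fraction, scenario="easy"):
    """Return a list of booleans, True where the simulated labeler abstains.

    probs[i] is the reference model's predicted probability for example i.
    With scenario="easy" the labeler abstains on the examples farthest from
    0.5; with scenario="hard" on the examples closest to 0.5.
    """
    if scenario not in ("easy", "hard"):
        raise ValueError("scenario must be 'easy' or 'hard'")
    n_abstain = round(abstain_fraction * len(probs))
    distance = [abs(p - 0.5) for p in probs]
    order = sorted(range(len(probs)), key=lambda i: distance[i],
                   reverse=(scenario == "easy"))
    abstained = set(order[:n_abstain])
    return [i in abstained for i in range(len(probs))]


# Five examples with made-up prediction probabilities, 40% abstention.
probs = [0.95, 0.55, 0.10, 0.45, 0.70]
easy = abstention_pattern(probs, 0.4, "easy")
hard = abstention_pattern(probs, 0.4, "hard")
```

In the toy data, the easy scenario removes the two most confident examples (indices 0 and 2), while the hard scenario removes the two examples nearest the decision boundary (indices 1 and 3).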

Figure 3 (second row) shows the results for this scenario on the same 4 data sets above. These results suggest that this is a more difficult setting for active learning. From the figure, ALa and ALw are better than the baselines only when the abstention percentage is between 20% and 40%. For other abstention percentages, ALa, ALw, and ALg do not provide much advantage compared to PL. However, when a good estimate of the abstention rate is available, ALa and ALw perform very well and are better than all the other algorithms.

Summary: The results above have shown that the proposed algorithms are useful for pool-based active learning with abstention feedbacks when the abstention percentage is within an appropriate range that depends on the problem. The algorithms are especially useful when a good estimate of the abstention rate is available. In practice, this estimate can be pre-computed from previous interactions between the learning systems and the labeler (e.g., using previous labeling preferences of the labeler), and then supplied to our algorithms as the prior on the abstention hypotheses. During the execution of our algorithms, this estimate will be gradually improved.

We proposed two new greedy algorithms under the Bayesian active learning with abstention feedbacks framework. This framework is useful in many real-world scenarios, including learning from multiple labelers and under corrupted labels. We proved that the algorithms have theoretical guarantees in the average and worst cases and also showed experimentally that they are useful for classification, especially when a good estimate of the abstention rate is available. Our results suggest that keeping track of and learning the abstention patterns of labelers is important for active learning with abstention feedbacks in practice.

Acknowledgements The authors would like to thank John Bradshaw, Paavo Parmas, and Long Tran-Thanh for their comments on the manuscript. Lam S. T. Ho was supported by startup funds from Dalhousie University, the Canada Research Chairs program, and the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant RGPIN-2018-05447.

Appendix

Proof of Theorem 1

To prove this theorem, we first apply Theorem 5.2 in (Golovin & Krause, 2011). This requires us to prove that the utility function is adaptive monotone and adaptive submodular with respect to the joint prior distribution. Note that the utility function is the version space reduction function with respect to the joint prior on the joint space of labelings and abstention patterns. From the results in Section 9 of (Golovin & Krause, 2011), version space reduction functions are adaptive monotone and adaptive submodular with respect to the corresponding prior. Thus, the utility function is adaptive monotone and adaptive submodular with respect to the joint prior.

With the above properties of , applying Theorem 5.2 in (Golovin & Krause, 2011), we have:

where is the greedy algorithm that selects the examples maximizing the expected utility gain at each step. From the proof of Theorem 4 of (Cuong et al., 2013), this greedy algorithm is equivalent to the maximum Gibbs error algorithm that selects the examples according to the criterion:

(8) |

where is the current posterior distribution, is the random variable for the label of , and is the random variable for the abstention pattern of . To understand this equation, we can think of the considered problem as a classification problem with labels or , where indicates an example is labeled the label and indicates an example is not labeled.

Since the label and the abstention pattern of an example are independent, Equation (8) is equivalent to:

We also have:

Similarly, .

Hence, the previous equation is equivalent to:

which is Equation (3). Therefore, the average-case BALAF algorithm is equivalent to this greedy algorithm, and Theorem 1 holds.

Proof of Theorem 2

To prove this theorem, we first apply Theorem 3 in (Cuong et al., 2014). This requires us to prove that the utility is pointwise monotone and pointwise submodular. Note that the utility is the version space reduction function with respect to the joint prior on the joint space of labelings and abstention patterns. From the proof of Theorem 5 in (Cuong et al., 2014), version space reduction functions are both pointwise monotone and pointwise submodular. Thus, the utility is pointwise monotone and pointwise submodular.

With the above properties of , applying Theorem 3 in (Cuong et al., 2014), we have:

where is the greedy algorithm that selects the examples maximizing the worst-case utility gain at each step. From the proof of Theorem 5 of (Cuong et al., 2014), this greedy algorithm is equivalent to the least confidence algorithm that selects the examples according to the criterion:

(9) |

where is the current posterior distribution, is the random variable for the label of , and is the random variable for the abstention pattern of . Similar to the proof of Theorem 1 above, to understand this equation, we can think of the considered problem as a classification problem with labels or , where indicates an example is labeled the label and indicates an example is not labeled.

Since the label and the abstention pattern of an example are independent, Equation (9) is equivalent to:

From the proof of Theorem 1, we have and .

Hence, the previous equation is equivalent to:

which is Equation (6). Therefore, the worst-case BALAF algorithm is equivalent to this greedy algorithm, and Theorem 2 holds.

## References

- Chen et al. (2017) Chen, Y., Hassani, S. H., and Krause, A. Near-optimal Bayesian active learning with correlated and noisy tests. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2017.
- Cuong & Xu (2016) Cuong, N. V. and Xu, H. Adaptive maximization of pointwise submodular functions with budget constraint. In Proceedings of the Conference on Neural Information Processing Systems, 2016.
- Cuong et al. (2013) Cuong, N. V., Lee, W. S., Ye, N., Chai, K. M. A., and Chieu, H. L. Active learning for probabilistic hypotheses using the maximum Gibbs error criterion. In Proceedings of the Conference on Neural Information Processing Systems, 2013.
- Cuong et al. (2014) Cuong, N. V., Lee, W. S., and Ye, N. Near-optimal adaptive pool-based active learning with general loss. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2014.
- Cuong et al. (2016) Cuong, N. V., Ye, N., and Lee, W. S. Robustness of Bayesian pool-based active learning against prior misspecification. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
- Donmez & Carbonell (2008) Donmez, P. and Carbonell, J. G. Proactive learning: Cost-sensitive active learning with multiple imperfect oracles. In Proceedings of the Conference on Information and Knowledge Management, 2008.
- Fang et al. (2012) Fang, M., Zhu, X., and Zhang, C. Active learning from oracle with knowledge blind spot. In Proceedings of the AAAI Conference on Artificial Intelligence, 2012.
- Golovin & Krause (2011) Golovin, D. and Krause, A. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 2011.
- Golovin et al. (2010) Golovin, D., Krause, A., and Ray, D. Near-optimal Bayesian active learning with noisy observations. In Proceedings of the Conference on Neural Information Processing Systems, 2010.
- Joachims (1996) Joachims, T. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. Technical report, DTIC Document, 1996.
- Lewis & Gale (1994) Lewis, D. D. and Gale, W. A. A sequential algorithm for training text classifiers. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 1994.
- Malago et al. (2014) Malago, L., Cesa-Bianchi, N., and Renders, J. Online active learning with strong and weak annotators. In NIPS Workshop on Learning from the Wisdom of Crowds, 2014.
- Manino et al. (2016) Manino, E., Tran-Thanh, L., and Jennings, N. R. Efficiency of active learning for the allocation of workers on crowdsourced classification tasks. In NIPS CrowdML Workshop, 2016.
- Mozafari et al. (2014) Mozafari, B., Sarkar, P., Franklin, M., Jordan, M., and Madden, S. Scaling up crowd-sourcing to very large datasets: A case for active learning. Proceedings of the VLDB Endowment, 2014.
- Naghshvar et al. (2012) Naghshvar, M., Javidi, T., and Chaudhuri, K. Noisy Bayesian active learning. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, 2012.
- Ni & Ling (2012) Ni, E. and Ling, C. Active Learning with c-Certainty. Advances in Knowledge Discovery and Data Mining, 2012.
- Ramirez-Loaiza et al. (2014) Ramirez-Loaiza, M. E., Culotta, A., and Bilgic, M. Anytime active learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2014.
- Settles (2010) Settles, B. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2010.
- Settles & Craven (2008) Settles, B. and Craven, M. An analysis of active learning strategies for sequence labeling tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
- Singla et al. (2016) Singla, A., Tschiatschek, S., and Krause, A. Noisy submodular maximization via adaptive sampling with applications to crowdsourced image collection summarization. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
- Yan et al. (2015) Yan, S., Chaudhuri, K., and Javidi, T. Active learning from noisy and abstention feedback. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, 2015.
- Yan et al. (2011) Yan, Y., Fung, G. M., Rosales, R., and Dy, J. G. Active learning from crowds. In Proceedings of the International Conference on Machine Learning, 2011.
- Zhang & Chaudhuri (2015) Zhang, C. and Chaudhuri, K. Active learning from weak and strong labelers. In Proceedings of the Conference on Neural Information Processing Systems, 2015.
- Zhao et al. (2011) Zhao, L., Sukthankar, G., and Sukthankar, R. Incremental relabeling for active learning with noisy crowdsourced annotations. In PASSAT and SocialCom, 2011.
- Zhao et al. (2017) Zhao, M., An, B., Gao, W., and Zhang, T. Efficient label contamination attacks against black-box learning models. In Proceedings of the International Joint Conferences on Artificial Intelligence, 2017.