Textual Membership Queries

Jonathan Zarecki
Department of Computer Science
Technion - Israel Institute of Technology
Haifa, Israel
szarecki@cs.technion.ac.il
Shaul Markovitch
Department of Computer Science
Technion - Israel Institute of Technology
Haifa, Israel
shaulm@cs.technion.ac.il
Abstract

Human labeling of textual data can be very time-consuming and expensive, yet it is critical for the success of an automatic text classification system. In order to minimize human labeling efforts, we propose a novel active learning (AL) solution that does not rely on existing sources of unlabeled data. It uses a small amount of labeled data as the core set for the synthesis of useful membership queries (MQs): unlabeled instances synthesized by an algorithm for human labeling. Our solution uses modification operators, functions from the instance space to the instance space that change the input to some extent. We apply the operators on the core set, thus creating a set of new membership queries. Using this framework, we look at the instance space as a search space and apply search algorithms in order to create desirable MQs. We implement this framework in the textual domain. The implementation includes using methods such as WordNet and Word2vec for replacing text fragments from a given sentence with semantically related ones. We test our framework on several text classification tasks and show improved classifier performance as more MQs are labeled and incorporated into the training set. To the best of our knowledge, this is the first work on membership queries in the textual domain.

Preprint. Work in progress.

1 Introduction

As text data becomes a major and highly accessible information source, many research efforts have been directed to the text classification task in an attempt to extract useful information. Practical applications in widespread use today include sentiment analysis, e-mail spam classification and document filtering [1, 31]. In these applications, supervised learning algorithms use labeled documents to construct a classification model that predicts a class value given an unlabeled instance.

Learning algorithms require sufficient labeled data to produce a high-quality model. However, obtaining labeled data can pose a significant challenge in some problem domains and may require human labor. We can ask which examples will benefit the algorithm the most and label only those, in an attempt to reduce labeling efforts. This question is addressed in the field of active learning [26], where we assume the existence of an oracle capable of labeling any instance. Under these settings, an active learning algorithm (the “learner") chooses which queries to present to the oracle, actively trying to choose only the most informative ones.

One of the first theoretical models proposed for active learning was membership queries (MQs) [3]. In this setting, the learner may request a label for any unlabeled instance in the instance space, including queries that the learner generates from scratch. This approach holds strong theoretical promise, as its learning model is more robust than the standard PAC [30] learning model in many cases [7, 2], since it does not depend on the class distribution.

However, the MQ model suffers from a major drawback. Learning algorithms usually work in the feature space, whereas human labelers work in the instance space. Hence, human oracles cannot evaluate and label new instances in the feature space. While the feature functions map from the instance space to the feature space, the reverse mapping usually does not exist. Therefore, MQs generated by the learner in the feature space are useless in classification tasks.

To illustrate this problem, let us look at a flower classification task, where the input is a black-and-white image of a flower and our goal is to classify it according to the flower species present in the image. The instance space for this task is the set of all possible black-and-white flower images. The feature space is the set of all binary matrices, representing all possible binary feature vectors. Obviously, a major part of this feature space cannot be mapped back into the instance space, and as such cannot be classified by a human oracle. A well-known illustration of this problem is the early work of Baum and Lang [15], who attempted to use MQs for handwritten digit recognition. Given two digits, they combined them into a new image and queried a human oracle for a classification. This process, however, often resulted in an unrecognizable digit. Another major problem with membership queries is that the mapping between the feature space (what learning algorithms receive as input) and the instance space (the domain recognizable to humans) is not 1-to-1. In the textual domain, for example, the bag-of-words (BOW) feature representation [13] is used to represent textual instances, and two different instances can be represented by the same feature vector. Thus, the sentences “Man bites dog" and “Dog bites man" would be represented in the same way despite their obviously different meanings.

To address these problems, stream-based [4] and the more popular pool-based [17] approaches for active learning were suggested. Pool-based active learning assumes that a collection (pool) of unlabeled instances is available. Queries are selected from the pool, usually in a greedy fashion, to extract the instance estimated to be most useful. Stream-based active learning [4] assumes the data is presented as a stream of instances, and the learner chooses whether to request the label for the presented instance. Both methods rely on outside sources to supply large amounts of unlabeled data, and their performance is highly dependent on the quality (with respect to the learning process) of the instances this source provides.

There are, however, real-world scenarios where not enough data can be collected or the quality of the sampled instances is not sufficient. One example is domains where the data is extremely rare. Another is domains where negative instances are distributed widely across the instance space and most of them are located very far from the classification boundary; such instances are poor in quality and hence not beneficial to the learning algorithm [12]. In both scenarios, classic learning models will find it difficult to generalize. These problems, associated with pool-based and stream-based active learning, do not affect the third type of active learning: membership queries.

In this work we present a new general and practical methodology for generating membership queries. Our work tackles the problems in membership query synthesis and presents a novel practical algorithm for synthesizing new instances.

Our main contributions are as follows:

  1. We present a new, practical way of synthesizing high-quality membership queries.

  2. We present and study the instance space as a search space and define operators over this space.

  3. We present the textual space as an instance space, defining its operators and implementing algorithms that utilize the textual instance space for the generation of new textual membership queries.

2 A Framework for Generating Textual Membership Queries

This research is focused on novel ways of synthesizing membership queries based on expanding a small set of existing labeled instances, which we call a core set. To achieve this goal, we look at the instance space as a search space, which includes different operators, and apply them as a sequence in order to generate new instances. We then utilize local search algorithms in order to maximize a utility function, thus creating highly desirable instances.

Before describing the algorithms and implementation, we shall first formalize the learning setup.

2.1 Learning Setup

As stated above, many works in the active learning field fail to distinguish between the feature space and the instance space. Let us first formally define all the components of an active learning system.

Let X be a set of instances, called the instance space. Let o: X → {0, 1} be an oracle that can label instances from X as positive (1) or negative (0). (For simplicity we describe only binary classification, but our framework generalizes to multi-class problems.) Let F = {f_1, …, f_n} be a set of feature functions, where each f_i: X → ℝ. We denote the feature space by ℝ^n, and denote by f(x) = (f_1(x), …, f_n(x)) the feature vector of each x ∈ X. It is important to note that different feature extractors may use different feature spaces.

These distinctions are important, as the oracle o can only classify objects in X, not objects in the feature space used by the learner. In the textual domain, o is only able to receive well-structured sentences and will not classify BOW vectors. Furthermore, it is important to note that there is normally no inverse function that is able to transform feature vectors back into instances. In almost all cases the transformation is not 1-1, and many feature vectors cannot originate from any member of X, as discussed with regard to BOW features in the introduction. The lack of an inverse function from the feature space to X implies that we cannot produce desirable feature vectors and then transform them back into instances to be labeled by the oracle.
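
To make the lack of an inverse mapping concrete, the following is a minimal sketch (ours, for illustration; the paper does not prescribe a specific BOW implementation) showing that two different sentences can share one BOW feature vector, so no mapping from feature vectors back to sentences can exist:

```python
# A minimal illustration (ours) of why BOW features cannot be inverted:
# two different instances collapse to the same feature vector.
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(["man bites dog", "dog bites man"]).toarray()

# Both rows are identical, so the feature mapping is not injective
# and an inverse function is undefined.
print(vectors)  # [[1 1 1]
                #  [1 1 1]]
assert (vectors[0] == vectors[1]).all()
```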

2.2 Membership Query Synthesis in the Instance Space

In this work, we present a framework for automatic generation of MQs in the instance space X. Our framework uses modification operators in order to generate new examples from a core set of labeled instances.

2.2.1 Modification Operators

As stated in Section 2.1, generating examples in the feature space and then transforming them into members of the instance space is usually impossible, forcing us to generate instances exclusively in X. We therefore introduce modification operators, which are functions op: X → X. A modification operator modifies some aspects of the input instance to produce another instance. We assume the availability of a set of modification operators O. The set O is specific to each instance space, meaning that for each new problem domain, new operators must be defined.

The quality of the modification operators is crucial to the performance of our algorithm. For the algorithm to perform well, the modification operators must be able to create a diverse set of new instances given an input, while also keeping their output within the instance space X. Using our notation for modification operators, we can define the closure of a given core set S under O as the set of all instances reachable by finite sequences of operator applications: closure_O(S) = {op_k(⋯ op_1(x) ⋯) : x ∈ S, op_1, …, op_k ∈ O, k ≥ 0}. We would like to define the operators such that closure_O(S) is as large and diverse as possible. We will later discuss the implementation of the modification operators for the textual domain.
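
As an illustration, the following sketch (our own, under the assumption that operators are plain functions on sentences) computes the closure of a core set up to a bounded number of operator applications; the full closure may be infinite:

```python
# A sketch (ours) of the bounded closure of a core set under a set of
# modification operators; names and signatures here are illustrative.
from typing import Callable, Iterable, List, Set

Operator = Callable[[str], str]  # a modification operator: instance -> instance

def bounded_closure(core_set: Iterable[str], operators: List[Operator],
                    depth: int) -> Set[str]:
    """All instances reachable from core_set by at most `depth` applications."""
    reached: Set[str] = set(core_set)
    frontier: Set[str] = set(core_set)
    for _ in range(depth):
        # apply every operator to every instance on the current frontier
        frontier = {op(x) for x in frontier for op in operators}
        reached |= frontier
    return reached
```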

2.2.2 Instance Evaluation Functions

In order to evaluate the synthesized membership queries, an evaluation function is required. One option is to use existing active-learning utility functions u, which operate on feature vectors. These functions are designed to assign higher values to more informative instances. We can compose them with the feature mapping f and obtain the instance evaluation function H = u ∘ f, i.e., H(x) = u(f(x)).

Well-known AL heuristic functions include Uncertainty Sampling [17], Expected Model Change [18] and Query-by-Committee [27], which help score unlabeled instances according to their value to the learner. Pool-based active learning approaches use these utility functions greedily in order to decide which instances to label.
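
A minimal sketch of this composition, with uncertainty sampling as u, is given below (ours; it assumes a scikit-learn-style classifier and vectorizer already fitted on the labeled core set, and the function names are our own):

```python
# A sketch (ours) of H = u ∘ f with uncertainty sampling as the utility u.
import numpy as np

def uncertainty(probas: np.ndarray) -> np.ndarray:
    # Binary case: the score is highest when p(positive) is closest to 0.5.
    return 1.0 - 2.0 * np.abs(probas[:, 1] - 0.5)

def make_instance_evaluator(vectorizer, classifier):
    """Compose the feature mapping f with the AL utility u."""
    def H(sentences):
        X = vectorizer.transform(sentences)              # f: instance -> feature vector
        return uncertainty(classifier.predict_proba(X))  # u: feature vector -> score
    return H
```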

2.2.3 Stochastic Query Synthesis

A simple way of utilizing the modification operators is to apply them in random order to the core set of instances. The algorithm maintains a set of instances S. We initialize S with the core set, and in each step we randomly choose an instance from S and apply a random operator to it. The resulting new instance is added to S. At the end, the algorithm returns the new instances in S, i.e., those it generated, as the MQs. After the set of potential MQs is generated, an instance evaluation function is applied to select which instances to send to the oracle for labeling. The pseudo-code for stochastic query synthesis can be seen in Algorithm 1.

2.2.4 Query Synthesis using Search Algorithms

The stochastic synthesis algorithm can be improved by treating the instance space as a search space and actively seeking the more informative instances to generate. Let us define the instance search space as follows: the states are members of the instance space; the initial states can be any member of the instance space; the actions are the modification operators explained in Section 2.2.1; and the heuristic value of a state is the score given by the instance evaluation function, as discussed in Section 2.2.2. A search algorithm that applies modification operators with the goal of maximizing some heuristic value is more likely than the stochastic synthesis algorithm to generate instances with a high evaluation score. It is also difficult, using stochastic synthesis, to apply many modification operators to a single instance. As search algorithms apply operators one after the other, we expect that using them will solve this problem as well.

Input : seed - core set of initial instances
OPs - instance modification operators
K - the number of membership queries we want to synthesize
Output : set of membership queries
1 S = seed;
2 while |S \ seed| < K do
3       base = randomly choose an instance from S;
4       op = randomly choose a modification op from OPs;
5       new_inst = apply(base, op);
6       S = S ∪ {new_inst};
7 return S \ seed;
Algorithm 1 Stochastic query synthesis
Input : seed - core set of initial instances
OPs - instance modification operators
H - instance evaluation function
searchAlg - local search algorithm
K - the number of membership queries we want to synthesize
Output : set of membership queries
1 S = seed;
2 while |S \ seed| < K do
3       base = randomly choose an instance from S;
4       new_inst = apply(searchAlg, base, OPs, H);
5       S = S ∪ {new_inst};
6 return S \ seed;
Algorithm 2 Search-based query synthesis

Our search-based algorithm is listed in Algorithm 2. Similarly to the stochastic synthesis algorithm, the search-based algorithm maintains a set of instances S. We initialize S with the core set, and in each step we randomly choose an instance from S and run a search algorithm, such as beam search [25] or hill climbing [25], with the chosen instance as the initial state. The resulting new instance is added to S. At the end, the algorithm returns the new instances in S, i.e., those it generated, as the MQs. As we can see, the search-based algorithm can work with any local search algorithm and with any active-learning evaluation function.
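
For concreteness, here is a runnable sketch of Algorithm 2 with hill climbing as the local search (our own sketch, not the released implementation; `operators` and `H` follow the earlier sketches):

```python
# A sketch (ours) of search-based MQ synthesis with hill climbing.
import random
import numpy as np

def hill_climb(base, operators, H, depth=4):
    current, current_score = base, H([base])[0]
    for _ in range(depth):
        neighbors = [op(current) for op in operators]
        if not neighbors:
            break
        scores = H(neighbors)
        best = int(np.argmax(scores))
        if scores[best] <= current_score:
            break                       # local maximum reached
        current, current_score = neighbors[best], scores[best]
    return current

def search_based_synthesis(seed, operators, H, K, max_tries=1000):
    instances = set(seed)
    for _ in range(max_tries):          # cap tries: operators may yield duplicates
        if len(instances) - len(seed) >= K:
            break
        base = random.choice(sorted(instances))
        instances.add(hill_climb(base, operators, H))
    return instances - set(seed)        # return only the synthesized MQs
```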

2.3 Textual Instance Space

In this work, we focus on using our approach in the textual domain. At this stage we limit our study to sentences, but we plan to extend it to larger textual objects. As discussed in the introduction, generating instances in the textual domain is especially problematic. Synthesized sentences can easily become unreadable when not treated carefully. Furthermore, common representations, such as BOW, are not surjective: there are feature vectors to which no instance maps. For example, the feature vector for the word set {“man", “dog", “cat"} does not correspond to any legal sentence. These representations are also not injective: there is no 1-1 correspondence between the instance space and the feature space. As we saw in the introduction, the sentences “man bites dog" and “dog bites man" receive the same BOW vector. These limitations prevent us from generating examples in the feature space and mapping them back to the instance space to be labeled.

To apply our methodology to the textual domain, we need to define the instance space and the modification operators (our methodology is independent of the feature space). We define the instance space for the text classification domain to be the set of all syntactically and semantically legal sentences in English. A complete example of our modification operators is presented in the supplementary material.

2.3.1 Modification Operators in the Textual Domain

To define our modification operators, we must first define a distance function between words and the semantic neighborhood it induces. We define the semantic neighborhood of a word as the set of words that hold related meaning and can be used in similar contexts. Our modification operators substitute words in a sentence with other words from their semantic neighborhood, generating new legal sentences.

For now, we assume the existence of a function dist(w_1, w_2) that measures the semantic distance between two words. Let us define the k-semantic neighborhood of a particular word w, denoted N_k(w), as the k words closest to w under dist.

Using the semantic neighborhood, we can now define the modification operators for a given sentence s. First, all verbs, nouns and adjectives in s are marked as replaceable words. Then, the k-semantic neighborhood of each replaceable word is calculated. A candidate operator replaces a replaceable word with a member of its k-semantic neighborhood. The candidate operators are then filtered to keep only those that retain the syntactic structure of the original sentence.
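
A sketch of this construction is shown below (ours, assuming a gensim KeyedVectors embedding such as the dependency-based vectors discussed next, and NLTK for POS tagging; the vector-file path is hypothetical, and the paper's syntactic filtering is more involved than the POS-sequence check used here):

```python
# A sketch (ours) of textual modification operators via k-semantic neighborhoods.
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk
from gensim.models import KeyedVectors

# Hypothetical path to pre-trained dependency-based word2vec vectors.
word_vectors = KeyedVectors.load_word2vec_format("deps.words.txt", binary=False)

def k_semantic_neighborhood(word, k=10):
    if word not in word_vectors:
        return []
    return [w for w, _ in word_vectors.most_similar(word, topn=k)]

def candidate_operators(sentence, k=10):
    """Yield new sentences with one replaceable word substituted."""
    tokens = nltk.word_tokenize(sentence)
    tags = [t for _, t in nltk.pos_tag(tokens)]
    for i, (word, tag) in enumerate(zip(tokens, tags)):
        if tag.startswith(("NN", "VB", "JJ")):     # nouns, verbs, adjectives
            for neighbor in k_semantic_neighborhood(word, k):
                new_tokens = tokens[:i] + [neighbor] + tokens[i + 1:]
                # keep only substitutions that preserve the POS sequence
                if [t for _, t in nltk.pos_tag(new_tokens)] == tags:
                    yield " ".join(new_tokens)
```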

2.3.2 Computing the Semantic Distance

In order to calculate dist, we can use existing methods for computing semantic distance. We considered four existing methods: WordNet [14], Word2vec [19], GloVe [22], and Dependency Word2vec [16].

Given a word, our goal is to find a diverse set of words that are semantically related to it. For a word to serve as a possible replacement within a specific context, it has to adhere to two general rules: First, it has to be functionally similar [29] to the original, meaning that the two words behave similarly in their contexts. Second, it has to be semantically related to the original word. For example, “book" and “dog" can be functionally similar to “cat", but in the sentence “I want to pet this cat", it is acceptable to replace “cat" with “dog" but not with “book".

For a qualitative analysis of each method with respect to these two rules, we refer the reader to the supplementary material. We chose to use Dependency Word2vec with a semantic neighborhood of size k = 10 for the empirical evaluations that follow.

3 Empirical Evaluation

We analyzed the performance of our framework on 5 publicly available sentence classification datasets. The code for all experiments is available at https://github.com/jonzarecki/textual-membership-queries.

3.1 Experimental Methodology

In this subsection we discuss the methodology of our experiments: the datasets used and the methods compared. Our experimental design is presented in the supplementary material.

3.1.1 Datasets

We report results on 5 binary sentence classification datasets: three sentiment analysis datasets, one sentence subjectivity dataset, and one hate-speech detection dataset.

  • CMR: Cornell sentiment polarity dataset [21].
  • SST: Stanford sentiment treebank, a sentence sentiment analysis dataset [28].
  • KS: A Kaggle short-sentence sentiment analysis dataset, available at https://www.kaggle.com/c/si650winter11.
  • HS: Hate speech and offensive language classification dataset [9].
  • SUBJ: Cornell sentence subjective/objective dataset [20].

3.1.2 Simulating the Human Oracle

As in most works on active learning, we require a human expert to label our unlabeled instances. However, because we generate different MQs in every run and need to label these instances every time, a very significant labeling effort would be required. To address this problem, we borrow an idea from work on feature labeling [11] and simulate a human labeler. To make the artificial setting as close as possible to a real-world setting, the artificial expert is a learning model trained on the entire dataset, and for each dataset we chose a model that is close to the state of the art for the task. For SST, CMR, SUBJ and KS we used the open-source implementation of Generating Reviews and Discovering Sentiment [23], which achieved state-of-the-art results for CMR and SUBJ and 94% accuracy on KS (available at https://github.com/openai/generating-reviews-discovering-sentiment). For the hate-speech (HS) dataset we used a BOW-based classifier, which achieved 91% accuracy. The cross-validation accuracies of the artificial experts are 86.8% on CMR, 94.4% on SUBJ, 86.2% on SST, 91.0% on HS and 94.5% on KS. Since the expert achieves close to state-of-the-art performance on each dataset, it is the closest available simulation of a human expert.
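
For illustration, a simulated oracle can be built as in the following sketch (ours; TF-IDF with logistic regression stands in for the stronger, near-state-of-the-art experts actually used):

```python
# A sketch (ours) of a simulated oracle: a model fitted on the full labeled
# dataset stands in for the human labeler. The model choice is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_simulated_oracle(all_sentences, all_labels):
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(all_sentences, all_labels)
    # the returned callable labels any synthesized sentence with 0 or 1
    return lambda sentence: int(model.predict([sentence])[0])
```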

3.1.3 Compared Methods

In the following experiments, we compared our methods with other approaches, as follows.
Our methods: a) Uncertainty sampling hill-climbing MQ synthesis (US-HC-MQ): the proposed search-based synthesis method, using hill climbing as the search algorithm and uncertainty sampling as the heuristic function. We used an average depth of 4 for the hill-climbing algorithm; a sensitivity test for the depth is presented in the supplementary material. b) Uncertainty sampling beam-search MQ synthesis (US-BS-MQ): the proposed search-based synthesis method, using beam search as the search algorithm and uncertainty sampling as the heuristic function. We used an average depth of 4 for the beam-search algorithm. c) Stochastic synthesis (S-MQ): a degenerate version of our method, described in detail in Section 2.2.3.

As no other work attempted to build membership queries in the textual domain prior to this one, a direct comparison with similar works was not possible. We therefore chose three somewhat similar approaches to the generation/augmentation of textual instances for comparison.
Competitor methods: a) Original examples (IDEAL): randomly select a set of unlabeled examples, label them with their original labels, and insert them into the training set. This method has the unfair advantage of using a pool of unlabeled instances not available to the other methods; it is presented in an ‘upper-limit’ role, enabling us to see what would happen if we had unlabeled examples. b) RNN generator (RNN): a method for generating sentences with an LSTM (RNN) model. We trained the models with only the core set of instances. c) WordNet-based data augmentation (WNA): a possible approach to text augmentation, where words are replaced with their respective synonyms from WordNet [32].

3.2 Experiment 1 - Batch Active Learning with Membership Queries

We measured the performance of our MQ synthesis framework in a batch AL setup, where at each step a pool of unlabeled instances is generated and then m (batch size) samples are chosen to be labeled and incorporated into the core set.

In each step, we use the pool generation function ‘gen_MQs’ to return a pool of unlabeled instances of size P. The heuristic function H is then used to extract the m most informative instances as a batch to be labeled by the expert. These labeled instances are then incorporated into the training set and used to train a model. The model’s accuracy is then measured on the test set. Additional details concerning the experimental design can be found in the supplementary material.
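
The loop can be sketched as follows (ours; `gen_MQs`, `H`, and `oracle` follow the earlier sketches, and P, m, and n_steps mirror the pool size, batch size, and number of AL steps):

```python
# A sketch (ours) of the batch active-learning loop with synthesized MQs.
import numpy as np

def batch_active_learning(core_x, core_y, gen_MQs, H, oracle, train_model,
                          P=20, m=5, n_steps=10):
    train_x, train_y = list(core_x), list(core_y)
    model = train_model(train_x, train_y)
    for _ in range(n_steps):
        pool = gen_MQs(train_x, P)                 # synthesize P unlabeled MQs
        scores = H(pool)                           # informativeness of each MQ
        batch = [pool[i] for i in np.argsort(scores)[::-1][:m]]
        train_x += batch
        train_y += [oracle(s) for s in batch]      # the expert labels the batch
        model = train_model(train_x, train_y)      # retrain on the enlarged set
    return model
```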

RNN results are not shown because the sentences generated by this approach were not comprehensible. Training a neural-network model with only 10 sentences is impractical, as these models are usually fed large amounts of data (see https://blog.openai.com/generative-models/). Some examples of the generated sentences: “r and lapaglia ." and “x this film ."

Figure 1 plots the accuracy curves as a function of the number of queries generated by our algorithm and by the competitor methods. We used a core set of 10 sentences, a pool size of 20, an AL batch size of 5, and the uncertainty-sampling [17] heuristic function as H for all experiments.

Figure 1: Comparison of accuracy achieved by the different methods: US-HC-MQ (red circle), US-BS-MQ (blue star), S-MQ (green pentagon), IDEAL (gray down triangle), WNA (purple up triangle)

The two search-based approaches (US-HC-MQ and US-BS-MQ) both exhibited excellent performance across the 5 datasets. The comparison of the search-based approaches to S-MQ showed that, as we expected, more valuable examples are obtained when the utility function is used in the generation process. WNA performed admirably considering that, in principle, it uses only a small semantic neighborhood and therefore obtains only synonyms. However, its lack of diversity resulted in low accuracy on some datasets. Its reliance on synonyms carries another significant disadvantage: the limited number of synonyms available in WordNet makes it unable to generate a large pool of instances. In comparison, our method can theoretically generate as many instances as required.

Finally, the Friedman test [10] with Bonferroni-Dunn post-hoc testing shows a clear statistical advantage of the search-based methods over WNA, even though a relatively small number of datasets (5) was used for the test. The test diagram is presented in the supplementary material.

3.3 Experiment 2 - The Effect of the Synthesis Algorithm on Label Switches

In our framework, instances are also able to change their original label. In this experiment we tested the effect of our synthesis algorithms on the number of label changes they produce. Three algorithms were compared: uncertainty hill-climbing (US-HC-MQ), stochastic hill-climbing (S-HC-MQ), and stochastic synthesis (S-MQ). US-HC-MQ uses a heuristic function to direct its search, S-HC-MQ applies multiple operators randomly, and S-MQ randomly applies only one operator at a time, as described in Algorithms 1 and 2. We randomly chose a core set of 10 instances for each dataset and used each synthesis algorithm to generate 50 examples. The score for each algorithm is the proportion of the examples it generated that changed their original label. We repeated the experiment 20 times with different core sets and report the average results on all available datasets.
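
The score can be computed as in the following sketch (ours; `pairs` holds each synthesized instance together with its source, and `oracle` follows the earlier sketch):

```python
# A sketch (ours) of the label-switch score: the proportion of synthesized
# instances whose oracle label differs from that of their source instance.
def label_switch_score(pairs, oracle):
    """pairs: iterable of (source_instance, synthesized_instance) tuples."""
    pairs = list(pairs)
    switches = sum(oracle(src) != oracle(new) for src, new in pairs)
    return switches / len(pairs)
```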

Figure 2 shows a clear hierarchy: US-HC-MQ produces the most label changes, followed by S-HC-MQ, and then S-MQ. This result reinforces our hypothesis that applying multiple operators to a single sentence, as well as using heuristic functions during generation, results in more diverse sentences. The instances that changed labels are “near misses" [12]: while they originally belonged to a certain class, our sequence of modification operators caused them to switch class without significant changes to the instance.

Figure 2: The effect of the synthesis algorithm on the number of changed labels

In addition to previous experiments, a qualitative analysis of the sentences generated by our approach is available in the supplementary material.

4 Related Work

In this section we will further discuss three works that are of particular relevance to the topic of membership queries in the textual domain.

X-local membership queries were first introduced by Awasthi et al. [5]; an X-local query is a query to any point for which there exists a point in the training sample with Hamming distance lower than X. Bary [6] built on this idea and proved that even a learning model that uses only 1-local MQs (queries for which there exists a point in the training sample at Hamming distance 1) is stronger than the standard PAC learning model. Bary also employed a method similar to 1-local queries to gather information for the task of sentiment analysis.

However, neither Bary nor Awasthi applied the idea of local membership queries to instances; rather, they applied it to the feature vectors representing those queries. As we discussed in Section 2.1, it is usually impossible to present an altered feature vector to a human oracle, as was done in these works. Thus, the idea of local membership queries has remained mainly theoretical.

In previous work [12] we presented the idea of modification operators that remain in the instance space. In contrast to the operators applied by Awasthi and Bary, these operators are applied to instances, easily read by a human expert, and return other instances. That work used a small seed of only positive instances to model the entire instance space, generating “near-miss examples" in order to effectively model the vast space of negative instances. However, it was applicable only to the image domain, and its operators obviously could not be applied to textual sentences. Nor did it fully explore the options available when generating with modification operators, such as using search algorithms and heuristic functions during the MQ generation process; it focused mainly on generating near-miss examples.

Several works have discussed the topic of textual data augmentation [32, 24], where existing examples from the training set are augmented into other very similar instances. However, the augmented instances are not allowed to change their class and are thus limited to synonyms, which limits the variety of instances generated. Indeed, our empirical evaluation showed Zhang & LeCun’s method [32] to be less effective than our suggested methods.

5 Conclusions

The goal of this work was to show membership queries in a more practical light. We presented a novel modification-operator-based framework for generating membership queries that remain within the instance space and are recognizable to the human oracle. Using this framework, we presented a local-search-based algorithm for generating MQs that uses information from an additional utility function to direct the search toward highly valuable instances. We implemented our approach in the textual domain and demonstrated that our modification operators result in legal sentences. We evaluated our methods on several datasets, finding high accuracy gains when using MQs. Somewhat surprisingly, the approach is sometimes competitive with an approach that utilizes a pool of unlabeled instances not available to our MQ framework. To further this line of research, we have released the implementation of our textual MQs as open-source software at https://github.com/jonzarecki/textual-membership-queries.

Our results motivate a few interesting directions for future work that we plan to explore. First, our results indicate that there is much more information to be found in examples already present in the training set. Thus, our modification operators could be used within an augmentation setup. In addition, they could serve as a means of oversampling, as an alternative to SMOTE [8], for example, which performs its oversampling in the feature space. Second, our results indicate that using utility functions (such as uncertainty sampling) does yield more valuable instances. We therefore plan to explore ways to enrich existing pools of unlabeled instances with modification operators, using a method similar to the one presented here. This direction may enhance the results of existing methods for pool-based AL.


References

  • [1] Charu C Aggarwal and ChengXiang Zhai. A Survey of Text Classification Algorithms. In Charu C Aggarwal and ChengXiang Zhai, editors, Mining Text Data, pages 163–222. Springer US, Boston, MA, 2012.
  • [2] Dana Angluin. Learning Regular Sets from Queries and Counterexamples. Inf. Comput., 75(2):87–106, 1987.
  • [3] Dana Angluin. Queries and Concept Learning. Machine Learning, 2(4):319–342, 1988.
  • [4] Les E Atlas, David A Cohn, and Richard E Ladner. Training Connectionist Networks with Queries and Selective Sampling. In D S Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 566–573. Morgan-Kaufmann, 1990.
  • [5] Pranjal Awasthi, Vitaly Feldman, and Varun Kanade. Learning using local membership queries. COLT, 30:1–34, 2013.
  • [6] Galit Bary. Learning Using 1-Local Membership Queries. PhD thesis, The Hebrew University of Jerusalem, 2015.
  • [7] N H Bshouty. Exact Learning Boolean Functions via the Monotone Theory. Information and Computation, 123(1):146–153, 1995.
  • [8] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
  • [9] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. Automated Hate Speech Detection and the Problem of Offensive Language. In Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM ’17, pages 512–515, 2017.
  • [10] Janez Demšar. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res., 7:1–30, 2006.
  • [11] Gregory Druck, Burr Settles, and Andrew McCallum. Active Learning by Labeling Features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, EMNLP ’09, pages 81–90, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
  • [12] Nela Gurevich, Shaul Markovitch, and Ehud Rivlin. Active learning with near misses. Proceedings of the National Conference on Artificial Intelligence, 21(1):362, 2006.
  • [13] Zellig S. Harris. Distributional Structure. Word, 10(2-3):146–162, 1954.
  • [14] Mario Jarmasz. Roget’s Thesaurus as a Lexical Resource for Natural Language Processing. CoRR, abs/1204.0, 2012.
  • [15] Kevin J. Lang and Eric B Baum. Query Learning Can Work Poorly when a Human Oracle is Used. In IJCNN 1992, pages 335–340, 1992.
  • [16] Omer Levy and Yoav Goldberg. Dependency-Based Word Embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 302–308, 2014.
  • [17] David D Lewis and William A Gale. A Sequential Algorithm For Training Text Classifiers. Proceedings of the 17th International Conference on Research and Development in Information Retrieval (SIGIR’94), 94:3–12, 1994.
  • [18] Michael Lindenbaum, Shaul Markovitch, and Dmitry Rusakov. Selective Sampling for Nearest Neighbor Classifiers. Machine Learning, 54(2):125–152, 2004.
  • [19] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, pages 3111–3119, USA, 2013. Curran Associates Inc.
  • [20] Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the ACL, 2004.
  • [21] Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL, 2005.
  • [22] Jeffrey Pennington, Richard Socher, and Christopher D Manning. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, 2014.
  • [23] Alec Radford, Rafal Józefowicz, and Ilya Sutskever. Learning to Generate Reviews and Discovering Sentiment. CoRR, abs/1704.0, 2017.
  • [24] Ryan Rosario. A Data Augmentation Approach to Short Text Classification. PhD thesis, University of California, Los Angeles, 2017.
  • [25] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2009.
  • [26] Burr Settles. Computer Sciences Active Learning Literature Survey. Technical Report January, University of Wisconsin–Madison, 2009.
  • [27] H S Seung, M Opper, and H Sompolinsky. Query by Committee. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, number April in COLT ’92, pages 287–294, New York, NY, USA, 1992. ACM.
  • [28] Richard Socher, Alex Perelygin, and Jy Wu. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, 2013.
  • [29] Peter D. Turney and Patrick Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188, 2010.
  • [30] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 11 1984.
  • [31] Bishan Yang, Sun Jian-Tao, Tengjiao Wang, and Zheng Chen. Effective Multi-Label Active Learning for Text Classification. In The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, page 917. Association for Computing Machinery, Inc., 2009.
  • [32] Xiang Zhang and Yann LeCun. Text Understanding from Scratch. CoRR, abs/1502.0, 2015.