Knowledge Base Relation Detection via Multi-View Matching
Relation detection is a core component of Knowledge Base Question Answering (KBQA). In this paper, we propose a knowledge base (KB) relation detection model based on multi-view matching, which utilizes useful information extracted from questions and the KB. Within each view, the two input texts are compared thoroughly from multiple perspectives. All these components are trained in an end-to-end neural network model. Experiments on SimpleQuestions and WebQSP yield state-of-the-art results on relation detection.
Knowledge Base Question Answering (KBQA) systems query a knowledge base (KB) (e.g., Freebase, DBpedia) to answer questions Berant et al. (2013); Yao et al. (2014); Bordes et al. (2015); Bast and Haussmann (2015); Yih et al. (2015); Xu et al. (2016). To transform a natural language question into a KB query, a KBQA system needs to perform at least two sub-tasks: (1) detect KB entities appearing in a question and (2) detect a KB relation associated with a question. A KBQA system usually includes separate components to accomplish these sub-tasks. This paper focuses on the second sub-task, frequently referred to as relation detection, which identifies the KB relation(s) expressed by a given question. As discussed in Yu et al. (2017), relation detection remains a bottleneck of KBQA systems owing to its inherent difficulty.
KB relation detection is more challenging than general relation detection Zhou et al. (2005); Rink and Harabagiu (2010); Sun et al. (2011); Nguyen and Grishman (2014); Gormley et al. (2015); Nguyen and Grishman (2015) in the field of information extraction (IE) for two key reasons: (1) the number of relations to predict is usually large (1k) and (2) relations in a test set may not be seen during training. Previous work mainly focused on the large relation vocabulary problem and zero-shot relation learning. For example, Dai et al. (2016) use pre-trained relation embeddings from TransE Bordes et al. (2013) to initialize the relation representations. Yin et al. (2016) and Liang et al. (2016) factorize relation names into word sequences, motivated by the fact that KB relation names usually comprise meaningful word sequences. Yih et al. (2015) and Golub and He (2016) represent questions at the character level. Yu et al. (2017) propose to use both granularities, relation names and words, in a hierarchical encoding and matching framework.
In this paper, we propose to improve KB relation detection by exploiting multiple views i.e., by leveraging more information from KB to obtain better question-relation matching. Besides frequently used relation names, we propose to make use of entity type(s) a relation can logically have as objects (i.e., object in a KB triple ). For instance, for a given question “What country is located in the Balkan Peninsula?”, the correct relation is and the object type for this relation is . We hypothesize that, in addition to relation names, it may also be useful to match this question against the object entity type (i.e., ) since the question has the word “located”, indicating that the answer to this question is a location.
Our contributions are two-fold. (1) We formulate relation detection as a multi-view matching task, where multiple views of information from both question and relation are extracted. We use an attention-based model to compare question and relation from multiple perspectives in each view. (2) We exploit object entity types, automatically extracted from KB, in our multi-view matching model. These two contributions help us achieve state-of-the-art KB relation detection accuracies on both WebQSP and SimpleQuestions datasets.
2 Related Work
Relation extraction (RE) was originally studied as a sub-field of information extraction. Traditional RE methods assume a (small) pre-defined relation set: given a text sequence and two target entities, the goal is to decide which relation, if any, holds between the two entities. RE is therefore usually framed as a classification task. Most of these methods require manually engineering a large number of features Sun et al. (2011); Zhou et al. (2005); Rink and Harabagiu (2010). With recent advances in machine learning, and deep learning in particular, many recently proposed RE approaches replace hand-crafted features with learned ones. These range from pre-trained word embeddings Gormley et al. (2015); Nguyen and Grishman (2014) to deep neural networks such as convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) Vu et al. (2016); Zeng et al. (2014); dos Santos et al. (2015), and attention models Zhou et al. (2016); Wang et al. (2016), which have proved important for many other NLP tasks such as machine translation, named entity recognition, and reading comprehension.
As noted above, most RE methods make the strong assumption that a fixed (i.e., closed) set of relation types is given as prior knowledge, so no zero-shot learning capability (i.e., detecting new relations that did not occur during training) is required. Another commonality among these RE methods is that the relation set is usually small. For example, the widely used ACE2005 has 11/32 coarse/fine-grained relations; SemEval2010 Task8 has 19 relations; and TAC-KBP2015 has 74 relations, although it considers open-domain Wikipedia relations. In contrast, KBQA usually involves thousands of relations. Most RE approaches may therefore not work well when directly adapted to a large number of relations or to unseen relations. Yu et al. (2016) use relation embeddings in a low-rank tensor method. However, their relation embeddings are still trained in a supervised way, and the relation set used in their experiments is still not large.
Relation Detection in KBQA Systems
Similar to how RE methods evolved over time, relation detection methods for KBQA were originally based on many hand-crafted features Bast and Haussmann (2015); Yao and Van Durme (2014). Later, researchers started to explore the benefits of simple deep neural networks Dai et al. (2016); Yih et al. (2015); Xu et al. (2016) and of more advanced ones, including attention models Golub and He (2016); Yin et al. (2016).
To work well for open-domain question answering, much of the relation detection research for KBQA is designed to support large relation sets and even open relation sets. For datasets such as ParaLex Fader et al. (2013) and SimpleQuestions, the capacity to support large relation sets and unseen relations is necessary. Other KBQA datasets do not require these abilities because of the unnatural distribution of their test data: most of the gold test relations can be observed during training. WebQuestions is one example; this property makes it less of an open-domain task, so supporting the full relation vocabulary and zero-shot learning becomes less pressing. Some prior work on this task therefore adopted the closed-domain assumption, as in general RE research.
To support open QA, there are two main solutions for relation detection: (1) use pre-trained relation embeddings (e.g., from TransE Bordes et al. (2013)), as in Dai et al. (2016); (2) factorize the relation names into sequences and formulate relation detection as a sequence matching and ranking task. Such factorization works because relation names usually comprise meaningful word sequences, especially for OpenIE patterns such as those in ParaLex. For example, Yin et al. (2016) split relations into word sequences for single-relation detection. Liang et al. (2016) also achieved good performance on WebQSP with word-level relation representations in an end-to-end neural programmer model. Yih et al. (2015) used character tri-grams as inputs on both the question and relation sides. Golub and He (2016) proposed a generative framework for single-relation KBQA that predicts the relation with a character-level sequence-to-sequence model.
Another significant difference between relation detection in KBQA and general RE is that general RE assumes both argument entities are available. It can therefore usually learn from features Gormley et al. (2015); Nguyen and Grishman (2014) or attention mechanisms Wang et al. (2016) based on the entity information (e.g., entity types or entity embeddings). In contrast, relation detection for KBQA mostly lacks this information: (1) a question usually contains a single argument (the topic entity) and (2) one KB entity can have multiple types (the type vocabulary size is larger than 1,500). This makes KB entity typing itself a difficult problem, so no previous work used entity information in the relation detection model. Such entity information has, however, been used in some KBQA systems as features for the final answer re-rankers.
3 Problem Overview
Formally, for an input question , the task is to identify the correct relation from a set of candidate relations . The problem thus becomes learning a scoring function for optimizing some ranking loss.
Both questions and relations have different views of input features. Each view can be written as a sequence of tokens (regular words or relation names). Therefore, for a view of relation , we have , where is the length of relation ’s word sequence for view . The same definition holds for the question side. Finally, we have the multi-view inputs for both a question and a relation , where and denote the number of views for and , respectively. Note that and may not be equal.
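To make the ranking formulation concrete, the following is a minimal sketch of scoring candidates and computing a margin-based ranking loss. The text above does not say which ranking loss is used, so the hinge form, the margin value, and the names `hinge_ranking_loss`, `predict`, `scores`, and `gold_index` are all assumptions made for illustration.

```python
from typing import Sequence


def hinge_ranking_loss(scores: Sequence[float], gold_index: int,
                       margin: float = 0.5) -> float:
    """Sum of margin violations between the gold relation and each other candidate.

    `scores[i]` is the (assumed) model score of candidate relation i for one question.
    """
    gold = scores[gold_index]
    loss = 0.0
    for i, score in enumerate(scores):
        if i == gold_index:
            continue
        # A candidate contributes only if it comes within `margin` of the gold score.
        loss += max(0.0, margin - gold + score)
    return loss


def predict(scores: Sequence[float]) -> int:
    """Index of the highest-scoring candidate relation."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

With scores `[0.9, 0.2, 0.6]` and the gold relation at index 0, only the third candidate violates the margin, and `predict` returns index 0.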
Views for KB Relation Detection
For an input question, we generate views from relation names and their corresponding tail entity types and use three pairs of inputs in the model (see Figure 1).
entity name, entity mention pair captures entity information from question and KB.
relation name, abstracted question pair captures the interaction between an input question and a candidate relation. Following previous work Yin et al. (2016); Yu et al. (2017), we replace the entity mention in a question by a special token (“Balkan Peninsula” is replaced by e in Figure 1) to become an abstracted question, so that the model could focus better on matching a candidate relation name to the entity’s context in a question.
relation tail entity types, abstracted question pair helps determine how well relation tail types match with a question. Section 4 describes how we extract and use tail entity types.
For an input question, the first pair of inputs remains the same for all candidate relations to help the model differentiate between the candidates. Therefore, unlike the other pairs of inputs, this pair does not need to be compared thoroughly via multi-perspective matching.
For inputs to the 2nd and 3rd view, we generate two matching feature vectors, one for each of the directions of matching (i.e., for a pair , the directions are and ). Finally, the model combines these two pairs of interaction information to have a high-level joint view. The joint view helps us detect the most promising relation given how the question matches with the candidate relation names and the corresponding tail entity types. We present more details in Section 5 and present the experimental results with different combinations of views in Section 6.
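The three input pairs described above can be sketched as follows. This is only an illustration under stated assumptions: the placeholder token `<e>`, whitespace tokenization, the dot/underscore splitting of relation names, and the function names are hypothetical, and the relation and type identifiers in the usage note are made up rather than taken from the KB.

```python
from typing import Dict, List, Sequence, Tuple

ENTITY_TOKEN = "<e>"  # hypothetical placeholder for the entity mention


def abstract_question(question: str, mention: str) -> List[str]:
    """Replace the entity mention with a placeholder, then split on whitespace."""
    return question.replace(mention, ENTITY_TOKEN).split()


def name_to_words(name: str) -> List[str]:
    """Factorize a dotted, underscore-separated identifier into words."""
    return name.replace(".", " ").replace("_", " ").split()


def build_views(question: str, mention: str, entity_name: str, relation: str,
                tail_types: Sequence[str]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return the three (KB side, question side) input pairs for one candidate."""
    q_abs = abstract_question(question, mention)
    type_words = [w for t in tail_types for w in name_to_words(t)]
    return {
        "entity": (entity_name.split(), mention.split()),
        "relation": (name_to_words(relation), q_abs),
        "tail_type": (type_words, q_abs),
    }
```

For example, `build_views("What country is located in the Balkan Peninsula?", "Balkan Peninsula", "Balkan Peninsula", "example.region.contains", ["example.location"])` pairs the words `example region contains` with the abstracted question, whose last token is `<e>?` because tokenization here is whitespace-only.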
4 Relation Tail Entity Types Extraction
In this work, we propose to make use of entity type(s) a relation can logically have as tails (i.e., object in a KB triple ). More often than not, KB relations can only have tail entities of specific types. For instance, for our example question “What country is located in the Balkan Peninsula?”, the corresponding relation in Freebase is and the tail entity (i.e., the answer to the question) can only be of type . This and other relations such as , , , can only have locations as tail entities; however, the relations do not explicitly contain word(s) indicating the type of entities expected as answers. Motivated by this, we hypothesize that exploiting tail entity type information may improve relation detection performance. For our example, the learner may exploit the tail entity type (i.e., ) to learn that the relations are somewhat similar as they all share the same tail entity type and learn more generic representations for relations that have locations as tail types. Yin et al. (2017) also exploit tail entity types as they predict the answer entity type as an intermediate step before predicting an answer. In contrast, we describe next how we heuristically generate a short list of relevant tail entity types for each unique KB relation.
A tail entity in an instance of a relation may be associated with multiple types. Given the triple , , , has types ranging from as generic as to more specific ones such as , , and . Therefore, given the relation , it is crucial to prune the unrelated entity types (, ) and retain the relevant ones (, ). To achieve this, we first obtain at most 500 instances for each unique relation from Freebase (we empirically found that 500 instances were sufficient for our entity type extraction experiment). Next, we query for the types of each of the tail entities obtained in the first step (in Freebase, the relation lists the types for an entity). Finally, we retain only the types that at least 95% of the tail entities have. A default special token is used if we cannot find any tail entity type for a relation with this approach. Once the tail types are obtained for a particular relation, we form one string by concatenating the words in each of the tail types and use the string as the tail entity type string in the model described in Section 5.
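The pruning heuristic above can be sketched in a few lines. The 500-instance cap and the 95% threshold come from the text; everything else is an assumption for illustration, including the `types_of` lookup callback (standing in for a KB query), the `<no_type>` default token, and the way type names are split into words.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence

DEFAULT_TYPE = "<no_type>"  # hypothetical default token


def extract_tail_types(tail_entities: Sequence[str],
                       types_of: Callable[[str], Iterable[str]],
                       max_instances: int = 500,
                       min_coverage: float = 0.95) -> List[str]:
    """Keep the types shared by at least `min_coverage` of the sampled tail entities."""
    sampled = list(tail_entities)[:max_instances]
    if not sampled:
        return [DEFAULT_TYPE]
    counts: Counter = Counter()
    for entity in sampled:
        # Count each type once per entity, even if the lookup repeats it.
        counts.update(set(types_of(entity)))
    kept = sorted(t for t, c in counts.items() if c / len(sampled) >= min_coverage)
    return kept or [DEFAULT_TYPE]


def tail_type_string(types: Sequence[str]) -> str:
    """Concatenate the words of all retained types into one string."""
    words = [w for t in types for w in t.replace(".", " ").replace("_", " ").split()]
    return " ".join(words)
```

Taking the first `max_instances` entities is the simplest choice here; the text does not say how the instances are sampled.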
5 Model Architecture
Figure 1 illustrates the architecture of our model. Apart from the entity alias and entity span pair (henceforth referred to as the entity pair), each pair of inputs is matched from multiple perspectives, and then the matching representations of all pairs and the representations of the entity pair are aggregated for the final prediction. Next, we describe the four main components: the inputs, the context representation module, the matching module, and the aggregation module.
The inputs to all views in the model are word sequences, and our model encodes each sequence in two steps. First, the model constructs a -dimensional vector for each word with two components: a word embedding and a character-based embedding. A word embedding is a fixed, pre-trained vector (e.g., GloVe Pennington et al. (2014), word2vec Mikolov et al. (2013)). A character-based embedding is calculated by feeding each character (also represented by a vector) within a word into an LSTM.
Context Representation Module
The model leverages the same BiLSTM to encode all views of inputs. Then, the output contextual vectors of each BiLSTM are used in the matching modules. The contextual vectors for a question are fed into multiple matching modules to match with relation and tail types.
The purpose of this module is to incorporate contextual information into the representation of each time step of each input sequence. We utilize a bi-directional LSTM (BiLSTM) to encode contextual embeddings for each time step of the input sequence and obtain a hidden state for each word position. Separate parameters could be used for the question and relation encoders, but a single shared encoder for both works better in our experiments.
As RNN input, a word is represented by a row vector . can be the concatenation of word embedding and word features, though we do not use any additional word features. The word vector for the -th word is . A word sequence is processed using an RNN encoder with long-short term memory (LSTM) Hochreiter and Schmidhuber (1997), which was proved to be effective in many types of NLP tasks, including machine reading comprehension and neural machine translation tasks Bahdanau et al. (2015); Kadlec et al. (2016); Dhingra et al. (2016). For each position , LSTM computes with input and previous state , as:
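For reference, a standard LSTM update can be written as below. The notation is assumed here rather than recovered from the source: $x_t$ is the input row vector at position $t$, $h_t$ and $c_t$ are the hidden state and memory cell, $i_t$, $f_t$, $o_t$ are the input, forget, and output gates, $W_\ast$, $U_\ast$, $b_\ast$ are the parameters, $\sigma$ is the sigmoid function, and $\odot$ is the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(x_t W_i + h_{t-1} U_i + b_i\right) \\
f_t &= \sigma\left(x_t W_f + h_{t-1} U_f + b_f\right) \\
o_t &= \sigma\left(x_t W_o + h_{t-1} U_o + b_o\right) \\
\tilde{c}_t &= \tanh\left(x_t W_c + h_{t-1} U_c + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The bi-directional encoding of a position is then the concatenation of the forward and backward hidden states at that position.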
where , , , and are d-dimensional hidden state, input gate, forget gate, output gate and memory cell, respectively; , and , are the parameters of the LSTM; is the sigmoid function, and denotes the element-wise product. For a word at , we use the hidden state from the forward RNN as a representation of the preceding context, and the from a backward RNN that encodes the text in reverse, to incorporate the context after . Next, , the bi-directional contextual encoding of , is formed. is the concatenation operator. To distinguish hidden states from different sources, we denote the of -th word in and the of -th word in as and respectively.
The core task of relation detection is to calculate information interaction between relations and the given question. In this work, we design the matching module with attention models to match each view of a relation with a given question. The reason attention could be important here is that different views of relations usually correspond to different parts of questions. For example in Figure 1, the question words are usually more likely to indicate the relation types.
We modify the bilateral multiple perspective matching (BiMPM) model Wang et al. (2017), which performs comparably with state-of-the-art systems for several text matching tasks. We hypothesize that BiMPM could also be effective for relation detection since a unique view of a question may be required to match with either a relation or a tail entity type, and the matching method should match the relation with the question in multiple granularities and multiple perspectives.
In Figure 1, each box at the “Matching” layer is a single directional multi-perspective matching (MPM) module; two such boxes together form a BiMPM module. In our experiments, the modules on all views share the same parameters. Each MPM module takes two sequences, an anchor and a target, as inputs, and matches each contextual vector of the anchor with all the contextual vectors of the target. The arrows inside the MPM boxes in Figure 1 denote the direction of matching, i.e., anchor → target. To form a BiMPM, for instance, a question and a relation are first considered anchor and target, respectively, and then vice versa. During matching, a matching vector is calculated for each contextual vector of the anchor by composing all the contextual vectors of the target. Then, the model calculates similarities between the anchor contextual vector and the matching vector from multiple perspectives using the multi-perspective cosine similarity function.
The multi-perspective cosine matching function that compares two vectors is
$$m = f_m(v_1, v_2; W)$$
where $v_1$ and $v_2$ are two $d$-dimensional vectors, $W \in \mathbb{R}^{l \times d}$ is a trainable parameter, $l$ is the number of perspectives, and the returned value $m = [m_1, \ldots, m_k, \ldots, m_l]$ is an $l$-dimensional vector. Each element $m_k$ is a matching value from the $k$-th perspective, and it is calculated by the cosine similarity between two weighted vectors
$$m_k = \cos(W_k \circ v_1, W_k \circ v_2)$$
where $\circ$ is the element-wise multiplication, and $W_k$ is the $k$-th row of $W$, which controls the $k$-th perspective and assigns different weights to different dimensions of the $d$-dimensional space.
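The multi-perspective cosine function can be sketched in plain Python as follows. The names `v1`, `v2`, and `W` (one weight row per perspective) and the small `eps` guard against zero-norm vectors are assumptions of this sketch, not details stated in the text.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity, returning 0 when either vector has (near) zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def multi_perspective_cosine(v1: Sequence[float], v2: Sequence[float],
                             W: Sequence[Sequence[float]]) -> List[float]:
    """Return one cosine value per perspective (per row of W).

    Row k of W re-weights every dimension of both vectors before the
    cosine is taken, so each row acts as one perspective.
    """
    result = []
    for row in W:
        weighted_1 = [w * x for w, x in zip(row, v1)]
        weighted_2 = [w * y for w, y in zip(row, v2)]
        result.append(cosine(weighted_1, weighted_2))
    return result
```

With `W` holding `l` rows of length `d`, the function maps two `d`-dimensional vectors to an `l`-dimensional matching vector.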
The MPM module uses four matching strategies in this regard.
(1) Full-Matching: Each contextual vector of an anchor is compared with the last contextual vector of a target, which represents the entire target sequence.
(2) Max-Pooling-Matching: Each contextual vector of an anchor is compared with every contextual vector of the target with the multi-perspective cosine similarity function, and only the maximum value of each dimension is retained.
where is element-wise maximum.
(3) Attentive-Matching: First, the cosine similarities between all pairs of contextual vectors in the two sequences are calculated. Then the matching vector is calculated by taking the weighted sum of all contextual vectors of the target, where the weights are the cosine similarities computed above.
(4) Max-Attentive-Matching: This strategy is similar to Attentive-Matching except that, instead of taking the weighted sum of all the contextual vectors as the matching vector, it picks the contextual vector with the maximum cosine similarity from the target.
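The four strategies listed above can be sketched as follows for a single matching direction. This is an illustrative reading of the descriptions, not the authors' implementation: sequences are plain lists of contextual vectors, `W` holds one weight row per perspective, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def mp_cosine(v1: Vector, v2: Vector, W: Sequence[Vector]) -> List[float]:
    """Multi-perspective cosine: one value per row of W."""
    return [cosine([w * x for w, x in zip(row, v1)],
                   [w * y for w, y in zip(row, v2)]) for row in W]


def full_matching(anchor, target, W):
    """Compare each anchor vector with the last target vector."""
    return [mp_cosine(h, target[-1], W) for h in anchor]


def max_pooling_matching(anchor, target, W):
    """Compare with every target vector; keep the per-perspective maximum."""
    out = []
    for h in anchor:
        per_target = [mp_cosine(h, t, W) for t in target]
        out.append([max(column) for column in zip(*per_target)])
    return out


def attentive_matching(anchor, target, W):
    """Compare with the cosine-weighted sum of all target vectors."""
    dim = len(target[0])
    out = []
    for h in anchor:
        weights = [cosine(h, t) for t in target]
        summed = [sum(w * t[j] for w, t in zip(weights, target)) for j in range(dim)]
        out.append(mp_cosine(h, summed, W))
    return out


def max_attentive_matching(anchor, target, W):
    """Compare with the single target vector most similar to the anchor vector."""
    out = []
    for h in anchor:
        best = max(target, key=lambda t: cosine(h, t))
        out.append(mp_cosine(h, best, W))
    return out
```

Each function returns one `l`-dimensional matching vector per anchor position; running the same functions with anchor and target swapped gives the other direction of a BiMPM module.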
The first step in this module is to apply another BiLSTM on the two sequences of matching vectors individually. Then, we construct a fixed-length matching vector by concatenating vectors from the last time-step of the BiLSTM models. This is the representation of the overall matching for one view.
To combine the matching results from the different views of input pairs and the entity pair, an aggregation layer at the end takes the matching representations or scores from the different views, together with the feature representation extracted for the entity pair, and constructs a feature vector for relation prediction. In this work, we simply concatenate the matching representations generated for all the views by the matching modules. The combined representation of all views is transformed into a final prediction through a multi-layer perceptron.
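The aggregation step can be illustrated with a small sketch that concatenates the per-view vectors and scores the result with a perceptron. The text does not give layer sizes or activations, so the single ReLU hidden layer, the scalar output, and all names and weights below are assumptions for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def concatenate(views: Sequence[Vector]) -> List[float]:
    """Join the matching vectors of all views into one feature vector."""
    return [x for view in views for x in view]


def dense(x: Vector, weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Affine layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> List[float]:
    return [max(0.0, v) for v in x]


def score_relation(views: Sequence[Vector], W1: Sequence[Vector], b1: Vector,
                   w2: Vector, b2: float) -> float:
    """Score one candidate relation from its per-view matching vectors."""
    features = concatenate(views)
    hidden = relu(dense(features, W1, b1))
    return dense(hidden, [w2], [b2])[0]
```

At test time, the candidate relation with the highest such score would be selected.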
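Table 1: Relation detection accuracy (%) on WebQSP (WQ) and SimpleQuestions (SQ).
|Row||Model||Char. emb.||WQ||SQ|
|1||BiCNN Yih et al. (2015)||Y||77.74||90.0|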
|2||AMPCNN Yin et al. (2016)||N/A||-||91.3|
|3||Hier-Res-BiLSTM Yu et al. (2017)||N/A||82.53||93.3|
|8||(Q, Relation)(Q, Type)||Y||83.71||93.13|
|9||(Q´, Relation)(Q´, Type)||Y||84.74||93.38|
|10||(Q´, Relation)(Q´, Type)||N||84.86||93.52|
|11||(Entity Pair)(Q´, Relation)(Q´, Type)||Y||85.95||93.69|
|12||(Entity Pair)(Q´, Relation)(Q´, Type)||N||85.41||93.75|
We use two standard datasets: SimpleQuestions (SQ) Bordes et al. (2015) and WebQSP (WQ) Yih et al. (2016). Each question in these datasets is labeled with head entity and relation information. SQ has only single-relation questions, i.e., there is one , , triple per question. In contrast, WQ has both single- and multiple-relation questions. For a multiple-relation question, there are multiple relations on the path connecting a head to a tail entity. We adopt the same approach as Yu et al. (2017) to create positive and negative instances.
SimpleQuestions (SQ): The dataset has only single-relation questions, i.e., a head entity and a tail entity are connected by one relation. To compare with previous work Bordes et al. (2015), we use a subset of Freebase with 2M entities (FB2M) and the same training, validation, and test sets as Bordes et al. (2015). Yin et al. (2016) also evaluated their relation extractor on this dataset and released their proposed question-relation pairs, so we run our relation detection model on their data. The training set has 571k instances, and each question has about 8 candidate relations on average.
WebQSP (WQ): Unlike SimpleQuestions, WebQSP has both single- and multi-relation questions. In a multi-relation question, a head and a tail entity are connected via one or more CVTs (Compound Value Types), and there are multiple relations on the path connecting the head entity, the CVT(s), and the tail entity. We use the entire Freebase KB for our experiments on this dataset. The train and test sets are the same as the ones used in Yih et al. (2016). Following Yu et al. (2017), we evaluate relation detection models through a relation detection task derived from the WebQSP dataset, and we adopt their approach to create positive and negative instances. The training set has 215k instances, and each question has about 71 candidate relations on average.
We used development sets to pick the following hyper-parameter values: (1) the size of hidden states for LSTMs (300); (2) learning rate (0.0001); and (3) the number of training epochs (30). All word vectors are initialized with 300-dimensional GloVe embeddings Pennington et al. (2014). During testing, we predict the candidate relation with the highest confidence score.
Results and Analysis
Table 1 shows that our model yields state-of-the-art relation detection scores for both WQ (Row 11) and SQ (Row 12) by beating the previous best system Yu et al. (2017) by 3.42 and 0.45 points, respectively.
Rows 8-10 show that using relation and tail type as two separate inputs consistently outperforms the setting where they are provided as a single input (Rows 6-7). Rows 8-9 also show that replacing entity mentions in question texts helps our model focus more on the contextual parts of questions.
We found that using character embeddings on top of word embeddings does not have any significant impact. We hypothesize that this is due to the small number of KB relations and tail types. Although there are several thousands of these in Freebase, they are still much smaller in number compared to a vocabulary obtained from a large text corpus. Owing to this, there is little scope for character embeddings to capture prefix, suffix, or stem patterns that can otherwise be observed more frequently in a large corpus.
As the scores indicate, WQ is more difficult than SQ, and several reasons may contribute to this trend. First, owing to multi-relation questions, the average number of candidate relations per question is larger in WQ. Second, WQ has more questions that are close to real-world questions asked by humans. In contrast, the questions in SQ are synthetic in nature, as they are composed by looking at the true answer in the KB. Third, WQ needs more complex reasoning on the KB, as the path from the head entity to the answer often consists of multiple hops. As a result, scores for SQ are in the 90s, whereas there is still room for improvement on WQ.
The last two rows show that our proposed model achieves the best performance on both WQ and SQ. While replacing entity mentions yields improvement, the model cannot use entity information in this process. However, our results confirm that extracting features from the entity pair inputs separately for the final prediction is useful.
Because multi-perspective matching uses the question with the entity mention replaced, the model can focus on the information it needs. However, the replacement also removes information about the entity, so we expected that extracting features from the entity pair for the final prediction would help, and the results confirm this.
Table 1 also shows that adding relation tail entity types always helps, and that it helps significantly when the types are matched against the question in a separate view. Comparing Rows 8 and 9, using the question text with the entity mention replaced helps the model focus on the important information in the question better than using the original question text.
Table 1 further shows that adding character embeddings on top of word embeddings is not particularly helpful. We think the main reason is that the vocabulary in the KBQA scenario is more limited than in general text. With less word variation, character embeddings cannot capture many of the prefix, suffix, or stemming patterns that help in other NLP tasks.
Relation detection, a crucial step in KBQA, is significantly different from general relation extraction. To accomplish this task, we propose a novel KB relation detection model that performs bilateral multi-perspective matching between multiple views of a question and a KB relation. Empirical results show that our model significantly outperforms previous methods on the KB relation detection task, and it is expected to enable a KBQA system to perform better than state-of-the-art KBQA systems.
- Bahdanau et al. (2015) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. ICLR .
- Bast and Haussmann (2015) Hannah Bast and Elmar Haussmann. 2015. More accurate question answering on freebase. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, pages 1431–1440.
- Berant et al. (2013) Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, pages 1533–1544.
- Bordes et al. (2015) Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075 .
- Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems. pages 2787–2795.
- Dai et al. (2016) Zihang Dai, Lei Li, and Wei Xu. 2016. Cfo: Conditional focused neural question answering with large-scale knowledge bases. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 800–810.
- Dhingra et al. (2016) Bhuwan Dhingra, Hanxiao Liu, William W Cohen, and Ruslan Salakhutdinov. 2016. Gated-attention readers for text comprehension. arXiv preprint arXiv:1606.01549 .
- dos Santos et al. (2015) Cicero dos Santos, Bing Xiang, and Bowen Zhou. 2015. Classifying relations by ranking with convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, pages 626–634.
- Fader et al. (2013) Anthony Fader, Luke S Zettlemoyer, and Oren Etzioni. 2013. Paraphrase-driven learning for open question answering. In ACL (1). Citeseer, pages 1608–1618.
- Golub and He (2016) David Golub and Xiaodong He. 2016. Character-level question answering with attention. arXiv preprint arXiv:1604.00727 .
- Gormley et al. (2015) Matthew R. Gormley, Mo Yu, and Mark Dredze. 2015. Improved relation extraction with feature-rich compositional embedding models. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pages 1774–1784.
- Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
- Kadlec et al. (2016) Rudolf Kadlec, Martin Schmid, Ondrej Bajgar, and Jan Kleindienst. 2016. Text understanding with the attention sum reader network. ACL .
- Liang et al. (2016) Chen Liang, Jonathan Berant, Quoc Le, Kenneth D Forbus, and Ni Lao. 2016. Neural symbolic machines: Learning semantic parsers on freebase with weak supervision. arXiv preprint arXiv:1611.00020 .
- Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. pages 3111–3119.
- Nguyen and Grishman (2014) Thien Huu Nguyen and Ralph Grishman. 2014. Employing word representations and regularization for domain adaptation of relation extraction. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Baltimore, Maryland, pages 68–74.
- Nguyen and Grishman (2015) Thien Huu Nguyen and Ralph Grishman. 2015. Combining neural networks and log-linear models to improve relation extraction. arXiv preprint arXiv:1511.05926 .
- Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. In EMNLP. volume 14, pages 1532–43.
- Rink and Harabagiu (2010) Bryan Rink and Sanda Harabagiu. 2010. Utd: Classifying semantic relations by combining lexical and semantic resources. In Proceedings of the 5th International Workshop on Semantic Evaluation. Association for Computational Linguistics, Uppsala, Sweden, pages 256–259.
- Sun et al. (2011) Ang Sun, Ralph Grishman, and Satoshi Sekine. 2011. Semi-supervised relation extraction with large-scale word clustering. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Portland, Oregon, USA, pages 521–529.
- Vu et al. (2016) Ngoc Thang Vu, Heike Adel, Pankaj Gupta, and Hinrich Schütze. 2016. Combining recurrent and convolutional neural networks for relation classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, pages 534–539.
- Wang et al. (2016) Linlin Wang, Zhu Cao, Gerard de Melo, and Zhiyuan Liu. 2016. Relation classification via multi-level attention CNNs. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 1298–1307.
- Wang et al. (2017) Zhiguo Wang, Wael Hamza, and Radu Florian. 2017. Bilateral multi-perspective matching for natural language sentences. In IJCAI 2017.
- Xu et al. (2016) Kun Xu, Siva Reddy, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2016. Question answering on Freebase via relation extraction and textual evidence. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 2326–2336.
- Yao et al. (2014) Xuchen Yao, Jonathan Berant, and Benjamin Van Durme. 2014. Freebase QA: Information extraction or semantic parsing? ACL 2014 page 82.
- Yao and Van Durme (2014) Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with Freebase. In ACL (1). Citeseer, pages 956–966.
- Yih et al. (2015) Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao. 2015. Semantic parsing via staged query graph generation: Question answering with knowledge base. In Association for Computational Linguistics (ACL).
- Yih et al. (2016) Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang, and Jina Suh. 2016. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Berlin, Germany, pages 201–206.
- Yin et al. (2017) Jun Yin, Wayne Xin Zhao, and Xiao-Ming Li. 2017. Type-aware question answering over knowledge base with attention-based tree-structured neural networks. Journal of Computer Science and Technology 32(4):805–813. https://doi.org/10.1007/s11390-017-1761-8.
- Yin et al. (2016) Wenpeng Yin, Mo Yu, Bing Xiang, Bowen Zhou, and Hinrich Schütze. 2016. Simple question answering by attentive convolutional neural network. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, Osaka, Japan, pages 1746–1756.
- Yu et al. (2016) Mo Yu, Mark Dredze, Raman Arora, and Matthew R. Gormley. 2016. Embedding lexical features via low-rank tensors. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, pages 1019–1029. http://www.aclweb.org/anthology/N16-1117.
- Yu et al. (2017) Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cícero Nogueira dos Santos, Bing Xiang, and Bowen Zhou. 2017. Improved neural relation detection for knowledge base question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers. pages 571–581.
- Zeng et al. (2014) Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin City University and Association for Computational Linguistics, Dublin, Ireland, pages 2335–2344.
- Zhou et al. (2005) GuoDong Zhou, Jian Su, Jie Zhang, and Min Zhang. 2005. Exploring various knowledge in relation extraction. In Association for Computational Linguistics. pages 427–434.
- Zhou et al. (2016) Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, and Bo Xu. 2016. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Berlin, Germany, pages 207–212.