Knowledge Representation via Joint Learning of Sequential Text and Knowledge Graphs
Textual information is considered as significant supplement to knowledge representation learning (KRL). There are two main challenges for constructing knowledge representations from plain texts: (1) How to take full advantages of sequential contexts of entities in plain texts for KRL. (2) How to dynamically select those informative sentences of the corresponding entities for KRL. In this paper, we propose the Sequential Text-embodied Knowledge Representation Learning to build knowledge representations from multiple sentences. Given each reference sentence of an entity, we first utilize recurrent neural network with pooling or long short-term memory network to encode the semantic information of the sentence with respect to the entity. Then we further design an attention model to measure the informativeness of each sentence, and build text-based representations of entities. We evaluate our method on two tasks, including triple classification and link prediction. Experimental results demonstrate that our method outperforms other baselines on both tasks, which indicates that our method is capable of selecting informative sentences and encoding the textual information well into knowledge representations.
Jiawei Wu, Ruobing Xie, Zhiyuan Liu^2
School of Medicine, Tsinghua University, Beijing, China
State Key Laboratory of Intelligent Technology and Systems
Tsinghua National Laboratory for Information Science and Technology
Department of Computer Science and Technology, Tsinghua University, Beijing, China
1 IntroductionKnowledge graphs (KGs), which provide significant well-structured information for modeling abstract concepts as well as concrete entities in real world, have attracted great attention in recent years. A typical knowledge graph usually arranges multi-relational data in the form of triple facts (head entity, relation, tail entity) that is abridged as . There are large numbers of KGs like Freebase, YAGO and DBpedia that are widely utilized in nature language processing applications such as question answering and web search [Bollacker et al., 2008]. However, applications for KGs are suffering from challenges of data sparsity and computational inefficiency with KG size increasing. To alleviate these problems, representation learning (RL) is proposed and widely used, significantly improving the capability of knowledge representations in knowledge inference, fusion and completion [Yang et al., 2014, Dong et al., 2014, Neelakantan et al., 2015]. Many methods have introduced representation learning to KGs, projecting both entities and relations into a continuous low-dimensional vector space [Nickel et al., 2011, Jenatton et al., 2012, Bordes et al., 2013]. Among existing methods, translation-based models, which interpret relations as translating operations between head and tail entities, are effective and efficient that possess the state-of-the-art performance. Furthermore, rich external information is considered as supplement to triple facts that helps to represent knowledge graphs, and textual information has shown significant contribution to this goal. [Wang et al., 2014a, Zhong et al., 2015] propose a joint model projecting both entities and words into the same vector space with alignment models. However, their models only consider bag-of-words assumption when modeling words in plain texts, neglecting rich textual information embedded in word order. [Xie et al., 2016] directly builds entity representations from entity descriptions, while their model is restricted by the completeness and quality of entity descriptions. There are two main shortages in existing methods for constructing knowledge representations from multiple sentences in plain texts: (1) The bag-of-words assumption fails to encode explicit word order information into sentence representations. (2) Not all sentences containing entity names are reliable and suitable for explaining the corresponding entities. Fig. 1 demonstrates an example of multiple reference sentences for the entity economics. We can observe that the first two sentences talk about the definition and attributes of economics, which could represent the entity well, while the third one provides rather confusing and meaningless information for understanding the meaning of economics. As shown in [Nagy et al., 1987], humans learn meanings of new words from their contextual information. Inspired by this, we intend to infer the meaning of an entity from its reference sentences. To overcome the shortages mentioned above, we propose the Sequential Text-embodied Knowledge Representation Learning (STKRL). Specifically, in our method, we first utilize recurrent neural network (RNN) with pooling or long short-term memory network (LSTM) to build sentence-level representations of an entity from multiple sentences. Each sentence-level representation is considered as a candidate of the corresponding entity representation. Second, we combine these sentence-level representations to form the summary text-based representation, with the favor of attention which highlights more informative sentences. Finally, we follow the margin-based score function in translation-based methods as our objective for training. To the best of our knowledge, our model is the first model which combines multi-instance learning with attention in knowledge representation learning with texts. We evaluate our models on triple classification and link prediction, and the significantly improved experimental results indicate our model is capable of representing knowledge graph better with textual information. Meanwhile, our model is of potential usefulness to definition extraction according to inspection towards the sentence attention component in our model.
2 Related Work
2.1 Translation-based MethodsTranslation-based methods have achieved great successes for representation learning in knowledge graphs. TransE [Bordes et al., 2013] interprets relations as translating operations between head and tail entities, and projects both entities and relations into the same continuous low-dimensional vector space. The energy function is defined as follows: (1) which assumes that the tail embedding should be in the neighborhood of . TransE is straightforward and effective, while this simple translating operation may have issues when modeling 1-to-N, N-to-1 and N-to-N relations. Moreover, the translating operation only focuses on a single step, regardless of rich information located in long-distance relational paths. To address the first problem, TransH [Wang et al., 2014b] models translating operations between entities on relation-specific hyperplanes. TransR [Lin et al., 2015b] interprets entities and relations in different semantic spaces, and projects entities from entity space to relation space when learning the potential relationship between entities. TransD [Ji et al., 2015] proposes dynamic mapping matrix constructed by both entities and relations for multiple representations of entities. To extend the single-step translating operation, [Gu et al., 2015, Lin et al., 2015a] encode multiple-step relation paths into representation learning of knowledge graphs and achieve significant improvements.
2.2 Representation Learning of Knowledge Graphs with Textual InformationMulti-source information, especially textual information, is significant in representation learning of knowledge graphs, which has attracted great attention recently. It can provide useful information from different aspects, which helps modeling knowledge graphs. [Wang et al., 2014a] encodes both entities and words into a joint low-dimensional vector space by alignment models using entity names as well as Wikipedia anchors. [Zhong et al., 2015] extends the alignment model by considering entity descriptions. However, the methods of modeling plain texts in the both models above are rather simple, ignoring significant information encoded in word order. [Xie et al., 2016] proposes a new kind of representation for entities, which is directly constructed from entity descriptions using CNN and thus is capable of modeling new entities. However, their description-based representation is restricted by the completeness as well as quality of entity descriptions. To the best of our knowledge, our model is the first effort which learns knowledge representations from multiple sentences extracted from noisy plain texts with word order being considered.
2.3 Multi-instance LearningMulti-instance learning, which was originally proposed in [Dietterich et al., 1997], arises in the tasks where a single object may possess multiple alternative examples or representations that describe it. Multi-instance learning aims to find out the reliability of examples in each object. [Bunescu and Mooney, 2007] extends a relation extraction method using minimal supervision with multi-instance learning. [Zhou et al., 2012] proposes multi-instance multi-label learning on multiple tasks such as scene classification and text categorization. [Riedel et al., 2010, Hoffmann et al., 2011, Surdeanu et al., 2012] adopt multi-instance learning in distant supervision for relation extraction. [Zeng et al., 2015] further combines multi-instance learning with convolutional neural network for relation extraction on distant supervision data. To fully utilize the rich textual information located in multiple sentences of each entity, we propose a sentence-level attention over multiple sentences to highlight more informative instances. To the best of our knowledge, multi-instance learning combined with attention-based model hasn’t been adopted in representation learning of knowledge graphs.
3 MethodologyWe first describe the notations used in this paper. For any triple , it consists of two entities and a relation . stands for the set of entities while stands for the set of relations. is the training set of triples, and the embeddings of entities and relations take values in . We have two representations for each entity. We set , as the structure-based representations of head and tail entities, which are the same as those in previous knowledge models, and , as the text-based representations, which are learned from plain texts by sentence encoders. For each entity, we first scan through the corpus to extract all sentences which contain the corresponding entity name. These sentences are considered as the reference sentences of the corresponding entity.
3.1 Overall ArchitectureFirst, we introduce the overall architecture of the STKRL model. Inspired by translation-based methods, we define the energy function as follows: (2) where is the same energy function as TransE, , and are new-added parts determined by the two representations jointly. in which both head and tail are text-based representations learned from reference sentences. Similarly, we also have and , jointly considering the two types of entity representations. The overall architecture of the STKRL model is demonstrated in Fig. 2. First, The RNN/LSTM sentence encoder takes reference sentences as inputs and learns the sentence-level representations. Second, an attention method is implemented to select the top ( is a hyper parameter) informative sentences to generate the text-based representation of an entity. The attention is based on the semantic similarity between the sentence-level representations and the corresponding structure-based representation. Finally, those representations will be learned jointly under the translation-based method.
3.2 Word RepresentationsIn our framework, we consider the embeddings of each word token in reference sentences as inputs, and each entity name is also regarded as a word. Inspired by [Zeng et al., 2014], the word representations consist of two parts, including word features and position features.
3.2.1 Word FeaturesThe word features could be learned by negative sampling Skip-gram models, for these models could encode contextual information located in large corpus. The learned word embeddings are then directly considered as word features.
3.2.2 Position FeaturesWord order information is significant that it can help us to better understand sentences, and we also intend to highlight the position of the entity name in its reference sentences. Suppose each sentence is represented as a sequence , where represents the -th word. Given a sentence, the position feature of its entity name will be marked as , and the positions of other words are marked according to the relevant integer distance towards the entity name. The left words have negative position values, while the right have positive position values. The position features will be marked as or if their relevant distances are larger than ( is a hyper parameter).
3.3 Sentence EncoderWe assume that the meaning of an entity could be extracted from its reference sentences. There are amounts of algorithms to represent sentence information with word order in consideration, such as recurrent neural network (RNN) and long short-term memory (LSTM). These models have been widely used in several natural language processing tasks such as machine translation. We adopt RNN with pooling and LSTM as sentence encoders to learn sentence-level representations of entities, intending to extract entity meanings from reference sentences.
3.3.1 Recurrent Neural NetworkRecurrent neural network takes a reference sentence as input. It works on a sequence and maintains a hidden state over time. At each time-step , the hidden state vector is updated as follows: (3) in which transition function is an affine transformation followed by a non-linear function such as hyperbolic tangent. More specifically, RNN reads each word representation of the input sentence sequentially. While reading each word representation, the hidden state of the RNN changes according to Eq. (3). After finishing the whole sequence marked by an end-of-sequence symbol, we can obtain the final hidden state vector as the output. is then viewed as the sentence-level representation.
3.3.2 RNN with PoolingRecurrent neural network is powerful and widely used in various fields, while it usually suffers from gradient vanishment. Therefore, it is difficult for the final hidden state of RNN to capture the early local information when the sentence is too long. [Collobert et al., 2011] proposes a mean-pooling approach to solve gradient vanishment to some degree. To leverage efficiency and effectiveness, we add a mean-pooling layer to encode the overall information of a sentence into its sentence-level representation . We have: (4) in which all intermediate hidden states containing different local information should have contribution to the final sentence-level representation, and thus could be updated during back propagation. The structure is shown in Fig. 3.
3.3.3 Long Short-Term Memory NetworkLSTM [Hochreiter and Schmidhuber, 1997] is an enhanced neural network based on RNN, which could address the gradient vanishment when learning long-term dependencies. LSTM introduces memory cells that are able to preserve state over long periods of time. At each time step , the LSTM unit is composed of: an input gate , a forget gate , an output gate , a memory cell and a hidden state . The entries of the gating vectors , and are in . The LSTM transition equations are the following: (5) (6) (7) (8) (9) (10) where is the input at the current time step , denotes the logistic sigmoid function and denotes the elementwise multiplication.
3.4 Attention over Reference SentencesWe utilize sentence encoders to build sentence-level representations from reference sentences, next we want to integrate those sentence-level representations to the text-based representation for each entity. Simply considering the mean or max of sentence embeddings will suffer from noises or lost rich information. Instead of taking the mean/max sentence embeddings, we propose an attention method to automatically select sentences which can explicitly explain the meaning of the entity. The attention-based model is powerful and has been widely applied to machine translation [Bahdanau et al., 2015], abstractive sentence summarization [Rush et al., 2015] and speech recognition [Chorowski et al., 2014]. It can automatically highlight more informative instances from multiple candidates. We implement an attention-based multi-instance learning method to select top- informative reference sentences for the corresponding entity from all candidates. More specifically, each entity has a structure-based representation . For a sentence-level representation which belongs to entity , the attention between and is as follows: (11) Reference sentences with higher are expected to better represent their corresponding entity information more explicitly. Finally, we pick the top- reference sentences, and the text-based representation is as follows: (12)
3.5 Objective FormalizationWe utilize a margin-based score function as our training objective. The overall score function is defined as follows: (13) where is a margin hyper parameter. is the energy function stated above, which can be either or -norm. is the negative sampling set of , which is defined as (14) in which the head and tail entities are randomly replaced by another entity. Note that for each entity, there are two types of entity representations, including the text-based representation and the structure-based representation. Hence we can learn these two types of entity representations simultaneously into the same vector space. The energy function (2) is used to learn structure-based and text-based representations into the same vector space, and the single item in (2) is the energy function of TransE. Thus, as mentioned above, we can train our model using the objective formalization (13).
3.6 Model Initialization and OptimizationThe STKRL model can be initialized either randomly or with pre-trained TransE embeddings. The word representations are pre-trained through Word2Vec with Wikipedia corpus. The optimization is a standard back propagation using mini-batch stochastic gradient descent (SGD). For efficiency, we also use GPU to accelerate the training process. Note that there is only one RNN/LSTM, which means that all RNN/LSTM in the figure shares the same parameters, and this RNN/LSTM is trained to encode reference sentences for all entities. Our model consists of two parts: (1) text-based RNN/LSTM with attention for encoding sentences, and (2) structure-based TransE [Bordes et al., 2013]for encoding knowledge. We train the two parts simultaneously. The training process is described as follows: Given a triple, learn structure-based and text-based knowledge representations using extended TransE with negative sampling. Note that, there are four terms for each triple, (, , ), (, , ), (, , ) and (, , ), with four terms of negative samples respectively. Utilize RNN/LSTM to encode all reference sentences of head and tail entities. Attention scheme is applied to dynamically select top-m informative sentences of head and tail from all encoded sentences and construct text-based representations respectively. For selected sentences, the structure-based representation of head, , (respectively, tail, ) is used to compute training error and optimize RNN/LSTM with back-propagation.
4.1 DatasetsWe adopt FB15K to evaluate our models on triple classification and link predication in this paper. FB15K is a subset of Freebase that has 14,951 entities and 1,345 relations. For each entity, we use Wikipedia as the corpus to extract reference sentences. Note that the Wikipedia article about the entity itself has the highest priority to provide reference sentences. Each entity has approximately 40 sentences. The statistics of the FB15K dataset are listed in Table 1. Dataset #Rel #Ent #Train #Valid #Test FB15K 1,345 14,951 483,142 50,000 59,071
4.2 Parameter SettingsThe detailed parameters will be released upon acceptance. Tasks Predicting Head (Hits@10) Predicting Tail (Hits@10) Relation Category 1-to-1 1-to-N N-to-1 N-to-N 1-to-1 1-to-N N-to-1 N-to-N TransE 78.2 87.6 42.1 70.0 79.3 48.9 87.9 71.2 Wang’s 84.6 89.4 49.3 72.1 81.8 66.4 92.3 76.7 STKRL (RNN w/o ATT) 81.9 87.2 45.7 71.8 80.8 52.6 91.2 73.2 STKRL (RNN+ATT) 84.5 89.9 48.9 73.0 81.6 64.5 92.3 76.9 STKRL (RNN+P+ATT) 87.0 90.9 52.0 77.6 85.8 67.1 93.9 82.2 STKRL (LSTM+ATT) 88.9 91.6 53.2 77.0 85.9 68.3 94.2 83.6 For STKRL, we implement 4 sentence encoders for evaluation. “RNN w/o ATT” represents selecting RNN as sentence encoder and using the mean vector of sentence-level representations instead of attention, while “RNN+ATT”, “RNN+P+ATT” and “LSTM+ATT” represent selecting the corresponding sentence encoders of RNN, RNN+pooling and LSTM with the help of attention. We implement TransE and jointly (name) model in [Wang et al., 2014a] as baselines, following their experimental settings. For fair comparisons, all baselines have the same dimension of entities and relations.
4.3 Triple ClassificationThe task of triple classification aims to test whether a triple is true or false, which could be viewed as a binary classification test.
4.3.1 Evaluation ProtocolSince each entity has two types of representations, a triple has four representations, , , and . Following the similar protocol in [Socher et al., 2013], for each triple representation, we construct a negative example by randomly replace the head or tail entity with another entity. The new entity’s representation should be the same type as the replaced one (e.g. should be replaced with or ). And we also ensure that the number of true triples is equal to false ones. The evaluation strategy is described as follows: if the dissimilarity of a testing triple is below the relation-specific threshold , it is predicted to be positive, otherwise negative. The threshold can be determined via maximizing the classification accuracy on the validation set.
4.3.2 ResultsThe results of triple classification are shown in Table 3. From the results, we observe that: (1) Our attention-based models significantly outperform all baselines, which indicates the capability of our methods in modeling knowledge representations. (2) STKRL (RNN w/o ATT) outperforms TransE, which implies the significance of textual information. Moreover, the attention-based STKRL models outperform Wang’s method which also encodes textual information into knowledge representations. It confirms that our sentence encoders can better understand the meaning of sentences, and thus get better performances. (3) Attention is the key component in our model. It can automatically select the more informative reference sentences to represent an entity, alleviating the noises caused by low-quality reference sentences. (4) STKRL (LSTM+ATT) achieves the best performance, successfully capturing the rich information in long-distance dependencies. Besides, STKRL (RNN+P+ATT) outperforms STKRL (RNN+ATT), which also demonstrates the advantages of pooling strategy. Method FB15K TransE 77.1 Wang’s 84.2 STKRL (RNN w/o ATT) 79.2 STKRL (RNN+ATT) 85.8 STKRL (RNN+P+ATT) 87.9 STKRL (LSTM+ATT) 88.6
4.4 Link PredictionThe task of link prediction aims to complete a triple fact when or is missing. Rank Sentence 1 Economics is the social science that describes the factors that determine the production, distribution and consumption of goods and services. 2 Economics focuses on the behavior and interactions of economic agents and how economies work. 10 The ultimate goal of economics is to improve the living conditions of people in their everyday life. 44 There are a variety of modern definitions of economics. Entity Rank No.1 Sentence Productivity Productivity is the ratio of output to inputs in production. February February is the second month of the year in the Julian and Gregorian calendars. Food Food is any substance consumed to provide nutritional support for the body. Travis County Austin is the capital of Texas and the seat of Travis County.
4.4.1 Evaluation ProtocolFor each test triple, the head is replaced by each of the entities of the entity set in turn. Dissimilarities of those corrupted triples are first computed by the models and then sorted by ascending order. The rank of the correct entity is finally stored. This whole procedure is repeated while removing the tail instead of the head, and the same rule applies to the tail. Note that in Wang’s and our model, each entity has two representations. As a result, the predicted rank for a certain entity is the mean of two representations. However, as for Hits@10 test, either of two representations appearing at top 10 could be viewed as a successful hit.
4.4.2 ResultsFrom Table 6, we can observe that: (1) Our model significantly outperforms all baselines in both Mean Rank and Hits@10, which demonstrates the effectiveness and robustness of our model. (2) STKRL (RNN+P+ATT) and STKRL (LSTM+ATT) significantly outperform Wang’s method. It is because that we utilize sentence encoders to model textual information with the help of word orders, instead of simply considering separate words used in Wang’s method. Moreover, the attention-based STKRL models could pay more attention to those informative reference sentences when constructing text-based representations. (3) STKRL (LSTM+ATT) achieves the best performances, which indicates the power in utilizing better sentence encoders. Metric Mean Rank Hits@ 10() Raw Filter Raw Filter TransE 215 112 46.5 68.7 Wang’s 184 67 50.9 78.2 STKRL (RNN w/o ATT) 207 96 47.4 71.9 STKRL (RNN+ATT) 195 73 50.9 78.4 STKRL (RNN+P+ATT) 188 56 52.2 80.6 STKRL (LSTM+ATT) 182 51 52.1 81.2 Table 2 demonstrates the link prediction results with different categories of relations. Relations are split into 4 categories following the same settings in [Bordes et al., 2013]. From Table 2 we can observe that: (1) the STKRL model achieves great improvements consistently on all categories of relations. (2) The improvements locate more in N-to-N relations. It indicates that textual information is important when modeling complex relations, and our STKRL models could well capture the useful textual information for better knowledge representations.
4.5 Case StudyIn this section, we give two cases and analysis to prove that we can successfully select informative reference sentences to represent entities.
4.5.1 Effectiveness of AttentionTo show that the attention has reasonably ranked reference sentences, we give 4 examples of the entity economics with their ranks. The results in Table 4 indicate that STKRL can successfully select the more informative sentences of the corresponding entity. Those top-ranked sentences will usually be the definitions or descriptions of entities. Besides, the low-ranked sentences are usually less relevant to the targeted entity, which also indicates the capability of attention in filtering noises.
4.5.2 Definition ExtractionDefinition extraction is an important task in text mining [Espinosa-Anke et al., 2015]. We want to demonstrate that the top-ranked reference sentences selected by our model are usually definitions of the corresponding entities. By randomly selecting 100 entities from the entity set , we find that 61 entities’ No.1 reference sentences are entity definitions based on manual annotation. In Table 5 we give 4 No.1 reference sentence examples. The first three sentences are exactly entity definitions, while the last is not. However, all sentences are still informative to learn the meanings of entities. The results demonstrate that our model has rational selectivity to extract reference sentences and is of potential usefulness to definition extraction.
5 ConclusionIn this paper, we propose the STKRL model, a novel model for knowledge representation learning, which jointly considers triple facts and sequential text information. We also explore how to extract informative sentences from lots of candidates with attention-based method. We evaluate our models on two tasks, triple classification and link prediction, and also give some examples to prove the ability to efficiently extract useful information. Experimental results show that our models achieve significant improvements compared to other baselines. The code and dataset will be released upon acceptance. We will explore the following further work: (1) We assume that for entity representations, there are naturally three types of representations including word, sentence and knowledge representations. We mainly use knowledge and sentence representations in the STKRL model, and we will explore to integrate all three representations in future. (2) The STKRL model can extract definition sentences of entities as by-products. In future, we will further explore this issue by designing a more sophisticated method for definition extraction.
- [Bahdanau et al., 2015] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR.
- [Bollacker et al., 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of KDD, pages 1247–1250.
- [Bordes et al., 2013] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of NIPS, pages 2787–2795.
- [Bunescu and Mooney, 2007] Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Proceedings of ACL.
- [Chorowski et al., 2014] Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. End-to-end continuous speech recognition using attention-based recurrent nn: first results.
- [Collobert et al., 2011] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. JMLR, 12:2493–2537.
- [Dietterich et al., 1997] Thomas G Dietterich, Richard H Lathrop, and Tomás Lozano-Pérez. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial intelligence, 89(1):31–71.
- [Dong et al., 2014] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of KDD, pages 601–610.
- [Espinosa-Anke et al., 2015] Luis Espinosa-Anke, Francesco Ronzano, and Horacio Saggion. 2015. Weakly supervised definition extraction. In Proceedings of Recent Advances in Natural Language Processing, page 176.
- [Gu et al., 2015] Kelvin Gu, John Miller, and Percy Liang. 2015. Traversing knowledge graphs in vector space. In Proceedings of EMNLP, pages 318–327.
- [Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.
- [Hoffmann et al., 2011] Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of ACL, pages 541–550.
- [Jenatton et al., 2012] Rodolphe Jenatton, Nicolas L Roux, Antoine Bordes, and Guillaume R Obozinski. 2012. A latent factor model for highly multi-relational data. In Proceedings of NIPS, pages 3167–3175.
- [Ji et al., 2015] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of ACL, pages 687–696.
- [Lin et al., 2015a] Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. 2015a. Modeling relation paths for representation learning of knowledge bases. In Proceedings of EMNLP, pages 705–714.
- [Lin et al., 2015b] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015b. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of AAAI.
- [Nagy et al., 1987] William E Nagy, Richard C Anderson, and Patricia A Herman. 1987. Learning word meanings from context during normal reading. American educational research journal, 24(2):237–270.
- [Neelakantan et al., 2015] Arvind Neelakantan, Benjamin Roth, and Andrew McCallum. 2015. Compositional vector space models for knowledge base completion. Proceedings of EMNLP.
- [Nickel et al., 2011] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of ICML, pages 809–816.
- [Riedel et al., 2010] Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Machine Learning and Knowledge Discovery in Databases, pages 148–163. Springer.
- [Rush et al., 2015] Alexander M Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization.
- [Socher et al., 2013] Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Proceedings of NIPS, pages 926–934.
- [Surdeanu et al., 2012] Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of ACL, pages 455–465.
- [Wang et al., 2014a] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014a. Knowledge graph and text jointly embedding. In Proceedings of EMNLP, pages 1591–1601.
- [Wang et al., 2014b] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014b. Knowledge graph embedding by translating on hyperplanes. In Proceedings of AAAI, pages 1112–1119.
- [Xie et al., 2016] Ruobing Xie, Zhiyuan Liu, Jia Jia, Huanbo Luan, and Maosong Sun. 2016. Representation learning of knowledge graphs with entity descriptions. In Proceedings of AAAI.
- [Yang et al., 2014] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding entities and relations for learning and inference in knowledge bases. Proceedings of ICLR.
- [Zeng et al., 2014] Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING, pages 2335–2344.
- [Zeng et al., 2015] Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of EMNLP.
- [Zhong et al., 2015] Huaping Zhong, Jianwen Zhang, Zhen Wang, Hai Wan, and Zheng Chen. 2015. Aligning knowledge and text embeddings by entity descriptions. In Proceedings of EMNLP, pages 267–272.
- [Zhou et al., 2012] Zhi-Hua Zhou, Min-Ling Zhang, Sheng-Jun Huang, and Yu-Feng Li. 2012. Multi-instance multi-label learning. Artificial Intelligence.