RDSGAN: Rank-based Distant Supervision Relation Extraction with Generative Adversarial Framework
Distant supervision has been widely used for relation extraction but suffers from the noisy labeling problem. Neural network models have been proposed to denoise with attention mechanisms, but they cannot eliminate noisy data because attention assigns non-zero weights to noisy instances. Hard decision has been proposed to remove wrongly-labeled instances from the positive set, but it loses the useful information contained in the removed instances. In this paper, we propose a novel generative neural framework named RDSGAN (Rank-based Distant Supervision GAN), which automatically generates valid instances for distant supervision relation extraction. Our framework combines soft attention and hard decision to learn the distribution of true positive instances via adversarial training, and selects valid instances conforming to that distribution via rank-based distant supervision, which addresses the false positive problem. Experimental results show the superiority of our framework over strong baselines.
Relation extraction, which aims to extract the relations between entity pairs, is fundamental for constructing large-scale knowledge bases. One popular way to handle this task is distant supervision mintz2009distant, which automatically generates large amounts of labeled data by aligning text with existing knowledge bases. However, the generated training data contains many noisy samples because of the strong assumption behind distant supervision. To tackle this issue, most recent state-of-the-art methods apply neural networks du2018multi; li2019gan; beltagy2019combining to denoise distantly supervised data. Various attention mechanisms lin2016neural; han2018hierarchical; gao2019hybrid have been proposed to compute precise attention weights over instances, but soft attention usually assigns non-zero weights to noisy instances and therefore does not eliminate noisy data. Qin2018DSGAN; qin2018robust; ma2019easy argue that wrongly-labeled instances must be handled with hard decision, removing false positive instances from the positive set, although hard decision may lose useful information contained in the removed instances. To keep as much useful information and remove as much noise as possible, combining soft attention and hard decision to learn the distribution of true positive instances is a better choice.
In this paper, we propose a novel generative neural framework, Rank-based Distant Supervision GAN (RDSGAN). First, we train the framework to learn the distribution of true positive instances, excluding false positive instances, via adversarial training. Second, we rank all the instances in a sentence bag and select the instances conforming to the distribution of true positive instances using rank-based distant supervision, which optimizes the framework to generate a clean and valid instance in each sentence bag and addresses the false positive problem. Finally, the framework can automatically generate massive valid instances.
Our contributions are summarized as follows:
(1) We propose a novel generative neural framework which learns the distribution of true positive instances and automatically generates massive valid instances to provide a clean dataset for distant supervision relation extraction.
(2) We propose the method of rank-based distant supervision to address the false positive problem.
In this section, we present the overall procedure of our framework, followed by the details of adversarial training and rank-based distant supervision.
As illustrated in Figure 1, input instances are the concatenation of encoded embeddings of sentence 0 to . We initialize the discriminator (D) and the generator (G) with random weights and . In the first phase, input instances are fed to D to train it to learn the distribution of true positive instances, and G is then trained to generate instances more similar to real ones. In the second phase, we fix D and use the ranking module to rank the mixed instances; we then select instances conforming to the distribution based on selective attention lin2016neural, which produces the bag representation for relation classification. The rank loss and the relation classification loss are added (denoted by ) with weights to optimize G to generate a valid instance in each bag, building up a clean dataset for distant supervision. The complete training procedure of the framework is shown in Algorithm 1.
2.2 Adversarial Training
The target of the generator is to generate a vector sequence representing a clean and valid instance that conforms to the distribution of true positive data. As shown in Figure 1, the decoder-based generator takes a triplet as input and outputs a valid vector sequence. Hence, given the triplet of , we first map and into vectors via their word embeddings and map via a relation matrix , i.e. , where is the number of all relation classes, is the dimension of the sentence embedding, and is the query vector associated with relation . The input of the generator is the sum of the three vectors:
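As a rough illustration of how the generator input is assembled, the sketch below sums two word embeddings and one relation query vector. It assumes the triplet consists of two entities and a relation, and that all three vectors share one dimension so they can be added. The toy vocabulary, the embedding values, and the names `word_emb`, `rel_matrix`, and `generator_input` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy word embeddings (dimension 3); real values would be learned.
word_emb: Dict[str, List[float]] = {
    "obama": [0.1, 0.2, 0.3],
    "hawaii": [0.4, 0.0, -0.1],
}

# Hypothetical relation matrix: one query vector per relation class.
rel_matrix: Dict[str, List[float]] = {
    "place_of_birth": [0.5, -0.2, 0.1],
    "NA": [0.0, 0.0, 0.0],
}


def generator_input(entity_1: str, entity_2: str, relation: str) -> List[float]:
    """Element-wise sum of the two entity vectors and the relation query vector."""
    e1 = word_emb[entity_1]
    e2 = word_emb[entity_2]
    r = rel_matrix[relation]
    if not (len(e1) == len(e2) == len(r)):
        raise ValueError("all three vectors must share one dimension")
    return [a + b + c for a, b, c in zip(e1, e2, r)]


z = generator_input("obama", "hawaii", "place_of_birth")
```

Summing (rather than concatenating) keeps the input dimension equal to the embedding dimension, which is what the text describes.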
In detail, we use a bidirectional GRU (BiGRU) as the decoder and apply dropout to the hidden states of the BiGRU. The generation process can be formulated as:
where is the hidden vector of the BiGRU and . The generation process goes on until it reaches the aligned sentence length . After the generation, we obtain a sentence bag shown in Figure 1, then we feed the sentence bag into the discriminator.
The discriminator is designed to learn the distribution of the true positive data. For each instance in a sentence bag, the discriminator calculates the probability that it comes from the real data as follows:
where and is the number of instances in a bag. Hence, as for instances in the -th bag in the training data, the discrimination loss can be formulated as:
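To make the discriminator's role concrete, here is a minimal sketch of a per-bag discrimination loss. It assumes a linear scorer followed by a sigmoid and the standard GAN binary cross-entropy objective; both are stand-ins, since the exact formulation is not recoverable from the text here. The names `d_prob` and `discrimination_loss` are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def d_prob(instance: List[float], w: List[float], b: float) -> float:
    """Probability that an instance comes from the real data (assumed linear scorer)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, instance)) + b)


def discrimination_loss(real: List[List[float]], generated: List[float],
                        w: List[float], b: float) -> float:
    """Standard GAN discriminator loss for one bag, averaged over its instances.

    Real instances should receive probability near 1, the generated one near 0.
    """
    eps = 1e-12  # guards against log(0)
    loss = -sum(math.log(d_prob(x, w, b) + eps) for x in real)
    loss -= math.log(1.0 - d_prob(generated, w, b) + eps)
    return loss / (len(real) + 1)
```

A discriminator that separates real from generated instances well yields a loss close to zero; one that confuses them yields a large loss.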
2.3 Rank-based Distant Supervision
As shown in Figure 1, Ranking and Classifier perform rank-based distant supervision. Given a bag containing instances related to entity pair , the representation of and the conditional probability of expressing relation are respectively calculated as:
where is the representation of , is the attention weight for each sentence . is the total number of relation classes. represents all the parameters, and is the score for relation ,:
where is weight matrix and is a bias vector.
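The bag representation and relation probability described above can be sketched as follows. This is a simplified reading of selective attention: the attention score is assumed to be a plain dot product between each instance vector and a relation query vector, and the relation scores are a linear map of the bag representation followed by a softmax. The function names and the toy inputs are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bag_representation(instances: List[List[float]],
                       query: List[float]) -> Tuple[List[float], List[float]]:
    """Attention-weighted sum of the instance vectors in one bag."""
    weights = softmax([dot(x, query) for x in instances])
    dim = len(instances[0])
    rep = [sum(w * x[j] for w, x in zip(weights, instances)) for j in range(dim)]
    return rep, weights


def relation_probs(bag_rep: List[float], weight_matrix: List[List[float]],
                   bias: List[float]) -> List[float]:
    """Softmax over relation scores, one score per relation class."""
    scores = [dot(row, bag_rep) + b for row, b in zip(weight_matrix, bias)]
    return softmax(scores)


rep, attn = bag_representation([[1.0, 0.0], [0.0, 1.0]], query=[2.0, 0.0])
probs = relation_probs(rep, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Because the weights come from a softmax, every instance keeps a non-zero weight, which is the limitation of soft attention that the introduction points out.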
We further define the loss function for rank-based distant supervision as the sum of rank loss and relation classification loss with their respective weights ,:
Rank Loss: In the ranking module, for all the instances in one bag, an instance containing less or no noise has higher attention weights and thus ranks higher. Hence, we attempt to make the generated instance rank in top- ( is a hyperparameter), and rank loss of the generated instance in a bag is calculated as follows:
where is referred to as a query-based function which scores how well the input instance and the predicted relation match. The rank loss is calculated as the average of the rank losses of all bags, where is the number of instances in a sentence bag:
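The idea of pushing the generated instance into the top-k of its bag can be illustrated with the sketch below. The hinge-style penalty is an assumed stand-in and is not claimed to be the rank loss used in this work; it is zero exactly when the generated instance outscores the k-th best real instance by at least `margin`. The names `rank_of_generated`, `topk_hinge_loss`, and `mean_rank_loss` are hypothetical.

```python
from typing import List


def rank_of_generated(real_scores: List[float], gen_score: float) -> int:
    """1-based rank of the generated instance among all instances in the bag."""
    return 1 + sum(1 for s in real_scores if s > gen_score)


def topk_hinge_loss(real_scores: List[float], gen_score: float,
                    k: int, margin: float = 0.0) -> float:
    """Assumed surrogate: penalize the generated instance for falling below top-k."""
    if len(real_scores) < k:
        return 0.0  # fewer than k competitors, so top-k holds trivially
    kth_best = sorted(real_scores, reverse=True)[k - 1]
    return max(0.0, margin + kth_best - gen_score)


def mean_rank_loss(per_bag_losses: List[float]) -> float:
    """Average the per-bag rank losses, as described in the text."""
    return sum(per_bag_losses) / len(per_bag_losses)
```

For example, with real scores `[0.9, 0.5, 0.1]` and `k = 2`, a generated score of `0.6` ranks second and incurs no penalty, while a score of `0.3` ranks third and is penalized.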
Relation Classification Loss: We define the loss of relation classification using cross-entropy:
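A minimal sketch of the relation classification loss and of the weighted combination with the rank loss is given below. Averaging the cross-entropy over bags and the weight names `w_rank` and `w_cls` are assumptions made for illustration.

```python
import math
from typing import List


def cross_entropy(probs: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-probability of the labeled relation over all bags."""
    eps = 1e-12  # guards against log(0)
    total = -sum(math.log(p[y] + eps) for p, y in zip(probs, labels))
    return total / len(labels)


def total_loss(rank_loss: float, cls_loss: float,
               w_rank: float, w_cls: float) -> float:
    """Weighted sum of the rank loss and the relation classification loss."""
    return w_rank * rank_loss + w_cls * cls_loss


cls = cross_entropy([[0.5, 0.5], [0.9, 0.1]], labels=[0, 0])
loss = total_loss(rank_loss=0.2, cls_loss=cls, w_rank=0.5, w_cls=0.5)
```

The two weights control how strongly the generator is driven toward ranking high in the bag versus toward supporting correct relation classification.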
3.1 Experiment Setup
We conduct experiments on the Riedel dataset riedel2010modeling, which aligns Freebase relations with the New York Times (NYT) corpus. The dataset contains 53 relations, including "NA", which indicates no relation. There are 522,611 sentences linked to 281,270 entity pairs for training and 172,448 sentences linked to 96,678 entity pairs for testing.
In our experiments, we adopt stochastic gradient descent (SGD) as optimization strategy. We select the word dimension as , position dimension as , kernel size as , the number of feature maps or filters as , batch size as , aligned sentence length as , and the dropout probability as . We also set the learning rate of generator and discriminator as and respectively.
Following previous work, we evaluate our framework with held-out evaluation. We adopt Precision@N (P@N), area under the curve (AUC), and aggregated Precision-Recall (PR) curves as evaluation metrics to illustrate the performance of our proposed framework.
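For readers unfamiliar with these metrics, the sketch below shows one common way to compute P@N, the PR curve, and its AUC from a list of scored predictions. The input format (confidence, correctness flag), the trapezoidal integration, and the function names are assumptions for illustration, not the evaluation script used in the experiments.

```python
from typing import List, Tuple

Prediction = Tuple[float, bool]  # (confidence, whether the predicted fact is correct)


def precision_at_n(preds: List[Prediction], n: int) -> float:
    """Precision among the n most confident predictions."""
    top = sorted(preds, key=lambda p: p[0], reverse=True)[:n]
    return sum(1 for _, ok in top if ok) / len(top)


def pr_curve(preds: List[Prediction], total_positives: int) -> List[Tuple[float, float]]:
    """(recall, precision) after each prediction, in order of decreasing confidence."""
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    points = []
    correct = 0
    for i, (_, ok) in enumerate(ranked, start=1):
        correct += int(ok)
        points.append((correct / total_positives, correct / i))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the PR curve via the trapezoidal rule over recall."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


preds = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
curve = pr_curve(preds, total_positives=3)
area = auc(curve)
```

In held-out evaluation, `total_positives` would be the number of relation facts in the test portion of the knowledge base.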
3.2 Performance Evaluation of RDSGAN
We adopt the following baselines for distantly supervised relation extraction.
• Mintz mintz2009distant, MultiR hoffmann2011knowledge and MIML surdeanu2012multi: Non-neural models based on handcrafted features.
• CNN+ATT and PCNN+ATT lin2016neural: Robust CNN-based models reducing noisy data based on selective attention mechanism.
• DSGAN+ATT Qin2018DSGAN: A robust model using GAN to recognize true positive data.
• PDCNN+TATT peng2019dilated: A dilated CNN-based model with soft entity type constraints.
The overall performance of our method compared with the aforementioned baselines for distantly supervised relation extraction is shown in Table 1. We can see that our method achieves much better results on the P@N (N = 100, 200, 300) metrics and improves the AUC value by 8.98% and 7.69% compared to DSGAN+ATT and PDCNN+TATT, respectively. The large improvement comes from rank-based distant supervision, which removes much of the false positive data for relation extraction.
We also plot the PR curves of the different models in Figure 2 for recall below 0.4. From the overall results, we can see that: (1) All the non-neural baselines perform poorly because the features they use are mostly derived from NLP tools, which can be erroneous. (2) CNN+ATT and PCNN+ATT improve the performance because they use sentence-level selective attention to reduce noise in the bag of each entity pair. (3) PDCNN+TATT further enhances the performance as it incorporates soft entity type constraints to improve the attention mechanism. (4) Our method RDSGAN+ATT achieves the best precision over the entire range of recall on the NYT dataset. As recall increases, the precision of RDSGAN+ATT decreases more slowly than that of the other models, and it outperforms PDCNN+TATT by 6% on average. This shows that our proposed framework can consistently generate valid instances that improve the performance of distant supervision relation extraction.
4 Related Work
Generative Adversarial Training: Recent studies have proposed several GAN-based methods that use gradient information in adversarial training to generate instances for relation extraction. Qin2018DSGAN proposes DSGAN to recognize true positive instances from a noisy dataset via reinforcement learning yu2017seqgan. li2019gan uses a GAN-driven semi-distant supervision approach to construct accurate instances and avoid wrong negative labeling. zhao2019auxiliary proposes an auxiliary classifier in the discriminator to generate high-quality training data for relation classifiers. Unlike previous models, which focus on discrimination, we focus on generating valid instances to provide a clean dataset for relation extraction.
Neural Relation Extraction: In recent years, neural network models have shown superior performance in denoising for relation extraction. zhang2018attention explores attention-based capsule networks in a multi-instance multi-label learning (MIML) framework. bai2019structured employs minimally structured learning to predict instance-level relation mentions. beltagy2019combining utilizes joint training on distant supervision to identify noisy sentences. Most recently, BERT devlin2018bert and its variants shi2019simple; soares2019matching; papanikolaou2019deep have been proposed to leverage the attention mechanism and the Transformer to learn contextual relations between words. Unlike previous approaches, we utilize rank-based distant supervision, which combines both soft attention and hard decision to reduce noise.
In this paper, we propose RDSGAN, a novel generative neural framework which learns the distribution of true positive instances and automatically generates massive valid instances to provide a clean dataset for distant supervision relation extraction. We propose the method of rank-based distant supervision to address the false positive problem. Experimental results on the NYT dataset show the superiority of our framework over strong baselines.
- Valid instances include true positive and true negative instances