FedNER: Medical Named Entity Recognition with Federated Learning
Medical named entity recognition (NER) has wide applications in intelligent healthcare. Sufficient labeled data is critical for training accurate medical NER models. However, the labeled data in a single medical platform is usually limited. Although labeled datasets exist in many different medical platforms, they cannot be directly shared, since medical data is highly privacy-sensitive. In this paper, we propose a privacy-preserving medical NER method based on federated learning, which can leverage the labeled data in different platforms to boost the training of medical NER models while removing the need to exchange raw data among platforms. Since the labeled data in different platforms usually differs in entity types and annotation criteria, instead of constraining all platforms to share the same model, we decompose the medical NER model in each platform into a shared module and a private module. The private module captures the characteristics of the local data in each platform and is updated using the local labeled data. The shared module is learned across different medical platforms to capture the shared NER knowledge. Its local gradients from different platforms are aggregated to update the global shared module, which is then delivered to each platform to update its local shared module. Experiments on three publicly available datasets validate the effectiveness of our method.
Medical named entity recognition (NER) aims to identify medical entities (e.g., drug names, adverse reactions and symptoms) in unstructured medical texts and classify them into different categories Tang et al. (2013). It can be used in many intelligent healthcare tasks such as pharmacovigilance and health monitoring Wang and Zhang (2013). Medical NER has attracted increasing attention in the NLP community, and many methods have been proposed Alex et al. (2007); Ekbal and Saha (2013); Dai et al. (2017). For example, Habibi et al. Habibi et al. (2017) proposed an LSTM-CRF approach, which used an LSTM to encode context information within a sentence and a CRF to jointly decode word labels. Gridach Gridach (2017) further improved this approach by adding an additional character-level LSTM to better encode medical words.
Sufficient labeled data is critical for these methods to train accurate medical NER models Ratinov and Roth (2009). However, the labeled medical data in a single medical platform, such as a hospital, is usually limited. Annotating sufficient labeled data for medical NER is very expensive and time-consuming, and requires substantial expertise in the medical domain Abacha and Zweigenbaum (2011). Although many medical platforms may have some annotated medical NER datasets, these datasets cannot be directly shared to train medical NER models, since medical data contains rich information about patients and is highly privacy-sensitive.
Recently, McMahan et al. McMahan et al. (2017) proposed a privacy-preserving machine learning framework named federated learning, where user data is locally stored and a master server coordinates massive user devices to collaboratively train a global model by aggregating the local model updates. Motivated by federated learning, in this paper we propose a privacy-preserving medical NER method named FedNER. It can leverage the knowledge in the labeled data of different medical platforms to boost the training of the medical NER model in each platform without uploading or exchanging the raw medical data. Since the labeled data in different platforms may differ in entity types and annotation criteria, different from the original federated learning framework where all users share the same model, in FedNER we decompose the medical NER model in each platform into a shared module and a private module. The private module captures the characteristics of the local data in each platform and is updated using the gradients computed from the local labeled data. The shared module captures the shared knowledge among different platforms to empower the training of the medical NER model in each single platform. Its gradients from different medical platforms are aggregated into a unified one to update the global shared module, which is then delivered to each platform to update the local shared module. The above process is repeated until the model converges. We conduct experiments on three publicly available medical NER datasets. The experimental results validate that our method can boost the performance of medical NER by leveraging the labeled data on different platforms for model training in a collaborative way, while removing the need to directly exchange raw data among platforms for better privacy protection.
The main contributions of this paper are summarized as follows:
We propose a FedNER method based on federated learning to learn more accurate medical NER model from the labeled data of multiple medical platforms without the need to directly exchange the raw privacy-sensitive medical data among different platforms.
Different from the original federated learning framework, where all clients share the same model, in FedNER we propose to decompose the medical NER model on each platform into shared and private modules to effectively leverage the knowledge from other platforms and at the same time capture the characteristics of the local data.
We conduct extensive experiments on different benchmark datasets to verify the effectiveness of the proposed FedNER method.
2 Related Work
Medical named entity recognition is a challenging research topic as it requires both understanding of texts and domain knowledge.
Both rule-based methods and statistical methods have been proposed to tackle NER in the medical domain Dong et al. (2016); Nadeau and Sekine (2007).
For example, Embarek et al. Embarek and Ferret (2008) developed a rule-based tagging system by capturing linguistic patterns, e.g., inflected forms and lemmas.
Other rule-based approaches involve domain-specific knowledge bases or tools, such as MetaMap Aronson (2001) and UMLS Odisho et al. (2019).
However, most of these rule-based methods require heavy effort and expertise to design effective rules.
Thus, statistical methods have also been widely adopted, ranging from SVM Isozaki and Kazawa (2002) to more recent neural methods Wang et al. (2018); Jain (2015).
For instance, Xu et al. Xu et al. (2017) proposed to use bi-directional LSTM to learn both character and word embeddings and CRF for label decoding.
Zhao et al. Zhao et al. (2019) proposed a joint learning approach for medical entity recognition and normalization.
It uses a character-level CNN to form word representations along with the pre-trained word embeddings and a Bi-LSTM to learn contextual representation of words.
One problem with these methods is their dependency on a large-scale, well-annotated corpus; when facing a small corpus, their performance may degrade significantly.
However, the labeled data in a single medical platform is usually limited, and annotating a large-scale corpus is laborious and time-consuming.
Without sufficient labeled data, it is difficult for these deep learning based methods to achieve satisfactory performance.
Although different medical platforms may each have their own labeled datasets, these datasets cannot be directly aggregated since medical data is highly privacy-sensitive Sweeney (2000).
Both uploading medical data from different platforms to a server and exchanging it between different platforms will cause high risk of privacy leakage.
Moreover, recent laws and regulations such as GDPR impose strict restrictions on sharing privacy-sensitive data, which makes directly exchanging medical data among platforms even more difficult.
Recently, federated learning was proposed by McMahan et al. McMahan et al. (2017) to collectively train intelligent models from the locally stored data of massive users, removing the need to upload this data to a server and thereby reducing privacy and security risks. In federated learning, all user clients share the same model, which is coordinated by a central server. Each client updates its local model with private data and transmits the local model update to the central server. The server then aggregates the received model updates from massive user clients, updates the global model and distributes the new model to each client for the next round of training. In federated learning, the raw data never leaves the user devices and only model updates are uploaded to the server, which generally contain less information than the raw data. Federated learning has been applied to a few NLP tasks to exploit the corpora from different sources in a privacy-preserving way Jiang et al. (2019); Hardy et al. (2017). For instance, Jiang et al. Jiang et al. (2019) proposed a federated topic modeling approach, which trains a unified high-quality topic model using data from multiple sensitive text corpora. In existing federated learning methods, different clients usually share the same model, assuming that the private data of different clients shares the same characteristics. However, in medical NER, the entity types and annotation criteria of different medical platforms usually have significant differences. Thus, in our FedNER method we decompose the medical NER model on each medical platform into a shared module and a private module to leverage the shareable knowledge among different platforms and at the same time capture the characteristics of the data on each platform.
3 FedNER Method
In this section we first introduce the basic medical NER model used in our method. Then we introduce the FedNER framework for privacy-preserving medical NER model training with data from different medical platforms.
3.1 Medical NER Model
Following many existing works Xu et al. (2017); Zhao et al. (2019), we formulate medical NER as a sequence labeling task. For example, given the input sentence “Aspirin causes me a severe headache”, the medical NER model will output the tag sequence “[DRUG] [O] [O] [O] [ADE] [ADE]”. Our medical NER model consists of three modules: a word representation module, a context modeling module and a label decoding module.
The word representation module incorporates three kinds of embeddings to represent words, i.e., pre-trained word embeddings, character-based embeddings and language model embeddings.
The pre-trained word embedding represents each word using a semantic vector.
Denote a sentence with $N$ words as $[w_1, w_2, \dots, w_N]$; it is converted to an embedded sequence $[\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_N]$ through the embedding matrix $\mathbf{W}_e \in \mathbb{R}^{V \times D}$, where $D$ is the dimension of the word embedding and $V$ is the vocabulary size.
However, pre-trained word embeddings alone are insufficient due to the many rare and out-of-vocabulary (OOV) words in the medical field.
Thus, we additionally model each word at a character level.
For a word with $M$ characters, we first use an embedding matrix $\mathbf{W}_c$ to obtain the character-level embedding outputs, where $D_c$ is the dimension of the character embedding and $M$ is the number of characters.
The output character embedding of word $w_i$ is denoted as $[\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_M]$.
To model the relation between characters in a word, we apply a CNN layer to learn contextual representations of each character; the output sequence of the word is denoted as $[\mathbf{h}^c_1, \mathbf{h}^c_2, \dots, \mathbf{h}^c_M]$.
Then the contextual character sequence is sent to a max-pooling layer and transformed into the final character-based embedding $\mathbf{e}^c_i$ for word $w_i$.
For a sentence of length $N$, the final character-based embeddings of the sentence are denoted as $[\mathbf{e}^c_1, \mathbf{e}^c_2, \dots, \mathbf{e}^c_N]$.
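The character-level pipeline above (embedding lookup, a width-3 convolution over characters, then max pooling over positions) can be sketched in NumPy as follows. All dimensions, weights and the function name are illustrative, not the ones used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
V_c, D_c, F = 30, 8, 16                 # char vocab size, char embedding dim, filters (illustrative)
W_c = rng.normal(size=(V_c, D_c))       # character embedding matrix
filters = rng.normal(size=(F, 3, D_c))  # F filters with kernel width 3

def char_embedding(char_ids):
    """Map one word's character ids to a fixed-size vector: lookup -> CNN -> max pooling."""
    M = len(char_ids)
    emb = W_c[char_ids]                  # (M, D_c) character embeddings
    emb = np.pad(emb, ((1, 1), (0, 0)))  # pad so each position has a width-3 window
    conv = np.array([[np.sum(emb[t:t + 3] * f) for t in range(M)]
                     for f in filters])  # (F, M) contextual character features
    return conv.max(axis=1)              # max-pool over positions -> (F,)

vec = char_embedding([3, 7, 7, 12, 25])  # character ids of one word
assert vec.shape == (16,)
```

Because of the max pooling, words of any length are mapped to vectors of the same size, which is what allows the character-based embedding to be concatenated with the word embedding.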
Besides, since the meanings of many words are context-dependent, we additionally use the pre-trained ELMo language model Peters et al. (2018) to generate contextualized representations of words. The pre-trained word embedding, character-based embedding and language model embedding are concatenated to form the final representation of each word.
The context modeling module utilizes two layers to enhance the word representations by capturing the dependencies between words. The first layer is a word-level CNN, which aims to capture local context information Kim (2014). Many medical entities are short combinations of several words, e.g., “itchy scalp” and “restless leg syndrome”. Thus, modeling relations between near neighbors may help better recognize entities. Given the output sequence $[\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N]$ from the word representation module as input, the word-level CNN output is denoted as $[\mathbf{h}_1, \mathbf{h}_2, \dots, \mathbf{h}_N]$. The second layer is a Bi-LSTM, which aims to model long-distance dependencies between words in both directions Huang et al. (2015). Some descriptions of health conditions have a relatively long span, and modeling only local contexts may be insufficient. For instance, in the expression “hair has been definitely falling out”, the interaction between “hair” and “falling out” is essential for entity prediction. Thus, we use a Bi-LSTM layer to model this kind of relationship. The output of the Bi-LSTM layer is the contextual sequence $[\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_N]$. By using a combination of CNN and LSTM, both local and global contexts are taken into consideration.
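To make the bidirectional part of this module concrete, the following NumPy sketch runs a forward and a backward recurrence over a sentence and concatenates the two hidden states at each position. For brevity it uses a plain tanh RNN cell rather than an LSTM, and all dimensions and weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 10, 6  # word representation dim, hidden dim per direction (illustrative)
Wx_f, Wh_f = 0.1 * rng.normal(size=(H, D)), 0.1 * rng.normal(size=(H, H))
Wx_b, Wh_b = 0.1 * rng.normal(size=(H, D)), 0.1 * rng.normal(size=(H, H))

def bi_rnn(x):
    """Bidirectional recurrence over a sentence x of shape (N, D);
    returns (N, 2H): forward and backward states concatenated per position."""
    N = len(x)
    h_f, h_b = np.zeros((N, H)), np.zeros((N, H))
    h = np.zeros(H)
    for t in range(N):              # left-to-right pass
        h = np.tanh(Wx_f @ x[t] + Wh_f @ h)
        h_f[t] = h
    h = np.zeros(H)
    for t in reversed(range(N)):    # right-to-left pass
        h = np.tanh(Wx_b @ x[t] + Wh_b @ h)
        h_b[t] = h
    return np.concatenate([h_f, h_b], axis=1)

out = bi_rnn(rng.normal(size=(5, D)))
assert out.shape == (5, 12)
```

Each position's output thus depends on the whole sentence in both directions, which is what lets distant words such as “hair” and “falling out” interact.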
The label decoding module aims to decode word labels. Neighbor labels usually have relatedness with each other in NER task. Thus, we use the conditional random field (CRF) to jointly decode the optimal label chains by considering label dependencies. The loss function of our NER model is formulated as:
$$\mathcal{L}(\Theta) = -\sum_{(\mathbf{x}, \mathbf{y}) \in \mathcal{D}} \log p(\mathbf{y} \mid \mathbf{x}; \Theta),$$
where $\Theta$ denotes all the trainable parameters in the NER model, $\mathcal{D}$ is the labeled training dataset, $\mathbf{x}$ is a word sequence and $\mathbf{y}$ is the corresponding label sequence.
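For a linear-chain CRF, $-\log p(\mathbf{y} \mid \mathbf{x})$ is the log-partition over all tag paths minus the score of the gold path, which the standard forward algorithm computes in polynomial time. A minimal NumPy sketch with random illustrative scores:

```python
import numpy as np

def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood -log p(y|x) of one tag sequence under a
    linear-chain CRF. emissions: (N, T) per-word tag scores,
    transitions: (T, T) tag-transition scores, tags: length-N gold tag ids."""
    N, T = emissions.shape
    # Score of the gold path: emission scores plus transition scores.
    score = emissions[np.arange(N), tags].sum()
    score += sum(transitions[tags[t], tags[t + 1]] for t in range(N - 1))
    # Log-partition over all T**N possible paths via the forward algorithm.
    alpha = emissions[0]
    for t in range(1, N):
        alpha = np.logaddexp.reduce(alpha[:, None] + transitions, axis=0) + emissions[t]
    log_z = np.logaddexp.reduce(alpha)
    return log_z - score

rng = np.random.default_rng(2)
loss = crf_nll(rng.normal(size=(6, 4)), rng.normal(size=(4, 4)), [0, 1, 3, 2, 2, 0])
assert loss >= 0.0  # the gold path is one of the paths summed in the partition
```

At decoding time the same recursion with a max in place of the log-sum (Viterbi) yields the optimal label chain.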
3.2 FedNER Framework
In this section, we introduce our privacy-preserving approach for medical NER. The framework of our approach is shown in Figure 2. In this framework, the server coordinates multiple clients for local model updating and global model sharing. More specifically, the clients here are different medical platforms, which train their local models with privately stored data. The central server monitors each platform for gradient uploads and performs the global model update once it has collected gradients from all platforms. Then it distributes the updated parameters of the shared model to each platform for the next round of model training. The overall learning framework of FedNER is illustrated in Algorithm 1.
In the training phase, each platform computes the model gradients using its locally stored data.
The data distribution across different platforms is non-IID, and each platform keeps its data private, uploading only the gradients computed during local training.
Since the medical data stored in different platforms may have different characteristics and annotation criteria, sharing all model parameters between them may not be an optimal solution.
For example, some platforms may use the BIO tagging scheme while others may prefer the more complex BIOES tagging scheme.
Besides, some platforms mainly aim to find drug names and their corresponding dosages, while others may be more user-oriented, requiring the system to recognize user symptoms and adverse drug effects.
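As an illustration of the tagging-scheme difference, the same entity mention is encoded differently under BIO and BIOES, even though both encode exactly the same spans (the entity type and the helper function here are illustrative):

```python
# Tokens around the entity mention "restless leg syndrome".
tokens = ["has", "restless", "leg", "syndrome", "now"]

# BIO: Begin / Inside / Outside.
bio = ["O", "B-ADE", "I-ADE", "I-ADE", "O"]

# BIOES additionally marks the End of a multi-word entity and Single-word entities.
bioes = ["O", "B-ADE", "I-ADE", "E-ADE", "O"]

def spans(tags):
    """Extract (start, end, type) entity spans; works for both schemes."""
    out, start = [], None
    for i, tag in enumerate(tags + ["O"]):
        if tag[0] in ("B", "S") or tag == "O":
            if start is not None:
                out.append((start, i, tags[start].split("-")[1]))
                start = None
        if tag[0] in ("B", "S"):
            start = i
    return out

assert spans(bio) == spans(bioes) == [(1, 4, "ADE")]
```

The label vocabularies (and hence the CRF output layers) differ between the two schemes, which is one reason a single shared decoding layer cannot fit every platform.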
Thus, we propose to decompose the model into a shared module and a private module.
The private module consists of the top two layers of our medical NER model, i.e., the Bi-LSTM and the CRF, which aim to learn platform-specific context representations and label decoding strategies; the remaining bottom layers constitute the shared module.
Denote the set of platforms as $\mathcal{P}$ and the global batch size as $B$. For the $i$-th platform in $\mathcal{P}$, the training dataset is $\mathcal{D}_i$ and the loss function is denoted as $\mathcal{L}_i$. At the beginning of each iteration, the platform first selects a mini-batch $\mathcal{B}_i$ of training data from $\mathcal{D}_i$, where $|\mathcal{B}_i| = B$. Then the platform computes the gradients associated with the parameters of the private and shared modules as $\mathbf{g}_i^p = \nabla_{\Theta_i^p} \mathcal{L}_i$ and $\mathbf{g}_i^s = \nabla_{\Theta^s} \mathcal{L}_i$, respectively. The parameters of the private module are locally updated by $\Theta_i^p \leftarrow \Theta_i^p - \eta\, \mathbf{g}_i^p$, where $\eta$ is the learning rate. The gradients $\mathbf{g}_i^s$ of the shared module are sent to a third-party central server for information sharing among different platforms. Instead of directly sharing raw data, our approach only uploads the gradients of the shared module, which generally contain less privacy-sensitive information.
The central server contains an aggregator and a globally shared model. Here we assume the server belongs to a trusted third party, which means it will not make any malicious attack Bonawitz et al. (2019). At the beginning of each iteration, the server monitors each platform for gradient uploads. Once it receives gradients from a platform, the server stores them for future aggregation. When the server finishes receiving gradients from all platforms in $\mathcal{P}$, the aggregator aggregates the locally-computed gradients from all platforms. The aggregated gradient is a weighted summation of the received locally-computed gradients, which is formulated as:

$$\mathbf{g}^s = \sum_{i \in \mathcal{P}} \frac{|\mathcal{D}_i|}{\sum_{j \in \mathcal{P}} |\mathcal{D}_j|}\, \mathbf{g}_i^s.$$
Since gradients from different platforms are aggregated together, the information of the labeled data in each platform is harder to infer, and privacy is thus better protected. The aggregator uses the aggregated gradient to update the parameters of the globally shared model stored on the central server by $\Theta^s \leftarrow \Theta^s - \eta\, \mathbf{g}^s$. The updated globally shared model is then distributed to each platform to update their local shared modules. The process described above is repeated iteratively until the entire model converges.
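The server-side aggregation can be sketched as follows, assuming gradients are weighted by local dataset size as in FedAvg (McMahan et al., 2017); the platform names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
eta = 0.001
theta_shared = np.zeros(100)  # globally shared model on the server (illustrative size)

# Per-platform shared-module gradients and local dataset sizes (illustrative).
grads = {"platform_a": rng.normal(size=100),
         "platform_b": rng.normal(size=100),
         "platform_c": rng.normal(size=100)}
sizes = {"platform_a": 4484, "platform_b": 1250, "platform_c": 2500}

# Weighted aggregation: platforms with more labeled data contribute more.
total = sum(sizes.values())
g = sum((sizes[p] / total) * grads[p] for p in grads)

theta_shared = theta_shared - eta * g  # global shared-module update
# theta_shared is then distributed back to every platform.
```

Each individual gradient is hidden inside the weighted sum, so no single platform's update is exposed directly to the others.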
In FedNER, the medical NER model learning can benefit from incorporating the annotated information of the labeled data on different platforms, and the privacy is also well-protected by removing the need to exchange raw data directly among different platforms.
4 Experiments

4.1 Dataset and Experimental Settings
| Dataset | # Sentences | Entity types | # Entities |
| --- | --- | --- | --- |
| ADE Corpus | 4,484 | Drug, ADE, Dosage | 4,785 |
We experiment on three publicly available medical NER datasets, i.e., CADEC Karimi et al. (2015), ADE Corpus Gurulingappa et al. (2012) and SMM4H Weissenbacher et al. (2019). The detailed information of these datasets is listed in Table 1. There are 341 overlapping entities among the three datasets. For each dataset, we randomly sample 80% of the sentences as training data and use the rest as testing data. For word embeddings, we use the pre-trained GloVe embeddings Pennington et al. (2014), which have a dimension of 300. The dimension of the randomly initialized character embeddings is 100. The convolution layers of the character-level and word-level CNNs have 200 filters, with a kernel size of 3. The Bi-LSTM layer has 2200 hidden states. Adam is chosen as the optimizer, with an initial learning rate of 0.001. We use dropout to mitigate overfitting, with the dropout rate set to 0.2. The number of gradients aggregated in each iteration is 64. Following previous work Abacha and Zweigenbaum (2011), we use the BIO tagging scheme. We independently repeat each experiment 10 times and report the average strict F1 and relax F1 scores. Under strict F1 evaluation, entity spans are considered correct only if their position indices exactly match the gold annotations. Under relax F1 evaluation, only an overlap between the range of predicted positions and the gold annotations is needed.
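The strict and relaxed matching criteria can be stated precisely as follows, assuming half-open (start, end) word-index spans (the function names are illustrative):

```python
def strict_match(pred, gold):
    """A predicted span counts only if its (start, end) exactly equals the gold span."""
    return pred == gold

def relax_match(pred, gold):
    """A predicted span counts if it overlaps the gold span at all."""
    return pred[0] < gold[1] and gold[0] < pred[1]

gold = (2, 5)  # gold entity covers word positions 2-4
assert strict_match((2, 5), gold) and relax_match((2, 5), gold)
assert not strict_match((3, 5), gold) and relax_match((3, 5), gold)
assert not relax_match((5, 7), gold)  # no overlap at all
```

Precision, recall and F1 are then computed over the matched spans under each criterion.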
4.2 Experimental Results
We conduct experiments to compare the performance of our FedNER method with several baseline NER methods, including: (1) CNN-CRF Collobert et al. (2011), which uses a CNN to learn word representations and a CRF to decode labels; (2) LSTM-CRF Habibi et al. (2017), which uses a Bi-LSTM to learn word representations; (3) GRAM-CNN Zhu et al. (2017), which uses CNNs to learn both character and word representations and a CRF for label decoding; (4) CNN-LSTM-CRF Ma and Hovy (2016), which uses a CNN to learn character representations and an LSTM to learn word representations; (5) S-LSTM-CRF Lample et al. (2016), a variant of LSTM-CRF that uses a stacked Bi-LSTM for word representation; (6) CNN-CLSTM-CRF Shen et al. (2017), a variant of CNN-LSTM-CRF which uses a combination of CNN and LSTM to learn word representations; (7) ELMoNER, the medical NER model introduced in Section 3.1, trained on a single platform. The results are summarized in Table 2.
We have two main findings from the results. First, compared with the other baseline NER methods, ELMoNER achieves better performance. This is because ELMoNER learns word representations at both the word level and the character level. By learning character-level word representations, the model can better handle out-of-vocabulary medical terminologies by looking at their characters. Besides, ELMoNER captures both local and long-term context information by using a combination of CNN and Bi-LSTM networks. Furthermore, it utilizes context-aware word representations generated by the pre-trained language model ELMo to enhance the representations of words.
Second, our FedNER method can consistently outperform other methods on medical NER. This is because the labeled data in a single medical platform is usually limited and insufficient to train an accurate NER model, and the datasets from different medical platforms cannot be exchanged due to the privacy sensitivity. Different from the baseline methods which are trained on the data of a single medical platform, our FedNER method can leverage the labeled data from different medical platforms in a privacy-preserving way to learn the shareable NER knowledge and alleviate the data sparsity problem. Thus, our FedNER method can achieve better performance on medical NER.
4.3 Influence of Training Data Size
Next, we explore whether the proposed FedNER can effectively handle the data scarcity problem on each platform by leveraging the useful data of different platforms. We randomly select different ratios of data for model training, and due to space limits we only show the results on the ADE Corpus dataset in Figure 3. We find that, compared with training the model on the data of a single platform, FedNER can train a more accurate medical NER model by leveraging the useful information from multiple platforms. In addition, as the size of the labeled data on each platform decreases, i.e., the data scarcity problem on a single platform becomes more serious, the performance improvement of FedNER over single-platform training becomes more significant. These results indicate that FedNER can effectively leverage the useful information on different platforms to train a more accurate medical NER model and alleviate the data scarcity problem on a single platform.
4.4 Model Decomposition Strategy
In FedNER we decompose the medical NER model into a shared module and a private module. Next we explore the influence of different model decomposition strategies on the performance of FedNER. The results on ADE Corpus dataset are shown in Figure 4 and the results on other datasets show similar patterns. We find that if the module is not shared, the performance is sub-optimal, since the shareable knowledge among different platforms is not exploited at all, and the labeled data on a single platform is insufficient to train an accurate enough model. However, if all platforms share the same model, the performance is also not optimal. This happens because the data on different platforms usually has different characteristics such as entity types and annotation criteria, which cannot be captured if we constrain different platforms to share exactly the same model. These results validate the effectiveness of our strategy in decomposing the neural medical NER model into a shared module to learn the general and shareable knowledge for NER from multiple platforms, and a private module to capture the local data characteristics.
4.5 Influence of Overlapped Entity Number
A natural assumption is that the performance improvement of FedNER is mainly brought by the overlapped entities across different platforms. In this section we examine this assumption by randomly masking different ratios of the overlapped entities in the training data. The experimental results are shown in Figure 5. We have two findings from the results. First, when there are more overlapped entities, the performance improvement of FedNER over single-platform training is more significant. This is intuitive, since the knowledge of overlapped entities can be easily learned in FedNER by leveraging the information of different platforms for model training. Second, even when all the overlapped entities are masked, FedNER still brings consistent performance improvements. This result shows that FedNER can learn some generalized NER knowledge from the data of different platforms, rather than only the knowledge of overlapped entities.
4.6 Generalization of FedNER Framework
To verify the generalization ability of the FedNER framework, we apply it to different existing medical NER methods to see whether they can benefit from leveraging the data on different platforms for model training under our FedNER framework. We select the best model decomposition strategy for each method. The results on the ADE Corpus dataset are illustrated in Figure 6, and the results on the other datasets show similar patterns. We can see that all the existing medical NER methods compared here achieve significant performance improvements under the FedNER framework compared with single-platform training. These results show that FedNER is a general framework, and can help different medical NER methods leverage the labeled data from different medical platforms in a privacy-preserving way to enhance medical NER model training and alleviate the data sparsity problem.
5 Conclusion and Future Work
In this paper, we propose a FedNER method for medical NER. It can train medical NER models by leveraging the labeled data on different platforms while removing the need to directly exchange the privacy-sensitive medical data among platforms, for better privacy protection. We decompose the medical NER model in each platform into a shared module and a private module. The private module is updated in each platform using the local data to model the platform-specific characteristics. The shared module is used to capture the shareable knowledge among different platforms, and is updated on a server based on the aggregated gradients from multiple platforms. It is then sent back to each platform for the next round of training. Experiments on three benchmark datasets show that our method can effectively improve the performance of medical NER by exploiting the useful information of multiple medical platforms in a privacy-preserving way.
In the future, we plan to strengthen the security guarantees of FedNER by adopting the homomorphic encryption or local differential privacy techniques when gradients of the shared module are uploaded to the server. In addition, we plan to apply FedNER to other NER tasks with privacy-sensitive data on different platforms, such as financial text NER among different companies.
- “ADE” stands for adverse drug effect.
- We also tried BERT but found that ELMo achieves better performance.
- The partition strategy of the private module and shared module will be further discussed in Section 4.4.
- Asma Ben Abacha and Pierre Zweigenbaum. 2011. Medical entity recognition: A comparison of semantic and statistical methods. In BioNLP Workshop, pages 56–64.
- Beatrice Alex, Barry Haddow, and Claire Grover. 2007. Recognising nested named entities in biomedical text. In BioNLP Workshop, pages 65–72.
- Alan R Aronson. 2001. Effective mapping of biomedical text to the umls metathesaurus: the metamap program. In Proceedings of the AMIA Symposium, page 17.
- Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konecny, Stefano Mazzocchi, H Brendan McMahan, et al. 2019. Towards federated learning at scale: System design. In SysML.
- Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. JMLR, 12(Aug):2493–2537.
- Xiang Dai, Sarvnaz Karimi, and Cecile Paris. 2017. Medication and adverse event extraction from noisy text. In ALTA Workshop, pages 79–87.
- Xishuang Dong, Lijun Qian, Yi Guan, Lei Huang, Qiubin Yu, and Jinfeng Yang. 2016. A multiclass classification method based on deep learning for named entity recognition in electronic medical records. In New York Scientific Data Summit, pages 1–10. IEEE.
- Asif Ekbal and Sriparna Saha. 2013. Stacked ensemble coupled with feature selection for biomedical entity extraction. Knowledge-Based Systems, 46:22–32.
- Mehdi Embarek and Olivier Ferret. 2008. Learning patterns for building resources about semantic relations in the medical domain. In LREC.
- Mourad Gridach. 2017. Character-level neural network for biomedical named entity recognition. Journal of biomedical informatics, 70:85–91.
- Harsha Gurulingappa, Abdul Mateen Rajput, Angus Roberts, Juliane Fluck, Martin Hofmann-Apitius, and Luca Toldo. 2012. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. Journal of biomedical informatics, 45(5):885–892.
- Maryam Habibi, Leon Weber, Mariana Neves, David Luis Wiegandt, and Ulf Leser. 2017. Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics, 33(14):i37–i48.
- Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, and Brian Thorne. 2017. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv:1711.10677.
- Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional lstm-crf models for sequence tagging. arXiv:1508.01991.
- Hideki Isozaki and Hideto Kazawa. 2002. Efficient support vector classifiers for named entity recognition. In ACL, pages 1–7.
- Devanshu Jain. 2015. Supervised named entity recognition for clinical data. In CLEF (Working Notes).
- Di Jiang, Yuanfeng Song, Yongxin Tong, Xueyang Wu, Weiwei Zhao, Qian Xu, and Qiang Yang. 2019. Federated topic modeling. In CIKM, pages 1071–1080. ACM.
- Sarvnaz Karimi, Alejandro Metke-Jimenez, Madonna Kemp, and Chen Wang. 2015. Cadec: A corpus of adverse drug event annotations. Journal of biomedical informatics, 55:73–81.
- Yoon Kim. 2014. Convolutional neural networks for sentence classification. In EMNLP, pages 1746–1751.
- Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In NAACL-HLT, pages 260–270.
- Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional lstm-cnns-crf. In ACL, pages 1064–1074.
- Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In AISTATS, pages 1273–1282.
- David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26.
- Anobel Odisho, Briton Park, Nicholas Altieri, William Murdoch, Peter Carroll, Matthew Cooperberg, and Bin Yu. 2019. Pd58-09 extracting structured information from pathology reports using natural language processing and machine learning. The Journal of Urology, 201(Supplement 4):e1031–e1032.
- Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In EMNLP, pages 1532–1543.
- Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL.
- Lev Ratinov and Dan Roth. 2009. Design challenges and misconceptions in named entity recognition. In CoNLL, pages 147–155.
- Yanyao Shen, Hyokun Yun, Zachary Lipton, Yakov Kronrod, and Animashree Anandkumar. 2017. Deep active learning for named entity recognition. In RepL4NLP Workshop, pages 252–256.
- Latanya Sweeney. 2000. Simple demographics often identify people uniquely. Health (San Francisco), 671:1–34.
- Buzhou Tang, Hongxin Cao, Yonghui Wu, Min Jiang, and Hua Xu. 2013. Recognizing clinical entities in hospital discharge summaries using structural support vector machines with word representation features. In BMC medical informatics and decision making, volume 13, page S1.
- Zhenghui Wang, Yanru Qu, Liheng Chen, Jian Shen, Weinan Zhang, Shaodian Zhang, Yimei Gao, Gen Gu, Ken Chen, and Yong Yu. 2018. Label-aware double transfer learning for cross-specialty medical named entity recognition. arXiv:1804.09021.
- Zhong-Yi Wang and Hong-Yu Zhang. 2013. Rational drug repositioning by medical genetics. Nature biotechnology, 31(12):1080.
- Davy Weissenbacher, Abeed Sarker, Arjun Magge, Ashlynn Daughton, Karen O'Connor, Michael Paul, and Graciela Gonzalez. 2019. Overview of the fourth social media mining for health (smm4h) shared tasks at acl 2019. In SMM4H Workshop, pages 21–30.
- Kai Xu, Zhanfan Zhou, Tianyong Hao, and Wenyin Liu. 2017. A bidirectional lstm and conditional random fields approach to medical named entity recognition. In AISI, pages 355–365.
- Sendong Zhao, Ting Liu, Sicheng Zhao, and Fei Wang. 2019. A neural multi-task learning framework to jointly model medical named entity recognition and normalization. In AAAI, volume 33, pages 817–824.
- Qile Zhu, Xiaolin Li, Ana Conesa, and Cécile Pereira. 2017. Gram-cnn: a deep learning approach with local context for named entity recognition in biomedical text. Bioinformatics, 34(9):1547–1554.