Adversarial Attacks Against Deep Learning Systems for ICD-9 Code Assignment
Manual annotation of ICD-9 codes is a time-consuming and error-prone process. Deep learning based systems tackling the problem of automated ICD-9 coding have achieved competitive performance. Given the increasing proliferation of electronic medical records, such automated systems are expected to eventually replace human coders. In this work, we investigate how a simple typo-based adversarial attack strategy can impact the performance of state-of-the-art models for the task of predicting the top 50 most frequent ICD-9 codes from discharge summaries. Preliminary results indicate that a malicious adversary, using gradient information, can craft perturbations that appear as ordinary human typos in only a small fraction of the words in a discharge summary, yet significantly degrade the performance of the baseline model.
1 Introduction
The International Classification of Diseases (ICD) establishes a standardized, fine-grained classification system for a broad range of diseases, disorders, injuries, symptoms, and other related health conditions. It is primarily intended for use by healthcare workers, policymakers, insurers, and national health program managers. The United States incurs administrative costs of billions of dollars annually arising from a complex billing infrastructure. In particular, ICD code assignment is typically a manual process, consuming on average between 25 and 43 minutes per patient depending on the ICD version. It is also prone to errors resulting from inexperienced coders, variation between coders, incorrect grouping of codes, or mistakes in the patient discharge summaries. These errors are very costly, with one report estimating that preventable errors in ICD coding cost the Medicare system $31.6 billion in FY2018.
Recent work [12, 15, 1] has tried to automate the task of ICD code assignment using deep learning. Typically framing it as a multilabel classification problem, researchers have trained Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer models to predict ICD-9 codes from patient discharge summaries. These models have outperformed both rule-based approaches and those using conventional algorithms such as logistic regression, support vector machines, and random forests, achieving competitive micro F1 scores in the range of 42% to 68%. Among these models, the CNN-based ones have achieved the best performance.
Neural network models have revolutionized the field of NLP, and state-of-the-art models for various NLP tasks are deep neural networks such as BERT, bidirectional RNNs, or CNN-based methods. Recent works [9, 10, 13, 19] have shown a particular vulnerability of such deep models to adversarial examples, which are often produced by adding small and imperceptible perturbations to the input data. State-of-the-art NLP models are no exception to such perturbations. Prior work surveys the different adversarial attacks and defense strategies in the NLP literature. Based on the granularity of the perturbation, adversarial attack strategies in NLP can be classified into three types: character-level attacks, word-level attacks, and sentence-level attacks. In a character-level attack, noise is introduced at the character level; such noise can arise naturally, through typos and misspellings, or through intentional modification by a malicious third party. [11, 3, 2] are some of the existing character-level attack strategies in NLP. To accurately model naturally occurring typos, prior work restricts the typo distribution based on the character constraints of a standard English keyboard. We follow this strategy in our work. Furthermore, we assume a white-box setting where the adversary has access to gradients of the loss function with respect to the model inputs. To our knowledge, this is the first work to investigate the effects of adversarial samples in the clinical NLP domain.
2 Data and Preprocessing
We used MIMIC-III, a large open-source database comprising information on patients admitted to critical care units of the Beth Israel Deaconess Medical Center (Boston, Massachusetts, USA). The database contains de-identified electronic health records with both structured and unstructured data, including diagnostic and laboratory results, medications, and discharge summaries. In this work, we focus on discharge summaries, which encapsulate the details of a patient’s stay.
Each discharge summary is manually annotated by human coders with multiple ICD-9 codes describing both the diagnoses and the procedures that the patient underwent. Of the approximately 13,000 possible ICD-9 codes, 8,921 (6,918 diagnosis, 2,003 procedure) are present in our dataset. Following previous work, we merge discharge summaries corresponding to the same patient ID so that no patient appears twice in our dataset, resulting in 47,427 discharge summaries. This ensures that there is no ‘data leakage’ between the train, validation, and test sets.
The full label setting is quite noisy and suffers from class imbalance. Potential sources of noise include both missed assignments (failing to annotate all relevant ICD-9 codes) and incorrect assignments (annotating similar but incorrect ICD-9 codes). Consequently, it is relatively trivial to develop an adversarial attack strategy in the full label setting. For instance, one could simply find the keywords corresponding to low-frequency labels and then either append them to or remove them from a discharge summary to alter a machine learning model’s prediction. This strategy will, however, fail for frequent labels, since we expect the model to generalize beyond memorizing a few keywords. Therefore, we limited the label set to the 50 most frequent labels and removed discharge summaries that were not annotated with at least one of those labels. The resulting dataset was then split into training, validation, and test sets containing 8,067, 1,574, and 1,730 discharge summaries, respectively.
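As a sketch, the label filtering described above can be implemented as follows (the function name and data layout are illustrative, not taken from the original pipeline):

```python
from collections import Counter

def filter_top_k_labels(docs, labels, k=50):
    """Keep only the k most frequent labels and drop documents annotated with none of them."""
    counts = Counter(l for ls in labels for l in ls)
    top = {l for l, _ in counts.most_common(k)}
    kept_docs, kept_labels = [], []
    for doc, ls in zip(docs, labels):
        ls = [l for l in ls if l in top]          # restrict to the frequent label set
        if ls:                                    # discard docs with no remaining label
            kept_docs.append(doc)
            kept_labels.append(ls)
    return kept_docs, kept_labels
```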
We followed the same pre-processing steps as previous work. All tokens without any alphabetic characters were removed. We then lowercased all tokens and replaced those appearing fewer than three times in the training documents with an ‘UNK’ token.
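A minimal sketch of these pre-processing steps, assuming whitespace tokenization (the exact tokenizer is not specified here):

```python
import re
from collections import Counter

def preprocess(documents, min_freq=3):
    """Lowercase, drop tokens without alphabetic characters, and replace rare tokens with 'UNK'."""
    tokenized = []
    for doc in documents:
        tokens = [t.lower() for t in re.findall(r"\S+", doc)
                  if any(c.isalpha() for c in t)]   # keep tokens with at least one letter
        tokenized.append(tokens)
    counts = Counter(t for doc in tokenized for t in doc)
    # tokens appearing fewer than min_freq times become 'UNK'
    return [[t if counts[t] >= min_freq else "UNK" for t in doc] for doc in tokenized]
```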
3 Baseline model
Our baseline models were the same as in prior work. Specifically, we used a CNN-based sentence classifier that applies a max pooling layer to obtain sentence vector representations; we call this model the Max Pool based CNN. The second model instead uses label embeddings to calculate attention weights over word positions. These weights are then used to pool the output of the convolutional layer into a sentence vector representation; this model is referred to as the Attention Pool based CNN.
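The two pooling schemes can be sketched roughly as follows in PyTorch; the hyperparameters (embedding size, filter count, kernel width) are illustrative placeholders, not the values used in the experiments:

```python
import torch
import torch.nn as nn

class MaxPoolCNN(nn.Module):
    """Convolution over word embeddings, max pooling over positions, per-label logits."""
    def __init__(self, vocab_size, num_labels, emb_dim=100, num_filters=500, kernel_size=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size, padding=kernel_size // 2)
        self.out = nn.Linear(num_filters, num_labels)

    def forward(self, tokens):                                  # tokens: (batch, seq_len)
        h = torch.tanh(self.conv(self.embed(tokens).transpose(1, 2)))  # (batch, F, L)
        pooled = h.max(dim=2).values                            # max over word positions
        return self.out(pooled)                                 # (batch, num_labels)

class LabelAttentionCNN(nn.Module):
    """Same convolution, but pooled with per-label attention weights over positions."""
    def __init__(self, vocab_size, num_labels, emb_dim=100, num_filters=500, kernel_size=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size, padding=kernel_size // 2)
        self.label_attn = nn.Linear(num_filters, num_labels)    # one attention vector per label
        self.out_weight = nn.Parameter(torch.randn(num_labels, num_filters) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_labels))

    def forward(self, tokens):
        h = torch.tanh(self.conv(self.embed(tokens).transpose(1, 2)))      # (batch, F, L)
        alpha = torch.softmax(self.label_attn(h.transpose(1, 2)), dim=1)   # (batch, L, num_labels)
        m = torch.einsum('bfl,bln->bnf', h, alpha)              # label-specific document vectors
        return (self.out_weight * m).sum(dim=2) + self.bias     # (batch, num_labels)
```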
4 Adversarial attack strategy
We generate adversarial examples with the following formulation: given a pre-trained NLP model and a classification score s(x) (here, the top-5 precision), we seek a perturbed input x′ that minimizes s(x′) subject to the constraint that x′ differs from the original input x in at most a budget of B words. This constraint ensures that the perturbations are small. In our work, we consider perturbations (typos) of four types:
Insert - Insert a character into a word, such as hike → hlike
Delete - Delete a character from a word, such as hike → hke
Swap - Swap two adjacent characters of a word, such as hike → hkie
Replace - Replace a character in a word with any of its neighboring keys on the keyboard, such as hike → hoke. Here o is a neighbor of i on a standard English keyboard.
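These four typo types can be enumerated exhaustively for a given word, for example as below (the keyboard-neighbor map is a partial, illustrative stand-in for a full QWERTY adjacency table):

```python
import string

# Neighboring keys on a standard English (QWERTY) keyboard; partial map for illustration.
KEYBOARD_NEIGHBORS = {
    'a': 'qwsz', 's': 'awedxz', 'e': 'wsdr', 'i': 'uojk', 'o': 'ipkl',
    # ... remaining keys would be filled in for full coverage
}

def typo_candidates(word):
    """Generate all single-edit typos of the four types: insert, delete, swap, replace."""
    cands = set()
    for i in range(len(word) + 1):                        # Insert
        for c in string.ascii_lowercase:
            cands.add(word[:i] + c + word[i:])
    for i in range(len(word)):                            # Delete
        cands.add(word[:i] + word[i + 1:])
    for i in range(len(word) - 1):                        # Swap adjacent characters
        cands.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    for i, ch in enumerate(word):                         # Replace with a neighboring key
        for c in KEYBOARD_NEIGHBORS.get(ch, ''):
            cands.add(word[:i] + c + word[i + 1:])
    cands.discard(word)                                   # a "typo" must differ from the word
    return cands
```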
Given an input sentence tokenized according to the model’s tokenizer as w_1, w_2, ..., w_n, we compute the partial derivative of the loss L with respect to the embedding of each input token, g_i = ∂L/∂e(w_i).
Based on this gradient information, we select an input word to attack. We experiment with two strategies: a maximum gradient strategy, where we choose the word whose gradient has the largest magnitude, and a random strategy, where a word is chosen at random. Once a word is chosen, we generate all possible typos of the four types described above and keep the typo that most decreases the score function; here, we use the top-5 precision as the score function. The document, with the chosen word replaced by its optimal typo, is then fed through this loop repeatedly, up to the budget B. A different word is chosen each time, so that the final document does not differ from the original by too many words. We experiment with different choices of B. The algorithm is shown in alg. 1.
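The greedy loop can be sketched generically over the model’s score and gradient functions (all names here are illustrative; the actual attack computes gradients through the CNN):

```python
def greedy_typo_attack(tokens, score_fn, grad_fn, typo_fn, budget):
    """Greedy gradient-guided typo attack (a sketch; names are illustrative).

    tokens:   list of words in the discharge summary
    score_fn: maps a token list to the model score to minimize (e.g., top-5 precision)
    grad_fn:  maps a token list to per-token gradient magnitudes
    typo_fn:  maps a word to its set of typo candidates
    budget:   maximum number of words to perturb
    """
    tokens = list(tokens)
    attacked = set()                                  # attack each position at most once
    for _ in range(budget):
        grads = grad_fn(tokens)
        # pick the not-yet-attacked position with the largest gradient magnitude
        pos = max((i for i in range(len(tokens)) if i not in attacked),
                  key=lambda i: grads[i], default=None)
        if pos is None:
            break
        attacked.add(pos)
        best_word, best_score = tokens[pos], score_fn(tokens)
        for cand in typo_fn(tokens[pos]):             # try every typo of the chosen word
            trial = tokens[:pos] + [cand] + tokens[pos + 1:]
            s = score_fn(trial)
            if s < best_score:                        # keep the typo that hurts the score most
                best_word, best_score = cand, s
        tokens[pos] = best_word
    return tokens
```

Using a random choice of `pos` instead of the argmax gives the random strategy.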
5 Results
To the best of our knowledge, the models described above represent the current state of the art for automated ICD-9 code assignment. We re-implemented the best performing models using the AllenNLP framework. The test-set performance of the models for the task of predicting the top-50 most frequent ICD-9 codes from discharge summaries is given in table 1. We found that the Max Pool based CNN outperformed the Attention Pool based CNN on all performance metrics. Further, the computation time for both training and generating predictions was much lower for the former than for the latter. We therefore limited our focus to developing an adversarial attack strategy for the Max Pool based CNN.
We experiment with three different values of the budget B and two strategies for selecting the token to attack: the maximum gradient strategy and the random strategy. The maximum gradient strategy can be used to analyze the robustness of the model to malicious attacks, while the random strategy can be used to simulate natural settings with adversarial examples. Each run over the entire corpus of discharge summaries took on the order of hours on a machine with a Tesla K80 GPU. The results are summarized in table 2.
In accordance with our intuition, the maximum gradient strategy performs better than the random strategy, because it can find meaningful perturbations within a large input space. The model’s performance does not drop much under the random strategy, suggesting that the model is somewhat robust to naturally occurring noise such as typos and misspellings. However, this might change as the budget is increased; due to computational limits, we did not explore larger budgets. A key result of our work is that, with only a small fraction of the input tokens modified, the model’s performance drops significantly. This shows the potential vulnerability of this model to malicious attacks. Since only a few tokens are changed, it may be hard to defend against these attacks by training a discriminator to distinguish maliciously modified documents from regular ones.
Tables 3 and 4 show examples of discharge summaries before and after the attack, along with their top-5 labels. It is important to note that, on a few discharge summaries (the last example in both tables), the algorithm increases the top-5 precision instead of decreasing it. The algorithm could be modified to prevent this, which would further reduce precision; due to time constraints, we were not able to accommodate this modification. Nevertheless, these examples show the brittleness of the baseline model to input perturbations.
| Metric | Max Pool CNN | Label Attention Pool CNN |
| --- | --- | --- |
| Macro F1 score | | |
| Micro F1 score | | |
| Top-5 precision | | |
| Budget | Max grad strategy | Random strategy |

Maximum gradient strategy, budget
Maximum gradient strategy, budget
This work is a first step toward exploring the robustness of NLP models used for automatic ICD-9 code classification. Clinical documents differ from regular documents: they are typically generated in a fast-paced environment, with a higher-than-average rate of typos and non-standard acronyms. As a result, clinical NLP models are more susceptible to adversarial samples than a regular NLP model trained on a standard English dataset. A key extension of this work would be to use a dictionary learned from clinical documents and biomedical literature as a defense against these character-level perturbations. Although this might mitigate the decrease in performance, it would not completely prevent it. A more rigorous way to deal with the problem is to account for it in the tokenization strategy. It is easy to push a word out of vocabulary under word-level vocabularies such as those used with word2vec and GloVe embeddings. Strategies that model words unseen in the training dataset, such as word-piece and byte-pair encoding, will also break when typos are introduced, because they learn subwords from a standard dictionary. Therefore, any defense must account for these typos in the fundamental tokenization strategy. An interesting direction would be to learn a word similarity metric and map an unknown word to a close word in the vocabulary, given the input word and the context in which it appears. Building a robust tokenization strategy would be the first step towards an NLP model that is robust against character-level adversarial attacks.
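The unknown-word mapping suggested above could be prototyped with a plain edit-distance lookup that ignores context (the function names and the distance threshold are illustrative assumptions):

```python
def nearest_vocab_word(word, vocab, max_dist=2):
    """Map an out-of-vocabulary word to the closest in-vocabulary word by edit distance.

    A context-free sketch of the dictionary-based defense: returns None when no
    vocabulary word is within max_dist edits, leaving the word as 'UNK'.
    """
    def edit_distance(a, b):
        # classic dynamic-programming Levenshtein distance
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,            # deletion
                               cur[j - 1] + 1,         # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    best = min(vocab, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_dist else None
```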
- (2019-09) MLT-DFKI at CLEF eHealth 2019: multi-label classification of ICD-10 codes with BERT. Cited by: §1.
- (2018) On adversarial examples for character-level neural machine translation. Cited by: §1.
- (2019) Text processing like humans do: visually attacking and shielding NLP systems. Cited by: §1.
- (2018) Error rate drops, but Medicare still lost $31.6 billion to preventable billing errors in FY2018. Cited by: §1.
- (2018-03) AllenNLP: a deep semantic natural language processing platform. Cited by: §5.
- (2019-10) International Classification of Diseases (ICD) information sheet. World Health Organization. Cited by: §1.
- (2016) MIMIC-III, a freely accessible critical care database. Scientific Data 3, pp. 160035. Cited by: §2.
- (2014-08) Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Cited by: §3.
- (2016) Adversarial examples in the physical world. Cited by: §1.
- (2016) Adversarial machine learning at scale. Cited by: §1.
- (2019) TextBugger: generating adversarial text against real-world applications. In Proceedings 2019 Network and Distributed System Security Symposium. Cited by: §1.
- (2018-06) Explainable prediction of medical codes from clinical text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, pp. 1101–1111. Cited by: §1, §2, §3, §5.
- (2016) Practical black-box attacks against machine learning. Cited by: §1.
- (2014) Perspectives. Cited by: §1.
- (2020) Exemplar auditing for multi-label biomedical text classification. arXiv abs/2004.03093. Cited by: §1.
- (2018-11) Digital health and the state of interoperable EHRs (preprint). JMIR Medical Informatics 7. Cited by: §1.
- (2020) Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT. Cited by: §1.
- (2019) Adversarial attacks on deep learning models in natural language processing: a survey. Cited by: §1.
- (2017) Generating natural adversarial examples. Cited by: §1.