Adversarial Attacks Against Deep Learning Systems for ICD-9 Code Assignment


Manual annotation of ICD-9 codes is a time-consuming and error-prone process. Deep-learning-based systems tackling the problem of automated ICD-9 coding have achieved competitive performance. Given the increased proliferation of electronic medical records, such automated systems are expected to eventually replace human coders. In this work, we investigate how a simple typo-based adversarial attack strategy can impact the performance of state-of-the-art models for the task of predicting the top 50 most frequent ICD-9 codes from discharge summaries. Preliminary results indicate that a malicious adversary, using gradient information, can craft specific perturbations that appear as regular human typos, for only a small fraction of the words in the discharge summary, and significantly affect the performance of the baseline model.

1 Introduction

The International Classification of Diseases (ICD) establishes a standardized fine-grained classification system for a broad range of diseases, disorders, injuries, symptoms, and other related health conditions [6]. It is primarily intended for use by healthcare workers, policymakers, insurers and national health program managers. The United States incurs billions of dollars in administrative costs annually arising from a complex billing infrastructure [16]. Specifically, ICD code assignment is typically a manual process, consuming on average between 25 and 43 minutes per patient depending on the ICD version [14]. It is also prone to errors resulting from inexperienced coders, variation between coders, incorrect grouping of codes or mistakes in the patient discharge summaries. These errors are very costly, with one report estimating that preventable errors in ICD coding cost the Medicare system $31.6 billion in FY2018 [4].

Recent work [12, 15, 1] has tried to automate the task of ICD code assignment using deep learning. Typically framing it as a multilabel classification problem, researchers have trained Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Transformer models to predict ICD-9 codes from patient discharge summaries. These models have outperformed rule-based approaches and those utilizing conventional algorithms such as Logistic Regression, Support Vector Machines, and Random Forests, achieving competitive micro F1-scores in the range of 42% to 68%. Amongst these models, those based on CNNs have achieved the best performance.

Neural network models have revolutionized the field of NLP, and state-of-the-art models for various NLP tasks involve deep neural networks such as BERT, bidirectional RNNs or CNN-based methods. Recent works [9, 10, 13, 19] have shown a particular vulnerability of such deep models to adversarial examples, which are often produced by adding small and imperceptible perturbations to the input data. State-of-the-art NLP models are no exception. [18] provides a review of different adversarial attacks and defense strategies in the NLP literature. Based on the granularity of the perturbation, adversarial attack strategies in NLP can be classified into three types: character-level attacks, word-level attacks and sentence-level attacks. In a character-level attack strategy, noise is introduced at the character level. Character-level noise can arise naturally, through typos and misspellings, or through intentional modification by a malicious third party. [11, 3, 2] are some of the existing character-level attack strategies in NLP. To accurately model naturally occurring typos, [17] restrict the typo distribution based on the character constraints found in a standard English keyboard. We follow this strategy in our work. Furthermore, we assume a white-box setting where the adversary has access to the gradients of the loss function with respect to the model inputs. To our knowledge, this is the first work to investigate the effects of adversarial samples in the clinical NLP domain.

2 Data and Preprocessing

We used MIMIC-III [7], a large open-source database comprising information on patients admitted to critical care units of Beth Israel Deaconess Medical Center (Boston, Massachusetts, USA). The database contains de-identified electronic health records with both structured and unstructured data including diagnostics and laboratory results, medications, and discharge summaries. In this work, we focus on discharge summaries, which encapsulate details pertaining to a patient’s stay.

Each discharge summary is manually annotated by human coders with multiple ICD-9 codes, describing both the diagnoses and procedures that the patient underwent. Out of the approximately 13,000 possible ICD-9 codes, 8921 (6918 diagnosis, 2003 procedure) are present in our dataset. Following previous work, we merge discharge summaries corresponding to the same patient ID such that no patient appears twice in our dataset, resulting in 47,427 discharge summaries. This is done to ensure that there is no ‘data leakage’ between train, validation, and test sets.

The full label setting is quite noisy and suffers from class imbalance. Potential sources of noise include both missed assignments (not annotating all relevant ICD-9 codes) and incorrect assignments (annotating similar but incorrect ICD-9 codes). Consequently, it is relatively trivial to develop an adversarial attack strategy in the full label setting. For instance, one could simply find the keywords corresponding to low frequency labels and then either append or remove them from a discharge summary to alter a machine learning model’s prediction. This strategy will however fail for frequent labels since we expect the model to generalize beyond simply memorizing a few keywords. Therefore, we limited the label set to the 50 most frequent labels and removed discharge summaries which were not annotated with at least one of the labels. The resulting dataset was then split into training, validation and testing sets which contained 8067, 1574, and 1730 discharge summaries, respectively.

We followed the same pre-processing steps as in previous work [12]. All tokens without any alphabetic characters were removed. We then lowercased all tokens and replaced those appearing fewer than three times in the training documents with an ‘UNK’ token.
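The pre-processing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline of [12]: the whitespace tokenizer, the function names, and the choice to count total occurrences (rather than the number of documents a token appears in) are all assumptions made here.

```python
from collections import Counter

UNK = "UNK"

def tokenize(text):
    """Lowercase, split on whitespace (an assumption), and drop tokens with no letters."""
    return [tok.lower() for tok in text.split() if any(ch.isalpha() for ch in tok)]

def build_vocab(train_docs, min_count=3):
    """Keep tokens seen at least `min_count` times across the training documents."""
    counts = Counter(tok for doc in train_docs for tok in tokenize(doc))
    return {tok for tok, count in counts.items() if count >= min_count}

def encode(text, vocab):
    """Replace every out-of-vocabulary token with the UNK placeholder."""
    return [tok if tok in vocab else UNK for tok in tokenize(text)]
```

The vocabulary is built from the training split only, so rare or unseen words in validation and test documents also map to ‘UNK’.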

3 Baseline model

Our baseline models were the same as in [12]. Specifically, we used the CNN-based sentence classifier introduced by [8], which utilizes a max pooling layer to obtain sentence vector representations. We call this model the Max Pool based CNN. The other model instead utilizes label embeddings to calculate attention weights over word positions. These weights are then used to pool the output of the convolutional layer and calculate the sentence vector representation. This model is referred to as the Attention Pool based CNN.
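The difference between the two pooling schemes can be illustrated with a small NumPy sketch. It assumes the convolutional layer has already produced a feature map `H` with one column per word position; the names `H`, `U`, `max_pool` and `label_attention_pool` are hypothetical, and the final per-label output layers of the actual models are omitted.

```python
import numpy as np

def max_pool(H):
    """H: (num_filters, num_positions) feature map -> one (num_filters,) document vector."""
    return H.max(axis=1)

def label_attention_pool(H, U):
    """U: (num_labels, num_filters) label embeddings -> (num_labels, num_filters) vectors.

    Each label gets its own softmax attention over word positions, and the
    feature map is averaged with those weights.
    """
    scores = U @ H                                 # (num_labels, num_positions)
    scores = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ H.T                           # one pooled vector per label
```

Max pooling yields a single vector shared by all labels, whereas attention pooling yields one vector per label, which is one plausible reason the latter is more expensive to train and evaluate.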

4 Adversarial attack strategy

We generate adversarial examples as follows. Given a pre-trained classifier and a score function that measures classification quality, we seek perturbations of the input document that lower the score, subject to a constraint that bounds the size of the perturbation (in our case, a budget on the number of modified words). This constraint ensures that the perturbations remain small. In our work, we consider perturbations (typos) of four types:

  1. Insert - Insert characters into a word, such as hike → hlike

  2. Delete - Delete characters in a word, such as hike → hke

  3. Swap - Swap two characters of a word, such as hike → hkie

  4. Replace - Replace a character in a word with any neighboring key on the keyboard, such as hike → hoke. Here o is a neighboring key to i on a standard English keyboard.
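The four typo types can be enumerated with a short Python sketch. Several details are assumptions made for illustration: each typo is a single-character edit, swaps are restricted to adjacent characters (as in the hike → hkie example), and key adjacency is approximated from the three letter rows of a QWERTY layout. The names `keyboard_neighbors` and `typos` are hypothetical.

```python
import string

# Letter rows of a QWERTY keyboard; each row is shifted right relative to the one above.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Column offsets of touching keys, keyed by row offset (approximates the key stagger).
OFFSETS = {-1: (0, 1), 0: (-1, 1), 1: (-1, 0)}

def keyboard_neighbors(ch):
    """Return the letters adjacent to `ch` on the assumed QWERTY layout."""
    for r, row in enumerate(QWERTY_ROWS):
        c = row.find(ch)
        if c == -1:
            continue
        found = set()
        for dr, col_offsets in OFFSETS.items():
            if 0 <= r + dr < len(QWERTY_ROWS):
                other = QWERTY_ROWS[r + dr]
                found.update(other[c + dc] for dc in col_offsets if 0 <= c + dc < len(other))
        return found
    return set()

def typos(word, alphabet=string.ascii_lowercase):
    """Enumerate single-edit insert, delete, swap and replace typos of `word`."""
    out = set()
    for i in range(len(word) + 1):                      # insert one character
        out.update(word[:i] + ch + word[i:] for ch in alphabet)
    for i in range(len(word)):                          # delete one character
        out.add(word[:i] + word[i + 1:])
    for i in range(len(word) - 1):                      # swap two adjacent characters
        out.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    for i, ch in enumerate(word):                       # replace with a neighboring key
        out.update(word[:i] + nb + word[i + 1:] for nb in keyboard_neighbors(ch))
    out.discard(word)
    out.discard("")
    return out
```

For the word hike, the candidate set produced this way contains hlike, hke, hkie and hoke, matching the four examples in the list.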

Given an input sentence tokenized by the model’s tokenizer as $x = (x_1, \dots, x_n)$, we compute the partial derivative of the loss $L$ with respect to each input item $x_i$:

$$ g_i = \frac{\partial L}{\partial x_i}, \quad i = 1, \dots, n \qquad (1) $$

Based on this gradient information, we select an input word to attack. We experiment with two different strategies here: the maximum gradient strategy, where we choose the word corresponding to the maximum gradient, and a random strategy, where a random word is chosen to attack. Once a word is chosen, we generate all possible typos based on the four types described above. The typo that decreases the output score the most, as measured by the score function, is chosen. Here, we use the top5 precision as the score function. The document with the chosen word replaced by the optimal typo is then fed through this loop again until the budget is exhausted. Each time, a different word is chosen, to ensure that the final words don’t differ much from the initial words. We experiment with different choices of the budget. The algorithm is shown in Algorithm 1.
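To make the word-selection step concrete, the sketch below computes per-token gradient magnitudes for a deliberately simplified stand-in model and then picks a token under either strategy. The toy model (max pooling directly over token embeddings followed by a sigmoid output layer, with no convolution) is an assumption chosen so the gradient can be written by hand; in practice the gradients would come from the framework's automatic differentiation. The names `token_saliency` and `choose_token` are hypothetical.

```python
import numpy as np

def token_saliency(E, W, y):
    """Gradient norm of a binary cross-entropy loss w.r.t. each token embedding.

    E: (num_tokens, dim) embeddings, W: (num_labels, dim) weights, y: (num_labels,) 0/1 labels.
    """
    winners = E.argmax(axis=0)                  # token that wins the max in each dimension
    h = E.max(axis=0)                           # max-pooled document vector
    p = 1.0 / (1.0 + np.exp(-(W @ h)))          # sigmoid label probabilities
    grad_h = W.T @ (p - y)                      # d(loss)/d(h) for summed cross-entropy
    grad_E = np.zeros_like(E, dtype=float)
    grad_E[winners, np.arange(E.shape[1])] = grad_h  # gradient reaches only the winners
    return np.linalg.norm(grad_E, axis=1)

def choose_token(saliency, strategy="max_grad", rng=None, exclude=()):
    """Pick the index of the token to attack, skipping already-attacked positions."""
    candidates = [i for i in range(len(saliency)) if i not in exclude]
    if strategy == "random":
        if rng is None:
            rng = np.random.default_rng()
        return int(rng.choice(candidates))
    return max(candidates, key=lambda i: saliency[i])
```

In this toy setting, tokens that never win the max receive zero gradient, which illustrates why a gradient-guided adversary can concentrate its budget on a handful of influential words.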

Input: document, ground truth labels, classifier, budget and score function
while the budget is not exhausted do
    Segment the document into tokens
    for each token in the document do
        Compute the gradient of the loss with respect to the token according to eq. 1
    end for
    Find the token or word to attack based on the gradients, according to the strategy
    Generate all possible typos for the chosen word
    Create a list of documents, each document corresponding to a typo
    Find the document instance that decreases the output score the most and assign it as the current document
end while
Algorithm 1: Adversarial attack for ICD-9 classification
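The loop in Algorithm 1 can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the implementation used for the experiments: `predict`, `saliency` and `make_typos` are hypothetical callables supplied by the caller (a classifier returning a label-to-score mapping, a per-token gradient magnitude, and a typo generator), and only the maximum gradient strategy is shown.

```python
def top5_precision(scores, gold, k=5):
    """Fraction of the k highest-scoring labels that appear in the gold label set."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(label in gold for label in top) / k

def greedy_attack(tokens, gold, predict, saliency, make_typos, budget):
    """Modify up to `budget` distinct words, each time keeping the lowest-scoring typo."""
    tokens = list(tokens)
    attacked = set()
    for _ in range(budget):
        candidates = [i for i in range(len(tokens)) if i not in attacked]
        if not candidates:
            break
        grads = saliency(tokens)
        target = max(candidates, key=lambda i: grads[i])   # maximum gradient strategy
        trials = [tokens[:target] + [typo] + tokens[target + 1:]
                  for typo in sorted(make_typos(tokens[target]))]
        if trials:
            # Keep the perturbed document with the lowest top-5 precision.
            tokens = min(trials, key=lambda doc: top5_precision(predict(doc), gold))
        attacked.add(target)
    return tokens
```

As written, the loop always adopts the best typo document even when every candidate raises the score; adding a check that only accepts a typo when it lowers the score would prevent the attack from improving the model's predictions.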

5 Results

To the best of our knowledge, [12] is the current state-of-the-art for the task of automated ICD-9 code assignment. We re-implemented their best performing models using the AllenNLP framework [5]. The test-set performance of the models for the task of predicting the top-50 most frequent ICD-9 codes from discharge summaries is given in table 1. We found that the Max Pool based CNN outperformed the Attention Pool based CNN on all performance metrics. Further, we found that the computation time for training as well as generating predictions was much lower for the former than for the latter. Therefore, we decided to limit our focus to developing an adversarial attack strategy for the Max Pool based CNN.

We experiment with three different values of the budget and two different strategies for selecting the token to attack: maximum gradient and random. The maximum gradient strategy can be used to analyze the robustness of the model to malicious attacks, while the random strategy can be used to simulate natural settings with adversarial examples. Each run on the entire corpus of discharge summaries took on the order of hours on a machine with a Tesla K80 GPU. The results are summarized in table 2.

In accordance with our intuition, the max grad strategy performs better than the random strategy. This is because the max grad strategy can produce meaningful perturbations in a large input space (each input document contains many tokens). The model’s performance doesn’t drop much with the random strategy. This suggests that the model is somewhat robust to naturally occurring noise such as typos and misspellings. However, this might change as the budget is increased. Due to computational limits, we did not explore larger budgets. A key result of our work is that, with only a small fraction of input tokens modified, the model’s performance drops significantly (table 2). This shows the potential vulnerability of this model to malicious attacks. Since only very few tokens are changed, it might be hard to defend against these attacks by training a discriminator to distinguish maliciously modified documents from regular ones.

Tables 3 and 4 show examples of discharge summaries before and after the attack, with their top5 labels. It is important to note that, on a few discharge summaries (the last example in both tables), the algorithm increases the top5 precision instead of decreasing it. One can modify the algorithm to ensure that this doesn’t happen, which would result in a further drop in precision. Due to time constraints, we were not able to accommodate this modification. Nevertheless, these examples show the brittleness of the baseline model to changes in input tokens.

Metric Max Pool CNN Label Attention Pool CNN
Macro F1 Score
Micro F1 Score
Macro AUC
Micro AUC
Top 5 Precision
Table 1: Performance of baseline models on MIMIC-III dataset for predicting the top 50 most frequent ICD-9 codes.
Top5 precision
Budget Max grad strategy Random strategy
Baseline ()
Table 2: Results of adversarial attacks on the corpus of discharge summaries of size .
Maximum gradient strategy, budget
Top5 precision Description

Example 1
Before: …unchanged as well. A tracheostomy tube and right subclavian line…
After: …unchanged as well. A tacheostomy ttube and right subclavian line…
Before: …performed on. During tracheostomy procedure, pneumothorax occured and chest tube…
After: …performed on. During tacheostomy proecedure, pneumothroax occurred and chest tube…
Top5 labels before attack - Insertion of Sengstaken tube, Pneumonia, Respiratory Ventilation, Venous catheterization, Arterial catheterization
Top5 labels after attack - Pneumonia, Unspecified pleural effusion, Insertion of Sengstaken tube, Anemia, Acute post-hemorrhagic anemia

Example 2
Before: …cholelithiasis complicated hospital course including sepsis w persistent
After: …cholelithiasis complicated hospital course including sespis w persistent
Before: …surgical or invasive procedure - ercp, laparoscopic cholecystectomy, laparoscopic liver biopsy..
After: …surgical or invasive preocedure - erccp, laproscopic, cholecysectomy, laparoscopic liver biopsy..
Before: …presentation to hospital1 intubated jaundiced scleral…
After: …presentation to hospital1 int8bated jaundiced scleral…
Top5 labels before attack - Unspecified acquired hypothyroidism, Insertion of endotracheal tube, Respiratory Ventilation, Enteral infusion of concentrated nutritional substances, Continuous invasive mechanical ventilation
Top5 labels after attack - Unspecified acquired hypothyroidism, Diagnostic ultrasound of heart, Old myocardial infarction, Major depressive disorder, Other and unspecified hyperlipidemia.

Example 3
Before: …higher on tube feeds appreciate nutrition recs tfs changed to…
After: …higher on ttube fees apprciate nutritin res tfts changed to..
Before: …for both chf and suspected aspiration pna w iv lasix…
After: …for both chf and suspected aspirtation pna w iv lasix…
Top5 labels before attack - Enteral infusion of concentrated nutritional substances, Venous catheterization, Food / vomit pneumonitis, Urinary tract infection, Acute respiratory failure.
Top5 labels after attack - Acute respiratory failure, Venous catheterization, Congestive heart failure, Insertion of endotracheal tube, Unspecified essential

Table 3: Examples of sentences under the maximum gradient strategy where the adversarial attack resulted in the maximum change in top5 labels. The first two examples cause the predictions to be worse and the last example shows a case where the adversarial example results in increased top5 precision. Labels in blue are part of the ground truth labels.
Maximum gradient strategy, budget
Top5 precision Description

Example 1
Before: …cabg, x4, hyperlipidemia, anxiety, hypertension, migraines, gi bleed…
After: cbg, x4, hyperlipiddemia, axiety, hypertnesion, migranes, gi, bleedd
Before: …medical history - coronary artery disease, hyperlipidemia, anxiety…
After: …medical history - conronary atery disease, hyperlipdiemia, anxiety…
Before: …room and underwent coronary artery bypass grafting x4 with left…
After: …room and underwent coronary bypas gratfting x4 with left…
Top5 labels before attack - Single internal mammary-coronary artery bypass, Extracorporeal circulation auxiliary to open heart surgery, Other and unspecified hyperlipidemia, Atherosclerotic heart disease of native coronary artery without angina pectoris, Unspecified essential hypertension
Top5 labels after attack - Extracorporeal circulation auxiliary to open heart surgery, Enteral infusion of concentrated nutritional substances, Transfusion of packed cells, Diagnostic ultrasound of heart, Atrial fibrillation

Example 2
Before: …to posterior descending artery bronchosccopy reintubated history of present…
After: …to posterior descending artery bronchosccopy reitnubated history of present…
Before: …the procedure was hemoptysis requiring intubation he was transferred back…
After: …the procedure was hemoptysis requiring ibntubation he was transferred back…
Before: …mitral regurgitation, hypertension, hypercholesterolemia, congestive heart failure, tobacco abuse…
After: motral regunrgitation, hypertension, hypercholesterolemia, congesitve heeart failre, taobacco abusee
Top5 labels before attack - Extracorporeal circulation auxiliary to open heart surgery, Single internal mammary-coronary artery bypass, Atherosclerotic heart disease of native coronary artery without angina pectoris, Mitral valve disorders, Congestive heart failure
Top5 labels after attack - Unspecified essential hypertension, enteral infusion of concentrated nutritional substances, Extracorporeal circulation auxiliary to open heart surgery, Respiratory Ventilation, Transfusion of packed cells.

Example 3
Before: …cancer s p resection bilateral renal masses per pcp name…
After: …cancer s p resecton bliateral reanl mases per pcp name..
Before: …morbid obesity, depression, restless leg syndrome…
After: mtorbid obestity, deprssion, resltess leg syndrome…
Top5 labels before attack - Congestive heart failure, Chronic obstructive pulmonary disease, Chronic kidney disease, Hypertensive chronic kidney disease, Non-invasive mechanical ventilation
Top5 labels after attack - Congestive heart failure, Chronic obstructive pulmonary disease, Unspecified essential hypertension, Diabetes mellitus without mention of complication, Urinary tract infection

Table 4: Examples of sentences under the maximum gradient strategy where the adversarial attack resulted in the maximum change in top5 labels. The first two examples cause the predictions to be worse and the last example shows a case where the adversarial example results in increased top5 precision. Labels in blue are part of the ground truth labels.

6 Discussion

This work is a first step towards exploring the robustness of NLP models used for automatic ICD-9 code classification. Clinical documents differ from regular documents in that they are typically generated in a fast-paced environment, with a higher than average rate of typos and non-standard acronyms. As a result, clinical NLP models are more susceptible to adversarial samples than a regular NLP model trained on a standard English dataset. A key extension of this work would be to consider a dictionary learnt from clinical documents and biomedical literature as a defense against these character-level perturbations. Although this might mitigate the decrease in performance, it wouldn’t completely solve the problem. A more rigorous way to deal with it would be to account for typos in the tokenization strategy. It is easy to push a word out of vocabulary when using the word-level vocabularies that accompany embeddings such as word2vec and GloVe. Other strategies that model words unseen in the training dataset, such as word-piece and byte-pair encoding, will also break when typos are introduced, because these models learn subwords from a standard dictionary. Therefore, any defense must account for these typos in the fundamental tokenization strategy. An interesting direction would be to learn a word similarity metric and map an unknown word to a close word in the vocabulary, given the input word and the context in which it appears. Building a robust tokenization strategy would be the first step towards an NLP model that is robust against character-level adversarial attacks.
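As a rough illustration of the vocabulary-mapping idea, the sketch below replaces each out-of-vocabulary token with its closest in-vocabulary word. It is a simplification of the direction described above: the similarity is the character-overlap ratio from Python's standard `difflib` module rather than a learned metric, the surrounding context is ignored, and the function name `normalize` and the cutoff value are assumptions.

```python
import difflib

def normalize(tokens, vocabulary, cutoff=0.75):
    """Map each out-of-vocabulary token to its closest vocabulary word, if any is close enough."""
    vocab_list = sorted(vocabulary)
    out = []
    for tok in tokens:
        if tok in vocabulary:
            out.append(tok)
            continue
        # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
        match = difflib.get_close_matches(tok, vocab_list, n=1, cutoff=cutoff)
        out.append(match[0] if match else tok)
    return out
```

A purely string-based mapping like this can confuse distinct clinical terms that happen to be spelled similarly, which is one reason a learned, context-aware similarity would be preferable.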


References

  1. S. Amin, G. Neumann, K. Dunfield, A. Vechkaeva, K. Chapman and M. Wixted (2019-09) MLT-DFKI at CLEF eHealth 2019: multi-label classification of ICD-10 codes with BERT.
  2. J. Ebrahimi, D. Lowd and D. Dou (2018) On adversarial examples for character-level neural machine translation. arXiv:1806.09030.
  3. S. Eger, G. G. Şahin, A. Rücklé, J. Lee, C. Schulz, M. Mesgar, K. Swarnkar, E. Simpson and I. Gurevych (2019) Text processing like humans do: visually attacking and shielding NLP systems. arXiv:1903.11508.
  4. (2018) Error rate drops, but Medicare still lost $31.6 billion to preventable billing errors in FY2018.
  5. M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. Liu, M. Peters, M. Schmitz and L. Zettlemoyer (2018-03) AllenNLP: a deep semantic natural language processing platform.
  6. (2019-10) International classification of diseases (ICD) information sheet. World Health Organization.
  7. A. E. Johnson, T. J. Pollard, L. Shen, H. L. Li-wei, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi and R. G. Mark (2016) MIMIC-III, a freely accessible critical care database. Scientific Data 3, pp. 160035.
  8. Y. Kim (2014-08) Convolutional neural networks for sentence classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.
  9. A. Kurakin, I. Goodfellow and S. Bengio (2016) Adversarial examples in the physical world. arXiv:1607.02533.
  10. A. Kurakin, I. Goodfellow and S. Bengio (2016) Adversarial machine learning at scale. arXiv:1611.01236.
  11. J. Li, S. Ji, T. Du, B. Li and T. Wang (2019) TextBugger: generating adversarial text against real-world applications. Proceedings 2019 Network and Distributed System Security Symposium. ISBN 189156255X.
  12. J. Mullenbach, S. Wiegreffe, J. Duke, J. Sun and J. Eisenstein (2018-06) Explainable prediction of medical codes from clinical text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, pp. 1101–1111.
  13. N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik and A. Swami (2016) Practical black-box attacks against machine learning. arXiv:1602.02697.
  14. (2014) Perspectives.
  15. A. Schmaltz and A. L. Beam (2020) Exemplar auditing for multi-label biomedical text classification. arXiv:2004.03093.
  16. J. Shull (2018-11) Digital health and the state of interoperable EHRs (preprint). JMIR Medical Informatics 7.
  17. L. Sun, K. Hashimoto, W. Yin, A. Asai, J. Li, P. Yu and C. Xiong (2020) Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT. arXiv:2003.04985.
  18. W. E. Zhang, Q. Z. Sheng, A. Alhazmi and C. Li (2019) Adversarial attacks on deep learning models in natural language processing: a survey. arXiv:1901.06796.
  19. Z. Zhao, D. Dua and S. Singh (2017) Generating natural adversarial examples. arXiv:1710.11342.