Identifying Mild Traumatic Brain Injury Patients From MR Images Using Bag of Visual Words
Mild traumatic brain injury (mTBI) is a growing public health problem with an estimated incidence of one million people annually in the US. Neurocognitive tests are used both to assess the patient's condition and to monitor the patient's progress. This work aims to use MR images taken shortly after injury to directly detect whether a patient suffers from mTBI, by incorporating machine learning and computer vision techniques to learn features suitable for discriminating between mTBI patients and normal subjects. We focus on 3 regions in the brain, extract multiple patches from them, and use the bag-of-visual-words (BoW) technique to represent each subject as a histogram of representative patterns derived from patches from all training subjects. After extracting the features, we use greedy forward feature selection to choose the subset of features that achieves the highest accuracy. We show through experimental studies that BoW features perform better than the simple mean value features used previously.
I Introduction
Mild traumatic brain injury (mTBI) is a growing public health problem, which can cause loss of consciousness and/or confusion and disorientation. In addition to civilian head trauma, we are now faced with ongoing U.S. military-related brain injury as well as greater numbers of sport-related head injuries. Patients with mTBI commonly experience symptoms such as headache, difficulty thinking, memory problems, attention deficits, mood swings and frustration. Up to 20-30% of patients with mTBI develop persistent symptoms months to years after the initial injury, referred to as post-concussive syndrome (PCS), resulting in substantial disability.
Currently, several different definitions of mTBI exist (World Health Organization, American Congress of Rehabilitation Medicine [ACRM], Centers for Disease Control and Prevention, Department of Defense, and Department of Veterans Affairs). There is universal agreement that a unified, objective definition is needed. Furthermore, most identification schemes rely on the Glasgow Coma Scale score, which was recently deemed insufficient for diagnosing traumatic brain injury by the National Institute for Neurological Disorders and Stroke. The institute proposed that neuroimaging have a larger role in the classification scheme for mTBI. Recent work using MRI revealed that there are areas of subtle brain injury after mTBI; however, no single imaging metric has thus far been shown to be sufficient as an independent biomarker.
While diffusion MRI has been extremely promising in the study of mTBI, identifying patients with recent mTBI remains a challenge. The literature is mixed with regard to localizing injury in these patients; however, gray matter such as the thalamus and white matter including the corpus callosum and frontal deep white matter have been repeatedly implicated as areas at high risk for injury. Lui et al. proposed a machine learning approach based on the mean values of different metrics from MR images. Vergara et al. proposed an approach based on features derived from resting state functional network connectivity (rsFNC) and diffusion magnetic resonance imaging, followed by a linear support vector machine. While these works also use a machine learning approach, the features they use may not be the best set for this task.
The purpose of this study is to develop a machine learning framework to classify mTBI patients and controls using features derived from multi-shell diffusion MRI in the thalamus, frontal white matter and corpus callosum. In the machine learning community, it is well known that using multiple features can improve classification performance compared with a single feature alone, and that the performance of a classification algorithm relies mainly on the usefulness of the feature set. We explore a new approach for feature extraction from MR images: instead of using the mean value of different metrics in various brain regions as features, as in prior work, we use computer vision techniques to learn a set of visual words from diffusion MR images of the brain, using the bag-of-visual-words (BoW) approach. We then use feature selection followed by a classification algorithm to identify mTBI patients. We show that greedy forward feature selection achieves higher accuracy than the single best feature. Through experimental study, we show that the BoW features result in much higher accuracy than the simple mean features. The preliminary results of this work were presented in Minaee et al. (ASFNR 2017).
The rest of this paper is structured as follows. Section II describes the proposed framework. Section III gives a brief overview of the bag-of-visual-words approach. Section IV presents the experimental studies and performance analysis. Finally, the paper is concluded in Section V.
II The Proposed Framework
There have been some previous works on mTBI classification using various sets of features, from demographic (such as age and gender) and neurocognitive features to imaging-related features. Demographic features alone would not be sufficient to classify a person as mTBI, so it is helpful to include all available features in the feature pool and use feature selection to pick the best subset. Demographic and neurocognitive features are easy to derive, but for imaging features it is not clear what the best way to derive them is. For demographic features, age and sex are used in this paper. For neurocognitive features, we used Stroop, Symbol Digit Modalities Test (SDMT), California Verbal Learning Test (CVLT) and Fatigue Severity Scale (FSS). In the past few years, we have developed specialized MR imaging protocols and related image features that are promising for distinguishing mTBI patients from controls. Some of these metrics are summarized in Table I.
| MRI Metric | Metric Description |
|---|---|
| AWF | Axonal Water Fraction |
| DA | Diffusivity within Axons |
| De_par | Diffusion parallel to the axonal tracts in the extra-axonal space |
One way to derive features from MR images is to calculate the mean value of (some of) the above metrics in different regions such as the thalamus, prefrontal white matter, corpus callosum (CC) Body, CC-Genu, and CC-Splenium (the regions we focus on in this work). However, the mean value may not be the best way to extract features from a specific region and metric. In this work, we propose a new approach for learning features from MR images, based on bag-of-visual-words. This approach is explained in Section III.
After extracting features, we use feature selection to reduce the dimensionality of the feature vector. We tried multiple greedy approaches for feature selection, and found that greedy forward feature selection performs best for this task. Greedy forward feature selection selects the best features one at a time. Assuming $S_k$ denotes the best subset of features of size $k$, the $(k+1)$-th feature is selected as the one which results in the highest cross-validation accuracy together with the features already chosen (in $S_k$). One can stop adding features either by setting a maximum size for the feature set, or when adding more features does not increase the accuracy.
The block diagram of the overall algorithm is shown in Figure 1. As we can see, the features from the images are concatenated with the selected demographic and neurocognitive features and used for classification. After selecting the feature subset, a classification algorithm is used to classify the samples into patients and controls. Different classifiers can be used for this purpose, such as support vector machine (SVM), logistic regression, random forest, and neural network. In our experimental studies, most of these classifiers resulted in similar accuracy, with SVM achieving slightly higher accuracy. Therefore, we performed most of our experimental studies using SVM.
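The greedy forward selection loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `greedy_forward_selection`, `score_fn` and `toy_score` are hypothetical names, and `score_fn` stands in for whatever routine returns the cross-validation accuracy of a candidate subset (in this paper's setting, an SVM evaluated on held-out samples).

```python
from typing import Callable, List, Sequence, Tuple


def greedy_forward_selection(
    n_features: int,
    score_fn: Callable[[Sequence[int]], float],
    max_size: int,
) -> Tuple[List[int], float]:
    """Grow a feature subset one index at a time.

    score_fn(subset) is a hypothetical callback that returns the
    cross-validation accuracy obtained with the given feature indices.
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_size:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        # Score every candidate together with the features already chosen.
        scored = [(score_fn(selected + [j]), j) for j in candidates]
        top_score, top_j = max(scored, key=lambda t: t[0])
        # Stop when the best candidate no longer improves the accuracy.
        if top_score <= best_score:
            break
        selected.append(top_j)
        best_score = top_score
    return selected, best_score


# Toy score (an assumption for the demo): features 2 and 5 are useful,
# and every extra feature costs a small penalty.
USEFUL = {2: 0.30, 5: 0.20}


def toy_score(subset: Sequence[int]) -> float:
    return 0.5 + sum(USEFUL.get(j, 0.0) for j in subset) - 0.01 * len(subset)


subset, accuracy = greedy_forward_selection(8, toy_score, max_size=4)
print(subset, accuracy)
```

Both stopping rules from the text appear here: `max_size` caps the subset size, and the loop exits early once no remaining feature raises the score.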
III Region-Specific Bag-of-Words Representation for MR Images
Bag of visual words is a popular approach in computer vision, which is used in various applications. The idea is inspired by the bag-of-words representation in text analysis, where a document is represented as a histogram of words from a dictionary, and these histograms are used to analyze the text documents. In the same way, one can represent an image (or video) as a histogram of visual words. Since there are no intrinsic words defined for images, we need to first create the visual words. A popular approach is to extract a large number of patches from training images (either around key-points, or over a regular grid), and then use a clustering algorithm, such as k-means or mean-shift, to group these patches into clusters and use the cluster centroids as the visual patterns. Instead of raw pixel values in patches, one can also extract an image descriptor from each patch and learn the words from those descriptors. To derive the BoW representation for a new image, the image is first divided into several patches, and then the histogram of those patches is computed over the visual words (learned from the training samples). This histogram is used as the feature representation of the image (or video). Figure 2 shows the schematic of the BoW algorithm for brain images.
In our case, 16x16 patches are extracted from brain slices through the areas of interest, and all the training patches from the mTBI patients and from the control subjects are clustered separately to learn the most representative visual patterns (called “visual words”). For our problem, we applied the BoW approach to 5 imaging metrics (AWF, DA, De_par, FA, MD) in two brain regions (thalamus, corpus callosum), and learned a set of 20 visual words for each metric and region, for the mTBI and control populations separately. K-means clustering is used to learn the visual words in each case. Then, for each subject, we extract patches from the two brain regions and, for each patch, find the closest word among all words in both dictionaries. Finally, we concatenate the BoW histograms for all metrics and both regions to derive the final visual representation (a 200-dimensional feature).
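As a rough illustration of this pipeline for a single metric and region, the sketch below extracts 16x16 patches, learns separate k-means dictionaries for the two groups, and builds a histogram over the combined dictionary. Several details are assumptions rather than facts from the paper: the grid stride, the plain Lloyd's k-means, the synthetic images, the histogram normalization, and the split of 10 words per group (the text does not state how the 20 words are divided between populations). All function names are hypothetical.

```python
import numpy as np


def extract_patches(image, size=16, stride=8):
    """Collect flattened size x size patches over a regular grid."""
    rows, cols = image.shape
    patches = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, rows - size + 1, stride)
        for c in range(0, cols - size + 1, stride)
    ]
    return np.asarray(patches, dtype=float)


def assign(points, words):
    """Index of the nearest word (Euclidean distance) for every patch."""
    d2 = ((points[:, None, :] - words[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k centroids (visual words)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def bow_histogram(image, words, size=16, stride=8):
    """Normalized histogram of nearest-word counts for one image."""
    labels = assign(extract_patches(image, size, stride), words)
    counts = np.bincount(labels, minlength=len(words)).astype(float)
    return counts / counts.sum()


# Synthetic stand-ins for one metric map in one region (not real data).
rng = np.random.default_rng(0)
patient_maps = [rng.normal(0.3, 0.05, (48, 48)) for _ in range(4)]
control_maps = [rng.normal(0.5, 0.05, (48, 48)) for _ in range(4)]

# Separate dictionaries per group, stacked into one 20-word dictionary.
patient_words = kmeans(np.vstack([extract_patches(m) for m in patient_maps]), k=10)
control_words = kmeans(np.vstack([extract_patches(m) for m in control_maps]), k=10)
dictionary = np.vstack([patient_words, control_words])

feature = bow_histogram(patient_maps[0], dictionary)
print(feature.shape)
```

Repeating this for each of the 5 metrics and 2 regions and concatenating the resulting 20-bin histograms would give a 200-dimensional vector, matching the dimensionality stated above.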
IV Experimental Results
We collected a set of 69 mTBI subjects between 18 and 64 years old, within 1 month of mTBI as defined by the American Congress of Rehabilitation Medicine (ACRM) criteria for head injury, and 40 healthy age- and sex-matched controls. Imaging was performed on a 3.0 Tesla Siemens Trio (Erlangen, Germany) magnet, including multi-shell diffusion MRI at 5 b-values (250, 1000, 1500, 2000, 2500 s/mm²) in a total of 136 directions using multiband 2 at isotropic 2.5 mm image resolution.
To evaluate model performance, we use an approach similar to 5-fold cross validation: each time, we hold out 20% of the samples for validation and use the rest for training. During forward feature selection, for each candidate feature, the SVM model is trained on the training samples and evaluated on the validation samples. To reduce sampling bias, we repeat this procedure 50 times and take the average accuracy as the cross-validation accuracy. For the SVM, we use a radial basis function (RBF) kernel. The hyper-parameters of the SVM model (kernel width gamma and misclassification penalty weight C) are tuned to achieve the highest cross-validation accuracy for each candidate feature set. All features are normalized to zero mean and unit variance before being fed to the SVM.
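The repeated 80/20 evaluation described above can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the normalization statistics are computed on the training split only (the text does not say which samples are used), and a nearest-mean classifier stands in for the RBF-kernel SVM so that the example depends only on NumPy. The data are synthetic; only the group sizes (69 and 40) come from the text.

```python
import numpy as np


def repeated_holdout_accuracy(X, y, fit_predict, n_repeats=50,
                              val_fraction=0.2, seed=0):
    """Average validation accuracy over repeated random splits.

    fit_predict(X_train, y_train, X_val) is a hypothetical callback that
    trains a classifier and returns predicted labels for X_val.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_val = max(1, int(round(val_fraction * n)))
    accuracies = []
    for _ in range(n_repeats):
        order = rng.permutation(n)
        val_idx, train_idx = order[:n_val], order[n_val:]
        # Assumption: zero-mean, unit-variance scaling uses training data only.
        mean = X[train_idx].mean(axis=0)
        std = X[train_idx].std(axis=0)
        std[std == 0] = 1.0  # guard against constant features
        X_train = (X[train_idx] - mean) / std
        X_val = (X[val_idx] - mean) / std
        predictions = fit_predict(X_train, y[train_idx], X_val)
        accuracies.append(np.mean(predictions == y[val_idx]))
    return float(np.mean(accuracies))


def nearest_mean_classifier(X_train, y_train, X_val):
    """Stand-in classifier for the demo (the text uses an RBF-kernel SVM)."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_val[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


# Synthetic, well-separated data with the cohort sizes from the text.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (69, 4)), rng.normal(6.0, 1.0, (40, 4))])
y = np.array([1] * 69 + [0] * 40)  # mTBI subjects are labeled positive
print(repeated_holdout_accuracy(X, y, nearest_mean_classifier))
```

Passing the classifier as a callback keeps the splitting and normalization logic separate from the model, so an SVM with tuned gamma and C could be substituted without changing the evaluation loop.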
For the baseline, we use a feature set that includes the mean values of the metrics in different regions (as in Lui et al.), along with demographic and neurocognitive features. The best single feature, AWF in CC-Body, achieves a cross-validation classification accuracy of 72%. The best feature subset chosen by greedy feature selection achieves an accuracy of 80% with 8 features (De_par in thalamus, De_par and DA in prefrontal white matter, FA in CC-Genu, AWF and De_per in CC-Body, Stroop and SDMT).
For the proposed approach, the raw representation is 206-dimensional, comprising 20 words for each of 5 MR metrics (AWF, DA, De_par, FA, MD) in 2 brain regions (thalamus and corpus callosum), plus 6 demographic and neurocognitive features. With the BoW features, the accuracy improves further to 88%, and the optimum subset contains 8 features (age; words from AWF, De_par and FA in the thalamus; and words from DA, De_par, FA and MD in the corpus callosum). We also evaluated the classification accuracy for feature subsets of different sizes. Figure 3 shows the classification accuracies achieved by the optimum feature subsets of size 2 to 8.
Besides classification accuracy, we also report the sensitivity and specificity, which are important in the study of medical data analysis. The sensitivity and specificity are defined as in Eq. (1), where TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative respectively. In our evaluation, we treat the mTBI subjects as positive.
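The equation referred to above does not appear in this text. The standard definitions of the two quantities, which we assume are the ones intended, can be written as:

```latex
\begin{equation}
\mathrm{Sensitivity} = \frac{TP}{TP + FN},
\qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}
\end{equation}
```

With mTBI subjects treated as positive, sensitivity is the fraction of mTBI subjects correctly identified, and specificity is the fraction of controls correctly identified.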
Figure 4 shows the classification accuracies, sensitivities and specificities for different ratios of training samples (i.e., keeping different percentages of the samples for training and the rest for testing).
The classification accuracies of the different approaches are summarized in Table II.
| Features used with SVM | Classification Accuracy |
|---|---|
| Single best feature | 72% |
| The selected subset with 8 features | 80% |
| BoW approach with 8 selected features | 88% |
As we can see, the BoW approach achieves a significant improvement over mean value features. One possible explanation is that BoW features are able to capture more discriminative patterns between the mTBI and control cohorts. We show example BoW histograms of two mTBI and two control subjects in Figure 5. The mTBI and control subjects show clear differences in the frequency of some visual words, for example along the right side of the histograms.
V Conclusion
Here we show the application of bag of visual words to diffusion MR images for classifying patients with mTBI versus controls. In this approach, a set of visual features is learned from 5 MR metrics in two brain regions and used along with two demographic features and four neurocognitive tests. Greedy forward feature selection and a support vector machine are then used to perform classification. We show that by learning visual features from MR images, we obtain a significant gain over the mean value features used previously. This shows the promise of feature learning for medical image classification. These visual features can also be used for long-term outcome prediction of mTBI patients.
Acknowledgment
We would like to thank Cameron Johnson for his help with parts of this project.
References
-  Faul M, Xu L, Wald MM, Coronado VG, “Traumatic Brain Injury in the United States: Emergency Department Visits, Hospitalizations and Deaths”, 2010.
-  Carroll LJ, Cassidy JD, Peloso PM, et al. “Prognosis for mild traumatic brain injury: results of the WHO Collaborating Centre Task Force on Mild Traumatic Brain Injury”, J Rehabil Med: 84–105, 2004.
-  Kay T, Harrington D, Adams R, et al. “Definition of mild traumatic brain injury”, J Head Trauma Rehabil; 8:86–87, 1993.
-  Marr A, Coronado V, editors. “Central Nervous System Injury Surveillance Data Submission Standards - 2002”, Atlanta: Centers for Disease Control and Prevention, National Center for Injury Prevention and Control; 2004.
-  Veterans Administration Department of Defense Clinical Practice Guideline for Management of Concussion/Mild Traumatic Brain Injury [online]. Available at: http://www.dcoe.mil/files/VA-DoD-Management-of-Concussion-mild-Traumatic-Brain-Injury.pdf, Accessed August 22, 2014.
-  YW Lui, Y Xue, D Kenul, Y Ge, RI Grossman, Y Wang, “Classification algorithms using multiple MRI features in mild traumatic brain injury”, Neurology 83.14: 1235-1240, 2014.
-  VM Vergara, AR Mayer, E Damaraju, KA Kiehl, V Calhoun, “Detection of mild traumatic brain injury by machine learning classification using resting state functional network connectivity and fractional anisotropy”, Journal of neurotrauma: 1045-1053, 2017.
-  J Yang, YG Jiang, AG Hauptmann, CW Ngo, “Evaluating bag-of-visual-words representations in scene classification”, Proceedings of the international workshop on Workshop on multimedia information retrieval, ACM, 2007.
-  S Minaee, Y Wang, S Chung, X Wang, E Fieremans, S Flanagan, J Rath, and YW Lui, “A Machine Learning Approach For Identifying Patients with Mild Traumatic Brain Injury Using Diffusion MRI Modeling”, The ASFNR 11th Annual Meeting, 2017.
-  Cohen, B. A., M. Inglese, H. Rusinek, J. S. Babb, R. I. Grossman, and O. Gonen. “Proton MR spectroscopy and MRI-volumetry in mild traumatic brain injury”, American Journal of Neuroradiology 28, no. 5: 907-913, 2007.
-  Miles, Laura, Robert I. Grossman, Glyn Johnson, James S. Babb, Leonard Diller, and Matilde Inglese. “Short-term DTI predictors of cognitive dysfunction in mild traumatic brain injury”, Brain injury 22, no. 2: 115-122, 2008.
-  Raz, E., Jensen, J.H., Ge, Y., Babb, J.S., Miles, L., Reaume, J., Grossman, R.I. and Inglese, M., “Brain iron quantification in mild traumatic brain injury: a magnetic field correlation study”, American journal of neuroradiology, 32(10), pp.1851-1856, 2011.
-  Tang, L., Ge, Y., Sodickson, D.K., Miles, L., Zhou, Y., Reaume, J. and Grossman, R.I., “Thalamic resting-state functional networks: disruption in patients with mild traumatic brain injury”, Radiology, 260(3), pp.831-840, 2011.
-  Zhou Y, Kierans A, Kenul D, Ge Y, Rath J, Reaume J, Grossman RI, Lui YW, “Longitudinal Regional Brain Volume Changes in Mild Traumatic Brain Injury Patients”, Radiology, in press 2013.
-  Y Yang, S Newsam, “Bag-of-visual-words and spatial extensions for land-use classification”, Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, ACM, 2010.
-  X Peng, L Wang, X Wang, Y Qiao, “Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice”, Computer Vision and Image Understanding 150: 109-125, 2016.
-  I Guyon, A Elisseeff, “An introduction to variable and feature selection”, Journal of machine learning research 3 Mar: 1157-1182, 2003.
-  Cortes, Corinna, and Vladimir Vapnik. “Support-vector networks”, Machine learning 20.3: 273-297, 1995.
-  Christopher Bishop, “Pattern recognition and machine learning”, springer, 2006.
-  A Liaw, M Wiener, “Classification and regression by random forest”, R news 2.3: 18-22, 2002.
-  I Goodfellow, Y Bengio, A Courville, “Deep learning”, MIT press, 2016.
-  G Lebanon, Y Mao, J Dillon, “The locally weighted bag of words framework for document representation”, Journal of Machine Learning Research: 2405-2441, 8 Oct 2007.
-  A Likas, N Vlassis, JJ Verbeek, “The global k-means clustering algorithm”, Pattern recognition 36.2: 451-461, 2003.
-  D Comaniciu, P Meer, “Mean shift: A robust approach toward feature space analysis”, IEEE Transactions on pattern analysis and machine intelligence, 603-619, 2002.
-  S Minaee, Y Wang, and YW. Lui. “Prediction of longterm outcome of neuropsychological tests of MTBI patients using imaging features”, Signal Processing in Medicine and Biology Symposium, IEEE, 2013.