Predicting Language Recovery after Stroke with Convolutional Networks on Stitched MRI

Yusuf H. Roohani
GlaxoSmithKline, Cambridge, MA

Noor Sajid
University College London

Pranava Madhyastha
Imperial College London

Cathy J. Price
University College London

Thomas M. H. Hope
University College London

One third of stroke survivors have language difficulties. Emerging evidence suggests that their likelihood of recovery depends mainly on the damage to language centers. Thus, previous research on predicting language recovery post-stroke has focused on identifying damaged regions of the brain. In this paper, we introduce a novel method that uses only stitched 2-dimensional cross-sections of raw MRI scans in a deep convolutional neural network setup to predict language recovery post-stroke. Our results show that: a) the proposed model, which uses only MRI scans, performs comparably to models that depend on lesion-specific information; b) the features learned by our model are complementary to the lesion-specific information, and the combination of both appears to outperform previously reported results in similar settings. We further analyse the CNN model using gradient-based saliency maps to understand which brain regions drive its predictions. Our findings are in line with previous lesion studies.




Machine Learning for Health (ML4H) Workshop at NeurIPS 2018.

1 Introduction

Stroke is one of the most common causes of disability. One third of stroke survivors leave the hospital with cognitive and language difficulties [1]. This is known as aphasia, or dysphasia in less severe cases. A patient's likelihood of recovering language capabilities after stroke is thought to depend mainly on the proportion of brain damage and the severity of the initial symptoms [2, 3].

Previous research has focused on the explicit use of brain structures derived from anatomically defined regions of the brain [4, 5, 6, 2, 7]. This, more often than not, requires specialist knowledge of the brain. The usual features for prediction tend to make use of the proportion of damaged regions of the brain [6, 4], commonly referred to as lesions, alongside demographic and behavioural features. However, outside the realm of predicting recovery post-stroke, Wilson et al. 2009 [8] have proposed the use of principal component analysis for a more direct feature extraction process from MRI scans to predict primary progressive aphasia variants.

This paper introduces a novel method that makes use of stitched 2-dimensional cross-sections of raw MRI scans in a deep convolutional neural network setup. The results indicate that our proposal predicts language outcome post-stroke with comparable performance to models that depend on expert-derived, lesion-specific information. We measure language ability using the Comprehensive Aphasia Test's (CAT) spoken picture description score [9]. This score is highly correlated with the prediction of language recovery [4] and assesses the patient's ability to verbally describe a picture in three words or more.

Table 1: R-squared scores

Features                  Hope et al. 2013   Ours
Baseline                  0.0*               0.0
Img. Rep.                 -                  0.53
Demographic               -                  0.13
Lesion*                   0.50*              0.56
Demographic + Img. Rep.   -                  0.60

Table 2: Pearson's R scores

Features                  Hope et al. 2018   Ours
Baseline                  -                  0.25
Img. Rep.                 -                  0.74
Demographic               -                  0.42
Lesion*                   0.73               0.75
Demographic + Img. Rep.   -                  0.78

Table 3: Comparison to previous work [4, 5]. *Note: previous approaches use different data splits.

2 Related Work

Recent work in the domain of predicting language recovery has focused on the relevance of imaging-based methods for better understanding language recovery in post-stroke patients. Price et al. 2010 [6] introduce a data-centric system that relies on structural MRI data, in combination with behavioural data from standardised assessments and demographic information, to better predict individual outcomes and recovery post-stroke. Saur et al. 2010 [10] demonstrate the usefulness of language functional MRI activations for predicting individual language outcomes six months post-stroke, framed as a binary classification problem using support vector machines. Their work highlighted the importance of imaging-based methods, since limiting the feature space to only age and current language deficit substantially reduced the accuracy.

Hope et al. 2013 [4] used PLORAS data to predict the severity of language impairment, at the individual level, months or years after stroke onset. Their work relied on lesion identification techniques [11] to convert the MRIs into anatomically defined regions of interest that were destroyed [12]. It also emphasized the importance of using more representative information about lesion location derived from MRIs: their R-squared results increased with the inclusion of finer-grained lesion-location data. Hope et al. 2018 [5] again showcased the value of MRI-derived lesion information for predicting language outcomes post-stroke. However, all of these studies rely on statistical and expert-driven methodologies for lesion extraction and language outcome prediction. In contrast, our proposed method operates directly on MRI scans. Zikic et al. 2014 [13] have shown the success of convolutional neural networks for brain tumor segmentation, using a combination of 2D and 3D MRI scans as inputs.

3 PLORAS with CNN using Image Stitching

We faced a few challenges in training a network directly on the MRI scans. Given the scarcity of data, it would be difficult to effectively train 3D convolutional operations because of the large number of free parameters they require, as in [14]. We instead chose to use axial cross-sections, capturing the maximum variation while minimizing the number of trainable parameters.

However, analyzing the 2-dimensional slices individually was also not an option, as the model would not be able to access vital contextual information across different slices. To overcome this, we proposed to stitch the slices of each scan together into a single large image (Figure 1). In this setup, the MRI scans followed a standard numbering system such that each voxel corresponded to the same location of the brain across different scans. Thus, we ensured that the individual image layers within each scan were in the same order and that each pixel in a stitched image anatomically matched every other image in the training set. We found this setup very helpful, as it prevented the neural network from training on meaningless variation in the dataset.
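The stitching step can be sketched in a few lines of NumPy. The grid layout and the toy volume shape below are our own illustrative choices, not values taken from the paper:

```python
import numpy as np

def stitch_axial_slices(volume: np.ndarray, grid: tuple) -> np.ndarray:
    """Tile the axial (z-axis) slices of a 3D volume into one 2D image.

    `volume` has shape (H, W, D); the D axial slices are laid out
    row-major on a `grid = (rows, cols)` canvas, with unused grid
    cells left as zeros. Because every scan is in the same standard
    space, pixel (i, j) of the stitched image corresponds to the same
    brain location across patients.
    """
    h, w, d = volume.shape
    rows, cols = grid
    assert rows * cols >= d, "grid too small for the number of slices"
    canvas = np.zeros((rows * h, cols * w), dtype=volume.dtype)
    for i in range(d):
        r, c = divmod(i, cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = volume[:, :, i]
    return canvas

# Example: a toy 4-slice volume stitched onto a 2x2 grid.
vol = np.arange(2 * 2 * 4).reshape(2, 2, 4)
img = stitch_axial_slices(vol, grid=(2, 2))
assert img.shape == (4, 4)
```

Because slice order and grid position are fixed, the network never has to learn invariance to slice placement, which is the "meaningless variation" avoided above.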

4 Experiments


Our dataset comprises stroke patients from the Predicting Language Outcome and Recovery After Stroke (PLORAS) database [1]. For each patient, we have demographic information, a high-resolution T1-weighted post-stroke MRI brain scan, and associated CAT behavioural test results [9]. The PLORAS database contains records of unique spoken picture description assessments with their associated MRI scans, some of which were initial assessments and the remainder follow-ups.

The training set contained both female and male patients. The spoken picture description outcomes followed a skewed distribution. These scores were assessed at any time post-stroke, from as early as one day after onset. The test set had a balanced split between the two classes.

The following demographic features are included in our model: a) years between stroke and scan; b) whether vision is affected; c) whether hearing is affected; d) gender; e) number of lesions; f) localisation of lesion (left or right); g) years of education; h) age at stroke; i) time since stroke; and j) handedness. Alongside these, we make use of expert-derived lesion information. The baseline model (see Table 3) includes only a), d), h) and j), as defined in Hope et al. 2013 [4].


We use the stitched images to train a convolutional neural network that classifies images as above or below a threshold on the spoken picture description score (in our experiments, the threshold was based on Hope et al. 2018 [5]). The input to the model was a resized stitched image. We trained the model using cross-validation and extracted the output of the final convolutional layer as the image feature representation. To visualize the learned representation, we project the data onto its first two principal components. Figure 1 (right) illustrates the ability of these features to distinguish between the classes within the validation set. In comparison with the demographic features, our model has learned highly discriminative features.
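A minimal sketch of such a classifier-plus-feature-extractor in PyTorch. The layer counts, filter sizes, 128x128 input, and 64-dimensional feature width are illustrative stand-ins, since the paper does not publish the exact architecture:

```python
import torch
import torch.nn as nn

class StitchedMRIClassifier(nn.Module):
    """Binary classifier over stitched 2D MRI images.

    All sizes here are hypothetical; the point is the pattern of a
    convolutional trunk whose last feature layer doubles as the
    image representation used downstream.
    """
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, 2)  # above / below threshold

    def forward(self, x):
        return self.head(self.features(x))

    @torch.no_grad()
    def embed(self, x):
        # Feature representation extracted for the regression stage.
        return self.features(x)

model = StitchedMRIClassifier()
batch = torch.randn(4, 1, 128, 128)   # 4 resized stitched images
logits = model(batch)
feats = model.embed(batch)
assert logits.shape == (4, 2) and feats.shape == (4, 64)
```

The extracted `feats` are what would be projected onto their first two principal components for a visualization like Figure 1 (right).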

The extracted feature vector for each image was used to regress against the spoken picture description score, along with the rest of the demographic features. For this purpose, we use a feed-forward neural network, optimised with adaptive moment estimation (Adam) under a mean squared error loss. We also trained a convolutional neural network regressor directly on the MRI scans, and found that its architecture had to be much more extensive to achieve comparable results, and that it did not allow for the inclusion of demographic features.
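The regression stage can be sketched as follows. The feature dimensions, hidden width, learning rate, and batch size are hypothetical, chosen only to make the example runnable:

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 64-d image features concatenated with the
# 10 demographic variables listed above.
IMG_DIM, DEMO_DIM, HIDDEN = 64, 10, 32

regressor = nn.Sequential(
    nn.Linear(IMG_DIM + DEMO_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 1),   # predicted spoken picture description score
)
opt = torch.optim.Adam(regressor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One toy training step on random data standing in for real features.
x = torch.cat([torch.randn(8, IMG_DIM), torch.randn(8, DEMO_DIM)], dim=1)
y = torch.rand(8, 1)
pred = regressor(x)
loss = loss_fn(pred, y)
opt.zero_grad()
loss.backward()
opt.step()
assert pred.shape == (8, 1)
```

Concatenating the two feature groups at the input, rather than training on images end-to-end, is what makes it straightforward to mix learned image features with tabular demographic features.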

Figure 1: Left: The 2-dimensional stitched MRI scans. Right: Visualization of the penultimate representation using PCA

Results and Observations

We evaluated our CNN classifier on the held-out test set. We then used the learned features to train the feed-forward neural network and summarize our results in Tables 1 and 2. We compare our results to the previous state-of-the-art results reported in [4] and [5]. We note that both prior studies used specialized, sophisticated lesion-based features to obtain their best results. In contrast, our models use only a) raw image features or b) demographic features combined with raw image features. Compared to previous approaches that use lesion information, our model obtains competitive performance using only image information. However, we note that Hope et al. 2013 and Hope et al. 2018 use different subsets of the dataset.


We inspected the flow of gradients within the CNN to identify the regions of patient MRI scans that were most salient to the output. We visualized these regions as gradient-based saliency maps averaged across the patients in each class (Fig. 2, left). We then stacked the slices back to recompose the original volume and visualized the sources of network activation within it (Fig. 2, right). We observe that the right prefrontal region lights up as particularly significant in predicting speech outcome. These results match similar studies predicting reading [15] and language [16] outcomes. We also observe that, since the network is trained on 2D horizontal slices, the regions of activation are dispersed more widely in the axial plane than in the other two; this was an expected constraint of the training methodology. We also performed a correlation distance analysis [17] between the learned representations and the manually extracted white and gray matter features, and found a high distance correlation of 0.7970. This indicates that the features learned by our proposed CNN-based method correlate strongly with the manually extracted gray and white matter features.
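A vanilla gradient-based saliency map of the kind described above can be computed like this; the tiny model below is a hypothetical stand-in for the trained classifier:

```python
import torch
import torch.nn as nn

# Stand-in binary classifier over (1, 32, 32) stitched images.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 32 * 32, 2),
)

def saliency_map(model, image, target_class):
    """Absolute input gradient of the target-class score.

    Each pixel's value is |d score / d pixel|: how strongly a small
    change at that location would move the class score.
    """
    model.eval()
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]
    score.backward()
    return image.grad.abs().squeeze()

img = torch.randn(1, 1, 32, 32)
smap = saliency_map(model, img, target_class=0)
assert smap.shape == (32, 32)
# Class-average maps (as in Fig. 2) are the mean of such maps over patients.
```

Because the saliency map lives in the stitched image's pixel space, it can be cut back into slices and restacked into a 3D volume exactly as described above.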

Figure 2: Left: Average gradient-based saliency maps for the 2D stitched MRI scans of all patients who scored below the threshold of 60 on the spoken picture description test. Right: Visualization of the same saliency maps in 3-dimensional cross-sections (from top: axial, sagittal, coronal). The color map is shared and normalized to peak activation.
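The distance correlation statistic of Székely et al. [17] used in the correlation distance analysis can be implemented directly in NumPy. This is a generic sample estimator, not code from the study:

```python
import numpy as np

def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance correlation between two feature matrices.

    x, y: (n, p) and (n, q) arrays over the same n samples. Returns a
    value in [0, 1]; 0 only under independence (in the population).
    """
    def centered_dist(a):
        # Pairwise Euclidean distances, then double-centering.
        d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(-1))
        return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()

    A, B = centered_dist(x), centered_dist(y)
    dcov2 = (A * B).mean()
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / dvar)) if dvar > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
assert abs(distance_correlation(x, x) - 1.0) < 1e-9  # self-correlation is 1
```

Unlike Pearson correlation, this statistic captures nonlinear dependence between the learned representations and the gray/white matter features, which is why it suits the comparison above.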

5 Discussion

We have proposed a novel method that feeds stitched 2-dimensional images into a convolutional neural network to predict language recovery post-stroke. This work provides a preliminary investigation into the utility of breaking 3-dimensional MRI scans into 2-dimensional images for the extraction of raw image features, which achieve comparable performance to models using more sophisticated information. Our models predict the possibility of recovery competitively, even with very simple CNN architectures. Our empirical results indicate that the model learns representations that are useful for predicting language recovery. Recent work addresses potential challenges in predicting functional outcome after stroke that may not be entirely captured by an MRI scan [18]. Our future work will focus on visualizing the layers and abstract representations using relevant techniques from computer vision [19, 20]. We would also like to obtain explanations of our predictions using black-box model interpretation techniques [21, 22, 23]. We are excited by the possibility that this technique might provide a new pathway towards a better understanding of language centers and stroke.


Acknowledgements

The PLORAS dataset collection was funded by Wellcome (203147/Z/16/Z and 205103/Z/16/Z) and the Stroke Association (TSA PDF 2017/02). We also thank the Alan Turing Institute (EPSRC grant EP/N510129), in particular for hosting the data study group (TU/B/000012).


References

  • [1] Mohamed L. Seghier, Elnas Patel, Susan Prejawa, Sue Ramsden, Andre Selmer, Louise Lim, Rachel Browne, Johanna Rae, Zula Haigh, Deborah Ezekiel, et al. The PLORAS database: a data repository for predicting language outcome and recovery after stroke. NeuroImage, 124:1208–1212, 2016.
  • [2] E. Plowman, B. Hentz, and C. Ellis Jr. Post-stroke aphasia prognosis: a review of patient-related and stroke-related factors. Journal of Evaluation in Clinical Practice, 3:689–694, 2012.
  • [3] Katrien Segaert, Laura Menenti, Kirsten Weber, Magnus Karl Petersson, and Peter Hagoort. Shared syntax in language production and language comprehension: an fMRI study. Cerebral Cortex, 22:1662–1670, 2012.
  • [4] Thomas M.H. Hope, Mohamed L Seghier, Alex P. Leff, and Cathy J. Price. Predicting outcome and recovery after stroke with lesions extracted from mri images. NeuroImage: Clinical, 2:424–433, 2013.
  • [5] Thomas M.H. Hope, Alex P. Leff, and Cathy J. Price. Predicting language outcomes after stroke: Is structural disconnection a useful predictor? NeuroImage: Clinical, 19:22–29, 2018.
  • [6] Cathy J. Price, Alex P. Leff, and Mohamed L. Seghier. Predicting language outcome and recovery after stroke: the PLORAS system. Nature Reviews Neurology, 6:202–210, 2010.
  • [7] B Crosson, K McGregor, KS Gopinath, TW Conway, M Benjamin, YL Chang, AB Moore, AM Raymer, RW Briggs, MG Sherod, CE Wierenga, and KD White. Functional mri of language in aphasia: a review of the literature and the methodological challenges. Neuropsychol Review, (17):157–177, 2007.
  • [8] Stephen M. Wilson, Jennifer M. Ogar, Victor Laluz, Matthew Growdon, Jung Jang, Shenly Glenn, Bruce L. Miller, Michael W. Weiner, and Maria Luisa Gorno-Tempini. Automated mri-based classification of primary progressive aphasia variants. Neuroimage, 47:1558–1567, 2009.
  • [9] K. Swinburn, G. Porter, and D. Howard. Comprehensive Aphasia Test. Psychology Press, 2004.
  • [10] D. Saur, O. Ronneberger, D. Kümmerer, I. Mader, C. Weiller, and S. Klöppel. Early functional magnetic resonance imaging activations predict language outcome after stroke. Brain, 133:1252–1264, 2010.
  • [11] Mohamed L. Seghier, A. Ramlackhansingh, J.T. Crinion, A.P. Leff, and C.J. Price. Lesion identification using unified segmentation-normalisation models and fuzzy clustering. NeuroImage, 41:1253–1266, 2008.
  • [12] S. Eickhoff, K. Stephan, H. Mohlberg, C. Grefkes, G.R. Fink, K. Amunts, et al. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage, 25:1325–1335, 2005.
  • [13] Darko Zikic, Yani Ioannou, Matthew Brown, and Antonio Criminisi. Segmentation of brain tumor tissues with convolutional neural networks. In MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), 2014.
  • [14] Classification of brain MRI with big data and deep 3D convolutional neural networks, 2018.
  • [15] Fumiko Hoeft, Bruce D. McCandliss, Jessica M. Black, Alexander Gantman, Nahal Zakerani, Charles Hulme, Heikki Lyytinen, Susan Whitfield-Gabrieli, Gary H. Glover, Allan L. Reiss, and John D. E. Gabrieli. Neural systems predicting long-term outcome in dyslexia. Proceedings of the National Academy of Sciences, 108(1):361–366, 2011.
  • [16] Thomas M. H. Hope, Alex P Leff, Susan Prejawa, Rachel Bruce, Zula Haigh, Louise Lim, Sue Ramsden, Marion Oberhuber, Philipp Ludersdorfer, Jenny Crinion, Mohamed L. Seghier, and Cathy J. Price. Right hemisphere structural adaptation and changing language skills years after left hemisphere stroke. Brain, 140:1718–1728, 2017.
  • [17] Gábor J Székely, Maria L Rizzo, Nail K Bakirov, et al. Measuring and testing dependence by correlation of distances. The annals of statistics, 35(6):2769–2794, 2007.
  • [18] Cathy J Price, Thomas M Hope, and Mohamed L Seghier. Ten problems and solutions when predicting individual outcome from lesion site after stroke. Neuroimage, 145:200–208, 2017.
  • [19] Bolei Zhou, David Bau, Aude Oliva, and Antonio Torralba. Interpreting visual representations of neural networks via network dissection. Journal of Vision, 18(10):1244–1244, 2018.
  • [20] Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. In Advances in Neural Information Processing Systems, pages 6076–6085, 2017.
  • [21] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. arXiv preprint arXiv:1703.04730, 2017.
  • [22] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra, et al. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV, pages 618–626, 2017.
  • [23] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144. ACM, 2016.