Harnessing spatial MRI normalization: patch individual filter layers for CNNs

Fabian Eitel
Humboldt Universität Berlin
Berlin, 10117
Jan Philipp Albrecht
Freie Universität Berlin
Berlin, 14195
Friedemann Paul
Charité - Universitätsmedizin Berlin
Berlin, 10117
Kerstin Ritter
Charité - Universitätsmedizin Berlin
Berlin, 10117

Neuroimaging studies based on magnetic resonance imaging (MRI) typically employ rigorous forms of preprocessing. Images are spatially normalized to a standard template using linear and non-linear transformations. Thus, one can assume that a patch at a given location contains the same brain region across the entire data set. Most analyses that apply convolutional neural networks (CNNs) to brain MRI ignore this distinction from natural images. Here, we suggest a new layer type called patch individual filter (PIF) layer, which trains higher-level filters locally, as we assume that more abstract features are locally specific after spatial normalization. We evaluate PIF layers on three different tasks, namely sex classification as well as detection of either Alzheimer’s disease (AD) or multiple sclerosis (MS). We demonstrate that CNNs using PIF layers outperform their counterparts in several settings, especially those with low sample sizes.

1 Introduction

CNNs have been successfully applied to neuroimaging data Jo et al. (2019); Vieira et al. (2017). However, several challenges have been discussed: First, sample sizes are low and public data sets typically contain no more than 1,000 patients with a specific disease. Second, MR sequences are three-dimensional and can contain up to 1 million non-zero voxels, making the number of features much greater than the number of samples. Lastly, many features of the brain and of neurological or psychiatric diseases are not fully understood. Even though there are guidelines for the neurological assessment of diseases, these change over time (e.g. the McDonald criteria Polman et al. (2011); Thompson et al. (2018)) and only represent our current understanding of a disease.

Previous approaches using CNNs in neuroimaging tend to convert architectures which are successful on natural images to 3D MRI classifiers Guan et al. (2019) by replacing 2D with 3D operations. In those, the special features of MRI data are typically ignored. Here, we specifically make use of the spatial homogeneity of brain MRI data. Through linear and non-linear transformations, MR images are normalized to a shared template within the MNI space such as the ICBM 152 atlas Ashburner (2000); Avants et al. (2008); Fonov et al. (2011). This ensures that a voxel at a given location contains more or less the same brain region in every image and allows researchers to investigate a specific region (e.g. the hippocampus) across subjects. We propose to address the spatial homogeneity by training within patches of the image and evaluate this approach on three different tasks: sex classification, AD detection and MS detection. Unlike patch-based approaches (see Section 2, Related Work), we only intend to train patch-wise in higher layers. This is motivated by the idea of abstraction: whereas lower-level features such as edge detectors might be globally relevant, some higher-level features might be more locally relevant. Because higher-level filters in the PIF setting train on patches which contain less information and noise than the entire image, we show here that they tend to require fewer iterations over the training set as well as fewer samples to converge compared to vanilla CNNs.

2 Related Work

PIF layers are different from patch-based training. In patch-based training Kamnitsas et al. (2016); Ghafoorian et al. (2017); Yoo et al. (2018), multiple patches are sampled from the dataset and fed into the same classifier regardless of the position of each patch. Therefore, the classifier’s filters share weights between different patches. Conversely, within PIF layers, weights are only shared within a spatially restricted patch. PatchGANs Li and Wand (2016); Isola et al. (2017) use Markovian patches as input for a discriminator network in order to focus penalization on high-frequency structure.

PIF layers are a generalization of local convolutions as implemented in Lasagne (https://lasagne.readthedocs.io/en/latest/modules/layers/local.html) and Keras (https://keras.io/layers/local/). Local convolutions are similar to regular convolutions but do not share weights. Local convolutions are a special case of PIF layers where $s + 2p = k$, where $s$ is the patch size, $p$ is the padding size and $k$ is the kernel size. Thus, the convolution kernel does not slide over the selected patch (because they are congruent).
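The relationship between patch size, padding size and kernel size can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes stride 1 and symmetric padding (the same number of padded voxels on each side), neither of which is stated in the text.

```python
def kernel_positions(patch_size: int, padding: int, kernel_size: int) -> int:
    """Number of stride-1 kernel positions along one axis of a padded patch.

    Assumes symmetric padding and stride 1 (illustrative assumptions).
    """
    padded = patch_size + 2 * padding
    if kernel_size > padded:
        raise ValueError("kernel is larger than the padded patch")
    return padded - kernel_size + 1


def weight_count(n_patches: int, kernel_size: int, c_in: int, c_out: int,
                 dims: int = 3) -> int:
    """Weights (biases excluded) when every patch owns its own set of filters."""
    return n_patches * c_out * c_in * kernel_size ** dims


# Kernel congruent with the padded patch: a single position, so no weight
# is reused inside the patch (the local-convolution case).
print(kernel_positions(patch_size=3, padding=0, kernel_size=3))

# Larger patch: the kernel slides, so weights are shared within the patch.
print(kernel_positions(patch_size=8, padding=1, kernel_size=3))

# One shared filter bank versus 16 patch-specific filter banks (2D, 5 -> 5 maps).
print(weight_count(1, 3, 5, 5, dims=2), weight_count(16, 3, 5, 5, dims=2))
```

With more than one kernel position per patch, a PIF layer sits between a regular convolution (one patch covering the whole input) and a local convolution (one position per patch).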

3 Methods

The heterogeneity of natural images depicting the same object requires filters to be convolved with the entire image. In a cat detection model, for example, we would expect that some higher-level filters detect cat ears. In this case, it is necessary to convolve those cat ear filters with the entire image, because cat ears might be located anywhere in the image. However, when all images are spatially standardized, e.g. objects are shown at the same angle, viewpoint and distance and all major facial features of the cats are at the same location in each image, it would suffice if the cat ear filter searched only a small subspace of the image. This drastic form of spatial normalization is unlikely to be achievable in natural images, but in neuroimaging it is the de facto standard and a major requirement for mass-univariate and multivariate pattern analysis (MVPA).

Figure 1: Depiction of a patch individual filter (PIF) layer in 2D. In this setting, the inputs are 5 feature maps from a previous layer. Each feature map is split into 16 patches and convolutions are applied patch-wise. Finally, the feature maps are reassembled in the same order.

For the analysis of spatially normalized MRI data, we suggest a new CNN architecture relying on PIF layers. PIF layers consist of 3 stages: (i) split, (ii) process and (iii) reassemble. Each output feature map of the previous layer is first split (i) into patches of a fixed size. Next, the patches at a given row and column position of all feature maps are processed (ii) together with a series of local convolutions of a fixed kernel size. This is repeated for all patch positions. When the patch (including padding) is larger than the kernel, weights are shared within each patch but not across patches. Lastly, all patches are reassembled (iii) in the same order as they were split. Figure 1 shows an overview of the layer design. The final model consists of 4 convolutional blocks (Conv-BatchNorm-ReLU) followed by a PIF layer with a single convolutional block between the split and reassemble phases.
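The three stages can be sketched in plain Python. This is a minimal 2D illustration under stated assumptions, not the authors' implementation: the function names, the square grid of `n` x `n` patches, zero padding at patch borders, stride 1 and a single output map per patch are all choices made for the example.

```python
from typing import List

Map2D = List[List[float]]


def split(fmap: Map2D, n: int) -> List[List[Map2D]]:
    """Split one H x W feature map into an n x n grid of equally sized patches."""
    h, w = len(fmap) // n, len(fmap[0]) // n
    return [[[row[c * w:(c + 1) * w] for row in fmap[r * h:(r + 1) * h]]
             for c in range(n)] for r in range(n)]


def conv_same(channels: List[Map2D], kernel: List[Map2D]) -> Map2D:
    """Zero-padded, stride-1 cross-correlation summed over all channels of a patch."""
    h, w, k = len(channels[0]), len(channels[0][0]), len(kernel[0])
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ch, ker in zip(channels, kernel):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < h and 0 <= xx < w:  # outside the patch counts as zero
                            out[y][x] += ch[yy][xx] * ker[dy][dx]
    return out


def reassemble(grid: List[List[Map2D]]) -> Map2D:
    """Stitch an n x n grid of patches back together in the original order."""
    return [sum((patch[y] for patch in grid_row), [])
            for grid_row in grid for y in range(len(grid_row[0]))]


def pif_layer(fmaps: List[Map2D], kernels: List[List[List[Map2D]]], n: int) -> Map2D:
    """kernels[r][c] is the (channels x k x k) filter owned by patch (r, c)."""
    grids = [split(f, n) for f in fmaps]  # (i) split every feature map
    out_grid = [
        [conv_same([g[r][c] for g in grids], kernels[r][c])  # (ii) process
         for c in range(n)]
        for r in range(n)
    ]
    return reassemble(out_grid)  # (iii) reassemble


def box_kernel() -> List[Map2D]:
    return [[[1.0] * 3 for _ in range(3)]]  # 1 channel, 3 x 3, all ones


ones = [[1.0] * 4 for _ in range(4)]
kernels = [[box_kernel() for _ in range(2)] for _ in range(2)]
out = pif_layer([ones], kernels, n=2)
# Every entry of `out` is 4.0: each kernel only sees its own 2 x 2 patch.
print(out)
```

A regular zero-padded 3 x 3 convolution over the same 4 x 4 input would give 9.0 in the interior, which shows that in this sketch no information crosses patch borders inside the layer.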

4 Experiments

We evaluated PIF layers on three different datasets/tasks: 1) sex classification on a subset of the UK Biobank (F=1005, M=849), 2) AD detection on data of the Alzheimer’s Disease Neuroimaging Initiative (ADNI; AD=475, HC=494) and 3) MS detection on a small private data set (MS=76, HC=71). We evaluate the performance in terms of balanced accuracy, iterations until early stopping and performance on a smaller subset. The subset contains 20% of randomly drawn samples from the full data set. As the number of samples for the MS data set is already small, we did not use a subset here. We compare results to a simple 5-layer CNN architecture that has shown good results on the ADNI data set and is adjusted slightly for each task. For UK Biobank and ADNI the number of parameters is smaller in the PIF architecture, whereas in MS it is larger. After finding suitable hyperparameters (learning rate, weight decay, number of filters, dropout) for each task, all experiments were repeated 10 times and averages over all repetitions are reported. Data for baseline model training was augmented using horizontal flips and translation, whereas for PIF model training only horizontal flips were used to avoid misaligned images.
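For readers unfamiliar with the metric, the sketch below computes balanced accuracy using its usual definition, the mean of the per-class recalls. The function name and the toy labels are made up for illustration and are not taken from the experiments.

```python
from collections import defaultdict
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall, so each class counts equally regardless of size."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy labels (hypothetical): 6 samples of class 0 and 2 samples of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.75, whereas plain accuracy is 0.875
```

Because each class contributes equally, the metric is not inflated by class imbalance such as the unequal group sizes in the data sets above.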

5 Results

Table 1 shows the balanced accuracy and the iteration at which training finished using early stopping. On the sex classification task (UK Biobank), the PIF model performs almost identically to the baseline on the full data set, with a balanced accuracy of almost 90%. On the small subset of only 20% of the original data set, the balanced accuracy increased from 64.47% for the baseline to 78.11% for the PIF model. On the full data set, the required iterations until early stopping almost halve (59.2 to 33.6), whereas on the subset they increase from 40.7 to 69.5 iterations. This is the only case where the PIF architecture requires a higher number of iterations, which might be due to the task-specific PIF model having a higher number of parameters than on the other data sets. On the AD classification task (ADNI), the baseline and PIF models perform similarly with a balanced accuracy of around 84.5%, but the PIF model requires fewer iterations (22.3 compared to 31.8). On the ADNI subset, the baseline outperformed the PIF architecture with a balanced accuracy of 81.09% compared to 76.65%, but early stopping with the PIF model occurred on average in iteration 71.9 compared to iteration 106.4 in the baseline model. Lastly, for MS detection, the balanced accuracy increased from 75.04% to 80.92% when using the PIF architecture and the required number of iterations decreased on average from 83.7 to 53.5.

| Data | Model | Bal. acc. (large data set) | Early stopping iter. (large data set) | Bal. acc. (small data set) | Early stopping iter. (small data set) |
|---|---|---|---|---|---|
| UK Biobank | Baseline-A | 89.52% | 59.2 | 64.47% | 40.7 |
| UK Biobank | PIF | 89.06% | 33.6 | 78.11% | 69.5 |
| ADNI | Baseline-B | 84.62% | 31.8 | 81.09% | 106.4 |
| ADNI | PIF | 84.43% | 22.3 | 76.65% | 71.9 |
| MS | Baseline-C | - | - | 75.04% | 83.7 |
| MS | PIF | - | - | 80.92% | 53.5 |

Table 1: Results
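The comparisons discussed in the text can be recomputed directly from Table 1. The sketch below copies the table values and derives, for each setting, the difference in balanced accuracy (PIF minus baseline, in percentage points) and the ratio of early-stopping iterations (PIF divided by baseline). The dictionary layout and function name are illustrative choices.

```python
# Values copied from Table 1: (balanced accuracy in %, early-stopping iteration).
RESULTS = {
    ("UK Biobank", "large"): {"baseline": (89.52, 59.2), "pif": (89.06, 33.6)},
    ("UK Biobank", "small"): {"baseline": (64.47, 40.7), "pif": (78.11, 69.5)},
    ("ADNI", "large"): {"baseline": (84.62, 31.8), "pif": (84.43, 22.3)},
    ("ADNI", "small"): {"baseline": (81.09, 106.4), "pif": (76.65, 71.9)},
    ("MS", "small"): {"baseline": (75.04, 83.7), "pif": (80.92, 53.5)},
}


def compare(results):
    """Return {setting: (accuracy difference in pp, iteration ratio)}."""
    summary = {}
    for setting, row in results.items():
        base_acc, base_iter = row["baseline"]
        pif_acc, pif_iter = row["pif"]
        summary[setting] = (pif_acc - base_acc, pif_iter / base_iter)
    return summary


for (data, size), (acc_diff, iter_ratio) in compare(RESULTS).items():
    print(f"{data:<10} {size:<5} acc diff {acc_diff:+6.2f} pp, "
          f"iteration ratio {iter_ratio:.2f}")
```

An iteration ratio below 1 means the PIF model stopped earlier than the baseline; only the small UK Biobank setting has a ratio above 1.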

6 Discussion

In this work, we have introduced a new CNN architecture relying on PIF layers to harness the established techniques of spatial normalization in neuroimaging. In multiple experiments, we have shown that PIF layers can outperform simple CNNs, especially in low sample scenarios. Further experiments are required to investigate whether the success of PIF layers is task-specific, such as the learning of regional differences in MS lesions in comparison to global atrophy in AD.


References

  • J. Ashburner (2000) Computational neuroanatomy. Ph.D. Thesis, University College London.
  • B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee (2008) Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis 12 (1), pp. 26–41. Special Issue on the Third International Workshop on Biomedical Image Registration (WBIR 2006).
  • V. Fonov, A. C. Evans, K. Botteron, C. R. Almli, R. C. McKinstry, and D. L. Collins (2011) Unbiased average age-appropriate atlases for pediatric studies. NeuroImage 54 (1), pp. 313–327.
  • M. Ghafoorian, N. Karssemeijer, T. Heskes, M. Bergkamp, J. Wissink, J. Obels, K. Keizer, F. de Leeuw, B. van Ginneken, E. Marchiori, and B. Platel (2017) Deep multi-scale location-aware 3D convolutional neural networks for automated detection of lacunes of presumed vascular origin. NeuroImage: Clinical 14, pp. 391–399.
  • Z. Guan, R. Kumar, Y. R. Fung, Y. Wu, and M. Fiterau (2019) A comprehensive study of Alzheimer’s disease classification using convolutional neural networks. CoRR abs/1904.07950.
  • P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134.
  • T. Jo, K. Nho, and A. J. Saykin (2019) Deep learning in Alzheimer’s disease: diagnostic classification and prognostic prediction using neuroimaging data. Frontiers in Aging Neuroscience 11, pp. 220.
  • K. Kamnitsas, E. Ferrante, S. Parisot, C. Ledig, A. V. Nori, A. Criminisi, D. Rueckert, and B. Glocker (2016) DeepMedic for brain tumor segmentation. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, A. Crimi, B. Menze, O. Maier, M. Reyes, S. Winzeck, and H. Handels (Eds.), Cham, pp. 138–149.
  • C. Li and M. Wand (2016) Precomputed real-time texture synthesis with Markovian generative adversarial networks. In European Conference on Computer Vision, pp. 702–716.
  • C. H. Polman, S. C. Reingold, B. Banwell, M. Clanet, J. A. Cohen, M. Filippi, K. Fujihara, E. Havrdova, M. Hutchinson, L. Kappos, et al. (2011) Diagnostic criteria for multiple sclerosis: 2010 revisions to the McDonald criteria. Annals of Neurology 69 (2), pp. 292–302.
  • A. J. Thompson, B. L. Banwell, F. Barkhof, W. M. Carroll, T. Coetzee, G. Comi, J. Correale, F. Fazekas, M. Filippi, M. S. Freedman, et al. (2018) Diagnosis of multiple sclerosis: 2017 revisions of the McDonald criteria. The Lancet Neurology 17 (2), pp. 162–173.
  • S. Vieira, W. H. L. Pinaya, and A. Mechelli (2017) Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications. Neuroscience & Biobehavioral Reviews 74, pp. 58–75.
  • Y. Yoo, L. Y. W. Tang, T. Brosch, D. K. B. Li, S. Kolind, I. Vavasour, A. Rauscher, A. L. MacKay, A. Traboulsee, and R. C. Tam (2018) Deep learning of joint myelin and T1w MRI features in normal-appearing brain tissue to distinguish between multiple sclerosis patients and healthy controls. NeuroImage: Clinical 17, pp. 169–178.