Feature extraction with regularized siamese networks for outlier detection: application to lesion screening in medical imaging

Z. ALAVERDYAN, C. LARTIZIEN
Univ Lyon, INSA‐Lyon, Université Claude Bernard Lyon 1, UJM-Saint Etienne, CNRS, Inserm, CREATIS UMR 5220, U1206, F‐69621, LYON, France
June 6, 2017

Computer aided diagnosis (CAD) systems are designed to assist clinicians in various tasks, including highlighting abnormal regions in a medical image. A common approach consists in training a voxel-level binary classifier on a set of feature vectors extracted from normal and pathological areas in patients’ scans. However, many pathologies (such as epilepsy) are characterized by lesions that may be located anywhere in the brain and vary in shape, size and texture. An adequate representation of such heterogeneity requires a significant amount of annotated data, which is difficult to obtain in the medical domain. We therefore build on a previously proposed approach that treats epilepsy lesion detection as a voxel-level outlier detection problem. It consists in building a one-class SVM (oc-SVM) classifier for each voxel in the brain volume using a small number of clinically-guided features [EAHJ16].
Our goal in this study is to take a step forward by replacing the handcrafted features with representations learnt automatically by neural networks. We propose a novel version of siamese networks trained on patches extracted from healthy subjects’ scans only. This network, composed of stacked autoencoders as subnetworks, is regularized by the reconstruction error of the patches. It is designed to learn representations that bring patches centered at the same voxel location ’closer’ with respect to the chosen metric (here, cosine similarity). Finally, the middle layer representations of the subnetworks are fed to oc-SVM classifiers at voxel level. The method is validated on the MRI scans of 3 patients with confirmed epilepsy lesions and shows promising performance.

Keywords: regularized siamese network, stacked autoencoders, outlier detection, computer aided diagnosis, medical imaging

1 Introduction

Computer aided diagnosis (CAD) systems have been introduced to assist clinicians in various tasks such as tumor segmentation and the detection of abnormal regions in a medical image. Recent CAD systems exploit various neuroimaging modalities, such as magnetic resonance imaging (MRI) and positron emission tomography (PET). A frequently applied approach in existing CAD systems [HKS14, ZAT16] consists in extracting voxel-level descriptors and feeding them to a classification algorithm that learns to separate suspicious voxels from healthy ones. However, such approaches are hard to exploit when the number of pathological cases in the training set is not sufficient to account for the complexity of the task. In particular, epilepsy lesions vary widely in shape, size and texture, and it is not trivial to obtain a well-annotated dataset that represents such variability. Therefore, in [EAHJ16] we proposed a different approach, which treats epilepsy lesion detection in brain magnetic resonance (MR) images as an outlier detection problem. For each voxel in the brain volume, six clinically-guided features were extracted and fed into a one-class SVM (oc-SVM) classifier. These oc-SVM models (one per voxel) were trained solely on voxels belonging to healthy brain scans and allowed epilepsy lesions to be detected as outliers when tested on MR scans carrying pathologies.
This work extends the previous approach by replacing the handcrafted features with automatically extracted representations learned with a siamese network. The network is composed of stacked denoising autoencoders and is trained on the patches of healthy brain volumes only, by utilizing a novel loss function adapted to the given context. We expect such a network to be efficient in producing useful representations in our outlier detection context.

2 Method

In this work we attempt to learn patch-level representations in an unsupervised manner and use them as feature vectors for oc-SVM classifiers at voxel level. The representations are learnt on a set of patches extracted from MRI scans of healthy subjects only, hence pathological examples are not exploited. Our approach could be beneficial in settings where it is not feasible to sample an adequate number of pathological cases.
We aim at finding a mapping from the original patches to a representation space where patches centered at the same voxel but belonging to different subjects (such patches will be referred to as ”similar”) are ”close”. Considering each set of similar patches as representatives of a certain class, we would have as many classes as there are voxels in a typical brain volume (about 4 million), while having a far smaller number of examples per class. Siamese networks have been successfully applied to learning such a mapping when the number of classes is much larger than the number of samples in each of them [BBB93, CHL05].
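As an illustration of how similar pairs could be assembled from co-registered volumes, the following Python sketch cuts a 2D patch around the same voxel in every subject and pairs the patches across subjects. The function name `similar_pairs`, the choice of slicing plane, the toy random volumes and the enumeration of all subject pairs are assumptions made for the example, not details taken from this work.

```python
import itertools
import numpy as np

def similar_pairs(volumes, center, half=4):
    """Pair 2D patches centered at the same voxel in different subjects.

    volumes: array of shape (n_subjects, X, Y, Z), assumed co-registered.
    center:  (x, y, z) voxel index; the patch lies in slice z (assumption).
    half:    patch half-width, so half=4 gives 9 x 9 patches.
    """
    x, y, z = center
    patches = volumes[:, x - half:x + half + 1, y - half:y + half + 1, z]
    # Every unordered pair of subjects yields one "similar" pair.
    return [(patches[i], patches[j])
            for i, j in itertools.combinations(range(len(patches)), 2)]

rng = np.random.default_rng(0)
volumes = rng.random((4, 20, 20, 5))          # 4 toy "subjects"
pairs = similar_pairs(volumes, center=(10, 10, 2))
```

With 4 subjects, each voxel location yields 6 similar pairs; no dissimilar pairs are ever formed.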

Figure 1: Siamese neural network with stacked autoencoders with tied weights.

2.1 Regularized siamese architecture for feature extraction

2.1.1 Architecture

The proposed architecture is illustrated in Figure 1. Our regularized siamese neural network (rSNN) consists of a cost module and two identical subnetworks (same architecture, shared parameters), each a stacked denoising autoencoder (sDA) with $K$ hidden layers. Each layer $l$ of the sDAs is associated with a weight matrix $W_l$, which connects the neurons in layer $l$ to those in the $(l-1)$-th layer, and a bias vector $b_l$. A denoising autoencoder [VLBM08] receives a distorted version $\tilde{x}$ of a patch $x$ at input and yields a reconstructed output $\hat{x}$. The parameters are iteratively updated to optimize a loss function that measures the deviation between $\hat{x}$ and the clean input $x$. The siamese network receives a pair of patches $(x_1, x_2)$ at input; each patch is propagated through the corresponding subnetwork, yielding representations $z_1$ and $z_2$ in the middle (narrow) layer, which are then passed to the loss function below. Unlike in classical siamese frameworks, where the network also receives a binary label that encodes the similarity or dissimilarity of the pair, in our application all the considered pairs are similar and therefore the label is not present in the loss function. The loss function, however, can easily be modified to meet the typical setting. Another version of regularized siamese networks has been proposed in [CS11], albeit within the classical context.

2.1.2 Loss function

Our loss function is designed to maximize the cosine similarity between the middle-layer representations $z_1$ and $z_2$ of the two patches. In the absence of dissimilar pairs (the notion of dissimilar patches is not defined in our context), a regularizing term is necessary: without it, the loss function could be driven to 0 by mapping all the patches to a constant value. To this end we propose to use the mean squared error between the input patches and their reconstructions output by the subnetworks. The proposed loss function for a single pair $(x_1, x_2)$ is hence:

$$L(x_1, x_2; \Theta) = \alpha \sum_{t=1}^{2} \mathrm{MSE}(x_t, \hat{x}_t) + (1 - \alpha)\,\bigl(1 - \cos(z_1, z_2)\bigr) \qquad (1)$$

where $\hat{x}_t$ is the reconstructed output of subnetwork $t$ for the patch $x_t$, $z_t$ is its representation in the middle layer, and $\alpha$ is a coefficient that controls the tradeoff between the two terms. $\Theta$ represents the parameter set. (Note that in our case the input is scaled between 0 and 1.)
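The role of the regularizer can be checked numerically. The sketch below assumes one plausible form of the pair loss, namely a weighted sum of the reconstruction mean squared error and one minus the cosine similarity of the middle-layer representations. The function name `rsnn_pair_loss`, the placement of the weight `alpha` on the reconstruction term and the toy values are assumptions for illustration.

```python
import numpy as np

def rsnn_pair_loss(x1, x2, x1_hat, x2_hat, z1, z2, alpha=0.5, eps=1e-12):
    """Assumed pair loss: alpha * reconstruction MSE + (1 - alpha) * (1 - cosine)."""
    cos = np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2) + eps)
    mse = np.mean((x1 - x1_hat) ** 2) + np.mean((x2 - x2_hat) ** 2)
    return float(alpha * mse + (1.0 - alpha) * (1.0 - cos))

x1 = np.array([0.2, 0.8])
x2 = np.array([0.1, 0.9])
constant = np.array([0.5, 0.5])      # a collapsed network outputs the same thing everywhere
code = np.array([1.0, 1.0])          # identical codes give cosine similarity 1

# Collapsed mapping: the similarity term vanishes, but reconstruction is penalized.
collapsed = rsnn_pair_loss(x1, x2, constant, constant, code, code, alpha=0.5)
# Similarity term alone (alpha = 0): the collapsed mapping looks perfect.
unregularized = rsnn_pair_loss(x1, x2, constant, constant, code, code, alpha=0.0)
```

In this toy case the unregularized loss is numerically zero for the collapsed mapping, while the regularized loss stays strictly positive, which is the behavior the reconstruction term is meant to enforce.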

2.1.3 Training

The network is trained in two steps. In the first step, one of the stacked denoising autoencoder subnetworks is pre-trained using greedy layer-wise pretraining [B09]. This initializes the network parameters for the subsequent fine-tuning step. Since the subnetworks are identical, the parameter values are copied to initialize the other subnetwork. In the fine-tuning step the network parameters are iteratively updated so as to minimize the loss function (1) using stochastic gradient descent. Each parameter is updated with the sum of the gradients of the two subnetworks with respect to that parameter.
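The update rule for shared parameters follows from the chain rule: when the same parameter appears in both subnetworks, its gradient is the sum of the two per-branch gradients. The following sketch verifies this by finite differences on a toy one-parameter "subnetwork"; the functions `branch`, `loss` and `grad_fd` and all numeric values are hypothetical and unrelated to the actual network.

```python
import numpy as np

def branch(w, x):
    """Toy subnetwork with a single parameter w."""
    return np.tanh(w * x)

def loss(w1, w2, x1, x2):
    """Toy pair loss; w1 and w2 are the parameters of the two branches."""
    return (branch(w1, x1) - branch(w2, x2)) ** 2

def grad_fd(f, w, h=1e-6):
    """Central finite-difference derivative of f at w."""
    return (f(w + h) - f(w - h)) / (2 * h)

w, x1, x2 = 0.7, 0.3, -1.2
g_branch1 = grad_fd(lambda a: loss(a, w, x1, x2), w)   # vary branch 1 only
g_branch2 = grad_fd(lambda b: loss(w, b, x1, x2), w)   # vary branch 2 only
g_shared = grad_fd(lambda s: loss(s, s, x1, x2), w)    # vary the shared parameter
```

Here `g_shared` matches `g_branch1 + g_branch2` up to finite-difference error, so accumulating the two branch gradients is equivalent to differentiating the loss with respect to the tied parameter.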

2.2 Voxel-level outlier detection with oc-SVM classifiers

An oc-SVM [SPST01] classifier seeks the optimal hyperplane that separates the given points from the origin in a dot product space defined by some kernel function $k$, with associated feature map $\Phi$. The optimization problem to be solved is the following:

$$\min_{w,\,\xi,\,\rho} \; \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho$$

subject to

$$\langle w, \Phi(x_i) \rangle \ge \rho - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n$$

where $n$ is the number of training examples, $x_i$ is the $i$-th example in the dataset $X$, the $\xi_i$ are slack variables relaxing the inequality constraints so as to account for non-separable classes, $w$ and $\rho$ define the separating hyperplane, and $\nu$ is a parameter that bounds the fraction of outliers allowed. The decision function for an example $x$ is then $f(x) = \langle w, \Phi(x) \rangle - \rho$. This decision function provides the signed score output by an oc-SVM model (in a typical scenario, examples with negative scores are considered outliers).
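To make the roles of the hyperplane parameters, the slack variables and the outlier bound concrete, the sketch below evaluates the standard oc-SVM primal objective (half the squared norm of the normal vector, plus the averaged slack scaled by the outlier bound, minus the offset) for a fixed, hand-picked hyperplane. A linear kernel (identity feature map) is assumed to keep the example short, and the function name `ocsvm_primal` and the toy data are hypothetical; no optimization is performed.

```python
import numpy as np

def ocsvm_primal(X, w, rho, nu):
    """Evaluate the oc-SVM primal objective and signed scores (linear kernel).

    X: (n, d) training examples; w: (d,) normal vector; rho: offset;
    nu: bound on the fraction of outliers. The feature map is the identity.
    """
    scores = X @ w - rho                      # signed score of each example
    slacks = np.maximum(0.0, -scores)         # smallest slack satisfying the constraint
    objective = 0.5 * (w @ w) + slacks.sum() / (nu * len(X)) - rho
    return float(objective), scores

X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
objective, scores = ocsvm_primal(X, w=np.array([1.0, 1.0]), rho=1.5, nu=0.5)
```

For this hyperplane the first two examples receive negative scores and would be flagged as outliers, and each contributes a slack of 0.5 to the objective.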
To validate the usefulness of the features learnt by the proposed method, we use the representations in the middle layer of the subnetworks to train oc-SVM classifiers at voxel level. Each voxel is associated with a classifier, hence the number of classifiers is equal to the number of voxels in a volume (around 4 million voxels). For a given voxel $v$, the associated oc-SVM classifier is trained on the matrix $M_v = [z_v^1, \dots, z_v^n]$, where $z_v^i$ is the feature vector corresponding to the patch centered at $v$ for subject $i$ and $n$ is the number of subjects. The length of $z_v^i$ is equal to the number of neurons in the middle layer.
For a new patient, each voxel is matched against the corresponding classifier and is assigned the signed score output by the classifier. This yields a ”distance” map for the given patient. This map is then thresholded for each patient individually (the threshold is chosen as the score corresponding to a pre-chosen p-value in the distribution of scores of the patient) and a 26-connectivity rule is applied. Finally, voxel clusters smaller than a fixed size (set to 82 voxels in this study) are discarded.
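The post-processing of the distance map can be sketched as follows. This is a minimal NumPy illustration, assuming that the p-value is interpreted as a lower-tail quantile of the patient's score distribution and that voxels at or below the resulting threshold are kept; the function name `cluster_map` and the toy score volume are hypothetical.

```python
from collections import deque
import itertools
import numpy as np

def cluster_map(score_map, p_value=0.003, min_size=82):
    """Threshold a 3D score map, group voxels with 26-connectivity, and
    discard clusters smaller than min_size. Returns an integer label map
    (0 = background; surviving labels are not necessarily consecutive)."""
    threshold = np.quantile(score_map, p_value)
    mask = score_map <= threshold             # candidate outlier voxels
    labels = np.zeros(score_map.shape, dtype=int)
    offsets = [o for o in itertools.product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    next_label = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed] or not mask[seed]:
            continue                          # already labelled or already discarded
        next_label += 1
        labels[seed] = next_label
        queue, members = deque([seed]), [seed]
        while queue:                          # breadth-first flood fill
            voxel = queue.popleft()
            for off in offsets:
                nb = tuple(v + o for v, o in zip(voxel, off))
                inside = all(0 <= c < s for c, s in zip(nb, score_map.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = next_label
                    queue.append(nb)
                    members.append(nb)
        if len(members) < min_size:           # drop clusters that are too small
            for m in members:
                labels[m] = 0
                mask[m] = False
    return labels

scores = np.zeros((6, 6, 6))
scores[0:2, 0:2, 0:2] = -5.0                  # an 8-voxel block of strong outliers
scores[5, 5, 5] = -4.0                        # an isolated outlier voxel
labels = cluster_map(scores, p_value=0.04, min_size=2)
```

In the toy volume the 8-voxel block survives as one cluster while the isolated voxel is removed by the size filter. On full-size volumes a library routine for connected-component labelling would be far faster than this explicit flood fill.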

3 Experiments and results

3.1 Dataset description and pre-processing

Our database consists of T1-weighted MR images of 96 healthy subjects (T1-weighted MRI acquisition is a standard exam in clinical routine, in which image contrast arises from the different relaxation times of the tissues). Of these, 29 (DB1) had a 3D anatomical T1-weighted brain MRI sequence (TR/TE 9.7/4 ms; 176 sagittal slices of 256 x 256 millimetric cubic voxels) on a 1.5 T Sonata scanner (Siemens Healthcare, Erlangen, Germany). The remaining 67 (DB2) had a 3D anatomical T1-weighted brain MRI sequence on the same scanner but with a slightly different protocol (TR/TE 2400/3.55 ms; 160 sagittal slices of 192 x 192 1.2 mm cubic voxels). All the volumes were normalized to the standard brain template of the Montreal Neurological Institute (MNI) [MTE01] using a voxel size of 1 x 1 x 1 mm. This step is important to ensure voxel-level correspondence between the subjects. We validated the method on 3 patients with confirmed epileptogenic lesions. All 3 were scanned with the same scanner and the same parameters as the subjects from DB1 and had a positive MRI screening, meaning that the lesion was visually detected on the MR scan and eventually outlined by a neurologist.

Figure 2: First row: pathological slices of patient A, B and C respectively. The true lesions are outlined in red circles. Second row: Maximum Intensity Projections of the cluster maps onto the same slice as in the first row. The maps are reported for a p-value = 0.003. Note that some clusters appear outside of the brain volume as a result of the MIP projection. Note also that when MIP is used, multiple clusters can appear jointly on the projection, hence the number of false positive clusters on the visualization can be smaller than the reported number in the table below.

3.2 Feature extraction with rSNN

The results below are reported for the best network among those we tried (the networks were designed by varying the essential parameters and configurations, such as the input patch size, the corruption rate and the tradeoff coefficient $\alpha$ of the loss function). The proposed rSNN subnetworks are stacked denoising autoencoders with 2 hidden layers consisting of 64 and 32 neurons respectively. During pretraining, the distortion rate (the probability of masking a voxel in a patch, i.e. setting it to 0) was set to 0.3 and 0.1 for the first and second layers respectively. The same rate in the fine-tuning step was set to 0.1. We extracted 9 x 9 patches with stride 5 from all available volumes of the healthy controls, which gave around 14 million patches in total, and used them in pre-training. The same number of similar pairs was used to fine-tune the model.
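The masking corruption and the shape of one subnetwork can be sketched in a few lines of NumPy. The layer sizes (81 inputs for a 9 x 9 patch, then 64 and 32 hidden units) and the masking rate follow the description above; the sigmoid activation, the tied decoder weights used here, the random initialization and the function names `mask_noise` and `sda_forward` are assumptions for illustration, and no training is performed.

```python
import numpy as np

def mask_noise(x, rate, rng):
    """Set each entry of x to 0 independently with probability `rate`."""
    keep = rng.random(x.shape) >= rate
    return x * keep

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sda_forward(x, weights, enc_biases, dec_biases):
    """Encode x to the middle layer, then decode with the transposed weights."""
    h = x
    for W, b in zip(weights, enc_biases):
        h = sigmoid(h @ W + b)
    z = h                                      # middle-layer representation
    for W, c in zip(reversed(weights), dec_biases):
        h = sigmoid(h @ W.T + c)               # tied weights: decoder uses W transposed
    return z, h

rng = np.random.default_rng(0)
sizes = [81, 64, 32]                           # flattened 9 x 9 patch, two hidden layers
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
enc_biases = [np.zeros(n) for n in sizes[1:]]
dec_biases = [np.zeros(m) for m in reversed(sizes[:-1])]

patch = rng.random(81)                         # toy patch scaled to [0, 1]
z, x_hat = sda_forward(mask_noise(patch, 0.1, rng), weights, enc_biases, dec_biases)
```

The 32-dimensional vector `z` plays the role of the feature vector later passed to the voxel-level classifier, while `x_hat` is compared with the clean patch during training.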

3.3 oc-SVM classifier design

We used oc-SVM classifiers with an RBF kernel, which leaves two parameters to set: $\nu$ (the upper bound on the fraction of permitted outliers) and the kernel width. We empirically set $\nu$ to 0.03 for all the oc-SVMs. The RBF kernel width was set for each voxel individually, using the median of the pairwise distances between the points of the corresponding training matrix, as suggested in [CSFS02].
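The median heuristic for the kernel width is straightforward to compute. The sketch below assumes Euclidean distances and the parameterization exp(-d^2 / (2 sigma^2)) of the RBF kernel; the function names and the toy feature matrix are hypothetical.

```python
import numpy as np

def median_pairwise_distance(features):
    """Median Euclidean distance over all distinct pairs of rows."""
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(features), k=1)    # each pair counted once
    return float(np.median(dists[upper]))

def rbf_kernel(a, b, sigma):
    """RBF kernel with width sigma (assumed parameterization)."""
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2)))

# Toy per-voxel training matrix: one feature vector per healthy subject.
features = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
sigma = median_pairwise_distance(features)
```

With scikit-learn's `OneClassSVM`, whose RBF kernel is exp(-gamma * d^2), the corresponding setting under this parameterization would be `gamma = 1 / (2 * sigma**2)`.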

3.4 Experimental results

Below we report the results obtained with the best model among those we tried: rSNN trained on 9 x 9 patches with tradeoff coefficient $\alpha$ = 0.66 (we varied the size of the input patches and the value of $\alpha$). We also evaluated the performance of the same pipeline when using only one of the subnetworks, i.e. a stacked denoising autoencoder with 2 hidden layers (referred to as sDA). To evaluate the models we proceed as follows. As explained in section 2.2, for each voxel location in the brain an oc-SVM model is trained on the feature vectors of the healthy subjects extracted at the corresponding location. Given a test patient’s MR image, the CAD system gathers, for each voxel of the volume, the score output by the corresponding oc-SVM model. This results in a volume of the same dimensions as the original MR image, where each voxel now holds the signed score assigned by the oc-SVM model (we refer to this output as the distance map). We then compute the score distribution of the patient and find the score corresponding to a fixed p-value. The voxels with a score above this value are discarded. The remaining voxels are grouped using the 26-connectivity rule, which produces connected components (we refer to those as clusters). Clusters containing fewer than 82 voxels are discarded and the remaining ones constitute the final cluster map.
Table 1 reports the true lesion detections and the number of false positive clusters (clusters that were detected by the system but were not considered epileptogenic by the neurologist are counted as false positives in this application). The results show that both rSNN and sDA perform adequately and manage to detect the lesions. rSNN, however, outputs fewer irrelevant clusters (only one epileptogenic lesion was pointed out per patient by the neurologist). An important step of the evaluation is to visualize the output cluster maps. Figure 2 shows the maximum intensity projections of the detected clusters onto 3 slices of interest for patients A, B and C. The true lesion locations are outlined in red circles. The maps were obtained for a p-value = 0.003, a threshold which allowed a clear detection of the lesions. This threshold can be varied by a physician to find anomalies at different scales. As we can see, the true lesions are well detected, while some of the false detections (at the brain and skull interface) can easily be eliminated by a trained eye or by post-processing the cluster maps based on geometric features. Some reported clusters may also correspond to true anomalies that are either benign or were not reported as epileptogenic by the neurologist.

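Model                     Patient A   Patient B   Patient C
sDA                       ✓ (9)       ✓ (7)       ✓ (8)
rSNN, $\alpha$ = 0.66     ✓ (5)       ✓ (4)       ✓ (7)
Table 1: The results obtained with sDA and rSNN (with $\alpha$ = 0.66) reported for p-value = 0.003. A check mark indicates that the true lesion was detected; the number in parentheses is the number of false positive clusters.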

4 Conclusion

This work presents a feature extraction approach based on a regularized siamese network in the context of a voxel-level outlier detection task. The preliminary experiments showed promising performance when applying the method to epilepsy lesion detection in T1-weighted MRI scans. The performance of the learnt representations could be improved by replacing the stacked denoising autoencoder subnetworks with their convolutional counterparts. Additionally, other metrics could be explored to replace the cosine similarity in the proposed loss function. Regarding the specific application of epilepsy lesion detection, the proposed method would likely benefit from other medical imaging modalities, such as PET and/or FLAIR. Incorporating multiple modalities in a single model is one direction for future work.


We thank Dr Julien Jung from the Lyon Neurological Hospital for providing the MRI data set and for sharing his clinical expertise on the epilepsy patients.


  • [B09] Yoshua Bengio et al. Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127, 2009.
  • [BBB93] Jane Bromley, James W. Bentz, Léon Bottou, Isabelle Guyon, Yann LeCun, Cliff Moore, Eduard Säckinger, and Roopak Shah. Signature verification using a ”siamese” time delay neural network. IJPRAI, 7(4):669–688, 1993.
  • [CHL05] Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity metric discriminatively, with application to face verification. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 539–546. IEEE, 2005.
  • [CS11] Ke Chen and Ahmad Salman. Extracting speaker-specific information with a regularized siamese deep network. In Advances in Neural Information Processing Systems, pages 298–306, 2011.
  • [CSFS02] Barbara Caputo, K Sim, F Furesjo, and Alexander Smola. Appearance-based object recognition using SVMs: which kernel should I use? In Proc. of NIPS workshop on Statistical methods for computational experiments in visual processing and computer vision, Whistler, volume 2002, 2002.
  • [EAHJ16] Meriem El Azami, Alexander Hammers, Julien Jung, Nicolas Costes, Romain Bouet, and Carole Lartizien. Detection of lesions underlying intractable epilepsy on T1-weighted MRI as an outlier detection problem. PLoS ONE, 11(9):e0161498, 2016.
  • [HKS14] Seok-Jun Hong, Hosung Kim, Dewi Schrader, Neda Bernasconi, Boris C Bernhardt, and Andrea Bernasconi. Automated detection of cortical dysplasia type II in MRI-negative epilepsy. Neurology, 83(1):48–55, 2014.
  • [MTE01] John Mazziotta, Arthur Toga, Alan Evans, Peter Fox, Jack Lancaster, Karl Zilles, Roger Woods, Tomas Paus, Gregory Simpson, Bruce Pike, et al. A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Philosophical Transactions of the Royal Society of London B: Biological Sciences, 356(1412):1293–1322, 2001.
  • [SPST01] Bernhard Schölkopf, John C Platt, John Shawe-Taylor, Alex J Smola, and Robert C Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001.
  • [VLBM08] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM, 2008.
  • [ZAT16] Yijun Zhao, Bilal Ahmed, Thomas Thesen, Karen E Blackmon, Jennifer G Dy, Carla E Brodley, Ruben Kuzniecky, and Orrin Devinsky. A non-parametric approach to detect epileptogenic lesions using restricted Boltzmann machines. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 373–382. ACM, 2016.