Late fusion of deep learning and handcrafted features for Achilles tendon healing monitoring
Abstract
Healing process assessment of the Achilles tendon is usually a complex procedure that relies on a combination of biomechanical and medical imaging tests. As a result, diagnostics remains a tedious and long-lasting task. Recently, a novel method for the automatic assessment of tendon healing based on Magnetic Resonance Imaging and deep learning was introduced. The method assesses six parameters related to the treatment progress utilising a modified pre-trained network, a PCA-reduced space and linear regression. In this paper, we propose to improve this approach by incorporating handcrafted features. We first perform feature selection in order to obtain optimal sets of mixed handcrafted and deep learning predictors. Using approx. 20,000 MRI slices, we then train a meta-regression algorithm that performs the tendon healing assessment. Finally, we evaluate the method against scores given by an experienced radiologist. In comparison with the previous baseline method, our approach significantly improves correlation in all six parameters assessed. Furthermore, our method uses only one MRI protocol and saves up to 60% of the time needed for data acquisition.
Keywords:
Achilles tendon rupture, Deep learning, Magnetic Resonance Imaging
1 Introduction
Achilles tendon rupture is common among physically active middle-aged people. It seriously affects the patient’s mobility and ability to be physically active over a long period of time. Proper oversight during rehabilitation is important and can reduce complications such as tendon re-rupture.
Existing methods such as the Achilles tendon Total Rupture Score (ATRS) [Kearney2012] are only suitable for measuring the general outcome of rehabilitation, related to the symptoms and physical activity of patients. The inclusion of medical imaging makes it possible to complement this monitoring and to properly assess the tissue morphology associated with the tendon state. However, due to the cost of medical examinations and the limited time and resources of radiology departments, it remains an uncommon approach.
Both issues were recently addressed in [Kapinski2018], where the authors presented the first MRI-based method for the automatic assessment of Achilles tendon healing progress and selected the two MRI protocols most informative for their approach. We treat this method as the state-of-the-art baseline for our studies.
We aim to improve this method in terms of both the assessment quality and the number of MRI protocols required, thereby enabling clinics to perform more studies and to assist radiologists efficiently in patient evaluation. To do so, we incorporate handcrafted features into the previous approach and perform a late fusion using a meta-regression algorithm. More precisely, the baseline method consists of a convolutional neural network feature extractor and a principal component layer that performs dimensionality reduction. We take the first 200 principal components and the 46 handcrafted features investigated in [Nowosielski17]. Subsequently, we select an optimal mixed feature set using the LASSO method. Finally, we train a meta-regression algorithm to fit the resulting representation to the 6 tendon state scores assigned by a human annotator. Our method outperforms the baseline model in terms of correlation with the ground truth for every parameter and also improves the mean absolute error for 4 out of 6 parameters and the maximum absolute error for 2 out of 6 parameters. Furthermore, the proposed method uses only one MRI protocol instead of two, which directly translates into an up to 60% shorter acquisition time and lower cost.
2 Method
In this section, we describe our improved method for predicting the Achilles tendon healing phase. We start by selecting the most valuable feature representation for training our meta-regression algorithm. To do so, we apply the LASSO method to a set of 246 features, combining the 200 most significant principal component features extracted by the baseline model with 46 handcrafted predictors. The latter are statistics computed over the Region of Interest (ROI), which in our case is the segmented tendon. More precisely, the handcrafted features include the ROI area; pixel-value statistics over the ROI (minimum, maximum, mean, standard deviation, skewness, kurtosis, 25th percentile, median and 75th percentile); and Haralick’s textural features [Haralick1973, Nowosielski17] (angular second moment, contrast, correlation, variance, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance, difference entropy and maximum probability) computed for the separation distances d = 1, 5 and 10. In total, this gives 1 + 9 + 12 × 3 = 46 handcrafted features.
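The first-order ROI statistics and the feature count can be illustrated with a minimal NumPy sketch. The function and constant names are hypothetical, and the moment-based skewness, the excess-kurtosis definition and NumPy's default linear percentile interpolation are assumptions, since the text does not state which conventions [Nowosielski17] uses. Haralick's features are only counted here, not computed.

```python
import numpy as np

# Hypothetical names for the area + 9 pixel-value statistics listed in the text.
ROI_STAT_NAMES = ["area", "min", "max", "mean", "std", "skewness", "kurtosis",
                  "p25", "median", "p75"]

# The 12 Haralick features listed in the text, each computed for d in {1, 5, 10}.
HARALICK_NAMES = ["angular_second_moment", "contrast", "correlation", "variance",
                  "inverse_difference_moment", "sum_average", "sum_variance",
                  "sum_entropy", "entropy", "difference_variance",
                  "difference_entropy", "maximum_probability"]
DISTANCES = (1, 5, 10)


def roi_statistics(image: np.ndarray, mask: np.ndarray) -> dict:
    """First-order statistics of the pixel values inside a binary ROI mask."""
    values = image[mask.astype(bool)].astype(np.float64)
    mean = values.mean()
    std = values.std()  # population standard deviation (ddof=0), an assumption
    centred = values - mean
    # Moment-based skewness and excess kurtosis (assumed definitions).
    skew = np.mean(centred ** 3) / std ** 3 if std > 0 else 0.0
    kurt = np.mean(centred ** 4) / std ** 4 - 3.0 if std > 0 else 0.0
    p25, median, p75 = np.percentile(values, [25, 50, 75])
    return {
        "area": float(values.size),  # ROI area in pixels
        "min": float(values.min()),
        "max": float(values.max()),
        "mean": float(mean),
        "std": float(std),
        "skewness": float(skew),
        "kurtosis": float(kurt),
        "p25": float(p25),
        "median": float(median),
        "p75": float(p75),
    }


# 10 ROI statistics + 12 Haralick features x 3 distances = 46 handcrafted features.
N_HANDCRAFTED = len(ROI_STAT_NAMES) + len(HARALICK_NAMES) * len(DISTANCES)
```

For a slice, `roi_statistics(slice_pixels, tendon_mask)` would return the ten first-order values; concatenating them with the 36 Haralick values gives the 46-dimensional handcrafted vector.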
We use the mixed feature representation obtained after LASSO selection, together with the ground truth labels described in Section 3.1, to train a regression algorithm. Finally, we use the metric proposed in the baseline method [Kapinski2018] to compute a single score of the Achilles tendon condition for a whole 3D MRI study:
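$$ s = \operatorname{tm}_{2.5\%}\left( \{ s_i \}_{i=1}^{N} \right) \qquad (1) $$
where $\operatorname{tm}_{2.5\%}$ is a truncated mean with 2.5% upper and lower hinges (the value used by the baseline method), $s_i$ is the regression score computed on the $i$-th slice, and $i = 1, \dots, N$ is the index of the slice in the 3D MRI study.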
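Eq. (1) can be sketched as follows. This is a minimal sketch, assuming the 2.5% hinges are applied with the same cut rule as SciPy's `scipy.stats.trim_mean(a, 0.025)`, i.e. int(0.025 · N) values removed from each end; the function names are hypothetical.

```python
import numpy as np


def truncated_mean(scores, proportion: float = 0.025) -> float:
    """Mean after discarding `proportion` of the values at each end.

    Uses the same rounding as scipy.stats.trim_mean: int(proportion * n)
    values are removed from both the lower and the upper end.
    """
    values = np.sort(np.asarray(scores, dtype=np.float64))
    n = values.size
    cut = int(proportion * n)
    return float(values[cut:n - cut].mean())


def study_score(slice_scores) -> float:
    """Aggregate the per-slice scores s_i into one study-level score s (Eq. (1))."""
    return truncated_mean(slice_scores, proportion=0.025)
```

With 41 slice scores, one value is dropped at each end, so a single outlying slice (for example, a segmentation failure) does not shift the study-level score.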
Fig. 1 shows an overview of our framework, based on a late fusion of so-called deep learning features and handcrafted ones.
In general, the approach produces a single assessment score for each 2D axial slice of the 3D MRI study; these slice scores are then merged into one value using the metric in Eq. (1).
The deep learning features (which use no ROI information) are computed in the same way as described in the baseline paper (see the region marked in yellow in Fig. 1). They are obtained from a truncated version of the AlexNet convolutional neural network [AlexNet], trained with the use of 10 MRI protocols on the binary task of distinguishing between healthy and injured tendons. The architecture consists of the feature extractor and the first fully connected layer (fc6), followed by a principal component layer that performs dimensionality reduction. According to the authors of this approach, the first 200 principal components preserve 98.8% of the variance of the 4096 fc6 activation outputs. Thus, we use all 200 components as inputs to our late fusion approach. The model and the principal component transformation remain the same as in the original paper.
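The principal component layer is taken unchanged from the original paper. The following NumPy sketch only illustrates how fc6 activations can be projected onto their first k principal components and how a retained-variance figure such as 98.8% for k = 200 is obtained; the class name `PCAProjector` is hypothetical and this is not the baseline's code.

```python
import numpy as np


class PCAProjector:
    """Minimal PCA via SVD: fit on fc6 activations, keep the first k components."""

    def __init__(self, n_components: int = 200):
        self.n_components = n_components

    def fit(self, activations: np.ndarray) -> "PCAProjector":
        # activations: (n_slices, n_features), e.g. n_features = 4096 for AlexNet fc6
        self.mean_ = activations.mean(axis=0)
        centred = activations - self.mean_
        # Rows of vt are principal directions, sorted by decreasing singular value.
        _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
        variance = singular_values ** 2
        self.components_ = vt[: self.n_components]
        # Fraction of the total variance carried by each kept component.
        self.explained_variance_ratio_ = variance[: self.n_components] / variance.sum()
        return self

    def transform(self, activations: np.ndarray) -> np.ndarray:
        # Project centred activations onto the kept principal directions.
        return (activations - self.mean_) @ self.components_.T
```

With a hypothetical training matrix `fc6_train` of shape (n_slices, 4096), `PCAProjector(200).fit(fc6_train)` would yield 200-dimensional features, and `explained_variance_ratio_.sum()` gives the retained variance fraction.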
On top of the previous approach, we developed the novel contribution of this paper: the meta-regression model that combines the information conveyed by the handcrafted and deep learning features. The approach uses LASSO feature selection, which reduces dimensionality and thus allows the meta-regression to be trained effectively without overfitting.
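The paper does not state which LASSO implementation was used. As an assumption, the sketch below minimises the same objective as scikit-learn's `Lasso`, (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, using plain cyclic coordinate descent in NumPy. Features are standardised first, and the selected predictors are those with non-zero coefficients. The function names are hypothetical; in the described method, the selection is run separately for each of the six ground truth parameters.

```python
import numpy as np


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x|."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=500, tol=1e-8):
    """Minimise (1/(2n)) * ||y - X w||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is assumed column-standardised and y centred, so no intercept is fitted.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # curvature along each coordinate
    residual = np.asarray(y, dtype=np.float64).copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: its coefficient stays zero
            old = w[j]
            # Correlation of feature j with the partial residual (feature j added back).
            rho = X[:, j] @ residual / n + col_sq[j] * old
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            if w[j] != old:
                residual -= X[:, j] * (w[j] - old)
                max_change = max(max_change, abs(w[j] - old))
        if max_change < tol:
            break
    return w


def select_features(X, y, alpha):
    """Return the indices of features with non-zero LASSO coefficients, and the coefficients."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Xs = (X - X.mean(axis=0)) / std
    w = lasso_coordinate_descent(Xs, y - y.mean(), alpha)
    return np.flatnonzero(w), w
```

Larger values of `alpha` zero out more coefficients. In the method, the 246 mixed features would form the columns of `X`, and the meta-regression is then trained only on the returned columns.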
3 Experiments
In this section, we present the experiments that allowed us to select the components of our final method. We start by introducing our dataset. Next, we present an analysis of different MRI protocols that led to the selection of the final input data. Subsequently, we present the feature selection method, followed by a detailed study of training different meta-regression algorithms. We conclude this section by comparing the results obtained by our method and by the baseline approaches for 4 test patients excluded from the training procedure.
3.1 Dataset
The acquired dataset includes 3D MRI scans of 60 patients who suffered an acute Achilles tendon rupture, as well as of healthy volunteers. Each injured patient had the lower limb scanned once before the surgical reconstruction and then 9 times after the surgery, i.e., 1, 3, 6, 9, 12, 20, 26, 39 and 52 weeks after the operation (10 MRI studies in total per injured patient). Healthy volunteers were scanned only once. A single MRI study includes scans acquired with 10 MRI protocols, i.e., four 3D FSPGR IDEAL (Fast Spoiled Gradient Echo) protocols (In Phase, Out Phase, Fat, Water), PD (Proton Density), T1, T2, T2 mapping, T2 GRE (Gradient Echo) and T2 GRE TE_MIN (minimal echo time).
Within our dataset, the complete sequence of 10 MRI studies over time (including the manual segmentation and the ground truth radiological scores) is available for 48 injured patients. For T2 GRE TE_MIN (the protocol we selected based on the study described in Section 3.2.1), these 48 patients translate into 480 MRI studies and 18,863 slices with a non-empty tendon ROI. We randomly selected 4 patients (40 studies and 1,545 slices) to form a separate test set. The remaining 44 injured patients constitute the training set, which is used for cross-validation and for the final training of the feature selector and the meta-regression.
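As a quick consistency check of the counts above, together with the fold size of approx. 4300 slices reported in Section 3.2.2 (assuming the training slices are split roughly evenly across the 4 folds):

```latex
\begin{aligned}
48 \times 10 &= 480 \ \text{studies}, \qquad 4 \times 10 = 40 \ \text{test studies},\\
18\,863 - 1\,545 &= 17\,318 \ \text{training slices}, \qquad 17\,318 / 4 \approx 4\,330 \ \text{slices per fold}.
\end{aligned}
```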
Manual segmentation of the Achilles tendon ROI was provided by an expert radiologist, who also annotated the ground truth parameters by scoring the following aspects of the tissue:

Structural changes within the tendon (SCT)

Tendon thickening (TT)

Sharpness of the tendon edges (STE)

Tendon edema (TE)

Tendon uniformity (TU)

Tissue edema (TisE)
The TisE parameter relies on the tissues outside the tendon and the STE parameter on the tendon border; thus, both can be treated as extra-tendon scores. All other parameters rely on intra-tendon structures. Each parameter was scored on a numerical scale on which the lowest value represents a healthy tendon and the highest value a severely injured one.
3.2 Tendon healing assessment with late fusion
3.2.1 MRI protocol selection:
In this study, we aim to select the single MRI protocol that conveys the most valuable information about the tendon healing process. We found that the MRI signal from the tendon area in subsequent healing weeks is more distinguishable with the T2 GRE TE_MIN protocol than with any other protocol (see Fig. 2).
As healthy tendon tissue is characterised by very short T2 times, the tendon visible on images acquired with the other protocols is almost black after just 12 weeks of rehabilitation, while T2 GRE TE_MIN still detects pathological changes. The better sensitivity of this protocol to the partially healed Achilles tendon results from its very short echo time. Considering this observation, we decided to use T2 GRE TE_MIN as the input data for our method.
3.2.2 Meta-regression training task:
We followed a standard 4-fold cross-validation procedure to tune the hyperparameter of the LASSO feature selector and the hyperparameters of the meta-regression algorithms. We chose 4 folds to obtain approximately equally sized folds: each fold contains the slices of 11 patients (approx. 4300 axial slices).
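Because each fold contains all slices of 11 patients, the split is made at the patient level, so no patient contributes to both training and validation within a fold. A minimal sketch of such a patient-disjoint 4-fold split, assuming a seeded random permutation of patient IDs (the function names and the seed are hypothetical); scikit-learn's `GroupKFold` is an alternative that likewise keeps each group (here, a patient) within a single fold.

```python
import numpy as np


def patient_folds(patient_ids, n_folds=4, seed=0):
    """Split the unique patient IDs into n_folds disjoint groups of (nearly) equal size."""
    unique = np.unique(np.asarray(patient_ids))
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique)
    return np.array_split(shuffled, n_folds)


def fold_masks(slice_patient_ids, folds):
    """Yield (train_mask, val_mask) boolean masks over slices, one pair per fold."""
    slice_patient_ids = np.asarray(slice_patient_ids)
    for val_patients in folds:
        # A slice is in validation if its patient belongs to the held-out group.
        val_mask = np.isin(slice_patient_ids, val_patients)
        yield ~val_mask, val_mask
```

With the 44 training patients, `np.array_split` yields four groups of exactly 11 patients, while the slice counts per fold differ slightly because patients contribute different numbers of slices.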
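For the LASSO selector, the best results in terms of correlation with the ground truth were obtained for the regularisation strength selected in cross-validation, with the following sets of deep learning (DL) and handcrafted features: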

SCT: 6 DL features and 10 handcrafted features, including 3 Haralick’s features (sum variance for d=1, sum average for d=5, sum average for d=10);

TT: 5 DL features and 7 handcrafted features, but no Haralick’s features;

STE: 5 DL features and 8 handcrafted features, but no Haralick’s features;

TE: 4 DL features and 8 handcrafted features, with one Haralick’s feature (sum average for d=10);

TU: 4 DL features and 9 handcrafted features, with one Haralick’s feature (sum variance for d=1);

TisE: 6 DL features and 7 handcrafted features, but no Haralick’s features.
The total number of selected features is below 20 for all ground truth parameters. Furthermore, both deep learning and handcrafted features are present in every set, which suggests that a fusion approach can be successful.
Using these limited feature sets, we train several regression algorithms for each ground truth parameter, namely linear regression (LR), second-degree polynomial regression (poly), support vector regression (SVR), multilayer perceptron regression with 4 units in the hidden layer (MPR) and random forest (RF). Despite multiple trials with different random forest sizes, RF consistently tended to overfit; hence, we exclude it from the table presenting the performance of the meta-regression algorithms on the training set (Tab. 1).
Tab. 1. Performance of the meta-regression algorithms on the training set.
Model  Metric  SCT   TT    STE   TE    TU    TisE
poly   MAE
       MAXAE   3.53  2.35  3.62  2.49  2.90  2.64
       Corr    0.87  0.82  0.46  0.80  0.65  0.87
SVR    MAE
       MAXAE   3.73  2.32  3.83  2.50  2.95  2.75
       Corr    0.89  0.85  0.59  0.83  0.72  0.88
LR     MAE
       MAXAE   3.52  2.39  3.77  2.52  2.88  2.65
       Corr    0.87  0.83  0.46  0.80  0.65  0.87
MPR    MAE
       MAXAE   3.47  2.51  3.57  2.52  2.86  2.65
       Corr    0.86  0.83  0.46  0.80  0.65  0.88
The models are evaluated with three metrics: mean absolute error (MAE), maximum absolute error (MAXAE) and the mean Pearson correlation between the computed values and the ground truth scores of individual patients, averaged after Fisher z-transformation (Corr). All presented meta-regression approaches yield comparable scores; therefore, we retain all of them for the following experiments.
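The three metrics can be sketched as follows. Each patient's correlation is computed over that patient's study-level scores, transformed with z = artanh(r), averaged across patients and mapped back with tanh. The clipping constant, the pooling of all studies for MAE and MAXAE, and the function names are assumptions, not details given in the paper.

```python
import numpy as np


def mae(pred, truth) -> float:
    """Mean absolute error over all pooled studies."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(truth))))


def max_ae(pred, truth) -> float:
    """Maximum absolute error over all pooled studies."""
    return float(np.max(np.abs(np.asarray(pred) - np.asarray(truth))))


def fisher_mean_correlation(pred_by_patient, truth_by_patient) -> float:
    """Average per-patient Pearson correlations in Fisher z-space, then map back to r.

    Each list element holds one patient's study-level scores (e.g. 10 studies in time).
    """
    z_values = []
    for pred, truth in zip(pred_by_patient, truth_by_patient):
        r = np.corrcoef(pred, truth)[0, 1]
        # Clip to avoid an infinite z for perfectly (anti-)correlated series.
        r = np.clip(r, -0.999999, 0.999999)
        z_values.append(np.arctanh(r))
    return float(np.tanh(np.mean(z_values)))
```

Averaging in z-space rather than averaging raw r values reduces the bias that arises because the sampling distribution of r is skewed, especially for high correlations and short series such as 10 studies per patient.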
3.2.3 Healing progress assessment:
In this task, we evaluate our late fusion approach against the baseline. We use two variants of the baseline introduced in the original paper: (1) inference based on the PD, T2 GRE and T2 GRE TE_MIN protocols and (2) inference based only on the T2 GRE TE_MIN protocol. In variant (1), we reproduce the pipeline of the original paper and, for a fair comparison, enrich it with the protocol that we use; in variant (2), we evaluate how the baseline model performs when inference is restricted to this single protocol.
The test set performance for the metaregression algorithms is summarised in Tab. 2.
Tab. 2. Test set performance of the meta-regression algorithms and of the two baseline variants.
Model         Metric  SCT   TT    STE   TE    TU    TisE
poly          MAE
              MAXAE   2.67  1.78  1.81  2.50  2.12  2.39
              Corr    0.82  0.83  0.25  0.71  0.63  0.78
SVR           MAE
              MAXAE   2.62  1.82  1.92  2.54  2.01  2.38
              Corr    0.85  0.85  0.31  0.72  0.65  0.80
LR            MAE
              MAXAE   2.60  1.78  1.81  2.54  2.04  2.38
              Corr    0.84  0.84  0.18  0.71  0.62  0.78
MPR           MAE
              MAXAE   2.63  1.77  1.78  2.54  2.04  2.40
              Corr    0.83  0.83  0.20  0.73  0.63  0.77
baseline (1)  MAE
              MAXAE   3.53  2.49  1.91  2.34  2.2   2.47
              Corr    0.58  0.47  0.07  0.60  0.56  0.58
baseline (2)  MAE
              MAXAE   3.54  2.46  1.82  2.70  2.13  2.18
              Corr    0.61  0.64  0.08  0.55  0.55  0.65
In comparison with both baselines, even the simple linear meta-regression model significantly improves the correlation for all parameters. It also reduces the maximum absolute error for SCT, TT, STE and TU, and the mean absolute error for SCT, TE and TT, with the improvement for TT being statistically significant.
In Fig. 3, we show examples of the worst and the best assessments in terms of correlation for both intra- and extra-tendon parameters. The results are presented for SVR, which achieved the highest correlation for 5 of the 6 parameters while remaining competitive in the other metrics.
We observe that in all cases the starting point and the outcome of the rehabilitation are assessed similarly to the radiologist (with an absolute error below or approx. in one case). An error at the level of 1 is also similar to the uncertainty of the radiologist’s scores and should have minimal impact on clinical decisions. The fluctuations of the assessment during the healing process are discussed in Section 4.
4 Discussion
We have shown that our late fusion approach, combined with the selection of appropriate deep learning and handcrafted features, improves the automatic assessment of Achilles tendon healing. The healing process is affected by many disturbances, such as patient activity, diet and adherence to the prescribed treatment. According to the feedback provided by radiologists and medical professionals, extra-tendon conditions, especially edema, are more prone to these factors; hence, explicitly incorporating handcrafted features that emphasise the intra-tendon area results in an overall improvement.
The key to the performance boost is selecting only a small number of significant predictors from a mixture of handcrafted and deep learning features. Haralick’s features depend on pixel co-occurrence and indicate specific textural patterns, which is particularly advantageous for SCT, TE and TU. ROI statistics are particularly significant for the assessment of TT, which achieved the overall best scores across the computed metrics. In addition, including the deep learning features after the principal component transformation enabled the successful assessment of an extra-tendon parameter, namely TisE. The other extra-tendon parameter, STE, yields a relatively low correlation, although it is still improved. Its feature set contains no Haralick’s features and incorporates ROI information, such as the area, only indirectly. The improvement therefore comes mainly from the use of five DL features instead of the single one used by the baseline method.
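Haralick’s features are computed from a grey-level co-occurrence matrix (GLCM). A minimal sketch of a GLCM restricted to the tendon ROI and of five of the listed features, assuming an already quantised image, a single horizontal offset of d pixels, a symmetric GLCM and 1-based grey levels as in [Haralick1973]; the exact quantisation and offset directions used in [Nowosielski17] are not given in the text, and the function names are hypothetical. Established implementations include `graycomatrix` and `graycoprops` in scikit-image and `mahotas.features.haralick`.

```python
import numpy as np


def glcm(levels_img, mask, d, n_levels):
    """Symmetric, normalised GLCM for the horizontal offset (0, d).

    Only pixel pairs with both pixels inside the ROI mask are counted.
    levels_img must contain integers in [0, n_levels).
    """
    a = levels_img[:, :-d]
    b = levels_img[:, d:]
    valid = mask[:, :-d] & mask[:, d:]
    P = np.zeros((n_levels, n_levels), dtype=np.float64)
    np.add.at(P, (a[valid], b[valid]), 1.0)  # count co-occurring level pairs
    P = P + P.T  # count each pair in both directions
    total = P.sum()
    return P / total if total > 0 else P


def some_haralick_features(P):
    """A subset of Haralick's features computed from a normalised GLCM P."""
    n = P.shape[0]
    i, j = np.indices((n, n)) + 1  # 1-based grey levels
    nz = P > 0
    return {
        "angular_second_moment": float(np.sum(P ** 2)),
        "contrast": float(np.sum((i - j) ** 2 * P)),
        "entropy": float(-np.sum(P[nz] * np.log(P[nz]))),
        "sum_average": float(np.sum((i + j) * P)),
        "maximum_probability": float(P.max()),
    }
```

Evaluating `glcm` for d = 1, 5 and 10 and computing all twelve listed features from each matrix would yield the 36 Haralick values per slice.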
5 Conclusions
In this paper, we proposed a meta-regression model based on the late fusion of deep learning and handcrafted features that improves the existing state-of-the-art automatic assessment of the healing Achilles tendon in MRI studies.
Furthermore, in line with the achieved improvement, we reduced the number of MRI protocols required by the approach from two to one, namely T2 GRE TE_MIN. This directly translates into savings in time and cost: data acquisition for our method takes approx. 5 minutes, compared with approx. 12 minutes for the previous approach.
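Using the approximate acquisition times stated above, the relative time saving works out as:

```latex
\frac{12\,\text{min} - 5\,\text{min}}{12\,\text{min}} = \frac{7}{12} \approx 0.58
```

This is roughly 58%, consistent with the saving of up to 60% quoted in the abstract and the introduction.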
As future work, we plan to automate the segmentation of the tendon region of interest (ROI), which is currently performed manually. Our initial tests indicate that fully convolutional networks could be employed for this task.
Acknowledgments
The following work was part of the Novel Scaffold-based Tissue Engineering Approaches to Healing and Regeneration of Tendons and Ligaments (START) project, co-funded by The National Centre for Research and Development (Poland) within the STRATEGMED programme (STRATEGMED1/233224/10/NCBR/2014).