Forecasting intracranial hypertension using multi-scale waveform metrics
Objective: Intracranial hypertension is an important risk factor for secondary brain damage after traumatic brain injury. Hypertensive episodes are often diagnosed reactively, and time is lost before counteractive measures are taken. A pro-active approach that predicts critical events ahead of time could be beneficial for the patient. Methods: We developed a prediction framework that forecasts onsets of intracranial hypertension in the next 8 hours. Its main innovation is the joint use of cerebral auto-regulation indices, spectral energies and morphological pulse metrics to describe the neurological state. One-minute base windows were compressed by computing signal metrics, and then stored in a multi-scale history, from which physiological features were derived. Results: Our model predicted intracranial hypertension up to 8 hours in advance with alarm recall rates of 90% at a precision of 36% in the MIMIC-II waveform database, improving upon two baselines from the literature. We found that features derived from high-frequency waveforms substantially improved the prediction performance over simple statistical summaries, and that each of the three feature categories contributed to the performance gain. The inclusion of long-term history up to 8 hours was especially important. Conclusion: Our approach showed promising performance and enabled us to gain insights about the critical components of prediction models for intracranial hypertension. Significance: Our results highlight the importance of information contained in high-frequency waveforms in the neurological intensive care unit. They could motivate future studies on pre-hypertensive patterns and the design of new alarm algorithms for critical events in the injured brain.
With at least 10 million cases annually leading to hospitalization worldwide, traumatic brain injury (TBI) is a major public health issue, mainly caused by falls, motor vehicle accidents or violent assaults. After initial admission to the intensive care unit (ICU) and assessment of the primary brain injury, further neurological damage often occurs. This phenomenon is referred to as secondary brain injury, and often leads to long-term brain damage through, e.g., cerebral ischemia (decrease of blood flow to the brain), cerebral hypoxia (decrease of substrate/oxygen flow to the brain) and brain herniation (swelling leading to compression of brain structures).
Current management of TBI patients in the ICU focuses on mitigating and possibly reversing secondary injuries. A key variable in the management of secondary brain injury is intracranial pressure (ICP) [6, 7], which is determined by the overall volume of the cranial components, i.e. neural tissue, blood, and the cerebrospinal fluid. According to the Monro-Kellie doctrine, the total volume of the cranial system is constant and a volume increase in one of its components can potentially increase ICP. One of the major concepts that emerged in the study of ICP auto-regulation is intracranial adaptive capacity (IAC). IAC refers to the ability of the brain to maintain blood and energy substrate flow by holding cerebral pressure constant against slight volume changes of the cranial components. The ICP value of a healthy adult is maintained by the IAC mechanism in the range 7-15 mmHg. However, if regulatory capacity is reduced, rapid non-linear ICP elevations can occur. A sustained ICP elevation over 20 mmHg is defined as intracranial hypertension (ICH). An illustrative example of an ICH event is shown in Figure 1.
A direct association of time spent in the ICH state with clinical outcome has been empirically shown: The area under the mean ICP curve in the first 48 hours of ICU treatment is an independent predictor of in-hospital mortality. Various other studies have established an association of ICH and poor neurological outcome [15, 16, 17]. Accordingly, it is a common treatment goal in neuro-critical care to avoid intracranial hypertension. Invasive, intra-parenchymal ICP monitoring combined with surgical interventions is the gold standard to control and maintain ICP in the physiological range below 15-20 mmHg and ensure adequate IAC. Advances in monitoring and signal processing technology have made it possible to record high-frequency ICP traces and analyze them in real-time. Yet there are several caveats that hinder the interpretation of the ICP signal and its use as a decision-support tool: (a) recorded ICP signals can be corrupted by high-frequency noise caused by measurement devices as well as artifacts due to patient movement; (b) raw data and time-varying trends are presented to the clinician, and no risk estimates for ICH are available. This can lead to information overload and excessive demands on the attention of ICU personnel. For example, a study has found that clinicians are often not confident that the effort spent on inspecting ICP traces is repaid by improved outcome after TBI; (c) threshold-based track-and-trigger systems often have excessive false alarm rates, which can desensitize staff to dangerous hypertension events; (d) alarms are only triggered after the onset of intracranial hypertension, when long-term effects might already be harder to prevent.
To address these problems, robust forecasting of ICH onsets could augment the current treatment protocol, which is overly manual, reactive, and prone to errors. Such an approach would automatically identify precursor patterns of ICP elevation. Previous works have shown that characteristic changes in auto-regulation indices and ICP/ABP waveform morphology occur prior to hypertensive events [22, 23]. However, the marginal value of extracting features from high-frequency waveforms remains unclear when compared to simple statistical summaries, which Myers et al. have shown to yield competitive prediction performance. In this extensive empirical study of ICH prediction, we evaluate the benefits of morphological and auto-regulation indices, and present the critical components of a robust ICH forecasting system. To this end, we propose a prediction framework that describes the neurological state of a patient using multi-scale waveform metrics, conduct a series of feature ablation experiments and derive a set of important physiological features using the recently proposed SHAP (SHapley Additive exPlanations) value technique. Preliminary and partial versions of this work have been reported in clinical abstracts [26, 27].
II Related work
The association of information contained in high-frequency physiological waveforms/time series and elevated ICP has been studied in various works. For example, Hornero et al. have found that decreased ICP signal complexity and irregularity is associated with acute intracranial hypertension. Fan et al. identified an association between ICP variability and decreased intracranial adaptive capacity. Recently, it was established that characteristic patterns in various physiological channels are correlated with ICP and could thus be used to predict ICH. Several auto-regulation indices defined on physiological channels were reported, such as by Zeiler et al., who studied the moving correlation coefficient between ICP, ABP and CPP channels, and others [32, 33]. The relationship between auto-regulation indices and successive ICH events has been studied by Kim et al. In general, it has long been suspected that the information contained in the pulsatile ICP signal is very rich beyond simple statistical summaries [35, 36].
Besides auto-regulation indices, previous works have attempted to use morphological descriptors of the intracranial pressure pulse to predict ICH onset up to 20 minutes in advance [37, 23, 38]. More generally, morphological analysis of ICP pulses has emerged as a successful approach and has been used, e.g., to reduce false alarm rates of ICP alarms and to track pulse metrics in real-time. Hu et al. applied cluster analysis to individual ICP pulses. Other types of features that have been proposed to summarize physiological time series include bag-of-words of physiological motifs applied to ECG/EEG time series and entropy measures [44, 45]. The recently proposed ICP trajectories framework uses longitudinal ICP time series to discover clinical phenotypes. Different approaches have also been proposed, based on assessing risk only from static clinical data or biomarkers [48, 49, 50], instead of using historical time series.
To obviate the need for explicit feature engineering on historical time series, deep learning architectures have been proposed, which detect intracranial hypertension from the raw pulse waveform. Simpler dimensionality reduction approaches, such as principal component analysis, have also been used to find non-correlated features that describe ICH.
Major recent works explicitly addressing the ICH forecasting problem include the approach proposed by Güiza et al., which obtained an AUROC of 0.87 for prediction of ICH in the next 30 minutes. Their analysis showed that the most predictive channel is ICP and that the most recent measurements are the most relevant features. Subsequently, their model was externally validated, resulting in similar performance. Myers et al. proposed a model that is able to predict ICH up to 6 hours in advance. It uses simple features such as the last measured ICP value or the time to the last ICH crisis. Besides tackling the classification task directly, other models have been suggested that predict the future ICP mean value, for example by using nearest-neighbor regression, neural networks [56, 57] or ARIMA models.
In summary, the problem of predicting ICH has mainly been studied from two disparate angles: complex morphological analysis and auto-regulation indices on the one hand, and recently emerged simple statistical models that have demonstrated promising results on the other. Yet it is not well understood or investigated which of these approaches is necessary or sufficient to achieve high prediction performance for ICH, and whether additional benefits could be derived from their combination.
III-A Physiological database
In all experiments reported in this paper, we have used the Multiparameter Intelligent Monitoring in Intensive Care II waveform database (MIMIC-II WFDB), Version 3.2. The entire dataset consists of 17,468 adult patient stays at the Beth Israel Deaconess Medical Center, Boston, MA, United States. The database contains high-resolution waveforms captured at 125 Hz and time series of vital signs captured at 1 Hz. Available waveforms include different leads of the electrocardiogram (I-V), invasive arterial blood pressure (wABP), intracranial pressure (wICP), raw output of fingertip plethysmograph (wPLETH) and respiration waveform (wRESP). Waveforms were acquired using the bedside IntelliVue Patient Monitoring system, Philips Healthcare, The Netherlands. Low-frequency time series include mean, diastolic and systolic arterial blood pressures (ABPm/d/s), mean intracranial pressure (ICP), cerebral perfusion pressure (CPP), heart rate (HR), respiration rate (RESP) and blood oxygen saturation estimated from fingertip plethysmography (SpO2). The MIMIC-II WFDB was chosen for this study because it contains simultaneous measurements of high-frequency waveforms and derived time series for a range of physiological channels that are relevant to the prediction problem. Other clinical variables of the matched MIMIC-II clinical database were not used in this study, because too few of the selected waveform records were linked to a clinical record. Among all available channels, we selected ICP, CPP and ABPm/d/s time series, and wICP, wABP, wPLETH, wRESP and II (ECG) waveforms. This broad range allows us to compare the relevance of different channels for predictive modeling, while still ensuring that we can extract a cohort of at least 50 ICU stays with regular sampling.
III-B Cohort selection
Only a small fraction of available records in the MIMIC-II WFDB contain ICP data. In a first step, we discarded all segments that have no available ICP time series, which left 346 relevant segments. We further required a minimum recording length of 24 hours, and a missing value ratio of at most 25% for each considered waveform or time series channel. We applied these criteria to ensure that the relevance of different channels as features could be meaningfully compared, and that individual channels would not be negatively affected by long stretches of missing data. After applying these criteria, 54 segments remained in the cohort. This set of recording segments was used in all reported experiments. A diagram summarizing patient exclusions and cohort definition is shown in Figure 2. Overall, our dataset contains 4382 hours of data, which corresponds to approximately 1.97 billion waveform samples recorded at 125 Hz. The mean recording length per segment is 81.15 hours (std: 46.46 hours). The mean ICP value in the cohort was 9.79 mmHg (std: 7.5 mmHg).
III-C Intracranial hypertension alarms
We define an ICH event as 5 successive 1-minute blocks with mean ICP greater than 20 mmHg. A time point on the 1-minute grid is labeled as positive if there is any intracranial hypertension event in the next 8 hours. Our predictive model is only trained and evaluated for time points at which the patient is not already hypertensive, which implements an early warning system deployed in phases where the patient has normal ICP values. Our design choice was to train one overall model for predicting events in the next 8 hours, without targeting any specific prediction horizon. Positive labels correspond to time points at which an alarm should be produced by our prediction model. Recall is defined as the fraction of those points at which an alarm is indeed produced. Precision denotes the fraction of produced alarms that fall in the 8 hours prior to some ICH event. Both metrics are maximized if continuous sequences of alarms are produced exactly in the 8 hours before events, one for each grid point. However, in clinical implementation this strict condition could be relaxed by applying post-processing such as moving window functions over the sequence of thresholded prediction scores. We consider such processing to be out-of-scope for this paper, but we suspect it can improve practical alarm system performance significantly, both in terms of recall and false alarm rate. The temporal prevalence of ICH in our cohort was 3.9%, and the per-segment mean hypertensive ratio was 3.0% (std: 10.1%).
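The labeling rule described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `label_ich` and its parameters are hypothetical, the input is assumed to be one mean ICP value per minute, and "any event in the next 8 hours" is interpreted here as "at least one minute belonging to an event lies within the next 480 minutes", which the text does not spell out.

```python
import numpy as np


def label_ich(icp_minute_means, threshold=20.0, min_run=5, horizon=480):
    """Sketch of ICH event detection and labeling on a 1-minute grid.

    Returns three boolean arrays of the same length as the input:
    in_event (minute belongs to an ICH event), label (positive label),
    usable (minute may be used for training/evaluation).
    """
    icp = np.asarray(icp_minute_means, dtype=float)
    n = icp.size
    above = icp > threshold  # NaN minutes compare as False

    # An event is a run of at least `min_run` successive minutes above threshold.
    in_event = np.zeros(n, dtype=bool)
    start = None
    for i in range(n + 1):
        if i < n and above[i]:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                in_event[start:i] = True
            start = None

    # Positive label at minute t if any event minute lies in (t, t + horizon].
    # Assumption: the paper may instead count event onsets only.
    csum = np.concatenate(([0], np.cumsum(in_event)))
    lo = np.arange(n) + 1
    hi = np.minimum(np.arange(n) + horizon + 1, n)
    label = (csum[hi] - csum[lo]) > 0

    # Only minutes at which the patient is not already hypertensive are used.
    usable = ~in_event
    return in_event, label, usable
```

In such a sketch, training and evaluation would be restricted to `label[usable]`, so that alarms are only requested in phases where ICP is not already elevated.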
III-D Physiological feature extraction framework
Basic block functions
During the feature generation process, so-called basic block functions are computed online on non-overlapping windows containing 1 minute of high-frequency waveforms/time series, corresponding to 60 samples at 1 Hz or 7500 samples at 125 Hz. The choice of 1 minute as a basic interval makes computation of complex morphological functions tractable, increases robustness to signal artifacts and sensor detachments, and allows predictions to be updated every minute. Before computing basic block functions, a window is pre-processed by removing physiologically implausible values. If at least half of the samples are valid, we reconstruct the remaining samples by linear interpolation. Otherwise, the block is emitted as invalid, marked by a symbolic value. Basic block functions are then computed on valid blocks. As basic block functions we have considered statistical/complexity summaries (median, interquartile range (IQR), line length, Shannon entropy), spectral band energies of waveforms, morphological pulse summaries of the wABP and the wICP waveforms, as well as cerebral auto-regulation indices describing IAC. Morphological pulse metrics are computed by an algorithm consisting of several steps. First, individual pulses on wABP/wICP are segmented, using variants of known algorithms [60, 61, 62], with the help of the II waveform as a reference to identify pulse onsets. Valid pulses in the window are then temporally scaled to make their lengths comparable, overlaid and averaged point-wise, yielding an averaged pulse. Morphological pulse metrics, modeled on those described by Hu et al. and Almeida et al., are then computed on the averaged pulse. A complete overview of the basic block functions is provided in Table I.
|Statistical/complexity summaries (ICP, CPP, ABPm/d/s)|
|Median, interquartile range, line length, Shannon entropy|
|Spectral band energy metrics (wICP, wABP, wPLETH, wRESP, II)|
|Energy in frequency bands [0,1], [1,2], [2,3], [3,6], [6,9], [9,12], [12,15] Hz|
|Autoregulation indices on time series (1 Hz sample rate)|
|AmpIndex(ICP,ABPm), AmpIndex(ICP,CPP), AmpIndex(CPP,ABPm)|
|PrxIndex(ICP,CPP,ABPm) [65, 66]|
|RapIndex(ICP,CPP) [67, 68]|
|TFIndex(ICP,ABPm), TFIndex(ICP,CPP), TFIndex(CPP,ABPm)|
|Autoregulation indices on waveforms (125 Hz sample rate)|
|Morphological pulse metrics on waveforms|
|wABP pulse descriptor (17 metrics):|
|A, UpstrokeTime, TimeAt, TimeAtDw, DownstrokeTime,|
|R1, R2, R3, R4, R5, R6, Aix|
|wICP pulse descriptor (20 metrics):|
|Mean, Dias, DP1, DP2, DP3, DP12, DP13, DP23,|
|L1, L2, L3, L12, L13, L23, Curv1, Curv2,|
|Curv3, Slope, DecayTimeConst, AverageLatency|
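To make the block-level processing more concrete, the following is a minimal sketch of the pre-processing step and of the simpler basic block functions (statistical/complexity summaries and spectral band energies). All function names, the plausibility limits passed as `lo`/`hi`, the histogram-based entropy estimate with 16 bins, and the use of a plain FFT periodogram are assumptions made for illustration; the source does not specify these details. Auto-regulation indices and pulse morphology metrics are not covered.

```python
import numpy as np

FS_WAVEFORM = 125  # Hz, waveform sample rate stated in the text
BANDS_HZ = [(0, 1), (1, 2), (2, 3), (3, 6), (6, 9), (9, 12), (12, 15)]


def preprocess_block(x, lo, hi):
    """Remove implausible samples; return the repaired block or None if invalid.

    `lo` and `hi` are hypothetical plausibility limits for the channel.
    """
    x = np.asarray(x, dtype=float)
    valid = np.isfinite(x) & (x >= lo) & (x <= hi)
    if valid.sum() < 0.5 * x.size:
        return None  # fewer than half of the samples are valid
    idx = np.arange(x.size)
    # Linear interpolation; np.interp holds the edge values constant.
    return np.interp(idx, idx[valid], x[valid])


def statistical_summaries(x, n_bins=16):
    """Median, IQR, line length and a histogram-based Shannon entropy."""
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return {
        "median": float(q50),
        "iqr": float(q75 - q25),
        # Assumed definition: sum of absolute successive differences.
        "line_length": float(np.sum(np.abs(np.diff(x)))),
        "shannon_entropy": float(-np.sum(p * np.log2(p))),
    }


def band_energies(x, fs=FS_WAVEFORM, bands=BANDS_HZ):
    """Spectral energy per frequency band from an unnormalized periodogram."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return [float(power[(freqs >= f0) & (freqs < f1)].sum()) for f0, f1 in bands]
```

A one-minute waveform block would first be passed through `preprocess_block`; only if the result is not `None` would the summary functions be applied to it.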
Computed basic block features are appended to a multi-scale history buffer using an online algorithm, with one batch of features appended per minute. If a block is invalid or some features cannot be computed, for example due to missing signals, they are forward filled from the last valid feature in the history. If there is no valid feature in the recent past, the feature value is set to the median of that feature value in the accumulated history. After the multi-scale history buffer is updated, a new batch of machine learning features is emitted by summarizing the current state of the history buffer. As summary functions, we use the median (average value of a basic feature over the history), the IQR (variability of a basic feature over the history) and the slope of a fitted regression line (trend of a basic feature over the history). These summary functions are applied separately over the last 15, 30, 60, 120, 240, 360 and 480 minutes to capture pre-hypertensive patterns at various scales of the feature buffer history. After the full feature matrix is built, we standardize all feature columns to have zero mean and unit standard deviation, using statistics from the training dataset. Missing values are replaced by zero, which corresponds to global mean imputation. For machine learning models that can deal with missing data natively, like decision trees or tree ensembles, missing data imputation/normalization is not performed. The online signal processing and feature generation algorithms were implemented using the numerical packages SciPy and NumPy in Python 3.6.
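The summarization of the history buffer can be sketched as follows for a single basic block feature. The names `summarize_history` and `standardize`, the feature-key format, and the choice to use whatever history is available when fewer minutes than the scale have been recorded are assumptions for illustration, not details taken from the source.

```python
import numpy as np

SCALES_MIN = (15, 30, 60, 120, 240, 360, 480)  # history lengths from the text


def summarize_history(history, scales=SCALES_MIN):
    """Median, IQR and regression slope of one feature over several scales.

    `history` holds one value per minute, most recent value last.
    """
    h = np.asarray(history, dtype=float)
    feats = {}
    for s in scales:
        w = h[-s:]  # assumption: use the available history if shorter than s
        if w.size < 2:
            feats[f"median_{s}m"] = np.nan
            feats[f"iqr_{s}m"] = np.nan
            feats[f"slope_{s}m"] = np.nan
            continue
        q25, q50, q75 = np.percentile(w, [25, 50, 75])
        # Slope of a least-squares line, in feature units per minute.
        slope = np.polyfit(np.arange(w.size), w, 1)[0]
        feats[f"median_{s}m"] = float(q50)
        feats[f"iqr_{s}m"] = float(q75 - q25)
        feats[f"slope_{s}m"] = float(slope)
    return feats


def standardize(X_train, X):
    """Z-score columns with training statistics; impute missing values as 0."""
    X_train = np.asarray(X_train, dtype=float)
    X = np.asarray(X, dtype=float)
    mu = np.nanmean(X_train, axis=0)
    sd = np.nanstd(X_train, axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # guard against constant columns
    Z = (X - mu) / sd
    # After standardization, 0 is the training mean (global mean imputation).
    return np.where(np.isnan(Z), 0.0, Z)
```

With 7 scales and 3 summary functions, each basic block feature yields 21 machine learning features in this sketch.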
Machine learning models
As machine learning models we have considered LogReg, an L2-regularized logistic regression model optimized using stochastic gradient descent; Tree, a single decision tree; GradBoost, a gradient-boosting ensemble of decision trees; and MLP, a multi-layer perceptron with a sigmoid activation function. The GradBoost model was used to compute SHAP values and feature rankings per split, using the recently introduced TreeShap method. Implementation details and hyper-parameter grids for all machine learning models are listed in the supplementary material.
Prediction models are evaluated using precision at 70%, 80% and 90% recall, which reflects our prior belief that an alarm system for ICH should have high sensitivity, whereas false alarms are more tolerable and can be reduced with post-processing defined on top of the sequence of prediction scores. All feature ablation experiments report these three metrics. Before ablating to specific feature subsets, models are pruned to the 500 most important features, according to mean absolute SHAP values. This reduces overfitting to non-informative features, which are numerous due to the broad range of feature combinations. The 95% standard-error-based confidence intervals of performance metrics, which are used in all figures and tables, are constructed by drawing 10 randomized train/validation/test splits (proportions 50:25:25%) with respect to complete recording segments. Splits are stratified with respect to positive prevalence of the label. The experiments performed per split are completely independent. The training set was used for model fitting, while the validation set was used for implementing early stopping heuristics, choosing the optimal set of hyperparameters, and computing mean absolute SHAP values that define the reported feature rankings. Each split is associated with a distinct feature ranking, which we integrate over in our feature importance analysis. The test set was used to compute all reported performance metrics. To account for test set variability in addition to training process variability, we draw 100 bootstrap samples (each 50% of the test-set size) with replacement from the test set of each split, yielding 1000 replicates overall. The best-performing models, including those indistinguishable from the best based on overlapping 95% confidence intervals, are listed in bold-face. An overview of the feature ablation results can be found in Tables II, III and Figure 3. In these, we only show results for the LogReg model, which yields the highest overall performance (Table IV).
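As an illustration of the evaluation metric, the sketch below computes precision at a fixed recall level and a bootstrap-based confidence interval on a single test set. The function names, the choice of the highest score threshold that reaches the target recall, and the interval half-width of 1.96 standard errors are assumptions; the source only states that the intervals are standard-error-based.

```python
import numpy as np


def precision_at_recall(y_true, scores, target_recall):
    """Precision at the highest score threshold reaching the target recall."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    if y.sum() == 0:
        return float("nan")  # recall is undefined without positives
    order = np.argsort(-s, kind="stable")
    y_sorted, s_sorted = y[order], s[order]
    tp = np.cumsum(y_sorted)  # true positives if the top-k points are alarms
    recall = tp / y.sum()
    precision = tp / np.arange(1, y.size + 1)
    # Only positions where the score changes correspond to real thresholds.
    is_threshold = np.append(s_sorted[1:] != s_sorted[:-1], True)
    ok = (recall >= target_recall) & is_threshold
    if not ok.any():
        return float("nan")
    return float(precision[np.argmax(ok)])


def bootstrap_ci(y_true, scores, target_recall, n_boot=100, frac=0.5, seed=0):
    """Mean and 1.96 * standard error of the metric over bootstrap samples."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    m = int(frac * y.size)
    vals = []
    for _ in range(n_boot):
        idx = rng.integers(0, y.size, size=m)  # sampling with replacement
        if y[idx].any():
            vals.append(precision_at_recall(y[idx], s[idx], target_recall))
    vals = np.asarray(vals)
    half_width = 1.96 * vals.std(ddof=1) / np.sqrt(vals.size)
    return float(vals.mean()), float(half_width)
```

In the setting described above, the replicates from all 10 splits would be pooled before the interval is computed; the sketch handles one split only.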
Low-frequency time series channels
As a sanity check, we trained several models that do not use any features derived from high-frequency waveforms. The results, shown in the first part of Table II, indicate that ICP is the single most valuable time series across all desired recall levels. The addition of ABP/CPP context information leads to consistent performance increases.
|Channels||Precision @ 70% recall||Precision @ 80% recall||Precision @ 90% recall|
|ICP||0.364 ± 0.007||0.342 ± 0.005||0.308 ± 0.004|
|ABP||0.286 ± 0.002||0.283 ± 0.002||0.274 ± 0.002|
|CPP||0.270 ± 0.002||0.262 ± 0.002||0.259 ± 0.002|
|ICP+ABP+CPP (1 Hz)||0.378 ± 0.008||0.350 ± 0.006||0.326 ± 0.005|
|+wICP||0.400 ± 0.005||0.357 ± 0.005||0.315 ± 0.004|
|+wICP/ABP||0.442 ± 0.006||0.405 ± 0.005||0.347 ± 0.003|
|only wICP||0.395 ± 0.006||0.355 ± 0.005||0.290 ± 0.003|
|only wICP/wABP||0.428 ± 0.005||0.392 ± 0.003||0.348 ± 0.004|
Importance of high-frequency waveform metrics
Taking the best-performing time series model (from the first part of Table II), we tested whether adding features derived from 125 Hz waveforms has a positive effect on the prediction performance (second part of Table II). Our results indicate that adding wICP improves precision at 70% recall (0.400 vs. 0.378), although not at 90% recall (0.315 vs. 0.326), and that the joint use with wABP yields gains at all three recall levels. Using only the wICP and wABP waveform channels shows consistently higher performance than just using time series.
Morphological, spectral energy metrics and cerebral auto-regulation indices
Morphological pulse metrics, cerebral auto-regulation indices and band energy have each been shown to exhibit characteristic changes prior to hypertensive events. We tested whether such changes can translate into performance benefits when the corresponding features are added to the model. Our results are summarized in the first part of Table III. Incremental additions of feature categories (ordered roughly by computational cost and algorithmic complexity) lead to consistent performance increases at 70% and 80% recall. At 90% recall, precision increases up to the addition of band energies and then plateaus (0.347 after adding pulse morphology vs. 0.348 before).
|Feature set||Precision @ 70% recall||Precision @ 80% recall||Precision @ 90% recall|
|Statistical/Complexity||0.364 ± 0.003||0.341 ± 0.003||0.324 ± 0.004|
|+AutoRegIndices||0.387 ± 0.006||0.365 ± 0.005||0.333 ± 0.004|
|+BandEnergy||0.404 ± 0.007||0.383 ± 0.006||0.348 ± 0.005|
|+PulseMorphology||0.442 ± 0.006||0.405 ± 0.005||0.347 ± 0.003|
|Location||0.417 ± 0.006||0.387 ± 0.006||0.343 ± 0.005|
|Loc+Trend||0.445 ± 0.006||0.417 ± 0.005||0.363 ± 0.003|
|Loc+Trend+Variation||0.442 ± 0.006||0.405 ± 0.005||0.347 ± 0.003|
Multi-scale history summary modes
It has been reported in the literature that variability or trends of individual metrics are predictive of ICH events. Using different feature buffer summarization functions, we tested whether such features are indeed valuable compared to metric averages. Our results, listed in the second part of Table III, suggest that trend functions provide benefits, whereas additional variability functions have no marginal positive effect.
How much history do we need to store?
Given the benefits of complex features, it is still unclear whether any change in pre-hypertensive patterns occurs during the short- or also long-term history before the event. We tried to answer this question by ablating the set of multi-scale summary functions supported by our framework. Our results (Figure 3) indicate that there seem to be no clear saturation effects when adding averages/trends over additional length scales, and that the largest performance gains are provided by adding summaries of the last 6 and the last 8 hours of data.
Comparison of optimal model with baselines
In a last step, we evaluated different machine learning methods applied to our optimal model and compared them with two baselines from the literature. We simulated the method of Hu et al. (BL1: ICP morphology) by computing medians of ICP pulse morphology metrics over the last 15/30 minutes, which is similar to their pre-hypertensive segment features. A second baseline implements the recently proposed method by Myers et al. (BL2: Last 2 points + Time to last crisis), which uses three features: the last 2 ICP values in a 30-minute window and the time since the last ICH event. If there was no such event, the last feature was set to a large value. Results, summarized in Table IV, show that the simplest machine learning model, i.e. LogReg, performed best, while more complicated models like neural networks (MLP) or tree-based methods (Tree/GradBoost) provided no improvements. Our optimal model significantly outperformed the two baselines. N/A is shown as a table entry if the recall could not be achieved by a model in all splits.
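For illustration, the three BL2 features could be computed along the following lines. This is a rough sketch of our reading of the description above, not the implementation of Myers et al.: the function name `bl2_features`, the placeholder value `NO_EVENT_MINUTES`, the fallback when fewer than two valid ICP values exist in the window, and the measurement of elapsed time from the last minute that belonged to an event are all assumptions.

```python
import numpy as np

NO_EVENT_MINUTES = 1e6  # hypothetical "large value" when no prior event exists


def bl2_features(icp_minute_means, in_event, t, window=30):
    """Sketch of the BL2 features at minute index t.

    Returns [second-to-last ICP, last ICP, minutes since last ICH event].
    `in_event` is a boolean array marking minutes that belong to ICH events.
    """
    icp = np.asarray(icp_minute_means, dtype=float)
    ev = np.asarray(in_event, dtype=bool)

    # Last two valid ICP values within the most recent `window` minutes.
    recent = icp[max(0, t - window + 1): t + 1]
    recent = recent[np.isfinite(recent)]
    if recent.size >= 2:
        second_last, last = recent[-2], recent[-1]
    elif recent.size == 1:
        second_last = last = recent[-1]  # assumed fallback
    else:
        second_last = last = np.nan

    # Minutes elapsed since the most recent event minute up to time t.
    past = np.flatnonzero(ev[: t + 1])
    since = float(t - past[-1]) if past.size else NO_EVENT_MINUTES
    return np.array([second_last, last, since])
```

Such a three-dimensional feature vector would then be passed to the same LogReg model that is used for the other feature sets.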
|Model||Precision @ 70% recall||Precision @ 80% recall||Precision @ 90% recall|
|Best model (LogReg)||0.445 ± 0.006||0.417 ± 0.005||0.363 ± 0.003|
|Best model (MLP)||0.421 ± 0.006||0.384 ± 0.005||0.338 ± 0.004|
|Best model (Tree)||0.280 ± 0.002||N/A||N/A|
|Best model (GradBoost)||0.352 ± 0.002||0.320 ± 0.003||N/A|
|BL1: ICP morphology (LogReg)||0.400 ± 0.004||0.362 ± 0.005||0.307 ± 0.004|
|BL2: Last 2 points + Time to last crisis (LogReg)||0.402 ± 0.005||0.369 ± 0.005||0.303 ± 0.003|
Ranking of most important physiological metrics
By computing mean absolute SHAP values on the validation set in all 10 splits, we obtained a joint ranking of the importance of individual physiological metrics. The 20 most important features of our final model are listed in Table V. Features that have identical signatures but are computed over distinct scales are not repeated. Instead, scales that belong to the top 50 features overall are listed in the last column. More details on important features are listed in Tables VII (wICP/wABP waveform) and VIII (auto-regulation indices).
|Rank||Feature descriptor||Important scales|
|26||Time since admission||N/A|
|39||Current ICP median||N/A|
As a complementary analysis to feature ablation, we also looked at which feature categories provided important features according to rankings of mean absolute SHAP values. To enable an easier comparison, we computed the fraction of actual inclusions in the top 200 features (per split) over the number of theoretically possible inclusions. Results are summarized in Table VI. Waveforms contribute more to highly ranked features than time series, both in absolute and relative terms. In addition, several important features are auto-regulation indices, spectral energies or morphological summaries. Finally, long-scale history summaries between 4 and 8 hours also provide many highly ranked features.
|Feature descriptor||Inclusion count||Normalized inclusion count|
|Base feature function|
|History length [mins]|
Overall model performance
The prediction performance of the fixed optimal feature set for relevant event recalls is shown in Figure 4. To provide more insights into the behavior of a derived alarm system in clinical settings, we have analyzed the recall of alarms before events, conditional on the time until the ICH phase starts. This measures the timeliness of alarms given a fixed model with a constant overall false alarm rate, which is a realistic scenario of clinical implementation. Three false alarm rates that yield sensitive retrieval in excess of 80% across the entire range of time-before-events were chosen. As other models did not perform competitively, only the results for LogReg are shown. We can observe a modest decay of alarm recall rates, which stay above 80-90% even 8 hours prior to the event.
|Rank||Feature descriptor||Most important scales|
|Rank||Feature descriptor||Most important scales|
We have designed and evaluated a prediction framework for intracranial hypertension events, which describes the neurological state of a patient using multi-scale descriptors of cerebral autoregulation indices, waveform morphology metrics, spectral energy and statistical summaries. Critical events were retrieved up to 8 hours before the onset of ICH (Figure 4) with a recall of 90% at a precision of 36%. By analyzing the system using recall/precision, we have chosen metrics that translate more easily to the clinical deployment of alarm systems than standard metrics like AUROC, and that are more relevant in the context of rare events.
The results in Table II provide an interesting perspective on the design of ICH alarm systems. While building a model just using time series defining the event status (ICP) provides a good baseline performance, the inclusion of context information (CPP/ABP) and richer data modalities like waveforms can substantially increase performance. Including high-frequency context information (wABP) in addition to wICP increases the performance considerably. This hints at new independent information in the ABP waveform and supports the importance of auto-regulation indices, which are functions of two waveforms simultaneously. However, data storage and computational cost associated with waveform data might be considerable, even though our framework has faster than real-time performance.
To our knowledge, there is no previous work that assesses the relative merits of different data modalities for ICH prediction. It is an interesting finding that only using waveforms performed better than only time series, especially when compared to recent related works, which found high performance using very simple models, e.g. using only minute-by-minute summaries.
Each individual feature category among auto-regulation indices, spectral energy and pulse morphology metrics provides marginal performance gains (Table III) and is relevant for explaining predictions, as assessed using SHAP values (Table VI). This shows that more complex characteristic pre-hypertensive changes can translate into relevant machine learning features in our framework. It is surprising that inclusion of variability summaries of basic metrics (Table III) decreased performance. This might be related to saturation effects and the introduction of correlated features, leading to overfitting.
Our results (Figure 3) show a clear trend between the length of considered history and prediction performance, which confirms the design principle of the multi-scale history, and also hints at the relative importance of long-scale physiological changes before hypertensive events, which could inform clinical studies. The same observation can be derived from the analysis of feature category importance in Table VI, where a clear trend in importance from short-term to long-term features is visible.
The comparison of machine learning models (Table IV) provides a pragmatic look at the relevance of the exact statistical learning method for predicting ICH. As has been observed also for other prediction problems in health care, simple models perform surprisingly well, and are not outperformed by models with modestly higher complexity. It is conceivable however, that the data available for this study was not sufficient in size to answer this question, and there might be a break-even point when the more complex functions learnable by MLPs outperform logistic regression on this problem. We also suspect that, since the feature choices already incorporate extensive domain knowledge, a simple model like logistic regression is powerful enough. Given the similar performance of MLP and LogReg, we did not consider the construction of more complicated models like RNNs or CNNs for this study.
We have presented an online machine learning and signal processing framework that forecasts onsets of intracranial hypertension up to 8 hours in advance. Using an extensive series of ablation studies, we have shed light on the critical components of our ICH prediction framework. An investigation of feature importances using SHAP values provided a second perspective on the importance of different feature categories in explaining predictions, as well as a ranking of discriminative physiological pattern changes that occur before intracranial hypertension. Both perspectives highlight the importance of information derived from high-frequency waveforms for this prediction problem, which provided a substantial performance increase. Our method outperformed two baseline methods from the literature, which use ICP pulse morphology and three simple features of the ICP time series, respectively.
Directions of future work include more sophisticated signal cleaning and artifact detection methods at the block level, aiming to minimize the corruption of downstream feature generation, which is highly sensitive to the accuracy of the input signals. Exploring the per-sample SHAP values could provide interpretable reasons for predictions of future ICH events, visualize regions of interest in the history, flag abnormal physiological indices that could precede ICH, and generate hypotheses for future studies on the phenomenon. Furthermore, our method could be extended by providing a calibrated alarm system on top of the prediction scores, which triggers alarms at the bedside as a function of sequences of prediction scores. This would be an important step towards clinical implementation of our proposed approach. Finally, we expect that our framework could be applied to predict other critical events occurring in the injured brain.
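One simple way in which alarms could be derived from sequences of prediction scores is a k-of-n rule with a silencing period, sketched below. This is only an illustration of the general idea and not a proposed or validated design: the function name, the threshold, the values of k and n and the silencing duration are hypothetical, and the sketch does not address calibration of the scores.

```python
def alarms_from_scores(scores, threshold, k, n, silence_steps):
    """Return the time indices at which an alarm fires.

    An alarm fires when at least k of the last n scores are >= threshold.
    After an alarm, the next `silence_steps` time steps are skipped.
    """
    alarm_indices = []
    silenced_until = -1
    for i in range(len(scores)):
        if i <= silenced_until:
            continue
        # Sliding window over the most recent n scores (fewer at the start).
        recent = scores[max(0, i - n + 1): i + 1]
        if sum(s >= threshold for s in recent) >= k:
            alarm_indices.append(i)
            silenced_until = i + silence_steps
    return alarm_indices


# Toy score sequence with two sustained elevations.
scores = [0.1, 0.7, 0.2, 0.8, 0.9, 0.9, 0.3, 0.2, 0.1, 0.8, 0.9, 0.9]
alarms = alarms_from_scores(scores, threshold=0.6, k=3, n=4, silence_steps=5)
# A single isolated spike does not satisfy the k-of-n rule.
isolated = alarms_from_scores([0.1, 0.9, 0.1, 0.1], 0.6, 3, 4, 5)
```

Requiring several elevated scores within a window trades a short detection delay for fewer alarms caused by isolated score spikes, and the silencing period avoids repeated alarms for the same episode.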
We acknowledge Emanuela Keller, head of the Neurocritical Care Unit at the University Hospital Zürich, Switzerland, for providing indispensable clinical insights and motivation for this work. MH gratefully acknowledges helpful discussions with Panagiotis Farantatos, Viktor Gal, Stephanie Hyland, Xinrui Lyu, Ngoc M. Pham and Gunnar Rätsch.
-  J. A. Langlois et al., “The epidemiology and impact of traumatic brain injury: a brief overview,” J. Head Trauma Rehabil., vol. 21, no. 5, pp. 375–378, 2006.
-  H. M. Bramlett and W. D. Dietrich, “Pathophysiology of cerebral ischemia and brain trauma: similarities and differences,” J. Cereb. Blood Flow Metab., vol. 24, no. 2, pp. 133–150, Feb. 2004.
-  M. Oddo et al., “Brain hypoxia is associated with short-term outcome after severe traumatic brain injury independently of intracranial hypertension and low cerebral perfusion pressure,” Neurosurgery, vol. 69, no. 5, pp. 1037–1045, Nov. 2011. [Online]. Available: http://dx.doi.org/10.1227/NEU.0b013e3182287ca7
-  T. Rehman et al., “Rapid progression of traumatic bifrontal contusions to transtentorial herniation: A case report,” Cases J., vol. 1, no. 1, p. 203, Oct. 2008. [Online]. Available: http://dx.doi.org/10.1186/1757-1626-1-203
-  C. Werner and K. Engelhard, “Pathophysiology of traumatic brain injury,” Br. J. Anaesth., vol. 99, no. 1, pp. 4–9, Jul. 2007. [Online]. Available: http://dx.doi.org/10.1093/bja/aem131
-  A. Lavinio and D. K. Menon, “Intracranial pressure: why we monitor it, how to monitor it, what to do with the number and what’s the future?” Current Opinion in Anesthesiology, vol. 24, no. 2, pp. 117–123, 2011.
-  N. Carney et al., “Guidelines for the management of severe traumatic brain injury, 4th edition,” Neurosurgery, vol. 80, no. 1, pp. 6–15, Jan. 2017. [Online]. Available: https://academic.oup.com/neurosurgery/article-abstract/80/1/6/2585042
-  D.-J. Kim et al., “Continuous monitoring of the Monro-Kellie doctrine: is it possible?” J. Neurotrauma, vol. 29, no. 7, pp. 1354–1363, May 2012. [Online]. Available: http://dx.doi.org/10.1089/neu.2011.2018
-  J. A. Claassen et al., “Transfer function analysis of dynamic cerebral autoregulation: A white paper from the international cerebral autoregulation research network,” J. Cereb. Blood Flow Metab., vol. 36, no. 4, pp. 665–680, 2016. [Online]. Available: http://journals.sagepub.com/doi/abs/10.1177/0271678X15626425
-  L. Rangel-Castilla et al., “Cerebral pressure autoregulation in traumatic brain injury,” Neurosurg. Focus, vol. 25, no. 4, p. E7, Oct. 2008. [Online]. Available: http://dx.doi.org/10.3171/FOC.2008.25.10.E7
-  L. Rangel-Castilla et al., “Management of intracranial hypertension,” Neurol. Clin., vol. 26, no. 2, pp. 521–541, 2008.
-  J. McNames et al., “Precursors to rapid elevations in intracranial pressure,” in Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 4, 2001, pp. 3977–3980. [Online]. Available: http://dx.doi.org/10.1109/IEMBS.2001.1019715
-  T. F. Bardt et al., “Monitoring of brain tissue PO2 in traumatic brain injury: effect of cerebral hypoxia on outcome,” Acta Neurochir. Suppl., vol. 71, pp. 153–156, 1998. [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/9779171
-  S. Badri et al., “Mortality and long-term functional outcome associated with intracranial pressure after traumatic brain injury,” Intensive Care Med., vol. 38, no. 11, pp. 1800–1809, 2012.
-  B. W. Bonds et al., “Predictive value of hyperthermia and intracranial hypertension on neurological outcomes in patients with severe traumatic brain injury,” Brain Inj., vol. 29, no. 13-14, pp. 1642–1647, Oct. 2015. [Online]. Available: http://dx.doi.org/10.3109/02699052.2015.1075157
-  E. Karamanos et al., “Intracranial pressure versus cerebral perfusion pressure as a marker of outcomes in severe head injury: a prospective evaluation,” Am. J. Surg., vol. 208, no. 3, pp. 363–371, Sep. 2014. [Online]. Available: http://dx.doi.org/10.1016/j.amjsurg.2013.10.026
-  M. Majdan et al., “Timing and duration of intracranial hypertension versus outcomes after severe traumatic brain injury,” Minerva Anestesiol., vol. 80, no. 12, pp. 1261–1272, Dec. 2014. [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/24622160
-  A. Bhatia and A. K. Gupta, “Neuromonitoring in the intensive care unit. I. Intracranial pressure and cerebral blood flow monitoring,” Intensive Care Med., vol. 33, no. 7, pp. 1263–1271, 2007.
-  M. Feng et al., “Artifact removal for intracranial pressure monitoring signals: a robust solution with signal decomposition,” Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2011, pp. 797–801, 2011. [Online]. Available: http://dx.doi.org/10.1109/IEMBS.2011.6090182
-  R. Sahjpaul and M. Girotti, “Intracranial pressure monitoring in severe traumatic brain injury–results of a Canadian survey,” Can. J. Neurol. Sci., vol. 27, no. 2, pp. 143–147, 2000.
-  M.-C. Chambrin et al., “Multicentric study of monitoring alarms in the adult intensive care unit (ICU): a descriptive analysis,” Intensive Care Med., vol. 25, no. 12, pp. 1360–1366, Dec. 1999. [Online]. Available: https://doi.org/10.1007/s001340051082
-  J. McNames et al., “Sensitive precursors to acute episodes of intracranial hypertension,” in The 4th International Workshop in Biosignal Interpretation, Como, Italy, 2002, pp. 303–306. [Online]. Available: https://pdfs.semanticscholar.org/0cf1/93cd3d50d9882ddd463874b442fc795148ee.pdf
-  X. Hu et al., “Forecasting ICP elevation based on prescient changes of intracranial pressure waveform morphology,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 5, pp. 1070–1078, 2010.
-  R. B. Myers et al., “Predicting intracranial pressure and brain tissue oxygen crises in patients with severe traumatic brain injury,” Crit. Care Med., vol. 44, no. 9, pp. 1754–1761, 2016.
-  S. M. Lundberg et al., “Consistent individualized feature attribution for tree ensembles,” Feb. 2018. [Online]. Available: http://arxiv.org/abs/1802.03888
-  V. De Luca et al., “Temporal prediction of cerebral hypoxia in neurointensive care patients: a feasibility study,” in 16th International Symposium on Intracranial Pressure and Neuromonitoring, 2016.
-  M. Hüser et al., “Forecasting intracranial hypertension using waveform and time series features,” in Vasospasm 2015 - 13th International Conference on Neurovascular Events after Subarachnoid Hemorrhage, 2015.
-  R. Hornero et al., “Interpretation of approximate entropy: analysis of intracranial pressure approximate entropy during acute intracranial hypertension,” IEEE Transactions on Biomedical Engineering, vol. 52, no. 10, pp. 1671–1680, 2005.
-  J.-Y. Fan et al., “An approach to determining intracranial pressure variability capable of predicting decreased intracranial adaptive capacity in patients with traumatic brain injury,” Biol. Res. Nurs., vol. 11, no. 4, pp. 317–324, Apr. 2010. [Online]. Available: http://dx.doi.org/10.1177/1099800409349164
-  P. Naraei et al., “Toward learning intracranial hypertension through physiological features: A statistical and machine learning approach,” in Intelligent Systems Conference, 2017, pp. 395–399. [Online]. Available: http://dx.doi.org/10.1109/IntelliSys.2017.8324324
-  F. A. Zeiler et al., “A description of a new continuous physiological index in traumatic brain injury using the correlation between pulse amplitude of intracranial pressure and cerebral perfusion pressure,” J. Neurotrauma, vol. 35, no. 7, pp. 963–974, 2018. [Online]. Available: https://www.liebertpub.com/doi/abs/10.1089/neu.2017.5241
-  M. J. H. Aries et al., “Continuous monitoring of cerebrovascular reactivity using pulse waveform of intracranial pressure,” Neurocrit. Care, vol. 17, no. 1, pp. 67–76, Aug. 2012. [Online]. Available: http://dx.doi.org/10.1007/s12028-012-9687-z
-  D. K. Radolovich et al., “Pulsatile intracranial pressure and cerebral autoregulation after traumatic brain injury,” Neurocrit. Care, vol. 15, no. 3, pp. 379–386, Dec. 2011. [Online]. Available: http://dx.doi.org/10.1007/s12028-011-9553-4
-  N. Kim et al., “Trending autoregulatory indices during treatment for traumatic brain injury,” J. Clin. Monit. Comput., vol. 30, no. 6, pp. 821–831, 2016.
-  M. Balestreri et al., “Intracranial hypertension: what additional information can be derived from ICP waveform after head injury?” Acta Neurochir., vol. 146, no. 2, pp. 131–141, Feb. 2004. [Online]. Available: http://dx.doi.org/10.1007/s00701-003-0187-y
-  C. J. Kirkness et al., “Intracranial pressure waveform analysis: clinical and research implications,” J. Neurosci. Nurs., vol. 32, no. 5, pp. 271–277, Oct. 2000. [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/11089200
-  F. Scalzo et al., “Intracranial hypertension prediction using extremely randomized decision trees,” Med. Eng. Phys., vol. 34, no. 8, pp. 1058–1065, 2012.
-  R. Hamilton et al., “Forecasting intracranial pressure elevation using pulse waveform morphology,” in Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, 2009, pp. 4331–4334.
-  P. K. Eide, “A new method for processing of continuous intracranial pressure signals,” Med. Eng. Phys., vol. 28, no. 6, pp. 579–587, 2006.
-  F. Scalzo et al., “Reducing false intracranial pressure alarms using morphological waveform features,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 1, pp. 235–239, Jan. 2013. [Online]. Available: http://dx.doi.org/10.1109/TBME.2012.2210042
-  F. Scalzo et al., “Intracranial pressure signal morphology: Real-time tracking,” IEEE Pulse, vol. 3, no. 2, pp. 49–52, March 2012.
-  X. Hu et al., “Morphological clustering and analysis of continuous intracranial pressure,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 3, pp. 696–705, 2009.
-  J. Wang et al., “Bag-of-words representation for biomedical time series classification,” Biomed. Signal Process. Control, vol. 8, no. 6, pp. 634–644, 2013.
-  P. Xu et al., “Improved wavelet entropy calculation with window functions and its preliminary application to study intracranial pressure,” Comput. Biol. Med., vol. 43, no. 5, pp. 425–433, Jun. 2013. [Online]. Available: http://dx.doi.org/10.1016/j.compbiomed.2013.01.022
-  C.-W. Lu et al., “Complexity of intracranial pressure correlates with outcome after traumatic brain injury,” Brain, pp. 2399–2408, 2012.
-  R. M. Jha et al., “Intracranial pressure trajectories: A novel approach to informing severe traumatic brain injury phenotypes,” Crit. Care Med., vol. 46, no. 11, pp. 1792–1802, Nov. 2018. [Online]. Available: http://dx.doi.org/10.1097/CCM.0000000000003361
-  J. Pace et al., “A clinical prediction model for raised intracranial pressure in patients with traumatic brain injuries,” J. Trauma Acute Care Surg., vol. 85, no. 2, pp. 380–386, Aug. 2018. [Online]. Available: http://dx.doi.org/10.1097/TA.0000000000001965
-  D. M. Stein et al., “Use of serum biomarkers to predict secondary insults following severe traumatic brain injury,” Shock, vol. 37, no. 6, pp. 563–568, Jun. 2012. [Online]. Available: http://dx.doi.org/10.1097/SHK.0b013e3182534f93
-  A. A. Adamides et al., “Brain tissue lactate elevations predict episodes of intracranial hypertension in patients with traumatic brain injury,” J. Am. Coll. Surg., vol. 209, no. 4, pp. 531–539, Oct. 2009. [Online]. Available: http://dx.doi.org/10.1016/j.jamcollsurg.2009.05.028
-  G. Hergenroeder et al., “Identification of serum biomarkers in brain-injured adults: potential for predicting elevated intracranial pressure,” J. Neurotrauma, vol. 25, no. 2, pp. 79–93, Feb. 2008. [Online]. Available: http://dx.doi.org/10.1089/neu.2007.0386
-  B. Quachtran et al., “Detection of intracranial hypertension using deep learning,” Proc. IAPR Int. Conf. Pattern Recogn., vol. 2016, pp. 2491–2496, Dec. 2016. [Online]. Available: http://dx.doi.org/10.1109/ICPR.2016.7900010
-  P. Naraei and A. Sadeghian, “A PCA based feature reduction in intracranial hypertension analysis,” in 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), Apr. 2017, pp. 1–6.
-  F. Güiza et al., “Novel methods to predict increased intracranial pressure during intensive care and long-term neurologic outcome after traumatic brain injury: Development and validation in a multicenter dataset,” Crit. Care Med., vol. 41, no. 2, pp. 554–564, 2013.
-  F. Güiza et al., “Early detection of increased intracranial pressure episodes in traumatic brain injury: External validation in an adult and in a pediatric cohort,” Critical Care Medicine, vol. 45, no. 3, pp. e316–e320, 2017. [Online]. Available: https://www.ingentaconnect.com/content/wk/ccm/2017/00000045/00000003/art00009
-  B. W. Bonds et al., “Predicting secondary insults after severe traumatic brain injury,” J. Trauma Acute Care Surg., vol. 79, no. 1, pp. 85–90, 2015.
-  F. Zhang et al., “Artificial neural network based intracranial pressure mean forecast algorithm for medical decision support,” in Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, 2011, pp. 7111–7114.
-  J.-S. Shieh et al., “Intracranial pressure model in intensive care unit using a simple recurrent neural network through time,” Neurocomputing, vol. 57, pp. 239–256, Mar. 2004. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231203005137
-  F. Zhang et al., “Online ICP forecast for patients with traumatic brain injury,” in Proceedings of the 21st International Conference on Pattern Recognition (ICPR), Nov. 2012, pp. 37–40. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/6460066/
-  M. Saeed et al., “Multiparameter intelligent monitoring in intensive care II (MIMIC-II): a public-access intensive care unit database,” Crit. Care Med., vol. 39, no. 5, pp. 952–960, 2011.
-  J. Pan and W. J. Tompkins, “A real-time QRS detection algorithm,” IEEE Transactions on Biomedical Engineering, no. 3, pp. 230–236, 1985.
-  X. Hu et al., “An algorithm for extracting intracranial pressure latency relative to electrocardiogram R wave,” Physiol. Meas., vol. 29, no. 4, pp. 459–471, 2008.
-  W. Zong et al., “An open-source algorithm to detect onset of arterial blood pressure pulses,” in Computers in Cardiology, 2003, pp. 259–262.
-  V. G. Almeida et al., “Machine learning techniques for arterial pressure waveform analysis,” J Pers Med, vol. 3, no. 2, pp. 82–101, May 2013. [Online]. Available: http://dx.doi.org/10.3390/jpm3020082
-  R. Esteller et al., “Line length: an efficient feature for seizure onset detection,” in Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2, 2001, pp. 1707–1710.
-  L. A. Steiner et al., “Continuous monitoring of cerebrovascular pressure reactivity allows determination of optimal cerebral perfusion pressure in patients with traumatic brain injury,” Crit. Care Med., vol. 30, no. 4, pp. 733–738, Apr. 2002.
-  M. Czosnyka et al., “Continuous assessment of the cerebral vasomotor reactivity in head injury,” Neurosurgery, vol. 41, no. 1, pp. 11–17, Jul. 1997. [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/9218290
-  C. J. Avezaat et al., “Cerebrospinal fluid pulse pressure and intracranial volume-pressure relationships,” J. Neurol. Neurosurg. Psychiatry, vol. 42, no. 8, pp. 687–700, Aug. 1979. [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/490174
-  D.-J. Kim et al., “Index of cerebrospinal compensatory reserve in hydrocephalus,” Neurosurgery, vol. 64, no. 3, pp. 494–501, Mar. 2009. [Online]. Available: http://dx.doi.org/10.1227/01.NEU.0000338434.59141.89
-  J. J. Lemaire et al., “Slow pressure waves in the cranial enclosure,” Acta Neurochir., vol. 144, no. 3, pp. 243–254, Mar. 2002. [Online]. Available: http://dx.doi.org/10.1007/s007010200032
-  R. Zhang et al., “Transfer function analysis of dynamic cerebral autoregulation in humans,” Am. J. Physiol., vol. 274, no. 1 Pt 2, pp. H233–241, Jan. 1998. [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/9458872
-  L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT. Springer, 2010, pp. 177–186.
-  G. Ke et al., “LightGBM: A highly efficient gradient boosting decision tree,” in Advances in Neural Information Processing Systems 30, 2017, pp. 3146–3154. [Online]. Available: http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf