Medically Relevant Criteria used in EEG Compression for Improved Post-Compression Seizure Detection
Biomedical signals aid in the diagnosis of different disorders and abnormalities. When targeting lossy compression of such signals, the medically relevant information that lies within the data should maintain its accuracy and thus its reliability. In fact, signal models that are inspired by the bio-physical properties of the signals at hand allow for a compression that preserves more naturally the clinically significant features of these signals. In this paper, we illustrate this through the example of EEG signals; more specifically, we analyze three specific lossy EEG compression schemes. These schemes are based on signal models that have different degrees of reliance on signal production and physiological characteristics of EEG. The resilience of these schemes is illustrated through the performance of seizure detection post compression.
Nowadays medical information management systems and transmission of biomedical signals are widely used in hospitals and clinics. In addition, transmission of biomedical signals allows medical experts to remotely evaluate the information carried by the signals in a cost-effective manner. The massive amount of data requires large storage space and channel bandwidth and therefore, this problem calls for efficient compression methods.
There is a need to efficiently compress biomedical signals while preserving the important diagnostic-oriented information that lies within this data. Lossless compression guarantees no added distortion and therefore the data remains reliable for medical analysis. Although lossless compression is more desired for medical signals, higher compression rates can be achieved using lossy techniques.
When targeting lossy compression for biomedical signals, more focus should be given to retaining medically relevant information. Thus, when coding the signals, more emphasis should be placed on this particular aspect of the data in order to achieve good compression performance. In this paper, taking the example of electroencephalography (EEG) signals, we argue that biomedical signal compression systems that take the underlying nature of the signal into account more effectively lead to better results in terms of preserving clinically significant features after compression. In the next paragraphs we focus on the characteristics of EEG signals, more specifically on the underlying generators and the different neurological aspects of these signals.
Most observed scalp EEG activity is generated within the cerebral cortex. Synchronous synaptic stimulation of a very large number of neurons results in a dipolar current source oriented orthogonal to the cortical surface. The measured EEG is the propagation of this current to the locations of the different electrodes.
Thus, EEG signals can be considered as projections of certain activities occurring inside the cerebral cortex. These projected electrical signals are measured at the electrode locations on the patient’s head. Since a limited set of neurological components is behind these observations, a lot of redundancy is present. This redundancy can be seen directly between the different recording channels and is known as spatial redundancy.
EEG is used to diagnose certain disorders and also in sleep analysis, known as polysomnography. These signals reflect the state of the patient. In fact, the different functional stages of a patient’s state of mind can be characterized by certain EEG rhythms or brain waves. Brain activity in EEG signals is usually divided into five main frequency rhythms: delta, theta, alpha, beta and gamma [3, p. 33]. The presence or absence of these waves during certain periods of recording can help determine certain abnormalities. Since these rhythms tend to naturally extend and repeat during different stages of the EEG recording, there is redundancy present at certain frequency sub-bands between different periods of recording.
When compressing these signals, the neurological characteristics that are usually used in the medical analysis of these signals can help achieve better analysis and approximation and thus better remove redundant information.
We recently suggested three different methods that target the compression of scalp EEG data using different modelling and coding techniques. These methods were developed while focusing on the neurological characteristics of these signals.
The first method is based on using classic transformation and coding techniques to compress the EEG recordings while focusing on spatial redundancy. The second method explores a common physiological characteristic of the EEG signals, more specifically brain waves, in order to develop appropriate compression methods. It focuses on extracting the redundancy present at specific frequency bands to achieve decorrelation at different time instances. In the third method, the underlying physiological sources behind the observed signals on the scalp are explored. The observed signals are modelled using these sources, which helps in extracting the mutual information present between the EEG channels.
In this paper, these three different methods are first presented. Afterwards, performance results in terms of post-compression seizure detection are shown. This recently proposed qualitative measure, which is used to compare the original and reconstructed signals, provides a better reflection of the information loss with respect to the medically relevant data. A comparative analysis that discusses the weaknesses and strengths of each suggested method is then presented. The paper ends with a conclusion and suggestions for future work.
II Compression Methods
II-A Pre-Processing of Multi-Channel EEG for Improved Compression Performance using SPIHT
As previously mentioned, in scalp recordings, EEG signals measured from certain locations on the scalp can be seen as the projection of activity located inside the brain. In fact, EEG channels display a lot of similarity and even superposition of the different signals. Thus, looking at these recordings in the spatial dimension, i.e. between different channels, is very important in capturing this redundancy.
The first method uses the discrete wavelet transform (DWT) and SPIHT in 2D to code the EEG channels. Thus, it makes use of the inter-channel redundancy present between different EEG channels of the same recording and the intra-channel redundancy between the different samples of a specific channel. SPIHT was originally suggested for the compression of 2D images, so this method exploits the basic characteristics of this type of data. More precisely, it exploits the image characteristics that most of the image’s energy is located in the low frequency components and that there is spatial self-similarity among the sub-bands.
In this SPIHT-based method, classic compression techniques that were initially targeted at 2D images are applied to matrices of EEG recordings. However, pre-processing is performed as a first step in order to optimize the performance of these coders for the characteristics of our signals.
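To make the energy-compaction argument concrete, the following minimal Python sketch applies one level of a separable 2D transform to a channels-by-samples matrix and measures how much energy falls in the low-frequency sub-band. It is an illustration only: the Haar wavelet, the function names `haar_1d`, `haar_2d` and `energy`, and the sub-band labels are assumptions made here, and the pre-processing and SPIHT coding stages of the method are not reproduced.

```python
def haar_1d(values):
    """One level of the orthonormal Haar transform of an even-length list.

    Returns (approximation, detail) coefficient lists.
    """
    s = 2 ** 0.5
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def haar_2d(matrix):
    """One level of a separable 2D Haar transform.

    `matrix` is a list of channels, each a list of samples; both the number
    of channels and the number of samples are assumed to be even.
    The first letter of each key is the filter along channels, the second
    the filter along time (L = low-pass, H = high-pass).
    """
    # Filter along time: transform every channel (row) independently.
    low_t, high_t = [], []
    for row in matrix:
        approx, detail = haar_1d(row)
        low_t.append(approx)
        high_t.append(detail)

    def along_channels(rows):
        # Filter along channels: transform every column, then transpose back.
        lows, highs = [], []
        for column in zip(*rows):
            approx, detail = haar_1d(list(column))
            lows.append(approx)
            highs.append(detail)
        return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]

    ll, hl = along_channels(low_t)
    lh, hh = along_channels(high_t)
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}


def energy(band):
    """Sum of squared coefficients of a matrix given as a list of rows."""
    return sum(v * v for row in band for v in row)


# Toy recording: four channels that are scaled copies of one slow waveform,
# mimicking strongly correlated neighboring electrodes.
base = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
recording = [[gain * v for v in base] for gain in (1.0, 0.9, 1.1, 1.0)]
bands = haar_2d(recording)
ll_fraction = energy(bands["LL"]) / energy(recording)
```

For this toy matrix almost all of the energy ends up in the `LL` sub-band, which is the property that a hierarchical coder designed for images relies on.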
II-B Dynamic Dictionary for Combined EEG Compression and Seizure Detection
When analyzing EEG signals for the purpose of medical diagnosis, brain waves are identified in order to find the different functional stages of a patient’s state of mind. As previously mentioned, these different rhythms can be used to characterize the different EEG segments. Thus, depending on the state of mind of the patient, brain waves tend to extend and repeat throughout different segments of recording. This creates redundancy between the segments.
The second suggested method, the dictionary-based method, compares EEG segments from different time periods and extracts the redundancy present between them. To do that, it focuses on the energy in the different frequency sub-bands that correspond to the different brain rhythms. DWT, dynamic reference lists and SPIHT are used to compute and code the decorrelated sub-band coefficients. This method is able to both compress EEG channels and detect seizure-like activity.
Therefore, this method uses a physiological characteristic of EEG signals, namely the different brain waves, in order to analyze the signals and remove the intrinsic redundancy between the different segments of a single EEG channel.
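The idea of comparing segments through their sub-band energies can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the Haar wavelet, the Euclidean distance, the fixed threshold and the names `subband_energies` and `match_or_add` are all hypothetical choices made here, and the actual update rule of the dynamic reference lists is not reproduced.

```python
def haar_step(values):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 2 ** 0.5
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values) - 1, 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values) - 1, 2)]
    return approx, detail


def subband_energies(segment, levels=3):
    """Energy of each detail band (finest first), then of the final approximation.

    The segment length is assumed to be a multiple of 2 ** levels.
    """
    energies = []
    current = list(segment)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in current))
    return energies


def match_or_add(references, features, threshold):
    """Find the closest stored feature vector, or store a new one.

    Returns (index, added): the index of the reference used for this segment
    and whether a new reference had to be appended to the list.
    """
    best_index, best_distance = None, None
    for index, reference in enumerate(references):
        distance = sum((r - f) ** 2 for r, f in zip(reference, features)) ** 0.5
        if best_distance is None or distance < best_distance:
            best_index, best_distance = index, distance
    if best_index is None or best_distance > threshold:
        references.append(list(features))
        return len(references) - 1, True
    return best_index, False


# Toy segments: one slow (constant) and one fast (alternating) pattern.
slow = [1.0] * 8
fast = [1.0, -1.0] * 4
references = []
first = match_or_add(references, subband_energies(slow), threshold=1.0)
repeat = match_or_add(references, subband_energies(slow), threshold=1.0)
other = match_or_add(references, subband_energies(fast), threshold=1.0)
```

A repeated rhythm maps onto an existing reference, so only its difference from that reference would need to be coded, whereas a segment with a different energy distribution creates a new entry.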
II-C EEG Compression of Scalp Recordings based on Dipole Fitting
As previously mentioned, certain neuronal generators are behind the observed EEG signals. The non-invasive localization of these generators is known as the inverse solution and is used in the medical analysis of EEG. Solving the inverse problem from the pattern of the recorded EEG yields a model that maps the generators to the measured projections on the scalp. Therefore, having solved the inverse problem, one can use such a model to generate, from the calculated dipoles, an approximation of the EEG recordings. This is known as the forward problem.
The third method, the dipole-based method, provides a deeper analysis of the intrinsic dependency between the different EEG channels. It is based on dipole fitting, which is usually used to solve the classic problems in EEG analysis: the inverse and forward problems. The suggested compression system uses dipole fitting as a first building block to provide an approximation of the recorded signals. Then, based on a smoothness factor, appropriate coding techniques are suggested to compress the residuals of the fitting process.
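The "fit, then code the residual" structure can be illustrated with a deliberately reduced example. The sketch below assumes that the per-channel gains of a single source are already known, so that the fit becomes a linear least-squares problem; a real dipole fit also searches over the dipole position and uses a head model, which is not attempted here. The function name `fit_single_source` and the `gain` vector are hypothetical.

```python
def fit_single_source(recording, gain):
    """Least-squares fit of one source waveform to a multi-channel recording.

    recording: list of channels, each a list of samples of equal length.
    gain: one assumed forward-model weight per channel (hypothetical values).
    Returns (source, residual) such that, for every channel c and sample t,
    recording[c][t] == gain[c] * source[t] + residual[c][t].
    """
    norm = sum(g * g for g in gain)
    n_samples = len(recording[0])
    # For each time sample, project the channel vector onto the gain vector.
    source = [
        sum(g * channel[t] for g, channel in zip(gain, recording)) / norm
        for t in range(n_samples)
    ]
    # What the single-source model cannot explain is left in the residual.
    residual = [
        [channel[t] - g * source[t] for t in range(n_samples)]
        for g, channel in zip(gain, recording)
    ]
    return source, residual


# Toy example: three channels driven by one waveform, plus a small deviation.
gain = [1.0, 0.5, -0.5]
waveform = [2.0, -1.0, 0.0, 4.0]
recording = [[g * s for s in waveform] for g in gain]
recording[0][0] += 0.3  # activity that the model does not capture
source, residual = fit_single_source(recording, gain)
```

When the model explains most of the recording, the residual carries little energy, which is what makes it cheaper to code than the original channels.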
III Results and Discussions
As previously mentioned, for medical signals it is important to move towards a diagnostics-oriented performance assessment. In the next section we analyze the performance of the three suggested compression methods using an automatic seizure detection system, the Stellate Harmonie System.
The data used in testing, the CHB-MIT Scalp EEG Database, was collected at the Children’s Hospital Boston. The recordings come from pediatric patients suffering from intractable seizures. These recordings are annotated by medical experts and are sampled at Hz and bits used in the recording’s precision.
Table: Detection results for the dipole-based, 1D dictionary-based and 2D SPIHT-based methods, each at 2 bps and 4 bps.
III-B Statistical Measures used in Detection Analysis
In order to test the performance of the different methods on the chosen datasets, the pre-processor of Stellate Harmonie, the ICTA-S onset detector, is used. Testing is done with data compressed at different bit rates, and flagged sections are compared in order to analyze the information loss. The statistical measures described below are similar to the ones explained in previous studies.
The percentage of true positives (TP) and the total number of false positives (FP) are used in the evaluation process. For these measures, the ground truth is taken from the detection output obtained on the original EEG records; it is equal to the total number of flagged sections found. The statistical measures for this scenario are defined as follows:
True Positive (TP): A period of one minute or more of overlap occurs between a flagged section in the compressed file and a flagged section in the original file.
False Positive (FP): No overlap, or an overlap of less than a minute, is found between a flagged section in the compressed recording and flagged sections in the original recording.
The total number of TP is divided by the total number of flagged sections in the original file in order to compute the percentage of true positives.
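The definitions above can be turned into a short scoring routine. The following Python sketch is one possible reading of them, not the evaluation code used for the reported results: flagged sections are assumed to be given as (start, end) pairs in seconds, a true positive is counted once per flagged section of the original file that is matched, and the names `overlap_seconds` and `score_detections` are hypothetical.

```python
def overlap_seconds(a, b):
    """Length in seconds of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def score_detections(original, compressed, min_overlap=60.0):
    """Compare flagged sections of a compressed file with those of the original.

    original, compressed: lists of (start, end) flagged sections in seconds.
    Returns (tp_percent, fp_count).
    """
    # TP: original flagged sections that overlap some flagged section of the
    # compressed file for at least `min_overlap` seconds (assumed counting rule).
    tp = sum(
        1
        for ref in original
        if any(overlap_seconds(ref, sec) >= min_overlap for sec in compressed)
    )
    # FP: flagged sections of the compressed file with no overlap, or an
    # overlap shorter than `min_overlap`, with every original flagged section.
    fp = sum(
        1
        for sec in compressed
        if not any(overlap_seconds(sec, ref) >= min_overlap for ref in original)
    )
    tp_percent = 100.0 * tp / len(original) if original else 0.0
    return tp_percent, fp


# Toy example with three flagged sections in each file.
original_flags = [(0.0, 120.0), (300.0, 400.0), (1000.0, 1100.0)]
compressed_flags = [(30.0, 130.0), (350.0, 380.0), (2000.0, 2100.0)]
tp_percent, fp_count = score_detections(original_flags, compressed_flags)
```

In the toy example only the first section overlaps for a full minute, so one of the three original sections is recovered and the two remaining sections of the compressed file count as false positives.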
The table shows the detection results of all three methods when the detections of Stellate Harmonie on the original, uncompressed files are taken as ground truth. Looking both at the individual patients and at the average values over all patients (shown in the last row), we notice a degradation in performance from one method to the next. The dipole-based method gives a higher percentage of true positives (TP) for most patients than the other two methods, and the dictionary-based method outperforms the 2D SPIHT-based method. The same ordering holds for the total number of false positives (FP): the number of FP for the dipole-based method is lower than that of the dictionary-based method, which in turn is lower than that of the 2D SPIHT-based method. This can be observed for most of the patients used in the testing.
The accompanying figures summarize the results shown in the table by highlighting the mean, minimum and maximum values of TP and FP for the two bit rates and the three methods. Note that, because of the minimum values of certain parameters, some bars show only two values, the average and the maximum.
These figures highlight the fact that the mean and minimum values of TP are highest for the dipole-based method and decrease as we move to the dictionary-based method and then to the 2D SPIHT-based method. In addition, all methods reach the same maximum TP value at all bit rates. The opposite is observed for the false positives, which increase across these three methods. In particular, we notice a large jump in FP for the 2D SPIHT-based method at one of the two bit rates.
The scatter plots show the percentage of true positives (TP), where the ground truth is taken as the detections of Stellate Harmonie on the original files, against the mean PRD values of all patients of the CHB-MIT database at bit rates of 2 and 4 bps. Both panels show a direct relation between distortion and true detections for the dictionary-based and the 2D SPIHT-based methods: for these two methods, PRD values vary widely and detections decrease as PRD increases. For the dipole-based method, however, at both 2 and 4 bps, PRD values vary little and all TP values are high, so there is no apparent relationship between distortion and detections.
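For reference, the distortion measure can be computed as in the sketch below. It assumes the commonly used definition of the percent root-mean-square difference, PRD = 100 * sqrt(sum((x - y)^2) / sum(x^2)), where x is the original signal and y the reconstruction; the exact normalization used for the reported values (for example, whether the mean is removed first) is not stated in this section, and the function name `prd` is chosen here for illustration.

```python
def prd(original, reconstructed):
    """Percent root-mean-square difference between two equal-length signals.

    Assumes the common definition without mean removal; `original` must not
    be identically zero.
    """
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    signal_energy = sum(x * x for x in original)
    return 100.0 * (error_energy / signal_energy) ** 0.5


# Toy example: one sample of a two-sample signal is reconstructed with error.
distortion = prd([3.0, 4.0], [3.0, 3.0])
```

A PRD of zero means a perfect reconstruction, and larger values mean that a larger share of the signal energy has been lost or altered.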
In the 2D SPIHT-based method, the pre-processing transforms applied prior to coding focus on the inter-channel redundancy, a physiological characteristic of the EEG caused by the placement of electrodes in neighboring regions on the scalp. However, in this method, decorrelation does not go beyond a certain matrix, which means that redundancy between different temporal sections of the recording is not examined.
The second method takes into account a different physiological characteristic of the EEG, which is the presence of certain brain waves at certain periods of time. As mentioned previously, these brain rhythms are an indication of the patient’s state of mind. Thus, in EEG recordings, different EEG segments can display similar features and characteristics. In fact, EEG segments can be grouped based on certain features for manual classification and abnormality detection. In this method, reference lists with dynamic update are used to achieve this grouping.
Moving from the first method to the second, the detection results improve: true detections increase and false positives decrease. In fact, the 2D SPIHT-based method gives the worst results in terms of compression distortion and seizure detection. This method is based on 2D SPIHT coding, which uses a tree-like hierarchy. In this hierarchy, low frequency components are considered to have higher energy than high frequency components. Thus, when using SPIHT coding, fewer and fewer bits are allocated to the high frequency components as the bit rate decreases. This causes more distortion in the high frequency band.
Seizures sometimes manifest as an increase in amplitude and frequency, so distortion added to the high frequency components degrades true detections. For this reason, the first suggested compression method is not recommended for recordings of patients suffering from epilepsy: it is unable to preserve important diagnostic-oriented information well when the allocated bit rate decreases.
The second suggested method, the dictionary-based method, gives good detection results compared to the first method. Its average values of true positives are almost as good as those of the dipole-based method, as seen in the summary figures. This method is also able to detect seizure-like activity. Thus, compression based on this method is recommended for recordings of patients suffering from epilepsy, where detection and compression of the data can be performed in parallel.
As mentioned previously, the third method, i.e. the dipole-based method, is based on modelling the relationships between the different channels using dipoles and their moments. Compared to the other two methods, it explores a deeper physiological characteristic of the EEG, namely the fact that the signals are generated by dipoles located inside the skull. Thus, it provides better extraction of the redundancy between the different channels. In addition, the suggested coding techniques further decorrelate the EEG matrices in time, and this improvement in coding is reflected in the results. Compared to the other two compression methods, the third method provides both lower distortion values and improved seizure detection, even at low bit rates.
Results show that when exploring physiological characteristics of EEG signals, better extraction of redundancy can be achieved. In addition, the deeper and more meaningful the physiological feature used, the better the compression.
The 2D SPIHT-based method uses basic decorrelation by relying on the spatial and temporal redundancies that characterize the EEG. It applies simple pre-processing techniques and a 2D transform and coder. Improvement is achieved in the dictionary-based method, where dynamic reference lists enable us to explore a more pronounced physiological characteristic of the EEG, which is the presence of brain waves. A deeper extraction of the redundancy present between the channels is achieved in the dipole-based method, where dipole fitting is used to model the relationship between these channels and therefore explore a deeper physiological characteristic of the EEG. The coders used in this method achieve further decorrelation.
Results highlight the improvements in performance achieved from the first suggested method, the 2D SPIHT-based method, to the latest suggested method, the dipole-based method. When a method achieves better decorrelation of the recorded signals, post-compression detection performance improves at very low bit rates.
The dipole-based method is based on the assumption that a single dipole is behind the generation of the observed activity on the scalp. This gives very low distortion for event-related potentials. Improvements can be added to this method by exploring the usage of a larger number of dipoles for different types of EEG recordings.
The authors gratefully acknowledge Professor Jean Gotman and his team at the Montreal Neurological Institute and Hospital, McGill University, for their help and for allowing us to use the Stellate Harmonie System.
- L. Zhukov, D. Weinstein, and C. Johnson, “Independent Component Analysis for EEG source localization,” IEEE Engineering in Medicine and Biology Magazine, vol. 19, no. 3, pp. 87–96, 2000.
- H. Daou and F. Labeau, “Dynamic Dictionary for Combined EEG Compression and Seizure Detection,” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 1, pp. 247–256, January 2014.
- E. Niedermeyer and F. Da Silva, Eds., Electroencephalography, 5th ed. Lippincott Williams and Wilkins, 2005, vol. 7.
- H. Daou and F. Labeau, “Pre-Processing of Multi-Channel EEG for Improved Compression Performance using SPIHT,” in Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), San Diego, California USA, 28 August – 1 September 2012, pp. 2232–2235.
- H. Daou and F. Labeau, “EEG Compression of Scalp Recordings based on Dipole Fitting,” IEEE Journal of Biomedical and Health Informatics, Submitted in November 2013, available at: http://arxiv.org/abs/1403.2001.
- H. Daou and F. Labeau, “Performance analysis of a 2-D EEG Compression Algorithm using an Automatic Seizure Detection System,” in Proceedings of Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, California USA, 4–7 November 2012.
- D. Rawat, C. Singh, and M. Sukadev, “A hybrid coding scheme combining SPIHT and SOFM based vector quantization for effectual image compression,” European Journal of Scientific Research, 2009.
- Z. Lu, D. Y. Kim, and W. Pearlman, “Wavelet compression of ECG signals by the set partitioning in hierarchical trees (SPIHT) algorithm,” IEEE Transactions on Biomedical Engineering, vol. 47, pp. 849–856, 1999.
- R. Pascual-Marqui, “Review of Methods for Solving the EEG Inverse Problem,” International Journal of Bioelectromagnetism, vol. 1, no. 1, pp. 75–86, 1999.
- J. Mosher, R. Leahy, and P. Lewis, “EEG and MEG: forward solutions for inverse methods,” IEEE Transactions on Biomedical Engineering, vol. 46, no. 3, pp. 245–259, March 1999.
- A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. 215–220, 2000.
- R. Agarwal and J. Gotman, “Long-term EEG compression for intensive-care settings,” Engineering In Medicine and Biology (IEEE), vol. 20, no. 5, pp. 23–29, 2001.