Abstract
Electroencephalography (EEG) is a method to record the electrical signals in the brain. Recognizing the EEG patterns in the sleeping brain gives insights into the sleeping disorders. The dataset uploaded under consideration contains data points associated to numerous physiologies. There are particular patterns associated with the NonRapid Eye Movement (NREM) sleep cycle of the brain. This study attempts to generalize the detection of these patterns using a machine learning model. The proposed model uses additional feature engineering to incorporate sequential information for training a classifier to predict the occurrence of Cyclic Alternating Pattern (CAP) sequences in the sleep cycle, which are often associate with sleep disorders.
A machine learning model for identifying cyclic alternating patterns in the sleeping brain
\@xsect
The electroencephalogram (EEG) is an important tool to derive important information about an individualâs sleeping activity. Recent studies have been introduced in sleep research based on the nature and quantitation of the sleep microstructure, taking into account the time structure of phasic EEG events observed during nonREM (NREM) stage and shorter than the standardized scoring epoch (Mariani, 2011)
The cyclic alternating pattern (CAP) is a periodic EEG activity occurring during NREM sleep. It is characterized by cyclic sequences of cerebral activation (phase A) followed by periods of deactivation (phase B). Phase B separates two successive phase A periods with an interval of less than one minute. A CAP cycle is defined as an âAâ phase followed by a âBâ phase and at least two CAP cycles are required to form a CAP sequence. CAP is also a marker of sleep instability and can be correlated with several sleep related pathologies.
One of the studies conducted by (Diego, 2006) aims to analyze and identify the difference in EEG signals between neural patterns of thought and the signals obtained from the imagination of a vowel. Their computation implements the Fourier transform for obtaining frequency data of all the samples, after which ANOVA was used to cross check measurable difference among samples.
(Mariani, 2011) analytically evaluated the information content of each description and introduced new descriptors through the application of FFT and normalization of the mean. This helps in improving the automatic recognition of CAP.
(Bashivan, 2015) obtained a sequence of topology preserving multispectral images as opposed to standard EEG analysis techniques that missed out on the spatial information. After obtaining the EEG movie, a deep recurrentconvolutional neural architecture was trained to learn robust representations from the sequence of images or frame. This approach preserved the spatial, spectral and temporal structure of the EEG data.
There have been increasing efforts for automatic detection of phase âAâ start time and duration using purely data driven approaches. (Novona, 2002)
Using various descriptors, the EEG patterns during sleep of 108 participants are obtained. These datasets are publicly available on physionet.org (Terzano, 2002) The participants included 16 healthy subjects and 92 subjects with some past pathological recording, with the following distribution:
Pathology 
Participants 

No pathology  16 
Bruxism  2 
Insomnia  9 
Narcolepsy  5 
Nocturnal Frontal Lobe Epilepsy (NFLE)  40 
Periodic Leg Movements (PLM)  10 
REM behavior disorder (RBD)  22 
Sleep disordered breathing (SBD)  4 

This work is based on the sleep cycle of the first healthy subject who did not have any neurological disorder. The waveforms recorded after every 0.0019531 seconds constitute a total of 17,725,426 data points. Using these data points, feature engineering is performed to form the feature matrix. The feature matrix acts as an input to the logistic regression model, which will account for the time series analysis of the brain signals and classify the start of phase A in a given CAP cycle.
There are numerous EEG readings of 108 participants taken recorded after every 0.0019531 seconds. This number varies throughout the data. The following is a sample of how the data looks for one participant for the first 40 seconds:
The data needed for this study is open source The data is collected for 108 nights and be to different patients with different psychological conditions. However, for the sake of this study, we only consider the data for a healthy psychology in the first part. After the model is trained on this dataset, it can be extended to a few other datasets to compare performance.
The training data set has 17,725,426 data points, each separated at a fixed time interval of 0.0019531 seconds. The only EEG reading that this study concerns with is F2F4 signal. Separately, data about the start time and duration of phase A is also available. Both of these data files are combined into a single data file by labelling each data point in the EEG dataset with information about the presence of a CAP cycle. This data file has three columns, column one represents the absolute time (in seconds) at which a reading was recorded, column two is the actual EEG reading (in microvolts), column three, are binary representations (0 or 1), where 0 indicates the absence of phase A at this given time and 1 indicates presence of phase A at this time. This cleaned data is then used for preprocessing, followed by training a binary classifier.
The EEG data is highly nonstationary. A nonstationary data is characterized by timevarying mean and variance. It is important to make the data stationary to draw any conclusive evidence relating to the presence of phase A during any given time interval.
Differential moving average is used to make the data stationary. The DickeyFuller test (Said, 1984) is used to test stationarity. It was found that a differential moving average with window size of 15 made the data the most stationary, as shown by the corresponding test statistic value. The first window size value was selected randomly. Based on the test statistic value obtained at this point, the next point was selected and so on until the best value was obtained. The original EEG readings were then transformed into differential moving average values using a window size of 15.
This study intends to classify each data point based on the 0 or 1 value indicated in the third column in the dataset. The objective is to distinguish those data points that fall under the duration of phase A from those that do not. In addition, this dataset is highly sequential. In other words, if phase A occurs at any given data point, the chances of phase A continuing to the next data point are very high. In order to train a classifier for a problem of this type, additional feature
The original data points are merely a sequence of stationary EEG values. This sequence can be denoted as follows:
Where, X1, X2 are the first two data points of the sequence, Xw and Xp are some intermediate data points and XN denotes the last data points.
The labels corresponding to these data points can be represented in the following manner
yi = 1 indicates the presence of phase A in that duration and yi = 0 indicates the absence.
To ensure sequential properties of the data points are maintained during training, the data points are transformed into data vectors denoted in the following manner
Accordingly, the feauture matrix can be formulated as:
where âwâ is called the window and âpâ is called the stride. Both w and p are hyperparameters. Their values are determined by running tests on the crossvalidation data. It is important to note that if the value of p is greater than w, some data points are lost. So, the upperlimit of p is kept w for crossvalidation. There are a number of ways in which each of these data vectors can be labelled. A heuristic is to choose the label of the last data point within the vector and assign it to the entire data vector. Another approach could be to method is to implement a maximum vote strategy. In this method, the label is given to any data vector Xi is â0â if maximum number of data points contained in this data vector have label â0â, else it is 1. If w is even and there are equal number of â0â and â1â labels to the data points, â0â is chosen as the tiebreaker, since the probability of getting a â0â is much more than the probability of getting a â1â.
There can be combinations of weighted averages of data point labels that could be used to assign labels to a vector. But, since the number of zeros is much greater than the number of ones, it might make sense to assign a vector a label of â1â even if only one data point in the vector has a label of â1â. There is a compromise with detecting the accurate duration of phase A, but for the sake of this study, this compromise is ignored, as long as the size of window, w, is fairly small. In the current implementation, we have assigned the value of the last data point to the vector.
Logistic regression is implemented on Python using the scikitlearn package (Pedregosa, 2011, Buitinck, 2013).The entire dataset is split into training and test sets in the ratio 70:30. An L2 regularizer was used with a regularization coefficient of 0.01. The optimizer used was âNewtonconjugate gradientâ to incorporate parallelization across CPU cores. In order to account for the class imbalance, the weights were balanced using corresponding class frequencies. A series of experiments were performed (Fig 2.) as discussed in Section 3.1 to obtain the optimum values of the hyperparameters. All experiments were performed on Intel Core I7 machines with 16 GB RAM and 2.90 GHz clock frequency.
Stride (P) 
Window (W)  Accuracy % 

58 
512  51 
58 
1024  52 
58 
1536  52 
116  512  53 
116  1024  53 
116  1536  54 
479  512  53 
479  1024  54 
479  1536  55 
958  1024  56 
958  1536  57 
1276  1536  58 
For hyperparameter tuning, the model is run on test sets with a combination of values of stride p and window w. The lowest time for which any phase A exists in this dataset is 2 seconds. The value of w chosen should be lesser than this value to ensure no occurrence is skipped. The number of time intervals which make up one second intervals is 512. Therefore, the value of w is restricted to multiples of w. whereas, to ensure that the value of p stays lesser than w and at the same no data is lost, there are a finite number of values of p to choose from. The results are shown in figure 2. The values of w and p which give the maximum classification accuracy (58%) are chosen. In this case, the optimum values of p and w are 1276 and 1536 respectively. The AreaUnderCurve Receiver Operating Characteristic (AUROC) value for the optimum combination (1276 and 1536) comes out to 0.512. The results plotted in terms of the Receiver Operating Characteristic (ROC) curve are shown in figure 3.
A simple binary logistic regression classifier was trained for classifying EEG data in the sleep cycle into phase A and nonphase A. This study attempted to train raw timedomain data as recorded by the EEG directly without frequencydomain transform. Such a model can be trained and interpreted easily. In addition, this also stresses the importance of using sequential data models like Markov chains and recurrent neural networks on timedomain data. However, considerable accuracy was not obtained using this model. Further work with sequential models might provide better insights.
We thank the Auton Lab and Professor Artur Dubrawski for providing a platform for showcasing this research. We thank Philips for motivating the problem statement and we also thank Pittsburgh Supercomputing Centre for allowing us to access their computational resources.

Mariani, Sara, et al. ”Characterization of A phases during the cyclic alternating pattern of sleep.” Clinical Neurophysiology 122.10 (2011): 20162024.

Rojas, Diego Alfonso, Leonardo AndrÃ©s GÃ³ngora, and Olga Lucia Ramos. ”EEG signal analysis related to speech process through BCI device EMOTIV, FFT and statistical methods.” (2006).

Bashivan, Pouya, et al. ”Learning representations from EEG with deep recurrentconvolutional neural networks.” arXiv preprint arXiv:1511.06448 (2015).

Navona, Carlo, et al. ”An automatic method for the recognition and classification of the Aphases of the cyclic alternating pattern.” Clinical neurophysiology 113.11 (2002): 18261831.

Terzano, Mario Giovanni, et al. ”Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (CAP) in human sleep.” Sleep medicine 3.2 (2002): 187199.

Said, Said E., and David A. Dickey. ”Testing for unit roots in autoregressivemoving average models of unknown order.” Biometrika 71.3 (1984): 599607.

Pedregosa, Fabian, et al. ”Scikitlearn: Machine learning in Python.” Journal of machine learning research 12.Oct (2011): 28252830.

Buitinck, Lars, et al. ”API design for machine learning software: experiences from the scikitlearn project.” arXiv preprint arXiv:1309.0238 (2013).