
Sequence-based Sleep Stage Classification using Conditional Neural Fields


Intan Nurma Yulita1,2, Mohamad Ivan Fanany1, Aniati Murni Arymurthy1,

1 Machine Learning and Computer Vision Laboratory,

Faculty of Computer Science, Universitas Indonesia

2 Department of Computer Science, Universitas Padjadjaran

* intanurma@gmail.com

Abstract

Sleep signals from a polysomnographic database are sequences in nature. Commonly employed analysis and classification methods, however, ignore this fact and treat the sleep signals as non-sequence data. Treating the sleep signals as sequences, this paper compares two powerful unsupervised feature extractors and three families of sequence-based classifiers regarding accuracy and computational (training and testing) time after 10-fold cross-validation. The compared feature extractors are Deep Belief Networks (DBN) and Fuzzy C-Means (FCM) clustering. The compared sequence-based classifiers are Hidden Markov Models (HMM); Conditional Random Fields (CRF) and its variants, i.e., Hidden-state CRF (HCRF) and Latent-Dynamic CRF (LDCRF); and Conditional Neural Fields (CNF) and its variant (LDCNF). In this study, we use two datasets. The first dataset is an open (public) polysomnographic dataset downloadable from the Internet, while the second dataset is our own polysomnographic dataset (also available for download). For the first dataset, the combination of FCM and CNF gives the highest accuracy (96.75%) with a relatively short training time (0.33 hours). For the second dataset, the combination of DBN and CRF gives an accuracy of 99.96% but with 1.02 hours of training time, whereas the combination of DBN and CNF gives slightly lower accuracy (99.69%) but also less computation time (0.89 hours).

Keywords— Sleep stage, Conditional Neural Fields, Deep Belief Networks, Fuzzy C-Means Clustering, Classification

Introduction

Accurate sleep stage classification is paramount in telemedicine and home care treatment of patients with a sleep disorder. Most sleep stage classification methods, however, ignore the sequential nature of sleep signals. We argue that this sequential nature needs to be considered and might reveal more insightful patterns that ultimately deliver more accurate classification. Many previous sleep stage classification methods treat sleep data as non-sequence data, using wavelets and artificial neural networks [7], shallow classifiers [10], and Deep Belief Networks [27]. On the other hand, HMM is a widely known method for labeling sequence data [14]. This approach has been used in many areas such as speech [26], gesture [3], marine science [21], health [2], and biology [25], and also sleep staging [13]. However, HMM has a limitation: the conditional probability distribution of each hidden variable covers only one time segment. To achieve better performance, Lafferty et al. proposed Conditional Random Fields (CRF) to overcome this weakness of HMM [12]. CRF is an undirected graphical model that combines the features of a complex observational sequence. Unlike HMM, CRF also avoids the bias present in HMM [18].

Due to its success in sequence labeling, CRF variants have been developed with additional structures, such as hidden states (in HCRF) to capture the interacting features in the data [4]. However, the HCRF eliminates the role of interacting labels, so the LDCRF was proposed to combine both factors [6], [24], [20]. Furthermore, the CRF and LDCRF have been further developed by adding a new layer to map complex non-linear relationships. The new layer is called the gate, which is built into the internal processes [17]. These last two methods are called CNF and LDCNF, respectively.

Längkvist et al. [13] conducted a sleep stage classification study. They proposed unsupervised feature learning with Deep Belief Networks (DBN). While not specifically treating the sleep data as a sequence, they compared the use of DBN-HMM and DBN-only classifiers. They found that the DBN-HMM (sequential treatment) gives better results than DBN-only (non-sequential treatment). In this paper, we are interested in improving their classification performance by replacing the HMM part with CRF and CNF variants to handle the sequential nature of sleep stage classification. Längkvist et al. [13] also found that hand-crafted features work better than features automatically generated from raw data. In line with this finding, this study also employs hand-crafted features.

We are also interested in further investigating the effectiveness of DBN as an unsupervised feature extractor in comparison with other feature extractors such as FCM clustering. We compared many configurations regarding accuracy and computation time. The required output of Deep Belief Networks is the probability of a segment belonging to each class. This has similarities with the concept of cluster formation in FCM clustering, in which a segment is a member of all clusters with different membership degrees. Given this objective, the use of FCM clustering might be more optimal due to its concept of uncertainty, which avoids information loss when constructing new features. Therefore, this paper also proposes to examine the use of FCM clustering for extracting new features.

1 Materials and Methods

This section describes our datasets, the feature extraction and classification methods used, and our experiment schemes. This research compares two unsupervised feature extraction methods, DBN and FCM, and three families of classification methods: HMM, CRF variants, and CNF variants. Each of these methods is described in the following subsections.

1.1 Dataset

The proposed methods are evaluated using two datasets, namely:

  1. The first dataset came from St. Vincent’s University Hospital and University College Dublin and can be downloaded from: http://physionet.org/pn3/ucddb/. This dataset was also used by Längkvist et al. [13], but this paper only tests the first 10 of the 25 available recordings. The dataset consists of recordings of electroencephalography (EEG), electrooculography (EOG), and electromyography (EMG). The classification distinguishes five sleep stages: Awake, Stage 1 (S1), Stage 2 (S2), Slow wave sleep (SWS), and Rapid eye movement (REM). (A loading sketch is given after this list.)

  2. The second dataset was obtained from a research collaboration between the Faculty of Computer Science, Universitas Indonesia, and the Mitra Keluarga Kemayoran Hospital, Jakarta. It consists of approximately 8-hour full-night polysomnogram recordings from five male and female subjects [9]. The classification also uses five sleep stages: Awake, Stage 1 (S1), Stage 2 (S2), Stage 3 (S3), and Rapid eye movement (REM). The data annotation was performed by a sleep specialist from the Mitra Keluarga Kemayoran Hospital.
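For readers who want to reproduce the pipeline, a record from a polysomnographic repository such as the first dataset can be loaded, for example, with the MNE library. The file path, channel choice, and 30-second segment length below are illustrative assumptions, not details reported in this paper.

```python
# A hedged sketch of loading one polysomnographic record with MNE.
# The file name is a placeholder and the segment length is an assumption.
import mne

raw = mne.io.read_raw_edf("path/to/one_record.edf", preload=True)  # hypothetical path
print(raw.ch_names, raw.info["sfreq"])         # available channels and sampling rate

# Pick the first channel and slice it into 30-second segments for scoring.
sig = raw.get_data(picks=[0])[0]
fs = int(raw.info["sfreq"])
segments = sig[: len(sig) // (30 * fs) * (30 * fs)].reshape(-1, 30 * fs)
print(segments.shape)                          # (number of segments, samples per segment)
```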

1.2 Conditional Random Fields

Recognition can be performed without processing dynamic interactions between labels in a sequence. However, modeling that considers these dynamics certainly has an advantage in exploiting the correlation between labels. Several techniques specifically model sequential data to extract such correlations; one of them is CRF [16].

CRF modeling is based on the conditional distribution of a label sequence Y given a data sequence X [1]. Both sequences have the same length:

X = (x_1, x_2, \dots, x_T)    (1)
Y = (y_1, y_2, \dots, y_T)    (2)

where $T$ is the length of the data segment, $x_t$ is the feature vector of the sleep data at position $t$, and $y_t$ is its sequence label. The labels are members of $\mathcal{Y}$, the set of possible labels. In CRF, $F(Y, X)$ is a global feature vector that consists of local features, defined as follows [22]:

F(Y, X) = \sum_{t=1}^{T} f(y_{t-1}, y_t, X, t)    (3)

Local features consist of state features and transition features, and each local feature corresponds to a weight in the weight vector $w$. Thus, the CRF model is defined as follows:

P(Y \mid X; w) = \frac{\exp\big(w \cdot F(Y, X)\big)}{Z(X; w)}    (4)

where

Z(X; w) = \sum_{Y'} \exp\big(w \cdot F(Y', X)\big)    (5)

Training the CRF is done by maximizing the log-likelihood of the training set $\{(X_i, Y_i)\}_{i=1}^{N}$:

\mathcal{L}(w) = \sum_{i=1}^{N} \log P(Y_i \mid X_i; w)    (6)

For improving the convergence of the model, commonly used optimization techniques are the Conjugate Gradient (CG) and Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms. The CRF has also been extended by adding structures such as a layer of hidden states and a layer of gates. Both are further described in the following subsections and compared in Figure 1.
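To make Eqs. (4)-(5) concrete, the following minimal NumPy sketch computes the log-probability of one label sequence under a linear-chain CRF. The emission and transition score matrices, their shapes, and the toy data are illustrative assumptions, not the exact feature design used in this study.

```python
# A minimal linear-chain CRF sketch for Eqs. (3)-(5); shapes are assumptions.
import numpy as np

def crf_log_prob(emissions, transitions, labels):
    """emissions: (T, L) state scores; transitions: (L, L) scores; labels: length-T ints."""
    T, L = emissions.shape

    # Numerator: w . F(Y, X) accumulated over the sequence (Eqs. 3-4).
    score = emissions[0, labels[0]]
    for t in range(1, T):
        score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]

    # Denominator: log Z(X) via the forward recursion (Eq. 5).
    alpha = emissions[0].copy()
    for t in range(1, T):
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(transitions)) + emissions[t]
    m = alpha.max()
    log_Z = m + np.log(np.exp(alpha - m).sum())
    return score - log_Z

# Toy usage: 4 time steps, 5 sleep stages.
rng = np.random.default_rng(0)
logp = crf_log_prob(rng.normal(size=(4, 5)), rng.normal(size=(5, 5)), [0, 1, 1, 4])
print(np.exp(logp))
```

In practice, off-the-shelf CRF toolkits implement this forward recursion together with the gradient-based training mentioned above.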

Figure 1: Graphical Models

Hidden States

CRF is limited in capturing the correlation between features in the data. To alleviate this limitation, HCRF adds an intermediate structure consisting of hidden states to the CRF [5], [23]. With this intermediate structure, the conditional model of HCRF becomes:

P(Y \mid X; \theta) = \sum_{h} P(Y \mid h, X; \theta)\, P(h \mid X; \theta)    (7)

where $h$ is the vector of hidden states in the intermediate structure. The HCRF structure is designed to capture the correlation between the features of the data, but it ignores the dynamics of interacting labels that the original CRF structure models. Therefore, LDCRF was developed to combine the CRF structure and the HCRF structure [19]. To address the HCRF limitation, in LDCRF each label $y_t$ is restricted to a disjoint set of hidden states $\mathcal{H}_{y_t}$, so that $P(Y \mid h, X; \theta)$ is zero for any hidden path that leaves these sets, and the conditional model of LDCRF is formulated as follows:

P(Y \mid X; \theta) = \sum_{h:\, \forall t,\ h_t \in \mathcal{H}_{y_t}} P(h \mid X; \theta)    (8)

Using the CRF formula, it is known that:

P(h \mid X; \theta) = \frac{\exp\big(\theta \cdot F(h, X)\big)}{Z(X; \theta)}    (9)

where

F(h, X) = \sum_{t=1}^{T} f(h_{t-1}, h_t, X, t)    (10)
Z(X; \theta) = \sum_{h'} \exp\big(\theta \cdot F(h', X)\big)    (11)
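As a small worked illustration of the LDCRF restriction in Eq. (8), the sketch below assigns each label a disjoint block of hidden states and sums the probability of every hidden path that stays inside the blocks selected by the label sequence. Brute-force enumeration is used only for clarity; the shapes and the two-states-per-label split are assumptions, not the configuration used in this study.

```python
# Brute-force LDCRF marginalization over restricted hidden-state paths (Eqs. 8-11).
import itertools
import numpy as np

def ldcrf_prob(emissions, transitions, labels, states_per_label=2):
    """emissions: (T, H) hidden-state scores; transitions: (H, H); labels: length-T ints."""
    T, H = emissions.shape

    def path_score(path):
        s = emissions[0, path[0]]
        for t in range(1, T):
            s += transitions[path[t - 1], path[t]] + emissions[t, path[t]]
        return np.exp(s)

    # Partition function Z(X): every hidden-state path contributes (Eq. 11).
    Z = sum(path_score(p) for p in itertools.product(range(H), repeat=T))
    # Numerator of Eq. (8): only paths whose state at time t lies in H_{y_t}.
    allowed = [range(y * states_per_label, (y + 1) * states_per_label) for y in labels]
    num = sum(path_score(p) for p in itertools.product(*allowed))
    return num / Z

rng = np.random.default_rng(0)
print(ldcrf_prob(rng.normal(size=(3, 10)), 0.1 * rng.normal(size=(10, 10)), [0, 1, 1]))
```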

Gate Function

Conditional Neural Fields (CNF) combine the advantages of CRF and Artificial Neural Networks [17]. CRF has the advantage of representing the structural relationship among labels, whereas an Artificial Neural Network captures the non-linear correlation between input and output. To represent the neural-network function inside the CRF, a new intermediate structure is added. This new intermediate structure consists of a set of gate functions that transform the input features into a $G$-dimensional non-linear representation, where $G$ is the number of gates. Using the gate functions, the state feature is formulated as follows:

f(y_t, X, t) = \sum_{g=1}^{G} w_{y_t, g}\, \sigma\big(\theta_g \cdot x_t\big)    (12)

where $\sigma(\cdot)$ is the logistic gate function, $\theta_g$ is the parameter vector of gate $g$, and $w_{y_t, g}$ is the weight connecting label $y_t$ to gate $g$.

Like CRF, the CNF does not capture inter-correlations through an intermediate hidden-state structure. Thus, LDCNF was developed to overcome this deficiency while still learning the complex non-linear relationship of input to output [15]. The resulting model has two intermediate layers: gates and hidden states. Therefore, the state feature is defined over the hidden states as follows:

f(h_t, X, t) = \sum_{g=1}^{G} w_{h_t, g}\, \sigma\big(\theta_g \cdot x_t\big)    (13)
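As an illustration of the gate layer in Eq. (12), the following sketch computes per-label state scores from one raw feature vector. The dimensions, parameter names, and random values are assumptions for illustration only.

```python
# A minimal sketch of the CNF gate layer in Eq. (12).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cnf_state_scores(x_t, theta, W):
    """x_t: (d,) raw features; theta: (G, d) gate parameters; W: (L, G) label-gate weights."""
    gates = sigmoid(theta @ x_t)   # (G,) non-linear hidden representation of the input
    return W @ gates               # (L,) state scores fed into the CRF layer

# Toy usage: 28 hand-crafted features, 3 gates, 5 sleep stages.
rng = np.random.default_rng(0)
print(cnf_state_scores(rng.normal(size=28), rng.normal(size=(3, 28)), rng.normal(size=(5, 3))))
```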

1.3 Deep Belief Networks

Deep Belief Networks (DBN) is a directed acyclic graph composed of hidden variables arranged in layers [11]. DBN works in two schemes. The first scheme learns in an unsupervised way, acting as a feature extractor that finds a subset of features from the input data. The second scheme learns in a supervised way as a classifier. The result of the first scheme becomes the input to the second scheme.

DBN is composed of several Restricted Boltzmann Machines (RBMs). An RBM learns the weights between two adjacent layers and constructs a new feature subset. To build a model, the RBM uses a set of visible and hidden units. Units in the visible layer are connected to units in the hidden layer, but there are no connections among units within the visible layer or within the hidden layer. The reconstruction process is done through a forward and backward mechanism from one side of the RBM to the other and back again; the forward and backward passes are often called encoding and decoding.
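The encoding and decoding passes described above can be sketched as follows. The logistic activation and the toy layer sizes are standard illustrative choices, not the exact RBM configuration used in this study.

```python
# A minimal RBM encode/decode sketch; weight shapes are assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbm_encode(v, W, b_hidden):
    """Visible -> hidden: probability that each hidden unit is active."""
    return sigmoid(v @ W + b_hidden)

def rbm_decode(h, W, b_visible):
    """Hidden -> visible: reconstruction of the input."""
    return sigmoid(h @ W.T + b_visible)

# Toy usage: 28 visible units (features), 5 hidden units (one per class, as in this study).
rng = np.random.default_rng(0)
W = 0.01 * rng.normal(size=(28, 5))
v = rng.random(28)
h = rbm_encode(v, W, np.zeros(5))
v_rec = rbm_decode(h, W, np.zeros(28))
print(h, np.abs(v - v_rec).mean())
```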

1.4 FCM Clustering

FCM clustering is a technique that assigns each object a membership in every cluster [8]. Although an object belongs to all clusters, its degree of membership differs from cluster to cluster: the higher the membership degree for a cluster, the closer the object's characteristics are to that cluster's center. In this study, FCM is used for feature extraction, so it plays the same role as DBN. The output of this feature extraction is a new feature subset for each data segment, consisting of its membership degrees towards the cluster centroids.

The FCM clustering algorithm is as follows (a minimal sketch is given after the list):

  1. Initialize the data: $X$ is the data to be processed, an $n \times m$ matrix of $n$ data segments and $m$ attributes.

  2. Determine parameter settings: the number of clusters $c$ is not altered during processing, so it must be defined in advance. The weight $w$ (degree of fuzziness) controls the fuzziness of the reach of each cluster.

  3. Initialize the partition matrix $U$: $U$ is an $n \times c$ matrix of membership degrees $u_{ik}$ of data segment $i$ in cluster $k$. This initialization determines the starting centroids, so a proper initialization makes it quicker to find the expected optimum. The matrix is initialized as follows:

    u_{ik} = r_{ik}, \quad r_{ik} \sim \mathrm{Uniform}(0, 1)    (14)
    u_{ik} \leftarrow \frac{u_{ik}}{\sum_{j=1}^{c} u_{ij}}    (15)

    where Eq. (15) is used for normalization so that the memberships of each segment sum to one.

  4. Calculate the cluster centers $v_k$ from $U$ and $X$:

    v_k = \frac{\sum_{i=1}^{n} u_{ik}^{w}\, x_i}{\sum_{i=1}^{n} u_{ik}^{w}}    (16)
  5. Calculate the objective function:

    J = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{w}\, \lVert x_i - v_k \rVert^2    (17)
  6. Calculate a new partition matrix:

    u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\lVert x_i - v_k \rVert}{\lVert x_i - v_j \rVert} \right)^{2/(w-1)} \right]^{-1}    (18)
  7. Check the stop condition: if $|J^{(t)} - J^{(t-1)}| < \varepsilon$ or $t > \mathrm{MaxIter}$, the training process has finished; otherwise, set $t \leftarrow t + 1$ and repeat from step 4, where $\varepsilon$ is the expected smallest change and MaxIter is the maximum number of iterations.
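The loop in steps 3-7 can be written in a few lines of NumPy, as sketched below. The data matrix, cluster count, and fuzziness value are illustrative assumptions; libraries such as scikit-fuzzy provide equivalent routines.

```python
# A minimal FCM sketch following steps 3-7 (Eqs. 14-18); inputs are illustrative.
import numpy as np

def fcm(X, n_clusters=4, w=1.05, eps=1e-5, max_iter=100, seed=0):
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)               # Eqs. (14)-(15): normalized init
    J_prev = np.inf
    for _ in range(max_iter):
        Um = U ** w
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]    # Eq. (16): cluster centers
        dist = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        J = np.sum(Um * dist ** 2)                  # Eq. (17): objective function
        ratio = dist[:, :, None] / dist[:, None, :]
        U = 1.0 / np.sum(ratio ** (2.0 / (w - 1.0)), axis=2)   # Eq. (18): new memberships
        if abs(J_prev - J) < eps:                   # step 7: stop condition
            break
        J_prev = J
    return U, V

# Toy usage: the membership matrix U becomes the new feature subset fed to the CNF.
U, V = fcm(np.random.default_rng(1).normal(size=(100, 28)), n_clusters=4, w=1.05)
print(U.shape)  # (100, 4): one membership degree per cluster for each segment
```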

1.5 Design of The Proposed Method

Sleep stage classification in this study is described in Figure 2:

Figure 2: The architecture of the proposed methods

In the earlier step, all signals are preprocessed using notch filtering and downsampling. Essential characteristics are then extracted from the preprocessed signals. The results of this feature extraction are treated in three scenarios before the CNF modeling, namely:

  1. Without any additional feature extraction (neither DBN nor FCM)

  2. With an additional feature extraction using DBN

  3. With an additional feature extraction using FCM

The number of new features extracted by DBN equals the number of classes, whereas FCM clustering extracts as many new features as the number of clusters, so the three scenarios work with different numbers of features. To form a CNF model, the window size used is 0 and the maximum window size is 1000. The training that makes the model converge is done using the Broyden-Fletcher-Goldfarb-Shanno algorithm. To test the resulting model, the experiments are conducted through 10-fold cross-validation.
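The overall flow of Figure 2, from preprocessing to 10-fold evaluation, might look roughly like the sketch below. The sampling rate, notch frequency, decimation factor, and the fit_and_score callback are assumptions for illustration, not values or interfaces reported in this paper.

```python
# A hedged sketch of the preprocessing and evaluation flow in Figure 2.
import numpy as np
from scipy import signal
from sklearn.model_selection import KFold

def preprocess(raw, fs=128, notch_hz=50.0, downsample=2):
    """Notch-filter one channel and down-sample it (parameter values are assumptions)."""
    b, a = signal.iirnotch(w0=notch_hz, Q=30.0, fs=fs)   # remove mains interference
    filtered = signal.filtfilt(b, a, raw)
    return signal.decimate(filtered, downsample)

def ten_fold(segment_features, labels, fit_and_score):
    """Run the 10-fold protocol; fit_and_score trains one model and returns its accuracy."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=False).split(segment_features):
        scores.append(fit_and_score(segment_features[train_idx], labels[train_idx],
                                    segment_features[test_idx], labels[test_idx]))
    return np.mean(scores)
```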

2 Results and Discussion

This Section explains the evaluation of conditional models obtained from both datasets. The evaluated parameters are the accuracy (%) and the computation time (hours).

2.1 St. Vincent’s Dataset

The gate function in Conditional Neural Fields (CNF) maps the complex nonlinear relationship of input to output. The effectiveness of this additional function in CNF is shown in Table 1. The highest average accuracy was 95.98% when six gates were used, but the other settings differed only slightly, which indicates that the average accuracy of CNF tends to be stable even when the number of gates changes. The same holds for the required computation time. Without using DBN or FCM features, the CNF already achieves an average accuracy above 95% using the 28 hand-crafted features. With DBN as a feature extractor (Table 2), however, the CNF reduces the computation time to within the 0.04 to 0.06 hours range for each fold, even though the accuracy increases only slightly. Besides that, the highest accuracy with DBN features is achieved when five gates are used. It seems that the DBN does succeed in giving an optimal subset of features, so fewer gates are needed to reach optimal conditions.

Fold Accuracy (%) Computation Time (hours)
g=2 g=3 g=4 g=5 g=6 g=7 g=8 g=2 g=3 g=4 g=5 g=6 g=7 g=8
1 98.54 98.53 98.60 98.44 85.93 0.03 0.03 0.24 0.14 0.18 0.25 0.24 0.23 0.24
2 89.93 89.96 89.75 86.22 89.55 0.02 0.02 0.21 0.12 0.14 0.17 0.17 0.13 0.09
3 98.97 94.99 96.84 95.05 93.20 0.03 0.03 0.12 0.10 0.09 0.13 0.10 0.13 0.15
4 96.87 97.16 97.06 97.42 97.38 0.03 0.03 0.09 0.08 0.10 0.10 0.10 0.11 0.01
5 97.85 98.00 97.45 98.30 97.18 0.03 0.03 0.04 0.07 0.07 0.03 0.01 0.08 0.11
6 82.88 84.09 93.47 89.71 81.34 0.02 0.02 0.11 0.09 0.12 0.13 0.09 0.10 0.07
7 96.67 99.48 99.50 93.27 99.51 0.03 0.03 0.07 0.08 0.10 0.05 0.12 0.13 0.11
8 99.61 98.77 98.79 99.38 99.98 0.03 0.03 0.07 0.06 0.09 0.02 0.10 0.10 0.06
9 94.76 91.97 86.64 53.01 76.00 0.03 0.03 0.08 0.07 0.06 0.05 0.05 0.07 0.09
10 99.94 99.94 99.95 99.94 99.73 0.03 0.03 0.07 0.05 0.06 0.08 0.05 0.07 0.06
Average 95.60 95.29 95.80 91.07 91.98 0.03 0.03 0.11 0.09 0.10 0.10 0.10 0.11 0.10
Table 1: Number of Gates in CNF for St. Vincent’s Dataset
Fold Accuracy (%) Computation Time (hours)
g=2 g=3 g=4 g=5 g=2 g=3 g=4 g=5
1 96.11 96.26 96.10 96.38 0.08 0.11 0.07 0.09
2 88.70 88.68 88.60 88.64 0.09 0.05 0.05 0.04
3 96.25 96.25 96.04 96.33 0.06 0.08 0.05 0.07
4 94.78 94.78 94.41 97.26 0.05 0.04 0.05 0.01
5 98.93 98.92 98.81 98.92 0.05 0.04 0.01 0.03
6 93.17 94.06 93.80 93.17 0.04 0.05 0.05 0.04
7 99.73 99.74 99.74 99.91 0.04 0.03 0.02 0.02
8 99.70 99.70 99.70 99.70 0.04 0.03 0.03 0.04
9 95.51 96.41 95.49 95.58 0.03 0.03 0.03 0.03
10 99.89 99.89 99.90 99.89 0.03 0.04 0.03 0.03
Average 96.28 96.47 96.26 96.58 0.05 0.05 0.04 0.04
Table 2: Number of Gates in DBN-CNF for St. Vincent’s Dataset

On the other hand, the use of FCM clustering to extract four features from the 28 original features works well, so that CNF gives the highest accuracy. This accuracy is even better than with DBN, although the difference is small and only four features are used. However, the required computation time is longer, since constructing the clusters takes a long time and affects the overall execution time of the classification process. In fact, the time required is longer than classification without any additional feature extraction. As shown in Table 3, the average computation time ranges from 0.30 up to 0.33 hours for each fold, so this feature extraction becomes less effective: the accuracy gain is small while the computation time is much larger.

Fold Accuracy (%) Computation Time (hours)
g=2 g=3 g=4 g=2 g=3 g=4
1 96.08 97.57 97.91 0.59 0.60 0.68
2 89.56 89.86 86.67 0.58 0.44 0.19
3 98.66 97.09 97.78 0.34 0.27 0.38
4 93.35 92.45 93.00 0.24 0.33 0.31
5 99.30 98.98 99.05 0.29 0.29 0.26
6 95.44 95.26 93.50 0.48 0.19 0.22
7 98.69 98.78 99.23 0.14 0.51 0.29
8 99.53 99.87 99.81 0.05 0.25 0.46
9 96.41 97.67 97.78 0.10 0.12 0.15
10 99.99 99.98 99.94 0.15 0.34 0.11
Average 96.70 96.75 96.47 0.30 0.33 0.31
Table 3: Number of Gates in FCM-CNF for St. Vincent’s Dataset

The implementation of CNF in Table 3 uses four features obtained from FCM clustering; these features represent the four clusters. The selection of only four features is based on the tests in Table 4, which evaluate the CNF performance as the number of clusters changes, in terms of accuracy and computation time. The results show that the highest accuracy is obtained with only four clusters, and the accuracy decreases as the number of clusters increases. The computation time, in contrast, has an ascending trend as the number of clusters increases. From the table, it is also known that the highest computation time is obtained with eight clusters. Also, since the highest accuracy is achieved with four clusters even though the clusters are meant to represent the five defined class labels of the sleep stage classification, this indicates that two classes have similar characteristics.

Fold Accuracy (%) Computation Time (hours)
cl=4 cl=5 cl=6 cl=7 cl=8 cl=4 cl=5 cl=6 cl=7 cl=8
1 96.08 95.26 95.99 91.31 86.82 0.59 0.61 0.49 0.43 0.86
2 89.56 91.41 87.49 88.96 87.53 0.58 0.57 0.35 0.40 0.66
3 98.66 85.63 98.14 85.69 86.25 0.34 0.32 0.38 0.43 0.64
4 93.35 94.82 95.73 96.48 96.35 0.24 0.21 0.33 0.30 0.47
5 99.30 99.16 97.96 87.73 97.18 0.29 0.17 0.19 0.36 0.68
6 95.44 93.68 96.36 95.81 95.44 0.48 0.17 0.16 0.40 0.45
7 98.69 98.92 99.15 98.36 98.12 0.14 0.15 0.14 0.27 0.37
8 99.53 99.88 98.49 97.88 98.69 0.05 0.13 0.24 0.33 0.43
9 96.41 96.53 95.67 89.50 73.12 0.10 0.12 0.32 0.45 0.32
10 99.99 99.99 99.78 99.94 99.94 0.15 0.10 0.13 0.39 0.26
Average 96.70 95.53 96.48 93.17 91.94 0.30 0.26 0.27 0.38 0.51
Table 4: Number of Clusters in Fuzzy-CNF for St. Vincent’s Dataset
Fold Accuracy (%) Computation Time (hours)
w=1.05 w=1.1 w=1.2 w=1.3 w=1.4 w=1.05 w=1.1 w=1.2 w=1.3 w=1.4
1 96.08 95.26 95.99 91.31 86.82 0.59 0.61 0.49 0.43 0.86
2 89.56 91.41 87.49 88.96 87.53 0.58 0.57 0.35 0.40 0.66
3 98.66 85.63 98.14 85.69 86.25 0.34 0.32 0.38 0.43 0.64
4 93.35 94.82 95.73 96.48 96.35 0.24 0.21 0.33 0.30 0.47
5 99.30 99.16 97.96 87.73 97.18 0.29 0.17 0.19 0.36 0.68
6 95.44 93.68 96.36 95.81 95.44 0.48 0.17 0.16 0.40 0.45
7 98.69 98.92 99.15 98.36 98.12 0.14 0.15 0.14 0.27 0.37
8 99.53 99.88 98.49 97.88 98.69 0.05 0.13 0.24 0.33 0.43
9 96.41 96.53 95.67 89.50 73.12 0.10 0.12 0.32 0.45 0.32
10 99.99 99.99 99.78 99.94 99.94 0.15 0.10 0.13 0.39 0.26
Average 96.70 95.53 96.48 93.17 91.94 0.30 0.26 0.27 0.38 0.51
Table 5: Degree of Fuzziness (w) in Fuzzy-CNF for St. Vincent’s Dataset

In implementing FCM clustering to find a subset of features, we should also pay attention to the degree of fuzziness ($w$) used. The higher $w$ is, the more blurred the cluster memberships become; if the fuzziness is too high, all objects have the same membership degree in all clusters. Therefore, it is necessary to find an optimal $w$ by testing on St. Vincent’s dataset. The results are shown in Table 5. The accuracy tends to decrease as the degree of fuzziness ($w$) grows, except for the implementation with $w = 1.2$. The computation time has an upward trend in line with the increasing degree of fuzziness, except for $w = 1.05$, which is higher than $w = 1.1$.

Methods Accuracy (%) Computation Time (hours)
CRF 94.34 0.58
HCRF 86.64 1.96
LDCRF 90.28 8.12
CNF 95.98 0.10
LDCNF 96.19 0.52
DBN 84.47 0.04
DBN - HMM 93.11 0.12
DBN - CRF 93.51 0.35
DBN - HCRF 89.97 0.95
DBN - LDCRF 92.37 5.38
DBN - CNF 96.58 0.12
DBN - LDCNF 96.45 0.40
FCM - CRF 92.89 0.36
FCM - CNF 96.75 0.33
Table 6: Comparing all methods for St. Vincent’s Dataset

The accuracy of CNF for this sleep stage classification ranges from 91.07% up to 95.80%. The performance of CNF increases when DBN and FCM are applied. With DBN, the achieved accuracy ranges from 96.26% up to 96.58%, and the performance is highest when FCM is implemented, ranging from 96.47% up to 96.75%. Thus, both additional methods prove effective in improving the accuracy of CNF. The CNF itself also proves effective for classifying the sleep stages of St. Vincent’s dataset, with accuracy above 90%. The excellent performance of CNF for sleep stage classification can also be seen by comparing it with the other methods shown in Table 6.

The lowest accuracy, 84.47%, is obtained when DBN is used directly as a classifier, followed in ascending order by HCRF, LDCRF, CNF, and LDCNF when no additional feature extraction is applied. The HCRF gives lower accuracy than the CRF, possibly because it applies its intermediate layer while ignoring label interaction. When this label interaction is taken into account, the LDCRF gives higher accuracy than HCRF, but its accuracy is still lower than the CRF. It may be inferred that the addition of an intermediate hidden-state layer causes a decrease in accuracy for sleep stage classification. On the other hand, the LDCNF accuracy is higher than the CNF, which means the gate function successfully finds a subset of interacting features. Using this feature subset, the classification performance increases when LDCNF also employs its hidden states to build the probabilistic model.

When applied as a classifier, DBN shows the lowest accuracy. When applied as a feature extractor, however, DBN proves effective in improving the accuracy of CRF, HCRF, LDCRF, CNF, and LDCNF. With DBN as a feature extractor, the CNF has the highest accuracy among the methods that also use DBN. From Table 6, it is also known that DBN combined with CRF, CNF, and LDCNF gives higher accuracy than the DBN-HMM combination proposed by Längkvist et al. [13] on the same dataset. Moreover, the accuracy of CNF reaches 96.58%. With FCM clustering, the accuracy of CNF increases further to 96.75%, whereas the CRF accuracy decreases compared with using DBN.

As for the computation time, using hidden states takes longer than not using them: the HCRF computation time is 1.96 hours, while the CRF takes only 0.58 hours. The computation time increases further when the external label structure and the internal hidden-state structure are both used, as shown by the LDCRF (with or without DBN as a feature extractor). Besides that, the gate function in CNF and LDCNF proves effective in improving the accuracy while also decreasing the computation time.

2.2 Mitra Keluarga Kemayoran’s Dataset

Mitra Keluarga Kemayoran’s dataset also has five classes. To perform sleep stage classification on this dataset, the DBN is used for feature extraction; the extracted feature subset consists of five features, one per class. According to Table 7, using two to four gates gives accuracy within the 96.99% to 98.90% range, with computation times from 0.82 to 0.89 hours for each fold. The accuracy of CNF decreases when FCM clustering is used for feature extraction: the highest accuracy obtained is only 82.70%, with three gates. This implementation is described in detail in Table 8. From the table, it can be seen that the accuracy with FCM clustering does not increase when one or two more gates are added, while the computation time does increase. This indicates that FCM clustering works worse than DBN as a feature extractor for the CNF classifier on this dataset.

The tests in Table 8 were performed to gain insight into the influence of the gates in CNF using the feature subset derived from FCM clustering. The test only varies the number of gates, whereas the number of clusters and the degree of fuzziness ($w$) are kept fixed. The optimal values for these two parameters are taken from Tables 9 and 10. In Table 9, the number of clusters is tested against the classification performance of the CNF, using four to eight clusters. From these tests, it is found that the optimal number of clusters is six; increasing the number of clusters further does not improve the accuracy but may cost longer computation time. Table 10 shows the influence of the degree of fuzziness ($w$) on the CNF; the optimal condition is obtained when $w = 1.05$.

Table 11 compares the CNF performance against CRF. Without additional feature extraction, the CRF accuracy is lower than the CNF. When DBN is used for feature extraction, the accuracy of CRF is slightly higher than CNF, but when FCM clustering is used for feature extraction, the CRF accuracy is again lower than the CNF. From this comparison, it is also learned that the gate function is effective in reducing the computation time of the classification.

Fold Accuracy (%) Computation Time (hours)
g=2 g=3 g=4 g=2 g=3 g=4
1 99.28 99.18 99.74 0.82 0.95 0.90
2 95.41 98.47 99.31 0.79 0.83 0.87
3 97.88 98.95 99.81 0.81 0.95 0.85
4 99.77 99.87 99.91 0.92 0.95 0.88
5 97.11 98.71 99.59 0.79 0.94 0.84
6 95.96 98.67 99.78 0.87 0.80 0.91
7 95.22 98.75 99.58 0.81 0.85 0.83
8 93.69 98.65 99.63 0.80 0.84 0.89
9 96.28 98.27 99.78 0.80 0.82 0.99
10 99.31 99.49 99.77 0.79 0.81 0.89
Average 96.99 98.90 99.69 0.82 0.87 0.89
Table 7: Number of Gates in DBN-CNF for Mitra Keluarga Kemayoran’s Dataset
Fold Accuracy (%) Computation Time (hours)
g=2 g=3 g=4 g=5 g=2 g=3 g=4 g=5
1 94.57 95.51 94.92 94.94 0.28 0.30 0.29 0.32
2 76.64 76.74 77.64 76.19 0.33 0.46 0.37 0.37
3 85.38 86.08 85.18 85.21 0.39 0.37 0.41 0.45
4 96.49 96.61 95.72 95.76 0.28 0.32 0.28 0.31
5 76.79 76.78 79.46 80.30 0.39 0.36 0.37 0.38
6 69.96 72.19 57.54 57.58 0.25 0.34 0.24 0.31
7 93.36 93.84 91.53 91.49 0.47 0.31 0.53 0.50
8 44.67 48.40 46.59 47.39 0.37 0.25 0.30 0.33
9 88.37 88.69 87.34 87.62 0.25 0.32 0.29 0.29
10 91.74 92.22 91.87 91.88 0.33 0.39 0.53 0.40
Average 81.80 82.70 80.78 80.84 0.34 0.34 0.36 0.37
Table 8: Number of Gates in FCM-CNF for Mitra Keluarga Kemayoran’s Dataset
Fold Accuracy (%) Computation Time (hours)
cl=4 cl=5 cl=6 cl=7 cl=8 cl=4 cl=5 cl=6 cl=7 cl=8
1 95.96 95.05 95.51 95.63 95.47 0.25 0.21 0.30 0.46 0.29
2 79.62 78.99 76.74 79.80 77.33 0.40 0.20 0.46 0.31 0.23
3 85.95 86.60 86.08 84.80 85.23 0.30 0.23 0.37 0.24 0.42
4 92.60 96.67 96.61 96.68 95.65 0.31 0.22 0.32 0.27 0.53
5 83.40 81.32 76.78 78.97 80.55 0.20 0.30 0.36 0.61 0.29
6 53.54 71.88 72.19 74.80 72.36 0.25 0.23 0.34 0.27 0.19
7 72.82 75.50 93.84 89.39 91.10 0.27 0.33 0.31 0.29 0.36
8 52.22 44.57 48.40 44.49 44.60 0.30 0.24 0.25 0.54 0.58
9 73.46 81.67 88.69 88.11 88.48 0.37 0.34 0.32 0.31 0.31
10 91.57 91.39 92.22 90.79 91.70 0.18 0.30 0.39 0.21 0.63
Average 78.11 80.36 82.70 82.35 82.25 0.28 0.26 0.34 0.35 0.38
Table 9: Number of Clusters in FCM-CNF for Mitra Keluarga Kemayoran’s Dataset
Fold Accuracy (%) Computation Time (hours)
w=1.05 w=1.1 w=1.2 w=1.3 w=1.4 w=1.05 w=1.1 w=1.2 w=1.3 w=1.4
1 95.51 95.25 95.46 94.17 93.71 0.30 0.51 0.36 0.43 0.46
2 76.74 79.65 78.28 72.40 65.08 0.46 0.29 0.32 0.45 0.56
3 86.08 85.23 86.83 85.92 85.52 0.37 0.52 0.87 0.72 0.58
4 96.61 96.01 94.53 95.23 95.61 0.32 0.53 0.40 0.35 0.60
5 76.78 79.55 79.42 77.23 76.93 0.36 0.49 0.42 0.36 0.41
6 72.19 69.81 72.68 73.49 56.06 0.34 0.43 0.47 0.29 0.54
7 93.84 92.80 92.64 88.37 88.49 0.31 0.24 0.25 0.49 0.39
8 48.40 44.93 48.92 46.64 56.64 0.25 0.45 0.49 0.48 0.47
9 88.69 88.90 86.86 74.24 73.61 0.32 0.46 0.36 0.66 0.65
10 92.22 91.76 90.98 89.85 90.94 0.39 0.64 0.28 0.52 0.50
Average 82.70 82.39 82.66 79.75 78.26 0.34 0.46 0.42 0.47 0.52
Table 10: Degree of Fuzziness (w) in FCM-CNF for Mitra Keluarga Kemayoran’s Dataset
Methods Accuracy (%) Computation Time (hours)
CRF 75.59 0.58
CNF 79.39 0.2
DBN - CRF 99.96 1.02
DBN - CNF 99.69 0.89
FCM - CRF 76 0.46
FCM - CNF 82.7 0.34
Table 11: Comparing all methods for Mitra Keluarga Kemayoran’s Dataset

3 Conclusion

This paper proposed a sequence-based treatment of sleep signals for sleep stage classification using Conditional Neural Fields (CNF). In this study, we compared DBN and FCM clustering as feature extractors. The results show that the CNF is mostly superior in accuracy and computation time compared with CRF, HCRF, LDCRF, and LDCNF. As for feature extraction, we found that DBN is overall better than FCM clustering: in the first dataset, the accuracy with DBN is lower than with FCM clustering but only by a very thin margin, while in the second dataset, the accuracy with the DBN feature extractor is much higher than with FCM clustering. Since FCM clustering was shown to be effective for sleep stage classification on St. Vincent’s dataset, for our future study it might be interesting to elaborate the use of another type of fuzzy clustering, such as Fuzzy Subtractive Clustering (FSC). Unlike FCM, FSC adaptively determines the number of effective clusters directly from the data and computes memberships based on density.

4 Acknowledgment

This work is supported by the Higher Education Center of Excellence Research Grant funded by the Indonesian Ministry of Research and Higher Education, Contract No. 1068/UN2.R12/HKP.05.00/2016.

5 Conflict of Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

  1. C. Sutton and A. McCallum. An introduction to conditional random fields for relational learning. MIT Press, pages 93–128, 2006.
  2. B. Cooper and M. Lipsitch. The analysis of hospital infection data using hidden markov models. Biostatistics, 5.2:223–237, 1995.
  3. M. Elmezain and A. Al-hamadi. A hidden markov model-based isolated and meaningful hand gesture recognition. In Proceedings of World Academy of Science, Engineering and Technology, pages 394–401, 2008.
  4. A. G. et al. Hidden conditional random fields for phone classification. Interspeech, pages 1117–1120, 2005.
  5. A. Q. et al. Hidden conditional random fields. IEEE Trans. Pattern Anal. Mach. Intell., 29:1848–1852, 2007.
  6. S. B. et al. Language recognition using latent dynamic conditional random field model with phonological features. Mathematical Problems in Engineering, 2014.
  7. V. P. J. et al. Sleep stages classification using wavelet transform and neural network. In Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics, pages 71–74, 2012.
  8. Y. L. et al. Implementation of the fuzzy c-means clustering algorithm in meteorological data. International Journal of Database Theory and Application, pages 1–13, 2013.
  9. I. H. et al. An integrated sleep stage classification device based on electrocardiograph signal. 2012 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2012.
  10. E. P. Giri, M. Fanany, and A. M. Arymurthy. Sleep stages classification using shallow classifier. International Conference on Advanced Computer Science and Information Systems ICACSIS, pages 297–301, 2015.
  11. G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, pages 1527–1554, 2006.
  12. J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Morgan Kaufmann Publishers Inc., 2001.
  13. M. Langkvist, L. Karlsson, and A. Loutfi. Sleep stage classification using unsupervised feature learning. Advances in Artificial Neural Systems, 2012.
  14. Lawrence Rabiner and Biing-Hwang Juang. Fundamentals Of Speech Recognition. Prentice Hall, 1993.
  15. J.-C. Lévesque, L.-P. Morency, and C. Gagné. Sequential emotion recognition using latent dynamic conditional neural fields. In Proc. 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2013.
  16. T. Liu, X. Huang, and J. Ma. Conditional random fields for image labeling. Mathematical Problems in Engineering, 2016.
  17. J. Peng, L. Bo, and J. Xu. Conditional neural fields. In Proceedings of Neural Information Processing Systems (NIPS), 2009.
  18. A. Quattoni, M. Collins, and T. Darrell. Conditional random fields for object recognition. NIPS, pages 1097–1104, 2004.
  19. A. M. Rahimi, R. Ruschel, and B. S. Manjunath. Cuav sensor fusion with latent-dynamic conditional random fields in coronal plane estimation. CVPR, 2016.
  20. Z. S. Fuzzy-based latent-dynamic conditional random fields for continuous gesture recognition. Optical Engineering, 51(6), 2012.
  21. C. Spampinato and S. Palazzo. Hidden markov models for detecting anomalous fish trajectories in underwater footage. 2012 IEEE International Workshop on Machine Learning for Signal Processing, 2012.
  22. C. Sutton and A. McCallum. An introduction to conditional random fields. Foundations and Trends in Machine Learning, 2011.
  23. Y. Tong, R. Chen, and J. Gao. Hidden state conditional random field for abnormal activity recognition in smart homes. Entropy, 2015.
  24. C. Wittner, B. Schauerte, and R. Stiefelhagen. What’s the point? frame-wise pointing gesture recognition with latent-dynamic conditional random fields. arXiv, 2015.
  25. B.-J. Yoon. Hidden markov models and their applications in biological sequence analysis. Current Genomics, 10:402–415, 2009.
  26. I. N. Yulita, T. H. Liong, and Adiwijaya. Fuzzy hidden markov models for indonesian speech classification. Journal of Advanced Computational Intelligence and Intelligent Informatics (JACIII), 16, 2012.
  27. J. Zhang, Y. Wu, J. Bai, and F. Chen. Automatic sleep stage classification based on sparse deep belief net and combination of multiple classifiers. Sage Journals, 14:1–9, 2015.