# Entropy-Assisted Multi-Modal Emotion Recognition Framework Based on Physiological Signals

###### Abstract

With the growing importance of human-computer interface systems, understanding human emotional states has become an important capability for computers. This paper aims to improve emotion recognition performance through complexity analysis of physiological signals. Using the AMIGOS dataset, we extracted several entropy-domain features: Refined Composite Multi-Scale Entropy (RCMSE) and Refined Composite Multi-Scale Permutation Entropy (RCMPE) from ECG and GSR signals, and Multivariate Multi-Scale Entropy (MMSE) and Multivariate Multi-Scale Permutation Entropy (MMPE) from EEG signals. The statistical results show that RCMSE of GSR has a dominating performance in arousal, while RCMPE of GSR is an excellent feature in valence. Furthermore, we selected the XGBoost model to predict emotion and obtained 68% accuracy in arousal and 84% in valence.

## I Introduction

Owing to advances in technology and the growing number of Internet of Things (IoT) devices, building a complete human-computer interaction (HCI) system is becoming increasingly vital. However, for a machine to interact with humans appropriately, it must be able to take human affect into account. Hence, the importance of affective computing has grown by leaps and bounds.

Many methods have been proposed to identify people’s emotional states from facial expressions, body movements, or speech [1], [2]. However, since people tend to hide their true emotions behind a social mask, these methods may not faithfully reflect a person’s emotional state. In contrast, physiological measurements have several merits for developing emotion-based HCI systems. First, physiological signals such as EEG, ECG, and GSR cannot easily be controlled consciously, which avoids the social masking problem. Second, most physiological signals are not culturally specific, so they are a favorable basis for building a user-independent and standardized HCI system.

Among several physiological signal datasets for emotion recognition, we selected AMIGOS [3], proposed by Juan Abdon Miranda-Correa et al. It is a dataset for multi-modal research on human affect that collected the physiological signals of 40 participants, namely Electroencephalogram (EEG), Electrocardiogram (ECG), and Galvanic Skin Response (GSR), while they watched emotional videos.

AMIGOS extracted a total of 213 features in the time domain and the frequency domain and then conducted an emotion recognition task. However, after implementing the AMIGOS emotion recognition protocol, we inferred that the time-domain and frequency-domain features might not be robust, which resulted in low F1-scores. Meanwhile, the complexity of physiological signals has attracted much attention in recent decades. Madalena Costa et al. used Multi-Scale Entropy (MSE) to successfully separate healthy and pathologic groups [4], and Massimiliano Zanin et al. proposed Permutation Entropy to study epilepsy [5]. Following this trend, we aimed to use several entropy-based analyses to enhance emotion recognition performance. In this paper, we analyzed the patterns of Refined Composite Multi-Scale Entropy (RCMSE), Multivariate Multi-Scale Entropy (MMSE), Refined Composite Multi-Scale Permutation Entropy (RCMPE), and Multivariate Multi-Scale Permutation Entropy (MMPE) in different affective states. Furthermore, we made several modifications to the original AMIGOS experimental protocol and implemented an emotion recognition task.

Our research makes two main contributions. 1) Through statistical analysis, we found several notable correlations between the complexity of physiological signals and human affect. 2) By applying a new feature set and a different machine learning model, our classification results outperformed the previous AMIGOS results by a large margin.

The rest of this paper is organized into four sections. Section II introduces the AMIGOS dataset and the basic processing flow. Section III explains our methods for extracting four kinds of entropy-domain features: RCMSE, MMSE, RCMPE, and MMPE. Section IV lists the statistical and classification results of our experiments. Finally, Section V discusses the conclusion and future work.

## II Background

### II-A Dataset

To date, several datasets have been established for affective computing experiments. DEAP [6], one of the most commonly used, collected EEG, peripheral physiological signals, and face videos from participants via clinical devices. In contrast to DEAP, ASCERTAIN [7] was the first dataset to use commercial wearable devices and to analyze the personal traits of the participants, which is conducive to building HCI architectures. AMIGOS [3] is the newest dataset for affective computing. Its participants watched several video clips while state-of-the-art commercial devices recorded their physiological signals, and they provided self-assessments of valence (the intrinsic positive or negative feeling) and arousal (the extent of psychological excitation). Although DEAP is well suited to clinical analysis, its complicated set-up process is unfavorable for building HCI systems. As for ASCERTAIN, its relatively unstable signals and low sampling rate (32 Hz for EEG) might hinder comprehensive analysis. AMIGOS, by contrast, collected signals at a high sampling rate (128 Hz for EEG) and also provides other annotations such as social context, basic emotions, and external annotation of valence and arousal, so it can be applied to various types of emotion recognition tasks. Given these pros and cons, we selected AMIGOS as the dataset for our experiment.

In AMIGOS, participants’ emotions were annotated on a scale of 0 to 9 for the levels of valence and arousal. In our experiment, we split the emotional levels into two classes, positive and negative, based on the mean value of all assessment levels. This dichotomy simplified the statistical analysis in our later experiments.
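The mean-based split described above can be sketched as follows. This is only an illustration: the ratings are hypothetical, the function name `binarize_by_mean` is ours, and sending a rating that equals the mean to the negative class is an assumption the text does not settle.

```python
from statistics import mean


def binarize_by_mean(ratings):
    """Label each rating 1 (positive) if it exceeds the mean rating, else 0 (negative)."""
    threshold = mean(ratings)
    # Assumption: a rating exactly equal to the mean goes to the negative class.
    return [1 if rating > threshold else 0 for rating in ratings]


# Hypothetical valence self-assessments for six video clips.
valence_ratings = [2.0, 7.5, 5.0, 8.0, 3.5, 6.0]
valence_labels = binarize_by_mean(valence_ratings)
print(valence_labels)
```

The same function would be applied separately to the arousal ratings, giving one binary label per dimension for each sample.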

### II-B Processing Flow

The AMIGOS experimental protocol follows the typical processing flow of affective computing, as shown in Fig. 1. First, the physiological signals from the participants enter the flow and undergo preprocessing steps such as detrending and filtering to remove artifacts and noise. Next, several features are extracted from the processed signals, typically in the time domain and the frequency domain. The features are then fed into a machine learning model, which finally outputs its prediction of the person’s emotional state.

On the basis of the original AMIGOS experimental protocol, we made some modifications to improve performance. In addition to time-domain and frequency-domain features, we added several entropy-domain features to our feature space, and we replaced the classification model with XGBoost. Our modifications are marked in blue in Fig. 1.

## III Method

### III-A Entropy-Domain Features

#### III-A1 Refined Composite Multi-Scale Entropy (RCMSE) [8]

Before introducing RCMSE, we first review the concept of sample entropy. It is defined as:

$$\mathrm{SampEn}(x, m, r) = -\ln\frac{n^{m+1}}{n^{m}} \tag{1}$$

where $x$ is the original time series, $n^{m}$ represents the total number of $m$-dimensional matched template vector pairs, $m$ is the pattern length, and $r$ is the maximum tolerance. A smaller value of $n^{m+1}/n^{m}$ leads to a higher value of SampEn, which indicates that the time series is more disordered. Based on this concept, Multi-Scale Entropy (MSE) was proposed and has been widely used to analyze physiological signals in several experiments [9].
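A minimal sketch of sample entropy in plain Python may make the counting concrete. It assumes the usual conventions, which the text does not spell out: templates are compared with the maximum (Chebyshev) distance, self-matches are excluded, and the same number of templates is used for lengths m and m + 1. The names `count_matches` and `sample_entropy` are ours.

```python
import math
from statistics import pstdev


def count_matches(x, length, n_templates, r):
    """Count template pairs (i < j) of the given length whose maximum distance is within r."""
    count = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                count += 1
    return count


def sample_entropy(x, m, r):
    """Negative log of the ratio of (m + 1)-length matches to m-length matches."""
    n_templates = len(x) - m  # same template count for both lengths
    n_m = count_matches(x, m, n_templates, r)
    n_m1 = count_matches(x, m + 1, n_templates, r)
    if n_m == 0 or n_m1 == 0:
        return float("nan")  # undefined: the logarithm cannot be evaluated
    return -math.log(n_m1 / n_m)


# Hypothetical test signal; the tolerance is a fraction of its standard deviation.
signal = [math.sin(0.5 * i) for i in range(100)]
print(sample_entropy(signal, m=2, r=0.2 * pstdev(signal)))
```

The `nan` branch is the undefined-value case that arises when a short series yields no matched pairs.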

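RCMSE is an adaptation of MSE that resolves the problem of undefined values in the logarithm calculation. RCMSE has two steps. In the first step, for each scale factor $\tau$, $\tau$ coarse-grained time series are derived from the original time series $x$. The $j$-th point of the $k$-th coarse-grained time series $y_{k}^{(\tau)}$ is defined as follows:

$$y_{k,j}^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+k}^{j\tau+k-1} x_i, \quad 1 \le j \le \frac{N}{\tau},\ 1 \le k \le \tau, \tag{2}$$

where $x$ is the original time series and $N$ is the length of $x$.

In the second step, RCMSE is calculated as:

$$\mathrm{RCMSE}(x, \tau, m, r) = -\ln\frac{\bar{n}_{k,\tau}^{m+1}}{\bar{n}_{k,\tau}^{m}} \tag{3}$$

where $\bar{n}_{k,\tau}^{m} = \frac{1}{\tau}\sum_{k=1}^{\tau} n_{k,\tau}^{m}$. Here $n_{k,\tau}^{m}$ represents the total number of $m$-dimensional matched vector pairs calculated from the $k$-th coarse-grained time series at a scale factor of $\tau$.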

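Because the recordings in the AMIGOS dataset are at most 150 seconds long, we chose the parameters so as to avoid undefined values in the logarithm calculation: we set the pattern length $m$, the scale factor $\tau$, and $r = 0.2 \times$ (std of the original signal $x$) to analyze ECG, and likewise set $m$, $\tau$, and $r = 0.2 \times$ (std of the original signal $x$) for GSR (see Tables I and II for the values examined).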
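The two RCMSE steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' code: templates are matched with the maximum distance, a coarse-grained series simply stops when the next window would run past the end of the signal, and the population standard deviation is used for the tolerance. All function names are ours.

```python
import math
from statistics import pstdev


def coarse_grain(x, tau, k):
    """k-th coarse-grained series (k = 1..tau): means of length-tau windows from offset k - 1."""
    start = k - 1
    n_points = (len(x) - start) // tau
    return [sum(x[start + j * tau:start + (j + 1) * tau]) / tau for j in range(n_points)]


def count_matches(x, length, n_templates, r):
    """Count template pairs (i < j) of the given length whose maximum distance is within r."""
    count = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if all(abs(x[i + a] - x[j + a]) <= r for a in range(length)):
                count += 1
    return count


def rcmse(x, tau, m, r):
    """Pool match counts over all tau coarse-grained series before taking the logarithm."""
    total_m = 0
    total_m1 = 0
    for k in range(1, tau + 1):
        y = coarse_grain(x, tau, k)
        n_templates = len(y) - m
        total_m += count_matches(y, m, n_templates, r)
        total_m1 += count_matches(y, m + 1, n_templates, r)
    if total_m == 0 or total_m1 == 0:
        return float("nan")  # still undefined if no pairs match at all
    # The ratio of the sums equals the ratio of the averages over k.
    return -math.log(total_m1 / total_m)


# Hypothetical signal; r is 0.2 times the standard deviation of the original signal.
signal = [math.sin(0.3 * i) for i in range(200)]
tolerance = 0.2 * pstdev(signal)
for scale in (1, 2, 3):
    print(scale, rcmse(signal, scale, 2, tolerance))
```

Pooling the counts across the offsets is what makes an undefined value less likely than computing sample entropy on a single shortened series.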

#### III-A2 Multivariate Multi-Scale Entropy (MMSE) [10]

Since the EEG channels are correlated with one another, we use Multivariate Multi-Scale Entropy (MMSE) to analyze the $p$-variate time series. The first step is to obtain coarse-grained time series for the signals of all the considered channels, defined as

$$y_{k,j}^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x_{k,i} \tag{4}$$

where $k = 1, 2, \ldots, p$ is the channel index (not the coarse-graining offset of equation (2)) and $j$ is the index of the new coarse-grained time series. We then create a new template vector $X_m(i)$ that concatenates $m_k$ consecutive samples from each channel $k$. The matched pairs of equation (1) are counted on this new template vector to give $n^{m}$. Next, $m_k$ is increased by one for time series $1$ to time series $p$ in turn, and the matched pairs of each extended template vector are counted, which is denoted by $n_{k}^{m+1}$. Finally, MMSE is defined as

$$\mathrm{MMSE}(x, \tau, m, r) = -\ln\frac{\bar{n}^{m+1}}{n^{m}} \tag{5}$$

where $\bar{n}^{m+1} = \frac{1}{p}\sum_{k=1}^{p} n_{k}^{m+1}$. In our experiment, we divided the EEG channels into five groups based on their locations: (AF3, AF4), (F7, F3, FC5, F4, F8, FC6), (T7, T8), (P7, P8), and (O1, O2). We first normalized every time series by its mean and standard deviation, and then set $m_k$ for every channel $k$ and $r = 0.15 \times$ (std of the normalized signals) to calculate MMSE.
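The preparation steps for MMSE (channel grouping, normalization, per-channel coarse-graining, and the composite template vector) can be sketched as below. The sketch stops short of the full multivariate match counting. The group numbering follows the order in which the groups are listed above, the time lag between samples is assumed to be 1, and the function names are ours.

```python
from statistics import mean, pstdev

# Channel groups in the order listed in the text (group numbers are an assumption).
EEG_GROUPS = {
    1: ["AF3", "AF4"],
    2: ["F7", "F3", "FC5", "F4", "F8", "FC6"],
    3: ["T7", "T8"],
    4: ["P7", "P8"],
    5: ["O1", "O2"],
}


def z_normalize(x):
    """Normalize one channel by its mean and (population) standard deviation."""
    mu, sigma = mean(x), pstdev(x)
    return [(value - mu) / sigma for value in x]


def coarse_grain_channels(channels, tau):
    """Average non-overlapping windows of length tau in every channel."""
    coarse = []
    for x in channels:
        n_points = len(x) // tau
        coarse.append([sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_points)])
    return coarse


def composite_template(channels, i, dims):
    """Concatenate dims[k] consecutive samples of channel k, starting at index i (lag 1 assumed)."""
    template = []
    for x, m_k in zip(channels, dims):
        template.extend(x[i:i + m_k])
    return template


# Hypothetical two-channel group.
group = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
print(coarse_grain_channels(group, 2))
print(composite_template(group, 1, [2, 1]))
```

Extending one entry of `dims` by one at a time, for each channel in turn, gives the extended template vectors whose match counts are averaged in the numerator.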

#### III-A3 Refined Composite Multi-Scale Permutation Entropy (RCMPE)

Permutation entropy (PE) is another common method for evaluating the complexity of a signal [5]. For a signal of length $N$, the PE value is defined as

$$\mathrm{PE}(d) = -\sum_{i=1}^{d!} P(\pi_i)\ln P(\pi_i) \tag{6}$$

where $d$ is the embedding dimension of the permutation pattern, $\pi_i$ are the distinct patterns, $i$ is the index of the permutation, and $P(\pi_i)$ is the relative frequency of the permutation $\pi_i$. These patterns are often called motifs, which indicate different kinds of amplitude variation in the signal. The value of PE always lies between $0$ and $\ln(d!)$: the lower bound is attained by monotonically increasing or decreasing time series, and the upper bound by a random time series in which all motifs have the same frequency. Multi-Scale PE (MPE), which incorporates coarse-graining into PE, is often used for physiological signals because of its robust performance [11].
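Permutation entropy is short enough to sketch directly. The version below assumes a time delay of 1 between samples and breaks ties by position, neither of which is specified in the text; the function names are ours.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Indices that sort the window in ascending order (ties keep their original order)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def pattern_frequencies(x, d):
    """Relative frequency of every ordinal pattern of length d that occurs in x."""
    counts = Counter(ordinal_pattern(x[i:i + d]) for i in range(len(x) - d + 1))
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


def permutation_entropy(x, d):
    """Shannon entropy (natural log) of the ordinal pattern distribution."""
    return -sum(p * math.log(p) for p in pattern_frequencies(x, d).values())


# A monotonic series has a single motif; an alternating series uses both motifs of d = 2.
print(permutation_entropy(list(range(10)), 3))
print(permutation_entropy([0, 1, 0, 1, 0], 2))
```

Patterns that never occur contribute nothing to the sum, so only observed motifs are stored.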

RCMPE is a modified version of MPE proposed by Humeau-Heurtier et al. [12]. It overcomes a drawback of MPE, namely that statistical reliability decreases as the coarse-graining procedure shortens the time series. RCMPE has two steps. First, the coarse-grained time series $y_{k}^{(\tau)}$ are derived from the original signal as in equation (2). Then, the RCMPE value is defined as

$$\mathrm{RCMPE}(x, \tau, d) = -\sum_{i=1}^{d!} \bar{P}^{(\tau)}(\pi_i)\ln \bar{P}^{(\tau)}(\pi_i) \tag{7}$$

where $i$ is the index of the permutation and $\bar{P}^{(\tau)}(\pi_i)$ is the average relative frequency of the permutation $\pi_i$ over all of the coarse-grained time series $y_{k}^{(\tau)}$, $1 \le k \le \tau$.

In our experiment, we set the embedding dimension $d$ and the scale factor $\tau$ separately for ECG and for GSR (see Tables III and IV for the values examined).
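A sketch of RCMPE under the same assumptions as for plain permutation entropy (time delay of 1, ties broken by position) is given below. The relative frequency of each motif is averaged over the coarse-grained series obtained from all offsets at a given scale before the entropy is taken. Function names are ours.

```python
import math
from collections import Counter


def coarse_grain(x, tau, k):
    """k-th coarse-grained series (k = 1..tau): means of length-tau windows from offset k - 1."""
    start = k - 1
    n_points = (len(x) - start) // tau
    return [sum(x[start + j * tau:start + (j + 1) * tau]) / tau for j in range(n_points)]


def ordinal_pattern(window):
    """Indices that sort the window in ascending order (ties keep their original order)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def rcmpe(x, tau, d):
    """Entropy of the motif frequencies averaged over all tau coarse-grained series."""
    averaged = Counter()
    for k in range(1, tau + 1):
        y = coarse_grain(x, tau, k)
        n_windows = len(y) - d + 1
        if n_windows < 1:
            raise ValueError("coarse-grained series is shorter than d")
        counts = Counter(ordinal_pattern(y[i:i + d]) for i in range(n_windows))
        for pattern, count in counts.items():
            averaged[pattern] += count / n_windows / tau
    return -sum(p * math.log(p) for p in averaged.values())


# Hypothetical signal evaluated at three scales.
signal = [math.sin(0.3 * i) + 0.1 * math.cos(2.1 * i) for i in range(200)]
for scale in (1, 2, 3):
    print(scale, rcmpe(signal, scale, 3))
```

At a scale of 1 there is a single coarse-grained series equal to the input, so the result reduces to ordinary permutation entropy.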

#### III-A4 Multivariate Multi-Scale Permutation Entropy (MMPE)

As mentioned in Section III-A2, a different approach that considers the correlation between channels is needed for $p$-variate time series such as EEG signals. We select MMPE, proposed by Morabito et al. [13]. The first step is to obtain coarse-grained time series for the signals of all the considered channels, as defined in (4). Then we calculate MMPE as

$$\mathrm{MMPE}(x, \tau, d) = -\sum_{i=1}^{d!} \bar{P}^{(\tau)}(\pi_i)\ln \bar{P}^{(\tau)}(\pi_i) \tag{8}$$

where $i$ is the index of the permutation and $\bar{P}^{(\tau)}(\pi_i)$ is the average relative frequency of the permutation $\pi_i$ over all of the coarse-grained time series $y_{k}^{(\tau)}$, one per channel $k = 1, \ldots, p$.

The settings of MMPE are mostly the same as those of MMSE, except for the embedding dimension $d$ and the scale factor $\tau$ (see Table V for the values examined).
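Following the description above, MMPE can be sketched by coarse-graining each channel, computing motif frequencies per channel, and averaging them across channels. This reading (a plain average over channels, time delay of 1) is an assumption based on the wording here; the original formulation in [13] may differ in detail. Function names are ours.

```python
import math
from collections import Counter


def coarse_grain(x, tau):
    """Average non-overlapping windows of length tau."""
    n_points = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_points)]


def ordinal_pattern(window):
    """Indices that sort the window in ascending order (ties keep their original order)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def mmpe(channels, tau, d):
    """Entropy of the motif frequencies averaged over the coarse-grained channels."""
    averaged = Counter()
    for x in channels:
        y = coarse_grain(x, tau)
        n_windows = len(y) - d + 1
        if n_windows < 1:
            raise ValueError("coarse-grained series is shorter than d")
        counts = Counter(ordinal_pattern(y[i:i + d]) for i in range(n_windows))
        for pattern, count in counts.items():
            averaged[pattern] += count / n_windows / len(channels)
    return -sum(p * math.log(p) for p in averaged.values())


# Hypothetical two-channel group: one rising and one falling channel.
print(mmpe([[1, 2, 3, 4], [4, 3, 2, 1]], 1, 2))
```

In this toy case each channel contributes a single motif, so the averaged distribution is uniform over the two motifs of dimension 2.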

### III-B Extreme Gradient Boosting (XGBoost)

We select XGBoost as our classification model to predict emotion. XGBoost is a scalable and flexible machine learning method based on gradient boosting, proposed by Tianqi Chen and Carlos Guestrin in 2016 [14]. It has become one of the most popular methods in machine learning competitions because of its exceptional performance on supervised learning problems.

The basis of XGBoost, gradient boosting, is an ensemble technique in which a collection of predictors, commonly decision trees, are combined sequentially into a stronger model [15]. The output of the combined model can be denoted as

$$\hat{y} = \sum_{t=1}^{T} f_t(x) \tag{9}$$

where $f_t$ is one of the predictors, $T$ is the total number of predictors, and $x$ is the input feature. XGBoost optimizes a specific loss function at each iteration of gradient boosting:

$$\mathcal{L}(\theta) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{t=1}^{T} \Omega(f_t) \tag{10}$$

where $\theta$ denotes the parameters of the model, $l$ is the training loss function, $y_i$ and $\hat{y}_i$ are the ground truth and the predicted value respectively, $\Omega$ is the regularization term, and $T$ is the total number of predictors. $l$ indicates how well the predictor is performing, and the logistic loss is commonly used for it. $\Omega$ controls how complex the model is; adding it to the objective function helps avoid over-fitting.
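To make the two parts of the objective concrete, the sketch below evaluates a logistic training loss plus a tree penalty for a toy model. The penalty form used here, a cost per leaf plus an L2 penalty on the leaf weights, is the one given in the XGBoost paper [14]; the text above does not state it, so it is included as an assumption, and the numeric values of `gamma` and `lam` are arbitrary. Function names are ours.

```python
import math


def logistic_loss(label, score):
    """Negative log-likelihood of a binary label (0 or 1) given a raw model score."""
    return math.log1p(math.exp(-score)) if label == 1 else math.log1p(math.exp(score))


def tree_penalty(leaf_weights, gamma, lam):
    """Complexity of one tree: gamma per leaf plus half of lam times the squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(labels, scores, trees, gamma, lam):
    """Training loss summed over samples plus the penalty summed over trees."""
    training_loss = sum(logistic_loss(y, s) for y, s in zip(labels, scores))
    regularization = sum(tree_penalty(leaves, gamma, lam) for leaves in trees)
    return training_loss + regularization


# Toy example: two samples with raw score 0 and one tree with two leaves.
print(objective([1, 0], [0.0, 0.0], [[1.0, -1.0]], gamma=0.5, lam=2.0))
```

Here each raw score stands for the summed output of all trees for one sample, and each tree is represented only by its list of leaf weights.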

Since decision trees are typically selected as the predictors, the importance of each feature can be calculated by counting how many times the feature is used to split the data across all the trees. This is particularly useful for evaluating the efficacy of the entropy-domain features.
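Split counting can be illustrated on a hypothetical tree representation. The nested-dictionary format and the feature names below are invented for this sketch and are not the internal format of any library.

```python
from collections import Counter


def count_splits(node, counts):
    """Walk one tree and count how often each feature is used at an internal node."""
    if "feature" not in node:  # a leaf has no split
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


def split_count_importance(trees):
    """Total number of splits per feature across all trees."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts


# Two hypothetical trees; leaves carry only a weight.
trees = [
    {"feature": "rcmpe_gsr", "left": {"weight": 0.2},
     "right": {"feature": "gsr_power", "left": {"weight": -0.1}, "right": {"weight": 0.3}}},
    {"feature": "rcmpe_gsr", "left": {"weight": 0.1}, "right": {"weight": -0.2}},
]
print(split_count_importance(trees).most_common())
```

Sorting the counts in descending order gives a ranking of features by how often the ensemble relied on them.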

## IV Experiment Setup and Results

We conducted two sets of experiments: statistical analysis and classification. The first was to discover how statistically significant the new entropy-domain features were, so that we could employ this knowledge in the next step. Classification was the main task of emotion recognition, which we treated as a binary classification problem. Our goal was to classify the arousal and valence classes from the physiological signals of the corresponding subject.

In our experiment, only the short videos were considered (there are 16 short videos per subject). The data of 7 subjects were removed because of bad signal quality and missing data in some of the modalities. Therefore, the total number of samples in the dataset changed from 640 (40 subjects × 16 videos) to 528 (33 subjects × 16 videos).

### IV-A Statistical Analysis

Analysis of Variance (ANOVA) was adopted for the statistical analysis of the extracted features. It calculates p-values by comparing the variation within groups against the variation among groups. A common threshold for a statistically significant difference is 0.05. We used ANOVA to compute the p-values of the entropy-domain features, as shown in Tables I, II, III, IV, and V. Boldface marks p-values smaller than 0.1, and italics mark p-values smaller than 0.05.
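The comparison of among-group and within-group variation can be sketched as a one-way ANOVA F statistic. The sketch computes only F; the p-value is the upper tail of the F distribution with (k − 1, n − k) degrees of freedom, which a statistics package such as SciPy's `scipy.stats.f_oneway` can supply. The feature values are hypothetical and the function name is ours.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: among-group mean square over within-group mean square."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)
    n = len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical entropy feature values for the negative and positive classes.
negative_class = [1.0, 2.0, 3.0]
positive_class = [2.0, 3.0, 4.0]
print(anova_f([negative_class, positive_class]))
```

A larger F means the class means differ more relative to the spread inside each class, which corresponds to a smaller p-value.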

#### IV-A1 RCMSE and MMSE

The p-values of RCMSE and MMSE for ECG and GSR are shown in Tables I and II. RCMSE of ECG has the best p-values when the scale factor is set to 2, for both arousal and valence. For RCMSE of GSR in arousal, the p-value falls below 0.1 when the scale factor is greater than 5 and reaches 0.01 at scale factors of 12 and above. In these settings, the positive class always has higher RCMSE, implying that the arousal of a subject increases with the complexity of the physiological signal. Note that GSR responds relatively slowly to affect, which is why the p-values become significant as the scale factor increases. As for MMSE of EEG, no feature has a p-value smaller than 0.1.

Table I: p-values of RCMSE of ECG.

scale | 1 | 2 | 3 |
---|---|---|---|
m=0 | 0.91 | 0.08 | 0.98 |
m=1 | 0.16 | 0.07 | 0.38 |
m=2 | 0.13 | 0.82 | 0.11 |

scale | 1 | 2 | 3 |
---|---|---|---|
m=0 | 0.91 | 0.04 | 0.08 |
m=1 | 0.46 | 0.06 | 0.14 |
m=2 | 0.76 | 0.01 | 0.91 |

Table II: p-values of RCMSE of GSR (A: arousal, V: valence).

scale | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
A (2) | 0.23 | 0.20 | 0.18 | 0.11 | 0.10 | 0.07 | 0.05 | 0.04 | 0.03 | 0.03 |
V (2) | 0.25 | 0.33 | 0.28 | 0.34 | 0.29 | 0.31 | 0.33 | 0.35 | 0.33 | 0.32 |

scale | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|
A (2) | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
V (2) | 0.31 | 0.29 | 0.29 | 0.29 | 0.28 | 0.27 | 0.27 | 0.27 | 0.27 | 0.26 |

#### IV-A2 RCMPE and MMPE

The p-values of RCMPE and MMPE of ECG, GSR, and EEG are shown in Tables III, IV, and V. RCMPE of ECG performs better in valence than in arousal, with significantly low p-values (0.01) at all scales. RCMPE of GSR performs better in arousal, where most of the p-values are lower than 0.05. The positive class has higher RCMPE, which is congruent with the case of RCMSE of GSR. MMPE of EEG performs much better in valence as the scale factor goes up, which indicates the importance of the coarse-graining step.

Table III: p-values of RCMPE of ECG (A: arousal, V: valence).

scale | 1 | 2 | 3 |
---|---|---|---|
A (3) | 0.03 | 0.03 | 0.1 |
V (6) | 0.01 | 0.01 | 0.01 |

Table IV: p-values of RCMPE of GSR (A: arousal, V: valence).

scale | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
A (2) | 0.01 | 0.08 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
V (5) | 0.06 | 0.13 | 0.18 | 0.22 | 0.25 | 0.28 | 0.31 | 0.32 | 0.34 | 0.36 |

scale | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|
A (2) | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
V (5) | 0.37 | 0.38 | 0.39 | 0.39 | 0.40 | 0.40 | 0.41 | 0.41 | 0.41 | 0.41 |

Table V: p-values of MMPE of EEG (A: arousal, V: valence).

scale | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
A (4) | 0.43 | 0.46 | 0.99 | 0.82 | 0.66 | 0.45 | 0.54 | 0.52 | 0.33 | 0.47 |
V (6) | 0.30 | 0.37 | 0.38 | 0.46 | 0.48 | 0.46 | 0.38 | 0.30 | 0.25 | 0.15 |

scale | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|
A (4) | 0.26 | 0.22 | 0.27 | 0.12 | 0.13 | 0.1 | 0.14 | 0.17 | 0.04 | 0.17 |
V (4) | 0.08 | 0.03 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |

### IV-B Classification Results

We used XGBoost as our classification model, fixed the maximum depth and the number of estimators to compensate for the different sizes of input, and applied grid search for parameter tuning. Classification performance was evaluated in terms of the mean F1-score, where the F1-score is the harmonic mean of precision and recall. The macro version of the F1-score was used so that both the positive and negative classes were considered. We employed leave-one-subject-out cross-validation, in which the classification models were trained on all data except the videos of one subject, which were then used for testing.
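The evaluation scheme can be sketched with two small helpers: one that yields leave-one-subject-out splits and one that computes the macro F1-score for binary labels. The classifier itself is omitted, and the subject identifiers and predictions are hypothetical. Averaging the per-fold score across subjects to obtain the reported mean is our reading of "mean F1-score", not something the text states explicitly.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held-out subject, training indices, test indices) for every subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of the F1-scores of the negative (0) and positive (1) classes."""
    per_class = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical sample-to-subject assignment: five samples from three subjects.
subjects = [1, 1, 2, 2, 3]
for held_out, train, test in leave_one_subject_out(subjects):
    print(held_out, train, test)

print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because every sample of the held-out subject is excluded from training, the score reflects how well the model transfers to a person it has never seen.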

The emotion recognition performance is shown in Table VI. Scheme I is the one reported in [3], where 213 traditional features and Gaussian Naive Bayes were employed. Scheme II was also fed with the traditional features but used XGBoost as the classification model. Scheme III is the one we propose, which uses entropy-domain features and XGBoost. We concatenated the traditional features with the new entropy-domain features that were statistically significant (p-value < 0.05). There were 41 and 101 new entropy-domain features for arousal and valence, respectively.

The results show that the entropy-assisted model (III) we propose has the best performance in most situations. For single modalities, all three raise the F1-score in valence by around 10% or more. Large improvements over the previous method (Scheme I), +12.1% and +25.3% for arousal and valence respectively, are found for the Fusion modalities. An improvement of +17.8% is found in EEG for valence between Scheme II and Scheme III. These improvements demonstrate the efficacy of the proposed scheme. The dominant features, in terms of feature importance for the Fusion modalities in Scheme III, are shown in Table VII, where entropy-domain features are highlighted in boldface. The selection of entropy-domain features (MMPE of EEG and RCMPE of ECG) accounts for the improvement in Fusion performance between Scheme II and Scheme III, where gains of +1.9% and +2.9% are found for arousal and valence.

Table VI: Mean F1-scores of the three schemes (up: arousal, down: valence).

Scheme (A) | EEG | ECG | GSR | Fusion |
---|---|---|---|---|
I [3] | 0.592 | 0.550 | 0.548 | 0.585 |
II | 0.568 | 0.556 | 0.665 | 0.687 |
III | 0.568 | 0.561 | 0.692 | 0.706 |

Scheme (V) | EEG | ECG | GSR | Fusion |
---|---|---|---|---|
I [3] | 0.576 | 0.535 | 0.531 | 0.570 |
II | 0.575 | 0.621 | 0.796 | 0.794 |
III | 0.753 | 0.633 | 0.796 | 0.823 |
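The improvements quoted in the text are differences between F1-scores, expressed in percentage points. As a quick check, the short sketch below recomputes them from the Fusion column of the table; the dictionary layout and the function name are ours.

```python
# Fusion F1-scores copied from the table above.
fusion_f1 = {
    "arousal": {"I": 0.585, "II": 0.687, "III": 0.706},
    "valence": {"I": 0.570, "II": 0.794, "III": 0.823},
}


def gain_in_points(dimension, baseline, proposed):
    """Difference between two Fusion F1-scores, in percentage points."""
    scores = fusion_f1[dimension]
    return round((scores[proposed] - scores[baseline]) * 100, 1)


for dimension in ("arousal", "valence"):
    print(dimension, gain_in_points(dimension, "I", "III"), gain_in_points(dimension, "II", "III"))
```

These are absolute differences between scores rather than relative changes.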

Table VII: Dominant features of the Fusion modalities in Scheme III (A: arousal, V: valence).

Dimension | Dominant features |
---|---|
A | Spectral power of GSR in [0.0 0.2] Hz |
A | Spectral power of ECG in [1.8 1.9] Hz |
A | Spectral power of HRV in [0.01 0.08] Hz |
A | Mean derivative of skin conductance slow response (SCSR) |
A | Number of local minima in GSR |
A | **MMPE of EEG in 3rd group (, )** |
A | **RCMPE of ECG (, )** |
A | Spectral power of ECG in [3.0 3.1] Hz |
A | Spectral power of ECG in [4.3 4.4] Hz |
A | Mean second derivative of SCSR |
V | Spectral power of GSR in [0.0 0.2] Hz |
V | **MMPE of EEG in 2nd group (, )** |
V | **MMPE of EEG in 2nd group (, )** |
V | Spectral power of GSR in [0.4 0.6] Hz |
V | **MMPE of EEG in 2nd group (, )** |
V | Mean derivative of skin conductance slow response (SCSR) |
V | **MMPE of EEG in 2nd group (, )** |
V | **MMPE of EEG in 2nd group (, )** |
V | Mean derivative of skin conductance (SC) |
V | **MMPE of EEG in 5th group (, )** |

## V Conclusion

In this paper, we propose an enhanced framework for emotion recognition. The proposed system integrates multiple entropy-domain features, namely RCMSE, MMSE, RCMPE, and MMPE, with an XGBoost classifier. The results of the statistical analysis suggest that the entropy-domain features extracted from EEG, ECG, and GSR are statistically significant for emotion recognition, especially RCMPE of GSR in arousal. The emotion classification results show much-improved performance in the classification of arousal and valence compared with previous methods.

## Acknowledgment

This work was supported by the Ministry of Science and Technology of Taiwan (MOST 106-2221-E-002-205-MY3 and MOST 106-2622-8-002-013-TA), National Taiwan University and Pixart Imaging Inc.

## References

- [1] Y. L. Hsu, J. S. Wang, W. C. Chiang, and C. H. Hung, “Automatic ECG-based emotion recognition in music listening,” IEEE Transactions on Affective Computing, pp. 1–1, 2017.
- [2] I. Daly, A. Malik, J. Weaver, F. Hwang, S. J. Nasuto, D. Williams, A. Kirke, and E. Miranda, “Identifying music-induced emotions from EEG for use in brain-computer music interfacing,” in 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Sept. 2015, pp. 923–929.
- [3] J. Abdon Miranda-Correa, M. Khomami Abadi, N. Sebe, and I. Patras, “AMIGOS: A dataset for mood, personality and affect research on individuals and groups,” Feb. 2017.
- [4] M. Costa, A. L. Goldberger, and C.-K. Peng, “Multiscale entropy analysis of complex physiologic time series,” Phys. Rev. Lett., vol. 89, p. 068102, Jul. 2002.
- [5] C. Bandt and B. Pompe, “Permutation entropy: A natural complexity measure for time series,” Phys. Rev. Lett., vol. 88, p. 174102, Apr. 2002.
- [6] S. Koelstra, C. Muhl, M. Soleymani, J. S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, “DEAP: A database for emotion analysis using physiological signals,” IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18–31, Jan. 2012.
- [7] R. Subramanian, J. Wache, M. K. Abadi, R. L. Vieriu, S. Winkler, and N. Sebe, “ASCERTAIN: Emotion and personality recognition using commercial sensors,” IEEE Transactions on Affective Computing, vol. 9, no. 2, pp. 147–160, Apr. 2018.
- [8] S.-D. Wu, C.-W. Wu, S.-G. Lin, K.-Y. Lee, and C.-K. Peng, “Analysis of complex time series using refined composite multiscale entropy,” Physics Letters A, vol. 378, no. 20, pp. 1369–1374, 2014.
- [9] K. Michalopoulos and N. Bourbakis, “Application of multiscale entropy on EEG signals for emotion detection,” pp. 341–344, Feb. 2017.
- [10] M. U. Ahmed and D. P. Mandic, “Multivariate multiscale entropy: A tool for complexity analysis of multichannel data,” Phys. Rev. E, vol. 84, p. 061918, Dec. 2011.
- [11] W. Aziz and M. Arif, “Multiscale permutation entropy of physiological time series,” in 2005 Pakistan Section Multitopic Conference, Dec. 2005, pp. 1–6.
- [12] A. Humeau-Heurtier, C. W. Wu, and S. D. Wu, “Refined composite multiscale permutation entropy to overcome multiscale permutation entropy length dependence,” IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2364–2367, Dec. 2015.
- [13] F. C. Morabito, D. Labate, F. La Foresta, A. Bramanti, G. Morabito, and I. Palamara, “Multivariate multi-scale permutation entropy for complexity analysis of Alzheimer’s disease EEG,” Entropy, vol. 14, no. 7, pp. 1186–1202, 2012.
- [14] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: ACM, 2016, pp. 785–794.
- [15] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” Ann. Statist., vol. 29, no. 5, pp. 1189–1232, Oct. 2001.