Universal Physiological Representation Learning with Soft-Disentangled Rateless Autoencoders


Human-computer interaction (HCI) involves a multidisciplinary fusion of technologies through which external devices can be controlled by monitoring the physiological status of users. However, physiological biosignals often vary across users and recording sessions due to unstable physical/mental conditions and task-irrelevant activities. To deal with this challenge, we propose an adversarial feature encoding method built on the concept of a Rateless Autoencoder (RAE), in order to exploit disentangled, nuisance-robust, and universal representations. We achieve a good trade-off between user-specific and task-relevant features through stochastic disentanglement of the latent representations, realized by attaching additional adversarial networks. The proposed model is applicable to a wider range of unknown users and tasks as well as different classifiers. Results on cross-subject transfer evaluations show the advantages of the proposed framework, with up to an 11.6% improvement in the average subject-transfer classification accuracy.


stochastic bottleneck, soft disentanglement, disentangled representation, deep learning, autoencoders, adversarial learning, physiological biosignals

1 Introduction


Human-computer interaction (HCI) [15] is a fundamental technology enabling machines to monitor physiological disorders, to comprehend human emotions, and to execute proper actions, so that users can control external devices through their physiological status in a safe and reliable fashion. Measuring traditional physiological biosignals such as electrocardiography (ECG) [14], electromyography (EMG) [11], and electroencephalography (EEG) [23] requires either implanted or surface electrodes and frequent calibration, which reduces user comfort and increases the overall expense. Recently, novel wearable sensors such as wrist-worn devices have been developed for accurately measuring physiological signals [3, 1, 5, 9, 10, 22] (e.g., arterial oxygen level, heart rate, and skin temperature) comfortably and effectively. Utilizing these non-EEG physiological biosignals can make data collection more convenient and less expensive.

One major challenge of physiological status assessment lies in the problem of transfer learning caused by the variability in biosignals across users or recording sessions due to unstable mental/physical conditions and task-irrelevant disturbances. To cope with biosignal datasets collected from a small number of subjects, transfer learning methods [8, 18, 29, 4] are applied to build strong feature learning machines that extract robust and invariant features across various tasks and/or unknown subjects. In particular, adversarial transfer learning [21, 20, 19, 13, 12, 6, 27, 24, 17] has demonstrated impressive results in constructing such discriminative feature extractors. Traditional adversarial transfer learning works aim to extract latent representations universally shared by a group of attributes using adversarial inference, where a discriminative network is trained adversarially against the feature extractor in order to differentiate universal features from various attributes. However, in most existing approaches, the adversarial training scheme is applied indiscriminately to the whole feature group when extracting cross-attribute latent representations, which inevitably leads to the loss of attribute-discriminative information. Therefore, rather than using only one adversarial discriminator to merely preserve shared cross-attribute features, we train two additional adversarial discriminators jointly with the feature extractor, so that the physiological features can be disentangled into two counterparts representing subject-associated and task-associated information, respectively. In this way, the variability in both subject and task space can be better accounted for.

As a commonly used feature extractor framework for transfer learning, autoencoders (AE) [25, 26, 28] can learn latent representations with a dimensionality typically much smaller than that of the input data, which is known as a “bottleneck” architecture, while capturing key data features to enable data reconstruction from the latent representation. A challenging problem in dimensionality reduction is to determine an optimal feature dimensionality that sufficiently captures the latent information essential for particular tasks. To address this issue, the Rateless Autoencoder (RAE) [16] was proposed to enable the AE to seamlessly adjust feature dimensionality through its rateless property, without requiring a fixed bottleneck structure. To realize such flexibility in the latent space, RAE implements a probabilistic latent dimensionality that is stochastically decreased through dropout during training, where a non-uniform dropout rate distribution is imposed on the bottleneck structure.

In this work, we propose an adversarial feature extraction method that exploits soft-disentangled universal representations. It extends [13] and [12] by newly introducing the concept of RAE. Unlike traditional feature learning frameworks, which ignore the specificity of either task calibrations or target subjects, the proposed model is applicable to a wider range of unknown individuals and tasks. Our contributions are summarized as follows:

  • We complementarily use two additional adversarial networks, i.e., adversary and nuisance blocks, to disentangle and re-organize the latent representations.

  • The rateless trade-off between subject-specific and task-relevant features is exploited by stochastically attaching adversary and nuisance blocks to the encoder.

  • Different dropout strategies of the disentangled adversarial RAE are discussed.

  • Empirical assessments were performed on a publicly available dataset of physiological biosignals for measuring human stress level through cross-subject evaluations with various classifiers.

  • Comparative experiments are conducted on multiple model setups, including traditional autoencoder and adversarial methods.

  • We demonstrate the advantage of the proposed framework, which achieves up to an 11.6% improvement in subject-transfer classification accuracy.

2 Methodology

Figure 1: (a) Conditional autoencoder (cAE): an encoder-decoder pair in which the encoder estimates the latent representation and the decoder estimates the reconstructed input signals from the latent representation and a conditioning variable, each network having its own parameters. When the decoder takes no conditioning variable, the model reduces to a traditional autoencoder (AE). (b) Conditional rateless autoencoder (cRAE): a probabilistic cAE model with a stochastic bottleneck in which each latent representation node is assigned its own dropout probability, such that the conditional decoder takes a subset of the latent units as input.
Figure 2: Disentangled adversarial autoencoder for nuisance-robust transfer learning. (a) Disentangled adversarial conditional autoencoder (DA-cAE) with hard split: a deterministic disentangled universal latent representation learning model in which the latent representation is partitioned into two sub-parts that are adversarially trained to be subject-invariant (i.e., used as an input to an adversary network) and subject-variant (i.e., used as an input to a nuisance network), respectively. (b) Disentangled adversarial conditional rateless autoencoder (DA-cRAE) with soft split: a cRAE model with soft disentanglement, where the adversary and nuisance network inputs are determined through the stochastic bottleneck architecture, with a separate pair of dropout probabilities for each latent node.

2.1 Notation and Problem Description

We define a labeled data set in which each trial consists of an input data vector recorded from multiple channels, the class label of the user task/status among a fixed set of classes, and the user identification (ID) index among the recorded subjects. The task/status is assumed to be marginally independent of the subject ID, and the physiological signal is generated dependently on both the task/status and the subject ID. The aim is to construct a model that estimates the task/status label from an observed signal and that generalizes across the variability of the subject, which is considered a nuisance variable associated with transferring the feature extraction model.

2.2 Rateless Autoencoder (RAE)

AE is a well-known feature learning machine that consists of a network pair of encoder and decoder, as shown in Fig. 1(a). The encoder packs data features into a latent representation, while the decoder aims to reconstruct the input data from that latent representation. AE structures are typically bottleneck architectures, where the dimensionality of the representation is lower than the dimensionality of the input data, and the latent variables should contain features adequate for reconstructing the original data through the corresponding decoder network. A challenging problem in such dimensionality reduction is to decide on an optimal feature dimensionality that captures the latent representations essential for specific tasks.

RAE [16] is an AE family providing a rateless property that enables the AE to seamlessly adjust feature dimensionality. Unlike a conventional AE with a deterministic bottleneck architecture, the RAE employs a probabilistic bottleneck feature whose dimensionality is stochastically reduced through dropout. In particular, RAE imposes a specific dropout rate distribution that varies across the nodes of the representation. For example, as depicted in Fig. 1(b), the RAE encoder generates latent variables, each of which is randomly dropped with its own node-specific probability, so that the effective latent dimensionality is the expected number of retained nodes. RAE can be regarded as an ensemble method that jointly exploits all the different AEs whose latent dimension ranges up to the full bottleneck size. It is hence less sensitive to the choice of the dimensionality parameter.
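The stochastic bottleneck described above can be illustrated with a short NumPy sketch. It is not the authors' implementation: the names `rateless_mask` and `expected_dimension`, the eight-node bottleneck, and the linearly increasing dropout schedule are all assumptions chosen for illustration.

```python
import numpy as np

def rateless_mask(dropout_rates, rng):
    """Sample a 0/1 mask; node j is dropped with probability dropout_rates[j]."""
    rates = np.asarray(dropout_rates, dtype=float)
    # rng.random draws from [0, 1), so rate 0 always keeps and rate 1 always drops.
    return (rng.random(rates.shape) >= rates).astype(float)

def expected_dimension(dropout_rates):
    """Expected number of retained nodes: the sum of keep probabilities."""
    return float(np.sum(1.0 - np.asarray(dropout_rates, dtype=float)))

# Hypothetical setup: 8 latent nodes with dropout rates rising from 0 to 0.875.
D = 8
rates = np.linspace(0.0, 0.875, D)
rng = np.random.default_rng(0)
z = rng.standard_normal(D)               # stand-in for an encoder output
z_dropped = z * rateless_mask(rates, rng)  # what a downstream network would see
```

Because the rates differ per node, low-index nodes are kept almost always and high-index nodes only occasionally, which is what lets a single trained model cover many effective bottleneck sizes.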

In our method, we use the RAE concept to realize a good trade-off between task-related features and person-discriminative information by attaching new adversary and nuisance blocks to the representation through different dropout strategies, while the full representation is fed into the decoder without dropout. A soft-disentangled feature extractor is first trained based on the rateless concept, and a task classifier is then learned for the final discriminative model using the features extracted by the pre-trained (frozen) feature encoder.

2.3 Disentangled Adversarial Transfer Learning with RAE

Disentangled Feature Extractor

In [13] and [12], a disentangled feature extraction method was proposed to improve subject-transfer performance. As shown in Fig. 2(a), the features are divided into two parts, which are intended to capture subject-invariant and subject-specific information, respectively. Despite the gain of the disentangled method, determining the sizes of the two parts is still challenging. In this paper, we extend the method with soft disentanglement motivated by RAE, as shown in Fig. 2(b), to mitigate the sensitivity to the splitting parameter.

To implement the soft-disentangled adversarial transfer learning, the encoder output is forwarded into two additional units, the adversary network and the nuisance network, with different dropout rate distributions. As illustrated in Fig. 2(b), the representation is passed to the adversary network and to the nuisance network under two separate node-wise dropout rate distributions. The complete latent representation is further fed into the decoder without any dropout. Through the stochastic disentangling, the representations are re-organized into two sub-parts related to task and subject, respectively: the upper feature units, which have a lower dropout rate to the adversary network (and a higher one to the nuisance network), aim to conceal more subject information, while the lower units, which have a lower dropout rate to the nuisance network (and a higher one to the adversary network), are designed to include more subject-related features. By dissociating the nuisance variable from the task-related features more clearly, the model extrapolates to a broader domain of subjects and tasks. For input data from an unknown user, the task-related features with a lower adversary dropout rate would be incorporated into the final prediction; simultaneously, the biological characteristics that are similar to those of known subjects could also be projected onto the representations with a lower nuisance dropout rate as a reference.

In order to filter out the variation caused by the subject from the adversary counterpart of the representation (the nodes with a lower adversary dropout rate) and simultaneously retain more task-relevant information in it, the encoder is driven to minimize the likelihood that the adversary network assigns to the true subject ID; at the same time, to embed sufficient user-discriminative features within the nodes with a lower nuisance dropout rate, the encoder is also forced to maximize the likelihood that the nuisance network assigns to the true subject ID. The full representation from the encoder is fed into the decoder with zero dropout, and the decoder is conditioned on the subject ID as an additional input besides the representation. The encoder and decoder are trained to optimize the reconstruction loss of the reconstructed signal with respect to the true input. Therefore, the final objective function to train the proposed model structure can be written as follows:


where the first term is the loss of the decoder reconstructing the inputs from the representation, and the two regularization weights, one for the adversary unit and one for the nuisance unit, allow a flexible trade-off between identification and invariance performance. The model reduces to a regular conditional AE (cAE) structure when both weights are zero, which involves no stochastic bottleneck or disentangling transfer learning block.
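One form of the objective that is consistent with the description above is sketched below. All symbols are assumed here for illustration rather than taken from the text: $X$ is the input, $s$ the subject ID, $z = g_\theta(X)$ the encoder output, $\hat{X} = h_\eta(z, s)$ the conditional decoder output, $z_a$ and $z_n$ the dropout-masked copies of $z$ passed to the adversary $q_\psi$ and nuisance $q_\phi$ networks, and $\lambda_A, \lambda_N \ge 0$ the two regularization weights.

```latex
\min_{\theta,\,\eta}\;
\mathbb{E}\Big[
  \mathcal{L}_{\mathrm{rec}}\big(X, \hat{X}\big)
  \;+\; \lambda_A \,\log q_\psi\big(s \mid z_a\big)
  \;-\; \lambda_N \,\log q_\phi\big(s \mid z_n\big)
\Big],
\qquad
\hat{X} = h_\eta(z, s),\quad z = g_\theta(X).
```

The signs follow the prose: adding $\lambda_A \log q_\psi$ makes the encoder lower the adversary's likelihood of the true subject, subtracting $\lambda_N \log q_\phi$ makes it raise the nuisance network's likelihood, and setting $\lambda_A = \lambda_N = 0$ leaves only the reconstruction term of a plain cAE.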

Adversarial Training Scheme

In addition to the training of the encoder-decoder pair, at every optimization iteration the parameters of the adversary and nuisance networks are learned by maximizing their respective likelihoods of the true subject ID among the recorded subjects. The parameter updates of the encoder-decoder pair, the adversary network, and the nuisance network are performed alternately by stochastic gradient descent, where the two adversarial discriminators are separately trained to minimize their corresponding cross-entropy losses.
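The three losses that alternate in this scheme can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: `cross_entropy` and `player_losses` are hypothetical helper names, the reconstruction error is passed in as a precomputed scalar, and `lam_a` and `lam_n` stand for the adversary and nuisance regularization weights.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy of integer labels given raw logits."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def player_losses(recon_error, adv_logits, nuis_logits, subject_ids, lam_a, lam_n):
    """Losses minimized in turn by the encoder-decoder, adversary, and nuisance nets."""
    adv_ce = cross_entropy(adv_logits, subject_ids)
    nuis_ce = cross_entropy(nuis_logits, subject_ids)
    return {
        # Encoder-decoder: reconstruct well, make the adversary fail
        # (raise its cross-entropy) and help the nuisance network succeed.
        "encoder_decoder": recon_error - lam_a * adv_ce + lam_n * nuis_ce,
        # Each discriminator simply minimizes its own cross-entropy.
        "adversary": adv_ce,
        "nuisance": nuis_ce,
    }
```

In an actual training loop each loss would be minimized with respect to its own network's parameters only, one player at a time per iteration.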

Discriminative Classifier

An independent status/task classifier is attached to the encoder, whose network weights are pre-trained by the proposed soft-disentangled adversarial method and then frozen, and the classifier is optimized using the latent features as input. The purpose of the classifier is to estimate the status/task class among the task categories given the physiological input, from which the latent features are first extracted by the encoder before being passed to the task classifier. The classifier, which has its own parameters, is optimized by minimizing the following cross-entropy loss:


where the classifier output is the estimate of the subject's status/task category.
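A cross-entropy loss matching this description could take the following form. The notation is assumed for illustration: $y$ is the true status/task label, $z = g_\theta(X)$ is the output of the frozen encoder, and $p_\gamma$ is the classifier with parameters $\gamma$.

```latex
\min_{\gamma}\;
\mathbb{E}\big[-\log p_\gamma\big(y \mid z\big)\big],
\qquad z = g_\theta(X)\ \text{with}\ \theta\ \text{held fixed}.
```

Only $\gamma$ is updated here; the encoder parameters stay at the values learned during the adversarial pre-training stage.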

2.4 Discussion of Dropout Rate Distribution

Among the possible dropout rate distributions for the representation input to the adversary and nuisance networks (when both networks are active), the stochastic bottleneck architecture includes two cases: hard split and soft split.

Hard Split

In the particular case where each dropout rate is either 0 or 1 for every feature node, i.e., where the output of each node is input either to the adversary network only or to the nuisance network only (in addition to the decoder), the representation is hard split into two sub-parts corresponding to the adversary and nuisance blocks, respectively, as shown in Fig. 2(a). The sub-part whose nodes are always passed to the adversary network and never to the nuisance network aims at preserving task-related feature information, while subject-related features are embedded in the sub-part whose nodes are always passed to the nuisance network and never to the adversary network. In this case, the model reduces to a regular disentangled adversarial cAE structure (DA-cAE) with adversary and nuisance networks attached but no rateless property, as introduced in [13, 12].

Soft Split

For the more generic case of a soft-split representation, the dropout rates to the adversary and nuisance blocks are arbitrary, subject to a per-node constraint linking the two rates. The bottleneck architecture is therefore soft split into adversary and nuisance counterparts stochastically, according to the two dropout rate distributions, as depicted in Fig. 2(b). This conditional RAE with soft-disentangled adversarial structure (DA-cRAE) can partly resolve the main issue of hard split, which requires a pre-determined dimensionality for the two disentangled latent vectors: the proposed method automatically considers different hard-split ratios in a non-deterministic ensemble manner.
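The difference between the two cases can be made concrete with a small sketch of the dropout rate vectors. Several choices here are assumptions rather than statements from the text: the per-node constraint is taken to be that the adversary and nuisance dropout rates sum to one, the soft schedule is a power law in the node index with a hypothetical exponent `alpha`, and the function names are illustrative.

```python
import numpy as np

def hard_split_rates(D, n_adversary):
    """Rates in {0, 1}: the first n_adversary nodes go only to the adversary."""
    p_adv = np.ones(D)
    p_adv[:n_adversary] = 0.0          # never dropped on the adversary path
    return p_adv, 1.0 - p_adv          # complementary rates on the nuisance path

def soft_split_rates(D, alpha=1.0):
    """Assumed power-law ramp: adversary dropout rises with the node index."""
    j = np.arange(1, D + 1)
    p_adv = (j / D) ** alpha           # alpha controls how fast the rate climbs
    return p_adv, 1.0 - p_adv          # assumed complementarity of the two rates

p_adv_hard, p_nuis_hard = hard_split_rates(D=10, n_adversary=4)
p_adv_soft, p_nuis_soft = soft_split_rates(D=10, alpha=2.0)
```

Under these assumptions the hard split is the limiting case in which the ramp becomes a step, so a single soft schedule effectively averages over many step positions instead of committing to one.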

2.5 Model Implementations

Recently, neural networks and deep learning have shown impressive results in biosignal processing [19, 2, 7]. Motivated by those works, we mainly use neural networks to build the feature extractor in the proposed model. However, we note that other learning frameworks without neural networks can also be applied to the proposed method of soft-disentangled adversarial transfer learning.

Model Architecture

Network              Layers
Encoder Network      FC(, ), ReLU, FC(, )
Decoder Network      FC(, ), ReLU, FC(, )
Adversary Network    FC(, )
Nuisance Network     FC(, )
Table 1: Network structures, where FC() is a linear fully connected layer specified by its input and output dimensions, and ReLU denotes the rectified linear unit.

The model structure used for the experimental evaluations is presented in Table 1. The adversary and nuisance networks have the same input dimension as the latent representation, and their output dimension equals the number of subjects for the classification of subject IDs. We note that we did not observe significant improvements from deepening the network or altering the number of units for the physiological biosignal dataset under test. To assess the robustness of the proposed soft-disentangled adversarial feature encoder, we implemented various classifiers for evaluating the final task classification, including MLP, nearest neighbors, decision tree, linear discriminant analysis (LDA), and logistic regression classifiers, each with one output per task class.

Rateless Parameters

The representation is fed into the adversary network and the nuisance network with their respective node-wise dropout rates. For the soft-split case in Section 2.4.2, the two dropout rates are parametric functions of the node index, with a parameter that adjusts how quickly the dropout rate rises along the node index; this parameter was held at a single value in the experimental assessments. In the hard-split implementation of Section 2.4.1, we fix the ratio of dimensions between the adversary and nuisance sub-parts.

Comparison Model Definitions

We denote by AE the baseline architecture of a regular encoder-decoder pair for feature extraction as presented in [25] and [26], whose decoder is unconditional and which has no adversarial disentangling units, and by cAE a conditional AE feature extractor whose decoder is conditioned on the subject ID, as described in [28]. A-cAE and A-cRAE denote the cAE models with the aforementioned hard-split and soft-split bottleneck features, respectively, attached to the adversary network only. D-cAE and D-cRAE denote the cAE models with hard-split and soft-split bottleneck variables, respectively, linked to the nuisance network only. DA-cAE and DA-cRAE denote the hard-split and soft-split representations, respectively, connected to both the adversary and nuisance networks, with the decoder conditioned on the subject ID. Note that A-cAE resembles the traditional adversarial learning methods presented in [6, 27, 24, 17], where only one adversarial unit is adopted.

3 Experimental Study

3.1 Dataset

The proposed methodology was evaluated on a physiological biosignal dataset for assessing human stress status [3], which is available online (see footnote 1). It includes physiological biosignals of various modalities, in order to estimate four discrete stress levels (physical stress, cognitive stress, emotional stress, and relaxation) based on data collected from multiple subjects. The biosignals were generated from non-invasive biosensors worn on the wrist and comprise heart rate, temperature, electrodermal activity, three-dimensional acceleration, and arterial oxygen level, resulting in seven signal channels in total. We further downsampled the signals to a common sampling rate in order to align all data channels. For each stress status, a task of fixed duration was assigned to the subjects. Every subject executed several trials, more than one of which corresponded to the relaxation status. To address the imbalance of trials across categories, we only utilized the first relaxation trial, leading to four trials per subject, one for each of the four stress status levels.

Figure 3: Transfer learning accuracies for held-out subjects of different classifiers with eight feature learning frameworks: (1) AE: baseline of regular AE with unconditional decoder, (2) cAE: AE with subject-conditional decoder, (3) A-cAE: hard-split bottleneck cAE with adversary network, (4) D-cAE: hard-split bottleneck cAE with nuisance network, (5) DA-cAE: hard-split bottleneck cAE with both adversary and nuisance networks, (6) A-cRAE: soft-split bottleneck cAE with adversary network, (7) D-cRAE: soft-split bottleneck cAE with nuisance network, (8) DA-cRAE: soft-split bottleneck cAE with both adversary and nuisance networks. For each box, the central line marks the median, the lower and upper bounds represent the first and third quartiles, and dashed lines denote extreme values; the diamond-shaped marker specifies the average.

3.2 Experiment Implementation

The regularization weights of the adversary and nuisance units were chosen for the disentangled adversarial model by parameter sweep and validation. We trained the model with different parameter combinations and preferred the parameters producing lower accuracy of the adversary discriminator and higher accuracy of the nuisance discriminator, provided that they also yielded higher cross-validation accuracy for the discriminative task classifier.

To reduce the number of parameter combinations when selecting the two regularization weights, we first swept over the nuisance weight with the adversary weight set to zero; the nuisance weight was then fixed at its optimized value from that step while the adversary weight was optimized. Each weight was swept over a fixed set of candidate values. Note that the selected parameter values could be further optimized over wider ranges by cross-validating the same model learning process. We evaluated the model with cross-subject transfer analysis through a leave-one-subject-out method, where the cross-subject test data came from the left-out subject, and the data from the remaining subjects were randomly split into training and validation sets.
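A leave-one-subject-out protocol of this kind can be sketched as below. This is an illustrative sketch rather than the authors' pipeline: the function name `leave_one_subject_out`, the default validation fraction of 0.2, and the toy subject layout are assumptions, since the actual split proportions are not given here.

```python
import numpy as np

def leave_one_subject_out(subject_ids, val_fraction=0.2, seed=0):
    """Yield (held_out, train_idx, val_idx, test_idx) for each subject in turn."""
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    for held_out in np.unique(subject_ids):
        # All samples of the held-out subject form the cross-subject test set.
        test_idx = np.flatnonzero(subject_ids == held_out)
        # Samples of the remaining subjects are shuffled, then split.
        rest = rng.permutation(np.flatnonzero(subject_ids != held_out))
        n_val = int(round(val_fraction * len(rest)))
        yield held_out, rest[n_val:], rest[:n_val], test_idx

# Toy example: 3 hypothetical subjects with 5 samples each.
subjects = np.repeat(np.arange(3), 5)
folds = list(leave_one_subject_out(subjects))
```

Averaging the test accuracy over all folds gives the cross-subject figure of merit, while the validation split within each fold is what a parameter sweep would be scored on.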

                          MLP                    Nearest Neighbors      Decision Tree          LDA                    Logistic Regression
Model                     adv    nuis   avg acc  adv    nuis   avg acc  adv    nuis   avg acc  adv    nuis   avg acc  adv    nuis   avg acc
AE [25, 26]               0      0      72.2%    0      0      71.1%    0      0      71.2%    0      0      76.5%    0      0      78.7%
cAE [28]                  0      0      72.9%    0      0      72.2%    0      0      72.4%    0      0      77.8%    0      0      79.7%
A-cAE [6, 27, 24, 17]     0.005  0      75.0%    0.1    0      73.9%    0.1    0      73.4%    0.05   0      79.8%    0.05   0      80.8%
D-cAE                     0      0.005  75.2%    0      0.01   74.9%    0      0.01   75.8%    0      0.2    80.2%    0      0.2    81.8%
DA-cAE [13, 12]           0.01   0.005  81.0%    0.1    0.01   77.0%    0.2    0.01   77.3%    0.2    0.2    84.3%    0.2    0.2    85.3%
A-cRAE                    0.02   0      76.8%    0.05   0      75.2%    0.05   0      74.1%    0.1    0      80.4%    0.02   0      81.9%
D-cRAE                    0      0.05   77.2%    0      0.05   76.1%    0      0.1    75.2%    0      0.05   82.0%    0      0.05   83.7%
DA-cRAE                   0.5    0.05   83.8%    0.5    0.05   79.6%    0.01   0.1    81.5%    0.5    0.05   84.5%    0.5    0.05   85.5%
Table 2: Optimized parameter selections with averaged cross-subject accuracies. For each classifier, "adv" and "nuis" are the regularization weights of the adversary and nuisance units, and "avg acc" is the average accuracy.

3.3 Results and Discussions

Comparative Experiments

Accuracies of transfer analysis across held-out subjects based on different feature encoders and classifiers are presented in Fig. 3, where AE, cAE, A-cAE, D-cAE, DA-cAE, A-cRAE, D-cRAE, and DA-cRAE, as defined in Section 2.5.3, were trained and compared. The corresponding parameter settings for each case in Fig. 3 are displayed in Table 2; they were selected through the aforementioned parameter optimization procedure. The model architecture is as shown in Table 1.

Model      adv weight  nuis weight  MLP Classifier  Adversary Network  Nuisance Network
AE         0           0            72.2%           7.8%               5.6%
cAE        0           0            72.9%           8.5%               5.8%
D-cRAE     0           0.005        74.8%           7.7%               8.5%
           0           0.01         73.5%           12.5%              15.2%
           0           0.05         77.2%           10.7%              19.7%
           0           0.2          75.6%           13.6%              16.5%
           0           0.5          74.1%           12.6%              35.5%
DA-cRAE    0.01        0.05         78.3%           9.4%               13.6%
           0.05        0.05         77.3%           6.7%               14.6%
           0.1         0.05         77.9%           5.9%               13.3%
           0.2         0.05         81.5%           5.5%               12.7%
           0.5         0.05         83.8%           4.9%               13.9%
Table 3: Parameter optimization of the MLP classifier. Accuracies of the adversary network, the nuisance network, and the classifier are presented for each pair of adversary and nuisance regularization weights.
Figure 4: MLP classification accuracies of the DA-cRAE model (adversary weight 0.5, nuisance weight 0.05, as in Table 2) for held-out subjects with different dimensions D of the representation, compared with the baseline AE.

As shown in Fig. 3 and Table 2, we first observe that simply feeding the decoder an extra conditional input yields slightly better classification performance, as seen when comparing cAE with AE. Furthermore, we notice accuracy improvements of A-cAE and D-cAE over cAE, demonstrating that more cross-subject features in the hard-split representation lead to better identification of the task label. In addition, DA-cAE realizes further accuracy improvements with both adversary and nuisance networks compared to the individual regularization approaches A-cAE and D-cAE. Under the disentangled adversarial transfer learning framework, our feature extractor results in lower variation of performance across all task classifiers and all subjects. More importantly, the soft-split RAE structures of A-cRAE, D-cRAE, and DA-cRAE bring even more accuracy gain than the hard-split cases of A-cAE, D-cAE, and DA-cAE. For the hard-split case, determining the split ratio of dimensions between subject-related and task-specific features is difficult because the nature of the representation is still unknown. The rateless property, however, enables the encoder-decoder pair to seamlessly adjust the dimensionalities of subject-related and task-specific features, and provides a smooth transition between the two stochastic counterparts through a probabilistic bottleneck representation, even though the underlying nature of the bottleneck is still vague. In general, the disentangled adversarial DA-cRAE models, with both adversary and nuisance networks attached and a conditional decoder, lead to significant improvements in average accuracy of up to 11.6% (e.g., from 72.2% to 83.8% for the MLP classifier in Table 2) with respect to the non-adversarial baseline AE. Furthermore, as observed in Fig. 3, the cross-validation accuracies of the worst cases are also significantly improved, indicating that the proposed transfer learning architecture is more stable across a wider range of unknown individuals because it reorganizes the subject- and task-relevant representations at the output of the feature extractor.

Figure 5: MLP classification accuracies of optimized parameter choices in Table 2 with different training dataset sizes.

Impact of Disentangled Adversarial Parameters

We take the MLP classifier as an example to illustrate the impact of the disentangled adversarial RAE. As presented in Table 3, the baseline models of AE and cAE were first assessed with both regularization weights set to zero while training the MLP discriminative classifier. Then D-cRAE was evaluated with the adversary weight at zero and the nuisance weight varied from 0.005 to 0.5. Finally, we froze the nuisance weight at 0.05 to observe the representation learning capability of the complete soft-disentangled adversarial transfer learning model DA-cRAE with different choices of the adversary weight. For each parameter selection, the average accuracy of the MLP task classifier for identifying stress levels is shown in Table 3, along with the discriminator accuracies of the adversary and nuisance blocks for decoding the subject ID. With an increasing accuracy of the MLP task classifier, stress levels are better discriminated; with a growing accuracy of the nuisance network, more person-discriminative features are preserved in the nuisance counterpart; and with a decreasing accuracy of the adversary network, more task-specific information is retained in the adversary counterpart. We observe that the nuisance network generally produces higher accuracy as the nuisance weight increases, and that a nuisance weight of 0.05 in particular yields the best task classification performance. Furthermore, with the nuisance weight fixed, a growing adversary weight leads to lower accuracy of the adversary network, and thus imposes less extraction of subject features but more task-related information on the adversary counterpart.

Impact of Feature Dimension

Besides the two adversarial regularization weights, we further inspect the impact of different feature dimensions D on the performance of the proposed DA-cRAE model. We trained MLP classifiers with the DA-cRAE feature extractor and its optimized parameters as given in Table 2 (adversary weight 0.5 and nuisance weight 0.05), using various feature dimensions D. The corresponding cross-validation accuracies for held-out subjects are shown as a function of D in Fig. 4, where the average accuracy for each D is also marked. The same assessments over D were also applied to the baseline AE feature extractor, and we present its curve of average accuracies in Fig. 4 as a reference for comparison with DA-cRAE. The results verify that the proposed DA-cRAE consistently outperforms the baseline AE and that the latent dimensionality used in our experiments was sufficient for the problem. We observe that beyond a specific value of D, the performance of DA-cRAE remains relatively stable as D varies, compared to AE. On one hand, when the feature dimension is large enough to carry the information necessary for the classification task, a higher D might not bring more benefit when extracting features; on the other hand, the rateless property of DA-cRAE resolves the entanglement between task-related and subject-discriminative information and exploits the latent features more efficiently, thus leading to stronger robustness to variations in the latent representation dimensionality.

Impact of Data Size

In order to evaluate the robustness of our transfer learning method on smaller data sets, we investigated the performance of the proposed model when the available training data were reduced from the full 100% set to three smaller fractions. The corresponding classification accuracies as a function of training data size are shown in Fig. 5. Here we again consider the MLP classifier of Table 2 as the example, to make comparisons among DA-cRAE (adversary weight 0.5 and nuisance weight 0.05), DA-cAE (adversary weight 0.01 and nuisance weight 0.005), cAE, and AE (both weights zero). From Fig. 5 we observe that DA-cRAE still performs best regardless of the amount of available training data. Even with the smallest fraction of data, DA-cRAE and DA-cAE show no significant drawback compared to the non-adversarial methods, which indicates the robustness of our method to limited amounts of physiological data. Note that with more available training data, even better performance is expected from the model.

Figure 6: Convergence of DA-cRAE with different training data sizes. (a) DA-cRAE convergence curves with 100% data size. (b) DA-cRAE convergence curves with 25% data size.

Convergence Analysis

In addition, training convergence curves for a specific DA-cRAE parameter setting with different training data sizes are presented in Fig. 6. When using the full 100% set of available training data, i.e., in Fig. 6(a), the total training loss of DA-cRAE converges, while the nuisance loss decreases steadily with more training iterations and the loss of the adversary unit remains steady due to its antagonistic relationship with DA-cRAE: the adversary unit continues to conceal subject-specific representations without undermining the discriminative performance of the entire network. With less data, as illustrated in Fig. 6(b), convergence is reached after more training epochs, while the DA-cRAE loss, adversary loss, and nuisance loss converge in a pattern similar to that of the full 100% data case, indicating the capability of the proposed model to learn universal features even from smaller data sets. Overall, we observe that with both adversary and nuisance networks attached to the encoder, the classifier improves accuracy substantially and shows more stable performance across different left-out subjects.

4 Conclusion

A transfer learning framework was proposed based on a soft-disentangled adversarial model utilizing the concept of RAE to extract universal and nuisance-robust physiological features. In order to implement the rateless property and manipulate the trade-off between subject-specific features and task-relevant information, additional blocks of adversary and nuisance networks were complementarily attached and jointly trained with different dropout strategies, and therefore the transfer learning framework is capable of handling a wider range of tasks and users. Cross-subject transfer evaluations were performed with a physiological biosignal dataset for monitoring human stress levels. Significant benefits of the proposed framework were shown by improved worst-case accuracy and average classification accuracy, demonstrating the robustness to unknown users. The adaptability of the feature extractor over several task-discriminative linear and non-linear classifiers was also shown, and the transfer-learning ability of our method to data size deficiency was analysed. Note that our methodology is applicable to various different systems requiring nuisance-robust analysis beyond HCI.


  1. https://physionet.org/content/noneeg/1.0.0/


  1. A. M. Amiri, M. Abtahi, A. Rabasco, M. Armey and K. Mankodiya (2016) Emotional reactivity monitoring using electrodermal activity analysis in individuals with suicidal behaviors. In 10th International Symposium on Medical Information and Communication Technology, pp. 1–5. Cited by: §1.
  2. M. Atzori, M. Cognolato and H. Müller (2016) Deep learning with convolutional neural networks applied to electromyography data: a resource for the classification of movements for prosthetic hands. Frontiers in Neurorobotics 10, pp. 9. Cited by: §2.5.
  3. J. Birjandtalab, D. Cogan, M. B. Pouyan and M. Nourani (2016) A non-EEG biosignals dataset for assessment and visualization of neurological status. In IEEE International Workshop on Signal Processing Systems, pp. 110–114. Cited by: §1, §3.1.
  4. L. Chen, A. Zhang and X. Lou (2019) Cross-subject driver status detection from physiological signals based on hybrid feature selection and transfer learning. Expert Systems with Applications 137, pp. 266–280. Cited by: §1.
  5. D. Cogan, M. B. Pouyan, M. Nourani and J. Harvey (2014) A wrist-worn biosensor system for assessment of neurological status. In 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 5748–5751. Cited by: §1.
  6. H. Edwards and A. Storkey (2015) Censoring representations with an adversary. arXiv preprint arXiv:1511.05897. Cited by: §1, §2.5.3, Table 2.
  7. O. Faust, Y. Hagiwara, T. J. Hong, O. S. Lih and U. R. Acharya (2018) Deep learning for healthcare applications based on physiological signals: a review. Computer Methods and Programs in Biomedicine 161, pp. 1–13. Cited by: §2.5.
  8. S. Fazli, F. Popescu, M. Danóczy, B. Blankertz, K. Müller and C. Grozea (2009) Subject-independent mental state classification in single trials. Neural Networks 22 (9), pp. 1305–1312. Cited by: §1.
  9. D. Giakoumis, D. Tzovaras and G. Hassapis (2013) Subject-dependent biosignal features for increased accuracy in psychological stress detection. International Journal of Human-Computer Studies 71 (4), pp. 425–439. Cited by: §1.
  10. G. Giannakakis, D. Grigoriadis, K. Giannakaki, O. Simantiraki, A. Roniotis and M. Tsiknakis (2019) Review on psychological stress detection using biosignals. IEEE Transactions on Affective Computing. Cited by: §1.
  11. M. Han, S. Y. Günay, G. Schirner, T. Padır and D. Erdoğmuş (2020) HANDS: a multimodal dataset for modeling toward human grasp intent inference in prosthetic hands. Intelligent Service Robotics 13 (1), pp. 179–185. Cited by: §1.
  12. M. Han, O. Özdenizci, Y. Wang, T. Koike-Akino and D. Erdoğmuş (2020) Disentangled adversarial autoencoder for subject-invariant physiological feature extraction. In IEEE Signal Processing Letters, Cited by: §1, §1, §2.3.1, §2.4.1, Table 2.
  13. M. Han, O. Özdenizci, Y. Wang, T. Koike-Akino and D. Erdoğmuş (2020) Disentangled adversarial transfer learning for physiological biosignals. In 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Cited by: §1, §1, §2.3.1, §2.4.1, Table 2.
  14. S. H. Jambukia, V. K. Dabhi and H. B. Prajapati (2015) Classification of ECG signals using machine learning techniques: a survey. In 2015 International Conference on Advances in Computer Engineering and Applications, pp. 714–721. Cited by: §1.
  15. S. Jerritta, M. Murugappan, R. Nagarajan and K. Wan (2011) Physiological signals based human emotion recognition: a review. In IEEE 7th International Colloquium on Signal Processing and its Applications, pp. 410–415. Cited by: §1.
  16. T. Koike-Akino and Y. Wang (2020) Stochastic bottleneck: rateless auto-encoder for flexible dimensionality reduction. In 2020 IEEE International Symposium on Information Theory (ISIT), Cited by: §1, §2.2.
  17. G. Lample, N. Zeghidour, N. Usunier, A. Bordes, L. Denoyer and M. Ranzato (2017) Fader networks: manipulating images by sliding attributes. In Advances in Neural Information Processing Systems, pp. 5967–5976. Cited by: §1, §2.5.3, Table 2.
  18. H. Morioka, A. Kanemura, J. Hirayama, M. Shikauchi, T. Ogawa, S. Ikeda, M. Kawanabe and S. Ishii (2015) Learning a common dictionary for subject-transfer decoding with resting calibration. NeuroImage 111, pp. 167–178. Cited by: §1.
  19. O. Özdenizci, Y. Wang, T. Koike-Akino and D. Erdoğmuş (2019) Adversarial deep learning in EEG biometrics. IEEE Signal Processing Letters 26 (5), pp. 710–714. Cited by: §1, §2.5.
  20. O. Özdenizci, Y. Wang, T. Koike-Akino and D. Erdoğmuş (2019) Transfer learning in brain-computer interfaces with adversarial variational autoencoders. In 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 207–210. Cited by: §1.
  21. O. Özdenizci, Y. Wang, T. Koike-Akino and D. Erdoğmuş (2020) Learning invariant representations from EEG via adversarial inference. IEEE Access 8, pp. 27074–27085. Cited by: §1.
  22. O. Özdenizci (2018) Time-series prediction of proximal aggression onset in minimally-verbal youth with autism spectrum disorder using physiological biosignals. In 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 5745–5748. Cited by: §1.
  23. P. C. Petrantonakis and L. J. Hadjileontiadis (2009) Emotion recognition from EEG using higher order crossings. IEEE Transactions on Information Technology in Biomedicine 14 (2), pp. 186–197. Cited by: §1.
  24. Y. Sun, X. Jing, F. Wu, J. Li, D. Xing, H. Chen and Y. Sun (2020) Adversarial learning for cross-project semi-supervised defect prediction. IEEE Access 8, pp. 32674–32687. Cited by: §1, §2.5.3, Table 2.
  25. M. Tschannen, O. Bachem and M. Lucic (2018) Recent advances in autoencoder-based representation learning. arXiv preprint arXiv:1812.05069. Cited by: §1, §2.5.3, Table 2.
  26. T. Wen and Z. Zhang (2018) Deep convolution neural network and autoencoders-based unsupervised feature learning of EEG signals. IEEE Access 6, pp. 25399–25410. Cited by: §1, §2.5.3, Table 2.
  27. F. Wu, X. Jing, Z. Wu, Y. Ji, X. Dong, X. Luo, Q. Huang and R. Wang (2020) Modality-specific and shared generative adversarial network for cross-modal retrieval. Pattern Recognition, pp. 107335. Cited by: §1, §2.5.3, Table 2.
  28. Y. Yang, K. Zheng, C. Wu and Y. Yang (2019) Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network. Sensors 19 (11), pp. 2528. Cited by: §1, §2.5.3, Table 2.
  29. Z. Yin, M. Zhao, W. Zhang, Y. Wang, Y. Wang and J. Zhang (2019) Physiological-signal-based mental workload estimation via transfer dynamical autoencoders in a deep learning framework. Neurocomputing 347, pp. 212–229. Cited by: §1.