Diffusion-based nonlinear filtering for multimodal data fusion with application to sleep stage assessment
The problem of information fusion from multiple datasets acquired by multimodal sensors has drawn significant research attention over the years. In this paper, we focus on a particular problem setting consisting of a physical phenomenon or a system of interest observed by multiple sensors. We assume that all sensors measure some aspects of the system of interest with additional sensor-specific and irrelevant components. Our goal is to recover the variables relevant to the observed system and to filter out the nuisance effects of the sensor-specific variables. We propose an approach based on manifold learning, which is particularly suitable for problems with multiple modalities, since it aims to capture the intrinsic structure of the data and relies on minimal prior model knowledge. Specifically, we propose a nonlinear filtering scheme, which extracts the hidden sources of variability that are captured by two or more sensors and are independent of the sensor-specific components. In addition to presenting a theoretical analysis, we demonstrate our technique on real measured data for the purpose of sleep stage assessment based on multiple, multimodal sensor measurements. We show that, without prior knowledge of the different modalities and the measured system, our method gives rise to a data-driven representation that is well correlated with the underlying sleep process and is robust to noise and sensor-specific effects.
keywords: common manifold learning, multimodal sensor fusion, nonlinear filtering, diffusion maps, alternating diffusion maps
Often, when measuring a phenomenon of interest that arises from a complex dynamical system, a single data acquisition method is not capable of capturing its entire complexity and characteristics, and it is usually prone to noise and interferences. Recently, due to technological advances, the use of multiple types of measurement instruments and sensors has become more and more popular; nowadays, such equipment is smaller, less expensive, and can be mounted on everyday products and devices more easily. In contrast to a single sensor, multimodal sensors may capture complementary aspects and features of the measured phenomenon, and may enable us to extract a more reliable and detailed description of it.
The vast progress in the acquisition of multimodal data calls for the development of analysis and processing tools that appropriately combine data from the different sensors and handle well the inherent challenges that arise. One particular challenge is related to the heterogeneity of the data acquired in the different modalities; datasets acquired from different sensors may comprise different sources of variability, where only a few are relevant to the phenomenon of interest. This particular challenge, as well as many others, has been the subject of many studies. For recent comprehensive reviews, see khaleghi2013multisensor; lahat2015multimodal; gravina2017multi.
In this paper, we consider a setting in which a physical phenomenon is measured by multiple sensors. While all sensors measure the same phenomenon, each sensor consists of different sources of variability; some are related to the phenomenon of interest, possibly capturing its various aspects, whereas other sources of variability are sensor-specific and irrelevant. We present an approach based on manifold learning, which is a class of nonlinear data-driven methods, e.g., Tenenbaum2000; Roweis2000; Donoho2003; Belkin_Niyogi:2003, and specifically, we use the framework of diffusion maps (DM) Coifman_Lafon:2006. On the one hand, manifold learning is particularly suitable for problems with multiple modalities since it aims to capture the intrinsic geometric structure of the underlying data and relies on minimal prior model knowledge. This enables handling multimodal data in a systematic manner, without the need to specially tailor a solution for each modality. On the other hand, applying manifold learning to data acquired by multiple (multimodal) sensors may capture undesired/nuisance geometric structures as well. Recently, several manifold learning techniques for multimodal data have been proposed davenport2010joint; keller_audio_visual_2010; yair2016local; salhov2016multi. In davenport2010joint, the authors suggest concatenating the samples acquired by different sensors into unified vectors. However, this approach is sensitive to the scaling of each dataset, which might be especially diverse among datasets acquired by different modalities. To alleviate this problem, it is proposed in keller_audio_visual_2010 to use DM to obtain a “standardized” representation of each dataset separately, and then to concatenate these “standardized” representations into the unified vectors. Despite handling multimodal data better, this concatenation scheme does not utilize the mutual relations and co-dependencies that might exist between the datasets.
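To make the DM framework referenced above concrete, the following is a minimal sketch of the standard diffusion-maps construction from Coifman_Lafon:2006 (Gaussian affinity kernel, row normalization into a Markov operator, spectral embedding); the function and variable names are illustrative, not taken from any particular implementation.

```python
import numpy as np

def diffusion_maps(X, eps, n_coords=2, t=1):
    """Minimal diffusion-maps embedding of samples X (n x d).

    Builds a Gaussian affinity kernel, row-normalizes it into a
    Markov (diffusion) operator, and embeds the data using the
    leading non-trivial eigenvectors scaled by their eigenvalues.
    """
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / eps)                      # Gaussian affinities
    P = W / W.sum(axis=1, keepdims=True)       # row-stochastic diffusion operator
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)             # sort by decreasing eigenvalue
    vals, vecs = vals.real[order], vecs.real[:, order]
    # skip the trivial constant eigenvector (eigenvalue 1)
    return vecs[:, 1:n_coords + 1] * (vals[1:n_coords + 1] ** t)
```

The concatenation scheme of keller_audio_visual_2010 would apply such an embedding to each sensor's dataset separately before stacking the resulting coordinates into unified vectors.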
While methods such as davenport2010joint; keller_audio_visual_2010; salhov2016multi take into account all the measured information, the methods in lederman2015alternating; talmon2016latent use DM to implement nonlinear filtering. Specifically, following a recent line of study in which multiple kernels are constructed and combined de_sa_spectral_2005; de_sa_multi_view_2010; boots_two_manifold_2012; michaeli2015nonparametric, it was shown in lederman2015alternating; talmon2016latent that a method based on alternating applications of diffusion operators extracts only the common source of variability among the sensors, while filtering out the sensor-specific components. The shortcoming of this method arises when a large number of sensors is involved; often, sensors that measure the same system capture different information and aspects of that system. As a result, the common source of variability among all the sensors captures only a partial or empty view of the system, and important relevant information may be undesirably filtered out.
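For two aligned sensors, the alternating-diffusion construction of lederman2015alternating amounts to composing the two per-sensor diffusion operators. The following is a minimal sketch of that two-sensor case, assuming sample-wise aligned datasets; the helper names and kernel-scale parameters are illustrative.

```python
import numpy as np

def gaussian_markov(X, eps):
    """Row-stochastic Gaussian diffusion kernel for samples X (n x d)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / eps)
    return W / W.sum(axis=1, keepdims=True)

def alternating_diffusion(X1, X2, eps1, eps2):
    """Alternating-diffusion operator for two sample-aligned sensors.

    Composing the per-sensor Markov matrices lets the diffusion
    advance only along directions of variability shared by both
    sensors, filtering out the sensor-specific components.
    """
    return gaussian_markov(X1, eps1) @ gaussian_markov(X2, eps2)
```

Since each factor is row-stochastic, the product is itself a row-stochastic diffusion operator, and its leading non-trivial eigenvectors parametrize the common source of variability.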
Here, we address the tradeoff between these two approaches. That is, we aim to maintain the relevant information captured by multiple sensors, while filtering out the nuisance components. Since the relevance of the various components is unknown, our main assumption is that the sources of variability which are measured only by a single sensor, i.e., sensor-specific, are nuisance. Conversely, we assume that components measured by two or more sensors are of interest. Importantly, such an approach implicitly implements a smart “sensor selection”; “bad” sensors, e.g., malfunctioning sensors that measure only nuisance information, are automatically filtered out. These assumptions stem from the fact that the phenomenon of interest is global and not specific to one sensor. We propose a nonlinear filtering scheme, in which only the sensor-specific sources of variability are filtered out, while the sources of variability captured by two or more sensors are preserved.
Based on prior theoretical results lederman2015alternating; talmon2016latent, we show that our scheme indeed accomplishes this task. We illustrate the main features of our method on a toy problem. In addition, we demonstrate its performance on real measured data in an application for sleep stage assessment based on multiple, multimodal sensor measurements. Sleep is a global phenomenon with systematic physiological dynamics that represents a recurring non-stationary state of mind and body. Sleep evolves in time and embodies interactions between different subsystems, not limited solely to the brain. Thus, in addition to the well-known patterns in electroencephalogram (EEG) signals, its complicated dynamics are manifested in other sensors, such as sensors measuring breathing patterns, muscle tone and muscular activity, eyeball movements, etc. Each of these sensors is characterized by a different structure and is affected by numerous nuisance processes as well. In other words, while we could extract the sleep dynamics by analyzing different sensors, each sensor captures only part of the entire sleep process, and it also introduces modality artifacts, noise, and interferences. We show that our scheme allows for accurate, systematic sleep stage identification based on multiple EEG recordings as well as multimodal respiration measurements. In addition, we demonstrate its capability to perform sensor selection by artificially adding noise sensors.
The remainder of the paper is organized as follows. In Section 2, we formulate the common source extraction problem and present an illustrative toy problem. In Section 3, we briefly review the method proposed in lederman2015alternating; talmon2016latent, and then present a detailed description and interpretation of the proposed scheme. In Section 4, we first demonstrate the capabilities of the proposed scheme on the toy problem introduced in Section 2. Then, we demonstrate its performance in sleep stage identification based on multimodal data recorded in a sleep clinic. Finally, in Section 5, we outline several conclusions.
2 Problem Setting
Consider a system driven by a set of $M$ hidden random variables $\Theta = \{\theta_1, \theta_2, \dots, \theta_M\}$, where $M \geq 1$. The system is measured by $N$ observable variables $s_1, s_2, \dots, s_N$, where each sensor has access to only a partial view of the entire system and its driving variables $\Theta$. To formulate it, we define a “sensitivity table” given by the binary matrix $\mathbf{B} \in \{0,1\}^{N \times M}$, indicating the variables sensed by each observable variable. Specifically, the $(i,j)$th element $B_{i,j}$ in $\mathbf{B}$ indicates whether the hidden variable $\theta_j$ is measured by the observable variable $s_i$. The observable variables are therefore given by
$$ s_i = g_i\left(\Theta_i, n_i\right), \quad i = 1, \dots, N, $$
where $g_i$ is a bilipschitz observation function, $n_i$ are hidden random variables captured only by the $i$th observable variable, and $\Theta_i \subseteq \Theta$ is the subset of driving hidden variables of interest sensed by $s_i$, given by
$$ \Theta_i = \left\{ \theta_j \mid B_{i,j} = 1 \right\}. $$
The random hidden variables $n_i$ are sensor-specific (associated only with the $i$th observer). They are conditionally independent given the hidden variables of interest $\Theta$ and will be assumed to be noise/nuisance variables. We further assume that each random hidden variable in $\Theta$ is measured by at least two observable variables, such that $\sum_{i=1}^{N} B_{i,j} \geq 2$ for each $j = 1, \dots, M$. As a result, we refer to the hidden variables in $\Theta$ as common variables.
In order to simplify the notation, we denote the subset of all hidden variables (both common and sensor-specific) measured by the $i$th observable variable by $\Omega_i = \Theta_i \cup \{n_i\}$. Furthermore, we assume that the dimensions of the observations and the hidden variables satisfy
$$ \dim(s_i) \geq \dim(\Theta_i) + \dim(n_i), \quad i = 1, \dots, N, $$
i.e., the observations are in higher dimension than the hidden common and nuisance variables.
$M$ | number of common hidden variables
$\theta_j$ | $j$th common hidden variable
$\Theta$ | set of all common hidden variables
$N$ | number of observable variables
$s_i$ | $i$th observable variable
$g_i$ | bilipschitz $i$th observation function
$n_i$ | $i$th sensor-specific hidden (nuisance) variables
$\Theta_i$ | subset of $\Theta$ sensed by $s_i$
$\Omega_i$ | subset of all hidden variables measured by $s_i$
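As a minimal numerical illustration of this setting, the following sketch generates data from a toy instance with two common hidden variables, three sensors, and a binary sensitivity table; the dimensions, the random full-rank linear maps (which are bilipschitz with probability one, standing in for the observation functions), and all variable names are illustrative choices, not part of the formulation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, N = 500, 2, 3                     # samples, common hidden variables, sensors
theta = rng.uniform(size=(n, M))        # common hidden variables
B = np.array([[1, 0],                   # sensitivity table: sensor i senses the
              [1, 1],                   # common variables with B[i, j] = 1
              [0, 1]])
assert np.all(B.sum(axis=0) >= 2)       # each common variable seen by >= 2 sensors

observations = []
for i in range(N):
    nuisance = rng.uniform(size=(n, 1))              # sensor-specific variable
    hidden = np.c_[theta[:, B[i] == 1], nuisance]    # variables measured by sensor i
    d_hidden = hidden.shape[1]
    A = rng.standard_normal((d_hidden, d_hidden + 2))  # full-rank linear map to a
    observations.append(hidden @ A)                    # higher-dimensional observation
```

Each observation thus mixes the sensed common variables with a sensor-specific nuisance variable, and its dimension exceeds that of the hidden variables, as required.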