A spectral-spatial fusion model for robust blood pulse waveform extraction in photoplethysmographic imaging
Abstract
Photoplethysmographic imaging is a camera-based solution for non-contact cardiovascular monitoring from a distance. This technology enables monitoring in situations where contact-based devices may be problematic or infeasible, such as ambulatory, sleep, and multi-individual monitoring. However, extracting the blood pulse waveform signal is challenging due to the unknown mixture of relevant (pulsatile) and irrelevant pixels in the scene. Here, we design and implement a signal fusion framework, FusionPPG, for extracting a blood pulse waveform signal with strong temporal fidelity from a scene without requiring anatomical priors (e.g., facial tracking). The extraction problem is posed as a Bayesian least squares fusion problem, and solved using a novel probabilistic pulsatility model that incorporates both physiologically derived spectral and spatial waveform priors to identify pulsatility characteristics in the scene. Experimental results show statistically significant improvements compared to the FaceMeanPPG and DistancePPG methods. Heart rates predicted using FusionPPG correlated strongly with ground-truth measurements. FusionPPG was the only method able to assess cardiac arrhythmia via temporal analysis.
1 Introduction
Photoplethysmography (PPG) is a safe and inexpensive cardiovascular monitoring technology [1]. Using an illumination source and detector, transient fluctuations in illumination intensity pertaining to localized changes in blood volume enable noninvasive probing of vascular characteristics. Traditionally, PPG is monitored via a contact probe that operates in either transmittance or reflectance mode, where the source and detector are placed on opposite or the same side of the tissue, respectively. However, this conventional type of monitoring provides hemodynamic information only for a single point, and is an impractical measurement tool in settings such as ambulatory and multi-individual monitoring.
Recent studies have focused on developing and validating photoplethysmographic imaging (PPGI) systems [2]. These systems substitute contact-based detectors with a camera and use non-contact illumination sources, enabling non-contact cardiovascular monitoring from a distance [3]. The additional visual context can enable functionality such as motion compensation during exercise [4], multi-individual tracking [5], and spatial perfusion analysis [6]. One major challenge in PPGI is the automatic extraction of a blood pulse waveform from the video. Decoupling the detector from the body causes several challenges, such as illumination variations, motion changes, and hair occlusion. Furthermore, locations on the body that contain pulsatile flow are not readily apparent. Identifying pixels representing the skin does not guarantee pulsatile blood flow, as some areas may be minimally vascularized, or may not contain pulsatile vessels. In fact, the pulsatile nature of the blood pulse waveform is not fully understood [3, 7].
Existing methods for automatic signal extraction broadly rely on a combination of spatial and spectral information. The RGB components in cameras with Bayer filters have been leveraged to identify the blood pulse waveform using independent component analysis [5, 8], Beer-Lambert modeling [9], and skin composition modeling [10]. However, these methods rely on measuring multispectral reflectance values such as RGB, which may not be appropriate in low-light settings such as sleep monitoring. Furthermore, the tissue penetration depth of incident illumination is wavelength- and tissue-dependent [11]. To solve this problem, some methods rely only on a single wavelength (or color channel) to extract the signal through spatial analysis [4, 6, 12]. Regardless of the spectrum chosen, existing methods average the pixel intensities over chosen areas, such as the facial bounding box [4, 5, 9], predefined facial areas [8], and facial segmentation [12]. Many of these averaging approaches evaluate success based on heart rate analysis, and do not consider temporal signal fidelity, which is important for temporal analysis such as heart rate variability. Additionally, they rely on the success of a facial tracking algorithm, which may fail due to varying lighting conditions, different face-camera perspectives, or atypical facial features, and they require overhead computation which may delay real-time analysis.
Here, we propose a spectral-spatial fusion method for extracting a blood pulse waveform from a set of frames from an arbitrary scene (i.e., without facial tracking). Our goal was to extract signals that exhibited both spectral and temporal fidelity, to enable both spectral and temporal analysis. Using physiologically derived a priori spectral and spatial information about a typical blood pulse waveform, our method learns which regions contain the strongest pulsatility, and emphasizes their contribution to determine the final signal. Results across a 24-participant study show that the proposed method generated signals that exhibited significantly stronger temporal correlation and spectral entropy compared to existing methods. As a result, this method may be used in the future to assess pulsatility in different anatomical locations.
2 Methods
The goal of the spectral-spatial fusion model was to extract a clean temporal blood pulse waveform signal from a scene. By emphasizing temporal fidelity, not only can summary metrics such as heart rate be computed, but important temporal fluctuations such as cardiac arrhythmias can be assessed. The scene is assumed to contain an unknown mixture of relevant regions (i.e., skin areas which exhibit pulsatility) and irrelevant regions (e.g., background, clothing, non-pulsatile skin regions). Given this mixture of regions (input), the system must discover a temporal PPGI signal (output). Figure 1 provides a graphical overview of the system. Details are provided below.
2.1 Problem Formulation
Let $s(t)$ be the (unknown) true blood pulse waveform. Let $\{g_i(t)\}$ be a set of absorbance signals, where:

(1) $g_i(t) = \sum_k g_i[k]\,\delta(t - kT)$

where $\delta(\cdot)$ is the Dirac delta function, and $T$ is the sampling period. Here, following the Beer-Lambert law, absorbance is calculated as $g_i(t) = -\log \iota_i(t)$, where $\iota_i$ is the intensity signal for region $i$. Each signal was detrended using a regularized least squares subtraction method which heavily emphasizes a smoothness prior [13]. Given the set of measurements $\mathbf{g} = \{g_i\}$, which is a mixture of signals from a scene that are both relevant (e.g., skin) and irrelevant (e.g., background, skin folds, hair), the goal is to estimate the “true” blood pulse signal $s$ using an intelligently weighted subset of regions that contain pulsatility. This inverse problem can be formulated as a Bayesian problem, where prior physiological knowledge can be injected into the model to educate assumptions about the state (specific priors will be discussed in the following section). Mathematically, it can be solved using the Bayesian least squares formulation [14]:
(2) $\hat{s} = \arg\min_{\hat{s}} E\left[(s - \hat{s})^2 \mid \mathbf{g}\right]$

(3) $\phantom{\hat{s}} = \arg\min_{\hat{s}} \int (s - \hat{s})^2\, p(s \mid \mathbf{g})\, ds$
where $p(s \mid \mathbf{g})$ is the posterior distribution of the state signal $s$ given the measurements $\mathbf{g}$. The optimal solution is found by setting $\frac{\partial}{\partial \hat{s}} E[(s - \hat{s})^2 \mid \mathbf{g}] = 0$:

(4) $\frac{\partial}{\partial \hat{s}} \int (s - \hat{s})^2\, p(s \mid \mathbf{g})\, ds = 0$
Simplifying:

(5) $\int 2(\hat{s} - s)\, p(s \mid \mathbf{g})\, ds = 0$

(6) $\hat{s} \int p(s \mid \mathbf{g})\, ds = \int s\, p(s \mid \mathbf{g})\, ds$

(7) $\hat{s} = \int s\, p(s \mid \mathbf{g})\, ds$
Thus, to solve this equation, the unknown posterior distribution $p(s \mid \mathbf{g})$ must be modeled. This distribution represents the probability that a state signal represents the true blood pulse waveform given the observed temporal signals $\{g_i\}$. The posterior distribution can be modeled as a novel probabilistic pulsatility model, which we approximated using a discrete weighted histogram of the observed states [15]:

(8) $\hat{p}(s \mid \mathbf{g}) = \frac{1}{\alpha} \sum_i p(i)\, \delta(s - g_i)$

where $p(i)$ is the probabilistic prior for region $i$, and $\alpha$ is a normalization term such that $\int \hat{p}(s \mid \mathbf{g})\, ds = 1$. The problem then becomes computing the probabilistic prior for each observed signal to determine how well it represents the true blood pulse waveform. The following subsections propose a solution using a spectral-spatial model motivated by blood pulse waveform characteristics and vascular physiology.
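Substituting the weighted-histogram posterior of Equation 8 into Equation 7 collapses the estimate to a prior-weighted average of the observed region signals. The following is a minimal sketch of this fusion step (the function name and array layout are illustrative, not the original implementation; priors are assumed precomputed):

```python
import numpy as np

def fuse_regions(signals, priors):
    """Bayesian least squares estimate under the weighted-histogram
    posterior: a prior-weighted average of the region signals.

    signals : (N, T) array-like, one detrended absorbance signal per region
    priors  : (N,) array-like of non-negative region priors
    """
    w = np.asarray(priors, dtype=float)
    w = w / w.sum()                              # normalization term (alpha)
    return w @ np.asarray(signals, dtype=float)  # weighted sum over regions
```

For example, two regions with priors 1 and 3 contribute to the fused waveform with weights 0.25 and 0.75, respectively.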
2.2 Probabilistic Pulsatility Model
Ideally, $p(i)$ should be a function of the SNR of the estimated temporal signal, since this provides information about the signal fidelity. However, SNR requires knowing the true signal, which is unknown at the time of acquisition. A proxy metric for estimating SNR must therefore be computed using prior knowledge of blood pulse waveform characteristics. A spectral-spatial model is proposed based on the following two observations, which can be leveraged as prior information in the Bayesian framework presented above:

Spectral: Clean blood pulse waveforms are quasiperiodic, and are primarily composed of a weighted sum of a small set of sinusoidal signals (see Figure 2).

Spatial: Nonhomogeneous skin areas exhibit high variability due to anatomical nonuniformity (e.g., boundary, skin fold, hair).
For motivation, Figure 2 shows a typical power spectral density of a clean blood pulse waveform. The spectral energy is compact, and is primarily composed of two harmonic frequencies. This indicates the quasiperiodic nature of the blood pulse waveform, and provides rationale for the spectral model.
In order to compute spectral properties, the normalized 0-DC spectral power distribution $P_i(f)$ for spatial region $i$ was computed:

(9) $P_i(f) = \frac{|G_i(f)|^2}{\sum_{f' \neq 0} |G_i(f')|^2}, \quad P_i(0) = 0$

where $G_i(f)$ are the complex-valued Fourier transform frequency coefficients for $g_i$. The normalized spectral power (i.e., $\sum_f P_i(f) = 1$) was used to model the relative AC pulsatile amplitude in the unitless blood pulse waveforms.
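Under the stated definitions, Equation 9 can be sketched with a discrete Fourier transform, discarding the DC coefficient and normalizing the remaining power (the helper name is hypothetical; real-valued input is assumed):

```python
import numpy as np

def normalized_spectral_power(g):
    """Normalized 0-DC spectral power distribution: the DC term is
    suppressed and the remaining power sums to 1 (relative AC power)."""
    G = np.fft.rfft(np.asarray(g, dtype=float))  # Fourier coefficients
    P = np.abs(G) ** 2
    P[0] = 0.0                # discard the DC component
    return P / P.sum()        # normalize to a unitless distribution
```

A pure sinusoid with an integer number of cycles concentrates all of its normalized power in a single frequency bin.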
The quasiperiodic blood pulse waveform is dominated by the fundamental frequency $f^*$ corresponding to the heart rate and the first harmonic (see Figure 2). To quantify this property, the spectral power exhibited by the fundamental frequency and first harmonic was computed:

(10) $\bar{p}_i = \sum_{f \in \Omega} P_i(f)$

where $\Omega = [f^* - \Delta f,\, f^* + \Delta f] \cup [2f^* - \Delta f,\, 2f^* + \Delta f]$, and $\Delta f$ is the spectral window’s half-width, fixed across all experiments. $\bar{p}_i$ was set to 0 for signals whose fundamental frequency was outside of the physiologically realistic heart rate range.
The final “harmonic prior” was computed as:

(11) $p_h(i) = \exp\left(-\frac{1 - \bar{p}_i}{\sigma_h}\right)$

where $\sigma_h$ is a tuning parameter. An inverse exponential was used to emphasize small values of $1 - \bar{p}_i$ (i.e., strong harmonic contributions).
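A plausible realization of Equations 10 and 11 follows. The windowing arithmetic, the use of the power missing from the two harmonic windows as the exponentiated quantity, the default value of the tuning parameter, and the function name are all illustrative assumptions:

```python
import numpy as np

def harmonic_prior(P, freqs, f0, df, sigma_h=0.1):
    """Harmonic prior sketch: sum the normalized spectral power inside
    windows of half-width df around the fundamental f0 and the first
    harmonic 2*f0; signals dominated by these harmonics score near 1.

    P     : normalized 0-DC spectral power distribution
    freqs : frequency (Hz) associated with each bin of P
    """
    in_window = (np.abs(freqs - f0) <= df) | (np.abs(freqs - 2 * f0) <= df)
    p_bar = P[in_window].sum()                  # harmonic power
    # inverse exponential emphasizes strong harmonic contributions
    return float(np.exp(-(1.0 - p_bar) / sigma_h))
```

A distribution with all of its power at the fundamental and first harmonic scores 1; a distribution with all of its power elsewhere scores near 0.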
To quantify noise exhibited by the quasiperiodic waveform, the maximum spectral power response outside of the fundamental heart rate range was found:

(12) $n_i = \max_{f \notin \Omega} P_i(f)$
The final “noise prior” was computed as:

(13) $p_n(i) = \exp\left(-\frac{n_i}{\sigma_n}\right)$

where $\sigma_n$ is a tuning parameter. An inverse exponential model was used to emphasize small values of $n_i$ (i.e., low noise).
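Equations 12 and 13 admit a similarly short sketch, reusing the harmonic windows from Equation 10 (the function name and default tuning value are illustrative):

```python
import numpy as np

def noise_prior(P, freqs, f0, df, sigma_n=0.1):
    """Noise prior sketch: the strongest spectral response outside the
    fundamental/harmonic windows; low out-of-band power scores near 1."""
    in_window = (np.abs(freqs - f0) <= df) | (np.abs(freqs - 2 * f0) <= df)
    n = P[~in_window].max()             # maximum out-of-band spectral power
    return float(np.exp(-n / sigma_n))  # inverse exponential: low noise -> 1
```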
Local anatomical variations may corrupt any pulsatile signals exhibited by underlying vessels (e.g., hair, skin fold, shadow ridge), or may not contain a pulsatile component at all (e.g., clothing, naris, eyelid). In order to estimate the anatomical uniformity at a given location, the image gradient was computed. In particular, given an image scene $I$ whose individual regions are $\{x_i\}$, the “spatial prior” was computed as:

(14) $p_s(i) = \exp\left(-\frac{\|\nabla I(x_i)\|}{\sigma_s}\right)$

where $\nabla I$ is the gradient of image $I$ and $\sigma_s$ is a tuning parameter. An inverse exponential model was used to emphasize small values of $\|\nabla I\|$ (i.e., homogeneous areas).
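The spatial prior of Equation 14 can be sketched per pixel using a finite-difference gradient as a stand-in for the gradient operator (the default tuning value is illustrative):

```python
import numpy as np

def spatial_prior(image, sigma_s=1.0):
    """Spatial prior sketch: gradient magnitude of the scene; homogeneous
    (low-gradient) areas score near 1, edges and boundaries score lower."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    grad_mag = np.hypot(gx, gy)          # per-pixel gradient magnitude
    return np.exp(-grad_mag / sigma_s)   # inverse exponential model
```

A flat (perfectly homogeneous) patch scores 1 everywhere, while a sharp intensity boundary pulls the prior below 1.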
The individual priors for region $i$ were combined to form the final region spectral-spatial probabilistic prior:

(15) $p(i) = \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} p_h(j)\, p_n(j)\, p_s(j)$

where $\mathcal{N}_i$ is the neighbourhood around region $i$. Here, a regional first-order statistic constraint was imposed on the priors in order to further enforce spatial cohesion. Substituting this into Equation 8 produces the estimate of the posterior distribution $\hat{p}(s \mid \mathbf{g})$.
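Assuming the regional first-order statistic is a neighbourhood mean, Equation 15 can be sketched as follows (names and data layout are illustrative):

```python
import numpy as np

def region_prior(p_h, p_n, p_s, neighbours):
    """Combined spectral-spatial prior sketch: the product of the harmonic,
    noise, and spatial priors, averaged over each region's neighbourhood
    to enforce spatial cohesion.

    p_h, p_n, p_s : (N,) per-region priors
    neighbours    : neighbours[i] lists the region indices in region i's
                    neighbourhood (including i itself, by convention here)
    """
    p = np.asarray(p_h) * np.asarray(p_n) * np.asarray(p_s)
    return np.array([p[list(idx)].mean() for idx in neighbours])
```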
3 Results
3.1 Setup
Data were collected across 24 participants of varying age (9–60 years) and body composition (fat percentage, muscle percentage, and body mass index were recorded). Participants assumed a supine position throughout the study. A coded hemodynamic imaging (CHI) system was positioned facing down at the participant at a distance of 1.5 m, comprising a monochromatic camera with NIR sensitivity (Point Grey GS3-U3-41C6NIR) and a diffuse halogen illumination source (Lowel Rifa eX 44). To capture deep tissue penetration using NIR wavelengths, and to minimize the effects of visible environmental illumination (e.g., flicker), an 850–1000 nm optical bandpass filter was mounted in front of the camera lens. A video of each participant was recorded at 60 fps with a 16 ms exposure time. The frames were downsampled using blockwise averaging. The ground-truth PPG waveform was synchronously recorded using the Easy Pulse photoplethysmography finger cuff [16].
We compared our method, henceforth called FusionPPG, with DistancePPG [12] and “FaceMeanPPG”, in which the face is tracked and the signal is extracted through framewise spatial averaging; this approach is commonly used in similar studies [4, 5, 8]. Many pulse extraction methods rely on processing individual color channels [5, 8, 9, 17], and were therefore infeasible for this study (and are infeasible in low-light settings, such as sleep studies). For our implementation of FaceMeanPPG, we spatially averaged the area identified by the Viola-Jones face tracker [5]. In its original implementation, DistancePPG requires estimating the true heart rate based on an averaging approach similar to FaceMeanPPG [12]. To generate optimal comparison results, DistancePPG was provided with the ground-truth heart rate (rather than its own estimation method, which was found to fail in some cases).
In order to evaluate and compare signal fidelity between methods, normalized spectral entropy ($H$) and Pearson’s linear correlation coefficient ($r$) were computed for each extracted signal:

(16) $H = -\frac{1}{\log N_f} \sum_f P(f) \log P(f)$

where $P(f)$ is the normalized spectral power for the extracted signal according to Equation 9 and $N_f$ is the number of frequency bins, and
(17) $r = \frac{\sigma_{\hat{s}s}}{\sigma_{\hat{s}}\, \sigma_s}$

where $\sigma_{\hat{s}}$ and $\sigma_s$ are the standard deviations of the extracted signal and ground-truth signal respectively, and $\sigma_{\hat{s}s}$ is the covariance between the two signals. To account for pulse timing differences between the neck/head and finger, the maximum forward-sliding cross-correlation value within a short temporal window was used.
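The two fidelity metrics can be sketched as follows; the forward-lag search in the correlation helper is an illustrative reading of the sliding cross-correlation step, and both function names are hypothetical:

```python
import numpy as np

def spectral_entropy(g):
    """Normalized spectral entropy sketch: entropy of the 0-DC power
    distribution divided by log of the number of bins (range [0, 1])."""
    G = np.fft.rfft(np.asarray(g, dtype=float))
    P = np.abs(G) ** 2
    P[0] = 0.0
    P = P / P.sum()
    nz = P[P > 0]                     # avoid log(0)
    return float(-(nz * np.log(nz)).sum() / np.log(P.size))

def max_corr(est, ref, max_lag):
    """Pearson correlation maximized over a short forward-sliding window,
    absorbing the neck/head-to-finger pulse transit delay."""
    est, ref = np.asarray(est, dtype=float), np.asarray(ref, dtype=float)
    best = -1.0
    for lag in range(max_lag + 1):
        a, b = est[lag:], ref[:len(ref) - lag]
        best = max(best, float(np.corrcoef(a, b)[0, 1]))
    return best
```

A quasi-periodic signal yields a low entropy (compact spectrum), whereas broadband noise yields an entropy near 1.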
The heart rate of a blood pulse waveform signal was computed in the temporal domain using an autocorrelation scheme for increased temporal resolution [18]. Specifically, each waveform was resampled at 200 Hz using cubic spline interpolation, and autocorrelation peaks were detected and used to estimate heart rate:

(18) $\mathrm{HR} = \frac{60 f_s}{\tau^*}$
where $f_s$ is the sampling rate, and $\tau^*$ is the time shift (in samples) yielding the peak autocorrelation response. Hyperparameter optimization was performed to find optimal tuning parameters using a grid search method with the following performance metric:

(19) $J = \frac{\sum_{f \in \Omega^*} |\hat{G}(f)|^2}{\sum_{f \notin \Omega^*} |\hat{G}(f)|^2}$

where $\hat{G}(f)$ is the Fourier coefficient of the estimated signal $\hat{s}$, and $\Omega^*$ is the set of coefficients pertaining to the fundamental frequency and first harmonic of the ground-truth signal. When choosing the optimal hyperparameters, signals exhibiting physiologically unrealistic heart rates were excluded.
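The temporal-domain heart rate estimate of Equation 18 can be sketched as below; the cubic-spline resampling step is omitted, and the physiological search range (40–200 bpm) and function name are assumptions:

```python
import numpy as np

def heart_rate_autocorr(x, fs, min_bpm=40.0, max_bpm=200.0):
    """Heart rate from the autocorrelation peak: 60 * fs / tau, where tau
    is the lag (in samples) of the strongest autocorrelation response
    within a physiologically plausible beat-period range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]  # non-negative lags
    lo = int(fs * 60.0 / max_bpm)   # shortest plausible beat period
    hi = int(fs * 60.0 / min_bpm)   # longest plausible beat period
    tau = lo + int(np.argmax(ac[lo:hi + 1]))
    return 60.0 * fs / tau
```

For a clean 1.2 Hz (72 bpm) sinusoid sampled at 200 Hz, the estimate lands within a fraction of a beat per minute of the true rate.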
One participant’s data were removed due to erroneous groundtruth waveform readings. The study was approved by a University of Waterloo Research Ethics committee.
3.2 Data Analysis
Figure 3 shows the signals extracted using the proposed fusion method compared to the ground-truth finger waveform. The waveforms exhibited high temporal fidelity, and were highly correlated with the ground-truth waveforms. The foot of each blood pulse waveform can be observed, signifying the precise time of the start of ventricular contraction. The method failed on one participant due to high fat content (42.3%).
FusionPPG outperformed both comparison methods on the sample dataset. Figure 4 compares the box plots of the proposed and comparison methods using correlation (higher is better) and normalized spectral entropy (lower is better). FusionPPG attained statistically significantly higher correlation to the ground-truth waveform than FaceMeanPPG and DistancePPG, signifying signals with higher temporal fidelity. FusionPPG also attained statistically significantly lower normalized spectral entropy than FaceMeanPPG and DistancePPG, signifying more compact frequency components, consistent with the quasi-periodic nature of a true blood pulse waveform. DistancePPG attained higher correlation and lower entropy than FaceMeanPPG, consistent with previous findings [12].
FusionPPG was able to precisely estimate heart rate from the extracted waveforms. Figure 5 shows the correlation and Bland-Altman plots demonstrating FusionPPG’s ability to extract precise and accurate heart rates. The predicted heart rates were highly correlated with the ground-truth heart rates, and were in tight agreement, with low mean error and low variance. The data were well represented within two standard deviations from the mean. The outlier was omitted from this analysis due to failed signal extraction.
Figure 6 compares the waveforms extracted from four participants using the three methods against the ground-truth waveform. The strongest waveforms (i.e., highest correlation) from DistancePPG, FaceMeanPPG, and FusionPPG are shown. An important characteristic is the foot of the waveform, which signifies the start of ventricular contraction. This foot was observed in each FusionPPG waveform, whereas it was not easily discernible in either DistancePPG or FaceMeanPPG due to the effects of averaging; averaging may preserve a strong fundamental frequency sufficient for heart rate prediction, while spurious irrelevant frequencies corrupt the waveform shape.
Figure 6(d) shows a participant who experienced a cardiac arrhythmia. An irregular cardiac contraction was observed, resulting in a delayed contraction. Such cases cannot be observed using standard heart rate analysis in the frequency domain. However, the irregular heartbeat and delayed follow-up contraction were observed in FusionPPG’s waveform, whereas they were not apparent in FaceMeanPPG or DistancePPG. This demonstrates the importance of temporal signal fidelity for assessing irregular cardiac events that deviate from typical waveforms.
4 Discussion
In this study, the most pulsatile areas were often found in the neck region. The neck contains important vascular pathways, including the carotid arteries, which are major vessels that are closely connected to the heart and lie closer to the skin surface than other major arteries in the body. Pulsatile information in the face is more subdued, since the small arteries and arterioles are found further down the vascular tree. Thus, many existing methods that extract signals from the face may be at a disadvantage, and miss the rich information present in the neck.
The extraction method failed on the participant who had the highest fat % of the sample. Skin folds and thick tissue layers contributed to the inability to extract a signal with any method. Many existing studies do not provide participant body composition, which is an important parameter when assessing signal strength.
An important characteristic that is not often discussed in PPGI studies is the ability to extract and assess abnormal waveforms (e.g., arrhythmia). In order to be useful as a health monitoring system, a PPGI system must not directly or indirectly assume normal waveforms. This was apparent in the arrhythmia case (Figure 6(d)), where only FusionPPG’s waveform was able to temporally convey an abnormal cardiac event. During validation, emphasis should be placed on detecting abnormal as well as normal waveforms.
Traditional heart rate variability is assessed through the RR peak intervals using an electrocardiogram. However, similar timing differences can be observed and quantified using the blood pulse waveform. An important part of this waveform is the blood pulse foot, which is the minimum point just prior to inflection due to the oncoming blood pulse. The blood pulse is ejected from the heart due to left ventricular contraction, which is directly controlled by the electrical signals governing the heart mechanics. The timing difference between the ECG’s R peak and the PPG’s foot is the pulse transit time. Thus, timing differences between the blood pulse feet indicate timing differences in the heart [19].
Many existing methods, including DistancePPG and FaceMeanPPG, require tracking and/or segmenting the individual’s face. However, it may be beneficial to assess pulsatility in areas other than the face (e.g., arm, hand, leg, foot). These methods will fail at this task since no face will be detected. In contrast, FusionPPG does not make any a priori assumptions about anatomical locations, and may therefore be used to assess pulsatility at other anatomical locations in future work.
5 Conclusions
We have proposed a probabilistic signal fusion framework, FusionPPG, for extracting a blood pulse waveform signal from a scene using physiologically derived prior information. This was accomplished by posing the problem as a Bayesian estimation problem and modeling the posterior distribution as a novel probabilistic pulsatility model that incorporated spectral and spatial priors derived from blood pulse waveform physiology. Results showed signals with strong temporal fidelity. The proposed method’s improvement over the comparison methods was statistically significant when assessed via correlation and normalized spectral entropy. The FusionPPG waveform was the only one in which cardiac arrhythmia was identifiable in the temporal domain. The model has been presented in a general form that allows for future extensions based on this theoretical framework.
Funding
This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, AGEWELL NCE Inc., and the Canada Research Chairs program.
References
 [1] J. Allen, “Photoplethysmography and its application in clinical physiological measurement,” Physiological Measurement 28(3), R1–R39 (2007).
 [2] Y. Sun and N. Thakor, “Photoplethysmography revisited: from contact to noncontact, from point to imaging,” IEEE Transactions on Biomedical Engineering 63(3), 463–477 (2016).
 [3] J. Allen and K. Howell, “Microvascular imaging: techniques and opportunities for clinical physiological measurements,” Physiological Measurement 35(7), R91 (2014).
 [4] Y. Sun, S. Hu, V. Azorin-Peris, S. Greenwald, J. Chambers, and Y. Zhu, “Motion-compensated noncontact imaging photoplethysmography to monitor cardiorespiratory status during exercise,” Journal of Biomedical Optics 16(7), 077010-1–077010-9 (2011).
 [5] M.-Z. Poh, D. J. McDuff, and R. W. Picard, “Non-contact, automated cardiac pulse measurements using video imaging and blind source separation,” Optics Express 18(10), 10762–10774 (2010).
 [6] A. A. Kamshilin, S. Miridonov, V. Teplov, R. Saarenheimo, and E. Nippolainen, “Photoplethysmographic imaging of high spatial resolution,” Biomedical Optics Express 2(4), 996–1006 (2011).
 [7] A. A. Kamshilin, E. Nippolainen, I. S. Sidorov, P. V. Vasilev, N. P. Erofeev, N. P. Podolian, and R. V. Romashko, “A new look at the essence of the imaging photoplethysmography,” Scientific Reports 5(10494) (2015).
 [8] D. McDuff, S. Gontarek, and R. W. Picard, “Remote detection of photoplethysmographic systolic and diastolic peaks using a digital camera,” IEEE Transactions on Biomedical Engineering 61(12), 2948–2954 (2014).
 [9] S. Xu, L. Sun, and G. K. Rohde, “Robust efficient estimation of heart rate pulse from video,” Biomedical Optics Express 5(4), 1124–1135 (2014).
 [10] W. Wang, S. Stuijk, and G. De Haan, “Exploiting spatial redundancy of image sensor for motion robust rPPG,” IEEE Transactions on Biomedical Engineering 62(2), 415–425 (2015).
 [11] R. Anderson and J. A. Parrish, “The optics of human skin,” Journal of Investigative Dermatology 77(1), 13–19 (1981).
 [12] M. Kumar, A. Veeraraghavan, and A. Sabharwal, “DistancePPG: Robust noncontact vital signs monitoring using a camera,” Biomedical Optics Express 6(5), 1565–1588 (2015).
 [13] M. P. Tarvainen, P. O. Ranta-aho, and P. A. Karjalainen, “An advanced detrending method with application to HRV analysis,” IEEE Transactions on Biomedical Engineering 49(2), 172–175 (2002).
 [14] P. Fieguth, Statistical Image Processing and Multidimensional Modeling, Springer (2010).
 [15] A. Wong, A. Mishra, W. Zhang, P. Fieguth, and D. A. Clausi, “Stochastic image denoising based on Markov-chain Monte Carlo sampling,” Signal Processing 91(8), 2112–2120 (2011).
 [16] RB, “Easy Pulse sensor (version 1.1) overview (part 1).” http://embeddedlab.com/blog/?p=7336 (2013). (12 Dec 2014).
 [17] G. de Haan and A. van Leest, “Improved motion robustness of remote-PPG by using the blood volume pulse signature,” Physiological Measurement 35(9), 1913 (2014).
 [18] J. Wander and D. Morris, “A combined segmenting and nonsegmenting approach to signal quality estimation for ambulatory photoplethysmography,” Physiological Measurement 35(12), 2543–2561 (2014).
 [19] A. Schäfer and J. Vagedes, “How accurate is pulse rate variability as an estimate of heart rate variability?: A review on studies comparing photoplethysmographic technology with an electrocardiogram,” International Journal of Cardiology 166(1), 15–29 (2013).