Partial Separability and Functional Graphical Models for Multivariate Gaussian Processes
The covariance structure of multivariate functional data can be highly complex, especially if the multivariate dimension is large, making extension of statistical methods for standard multivariate data to the functional data setting quite challenging. For example, Gaussian graphical models have recently been extended to the setting of multivariate functional data by applying multivariate methods to the coefficients of truncated basis expansions. However, a key difficulty compared to multivariate data is that the covariance operator is compact, and thus not invertible. The methodology in this paper addresses the general problem of covariance modeling for multivariate functional data, and functional Gaussian graphical models in particular. As a first step, a new notion of separability for multivariate functional data is proposed, termed partial separability, leading to a novel Karhunen-Loève-type expansion for such data. Next, the partial separability structure is shown to be particularly useful in order to provide a well-defined Gaussian graphical model that can be identified with a sequence of finite-dimensional graphical models, each of fixed dimension. This motivates a simple and efficient estimation procedure through application of the joint graphical lasso. Empirical performance of the method for graphical model estimation is assessed through simulation and analysis of functional brain connectivity during a motor task.
Keywords: Functional Data; Inverse Covariance; Functional Brain Connectivity
The analysis of functional data continues to be an important field for statistical development given the abundance of data collected over time via sensors or other tracking equipment. Frequently, such time-dependent signals are vector-valued, resulting in multivariate functional data. Prominent examples include longitudinal behavioral tracking , blood protein levels , traffic measurements [6, 7], and neuroimaging data [19, 13], for which dimensionality reduction and regression have been the primary methods investigated. As for standard multivariate data, the nature of dependencies between component functions of multivariate functional data constitute an important question requiring careful consideration.
As our motivating data example, we will consider dependencies between fMRI signals for a large number of regions across the brain during a motor task experiment. Since fMRI signals are collected simultaneously, it is natural to consider a multivariate process , where is an interval in over which the scans are taken, as a model for the fMRI signals . The dual multivariate and functional aspects of the data make the covariance structure of quite complex, particularly if the multivariate dimension is large. This leads to difficulties in extending highly useful multivariate analysis techniques, such as graphical models, to multivariate functional data without further structural assumptions. For example, in the analogous setting of spatio-temporal data, it is common to impose further structure to the covariance, usually assuming that the spatial and temporal effects can be separated in some way. However, similar notions for multivariate functional data have not yet been considered.
As for ordinary multivariate data, the conditional independence properties of are perhaps of greater interest than marginal covariance, leading to the consideration of inverse covariance operators and graphical models for functional data. If is Gaussian, each component function is represented by a node in the functional Gaussian graphical model (FGGM), consisting of a single network of nodes. This is inherently different from methods seeking to estimate time-dependent graphical models (e.g. [28, 15, 22, 20]). For such models, the graph is dynamic and has nodes corresponding to scalar random variables. In an FGGM, the graph is static while each node represents an infinite-dimensional functional object. This is an important distinction, as covariance operators for functional data are compact and thus not invertible in the usual sense, so that presence or absence of edges cannot in general be identified immediately with zeros in any precision operator. In the past few years, there has been some investigation into FGGMs.  developed a Bayesian framework for graphical models on product function spaces, including the extension of Markov laws and appropriate prior distributions.  implemented a truncation approach, whereby each function is represented by the coefficients of a truncated basis expansion using functional principal components analysis, and a finite-dimensional graphical model is estimated by a modified graphical lasso criterion.  developed a non-Gaussian variant, where conditional independence was replaced by a notion of so-called additive conditional independence.
The methodology proposed in this paper is within the setting of multivariate Gaussian processes as in , and we explore notions of separability for multivariate functional data and their implications on the existence and estimation of suitable inverse covariance objects. There are at least three novel contributions of this methodology to the fields of functional data analysis and Gaussian graphical models. First, we define partial separability for multivariate functional data that yields a novel representation termed the partial separability Karhunen-Loève expansion (PSKL), given its similarity to the well-known analog for univariate functional data. The second contribution is to show that, when the process is indeed partially separable, an FGGM is well-defined and can be identified with a sequence of finite-dimensional graphical models. In particular, the FGGM under partial separability overcomes the problem of noninvertibility of the covariance operator when is infinite-dimensional, in contrast with [29, 21] which assumed that the functional data were concentrated on finite-dimensional subspaces. Third, we develop an intuitive estimation procedure for the partially separable functional Gaussian graphical model (psFGGM) based on simultaneous estimation of multiple graphical models. Furthermore, theoretical properties are derived under the regime of fully observed functional data. Empirical performance of the psFGGM is then compared to the FGGM method of  through simulations involving dense and noisily observed functional data, including a setting where partial separability is not satisfied. Finally, the method is applied to the study of functional brain connectivity using data from the Human Connectome Project (HCP) corresponding to a motor task experiment. Through these practical examples, psFGGM is shown to provide improved efficiency in estimation and computation. We also note that a downloadable version of both PSKL and psFGGM methods has been developed in R, and is freely available at https://github.com/javzapata/fgm.
2.1 Multivariate Functional Data
Let denote the space of square-integrable measurable functions on endowed with the standard inner product
and associated norm is its -fold Cartesian product for fixed , endowed with inner product and its associated norm In this paper, multivariate functional data constitute a random sample from a multivariate process , which we assume to be zero-mean and Gaussian such that almost surely and . The infinite-dimensional nature of these data makes dimension reduction a prerequisite for any statistical analysis, and we now review the two most commonly used methods of linear dimension reduction for multivariate functional data. We restrict ourselves to linear methods since a main object of the paper is to fit a graphical model as defined by the covariance properties of
The first method is ordinary FPCA applied to each component function. For each , the probability measure on associated with is uniquely characterized by the covariance kernel or, equivalently, the associated integral operator given by The well-known Karhunen-Loève expansion states that there exists an orthonormal sequence formed by the eigenfunctions of such that
A main takeaway from (1) is the equivalence between and the infinite sequence of uncorrelated functional principal component scores (FPCs), which satisfy and Here, represents the nonincreasing sequence of repeated eigenvalues of and satisfies . Expansion (1) is optimal in a univariate sense, as truncation of the series retains the maximum amount of variability for each component .
A second option is multivariate FPCA (mFPCA, ), which is based on the covariance operator of , where and is the integral operator with kernel The multivariate Karhunen-Loève expansion is based on the orthonormal sequence of eigenfunctions of :
In this expansion, the multivariate FPCs form a sequence of uncorrelated random variables, and are optimal in the sense of retaining variability of the multivariate process.
Turning to the goal of estimating a functional graphical model, the conditional dependency structure of is informally represented by the conditional covariance kernels
where and is the vector of functions remaining after removing and . If this object is well-defined, the conditional independence graph corresponding to is built by requiring if and only if for all  utilized univariate FPCA (1) separately for each function, and demonstrated that, if there exists such that for can be identified with a finite-dimensional precision matrix, thereby reducing the functional problem to an essentially multivariate one. Such a transition from a continuous kernel to a matrix is possible only if the conditioning variable is finite dimensional. Interestingly, while the Bayesian methodology in  provided a more rigorous definition of conditional independence, priors on the covariance were required to concentrate on finite-dimensional subspaces. Such an assumption is unreasonable for most functional data arising in practice, given the complex phenomena they represent. Aside from this consideration, a key practical drawback of univariate FPCA is that the choice of each component’s basis is made without regard to the dependencies between components, so that key aspects of the covariance between features can be omitted in the reduction. On the other hand, mFPCA in (2) is useless in terms of fitting a graphical model, since the multivariate FPCs are scalar random variables and cannot be used to estimate conditional dependencies between component functions.
Intuitively, the goal is to combine the strengths of (1) and (2) as follows. Each component function is represented by a sequence of scores obtained by projecting on an orthonormal basis (as in FPCA), where the basis is chosen optimally to retain the variability in the multivariate process (as in mFPCA). Results in Section 3 demonstrate that this can be achieved by applying an assumption of separability to . Under this assumption, the conditional covariance kernels in (3) are automatically well-defined, and can be identified with a sequence of covariance matrices, leading to substantial computational advantages.
3 Notions of Separability
The multi-way structure of the data, represented by the discrete and continuous indices and , make modelling difficult, particularly for understanding and estimating conditional independence. When encountering such complex data, structural assumptions can often simplify matters greatly, both in terms of interpretation and computation. In this case, it is desirable to separate the multivariate and functional aspects of the data. Motivated by recent work on dimension reduction and separability for so-called two-way functional data [1, 17, 4], we investigate similar separability notions for multivariate functional data. Recall that is the covariance operator of with orthonormal eigenfunctions and let denote its kernel. The developments in this section do not require that be Gaussian.
is strongly separable if there exist a covariance matrix and a symmetric, nonnegative definite covariance kernel such that
Thus, strong separability of states that its covariance properties factor cleanly into multivariate and functional aspects. Of course, and are not identifiable in Definition 1, since one can be scaled by a positive constant and the other by its inverse. However, the eignevectors of , denoted , and eigenfunctions of , denoted , are unique. As a consequence, the eigenfunctions of are given by the tensor basis and the scores are uncorrelated whenever or In fact, it is this last property that is most useful from a statistical point of view. As strong separability is a very restrictive assumption, this leads to the following definition of weak separability .
is weakly separable if there exist orthonormal bases and of and , respectively, such that the array
consists of uncorrelated random variables.
If weak separability holds, it need not be the case that cleanly factor into multivariate and functional components so that are not necessarily eigenfunctions of . However, the following parallel holds (see  for a similar result). Define
If is weakly separable and the eigenvalues of (respectively, ) have multiplicity one, then the orthonormal basis (respectively, ) is unique and consists of the eigenvectors (resp., eigenfunctions) of (resp., ).
By weak separability we have for . Expanding the kernel in the tensor basis
Thus, are the eigenvectors of .
On the other hand, if we marginalize over ,
we see that are the eigenfunctions of .
The key point of this result is the identifiability of the target dimension reduction spaces. Weak separability implies that the score vectors are uncorrelated across . When is Gaussian, these vectors become independent Gaussian -vectors. As we will see, this independence provides the main simplification in the fitting of an FGGM. However, weak separability implies a bit more, specifically that the stochastic processes are independent across which is of no particular advantage in our setting. We thus weaken separability even further as follows.
is partially separable if there exists an orthonormal basis of such that the random vectors , are mutually uncorrelated.
Similar to weak separability, if partial separability holds, the basis is unique. This fact allows for simple estimation of the basis as will be described in Section 5. Additionally, the basis is optimal under a certain fraction of variance explained criterion.
Suppose is partially separable and that the eigenvalues of the operator with kernel are positive and have multiplicity one. Then
the basis is unique and consists of the eigenfunctions , and
for any orthonormal basis of and any
where the are ordered by decreasing eigenvalue, with equality if and only if
To see part a), first observe that almost surely, for any satisfying Definition 3. Let be the unique Mercer expansion , where form an orthonormal sequence in and is a nonincreasing sequence of positive numbers with But by partial separability,
implying that for each . For part b), notice that is a self-adjoint Hilbert-Schmidt operator with eigenvalues and eigenfunctions . Hence, by the Schmidt-Mirsky Theorem we see that, for and denoting the Hilbert-Schmidt norm, the minimization
is uniquely attained by the integral operator with kernel . Therefore, for any orthonormal basis of we have
with equality if and only if .
4 The Functional Gaussian Graphical Model for Partially Separable Processes
Under the assumption that is partially separable, the multivariate process admits an expansion
dubbed the partial separability Karhunen-Loève expansion (PSKL). We thus refer to as the PSKL basis. The coefficient vectors satisfy whenever ; they are mutually uncorrelated. While the assumption of partial separability can be made independently of Gaussianity, under the assumption of a Gaussian process, this implies that the are mutually independent Gaussian random vectors. Let so that For ease of presentation, suppose is positive definite for all
Recall that, in order to define a coherent FGGM, one needs a well-defined measure of conditional covariance between component functions and informally described by (3). For any fixed define and the quantities
Then the conditional covariance between and given is
Since we are dealing with a finite-dimensional model for each we also have a connection between and the precision matrix Specifically,
The next result establishes that the conditional covariance functions are well-defined when the PSKL expansion holds. The proof of this and all remaining results can be found in the Appendix.
If is partially separable, then, for all the covariance between and conditional on the multivariate subprocess is
Now, the conditional independence graph for the multivariate Gaussian process can be defined by if and only if . In fact, the edge set in this partially separable functional Gaussian graphical model (psFGGM) has a direct connection to the finite-dimensional graphs for which if and only if
Under the setting of Theorem 2, the functional graph edge set is related to the sequence of edge sets by
The developments of the previous section suggest that, under the assumption of partial separability, the precision matrices contain all of the conditional independence information about . In fact, as a consequence of the PSKL expansion, so it is more convenient and stable to work with where is the correlation matrix corresponding to An immediate consequence will be an estimator of the overall edge set, but contains even richer information beyond the adjacency matrix, such as the coefficients in the conditional covariance functions in (7). After defining the estimation procedure, convergence rates will be established under the setting of fully-observed functional data. This setting is used for simplicity and in order to maintain comparability with , who also made this assumption in theoretical developments. Simulations and data analysis will investigate the estimators under discretely and noisily observed functional data.
Consider a sample independently and identically distributed according to a multivariate Gaussian process In order to make these methods applicable to any functional data set, it is assumed that one has constructed preliminary estimates and of the individual mean functions and auto- and cross-covariance functions for each component. As an example, if the are fully observed, cross-sectional estimates
leading to empirical eigenvalue/eigenfunction pairs as estimates of the PSKL components. These quantities can be used to compute estimates of as
A group graphical lasso approach  will be used to estimate the Let
be the estimated correlations. The estimation targets the first inverse correlation matrices by
This corresponds to a penalized likelihood objective function under the Gaussian assumption on the , where the penalty function is
The parameters and control the overall sparsity level and common sparsity pattern, respectively, in the estimates although we have suppressed this dependence in our notation. Then the estimated edge set is if and only if Although the joint graphical lasso estimator in  was proposed for borrowing structural information across multiple classes of multivariate data, it is utilized here for a slightly different purpose. Due to the equality inducing a common sparsity pattern in the precision matrix estimates intuitively causes the most prominent edges, those that appear in many different sets , to be included in the estimated graph . Lastly, letting be the diagonal matrix of scale estimates one immediately obtains estimates and
The main consistency result considers joint estimation of the matrices under the setting of fully observed functional data, so that and are as in (8), and with in (11), corresponding to a strict graphical lasso penalty. As a preliminary result, we first derive a concentration inequality for the estimated covariances in (9), requiring the following mild assumption.
The eigenvalues are distinct, and thus strictly decreasing.
The eigenvalue spacings play a key role in the first theorem through the quantities and
Suppose that is a partially separable Gaussian process and that (A1) holds. Then there exist constants such that, for any and for all and
A similar result was obtained by  under much stricter assumptions. As a comparison, if we simply add the assumption that for some , then for any , the choice yields
Additionally, our method of proof is greatly simplified by using the inequality
where is the usual norm. Because the estimator targets the inverse correlation matrices, the following corollary is useful. Let and
Under the setting of Theorem 3, there exist such that for any and all
Finally, applying this concentration result to the estimators in the usual way, a joint rate of convergence is obtained for the first inverse correlation estimates , where diverges with This requires one additional assumption.
The condition numbers of are uniformly bounded above.
Let denote the matrix Frobenius norm and write for the number of elements (edge pairs) in
Suppose assumptions (A1) and (A2) hold, and define If satisfies and then
For example, if the edge sets are eventually empty so that , and if , grows at a polynomial rate, then the choices and for yield a rate of for
6 Numerical Experiments
We perform extensive simulation studies in this section comparing psFGGM with the FGGM of . Other potentially competing non-functional based approaches are not included since they are clearly outperformed by FGGM (see ).
6.1 Simulation Setup
All settings start with a conditional independence graph with nodes and edge set . The graph is generated from a power law distribution with parameter . Then, for a fixed , a sequence of edge sets is generated so that . This process has two main steps. First, a set of common edges to all edge sets is computed and denoted as for a given proportion of common edges . Next, the set of edges is partitioned into where for and set . More details for constructing can be found in the Appendix.
Next, one precision matrix for each is generated following the algorithm in . Let be a matrix with entries:
where . And finally, is rescaled by rows, averaged with its transpose and has its diagonal entries set to one. This process outputs a precision matrix which is guaranteed to be symmetric and diagonally dominant.
Then, one can sample with for from a mean zero multivariate normal distribution. For the covariance matrix we consider two models, and , corresponding to partially separable and non-partially separable , respectively:
Partially separable case: with and decaying factors This guarantees that decreases monotonically on and that satisfies the assumptions of a partially separable multivariate Gaussian process.
Non-partially separable case: A block-banded precision matrix is computed with blocks and with . This ensures that is positive definite with non-zero off-diagonal blocks breaking the partial separability assumptions. Next, one computes a covariance matrix
with as in (i).
Finally, discrete and noisy functional data were generated as
with and generated according to the PSKL expansion in (5), and . Fourier basis functions evaluated on an equally spaced time grid of , with and , were used to generate the data. In all settings, simulations were conducted. To resemble real fMRI data from the HCP, we set , and for a sparse graph.
6.2 Comparison Results
We compare our method psFGGM against the FGGM, which we estimate using the code provided by the authors . As performance metrics, the true positive rate (TPR) and false positive rate (FPR) of correctly identifying edges in graph are computed varying for a fixed value of . The latter parameter takes values in a coarse grid of five evenly spaced points spanning the interval [0,1], and the value maximizing the area under the ROC curve is considered for the comparison. In all cases, a power law distribution parameter and a proportion of common edges were used.
The two methods are compared using principal components explaining at least of the variance. For all simulations, this threshold results in the choice of or components for both FGGM and psFGGM. For higher thresholds requiring larger values of , however, we see a sharp contrast; while psFGGM consistently converges to a solution, FGGM does not, due to increasing numerical instability. The likely reason for the instability of FGGM is due to the fact that it requires estimation of free parameters. On the other hand, psFGGM estimates a precision matrix with free parameters. In this sense, psFGGM estimator effectively regularizes the solution more, leading to numerical stability. As a result, psFGGM is able to include more functional bases into the analysis, thereby incorporating more information from the data. In the figures and tables, additional results are available for psFGGM when is increased to explain at least of the variance.
AUC15 is AUC computed for FPR in the interval [0, 0.15], normalized to have maximum area 1.
Figure 1 shows average ROC curves for the high-dimensional case with . The smoothed curves are computed using the supsmu R package that implements SuperSmoother , a variable bandwidth smoother that uses cross-validation to find the best bandwidth. Table 1 shows the mean and standard deviation of area under the ROC curves estimates under various simulation settings. When the partial separability assumption is satisfied, i.e., , psFGGM exhibits uniformly higher TPR profile across the FPR range. Even when the partial separability assumption is not satisfied, i.e., , psFGGM and FGGM perform comparably. More importantly and in all cases, psFGGM is able to leverage level of variance explained, owing to the numerical stability mentioned above. Figure 2 and Table 2 summarize results for the large sample case with similar conclusions.
A comparison of the two methods under a the very sparse case is also considered. For this case one has with a proportion of common edges . Finally, we check the robustness of our conclusions under other settings including and , as well as greater, equal or smaller than . Results for a very sparse case with are included in the Appendix. All simulation experiments show a comparative advantage of psFGGM.
7 Application to Functional Brain Connectivity
In this section, psFGGM is used to reconstruct the functional brain connectivity structure using functional magnetic resonance imaging (fMRI) data from the Human Connectome Project (HCP). We analyze the ICA-FIX preprocessed data variant as suggested by  that controls for spatial distortions and alignments across both subjects and modalities.
In this work, the goal is to reconstruct the connectivity pattern between 360 cortical regions of interest (ROIs) given in  using psFGGM approach. Each ROI timeseries signal is obtained by averaging all BOLD signals from the ROI.
AUC15 is AUC computed for FPR in the interval [0, 0.15], normalized to have maximum area 1.
For the analysis, we use the motor task fMRI dataset111The 1200 Subjects 3T MR imaging data available at https://db.humanconnectome.org. This dataset consists of fMRI scans of individuals performing basic body movements. During each scan, a sequence of visual cues signals the subject to move one of five body parts: fingers of the left or right hand; toes of the left or right foot; or the tongue. After each three-second cue, a body movement lasts for 12 seconds with temporal resolution of 0.72 seconds.
The left- and right-hand tasks are analyzed separately. Both of them consider the same subjects with complete records and ROIs. Having removed cool down and ramp up observations, we end up with time points of pure movement tasks. For illustration purposes, we use penalty parameters and to estimate very sparse graphs in both cases.
Figure 3 shows comparison of activation patterns from left and right-hand task datasets. We visualize the resulting ROI graph on a flat brain map by coloring those ROIs which have any positive degree of connectivity. Connected ROIs that are unique to each task (Figures 3a and 3b on top) are distinguished from those that are common to both (Figure 3c). One can see that almost all of the visual cortex ROIs in the occipital lobe are shared by both maps. This is expected as both tasks require individuals to watch visual cues. On the other hand, the main difference between these motor tasks lies at the motor cortex near the central sulcus. In Figure 3a and 3b the functional maps for the left- and right-hand tasks present particular motor-related cortical areas in the right and left hemisphere, respectively. These results are in line with the motor task activation maps obtained by .
Partial separability for multivariate functional data is a novel structural assumption with further potential applications beyond graphical models. For example, it is well-known that the functional linear model (see  and references therein) can be simplified by univariate FPCA, which parses out the problem into a sequence of univariate regressions. The PSKL expansion in (5) demonstrates a similar potential, namely to break down a problem involving multivariate functional data into a sequence of standard multivariate problems. This potential was demonstrated in this paper by decomposing a functional graphical model into a sequence of standard multivariate graphical models.
We have presented partial separability for multivariate processes where each component is a function defined on the same domain, motivated by the brain connectivity example. However, this restriction is not strictly necessary in order to define partial separability. If are elements of a more general definition of partial separability would be the existence of orthonormal bases of such that the vectors where are mutually uncorrelated across . Such a generalization is highly desirable, as many multivariate functional data sets consist of functions on different domains. In fact, the above notion is even applicable when the domains are of different dimension  or even a complex manifold, such as the surface of the brain. Questions of uniqueness and optimality of the bases are promising areas of future research.
The psFGGM is equally applicable to dense or sparse functional data, observed with or without noise. However, rates of convergence will inevitably suffer as observations become more sparse or are contaminated with higher levels of noise. To the best of our knowledge, concentration results such as Theorem 3 of this paper or Theorem 1 of  are only known for the case of fully observed functional data. Further investigation into graphical models for functional data may yield interesting insights into regime divisions similar to those found by .
This section contains proofs of the theoretical results in 4 and 5 and one additional auxiliary result. We also give details on the algorithm used in Section 6 to generate a sequence of edge sets (one per basis function) having a given proportion of common edges. Lastly, we provide ROC curves and performance metrics for additional simulation settings.
Proofs of Results from Section 4
Proof of Theorem 2.