Signal Processing on Higher-Order Networks: Livin’ on the Edge … and Beyond
This tutorial paper presents a didactic treatment of the emerging topic of signal processing on higher-order networks. Drawing analogies from discrete and graph signal processing, we introduce the building blocks for processing data on simplicial complexes and hypergraphs, two common abstractions of higher-order networks that can incorporate polyadic relationships. We provide basic introductions to simplicial complexes and hypergraphs, making special emphasis on the concepts needed for processing signals on them. Leveraging these concepts, we discuss Fourier analysis, signal denoising, signal interpolation, node embeddings, and non-linear processing through neural networks in these two representations of polyadic relational structures. In the context of simplicial complexes, we specifically focus on signal processing using the Hodge Laplacian matrix, a multi-relational operator that leverages the special structure of simplicial complexes and generalizes desirable properties of the Laplacian matrix in graph signal processing. For hypergraphs, we present both matrix and tensor representations, and discuss the trade-offs in adopting one or the other. We also highlight limitations and potential research avenues, both to inform practitioners and to motivate the contribution of new researchers to the area.
keywords:simplicial complex, hypergraph, higher-order network, graph signal processing, node embeddings
Graphs provide a powerful abstraction for systems comprised of (dynamically) interacting entities. By encoding these entities as nodes and the interaction between them as edges in a graph, we can model a large range of systems in an elegant, conceptually simple framework. Accordingly, graphs have been used as a modeling tool for a broad range of application areas Newman2018 (); Easley2010 (), including neuroscience Sporns2018 (); medaglia_2017_brain (), urban transportation Derible2011 (), and the social sciences Borgatti2009 (). Many of these applications may be understood in terms of graph signal processing (GSP), which provides a unifying framework for processing graph-based data. Modeling complex interactions with graphs for which attributes on nodes are seen as signals, GSP extends classical signal processing concepts and tools such as the Fourier transform and filtering to data on graphs Sandryhaila2013 (); Shuman2013 (); Ortega2018 ().
To enable computations with graph-based data, we typically encode the graph structure in an adjacency matrix or its associated (normalized or combinatorial) Laplacian matrix. However, rather than considering these matrices simply as a table that records the pairwise coupling between nodes, it is fruitful to think of these matrices as linear operators that map data from the node space back to itself. By analyzing the properties of these maps – e.g., in terms of their spectral properties – we can reveal important aspects both about the graphs themselves as well as signals defined on the nodes. Choosing an appropriate matrix operator associated with the graph structure is thus a key factor in gaining deeper insights about graphs and graph signals. In the context of GSP, we call these maps graph-shift operators, which provide a way to relate data associated with different nodes. Such shift operators are natural generalizations of the classical time delay, and constitute the fundamental building blocks of graph filters and other more sophisticated processing architectures segarra2017optimal (). The rapid advancement of GSP has benefited significantly from the fact that the properties of these matrices and methods that leverage their structure in order to analyze data have been explored extensively in the context of spectral graph theory Chung1997 ().
By construction, graph-based representations do not account for interactions between more than two nodes, even though such multi-way interactions are widespread in complex systems: multiple neurons can fire at the same time giusti2016 (), biochemical reactions usually include more than two proteins klamt2009 (), and people interact in small groups kee2013 (). To account for such polyadic interactions, a number of modeling frameworks have been proposed in the literature to represent higher-order relations, including simplicial complexes Hatcher2002 (), hypergraphs Berge1989 (), and others Frankl1995 (). However, in comparison to this line of work on representing the structure of complex multi-relational systems, the literature on the data processing defined on higher-order networks is comparatively sparse. In this tutorial paper, we focus on the topic of signal processing on simplicial complexes and hypergraphs. Following a high-level didactic style, we concentrate on the algebraic representations of these objects, and discuss how the choice of this algebraic representation can influence the way in which we analyze and model signals associated with higher-order networks.
Similarly to graphs, higher-order interactions can be encoded in terms of matrices or, more generally, tensors. Two of the most prominent abstractions for such polyadic data are simplicial complexes Hatcher2002 () and hypergraphs Berge1989 (), and we respectively encode higher-order relations in terms of incidence matrices or tensors that provide an algebraic description of these two data models. Analogously to the graph case, there can be many different algebraic descriptions of the same higher-order interaction structure. Clearly, the choice of the linear (or multilinear) operator representing higher-order interactions will matter for revealing interesting properties about the data, leading to the key question of how to choose an appropriate abstraction for this kind of data. In comparison to the case of graphs, the analysis of higher-order interaction data is more challenging due to several factors: (i) There exists a combinatorially large number of possible interactions: two-way, three-way, and so on. Hence, very large matrices and tensors are needed to capture all these relations; (ii) The large dimensionality of these representations gives rise to computational and statistical issues on how to efficiently extract information from higher-order data; and (iii) The theory on the structure of higher-order networks is largely unexplored relative to that of graphs. In the following, we will primarily focus on the question of choosing an appropriate algebraic descriptor to implement various signal processing tasks on simplicial complexes and hypergraphs. Specifically, we will consider the modeling assumptions inherent to an abstraction based on simplicial complexes versus hypergraphs, and discuss the relative advantages and disadvantages of a number of associated matrix and tensor descriptions that have been proposed. To make our discussions more concrete we provide a number of illustrative examples to demonstrate how the choice of an algebraic description can directly effect the type of results we can obtain.
Outline. We first briefly recap selected concepts from signal processing and GSP in Section 2. In Section 3, we present tools from algebraic topology and their use in representing higher-order interactions with simplicial complexes. In Section 4, we describe methods to analyze signals defined on simplicial complexes. We then turn our attention to hypergraphs in Section 5, and focus on the modeling of higher-order interactions via hypergraphs. Section 6 then builds on these models and outlines some of the existing methods for signal processing and learning on hypergraphs. Finally, in Section 7, we close with a brief discussion summarizing the main takeaways and laying out directions for future research.
2 Signal processing on graphs: A selective overview
Before discussing signal processing on higher-order networks, we revisit principles from signal processing and GSP and recall some important problem setups, which will later guide our discussion on higher-order signal processing.
2.1 Central tenets of discrete signal processing
In discrete signal processing (DSP), signals are processed by filters. A linear filter is an operator that takes a signal as input and produces a transformed signal as output, such that a filtering operation is represented by a matrix-vector multiplication and defines a linear system. A special role is played by the circular time shift filter , a linear operator that delays the signal by one sample. This so-called shift operator underpins the class of time shift-invariant filters, which is arguably the most important class of linear filters in practice. Specifically, in classical DSP, every linear time shift-invariant filter can be built based on a matrix polynomial of the time-shift oppenheim2009discrete ().
A filter represented by the matrix is shift-invariant if it commutes with the shift operator, i.e., . In particular, this implies that and preserve each others eigenspaces. Since the cyclic shift is a circulant matrix that is diagonalizable by discrete Fourier modes, this implies that the action of any shift-invariant linear filter in DSP can be understood by means of a Fourier transform. In other words, the eigenvectors of the cyclic time-shift operator provide an orthogonal basis for linear time shift-invariant processing of discrete time-signals in a naturally interpretable way by means of Fourier analysis oppenheim2009discrete ().
2.2 Graphs, incidence matrices, and the graph Laplacian
An undirected graph is defined by a set of nodes with cardinality and a set of edges with cardinality composed of unordered pairs of nodes in . Edges can be stored in the symmetric adjacency matrix whose entries are given by if and otherwise. Given the degree matrix , the graph Laplacian associated with is given by . Alternatively to the adjacency matrix , we can collect interactions between the nodes in the graph via the incidence matrix . For each edge we can define an arbitrary orientation, which we denote by where we think of the edge as oriented from tail node to its head node . Based on this orientation, the incidence matrix is defined such that and otherwise. Using this definition we can provide an equivalent expression for the graph Laplacian as . In the remainder of this paper, we choose an edge-orientation induced by the lexicographic ordering of the nodes, i.e., edges will always be oriented such that they point from a node with lower index to a node with higher index. However, we emphasize that this orientation is arbitrary and is distinct from the notion of a directed edge.
2.3 Graph signal processing
GSP generalizes the concepts and tools from DSP to signals defined on the nodes of graphs. A graph signal is a map from the set of nodes to the set of real numbers . This defines an isomorphism between the set of nodes and the set of real-valued vectors of length , so any graph signal may be represented as a vector . An example of a graph signal can be seen in Figure 1A, where the signal values at each node are indicated by the node color. Similarly to DSP, filtering in GSP can be represented by a matrix-vector multiplication operation . The analog of the shift operator in the GSP setting is any operator that captures the relational dependencies between nodes, including the adjacency matrix , the Laplacian matrix , or variations of these operators Shuman2013 (); Ortega2018 ().
As we are considering undirected graphs here, the choice of a shift operator imparts a natural orthogonal basis in which to represent the signal, through which we can express any shift-invariant filter as follows. Given the eigenvalue decomposition of the shift operator and a filtering weight function , a shift-invariant filter can be written as
where we have used the shorthand notation . By analogy to the Fourier basis in DSP, the eigenvectors of the shift operator are thus said to define a graph Fourier transform (GFT), and is called the frequency response of the filter . Specifically, the GFT of a graph signal is given by , while the inverse GFT is given by Ortega2018 (); Sandryhaila2013 ().
As our discussion emphasizes, any filtered signal on an undirected graph can be understood in terms of three steps: (i) project the signal into the graph Fourier domain, i.e., express it in the orthogonal basis (via multiplication with ); (ii) amplify certain modes and attenuate others (via multiplication with ), and (iii) push back the signal to the original node domain (via multiplication with ). The choice of an appropriate shift operator is thus crucial, as its eigenvectors define the basis for any shift-invariant graph filter for undirected graphs. We will encounter this aspect again when considering signal processing on higher-order networks.
In the context of GSP, we focus in the following on the graph Laplacian as shift operator. This choice has certain advantages. First, is positive semidefinite, and thus all the graph frequencies (eigenvalues) are real and non-negative. This enables us to order the GFT basis vectors (eigenvectors) in a natural way. Second, by considering the variational characterization of the eigenvalues of the Laplacian in terms of the Rayleigh quotient , it can be shown that eigenvectors associated with small eigenvalues have indeed a small variation along the edges of the graph (low frequency) and eigenvectors associated with large eigenvalues show a large variation along edges (high frequency). In particular, eigenvectors associated with eigenvalue are constant over connected components, akin to a DC signal component. An illustration of these aspects is given in Figure 1B, which displays the individual basis vectors of the graph Laplacian, and the coefficients with which these basis vectors would have to be weighted to obtain the previously considered graph signal in Figure 1A.
2.4 Graph signal processing: Illustrative problems and applications
Over the last few years, several relevant problems have been addressed using GSP tools including sampling and reconstruction of graph signals marques2016sampling (); chen2015discrete (); anis_2016_sampling (), (blind) deconvolution segarra_2016_blind (); zhu2020estimating (), and network topology inference dong_2016_laplacian (); segarra_2017_topo (); shen_2017_kernel (); mateos_2019_connecting (), to name a few. We now introduce a subset of illustrative problems and application scenarios that we will revisit in the context of higher-order signal processing.
Fourier analysis: Node embeddings and Laplacian eigenmaps
As discussed above, the GFT of a graph signal provides a fundamental tool of GSP. While we are often interested in filtering a signal and representing it in the vertex space, the Fourier representation can also be used to gain insight about specific graph components by considering a frequency domain representation of the indicator vector associated with the vertices of interest. In particular, by considering a truncated Fourier domain representation of the indicator vectors of individual nodes, we can recover a number of spectral node embeddings that have found a broad range of applications (see also heimowitz2017unified () for a related discussion). Specifically, by considering a truncated Fourier domain representation based on the normalized Laplacian as a shift operator, we recover a variant of the so-called Laplacian eigenmap Belkin2003 (), and by additionally incorporating a scaling associated with the eigenvalues, we can recover the diffusion map coifman2006diffusion (); heimowitz2017unified ().
We remark that while most of these spectral node embeddings focus on low frequency eigenvectors, in principle, high frequency components can be of interest as well for embeddings. For instance, if the graph to be analyzed is (almost) bipartite, then the eigenvectors associated with the highest frequencies of the graph Laplacian will (approximately) reveal the two independent node sets in the graph. Other type of (non-linear) node embeddings may also be viewed from a GSP lens, e.g., certain node embeddings derived from graph neural networks (cf. Section 4.4). An extensive discussion is however beyond the scope of this paper. We refer to hamilton2017representation () for a more in-depth discussion of the highly active area of node representation learning on graphs.
Signal smoothing and denoising
A canonical task in GSP is to denoise (smooth out) a noisy signal , where is the true signal we aim to recover and is a vector of zero-mean white Gaussian noise chen2014signal (). A natural assumption is that the signal should be smooth on nearby nodes in terms of the underlying graph, so that neighboring nodes will tend to take on similar values. Following our above discussion, this amounts to assuming that the signal has a low-pass characteristic, i.e., can be well-represented by the low frequency eigenvectors of the Laplacian. One way to formalize the above problem is in terms of the following optimization problem Zhou2004 (); dong_2016_laplacian ()
where is the estimate of the true signal . The coefficient can be interpreted as a regularization parameter that trades-off the smoothness promoted by minimizing the quadratic form and the fit to the observed signal in terms of the squared -norm. The optimal solution for (2) is given by dong_2016_laplacian ()
A different procedure to obtain a signal estimate is the iterative smoothing operation
for a certain fixed number of iterations and a suitably chosen update parameter . This may be interpreted in terms of gradient descent steps of the cost function .
Matching the signal modeling assumption of a smooth signal, the denoising and smoothing operators defined in (3) and (4) are instances of low-pass filters, i.e., filters whose frequency responses are vectors of non-increasing (decreasing) values. In the GSP context, the low-pass filtering operation guarantees that variations over neighboring nodes are smoothed out, in line with the intuition of the optimization problem defined in (2).
Graph signal interpolation
Another common task in GSP is signal interpolation, which can alternatively be interpreted in terms of graph-based semi-supervised learning chen2015discrete (); segarra2016reconstruction (). Suppose that we are given signal values (labels) for a subset of the nodes of a graph. Our goal is to interpolate these assignments and to provide a label to all unlabeled nodes .
As in the signal denoising case, it is natural to adopt a smoothness assumption that posits that well-connected nodes have similar labels Chapelle2002 (). This motivates the following constrained optimization problem Zhu2003 ()
which aims to minimize the sum-of-squares label difference between connected nodes under the constraint that the observed node labels should be accounted for in the optimal solution. Notice that the objective function in (5) can again be written in terms of the quadratic form of the graph Laplacian , highlighting the low-pass modeling assumption inherent in the optimization problem (5).
Graph neural networks
Motivated by spectral interpretations of filters and shift operators in the domain of graph signal processing, graph neural networks bronstein2017geometric (); wu2020comprehensive () have emerged as a popular approach to incorporate nonlinearities in the graph signal processing pipeline for purposes of node embedding cao2016deep (); wang2016structural (); kipf2016variational (), node classification kipf2016semi (); defferrard2016convolutional (), and graph classification defferrard2016convolutional (). Graph neural network architectures combine notions of graph filtering, permutation invariance, and graph Fourier analysis with nonlinear models from the design of neural networks.
One such architecture is the well-known graph convolutional network kipf2016semi (), which resembles the functional form of (4) with interleaved nonlinear, elementwise activation functions, i.e., for a set of input features gathered in the columns of a matrix ,
where we take for some integer as the output, are learnable weight matrices that perform linear transformations in the feature space, is a shift-invariant graph filter based on the adjacency or Laplacian matrix, and is a (possibly nonlinear) activation function applied elementwise. For instance, kipf2016semi () uses a normalized version of the graph Laplacian as a first-order filter , and the ReLU activation function for .
A closer look at (6) reveals a connection with the iterative smoothing method of (4). Taking to be the identity mapping, we see that (6) can be expressed as a linear graph filter independently applied to each of the features, with output defined as linear combinations of these filtered features at each node via the matrices . That is,
where itself represents a shift-invariant graph filter, due to the assumed shift-invariance of . Taking and recovers the iterative smoothing procedure of (4). However, by interleaving nonlinear functions as in (6) and taking linear combinations of features via , we allow the architecture to learn more sophisticated, nonlinear relationships between the nodes and node features by finding optimal weights for a suitable loss function.
There are many variants of the graph neural network architecture, designed for tasks ranging from semi-supervised learning kipf2016semi () to graph classification defferrard2016convolutional (). We refer the reader to the survey paper wu2020comprehensive () for further details, as well as gama2018convolutional () for a view focused on graph signal processing in particular. A relevant variant of graph neural networks are those based on message passing as in hamilton2017inductive (), where the graph determines the local neighborhoods of information aggregation but more general functions than shift-invariant filters can be applied at each layer.
3 Modeling higher-order interactions with simplicial complexes
In this section, we recap some of the mathematical underpinnings of simplicial complexes. We focus in particular on the Hodge Laplacian, which extends the graph Laplacian as a natural shift operator for simplicial complexes. Specifically, we discuss how the eigenvectors of the Hodge Laplacian provide an interpretable orthogonal basis for signals defined on simplicial complexes by means of the Hodge decomposition.
3.1 Background on simplicial complexes
Given a finite set of vertices , a -simplex is a subset of with cardinality . A simplicial complex is a set of simplices such that for any -simplex in , any subset of must also be in . A face of a simplex is a subset of with cardinality . A co-face of a simplex is a -simplex such that is a subset of . More detailed discussions and definitions can be found in Grady2010 (); Lim2015 (); Munkres2000 ().
Figure 2A provides an example of a simplicial complex. Here, simplices of order are depicted as nodes, simplices of order as edges, and simplices of order 2 are displayed as gray, filled triangles. Note how the edges and are faces of the -simplex . The -simplex is a co-face of the edges and .
For computational purposes, we define an orientation for each simplex by fixing an ordering of its vertices. This ordering induces an orientation by increasing vertex label. Based on the reference orientation for each simplex, we introduce a book-keeping of the relationships between -simplices and -simplices via linear maps called boundary operators which enable us to record higher-order interactions in networks. As the simplicial complexes considered here are all of finite order, these boundary operators can be represented by matrices . The rows of are indexed by -simplices and the columns of are indexed by -simplices. For instance, is nothing but the node-to-edge incidence matrix denoted in Section 2, while is the edge-to-triangle incidence matrix.
We adopt the lexicographic order to define the reference orientation of simplices in Figure 2. The corresponding boundary maps and are then given by
We may consider signals defined on any -simplices (nodes, edges, triangles, etc.) of a simplicial complex as illustrated in Figure 2B-D. Just like in the case of graph signals, we need to establish an appropriate shift operator to process such signals. While there are many possible choices, we will show in the next section that a natural choice is the co-called Hodge Laplacian, which is a generalization of the graph Laplacian with strong roots in algebraic topology.
3.2 The Hodge Laplacian as a shift operator for simplicial complexes
Based on the incidence matrices defined above, we can define a sequence of so-called Hodge Laplacians Lim2015 (). Specifically, the -th combinatorial Hodge Laplacian, originally in Eckmann1944 (), is given by Eckmann1944 (); Lim2015 ():
Notice that, according to this definition, the graph Laplacian corresponds to with . More generally, by equipping all spaces with an inner product induced by positive diagonal matrices, we can define a weighted version of the Hodge Laplacian (see, e.g., Grady2010 (); Lim2015 (); Schaub2018 (); BensonE11221 ()). This weighted Hodge Laplacian encapsulates operators such as the random walk graph Laplacian or the normalized graph Laplacian as special cases. For simplicity, in this paper we concentrate on the unweighted case.
Just like the graph Laplacian provides a useful choice for a shift operator for node signals defined on a graph due to its (spectral) properties, the Hodge Laplacian and its weighted variants provide a natural shift operator for signals defined on the edges of a simplicial complex (or graph). As the edges in our simplicial complexes are equipped with a chosen reference orientation, the Hodge Laplacian is in particular relevant as shift operator if the signals considered are indeed oriented, e.g., correspond to some kind of edge-flow in the case of a signal on edges.
Similar to the graph Laplacian, the Hodge Laplacian is positive semi-definite, which ensures that we can interpret its eigenvalues in terms of non-negative frequencies. Moreover, these frequencies are again commensurate with a notion of signal-smoothness that is captured by the eigenvectors of the Hodge Laplacian. In the case of signals on general -simplices, this notion of smoothness can be understood by means of the so called Hodge decomposition Lim2015 (); Grady2010 (); Schaub2018 (), which states that the space of -simplex signals can be decomposed into three orthogonal subspaces
where and are shorthand for the image and kernel spaces of the respective matrices, represents the union of orthogonal subspaces, and is the cardinality of the space of signals on -simplices (i.e., for the node signals, and for edge signals). Here we have (i) made use of the fact that a signal on a finite dimensional set of simplices is isomorphic to ; and (ii) implicitly assumed that we are only interested in real-valued signals and thus a Hodge decomposition for a real valued vector space (see Lim2015 () for a more detailed discussion).
To facilitate the discussion on how the Hodge decomposition (9) can be related to a notion of smooth signals let us consider the concrete case with Hodge Laplacian for illustration Schaub2018 (); Schaub2018a (); Jia2019 (). In this case, we can provide the following meaning to the three subspaces considered in (9). First, the space can be considered as the space of gradient flows (or potential flows). Specifically, since we may create any such flow according to the following recipe: (i) assign a scalar potential to all the nodes; (ii) induce a flow along the edges by considering the difference of the potentials on the respective endpoints. Clearly, any such flow cannot be cyclic and, accordingly, the space being orthogonal to is the so-called cycle space. As indicated, the cycle space is spanned by two types of cyclic flows. The space consists of curl flows and its elements are flows that can be composed of local combinations of local circulations along any -simplex. Specifically, we may assign a scalar potential to each -simplex as well and consider the induced flows , where is the vector of -simplex potentials. Note that every column of creates a triangular circulation around the respective -simplex along its chosen reference orientation. Hence, these flows correspond to local cycles associated with the -simplices present in the simplicial complex. Finally is the harmonic space, whose elements correspond to (global) circulations that are not representable as a linear combination of curl flows.
Let be the Hodge 1-Laplacian of a simplicial complex. Then the eigenvectors associated with nonzero eigenvalues of comprise two groups that span the gradient space and the curl space respectively.
Consider any eigenvector of the graph Laplacian associated with a nonzero eigenvalue . Then is an eigenvector of with the same eigenvalue . Moreover spans the space of all gradient flows.
Consider any eigenvector of the “2-simplex coupling matrix” associated with a nonzero eigenvalue . Then is an eigenvector of with the same eigenvalue . Moreover spans the space of all curl flows.
The above result shows that, unlike in the case of node signals, edge-flow signals can have a high frequency contribution due to two different types of (orthogonal) basis components being present in the signal: a high frequency may arise both due to a curl component as well as a strong gradient component present in the edge-flow. This has certain consequences for the filtering of edge signals that we will discuss in more detail in the following section.
4 Signal processing and learning on simplicial complexes
Using the algebraic framework of simplicial complexes as discussed in Section 3, in this section we revisit the four signal processing setups considered in Section 2.4—Fourier analysis and embeddings, smoothing and denoising, signal interpolation, and nonlinear (graph) neural networks—and discuss how these can be extended to simplicial complexes by means of the Hodge Laplacian and associated boundary maps. For concreteness, we concentrate primarily on the case of edge signals, though the results presented here can be extended to signals on any type of simplices.
4.1 Fourier analysis: Edge-flow and trajectory embeddings
In the same way that the (normalized) graph Laplacian provides a node embedding of the graph, the eigenvectors of the Hodge Laplacian can be used to induce a low-frequency edge embedding. As a concrete example, let us consider the harmonic embedding, i.e., the projection of an edge signal into the harmonic subspace, corresponding to signal with zero frequency
where corresponds to eigenvectors of the Hodge Laplacian associated with zero eigenvalues. As explained in Section 3, the harmonic space spanned by the vectors corresponds to (globally) cyclic flows that cannot be composed from locally cyclic flows (curl flows). Analogously to the embedding of nodes via indicator signals projected onto the low frequency eigenvectors of the graph Laplacian, we can construct embeddings of individual edges using (10). Unlike in the case of graphs where such node embeddings can indicate a clustering of the nodes Luxburg2007 (), an edge embedding into the harmonic subspace characterizes the position of an edge relative to the harmonic flows. Since the harmonic flows are in one-to-one correspondence with the -homology of the simplicial complex, i.e., the “holes” in the complex that are not filled with faces, such an embedding may be used to identify edges whose location is in accordance with particular harmonic cycles Schaub2018 (); ebli2019notion (). However, as the edges are equipped with an arbitrary reference orientation, the sign of the projection into the harmonic space is arbitrary. To account for this fact, subspace clustering can be used to group edges according to their contributions to harmonic cycles (-homology) ebli2019notion ().
Rather than aiming at grouping edges together into clusters according to their relative position with respect to the -homology ebli2019notion (), we may be interested in grouping sequences of edges corresponding to trajectories on a simplicial complex by projecting appropriate signal indicator vectors of such trajectories into the harmonic space Schaub2018 (). Here we represent a trajectory by a vector with entries if the edge is part of the trajectory and traversed along the chosen reference orientation, if the edge is part of the trajectory and traversed opposite to the chosen reference orientation, and otherwise.
In Figure 4A, we construct a simplicial complex by drawing random points in the unit square and generating a triangular lattice by Delaunay triangulation. We eliminate two points and all their adjacent edges in order to create two “holes” in the simplicial complex, which are not covered by a -simplex. These two holes are represented by orange shaded areas and can be interpreted as obstacles through which trajectories cannot pass. All other triangles are defined to be 2-simplices. Accordingly, the Hodge Laplacian has two zero eigenvalues associated to two harmonic functions and . On top of the simplicial complex, we define a set of five trajectories as displayed in Figure 4A. Figure 4B then shows the embedding of the vector for each trajectory and its evolution in the embedding space. Importantly, the embedding differentiates the topological properties of the trajectories. The magenta and olive green trajectories have a similar embedding since they both pass above the top left obstacle. The maroon and green trajectories pass between the two obstacle and have a similar embedding (negative coordinate along and zero component along ). The orange trajectory is the only one that goes through the right of the bottom right obstacle. Hence, its embedding stands out from the other four trajectories in the embedding space. For a more extensive discussion of these aspects see Schaub2018 ().
As we have seen in the above example, trajectories that behave similarly with respect to the -homology (“holes”) of a simplicial complex will have a similar embedding Schaub2018 (). One may thus, for instance, also identify topologically similar trajectories on the simplicial complex by clustering the resulting points in the harmonic embedding. Such an approach may also be of interest for a number of applications: One can construct simplicial complexes and appropriate trajectory embedding from a variety of flow data, including physical flows such as buoys drifting in the ocean Schaub2018 (), or “virtual” flows such as click streams or flows of goods and money. Related ideas for analyzing trajectories have also been considered in the context of traffic prediction ghosh2018topological ().
While we have considered here only harmonic embeddings corresponding to signals with zero frequency, other type of embeddings may be of interest as well. We may, for instance, be interested in gradient-flow-based embeddings, which can be used to define a form of ranking of the nodes in terms of the associated potentials Jiang2011 (), or be interested in other forms of flows, which are only approximately harmonic Jia2019 ().
4.2 Flow smoothing and denoising
We now revisit the question of smoothing and denoising from the perspective of signals defined in the edge space of a simplicial complex . In doing so, we provide more in-depth discussion on the basis vectors and notion of a smooth signal encapsulated in the Hodge 1-Laplacian and how it differs from the graph Laplacian.
Let us assume that the simplicial complex is associated with oriented flows defined on edges. Like in the node-based setup discussed in Section 2.4.2, we assume that we can only observe a noisy version of the true underlying signal, where is again a zero-mean white Gaussian noise vector of appropriate dimension. By analogy with the graph case, in order to get a smooth estimate of the true signal from the noisy signal , it is tempting to adopt the successful procedures from GSP (cf. equation (2)) and solve the following optimization program for the edge-flows
with optimal solution , where the matrix is a regularizer that needs to be chosen to ensure a smooth estimate. Following our discussion above, since the filter will inherit the eigenvectors of the regularizer , a natural choice for a regularizer is an appropriate (simplicial) shift operator.
We discuss three possible choices for the regularizer (shift operator) : (i) the graph Laplacian of the line-graph of the underlying graph skeleton of the complex , i.e., the line-graph of the graph induced by the -simplices (nodes) and -simplices (edges) of ; (ii) the edge Laplacian , i.e., a form of the Hodge Laplacian that ignores all -simplices in the complex such that ; (iii) the Hodge Laplacian that takes into account all the triangles of as well. Before embarking on this discussion, however, let us illustrate the effects of these choices by means of the following concrete example.
Figure 5A displays a conservative (cyclic) flow on a simplicial complex, i.e., all of the flow entering a particular node exits the node again. This flow is then distorted by a Gaussian noise vector in Figure 5B. The estimation error produced by the filter based on the line-graph (Figure 5C) is comparatively worse (9.45 vs. 1.94 and 0.99 respectively) than the estimation performance of the edge Laplacian (Figure 5D) and the Hodge Laplacian (Figure 5E) filters.
Let us explain the results obtained from the individual filters in the above example in more detail, starting with the line-graph approach. As can be seen from Figure 5C, in this case the filtering operation leads to an increased error compared to the noisy input signal. This ineffective filtering result by means of the line-graph Laplacian has been observed in Schaub2018a (). The reason for this unintended behavior is that the line-graph Laplacian is not well-suited as a shift operator for flow signals. The basis functions induced by the eigenvectors of the line-graph Laplacians are commensurate with a notion of smooth, low frequency signals that supposes that signals on adjacent edges in the simplicial complex have a small difference. This is equivalent to the notion that low-frequency modes in the node space do not vary a lot on tightly connected nodes on a graph. However, for flow signals this notion of smoothness induced by the set of eigenvectors of the Laplacian of the line-graph as shift operator is often not appropriate. Specifically, real-world flow signals typically display a large degree of flow conservation: most of the flow signal entering a node exits the node again, but the relative allocation of the flow to the edges does not have to be similar. Moreover, the line-graph Laplacian does not reflect the arbitrary orientation of the edges, so that performance is dependent on the chosen sign of the flow.
Unlike the line-graph Laplacian, the Edge Laplacian captures a notion of flow conservation, which implies that smooth flows should by cyclic Schaub2018a (). To see this, it is insightful to inspect the quadratic regularizer induced by . Note that this quadratic form can be written as . This is precisely the (summed) squared divergence of the flow signal , as each entry corresponds to the difference of the inflow and outflow to node
where is the flow on edge , and we have used a reference orientation induced by the lexicographic order. As a consequence, all cyclic flows will induce zero cost for the regularizer , which may also be seen from the fact that is precisely the cycle space of a graph with incidence matrix . Stated differently, any flow that is not divergence free, i.e., not cyclic, will be penalized by the quadratic form. Since by the fundamental theorem of linear algebra , any such non-cyclic flows can be written as a gradient flow for some vector of scalar node potentials — in line with the Hodge decomposition discussed in (9).
In contrast to the Edge Laplacian, the full Hodge Laplacian includes the additional regularization term , which may induce a non-zero cost even for certain cyclic flows. More precisely, as any cyclic flow can be written as a curl flow , for some vector , any curl flow will have a non-zero penalty. This penalty is incurred despite the fact that is a cyclic flow by construction (since , the vector is clearly in the cycle space; see also discussion in Section 3.2). The additional regularization term may thus be interpreted as squared curl penalty.
From a signal processing perspective, the based filter thus allows for a more refined notion of a smooth signal. Unlike in the Edge Laplacian filter, we do not declare all cyclic flows to be maximally smooth and consist only of frequency (eigenvalue) basis signals. Instead a signal can have a high-frequency even if it is cyclic, if it has a high curl component. Hence, by constructing simplicial complexes with appropriate (triangular) -simplices, we have additional modeling flexibility for shaping the frequency response of an edge-flow filter. In our example above, this more refined notion of a smooth signal is precisely what leads to an improvement in the filtering performance, since the ground truth signal is a harmonic function with respect to the simplicial complex and thus does not contain any curl components. We remark that the eigenvector basis of can always be chosen to be identical to the eigenvectors of ; thus, we may represent any signal in exactly the same way in a basis of or ; however, the frequencies associated with all cyclic vectors will be for the Edge Laplacian, while there will be cyclic flows with nonzero frequencies for , in general. This emphasizes that the construction of faces is an important modeling choice for the selection of an appropriate notion of a smooth signal.
4.3 Interpolation and semi-supervised learning
Let us now focus on the interpolation problem for edge-data on a simplicial complex Jia2019 (). Analogously to the case of node signals, we are given a simplicial complex (or its graph skeleton) and a set of “labeled” oriented edges , i.e., we assume that we have measured the edge-signals on some edges but not on all. Our goal is now to predict the signals on the unlabeled or unmeasured edges in the set , whose cardinality we will denote by . Following Jia2019 (), we will again start by considering the problem setup with no -simplices first (), before we consider the general case in which -simplices are present.
To arrive at a well-defined problem for imputing the remaining edge-flows, we need to make an assumption about the structure of the true signal. Following our above discussions, we will again assume that the true signal has a low-pass characteristic in the sense of the Hodge -Laplacian, i.e., that the edge flows are mostly conserved. Let denote the vector of the true (partly measured) edge-flow. As discussed in the context of flow smoothing, a convenient loss function to promote flow conservation is the sum-of-squares vertex divergence
We can then formalize the flow interpolation problem via the following optimization program
Note that, in contrast to the node signal interpolation problem, we have to add an additional regularization term to guarantee the uniqueness of the optimal solution. The reason is that, if there is more than one independent cycle in the network for which we have no measurement available, we may add any cyclic flow on such a cycle while not changing the cost function. To remedy this aspect, we simply add a -norm regularization which promotes small edge-flow magnitudes by default. Other regularization terms are possible as well, however this formulation enables us to rewrite the above problem in a least squares form as described below.
To arrive at a least-squares formulation, we consider a trivial feasible solution for (14) that satisfies if and otherwise. Let us now define the expansion operator as the linear map from to such that the true flow can be written as , where is the vector of the unmeasured true edge-flows. Reducing the number of variables considered in this way, we can convert the constrained optimization problem (14) into the following equivalent unconstrained least-squares estimation problem for the unmeasured edges :
We illustrate the above procedure by the following example.
We consider the network structure in Figure 2A. The ground truth signal is . We pick five labeled edges at random (colored in Figure 6A). The goal is to predict the labels of the unlabeled edges (in grey with a question mark in Figure 6A). The set of labeled edges is . The set of unlabeled edges is . Solving the optimization program (15), we obtain the predicted signal in Figure 6B. Numerical values are given in Figure 6C. The Pearson correlation coefficient between and is 0.99. The -norm of the error is 0.064.
Analogously to our discussion above, it may be relevant to include -simplices for the signal interpolation problem. We interpret such an inclusion of -simplices in two ways. From the point of view of the cost function, it implies that instead of penalizing primarily gradient flows (which have nonzero divergence), we in addition penalize certain cyclic flows, namely those that have a nonzero curl component. From a signal processing point of view, it means that we are changing what we consider a smooth (low-pass) signal, by adjusting the frequency representation of certain flows. Accordingly, one possible formulation of the signal interpolation problem, including information about simplices is
subject to the constraint that the components of corresponding to measured flows are identical to those measurements. As in (15), we can convert this program into the following least-squares problem
Note that the problem of flow interpolation is tightly coupled to the issue of signal reconstruction from sampled measurements. Indeed, if we knew that the edge signal to be recovered was exactly bandlimited Barbarossa2020 (), then we could reconstruct the edge-signal if we had chosen the edges to be sampled appropriately. Just like the interpolation problem considered here may be seen as a semi-supervised learning problem for edge labels, finding and choosing such optimal edges to be sampled may be seen as an active learning problem in the context of machine learning . While we do not expand further in this tutorial on the choice of edges to be sampled, we point the interested reader to two heuristic active learning algorithms for edge flows presented in Jia2019 (). We also refer the reader to Barbarossa2018 (); Barbarossa2020 () for a theory of sampling and reconstruction of bandlimited signals on simplicial complexes, and to barbarossa2020spmag () for a similar overview that includes an approach for topology inference based on signals supported on simplicial complexes.
4.4 Beyond linear filters: Simplicial neural networks and Hodge theory
As discussed in Section 2.4.4, graph neural networks incorporate nonlinear activation functions in the graph signal processing pipeline in order to learn rich representations for graphs. In order to generalize these architectures to operate on simplicial complexes, we discuss central concepts underpinning graph neural network architectures in order to understand desirable properties of neural networks for higher-order data. Graph neural networks in the nodal domain typically have two important features in common:
- Permutation equivariance.
Although the nodes are given labels and an ordering for notational convenience, graph neural networks are not at all dependent on the chosen labeling of the nodes. That is, if the node labels were permuted in some way, the output of the graph neural network, modulo said permutation, will not change.
Graph neural networks in their most basic form operate locally in the graph structure. Typically, at each layer a node’s representation is affected only by its own state and the state of its immediate neighbors. Forcing operations to occur locally is how the underlying graph structure is used to regularize the functional form of the graph neural network.
Based on these two principles, many architectures have been proposed, such as the popular graph convolutional network kipf2016semi (), which mixes one-step graph filters and nodewise nonlinearities for semi-supervised learning on the nodes of a graph. Indeed, there has been significant study in understanding the nature of graph convolutional architectures in terms of the spectral properties of the chosen shift or filter operation gama2020stability ().
Simplicial Graph Neural Networks
Motivated by work on graph neural networks in the node space, and the effectiveness of the Hodge Laplacian for representing certain types of data supported on simplicial complexes as in Section 3, we now discuss considerations and limitations for building graph neural network architectures based on representations grounded in combinatorial Hodge theory. As before, let be a simplicial complex over a finite set of vertices , with boundary operators , where is the order of .
We consider architectures built on the composition of matrix multiplication with boundary operators and/or Hodge Laplacians of varying order, aggregation functions, and nonlinear activation functions that obey permutation invariance, locality, and the additional properties of orientation invariance and simplicial locality.
We begin by defining orientation equivariance, which describes a similar property to permutation invariance for graph neural networks Roddenberry2019 ().
- Orientation equivariance.
If the chosen arbitrary reference orientation of the simplices in is changed, the output of the neural network architecture remains the same, modulo said change in orientation.
Due to the arbitrary nature of the simplex orientations, orientation invariance is clearly a desirable property for a neural network architecture to have. For a simple class of convolutional neural networks for flows, we must choose the nonlinear activation function carefully in order to satisfy this property. If one were to construct a simple architecture with weight matrices for flows on a simplicial complex based on of the form
we want to not change when a different orientation is chosen. Let be a matrix taking values on the diagonal and zeros elsewhere, representing a change in orientation for each edge. Then, for a flow and Hodge Laplacian , this change in orientation is realized by and . Therefore, for orientation equivariance to hold, we need
to hold for all flows . For this to be true, must be an odd function so that it commutes with . A natural extension to the notion of orientation equivariance is orientation invariance, which rewrites (19) as
This property has greater utility for tasks such as graph classification, where a global descriptor is desired, rather than output on each simplex.
Another consideration that does not typically arise in the design of graph neural networks is data supported on different levels of the graph. Data on a simplicial complex can lie on, e.g., nodes, edges, and faces simultaneously, motivating the need for architectures that pass data along the many levels of a simplicial complex. Analogous to the property of locality for graph neural networks, we consider a notion of locality for different levels of a simplicial complex.
- Simplicial locality.
At each layer of an architecture with simplicial locality, information exchange only occurs between adjacent levels of the underlying simplicial complex, i.e., the output of a layer restricted to simplices is dependent only on the input of that layer restricted to simplices.
As an illustrative example, consider a small two-layer neural network simultaneously operating over a simplicial complex of nodes, edges, and triangles. That is, the input to the neural network is a tuple of signals on the vertices (graph signals), edges (flows), and triangles, respectively, and each layer performs the following computation:
for some odd elementwise activation function . That is, at each layer, signals on each level of the simplicial complex are either lifted to the next highest level via the coboundary operator, projected to its boundary using the boundary operator, or diffused via the Hodge Laplacian. This “lifting” and “projecting” can only occur between adjacent levels of the simplicial complex, due to the fact that the composition of boundary operators is null, thereby satisfying simplicial locality.
We now examine the tuple of signals . First, suppose is the identity mapping, so that each signal in is a linear function of . Then, one can check that
Notice that each signal is strictly a function of the signals above and below it, even after multiple layers of the architecture are evaluated. This indicates that our architecture is incapable of incorporating information from nonadjacent levels of the simplicial complex, due to the composition of boundary operators being null: note that similar properties hold for linear variants of this example making use of boundary operators in this way.
This is not the case, though, when is nonlinear. While may hold for all signals on the faces, , in general. By incorporating nonlinear activation functions, we facilitate full incorporation of signals from all levels of the simplicial complex in the output at each level. We call this property extended simplicial locality.
- Extended simplicial locality.
For an architecture with extended simplicial locality, the output restricted to simplices is dependent on the input restricted to simplices at all levels, not just those of order .
Notice that while simplicial locality is defined for each layer of an architecture, extended simplicial locality is a global property, so that both are simultaneously attainable. There is a trade-off in achieving extended simplicial locality by interleaving nonlinearities: although there is full influence of the entire simplicial structure on all levels of the output, the structure endowed by the boundary operators (namely, the composition of boundary operators being null) is no longer in effect. This motivates further considerations of how nonlinearities may be necessary in modeling higher-order data, such as in the work of Neuhauser:2020 (); neuhauser2020multibody (), where it is shown that higher-order opinion dynamics must be nonlinear, lest they be equivalently modeled by a purely pairwise system. That is, we must relax the structure of simplicial complexes in order to represent more general high-order interactions. In doing this, we exchange the connection to algebraic topology for greater flexibility in modeling. This naturally leads to the consideration of hypergraphs and associated signal processing ideas, as discussed in the next section.
5 Modeling higher-order interactions via hypergraphs
In this section, we discuss hypergraphs as an alternative to simplicial complexes to model higher-order analogs of graphs, and then discuss how we can construct appropriate matrix-based and tensor-based shift operators for such hypergraphs to enable the development of signal processing tools.
An important structural feature of simplicial complexes is that for every simplex present all of its faces are also included in the complex (and recursively the corresponding faces, and so on). This inclusion property gives rise to the hierarchy of boundary operators, which anchor simplicial complexes to algebraic topology. However, this subset inclusion property may be an undesirable restriction, if we want to represent interactions that are exclusive to multiple nodes and do not imply the interaction between all the subsets of nodes. A related problem is the issue of (extended) simplicial locality as discussed in the previous section, which arises from the restrictions imposed on the boundary operators of simplicial complexes. Finally, while simplices are endowed with a reference orientation and may be weighted, we might be interested in encoding other types of directionality or heterogeneous weighting schemes of group interactions, which are not easily compatible with the mathematical structure of simplicial complexes.
To illustrate the utility of hypergraphs as modelling tools, let us consider a number of concrete examples in which a hypergraph model may be preferred over a simplicial complex, before providing a more mathematical definition.
In a co-authorship network, having a paper with three or more authors does not imply that these people have written papers in pairs. Hypergraphs can distinguish these two cases while graphs and simplicial complexes cannot, in general. Moreover, the relative contribution of the authors to a paper may be different and we thus may want to have a representation that enables us to assign heterogeneous weights within group interactions. This again can be done using hypergraphs. An email network may be described using a directed hypergraph, whenever there exist emails containing multiple senders or multiple receivers. This kind of directional information will be difficult to encode in a simplicial complex (while graphs can encode the directionality here, they lose the higher-order information). Further examples in which hypergraphs appear naturally include word-document networks in text mining, gene-disease networks in bioinformatics, and consumer-product networks in e-commerce.
Mathematically, a typical hypergraph consists of a set of vertices , a set of hyperedges containing subsets of the vertex set , and a function that assigns positive weights to hyperedges. In the most common case, where there is one type of node and one type of edge, a hypergraph is called homogeneous. A hypergraph is called -uniform if all of its hyperedges have the same cardinality . Notice, in particular, that a hypergraph is a bona fide generalization of a graph, since a -uniform homogeneous hypergraph reduces to a graph. More interestingly, a simplicial complex may be seen as a hypergraph satisfying the property that every subset of a hyperedge is also a hyperedge. Similar to a standard graph, a hypergraph can also be directed in which case each (directed) hyperedge is an ordered pair where and are two disjoint subsets of vertices respectively called the tail and the head of directed_hypergraph (). This flexibility is of interest, e.g., when modelling multiway communication patterns as illustrated in the email example above.
While the standard framework of hypergraphs is already very flexible, in recent years several more elaborate hypergraph models have been proposed to better represent real-world datasets:
Heterogeneous hypergraphs refer to hypergraphs containing different types of vertices and/or different types of hyperedges hg1 (); hg2 (); hg3 (); hg4 () and may thus be seen as a generalization of multilayer and multiplex networks. For example, in a GPS network gps (), a hyperedge can have three types of vertices (user, location, activity).
Edge-dependent vertex weights are introduced into hypergraphs in inh1 (); inh2 () to reflect the different contribution (e.g., importance or influence) of vertices in the same hyperedge. More precisely, for each hyperedge , a function is defined to assign positive weights to vertices in this hyperedge. For instance, in the co-authorship network in Example 8, the different levels of contribution of the authors of a paper can be encoded as edge-dependent vertex weights. If for every vertex and every pair of hyperedges and containing , then we say that the vertex weights are edge-independent. Such hypergraphs are also called vertex-weighted hypergraphs vertex_weighted (). Moreover, if for all vertices and incident hyperedges , the vertex weights are trivial and we recover the homogeneous hypergraph model.
In order to leverage the fact that different subsets of vertices in one hyperedge may have different structural importance, the concept of an inhomogeneous hyperedge is proposed in panli1 (). Each inhomogeneous hyperedge is associated with a function that assigns non-negative costs to different cuts of the hyperedge, where denotes the power set of . The weight indicates the cost of partitioning the hyperedge into two subsets and . This is called a submodular hypergraph when satisfies certain requirements panli2 ().
Similar to the case of graphs and simplicial complexes, a key factor for developing signal processing tools for hypergraphs is the definition of an appropriate shift operator. In the case of simplicial complexes, we argued that the Hodge Laplacian is a natural and principled operator for this purpose. For hypergraphs there are two major approaches to their mathematical representation, which induce different kinds of shift operators.
The first option is to used a matrix-based representation and derive a shift operator from it, akin to the approach of GSP. As any matrix may be interpreted as an adjacency matrix of a graph and thus induces a weighted, directed graph, this procedure may be understood as first deriving a graph-based representation of the hypergraph and then using an algebraic representation of this graph (e.g., adjacency or Laplacian matrices) as the algebraic shift operator of the hypergraph.
The second option is to represent the hypergraph using a tensor, i.e., a multi-dimensional array representation instead of the 2-dimensional array representation provided by matrices (we refer to kolda2009tensor (); cichocki2015tensor (); sidiropoulos2017tensor () for a general introduction to tensors and tensor decompositions). While this provides, in principle, a richer set of possible representations of the shift operator, there are also challenges associated with this procedure as the definition of a hypergraph signal and its processing is less grounded in GSP and related techniques. In the following subsections, we respectively discuss these two choices of representations, starting with matrix-based representations.
5.1 Matrix-based hypergraph representations
The most common approach to deal with hypergraph-structured data is to encode the hypergraph as a matrix. When interpreting the corresponding matrices as graphs, many of these matrix-based approaches can thus, alternatively, be viewed as deriving a graph representation for the hypergraph. Accordingly, these approaches are often described in terms of graph expansions. We prefer the term matrix representation here, as the fact that we encode a particular data structure via a matrix does not imply that the data structure is itself a graph (possibly with weights and signed edges). For instance, we studied matrix-based representations of simplicial complexes in the previous sections, but this would typically not be considered a graph expansion of a simplicial complex.
Let us now discuss some of the most common matrix-based hypergraph representations and transformations (see Figure 7 for a visual overview of the discussed variants), including the so-called clique and star expansions as the most popular variants Agarwal2006 (). To this end, consider a homogeneous hypergraph and define the node-to-hyperedge incidence matrix as with entries if node belongs to hyperedge . In addition, we will represent the weights of the hyperedges by the diagonal matrix , whose diagonal corresponds to the hyperedge weights.
Let us first consider the so called star-graph expansion (Figure 7D). Using the above defined matrices, the star-graph expansion can be explained by constructing the following adjacency matrix of a bipartite graph
When interpreted in terms of a graph, this construction may be explained as follows: We introduce a new vertex for each hyperedge and each of these vertices is then connected with a weight corresponding to the weight of the hyperedge to all the (original) vertices in this hyperedge. The constructed weighted graph , thus has a vertex set , an edge set , and an edge weight function . Many other weight functions are possible here as well, e.g., we may normalize by the cardinality of the hyperedges. By constructing appropriate Laplacian operators (combinatorial or normalized) of such a star expansion matrix, we can thus obtain a shift-operator for the hypergraph in a straightforward fashion.
An alternative matrix-based representation that can be derived from the same matrices defined above is the clique expansion (Figure 7C). In matrix terms, this corresponds to projecting out the hyperedge dimension of the incidence matrix . Specifically, if we assume unit hyperedge weights for simplicity, the clique expansion may be computed by forming the product . As this matrix has a nonzero diagonal, we can simply set the diagonal of this matrix to zero to obtain a basic clique expansion matrix . By including various weighting factors, alternative variants of this matrix can be derived. The name clique expansion becomes intuitive if we again interpret as the adjacency matrix of a graph: The above construction corresponds to replacing every hyperedge with a clique subgraph. More precisely, the clique expansion leads to the adjacency matrix of a graph in which , . One of the most common definitions for the edge weighting function in this context is , i.e., the edge weight in the graph is simply given by the sum of the weights of hyperedges that contain the two endpoints. However, many other weighting schemes are conceivable.
As has been shown in Agarwal2006 (), many hypergraph learning algorithms zhou2006learning (); bolla1993spectra (); rodri2002laplacian (); rodriguez2003laplacian (); gibson2000clustering (); sole1996spectra () correspond to either the clique or star expansions with an appropriate weighting function. However, apart from these common expansions, there also exist other methods for projecting hypergraphs to graphs such as constructing a line graph bandyopadhyay2020line (). This line-graph expansion for hypergraphs (see Figure 7E for an illustration) may be computed in terms of (weighted variants of) the second possible projection of the incidence matrix , namely . Apart from these three canonical types of graph representations (star, clique, and line graph) that can be derived from the incidence matrix and additional (weighting) transformations, a few other matrix-based schemes have been proposed for representing hypergraphs. For instance, the recent paper yang2020hypergraph () proposes the so-called line expansion of a hypergraph (different from the line graph; see Figure 7F), which is isomorphic to the line graph of its star expansion and aims to unify the clique and star expansions. In the line expansion, each incident vertex-hyperedge pair is considered as a “line node” and two “line nodes” are connected if they share either the vertex or the hyperedge. Apart from deriving standard (normalized and combinatorial) Laplacian matrices based on these various matrix representations, other types of (nonlinear) Laplacians have been considered and generalized to hypergraphs such as -Laplacians saito2017hypergraph (); ma2018hypergraph (); panli2 (). We would like to remark that in some cases we might be more interested in the dual of one hypergraph in which the roles of vertices and hyperedges are interchanged and the incidence matrix is ; see Figure 7B.
While we have so far considered only homogeneous hypergraphs, Laplacian matrices have also been proposed for more general hypergraph models. For instance, inh1 (); inh2 (); panli1 () use variants of the clique expansion to derive matrix representations of hypergraphs with edge-dependent vertex weights or inhomogeneous hyperedges. Specifically, in inh1 (); inh2 () hypergraphs with edge-dependent vertex weights are projected onto asymmetric matrices, corresponding to induced directed graphs with self-loops. The authors then use established combinatorial and normalized Laplacians for digraphs chung2005laplacians () applied to these matrices to derive a Laplacian matrix for hypergraphs. Finally, in panli1 (), a novel algorithm for assigning edge weights to the graph representation is proposed, allowing for non-uniform expansions of hyperedges.
As the above discussion shows, there is an enormous variety of matrix-based representations for hypergraphs, and the relative advantages and disadvantages of these constructions are still sparsely understood. Ultimately, the choice of a particular matrix representation corresponds to a specific model for what constitutes a smooth signal on a hypergraph. We believe that a better understanding of the spectral properties of the individual constructions will thus be an important step for choosing good matrix representations for different application scenarios.
5.2 Tensor-based hypergraph representations
Instead of working with matrix-based representations, hypergraphs can alternatively be represented by tensors. A tensor is simply a multi-dimensional array, whose order is the number of indices needed to label an element in the tensor kolda2009tensor (). For instance, a vector and a matrix are a first-order and a second-order tensor, respectively. Several different versions of a hypergraph adjacency tensor have been proposed in existing work cooper2012spectra (); michoel2012alignment (); pearson2015spectral (); ghoshdastidar2017uniform (); hu2015laplacian (); banerjee2017spectra (); ouvrard2017adjacency (); benson2015tensor (); benson2019three (); zhang2019introducing (). In this section, we focus on unweighted hypergraphs to keep our exposition accessible and to remain consistent with the majority of the existing work in this domain.
Due to their relative simplicity, -uniform hypergraphs have been first studied in the literature. As every hyperedge is of the same order, a -uniform hypergraph with nodes can be naturally represented by a th-order adjacency tensor , where each index ranges from to , and the entries of are defined as follows michoel2012alignment (); ghoshdastidar2017uniform ()
Every other entry in is set to zero. Similarly to how it can be meaningful to normalize the adjacency matrix, normalized versions of this adjacency tensor have been proposed as well. In cooper2012spectra (), the tensor in (28) is normalized by . This normalization guarantees that the degree of a vertex , i.e., the number of hyperedges that it belongs to, can be retrieved by summing the entries in the tensor whose first mode index is , namely ; see ouvrard2017adjacency (). This is desirable because it resembles the way of obtaining the degree of a vertex in a graph from its adjacency matrix. Another normalized adjacency is proposed in hu2015laplacian () where
and the rest of the entries are equal to zero. Its associated normalized Laplacian tensor is defined as where is a tensor of the same size as , and its entry if and otherwise. This normalization ensures that has certain desirable spectral properties that mimic those of the normalized graph Laplacian hu2015laplacian (). For example, the eigenvalues of as defined in qi2005eigenvalues () are guaranteed to be contained in . Having a bounded spectrum has shown to be useful in GSP for the stability analysis of graph filters isufi2017autoregressive ().
For hypergraphs with non-uniform hyperedges, i.e., hyperedges of different sizes, the above construction does not extend easily. Since some edges will have smaller cardinality than others, some indices in the adjacency tensor would simply be undefined. A naive approach would be to keep an adjacency tensor for each observed cardinality of hyperedges, but this approach is computationally impractical. An alternative is to augment the above construction of an adjacency tensor for general homogeneous hypergraphs as follows. Denote by the cardinality of the largest hyperedge across all hyperedges . Then, we construct an adjacency tensor of order according to the following rules banerjee2017spectra (). For every hyperedge of cardinality , we assign the following nonzero entries to
where the indices are chosen in all possible ways from such that every element of this latter set is represented at least once. The rest of the entries of are set to zero. The Laplacian tensor is then defined as where is a super-diagonal tensor of the same size as and with entries equal to the degree of vertex . To illustrate definition (30), consider the following example.
Consider a hypergraph composed of four nodes and two hyperedges and . We have that and the adjacency tensor is of size . For , the corresponding and , thus the corresponding entries in the tensor are defined as . For , the corresponding and there are two choices for and , i.e., or . Thus we have . The remaining entries are set to zero.
Having defined adjacency and Laplacian tensors, we can now construct appropriate shift operators based on these tensors. In the context in which we are interested in processing signals defined on the nodes, the following approach has been proposed zhang2019introducing (). First, given the signal vector construct the following th-order outer product tensor as
where denotes the tensor outer product and is the order of the adjacency or Laplacian (shift) tensor of interest. Then, the hypergraph shift operation leading to the output signal is defined elementwise as
where correspond to the entries of the chosen shift tensor. Equivalently, we may express the above in terms of tensor products as
Note that, due to the symmetry of the tensor , it does not matter which mode we leave out in the tensor multiplication, i.e., which of the indices is kept fixed to in (32). Furthermore, for the specific case where , we have that in (31) and the shift operation in (32) boils down to a standard matrix-vector multiplication as in GSP.
5.3 Comparison between matrix-based and tensor-based hypergraph representations
The major advantage of matrix-based methods is that a lot of well-developed graph-related algorithms can be directly utilized. However, if the resulting matrix representation is akin to a graph in that it only encodes pairwise relations between vertices (clique expansion), or hyper-edges (line graphs), there will be some information loss, in general, compared to the original hypergraph structure. In contrast, for the star-expansion, all the incidence information is kept in the matrix representation. However, the resulting graph is bipartite. The bipartite graph structure might be undesirable for some applications since there are no explicit links between the same types of vertices and there are much fewer algorithms tailored for bipartite graphs than those for simple graphs yang2020hypergraph ().
Compared with matrix representations, tensors can better retain the set-level information contained in hypergraphs. However, tensor computations are more complicated and lack algorithmic guarantees benson2015tensor (). For example, determining the rank of a specific tensor is NP-hard haastad1989tensor (). Most existing papers have focused on super-symmetric tensors qi2005eigenvalues (), while more general tensors are less explored. Indeed, how to best leverage tensor-based representations to study hypergraphs that are not homogeneous is an open problem.
6 Signal processing and learning on hypergraphs
Mimicking the respective developments in Section 2.4 for graphs and Section 4 for simplicial complexes, in this section we consider the four signal processing setups for hypergraphs equipped with the algebraic representations developed in Section 5.
6.1 Fourier analysis, node and hyperedge embeddings
As stated in Section 5.1, shift operators for hypergraphs can be represented via matrices. The corresponding eigenvectors may then be used as Fourier modes and, thus, most GSP tools discussed in Section 2 can be directly translated to hypergraphs for matrix-based hypergraph shift operators. However, unlike for the case of graphs, even an undirected hypergraph may result in an asymmetric matrix, e.g., if hyperedge weightings are considered. Hence, one may have to adopt tools from GSP for directed graphs in this case; see marques2020signal () for a more detailed exposition of these issues.
In contrast to matrix-based shift operators, the notion of Fourier analysis for hypergraphs represented via tensors is far less developed. Nonetheless, we may proceed analogously to the matrix case and define Fourier modes via a tensor decomposition, in lieu of the eigenvector decomposition. Specifically, we can consider the orthogonal canonical polyadic (CP) decomposition afshar2017cp () of the adjacency tensor (other representative tensors can also be considered) given by
where are scalars, and is the so-called rank of the tensor (i.e., is the smallest number such that can be represented as a weighted sum of rank-1 outer-product tensors). Using this decomposition, the hypergraph Fourier basis can then be defined as . When , the first vectors are determined via the CP decomposition and additional vectors satisfying specific conditions are selected to complete the basis (see Section III-F in zhang2019introducing () for details). Similar to the matrix case, the hypergraph Fourier frequencies are defined as the coefficients associated with the rank-1 terms in the decomposition.
Extending the arguments from the matrix case to the tensor case, the hypergraph Fourier transform (HGFT) and inverse HGFT (iHGFT) zhang2019introducing () are then defined as
where denotes the -th power of each entry of . By introducing such definitions, an application of the tensor shift can be equivalently interpreted as a HGFT followed by a combined operation consisting of filtering in the Fourier domain plus iHGFT (cf. Equation (25) in zhang2019introducing ()). Observe that, when , the hypergraph shift defined in (32), and the HGFT and iHGFT defined in (35) have the same form as the corresponding concepts in GSP (cf. Section 2.3).
Similar to how graph Fourier modes can be used to derive node embeddings, the same can be done for hypergraphs using either matrix or tensor representations. In the case of matrix representations, the procedure is entirely analogous. In the case of tensor representations, we have to use a tensor decomposition but can proceed in a similar fashion once the (tensor-based) Fourier modes are derived. While tensor-based embeddings have only been scarcely considered in the literature so far, e.g., sharma2018hyperedge2vec () represents the dual hypergraph (see Figure 7B) using tensors and learns hyperedge embeddings by performing a symmetric tensor decomposition comon2008symmetric (); kolda2015numerical (). Finally, the embedding of heterogeneous hypergraphs entails additional challenges that can be (partially) addressed through non-linear neural network based approaches; see hg2 () for more details.
6.2 Signal smoothing and denoising
As was the case for graphs and simplicial complexes, one can leverage the structure of hypergraphs to solve inverse problems associated with signals defined on them. For the specific problem of denoising, the assumption is that the signal to be recovered is smooth in the hypergraph, where smoothness typically encodes the fact that tightly connected nodes should have similar signal values. For instance, in the co-authorship network in Example 8, if two authors share many papers (hyperedges) either written solely by them or in collaboration with others, one would expect a signal that represents research interests to take similar values for the two mentioned authors. This notion of homophily is well established for graphs and naturally extends to hypergraphs.
Mathematically, in line with the graph case in Section 2.4.2, we assume that we observe a noisy version of the true underlying signal defined on the node set of our hypergraph . Then, we can try to estimate by solving the optimization problem
where the first term is to constrain the denoised signal to be close to the observation and the second term is a regularizer shaped by the structure of .
A possible choice for is to select a Laplacian matrix representation of (cf. Section 5.1) and set the regularizer to the quadratic form as in the graph case. From the discussion after (2) it follows that the optimal solution will then be a low-pass version of where the bases for low and high frequencies depend on the specific graph expansion selected. The most common one is to consider the clique expansion, in which we have