Powerset Convolutional Neural Networks
Abstract
We present a novel class of convolutional neural networks (CNNs) for set functions, i.e., data indexed with the powerset of a finite set. The convolutions are derived as linear, shift-equivariant functions for various notions of shifts on set functions. The framework is fundamentally different from graph convolutions based on the Laplacian, as it provides not one but several basic shifts, one for each element in the ground set. Prototypical experiments with several set function classification tasks on synthetic datasets and on datasets derived from real-world hypergraphs demonstrate the potential of our new powerset CNNs.
1 Introduction
Deep-learning-based methods provide state-of-the-art approaches for various image and natural language processing tasks, such as image classification Krizhevsky et al. (2012); He et al. (2016), object detection Ren et al. (2015), semantic image segmentation Ronneberger et al. (2015), image synthesis Goodfellow et al. (2014), language translation/understanding Hochreiter and Schmidhuber (1997); Young et al. (2018) and speech synthesis Van Den Oord et al. (2016). However, in many of these models regularity priors are hidden in the fundamental neural building blocks, which makes it impossible to apply them directly to irregular data domains. For instance, image convolutional neural networks (CNNs) are based on parametrized 2D convolutional filters with local support, while recurrent neural networks share model parameters across different time stamps. Both architectures share parameters in a way that exploits the symmetries of the underlying data domains.
In order to port deep learners to novel domains, parameter-sharing schemes reflecting the symmetries in the target data have to be developed Ravanbakhsh et al. (2017). Examples are neural architectures for graph data, i.e., data indexed by the vertices of a graph. Graph CNNs define graph convolutional layers by utilizing results from algebraic graph theory for the graph Laplacian Shuman et al. (2012); Bruna et al. (2013), and message passing neural networks Scarselli et al. (2009); Gilmer et al. (2017) generalize recurrent neural architectures from chain graphs to general graphs. With these building blocks in place, neural architectures for supervised Defferrard et al. (2016); Gilmer et al. (2017); Selsam et al. (2018), semi-supervised Kipf and Welling (2016) and generative learning Simonovsky and Komodakis (2018); Wang et al. (2018) on graphs have been deployed. These research endeavors fall under the umbrella term of geometric deep learning (GDL) Bronstein et al. (2016).
In this work, we want to open the door for deep learning on set functions, i.e., data indexed by the powerset of a finite set. There are (at least) three ways to do so. First, set functions can be viewed as data indexed by a hypercube graph, which makes graph neural nets applicable. Second, results from the Fourier analysis of set functions based on the Walsh-Hadamard transform (WHT) Stobbe and Krause (2012); De Wolf (2008); O’Donnell (2014) can be utilized to formulate a convolution for set functions in a similar way to Shuman et al. (2012). Third, Püschel (2018) introduces several novel notions of convolution for set functions (powerset convolution) as linear, equivariant functions for different notions of shift on set functions. This derivation parallels the standard 2D convolution (equivariant to translations) and graph convolutions (equivariant to the Laplacian or adjacency shift) Ortega et al. (2018). A general theory for deriving new forms of convolutions (and associated Fourier transforms and other signal processing tools) is outlined in Püschel and Moura (2008, 2006).
Contributions
Motivated by the work on generalized convolutions and by the potential utility of deep learning on novel domains, we propose a method-driven approach for deep learning on irregular data domains and, in particular, set functions:

We formulate novel powerset CNN architectures by integrating prior and novel convolutions for set functions Püschel (2018).

As a prototypical application, we consider the set function classification task. Since there is little prior work in this area, we evaluate our powerset CNNs on three synthetic classification tasks (submodularity and spectral properties) and two classification tasks on data derived from real-world hypergraphs (Benson et al., 2018). For the latter, we design classifiers to identify the origin of the extracted subhypergraph. To deal with hypergraph data, we introduce several set-function-based hypergraph representations.

We validate our architectures experimentally and show that they generally outperform the natural fully-connected and graph-convolutional baselines on a range of scenarios and hyperparameter values.
2 Convolutions on Set Functions
We introduce background and definitions for set functions and associated convolutions. For context and analogy, we first briefly review prior convolutions for 2D and graph data. From the signal processing perspective, 2D convolutions are linear, shift-invariant (or equivariant) functions on images $s = (s_{i,j})_{i,j \in \mathbb{Z}}$, where the shifts are the translations $T_{(k,l)} s = (s_{i-k,j-l})_{i,j \in \mathbb{Z}}$. The 2D convolution thus becomes
$$(h * s)_{i,j} = \sum_{k,l \in \mathbb{Z}} h_{k,l}\, s_{i-k,j-l}. \tag{1}$$
Equivariance means that all convolutions commute with all shifts: $h * (T_{(k,l)} s) = T_{(k,l)} (h * s)$.
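The equivariance property can be checked numerically. The sketch below is a minimal Python illustration that assumes periodic (wrap-around) boundaries so that the images stay finite; the helper names `conv2d_circular` and `translate` are hypothetical and not part of the paper.

```python
import random

# Assumption: periodic boundaries keep the example finite; helper names are hypothetical.


def conv2d_circular(h, s):
    """(h * s)_{i,j} = sum_{k,l} h_{k,l} s_{i-k,j-l}, indices wrapping around the image."""
    rows, cols = len(s), len(s[0])
    return [[sum(h[k][l] * s[(i - k) % rows][(j - l) % cols]
                 for k in range(len(h)) for l in range(len(h[0])))
             for j in range(cols)]
            for i in range(rows)]


def translate(s, k, l):
    """Translation T_{(k,l)}: (T_{(k,l)} s)_{i,j} = s_{i-k,j-l}, with the same periodic boundary."""
    rows, cols = len(s), len(s[0])
    return [[s[(i - k) % rows][(j - l) % cols] for j in range(cols)] for i in range(rows)]


random.seed(0)
s = [[random.randint(-5, 5) for _ in range(6)] for _ in range(5)]  # 5 x 6 image
h = [[random.randint(-2, 2) for _ in range(3)] for _ in range(3)]  # 3 x 3 filter, zero elsewhere

# Equivariance: convolving a translated image equals translating the convolved image.
lhs = conv2d_circular(h, translate(s, 2, 1))
rhs = translate(conv2d_circular(h, s), 2, 1)
print(lhs == rhs)  # True
```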
Convolutions on vertex-indexed graph signals $s: V \to \mathbb{R}$ are linear and equivariant w.r.t. the Laplacian shifts $T_k s = L^k s$, where $L$ is the graph Laplacian Shuman et al. (2012).
Set functions
With this intuition in place, we now consider set functions. We fix a finite set $N = \{x_1, \dots, x_n\}$. An associated set function is a signal on the powerset of $N$:
$$s: 2^N \to \mathbb{R}; \quad A \mapsto s_A. \tag{2}$$
Powerset convolution
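A convolution for set functions is obtained by specifying the shifts to which it is equivariant. The work in Püschel (2018) specifies $T_Q s = (s_{A \setminus Q})_{A \subseteq N}$ as one possible choice of shifts for $Q \subseteq N$. Note that in this case the shift operators are parametrized by the monoid $(2^N, \cup)$, since for all $s$
$$T_Q (T_R s) = (s_{(A \setminus Q) \setminus R})_{A \subseteq N} = (s_{A \setminus (Q \cup R)})_{A \subseteq N} = T_{Q \cup R}\, s,$$
which implies $T_Q T_R = T_{Q \cup R}$. The corresponding linear, shift-equivariant powerset convolution is given by Püschel (2018)
$$(h * s)_A = \sum_{Q \subseteq N} h_Q\, s_{A \setminus Q}. \tag{3}$$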
Note that the filter $h$ is itself a set function. Table 1 contains an overview of generalized convolutions and the associated shift operations to which they are equivariant.
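As a minimal sketch of the powerset convolution, the snippet below stores a set function on $N = \{x_1, \dots, x_n\}$ as a Python list of length $2^n$ indexed by bitmasks (bit $i$ set means $x_{i+1} \in A$), so that $A \setminus Q$ becomes `A & ~Q`. This encoding and the helper names are illustrative assumptions, not the authors' implementation. The code checks numerically that $T_Q T_R = T_{Q \cup R}$, that $(h * s)_A = \sum_{Q} h_Q s_{A \setminus Q}$ commutes with the shifts, and that filtering with a delta at $R$ reproduces $T_R$.

```python
import random

# Assumption: set functions are lists of length 2**n indexed by bitmasks (bit i <-> x_{i+1}).


def shift(s, Q):
    """Powerset shift T_Q: (T_Q s)_A = s_(A minus Q); A & ~Q removes the bits of Q from A."""
    return [s[A & ~Q] for A in range(len(s))]


def powerset_conv(h, s):
    """Powerset convolution: (h * s)_A = sum over Q of h_Q * s_(A minus Q)."""
    size = len(s)
    return [sum(h[Q] * s[A & ~Q] for Q in range(size)) for A in range(size)]


random.seed(1)
n = 4
s = [random.randint(-9, 9) for _ in range(2 ** n)]
h = [random.randint(-3, 3) for _ in range(2 ** n)]  # the filter is itself a set function
Q, R = 0b0011, 0b0110                                 # Q = {x1, x2}, R = {x2, x3}

assert shift(shift(s, R), Q) == shift(s, Q | R)                          # T_Q T_R = T_{Q u R}
assert powerset_conv(h, shift(s, R)) == shift(powerset_conv(h, s), R)    # shift equivariance
delta_R = [1 if P == R else 0 for P in range(2 ** n)]
assert powerset_conv(delta_R, s) == shift(s, R)                          # delta filter at R gives T_R
```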
Fourier transform
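Each filter $h$ defines a linear operator $\Phi_h := (h * \cdot)$ obtained by fixing $h$ in (3). It is diagonalized by the powerset Fourier transform
$$F = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}^{\otimes n} = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix} \otimes \cdots \otimes \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}, \tag{4}$$
where $\otimes$ denotes the Kronecker product. Note that $F^{-1} = F$ in this case and that the spectrum is also indexed by subsets $B \subseteq N$. In particular, we have
$$F \Phi_h F^{-1} = \operatorname{diag}\big((\bar h_B)_{B \subseteq N}\big), \tag{5}$$
in which $\bar h_B = \sum_{Q \subseteq N \setminus B} h_Q$ denotes the frequency response of the filter $h$ Püschel (2018).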
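The diagonalization suggests computing (3) in the frequency domain. The sketch below shows one possible way to do this, assuming the bitmask encoding of set functions as lists of length $2^n$: `fourier` applies $F$ with one butterfly pass per element (so $O(n 2^n)$ operations), and `frequency_response` evaluates $\bar h_B = \sum_{Q \subseteq N \setminus B} h_Q$ through a subset-sum transform. Both helpers are hypothetical names, and the result is compared against a direct evaluation of (3).

```python
import random

# Assumption: set functions are lists of length 2**n indexed by bitmasks (bit i <-> x_{i+1}).


def fourier(s):
    """Apply F = [[1, 0], [1, -1]]^{(x) n} with one butterfly pass per element (O(n 2^n)).
    F is an involution, so the same routine also computes F^{-1}."""
    s = list(s)
    bit = 1
    while bit < len(s):
        for A in range(len(s)):
            if A & bit:                    # pair (A without x, A) for the element x of this bit
                s[A] = s[A ^ bit] - s[A]   # row [1, -1]; row [1, 0] keeps s[A ^ bit] unchanged
        bit <<= 1
    return s


def frequency_response(h):
    """bar h_B = sum of h_Q over all Q disjoint from B, computed via a subset-sum (zeta) transform."""
    z = list(h)
    bit = 1
    while bit < len(z):
        for C in range(len(z)):
            if C & bit:
                z[C] += z[C ^ bit]
        bit <<= 1
    full = len(z) - 1
    # Now z[C] = sum of h_Q over Q subset of C; evaluate it at the complement N minus B.
    return [z[full ^ B] for B in range(len(z))]


def powerset_conv_fourier(h, s):
    """h * s = F^{-1} diag(bar h) F s, using F^{-1} = F."""
    s_hat = fourier(s)
    return fourier([hb * sh for hb, sh in zip(frequency_response(h), s_hat)])


def powerset_conv(h, s):
    """Direct evaluation of (3) for comparison (O(4^n) operations)."""
    size = len(s)
    return [sum(h[Q] * s[A & ~Q] for Q in range(size)) for A in range(size)]


random.seed(2)
n = 5
s = [random.randint(-9, 9) for _ in range(2 ** n)]
h = [random.randint(-3, 3) for _ in range(2 ** n)]
assert fourier(fourier(s)) == s                             # F^{-1} = F
assert powerset_conv_fourier(h, s) == powerset_conv(h, s)   # convolution in the frequency domain
```

With integer inputs all intermediate values stay integers, so the comparison is exact.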
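Other shifts and convolutions
There are several other possible definitions of set shifts, each coming with its respective convolution and Fourier transform Püschel (2018). Two additional examples are $T^{\cup}_Q s = (s_{A \cup Q})_{A \subseteq N}$ and the symmetric difference shift $T^{\triangle}_Q s = (s_{(A \setminus Q) \cup (Q \setminus A)})_{A \subseteq N}$ Stankovic et al. (2005). The associated convolutions are, respectively,
$$(h *_{\cup} s)_A = \sum_{Q \subseteq N} h_Q\, s_{A \cup Q} \quad \text{and} \quad (h *_{\triangle} s)_A = \sum_{Q \subseteq N} h_Q\, s_{(A \setminus Q) \cup (Q \setminus A)}. \tag{6}$$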
Localized filters
Filters $h$ with $h_Q = 0$ for $|Q| > k$ are $k$-localized in the sense that the evaluation of $(h * s)_A$ only depends on evaluations of $s$ on sets differing by at most $k$ elements from $A$. In particular, 1-localized filters $(h * s)_A = h_\emptyset s_A + \sum_{x \in N} h_{\{x\}} s_{A \setminus \{x\}}$ are the counterpart of the one-hop filters that are typically used in graph CNNs Kipf and Welling (2016). In contrast to the omnidirectional one-hop graph filters, these one-hop filters have one direction per element in $N$.
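A one-hop (1-localized) filter needs only $n + 1$ coefficients: $h_\emptyset$ and one $h_{\{x\}}$ per element. The sketch below, assuming set functions stored as bitmask-indexed lists and using hypothetical helper names, evaluates $(h * s)_A = h_\emptyset s_A + \sum_{x \in N} h_{\{x\}} s_{A \setminus \{x\}}$, checks it against the full convolution with $h_Q = 0$ for $|Q| > 1$, and confirms that changing $s$ on a set far from $A$ leaves the output at $A$ unchanged.

```python
import random

# Assumption: set functions are lists of length 2**n indexed by bitmasks (bit i <-> x_{i+1}).


def one_hop_conv(h_empty, h_single, s, n):
    """1-localized filter: (h * s)_A = h_empty s_A + sum_i h_single[i] s_(A minus {x_{i+1}})."""
    return [h_empty * s[A] + sum(h_single[i] * s[A & ~(1 << i)] for i in range(n))
            for A in range(2 ** n)]


def powerset_conv(h, s):
    """Full powerset convolution (3), used for comparison."""
    size = len(s)
    return [sum(h[Q] * s[A & ~Q] for Q in range(size)) for A in range(size)]


random.seed(3)
n = 4
s = [random.randint(-9, 9) for _ in range(2 ** n)]
h_empty, h_single = 2, [1, -1, 3, 0]  # one coefficient per direction x_1, ..., x_4

# Embed the one-hop coefficients into a full filter with h_Q = 0 for |Q| > 1.
h = [0] * (2 ** n)
h[0] = h_empty
for i in range(n):
    h[1 << i] = h_single[i]
assert one_hop_conv(h_empty, h_single, s, n) == powerset_conv(h, s)

# Locality: the output at A = {x1, x3} never reads s at C = {x2, x4}, which differs from A in four elements.
A, C = 0b0101, 0b1010
t = list(s)
t[C] += 100
assert one_hop_conv(h_empty, h_single, t, n)[A] == one_hop_conv(h_empty, h_single, s, n)[A]
```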
2.1 Applications of Set Functions
Set functions are of practical importance across a range of research fields. Several optimization tasks, such as cost-effective sensor placement Krause and Guestrin (2007), optimal ad placement Golovin et al. (2014) and tasks such as semantic image segmentation Osokin and Vetrov (2015), can be reduced to subset selection tasks, in which a set function determines the value of every subset and has to be maximized to find the best one. In combinatorial auctions, set functions can be used to describe bidding behavior. Each bidder is represented as a valuation function that maps subsets of goods to the subjective values the bidder assigns to them De Vries and Vohra (2003). Cooperative games are set functions: a coalition is a subset of players, and a coalition game assigns a value to every subset of players. In the simplest case the value one is assigned to winning and the value zero to losing coalitions Branzei et al. (2008). Further, graphs and hypergraphs also admit set function representations:
Definition 1.
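(Hypergraph) A hypergraph is a triple $H = (V, E, w)$, where $V = \{v_1, \dots, v_n\}$ is a set of vertices, $E \subseteq 2^V \setminus \{\emptyset\}$ is a set of hyperedges and $w: E \to \mathbb{R}$ is a weight function.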
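The weight function of a hypergraph can be viewed as a set function on $V$ by setting $s_A = w(A)$ if $A \in E$ and $s_A = 0$ otherwise. Additionally, hypergraphs induce two set functions, namely the hypergraph cut and association score function:
$$\operatorname{cut}(A) = \sum_{\substack{B \in E:\ B \cap A \neq \emptyset,\\ B \cap (V \setminus A) \neq \emptyset}} w(B), \qquad \operatorname{assoc}(A) = \sum_{B \in E:\ B \subseteq A} w(B). \tag{7}$$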
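The three hypergraph set functions can be tabulated directly for a small example. This is a minimal sketch with hypothetical helper names, assuming vertex $v_{i+1}$ is encoded as bit $i$ of an integer and hyperedges are given as a dictionary from bitmasks to weights. It uses $\operatorname{cut}(A) = \sum_{B \in E:\, B \cap A \neq \emptyset,\, B \not\subseteq A} w(B)$ and $\operatorname{assoc}(A) = \sum_{B \in E:\, B \subseteq A} w(B)$, and the final check shows that the association score is the subset sum of the weight set function.

```python
# Assumption: vertex v_{i+1} is bit i; hyperedges are a dict {bitmask: weight}. Names are hypothetical.


def hypergraph_set_functions(n, edges):
    """Return the weight, cut and association score set functions as lists of length 2**n."""
    full = (1 << n) - 1
    weight = [edges.get(A, 0) for A in range(2 ** n)]
    cut = [sum(w for e, w in edges.items() if (e & A) and (e & (full ^ A)))  # e meets A and V minus A
           for A in range(2 ** n)]
    assoc = [sum(w for e, w in edges.items() if (e & ~A) == 0)               # e is a subset of A
             for A in range(2 ** n)]
    return weight, cut, assoc


n = 4
edges = {0b0011: 1, 0b1110: 1}  # hyperedges {v1, v2} and {v2, v3, v4} with unit weights
weight, cut, assoc = hypergraph_set_functions(n, edges)
print(cut[0b0010], assoc[0b0011], assoc[0b1111], cut[0b1111])  # 2 1 2 0

# The association score is the subset sum of the weight set function.
assert all(assoc[A] == sum(weight[B] for B in range(2 ** n) if (B & ~A) == 0) for A in range(2 ** n))
```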
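Table 1: Generalized convolutions and the shifts to which they are equivariant.
signal | shifted signal | convolution | reference | CNN
image | $(s_{i-k,j-l})_{i,j}$ | $\sum_{k,l} h_{k,l}\, s_{i-k,j-l}$ | standard | standard
graph Laplacian | $(L^k s)_{v \in V}$ | $(\sum_k h_k L^k s)_v$ | Shuman et al. (2012) | Bruna et al. (2013)
graph adjacency | $(\mathbf{A}^k s)_{v \in V}$ | $(\sum_k h_k \mathbf{A}^k s)_v$ | Sandryhaila and Moura (2012) |
group | $(s_{q^{-1} g})_{g \in G}$ | $\sum_{q \in G} h_q\, s_{q^{-1} g}$ | Stankovic et al. (2005) | Cohen and Welling (2016)
group spherical | $(s(R^{-1} x))_{x \in S^2}$ | $\int_{SO(3)} h(R)\, s(R^{-1} x)\, dR$ | Cohen et al. (2018) | Cohen et al. (2018)
powerset | $(s_{A \setminus Q})_{A \subseteq N}$ | $\sum_{Q \subseteq N} h_Q\, s_{A \setminus Q}$ | Püschel (2018) | this paper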
2.2 Convolutional Pattern Matching
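The powerset convolution in (3) raises the question of which patterns are “detected” by a filter $(h_Q)_{Q \subseteq N}$. In other words, to which signal does the filter respond most strongly when evaluated at a given subset $A \subseteq N$? We call this signal $p^A$ (the pattern matched at position $A$). Formally,
$$p^A = \arg\max_{s:\ \|s\| = 1} (h * s)_A. \tag{8}$$
For $A = N$, the answer is $p^N = (h_{N \setminus B})_{B \subseteq N} / \|h\|$. This is because $(h * s)_N = \sum_{Q \subseteq N} h_Q s_{N \setminus Q}$ is the dot product $\langle h, s^* \rangle$, with $s^*_Q = s_{N \setminus Q}$, which is maximal if $h$ and $s^*$ are aligned. Similarly, the 1D convolutional filter $h$ matches $(h_{t-k})_{k \in \mathbb{Z}}$ at position $t$ and the group convolutional filter $h$ matches $(h_{g q^{-1}})_{q \in G}$ at position $g$ (up to normalization). Slightly rewriting (3) yields the answer for the general case $A \subseteq N$:
$$(h * s)_A = \sum_{Q_1 \subseteq A} \sum_{Q_2 \subseteq N \setminus A} h_{Q_1 \cup Q_2}\, s_{A \setminus Q_1} = \sum_{Q_1 \subseteq A} h'_{Q_1}\, s_{A \setminus Q_1}, \qquad h'_{Q_1} = \sum_{Q_2 \subseteq N \setminus A} h_{Q_1 \cup Q_2}. \tag{9}$$
Equation (9) shows that the powerset convolution evaluated at position $A$ can be seen as the convolution of a new filter $h'$ with $s$ restricted to the powerset $2^A$, evaluated at position $A$. This is the case for which we know the answer: $p^A_B = h'_{A \setminus B} / \|h'\|$ if $B \subseteq A$ and $p^A_B = 0$ otherwise.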
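The closed form for the matched pattern can be checked numerically. The sketch below (hypothetical helper names, set functions as bitmask-indexed lists) builds $h'_{Q_1} = \sum_{Q_2 \subseteq N \setminus A} h_{Q_1 \cup Q_2}$ and $p^A_B = h'_{A \setminus B} / \|h'\|$ for $B \subseteq A$, verifies that $p^A$ has unit norm and achieves the response $\|h'\|$ at $A$, and compares against random unit-norm signals; by the Cauchy-Schwarz inequality none of them should respond more strongly.

```python
import math
import random

# Assumption: set functions are lists of length 2**n indexed by bitmasks (bit i <-> x_{i+1}).


def conv_at(h, s, A):
    """(h * s)_A = sum over Q of h_Q * s_(A minus Q)."""
    return sum(h[Q] * s[A & ~Q] for Q in range(len(s)))


def matched_pattern(h, A, n):
    """Pattern p^A from (9); returns (p^A, ||h'||)."""
    size = 2 ** n
    rest = (size - 1) & ~A                                   # N minus A
    h_prime = {}
    for Q1 in range(size):
        if (Q1 & ~A) == 0:                                   # Q1 is a subset of A
            h_prime[Q1] = sum(h[Q1 | Q2] for Q2 in range(size) if (Q2 & ~rest) == 0)
    norm = math.sqrt(sum(v * v for v in h_prime.values()))
    p = [0.0] * size                                          # p^A_B = 0 unless B is a subset of A
    for B in range(size):
        if (B & ~A) == 0:
            p[B] = h_prime[A & ~B] / norm
    return p, norm


random.seed(4)
n, A = 4, 0b0110
h = [random.uniform(-1, 1) for _ in range(2 ** n)]
p, norm = matched_pattern(h, A, n)

assert abs(sum(v * v for v in p) - 1) < 1e-9              # p^A is a unit vector
assert abs(conv_at(h, p, A) - norm) < 1e-9                # its response at A equals ||h'||
for _ in range(1000):                                      # random unit signals never respond more strongly
    s = [random.gauss(0, 1) for _ in range(2 ** n)]
    r = math.sqrt(sum(v * v for v in s))
    assert conv_at(h, [v / r for v in s], A) <= norm + 1e-9

full = 2 ** n - 1                                          # for A = N the pattern is h_(N minus B) / ||h||
p_full, norm_full = matched_pattern(h, full, n)
assert all(abs(p_full[B] - h[full & ~B] / norm_full) < 1e-12 for B in range(2 ** n))
```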
Notice that this behavior is different from 1D and 2D convolutions: there the underlying shifts are invertible and thus the detected patterns are again shifted versions of each other. Since powerset shifts are not invertible, the patterns detected by a filter are not just (set-)shifted versions of each other, as shown above.
A similar behavior can be expected with graph convolutions, since the Laplacian shift is never invertible and the adjacency shift is not always invertible.
3 Powerset Convolutional Neural Networks
Convolutional layers
We define a convolutional layer by extending the convolution to multiple channels, summing up the feature maps obtained by channel-wise convolution as in Bronstein et al. (2016):
Definition 2.
(Powerset convolutional layer) A powerset convolutional layer is defined as follows:

The input is given by $n_c$ set functions $s = (s^{(1)}, \dots, s^{(n_c)}) \in \mathbb{R}^{2^N \times n_c}$;

The output is given by $n_f$ set functions $t = \mathcal{L}_\Gamma(s) = (t^{(1)}, \dots, t^{(n_f)}) \in \mathbb{R}^{2^N \times n_f}$;

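The layer applies a bank of set function filters $\Gamma = (h^{(i,j)})_{i,j}$, with $i \in \{1, \dots, n_c\}$ and $j \in \{1, \dots, n_f\}$, and a point-wise nonlinearity $\sigma$ resulting in
$$t^{(j)}_A = \sigma\Big(\sum_{i=1}^{n_c} \big(h^{(i,j)} * s^{(i)}\big)_A\Big). \tag{10}$$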
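The convolutional layer can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' TensorFlow code: set functions are bitmask-indexed lists, $\sigma$ is assumed to be ReLU, `filters[i][j]` holds $h^{(i,j)}$, and the convolution is evaluated directly as $(h * s)_A = \sum_Q h_Q s_{A \setminus Q}$; all names are hypothetical. Because ReLU acts pointwise, the whole layer remains equivariant to the powerset shifts, which the last line checks.

```python
import random

# Assumption: set functions are bitmask-indexed lists, sigma = ReLU; names are hypothetical.


def powerset_conv(h, s):
    size = len(s)
    return [sum(h[Q] * s[A & ~Q] for Q in range(size)) for A in range(size)]


def relu(x):
    return max(0.0, x)


def powerset_conv_layer(inputs, filters, sigma=relu):
    """t^(j)_A = sigma(sum_i (h^(i,j) * s^(i))_A); filters[i][j] maps input channel i to output j."""
    size = len(inputs[0])
    outputs = []
    for j in range(len(filters[0])):
        acc = [0.0] * size
        for i, s in enumerate(inputs):
            acc = [a + c for a, c in zip(acc, powerset_conv(filters[i][j], s))]
        outputs.append([sigma(v) for v in acc])
    return outputs


def shift(s, Q):
    """(T_Q s)_A = s_(A minus Q)."""
    return [s[A & ~Q] for A in range(len(s))]


random.seed(5)
n, n_c, n_f = 3, 2, 4
size = 2 ** n
inputs = [[random.randint(-5, 5) for _ in range(size)] for _ in range(n_c)]
filters = [[[random.randint(-2, 2) for _ in range(size)] for _ in range(n_f)] for _ in range(n_c)]
out = powerset_conv_layer(inputs, filters)
print(len(out), len(out[0]))  # 4 8

Q = 0b101  # shifting every input channel by T_Q shifts every output channel by T_Q
assert powerset_conv_layer([shift(s, Q) for s in inputs], filters) == [shift(t, Q) for t in out]
```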
Pooling layers
As in conventional CNNs, we define powerset pooling layers to gain additional robustness with respect to input perturbations, and to control the number of features extracted by the convolutional part of the powerset CNN. From a signal processing perspective, the crucial aspect of the pooling operation is that the pooled signal lives on a valid signal domain, i.e., a powerset. One way to achieve this is by combining elements of the ground set.
Definition 3.
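(Powerset pooling) Let $N'$ be the ground set of size $n - |X| + 1$ obtained by combining all the elements in $X \subseteq N$ into a single element. E.g., for $X = \{x_1, x_2\}$ we get $N' = \{\{x_1, x_2\}, x_3, \dots, x_n\}$. Therefore, every subset $X \subseteq N$ defines a pooling operation
$$P^X: \mathbb{R}^{2^N} \to \mathbb{R}^{2^{N'}}: (s_A)_{A \subseteq N} \mapsto (s_B)_{B \subseteq N:\ B \cap X = X \text{ or } B \cap X = \emptyset}. \tag{11}$$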
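In our experiments we always merge two elements, i.e., $|X| = 2$, which halves the size of the signal domain. It is also possible to pool a set function by combining elements of the powerset as in Scheibler et al. (2015) or by the simple rule $s_B = \max(s_B, s_{B \cup \{x\}})$ for $B \subseteq N \setminus \{x\}$. A pooling layer is then obtained by applying our pooling strategy to every channel.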
Definition 4.
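(Powerset pooling layer) A powerset pooling layer takes $n_c$ set functions $s = (s^{(1)}, \dots, s^{(n_c)}) \in \mathbb{R}^{2^N \times n_c}$ as input and outputs $n_c$ pooled set functions $t = \mathcal{L}_P(s) = (t^{(1)}, \dots, t^{(n_c)}) \in \mathbb{R}^{2^{N'} \times n_c}$, with $|N'| = |N| - |X| + 1$, by applying the pooling operation to every channel
$$t^{(i)} = P^X\big(s^{(i)}\big). \tag{12}$$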
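Powerset pooling is pure re-indexing: it keeps the values $s_B$ with $B \cap X = X$ or $B \cap X = \emptyset$. In the sketch below, which uses hypothetical names and the bitmask encoding, the elements of $N \setminus X$ keep their relative order and the merged element becomes the highest bit of $N'$; this ordering is an arbitrary choice of ours. For $n = 4$ and $X = \{x_3, x_4\}$ the pooled signal keeps exactly the eight subsets that contain both or neither of $x_3$ and $x_4$.

```python
# Assumption: bitmask-indexed lists; the ordering of N' is our own choice; names are hypothetical.


def pool(s, n, X):
    """P^X: keep s_B for all B with (B cap X) in {X, empty}, re-indexed on N'.
    Elements of N minus X keep their order; the merged element becomes the highest bit of N'."""
    others = [i for i in range(n) if not (X >> i) & 1]
    pooled = []
    for b_new in range(2 ** (len(others) + 1)):
        B = 0
        for k, i in enumerate(others):
            if (b_new >> k) & 1:
                B |= 1 << i
        if (b_new >> len(others)) & 1:  # merged element present: include all of X
            B |= X
        pooled.append(s[B])
    return pooled


def pooling_layer(channels, n, X):
    """Apply P^X to every channel."""
    return [pool(s, n, X) for s in channels]


print(pool(list(range(16)), 4, 0b1100))  # [0, 1, 2, 3, 12, 13, 14, 15]
pooled = pooling_layer([list(range(16)), [2 * v for v in range(16)]], 4, 0b0011)
print([len(t) for t in pooled])  # [8, 8]
```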
Powerset CNN
A powerset CNN is a composition of several powerset convolutional and pooling layers. Depending on the task, the outputs of the convolutional component can be fed into a multilayer perceptron, e.g., for classification.
Fig. 1 illustrates a forward pass of a powerset CNN with two convolutional layers, each of which is followed by a pooling layer. The first convolutional layer is parameterized by three onehop filters and the second one is parameterized by three times five onehop filters. The filter coefficients were initialized with random weights for this illustration.
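A forward pass of this shape can be sketched end to end. The sketch below is illustrative only (hypothetical names, random weights, bitmask-encoded set functions, and a ground set of size $n = 4$ chosen for brevity): each layer uses one-hop filters with $n + 1$ coefficients and ReLU, and each pooling step merges the last two elements, halving the domain. The printed sizes trace the shapes from $2^4$ input values to five pooled channels on $2^2$ subsets, which are then either flattened or averaged over all subsets before an MLP.

```python
import random

# Assumption: bitmask-indexed lists, ReLU, pooling of the last two elements; names are hypothetical.


def one_hop_conv(h, s, n):
    """One-hop filter with h = (h_empty, h_{x_1}, ..., h_{x_n})."""
    return [h[0] * s[A] + sum(h[i + 1] * s[A & ~(1 << i)] for i in range(n)) for A in range(2 ** n)]


def conv_layer(channels, filters, n):
    """Convolutional layer with one-hop filters and ReLU; filters[i][j] maps input i to output j."""
    out = []
    for j in range(len(filters[0])):
        acc = [0.0] * (2 ** n)
        for i, s in enumerate(channels):
            acc = [a + c for a, c in zip(acc, one_hop_conv(filters[i][j], s, n))]
        out.append([max(0.0, v) for v in acc])
    return out


def pool_last_two(s, n):
    """P^X with X = {x_{n-1}, x_n}: keep s_B where B contains both or neither of the last two elements."""
    low = 2 ** (n - 2)
    X = 0b11 << (n - 2)
    return [s[B] for B in range(low)] + [s[B | X] for B in range(low)]


def random_filters(n_c, n_f, n):
    return [[[random.uniform(-1, 1) for _ in range(n + 1)] for _ in range(n_f)] for _ in range(n_c)]


random.seed(6)
n = 4
x = [[random.uniform(0, 1) for _ in range(2 ** n)]]        # one input channel on 2^N
h1 = conv_layer(x, random_filters(1, 3, n), n)              # three one-hop filters
p1 = [pool_last_two(s, n) for s in h1]                      # n -> n - 1
h2 = conv_layer(p1, random_filters(3, 5, n - 1), n - 1)     # three times five one-hop filters
p2 = [pool_last_two(s, n - 1) for s in h2]                  # n - 1 -> n - 2
features = [v for s in p2 for v in s]                       # flattened input to an MLP
averaged = [sum(s) / len(s) for s in p2]                    # alternative: average over all subsets
print(len(h1), len(h1[0]), len(p1[0]), len(h2), len(p2[0]), len(features), len(averaged))
# 3 16 8 5 4 20 5
```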
Implementation
We implemented the powerset convolutional and pooling layers in TensorFlow Abadi et al. (2016). Our implementation supports various definitions of powerset shifts and uses the respective Fourier transforms to compute the convolutions in the frequency domain. Sample implementations are provided in the supplementary material.
4 Experimental Evaluation
Our powerset CNN is built on the premise that the successful components of conventional CNNs are domain independent and only rely on the underlying concepts of shift and shift-equivariant convolutions. In particular, if we use only one-hop filters, our powerset CNN satisfies locality and compositionality. Thus, similar to image CNNs, it should be able to learn localized hierarchical features. To understand whether this is useful when applied to set function classification problems, we evaluate our powerset CNN architectures on three synthetic tasks and on two tasks based on real-world hypergraph data.
Problem formulation
Intuitively, our set function classification task requires the models to learn to classify a collection of set functions sampled from some natural distributions. One such example would be to classify (hyper)graphs coming from some underlying data distributions. Formally, the set function classification problem is characterized by a training set composed of pairs (set function, label), as well as a test set. The learning task is to use the training set to learn a mapping from the space of set functions $\mathbb{R}^{2^N}$ to the (finite) label space.
4.1 Synthetic Datasets
Unless stated otherwise, we consider the ground set with , and sample set functions per class. We use of the samples for training, and the remaining for testing. We only use one random split per dataset. Given this, we generated the following three synthetic datasets, meant to illustrate specific applications of our framework.
Spectral patterns
In order to obtain nontrivial classes of set functions, we define a sampling procedure based on the Fourier expansion associated with the shift $T_Q s = (s_{A \setminus Q})_{A \subseteq N}$. In particular, we sample Fourier-sparse set functions $s = F^{-1} \hat{s}$ with $\hat{s}$ sparse. We implement this by associating each target “class” with a collection of frequencies and sampling normally distributed Fourier coefficients for these frequencies. In our example, we defined four classes, where the Fourier support of the first and second class is obtained by randomly selecting roughly half of the frequencies. For the third class we use the entire spectrum, while for the fourth we use the frequencies that are either in both of class one’s and class two’s Fourier support, or in neither of them.
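The sampling procedure can be sketched as follows, assuming the bitmask encoding and the transform $F = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}^{\otimes n}$, for which $F^{-1} = F$. The ground-set size, the number of samples per class and the random generator are illustrative choices, not the settings used in the paper, and the helper names are hypothetical.

```python
import random

# Assumption: bitmask-indexed lists; sizes and names are illustrative, not the paper's settings.


def fourier(s):
    """F = [[1, 0], [1, -1]]^{(x) n}; since F^{-1} = F this also maps a spectrum back to a signal."""
    s = list(s)
    bit = 1
    while bit < len(s):
        for A in range(len(s)):
            if A & bit:
                s[A] = s[A ^ bit] - s[A]
        bit <<= 1
    return s


def sample_fourier_sparse(support, n, rng):
    """s = F^{-1} s_hat with s_hat_B ~ N(0, 1) for B in support and 0 elsewhere."""
    s_hat = [rng.gauss(0.0, 1.0) if B in support else 0.0 for B in range(2 ** n)]
    return fourier(s_hat)


rng = random.Random(7)
n = 6
all_freqs = set(range(2 ** n))
class1 = {B for B in all_freqs if rng.random() < 0.5}        # roughly half of the frequencies
class2 = {B for B in all_freqs if rng.random() < 0.5}
class4 = (class1 & class2) | (all_freqs - class1 - class2)   # in both supports or in neither
supports = [class1, class2, all_freqs, class4]
dataset = [(sample_fourier_sparse(sup, n, rng), label)
           for label, sup in enumerate(supports) for _ in range(5)]
```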
$k$-junta classification
A $k$-junta is a Boolean function defined on $n$ variables $x_1, \dots, x_n$ that only depends on $k$ of the variables. In the same spirit, we call a set function a $k$-junta if its evaluations only depend on the presence or absence of $k$ of the $n$ elements of the ground set. We generate a $k$-junta classification dataset by sampling random $k$-juntas for different values of $k$. We do so by utilizing the fact that shifting a set function by $\{x\}$ eliminates its dependency on $x$, i.e., for $s' = T_{\{x\}} s$ with $s'_A = s_{A \setminus \{x\}}$ we have $s'_A = s'_{A \cup \{x\}}$ because $(A \cup \{x\}) \setminus \{x\} = A \setminus \{x\}$. Therefore, sampling a random $k$-junta amounts to first sampling a random value for every subset $A \subseteq N$ and then performing $n - k$ set shifts by randomly selected (distinct) singleton sets.
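The shift-based construction can be sketched as follows (hypothetical helper names, bitmask-encoded set functions, and Gaussian values as an assumed choice of random value). The singletons are drawn without replacement so that exactly $n - k$ elements are eliminated; `relevant_elements` then recovers the $k$ elements on which the sampled function still depends.

```python
import random

# Assumption: bitmask-indexed lists; Gaussian values per subset; names are hypothetical.


def shift(s, Q):
    """(T_Q s)_A = s_(A minus Q)."""
    return [s[A & ~Q] for A in range(len(s))]


def sample_junta(n, k, rng):
    """Draw a random value for every subset, then shift by n - k distinct random singletons."""
    s = [rng.gauss(0.0, 1.0) for _ in range(2 ** n)]
    removed = rng.sample(range(n), n - k)
    for i in removed:
        s = shift(s, 1 << i)  # T_{x} s no longer depends on x
    return s, removed


def relevant_elements(s, n):
    """Indices i such that s_A != s_(A u {x_{i+1}}) for some A."""
    return {i for i in range(n) if any(s[A] != s[A | (1 << i)] for A in range(2 ** n))}


rng = random.Random(8)
n, k = 6, 3
s, removed = sample_junta(n, k, rng)
print(sorted(relevant_elements(s, n)) == sorted(set(range(n)) - set(removed)))  # True
```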
Submodularity classification
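A set function $s$ is submodular if it satisfies the diminishing returns property
$$s_{A \cup \{x\}} - s_A \geq s_{B \cup \{x\}} - s_B \quad \text{for all } A \subseteq B \subseteq N \text{ and } x \in N \setminus B. \tag{13}$$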
In words, adding an element to a small subset increases the value of the set function at least as much as adding it to a larger subset. We construct a dataset comprised of submodular and "almost submodular" set functions. As examples of submodular functions we utilize coverage functions Krause and Golovin (2014) (a subclass of submodular functions that allows for easy random generation). As examples of what we informally call "almost submodular" set functions here, we sample coverage functions and perturb them slightly to destroy the coverage property.
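A brute-force check of the diminishing returns property and a simple coverage-function generator can be sketched as follows. Everything here is an illustrative assumption with hypothetical names: the item weights, the coverage probability `p`, and the perturbation, which is a deliberately large bump at $N$ that is guaranteed to break the property rather than the slight perturbation used for the dataset.

```python
import random

# Assumption: bitmask-indexed lists; generator parameters and names are illustrative.


def random_coverage_function(n, m, rng, p=0.3):
    """Element x_i covers a random subset of m weighted items; s_A is the total weight covered by A."""
    weights = [rng.random() for _ in range(m)]
    covers = [{u for u in range(m) if rng.random() < p} for _ in range(n)]
    s = []
    for A in range(2 ** n):
        covered = set().union(*(covers[i] for i in range(n) if (A >> i) & 1))
        s.append(sum(weights[u] for u in covered))
    return s, sum(weights)


def is_submodular(s, n, tol=1e-9):
    """Check s(A u x) - s(A) >= s(B u x) - s(B) for all A subset of B and x not in B."""
    for B in range(2 ** n):
        A = B
        while True:  # enumerate every subset A of B
            for i in range(n):
                x = 1 << i
                if (B & x) == 0 and s[A | x] - s[A] < s[B | x] - s[B] - tol:
                    return False
            if A == 0:
                break
            A = (A - 1) & B
    return True


rng = random.Random(9)
n = 5
s, total = random_coverage_function(n, 12, rng)
print(is_submodular(s, n))  # True

# Crude perturbation for illustration: a bump larger than the total item weight at N
# makes the last marginal gain exceed every other one.
t = list(s)
t[2 ** n - 1] += total + 1.0
print(is_submodular(t, n))  # False
```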
4.2 Real Datasets
Finally, we construct two classification tasks based on real hypergraph data. Reference Benson et al. (2018) provides 19 real-world hypergraph datasets. Each dataset is a hypergraph evolving over time. An example is the DBLP coauthorship hypergraph in which vertices are authors and hyperedges are publications. In the following, we consider classification problems on subhypergraphs induced by vertex subsets of size ten. The hypergraphs are represented by their weight set function (with unit weights).
Definition 5.
(Induced Subhypergraph) Let be a hypergraph. The subset of vertices induces a subhypergraph with and .
Domain classification
As we have multiple hypergraphs, an interesting question is whether it is possible to identify from which hypergraph a given subhypergraph of size ten was sampled, i.e., whether it is possible to distinguish the hypergraphs by considering only local interactions. Therefore, among the publicly available hypergraphs in Benson et al. (2018) we only consider those containing at least 500 hyperedges of cardinality ten (namely, DAWN: 1159, threads-stack-overflow: 3070, coauth-DBLP: 6599, coauth-MAG-History: 1057, coauth-MAG-Geology: 7704, congress-bills: 2952). The coauth hypergraphs are coauthorship hypergraphs; in DAWN the vertices are drugs and the hyperedges patients; in threads-stack-overflow the vertices are users and the hyperedges questions on threads on stackoverflow.com; and in congress-bills the vertices are congresspersons and the hyperedges cosponsored bills. From those hypergraphs we sample all the subhypergraphs induced by the hyperedges of size ten and assign the respective hypergraph of origin as class label. In addition to this dataset (DOM6), we create an easier version (DOM4) in which we only keep one of the coauthorship hypergraphs, namely coauth-DBLP.
Simplicial closure
Reference Benson et al. (2018) distinguishes between open and closed hyperedges (the latter are called simplices). A set of vertices (a candidate hyperedge) is called open if its vertices form a clique in the 2-section of the hypergraph (the graph obtained by making the vertices of every hyperedge a clique) but it is not contained in any hyperedge of the hypergraph. In contrast, it is called closed if it is one of the hyperedges of the hypergraph or is contained in one. We consider the following classification problem: for a given subhypergraph of ten vertices, determine whether its vertices form a closed hyperedge in the original hypergraph or not.
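The open/closed distinction can be sketched with a few set operations. In the sketch below, hyperedges are Python frozensets of vertex labels, `classify` returns `"closed"`, `"open"` or `None`, and all names are hypothetical; the 2-section is built by connecting every pair of vertices that share a hyperedge.

```python
from itertools import combinations

# Assumption: hyperedges are frozensets of vertex labels; names are hypothetical.


def two_section(hyperedges):
    """Edges of the 2-section: all pairs of vertices that share at least one hyperedge."""
    return {frozenset(pair) for e in hyperedges for pair in combinations(sorted(e), 2)}


def classify(vertices, hyperedges):
    """'closed' if contained in a hyperedge, 'open' if a 2-section clique that is not, else None."""
    vertices = frozenset(vertices)
    if any(vertices <= e for e in hyperedges):
        return "closed"
    edges = two_section(hyperedges)
    if all(frozenset(pair) in edges for pair in combinations(sorted(vertices), 2)):
        return "open"
    return None


E = [frozenset(e) for e in ({1, 2}, {2, 3}, {1, 3}, {3, 4})]
print(classify({1, 2, 3}, E))                               # open
print(classify({1, 2}, E))                                  # closed
print(classify({1, 2, 4}, E))                               # None
print(classify({1, 2, 3}, E + [frozenset({1, 2, 3, 5})]))   # closed
```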
In order to obtain examples for closed hyperedges, we sample the subhypergraphs induced by the vertices of hyperedges of size ten, and for open hyperedges we sample subhypergraphs induced by the vertices of hyperedges of size nine extended by an additional vertex. In this way we construct two learning tasks. First, CON10, in which we extend each size-nine hyperedge by choosing the additional vertex such that the resulting hyperedge is open (2952 closed and 4000 open examples). Second, COAUTH10, in which we randomly extend the size-nine hyperedges (as many as there are closed ones) and use coauth-DBLP for training and coauth-MAG-History & coauth-MAG-Geology for testing.
4.3 Experimental Setup
Baselines
As baselines we consider a multilayer perceptron (MLP) Rosenblatt (1961) with two hidden layers of size 4096 and an appropriately chosen last layer, and graph CNNs (GCNs) on the undirected $n$-dimensional hypercube. Every vertex of the hypercube corresponds to a subset $A \subseteq N$, and two vertices are connected by an edge if their subsets differ by exactly one element. We evaluate graph convolutional layers based on the Laplacian shift Kipf and Welling (2016) and based on the adjacency shift Sandryhaila and Moura (2012). In both cases one layer covers at most one hop.
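For reference, the hypercube shifts used by the GCN baselines can be sketched as follows (hypothetical names, bitmask-encoded signals, $W$ denoting the hypercube adjacency matrix). Since every vertex has degree $n$, the Laplacian is $L = nI - W$, and the renormalized propagation operator of Kipf and Welling (2016), $\tilde D^{-1/2} (W + I) \tilde D^{-1/2}$, reduces to $(I + W)/(n + 1)$ on this graph.

```python
import math
import random

# Assumption: signals on the hypercube are bitmask-indexed lists; names are hypothetical.


def hypercube_neighbors(n):
    """Subsets A and B (bitmasks) are adjacent if they differ in exactly one element."""
    return [[A ^ (1 << i) for i in range(n)] for A in range(2 ** n)]


def adjacency_shift(s, n):
    """(W s)_A = sum of s_B over the n neighbors B of A."""
    nbrs = hypercube_neighbors(n)
    return [sum(s[B] for B in nbrs[A]) for A in range(2 ** n)]


def laplacian_shift(s, n):
    """(L s)_A = n s_A - (W s)_A, since every vertex of the hypercube has degree n."""
    ws = adjacency_shift(s, n)
    return [n * s[A] - ws[A] for A in range(2 ** n)]


def kipf_welling_propagation(s, n):
    """Renormalized operator D^{-1/2} (W + I) D^{-1/2} s, with D the degree matrix of W + I."""
    nbrs = hypercube_neighbors(n)
    deg = [len(nbrs[A]) + 1 for A in range(2 ** n)]
    return [s[A] / deg[A] + sum(s[B] / math.sqrt(deg[A] * deg[B]) for B in nbrs[A])
            for A in range(2 ** n)]


random.seed(12)
n = 4
s = [random.uniform(-1, 1) for _ in range(2 ** n)]
ws = adjacency_shift(s, n)
assert laplacian_shift([1.0] * 2 ** n, n) == [0.0] * 2 ** n  # constants lie in the kernel of L
# On the n-regular hypercube the renormalized operator reduces to (s + W s) / (n + 1).
kw = kipf_welling_propagation(s, n)
assert all(abs(kw[A] - (s[A] + ws[A]) / (n + 1)) < 1e-12 for A in range(2 ** n))
```

Under these assumptions a one-hop Laplacian or adjacency layer has two coefficients per channel pair, while the renormalized layer has a single one.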
Our models
For our powerset CNNs (PCNs) we consider convolutional layers based on one-hop filters of two different powerset convolutions. For all types of convolutional layers we consider the following models: convolutional layers followed by an MLP with one hidden layer of size 512 as illustrated before; a pooling layer after each convolutional layer followed by the MLP; and a pooling layer after each convolutional layer followed by an accumulation step (average of the features over all subsets) as in Gilmer et al. (2017), followed by the MLP. For all models we use 32 output channels per convolutional layer and ReLU Nair and Hinton (2010) nonlinearities.
Training
We train all models for 100 epochs (passes through the training data) using the Adam optimizer Kingma and Ba (2014) with initial learning rate 0.001 and an exponential learning rate decay factor of 0.95. The learning rate decays after every epoch. We use batches of size 128 and the cross entropy loss. All our experiments were run on a server with an Intel(R) Xeon(R) CPU @ 2.00GHz with four NVIDIA Tesla T4 GPUs. Mean and standard deviation are obtained by running every experiment 20 times.
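Assuming the decay is applied once per epoch as $\eta_t = 0.001 \cdot 0.95^t$ for the 0-based epoch $t$, the schedule can be tabulated as follows; the function name is hypothetical. Under this assumption the learning rate drops from $10^{-3}$ in the first epoch to roughly $6.2 \cdot 10^{-6}$ in the last of the 100 epochs.

```python
def learning_rate(epoch, initial=0.001, decay=0.95):
    """Learning rate in the given 0-based epoch when it is multiplied by `decay` after every epoch."""
    return initial * decay ** epoch


for epoch in (0, 1, 10, 99):
    print(epoch, f"{learning_rate(epoch):.3e}")
```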
4.4 Results
Our results are summarized in Table 2. We report the test classification accuracy in percentages (for models that converged).
Patterns  Junta  Submod.  COAUTH10  CON10  DOM4  DOM6  

Baselines  
MLP    
GCN    
GCN pool    
GCN pool avg.  
GCN    
GCN pool  
GCN pool avg.  
Proposed models  
PCN  
PCN pool  
PCN pool avg.  
PCN      
PCN pool      
PCN pool avg. 
Discussion
Table 2 shows that in the synthetic tasks the powerset convolutional models (PCNs) tend to outperform the baselines, with the exception of the GCNs based on the adjacency shift of the undirected hypercube. In fact, the set of convolutional filters parametrized by these GCNs is the subset of the powerset convolutional filters associated with the symmetric difference shift (6) obtained by constraining all filter coefficients for one-element sets to be equal: $h_{\{x\}} = c$ with $c \in \mathbb{R}$, for all $x \in N$. Therefore, it is no surprise that these GCNs perform well. The restrictions placed on the filters of the Laplacian-based GCNs are stronger, since Kipf and Welling (2016) approximates the one-hop Laplacian convolution.
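The stated relation between the adjacency-based GCN filters and the symmetric difference convolution can be checked directly. In the sketch below (hypothetical names, bitmask encoding, so that the symmetric difference of $A$ and $Q$ is `A ^ Q`), a one-hop symmetric difference filter with $h_\emptyset = \theta_0$ and $h_{\{x\}} = \theta_1$ for all $x \in N$ coincides with $\theta_0 s + \theta_1 W s$, where $W$ is the adjacency matrix of the hypercube.

```python
import random

# Assumption: bitmask-indexed lists; the symmetric difference of A and Q is A ^ Q. Names are hypothetical.


def symdiff_conv(h, s):
    """Convolution for the symmetric difference shift: (h * s)_A = sum_Q h_Q s_(A xor Q)."""
    size = len(s)
    return [sum(h[Q] * s[A ^ Q] for Q in range(size)) for A in range(size)]


def adjacency_one_hop(theta0, theta1, s, n):
    """One-hop graph filter on the n-dimensional hypercube: theta0 s + theta1 W s."""
    return [theta0 * s[A] + theta1 * sum(s[A ^ (1 << i)] for i in range(n)) for A in range(2 ** n)]


random.seed(10)
n = 4
s = [random.randint(-9, 9) for _ in range(2 ** n)]
theta0, theta1 = 3, -2
h = [0] * (2 ** n)
h[0] = theta0
for i in range(n):
    h[1 << i] = theta1  # all singleton coefficients constrained to be equal
assert symdiff_conv(h, s) == adjacency_one_hop(theta0, theta1, s, n)
```

Letting the singleton coefficients differ gives the directional one-hop filters of the powerset convolution, a strictly larger filter class.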
In contrast, this trend is not as clearly visible in the tasks derived from real hypergraph data. This might be caused either by the models with fewer degrees of freedom (omnidirectional graph one-hop filters instead of directional set one-hop filters) being more robust to noisy data, or simply by the permutation equivariance of such omnidirectional filters. The powerset convolutional filters, on the other hand, are sensitive to hypergraph isomorphy, i.e., hypergraphs with the same connectivity structure but a different vertex ordering are processed differently.
Pooling
Interestingly, while reducing the hidden state by a factor of two after every convolutional layer, pooling in most cases only slightly decreases the accuracy of the PCNs in the synthetic tasks and has no impact in the other tasks. Also the influence of pooling on the GCN is more similar to the behavior of PCNs than the one for the GCN.
Equivariance
Finally, we compare models having a shift-invariant convolutional part (suffix "pool avg.") with models having a shift-equivariant convolutional part (suffix "pool"). The difference between these models is that the invariant ones have an accumulation step before the MLP, resulting in (1) the inputs to the MLP being invariant w.r.t. the shift corresponding to the specific convolutions used and (2) the MLP having far fewer parameters in its hidden layer. For the PCNs the effect of the accumulation step appears to be task dependent. For instance, in Junta, Submod., DOM4 and DOM6 it is largely beneficial, and in the others it is slightly disadvantageous. Similarly, for the GCNs the accumulation step is beneficial in Junta and disadvantageous in COAUTH10, which might be caused by the resulting models not being expressive enough due to the lack of parameters.
5 Related Work
Our work is at the intersection of geometric deep learning, generalized signal processing and set function learning. Since each of these areas is broad, due to space limitations, we will only review the work that is most closely related to ours.
Deep learning
Geometric deep learners Bronstein et al. (2016) can be broadly categorized into convolution-based approaches Bruna et al. (2013); Defferrard et al. (2016); Kipf and Welling (2016); Cohen et al. (2018); Cohen and Welling (2016) and message-passing-based approaches Gilmer et al. (2017); Selsam et al. (2018); Scarselli et al. (2009). The latter assign a hidden state to every element of the index domain (e.g., to every vertex in a graph) and use a message passing protocol to learn representations in a finite number of communication steps. Reference Gilmer et al. (2017) points out that graph CNNs are a subclass of message passing / graph neural networks (MPNNs). References Bruna et al. (2013); Defferrard et al. (2016); Kipf and Welling (2016) utilize the spectral analysis of the graph Laplacian Shuman et al. (2012) to define graph convolutions, while Cohen and Welling (2016); Cohen et al. (2018) similarly utilize group convolutions Stankovic et al. (2005) with desirable equivariances. In a similar vein, in this work we utilize the recently proposed powerset convolutions Püschel (2018) as the foundation of a generalized CNN. With respect to the latter reference, which provides the theoretical foundation for powerset convolutions, our contributions are to analyze the resulting filters from a pattern-matching perspective, to define their exact instantiations and applications in the context of neural networks, and to show that these operations are practically relevant for various tasks.
Signal processing
Set function signal processing Püschel (2018) is an instantiation of algebraic signal processing (ASP) Püschel and Moura (2008, 2006) on the powerset domain. ASP provides a theoretical framework for deriving a complete set of basic signal processing concepts, including convolution, for novel index domains, using as starting point a chosen shift to which convolutions should be equivariant. To date the approach has been used for index domains including graphs (Sandryhaila and Moura, 2012; Ortega et al., 2018; Sandryhaila and Moura, 2014), powersets (set functions) Püschel (2018), meet/join lattices Püschel (2019), and a collection of more regular domains, e.g., Püschel and Rötteler (2007); Sandryhaila et al. (2012).
Set function learning
In contrast to the set function classification problems considered in this work, most existing work on set function learning is concerned with completing a single partially observed set function Stobbe and Krause (2012); Choi et al. (2011); Sutton et al. (2012); Badanidiyuru et al. (2012); Balcan et al. (2011); Balcan and Harvey (2011); Bilmes and Bai (2017); Zaheer et al. (2017). In this context, traditional methods Choi et al. (2011); Sutton et al. (2012); Badanidiyuru et al. (2012); Balcan et al. (2011); Balcan and Harvey (2011) mainly differ in how the class of considered set functions is restricted in order to be manageable. Reference Stobbe and Krause (2012) does this by considering Walsh-Hadamard-sparse (= Fourier-sparse) set functions. Recent approaches Bilmes and Bai (2017); Zaheer et al. (2017) leverage deep learning. Reference Bilmes and Bai (2017) proposes a neural architecture for learning submodular functions and Zaheer et al. (2017) proposes one for learning set functions defined on a continuous ground set Wagstaff et al. (2019).
6 Conclusion
We introduced a convolutional neural network architecture for powerset data. We did so by utilizing novel powerset convolutions and introducing powerset pooling layers. The powerset convolutions used stem from algebraic signal processing theory Püschel and Moura (2008), a theoretical framework for porting signal processing to novel domains. Therefore, we hope that our method-driven approach can be used to specialize deep learning to other domains as well. We conclude with a list of challenges and future work directions.
Lack of data
We argue that certain components of the success of deep learning are domain independent, and our experimental results empirically support this claim to a certain degree. However, one cannot neglect the fact that data abundance is one of these components and, for the supervised learning problems on set functions considered in this paper, one that is currently lacking.
Computational complexity
Set functions are exponentially large objects. Despite the existence of fast algorithms for computing set function Fourier transforms due to their Kronecker form Püschel (2018), their computational complexity for a ground set of size $n$ is still in $O(n 2^n)$. To scale our approach to larger ground sets, e.g., to support semi-supervised learning on graphs or hypergraphs where enough data is available, one should devise methods that preserve the sparsity of the respective set function representations while filtering, pooling and applying nonlinear functions. For instance, one could leverage that the hypergraph weight set function is zero almost everywhere and that the cut/association score functions are Fourier sparse.
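As a small illustration of the sparsity argument, the sketch below draws a hypothetical random hypergraph with 20 unit-weight hyperedges of size three on $n = 10$ vertices, so the weight set function has 20 nonzero values out of 1024. A short calculation with $F = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}^{\otimes n}$, whose entries are $F_{B,A} = (-1)^{|A|}$ if $A \subseteq B$ and 0 otherwise, gives $(F \operatorname{assoc})_B = (-1)^{|B|} w(B)$ for $B \in E$ and 0 otherwise, so the association score function has exactly $|E|$ nonzero Fourier coefficients; the code confirms this numerically. Names and sizes are illustrative assumptions.

```python
import random

# Assumption: bitmask-indexed lists; the random hypergraph and all names are hypothetical.


def fourier(s):
    """F = [[1, 0], [1, -1]]^{(x) n} applied with one butterfly pass per element."""
    s = list(s)
    bit = 1
    while bit < len(s):
        for A in range(len(s)):
            if A & bit:
                s[A] = s[A ^ bit] - s[A]
        bit <<= 1
    return s


rng = random.Random(11)
n = 10
edges = set()
while len(edges) < 20:  # 20 distinct hyperedges of size three
    e = 0
    for i in rng.sample(range(n), 3):
        e |= 1 << i
    edges.add(e)

weight = [1 if A in edges else 0 for A in range(2 ** n)]                    # zero almost everywhere
assoc = [sum(1 for B in edges if (B & ~A) == 0) for A in range(2 ** n)]      # hyperedges inside A
assoc_hat = fourier(assoc)
support = {B for B in range(2 ** n) if assoc_hat[B] != 0}
print(len(edges), len(support))  # 20 20
```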
Domain dependence
It is not straightforward to take filters learned on one index domain and apply them on another index domain even if , as changing the ground set means changing the filter space. One faces the same problem when learning graph convolutional filters where changing the underlying graph also changes the filter space Yi et al. (2017).
Acknowledgements
We thank Max Horn for insightful discussions and his extensive feedback.
References
 TensorFlow: a system for largescale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, pp. 265–283. External Links: ISBN 9781931971331, Link Cited by: §3.
 Sketching valuation functions. In Proceedings of the twentythird annual ACMSIAM symposium on Discrete Algorithms, pp. 1025–1035. Cited by: §5.
 Learning valuation functions. CoRR abs/1108.5669. External Links: Link, 1108.5669 Cited by: §5.
 Learning submodular functions. In Proceedings of the fortythird annual ACM symposium on Theory of computing, pp. 793–802. Cited by: §5.
 Simplicial closure and higherorder link prediction. Proceedings of the National Academy of Sciences 115 (48), pp. E11221–E11230. Cited by: 2nd item, §4.2, §4.2, §4.2.
 Deep submodular functions. CoRR abs/1701.08939. External Links: Link, 1701.08939 Cited by: §5.
 Models in cooperative game theory. Vol. 556, Springer Science & Business Media. Cited by: §2.1.
 Geometric deep learning: going beyond euclidean data. CoRR abs/1611.08097. External Links: Link, 1611.08097 Cited by: §1, §3, §5.
 Spectral networks and locally connected networks on graphs. CoRR abs/1312.6203. External Links: Link, 1312.6203 Cited by: §1, Table 1, §5.
 Almost tight upper bound for finding fourier coefficients of bounded pseudoboolean functions. Journal of Computer and System Sciences 77 (6), pp. 1039–1053. Cited by: §5.
 Spherical cnns. CoRR abs/1801.10130. External Links: Link, 1801.10130 Cited by: Table 1, §5.
 Group equivariant convolutional networks. In Proceedings of the 33rd International Conference on International Conference on Machine Learning  Volume 48, ICML’16, pp. 2990–2999. External Links: Link Cited by: Table 1, §5.
 Combinatorial auctions: a survey. INFORMS Journal on computing 15 (3), pp. 284–309. Cited by: §2.1.
 A brief introduction to fourier analysis on the boolean cube. Theory of Computing, pp. 1–20. Cited by: §1, §5.
 Convolutional neural networks on graphs with fast localized spectral filtering. CoRR abs/1606.09375. External Links: Link, 1606.09375 Cited by: §1, §5.
 Neural message passing for quantum chemistry. CoRR abs/1704.01212. External Links: Link, 1704.01212 Cited by: §1, §4.3, §5.
 Online submodular maximization under a matroid constraint with application to learning assignments. CoRR abs/1407.1082. External Links: Link, 1407.1082 Cited by: §2.1.
 Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §1.
 Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §1.
Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §1.
 Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.3.
Semi-supervised classification with graph convolutional networks. CoRR abs/1609.02907. Cited by: §1, §2, §4.3, §4.4, §5.
Submodular function maximization. Cited by: §4.1.
Near-optimal observation selection using submodular functions. In AAAI, Vol. 7, pp. 1650–1654. Cited by: §2.1.
ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §1.
Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814. Cited by: §4.3.
 Analysis of boolean functions. Cambridge University Press. Cited by: §1, §5.
Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE 106 (5), pp. 808–828. ISSN 0018-9219. Cited by: §1, §5.
Submodular relaxation for inference in Markov random fields. CoRR abs/1501.03771. Cited by: §2.1.
Algebraic signal processing theory: foundation and 1-D time. IEEE Trans. on Signal Processing 56 (8), pp. 3572–3585. Cited by: §1, §5, §6.
Algebraic signal processing theory: 2-D hexagonal spatial lattice. IEEE Trans. on Image Processing 16 (6), pp. 1506–1521. Cited by: §5.
A discrete signal processing framework for meet/join lattices with applications to hypergraphs and trees. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5371–5375. ISSN 2379-190X. Cited by: §5.
Algebraic signal processing theory. CoRR abs/cs/0612077. Cited by: §1, §5.
A discrete signal processing framework for set functions. In Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Cited by: 1st item, §1, §2, §2, §2, Table 1, §5, §5, §6.
Equivariance through parameter-sharing. In Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, International Convention Centre, Sydney, Australia, pp. 2892–2901. Cited by: §1.
Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pp. 91–99. Cited by: §1.
U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §1.
Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Technical report, Cornell Aeronautical Lab Inc, Buffalo NY. Cited by: §4.3.
Algebraic signal processing theory: 1-D nearest-neighbor models. IEEE Trans. on Signal Processing 60 (5), pp. 2247–2259. Cited by: §5.
Discrete signal processing on graphs: frequency analysis. IEEE Transactions on Signal Processing 62 (12), pp. 3042–3054. ISSN 1053-587X. Cited by: §5.
Discrete signal processing on graphs. CoRR abs/1210.4752. Cited by: Table 1, §4.3, §5.
 The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §1, §5.
A fast Hadamard transform for signals with sublinear sparsity in the transform domain. IEEE Transactions on Information Theory 61 (4), pp. 2115–2132. Cited by: §3.
Learning a SAT solver from single-bit supervision. CoRR abs/1802.03685. Cited by: §1, §5.
The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. arXiv preprint arXiv:1211.0053. Cited by: §1, §1, Table 1, §2, §5, §5.
GraphVAE: towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pp. 412–422. Cited by: §1.
Fourier analysis on finite groups with applications in signal processing and system design. John Wiley & Sons. Cited by: Table 1, §5.
Learning Fourier sparse set functions. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, N. D. Lawrence and M. Girolami (Eds.), Proceedings of Machine Learning Research, Vol. 22, La Palma, Canary Islands, pp. 1125–1133. Cited by: §1, §2, §5.
Computing the moments of k-bounded pseudo-Boolean functions over Hamming spheres of arbitrary radius in polynomial time. Theoretical Computer Science 425, pp. 58–74. Cited by: §5.
WaveNet: a generative model for raw audio. SSW 125. Cited by: §1.
 On the limitations of representing functions on sets. arXiv preprint arXiv:1901.09006. Cited by: §5.
GraphGAN: graph representation learning with generative adversarial nets. In Thirty-Second AAAI Conference on Artificial Intelligence. Cited by: §1.
SyncSpecCNN: synchronized spectral CNN for 3D shape segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2282–2290. Cited by: §6.
Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine 13 (3), pp. 55–75. Cited by: §1.
Deep sets. arXiv preprint arXiv:1703.06114. Cited by: §5.