Determining Structural Properties of Artificial Neural Networks Using Algebraic Topology


Abstract

Artificial Neural Networks (ANNs) are widely used for approximating complex functions. The process usually followed to define the most appropriate architecture for an ANN given a specific function is mostly empirical. Once this architecture has been defined, weights are usually optimized according to the error function. On the other hand, we observe that ANNs can be represented as graphs and that their topological 'fingerprints' can be obtained using Persistent Homology (PH). In this paper, we describe a proposal focused on designing more principled architecture search procedures. To do this, we analyze different architectures for solving problems related to a heterogeneous set of datasets. The results of the evaluation corroborate that PH effectively characterizes ANN invariants: when ANN density (layers and neurons) or sample feeding order is the only difference, the PH topological invariants hold; conversely, for different sub-problems (i.e. different labels), PH varies. This approach based on topological analysis helps towards the goal of designing more principled architecture search procedures and gaining a better understanding of ANNs.

1 Introduction

Different types of ANNs can be trained for the same problem. Even for the same type of neural network, there are many hyperparameters, such as the number of neurons per layer or the number of layers. In addition, Stochastic Gradient Descent (SGD) is a highly stochastic process: the final weights for the same set of hyperparameters can vary depending on the network initialization or the sample training order.

The aim of this paper is to find invariants that can group together neural networks trained for the same problem, independently of the particular architecture or initialization. In this work, we focus on Fully Connected Neural Networks (FCNNs) for the sake of simplicity. Given an ANN, we can represent it with a directed weighted graph, and it is possible to associate certain topological objects to such graphs. See Jonsson [2007] for a complete reference on graph topology.

We use a topological object from the field of algebraic topology, namely Persistent Homology (PH). For a complete analysis of topological objects on graphs, see Aktas et al. [2019].

2 Related Work

One of the fundamental papers of Topological Data Analysis (TDA) is Carlsson [2009], which suggests the use of Algebraic Topology to obtain qualitative information and deal with metrics for large amounts of data. For an extensive overview of simplicial topology on graphs see Giblin [1977], Jonsson [2007]. Aktas et al. [2019] provide a thorough analysis of PH methods.

More recently, a number of publications have dealt with the study of the capacity of ANNs using PH. Guss and Salakhutdinov [2018a] characterize the learnability of different ANN architectures by computable measures of data complexity. Rieck et al. [2019b] introduce the neural persistence metric, a complexity measure based on TDA on weighted stratified graphs. Donier [2019] proposes the concept of spatial capacity allocation analysis. Konuk and Smith [2019] propose an empirical study of how ANNs handle changes in the topological complexity of the input data.

In terms of pure ANN analysis, there are relevant works, like Hofer et al. [2020], that study topological regularization. Clough et al. [2020] introduce a method for training neural networks for image segmentation with prior topology knowledge, specifically via Betti numbers. Corneanu et al. [2020] estimate the performance gap between training and testing without the need of a testing dataset.

On the other hand, topological analysis of decision boundaries has been a very prolific area. Ramamurthy et al. [2019] propose a labeled Vietoris-Rips complex to perform PH inference of decision boundaries for quantification of ANN complexity.

Naitzat et al. [2020] experiment on the PH of a wide range of point cloud input datasets for binary classification problems, showing that ANNs transform a topologically rich dataset (in terms of Betti numbers) into a topologically simpler one as it passes through the layers. They also verify that the reduction in Betti numbers is significantly faster for ReLU activations than for hyperbolic tangent activations.

In Liu [2020], they obtain certain geometrical and topological properties of decision regions for ANN models, and provide some principled guidance to designing and regularizing ANNs. Additionally, they use curvatures of decision boundaries in terms of network weights, and the rotation index theorem together with the Gauss-Bonnet-Chern theorem.

Regarding ANN representation, one of the most related works to ours, Gebhart et al. [2019], focuses on topological neural network representation. They introduce a method for computing PH over the graphical activation structure of neural networks, which provides access to the task-relevant substructures activated throughout the network for a given input.

Interestingly, in Watanabe and Yamana [2020], authors work on ANN representation through simplicial complexes based on deep Taylor decomposition and they calculate the PH of ANNs in this representation. In Chowdhury et al. [2019], they use directed homology to represent feed-forward fully connected neural network architectures. They show that the path homology of these networks is non-trivial in higher dimensions and depends on the number and size of the network layers. They investigate homological differences between distinct neural network architectures.

There has been a considerable growth of interest in applied topology in recent years. This increase in popularity and the development of new software libraries3, along with the growth of computational capabilities, have enabled new works. Some of the most remarkable libraries are Ripser Tralie et al. [2018] and Flagser Lütgehetmann et al. [2019], which focus on the efficient computation of PH. For GPU-accelerated computation of Vietoris-Rips PH, Ripser++ Zhang et al. [2020] offers a speedup of up to 30x in execution time with respect to the original Ripser. The Python library we are using, Giotto-TDA Tauzin et al. [2020], builds on both of the above libraries.

We contribute to the field of Topology applied to Deep Learning by effectively characterizing ANNs using the PH topological object, unlike other works Corneanu et al. [2019], Guss and Salakhutdinov [2018b] that approximate the representation of neural networks in terms of the input space. In this way we provide a topological invariant that relates ANNs trained for similar problems, even if they have different architectures, and differentiates ANNs targeting distinct problems. These topological properties are useful for understanding the underlying topological complexity and the topological relationships of ANNs.

3 Methods

We obtain topological invariants associated to neural networks that solve a given problem. To do so, we use the PH of the graph associated to an ANN. We compute the PH for various networks applied to different tasks and then compare all the diagrams for each one of the problems. See the code4 for further details.

3.1 Experimental Settings

We start with some definitions on algebraic topology:

Definition 1 (simplex)

A $k$-simplex is a $k$-dimensional polytope which is the convex hull of its $k+1$ vertices $v_0, \dots, v_k$, i.e. the set of all convex combinations $\sum_{i=0}^{k} \lambda_i v_i$ where $\lambda_i \ge 0$ for all $i$ and $\sum_{i=0}^{k} \lambda_i = 1$.

Some examples of simplices are:

  • 0-simplex is a point.

  • 1-simplex is a line segment.

  • 2-simplex is a triangle.

  • 3-simplex is a tetrahedron.

Definition 2 (simplicial complex)

A simplicial complex $K$ is a set of simplices that satisfies the following conditions:

  1. Every subset (or face) of a simplex in $K$ also belongs to $K$.

  2. For any two simplices $\sigma_1$ and $\sigma_2$ in $K$, if $\sigma_1 \cap \sigma_2 \neq \emptyset$, then $\sigma_1 \cap \sigma_2$ is a common subset, or face, of both $\sigma_1$ and $\sigma_2$.

Figure 1: Simplicial complex example
Definition 3 (directed flag complex)

Let $G = (V, E)$ be a directed graph. The directed flag complex $dFl(G)$ is defined to be the ordered simplicial complex whose $k$-simplices are all ordered $(k+1)$-cliques, i.e., $(k+1)$-tuples $(v_0, v_1, \dots, v_k)$ such that $v_i \in V$ for all $i$, and $(v_i, v_j) \in E$ for $i < j$.
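
To make this definition concrete, the following sketch (plain Python, ours rather than the paper's code) enumerates the simplices of the directed flag complex of a toy digraph by checking the ordered-clique condition directly; it is meant only as an illustration, not as an efficient implementation.

```python
from itertools import combinations, permutations

def directed_flag_complex(vertices, edges, max_dim=3):
    """Enumerate simplices of the directed flag complex of a digraph.

    A k-simplex is an ordered (k+1)-tuple (v0, ..., vk) such that
    (vi, vj) is an edge for every i < j.
    """
    edge_set = set(edges)
    simplices = {0: [(v,) for v in vertices], 1: list(edges)}
    for k in range(2, max_dim + 1):
        simplices[k] = []
        for subset in combinations(vertices, k + 1):
            for ordering in permutations(subset):
                if all((ordering[i], ordering[j]) in edge_set
                       for i in range(k + 1) for j in range(i + 1, k + 1)):
                    simplices[k].append(ordering)
    return simplices

# Toy digraph: a directed 3-clique (0 -> 1 -> 2, 0 -> 2) plus an extra edge.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (0, 2), (2, 3)]
for dim, simps in directed_flag_complex(V, E).items():
    print(dim, simps)
```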

We define the boundary map, $\partial_i$, as a function that maps an $i$-simplex to the sum of its $(i-1)$-dimensional faces. Formally speaking, for an $i$-simplex $\sigma = [v_0, \dots, v_i]$, its boundary is:

$$\partial_i \sigma = \sum_{j=0}^{i} [v_0, \dots, \hat{v}_j, \dots, v_i] \qquad (1)$$

where the hat indicates that $v_j$ is omitted.

We can expand this definition to $i$-chains. For an $i$-chain $c = \sum_j c_j \sigma_j$, $\partial_i c = \sum_j c_j \partial_i \sigma_j$.

We can now distinguish two special types of chains using the boundary map that will be useful to define homology:

  • The first one is an $i$-cycle, which is defined as an $i$-chain with empty boundary. In other words, an $i$-chain $c$ is an $i$-cycle if and only if $\partial_i c = 0$, i.e. $c \in \ker(\partial_i)$.

  • An $i$-chain $c$ is an $i$-boundary if there exists an $(i+1)$-chain $d$ such that $c = \partial_{i+1} d$, i.e. $c \in \mathrm{im}(\partial_{i+1})$.

We associate to the ANN a weighted directed graph that is analyzed as a simplicial complex consisting of the union of points, edges, triangles, tetrahedrons and higher-dimensional polytopes (these are the elements referred to as simplices). As a formal definition of our central object, the graph:

Definition 4 (graph)

A graph $G$ is a pair $(V, E)$, where $V$ is a finite set referred to as the vertices or nodes of $G$, and $E$ is a subset of the set of unordered pairs of distinct points in $V$, which we call the edges of $G$. Geometrically, the pair $\{v_1, v_2\}$ indicates that the vertices $v_1$ and $v_2$ are adjacent in $G$. A directed graph, or digraph, is similarly a pair $(V, E)$ of vertices and edges, except that the edges are ordered pairs of distinct vertices, i.e., the pair $(v_1, v_2)$ indicates that there is an edge from $v_1$ to $v_2$ in $G$. In a digraph, we allow reciprocal edges, i.e., both $(v_1, v_2)$ and $(v_2, v_1)$ may be edges in $E$, but we exclude loops, i.e., edges of the form $(v, v)$.

Given a trained ANN, we take the collection of network connections as directed and weighted edges that join neurons, represented by graph nodes. Each bias is considered as a new edge that joins a new source vertex to the corresponding neuron, weighted by the bias value. Note that in this representation we lose the information about the activation functions, for simplicity and to avoid representing the network as a multiplex network. Bias information could also have been ignored because, as we will see, it is not very informative in terms of topology.

For negative edge weights, we decide to reverse the edge direction and keep the absolute value of the weight. We discard simply taking the absolute value of the weights because ANNs are not invariant under weight sign transformations. We also decided not to use PH intervals in which the weight is zero, because the treatment of the zero value is not topologically coherent.

We then normalize the weights of all the edges as expressed in Equation 2, where $w$ is the weight to normalize, $W$ is the set of all weights, and $\epsilon$ is a smoothing parameter that we set to 0.000001. This smoothing parameter is necessary because we want to avoid normalized edge weights of 0, since a weight of 0 implies a lack of connection.

(2)

Given a weighted directed graph obtained from a trained ANN, we define a directed flag complex associated to it.
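
As an illustration of the construction above, the following sketch builds such a graph from the weight matrices and bias vectors of a hypothetical FCNN using networkx. The reversal of negative edges, the extra bias vertices and the value of the smoothing parameter follow the description in the text, but the vertex naming and the exact min-max form of the normalization are our assumptions (Equation 2 is paper-specific and not reproduced here).

```python
import numpy as np
import networkx as nx

EPS = 1e-6  # smoothing parameter, as stated in the text (0.000001)

def ann_to_digraph(weight_matrices, biases):
    """Build a weighted digraph from FCNN parameters.

    weight_matrices[l] has shape (n_out, n_in): entry (j, i) is the weight of
    the connection from neuron i of layer l to neuron j of layer l+1. Negative
    weights are handled by reversing the edge direction and keeping |w|.
    """
    G = nx.DiGraph()
    raw_edges = []
    for l, (W, b) in enumerate(zip(weight_matrices, biases)):
        n_out, n_in = W.shape
        for j in range(n_out):
            for i in range(n_in):
                w = W[j, i]
                src, dst = f"L{l}_N{i}", f"L{l+1}_N{j}"
                if w < 0:                     # reverse direction, keep |w|
                    src, dst = dst, src
                raw_edges.append((src, dst, abs(w)))
            # One extra source vertex per bias, edge weighted by |bias|.
            src, dst = f"bias_L{l+1}_N{j}", f"L{l+1}_N{j}"
            if b[j] < 0:
                src, dst = dst, src
            raw_edges.append((src, dst, abs(b[j])))
    # Normalization (assumed min-max form): map weights into (EPS, 1) so that
    # no edge ends up with weight 0, which would mean "no connection".
    ws = np.array([w for _, _, w in raw_edges])
    lo, hi = ws.min(), ws.max()
    for src, dst, w in raw_edges:
        G.add_edge(src, dst, weight=(w - lo) / (hi - lo + EPS) * (1 - EPS) + EPS)
    return G

# Hypothetical 2-layer network with random weights.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [rng.normal(size=4), rng.normal(size=2)]
print(ann_to_digraph(Ws, bs).number_of_edges())
```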

Next we define the topological object that we will use to analyze the directed flag complex associated with neural networks.

Definition 5 (homology group)

Given these two special subspaces, the $i$-cycles $Z_i(K)$ and the $i$-boundaries $B_i(K)$ of $K$, we now take the quotient space of $B_i(K)$ as a subset of $Z_i(K)$. In this quotient space, there are only the $i$-cycles that do not bound an $(i+1)$-complex, i.e. the $i$-voids of $K$. This quotient space is called the $i$-th homology group of the simplicial complex $K$:

$$H_i(K) = Z_i(K) / B_i(K) = \ker(\partial_i) / \mathrm{im}(\partial_{i+1}) \qquad (3)$$

where $\ker$ and $\mathrm{im}$ are the function kernel and image respectively.

The dimension of the $i$-th homology group is called the $i$-th Betti number of $K$, $\beta_i(K)$, where:

$$\beta_i(K) = \dim(H_i(K)) = \dim(\ker(\partial_i)) - \dim(\mathrm{im}(\partial_{i+1})) \qquad (4)$$

The $i$-th Betti number is the number of $i$-dimensional voids in the simplicial complex ($\beta_0$ gives the number of connected components of the simplicial complex, $\beta_1$ gives the number of loops and so on). For a deeper introduction to algebraic topology and computational topology, we refer to Edelsbrunner and Harer [2009], Ghrist [2014].
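
As a worked example of Equations (3) and (4), the following numpy sketch computes the Betti numbers of a hollow triangle (three vertices, three edges, no 2-simplex) from the ranks of its boundary matrices over the reals; the complex and the code are illustrative and not taken from the paper.

```python
import numpy as np

# Hollow triangle: vertices {a, b, c}, edges {ab, bc, ca}, no 2-simplex.
# Boundary matrix d1 maps edges (columns) to vertices (rows).
d1 = np.array([
    [-1,  0,  1],   # a: -ab        + ca
    [ 1, -1,  0],   # b:  ab - bc
    [ 0,  1, -1],   # c:       bc - ca
])

rank_d1 = np.linalg.matrix_rank(d1)        # = 2
n_vertices, n_edges = d1.shape

# beta_0 = dim ker(d0) - rank(d1); d0 is the zero map, so dim ker(d0) = #vertices.
beta_0 = n_vertices - rank_d1
# beta_1 = dim ker(d1) - rank(d2); there are no 2-simplices, so rank(d2) = 0.
beta_1 = (n_edges - rank_d1) - 0

print(beta_0, beta_1)   # 1 connected component, 1 loop
```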

We are going to work with a family of simplicial complexes, $K_\epsilon$, for a range of values of $\epsilon \in \mathbb{R}$, so that the complex at step $\epsilon_1$ is embedded in the complex at $\epsilon_2$ for $\epsilon_1 \le \epsilon_2$, i.e. $K_{\epsilon_1} \subseteq K_{\epsilon_2}$. This nested family of simplicial complexes is called a filtration.

Figure 2: Simplicial complex filtration

Given a filtration, one can look at the birth, the time at which a homological feature appears, and the death, the time at which it disappears. PH tracks the birth and death of these homological features in $K_\epsilon$ for different values of $\epsilon$. The lifespan of each homological feature can be represented as an interval $(\epsilon_{\mathrm{birth}}, \epsilon_{\mathrm{death}})$. Given a filtration, one can record all these intervals in a Persistence Barcode (PB) Carlsson [2009], or in a Persistence Diagram (PD), as a multiset of intervals.

For the simplicial complex associated to the ANN directed weighted graph, we use the edge weight as the filtration parameter. This filtration gives a collection of nested directed weighted graphs, or simplicial complexes, $K_{\epsilon_0} \subseteq K_{\epsilon_1} \subseteq \dots \subseteq K_{\epsilon_n}$, where $0 < \epsilon_0 \le \epsilon_1 \le \dots \le \epsilon_n \le 1$ (remember that edge weights are normalized).
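
In practice, the PH of this edge-weight filtration of the directed flag complex can be computed with Giotto-TDA, which wraps Flagser. The sketch below is a minimal, assumption-laden example: it uses a random placeholder digraph instead of a real trained ANN, and it assumes the FlagserPersistence interface in which each sample is a sparse adjacency matrix whose stored entries are the edge filtration values.

```python
import numpy as np
import networkx as nx
from scipy.sparse import csr_matrix
from gtda.homology import FlagserPersistence

# Placeholder digraph; in our setting this would be the weighted digraph
# built from a trained ANN (see the earlier construction sketch).
G = nx.gn_graph(20, seed=0)
rng = np.random.default_rng(0)
for u, v in G.edges:
    G[u][v]["weight"] = rng.uniform(1e-6, 1.0)   # normalized edge weights in (0, 1]

# Sparse adjacency matrix whose stored entries are edge filtration values.
adj = csr_matrix(nx.to_scipy_sparse_array(G, weight="weight"))

# Persistent homology of the directed flag complex in dimensions 0, 1 and 2.
fp = FlagserPersistence(homology_dimensions=(0, 1, 2), n_jobs=-1)
diagrams = fp.fit_transform([adj])   # array of shape (1, n_intervals, 3)
print(diagrams.shape)
```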

As mentioned previously, our interest in this paper is to compare PDs from two different simplicial complexes. There are two distances traditionally used to compare PDs: the Wasserstein distance and the Bottleneck distance. Their stability with respect to perturbations of PDs has been the object of different studies Chazal et al. [2012], Cohen-Steiner et al. [2005].

In order to make computations feasible, we filter the PDs by limiting the minimum interval size, i.e. by setting a threshold on the interval length. Additionally, for computing distances, we need to remove infinity values. As we are only interested in deaths up to the maximum weight value, we replace all the infinity values by the maximum (normalized) weight value.
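
A minimal sketch of this post-processing step, assuming the (birth, death, homology dimension) triple convention for diagram rows; the threshold value is illustrative, not the one used in the paper.

```python
import numpy as np

def clean_diagram(diagram, min_lifetime=0.01, max_weight=1.0):
    """Drop short-lived intervals and cap infinite deaths.

    diagram: array of shape (n_points, 3) with rows (birth, death, dim).
    """
    diag = diagram.copy()
    # Replace infinite deaths by the maximum (normalized) weight value.
    diag[np.isinf(diag[:, 1]), 1] = max_weight
    # Keep only intervals whose lifetime exceeds the threshold.
    lifetimes = diag[:, 1] - diag[:, 0]
    return diag[lifetimes > min_lifetime]

example = np.array([[0.1, 0.12, 0.0], [0.2, np.inf, 1.0], [0.05, 0.9, 0.0]])
print(clean_diagram(example))
```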

Definition 6 (Wasserstein distance)

The $p$-Wasserstein distance between two PDs $D_1$ and $D_2$ is the infimum over all bijections $\gamma : D_1 \to D_2$ of:

$$W_p(D_1, D_2) = \inf_{\gamma : D_1 \to D_2} \left( \sum_{x \in D_1} \lVert x - \gamma(x) \rVert_\infty^p \right)^{1/p} \qquad (5)$$

where $\lVert \cdot \rVert_\infty$ is defined for $(x, y) \in \mathbb{R}^2$ by $\max\{|x|, |y|\}$. The limit $p \to \infty$ defines the Bottleneck distance. More explicitly, it is the infimum over the same set of bijections of the value

$$W_\infty(D_1, D_2) = \inf_{\gamma : D_1 \to D_2} \sup_{x \in D_1} \lVert x - \gamma(x) \rVert_\infty \qquad (6)$$

The set of PDs together with any of the distances described above is a metric space. We work on this metric space to analyze the similarity between simplicial complexes associated to neural networks.

In order to apply PD distances to real cases, we need to make the calculations computationally feasible. Wasserstein distance calculations are computationally hard for large PDs (each PD of our ANN models has about a million persistence intervals). We therefore use a vectorized version of PDs, also called PD discretization. Such vectorized summaries have been proposed and used in the recent literature Adams et al. [2017], Berry et al. [2020], Bubenik [2015], Lawson et al. [2019], Rieck et al. [2019a].

For our topological calculations (persistent homology, discretization, and diagram distance calculation) we used Giotto-TDA Tauzin et al. [2020] and the following supported vectorized persistence summaries (see the sketch after this list):

  • Persistence landscape.

  • Weighted silhouette.

  • Heat vectorizations.
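
The sketch below shows how these three discretizations and the corresponding pairwise distance matrices could be computed with gtda.diagrams. The diagrams here are toy Vietoris-Rips diagrams built from random point clouds, standing in for the directed flag complex diagrams of the trained networks; all parameter values are illustrative.

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import PairwiseDistance, PersistenceLandscape, Silhouette, HeatKernel

# Toy stand-ins for the PDs of several trained networks.
rng = np.random.default_rng(0)
point_clouds = rng.random((6, 100, 3))                 # 6 "networks"
diagrams = VietorisRipsPersistence(homology_dimensions=(0, 1)).fit_transform(point_clouds)

# Explicit vectorizations (the three discretizations used in the paper).
landscapes = PersistenceLandscape(n_layers=5).fit_transform(diagrams)
silhouettes = Silhouette(power=1.0).fit_transform(diagrams)
heat = HeatKernel(sigma=0.1).fit_transform(diagrams)

# Pairwise distance matrices between "networks", one per discretization.
distances = {
    metric: PairwiseDistance(metric=metric, n_jobs=-1).fit_transform(diagrams)
    for metric in ("landscape", "silhouette", "heat")
}
print({m: d.shape for m, d in distances.items()})
```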

Definition 7 (Persistence landscape)

Given a collection of intervals $\{(b_i, d_i)\}_{i \in I}$ that compose a PD, its persistence landscape is the set of functions $\lambda_k : \mathbb{R} \to \mathbb{R}$, for $k \in \mathbb{N}$, defined by letting $\lambda_k(t)$ be the $k$-th largest value of the set $\{\Lambda_i(t)\}_{i \in I}$, where:

$$\Lambda_i(t) = \max\{0, \min\{t - b_i, d_i - t\}\} \qquad (7)$$

and $\lambda_k(t) = 0$ if the set contains fewer than $k$ elements. The function $\lambda_k$ is referred to as the $k$-layer of the persistence landscape.

Now we define a vectorization of the set of PDs on the set of real-valued functions on $\mathbb{R}$. For any $p \in [1, \infty]$ we can restrict attention to PDs $D$ whose associated persistence landscape $\lambda$ is $p$-integrable, that is to say,

$$\lVert \lambda \rVert_p = \left( \sum_{k \in \mathbb{N}} \lVert \lambda_k \rVert_p^p \right)^{1/p} \qquad (8)$$

is finite. In this case, we refer to Equation (8) as the $p$-landscape norm of $D$. For $p = 2$, we define the value of the landscape kernel or similarity of two vectorized PDs $D$ and $E$ as

$$\langle \lambda, \mu \rangle = \left( \sum_{k \in \mathbb{N}} \int_{\mathbb{R}} |\lambda_k(x) - \mu_k(x)|^2 \, dx \right)^{1/2} \qquad (9)$$

where $\lambda$ and $\mu$ are their associated persistence landscapes.

The persistence landscape $\lambda$ is geometrically described as follows. For each $(b_i, d_i)$, we draw an isosceles triangle with base the interval $(b_i, d_i)$ on the horizontal $t$-axis, and sides with slope $1$ and $-1$. This subdivides the plane into a number of polygonal regions that we label by the number of triangles containing them. If $L_k$ is the union of the polygonal regions with value at least $k$, then the graph of $\lambda_k$ is the upper contour of $L_k$, with $\lambda_k(a) = 0$ if the vertical line $t = a$ does not intersect $L_k$.
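
A direct numpy transcription of Definition 7 (tent functions and $k$-th largest value), kept independent of any library for clarity; the example diagram is arbitrary.

```python
import numpy as np

def landscape_layers(intervals, grid, n_layers=3):
    """Persistence landscape layers lambda_k evaluated on `grid`.

    intervals: array of (birth, death) pairs; grid: 1-D array of t values.
    Returns an array of shape (n_layers, len(grid)).
    """
    b = intervals[:, 0][:, None]
    d = intervals[:, 1][:, None]
    t = grid[None, :]
    # Tent function of each interval: max(0, min(t - b, d - t)).
    tents = np.maximum(0.0, np.minimum(t - b, d - t))
    # lambda_k(t) is the k-th largest tent value at t (0 if fewer than k tents).
    tents_sorted = -np.sort(-tents, axis=0)
    layers = np.zeros((n_layers, grid.size))
    k = min(n_layers, tents_sorted.shape[0])
    layers[:k] = tents_sorted[:k]
    return layers

pd = np.array([[0.1, 0.9], [0.3, 0.6], [0.4, 0.5]])
print(landscape_layers(pd, np.linspace(0, 1, 11)).round(2))
```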

Definition 8 (Weighted silhouette)

Let $D = \{(b_i, d_i)\}_{i \in I}$ be a PD and $w = \{w_i\}_{i \in I}$ a set of positive real numbers. The silhouette of $D$ weighted by $w$ is the function $\phi : \mathbb{R} \to \mathbb{R}$ defined by:

$$\phi(t) = \frac{\sum_{i \in I} w_i \Lambda_i(t)}{\sum_{i \in I} w_i} \qquad (10)$$

where

$$\Lambda_i(t) = \max\{0, \min\{t - b_i, d_i - t\}\} \qquad (11)$$

When $w_i = |d_i - b_i|^p$ for $0 < p \le \infty$, we refer to $\phi$ as the $p$-power-weighted silhouette of $D$. It defines a vectorization of the set of PDs on the vector space of continuous real-valued functions on $\mathbb{R}$.

Definition 9 (Heat vectorizations)

Considering the points of a PD as the support of Dirac deltas, one can construct, for any $t > 0$, two vectorizations of the set of PDs to the set of continuous real-valued functions on the first quadrant $\mathbb{R}^2_{>0}$. The heat vectorization is constructed for every PD $D$ by solving the heat equation:

$$\begin{cases} \Delta_x(u) = \partial_t u & \text{on } \Omega \times \mathbb{R}_{>0} \\ u = 0 & \text{on } \{x_1 = x_2\} \times \mathbb{R}_{\ge 0} \\ u = \sum_{p \in D} \delta_p & \text{on } \Omega \times \{0\} \end{cases} \qquad (12)$$

where $\Omega = \{(x_1, x_2) \in \mathbb{R}^2 \mid x_1 \le x_2\}$, then solving the same equation after precomposing the data of Equation (12) with the change of coordinates $(x_1, x_2) \mapsto (x_2, x_1)$, and defining the image of $D$ to be the difference between these two solutions at the chosen time $t$.

We recall that the solution to the heat equation with initial condition given by a Dirac delta supported at $p \in \mathbb{R}^2$ is:

$$u(x, t) = \frac{1}{4 \pi t} \exp\left( - \frac{\lVert p - x \rVert^2}{4t} \right) \qquad (13)$$

To highlight the connection with normally distributed random variables, it is customary to use the change of variable $\sigma = \sqrt{2t}$.
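
A small numpy sketch of this construction under the stated closed-form solution: Gaussians of variance $2t$ are centered on the diagram points, the mirrored (swapped-coordinate) copies are subtracted so that the result vanishes on the diagonal, and the difference is sampled on a grid. The grid resolution and the time $t$ are illustrative choices.

```python
import numpy as np

def heat_vectorization(diagram, t=0.01, grid_size=50, x_max=1.0):
    """Approximate heat vectorization of a PD on a grid over [0, x_max]^2."""
    xs = np.linspace(0.0, x_max, grid_size)
    X, Y = np.meshgrid(xs, xs)                      # grid of (x1, x2) points

    def gaussian_sum(points):
        # Sum of heat-equation solutions (Eq. 13) for deltas at `points`.
        out = np.zeros_like(X)
        for (p1, p2) in points:
            out += np.exp(-((X - p1) ** 2 + (Y - p2) ** 2) / (4 * t)) / (4 * np.pi * t)
        return out

    mirrored = diagram[:, [1, 0]]                   # swap (birth, death)
    return gaussian_sum(diagram) - gaussian_sum(mirrored)

pd = np.array([[0.2, 0.8], [0.5, 0.7]])
image = heat_vectorization(pd)
print(image.shape, image.max().round(3))
```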

For a complete reference on vectorized persistence summaries and PH approximated metrics, see Tauzin et al. [2020], Berry et al. [2020] and Giotto-TDA package documentation appendix5.

3.2 Datasets

To determine the topological structural properties of trained ANNs, we select different kinds of datasets. We opt for three well-known benchmarks in the machine learning community:

  1. the MNIST6 dataset for classifying handwritten digit images,

  2. the CIFAR-107 (CIFAR) dataset for classifying ten different objects,

  3. and the Language Identification Wikipedia dataset8 for classifying 7 different languages.

We selected the MNIST dataset since it is based on images but, at the same time, it does not require a CNN to obtain competitive results. For this dataset, we only used a Fully Connected Neural Network with Dropout. On the other hand, CIFAR, a more difficult benchmark, requires a CNN to obtain good enough accuracy. Since our method does not take CNN weights into account, different trainings could produce noisy diagrams. To avoid this issue, we first train a CNN and then keep its convolutional feature extractor. Finally, we train all the networks with the same convolutional feature extractor by freezing the convolutional weights. The Language Identification dataset is a textual dataset that requires vectorizing (in this case, with character frequencies), and the resulting vectors are fed to an FCNN.
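
A hedged PyTorch sketch of this setup: a small CNN is split into a convolutional feature extractor and a fully connected head, the convolutional weights are frozen, and only the FCNN part is trained. The architecture and hyperparameters are illustrative, not the ones used in the paper.

```python
import torch
import torch.nn as nn

# Illustrative CNN: convolutional feature extractor + fully connected classifier.
features = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)
classifier = nn.Sequential(
    nn.Linear(64 * 8 * 8, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# Pretend `features` comes from a previously trained CNN; freeze it so that all
# FCNN variants share exactly the same feature extractor.
for p in features.parameters():
    p.requires_grad = False
features.eval()

model = nn.Sequential(features, classifier)
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)

x = torch.randn(4, 3, 32, 32)            # CIFAR-sized dummy batch
loss = nn.CrossEntropyLoss()(model(x), torch.randint(0, 10, (4,)))
loss.backward()
optimizer.step()
```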

3.3 Experiments Pipeline

We study the following variables (hyperparameters):

  1. Layer width,

  2. Number of layers,

  3. Input order9,

  4. Number of labels (number of considered classes).

We define the base architecture as the one with a layer width of 512, 2 layers, the original features order, and considering all the classes (10 in the case of MNIST and CIFAR, and 7 in the case of the language identification task). Then, doing one change at a time, keeping the rest of the base architecture hyperparameters, we experiment with architectures with the following configurations:

  • Layer width: 128, 256, 512 (base) and 1024.

  • Number of layers: 2 (base), 4, 6, 8 and 10.

  • Input order: 5 different randomizations (with base structure).

  • Number of labels (MNIST, CIFAR): 2, 4, 6, 8 and 10 (base).

  • Number of labels (Language Identification): 2, 3, 4, 6 and 7 (base).

Note that this is not a grid search over all the combinations. We always modify one hyperparameter at a time, and keep the rest of them as in the base architecture. In other words, we experiment with all the combinations such that only one of the hyperparameters is set to a non-base value at a time.
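
The one-change-at-a-time protocol can be written down compactly. The sketch below (plain Python) reproduces the 19 configurations per dataset for MNIST/CIFAR; the dictionary keys and the use of seeds to encode input-order randomizations are our naming choices.

```python
base = {"layer_width": 512, "n_layers": 2, "input_order_seed": None, "n_labels": 10}

variations = {
    "layer_width": [128, 256, 512, 1024],
    "n_layers": [2, 4, 6, 8, 10],
    "input_order_seed": [0, 1, 2, 3, 4],       # 5 input-order randomizations
    "n_labels": [2, 4, 6, 8, 10],              # MNIST / CIFAR label counts
}

# One hyperparameter deviates from the base at a time (not a grid search).
configs = []
for hp, values in variations.items():
    for value in values:
        cfg = dict(base)
        cfg[hp] = value
        configs.append(cfg)

print(len(configs))   # 19 experiments per dataset
```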

For each dataset, we train each of these neural network configurations 5 times, each time with a different random weight initialization. Then, we compute the topological distances (persistence landscape, weighted silhouette, heat) among the different architectures. In total, we obtain 45 distance matrices (3 datasets, 5 random initializations, 3 distance measures). Finally, we average over the 5 random initializations, such that we get 9 matrices, one for each distance measure on each dataset. All the matrices have dimensions 19 × 19, since 19 is the number of experiments for each dataset (corresponding to the total number of architectural configurations mentioned above). Note that the base architecture appears 8 times (once in the layer width group, once in the number of layers group, once in the number of labels group, and in the 5 input order randomizations, which use the base structure).

4 Results Analysis & Discussion

Figure 3: Distance matrices of all network architectures presented, averaged over 5 runs. Panels: (a) MNIST - Heat, (b) MNIST - Silhouette, (c) MNIST - Landscape, (d) CIFAR - Heat, (e) CIFAR - Silhouette, (f) CIFAR - Landscape, (g) Language Identification - Heat, (h) Language Identification - Silhouette, (i) Language Identification - Landscape.

Experiment indices — Experiment:
0-3: Layer sizes.
4-8: Number of layers.
9-13: Input ordering.
14-18: Number of labels.

Figure 3 provides the results for each dataset and for each of the distance metrics. At the bottom of the figure, the experiment indices are mapped to the corresponding experiments. The results shown in the figure are positive for the validation of our method, since most of the experiment groups are trivially distinguishable. Note that the matrices are symmetric and that the diagonal is all zeros.

In the first row of Figure 3, corresponding to the MNIST dataset, the Heat distance matrix shown in Figure 3a and the Silhouette distance matrix shown in Figure 3b look more informative than the Landscape distance matrix of Figure 3c. The Heat and Silhouette distance matrices, unlike the Landscape one, make distinguishing the groups of experiments straightforward. In both the Heat and Silhouette distance matrices, increases in layer size imply gradual differences in the topological distances. The same holds for the number of layers and the number of labels. This means that the increase in network capacity directly maps to an increase in topological complexity. The input order slightly alters the topological space but with no interpretable topological meaning.

Results of the experiments on the CIFAR dataset are shown in the second row of Figure 3. The CIFAR results are interestingly different from those of the MNIST dataset. This is, presumably, because the CIFAR experiments use a CNN pretrained on the ten classes of the problem. Thus, we should interpret the results in this context.

As shown in Figures 3d and 3e, in the Heat and Silhouette distance matrices, increasing the layer size implies a gradual increase in topological distance. The size of the first fully connected layer is important as it can avoid a bottleneck from the previous CNN output. Some works in the literature show that adding multiple fully connected layers does not necessarily enhance the prediction capability of CNNs Basha et al. [2019], which is congruent with our results when adding fully connected layers. Furthermore, the experiments regarding the number of labels of the network remain close in distance. This could provide additional support to the theory that adding more fully connected layers does not enhance CNN prediction capabilities, and also to the claim that the CNN is the main feature extractor of the network (and the FCNN only works as a pure classifier). Concerning the input order experiments, in this case there is slightly more homogeneity than in MNIST, again showing that the order of the samples has negligible influence. Moreover, there could have been even more homogeneity taking into account that the fully connected network reduced its variance thanks to the frozen weights of the CNN. This also supports the fact that the CNN is the main feature extractor of the network. As in the MNIST results, the CIFAR results show that the topological properties are, surprisingly, a mapping of the practical properties of ANNs.

As for the Language Identification dataset (character-based), Figures 3g and 3h indicate that increasing the layer size results in an increase in topological distance. However, unlike in the two previous datasets, introducing more layers does not result in a gradual topological distance increase. Although it tends towards a gradual increase, there are gaps or unexpected peaks that prevent this gradual increase from happening. This might be related to overfitting, as the network with larger capacity memorizes the training samples and, therefore, deviates from a correct generalization of the problem. In other words, there might be sample memorization. This happens in the Language Identification dataset and not in the two vision datasets because this task is simpler to learn (e.g., in this case, a plain Multi-Layer Perceptron with one hidden layer is already a competitive classifier). Regarding the sample shuffling, as in the previous datasets there is a small distance among all experiments, with no noticeable variation.

Finally, for the experiments on the number of labels, we hypothesize that the experiments appear to be noisy due to:

  1. the similarity and dissimilarity of languages,

  2. the increase in the complexity of the problem,

  3. more labels helping to enhance the understanding of the problem representation, and

  4. label imbalance, which will make larger languages more accurately recognized.

While (1) and (4) add somewhat expected noise, as languages are arbitrarily ordered, (2) makes the ANN complexity larger with more labels and, on the contrary, (3) decreases the complexity of the ANN, as the understanding of the problem is deeper with more labels. Languages are sorted as English, French, Spanish, Italian, German, Slovenian and Czech. In principle, one would say that English and German are similar as they are Germanic languages; Spanish, Italian and French are similar as they are Romance languages; and Slovenian and Czech are similar as they are Slavic languages. Nevertheless, one must take into account that the classifier is based on character frequencies, so the intuitive closeness of languages does not necessarily hold. This can be checked by comparing the cosine similarity of the character frequency vectors of each language. See Figure 4 for further details.

Figure 4: Language Identification dataset: cosine similarity of the normalized sums of character vectors for each language.
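
A similarity matrix of this kind can be reproduced as follows; this is a minimal sketch assuming plain character-frequency vectors over a shared alphabet and toy corpora, since the exact preprocessing behind Figure 4 is not reproduced here.

```python
from collections import Counter
import numpy as np

corpora = {                       # toy stand-ins for the per-language corpora
    "English": "the quick brown fox jumps over the lazy dog",
    "Spanish": "el veloz murcielago hindu comia feliz cardillo y kiwi",
    "German":  "franz jagt im komplett verwahrlosten taxi quer durch bayern",
}

alphabet = sorted(set("".join(corpora.values())) - {" "})

def char_frequency_vector(text):
    counts = Counter(c for c in text if c != " ")
    v = np.array([counts[c] for c in alphabet], dtype=float)
    return v / np.linalg.norm(v)          # normalize so the dot product is the cosine

vectors = {lang: char_frequency_vector(t) for lang, t in corpora.items()}
langs = list(vectors)
similarity = np.array([[vectors[a] @ vectors[b] for b in langs] for a in langs])
print(langs)
print(similarity.round(2))
```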

In the Heat distance matrix of Figure 3g, when increasing from two labels to three, we observe that the gap when adding Spanish is larger than when adding Spanish and Italian, as Italian is very close to English and French, while Spanish is further away. Adding German and Slovenian increases the difference, presumably because of the large differences that Slovenian has with the rest of the languages. Two hubs are easily distinguishable: one with the Romance languages plus English, and the other with the Slavic languages plus German. Differences might be due to the large inter-cluster distances; the learning structure changes dramatically in those cases. The Silhouette distance matrix of Figure 3h is slightly more strict when computing similarities. In other words, only very close items appear to be similar. Adding labels up to Spanish and up to Italian shows strong similarity.

5 Conclusions & Future Work

Results from different experiments, on three different datasets from computer vision and natural language, lead to similar topological structural properties and are trivially interpretable, which points towards general applicability.

The best discretizations for this work are the Heat and Silhouette ones. They show a better separation of experiment groups (invariants), and effectively reflect changes in a sensible way. On the other hand, we also explored the Landscape discretization, but it offers very low interpretability and clarity. In other words, it is not helpful for comparing the PH diagrams associated to ANNs.

The selected ANN representation is reliable and complete, and yields coherent and realistic results. Currently, our representation system is applicable to fully connected layers. However, most popular Deep Learning libraries include neither graph representation tools nor consistent computation graph traversal utilities, which makes ANN graph representation harder.

As future work, we are planning to adapt to low-level Deep Learning libraries and to support popular ANN architectures such as CNNs, Recurrent Neural Networks, and Transformers Vaswani et al. [2017]. Furthermore, we would like to come up with a universal neuron-level computing graph representation.

Following the ANN representation concern, we would additionally like to represent the ANNs node types and operations more concretely in the graph. We are also working on these representations; the main challenge is to avoid representing the network as a multiplex network. We are performing more analysis regarding the learning of an ANN, and trying to topologically answer the question of how an ANN learns.

Acknowledgements

We want to thank David Griol Barres, Jerónimo Arenas-García and Esther Ibáñez-Marcelo for their review, feedback and corrections on the paper.

This work was funded by the Spanish State Secretariat for Digitalization and Artificial Intelligence to carry out support activities in supercomputing within the framework of the PlanTL10 signed on 14 December 2018.

Footnotes

  1. Contributed equally.
  3. https://www.math.colostate.edu/~adams/advising/appliedTopologySoftware/
  4. https://github.com/PlanTL-SANIDAD/net-homology-properties
  5. https://giotto-ai.github.io/gtda-docs/0.3.1/theory/glossary.html#persistence-landscape
  6. http://yann.lecun.com/exdb/mnist/
  7. https://www.cs.toronto.edu/~kriz/cifar.html
  8. https://www.floydhub.com/floydhub/datasets/language-identification/1/data
  9. Order of the input features. This should definitely not affect the performance of a fully connected neural network, so if our method is correct, the corresponding distances should be uniform as per the proposed topological metrics.
  10. https://www.plantl.gob.es/

References

  1. Persistence images: a stable vector representation of persistent homology. J. Mach. Learn. Res. 18, pp. 8:1–8:35.
  2. Persistence homology of networks: methods and applications. Applied Network Science 4, pp. 1–28.
  3. Impact of fully connected layers on performance of convolutional neural networks for image classification. CoRR abs/1902.02771.
  4. Functional summaries of persistence diagrams. Journal of Applied and Computational Topology 4, pp. 211–262.
  5. Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16, pp. 77–102.
  6. Topology and data. Bulletin of the American Mathematical Society 46, pp. 255–308.
  7. Persistence stability for geometric complexes. Geometriae Dedicata 173, pp. 193–214.
  8. Path homologies of deep feedforward networks. 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1077–1082.
  9. A topological loss function for deep-learning based image segmentation using persistent homology. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  10. Stability of persistence diagrams. Proceedings of the Twenty-First Annual Symposium on Computational Geometry.
  11. What does it mean to learn in deep networks? And, how does one detect adversarial attacks? 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4752–4761.
  12. Computing the testing error without a testing set. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2674–2682.
  13. Capacity allocation analysis of neural networks: a tool for principled architecture design. ArXiv abs/1902.04485.
  14. Computational topology: an introduction. American Mathematical Society.
  15. Characterizing the shape of activation space in deep neural networks. 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1537–1542.
  16. Elementary applied topology. Self-published.
  17. Graphs, surfaces, and homology: an introduction to algebraic topology. Chapman and Hall.
  18. On characterizing the capacity of neural networks using algebraic topology. ArXiv abs/1802.04443.
  19. On characterizing the capacity of neural networks using algebraic topology. CoRR abs/1802.04443.
  20. Topologically densified distributions. ArXiv abs/2002.04805.
  21. Simplicial complexes of graphs. Ph.D. Thesis, KTH Royal Institute of Technology.
  22. An empirical study of the relation between network architecture and complexity. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 4597–4599.
  23. Persistent homology for the quantitative evaluation of architectural features in prostate cancer histology. Scientific Reports 9.
  24. Geometry and topology of deep neural networks' decision boundaries. ArXiv abs/2003.03687.
  25. Computing persistent homology of directed flag complexes. arXiv: Algebraic Topology.
  26. Topology of deep neural networks. J. Mach. Learn. Res. 21, pp. 184:1–184:40.
  27. Topological data analysis of decision boundaries with application to model selection. ArXiv abs/1805.09949.
  28. Topological machine learning with persistence indicator functions. ArXiv abs/1907.13496.
  29. Neural persistence: a complexity measure for deep neural networks using algebraic topology. ArXiv abs/1812.09764.
  30. Giotto-tda: a topological data analysis toolkit for machine learning and data exploration. ArXiv abs/2004.02551.
  31. Ripser.py: a lean persistent homology library for Python. The Journal of Open Source Software 3 (29), pp. 925.
  32. Attention is all you need. CoRR abs/1706.03762.
  33. Topological measurement of deep neural networks using persistent homology. In ISAIM.
  34. GPU-accelerated computation of Vietoris-Rips persistence barcodes. In Symposium on Computational Geometry.