# Machine learning assisted measurement of local topological invariants

###### Abstract

The continuous effort towards topological quantum devices calls for an efficient and non-invasive method to assess the conformity of components in different topological phases. Here, we show that machine learning paves the way towards non-invasive topological quality control. We introduce a local topological marker, able to discriminate between topological phases of one-dimensional wires. The direct observation of this marker in solid state systems is challenging, but we show that an artificial neural network can learn to approximate it from the experimentally accessible local density of states. Our method distinguishes different non-trivial phases, even for systems where direct transport measurements are not available and for composite systems. This new approach could find significant use in experiments, ranging from the study of novel topological materials to high-throughput automated material design.

Topological insulators and superconductors are phases of matter characterised by the exact quantisation of macroscopic observables and the appearance of edge states at the boundary of open systems Hasan and Kane (2010). Such peculiar edge states include condensed-matter realisations of Majorana bound states and unidirectional edge states, which are particularly robust against disorder and local perturbations. This makes them especially appealing to engineer devices such as qubits, quantum channels Dlaska et al. (2017), and eventually quantum computers Nayak et al. (2008). In a quantum device, several components in different topological phases can be brought together; see Fig. 1. Therefore, it is convenient to have a means of locally discriminating between different topological phases. To this end, in analogy with the two-dimensional Chern marker Bianco and Resta (2011); Caio et al. (2019), we introduce a local quantity which we name “winding marker” that locally distinguishes topological phases of one-dimensional systems with chiral symmetry. This is in contrast both with the global approach of standard topological invariants, which are only defined for infinite systems Hasan and Kane (2010), and with approaches based on scattering matrices Akhmerov et al. (2011); Fulga et al. (2011, 2012); Beenakker (2015), which fundamentally characterise an interface. Although attempting a direct measurement of the winding marker in solid-state systems would raise numerous challenges, we will show that it can be related to readily available experimental data.

In one-dimensional topological insulators and superconductors, the local density of states (LDOS) can be obtained from the tunnelling differential conductance, observed by scanning tunnelling microscopy (STM) Nadj-Perge et al. (2014); Ruby et al. (2015); Pawlak et al. (2016); Feldman et al. (2016); Jeon et al. (2017), or with more elaborate setups Zhang et al. (2018a). STM provides a relatively non-invasive measurement of the LDOS as it does not require the deposition of contacts; this might be relevant for the non-destructive testing of topological devices, e.g. to assess whether a manufactured sample is in the expected topological phase. Although the LDOS of a system without edges does not carry information about its topology, the edge states will appear in the LDOS of a finite-size system. However, these may be obscured by the presence of disorder in the sample. Moreover, STM measurements only allow access to the LDOS up to an unknown prefactor Chen (2007). The relation between the measured LDOS and the winding marker can therefore be subtle, even in the absence of disorder, but we shall see that it can be inferred using supervised machine learning.

Machine learning techniques are increasingly used in physics Mehta et al. (); Carleo and Troyer (2017); Carrasquilla and Melko (2017); Baireuther et al. (); Baireuther et al. (2019). In particular, several works applied machine learning to study topological phases, mostly focusing on their classification from numerically accessible quantities such as entanglement spectra van Nieuwenburg et al. (2017), density matrices Carvalho et al. (2018), Hamiltonians Zhang et al. (2018b) or their eigenvectors Holanda and Griffith (2019), loops of two-point correlation functions Zhang and Kim (2017), the local density of a single state Ohtsuki and Ohtsuki (2016, 2017); Araki et al. (), and its disorder-averaged version Yoshioka et al. (2018). In cold-atom systems, artificial neural networks were used to identify topological phases from the experimental momentum distributions Rem et al. (). In solid-state systems however, the issue of determining the topological nature of a given sample from experimentally accessible data remains open.

In this article, we show that the winding marker in the centre of a finite-size sample can be predicted from a measurement of the LDOS of the whole sample, by using supervised machine learning. Besides being able to distinguish trivial from topological phases, our method also discriminates between topological phases with distinct integer invariants. This is a non-trivial task, as a simple counting of states is not available via STM measurements. Our method is of particular interest for unconventional superconductors, such as Sr$_2$RuO$_4$ Scaffidi and Simon (2015); Kallin and Berlinsky (2016) and one-dimensional Sahlberg et al. (2017) and two-dimensional Röntynen and Ojanen (2015) Shiba lattices, where large values of topological invariants are predicted. In those systems, the experimental determination of the number of edge states is highly challenging due to the lack of easily accessible electrical transport signatures and because of the difficulties in the accurate measurement of the quantised thermal conductance at low temperatures Jezouin et al. (2013); Banerjee et al. (2017, 2018); Kasahara et al. ().

Local winding marker. Topological phases of matter are characterised by quantised topological invariants, typically defined globally for infinite systems. These invariants often manifest themselves in the response function of the ground state to an appropriate gauge field Qi et al. (2008); Ludwig (2015). Hence, we can expect them to correspond to reasonably localised quantities in real space: topological invariants can be recast in terms of the Fermi projector on the ground state, which is nearsighted for gapped systems Aizenman and Graf (1998); Prodan and Kohn (2005); Bianco and Resta (2011). As first noticed by Bianco and Resta Bianco and Resta (2011) in the case of the anomalous Hall effect, this implies that a local quantity closely related to the topological invariant can be defined.

In this work, we focus on one-dimensional systems with chiral symmetry, such as polyacetylene Su et al. (1979) or the Kitaev chain Kitaev (2001). The chiral symmetry is realised by a unitary operator $\Gamma$ which anticommutes with the Hamiltonian $H$. This class of topological systems is characterised by an integer-valued invariant called the winding number, defined in momentum space as Chiu et al. (2016)

$$\nu = \frac{i}{4\pi} \int_{\mathrm{BZ}} \mathrm{d}k \; \mathrm{tr}\!\left[ \Gamma \, Q(k) \, \partial_k Q(k) \right] \qquad (1)$$

Here, $Q(k) = \mathbb{1} - 2P(k)$, where $\Gamma$ is the chiral symmetry operator, $\mathrm{tr}$ denotes the trace over the internal degrees of freedom, and $P(k)$ is the projector on the states below the Fermi level. While convenient, translation invariance is not necessary to define the winding number. In an infinite system, it can be defined as the trace per unit volume $\mathcal{T}$ Mondragon-Shem et al. (2014); Song and Prodan (2014); Rakovszky et al. (2017)

$$\nu = -\frac{1}{2} \, \mathcal{T}\!\left( \Gamma \, Q \, [Q, X] \right) \qquad (2)$$

where $X$ is the position operator. In particular, this real-space formulation applies to disordered systems. The topological invariant in (2) is a global quantity, quantised even at strong disorder Mondragon-Shem et al. (2014); Song and Prodan (2014); Prodan and Schulz-Baldes (2016).

In analogy with the Chern marker for two-dimensional systems introduced by Bianco and Resta Bianco and Resta (2011), we define the local winding marker $w(x)$

$$w(x) = -\frac{1}{2 A_c} \sum_{\alpha} \langle x, \alpha | \, \Gamma \, Q \, [Q, X] \, | x, \alpha \rangle \qquad (3)$$

where $x$ is the position along the chain, $\alpha$ labels the degrees of freedom in the unit cell of the Bravais lattice, and $A_c$ is the volume of the unit cell. While we focus here on one-dimensional systems, the same construction is available in all odd space dimensions.
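As an illustration of how such a marker can be evaluated numerically, the following minimal NumPy sketch computes the marker $-\tfrac{1}{2}\sum_\alpha \langle x,\alpha|\Gamma Q [Q,X]|x,\alpha\rangle$ for an open Su-Schrieffer-Heeger (SSH) chain, a simple chiral model; the Kitaev chain used later in the text would be treated analogously. The function names and the overall sign of the marker, which depends on the convention chosen for the winding number, are our own choices for this sketch.

```python
import numpy as np

def ssh_chain(n_cells, v, w):
    """Open SSH chain; v is the intracell and w the intercell hopping."""
    n = 2 * n_cells
    h = np.zeros((n, n))
    for c in range(n_cells):
        h[2 * c, 2 * c + 1] = h[2 * c + 1, 2 * c] = v            # A-B within cell c
        if c + 1 < n_cells:
            h[2 * c + 1, 2 * c + 2] = h[2 * c + 2, 2 * c + 1] = w  # B(c)-A(c+1)
    return h

def winding_marker(h, cell):
    """w(x) = -(1/2) sum_alpha <x,a|Gamma Q [Q, X]|x,a>; sign is convention-dependent."""
    n = h.shape[0]
    gamma = np.diag([(-1.0) ** (i % 2) for i in range(n)])  # chiral (sublattice) operator
    pos = np.diag([float(i // 2) for i in range(n)])        # unit-cell position operator X
    energies, states = np.linalg.eigh(h)
    filled = states[:, energies < 0]
    p = filled @ filled.T                                   # projector on filled states
    q = np.eye(n) - 2 * p
    m = gamma @ q @ (q @ pos - pos @ q)
    i0 = 2 * cell
    return -0.5 * (m[i0, i0] + m[i0 + 1, i0 + 1])

n_cells = 40
w_topo = winding_marker(ssh_chain(n_cells, 0.3, 1.0), n_cells // 2)  # |w| close to 1
w_triv = winding_marker(ssh_chain(n_cells, 1.0, 0.3), n_cells // 2)  # w close to 0
```

In the bulk of the chain, the marker sits close to the quantised invariant of the corresponding infinite system, up to exponentially small finite-size corrections.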

The local winding marker can be computed for the experimentally relevant case of disordered finite-size systems with open boundaries and, notably, for composite systems. In Fig. 1, a chain is divided into three regions, each with different parameters of the Hamiltonian. Infinite-size systems with the corresponding parameters would have three different winding numbers. Away from the interfaces, the winding marker displays plateaux at the corresponding values, up to fluctuations due to disorder.

Neural network assisted measurement. In order to infer the value of the winding marker from accessible experimental data, we use supervised machine learning in the form of a feedforward neural network. The spatially resolved density of states close to the Fermi energy can be measured in STM experiments, as discussed Chevallier and Klinovaja (2016) and observed Nadj-Perge et al. (2014); Ruby et al. (2015); Pawlak et al. (2016); Feldman et al. (2016); Jeon et al. (2017) in the context of one-dimensional topological systems. However, there is little control over the number of states involved in the measurement when the system is disordered, and the LDOS can be measured only up to an unknown prefactor. In order to model such a measurement, we use as input of our neural network the LDOS corresponding to an energy window of width $\Delta$ centred at the Fermi energy

$$\rho(x) = \mathcal{N} \sum_{\alpha} \sum_{|E_n - E_F| < \Delta/2} \left| \langle x, \alpha | \psi_n \rangle \right|^2 \qquad (4)$$

Here, the inner sum runs over the eigenstates $|\psi_n\rangle$ of the Hamiltonian with energies $E_n$ inside the window, and the outer sum over the internal degrees of freedom $\alpha$. Besides, $\mathcal{N}$ is a normalisation constant ensuring that $\sum_x \rho(x) = 1$, so that the neural network does not simply count the total number of states in the window. In our numerical calculations, we set $\Delta$ to a fixed fraction of the bandwidth. In Fig. 2, we show some examples of normalised LDOS drawn from the dataset used to train our neural network. A visual analysis reveals no obvious connection between the shape of the LDOS and the number of topological edge states.
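A normalised LDOS of this form is straightforward to compute from the eigenpairs of any tight-binding Hamiltonian. The sketch below makes the construction concrete, with a toy dimerised chain standing in for the Kitaev model of the text; the helper name `normalised_ldos` and the window width are illustrative choices, not the values used for the results.

```python
import numpy as np

def normalised_ldos(h, window, e_fermi=0.0, n_orb=2):
    """Sum |psi_n(x, alpha)|^2 over eigenstates with |E_n - E_F| < window / 2,
    then normalise the result so that it sums to one over the sites x."""
    energies, states = np.linalg.eigh(h)
    selected = np.abs(energies - e_fermi) < window / 2
    density = (np.abs(states[:, selected]) ** 2).sum(axis=1)   # per orbital
    density = density.reshape(-1, n_orb).sum(axis=1)           # sum internal dof alpha
    total = density.sum()
    return density / total if total > 0 else density

# toy dimerised (SSH-like) chain with midgap edge states
n_cells = 30
h = np.zeros((2 * n_cells, 2 * n_cells))
for c in range(n_cells):
    h[2 * c, 2 * c + 1] = h[2 * c + 1, 2 * c] = 0.4               # weak intracell bond
    if c + 1 < n_cells:
        h[2 * c + 1, 2 * c + 2] = h[2 * c + 2, 2 * c + 1] = 1.0   # strong intercell bond
rho = normalised_ldos(h, window=0.4)
```

For this gapped toy chain the window only captures the midgap edge states, so the normalised LDOS is strongly peaked at the chain ends.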

Although the winding marker is defined locally, we are interested in its value in the bulk of the system. Away from the sample boundaries, the winding marker corresponds to the topological invariant $\nu$, up to fluctuations due to disorder. To remove these, we label each item in the training set of the neural network with the average $\bar{w}$ of $w(x)$ over a region in the centre of the sample. The neural network is then trained to predict $\bar{w}$ from a normalised LDOS. Details about the architecture, implementation, and $k$-fold cross-validation training and testing of the feedforward neural network are discussed in the Supplemental Material.

Results. In this work, we focus on the disordered Kitaev chain Kitaev (2001), where we include next-to-nearest-neighbour hoppings in order to explore the $\nu = 2$ phase, in addition to the usual $\nu = 0$ and $\nu = 1$ phases. For simplicity, we assume the hopping terms to be equal to the superconducting pairings, $\Delta_j = t_j$, and consider the Hamiltonian written explicitly in the Supplemental Material, expressed in terms of Pauli matrices $\tau_i$ acting in particle-hole space. Here, we consider uncorrelated disorder, where the onsite energies $\mu_n$ are independent and identically distributed random variables following a normal distribution with mean $\mu$ and standard deviation $\sigma$, and similarly for the hoppings $t_{1,n}$ and $t_{2,n}$. This Hamiltonian has both particle-hole and chiral symmetries. In a generic superconducting system, only particle-hole symmetry is present, and our method might be adapted to assess the corresponding topology. Here, we focus on the more delicate situation where several topologically non-trivial phases have to be distinguished.
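As a concrete illustration, a disordered Bogoliubov-de Gennes (BdG) matrix of this type can be assembled as follows. The block structure, the choice $\Delta_j = t_j$, and the use of a single disorder strength `sigma` for all parameters are simplifying assumptions of this sketch; the particle-hole and chiral symmetries mentioned above can then be checked explicitly on the resulting matrix.

```python
import numpy as np

def kitaev_bdg(n, mu, t1, t2, sigma, rng):
    """BdG matrix [[h, d], [d^dag, -h^T]] for a disordered Kitaev chain with
    first- and second-neighbour hopping and pairing Delta_j = t_j (all real)."""
    mu_n = mu + sigma * rng.standard_normal(n)
    t1_n = t1 + sigma * rng.standard_normal(n)
    t2_n = t2 + sigma * rng.standard_normal(n)
    h = np.diag(-mu_n)                                  # normal (hopping) block
    d = np.zeros((n, n))                                # antisymmetric pairing block
    for i in range(n - 1):
        h[i + 1, i] = h[i, i + 1] = -t1_n[i]
        d[i + 1, i], d[i, i + 1] = t1_n[i], -t1_n[i]
    for i in range(n - 2):
        h[i + 2, i] = h[i, i + 2] = -t2_n[i]
        d[i + 2, i], d[i, i + 2] = t2_n[i], -t2_n[i]
    return np.block([[h, d], [d.conj().T, -h.T]])

n = 24
H = kitaev_bdg(n, mu=0.5, t1=1.0, t2=0.0, sigma=0.05, rng=np.random.default_rng(1))
tau_x = np.kron(np.array([[0.0, 1.0], [1.0, 0.0]]), np.eye(n))  # particle-hole structure
```

With all parameters real, `H` is Hermitian, anticommutes with `tau_x` (chiral symmetry), and satisfies the particle-hole constraint `tau_x H* tau_x = -H`.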

The dataset for the training and testing of the neural network consists of tuples $(\rho, \bar{w})$ obtained by randomly drawing the Hamiltonian parameters and disorder strengths uniformly from fixed intervals. In Fig. 3, we show the two-dimensional distribution of the predicted winding marker $w_p$ with respect to the actual (directly calculated) averaged winding marker $\bar{w}$. For a perfect prediction, all the data points should lie on the diagonal; and, for a perfectly quantised winding marker, all the data should concentrate at the points $\bar{w} = \nu$ for integer $\nu$. In Fig. 3(a), three spots are indeed clearly visible, and their finite width is due to the presence of disorder. The normalised distribution of the error $w_p - \bar{w}$, in Fig. 3(b), shows the accuracy of the predictions, with a small root mean squared error (RMSE) for our trained neural network. The tail in the distribution of errors in Fig. 3(b) corresponds to the subtle vertical features in Fig. 3(a), where the error is larger. In order to test the scalability of our approach, we consider a system twice as big; we obtain a similar RMSE using a comparable dataset of tuples $(\rho, \bar{w})$. The influence of the size of the dataset on the MSE is discussed in the Supplemental Material.

We expect the network to recognise features associated with the topological edge states, and not inessential features specific to the system. To verify this hypothesis, we train and test the same neural network using as input the LDOS of a sample restricted to its central sites, away from the edges. As expected, the network trained in this way loses any predictive ability; see Supplemental Material. In Fig. 4, we show a slice of the phase diagram of the disordered Kitaev chain, comparing the values of (a) the predicted winding marker and (b) the spatially averaged winding marker over a range of parameters, for a single disorder realisation. Further, in panels (c) and (d) of Fig. 4, we show their averages over disorder realisations. The remarkable agreement between the actual and predicted winding markers illustrates the accuracy of the network in parameter space, even for large disorder.

So far, we have used a neural network to infer the bulk topology of a homogeneous finite-size chain from its LDOS. Further, we can take advantage of the local character of the winding marker by applying our method to a composite chain. As a proof of principle, we focus on the simplest example where the left and right halves of a one-dimensional chain of length $L$ are potentially different. More precisely, both the average values and the standard deviations of the parameters $\mu$ and $t_1$ are independently chosen for the left and the right of the chain. For simplicity, $t_2$ is set to vanish identically throughout the chain, which implies that the winding number can be either $0$ or $1$ on each side. The same procedure as before is then applied: the LDOS of the entire chain is used as the input of a feedforward neural network, which is trained using as labels the averages $\bar{w}^{\mathrm{L}}$ and $\bar{w}^{\mathrm{R}}$ of the winding marker over regions centred at $L/4$ and $3L/4$. The network outputs the predicted values of the averaged winding markers $w_p^{\mathrm{L}}$ and $w_p^{\mathrm{R}}$. Here, L and R respectively label the left and right sides of the chain.
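The locality underlying this procedure can be illustrated directly, here again with an SSH toy model in place of the Kitaev chain: the local marker evaluated at the quarter points of a half-topological, half-trivial chain recovers the invariant of each side. Function names and parameter values in this sketch are illustrative, and the overall sign of the marker is convention-dependent.

```python
import numpy as np

def composite_ssh(n_cells, v_left, w_left, v_right, w_right):
    """Open SSH chain whose left and right halves have different hoppings."""
    n = 2 * n_cells
    h = np.zeros((n, n))
    for c in range(n_cells):
        v, w = (v_left, w_left) if c < n_cells // 2 else (v_right, w_right)
        h[2 * c, 2 * c + 1] = h[2 * c + 1, 2 * c] = v
        if c + 1 < n_cells:
            h[2 * c + 1, 2 * c + 2] = h[2 * c + 2, 2 * c + 1] = w
    return h

def marker(h, cell):
    """Local winding marker at a given unit cell (overall sign is convention)."""
    n = h.shape[0]
    gamma = np.diag([(-1.0) ** (i % 2) for i in range(n)])
    pos = np.diag([float(i // 2) for i in range(n)])
    energies, states = np.linalg.eigh(h)
    filled = states[:, energies < 0]
    q = np.eye(n) - 2 * filled @ filled.T
    m = gamma @ q @ (q @ pos - pos @ q)
    return -0.5 * (m[2 * cell, 2 * cell] + m[2 * cell + 1, 2 * cell + 1])

n_cells = 60
h = composite_ssh(n_cells, 0.3, 1.0, 1.0, 0.3)  # topological left, trivial right
w_left = marker(h, n_cells // 4)                 # |w| close to 1
w_right = marker(h, 3 * n_cells // 4)            # w close to 0
```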

In Fig. 5, we show the two-dimensional marginal distributions of (a) $(w_p^{\mathrm{L}}, \bar{w}^{\mathrm{L}})$ and (b) $(w_p^{\mathrm{L}}, \bar{w}^{\mathrm{R}})$. The neural network is identical to that of Fig. 3, except for the output layer, which now includes two units. As expected, Fig. 5(a) resembles Fig. 3(a), while Fig. 5(b) shows the lack of any meaningful correlation between the predicted value for the left half and the actual value for the right half. The corresponding marginals for the right half (not shown) have identical features. For the trained neural network, we obtain an RMSE comparable to that of the homogeneous case.

Discussion. In this article, we introduced a local marker to characterise the topology of finite-size, possibly composite, one-dimensional chiral systems. We have shown that machine learning techniques allow us to infer the average of this local winding marker from the experimentally accessible local density of states. Crucially, not only are we able to distinguish topological from non-topological phases, but we can also discriminate between topological phases with different invariants. Our approach is as non-invasive as possible, and is suitable even for systems where direct transport measurements cannot be used.

While the winding marker is a genuinely local quantity, the neural network fundamentally recognises interfaces, as it relies upon the appearance of topological edge states in the local density of states. This is reminiscent of the scattering matrix description of topological systems. Although the neural network predicts spatially averaged values of the winding marker, we have shown that it can locally predict the topology of adjacent regions in a composite system.

Here, we focused on a proof of concept where the LDOS and the topological marker are determined from a specific family of tight-binding Hamiltonians. When a larger family of Hamiltonians is considered, e.g. including more degrees of freedom and longer-range hoppings, a larger network and training set might be required to maintain the same level of accuracy. For example, we expect our approach to distinguish even larger values of the topological invariant, possibly at the cost of an increased size of the neural network. In most experimental setups, the parameter space of the Hamiltonians is strongly constrained by the symmetries and the locality. Therefore, we expect that for each setup, it is possible to tailor and train a network which can efficiently identify the distinct topological phases. One can also ask whether a finite training set is enough to learn a general rule, allowing the predictions to remain accurate even when the Hamiltonians are generated from larger and larger subsets of the whole set of class BDI Hamiltonians. This question goes beyond the scope of this work, but is highly interesting from a fundamental point of view. Future directions for research also include extensions to the Chern marker for two-dimensional systems, as well as to other local topological markers.

Acknowledgments. We thank J. Tworzydło and C. Beenakker for fruitful discussions. This research was supported by the Netherlands Organisation for Scientific Research (NWO/OCW) as part of the Frontiers of Nanoscience (NanoFront) program, by an ERC Synergy Grant, and by the Foundation for Polish Science through the IRA Programme, co-financed by EU within SG OP.

## References

- Hasan and Kane (2010) M. Z. Hasan and C. L. Kane, Reviews of Modern Physics 82, 3045 (2010).
- Dlaska et al. (2017) C. Dlaska, B. Vermersch, and P. Zoller, Quantum Science and Technology 2, 015001 (2017).
- Nayak et al. (2008) C. Nayak, S. H. Simon, A. Stern, M. Freedman, and S. D. Sarma, Reviews of Modern Physics 80, 1083 (2008).
- Bianco and Resta (2011) R. Bianco and R. Resta, Physical Review B 84, 241106 (2011).
- Caio et al. (2019) M. D. Caio, G. Möller, N. R. Cooper, and M. J. Bhaseen, Nature Physics (2019), 10.1038/s41567-018-0390-7.
- Akhmerov et al. (2011) A. R. Akhmerov, J. P. Dahlhaus, F. Hassler, M. Wimmer, and C. W. J. Beenakker, Physical Review Letters 106, 057001 (2011).
- Fulga et al. (2011) I. C. Fulga, F. Hassler, A. R. Akhmerov, and C. W. J. Beenakker, Physical Review B 83, 155429 (2011).
- Fulga et al. (2012) I. C. Fulga, F. Hassler, and A. R. Akhmerov, Physical Review B 85, 165409 (2012).
- Beenakker (2015) C. W. J. Beenakker, Reviews of Modern Physics 87, 1037 (2015).
- Nadj-Perge et al. (2014) S. Nadj-Perge, I. K. Drozdov, J. Li, H. Chen, S. Jeon, J. Seo, A. H. MacDonald, B. A. Bernevig, and A. Yazdani, Science 346, 602 (2014).
- Ruby et al. (2015) M. Ruby, F. Pientka, Y. Peng, F. von Oppen, B. W. Heinrich, and K. J. Franke, Physical Review Letters 115, 197204 (2015).
- Pawlak et al. (2016) R. Pawlak, M. Kisiel, J. Klinovaja, T. Meier, S. Kawai, T. Glatzel, D. Loss, and E. Meyer, NPJ Quantum Information 2, 16035 (2016).
- Feldman et al. (2016) B. E. Feldman, M. T. Randeria, J. Li, S. Jeon, Y. Xie, Z. Wang, I. K. Drozdov, B. A. Bernevig, and A. Yazdani, Nature Physics 13, 286 (2016).
- Jeon et al. (2017) S. Jeon, Y. Xie, J. Li, Z. Wang, B. A. Bernevig, and A. Yazdani, Science 358, 772 (2017).
- Zhang et al. (2018a) H. Zhang, C.-X. Liu, S. Gazibegovic, D. Xu, J. A. Logan, G. Wang, N. van Loo, J. D. S. Bommer, M. W. A. de Moor, D. Car, R. L. M. O. het Veld, P. J. van Veldhoven, S. Koelling, M. A. Verheijen, M. Pendharkar, D. J. Pennachio, B. Shojaei, J. S. Lee, C. J. Palmstrøm, E. P. A. M. Bakkers, S. D. Sarma, and L. P. Kouwenhoven, Nature 556, 74 (2018a).
- Chen (2007) C. J. Chen, Introduction to Scanning Tunneling Microscopy (Oxford University Press, 2007).
- (17) P. Mehta, M. Bukov, C.-H. Wang, A. G. R. Day, C. Richardson, C. K. Fisher, and D. J. Schwab, “A high-bias, low-variance introduction to machine learning for physicists,” arXiv:1803.08823 .
- Carleo and Troyer (2017) G. Carleo and M. Troyer, Science 355, 602 (2017).
- Carrasquilla and Melko (2017) J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017).
- (20) P. Baireuther, T. E. O’Brien, B. Tarasinski, and C. W. J. Beenakker, Quantum 2, 48.
- Baireuther et al. (2019) P. Baireuther, M. D. Caio, B. Criger, C. W. J. Beenakker, and T. E. O’Brien, New Journal of Physics 21, 013003 (2019).
- van Nieuwenburg et al. (2017) E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nature Physics 13, 435 (2017).
- Carvalho et al. (2018) D. Carvalho, N. A. García-Martínez, J. L. Lado, and J. Fernández-Rossier, Physical Review B 97, 115453 (2018).
- Zhang et al. (2018b) P. Zhang, H. Shen, and H. Zhai, Physical Review Letters 120, 066401 (2018b).
- Holanda and Griffith (2019) N. L. Holanda and M. A. Griffith, “Machine learning topological phases in real space,” (2019), arXiv:1901.01963v1 .
- Zhang and Kim (2017) Y. Zhang and E.-A. Kim, Physical Review Letters 118, 216401 (2017).
- Ohtsuki and Ohtsuki (2016) T. Ohtsuki and T. Ohtsuki, Journal of the Physical Society of Japan 85, 123706 (2016).
- Ohtsuki and Ohtsuki (2017) T. Ohtsuki and T. Ohtsuki, Journal of the Physical Society of Japan 86, 044708 (2017).
- (29) H. Araki, T. Mizoguchi, and Y. Hatsugai, “Phase diagram of disordered higher order topological insulator: a machine learning study,” arXiv:1809.09865 .
- Yoshioka et al. (2018) N. Yoshioka, Y. Akagi, and H. Katsura, Physical Review B 97, 205110 (2018).
- (31) B. S. Rem, N. Käming, M. Tarnowski, L. Asteria, N. Fläschner, C. Becker, K. Sengstock, and C. Weitenberg, “Identifying quantum phase transitions using artificial neural networks on experimental data,” arXiv:1809.05519 .
- Scaffidi and Simon (2015) T. Scaffidi and S. H. Simon, Physical Review Letters 115, 087003 (2015).
- Kallin and Berlinsky (2016) C. Kallin and J. Berlinsky, Reports on Progress in Physics 79, 054502 (2016).
- Sahlberg et al. (2017) I. Sahlberg, A. Westström, K. Pöyhönen, and T. Ojanen, Physical Review B 95, 184512 (2017).
- Röntynen and Ojanen (2015) J. Röntynen and T. Ojanen, Physical Review Letters 114, 236803 (2015).
- Jezouin et al. (2013) S. Jezouin, F. D. Parmentier, A. Anthore, U. Gennser, A. Cavanna, Y. Jin, and F. Pierre, Science 342, 601 (2013).
- Banerjee et al. (2017) M. Banerjee, M. Heiblum, A. Rosenblatt, Y. Oreg, D. E. Feldman, A. Stern, and V. Umansky, Nature 545, 75 (2017).
- Banerjee et al. (2018) M. Banerjee, M. Heiblum, V. Umansky, D. E. Feldman, Y. Oreg, and A. Stern, Nature 559, 205 (2018).
- (39) Y. Kasahara, T. Ohnishi, Y. Mizukami, O. Tanaka, S. Ma, K. Sugii, N. Kurita, H. Tanaka, J. Nasu, Y. Motome, T. Shibauchi, and Y. Matsuda, “Majorana quantization and half-integer thermal quantum hall effect in a kitaev spin liquid,” arXiv:1805.05022 .
- Qi et al. (2008) X.-L. Qi, T. L. Hughes, and S.-C. Zhang, Physical Review B 78, 195424 (2008).
- Ludwig (2015) A. W. W. Ludwig, Physica Scripta T168, 014001 (2015).
- Aizenman and Graf (1998) M. Aizenman and G. M. Graf, Journal of Physics A: Mathematical and General 31, 6783 (1998).
- Prodan and Kohn (2005) E. Prodan and W. Kohn, Proceedings of the National Academy of Sciences 102, 11635 (2005).
- Su et al. (1979) W. P. Su, J. R. Schrieffer, and A. J. Heeger, Phys. Rev. Lett. 42, 1698 (1979).
- Kitaev (2001) A. Y. Kitaev, Physics-Uspekhi 44, 131 (2001).
- Chiu et al. (2016) C.-K. Chiu, J. C. Y. Teo, A. P. Schnyder, and S. Ryu, Reviews of Modern Physics 88, 035005 (2016).
- Mondragon-Shem et al. (2014) I. Mondragon-Shem, T. L. Hughes, J. Song, and E. Prodan, Physical Review Letters 113, 046802 (2014).
- Song and Prodan (2014) J. Song and E. Prodan, Physical Review B 89, 224203 (2014).
- Rakovszky et al. (2017) T. Rakovszky, J. K. Asbóth, and A. Alberti, Physical Review B 95, 201407 (2017).
- Prodan and Schulz-Baldes (2016) E. Prodan and H. Schulz-Baldes, Bulk and Boundary Invariants for Complex Topological Insulators (Springer International Publishing, 2016).
- Chevallier and Klinovaja (2016) D. Chevallier and J. Klinovaja, Physical Review B 94, 035417 (2016).
- (52) S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv:1502.03167 .
- Srivastava et al. (2014) N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Journal of Machine Learning Research 15, 1929 (2014).
- Kingma and Ba (2014) D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” (2014), arXiv:1412.6980 .
- Chollet et al. (2015) F. Chollet et al., “Keras,” https://keras.io (2015).
- Abadi et al. (2015) M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” https://www.tensorflow.org/ (2015).

## Appendix A Neural network architecture

The input of our model is an array of nonnegative real numbers of fixed length representing the normalised local density of states (LDOS) of the finite system close to the Fermi energy; its output is the predicted winding marker $w_p$, a single real number. For this regression task, we employ a feedforward artificial neural network composed of several hidden layers. Each hidden layer $i$ contains rectified linear units (ReLU) to provide non-linearity, followed by a batch normalisation (BN) Ioffe and Szegedy () to speed up and stabilise the training by reducing the internal covariate shift. The output layer is a single unit with linear activation, which corresponds to a linear mapping from the last hidden layer. To regularise the model and thus prevent overfitting, during training we apply dropout Srivastava et al. (2014) to the output of the last hidden layer. The weights and biases of the network are fitted using the Adam optimizer Kingma and Ba (2014) with a fixed learning rate. We use the mean squared error (MSE) as the loss function to train the parameters of the neural network, as it both provides a suitable metric for the regression problem and can be used in backpropagation, being differentiable with respect to the network weights. The implementation is done with the Keras package Chollet et al. (2015), using TensorFlow Abadi et al. (2015) as backend.

Formally, the network is a map parametrised by a set of weight matrices $W_i$ and bias vectors $b_i$, and can be expressed as

$$w_p = F(\rho) = f_{\mathrm{out}} \circ f_D \circ \cdots \circ f_1(\rho), \qquad (5)$$

where

$$f_i(z) = \mathrm{BN}\!\left( \mathrm{ReLU}(W_i z + b_i) \right), \quad i = 1, \dots, D, \qquad (6)$$

$$\mathrm{ReLU}(z)_j = \max(0, z_j), \qquad (7)$$

$$f_{\mathrm{out}}(z) = w_{\mathrm{out}} \cdot z + b_{\mathrm{out}}. \qquad (8)$$

The BN transformation is described in Algorithm 1 of Ref. Ioffe and Szegedy (). It is parametrised by feature-wise mean and variance values, which are fitted during training.
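The forward map of Eqs. (5)-(8) can be sketched in a few lines of NumPy; the random weights, the inference-time BN statistics, and the layer sizes below are placeholders rather than the trained values used in the text.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def batch_norm(z, mean, var, eps=1e-5):
    """Inference-time BN with stored feature-wise statistics (gamma = 1, beta = 0)."""
    return (z - mean) / np.sqrt(var + eps)

def forward(rho, layers, w_out, b_out):
    """Eqs (5)-(8): hidden layers of ReLU units followed by BN, then a linear unit."""
    z = rho
    for w, b, mean, var in layers:
        z = batch_norm(relu(w @ z + b), mean, var)
    return float(w_out @ z + b_out)

rng = np.random.default_rng(2)
n_in, n_hidden = 64, 32                       # placeholder layer sizes
layers = [
    (rng.standard_normal((n_hidden, n_in)) / 8, np.zeros(n_hidden),
     np.zeros(n_hidden), np.ones(n_hidden)),
    (rng.standard_normal((n_hidden, n_hidden)) / 6, np.zeros(n_hidden),
     np.zeros(n_hidden), np.ones(n_hidden)),
]
w_out, b_out = rng.standard_normal(n_hidden) / 6, 0.0
x = rng.random(n_in)                          # stand-in for a normalised LDOS input
pred = forward(x, layers, w_out, b_out)
```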

For the case of a bipartite composite system, the output of the model is the two-component vector $(w_p^{\mathrm{L}}, w_p^{\mathrm{R}})$. The only change to the neural network architecture is that in this case, the output layer is composed of two neurons with linear activation.

The model architecture is obtained by evaluating the MSE of different architectures on a fixed test set when trained on a fixed training set; a separate validation set is used to interrupt the training when the MSE on it is no longer decreasing. The contending machine-learning models were AlexNet-like convolutional neural networks, boosted trees, and support vector machines with a linear kernel, but a feedforward neural network largely outperformed all of them. Once the general architecture of the model is found, the same strategy of MSE evaluation is used again to choose the hyperparameters of the network (e.g. number of hidden neurons, number of layers, dropout ratio).

After the architecture of the model is determined, a new dataset is generated to train and test the network. The presented results are obtained by $k$-fold cross-validation, where the whole dataset is randomly split in $k$ “folds”, each containing the same fraction of data. One at a time, each fold is used for testing whereas the other $k-1$ are used for training. This allows us to estimate the expected error of our method as well as the uncertainty on this expected error, by computing the average and standard deviation of the MSE over all the folds in the cross-validation.
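The procedure can be sketched as follows; the trivial mean-predictor used as the model, and the helper names, are placeholders for the actual network training.

```python
import numpy as np

def kfold_indices(n_samples, k, rng):
    """Random split into k near-equal folds; each sample is held out exactly once."""
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(x, y, k, fit, evaluate, rng):
    """Train on k-1 folds, test on the held-out one; return the per-fold MSEs."""
    folds = kfold_indices(len(x), k, rng)
    mses = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(x[train], y[train])
        mses.append(evaluate(model, x[test], y[test]))
    return np.array(mses)

# toy stand-in for the network: the "model" is the training-set mean of the labels
rng = np.random.default_rng(3)
x, y = rng.random((100, 5)), rng.random(100)
fit = lambda xt, yt: yt.mean()
evaluate = lambda m, xs, ys: float(((ys - m) ** 2).mean())
mses = cross_validate(x, y, 5, fit, evaluate, rng)
mse_mean, mse_std = mses.mean(), mses.std()   # expected error and its uncertainty
```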

## Appendix B Kitaev model with second nearest neighbours

The Kitaev Hamiltonian with second nearest neighbours reads

$$H = \sum_n \left[ -\mu_n c_n^\dagger c_n - \left( t_{1,n}\, c_{n+1}^\dagger c_n + t_{2,n}\, c_{n+2}^\dagger c_n + \mathrm{h.c.} \right) + \left( \Delta_{1,n}\, c_{n+1} c_n + \Delta_{2,n}\, c_{n+2} c_n + \mathrm{h.c.} \right) \right]. \qquad (9)$$

The terms proportional to the imaginary parts of the pairings $\Delta_{j,n}$ break chiral symmetry and time-reversal symmetry $T = K$ (complex conjugation), but preserve particle-hole symmetry $P = \tau_x K$. As such, they collapse the $\mathbb{Z}$ invariant to a $\mathbb{Z}_2$ invariant when present.

In the main text, we consider $\Delta_{j,n} = t_{j,n}$ for simplicity, and take all parameters real to preserve chiral symmetry. We mainly consider disordered homogeneous systems, where the $\mu_n$ are independent and identically distributed random variables following a normal distribution with mean $\mu$ and standard deviation $\sigma$, and similarly for $t_{1,n}$ and $t_{2,n}$.

When all parameters are uniform in space, the system is translation invariant and can be block-diagonalised in the Bloch representation, where the Bloch Hamiltonian is

$$H(k) = \left( -\mu - 2 t_1 \cos k - 2 t_2 \cos 2k \right) \tau_z + 2 \left( \Delta_1 \sin k + \Delta_2 \sin 2k \right) \tau_y. \qquad (10)$$
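From a Bloch Hamiltonian of this form, $H(k) = h_z(k)\,\tau_z + h_y(k)\,\tau_y$, the winding number can be evaluated numerically as the winding of the vector $(h_z(k), h_y(k))$ around the origin. The sketch below does this by accumulating the phase along the Brillouin zone; the parameter values are illustrative, and the sign of the result depends on orientation conventions.

```python
import numpy as np

def winding_number(mu, t1, t2, n_k=2001):
    """Winding of (h_z(k), h_y(k)) around the origin for the Bloch Hamiltonian
    H(k) = h_z tau_z + h_y tau_y with Delta_j = t_j."""
    k = np.linspace(0.0, 2.0 * np.pi, n_k)
    h_z = -mu - 2.0 * t1 * np.cos(k) - 2.0 * t2 * np.cos(2.0 * k)
    h_y = 2.0 * (t1 * np.sin(k) + t2 * np.sin(2.0 * k))
    theta = np.unwrap(np.arctan2(h_y, h_z))   # continuous phase along the BZ
    return (theta[-1] - theta[0]) / (2.0 * np.pi)

nu_1 = winding_number(mu=0.5, t1=1.0, t2=0.0)   # |nu| = 1 phase
nu_2 = winding_number(mu=0.5, t1=0.0, t2=1.0)   # |nu| = 2 phase
nu_0 = winding_number(mu=5.0, t1=1.0, t2=0.0)   # trivial phase
```

When the next-to-nearest-neighbour terms dominate, the vector $(h_z, h_y)$ encircles the origin twice per Brillouin zone, yielding $|\nu| = 2$.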

## Appendix C Additional data

The influence of the size of the dataset on the performance of the model is illustrated in Fig. 6. A series of subsets with sizes evenly distributed on a logarithmic scale are randomly drawn from the main dataset. For each subset, the MSE and its uncertainty are estimated using the $k$-fold cross-validation training and testing procedure discussed in the section Neural network architecture. The MSE quickly decreases for small dataset sizes. For larger datasets, a slower decrease of the MSE, compatible with a linear behaviour on a logarithmic scale, is observed. The uncertainty on the MSE, which represents the variability between the folds in the $k$-fold procedure, is notably larger in the first part. This might be due to an inadequate sampling of the parameter space for small datasets, caused by an insufficient number of data points. The same behaviour is observed for both chain lengths; in the second part of the graph, the two curves appear to be parallel with a constant offset, up to the uncertainties.

We presume that the machine learning procedure distinguishes systems with different topological invariants from the existence and shape of the topologically protected edge states expected from the bulk-boundary correspondence. Hence, it should not be possible to learn anything from the LDOS of a bulk system. To verify this, we train and test the same neural network as in the main text, but using as input the LDOS restricted to the central sites of the sample, where the edge states have a vanishing contribution for most of the system parameters. The two-dimensional histogram and the distribution of the absolute error in Fig. 7 show that the artificial neural network has lost any meaningful predictive ability. Correspondingly, the RMSE is relatively large, although this value alone would not be sufficient to draw a conclusion. The origin of the peaks in the marginal distribution is not clear, but is probably not physical; they may correspond to noise amplified by the artificial neural network.