# Generalized Transfer Matrix States from Artificial Neural Networks

###### Abstract

Identifying variational wavefunctions that efficiently parametrize the physically relevant states in the exponentially large Hilbert space is one of the key tasks towards solving the quantum many-body problem. Powerful tools in this context such as tensor network states have recently been complemented by states derived from artificial neural networks (ANNs). Here, we propose and investigate a new family of quantum states, coined generalized transfer matrix states (GTMS), which bridges between the two mentioned approaches in the framework of deep ANNs. In particular, we show by means of a constructive embedding that the class of GTMS contains generic matrix product states while at the same time being capable of capturing more long-ranged quantum correlations that go beyond the area-law entanglement properties of tensor networks. While generic deep ANNs are hard to contract, meaning that the corresponding state amplitude can not be exactly evaluated, the GTMS network is shown to be analytically contractible using transfer matrix methods. With numerical simulations, we demonstrate how the GTMS network learns random matrix product states in a supervised learning scheme, and how augmenting the network by long-ranged couplings leads to the onset of volume-law entanglement scaling. We argue that this capability of capturing long-range quantum correlations makes GTMS a promising candidate for the study of critical and dynamical quantum many-body systems.

## I. Introduction

The quantum many-body problem is one of the outstanding challenges in physics. Besides providing deep theoretical insights, its solution may enable revolutionary technological applications including room-temperature superconductivity and new nano-technology enabled by the understanding of complex macro-molecules. The key issue in this context is the exponential complexity of generic quantum many-body states with the number of constituents. A widely applicable and successful approach towards taming this exploding complexity is to devise families of variational states that efficiently parametrize the physical scenario under investigation.

A paradigmatic example along these lines is provided by tensor network states such as matrix product states (MPS) WhiteDMRG (); McCullochiDMRG (); SchollwoeckMPS_DMRG (); Ostlund_MPS_DMRG (); Cirac_MPS_DMRG (), and their higher dimensional generalizations CiracPEPS (); VidalTTN (); NoackTTN (); MarcelloSimoneTTN (); Vidal_MERA (); CiracReview_TNS (). The tensor network ansatz describes exponentially decaying correlations as reflected in the area-law entanglement of the wavefunction, thus successfully capturing ground states of gapped local Hamiltonians EisertPlenio_AreaLaw (); Hastings_AreaLaw (); Vedral_entanglement_rev (). Tensor network states have also become an important tool for the study of critical systems Vidal_Kitaev_entangCritical (); Schollwoeck_DMRG_Luttinger (); Laflorencie_Entang1Dcrit () and dynamical properties Vidal_TimeEvo (); White_TimeEvo (); ZaletelExactMPS () even though it is clear that these situations exhibit quantum correlations beyond area-law entanglement. The price to pay for encompassing such scenarios is to increase the size of the tensors, i.e. the number of variational parameters, with system size and (exponentially) with time, respectively. Hence, the stronger growth of entanglement limits the amenable system sizes and periods of time-evolution.

Complementing existing variational approaches WhiteDMRG (); McCullochiDMRG (); SchollwoeckMPS_DMRG (); Ostlund_MPS_DMRG (); Cirac_MPS_DMRG (); CiracPEPS (); VidalTTN (); NoackTTN (); MarcelloSimoneTTN (); Vidal_MERA (); CiracReview_TNS (); McMillan_HeVMC (); Ceperley_HeVMC (); SorellaSR (); CasulaSorellaSR (); Jastrow (); Gutzwiller (), quantum states derived from artificial neural networks (ANN) LeCun_DeepLearning (); CarasquillaMelko (); Wang_LearningPhases (); Bengio_RBMrepresentability (); CarleoTroyer () have recently been introduced and studied. There, the physical degrees of freedom are coupled to a set of auxiliary units (see Fig. 1), and the wavefunction is obtained by summing over all configurations of the auxiliary degrees of freedom (contracting the network), thus retaining the couplings as variational parameters. Analytically contracting one of the simplest ANN architectures, known as the restricted Boltzmann machine (RBM) Bengio_RBMrepresentability (), already leads to quantum states CarleoTroyer () that exhibit volume-law entanglement DengVolumelaw (), thus offering an alternative variational wavefunction for those situations in which the short-range quantum correlations captured by conventional tensor network methods may not be sufficient RBMStabilizer (); NomuraHubbard (); SaitoHubbard (); our_article (). Using the more complex ANN class of deep Boltzmann machines (DBM) HintonDBM (); GaoDBM (); Bengio_RBMrepresentability (); Carleo_exactImag () (see Fig. 1), it has recently been proven that the imaginary time evolution towards the ground state of a generic many-body Hamiltonian can be exactly represented at polynomial network complexity Carleo_exactImag (). However, this does not imply an efficient solution of a given many-body problem, since the exact contraction of a DBM is in general an exponentially hard problem. Hence, the explicit form of the wavefunction is in general not accessible even if it can be efficiently represented graphically in the DBM framework.

The purpose of this work is to develop a hybrid approach bridging between tensor network and ANN states. To this end, we introduce and study a class of exactly contractible DBM networks, which we coin generalized transfer matrix state (GTMS) networks (see Fig. 1). There, the wavefunction is analytically evaluated using transfer matrix methods. Quite remarkably, the resulting GTMS are capable of arbitrarily interpolating between MPS and RBM states, thus combining key physical properties of these two powerful variational methods. As a limiting case, we obtain conventional MPS (RBM states) from the GTMS architecture by cutting the red (purple) couplings in Fig. 1. To demonstrate how arbitrary MPS are efficiently parameterized in the proposed framework, we show that GTMS networks can indeed learn random MPS by optimizing the coupling parameters in a supervised learning scheme. Furthermore, we argue how the GTMS generalizes and augments the class of MPS by making the tensors non-locally dependent on the physical degrees of freedom. This more complex structure allows the GTMS to capture correlations beyond area-law entanglement. Our analysis is supported by numerical studies on the scaling of the second Rényi entanglement entropy, showing that with the addition of non-local neural couplings in the network (red links in Fig. 1) the GTMS indeed acquires volume-law entanglement. This increased representational power makes the GTMS a promising candidate for the study of critical and time-dependent systems.

Several previous works have investigated the relationship between ANN states and tensor network states HuangPiP (); GlasserMunich (); Clark (); ChenANN_TN (), establishing a general correspondence between certain Boltzmann machine architectures (among them short-range RBMs) and MPS representations of quantum states. Going beyond these previous insights, our present construction provides a constructive and efficient embedding of MPS and RBM states into the general framework of DBM networks. This allows us to continuously interpolate between MPS and RBM states and generalize both approaches in a physically motivated fashion.

The remainder of the paper is organized as follows. In the next Section we review the concept of a deep Boltzmann machine state. In Section III we introduce the GTMS, discussing the DBM architectures that can be exactly contracted by means of a transfer matrix approach, and showing how it can interpolate between MPS and RBM. In Section IV we explicitly show that MPS are a special case of GTMS, and we construct the paradigmatic AKLT state from a GTMS network as an example. We also numerically show that the GTMS architecture can be trained to learn generic random MPS. In Section V we provide a numerical analysis of the second Rényi entanglement entropy of the GTMS, and present a concluding discussion in Section VI.

## II. Deep Boltzmann Machine States

To set the general stage for our construction, we start with a brief discussion of deep Boltzmann machine (DBM) states Bengio_RBMrepresentability (); HintonDBM (); GaoDBM (). We focus on a three-layer architecture, in which the auxiliary units can be organized in two distinct neural layers which we call hidden and deep layers (see Fig 1). The input to such a DBM network is the physical layer, i.e. the set of physical spins (or any quantum number locally associated to the sites of the physical system) (). The hidden and deep layers are chosen as sets of classical Ising spins with values , (, ). We denote the physical spin configuration with , the hidden spin configuration with and the deep spin configuration with . The physical and hidden spins are coupled by a set of (complex) weights representing the links of the network. Hidden and deep layers are coupled as well by a set of weights . Additionally, the weights , and are referred to as bias terms, and play the role of local fields for the physical , hidden and deep spins, respectively. All these couplings, collectively denoted by play the role of the variational parameters for the DBM state. The connectivity of the network is encoded in a function, called network energy, which in the present case reads as

(1) |

The network configurations are then assigned generalized complex Boltzmann weights , and the variational wavefunction is obtained after a partial partition sum, i.e. by tracing over the hidden and deep layer configurations:

(2) |

As a simpler limiting case obtained by discarding the deep layer, let us briefly recall the notion of restricted Boltzmann machine (RBM) states CarleoTroyer (). In a RBM network the auxiliary spins consist of only one hidden layer of classical Ising spins and there are no couplings within the set of hidden units. The network energy (see (1)) for a RBM simplifies to . To obtain , for the RBM we only need to sum over all configurations, which yields the RBM state CarleoTroyer ().
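The analytic contraction of an RBM mentioned above can be checked numerically on a small example. The following is a minimal sketch, not the authors' code: it assumes a network energy of the form E(s, h) = -Σ a_j s_j - Σ b_i h_i - Σ W_ij h_i s_j with Boltzmann weight exp(-E), and the names `a`, `b`, `W` and all numerical values are illustrative choices. Under these assumptions each hidden spin can be summed independently, which gives a product of 2 cosh factors.

```python
import cmath
import itertools

def rbm_energy(s, h, a, b, W):
    """Assumed energy: E = -sum_j a_j s_j - sum_i b_i h_i - sum_ij W_ij h_i s_j."""
    e = -sum(a[j] * s[j] for j in range(len(s)))
    e -= sum(b[i] * h[i] for i in range(len(h)))
    e -= sum(W[i][j] * h[i] * s[j] for i in range(len(h)) for j in range(len(s)))
    return e

def rbm_amplitude_brute_force(s, a, b, W):
    """Sum exp(-E) over all 2^M hidden configurations h in {-1, +1}^M."""
    return sum(cmath.exp(-rbm_energy(s, h, a, b, W))
               for h in itertools.product((-1, 1), repeat=len(b)))

def rbm_amplitude_closed_form(s, a, b, W):
    """Hidden spins decouple: sum over h_i of exp(h_i * theta_i) = 2 cosh(theta_i)."""
    amp = cmath.exp(sum(a[j] * s[j] for j in range(len(s))))
    for i in range(len(b)):
        theta = b[i] + sum(W[i][j] * s[j] for j in range(len(s)))
        amp *= 2 * cmath.cosh(theta)
    return amp

# Arbitrary complex weights for 3 physical and 2 hidden spins (illustrative values only).
a = [0.1 + 0.2j, -0.3j, 0.05]
b = [0.2 - 0.1j, -0.4 + 0.3j]
W = [[0.3 + 0.1j, -0.2j, 0.15], [-0.1 + 0.25j, 0.2, 0.05 - 0.3j]]
s = (1, -1, 1)

brute = rbm_amplitude_brute_force(s, a, b, W)
closed = rbm_amplitude_closed_form(s, a, b, W)
print(abs(brute - closed))
```

The brute-force sum grows as 2^M with the number of hidden spins, whereas the closed form costs only a product over hidden units; this is the sense in which the RBM hidden layer behaves like a free spin system.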

The power of going from RBM to DBM networks lies in the universal representational capabilities Bengio_RBMrepresentability () of the latter, which has been demonstrated in a quantum physics context by showing that a three-layer DBM is capable of exactly representing the (imaginary) time evolution of generic quantum many-body systems Carleo_exactImag (). Concretely, Ref. Carleo_exactImag () proves that a suitable DBM network of polynomial complexity in system size and imaginary time with weights can exactly represent the imaginary time evolution of an initial quantum state with respect to a generic many-body Hamiltonian , i.e.

(3) |

However, the major caveat limiting the immediate applicability of this strong result is that it is practically impossible in general to exactly evaluate the wavefunction amplitude by performing the sum on the right-hand side of Eq. (2). This is in stark contrast to the simpler RBM, where the wavefunction can readily be calculated analytically. In more physical terms, Boltzmann machine state amplitudes resemble an effective action for the physical spins obtained by tracing out a bath of hidden spin variables. Within this analogy, for the RBM architecture the hidden layer amounts to a free spin system, while for a DBM the auxiliary variables represent an interacting spin system which is hard to solve in general. In the remainder of this article we will identify and study a class of DBM architectures, coined generalized transfer matrix state (GTMS) networks, that can still be exactly contracted and lead to a unifying generalization of MPS and RBM states.

## III. Generalized Transfer Matrix States

We now define the central entity of this work, namely the generalized transfer matrix state (GTMS) network, as a particular exactly contractible, deep Boltzmann machine. Our construction is inspired by the aforementioned interpretation of the DBM wavefunction as an effective action obtained by tracing out an interacting spin system representing the set of auxiliary units. This raises the natural question what kind of wavefunctions are obtained when constraining the couplings so as to make this auxiliary spin-system exactly solvable, which leads us to a substantially larger class of networks than the previously considered RBM architecture (corresponding to a non-interacting auxiliary spin system). Specifically, we will group the auxiliary spins into blocks and limit the connectivity between hidden and deep layers to a nearest-neighbor connectivity between these blocks (see grey boxes in Fig. 1), while retaining all-to-all connectivity between the physical and the hidden layer. Once this constraint has been implemented, the sum over the auxiliary variables can be evaluated adopting a transfer matrix method, well known from the solution of the 1D Ising model.

A detailed exemplary visualization of this GTMS network architecture is shown in Fig. 2. The hidden and deep auxiliary spins are grouped into blocks (the grey shaded areas in Figs. 1 and 2), containing deep and hidden spins per block. Within these blocks, the connectivity between hidden and deep variables is all-to-all, but to make the network contractible the couplings between different blocks are limited to nearest neighbors (the purple dashed lines in Fig. 2). We impose periodic boundary conditions (PBC) on the network, i.e. the last and the first blocks of auxiliary spins are also coupled. In general the number of blocks can be different from the number of physical sites . Also, arbitrary (i.e. from 2-body to -body) direct couplings between deep variables in the same block and in neighboring blocks, as well as direct all-to-all couplings between deep and physical layers, can be introduced, still keeping the network contractible (these connections are not shown in Fig. 2; see the Appendix for a more detailed discussion). Finally, we point out that the number and of hidden and deep spins per block can in general depend on the block index itself, i.e. and with .

Next, we explicitly contract the GTMS network illustrated in Fig. 2, so as to derive an analytical form of the GTMS amplitude. For a straightforward extension to the aforementioned slightly more general connectivity we refer to the Appendix. The network energy of the GTMS network reads as

(4) |

Here the set of weights contains: , , which are the complex on-site bias weights for , , respectively (not explicitly shown in Fig. 2), which denote the couplings between physical and hidden (red and purple links between physical and hidden layers in Fig. 2), that couple and within the same block, and that couple and in neighboring blocks (the dashed purple links in Fig. 2). We refer to the weights with as RBM weights (red links in Fig. 2), while the rest of the links, except for the (the black links in Fig. 2) are referred to as MPS weights. This nomenclature is motivated by the fact that if the network is restricted to contain only MPS weights together with the ’s (i.e. if one sets the RBM weights to zero), the state obtained after its contraction can be recast as an MPS. This will be explained in more detail in Section IV. If we instead keep only the RBM weights with the ’s (that is, we set the MPS weights to zero), the dependence of the network energy (Eq. (4)) on the deep spins disappears, eventually yielding the network energy of a RBM, and therefore a RBM wavefunction after the contraction of the network.

To perform the sum over hidden and deep variables configurations of Eq. (2) we first trace out the hidden layer. Since the hidden spins are not directly coupled, this sum is easily performed (analogous to the RBM case), and yields:

(5) |

where denotes the deep spin configuration at block , i.e. . The elements of the product read as

(6) |

with

(7) |

Now, in order to perform the sum over all the configurations of the deep variables, we can interpret the complex numbers as elements of a transfer matrix associated to block . One can uniquely associate an index, running from to , to a deep spin configuration at block , by interpreting the Ising spins in as bits. We can define the elements of the transfer matrix as

(8) |
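The identification of a deep spin configuration with a matrix index can be made concrete with a small helper. This is only an illustrative sketch: the function names are hypothetical, the index is taken to start at 0 (a 1-based convention works equally well), and reading +1 as bit 1 and -1 as bit 0 is an arbitrary choice.

```python
def config_to_index(d):
    """Map Ising spins d_k in {-1, +1} to an integer, reading +1 as bit 1 and -1 as bit 0."""
    index = 0
    for spin in d:
        index = 2 * index + (1 if spin == 1 else 0)
    return index

def index_to_config(index, n_deep):
    """Inverse map: integer in [0, 2**n_deep) -> tuple of n_deep Ising spins."""
    bits = [(index >> k) & 1 for k in reversed(range(n_deep))]
    return tuple(1 if bit else -1 for bit in bits)

# With 3 deep spins per block the transfer matrix is 8 x 8.
for idx in range(2 ** 3):
    print(idx, index_to_config(idx, 3))
```

With such a map, a block holding `n_deep` deep spins yields a square transfer matrix of dimension `2**n_deep`, whose row and column indices label the deep configurations of two neighboring blocks.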

Tracing out the deep layer in Eq. (5) is then equivalent to taking the product of these transfer matrices, i.e.

(9) |

where the trace comes from the periodic boundary conditions imposed on the blocks of auxiliary spins of the network. We stress that the GTMS network does not require the introduction of PBC for being exactly contractible, and can equally well be used for physical systems with open boundary conditions (see the Appendix for a more detailed discussion).
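The equivalence between the explicit sum over auxiliary spins and the trace of a product of transfer matrices can be verified by brute force on a tiny network. The sketch below is written under stated assumptions rather than taken from the paper: each block carries hidden biases `b`, deep biases `c`, couplings `W` from every hidden spin to all physical spins, couplings `U` to the deep spins of the same block, and couplings `V` to the deep spins of the next block; the Boltzmann weight is exp(-E) with all terms entering -E with a plus sign. Dictionary keys, sizes, and random values are hypothetical.

```python
import cmath
import itertools
import random

SPINS = (-1, 1)

def theta(s, d_here, d_next, b_i, W_i, U_i, V_i):
    """Effective field on one hidden spin (assumed form of the angle)."""
    t = b_i + sum(w * x for w, x in zip(W_i, s))
    t += sum(u * x for u, x in zip(U_i, d_here))
    t += sum(v * x for v, x in zip(V_i, d_next))
    return t

def transfer_matrix(s, blk, n_deep):
    """T[d, d'] for one block, with its hidden spins traced out analytically."""
    configs = list(itertools.product(SPINS, repeat=n_deep))
    T = []
    for d in configs:
        row = []
        for d_next in configs:
            val = cmath.exp(sum(c * x for c, x in zip(blk["c"], d)))
            for i in range(len(blk["b"])):
                val *= 2 * cmath.cosh(theta(s, d, d_next, blk["b"][i],
                                            blk["W"][i], blk["U"][i], blk["V"][i]))
            row.append(val)
        T.append(row)
    return T

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def gtms_amplitude(s, a, blocks, n_deep):
    """psi(s) = exp(a.s) * Tr[T^(1) T^(2) ... T^(M)] for periodic blocks."""
    prod = transfer_matrix(s, blocks[0], n_deep)
    for blk in blocks[1:]:
        prod = matmul(prod, transfer_matrix(s, blk, n_deep))
    trace = sum(prod[k][k] for k in range(len(prod)))
    return cmath.exp(sum(x * y for x, y in zip(a, s))) * trace

def brute_force_amplitude(s, a, blocks, n_deep):
    """Reference: explicit sum of exp(-E) over all hidden and deep configurations."""
    M, n_hid = len(blocks), len(blocks[0]["b"])
    total = 0
    for deep in itertools.product(SPINS, repeat=M * n_deep):
        d = [deep[j * n_deep:(j + 1) * n_deep] for j in range(M)]
        for hid in itertools.product(SPINS, repeat=M * n_hid):
            minus_E = sum(x * y for x, y in zip(a, s))
            for j, blk in enumerate(blocks):
                d_next = d[(j + 1) % M]          # periodic boundary conditions
                minus_E += sum(c * x for c, x in zip(blk["c"], d[j]))
                for i in range(n_hid):
                    minus_E += hid[j * n_hid + i] * theta(
                        s, d[j], d_next, blk["b"][i], blk["W"][i], blk["U"][i], blk["V"][i])
            total += cmath.exp(minus_E)
    return total

rng = random.Random(0)

def rnd():
    return complex(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))

N, M, n_hid, n_deep = 3, 3, 2, 1   # 3 physical sites, 3 blocks (illustrative sizes)
a = [rnd() for _ in range(N)]
blocks = [{"b": [rnd() for _ in range(n_hid)],
           "c": [rnd() for _ in range(n_deep)],
           "W": [[rnd() for _ in range(N)] for _ in range(n_hid)],
           "U": [[rnd() for _ in range(n_deep)] for _ in range(n_hid)],
           "V": [[rnd() for _ in range(n_deep)] for _ in range(n_hid)]}
          for _ in range(M)]
s = (1, -1, -1)
psi_tm = gtms_amplitude(s, a, blocks, n_deep)
psi_bf = brute_force_amplitude(s, a, blocks, n_deep)
print(abs(psi_tm - psi_bf))
```

The brute-force reference visits every hidden and deep configuration, so its cost is exponential in the total number of auxiliary spins, while the transfer matrix route multiplies M matrices whose dimension is exponential only in the number of deep spins per block.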

In case and then the dimension of would depend on as well, being equal to . The transfer matrices depend in general on the index as well as on the physical spin configuration over the entire system, as opposed to the well known case of MPS where each matrix depends locally on the spin quantum number on one physical site. For this reason we call the state of Eq. (9) a generalized transfer matrix state (GTMS), and we will show below that this non-local dependence on the physical quantum numbers allows the GTMS to capture long-range correlations going beyond the area-law typical of MPS.

## IV. Matrix Product States from GTMS

In this Section we demonstrate how, by removing the RBM weights, the GTMS network is able to parametrize generic MPS, with a bond dimension set by the number of deep spins per physical site. Defined as a product of tensors with elements associated to each physical site with one physical index and two auxiliary indices with the bond dimension , a generic MPS is of the form SchollwoeckMPS_DMRG (); Cirac_MPS_DMRG (); EisertPlenio_AreaLaw ():

(10) |

We can immediately see that this form is similar to the one of Eq. (9) with the number of transfer matrices equal to the number of physical sites , apart from the fact that here the matrices in the product depend only on the quantum number of the physical site . To reduce Eq. (9) to the MPS form of Eq. (10) we simply restrict the connectivity of the GTMS network so as to make the depend only on . We note that in Eq. (9) the dependence of on the entire physical layer enters via the angles of Eq. (7), where the term appears. Therefore, if we set to zero all the couplings where each depends only on . Pictorially this amounts to erasing all the red links in Fig. 2. In physical terms, this corresponds to neglecting the long-range quantum correlations which are mediated by the RBM couplings, keeping only the short-range correlations encoded in the MPS couplings between neighboring transfer matrices. This way, Eq. (9) becomes formally analogous to the MPS in Eq. (10), with bond dimension (assuming constant throughout the system and ):

(11) |

where can be identified with the tensor in Eq. (10) and the notation has been introduced to make the local dependence of the transfer matrices on the physical spins manifest. We note that for being able to parametrize an arbitrary MPS with bond dimension and spin physical degrees of freedom, one would in general need complex free parameters per matrix. Simple parameter counting shows that, in order to have enough free parameters, the number of hidden spins per block should scale with according to .
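The effect of erasing the long-range links can be illustrated numerically: once every hidden spin of a block couples only to that block's own physical site, the block's transfer matrix no longer changes when other physical spins are flipped. The sketch below assumes the same illustrative parametrization as before (hidden biases `b`, deep biases `c`, physical-hidden couplings `W`, hidden-deep couplings `U` and `V`); these names and the random values are not taken from the paper.

```python
import cmath
import itertools
import random

def transfer_matrix(s, b, c, W, U, V):
    """T[d, d'] = exp(c.d) * prod_i 2 cosh(b_i + W_i.s + U_i.d + V_i.d') (assumed form)."""
    configs = list(itertools.product((-1, 1), repeat=len(c)))
    T = []
    for d in configs:
        row = []
        for d_next in configs:
            val = cmath.exp(sum(x * y for x, y in zip(c, d)))
            for i in range(len(b)):
                th = b[i] + sum(x * y for x, y in zip(W[i], s))
                th += sum(x * y for x, y in zip(U[i], d))
                th += sum(x * y for x, y in zip(V[i], d_next))
                val *= 2 * cmath.cosh(th)
            row.append(val)
        T.append(row)
    return T

def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

rng = random.Random(1)

def rnd():
    return complex(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))

N, n_hid, n_deep, site = 4, 3, 1, 0   # the block is attached to physical site 0
b = [rnd() for _ in range(n_hid)]
c = [rnd() for _ in range(n_deep)]
U = [[rnd() for _ in range(n_deep)] for _ in range(n_hid)]
V = [[rnd() for _ in range(n_deep)] for _ in range(n_hid)]
W_full = [[rnd() for _ in range(N)] for _ in range(n_hid)]
# Cut the long-range couplings: keep only the coupling to the block's own site.
W_local = [[w if l == site else 0 for l, w in enumerate(row)] for row in W_full]

s1 = (1, 1, -1, 1)
s2 = (1, -1, 1, -1)   # same spin on `site`, different spins elsewhere

diff_local = max_abs_diff(transfer_matrix(s1, b, c, W_local, U, V),
                          transfer_matrix(s2, b, c, W_local, U, V))
diff_full = max_abs_diff(transfer_matrix(s1, b, c, W_full, U, V),
                         transfer_matrix(s2, b, c, W_full, U, V))
print(diff_local, diff_full)
```

With `W_local` the two matrices coincide, as for an MPS tensor that depends only on the local quantum number, whereas with `W_full` they differ, which is the non-local dependence that distinguishes a general GTMS.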

### A. AKLT State from GTMS

As an emblematic example we explicitly construct the AKLT state AKLTpaper (); AKLTpaper2 () from a GTMS network, shown in Fig. 3. The AKLT state is one of the simplest MPS with bond dimension and tensors independent of position. This suggests that the use of a GTMS architecture with constant , and hidden variables per site will be sufficient for fully parametrizing the state. The AKLT state is the ground state of a modified spin-1 quantum Heisenberg model AKLTpaper (), hence the physical spin variables which constitute the inputs of our network can take values (while in the remainder of the paper we will use spin degrees of freedom). The normalized AKLT matrices read

(12) |

For the ANN architecture shown in Fig. 3 we obtain the transfer matrices

(13) |

with the angular arguments

(14) |

where the independence of several quantities on the physical site index reflects the translation invariance of the AKLT state. By choosing , , , , , , and , we find , where the normalization factor can formally be added to the DBM network by a constant shift to the network energy.

This explicit parameterization demonstrates how the AKLT state is exactly represented by a short range DBM network, in which the connectivity is limited to neighboring blocks of auxiliary units. Interestingly, this simple state cannot be directly represented by a short-range RBM network. Indeed, as noticed in GlasserMunich () short-range RBM states correspond to so called entangled-plaquette states, which are products of complex numbers associated to local clusters (plaquettes) of physical sites. Physically, this product structure of commuting local factors makes it impossible for such states to encode the hidden infinitely-ranged string order of the AKLT state AKLTstringorder (). In more practical terms, the string order constrains the AKLT wavefunction to vanish whenever two subsequent or two subsequent matrices at the physical sites and are separated only by matrices, no matter how large the distance between and is. This is encoded in the basic (non-commuting) algebra of these matrices. Clearly, such a constraint cannot be achieved by a product of complex numbers that depend only locally on the physical variables, as in the case of a short-range RBM.
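The string-order constraint described above follows directly from the matrix algebra and can be checked by enumeration. The sketch below uses one common convention for the AKLT matrices (proportional to σ⁺, σᶻ and σ⁻); signs and normalization differ between references and need not match Eq. (12). The helper `respects_string_order` is a hypothetical name for the rule that, once the 0 entries are deleted, + and − must alternate around the ring.

```python
import itertools
import math

S2_3, S1_3 = math.sqrt(2 / 3), math.sqrt(1 / 3)

# One common convention for the spin-1 AKLT matrices (an assumption, see lead-in).
AKLT = {
    +1: [[0.0, S2_3], [0.0, 0.0]],     #  sqrt(2/3) * sigma^+
    0: [[-S1_3, 0.0], [0.0, S1_3]],    # -sqrt(1/3) * sigma^z
    -1: [[0.0, 0.0], [-S2_3, 0.0]],    # -sqrt(2/3) * sigma^-
}

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def aklt_amplitude(s):
    """Tr[A^{s_1} A^{s_2} ... A^{s_N}] for a periodic chain."""
    prod = [[1.0, 0.0], [0.0, 1.0]]
    for sj in s:
        prod = matmul(prod, AKLT[sj])
    return prod[0][0] + prod[1][1]

def respects_string_order(s):
    """True if, after deleting the 0s, the +1 and -1 entries alternate around the ring."""
    nonzero = [x for x in s if x != 0]
    return all(nonzero[k] != nonzero[(k + 1) % len(nonzero)] for k in range(len(nonzero)))

# Two '+' separated only by 0s: the amplitude vanishes regardless of their distance.
amp_violating = aklt_amplitude((1, 0, 0, 1, -1, -1))
# Alternating + and - with 0s in between: the amplitude is non-zero.
amp_allowed = aklt_amplitude((1, -1, 0, 1, -1, 0))

# Exhaustive check on a ring of 5 sites: every violating configuration has zero amplitude.
violations_nonzero = [s for s in itertools.product((-1, 0, 1), repeat=5)
                      if not respects_string_order(s) and abs(aklt_amplitude(s)) > 1e-15]
print(amp_violating, amp_allowed, len(violations_nonzero))
```

The vanishing amplitudes arise because σ⁺ σᶻ…σᶻ σ⁺ is proportional to (σ⁺)² = 0, a cancellation that relies on the non-commuting matrix product and has no counterpart in a product of local complex numbers.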

### B. Learning Random MPS

Generalizing from the basic example of the AKLT state, we now numerically show that GTMS networks without RBM weights (red links in Figs. 1 and 2) can learn a generic MPS. In the context of artificial neural networks, the word learning means that the weights of the network are iteratively optimized to find the minimum of a certain cost function LeCun_DeepLearning (); CarasquillaMelko (); Wang_LearningPhases (); CarleoTroyer (); DengVolumelaw (); RBMStabilizer (); NomuraHubbard (); SaitoHubbard (); our_article (). Assuming translation invariance, we optimize only for one MPS tensor with a network (corresponding to a single grey shaded block in Fig. 2), containing deep variables per physical site, thus yielding a bond dimension . The resulting network representation of the MPS tensor elements is visualized in Fig. 4. By fixing and the deep spin configurations and one obtains, after tracing out the hidden layer in the network of Fig. 4, the element of the transfer matrix .

The cost function that we optimize for is defined using the Frobenius norm of the difference between the GTMS transfer matrix and the random MPS tensor to be learned:

(15) |

where the dependence on the variational parameters lies in the transfer matrix (see Eqs. (6-8)).

We considered the case of spin- degrees of freedom per physical site, so the total number of elements of the MPS tensor is . The optimization has been performed using stochastic gradient descent methods Bottou_MLwithSGD (); Bottou_Optimization_in_ML () such as AdaGrad and Adam Duchi_AdaGrad (); Adam () with parameters. An example of a convergence plot for using AdaGrad for optimizing the network with and is given in Fig. 5. By using the Adam optimizer implemented in the Python TensorFlow library we were able to learn random MPS tensors up to bond dimension to a final relative accuracy , using and on an ordinary desktop computer.
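The supervised learning setup can be sketched in a few lines for a very small network. This is not the optimization used in the paper: instead of AdaGrad or Adam in TensorFlow, the example uses plain gradient descent with finite-difference gradients and a simple backtracking rule, and it takes the cost to be the squared Frobenius distance summed over the physical index (the normalization in Eq. (15) may differ). The parameter layout, sizes, and the random target tensor are assumptions made for illustration.

```python
import cmath
import itertools
import random

N_HID, N_DEEP = 3, 1
CONFIGS = list(itertools.product((-1, 1), repeat=N_DEEP))
STRIDE = 2 + 2 * N_DEEP                 # per hidden spin: bias b, coupling W, rows of U and V
N_PARAMS = N_HID * STRIDE + N_DEEP      # plus the deep biases c

def network_tensor(p):
    """Flattened T^s[d, d'] for s in (-1, +1); p is a flat list of complex weights."""
    c = p[N_HID * STRIDE:]
    out = []
    for s in (-1, 1):
        for d in CONFIGS:
            for d_next in CONFIGS:
                val = cmath.exp(sum(x * y for x, y in zip(c, d)))
                for i in range(N_HID):
                    b, W = p[i * STRIDE], p[i * STRIDE + 1]
                    U = p[i * STRIDE + 2:i * STRIDE + 2 + N_DEEP]
                    V = p[i * STRIDE + 2 + N_DEEP:(i + 1) * STRIDE]
                    th = (b + W * s + sum(x * y for x, y in zip(U, d))
                          + sum(x * y for x, y in zip(V, d_next)))
                    val *= 2 * cmath.cosh(th)
                out.append(val)
    return out

def cost(p, target):
    """Squared Frobenius distance between network tensor and target tensor."""
    return sum(abs(t - a) ** 2 for t, a in zip(network_tensor(p), target))

def gradient(p, target, eps=1e-6):
    """Finite-difference gradient with respect to real and imaginary parts of each weight."""
    grad = []
    for k in range(len(p)):
        g = 0j
        for direction in (1.0, 1j):
            plus = p[:k] + [p[k] + eps * direction] + p[k + 1:]
            minus = p[:k] + [p[k] - eps * direction] + p[k + 1:]
            g += direction * (cost(plus, target) - cost(minus, target)) / (2 * eps)
        grad.append(g)
    return grad

def descend(p, target, steps=100, lr=0.01):
    """Gradient descent with backtracking: a step is kept only if it lowers the cost."""
    current = cost(p, target)
    for _ in range(steps):
        g = gradient(p, target)
        trial = [x - lr * gk for x, gk in zip(p, g)]
        trial_cost = cost(trial, target)
        if trial_cost < current:
            p, current = trial, trial_cost
        else:
            lr *= 0.5
    return p, current

rng = random.Random(0)

def rnd(scale):
    return complex(rng.uniform(-scale, scale), rng.uniform(-scale, scale))

# Stand-in for a random MPS tensor with bond dimension 2 and two physical states.
target = [rnd(1.0) for _ in range(2 * len(CONFIGS) ** 2)]
p0 = [rnd(0.3) for _ in range(N_PARAMS)]
initial_cost = cost(p0, target)
p_opt, final_cost = descend(p0, target)
print(initial_cost, final_cost)
```

Because a trial step is accepted only when it reduces the cost, the final value never exceeds the initial one; how close it gets to zero depends on the step budget and on whether the network holds enough free parameters for the target tensor.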

Finally, we would like to address the question whether it is possible to learn a MPS tensor with a number of variational parameters that is lower than the number of tensor elements , thus attempting an approximate compression of the MPS. The results of this compression are shown in Fig. 6 relative to a set of realizations of random MPS tensors with bond dimension (orange data) and (blue data). We can see that for our network can indeed be optimized to learn the MPS tensors (each data point in Fig. 6 represents the best achieved relative accuracy averaged over the random MPS tensor realizations). As we find however , meaning that an efficient network compression in general is not viable when learning generic (random) MPS tensors.

## V. Entanglement Analysis of GTMS

In this Section, we present a numerical analysis of the entanglement entropy of GTMS for one dimensional systems with PBC and spin degrees of freedom per site. We will show that the addition of non-local RBM weights to a GTMS network representing a MPS results in the onset of volume-law entanglement, as opposed to the area-law scaling obtained when keeping the MPS weights only. This is a clear indication of the improved representational power of generalized transfer matrix states.

Specifically, we calculate the second Rényi entropy for a bipartition of a one dimensional spin- system with PBC in two subsystems and , with the total system being in the pure state with GTMS wavefunctions of Eq. (9). The second Rényi entropy of subsystem is given by

(16) |

with being the reduced density matrix of subsystem (the natural logarithm is used). The algorithm introduced in Hastings_RenyiEntropy () offers a simple and efficient way for calculating with Monte Carlo, which requires Metropolis sampling of two copies of the system, as the trace of needs to be evaluated. For the Rényi entropy one would need to sample configurations of copies (see Hastings_RenyiEntropy () and the Appendix for more details).
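For small systems the second Rényi entropy can be obtained exactly by enumerating all configurations, which is useful as a reference for sampled estimates. The sketch below uses the standard definition S₂ = −ln Tr(ρ_A²), with the label A for the subsystem and the function name `renyi2` chosen here for illustration; it takes any amplitude function of ±1 spins and does not implement the Monte Carlo scheme cited above.

```python
import itertools
import math

def renyi2(psi, n_sites, n_a):
    """Exact S_2 = -ln Tr(rho_A^2) for subsystem A = first n_a sites, by full enumeration."""
    conf_a = list(itertools.product((-1, 1), repeat=n_a))
    conf_b = list(itertools.product((-1, 1), repeat=n_sites - n_a))
    amp = {(a, b): psi(a + b) for a in conf_a for b in conf_b}
    norm = sum(abs(v) ** 2 for v in amp.values())
    purity = 0.0
    for a, a2 in itertools.product(conf_a, repeat=2):
        # rho_A[a, a2] = sum_b psi(a, b) * conj(psi(a2, b)) / norm
        rho = sum(amp[(a, b)] * amp[(a2, b)].conjugate() for b in conf_b) / norm
        # rho_A is Hermitian, so Tr(rho_A^2) = sum over a, a2 of |rho_A[a, a2]|^2
        purity += abs(rho) ** 2
    return -math.log(purity)

def product_state(s):
    """Equal-weight superposition of all configurations: no entanglement."""
    return 1.0

def ghz_state(s):
    """Non-zero only for all-up or all-down configurations."""
    return 1.0 if len(set(s)) == 1 else 0.0

s2_product = renyi2(product_state, 4, 2)
s2_ghz = renyi2(ghz_state, 4, 2)
print(s2_product, s2_ghz, math.log(2))
```

The product state gives zero entropy and the GHZ-like state gives ln 2 for any cut, two simple sanity checks before applying the same routine to amplitudes produced by a contracted network. The enumeration scales exponentially with system size, which is why sampling becomes necessary for longer chains.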

We determine the scaling of the second Rényi entropy with the length of subsystem , comparing the two cases of a GTMS network parametrizing a conventional MPS, and the augmented GTMS to which non-local RBM couplings have been added while keeping the existing couplings unchanged. In Fig. 7, we show exact data on for a system of sites in the case of a GTMS with , (panel 7), and Monte Carlo data from Metropolis sampling for a system of sites in the case of a GTMS with , (panel 7). We observe that the addition of RBM couplings results in a volume-law (i.e. linear in ) scaling of the entanglement, and that exceeds the MPS bound (the dashed red line in Fig. 7) which is set by the bond dimension . In this sense, the GTMS family combines the properties of conventional MPS and RBM states.

## VI. Concluding Discussion

In summary, we proposed a deep ANN architecture that is exactly contractible and yields a class of quantum states called GTMS. The GTMS family is shown by means of a constructive mapping to include both generic MPS and RBM states, and allows one to interpolate continuously between them. More specifically, GTMS networks are a family of deep Boltzmann machine networks which are exactly and efficiently contractible by means of a transfer matrix method. Our findings are corroborated by numerical data showing that the GTMS network is indeed able to efficiently parametrize random MPS, where efficiently means that the number of variational weights scales as the number of independent parameters of the MPS. Moreover, we show with a numerical analysis of the second Rényi entropy that a GTMS initially parametrizing a MPS (therefore a state with area-law entanglement) can, upon addition of RBM weights, encode long-range correlations with volume-law entanglement. On a general note, representation theorems Bengio_RBMrepresentability () tell us that the proposed augmentation of an existing network by additional couplings can only improve the capabilities of the network in representing quantum states. In our present construction, the onset of the volume-law scaling provides a concrete intuition for this increased representational power compared to conventional MPS.

The potential of RBMs to represent states with volume-law entanglement and to encode up to -body correlations was already discussed DengVolumelaw (), and the general correspondence between RBM states and tensor network states is well known HuangPiP (); GlasserMunich (); Clark (); ChenANN_TN (). However, an efficient mapping between MPS and RBM states has remained elusive, since it is unclear how the required number of RBM couplings scales with the bond dimension. Here, we have used the higher representational power of DBMs to efficiently and constructively embed MPS into the general framework of ANN states. By efficiently, we mean that the number of DBM couplings needed scales as the number of free parameters in the MPS tensors. In this sense, the GTMS combines key representational properties of MPS and RBM states and has stronger representational power than either of the two alone.

The great advantage of GTMS is that both short-ranged MPS correlations and long-ranged entanglement can be efficiently captured. This makes GTMS a promising ansatz for problems where the entanglement growth poses severe limitations to tensor network studies. Such cases include the study of critical systems or of time evolution in quantum many-body systems far from equilibrium, where the MPS ansatz would require us to increase the bond dimension with system size and time, respectively EisertPlenio_AreaLaw (); Hastings_AreaLaw (); Vedral_entanglement_rev (); Schollwoeck_DMRG_Luttinger (); Vidal_Kitaev_entangCritical (); Laflorencie_Entang1Dcrit (); Vidal_TimeEvo (); White_TimeEvo (); ZaletelExactMPS (). However, it is fair to say that the generalization from MPS to GTMS comes at a price: While observables can be efficiently represented directly in the space of MPS SchollwoeckMPS_DMRG (); Cirac_MPS_DMRG (), for most variational wavefunctions including RBM states and also the proposed GTMS the understanding of the corresponding variational space is far less complete. Therefore, evaluating expectation values of physical observables and optimizing the variational parameters so far requires stochastic methods. Developing powerful optimization methods in the framework of ANN states that draw intuition from tensor network methods such as DMRG, TEBD, and TRG is an interesting direction of future research.

## VII. Acknowledgements

We acknowledge discussions with E. Bergholtz and H.-H. Tu. LP and JCB acknowledge financial support from the German Research Foundation (DFG) through the Collaborative Research Centre SFB 1143. RK is supported by the Austrian Science Fund SFB FoQuS (FWF Project No. F4016-N23) and the European Research Council (ERC) Synergy Grant UQUAM. The numerical calculations were performed on resources at the TU Dresden Center for Information Services and High Performance Computing (ZIH), and at the Chalmers Centre for Computational Science and Engineering (C3SE) provided by the Swedish National Infrastructure for Computing (SNIC).

## Appendix

### A.I. General Exactly Contractible DBM Network

The network shown in Fig. 2 is not the most general exactly contractible architecture. As discussed in Section III, one can introduce arbitrary (from 2-body to -body) couplings between deep variables in the same block and in neighboring blocks while still keeping the network contractible. In this Appendix we elaborate on this structure, explaining how the presence of a hidden layer is fundamental for having enough variational parameters to parametrize generic MPS.

The network energy for a GTMS where arbitrary couplings between sets of deep variables within the same block and between neighboring blocks have been introduced, is obtained simply by adding a term

(17) |

to the expression of in Eq. (4), where denotes the sum of all possible direct couplings between the deep spins contained in block :

(18) |

and denotes the sum of all possible direct couplings between the deep spins in neighboring blocks and :

(19) |

The addition of these couplings keeps the network exactly contractible. The only modification to the wavefunction amplitude is that the product elements of Eq. (6) for block are now multiplied by the corresponding factors :

One may now ask whether, with the addition of such arbitrary links between deep variables, the number of variational parameters is sufficient to parametrize arbitrary MPS without the addition of the hidden layer. It turns out that this is not the case. With deep spin variables per block, the number of direct links, and therefore of complex weights and in the sum between blocks and is

Therefore, with the addition of the bias terms for the deep spin variables, the total number of complex weights per transfer matrix in the absence of hidden spin variables would be , which is insufficient to parametrize an arbitrary MPS tensor, for which in general would be required ( being the local Hilbert space dimension). If we require the network to be able to parametrize generic MPS, we must therefore add hidden spins that are traced out before the sum over the deep-variable configurations is computed, in order to have enough variational parameters.

### A.II. Open Boundary Conditions

Here we briefly mention how the GTMS can be contracted without the use of PBC, thereby allowing for the parametrization of MPS with open boundary conditions and their generalization to a GTMS that interpolates between them and RBM states.

Considering the network studied in Section III, described by the network energy of Eq. (4), we impose open boundary conditions simply by erasing (i.e. setting to zero) the couplings extending from tensor to . Applying the transfer matrix method to the contraction of a GTMS network with open boundaries, we find an exact expression for the wavefunction, which reads

(20) |

The transfer matrices in the bulk of this open boundary GTMS are the same as the ones defined in Eqs. (6-8). The right-boundary tensor () is a column vector with elements

(21) |

where

(22) |

The left-boundary tensor () is given by

(23) |

which is a row vector, where is the transfer matrix with elements .
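As an illustration of the open-boundary contraction, the following minimal Python sketch evaluates an amplitude of the generic form (row vector) × (bulk matrices) × (column vector). It assumes the MPS limit, in which each boundary vector and each bulk transfer matrix depends only on the local spin. The function name `obc_amplitude`, the array layout, and the random tensors are hypothetical choices made for this example and are not taken from the construction above.

```python
import numpy as np

def obc_amplitude(left, bulk, right, spins):
    """Contract left[s_1] @ bulk[0][s_2] @ ... @ right[s_N] from left to right.

    left, right: arrays of shape (d, D) holding one boundary vector per local state.
    bulk: list of N-2 arrays of shape (d, D, D), one matrix per local state.
    spins: sequence of N local state indices in range(d).
    """
    if len(spins) != len(bulk) + 2:
        raise ValueError("need exactly one spin per site")
    vec = left[spins[0]]                    # row vector, shape (D,)
    for tensor, s in zip(bulk, spins[1:-1]):
        vec = vec @ tensor[s]               # (D,) @ (D, D) -> (D,)
    return vec @ right[spins[-1]]           # inner product with the column vector

# Hypothetical example data: d = 2 local states, bond dimension D = 3, N = 5 sites.
rng = np.random.default_rng(0)
d, D, N = 2, 3, 5
left = rng.normal(size=(d, D))
bulk = [rng.normal(size=(d, D, D)) for _ in range(N - 2)]
right = rng.normal(size=(d, D))
spins = [0, 1, 1, 0, 1]
amp = obc_amplitude(left, bulk, right, spins)
```

Sweeping a vector through the chain costs of order N·D² operations per amplitude, whereas the periodic case requires full matrix products before the trace is taken.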

### A.III. Efficient Calculation of Second Rényi Entropy

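Here we review the algorithm for the computation of the second Rényi entropy applicable to Monte Carlo calculations, introduced in Hastings_RenyiEntropy (). Consider a system in a quantum state $|\psi\rangle$, and a bipartition of it into two subsystems $A$ and $B$. The second Rényi entropy of subsystem $A$ is given by

$$S_2(A) = -\ln \mathrm{Tr}\left(\rho_A^2\right), \tag{24}$$

with $\rho_A = \mathrm{Tr}_B\, |\psi\rangle\langle\psi|$ being the reduced density matrix of subsystem $A$. One can re-express $S_2$ in a form which is convenient for Monte Carlo calculation by considering an identical copy of the system with the same bipartition into subsystems $A$ and $B$, and defining a swap operator $\mathrm{Swap}_A$ acting on the tensor product of the Hilbert spaces of the two copies, which exchanges the configurations of subsystem $A$ between the two copies. More concretely, let $|s_1\rangle$ be a state in the coordinate basis of the first copy (a spin configuration), which can be written as $|s_1\rangle = |s_1^A s_1^B\rangle$, where $s_1^A$ and $s_1^B$ are configurations in $A$ and $B$ respectively. Similarly, $|s_2\rangle = |s_2^A s_2^B\rangle$ in the second copy. The swap operator acts on the tensor product of the two copies as $\mathrm{Swap}_A\, |s_1^A s_1^B\rangle \otimes |s_2^A s_2^B\rangle = |s_2^A s_1^B\rangle \otimes |s_1^A s_2^B\rangle$. Using this definition it is easy to show that the second Rényi entropy can be rewritten as

$$S_2(A) = -\ln \langle \mathrm{Swap}_A \rangle, \tag{25}$$

where the expectation value is taken over the product state $|\psi\rangle \otimes |\psi\rangle$ of the two copies. This expectation value reads

$$\langle \mathrm{Swap}_A \rangle = \sum_{s_1, s_2} \rho(s_1)\, \rho(s_2)\, \frac{\psi(s_2^A s_1^B)\, \psi(s_1^A s_2^B)}{\psi(s_1)\, \psi(s_2)}, \tag{26}$$

where, as before, $s_1 = s_1^A s_1^B$ in the first copy, $s_2 = s_2^A s_2^B$ in the second copy, and $\rho(s) = |\psi(s)|^2 / \sum_{s'} |\psi(s')|^2$ is the probability density for the configuration $s$. The double sum in Eq. (26) can be replaced by a sum over two sets of Monte Carlo samples drawn from $\rho$, allowing for an efficient calculation of $S_2$.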
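The swap estimator can be sketched in a few lines of Python for a small spin-1/2 system whose full state vector is available. This is a simplified illustration, not the implementation used for the numerical results: the samples are drawn directly from the exact probabilities instead of from a Markov chain, subsystem A is taken to be the first `n_a` spins, and the function names `renyi2_exact` and `renyi2_swap` are hypothetical.

```python
import numpy as np

def renyi2_exact(psi, n_a):
    """Reference value S_2 = -ln Tr(rho_A^2), with A the first n_a spins."""
    m = psi.reshape(2**n_a, -1)             # rows: A configurations, columns: B configurations
    rho_a = m @ m.conj().T
    rho_a = rho_a / np.trace(rho_a)         # normalize the reduced density matrix
    return -np.log(np.trace(rho_a @ rho_a).real)

def renyi2_swap(psi, n_a, n_samples, rng):
    """Estimate S_2 = -ln <Swap_A> from two independent sets of samples."""
    m = psi.reshape(2**n_a, -1)
    dim_b = m.shape[1]
    prob = np.abs(psi) ** 2
    prob = prob / prob.sum()
    # Assumption: direct sampling from |psi|^2 stands in for Monte Carlo sampling.
    s1 = rng.choice(psi.size, size=n_samples, p=prob)
    s2 = rng.choice(psi.size, size=n_samples, p=prob)
    a1, b1 = np.divmod(s1, dim_b)           # split each sample into its A and B parts
    a2, b2 = np.divmod(s2, dim_b)
    # Ratio of amplitudes with the A parts exchanged between the two copies.
    ratio = m[a2, b1] * m[a1, b2] / (m[a1, b1] * m[a2, b2])
    return -np.log(np.mean(ratio).real)

rng = np.random.default_rng(1)
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)    # maximally entangled pair
s2_bell_exact = renyi2_exact(bell, 1)
s2_bell_mc = renyi2_swap(bell, 1, 200_000, rng)

psi4 = rng.normal(size=16) + 1j * rng.normal(size=16)   # random 4-spin state
s2_rand_exact = renyi2_exact(psi4, 2)
s2_rand_mc = renyi2_swap(psi4, 2, 200_000, rng)
```

For the Bell pair the exact value is ln 2, and the sampled estimate should agree with it up to statistical error. Because the estimator averages ⟨Swap_A⟩ = Tr ρ_A² before taking the logarithm, the relative error grows as the purity becomes small, that is, for strongly entangled bipartitions.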

### A.IV. Translation Invariant GTMS

We discuss here how to implement translation invariance in a GTMS network. We start from the case of a translation invariant MPS parametrized by a GTMS network. For a translation invariant MPS, the individual tensors at each lattice site are independent of the site index , namely . This suggests that a GTMS network parametrizing a translation invariant MPS must likewise have the weights (the black and purple links in Fig. 2) independent of the site index, that is , , , , and (and ). Therefore, for a translation invariant MPS it is sufficient to calculate the matrices for the different values of once.
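The statement that the matrices need to be computed only once can be illustrated with a short Python sketch of a translation invariant MPS with periodic boundary conditions. The sketch assumes a single tensor `A` of shape (d, D, D) shared by all sites; the function name `ti_mps_amplitude` and the random tensor are hypothetical and serve only as an example.

```python
import numpy as np

def ti_mps_amplitude(A, spins):
    """Amplitude Tr(A[s_1] A[s_2] ... A[s_N]) with the same tensor on every site."""
    mat = np.eye(A.shape[1], dtype=A.dtype)
    for s in spins:
        mat = mat @ A[s]                    # reuse the precomputed matrix for local state s
    return np.trace(mat)

# Hypothetical example: d = 2 local states, bond dimension D = 3, N = 6 sites.
rng = np.random.default_rng(2)
A = rng.normal(size=(2, 3, 3))
spins = [0, 1, 1, 0, 1, 0]
amp = ti_mps_amplitude(A, spins)
amp_shifted = ti_mps_amplitude(A, spins[1:] + spins[:1])   # cyclically shifted configuration
```

By the cyclic property of the trace, `amp` and `amp_shifted` coincide, which is the translation invariance of the state expressed at the level of amplitudes.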

On top of the MPS weights, we can then add non-zero RBM weights in such a way that translation invariance is preserved. For this it is sufficient to let the weights depend only on the distance between the physical site and the position of the tensor connected by the link. This means , where one has to apply PBC by setting if is the number of physical sites of the system. However, since the transfer matrices now depend in general on the spin configuration of the whole system, one still needs to calculate all of the transfer matrices for each spin configuration.
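The distance-only dependence of the long-range weights can be sketched in Python as follows. This is a bookkeeping illustration under stated assumptions rather than the full GTMS: a single hypothetical weight vector `w` of length N is expanded into a matrix `W[j, i] = w[(i - j) mod N]`, and the quantity `W @ s` stands in for the configuration-dependent fields entering the transfer matrices. The name `circulant_weights` is likewise hypothetical.

```python
import numpy as np

def circulant_weights(w):
    """Build W[j, i] = w[(i - j) mod N], so weights depend only on the distance with PBC."""
    w = np.asarray(w)
    n = len(w)
    idx = (np.arange(n)[None, :] - np.arange(n)[:, None]) % n   # idx[j, i] = (i - j) mod n
    return w[idx]

# Hypothetical example with N = 6 physical sites and spins s_i = +/-1.
rng = np.random.default_rng(3)
w = rng.normal(size=6)
W = circulant_weights(w)
s = np.array([1, -1, -1, 1, 1, -1])
fields = W @ s                              # field felt at each tensor position
fields_shifted = W @ np.roll(s, 1)          # same fields for the translated configuration
```

Translating the spin configuration by one site simply translates the fields by one site, so `fields_shifted` equals `np.roll(fields, 1)`. The individual transfer matrices therefore still have to be recomputed for every configuration, but their cyclic product is unchanged under translations.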

## References

- (1) S. R. White, Density Matrix Formulation for Quantum Renormalization Groups, Phys. Rev. Lett. 69, 2863 (1992).
- (2) I. P. McCulloch, Infinite size density matrix renormalization group, revisited, arXiv:0804.2509 (2008).
- (3) U. Schollwöck, The Density-Matrix Renormalization Group in the Age of Matrix Product States, Ann. Phys. 326, 96 (2011).
- (4) S. Östlund, S. Rommer, Thermodynamic Limit of Density Matrix Renormalization, Phys. Rev. Lett. 75 (19), 3537 (1995).
- (5) F. Verstraete, D. Porras, J. I. Cirac, Density Matrix Renormalization Group and Periodic Boundary Conditions: A Quantum Information Perspective, Phys. Rev. Lett. 93, 227205 (2004).
- (6) F. Verstraete, J. I. Cirac, Valence-bond states for quantum computation, Phys. Rev. A 70, 060302(R) (2004).
- (7) L. Tagliacozzo, G. Evenbly, G. Vidal, Simulation of two-dimensional quantum systems using a tree tensor network that exploits the entropic area law, Phys. Rev. B 80, 235127 (2009).
- (8) V. Murg, F. Verstraete, Ö. Legeza, R. M. Noack, Simulating strongly correlated quantum systems with tree tensor networks, Phys. Rev. B 82, 205105 (2010).
- (9) M. Gerster, M. Rizzi, P. Silvi, M. Dalmonte, S. Montangero, Fractional quantum Hall effect in the interacting Hofstadter model via tensor networks, Phys. Rev. B 96, 195123 (2017).
- (10) G. Vidal, Class of Quantum Many-Body States that Can Be Efficiently Simulated, Phys. Rev. Lett. 101, 110501 (2008).
- (11) F. Verstraete, V. Murg, J.I. Cirac, Matrix Product States, Projected Entangled Pair States, and Variational Renormalization Group Methods for Quantum Spin Systems, Advances in Physics, 57 (2), 143-224 (2008).
- (12) J. Eisert, M. Cramer, M. B. Plenio, Colloquium: Area Laws for the Entanglement Entropy, Rev. Mod. Phys. 82, 277 (2010).
- (13) M. B. Hastings, An Area Law for One-Dimensional Quantum Systems, J. Stat. Mech. (2007) P08024.
- (14) L. Amico, R. Fazio, A. Osterloh, V. Vedral, Entanglement in Many-Body Systems, Rev. Mod. Phys. 80, 517 (2008).
- (15) G. Vidal, J. I. Latorre, E. Rico, A. Kitaev, Entanglement in Quantum Critical Phenomena, Phys. Rev. Lett. 90, 227902 (2003).
- (16) U. Schollwöck, V. Meden, W. Metzner, K. Schönhammer, DMRG Studies of Impurities in Luttinger Liquids, Progress of Theoretical Physics Supplement 145 (1), 312-319 (2002).
- (17) N. Laflorencie, E. S. Sørensen, M.-S. Chang, I. Affleck, Boundary Effects in the Critical Scaling of Entanglement Entropy in 1D Systems, Phys. Rev. Lett. 96, 100603 (2006).
- (18) G. Vidal, Efficient Simulation of One-Dimensional Quantum Many-Body Systems, Phys. Rev. Lett. 93 (4), 040502 (2004).
- (19) S. R. White, A. E. Feiguin, Real-Time Evolution Using the Density Matrix Renormalization Group, Phys. Rev. Lett. 93 (7), 076401 (2004).
- (20) M. P. Zaletel, R. S. K. Mong, C. Karrasch, J. E. Moore, F. Pollmann, Time-evolving a matrix product state with long-ranged interactions, Phys. Rev. B 91, 165112 (2015).
- (21) R. Jastrow, Many-Body Problem with Strong Forces, Phys. Rev. 98 (5), 1479-1484 (1955).
- (22) M. C. Gutzwiller, Correlation of Electrons in a Narrow Band, Phys. Rev. 137 (6A), A1726-A1735 (1965).
- (23) W. L. McMillan, Ground State of Liquid He$^4$, Phys. Rev. 138 (2A), A442-A451 (1965).
- (24) D. Ceperley, G. V. Chester, M. H. Kalos, Monte Carlo simulation of a many-fermion study, Phys. Rev. B 16 (7), 3081-3099 (1977).
- (25) S. Sorella, Wave function optimization in the variational Monte Carlo method, Phys. Rev. B 71, 241103(R) (2005).
- (26) M. Casula, C. Attaccalite, S. Sorella, Correlated geminal wave function for molecules: An efficient resonating valence bond approach, J. Chem. Phys. 121, 7110 (2004).
- (27) Y. LeCun, Y. Bengio, G. Hinton, Deep Learning, Nature 521, 436-444 (2015).
- (28) J. Carrasquilla, R. G. Melko, Machine Learning Phases of Matter, Nature Physics 13, 431-434 (2017).
- (29) L. Wang, Discovering Phase Transitions with Unsupervised Learning, Phys. Rev. B 94, 195105 (2016).
- (30) N. Le Roux, Y. Bengio, Representational Power of Restricted Boltzmann Machines and Deep Belief Networks, Neural Comput. 20, 1631-1649 (2008).
- (31) G. Carleo, M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
- (32) D.-L. Deng, X. Li, S. Das Sarma, Quantum Entanglement in Neural Network States, Phys. Rev. X 7, 021021 (2017).
- (33) D.-L. Deng, X. Li, S. Das Sarma, Machine learning topological states, Phys. Rev. B 96, 195145 (2017).
- (34) Y. Nomura, A. S. Darmawan, Y. Yamaji, and M. Imada, Restricted Boltzmann machine learning for solving strongly correlated quantum systems, Phys. Rev. B 96, 205152 (2017).
- (35) H. Saito and M. Kato, Machine Learning Technique to Find Quantum Many-Body Ground States of Bosons on a Lattice, J. Phys. Soc. Jpn. 87, 014001 (2018).
- (36) R. Kaubruegger, L. Pastori, J. C. Budich, Chiral Topological Phases from Artificial Neural Networks, Phys. Rev. B 97, 195136 (2018).
- (37) R. Salakhutdinov, G. R. Hinton, Deep Boltzmann Machines, PMLR 5, 448-455 (2009).
- (38) X. Gao, L.-M. Duan, Efficient representation of quantum many-body states with deep neural networks, Nature Communications 8, 662 (2017).
- (39) G. Carleo, Y. Nomura, M. Imada, Constructing exact representations of quantum many-body systems with deep neural networks, arXiv:1802.09558 (2018).
- (40) Y. Huang, J. E. Moore, Neural network representation of tensor network and chiral states, arXiv:1701.06246 (2017).
- (41) I. Glasser, N. Pancotti, M. August, I. D. Rodriguez, and J. I. Cirac, Neural Networks Quantum States, String-Bond States and chiral topological states, Phys. Rev. X 8, 011006 (2018).
- (42) S. R. Clark, Unifying Neural-network Quantum States and Correlator Product States via Tensor Networks, J. Phys. A: Math. Theor. 51, 135301 (2018).
- (43) J. Chen, S. Cheng, H. Xie, L. Wang, and T. Xiang, Equivalence of restricted Boltzmann machines and tensor network states, Phys. Rev. B 97, 085104 (2018).
- (44) I. Affleck, T. Kennedy, E. H. Lieb, H. Tasaki, Rigorous results on valence-bond ground states in antiferromagnets, Phys. Rev. Lett. 59 (7), 799-802 (1987).
- (45) I. Affleck, T. Kennedy, E. H. Lieb, H. Tasaki, Valence bond ground states in isotropic quantum antiferromagnets, Commun. Math. Phys. 115, 477 (1988).
- (46) M. den Nijs, K. Rommelse, Preroughening transitions in crystal surfaces and valence-bond phases in quantum spin chains, Phys. Rev. B 40, 4709 (1989).
- (47) L. Bottou, Large-Scale Machine Learning with Stochastic Gradient Descent, Proceedings of the 19th International Conference on Computational Statistics, 177-187 (2010).
- (48) L. Bottou, F. E. Curtis, J. Nocedal, Optimization Methods for Large-Scale Machine Learning, SIAM Review 60 (2), 223-311 (2018).
- (49) J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research 12, 2121 (2011).
- (50) D. P. Kingma, J. L. Ba, Adam: A method for stochastic optimization, arXiv:1412.6980 (2014).
- (51) M. B. Hastings, I. González, A. B. Kallin, R. G. Melko, Measuring Renyi Entanglement Entropy in Quantum Monte Carlo Simulations, Phys. Rev. Lett. 104, 157201 (2010).