Equivalence classes and local asymptotic normality in system identification for quantum Markov chains

Madalin Guta madalin.guta@nottingham.ac.uk  and  Jukka Kiukas jukka.kiukas@nottingham.ac.uk School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
Abstract.

We consider the problems of identifying and estimating dynamical parameters of an ergodic quantum Markov chain, when only the stationary output is accessible for measurements. On the identifiability question, we show that the knowledge of the output state completely fixes the dynamics up to a ‘coordinate transformation’ consisting of a multiplication by a phase and a unitary conjugation of the Kraus operators. When the dynamics depends on an unknown parameter, we show that the latter can be estimated at the ‘standard’ rate $n^{-1/2}$, and give an explicit expression of the (asymptotic) quantum Fisher information of the output, which is proportional to the Markov variance of a certain ‘generator’. More generally, we show that the output is locally asymptotically normal, i.e. it can be approximated by a simple quantum Gaussian model consisting of a coherent state whose mean is related to the unknown parameter. As a consistency check we prove that a parameter related to the ‘coordinate transformation’ unitaries has zero quantum Fisher information.

1. Introduction

Quantum system identification has recently received significant attention due to its relevance in understanding complex quantum dynamical systems and the development of quantum technologies [13]. Among the different setups under investigation, we mention channel tomography [16], Hamiltonian identification [6, 11], and the estimation of the Lindblad generator of an open dynamical system [27].

In this paper we study two system identification problems set in a context which is particularly relevant for quantum control engineering applications [29, 34], namely the input-output formalism of quantum Markov dynamics, which in the continuous time framework has been studied in [17]. Our setup is that of a discrete time quantum Markov chain consisting of a quantum system (or ‘memory’) interacting successively with input ancillas (or ‘noise units’) which are identically prepared in a known pure state (see Figure 1). The goal is to learn about the dynamics encoded in the unitary operator $U$, by measuring the quantum output consisting of the noise units after the interaction. We will assume that we do not have direct access to the ‘memory’ system, and we do not control the input state, both assumptions corresponding to realistic experimental constraints, e.g. in atom maser experiments [23]. We will also assume that the system’s transition operator is irreducible and aperiodic, so that in the long time limit the dynamics has forgotten the initial state and has reached stationarity.

Figure 1. Illustration of a quantum Markov chain. A sequence of identically prepared input ancillas (or ‘noise units’) with space $\mathcal{K}$ interact successively (from right to left) with a system (or ‘memory’) via the unitary $U$. After $n$ interactions the output ‘noise units’ are correlated with each other and with the system, and carry information about the unitary $U$.

In the stationary regime, the output is in a finitely correlated state [15] denoted which is completely determined by the isometry mapping the system into a system-ancilla space in one evolution step

In particular, the restriction of the output state to the first noise units in the stationary regime is given by , where is the system’s stationary state and is the -steps iteration of the isometry (see below for the precise definition).

The first problem we address is that of finding the equivalence classes of isometries with identical output states. In Theorem 2 we show that two isometries $V_1$ and $V_2$ have identical output states if and only if they are related by an arbitrary complex phase $c$ and a conjugation with a local unitary $W$ on the system:

$$V_2 = c\,(W\otimes\mathbf{1})\,V_1 W^*.$$

We point out that a similar result holds in the following ‘classical’ setup. A (finite) hidden Markov chain is a discrete time stochastic process consisting of an ‘underlying’ Markov chain , with state space and transition matrix , and a sequence of random variables with values in , which ‘depend’ on the underlying Markov dynamics. More precisely, are independent when conditioned on , and depends on the Markov chain only through , with conditional distribution . We consider that is ‘hidden’ (not accessible to observations) and we observe the sequence whose marginal distribution depends only on the matrices and and the initial state of the Markov chain

Under ergodicity conditions similar to ours, and certain additional generic conditions, Petrie [30] has shown that by observing in the stationary regime, we can identify the matrices up to a permutation of the labels of the hidden states in . Since the output process of a quantum Markov chain can be interpreted as a quantum analogue of a hidden Markov process, our result can be seen as a quantum extension of [30].
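The classical statement can be illustrated with a small numerical check. The sketch below is not taken from [30]; it uses a hypothetical two-state chain (the matrices `P`, `B` and the stationary vector `pi` are made-up values) and verifies only the easy direction, namely that relabelling the hidden states changes the matrices but leaves the distribution of the observed sequence unchanged.

```python
import itertools

def sequence_probability(init, trans, emit, ys):
    """Forward algorithm: P(Y_1..Y_n = ys) for a hidden chain with initial law `init`."""
    states = range(len(init))
    # alpha[x] = P(Y_1 = ys[0], X_1 = x)
    alpha = [init[x] * emit[x][ys[0]] for x in states]
    for y in ys[1:]:
        # Propagate the hidden chain by one step, then weight by the emission.
        alpha = [sum(alpha[x] * trans[x][x2] for x in states) * emit[x2][y]
                 for x2 in states]
    return sum(alpha)

def relabel(init, trans, emit, perm):
    """Rename hidden state x as perm[x]; the observed alphabet is untouched."""
    m = len(init)
    inv = [perm.index(x) for x in range(m)]
    new_init = [init[inv[x]] for x in range(m)]
    new_trans = [[trans[inv[x]][inv[x2]] for x2 in range(m)] for x in range(m)]
    new_emit = [list(emit[inv[x]]) for x in range(m)]
    return new_init, new_trans, new_emit

# Hypothetical example: pi is the stationary distribution of P.
P = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
pi = [2 / 3, 1 / 3]

pi2, P2, B2 = relabel(pi, P, B, perm=[1, 0])
sequences = list(itertools.product(range(2), repeat=3))
max_gap = max(
    abs(sequence_probability(pi, P, B, ys) - sequence_probability(pi2, P2, B2, ys))
    for ys in sequences
)
total = sum(sequence_probability(pi, P, B, ys) for ys in sequences)
```

Here `max_gap` measures the largest difference between the two observed laws over all length-3 sequences, and `total` is a sanity check that the probabilities sum to one.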

After tackling the identifiability question, the second problem we consider is that of estimating the isometry , or more precisely finding how the output state varies with , and how much statistical information it contains. Our approach is based on asymptotic statistics, and in particular on the concept of local asymptotic normality [33]. To illustrate this in the case of the hidden Markov model, we assume that the matrices depend smoothly on an unknown real parameter which we would like to estimate, and denote by the probability distribution of the data . For large the unknown parameter can be ‘localised’ in a neighbourhood of size of a fixed value (e.g. a rough estimator), such that with a ‘local parameter’ to be estimated. Local asymptotic normality means that the statistical model can be approximated by a simple Gaussian model consisting of a single sample from the normal distribution with mean and fixed variance [4]. Closely related to this, and perhaps more relevant for the practitioner, is the fact that the maximum likelihood estimator is asymptotically normal [12, 5], i.e.

where the convergence holds in distribution, when . In particular, the mean square error scales as with the best possible constant which is the inverse limiting Fisher information per sample of the hidden Markov chain, at .

Returning to our quantum system identification problem, we note that in order to find good estimators we need to deal with the optimisation problem of finding the ‘most informative’ output measurement. In quantum state tomography this problem has been approached by developing quantum analogues of several key statistical notions such as quantum Fisher information [25, 24], and quantum local asymptotic normality [28, 19]. Equipped with these mathematical tools it is possible to solve the optimal state estimation problem in an asymptotic regime, by transforming it into a simpler Gaussian estimation one [18]. We will therefore apply the same strategy for the Markov system identification problem, but we note that since the output ‘noise units’ are not independent, we cannot use the existing state tomography theory in a straightforward fashion. Instead, we developed a notion of convergence of pure states statistical models based on the simple idea that two models are close to each other if the inner products of all corresponding pairs of vectors are very similar. To illustrate this, let be the joint system and output state after steps, with initial condition . In Theorem 4 we show that

where the right side is the inner product of two coherent states of the limit Gaussian model , with quantum Fisher information . The latter can be computed explicitly and can be interpreted as a certain Markov variance of the ‘generator’ , similarly to the well known formula for unitary rotation families of states. Building on this result, we show that the convergence to the Gaussian model can be extended to the mixed stationary output state itself, and can be formulated in a strong, operational way by means of quantum channels connecting the models in both directions [28]. In this statistical picture, the local unitary conjugation corresponding to changes inside an equivalence class has quantum Fisher information equal to zero, in agreement with the fact that such parameters are not identifiable.

The paper is organised as follows. In section 2 we review the formalism of quantum Markov chains, discuss the ergodicity assumptions and introduce the ‘Markov covariance’ inner product, which will be used later in interpreting the quantum Fisher information. In section 3 we prove Theorem 2 which characterises the equivalence classes of chains with identical outputs. Section 4 is a brief introduction to the statistical background of the paper, centred around the notion of convergence of quantum statistical models, and local asymptotic normality. The main result here is Lemma 5 which can be used to convert weak convergence of pure states models into strong convergence. Section 5 contains several local asymptotic normality results for different versions of the output state. In all cases, the quantum model described by the output state can be approximated by a quantum Gaussian model consisting of a one parameter family of coherent states. Some of the more technical proofs are collected in section 6.

A special case of the present local asymptotic normality result has been obtained in [22]. A detailed study comparing the quantum Fisher information with classical Fisher informations of various counting statistics for the atom maser model can be found in [9, 8]. The continuous time version of the local asymptotic normality result will appear in a forthcoming publication [7].

2. Quantum Markov chains

We begin by describing the general framework of quantum Markov chains, and establishing some of the notation used in the paper. In the second part of this section we introduce a positive inner product describing the variance of Markov fluctuation operators, which will be used later for interpreting the limiting quantum Fisher information.

2.1. Output state, Markov transition operator and the quantum Perron-Frobenius Theorem

A discrete time quantum Markov chain consists of a ‘system’ with Hilbert space which interacts successively with ‘noise units’ (or ancillas, quantum coins) with identical Hilbert spaces , cf. Figure 1. The noise units are initially prepared independently, in the same state , and the system has initial state . The interaction is described by a unitary operator , such that after one step, the joint state of system and the first unit is , while the remaining units are still in the initial state. After the interaction with the first noise unit, the system moves one step to the left and the same operation is repeated between system and the second unit, and so on. Alternatively, one can think that the system is fixed and the chain is shifted to the right. After steps the state of the system and the noise units is therefore

where is the product of unitaries with denoting the copy of which acts on system and the -th noise unit, counting from right to left according to the dynamics of Figure 1.

Our goal is to investigate how the output state (reduced state of the noise units after the interaction) depends on the unitary and in particular which dynamical parameters can be identified by performing measurements on the output. With this in mind we note that the state can be expressed as

where is the isometry given by , and is obtained iteratively from . Therefore, since the input state is fixed and known, the output state depends on only through the isometry which will be the focus of our attention for most of the paper.

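Let us fix orthonormal bases $\{|e_a\rangle\}$ and $\{|i\rangle\}$ in $\mathcal{H}$ and $\mathcal{K}$ respectively, and let $\{K_i\}$ be the collection of Kraus operators acting on $\mathcal{H}$, uniquely defined by the equation

$$V = \sum_i K_i\otimes|i\rangle \qquad (1)$$

which satisfy the normalisation condition

$$\sum_i K_i^*K_i = \mathbf{1}. \qquad (2)$$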
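The relation between the unitary, the isometry and the Kraus operators can be made concrete numerically. The following is a minimal sketch, assuming the tensor ordering system ⊗ noise, a randomly drawn unitary, and the first basis vector as input state; the names `random_unitary` and `isometry_and_kraus` are illustrative and do not come from the paper.

```python
import numpy as np

def random_unitary(dim, rng):
    """Unitary obtained from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

def isometry_and_kraus(U, chi, d, k):
    """Return V = U(1 ⊗ |chi>) and the Kraus operators K_i = (1 ⊗ <i|) V."""
    # embed maps |phi> to |phi> ⊗ |chi>; it is a (d*k, d) matrix.
    embed = np.kron(np.eye(d), chi.reshape(k, 1))
    V = U @ embed
    # With ordering system ⊗ noise, the row index of V is a*k + i.
    kraus = [V.reshape(d, k, d)[:, i, :] for i in range(k)]
    return V, kraus

d, k = 2, 3                      # assumed dimensions of system and noise unit
rng = np.random.default_rng(1)
U = random_unitary(d * k, rng)
chi = np.zeros(k, dtype=complex)
chi[0] = 1.0                     # assumed input state |chi> = |0>
V, kraus = isometry_and_kraus(U, chi, d, k)

# Normalisation condition: sum_i K_i† K_i should be the identity.
normalisation = sum(K.conj().T @ K for K in kraus)
# Rebuilding V from the Kraus operators: V = sum_i K_i ⊗ |i>.
V_rebuilt = sum(np.kron(K, np.eye(k)[:, [i]]) for i, K in enumerate(kraus))
```

The two final quantities check the normalisation condition and the one-to-one correspondence between the Kraus operators and the isometry.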

Conversely, any set satisfying (2) has a unique associated isometry given by (1). The system-output state can be written more explicitly in a matrix product form

where denotes the set of multi-indices , and for each we denote by the -step Kraus operator (note the backwards ordering)

For later purposes we define the concatenation of multi-indices and via

so that .

The properties of the system-output state depend crucially on the irreducibility of the restricted system dynamics. Let and be the algebras of system and noise observables, and let us denote by their preduals (linear spans of density matrices). In the Schrödinger picture the reduced evolution of the system is obtained by iterating the transition operator (quantum channel)

as it can be seen from the identity

The dual (Heisenberg) evolution is given by the unit preserving CP map

$$T(X) := V^*(X\otimes\mathbf{1})V = \sum_i K_i^*XK_i. \qquad (3)$$

As for classical Markov chains, the system’s space may possess non-trivial invariant subspaces, and have multiple stationary states. We will restrict our attention to such ‘building blocks’ defined by the following properties.

Definition 1.

A channel $T$ is called

  • irreducible if there exists $n\in\mathbb{N}$ such that $(\mathrm{Id}+T)^n$ is strictly positive, i.e. $(\mathrm{Id}+T)^n(X)>0$ for all nonzero positive operators $X$.

  • primitive, if there exists an $n\in\mathbb{N}$, such that $T^n$ is strictly positive.

We call an isometry $V$ irreducible (primitive), if the associated channel $T$ defined in (3) is irreducible (primitive).

Clearly primitivity is a stronger requirement than irreducibility. The following theorem (see [14, 32]) collects the essential properties of irreducible (primitive) quantum transition operators needed in this paper.

Theorem 1 (quantum Perron-Frobenius).

Let $T$ be a completely positive map and let $r:=\max_i|\lambda_i|$ be its spectral radius, where $\lambda_1,\lambda_2,\dots$ are the (complex) eigenvalues of $T$ arranged in decreasing order of magnitude. Then

  • $r$ is an eigenvalue of $T$, and it has a positive eigenvector.

  • If additionally, $T$ is unit preserving then $r=1$ with eigenvector $\mathbf{1}$.

  • If additionally, $T$ is irreducible then $r$ is a non-degenerate eigenvalue for $T$ and $T_*$, and both corresponding eigenvectors are strictly positive.

  • If additionally, $T$ is primitive, then $|\lambda_i|<r$ for all other eigenvalues than $r$.

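We mainly need the following corollary of this theorem: If a channel $T$ is irreducible then it has a unique full rank stationary state $\rho_{\rm ss}$, and if $T$ is also primitive then any state $\sigma$ converges under the Schrödinger picture transition operator $T_*$ to the stationary state in the long run (mixing or ergodicity property);

$$\lim_{n\to\infty}T_*^n(\sigma) = \rho_{\rm ss},$$

or in the Heisenberg picture

$$\lim_{n\to\infty}T^n(X) = \mathrm{tr}(\rho_{\rm ss}X)\,\mathbf{1}. \qquad (4)$$

Moreover, the speed of convergence is exponential, the rate depending on the second largest (in modulus) eigenvalue of $T$.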
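The mixing property can be observed numerically. The sketch below assumes a randomly drawn isometry (which is primitive in the generic case, an assumption rather than something the code proves), represents the Schrödinger picture transition operator as a matrix on vectorised density matrices, and iterates it on an arbitrary initial state. All function names are illustrative.

```python
import numpy as np

def random_kraus(d, k, seed=0):
    """Kraus operators of a random isometry C^d -> C^d ⊗ C^k (illustrative only)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(d * k, d)) + 1j * rng.normal(size=(d * k, d))
    V, _ = np.linalg.qr(z)  # reduced QR: V has orthonormal columns, so V†V = 1
    return [V.reshape(d, k, d)[:, i, :] for i in range(k)]

def apply_channel(kraus, rho):
    """One step of the Schrödinger evolution rho -> sum_i K_i rho K_i†."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def stationary_state(kraus):
    """Stationary state and the spectrum sorted by decreasing modulus."""
    d = kraus[0].shape[0]
    # Row-major vectorisation: vec(K rho K†) = (K ⊗ conj(K)) vec(rho).
    matrix = sum(np.kron(K, K.conj()) for K in kraus)
    vals, vecs = np.linalg.eig(matrix)
    order = np.argsort(-np.abs(vals))
    rho = vecs[:, order[0]].reshape(d, d)
    rho = rho / np.trace(rho)
    return (rho + rho.conj().T) / 2, vals[order]

kraus = random_kraus(d=2, k=3)
rho_ss, spectrum = stationary_state(kraus)

sigma = np.diag([1.0, 0.0]).astype(complex)   # arbitrary initial state
distances = []
for _ in range(500):
    sigma = apply_channel(kraus, sigma)
    distances.append(np.linalg.norm(sigma - rho_ss))
```

In such experiments the entries of `distances` are expected to decay roughly like powers of `abs(spectrum[1])`, in line with the exponential convergence stated above.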

2.2. The Markov covariance inner product

In this section we introduce an inner product playing the role of a ‘Markov covariance’, whose relevance will become apparent when computing the quantum Fisher information of the output. On a deeper level, we conjecture that the covariance is accompanied by a Central Limit Theorem, but this topic will not be pursued here (see [22, 21] for the special case of output observables).

In this section we assume that is a primitive isometry, and we denote by the expectation with respect to the input state . Let be the subalgebra of observables of the system together with the -th (from right to left) copy of the noise unit , and for any we denote by its version in . If we define the Heisenberg evolved operator after steps by

and note that its stationary mean does not depend on and is given by

where is the conditional expectation

For all mean zero operators we define the associated fluctuations operator by

Finally, let and define to be the inverse of the restriction of the map to . Note that is well defined, as it follows from the fact that is primitive and has a unique eigenvector with eigenvalue .

Lemma 1.

Let be the linear space of mean zero operators . For any and any the following limit exists

and defines a positive inner product on .

Moreover has the explicit expression

Proof.

For simplicity we denote by the expectation with respect to the state . By expanding the fluctuation operators we obtain

Now, since converges to exponentially fast as , the first term in the last equality converges to . Similarly, one of the summation indices in the second term can be changed to such that it can be written as

For any fixed the inner sum is dominated by terms with large and by the same stationarity argument as above, it converges to

Now, by assumption, belongs to the domain of so that the sum converges

The two facts together imply that

A similar reasoning applies to the third term of the sum. ∎

The previous lemma would also follow from the next conjecture, which will not be investigated in this paper.

Conjecture 1 (Central Limit for quantum Markov chains).

Let be a selfadjoint operator with The fluctuation operators satisfy the Central Limit Theorem

where is the centred Gaussian distribution with variance , and the convergence holds as in distribution with respect to the state

The special case where has been proven in [22]. Physically, it means that time averages of measurements of the observable are asymptotically Gaussian.

3. The equivalence class of quantum Markov chains with identical outputs

In Theorem 2 of this section we answer the first question set in the introduction: which (mixing) quantum Markov chains have the same output states in the stationary regime?

Definition 2.

Let , be two primitive isometries, where are finite-dimensional Hilbert spaces. Let denote the respective stationary states. We call and equivalent, if they have the same output states in the stationary regime, that is,

We begin with a straightforward observation that for any given primitive isometry $V$, a number $c\in\mathbb{C}$ with $|c|=1$, and a unitary $W$ on the system space, the isometry $c\,(W\otimes\mathbf{1})VW^*$ is equivalent to $V$, and is primitive (with the stationary state $W\rho_{\rm ss}W^*$, where $\rho_{\rm ss}$ is the stationary state of $V$). The following Lemma characterises the case where two given primitive isometries are related this way. The proof is inspired by a similar argument from [1].
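The observation can be checked numerically on a small example. The sketch below assumes random Kraus operators with system and noise dimension 2, computes the stationary output state of three noise units in matrix product form, with entries $\mathrm{tr}(K_{\mathbf{i}}\rho K_{\mathbf{j}}^*)$, and compares it with the output of the phase-and-unitary transformed chain. Function names and the chosen phase are illustrative.

```python
import itertools
import numpy as np

def random_kraus(d, k, rng):
    """Kraus operators of a random isometry C^d -> C^d ⊗ C^k (illustrative only)."""
    z = rng.normal(size=(d * k, d)) + 1j * rng.normal(size=(d * k, d))
    V, _ = np.linalg.qr(z)
    return [V.reshape(d, k, d)[:, i, :] for i in range(k)]

def stationary_state(kraus):
    """Fixed point of rho -> sum_i K_i rho K_i†, normalised to unit trace."""
    d = kraus[0].shape[0]
    vals, vecs = np.linalg.eig(sum(np.kron(K, K.conj()) for K in kraus))
    rho = vecs[:, np.argmax(np.abs(vals))].reshape(d, d)
    return rho / np.trace(rho)

def output_state(kraus, n):
    """Stationary output of n noise units, entries tr(K_i rho K_j†)."""
    rho = stationary_state(kraus)
    d = kraus[0].shape[0]
    words = []
    for idx in itertools.product(range(len(kraus)), repeat=n):
        K = np.eye(d, dtype=complex)
        for i in idx:            # K_{i_1} acts first, K_{i_n} last
            K = kraus[i] @ K
        words.append(K)
    return np.array([[np.trace(A @ rho @ B.conj().T) for B in words] for A in words])

rng = np.random.default_rng(2)
kraus = random_kraus(2, 2, rng)
W, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
c = np.exp(0.7j)                                   # arbitrary phase
kraus_gauge = [c * W @ K @ W.conj().T for K in kraus]
kraus_other = random_kraus(2, 2, rng)              # unrelated chain, for contrast

out = output_state(kraus, 3)
out_gauge = output_state(kraus_gauge, 3)
out_other = output_state(kraus_other, 3)
```

One would expect `out` and `out_gauge` to agree up to rounding, while `out_other`, built from an unrelated isometry, differs in the generic case.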

Lemma 2.

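Let $V_i:\mathcal{H}_i\to\mathcal{H}_i\otimes\mathcal{K}$, $i=1,2$, be two primitive isometries, and define the maps

$$T_{ij}(X) := V_i^*(X\otimes\mathbf{1})V_j$$

for $i,j=1,2$. Then the following conditions are equivalent:

  (i) $T_{12}$ has an eigenvalue of modulus one;

  (ii) $T_{21}$ has an eigenvalue of modulus one;

  (iii) there exists a unitary operator $W:\mathcal{H}_1\to\mathcal{H}_2$, and $c\in\mathbb{C}$ with $|c|=1$, such that

$$V_2 = c\,(W\otimes\mathbf{1})V_1W^*.$$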

In that case, and .

Proof.

Conditions (i) and (ii) are clearly equivalent: if with some and , then . Assuming (iii) we have

i.e. (i) holds, with the corresponding eigenvalue. Thus, the only nontrivial implication is (i) ⇒ (iii).

Assume (i), let be an eigenvalue of modulus one, and such that . Then from the definition of it follows immediately that . Since is an isometry, we have (in fact, is a projection). Hence,

so by positivity of ,

Let be the projection onto the eigenspace of corresponding to its largest eigenvalue . Now by the primitivity of . Hence,

This implies that , i.e. is supported in the projection . But has full rank in , so , and, consequently, . By proceeding in exactly the same way using the primitive channel , we show that . Denote , and . Then is a unitary operator between the two Hilbert spaces and in particular, . Moreover,

For a unit vector , this implies

(5)

But by isometry, so there is actually equality in the Cauchy-Schwarz inequality

This is possible only if the two vectors are linearly dependent, i.e. there is a constant such that

Putting this back in (5), we see that . Since was arbitrary, we have (iii), and the proof is complete. ∎

The following lemma deals with the case where all eigenvalues of the map of the preceding lemma have modulus strictly less than one.

Lemma 3.

Let , , be two primitive isometries, and define as in Lemma 2. Let be the associated output states. Then the limits

exist and are strictly positive. If , or, equivalently, , then

(6)
Proof.

Let be the stationary state of , with spectral decomposition

The output states decompose as follows:

where

Now

so we can write

Since the isometries are primitive, we have for any by (4). Now, on the one hand, by choosing we get

On the other hand, assuming for , we get (6). ∎

We are now ready to prove the main result of this section.

Theorem 2.

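Two primitive isometries $V_i:\mathcal{H}_i\to\mathcal{H}_i\otimes\mathcal{K}$, $i=1,2$, are equivalent if and only if there exists a unitary operator $W:\mathcal{H}_1\to\mathcal{H}_2$, and a complex number $c$ with $|c|=1$, such that

$$V_2 = c\,(W\otimes\mathbf{1})V_1W^*. \qquad (7)$$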
Proof.

As mentioned above, the ‘if’ part is straightforward. Assume now that and are equivalent, and define as in Lemma 2. We consider the direct sum isometry

We identify the elements in the usual way with block matrices

where , the set of linear operators . This identifies as a subspace , and each of these four subspaces is invariant under the channel associated with . Explicitly, we have

(8)

In particular, any eigenvalue of is also an eigenvalue of , because the subspaces are invariant. Since is completely positive and unital by construction, all eigenvalues of have modulus at most one. If all eigenvalues of have modulus strictly less than one, then , which according to Lemma 3 contradicts the assumption that the output states are equal. Hence has an eigenvalue of modulus one, so Lemma 2 concludes the proof. ∎
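The spectral criterion used in the proof lends itself to a quick numerical experiment. The sketch below assumes the off-diagonal map has the form $X\mapsto\sum_i K_i^{(1)*}XK_i^{(2)}$ in terms of the Kraus operators of the two isometries (an assumption consistent with the block structure described above), and computes its largest eigenvalue modulus for an equivalent pair and for an unrelated random pair. Names and dimensions are illustrative.

```python
import numpy as np

def random_kraus(d, k, rng):
    """Kraus operators of a random isometry C^d -> C^d ⊗ C^k (illustrative only)."""
    z = rng.normal(size=(d * k, d)) + 1j * rng.normal(size=(d * k, d))
    V, _ = np.linalg.qr(z)
    return [V.reshape(d, k, d)[:, i, :] for i in range(k)]

def mixed_spectral_radius(kraus1, kraus2):
    """Largest eigenvalue modulus of the map X -> sum_i K1_i† X K2_i."""
    # Row-major vectorisation: vec(A X B) = (A ⊗ B^T) vec(X).
    matrix = sum(np.kron(K1.conj().T, K2.T) for K1, K2 in zip(kraus1, kraus2))
    return np.max(np.abs(np.linalg.eigvals(matrix)))

rng = np.random.default_rng(3)
kraus = random_kraus(2, 2, rng)

# Equivalent chain: phase c and unitary conjugation W of the Kraus operators.
W, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
c = np.exp(1.3j)
kraus_equiv = [c * W @ K @ W.conj().T for K in kraus]

# Unrelated chain drawn independently.
kraus_other = random_kraus(2, 2, rng)

radius_equiv = mixed_spectral_radius(kraus, kraus_equiv)
radius_other = mixed_spectral_radius(kraus, kraus_other)
```

For the equivalent pair, $W^*$ is an eigenvector with eigenvalue $c$, so `radius_equiv` should equal one up to rounding; for the unrelated pair one expects a value strictly below one.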

Remark 1.

According to [32], and applying a reconstruction procedure analogous to the one given in [2], one observes that the finitely correlated output states , , associated to a primitive isometry are completely determined by one state , provided that , where is the number of linearly independent Kraus operators. Hence, for any two isometries , , there exists a finite such that if and only if (7) holds for some phase factor and unitary .

4. Intermezzo on convergence of quantum statistical models and local asymptotic normality

In this section we introduce the basic elements of a theory of quantum statistical models, in as much as it is necessary to understand the second main result presented in the next section: the local asymptotic normality of the quantum Markov chain’s output, and the associated quantum Fisher information. This section is not directly connected to the Markov set-up and can be skipped at a first reading.

Definition 3.

Let be a parameter space. A quantum statistical model over is a family

of density matrices on a Hilbert space , which are indexed by an unknown parameter .

The typical quantum statistical problem associated to a model is to estimate the unknown parameter $\theta$ by measuring a system prepared in the state $\rho_\theta$, and constructing an estimator $\hat\theta_n$ based on the measurement outcome. In practice, the problem typically involves an additional parameter $n$ describing the ‘sample size’, and ‘good estimators’ have the property that the estimation error (e.g. the mean square error $\mathbb{E}[(\hat\theta_n-\theta)^2]$) converges to zero as $n\to\infty$. The samples may be independent and identical as in quantum state tomography, or may consist of correlated systems depending on an unknown dynamical parameter, as considered in this paper. The rate of convergence is typically of the order $n^{-1}$ and has a constant factor equal to the inverse of the (asymptotic) Fisher information, the latter describing the amount of statistical information per sample.

Asymptotic statistics deals with the ‘large n’ statistical inference set-up. The power of this set-up lies in the fact that one can take advantage of general Central Limit behaviour, and approximate the ‘n samples’ statistical model by simpler Gaussian models, with asymptotically vanishing approximation error [33].

4.1. I.I.D. pure state models.

To illustrate this idea, let us consider a simple model consisting of qubits which are independent and identically prepared in a pure state depending on a two-dimensional rotation parameter

where the factor has been inserted for later convenience. Since we work in an asymptotic framework, we can consider that the parameter belongs to a neighbourhood of size of a known fixed value which by symmetry can be chosen to be . Such a ‘localisation’ is not a prior assumption, but can be achieved with an adaptive procedure where a ‘small’ sample is used to produce a rough estimate , and this information is fed into the design of the second stage optimal measurement [28]. We will therefore write , where is a local parameter to be estimated. In this case, the Gaussian approximation mentioned above is closely related to what is known in physics as the Holstein-Primakoff approximation for coherent spin states [26]. By a Central Limit argument one can show that in the limit of large the collective spin variables

converge in joint moments (with respect to the product state ) to continuous variables and respectively which satisfy the canonical commutation relations and have a coherent (Gaussian) state with means and Therefore, in the large limit the i.i.d. qubit model

is approximated (locally around ) by the ‘quantum Gaussian shift’ model

From the statistical viewpoint this approximation (when formulated in an appropriate way) provides the asymptotically optimal measurement procedures and estimation rates [20, 28]. Indeed if we would like to estimate , then the optimal measurement is that of the total spin which corresponds to the canonical variable in the limit model, and similarly for . However if one is interested in both parameters (i.e. the mean square error is ) then the optimal procedure is to measure each and on half of the spins, which corresponds to the heterodyne measurement for the Gaussian model, where the coherent state is split in two, and conjugate canonical variables are measured separately on the two subsystems.

The above convergence to the Gaussian model can be captured in a simple way, as convergence of the inner products for arbitrary pairs of local parameters

This ‘weak convergence’ [19, 22] has an appealing geometric interpretation and can be verified more easily than the ‘strong convergence’ investigated in [20, 28], the latter being nevertheless more powerful and applicable to more general models of mixed states. At the end of the section we will show how weak convergence can be upgraded to strong convergence under an additional assumption.
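The convergence of inner products can be checked numerically for a concrete parametrisation. The sketch below is an illustration under stated assumptions: it takes the single-qubit family $\exp(i(\theta_1\sigma_x+\theta_2\sigma_y))|0\rangle$ with $\theta=u/\sqrt{n}$, which need not coincide with the normalisation used in the text, and compares the $n$-fold product overlap with the overlap of two coherent states of amplitudes $u_1+iu_2$ and $v_1+iv_2$.

```python
import cmath
import math

def qubit_state(t1, t2):
    """Amplitudes of exp(i(t1*sigma_x + t2*sigma_y))|0> (assumed parametrisation)."""
    a = math.hypot(t1, t2)
    if a == 0.0:
        return (1.0 + 0.0j, 0.0j)
    # exp(i a n.sigma) = cos(a) 1 + i sin(a) n.sigma, and (n.sigma)|0> = (n1 + i n2)|1>.
    return (complex(math.cos(a), 0.0), 1j * math.sin(a) * complex(t1, t2) / a)

def product_overlap(u, v, n):
    """Inner product of the n-fold product states with local parameters u and v."""
    s = math.sqrt(n)
    a = qubit_state(u[0] / s, u[1] / s)
    b = qubit_state(v[0] / s, v[1] / s)
    single = a[0].conjugate() * b[0] + a[1].conjugate() * b[1]
    return single ** n

def coherent_overlap(u, v):
    """Overlap <z_u|z_v> of coherent states with amplitudes z = u1 + i u2."""
    zu, zv = complex(*u), complex(*v)
    return cmath.exp(-(abs(zu) ** 2 + abs(zv) ** 2) / 2 + zu.conjugate() * zv)

u, v = (0.3, -0.5), (1.0, 0.2)   # arbitrary local parameters
errors = {
    n: abs(product_overlap(u, v, n) - coherent_overlap(u, v))
    for n in (10, 1000, 10 ** 6)
}
```

Under this parametrisation the entries of `errors` should shrink roughly like $1/n$, which is the weak convergence of the i.i.d. model to the Gaussian shift model in its simplest form.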

4.2. Weak and strong convergence of pure state models.

We will now briefly sketch a mathematical framework for weak convergence of pure state models, which will serve as motivation for our results on local asymptotic normality for quantum Markov chains. Since this is not the main focus of the paper, we leave the general theory for a separate work.

Definition 4.

Let be a quantum model with parameter space and Hilbert space , and let be another model with the same parameter space and Hilbert space . We say that is equivalent to if there exist quantum channels and such that

It can be easily seen that if two models are equivalent then for any statistical decision problem, their optimal procedures can be related through the channels and and the corresponding risks (figures of merit) are equal [19]. The definition is also naturally connected with the theory of quantum sufficiency developed in [31]. We will now extend this to allow for models which are ‘close’ to each other but not necessarily equivalent. For our purposes, it suffices to restrict our attention to pure state models.

Let and be as in definition 4, with and