Interference Effects in Quantum Belief Networks
Probabilistic graphical models such as Bayesian Networks are among the most powerful structures known to the Computer Science community for deriving probabilistic inferences. However, modern cognitive psychology has revealed that human decisions do not always follow the rules of classical probability theory, because humans cannot process large amounts of data in order to make judgements. Consequently, the inferences they perform are based on limited data coupled with several heuristics, leading to violations of the law of total probability. This means that probabilistic graphical models based on classical probability theory are too limited to fully simulate and explain various aspects of human decision making.
Quantum probability theory was developed in order to accommodate paradoxical findings that the classical theory could not explain. Recent findings in cognitive psychology revealed that quantum probability can describe human decisions in an elegant framework. These findings suggest that, before a decision is made, human thoughts can be seen as superposed waves that interfere with each other, influencing the final decision.
In this work, we propose a new Bayesian Network based on the psychological findings of cognitive scientists. We performed experiments with two well-known Bayesian Networks from the literature. The results revealed that the quantum-like Bayesian Network can drastically affect the probabilistic inferences, especially when the level of uncertainty in the network is very high (no evidence observed). When the level of uncertainty is very low, the proposed quantum-like network collapses to its classical counterpart.
The problem of violations of the axioms of probability goes back to the early 1960s.  published a work that influenced modern psychology by showing that humans violate the laws of probability theory when making decisions under risk. The principle that humans were repeatedly violating is known as the Sure Thing Principle. It is a concept widely used in game theory and was originally introduced by Savage. This principle is fundamental in Bayesian probability theory and states that if one prefers action $A$ over $B$ under state of the world $X$, and if one also prefers $A$ over $B$ under the complementary state of the world $\neg X$, then one should always prefer action $A$ over $B$ even when the state of the world is unspecified.
Cognitive psychologists A. Tversky and D. Kahneman also explored further situations in which human decisions could not be accommodated by classical probability theory. In their pioneering work, they realised that the beliefs expressed by humans could not follow the rules of Boolean logic or classical probability theory, because humans cannot process large amounts of data in order to make estimations or judgements. Consequently, the inferences performed are based on limited data coupled with several heuristics, leading to a violation of one of the most important laws in Bayesian theory: the law of total probability.
One of the key differences between classical and quantum theories is the way information is processed. According to classical decision making, a person's beliefs can change over time, but at each moment the person can only be in one precise state with respect to some judgement. So, at each moment, a person favours a specific belief. The process of human inference deterministically either jumps between definite states or stays in a single definite state across time. Most computer science, cognitive and decision systems are modelled according to this single path trajectory principle. Figure 1 illustrates this idea.
In quantum information processing, on the other hand, information (and consequently belief) is modelled via wave functions and therefore cannot be in a definite state. Instead, it is in an indefinite quantum state called the superposition state. That is, all beliefs occur in the human mind at the same time. According to cognitive scientists, this effect is responsible for making people experience uncertainty, ambiguity or even confusion before making a decision. At each moment, one belief can be more favoured than another, but all beliefs are available at the same time. In this sense, quantum theory enables the modelling of the cognitive system as if it were a wave moving across time over a state space until a final decision is made. From this superposed state, uncertainty can produce different waves coming from opposite directions that can crash into each other, producing an interference pattern. This phenomenon can never be obtained in a classical setting. Figure 2 exemplifies this. When the final decision is made, there is no more uncertainty: the wave collapses into a definite state. Thus, quantum information processing deals with both definite and indefinite states.
1.1 Motivation: Violations in the Two-Stage Gambling Game
Tversky and Shafir were among the first researchers to test the validity of Savage's principle in human cognition, using a gambling game. In their experiment, participants were asked at each stage to decide whether or not to play a gamble that has an equal chance of winning $200 or losing $100. Figure 3 illustrates the experiment. Three conditions were tested:
Participants were informed that they had won the first gamble;
Participants were informed that they had lost the first gamble;
Participants did not know the outcome of the first gamble.
The two-stage gambling game was one of the first experiments used to determine whether the Sure Thing Principle would hold even for people who did not know about the existence of this principle. The results of Tversky and Shafir's experiment showed that this principle is frequently violated and, consequently, that humans do not perform inferences according to the laws of probability theory and Boolean logic.
The overall results revealed that participants who knew that they had won the first gamble decided to play again. Participants who knew that they had lost the first gamble also decided to play again. According to Savage's Sure Thing Principle, participants were therefore expected to play again even when they did not know the outcome of the first gamble. However, the results revealed something different: when the participants did not know the outcome of the first gamble, many of them decided not to play the second one.
Several researchers replicated this experiment. The overall results are specified in Table 1.
| Literature | Pr(Play Again \| Won 1st Play) | Pr(Play Again \| Lost 1st Play) | Pr(Play Again \| Unknown 1st Play) |
|---|---|---|---|
Why did the findings reported in Table 1 generate so much controversy in the scientific community? Because the observed data are not in accordance with the classical law of total probability. In Tversky and Shafir's experiment, the probability of a participant playing the second gamble, given that the outcome of the first gamble is unknown, $Pr(G_2 = play)$, can be computed through the law of total probability:

$$Pr(G_2 = play) = Pr(G_1 = win)\,Pr(G_2 = play \mid G_1 = win) + Pr(G_1 = lose)\,Pr(G_2 = play \mid G_1 = lose) \tag{1}$$
In Equation 1, $Pr(G_1 = win)$ corresponds to the probability of a player winning the first gamble, given that (s)he participated in the game in the first place. $Pr(G_2 = play \mid G_1 = win)$ is the probability of playing the second gamble, given that it is known that the player won the first one. $Pr(G_1 = lose)$ corresponds to the probability of losing the first gamble, given that the participant decided to play the game in the first place. Finally, $Pr(G_2 = play \mid G_1 = lose)$ is the probability of a participant playing the second gamble, given that it is known that (s)he lost the first one.
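Following the law of total probability in Equation 1, and since $Pr(G_1 = win) + Pr(G_1 = lose) = 1$, the probability of playing the second gamble when the player does not know the outcome of the first one is a weighted average of the two conditional probabilities, and must therefore lie between them:

$$\min\{Pr(G_2 = play \mid G_1 = win),\, Pr(G_2 = play \mid G_1 = lose)\} \le Pr(G_2 = play) \le \max\{Pr(G_2 = play \mid G_1 = win),\, Pr(G_2 = play \mid G_1 = lose)\}$$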
Tversky and Shafir explained these findings in the following way: when the participants knew that they had won, they had extra "house money" to play with and decided to play the second round. If the participants knew that they had lost, they chose to play again with the hope of recovering the lost money. But when the participants did not know whether they had won or lost the first gamble, these thoughts, for some reason, did not emerge in their minds and consequently they decided not to play the second gamble. Other works in the literature also replicated this two-stage gambling experiment [65, 50, 51], reporting results similar to Tversky and Shafir's. Their results are summarised in Table 1.
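The bound implied by the law of total probability (Equation 1) can be checked mechanically. The Python sketch below uses hypothetical, illustrative numbers (they are not the values reported in Table 1) and helper names of our own choosing: it computes the classical prediction for playing the second gamble and tests whether an observed value for the unknown condition lies between the two conditional probabilities, as any weighted average of them must.

```python
# Hypothetical, illustrative numbers (NOT the values of Table 1).
p_win = 0.5                      # Pr(G1 = win): the gamble is a fair coin flip
p_play_given_win = 0.70          # Pr(G2 = play | G1 = win)
p_play_given_lose = 0.60         # Pr(G2 = play | G1 = lose)
p_play_observed_unknown = 0.40   # observed Pr(G2 = play) when G1 is unknown


def total_probability(p_win, p_play_given_win, p_play_given_lose):
    """Equation 1: law of total probability over the first-gamble outcome."""
    return p_win * p_play_given_win + (1 - p_win) * p_play_given_lose


def within_classical_bounds(p_unknown, p_given_win, p_given_lose):
    """A weighted average of the two conditionals must lie between them."""
    lo, hi = min(p_given_win, p_given_lose), max(p_given_win, p_given_lose)
    return lo <= p_unknown <= hi


predicted = total_probability(p_win, p_play_given_win, p_play_given_lose)
print(f"classical prediction: {predicted:.2f}")  # 0.65
print("observed value consistent with classical theory?",
      within_classical_bounds(p_play_observed_unknown,
                              p_play_given_win, p_play_given_lose))  # False
```

With these made-up inputs, an observed value of 0.40 falls below both conditional probabilities, a pattern that no choice of $Pr(G_1 = win)$ can reproduce under the classical law of total probability.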
There have been different works in the literature trying to explain and model this phenomenon [17, 61, 21]. Although the models in the literature diverge, they all agree on one thing: one cannot use classical probability theory to model this phenomenon, since its most important rules are being violated. This two-stage gambling experiment was one of the most important works that motivated the use of theories outside classical Bayesian theory and Boolean logic, more specifically the use of quantum probability theory.
1.2 Research Questions
Recent findings in the cognitive psychology literature revealed that humans frequently violate the law of total probability when making decisions under risk [14, 21, 22]. These researchers also showed that quantum probability theory enables the development of decision models that are able to simulate human decisions. Given that most of the systems used nowadays are based on Bayesian probability theory, is it possible to achieve better inference mechanisms in these systems using quantum probability theory? For instance, many medical diagnosis systems are based on classical probabilistic graphical models such as Bayesian Networks. Can one achieve better performance in diagnosing patients using quantum probability?
Generally speaking, a Bayesian Network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph.
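As a concrete reference point for the classical case, the following Python sketch encodes a hypothetical three-node network $A \rightarrow C \leftarrow B$ with made-up conditional probability tables. The joint distribution factorises along the directed acyclic graph, and a query on $C$ is answered by summing the joint over the unobserved parents (inference by enumeration). The variable names and numbers are illustrative assumptions, not taken from the networks used later in this work.

```python
from itertools import product

# Hypothetical network A -> C <- B over binary variables (0/1).
p_a = {0: 0.7, 1: 0.3}                       # Pr(A)
p_b = {0: 0.4, 1: 0.6}                       # Pr(B)
p_c1_given_ab = {(0, 0): 0.1, (0, 1): 0.5,   # Pr(C = 1 | A, B)
                 (1, 0): 0.6, (1, 1): 0.9}


def joint(a, b, c):
    """Factorisation along the DAG: Pr(A, B, C) = Pr(A) Pr(B) Pr(C | A, B)."""
    p_c1 = p_c1_given_ab[(a, b)]
    return p_a[a] * p_b[b] * (p_c1 if c == 1 else 1.0 - p_c1)


def marginal_c(c):
    """Inference by enumeration: sum the joint over the unobserved A and B."""
    return sum(joint(a, b, c) for a, b in product((0, 1), repeat=2))


def posterior_a_given_c(a, c):
    """Pr(A = a | C = c) = Pr(A = a, C = c) / Pr(C = c)."""
    return sum(joint(a, b, c) for b in (0, 1)) / marginal_c(c)


print(round(marginal_c(1), 3))              # 0.472
print(round(posterior_a_given_c(1, 1), 3))  # 0.496
```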
In the work of Tucci, it is argued that any classical Bayesian Network can be extended to a quantum one by replacing real probabilities with complex quantum amplitudes. This means that the factorisation should be performed in the same way as in a classical Bayesian Network. A major problem with Tucci's work is the absence of any method for setting the phase parameters. The author states that one could have infinitely many Quantum Bayesian Networks representing the same classical Bayesian Network, depending on the values chosen for these parameters. This requires knowing a priori which parameters would lead to the desired solution for each node queried in the network, which is never known in practice.
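The difficulty with free phase parameters can be seen numerically. The sketch below assumes a simple encoding (our own illustration, not necessarily Tucci's exact construction) in which a probability $p$ becomes the amplitude $\sqrt{p}\,e^{i\theta}$. Every phase reproduces the same classical probability, yet as soon as the amplitudes of unobserved alternatives are added together, the result depends strongly on the phases, which is why some method for choosing them is needed.

```python
import cmath
import math


def amplitude(p, theta):
    """Illustrative encoding of a probability p as a complex amplitude
    with magnitude sqrt(p) and an arbitrary phase theta."""
    return math.sqrt(p) * cmath.exp(1j * theta)


p = 0.3
# Every choice of phase reproduces the same classical probability |psi|^2 = p ...
probs = [abs(amplitude(p, theta)) ** 2 for theta in (0.0, 1.0, math.pi / 2, 3.0)]


# ... but once amplitudes of unobserved alternatives are summed, phases matter.
def unobserved_sum(p1, p2, theta1, theta2):
    return abs(amplitude(p1, theta1) + amplitude(p2, theta2)) ** 2


print([round(x, 6) for x in probs])                      # [0.3, 0.3, 0.3, 0.3]
print(round(unobserved_sum(0.2, 0.3, 0.0, 0.0), 3))      # 0.99 (constructive)
print(round(unobserved_sum(0.2, 0.3, 0.0, math.pi), 3))  # 0.01 (destructive)
```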
In the work of , the authors argue that, in order to develop a quantum Bayesian Network, one requires quantum versions of probability distributions, marginal probabilities and conditional probabilities. The proposed model fails to provide any advantage relative to the classical models, because it cannot take into account interference effects between unobserved random variables. In the end, neither model provides any advantage in modelling decision problems in which the decisions to be predicted violate the law of total probability.
In this paper, the core of the proposed Bayesian Network is based on the psychological findings uncovered in the works of [21, 22, 17, 61] and on quantum information processing. These authors show that, before a decision is made, human thoughts can be seen as superposed waves that interfere with each other, influencing the final decision. In Bayesian Networks, nodes can be query variables, evidence variables or simply unobserved. Since we do not observe the unknown nodes of a Bayesian Network, and therefore do not know for sure which values they take, what would happen to the inference process if these nodes were placed in a quantum superposition and allowed to interfere with each other (Figure 4)? Can a better inference process be achieved? These are the main research questions that this paper aims to answer. To the best of our knowledge, there is no previous work in the Computer Science community that attempts to map these psychological findings into computer science decision making systems such as Bayesian Networks.
In order to validate our hypothesis, we performed experiments with well-known classical Bayesian Networks from the literature. We first create a quantum Bayesian Network that can accommodate the paradoxical findings in the two-stage gambling game. We then generalise our quantum Bayesian Network in order to deal with larger and more complex networks used in the literature: the well-known Burglar/Alarm Bayesian Network from  and the Lung Cancer Bayesian Network from .
Before describing the proposed model, we first introduce some quantum probability concepts needed to understand this work. Sections 2 and 3 present the main differences between classical and quantum probability theory. Instead of just presenting a set of formulas, we show these differences by means of an illustrative example, as proposed in . In Section 4, we describe how beliefs can act like waves and interfere with each other, and we show mathematically how the interference term can be derived using well-known rules of complex numbers. Section 5 addresses the main works of the literature that contributed to the development of the interference term. It also introduces a new interference formula that will be applied in the proposed quantum probabilistic graphical models. Section 6 compares a classical Bayesian Network model with the proposed quantum interference Bayesian Network on the problem of two-stage gambles. Section 7 presents another comparison between the classical and quantum Bayesian Networks, but for a more complex network from the literature. Section 8 discusses the results obtained in the experiments of Section 7. Section 9 presents an additional experiment over another Bayesian Network in order to study the impact of the quantum interference parameters in different scenarios. Section 10 presents the most relevant works of the literature. Finally, Section 11 presents the main conclusions of this work.
2 Probability Axioms of Classical and Quantum Theory
In this section, we describe the main differences between classical and quantum probability theory through examples. The example analysed concerns jury duty. Suppose you are a juror and you must decide whether a defendant is guilty or innocent. The following sections describe how inference proceeds under classical and quantum theory. This analysis is based on the book of .
In classical probability theory, events are contained in Sample Spaces.
A Sample Space corresponds to the set of all possible outcomes of an experiment or random trial. For example, when judging whether a defendant is guilty or innocent, the sample space is given by $\Omega = \{guilty, innocent\}$. Figure 6 presents a diagram showing the sample space of a defendant being guilty or innocent.
In quantum probability theory, events are contained in the so-called Hilbert Spaces. A Hilbert Space can be viewed as a generalisation and extension of Euclidean space to spaces with any finite or infinite number of dimensions. It can be seen as a complex vector space equipped with an inner product, which enables the measurement of angles and lengths. The space is spanned by a set of orthonormal basis vectors, in our example $\{|G\rangle, |I\rangle\}$, where $|G\rangle$ stands for guilty and $|I\rangle$ for innocent. Together, these vectors form a basis for the space. Figure 6 presents a diagram showing the Hilbert space of a defendant being guilty or innocent. Since a Hilbert space allows complex numbers, representing the events guilty and innocent would require two dimensions for each event (one for the real part and another for the imaginary part). For visualisation purposes, one usually ignores the imaginary component, so that all vectors can be drawn geometrically in a 2-dimensional space.
In classical probability theory, events can be defined by a set of outcomes to which a probability is assigned. They correspond to subsets of the sample space in which they are contained. Events can be mutually exclusive and they obey set theory. This means that operations such as intersection or union of events are well defined. Since they respect set theory, the distributive law also holds between sets. In our example, guilty and innocent can be seen as two mutually exclusive events.
According to quantum probability theory, events correspond to subspaces spanned by subsets of the basis vectors contained in the Hilbert Space. Events can be orthogonal, that is, they can be mutually exclusive. Operations such as intersection and union of events are well defined if the events are spanned by the same set of basis vectors. In quantum theory, all events contained in a Hilbert Space are described through a superposition state, which is represented by a state vector comprising the occurrence of all events. In our example, $|G\rangle$ (guilty) and $|I\rangle$ (innocent) correspond to column vectors representing the main axes of the circle in Figure 7. They are defined as follows:

$$|G\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad |I\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
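In Figure 7, the superposition state $|S\rangle$ can be defined as follows, where $|G\rangle$ and $|I\rangle$ are the guilty and innocent basis vectors:

$$|S\rangle = \psi_G\,|G\rangle + \psi_I\,|I\rangle, \qquad \psi_G = |\psi_G|\,e^{i\theta_G}, \quad \psi_I = |\psi_I|\,e^{i\theta_I} \tag{4}$$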
In Equation 4, one might wonder what the values $\psi_G$ and $\psi_I$ mean. They are called probability amplitudes. They correspond to the amplitudes of a wave and are described by complex numbers. The term $\theta$ is the phase of the amplitude; it can be seen as a shift of the wave. These amplitudes are related to classical probabilities through their squared magnitudes, which are obtained by multiplying each amplitude by its complex conjugate (represented by the symbol $^*$), for example $|\psi_G|^2 = \psi_G\,\psi_G^*$.
In quantum theory, the sum of the squared magnitudes of all amplitudes must equal $1$, that is, $|\psi_G|^2 + |\psi_I|^2 = 1$. This axiom is called the normalisation axiom and corresponds to the classical constraint that the probabilities of all events in a sample space must sum to one.
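The relation between amplitudes and probabilities, and the normalisation axiom, can be checked in a few lines of Python. The magnitudes $1/\sqrt{2}$ and the phase values below are assumptions chosen for illustration; any phases give the same squared magnitudes.

```python
import cmath
import math

# Assumed amplitudes for the juror example: equal magnitudes 1/sqrt(2),
# arbitrary illustrative phases theta_G and theta_I.
theta_g, theta_i = 0.4, 2.1
psi_g = cmath.exp(1j * theta_g) / math.sqrt(2)   # amplitude of |G> (guilty)
psi_i = cmath.exp(1j * theta_i) / math.sqrt(2)   # amplitude of |I> (innocent)


def probability(psi: complex) -> float:
    """Squared magnitude computed as psi times its complex conjugate."""
    return (psi * psi.conjugate()).real


print(round(probability(psi_g), 6), round(probability(psi_i), 6))  # 0.5 0.5
# Normalisation axiom: the squared magnitudes sum to one.
print(math.isclose(probability(psi_g) + probability(psi_i), 1.0))  # True
```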
2.3 System State
A system state is nothing more than a probability function that maps events to probability values, i.e., real numbers between $0$ and $1$.
In classical theory, the system state corresponds exactly to this definition: there is a function responsible for assigning a probability value to the outcome of an event. If the event corresponds to the sample space, then the system state assigns it a probability of $1$. If the event is empty, then it assigns a probability of $0$. In our example, if nothing else is told to the juror, then the probability of the defendant being guilty is $Pr(guilty) = 0.5$.
In quantum theory, the probability of a defendant being guilty is given by the squared magnitude of the projection of the superposition state onto the subspace containing the observed event. Figure 8 shows an example. If nothing is told to the juror about the guilt of the defendant, then, according to quantum theory, we start with the superposition state $|S\rangle$ of Equation 4.
When someone asks whether the defendant is guilty, we project the superposition state onto the relevant subspace, in this case the subspace spanned by $|G\rangle$ (guilty), as shown in Figure 8. The probability is simply the squared magnitude of this projection, that is:

$$Pr(guilty) = \big\|\, |G\rangle\langle G|S\rangle \,\big\|^2 = |\langle G|S\rangle|^2 = |\psi_G|^2$$
This is exactly the same outcome as in classical theory.
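The projection can be written out explicitly with the basis vectors $|G\rangle = [1, 0]^T$ and $|I\rangle = [0, 1]^T$. This plain-Python sketch (no linear-algebra library assumed) builds the projector $|G\rangle\langle G|$, applies it to a state with assumed amplitudes of magnitude $1/\sqrt{2}$ and arbitrary phases, and takes the squared norm of the result.

```python
import cmath
import math

# |S> = psi_G |G> + psi_I |I>, stored as [psi_G, psi_I]; magnitudes and
# phases are illustrative assumptions.
state = [cmath.exp(1j * 0.4) / math.sqrt(2), cmath.exp(1j * 2.1) / math.sqrt(2)]

# Projector onto the subspace spanned by |G>: P_G = |G><G|.
P_G = [[1, 0],
       [0, 0]]


def apply(matrix, vector):
    """Matrix-vector product for small dense lists."""
    return [sum(row[c] * vector[c] for c in range(len(vector))) for row in matrix]


def squared_norm(vector):
    return sum(abs(x) ** 2 for x in vector)


pr_guilty = squared_norm(apply(P_G, state))   # ||P_G |S>||^2 = |psi_G|^2
print(round(pr_guilty, 6))                    # 0.5
```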
2.4 State Revision
State revision corresponds to the situation where, after observing an event, we are interested in the probability of other events given that the observed one has occurred.
In classical theory, this is addressed through the conditional probability formula $Pr(A \mid B) = Pr(A \cap B) / Pr(B)$. Returning to our example, suppose that some evidence has been given to the juror proving that the defendant is actually guilty. What, then, is the probability of him being innocent? This is computed in the following way:

$$Pr(innocent \mid guilty) = \frac{Pr(innocent \cap guilty)}{Pr(guilty)} = \frac{0}{Pr(guilty)} = 0$$
Since the events guilty and innocent are mutually exclusive, their intersection is empty, leading to a zero probability value.
In quantum theory, state revision is performed by first projecting the superposition state onto the subspace representing the observed event. Then, the projection is normalised so that the resulting vector has unit length. Again, to determine the probability of a defendant being innocent given that he was found guilty, we first start in the superposition state vector $|S\rangle = \psi_G\,|G\rangle + \psi_I\,|I\rangle$, where $|G\rangle$ and $|I\rangle$ are the guilty and innocent basis vectors.
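Then, we observe that the defendant is guilty, so we project the state vector onto the subspace spanned by $|G\rangle$ and normalise the resulting projection:

$$|S_G\rangle = \frac{|G\rangle\langle G|S\rangle}{\big\|\, |G\rangle\langle G|S\rangle \,\big\|} = \frac{\psi_G}{|\psi_G|}\,|G\rangle + 0\,|I\rangle$$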
From the resulting state, we extract the probability of being innocent by squaring the magnitude of the corresponding amplitude, which is $0$, so $Pr(innocent \mid guilty) = 0$. Again, we obtain the same result as in classical theory.
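State revision adds a normalisation step after the projection. The sketch below uses an illustrative state (assumed magnitudes $1/\sqrt{2}$ and arbitrary phases): it projects onto $|G\rangle$, rescales the result to unit length, and reads off the probability of innocent from the revised state.

```python
import cmath
import math

# [psi_G, psi_I]: illustrative amplitudes with assumed magnitudes and phases.
state = [cmath.exp(1j * 0.4) / math.sqrt(2), cmath.exp(1j * 2.1) / math.sqrt(2)]
P_G = [[1, 0],
       [0, 0]]   # projector onto span{|G>}


def apply(matrix, vector):
    """Matrix-vector product for small dense lists."""
    return [sum(row[c] * vector[c] for c in range(len(vector))) for row in matrix]


def norm(vector):
    return math.sqrt(sum(abs(x) ** 2 for x in vector))


projected = apply(P_G, state)                       # observe "guilty"
revised = [x / norm(projected) for x in projected]  # normalise to unit length
pr_innocent_given_guilty = abs(revised[1]) ** 2     # squared amplitude of |I>

print(round(norm(revised), 6))   # 1.0
print(pr_innocent_given_guilty)  # 0.0
```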
3 The Path Trajectory Principle
In order to describe direct dependencies between a set of variables, path diagrams are generally used. This section shows how to compute quantum probabilities in a Markov model using Feynman's path rules, as presented in the work of .
3.1 Single Trajectories
Consider the diagram represented in Figure 12.
The probability of transiting from an initial state $A$ to a final state $C$, passing through an intermediate state $B$, can be computed with a classical Markov model. The probability is the product of the individual probabilities of each transition from one state to the next, expressed as conditional probabilities. According to Figure 12, the probability of transiting from state $A$, followed by state $B$ and ending in state $C$, that is, $Pr(A \rightarrow B \rightarrow C)$, is given by:

$$Pr(A \rightarrow B \rightarrow C) = Pr(B \mid A)\, Pr(C \mid B)$$
In a quantum path diagram model, the computation of the probability of a single path trajectory is similar to the classical Markov model. The calculation can be performed using Feynman's first rule, which asserts that the probability of a single path trajectory is the product of the squared magnitudes of the amplitudes of each transition from one state to the next along the path. This means that the quantum probability of a single path trajectory is the same as the classical Markov probability for the same path. In Equation 8 and throughout the rest of this work, complex probability amplitudes are represented by the symbol $\psi$.

$$Pr(A \rightarrow B \rightarrow C) = |\psi_{B|A}\, \psi_{C|B}|^2 = |\psi_{B|A}|^2\, |\psi_{C|B}|^2 = Pr(B \mid A)\, Pr(C \mid B) \tag{8}$$
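Feynman's first rule can be checked numerically. In this sketch the helper `amp` (a hypothetical name) builds an amplitude with magnitude $\sqrt{p}$ and an arbitrary phase, and the transition probabilities are illustrative. Because the magnitude of a product is the product of the magnitudes, the phases drop out and the single-path quantum probability matches the Markov product.

```python
import cmath
import math


def amp(p: float, theta: float) -> complex:
    """Hypothetical amplitude with |amp|^2 = p and phase theta."""
    return math.sqrt(p) * cmath.exp(1j * theta)


p_b_given_a, p_c_given_b = 0.6, 0.5          # illustrative Pr(B|A), Pr(C|B)

classical = p_b_given_a * p_c_given_b          # Markov: Pr(B|A) Pr(C|B)
# First rule: squared magnitude of the product of amplitudes along the path.
quantum = abs(amp(p_b_given_a, 0.7) * amp(p_c_given_b, -1.2)) ** 2

print(round(classical, 6), round(quantum, 6))  # 0.3 0.3
```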
3.2 Multiple Indistinguishable Trajectories
An indistinguishable path consists of transiting from an initial state to a final state through one of multiple possible paths, without knowing for certain which path was taken to reach the goal state. Figure 12 shows an example of multiple indistinguishable trajectories. In a classical Markov model, if one does not observe which path was taken to reach the final state, then one simply computes this probability by summing the individual probabilities of each path. This is known as the single path principle and is in accordance with the law of total probability. So, following Figure 12, in order to reach state $D$ starting in state $A$, one can take the path $A \rightarrow B \rightarrow D$ or the path $A \rightarrow C \rightarrow D$. The final probability is given by:

$$Pr(A \rightarrow D) = Pr(B \mid A)\, Pr(D \mid B) + Pr(C \mid A)\, Pr(D \mid C)$$
Quantum probability theory rejects the single path trajectory principle. If one does not observe which path was taken to reach the goal state, then one cannot assume that exactly one of the possible paths was used. Instead, quantum probability argues that, when the path is unobserved, the goal state can be reached through a superposition of path trajectories. This is known as Feynman's second rule, which states that the amplitude of transiting from an initial state $A$ to a final state $D$, taking multiple indistinguishable paths, is given by the sum of the amplitudes of each path. This rule is in accordance with the law of total amplitude, and the probability is computed by taking the squared magnitude of this sum. In general, this probability differs from the one given by the classical Markov model:

$$Pr(A \rightarrow D) = |\psi_{B|A}\psi_{D|B} + \psi_{C|A}\psi_{D|C}|^2 = |\psi_{B|A}\psi_{D|B}|^2 + |\psi_{C|A}\psi_{D|C}|^2 + 2\,|\psi_{B|A}\psi_{D|B}|\,|\psi_{C|A}\psi_{D|C}|\cos\theta$$

where $\theta$ is the phase difference between the two path amplitudes.
The term $2\,|\psi_{B|A}\psi_{D|B}|\,|\psi_{C|A}\psi_{D|C}|\cos\theta$ comes from the inner product between the two path amplitudes $\psi_{B|A}\psi_{D|B}$ and $\psi_{C|A}\psi_{D|C}$, where $\theta$ is the phase difference between them. It follows from Euler's formula, $e^{i\theta} = \cos\theta + i\sin\theta$, and corresponds to a quantum interference term that does not exist in classical probability theory. Section 4 details how this term is derived. Since the interference term can lead to values higher than 1, it is necessary to normalise the result in order to obtain a probability value.
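A numerical version of Feynman's second rule for the two-path diagram follows. The path probabilities and the phase of 2.0 radians on the second path are illustrative assumptions, and `amp` is a hypothetical helper. The sketch checks that the squared magnitude of the summed amplitudes equals the classical sum plus the interference term $2|\cdot||\cdot|\cos\theta$, and compares it with the sum of squared magnitudes that is obtained when the path is observed.

```python
import cmath
import math


def amp(p: float, theta: float) -> complex:
    """Hypothetical amplitude with |amp|^2 = p and phase theta."""
    return math.sqrt(p) * cmath.exp(1j * theta)


# Paths A -> B -> D and A -> C -> D with illustrative probabilities.
path1 = amp(0.5, 0.0) * amp(0.6, 0.0)   # psi_{B|A} psi_{D|B}
path2 = amp(0.5, 0.0) * amp(0.4, 2.0)   # psi_{C|A} psi_{D|C}, assumed phase 2.0 rad

classical = 0.5 * 0.6 + 0.5 * 0.4                # single path principle
unobserved = abs(path1 + path2) ** 2             # second rule: add amplitudes first
observed = abs(path1) ** 2 + abs(path2) ** 2     # observed path: add probabilities

theta = cmath.phase(path1 * path2.conjugate())   # phase difference between paths
interference = 2 * abs(path1) * abs(path2) * math.cos(theta)

print(round(classical, 6), round(observed, 6))            # 0.5 0.5
print(math.isclose(unobserved, observed + interference))  # True
print(round(interference, 4))  # negative here: destructive interference
```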
When the path is observed, quantum probability theory collapses to the classical Markov model. This is known as Feynman's third rule, which states that, for observed multiple path trajectories, the amplitude of each individual path is considered separately and the probability is the sum of the squared magnitudes of the individual paths, so no interference term arises. Figure 12 illustrates this example:

$$Pr(A \rightarrow D) = |\psi_{B|A}\psi_{D|B}|^2 + |\psi_{C|A}\psi_{D|C}|^2 = Pr(B \mid A)\, Pr(D \mid B) + Pr(C \mid A)\, Pr(D \mid C)$$
4 The Interference Term
Quantum theory enables the modelling of the decision system as a wave moving across time over a state space until a final decision is made. Under this perspective, interference can be regarded as a chain of waves in a superposition state, coming from different directions. When these waves crash, one can experience a destructive effect (one wave destroys the other) or a constructive effect (one wave merges with the other). In either case, the final probability of each wave is affected. Psychological findings showed that this occurs not only at a microscopic scale (such as with electrons), but also in macroscopic settings [44, 61, 22].
In this section, we show how to derive the interference term using only well-known properties of complex numbers. The interference term can be derived in two different ways: (1) from the general probability formula for the union of mutually exclusive events, and (2) from the law of total probability.
4.1 Deriving the Interference Term from the Union of Mutually Exclusive Events
For simplicity, we will start by deriving the interference term for 2 events and then generalise to $N$ events. The classical probability of the union of two mutually exclusive events $A$ and $B$ is given by:

$$Pr(A \cup B) = Pr(A) + Pr(B) \tag{12}$$
The relation between a classical probability and a quantum probability amplitude is given by Born's rule, that is:

$$Pr(A) = |\psi_A|^2 \tag{13}$$
Again, $|\psi_A|^2$ corresponds to the squared magnitude of a complex amplitude. It is obtained by multiplying the probability amplitude by its complex conjugate, that is, $|\psi_A|^2 = \psi_A\,\psi_A^*$.
Taking into account Equation 13, one can write a superposition state between two mutually exclusive events $A$ and $B$ in the following way:

$$|S\rangle = \psi_A\,|A\rangle + \psi_B\,|B\rangle \tag{14}$$
The relation of this superposed state with classical probability theory remains:

$$|\psi_A|^2 = Pr(A), \qquad |\psi_B|^2 = Pr(B)$$
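The quantum counterpart of the classical probability of the union of two mutually exclusive events, when we do not observe them, corresponds to Feynman's second rule, that is:

$$Pr(A \cup B) = |\psi_A + \psi_B|^2 = (\psi_A + \psi_B)(\psi_A + \psi_B)^* = |\psi_A|^2 + |\psi_B|^2 + \psi_A\psi_B^* + \psi_A^*\psi_B \tag{16}$$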
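The classical probability of the union of two mutually exclusive events is $Pr(A) + Pr(B)$. In Equation 16, the squared magnitude $|\psi_A|^2$ corresponds to $Pr(A)$ and $|\psi_B|^2$ corresponds to $Pr(B)$. So, what is the additional term, $\psi_A\psi_B^* + \psi_A^*\psi_B$, that appears in Equation 16? This term does not exist in classical probability theory and is called the interference term. Since its two summands are complex conjugates of each other, the interference term can be rewritten as in Equation 17:

$$\psi_A\psi_B^* + \psi_A^*\psi_B = 2\,\mathrm{Re}\left(\psi_A\psi_B^*\right) \tag{17}$$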
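Writing $\psi_A\psi_B^* = |\psi_A|\,|\psi_B|\,e^{i\theta}$ and using Euler's formula, $e^{i\theta} = \cos\theta + i\sin\theta$, Equation 17 can be rewritten as:

$$\psi_A\psi_B^* + \psi_A^*\psi_B = 2\,|\psi_A|\,|\psi_B|\cos\theta = 2\sqrt{Pr(A)\,Pr(B)}\cos\theta \tag{18}$$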
So, the complete quantum probability formula for the union of 2 mutually exclusive events is given by:

$$Pr(A \cup B) = Pr(A) + Pr(B) + 2\sqrt{Pr(A)\,Pr(B)}\cos\theta \tag{19}$$
In the above formula, the angle $\theta = \theta_A - \theta_B$ corresponds to the phase of the inner product between $\psi_A$ and $\psi_B$.
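The union formula can be checked against a direct evaluation of $|\sum_i \psi_{A_i}|^2$. The Python sketch below (function names are our own) implements the classical sum plus pairwise interference terms $2\sqrt{Pr(A_i)Pr(A_j)}\cos(\theta_i - \theta_j)$, where $\theta_i$ is the phase of $\psi_{A_i}$. For two events this is Equation 19, and the same loop handles any number of events. The probabilities and phases are made-up values.

```python
import cmath
import math
from itertools import combinations


def union_with_interference(probs, phases):
    """Sum of Pr(A_i) plus 2*sqrt(Pr(A_i) Pr(A_j)) cos(theta_i - theta_j) over pairs."""
    interference = sum(
        2 * math.sqrt(probs[i] * probs[j]) * math.cos(phases[i] - phases[j])
        for i, j in combinations(range(len(probs)), 2))
    return sum(probs) + interference


def union_from_amplitudes(probs, phases):
    """Direct evaluation: squared magnitude of the summed amplitudes."""
    total = sum(math.sqrt(p) * cmath.exp(1j * t) for p, t in zip(probs, phases))
    return abs(total) ** 2


# Two events (Equation 19) and three events, with illustrative values.
two = ([0.4, 0.6], [0.0, 1.0])
three = ([0.2, 0.3, 0.5], [0.0, 1.3, 2.9])
for probs, phases in (two, three):
    print(math.isclose(union_with_interference(probs, phases),
                       union_from_amplitudes(probs, phases)))  # True
```

With a phase difference of $\pi/2$ the cosine vanishes and the classical sum $Pr(A) + Pr(B)$ is recovered.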
Equation 19 only computes the probability of the union of 2 mutually exclusive events. In classical probability theory, if we want to compute the probability of the union of $N$ mutually exclusive events, we use a generalisation of Equation 12, that is:

$$Pr\left(\bigcup_{i=1}^{N} A_i\right) = \sum_{i=1}^{N} Pr(A_i) \tag{20}$$
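In quantum theory, we make an analogous calculation. In order to compute the probability of the union of $N$ mutually exclusive events, one needs to generalise Equation 14. With $\theta_i$ denoting the phase of $\psi_{A_i}$, the outcome is given by the following formula:

$$Pr\left(\bigcup_{i=1}^{N} A_i\right) = \left|\sum_{i=1}^{N} \psi_{A_i}\right|^2 = \sum_{i=1}^{N} Pr(A_i) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\sqrt{Pr(A_i)\,Pr(A_j)}\cos(\theta_i - \theta_j) \tag{21}$$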
4.2 Deriving the Interference Term from the Law of Total Probability
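This interference term can also be derived directly from the law of total probability. Suppose that the events $A_1, \ldots, A_N$ form a set of mutually disjoint events whose union is the entire sample space, $\bigcup_{i=1}^{N} A_i = \Omega$. Then, for any other event $B$, the classical law of total probability can be formulated as in Equation 22.

$$Pr(B) = \sum_{i=1}^{N} Pr(A_i)\, Pr(B \mid A_i) \tag{22}$$

Its quantum counterpart, following Feynman's second rule, sums the amplitudes of all unobserved alternatives before taking the squared magnitude:

$$Pr(B) = \left|\sum_{i=1}^{N} \psi_{A_i}\, \psi_{B|A_i}\right|^2 \tag{23}$$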
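For simplicity, we will expand Equation 23 for $N = 2$ and only later find the general formula for $N$ events:

$$Pr(B) = \left|\psi_{A_1}\psi_{B|A_1} + \psi_{A_2}\psi_{B|A_2}\right|^2 \tag{24}$$

$$= \left(\psi_{A_1}\psi_{B|A_1} + \psi_{A_2}\psi_{B|A_2}\right)\left(\psi_{A_1}\psi_{B|A_1} + \psi_{A_2}\psi_{B|A_2}\right)^* \tag{25}$$

$$= \left(\psi_{A_1}\psi_{B|A_1} + \psi_{A_2}\psi_{B|A_2}\right)\left(\psi_{A_1}^*\psi_{B|A_1}^* + \psi_{A_2}^*\psi_{B|A_2}^*\right) \tag{26}$$

$$= |\psi_{A_1}\psi_{B|A_1}|^2 + |\psi_{A_2}\psi_{B|A_2}|^2 + \psi_{A_1}\psi_{B|A_1}\psi_{A_2}^*\psi_{B|A_2}^* + \psi_{A_1}^*\psi_{B|A_1}^*\psi_{A_2}\psi_{B|A_2} \tag{27}$$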
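Simplifying Equation 27 with Born's rule, $|\psi_{A_i}\psi_{B|A_i}|^2 = Pr(A_i)\,Pr(B \mid A_i)$, and noting that the last two terms are complex conjugates of each other, we obtain:

$$Pr(B) = Pr(A_1)\,Pr(B \mid A_1) + Pr(A_2)\,Pr(B \mid A_2) + 2\,\mathrm{Re}\left(\psi_{A_1}\psi_{B|A_1}\psi_{A_2}^*\psi_{B|A_2}^*\right) \tag{28}$$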
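Writing $\psi_{A_1}\psi_{B|A_1}\psi_{A_2}^*\psi_{B|A_2}^* = |\psi_{A_1}\psi_{B|A_1}|\,|\psi_{A_2}\psi_{B|A_2}|\,e^{i(\theta_1 - \theta_2)}$, where $\theta_i$ is the phase of $\psi_{A_i}\psi_{B|A_i}$, and applying Euler's formula, Equation 28 becomes:

$$Pr(B) = Pr(A_1)\,Pr(B \mid A_1) + Pr(A_2)\,Pr(B \mid A_2) + 2\sqrt{Pr(A_1)\,Pr(B \mid A_1)\,Pr(A_2)\,Pr(B \mid A_2)}\cos(\theta_1 - \theta_2) \tag{30}$$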
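Generalising Equation 30 to $N$ events, with $\theta_i$ the phase of $\psi_{A_i}\psi_{B|A_i}$, the final probabilistic interference formula, derived from the law of total probability, is given by:

$$Pr(B) = \sum_{i=1}^{N} Pr(A_i)\,Pr(B \mid A_i) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\sqrt{Pr(A_i)\,Pr(B \mid A_i)\,Pr(A_j)\,Pr(B \mid A_j)}\cos(\theta_i - \theta_j) \tag{31}$$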
Following Equation 31, when $\cos(\theta_i - \theta_j)$ equals zero for every pair of events, quantum probability theory converges to its classical counterpart, because the interference term vanishes.
For non-zero values, Equation 31 produces interference effects that can affect the classical probability destructively (when the interference term is smaller than zero) or constructively (when it is greater than zero). Additionally, Equation 31 leads to a large number of parameters as the number of events increases. For binary random variables, we will end up with  parameters to tune.
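Equation 31 translates directly into code. The sketch below (function names are our own) evaluates the classical term and the pairwise interference terms for a two-stage gamble with hypothetical numbers, not those of Table 1. With a phase difference of $\pi/2$ the cosine vanishes and the classical value is recovered; with an arbitrarily chosen phase difference of 2 radians, destructive interference pushes the (still unnormalised) value below both conditional probabilities, which the classical law of total probability cannot do.

```python
import math
from itertools import combinations


def classical_total(priors, likelihoods):
    """Equation 22: sum_i Pr(A_i) Pr(B | A_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def quantum_total(priors, likelihoods, phases):
    """Equation 31 (unnormalised): classical term plus pairwise interference."""
    terms = [p * l for p, l in zip(priors, likelihoods)]
    interference = sum(
        2 * math.sqrt(terms[i] * terms[j]) * math.cos(phases[i] - phases[j])
        for i, j in combinations(range(len(terms)), 2))
    return sum(terms) + interference


priors = [0.5, 0.5]          # hypothetical Pr(win), Pr(lose)
likelihoods = [0.70, 0.60]   # hypothetical Pr(play | win), Pr(play | lose)

print(round(classical_total(priors, likelihoods), 4))                    # 0.65
print(round(quantum_total(priors, likelihoods, [0.0, math.pi / 2]), 4))  # 0.65
print(round(quantum_total(priors, likelihoods, [0.0, 2.0]), 4))          # below 0.6
```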
5 The Role of the Interference Term in the Literature
In the 20th century, the physicist Max Born proposed a problem related to quantum probabilities, which later became known as the Inverse Born Problem. The problem consisted in constructing a probabilistic representation of data from different sources (physics, psychology, economics, etc.) by a complex probability amplitude that matches Born's rule (already presented in Equation 13).
The probabilistic interference formula for the law of total probability, derived in the previous section (Equation 31), can be seen as an answer to the Inverse Born Problem. The most important works in the literature that contributed to the derivation of this interference term, through the law of total probability, are the works of A. Khrennikov [47, 44, 48, 39, 43, 24]. These authors express the interference term through a coefficient $\lambda$. In situations where $|\lambda| \le 1$, one can apply the trigonometric formula derived in Equation 31. However, for some data this condition does not hold. Therefore, when $|\lambda| > 1$, the authors propose the use of a hyperbolic interference term, acting as an upper bound that constrains the probability value to a maximum of $1$. This would require the use of Hyperbolic Hilbert Spaces instead of complex ones.
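To make the trigonometric/hyperbolic distinction concrete, Equation 30 can be rearranged so that, given an observed $Pr(B)$, $\cos(\theta_1 - \theta_2) = \lambda = \frac{Pr(B) - \sum_i Pr(A_i)Pr(B \mid A_i)}{2\sqrt{Pr(A_1)Pr(B \mid A_1)Pr(A_2)Pr(B \mid A_2)}}$. The sketch below assumes this two-event form of the coefficient; the helper names and all inputs are hypothetical. When $|\lambda| \le 1$ a real phase with $\cos\theta = \lambda$ exists; otherwise no real phase can explain the data with the trigonometric formula.

```python
import math


def interference_coefficient(p_b_observed, priors, likelihoods):
    """Rearranged two-event formula (Equation 30): lambda = cos(theta_1 - theta_2)."""
    t1, t2 = (p * l for p, l in zip(priors, likelihoods))
    return (p_b_observed - (t1 + t2)) / (2 * math.sqrt(t1 * t2))


def interference_type(lam):
    """Trigonometric if a real phase exists (|lambda| <= 1), else hyperbolic."""
    return "trigonometric" if abs(lam) <= 1 else "hyperbolic"


# Hypothetical inputs: observed Pr(B), priors Pr(A_i), likelihoods Pr(B | A_i).
lam1 = interference_coefficient(0.40, [0.5, 0.5], [0.70, 0.60])
lam2 = interference_coefficient(0.85, [0.5, 0.5], [0.90, 0.10])

print(interference_type(lam1), round(math.acos(lam1), 4))  # trigonometric, phase in radians
print(interference_type(lam2))                             # hyperbolic
```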
In this paper, we argue that there is no need to represent probabilities in a Hyperbolic Hilbert Space in order to avoid non-probability values. Since we are dealing with probabilistic graphical models, we are always required to normalise the probability amplitudes when performing probabilistic inferences. For this work, we follow the same probabilistic paradigm used in traditional Bayesian Networks. Thus, we constrain Equation 32 and Equation 33 by a normalisation factor that guarantees that the computed values are always probabilities less than or equal to one. This normalisation factor corresponds to Feynman's conjecture that an electron can follow any path. Thus, in order to compute the probability that a particle ends up at a given point, one must sum over all possible paths that the particle can go through. Since the interference term can lead to values higher than $1$, it is necessary to normalise in order to obtain a probability value.
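One way to read the normalisation step for a binary query variable is sketched below: compute the unnormalised two-event interference value of Equation 31 separately for $B$ and for $\neg B$, then divide each by their sum. The helper name, the probabilities and the per-outcome phase differences are hypothetical choices for illustration, not the exact formulas (Equations 32 and 33) used in this work.

```python
import math


def unnormalised_total(priors, likelihoods, phase_diff):
    """Two-event form of Equation 31, before normalisation."""
    t1, t2 = priors[0] * likelihoods[0], priors[1] * likelihoods[1]
    return t1 + t2 + 2 * math.sqrt(t1 * t2) * math.cos(phase_diff)


priors = [0.5, 0.5]              # hypothetical Pr(A_1), Pr(A_2)
p_b = [0.70, 0.60]               # hypothetical Pr(B | A_i)
p_not_b = [1 - x for x in p_b]   # Pr(not B | A_i)

# Assumption: each outcome of B gets its own, arbitrarily chosen, phase difference.
raw_b = unnormalised_total(priors, p_b, 2.0)
raw_not_b = unnormalised_total(priors, p_not_b, 0.5)

alpha = 1.0 / (raw_b + raw_not_b)   # normalisation factor
pr_b, pr_not_b = alpha * raw_b, alpha * raw_not_b

print(round(raw_b + raw_not_b, 4))  # raw values need not sum to one
print(round(pr_b + pr_not_b, 4))    # 1.0
```

The design choice here mirrors ordinary Bayesian Network inference: values are first computed up to a constant and then rescaled over all outcomes of the query variable, so the result is a valid distribution regardless of the phases.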