Beyond DAGs: Modeling Causal Feedback with Fuzzy Cognitive Maps
Fuzzy cognitive maps (FCMs) model feedback causal relations in interwoven webs of causality and policy variables. FCMs are fuzzy signed directed graphs that allow degrees of causal influence and event occurrence. Such causal models can simulate a wide range of policy scenarios and decision processes. Their directed loops or cycles directly model causal feedback. Their nonlinear dynamics permit forward-chaining inference from input causes and policy options to output effects. Users can add detailed dynamics and feedback links directly to the causal model or infer them with statistical learning laws. Users can fuse or combine FCMs from multiple experts by weighting and adding the underlying fuzzy edge matrices and do so recursively if needed. The combined FCM tends to better represent domain knowledge as the expert sample size increases if the expert sample approximates a random sample. Many causal models use more restrictive directed acyclic graphs (DAGs) and Bayesian probabilities. DAGs do not model causal feedback because they do not contain closed loops. Combining DAGs also tends to produce cycles and thus tends not to produce a new DAG. Combining DAGs tends to produce a FCM. FCM causal influence is also transitive whereas probabilistic causal influence is not transitive in general. Overall: FCMs trade the numerical precision of probabilistic DAGs for pattern prediction, faster and scalable computation, ease of combination, and richer feedback representation. We show how FCMs can apply to problems of public support for insurgency and terrorism and to US-China conflict relations in Graham Allison’s Thucydides-trap framework. The Thucydides-trap FCM predicts war-like conflict for nearly of input scenarios where we treat all such scenarios as equally likely. The appendix gives the textual justification of the Thucydides-trap FCM and extends our earlier theorem \parenciteosoba-kosko2017 that shows how a concept node transitively affects downstream nodes. 
The more general result shows the transitive and total causal influence that upstream nodes exert on downstream nodes.
Fuzzy Cognitive Maps for Causal Modeling
FCMs allow users to quickly draw causal diagrams of complex social or other processes \parencitekosko1986. These causal diagrams can have closed loops or paths in them. The closed loops directly model feedback among the causal concepts or nodes. They are fuzzy because the causal arrows in the diagrams admit degrees or shades of gray.
Users can make what-if predictions with a given FCM: What happens given this input policy? The predictions are not numerical predictions. They are pattern predictions or repeating sequences of events. Users can also ask causal why questions: Why did these events occur? Different users can also fuse or combine their FCM diagrams into a unified FCM.
These FCM techniques are quite general and have led to numerous applications \parencitepapageorgiou2013review, glykas2010, fcm-papageorgiou2013, amirkhani2017review. A recent application developed an FCM representation of how ‘Brexit’ scenarios could affect energy demand in the United Kingdom \parenciteziv-brexit2018. Another application used FCM techniques to represent the numerous social factors involved in homelessness \parencitemago2013analyzing.
A FCM uses an arrow or directed-graph edge to describe how one concept node causally affects another concept node. So a FCM represents causality as a directed edge in a graph. The causality can be partial or fuzzy because the directed edge has a numerical magnitude that states the degree or intensity of causality. A FCM model of military deterrence might include concept nodes for the degree of national military capability and for the degree of a threat’s credibility. The directed causal edges among two or more such concept nodes define a directed graph or digraph. A key aspect of FCMs is that in general their digraphs contain cycles or feedback loops. This means that most FCMs are not directed acyclic graphs or DAGs.
Fuzz as a Matter of Degree
Fuzz is a term of art in logic and decision science. Fuzz denotes degree of truth or degree of causality. Fuzzy logic extends binary logic to multi-valued logic and allows rule-based approximate reasoning \parencitezadeh1965, Kosko1993. Decision theorist Richard Bellman was one of the founders of fuzzy theory and even defined abstraction as the estimation of a fuzzy-set membership curve \parencitebellman1966abstraction.
FCMs are fuzzy both in their causal edges and often in their concept nodes. Fuzzy causal edges denote partial causality. All-or-none causality can still occur but only as the endpoints of the spectrum of causal influence. The same holds for the activation of a concept node. Concept nodes are often binary in practice and in the examples below. More sophisticated FCM models use fuzzy gray-scale concept nodes and may also use time lags.
Feedback loops in a FCM imply that a FCM is a dynamical system. Inputs stimulate the FCM system. The causal activation then swirls through the FCM until the FCM falls into a dynamical equilibrium. The FCM stays in that dynamical equilibrium until a new input perturbs the system. Most FCMs quickly reach an equilibrium. The simplest equilibrium is a fixed-point attractor or a single state of the FCM that repeats over and over. The more common equilibrium is a limit cycle where a sequence of states repeats over and over in a loop. The equilibrium serves as the system’s what-if prediction or forward inference from the input. FCMs with binary nodes usually converge to a limit cycle. The cycle of states is a form of pattern prediction.
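The following minimal sketch illustrates the two kinds of equilibrium described above for binary concept nodes. It assumes synchronous updates and a hard threshold at zero. The 3-node ring matrix `RING` and the helper names `step` and `find_attractor` are hypothetical and do not come from any FCM in this chapter.

```python
def step(state, edges):
    """One synchronous update: vector-matrix product, then a hard threshold at zero."""
    n = len(state)
    sums = [sum(state[i] * edges[i][j] for i in range(n)) for j in range(n)]
    return tuple(1 if s > 0 else 0 for s in sums)


def find_attractor(state, edges, max_steps=1000):
    """Iterate from `state` until some state repeats; return the repeating sequence."""
    seen = {}   # state -> index of its first visit
    path = []
    state = tuple(state)
    for t in range(max_steps):
        if state in seen:
            return path[seen[state]:]   # length 1 means a fixed point
        seen[state] = t
        path.append(state)
        state = step(state, edges)
    raise RuntimeError("no repeated state within max_steps")


# Hypothetical 3-node causal ring: 0 -> 1 -> 2 -> 0, all edges +1.
# edges[i][j] is the causal edge from node i to node j.
RING = [[0, 1, 0],
        [0, 0, 1],
        [1, 0, 0]]

limit_cycle = find_attractor((1, 0, 0), RING)   # three states that repeat in order
fixed_point = find_attractor((1, 1, 1), RING)   # a single state that repeats
```

A binary FCM with n nodes has only 2^n states, so some state must repeat within 2^n updates. The search above therefore always ends in a fixed point or a limit cycle if `max_steps` is large enough.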
Comparison with Other Methods
Two other approaches to modeling causal worlds are system-dynamics (SD) models \parencitesterman2000, abdelbari2018 and Bayesian belief networks (BBNs) \parencitepearl2009causality.
System-dynamics models allow users to represent and simulate causal interactions within relatively complex systems. But SD models have static parameters. Domain experts or random experiments often choose the parameters of the subsystems and their interconnections. FCMs admit both data-driven and expert-driven adaptation of the model structure and the model parameters. Statistical learning algorithms estimate causal edges from training data. Experts can also state edge values directly. SD models often account for stochasticity by using sensitivity analyses at the end of modeling. FCMs build uncertainty into their very structure.
BBNs model uncertain causal worlds with conditional probabilities. A user must first state a known joint probability distribution over all the nodes of the directed graph. This may not be practical for large numbers of nodes. Forward inference on a BBN also tends to be computationally intensive because it marginalizes out nodes. The directed graph is usually an acyclic graph and thus has no closed loops. The acyclic structure simplifies the probability structure but it ignores feedback among the causal units. This is an important limitation for social and behavioral models that often feature causal loops. The acyclic constraint also makes it hard to combine BBN causal models from multiple experts because such combination may well produce a cycle. FCMs contain causal loops by design. Combining FCMs only tends to increase the loop or feedback structure and thereby produce richer feedback dynamics. It also tends to improve the modeling accuracy in many cases. The FCM forward-inference process is simple and fast because it uses only vector-matrix multiplication and thresholding. But BBNs do give precise numerical probability descriptions and not the mere pattern predictions of FCMs. So FCMs trade numerical precision for pattern prediction, faster and scalable computation, ease of combination, and richer feedback representation.
FCMs offer many advantages for social-scientific modeling and simulation. FCMs can capture the causal beliefs of domain experts. FCM dynamics can reveal global “hidden patterns” in small or large causal webs. These patterns tend to be far from obvious when examining a large-scale FCM model. The domain expert expresses his beliefs about how relevant factors causally relate to one another. Some examples of such expert-based FCMs include public support for terrorism \parenciteosoba-kosko2017, blood-clotting reactions \parencitetaber-yager-helgason2007, and medical diagnostics \parencitestylios2008fuzzy.
Analysts can also convert documents into FCMs. We show below an example translating Graham Allison’s textual descriptions of his proposed Thucydides Trap of international relations into a predictive FCM. Combining such documents and their FCMs gives one way to approach “big knowledge” because combining FCMs always results in a new FCM. FCMs can also drive agents’ behaviors or decisions in agent-based models. Earlier FCMs \parencitedickerson-kosko1993 show how lone or combined FCMs can govern the behavior of virtual actors. These behaviors can be simple or complex depending on the driving FCM. Applying learning laws to the agents’ FCMs can simulate the effect of agents learning new behaviors.
Fuzzy Cognitive Maps as Models of Causal Feedback
FCMs offer a practical way to model and process the interwoven causal structure of policy and decision problems. Numerous FCM applications have appeared in recent years. They range from control engineering and medicine to policy analysis and social modeling \parencitepapageorgiou2013review, glykas2010, fcm-papageorgiou2013, amirkhani2017review.
FCMs are fuzzy causal signed directed graphs \parencitekosko1986. The digraphs are fuzzy because in general both their directed causal edges and their concept nodes admit degrees. So they can assume more values than just the bivalent extremes of on or off. Their multivalued causal edges directly model degrees of causal influence. Users can express their causal and policy models by drawing signed and weighted causal edges between concept nodes. This does require specifying the nonlinear dynamical structure of the concept nodes. Statistical machine-learning algorithms can also infer or tune the edges using training data if representative data is available. A differential Hebbian learning law can approximate a FCM’s directed edges of partial causality using time-series training data.
Figure 1 shows a FCM fragment that models a simple undersea causal web of dolphins in the presence of sharks or other survival threats. This simple model uses concepts that are either active or not active. The directed causal edges likewise show all-or-none causal increase or decrease (missing edges have zero weight). Section 3 shows how to make what-if inferences or predictions with this simple FCM that has binary concept nodes and trivalent causal edges. The inference process uses only vector-matrix multiplication and thresholding. More complex FCMs can activate concept nodes with soft thresholds or bell curves or other nonlinear transformations of inputs to outputs. Figure 3 in \parenciteosoba-kosko2017 shows six examples of soft thresholds. Figure 2 shows how to combine FCMs even when some of the FCMs use different concept nodes. A causal learning law can approximate the causal edge values given time-series data of the concept nodes. Figure 3 shows the approximation path of a causal edge from a FCM that models public support for insurgency and terrorism.
A FCM’s overall cyclic signed digraph structure resembles a feedback neural or semantic network. FCMs also resemble causal loop diagrams and systems-dynamics models \parencitesterman2000, abdelbari2018 because nonlinear difference or differential equations describe FCM concept nodes (and sometimes FCM causal edges). A FCM’s directed graph structure permits inference through forward chaining. It also allows the user to control the level of causal or conceptual granularity. A FCM concept node can itself be part of another FCM or of some other nonlinear system. Feedforward fuzzy rule-based systems can also model the input-output structure of a concept node just as they can model a single causal edge that connects one concept node to another. Such fuzzy systems are uniform function approximators if they use enough if-then rules \parencitekosko-fat1994. A uniform approximation allows the user to pick the approximation error-tolerance level in advance for all inputs. Their rule bases adapt using both unsupervised and supervised learning laws \parencitekosko-fuzeng, osoba-mitaim-kosko-SMC2011.
A FCM tends to have many cycles or closed loops in its fuzzy directed graph. These cycles directly model causal feedback from self-loops to multi-path causality. The cycles also produce complex nonlinear dynamics. FCM causal inference maps input states or policies to equilibria of the nonlinear dynamical system. Users can also step through time slices of the FCM dynamical system to at least partially unfold the system in time.
A FCM’s feedback structure contrasts with the acyclic structure of Bayesian belief networks (BBNs). BBNs form the basis of Pearl’s popular model of causal inference \parencitepearl2009causality. Rubin’s counterfactual approach to causality is a related statistical model \parenciterubin2005, imbens2015causal. BBNs for causal inference assign probabilities to a directed acyclic graph (DAG). Their acyclic causal tree structure rules out feedback pathways. This strong acyclic assumption greatly simplifies the probability calculus on such digraphs and may permit finer control when propagating probabilistic beliefs. It allows the sum-product algorithm to compute marginal node probabilities from a known joint probability distribution on all the nodes. FCMs do not produce such probability estimates. But the acyclic structure is hard to reconcile with the inherent and extensive feedback causality of large-scale social phenomena from social networks to state-versus-state wars. These social systems are high-dimensional nonlinear dynamical systems. They have dynamics because they have feedback loops.
The loop-free structure of DAGs also makes it hard to combine causal models from multiple experts. Combining DAGs need not produce a new DAG. So combining DAGs is not a closed graph-theoretical operation in general. Some experts will tend to draw opposing causal arrows between nodes. Others will tend to add links that create multi-node closed loops. This helps explain why BBNs and tree-based expert systems often have relatively few nodes. A compounding factor is the sheer computational complexity of belief propagation.
FCMs naturally combine into a new FCM. So FCM combination is a closed graph-theoretical operation. The user can combine any number of FCMs by adding their underlying augmented adjacency matrices. This gives a simple and powerful way to perform expert knowledge fusion. Such knowledge fusion is a key function in many defense and intelligence decision-making processes \parencitedavis-rand2015, davis-wsc2015, kosko1986combination, kosko-hidden1988, taber1991,taber-yager-helgason2007. Figure 3 shows how two weighted medical FCMs combine into a fused FCM. It shows the minimal fusion case of combining two FCMs with overlapping concept nodes. The fused FCM is a weighted average or probability mixture of the combined FCMs. Averaging the binary augmented adjacency matrices of ordinary DAGs always produces a FCM.
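A minimal sketch of this fusion step follows. It assumes that each expert supplies a list of concept names with a square edge matrix and that the augmented matrix pads unused concepts with zero rows and columns. The two expert maps and the names `augment` and `fuse` are hypothetical. Each expert map is acyclic on its own but the fused map contains the loop A to B to A.

```python
def augment(fcm, all_nodes):
    """Pad an expert's edge matrix with zeros for concepts the expert did not use."""
    nodes, edges = fcm
    index = {name: k for k, name in enumerate(nodes)}
    return [[edges[index[a]][index[b]] if a in index and b in index else 0.0
             for b in all_nodes]
            for a in all_nodes]


def fuse(fcms, weights):
    """Weighted average of augmented edge matrices (weights normalized to sum to one)."""
    all_nodes = sorted({name for nodes, _ in fcms for name in nodes})
    n = len(all_nodes)
    total = sum(weights)
    fused = [[0.0] * n for _ in range(n)]
    for fcm, w in zip(fcms, weights):
        padded = augment(fcm, all_nodes)
        for i in range(n):
            for j in range(n):
                fused[i][j] += (w / total) * padded[i][j]
    return all_nodes, fused


# Hypothetical expert 1 uses concepts A and B and draws A -> B (+1).
expert_1 = (["A", "B"], [[0, 1],
                         [0, 0]])
# Hypothetical expert 2 uses A, B, C and draws B -> A (+1) and B -> C (-1).
expert_2 = (["A", "B", "C"], [[0, 0, 0],
                              [1, 0, -1],
                              [0, 0, 0]])

nodes, fused = fuse([expert_1, expert_2], weights=[1, 1])
```

With equal weights the fused edges A to B and B to A both equal 0.5. So the average of two acyclic maps is a fuzzy map with a feedback loop.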
The strong law of large numbers shows that in many cases this fused FCM converges with probability one to the population FCM of the sampled FCMs \parencitetaber-yager-helgason2007. This result holds formally if the FCM edge values from the combined experts approximate a statistical random sample with finite variance. A random sample is sufficient for this convergence result but not necessary. A combined FCM may still give a representative knowledge base when the expert responses are somewhat correlated or when the experts do not all have the same level of expertise. Users can also construct FCMs from written sources such as policy articles or books or legal testimony. They can also use statistical learning algorithms to grow FCMs from sample data. FCMs can in this way address the growing representational problems of big data and what we have called “big knowledge” \parencitefcm-papageorgiou2013.
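A small simulation can illustrate this convergence under simplifying assumptions. The sketch below assumes that each expert reports the population edge value plus independent zero-mean Gaussian noise. This is only one hypothetical way to generate a random sample. The matrix `POPULATION`, the noise level, and the function name `fused_error` are illustrative choices.

```python
import random


def fused_error(population, n_experts, noise_sd, rng):
    """Mean absolute gap between the equal-weight fused matrix and the population matrix."""
    n = len(population)
    totals = [[0.0] * n for _ in range(n)]
    for _ in range(n_experts):
        for i in range(n):
            for j in range(n):
                # Each hypothetical expert reports the population edge plus independent noise.
                # The sketch does not clip reports to [-1, 1] because clipping would bias the average.
                totals[i][j] += population[i][j] + rng.gauss(0.0, noise_sd)
    errors = [abs(totals[i][j] / n_experts - population[i][j])
              for i in range(n) for j in range(n)]
    return sum(errors) / len(errors)


# Hypothetical population FCM on three concept nodes.
POPULATION = [[0.0, 0.6, -0.4],
              [0.0, 0.0, 0.8],
              [-0.5, 0.0, 0.0]]

rng = random.Random(0)          # fixed seed so that the run is repeatable
error_small = fused_error(POPULATION, 5, 0.2, rng)
error_large = fused_error(POPULATION, 2000, 0.2, rng)
```

The average edge error shrinks roughly like one over the square root of the number of experts in this idealized setting. Correlated or unequally skilled experts would need a different sampling model.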
FCM concept nodes can also be part of some other FCM or of some other nonlinear system. Feedforward fuzzy rule-based systems can also model the input-output structure of a concept node just as they can model a single causal edge that connects one concept node to another. These fuzzy systems are uniform function approximators if they use enough if-then rules \parencitekosko-fat1994. We can also generalize the causal edge values to equal the outputs of such fuzzy rule-based systems or generalized probability mixtures \parencitekosko2017generalized,kosko2018additive. We can further put a probabilistic structure on top of the fuzzy causal structure. This chapter looks only at the fuzzy causal structure.
Inference on a graph maps nodes to nodes. Forward inference maps observable evidence nodes or variables to output nodes or variables. Backward inference maps output nodes back to input nodes. Probabilistic inference on a graph computes the conditional probability of output nodes given a state vector on observable evidence nodes. This computation involves a complex marginalization operation on general DAGs. It also assumes that the user knows the closed-form joint probability distribution on all the nodes. The computation often requires complex message-passing algorithms such as belief propagation \parenciteyedidia2001, murphy2002 or the more general junction-tree algorithm \parencitewainwright-jordan2008. This probabilistic inference or computation is NP-hard in general \parencitedagum-luby1993,russell-norvig2016. And the problem of learning the underlying Bayesian network from data is an NP-complete problem in general \parencitechickering1996,chickering2004.
Loops or cycles in a causal graph model may render exact probabilistic inference difficult if it is even feasible. Inexact or loopy inference schemes can give useful approximations in many cases. These methods include loopy belief propagation, variational Bayesian methods, mean-field methods, and some forms of Markov-chain Monte Carlo simulation \parencitepearl2014,wainwright-jordan2008,beal-ghahramani2003, beal-ghahramani2006. But loopy algorithms need not converge. Nor does their use overcome the NP-hard complexity of probabilistic belief propagation. It may instead only compound the computational complexity.
FCM forward inference uses light computation. It requires only vector-matrix multiplication and nonlinear transformations of vectors. The transformations are often hard or soft thresholds. So FCM forward inference has only polynomial-time complexity. This means that FCMs scale fairly well to problems with high dimension or multiple concept nodes. Simple binary-state FCMs converge quickly to limit-cycle equilibria given an input stimulus \parencitekosko-nnfs. FCMs with more complex node nonlinearities can converge to aperiodic or chaotic equilibria if sufficiently complex nonlinearities describe the concept nodes.
FCMs do have at least two structural limitations. The first is that a user may not be able to use some predicted outcomes. A user may find it hard to interpret a FCM’s what-if predictions because they are equilibria of a highly nonlinear dynamical system. Simple predictions may be limit cycles that consist of only a few ordered system states. The dolphin example below is one such case. One equilibrium output consists of four binary vectors that repeat in sequence. Other bit-vector limit cycles can consist of much longer ordered sequences. Richer dynamics can produce aperiodic or chaotic equilibria in regions of the FCM state space. Such predicted outcomes may have no clear policy interpretation if we cannot clearly associate the equilibrium attractor’s region of the FCM state space with a temporally ordered sequence of FCM states.
A more fundamental limitation is that FCMs do not easily answer why questions given observed outcomes. FCMs do not easily admit backward inference from effects to causes because FCM nodes are neural-like nonlinear mappings of causal inputs to outputs. A user may need to test a wide range of random inputs to see which FCM states map to or near a given observed or conjectured output state. We often check all possible input policy states to find all output equilibria. We did this with the Thucydides-trap FCM below by “clamping on” input policy variables while the FCM dynamical system converged. We also tested this FCM’s pattern predictions against those of a thresholded FCM whose edge values were the trivalent extremes of $\{-1, 0, 1\}$. The thresholded FCM gave similar equilibrium predictions.
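The exhaustive policy sweep and the trivalent comparison can be sketched as follows. This is a toy illustration and not the Thucydides-trap FCM. The 4-node matrix `FUZZY`, the choice of nodes 0 and 1 as policy nodes, and the helper names are all hypothetical. The sketch clamps a policy node by holding its state fixed, which has the same effect as a very large forcing input when nodes use a hard threshold.

```python
from itertools import product


def trivalent(edges):
    """Replace each fuzzy edge weight by its sign in {-1, 0, 1}."""
    return [[(w > 0) - (w < 0) for w in row] for row in edges]


def attractor(edges, clamp, max_steps=1000):
    """Hard-threshold updates with clamped nodes held fixed; returns the repeating states."""
    n = len(edges)
    state = tuple(clamp.get(j, 0) for j in range(n))
    seen, path = {}, []
    for t in range(max_steps):
        if state in seen:
            return tuple(path[seen[state]:])
        seen[state] = t
        path.append(state)
        sums = [sum(state[i] * edges[i][j] for i in range(n)) for j in range(n)]
        state = tuple(clamp[j] if j in clamp else int(sums[j] > 0) for j in range(n))
    raise RuntimeError("no repeated state within max_steps")


def sweep(edges, policy_nodes):
    """Map every binary policy setting to the equilibrium that it produces."""
    return {policy: attractor(edges, dict(zip(policy_nodes, policy)))
            for policy in product((0, 1), repeat=len(policy_nodes))}


# Hypothetical FCM: nodes 0 and 1 are policy nodes, nodes 2 and 3 are outcomes.
FUZZY = [[0, 0, 0.8, 0],     # policy 0 increases outcome 2
         [0, 0, -0.6, 0],    # policy 1 decreases outcome 2
         [0, 0, 0, 0.7],     # outcome 2 increases outcome 3
         [0, 0, 0.3, 0]]     # outcome 3 feeds back to outcome 2

fuzzy_map = sweep(FUZZY, policy_nodes=(0, 1))
crisp_map = sweep(trivalent(FUZZY), policy_nodes=(0, 1))
agree = sum(fuzzy_map[p] == crisp_map[p] for p in fuzzy_map)
```

The fuzzy and trivalent versions of this toy map agree on three of the four policy settings. They differ when both policies are on because thresholding the edges makes the two opposing influences cancel exactly. The sweep grows as 2^k in the number k of policy nodes, so it suits only small policy sets.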
The next sections explore FCMs in some detail. We conclude with two FCM policy applications. The first shows how FCMs can assist in modeling the many causal and policy factors involved in public support for insurgency and terrorism. The second shows how a FCM model can give insight into Allison’s recent “Thucydides trap” model of US-China conflict.
Probabilistic vs. Fuzzy Models of Causality
Causality is uncertain in general. Different uncertainty models tend to produce different causal models. Probability and fuzz or fuzziness are two such different types of uncertainty. Probability or randomness describes whether an event occurs. Fuzz or vagueness describes the degree to which an event occurs. These two types of uncertainty produce different causal models even though both describe uncertainty with numbers in the unit interval $[0, 1]$.
Randomness and fuzz can also combine in their descriptions. This takes some care even at the level of natural language. The statement that “There is a 20% chance of light rain tomorrow” asserts the random occurrence of the fuzzy event of light rain. All rain is both light rain and not light rain to some degree. It is a matter of degree absent a binary definition that literally specifies which rain drop is light rain and which is not. Zadeh first showed in 1968 how to combine these two distinct types of uncertainty and in this order. The probability of a fuzzy event is just the expected value of a measurable multivalued indicator function \parencitezadeh1968.
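A discrete toy calculation shows what this expected value looks like. The rainfall outcomes, their probabilities, and the membership degrees for “light rain” below are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction


def fuzzy_event_probability(pmf, membership):
    """Expected value of the membership (multivalued indicator) function under the pmf."""
    return sum(p * membership[x] for x, p in pmf.items())


# Hypothetical rainfall amounts (mm) and their probabilities for tomorrow.
RAIN_PMF = {0: Fraction(3, 5), 1: Fraction(1, 5), 4: Fraction(1, 10), 20: Fraction(1, 10)}

# Hypothetical degrees to which each amount counts as "light rain".
LIGHT_RAIN = {0: Fraction(0), 1: Fraction(1), 4: Fraction(1, 2), 20: Fraction(0)}

p_light_rain = fuzzy_event_probability(RAIN_PMF, LIGHT_RAIN)
```

The result is 1/5 + 1/20 = 1/4 here. A binary membership function would reduce the same sum to the ordinary probability of a crisp event.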
A different juxtaposition of uncertainty types holds for the statement that “The probability of rain tomorrow is low.” It asserts a fuzzy probability rather than the probability of a fuzzy event. Its meaning also requires a fuzzy-set definition of low versus non-low probability. Such combinations can occur in digraph models of causality. This chapter focuses on the simpler cases of purely probabilistic causality and purely fuzzy causality. It focuses on how these uncertainty models view the directed causal edge $e_{ij}$ from concept node $C_i$ to concept node $C_j$. The FCM view of causality is simply that causality is what a fuzzy signed directed edge measures. Active causality is the flow of concept-node activation or influence from node to node through an intervening directed causal edge.
A probabilistic view might cast the directed causal edge value $e_{ij}$ as the conditional probability $P(C_j \mid C_i)$ that $C_j$ occurs given that $C_i$ occurs. An immediate problem is that $e_{ij}$ takes on negative values in the bipolar interval $[-1, 1]$ to indicate causal decrease. There is a simple but somewhat costly way to address this. The original FCM paper \parencitekosko1986 showed how to introduce companion dis-concepts to keep all causal edge values nonnegative and thus how to convert causal decrease into causal increase: “Extreme terrorism decreases government stability” holds just in case “Extreme terrorism increases government instability” holds. So dis-concepts negate the noun and not the adjective that modifies it. Using dis-concepts doubles the number of concept nodes and expands the $n \times n$ edge matrix to a $2n \times 2n$ matrix. The technique does preserve more causal structure when combining multiple FCMs because then two combined edges of opposite polarity do not cancel each other out if they have the same magnitude.
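The following sketch shows one way to lay out such a doubled matrix and why it avoids cancellation. The layout is an assumption for illustration: the first n columns stand for the concepts and the last n columns stand for their dis-concepts, and the rows for the dis-concepts stay at zero. The original construction may arrange or fill the matrix differently. The function names and the two expert matrices are hypothetical.

```python
def with_dis_concepts(edges):
    """Return a nonnegative 2n x 2n matrix; columns n..2n-1 stand for the dis-concepts."""
    n = len(edges)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            w = edges[i][j]
            if w >= 0:
                big[i][j] = w          # concept i increases concept j
            else:
                big[i][n + j] = -w     # "i decreases j" becomes "i increases dis-concept j"
    return big


def add(a, b):
    """Entrywise sum of two equal-sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Two hypothetical experts disagree about the sign of the edge from node 0 to node 1.
expert_1 = [[0.0, 0.5],
            [0.0, 0.0]]
expert_2 = [[0.0, -0.5],
            [0.0, 0.0]]

signed_sum = add(expert_1, expert_2)
doubled_sum = add(with_dis_concepts(expert_1), with_dis_concepts(expert_2))
```

The signed sum loses the disputed edge because +0.5 and -0.5 cancel to zero. The doubled sum keeps both opinions as two separate positive edges of weight 0.5.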
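There are two structural problems with viewing the directed (positive) edge $e_{ij}$ as the conditional probability $P(C_j \mid C_i)$. The first problem is that conditional probability is not transitive. This is telling since both logical and causal implication are transitive (at least on most reckonings): $C_i$ causes $C_k$ if $C_i$ causes $C_j$ and if $C_j$ causes $C_k$. But the transitive equality $P(C_k \mid C_i) = P(C_j \mid C_i)\, P(C_k \mid C_j)$ does not hold in general. A simple counter-example takes any two disjoint or mutually exclusive events $C_i$ and $C_k$ with positive joint probabilities $P(C_i \cap C_j)$ and $P(C_j \cap C_k)$ if all three set events have positive probability. Then $P(C_j \mid C_i)\, P(C_k \mid C_j) > 0$ but $P(C_k \mid C_i) = 0$ because $P(C_i \cap C_k) = 0$ since $C_i \cap C_k = \emptyset$. An even starker counter-example results if $C_k = C_i^c$ because then $P(C_i^c \mid C_i) = 0$ while $P(C_j \mid C_i)\, P(C_i^c \mid C_j) > 0$.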
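A small numerical instance makes the failure of transitivity concrete. The sketch below assumes a uniform probability on a hypothetical four-point sample space. The events A, B, and C are illustrative: A and C are disjoint but each overlaps B.

```python
from fractions import Fraction


def cond(b, a):
    """P(b | a) under a uniform probability on a finite sample space."""
    return Fraction(len(a & b), len(a))


# Hypothetical events in the sample space {1, 2, 3, 4}.
A = {1, 2}
B = {2, 3}
C = {3, 4}   # disjoint from A

chained = cond(B, A) * cond(C, B)   # P(B | A) P(C | B) = 1/2 * 1/2
direct = cond(C, A)                 # P(C | A) = 0 because A and C never co-occur
```

The chained product equals 1/4 while the direct conditional probability equals 0. So multiplying conditional probabilities along a path does not give the end-to-end conditional probability in general.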
The second and deeper problem with a probability interpretation of is that it collides with the Lewis Impossibility Theorem \parencitelewis1976, lewis1986. This triviality result and its progeny show that we cannot in general equate the probability of the logical if-then conditional with the conditional probability . The equality holds only in the trivial case when and are independent and thus when there is no conditional relationship at all. So a probabilistic transitive equality of the form lacks a formal foundation in general. One approach is to replace conditional probability with a more general probable equivalence relation. This gives upper and lower conditional probabilities based on the general inequality that \parencitekosko2004 because then if . The resulting conditioning interval does not directly address the basic prohibition that lies behind Lewis’s triviality theorem. So a meta-level heuristic may be the best we can make of probabilistic interpretations of the directed edge .
We stress that there are many ways to combine fuzz and probability in a FCM. A system-level example is the averaging technique we discuss below on FCM knowledge combination. The averaging computes a probability mixture of augmented signed fuzzy causal-edge matrices. The law of large numbers can also apply to such averages in many cases.
Probability and fuzz can also apply at the level of the lone directed edge $e_{ij}$.
We can view the causal influence from concept node to concept node as a mapping from values to . The edge maps more generally from all nodes to . Then a fuzzy system of fuzzy if-then rules can approximate any such mapping . The rules have the form “If then ” where is a fuzzy subset of the input space and is a fuzzy subset of the output space. The fuzzy system can suffer rule explosion when its input space has dimension. A Watkins representation can exactly represent with just two rules if the real function is bounded and not constant. Fuzzy rules can also absorb or approximate the prior, likelihood, and posterior probability density functions of modern Bayesian inference \parenciteosoba-mitaim-kosko-SMC2011. We note that fuzzy systems admit a complete probabilistic description in terms of mixtures of probability curves \parencitekosko2017generalized, kosko2018additive. The rules roughly correspond to the mixed probability curves. So a rule base of if-then fuzzy rules gives rise to a generalized probability mixture where the generalized mixing weights are nonnegative and sum to unity for each input and where the likelihood functions also depend on in general. The generalized mixture gives a generalized version of the theorem on total probability. So it gives at once a Bayes theorem that describes which rules fired to which degree for any given input and output. The simplest case results when the fuzzy system has just one rule. Then reduces to the constant function .
We next show how to define a FCM and use it to make causal inferences.
Forward Inference with Fuzzy Cognitive Maps
FCMs are fuzzy signed directed graphs that describe degrees of causality and webs of causal feedback \parencitekosko-nnfs, fcm-papageorgiou2013. Most FCMs have cycles or closed loops that model causal feedback. FCMs can be acyclic and thus define directed trees. This is rare in practice and implies that such a FCM has no feedback dynamics.
FCMs are fuzzy because their nodes and edges can be multivalent and so need not be binary or bivalent. We now develop this fuzzy structure and apply it to FCMs. A property or concept is fuzzy if it admits degrees and is not just black and white \parencitezadeh1965, zadeh1973. Then the property or concept has borders that are gray and not sharp or binary. This formally means that a subset $A$ of a space $X$ is properly fuzzy if and only if at least one element $x \in X$ belongs to $A$ to a degree other than $0$ or $1$. Then the set $A$ breaks the so-called “law” of contradiction because then $A \cap A^c \neq \emptyset$ holds where $A^c$ is the complement set of $A$. The set $A$ equivalently breaks the dual “law” of excluded middle because then $A \cup A^c \neq X$ holds. Equality holds in these two “laws” just in case $A$ is an ordinary bivalent set.
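A short check with membership degrees shows how a properly fuzzy set breaks both “laws”. The sketch assumes the standard pointwise operators: minimum for intersection, maximum for union, and one minus the degree for complement. The two membership tables are hypothetical.

```python
def complement(a):
    """Pointwise complement: each element belongs to the complement to degree 1 - a(x)."""
    return {x: 1.0 - degree for x, degree in a.items()}


def intersection(a, b):
    """Pointwise minimum of two membership functions on the same space."""
    return {x: min(a[x], b[x]) for x in a}


def union(a, b):
    """Pointwise maximum of two membership functions on the same space."""
    return {x: max(a[x], b[x]) for x in a}


# Hypothetical fuzzy set: two elements belong to it only partially.
FUZZY_A = {"x1": 0.0, "x2": 0.25, "x3": 0.5, "x4": 1.0}
# Hypothetical bivalent set on the same space.
CRISP_A = {"x1": 0.0, "x2": 1.0, "x3": 1.0, "x4": 0.0}

fuzzy_overlap = intersection(FUZZY_A, complement(FUZZY_A))   # not the empty set
fuzzy_cover = union(FUZZY_A, complement(FUZZY_A))            # not the whole space
crisp_overlap = intersection(CRISP_A, complement(CRISP_A))   # the empty set
crisp_cover = union(CRISP_A, complement(CRISP_A))            # the whole space
```

The fuzzy set overlaps its own complement to degree 0.5 at `x3` and the pair covers `x3` only to degree 0.5. The bivalent set gives an empty overlap and a full cover as the classical laws require.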
A FCM concept node is fuzzy in general because it can take values in the unit interval $[0, 1]$. So its values over time define a fuzzy set. This implies that a concept node that describes a survival threat or any other property or policy both occurs and does not occur to some degree at the same time. It cannot both occur 100% and not occur 100% at the same time. The two percentages must sum to 100%. Nor again does this preclude applying a probability measure to a concept node or fuzzy set. The probability of a fuzzy event combines the two distinct uncertainty types of randomness and vagueness or fuzz (and formally involves taking the expectation of a measurable fuzzy indicator function \parencitezadeh1968). So it makes sense to speak of the probability of a partial survival threat. This differs from the compound uncertainty of a fuzzy probability such as the statement that the survival-threat probability is low or very high. This chapter works only with fuzzy concept values.
A directed causal edge $e_{ij}$ is also fuzzy because in general it takes on a continuum of values. The edge can also have a positive or negative sign. So it takes values in the bipolar interval $[-1, 1]$. The use of “dis-concepts” can convert all negative causal edges into positive edge values \parencitekosko1986.
A simple FCM consists of $n$ concept nodes $C_j$ and $n^2$ directed fuzzy causal edges $e_{ij}$. The concept nodes are nonlinear and represent variable concepts or factors in a causal system. They are nonlinear in how they convert their inputs to outputs. The concept nodes can define concepts or social patterns that increase or decrease such as political instability or jihadi radicalism. Or they can describe policies or control variables that increase or decrease such as weapons spending or foreign investment in a country. The very first FCM published \parencitekosko1986 dealt with concepts related to Middle East stability such as Islamic fundamentalism and Soviet imperialism and the strength of the Lebanese government. The author based this first FCM on a 1982 newspaper editorial from political analyst Henry Kissinger titled “Starting Out in the Direction of Middle East Peace.”
A concept node’s occurrence or activation value $C_j(t)$ measures the degree to which the concept $C_j$ occurs in the causal web at time $t$. It can also reflect the degree to which it is true that the $j$th node fires or appears in a given snapshot of the causal web at time $t$. The FCM state vector $C(t) = (C_1(t), \ldots, C_n(t))$ gives a snapshot of the FCM system at time $t$. The FCM edges themselves may be the weighted average of several experts as we explain below.
A FCM model must specify the nonlinear dynamics of the concept nodes $C_1, \ldots, C_n$. It must also specify the directed and signed causal edge values $e_{ij}$ that connect the $i$th concept node $C_i$ to $C_j$. The edges can be time-varying functions $e_{ij}(t)$ in more general FCMs.
We start with the nonlinear structure of the concept nodes. The $j$th concept node $C_j$ depends at time $t$ on a scalar input $x_j(t)$ that weights and aggregates all the in-flowing causal activation to $C_j$. Then some nonlinear function $\Phi_j$ converts $x_j(t)$ into the concept node’s new state $C_j(t+1)$ at the next discrete time $t+1$. The FCM literature explores discrete and continuous node models with a wide variety of nonlinearities and time lags \parenciteglykas2010, fcm-papageorgiou2013. We present here the simplest case of a discrete FCM where each node’s current state depends on an edge-weighted inner product of the node activity:
$$C_j(t+1) = \Phi_j\left(\sum_{i=1}^{n} C_i(t)\, e_{ij}(t) + I_j(t)\right) \tag{1}$$
where $I_j(t)$ is some external or exogenous forcing value or input at time $t$. The simplest nonlinear function $\Phi_j$ is a hard threshold that produces bivalent or on-off concept node values:
$$C_j(t+1) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} C_i(t)\, e_{ij}(t) + I_j(t) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
for a zero threshold value.
Fixing the input $I_j(t)$ as some very large positive (or negative) value ensures that $C_j$ stays on (or off) during an inference cycle. We call this “clamping” on (or off) the $j$th concept node $C_j$. We clamp one or more concept nodes to test a given policy or forcing scenario. Clamping is the only way to drive policy or other nodes that have no causal fan-in from other concept nodes. We show below how to model the sustained presence of a shark in the dolphin FCM of Figure 1 by clamping on the fourth concept node.
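The threshold update and the clamping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-node edge matrix `E`, the function name `fcm_step`, and the constant `BIG` are hypothetical and do not come from any FCM in this paper.

```python
from typing import List, Optional, Sequence

def fcm_step(state: Sequence[int],
             edges: Sequence[Sequence[float]],
             forcing: Optional[Sequence[float]] = None) -> List[int]:
    """One synchronous update of a binary FCM with zero thresholds.

    Node j turns on only if sum_i state[i] * edges[i][j] + forcing[j] > 0.
    """
    n = len(state)
    if forcing is None:
        forcing = [0.0] * n
    new_state = []
    for j in range(n):
        total_input = sum(state[i] * edges[i][j] for i in range(n)) + forcing[j]
        new_state.append(1 if total_input > 0 else 0)
    return new_state

BIG = 1e6  # assumed stand-in for a "very large" forcing value

# Hypothetical 3-node FCM: node 0 is a policy node with no causal fan-in.
E = [[0, 1, 0],    # node 0 increases node 1
     [0, 0, 1],    # node 1 increases node 2
     [0, -1, 0]]   # node 2 decreases node 1

free = fcm_step([1, 0, 0], E)                               # node 0 is not sustained
clamped_on = fcm_step([1, 0, 0], E, forcing=[BIG, 0, 0])    # node 0 held on
clamped_off = fcm_step([1, 0, 0], E, forcing=[0, -BIG, 0])  # node 1 held off
```

Node 0 has no fan-in in this toy matrix, so it switches off after one step unless the forcing term holds it on. That is the sense in which clamping is the only way to drive such a node.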
Continuous-valued concept nodes often use a monotone increasing nonlinearity such as the logistic sigmoid function. But $\Phi_j$ can also be nonmonotonic. This happens if it is a Gaussian or Cauchy probability density function. It can also be multimodal by forming a mixture of such unimodal probability curves. Then the Expectation-Maximization algorithm can tune the mixture parameters based on numerical training data \parenciteosoba2016noisy. Almost all concept nodes are monotonically nondecreasing in the FCM literature. The causal-influence theorem below holds for such activation functions $\Phi_j$.
The logistic causal activation gives a soft threshold that approximates the hard threshold in (2) if the shape parameter $c > 0$ is large enough:
\begin{equation}
C_j(t+1) = \frac{1}{1 + \exp\left( -c\, x_j(t) \right)} \tag{3}
\end{equation}
where $x_j(t) = \sum_{i=1}^{n} C_i(t)\, e_{ij}(t) + I_j(t)$ is the total input to the $j$th node. The first graph in Figure 3 of \parenciteosoba-kosko2017 shows a logistic function and its sigmoidal or soft-threshold shape. Logistic units are popular in causal and neural-network learning algorithms because they smoothly approximate the on-off behavior of threshold units and still have a simple partial derivative of the form
\begin{equation}
\frac{\partial C_j}{\partial x_j} = c\, C_j \left( 1 - C_j \right) \tag{4}
\end{equation}
for scaling constant $c > 0$. The positive derivative in (4) greatly simplifies many learning algorithms. We will also use it below to show the transitive product effect of the edges in a causal inference.
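A quick numerical check can make the logistic derivative concrete. The sketch below assumes a logistic activation $1 / (1 + e^{-cx})$ with shape parameter `c`. The function names and the test point are illustrative choices and not part of the paper.

```python
import math

def logistic(x: float, c: float = 1.0) -> float:
    """Soft-threshold activation with shape parameter c > 0."""
    return 1.0 / (1.0 + math.exp(-c * x))

def logistic_slope(x: float, c: float = 1.0) -> float:
    """Closed-form derivative c * s * (1 - s) where s = logistic(x, c)."""
    s = logistic(x, c)
    return c * s * (1.0 - s)

# Compare the closed form with a central finite difference at one point.
x, c, h = 0.3, 5.0, 1e-6
numeric = (logistic(x + h, c) - logistic(x - h, c)) / (2 * h)
analytic = logistic_slope(x, c)

# A large shape parameter pushes the soft threshold toward the hard threshold.
soft = [round(logistic(v, c=50.0)) for v in (-0.5, 0.5)]
```

The slope is always positive and peaks at $c/4$ when the input is zero. The node is then most sensitive to changes in its causal input.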
We turn next to the causal edge values $e_{ij}$. These values are constants during most FCM inferences. Section 6 below shows how a version of the differential Hebbian learning law can learn and tune these causal edge values from time-series data.
The causal edge value $e_{ij}$ in (1) measures the degree that concept node $C_i$ causes concept node $C_j$ at time $t$:
\[
e_{ij}(t) \in [-1, 1] .
\]
These causal edge values define the FCM’s fuzzy adjacency matrix or causal edge matrix $E$. The $i$th row lists the causal edge values $e_{i1}, \ldots, e_{in}$ that flow out from $C_i$ to the other concept nodes (including to itself). The $j$th column lists the causal edge values $e_{1j}, \ldots, e_{nj}$ that flow into $C_j$ from the other concept nodes. So the $i$th row defines the causal fan-out vector of concept node $C_i$. The $j$th column defines the causal fan-in vector of $C_j$. The matrix diagonal lists any causal self-excitation $e_{kk}$ of the concept nodes.
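The row and column reading of the edge matrix is easy to state in code. The 3-by-3 matrix below is a made-up example with fractional edge values, and the helper names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical fuzzy signed adjacency matrix: E[i][j] is the edge from node i to node j.
E = [[0.0, 0.8, -0.3],
     [0.0, 0.0, 0.5],
     [-1.0, 0.0, 0.2]]

def fan_out(edges: Sequence[Sequence[float]], i: int) -> List[float]:
    """Row i: the causal edges that flow out of node i."""
    return list(edges[i])

def fan_in(edges: Sequence[Sequence[float]], j: int) -> List[float]:
    """Column j: the causal edges that flow into node j."""
    return [row[j] for row in edges]

def self_excitation(edges: Sequence[Sequence[float]]) -> List[float]:
    """Diagonal: each node's causal edge to itself."""
    return [edges[k][k] for k in range(len(edges))]

out_of_node_0 = fan_out(E, 0)
into_node_2 = fan_in(E, 2)
loops = self_excitation(E)
```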
We can also interpret $e_{ij}$ in terms of fuzzy subsethood \parencitekosko-fuzzyentropy1986, kosko2004. Then $e_{ij}$ states the degree to which the fuzzy concept set $C_i$ is a fuzzy or partial subset of fuzzy concept set $C_j$ \parencitekosko1986. This abstract framework implies that the edge value $e_{ij}$ is the degree to which the fuzzy concept set $C_i$ belongs to the fuzzy power set of fuzzy set $C_j$. This is only one interpretation of causal degrees. A FCM simply models causality as a fuzzy signed directed edge.
Figure 1 shows a FCM fragment that models an undersea causal web of dolphins in the presence of sharks or other survival threats \parencitedickerson-kosko1994. The next section shows how to make what-if inferences or predictions with this simple FCM that has binary concept nodes and trivalent causal edges. The inference process uses only vector-matrix multiplication and thresholding. More complex FCMs can activate concept nodes with nonlinear functions or with many other monotonic or nonmonotonic functions. A causal learning law can approximate the causal edge values given time-series data of the concept nodes.
FCM Dynamics for Binary Concept States
We first illustrate FCM inference with the simple but common case of binary or on-off concept nodes. FCM dynamics depend on the FCM’s nonlinear feedback structure. A FCM’s feedback loops model interwoven causal relationships and can produce rich and predictive equilibrium dynamics \parencitekosko-hidden1988, kosko-nnfs. These causal equilibria define “hidden patterns” \parencitekosko-hidden1988 in the often inscrutable web of edges and nodes. FCMs with binary concepts produce either limit-cycle equilibria or simple fixed-point attractors. A limit cycle is an ordered sequence of FCM states that repeats. A fixed point is a limit cycle of length one. Properly fuzzy concept nodes can in principle produce more exotic equilibria such as limit tori or chaotic attractors. See \parencitehirsch2012 for the formal definition of these dynamical equilibria.
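The long-run evolution of the FCM state vector
\[
C(t) = \left( C_1(t), \ldots, C_n(t) \right)
\]
depends on the initial state $C(0)$ as well as on the nonlinear structure of the concept nodes and the structure of the FCM causal edge matrix $E$. Simple two-state or binary-node FCMs converge either to a fixed-point attractor
\[
C(t+1) = C(t)
\]
or to a limit cycle of repeating bit vectors. This convergence assumes synchronous updating of all the concept nodes at each time step. This stability or convergence guarantee for binary-node FCMs follows from the local result that every square connection matrix is temporally stable \parencitekosko-nnfs. It also reflects a simple counting argument: A binary FCM with $n$ concept nodes and a fixed edge matrix has only $2^n$ possible states. So a deterministic synchronous update must eventually revisit a state and then repeat.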
Consider again the FCM fragment in Figure 1 that describes some of the predator-prey behavior of a dolphin pod in the presence of sharks or other survival threats \parencitedickerson-kosko1994. The concept nodes are binary with threshold activations that obey (2). Bivalent nodes simplify the dynamical analysis because updating all nodes at the same time must lead to either a fixed-point attractor or a limit-cycle of bit vectors.
Inference requires a causal edge matrix $E$. So we start with the edge matrix that underlies the dolphin FCM in Figure 1.
The edges in the dolphin FCM fragment in Figure 1 are trivalent: $e_{ij} \in \{-1, 0, 1\}$. So an edge describes maximal causal increase ($e_{ij} = 1$) or maximal causal decrease ($e_{ij} = -1$) or there is no causal relationship at all ($e_{ij} = 0$). The causal edge adjacency matrix $E$ for the FCM in Figure 1 is a 5-by-5 trivalent matrix:
A key argument for using trivalent edge weights in $\{-1, 0, 1\}$ here and elsewhere is that experts may find it hard to accurately state a graded measure of causal intensity for a causal dependence. It is often much easier to elicit just sign values from experts than real-valued magnitudes. Taber et al. \parencitetaber-yager-helgason2007 refer to this difficulty as the expert’s articulation burden. Real-valued magnitudes also tend to be less reliable. Experts are far more likely to agree on edge signs than on both signs and magnitudes. Even the same expert may state different edge-value magnitudes at different times. This articulation burden motivates averaging the trivalent-edge-valued FCMs of experts to approximate the unknown population FCM. We also studied a thresholded version of the Thucydides-trap FCM below to gauge the robustness of the corresponding FCM with fractional edge values. Thresholding produced a trivalent FCM.
The stochastic convergence result in the appendix of \parencitetaber-yager-helgason2007 shows that averaging FCMs with trivalent edges approximates the underlying population FCM that has real edge values. FCM sample averages converge with probability one to the population average in accord with the strong law of large numbers. The underlying limit-cycle structure of the averaged FCM also appears to approximate the limit-cycle structure of the original or population FCM if the concept nodes are binary. The limit-cycle results in \parencitetaber-yager-helgason2007 are only preliminary simulations.
We now show how a limit-cycle hidden pattern occurs in the dolphin FCM in Figure 1. Suppose that a shark appears at time $t = 0$. Then the fourth or survival-threat concept node $C_4$ occurs or turns on. We can represent this initial state of the FCM with the unit bit vector
\[
C(0) = (0, 0, 0, 1, 0) .
\]
Each of the 5 concept nodes acts as a threshold function with zero threshold as in (2). So $C_j(t+1) = 1$ if and only if its total inner-product input is positive: $\sum_{i=1}^{5} C_i(t)\, e_{ij} > 0$. It otherwise equals zero and thus turns off or stays off if it is not active. Then a forward inference gives the following sequence of FCM state vectors:
This inference sequence defines an equilibrium 4-step limit cycle because the next state vector $C(4)$ is just the first state vector $C(0)$. So the FCM equilibrium or hidden pattern is the indefinitely repeating cycle $C(0) \to C(1) \to C(2) \to C(3) \to C(0)$. This cycle of state vectors is equivalently a cycle of four bit vectors. The repeating cycle predicts a predator-prey oscillation: The shark threat appears. Then the threatened dolphin pod clusters and runs away. Then the dolphins get tired. Then they rest. But the resting dolphins then attract a shark and so on. This limit cycle can model an incidental appearance of a shark.
Suppose instead that a shark appears and actively pursues the dolphins. We can model this what-if policy scenario by clamping the fourth node $C_4$ on during each update. This again amounts to adding a large positive input value $I_4$ in (1). Clamping leads to two transient bit-vector states and then a stable 3-step equilibrium limit cycle:
The equilibrium 3-step limit cycle defines and thus predicts a different form of predator-prey behavior: The shark tires the dolphin pod. The dolphins cluster in a safety maneuver. They then try to rest and still run away as they fatigue. The shark does not relent and the dolphins fatigue and so on.
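The inference procedure above uses only vector-matrix multiplication, thresholding, and a check for a repeated state. The sketch below shows one way to code it. The 5-by-5 trivalent matrix is a hypothetical stand-in chosen to tell a similar predator-prey story. It is not the edge matrix of Figure 1, and its node labels are assumptions. The code clamps a node by forcing it on after each update, which has the same effect as a very large positive input.

```python
from typing import List, Sequence, Tuple

State = Tuple[int, ...]

def step(state: State, edges: Sequence[Sequence[int]],
         clamp_on: Sequence[int] = ()) -> State:
    """Synchronous zero-threshold update followed by clamping nodes on."""
    n = len(state)
    nxt = [1 if sum(state[i] * edges[i][j] for i in range(n)) > 0 else 0
           for j in range(n)]
    for j in clamp_on:
        nxt[j] = 1
    return tuple(nxt)

def run_to_cycle(start: Sequence[int], edges: Sequence[Sequence[int]],
                 clamp_on: Sequence[int] = ()) -> Tuple[List[State], List[State]]:
    """Iterate until a state repeats. Return (transient states, limit cycle)."""
    seen, path = {}, []
    state = tuple(start)
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = step(state, edges, clamp_on)
    first = seen[state]
    return path[:first], path[first:]

# HYPOTHETICAL edge matrix (not Figure 1). Assumed node order:
# 0 herd clusters, 1 fatigue, 2 rest, 3 survival threat, 4 run away.
E = [[0, 0, 0, -1, 0],
     [0, 0, 1, 0, -1],
     [0, -1, 0, 1, 0],
     [1, 0, 0, 0, 1],
     [0, 1, -1, 0, 0]]

shark = (0, 0, 0, 1, 0)
transient, cycle = run_to_cycle(shark, E)
clamped_transient, clamped_cycle = run_to_cycle(shark, E, clamp_on=(3,))
```

The unclamped run of this toy matrix returns to its start after four steps: threat, then cluster and run, then fatigue, then rest. The clamped run keeps the threat node on in every state and settles into a different cycle. The transient length and cycle length of the clamped run depend on the assumed matrix and need not match the counts that the text reports for Figure 1.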
We can formally describe the forward spread of causal activation in a FCM through the tools of the differential calculus. The Appendix treats the important special case where the concept nodes are soft thresholds or other differentiable functions of their inputs. The main theorem confirms that FCM causal activation is transitive.
We next show how to combine any number of FCMs into a common FCM knowledge base through an averaging or mixing process. The result is always some causal edge matrix $E$. Then we will return to this fixed-matrix case and show how forward causal inference proceeds when the causal concept nodes are smoothly differentiable functions of their inputs. We will then show how time-differentiable causal edges can define causal learning laws.
Combining Causal Knowledge: Averaging Edge Matrices
Causal modeling faces a threshold epistemic question when dealing with multiple experts: How do we combine the causal models of multiple knowledge sources?
A common answer avoids the question by combining the causal knowledge or expertise before it enters a causal model. Some form of this knowledge preprocessing occurs with AI search trees and other DAG models. Multiple knowledge sources may lead a knowledge engineer to draw or otherwise modify a weighted causal arrow in a model. That differs from first letting each source have its own causal arrow and then combining. This fit-all-in-one-model approach may work well for problems of small dimension or small expert sample size. Even then it may obscure the disparate knowledge that went into the representation. But it can ensure that a causal DAG stays a DAG as it encodes new information. The approach can become more ad hoc and restrictive as the expert sample size grows. The likelihood of getting a causal cycle only increases with expert sample size and the node count of the model.
FCMs answer the epistemic question directly: Average the causal FCMs of each expert \parencitekosko-hidden1988, kosko-nnfs, taber1991. Preprocessing can still occur. But there is no limit to the number of FCMs that averaging can combine. The result is always a FCM and one with all the representative properties of a sample average. This numerical result holds even though experts may express their knowledge solely in words. Laws of large numbers can apply directly or partially based on how well the experts’ FCM sample approximates a random sample.
The FCM average forms a mixture or convex combination of the causal edges. A group of $m$ experts can each produce an FCM causal edge matrix $E_k$ that describes some fixed problem domain. Each expert can model different concept and policy nodes. The total number of nodes is $n$. Augment the edge matrices with zero rows and columns for any missing nodes in an expert’s causal edge matrix. Then FCM knowledge fusion or combination takes the weighted average of their augmented causal edge matrices:
\begin{equation}
E = \sum_{k=1}^{m} w_k E_k \tag{11}
\end{equation}
where the weights $w_k$ are convex weights and hence nonnegative and sum to one.
The weights $w_k$ can reflect relative expert credibility in the problem domain. They can reflect test scores or subjective rankings or some other measure of the experts’ predictive accuracy in prior experiments. The same weight need not apply to the entire $k$th FCM edge matrix $E_k$. Each edge value $e_{ij}^{k}$ can have its own weight $w_{ij}^{k}$. So a weight matrix $W_k$ corresponds to each expert’s FCM edge matrix $E_k$. Predd et al. \parencitepredd-et-al2008 developed a method for combining expert inputs when the experts abstain or when they are incoherent. Voting schemes \parenciteconitzer-et-al2009, caragiannis-procaccia2013 might also pick the FCM weights and affect the fusion process. We here take the weights as given and use equal weights $w_k = 1/m$ as a default.
The edge matrices $E_k$ in (11) must be conformable for addition. So they must have the same number of rows and columns and must list the same concept nodes in the same matrix positions. So we first take the union of all concept nodes from all knowledge sources. This again gives a total of $n$ distinct concept nodes. Then we zero-pad or add rows and columns of zeros for missing nodes in a given knowledge source’s causal edge matrix. This gives a conformable $n$-by-$n$ signed fuzzy adjacency matrix after permuting rows and columns to bring them in mutual coincidence with all the other zero-padded augmented matrices.
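The union, zero-padding, and weighted-averaging steps can be sketched as follows. The function name `fuse_fcms`, the node names, and the two small expert matrices are hypothetical. The sketch uses one scalar weight per expert and defaults to equal weights.

```python
from typing import List, Optional, Sequence, Tuple

ExpertFCM = Tuple[Sequence[str], Sequence[Sequence[float]]]

def fuse_fcms(experts: Sequence[ExpertFCM],
              weights: Optional[Sequence[float]] = None
              ) -> Tuple[List[str], List[List[float]]]:
    """Weighted average of expert edge matrices after zero-padding.

    Each expert is a pair (node_names, edge_matrix). Nodes that an expert
    omits contribute zero rows and columns to that expert's padded matrix.
    """
    all_nodes: List[str] = []
    for names, _ in experts:          # union of concept nodes, in first-seen order
        for name in names:
            if name not in all_nodes:
                all_nodes.append(name)
    n, m = len(all_nodes), len(experts)
    if weights is None:
        weights = [1.0 / m] * m       # equal credibility by default
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    index = {name: k for k, name in enumerate(all_nodes)}
    fused = [[0.0] * n for _ in range(n)]
    for w, (names, edges) in zip(weights, experts):
        for a, src in enumerate(names):
            for b, dst in enumerate(names):
                fused[index[src]][index[dst]] += w * edges[a][b]
    return all_nodes, fused

# Two hypothetical experts who share only node "B".
expert_1 = (["A", "B"], [[0, 1], [-1, 0]])
expert_2 = (["B", "C"], [[0, 1], [1, 0]])
nodes, E = fuse_fcms([expert_1, expert_2])
```

Neither expert's map contains a path from A to C. The fused map does: A increases B and B increases C. The fused map also contains the loop B to C to B even though the first expert's map has no node C.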
The strong law of large numbers gives some guarantees about the convergence of this fusion knowledge graph to a representative population FCM if the knowledge sources are approximately statistically independent and identically distributed and if they have finite variance \parencitekosko-hidden1988, taber-yager-helgason2007. Then the weighted average in (11) can only reduce the inherent variance in the expert sample FCMs. So the knowledge fusion process improves with sample size $m$. Simulations have shown that the equilibrium limit cycles of the combined FCM tend to resemble the limit cycles of the individual FCMs \parencitetaber-yager-helgason2007. An expert random sample is sufficient for this convergence result but not necessary. A combined FCM may still give a representative knowledge base when the expert responses are somewhat correlated or when the experts do not all have the same level of expertise or problem-domain focus. Users can also use policy articles or books or legal testimony as proxy experts.
Figure 2 shows the minimal combination case where two FCMs fuse into one representative FCM. The mixture or convex combination of FCMs creates a new fused FCM as the weighted averages of the FCMs’ augmented signed fuzzy adjacency matrices. Users can add new concept nodes or factors at will. Each new factor converts all $n$-by-$n$ edge matrices into $(n+1)$-by-$(n+1)$ edge matrices. This again amounts to adding a new zero-padded row and column to an edge matrix if its corresponding FCM does not include the factor as a concept node. An expert has a zero row and column for a concept node if the expert implicitly states that the concept is not causally relevant.
This fusion averaging technique can reflect bad effects as well as any other effect. The technique can reflect anomalous effects due to active sabotage or extreme variance in expert opinions. Highly variable expert inputs will tend to produce a highly variable FCM causal knowledge base. There may be no benefit from combining expert edge values that approximate thick-tailed probability densities. Cauchy probability bell curves closely resemble normal probability bell curves. Cauchy bell curves have slightly thicker tails that give rise to far more variable realizations. But the sample average of Cauchy random variables is itself a Cauchy random variable. So there is no benefit or decrease in system variance whatsoever in this thick-tailed case. The combined result has the same infinite variance that any one of the individual Cauchy samples has. Combining knowledge sources with even thicker-tailed probability densities (such as many alpha-stable densities) can produce variability even more extreme than the variability of any of the combined knowledge sources.
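The Cauchy claim follows from a short characteristic-function argument. The sketch below assumes $m$ independent edge values $X_1, \ldots, X_m$ that share a Cauchy density with location $a$ and dispersion $d > 0$. These symbols are illustrative and are not the paper's own notation. The common characteristic function is
\[
\varphi_X(\omega) = \mathbb{E}\left[ e^{i \omega X} \right] = \exp\left( i a \omega - d\, |\omega| \right) .
\]
Independence then gives the characteristic function of the sample mean $\bar{X}_m = \frac{1}{m} \sum_{k=1}^{m} X_k$ as
\[
\varphi_{\bar{X}_m}(\omega) = \prod_{k=1}^{m} \varphi_X\!\left( \frac{\omega}{m} \right) = \exp\left( i a \omega - m\, d \left| \frac{\omega}{m} \right| \right) = \exp\left( i a \omega - d\, |\omega| \right) = \varphi_X(\omega) .
\]
So the sample mean has the same Cauchy density as any single sample no matter how large $m$ grows. Averaging does not shrink the dispersion $d$ at all.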
Large-scale FCM combination can combine FCMs with simple recursive updates. Then the new FCM edge matrix equals the current combined FCM edge matrix plus a correction term that includes the new FCM edge matrix. This recursive form of FCM combination can assist large-scale online FCM knowledge combination in social media and elsewhere. The recursions apply locally to the current combined edge value $\bar{e}_{ij}(m)$ that gives the $ij$th causal edge of the $m$ combined FCMs.
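We state the recursions for the simplest case where all FCMs and hence all knowledge sources have the same weight or credibility. Let $\bar{e}_{ij}(m)$ denote the sample mean of the first $m$ causal edges $e_{ij}^{1}, \ldots, e_{ij}^{m}$:
\begin{equation}
\bar{e}_{ij}(m) = \frac{1}{m} \sum_{k=1}^{m} e_{ij}^{k} . \tag{12}
\end{equation}
Let $\sigma_{ij}^{2}(m)$ denote the corresponding unbiased sample variance of the first $m$ combined $ij$th edge values:
\begin{equation}
\sigma_{ij}^{2}(m) = \frac{1}{m-1} \sum_{k=1}^{m} \left( e_{ij}^{k} - \bar{e}_{ij}(m) \right)^{2} \tag{13}
\end{equation}
for $m > 1$. The question is how to recursively update each of these averages given a new $(m+1)$th edge value $e_{ij}^{m+1}$. The answer comes from the predictor-corrector form of updates often found in Kalman filters \parencitemeditch1969stochastic:
\begin{align}
\bar{e}_{ij}(m+1) &= \bar{e}_{ij}(m) + \frac{1}{m+1} \left[ e_{ij}^{m+1} - \bar{e}_{ij}(m) \right] \tag{14} \\
\sigma_{ij}^{2}(m+1) &= \left( 1 - \frac{1}{m} \right) \sigma_{ij}^{2}(m) + (m+1) \left[ \bar{e}_{ij}(m+1) - \bar{e}_{ij}(m) \right]^{2} . \tag{15}
\end{align}
The new $ij$th edge statistic equals the old or predicted value plus a new or corrector value. A similar recursion holds for updating the combined edge’s sample covariance. More complex recursions hold for variable-weight edge values although these will likely not apply in large-scale online settings.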
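A short script can check that predictor-corrector updates of this kind reproduce the batch statistics. The sketch assumes equal expert weights and uses the standard recursions for the sample mean and the unbiased sample variance. The function names and the five edge values are hypothetical.

```python
import statistics
from typing import List

def update_mean(mean_m: float, m: int, new_value: float) -> float:
    """Mean of m + 1 values from the mean of the first m values."""
    return mean_m + (new_value - mean_m) / (m + 1)

def update_variance(var_m: float, mean_m: float, mean_next: float, m: int) -> float:
    """Unbiased variance of m + 1 values from the variance of the first m (m >= 2)."""
    return (1.0 - 1.0 / m) * var_m + (m + 1) * (mean_next - mean_m) ** 2

# Hypothetical values that five experts assign to the same causal edge.
edge_values: List[float] = [0.5, 1.0, -0.25, 0.75, 0.0]

mean = statistics.mean(edge_values[:2])
var = statistics.variance(edge_values[:2])
for m in range(2, len(edge_values)):      # m counts the values already combined
    new_mean = update_mean(mean, m, edge_values[m])
    var = update_variance(var, mean, new_mean, m)
    mean = new_mean
```

Each update needs only the current mean, the current variance, and the count of combined experts. An online system therefore does not need to store the individual expert matrices.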
We turn next to inferring the directed causal edges from time-series data.
Learning FCM Causal Edges
Correlation does not imply causation. But some time-lagged correlations suggest causation. This is the idea behind the unsupervised learning laws below for estimating the directed causal edges $e_{ij}$. They are unsupervised because there is no teaching signal that the learning process matches against.
We can learn causal edge strengths through the concomitant activation among the factor pairs. This approach assumes that events (factor activities) are more likely to involve a causal connection if the events occur together \parencitehebb1949, kosko-nnfs, kosko-DHL1986. This suggests the well-known Hebbian correlation learning law (neurons that fire together wire together) for training neural network synaptic weights \parencitekosko-nnfs:
\begin{equation}
\dot{e}_{ij} = -e_{ij} + C_i C_j \tag{16}
\end{equation}
where $\dot{x}$ denotes the time derivative of the signal $x$. The passive decay term $-e_{ij}$ stabilizes the learning in the differential-equation model. It also models a “forgetting” constraint that helps the network prune inactive connections. The product term $C_i C_j$ directly models concomitant correlation.
We can instead use concomitant variation \parencitejsm1843 in time between factors as partial evidence of a causal relation between those factors or concepts. Suppose the data show that an increase in $C_i$ occurs at the same time as an increase in $C_j$. This concomitant increase suggests that the edge value $e_{ij}$ should be positive. Suppose instead that decreases in $C_i$ occur with increases in $C_j$. Then such opposed variation suggests a negative causal edge value $e_{ij}$. Even a slight time lag between the two concept nodes can indicate the direction of causality in practice. Such concomitant variation or covariation leads to the differential Hebbian learning (DHL) law \parencitekosko-DHL1986, kosko-nnfs, kosko-hidden1988:
\begin{equation}
\dot{e}_{ij} = -e_{ij} + \dot{C}_i \dot{C}_j . \tag{17}
\end{equation}
We use concomitant activation and variation as proxies for causation during unsupervised learning with Hebbian and differential Hebbian learning laws. Hebbian learning tends to learn spurious causal links between any two concept nodes that occur at the same time. This quickly grows an edge matrix of nearly all unity values if most of the nodes are active. DHL correlates node velocities. So it has a type of arrow of time built into it. DHL correlates the signs of the time derivatives. So it grows a positive causal edge value if and only if the concept nodes $C_i$ and $C_j$ both increase or both decrease. It grows a negative edge value if and only if one of the nodes increases and the other decreases.
Both learning laws combine to give a more general version of DHL \parencitekosko-hidden1988:
\begin{equation}
\dot{e}_{ij} = -e_{ij} + C_i C_j + \dot{C}_i \dot{C}_j . \tag{18}
\end{equation}
This hybrid learning law fills in expected values for edge-strength values when there is no signal variation in the factor set \parencitekosko1990unsupervised. The hybrid law takes advantage of the relatively rarer variation events to update the edge weights. It also tends to produce limit cycles or even more complex equilibrium attractors. It can produce fixed-point attractors given some strong mathematical assumptions \parencitekosko-hidden1988, kosko-nnfs.
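Most applications use discretized versions of the DHL law \parencitekosko-fuzeng in (17):
\begin{equation}
e_{ij}(t+1) = e_{ij}(t) + \mu_t \left[ \Delta C_i(t)\, \Delta C_j(t) - e_{ij}(t) \right] \tag{19}
\end{equation}
where $\Delta C_i(t) = C_i(t) - C_i(t-1)$ is the discrete change in the $i$th concept node and $\mu_t$ is a learning-rate coefficient.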
DHL can infer causal edge values in a FCM if the system has access to enough time-series data. Such data can again come from expert opinion surveys. It can come from direct time-series data on measurable factors. Or it can come from indirect instrumental variables linked to the factors of interest: social media trends, Google trends, or topic modeling on news corpuses. Figures 3 and 4 show DHL training paths for single causal edge values. Figure 3 learns a causal edge for the PSOT FCM in the next section. Figure 4 shows the DHL training of an edge value using Google Trends time-series data of the use of politically charged terms “Black lives matter,” “All lives matter,” and “Blue lives matter” in online discourse. DHL here gave a close approximation of the true causal edge values after only a few iterations.
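The contrast between Hebbian and differential Hebbian learning shows up even on toy time series. The sketch below assumes a simple discretization: each step moves the edge value toward the current product of node changes (DHL) or node values (Hebbian) with a fixed learning rate. The function names, the rate, and the series are hypothetical, and the paper's own discrete law may differ in detail.

```python
from typing import Sequence

def dhl_step(e_ij: float, d_cause: float, d_effect: float, rate: float) -> float:
    """Differential Hebbian step: correlate the changes in the two nodes."""
    return e_ij + rate * (d_cause * d_effect - e_ij)

def hebbian_step(e_ij: float, c_cause: float, c_effect: float, rate: float) -> float:
    """Hebbian step: correlate the activations of the two nodes."""
    return e_ij + rate * (c_cause * c_effect - e_ij)

def learn_edge(cause: Sequence[float], effect: Sequence[float],
               rate: float = 0.5, differential: bool = True) -> float:
    """Run one learning law over paired time series, starting from a zero edge."""
    e_ij = 0.0
    for t in range(1, len(cause)):
        if differential:
            e_ij = dhl_step(e_ij, cause[t] - cause[t - 1],
                            effect[t] - effect[t - 1], rate)
        else:
            e_ij = hebbian_step(e_ij, cause[t], effect[t], rate)
    return e_ij

up_down = [0, 1, 0, 1, 0, 1]
down_up = [1, 0, 1, 0, 1, 0]
always_on = [1, 1, 1, 1, 1, 1]

together = learn_edge(up_down, up_down)        # nodes move in the same direction
opposed = learn_edge(up_down, down_up)         # nodes move in opposite directions
idle_dhl = learn_edge(always_on, always_on)    # no variation at all
idle_hebb = learn_edge(always_on, always_on, differential=False)
```

The two always-on nodes never vary, so DHL leaves their edge at zero while Hebbian learning drives it toward one. This toy version compares changes at the same time step. It cannot tell the direction of causality apart without an added time lag between the cause series and the effect series.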
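We can also fuse soft and hard knowledge sources through the above averaging technique in (11). Let $E_D$ denote the data-driven FCM. Let $E_X$ denote the expert-elicited FCM. Then the fused causal edge matrix $E$ is a simple mixture of the two edge matrices with mixing weight $0 \leq \omega \leq 1$:
\begin{equation}
E = \omega E_D + (1 - \omega) E_X . \tag{20}
\end{equation}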
Then (19) or some other statistical learning law can continue the adaptation process by using new numerical data or occasional opinion updates from experts.
We can also learn edge values by taking a cue from the literature on Bayesian networks \parencitefriedman-koller2003. This entails putting a prior on a randomized FCM. Assume first that the FCM graph is random. Assume next a prior over the space of amenable graphs. Then use observed node data to update a posterior distribution of compatible FCM graphs. This Bayesian process requires taking care with the topology and size of the graph spaces. The process also requires that the user produce an accurate and tractable closed-form prior for the graphs. Fuzzy rules can directly represent these closed-form priors \parenciteosoba-mitaim-kosko-SMC2011.
Learning need not take place only in a stationary causal environment where the underlying causal relations do not change in time. Causal relations are apt to change in large-scale problems of social science. Figure 4 gives an example. Adaptive FCMs can still model these nonstationary causal worlds if the causal world does not change too fast and if the FCM learning system has access to enough time-series data that reflects these changes.
FCM Example: Public Support for Insurgency and Terrorism
Our first substantive FCM policy example applies to the problem of public support for insurgency and terrorism (PSOT). We based two PSOT FCMs on the factor-tree PSOT analysis of \parenciteaom-pkd2013, davis-larson2012. Public support for insurgency and terrorism has complex socio-political causes \parencitesnow-soule-kriesi2008, ibrahim2007, nawaz2015, davis-cragin2009, davis-larson2012 that involve numerous factors. Davis’s later work \parencitedavis-wsc2015, davis-rand2015 used the PSOT model to motivate related models of an individual propensity for terrorism.
The PSOT model is a causal factor-tree model because it depicts the degree to which child nodes influence or cause parent nodes. Figure 5 and Table 1 give more details of the PSOT factor tree. The PSOT nodes represent factors that directly or indirectly relate to the Public Support for Insurgency and Terrorism concept (PSOT).
Davis’s factor-tree models are multi-resolution models \parencitedavis-bigelow1998. Major elements have a hierarchical structure that allows users to specify factors at different levels of detail. Each node is an exogenously driven factor or it fires or activates based on a function of its inputs.
There are also cross-cutting factors besides sub-node factors. Cross-cutting factors affect multiple factors simultaneously. The “and” nodes depend on all fan-in factors being present to a first approximation. The “or” nodes depend on any of the fan-in factors being active or on a combination of the fan-in factors being active. There are several top-level factors that directly relate to the general PSOT concept of \parencitedavis-larson2012: effectiveness of the organization (EFF), motivation for supporting the group/cause (MOTV), the perceived legitimacy of violence (PLEG), and the acceptability of costs and risks (ACR). Each of these factors has attendant contributory sub-factors.
PSOT edges denote positive influences by default. We denote negative edges with ‘-’ as with a FCM causal-decrease edge. Factor activation along a negative edge reduces the activation of the parent factor. We denote ambiguous edges with “+/-”. The ambiguity refers to uncertainty over the edge’s direction of influence.
We based our FCM models on the important case of the al-Qa’ida transnational terrorist organization. We augmented the original PSOT with cross-links in the dynamic model to allow richer representation of system dynamics.
Davis et al. \parencitedavis-larson2012 have discussed how the PSOT model explains the public support for al-Qa’ida’s mission as follows (paraphrased from \parencitedavis-larson2012). The organizational effectiveness of al-Qa’ida depends in part on the charisma, strategic thinking, and organizational skills of its leadership (lead). al-Qa’ida has framed its ideology to appeal to many Muslims worldwide. Motivation for public support of al-Qa’ida’s beliefs comes from shared religious beliefs that stress common identity and the sense of duty that such identity fosters. al-Qa’ida also relies on a popular narrative of shared grievances (shgr) in the Muslim world. al-Qa’ida stresses the perceived glory of supporting a cause that aims to redress these purported grievances. Religious beliefs and intolerance (intl) help increase the perceived legitimacy (PLEG) of violence against the West and against the many Muslims who do not share their Salafist views. Countervailing pressure (scst) discourages more support for al-Qa’ida. This countervailing pressure occurs in part because much of the public believes that al-Qa’ida will not succeed and thus will not emerge as the ultimate victor (lvic). This pressure reduces the acceptability of costs and risks (ACR) for al-Qa’ida activities. The parameters of this al-Qa’ida case study determined the relative causal edge weights in our FCM models.
Table 1: Concept nodes of the PSOT factor tree.
|Label|Concept node|
|---|---|
|lead|Leadership (Strategic or otherwise)|
|pkg|Ideological Package, Framing|
|pres|Presence, Tactics, Deeds|
|EFF|Effectiveness of Organization|
|reli|Ideological, Religious Concepts|
|MOTV|Motivation for Supporting Group, Cause|
|intl|Religious, Ideological, Ethical Beliefs; Intolerance|
|cprop|Cultural Propensity for Accepting Violence|
|PLEG|Perceived Legitimacy of Violence|
|lvic|Assessment of Likely Victor|
|prsk|Personal Risk and Opportunity Cost|
|scst|Countervailing Social Costs, Pressures|
|ACR|Acceptability of Costs and Risks|
|shgr|Shared Grievances, Aspirations|
|ugb|Unacceptable Group Behavior|
|impl|Impulses, Emotions, Social Psychology|
|hsucc|History of Successes|
|efdoc|Effectiveness of Indoctrination/Passing Beliefs|
|hfail|History of Failures|
|PSOT*|Public Support for Insurgency and Terrorism|
Figure 6 shows FCM versions of the old static (acyclic) PSOT model and new dynamic PSOT model.
We now outline these changes to the original PSOT model. We first added a weak self-excitation feedback loop to the PSOT concept node because it is the highest-level concept node. This self-excitation loop modeled inertia in aggregate public opinion about insurgency and terrorism. This new feedback source induced a weak serial correlation in time in the PSOT concept node.
The next directed edges connected the top-level factors in \parenciteaom-pkd2013 from left to right: EFF $\to$ MOTV, MOTV $\to$ PLEG, and PLEG $\to$ ACR. These directed causal edges made explicit an implicit point about O’Mahony and Davis’s use of factor trees. Their factor-tree representation assumed a left-to-right dependence of the top-level factors that we have linked \parencitedavis2011,aom-pkd2013. This implicit dependence made their factor tree more readable. The FCM model made this dependence explicit.
O’Mahony and Davis \parenciteaom-pkd2013 discuss other dynamic augmentations to the PSOT model. They point to the following new factors. A history of successes or failures can affect motivation and perceived risks. We model this dependence with the two factors “history of successes” (hsucc) and “history of failures” (hfail). These two nodes exert opposing influences on motivation and on perceived risks. We split this history factor because traditional FCM models admit only nonnegative values that represent the degree or intensity to which a concept occurs. The effectiveness of the organization factor EFF partly determines the history of successes: EFF $\to$ hsucc. Unacceptable group behavior (ugb) also influences motivation and effectiveness.
US-China Relations: A FCM of Allison’s Thucydides Trap
We next use a FCM to model a new conflict dynamic in international relations.
Political scientist Graham Allison calls this dynamic the Thucydides trap \parenciteallison2015, allison2017. Allison argues that this dynamic occurs when a new power emerges that challenges the dominance of an older power on the world stage. Superpowers such as the United States and China must avoid the Thucydides trap to avoid war.
Our FCM interpretation of Allison’s analysis predicts some type of war pattern in some cases and not in others. A large percentage (most) of the clamped input states led to a war-type outcome. But this is not a probability estimate. It reflects an exhaustive search of all possible clamped input states. It does not reflect the relative likelihood of the clamped input states themselves.
We based the fractional causal edge values for this FCM on Allison’s text. See the tables of textual justifications below. We also tested the robustness of this properly fuzzy FCM by thresholding all positive edge values to $1$ and all negative edge values to $-1$. This gave a trivalent FCM that predicted some type of war for the majority of all clamped input states. We stress again that the prevalence of a war outcome in this model does not mean that the FCM predicts war with high probability. That would require that all input states are equally likely and they clearly are not. We did not address the issue of which inputs are more or less likely to occur. Our task was to translate Allison’s textual claims into a representative FCM causal model and explore its pattern predictions.
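The exhaustive scenario search and the trivalent robustness check can be sketched on a toy model. Everything in the sketch is an assumption made for illustration: the four-node edge matrix, the choice of input nodes, the rule that unclamped nodes start off, and the rule that a scenario counts as a hit if the target node is on in any state of the equilibrium limit cycle. None of it reproduces the Thucydides-trap FCM.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

def sign_threshold(edges: Sequence[Sequence[float]]) -> List[List[int]]:
    """Map positive edges to 1 and negative edges to -1. Zeros stay zero."""
    return [[(e > 0) - (e < 0) for e in row] for row in edges]

def equilibrium_cycle(edges: Sequence[Sequence[float]],
                      clamp: Dict[int, int]) -> List[Tuple[int, ...]]:
    """Limit cycle of a binary FCM whose clamped nodes hold fixed values."""
    n = len(edges)
    state = tuple(clamp.get(j, 0) for j in range(n))   # unclamped nodes start off
    seen, path = {}, []
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        nxt = [1 if sum(state[i] * edges[i][j] for i in range(n)) > 0 else 0
               for j in range(n)]
        for j, value in clamp.items():
            nxt[j] = value
        state = tuple(nxt)
    return path[seen[state]:]

def outcome_share(edges: Sequence[Sequence[float]],
                  input_nodes: Sequence[int], target: int) -> float:
    """Fraction of all clamped on-off input scenarios that switch the target on."""
    scenarios = list(product((0, 1), repeat=len(input_nodes)))
    hits = 0
    for bits in scenarios:
        cycle = equilibrium_cycle(edges, dict(zip(input_nodes, bits)))
        hits += any(state[target] for state in cycle)
    return hits / len(scenarios)

# HYPOTHETICAL toy FCM. Nodes: 0 arms buildup, 1 diplomacy, 2 fear, 3 conflict.
E = [[0, 0, 0.7, 0],
     [0, 0, -0.4, -0.6],
     [0, 0, 0, 0.9],
     [0, 0, 0.5, 0]]

fuzzy_share = outcome_share(E, input_nodes=[0, 1], target=3)
trivalent_share = outcome_share(sign_threshold(E), input_nodes=[0, 1], target=3)
```

The share counts scenarios and weights each one equally, so it is not a probability of conflict. The toy model also shows why the robustness check matters: thresholding changes the outcome of the scenario where both inputs are on, because the fractional edges let arms buildup outweigh diplomacy while the trivalent edges make them cancel.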
The name “Thucydides trap” stems from a famous political conjecture in Thucydides’ History of the Peloponnesian War \parencitethucydides-jowett (Book 1, paragraph 23): “the real though unavowed cause [of the war] I believe to have been the growth of Athenian power, which terrified the [Spartans] and forced them into war.” Thucydides expands on his causal theory of war in a speech that an Athenian gives to the Spartan assembly \parencitethucydides-jowett (Book 1, paragraph 76):
So that, though overcome by three of the greatest things, honor, fear, and profit, we have both accepted the dominion delivered us and refuse again to surrender it, we have therein done nothing to be wondered at nor beside the manner of men. Nor have we been the first in this kind, but it hath been ever a thing fixed for the weaker to be kept under by the stronger.
Thucydides claimed that three main factors determine how nation-states interact: interest, fear, and honor. The interest or profit factor just restates a nation’s self-interested actions. Nation-states act against other states to maintain their high-priority national interests within the geographic scope of their power. These interests include national security, economic security, and sovereignty. Fear refers to the emotionally charged frames through which a nation views world events. Honor refers to the nation’s senses of self and entitlement. Examples include the nineteenth-century US’s manifest destiny or China’s older concept of Tianxia or “all under heaven.”
Allison expands on these factors in his Thucydides-trap model where again the rise of a new power risks war with a dominant power. He argues that fear is the main cause of war between such a dominant power and a new rising power. He looked at such historical power struggles that extend back to the century. He found that of these power struggles ended in war. Allison also contends that similar structural dynamics apply elsewhere in international relations.
We parsed Allison’s analysis \parenciteallison2017 to create a FCM of the Thucydides trap for current US-China relations. The FCM follows Thucydides and uses his three main factors of interest, fear, and honor. Auxiliary factors give context to the main factors. The resulting Thucydides-trap FCM has factors. Table 2 lists and describes these factors.
| Concept node | Description |
|---|---|
| usd | US Military/Defense Posture |
| chnd | China Military/Defense Posture |
| ENT | Sense of Entitlement/Honor |
| uspub | US Public Resentment |
| chnpub | Chinese Public Resentment |
| dipl | Diplomacy Channels & International Rules |
| INT | National Interests Clash |
| usecon | US Economic Dominance |
| chnecon | China Economic Dominance |
| ally | Alliance Network Structural Friction |
| shi | ‘Shi’ or Contextual/Historical Military Momentum |
| WAR* | War, Military Conflict between USA and China |
The Thucydides-trap FCM also uses some of the auxiliary concepts that Allison discussed. One example is how nuclear weapons affect the chance of all-out war. Diplomatic institutions and economic dependencies also affect the chance of war. Treaty and alliance obligations can rapidly induce or expand war as happened in both World Wars. The FCM in Figure 7 shows the directed causal edges among the concepts. Figure 8 shows the Thucydides-trap FCM’s causal edge matrix.
We surmised the causal edge strengths based on Allison’s discussions. The final tables below show the textual justifications for the translation into these edge values. Below we present the results of thresholding the magnitudes to their binary extremes.
The Thucydides-trap FCM predicted war-type patterns between the US and China more often than it predicted peace-type patterns. An exhaustive search of the space of possible (clamped) scenarios found that only under of scenarios led to lasting peace between the dominant power (US) and the rising power (China). We point out again that these are not representative probabilities because we did not know or estimate the relative probabilities of the input states. We simply assumed that they were all equally likely. The FCM rapidly converged to an equilibrium state where WAR* was active when the input consisted of US-specific nodes that were stagnant and China-specific nodes that were rising. The key factors present in peaceful accommodations were significant geographic distance, mutual assured destruction (via nuclear weapons posture), a shared culture, economic interdependence, and the presence of diplomatic channels.
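The exhaustive sweep over clamped input states can be sketched in a few lines. This is a minimal sketch under stated assumptions: the four-node edge matrix `E`, the node names, the binary threshold at zero, and the rule that every node switched on in the input stays clamped on are all hypothetical choices for illustration. They are not the Thucydides-trap FCM or its reported results.

```python
from itertools import product

import numpy as np

# Hypothetical 4-node toy FCM (NOT the paper's edge matrix).
NODES = ["fear", "dipl", "nuke", "war"]
# E[i, j] is the causal edge value from node i to node j.
E = np.array([
    [0.0,  0.0, 0.0,  0.8],   # fear increases war
    [-0.5, 0.0, 0.0,  0.0],   # dipl decreases fear
    [0.0,  0.0, 0.0, -0.9],   # nuke decreases war
    [0.6,  0.0, 0.0,  0.5],   # war increases fear and sustains itself
])


def step(state, clamp_mask):
    """One synchronous update with a binary threshold at zero."""
    nxt = (state @ E > 0).astype(float)
    nxt[clamp_mask] = 1.0          # assumption: clamped inputs stay on
    return nxt


def run_to_attractor(start):
    """Iterate until a state repeats; return the repeating states."""
    clamp_mask = start > 0
    seen = []
    state = start.copy()
    while tuple(state) not in seen:   # finite state space, so this terminates
        seen.append(tuple(state))
        state = step(state, clamp_mask)
    return seen[seen.index(tuple(state)):]   # length 1 means a fixed point


def war_fraction():
    """Fraction of all binary input states whose attractor activates war."""
    war = NODES.index("war")
    hits = []
    for bits in product([0.0, 1.0], repeat=len(NODES)):
        attractor = run_to_attractor(np.array(bits))
        hits.append(any(s[war] > 0 for s in attractor))
    return sum(hits) / len(hits)
```

The returned fraction counts input states and so treats them as equally likely. It is a pattern count over the input space and not a probability estimate.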
We simulated the FCM dynamical evolution from trap-like initial conditions (see Figure 9 for initial states and evolution trace). The test scenario consisted of six causal relations:
The US maintains a strong military or defensive posture.
China is economically rising or already dominant.
The US public has high resentment towards China.
Both sides are economically interdependent.
Both sides have enough nuclear capability to pose credible threats to each other (sufficient for deterrence).
Strong diplomatic channels exist between both sides.
We coded this initial scenario for the FCM concept nodes. Forward inference gave the sequence of states in Figure 9.
The FCM converged in iterations to a fixed-point equilibrium state with the WAR* node active. The FCM’s state evolution showed that China’s economic dominance led to a clash in national interests while the US’s defensive posture led to fear. This led China to ramp up its defensive posture. The US public’s resentment towards China led to a sense of entitlement. That led in turn to Chinese public resentment. The clash in national interests, fear, and sense of entitlement or national honor combined to activate the WAR* node. This FCM behavior is similar to the Thucydides-trap dynamics that Allison described.
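A single forward-inference run of this kind can be traced step by step. The sketch below is illustrative only: the five-node edge matrix, its edge values, and the 0.5 firing threshold are assumptions that borrow a few node names from Table 2. It does not reproduce the edge matrix of Figure 8 or the trace of Figure 9.

```python
import numpy as np

# Hypothetical 5-node fragment (assumed edge values, not the paper's).
NODES = ["usd", "chnecon", "FEAR", "INT", "WAR"]
E = np.zeros((5, 5))
E[0, 2] = 0.7   # usd -> FEAR
E[1, 3] = 0.8   # chnecon -> INT
E[2, 4] = 0.4   # FEAR -> WAR
E[3, 4] = 0.4   # INT -> WAR
E[4, 4] = 0.5   # WAR -> WAR (continuity)


def forward_inference(start, clamped, threshold=0.5, max_steps=100):
    """Return the list of states visited until a fixed point is reached."""
    state = np.asarray(start, dtype=float)
    trace = [state.copy()]
    for _ in range(max_steps):
        nxt = (state @ E > threshold).astype(float)
        nxt[clamped] = state[clamped]      # hold clamped nodes at their input values
        if np.array_equal(nxt, state):     # fixed point: the state no longer changes
            return trace
        trace.append(nxt)
        state = nxt
    raise RuntimeError("no fixed point within max_steps (possible limit cycle)")


# Scenario: strong US defense posture and rising Chinese economy, both clamped.
trace = forward_inference([1, 1, 0, 0, 0], clamped=[0, 1])
for t, s in enumerate(trace):
    print(t, dict(zip(NODES, s.astype(int))))
```

In this toy model WAR needs both FEAR and INT to fire, so switching off the clamped `usd` input leaves WAR inactive at equilibrium. That behavior is a property of the assumed edge values only.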
The FCM’s war prediction was robust against many perturbations of the input state. It persisted despite changes in the activation of nodes like diplomacy and geographic distance. But activating the Shared-Culture concept node did prevent war. The FCM also fell out of the war equilibrium when we shut off either the concept node for US Defense Posture or for Chinese Economic Dominance. These peaceful equilibrium outcomes also appear consistent with Allison’s analysis. Figure 10 (left panel) shows the average concept-node activations for initial scenarios that led to peaceful outcomes.
Other analysts may well surmise different fuzzy causal edge values given the same cited text in the tables. We would expect more agreement on the signs of these edges. So we tested whether a thresholded version of our properly fuzzy Thucydides-trap FCM made similar equilibrium predictions. We formed this trivalent Thucydides-trap FCM by replacing all positive edges with +1 and all negative edges with -1. Zero-valued edges stayed the same. The trivalent FCM still predicted war-like patterns for most clamped input states. It predicted peace for only of all input scenarios compared with for the original fuzzy Thucydides-trap FCM. We point out again that we treated all input states as equally likely. Real-world conflict scenarios are not equally likely. The right panel of Figure 10 shows the similar average concept-node activations for the input scenarios that resolved peacefully in the trivalent FCM. This counts as evidence that the properly fuzzy Thucydides-trap FCM was reasonably robust to perturbations in the causal edge-value magnitudes.
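The thresholding step amounts to taking the sign of each edge. The sketch below assumes the trivalent values are +1, 0, and -1 and uses a made-up 3-by-3 fuzzy edge matrix for illustration.

```python
import numpy as np


def to_trivalent(edge_matrix):
    """Map positive edges to +1, negative edges to -1, and keep zeros at 0."""
    return np.sign(np.asarray(edge_matrix, dtype=float))


# Hypothetical fuzzy edge matrix (not the paper's values).
E_fuzzy = np.array([
    [0.0,  0.7, -0.2],
    [0.0,  0.0,  0.9],
    [-0.4, 0.0,  0.5],
])
E_tri = to_trivalent(E_fuzzy)
print(E_tri)
```

Running the same exhaustive input sweep on both matrices and comparing the equilibria then tests how much the predictions depend on edge magnitudes rather than edge signs.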
Analysts may also disagree on structure and not just numerical edge values. Other researchers may surmise different causal links (edges) or different relevant concept nodes. This may depend on an alternative reading of the source text or different domain expertise. Such disagreements reflect the value of FCM modeling because different FCM models can capture these expert differences and then combine them if needed into a summary FCM.
The US-China FCM shows just one way that FCMs can represent this complex pattern of international dynamics. Domain experts can instantiate concurring or dissenting causal maps. Experts and critics alike can then compare or contrast the different theories in this social-scientific domain. Such comparisons extend beyond just comparisons of the basic representation of different social-scientific theories. Analysts can also compare the long-term implications of the different theories with more quantitative rigor. They can compare the equilibrium “hidden patterns” in the FCMs’ temporal dynamics.
Fuzzy cognitive maps offer a flexible way to model large-scale feedback causal systems and to make forward inferences. Their cyclic structure produces a nonlinear dynamical system that tends to quickly equilibrate to limit-cycle predictions given a causal input or stimulus. Users can also step through FCM transient or equilibrium states and thereby unfold the dynamical system in time. The underlying matrix structure of a FCM’s directed causal edges permits natural knowledge fusion through simply adding or mixing the augmented FCM causal-edge matrices for any number of experts. Such knowledge combination tends to improve with the number of combined experts.
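The fusion step can be sketched as follows. We assume here that each expert supplies a node list and an edge matrix, that augmentation pads each matrix with zero rows and columns for the nodes that the expert omitted, and that the weights are normalized to sum to one so that fused edges stay in [-1, 1]. The function names and the two expert maps are hypothetical.

```python
import numpy as np


def augment(nodes, E, all_nodes):
    """Embed an expert's edge matrix into the union node set (zeros elsewhere)."""
    idx = [all_nodes.index(n) for n in nodes]
    big = np.zeros((len(all_nodes), len(all_nodes)))
    big[np.ix_(idx, idx)] = E
    return big


def combine(experts, weights=None):
    """Weighted sum of augmented edge matrices; experts is a list of (nodes, E)."""
    all_nodes = sorted({n for nodes, _ in experts for n in nodes})
    w = np.ones(len(experts)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                       # assumption: convex combination
    fused = np.zeros((len(all_nodes), len(all_nodes)))
    for wk, (nodes, E) in zip(w, experts):
        fused += wk * augment(nodes, np.asarray(E, dtype=float), all_nodes)
    return all_nodes, fused


# Two acyclic expert maps: A says fear -> war, B says war -> fear and dipl -| fear.
expert_a = (["fear", "war"], [[0.0, 0.8], [0.0, 0.0]])
expert_b = (["dipl", "fear", "war"],
            [[0.0, -0.6, 0.0], [0.0, 0.0, 0.0], [0.0, 0.5, 0.0]])
nodes, fused = combine([expert_a, expert_b])
print(nodes)
print(fused)
```

Each expert map in this example is acyclic but the fused matrix contains the cycle between fear and war. The fused result is still a valid FCM edge matrix and can itself be combined with further experts.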
Directed acyclic graphs lack these features precisely because they are acyclic. Their use of probability to describe causal uncertainty is secondary to their lack of cycles in terms of modeling feedback. Cycles describe feedback in directed graphs. And combining directed graphs will in general produce several such cycles. The vector-matrix operations of FCMs also involve much less computation than the probabilistic computations in Bayesian belief networks. But FCMs cannot produce the precise probability descriptions that DAGs can if the user knows the DAG’s corresponding complete joint probability density function and uses the sum-product algorithm. Imposing this or a related probability structure on a FCM is an area for future research.
Current FCM inference and learning have two key limitations that future research also needs to address. The first is that FCMs do not easily permit backward chaining. So they do not in general answer which input caused an observed output effect. Users cannot simply run the FCM in reverse because of the node nonlinearities. We instead must exhaustively test all or nearly all input states to see which inputs map to which output equilibria. This computes the inverse image of each output attractor basin. It carves up the FCM state-space into attractor regions. We know for a given output only that the input came from an attractor region. Future research should address this limitation with new inferencing or other techniques.
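The exhaustive inverse-image computation can be sketched for a small binary FCM. The three-node edge matrix, the zero threshold, and the absence of clamping are assumptions made for this illustration. The sketch groups every initial state by the attractor (fixed point or limit cycle) that it reaches.

```python
from collections import defaultdict
from itertools import product

import numpy as np

# Hypothetical 3-node chain a -> b -> c with a self-sustaining node c.
E = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
])


def attractor_of(start, E):
    """Follow the binary dynamics until a state repeats; return the cycle as a set."""
    seen = []
    state = tuple(start)
    while state not in seen:          # finite state space, so this terminates
        seen.append(state)
        state = tuple(int(v) for v in (np.array(state) @ E > 0))
    return frozenset(seen[seen.index(state):])


def basins(E):
    """Map each attractor to the list of initial states that flow into it."""
    inverse_image = defaultdict(list)
    for start in product([0, 1], repeat=E.shape[0]):
        inverse_image[attractor_of(start, E)].append(start)
    return dict(inverse_image)


for attractor, starts in basins(E).items():
    print(sorted(attractor), "<-", starts)
```

Observing the equilibrium with only node c active tells us that the input was one of seven states but not which one. The cost of this search also grows as 2^n in the number n of binary nodes, which is why it does not scale to large FCMs.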
The second limitation is more challenging: How do we infer missing FCM concept nodes? This just asks how we come up with a new causal hypothesis. A new node leads to new causal conjectures for all nonzero edges that connect to the new node. Current adaptive techniques infer and tune the causal edge values only for known concept nodes. An open research problem is to find data-based techniques that infer new or missing concept nodes in large-scale FCM causal models. Solutions may include Bayesian priors or rules over node sets or other statistical techniques for model building.
Osonde A. Osoba, Ph.D. (firstname.lastname@example.org) is a researcher at the RAND Corporation and a professor at the Pardee RAND Graduate School, Santa Monica, CA, USA. He received his Ph.D. in Electrical Engineering from the University of Southern California.
Bart Kosko, Ph.D., J.D. (email@example.com) is a professor of Electrical and Computer Engineering and Law at the University of Southern California in Los Angeles.
Mathematical Appendix: FCM Causal Influence Theorems
This appendix states and proves the two main theorems on downstream causal influence in a FCM. Both theorems apply to any two concept nodes in a FCM. Theorem 1 shows the transitive effect that upstream concept node causes on downstream concept node along the lone directed causal path with pairwise directed causal edge values . Theorem 2 extends this result to the total downstream effect along all acyclic paths from to .
We show next how causal influence propagates through a FCM digraph with continuous or smooth concept nodes. The results still apply to binary or threshold concept nodes if the nodes use a steep logistic or other differentiable sigmoid function to define a soft threshold. This analysis results in two theorems. The second theorem extends the first. This appendix gives the mathematical details and proofs of the theorems.
The first result describes downstream causal influence for just one causal path from concept node to . A FCM may contain many other directed causal paths from to . So the total causal change invokes the more general chain rule that sums over all the partial derivatives in (31) in all the paths involved. The second theorem in the Appendix states this total-causal result. Discrete versions of the theorems also hold. They require that one keep track of the discrete time steps as the causal activation flows from one node in a directed path to the next node.
FCM causal influence is a form of spreading activation in a semantic network. The spreading activation resembles the activation dynamics of asynchronous feedback neural networks. These networks differ in kind from the popular feedforward neural networks often found in “deep” neural classifiers. The spreading activation also roughly resembles the routed messages in message-passing probabilistic inference on DAGs. Such messages propagate “evidence” from observed evidence nodes to a set of output nodes of interest. Messages encode and quantify how much the state of a node affects beliefs about the state of another node. Forward-inference on DAGs depends crucially on proper routing of these messages on the causal digraph. Interlocking nonlinear differential or difference equations describe FCM causal influence. The analogy with message passing is more accurate when we can step through or otherwise unfold these FCM dynamical systems in time.
The FCM causal influence of node on describes how much the state of affects the state of . The theorems that follow formalize our intuitions on causal influence and generalize previous theorems on the propagation of causal influence \parenciteosoba-kosko2017. They show that causal influence is transitive and easy to track on loopless paths inside a FCM. The influence also varies directly with the number of causal paths that connect two nodes. The presence of cycles and loops can induce causal influences that are non-local in time. The theorems apply directly only to acyclic paths even though users can again step through the causal links in discretized time in an arbitrary path. The fuzzy or non-binary nature of most edge values can lead to rapid influence die-out because of the product nature of the causal influence. Feedback loops can reverse such influence die-out even in small-scale FCMs. The equilibrium result can be a periodic or aperiodic attractor. It can be a “hidden pattern” or forward prediction in the FCM causal tangle.
We show now how FCM nodes influence one another through a weighted product of intervening causal edge strengths . These results describe forward causal chaining along a directed path or summed over all such directed paths that connect two concept nodes. The results all involve the transitive causal product .
Consider first the directed causal path from concept node to node by way of the intervening node :
Then how does a change in the input node causally affect the downstream node ? The chain rule of differential calculus gives a transitive-based product answer for the logistic concept node activation in (3):
So the induced causal effect of a change in depends directly on the transitive causal-edge product . This causal influence decays in intensity the lesser or fires or occurs. The edge product is negative if exactly one of the edge values is negative. It is positive otherwise.
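A worked version of this two-edge chain-rule step may help. The notation here is an assumption and may differ from the numbered equations of the paper: $C_i \to C_k \to C_j$ is the path, $e_{ik}$ and $e_{kj}$ are its edge values, $x_j = \sum_m e_{mj} C_m$ is the net input to node $C_j$, and $C_j = (1 + e^{-c x_j})^{-1}$ is a logistic activation with gain $c > 0$.

```latex
\begin{align}
\frac{\partial C_j}{\partial C_i}
  &= \frac{\partial C_j}{\partial x_j}\,
     \frac{\partial x_j}{\partial C_k}\,
     \frac{\partial C_k}{\partial x_k}\,
     \frac{\partial x_k}{\partial C_i} \\
  &= \bigl[c\,C_j(1 - C_j)\bigr]\, e_{kj}\,
     \bigl[c\,C_k(1 - C_k)\bigr]\, e_{ik} \\
  &= c^{2}\, C_j(1 - C_j)\, C_k(1 - C_k)\; e_{ik}\, e_{kj}.
\end{align}
```

The factor $c^{2} C_j(1 - C_j) C_k(1 - C_k)$ is nonnegative under these assumptions, so the sign of the influence is the sign of the edge product $e_{ik} e_{kj}$. Each term $C(1 - C)$ is at most $1/4$.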
The causal-influence result (25) extends directly to longer causal chains. Suppose there is a directed causal path of length from the initial concept node to the final node :
where now the nonnegative weighting function is the double product . The edge product is positive if the number of negative edges is even. It is negative if the number of negative edges is odd. The magnitude of the change can only decrease as the causal chain lengthens. The fuzziness or partial firing of the concept nodes only exacerbates this monotone causal decay.
The causal influence in (27) still holds if we replace the logistic activation function (1) of concept node with an arbitrary monotonically nondecreasing function . Then and so because the weighting function is just the product of these activation partial derivatives. We can now state and prove Theorems 1 and 2 on causal influence in FCMs. Theorem 1 summarizes and extends the above argument for an arbitrary single directed path in a FCM. Theorem 2 further extends the argument to summing over all acyclic paths between two such concept nodes.
Theorem 1 (Partial Causal Influence in Fuzzy Cognitive Maps). Suppose that a fuzzy cognitive map has concept nodes and directed causal edges . Suppose further that the concept nodes have monotonically nondecreasing activations: where the argument of has the same inner-product form as in (1). Then the causal influence of the concept node on the downstream node of the length- directed causal chain
is a nonnegatively weighted product of the intervening causal edge strengths :
where the weighting function has the form
The result follows from iterated applications of the chain rule:
Theorem 1 implies that the sign of the edge product depends on only the number of negative edges. The edge product is positive if the number of negative edges is even. It is negative if the number of negative edges is odd. The magnitude of the change can only decrease as the causal chain lengthens. The fuzziness or partial firing of the concept nodes only exacerbates this monotone causal decay. The result is often fairly rapid die-out of the causal influence in practice.
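A quick numerical check can make the single-path result concrete. The sketch below assumes a simple chain in which each node receives input only from its predecessor, a logistic activation with unit gain, and made-up edge values. It compares the product formula (edge product times the product of logistic slopes) with a central finite difference.

```python
import math

GAIN = 1.0  # assumed logistic gain


def logistic(x):
    return 1.0 / (1.0 + math.exp(-GAIN * x))


def chain_output(c_in, edges):
    """Activation of the last node in a chain driven by the input node value c_in."""
    c = c_in
    for e in edges:
        c = logistic(e * c)
    return c


def path_influence(c_in, edges):
    """Chain-rule product: each hop contributes edge * logistic slope at that node."""
    influence = 1.0
    c = c_in
    for e in edges:
        c = logistic(e * c)
        influence *= e * GAIN * c * (1.0 - c)
    return influence


def finite_difference(c_in, edges, h=1e-6):
    """Central-difference estimate of d(output)/d(input)."""
    return (chain_output(c_in + h, edges) - chain_output(c_in - h, edges)) / (2 * h)


edges = [0.8, -0.5, 0.9]          # hypothetical edge values with one negative edge
analytic = path_influence(0.6, edges)
numeric = finite_difference(0.6, edges)
print(analytic, numeric)
```

The single negative edge makes the influence negative. Extending the chain by one hop shrinks the magnitude here because each hop multiplies by an edge of magnitude at most one and a logistic slope of at most one quarter at unit gain.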
A FCM often has more than one directed causal path from to because a FCM is not a tree in general. Then the total causal change sums the single-path or partial causal influences (31) over all paths from to . Let denote a directed fuzzy causal path from to :
Let denote the set of node subscripts in the directed path:
We can restate the causal influence along the lone path in Theorem 1 as
where denotes the number of nodes in the path. Then we can likewise state the total causal influence of on as the sum of all single-path causal influences over all such paths from to :
Taking this path sum gives Theorem 2.
Theorem 2 (Total Causal Influence Over Acyclic FCM Paths). Suppose an -node FCM satisfies the hypothesis of Theorem 1. Suppose that there are no cycles in any of the directed paths from concept node to . Then the total causal influence of concept node on is
where is the complete set of directed paths from to .
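The path-sum form of the total influence can also be checked numerically. The sketch below assumes a four-node acyclic "diamond" FCM with made-up edge values, a unit-gain logistic activation, and nodes listed in topological order. It enumerates all directed paths from node 0 to node 3, sums the single-path influences, and compares the sum with a central finite difference.

```python
import numpy as np

# Hypothetical diamond: 0 -> 1 -> 3 and 0 -> 2 -> 3 (E[i, j] is the edge i -> j).
E = np.zeros((4, 4))
E[0, 1], E[0, 2] = 0.7, -0.6
E[1, 3], E[2, 3] = 0.9, 0.5


def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(c0):
    """Activations of the acyclic FCM given the value c0 of source node 0."""
    c = np.zeros(4)
    c[0] = c0
    for j in range(1, 4):                  # nodes are in topological order
        c[j] = logistic(c[:j] @ E[:j, j])
    return c


def all_paths(src, dst, path=None):
    """Yield every directed path from src to dst that revisits no node."""
    path = [src] if path is None else path
    if src == dst:
        yield path
        return
    for nxt in np.flatnonzero(E[src]):
        if int(nxt) not in path:
            yield from all_paths(int(nxt), dst, path + [int(nxt)])


def total_influence(c0, src=0, dst=3):
    """Sum over paths of (edge product) * (product of logistic slopes on the path)."""
    c = forward(c0)
    total = 0.0
    for path in all_paths(src, dst):
        term = 1.0
        for i, j in zip(path, path[1:]):
            term *= E[i, j] * c[j] * (1.0 - c[j])
        total += term
    return total


h = 1e-6
numeric = (forward(0.6 + h)[3] - forward(0.6 - h)[3]) / (2 * h)
analytic = total_influence(0.6)
print(analytic, numeric)
```

The two paths carry opposite signs in this example, so they partly cancel in the total. The check applies only to acyclic paths, as in the hypothesis of the theorem.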
Appendix: Textual Justifications for Thucydides Trap FCM
| Causal edge | Sign | Textual justification |
|---|---|---|
| FEAR/INT/ENT → WAR* | +ve | Pg. 39: “Thucydides identifies three primary drivers fueling this dynamic that lead to war: interests, fear, and honor.” “[Interests refer to the] survival of the state and its sovereignty in making decisions in its domain free from coercion…” “[…] ruling powers’ fears often fuel misperceptions and exaggerate dangers.” “[…] Thucydides’s concept of honor encompasses what we now think of as a state’s sense of itself, its convictions about the recognition and respect it is due, and its pride. As Athens’ power grew over the fifth century, so too did its sense of entitlement.” |
| ShrdCult → WAR* | -ve | Pg. 200: “Cultural commonalities can help prevent conflict” Pg. 136: “… the fundamental source of conflict in the post-Cold War world would [be] cultural.” (quoting Huntington) |
| ally → WAR* | +ve | Pg. 211–212: “Alliances can be a fatal attraction. […] coalitions have sought to create a balance of power to maintain regional peace and security. But such alliances also create risks – since alliance ties run in both directions.” Pg. 57–58: “Britain had no vital national interests at stake in the Balkans. Nevertheless, it was pulled into the fire, partly because of entangling alignments…” |
| NUKE → WAR* | -ve | Pg. 206–210: “nuclear weapons have no precedent.” “Under [conditions of mutually assured destruction], one state’s decision to kill another is simultaneously a choice to commit national suicide.” “China has also developed a nuclear arsenal so robust that it creates a […] version of MAD with the United States.” |
| shi → WAR* | +ve | Pg. 148: “… the evolving context in which a strategic situation occurs is critical, because it determines the shi of that situation. [This] most closely describes the momentum inherent in any given situation…” Pg. 175: “America’s history of [recent] military interventions would play an outsized role in shaping Washington’s response.” Pg. 215: “Bismarck […] described statecraft as essentially listening for the footsteps of God and then grabbing the hem of His garment as He goes by.” |
| econdep → usecon/chnecon | -ve | Pg. 194: “States can be embedded in larger economic […] institutions that constrain historically normal [conflict behaviors].” Pg. 210: “Thick economic interdependence raises the cost and thus lowers the likelihood of war.” |
| geod → FEAR/WAR* | -ve | Pg. 109: “‘Making China Great Again’ means […] reestablishing control over the territories of ‘Greater China’.” Pg. 161: “…there is currently little chance of an accidental collision between US and Chinese ships. […] the ‘tyranny of distance’ raises questions about America’s ability to sustain a campaign against China in [the East and South China Seas].” |
| dipl → ENT/INT | -ve | Pg. 190: “Higher authorities can help resolve rivalry without war. […] to the extent that states can be persuaded to defer to the constraints and decisions of supranational authorities or legal frameworks, […] these factors can play significant roles in managing conflicts that would otherwise end in war.” Pg. 198: “Wily statesmen make a virtue of necessity and distinguish between needs and wants.” |
| WAR* → WAR* | +ve | Continuity constraint: states at war are likely to remain in conflict barring military resolution or exogenous shock. |
| chnecon/usecon → INT | +ve | Pg. 47: “The contest between a rising and ruling power often intensifies competition over scarce resources. When an expanding economy compels the first to reach further afield to secure essential commodities […] the competition can become a resource scramble.” |
| chnpub/uspub → ENT | +ve | Pg. 37: “Backing down was a non-starter […] the Athenian populace was unwilling to bow to Spartan demands…” Pg. 120: “…‘winning or losing public support is an issue that concerns the [Chinese Communist Party]’s survival or extinction.’” (quoting Xi Jinping) Pg. 171: “the White House Situation Room cannot back down: video of the ship’s wreckage and stranded US sailors on cable news and social media has made that impossible” (part of a hypothetical US-China escalation scenario) Pg. 171: “As millions of its citizens’ social media postings are reminding the government, after its century of humiliation at the hands of sovereign powers, the ruling Communist Party has promised: ‘never again.’” (in the same hypothetical US-China escalation scenario) |
| chnd/usd → FEAR | +ve | Pg. 54: “Because states can never be certain about each other’s intent, they focus instead on [military] capabilities. Defensive actions by one power can often seem threatening to its opponent…” |
| ShrdCult → econdep | +ve | \parencite huntington1993 Pg. 34: “…cultural difference exacerbates economic conflict. […] the antipathies are not racial but cultural. The basic values, attitudes, behavioral patterns of the two societies could hardly be more different. The economic issues between the United States and Europe are no less serious than those between the United States and Japan, but they do not have the same political salience and emotional intensity because the differences between American culture and European culture are so much less than those between American civilization and Japanese civilization.” |