Stochastic Modelling of Urban Structure1footnote 11footnote 1Ellam L, Girolami M, Pavliotis GA,Wilson A. 2018 Stochastic modelling of urban structure. Proc. R. Soc. A 20170700.

Stochastic Modelling of Urban Structure111Ellam L, Girolami M, Pavliotis GA,Wilson A. 2018 Stochastic modelling of urban structure. Proc. R. Soc. A 20170700.

L. Ellam Email: M. Girolami G. A. Pavliotis Department of Mathematics, Imperial College London, London, SW7 2AZ A. Wilson
2 October 2017

The building of mathematical and computer models of cities has a long history. The core elements are models of flows (spatial interaction) and the dynamics of structural evolution. In this article, we develop a stochastic model of urban structure to formally account for uncertainty arising from less predictable events. Standard practice has been to calibrate the spatial interaction models independently and to explore the dynamics through simulation. We present two significant results that will be transformative for both elements. First, we represent the structural variables through a single potential function and develop stochastic differential equations (SDEs) to model the evolution. Secondly, we show that the parameters of the spatial interaction model can be estimated from the structure alone, independently of flow data, using the Bayesian inferential framework. The posterior distribution is doubly-intractable and poses significant computational challenges that we overcome using Markov chain Monte Carlo (MCMC) methods. We demonstrate our methodology with a case study on the London retail system.

Urban Modelling; Urban Structure; Bayesian Inference; Bayesian Statistics; Markov chain Monte Carlo (MCMC); Complexity

1 Introduction

The task of understanding the inner workings of cities and regions is a major challenge for contemporary science. The key features of cities and regions are activities at locations, flows between locations and the structure that facilitates these activities [1]. It is well understood that cities and regions are complex systems, and that an emergent structure arises from the actions of many interacting individuals. The flows between locations arise from the choices of individuals. An understanding of the underlying choice mechanism is therefore advantageous for planning and decision making. Economists have long supported the idea that consumer choices are derived from utility, a measure of net benefit, although preferences can only be measured indirectly by the phenomena they give rise to [2].

Random utility models, such as the multinomial logit model [3], provide a discrete choice mechanism based on a utility function. These models have received considerable attention in the econometrics literature [4]. The more conventional random utility models assume that choices are conditionally independent and require large volumes of flow data to calibrate. It is generally difficult to ascertain the flow data for a large number of individuals residing in a country or city, and this may require an extensive survey that suffers from sampling biases. On the other hand, the structure facilitating activities can be more straight-forward to measure.

It turns out that the flows between locations concern a vast number of individuals and are well-represented by statistical averaging procedures [5]. It also turns out that the evolution of urban structure can be described by a system of coupled first-order ordinary differential equations that are related to the competitive Lotka-Volterra models in ecology [6]. The conventional Harris and Wilson model in [6] is obtained by combining Lotka-Volterra models with statistical averaging procedures, after having expressed the flows in terms of the evolving structure and spatial interaction. As it tends to be more feasible to observe the emergent structure, for example, configurations of floorspace dedicated to retail activity, our work is largely motivated by the existing models of urban structure [1, 6, 7, 8, 9]. By adopting a similar approach, we view the flows between locations as ‘missing data’.

We note, however, that there is an urgent need to provide an improved modelling capability that captures the stochastic nature and uncertainty associated with the evolution of urban structure. The key shortcoming of the Harris and Wilson model is that it is deterministic and converges to one of multiple equilibria as determined by the initial conditions. In reality, the behaviour provided by the Harris and Wilson model would be accompanied by fluctuations arising from less predictable events. We instead introduce a mathematically well-posed systems of stochastic differential equations (SDEs) to address this shortcoming, and provide an associated Bayesian inference methodology for parameter estimation and model calibration.

To this end, we take a novel approach and construct a probability distribution to represent the uncertainty in equilibrium structures for urban and regional systems. The probability distribution is a Boltzmann-Gibbs measure that is the invariant distribution of a related SDE model [10], and is defined in terms of a potential function whose gradient describes how we expect urban structure to evolve forward in time. The potential function may be interpreted as constraints on consumer welfare and running costs from a maximum entropy argument [8, 11]. For the purposes of parameter estimation, the Boltzmann-Gibbs measure forms an integral part of the assumed data generating process in a Bayesian model of urban structure [12, 13]. A computational statistical challenge arises as there is an intractable term in the density of the Boltzmann-Gibbs measure that is parameter-dependent. The intractable term must be taken into consideration when using Markov chain Monte Carlo (MCMC) to explore the probability distributions of interest [14, 15]. Our approach is applicable to a wide range of applications in urban and regional modelling; we demonstrate our approach by inferring the full distribution over the model parameters and latent structure for the London retail system.

2 Modelling urban systems

In this section, we construct a probability distribution for urban and regional systems. We work in the setting of the Harris and Wilson model [6] and use consumer behaviour as an archetype, however, the methodology is general and has wider applications such as archaeology, logistics, health care and crime to name a few [1]. We are interested in the sizes of destination zones where consumer-led activities take place, for example, shopping. Similarly, there are origin zones from where consumers create demands for each of the destination zones. We define urban structure as the vector of sizes . In what proceeds it is more natural to work in terms of log-sizes where each . We refer to log-size as the attractiveness, which is an unscaled measure of benefit, and by working in terms of attractiveness we avoid positivity issues when developing a stochastic model. We first describe a stochastic generalisation of the Harris and Wilson model and then consider the equilibrium distribution as a probability distribution of urban structure.

2.1 A stochastic reformulation of the Harris and Wilson model

The flow of between destination zone and origin zone is denoted . We illustrate a component of an urban or regional system in Figure 1. For a singly-constrained urban system, the demands made by the origin zones are


and are known. The demands made for the destination zones are


and are to be determined. The demands for the destination zones depend on urban or regional structure. It is assumed that larger zones provide more benefits for their use and that local zones are more convenient and cost less to use. A suitable model of the flows is obtained by maximizing an entropy function subject to the constraint in (1) in additional to fixed benefit and cost constraints [5, 8]. The resulting flows are


where are scaling parameters and each represents the cost or inconvenience of carrying out an activity at zone from .

Figure 1: Illustration of a flow in a urban or regional system. It is assumed that there are origin zones (for example, left) and destination zones (for example, right). The flow denotes the flow of quantities from origin zone to destination zone . In a urban or regional system there are flows similar to the one depicted.

We expect that zones with unfulfilled demand will grow, whereas zones that do not fulfil their capacity will reduce to a more sustainable size. It is therefore reasonable to expect a degree of stability in the sizes of the destination zones. A suitable model of the dynamics is given by the Harris and Wilson model [6], which is described by a system of ordinary differential equations (ODEs)222For simplicity, we are not explicit about time-dependence when describing differential equations.


where is the responsiveness parameter and is the cost per unit floor size. The assumption that zones aim to maximize their size until an equilibrium is reached is justified by including the cost of capital in the running costs. A natural generalisation of the Harris and Wilson model is the following SDE with multiplicative noise that we interpret in the Stratonovich333The Stratonovich interpretation is obtained from a smooth approximation of white noise, which is appropriate for the dynamical system of interest. For further discussion refer to [10]. sense


for a standard -dimensional Brownian motion and volatility parameter . A heuristic interpretation of the SDE is that over a short time , the net capacity term in (4) is randomly perturbed by centred Gaussian noise with a standard deviation of . The noise term represents fluctuations in the growth rates arising from less predictable events that are not captured by the original model. The specification of multiplicative noise preserves the positivity of each .

With the change of variables , the Harris and Wilson model in (4) can be expressed as a gradient flow. The corresponding stochastic dynamics in (5) is an overdamped Langevin diffusion. To express this notion, we introduce a potential function , its gradient and an ‘inverse-temperature’ parameter and reformulate the stochastic dynamics as


where the potential function is


It is well understood that the density of , denoted , evolves in time according to the Fokker-Planck equation [10]. For the SDE in (6), the Fokker-Planck equation can be written as


Whilst (8) is very challenging to solve, especially in higher-dimensions, its steady-state solution is available in closed form and is the density of a Boltzmann-Gibbs measure given by


The Boltzmann-Gibbs measure described by (9) forms the basis of our stochastic model of urban structure. The potential function given by (7) does not yield a well-defined probability distribution as the normalizing constant in (9) is not finite. In order to address the issue we could restrict the dynamics to a bounded subset of , or introduce a confining term in the potential function. We adopt the latter approach and later argue that this approach amounts to an economically meaningful constraint.

2.2 Boltzmann-Gibbs measures for urban structure

We model urban and regional structure as a single realisation of the Boltzmann-Gibbs measure described by (9). The Boltzmann-Gibbs measure is the stationary distribution of the overdamped Langevin dynamics considered, however, we acknowledge that there are other stochastic processes that have the same stationary distribution [16]. It is desirable that the potential function satisfies the assumptions in Appendix A. It suffices to say here that smooth potential functions that grow at least linearly but no faster than exponentially at infinity have the desired mathematical properties.

The Boltzmann-Gibbs measure can also be obtained from a maximum entropy argument [8, 11]. The advantage of this view is that the terms in the potential function can be interpreted as economic constraints. We consider a potential function with three components to develop a baseline model, although more comprehensive presentations are possible444We present a simple aggregated model to illustrate the ideas. A refined model with disaggregation does not change the underlying arguments.:


where is as before and is an additional parameter. The utility potential describes consumer welfare arising from utility-based choices; the cost potential enforces capacity limits in the system; and the additional potential is a confining term that represents government initiatives, continued investment or a background level of demand. If we consider a random variable that is subject to the following equality constraints


with each , then the maximum entropy distribution of can be written as the Boltzmann-Gibbs measure whose density is given by (9) with reference to the potential function in (10). We now consider the meaning of each of these constraints in turn.

2.2.1 Utility potential

A natural candidate for a utility potential is a measure of consumer welfare. For example, welfare may be taken to be the area under the demand curve [17], given by (2), that is equal to the path integral


where we have defined the deterministic utility function


In Appendix B we show that (13) but with dependent on is also obtained by seeking a utility function consistent with a singly-constrained model and the path integral in (12). The log-sum function is commonly used as a welfare measure in the economics literature [17, 18, 19]. To make the connection with random utility maximisation explicit, we define the stochastic utility function for a choice being made from origin zone as


where the are independent and identically distributed Gumbel random variables. Then under the utility maximisation framework, the expected utility attained from a unit flow leaving origin zone is


where is the Euler-Mascheroni constant [17]. The utility potential may then be expressed as the expected utility attained from all flows in units of


Two remarks are in order. First, the scaling factor of is necessary to ensure that the utility potential is non-constant in the limit . Second, the tight bounds [20]


show that may be finite when any , and so an additional potential is needed to prevent zones from collapsing from a lack of activity. The bounds again show that the utility potential is closely related to the best alternative available to each of the origin zones.

2.2.2 Cost potential

The cost potential prevents each zone from becoming too large, and is justified by the notion that running costs increase with size. We therefore require that . In view of the equality constraints in (11), an appropriate cost potential is the total size or capacity of the system


in which is as before. Since we have


this choice of potential yields the linear cost term in the overdamped Langevin dynamics considered in (6).

2.2.3 Additional potential

The final potential term must satisfy and must grow sufficiently fast at infinity in order for (9) to be well-defined. The purpose of the additional potential is to prevent zones from collapsing from a lack of activity. Such mechanisms are commonplace in urban and regional systems, for example, continued investment or government initiatives. In view of the equality constraints in (11), a suitable potential function is


which ensures that the attractiveness of each zone is finite. The partial derivatives of the additional potential function are


Therefore the finiteness constraint requires that there is an additional positive constant term in the deterministic part of the SDE model, given by (6), to ensure that the SDE has a well-defined stationary distribution.

2.3 Model Summary

In summary, we have specified the following potential function


which satisfies the assumptions in Appendix A. The potential function is similar the one obtained by reformulating the Harris and Wilson model in (6)-(7), however, it contains an additional term to prevent zones from collapsing. A stochastic generalisation of the Harris and Wilson model is given by the overdamped Langevin diffusion in (6), for which the process converges at a fast rate to the well-defined Boltzmann-Gibbs measure described by (9). The corresponding size dynamics, in the form of (5), are given by the Stratonovich SDE


which is a stochastic generalisation of the original Harris and Wilson model that includes a positive shift to the multiplicative scale factor. In the limit , we obtain the original Harris and Wilson model in (4).

In the regime , the potential function has stationary points coinciding with the fixed points of the original Harris and Wilson model. The stationary points for the potential function are given by simultaneous equations


Whilst the behaviour of the stochastic and deterministic models may be similar in low-noise regimes over finite time intervals, we emphasise that the asymptotic behaviour differs greatly between the two. Here we consider a deterministic model to be given by (23) in the limit . For a deterministic model, the dynamics will converge to a stable fixed point satisfying (24), as determined by the initial condition. For a stochastic model, the system will converge to a statistical equilibrium that does not depend on the initial condition. As , the stochastic model spends more time around the lower values of , which occur around stable stationary points, as summarised by the limiting stationary distribution given by (9).

We now comment on the Boltzmann-Gibbs measure described by (9). The Boltzmann-Gibbs measure is the equilibrium distribution of (6), but is also justified as a probability distribution for urban and regional structure with a maximum entropy argument. When considering the Boltzmann-Gibbs measure, we specify to avoid over-parametrising the model since the relative level of noise is controlled by the inverse temperature . As , the Boltzmann distribution collapses to a Dirac mass around the global minimum of , which is unlikely to provide a good fit to the observed urban structure. As , the distribution of sizes approaches an improper uniform distribution. The profile of is largely influenced by the pair of and values, as illustrated in Figure 2. A large relative to results in all activity taking place in one of the zones, whereas this regime is unlikely when is low relative to .

Lastly, we can use the deterministic model to specify appropriate values of the cost of floorspace and the additional parameter . By defining


the deterministic model converges to an equilibrium with a total size of units. Setting as in (25) is justified with a supply and demand argument [21]. For simplicity, we use ; the choice is arbitrary. We can then specify relative to the size of the smallest zone possible, since at equilibrium the size of a zone with no inward flows is .

Figure 2: Illustration of the potential function for a small model comprising of two competing zones. The profile of the potential function is largely determined by the and pairing; in the illustration above we have held fixed and show for different values of .

3 Parameter estimation

In this section we consider the inverse problem; the task of determining and from observed urban structure. The value of describes consumer preference towards more popular destinations and the value of describes how much consumers are inconvenienced by travel. We use retail activity as an archetype, however, our methodology is general and can be applied to other singly-constrained systems. Whilst and can be estimated using discrete choice models [22, 23, 24, 25], this approach requires large volumes of flow data and is impractical for large systems. We instead make use of the model described by the Boltzmann-Gibbs measure in (2).

We formulate the task of inversion as a statistical inference problem, as advocated in [12]. The Bayesian approach is based on the following principles: the unknown parameters are modelled as random variables; our degree of uncertainty is described by probability distributions; and the solution to the inverse problem is the posterior probability distribution. Unlike classical methods, a Bayesian approach is well-posed and allows us to incorporate prior knowledge of the unknowns into the modelling process. A Bayesian approach yields a posterior probability over the model parameters, and the parameter values can be determined from the posterior mean or maximum a posteriori estimates.

3.1 A Bayesian approach to parameter estimation

The Boltzmann distribution in (2) is assumed to form an integral part of the data generating process, however, further uncertainty arises from measurement noise555It may be necessary to include a model error term if the Boltzmann-Gibbs measure provides a poor fit to the data.. To this end, we assume that an observed a configuration of urban structure is related to some latent sizes and multiplicative noise by


Multiplicative noise is appropriate since all measurements are positive and there is more scope for error when measuring larger zones. As before, it is natural to work in terms of log-sizes and we assume that is a realisation of the Boltzmann-Gibbs measure given by (9) and (22). The latent variables depend on the model parameters , which we summarise by a single variable for notational convenience. We assume that is a realisation of Gaussian noise for some symmetric positive definite covariance matrix .

We specify a prior on the model parameters. The prior distribution for the latent variables is denoted and is given by (9), which we repeat here to make the -dependence explicit in our notation


We emphasise that is only known up to a normalizing constant that is a function of . The likelihood function is the Gaussian density given by (26). The joint posterior density then has the form


and is ‘doubly-intractable’ as both the normalisation factor of (28) and the function are unknown. The estimation of is a notoriously challenging problem as it requires the integration of a complex function over a high-dimensional space  [13, 15]. The normalization constant is a probability-weighted sum of all possible outcomes and is a necessary penalty against model complexity. The -dependence for the -term poses significant computational challenges as the joint posterior density cannot be evaluated at all, not even up to an irrelevant multiplicative constant.

3.2 Computational strategies

To explore the posterior distribution we resort to numerical simulation and use Markov chain Monte Carlo (MCMC) to estimate integrals of the form


where is an integrable function of interest. For example, (29) can be used to compute the mean, variance and density estimates of the posterior marginals. As suggested in [13], we can use an approximate method to estimate . We consider the quadratic approximation of that is obtained from a second-order Taylor expansion around its global minima


Since the integral for only has significant contributions in the neighbourhood of , where (30) is a good approximation, we estimate as


This is known as a saddle point approximation and is asymptomatically accurate as  [26]. In all but special cases, the global minima of is unique and can be found inexpensively using Newton-based optimisation, for example, using the L-BFGS algorithm with the right initial condition [27]. We run the optimisation procedure for multiple initialisations to provide a good coverage of the basins, although this is only necessary for . The curvature term is given by (A.5). With (31) we can proceed with the MCMC scheme in Appendix C.

To obtain more accurate posterior summaries, especially in the case that the saddle point approximation performs poorly, we look towards a consistent estimator of (29). Despite the intractable term, we are able to construct a Markov Chain such that


The estimator requires that we can obtain unbiased estimates of the reciprocal normalizing constant , which can obtained by randomly truncating an infinite series involving importance sampling estimates of  [15, 28]. The estimator given by (32) is an importance-sampling style estimator, but with each weight equal to the sign of the unbiased estimate of for that iteration. The suitability of the scheme is dependent on being able to obtain precise importance sampling estimates of , which is challenging for low-noise regimes due to the concentration of measure. Negative values of arise from imprecision in the estimates and have the effect of increasing the variance of the estimator given by (32). Further details of the scheme are in Appendix C.

3.3 Implementation details

We specify weakly-informative uniform priors on and , restricted to the interval with a suitable scaling of determined by a preliminary study, as done in [1, 21]666In our implementation, we normalize the cost matrix so that all elements sum to .. In this setting, we are able to compare our inferred and values with the -squared analysis performed for the deterministic Harris and Wilson model in [1, 21]. Whilst ideally we would place priors on all parameters that specify the Boltzmann-Gibbs measure, we acknowledge that in doing so we would encounter both identifiability issues and tuning difficulties with regards to the importance sampling scheme for . We are able to proceed by fixing the remaining hyperparameters to suitable values. We specify to avoid over-parametrising the model and specify to reflect a desired level of noise. We set to the size of the smallest zone. This is justified by considering the Gamma distribution of a zone with no inward flows. We normalize the origin quantities and total sizes to determine from (25). Lastly, For demonstration purposes, we specify independent and homogeneous observation noise by setting where is the standard deviation of the noise.

To compute low-order summary statistics of the form in (29), we use the Monte Carlo scheme in Appendix C, comprising of a block Gibbs scheme. We use a Metropolis-Hastings random walk with reflective boundaries for the -updates and Hamiltonian Monte Carlo (HMC) for the -updates. We tune the step size parameter for the -updates to obtain an acceptance rate in the range , and we tune the step size and number of steps for the -updates to obtain an acceptance rate of at least . For the -updates, we either require global minima of , for the saddle point approximation in (31), or unbiased estimates of , for the pseudo-marginal MCMC framework described in Appendix C. When requiring a global minima of , we perform multiple runs of the L-BFGS algorithm for different initial conditions. When requiring consistent estimates of , we use annealed importance sampling (AIS) with HMC transition kernels. We initialize AIS with the log-gamma distribution that can be obtained from by letting . We produce unbiased estimates by truncating an infinite series of importance sampling estimates with a random stopping time with , and therefore requiring runs of AIS. Running AIS a large number of times is a computationally intensive task, however, the estimates can be obtained in parallel.

4 Case study: the London Retail system

In this section, we illustrate our proposed methodology with an aggregate retail model using London data similar to the example in [1, 21]777The code used for the results in this section is available at Whilst the model can be improved with disaggregation to capture further problem-specific characteristics, the underlying arguments would remain the same. We demonstrate how the Boltzmann-Gibbs measure can be used to simulate configurations of urban structure before setting out to infer the and values in the utility function. In the context of retail, the attractiveness term in (13) is justified by the benefit consumers gain from the improved range of options and economies of scale, and the cost term represents inconvenience of travel. The inverse problem is of particular interest in the context of retail as the flow data is difficult to obtain. On the other hand, urban structure is relatively straight-forward to measure and may be routinely available. Whilst some attempts have been made in the literature to estimate the parameters of a similar spatial interaction model [1, 21], these approaches are somewhat ad hoc but do provide a basis of comparison.

We obtain measurements of retail floorspace for London town centres for from a London Town Centre Health Check report prepared by the Greater London Authority [29]. We only include town centres classified as international, metropolitan and major town centre classifications in our study, giving town centres. The remaining town centres are mostly district town centres that have a relatively high concentration of convenience goods and more localised catchment; we argue that these would be better modelled separately. We determine the origin quantities from ward-level household and income estimates, with , published by the Greater London Authority [30, 31]. We take the origin quantities to be the spending powers as given by the population size multiplied by the average income. The floorspace measurements and residential data is presented in Figure 3, over the map of London [32]. In our implementation, we calculate the cost matrix from Euclidean distance, although a better representation would use a transport network [1].

Figure 3: Visualisation of the observation data (red) over the map of London. The red markers indicate the destination zones, which are the town centres, and the blue markers indicate the origin zones, which are the residential wards. The sizes of the markers are given by the respective and values, and each zone is plotted at its longitude-latitude coordinate.

We first perform a preliminary study of our model in the limit of no observation noise , in which case the -marginal of (28) is


With this simplification, we are able to evaluate the posterior probabilities over a grid of and values. We evaluate the probabilities over a grid for and , representing high-noise and low-noise regimes, respectively. Using the justification given in the previous section, we specify and . We produce the grid by estimating with the saddle point approximation in (31)888It is very difficult to estimate for high values of using importance sampling techniques due to the concentration of measure.. The results are presented in Figure 4, in which the scales indicate that the model with high-noise provides the better explanation of the data. We find that the best fit for the high-noise regime is and and the best fit for the low-noise regime is and . As expected, the low-noise regime suggests stronger attractiveness effects as the model with a higher level of noise is able to explain variation by stochastic growth. The alpha and beta values are positively correlated; this can be seen in Figure 4 and is due to the competing effects in the utility function in (13).

Figure 4: Evaluations of the logarithm of (33) over a grid of values of and for a regime with high-noise (left) and a regime with low-noise (right).

In [1], the authors perform an -squared analysis that we replicate here for our deterministic version of the Harris and Wilson model for a basis of comparison. The predicted value is taken to be the equilibrium obtained from the ODE model given by (23) with and the initial condition . The -squared value is defined as , where is the ratio of the variance of the residuals and the variance of the observed . Whilst our Bayesian approach is fundamentally different, and we should not expect to obtain too similar results, the -squared analysis yield a best fit of and . This is agreeable with the findings for the low-noise regime in Figure 4. Furthermore, there are some strong similarities between the profile of posterior probabilities and the profile of the -squared values. First, both approaches find that the poorest fit is for a regime in which is too high and beta is too low; these values result in most activity taking place in a single zone. Second, both approaches agree in that a good fit can be found for .

Next, we draw the latent variables from the prior distribution to verify the suitability of the modelling. For illustrative purposes, we consider a range of values across , and hold fixed. For the regime with high-noise, the approximate draws are obtained by running a Markov chain of length using HMC combined with parallel tempering for different temperature levels [14]. For the regime with low-noise, we plot configurations of the global minima of obtained from numerical optimisation as there is little variation between samples. The results are in Figures. 5-6, respectively. It can be seen that higher values of and lower values of create a sparse structure in that all activity takes place in very few zones. Conversely, lower values of and higher values of lead to a more dense structure.

Figure 5: Approximate draws of the latent variables from for a high-noise regime with , obtained by running a Markov chain of length using HMC combined with parallel tempering. Each row shows randomly selected states from the Markov chain with as specified and .
Figure 6: Global minima of the latent variables from , obtained by running the L-BFGS algorithm for different initial conditions. These configurations are representative of draws from in a low-noise regime with .

We now return to the observation model in (26) to account for observation noise in the data. For illustrative purposes we specify so that the relative noise for a zone of size 999Relative noise in this context is . is . Although an improved specification of observation noise from a preliminary study would lead to more accurate inferences, the arguments and methodology we are presenting would remain the same. We run a Markov chains of length . For the high-noise regime, we use the pseudo-marginal MCMC methodology in Appendix C. Our importance sampling estimates comprised of particles and equally spaced inverse-temperatures. For the low-noise regime, we were unable to obtain precise importance sampling estimates due to the concentration of measure, so used the MCMC methodology in Appendix C with the saddle point approximation in (31). For both examples, the empirical autocorrelation for and is below after steps. For pseudo-marginal MCMC scheme, 88% of the signs are positive, which is acceptably high for the scheme to be used.

Plots of the smoothed density estimates for and for the high-noise regime are presented in Figure 7. Plots of the latent sizes showing the posterior mean plus or minus three standard deviations are presented in Figure 8 alongside plots of the expected residuals and observation data. The posterior-marginals of and give mean plus or minus one standard deviation estimates of and , respectively, which appear reasonable in light of the analysis in Figure 4. The plots of the expected residuals and observation data suggest that the model provides a reasonable fit to the data, and that the assumption of homogeneous observation noise is reasonable. This is to be expected as the high-noise model provides a flexible model. After taking into account the observation noise, a weaker attractiveness effect was observed.

Similar plots are presented in Figures. 9-10 for the low-noise regime, and the posterior-marginals of and give a mean of and , respectively. The inferred values for the low-noise regime are in line with Figure 4. Both sets of posterior summaries suggest that attractiveness and inconvenience effects are present in the data, though the inferred values are considerably higher for the low-noise model. The plots of the expected residuals and observation data suggest that the model also provides a reasonable fit to the data, however, there is more dispersion in the plotted quantities and possibly a degree of heteroscedasticity due to the less flexible model. The model with more noise favours the simpler explanation that most variation is due to stochastic growth, whereas the low-noise model is more constrained. There is noticeably more uncertainty in the latent variables and the model parameters for the high-noise regime, as there are more possible explanations for the observation data. The uncertainty in the and estimates is so great for the high-noise regime that limited insights are gained for the purposes of model calibration. As a result, we conclude that strong assumptions are required in the prior modelling in order to be able to exploit known structure in the data generating process. The required assumptions can be made, for example, through the prior modelling of and or by specifying a high value of . On the other hand, the low-noise regime results in very confident posteriors. Although the resulting inferences are consistent with previous studies, care must be taken to avoid being overconfident in a particular model by not adequately accounting for uncertainty in the modelling process.

Figure 7: Posterior marginal density estimates for and for the high-noise regime (). The smooth density estimates were obtained by applying (29) to a Gaussian kernel. The blue line indicates the uniform prior density.
Figure 8: Visualisation of the posterior latent variables for the high-noise regime (). Left: The outer and inner rings show the posterior mean plus or minus three standard deviations, respectively. Top-right: Expected attractiveness against the expected residual. Bottom-right: Expected attractiveness against observed value.
Figure 9: Posterior marginal density estimates for and for the low-noise regime (). The smooth density estimates were obtained by applying (29) to a Gaussian kernel. The blue line indicates the uniform prior density.
Figure 10: Visualisation of the posterior latent variables for the low-noise regime (). Left: The outer and inner rings show the posterior mean plus or minus three standard deviations, respectively. Top-right: Expected attractiveness against the expected residual. Bottom-right: Expected attractiveness against observed value.

5 Discussion

We have developed a novel stochastic model to simulate realistic configurations of urban and regional structure. Our model is a substantial improvement on existing deterministic models as it fully addresses the uncertainties arising in the modelling process. Unlike existing time-stepping schemes, our model can be used to simulate realistic configurations of urban structure using MCMC methods without recourse to numerical error. We have demonstrated that our model can be used to infer the components of a utility function from observed structure, thereby providing an alternative to the existing discrete choice models. The key advantage is that we avoid the need to collect vast amounts of flow data. Whilst we have presented our methodology in the context of consumer-led behaviour, our approach is applicable to other urban and regional settings such as archaeology, logistics, health care and crime to suggest a few.

Our work has led to specific areas for further research. We are actively investigating the deployment of our methodology to large scale urban systems, for which there are substantial computational challenges to overcome. The cost of a potential or gradient evaluation is , however, increasing means that the estimates are more challenging to obtain owing to the curse of dimensionality. It is of interest to develop more tractable methods, for example, optimisation-based, so that inference can be performed for international models on a practical time scale. We have presented an aggregate model that can be refined to better represent domain-specific characteristics as discussed in [1]. It remains to use the proposed methodology as part of a more realistic study with wider objectives. Lastly, we emphasise that our methodology is only applicable to cross sectional data. In practice, many applications of interest require processing time-series data that is highly correlated over time. In this setting, we would need to solve the filtering or smoothing problem for (23), and in doing so would also need to account for general trends and seasonality effects that are exogenous to our model. Our work continues to be part of ongoing efforts to draw insights from data by making use of the known mathematical structure [33].

Appendix A Assumptions for the potential function

We make the following assumptions for the potential function with reference to the overdamped Langevin diffusion described by (6) and Boltzmann-Gibbs measure defined by (9):

  1. is and is confining in that and

  2. satisfies the following inequality for some


Assumption (i) is necessary to ensure that the Boltzmann-Gibbs measure is well-defined. Assumptions (i) and (ii) are sufficient to show that the distribution of , described by (6), converges exponentially fast to its Boltzmann-Gibbs measure. The reader is referred to [34] for further details.

The integrability condition in assumption (i) is satisfied by Lemma 3.14 in [35]. To show that assumption (ii) holds, we define






Then for we have




as claimed.

Appendix B Utility function for a singly-constrained potential

In this appendix, we present an alternative argument to obtain the utility potential as defined by (12). We rewrite the destination quantities in (2) as a gradient flow


and look towards specifications of that satisfy the constraint in (1). The constraint is satisfied whenever the flows leaving the origin zones are convex sums in that


Instead we can express (B.2) in terms of utility functions and some positive function so that


By inspection, we look for a potential function of the form


for some functions . Then by taking the gradient and substituting into (B.1), we obtain the requirements


for . The requirements are satisfied for when each utility function is a linear with respect to the attractiveness of the destination zone in question,


provided that each and that . The resulting potential function is


which is slightly more general than the potential considered in (12).

Appendix C Markov chain Monte Carlo for doubly intractable distributions

The idea behind MCMC is to construct an ergodic Markov chain whose time-average can be used to estimate integrals of interest [14, 36]. We use a block Metropolis-within-Gibbs scheme and alternate between -updates and -updates. The following steps can be repeated in succession to obtain a Markov-chain chain that is -invariant:

Latent variable update.

Hamiltonian Monte Carlo can be used for the -updates to suppress random walk behaviour [37]. We propose momentum variables and update by simulating Hamiltonian dynamics with a volume-preserving integrator to obtain . We accept/reject according to the Metropolis-Hastings acceptance probability


to correct for numerical error from numerically simulating Hamiltonian dynamics.

Model parameter update.

Random walk Metropolis with reflective boundaries can be used for the -updates. The Metropolis-Hastings acceptance probability is given by


The key challenge arises from proposing new -values, in which case the acceptance probability contains an intractable ratio We can either proceed with a deterministic estimate of , at the expense of a bias, or we can obtain a consistent estimator of (29) with pseudo-marginal MCMC [38] provided that unbiased and reasonably precise estimates of are available.

c.1 Unbiased estimates of the reciprocal of the normalizing constant

An unbiased estimate of is given by averaging over a batch of importance weights. The importance weights are evaluations of


at locations drawn from a proposal distribution with density . By Jensen’s inequality, the reciprocal of an importance sampling estimate of is a biased estimate of . Instead, unbiased estimates of can be obtained by randomly truncating an infinite series: for a sequence satisfying , and a random stopping time , the estimator


gives an unbiased estimate of  [28, 39, 40, 41]. We follow [15] and use the increasing averages estimator


with reference to the importance sampling weights described by (C.3). The unbiased estimates of can have high variance when the importance weights are highly variable. Fortunately, importance sampling may be carried out on an augmented state space, for example, using annealed importance sampling (AIS) [42].

c.2 Pseudo-marginal Markov chain Monte Carlo

The unbiased estimators of given by (C.4) may be negative estimate, which prohibits the use of pseudo-marginal MCMC. Fortunately, the so called ‘sign problem’ has been addressed in [28]. We can use the following importance sampling style of estimator that gives a consistent estimator of (29) in that


where is a Markov-chain obtained using the Metropolis-within-Gibbs scheme described at the start of this section but with the following acceptance probability for -updates


and . It is necessary to cache the value of at each iteration as part of the pseudo-marginal MCMC scheme.

Appendix D Table of key parameters and variables

For convenience, we provide a table of the key parameters and variables with brief explanations below:

Parameter Explanation Reference
Attractiveness scaling parameter. (3)
Cost scaling parameter. (3)
Additional parameter. (10)
Responsiveness parameter. (4)
Inverse-temperature. (6)
Cost per unit size. (4)
Standard deviation of observation noise. (26)
Noise parameter, equal to . (23)
Origin quantity for origin zone . (1)
Destination quantity for destination zone . (2)
Flow from origin zone to destination zone . (1)
Utility function for a flow from origin zone to destination zone . (13)
Cost of a flow from origin zone to destination zone . (3)
Size of destination zone . (4)
Attractiveness of destination zone , given by . (6)
Observed size of destination zone . (26)
Number of origin zones. (1)
Number of destination zones. (2)


L.E. was supported by EPSRC [EP/P020720/1]. MG was supported by EPSRC [EP/J016934/3, EP/K034154/1, EP/P020720/1, EP/R018413/1], an EPSRC Established Career Fellowship and The Alan Turing Institute - Lloyd’s Register Foundation Programme on Data-Centric Engineering. G.P. was supported by EPSRC [EP/P031587, EP/L020564, EP/L024926, EP/L025159]. A.W. was supported by EPSRC [EP/M023583/1]. This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1.


  • [1] Joel Dearden and Alan Geoffrey Wilson. Explorations in urban and regional dynamics: a case study in complexity science, volume 7. Routledge, 2015.
  • [2] Alfred Marshall. Principles of economics: an introductory volume. Royal Economic Society (Great Britain), 1920.
  • [3] D. McFadden. Conditional logit analysis of qualitative choice behaviour. In P. Zarembka, editor, Frontiers in Econometrics, pages 105–142. Academic Press New York, New York, NY, USA, 1973.
  • [4] George Baltas and Peter Doyle. Random utility models in marketing research: a survey. Journal of Business Research, 51(2):115–125, 2001.
  • [5] Alan G Wilson. A statistical theory of spatial distribution models. Transportation research, 1(3):253–269, 1967.
  • [6] Britton Harris and Alan Geoffrey Wilson. Equilibrium values and dynamics of attractiveness terms in production-constrained spatial-interaction models. Environment and planning A, 10(4):371–388, 1978.
  • [7] Alan Geoffrey Wilson. Complex spatial systems: the modelling foundations of urban and regional analysis. Pearson Education, 2000.
  • [8] Alan Geoffrey Wilson. Entropy in urban and regional modelling. Routledge, 2011.
  • [9] Alan G Wilson. The science of cities and regions: lectures on mathematical model design. Springer Science & Business Media, 2012.
  • [10] Grigorios A Pavliotis. Stochastic processes and applications. Springer, 2016.
  • [11] Andrzej Lasota and Michael C Mackey. Chaos, fractals, and noise: stochastic aspects of dynamics, volume 97. Springer Science & Business Media, 2013.
  • [12] Jari Kaipio and Erkki Somersalo. Statistical and computational inverse problems, volume 160. Springer Science & Business Media, 2006.
  • [13] Andrew Gelman, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. Bayesian data analysis, volume 2. CRC press Boca Raton, FL, 2014.
  • [14] Jun S Liu. Monte Carlo strategies in scientific computing. Springer Science & Business Media, 2008.
  • [15] Iain Murray, Zoubin Ghahramani, and David J. C. MacKay. MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 359–366. AUAI Press, 2006.
  • [16] Andrew B Duncan, Tony Lelievre, and GA Pavliotis. Variance reduction using nonreversible langevin samplers. Journal of statistical physics, 163(3):457–491, 2016.
  • [17] Huw C Williams. On the formation of travel demand models and economic evaluation measures of user benefit. Environment and planning A, 9(3):285–344, 1977.
  • [18] Kenneth A Small and Harvey S Rosen. Applied welfare economics with discrete choice models. Econometrica: Journal of the Econometric Society, pages 105–130, 1981.
  • [19] Gerard De Jong, Andrew Daly, Marits Pieters, and Toon Van der Hoorn. The logsum as an evaluation measure: review of the literature and new results. Transportation Research Part A: Policy and Practice, 41(9):874–889, 2007.
  • [20] Frank Nielsen and Ke Sun. Guaranteed Bounds on Information-Theoretic Measures of Univariate Mixtures Using Piecewise Log-Sum-Exp Inequalities. Entropy, 18(12), DEC 2016.
  • [21] Joel Dearden and Alan Wilson. A framework for exploring urban retail discontinuities. Geographical Analysis, 43(2):172–187, 2011.
  • [22] Daniel McFadden. Modeling the choice of residential location. In Spatial Interaction Theory and Planning Models. 1978.
  • [23] Daniel McFadden. Econometric models of probabilistic choice. Structural analysis of discrete data with econometric applications, 198272, 1981.
  • [24] Daniel McFadden and Kenneth Train. Mixed mnl models for discrete response. Journal of applied Econometrics, pages 447–470, 2000.
  • [25] Alex Anas. Discrete choice theory, information theory and the multinomial logit and gravity models. Transportation Research Part B: Methodological, 17(1):13–23, 1983.
  • [26] Ronald W Butler. Saddlepoint approximations with applications, volume 22. Cambridge University Press, 2007.
  • [27] Jorge Nocedal and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.
  • [28] Anne-Marie Lyne, Mark Girolami, Yves Atchadé, Heiko Strathmann, Daniel Simpson, et al. On russian roulette estimates for bayesian inference with doubly-intractable likelihoods. Statistical science, 30(4):443–467, 2015.
  • [29] Greater London Authority. 2009 london town centre health check analysis report. Technical report, 2009.
  • [30] Greater London Authority. Ward profiles and atlas., 2016.
  • [31] Greater London Authority. Household income estimates for small areas., 2015.
  • [32] Greater London Authority. Statistical gis boundary files for london., 2016.
  • [33] A Apte, Christopher KRT Jones, AM Stuart, and Jochen Voss. Data assimilation: Mathematical and statistical perspectives. International journal for numerical methods in fluids, 56(8):1033–1046, 2008.
  • [34] Gareth O Roberts and Richard L Tweedie. Exponential convergence of langevin distributions and their discrete approximations. Bernoulli, pages 341–363, 1996.
  • [35] Georg Menz, André Schlichting, et al. Poincaré and logarithmic sobolev inequalities by decomposition of the energy landscape. The Annals of Probability, 42(5):1809–1884, 2014.
  • [36] Christian P Robert. Monte carlo methods. Wiley Online Library, 2004.
  • [37] Radford M Neal. Mcmc using hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11), 2011.
  • [38] Christophe Andrieu and Gareth O Roberts. The pseudo-marginal approach for efficient monte carlo computations. The Annals of Statistics, pages 697–725, 2009.
  • [39] Don McLeish. A general method for debiasing a monte carlo estimator. Monte Carlo Methods and Applications, 17(4):301–315, 2010.
  • [40] Peter W Glynn and Chang-han Rhee. Exact estimation for markov chain equilibrium expectations. Journal of Applied Probability, 51(A):377–389, 2014.
  • [41] Colin Wei and Iain Murray. Markov chain truncation for doubly-intractable inference. In Aarti Singh and Jerry Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 776–784, Fort Lauderdale, FL, USA, 20–22 Apr 2017. PMLR.
  • [42] Radford M Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description