Stochastic Modelling of Urban Structure^{1}^{1}1Ellam L, Girolami M, Pavliotis GA,Wilson A. 2018 Stochastic modelling of urban structure. Proc. R. Soc. A 20170700. http://dx.doi.org/10.1098/rspa.2017.0700
Abstract
The building of mathematical and computer models of cities has a long history. The core elements are models of flows (spatial interaction) and the dynamics of structural evolution. In this article, we develop a stochastic model of urban structure to formally account for uncertainty arising from less predictable events. Standard practice has been to calibrate the spatial interaction models independently and to explore the dynamics through simulation. We present two significant results that will be transformative for both elements. First, we represent the structural variables through a single potential function and develop stochastic differential equations (SDEs) to model the evolution. Secondly, we show that the parameters of the spatial interaction model can be estimated from the structure alone, independently of flow data, using the Bayesian inferential framework. The posterior distribution is doublyintractable and poses significant computational challenges that we overcome using Markov chain Monte Carlo (MCMC) methods. We demonstrate our methodology with a case study on the London retail system.
1 Introduction
The task of understanding the inner workings of cities and regions is a major challenge for contemporary science. The key features of cities and regions are activities at locations, flows between locations and the structure that facilitates these activities [1]. It is well understood that cities and regions are complex systems, and that an emergent structure arises from the actions of many interacting individuals. The flows between locations arise from the choices of individuals. An understanding of the underlying choice mechanism is therefore advantageous for planning and decision making. Economists have long supported the idea that consumer choices are derived from utility, a measure of net benefit, although preferences can only be measured indirectly by the phenomena they give rise to [2].
Random utility models, such as the multinomial logit model [3], provide a discrete choice mechanism based on a utility function. These models have received considerable attention in the econometrics literature [4]. The more conventional random utility models assume that choices are conditionally independent and require large volumes of flow data to calibrate. It is generally difficult to ascertain the flow data for a large number of individuals residing in a country or city, and this may require an extensive survey that suffers from sampling biases. On the other hand, the structure facilitating activities can be more straightforward to measure.
It turns out that the flows between locations concern a vast number of individuals and are wellrepresented by statistical averaging procedures [5]. It also turns out that the evolution of urban structure can be described by a system of coupled firstorder ordinary differential equations that are related to the competitive LotkaVolterra models in ecology [6]. The conventional Harris and Wilson model in [6] is obtained by combining LotkaVolterra models with statistical averaging procedures, after having expressed the flows in terms of the evolving structure and spatial interaction. As it tends to be more feasible to observe the emergent structure, for example, configurations of floorspace dedicated to retail activity, our work is largely motivated by the existing models of urban structure [1, 6, 7, 8, 9]. By adopting a similar approach, we view the flows between locations as ‘missing data’.
We note, however, that there is an urgent need to provide an improved modelling capability that captures the stochastic nature and uncertainty associated with the evolution of urban structure. The key shortcoming of the Harris and Wilson model is that it is deterministic and converges to one of multiple equilibria as determined by the initial conditions. In reality, the behaviour provided by the Harris and Wilson model would be accompanied by fluctuations arising from less predictable events. We instead introduce a mathematically wellposed systems of stochastic differential equations (SDEs) to address this shortcoming, and provide an associated Bayesian inference methodology for parameter estimation and model calibration.
To this end, we take a novel approach and construct a probability distribution to represent the uncertainty in equilibrium structures for urban and regional systems. The probability distribution is a BoltzmannGibbs measure that is the invariant distribution of a related SDE model [10], and is defined in terms of a potential function whose gradient describes how we expect urban structure to evolve forward in time. The potential function may be interpreted as constraints on consumer welfare and running costs from a maximum entropy argument [8, 11]. For the purposes of parameter estimation, the BoltzmannGibbs measure forms an integral part of the assumed data generating process in a Bayesian model of urban structure [12, 13]. A computational statistical challenge arises as there is an intractable term in the density of the BoltzmannGibbs measure that is parameterdependent. The intractable term must be taken into consideration when using Markov chain Monte Carlo (MCMC) to explore the probability distributions of interest [14, 15]. Our approach is applicable to a wide range of applications in urban and regional modelling; we demonstrate our approach by inferring the full distribution over the model parameters and latent structure for the London retail system.
2 Modelling urban systems
In this section, we construct a probability distribution for urban and regional systems. We work in the setting of the Harris and Wilson model [6] and use consumer behaviour as an archetype, however, the methodology is general and has wider applications such as archaeology, logistics, health care and crime to name a few [1]. We are interested in the sizes of destination zones where consumerled activities take place, for example, shopping. Similarly, there are origin zones from where consumers create demands for each of the destination zones. We define urban structure as the vector of sizes . In what proceeds it is more natural to work in terms of logsizes where each . We refer to logsize as the attractiveness, which is an unscaled measure of benefit, and by working in terms of attractiveness we avoid positivity issues when developing a stochastic model. We first describe a stochastic generalisation of the Harris and Wilson model and then consider the equilibrium distribution as a probability distribution of urban structure.
2.1 A stochastic reformulation of the Harris and Wilson model
The flow of between destination zone and origin zone is denoted . We illustrate a component of an urban or regional system in Figure 1. For a singlyconstrained urban system, the demands made by the origin zones are
(1) 
and are known. The demands made for the destination zones are
(2) 
and are to be determined. The demands for the destination zones depend on urban or regional structure. It is assumed that larger zones provide more benefits for their use and that local zones are more convenient and cost less to use. A suitable model of the flows is obtained by maximizing an entropy function subject to the constraint in (1) in additional to fixed benefit and cost constraints [5, 8]. The resulting flows are
(3) 
where are scaling parameters and each represents the cost or inconvenience of carrying out an activity at zone from .
We expect that zones with unfulfilled demand will grow, whereas zones that do not fulfil their capacity will reduce to a more sustainable size. It is therefore reasonable to expect a degree of stability in the sizes of the destination zones. A suitable model of the dynamics is given by the Harris and Wilson model [6], which is described by a system of ordinary differential equations (ODEs)^{2}^{2}2For simplicity, we are not explicit about timedependence when describing differential equations.
(4) 
where is the responsiveness parameter and is the cost per unit floor size. The assumption that zones aim to maximize their size until an equilibrium is reached is justified by including the cost of capital in the running costs. A natural generalisation of the Harris and Wilson model is the following SDE with multiplicative noise that we interpret in the Stratonovich^{3}^{3}3The Stratonovich interpretation is obtained from a smooth approximation of white noise, which is appropriate for the dynamical system of interest. For further discussion refer to [10]. sense
(5) 
for a standard dimensional Brownian motion and volatility parameter . A heuristic interpretation of the SDE is that over a short time , the net capacity term in (4) is randomly perturbed by centred Gaussian noise with a standard deviation of . The noise term represents fluctuations in the growth rates arising from less predictable events that are not captured by the original model. The specification of multiplicative noise preserves the positivity of each .
With the change of variables , the Harris and Wilson model in (4) can be expressed as a gradient flow. The corresponding stochastic dynamics in (5) is an overdamped Langevin diffusion. To express this notion, we introduce a potential function , its gradient and an ‘inversetemperature’ parameter and reformulate the stochastic dynamics as
(6) 
where the potential function is
(7) 
It is well understood that the density of , denoted , evolves in time according to the FokkerPlanck equation [10]. For the SDE in (6), the FokkerPlanck equation can be written as
(8) 
Whilst (8) is very challenging to solve, especially in higherdimensions, its steadystate solution is available in closed form and is the density of a BoltzmannGibbs measure given by
(9) 
The BoltzmannGibbs measure described by (9) forms the basis of our stochastic model of urban structure. The potential function given by (7) does not yield a welldefined probability distribution as the normalizing constant in (9) is not finite. In order to address the issue we could restrict the dynamics to a bounded subset of , or introduce a confining term in the potential function. We adopt the latter approach and later argue that this approach amounts to an economically meaningful constraint.
2.2 BoltzmannGibbs measures for urban structure
We model urban and regional structure as a single realisation of the BoltzmannGibbs measure described by (9). The BoltzmannGibbs measure is the stationary distribution of the overdamped Langevin dynamics considered, however, we acknowledge that there are other stochastic processes that have the same stationary distribution [16]. It is desirable that the potential function satisfies the assumptions in Appendix A. It suffices to say here that smooth potential functions that grow at least linearly but no faster than exponentially at infinity have the desired mathematical properties.
The BoltzmannGibbs measure can also be obtained from a maximum entropy argument [8, 11]. The advantage of this view is that the terms in the potential function can be interpreted as economic constraints. We consider a potential function with three components to develop a baseline model, although more comprehensive presentations are possible^{4}^{4}4We present a simple aggregated model to illustrate the ideas. A refined model with disaggregation does not change the underlying arguments.:
(10) 
where is as before and is an additional parameter. The utility potential describes consumer welfare arising from utilitybased choices; the cost potential enforces capacity limits in the system; and the additional potential is a confining term that represents government initiatives, continued investment or a background level of demand. If we consider a random variable that is subject to the following equality constraints
(11)  
with each , then the maximum entropy distribution of can be written as the BoltzmannGibbs measure whose density is given by (9) with reference to the potential function in (10). We now consider the meaning of each of these constraints in turn.
2.2.1 Utility potential
A natural candidate for a utility potential is a measure of consumer welfare. For example, welfare may be taken to be the area under the demand curve [17], given by (2), that is equal to the path integral
(12)  
where we have defined the deterministic utility function
(13) 
In Appendix B we show that (13) but with dependent on is also obtained by seeking a utility function consistent with a singlyconstrained model and the path integral in (12). The logsum function is commonly used as a welfare measure in the economics literature [17, 18, 19]. To make the connection with random utility maximisation explicit, we define the stochastic utility function for a choice being made from origin zone as
(14) 
where the are independent and identically distributed Gumbel random variables. Then under the utility maximisation framework, the expected utility attained from a unit flow leaving origin zone is
(15) 
where is the EulerMascheroni constant [17]. The utility potential may then be expressed as the expected utility attained from all flows in units of
(16) 
Two remarks are in order. First, the scaling factor of is necessary to ensure that the utility potential is nonconstant in the limit . Second, the tight bounds [20]
(17) 
show that may be finite when any , and so an additional potential is needed to prevent zones from collapsing from a lack of activity. The bounds again show that the utility potential is closely related to the best alternative available to each of the origin zones.
2.2.2 Cost potential
The cost potential prevents each zone from becoming too large, and is justified by the notion that running costs increase with size. We therefore require that . In view of the equality constraints in (11), an appropriate cost potential is the total size or capacity of the system
(18) 
in which is as before. Since we have
(19) 
this choice of potential yields the linear cost term in the overdamped Langevin dynamics considered in (6).
2.2.3 Additional potential
The final potential term must satisfy and must grow sufficiently fast at infinity in order for (9) to be welldefined. The purpose of the additional potential is to prevent zones from collapsing from a lack of activity. Such mechanisms are commonplace in urban and regional systems, for example, continued investment or government initiatives. In view of the equality constraints in (11), a suitable potential function is
(20) 
which ensures that the attractiveness of each zone is finite. The partial derivatives of the additional potential function are
(21) 
Therefore the finiteness constraint requires that there is an additional positive constant term in the deterministic part of the SDE model, given by (6), to ensure that the SDE has a welldefined stationary distribution.
2.3 Model Summary
In summary, we have specified the following potential function
(22) 
which satisfies the assumptions in Appendix A. The potential function is similar the one obtained by reformulating the Harris and Wilson model in (6)(7), however, it contains an additional term to prevent zones from collapsing. A stochastic generalisation of the Harris and Wilson model is given by the overdamped Langevin diffusion in (6), for which the process converges at a fast rate to the welldefined BoltzmannGibbs measure described by (9). The corresponding size dynamics, in the form of (5), are given by the Stratonovich SDE
(23) 
which is a stochastic generalisation of the original Harris and Wilson model that includes a positive shift to the multiplicative scale factor. In the limit , we obtain the original Harris and Wilson model in (4).
In the regime , the potential function has stationary points coinciding with the fixed points of the original Harris and Wilson model. The stationary points for the potential function are given by simultaneous equations
(24) 
Whilst the behaviour of the stochastic and deterministic models may be similar in lownoise regimes over finite time intervals, we emphasise that the asymptotic behaviour differs greatly between the two. Here we consider a deterministic model to be given by (23) in the limit . For a deterministic model, the dynamics will converge to a stable fixed point satisfying (24), as determined by the initial condition. For a stochastic model, the system will converge to a statistical equilibrium that does not depend on the initial condition. As , the stochastic model spends more time around the lower values of , which occur around stable stationary points, as summarised by the limiting stationary distribution given by (9).
We now comment on the BoltzmannGibbs measure described by (9). The BoltzmannGibbs measure is the equilibrium distribution of (6), but is also justified as a probability distribution for urban and regional structure with a maximum entropy argument. When considering the BoltzmannGibbs measure, we specify to avoid overparametrising the model since the relative level of noise is controlled by the inverse temperature . As , the Boltzmann distribution collapses to a Dirac mass around the global minimum of , which is unlikely to provide a good fit to the observed urban structure. As , the distribution of sizes approaches an improper uniform distribution. The profile of is largely influenced by the pair of and values, as illustrated in Figure 2. A large relative to results in all activity taking place in one of the zones, whereas this regime is unlikely when is low relative to .
Lastly, we can use the deterministic model to specify appropriate values of the cost of floorspace and the additional parameter . By defining
(25) 
the deterministic model converges to an equilibrium with a total size of units. Setting as in (25) is justified with a supply and demand argument [21]. For simplicity, we use ; the choice is arbitrary. We can then specify relative to the size of the smallest zone possible, since at equilibrium the size of a zone with no inward flows is .
3 Parameter estimation
In this section we consider the inverse problem; the task of determining and from observed urban structure. The value of describes consumer preference towards more popular destinations and the value of describes how much consumers are inconvenienced by travel. We use retail activity as an archetype, however, our methodology is general and can be applied to other singlyconstrained systems. Whilst and can be estimated using discrete choice models [22, 23, 24, 25], this approach requires large volumes of flow data and is impractical for large systems. We instead make use of the model described by the BoltzmannGibbs measure in (2).
We formulate the task of inversion as a statistical inference problem, as advocated in [12]. The Bayesian approach is based on the following principles: the unknown parameters are modelled as random variables; our degree of uncertainty is described by probability distributions; and the solution to the inverse problem is the posterior probability distribution. Unlike classical methods, a Bayesian approach is wellposed and allows us to incorporate prior knowledge of the unknowns into the modelling process. A Bayesian approach yields a posterior probability over the model parameters, and the parameter values can be determined from the posterior mean or maximum a posteriori estimates.
3.1 A Bayesian approach to parameter estimation
The Boltzmann distribution in (2) is assumed to form an integral part of the data generating process, however, further uncertainty arises from measurement noise^{5}^{5}5It may be necessary to include a model error term if the BoltzmannGibbs measure provides a poor fit to the data.. To this end, we assume that an observed a configuration of urban structure is related to some latent sizes and multiplicative noise by
(26) 
Multiplicative noise is appropriate since all measurements are positive and there is more scope for error when measuring larger zones. As before, it is natural to work in terms of logsizes and we assume that is a realisation of the BoltzmannGibbs measure given by (9) and (22). The latent variables depend on the model parameters , which we summarise by a single variable for notational convenience. We assume that is a realisation of Gaussian noise for some symmetric positive definite covariance matrix .
We specify a prior on the model parameters. The prior distribution for the latent variables is denoted and is given by (9), which we repeat here to make the dependence explicit in our notation
(27) 
We emphasise that is only known up to a normalizing constant that is a function of . The likelihood function is the Gaussian density given by (26). The joint posterior density then has the form
(28) 
and is ‘doublyintractable’ as both the normalisation factor of (28) and the function are unknown. The estimation of is a notoriously challenging problem as it requires the integration of a complex function over a highdimensional space [13, 15]. The normalization constant is a probabilityweighted sum of all possible outcomes and is a necessary penalty against model complexity. The dependence for the term poses significant computational challenges as the joint posterior density cannot be evaluated at all, not even up to an irrelevant multiplicative constant.
3.2 Computational strategies
To explore the posterior distribution we resort to numerical simulation and use Markov chain Monte Carlo (MCMC) to estimate integrals of the form
(29) 
where is an integrable function of interest. For example, (29) can be used to compute the mean, variance and density estimates of the posterior marginals. As suggested in [13], we can use an approximate method to estimate . We consider the quadratic approximation of that is obtained from a secondorder Taylor expansion around its global minima
(30) 
Since the integral for only has significant contributions in the neighbourhood of , where (30) is a good approximation, we estimate as
(31)  
This is known as a saddle point approximation and is asymptomatically accurate as [26]. In all but special cases, the global minima of is unique and can be found inexpensively using Newtonbased optimisation, for example, using the LBFGS algorithm with the right initial condition [27]. We run the optimisation procedure for multiple initialisations to provide a good coverage of the basins, although this is only necessary for . The curvature term is given by (A.5). With (31) we can proceed with the MCMC scheme in Appendix C.
To obtain more accurate posterior summaries, especially in the case that the saddle point approximation performs poorly, we look towards a consistent estimator of (29). Despite the intractable term, we are able to construct a Markov Chain such that
(32) 
The estimator requires that we can obtain unbiased estimates of the reciprocal normalizing constant , which can obtained by randomly truncating an infinite series involving importance sampling estimates of [15, 28]. The estimator given by (32) is an importancesampling style estimator, but with each weight equal to the sign of the unbiased estimate of for that iteration. The suitability of the scheme is dependent on being able to obtain precise importance sampling estimates of , which is challenging for lownoise regimes due to the concentration of measure. Negative values of arise from imprecision in the estimates and have the effect of increasing the variance of the estimator given by (32). Further details of the scheme are in Appendix C.
3.3 Implementation details
We specify weaklyinformative uniform priors on and , restricted to the interval with a suitable scaling of determined by a preliminary study, as done in [1, 21]^{6}^{6}6In our implementation, we normalize the cost matrix so that all elements sum to .. In this setting, we are able to compare our inferred and values with the squared analysis performed for the deterministic Harris and Wilson model in [1, 21]. Whilst ideally we would place priors on all parameters that specify the BoltzmannGibbs measure, we acknowledge that in doing so we would encounter both identifiability issues and tuning difficulties with regards to the importance sampling scheme for . We are able to proceed by fixing the remaining hyperparameters to suitable values. We specify to avoid overparametrising the model and specify to reflect a desired level of noise. We set to the size of the smallest zone. This is justified by considering the Gamma distribution of a zone with no inward flows. We normalize the origin quantities and total sizes to determine from (25). Lastly, For demonstration purposes, we specify independent and homogeneous observation noise by setting where is the standard deviation of the noise.
To compute loworder summary statistics of the form in (29), we use the Monte Carlo scheme in Appendix C, comprising of a block Gibbs scheme. We use a MetropolisHastings random walk with reflective boundaries for the updates and Hamiltonian Monte Carlo (HMC) for the updates. We tune the step size parameter for the updates to obtain an acceptance rate in the range , and we tune the step size and number of steps for the updates to obtain an acceptance rate of at least . For the updates, we either require global minima of , for the saddle point approximation in (31), or unbiased estimates of , for the pseudomarginal MCMC framework described in Appendix C. When requiring a global minima of , we perform multiple runs of the LBFGS algorithm for different initial conditions. When requiring consistent estimates of , we use annealed importance sampling (AIS) with HMC transition kernels. We initialize AIS with the loggamma distribution that can be obtained from by letting . We produce unbiased estimates by truncating an infinite series of importance sampling estimates with a random stopping time with , and therefore requiring runs of AIS. Running AIS a large number of times is a computationally intensive task, however, the estimates can be obtained in parallel.
4 Case study: the London Retail system
In this section, we illustrate our proposed methodology with an aggregate retail model using London data similar to the example in [1, 21]^{7}^{7}7The code used for the results in this section is available at https://github.com/lellam/cities_and_regions. Whilst the model can be improved with disaggregation to capture further problemspecific characteristics, the underlying arguments would remain the same. We demonstrate how the BoltzmannGibbs measure can be used to simulate configurations of urban structure before setting out to infer the and values in the utility function. In the context of retail, the attractiveness term in (13) is justified by the benefit consumers gain from the improved range of options and economies of scale, and the cost term represents inconvenience of travel. The inverse problem is of particular interest in the context of retail as the flow data is difficult to obtain. On the other hand, urban structure is relatively straightforward to measure and may be routinely available. Whilst some attempts have been made in the literature to estimate the parameters of a similar spatial interaction model [1, 21], these approaches are somewhat ad hoc but do provide a basis of comparison.
We obtain measurements of retail floorspace for London town centres for from a London Town Centre Health Check report prepared by the Greater London Authority [29]. We only include town centres classified as international, metropolitan and major town centre classifications in our study, giving town centres. The remaining town centres are mostly district town centres that have a relatively high concentration of convenience goods and more localised catchment; we argue that these would be better modelled separately. We determine the origin quantities from wardlevel household and income estimates, with , published by the Greater London Authority [30, 31]. We take the origin quantities to be the spending powers as given by the population size multiplied by the average income. The floorspace measurements and residential data is presented in Figure 3, over the map of London [32]. In our implementation, we calculate the cost matrix from Euclidean distance, although a better representation would use a transport network [1].
We first perform a preliminary study of our model in the limit of no observation noise , in which case the marginal of (28) is
(33) 
With this simplification, we are able to evaluate the posterior probabilities over a grid of and values. We evaluate the probabilities over a grid for and , representing highnoise and lownoise regimes, respectively. Using the justification given in the previous section, we specify and . We produce the grid by estimating with the saddle point approximation in (31)^{8}^{8}8It is very difficult to estimate for high values of using importance sampling techniques due to the concentration of measure.. The results are presented in Figure 4, in which the scales indicate that the model with highnoise provides the better explanation of the data. We find that the best fit for the highnoise regime is and and the best fit for the lownoise regime is and . As expected, the lownoise regime suggests stronger attractiveness effects as the model with a higher level of noise is able to explain variation by stochastic growth. The alpha and beta values are positively correlated; this can be seen in Figure 4 and is due to the competing effects in the utility function in (13).
In [1], the authors perform an squared analysis that we replicate here for our deterministic version of the Harris and Wilson model for a basis of comparison. The predicted value is taken to be the equilibrium obtained from the ODE model given by (23) with and the initial condition . The squared value is defined as , where is the ratio of the variance of the residuals and the variance of the observed . Whilst our Bayesian approach is fundamentally different, and we should not expect to obtain too similar results, the squared analysis yield a best fit of and . This is agreeable with the findings for the lownoise regime in Figure 4. Furthermore, there are some strong similarities between the profile of posterior probabilities and the profile of the squared values. First, both approaches find that the poorest fit is for a regime in which is too high and beta is too low; these values result in most activity taking place in a single zone. Second, both approaches agree in that a good fit can be found for .
Next, we draw the latent variables from the prior distribution to verify the suitability of the modelling. For illustrative purposes, we consider a range of values across , and hold fixed. For the regime with highnoise, the approximate draws are obtained by running a Markov chain of length using HMC combined with parallel tempering for different temperature levels [14]. For the regime with lownoise, we plot configurations of the global minima of obtained from numerical optimisation as there is little variation between samples. The results are in Figures. 56, respectively. It can be seen that higher values of and lower values of create a sparse structure in that all activity takes place in very few zones. Conversely, lower values of and higher values of lead to a more dense structure.
We now return to the observation model in (26) to account for observation noise in the data. For illustrative purposes we specify so that the relative noise for a zone of size ^{9}^{9}9Relative noise in this context is . is . Although an improved specification of observation noise from a preliminary study would lead to more accurate inferences, the arguments and methodology we are presenting would remain the same. We run a Markov chains of length . For the highnoise regime, we use the pseudomarginal MCMC methodology in Appendix C. Our importance sampling estimates comprised of particles and equally spaced inversetemperatures. For the lownoise regime, we were unable to obtain precise importance sampling estimates due to the concentration of measure, so used the MCMC methodology in Appendix C with the saddle point approximation in (31). For both examples, the empirical autocorrelation for and is below after steps. For pseudomarginal MCMC scheme, 88% of the signs are positive, which is acceptably high for the scheme to be used.
Plots of the smoothed density estimates for and for the highnoise regime are presented in Figure 7. Plots of the latent sizes showing the posterior mean plus or minus three standard deviations are presented in Figure 8 alongside plots of the expected residuals and observation data. The posteriormarginals of and give mean plus or minus one standard deviation estimates of and , respectively, which appear reasonable in light of the analysis in Figure 4. The plots of the expected residuals and observation data suggest that the model provides a reasonable fit to the data, and that the assumption of homogeneous observation noise is reasonable. This is to be expected as the highnoise model provides a flexible model. After taking into account the observation noise, a weaker attractiveness effect was observed.
Similar plots are presented in Figures. 910 for the lownoise regime, and the posteriormarginals of and give a mean of and , respectively. The inferred values for the lownoise regime are in line with Figure 4. Both sets of posterior summaries suggest that attractiveness and inconvenience effects are present in the data, though the inferred values are considerably higher for the lownoise model. The plots of the expected residuals and observation data suggest that the model also provides a reasonable fit to the data, however, there is more dispersion in the plotted quantities and possibly a degree of heteroscedasticity due to the less flexible model. The model with more noise favours the simpler explanation that most variation is due to stochastic growth, whereas the lownoise model is more constrained. There is noticeably more uncertainty in the latent variables and the model parameters for the highnoise regime, as there are more possible explanations for the observation data. The uncertainty in the and estimates is so great for the highnoise regime that limited insights are gained for the purposes of model calibration. As a result, we conclude that strong assumptions are required in the prior modelling in order to be able to exploit known structure in the data generating process. The required assumptions can be made, for example, through the prior modelling of and or by specifying a high value of . On the other hand, the lownoise regime results in very confident posteriors. Although the resulting inferences are consistent with previous studies, care must be taken to avoid being overconfident in a particular model by not adequately accounting for uncertainty in the modelling process.
5 Discussion
We have developed a novel stochastic model to simulate realistic configurations of urban and regional structure. Our model is a substantial improvement on existing deterministic models as it fully addresses the uncertainties arising in the modelling process. Unlike existing timestepping schemes, our model can be used to simulate realistic configurations of urban structure using MCMC methods without recourse to numerical error. We have demonstrated that our model can be used to infer the components of a utility function from observed structure, thereby providing an alternative to the existing discrete choice models. The key advantage is that we avoid the need to collect vast amounts of flow data. Whilst we have presented our methodology in the context of consumerled behaviour, our approach is applicable to other urban and regional settings such as archaeology, logistics, health care and crime to suggest a few.
Our work has led to specific areas for further research. We are actively investigating the deployment of our methodology to large scale urban systems, for which there are substantial computational challenges to overcome. The cost of a potential or gradient evaluation is , however, increasing means that the estimates are more challenging to obtain owing to the curse of dimensionality. It is of interest to develop more tractable methods, for example, optimisationbased, so that inference can be performed for international models on a practical time scale. We have presented an aggregate model that can be refined to better represent domainspecific characteristics as discussed in [1]. It remains to use the proposed methodology as part of a more realistic study with wider objectives. Lastly, we emphasise that our methodology is only applicable to cross sectional data. In practice, many applications of interest require processing timeseries data that is highly correlated over time. In this setting, we would need to solve the filtering or smoothing problem for (23), and in doing so would also need to account for general trends and seasonality effects that are exogenous to our model. Our work continues to be part of ongoing efforts to draw insights from data by making use of the known mathematical structure [33].
Appendix A Assumptions for the potential function
We make the following assumptions for the potential function with reference to the overdamped Langevin diffusion described by (6) and BoltzmannGibbs measure defined by (9):

is and is confining in that and
(A.1) 
satisfies the following inequality for some
(A.2)
Assumption (i) is necessary to ensure that the BoltzmannGibbs measure is welldefined. Assumptions (i) and (ii) are sufficient to show that the distribution of , described by (6), converges exponentially fast to its BoltzmannGibbs measure. The reader is referred to [34] for further details.
The integrability condition in assumption (i) is satisfied by Lemma 3.14 in [35]. To show that assumption (ii) holds, we define
(A.3) 
then
(A.4) 
and
(A.5) 
Then for we have
(A.6) 
and
(A.7) 
as claimed.
Appendix B Utility function for a singlyconstrained potential
In this appendix, we present an alternative argument to obtain the utility potential as defined by (12). We rewrite the destination quantities in (2) as a gradient flow
(B.1) 
and look towards specifications of that satisfy the constraint in (1). The constraint is satisfied whenever the flows leaving the origin zones are convex sums in that
(B.2) 
Instead we can express (B.2) in terms of utility functions and some positive function so that
(B.3) 
By inspection, we look for a potential function of the form
(B.4) 
for some functions . Then by taking the gradient and substituting into (B.1), we obtain the requirements
(B.5) 
for . The requirements are satisfied for when each utility function is a linear with respect to the attractiveness of the destination zone in question,
(B.6) 
provided that each and that . The resulting potential function is
(B.7) 
which is slightly more general than the potential considered in (12).
Appendix C Markov chain Monte Carlo for doubly intractable distributions
The idea behind MCMC is to construct an ergodic Markov chain whose timeaverage can be used to estimate integrals of interest [14, 36]. We use a block MetropoliswithinGibbs scheme and alternate between updates and updates. The following steps can be repeated in succession to obtain a Markovchain chain that is invariant:
Latent variable update.
Hamiltonian Monte Carlo can be used for the updates to suppress random walk behaviour [37]. We propose momentum variables and update by simulating Hamiltonian dynamics with a volumepreserving integrator to obtain . We accept/reject according to the MetropolisHastings acceptance probability
(C.1) 
to correct for numerical error from numerically simulating Hamiltonian dynamics.
Model parameter update.
Random walk Metropolis with reflective boundaries can be used for the updates. The MetropolisHastings acceptance probability is given by
(C.2) 
The key challenge arises from proposing new values, in which case the acceptance probability contains an intractable ratio We can either proceed with a deterministic estimate of , at the expense of a bias, or we can obtain a consistent estimator of (29) with pseudomarginal MCMC [38] provided that unbiased and reasonably precise estimates of are available.
c.1 Unbiased estimates of the reciprocal of the normalizing constant
An unbiased estimate of is given by averaging over a batch of importance weights. The importance weights are evaluations of
(C.3) 
at locations drawn from a proposal distribution with density . By Jensen’s inequality, the reciprocal of an importance sampling estimate of is a biased estimate of . Instead, unbiased estimates of can be obtained by randomly truncating an infinite series: for a sequence satisfying , and a random stopping time , the estimator
(C.4) 
gives an unbiased estimate of [28, 39, 40, 41]. We follow [15] and use the increasing averages estimator
(C.5) 
with reference to the importance sampling weights described by (C.3). The unbiased estimates of can have high variance when the importance weights are highly variable. Fortunately, importance sampling may be carried out on an augmented state space, for example, using annealed importance sampling (AIS) [42].
c.2 Pseudomarginal Markov chain Monte Carlo
The unbiased estimators of given by (C.4) may be negative estimate, which prohibits the use of pseudomarginal MCMC. Fortunately, the so called ‘sign problem’ has been addressed in [28]. We can use the following importance sampling style of estimator that gives a consistent estimator of (29) in that
(C.6) 
where is a Markovchain obtained using the MetropoliswithinGibbs scheme described at the start of this section but with the following acceptance probability for updates
(C.7) 
and . It is necessary to cache the value of at each iteration as part of the pseudomarginal MCMC scheme.
Appendix D Table of key parameters and variables
For convenience, we provide a table of the key parameters and variables with brief explanations below:
Parameter  Explanation  Reference 

Attractiveness scaling parameter.  (3)  
Cost scaling parameter.  (3)  
Additional parameter.  (10)  
Responsiveness parameter.  (4)  
Inversetemperature.  (6)  
Cost per unit size.  (4)  
Standard deviation of observation noise.  (26)  
Noise parameter, equal to .  (23)  
Origin quantity for origin zone .  (1)  
Destination quantity for destination zone .  (2)  
Flow from origin zone to destination zone .  (1)  
Utility function for a flow from origin zone to destination zone .  (13)  
Cost of a flow from origin zone to destination zone .  (3)  
Size of destination zone .  (4)  
Attractiveness of destination zone , given by .  (6)  
Observed size of destination zone .  (26)  
Number of origin zones.  (1)  
Number of destination zones.  (2) 
Acknowledgements
L.E. was supported by EPSRC [EP/P020720/1]. MG was supported by EPSRC [EP/J016934/3, EP/K034154/1, EP/P020720/1, EP/R018413/1], an EPSRC Established Career Fellowship and The Alan Turing Institute  Lloyd’s Register Foundation Programme on DataCentric Engineering. G.P. was supported by EPSRC [EP/P031587, EP/L020564, EP/L024926, EP/L025159]. A.W. was supported by EPSRC [EP/M023583/1]. This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1.
References
 [1] Joel Dearden and Alan Geoffrey Wilson. Explorations in urban and regional dynamics: a case study in complexity science, volume 7. Routledge, 2015.
 [2] Alfred Marshall. Principles of economics: an introductory volume. Royal Economic Society (Great Britain), 1920.
 [3] D. McFadden. Conditional logit analysis of qualitative choice behaviour. In P. Zarembka, editor, Frontiers in Econometrics, pages 105–142. Academic Press New York, New York, NY, USA, 1973.
 [4] George Baltas and Peter Doyle. Random utility models in marketing research: a survey. Journal of Business Research, 51(2):115–125, 2001.
 [5] Alan G Wilson. A statistical theory of spatial distribution models. Transportation research, 1(3):253–269, 1967.
 [6] Britton Harris and Alan Geoffrey Wilson. Equilibrium values and dynamics of attractiveness terms in productionconstrained spatialinteraction models. Environment and planning A, 10(4):371–388, 1978.
 [7] Alan Geoffrey Wilson. Complex spatial systems: the modelling foundations of urban and regional analysis. Pearson Education, 2000.
 [8] Alan Geoffrey Wilson. Entropy in urban and regional modelling. Routledge, 2011.
 [9] Alan G Wilson. The science of cities and regions: lectures on mathematical model design. Springer Science & Business Media, 2012.
 [10] Grigorios A Pavliotis. Stochastic processes and applications. Springer, 2016.
 [11] Andrzej Lasota and Michael C Mackey. Chaos, fractals, and noise: stochastic aspects of dynamics, volume 97. Springer Science & Business Media, 2013.
 [12] Jari Kaipio and Erkki Somersalo. Statistical and computational inverse problems, volume 160. Springer Science & Business Media, 2006.
 [13] Andrew Gelman, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. Bayesian data analysis, volume 2. CRC press Boca Raton, FL, 2014.
 [14] Jun S Liu. Monte Carlo strategies in scientific computing. Springer Science & Business Media, 2008.
 [15] Iain Murray, Zoubin Ghahramani, and David J. C. MacKay. MCMC for doublyintractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI06), pages 359–366. AUAI Press, 2006.
 [16] Andrew B Duncan, Tony Lelievre, and GA Pavliotis. Variance reduction using nonreversible langevin samplers. Journal of statistical physics, 163(3):457–491, 2016.
 [17] Huw C Williams. On the formation of travel demand models and economic evaluation measures of user benefit. Environment and planning A, 9(3):285–344, 1977.
 [18] Kenneth A Small and Harvey S Rosen. Applied welfare economics with discrete choice models. Econometrica: Journal of the Econometric Society, pages 105–130, 1981.
 [19] Gerard De Jong, Andrew Daly, Marits Pieters, and Toon Van der Hoorn. The logsum as an evaluation measure: review of the literature and new results. Transportation Research Part A: Policy and Practice, 41(9):874–889, 2007.
 [20] Frank Nielsen and Ke Sun. Guaranteed Bounds on InformationTheoretic Measures of Univariate Mixtures Using Piecewise LogSumExp Inequalities. Entropy, 18(12), DEC 2016.
 [21] Joel Dearden and Alan Wilson. A framework for exploring urban retail discontinuities. Geographical Analysis, 43(2):172–187, 2011.
 [22] Daniel McFadden. Modeling the choice of residential location. In Spatial Interaction Theory and Planning Models. 1978.
 [23] Daniel McFadden. Econometric models of probabilistic choice. Structural analysis of discrete data with econometric applications, 198272, 1981.
 [24] Daniel McFadden and Kenneth Train. Mixed mnl models for discrete response. Journal of applied Econometrics, pages 447–470, 2000.
 [25] Alex Anas. Discrete choice theory, information theory and the multinomial logit and gravity models. Transportation Research Part B: Methodological, 17(1):13–23, 1983.
 [26] Ronald W Butler. Saddlepoint approximations with applications, volume 22. Cambridge University Press, 2007.
 [27] Jorge Nocedal and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.
 [28] AnneMarie Lyne, Mark Girolami, Yves Atchadé, Heiko Strathmann, Daniel Simpson, et al. On russian roulette estimates for bayesian inference with doublyintractable likelihoods. Statistical science, 30(4):443–467, 2015.
 [29] Greater London Authority. 2009 london town centre health check analysis report. Technical report, 2009.
 [30] Greater London Authority. Ward profiles and atlas. data.london.gov.uk, 2016.
 [31] Greater London Authority. Household income estimates for small areas. data.london.gov.uk, 2015.
 [32] Greater London Authority. Statistical gis boundary files for london. data.london.gov.uk, 2016.
 [33] A Apte, Christopher KRT Jones, AM Stuart, and Jochen Voss. Data assimilation: Mathematical and statistical perspectives. International journal for numerical methods in fluids, 56(8):1033–1046, 2008.
 [34] Gareth O Roberts and Richard L Tweedie. Exponential convergence of langevin distributions and their discrete approximations. Bernoulli, pages 341–363, 1996.
 [35] Georg Menz, André Schlichting, et al. Poincaré and logarithmic sobolev inequalities by decomposition of the energy landscape. The Annals of Probability, 42(5):1809–1884, 2014.
 [36] Christian P Robert. Monte carlo methods. Wiley Online Library, 2004.
 [37] Radford M Neal. Mcmc using hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11), 2011.
 [38] Christophe Andrieu and Gareth O Roberts. The pseudomarginal approach for efficient monte carlo computations. The Annals of Statistics, pages 697–725, 2009.
 [39] Don McLeish. A general method for debiasing a monte carlo estimator. Monte Carlo Methods and Applications, 17(4):301–315, 2010.
 [40] Peter W Glynn and Changhan Rhee. Exact estimation for markov chain equilibrium expectations. Journal of Applied Probability, 51(A):377–389, 2014.
 [41] Colin Wei and Iain Murray. Markov chain truncation for doublyintractable inference. In Aarti Singh and Jerry Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 776–784, Fort Lauderdale, FL, USA, 20–22 Apr 2017. PMLR.
 [42] Radford M Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.