Decentralized Signal Control for Urban Road Networks
Abstract
We propose a decentralized traffic signal control policy for urban road networks. Our policy is an adaptation of the so-called BackPressure scheme, which is widely recognized in data networks as a throughput-optimal control policy. We formally prove that the proposed BackPressure scheme, with fixed cycle time and cyclic phases, stabilizes the network for any feasible traffic demand. Simulations compare our BackPressure policy against other existing distributed control policies in various traffic and network scenarios. Numerical results suggest that the proposed policy can surpass the other policies in terms of both network throughput and congestion.
keywords:
BackPressure, Traffic light control, Capacity region, Stability
1 Introduction
Traffic congestion is a major problem in modern societies due to increasing population and economic activity. This motivates the need for better utilizing the existing infrastructures and for efficiently controlling the traffic flow in order to minimize the impact of congestion.
One of the key tools for influencing the efficiency of traffic flow in urban networks is traffic signal control that enables conflicting traffic to flow through intersections via the timing of green/red light cycles. It has long been recognized that the challenge is to find optimal cycle timing over many intersections so as to reduce the overall congestion and to increase the throughput through the network.
There has been much work in the past on both designing and optimizing isolated or coordinated signals that reactively resolve congestion in urban networks. Broadly, two types of control have been used for signal control: static and vehicle-actuated; see Hamilton et al. (2013). Static control (sometimes called a “fixed time plan”) involves the optimization of the cycle time, the offset between nearby intersections, and the split of green times in different directions within a cycle. This can be optimized in isolation or in a coordinated manner, for instance to create a so-called green wave where vehicles always arrive at intersections during the green time, e.g. Webster (1958); Gartner et al. (1975a, b); Kraft (6th Edition, 2009). In contrast, vehicle-actuated controls use online measurements from on-road detectors (e.g., inductive loops) to optimize signal timings on a cycle-to-cycle basis in real time. Some examples of commonly used implementations are SCOOT, Hunt et al. (1981); UTOPIA, Mauro and Taranto (1990); and the hierarchical scheme RHODES, Mirchandani and Head (2001). Combinations of fixed time plans and vehicle-actuated control also exist; one widely used example is SCATS, Lowrie (1982).
Given a choice of control scheme, various approaches to optimizing the signal plans have been proposed. Examples include Mixed-Integer Linear Programming, see Gartner et al. (1975a, b); Dujardin et al. (2011); the Linear Complementarity Problem, see De Schutter and De Moor (1998); rolling horizon optimization using dynamic programming, see Gartner (1983); Henry et al. (1983); Mirchandani and Head (2001), or its combination with online learning algorithms, see Cai et al. (2009); store-and-forward models based on Model Predictive Control (MPC) optimization, K. et al. (2009); Aboudolas et al. (2010); Tettamanti et al. (2008, 2010); Tettamanti and Varga (2010); Le et al. (2013); and MPC optimization with nonlinear prediction, Shu et al. (2011). Many of these approaches formulate the problem in a centralized way and are therefore inherently not scalable. While the state of the art is the use of centralized techniques, improved scalability may be obtained using decentralized approaches. In this paper, we focus exclusively on decentralized schemes. Although such schemes are in their infancy, this research is one step along the path of improving them to offer performance comparable with centralized schemes while retaining their scalability.
A scalable distributed approach is to solve a set of loosely coupled optimizations, one for each intersection, with coupling provided by traffic conditions. Two natural approaches are to control the traffic lights based on either (a) the expected number of vehicles to enter the intersection during the next cycle, or (b) the difference in traffic load between the roads leading into the intersection and those leading out. These approaches are now deployable in practice thanks to emerging technologies, such as cameras and wireless communication, that enable better access to real-time traffic data.
Notable among the first class is the work of Smith (1980), the so-called $P_0$ policy, and its variants, Clegg et al. (2000); Smith (2011), followed by the work of Lämmer and Helbing (2008), where the switching cost between phases is taken into account. In this approach, each intersection estimates the amount of traffic that will arrive during the next complete cycle, and sets the split time such that each phase gets a time proportional to the number of cars expected to arrive on roads which have a green light during that phase. The lack of central control raises the possibility that intersections may interact in unexpected ways to cause instability. To limit this, a stabilization mechanism was proposed in Lämmer and Helbing (2010). However, beyond heuristic arguments, there remains no formal proof of stability of this approach.
Approach (b), including work by Varaiya (2013), Wongpiromsarn et al. (2012) and Zhang et al. (2012), was inspired by research on packet scheduling in wireless networks: the so-called max weight or back pressure algorithm (referred to as BackPressure in this paper), Tassiulas and Ephremides (1992); McKeown et al. (1999). Like approach (a), BackPressure does not require any a priori knowledge of the traffic demand, but it has the added benefit of provable stability. To make that more precise, define a traffic load to a network as “feasible” if there exist splits at each intersection such that the queues do not build up indefinitely. Under certain simplifying assumptions, it can be shown that the queues under BackPressure do not build up indefinitely for any feasible traffic load. This will be made more formal in Section 3. In wireless networks, BackPressure can be computationally prohibitive, but in road networks, Wongpiromsarn et al. (2012); Varaiya (2013); Zhang et al. (2012), it admits a simple distributed implementation, just like approach (a).
It is worth noting that all the above-mentioned policies, Smith (1980, 2011); Lämmer and Helbing (2008); Varaiya (2013); Wongpiromsarn et al. (2012); Zhang et al. (2012), make decisions periodically based on evaluations of traffic over a fixed time interval. These are called fixed cycle policies. For example, the BackPressure policy of Wongpiromsarn et al. (2012) determines the phase to be activated at the beginning of each fixed time slot, while the policy of Lämmer and Helbing (2010) decides whether to keep serving the current flow or switch to another flow at regular time intervals, which can be arbitrarily small.
Given the possibility of a stability guarantee by the BackPressure scheme, our objective in this work is to fully adapt it to traffic control scenarios. To this end, we propose a new signal control strategy that addresses two weaknesses in the prior application of BackPressure to road networks, Wongpiromsarn et al. (2012); Varaiya (2013); Zhang et al. (2012), while retaining and proving the important stability property of BackPressure-based algorithms.
The first weakness to be addressed is that phases can form an erratic, unpredictable order in the previously proposed BackPressure policy. This is acceptable in the context of communications systems, but for urban road traffic it is undesirable, since erratic ordering of phases frustrates drivers and potentially causes confusion leading to dangerous actions. Moreover, if one inbound road is particularly backlogged, then it is possible that other roads are “starved” by being assigned a red light for an extended period. To rectify this, we modify BackPressure to a “cyclic phase” policy. A policy is said to be a cyclic phase policy if it allocates strictly positive service time to all phases in each control decision, so that it is possible to arrange the phases into a fixed ordered sequence.
The second weakness that we address is that prior applications have required each intersection to know the “turning fractions”, that is, the fraction of traffic from each inbound road that will turn into each possible outbound road. We prove that the stability results still apply when these turning fractions are estimated using even very simple measurements; specifically, any unbiased estimator of the turning fractions suffices. Such stability proofs apply for a general network model but under idealized assumptions. Nonetheless these form an important step towards the application of BackPressure to real networks.
To test the practicality of the theoretical refinements described above, we also present a numerical comparison of the proposed BackPressure algorithm with the approaches of Smith (1980) and Lämmer and Helbing (2008, 2010) without switching cost. The results suggest that our cyclic phase BackPressure policy tends to outperform other distributed policies in terms of both throughput and congestion. Although the performance of each policy varies widely depending on parameter settings such as cycle length or decision frequency, under the optimal setting, BackPressure with and without cyclic phases has better throughput than the other policies.
The rest of this paper is organized as follows. We first present the notation and queue dynamics model before describing our proposed cyclic phase BackPressure policy in Section 2. The main results for stability of our policy are then provided in Section 3. In a certain sense, these results mean that our policy stabilizes the system for the largest possible set of arrival rates, leading to sufficient throughput even in a congested network. For readability, however, most of the mathematical details and derivations are listed in the Appendices of the paper. Section 4 presents the simulation results and numerical comparison of our scheme with other existing policies, where we demonstrate the benefits of the proposed cyclic phase BackPressure signal control strategy. Finally, Section 5 concludes the paper and discusses future work.
2 Cyclic Phase BackPressure Traffic Signal Control
2.1 Notation and Network description
Consider a network of traffic intersections. This road network consists of a number of junctions, indexed by $j \in \mathcal{J}$. Each junction $j$ consists of a number of inroads, $\mathcal{I}_j$. Note that the sets $\mathcal{I}_j$ are mutually disjoint, and denote $\mathcal{I} = \bigcup_{j \in \mathcal{J}} \mathcal{I}_j$. A road with multiple lanes having different turning options (such as a left-turn-only lane) is modeled as multiple inroads; thus an inroad may model one or more lanes of traffic flow. Whether these traffic flows are conflicting or not is not considered in this setting. We use the inclusion $i \in j$ to indicate that inroad $i$ is part of junction $j$, and we let $j(i)$ denote the junction used by inroad $i$.
Each junction may serve different combinations of inroads simultaneously. We call a combination of inroads served simultaneously a service phase. A service phase for junction $j$ is represented by a vector $\sigma = (\sigma_i : i \in j)$, where $\sigma_i$ denotes the rate at which cars can be served from inroad $i$ at junction $j$ during phase $\sigma$. In particular, $\sigma_i > 0$ if inroad $i$ has a green light during phase $\sigma$, and $\sigma_i = 0$ otherwise.
Let $\mathcal{S}_j$ denote the set of phases at junction $j$. We let $\mathcal{L}$ denote the set of links of the road traffic network. Each link represents a road connecting the junctions of the urban road network. Here we write $ii' \in \mathcal{L}$ if it is possible for cars served at inroad $i \in j$ to next join inroad $i' \in j'$.
In the rest of this section we impose the additional constraint that all junctions have a common cycle length $T$, the time devoted to serving cars from the different inroads at the junction. Thus we can model time as discrete and consider a slotted time model where $t = 0, 1, 2, \dots$ denotes the number of the cycle about to be initiated. Control decisions in our policy are then made at the beginning of each time slot (so it is a fixed cycle policy, similar to the policies in Lämmer and Helbing (2010); Varaiya (2013); Smith (1980)). We also assume that all junctions have the same loss of service due to idle times during switches and setups. Thus at each time step, the system decides at each junction how much time within the next interval to spend serving each phase, with the constraints that each service phase must be enacted for some nonzero length of time and that the sum of the allocated times must not be greater than $T$. We assume that a car served at one junction in one time interval is present at an inroad of the next junction on its route at the next time interval.
2.2 Queue Dynamics Model
Let the queue length $Q_i(t)$ denote the number of cars at inroad $i$ at the beginning of the $t$-th traffic cycle, and denote the vector of queue lengths by $Q(t) = (Q_i(t) : i \in \mathcal{I})$. The decisions in the policy will be based on the measured queue lengths $\hat{Q}_i(t)$, which might differ from the actual value of $Q_i(t)$ by an error term $\delta_i(t)$ as described in the following equation:
$$\hat{Q}_i(t) = Q_i(t) + \delta_i(t), \tag{1}$$
where the error term $\delta_i(t)$ is bounded and independent of $Q_i(t)$ and of the error terms at other inroads. We denote the vector of the error terms by $\delta(t)$.
Let $P^j_\sigma(t)$ denote the proportion of the traffic cycle at junction $j$ which is devoted to service phase $\sigma \in \mathcal{S}_j$. For any policy and for all $j \in \mathcal{J}$, we require
$$\sum_{\sigma \in \mathcal{S}_j} P^j_\sigma(t) \le 1. \tag{2}$$
Recall that $\sigma_i$ gives the rate at which cars can be served at inroad $i$ if the entire traffic cycle were devoted to phase $\sigma$, and $P^j_\sigma(t)$ gives the proportion of the traffic cycle devoted to service phase $\sigma$. Their product, $\sigma_i P^j_\sigma(t)$, gives the expected number of cars to leave inroad $i$ under service phase $\sigma$, provided the inroad is not emptied. Accordingly, if we let the random variable $S_i(t)$ be the potential number of cars served from inroad $i$ at junction $j$ in traffic cycle $t$, the mean of $S_i(t)$ must satisfy
$$\mathbb{E}\big[S_i(t) \mid \hat{Q}(t)\big] = \sum_{\sigma \in \mathcal{S}_j} \sigma_i P^j_\sigma(t), \tag{3}$$
where we note that the proportions $P^j_\sigma(t)$ are allocated according to the decision in the policy based on $\hat{Q}(t)$, thus the dependency on $\hat{Q}(t)$. The random variable $S_i(t)$ only gives the number of cars served if the inroad does not empty. Thus, it may be possible for $S_i(t)$ to be greater than the queue size $Q_i(t)$. In this case, $Q_i(t)$ will be the number of cars served. In other words, the number of cars actually served from inroad $i$ is
$$S_i(t) \wedge Q_i(t), \tag{4}$$
where $x \wedge y = \min\{x, y\}$.
Further, when traffic is served it will move to neighbouring junctions. For $ii' \in \mathcal{L}$, we let $p_{ii'}(t)$ denote the proportion of cars served at inroad $i$ that subsequently join inroad $i'$ at time $t$. We assume that cars within an inroad are homogeneous in the sense that each car at the junction has the same likelihood of joining each subsequent junction. We denote the expectation of $p_{ii'}(t)$ by $\bar{p}_{ii'}$. We further assume that this likelihood is constant (i.e. time independent) and will not be altered by the queue lengths observed by cars within the network. Thus $S_i(t)\, p_{ii'}(t)$ is the number of cars that leave inroad $i$ and, next, join inroad $i'$, provided the inroad does not empty.
We let $A_i(t)$ denote the number of external arrivals at inroad $i$ at time $t$. The expected number of arrivals, or arrival rate, into each inroad at time $t$ is defined as $a_i(t) = \mathbb{E}[A_i(t)]$. Notice that by allowing $a_i(t)$ to vary as a function of time, we can model varying traffic demands, which undoubtedly can change over the course of a day. In cases where we choose arrival rates to be static and unchanging with time, we will simply denote these arrival rates by $a_i$.
Given a service policy $\{P(t)\}$, we can define the dynamics of our queueing model. In particular, we define for inroad $i$ of junction $j$
$$Q_i(t+1) = Q_i(t) - S_i(t) \wedge Q_i(t) + A_i(t) + \sum_{i' : i'i \in \mathcal{L}} \big[S_{i'}(t) \wedge Q_{i'}(t)\big]\, p_{i'i}(t). \tag{5}$$
Here we assume that cars first depart within a traffic cycle and then subsequently cars arrive from other inroads.
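As an illustration, the queue recursion described above (departures capped by the current queue, followed by external arrivals and inflow from upstream inroads) can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions: the function name `step_queues`, the dictionary layout and the example numbers are hypothetical, and the random turning proportions are replaced by fixed fractions for readability.

```python
def step_queues(queues, potential_service, arrivals, turning):
    """One traffic cycle of the queue recursion: departures first, then arrivals.

    queues, potential_service, arrivals: dict mapping inroad -> number of cars.
    turning: dict mapping (upstream, downstream) -> fraction of served cars that
             move from upstream to downstream (illustrative fixed values).
    """
    # Cars actually served cannot exceed the cars present: min(S_i, Q_i).
    served = {i: min(potential_service[i], queues[i]) for i in queues}
    new_queues = {}
    for i in queues:
        # Inflow from every upstream inroad that feeds inroad i.
        inflow = sum(served[u] * frac for (u, d), frac in turning.items() if d == i)
        new_queues[i] = queues[i] - served[i] + arrivals.get(i, 0) + inflow
    return new_queues


# Hypothetical three-inroad example: "a" and "b" feed "c".
queues = {"a": 10, "b": 2, "c": 0}
service = {"a": 4, "b": 5, "c": 3}
arrivals = {"a": 3, "b": 1}
turning = {("a", "c"): 0.5, ("b", "c"): 1.0}
next_queues = step_queues(queues, service, arrivals, turning)
print(next_queues)
```

In the example, inroad "b" holds only 2 cars although 5 could be served, so only 2 depart; this is the capping by the queue length that distinguishes potential from actual service.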
2.3 Cyclic Phase BackPressure Control Policy
Now we are ready to give our proposed policy as follows.

1. At the beginning of each traffic cycle, form an estimate of the actual turning fractions $\bar{p}_{ii'}$ with the unbiased estimator $\hat{p}_{ii'}(t)$.

2. For each junction $j$, calculate the weight associated with each service phase $\sigma \in \mathcal{S}_j$ at the junction as a function of the measured queue sizes and the above defined estimated turning probabilities:
$$w_\sigma(\hat{Q}(t)) = \sum_{i \in j} \sigma_i \Big( \hat{Q}_i(t) - \sum_{i' : ii' \in \mathcal{L}} \hat{p}_{ii'}(t)\, \hat{Q}_{i'}(t) \Big). \tag{6}$$

3. Given these weights, assign the following proportion of the common cycle length to each phase $\sigma$ in $\mathcal{S}_j$ within the next service cycle:
$$P^j_\sigma(t) = \frac{\exp\{\eta\, w_\sigma(\hat{Q}(t))\}}{\sum_{\pi \in \mathcal{S}_j} \exp\{\eta\, w_\pi(\hat{Q}(t))\}} \tag{7}$$
for $\sigma \in \mathcal{S}_j$, where $\eta > 0$ is a parameter of the model.

The weights defined in (6) are used in the BackPressure policy as given by Tassiulas and Ephremides (1992). They can be viewed as the “pressure” a queue places on downstream queues, which is given by the weighted mean of the differences of the queue sizes. The larger the weight associated with a phase, the more important it is to serve the inroads with green lights during that phase. These weights are then used to calculate the portion of the traffic cycle for each phase according to (7). The distribution (7) gives each phase positive service, with more service given to the higher weight phases. As $\eta \to 0$, the service allocation tends to uniform, and as $\eta \to \infty$, the fraction of service given to the highest weight phase(s) tends to $1$.
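The weight and split computations of the policy can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the weight is taken to be the rate-weighted difference between a measured queue and the turning-fraction-weighted downstream queues, and the split is taken to be an exponential (softmax) normalization with parameter `eta`, both read off the verbal description above. The names `phase_weights`, `green_splits` and all example values are hypothetical.

```python
import math


def phase_weights(phases, q_hat, turn_hat):
    """Weight of each phase from measured queues and estimated turning fractions.

    phases:   list of dicts, inroad -> service rate (0 when the inroad is red).
    q_hat:    dict, inroad -> measured queue length.
    turn_hat: dict, (upstream, downstream) -> estimated turning fraction.
    """
    weights = []
    for sigma in phases:
        w = 0.0
        for i, rate in sigma.items():
            # Expected queue length downstream of inroad i.
            downstream = sum(frac * q_hat[d] for (u, d), frac in turn_hat.items() if u == i)
            w += rate * (q_hat[i] - downstream)
        weights.append(w)
    return weights


def green_splits(weights, eta):
    """Exponential split: every phase receives a strictly positive share."""
    top = max(weights)  # subtracting the maximum avoids overflow in exp
    expw = [math.exp(eta * (w - top)) for w in weights]
    total = sum(expw)
    return [e / total for e in expw]


# Hypothetical junction with a north-south and an east-west phase.
phases = [{"n": 1.0, "e": 0.0}, {"n": 0.0, "e": 1.0}]
q_hat = {"n": 12, "e": 4, "out": 2}
turn_hat = {("n", "out"): 1.0}
weights = phase_weights(phases, q_hat, turn_hat)
splits = green_splits(weights, eta=0.1)
print(weights, splits)
```

With `eta = 0` the two phases share the cycle equally, while a large `eta` gives almost the whole cycle to the north-south phase, which matches the limiting behaviour described in the text.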
Notice that, in contrast to BackPressure policies which always serve the phase associated with the highest weight, the proposed policy ensures that each phase (and consequently each inroad) receives nonzero service in each and every cycle. This ensures a cyclic phase policy, while maintaining the property that higher weights result in higher proportions of allocated green time. Note that the policy can be implemented in a decentralized way: after each junction has communicated queue sizes with its upstream inroads, the phases can be calculated. This decentralization has numerous advantages: it is computationally inexpensive, it does not require centralized aggregation of information and thus is easier to implement, and it increases the road network's robustness to failures.
The cyclic phase feature which we introduce to the BackPressure policies is important from the users’ point of view for various reasons. Firstly, drivers usually expect an ordered phase sequence and anticipate traffic signal changes in advance. Secondly, the waiting time to receive service for any inroad is bounded in our policy, while it could be arbitrarily large for some inroads in previous state-of-the-art distributed policies, such as BackPressure. This is particularly important when considering that pedestrian phases might also be initiated in parallel with the service phases for vehicles.
From an implementation point of view the policy is desirable since it does not require knowledge regarding the destination of each car within the road network, nor does it assume that the proportion of cars moving between links is known in advance. The policy estimates turning fractions and measures queue sizes in an online manner, and uses these adaptive estimates and the measurement results to inform the policy decision.
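One simple way to estimate turning fractions online is the empirical frequency of observed movements, sketched below. This is only an illustrative choice of estimator, not necessarily the one used in the paper; under the homogeneity assumption (each car turns independently with the same probabilities) the empirical frequency is unbiased whenever at least one car has been observed on the upstream inroad. The function name and the data layout are hypothetical.

```python
from collections import Counter


def estimate_turning_fractions(observed_moves):
    """Empirical turning fractions from observed (upstream, downstream) moves.

    observed_moves: iterable of (upstream, downstream) pairs, one per served car.
    Returns a dict mapping each observed pair to its share of the upstream flow.
    """
    pair_counts = Counter(observed_moves)
    upstream_totals = Counter()
    for (upstream, _), n in pair_counts.items():
        upstream_totals[upstream] += n
    return {pair: n / upstream_totals[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical observations: four cars leave "n", two cars leave "e".
moves = [("n", "s")] * 3 + [("n", "w")] + [("e", "w")] * 2
fractions = estimate_turning_fractions(moves)
print(fractions)
```

The estimated fractions out of each upstream inroad sum to one by construction, so they can be plugged directly into a weight computation.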
3 Mathematical Results: Stability of Cyclic Phase BackPressure Control Policy
3.1 Stability Region and Queueing Stability
We define the stability region of the network to be the set of arrival rate vectors , for which there exists a positive vector , namely the green time proportion devoted to the service phases in a cycle, and a positive vector , namely the departure rates, satisfying the constraints (8) (9) (10) where equation (8) represents the need for the accumulated arrival rates to be less than the potential departure rates, equation (9) guarantees that the yellow and all-red periods in each cycle leave sufficient time for switching and setup between phases, and equation (10) indicates that the departure rates do not exceed the allocated service rates. We let denote the closure of the stability region, that is, the set of rates where the above inequalities in (8)–(9) may hold with equality. We also note that the random variables and the assigned service time proportions correspond to and and take their respective values from the sets and in the stable case.
Given the vector of queue sizes $Q(t) = (Q_i(t) : i \in \mathcal{I})$, we define the total queue size of the road network to be
$$Q^{\Sigma}(t) = \sum_{i \in \mathcal{I}} Q_i(t). \tag{11}$$
So $Q^{\Sigma}(t)$ gives the total number of cars within the road network. We say that a policy for serving cars at the junctions stabilizes the network for a vector of arrival rates if the long run average number of cars in the queueing network is finite, in particular,
$$\lim_{\tau \to \infty} \mathbb{E}\Big[\frac{1}{\tau} \sum_{t=1}^{\tau} Q^{\Sigma}(t)\Big] < \infty. \tag{12}$$
This notion of stability originates from the theory of Markov chains, where (12) gives a necessary and sufficient condition for positive recurrence; for instance, see Meyn and Tweedie (2009). Our model does not assume that the underlying system is Markovian, thus recurrence cannot be defined. However, by (12) we can have the same understanding of necessary and sufficient conditions for stability as in the previous literature; see Tassiulas and Ephremides (1992); Bramson (2008); Sipahi et al. (2009). So in the long run we expect there to be a finite number of cars within the road traffic network. If the road network were unstable, then we would expect the number of cars within the system to grow over time. Thus we say that a policy is unstable for a vector of arrival rates if
$$\lim_{\tau \to \infty} \mathbb{E}\Big[\frac{1}{\tau} \sum_{t=1}^{\tau} Q^{\Sigma}(t)\Big] = \infty. \tag{13}$$
We note that if the queue size process were a Markov chain, then this definition of stability would be equivalent to the definition of positive recurrence for that Markov chain. However, the process that we define need not be a Markov chain, hence we use the above definition.
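The distinction between a finite and a diverging long run average can be seen in a deliberately simple deterministic example: a single queue with a constant number of arrivals and a constant service capacity per cycle. This toy sketch only illustrates the definition of stability; it is not the network model of the paper, and the function names and numbers are hypothetical.

```python
def simulate_queue(arrival, service, cycles, q0=0):
    """Queue trace of a single inroad with constant arrivals and service."""
    q, trace = q0, []
    for _ in range(cycles):
        q = q - min(service, q) + arrival  # departures first, then arrivals
        trace.append(q)
    return trace


def running_average(trace):
    """Running time average (1/n) * sum of the first n queue lengths."""
    total, averages = 0, []
    for n, q in enumerate(trace, start=1):
        total += q
        averages.append(total / n)
    return averages


stable = running_average(simulate_queue(arrival=2, service=3, cycles=1000))
unstable = running_average(simulate_queue(arrival=3, service=2, cycles=1000))
print(stable[-1], unstable[-1])
```

When the service capacity exceeds the arrivals, the running average settles at a constant; when arrivals exceed capacity, the queue grows by one car per cycle and the running average grows without bound as the horizon increases.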
3.2 Main Theoretical Results
First of all we state the following known result about the stability region defined by (8)–(10). We refer to the demand induced by the arrival rates as the load on the network. In particular, we show that any set of arrival rates outside the stability region must be unstable no matter what policy is used. Note that in practice, the traffic load is determined by an origin-destination (OD) demand rather than a per-inroad arrival rate and turning fraction. If the OD demand is stationary, then these quantities are also stationary, and the model correctly captures the load on each road, and hence the stability of the network. If the OD demand is non-stationary, then we capture the first-order effects by allowing the arrival rates to vary, but the assumption that the turning fractions are constant is an additional modelling approximation.
Proposition 1.
Given that the arrivals at each time, , are independent identically distributed random variables with expectation , it follows that if then any policy is unstable under these arrival rates, .
The previous proposition shows that the best a policy can do to stabilize the road traffic network is to be stable for all rates in . The following result shows that our policy is indeed stable for all arrival rates within the set .
Theorem 1.
Given that there exists an such that for each traffic cycle , then, for a constant , the long run average queue sizes of inroads are bounded as (14) and thus the policy is stable.
We leave the proof of these statements to Appendix B.3. In a certain sense, these results mean that our policy is stable for the largest possible set of arrival rates. Thus the policy provides sufficient throughput in congested traffic whenever that is possible, reaching an efficient utilization of the existing capacities. Note that Theorem 1 applies to time-varying traffic levels. Although the stability region is multidimensional, the intuition behind the traffic model can be understood by considering the scalar case. In that case, it corresponds to the expected number of arrivals in each time cycle () being bounded above. If we interpret that bound as the traffic level during peak hour, the theorem applies to networks in which queues would remain stable even if peak hour extended indefinitely. We acknowledge that this is a stricter requirement than necessary, since the system can be stable in the long term even if queues build up during the peak hour, provided they empty sufficiently after the peak.
4 Numerical Results: Performance Evaluation and Design
4.1 Simulation settings
In this section we evaluate via simulation the performance of our proposed cyclic phase BackPressure traffic signal control and compare it with a number of existing self-control (i.e. decentralized) schemes by Lämmer and Helbing (2010), Smith (1980) and Wongpiromsarn et al. (2012), as detailed below.
First, the self-control scheme in Lämmer and Helbing (2010) aims at minimizing the waiting times at each intersection by anticipating future arrivals into those intersections, instead of just efficiently clearing existing queues as in a conventional priority rule, Smith (1980). However, since future traffic demand is not known, the scheme of Lämmer and Helbing (2010) more or less greedily attempts to minimize the waiting time. When the setup time or the amber traffic signal is ignored, this policy (referred to as the greedy policy below) tends to allocate service to the phase that has the longer queue length. In contrast, BackPressure (including the one proposed in this paper) is non-greedy. A BackPressure-based policy ensures that the action taken at any time is not too suboptimal, regardless of what future traffic is like. Although it does not explicitly seek to minimize the waiting time, it is likely to result in lower waiting time, and consequently lower total travel time through a network, than a greedy algorithm that does.
Second, the priority rule of the self-control scheme in Smith (1980) approximately gives a green time split proportional to the total number of vehicles on the inroads, and thus will be referred to as the proportional scheme in the rest of this section.
The third and final policy, in Wongpiromsarn et al. (2012), allocates green time to the phase that has the highest queue backlog difference between upstream and downstream queues; thus it will be referred to as the BackPressure policy in this section. Note that although these benchmarks are in their genesis by the standards of currently implemented centralized schemes, they are state of the art among distributed schemes, and thus appropriate for the goal of this paper. In summary, the following policies will be evaluated and compared in this section.

Cyclic Phase BackPressure policy proposed in this paper: Refer to Subsection 2.3 for details.

BackPressure policy Wongpiromsarn et al. (2012):

At the beginning of each time slot, based on recent occurrences, form an estimate of the turning fractions according to
(15) where is a parameter of the model.

For each junction , calculate the weight associated with each service phase at the junction as
(16) 
Given these weights, assign the whole service time of the next time slot to phase where .


Proportional policy Smith (1980):

At the beginning of each traffic cycle, calculate the weight associated with each service phase at each junction as
(17) 
Given these weights, within the next service cycle assign an amount of time to each phase that is proportional to
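The decision rules of the two benchmark policies listed above can be contrasted in a short sketch. It assumes that the BackPressure benchmark gives the whole slot to the phase with the largest weight, and that the proportional benchmark weighs each phase by the total number of queued vehicles on the inroads it serves; the exact weight formulas are not reproduced here, and the function names, the tie-breaking rule and the equal split for an empty junction are illustrative assumptions.

```python
def backpressure_phase(weights):
    """Index of the phase with the largest weight; it receives the whole slot.

    Ties are broken in favour of the first such phase (illustrative choice).
    """
    return max(range(len(weights)), key=lambda k: weights[k])


def proportional_splits(phases, queues):
    """Green shares proportional to the queued vehicles each phase would serve.

    phases: list of dicts, inroad -> service rate (0 when the inroad is red).
    queues: dict, inroad -> queue length.
    """
    loads = [sum(queues[i] for i, rate in sigma.items() if rate > 0) for sigma in phases]
    total = sum(loads)
    if total == 0:
        # Assumption: share the cycle equally when the junction is empty.
        return [1.0 / len(phases)] * len(phases)
    return [load / total for load in loads]


# Hypothetical four-arm junction with two phases.
phases = [{"n": 1, "s": 1, "e": 0, "w": 0}, {"n": 0, "s": 0, "e": 1, "w": 1}]
queues = {"n": 6, "s": 2, "e": 1, "w": 1}
chosen = backpressure_phase([10.0, 4.0])
shares = proportional_splits(phases, queues)
print(chosen, shares)
```

The contrast is that the first rule is all-or-nothing per slot and looks at queue differences, whereas the second always shares the cycle among phases but only looks at the queues on the inroads of the junction itself.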

In this section we utilize the open source microscopic simulation package SUMO (Simulation of Urban MObility), SUMO (2013), to study the above schemes in a small network of two intersections (Fig. 1) and in a large network (Fig. 2) that resembles the Melbourne CBD (Australia) with about 70 intersections.
The small network has 2 junctions consisting of several inroads (numbered from 1 to 26 in Fig. 1). All the roads have bidirectional traffic, with the North-South road going through the right intersection having double lanes. The direction of traffic movements on this network is indicated on each inroad leading to the junctions. The ingress queues, where vehicles enter the network, are assumed to be infinite and are represented by a set of long links (i.e. links on Fig. 1). The cars immediately appear on the connecting inroads inside the network, which are of finite capacity. Since the ingress queues are infinitely large, vehicles can enter the network even when there is heavy congestion on the bottleneck link. All other links (i.e. links on Fig. 1) have the same length at meters which can accommodate maximum cars per lane.
The topology of the large Melbourne CBD network is shown in Fig. 2. It consists of intersections and links. Most of the roads are bidirectional except for Little Lonsdale Street, Little Bourke Street, Little Collins Street and Flinders Lane, which only have a single lane of one-way traffic. King Street and Russell Street are the biggest roads in this scenario, each is modeled as lanes each direction. Collins Street has one lane in each direction. All other roads have two lanes in each direction. The link lengths are varied between meters for the vertical links and meters and meters for the horizontal links except for the ingress links at the edges.
Results are given in terms of the total number of vehicles in the network and the congestion level which is the average number of congested links in large network after long simulation runs using the different control schemes. In all the studied scenarios, the exact queue lengths and turning fractions are observed directly from the simulation and used to make control decision in various policies. These variables are calculated using Matlab MATLAB (2013) based on the actual control algorithm and then are fed back into the SUMO simulation at every time step. We ignore switching times (i.e. transition between phases) in all control schemes in our study. This overhead can be incorporated into the simulation by extending the phase times. Nevertheless, the qualitative insights gained in this section would not change by that extension.
4.2 Performance Study
Below we evaluate the performance of our cyclic phase BackPressure scheme and compare it with other policies using fixed setting of routes in the studied networks using simulation. The cycle time of the cyclic phase BackPressure policy and the proportional policy were set to seconds, while the slot time of the BackPressure policy and the greedy policy were set to seconds in our simulation.
Small Network
First, we study a small network scenario, for which the turning information and the arrival rates are indicated in Fig. 3. In particular, the arrows indicate 10 routes with direction and demands (cars/minute) in the peak and off-peak (i.e. on/off) time periods as shown in Fig. 3. The value introduced in (7) was set to . The main traffic flows are the ones in the North-South direction of the second junction. The two BackPressure policies give the majority of service time to the North-South phase of that junction, which leads to heavy congestion on links 1, 2, 3, 14 and 15. On the other hand, the proportional policy and the greedy policy balance the service times more evenly depending on the queue lengths, which creates more congestion in the North-South direction in exchange for less congestion in the East-West direction.
Results are shown in Fig. 4 and Fig. 5, where the total number of vehicles in the network and in the congested link between the two junctions are plotted over time. Note that there are two inroads between the two intersections, but only link 3 is congested due to large traffic flows in the North-South direction at the second intersection.
Note that Figs 4 and 5 were based on the number of cars present at the times when control decisions were made which is seconds for cyclic phase BackPressure and proportional policies. In contrast, the average travel time depends on the waiting time on individual links, which is an integral of queue size over continuous time. For this reason, the intermediate queue size was also measured at 10 s intervals in the simulation, and the results differed by less than 2% compared with the coarse sampling at once per cycle assuming linear interpolation. The resulting travel time values are reported in Subsection 4.3.
Observe that the cyclic phase BackPressure control yields a lower total number of vehicles present in the network and thus results in a higher number of vehicles reaching their destination (i.e. increased network throughput) during the whole simulation. This is due to the fact that in the two BackPressure control schemes, when the bottleneck link (link 3) is congested, less green time will be allocated to the East-West direction at the first junction. As a result, more traffic can move through the North-South direction and the impact of a spillback from the second junction on the overall network throughput decreases. Similarly, the BackPressure policy also outperforms the proportional policy and the greedy policy, since those two schemes allocate a similar amount of green time to the East-West direction at the first junction despite the presence of spilled-back traffic, and thus waste some of the green time.
Large Melbourne CBD network
A similar study is performed on a large network, with its turning information and arrival rates indicated in Fig. 2. The parameter is once again set to . In this setting, King Street has the largest flows; thus, any flow that shares an intersection with King Street tends to be underserved, especially at the intersection between King Street and Lonsdale Street and the intersection between King Street and Bourke Street. Generally, in the peak period, congestion in any link will cause spillback, which leads to further congestion in the neighbouring junctions. This can only be recognized by the two BackPressure policies, by comparing the inroad and outroad , and more service will be allocated in this case to traffic flows in the less congested directions. In contrast, the proportional policy and the greedy policy only consider the queue lengths present at inroad , and may waste some green time on the congested direction where traffic comes to a standstill due to the spillback.
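The difference between the two kinds of policy can be sketched as follows. The sketch uses a generic back-pressure form (the inroad queue minus the turning-fraction-weighted outroad queues), which is an assumption and not necessarily the exact weight defined in (6); the function names and numbers are hypothetical.

```python
def backpressure_weight(inroad_queue, outroad_queues, turning_fractions):
    """Generic pressure of one inroad: its own queue minus the
    turning-fraction-weighted queues on the outroads it feeds."""
    downstream = sum(
        turning_fractions[outroad] * queue
        for outroad, queue in outroad_queues.items()
    )
    return inroad_queue - downstream


def queue_only_weight(inroad_queue):
    """Weight used by a policy that looks only at the inroad queue."""
    return float(inroad_queue)


# Hypothetical inroad with 20 queued vehicles feeding a spilled-back
# outroad ("straight") and a nearly empty one ("left").
outroad_queues = {"straight": 30, "left": 5}
turning_fractions = {"straight": 0.75, "left": 0.25}

pressure = backpressure_weight(20, outroad_queues, turning_fractions)
queue_weight = queue_only_weight(20)
```

In this example the pressure is negative while the queue-only weight remains large, which is the situation in which a queue-only policy would give green time to a direction that cannot discharge.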
Results for this scenario are shown in Fig. 6 and Fig. 7. As shown in Fig. 6, the cyclic phase BackPressure policy has the lowest total number of vehicles in the network and thus provides the highest throughput, whereas the second highest throughput is provided by the BackPressure policy. As in the previous scenario, the two BackPressure policies outperform the other two policies under heavy congestion because they take into account downstream queue lengths, and thus allocate resources (i.e. service phases) more efficiently.
Fig. 7 plots the number of congested links over time. Herein a link is said to be congested at a certain time if its queue length is more than of the link capacity. It is clear that the two BackPressure policies reduce the number of congested links significantly (i.e. fewer vehicles inside the network), resulting in higher network throughput.
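The congestion measure described above can be computed with a few lines of code. The sketch below is illustrative only: the function name, the link data and the threshold value of 0.8 are hypothetical and are not the values used in the simulation.

```python
def count_congested_links(queue_lengths, capacities, threshold):
    """Number of links whose queue length exceeds `threshold`
    (a fraction between 0 and 1) of the link capacity."""
    return sum(
        1
        for link, queue in queue_lengths.items()
        if queue > threshold * capacities[link]
    )


# Hypothetical snapshot of three links at one point in time.
queue_lengths = {1: 45, 2: 10, 3: 60}
capacities = {1: 50, 2: 50, 3: 60}

# The threshold 0.8 is an assumed value chosen for illustration.
congested = count_congested_links(queue_lengths, capacities, threshold=0.8)
```

Evaluating this count at every sampling instant would give a time series of the kind plotted for the congested links.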
4.3 Experimental Parameter Design
The cycle length in the cyclic phase BackPressure policy and the proportional policy, and the frequency of decision making in the BackPressure policy and the greedy policy, play a crucial role in the performance of the control scheme. A long cycle length or a low decision frequency may be less efficient because the queue might be depleted before the end of the service time. On the other hand, a short cycle length may reduce the overall capacity since the vehicles have to stop and accelerate more often. Note that the latter in fact represents a switching cost between phases, even though the amber traffic signal is not considered here. This subsection investigates the impact of the cycle time and decision making frequency on the throughput and congestion level of each scheme. We study both network topologies (the small network and the large CBD network) under demand levels similar to those of the previous subsection, with different cycle times and decision frequencies. In particular, for the cyclic phase BackPressure policy and the proportional policy, the cycle length is set to seconds, and for the BackPressure policy and the greedy policy, a decision is made every seconds, respectively.
Small Network
For the small network, the results are presented in Fig. 8, Fig. 9, and Fig. 10. Fig. 8 shows the average number of vehicles in the network plotted against different cycle times. Because a vehicle does not disappear and stays in the network until it exits, a lower number of vehicles in the network equates to a higher throughput for the same demand. In this scenario, all of the studied policies provide a similar throughput under their respective best settings. Furthermore, it can be observed that in a congested network longer cycle times tend to give better throughput because the traffic flows are less interrupted by the switching between phases. Nevertheless, the proposed cyclic phase BackPressure policy is less sensitive to changes of the cycle time while producing a comparable throughput.
Fig. 9 plots the average link densities versus link ID. It shows that congestion occurs in the same set of links under all of the studied policies. Overall, the cyclic phase BackPressure policy and the proportional policy have lower link densities than the other two policies. Fig. 10 shows the maximum link density over time, indicating when congestion occurs in the network. In all cases, congestion appears during the peak period and for some portion of the off-peak period before the built-up traffic can be substantially drained. However, when the cycle time or the decision interval is set too small (e.g. 10 seconds for the BackPressure policy and the greedy policy and 30 seconds for the cyclic phase BackPressure policy and the proportional policy), the congestion cannot be cleared at all, except under our cyclic phase BackPressure policy.
Large Melbourne CBD Network
The impact of the cycle time and frequency of decision making on network throughput and congestion level using different policies for a large network are investigated and discussed in this subsection. The results are shown in Fig. 11, Fig. 12 and Fig. 13.
The average number of vehicles in the network for each setting is presented in Fig. 11. Unlike in the small network, there clearly exists an optimal value for the cycle length or decision making frequency of each policy. In particular, the optimal cycle length for the proportional policy is seconds, while the optimal cycle length/decision frequency for all other policies is seconds.
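Table 1: Average travel time of each policy under its best setting of cycle length or decision frequency.
Policy | Proportional | Cyclic phase BP | Greedy | BP
Avg. travel time | 478.0 | 409.6 | 514.0 | 408.5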
Observe that the optimal cycle length in the large network scenario is shorter than that of the small network, which can be explained by the shorter link lengths and the larger number of intersections on any route. Both increase the interdependency between intersections and their performance, as the traffic arriving at any internal intersection is the output traffic of the others.
Furthermore, the average travel time of each policy under its best setting of cycle length and decision making frequency is shown in Table 1. There is a strong correlation between the average number of vehicles in the network and the average travel time through that network. In particular, a higher number of vehicles in the network results in a longer travel time and vice versa. It can be seen that the proposed cyclic phase BackPressure policy has a competitive average travel time among all the considered policies, and the results show that the BackPressure-based policies yield a significantly better average travel time than the greedy policy. Note that this better average travel time has been achieved with control decisions that use queue size measurements at discrete time intervals only (once in every cycle), as explained earlier.
Fig. 12 plots the average link densities against the link ID. It shows that the congested area varies with the policy and with the parameter settings. Any cycle length/decision frequency setting other than the optimal one increases the congestion considerably.
Finally, Fig. 13 shows the maximum link density over time. The cycle length plays a vital role in preventing congestion in this scenario. In the cyclic phase BackPressure policy, the second cycle length clearly performs best. In the other policies, small cycle lengths seem to perform better due to the short link lengths.
5 Conclusion
We proposed in this paper a novel decentralized signal control strategy based on the so-called BackPressure policy that does not require any knowledge of the traffic demand and only needs information (i.e. queue size) that is local to the intersection. In contrast to other existing BackPressure-based policies, in which phases can form an erratic and unpredictable order resulting in potentially unsafe operation, our scheme allocates a non-zero amount of time to each phase within the cycle, thus repeating them in a cyclic manner. Furthermore, unlike all the other existing BackPressure-based policies, no knowledge of the local turn ratios (turning fractions) is required in our control strategy. Instead, any unbiased estimator of the turning fractions can be utilized in the proposed scheme. We have formally proved the stability of the proposed signal control policy even though the controllers react based only on local information and demand in a distributed manner. The stability results indicate that our policy is stable for the largest possible set of arrival rates (or demand), which provides sufficient throughput even in a congested network.
Using simulation, we compared the performance of our cyclic phase BackPressure policy against other well-known policies in terms of network throughput and congestion level, using both small and large network topologies with fixed routings. The results showed that our cyclic phase BackPressure policy tends to outperform other distributed policies in terms of both throughput and congestion. Although the performance of each policy varies widely depending on parameter settings such as the cycle length or decision frequency, under the optimal setting among the cases studied, BackPressure with cyclic and non-cyclic operation has better throughput compared with the other policies.
There are still many issues, such as non-constant switching times, finite link travel time and link capacity, that have not been considered here and will be the subject of future work.
Acknowledgements
This work was supported by the Australian Research Council (ARC) Future Fellowships grants FT120100723 and FT0991594.
Appendix A Estimation of Turning Fractions
An important aspect not addressed in the previous studies applying back pressure is the estimation of traffic turning fractions. Previous studies have assumed that the turning fractions are either explicitly known or have been calculated prior to the implementation of the policy.
Here we emphasize that the turning fractions can be estimated using recent locally calculated information about traffic flows. For instance, if we form an estimate on the turning fractions based on the last service cycles
(18) 
where denotes the measurement result for . If then any estimate may be used to define . Given that turning fractions are stationary and independent of queue sizes, these estimates form an unbiased estimate of the underlying turning probabilities as long as the measurement error in has zero mean, since then
(19) 
Other rules incorporating historical data or more recent data could also be considered here. What is necessary is that provides an unbiased estimate of the underlying turning fractions of the vehicles for non-empty queues. It is even possible to use an inconsistent estimate of the turning fractions, or for the proportions to change on a larger time scale, as long as the estimate is unbiased and independent of (the history of) .
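A sliding-window estimator of this kind can be sketched as follows. The sketch assumes that the estimate is the ratio of the vehicles observed turning into each outroad to all vehicles served from the inroad over the last few service cycles, and that a fallback estimate is used when no vehicle was served. The class name, method names, window length and counts are hypothetical, and the code does not reproduce (18) exactly.

```python
from collections import deque


class TurningFractionEstimator:
    """Estimates turning fractions for one inroad from recent cycle counts."""

    def __init__(self, outroads, window, fallback):
        self.outroads = list(outroads)
        # Only the last `window` service cycles are retained.
        self.history = deque(maxlen=window)
        # Estimate to use when no vehicle was served in the window.
        self.fallback = dict(fallback)

    def record_cycle(self, counts):
        """`counts` maps each outroad to the number of vehicles observed
        turning into it during one service cycle."""
        self.history.append({o: counts.get(o, 0) for o in self.outroads})

    def estimate(self):
        totals = {o: sum(c[o] for c in self.history) for o in self.outroads}
        served = sum(totals.values())
        if served == 0:
            return dict(self.fallback)
        return {o: totals[o] / served for o in self.outroads}


# Hypothetical inroad with two outroads and a window of two cycles.
estimator = TurningFractionEstimator(
    ["left", "straight"], window=2, fallback={"left": 0.5, "straight": 0.5}
)
initial = estimator.estimate()  # no data yet, so the fallback is returned
estimator.record_cycle({"left": 2, "straight": 8})
estimator.record_cycle({"left": 1, "straight": 9})
estimator.record_cycle({"left": 3, "straight": 7})  # evicts the first cycle
fractions = estimator.estimate()
```

Using a bounded `deque` keeps the memory per inroad constant and makes the estimator depend only on locally measured counts.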
Appendix B Proof of the main stability result
In this section we prove Theorem 1. In order to do that, we have to clarify some assumptions made about the stochastic elements of our model and provide some technical lemmas, which are proven in a supplementary document together with Proposition 1.
B.1 Assumptions

The number of cars that can be served from any inroad within a traffic cycle is bounded,
(20) 
is stationary and independent of queue lengths and the number of cars served at each queue for all .

The matrix is invertible. Thus we have, for
(21) (22) 
The number of arrivals is independent of the state of the queues in the road traffic network. Thus the average arrival rate into each junction can be defined as
(23) 
The error term in the queue size measurement, is bounded, i.e. (24)
B.2 Lemmas
In this section, we will prove a number of additional lemmas that are required for the main proofs. The first lemma describes the difference of the weights caused by the error in measurement, whereas the second describes a general result on weights. The third lemma is a consequence of them, introducing a bound which is used multiple times in proving later statements.
Lemma 1.
(25)
where is defined according to (6).
Proof.
Lemma 2.
Given weights with elements indexed by finite set , we consider a random variable with the following probability of event :
(27) 
then, the expected value of the weights under this distribution obeys the following inequality
(28) 
Proof.
In the following inequality, we note that the entropy of a distribution is maximized by a uniform distribution on , .
as required. ∎
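The entropy argument can be checked numerically. The sketch below rests on two assumptions: that the distribution in (27) assigns each event a probability proportional to the exponential of its weight, and that the bound in (28) states that the expected weight is at least the maximum weight minus the logarithm of the number of events. The function names and test weights are hypothetical.

```python
import math


def softmax(weights):
    """Probabilities proportional to exp(weight), computed stably."""
    shift = max(weights)
    exps = [math.exp(w - shift) for w in weights]
    normaliser = sum(exps)
    return [e / normaliser for e in exps]


def expected_weight(weights):
    """Expected weight under the assumed exponential distribution."""
    probs = softmax(weights)
    return sum(p * w for p, w in zip(probs, weights))


def entropy_bound(weights):
    """Assumed lower bound: maximum weight minus log of the set size."""
    return max(weights) - math.log(len(weights))


# Check the assumed inequality on a few hypothetical weight vectors.
examples = [[3.0, 3.0, 3.0], [0.0, 1.0, 5.0], [-2.0, 4.0], [10.0]]
holds = all(
    expected_weight(w) >= entropy_bound(w) - 1e-12 for w in examples
)
```

Under these assumptions the inequality follows because the expected weight equals the log-normaliser minus the entropy of the distribution, the log-normaliser is at least the maximum weight, and the entropy is at most the logarithm of the set size.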
Lemma 3.
(29) 
Proof.
Lemma 4 and Lemma 5 introduce bounds on the increments of the square of the queue sizes and their conditional expectations.
Lemma 4.
There exists a constant such that our queue size process, (5), obeys the bound
(31) 
Proof.
Firstly, the following bound holds for the queue size process, (5).
(32) 
Let’s consider the two cases above. Firstly, if then, and according to the above bound, we have
(33)  
(34)  
In the final inequality, we use the fact that the term, (33), is bounded by the term (34).
Secondly, if then, according to (32),
Thus defining
we see that in both cases, above, we have the required bound
∎
Lemma 5.
There exists a constant such that the following inequality holds
(35) 
Proof.
First let us suppose that the queue has been empty over the last time steps. Then, since a bounded number of cars arrive at the queue per traffic cycle, the queue size must be less than . Clearly the above bound holds for any, . Now let us suppose , we can take out the conditional expectation because it is known
The proportion of traffic is independent of and of . So the expectation of given (and ) is its mean . So
Also (3) implies
Also since is an unbiased estimate of at time , and independent of by assumption,
Substituting this all back in, we have
thus the above inequality also holds in the case as required. ∎
Lemma 6 indicates an allowed reordering of terms.
Lemma 6.
The following equality holds for each measured queue size vector,
Proof.
Although the following set of equalities is somewhat lengthy, the premise is fairly simple. We want to change the order of summation so that we first sum over junctions instead of first summing over inroads . These manipulations are as follows