Measurement-wise Occlusion in Multi-object Tracking
Handling object interaction is a fundamental challenge in practical multi-object tracking, even for simple interactive effects such as one object temporarily occluding another. We formalize the problem of occlusion in tracking with two different abstractions. In object-wise occlusion, objects that are occluded by other objects do not generate measurements. In measurement-wise occlusion, a previously unstudied approach, all objects may generate measurements but some measurements may be occluded by others. While the relative validity of each abstraction depends on the situation and sensor, measurement-wise occlusion fits into probabilistic multi-object tracking algorithms with much looser assumptions on object interaction. Its value is demonstrated by showing that it naturally derives a popular approximation for lidar tracking, and by an example of visual tracking in image space.
Recent applications of robotics, such as intelligent consumer vehicles, require an understanding of their surroundings on par with a human’s. This is currently achieved via maximizing information intake at all times, combining high-resolution sensors like multi-laser rotational lidars with powerful computers and substantial context such as 3D maps. Lower-resolution sensors and weaker computation could perhaps achieve the necessary level of understanding at a lower cost, but require a system that accurately and completely handles any uncertainties. The framework of multi-object tracking achieves this by modeling the environment as a set of objects whose presence, location, and characteristics follow potentially interdependent probability distributions. A carefully designed model can intrinsically perform complex tasks such as combining information from different points of view, correctly reasoning about yet-undetected objects, and quantifying uncertainty in its predictions.
Not every property of real multi-object systems can be easily formulated in this framework. For example, the majority of models treat the motion of each object as independently distributed, though many tracking applications feature objects that dynamically interact, for instance by following each other. Similarly, these models do not always enforce inter-object constraints, such as the requirement that two objects cannot occupy the same space, though there are some ways to implement such constraints. Multi-object tracking models also typically assume that sensory information is the accumulation of individual information from each object within the sensor’s view. In practice, measurements may be a more complex result of several nearby objects. The clearest example of this is termed occlusion: sensors relying on line-of-sight will not receive information from objects that are behind other objects.
Occlusion is a simple concept but has no standard treatment for multi-object trackers. Offline visual tracking techniques often treat occlusion as an unavoidable source of failure and focus on correctly identifying objects upon reappearance [2, 3]. Alternatively, they utilize features that distinguish each object and rely on warning signs to detect occlusion in advance. Occupancy grids are a class of multi-object tracking algorithms that forgo representation of distinct objects and instead model a region of space. A grid of adequate resolution is usually more computationally expensive than a similar multi-object tracker, but grids have the advantage of easily incorporating occlusion and other interaction effects. Recent research has applied theory from object tracking to grids and learned grid trackers with techniques from computer vision. Finally, occlusion has been incorporated into the framework of set-theoretic multi-object tracking. Prior work has focused on one representation of occlusion and run into limitations, typically resorting to handmade approximations. Section III covers the framework of multi-object tracking, and section IV discusses ways to incorporate occlusion into this framework, with the final sections providing two use cases. But first we differentiate approaches to modeling occlusion with a simple example.
II Four Square Example
This example uses a discrete space with up to two objects and measurements. As shown in Figure 1, one object is guaranteed to be present and has an equal chance to exist in either the bottom left or bottom right square. The other object has an equal chance of being present or not present, and if present it has an equal chance of being in the top right or the top left square. A present object has a 50% chance of generating a measurement in the same square, a 25% chance of generating a measurement in the wrong square due to hypothetical sensor error, and a 25% chance of not generating a measurement due to sensor failure. There are no false positives in this example, i.e. a row without an object will not have any measurements. Figure 1 displays this model, with measurements denoted as red boxes. Because each object and measurement are consigned to separate rows, if there is no occlusion then the prior, measurement, and posterior distributions can be handled separately for each object. Several possible measurement outcomes are shown in Figure 1, and the posterior estimate of the objects given each outcome is shown in the “No Occlusion” rows of Table I.
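The per-row posterior without occlusion can be checked with a few lines of Bayes' rule. The following is a minimal sketch, not the authors' code: the function names and the dictionary encoding of states (`None` for an absent object, `'L'` and `'R'` for the two squares) are assumptions made for illustration, and only the probabilities come from the text above.

```python
from fractions import Fraction as F

# Sensor model from the example: correct square, wrong square, or no measurement.
P_CORRECT, P_WRONG, P_MISS = F(1, 2), F(1, 4), F(1, 4)

def row_likelihood(meas, state):
    """P(measurement | object state) for one row; both are 'L', 'R' or None."""
    if state is None:                      # no object, and no false positives
        return F(1) if meas is None else F(0)
    if meas is None:
        return P_MISS
    return P_CORRECT if meas == state else P_WRONG

def row_posterior(prior, meas):
    """Bayes' rule over the states of one row: absent (None), 'L', 'R'."""
    joint = {s: p * row_likelihood(meas, s) for s, p in prior.items()}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

top_prior = {None: F(1, 2), 'L': F(1, 4), 'R': F(1, 4)}
bottom_prior = {None: F(0), 'L': F(1, 2), 'R': F(1, 2)}

# Top row with no measurement: existence probability drops from 1/2 to 1/5.
print(1 - row_posterior(top_prior, None)[None])
# Bottom row with a measurement on the left: 2/3 chance the object is on the left.
print(row_posterior(bottom_prior, 'L')['L'])
```

Because the rows are independent in the absence of occlusion, the joint posterior is simply the product of these two per-row posteriors.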
We next assume that the object in the bottom row may occlude the top one. This example displays a common motivation for tracking under occlusion: to determine the presence and rough location of objects behind currently tracked objects. We first follow the traditional representation of occlusion: if the top object is behind the bottom object, it cannot generate any measurement. This naturally leads to a different posterior estimate, not only for the top object’s existence but also for the expected positions of both objects. For instance, the probability of the bottom object being in the left square given outcome D is much lower, because an object in the bottom left square would occlude the object creating a measurement in the top left square.
In the second representation of occlusion, the placement of objects is irrelevant, but a measurement in the bottom row renders a top measurement in the same column invisible. We refer to the first representation as object-wise occlusion, and the second as measurement-wise occlusion. Despite having a similar base concept, they can ultimately have distinct effects on the posterior estimation of either object. Figure 2 lists the outcomes of this example that are considered impossible by either representation. Table I includes results from both types of occlusion, which can lead to significantly different conclusions. Note that a posterior cannot be derived for measurement set E with measurement-wise occlusion, because such a measurement set is considered impossible.
Table I: Posterior estimates of P(top object exists), P(top object on left if exists), and P(bottom object on left), given the measurements from Figure 1.
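The difference between the two representations can be explored by brute-force enumeration of the four-square example. The sketch below is illustrative only: the model names `'none'`, `'object'` and `'measurement'`, the helper functions, and the chosen measurement sets are assumptions, and the measurement sets are not claimed to correspond to the lettered outcomes of Figure 1.

```python
from fractions import Fraction as F
from itertools import product

P_CORRECT, P_WRONG, P_MISS = F(1, 2), F(1, 4), F(1, 4)
TOP_PRIOR = {None: F(1, 2), 'L': F(1, 4), 'R': F(1, 4)}
BOTTOM_PRIOR = {'L': F(1, 2), 'R': F(1, 2)}
MEAS = (None, 'L', 'R')

def meas_prob(meas, state):
    """P(raw measurement | object state) before any occlusion is applied."""
    if state is None:
        return F(1) if meas is None else F(0)
    if meas is None:
        return P_MISS
    return P_CORRECT if meas == state else P_WRONG

def joint(model):
    """Map (top, bottom, visible top meas, bottom meas) to its probability."""
    table = {}
    for top, bot, zt, zb in product(TOP_PRIOR, BOTTOM_PRIOR, MEAS, MEAS):
        p = TOP_PRIOR[top] * BOTTOM_PRIOR[bot] * meas_prob(zb, bot)
        if model == 'object' and top is not None and top == bot:
            # Object-wise: the top object is behind the bottom one, no measurement.
            p *= F(1) if zt is None else F(0)
            seen = zt
        else:
            p *= meas_prob(zt, top)
            # Measurement-wise: a bottom measurement hides a top one in its column.
            hidden = model == 'measurement' and zt is not None and zt == zb
            seen = None if hidden else zt
        key = (top, bot, seen, zb)
        table[key] = table.get(key, F(0)) + p
    return table

def posterior(model, top_meas, bottom_meas):
    """Return (P(top exists), P(bottom on left)), or None if the set is impossible."""
    rows = {k: p for k, p in joint(model).items() if k[2:] == (top_meas, bottom_meas)}
    total = sum(rows.values())
    if total == 0:
        return None
    exists = sum(p for k, p in rows.items() if k[0] is not None) / total
    left = sum(p for k, p in rows.items() if k[1] == 'L') / total
    return exists, left

for model in ('none', 'object', 'measurement'):
    print(model, posterior(model, 'L', 'R'), posterior(model, 'L', 'L'))
```

With a top measurement on the left and a bottom measurement on the right, this sketch gives P(bottom on left) = 1/5 under object-wise occlusion and 1/3 under measurement-wise occlusion, while two measurements in the same column are impossible under measurement-wise occlusion.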
Which representation is more valid? For a highly accurate sensor, the outcomes for which object-wise and measurement-wise occlusion differ would rarely occur and the difference becomes trivial. Sensors for which the object-wise representation is better suited include:
- Sensors that generate a small number of point measurements per object, such as post-processed radar. Even if clustering is used to match one measurement group per object, the definition of a measurement-wise occlusion would be complex and case-specific. Radar, however, requires a complex formulation of occlusion in the first place due to its reflective tendency [7, 8].
- Computer vision algorithms that can infer the overall position of an object based on individual parts, especially when the occluding objects are nonconvex shapes such as humans. Deformable part-based models are an example. Figure 3 shows an example image where a moderately overlapping person was detected distinctly. Once again, the nature of occlusion for this type of sensor is quite complex.
Sensors for which the measurement-wise representation may be more valid include:
- Sensors that give unprocessed, high-resolution information, such as scanning lasers (lidar). These sensors give a fixed number of measurements at known angles, so any hypothetical measurement can only be occluded by a measurement at the same angle. The value of measurement-wise occlusion is especially clear for sensors whose sight is not parallel to the plane in which the objects move, such as rotational lidars placed on drones or the tops of vehicles. The probability of occlusion for each laser will depend on the height of each object, as well as any elevation or sensor tilt, whereas measurement-wise occlusion can be reasoned about with only a measured range value. We show in section V that some object-wise approximations for lidar tracking can be handled directly with measurement-wise occlusion.
- Computer vision algorithms that utilize non-maximum suppression (NMS). Many computer vision techniques give multiple small or overlapping detection responses for a single object. NMS removes or merges overlapping detections to address this problem, at the cost of potentially removing detections of different, nearby objects. In other words, it is an intentional implementation of measurement-wise occlusion. Occlusion-sensitive versions of NMS have been studied, but to our knowledge have not been heavily adopted. The right side of Figure 3 shows detections from a deep-learning vision algorithm that has utilized NMS.
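To make the link between NMS and measurement-wise occlusion concrete, the following is a minimal sketch of greedy NMS. The box format `(x1, y1, x2, y2)`, the 0.5 threshold, and the example boxes are assumptions for illustration; real detectors may use different variants.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, threshold=0.5):
    """Keep boxes in order of score, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in kept):
            kept.append(i)
    return kept

# Box 1 duplicates box 0; box 2 is a hypothetical second person standing close by.
boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (3, 0, 13, 20), (40, 0, 50, 20)]
scores = [0.9, 0.6, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # box 2 is suppressed along with the duplicate
```

The suppression of box 2 depends only on the other detections, not on where the underlying objects are, which is the defining feature of measurement-wise occlusion.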
Ultimately, either approach is a simplification of the complex or possibly unknown true behavior of a sensor. The next sections show how these occlusion methods can be implemented for multi-object tracking.
III Tracking Framework
This section briefly describes multi-object tracking, omitting steps that are unaffected by occlusion such as prediction and object creation and removal. $X$ is a set of objects that is distributed according to a set probability density function $p(X)$. Similarly, $Z$ is a set of measurements $z$, generated from $X$ by the likelihood function $p(Z|X)$. The goal of multi-object tracking is to determine, or approximate, the posterior distribution $p(X|Z)$. This parallels the goal of single-object tracking to determine:
$$p(x|z) = \frac{p(z|x)\,p(x)}{\int p(z|x')\,p(x')\,dx'}$$
and in fact multi-object models are designed to utilize similar pairwise object-measurement relationships. We adopt the disjoint union notation of , in which the probability of a finite, unordered set can be written as a sum of permutations across a fixed-size, ordered list of disjoint subsets.
The notation $X_1 \uplus X_2 = X$ means that $X_1 \cup X_2 = X$, $X_1 \cap X_2 = \emptyset$. This notation has not been widely adopted but offers several conveniences. For instance, probabilities over the superposition of two sets can be cleanly written.
III-A Object Models
The object distribution is chosen based on descriptive power, as well as conjugacy with the measurement likelihood. For instance, the multi-Bernoulli distribution (and the equivalent classical filter JIPDA) describes a set of potential objects, each with an independent probability of existing and an independent state distribution.
We use the multi-Bernoulli distribution as an example for the rest of the paper, on the grounds that other distributions have similar forms and reach similar posterior distributions (in the respects that are relevant to occlusion). For instance, the multi-Bernoulli mixture filter uses a mixture of multi-Bernoulli distributions, the labeled MB and GLMB filters have similar forms, and all of the above can be combined with an independent Poisson point process to smoothly handle object appearances [14, 12].
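As a small illustration of the multi-Bernoulli model, the sketch below computes the distribution of the number of existing objects and draws a random object set. The function names and the representation of a component as an `(existence probability, state sampler)` pair are assumptions, not notation from the paper.

```python
import random

def cardinality_pmf(existence_probs):
    """P(number of existing objects = n) for independent Bernoulli components."""
    pmf = [1.0]
    for r in existence_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for n, p in enumerate(pmf):
            nxt[n] += p * (1.0 - r)      # this component does not exist
            nxt[n + 1] += p * r          # this component exists
        pmf = nxt
    return pmf

def sample_objects(components, rng):
    """Draw one object set; components is a list of (existence prob, sampler)."""
    return [draw(rng) for r, draw in components if rng.random() < r]

# The four-square example: the bottom object always exists, the top one half the time.
print(cardinality_pmf([1.0, 0.5]))

rng = random.Random(0)
components = [(1.0, lambda g: g.choice(['bottom-left', 'bottom-right'])),
              (0.5, lambda g: g.choice(['top-left', 'top-right']))]
print(sample_objects(components, rng))
```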
III-B Measurement Model
Many sensors return a single measurement corresponding to each successfully detected object. This is represented by a single-measurement likelihood and an object-dependent detection probability. Additionally, sensors may return false positive measurements, which are typically assumed to be Poisson distributed with a given generation rate and spatial distribution. These assumptions are referred to as the standard measurement model and can be fully written as
Note that any number of objects may be assigned to the null measurement (undetected), and likewise any number of measurements may be false positives. The joint probability of the multi-bernoulli object model and the standard measurement model can be factored into a convenient form by rearranging the association variables.
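The generative side of the standard measurement model can be sketched as a short simulator. Everything specific here is an assumption for illustration: scalar object states, Gaussian measurement noise, uniformly distributed false positives over `extent`, and the function names themselves.

```python
import math
import random

def poisson(rate, rng):
    """Draw a Poisson count with Knuth's method; adequate for small rates."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_measurements(objects, p_detect, noise_std, clutter_rate, extent, rng):
    """Each object yields at most one measurement; clutter is Poisson and uniform."""
    measurements = []
    for x in objects:
        if rng.random() < p_detect(x):            # object-dependent detection
            measurements.append(rng.gauss(x, noise_std))
    for _ in range(poisson(clutter_rate, rng)):   # false positives
        measurements.append(rng.uniform(*extent))
    rng.shuffle(measurements)                     # the measurement set is unordered
    return measurements

rng = random.Random(0)
print(simulate_measurements([2.0, 7.5], lambda x: 0.9, 0.2, 1.0, (0.0, 10.0), rng))
```

The final shuffle reflects the fact that the tracker does not know which measurement came from which object, which is why inference must sum over associations.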
The association variable is matrix-shaped, relating Bernoulli components to measurements.
The posterior distribution of is a mixture of multi-bernoulli distributions. The number of components in the mixture is equal to the number of possible associations, so in practice approximations of this form are used. The marginal distribution of is also evident from (8), and can be thought of as the marginal of a function over associations . Calculation or approximation of this marginal probability can be performed in several ways, for instance using graphical techniques .
Some sensors, such as scanning lidars or computer vision techniques that collect simple features, instead generate a fixed number of measurements with an arbitrary number detecting any one object. These sensors could be described by applying the standard measurement model to each measurement separately, and assuming that at most one measurement is viewed for any given model. The separable likelihood model [16, 17] combines this framework with the assumption that objects are easily separable in the measurement space. It can thus consider the measurement-object matchings as predetermined. Other non-standard models parametrize the rate at which an object creates measurements . These models are not covered further because, as mentioned before, measurement-wise occlusion is difficult to formulate for such sensors. Intuitively, the standard and separable likelihood models enforce that the set of measurements is a collection of separate pieces of information about individual objects, with uncertainty only in the completeness and association of this information. Certain formulations of occlusion can threaten this assumption.
While discussed heavily in the design of practical multi-target trackers, the phenomenon of occlusion has not (to our knowledge) been formally defined for random sets. We start with a random set which follows some distribution . Occlusion divides the original set into two disjoint sets: the visible set and the occluded set . At its most general, a probabilistic occlusion model could be written
where the values represent whether or not a particular element was occluded (we do not strictly define these as a random variable, just as a useful symbol). We next define restricted occlusion, in which only visible objects impact the occlusion of other objects:
This assumption may not always be realistic: for instance, some sensors may miss objects that are partially occluded even as those objects occlude others, as illustrated in Figure 4. This is however a reasonable assumption in many cases, and is useful for straightforward inference. An even stricter form of occlusion is static occlusion:
In this case, no object affects another object’s probability of occlusion. This is valid when the causes of occlusion are known rather than being tracked, and is approximately valid when they are tracked very accurately.
IV-A Object-wise Occlusion
Object-wise occlusion dictates that from the tracked object set , only a subset of objects are actually capable of generating measurements. Static occlusion in particular can be incorporated into the multi-bernoulli distribution.
The joint probability given the standard measurement model is of the same form as (8), with the following modifications.
It is clear that incorporating static object-wise occlusion in a tracking model is equivalent to modifying the probability of detection to . However, the general and even restricted occlusion models are difficult to formulate in such a way: will no longer simply be a product of individual likelihoods for each permutation.
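A minimal sketch of this equivalence follows, assuming the modified detection probability is the original detection probability multiplied by the probability of not being occluded (the exact expression is an assumption here). The function names are hypothetical.

```python
def effective_detection_probability(p_detect, p_occluded):
    """Assumed form: an object is detected only if it is visible AND detected."""
    return p_detect * (1.0 - p_occluded)

def missed_existence_update(r, p_detect, p_occluded):
    """Posterior existence probability of a Bernoulli component with no measurement."""
    p_eff = effective_detection_probability(p_detect, p_occluded)
    return r * (1.0 - p_eff) / (1.0 - r + r * (1.0 - p_eff))

# Four-square numbers: existence 0.5, detection probability 0.75 (0.5 + 0.25).
print(missed_existence_update(0.5, 0.75, 0.0))  # unoccluded: a miss is informative
print(missed_existence_update(0.5, 0.75, 1.0))  # surely occluded: a miss says nothing
```

When the object is certainly occluded, the missed detection carries no information and the existence probability stays at its prior value.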
Thus trackers use the static occlusion model and alter each object’s detection probability, even when the probability of occlusion is highly dependent on nearby objects. The marginal occlusion probability is the logical choice for a static occlusion term. One method accurately solves for the probability of occlusion between two rectangular objects tracked by a line-of-sight sensor, by calculating both the mean and variance of each object’s angular span and assuming they are independently distributed. However, this method cannot handle an object that is partially occluded by multiple objects, jointly resulting in a full occlusion. In such situations, the joint probability of visibility for a given object is estimated as the product of these pairwise occlusion probabilities. Approximately ellipsoidal objects have been handled in a similar way. Another method uses the mean position of each object to approximate a static occlusion model, but calculates the joint probability of occlusion by making a miniature grid across the visible perimeter of the rectangle. For a sensor that can handle partial occlusions well, the probability of visibility for the object is the maximum probability of visibility in this grid. This method also uses an exponential weighting to calculate the probability of occlusion for each grid point, to mitigate the inaccuracy of the expected-value approximation. Other practical algorithms perform deterministic checks for occlusion, assuming that the high accuracy of their sensory data keeps approximation error low.
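The limitation of pairwise reasoning, and how a grid over the object avoids it, can be seen in a toy calculation. This sketch is an interpretation under stated assumptions: visibility is taken as the product of pairwise non-occlusion probabilities, occluders are treated as independent, and the function names and the four-point grid are invented for illustration.

```python
import math

def visibility_pairwise(full_occlusion_probs):
    """Product approximation; each entry is P(one occluder alone hides the object)."""
    return math.prod(1.0 - p for p in full_occlusion_probs)

def visibility_grid(cover_probs):
    """cover_probs[k][j] = P(occluder j hides grid point k of the object).

    The object counts as visible if its most visible grid point is visible,
    which suits a sensor that handles partial occlusion well.
    """
    point_visible = [math.prod(1.0 - p for p in row) for row in cover_probs]
    return max(point_visible)

# Two occluders that each hide one half of the object with certainty.
cover = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(visibility_pairwise([0.0, 0.0]))  # neither occluder alone hides the object
print(visibility_grid(cover))           # jointly, every grid point is hidden
```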
IV-B Measurement-wise Occlusion
Intuitively, measurement-wise occlusion should only affect the probability that an object did not generate one of the visible measurements. Specifically, each object term in the standard measurement model (6) could be modified to:
adding in the probability that object generates a measurement that was occluded by the visible measurements . This result is in fact obtained under restricted occlusion, regardless of the visibility model . The proof uses a convenient property of integration on disjoint sets, proven in  section 3.5.3.
The standard measurement model with restricted measurement-wise occlusion can be written:
We are only interested in the probability of the observed measurements , and so integrate out of each term.
Functionally, the only change to the multi-bernoulli joint distribution is an addition to term (10).
In addition to this change, the measurement model is multiplied by a constant exponential term corresponding to occluded false positives, and by . In restricted object-wise occlusion, would complicate inference by adding inter-object dependencies. In measurement-wise occlusion, the visible measurements are known and so this term is irrelevant to calculation of the posterior.
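The modified term can be illustrated in a discrete measurement space, where the integral over occluded measurements becomes a sum. This is a sketch under assumptions: the function name, the dictionary encoding, and the reading of the term as "missed detection plus detected-but-occluded" are illustrative rather than the paper's exact expression.

```python
def undetected_term(p_detect, meas_likelihood, occlusion_prob):
    """Probability that an object contributes none of the visible measurements.

    meas_likelihood: {cell: P(z = cell | x, detected)}
    occlusion_prob:  {cell: P(a measurement at cell is hidden | visible measurements)}
    """
    hidden = sum(p * occlusion_prob.get(z, 0.0) for z, p in meas_likelihood.items())
    return (1.0 - p_detect) + p_detect * hidden

# Four-square numbers for a top object in the left square: detected with
# probability 0.75, and a detection lands on the left 2/3 of the time.
likelihood = {'L': 2.0 / 3.0, 'R': 1.0 / 3.0}
# A visible bottom measurement on the left hides the top-left cell.
print(undetected_term(0.75, likelihood, {'L': 1.0}))
# With no visible bottom measurement, only sensor failure explains a miss.
print(undetected_term(0.75, likelihood, {}))
```

Note that `occlusion_prob` depends only on the visible measurements, which are known, so the term can be computed for each object separately.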
V Separable Likelihood Application
Section II argued that measurement-wise occlusion is a realistic choice for scanning line-of-sight sensors. Here the potential simplicity of its application is demonstrated. For these sensors, the standard measurement likelihood for each measurement can be written separately:
This method has been utilized by [7, 17, 24] to track vehicles using horizontally scanning lidar. Each work designed a measurement likelihood that was resistant to occlusion between well-separated objects. As shown in Figure 5, measurements near the hypothesized vehicle were highly likely, measurements slightly farther away were highly unlikely, and measurements significantly closer to the sensor were given a moderate, uniform likelihood. Alternatively, consider a deterministic restricted measurement-wise occlusion model where any measurement occludes all measurements with a higher distance. If objects are separated in distance enough that any given measurement is much more likely to have been generated from one object (or be a false positive) than the others, then the multi-bernoulli separable-measurement joint distribution can be simplified greatly.
Where and were defined in (9) and (10). Measurement-wise occlusion gives the properties desired by [7, 17, 24], without their constraints on the measurement likelihood. This permits, for instance, separable-likelihood tracking using Kalman or Rao-Blackwellized filters. Relaxing some of the assumptions, such as separable false positives or the deterministic nature of the occlusion, will still result in a tractable multi-bernoulli mixture posterior, though not necessarily a singular multi-bernoulli.
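For a single lidar beam with a deterministic range-based occlusion rule, the two weights involved have a closed form when the range likelihood is Gaussian. The sketch below assumes exactly that; the Gaussian likelihood, the function names, and the example numbers are illustrative assumptions rather than the models used in the cited works.

```python
import math

def normal_pdf(x, mean, std):
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def beam_terms(z, expected_range, std, p_detect):
    """Return (hit, miss) weights for one object on one beam with measured range z.

    hit:  the object generated the visible measurement z.
    miss: the object was undetected, or its return lay beyond z and was occluded.
    """
    hit = p_detect * normal_pdf(z, expected_range, std)
    beyond = 1.0 - normal_cdf(z, expected_range, std)
    miss = (1.0 - p_detect) + p_detect * beyond
    return hit, miss

for z in (5.0, 20.0, 35.0):   # closer than, at, and beyond a vehicle expected at 20 m
    print(z, beam_terms(z, expected_range=20.0, std=0.5, p_detect=0.9))
```

A measurement well in front of the hypothesized vehicle leaves a miss weight near one (the vehicle is simply hidden), while a measurement well behind it leaves only the missed-detection probability, which reproduces the qualitative shape described for Figure 5.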
VI Visual Tracking Application
Table II: Pedestrian tracking results for each occlusion model. Columns: Occlusion, MOTA, MOTP, IDF1, Mostly Tracked, Mostly Lost, FP, FN, # Switches, GOSPA, Cardinality.
To demonstrate the value of occlusion-aware tracking beyond simple LOS sensors, we track pedestrians in the fourth video from the 2017 Multi-Object Tracking Benchmark using the supplied bounding boxes from the Faster-RCNN detector. These detections have a very low false positive rate but can miss partially occluded people, possibly due to heavy non-maximum suppression. This video is a challenging test of occlusion reasoning. There are many cases of pedestrians occluded by single other pedestrians, groups of other pedestrians, and also street lights and other stationary objects whose existence is not known by the tracker.
The bounding boxes of each person are tracked in image space, in which horizontal and vertical location and size are the features. Occlusion is likely if the overlap between boxes, for instance measured by the intersection area over total area, is high. This representation provides no natural ordering of occlusion for objects, unlike in a ground-plane setting where the relative distance to the sensor distinguishes occluding and occluded objects/measurements. We use two techniques to determine order of occlusion: first we assume that measurement boxes can only be occluded by measurement boxes whose bottom is lower than theirs. For right-side-up cameras detecting grounded objects, this emulates a distance-based ordering. To promote stability in the order of occlusion, each object is given a fifth feature, occludability. An object with a 95% occludability has a 95% chance of generating an occludable measurement, which may or may not actually be occluded by another measurement, and a 5% chance of generating a measurement which cannot be occluded no matter where it is. Given that only occludable measurements can be occluded, the posterior occludability inherently increases for undetected objects and is unchanged for detected objects. In the prediction step, occludability is slowly mixed to its equilibrium value. This approach to occlusion is applied to a measurement-wise tracker and to an object-wise tracker using an expected-value approximation. The same tracker is also run without occlusion reasoning.
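The occludability feature can be sketched as a small Bayesian update. The sketch below is an interpretation, not the authors' implementation: the function names, the box format, the equilibrium value of 0.95 and the mixing rate are all assumptions, and the detected case simply follows the statement above that occludability is unchanged.

```python
def may_occlude(front_box, back_box):
    """Boxes are (left, top, right, bottom) with y increasing downward.

    A box can only be occluded by a box whose bottom edge is lower in the image.
    """
    return front_box[3] > back_box[3]

def occludability_posterior(occludability, p_detect, p_occluded, detected):
    """Update P(object generates occludable measurements) after one frame."""
    if detected:
        return occludability                       # unchanged, following the text
    miss_if_occludable = (1.0 - p_detect) + p_detect * p_occluded
    miss_if_not = 1.0 - p_detect
    num = occludability * miss_if_occludable
    return num / (num + (1.0 - occludability) * miss_if_not)

def predict_occludability(occludability, equilibrium=0.95, rate=0.1):
    """Prediction step: mix slowly back toward the (assumed) equilibrium value."""
    return occludability + rate * (equilibrium - occludability)

o = occludability_posterior(0.95, p_detect=0.9, p_occluded=0.5, detected=False)
print(o, predict_occludability(o))
```

An undetected object becomes more likely to be occludable because an occluded measurement is one more way to explain the miss.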
The object state (sans occludability) is normally distributed, with single-object tracking carried out by a standard Kalman filter. The Poisson multi-Bernoulli filter was used as the multi-object framework, with merging by track so that object labels were kept consistent. The data association step was achieved with loopy belief propagation. For implementation, a fixed array of 2048 normal components and an array of 72 object labels were used. The most likely 2048 components from each update step were kept. Likewise, the most likely 72 objects were kept while the others were ‘recycled’ as unlabeled, Poisson-distributed components. Highly similar components in the same object were located via kd-trees and trivially merged by pooling their existence probability. New pedestrians entering the scene are assumed to be Poisson-generated at the edges of the image.
Table II shows the accuracy and precision scores used by the MOT benchmark for labeled tracking evaluation, as well as the generalized optimal subpattern assignment metric (GOSPA) and ratio of difference in total cardinality as unlabeled performance indicators. Arrows by each metric name indicate the direction of higher performance. Both labeled and unlabeled multi-object metrics require a base single-object metric: bounding box intersection-over-union was chosen as in the MOT15-17 benchmarks, but with a looser cutoff such that any degree of intersection is considered a possible match. The bounding boxes in video 4 are smaller than most in MOT17, and occluded individuals moving in crowded areas would be extremely difficult to match with the standard requirement of 0.5 IoU. As the primary application of the MOT benchmark is consistent post-processed labeling, its standard scoring code removes a significant number of individuals that are heavily occluded or unmoving at each time. We include all of these individuals as our goal is to track temporarily occluded objects.
While no tracker has excellent results, the occlusion-equipped models outperform the baseline model by most metrics. The two snapshots of the video in Figure 6 show the raw F-RCNN detections in magenta and the hypothesized objects in blue. The crowd in the upper left is not resolved (some individuals here are not detected throughout the video), but the two occlusion cases in the center are easily resolved based on the past positions of these individuals. The approximate object-wise tracker outperforms the measurement-wise tracker, especially in identity switches. It is possible that violation of the restricted occlusion assumption, by the undetected stationary obstacles, significantly impacts the measurement-wise tracker. Figure 7 shows a case where one person is occluded by another, who proceeds to be occluded by a light pole.
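For readers unfamiliar with GOSPA, a brute-force sketch for small sets is given below. It assumes the common parameter choice alpha = 2, a symmetric base distance, and scalar states in the usage example; the function name and defaults are illustrative, and the factorial cost makes it unsuitable for large sets.

```python
from itertools import permutations

def gospa(xs, ys, dist, c=1.0, p=2):
    """GOSPA with alpha = 2 by exhaustive search over assignments.

    A pair farther apart than the cutoff c costs the same as leaving both
    elements unassigned, so distances are simply clipped at c.
    """
    if len(xs) > len(ys):
        xs, ys = ys, xs                      # relies on dist being symmetric
    best = min(
        sum(min(dist(x, ys[j]), c) ** p for x, j in zip(xs, perm))
        for perm in permutations(range(len(ys)), len(xs))
    )
    unmatched = len(ys) - len(xs)            # missed or false objects
    return (best + (c ** p / 2.0) * unmatched) ** (1.0 / p)

truth = [0.0, 10.0]
estimate = [0.1]
print(gospa(truth, estimate, lambda a, b: abs(a - b)))
```

In the setting described above, `dist` would be one minus the bounding box intersection-over-union, with `c = 1` so that any overlapping pair is a possible match.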
The traditional formulation of occlusion in multi-object tracking is that objects block other objects from the sensor’s view, and that occluded objects generate no measurement. This is intuitive but creates object dependencies that make tracking intractable, so a variety of approximations have been proposed. We instead formally define occlusion as an operation on a random set and show that this operation can be applied to measurements as well as objects. This new approach, termed measurement-wise occlusion, is equally intuitive and fits tractably into the standard multi-object model with a loose restriction. It can be implemented with a simple additional step in any given multi-object tracking technique. We highlighted the practical value of this approach in two tracking applications where occlusion is a significant problem.
This work was supported by the Texas Department of Transportation under Project 0-6877 entitled “Communications and Radar-Supported Transportation Operations and Planning (CAR-STOP).”
- [1] M. Schreier, V. Willert, and J. Adamy, “Compact representation of dynamic driving environments for ADAS by parametric free space and dynamic object maps,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 2, pp. 367–384, Feb. 2016.
- [2] M. Betke and Z. Wu, “Data association for Multi-Object visual tracking,” Synthesis Lectures on Computer Vision, vol. 6, no. 2, pp. 1–120, Oct. 2016.
- [3] J. Scharcanski, A. B. de Oliveira, P. G. Cavalcanti, and Y. Yari, “A Particle-Filtering approach for vehicular tracking adaptive to occlusions,” IEEE Trans. Veh. Technol., vol. 60, no. 2, pp. 381–389, Feb. 2011.
- [4] A. Yilmaz, X. Li, and M. Shah, “Contour-based object tracking with occlusion handling in video acquired using mobile cameras,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, Nov. 2004.
- [5] D. Nuss, S. Reuter, M. Thom, T. Yuan, G. Krehl, M. Maile, A. Gern, and K. Dietmayer, “A random finite set approach for dynamic occupancy grid maps with Real-Time application,” arXiv preprint, May 2016.
- [6] F. Piewak, T. Rehfeld, M. Weber, and J. M. Zöllner, “Fully convolutional neural networks for dynamic object detection in grid maps,” in 2017 IEEE Intelligent Vehicles Symposium (IV), Jun. 2017, pp. 392–398.
- [7] A. Petrovskaya and S. Thrun, “Model based vehicle detection and tracking for autonomous urban driving,” Auton. Robots, vol. 26, no. 2-3, pp. 123–139, Apr. 2009.
- [8] A. Scheel and K. Dietmayer, “Tracking multiple vehicles using a variational radar model,” arXiv preprint arXiv:1711.03799, 2017.
- [9] T. Chen, R. Wang, B. Dai, D. Liu, and J. Song, “Likelihood-Field-Model-Based dynamic vehicle detection and tracking for Self-Driving,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 11, Nov. 2016.
- [10] S. Tang, M. Andriluka, and B. Schiele, “Detection and tracking of occluded people,” Int. J. Comput. Vis., vol. 110, no. 1, pp. 58–69, Oct. 2014.
- [11] A. Milan, L. Leal-Taixé, I. D. Reid, S. Roth, and K. Schindler, “MOT16: A benchmark for multi-object tracking,” CoRR, vol. abs/1603.00831, 2016. [Online]. Available: http://arxiv.org/abs/1603.00831
- [12] Á. F. García-Fernández, J. L. Williams, K. Granstrom, and L. Svensson, “Poisson multi-bernoulli mixture filter: direct derivation and implementation,” IEEE Transactions on Aerospace and Electronic Systems, 2018.
- [13] B. T. Vo and B. N. Vo, “Labeled random finite sets and Multi-Object conjugate priors,” IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3460–3475, Jul. 2013.
- [14] J. L. Williams, “Hybrid poisson and multi-bernoulli filters,” in Information Fusion (FUSION), 2012 15th International Conference on. IEEE, 2012, pp. 1103–1110.
- [15] J. Williams and R. Lau, “Approximate evaluation of marginal association probabilities with belief propagation,” IEEE Trans. Aerosp. Electron. Syst., vol. 50, no. 4, pp. 2942–2959, Oct. 2014.
- [16] B. N. Vo, B. T. Vo, N. T. Pham, and D. Suter, “Joint detection and estimation of multiple objects from image observations,” IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5129–5141, Oct. 2010.
- [17] A. Scheel, S. Reuter, and K. Dietmayer, “Using separable likelihoods for laser-based vehicle tracking with a labeled Multi-Bernoulli filter,” in 2016 19th International Conference on Information Fusion (FUSION), Jul. 2016, pp. 1200–1207.
- [18] C. Adam, R. Schubert, and G. Wanielik, “Radar-based extended object tracking under clutter using generalized probabilistic data association,” in 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), Oct. 2013, pp. 1408–1415.
- [19] K. Wyffels and M. Campbell, “Negative information for occlusion reasoning in dynamic extended multiobject tracking,” IEEE Trans. Rob., vol. 31, no. 2, pp. 425–442, Apr. 2015.
- [20] L. Lamard, R. Chapuis, and J. P. Boyer, “Dealing with occlusions with multi targets tracking algorithms for the real road context,” in 2012 Intelligent Vehicles Symposium, Jun. 2012.
- [21] K. Granström, S. Reuter, D. Meissner, and A. Scheel, “A multiple model PHD approach to tracking of cars under an assumed rectangular shape,” in 17th International Conference on Information Fusion (FUSION), Jul. 2014, pp. 1–8.
- [22] F. Liu, J. Sparbert, and C. Stiller, “IMMPDA vehicle tracking system using asynchronous sensor fusion of radar and vision,” in 2008 IEEE Intelligent Vehicles Symposium, Jun. 2008, pp. 168–173.
- [23] R. Mahler, Advances in statistical multisource-multitarget information fusion. Artech House, 2014.
- [24] A. Scheel, S. Reuter, and K. Dietmayer, “Vehicle tracking using extended object methods: An approach for fusing radar and laser,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 231–238.
- [25] A. S. Rahmathullah, Á. F. García-Fernández, and L. Svensson, “Generalized optimal sub-pattern assignment metric,” in 2017 20th International Conference on Information Fusion (Fusion), Jul. 2017, pp. 1–8.
Appendix A Highway Simulations
We also create a simple simulated highway to assess occlusion handling for tracking vehicles across multiple lanes (this section is not in the published version of this paper). The highway has four lanes, and in each lane vehicles move at a constant velocity on the center line, much like in the classic arcade game Frogger. A point sensor at the side of the highway views these vehicles. The vehicles’ widths are neglected, so their visibility depends entirely on their lane and relative angle from the sensor. For example, say there is a vehicle in the lane nearest the sensor with its back end directly in front of the sensor, and its front end at an angle $\theta$ ahead of the sensor. Under object-wise occlusion, any vehicles in further lanes whose front and back ends lie within angles $0$ and $\theta$ will be completely occluded. The sensor is assumed to recognize contiguous shapes, so measurement-wise occlusion operates similarly. Missed detections, false positives, and Gaussian noise are applied to the sensor output in addition to occlusion. Figure 8 visualizes a single timestep of this highway, with two possible random measurement sets corresponding to the two occlusion types.
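The geometric rule in this example can be sketched as an angular containment check. The coordinate convention (sensor at the origin, vehicle ends given as positions along the road, lanes given by their perpendicular distance), the function names, and the example numbers are assumptions for illustration, and only a single occluder is considered.

```python
import math

def angular_span(back, front, lane_distance):
    """Angles (radians) of a vehicle's two ends as seen from a sensor at the origin."""
    a = math.atan2(back, lane_distance)
    b = math.atan2(front, lane_distance)
    return min(a, b), max(a, b)

def is_fully_occluded(target, occluder):
    """Each vehicle is (back, front, lane_distance); widths are neglected."""
    if occluder[2] >= target[2]:
        return False                       # only nearer lanes can occlude
    t_lo, t_hi = angular_span(*target)
    o_lo, o_hi = angular_span(*occluder)
    return o_lo <= t_lo and t_hi <= o_hi

near = (0.0, 5.0, 3.0)                     # back end directly in front of the sensor
print(is_fully_occluded((1.0, 5.0, 6.0), near))    # span lies inside the near vehicle's
print(is_fully_occluded((8.0, 12.0, 6.0), near))   # front end extends past it
```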
A particle filter version of the track-oriented multi-bernoulli filter is used so that closed-form updates can be performed even for partially occluded measurements. Measurement-wise occlusion probabilities can also be determined exactly, while object-wise occlusion is approximated in two different ways. The first takes the expected value of potentially occluding objects and calculates the probability that each individually occludes the target object, then combines the individual probabilities with the softmax function as in . The second stores a grid representation of the sensor’s field of view, and updates the visibility of each cell in the grid based on vehicle positions. Simulation parameters such as the magnitude of measurement noise are known to the tracker. The tracker is run for 10000 timesteps, representing over half an hour of traffic at 5 timesteps per second.
Table III shows performance of each occlusion model in terms of average GOSPA per timestep. Euclidean distance in position and length is used as the base metric. The approximate object-wise occlusion trackers work equally well under either simulated form of occlusion, with the grid approximation outperforming the expected-value approximation. The measurement-wise occlusion tracker scores slightly lower (better) than the grid approximation when the simulated occlusion type matches its assumptions, and slightly higher when object-wise occlusion is simulated. It is worth noting that this simulation is simple enough that an accurate grid approximation can be applied in real time, while more complex applications may not be able to apply it as quickly. Expected-value approximations are fast, but perform worse than the measurement-wise tracker for both simulations. Code for the simulated tests and for the pedestrian tracking tests is available at https://github.com/utexas-ghosh-group/carstop/tree/master/MWO.