Image Moment Models for Extended Object Tracking
Abstract
In this paper, a novel image moments based model for shape estimation and tracking of an object moving along a complex trajectory is presented. The camera is assumed to be stationary and looking at a moving object. Point features inside the object are sampled as measurements. An ellipse is assumed as the primitive shape approximating the object, and the shape of the ellipse is estimated using a combination of image moments. The dynamic model of the image moments when the object moves under the constant velocity or coordinated turn motion model is derived for shape estimation of the object. An Unscented Kalman Filter Interacting Multiple Model (UKF-IMM) algorithm is applied to estimate the shape of the object (approximated as an ellipse) and track its position and velocity. A likelihood function based on the average log-likelihood is derived for the IMM filter. Simulation results of the proposed UKF-IMM algorithm with the image moments based models are presented, showing the shape estimates of objects moving along complex trajectories. Comparison results with a benchmark algorithm from the literature, using intersection over union (IoU) and position and velocity root mean square errors (RMSE) as metrics, are presented. Results on real image data captured from a quadcopter are also presented.
I Introduction
Traditional target tracking literature [1], including simultaneous localization and mapping (SLAM) [2, 3], structure from motion (SfM) [4, 5], and target tracking [6], models the targets as point targets. Once the estimation is performed, another layer of optimization is used to estimate the shape of the target. With the increased resolution of modern sensors, such as phased array radars, laser range finders, and 2D/3D cameras, sensors are capable of giving more than one point measurement from an observed target at a single time instance. For instance, in a camera image, multiple SIFT/SURF points can be obtained inside a chosen region of interest (ROI), and 3D cameras such as the Kinect give a collection of points in a given ROI. The multiple measurements from a target can be used to estimate and track not only the position and velocity of the centroid but also its spatial extent. The combined target tracking and shape estimation is commonly referred to as the extended object tracking (EOT) problem [7, 8].
Multiple feature points, such as SIFT and SURF points, cannot be identified consistently and tracked individually over a long period of time inside an object or across multiple objects. With multiple noisy measurement points generated from the target at each time step without association, the target can be roughly approximated by an ellipse, which provides the kinematic information (position and velocity of the centroid) and spatial extent information (orientation and size) useful for real-world applications. The extended object has been modeled by sticks [9], Gaussian mixture models [10], rectangles [11], Gaussian process models [12], and splines [13]. Two widely used models to represent targets with spatial extent are the random matrix model (RMM) [14] and the elliptic random hypersurface model (RHM) [15], where the true shape of the object is approximated by an ellipse. In the RMM, the shape of the target object is represented by a symmetric positive definite (SPD) matrix. The elements of the matrix, along with the centroid of the object, are used as a state vector, which is estimated by a filter. Multiple improvements to the RMM are presented in the literature [16, 17, 18, 19]. The situation when the measurement noise is comparable to the extent of the target and cannot be neglected is considered in [16, 20]. Considering that the target may change its size and shape abruptly, especially during maneuvering motion, a rotation matrix or scaling matrix is multiplied on both sides of the SPD matrix, and the corresponding filters are derived in [17, 18, 19]. The RHM assumes each measurement source lies on a scaled version of the true ellipse describing the object, and the extent of the object is represented by the entries of the Cholesky decomposition of the SPD matrix [21, 22, 23].
In [24], a multiplicative noise term in the measurement equation is used to model the spatial distribution of the measurements, and a second-order extended Kalman filter is derived for a closed-form recursive measurement update. In [25], comparisons between the RHM and the RMM are illustrated. RHMs with Fourier series expansions and level sets are applied for modeling star-convex and non-convex shapes, respectively [23, 26]. By approximating complex shapes as combinations of multiple elliptic sub-objects, elliptic RMMs are investigated to model irregular shapes [17, 27]. A comprehensive overview of extended object tracking can be found in [8, 7].
The dynamic model for a moving extended object describes how the target's kinematic parameters and extent evolve over time. For tracking a point object, the kinematic parameters such as position, velocity, or acceleration can fully describe the state of the object. However, for an extended object, shape estimation is also important, especially when the target conducts maneuvering motion or the shape of the extended target changes abruptly. For tracking an extended object using the RMM, there is no explicit dynamic model, and the update for the extent is based on simple heuristics which increase the extent's covariance while keeping the expected value constant [14]. An alternative to the heuristic update is to use a Wishart distribution to approximate the transition density of the spatial extent [14, 28, 18]. The prediction update of extended targets within the RMM framework is explored by multiplying a rotation matrix or scaling matrix on both sides of the SPD matrix in [19, 18]. In [18], comprehensive comparison results between four process models are presented. For tracking elliptic extended objects using the RHM, the covariance matrix of the uncertainty of the object's shape parameters is increased at each time step to capture variations in the shape [15].
Image moments have found wide use in tracking, visual servoing, and pattern recognition [29, 30, 2, 31]. Hu's moments [32], invariant under translation, rotation, and scaling of the object, are widely investigated in pattern recognition. In this paper, an alternative representation, using image moments, to describe an ellipse that can approximate an extended object is presented. Dynamic models of the image moments representing an extended object moving in uniform motion and in coordinated turn motion are presented. The image moments based RHM is used with the interacting multiple model (IMM) approach [33, 34, 35] for tracking extended targets undergoing complex trajectories. A novel likelihood function based on the average log-likelihood is derived for the IMM. An unscented Kalman filter (UKF) is used to estimate the states of each individual model of the UKF-IMM filter. The UKF-IMM approach assumes the target obeys one of a finite number of motion models and identifies the beginning and the end of the motion models by updating the mode probabilities. The adaptation via model probability update of the UKF-IMM approach keeps the estimation errors low, both during maneuvers and during non-maneuver intervals. The contributions of the paper are briefly summarized as follows:

A minimal, complete, and unambiguous representation of an elliptic object based on image moments is presented for extended object tracking. A UKF-IMM filter is adopted based on the multiple dynamic models and the corresponding image moments based RHM.

A novel method of calculating the likelihood function, based on the average log-likelihood of the image moments based RHM, is proposed for the UKF-IMM filter. In order to estimate the model probability consistently, calculation of the average log-likelihood function by the unscented transformation is proposed.

Results of the UKF-IMM filter with the image moments based model are presented and compared with a benchmark algorithm to validate the performance of the proposed approach.
The rest of the paper is organized as follows. In Section II, the image moments based random hypersurface model is proposed to approximate an elliptic object, and its dynamic models are analytically derived. Following the framework of the random hypersurface model, the measurement model is also provided. In Section III, the Bayesian inference of the position, velocity, and extent of the object from noisy measurement points uniformly generated from the object is illustrated. Since the dynamic and measurement models are nonlinear, a UKF is applied to estimate the extended object. For tracking a moving target switching between maneuvering and non-maneuvering motions, the proposed image moments based RHM is embedded within the framework of the interacting multiple model (IMM) in Section IV. The UKF-IMM algorithm is illustrated with the proposed image moments based RHM, and an algorithm for the calculation of the likelihood function using the average log-likelihood function and the unscented transformation is also proposed. In Section V, the proposed image moments based RHM with its dynamic models is evaluated in three tests: (1) a static scenario for validating the measurement model; (2) constant velocity and coordinated turn motions to validate the dynamic models; (3) two complex trajectories to validate the UKF-IMM algorithm with the proposed image moments based RHM, with its performance compared against the RMM models in [20] as the benchmark. The estimation results show that the proposed model provides comparable and accurate results. In Section VI, the proposed algorithm is applied for tracking a moving car along a real trajectory. Conclusions and future work are given in Section VII. To improve legibility, subindices such as the time step and the measurement number will be dropped in the following unless needed.
II Image Moments Based Random Hypersurface Model
II-A Representation of the Ellipse using Image Moments
In this section, a generalized representation of the ellipse using image moments is presented. The moment of order $i+j$ of an object in a 2D plane is defined by [29]

$$m_{ij} = \iint_{S} x^{i} y^{j} \, dx \, dy, \qquad (1)$$

where $S$ is the surface of the object and $i, j$ belong to the set of natural numbers. The centered moment is defined as [29]

$$\mu_{ij} = \iint_{S} (x - x_g)^{i} (y - y_g)^{j} \, dx \, dy, \qquad (2)$$

where $x_g = m_{10}/m_{00}$, $y_g = m_{01}/m_{00}$, and $(x_g, y_g)$ is the centroid of the object.
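The definitions above have a direct discrete counterpart when the object is observed through a finite set of sampled points. The following is a minimal sketch (the function names and the example point set are illustrative, not from the paper), where the integrals in (1) and (2) are replaced by sums over the measurement points:

```python
import numpy as np

def raw_moment(pts, i, j):
    """Discrete approximation of m_ij: sum over points of x^i * y^j."""
    return np.sum(pts[:, 0] ** i * pts[:, 1] ** j)

def centered_moment(pts, i, j):
    """Discrete centered moment mu_ij about the centroid (xg, yg)."""
    m00 = raw_moment(pts, 0, 0)
    xg = raw_moment(pts, 1, 0) / m00
    yg = raw_moment(pts, 0, 1) / m00
    return np.sum((pts[:, 0] - xg) ** i * (pts[:, 1] - yg) ** j)

# Example: points symmetric about the origin give a zero mixed centered moment.
pts = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
mu11 = centered_moment(pts, 1, 1)
```

For a symmetric point cloud such as the one above, the mixed centered moment vanishes while the second-order moments capture the spread along each axis.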
Any point on the surface of the object can be represented as a point located on the boundary of a scaled ellipse. The general equation of a family of ellipses in terms of the semi-major and semi-minor axes, centroid, and orientation is given by

$$\frac{\big((x - x_g)\cos\theta + (y - y_g)\sin\theta\big)^2}{a^2} + \frac{\big(-(x - x_g)\sin\theta + (y - y_g)\cos\theta\big)^2}{b^2} = s^2, \qquad (3)$$

where $a$ and $b$ are the semi-major and semi-minor axes, respectively, $\theta$ is the orientation of the ellipse, and $s$ is a scale factor. The points inside the ellipse can be represented by varying $s$ from $0$ to $1$ in (3). Rewriting (3) yields

(4)
Consider the normalized centered moments $n_{20} = \mu_{20}/A$, $n_{02} = \mu_{02}/A$, $n_{11} = \mu_{11}/A$, where $A$ is the area of the ellipse and $\mu_{20}$, $\mu_{02}$, $\mu_{11}$ are centered moments. The following relationships between the parameters $a$, $b$, $\theta$ of the ellipse and the normalized centered image moments ($n_{20}$, $n_{02}$, $n_{11}$) can be derived [29]

$$n_{20} = \frac{a^2\cos^2\theta + b^2\sin^2\theta}{4}, \quad n_{02} = \frac{a^2\sin^2\theta + b^2\cos^2\theta}{4}, \quad n_{11} = \frac{(a^2 - b^2)\cos\theta\sin\theta}{4}. \qquad (6)$$

The area of the ellipse, $A$, can be written in terms of the normalized centered moments and the parameters $a$ and $b$ as follows [29]

$$A = \pi a b = 4\pi\sqrt{n_{20}\, n_{02} - n_{11}^2}. \qquad (8)$$
Let the state consist of the normalized centered moments, which can be used to estimate the shape of the ellipse, together with the location of the centroid of the ellipse. An ellipse can then be expressed using the following minimal, complete, and unambiguous representation of these parameters
(9)  
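The inverse map, from the three normalized centered moments back to the ellipse axes and orientation, follows from the relationships above: for a uniform-density ellipse, the matrix $[[n_{20}, n_{11}], [n_{11}, n_{02}]]$ has eigenvalues $a^2/4$ and $b^2/4$. The sketch below (function name illustrative) recovers the axes and orientation from the moment triple:

```python
import numpy as np

def ellipse_from_moments(n20, n11, n02):
    """Recover semi-axes (a, b) and orientation theta of a uniform-density
    ellipse from its area-normalized centered moments.  For such an ellipse
    the moment matrix [[n20, n11], [n11, n02]] has eigenvalues a^2/4, b^2/4;
    theta points along the major axis."""
    M = np.array([[n20, n11], [n11, n02]])
    lam = np.linalg.eigvalsh(M)                 # ascending eigenvalues
    b, a = 2.0 * np.sqrt(lam)                   # semi-minor, semi-major
    theta = 0.5 * np.arctan2(2.0 * n11, n20 - n02)
    return a, b, theta

# Round trip: an axis-aligned ellipse with a = 3, b = 1 has
# n20 = a^2/4, n02 = b^2/4, n11 = 0.
a, b, theta = ellipse_from_moments(9.0 / 4.0, 0.0, 1.0 / 4.0)
```

This inverse map is what makes the moment triple a complete shape representation: it can be evaluated after filtering to display the estimated ellipse.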
II-B Dynamic Motion Models
In order to derive the differential equation of the state, the time derivative of the centered moment is derived first. The time derivative of a centered moment can be obtained from the time derivative of the contour of the ellipse as [29]

(10)

where the integral is taken over the contour of the ellipse, projecting the velocity of each contour point onto the unit vector normal to the contour at that point, over an infinitesimal element of the contour. If the velocity field is piecewise continuous, tangent to the contour, and continuously differentiable, Green's theorem can be used to represent (10) as [29]

(11)
Using the constant velocity and coordinated turn models, a specific differential equation of the state is derived for each case.
II-B1 Linear Motion Model
When an elliptical object is moving with linear motion, each point inside the ellipse obeys a linear motion model parameterized by an initial velocity and an acceleration. The centered moments of the ellipse can be calculated by substituting this motion model into (11) as
(12) 
Since the first-order terms are odd functions and the ellipse is symmetric with respect to its centroid, the state space representation of the normalized centered moments of the ellipse is
(13) 
The state at each discrete time step consists of a component related to the image moments and a vector that includes the position and velocity of the centroid of the extended object. The discretized state equation is given as follows
(14) 
where the state transition matrix and the zero-mean Gaussian process noise covariance consist of a block for the image moments and a block for the kinematic states, the latter parameterized by the power spectral density of the acceleration noise. Notice that the discretized white-noise acceleration model is adopted for the kinematic state vector, which is the same as the dynamic model used for point-based tracking. Other kinematic models for point-based tracking can also be used for the state vector and can be found in [33].
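The kinematic block of this model is the standard discretized white-noise-acceleration form. A minimal sketch (the state ordering [x, vx, y, vy] and function name are illustrative choices, not fixed by the paper):

```python
import numpy as np

def cv_kinematic_model(T, q):
    """Discretized white-noise-acceleration (constant velocity) model for the
    kinematic substate ordered [x, vx, y, vy]: returns the transition matrix F
    and process noise covariance Q, with q the power spectral density of the
    acceleration noise."""
    F1 = np.array([[1.0, T],
                   [0.0, 1.0]])
    Q1 = q * np.array([[T**3 / 3.0, T**2 / 2.0],
                       [T**2 / 2.0, T]])
    # Stack identical per-axis blocks along the diagonal.
    F = np.kron(np.eye(2), F1)
    Q = np.kron(np.eye(2), Q1)
    return F, Q

F, Q = cv_kinematic_model(T=0.5, q=1.0)
```

In the full filter state, these blocks would be combined with an identity block (plus noise) for the image-moment component, since the normalized moments are constant under pure translation.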
II-B2 Coordinated Turn Motion Model
The coordinated turn (CT) model, characterized by a constant turning rate and constant speed, is commonly used in tracking applications (cf. [33]). An elliptic extended object during a coordinated turn is shown in Fig. 1. For any point that belongs to the ellipse moving with CT motion, the motion model of the ellipse can be represented as follows
(15)  
where the parameters are the turning rate and the displacement between the origins of the two reference frames; the origin of the second reference frame is the instantaneous center of rotation (ICR) of the object.
Substituting (15) into (12), the differential equation of the centered moments of the ellipse when the object is undergoing coordinated turn motion is given by
(16) 
The dynamic models of the normalized centered moments of the ellipse can be calculated using (16) as
(17)  
The state space representation of the normalized centered moments of the ellipse is
(18) 
and the solution to the state space in (18) is
(19) 
where the corresponding transition matrix is defined accordingly. The derivation of the transition matrix is shown in Appendix A.
At each time step, the complete state to be tracked consists of a component corresponding to the image moments and a vector that includes the position and velocity of the centroid and the turning rate of the extended object. The state equation is given as follows
(20) 
where the state transition matrix is obtained from (19) with the given sampling period, and the process noise is a zero-mean Gaussian noise vector. Notice that this model is piecewise continuous.
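For the kinematic substate, the standard coordinated-turn transition with known turn rate can be sketched as follows (state ordering [x, vx, y, vy] and function name are illustrative assumptions; the paper's full transition also covers the image-moment component):

```python
import numpy as np

def ct_kinematic_transition(T, omega):
    """Coordinated-turn transition for the kinematic substate [x, vx, y, vy]
    with turn rate omega (rad/s); reduces to the constant-velocity model as
    omega -> 0."""
    if abs(omega) < 1e-9:                       # CV limit to avoid 0/0
        return np.kron(np.eye(2), np.array([[1.0, T], [0.0, 1.0]]))
    s, c = np.sin(omega * T), np.cos(omega * T)
    return np.array([[1.0, s / omega,         0.0, -(1.0 - c) / omega],
                     [0.0, c,                 0.0, -s],
                     [0.0, (1.0 - c) / omega, 1.0, s / omega],
                     [0.0, s,                 0.0, c]])

# A quarter turn (omega * T = pi/2) rotates the velocity vector by 90 degrees.
F = ct_kinematic_transition(T=1.0, omega=np.pi / 2.0)
x = F @ np.array([0.0, 1.0, 0.0, 0.0])   # start at origin, moving along +x
```

After the quarter turn, the velocity that started along +x points along +y, while the position traces the corresponding circular arc.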
II-C Measurement Model
Assuming uniformly generated measurements without sensor noise, (9) maps the unknown parameters to a pseudo-measurement with a squared scale term. The scaling factor is approximated as Gaussian distributed [12]. Consider the real measurement of the unknown true measurement source in the presence of additive white Gaussian noise, so that the real measurement can be expressed as the sum of the true measurement and the noise. To find the relationship between the state vector and the real measurement, the measurement model is derived by substituting this expression into (9). The following expression can be obtained
(21) 
where the first term is the pseudo-measurement with the true value and the second term is a polynomial related to the white noise, which has the mean
(22) 
and covariance as
(23)  
The derivation of this noise term and its first two moments is shown in Appendix B. Since the measurement model is highly nonlinear, the UKF, presented in the next section, is used to estimate the state vector.
III UKF for Extended Object Tracking using Image Moments Based RHM
On the basis of the dynamic motion models and the measurement model, a recursive Bayesian state estimator for tracking elliptic extended objects is derived. At each time step, several measurement points from the area of the object's extent are received. The task of the Bayesian state estimator is to perform backward inference, inferring the true state parameters from the measurement points. Suppose that the posterior probability density function (pdf) at the previous time step is available; the prediction for the current time step is given by the Chapman-Kolmogorov equation as [36]
(24) 
where the state vector evolves according to the conditional transition density. Assuming the Markov property holds, the transition density can be derived from the different dynamic models in Subsection II-B. Assuming the measurements at each time step are independent, the prediction is updated recursively via Bayes' rule as
(25) 
where and .
When the target is moving with uniform motion (the constant velocity model, which is a linear system), its state and covariance are predicted based on the dynamic model (14) as
(26)  
(27) 
However, the proposed image moments based RHM and dynamic models such as the coordinated turn model are nonlinear. When the system is nonlinear, linearization methods such as the extended Kalman filter (EKF) can introduce large errors in the true posterior mean and covariance. The UKF addresses this problem with the unscented transformation (UT), which does not require the calculation of Jacobian and Hessian matrices. The UT sigma point selection scheme results in approximations that are accurate to the third order for Gaussian inputs for all nonlinearities and has the same overall order of computation as the EKF [37]. When the state variables with a given mean and covariance are propagated through a nonlinear function, such as (19) or (21), the mean and covariance of the output are approximated by generating the UT sigma points as [37]
(28)  
(29) 
where the sigma points and the corresponding mean and covariance weights are calculated by [37]
(30)  
where the scaling parameter is determined by a parameter that controls the spread of the sigma points around the mean, a secondary scaling parameter usually set to zero, and a parameter used to incorporate prior knowledge of the distribution of the state. The UKF for the image moments based random hypersurface model is illustrated in Algorithm 1.
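The sigma-point propagation described above can be sketched as follows. This is a minimal unscented transform with the standard Julier/Uhlmann weights (the default parameter values are common choices, not taken from the paper); a full UKF update would additionally form the state-measurement cross-covariance from the same sigma points:

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f using
    the 2n+1 UT sigma points; returns the approximated output mean and
    covariance."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + S.T, mean - S.T])     # 2n+1 points
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))      # mean weights
    wc = wm.copy()                                        # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    y = np.array([f(p) for p in sigma])
    y_mean = wm @ y
    d = y - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov

# Sanity check on a linear map, where the UT is exact: y = 2x.
m, P = np.array([1.0, -1.0]), np.eye(2)
ym, yP = unscented_transform(m, P, lambda x: 2.0 * x)
```

On the linear test map the UT recovers the exact transformed mean and covariance, which is a useful check before applying it to the nonlinear moment dynamics (19) or the measurement model (21).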
IV Tracking Extended Targets with IMM
In this section, the proposed image moments based random hypersurface model is embedded in the IMM approach for tracking extended targets undergoing complex trajectories. When the extended target switches between maneuvering and non-maneuvering behaviors, its kinematic state and spatial extent may change abruptly. Multiple model approaches, such as the interacting multiple model (IMM), are effective for tracking targets with complex trajectories, especially with a high maneuvering index [34, 33, 35]. The IMM approach assumes the target obeys one of a finite number of motion models and identifies the beginning and the end of the motion models by updating the model probabilities. The adaptation via model probability update helps the IMM approach keep the estimation errors consistently low, both during maneuvers and during non-maneuver intervals. Details about the IMM for point target tracking can be found in the literature, e.g., [33].
The proposed image moments based random hypersurface model with the dynamic motion models of Section II, namely the constant velocity motion model and the coordinated turn motion model, is integrated into an IMM framework. Since the dynamic motion models and the measurement model are nonlinear, the UKF-IMM algorithm is proposed. The flowchart of the UKF-IMM algorithm is shown in Fig. 2, which involves the mixing probabilities, the Markov chain transition matrix between the models, and the likelihood function corresponding to each model. Since there are multiple measurement points at each time step, a sequential approach is adopted for the UKF, and the likelihood function is generated based on the measurement model.
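The interaction (mixing) step at the start of each IMM cycle can be sketched as follows; this is the standard point-target IMM mixing from [33], shown here for orientation (the function name and the two-model example are illustrative, not the paper's exact configuration):

```python
import numpy as np

def imm_mixing(mu, Pi, means, covs):
    """IMM interaction step: mixing probabilities and mixed initial conditions
    for each model filter.  mu holds the prior mode probabilities and
    Pi[i, j] = P(model j at time k | model i at time k-1)."""
    c = mu @ Pi                                   # predicted mode probabilities
    w = (mu[:, None] * Pi) / c[None, :]           # mixing weights w[i, j]
    mixed_means, mixed_covs = [], []
    for j in range(Pi.shape[0]):
        m = sum(w[i, j] * means[i] for i in range(Pi.shape[0]))
        # Covariance mixing includes the spread-of-means term.
        P = sum(w[i, j] * (covs[i] + np.outer(means[i] - m, means[i] - m))
                for i in range(Pi.shape[0]))
        mixed_means.append(m)
        mixed_covs.append(P)
    return c, mixed_means, mixed_covs

mu = np.array([0.5, 0.5])
Pi = np.array([[0.95, 0.05], [0.05, 0.95]])
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), np.eye(2)]
c, mm, mc = imm_mixing(mu, Pi, means, covs)
```

Each model filter (here, a UKF per motion model) is then re-initialized with its mixed mean and covariance before the prediction and update steps.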
Assume there are several measurements at each time step. A pseudo-measurement variable can be generated for each measurement based on the predicted state vector, its covariance, and the measurement model in (21). The mean and the covariance of the pseudo-measurement variable can be obtained by the unscented transformation (UT). Assuming the measurements are independent and identically Gaussian distributed, the log-likelihood function based on the pseudo-measurement variable is
(31) 
where the mean and covariance of the pseudo-measurement are generated for each measurement point. In many cases, the likelihood can become extremely small. To avoid this issue, the average log-likelihood is used, which is given by
(32) 
and
(33) 
which gives the value of the measurement likelihood. This measurement likelihood is used in the IMM filter. The details of the calculation of the measurement likelihood are shown in Algorithm 2.
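The averaging-then-exponentiating step can be sketched as follows (a hypothetical helper illustrating the idea of (31)-(33): per-point Gaussian log-likelihoods of the pseudo-measurement residuals are averaged before exponentiating, so the result does not underflow as the number of points grows):

```python
import numpy as np

def average_loglik_likelihood(residuals, covs):
    """Average the per-point Gaussian log-likelihoods over one scan, then
    exponentiate, yielding a likelihood value for the IMM mode-probability
    update that is stable for large numbers of measurement points."""
    logliks = []
    for r, S in zip(residuals, covs):
        r = np.atleast_1d(np.asarray(r, dtype=float))
        S = np.atleast_2d(np.asarray(S, dtype=float))
        _, logdet = np.linalg.slogdet(2.0 * np.pi * S)
        logliks.append(-0.5 * (logdet + r @ np.linalg.solve(S, r)))
    return np.exp(np.mean(logliks))

# Scalar pseudo-measurements with unit variance and zero residual:
L = average_loglik_likelihood([0.0, 0.0], [1.0, 1.0])
```

Multiplying the per-point likelihoods directly would shrink geometrically with the number of points; the average keeps the value on a usable scale while preserving the relative ordering between models for a given scan size.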
V Simulation Results
In this section, several simulation tests are conducted to evaluate the performance of the proposed image moments based extended object tracking. To validate the measurement model in (21), the shapes of static objects are estimated with different noise levels in the first simulation. Then, tracking of extended targets moving with linear motion and coordinated turn motion is demonstrated. The constant velocity model in (14) and the nearly coordinated turn model in (20) are used and validated for these cases. Two targets, with plus-sign and ellipse shapes, are used in the simulations. Finally, tracking of targets moving with maneuvering and non-maneuvering intervals is presented. Two scenarios are simulated in this test: one with slow motion and maneuvers, and the other with fast motion and maneuvers. The UKF-IMM algorithm with the constant velocity model and the nearly coordinated turn model is applied in these cases. The RMM and its combination with the IMM in [20] are implemented as a benchmark for comparison with our proposed image moments based random hypersurface model.
The intersection over union (IoU) is used as a metric to evaluate the proposed algorithm. The IoU is defined as the area of the intersection of the estimated shape and the true shape divided by the area of the union of the two shapes [38]
(34) 
where the IoU is computed between the true state vector and the estimated state vector. The IoU lies between 0 and 1, where a value of 1 corresponds to a perfect match between the estimated area and the ground truth. Additionally, the root mean squared errors (RMSE) of the estimated position and velocity of the centroid of the extended target are also evaluated, defined as
(35) 
where the sum runs over the Monte Carlo runs and the estimation error is taken from each run. For the RMSE of the position, the error is the difference between the estimated centroid of the extended target and the ground truth. Similarly, for the RMSE of the velocity, the estimation error is the difference between the estimated velocity of the centroid and the ground truth.
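The IoU of two elliptic shapes has no convenient closed form, so in practice it can be approximated numerically. The sketch below (function names, grid limits, and resolution are illustrative choices) rasterizes both ellipses on a common grid and counts intersection and union cells:

```python
import numpy as np

def ellipse_mask(xx, yy, cx, cy, a, b, theta):
    """Boolean mask of grid points inside an ellipse with centre (cx, cy),
    semi-axes a, b and orientation theta."""
    dx, dy = xx - cx, yy - cy
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def ellipse_iou(e1, e2, lim=10.0, n=800):
    """Grid-based IoU approximation; e1 and e2 are (cx, cy, a, b, theta)."""
    g = np.linspace(-lim, lim, n)
    xx, yy = np.meshgrid(g, g)
    m1 = ellipse_mask(xx, yy, *e1)
    m2 = ellipse_mask(xx, yy, *e2)
    return np.logical_and(m1, m2).sum() / np.logical_or(m1, m2).sum()

iou_same = ellipse_iou((0, 0, 3, 1, 0.0), (0, 0, 3, 1, 0.0))
```

Identical ellipses give an IoU of exactly 1, while disjoint ones give 0; the grid resolution controls the approximation error in between.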
V-A Static Extended Objects
The plus-sign-shaped target is made up of two rectangles of given widths and heights, and the major and minor axes of the elliptic target are set accordingly. The simulation is performed by uniformly sampling points from the static extended objects. Three different levels of additive white Gaussian noise, with low, medium, and high variances, are used to generate the noisy measurements.
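The simulated measurements for the elliptic target can be generated as follows: points drawn uniformly from the ellipse interior (using a square-root radial density so the area is covered uniformly), with isotropic Gaussian sensor noise added on top. Function name and parameter values are illustrative:

```python
import numpy as np

def sample_ellipse(cx, cy, a, b, theta, n, noise_var, seed=None):
    """Draw n points uniformly from the interior of an ellipse and add
    isotropic Gaussian sensor noise, mimicking the simulated measurements."""
    rng = np.random.default_rng(seed)
    r = np.sqrt(rng.uniform(size=n))             # sqrt for uniform area density
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    u, v = a * r * np.cos(phi), b * r * np.sin(phi)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    pts = np.column_stack([u, v]) @ R.T + np.array([cx, cy])
    return pts + rng.normal(scale=np.sqrt(noise_var), size=(n, 2))

pts = sample_ellipse(5.0, -2.0, 3.0, 1.0, 0.3, n=2000, noise_var=0.0, seed=0)
```

For the plus-sign target, the same idea applies with uniform sampling over the two rectangles instead of the ellipse interior.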
The UKF is used for estimating the state given noisy measurements of points uniformly sampled from the plus-sign-shaped and ellipse-shaped extended objects. The state is initialized as a circle located at the origin. The estimation results for the plus-sign-shaped object are shown in Figs. 3(a), 3(b), 3(c), and the estimation results for the ellipse-shaped object are shown in Figs. 3(d), 3(e), 3(f). The mean values of the IoU of the static ellipse and plus-sign-shaped targets with different noise levels are shown in Table I. The image moments based measurement model can precisely estimate the shape of the targets. Even as the covariance of the measurement noise increases, the proposed image moments based model still gives a shape close to the actual shape of the targets. The IoU value for the plus-sign-shaped target is lower than for the elliptical target because an ellipse can only roughly approximate the plus-sign shape.
Target shape | Static target (Low / Medium / High) | Linear motion | Coordinated turn motion
Ellipse | 0.88 / 0.85 / 0.88 | 0.87 |
Plus-sign | 0.48 / 0.47 / 0.46 | 0.48 |
V-B Linear Motion
In this subsection, extended objects with plus-sign and elliptical shapes moving with a constant velocity are simulated. The plus-sign-shaped target is made up of two rectangles of given widths and heights, and the major and minor axes of the ellipse-shaped object are set accordingly. The extended objects start moving from an initial position with a constant velocity, and measurements are generated from the targets at a fixed sampling interval. At each time step, measurement points uniformly sampled from the objects are generated.
For the UKF implementation, the states are initialized as a circle moving with a constant velocity. White Gaussian noise is added to each point measurement, and the process noise covariance parameters in the constant velocity model in (14) are set accordingly. The tracking results for the ellipse-shaped extended object are shown in Fig. 4(a) and for the plus-sign-shaped extended object in Fig. 4(b). It can be seen that the shapes of the targets are estimated accurately as more measurements are obtained. The mean values of the RMSE of the position and of the velocity over the Monte Carlo runs are computed for the ellipse and for the plus-sign. The mean values of the IoU of the ellipse and plus-sign-shaped targets during the linear motion are shown in Table I.
V-C Coordinated Turn Motion
An extended object undergoing a coordinated turn is simulated in this case. The extended object with the plus-sign shape starts with a given initial velocity and then executes a coordinated turn, as does the extended elliptic object. At each time step, noisy measurement points are uniformly generated from the extents of the targets, with white Gaussian noise added to each point measurement.
The extended objects executing a coordinated turn are estimated based on the dynamic model (20). The states are initialized as circular shapes. The tracking results for the ellipse-shaped object are shown in Fig. 4(c) and for the plus-sign-shaped object in Fig. 4(d). The mean values of the RMSE of the position and of the velocity over the Monte Carlo runs are computed for the ellipse and for the plus-sign. The mean values of the IoU of the ellipse and plus-sign-shaped targets during the coordinated turn motion are shown in Table I. The image moments based model, which provides a dynamic model for the shape of an extended object undergoing a coordinated turn, can estimate the position and velocity of the target, as well as the orientation and extent of the targets, very accurately.
V-D Complex Trajectories
The image moments based RHM is embedded in the IMM framework. The proposed model is tested in two simulations of extended elliptical objects switching between maneuvering and non-maneuvering intervals multiple times.
V-D1 Slow motion and maneuvering case
The target moves with a constant velocity from a given initial state in Cartesian coordinates (with position in m). The target first executes a coordinated turn starting at t = 260 s for 100 s, and then goes through two further coordinated turns starting at t = 570 s and t = 830 s. The trajectory is shown in Fig. 5. The major and minor axes of the elliptical target are set in meters. The number of measurements in each scan is generated from a Poisson distribution, and the measurement points are uniformly distributed. The measurement noise is Gaussian, and the sampling time is fixed.
The proposed image moments based random hypersurface model with the UKF-IMM algorithm is compared with the RMM-IMM algorithm in [20]. The RMM-IMM algorithm uses two models: one with high kinematic process noise and high extension agility accounts for abrupt changes in shape and orientation during maneuvers, and another with low kinematic noise and low extension agility accounts for the non-maneuvers. The extension agility is set to 10 and 5, respectively, for the two models. The kinematic states of both models use the constant velocity model (the kinematic dynamic model in (14)), with the process noise parameters set accordingly. The proposed image moments based RHM with the UKF-IMM filter combines the constant velocity model in (14) and the coordinated turn model in (20). The initial probabilities of the two models in the IMM filter are set equal for both algorithms, with a fixed Markov chain transition matrix. The model probability of the proposed algorithm is shown in Fig. 7. With the same trajectory, the two algorithms are run for 1000 Monte Carlo runs, and their simulation results are shown in Fig. 6. The proposed algorithm has lower RMS errors for both position and velocity, while the RMM algorithm in [20] has better IoU values.
V-D2 Fast motion and maneuvering case
The typical trajectory from [33] is used for this simulation and is shown in Fig. 8. The target moves with a constant velocity from a given initial state in Cartesian coordinates (with position in m). The details of its maneuvering and non-maneuvering intervals are shown in Table II. For ease of visualization, the major and minor axes of the elliptical target are enlarged. The number of measurements in each scan is generated from a Poisson distribution, and the measurement points are uniformly distributed. The sensor noise is Gaussian, and the sampling time is fixed.
Time (second) | Model | Turning rate | Turning direction | Acceleration
 | CV | 0 | |
 | CT | 2 | left | 0.89g
 | CV | 0 | |
 | CT | 1 | right | 0.45g
 | CT | 1 | left | 0.45g
 | CT | 1 | right | 0.45g
 | CV | 0 | |
The RMM-IMM algorithm [20] again consists of two models, with the extension agility set to 10 and 5, respectively. The kinematic states of both models use the constant velocity model (the kinematic dynamic model in (14)), with the process noise parameters set accordingly. The proposed image moments based RHM with the UKF-IMM filter combines the constant velocity model in (14) and the coordinated turn model in (20). The initial probabilities of the two models in the IMM filter are set equal for both algorithms, with a fixed Markov chain transition matrix. The model probability of the proposed algorithm is shown in Fig. 10. With the same trajectory, the two algorithms are run for 1000 Monte Carlo runs, and their simulation results are shown in Fig. 9.
The proposed image moments based RHM and its measurement and dynamic models are validated in the simulations of static targets and of targets with linear motion and coordinated turn motion. As the noise level is increased, the size of the estimated elliptical shape does not inflate with the sensor noise. During linear motion or coordinated turn motion, the proposed algorithm can predict the position and velocity of the moving target, as well as the spatial extent and orientation of the targets. To estimate a target that switches between maneuvering and non-maneuvering intervals, the proposed image moments based RHM is embedded in the IMM framework. The proposed average measurement log-likelihood function estimates the model probabilities accurately and consistently. The RMSE values of the position and velocity of the target's centroid are lower than the results from the RMM. The state variables of the RMM are the centroid and the random matrix, which are updated based on the mean and spread matrix of the measurement points [20]. The proposed RHM uses the centroid and the three image moments as state variables, which are updated based on each individual measurement point. When the number of measurement points is small or the points are noisy, the proposed image moments based RHM still estimates the position and velocity of the centroid accurately, even when the mean of the measurement points is far from the true centroid. The accurate dynamic model has the advantage of predicting the location of the target, especially when the target undergoes fast motion and the sampling frequency is relatively low.
VI Experiment
In this section, the proposed image moments based RHM is applied to tracking a moving car, represented as an extended object, in a real video. A short video clip from the Stanford drone dataset [39] is used, which shows a moving car from a bird's eye view. The video is captured with a 4K camera mounted on a quadcopter platform (a 3DR Solo) hovering above an intersection on a university campus at an altitude of approximately meters. The clip contains 431 frames with an image size of 1422 by 1945 pixels, and the video has been undistorted and stabilized [39]. The ground truth is manually labeled at each frame, and the measurement points are uniformly generated inside the bounding box of the ground truth. The number of measurements in each frame is generated from a Poisson distribution with mean . The sensor noise is Gaussian white noise with variance . In Fig. 11, the first top-view scene of the moving car is shown, and snapshots of the estimation results out of the 431 frames are plotted in the same figure. The target switches between linear and rotational motion. The constant velocity model in (14) and the coordinated turn model in (20) with the UKF-IMM filter are applied to track the moving car. The parameter for the process noise covariance in the constant velocity model in (14) is set as and ; for the coordinated turn model in (20), (with position in pixels). The initial probabilities of the two models in the IMM filter are set equal and the Markov chain transition matrix is selected to be .
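The per-frame measurement generation described above can be sketched as follows. This is an illustrative stand-in, not the dataset tooling; the function name `generate_measurements` and the numeric values of the Poisson mean `lam` and noise standard deviation `sigma` are placeholders, since the paper's exact settings are elided in this excerpt.

```python
import numpy as np

def generate_measurements(box, lam, sigma, rng):
    """Simulate one frame of measurements from a ground-truth bounding box.

    box   : (x_min, y_min, x_max, y_max) in pixels
    lam   : mean of the Poisson-distributed measurement count
    sigma : standard deviation of the additive Gaussian sensor noise
    """
    n = max(1, rng.poisson(lam))                  # at least one detection
    x = rng.uniform(box[0], box[2], size=n)       # uniform inside the box
    y = rng.uniform(box[1], box[3], size=n)
    pts = np.column_stack([x, y])
    return pts + rng.normal(0.0, sigma, size=pts.shape)  # sensor noise

rng = np.random.default_rng(1)
pts = generate_measurements((0.0, 0.0, 100.0, 50.0), 20, 2.0, rng)
```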
The proposed algorithm is evaluated over 1000 Monte Carlo runs, and the estimation results are shown in Fig. 12. The mean RMSE of the centroid position over the Monte Carlo runs is . The mean IoU (the ground truth is approximated as an ellipse whose axes match the width and height of the corresponding bounding box and which has the same orientation) over the Monte Carlo runs is .
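The IoU between an estimated ellipse and the elliptic approximation of the ground truth can be approximated numerically by rasterizing both on a common grid. This is a generic sketch of the metric, not the paper's evaluation code; the function names are hypothetical.

```python
import numpy as np

def ellipse_mask(c, a, b, theta, X, Y):
    """Boolean mask of grid points inside an ellipse (centre c, semi-axes a, b,
    orientation theta)."""
    ct, st = np.cos(theta), np.sin(theta)
    u = ct * (X - c[0]) + st * (Y - c[1])    # rotate into the ellipse frame
    v = -st * (X - c[0]) + ct * (Y - c[1])
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def ellipse_iou(e1, e2, n=500):
    """Approximate IoU of two ellipses e = (c, a, b, theta) by rasterizing
    on a grid that covers both."""
    cs = np.array([e1[0], e2[0]])
    r = max(e1[1], e1[2], e2[1], e2[2])      # largest semi-axis as margin
    lo, hi = cs.min(axis=0) - r, cs.max(axis=0) + r
    X, Y = np.meshgrid(np.linspace(lo[0], hi[0], n),
                       np.linspace(lo[1], hi[1], n))
    m1 = ellipse_mask(*e1, X, Y)
    m2 = ellipse_mask(*e2, X, Y)
    return (m1 & m2).sum() / (m1 | m2).sum()
```

A finer grid (larger `n`) trades runtime for accuracy; for per-frame evaluation a few hundred cells per axis is typically sufficient.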
VII Conclusion
In this paper, a minimal, complete, and nonambiguous representation of an elliptic object is modeled based on image moments for extended object tracking. The measurement model and the dynamic models of the image moments for linear motion and coordinated turn motion are analytically derived. The unscented Kalman filter and its combination with the interacting multiple model approach are applied to estimate the position, velocity, and spatial extent based on noisy measurement points uniformly generated from the extended target. The proposed image moments based random hypersurface model and its filters are validated and evaluated in different simulation scenarios and on one real trajectory. The evaluation results show that the proposed model and its inference can provide accurate estimates of the position, velocity, and extent of the targets. The proposed image moments based RHM for tracking extended objects can be embedded into other Bayesian methods, such as multiple hypothesis tracking techniques or probabilistic data association filters.
Appendix A Transition matrix of the coordinated turn motion
The interpolation polynomial method [33] is used to obtain the transition matrix of the dynamic equation . First, by solving , the eigenvalues of the matrix are calculated as , and . Then, a polynomial of degree is found that is equal to on the spectrum of , that is
(38) 
where and . The polynomial is calculated as