Image Moment Models for Extended Object Tracking


   Gang Yao and Ashwin Dani are with the Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, USA. e-mail: gang.yao@uconn.edu, ashwin.dani@uconn.edu
Abstract

In this paper, a novel image moments based model for shape estimation and tracking of an object moving along a complex trajectory is presented. The camera is assumed to be stationary, looking at a moving object. Point features inside the object are sampled as measurements. An ellipse is assumed as the primitive shape approximating the object, and its shape is estimated using a combination of image moments. The dynamic model of the image moments when the object moves under a constant velocity or coordinated turn motion model is derived and used for shape estimation of the object. An Unscented Kalman Filter-Interacting Multiple Model (UKF-IMM) filter algorithm is applied to estimate the shape of the object (approximated as an ellipse) and to track its position and velocity. A likelihood function based on the average log-likelihood is derived for the IMM filter. Simulation results of the proposed UKF-IMM algorithm with the image moments based models are presented, showing shape estimates of an object moving along complex trajectories. Comparison results with a benchmark algorithm from the literature are presented, using the intersection over union (IOU) and the position and velocity root mean square errors (RMSE) as metrics. Results on real image data captured from a quadcopter are also presented.

Extended Object Tracking, Shape Estimation, Image Moments Dynamic Model, Log-Likelihood for filtering

I Introduction

Traditional target tracking literature [1], such as simultaneous localization and mapping (SLAM) [2, 3], structure from motion (SfM) [4, 5], and target tracking [6], models targets as point targets. Once the estimation is performed, another layer of optimization is used to estimate the shape of the target. With the increased resolution of modern sensors, such as phased array radars, laser range finders, and 2D/3D cameras, sensors are capable of providing more than one point measurement from an observed target at a single time instant. For instance, in a camera image, multiple SIFT/SURF points can be obtained inside a chosen region of interest (ROI), and 3D cameras, such as the Kinect, give a collection of points in a given ROI. The multiple measurements from a target can be used to estimate and track not only the position and velocity of the centroid but also its spatial extent. The combined target tracking and shape estimation problem is commonly referred to as extended object tracking (EOT) [7, 8].

Multiple feature points such as SIFT and SURF points cannot be identified consistently and tracked individually over a long period of time inside an object or across multiple objects. With multiple noisy measurement points generated from the target at each time step without association, the target can be roughly estimated as an ellipse, which provides the kinematic information (position and velocity of the centroid) and spatial extent information (orientation and size) useful for real-world applications. The extended object has been modeled by sticks [9], Gaussian mixture models [10], rectangles [11], Gaussian process models [12], and splines [13]. Two widely used models to represent targets with spatial extent are the random matrix model (RMM) [14] and the elliptic random hypersurface model (RHM) [15], where the true shape of the object is approximated by an ellipse. In the RMM, the shape of the target object is represented by a symmetric positive definite (SPD) matrix. The elements of the matrix, along with the centroid of the object, form a state vector that is estimated by a filter. Multiple improvements to the RMM are presented in the literature [16, 17, 18, 19]. The situation where the measurement noise is comparable to the extent of the target and cannot be neglected is considered in [16, 20]. Considering that a target may change its size and shape abruptly, especially during maneuvers, a rotation or scaling matrix is multiplied on both sides of the SPD matrix and the corresponding filters are derived in [17, 18, 19]. The RHM assumes each measurement source lies on a scaled version of the true ellipse describing the object, and the extent of the object is represented by the entries of the Cholesky decomposition of the SPD matrix [21, 22, 23].
In [24], a multiplicative noise term in the measurement equation is used to model the spatial distribution of the measurements, and a second-order extended Kalman filter is derived for a closed-form recursive measurement update. In [25], comparisons between the RHM and the RMM are illustrated. RHMs with Fourier series expansions and level sets are applied for modeling star-convex and non-convex shapes, respectively [23, 26]. By approximating complex shapes as combinations of multiple elliptic sub-objects, elliptic RMMs have been investigated for modeling irregular shapes [17, 27]. A comprehensive overview of extended object tracking can be found in [8, 7].

The dynamic model of a moving extended object describes how the target's kinematic parameters and extent evolve over time. For tracking a point object, kinematic parameters such as position, velocity, or acceleration fully describe the state of the object. For an extended object, however, shape estimation is also important, especially when the target maneuvers or the shape of the extended target changes abruptly. For tracking an extended object using the RMM, there is no explicit dynamic model, and the update of the extent is based on a simple heuristic that increases the extent's covariance while keeping the expected value constant [14]. An alternative to the heuristic update is to use a Wishart distribution to approximate the transition density of the spatial extent [14, 28, 18]. The prediction update of extended targets within the RMM framework is explored by multiplying a rotation or scaling matrix on both sides of the SPD matrix in [19, 18]. In [18], comprehensive comparison results between four process models are presented. For tracking an elliptic extended object using the RHM, the covariance matrix of the uncertainty of the object's shape parameters is increased at each time step to capture variations in the shape [15].

Image moments have found wide use in tracking, visual servoing, and pattern recognition [29, 30, 2, 31]. Hu's moments [32], invariant under translation, rotation, and scaling of the object, are widely investigated in pattern recognition. In this paper, an alternative representation based on image moments is presented to describe an ellipse that approximates an extended object. Dynamic models of the image moments representing an extended object are derived for targets moving in uniform motion and in coordinated turn motion. The image moments based RHM is used with the interacting multiple model (IMM) approach [33, 34, 35] for tracking an extended target undergoing complex trajectories. A novel likelihood function based on the average log-likelihood is derived for the IMM. An unscented Kalman filter (UKF) is used to estimate the states of each individual model of the UKF-IMM filter. The UKF-IMM approach assumes that the target obeys one of a finite number of motion models and identifies the beginning and the end of each motion model by updating the mode probabilities. The adaptation via the model probability update keeps the estimation errors low during both maneuvering and non-maneuvering intervals. The contributions of the paper are briefly summarized as follows:

  • A minimal, complete, and non-ambiguous representation of an elliptic object based on image moments is presented for extended object tracking. A UKF-IMM filter is adopted based on the multiple dynamic models and the corresponding image moments based RHM.

  • A novel method of calculating the likelihood function, based on the average log-likelihood of the image moments based RHM, is proposed for the UKF-IMM filter. In order to estimate the model probabilities consistently, the average log-likelihood is computed via the unscented transformation.

  • Results of the UKF-IMM filter with the image moments based model are presented and compared with a benchmark algorithm to validate the performance of the proposed approach.

The rest of the paper is organized as follows. In Section II, the image moments based random hypersurface model is proposed to approximate an elliptic object, and its dynamic models are analytically derived. Following the framework of the random hypersurface model, the measurement model is also provided. In Section III, the Bayesian inference of the position, velocity, and extent of the object from noisy measurement points uniformly generated from the object is illustrated. Since the dynamic and measurement models are nonlinear, a UKF is applied to estimate the extended object. For tracking a moving target switching between maneuvering and non-maneuvering motions, the proposed image moments based RHM is embedded within the framework of the interacting multiple model (IMM) in Section IV. The UKF-IMM algorithm is illustrated with the proposed image moments based RHM, and the algorithm for the calculation of the likelihood function using the average log-likelihood and the unscented transformation is also proposed. In Section V, the proposed image moments based RHM with its dynamic models is evaluated in three tests: (1) a static scenario for validating the measurement model; (2) constant velocity and coordinated turn motions to validate the dynamic models; (3) two complex trajectories to validate the UKF-IMM algorithm with the proposed image moments based RHM, with its performance compared against the RMM models in [20] as a benchmark. The estimation results show that the proposed model provides comparable and accurate results. In Section VI, the proposed algorithm is applied to tracking a moving car along a real trajectory. Conclusions and future work are given in Section VII. To improve legibility, subindices such as the time step and the measurement number will be dropped in the following unless needed.

II Image Moments based Random Hypersurface Model

II-A Representation of the Ellipse using Image Moments

In this section, a generalized representation of the ellipse using image moments is presented. The moment $m_{ij}$ of an object in a 2D plane is defined by [29]

$$m_{ij} = \iint_{S} x^{i} y^{j} \, dx \, dy \qquad (1)$$

where $S$ is the surface of the object and $i, j$ are natural numbers. The centered moment is defined as [29]

$$\mu_{ij} = \iint_{S} (x - x_{g})^{i} (y - y_{g})^{j} \, dx \, dy \qquad (2)$$

where $x_{g} = m_{10}/m_{00}$, $y_{g} = m_{01}/m_{00}$, and $(x_{g}, y_{g})$ is the centroid of the object.
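As a concrete illustration of (1) and (2), when the object's surface is represented by points sampled uniformly from its interior, the surface integrals can be approximated by sample averages. The sketch below computes moments normalized by the area $m_{00}$ (i.e., $m_{ij}/m_{00}$ and $\mu_{ij}/\mu_{00}$); the function names are illustrative, not from the paper.

```python
import numpy as np

def norm_moment(pts, i, j):
    # Sample-average approximation of m_ij / m_00 for points
    # drawn uniformly from the object's surface S.
    x, y = pts[:, 0], pts[:, 1]
    return np.mean(x**i * y**j)

def norm_centered_moment(pts, i, j):
    # Sample-average approximation of mu_ij / mu_00, computed
    # about the sample centroid (x_g, y_g).
    xg, yg = pts.mean(axis=0)
    x, y = pts[:, 0], pts[:, 1]
    return np.mean((x - xg)**i * (y - yg)**j)
```

For a uniformly filled disk of radius $r$, the normalized second centered moment $\mu_{20}/\mu_{00}$ equals $r^2/4$, which gives a quick sanity check of the approximation.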

Any point on the surface of the object can be represented as a point located on the boundary of the scaled ellipse. The general equation of a family of ellipses in terms of semi-major, and semi-minor axes, centroid, and orientation is given by

(3)

where and are its semi-major and semi-minor axes, respectively, is related to the orientation of ellipse , as , and is a scale factor. The points inside the ellipse can be represented by varying from to in (3). Rewriting (3) as follows

(4)

Consider normalized centered moments , , , where is the area of the ellipse, , , and are centered moments. The following relationships between parameters of ellipse , , and the normalized centered image moments (, , ) can be derived [29]

(5)

Substituting (5) into (4), the following expression is obtained

(6)

The area of ellipse, , can be written in normalized centered moments and parameters , and as follows [29]

(7)

Using (7), (6) can be represented as follows

(8)

Let , where can be used to estimate the shape of the ellipse and represents the location of the centroid of the ellipse. An ellipse can be expressed using minimal, complete, and non-ambiguous representation of parameters , in the following form

(9)
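Assuming the standard relations between the second-order normalized centered moments and the parameters of a uniformly filled ellipse (for an axis-aligned ellipse, $n_{20} = a^2/4$, cf. [29]), the semi-axes and orientation can be recovered from the moments. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def ellipse_from_moments(n20, n11, n02):
    """Recover semi-axes (a, b) and orientation theta of a uniformly
    filled ellipse from its normalized centered moments. The semi-axes
    are twice the square roots of the eigenvalues of the second-moment
    matrix, since n20 = a^2/4 when the ellipse is axis-aligned."""
    M = np.array([[n20, n11], [n11, n02]])
    lam = np.linalg.eigvalsh(M)                  # ascending eigenvalues
    b, a = 2.0 * np.sqrt(lam)
    theta = 0.5 * np.arctan2(2.0 * n11, n20 - n02)
    return a, b, theta
```

For example, an axis-aligned ellipse with $a = 3$, $b = 1$ has $n_{20} = 9/4$, $n_{02} = 1/4$, $n_{11} = 0$, and the function returns those axes with zero orientation.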

II-B Dynamic Motion Models

In order to derive the differential equation for , the time derivative of the centered moment, is derived first. The time derivative of centered moment can be obtained from the time derivative of the contour of the ellipse as [29]

(10)

where is the contour of the ellipse, is the velocity of the contour point , is the unitary vector normal to at point , and is an infinitesimal element of . If is piece-wise continuous, and vector is tangent to and continuously differentiable, , the Green’s theorem can be used to represent (10) as [29]

(11)

Using the constant velocity and coordinated turn models, a specific differential equation of the moments is derived for each case.

II-B1 Linear Motion Model

When an elliptical object is moving with a linear motion, each point inside the ellipse at time obeys , where is the initial velocity and is the acceleration. The centered moments of the ellipse can be calculated by putting in (11) as

(12)

Since and are odd functions and is symmetric with respect to the centroid, the state space representation of the normalized centered moments of the ellipse is

(13)

The state at discrete time is given by , where is a component of the state related to image moments, is the vector that includes the position and velocity of the centroid of the extended object. The discretized state equation is given as follows

(14)

where the state transition matrix with , and is the zero-mean Gaussian noise with covariance matrix , where is the noise covariance for the image moments and , where is the power spectral density. Notice that the discretized white noise acceleration model is adopted for the state vector , which is the same as the dynamic model for point based tracking. Other kinematic models for point based tracking also can be used for the state vector and can be found in [33].
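As a hedged illustration of the kinematic part of the state in (14), the standard discretized white-noise acceleration (constant velocity) model used in point-based tracking [33] can be sketched as follows; `T` is the sampling period, `q` the power spectral density, and the state ordering `[x, vx, y, vy]` is an assumption for this sketch.

```python
import numpy as np

def cv_discrete(T, q):
    # Discretized white-noise acceleration (constant velocity) model
    # for the centroid state [x, vx, y, vy]; q is the power spectral
    # density of the continuous-time process noise.
    F1 = np.array([[1.0, T], [0.0, 1.0]])
    Q1 = q * np.array([[T**3 / 3, T**2 / 2],
                       [T**2 / 2, T]])
    F = np.kron(np.eye(2), F1)    # block-diagonal: same model per axis
    Q = np.kron(np.eye(2), Q1)
    return F, Q
```

Propagating a state `[0, 1, 0, 2]` over one second moves the position by the velocity components, as expected of a constant velocity model.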

II-B2 Coordinated Turn Motion Model

The coordinated turn (CT) model, characterized by a constant turning rate and constant speed, is commonly used in tracking applications (cf. [33]). An elliptic extended object during a coordinated turn is shown in Fig. 1. For any point that belongs to the ellipse moving with a CT motion, the motion model of the ellipse can be represented as follows

(15)

where is the turning rate and is the displacement between the origins of the reference frame and reference frame , the origin of the reference frame is the instantaneous center of rotation () of the object.

Figure 1: Coordinated turn model of the elliptic extended object.

Substituting (15) into (12), the differential equation of centered moments of ellipse when the object is undergoing coordinated turn motion is given by

(16)

The dynamic models of the normalized centered moments of the ellipse can be calculated using (16) as

(17)

The state space representation of the normalized centered moments of the ellipse are

(18)

and the solution to the state space in (18) is

(19)

where the transition matrix , where . The derivation of the transition matrix is shown in the Appendix A.

At each time step , the complete state to be tracked is , where is a component of the state corresponding to the image moments, is the vector that includes the position, velocity of the centroid of the extended object and the turning rate of the extended object. The state equation is given as follows

(20)

where the state transition matrix is obtained from (19), is a sampling period, , with , and is the zero-mean Gaussian noise vector. Notice that this model is piece-wise continuous.
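For the kinematic part of the state, the standard coordinated-turn transition matrix with known turn rate (cf. [33]) can be sketched as follows. This is only a sketch of the centroid position and velocity propagation, not the image-moment part of the transition in (19); the state ordering `[x, vx, y, vy]` is an assumption.

```python
import numpy as np

def ct_transition(T, w):
    # Standard coordinated-turn transition for [x, vx, y, vy] with
    # known turn rate w (rad/s) and sampling period T.
    if abs(w) < 1e-9:                    # fall back to straight-line motion
        return np.kron(np.eye(2), np.array([[1.0, T], [0.0, 1.0]]))
    s, c = np.sin(w * T), np.cos(w * T)
    return np.array([[1, s / w,       0, -(1 - c) / w],
                     [0, c,           0, -s],
                     [0, (1 - c) / w, 1,  s / w],
                     [0, s,           0,  c]])
```

A defining property of the CT model, preserved by this matrix, is that the speed of the target is unchanged while the velocity vector rotates.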

II-C Measurement Model

Assuming measurements are uniformly generated without sensor noise, (9) maps the unknown parameters to the pseudo-measurement with the squared scale term . The scaling factor is approximated as Gaussian distributed with mean and variance [12]. Considering the real measurement of the unknown true measurement in the presence of the additive white Gaussian noise , where and , the real measurement can be expressed as . To find the relationship between the state vector and the real measurement , the measurement model is derived by substituting in (9). The following expression is obtained

(21)

where is the pseudo-measurement with the true value of and is a polynomial related to the white noise , which has the mean

(22)

and covariance as

(23)

where . The derivation of and its first two moments are shown in Appendix B. Since the measurement model is highly nonlinear, the UKF, presented in the next section, is used to estimate the state vector .

III UKF for Extended Object Tracking using Image Moments Based RHM

Based on the dynamic motion models and the measurement model, a recursive Bayesian state estimator for tracking elliptic extended objects is derived. The state vector of the elliptic extended object is . At each time step, several measurement points from the volume or area of the object's extent are received. The task of the Bayesian state estimator is backward inference, i.e., inferring the true state parameters from the measurement points. The measurement points at time step are denoted as , assuming there are measurements at time and each measurement point is . The state vector up to time step , when all the measurements are incorporated, is denoted as . Supposing that the posterior probability density function (pdf) at time step is available, the prediction for time step is given by the Chapman-Kolmogorov equation as [36]

(24)

where the state vector evolves according to the conditional density function . Assuming the Markov property holds, the conditional density function can be derived from the dynamic models in Subsection II-B. Assuming the measurements at time are independent, the prediction is updated recursively via Bayes' rule as

(25)

where and .

When the target moves with uniform motion (the constant velocity model, which is a linear system), its state and covariance are predicted based on the dynamic model (14) as

(26)
(27)

However, the proposed image moments based RHM and its dynamic models, such as the coordinated turn model, are nonlinear. When the system is nonlinear, linearization methods such as the extended Kalman filter (EKF) can introduce large errors in the estimated posterior mean and covariance. The UKF addresses this problem via the unscented transformation (UT), which does not require the calculation of Jacobian and Hessian matrices. The UT sigma-point selection scheme results in approximations that are accurate to the third order for Gaussian inputs for all nonlinearities, and has the same order of overall computation as the EKF [37]. When the state variables with mean and covariance are propagated through a nonlinear function, such as (19) or (21), the mean and covariance of the output are approximated by generating the UT sigma points as [37]

(28)
(29)

where . The sigma points and the weights and are calculated by [37]

(30)

where is the scaling parameter defined as , is the parameter that determines the spread of the sigma points around the mean , is the secondary scaling parameter usually set to , and is the parameter used to incorporate prior knowledge of the distribution of . The UKF for the image moments based random hypersurface model is illustrated in Algorithm 1.

Set the time steps ;
Set the initial state vector and covariance ;
for =1 to N do
       case Constant velocity model do
             State is predicted as in (26); Covariance is predicted as in (27);
      case Coordinated turn model do
             Augment the state vector ; Calculate sigma points using (30); States prediction based on (20) with sigma points; Using (28), (29) to calculate the mean and covariance of the state vector ;
      Obtain the measurement points at time step ;
       for =1 to  do
             Calculate the mean and covariance of using (22) and (23);
             Augment the state vector ;
             Calculate sigma points using (30);
             Pseudo-measurement calculated based on (21) for measurement point ;
             Using (28), (29) to calculate the mean and covariance of the ;
             Update state vector ;
            
      
Algorithm 1 UKF with sequential processing of measurements.
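The sigma-point construction and the mean/covariance recovery of (28)-(30) can be sketched as follows, assuming the standard UT weights; the parameter names mirror the usual alpha, beta, kappa convention, and the function name is illustrative.

```python
import numpy as np

def unscented_transform(f, m, P, alpha=0.1, beta=2.0, kappa=0.0):
    """Propagate mean m and covariance P through a nonlinear function f
    using the standard unscented transform (2n+1 sigma points)."""
    n = len(m)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)        # matrix square root
    sigmas = np.vstack([m, m + S.T, m - S.T])    # rows are sigma points
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = Wm[0] + 1.0 - alpha**2 + beta
    Y = np.array([f(s) for s in sigmas])         # propagated points
    mean = Wm @ Y
    d = Y - mean
    cov = (Wc[:, None] * d).T @ d
    return mean, cov
```

For a linear function the transform is exact: propagating mean `[1, 1]` and identity covariance through `x -> A x` recovers `A m` and `A P A^T`.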

IV Tracking Extended Target with IMM

The proposed image moments based random hypersurface model is embedded within the IMM approach in this section for tracking an extended target undergoing complex trajectories. When the extended target switches between maneuvering and non-maneuvering behaviors, its kinematic state and spatial extent may change abruptly. Multiple model approaches, such as the interacting multiple model (IMM), are effective for tracking targets with complex trajectories, especially with a high maneuvering index (larger than ) [34, 33, 35]. The IMM approach assumes that the target obeys one of a finite number of motion models and identifies the beginning and the end of each motion model by updating the model probabilities. The adaptation via the model probability update helps the IMM approach keep the estimation errors consistently low, during both maneuvering and non-maneuvering intervals. Details of the IMM for point target tracking can be found in the literature, e.g., [33].

The proposed image moments based random hypersurface model with the dynamic motion models from Section II, i.e., the constant velocity model and the coordinated turn model, is integrated into an IMM framework. Since the dynamic motion models and the measurement model are nonlinear, the UKF-IMM algorithm is proposed. The flowchart of the UKF-IMM algorithm is shown in Fig. 2, where is the mixing probability, is the Markov chain transition matrix between the and models, and is the likelihood function corresponding to the model. Since there are multiple measurement points at each time step, a sequential approach is adopted for the UKF, and the likelihood function is generated based on the measurement model.

Figure 2: Flowchart of UKF-IMM framework.
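The probability bookkeeping of one IMM cycle, i.e., forming the mixing probabilities from the Markov transition matrix and the prior model probabilities, then updating the model probabilities with the filter likelihoods, can be sketched as follows; the function name and array layout are illustrative, not from the paper.

```python
import numpy as np

def imm_mix_and_update(mu, Pij, Lambda):
    """One cycle of IMM probability bookkeeping.
    mu     : prior model probabilities, shape (r,)
    Pij    : Markov transition matrix, Pij[i, j] = P(model j | model i)
    Lambda : measurement likelihood of each model's filter, shape (r,)
    Returns the mixing probabilities mu_ij[i, j] (used to mix the model
    estimates before prediction) and the updated model probabilities."""
    cbar = mu @ Pij                           # predicted model probabilities
    mu_ij = (Pij * mu[:, None]) / cbar[None, :]
    mu_new = Lambda * cbar                    # Bayes update with likelihoods
    mu_new /= mu_new.sum()
    return mu_ij, mu_new
```

For instance, with equal priors, a symmetric transition matrix, and one model twice as likely as the other, the updated probabilities become 2/3 and 1/3.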

At each time step, assume there are measurements . A pseudo-measurement variable can be generated for each measurement , based on the predicted state vector , covariance , and the measurement model in (21). The mean and the covariance of the pseudo-measurement variable can be obtained by the unscented transformation (UT). Assuming the measurements are independent and identically Gaussian distributed, the log-likelihood function based on the pseudo-measurement variable is

(31)

where and are the mean and covariance of the pseudo-measurement generated for each measurement point . In many cases, the likelihood itself can become extremely small. To avoid this numerical issue, the average log-likelihood is used, which is given by

(32)

and

(33)

which is the value of the measurement likelihood between and . This measurement likelihood is used in the IMM filter. The details of the calculation of the measurement likelihood are shown in Algorithm 2.

Obtain the predicted state vector and covariance of model ;
Obtain the measurement points at time step ;
for =1 to  do
       Calculate the mean and covariance of using (22) and (23);
       Augment the state vector ;
       Calculate sigma points using (30);
       Propagate sigma points through the measurement model in (21);
       Using (28), (29) to calculate the mean and covariance of the pseudo-measurement ;
       Summation of the value of the log-likelihood function using (31);
      
Calculation of the value of the average log-likelihood function using (32) and the measurement likelihood using (33);
Algorithm 2 Calculation of the measurement likelihood corresponding to the model by unscented transformation.
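The average log-likelihood idea of (31)-(33) can be illustrated for scalar Gaussian pseudo-measurements. This is a sketch with illustrative names: averaging the log terms before exponentiating keeps the result in a usable range, whereas the product of many small per-measurement likelihoods would underflow.

```python
import numpy as np

def average_likelihood(z, means, variances):
    """Likelihood from the average Gaussian log-likelihood of n scalar
    pseudo-measurements z under independent Gaussians with the given
    means and variances: exp( (1/n) * sum_i log N(z_i; m_i, v_i) )."""
    z, means, variances = map(np.asarray, (z, means, variances))
    log_l = -0.5 * (np.log(2.0 * np.pi * variances)
                    + (z - means)**2 / variances)
    return np.exp(log_l.mean())               # geometric-mean likelihood
```

The result is the geometric mean of the per-measurement likelihoods, so it stays bounded regardless of how many measurement points are processed, while still decreasing as the pseudo-measurements move away from their predicted means.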

V Simulation Results

In this section, several simulation tests are conducted to evaluate the performance of the proposed image moments based extended object tracking. To validate the measurement model in (21), the shapes of static objects are estimated with different noise levels in the first simulation. Then, tracking of an extended target moving with linear motion and with coordinated turn motion is demonstrated. The constant velocity model in (14) and the nearly coordinated turn model in (20) are used and validated for these cases. Two targets, with plus-sign and ellipse shapes, are used in the simulations. Finally, tracking of targets moving with maneuvering and non-maneuvering intervals is presented. Two scenarios are simulated in this test: one with slow motion and maneuvers and the other with fast motion and maneuvers. The UKF-IMM algorithm with the constant velocity model and the nearly coordinated turn model is applied in these cases. The RMM and its combination with the IMM in [20] are implemented as a benchmark for comparison with the proposed image moments based random hypersurface model.

The intersection over union (IoU) is used as a metric to evaluate the proposed algorithm. The IoU is defined as the area of the intersection of the estimated shape and the true shape divided by the area of their union [38]

$$\mathrm{IoU}(x, \hat{x}) = \frac{\mathrm{Area}\big(E(x) \cap E(\hat{x})\big)}{\mathrm{Area}\big(E(x) \cup E(\hat{x})\big)} \qquad (34)$$

where $x$ is the true state vector, $\hat{x}$ is the estimated state vector, and $E(\cdot)$ denotes the elliptic region described by a state. The IoU lies between $0$ and $1$, where the value $1$ corresponds to a perfect match between the estimated area and the ground truth. Additionally, the root mean squared errors (RMSE) of the estimated position and velocity of the centroid of the extended target are also evaluated, defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \lVert e_{m} \rVert^{2}} \qquad (35)$$

where $M$ is the number of Monte Carlo runs and $e_{m}$ is the estimation error of the $m$th run. For the RMSE of the position, the error is the difference between the estimated centroid of the extended target and the ground truth. Similarly, for the RMSE of the velocity, the estimation error is the difference between the estimated velocity of the centroid and the ground truth.
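Since the intersection of two rotated ellipses has no simple closed form, one practical way to evaluate the IoU metric is Monte Carlo sampling over a bounding box. The following sketch is illustrative (not the paper's implementation) and assumes an ellipse is given as `(cx, cy, a, b, theta)`.

```python
import numpy as np

def in_ellipse(pts, cx, cy, a, b, th):
    # Membership test for a rotated ellipse: transform points into the
    # ellipse's body frame and test the canonical equation.
    c, s = np.cos(th), np.sin(th)
    u = (pts[:, 0] - cx) * c + (pts[:, 1] - cy) * s
    v = -(pts[:, 0] - cx) * s + (pts[:, 1] - cy) * c
    return (u / a)**2 + (v / b)**2 <= 1.0

def iou_mc(e1, e2, n=100000, seed=0):
    # Monte Carlo IoU: sample a box covering both ellipses and count
    # intersection hits over union hits.
    rng = np.random.default_rng(seed)
    cx1, cy1, a1, b1, _ = e1
    cx2, cy2, a2, b2, _ = e2
    r = max(a1, b1, a2, b2)
    lo = [min(cx1, cx2) - r, min(cy1, cy2) - r]
    hi = [max(cx1, cx2) + r, max(cy1, cy2) + r]
    pts = rng.uniform(lo, hi, size=(n, 2))
    m1, m2 = in_ellipse(pts, *e1), in_ellipse(pts, *e2)
    return (m1 & m2).sum() / (m1 | m2).sum()
```

A quick check: two concentric circles of radii 1 and 2 have IoU $\pi/(4\pi) = 0.25$, and an ellipse compared with itself yields exactly 1.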

V-a Static Extended Objects

The plus-sign-shaped target is made up of two rectangles with widths and heights of and , and and , respectively. The major and minor axes of the elliptic target are set to and , respectively. The simulation is performed by uniformly sampling points from the static extended objects. Three different levels of additive white Gaussian noise, with variances (low), (medium), and (high), are used to generate the noisy measurements.

The UKF is used for estimating the state given noisy measurements of points uniformly sampled from the plus-sign-shaped and ellipse-shaped extended objects. The state is initialized as a circle with a radius of , located at the origin. The estimation results for the plus-sign-shaped object are shown in Figs. 3(a), 3(b), 3(c), and the estimation results for the ellipse-shaped object are shown in Figs. 3(d), 3(e), 3(f). The mean values of the IoU for the static ellipse and plus-sign-shaped targets with different noise levels are shown in Table I. The image moments based measurement model can precisely estimate the shapes of the targets. As the covariance of the measurement noise increases, the proposed image moments based model still gives a shape close to the actual shape of the targets. The IoU value for the plus-sign-shaped target is lower than for the elliptic target because an ellipse is used to roughly approximate the plus-sign shape.


Figure 3: Estimation of the shape of extended targets with different measurement noise levels in a particular run; estimation of the shape is drawn in green color and the gray shape is the ground truth. (a) - (c) estimations of the shape of the extended target (with the shape of the plus-sign) with three different measurement noise levels; (d) - (f) estimations of the elliptic target with three different measurement noise levels.
Target shape Static target Linear motion Coordinated turn motion
Low Medium High
Ellipse 0.88 0.85 0.88 0.87
Plus-sign 0.48 0.47 0.46 0.48
Table I: The mean value of the Intersection-Over-Union (IoU) between the true and the estimated target regions in different simulated scenarios. Three noise levels (low, medium, and high) are used to evaluate the static targets (ellipse and plus-sign-shaped target). The mean value is calculated over 100 Monte Carlo runs.

(a) ellipse     (b) plus-sign     (c) ellipse     (d) plus-sign

Figure 4: Tracking of the extended objects during constant velocity model or coordinated turn model in a particular run; Initial shape is shown in red color, estimation of the shape is drawn in green color and the gray shape is the ground truth; (a) ellipse with a constant velocity; (b) plus-sign with a constant velocity; (c) the ellipse executes a coordinated turn; (d) the plus-sign executes a coordinated turn.

V-B Linear Motion

In this subsection, extended objects with plus-sign and elliptical shapes moving with a constant velocity are simulated. The plus-sign shaped target is made up of two rectangles with the width and height of and , and and , respectively. For the ellipse-shaped object, the major and minor axes are set to and , respectively. The extended objects start moving from position with a constant velocity of for seconds and the measurements are generated from the targets at every seconds. At each time step , measurement points uniformly sampled from the objects are generated.

For UKF implementation, the states are initialized as a circle with radius of , located at with a constant velocity of . The Gaussian white noise variance is selected as for each point measurement. The parameter for the process noise covariance in the constant velocity model in (14) is set as and . The tracking results for ellipse-shaped extended object are shown in Fig. 4(a) and for plus-sign-shaped extended object are shown in Fig. 4(b). It can be seen that the shapes of targets are being estimated accurately as more measurements are obtained. The mean value of the RMSE of the position over Monte Carlo runs is for ellipse and for the plus-sign. The mean value of the RMSE of the velocity over Monte Carlo runs is for ellipse and for the plus-sign. The mean values of the IoU of the ellipse and the plus-sign-shaped targets during the linear motion are shown in Table I.

V-C Coordinated Turn Motion

The extended object undergoing coordinated turn is simulated in this case. The extended object with the shape of the plus-sign starts from with velocity at time , then it executes a coordinated turn for seconds. The extended elliptic object executes a coordinated turn for seconds. The sampling interval is seconds. At each time step, noisy measurement points are uniformly generated from the extents of the targets. The noise variance is selected as for each point measurement.

The extended objects executing a coordinated turn are estimated based on the dynamic model (20). The states are initialized as a circular shape with a radius of . The tracking results for the ellipse-shaped object are shown in Fig. 4(c) and for the plus-sign-shaped object in Fig. 4(d). The mean value of the RMSE of the position over Monte Carlo runs is for the ellipse and for the plus-sign. The mean value of the RMSE of the velocity over Monte Carlo runs is for the ellipse and for the plus-sign. The mean values of the IoU of the ellipse and the plus-sign-shaped targets during the coordinated turn motion are shown in Table I. The image moments based model, which provides a dynamic model for the shape of an extended object undergoing a coordinated turn, estimates the positions and velocities of the targets, as well as their orientations and extents, very accurately.

V-D Complex trajectories

The image moments based RHM is embedded in the IMM framework. The proposed model is tested in two simulations of extended elliptical objects switching between maneuvering and non-maneuvering intervals multiple times.

V-D1 Slow motion and maneuvering case

The target moves with a constant velocity of , with initial state in Cartesian coordinates (position in m). The target first executes a coordinated turn with a turning rate of at second 260 for 100 seconds; it then goes through two coordinated turns with a turning rate of at seconds 570 and 830, each for seconds. The trajectory is shown in Fig. 5. The major and minor axes of the elliptical target are set to m and m, respectively. The number of measurements in each scan is generated from a Poisson distribution with mean , and the measurement points are uniformly distributed. The variance of the measurement noise is , and the sampling time is s.

Figure 5: The trajectory, measurements and one example run of the simulation. Estimation results are shown for every seconds.


Figure 6: Simulation results of the proposed image moments based RHM algorithm compared with the RMM algorithm in [20] over Monte Carlo runs: (a) The average RMSE of the position of the centroid; (b) The average RMSE of the velocity of the centroid; (c) The average IoU.

The proposed image moments based random hypersurface model with the UKF-IMM algorithm is compared with the RMM-IMM algorithm in [20]. The RMM-IMM algorithm uses two models: one with high kinematic process noise and high extension agility accounts for abrupt changes in shape and orientation during maneuvers, and another with low kinematic noise and low extension agility accounts for the non-maneuvering intervals. The extension agility is set to 10 and 5 for the two models, respectively. The kinematic states of both models use the constant velocity model (the kinematic dynamic model in (14)), and the parameter in (14) is set to and , respectively. The proposed image moments based RHM with the UKF-IMM filter combines the constant velocity model in (14) and the coordinated turn model in (20). The parameter for the process noise covariance in the constant velocity model in (14) is set to and ; for the coordinated turn model in (20), . The initial probabilities of the two models in the IMM filter are set equal for both algorithms, and the Markov chain transition matrix is selected as . The model probability of the proposed algorithm is shown in Fig. 7. On the same trajectory, the two algorithms are evaluated over 1000 Monte Carlo runs and their simulation results are shown in Fig. 6. The proposed algorithm has lower RMS errors for both position and velocity, while the RMM algorithm in [20] has better IoU values.
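The model-probability evolution in Fig. 7 comes from the standard IMM reweighting step. A simplified sketch of that step (not the authors' exact implementation; `imm_update_probs` is a hypothetical helper) mixes the previous probabilities through the Markov transition matrix and reweights them by each filter's measurement log-likelihood:

```python
import numpy as np

def imm_update_probs(mu_prev, Pi, loglik):
    """One IMM cycle for the model probabilities.

    mu_prev: previous model probabilities, shape (M,)
    Pi:      Markov chain transition matrix, Pi[i, j] = P(model j | model i)
    loglik:  per-model measurement log-likelihoods, shape (M,)
    """
    c = Pi.T @ mu_prev                    # predicted model probabilities
    w = np.exp(loglik - np.max(loglik))   # numerically stable likelihoods
    mu = c * w
    return mu / mu.sum()                  # normalize to a distribution

# Example: two models (CV and CT), the CV filter fits the scan better.
mu = imm_update_probs(np.array([0.5, 0.5]),
                      np.array([[0.95, 0.05], [0.05, 0.95]]),
                      np.array([-2.0, -5.0]))
```

Subtracting the maximum log-likelihood before exponentiating avoids underflow when the per-scan log-likelihoods are strongly negative.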

Figure 7: Model probability of the UKF-IMM filter for the image moments based RHM.

V-D2 Fast motion and maneuvering case

The typical trajectory from [33] is used for this simulation and is shown in Fig. 8. The target moves with a constant velocity of , with initial state in Cartesian coordinates (position in m). The details of its maneuvering and non-maneuvering intervals are given in Table II. For ease of visualization, the major and minor axes of the elliptical target are enlarged to m and m, respectively. The number of measurements in each scan is generated from a Poisson distribution with mean , and the measurement points are uniformly distributed over the extent. The variance of the sensor noise is , and the sampling time is s.

Figure 8: Trajectory of the extended object and the estimation results of the image moments based RHM in one run.
Time (second)   Model   Turning rate ()   Turning direction   Acceleration
                CV      0                 -                   0
                CT      2                 left                0.89g
                CV      0                 -                   0
                CT      1                 right               0.45g
                CT      1                 left                0.45g
                CT      1                 right               0.45g
                CV      0                 -                   0
Table II: Details on the complex trajectory.

The RMM-IMM algorithm [20] again consists of two models. The extension agility is set to 10 and 5 for the two models, respectively. The kinematic states of both models use the constant velocity model (the kinematic dynamic model in (14)), and the parameter in (14) is set to and , respectively. The proposed image moments based RHM with the UKF-IMM filter combines the constant velocity model in (14) and the coordinated turn model in (20). The parameter for the process noise covariance in the constant velocity model in (14) is set to and ; for the coordinated turn model in (20), . The initial probabilities of the two models in the IMM filter are set equal for both algorithms, and the Markov chain transition matrix is selected as . The model probability of the proposed algorithm is shown in Fig. 10. On the same trajectory, the two algorithms are evaluated over 1000 Monte Carlo runs and their simulation results are shown in Fig. 9.

(a) Average RMSE of position     (b) Average RMSE of velocity     (c) Average IoU

Figure 9: Comparison of the estimation results of the image moments based RHM and the RMM algorithm developed in [20].
Figure 10: Average model probabilities in runs.

The proposed image moments based RHM and its measurement and dynamic models are validated in simulations of static targets, targets in linear motion, and targets in coordinated turn motion. The size of the estimated elliptical shape does not inflate as the sensor noise level increases. When the targets undergo linear motion or coordinated turn motion, the proposed algorithm predicts the position and velocity of the moving target, as well as its spatial extent and orientation. To estimate a target that switches between maneuvering and non-maneuvering intervals, the proposed image moments based RHM is embedded in the IMM framework. The proposed average measurement log-likelihood function estimates the model probabilities accurately and consistently. The RMSE values of the position and velocity of the target's centroid are lower than those of the RMM. The state variables of the RMM are the centroid and the random matrix, which are updated based on the mean and spread matrix of the measurement points [20]. The proposed RHM uses the centroid and the three image moments as the state variables, which are updated based on each individual measurement point. When the measurement points are few or noisy, so that their mean lies far from the true centroid, the proposed image moments based RHM still estimates the position and velocity of the centroid accurately. The accurate dynamic model is advantageous for predicting the location of the target, especially when the target undergoes fast motion and the sampling frequency is relatively low.
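The averaging in the measurement log-likelihood keeps the per-scan likelihood on a comparable scale regardless of how many points a scan contains. A minimal sketch under a Gaussian assumption (the helper name and the common innovation covariance `S` are illustrative choices, not the paper's exact formulation) is:

```python
import numpy as np

def avg_log_likelihood(points, preds, S):
    """Average per-point Gaussian log-likelihood of one scan.

    points: measured points, shape (n, 2)
    preds:  predicted measurements for those points, shape (n, 2)
    S:      common 2x2 innovation covariance
    """
    d = points - preds
    Sinv = np.linalg.inv(S)
    logdet = np.log(np.linalg.det(2.0 * np.pi * S))
    # Quadratic form d_i^T S^{-1} d_i for every point i at once.
    quad = np.einsum("ij,jk,ik->i", d, Sinv, d)
    return float(np.mean(-0.5 * (logdet + quad)))
```

Because the per-point terms are averaged rather than summed, a scan with 50 points and a scan with 5 points yield log-likelihoods of similar magnitude, which keeps the IMM model-probability update consistent across scans.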

Figure 11: Illustration of the estimation results in a particular run of the proposed image moments based RHM. The target is initially estimated as a circle (green circle) with a radius of 20 pixels; the estimated results (yellow ellipse), the measurements (red crosses) and the ground truth (blue box) are shown every 40 frames.

(a) Average RMSE of the position
(b) Average IoU

Figure 12: Tracking results of the proposed image moments based RHM algorithm over Monte Carlo runs.

VI Experiment

In this section, the proposed image moments based RHM is applied to track a moving car, represented as an extended object, in a real video. A short clip from the Stanford drone dataset [39] is used, which shows a moving car from a bird's-eye view. The video is captured with a 4K camera mounted on a quadcopter platform (a 3DR Solo) hovering above an intersection on a university campus at an altitude of approximately meters; the clip contains 431 frames with an image size of 1422 by 1945 pixels, and the video has been undistorted and stabilized [39]. The ground truth is manually labeled at each frame and the measurement points are uniformly generated inside the bounding box of the ground truth. The number of measurements in each frame is generated from a Poisson distribution with mean . The sensor noise is Gaussian white noise with variance . In Fig. 11, the first top-view scene of the moving car is shown, and snapshots of the estimation results from the 431 frames are plotted in the same figure. The target switches between linear and rotational motions. The constant velocity model in (14) and the coordinated turn model in (20) with the UKF-IMM filter are applied to track the moving car. The parameter for the process noise covariance in the constant velocity model in (14) is set to and ; for the coordinated turn model in (20), (with position in pixels). The initial probabilities of the two models in the IMM filter are set equal and the Markov chain transition matrix is selected as .

The proposed algorithm is run with 1000 Monte Carlo runs and the estimation results are shown in Fig. 12. The mean RMSE of the centroid position over the Monte Carlo runs is . The mean IoU over the Monte Carlo runs is (the ground truth is approximated as an ellipse whose axes equal the width and height of the corresponding bounding box and which has the same orientation).

VII Conclusion

In this paper, a minimal, complete, and non-ambiguous representation of an elliptic object is modeled based on image moments for extended object tracking. The measurement model and the dynamic models of the image moments for linear motion and coordinated turn motion are analytically derived. The unscented Kalman filter and its combination with the interacting multiple model approach are applied to estimate the position, velocity and spatial extent based on noisy measurement points uniformly generated from the extended target. The proposed image moments based random hypersurface model and its filters are validated and evaluated in different simulation scenarios and on one real trajectory. The evaluation results show that the proposed model and its inference provide accurate estimates of the position, velocity and extent of the targets. The proposed image moments based RHM for tracking extended objects can be embedded into other Bayesian methods, such as multiple hypothesis tracking techniques or probabilistic data association filters.

Appendix A Transition matrix of the coordinated turn motion

(36)

where . The solution to this linear time-invariant state-space equation (36) is

(37)

where .
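For the standard planar coordinated-turn kinematics with state [x, y, vx, vy] and turn rate omega, the matrix exponential in (37) has a well-known closed form. The sketch below (kinematic part only, omitting the shape states of the full model (20)) verifies it against a numerical matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def ct_transition(omega, T):
    """Closed-form transition matrix of the planar coordinated-turn model
    with state [x, y, vx, vy] and constant turn rate omega."""
    s, c = np.sin(omega * T), np.cos(omega * T)
    return np.array([
        [1.0, 0.0, s / omega, -(1.0 - c) / omega],
        [0.0, 1.0, (1.0 - c) / omega, s / omega],
        [0.0, 0.0, c, -s],
        [0.0, 0.0, s, c],
    ])

# Continuous-time system matrix of dot(x) = A x; its eigenvalues are
# 0 (repeated) and +/- j*omega, matching the spectrum used in the
# interpolation polynomial method.
omega, T = 0.2, 0.5
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, -omega],
              [0.0, 0.0, omega, 0.0]])
assert np.allclose(expm(A * T), ct_transition(omega, T))
```

The agreement of `expm(A * T)` with the closed form is exactly what the interpolation polynomial derivation below establishes analytically.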

The interpolation polynomial method [33] is used to obtain the transition matrix of the dynamic equation . First, by solving , the eigenvalues of the matrix are calculated as , and . Then, a polynomial of degree is found that is equal to on the spectrum of , that is

(38)

where and . The polynomial is calculated as