Particle Filter Re-detection for Visual Tracking via Correlation Filters


Di Yuan, Donghao Li, Zhenyu He (zhenyuhe@hit.edu.cn), School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Xinming Zhang (xinmingxueshu@hit.edu.cn), School of Science, Harbin Institute of Technology, Shenzhen, China
Abstract

Most correlation filter based tracking algorithms achieve good performance while maintaining a fast computational speed. However, in some complicated tracking scenes, they suffer from a defect that can cause the object to be located inaccurately. To address this problem, we propose a particle filter redetection based tracking approach for accurate object localization. During tracking, the kernelized correlation filter (KCF) based tracker locates the object by relying on the maximum value of the response map; when the response map becomes ambiguous, the KCF tracking result becomes unreliable. Our method provides more candidates by particle resampling and detects the object accordingly. Additionally, we give a new object scale evaluation mechanism that merely considers the differences between the maximum response values in consecutive frames. Extensive experiments on the OTB2013 and OTB2015 datasets demonstrate that the proposed tracker performs favorably against state-of-the-art methods.

keywords:
visual tracking, correlation filter, particle filter redetection, scale evaluation
journal: Knowledge-Based Systems

1 Introduction

Visual object tracking is an important topic in computer vision and plays a necessary role in numerous applications, such as video surveillance, automobile navigation, human-computer interfaces, robotics and driverless vehicles. Although substantial progress has been made in recent years, achieving higher efficiency with lower computational complexity in visual object tracking is still a tough problem.

Many different methods have been proposed for visual object tracking in recent decades KAP:Isard (); IEEE:Hossain (); MOT (); Zhang2017Robust (). Considering that particle filters can handle non-linear and non-Gaussian estimation problems and achieve high performance KAP:Isard (), Kabir and Chi-Woo IEEE:Hossain () develop an observation model based on the robustness of phase correlation in a particle filter framework to address occlusion in visual object tracking. Li et al. IEEE:Li () propose a visual object tracking method based on adaptive background modelling to improve the robustness of the particle filter framework. There are also improved trackers that achieve better precision and robustness than the traditional particle filter based trackers Yi2015Single (); Ou2017Object (); Yu2010Infrared (). Although particle filter based trackers have some advantages, their high computational complexity remains a fatal flaw. In this paper, we use a particle resampling strategy to provide more target candidates and use the correlation filter to choose the best one as the target object, which improves the computational efficiency.

Recently, correlation filter based tracking algorithms have achieved remarkable results IEEE:MOSSE (); He2015One (); Springer:DSST (); He2016Robust (); liu2017deep (). Typically, the correlation filter is designed so that the peak of the response in the scene corresponds to the tracking target, while low-response positions correspond to the background. Although such a filter can locate the tracking target effectively, the training process requires a large number of samples, which reduces the tracking speed. By using an adaptive training scheme, Bolme et al. IEEE:MOSSE () propose a minimum output sum of squared error (MOSSE) filter whose tracking results are quite robust and effective. After that, a series of correlation filter based trackers were developed Springer:SAMF (); Springer:RPAC (); fDSST (); Multi-view (). Although these correlation filter based trackers have achieved good performance in visual object tracking, all of them are excessively dependent on the maximum response value. When the response map becomes unreliable, the maximum response value becomes smaller. Under these circumstances, the object determined by the response map may drift or become lost, so an efficient redetection mechanism is very important in a tracking algorithm WEN2008TWO (); Ou2014Robust (); Jing2015Super (); Lai2016Approximate (); Shi2016Two (); Qi2017Structure ().

To provide more credible candidates for the target object, we develop an effective redetection model for visual tracking. In each frame after initialization, an image patch around the previously estimated position is cropped as the search window and HOG features are extracted to describe it. Subsequently, the convolution between the input features and the correlation filter is performed in the frequency domain. After that, a response map is obtained by the inverse fast Fourier transform (IFFT), from which we obtain the maximum response value. If the maximum response value is larger than a threshold, the coordinate of the maximum response value is taken as the object's new position. If the maximum response value is less than the threshold, the location of the target is redetected by using the particle filter to resample more candidates. Lastly, the appearance at the newly estimated position is extracted to train and update the correlation filter.

The main contributions of this paper can be summarized as follows:

  • We propose an efficient method for accurately locating the tracking object by particle filter redetection. This method allows us to redetect the location of the target object if the result of the correlation filter tracking is ambiguous or unreliable.

  • A novel scale-evaluation strategy is given by comparing the relationship of the maximum response values in consecutive frames. This scale evaluation mechanism can effectively reduce the impact of variations in the scale of the target on the performance and increase the robustness of the algorithm.

The rest of this paper is structured as follows. We first introduce some related works in Section 2. Next, we propose the particle filter redetection tracker via correlation filters, including the introduction to the basic KCF tracker, the particle filter redetection model, the scale evaluation and the model update in Section 3. Then, we present the implementation details of our tracker in Section 4. Subsequently, we introduce the evaluation criterion and evaluate our approach on comprehensive benchmark datasets in Section 5. Finally, we briefly present the conclusion of our work in Section 6.

2 Related Works

Visual object tracking has been studied extensively and has multifarious applications in real world scenes. As a comprehensive review of particle filter technology and correlation filter technology is not necessary for this paper, we review only the works related to our method for simplicity, namely particle filter based trackers and correlation filter based trackers.

2.1 Particle Filter Based Trackers

Particle filter based algorithms have been studied in visual object tracking for many years and their variations are still widely used nowadays IEEE:Hossain (); IEEE:Li (); Yuan2017Patch (); Mai2016Optimization (); Fazli2009Particle (). The traditional particle filter algorithm implements a recursive Bayesian framework by using the nonparametric Monte Carlo sampling method, which can effectively track target objects in most scenes KAP:Isard (). However, because the particle number and the target template are initialized manually, the number of particles is hard to decide and the target template selection is not accurate enough. Li et al. IEEE:Li () presented an improved particle filter algorithm that achieves semi-automatic initialization of the tracking object. Within the particle filter framework, Kabir and Chi-Woo IEEE:Hossain () proposed a phase dependent robust observation model and introduced an optimization method to improve the precision. Because particle filters model the uncertainty of object movements, they provide a robust tracking framework and can consider multiple state hypotheses simultaneously. Zhou et al. Zhou2016Adaptive () integrated multiple cues into a particle filter framework and introduced a quality function to calculate the reliability of each cue. Mai and Kim Mai2016Optimization () developed a particle filter based tracker by using color distribution features and utilized the computing power of embedded systems to reduce the complexity of the tracker. By embedding deterministic linear prediction in stochastic diffusion, an adaptive method has been proposed to adjust the number of samples according to an adaptive noise component Zhou2004Visual (). Although these particle filter frameworks have achieved good performance, they still suffer from one drawback: high computational complexity.

Unlike these methods, our method uses the properties of circulant matrices and the conversion between the time and frequency domains to reduce the computational cost. It can also handle situations in which the target object is lost by the correlation filter based tracker, since the particle resampling strategy allows our method to produce high-accuracy predictions from previous observations.

2.2 Correlation Filter Based Trackers

Since the correlation operation converts the convolution of two image blocks into element-wise products in the Fourier domain, it has been applied to visual object tracking thanks to its fast computational speed. Bolme et al. IEEE:MOSSE () introduced an adaptive correlation filter by minimizing the output sum of squared error, making the tracking strategy simpler and more effective. Henriques et al. Springer:CSK () proposed the CSK tracker, which provides good performance and a high calculation speed. These two trackers both use a single-channel gray value feature. Danelljan et al. IEEE:CN () improved the CSK method by using color attributes. In IEEE:KCF (), the KCF method further improves the efficiency of the CSK tracker by using HOG features and the kernel trick to transform the non-linear regression problem into a linear one. For the scale evaluation problem, the discriminative scale space tracking (DSST) Springer:DSST () tracker uses HOG features to learn an adaptive multi-scale correlation filter to handle scale changes of the target. Zhang et al. Zhang2016In () exploited the circulant structure property of a target template to improve sparse representation based trackers. To improve robustness, some local patch or part based correlation filter trackers have also been developed Ou2016Multi (); Li2015Reliable (); Guo2015Robust (); Liu2015Real (); Liu2016Structural (). Li et al. Li2015Reliable () introduced reliable patches, whose distributions are modelled under a sequential Monte Carlo framework, to exploit local context for the tracking task. In Liu2015Real (), a part based multiple correlation filter is proposed that preserves the structure of the target by adopting a Bayesian inference framework and a structural constraint mask to make the tracker robust. Liu et al. Liu2016Structural () proposed a part based structural correlation filter and exploited circular shifts of those parts to preserve the structure of the target for visual tracking. However, these correlation filter based tracking methods are exceedingly dependent on the maximum response value. Therefore, they may lose the tracked object when the maximum response value becomes ambiguous or unreliable.

Unlike the existing correlation filter based tracking methods, which are excessively dependent on the maximum response value to locate the target, we propose a particle filter redetection correlation filter tracker. When the correlation filter based tracking result becomes unreliable, the particle filter redetection method exploits the particle resampling strategy to provide more object candidates, which greatly enhances the robustness of the tracking method.

3 The Proposed Method

In this section, we give the overall algorithm framework in Section 3.1, introduce the basic framework of the kernelized correlation filter based tracker in Section 3.2, propose the particle filter redetection method for visual tracking in Section 3.3, give a simple but effective scale evaluation algorithm in Section 3.4, and finally propose a model update strategy in Section 3.5.

3.1 Overview of the Proposed Method

As illustrated in Figure 1, the proposed approach consists of two parts: the CF-tracking part, which is used to track the target object directly, and the redetection part, which is used to re-detect the target object. During the tracking process, the features are extracted according to the known target position in the first frame, and the correlation filter is trained directly. The target size rarely changes noticeably between the first two frames, so we use the same object size for both. Next, in the $t$-th frame ($t \geq 2$), the features are extracted from the search window and the response map is computed with the learned correlation filter. Then, by comparing the maximum response value $R_{\max}$ and the threshold $\theta$, we can determine which part is used to track the target object. If $R_{\max} \geq \theta$, the tracker gives the tracking result directly; otherwise, the redetection part is used to track the target object. Finally, the tracking result is used to train and update the correlation filter, and the process repeats until the last frame.

Figure 1: The framework of our approach is comprised of two parts: the CF-tracking part and the redetection part. For the $t$-th frame, the maximum response value $R_{\max}$ is compared with the threshold $\theta$ to determine which part is used to track the target object.

3.2 Kernelized Correlation Filter Based Tracking Framework

Before the detailed discussion of our proposed framework, and for completeness, we first revisit the conventional KCF IEEE:KCF () based tracking method. The KCF tracking method trains a classifier through dense sampling from an image patch. Thanks to the kernel trick, the sample data matrix is highly structured (circulant), which allows all cyclic shifts to be handled efficiently. Moreover, according to the Convolution Theorem, the convolution of two patches in the spatial domain can be obtained simply as an element-wise product in the Fourier domain. Therefore, for correlation filter based trackers, the computational efficiency can be greatly improved by the Fast Fourier Transform (FFT) and its inverse. However, the KCF tracker uses the target object appearance to train and update its model, and if the object is heavily occluded or moves fast, the tracker may fail to detect it.
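To make the Fourier-domain shortcut concrete, the following minimal NumPy sketch (an illustration only; the authors' implementation is in MATLAB) checks that circular correlation computed via an element-wise product of spectra agrees with the direct spatial-domain definition:

```python
import numpy as np

def circular_correlation_fft(x, h):
    """Circular cross-correlation of two equally sized 2-D patches via the FFT.

    By the Convolution Theorem, this equals an element-wise product in the
    Fourier domain (with one spectrum conjugated), which is what gives
    correlation filter trackers their speed.
    """
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(h)))

# Sanity check against the direct O(n^2) definition on a small random patch.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
h = rng.standard_normal((8, 8))
direct = np.zeros((8, 8))
for dy in range(8):
    for dx in range(8):
        # Response at shift (dy, dx): correlate x with h cyclically shifted by (dy, dx).
        direct[dy, dx] = np.sum(x * np.roll(np.roll(h, -dy, axis=0), -dx, axis=1))
assert np.allclose(circular_correlation_fft(x, h), direct)
```

Evaluating this product once yields the responses of all cyclic shifts simultaneously, which is why the dense sampling in KCF costs only O(n log n) per frame.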

The KCF tracker models the appearance of the target object by a filter $\mathbf{w}$, which is trained on an image patch $\mathbf{x}$ of $M \times N$ pixels with HOG features. All the circular shifts $\mathbf{x}_{m,n}$, $(m,n) \in \{0,\dots,M-1\} \times \{0,\dots,N-1\}$, are generated as training samples for the filter, with Gaussian function labels $y(m,n)$. The filter is acquired by minimizing the error between the training samples and the regression targets $y$. The minimization problem is:

$$\min_{\mathbf{w}} \sum_{m,n} \left| \left\langle \phi(\mathbf{x}_{m,n}), \mathbf{w} \right\rangle - y(m,n) \right|^{2} + \lambda \left\| \mathbf{w} \right\|^{2} \qquad (1)$$

where $\phi$ represents the mapping to a kernel (Hilbert) space, $\langle \cdot, \cdot \rangle$ denotes the inner product, and $\lambda$ is a regularization parameter ($\lambda \geq 0$). Since the labels $y(m,n)$ are not binary, the filter learned from the training samples contains the coefficients of a Gaussian ridge regression.

Using the kernel trick, the solution can be expressed as $\mathbf{w} = \sum_{m,n} \alpha(m,n)\,\phi(\mathbf{x}_{m,n})$, so the solution of Eq. (1) can be acquired efficiently with the FFT by the following formula:

$$\boldsymbol{\alpha} = \mathcal{F}^{-1}\!\left( \frac{\mathcal{F}(\mathbf{y})}{\mathcal{F}\!\left(\mathbf{k}^{\mathbf{xx}}\right) + \lambda} \right) \qquad (2)$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the FFT and IFFT, respectively. The kernel correlation $\mathbf{k}^{\mathbf{xx}}$ is computed with a Gaussian kernel in the Fourier domain. The vector $\boldsymbol{\alpha}$ contains all the coefficients $\alpha(m,n)$. The KCF model consists of the learned target appearance $\hat{\mathbf{x}}$ and the coefficients $\boldsymbol{\alpha}$.

In the tracking process, a patch $\mathbf{z}$ with the same size as $\mathbf{x}$ is cropped from the new frame. The response map is calculated by:

$$\mathbf{R} = \mathcal{F}^{-1}\!\left( \mathcal{F}\!\left(\mathbf{k}^{\hat{\mathbf{x}}\mathbf{z}}\right) \odot \mathcal{F}(\boldsymbol{\alpha}) \right) \qquad (3)$$

where $\odot$ denotes the element-wise product, $\mathbf{k}^{\hat{\mathbf{x}}\mathbf{z}}$ is the kernel correlation between $\mathbf{z}$ and $\hat{\mathbf{x}}$, and $\hat{\mathbf{x}}$ is the learned target object appearance.
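For intuition, a minimal single-channel NumPy sketch of the training and detection steps of Eqs. (1)-(3) is given below. It is only a sketch under simplifying assumptions (raw pixels instead of multi-channel HOG, illustrative values for the kernel bandwidth sigma and the regularization lam, and hypothetical function names); it is not the authors' MATLAB implementation.

```python
import numpy as np

def gaussian_kernel_correlation(x, z, sigma=0.5):
    """Gaussian kernel correlation k^{xz} between all cyclic shifts of x and z,
    evaluated efficiently through the Fourier domain (single-channel patches)."""
    c = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)))
    d = np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * c      # squared distances for all shifts
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2 * x.size))

def train(x, y, sigma=0.5, lam=1e-4):
    """Dual solution of the ridge regression in Eq. (1); returns alpha in the
    Fourier domain, as in Eq. (2)."""
    kf = np.fft.fft2(gaussian_kernel_correlation(x, x, sigma))
    return np.fft.fft2(y) / (kf + lam)

def detect(alphaf, x_model, z, sigma=0.5):
    """Response map of Eq. (3) for a new patch z given the learned model."""
    kf = np.fft.fft2(gaussian_kernel_correlation(x_model, z, sigma))
    return np.real(np.fft.ifft2(kf * alphaf))
```

In the actual tracker the new position is taken at the peak of this response map, and the scale and model update steps of Sections 3.4 and 3.5 are applied afterwards.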

3.3 Particle Filter Re-detection Model

Fast motion, occlusion or background clutter of the target object can have a big impact on tracking performance. For example, if there is a lot of background clutter, the KCF tracker may lose the target object because it is overly dependent on the maximum response value. Therefore, we propose a framework for the KCF tracker that provides more target object candidates in the redetection part (Figure 1, Re-detection part).

A particle filter is an efficient method for providing more reasonable target object candidates through a particle resampling strategy. The central idea is to use a set of random particles with associated weights to represent posterior densities and to estimate quantities based on these samples and weights IEEE:Arulampalam (); Lai2014Multilinear (). It is based on the sequential Monte Carlo importance sampling method. Suppose $\mathbf{x}_t$ and $\mathbf{z}_t$ are the state and the observation variables at time $t$, respectively. Mathematically, object tracking predicts the most probable state at time $t$ from the observations up to the previous time:

$$p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}) = \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1} \mid \mathbf{z}_{1:t-1})\, d\mathbf{x}_{t-1} \qquad (4)$$

The posterior distribution of the state variable is then updated according to Bayes' rule using the new observation $\mathbf{z}_t$ at time $t$:

$$p(\mathbf{x}_t \mid \mathbf{z}_{1:t}) = \frac{p(\mathbf{z}_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1})}{p(\mathbf{z}_t \mid \mathbf{z}_{1:t-1})} \qquad (5)$$

The particle filter approach approximates the posterior state distribution by $N$ samples $\{\mathbf{x}_t^i\}_{i=1}^{N}$, called particles, with corresponding importance weights $\{w_t^i\}_{i=1}^{N}$ that sum to 1. The particles are drawn from an importance distribution $q(\mathbf{x}_t \mid \mathbf{x}_{1:t-1}, \mathbf{z}_{1:t})$ and the weights are updated as:

$$w_t^i \propto w_{t-1}^i\, \frac{p(\mathbf{z}_t \mid \mathbf{x}_t^i)\, p(\mathbf{x}_t^i \mid \mathbf{x}_{t-1}^i)}{q(\mathbf{x}_t^i \mid \mathbf{x}_{1:t-1}^i, \mathbf{z}_{1:t})} \qquad (6)$$

When the state transition is independent of the observation, the importance distribution is usually simplified to a first-order Markov process $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$. The weights are then updated as:

$$w_t^i \propto w_{t-1}^i\, p(\mathbf{z}_t \mid \mathbf{x}_t^i) \qquad (7)$$

For every frame, the tracker uses the particle with the largest weight as the tracking result. Correspondingly, in our tracking framework, we use the particle with the largest maximum response value as the tracking result.
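The weight update of Eq. (7) and the resampling step that concentrates particles on high-likelihood states can be sketched as follows (a generic illustration of sequential importance resampling, not code from the paper):

```python
import numpy as np

def update_weights(prev_weights, likelihoods):
    """First-order Markov weight update of Eq. (7): w_t^i is proportional to
    w_{t-1}^i * p(z_t | x_t^i), renormalized to sum to 1."""
    w = np.asarray(prev_weights) * np.asarray(likelihoods)
    return w / np.sum(w)

def systematic_resample(particles, weights, rng=None):
    """Systematic resampling: high-weight particles are duplicated, low-weight
    particles are dropped, and the weights are reset to uniform."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    indices = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return np.asarray(particles)[indices], np.full(n, 1.0 / n)
```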

In our method, we note that the correlation filter based tracker (CFT) is strongly dependent on the maximum response value of the response map. This may cause the tracker to lose its target object when the response map becomes ambiguous or unreliable. In order to ensure that our algorithm achieves a high performance, a particle filter redetection tracker (PFT) mechanism is adopted. The main equation is as follows:

$$\mathbf{p}_t = \begin{cases} \text{CFT result}, & R_{\max} \geq \theta \\ \text{PFT result}, & R_{\max} < \theta \end{cases} \qquad (8)$$

where $\mathbf{p}_t$ is the estimated target position, $R_{\max}$ denotes the maximum response value of the response map obtained from the correlation filter based tracker, and $\theta$ is a threshold that determines whether the response map is credible or not.

For the particle filter part of our method, we utilize the advantage of the particle resampling mechanism to provide more reasonable object candidates. In this redetection part, the number of image patches is set to $N$, each with the same size as the search window, and the image patches obey a normal distribution centered on the position of the previous target object. Given the object appearance model $\hat{\mathbf{x}}$ and the coefficients $\boldsymbol{\alpha}$, each particle image can be guided toward the modes of the target state distribution by using its circular shifts. For each image patch (also called a particle) $\mathbf{z}_i$, HOG features are extracted and a correlation is performed between the HOG features and the correlation filter in the frequency domain based on the Convolution Theorem. After that, the IFFT is used to obtain the spatial response map. This is expressed in the mathematical model as:

$$\mathbf{R}_i = \mathcal{F}^{-1}\!\left( \mathcal{F}\!\left(\mathbf{k}^{\hat{\mathbf{x}}\mathbf{z}_i}\right) \odot \mathcal{F}(\boldsymbol{\alpha}) \right), \quad i = 1, \dots, N \qquad (9)$$

where $\mathbf{z}_i$ denotes the $i$-th particle (image patch) and $\mathbf{R}_i$ denotes the corresponding response map.

We can choose the image patch with the best maximum response value as the center of the target object, because mathematically,

$$i^{*} = \arg\max_{i}\; \max\left(\mathbf{R}_i\right) \qquad (10)$$

where $i^{*}$ indexes the best particle, i.e., the one whose response map has the largest maximum value, and $\max(\mathbf{R}_i)$ denotes the maximum response value of the $i$-th particle.

In this approach, we use the particle filter to choose more search windows when the response map given by a single search window is ambiguous or unreliable. In this way, we can find more target object candidates, which can make the tracking results more robust and efficient.
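A sketch of this re-detection step (Eqs. (9)-(10)) is shown below. The helpers extract_patch (cropping plus HOG extraction at a candidate center) and detect (e.g., the detection function of the KCF sketch in Section 3.2) are assumptions supplied by the surrounding tracker, and the particle count and spread are illustrative values:

```python
import numpy as np

def particle_redetection(prev_pos, extract_patch, detect, num_particles=100,
                         spread=10.0, rng=None):
    """Draw candidate centers from a normal distribution around the previous
    position, score each candidate by the peak of its response map (Eq. (9)),
    and return the candidate with the largest peak (Eq. (10))."""
    if rng is None:
        rng = np.random.default_rng()
    prev_pos = np.asarray(prev_pos, dtype=float)
    candidates = prev_pos + rng.normal(scale=spread, size=(num_particles, 2))
    best_pos, best_score = prev_pos, -np.inf
    for pos in candidates:
        response = detect(extract_patch(pos))   # response map of this particle
        score = response.max()                   # peak value of the map
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score
```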

3.4 Scale Evaluation

In the correlation filter tracking framework, we can only obtain the object center position through the maximum value of the response map; there is no scale estimation of the tracked object IEEE:KCF (); Springer:CSK (). However, scale variation is a common challenge and can influence the accuracy and performance of the visual tracking process Springer:DSST (). In this section, we give a simple but effective mechanism for detecting scale changes that depends on the relationship between the maximum response values of consecutive frames.

For most tracking methods, the model or template size of the object is fixed, set either manually or to the initial object size You2014A (); Zhang2015Robust (); ERT (). In order to handle candidate images of different sizes, the candidate image patches are usually adjusted to the same size by an affine transformation model. However, the affine transformation model has more parameters, which leads to a high computational cost and reduces tracking efficiency. The rate of change of the maximum response between consecutive frames is negatively related to the change of the object size, because the response map of the object obeys a normal distribution. Therefore, we use the change rate between the maximum responses of consecutive frames to determine the change of the object size. In tracked image sequences the object size changes gradually, accompanied by a certain degree of attenuation, so for simplicity we only consider the changing trend of the object size instead of its accurate value, i.e., whether the size of the object becomes smaller, becomes larger or remains unchanged.

For the correlation filter based tracker, the initial target size is given. We set the object size of the second frame to this initial size. Then, if we know the object size of the $(t\!-\!1)$-th frame (where $t > 2$), we can determine the trend of the change in the object size in the $t$-th frame. The direction $d_t$ for the $t$-th frame can be determined as:

$$d_t = \begin{cases} -1, & R_{\max}^{t} / R_{\max}^{t-1} > \tau_1 \\ +1, & R_{\max}^{t} / R_{\max}^{t-1} < \tau_2 \\ 0, & \text{otherwise} \end{cases} \qquad (11)$$

where $R_{\max}^{t}$ denotes the maximum response value of the $t$-th frame; $d_t = -1$ indicates that the target size of the $t$-th frame becomes smaller, $d_t = +1$ indicates that it becomes larger, and otherwise the target size of the $t$-th frame does not change. $\tau_1$ and $\tau_2$ are two thresholds that are used to determine the direction of the change in the size of the target object.

The target size of the $t$-th frame can then be calculated as:

$$s_t = \beta_{d_t} \cdot s_{t-1} \qquad (12)$$

where $s_t$ denotes the target size of the $t$-th frame and $\beta_{d_t}$ denotes the scale factor of frame $t$, which is determined by $d_t$: if $d_t = -1$, the scale factor is smaller than 1; if $d_t = +1$, the scale factor is larger than 1; otherwise the scale factor equals 1. We use Eq. (11) to obtain the direction of the object scale change and Eq. (12) to obtain the optimal size of the object in frame $t$.
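The scale rule of Eqs. (11)-(12) can be written compactly as below; the threshold and scale-factor values are illustrative placeholders, not the values used in the paper:

```python
def scale_direction(r_max_t, r_max_prev, tau1=1.05, tau2=0.95):
    """Direction of the scale change, Eq. (11). The ratio of consecutive maximum
    responses is taken to be negatively related to the change of the object size."""
    ratio = r_max_t / r_max_prev
    if ratio > tau1:
        return -1   # response grew noticeably: the target is assumed to shrink
    if ratio < tau2:
        return +1   # response dropped noticeably: the target is assumed to grow
    return 0        # no significant change

def update_size(prev_size, direction, shrink=0.95, grow=1.05):
    """Target size update, Eq. (12): multiply by one of three scale factors."""
    factor = {-1: shrink, 0: 1.0, +1: grow}[direction]
    return prev_size * factor
```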

3.5 Model Update

Model updating is an important step in visual tracking. During tracking, the object appearance often changes due to rotation, scale and posture variations. Therefore, the filter needs to be updated quickly to accommodate these changes. In this paper, we adopt a linear update model IEEE:MOSSE (); Qian2011Accurate () that only exploits the target in the current frame to update the filter:

$$\hat{\boldsymbol{\alpha}}_{t} = (1 - \eta)\, \hat{\boldsymbol{\alpha}}_{t-1} + \eta\, \boldsymbol{\alpha}_{t} \qquad (13)$$

where $\hat{\boldsymbol{\alpha}}_{t}$ denotes the updated correlation filter model of the target in the $t$-th frame, $\boldsymbol{\alpha}_{t}$ denotes the correlation filter trained on the $t$-th frame, and $\eta$ is the learning rate, which is used to update the correlation filter in the current frame.
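Eq. (13) is a simple linear interpolation; a one-line sketch (with an illustrative learning rate) is:

```python
def update_model(model_prev, model_new, eta=0.02):
    """Linear interpolation update of Eq. (13). `eta` is the learning rate
    (0.02 is only an illustrative value); the same rule can be applied to the
    filter coefficients and, if desired, to the appearance template."""
    return (1.0 - eta) * model_prev + eta * model_new
```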

4 Implementation Details

In this section, we first present the overall tracking process of our proposed method and then describe the parameter settings of our experiments. The integral framework of our approach is given in Algorithm 1. Our tracker begins with the object position in the first frame, which is used to train a correlation filter. In the next frame, we extract HOG features from a search window and convolve them with the correlation filter to obtain a response map. The maximum response value is compared with a threshold to determine which method should be used to find the new position of the target object. If the maximum response value is larger than the threshold, the position is found directly by KCF tracking; otherwise, the location of the target is detected by using the particle filter to resample more candidates. Then, the appearance at the newly estimated position is extracted for training and updating of the correlation filter. The whole process is repeated until the position of the target object is given in the last frame.

Because the size of the target object has no obvious change between the first two frames, we use the same size for the first two frames. The number of particles $N$, the judgement threshold $\theta$, and the direction thresholds $\tau_1$ and $\tau_2$ are fixed empirically for all sequences. The search window size is set to twice the target object size. The regularization parameter $\lambda$ of the CF model, the three scale factors corresponding to the three directions of scale change, and the remaining parameters follow the default settings of the basic KCF tracker IEEE:KCF ().

1:  Inputs: the initial target bounding box $\mathbf{b}_1$, the target size $s_1$, the search window size (twice the target size), the initial tracking frame $I_1$, the model learning rate $\eta$, the particle number $N$, the judgment threshold $\theta$, and the direction thresholds $\tau_1$ and $\tau_2$.
2:  Outputs: The position and scale of the target in each frame.
3:  Extract the target features from $I_1$ within the search window area;
4:  Train the initial model $\boldsymbol{\alpha}$ and appearance $\hat{\mathbf{x}}$ with Eq. (2);
5:  while $t \leq T$, where $t$ is the number of the current frame and $T$ is the total number of tracking frames, do
6:     Evaluate the scale change and get the optimal scale factor with Eqs. (11) and (12);
7:     Crop the search window from the current frame and extract the features from the search window;
8:     Compute the correlation filter response with Eq. (3);
9:     For the $t$-th frame (where $t \geq 2$), compare the maximum response value $R_{\max}$ with the threshold $\theta$ as in Eq. (8);
10:     if $R_{\max} \geq \theta$ then
11:        Get the target position of the current frame directly from the response map; the target object size is $s_t$;
12:     end if
13:     if $R_{\max} < \theta$ then
14:        Get $N$ search windows (particles) from the current frame and calculate the corresponding response maps with Eq. (9);
15:        Choose the best candidate as the correlation filter response map with Eq. (10);
16:        Get the object position of the current frame from the best particle; the target object size is $s_t$;
17:     end if
18:     Get the correlation filter model with the current target object and update it with Eq. (13);
19:  end while
Algorithm 1 Correlation filter based particle filter redetection framework (CFPFT)

5 Experiments

We evaluate our proposed method on the OTB2013 and OTB2015 datasets IEEE:OTB2013 (); Wu2015Object (). The OTB2013 dataset contains fully annotated sequences categorized with 11 attributes, namely, fast motion (FM), background clutter (BC), motion blur (MB), deformation (DEF), illumination variation (IV), in-plane rotation (IPR), low resolution (LR), occlusion (OCC), out-of-plane rotation (OPR), out of view (OV) and scale variation (SV). The OTB2015 dataset includes 100 sequences. Our method is implemented in MATLAB and runs on a PC with an Intel Core CPU.

5.1 Evaluation Criterion

In order to evaluate the performance of our proposed algorithm, we use the three classes of evaluation indexes proposed in OTB2013: One-Pass Evaluation (OPE), Temporal Robustness Evaluation (TRE), and Spatial Robustness Evaluation (SRE). OPE is the traditional evaluation method that runs trackers on each sequence just once. For TRE, each compared tracking method is evaluated numerous times from different starting frames across a video sequence; each run is initialized with the corresponding ground-truth object state at its particular starting frame. The SRE evaluation generates initial object states by slightly shifting or scaling the ground-truth bounding box of an object, and the experiments are repeated with different spatial perturbations. With TRE and SRE, the robustness of each evaluated tracker can be comprehensively interpreted.

After running the trackers, precision plots and success plots are used to present the results. Precision plots show the percentage of frames whose estimated locations lie within a given threshold distance from the ground-truth centers. With regard to the success plots, an average overlap measure is the most appropriate for tracker comparison IEEE:Cehovin (), as it accounts for both size and position. For this purpose, we use the typical criterion of the Pascal VOC Overlap Ratio (VOR) Springer:Everingham (). Given the bounding box $B_r$ of the tracking result and the bounding box $B_g$ of the ground truth, the VOR can be computed as:

$$\mathrm{VOR} = \frac{\left| B_r \cap B_g \right|}{\left| B_r \cup B_g \right|} \qquad (14)$$

where $\cap$ and $\cup$ denote the intersection and union of two regions, respectively, and $|\cdot|$ denotes the number of pixels in a region. A frame whose VOR is larger than a given threshold is termed a successful frame, and the ratios of successful frames at thresholds ranging from 0 to 1 are plotted in the success plots.
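For reference, the VOR of Eq. (14) for axis-aligned boxes given as (x, y, w, h) can be computed as follows (a standard IoU helper, not tied to the benchmark toolkit):

```python
def voc_overlap_ratio(box_a, box_b):
    """Pascal VOC overlap ratio (Eq. (14)): intersection area over union area."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    iw = max(0.0, min(xa + wa, xb + wb) - max(xa, xb))   # width of the intersection
    ih = max(0.0, min(ya + ha, yb + hb) - max(ya, yb))   # height of the intersection
    inter = iw * ih
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0
```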

5.2 Experimental Evaluation

In this section, we show the experimental results on OTB2013 IEEE:OTB2013 () and OTB2015 Wu2015Object (). For these visual tracking benchmarks, the experimental results are illustrated by precision plots (or rates) and success plots (or rates). The precision plot shows the percentage of successfully tracked frames in the whole sequence and evaluates the performance of the algorithms with the Center Location Error (CLE) in pixels, ranking the trackers by the precision score at 20 pixels. The success plot shows the percentage of successfully tracked frames using the VOR threshold (0.5 is usually taken as the threshold), while the Area Under the Curve (AUC) is used as the metric for ranking.
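The two ranking numbers can be computed from per-frame center errors and overlaps as sketched below (the 20-pixel threshold and the overlap threshold grid follow the standard OTB protocol; the function names are illustrative):

```python
import numpy as np

def precision_at(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames whose center location error (CLE) is within `threshold` pixels."""
    err = np.linalg.norm(np.asarray(pred_centers) - np.asarray(gt_centers), axis=1)
    return float(np.mean(err <= threshold))

def success_auc(overlaps, thresholds=None):
    """Area under the success curve: mean success rate over VOR thresholds in [0, 1]."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)
    overlaps = np.asarray(overlaps)
    return float(np.mean([(overlaps >= t).mean() for t in thresholds]))
```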

5.2.1 Evaluation with OTB2013

In this section, we analyze our approach on the OTB2013 IEEE:OTB2013 () benchmark by demonstrating the impact of our contributions. For the OTB2013 IEEE:OTB2013 () benchmark, the performances of all the tracking methods are measured by the OPE, TRE, and SRE mechanisms.

We compare our method with representative algorithms, including the algorithms given in the OTB2013 benchmark and two representative correlation filter based algorithms, namely, KCF IEEE:KCF () and DSST Springer:DSST ().

To make the results clear, we only plot the top 10 ranked trackers in the precision and success plots. As shown in Figure 2, our proposed CFPFT tracker achieves the top rank and the best performance by a clear margin in all the tracking plots. Compared with the KCF tracker, our CFPFT tracker obtains notable improvements in both the success and precision ranking scores; even compared with DSST, our tracker also obtains improvements in both scores. This demonstrates that the idea of a redetection mechanism for tracking is effective and promising in practice.

Figure 2: Precision plots (a) and success plots (b) of OPE on the OTB2013 benchmark. The numbers in the legend indicate the representative precision at 20 pixels for the precision plots and the average area-under-the-curve scores for the success plots.

The OPE performances of the trackers on each attribute are shown in Figures 3 and 4, which present the OPE performances of the top ten trackers on the 11 attributes. Our proposed tracker achieves the best or the second best performance among all of the compared trackers, and the performance on the different attribute groups indicates that our CFPFT tracker is clearly more accurate and robust. These advantages come from the particle filter redetection and the scale evaluation mechanism.

Figure 3: The precision plots of the evaluation on different attributes of OTB2013: (a) fast motion, (b) background clutter, (c) motion blur, (d) deformation, (e) illumination variation, (f) in-plane rotation, (g) low resolution, (h) occlusion, (i) out-of-plane rotation, (j) out of view, (k) scale variation. The number at the end of the caption of each sub-figure shows how many sequences are included in the corresponding case.
Figure 4: The success plots of the evaluation on different attributes of OTB2013: (a) fast motion, (b) background clutter, (c) motion blur, (d) deformation, (e) illumination variation, (f) in-plane rotation, (g) low resolution, (h) occlusion, (i) out-of-plane rotation, (j) out of view, (k) scale variation. The number at the end of the caption of each sub-figure shows how many sequences are included in the corresponding case.

In order to provide sufficient experimental comparisons to verify the robustness of our CFPFT tracker, we show the overall SRE and TRE performance in Figure 5. Figures 5(a)-(b) show that our tracker achieves the second best performance on the success plots, close to that of DSST and better than that of KCF. On the precision plots, our CFPFT tracker achieves the best performance and is about 2% higher than DSST, which is in second place. From Figures 5(c)-(d), both the precision and the success plots show that our tracker achieves the best performance. The SRE and TRE results show the robustness of our tracker to initializations obtained by shifting or scaling the ground truth and by starting from different frames, respectively. Because our CFPFT is based on KCF, these results also show the robustness contributed by the redetection mechanism. To summarize, the CFPFT tracker is effective and achieves promising results on the visual tracking benchmark OTB2013.

Figure 5: The precision and success plots of SRE (a, b) and TRE (c, d) on the OTB2013 benchmark.

5.2.2 Evaluation with OTB2015

To further evaluate the performance of our proposed approach, in this section we compare our CFPFT tracker with some state-of-the-art trackers, including TGPR Gao2014Transfer (), DSST Springer:DSST (), KCF IEEE:KCF (), SCM Yang2012Robust (), Struck Hare2012Struck (), CNN-SVM CNN-SVM (), CNT CNT (), CFNet-conv1 CFNet (), and HDT HDT (); the last four trackers are based on deep learning. Unlike the other methods, deep neural network based trackers extract features by utilizing deep learning.

Figure 6: The precision plots (a) and success plots (b) of OPE on OTB2015 over 100 standard benchmark sequences.

Figure 6 shows the precision and success plots of our CFPFT tracker and the eight state-of-the-art trackers on the OTB2015 dataset. From these, we can see that the success rate and the precision rate of our approach are just below those of HDT HDT () and CNN-SVM CNN-SVM (), which use deep features. Meanwhile, it is obvious that our CFPFT tracker achieves promising performance and outperforms the other six trackers, which include a deep learning based tracker CNT (), two correlation filter based trackers IEEE:KCF (); Springer:DSST () and three representative trackers Gao2014Transfer (); Yang2012Robust (); Hare2012Struck ().

We further analyze the performance of CFPFT for different attributes in OTB2015 Wu2015Object (). Table 1 shows the comparison of CFPFT with the eight other state-of-the-art tracking algorithms on these attributes. In terms of distance precision rates (DPR), CFPFT achieves the best or close to the best results for all attributes. Compared with the deep learning based tracker CNT (), CFPFT locates the object better, while it performs worse than the other two deep learning based trackers CNN-SVM (); HDT (). Compared with the other trackers, including correlation filter based trackers and traditional trackers, CFPFT locates the object better. On the other hand, CFPFT also achieves the best or close to the best overlap success rates (OSR) for all attributes. Compared with all eight state-of-the-art trackers, CFPFT performs more robustly under fast motion, background clutter, low resolution, occlusion and scale variation. Compared with the correlation filter based trackers Springer:DSST (); IEEE:KCF () and the traditional methods Gao2014Transfer (); Yang2012Robust (), CFPFT performs more robustly for all attributes with the help of the redetection mechanism and the scale evaluation mechanism.

Attribute CFPFT KCF DSST HDT CNN-SVM CNT TGPR SCM Struck
BC / / / / / / / / /
DEF / / / / / / / / /
FM / / / / / / / / /
IPR / / / / / / / / /
IV / / / / / / / / /
LR / / / / / / / / /
MB / / / / / / / / /
OCC / / / / / / / / /
OPR / / / / / / / / /
OV / / / / / / / / /
SV / / / / / / / / /
average / / / / / / / / /
Table 1: Average precision and success scores of our CFPFT and other trackers on OTB2015 for the 11 different attributes. Each table cell reports DPR/OSR (%).

In addition, we use the OTB-2013/50/2015 datasets to perform a quantitative comparison of DPR at 20 pixels and OSR at an overlap threshold of 0.5 in Table 2. It shows that our CFPFT outperforms most state-of-the-art trackers on both metrics. On the OTB2013 dataset, our tracker performs only slightly worse than HDT HDT (), which is a deep learning based tracker. Compared to CFNet-conv1 CFNet (), which is also a deep learning method, our approach achieves better DPR and OSR results. On the OTB50 dataset, our tracker achieves the best performance except for the deep learning based tracker HDT HDT (). Though CNT CNT () and CFNet-conv1 CFNet () utilize deep features to represent the appearance of an object, our approach performs better than both of them in terms of DPR and OSR. Meanwhile, on the OTB2015 dataset, our tracker shows better performance than the correlation filter based trackers Springer:DSST (); IEEE:KCF (); STC (), the representative trackers Gao2014Transfer (); Yang2012Robust (); Hare2012Struck () and the deep learning based trackers CNT (); CFNet (). All of these results benefit from the particle resampling mechanism and the simple scale evaluation strategy. From all the experimental results, we can see that our tracker achieves a good and promising performance.

Dataset Evaluation CFPFT Correlation filters trackers Deep learning trackers Representative trackers
Criterion (Ours) STC KCF DSST CFNet-conv1 HDT CNT TGPR SCM Struck
OTB-2013 DPR
OSR
OTB-50 DPR
OSR
OTB-2015 DPR
OSR
Table 2: Comparison with state-of-the-art tracking methods on OTB-2013/50/2015 IEEE:OTB2013 (); Wu2015Object () in terms of DPR and OSR (%).

6 Conclusion

In this paper, we propose a particle filter redetection tracker with correlation filters (CFPFT) to achieve effective and robust performance on the test benchmarks. The redetection mechanism plays an important role in the visual object tracking process when the tracker loses its object: it can effectively re-locate the object and improve the tracking performance through extensive particle resampling that provides more candidates. Besides, we give a simple scale evaluation mechanism that is effective on sequences with scale changes. The extensive experimental results show the competitiveness of our CFPFT tracker compared with state-of-the-art trackers widely used in the performance evaluation of tracking algorithms. The analysis of the experimental results on different attributes demonstrates the strong capability of our tracker.

Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant Nos. 61672183, 61272252, U1509216, 61472099, 61502119), by the Shenzhen Research Council (Grant Nos. JCYJ20170413104556946, JCYJ20160406161948211, JCYJ20160226201453085, JSGG20150331152017052), by Science and Technology Planning Project of Guangdong Province (Grant No. 2016B090918047) and by Natural Science Foundation of Guangdong Province (Grant No. 2015A030313544).

References

  • (1) M. Isard, A. Blake, Condensation-conditional density propagation forvisual tracking, Kluwer Academic Publishers, 1998, pp. 5–28.
  • (2) K. Hossain, C. W. Lee, Visual object tracking using particle filter, in: International Conference on Ubiquitous Robots and Ambient Intelligence, 2013, pp. 98–102.
  • (3) Z. He, X. Li, X. You, D. Tao, Y. Y. Tang, Connected component model for multi-object tracking, IEEE Transactions on Image Processing 25 (8) (2016) 3698–3711.
  • (4) S. Zhang, X. Lan, Y. Qi, P. C. Yuen, Robust visual tracking via basis matching, IEEE Transactions on Circuits and Systems for Video Technology 27 (3) (2017) 421–430.
  • (5) X. Li, S. Lan, Y. Jiang, P. Xu, Visual tracking based on adaptive background modeling and improved particle filter, in: IEEE International Conference on Computer and Communications, 2017, pp. 469–473.
  • (6) S. Yi, Z. He, X. You, Y. M. Cheung, Single object tracking via robust combination of particle filter and sparse representation, Signal Processing 110 (2015) 178–187.
  • (7) W. Ou, D. Yuan, Q. Liu, Y. Cao, Object tracking based on online representative sample selection via non-negative least square, Multimedia Tools and Applications (2017) 1–19.
  • (8) Y. Yu, Y. F. Che, Infrared object tracking based on particle filter, in: International Congress on Image and Signal Processing, 2010, pp. 1508–1511.
  • (9) D. S. Bolme, J. R. Beveridge, B. A. Draper, Y. M. Lui, Visual object tracking using adaptive correlation filters, in: Computer Vision and Pattern Recognition, 2010, pp. 2544–2550.
  • (10) Z. He, Y. Cui, H. Wang, X. You, C. L. P. Chen, One global optimization method in network flow model for multiple object tracking, Knowledge-Based Systems 86 (2015) 21–32.
  • (11) M. Danelljan, G. Häger, F. S. Khan, M. Felsberg, Accurate scale estimation for robust visual tracking, in: British Machine Vision Conference, Vol. 65, 2014, pp. 1–11.
  • (12) Z. He, S. Yi, Y. M. Cheung, X. You, Robust object tracking via key patch sparse representation, IEEE Transactions on Cybernetics 47 (2) (2016) 354–364.
  • (13) Q. Liu, X. Lu, Z. He, C. Zhang, W. S. Chen, Deep convolutional neural networks for thermal infrared object tracking, Knowledge-Based Systems 134 (2017) 189–198.
  • (14) Y. Li, J. Zhu, A scale adaptive kernel correlation filter tracker with feature integration, in: European Conference on Computer Vision, 2014, pp. 254–265.
  • (15) T. Liu, G. Wang, Q. Yang, Real-time part-based visual tracking via adaptive correlation filters, in: Computer Vision and Pattern Recognition, 2015, pp. 4902–4912.
  • (16) M. Danelljan, G. Hager, F. S. Khan, M. Felsberg, Discriminative scale space tracking., IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (8) (2017) 1561–1575.
  • (17) X. Li, Q. Liu, Z. He, H. Wang, C. Zhang, W. S. Chen, A multi-view model for visual tracking via correlation filters, Knowledge-Based Systems 113 (2016) 88–99.
  • (18) W. S. Chen, P. Yuen, J. Huang, B. Fang, Two-step single parameter regularization fisher discriminant method for face recognition, International Journal of Pattern Recognition and Artificial Intelligence 20 (02) (2008) 189–207.
  • (19) W. Ou, X. You, D. Tao, P. Zhang, Y. Tang, Z. Zhu, Robust face recognition via occlusion dictionary learning, Pattern Recognition 47 (4) (2014) 1559–1572.
  • (20) X. Y. Jing, X. Zhu, F. Wu, X. You, Q. Liu, D. Yue, R. Hu, B. Xu, Super-resolution person re-identification with semi-coupled low-rank discriminant dictionary learning, in: Computer Vision and Pattern Recognition, 2015, pp. 695–704.
  • (21) Z. Lai, W. K. Wong, Y. Xu, J. Yang, D. Zhang, Approximate orthogonal sparse embedding for dimensionality reduction., IEEE Transactions on Neural Networks and Learning Systems 27 (4) (2016) 723–735.
  • (22) X. Shi, Z. Guo, F. Nie, L. Yang, D. Tao, Two-dimensional whitening reconstruction for enhancing robustness of principal component analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (10) (2016) 2130–2136.
  • (23) G. Qi, X. Y. Jing, F. Wu, Z. Wei, X. Liang, W. Shao, Y. Dong, H. Li, Structure-based low-rank model with graph nuclear norm regularization for noise removal, IEEE Transactions on Image Processing 26 (7) (2017) 3098–3112.
  • (24) W. Ou, D. Yuan, D. Li, B. Liu, D. Xia, W. Zeng, Patch-based visual tracking with online representative sample selection, Journal of Electronic Imaging 26 (3) (2017) 033006(1)–033006(12).
  • (25) T. N. T. Mai, S. Kim, Optimization for particle filter-based object tracking in embedded systems using parallel programming, in: International Conference on Computer Science and its Applications, 2016, pp. 246–252.
  • (26) S. Fazli, H. M. Pour, H. Bouzari, Particle filter based object tracking with sift and color feature, in: Second International Conference on Machine Vision, 2009, pp. 89–93.
  • (27) H. Zhou, Y. Gao, G. Yuan, R. Ji, Adaptive multiple cues integration for particle filter tracking, in: IET International Radar Conference, 2016, pp. 31–36.
  • (28) S. K. Zhou, R. Chellappa, B. Moghaddam, Visual tracking and recognition using appearance-adaptive models in particle filters, IEEE Transactions on Image Processing 13 (11) (2004) 1491–1506.
  • (29) J. F. Henriques, C. Rui, P. Martins, J. Batista, Exploiting the circulant structure of tracking-by-detection with kernels, in: European Conference on Computer Vision, 2012, pp. 702–715.
  • (30) M. Danelljan, F. S. Khan, M. Felsberg, J. V. D. Weijer, Adaptive color attributes for real-time visual tracking, in: Computer Vision and Pattern Recognition, 2014, pp. 1090–1097.
  • (31) J. F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (3) (2014) 583–596.
  • (32) T. Zhang, A. Bibi, B. Ghanem, In defense of sparse tracking: Circulant sparse tracker, in: Computer Vision and Patter Recognition, 2016, pp. 3880–3888.
  • (33) W. Ou, G. Li, K. Zhang, G. Xie, Multi-view non-negative matrix factorization by patch alignment framework with view consistency, Neurocomputing 204 (C) (2016) 116–124.
  • (34) Y. Li, J. Zhu, S. C. H. Hoi, Reliable patch trackers: Robust visual tracking by exploiting reliable patches, in: Computer Vision and Pattern Recognition, 2015, pp. 353–361.
  • (35) Z. Guo, X. Wang, J. Zhou, J. You, Robust texture image representation by scale selective local binary patterns, IEEE Transactions on Image Processing 25 (2) (2015) 687–699.
  • (36) T. Liu, G. Wang, Q. Yang, Real-time part-based visual tracking via adaptive correlation filters, in: Computer Vision and Pattern Recognition, 2015, pp. 4902–4912.
  • (37) S. Liu, T. Zhang, X. Cao, C. Xu, Structural correlation filter for robust visual tracking, in: Computer Vision and Pattern Recognition, 2016, pp. 4312–4320.
  • (38) H. Van Trees, K. Bell, A tutorial on particle filters for online nonlinear/nongaussian bayesian tracking, Wiley-IEEE Press, 2007, pp. 723–737.
  • (39) Z. Lai, Y. Xu, Q. Chen, J. Yang, D. Zhang, Multilinear sparse principal component analysis, IEEE Transactions on Neural Networks and Learning Systems 25 (10) (2014) 1942–1950.
  • (40) X. You, X. Li, Z. He, X. F. Zhang, A robust local sparse tracker with global consistency constraint, Signal Processing 111 (C) (2014) 308–318.
  • (41) S. Zhang, H. Zhou, F. Jiang, X. Li, Robust visual tracking using structurally random projection and weighted least squares, IEEE Transactions on Circuits and Systems for Video Technology 25 (11) (2015) 1749–1760.
  • (42) X. Ma, Q. Liu, Z. He, X. Zhang, W. S. Chen, Visual tracking via exemplar regression model, Knowledge-Based Systems 106 (2016) 26–37.
  • (43) J. Qian, B. Fang, W. Yang, X. Luan, H. Nan, Accurate tilt sensing with linear model, IEEE Sensors Journal 11 (10) (2011) 2301–2309.
  • (44) Y. Wu, J. Lim, M. H. Yang, Online object tracking: A benchmark, in: IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2411–2418.
  • (45) Y. Wu, J. Lim, M. H. Yang, Object tracking benchmark, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (9) (2015) 1834–1848.
  • (46) L. Cehovin, M. Kristan, A. Leonardis, Is my new tracker really better than yours?, in: Applications of Computer Vision, 2014, pp. 540–547.
  • (47) M. Everingham, J. Winn, The pascal visual object classes challenge 2010 (voc2010) development kit contents, in: International Conference on Machine Learning Challenges: Evaluating Predictive Uncertainty Visual Object Classification, 2011, pp. 117–176.
  • (48) J. Gao, H. Ling, W. Hu, J. Xing, Transfer learning based visual tracking with gaussian processes regression, in: European Conference on Computer Vision, 2014, pp. 188–203.
  • (49) M. H. Yang, H. Lu, W. Zhong, Robust object tracking via sparsity-based collaborative model, in: Computer Vision and Pattern Recognition, 2012, pp. 1838–1845.
  • (50) S. Hare, A. Saffari, P. H. S. Torr, Struck: Structured output tracking with kernels, in: International Conference on Computer Vision, 2012, pp. 263–270.
  • (51) S. Hong, T. You, S. Kwak, B. Han, Online tracking by learning discriminative saliency map with convolutional neural network, Computer Science (2015) 597–606.
  • (52) K. Zhang, Q. Liu, Y. Wu, M. H. Yang, Robust visual tracking via convolutional networks without training, IEEE Transactions on Image Processing 25 (4) (2016) 1779–1792.
  • (53) J. Valmadre, L. Bertinetto, J. F. Henriques, A. Vedaldi, P. H. S. Torr, End-to-end representation learning for correlation filter based tracking, in: Computer Vision and Pattern Recognition, 2017, pp. 2805–2813.
  • (54) Y. Qi, S. Zhang, L. Qin, H. Yao, Q. Huang, J. Lim, M. H. Yang, Hedged deep tracking, in: Computer Vision and Pattern Recognition, 2016, pp. 4303–4311.
  • (55) K. Zhang, L. Zhang, Q. Liu, D. Zhang, M. H. Yang, Fast visual tracking via dense spatio-temporal context learning, in: European Conference on Computer Vision, 2014, pp. 127–141.