Integrating Boundary and Center Correlation Filters for Visual Tracking with Aspect Ratio Variation
Abstract
Aspect ratio variation frequently appears in visual tracking and has a severe influence on performance. Although many correlation filter (CF)-based trackers have been suggested for scale adaptive tracking, little attention has been paid to handling aspect ratio variation for CF trackers. In this paper, we make the first attempt to address this issue by introducing a family of 1D boundary CFs to localize the left, right, top, and bottom boundaries in videos. This allows us to cope with aspect ratio variation flexibly during tracking. Specifically, we present a novel tracking model to integrate 1D Boundary and 2D Center CFs (IBCCF), where boundary and center filters are coupled by a near-orthogonality regularization term. To optimize our IBCCF model, we develop an alternating direction method of multipliers. Experiments on several datasets show that IBCCF can effectively handle aspect ratio variation, and achieves state-of-the-art performance in terms of accuracy and robustness. The source code of our tracker is available at https://github.com/lifeng9472/IBCCF/.
1 Introduction
Visual tracking is one of the most fundamental problems in computer vision. It plays a critical role in versatile applications such as video surveillance, intelligent transportation, and human-computer interaction [36, 31, 35, 39, 5]. Given the annotation in the initial frame, visual tracking aims to estimate the trajectory of a target with large appearance variations caused by many factors, e.g., scale, rotation, occlusion, and background clutter. Although great advances have been made, it remains challenging to develop an accurate and robust tracker that handles all these factors. Recently, correlation filter (CF)-based approaches have received considerable attention in visual tracking [3, 14, 7]. In these methods, a discriminative correlation filter is trained to generate 2D Gaussian-shaped responses centered at the target position. Benefiting from the circulant matrix structure and the fast Fourier transform (FFT), CF-based trackers usually perform very efficiently. With the introduction of deep features [23, 29], spatial regularization [9, 22], and continuous convolution [11], the performance of CF-based trackers has been steadily improved, leading to state-of-the-art results.
Despite the advances in CF-based tracking, aspect ratio variation remains an open problem. Changes in aspect ratio can be caused by in-plane/out-of-plane rotation, deformation, occlusion, or scale variation, and usually have a severe effect on tracking performance. To handle scale variation, Li and Zhu [17] propose a Scale Adaptive with Multiple Features tracker (SAMF). Danelljan et al. [7] suggest a discriminative scale space tracking (DSST) method to learn separate CFs for translation and scale estimation on a scale pyramid representation. Besides, the scale variation issue can also be handled by part-based CF trackers [21, 18, 20]. These methods, however, can only cope with scale variation and cannot well address aspect ratio variation. Fig. 1 illustrates the tracking results on three sequences with aspect ratio variation. It clearly shows that, even with deep CNN features, neither the standard CF tracker (i.e., HCF [23]) nor its scale-adaptive version (sHCF) can address aspect ratio variation caused by in-plane/out-of-plane rotation and deformation.
In this paper, we present a visual tracking model to handle aspect ratio variation by integrating boundary and center correlation filters (IBCCF). The standard CF estimates the trajectory by finding the highest response in each frame to locate the center of the target, and can be seen as a center tracker. In contrast, we introduce a family of 1D boundary CFs to localize the positions of the left, right, top and bottom boundaries in the sequences, respectively. By treating the 2D boundary region as a multi-channel representation of 1D vectors, a boundary CF is learned to generate 1D Gaussian-shaped responses centered at the target boundary (see Fig. 2). By using boundary CFs, the left, right, top and bottom boundaries can be flexibly tuned in the image sequences. Thus aspect ratio variation can be naturally handled during tracking.
We empirically analyze and reveal the near-orthogonality property between the center and boundary CFs. Then, by enforcing this property with an additional regularization term (i.e., a near-orthogonality constraint), we present a novel IBCCF tracking model to integrate 1D boundary and 2D center CFs. An alternating direction method of multipliers (ADMM) [4] is then developed to optimize the proposed IBCCF model.
To evaluate our IBCCF, extensive experiments have been conducted on the OTB2013, OTB2015 [35], Temple-Color [19], VOT2016 [5] and VOT2017 datasets. The results validate the effectiveness of IBCCF in handling aspect ratio variation. Compared with several state-of-the-art trackers, our IBCCF achieves comparable performance in terms of accuracy and robustness. As shown in Fig. 1, by using CNN features, our IBCCF can well adapt to the aspect ratio variation in the three sequences, yielding better tracking performance.
To sum up, the contributions of this paper are threefold:

A novel IBCCF model is developed to address aspect ratio variation in CF-based trackers. To achieve this, we first introduce a family of boundary CFs to track the left, right, top and bottom boundaries besides tracking the target center. Then, we combine the boundary and center CFs by encouraging orthogonality between them for accurate tracking.

An ADMM algorithm is suggested to optimize our IBCCF model, where each subproblem has a closed-form solution. Our algorithm alternates between updating the center CF and updating the boundary CFs, and empirically converges within very few iterations.

Extensive experimental results demonstrate the effectiveness of the proposed IBCCF, which achieves competitive tracking performance against several state-of-the-art trackers.
2 Related Work
In this section, we provide a brief survey of CF-based trackers, and discuss several scale adaptive and part-based CF trackers close to our method.
2.1 Correlation Filter Trackers
Denote by $\mathbf{x}$ an image patch of $M \times N$ pixels, and let $\mathbf{y}$ be a 2D Gaussian-shaped label map. The correlation filter $\mathbf{w}$ is then learned by minimizing the ridge regression objective:

$\min_{\mathbf{w}} \|\mathbf{w} \ast \mathbf{x} - \mathbf{y}\|^2 + \lambda \|\mathbf{w}\|^2$  (1)

where $\lambda$ denotes the regularization parameter, and $\ast$ is the 2D convolution operator. Denote by $\hat{\mathbf{x}}$ the Fourier transform of $\mathbf{x}$, and by $\hat{\mathbf{x}}^{*}$ the complex conjugate of $\hat{\mathbf{x}}$. Using the fast Fourier transform (FFT), the closed-form solution to Eqn. (1) can be given as:

$\mathbf{w} = \mathcal{F}^{-1}\!\left( \dfrac{\hat{\mathbf{x}}^{*} \odot \hat{\mathbf{y}}}{\hat{\mathbf{x}}^{*} \odot \hat{\mathbf{x}} + \lambda} \right)$  (2)

where $\odot$ denotes element-wise multiplication, and $\mathcal{F}^{-1}$ represents the inverse discrete Fourier transform operator.
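The closed-form solution above can be sketched in a few lines of NumPy. This is a minimal single-channel illustration under our own naming (train_cf, detect); practical CF trackers additionally use cosine windows, multi-channel features, and online model updates.

```python
import numpy as np

def train_cf(x, y, lam=1e-2):
    # x: 2D image patch; y: 2D Gaussian-shaped label map of the same size.
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # Closed-form ridge-regression solution in the Fourier domain:
    # conj(X) * Y / (conj(X) * X + lambda), computed element-wise.
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(W_hat, z):
    # Correlate a new patch z with the learned filter (in the Fourier
    # domain) and return the 2D position of the maximum response.
    resp = np.real(np.fft.ifft2(W_hat * np.fft.fft2(z)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

Training on a patch and detecting on that same patch recovers the peak of the Gaussian label, which is the sanity check typically used for CF implementations.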
Since the pioneering MOSSE tracker by Bolme et al. [3], great advances have been made in CF-based tracking. Henriques et al. [14] extend MOSSE to learn a non-linear CF via the kernel trick, and the multi-channel extension of CFs has been studied in [15]. Driven by feature engineering, HOG [6], color names [10] and deep CNN features [23, 29] have been successively adopted in CF-based tracking. Other issues, such as long-term tracking [24], continuous convolution [11], spatial regularization [9, 22], and boundary effects [16], have also been investigated to improve tracking accuracy and robustness. Besides ridge regression, other learning models, e.g., support vector machines (SVM) [40, 27] and sparse coding [32], have also been introduced. Due to page limits, in the following we further review the scale adaptive and part-based CFs that are closest to our work.
2.2 Scale Adaptive and Partbased CF Trackers
The first family of methods close to our approach are scale adaptive CF trackers, which aim to estimate target scale changes during tracking. Among current scale adaptive CF trackers [9, 11, 24], SAMF [17] and DSST [7] are two commonly used methods for scale estimation. They apply the learned filter to samples at multiple resolutions around the target and compute the response for each scale; the scale with the maximum response is taken as the optimal one. However, such a strategy is time-consuming for a large scale space, and many improvements have been proposed. Tang and Feng [33] employ bisection search and a fast feature scaling method to speed up scale space searching. Bibi and Ghanem [2] maximize the posterior probability rather than the likelihood (i.e., the maximum of the response map) across scales for more stable detections. Additionally, Zhang et al. [38] suggest a robust scale estimation method that averages the scales over consecutive frames. Despite their success with isotropic scale variation, such methods cannot well address aspect ratio variation. Different from the aforementioned methods, the proposed IBCCF approach handles aspect ratio variation effectively with the introduction of boundary CFs.
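The multi-resolution search strategy described above reduces, in skeleton form, to evaluating the learned filter at each candidate scale and keeping the best one. The helper below is a hypothetical sketch: `peak_response` stands in for resampling the patch at a scale factor and taking the maximum CF response there.

```python
def search_scales(peak_response, scales=(0.95, 1.0, 1.05)):
    # peak_response: callable mapping a scale factor to the maximum CF
    # response on the patch resampled at that scale (assumed interface).
    # Returns the candidate scale with the highest response.
    return max(scales, key=peak_response)
```

With a larger scale set this exhaustive evaluation is exactly what makes the SAMF/DSST-style search expensive, motivating the speed-ups cited above.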
Our IBCCF also shares some philosophy with part-based CF methods, which divide the entire target into several parts and merge the results from all parts for the final prediction. For example, Liu et al. [21] divide the target into five parts, each assigned an independent CF tracker, and obtain the final target position by merging the five CF trackers via Bayesian inference. Instead of a simple fixed partition, Li et al. [18] propose to exploit reliable parts, estimating their probability distributions under a sequential Monte Carlo framework and employing a Hough voting scheme to locate the target. Along similar lines, Liu et al. [20] propose to jointly learn multiple parts of the target with CF trackers in an ADMM framework. Compared with part-based trackers, the proposed IBCCF has several merits: (1) IBCCF tracks meaningful boundary regions, which is more general than the fixed-partition method [21] and easier to handle than the learned-parts method [20]; (2) with the introduction of 1D boundary CFs, IBCCF naturally deals with the aspect ratio variation problem; (3) the near-orthogonality constraint between boundary and center CFs helps IBCCF achieve better performance than part-based trackers.
3 The Proposed IBCCF Model
In this section, we first introduce the boundary correlation filters. Then, we investigate the near-orthogonality property between the boundary and center CFs, and finally present our IBCCF model.
3.1 Boundary Correlation Filters
In the standard CF, the bounding box of a target with fixed size is uniquely characterized by its center $(c_x, c_y)$. By incorporating scale estimation, the target bounding box can be determined by both the center and a scale factor $s$. However, neither standard nor scale adaptive CFs can address the aspect ratio variation issue, so a better description of the bounding box is required. In CNN-based object detection [12], the bounding box is generally parameterized by its center coordinates, height and width. Although such a parameterization can cope with aspect ratio variation, it is difficult to predict target height and width in the CF framework.
In this work, the bounding box is parameterized with its left, right, top and bottom boundaries $(x_l, x_r, y_t, y_b)$. It is natural to see that such a parameterization is able to handle aspect ratio variation by dynamically adjusting the four boundaries of the target. Moreover, for each boundary, a 1D boundary CF (BCF) is learned to estimate the left, right, top or bottom boundary, respectively. Taking the left boundary as an example, Fig. 2(b) illustrates the process of the 1D boundary CF. Given a target bounding box, let $(c_x, c_y)$ be the center, and $h$ and $w$ the height and width. Its left boundary can be represented as $x_l = c_x - w/2$. Then we crop a left boundary image region centered at $(x_l, c_y)$.
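The boundary parameterization can be made concrete with two small conversion helpers. These are our own illustration; the cropping and padding of the boundary regions themselves are omitted here:

```python
def bbox_to_boundaries(cx, cy, w, h):
    # (center, size) -> (left, right, top, bottom) boundary coordinates.
    return (cx - w / 2.0, cx + w / 2.0, cy - h / 2.0, cy + h / 2.0)

def boundaries_to_bbox(xl, xr, yt, yb):
    # Four independent boundaries determine center, width, and height,
    # so the aspect ratio w/h can change freely from frame to frame.
    return ((xl + xr) / 2.0, (yt + yb) / 2.0, xr - xl, yb - yt)
```

The second helper is where the flexibility comes from: since each boundary is estimated independently (Sec. 3.1), the recovered width and height need not scale together.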
To learn the 1D boundary CF, the left boundary image region is treated as a multi-channel representation of 1D vectors $\{\mathbf{x}_l^k\}_{k=1}^{K}$. Denote by $\mathbf{y}_l$ a 1D Gaussian-shaped label vector centered at the left boundary $x_l$. Then the 1D left boundary CF model can be formulated as,

$\min_{\{\mathbf{w}_l^k\}} \Big\| \sum_{k} \mathbf{w}_l^k \ast \mathbf{x}_l^k - \mathbf{y}_l \Big\|^2 + \lambda \sum_{k} \|\mathbf{w}_l^k\|^2$  (3)

where $\ast$ here denotes the 1D convolution operator. For each channel $k$, the closed-form solution can be obtained by,

$\hat{\mathbf{w}}_l^k = \dfrac{(\hat{\mathbf{x}}_l^k)^{*} \odot \hat{\mathbf{y}}_l}{\sum_{j} (\hat{\mathbf{x}}_l^j)^{*} \odot \hat{\mathbf{x}}_l^j + \lambda}$  (4)
As shown in Fig. 2(a), the center region is convolved with a 2D CF to generate a 2D filtering response map, and the target center is determined by the position with the maximum response. Thus the standard CF can be seen as a center CF (CCF) tracker. In contrast, as shown in Fig. 2(b), the left boundary region is first equivalently rewritten as a multi-channel representation of 1D vectors. The multi-channel 1D vectors are then convolved with multi-channel 1D correlation filters to produce a 1D filtering response, and the left boundary is determined by finding the position with the maximum response. Analogously, the right, top and bottom boundaries can be tracked with their corresponding boundary CFs. Fig. 3 shows the setting of the boundary regions based on the target bounding box.
When a new frame comes, we first crop the boundary regions, which are convolved with the corresponding boundary CFs. The left, right, top and bottom boundaries are then determined from the corresponding 1D filtering responses. Note that each boundary is estimated independently; thus, our BCF approach can adaptively fit the target scale and aspect ratio.
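Under the multi-channel view described above, training and applying a 1D boundary CF can be sketched as follows. This is our own minimal NumPy sketch of the formulation in Eqns. (3)-(4), treating each row of the boundary patch as one channel; the function names are ours.

```python
import numpy as np

def train_bcf(region, y, lam=1e-2):
    # region: (C, W) array, each row of the 2D boundary patch being one
    # channel of 1D vectors. y: 1D Gaussian label centered at the boundary.
    X = np.fft.fft(region, axis=1)
    Y = np.fft.fft(y)
    # Multi-channel 1D closed form: shared denominator summed over channels.
    denom = np.sum(np.conj(X) * X, axis=0) + lam
    return np.conj(X) * Y[None, :] / denom[None, :]

def locate_boundary(W_hat, region):
    # Sum the per-channel responses and take the 1D position of the peak.
    Z = np.fft.fft(region, axis=1)
    resp = np.real(np.fft.ifft(np.sum(W_hat * Z, axis=0)))
    return int(np.argmax(resp))
```

Running the filter on the training region should recover the peak of the 1D label, mirroring the 2D sanity check for the center CF.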
3.2 Near-orthogonality between Boundary and Center CFs
It is natural to see that the boundary and center CFs are complementary and can be integrated to boost tracking performance. To locate the target in the current frame, we can first detect an initial position with the CCF, and then employ the BCFs to further refine the boundaries and position. To update the tracker, we empirically investigate the relationship between the CCF and BCFs, and then suggest including a near-orthogonality regularizer for better integration.
Suppose that the size of the left boundary region for BCF is the same as that of the center region for CCF. Without loss of generality, let $\mathbf{x}$, $\mathbf{w}_0$, and $\mathbf{w}_l$ be the vectorizations of the center region, the center CF, and the left boundary CF, respectively. On one hand, the filtering responses of the left boundary CF should be high at the left boundary and near zero elsewhere, so $\mathbf{w}_l^T \mathbf{x} \approx 0$, which indicates that $\mathbf{w}_l$ and $\mathbf{x}$ are nearly orthogonal.

On the other hand, the filtering responses of the center CF achieve their maximum at the center position, and thus the angle between $\mathbf{w}_0$ and $\mathbf{x}$ should be small. Therefore, $\mathbf{w}_l$ should be nearly orthogonal to $\mathbf{w}_0$.

However, in general the sizes of the left boundary region and the center region are not the same. From Fig. 3, one can see that they share a common region. Let $\tilde{\mathbf{w}}_0$ be the vectorization of the center CF restricted to the common region, and define $\tilde{\mathbf{w}}_l$, $\tilde{\mathbf{w}}_r$, $\tilde{\mathbf{w}}_t$ and $\tilde{\mathbf{w}}_b$ analogously for the boundary CFs. We then extend the near-orthogonality property to the common region, and expect that $\tilde{\mathbf{w}}_0$ and $\tilde{\mathbf{w}}_l$ are also nearly orthogonal, i.e., $\tilde{\mathbf{w}}_0^T \tilde{\mathbf{w}}_l \approx 0$; analogously for the right, top and bottom boundary CFs. Fig. 4 shows the angles between the center CF and the boundary CFs in the common region on the sequence Skiing. From Fig. 4(a), one can note that it roughly holds that the boundary and center CFs are nearly orthogonal. Thus, as illustrated in Fig. 4(b), we expect that better near-orthogonality and tracking accuracy can be attained by imposing the near-orthogonality constraint on the training of the CCF and BCFs. Empirically, on Skiing, introducing the near-orthogonality constraint brings a 4.7% gain in overlap precision during tracking.
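Numerically, checking the near-orthogonality property amounts to measuring the cosine of the angle between the vectorized filters in the common region; values near zero correspond to angles near 90 degrees:

```python
import numpy as np

def cosine(u, v):
    # Cosine of the angle between two vectorized filters; a value near 0
    # means the filters are nearly orthogonal.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

This is the quantity plotted (as an angle) in Fig. 4, and the regularizer in the next subsection penalizes its square via the inner products of the filters.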
3.3 Problem Formulation of IBCCF
By enforcing the near-orthogonality constraint, we propose our IBCCF model to integrate the center CF $\mathbf{w}_0$ and the boundary CFs $\mathbf{w}_i$, $i \in \{l, r, t, b\}$, resulting in the following objective,

$\min_{\mathbf{w}_0, \{\mathbf{w}_i\}} \|\mathbf{w}_0 \ast \mathbf{x}_0 - \mathbf{y}_0\|^2 + \lambda \|\mathbf{w}_0\|^2 + \sum_{i \in \{l,r,t,b\}} \Big( \|\mathbf{w}_i \ast \mathbf{x}_i - \mathbf{y}_i\|^2 + \lambda \|\mathbf{w}_i\|^2 \Big) + \beta \sum_{i \in \{l,r,t,b\}} \big( \tilde{\mathbf{w}}_0^T \tilde{\mathbf{w}}_i \big)^2$  (5)
Comparison with DSST [7]. Although DSST also adopts 1D and 2D CFs, it learns separate 2D and 1D CFs for translation and scale estimation on a scale pyramid representation. Our IBCCF is distinctly different from DSST in three aspects: (i) while DSST formulates scale estimation as a 1D CF, BCF is among the first to suggest a novel parameterization of the bounding box and formulates boundary localization as 1D CFs; (ii) for DSST, the inputs to the 1D CF are image patches at different scales, while the inputs to BCFs are four regions covering the edges of the bounding box; (iii) in DSST, the 1D CF and the 2D translation CF are trained separately, while in IBCCF, the 1D BCFs and the 2D CCF are jointly learned by solving the IBCCF model in Eqn. (5).
4 Optimization
In this section, we propose an ADMM method to minimize Eqn. (5) by alternately updating the center CF and the boundary CFs, where each subproblem can be easily solved with a closed-form solution.
We first employ the variable splitting method to convert Eqn. (5) into a linear equality constrained optimization problem:
(6)  
Hence, the augmented Lagrangian method (ALM) can be applied to solve Eqn. (6), and its augmented Lagrangian form [4] is:
(7)  
where the remaining variables denote the Lagrange multipliers and penalty factors for the equality constraints, respectively. For multi-variable non-convex optimization, ADMM iteratively updates one variable while keeping the rest fixed [4]; although convergence is not guaranteed in the non-convex case, we observe fast empirical convergence (see the convergence discussion below). By using ADMM, Eqn. (7) is divided into the following subproblems:
(8) 
where and .
From Eqn. (5), we can see that the boundary CFs are independent of each other, so the corresponding variable pairs can be updated in parallel for efficiency.
Next, we detail the solution to each subproblem as follows:
First subproblem. Using the properties of circulant matrices and the FFT, its closed-form solution is given as:
(9) 
Second subproblem. The second row of Eqn. (8) is rewritten as:
(10) 
where the matrix is obtained by padding zeros to each column. The solution can then be computed by:
(11) 
Note that the matrix only contains four columns, thus singular value decomposition (SVD) can be used to improve efficiency. Performing SVD on it, we have:
(12) 
where only the nonzero elements of the diagonal matrix need to be retained. Hence, Eqn. (12) can be written as:
(13) 
Since the diagonal matrix only contains four nonzero elements, the expression reduces to the first four columns of the matrix and the diagonal matrix of the nonzero elements. Such a special case can be solved efficiently¹.

¹Please refer to the SVD function with “economy” mode in Matlab.
Third subproblem. Its solution shares a similar form with that of Eqn. (9):
(14) 
Fourth subproblem. The fourth row of Eqn. (8) is written as:
(15)  
where the closed-form solution is:
(16) 
Since it involves a rank-1 matrix, Eqn. (16) can be efficiently solved with the Sherman-Morrison formula [28], so we have:
(17) 
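For reference, the Sherman-Morrison identity used here updates a known inverse after a rank-1 modification without recomputing a full matrix inverse. The following is a generic NumPy sketch of the identity, not the IBCCF-specific solver:

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    # (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
    # Only matrix-vector products are needed once A^{-1} is known.
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)
```

For a d-dimensional filter this replaces an O(d^3) inverse with O(d^2) work, which is why the rank-1 structure of the subproblem matters.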
Convergence. To verify the effectiveness of the proposed ADMM, we illustrate its convergence curve on the sequence Skiing. As shown in Fig. 5, although the IBCCF model is non-convex, it converges within very few iterations (four in this case). This phenomenon is ubiquitous in our experiments: most sequences converge within five iterations.
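Although the IBCCF subproblems are model-specific, the ADMM update cycle itself (closed-form primal updates followed by a dual update) can be illustrated on a toy consensus problem: minimize ||x - a||² + ||z - b||² subject to x = z, whose optimum is x = z = (a + b)/2. This is our own illustrative example, not the IBCCF solver:

```python
def admm_consensus(a, b, rho=1.0, n_iter=300):
    # min ||x - a||^2 + ||z - b||^2  s.t.  x = z  (scaled-dual ADMM).
    x = z = u = 0.0
    for _ in range(n_iter):
        x = (2.0 * a + rho * (z - u)) / (2.0 + rho)  # x-subproblem (closed form)
        z = (2.0 * b + rho * (x + u)) / (2.0 + rho)  # z-subproblem (closed form)
        u = u + x - z                                # dual (multiplier) update
    return x, z
```

The same pattern appears in Eqn. (8): each primal variable is minimized with the others fixed, then the multipliers absorb the constraint residual.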
5 Experiments
Table 1: Mean OP (%) of different trackers on the OTB2013 and OTB2015 datasets.

Tracker            | OTB2013 | OTB2015
C-COT [11]         | 83.4    | 82.7
SINT+ [34]         | 81.3    | -
SKSCF [40]         | 80.9    | 67.4
Scale-DLSSVM [27]  | 73.2    | 65.2
Staple [1]         | 74.9    | 71.3
SRDCF [9]          | 78.0    | 72.7
DeepSRDCF [8]      | 79.2    | 77.6
RPT [18]           | 71.4    | 64.0
MEEM [37]          | 69.1    | 62.3
DSST [7]           | 67.7    | 62.2
SAMF [17]          | 68.9    | 64.5
HCF [23]           | 73.5    | 65.6
SCF [20]           | 79.7    | -
sHCF               | 73.4    | 69.2
IBCCF              | 83.7    | 78.4
In this section, we first compare IBCCF with state-of-the-art trackers on the OTB dataset [35]. Then we validate the effect of each component of IBCCF and analyze the time cost on the OTB dataset. Finally, we conduct comparative experiments on Temple-Color [19] and the VOT benchmarks.
Following the common settings in HCF [23], we implement IBCCF by using the outputs of the conv3-4, conv4-4 and conv5-4 layers of VGG-Net-19 [30] for feature extraction. To combine the responses from different layers, we follow the HCF setting and assign the weights of the three layers in the center CF to 0.02, 0.5 and 1, respectively. For the boundary CFs, we omit the conv3-4 layer and set the weights for the conv4-4 and conv5-4 layers both to 1. The regularization parameters are set to and 0.1, respectively. Note that we employ a subset of 40 sequences from the Temple-Color dataset as the validation set to choose the above parameters. A detailed description of the subset and the corresponding experiments is given in Section 5.3. Our approach is implemented in Matlab using the MatConvNet library. The average running time is about 1.25 fps on a PC equipped with an Intel Xeon(R) 3.3GHz CPU, 32GB RAM and an NVIDIA GTX 1080 GPU.
5.1 OTB benchmark
The OTB benchmark consists of two subsets, i.e., OTB2013 and OTB2015. OTB2013 contains 51 sequences annotated with 11 different attributes, such as scale variation, occlusion and low resolution. OTB2015 extends OTB2013 to 100 videos. We quantitatively evaluate our method with the One-Pass Evaluation (OPE) protocol, where the overlap precision (OP) metric is used, i.e., the fraction of frames in a sequence whose bounding box overlap exceeds 0.5. Besides, we also provide overlap success plots, which report OP over a range of overlap thresholds.
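The OP metric used throughout this section can be computed as follows; these are our own helper functions, with boxes given as (x, y, w, h):

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x, y, w, h).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def overlap_precision(preds, gts, thr=0.5):
    # Fraction of frames whose predicted box overlaps ground truth by > thr.
    return sum(iou(p, g) > thr for p, g in zip(preds, gts)) / float(len(gts))
```

Sweeping `thr` over [0, 1] and plotting OP against it yields exactly the overlap success plots reported below; the AUC score is the area under that curve.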
5.1.1 Comparison with state-of-the-art trackers
We compare our algorithm with 13 state-of-the-art methods: HCF [23], C-COT [11], SRDCF [9], DeepSRDCF [8], SINT+ [34], RPT [18], SCF [20], SAMF [17], Scale-DLSSVM [27], Staple [1], DSST [7], MEEM [37] and SKSCF [40]. Among them, most trackers except HCF and MEEM perform scale estimation during tracking, and both RPT and SCF exploit part-based models. In addition, to verify the effectiveness of BCFs in handling aspect ratio variation, we implement an HCF variant with five scales (denoted by sHCF) under the SAMF framework. Note that we employ the publicly available codes of the compared trackers or copy the results from the original papers for fair comparison.
Table 1 lists the mean OP on the OTB2013 and OTB2015 datasets. From it we can draw the following conclusions: (1) Our IBCCF outperforms most trackers except C-COT [11], and surpasses its counterpart HCF (i.e., center CF only) by 10.2% and 12.8% on OTB2013 and OTB2015, respectively. We attribute these significant improvements to the integration of boundary CFs. C-COT achieves higher mean OP than IBCCF on OTB2015; it should be noted that spatial regularization is adopted to suppress the boundary effect in both DeepSRDCF [8] and C-COT, and C-COT further extends DeepSRDCF by learning multiple convolutional filters in the continuous spatial domain. In contrast, our IBCCF considers neither spatial regularization nor continuous convolution, yet still yields favorable performance against the competing trackers. (2) Our IBCCF is consistently superior to sHCF and other scale-estimation-based methods (e.g., DeepSRDCF) on both datasets, indicating that boundary CFs are more helpful to CF-based trackers than scale estimation alone. (3) Compared with part-based trackers (e.g., SCF), our IBCCF also shows its superiority, i.e., a 4% gain over SCF on OTB2013.
Next, we show the overlap success plots of different trackers, ranked by the Area-Under-the-Curve (AUC) score. As shown in Fig. 6, our IBCCF tracker is among the top three trackers on both datasets and outperforms HCF by 5.9% and 6.8% on OTB2013 and OTB2015, respectively.






5.1.2 Video attributes related to aspect ratio variation
In this subsection, we analyze the attributes related to aspect ratio variation on the OTB2015 dataset. Here we only provide the overlap success plots for four attributes with great influence on aspect ratio variation; the remaining results can be found in the supplementary material.
Scale Variation: In the case of scale variation, the target size continuously changes during tracking. It is worth noting that most videos with this attribute only undergo scale changes rather than aspect ratio variation. Even in such a setting, as illustrated in the upper left of Fig. 7, IBCCF still performs favorably among the compared trackers and is superior to sHCF by 9.5%, demonstrating its effectiveness in handling target size variations.
In-plane/Out-of-plane Rotation: In this case, targets undergo rotation due to fast motion or viewpoint changes, which often causes aspect ratio variation. As shown in the upper right and lower left of Fig. 7, our IBCCF is robust to such variations and outperforms most of the other trackers. Specifically, IBCCF achieves remarkable improvements over its counterpart HCF, i.e., 2.5% and 6.4% gains in the cases of in-plane and out-of-plane rotation, respectively. This indicates that IBCCF can deal with aspect ratio variation caused by rotation.
Occlusion: Obviously, partial occlusion can lead to aspect ratio variation of the target, and complete occlusion also has an adverse impact on boundary prediction. Despite these negative effects, IBCCF still outperforms most of the competing trackers and brings a 7.6% gain over the center CF tracker (i.e., HCF).
5.2 Internal Analysis of the proposed approach
5.2.1 Impacts of Boundary CFs and Near-orthogonality
Here we investigate the impact of the boundary CFs and the near-orthogonality property on the proposed IBCCF approach. To this end, we construct four variants of IBCCF: the tracker with only the 1D boundary CFs (BCFs), the tracker with only the center CF (i.e., HCF), the IBCCF tracker without the orthogonality constraint, denoted by IBCCF (w/o constraint), and the full IBCCF model. Table 2 summarizes the mean OP and AUC scores of the four methods on OTB2015.
From Table 2, one can see that both the BCFs and the orthogonality constraint are key parts of the proposed IBCCF method, and they bring significant improvements over the center CF tracker. Detailed analysis of the results can be found in the supplementary material.
Table 2: Mean OP (%) and AUC score (%) of the four variants on OTB2015.

Variant                 | Mean OP | AUC Score
BCFs                    | 50.2    | 39.0
Center CF               | 65.6    | 56.2
IBCCF (w/o constraint)  | 72.0    | 58.9
IBCCF                   | 78.4    | 63.0
5.2.2 Time Analysis
Here we analyze the average time cost of each stage of IBCCF on the OTB2015 dataset. The results are shown in Table 3. One can clearly see that all subproblems can be solved rapidly, validating the efficiency of the ADMM solution. Overall, the average running time of IBCCF and IBCCF (w/o constraint) is about 1.25 and 2.19 fps on OTB2015, respectively.
Table 3: Average time cost (ms) of each stage of IBCCF on OTB2015.

Stage                   | Time Cost (ms)
CCF Feature Extraction  | 95
BCFs Feature Extraction | 141
CCF Prediction          | 26
BCFs Prediction         | 40
First subproblem        | 51
Second subproblem       | 37
Third subproblem        | 40
Fourth subproblem       | 4
Table 4: Comparison with state-of-the-art trackers on VOT2016 in terms of EAO, accuracy and robustness.

Tracker        | EAO   | Accuracy | Robustness
DSST [7]       | 0.181 | 0.50     | 2.72
sKCF           | 0.153 | 0.44     | 2.75
Struck [13]    | 0.142 | 0.44     | 1.50
C-COT [11]     | 0.331 | 0.52     | 0.85
SRDCF [9]      | 0.247 | 0.52     | 1.50
DeepSRDCF [8]  | 0.276 | 0.51     | 1.17
Staple [1]     | 0.295 | 0.54     | 1.35
MDNet_N [26]   | 0.257 | 0.53     | 1.20
TCNN [25]      | 0.325 | 0.54     | 0.96
DPT            | 0.236 | 0.48     | 1.75
SMPR           | 0.147 | 0.44     | 2.78
SHCT           | 0.266 | 0.54     | 1.42
HCF [23]       | 0.220 | 0.47     | 1.38
IBCCF          | 0.266 | 0.51     | 1.22
5.3 Temple-Color dataset
In this section, we perform comparative experiments on the Temple-Color dataset, which contains 128 color sequences. Different from the OTB dataset, it contains more video sequences with aspect ratio changes. Hence, to better exploit the potential of IBCCF, we also choose a subset of 40 sequences with the largest standard deviations of aspect ratio variation from the Temple-Color dataset and compare IBCCF with other methods on it. Note that the sequences in the subset do not overlap with the other datasets. In addition, to validate the effectiveness of IBCCF with hand-crafted features, we implement two variants of IBCCF with HOG and color name [10] features (i.e., IBCCF-HOGCN and IBCCF-HOGCN (w/o constraint)).
Fig. 8 illustrates the overlap success plots of different trackers on the two datasets. From Fig. 8(a), one can see that IBCCF ranks second among all trackers, again demonstrating its effectiveness in handling aspect ratio variation. Furthermore, IBCCF-HOGCN also performs favorably against the other methods and surpasses all of its counterparts (i.e., IBCCF-HOGCN (w/o constraint), DSST and SAMF), validating the superiority of IBCCF under the hand-crafted feature setting. As shown in Fig. 8(b), IBCCF is among the top three best-performing trackers and outperforms its counterpart HCF by 4.4%.


5.4 The VOT benchmarks
Finally, we conduct experiments on the Visual Object Tracking (VOT) benchmark [5], which consists of 60 challenging videos from real-life datasets. In the VOT benchmark, a tracker is initialized in the first frame and re-initialized whenever it drifts away from the target. The performance is measured in terms of accuracy, robustness and expected average overlap (EAO). The accuracy computes the average overlap ratio between the estimated positions and the ground truth, the robustness score evaluates the average number of tracking failures, and the EAO metric measures the average no-reset overlap of a tracker run on several short-term sequences.
VOT2016 results. We compare IBCCF with several state-of-the-art trackers, including MDNet [26] (VOT2015 winner), TCNN [25] (VOT2016 winner) and part-based trackers such as DPT, GGTv2, SMPR and SHCT. All the results are obtained from the VOT2016 challenge website². Table 4 lists the results on the VOT2016 dataset. One can note that IBCCF outperforms HCF in terms of all three metrics. In addition, IBCCF also performs favorably against the part-based trackers, validating the superiority of boundary tracking in handling aspect ratio variation.

²http://www.votchallenge.net/vot2016/
VOT2017 results. At the time of writing, the results of the VOT2017 challenge were not yet available. Hence, we only report our own results on the three metrics. In particular, the EAO, accuracy and robustness scores of IBCCF on the VOT2017 dataset are 0.209, 0.48 and 1.57, respectively.
6 Conclusion
In this work, we propose a tracking framework that integrates boundary and center correlation filters (IBCCF) to address the aspect ratio variation problem. Besides tracking the target center, a family of 1D boundary CFs is introduced to localize the left, right, top and bottom boundaries, and can thus adapt flexibly to changes in target scale and aspect ratio. Furthermore, we analyze the near-orthogonality property between the center and boundary CFs, and impose an extra orthogonality constraint on the IBCCF model to improve performance. An ADMM algorithm is also developed to solve the proposed model. We perform both qualitative and quantitative evaluation on four challenging benchmarks, and the results show that the proposed IBCCF approach performs favorably against several state-of-the-art trackers. Since we only employ the basic HCF model as the center CF tracker, in the future we will incorporate spatial regularization and continuous convolution to further improve IBCCF.
7 Acknowledgements
This work is supported by the National Natural Science Foundation of China (grant no. 61671182 and 61471082) and Hong Kong RGC General Research Fund (PolyU 152240/15E).
References
 [1] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, and P. Torr. Staple: Complementary learners for realtime tracking. In CVPR, 2016.
 [2] A. Bibi and B. Ghanem. Multitemplate scaleadaptive kernelized correlation filters. In ICCVW, 2015.
 [3] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. In CVPR, 2010.
 [4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
 [5] L. Čehovin, A. Leonardis, and M. Kristan. Visual object tracking performance measures revisited. TIP, 25(3):1261–1274, 2016.
 [6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
 [7] M. Danelljan, G. Hager, F. S. Khan, and M. Felsberg. Discriminative scale space tracking. TPAMI, PP(99):1–1, 2016.
 [8] M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Convolutional features for correlation filter based visual tracking. In ICCVW, 2015.
 [9] M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Learning spatially regularized correlation filters for visual tracking. In ICCV, 2015.
 [10] M. Danelljan, F. S. Khan, M. Felsberg, and J. V. D. Weijer. Adaptive color attributes for realtime visual tracking. In CVPR, 2014.
 [11] M. Danelljan, A. Robinson, F. Shahbaz Khan, and M. Felsberg. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In ECCV, 2016.
 [12] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
 [13] S. Hare, A. Saffari, and P. H. Torr. Struck: Structured output tracking with kernels. In ICCV, 2011.
 [14] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. TPAMI, 37(3):583–596, 2015.
 [15] H. Kiani Galoogahi, T. Sim, and S. Lucey. Multi-channel correlation filters. In ICCV, 2013.
 [16] H. Kiani Galoogahi, T. Sim, and S. Lucey. Correlation filters with limited boundaries. In CVPR, 2015.
 [17] Y. Li and J. Zhu. A scale adaptive kernel correlation filter tracker with feature integration. In ECCVW, 2014.
 [18] Y. Li, J. Zhu, and S. C. H. Hoi. Reliable patch trackers: Robust visual tracking by exploiting reliable patches. In CVPR, 2015.
 [19] P. Liang, E. Blasch, and H. Ling. Encoding color information for visual tracking: Algorithms and benchmark. TIP, 24(12):5630–5644, 2015.
 [20] S. Liu, T. Zhang, X. Cao, and C. Xu. Structural correlation filter for robust visual tracking. In CVPR, 2016.
 [21] T. Liu, G. Wang, and Q. Yang. Real-time part-based visual tracking via adaptive correlation filters. In CVPR, 2015.
 [22] A. Lukežič, T. Vojíř, L. Čehovin, J. Matas, and M. Kristan. Discriminative correlation filter with channel and spatial reliability. arXiv preprint arXiv:1611.08461, 2016.
 [23] C. Ma, J.B. Huang, X. Yang, and M.H. Yang. Hierarchical convolutional features for visual tracking. In ICCV, 2015.
 [24] C. Ma, X. Yang, C. Zhang, and M.-H. Yang. Long-term correlation tracking. In CVPR, 2015.
 [25] H. Nam, M. Baek, and B. Han. Modeling and propagating CNNs in a tree structure for visual tracking. arXiv preprint arXiv:1608.07242, 2016.
 [26] H. Nam and B. Han. Learning multi-domain convolutional neural networks for visual tracking. In CVPR, 2016.
 [27] J. Ning, J. Yang, S. Jiang, L. Zhang, and M.-H. Yang. Object tracking via dual linear structured SVM and explicit feature map. In CVPR, 2016.
 [28] M. Pedersen, K. Baxter, B. Templeton, and D. Theobald. The matrix cookbook. Technical University of Denmark, 7:15, 2008.
 [29] Y. Qi, S. Zhang, L. Qin, H. Yao, Q. Huang, J. Lim, and M.-H. Yang. Hedged deep tracking. In CVPR, 2016.
 [30] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
 [31] A. W. M. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah. Visual tracking: An experimental survey. TPAMI, 36(7):1442–1468, 2014.
 [32] Y. Sui, Z. Zhang, G. Wang, Y. Tang, and L. Zhang. Real-time visual tracking: Promoting the robustness of correlation filter learning. In ECCV, 2016.
 [33] M. Tang and J. Feng. Multi-kernel correlation filter for visual tracking. In ICCV, 2015.
 [34] R. Tao, E. Gavves, and A. W. Smeulders. Siamese instance search for tracking. In CVPR, 2016.
 [35] Y. Wu, J. Lim, and M.H. Yang. Object tracking benchmark. TPAMI, 37(9):1834–1848, 2015.
 [36] H. Yang, L. Shao, F. Zheng, L. Wang, and Z. Song. Recent advances and trends in visual tracking: A review. Neurocomputing, 74(18):3823–3831, 2011.
 [37] J. Zhang, S. Ma, and S. Sclaroff. MEEM: Robust tracking via multiple experts using entropy minimization. In ECCV, 2014.
 [38] K. Zhang, L. Zhang, Q. Liu, D. Zhang, and M.-H. Yang. Fast visual tracking via dense spatio-temporal context learning. In ECCV, 2014.
 [39] L. Zhang, W. Wu, T. Chen, N. Strobel, and D. Comaniciu. Robust object tracking using semi-supervised appearance dictionary learning. Pattern Recognition Letters, 62(C):17–23, 2015.
 [40] W. Zuo, X. Wu, L. Lin, L. Zhang, and M.H. Yang. Learning support correlation filters for visual tracking. arXiv preprint arXiv:1601.06032, 2016.