Acta Photonica Sinica, 2019, 48(3): 0315002, published online: 2019-04-02
Hierarchical Convolutional Features via Adaptive Selection for Visual Tracking
Keywords: Machine vision; Object tracking; Correlation filter; Convolutional neural network; Channel selection
Abstract
This paper proposes a hierarchical convolutional correlation filter tracker with adaptive feature selection. The goals are to improve both the speed and the accuracy of the hierarchical convolutional correlation filter tracker and to reduce the influence of uninformative convolutional channels on tracking accuracy. The method uses features from two convolutional layers that characterize the target, and it merges correlation filter training with prediction. In each frame of a video sequence, the method computes a ratio for every convolutional channel. This ratio compares the mean feature value inside the previous frame's target region with the mean feature value in the non-target region. Only the channels whose ratio meets the selection criterion are used to train the correlation filter classifier. The target position is then predicted from the maximum response between the classifier and the target features. Finally, the target features of the initial frame are sparsely updated according to the prediction, and these features serve as the basis for training the classifier in subsequent frames.

The method is evaluated on the OTB-100 benchmark dataset. It achieves an average distance precision of 91%, an average overlap precision of 64.4%, and an average speed of 21.7 frames per second. These results are, respectively, 7.3 percentage points, 8.2 percentage points, and 11.3 frames per second higher than those of the original hierarchical convolutional correlation filter tracker. Its average distance precision is also 1.2 percentage points higher than that of the high-accuracy continuous convolution operator tracker (CCOT), and it runs almost 20 times faster than CCOT. The method thus effectively improves the speed and accuracy of hierarchical convolutional tracking. It tracks the target stably under occlusion, fast motion, and other interference.
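The per-frame procedure described in the abstract can be sketched in NumPy. The steps are channel selection by a target/background mean ratio, correlation filter training, peak-response localization, and a sparse model update. This is a minimal illustration under stated assumptions, not the authors' implementation.

- The function names are hypothetical, and so are the threshold `ratio_thresh`, the regularizer `lam`, and the update schedule (`interval`, `rate`).
- The filter is a standard multi-channel linear correlation filter solved in the Fourier domain.
- One synthetic random feature map stands in for the two CNN layers.
- Cosine windowing, scale estimation, and fusion of the two layers' responses are omitted.

```python
import numpy as np


def gaussian_label(h, w, sigma=2.0):
    """Desired correlation output y: a Gaussian peak at index (0, 0)."""
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))
    # Move the peak from the window centre to (0, 0), the zero-shift position.
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))


def select_channels(feat, target_mask, ratio_thresh=1.5):
    """Keep channels whose mean activation inside the target region is at least
    ratio_thresh times the mean outside it (hypothetical threshold value).

    feat: (H, W, C) non-negative feature map; target_mask: (H, W) boolean mask.
    """
    inside = feat[target_mask].mean(axis=0)    # per-channel mean on target, (C,)
    outside = feat[~target_mask].mean(axis=0)  # per-channel mean off target, (C,)
    ratio = inside / (outside + 1e-8)
    keep = np.flatnonzero(ratio >= ratio_thresh)
    if keep.size == 0:  # fall back to the single most discriminative channel
        keep = np.array([int(np.argmax(ratio))])
    return keep


def train_filter(x, y, lam=1e-4):
    """Multi-channel linear correlation filter, solved per frequency:
    W^d = Y * conj(X^d) / (sum_i X^i * conj(X^i) + lam)."""
    X = np.fft.fft2(x, axes=(0, 1))
    Y = np.fft.fft2(y)
    energy = np.sum(X * np.conj(X), axis=2).real
    return Y[:, :, None] * np.conj(X) / (energy[:, :, None] + lam)


def detect(filt, z):
    """Response f = IFFT(sum_d W^d * Z^d); return the peak as a (dy, dx) shift."""
    Z = np.fft.fft2(z, axes=(0, 1))
    response = np.real(np.fft.ifft2(np.sum(filt * Z, axis=2)))
    h, w = response.shape
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    # Circular indices past the half-window correspond to negative shifts.
    dy = int(dy) - h if dy > h // 2 else int(dy)
    dx = int(dx) - w if dx > w // 2 else int(dx)
    return dy, dx


def sparse_update(model_feat, new_feat, frame_idx, interval=5, rate=0.01):
    """Hypothetical sparse update: blend the stored initial-frame features with
    the newly localized target features only on every `interval`-th frame."""
    if frame_idx % interval != 0:
        return model_feat
    return (1.0 - rate) * model_feat + rate * new_feat


# Synthetic demo: random features in which channels 0-2 fire on the target.
rng = np.random.default_rng(0)
h, w, c = 32, 32, 8
feat = rng.random((h, w, c))
mask = np.zeros((h, w), dtype=bool)
mask[12:20, 12:20] = True
feat[mask, :3] += 2.0

keep = select_channels(feat, mask)          # channels used to train the filter
x = feat[:, :, keep]
filt = train_filter(x, gaussian_label(h, w))
z = np.roll(x, (3, -2), axis=(0, 1))        # "next frame": target moved 3 down, 2 left
shift = detect(filt, z)
print(keep.tolist(), shift)                 # [0, 1, 2] (3, -2)
```

In this toy setting, the filter is trained on only 3 of the 8 channels, and it recovers the circular shift of the target exactly. The per-frame FFT cost grows with the number of retained channels, so discarding uninformative channels also reduces computation in the sketch. A real tracker would instead crop a windowed search region around the previous position in each new frame and rerun the selection on the updated model features.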
XIONG Chang-zhen, CHE Man-qiang, GE Jin-peng. Hierarchical Convolutional Features via Adaptive Selection for Visual Tracking[J]. Acta Photonica Sinica, 2019, 48(3): 0315002.