Opto-Electronic Engineering, 2019, 46(12): 190159. Published online: 2020-01-09

Improved algorithm of Faster R-CNN based on double threshold-non-maximum suppression
Author affiliations
1 School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
2 Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
3 Information and Navigation College, Air Force Engineering University, Xi'an, Shaanxi 710077, China
Abstract
To address the problems of missed detection and repeated detection in object detection, this paper proposes an improved Faster R-CNN algorithm based on double threshold-non-maximum suppression. The algorithm first extracts multi-layer convolutional features of the targets with a deep convolutional network architecture; it then applies the proposed double threshold-non-maximum suppression (DT-NMS) algorithm in the RPN (region proposal network) stage to exploit the deep information of the target candidate regions; finally, it replaces the nearest-neighbor interpolation in the original RoI pooling layer with bilinear interpolation, so that the algorithm localizes targets more accurately on the detection datasets. Experimental results show that DT-NMS both effectively balances the trade-off a single-threshold algorithm must make between missed detections and false detections, and specifically reduces the probability of the same target being detected multiple times. Compared with soft-NMS, the repeated detection rate on PASCAL VOC2007 is reduced by 2.4%, and the misclassification rate of multiply detected targets is reduced by 2%. Compared with Faster R-CNN, the proposed algorithm reaches 74.7% detection accuracy on PASCAL VOC2007, an improvement of 1.5%, and improves performance on the MS COCO dataset by 1.4%. The algorithm also maintains a fast detection speed of 16 FPS.
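The abstract does not spell out the exact DT-NMS update rule. The sketch below illustrates one plausible reading of a double-threshold suppression scheme: overlaps above a high IoU threshold are hard-suppressed (treated as duplicates of the same target), overlaps between the two thresholds only have their scores decayed, soft-NMS style, and overlaps below the low threshold are left untouched. The parameter names `t_low`, `t_high`, `sigma`, and the Gaussian decay are assumptions for illustration, not the paper's published formulation.

```python
import numpy as np

def iou(box, boxes):
    # Intersection-over-union of one [x1, y1, x2, y2] box against an (N, 4) array.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def double_threshold_nms(boxes, scores, t_low=0.3, t_high=0.7,
                         sigma=0.5, score_thresh=0.001):
    """Hypothetical double-threshold NMS:
      IoU >= t_high         -> hard suppression (likely duplicate detection)
      t_low <= IoU < t_high -> Gaussian score decay, as in soft-NMS
      IoU < t_low           -> box kept untouched
    Returns indices of kept boxes, highest-scoring first."""
    scores = scores.astype(float).copy()
    keep = []
    idxs = np.arange(len(boxes))
    while len(idxs) > 0:
        best = idxs[np.argmax(scores[idxs])]
        keep.append(int(best))
        rest = idxs[idxs != best]
        if len(rest) == 0:
            break
        ious = iou(boxes[best], boxes[rest])
        # Hard-suppress near-duplicates of the selected box.
        scores[rest[ious >= t_high]] = 0.0
        # Decay scores of ambiguous overlaps instead of discarding them.
        mid = (ious >= t_low) & (ious < t_high)
        scores[rest[mid]] *= np.exp(-(ious[mid] ** 2) / sigma)
        idxs = rest[scores[rest] > score_thresh]
    return keep
```

For example, two almost-identical boxes over one object collapse to a single detection, while a distant box survives unchanged, which matches the abstract's stated goal of reducing repeated detections without raising the miss rate.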
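The RoI pooling change can be illustrated in isolation: the original layer rounds a region's fractional coordinates to the nearest feature-map cell, while the improved version samples at the fractional position by bilinear interpolation, preserving sub-cell localization information. This is a minimal single-point sketch of that idea, assuming a 2D feature map; the paper's actual layer operates on full RoIs inside the network.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample feature map `feat` (H x W) at fractional (y, x) by bilinear
    interpolation, instead of rounding to the nearest cell as the original
    RoI pooling's nearest-neighbor scheme does."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the top and bottom rows, then along y.
    top = feat[y0, x0] * (1 - dx) + feat[y0, x1] * dx
    bot = feat[y1, x0] * (1 - dx) + feat[y1, x1] * dx
    return top * (1 - dy) + bot * dy
```

Sampling the center of a 2x2 cell blends all four neighbors rather than snapping to one of them, which is why the interpolated variant localizes boxes more precisely.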

Hou Zhiqiang, Liu Xiaoyi, Yu Wangsheng, Ma Sugang. Improved algorithm of Faster R-CNN based on double threshold-non-maximum suppression[J]. Opto-Electronic Engineering, 2019, 46(12): 190159.
