Survey of target detection based on deep convolutional neural networks

Abstract

Object detection, a fundamental visual recognition problem in computer vision, has been studied extensively over the past few decades. Its aim is to accurately locate specific objects in a given image and assign a corresponding label to each one. In recent years, deep convolutional neural networks (DCNNs) have achieved a series of breakthroughs in image classification owing to their powerful feature learning and transfer learning capabilities, and they have attracted growing attention for object detection. How to apply CNNs to object detection and obtain better performance is therefore an important research topic. We first review several classes of classic object detection algorithms. Next, taking the emergence of deep learning algorithms as a starting point, we analyze the technical ideas and key problems of applying DCNNs to object detection and give a systematic, comprehensive overview of the various detection methods. Finally, in view of the major challenges facing object detection and deep learning algorithms, we discuss several future directions to promote research on deep learning for object detection.

FAN Li-li, ZHAO Hong-wei, ZHAO Hao-yu, HU Huang-shui, WANG Zhen. Survey of target detection based on deep convolutional neural networks[J]. Optics and Precision Engineering, 2020, 28(5): 1152.
