Semiconductor Optoelectronics, 2020, 41(1): 1. Published online: 2020-04-13

Research Progresses of Target Detection Technology Based on Deep Learning
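Author Affiliation
School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China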
Abstract
Deep learning has become the most widely used technical approach in computer vision, and deep learning-based target detection is currently a popular research topic. This paper first reviews recent research progress in target detection at home and abroad, and analyzes and summarizes the advantages and disadvantages of traditional target detection methods. It then describes in detail several deep learning-based target detection techniques, along with their strengths and weaknesses. Finally, it discusses the open problems of deep learning at the present stage and directions for future development.

LUO Yuan, WANG Boyu, CHEN Xu. Research Progresses of Target Detection Technology Based on Deep Learning[J]. Semiconductor Optoelectronics, 2020, 41(1): 1.
