Research Progresses of Target Detection Technology Based on Deep Learning

LUO Yuan; WANG Boyu; CHEN Xu

doi:10.16818/j.issn1001-5868.2020.01.001

Semiconductor Optoelectronics, 2020, 41(1): 1. Published online: 2020-04-13

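Author affiliation

School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China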

Keywords: computer vision; deep learning; target detection

Abstract

Deep learning has become the most widely used approach in computer vision, and target detection based on deep learning is an active research topic. This paper first reviews recent progress in target detection at home and abroad and summarizes the advantages and disadvantages of traditional detection methods. It then describes several deep-learning-based detection techniques in detail, together with their strengths and weaknesses. Finally, it discusses the open problems of deep learning at its current stage and directions for future development.

LUO Yuan, WANG Boyu, CHEN Xu. Research Progresses of Target Detection Technology Based on Deep Learning[J]. Semiconductor Optoelectronics, 2020, 41(1): 1.
