Review of Deep Learning-Based Semantic Segmentation


Abstract

Semantic segmentation is a core technology in computer vision: it classifies every pixel in an image and thereby divides the image into regions with specific semantic categories. In recent years, convolutional neural networks (CNNs) have achieved repeated breakthroughs, and deep learning methods have shown great potential for semantic segmentation. Starting from the definition of semantic segmentation, this review first discusses the challenges currently facing the field. After introducing the relevant CNN principles, it compares in detail several datasets used to evaluate semantic segmentation algorithms, and then surveys recent deep learning methods based on decoders, information fusion, and recurrent neural networks. Finally, it summarizes the field and outlines three future trends: enriching the scenes covered by datasets, improving the real-time performance of algorithms, and developing semantic segmentation of three-dimensional point clouds.
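As an illustrative sketch of the per-pixel classification that defines semantic segmentation (not a method taken from the review itself), the snippet below assigns each pixel the class with the highest score. The class names, the score layout, and the helper `segment` are hypothetical and chosen only for illustration; in practice the scores would typically come from a CNN.

```python
from typing import List

# Hypothetical class names, for illustration only.
CLASSES = ["background", "road", "car"]


def segment(scores: List[List[List[float]]]) -> List[List[int]]:
    """Return a label map: for each pixel, the index of its highest-scoring class.

    scores[r][c] is the list of per-class scores for the pixel at row r, column c.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# A toy 2x2 "image" with made-up scores for the three classes.
toy_scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]],
]
label_map = segment(toy_scores)
```

Pixels that share a label and are adjacent in `label_map` form the semantic regions the abstract refers to.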


DOI:10.3788/LOP56.150003

Section: Review

Funding: National Natural Science Foundation of China (61773395)

Received: 2019-01-25

Revised: 2019-03-05

Published online: 2019-08-01

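Author affiliations

Xiangfu Zhang: College of Weaponry Engineering, Naval University of Engineering, Wuhan, Hubei 430032, China
Jian Liu: College of Weaponry Engineering, Naval University of Engineering, Wuhan, Hubei 430032, China
Zhangsong Shi: College of Weaponry Engineering, Naval University of Engineering, Wuhan, Hubei 430032, China
Zhonghong Wu: College of Weaponry Engineering, Naval University of Engineering, Wuhan, Hubei 430032, China
Zhi Wang: College of Weaponry Engineering, Naval University of Engineering, Wuhan, Hubei 430032, China

Corresponding author: Jian Liu (liujian_nue@163.com)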



Cite this paper

Xiangfu Zhang, Jian Liu, Zhangsong Shi, Zhonghong Wu, Zhi Wang. Review of Deep Learning-Based Semantic Segmentation[J]. Laser & Optoelectronics Progress, 2019, 56(15): 150003

