
Progress in Deep Learning Based Monocular Image Depth Estimation


Abstract

Estimating scene depth from a two-dimensional image is a classic problem in computer vision and an important step in three-dimensional reconstruction and scene perception. Deep learning-based monocular image depth estimation has developed rapidly in recent years, with new algorithms emerging continually. This review traces the application history and research progress of deep learning in this field and systematically analyzes representative algorithms and network architectures under two categories, supervised and unsupervised learning. It then summarizes the research progress and trends of deep learning in monocular depth estimation, identifies the shortcomings of current research, and outlines likely directions for future work.

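Supplementary information

CLC number: TP391

DOI: 10.3788/LOP56.190001

Section: Review

Funding: National Key Research and Development Project

Received: 2019-03-20

Revised: 2019-04-11

Published online: 2019-10-01

Author affiliations

Li Yang: Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871
Chen Xiuwan: Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871
Wang Yuan: Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871
Liu Maolin: Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871

Corresponding author: Li Yang (yang.li2012@pku.edu.cn)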


Cite this paper

Li Yang, Chen Xiuwan, Wang Yuan, Liu Maolin. Progress in Deep Learning Based Monocular Image Depth Estimation[J]. Laser & Optoelectronics Progress, 2019, 56(19): 190001

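Cited by

[1] Dai Renyue, Fang Zhijun, Gao Yongbin. Unsupervised monocular depth estimation fusing dilated convolutional networks and SLAM. Laser & Optoelectronics Progress, 2020, 57(6): 61007

[2] Yang Hong, Xu Aijun. Depth map generation algorithm for standing trees based on short-video images. Laser & Optoelectronics Progress, 2020, 57(16): 161011

[3] Qian Fuchen, Guo Zhengru, Dong Wenqian, Hu Xiaolei, Chen Fei, Hao Qiang, Zeng Heping. High-precision synchronized femtosecond and picosecond pulse generation technology. Chinese Journal of Lasers, 2020, 47(10): 1001001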
