Research Progress of Deep Learning in Visual Localization and Three-Dimensional Structure Recovery

Abstract

This paper reviews research on, and applications of, deep learning for recovering three-dimensional structure from images or video, estimating depth, and localizing visual sensors in real time. It first gives an overview of deep learning research. It then analyzes and compares representative deep learning algorithms and systems, grouped by whether they are supervised or unsupervised. Finally, it discusses recent research hotspots in deep learning and presents conclusions and an outlook.

Additional information

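CLC number: TP391.41

DOI: 10.3788/lop55.050007

Section: Review

Funding: National Natural Science Foundation of China (61501470); Key Research and Development Program of Shaanxi Province (2017GY-075)

Received: 2017-10-09

Revised: 2017-11-16

Published online: --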

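Author affiliations

Bao Zhenqiang: Rocket Force University of Engineering, Xi'an, Shaanxi 710025
Li Aihua: Rocket Force University of Engineering, Xi'an, Shaanxi 710025
Cui Zhigao: Rocket Force University of Engineering, Xi'an, Shaanxi 710025
Yuan Meng: Rocket Force University of Engineering, Xi'an, Shaanxi 710025

Corresponding author: Bao Zhenqiang (bzhenqiang@163.com)

Note: Bao Zhenqiang (born 1991), male, master's student, works mainly on computer vision and on visual localization and navigation. E-mail: bzhenqiang@163.com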

Cite this paper

Bao Zhenqiang, Li Aihua, Cui Zhigao, Yuan Meng. Research Progress of Deep Learning in Visual Localization and Three-Dimensional Structure Recovery[J]. Laser & Optoelectronics Progress, 2018, 55(5): 050007.
