
Double-Stream Convolutional Networks with Sequential Optical Flow Image for Action Recognition

Abstract

To exploit the long-term temporal information in action videos and improve recognition accuracy, we propose an action recognition algorithm that combines sequential optical flow images with a double-stream convolutional neural network. First, the Rank support vector machine (SVM) algorithm compresses a continuous optical flow sequence into a single sequential optical flow image, which models the long-term temporal structure of the video. Second, we design a double-stream convolutional network consisting of an appearance and short-term motion stream and a long-term motion stream; the streams take stacked RGB frames and sequential optical flow images as input, respectively, and extract the video's appearance and short-term motion information and its long-term motion information. Finally, the C3D descriptor and the VGG descriptor produced by the two streams are fused and fed into a linear SVM for action recognition. Experimental results on the HMDB51 and UCF101 datasets show that the proposed algorithm makes effective use of spatial appearance information and temporal motion information and achieves high recognition accuracy on action videos.
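The compression of an optical flow sequence into a single image can be illustrated with a small sketch. The abstract states that a Rank SVM is used; the code below does not solve that optimization. As an assumption made purely for illustration, it substitutes the closed-form linear weighting known from approximate rank pooling, in which frame t of T receives the coefficient 2t - T - 1, so that later frames contribute with larger weights. The function names and the flattened-list frame representation are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def rank_pool_weights(num_frames: int) -> List[float]:
    """Coefficients alpha_t = 2t - T - 1 for t = 1..T (assumed approximation)."""
    if num_frames < 1:
        raise ValueError("need at least one frame")
    return [2.0 * t - num_frames - 1.0 for t in range(1, num_frames + 1)]


def sequential_flow_image(flows: Sequence[Sequence[float]]) -> List[float]:
    """Collapse T flattened optical flow frames into one weighted-sum frame."""
    weights = rank_pool_weights(len(flows))
    size = len(flows[0])
    if any(len(frame) != size for frame in flows):
        raise ValueError("all frames must have the same number of values")
    pooled = [0.0] * size
    for weight, frame in zip(weights, flows):
        for i, value in enumerate(frame):
            pooled[i] += weight * value
    return pooled


def to_uint8(values: Sequence[float]) -> List[int]:
    """Min-max rescale to 0..255 so the result can be stored as an image."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


# Toy input: 4 frames with 3 values each. The first value grows over time,
# the second shrinks, and the third stays constant.
flows = [
    [0.0, 3.0, 1.0],
    [1.0, 2.0, 1.0],
    [2.0, 1.0, 1.0],
    [3.0, 0.0, 1.0],
]
pooled = sequential_flow_image(flows)  # [10.0, -10.0, 0.0]
image = to_uint8(pooled)
```

In the toy input, the value that increases over time maps to a large positive response, the decreasing one to a large negative response, and the constant one to zero, which is the sense in which a single image can summarize temporal order. A real pipeline would operate on dense two-channel flow fields and would feed the resulting image to the long-term motion stream.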

Supplementary information

CLC number: TP391

DOI:10.3788/aos201838.0615002

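Section: Machine Vision

Funding: National Natural Science Foundation of China (61501470); Key Research and Development Program of Shaanxi Province (2017GY-075)

Received: 2017-11-27

Revised: 2018-01-04

Published online: --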

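Author affiliations

Li Qinghui: College of Operational Support, Rocket Force University of Engineering, Xi'an, Shaanxi 710025, China
Li Aihua: College of Operational Support, Rocket Force University of Engineering, Xi'an, Shaanxi 710025, China
Wang Tao: College of Operational Support, Rocket Force University of Engineering, Xi'an, Shaanxi 710025, China
Cui Zhigao: College of Operational Support, Rocket Force University of Engineering, Xi'an, Shaanxi 710025, China

Corresponding author: Li Qinghui (lqhui1212@126.com)

Note: Li Qinghui (b. 1989), male, doctoral candidate; his research focuses on machine vision. E-mail: lqhui1212@126.com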

Cite this paper

Li Qinghui, Li Aihua, Wang Tao, Cui Zhigao. Double-Stream Convolutional Networks with Sequential Optical Flow Image for Action Recognition[J]. Acta Optica Sinica, 2018, 38(6): 0615002.
