Laser & Optoelectronics Progress, Vol. 56, Issue 15: 151503


Human Action Recognition Algorithm Based on Bi-LSTM-Attention Model



Abstract

This study proposes a human action recognition algorithm based on the Bi-LSTM-Attention model to address the low recognition rates caused by the inability of long short-term memory (LSTM) networks to effectively extract the correlated information before and after an action. The algorithm first extracts 20 image frames from each video and uses the Inceptionv3 model to extract deep features from these frames. A bidirectional LSTM (Bi-LSTM) network with forward and backward layers is then constructed to learn the temporal information in the feature vectors. Next, an attention mechanism adaptively perceives the network weights that most strongly influence the recognition result, allowing the model to recognize actions more accurately from their temporal context. Finally, a fully connected layer feeds a Softmax classifier that assigns each video to an action class. Comparisons with existing methods on the Action YouTube and KTH human action datasets show that the proposed algorithm effectively improves the action recognition rate.
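The pipeline described in the abstract (per-frame Inceptionv3 features, a Bi-LSTM over the 20-frame sequence, attention pooling over time steps, then a fully connected layer and Softmax classifier) can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the authors' exact implementation: the hidden size, number of classes, and the specific attention formulation are assumptions, and the input is assumed to be 20 frame-level 2048-dimensional Inceptionv3 pooling features per video.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Illustrative Bi-LSTM with attention pooling over time steps."""
    def __init__(self, feat_dim=2048, hidden_size=256, num_classes=11):
        super().__init__()
        # Forward and backward LSTM passes over the 20-step feature sequence.
        self.bilstm = nn.LSTM(feat_dim, hidden_size,
                              batch_first=True, bidirectional=True)
        # Scores each time step; softmax over steps yields attention weights.
        self.attn = nn.Linear(2 * hidden_size, 1)
        # Fully connected layer producing class logits (Softmax applied
        # implicitly by a cross-entropy loss at training time).
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):                       # x: (batch, 20, feat_dim)
        h, _ = self.bilstm(x)                   # (batch, 20, 2*hidden_size)
        w = torch.softmax(self.attn(h), dim=1)  # (batch, 20, 1), sums to 1 over time
        context = (w * h).sum(dim=1)            # attention-weighted sum over steps
        return self.fc(context)                 # (batch, num_classes)

model = BiLSTMAttention()
logits = model(torch.randn(4, 20, 2048))  # 4 videos, 20 frames each
print(logits.shape)                       # torch.Size([4, 11])
```

In this sketch the attention weights let the classifier emphasize the frames most indicative of the action, which is the role the abstract assigns to the attention mechanism.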

Supplementary Information

DOI: 10.3788/LOP56.151503

Category: Machine Vision

Funding: Ministry of Education-New H3C Group "Cloud-Data Fusion" Fund (2017A13055)

Received: 2019-01-23

Revised: 2019-03-11

Published Online: 2019-08-01

Author Affiliations

Zhu Mingkang: Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214421, Jiangsu, China
Lu Xianling: School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China

Corresponding Author: Lu Xianling (jnluxl@jiangnan.edu.cn)

【1】Buri M, Pobar M and Kos M I. An overview of action recognition in videos. [C]∥2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), May 22-26, 2017, Opatija, Croatia. New York: IEEE. 1098-1103(2017).

【2】Luo H L, Wang C J and Lu F. Survey of video behavior recognition. Journal on Communications. 39(6), 169-180(2018).

【3】Willems G, Tuytelaars T and van Gool L. An efficient dense and scale-invariant spatio-temporal interest point detector. ∥Forsyth D, Torr P, Zisserman A. Computer vision-ECCV 2008. Lecture notes in computer science. Berlin, Heidelberg: Springer. 5303, 650-663(2008).

【4】Rapantzikos K, Avrithis Y and Kollias S. Dense saliency-based spatiotemporal feature points for action recognition. [C]∥2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 20-25, 2009, Miami, FL, USA. New York: IEEE. 1454-1461(2009).

【5】Abdulmunem A, Lai Y K and Sun X F. Saliency guided local and global descriptors for effective action recognition. Computational Visual Media. 2(1), 97-106(2016).

【6】Luo J J, Wang W and Qi H R. Spatio-temporal feature extraction and representation for RGB-D human action recognition. Pattern Recognition Letters. 50, 139-148(2014).

【7】Liu A A, Su Y T, Nie W Z et al. Hierarchical clustering multi-task learning for joint human action grouping and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 39(1), 102-114(2017).

【8】Liu Z, Huang J T and Feng X. Action recognition model construction based on multi-scale deep convolution neural network. Optics and Precision Engineering. 25(3), 799-805(2017).

【9】Zhu Y, Zhao J K, Wang Y N et al. A review of human action recognition based on deep learning. Acta Automatica Sinica. 42(6), 848-857(2016).

【10】Charalampous K and Gasteratos A. On-line deep learning method for action recognition. Pattern Analysis and Applications. 19(2), 337-354(2016).

【11】Ji S W, Xu W, Yang M et al. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 35(1), 221-231(2013).

【12】Donahue J, Hendricks L A, Guadarrama S et al. Long-term recurrent convolutional networks for visual recognition and description. [C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE. 2625-2634(2015).

【13】Gammulle H, Denman S, Sridharan S et al. Two stream LSTM: a deep fusion framework for human action recognition. [C]∥2017 IEEE Winter Conference on Applications of Computer Vision (WACV), March 24-31, 2017, Santa Rosa, CA, USA. New York: IEEE. 177-186(2017).

【14】Li Q H, Li A H, Wang T et al. Double-stream convolutional networks with sequential optical flow image for action recognition. Acta Optica Sinica. 38(6), (2018).

【15】Das S, Koperski M, Bremond F et al. Deep-temporal LSTM for daily living action recognition. [C]∥2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance(AVSS), November 27-30, 2018, Auckland, New Zealand. New York: IEEE. 18455900, (2018).

【16】Ullah A, Ahmad J, Muhammad K et al. Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access. 6, 1155-1166(2018).

【17】Szegedy C, Vanhoucke V, Ioffe S et al. Rethinking the inception architecture for computer vision. [C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE. 2818-2826(2016).

【18】Ravanbakhsh M, Mousavi H, Rastegari M et al. (2015-12-13)[2019-01-02]. https://arxiv.org/abs/1512

【19】Yang X D and Tian Y L. Action recognition using super sparse coding vector with spatio-temporal awareness. ∥Fleet D, Pajdla T, Schiele B, et al. Computer vision-ECCV 2014. Lecture notes in computer science. Cham: Springer. 8690, 727-741(2014).

【20】Wang J, Liu Z C, Wu Y et al. Mining actionlet ensemble for action recognition with depth cameras. [C]∥2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 16-21, 2012, Providence, RI, USA. New York: IEEE. 1290-1297(2012).

【21】Peng X J, Zou C Q, Qiao Y et al. Action recognition with stacked fisher vectors. ∥Fleet D, Pajdla T, Schiele B, et al. Computer vision-ECCV 2014. Lecture notes in computer science. Cham: Springer. 8693, 581-595(2014).

【22】Li Y D and Xu X P. Human action recognition by decision-making level fusion based on spatial-temporal features. Acta Optica Sinica. 38(8), (2018).

【23】Sun L, Jia K, Chan T H et al. DL-SFA: deeply-learned slow feature analysis for action recognition. [C]∥2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 23-28, 2014, Columbus, OH, USA. New York: IEEE. 2625-2632(2014).

【24】Huang Y W, Wan C L and Feng H. Multi-feature fusion human behavior recognition algorithm based on convolutional neural network and long short term memory neural network. Laser & Optoelectronics Progress. 56(7), (2019).

Cite This Paper

Zhu Mingkang, Lu Xianling. Human Action Recognition Algorithm Based on Bi-LSTM-Attention Model[J]. Laser & Optoelectronics Progress, 2019, 56(15): 151503

