
Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features



Abstract

Conventional action recognition algorithms based on RGB and skeleton features generally fail to exploit the complementarity of the two features and to focus on the key frames of a video. To address these problems, we propose an action recognition algorithm that adaptively fuses RGB and skeleton features. First, a bidirectional long short-term memory (Bi-LSTM) network combined with a self-attention mechanism extracts spatial-temporal features from the RGB and skeleton images. Next, an adaptive weight computing network (AWCN) takes the spatial features of the two streams as input and computes adaptive fusion weights. Finally, the extracted spatial-temporal features are fused with these weights to perform the final action classification. Experiments on the Penn Action, JHMDB, and NTU RGB-D human action datasets show that the proposed algorithm effectively improves recognition accuracy compared with existing methods.
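The fusion strategy summarized above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' implementation: the frame features that would come from the two Bi-LSTM streams are replaced by random arrays, the self-attention is reduced to a single scoring vector per stream, and the AWCN is a single linear layer with softmax output; all dimensions, weight matrices, and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(frames, w):
    """Self-attention-style temporal pooling: score each frame, then take
    the score-weighted average, so informative (key) frames dominate."""
    scores = softmax(frames @ w)            # (T,) attention over frames
    return scores @ frames                  # (D,) pooled stream feature

def awcn(rgb_spatial, skel_spatial, W, b):
    """Toy adaptive weight computing network: one linear layer over the
    concatenated spatial features, softmax -> two fusion weights."""
    z = np.concatenate([rgb_spatial, skel_spatial])
    return softmax(W @ z + b)               # (2,) weights, sum to 1

# Hypothetical sizes: T frames, D-dim features per stream, n_classes actions.
T, D, n_classes = 8, 16, 5
rgb_feats  = rng.normal(size=(T, D))        # stand-in for Bi-LSTM outputs (RGB)
skel_feats = rng.normal(size=(T, D))        # stand-in for Bi-LSTM outputs (skeleton)

# Temporal pooling with self-attention (one scoring vector per stream).
rgb_vec  = attention_pool(rgb_feats,  rng.normal(size=D))
skel_vec = attention_pool(skel_feats, rng.normal(size=D))

# Adaptive weights from the two spatial descriptors (here: mean frame feature).
weights = awcn(rgb_feats.mean(0), skel_feats.mean(0),
               rng.normal(size=(2, 2 * D)), np.zeros(2))

# Weighted fusion followed by a linear classifier.
fused  = weights[0] * rgb_vec + weights[1] * skel_vec   # (D,) fused feature
logits = rng.normal(size=(n_classes, D)) @ fused        # (n_classes,)
probs  = softmax(logits)

print("fusion weights:", weights)           # sum to 1 by construction
print("predicted class:", int(probs.argmax()))
```

Because the two fusion weights come from a softmax, they always sum to one, so the fused feature is a convex combination of the two streams; in the paper the AWCN is trained end to end, so these weights adapt per sample rather than being fixed hyperparameters.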

Supplementary Information

CLC Number: TP391

DOI:10.3788/LOP57.201506

Category: Machine Vision

Funding: National Natural Science Foundation of China; China Postdoctoral Science Foundation; Special Program for Science and Technology Aid to Xinjiang; Jiangsu Postdoctoral Science Foundation

Received: 2019-12-23

Revised: 2020-02-25

Published online: 2020-10-01

Affiliations:

Guo Fuzheng: International Joint Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, Jiangsu, China
Kong Jun: International Joint Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, Jiangsu, China
Jiang Min: International Joint Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, Jiangsu, China

Corresponding author: Kong Jun (kongjun@jiangnan.edu.cn)



Cite This Paper

Guo Fuzheng,Kong Jun,Jiang Min. Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features[J]. Laser & Optoelectronics Progress, 2020, 57(20): 201506

