Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model
Na Pan, Min Jiang, Jun Kong. Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181506.
[1] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]∥Advances in Neural Information Processing Systems, December 8-13, 2014, Montreal, Quebec, Canada: Curran Associates, Inc., 2014: 568-576.
[2] Wang L M, Xiong Y J, Wang Z, et al. Temporal segment networks: towards good practices for deep action recognition[M]∥Computer Vision-ECCV 2016. Cham: Springer International Publishing, 2016: 20-36.
[3] Carreira J, Zisserman A. Quo vadis, action recognition? A new model and the kinetics dataset[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE Press, 2017: 4724-4733.
[4] Mnih V, Heess N, Graves A, et al. Recurrent models of visual attention[C]∥NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 2, 2014: 2204-2212.
[5] Fan L F, Chen Y X, Wei P, et al. Inferring shared attention in social scene videos[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE Press, 2018: 6460-6468.
[6] Lu M L, Li Z N, Wang Y M, et al. Deep attention network for egocentric action recognition[J]. IEEE Transactions on Image Processing, 2019, 28(8): 3703-3713.
[7] Fu J, Liu J, Tian H J, et al. Dual attention network for scene segmentation[C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA. New York: IEEE Press, 2019: 3141-3149.
[8] Zhu M K, Lu X L. Human action recognition algorithm based on Bi-LSTM-Attention model[J]. Laser & Optoelectronics Progress, 2019, 56(15): 151503.
[9] Tang Y S, Tian Y, Lu J W, et al. Deep progressive reinforcement learning for skeleton-based action recognition[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE Press, 2018: 5323-5332.
[10] Jing L L, Yang X D, Tian Y L. Video you only look once: overall temporal convolutions for action recognition[J]. Journal of Visual Communication and Image Representation, 2018, 52: 58-65.
[11] Yu T Z, Guo C X, Wang L F, et al. Joint spatial-temporal attention for action recognition[J]. Pattern Recognition Letters, 2018, 112: 226-233.
[12] Lu L H, Di H J, Lu Y, et al. Spatio-temporal attention mechanisms based model for collective activity recognition[J]. Signal Processing: Image Communication, 2019, 74: 162-174.
[13] He K M, Gkioxari G, Dollár P, et al. Mask R-CNN[C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE Press, 2017: 2980-2988.
[14] Fan L J, Huang W B, Gan C, et al. End-to-end learning of motion representation for video understanding[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE Press, 2018: 6016-6025.
[15] Li Z Y, Gavrilyuk K, Gavves E, et al. Video LSTM convolves, attends and flows for action recognition[J]. Computer Vision and Image Understanding, 2018, 166: 41-50.
[16] Zhang J X, Hu H F. Deep spatiotemporal relation learning with 3D multi-level dense fusion for video action recognition[J]. IEEE Access, 2019, 7: 15222-15229.
[17] Khowaja S A, Lee S L. Hybrid and hierarchical fusion networks: a deep cross-modal learning architecture for action recognition[J]. Neural Computing and Applications, 2019: 1-12.
[18] Wang H, Schmid C. Action recognition with improved trajectories[C]∥2013 IEEE International Conference on Computer Vision, December 1-8, 2013, Sydney, NSW, Australia. New York: IEEE Press, 2013: 3551-3558.
[19] Peng X J, Wang L M, Wang X X, et al. Bag of visual words and fusion methods for action recognition: comprehensive study and good practice[J]. Computer Vision and Image Understanding, 2016, 150: 109-125.
[20] Lan Z Z, Lin M, Li X C, et al. Beyond Gaussian pyramid: multi-skip feature stacking for action recognition[C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE Press, 2015: 204-212.
[21] Zhu Y, Lan Z Z, Newsam S, et al. Hidden two-stream convolutional networks for action recognition[M]∥Computer Vision-ACCV 2018. Cham: Springer International Publishing, 2019: 363-378.
[22] Tu Z G, Xie W, Dauwels J, et al. Semantic cues enhanced multimodality multistream CNN for action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(5): 1423-1437.
[23] Tran A, Cheong L F. Two-stream flow-guided convolutional attention networks for action recognition[C]∥2017 IEEE International Conference on Computer Vision Workshops (ICCVW), October 22-29, 2017, Venice, Italy. New York: IEEE Press, 2017: 3110-3119.
[24] Du W B, Wang Y L, Qiao Y. Recurrent spatial-temporal attention network for action recognition in videos[J]. IEEE Transactions on Image Processing, 2018, 27(3): 1347-1360.
[25] Cao C Q, Zhang Y F, Zhang C J, et al. Action recognition with joints-pooled 3D deep convolutional descriptors[C]∥IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016: 3324-3330.
[26] Villegas R, Yang J, Zou Y, et al. Learning to generate long-term future via hierarchical prediction[C]∥Proceedings of the 34th International Conference on Machine Learning, Volume 70, August 6-11, 2017, Sydney, Australia: JMLR.org, 2017: 3560-3569.
[27] Gao R H, Xiong B, Grauman K. Im2Flow: motion hallucination from static images for action recognition[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE Press, 2018: 5937-5947.