Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism
刘天宝, 张凌涛, 于文涛, 魏东川, 范轶军. 基于嵌入注意力机制层级LSTM的音视频情感识别[J]. 激光与光电子学进展, 2021, 58(2): 0210017.
Tianbao Liu, Lingtao Zhang, Wentao Yu, Dongchuan Wei, Yijun Fan. Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210017.
[1] 袁配配, 张良. 基于深度学习的行人属性识别[J]. 激光与光电子学进展, 2020, 57(6): 061001.
[2] 刘芾, 李茂军, 胡建文, 等. 基于低像素人脸图像的表情识别[J]. 激光与光电子学进展, 2020, 57(10): 101008.
[3] 张义超, 孙子文. 基于优化卷积深度信念网络的智能手机身份认证方法[J]. 激光与光电子学进展, 2020, 57(8): 081009.
[4] Nwe T L, Foo S W, de Silva L C. Speech emotion recognition using hidden Markov models[J]. Speech Communication, 2003, 41(4): 603-623.
[5] Satt A, Rozenberg S, Hoory R. Efficient emotion recognition from speech using deep learning on spectrograms[J]. Proceedings of Interspeech, 2017, 2017: 1089-1093.
[6] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems, December 8-13, 2014, Montreal, Quebec, Canada. New York: Curran Associates, 2014, 2: 3104-3112.
[7] Irsoy O, Cardie C. Deep recursive neural networks for compositionality in language[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems, December 8-13, 2014, Montreal, Quebec, Canada. New York: Curran Associates, 2014, 2: 2096-2104.
[8] Lin Z H, Feng M W, dos Santos C N, et al. (2017-03-09)[2020-07-05]. https://arxiv.org/abs/1703.03130.
[9] Guo Z H, Zhang L, Zhang D. A completed modeling of local binary pattern operator for texture classification[J]. IEEE Transactions on Image Processing, 2010, 19(6): 1657-1663.
[10] 张石清, 李乐民, 赵知劲. 基于一种改进的监督流形学习算法的语音情感识别[J]. 电子与信息学报, 2010, 32(11): 2724-2729.
Zhang S Q, Li L M, Zhao Z J. Speech emotion recognition based on an improved supervised manifold learning algorithm[J]. Journal of Electronics & Information Technology, 2010, 32(11): 2724-2729.
[11] Wang S, Wang W X, Zhao J M, et al. Emotion recognition with multimodal features and temporal models[C]//Proceedings of the 19th ACM International Conference on Multimodal Interaction-ICMI 2017, November 3-17, 2017, Glasgow, UK. New York: ACM Press, 2017: 598-602.
[12] Wu C H, Lin J C, Wei W L. Survey on audiovisual emotion recognition: databases, features, and data fusion strategies[J]. APSIPA Transactions on Signal and Information Processing, 2014, 3: e12.
[13] Abdel-Hamid O, Mohamed A R, Jiang H, et al. Convolutional neural networks for speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(10): 1533-1545.
[14] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
[15] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[16] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[EB/OL]. (2016-05-19)[2020-07-05]. https://arxiv.org/abs/1409.0473.
[17] Fan Y, Lu X J, Li D, et al. Video-based emotion recognition using CNN-RNN and C3D hybrid networks[C]//Proceedings of the 18th ACM International Conference on Multimodal Interaction-ICMI 2016, October 31-November 16, 2016, Tokyo, Japan. New York: ACM Press, 2016: 445-450.
[18] Nguyen D, Nguyen K, Sridharan S, et al. Deep spatio-temporal features for multimodal emotion recognition[C]//2017 IEEE Winter Conference on Applications of Computer Vision (WACV), March 24-31, 2017, Santa Rosa, CA, USA. New York: IEEE Press, 2017: 1215-1223.
[19] Knyazev B, Shvetsov R, Efremova N, et al. Leveraging large face recognition data for emotion classification[C]//2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), May 15-19, 2018, Xi'an, China. New York: IEEE Press, 2018: 692-696.
[20] Wang Y J, Guan L. Recognizing human emotional state from audiovisual signals[J]. IEEE Transactions on Multimedia, 2008, 10(4): 659-668.
[21] Dhall A, Goecke R, Joshi J, et al. EmotiW 2016: video and group-level emotion recognition challenges[C]//Proceedings of the 18th ACM International Conference on Multimodal Interaction-ICMI 2016, October 31-November 16, 2016, Tokyo, Japan. New York: ACM Press, 2016: 427-432.
[22] Martin O, Kotsia I, Macq B, et al. The eNTERFACE'05 audio-visual emotion database[C]//22nd International Conference on Data Engineering Workshops (ICDEW'06), April 3-7, 2006, Atlanta, GA, USA. New York: IEEE Press, 2006.
[23] Avots E, Sapiński T, Bachmann M, et al. Audiovisual emotion recognition in wild[J]. Machine Vision and Applications, 2019, 30(5): 975-985.
[24] Noroozi F, Marjanovic M, Njegus A, et al. Audio-visual emotion recognition in video clips[J]. IEEE Transactions on Affective Computing, 2017, 10(1): 60-75.
[25] Wang X S, Chen X, Cao C J. Human emotion recognition by optimally fusing facial expression and speech feature[J]. Signal Processing: Image Communication, 2020, 84: 115831.
[26] Zhang Y Y, Wang Z R, Du J. Deep fusion: an attention guided factorized bilinear pooling for audio-video emotion recognition[C]//2019 International Joint Conference on Neural Networks (IJCNN), July 14-19, 2019, Budapest, Hungary. New York: IEEE Press, 2019.
[27] Dangol R, Alsadoon A, Prasad P W C, et al. Speech emotion recognition using convolutional neural network and long-short term memory[J]. Multimedia Tools and Applications, 2020, 79: 32917-32934.