Speaker Identification Based on Multimodal Long Short-Term Memory with Depth-Gate

Huangkang Chen; Ying Chen

doi:10.3788/LOP56.031007

Laser & Optoelectronics Progress, 2019, 56(3): 031007. Published online: 2019-07-31

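Author affiliation

Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122, China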

Keywords: image processing; speaker recognition; long short-term memory network; fusion; depth-gate; weight sharing

Abstract

To fuse audio and visual features effectively for speaker recognition, a multimodal long short-term memory (LSTM) network with depth-gates is proposed. First, a multi-layer LSTM model is built for each individual feature type, and a depth-gate connects the memory cells of adjacent lower and upper layers; this strengthens the coupling between layers and improves the classification performance of each feature on its own. In addition, the weights that connect the hidden-layer outputs to the gate units are shared across the different models, so that the relationship between the models at each layer can be learned. Experimental results show that the method fuses audio and visual features effectively, improves speaker recognition accuracy, and is robust to interference to some extent.
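The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the gate equations, so the depth-gate form used here (a sigmoid gate that adds the lower layer's memory cell into the upper layer's cell) is an assumption, as are the feature dimensions (13 and 32), the hidden size, the initialisation, and all function names. What it does follow from the abstract is the structure: one stacked LSTM per modality, a gated link between the memory cells of adjacent layers, and hidden-to-gate weights shared between the audio and visual models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(rng, in_dim, hid, shared_U):
    """Parameters of one LSTM layer for one modality (names are hypothetical)."""
    return {
        "W": rng.normal(0, 0.1, (4 * hid, in_dim)),  # input -> i, f, o, g (modality-specific)
        "U": shared_U,                               # hidden -> gates (shared across modalities)
        "b": np.zeros(4 * hid),
        "W_d": rng.normal(0, 0.1, (hid, in_dim)),    # input -> depth gate
        "w_d": rng.normal(0, 0.1, hid),              # lower cell -> depth gate (elementwise)
        "b_d": np.zeros(hid),
    }

def step(p, x, h_prev, c_prev, c_lower=None):
    """One time step; c_lower is the lower layer's memory cell at the same time step."""
    hid = h_prev.shape[0]
    z = p["W"] @ x + p["U"] @ h_prev + p["b"]
    i, f, o = (sigmoid(z[k * hid:(k + 1) * hid]) for k in range(3))
    g = np.tanh(z[3 * hid:])
    c = f * c_prev + i * g
    if c_lower is not None:
        # Assumed depth-gate form: gate how much of the lower cell flows upward.
        d = sigmoid(p["W_d"] @ x + p["w_d"] * c_lower + p["b_d"])
        c = c + d * c_lower
    h = o * np.tanh(c)
    return h, c

def build(rng, audio_dim, video_dim, hid, n_layers=2):
    """Two stacks that hold the same hidden-to-gate matrix object at each layer."""
    shared = [rng.normal(0, 0.1, (4 * hid, hid)) for _ in range(n_layers)]
    def stack(in_dim):
        dims = [in_dim] + [hid] * (n_layers - 1)
        return [init_layer(rng, d, hid, shared[l]) for l, d in enumerate(dims)]
    return stack(audio_dim), stack(video_dim)

def run_modality(layers, xs, hid):
    """Run a layer stack over a sequence xs of shape (T, in_dim); return the top hidden state."""
    h = [np.zeros(hid) for _ in layers]
    c = [np.zeros(hid) for _ in layers]
    for x in xs:
        inp, c_lower = x, None
        for l, p in enumerate(layers):
            h[l], c[l] = step(p, inp, h[l], c[l], c_lower)
            inp, c_lower = h[l], c[l]
    return h[-1]

rng = np.random.default_rng(0)
audio, video = build(rng, audio_dim=13, video_dim=32, hid=8)
h_a = run_modality(audio, rng.normal(size=(5, 13)), 8)  # random stand-in for audio features
h_v = run_modality(video, rng.normal(size=(5, 32)), 8)  # random stand-in for visual features
```

Because both stacks reference the same `U` array at each layer, a gradient update to that matrix would affect both modalities, which is one plausible reading of the weight sharing the abstract describes. How the two top-layer outputs are combined for classification is not stated on this page and is left out of the sketch.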

Huangkang Chen, Ying Chen. Speaker Identification Based on Multimodal Long Short-Term Memory with Depth-Gate[J]. Laser & Optoelectronics Progress, 2019, 56(3): 031007.
