Laser & Optoelectronics Progress, 2020, 57 (18): 181702, Published online: 2020-09-02

Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control
Author Affiliation
School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China
Abstract
A long short-term memory (LSTM) recurrent neural network model with fused i-vector features is proposed for voice control of a laparoscopic supporter. Trained on a small number of samples, it recognizes short, isolated-word commands spoken by a specific doctor. The model uses an LSTM recurrent neural network as its base and Mel-frequency cepstral coefficients (MFCC) as input features. The i-vector is supplied as a deep-level input and concatenated with the deep features produced after the LSTM layer, which fuses the two sets of parameters. This allows the model to accurately recognize voice commands from the specific lead surgeon and to reject voice commands from anyone else, providing a safe and intelligent speech recognition scheme for laparoscopic operations. Experiments on a self-built speech corpus evaluate the recognition performance on speech from inside the training corpus and the rejection performance on speech from outside it. The results show that, compared with dynamic time warping (DTW) and the Gaussian mixture model-hidden Markov model (GMM-HMM), the proposed model achieves a recognition accuracy of 99.6% on speaker-specific voice commands from the training corpus while keeping the false acceptance rate at 0%, and its average false acceptance rate on speech from outside the training corpus is 2.5%. This meets the accuracy and safety requirements of laparoscopic supporter control.
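The fusion and rejection steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how rejection is decided or how the rates are computed, so the confidence threshold, the function names (`fuse`, `classify`, `recognition_rate`, `false_acceptance_rate`), and the toy weights below are all assumptions made for illustration. The LSTM itself is omitted; its output is taken as a given utterance-level feature vector.

```python
import math
from typing import List, Optional, Sequence


def fuse(lstm_feature: Sequence[float], ivector: Sequence[float]) -> List[float]:
    """Concatenate the post-LSTM deep feature with the speaker i-vector."""
    return list(lstm_feature) + list(ivector)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float],
             threshold: float = 0.9) -> Optional[int]:
    """Dense layer plus softmax over the fused vector.

    Returns the command index, or None to reject the utterance.
    Thresholding the top probability is an assumed rejection rule.
    """
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def recognition_rate(predictions: Sequence[Optional[int]],
                     labels: Sequence[int]) -> float:
    """Fraction of in-corpus utterances assigned the correct command."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def false_acceptance_rate(predictions: Sequence[Optional[int]]) -> float:
    """Fraction of out-of-corpus utterances accepted instead of rejected
    (an assumed definition of the false acceptance rate)."""
    accepted = sum(1 for p in predictions if p is not None)
    return accepted / len(predictions)


# Toy example: 2-dim LSTM feature, 2-dim i-vector, 2 commands.
toy_weights = [[4.0, 0.0, 1.0, 0.0],
               [0.0, 4.0, 0.0, 1.0]]
toy_bias = [0.0, 0.0]

confident = classify(fuse([1.0, 0.0], [1.0, 0.0]), toy_weights, toy_bias)
ambiguous = classify(fuse([0.5, 0.5], [0.5, 0.5]), toy_weights, toy_bias)
```

With these toy numbers the first utterance is accepted as command 0, while the second produces equal probabilities for both commands and is rejected. A real system would learn the weights jointly with the LSTM and tune the rejection rule on held-out speech from speakers outside the training corpus.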

Kailong Ren, Yi Wang, Xiaodong Chen, Huaiyu Cai. Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181702.

