Chinese Journal of Liquid Crystals and Displays, Vol. 33, No. 2, pp. 165-173

Voice conversion based on quantum particle swarm optimization of generalized regression neural network


Abstract

When particle swarm optimization (PSO) is used to optimize a neural network for voice conversion, it tends to converge slowly and prematurely. To address this, this paper proposes a voice conversion model based on a generalized regression neural network (GRNN) optimized by a new quantum particle swarm optimization (QPSO) algorithm. The QPSO algorithm changes each particle's position vector by changing the phases of its quantum bits (qubits), and uses the quantum NOT gate to perform mutation. First, QPSO is used to optimize the network and obtain the best smoothing factor, from which the spectral mapping rule is established. Next, the correlation between the spectral parameters and the fundamental frequency (F0) is exploited to convert the prosodic feature F0 as well. Then, the converted spectral and F0 parameters are combined and the target speech is synthesized with the STRAIGHT model. Finally, the method is assessed with both subjective and objective evaluations. Experimental results show that, compared with a GRNN optimized by conventional PSO, the proposed method improves the naturalness and similarity of the converted speech and reduces the spectral distortion rate by 2.1%. The proposed method also achieves better voice conversion performance than the radial basis function (RBF) neural network, the GRNN, and the PSO-optimized GRNN.
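The core of the method, a GRNN whose single free parameter (the smoothing factor σ) is tuned by a PSO that moves qubit phases and mutates with a quantum NOT gate, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the GRNN is Specht's Gaussian-kernel regressor, ŷ(x) = Σ_i y_i·exp(-‖x - x_i‖²/(2σ²)) / Σ_i exp(-‖x - x_i‖²/(2σ²)); the fitness is a leave-one-out mean squared error; the mapping from a qubit amplitude to σ, the velocity update applied to phases, the search range and all hyperparameters (`w`, `c1`, `c2`, `p_mut`) are assumptions; and the toy sine/cosine data is a hypothetical stand-in for time-aligned source/target spectral frames. All function names are hypothetical.

```python
# Minimal sketch: GRNN + qubit-phase PSO for the smoothing factor sigma.
# Assumptions (not from the paper): LOO-MSE fitness, amplitude-to-sigma mapping,
# phase velocity update, search range and hyperparameters.
import math
import random


def grnn_predict(x, train_x, train_y, sigma, skip=None):
    """GRNN output: Gaussian-kernel weighted average of the training targets.

    `skip` leaves one training sample out (used for the leave-one-out fitness).
    """
    idx = [i for i in range(len(train_x)) if i != skip]
    d2 = [sum((a - b) ** 2 for a, b in zip(x, train_x[i])) for i in idx]
    shift = min(d2)  # subtracting the smallest distance avoids exp() underflow
    w = [math.exp(-(d - shift) / (2.0 * sigma ** 2)) for d in d2]
    total = sum(w)
    dim = len(train_y[0])
    return [sum(wj * train_y[i][k] for wj, i in zip(w, idx)) / total
            for k in range(dim)]


def loo_mse(sigma, xs, ys):
    """Leave-one-out mean squared error of a GRNN with smoothing factor sigma."""
    err = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        pred = grnn_predict(x, xs, ys, sigma, skip=i)
        err += sum((p - t) ** 2 for p, t in zip(pred, y))
    return err / len(xs)


def amplitude_to_sigma(alpha, lo, hi):
    """Map a qubit probability amplitude alpha in [-1, 1] onto [lo, hi]."""
    return 0.5 * (hi * (1.0 + alpha) + lo * (1.0 - alpha))


def qpso_sigma(xs, ys, lo=0.01, hi=2.0, n_particles=10, iters=30,
               w=0.7, c1=1.5, c2=1.5, p_mut=0.05, seed=0):
    """Search sigma with a qubit-phase PSO; returns (best_sigma, best_mse)."""
    rng = random.Random(seed)

    def evaluate(theta):
        # The qubit (cos theta, sin theta) encodes two candidate sigmas;
        # keep whichever gives the lower leave-one-out error.
        cands = (amplitude_to_sigma(math.cos(theta), lo, hi),
                 amplitude_to_sigma(math.sin(theta), lo, hi))
        return min((loo_mse(s, xs, ys), s) for s in cands)

    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_particles)]
    dtheta = [0.0] * n_particles
    pbest = [(evaluate(t), t) for t in theta]  # ((mse, sigma), phase)
    gbest = min(pbest)

    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # PSO velocity update applied to the qubit phase (rotation angle).
            dtheta[i] = (w * dtheta[i]
                         + c1 * r1 * (pbest[i][1] - theta[i])
                         + c2 * r2 * (gbest[1] - theta[i]))
            theta[i] += dtheta[i]  # rotation gate: rotate the phase by dtheta
            if rng.random() < p_mut:
                # Quantum NOT gate swaps the amplitudes (cos t, sin t) -> (sin t, cos t),
                # which is the same as replacing t with pi/2 - t.
                theta[i] = math.pi / 2.0 - theta[i]
            score = evaluate(theta[i])
            if score < pbest[i][0]:
                pbest[i] = (score, theta[i])
                if score < gbest[0]:
                    gbest = pbest[i]
    (best_mse, best_sigma), _ = gbest
    return best_sigma, best_mse


if __name__ == "__main__":
    # Hypothetical stand-in for aligned source/target feature frames.
    xs = [[i / 10.0] for i in range(20)]
    ys = [[math.sin(x[0]), math.cos(x[0])] for x in xs]
    sigma, mse = qpso_sigma(xs, ys)
    print(f"sigma={sigma:.4f}  LOO-MSE={mse:.6f}")
```

Because "training" a GRNN only stores the source/target pairs, σ is the sole quantity to search, which is why one phase per particle suffices in this sketch. As a further assumption, the same `grnn_predict` could regress F0 from (converted) spectral frames to exploit the spectrum/F0 correlation mentioned above; the article's actual F0 mapping is not specified on this page.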


CLC number: TN912.3

DOI:10.3788/yjyxs20183302.0165

Section: Image Processing

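Funding: Science and Technology Project Plan of the Ministry of Housing and Urban-Rural Development (No. 2016-R2-045); Special Fund of the Education Department of Shaanxi Province (No. 2013JK1081); Shaanxi Provincial Science and Technology Research and Development Program (No. CXY1122(2)); Youth Fund of the Natural Science Foundation of Shaanxi Province (No. 2013JQ8003)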

Received: 2017-09-07

Revised: 2017-10-27

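Author affiliations

WANG Min: School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China
ZHAO Yuan: School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China
LIU Li: School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China
XU Juan: School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China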

Corresponding author: WANG Min (wangmin1329@163.com)

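Biography: WANG Min (1959-), male, born in Changzhou, Jiangsu. He is a professor and master's supervisor, head of the Department of Communication and Information Engineering, and director of the Electronic Information Engineering Teaching and Research Section at Xi'an University of Architecture and Technology. He has long worked on intelligent information processing.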


Cite this article

WANG Min, ZHAO Yuan, LIU Li, XU Juan. Voice conversion based on quantum particle swarm optimization of generalized regression neural network[J]. Chinese Journal of Liquid Crystals and Displays, 2018, 33(2): 165-173

