Research on Near Infrared Spectral Feature Variable Selection Method Based on Improved Harmonic Search Algorithm

Abstract

Near-infrared (NIR) spectroscopy is widely used for detection and analysis in many fields because it is simple, fast, efficient, low-cost, and environmentally friendly. However, NIR spectra have high variable dimensionality, strong multicollinearity, redundant information, and high-frequency noise, so building a prediction model directly on the full spectrum not only increases modeling complexity but also degrades the model's prediction performance and generalization ability. To address this, a spectral feature variable selection method based on an improved Harmony Search (HS) algorithm is proposed; HS is commonly used for feature variable optimization. First, the feature contribution of each spectral point is calculated from the partial least squares (PLS) loading coefficients and used as the disturbance weight of the improved HS. While HS optimizes the spectral feature variables, this feature contribution is introduced as an excitation factor, and the initial solution vectors are generated by combining random traversal with the excitation factor. When a new harmony vector is generated, the feature contribution is applied as a penalty term, and a balance factor makes the selection parameters adjust dynamically with the iteration count so that the search adapts to spectral variables. This enhances the ergodicity of the search process and the diversity of the population. To verify the effectiveness of the algorithm, NIR PLS models were built for three constituents of tobacco leaf samples: nicotine, total sugar, and total nitrogen. After the raw spectra were preprocessed, the method was used to select the optimal spectral variables.
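
The selection procedure described above can be sketched as follows. This is a minimal, hypothetical illustration in pure Python, not the authors' implementation: the abstract does not give the exact formulas, so the contribution measure (normalised absolute X-loadings of the first PLS1 component), the way the excitation factor is blended with random initialisation (`mix`), the penalty applied during pitch adjustment, and the power-law parameter schedule controlled by the balance factor (`balance`) are all assumptions. The function names `pls1_loading_contribution` and `improved_hs_select` and all parameters are hypothetical; `fitness` is any caller-supplied score where lower is better (in practice, for example, a cross-validated PLS error on the selected variables).

```python
import random


def pls1_loading_contribution(X, y):
    """Relative contribution of each spectral variable, taken here as the
    normalised absolute X-loadings of the first PLS1 component (an assumed
    scheme; the paper's exact formula is not given in the abstract)."""
    n, m = len(X), len(X[0])
    # Mean-centre every column of X and the response y
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # First PLS1 weight vector: proportional to Xc^T yc, scaled to unit length
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    w_norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / w_norm for v in w]
    # Scores t = Xc w, then X-loadings p = Xc^T t / (t^T t)
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    tt = sum(v * v for v in t) or 1.0
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
    total = sum(abs(v) for v in p) or 1.0
    return [abs(v) / total for v in p]


def improved_hs_select(n_vars, fitness, contrib, hms=10, iters=300,
                       hmcr=(0.70, 0.95), par=(0.05, 0.40), balance=2.0,
                       mix=0.5, seed=0):
    """Binary harmony search over variable masks (1 = variable kept).
    `fitness(mask)` is caller-supplied and lower is better. The excitation,
    penalty and parameter-schedule formulas below are assumptions."""
    rng = random.Random(seed)
    c_max = max(contrib) or 1.0
    rel = [c / c_max for c in contrib]  # relative contribution in [0, 1]

    def excited_bit(j):
        # Excitation factor (assumed form): blend a fair coin with the
        # variable's relative contribution to get its probability of being on
        p_on = mix * 0.5 + (1.0 - mix) * rel[j]
        return 1 if rng.random() < p_on else 0

    # Initial harmony memory from random traversal plus the excitation factor
    memory = [[excited_bit(j) for j in range(n_vars)] for _ in range(hms)]
    scores = [fitness(h) for h in memory]

    for k in range(iters):
        # Balance factor (assumed power law) shapes how the parameters move
        frac = (k / max(iters - 1, 1)) ** balance
        hmcr_k = hmcr[0] + (hmcr[1] - hmcr[0]) * frac  # rely more on memory later
        par_k = par[1] - (par[1] - par[0]) * frac      # adjust pitch less later
        new = []
        for j in range(n_vars):
            if rng.random() < hmcr_k:
                bit = memory[rng.randrange(hms)][j]
                # Contribution as a penalty (assumed form): dropping an informative
                # variable, or adding an uninformative one, is made less likely
                flip_p = par_k * ((1.0 - rel[j]) if bit else rel[j])
                if rng.random() < flip_p:
                    bit = 1 - bit
            else:
                bit = excited_bit(j)  # random re-traversal, contribution-weighted
            new.append(bit)
        score = fitness(new)
        worst = max(range(hms), key=lambda i: scores[i])
        if score < scores[worst]:  # replace the worst harmony if the new one is better
            memory[worst], scores[worst] = new, score

    best = min(range(hms), key=lambda i: scores[i])
    return memory[best], scores[best]


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random() for _ in range(6)] for _ in range(200)]  # synthetic "spectra"
    y = [3 * r[0] + 2 * r[1] + 0.01 * rng.random() for r in X]  # only vars 0 and 1 matter
    contrib = pls1_loading_contribution(X, y)
    target = [1, 1, 0, 0, 0, 0]  # toy fitness: Hamming distance to a known subset
    mask, score = improved_hs_select(
        6, lambda m: sum(a != b for a, b in zip(m, target)), contrib)
    print([round(c, 3) for c in contrib], mask, score)
```

In this sketch the penalty scales the flip probability by the relative contribution, so an already selected high-contribution variable is rarely dropped and a low-contribution variable is rarely added, while the contribution-weighted re-traversal branch keeps every variable reachable as long as `mix` is greater than zero.
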
The prediction performance of models built with different numbers of variables was computed according to the cumulative frequency with which each variable was selected, and the final spectral feature variables were determined from the trend of the root mean square error of calibration (RMSEC) as the number of variables increased. For each constituent, a PLS model was built on the training set and evaluated on the test set, and the results were compared with the full spectrum, uninformative variable elimination (UVE), and particle swarm optimization (PSO). With the variables selected by the proposed algorithm, the nicotine, total sugar, and total nitrogen models achieved coefficients of determination (R²) of 0.9211, 0.9257, and 0.9412 and root mean square errors of prediction (RMSEP) of 0.1023, 1.0346, and 0.0531, respectively. Compared with the other methods, the proposed method selected fewer spectral feature variables while achieving better R² and RMSEP values. These results show that the improved HS algorithm can effectively screen characteristic spectral variables, reduce modeling complexity, and improve prediction performance and generalization ability.
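
The post-selection and evaluation steps can be sketched in the same hedged way. The helpers below (all hypothetical names) count how often each variable was selected across repeated HS runs, rank the variables by that cumulative frequency, pick a subset size from an RMSEC-versus-number-of-variables curve, and compute RMSE (RMSEC on the training set, RMSEP on the test set) and R². The stopping rule in `choose_n_by_rmsec` (smallest subset whose RMSEC is within a relative tolerance of the best value) is an assumption, since the abstract only says the choice follows the RMSEC trend; building the curve requires fitting a PLS model on each top-k subset, which is left to the caller. `rmse` divides by n; some chemometrics texts compute RMSEC with n − A − 1 degrees of freedom for A latent variables.

```python
def rmse(y_true, y_pred):
    """Root mean square error: RMSEC on calibration data, RMSEP on the test set."""
    n = len(y_true)
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n) ** 0.5


def r2(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def rank_by_frequency(masks):
    """Cumulative selection count per variable over repeated HS runs, plus
    variable indices ordered from most to least frequently selected."""
    counts = [sum(col) for col in zip(*masks)]
    order = sorted(range(len(counts)), key=lambda j: (-counts[j], j))
    return counts, order


def choose_n_by_rmsec(rmsec_curve, tol=0.05):
    """rmsec_curve[k] is the RMSEC of a model built on the top k+1 ranked
    variables. Hypothetical stopping rule: return the smallest subset size
    whose RMSEC is within a relative tolerance `tol` of the best RMSEC."""
    best = min(rmsec_curve)
    for k, value in enumerate(rmsec_curve):
        if value <= best * (1.0 + tol):
            return k + 1
    return len(rmsec_curve)


if __name__ == "__main__":
    counts, order = rank_by_frequency([[1, 0, 1], [1, 1, 0], [1, 0, 1]])
    print(counts, order)  # [3, 1, 2] [0, 2, 1]
    print(choose_n_by_rmsec([0.50, 0.30, 0.21, 0.20, 0.201, 0.199]))  # 4
    print(round(rmse([1, 2, 3], [1, 2, 4]), 4), r2([1, 2, 3], [1, 2, 4]))  # 0.5774 0.5
```

In the toy curve, RMSEC stops improving meaningfully after the fourth variable, so the rule keeps four variables rather than all six; a smaller `tol` trades a larger subset for a lower calibration error.
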

Supplementary Information

CLC number: O657.3

DOI:10.3964/j.issn.1000-0593(2020)06-1869-07

Funding: National Key R&D Program of China (2018YFB1701703); Science and Technology Project of China Tobacco Yunnan Industrial Co., Ltd. (2016XX01)

Received: 2019-04-15

Revised: 2019-08-04

Author Affiliations

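ZHANG Lei (张磊): College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100
DING Xiang-qian (丁香乾): College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100
GONG Hui-li (宫会丽): College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100
WU Li-jun (吴丽君): Technology Center, China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650024
BAI Xiao-li (白晓莉): Technology Center, China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650024
LUO Lin (罗林): Technology Center, China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650024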

Corresponding author: WU Li-jun (wallis8@126.com)

Note: ZHANG Lei, born 1987, is a Ph.D. candidate at the College of Information Science and Engineering, Ocean University of China; e-mail: zhanglei_0036@163.com

Cite this article

ZHANG Lei, DING Xiang-qian, GONG Hui-li, WU Li-jun, BAI Xiao-li, LUO Lin. Research on Near Infrared Spectral Feature Variable Selection Method Based on Improved Harmonic Search Algorithm[J]. Spectroscopy and Spectral Analysis, 2020, 40(6): 1869
