Spectroscopy and Spectral Analysis, 2019, 39(3): 717, published online: 2019-03-19

Study on Feature Selection of Near Infrared Spectra Based on Feature Hierarchical Combining Improved Particle Swarm Optimization
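Author affiliations
1 College of Information Science and Engineering, Ocean University of China, Qingdao 266100, Shandong, China
2 College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China
3 Technology Center, China Tobacco Yunnan Industrial Co., Ltd., Kunming 650024, Yunnan, China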
Abstract
In quantitative modeling of near-infrared (NIR) spectral data, high redundancy and high noise severely degrade the robustness and accuracy of the model. This paper therefore proposes a characteristic-spectrum selection method that combines feature stratification with an improved particle swarm optimization (PSO) algorithm. First, the importance score of each feature is measured by mutual information and the features are sorted in descending order of importance, which avoids the loss of important information that can occur when principal components are obtained by dimensionality reduction. Second, the concept of jump degree is introduced and a feature stratification method is constructed: features of similar importance are merged into the same feature subset, so that the descending-ordered feature set is split into distinct feature subsets. This avoids the uncertainty introduced by manually setting a threshold on the importance score during feature screening. Finally, PSO, which converges quickly and has few control parameters, is used to search for the optimal feature subset, with two improvements. A chaotic model is introduced to increase population diversity, which improves the global search ability of PSO and helps it avoid local optima. The number of features is introduced into the fitness function, and a penalty factor adjusts its influence on the fitness function in the early iterations, which improves the adaptability of the algorithm. The stratified data are accumulated one feature subset at a time and fed to the improved PSO as input, so as to select a highly discriminative feature subset.
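The two PSO modifications named above can be illustrated with a minimal sketch. The abstract does not state which chaotic model or penalty schedule the authors use, so everything here is an assumption made for illustration: the logistic map as the chaotic model, a binary particle encoding (1 = feature kept), a penalty that fades linearly to zero over the first half of the iterations, and all function names and default values (`x0`, `mu`, `lam`, `early_fraction`).

```python
def logistic_map_sequence(x0, length, mu=4.0):
    """Generate `length` values in [0, 1] from the logistic map x <- mu*x*(1-x)."""
    values = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_init(n_particles, n_features, x0=0.37):
    """Build a binary swarm: a bit is 1 when its chaotic value exceeds 0.5.

    Assumption: binary encoding and the 0.5 threshold are illustrative choices,
    not taken from the paper.
    """
    seq = logistic_map_sequence(x0, n_particles * n_features)
    swarm = []
    for i in range(n_particles):
        row = seq[i * n_features:(i + 1) * n_features]
        swarm.append([1 if v > 0.5 else 0 for v in row])
    return swarm


def penalized_fitness(score, n_selected, n_total, iteration, max_iter,
                      lam=0.1, early_fraction=0.5):
    """Subtract a feature-count penalty that is active only in early iterations.

    Assumption: the penalty weight decays linearly from `lam` to 0 at
    iteration `early_fraction * max_iter`, then stays at 0.
    """
    cutoff = early_fraction * max_iter
    weight = lam * max(0.0, 1.0 - iteration / cutoff)
    return score - weight * (n_selected / n_total)
```

In this sketch `score` stands for whatever model-quality measure the swarm maximizes; a particle that keeps fewer features is penalized less during the early iterations, and the penalty vanishes afterwards so that late iterations compare subsets on model quality alone.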
The feature selection process is described using the nicotine index as an example. The NIR spectra were acquired with a Nicolet Antaris II near-infrared spectrometer over a scanning range of 4 000~10 000 cm⁻¹. First, mutual information is used to compute the importance score of each of the 1 557 full-spectrum features for quantitative modeling of the target index, with each score taken as the mean of 30 runs. Second, all features are sorted in descending order of importance score and the jump degree of every feature is computed. The critical points for stratification are located from the jump degree, and the features are assigned to different feature layers, giving a feature set of 8 feature subsets, S={S′1, S′2, S′3, S′4, S′5, S′6, S′7, S′8}. Then the cumulative sets S′1, {S′1, S′2}, {S′1, S′2, S′3}, …, {S′1, S′2, S′3, S′4, S′5, S′6, S′7, S′8} are used in turn as the candidate set for the initial particle swarm. R/(1+RMSEP) serves as the evaluation criterion for a feature subset; each experiment is repeated 50 times, and the feature subset with the largest ratio is taken as the optimal feature subset. To verify the effectiveness of the algorithm, representative tobacco-leaf NIR spectra were selected as training and test sets, PLS quantitative models were built for nicotine and total sugar, and the results were compared with those obtained from the full spectrum, from the stratified characteristic spectrum, and from the characteristic spectrum selected by PSO. The simulation results show that, for the features selected by the proposed algorithm, the modeling correlation coefficients R for nicotine and total sugar are 0.988 5 and 0.982 2, the root mean square errors of cross-validation (RMSECV) are 0.098 4 and 0.889 3, and the root mean square errors of prediction (RMSEP) are 0.100 7 and 0.901 6, respectively. Model accuracy is markedly higher than that of the other three methods.
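The stratification and candidate-set steps described above might be sketched as follows. The abstract does not give the formula for the jump degree or the rule for locating critical points, so this sketch assumes a hypothetical definition (the relative drop between neighbouring ranked scores) and assumes that the largest jumps are taken as layer boundaries; the function names and the `n_layers` parameter are likewise illustrative. Only the criterion R/(1+RMSEP) is taken directly from the text.

```python
def stratify_by_jump(scores, n_layers):
    """Sort feature indices by descending score and cut at the largest jumps."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [scores[i] for i in order]
    # Hypothetical jump degree: relative drop between neighbouring ranked scores.
    jumps = [(ranked[k] - ranked[k + 1]) / ranked[k] if ranked[k] > 0 else 0.0
             for k in range(len(ranked) - 1)]
    # Assumption: the n_layers - 1 largest jumps mark the critical points.
    largest = sorted(range(len(jumps)), key=lambda k: jumps[k], reverse=True)
    cuts = sorted(largest[:n_layers - 1])
    layers, start = [], 0
    for k in cuts:
        layers.append(order[start:k + 1])  # a cut falls after ranked position k
        start = k + 1
    layers.append(order[start:])
    return layers


def cumulative_candidates(layers):
    """Return the union of the first m layers for m = 1, 2, ..., len(layers)."""
    candidates, pool = [], []
    for layer in layers:
        pool = pool + layer
        candidates.append(list(pool))
    return candidates


def subset_score(r, rmsep):
    """Evaluation criterion from the abstract: R / (1 + RMSEP); larger is better."""
    return r / (1.0 + rmsep)


# Toy importance scores for six features (made-up values, not data from the paper).
toy_scores = [0.90, 0.10, 0.88, 0.45, 0.47, 0.12]
toy_layers = stratify_by_jump(toy_scores, n_layers=3)
toy_candidates = cumulative_candidates(toy_layers)
```

Each entry of `toy_candidates` plays the role of one candidate set handed to the PSO search; in the paper's setting there would be 8 layers and 8 cumulative candidate sets, with the best subset from each run compared by `subset_score`.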
In terms of the number of selected features, the proposed algorithm selects the fewest, effectively removing weakly correlated, noisy, and redundant information from the original feature set. The resulting model uses the fewest principal factors, which reduces model complexity and makes the model more robust and more widely applicable.

XU Bao-ding, QIN Yu-hua, YANG Ning, GAO Rui, YUAN Cheng-cheng. Study on Feature Selection of Near Infrared Spectra Based on Feature Hierarchical Combining Improved Particle Swarm Optimization[J]. Spectroscopy and Spectral Analysis, 2019, 39(3): 717.

