Spectroscopy and Spectral Analysis, 2023, 43 (1): 239, published online: 2023-03-28

Research on Quantitative Regression Method of IR Spectra of Organic Compounds Based on Ensemble Learning With Wavelength Selection
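Author affiliations
1 School of Internet, Anhui University, Hefei 230039, Anhui, China
2 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, Anhui, China
3 Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, Anhui, China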
Abstract
This work studies the application of ensemble learning to the quantitative analysis of infrared spectra of organic compounds, and the influence of characteristic wavelength selection on the modeling efficiency and prediction accuracy of ensemble learning models. The cetane number and total aromatic hydrocarbon content of diesel, predicted from its infrared spectra, are taken as the research objects. First, a two-layer Stacking ensemble learning framework is established with extremely randomized trees (ERT), linear kernel support vector machine (LinearSVM), radial basis kernel support vector machine (RBFSVM) and polynomial kernel support vector machine (polySVM) as base learners and LinearSVM as the meta-learner. The quantitative regression accuracy of the single base learners and of the ensemble model on the diesel infrared spectra is analyzed and compared. Compared with the partial least squares (PLS) quantitative regression model, the Stacking ensemble learning model improves the prediction accuracy for both properties. For cetane number, the ERT model gives the best prediction (r=0.848, RMSEP=1.603, RDP=2.627); for total aromatic content, the Stacking model gives the best prediction (r=0.991, RMSEP=0.645, RDP=9.243; the Chinese-language abstract gives RMSEP=0.526). Next, characteristic wavelengths of the infrared spectra are selected using synergy interval partial least squares (SiPLS) and the successive projections algorithm (SPA), and ensemble learning regression models are built on the selected wavelengths. Among these, the SiPLS-ERT model gives the best prediction for cetane number (r=0.893, RMSEP=1.013, RDP=3.051) and the SiPLS-Stacking model gives the best prediction for total aromatic content (r=0.998, RMSEP=0.354, RDP=11.475). The average training time is reduced by more than 50% compared with training on the full spectra, so modeling is markedly faster.
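The three figures of merit quoted above can be illustrated with a minimal Python sketch. It is not the authors' code, and the abstract does not state the formulas used. The sketch assumes that r is the Pearson correlation between reference and predicted values, that RMSEP is the root mean square error on the prediction set, and that "RDP" denotes the ratio of the sample standard deviation of the reference values to RMSEP (often abbreviated RPD elsewhere). The function names and the toy numbers are illustrative only.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rpd(y_true, y_pred):
    """Assumed definition: sample standard deviation (n - 1) of the
    reference values divided by RMSEP."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_t) ** 2 for t in y_true) / (n - 1))
    return sd / rmsep(y_true, y_pred)


# Toy reference and predicted values (made up for illustration).
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

print("r     =", round(pearson_r(y_ref, y_hat), 3))
print("RMSEP =", round(rmsep(y_ref, y_hat), 3))
print("RPD   =", round(rpd(y_ref, y_hat), 3))
```

Under these definitions a lower RMSEP and a higher r and RPD indicate a better model, which matches how the values are compared in the abstract.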
The results show that characteristic wavelength selection combined with ensemble learning regression can be used for the quantitative analysis of infrared spectra of organic compounds. Compared with traditional quantitative regression methods, both modeling efficiency and prediction accuracy are substantially improved, which provides methodological support for further study of machine learning in quantitative spectral analysis.
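As an illustration of the wavelength selection step, the following is a minimal pure-Python sketch of the successive projections idea behind SPA. Starting from one wavelength, it repeatedly projects the remaining spectral columns onto the orthogonal complement of the last selected column and picks the column with the largest remaining norm. This is a sketch under stated assumptions, not the authors' implementation. The function name `spa_select`, the choice of starting column, the fixed number of wavelengths and the toy matrix are all hypothetical. The abstract does not describe the preprocessing or how the number of wavelengths was chosen.

```python
import math


def spa_select(X, n_select, start=0):
    """Select column indices of X (rows = samples, columns = wavelengths)
    by successive projections, beginning with column `start`."""
    n_rows, n_cols = len(X), len(X[0])
    # Work on a copy of the data arranged as column vectors.
    cols = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)
        if ref_norm2 == 0.0:
            break  # nothing left to project on
        best_j, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Remove the component of column j that lies along `ref`.
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_norm2
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            if norm > best_norm:
                best_j, best_norm = j, norm
        if best_j is None:
            break  # every column has already been selected
        selected.append(best_j)
    return selected


# Toy "spectra": column 1 is collinear with column 0, so it adds no new
# information; column 2 is orthogonal to column 0.
X_toy = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
chosen = spa_select(X_toy, n_select=2, start=0)
print(chosen)
```

Because each step favors the column least explained by those already chosen, the selected wavelengths carry little redundant information, and a regression model trained on them sees far fewer input variables than one trained on the full spectrum.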

JU Wei, LU Chang-hua, ZHANG Yu-jun, CHEN Xiao-jing, JIANG Wei-wei. Research on Quantitative Regression Method of IR Spectra of Organic Compounds Based on Ensemble Learning With Wavelength Selection[J]. Spectroscopy and Spectral Analysis, 2023, 43(1): 239.
