Spectroscopy and Spectral Analysis, 2016, 36 (11): 3523, published online: 2016-12-30

Study on Outliers Influence in NIR Quantitative Analysis Model
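Author affiliations
1 Changchun University of Science and Technology, Changchun, Jilin 130022
2 Institute of Scientific and Technical Information of Jilin Province, Changchun, Jilin 130000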
Abstract
As a secondary analysis method, near-infrared spectroscopy (NIRS) depends heavily on the modelling process for its reproducibility and reliability. Taking an NIRS quantitative model for wheat protein as an example, this paper studies the problem of outlying samples in multivariate calibration, with the aim of discussing how individual samples influence the model when a complex sample set is modelled. The criterion for the presence of outliers is the divergence between the explained calibration variance and explained validation variance percentage curves in partial least squares regression (PLSR) modelling: when the two curves deviate significantly, the sample set is considered to contain outliers that have significantly affected the model. The identification and treatment of outliers, together with the analysis of their influence, is the main innovative work of this paper. A sub-model traversal statistical method based on sample deletion is used to identify and extract the outliers progressively. In the prediction results of the model built after the outliers are removed, the standard deviation of the model's prediction residuals serves as the reference distance for grading the degree of outlyingness, which divides outliers into significant outliers, relative outliers and potential outliers. In the data set studied, significant outliers account for about 7.8% of the samples and relative outliers for about 15.6%. The influence of outliers shows up in the prediction residuals of the normal samples: predicted values are pulled away from the ideal fitting line and their dispersion increases. Removing the outliers or modelling with sample weights effectively suppresses this influence, makes the model's interpretation favour the majority of the samples, and reduces the model's empirical risk error.
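The grading step described in the abstract can be illustrated with a minimal Python sketch. It assumes the prediction residuals and the reference standard deviation (taken from the model rebuilt without outliers) are already available. The function names `grade_sample` and `grade_fractions` and the cutoff multiples of 1, 2 and 3 standard deviations are hypothetical choices made for illustration; the abstract does not state the thresholds the authors used.

```python
GRADES = ("significant", "relative", "potential", "normal")


def grade_sample(residual, reference_sd, cutoffs=(1.0, 2.0, 3.0)):
    """Grade one prediction residual by its distance from zero in units of reference_sd.

    cutoffs holds hypothetical multiples of reference_sd for the potential,
    relative and significant grades; they are NOT values from the paper.
    """
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    potential_k, relative_k, significant_k = cutoffs
    distance = abs(residual) / reference_sd
    if distance > significant_k:
        return "significant"
    if distance > relative_k:
        return "relative"
    if distance > potential_k:
        return "potential"
    return "normal"


def grade_fractions(residuals, reference_sd, cutoffs=(1.0, 2.0, 3.0)):
    """Return the fraction of samples that falls into each grade."""
    if not residuals:
        raise ValueError("residuals must not be empty")
    grades = [grade_sample(r, reference_sd, cutoffs) for r in residuals]
    return {g: grades.count(g) / len(grades) for g in GRADES}
```

For example, with made-up residuals `[0.05, -0.12, 0.31, -0.48, 0.02, 0.75]` and a reference standard deviation of `0.2`, the sample with residual `0.75` lies 3.75 standard deviations from zero and is graded as a significant outlier under these assumed cutoffs.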

ZHENG Feng, LIU Li-ying, LIU Xiao-xi, LI Ye, SHI Xiao-guang, ZHANG Guo-yu, HUAN Ke-wei. Study on Outliers Influence in NIR Quantitative Analysis Model[J]. Spectroscopy and Spectral Analysis, 2016, 36(11): 3523.
