Spectroscopy and Spectral Analysis (光谱学与光谱分析), Vol. 35, No. 7, p. 1830

Study on an Algorithm for Near Infrared Singular Sample Identification Based on Strong Influence Degree



Abstract

Calibration sample selection and the elimination of singular (outlier) samples are very important for quantitative and qualitative modeling with near-infrared (NIR) spectroscopy. However, existing methods for identifying singular samples are generally based on estimates of the data center and require an empirically chosen decision threshold, which largely limits their recognition accuracy and practicability. To address the low accuracy of existing methods, this paper improves an existing metric, the leverage value, and presents a new NIR singular-sample identification algorithm based on strong influence degree. The new metric reduces the dependence on the data center to a certain extent, so that normal samples cluster more tightly and the distance between singular and normal samples is widened. At the same time, to avoid the arbitrariness of setting the threshold manually from experience, the paper introduces the statistical concept of jump degree and proposes an automatic threshold-setting method for distinguishing singular samples. To verify the algorithm, abnormal samples among 200 representative calibration samples were eliminated using the Mahalanobis distance, the leverage-spectral residual method, and the proposed algorithm, respectively. Partial least squares (PLS) quantitative models (with nicotine as the index) were then built on the remaining calibration samples, compared, and used to predict 60 representative test samples. The algorithms were compared using the root mean square error of cross-validation (RMSECV), the correlation coefficient (r), and the root mean square error of prediction (RMSEP) as evaluation indices. The experimental results show that the strong-influence-degree algorithm markedly improves the accuracy of singular-sample identification over existing methods. With a lower RMSECV (0.104) and RMSEP (0.112) and a higher r (0.983), it also improves the stability and predictive ability of the model.
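The abstract contrasts the proposed metric with two classical measures that are computed from the data center. As a point of reference, here is a minimal pure-Python sketch of those baselines on a small matrix of scores (assumed to be, e.g., PCA scores of the spectra): the leverage h_i = 1/n + t_iᵀ(TᵀT)⁻¹t_i on column-centred scores T, and the Mahalanobis distance d_i = sqrt((n − 1)·t_iᵀ(TᵀT)⁻¹t_i). This page does not give the formulas for strong influence degree or jump degree, so `flag_by_largest_jump` is a hypothetical stand-in for automatic thresholding (cut the descending-sorted values at the largest ratio between neighbours), not the authors' method; its `max_fraction` and `min_ratio` parameters are illustrative assumptions.

```python
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment [A | I] and reduce the left half to the identity.
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def leverage_and_mahalanobis(scores: Matrix) -> Tuple[List[float], List[float]]:
    """Classical leverage h_i and Mahalanobis distance d_i for each row of `scores`.

    Both are measured from the column means (the data center).
    """
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    t = [[row[j] - means[j] for j in range(k)] for row in scores]
    # Cross-product matrix T^T T of the centred scores.
    tt = [[sum(r[a] * r[b] for r in t) for b in range(k)] for a in range(k)]
    inv = invert(tt)
    lev, maha = [], []
    for r in t:
        quad = sum(r[a] * inv[a][b] * r[b] for a in range(k) for b in range(k))
        lev.append(1.0 / n + quad)          # h_i = 1/n + t_i^T (T^T T)^-1 t_i
        maha.append(sqrt((n - 1) * quad))   # uses covariance S = T^T T / (n - 1)
    return lev, maha


def flag_by_largest_jump(values: List[float], max_fraction: float = 0.2,
                         min_ratio: float = 1.5) -> List[int]:
    """HYPOTHETICAL automatic threshold (not the paper's jump degree).

    Sort values in descending order, look for the largest ratio between
    neighbours among the top `max_fraction` of samples, and flag everything
    above that jump if the ratio is at least `min_ratio`.
    """
    n = len(values)
    if n < 2:
        return []
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    limit = min(n - 1, max(1, int(n * max_fraction)))
    best_k, best_ratio = 0, 1.0
    for k in range(limit):
        upper, lower = values[order[k]], values[order[k + 1]]
        ratio = upper / lower if lower > 0 else float("inf")
        if ratio > best_ratio:
            best_k, best_ratio = k + 1, ratio
    return sorted(order[:best_k]) if best_ratio >= min_ratio else []


# Toy 2-D scores (made up): a 3 x 3 grid of normal samples plus one far point.
scores = [[-1.0, -1.0], [-1.0, 0.0], [-1.0, 1.0], [0.0, -1.0], [0.0, 1.0],
          [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [6.0, 6.0]]
lev, maha = leverage_and_mahalanobis(scores)
print(flag_by_largest_jump(lev))  # → [9]
```

On this toy data the far point (index 9) has a leverage of about 0.92, roughly twice the next-largest value, so the largest-ratio cut isolates it. Both baselines still hinge on the column means, which the far point itself drags to (0.6, 0.6); that is the data-center dependence the abstract sets out to reduce.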
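The three evaluation indices named in the abstract can be stated compactly in code. This is a minimal sketch, assuming RMSECV is computed from leave-one-out predictions on the calibration set and r is the Pearson correlation between predicted and reference values; the page does not say which cross-validation scheme the authors used. `fit_line` is a hypothetical one-variable least-squares model standing in for PLS so the example needs no third-party libraries; any function that takes training data and returns a predictor can be passed to `loo_predictions` instead.

```python
from math import sqrt
from typing import Callable, List

Model = Callable[[float], float]
FitFn = Callable[[List[float], List[float]], Model]


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error: sqrt(sum((y_hat - y)^2) / n)."""
    n = len(y_true)
    return sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def fit_line(xs: List[float], ys: List[float]) -> Model:
    """HYPOTHETICAL stand-in for PLS: a one-variable least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def loo_predictions(xs: List[float], ys: List[float], fit: FitFn) -> List[float]:
    """Leave-one-out CV: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


# Toy calibration data (made up): y is roughly 2x + 1 with noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
cv_pred = loo_predictions(xs, ys, fit_line)
rmsecv = rmse(ys, cv_pred)
r = pearson_r(cv_pred, ys)
print(f"RMSECV = {rmsecv:.3f}, r = {r:.3f}")
```

RMSEP is the same `rmse` applied to a model fitted once on the whole (cleaned) calibration set and evaluated on the separate test set, which in the paper has 60 samples.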

Supplementary information

Chinese Library Classification (CLC) number: O657.3

DOI:10.3964/j.issn.1000-0593(2015)07-1830-05

Funding: Supported by the National Key Technology R&D Program of China (2012BAF12B06) and the Qingdao Science and Technology Plan Project (12-4-1-9-gx).

Received: 2014-06-01

Revised: 2014-09-05


Author affiliations

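WU Zhao-na (吴兆娜): College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100
DING Xiang-qian (丁香乾): Information Engineering Center, Ocean University of China, Qingdao, Shandong 266071
GONG Hui-li (宫会丽): College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100
DONG Mei (董梅): Shandong Linyi Tobacco Co., Ltd., Linyi, Shandong 276000
WANG Mei-xun (王梅勋): Shandong Linyi Tobacco Co., Ltd., Linyi, Shandong 276000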

Corresponding author: GONG Hui-li (huiligong@163.com)

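Biography: WU Zhao-na, born in 1989, is a master's student at the College of Information Science and Engineering, Ocean University of China. E-mail: wuzhaona.dy@163.com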


Cite this article

WU Zhao-na, DING Xiang-qian, GONG Hui-li, DONG Mei, WANG Mei-xun. Study on an Algorithm for Near Infrared Singular Sample Identification Based on Strong Influence Degree[J]. Spectroscopy and Spectral Analysis, 2015, 35(7): 1830
