Spectroscopy and Spectral Analysis (光谱学与光谱分析), Vol. 37, No. 4 (pp. 1095-1099)

Study on an Algorithm for Near Infrared Spectrum Multiclass Identification and Measurement Based on Feature Hierarchical Selection and Sample Fusion Degree


Abstract

To address the difficulty of finding an optimal feature subset in high-dimensional space and the low identification accuracy of existing models, this paper proposes a near-infrared (NIR) spectrum multiclass identification and measurement algorithm that combines hierarchical feature selection with sample fusion degree. First, the concept of jump degree is introduced to construct a feature stratification method that divides all features into different subsets according to their importance to the samples, which avoids the tedious process of building a feature subset by eliminating irrelevant features one by one from the original feature data. Second, the sample fusion degree is improved and used in place of the probability-based class decision in the K-nearest neighbor (KNN) classifier, which raises the classifier's identification precision and mitigates the low accuracy of multiclass identification. To verify the algorithm, 382 representative tobacco leaf samples from five classes were used to build a tobacco producing-area identification model, and 64 samples were selected to test the model. The root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and correlation coefficient (r) served as indicators of model robustness, and producing-area identification accuracy served as the criterion for comparing algorithms. The simulation results show that the model built with the proposed algorithm has low RMSEP (0.117) and RMSECV (0.106) and a high r (0.973), with an average identification accuracy of 98.44%, clearly outperforming the other algorithms compared. The algorithm shows good identification performance on high-dimensional spectral data.
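The evaluation indices named in the abstract follow standard definitions, and a minimal sketch of them may help readers reproduce such figures. RMSEP is the root mean square error computed on an independent test set, RMSECV is the same quantity computed on cross-validation predictions, and r is the Pearson correlation between reference and predicted values. The function names and the toy numbers below are illustrative assumptions, not the authors' code or data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values.

    Applied to an independent test set this gives RMSEP; applied to
    cross-validation predictions it gives RMSECV.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)


if __name__ == "__main__":
    # Toy values invented purely for illustration (not from the paper).
    reference = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print("RMSE:", rmse(reference, predicted))
    print("r:", pearson_r(reference, predicted))
    print("accuracy:", accuracy(["a", "b", "c", "a"], ["a", "b", "a", "a"]))
```

As a point of arithmetic, 63 correct identifications out of 64 test samples would give 98.44% after rounding, which equals the reported figure, although the abstract does not state how the average accuracy was computed.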

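Supplementary information

Chinese Library Classification (CLC) number: O657.3

DOI: 10.3964/j.issn.1000-0593(2017)04-1095-05

Funding: Supported by a project of the National Key Technology R&D Program of China (2015BAF12B01)

Received: 2015-12-04

Revised: 2016-05-16

Online publication date: --

Author affiliations

ZHU Cheng: College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China
GONG Hui-li: College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China
DING Xiang-qian: College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China
HOU Rui-chun: College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China

Corresponding author: ZHU Cheng (zc76ai@126.com)

Note: ZHU Cheng, born in 1991, is a master's student at the College of Information Science and Engineering, Ocean University of China.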


Cite this paper

ZHU Cheng, GONG Hui-li, DING Xiang-qian, HOU Rui-chun. Study on an Algorithm for Near Infrared Spectrum Multiclass Identification and Measurement Based on Feature Hierarchical Selection and Sample Fusion Degree[J]. Spectroscopy and Spectral Analysis, 2017, 37(4): 1095-1099
