Spectroscopy and Spectral Analysis, 2017, 37 (4): 1095, published online: 2017-06-20

Study on an Algorithm for Near Infrared Spectrum Multiclass Identification and Measurement Based on Feature Hierarchical Selection and Sample Fusion Degree
Author Affiliation
College of Information Science and Engineering, Ocean University of China, Qingdao 266100, Shandong, China
Abstract
To address the difficulty of obtaining an optimal feature subset in high-dimensional space and the low identification accuracy of existing models, this paper proposes a multiclass identification and measurement algorithm for near infrared spectra based on feature hierarchical selection combined with sample fusion degree. First, the concept of jump degree is introduced to construct a feature hierarchical method that divides all features into different subsets according to their importance to the samples, which avoids the tedious process of building a feature subset by removing irrelevant features from the original feature data one by one. Second, the sample fusion degree is improved and used in place of the probability-based class decision in the K-nearest neighbor (KNN) classifier, which raises the identification precision of the classifier and mitigates the low accuracy of multiclass identification. To verify the effectiveness of the algorithm, 382 representative tobacco leaf samples from five classes were selected to build a tobacco producing area identification model, and 64 samples were selected to test it. Root mean square error of prediction (RMSEP), root mean square error of cross validation (RMSECV), and correlation coefficient (r) were used as indicators of model robustness, and the producing area identification accuracy was used as the criterion for comparing algorithms. The simulation results show that the model built with the proposed algorithm has a low RMSEP (0.117), a low RMSECV (0.106), and a high r (0.973), with an average identification accuracy of 98.44%, clearly outperforming the other algorithms compared. The algorithm shows good identification performance on high-dimensional spectral data.
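The evaluation indices named in the abstract can be illustrated with a minimal Python sketch. It assumes the usual textbook definitions: RMSEP is the root mean square error on an independent prediction set, RMSECV is the same quantity computed on cross-validation predictions, and r is the Pearson correlation coefficient. The abstract does not give the exact formulas used in the paper, so the function names and the toy values below are hypothetical and serve only as an illustration.

```python
import math
from typing import Hashable, Sequence


def rmse(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between reference and predicted values.

    Applied to an independent test set this gives RMSEP; applied to
    cross-validation predictions it gives RMSECV (assumed definitions).
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - y) ** 2 for y, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two value sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def accuracy(true_labels: Sequence[Hashable],
             predicted_labels: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels) or len(true_labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


if __name__ == "__main__":
    # Toy values, not data from the paper.
    reference = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.9]
    print("RMSE:", rmse(reference, predicted))
    print("r:", pearson_r(reference, predicted))
    print("accuracy:", accuracy(["A", "B", "C", "D"], ["A", "B", "C", "A"]))
```

As a plausibility check on the reported accuracy, 63 correct identifications out of 64 test samples would give 63/64 = 98.44%. The abstract does not state the per-sample counts, so this is only an arithmetic observation.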

ZHU Cheng, GONG Hui-li, DING Xiang-qian, HOU Rui-chun. Study on an Algorithm for Near Infrared Spectrum Multiclass Identification and Measurement Based on Feature Hierarchical Selection and Sample Fusion Degree[J]. Spectroscopy and Spectral Analysis, 2017, 37(4): 1095.
