Spectroscopy and Spectral Analysis, 2018, 38(8): 2390, published online: 2018-08-26

Identification of Gentiana Macrophylla by FTIR Technology and Sparse Linear Discriminant Analysis
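Author affiliations
1 School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou, Gansu 730000
2 Provincial Key Laboratory of Chemistry and Quality of Chinese (Tibetan) Medicine in Gansu Universities, Lanzhou, Gansu 730000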
Abstract
Fourier transform infrared (FTIR) spectra usually contain a large number of wavelength variables, so their qualitative analysis requires a robust and interpretable classification model. Sparse linear discriminant analysis (SLDA) is a relatively new and effective machine learning algorithm that is commonly used for variable selection and discriminant analysis of high-dimensional, small-sample data, in which the number of variables is large and the number of observations is limited. By introducing a regularization term into linear discriminant analysis, SLDA performs classifier training and variable selection simultaneously, and the sparsity of the loading coefficients in the different discriminant directions improves the interpretability of the model. A total of 94 Qinjiao (Gentiana) samples from different producing areas in Gansu were collected, comprising 30 samples of Gentiana straminea Maxim., 28 of Gentiana officinalis, and 36 of Gentiana macrophylla Pall., and the FTIR spectra of all samples were acquired. Seventy samples formed the training set and the remaining 24 formed the test set. The SLDA model was built on the training set, with a grid search over the number of non-zero loading coefficients in the two discriminant directions to find the optimal parameters. The resulting model classified the test samples with 100% accuracy, giving rapid and accurate identification of the three Qinjiao species. The experimental results show that, compared with PLS-DA, the SLDA model has advantages in classification accuracy, sparsity, and interpretability, making it a novel and effective method for qualitative spectral analysis.
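The workflow summarized in the abstract (a 70/24 split, two sparse discriminant directions, classification in the projected space) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the optimal-scoring formulation of sparse discriminant analysis (Clemmensen et al.) with an elastic-net penalty solved by coordinate descent, whereas the paper tunes the number of non-zero loadings directly. The data are synthetic stand-ins for spectra, and all function names, penalty values (`l1`, `l2`), and the nearest-centroid rule with equal priors are hypothetical choices made for illustration.

```python
import numpy as np


def elastic_net(X, t, l1, l2, n_sweeps=50):
    """Coordinate descent for (1/2n)||t - Xb||^2 + l1*||b||_1 + (l2/2)*||b||^2."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = t.copy()                      # residual t - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - l1, 0.0) / (col_sq[j] + l2)
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def sparse_directions(X, y, l1=0.15, l2=0.01, n_iter=10, seed=0):
    """Sparse optimal scoring on centered X; returns (p, K-1) loadings."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    n, K = X.shape[0], classes.size
    Y = (y[:, None] == classes).astype(float)    # n x K class indicators
    D = Y.T @ Y / n                              # diagonal of class proportions
    Q = np.ones((K, 1))                          # trivial constant score vector
    B = np.zeros((X.shape[1], K - 1))
    for k in range(K - 1):
        proj = np.eye(K) - Q @ Q.T @ D           # keeps theta D-orthogonal to Q
        theta = proj @ rng.normal(size=K)
        theta /= np.sqrt(theta @ D @ theta)
        for _ in range(n_iter):
            beta = elastic_net(X, Y @ theta, l1, l2)
            theta_new = proj @ np.linalg.solve(D, Y.T @ (X @ beta))
            norm = np.sqrt(theta_new @ D @ theta_new)
            if norm < 1e-12:                     # penalty removed every loading
                break
            theta = theta_new / norm
        B[:, k] = beta
        Q = np.column_stack([Q, theta])
    return B


def nearest_centroid_predict(Z_train, y_train, Z_new):
    """Assign each row of Z_new to the closest class mean (pooled Mahalanobis)."""
    classes = np.unique(y_train)
    means = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    centered = Z_train - means[np.searchsorted(classes, y_train)]
    W = centered.T @ centered / (len(y_train) - classes.size)
    W_inv = np.linalg.inv(W + 1e-8 * np.eye(W.shape[0]))
    diffs = Z_new[:, None, :] - means[None, :, :]
    dist = np.einsum("mkd,de,mke->mk", diffs, W_inv, diffs)
    return classes[dist.argmin(axis=1)]


# Synthetic stand-in data: 94 samples in three classes of 30, 28 and 36,
# 200 "wavelength" variables, of which only the first 10 carry class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], [30, 28, 36])
X = rng.normal(size=(94, 200))
X[y == 1, :5] += 3.0
X[y == 2, 5:10] += 3.0

# Random 70/24 split, then center with the training mean only.
idx = rng.permutation(94)
test_idx, train_idx = idx[:24], idx[24:]
mean = X[train_idx].mean(axis=0)
X_train, X_test = X[train_idx] - mean, X[test_idx] - mean
y_train, y_test = y[train_idx], y[test_idx]

B = sparse_directions(X_train, y_train)
pred = nearest_centroid_predict(X_train @ B, y_train, X_test @ B)
accuracy = (pred == y_test).mean()
nonzero = np.count_nonzero(B, axis=0)    # non-zero loadings per direction
print("non-zero loadings per direction:", nonzero)
print("test accuracy:", accuracy)
```

In this sketch the sparsity of each direction is controlled indirectly through `l1`; a grid search like the one described in the abstract would instead vary the allowed number of non-zero loadings per direction and keep the setting with the best validation accuracy.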

LI Si-hai, YU Xiao-hui, ZHAO Lei, JIN Ling. Identification of Gentiana Macrophylla by FTIR Technology and Sparse Linear Discriminant Analysis[J]. Spectroscopy and Spectral Analysis, 2018, 38(8): 2390.
