Spectroscopy and Spectral Analysis, 2020, 40 (5): 1495, Published online: 2020-12-09

Infrared Spectral Study on the Origin Identification of Boletus Tomentipes Based on the Random Forest Algorithm and Data Fusion Strategy
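Author affiliations
1 College of Resources and Environment, Yunnan Agricultural University, Kunming, Yunnan 650201
2 College of Agronomy and Biotechnology, Yunnan Agricultural University, Kunming, Yunnan 650201
3 Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences, Kunming, Yunnan 650200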
Abstract
Boletus tomentipes Earle is a health food favored by consumers. Nutrient accumulation in its fruiting body is affected by the growth environment (altitude, climate, etc.), and nutrient content differs significantly between producing areas. To separate high-quality from low-quality products, an accurate, rapid, and inexpensive origin identification technique is urgently needed. In this paper, a data fusion strategy combined with the random forest (RF) algorithm was used to identify the origin of B. tomentipes, and the effects of several feature extraction methods on the classification performance of the RF models were compared. Fourier transform near-infrared and Fourier transform mid-infrared spectra were collected from different parts of 87 samples from 4 producing areas (north subtropical, north temperate, south subtropical, and middle subtropical zones), and their spectral characteristics were analyzed. All samples were divided by the Kennard-Stone algorithm into a training set (58 samples, two thirds) and a validation set (29 samples, one third). Origin identification models were built with RF from 4 kinds of infrared spectra (near-infrared average spectra of stipes (N-b), near-infrared average spectra of caps (N-g), mid-infrared average spectra of stipes (M-b), and mid-infrared average spectra of caps (M-g)) and from the data produced by three data fusion strategies (low-level, mid-level, and high-level fusion), and the effects of features extracted by different methods (variable importance in projection, Boruta, and latent variables) on the classification results were compared. The optimal ntree and mtry were selected according to the out-of-bag (oob) error rate. Classification performance was evaluated by specificity, sensitivity, training set accuracy, and validation set accuracy, and the best method for identifying the origin of B. tomentipes was chosen by weighing these indicators together.
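The Kennard-Stone split mentioned above can be illustrated with a minimal pure-Python sketch. It assumes Euclidean distance and the common formulation of the algorithm (start from the two most distant samples, then repeatedly add the sample farthest from those already chosen). The abstract does not state the distance metric or the implementation the authors used, and the function name `kennard_stone_split` and the toy data are hypothetical.

```python
import math


def kennard_stone_split(X, n_train):
    """Return (train_idx, valid_idx) chosen by the Kennard-Stone rule.

    X is a sequence of equal-length numeric vectors (one per sample).
    Hypothetical helper for illustration; Euclidean distance is assumed.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Distance from each remaining sample to its nearest selected sample.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    while len(selected) < n_train:
        # Add the sample that is farthest from the current selection.
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])
    return selected, remaining


# Toy one-dimensional "spectra"; real spectra would have many wavenumbers.
spectra = [(0.0,), (1.0,), (2.0,), (10.0,), (5.0,), (9.0,)]
train_idx, valid_idx = kennard_stone_split(spectra, n_train=4)
print(train_idx, valid_idx)  # [0, 3, 4, 2] [1, 5]
```

With the sample counts given in the abstract, the call would be `kennard_stone_split(X, 58)` on the 87 spectra, leaving 29 indices for validation.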
The results showed that (1) both near-infrared and mid-infrared spectra reflect subtle differences among B. tomentipes from different origins; (2) discriminant models built from a single spectrum combined with RF were not satisfactory; and (3) all three fusion strategies improved origin identification, ranking from best to worst as high-level, mid-level, and low-level fusion. Using the near-infrared and mid-infrared spectra of B. tomentipes, a high-level fusion strategy based on latent-variable (LV) features combined with RF gave an identification model with high validation set accuracy (99.6%), high sensitivity (0.969), and high specificity (0.986). The method identifies the geographical origin of B. tomentipes accurately, rapidly, and inexpensively, and can serve as a reliable approach to origin traceability.
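As an illustration of the sensitivity and specificity figures quoted above, the sketch below computes both per class in a one-vs-rest fashion from true and predicted labels. How the authors aggregated per-class values into single figures is not stated in the abstract; the function name `sensitivity_specificity` and the toy labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, labels):
    """Per-class (sensitivity, specificity), treating each class one-vs-rest.

    Hypothetical helper for illustration only.
    """
    results = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        tn = sum(t != c and p != c for t, p in pairs)  # true negatives
        # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        results[c] = (sens, spec)
    return results


# Toy labels standing in for origin classes; not data from the paper.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "C"]
metrics = sensitivity_specificity(y_true, y_pred, labels=["A", "B", "C"])
print(metrics)  # {'A': (0.5, 1.0), 'B': (1.0, 0.75), 'C': (1.0, 1.0)}
```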

HU Yi-ran, LI Jie-qing, LIU Hong-gao, FAN Mao-pan, WANG Yuan-zhong. Infrared Spectral Study on the Origin Identification of Boletus Tomentipes Based on the Random Forest Algorithm and Data Fusion Strategy[J]. Spectroscopy and Spectral Analysis, 2020, 40(5): 1495.
