Chinese Journal of Lasers, 2020, 47(11): 1111002, Published online: 2020-10-20
Identification of Xinjiang Jujube Varieties Based on Hyperspectral Technique and Machine Learning
Keywords: spectroscopy; hyperspectral technique; machine learning; variety identification; data preprocessing; characteristic band extraction
Abstract
To identify Xinjiang jujube varieties, a hyperspectral technique combined with machine learning algorithms was used to study three varieties: Jinsi-jujube, Jun-jujube, and Tan-jujube. First, the raw spectra were preprocessed with multiplicative scatter correction (MSC), standard normal variate transformation (SNV), first derivative (1-Der), and Savitzky-Golay (SG) smoothing, and the effect of each preprocessing method on modeling was investigated. Next, the samples were divided into calibration and prediction sets using sample set partitioning based on joint X-Y distance (SPXY), and variety identification models were built on the preprocessed full-band spectra using linear discriminant analysis (LDA), K-nearest neighbor (KNN), and support vector machine (SVM) algorithms. Among the preprocessing methods tested, 1-Der performed best. Characteristic bands were then extracted from the full-band spectra using principal component analysis (PCA), the successive projections algorithm (SPA), and competitive adaptive reweighted sampling (CARS), and identification models were rebuilt on these bands. Of the feature extraction methods compared, the models built on CARS-selected bands achieved the highest identification accuracy. Finally, taking the SVM model as an example, model runtimes were compared: the models built on the characteristic bands ran in far less time than the model built on the full-band spectra.
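The preprocessing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy spectra, the use of the population standard deviation in SNV, and the plain central difference standing in for 1-Der are all assumptions made for illustration. SG smoothing is omitted here; in practice a routine such as `scipy.signal.savgol_filter` is commonly used for it.

```python
from statistics import mean, pstdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own
    mean and standard deviation (population std is an assumption here)."""
    m = mean(spectrum)
    s = pstdev(spectrum)  # a perfectly flat spectrum would give s == 0
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the
    mean spectrum, then remove the fitted offset and slope."""
    n_bands = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit: s ≈ intercept + slope * ref
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def first_derivative(spectrum, step=1.0):
    """Central-difference first derivative; the two endpoints are dropped."""
    return [
        (spectrum[i + 1] - spectrum[i - 1]) / (2 * step)
        for i in range(1, len(spectrum) - 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: one base spectrum plus two copies distorted by
    # an additive offset and a multiplicative gain (a simple scatter model).
    base = [0.2, 0.4, 0.9, 0.5, 0.3]
    spectra = [base, [2 * x + 0.1 for x in base], [0.5 * x - 0.05 for x in base]]
    print("SNV:  ", [round(v, 3) for v in snv(spectra[1])])
    print("MSC:  ", [[round(v, 3) for v in row] for row in msc(spectra)])
    print("1-Der:", first_derivative(base))
```

Because the toy spectra differ only by offset and gain, both SNV and MSC map them onto a common curve, which is the scatter-removal effect these methods are meant to provide. Real hyperspectral data would of course not collapse this cleanly.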
Liu Lixin, He Di, Li Mengzhu, Liu Xing, Qu Junle. Identification of Xinjiang Jujube Varieties Based on Hyperspectral Technique and Machine Learning[J]. Chinese Journal of Lasers, 2020, 47(11): 1111002.