光谱学与光谱分析 (Spectroscopy and Spectral Analysis), 2021, 41(2): 606, published online: 2021-04-08
The Classification of Plant Leaves by Applying Chemometrics Methods on Laser-Induced Breakdown Spectroscopy
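Laser-induced breakdown spectroscopy (LIBS) is an efficient and rapid means of spectral acquisition that can be applied to the elemental analysis of many kinds of materials. Linear discriminant analysis (LDA) and the support vector machine (SVM) are two supervised algorithms commonly used in chemometrics; both build a model by learning from sample data of known classes and then assign data of unknown class to a category. To achieve high-accuracy identification of organic materials with LIBS, these two algorithms were applied to the classification of LIBS spectral data. In the experiment, a nanosecond laser with a wavelength of 1064 nm was used to ablate the leaves of three plants (Ligustrum lucidum, Viburnum odoratissimum, and bamboo), and 100 spectra in the 220~432 nm band were collected for each kind of leaf. Principal components were extracted from the raw spectral data of the 300 samples, and the score plot of the first principal component (PC1) against the second (PC2) showed that the spectra of the three plants are highly similar. Then, for each kind of leaf, 70 spectra were used as the training set for modeling and the remaining 30 spectra as the test set for predicting the leaf species. The first 20 principal components extracted by PCA from the raw spectral data were used as the attribute values for LDA and SVM modeling. For LDA, the attribute values were analyzed to obtain the first two discriminant function values; cluster analysis showed that the spectra of different leaf species were well separated in space, while spectra of the same species were largely grouped together. Using the Mahalanobis distance, the average classification accuracy on the test set was 96.67%. Similarly, the SVM method was used to learn from the training-set data to obtain the classification hyperplane, and the average classification accuracy on the test set reached 98.9%. The results show that preprocessing the data with PCA and then applying LDA or SVM enables LIBS to classify complex organic materials rapidly and accurately, and that the combination of PCA and SVM gives the higher accuracy. The method can be applied in areas such as rapid food traceability, in situ identification of biological tissue, and remote analysis of organic explosives.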
Laser-induced breakdown spectroscopy (LIBS) is a highly efficient and rapid method that can be applied to the elemental analysis of various materials. Linear discriminant analysis (LDA) and the support vector machine (SVM) are two supervised algorithms commonly used in chemometrics. Both build a model from sample data of known classes and then use it to classify unknown samples. To achieve high recognition accuracy for organic materials with LIBS, these two algorithms were used to analyze LIBS spectra. In this experiment, a nanosecond laser with a wavelength of 1064 nm was used to ablate the leaves of three kinds of plants (Ligustrum lucidum, Viburnum odoratissimum, bamboo) to produce plasma. The plasma spectra were acquired by an optical fiber spectrometer in the range of 220 to 432 nm, and 100 spectra were collected from each kind of leaf. First, principal components were extracted from the original spectral data of the 300 samples, and the first two principal components (PC1, PC2) were used to make the score plot. The spectra of the three kinds of leaves are so similar that they could not be distinguished directly from this plot. Then, 70 spectra of each kind of plant were used as the training set, and the other 30 spectra were used as the test set to evaluate the classification model. The first 20 principal components extracted by PCA were used as the attribute values for LDA and SVM modeling. For LDA, the spectra were processed to obtain the first two discriminant function values. Plotting these values showed that spectra from different kinds of leaves were well separated, while spectra from the same kind largely clustered together. Combined with the Mahalanobis distance, the average classification accuracy on the test set was 96.67%. Similarly, the SVM method was used to learn the characteristics of the training set and obtain the classification hyperplane. The average classification accuracy of SVM on the test set was 98.89%, which is higher than that of LDA. This work can be helpful for food traceability, in situ identification of biological tissues, and remote analysis of organic explosives by LIBS.
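The Mahalanobis-distance step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-dimensional "discriminant scores" in `train` are invented for illustration, the pooled within-class covariance is an assumption about how the distance is computed, and the names `fit`, `predict`, and `accuracy` are hypothetical helpers.

```python
from statistics import mean

# Hypothetical 2-D discriminant scores per class (NOT data from the paper).
train = {
    "ligustrum": [(-4.0, 1.0), (-3.4, 0.6), (-4.6, 1.2), (-3.8, 1.6)],
    "viburnum": [(0.0, -3.0), (0.6, -2.6), (-0.4, -3.4), (0.2, -2.4)],
    "bamboo": [(4.0, 2.0), (3.4, 2.6), (4.6, 1.8), (4.2, 2.4)],
}


def centre(points):
    """Mean point of a list of 2-D points."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def pooled_covariance(groups):
    """Pooled within-class covariance, returned as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    dof = 0
    for points in groups.values():
        mx, my = centre(points)
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        dof += len(points) - 1
    return (sxx / dof, sxy / dof, syy / dof)


def mahalanobis_sq(point, mid, cov):
    """Squared Mahalanobis distance for a symmetric 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mid[0], point[1] - mid[1]
    # The inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def fit(groups):
    """Class centres plus the shared (pooled) covariance."""
    centres = {label: centre(points) for label, points in groups.items()}
    return centres, pooled_covariance(groups)


def predict(point, centres, cov):
    """Assign the point to the class whose centre is nearest."""
    return min(centres, key=lambda label: mahalanobis_sq(point, centres[label], cov))


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


centres, cov = fit(train)
test_points = [(-3.9, 0.9), (0.1, -2.9), (3.8, 2.1)]
test_labels = ["ligustrum", "viburnum", "bamboo"]
predicted = [predict(p, centres, cov) for p in test_points]
print(predicted, accuracy(predicted, test_labels))
```

If the reported averages are taken over all 3 × 30 = 90 test spectra, which is an assumption about how they were averaged, 96.67% corresponds to 87 correctly classified spectra and 98.89% to 89.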