Spectroscopy and Spectral Analysis, 2018, 38 (8): 2483, published online: 2018-08-26

Spectral Detection Technique of Blood Species Based on Data Driven Model
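Author affiliations
1 Institute of Biomedical Engineering, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300192
2 Institute of Laboratory Animal Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100021
3 School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072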
Abstract
This paper presents a non-contact technique for identifying the species of a blood sample, based on spectral detection and a data-driven model. A total of 649 blood samples from four species (monkey 144, rat 203, dog 133, human 169) served as the original samples. The supercontinuum laser source covered a wavelength range of 450 to 2400 nm. For each blood sample, held in an anticoagulant tube, the backward-scattered visible spectrum (294 to 1160 nm) and the forward-scattered near-infrared spectra (1021 to 1757 nm) at ten different spatial sites were collected, and the eleven spectra were concatenated in order into a one-dimensional vector that formed the raw data of that sample. Principal component analysis was used to extract features from the dataset; it retained 99.99% of the original variance while compressing the data to 1.5% of its original volume, which improves the computational efficiency of classification. Training and prediction experiments with training and validation sets of different sizes showed that the ten-fold cross-validation error rate decreases as the number of samples increases, so enlarging the sample bank can improve recognition accuracy. Because the data-driven model is a data-stream processing model built on machine learning algorithms, it can be implemented with a variety of classification algorithms. A comparison of six algorithms (artificial neural network, support vector machine, partial least-squares regression, multiple linear regression, random forest, and naive Bayes) showed that recognition performance varies by category: the ranking of the algorithms by correct recognition rate differs from species to species. Therefore, in practical applications, the choice of data-driven model should consider not only the overall recognition rate of an algorithm but also its category-specific differences whenever there are additional requirements on the recognition of particular categories.
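
The processing chain described in the abstract (concatenating eleven spectra per sample, PCA with a 99.99% variance threshold, ten-fold cross-validation, and per-species recognition rates) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, the spectrum lengths and sample counts are arbitrary, all function names are hypothetical, and a simple nearest-centroid classifier stands in for the six algorithms compared in the paper. Fitting PCA inside each fold is also an assumption made here; the abstract does not say how the authors combined the two steps.

```python
import numpy as np


def make_synthetic(n_per_class=30, n_spectra=11, n_points=20, seed=0):
    """Synthetic stand-in for the blood spectra: 4 classes, 11 spectra per sample."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label in range(4):
        base = rng.normal(size=(n_spectra, n_points))  # class-specific spectra
        for _ in range(n_per_class):
            spectra = base + 0.3 * rng.normal(size=base.shape)
            X.append(np.concatenate(spectra))  # join the 11 spectra end to end
            y.append(label)
    return np.array(X), np.array(y)


def pca_fit(X, variance_to_keep=0.9999):
    """Return (mean, components), keeping the fewest principal components
    whose cumulative explained variance reaches the threshold."""
    mean = X.mean(axis=0)
    # Squared singular values of the centred data are proportional to variance.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(ratio, variance_to_keep)) + 1, len(s))
    return mean, vt[:k]


def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Assign each test point to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]


def cross_validate(X, y, n_folds=10, seed=0):
    """k-fold cross-validation; PCA is refitted on the training part of each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_pred = np.empty_like(y)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        mean, comps = pca_fit(X[train_idx])
        Z_train = (X[train_idx] - mean) @ comps.T
        Z_test = (X[test_idx] - mean) @ comps.T
        y_pred[test_idx] = nearest_centroid_predict(Z_train, y[train_idx], Z_test)
    return y_pred


def per_class_rate(y_true, y_pred):
    """Correct recognition rate for each class separately."""
    return {int(c): float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}


if __name__ == "__main__":
    X, y = make_synthetic()
    y_pred = cross_validate(X, y)
    print("overall error rate:", float(np.mean(y_pred != y)))
    print("per-class recognition rate:", per_class_rate(y, y_pred))
```

Running the same cross-validation loop with several different classifiers and comparing the per-class dictionaries, rather than only the overall error rate, is one way to expose the category differences the abstract refers to.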

LI Hong-xiao, SUN Mei-xiu, XIANG Zhi-guang, WANG Yi, LIN Ling, QIN Chuan, LI Ying-xin. Spectral Detection Technique of Blood Species Based on Data Driven Model[J]. Spectroscopy and Spectral Analysis, 2018, 38(8): 2483.
