光谱学与光谱分析 (Spectroscopy and Spectral Analysis), 2019, 39(1): 96. Published online: 2019-03-17

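基于堆栈压缩自编码的近红外光谱药品鉴别方法 (Near-infrared spectroscopic identification of pharmaceuticals based on stacked contractive auto-encoders)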

Stacked Contractive Auto-Encoders Application in Identification of Pharmaceuticals
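Author affiliations
1 School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
2 School of Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China
3 National Institutes for Food and Drug Control, Beijing 100050, China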
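Abstract (translated from Chinese)
Near-infrared (NIR) spectroscopy is now widely used in many fields because, in drug identification, it offers fast analysis, non-destructive sampling and on-site detection. However, NIR spectra have a low signal-to-noise ratio, weak absorption and overlapping peaks, so qualitative or quantitative information about a substance cannot be read directly from the spectrum. NIR analysis is therefore usually an indirect technique, and chemometric modeling of the spectra has become its core. Deep learning is a new branch of machine learning that has been applied successfully in many fields; its network structure and nonlinear activations make its models particularly suitable for large-scale, high-dimensional, nonlinear data. To broaden the range of NIR modeling methods and improve the regression precision or classification accuracy of NIR analysis, it is necessary to apply deep learning to NIR spectra and develop new modeling methods. For qualitative NIR analysis, this paper proposes a method based on a stacked contractive auto-encoder network (SCAE) and applies it to multi-class drug spectra in order to distinguish the same drug produced by different manufacturers. The contractive auto-encoder (CAE) builds on the auto-encoder (AE) by adding the Jacobian matrix as a penalty term. Auto-encoders were originally used for dimensionality reduction, learning the internal features of the data; the Jacobian matrix carries information about the data in every direction, so using it as a penalty on the AE makes the extracted features invariant to small perturbations of the input and thus improves the AE's feature extraction. An SCAE is a neural network composed of several CAE layers. The hidden layer of each CAE serves as the input layer of the next, and all network parameters are obtained by greedy layer-wise training; after training, the whole stack is treated as one network and fine-tuned by backpropagation, and a logistic/softmax classifier finally performs the qualitative analysis. All experimental data were collected by the National Institutes for Food and Drug Control, with cefixime capsules used for binary classification and isosorbide dinitrate tablets for multi-class classification. Each sample's spectral curve was obtained by measuring its absorbance at different wavelengths with a Bruker Matrix spectrometer, and OPUS software was then used to remove deviations between spectra caused by drift and similar factors. The coefficient λ of the Jacobian penalty term was then determined experimentally to be 0.003, after which the model was built. Modeling was divided into five stages: preprocessing, pre-training, fine-tuning, testing and comparison. To evaluate SCAE's classification accuracy, algorithm stability and modeling time, comparative experiments were run against a BP neural network, an SVM, a sparse auto-encoder (SAE) and a denoising auto-encoder (DAE). In classification accuracy, SCAE achieved the best accuracy and stability at every ratio of training set to test set. In modeling time, the SVM had a large advantage over the other algorithms because it requires no pre-training or feature extraction, but SCAE modeled faster than every other compared algorithm apart from the SVM. Overall, SCAE is an effective and feasible method for drug identification.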
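For reference, the Jacobian penalty described above can be written out. This is the standard contractive auto-encoder objective, sketched under assumptions the abstract does not state: a sigmoid encoder $s(\cdot)$ with weights $W$ and bias $b_h$, a generic decoder $g$ and a generic reconstruction loss $L$. Only $\lambda = 0.003$ comes from the abstract.

```latex
% Assumed: sigmoid encoder s(.), generic decoder g, generic reconstruction loss L
h = f(x) = s(Wx + b_h), \qquad \hat{x} = g(h)

\mathcal{J}_{\mathrm{CAE}} = \sum_{x \in \mathcal{D}} \Big[ L\big(x, g(f(x))\big) + \lambda \, \lVert J_f(x) \rVert_F^2 \Big], \qquad \lambda = 0.003

\lVert J_f(x) \rVert_F^2 = \sum_{i,j} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^2 = \sum_{j} \big[ h_j (1 - h_j) \big]^2 \sum_{i} W_{ji}^2
```

The last equality uses $\partial h_j / \partial x_i = h_j (1 - h_j) W_{ji}$ for a sigmoid unit, so the penalty can be computed from the hidden activations and the encoder weights without forming the Jacobian explicitly. A larger λ makes the features less sensitive to input perturbations at the cost of reconstruction accuracy.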
Abstract (English version)
Near-infrared (NIR) spectroscopy offers fast analysis, non-destructive testing and on-site detection, and it has been widely used in many fields. However, NIR spectra have a low signal-to-noise ratio, weak absorption and overlapping peaks, so qualitative or quantitative information cannot be obtained directly from the spectrum. NIR spectroscopy is therefore usually used as an indirect analytical technique, and chemometric modeling of the spectra is the core of NIR analysis. Deep learning is a new branch of machine learning that has been applied successfully in many fields; its network structure and nonlinear activations make it especially suitable for modeling large-scale, high-dimensional, nonlinear data. To enrich NIR modeling methods and improve the regression precision or classification accuracy of NIR analysis, it is worthwhile to develop new modeling methods based on deep learning. This paper studies qualitative NIR analysis and proposes a model based on stacked contractive auto-encoders (SCAE) to identify the same drug produced by different manufacturers. A contractive auto-encoder (CAE) is an auto-encoder (AE) with the Jacobian matrix added as a penalty term. An AE reduces the dimension of the data in order to learn its internal features, and the Jacobian matrix contains information about the data in all directions; penalizing it makes the extracted features invariant to small perturbations of the input and improves the AE's ability to extract features. SCAE is a multi-layer CAE neural network: the hidden layer of each CAE serves as the input layer of the next, and all network parameters are obtained by greedy layer-wise training.
After training, the stack is treated as a single network and fine-tuned by backpropagation, and a logistic/softmax classifier is finally used for qualitative analysis. The experimental data were collected by the National Institutes for Food and Drug Control, with cefixime capsules as the binary-classification data set and isosorbide dinitrate tablets as the multi-class data set. Spectral curves were obtained by measuring the absorbance of each sample at different wavelengths with a Bruker Matrix spectrometer, and OPUS software was then used to remove deviations between spectra caused by drift and other factors. The model was built after the weight λ of the Jacobian penalty term was determined experimentally to be 0.003. Modeling was divided into five stages: preprocessing, pre-training, fine-tuning, testing and comparison. To verify the classification accuracy, algorithm stability and modeling time of SCAE, it was compared with a BP neural network, an SVM, sparse auto-encoders (SAE) and denoising auto-encoders (DAE). In classification accuracy, SCAE had the highest accuracy and stability at every ratio of training set to test set. In modeling time, the SVM had a large advantage over the other algorithms because it needs no pre-training or feature extraction; however, SCAE modeled faster than all the other compared algorithms except the SVM. In summary, SCAE is effective and feasible for drug identification.
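As a rough illustration of the SCAE pipeline described above, the following is a minimal NumPy sketch of one CAE layer with the Jacobian penalty, greedy layer-wise stacking (each hidden layer feeds the next CAE) and a softmax read-out. It is not the authors' code. The sigmoid encoder, linear decoder with squared-error loss, full-batch gradient descent, layer sizes, learning rates and the names `CAELayer`, `pretrain_stack`, `stack_features`, `train_softmax` and `predict` are all hypothetical; only λ = 0.003 comes from the abstract. The end-to-end backpropagation fine-tuning stage is omitted, so the softmax is trained on frozen stacked features, and the demo data are synthetic Gaussian bands rather than NIR measurements.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class CAELayer:
    """Hypothetical CAE layer: sigmoid encoder h = s(Wx + b), linear decoder
    x_hat = W2 h + c, squared error + lam * ||J_f(x)||_F^2 averaged over samples."""

    def __init__(self, n_in, n_hidden, lam=0.003, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_in)
        self.W = rng.normal(0.0, scale, (n_hidden, n_in))   # encoder weights
        self.b = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, scale, (n_in, n_hidden))  # untied decoder weights
        self.c = np.zeros(n_in)
        self.lam = lam                                      # 0.003 as in the abstract

    def encode(self, X):
        return sigmoid(X @ self.W.T + self.b)

    def loss_and_grads(self, X):
        n = X.shape[0]
        H = self.encode(X)
        E = H @ self.W2.T + self.c - X          # reconstruction error
        D = H * (1.0 - H)                       # dh_j/da_j for the sigmoid
        S = (self.W ** 2).sum(axis=1)           # sum_i W_ji^2 for each hidden unit j
        penalty = (D ** 2) @ S                  # per-sample ||J_f(x)||_F^2
        loss = 0.5 * (E ** 2).sum() / n + self.lam * penalty.sum() / n
        # Backpropagate the reconstruction and penalty terms through H
        dH = E @ self.W2 / n + (2.0 * self.lam / n) * S * D * (1.0 - 2.0 * H)
        dA = dH * D
        # The penalty also depends on W directly through S
        gW = dA.T @ X + (2.0 * self.lam / n) * (D ** 2).sum(axis=0)[:, None] * self.W
        grads = {"W": gW, "b": dA.sum(axis=0),
                 "W2": E.T @ H / n, "c": E.sum(axis=0) / n}
        return loss, grads

    def fit(self, X, epochs=200, lr=0.1):
        # Plain full-batch gradient descent (the optimizer is an assumption)
        for _ in range(epochs):
            _, g = self.loss_and_grads(X)
            self.W -= lr * g["W"]
            self.b -= lr * g["b"]
            self.W2 -= lr * g["W2"]
            self.c -= lr * g["c"]
        return self


def pretrain_stack(X, sizes, lam=0.003, **fit_kw):
    """Greedy layer-wise pre-training: each CAE is trained on the previous hidden layer."""
    layers, inp = [], X
    for k, n_hidden in enumerate(sizes):
        layer = CAELayer(inp.shape[1], n_hidden, lam=lam, seed=k).fit(inp, **fit_kw)
        layers.append(layer)
        inp = layer.encode(inp)
    return layers


def stack_features(layers, X):
    for layer in layers:
        X = layer.encode(X)
    return X


def train_softmax(F, y, n_classes, epochs=500, lr=0.5):
    """Softmax regression on frozen stacked features (stack fine-tuning omitted)."""
    W = np.zeros((F.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                    # one-hot labels
    for _ in range(epochs):
        Z = F @ W + b
        Z -= Z.max(axis=1, keepdims=True)       # numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / len(y)                    # gradient of mean cross-entropy wrt Z
        W -= lr * F.T @ G
        b -= lr * G.sum(axis=0)
    return W, b


def predict(layers, W, b, X):
    return np.argmax(stack_features(layers, X) @ W + b, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    grid = np.linspace(0.0, 1.0, 50)
    # Synthetic stand-in for spectra: two classes whose band centre differs slightly
    bands = [np.exp(-((grid - centre) / 0.1) ** 2) for centre in (0.45, 0.55)]
    y = rng.integers(0, 2, 200)
    X = np.stack([bands[k] for k in y]) + 0.05 * rng.normal(size=(200, 50))
    layers = pretrain_stack(X, [20, 10])
    W, b = train_softmax(stack_features(layers, X), y, n_classes=2)
    print("training accuracy:", (predict(layers, W, b, X) == y).mean())
```

With two classes the softmax read-out is equivalent to the logistic classifier mentioned in the abstract. The gradients are written by hand using the sigmoid closed form $\partial h_j / \partial x_i = h_j (1 - h_j) W_{ji}$; an autodiff framework would be the more usual choice once whole-network fine-tuning is added.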

Citation: GAN Bo-rui (甘博瑞), YANG Hui-hua (杨辉华), ZHANG Wei-dong (张卫东), FENG Yan-chun (冯艳春), YIN Li-hui (尹利辉), HU Chang-qin (胡昌勤). Stacked Contractive Auto-Encoders Application in Identification of Pharmaceuticals[J]. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2019, 39(1): 96.
