Application of Improved Symmetric Zero-Area Conversion Peak-Seeking Algorithm in Raman Spectroscopy
Raman spectroscopy is an efficient, non-destructive analytical method for obtaining chemical information, which is carried by the characteristic peaks of the spectrum. Symmetric zero-area conversion is a commonly used peak-seeking method, but it requires several spectrum-dependent input parameters before peak seeking, such as the window width, the Lorentz function half-width, and the Gaussian function half-width. These parameters differ between Raman spectra, and if they do not match the spectrum at hand, the peak positions obtained may be inaccurate. Some open Raman databases currently contain only raw spectral data without the corresponding peak information. Preprocessing the raw data and extracting peak positions and intensities with a peak-seeking algorithm would make these databases easier to use. Although the symmetric zero-area conversion method is well suited to automatic peak seeking and also yields the intensity of each peak, its dependence on spectrum-specific parameters limits its universality when processing the different Raman spectra in a database. We propose an improved symmetric zero-area method that reduces the number of spectrum-related input parameters and adapts to data with different spectral resolutions. The aim is an algorithm that can search for peaks automatically, in batches, across the many raw Raman spectra in a database, producing a more concise and convenient database.
The algorithm improves symmetric zero-area peak seeking by combining it with noise reduction and baseline removal. First, the Whittaker Smoother is applied to the raw Raman spectrum; it removes noise quickly and simply without shifting the peak positions. Then, the asymmetrically reweighted penalized least squares (arPLS) algorithm removes the spectral baseline. Next, the symmetric zero-area method is improved by normalizing the half-width of the Raman peaks, which reduces the number of required input parameters and suppresses peak-seeking offsets. After peak seeking, the peak positions found are further corrected to reduce offsets and locate the peaks accurately. Combining the Whittaker Smoother, arPLS, and the improved symmetric zero-area method yields the WALPSZ peak-seeking algorithm. The algorithm is then used to search automatically for peaks in raw Raman spectra from ROD and to analyze experimental Raman spectra of Anhydrite, Pyrite, and Moissanite. The peak positions obtained are compared with literature data to verify the reliability of the algorithm and its universality across different Raman spectra.
First, the traditional symmetric zero-area conversion method and the WALPSZ algorithm are both applied to the raw ROD spectra of Calcite, Analcime, Bindheimite, and Brookite. With fixed parameters, the traditional algorithm performs best on Calcite [Fig. 3(a)] and reasonably well on Analcime, although one peak in the 1000-1500 cm⁻¹ range is found twice [Fig. 3(b)]. For Bindheimite there is an obvious peak-position offset, and again one peak is found twice [Fig. 3(c)]. For Brookite a peak is clearly missed [Fig. 3(d)]. The WALPSZ algorithm maintains sound performance on Calcite and resolves these problems for the other spectra, which indicates better universality. To further verify the universality and accuracy of the WALPSZ algorithm, and to explore whether it remains applicable to measured spectra, Raman spectra of Anhydrite, Pyrite, and Moissanite samples are measured and analyzed with the algorithm (Fig. 12). The peaks found are compared with those found by the WALPSZ algorithm in the raw ROD and RRUFF spectra of the same three minerals and with literature data, and the peaks correspond to one another (Table 2).
The symmetric zero-area conversion method is improved by reducing its input parameters and is then combined with the Whittaker Smoother and the arPLS baseline removal algorithm to form the WALPSZ peak-seeking algorithm, which enhances its universality. The WALPSZ algorithm is compared with the traditional symmetric zero-area conversion method and is also applied to other raw Raman spectra in ROD. The results show that, with fewer input parameters, the algorithm can search spectral data in open Raman databases automatically and in batches. In addition, the WALPSZ algorithm is used to obtain the peak positions of Anhydrite, Pyrite, and Moissanite from their ROD and RRUFF spectra and from measured spectra of these samples, and the results are compared with peak positions reported in the literature. The algorithm proves effective for automatic peak seeking in both the measured spectra and the raw ROD data; the peak positions obtained correspond to one another and are consistent with the literature. This verifies the reliability and accuracy of the WALPSZ algorithm for automatic peak seeking in raw Raman data. The algorithm can therefore help build a database of automatically determined peak positions from ROD which, matched against literature data, supports the analysis of chemical information in measured Raman spectra.
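1 Introduction
Raman spectroscopy is an efficient, non-destructive analytical method for obtaining chemical information. Because the sample need not be destroyed, it is widely used in medicine, materials science, biology, archaeology, and other fields [1-5]. When light interacts with matter, photons may be absorbed or scattered. A photon whose energy matches the gap between the ground state and an excited state of a molecule may be absorbed; otherwise it may be scattered as it interacts with the molecule. The energy difference between the incident and the scattered photon corresponds to a difference between molecular vibrational energy levels, so analyzing it reveals those levels [6]. This information acts as a unique "fingerprint" of a specific molecule. The Raman shifts of the peaks give the chemical composition of the sample, and the relative intensities of the peaks give the relative content of each substance.
When a Raman spectrum is analyzed, the raw spectrum must be preprocessed, which includes denoising, baseline removal, and peak seeking. The peak positions found are then compared with a database to determine the chemical composition of the sample. Noise can produce false peaks and reduce the accuracy of Raman analysis, so denoising is an indispensable step. Some samples produce a fluorescence background during Raman analysis, and this fluorescence is sometimes several orders of magnitude stronger than the Raman scattering. Methods for subtracting the Raman background (baseline removal) are still being developed [7-8]. Asymmetric least squares fitting is widely used for baseline removal in Raman spectra because the offset of the fitted baseline can be tuned freely to obtain the best baseline correction [9]. After denoising and baseline removal, the chemical information in the spectrum can be obtained by peak seeking. Common peak-seeking methods include the Gaussian product function method [10], the derivative method [11], the covariance method, the continuous wavelet transform method [12], and the symmetric zero-area conversion method [13]. The symmetric zero-area conversion method is widely used in automatic peak seeking because it can identify weak peaks, resolve overlapping peaks, and suppress a high background.
The Raman Open Database (ROD) is an open Raman database developed by the SOLSA H2020 project [14]. It contains more than 1000 high-quality raw Raman spectra recorded on a variety of Raman spectrometers with a variety of excitation sources. These data are difficult to use directly; if the raw spectra are preprocessed and a peak-seeking algorithm is applied to obtain the peak positions and intensities, the database becomes better and more convenient to use. Although the symmetric zero-area conversion method has advantages in automatic peak seeking and can also provide the intensity of each peak, it requires several input parameters related to the spectral data, such as the window width, the full width at half maximum of the Lorentz function, and the full width at half maximum of the Gaussian function. Its universality is therefore relatively limited when processing Raman spectra of different resolutions. In this paper, the symmetric zero-area conversion method is modified and improved to reduce the input of spectrum-related parameters so that it adapts to data with different peak widths.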
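2 Algorithm and workflow
2.1 Denoising algorithm based on the Whittaker Smoother
The noise in a Raman spectrum is mainly Gaussian. It can be removed with conventional filtering methods such as Fourier-transform frequency-domain filtering and the wavelet transform [15]. However, these methods essentially remove the noise in the frequency domain and become very time-consuming when there are many spectra. Eilers [16] proposed an efficient, fast, and simple denoising method known as the "Whittaker Smoother". The algorithm smooths a spectrum quickly and does not shift the peaks. Its core is a balance between the distortion of the data and their roughness. When a noisy spectrum y is fitted by a comparatively noise-free spectrum z, two factors must be considered: the distortion of the data and the roughness of the data. The noise should be reduced while keeping the data as undistorted as possible. However, the smoother z becomes, the further it departs from y and the greater the distortion. The roughness of the spectrum can be expressed by
To simplify the calculation, matrix notation is introduced; the vector
where
Hence, let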
Fig. 1. Denoising of the Raman spectrum of Analcime with added Gaussian white noise using the Whittaker Smoother
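The smoothing step described in this subsection can be sketched in a few lines. The example is a minimal illustration, assuming the standard penalized least squares form of the Whittaker Smoother from Eilers [16]: the smooth series z minimizes |y - z|² + λ|Dz|², where D is a finite-difference matrix, which leads to the linear system (I + λDᵀD)z = y. The function name `whittaker_smooth`, the value of `lam`, and the synthetic signal are assumptions made for illustration and are not taken from the paper. A dense solve is used for brevity; long spectra would normally call for sparse matrices.

```python
import numpy as np


def whittaker_smooth(y, lam=100.0, order=2):
    """Return z minimizing |y - z|^2 + lam * |D z|^2 (D: finite differences)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Rows of D are order-th finite differences, shape (n - order, n).
    D = np.diff(np.eye(n), n=order, axis=0)
    # Zero gradient of the objective gives (I + lam * D^T D) z = y.
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)


# Synthetic spectrum (assumed for illustration): one Gaussian peak plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
clean = np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)
noisy = clean + rng.normal(0.0, 0.05, x.size)
smooth = whittaker_smooth(noisy, lam=100.0)

# Roughness (sum of squared second differences) before and after smoothing.
rough_before = np.sum(np.diff(noisy, 2) ** 2)
rough_after = np.sum(np.diff(smooth, 2) ** 2)
```

A larger `lam` weights roughness more heavily and gives a smoother result at the cost of more distortion, which is the trade-off the text describes.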
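2.2 Baseline removal algorithm based on asymmetrically reweighted penalized least squares (arPLS)
The asymmetric least squares (AsLS) baseline removal algorithm is commonly used to subtract the fluorescence background of Raman spectra. Its core is obtained by modifying the denoising algorithm described above [17]. Introducing a weight factor wi, the balanced combination Q of roughness and distortion is
where, when
By
Because
where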
Fig. 2. Baseline removal from the Raman spectrum of Analcime using arPLS
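A compact sketch of the arPLS iteration follows, assuming the reweighting rule published by Baek et al. [18]: a weighted Whittaker fit is repeated, and after each fit the weights are recomputed from a logistic function of the residuals d = y - z, using the mean and standard deviation of the negative residuals. The function name `arpls`, the parameter values, and the synthetic spectrum are illustrative assumptions, not settings from the paper.

```python
import numpy as np


def arpls(y, lam=1e5, ratio=1e-6, max_iter=100):
    """Estimate a baseline with asymmetrically reweighted penalized least squares."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)   # second-difference matrix
    H = lam * (D.T @ D)                   # roughness penalty
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        # Weighted Whittaker fit: (W + lam * D^T D) z = W y.
        z = np.linalg.solve(np.diag(w) + H, w * y)
        d = y - z
        neg = d[d < 0]
        if neg.size < 2 or neg.std() == 0:
            break
        m, s = neg.mean(), neg.std()
        # Logistic reweighting: points far above the baseline get weight near 0.
        arg = np.clip(2.0 * (d - (2.0 * s - m)) / s, -500.0, 500.0)
        w_new = 1.0 / (1.0 + np.exp(arg))
        converged = np.linalg.norm(w - w_new) / np.linalg.norm(w) < ratio
        w = w_new
        if converged:
            break
    return z


# Synthetic spectrum (assumed): curved background, one peak, slight noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 300)
baseline = 2.0 + 3.0 * x ** 2
peak = 5.0 * np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)
y = baseline + peak + rng.normal(0.0, 0.02, x.size)

estimated = arpls(y)
corrected = y - estimated
```

Because samples on a peak end up with weights close to zero, the fitted curve passes under the peak instead of through it, so `corrected` keeps the peak height largely intact.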
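2.3 Few-parameter symmetric zero-area peak-seeking algorithm based on the Voigt function
The basic idea of the symmetric zero-area conversion method is to convolve the spectral data with a symmetric function whose area is zero. The transform of a linear background is zero everywhere except where a peak is present. Such a zero-area symmetric function is generally called a "window function", and its mathematical expression is
where
where W=2m+1 is the window width;
In vibrational spectroscopy, the line profiles of many vibrational spectra are intrinsically Lorentzian. However, properties of the sample itself and statistical fluctuations of the spectrometer broaden the lines to some extent, and the broadening is approximately Gaussian. The Raman line shape is therefore neither purely Gaussian nor purely Lorentzian. Instead, it is the convolution of a Lorentzian function and a Gaussian function, known as the Voigt function [13], which is expressed as
where
Each Raman spectrum has its own optimal W,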
Fig. 3. Peak-seeking results for Raman spectra in ROD ( , , W=19). (a) Calcite; (b) Analcime; (c) Bindheimite; (d) Brookite
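To make the window function concrete, the sketch below builds a zero-area window from a Voigt-like shape and checks the key property of the transform: a linear background maps to zero, while a peak produces a positive response at its center. It assumes the common construction C_j = G_j - (1/W)ΣG_k, with G obtained by numerically convolving a sampled Lorentzian with a sampled Gaussian. The function names and the half-width values are hypothetical and are not the parameters used for Fig. 3.

```python
import numpy as np


def voigt_zero_area_window(m, fwhm_lorentz, fwhm_gauss):
    """Zero-area window of width W = 2m + 1 from a numerically convolved Voigt shape."""
    j = np.arange(-4 * m, 4 * m + 1)                  # wide grid to limit truncation
    lorentz = 1.0 / (1.0 + (2.0 * j / fwhm_lorentz) ** 2)
    gauss = np.exp(-4.0 * np.log(2.0) * (j / fwhm_gauss) ** 2)
    voigt = np.convolve(lorentz, gauss, mode="same")  # Lorentzian convolved with Gaussian
    g = voigt[3 * m : 5 * m + 1]                      # central 2m + 1 samples
    g = g / g.max()
    return g - g.mean()                               # subtracting the mean gives zero area


def zero_area_transform(y, window):
    """Slide the window over y; the m edge samples on each side are set to 0."""
    m = (len(window) - 1) // 2
    core = np.correlate(np.asarray(y, dtype=float), window, mode="valid")
    return np.concatenate([np.zeros(m), core, np.zeros(m)])


window = voigt_zero_area_window(m=9, fwhm_lorentz=4.0, fwhm_gauss=4.0)  # W = 19

i = np.arange(200)
linear_background = 3.0 + 0.5 * i
peak = 50.0 * np.exp(-0.5 * ((i - 100) / 3.0) ** 2)

t_background = zero_area_transform(linear_background, window)
t_spectrum = zero_area_transform(linear_background + peak, window)
```

The background cancels because the window sums to zero and is symmetric, so both the constant and the slope terms drop out of the sum.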
From
where
Fig. 4. Peak-seeking results of the symmetric zero-area method for the Raman spectrum of Analcime. (a) Peak positions found; (b) SSi at the peak positions
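How a significance ratio such as SSi can be turned into peak candidates is sketched below. The definition used here is an assumption based on the usual form in the symmetric zero-area literature: the transformed value divided by its estimated standard deviation, sqrt(Σ C_j² y_{i+j}), which presumes counting statistics and a non-negative spectrum. The Gaussian window, the threshold of 3, and the noise-free synthetic spectrum are illustrative choices.

```python
import numpy as np


def significance(y, window):
    """SS_i = transform / estimated standard deviation (assumes counting statistics)."""
    y = np.asarray(y, dtype=float)
    m = (len(window) - 1) // 2
    t = np.correlate(y, window, mode="valid")          # sum_j C_j * y_{i+j}
    var = np.correlate(y, window ** 2, mode="valid")   # sum_j C_j^2 * y_{i+j}
    ss = t / np.sqrt(np.maximum(var, 1e-12))
    return np.concatenate([np.zeros(m), ss, np.zeros(m)])


def pick_peaks(ss, threshold):
    """Indices of local maxima of SS that exceed the threshold."""
    mid = ss[1:-1]
    is_peak = (mid > ss[:-2]) & (mid >= ss[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1


# Gaussian zero-area window, W = 2m + 1 = 19 (illustrative choice).
j = np.arange(-9, 10)
g = np.exp(-0.5 * (j / 3.0) ** 2)
window = g - g.mean()

# Noise-free synthetic spectrum: flat background with a strong and a weak peak.
i = np.arange(200)
y = (100.0
     + 400.0 * np.exp(-0.5 * ((i - 60) / 3.0) ** 2)
     + 50.0 * np.exp(-0.5 * ((i - 140) / 3.0) ** 2))

ss = significance(y, window)
peaks = pick_peaks(ss, threshold=3.0)
```

Both peaks pass the threshold, but the weak peak has a much smaller SS value than the strong one, which illustrates why a significance ratio alone does not rank peaks the same way as their intensities do.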
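A Score parameter is therefore introduced to balance SSi against the peak intensity and thus evaluate each peak:
where
Because the parameters obtained by normalization, W,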
Fig. 5. Peak-seeking correction for Bindheimite in ROD. (a) Before correction; (b) after correction
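One simple way to realize a position correction of this kind is to move each candidate to the largest sample in a small neighborhood and to merge candidates that land on the same sample. This rule is an assumption made for illustration, not the correction procedure of the paper; `refine_peaks` and `half_width` are hypothetical names.

```python
import numpy as np


def refine_peaks(y, candidates, half_width=3):
    """Move each candidate to the local maximum within +/- half_width samples."""
    y = np.asarray(y, dtype=float)
    refined = set()
    for c in candidates:
        lo = max(int(c) - half_width, 0)
        hi = min(int(c) + half_width + 1, y.size)
        refined.add(lo + int(np.argmax(y[lo:hi])))
    # Candidates that converge on the same sample are merged into one peak.
    return sorted(refined)


i = np.arange(200)
y = (np.exp(-0.5 * ((i - 50) / 4.0) ** 2)
     + 0.6 * np.exp(-0.5 * ((i - 120) / 4.0) ** 2))

# Candidates 48 and 52 are two offset detections of the peak at 50.
refined = refine_peaks(y, [48, 52, 121])
print(refined)  # [50, 120]
```

Merging duplicates in the same pass handles the case where one peak is reported twice as well as the case where a reported position is slightly offset.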
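2.4 Application of the WALPSZ peak-seeking algorithm
The Whittaker Smoother denoising algorithm, the arPLS baseline removal algorithm, and the LPSZ peak-seeking algorithm are combined to form the WALPSZ peak-seeking algorithm; its detailed flowchart is shown in
Open Raman databases such as ROD and RRUFF contain many raw experimental spectra but do not provide characteristic peak positions, so chemical information cannot be obtained directly by comparing peak positions. The WALPSZ algorithm is broadly applicable: it can convert the raw Raman spectra in ROD into peak positions and the corresponding peak intensities, which are easier to use. With the WALPSZ algorithm, large numbers of raw Raman spectra in a database can be searched for peaks in batches, selecting their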
Table 1. Part of the peak positions obtained by batch peak seeking on ROD
Fig. 7. Peak-seeking results for part of the spectra processed by batch peak seeking on ROD. (a) Almandine; (b) Leiteite; (c) Leightonite; (d) Zwieselite; (e) Hanksite; (f) Lepidocrocite
3 Results and discussion
3.1 Peak-seeking performance of the WALPSZ algorithm on the Raman Open Database
Using the WALPSZ peak-seeking algorithm on
Fig. 8. Peak-seeking results of the WALPSZ peak-seeking algorithm. (a) Calcite; (b) Analcime; (c) Bindheimite; (d) Brookite
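3.2 Application of the algorithm to measured spectra
To further verify the applicability and accuracy of the WALPSZ peak-seeking algorithm, and to explore whether it remains applicable to actually measured Raman spectra, samples of Anhydrite (anhydrous calcium sulfate), Pyrite, and Moissanite were prepared. The samples were measured with a laser Raman spectrometer (LabRAM HR) from HORIBA, France, which has a focal length of 800 mm, a spectral repeatability of ≤±0.2 cm⁻¹, and a spectral resolution of ≤±1.95 cm⁻¹; the excitation wavelength was 785 nm. The Raman spectral analysis of the measured samples is shown in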
Fig. 9. Noise and background removal for the Raman spectra of the measured samples. (a), (b) Anhydrite; (c), (d) Pyrite; (e), (f) Moissanite
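The WALPSZ peak-seeking algorithm is applied to the Raman spectra of Anhydrite, Pyrite, and Moissanite taken from the ROD and RRUFF databases and to the measured Raman spectra. As shown in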
Fig. 10. Peak-seeking results of the WALPSZ algorithm for the ROD Raman spectra of the three samples. (a) Anhydrite; (b) Pyrite; (c) Moissanite
Fig. 11. Peak-seeking results of the WALPSZ algorithm for the RRUFF Raman spectra of the three samples. (a) Anhydrite; (b) Pyrite; (c) Moissanite
Fig. 12. Peak-seeking results of the WALPSZ algorithm for the measured Raman spectra of the three samples. (a) Anhydrite; (b) Pyrite; (c) Moissanite
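To further verify the reliability of the WALPSZ peak-seeking algorithm, the peak positions it finds in the database spectra and in the measured spectra are compared with the actual peak positions, as shown in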
Table 2. Comparison of Raman spectral peak positions from four sources
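A comparison of this kind amounts to pairing peak positions from two sources within a tolerance. The sketch below shows one way to do that; the function `match_peaks`, the tolerance, and the numbers in the usage example are made up for illustration and are not values from Table 2.

```python
import numpy as np


def match_peaks(found, reference, tol=5.0):
    """Pair each reference position with the nearest found position within tol."""
    found = np.asarray(found, dtype=float)
    pairs = []
    for r in reference:
        k = int(np.argmin(np.abs(found - r)))
        hit = float(found[k]) if abs(found[k] - r) <= tol else None
        pairs.append((float(r), hit))
    return pairs


# Made-up positions in cm^-1, for illustration only.
found = [100.5, 251.0, 400.0]
reference = [100.0, 250.0, 330.0]
pairs = match_peaks(found, reference, tol=2.0)
print(pairs)  # [(100.0, 100.5), (250.0, 251.0), (330.0, None)]
```

A reference position with no counterpart inside the tolerance is reported as `None`, which makes missed peaks easy to spot when several sources are compared.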
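4 Conclusions
When a Raman spectrum is analyzed, the chemical composition of a sample can be identified by comparing the Raman shifts of its peaks with a database. However, some open Raman databases (such as ROD) contain only raw Raman spectra, which must be searched for peaks in batches to obtain characteristic information. The symmetric zero-area method has advantages in automatic peak seeking, but it is not general enough to handle the Raman spectra of various resolutions in ROD. In this paper, the symmetric zero-area peak-seeking method is improved to reduce its input parameters and is then combined with the Whittaker Smoother denoising algorithm and the arPLS baseline removal algorithm to form the WALPSZ peak-seeking algorithm. Compared with the traditional symmetric zero-area conversion algorithm, the WALPSZ algorithm is more broadly applicable and requires fewer input parameters, so it can search the spectra in ROD for peaks automatically and in batches.
The WALPSZ algorithm is used to obtain the peak positions of the Raman spectra of Anhydrite, Pyrite, and Moissanite in the ROD and RRUFF databases. It is also used to obtain the peak positions of Raman spectra actually measured on these samples, and all of these are compared with the peak positions in the relevant literature. The results show that the WALPSZ algorithm is effective for automatic peak seeking in measured Raman spectra and in the raw data of ROD; the peak positions obtained correspond to one another and are all consistent with the data recorded in the literature. This verifies the reliability and accuracy of the WALPSZ algorithm for automatic peak seeking in raw Raman data. The automatically determined peak positions from ROD can be compiled into a database and matched against literature data to analyze the chemical information in measured Raman spectra.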
[1] 张灿, 张洁, 朱永. 槽型波导耦合纳米结构增强拉曼光谱[J]. 光学学报, 2020, 40(3): 0313001.
[2] 覃宗定, 许雪棠, 张枝芝, 等. 基于拉曼光谱的硝酸甘油对活体血液作用的实时分析[J]. 光学学报, 2014, 34(1): 0130001.
[3] Colantonio C, Clivet L, Laval E, et al. Integration of multispectral imaging, XRF mapping and Raman analysis for noninvasive study of illustrated manuscripts: the case study of fifteenth century “Humay meets the Princess Humayun” Persian masterpiece from Louvre Museum[J]. The European Physical Journal Plus, 2021, 136(9): 958.
[4] Ru C L, Wen W, Zhong Y. Raman spectroscopy for on-line monitoring of botanical extraction process using convolutional neural network with background subtraction[J]. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 2023, 284: 121494.
[5] 黄祖芳, 李玉玲, 杜生荣, 等. 基于拉曼光谱技术的精子评筛研究进展[J]. 中国激光, 2023, 50(15): 1507202.
[6] Smith E, Dent G. Modern Raman spectroscopy: a practical approach[M]. 2nd ed. Singapore: John Wiley & Sons, 2019.
[7] 司赶上, 刘家祥, 李振钢, 等. 基于形态学与多项式拟合的紫外拉曼荧光背景扣除算法[J]. 光学学报, 2022, 42(22): 2230001.
[8] 姚泽楷, 蔡耀仪, 李诗文, 等. 基于平滑样条曲线结合离散状态转移算法的拉曼光谱基线校正方法[J]. 中国激光, 2022, 49(18): 1811001.
[9] 杨桂燕, 李路, 陈和, 等. 基于广义Whittaker平滑器的拉曼光谱基线校正方法[J]. 中国激光, 2015, 42(9): 0915003.
[10] Wu H X, Yuan X Y, Liu Q C, et al. Optimal scheme for detecting weak peak in γ-ray spectrum[J]. Atomic Energy Science and Technology, 2012, 46(9): 1142-1146.
[11] 汪雪元, 何剑锋, 刘琳, 等. 小波变换导数法X射线荧光光谱自适应寻峰研究[J]. 光谱学与光谱分析, 2020, 40(12): 3930-3935.
[12] Zhang Z M, Tong X, Peng Y, et al. Multiscale peak detection in wavelet space[J]. Analyst, 2015, 140(23): 7955-7964.
[13] 毕云峰, 李颖, 郑荣儿. LIBS/Raman光谱对称零面积变换自动寻峰方法研究[J]. 光谱学与光谱分析, 2013, 33(2): 438-443.
[14] El Mendili Y, Vaitkus A, Merkys A, et al. Raman Open Database: first interconnected Raman-X-ray diffraction open-access resource for material identification[J]. Journal of Applied Crystallography, 2019, 52(3): 618-625.
[16] Eilers P H C. A perfect smoother[J]. Analytical Chemistry, 2003, 75(14): 3631-3636.
[17] Eilers P H C. Parametric time warping[J]. Analytical Chemistry, 2004, 76(2): 404-411.
[18] Baek S J, Park A, Ahn Y J, et al. Baseline correction using asymmetrically reweighted penalized least squares smoothing[J]. The Analyst, 2015, 140(1): 250-257.
[19] Dobrzhinetskaya L, Mukhin P, Wang Q, et al. Moissanite (SiC) with metal-silicide and silicon inclusions from tuff of Israel: Raman spectroscopy and electron microscope studies[J]. Lithos, 2018, 310/311: 355-368.
[20] Muñoz E C, Gosetti F, Ballabio D, et al. Characterization of pyrite weathering products by Raman hyperspectral imaging and chemometrics techniques[J]. Microchemical Journal, 2023, 190: 108655.
[21] Prieto-Taboada N, Gómez-Laserna O, Martínez-Arkarazo I, et al. Raman spectra of the different phases in the CaSO4-H2O system[J]. Analytical Chemistry, 2014, 86(20): 10131-10137.
Hai Wang, Ning Huang, Ze He, Peng Wang, Jingxi Yuan. Application of Improved Symmetric Zero-Area Conversion Peak-Seeking Algorithm in Raman Spectroscopy[J]. Acta Optica Sinica, 2024, 44(3): 0330001.