Optimization of Characteristic Wavelength Variables of Near Infrared Spectroscopy for Detecting Contents of Cellulose and Hemicellulose in Corn Stover
Pretreatment is an effective way to improve the efficiency of corn stover bioconversion. When corn stover is converted to biofuels by biorefining, the conversion rate is directly related to the cellulose and hemicellulose contents of the feedstock. To allow effective control of the biorefining of pretreated corn stover, near-infrared spectroscopy (NIRS) was proposed for rapid determination of the cellulose and hemicellulose contents, avoiding the slow speed and high cost of traditional chemical analysis. To improve the efficiency and precision of NIRS detection, a genetic simulated annealing algorithm (GSA), which combines the genetic algorithm (GA) with the simulated annealing algorithm (SA), was constructed to select the characteristic NIRS wavelengths for cellulose and hemicellulose in pretreated corn stover. The GSA uses binary coding with a code length equal to the number of NIRS wavelength points, takes the root mean square error of cross-validation (RMSECV) of a partial least squares (PLS) regression model as the objective function, designs the fitness function around the temperature parameter, and selects and replicates perturbed solutions according to the Metropolis criterion. This allows it to avoid premature convergence while improving search efficiency in the later stages of evolution. A total of 120 samples were prepared from collected corn stover by alkaline pretreatment, biological pretreatment, and a combination of the two. Their cellulose and hemicellulose contents were measured by wet chemistry, and their NIR spectra were collected with a Nicolet Antaris II Fourier transform near-infrared spectrometer. The spectra were preprocessed by 7-point Savitzky-Golay smoothing combined with multiplicative scatter correction (MSC) and standard normal variate (SNV) transformation.
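The Metropolis-based acceptance of perturbed solutions described above can be illustrated with a short sketch. This is not the authors' implementation: it shows only a binary wavelength mask and the Metropolis acceptance step of simulated annealing, and it omits the genetic operators and the temperature-dependent fitness function, whose exact forms the abstract does not give. The function `toy_cost` is a hypothetical stand-in for the RMSECV of a PLS model, and the cooling schedule and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def metropolis_accept(old_cost, new_cost, temperature, rng):
    """Metropolis criterion for a minimisation problem."""
    if new_cost <= old_cost:
        return True  # a solution that is no worse is always accepted
    # a worse solution is accepted with probability exp(-delta / T)
    return rng.random() < math.exp(-(new_cost - old_cost) / temperature)


def perturb(mask, rng):
    """Flip one randomly chosen bit, i.e. include or exclude one wavelength."""
    child = list(mask)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def toy_cost(mask):
    """Hypothetical stand-in for RMSECV (needs a mask of at least 8 bits).

    Pretends that wavelengths 2, 5 and 7 are informative: each one left
    out costs 1.0, and each uninformative wavelength kept costs 0.1.
    """
    informative = {2, 5, 7}
    missed = sum(1 for i in informative if mask[i] == 0)
    extra = sum(1 for i, bit in enumerate(mask) if bit == 1 and i not in informative)
    return missed + 0.1 * extra


def anneal(cost, n_bits, t_start=1.0, t_end=1e-3, cooling=0.95,
           steps_per_t=20, seed=0):
    """Simulated annealing over binary masks with geometric cooling (assumed)."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    current_cost = cost(current)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = perturb(current, rng)
            candidate_cost = cost(candidate)
            if metropolis_accept(current_cost, candidate_cost, t, rng):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # lower the temperature so worse moves become rarer
    return best, best_cost


if __name__ == "__main__":
    mask, value = anneal(toy_cost, n_bits=10)
    print(mask, value)
```

At high temperature almost any perturbed mask is accepted, which keeps the search from settling too early. As the temperature falls, the rule accepts little besides improvements, which is the behaviour the abstract credits for efficient late-stage search. In a real application the cost function would fit a PLS model on the wavelengths whose bits are 1 and return its RMSECV.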
The samples were divided into a calibration set and a validation set at a ratio of 3:1 using the Kennard-Stone algorithm. The GSA was then applied in three ways: to the full spectrum (Full-GSA), to the spectral regions selected by synergy interval partial least squares (SiPLS-GSA), and to the spectral regions selected by backward interval partial least squares (BiPLS-GSA). The selected wavelengths were evaluated with PLS regression models on the validation set. Full-GSA used the 1557 wavelength points of the full spectrum as chromosome genes and, over 16 runs of the algorithm, selected 118 characteristic wavelength points for cellulose and 164 for hemicellulose. In SiPLS-GSA, the regions selected by SiPLS contained 388 wavelength points for cellulose and 160 for hemicellulose; further selection by GSA gave 157 and 148 characteristic wavelength points, respectively. In BiPLS-GSA, the regions selected by BiPLS contained 358 wavelength points for cellulose and 180 for hemicellulose; further selection by GSA gave 130 and 153 characteristic wavelength points, respectively. The results show that wavelength selection not only substantially reduced the number of wavelength points used in modeling but also gave regression models that clearly outperformed full-spectrum modeling. The best regression performance for cellulose was obtained with Full-GSA, and the best for hemicellulose with SiPLS-GSA.
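The Kennard-Stone split mentioned at the start of this paragraph can be sketched as follows. This is a minimal version of the standard algorithm under stated assumptions, not the authors' code: distances are Euclidean, the first two samples are the farthest-apart pair, and the helper name `split_3_to_1` is hypothetical. Applied exactly, a 3:1 ratio would put 90 of the 120 samples in the calibration set and 30 in the validation set.

```python
import math


def kennard_stone(samples, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # pairwise Euclidean distances between all samples
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # start with the two samples that are farthest apart
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    while len(selected) < n_select:
        # add the sample whose nearest already-selected neighbour is farthest away
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


def split_3_to_1(samples):
    """Hypothetical helper: 3/4 of the samples for calibration, the rest for validation."""
    n_cal = (3 * len(samples)) // 4
    calibration = kennard_stone(samples, n_cal)
    validation = [i for i in range(len(samples)) if i not in calibration]
    return calibration, validation


if __name__ == "__main__":
    # four one-dimensional "spectra", purely illustrative
    points = [(0.0,), (1.0,), (2.0,), (10.0,)]
    print(split_3_to_1(points))
```

Because each new calibration sample is the one farthest from those already chosen, the calibration set spans the spectral space evenly and the validation samples fall inside that span.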
For these best models, the mean relative errors (MRE) on the validation set were 1.7524% for cellulose and 2.0208% for hemicellulose, which are 13.6366% and 25.3684% lower, respectively, than those of full-spectrum modeling. The GSA, whose fitness function is designed around the temperature parameter, has good global search capability and is suitable for selecting characteristic NIRS wavelengths for the cellulose and hemicellulose contents of corn stover. Its encoding scheme, in which each wavelength point of the full spectrum is a chromosome gene, is suitable for characteristic wavelength selection over the full NIRS spectrum. The GSA is equally suitable for the spectral regions selected by SiPLS and BiPLS, where it effectively refines the selection to individual wavelength points.
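The error measure reported above can be made concrete with a small sketch. The abstract does not give the formula, so this assumes the common definition of MRE as the mean of |predicted - measured| / |measured|, expressed in percent, and assumes the quoted decreases are relative to the full-spectrum MRE. The function names and the numbers in the example are hypothetical and are not data from the study.

```python
def mean_relative_error(measured, predicted):
    """Mean relative error in percent (assumed definition, see lead-in)."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # made-up contents in percent of dry matter, for illustration only
    measured = [40.0, 50.0]
    predicted = [41.0, 49.0]
    print(mean_relative_error(measured, predicted))
    print(relative_reduction(2.0, 1.5))
```

Under this reading, a drop in MRE from a hypothetical 2.0% to 1.5% would be reported as a 25% decrease, which is the sense in which the abstract's 13.6366% and 25.3684% are interpreted here.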
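Funding: Supported by the National Key Technology R&D Program of China (2015BAD21B03), the Harbin Science and Technology Innovation Talent Program (2016RAXXJ009), the Youth Science Foundation of Heilongjiang Province (QC2016033), and the Heilongjiang Bayi Agricultural University Cultivation Project (XZR2017-09)
CHU Xiao-dong: College of Engineering, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
WANG Zhi: College of Engineering, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
XU Yong-hua: College of Electrical and Information, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
LI Wen-zhe: College of Engineering, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
SUN Yong: College of Engineering, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
Note: LIU Jin-ming, born in 1981, is a doctoral candidate at the College of Engineering, Northeast Agricultural University, and an associate professor at the College of Electrical and Information, Heilongjiang Bayi Agricultural University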
LIU Jin-ming, CHU Xiao-dong, WANG Zhi, XU Yong-hua, LI Wen-zhe, SUN Yong. Optimization of Characteristic Wavelength Variables of Near Infrared Spectroscopy for Detecting Contents of Cellulose and Hemicellulose in Corn Stover[J]. Spectroscopy and Spectral Analysis, 2019, 39(3): 743-750