Spectroscopy and Spectral Analysis, 2014, 34 (10): 2868, published online: 2014-10-23

A Scaling Method for Nuclear Magnetic Resonance Spectral Data Based on K-L Divergence

A Novel Metabolomic Data Scaling Method Based on K-L Divergence
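Author affiliations
1 Department of Electronic Science, Xiamen University, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen 361005, Fujian, China
2 Department of Communication Engineering, Xiamen University, Xiamen 361005, Fujian, China
3 Department of Bioprocess Engineering & Institute of Bioproduct Development, Universiti Teknologi Malaysia, Skudai 81310, Malaysia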
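Abstract (translated from the Chinese)
In NMR-based metabolomic data analysis, scaling is one of the key preprocessing steps. Its main purpose is to adjust the variance structure of the data so as to improve the results of subsequent multivariate statistical analysis. Starting from the perspective of information entropy, the Kullback-Leibler (K-L) divergence is used to measure the degree of difference between the 1H NMR spectral data of biological samples from different experimental groups and, combined with unit variance scaling, a scaling method based on K-L divergence is proposed. The method first applies unit variance scaling to bring the standard deviations of all variables to the same level, and then uses the K-L divergence to weight each variable in a supervised manner, enhancing important variables and attenuating irrelevant ones. Because the K-L divergence measures the difference between data in terms of probability distributions, and applies to both Gaussian and non-Gaussian data, it can measure the differences between the 1H NMR spectral data of different experimental groups more accurately, and can therefore identify and weight the important variables of the spectral data more effectively. Analysis of 1H NMR spectral data from human urine shows that the K-L divergence-based scaling method effectively suppresses noise variables while clearly separating characteristic from non-characteristic variables; improves the discriminative ability of the principal component regression (PCR) model; and improves the explanatory ability, the predictive ability, and the ability to identify characteristic metabolites of the partial least squares discriminant analysis (PLS-DA) model.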
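For reference, the quantity named in the abstract can be written out as follows. This is only the textbook definition of the K-L divergence between two probability densities; the abstract does not say which estimator, symmetrization, or normalization the authors use, so the symbols $p$ and $q$ (the distributions of one spectral variable in the two experimental groups) are assumptions introduced here for illustration.

```latex
% Standard definition of the K-L divergence between densities p and q
D_{\mathrm{KL}}(p \,\|\, q) = \int p(x)\,\ln\frac{p(x)}{q(x)}\,\mathrm{d}x \;\ge\; 0,
\qquad D_{\mathrm{KL}}(p \,\|\, q) = 0 \iff p = q \ \text{almost everywhere}.

% Closed form when both groups are univariate Gaussian,
% p = N(\mu_1, \sigma_1^2), q = N(\mu_2, \sigma_2^2)
D_{\mathrm{KL}}(p \,\|\, q)
  = \ln\frac{\sigma_2}{\sigma_1}
  + \frac{\sigma_1^{2} + (\mu_1 - \mu_2)^{2}}{2\,\sigma_2^{2}}
  - \frac{1}{2}.
```

The Gaussian case shows why a distribution-level measure can be more informative than a comparison of means alone: the divergence grows both when the group means differ and when the group variances differ. The general integral form carries no Gaussian assumption.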
Abstract
This study proposes a new scaling method for NMR metabolomic data based on Kullback-Leibler (K-L) divergence. The proposed method, called K-L scaling, is supervised: group information is incorporated into the scaling procedure. Because K-L divergence measures the difference between two datasets in terms of their probability distributions, it can be applied to data that follow either Gaussian or non-Gaussian distributions. In K-L scaling, all variables are first standardized to unit variance, and their variances are then adjusted using the K-L divergence to highlight significant variables. K-L scaling thus captures the differences in spectral data points between two experimental groups, increasing the weights of biologically relevant variables while reducing the weights of noise and uninformative variables. The method was applied to a 1H NMR metabolomic dataset acquired from human urine. The results showed that the new scaling method efficiently suppresses the contribution of noise to the resulting multivariate model. In addition, it increases the weights of important variables and improves the interpretability and predictability of subsequent principal component regression (PCR) and partial least squares discriminant analysis (PLS-DA). Furthermore, the scaling method facilitated the identification of metabolic signatures. These results suggest that K-L scaling may be a useful alternative for the preprocessing of NMR-based metabolomic data.
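The two steps described in the abstract (unit variance standardization, then supervised weighting by K-L divergence) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the abstract does not give the estimator or the weighting formula, so the histogram-based estimate, the symmetrized form of the divergence, the pseudo-count, the normalization of the weights to a maximum of 1, and the function names `symmetric_kl` and `kl_scale` are all assumptions.

```python
import numpy as np


def symmetric_kl(a, b, bins=10, pseudo=0.5):
    """Symmetrized K-L divergence between two 1-D samples.

    Assumption: densities are estimated with histograms on shared bin
    edges, and a pseudo-count keeps empty bins from giving log(0).
    """
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    if hi == lo:  # constant variable: the groups cannot differ
        return 0.0
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(a, bins=edges)[0] + pseudo
    q = np.histogram(b, bins=edges)[0] + pseudo
    p, q = p / p.sum(), q / q.sum()
    # KL(p||q) + KL(q||p) collapses to sum((p - q) * log(p / q))
    return float(np.sum((p - q) * np.log(p / q)))


def kl_scale(X, y, bins=10):
    """Unit variance scaling followed by per-variable K-L weighting.

    X: (n_samples, n_variables) spectral data; y: two-group labels.
    Returns the scaled matrix and the weights (hypothetical scheme:
    weights are divergences normalized so the largest equals 1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    if groups.size != 2:
        raise ValueError("this sketch handles exactly two groups")

    # Step 1: bring every variable to zero mean and unit variance.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant variables at zero
    Z = (X - X.mean(axis=0)) / std

    # Step 2: weight each variable by how differently it is
    # distributed in the two groups.
    A, B = Z[y == groups[0]], Z[y == groups[1]]
    w = np.array([symmetric_kl(A[:, j], B[:, j], bins)
                  for j in range(Z.shape[1])])
    if w.max() > 0:
        w = w / w.max()
    return Z * w, w


if __name__ == "__main__":
    # Synthetic stand-in for spectral data: only variable 0 differs
    # between the two groups; the others are pure noise.
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 30)
    X = rng.normal(size=(60, 3))
    X[y == 1, 0] += 5.0
    X_scaled, weights = kl_scale(X, y)
    print("weights:", np.round(weights, 3))
```

After this scaling, each variable's standard deviation equals its weight, so variables whose distributions differ strongly between groups dominate a subsequent PCR or PLS-DA model, while noise variables are shrunk toward zero.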

DENG Ling-li, Cheng Kian-Kai, SHEN Gui-ping, ZHOU Ling, LIU Xin-zhuo, DONG Ji-yang, CHEN Zhong. A Novel Metabolomic Data Scaling Method Based on K-L Divergence[J]. Spectroscopy and Spectral Analysis, 2014, 34(10): 2868.
