光谱学与光谱分析, 2019, 39 (10): 3292, 网络出版: 2019-11-05   

基于XGBOOST的恒星光谱分类特征数值化

XGBOOST Based Stellar Spectral Classification and Quantized Feature
作者单位
1 中国科学院国家天文台光学天文重点实验室, 北京 100101
2 中国科学院大学, 北京 100049
摘要
恒星光谱分类是研究恒星的基础性工作之一, 常用的光谱分类是基于20世纪70年代Morgan和Keenan建立起来的并逐步完善的MK分类系统。 然而基于MK规则的交互式决策分类系统对处理海量天文光谱数据存在着一定的困难。 目前光谱巡天一般采用的自动化分类则是模版匹配方法而忽略对谱线特征的测量。 怎样自动、 客观地提取海量光谱中的分类特征并应用这些特征进行分类可以对天体的物理化学性质的统计分析至关重要。 针对此问题, 通过机器学习和计算光谱的谱线指数结合的方法, 提取光谱特征, 并通过大数据分析定量地确定对光谱特征谱线的分类判据(数值化), 确定每一类光谱具有物理意义的特征谱线的强度分布。 首先对LAMOST DR4恒星光谱测量其谱线指数作为输入, 光谱的分类标记采用官方发布的分类结果。 使用XGBoost算法进行自动分类及特征排序, 从而获得已知或未知的对于分类决策最为敏感的谱线。 首先, 选取高信噪比(S/N>30)、 被LAMOST标记为B, A, F和M的恒星光谱数据, 总计约414万个。 然后, 对光谱数据计算谱线指数从而使其得到降维处理, 过滤冗余信息。 其次, 将处理后的恒星光谱数据随机划分为训练集和测试集, 通过适当调整算法参数, 用训练集得到所需要的分类决策树模型, 用测试集测试其稳定性和可用性, 以防止出现过拟合, 同时使用算法自带函数进行提取分类特征。 最后, 输出并整理实验中算法所得的决策树模型, 并挑选其概率比较大的分支作为最终的决策树模型。 通过实验, 可以发现在固定参数下, XGBoost所得的模型有一定的自适应性, 较少受数据集影响, 总体准确率可达88.5%; 同时其所输出的分类决策树与已知的特征较为吻合, 而且可以获得基于大数据的、 数值化的特征谱线对应分类的范围, 为完善基于特征的分类提供定量的规则。
Abstract
Star spectral classification is a foundational work of stellar research. The Morgan-Keenan (MK) classification system which was developed in 1970s is the most widely used classical classification system. However, MK based interactive decision classification system has some difficulties when dealing with massive quantity of astronomical spectral data. Nowadays the most widely used method of automatically classification is template match which neglects measuring the spectral line. As a result, one of the most popular topics is how to extract features from massive data objectively and precisely and to apply the features for making classification decisions. In this paper, we processed the spectral data of LAMOST DR4 stars to obtain the line index as input data and used the official released labels of the spectrum as outcome. The XGBoost algorithm was applied to automatically classify the stellar spectra and rank the features. In this way, the identified and potential line indices which are sensitive to classification were revealed. Firstly, we labeled and selected the spectral data of stars with B, A, F and M by LAMOST high signal-to-noise ratio (S/N>30) with the sample size amounting to around 41. 4 million. Then, the line indices of spectral data was calculated to reduce the dimension and to filter out the redundant information. Secondly, the processed star spectral data were randomly divided to a training set and a test set. By modifying the parameters, the required classification decision tree model was fitted by training set using XGBoost algorithm and the stability and availability of the model were validated by test set to avoid over-fitting. In the meantime, the classification features were extracted by the algorithm’s own function. Finally, the branch with the highest probabilities was selected as the final decision tree model. Through experiments, it is shown that the XGBoost model has a better performance in self-adaptability under fixed parameters with less affection in data sets and the overall accuracy rate as high as 88.5%. Moreover, the output classification decision tree is more consistent with identified features and the numerical characteristics of spectrum and its corresponding range are obtainable through the model. This would shed light on providing quantitative rules for evaluating classification decision trees with numerical spectral features.
参考文献

[1] Luo Ali, Zhao Yongheng, et al. Research in Astronomy and Astrophysics, 2015, 15(8): 1095.

[2] Schierscher F, Paunzen E. Astronomische Nachrichten, 2011, 332(6): 597.

[3] Hampton E J, Medling A M, Groves B, et al. Monthly Notices of the Royal Astronomical Society, 2017, 470: 3395.

[4] Liu Chao, Cui Wenyuan, et al. Research in Astronomy and Astrophysics, 2015, 15(8): 1137.

[5] Du Changde, Luo A, Yang H. New Astronomy, 2017, 51: 51.

[6] Worthey G, Faber S M, Gonzalez J Jesus, et al. The Astrophysical Journal Supplement Series, 94(2): 687.

[7] WANG Guang-pei, PAN Jing-chang, YI Zhen-ping, et al(王光沛, 潘景昌, 衣振萍, 等). Spectroscopy and Spectral Analysis(光谱学与光谱分析), 2016, 36(8): 2646.

[8] Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016. 785.

[9] Gray R O, Corbally C J. Stellar Spectral Classification. Princeton University Press, 2009. 160.

张枭, 罗阿理. 基于XGBOOST的恒星光谱分类特征数值化[J]. 光谱学与光谱分析, 2019, 39(10): 3292. ZHANG Xiao, LUO A-li. XGBOOST Based Stellar Spectral Classification and Quantized Feature[J]. Spectroscopy and Spectral Analysis, 2019, 39(10): 3292.

本文已被 1 篇论文引用
被引统计数据来源于中国光学期刊网
引用该论文: TXT   |   EndNote

相关论文

加载中...

关于本站 Cookie 的使用提示

中国光学期刊网使用基于 cookie 的技术来更好地为您提供各项服务,点击此处了解我们的隐私策略。 如您需继续使用本网站,请您授权我们使用本地 cookie 来保存部分信息。
全站搜索
您最值得信赖的光电行业旗舰网络服务平台!