Spectroscopy and Spectral Analysis, 2012, 32 (2): 510, published online: 2012-02-20
Data Mining Approach to Cataclysmic Variables Candidates Based on Random Forest Algorithm
Abstract
This paper presents a method for automatically and rapidly screening cataclysmic variables (CVs) in the massive spectral data set of the Guo Shoujing Telescope (LAMOST). Spectra of previously identified CVs serve as templates. A random forest classifier is trained on these templates together with randomly selected spectra, and the resulting model ranks the flux at each wavelength by importance. This ranking is used to reduce the dimensionality of the spectra and to build a classifier that excludes most non-candidates. Template matching then identifies the final candidates, which are analyzed and fed back to enrich the template library. The experiment found 16 new CV candidates, indicating that the approach is feasible for finding special celestial objects in LAMOST.
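The ranking and dimensionality-reduction steps described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn's `RandomForestClassifier`, uses synthetic spectra in place of LAMOST data, and places a hypothetical emission feature at pixels 100 to 104. The function names, the number of trees, and the number of retained wavelengths (`n_keep`) are all illustrative choices. The template-matching stage is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_synthetic_spectra(n_per_class=200, n_pixels=300, seed=0):
    """Build toy spectra (assumption: stand-in for real survey data).

    Class 1 ("CV-like") spectra receive an extra emission bump at the
    hypothetical pixels 100-104; class 0 spectra are noisy flat continua.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(1.0, 0.1, size=(2 * n_per_class, n_pixels))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, 100:105] += 1.0
    return X, y


def rank_and_reduce(X, y, n_keep=20, seed=0):
    """Rank wavelengths by random forest importance, then retrain on the top ones."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)

    # Pixel indices sorted from most to least important.
    order = np.argsort(forest.feature_importances_)[::-1]

    # Keep only the n_keep most important pixels (dimensionality reduction).
    keep = np.sort(order[:n_keep])

    # Train a second, smaller classifier on the reduced spectra.
    reduced = RandomForestClassifier(n_estimators=200, random_state=seed)
    reduced.fit(X[:, keep], y)
    return order, keep, reduced


X, y = make_synthetic_spectra()
order, keep, reduced = rank_and_reduce(X, y)

# Evaluate the reduced classifier on freshly generated spectra.
X_new, y_new = make_synthetic_spectra(seed=1)
accuracy = reduced.score(X_new[:, keep], y_new)
print("most important pixel:", order[0])
print("accuracy on reduced spectra:", accuracy)
```

In a real survey setting, the positive class would be the identified CV template spectra and the negative class the randomly selected spectra; spectra the reduced classifier flags would then go on to template matching.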
JIANG Bin (姜斌), LUO A-li (罗阿理), ZHAO Yong-heng (赵永恒). Data Mining Approach to Cataclysmic Variables Candidates Based on Random Forest Algorithm[J]. Spectroscopy and Spectral Analysis, 2012, 32(2): 510.