Laser & Optoelectronics Progress, 2017, 54(10): 103001. Published online: 2017-10-09

Feature Selection Algorithm Application in Near-Infrared Spectroscopy Classification Based on Binary Search Combined with Random Forest Pruning
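Author affiliations
1 College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100
2 Technology Center, China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650024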
Abstract
Random forest (RF) feature selection in high-dimensional spaces suffers from cumbersome computation, large memory overhead, and low classification accuracy. To address these problems, a feature selection algorithm that combines binary search (BS) with random forest pruning (RFP), named BSRFP, is proposed. The algorithm first obtains feature importance scores from the Gini index of node purity and deletes the features with low scores. It then uses the BS algorithm together with a pruning technique based on the diversity among base classifiers to obtain the optimal feature subset and the classifier with the highest classification accuracy. To verify the effectiveness of the algorithm, a cigarette quality recognition model is built and compared with other methods. The results show that the BS algorithm simplifies the feature search process and the RFP algorithm reduces the size of the random forest. The classification accuracy of the RFP algorithm reaches 96.47%. The features selected by BSRFP are more strongly relevant, and the algorithm identifies cigarette quality with higher accuracy.
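The abstract names three ingredients (Gini-based scoring, a binary search over features, and diversity-based pruning of base classifiers) but does not give the exact search criterion or diversity measure. The pure-Python sketch below is therefore only an illustration under stated assumptions, not the authors' implementation: the function names `gini_impurity`, `binary_search_top_k`, and `prune_by_diversity` are hypothetical, the search assumes the score does not decrease as more top-ranked features are kept, and diversity is assumed to be the mean pairwise disagreement rate between trees.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def binary_search_top_k(
    ranked: Sequence[int],
    score: Callable[[Sequence[int]], float],
    tol: float = 0.0,
) -> list[int]:
    """Shortest prefix of `ranked` that scores within `tol` of the full list.

    `ranked` holds feature indices from most to least important.
    Assumption: `score` is roughly non-decreasing in the prefix length;
    otherwise the binary search is only a heuristic.
    """
    target = score(ranked) - tol
    lo, hi = 1, len(ranked)
    while lo < hi:
        mid = (lo + hi) // 2
        if score(ranked[:mid]) >= target:
            hi = mid      # a prefix of length mid is good enough; try shorter
        else:
            lo = mid + 1  # too few features; keep more
    return list(ranked[:lo])


def majority_vote(preds: Sequence[Sequence[int]]) -> list[int]:
    """Per-sample majority label; ties go to the smallest label."""
    votes = []
    for sample in zip(*preds):
        counts = Counter(sample)
        votes.append(max(sorted(counts), key=counts.__getitem__))
    return votes


def prune_by_diversity(
    tree_preds: Sequence[Sequence[int]], y: Sequence[int]
) -> tuple[list[int], float]:
    """Greedily drop low-diversity trees while vote accuracy does not fall.

    `tree_preds[i]` is tree i's predictions on a validation set with labels `y`.
    Returns the indices of the retained trees and their ensemble accuracy.
    """
    n_trees = len(tree_preds)

    def ensemble_accuracy(idx: Sequence[int]) -> float:
        voted = majority_vote([tree_preds[i] for i in idx])
        return sum(p == t for p, t in zip(voted, y)) / len(y)

    def diversity(i: int) -> float:
        # Mean pairwise disagreement rate between tree i and every other tree.
        return sum(
            sum(a != b for a, b in zip(tree_preds[i], tree_preds[j])) / len(y)
            for j in range(n_trees) if j != i
        ) / max(n_trees - 1, 1)

    keep = list(range(n_trees))
    best = ensemble_accuracy(keep)
    for i in sorted(keep, key=diversity):  # least diverse trees first
        if len(keep) == 1:
            break
        trial = [j for j in keep if j != i]
        acc = ensemble_accuracy(trial)
        if acc >= best:                    # pruning did not hurt accuracy
            keep, best = trial, acc
    return keep, best
```

In a real pipeline, `score` would typically be something like the validation accuracy of a forest trained on the given feature subset, and `tree_preds` would come from each trained tree's predictions on held-out spectra. Both are left abstract here so the sketch stays independent of any particular library.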

Liu Ming, Li Zhongren, Zhang Haitao, Yu Chunxia, Tang Xinghong, Ding Xiangqian. Feature Selection Algorithm Application in Near-Infrared Spectroscopy Classification Based on Binary Search Combined with Random Forest Pruning[J]. Laser & Optoelectronics Progress, 2017, 54(10): 103001.
