Multi-label classification based on attention mechanism and semantic dependencies

Xue Lixia; Jiang Di; Wang Ronggui; Yang Juan

doi:10.12086/oee.2019.180468

Opto-Electronic Engineering, 2019, 46(9): 180468. Published online: 2019-10-14

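Xue Lixia, Jiang Di, Wang Ronggui, Yang Juan*

Author affiliation

School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China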

Keywords: multi-label image classification; convolutional neural network; attention mechanism; semantic dependencies

Abstract

Convolutional neural networks perform well on single-label image classification, but applying them effectively to multi-label image classification remains an important challenge. Multi-label image classification generalizes the single-label task: it aims to assign multiple labels to an image so as to fully express the specific visual concepts the image contains. We propose a method based on convolutional neural networks that combines an attention mechanism with semantic relevance to address the multi-label problem. First, a convolutional neural network extracts features. Second, the attention mechanism establishes a correspondence between each label category in the dataset and each channel of the output feature map. Finally, the correlations between channels, which are essentially the semantic dependencies between labels, are learned in a supervised manner. Experimental results show that the proposed method effectively learns the semantic dependencies between labels and improves multi-label image classification performance.
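The three steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture, which the abstract does not specify. It assumes a 1x1 convolution to produce one channel per label, a spatial softmax as the attention, and a single dense layer over the per-label scores to model label correlations. The backbone is replaced by random features, and every name here (`forward`, `w_label`, `w_corr`, `b_corr`) is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_softmax(maps):
    # maps: (K, H, W); softmax over the H*W positions of each channel
    k, h, w = maps.shape
    flat = maps.reshape(k, h * w)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(flat)
    return (e / e.sum(axis=1, keepdims=True)).reshape(k, h, w)

def forward(features, w_label, w_corr, b_corr):
    """features: (C, H, W) feature map from any CNN backbone (step 1).
    w_label: (K, C) assumed 1x1-convolution weights, one output channel per label.
    w_corr: (K, K), b_corr: (K,) assumed dense layer over the label channels."""
    # Step 2: tie each label to one channel, then attend over spatial positions
    label_maps = np.einsum("kc,chw->khw", w_label, features)
    attention = spatial_softmax(label_maps)
    scores = (attention * label_maps).sum(axis=(1, 2))  # one score per label
    # Step 3: mix per-label scores so each label can use evidence from the others
    logits = w_corr @ scores + b_corr
    return attention, sigmoid(logits)

def bce_loss(probs, targets, eps=1e-7):
    # Supervised multi-label objective: mean binary cross-entropy over labels
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs)
                          + (1.0 - targets) * np.log(1.0 - probs)))

# Toy run: C=8 feature channels, K=3 labels, 4x4 spatial grid (all made up)
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))      # stand-in for backbone output
w_label = rng.normal(size=(3, 8)) * 0.1
w_corr = np.eye(3) + rng.normal(size=(3, 3)) * 0.1
b_corr = np.zeros(3)
targets = np.array([1.0, 0.0, 1.0])        # ground-truth label vector

attention, probs = forward(features, w_label, w_corr, b_corr)
loss = bce_loss(probs, targets)
```

In a trained model, `w_label`, `w_corr`, and `b_corr` would be learned by minimizing `loss` over a labeled dataset; the off-diagonal entries of `w_corr` are where co-occurrence between labels could be captured under these assumptions.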

Xue Lixia, Jiang Di, Wang Ronggui, Yang Juan. Multi-label classification based on attention mechanism and semantic dependencies[J]. Opto-Electronic Engineering, 2019, 46(9): 180468.
