Global-local feature attention network with reranking strategy for image caption generation

WU Jie; XIE Si-ya; SHI Xin-bao; CHEN Yao-wen

doi:10.1007/s11801-017-7185-4

Optoelectronics Letters (English edition), 2017, 13 (6): 448, Published Online: Sep. 13, 2018

WU Jie ¹, XIE Si-ya ¹, SHI Xin-bao ¹, CHEN Yao-wen ²,*

Author Affiliations

¹ College of Engineering, Shantou University, Shantou 515063, China

² Key Laboratory of Digital Signal and Image Processing of Guangdong, Shantou University, Shantou 515063, China

Abstract

In this paper, a novel framework named global-local feature attention network with reranking strategy (GLAN-RS) is presented for the image captioning task. Rather than adopting only the unitary visual information used in classical models, GLAN-RS uses an attention mechanism to capture local convolutional salient image maps. Furthermore, we adopt a reranking strategy to adjust the priority of the candidate captions and select the best one. The proposed model is verified on the Microsoft Common Objects in Context (MSCOCO) benchmark dataset across seven standard evaluation metrics. Experimental results show that GLAN-RS significantly outperforms state-of-the-art approaches such as the multimodal recurrent neural network (MRNN) and Google NIC, improving the BLEU4 score by 20% and the CIDEr score by 13 points.
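The two ideas named in the abstract, attention over local features and reranking of candidate captions, can be illustrated with a minimal sketch. The abstract gives no details of either component, so everything here is an assumption for illustration and not the authors' implementation: the dot-product scoring in `attend`, the concatenation in `fuse`, and the length-normalized log-probability used as the reranking score are all hypothetical stand-ins.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(local_feats, query):
    """Soft attention over local feature vectors.

    Assumption: relevance is a plain dot product between each local
    feature and a query vector (for example a decoder state).
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in local_feats]
    weights = softmax(scores)
    dim = len(local_feats[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, local_feats))
        for d in range(dim)
    ]
    return context, weights


def fuse(global_feat, context):
    """Assumption: global and attended local features are concatenated."""
    return list(global_feat) + list(context)


def length_normalized_score(candidate):
    """Hypothetical reranking score: log-probability per word."""
    caption, log_prob = candidate
    return log_prob / max(len(caption.split()), 1)


def rerank(candidates, score_fn=length_normalized_score):
    """Return candidate captions sorted best first under score_fn."""
    return sorted(candidates, key=score_fn, reverse=True)


# Toy usage with made-up numbers.
context, weights = attend([[1.0, 0.0], [0.0, 1.0]], query=[1.0, 0.0])
fused = fuse([0.5, 0.5], context)
ranked = rerank([("a dog", -2.0), ("a dog runs on grass", -4.0)])
```

In this toy run the first local region receives the larger attention weight because it aligns with the query, and the longer caption ranks first because its per-word log-probability is higher. A real system would learn the attention scoring and would likely use a different reranking criterion.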

WU Jie, XIE Si-ya, SHI Xin-bao, CHEN Yao-wen. Global-local feature attention network with reranking strategy for image caption generation[J]. Optoelectronics Letters (English edition), 2017, 13(6): 448.
