Chinese Journal of Lasers, 2019, 46 (11): 1109001, Published online: 2019-11-09

Algorithm for Video Temporal Action Proposal Combining Watershed and Regression Networks
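Author Affiliations
1 College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819
2 Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning 110169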
Abstract
A two-stage action proposal network is designed for the temporal action proposal task. The first stage applies a modified watershed algorithm to a one-dimensional temporal signal and generates candidate regions of various lengths by immersion clustering, which gives a coarse localization of the temporal boundaries of actions. A temporal pyramid structuring method is then proposed: a context module for the action segment is introduced, and the main content of each candidate region and its context are modeled structurally to generate an enhanced global feature. The second stage uses a temporal-coordinate regression algorithm to localize the action boundaries, and an action/background classifier is added to filter out background candidate regions, yielding more accurate temporal boundaries. The entire network is trained on unit-level features extracted by a three-dimensional convolutional neural network (C3D), which capture rich semantics in both the temporal and spatial domains of the video and considerably improve training efficiency while improving accuracy. Experiments on two benchmark datasets, Thumos 14 and ActivityNet, show that the proposed two-stage algorithm achieves the highest average recall among the compared methods and can effectively improve the precision of action localization.
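The coarse first stage can be illustrated with a minimal sketch. The abstract does not specify the modified watershed algorithm, so the code below shows only the plain idea of flooding a one-dimensional signal at several water levels and grouping contiguous units into candidate regions. The per-unit actionness scores, the function names `flood_regions` and `watershed_proposals`, and the level values are all hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Set, Tuple

Region = Tuple[int, int]  # half-open unit interval [start, end)


def flood_regions(actionness: Sequence[float], level: float) -> List[Region]:
    """Return maximal runs of consecutive units whose score is >= level.

    Viewing (1 - score) as terrain height, these runs are the basins
    that are under water once the water has risen to (1 - level).
    """
    regions: List[Region] = []
    start = None
    for i, score in enumerate(actionness):
        if score >= level and start is None:
            start = i                      # a basin begins
        elif score < level and start is not None:
            regions.append((start, i))     # the basin ends before unit i
            start = None
    if start is not None:                  # basin runs to the end of the video
        regions.append((start, len(actionness)))
    return regions


def watershed_proposals(
    actionness: Sequence[float],
    levels: Sequence[float] = (0.3, 0.5, 0.7),  # assumed values
) -> List[Region]:
    """Collect candidate regions of different lengths over several levels."""
    proposals: Set[Region] = set()
    for level in levels:
        proposals.update(flood_regions(actionness, level))
    return sorted(proposals)


scores = [0.1, 0.6, 0.8, 0.4, 0.1, 0.9, 0.2]  # hypothetical unit-level scores
print(watershed_proposals(scores))
```

Lower levels merge neighbouring units into longer regions and higher levels keep only the most confident cores, which is how a single signal yields candidates of several lengths. In the two-stage design described above, such coarse regions would then be passed to the regression stage for boundary refinement.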

Yunwen Huang, Fei Wang, Jinghong Li, Guorui Wang. Algorithm for Video Temporal Action Proposal Combining Watershed and Regression Networks[J]. Chinese Journal of Lasers, 2019, 46(11): 1109001.
