Oriented Object Detection in Remote Sensing Images Based on Feature Recombination
Object detection in optical remote sensing images takes an optical remote sensing image as input and outputs, for each object, a bounding box, a category, and a confidence score. It is an important task in remote sensing image processing with practical value in both civil and military fields. In civil applications, it can be used to monitor aircraft at airports and ships in ports, which supports timely scheduling and helps avoid congestion. In military applications, analyzing captured images reveals an adversary's deployment and supports operational planning. Object detection in remote sensing images therefore has both research significance and application prospects. Detection methods based on convolutional neural networks have replaced traditional algorithms as the mainstream approach for remote sensing images. Deep learning methods achieve higher accuracy than traditional detection methods for visible-light remote sensing images and do not require hand-designed rules, which gives them a relatively unified standard and improves model robustness. However, directly applying detection models designed for natural images to remote sensing tasks still has many shortcomings. Starting from the difficulties of oriented object detection in remote sensing, we design an oriented object detection algorithm for optical remote sensing images that improves feature extraction and recognition for multi-scale, multi-directional small targets in complex backgrounds.
To address the poor performance of general-purpose algorithms on oriented object detection in remote sensing images, we propose an oriented object detection model based on feature recombination and the SWA training strategy. The model is built on the Rotated RPN algorithm. First, a feature recombination mechanism makes the model focus on effective features, which reduces unnecessary computation and improves accuracy. Second, an oriented RPN is built on top of the RPN, and position and angle parameters are regressed with the midpoint offset method to generate high-quality oriented proposals. Third, because classification and regression require inconsistent features, a polarized attention detection head is adopted. Finally, the model is trained in a cyclic mode to alleviate the tendency of the traditional training strategy to converge to the boundary region of the optimal solution.
Specifically, we make the following improvements to Rotated RPN. 1) Remote sensing images contain many small targets, a large proportion of background, and large variation in target size, so the feature pyramid cannot extract effective information when extracting and fusing features, which degrades detection performance. We therefore modify the feature pyramid to strengthen its feature extraction and its ability to fuse information across scales. A reshape module is designed and integrated into CARAFE, and the result serves as the lateral connection at the deepest level of the FPN. 2) To solve the angle discontinuity and edge-order exchange that common oriented box representations exhibit at critical angles, we use the midpoint offset method to define oriented boxes. An adaptive attention module is placed before the proposal generation module to strengthen the representation of effective features. 3) The features used for classification should respond identically to different angles, because classification should focus on the target itself and respond strongly to the effective information inside the predicted box. The features used for regression, in contrast, should be sensitive to angle changes, attend more to the boundary region of the target, and attend less to the information inside the predicted box, so that angle and position are predicted accurately with less interference.
Therefore, to avoid feature interference between tasks and to extract key features, we add a polarized attention module to the shared convolution layer at the front of the dual-branch detection head and use different response functions to separate the features that each task needs: the classification branch uses an activation function and the regression branch uses an inhibition function. 4) Because the traditional training strategy may converge to the boundary region of the optimal solution, we adopt the SWA cyclic training strategy, which trains additional epochs with SGD, collects the corresponding weights, and averages them to obtain weights closer to the optimal solution.
To verify the algorithm, we evaluate it on two remote sensing datasets with oriented annotations, Dior-R and HRSC2016, and compare it with several typical one-stage and two-stage oriented object detection models. On the Dior-R dataset, our algorithm achieves the best accuracy of 64.49%, 4.95 percentage points higher than the baseline model (Table 5). On the HRSC2016 dataset, it achieves the best accuracy of 90.83%, 11.75 percentage points higher than the baseline model (Table 7). We also analyze the gain contributed by each of the feature recombination module, the midpoint offset method, the adaptive attention module, the polarized attention detection head, and the SWA training strategy. The experimental results show that the algorithm detects oriented objects in remote sensing images with complex backgrounds effectively.
To improve the detection of oriented objects in remote sensing images, we propose an oriented object detection model based on feature recombination and polarized attention. The experimental results show that the model detects oriented objects in remote sensing images effectively and performs well across a variety of scenes.
1 Introduction
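Oriented object detection in optical remote sensing images feeds a given optical remote sensing dataset into a detection model and outputs object bounding boxes, object categories, and confidence scores. It has considerable application value in both civil and military fields[1-2]. Object detection in remote sensing images therefore has important research significance and good application prospects.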
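Because of differences in shooting angle and distance, remote sensing images pose detection difficulties that natural images do not: a large number of small targets, large size differences between targets, and complex backgrounds, all of which degrade detection performance. Building on the success of deep learning in detection, many methods have been studied for object detection in remote sensing images. Among horizontal-box detection frameworks for remote sensing images, FFAM-YOLO[3] uses a cascaded attention mechanism to address the scarce feature information and difficult localization of small targets. FFC-SSD[4] uses grouped clustering to obtain a more suitable sample size distribution and designs an efficient multi-scale feature fusion module based on unpooling to strengthen feature extraction. Suresh et al.[5] use a sliding window to extract different features and then match them against the target according to features in a predefined dictionary. Xu et al.[6] abandon IoU as the sample assignment criterion and propose a label assignment strategy based on Gaussian distributions, which uses the Gaussian receptive field distance to measure the distance between ground truth and prediction. Nan et al.[7] introduce an attention mechanism that focuses on high-response regions containing targets, which improves detection performance.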
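Because remote sensing images are taken from an overhead view, targets appear in arbitrary orientations. Traditional horizontal boxes cannot annotate such targets tightly, which limits detection performance. Among oriented detection frameworks for remote sensing images, Xu et al.[8] design the generative adversarial network DEGAN to enhance the texture features of small targets, use deep reinforcement learning to enhance the color information of images, and design an adaptive feature transformation pyramid for feature extraction. Zhao et al.[9] propose an attention-based feature fusion module, a deformable lateral connection module, and an anchor-guided detection module to refine the features of aircraft. Yang et al.[10] use a segmentation network to guide the detection model and design a denoising module that highlights the features of small targets. Ding et al.[11] generate only horizontal proposals in the RPN stage; lightweight fully connected layers then transform the horizontal proposals into oriented proposals, rotated region features are extracted from the oriented proposals, and the rotated boxes are regressed. To solve the discontinuity and square-like problems that arise at critical angles, Yang et al.[12] propose computing the regression loss with a Gaussian distance. Zhang et al.[13] refine earlier work on target localization by always feeding two fixed levels of the feature pyramid into the RPN, which removes the need for the model to choose among levels. To handle the complex backgrounds of remote sensing images, Li et al.[14] introduce a visual attention mechanism on top of multi-scale features and apply it separately to feature maps of different sizes, which weakens the influence of background noise. The Rotated RPN algorithm proposed by Ma et al.[15] does not add an angle parameter to existing horizontal boxes to obtain oriented boxes; instead it directly generates proposals in arbitrary orientations.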
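This paper targets oriented object detection in remote sensing images. To address the insufficient performance of general-purpose models on this task, we propose an oriented object detection model based on feature recombination and a polarized attention mechanism. Because the features of small remote sensing targets are hard to extract, we design a feature pyramid module based on feature recombination to preserve the completeness of the extracted features. Because common oriented box representations suffer from critical-angle problems, we design an adaptive proposal generation module to alleviate the loss increase caused by angle discontinuity. Because common detection heads use remote sensing target features inconsistently across tasks, we design a polarized attention detection head that separates the features needed by the regression and classification heads and improves detection accuracy. The model is optimized on top of the Rotated RPN baseline, and experimental results show that the improved model achieves clearly higher detection accuracy on oriented remote sensing datasets.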
2 PFR-Rotate Model
2.1 Baseline Rotated RPN Model
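Rotated RPN[15] was originally proposed for text detection in arbitrary orientations. With the rise of object detection in remote sensing images, many researchers have applied it to remote sensing detection tasks. Rotated RPN makes two main contributions: 1) it proposes the RRPN structure, which lets the RPN directly generate proposals with an orientation parameter so that the proposals fit oriented targets more tightly and accuracy improves; 2) to work with RRPN, it proposes RROI, which maps the generated arbitrarily oriented proposals onto the feature map and then pools them. Rotated RPN is an improvement on the two-stage detection model, and its overall framework is shown in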
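Rotated RPN still leaves room for improvement in accuracy on oriented object detection in optical remote sensing images. The likely causes are: 1) remote sensing datasets contain many small oriented targets whose features are hard to extract completely, and the conventional backbone and feature pyramid network used by Rotated RPN do not attend enough to densely distributed small targets, so the model cannot localize all small targets precisely; 2) Rotated RPN generates anchors at multiple angles at each anchor point, yet still cannot cover all angles, and the extra anchors increase computation; 3) the generic detection head used by Rotated RPN couples classification and regression and ignores the inconsistency between the features the two tasks need. To address these problems, we optimize Rotated RPN and propose the PFR-Rotate model.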
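To alleviate the problems of oriented targets in remote sensing images, namely many small targets, a large proportion of background, and dense targets, we design PFR-Rotate, an oriented object detection model based on feature recombination and polarized attention. The improved model strengthens the complementarity between features of different sizes and improves detection capability. The structure of the PFR-Rotate model is shown in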
2.2 Feature Pyramid Module Based on Feature Recombination
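Remote sensing images contain many small, densely distributed targets and a large proportion of background, which makes features hard to extract and degrades detection performance. The generic feature pyramid module fuses features of different sizes but takes no specific measure for extracting the features of small targets, so it performs poorly on datasets with dense small targets. To strengthen feature extraction and the detection of dense small targets, we propose the FR-FPN module, which uses an FR module as the lateral connection at the deepest level of the feature pyramid.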
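The FR module consists of two parts. The first part is the upsampling kernel prediction module. The input feature map is first compressed along the channel dimension to reduce subsequent computation. The upsampling kernel is then reshaped into 1×C and C×1 forms (C is the number of kernel channels), and the two are fused to obtain a new upsampling kernel that incorporates the correlation between features. The structure of the reshape part is shown in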
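The abstract describes the FR module as a reshape module integrated into CARAFE. As a rough illustration of the CARAFE-style step it builds on, the sketch below reassembles a single-channel feature map with per-location kernels that are softmax-normalized. It is not the authors' FR module: the reshape fusion is omitted, the kernel logits are passed in rather than predicted by a convolution, and the names `softmax` and `carafe_reassemble` are hypothetical.

```python
import math


def softmax(values):
    """Normalize a flat list of logits so the weights sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_reassemble(feature, kernel_logits, scale=2, k=3):
    """Upsample a 2D feature map by content-aware reassembly.

    feature:       H x W nested list (one channel, for brevity).
    kernel_logits: (scale*H) x (scale*W) nested list; each entry is a flat
                   list of k*k logits for that output location.
    Out-of-range neighbours are treated as zero (zero padding).
    """
    h, w = len(feature), len(feature[0])
    r = k // 2
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for oi in range(h * scale):
        for oj in range(w * scale):
            weights = softmax(kernel_logits[oi][oj])
            si, sj = oi // scale, oj // scale  # source location
            acc = 0.0
            for n in range(-r, r + 1):
                for m in range(-r, r + 1):
                    i, j = si + n, sj + m
                    if 0 <= i < h and 0 <= j < w:
                        acc += weights[(n + r) * k + (m + r)] * feature[i][j]
            out[oi][oj] = acc
    return out


# Usage: uniform kernels turn reassembly into a local mean (toy input).
feature = [[2.0] * 3 for _ in range(3)]
logits = [[[0.0] * 9 for _ in range(6)] for _ in range(6)]
upsampled = carafe_reassemble(feature, logits)
```

With learned, content-dependent logits instead of the uniform ones used here, each output location would weight its source neighbourhood differently, which is what distinguishes this kind of upsampling from fixed interpolation.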
2.3 Proposal Generation Module Based on Feature Adaptation
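Common oriented box definitions include the OpenCV definition, the long-edge definition, and the eight-parameter definition. Because the angle in the OpenCV definition is discontinuous, a box near the critical angle must be rotated by a large angle, which produces a large regression loss. The long-edge definition takes the angle between the longer edge and the X axis as the rotation angle, but a square prediction box has no true long edge, so the representation is not unique. The eight-parameter definition suffers from vertex-order exchange at critical angles, which increases the loss. To avoid these boundary problems, Rotated RPN directly generates anchors at multiple angles when generating proposals, but this increases computation, and the generated boxes still cannot cover all cases.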
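The boundary problem of angle-based definitions can be made concrete with a small sketch. It assumes a long-edge style angle range of [-90°, 90°), which the text does not state explicitly, and the helper names are hypothetical. Two boxes that differ by only 2° physically produce a naive angle difference of 178°, which is the kind of loss jump described above.

```python
def naive_angle_diff(theta_pred, theta_gt):
    """Plain absolute difference, as an L1-style angle loss would see it."""
    return abs(theta_pred - theta_gt)


def periodic_angle_diff(theta_pred, theta_gt, period=180.0):
    """Smallest rotation that maps one box orientation onto the other.

    A rectangle looks the same after a 180 degree turn, so the true
    orientation gap is taken modulo `period`.
    """
    diff = abs(theta_pred - theta_gt) % period
    return min(diff, period - diff)


# Two almost identical boxes on either side of the critical angle.
pred, gt = 89.0, -89.0
naive = naive_angle_diff(pred, gt)        # 178.0
actual = periodic_angle_diff(pred, gt)    # 2.0
```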
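We use an improved midpoint offset method[16] to define oriented boxes, which effectively resolves the angle periodicity, edge-order exchange, and square-like problems of traditional representations. Each anchor is defined by six parameters: x and y are the center coordinates of the proposal's circumscribed rectangle, w and h are the width and height of the circumscribed rectangle, and Δα and Δβ are the offsets of the proposal's vertices from the midpoints of the width and height sides of the circumscribed rectangle. The midpoint offset definition is shown in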
After the six parameters are obtained, through
The ATT-ORPN pipeline is shown in
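To make the six-parameter representation concrete, the following sketch decodes it into four vertices using the midpoint offset formulation of Oriented R-CNN[16] as a stand-in; the improved variant used in this paper may differ. Each vertex lies on one side of the circumscribed rectangle, shifted from that side's midpoint by Δα (top and bottom sides) or Δβ (right and left sides). The function name `decode_midpoint_offset` is hypothetical.

```python
def decode_midpoint_offset(x, y, w, h, d_alpha, d_beta):
    """Return the four vertices (top, right, bottom, left) of an oriented box.

    (x, y, w, h) describe the circumscribed horizontal rectangle;
    d_alpha and d_beta are the offsets from the midpoints of its top and
    right sides. Image coordinates are assumed (y grows downward).
    """
    v_top = (x + d_alpha, y - h / 2)
    v_right = (x + w / 2, y + d_beta)
    v_bottom = (x - d_alpha, y + h / 2)
    v_left = (x - w / 2, y - d_beta)
    return [v_top, v_right, v_bottom, v_left]


# Zero offsets give a diamond whose vertices sit on the side midpoints.
diamond = decode_midpoint_offset(10.0, 10.0, 8.0, 4.0, 0.0, 0.0)

# Offsets of half the side length recover the horizontal rectangle itself.
rect = decode_midpoint_offset(10.0, 10.0, 8.0, 4.0, 4.0, 2.0)
```

Because the vertices are tied to fixed sides of the rectangle, no angle appears in the representation, which is why the periodicity and edge-order issues of angle-based definitions do not arise in this form.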
2.4 Polarized Attention Detection Head Module
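Common detection heads are coupled: classification and regression share the same features, which ignores the difference between the features the two tasks need. In remote sensing tasks, the model's ability to extract target features is limited, so this difference affects detection more strongly. Classification should focus on the target itself and respond strongly to the effective information inside the predicted box, whereas regression only needs to attend to the boundary region of the target and should pay less attention to interior information to reduce interference. Building on polarized attention[17], we design an improved dual-branch polarized attention detection head. The classification branch uses high-response global features to reduce noise interference, and the regression branch focuses on boundary features and suppresses the influence of irrelevant highly activated regions on regression. The structure of the dual-branch polarized attention detection head is shown in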
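The polarized detection head has two branches. Each branch passes the features through spatial attention and channel attention arranged in parallel and then applies its own response function; the attention mechanism further focuses the model on the features each task needs. The classification branch uses an activation function to select high-response regions as the basis for classification. The regression branch uses an inhibition function to suppress high-response regions so that the model attends to boundary regions, which helps regression. After each step, the polarized attention detection head combines the result with the original features so that feature information is not lost during processing.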
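The two response functions can be sketched as follows. The forms are those of the polarization functions in CFC-Net[17], on which this head is based, and are an assumption here: the exact equations of this paper may differ. The value η = 8 is purely illustrative, and the function names are hypothetical.

```python
import math


def excitation(x, eta=8.0):
    """Sigmoid centred at 0.5; larger eta sharpens the high/low split."""
    return 1.0 / (1.0 + math.exp(-eta * (x - 0.5)))


def suppression(x):
    """Tent function: peaks at 0.5 and damps strongly activated responses."""
    return x if x < 0.5 else 1.0 - x


# Attention responses are assumed to lie in [0, 1].
responses = [0.1, 0.5, 0.9]
cls_weights = [excitation(r) for r in responses]
reg_weights = [suppression(r) for r in responses]
```

Under these assumed forms, the classification weights grow monotonically with the response, while the regression weights treat a response of 0.9 the same as one of 0.1, so strongly activated interior regions contribute less to regression.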
2.5 Training Strategy
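Object detection models are currently trained mostly with SGD, and the final model weights are usually taken from the last epoch or from the epoch that performs best on the validation set. However, models trained with conventional SGD have a limitation: they tend to converge to the boundary region of the optimal solution. We use the SWA training strategy[18], which runs SGD cyclically: additional epochs are trained with SGD to obtain several points near the boundary in weight space, and these points are averaged to obtain weights close to the optimal solution. Because it averages several points, SWA generalizes better than SGD.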
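The averaging step of SWA can be sketched as a running mean over weight snapshots collected at the end of each extra SGD epoch. This is a minimal illustration on plain Python lists with hypothetical names (`SWAAverager`, `update`); a real training run would average the model's parameter tensors, and batch-normalization statistics usually need to be recomputed for the averaged weights.

```python
class SWAAverager:
    """Keep a running average of flat weight vectors."""

    def __init__(self):
        self.n_models = 0
        self.avg = None

    def update(self, weights):
        """Fold one snapshot into the average: avg += (w - avg) / n."""
        self.n_models += 1
        if self.avg is None:
            self.avg = list(weights)
        else:
            self.avg = [a + (w - a) / self.n_models
                        for a, w in zip(self.avg, weights)]
        return self.avg


# Three snapshots, e.g. from three additional SGD epochs (toy values).
snapshots = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
swa = SWAAverager()
for snap in snapshots:
    swa.update(snap)
swa_weights = swa.avg
```

The running form avoids storing every snapshot: only the current average and a counter are kept, and the result equals the arithmetic mean of all snapshots seen so far.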
3 Experimental Results and Analysis
3.1 Datasets
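The Dior-R dataset[19] was proposed in 2022 by Junwei Han's group at Northwestern Polytechnical University. It consists of 23463 images and 190288 object instances in 20 categories: airplane (APL), airport (APO), baseball field (BF), basketball court (BC), bridge (BR), chimney (CH), dam (DAM), expressway service area (ESA), expressway toll station (ETS), golf field (GF), ground track field (GTF), harbor (HA), overpass (OP), ship (SH), stadium (STA), storage tank (STO), tennis court (TC), train station (TS), vehicle (VE), and windmill (WM). Imaging results vary widely with imaging conditions, weather, and season, and the dataset has high inter-class similarity and intra-class diversity; all of these factors make detection harder.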
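The HRSC2016 dataset was released by Northwestern Polytechnical University in 2016, with image sizes ranging from 300 to 1500 pixels. The training, validation, and test sets contain 436 images (1207 samples), 181 images (541 samples), and 444 images (1228 samples), respectively. Like Dior-R, HRSC2016 uses oriented annotations. Although HRSC2016 has only one major category, its target sizes vary more and its background colors interfere more strongly, so it is used for the generalization experiments of our model.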
3.2 Experimental Setup
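The experiments are based on the mmrotate framework[20]. The environment is Python 3.8, PyTorch 1.7.0, and torchvision 0.7.0. The batch size is 2, the momentum is 0.9, and the weight decay is 0.0005. All experiments use ResNet50[21] as the backbone and apply random image flipping with a probability of 50% as data augmentation. On the Dior-R dataset, the model is trained for 12 epochs with an initial learning rate of 0.001, which drops to 1×10⁻⁴ and 1×10⁻⁵ after the 9th and 11th epochs, respectively. On the HRSC2016 dataset, the model is trained for 40 epochs with an initial learning rate of 0.005. The hardware is an Intel® Core™ i9-10900X CPU and an NVIDIA RTX3080Ti GPU.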
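The Dior-R learning-rate schedule described above can be written as a step function. The sketch assumes 1-based epoch numbering and that each drop takes effect from the epoch following the 9th and 11th; the function name `dior_r_lr` is hypothetical, and this is not an mmrotate configuration.

```python
def dior_r_lr(epoch, base_lr=1e-3, milestones=(9, 11), gamma=0.1):
    """Learning rate used during `epoch` (1-based) under step decay."""
    drops = sum(1 for m in milestones if epoch > m)
    return base_lr * gamma ** drops


# Learning rate for each of the 12 training epochs.
schedule = {epoch: dior_r_lr(epoch) for epoch in range(1, 13)}
```

Under these assumptions, epochs 1 to 9 run at 1×10⁻³, epochs 10 and 11 at 1×10⁻⁴, and epoch 12 at 1×10⁻⁵.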
3.3 Ablation Experiments
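The FPN determines how well the model extracts and fuses target features and strongly affects detection performance. To verify the effectiveness of introducing the feature recombination module at the deepest FPN level, we run a set of ablation experiments on the FPN with the Dior-R dataset, in which the reshape idea is also integrated into modules such as SPP and CBAM. The results are shown in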
Table 1. Comparison of FPN ablation experimental results
From
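The detection head determines how well the model analyzes features and affects the final detection results. To verify the choice of polarization functions in the polarized attention detection head, we run a set of ablation experiments on the polarization functions used by the classification and regression branches of the PA-head, using Eq. (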
Table 2. Comparison of polarization function ablation experimental results
For the classification branch,
Table 3. Ablation experiment on the value of η in Eq. (4)
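To verify the effectiveness of each module of our model, we run a set of ablation experiments on the Dior-R dataset with Rotated RPN as the baseline, adding each improved module to the baseline network in turn. The results are shown in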
Table 4. Comparison of ablation experimental results
From
Fig. 8. Average precision (AP) for each category in the Dior-R dataset for each experiment
Fig. 9. Loss function and mAP comparison. (a) Loss function; (b) mAP
3.4 Model Comparison Experiments
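To verify the effectiveness of PFR-Rotate, the proposed model for oriented object detection in remote sensing, we compare it with several representative one-stage and two-stage oriented detection models. The results are shown in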
Table 5. Comparison of precision of different network models on the Dior-R dataset
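Several representative horizontal detection models and oriented detection models are evaluated on the Dior-R dataset with horizontal annotations and oriented annotations, respectively. The results are shown in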
Table 6. Comparison of results with horizontal and oriented annotations on the Dior-R dataset
To verify the generality of our model on oriented datasets, we also compare it with other models on the HRSC2016 dataset. The results are shown in
Table 7. Comparison of test results of different network models on the HRSC2016 dataset
Fig. 11. Comparison of visualization results for the HRSC2016 dataset
4 Conclusion
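To address the insufficient ability of general-purpose object detection models to detect oriented objects in remote sensing images, we improve Rotated RPN and design PFR-Rotate, an oriented object detection framework based on feature recombination and polarized attention. A feature recombination module recombines and reweights the deep FPN features so that the model attends to effective features and ignores the interference of ineffective ones. A new oriented box representation is used, and an adaptive attention module is designed for the RPN to strengthen feature representation. A polarized attention module is added to the detection head to reduce the interference caused by the different features that the tasks require. The model performs well on both datasets, which demonstrates its effectiveness. Although the model improves detection accuracy, it has a large number of parameters and does not meet the needs of real-time detection well. Future work will consider making the model lightweight while preserving accuracy as far as possible.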
[1] Yang F, Fan H, Chu P, et al. Clustered object detection in aerial images[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 27-November 2, 2019, Seoul, Republic of Korea. New York: IEEE Press, 2020: 8310-8319.
[2] Cui Z Y, Li Q, Cao Z J, et al. Dense attention pyramid networks for multi-scale ship detection in SAR images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(11): 8983-8997.
[3] 张寅, 朱桂熠, 施天俊, 等. 基于特征融合与注意力的遥感图像小目标检测[J]. 光学学报, 2022, 42(24): 2415001.
[4] 薛俊达, 朱家佳, 张静, 等. 基于FFC-SSD模型的光学遥感图像目标检测[J]. 光学学报, 2022, 42(12): 1210002.
[5] Suresh V, Janik P, Rezmer J, et al. Forecasting solar PV output using convolutional neural networks with a sliding window algorithm[J]. Energies, 2020, 13(3): 723.
[6] Xu C, Wang J W, Yang W, et al. RFLA: Gaussian receptive field based label assignment for tiny object detection[M]//Avidan S, Brostow G, Cissé M, et al. Computer vision–ECCV 2022. Lecture notes in computer science. Cham: Springer, 2022, 13669: 526-543.
[7] Nan Z X, Peng J Z, Jiang J J, et al. A joint object detection and semantic segmentation model with cross-attention and inner-attention mechanisms[J]. Neurocomputing, 2021, 463: 212-225.
[8] 徐志京, 柏雪. 基于双重特征增强的遥感舰船小目标检测[J]. 光学学报, 2022, 42(18): 1828002.
[9] Zhao Y, Zhao L J, Liu Z, et al. Attentional feature refinement and alignment network for aircraft detection in SAR imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 5220616.
[10] Yang X, Yan J C, Liao W L, et al. SCRDet: detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2384-2399.
[11] Ding J, Xue N, Long Y, et al. Learning RoI transformer for oriented object detection in aerial images[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA. New York: IEEE Press, 2020: 2844-2853.
[13] Zhang S, He G H, Chen H B, et al. Scale adaptive proposal network for object detection in remote sensing images[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 16(6): 864-868.
[14] Li Q P, Mou L C, Jiang K Y, et al. Hierarchical region based convolution neural network for multiscale object detection in remote sensing images[C]//2018 IEEE International Geoscience and Remote Sensing Symposium, July 22-27, 2018, Valencia, Spain. New York: IEEE Press, 2018: 4355-4358.
[15] Ma J Q, Shao W Y, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122.
[16] Xie X X, Cheng G, Wang J B, et al. Oriented R-CNN for object detection[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV), October 10-17, 2021, Montreal, QC, Canada. New York: IEEE Press, 2022: 3500-3509.
[17] Ming Q, Miao L J, Zhou Z Q, et al. CFC-net: a critical feature capturing network for arbitrary-oriented object detection in remote-sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5605814.
[19] Cheng G, Wang J B, Li K, et al. Anchor-free oriented proposal generator for object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5625411.
[20] Zhou Y, Yang X, Zhang G F, et al. MMRotate: a rotated object detection benchmark using PyTorch[C]//Proceedings of the 30th ACM International Conference on Multimedia, October 10-14, 2022, Lisboa, Portugal. New York: ACM, 2022: 7331-7334.
[21] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE Press, 2016: 770-778.
[22] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE Press, 2017: 2999-3007.
[23] Xu Y C, Fu M T, Wang Q M, et al. Gliding vertex on the horizontal bounding box for multi-oriented object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(4): 1452-1459.
[24] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[M]//Leibe B, Matas J, Sebe N, et al. Computer vision–ECCV 2016. Lecture notes in computer science. Cham: Springer, 2016, 9905: 21-37.
[26] Cai Z W, Vasconcelos N. Cascade R-CNN: delving into high quality object detection[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE Press, 2018: 6154-6162.
[27] 王友伟, 郭颖, 邵香迎. 基于改进级联算法的遥感图像目标检测[J]. 光学学报, 2022, 42(24): 2428004.
[28] Han J M, Ding J, Li J, et al. Align deep features for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5602511.
[29] Li W T, Chen Y J, Hu K X, et al. Oriented RepPoints for aerial object detection[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 18-24, 2022, New Orleans, LA, USA. New York: IEEE Press, 2022: 1819-1828.
[30] Li J X, Tian Y, Xu Y P, et al. Oriented object detection in remote sensing images with anchor-free oriented region proposal network[J]. Remote Sensing, 2022, 14(5): 1246.
王友伟, 郭颖, 邵香迎, 王季宇, 鲍正位. 基于特征重组的遥感图像有向目标检测[J]. 光学学报, 2024, 44(6): 0628001. Youwei Wang, Ying Guo, Xiangying Shao, Jiyu Wang, Zhengwei Bao. Oriented Object Detection in Remote Sensing Images Based on Feature Recombination[J]. Acta Optica Sinica, 2024, 44(6): 0628001.