Acta Optica Sinica, 2024, 44(6): 0628001, published online: 2024-03-04


Oriented Object Detection in Remote Sensing Images Based on Feature Recombination
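Youwei Wang 1,2, Ying Guo 1,2,*, Xiangying Shao 1,2, Jiyu Wang 1,2, Zhengwei Bao 1,2
1 Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China
2 School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China
To address the difficulties of detecting oriented objects in remote sensing images, we design a network based on an improved Rotated RPN. A feature recombination mechanism uses weighting to make the network focus on effective target regions. A new oriented-box annotation method avoids misalignment and related problems at critical angles. A polarized attention module placed at the front end of the detection head mitigates the performance loss caused by the inconsistency between the features required for classification and for regression. Experimental results show that the model improves detection accuracy for multiple object categories. Compared with the baseline Rotated RPN, the model improves accuracy by 4.95% on the Dior-R dataset and by 11.75% on the HRSC2016 dataset.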

Object detection in optical remote sensing images takes an optical remote sensing image and, through model processing, outputs object bounding boxes, object categories, and confidence scores. It is an important task in remote sensing image processing and has practical significance in both civil and military fields. In the civil field, it can be used to analyze flight activity at airports and ship traffic in ports, which supports timely adjustment and helps avoid congestion. In the military field, an adversary's deployment can be analyzed from captured images so that feasible plans can be made. Object detection in remote sensing images therefore has both research significance and application prospects. Compared with traditional detection algorithms, methods based on convolutional neural networks have become the mainstream approach to object detection in remote sensing images. Deep learning methods achieve better accuracy than traditional object detection methods for visible-light remote sensing images and do not require manually designed rules, which gives a relatively unified standard and improves model robustness. However, directly applying object detection models designed for natural images to remote sensing tasks still has many shortcomings. Starting from the difficulties of oriented object detection in remote sensing, we design an oriented object detection algorithm for optical remote sensing images that improves feature extraction and recognition for multi-scale, multi-directional small remote sensing targets in complex backgrounds.


To address the poor performance of general-purpose algorithms on oriented object detection in remote sensing, we propose an oriented object detection model based on the SWA training strategy and feature recombination. The model is optimized from the Rotated RPN algorithm. On the one hand, a feature recombination mechanism is introduced so that the model focuses on effective features, which reduces unnecessary computation and improves accuracy. On the other hand, a rotated RPN is built on the RPN, and the position and angle parameters are regressed with the midpoint offset method to generate high-quality oriented candidate boxes. To handle the inconsistency between the features required by the classification and regression tasks, a polarized attention detector is employed. The training strategy is also improved: the model is trained in a cyclic mode to alleviate the tendency of the traditional training strategy to converge to the boundary region of the optimal solution.

Specifically, we make the following improvements to Rotated RPN. 1) Object detection in remote sensing images involves many small targets, a large proportion of background, and large variations in target size, so the feature pyramid cannot extract effective information when extracting and fusing features, which degrades detection performance. We therefore modify the feature pyramid to strengthen its feature extraction ability and its ability to fully fuse information at various scales. A reshape module is designed and integrated into the Carafe model as a deep lateral connection of the FPN. 2) To solve the angle discontinuity and edge-order exchange that common oriented-box representations exhibit at critical angles, we introduce the midpoint offset method to define the oriented box. An adaptive attention module is placed in front of the proposal region generation module to strengthen effective feature representation and further improve feature extraction and characterization. 3) The features required for classification should respond identically to different angles, because classification should focus on the target itself and be highly responsive to the effective information inside the predicted box. The features required for regression, in contrast, should be sensitive to angle changes, with more attention paid to the boundary area of the target and less to the information inside the predicted box, so that angle and position can be predicted accurately with less interference.
Therefore, to avoid feature interference between the tasks and extract key features, we add a polarization attention module to the shared convolution layer at the front end of the dual-branch detector and adopt different response functions to separate the representation ability of different features. The classification head and the regression head use an activation function and an inhibition function, respectively. 4) Because the traditional training strategy may converge to the boundary region of the optimal solution, we introduce the SWA cyclic training strategy: the model is trained for additional epochs with SGD to obtain the corresponding weights, and these weights are averaged to obtain a result closer to the optimal solution.
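The weight-averaging step of SWA described in item 4 can be illustrated with a minimal sketch. It assumes each checkpoint is a plain dictionary of NumPy arrays and that all checkpoints are weighted equally; the function name `average_checkpoints` and the toy values are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Equal-weight average of a list of {name: array} weight dictionaries."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    keys = checkpoints[0].keys()
    # Average each parameter tensor element-wise across checkpoints.
    return {k: np.mean([c[k] for c in checkpoints], axis=0) for k in keys}

# Three hypothetical checkpoints, e.g. one saved at the end of each SGD cycle.
ckpts = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 2.0]), "b": np.array([0.3])},
    {"w": np.array([2.0, 5.0]), "b": np.array([0.6])},
]
swa = average_checkpoints(ckpts)
print(swa["w"], swa["b"])
```

In practice, SWA implementations usually also recompute the BatchNorm statistics for the averaged weights; that step is omitted here.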

Results and Discussions

To verify the algorithm's performance, we evaluate it on two remote sensing datasets with oriented annotations, Dior-R and HRSC2016, and compare it with several typical one-stage and two-stage oriented object detection models. On the Dior-R dataset, our algorithm achieves the best accuracy of 64.49%, which is 4.95% higher than that of the baseline model (Table 5). On the HRSC2016 dataset, it achieves the best accuracy of 90.83%, which is 11.75% higher than that of the baseline model (Table 7). We also analyze the performance gain from introducing, in turn, the feature recombination module, the midpoint offset method, the adaptive attention module, the polarized attention detector, and the SWA training strategy. The experimental results show that the algorithm detects oriented remote sensing objects well in complex backgrounds.


To improve the detection of oriented objects in remote sensing images, we propose an oriented object detection model based on feature recombination and polarized attention. The experimental results show that the algorithm detects oriented objects in remote sensing images effectively and performs well across a variety of scenes.

1 Introduction



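Because remote sensing images are captured from an overhead view, objects appear in arbitrary orientations. Traditional directional boxes cannot annotate such objects efficiently, which limits detection performance. Among frameworks for oriented object detection in remote sensing images, Xu et al. [8] designed a generative adversarial network, DEGAN, to enhance the texture features of small objects, used deep reinforcement learning to enhance image color information, and designed an adaptive feature transformation pyramid for feature extraction. Zhao et al. [9] proposed an attention-based feature fusion module, a deformable lateral connection module, and an anchor-guided detection module to refine aircraft features. Yang et al. [10] used a segmentation network to guide the detection model and designed a denoising module that highlights the features of small objects. Ding et al. [11] generate only horizontal proposals in the RPN stage; a lightweight fully connected layer then rotates the horizontal proposals into oriented proposals, from which rotated region features are extracted to regress the rotated boxes. To address the discontinuity and square-like problems that arise at critical angles, Yang et al. [12] proposed computing the regression loss with a Gaussian distance. Zhang et al. [13] built on earlier work on object localization by feeding two fixed levels of the feature pyramid into the RPN, which resolves the model's level-selection problem. To cope with the complex backgrounds of remote sensing images, Li et al. [14] introduced a visual attention mechanism on top of multi-scale features and applied it separately to feature maps of different sizes, thereby weakening the influence of background noise. The Rotated RPN algorithm proposed by Ma et al. [15] no longer adds an angle parameter to the original horizontal box to annotate oriented boxes and instead directly generates proposals in arbitrary orientations.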

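This paper addresses oriented object detection in remote sensing images. Because general-purpose models perform poorly on oriented remote sensing objects, we propose an oriented object detection model based on feature recombination and a polarized attention mechanism. Because the features of small remote sensing objects are difficult to extract, a feature pyramid module based on feature recombination is designed to preserve the completeness of the extracted features. Because common oriented-box annotation methods suffer from critical-angle problems, an adaptive proposal region generation module is designed to alleviate the increase in loss caused by angle discontinuity. Because common detection heads use the feature information of remote sensing objects inconsistently, a polarized attention detection head is designed to separate the different features needed by the regression and classification heads and thus improve detection accuracy. The model is optimized from the Rotated RPN baseline, and the experimental results show that the improved model achieves clearly higher detection accuracy on oriented remote sensing datasets.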

2 PFR-Rotate model

2.1 Baseline Rotated RPN model

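Rotated RPN [15] was originally proposed for text detection, to handle text in arbitrary orientations. With the rise of object detection in remote sensing images, many researchers have introduced it into remote sensing detection tasks. Rotated RPN makes two main contributions: 1) it proposes the RRPN network structure, which lets the RPN directly generate predicted boxes with an orientation parameter so that the boxes fit oriented objects more closely and accuracy improves; 2) to work with RRPN, it proposes RROI, which maps the generated arbitrary-orientation proposals onto the feature map and then pools them. Rotated RPN is an improvement built on a two-stage detection model; its overall framework is shown in Fig. 1.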

Fig. 1. Overall framework of Rotated RPN

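Rotated RPN still leaves room for improvement in accuracy on oriented object detection in optical remote sensing images. The likely causes are as follows: 1) remote sensing datasets contain large numbers of small oriented objects whose features are hard to extract completely, and the traditional backbone and feature pyramid network used by Rotated RPN pay insufficient attention to densely distributed small objects, so the model cannot localize all small objects precisely; 2) Rotated RPN generates anchors at multiple angles at each anchor point, yet it still cannot cover all angles, and the computational cost increases; 3) the generic detection head used by Rotated RPN couples the classification and regression tasks and ignores the inconsistency between the features the two tasks require. To address these problems, we optimize Rotated RPN and propose the PFR-Rotate model.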
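The cost noted in point 2) can be made concrete with a small sketch that counts the anchors generated per location when an angle dimension is added to the usual scale and aspect-ratio dimensions. The scale, ratio, and angle values below are hypothetical illustration values, not the settings used in this paper.

```python
import itertools
import math

def rotated_anchors(scales, ratios, angles):
    """Return (w, h, theta) for every scale/ratio/angle combination at one location."""
    anchors = []
    for s, r, a in itertools.product(scales, ratios, angles):
        # Keep the anchor area at s*s while setting the aspect ratio w/h = r.
        w = s * math.sqrt(r)
        h = s / math.sqrt(r)
        anchors.append((w, h, a))
    return anchors

scales = [8, 16, 32]   # hypothetical anchor scales
ratios = [2, 5, 8]     # hypothetical width:height ratios
angles = [math.radians(d) for d in (-30, 0, 30, 60, 90, 120)]  # hypothetical angles

horizontal = rotated_anchors(scales, ratios, [0.0])
rotated = rotated_anchors(scales, ratios, angles)
print(len(horizontal), len(rotated))
```

With six angles, every location carries six times as many anchors as the horizontal case, and orientations that fall between the sampled angles are still not covered exactly.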


Fig. 2. Structure of the PFR-Rotate model

2.2 Feature pyramid module based on feature recombination



Fig. 3. Structure of the reshape module

Fig. 4. Structure of the feature recombination (FR) module

2.3 Proposal region generation module based on feature adaptation

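Common oriented-box definitions include the OpenCV definition, the long-edge definition, and the eight-parameter definition. Because the angle in the OpenCV representation is discontinuous, a box near a critical angle has to be rotated by a large angle, which yields a large regression loss. The long-edge representation defines the rotation angle as the angle between the longer side and the x-axis, but when the predicted box is exactly square there is no true long side and the representation is not unique. The eight-parameter definition suffers from vertex-order swapping at critical angles, which also inflates the loss. To avoid these critical cases, Rotated RPN directly generates multiple proposals at multiple angles, but this increases computation and the generated boxes still cannot cover every case.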
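The critical-angle problem can be illustrated numerically. The sketch below assumes a long-edge style parameterization (cx, cy, w, h, angle) with the angle in degrees; the helper names `box_corners` and `corner_gap` and the box values are hypothetical. Two boxes whose angles sit on either side of the ±90° boundary are almost the same rectangle, yet their angle parameters differ by nearly 180°, so a loss computed directly on the parameters is large.

```python
import numpy as np

def box_corners(cx, cy, w, h, angle_deg):
    """Corners of a rectangle whose side of length w makes angle_deg with the x-axis."""
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ rot.T + np.array([cx, cy])

def corner_gap(a, b):
    """Largest distance from a corner of a to its nearest corner of b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).max()

box_a = (50.0, 50.0, 10.0, 2.0, 89.9)    # just below the +90 degree boundary
box_b = (50.0, 50.0, 10.0, 2.0, -89.9)   # just above the -90 degree boundary

param_gap = abs(box_a[4] - box_b[4])     # difference seen by a parameter-space loss
geom_gap = corner_gap(box_corners(*box_a), box_corners(*box_b))  # actual geometric difference
print(param_gap, geom_gap)
```

The parameter gap is close to 180° while the corners move by only a small fraction of a pixel, which is the mismatch that boundary-free box definitions aim to remove.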


Fig. 5. Midpoint offset definition method

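The ATT-ORPN workflow is shown in Fig. 6. Because the features of remote sensing objects are not distinctive, an adaptive attention mechanism is applied before the RPN encoding to strengthen the representation of valid objects. The adaptive attention mechanism uses a multi-branch parallel strategy: one branch uses adaptive global pooling and a fully connected layer, and the other uses Conv+BN+siReLU to assign different weights to the features. The oriented boxes are then encoded and decoded to obtain their representation. ATT-ORPN resolves the various boundary problems of common oriented-box representations and improves detection accuracy. In addition, RROI Align is used to work with ATT-ORPN.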
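As a rough illustration of decoding an oriented box from a midpoint offset representation, the sketch below assumes the six-parameter form (x, y, w, h, Δα, Δβ) used in Oriented R-CNN style proposal networks, where (x, y, w, h) is the enclosing horizontal box and the offsets shift the vertices away from the midpoints of its top and right edges. Whether this paper uses exactly this parameterization is an assumption, and the function name is hypothetical.

```python
import numpy as np

def midpoint_offset_to_vertices(x, y, w, h, d_alpha, d_beta):
    """Decode four vertices from an enclosing box and two midpoint offsets.

    Image coordinates are assumed, so y - h/2 is the top edge.
    """
    v1 = (x + d_alpha, y - h / 2.0)   # on the top edge, shifted from its midpoint
    v2 = (x + w / 2.0, y + d_beta)    # on the right edge, shifted from its midpoint
    v3 = (x - d_alpha, y + h / 2.0)   # on the bottom edge, mirrored through the center
    v4 = (x - w / 2.0, y - d_beta)    # on the left edge, mirrored through the center
    return np.array([v1, v2, v3, v4])

verts = midpoint_offset_to_vertices(0.0, 0.0, 8.0, 6.0, 1.0, -2.0)
# With d_alpha = w/2 and d_beta = h/2 the vertices are the corners of the horizontal box.
hbox = midpoint_offset_to_vertices(0.0, 0.0, 8.0, 6.0, 4.0, 3.0)
print(verts)
print(hbox)
```

Because every vertex is tied to a fixed edge of the enclosing box, the vertex order cannot swap and no periodic angle is regressed, which is why this kind of representation avoids the critical-angle cases discussed above. The decoded shape is in general a parallelogram, and an additional rectification step is needed to obtain a rectangle.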

Fig. 6. Flow chart of ATT-ORPN

2.4 Polarized attention detection head module


Fig. 7. Structure of the dual-branch polarized attention detection head

2.5 Training strategy


3 Experimental results and analysis

3.1 Datasets


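The HRSC2016 dataset was released by Northwestern Polytechnical University in 2016, with image sizes ranging from 300 to 1500 pixel. The training, validation, and test sets contain 436 images (1207 samples), 181 images (541 samples), and 444 images (1228 samples), respectively. Like Dior-R, HRSC2016 uses an oriented annotation format. Although HRSC2016 has only one major category, its object sizes vary more widely and its background color interference is more severe, so it can be used for the generalization experiments of the proposed model.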

3.2 Experimental settings

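The experiments are based on the mmrotate framework [20]. The environment is Python 3.8, Pytorch 1.7.0, and torchvision 0.7.0; the batch size is 2, the momentum factor is 0.9, and the weight decay coefficient is 0.0005. All experiments use Resnet50 [21] as the backbone, and images are randomly flipped with a probability of 50% for data augmentation. On the Dior-R dataset, the model is trained for 12 epochs with an initial learning rate of 0.001, which drops to 1×10⁻⁴ and 1×10⁻⁵ after the 9th and 11th epochs, respectively. On the HRSC2016 dataset, the model is trained for 40 epochs with an initial learning rate of 0.005. The hardware is an Intel® Core™ i9-10900X CPU and an NVIDIA RTX3080Ti GPU.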
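The Dior-R learning rate schedule described above can be written as a simple step function. This is a framework-independent sketch rather than the actual training configuration: the function name `step_lr` is hypothetical, epochs are assumed to be counted from 1, and the decay is assumed to take effect from the epoch after each milestone.

```python
def step_lr(epoch, base_lr=1e-3, milestones=(9, 11), gamma=0.1):
    """Learning rate for a 1-indexed epoch; multiplied by gamma after each milestone epoch."""
    passed = sum(1 for m in milestones if epoch > m)
    return base_lr * gamma ** passed

# Learning rate for each of the 12 training epochs on Dior-R.
schedule = [step_lr(e) for e in range(1, 13)]
for epoch, lr in enumerate(schedule, start=1):
    print(epoch, lr)
```

Under these assumptions, epochs 1 to 9 use 0.001, epochs 10 and 11 use 1×10⁻⁴, and epoch 12 uses 1×10⁻⁵.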

3.3 Ablation experiments


Table 1. Comparison of FPN ablation experimental results

Baseline | SPP | CBAM | SE | Carafe | reshape | FR | mAP /% | mAP variation /% | Recall /% | f /(frame·s⁻¹)




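$$f_{\mathrm{cls1}}(x)=\begin{cases}x, & x>0\\ 0, & x\le 0\end{cases} \tag{2}$$

$$f_{\mathrm{cls2}}(x)=\frac{1}{1+e^{(x-0.5)}} \tag{3}$$

$$f_{\mathrm{cls3}}(x)=\frac{1}{1+e^{-\eta(x-0.5)}} \tag{4}$$

$$f_{\mathrm{reg1}}(x)=\begin{cases}x, & x<0.5\\ 1-x, & \text{otherwise}\end{cases} \tag{5}$$

$$f_{\mathrm{reg2}}(x)=-\left[(x-0.5)^{2}+C\right] \tag{6}$$

$$f_{\mathrm{reg3}}(x)=\begin{cases}\dfrac{1}{2x}-C, & x>0\\ 0, & x=0\\ -\dfrac{1}{2x}-C, & x<0\end{cases} \tag{7}$$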
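Two of the candidate response functions above, the sigmoid-type function with steepness η (f_cls3) and the piecewise-linear function f_reg1, can be sketched in NumPy as follows. The sketch assumes the attention responses are normalized to [0, 1]; the function names `excite` and `suppress` and the value η = 10 are hypothetical choices for illustration.

```python
import numpy as np

def excite(x, eta=1.0):
    """Sigmoid-type response centered at 0.5 (form of f_cls3); eta sets the steepness."""
    return 1.0 / (1.0 + np.exp(-eta * (x - 0.5)))

def suppress(x):
    """Piecewise-linear response (form of f_reg1): peaks at x = 0.5 and falls toward 0 and 1."""
    return np.where(x < 0.5, x, 1.0 - x)

att = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # hypothetical attention responses in [0, 1]
cls_w = excite(att, eta=10.0)                # weights for the classification branch
reg_w = suppress(att)                        # weights for the regression branch
print(cls_w)
print(reg_w)
```

The first function amplifies strong responses, which suits a classification branch that should focus on the object itself, while the second damps the strongest responses so that intermediate responses dominate in the regression branch.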

Table 2. Comparison of polarization function ablation experimental results

cls: Eq. (2) | cls: Eq. (3) | cls: Eq. (4) | reg: Eq. (5) | reg: Eq. (6) | reg: Eq. (7) | mAP /%



Table 3. Ablation experiment on the value of η in Eq. (4)

mAP /% | 62.13 | 62.37 | 62.58 | 62.55


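To verify the effectiveness of each module of the proposed model, a set of ablation experiments is conducted on the Dior-R dataset. With Rotated RPN as the baseline, each improved module is tested on the baseline network. The results are shown in Table 4, where √ indicates that the model contains the module.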

Table 4. Comparison of ablation experimental results

Baseline | FR-FPN | ATT-ORPN | PA-head | SWA | mAP /% | mAP variation /% | Recall /% | f /(frame·s⁻¹)




Fig. 8. Average precision (AP) for each category in Dior-R dataset for each experiment

Fig. 9. Loss function and mAP comparison. (a) Loss function; (b) mAP

3.4 Model comparison experiments

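To verify the effectiveness of PFR-Rotate, the proposed model for oriented object detection in remote sensing images, we compare it with several representative one-stage and two-stage oriented detection models. The results are shown in Table 5, where ms denotes multi-scale training and the backbone is ResNet50. The baseline Rotated RPN reaches 59.54% on the Dior-R dataset. Gliding Vertex represents an oriented box through an approximation based on the horizontal box, which avoids incomplete boundary prediction but does not fully resolve the problems of horizontal annotation; its accuracy is 60.06%. ROI Transformer raises the accuracy to 63.87% by defining a new oriented-box representation and a new label processing procedure, but because the final result requires two regression steps, it remains somewhat slow. The proposed model, on the one hand, attends to effective features during feature extraction and avoids interference from large background areas; on the other hand, it assigns features with different characteristics to classification and regression, which improves detection accuracy. Combined with the new oriented-box definition and the training strategy, the accuracy reaches 64.49%, and 65.97% with multi-scale training. The proposed model achieves the best results in several categories, which demonstrates its effectiveness: it handles objects of different sizes and orientations and adapts to most scenes, meeting the goal of improving oriented object detection in remote sensing images under multi-scale, multi-orientation, and complex-scene conditions.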
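The gains quoted for Dior-R are absolute differences in mAP (percentage points) relative to the 59.54% baseline, which can be checked from the figures in the preceding paragraph. The symbols below are introduced here only for this check.

```latex
\begin{align*}
\Delta_{\text{single-scale}} &= 64.49 - 59.54 = 4.95 \\
\Delta_{\text{ms}} &= 65.97 - 59.54 = 6.43 \\
\frac{\Delta_{\text{single-scale}}}{59.54} &= \frac{4.95}{59.54} \approx 0.083
\end{align*}
```

Read as a relative change, the single-scale gain would therefore be about 8.3% of the baseline mAP, so the 4.95% figure is best understood as 4.95 percentage points.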

Table 5. Comparison of precision of different network models on Dior-R dataset

Model | RetinaNet-O [22] | Rotated RPN [15] | Gliding Vertex [23] | ROI Transformer [11] | AOPG [19] | PFR-Rotate | PFR-Rotate (ms)



Table 6. Comparison of directional and directed labeling results of Dior-R dataset

| Directional labeling: Model | Backbone | mAP /% | Directed labeling: Model | Backbone | mAP /% |
| --- | --- | --- | --- | --- | --- |
| YOLO V3 [25] | Darknet53 | 57.1 | Rotated RPN [15] | Resnet50 | 59.54 |
| Cascade R-CNN [26] | Resnet50 | 60.5 | Gliding Vertex [23] | Resnet50 | 60.06 |


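To verify the generality of the proposed model on oriented datasets, it is also compared with other models on the HRSC2016 dataset. The results are shown in Table 7, where the 07 and 12 in parentheses follow the notation of the VOC dataset and denote two different evaluation metrics. Table 7 shows that the proposed model generalizes well on oriented object datasets. Its detection accuracy improves considerably over the baseline, mainly because the baseline detects small objects poorly, whereas HRSC2016 contains only one major category, ships, whose instances are usually small; the proposed model handles small-object detection better, so the results improve. S2A-Net proposes a feature alignment module and an oriented detection module: it generates high-quality anchors through an anchor refinement network, aligns features adaptively, and uses active rotating filters to encode orientation information and extract rotation-invariant features, which effectively resolves the inconsistency between classification and regression performance; it reaches 90.17% on HRSC2016. Oriented RepPoints proposes three new oriented conversion functions and a sample assignment strategy and introduces a spatial constraint to penalize outliers in adaptive learning; like Oriented R-CNN, it reaches a detection accuracy of about 90.4%. The feature recombination method used in the proposed model detects dense objects effectively, so the model achieves the best accuracy of 90.83% on HRSC2016.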
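The two VOC-style metrics are commonly understood as the 11-point interpolated AP of PASCAL VOC 2007 and the area-under-curve AP used from VOC 2010 onward. The sketch below computes both for a toy ranking; that the paper's "07" and "12" correspond exactly to these two definitions is an assumption, and the function names and toy detections are hypothetical.

```python
import numpy as np

def ap_voc07(recall, precision):
    """11-point interpolated AP: mean of the best precision at recall >= 0, 0.1, ..., 1."""
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return float(ap)

def ap_voc12(recall, precision):
    """Area under the precision-recall curve after making precision non-increasing."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # precision envelope from the right
    idx = np.where(r[1:] != r[:-1])[0]         # positions where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy ranking: detections sorted by confidence, 1 = true positive, 5 ground-truth objects.
tp = np.array([1, 1, 0, 1, 1])
n_gt = 5
cum_tp = np.cumsum(tp)
recall = cum_tp / n_gt
precision = cum_tp / np.arange(1, len(tp) + 1)

ap07 = ap_voc07(recall, precision)
ap12 = ap_voc12(recall, precision)
print(ap07, ap12)
```

The two definitions generally give different values for the same detector, which is why results under the two protocols are reported in separate columns and should not be compared across columns.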

Table 7. Comparison of test results of different network models on HRSC2016 dataset

| Model | Backbone | mAP (07) /% | mAP (12) /% |
| --- | --- | --- | --- |
| Rotated RPN [15] | Resnet50 | 79.08 | 85.64 |
| Gliding Vertex [23] | Resnet101 | 88.20 | |
| ROI Transformer [11] | Resnet101 | 86.20 | |
| Oriented R-CNN [28] | Resnet50 | 90.40 | 96.50 |
| Oriented RepPoints [29] | Resnet50 | 90.38 | 97.26 |



Fig. 10. Comparison of visualization results for Dior-R dataset

Fig. 11. Comparison of visualization results for HRSC2016 dataset

4 Conclusion

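To address the limited ability of general-purpose object detection models to detect oriented objects in remote sensing images, we improve Rotated RPN and design PFR-Rotate, an oriented object detection framework based on feature recombination and polarized attention. A feature recombination module recombines and weights the deep FPN features so that the model attends to effective features and ignores the interference of ineffective ones. A new oriented-box annotation method is used, and an adaptive attention module is designed for the RPN to strengthen feature representation. A polarized attention module is introduced into the detection head to reduce the effect of the difference between the features required by the different tasks. The results show that the model performs well on both datasets, which demonstrates its effectiveness. Although the model improves detection accuracy, it has a large number of parameters and is not well suited to real-time detection. Future work will consider making the model lightweight while preserving accuracy as far as possible.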


Youwei Wang, Ying Guo, Xiangying Shao, Jiyu Wang, Zhengwei Bao. Oriented Object Detection in Remote Sensing Images Based on Feature Recombination[J]. Acta Optica Sinica, 2024, 44(6): 0628001.

