Multi-Scale Optical Remote Sensing Image Target Detection Based on Enhanced Small Target Features

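Optical remote sensing images contain densely distributed targets, a wide range of target scales, and small targets that carry too little feature information, which leads to low detection accuracy and poor generalization. To address these problems, this paper proposes a multi-scale neural network that enhances small target features (ESF-MNet). First, an attention module is introduced into the backbone network to build an efficient layer attention aggregation structure that strengthens feature extraction; in addition, a receptive field enhancement module is inserted before the shallow feature maps are fused with the neck network, so as to capture contextual information at different scales. Second, the neck network is built from GSConv, which reduces the number of parameters while preserving feature extraction capability, and a content-aware feature reassembly module is used to improve recognition accuracy. Finally, three downsampling modules with downsampling rates of 4, 8, and 16 serve as the inputs to the head network to improve small target detection. To verify the method, experiments are conducted on the DOTA and NWPU NHR-10 datasets, where the mean average precision reaches 78.6% and 94.3%, respectively, with a computational complexity of 94.7 G and an overall model size of 26.2 M. The method offers high detection accuracy, low computational complexity, and small model weights, and it effectively improves small target detection in optical remote sensing images.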

Abstract

Objective

Remote sensing is a technology for observing and acquiring information about objects and phenomena on the Earth's surface from satellites and aircraft. It provides large-scale, multi-spectral, and high-resolution data from remote locations on Earth. The technology is global, real-time, and contact-free, and it supports multi-spectral observation, high resolution, and multi-source data fusion. Remote sensing target detection is the process of recognizing and extracting targets from remote sensing data. It aims to automatically detect, locate, and identify specific target types in remote sensing images, which is significant for disaster warning and response, environmental monitoring, and ecological protection.

Methods

Traditional remote sensing image target detection algorithms include valley thresholding, the Sobel operator, and convolutional neural network (CNN) algorithms, of which the CNN is the most widely employed. CNNs have sound feature extraction and pattern recognition capabilities, but they are sensitive to location and scale and may still perform poorly when small targets or large scale changes are involved. Remote sensing target detection must therefore account for many factors, such as complex backgrounds, unbalanced target distribution, dense targets, false detections, and missed detections. We propose a multi-scale neural network for enhancing small target features (ESF-MNet) to deal with the low detection accuracy and poor generalization of current remote sensing target detection. The core idea is to combine multiple CBH modules with the CA attention mechanism to form a multi-residual cascade layer and perform efficient aggregation, which enhances target feature expression. The RFE module is introduced to help the network better respond to remote sensing targets of different scales. GSConv and CARAFE modules form the main structure of the neck: GSConv reduces the number of parameters while maintaining accuracy, and CARAFE improves the semantic extraction ability of the network. Meanwhile, a detection head better suited to small targets is constructed to reduce the loss of small target information as the network depth increases.
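The benefit of a shallower detection head can be illustrated with simple arithmetic. The sketch below is not the authors' code; it assumes a hypothetical 640 × 640 input and a hypothetical 16-pixel target, and only shows how the grid size and the number of grid cells spanned by a small object change with downsampling rates of 4, 8, and 16.

```python
def head_grid_sizes(input_size, strides=(4, 8, 16)):
    """Side length of each square feature map fed to the detection head."""
    return {stride: input_size // stride for stride in strides}


def cells_covered(object_px, stride):
    """Approximate number of grid cells an object spans along one axis."""
    return object_px / stride


if __name__ == "__main__":
    # 640 and 16 are illustrative assumptions, not values from the paper.
    for stride, side in head_grid_sizes(640).items():
        span = cells_covered(16, stride)
        print(f"stride {stride:>2}: {side}x{side} grid, "
              f"16 px object spans {span:.1f} cells per axis")
```

At a downsampling rate of 16 the assumed object falls into roughly a single cell per axis, while at a rate of 4 it spans several, which is consistent with the motivation for feeding shallower feature maps to the head.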

Results and Discussions

Qualitative and quantitative experiments compare ESF-MNet with mainstream remote sensing detection models, and ablation experiments are analyzed. To verify the effectiveness of each improvement, we conduct seven experiments on the DOTA and NWPU NHR-10 datasets under the same environment and parameters, based on the YOLOv7 network model. The detected image targets have complex backgrounds, as shown in Table 1. Compared with employing the attention mechanism alone, the EACM module improves the results significantly. The proposed receptive field enhancement module effectively captures context information at different scales. The constructed neck layer simplifies the network structure and improves the semantic extraction ability, and the proposed detection layer is suitable for small targets and enhances the fusion of shallow features. The mAP0.5 is improved by 3.7% and 4.5% on the two datasets, respectively, which demonstrates the effectiveness of each module. The proposed algorithm is then compared with other algorithms under the same experimental environment, training set, and test set. Results for Faster R-CNN, FMSSD, YOLOv5s, YOLOv7, YOLOv8s, the algorithms in Refs. [21-23], and the proposed algorithm are shown in Tables 2 and 3. ESF-MNet achieves the best average accuracy, and its advantage is most pronounced on the custom small targets, where the mAP reaches 83.6% and 97.6%, respectively. However, the accuracy is not the best for some large targets (such as track and field, basketball court). The main reason is that the lightweight model has a shallow network depth and a small downsampling multiple. Increasing the network depth and the downsampling multiple would improve the detection of large targets but degrade the detection of small targets. Therefore, our research focus is to improve the detection accuracy of small and medium-sized targets while keeping high detection accuracy for large targets. Overall, compared with other algorithms, the proposed algorithm retains clear advantages in mAP, greatly reduces the false detection rate, and meets the basic needs of real-time detection.
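The mAP0.5 metric is not defined in the text; under the usual convention it counts a prediction as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following minimal sketch shows only that matching step, with axis-aligned boxes given as `(x1, y1, x2, y2)`. The function names are illustrative, and the full average precision computation (ranking by confidence, precision-recall integration, averaging over classes) is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold
```

For small targets a shift of a few pixels changes the IoU sharply, which is one reason small target detection is penalized heavily under a fixed 0.5 threshold.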

Conclusions

The detection and recognition of targets in optical remote sensing images is significant for civilian applications. However, with complex backgrounds, dense small targets, and a lack of feature information, identifying small targets is very difficult. To address this, we construct an efficient layer attention aggregation module in the backbone network to extract target features of various categories, and employ the receptive field enhancement module to fuse feature maps of different depths, which improves the information expression ability of the network. Additionally, the neck layer is formed from GSConv and CARAFE modules and compressed by halving the number of channels, and the cross-stage partial network module VoV-GSCSP is designed with a one-shot aggregation method, which reduces network computation and improves detection speed. The CARAFE module further improves detection accuracy. In addition, a multi-scale network is constructed by using feature output layers with downsampling rates of 4, 8, and 16 in the detection head, which effectively improves the detection of small targets. Experimental results show that the model has sound real-time performance and strong robustness for small target detection in complex backgrounds. Although the model has been improved, it may still produce missed and false detections. Moreover, although remote sensing image target detection methods are mature, accurate and efficient detection in large and complex scenes remains difficult. We will continue to study these problems in future work.
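The parameter saving from halving the channels can be estimated with a rough count. The sketch below assumes the commonly described GSConv layout: a dense convolution producing half of the output channels, a depthwise convolution applied to that half, concatenation, and a channel shuffle. The 5 × 5 depthwise kernel and the example channel counts are assumptions, biases and normalization layers are ignored, and the authors' exact configuration may differ.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def gsconv_params(c_in, c_out, k, dw_k=5):
    """Weights in a GSConv-style layer under the assumptions stated above."""
    half = c_out // 2
    dense = k * k * c_in * half      # dense conv to half the output channels
    depthwise = dw_k * dw_k * half   # one dw_k x dw_k filter per channel
    return dense + depthwise


def channel_shuffle(channels, groups=2):
    """Interleave channel groups so dense and depthwise outputs are mixed."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


if __name__ == "__main__":
    # 128 -> 256 channels with a 3 x 3 kernel is an illustrative choice.
    print(standard_conv_params(128, 256, 3))  # 294912
    print(gsconv_params(128, 256, 3))         # 150656
    print(channel_shuffle([0, 1, 2, 3, 4, 5]))  # [0, 3, 1, 4, 2, 5]
```

Under these assumptions the GSConv-style layer needs roughly half the weights of the dense convolution, and the shuffle step lets information from the dense branch reach every part of the output.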

Huilin Shan, Shuoyang Wang, Junyi Tong, Yuxiang Hu, Yanhao Zhang, Yinsheng Zhang. Multi-Scale Optical Remote Sensing Image Target Detection Based On Enhanced Small Target Features[J]. Acta Optica Sinica, 2024, 44(6): 0628006.
