Few-shot object detection via online inferential calibration
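1 Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, Sichuan 610209, China
2 University of Chinese Academy of Sciences, Beijing 100049, China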
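To address overfitting, false detections, and missed detections when only a few samples are available, this paper proposes a few-shot object detection framework with online inferential calibration, built on TFA (two-stage fine-tuning approach). The framework introduces a new Attention-FPN network that selectively fuses features by modeling the dependencies between feature channels and, combined with a hierarchical freezing learning mechanism, guides the RPN module to extract the correct novel-class foreground objects. In addition, an online calibration module encodes the samples through instance segmentation and re-weights the scores of the many candidate objects, correcting falsely detected and missed predictions. The results show that on VOC Novel Set 1 the proposed algorithm raises the average nAP50 over the five tasks by 10.16%, outperforming current mainstream algorithms.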

Overview: The success of deep detection models depends largely on large amounts of training data. With few training samples, a model overfits easily and detection quality is unsatisfactory. To address overfitting and the resulting false and missed detections when training samples are scarce, we present Few-Shot Object Detection via Online Inferential Calibration (FSOIC), a framework that uses Faster R-CNN as the detector. Thanks to its strong detection performance and its ability to distinguish foreground from background, Faster R-CNN avoids the failure of single-stage detectors to locate targets when training samples are scarce. Bottom-layer features are larger and carry stronger location information, but their lack of a global view leads to weak semantic information; top-layer features are the opposite. To make full use of the sample information, the framework includes a new Attention-FPN network, which selectively fuses features by modeling the dependencies between feature channels and, combined with a hierarchical freezing learning mechanism, directs the RPN module to extract the correct novel-class foreground objects. The channel attention mechanism compresses the feature map, flattens it into a one-dimensional vector, and passes it through two fully connected layers followed by a sigmoid. A weight is generated for each feature channel, which establishes the correlation between channels. The weights of the input features are allocated according to category, and the dependencies between channels are modeled. Because of the closed (black-box) nature of neural networks, simple feature fusion is unpredictable, and it is difficult to steer the fused feature map in a satisfactory direction.

Because the sample features are imbalanced, candidate targets of the novel classes are scored too low and are filtered out when prediction boxes are selected, which causes false and missed detections. We therefore design an online calibration module that segments and encodes the samples, re-weights the scores of the multiple candidate objects, and corrects misdetected and missed predictions. Experimental results on VOC Novel Set 1 show that the proposed method improves the average nAP50 over the five tasks by 10.16% and outperforms most of the compared methods.

1 Introduction


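Deep-learning-based object detectors are divided into two-stage and single-stage detectors[1]. Unlike single-stage detectors, which detect objects directly, two-stage detectors such as R-CNN[2], SPP-Net[3], Fast R-CNN[4], and Faster R-CNN[5] have a separate module for generating region proposals: they find a number of candidate objects in the first stage and then localize and classify the predicted objects in the second stage. Single-stage detectors such as YOLO[6], SSD[7], YOLOv2[8], RetinaNet[9], YOLOv3[10], and YOLOv4[11] classify and localize semantic objects directly through dense sampling, using predefined prior boxes of different scales and aspect ratios to locate objects. Whereas single-stage detectors complete classification and regression in one pass, two-stage detectors can precisely refine their preliminary predictions during inference and therefore achieve higher detection accuracy[12].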

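Although the detectors above perform well, they are hard to apply directly to few-shot tasks. YOLOMAML[13] trains a YOLO detector with the MAML[14] method, but in essence it trains the network directly on few-shot data. Its experiments show many false detections, missed detections, and inaccurately localized regression boxes. That work concludes that training a network directly on few-shot data is not effective and that new methods are needed. To verify the effectiveness of meta-knowledge in few-shot detection, Meta R-CNN[15] trains Faster R-CNN directly on few-shot data and compares the performance of Meta R-CNN and Faster R-CNN. The results show that the Faster R-CNN detector reaches a detection accuracy of only 2.7 on 1 shot, whereas Meta R-CNN reaches 19.9. Ordinary detectors are therefore not up to the few-shot detection task, and detection methods suited to few-shot tasks are needed.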

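Few-shot learning can effectively address the real-world lack of large amounts of training data and the appearance of previously unseen novel classes[16-17]. According to the approach taken during learning, few-shot learning methods can be divided into data augmentation, transfer-learning-based methods, metric learning, and meta-learning-based methods.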




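Meta-learning aims to find the globally optimal parameters of a neural network that are sensitive to each task, so that fine-tuning these parameters makes the model's loss function converge quickly. MAML[14], Reptile[34], Meta-SGD[35], and Meta-LSTM[36] obtain a set of globally optimal initialization parameters by learning from many different few-shot tasks. Meta R-CNN[15] applies masks to the annotated regions of images, introduces class-attention vectors, fuses them with the extracted features, and restructures the prediction head to perform few-shot object detection. DCNet[27] proposes a context-aware dense relation distillation method on top of a meta-learning framework; it uses the support-set features to capture fine-grained features of the query image and obtain a more comprehensive feature representation for few-shot object detection. Meta-yolo[37] uses a meta-feature learner and a re-weighting module to address few-shot object detection. Meta-DETR[38] proposes an inter-class correlational meta-learning strategy that aggregates the query features with multiple support classes simultaneously, capturing inter-class correlations and strengthening the model's generalization ability.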

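Current mainstream methods optimize the model mainly by improving feature extraction and do not make full use of the sample information. Samples can be used not only for learning but also, at inference time, for calibrating detection boxes. To address this, we propose a few-shot object detection framework with online inferential calibration (few-shot object detection via online inferential calibration, FSOIC). We introduce a new Attention-FPN network and multiple ROI modules on the backbone, which learn novel-class knowledge without affecting base-class information and thereby guide the RPN to extract more high-quality novel-class foreground objects, alleviating missed detections. Meanwhile, the online calibration module designed in this paper calibrates the candidate scores with class-template features, prompting the prediction head to select more accurate predictions and reducing false detections. We conduct extensive experiments on the three novel-class subsets of the VOC dataset and compare with 14 mainstream algorithms; quantitative and qualitative results show that the algorithm effectively improves detection performance.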

2 Related work

2.1 Faster R-CNN

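Faster R-CNN[5] is a classic two-stage network in object detection, consisting of a backbone network, an RPN module, an ROI module, and prediction heads, as shown in Fig. 1.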

Fig. 1. Faster R-CNN network architecture

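The backbone of Faster R-CNN is not fixed; both VGG and ResNet can serve as its feature extraction network[39]. In the first stage, the whole image is fed into the backbone for feature extraction, after which the RPN generates a large number of region proposals and performs binary classification to decide whether each box is foreground or background. In the second stage, the ROI module resizes every region proposal to a fixed size and selects the regions of interest, and finally the classification head and the regression head perform classification and localization. Owing to its strong detection performance and its ability to distinguish foreground from background, Faster R-CNN can overcome the inability of single-stage detectors to locate targets when training samples are scarce, which makes it better suited to few-shot object detection.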

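LSTD[26] uses SSD[7] to design bounding-box regression and Faster R-CNN to design object classification for few-shot tasks, selecting proposals according to the scores of the candidate foreground objects extracted by the RPN. MetaDet[40] introduces a weight-prediction meta-model and, with Faster R-CNN as the framework, trains the meta-model for parameterized weight prediction end to end. It treats the RPN as a category-agnostic component and uses meta-knowledge from the base classes to facilitate the generation of novel classes, completing the few-shot detection task. Current mainstream few-shot detectors such as FSCE[25], TIP[41], and FSDetView[42] all adopt Faster R-CNN as the detector. However, these detectors do not make full use of the sample information. This paper further optimizes Faster R-CNN by designing an online calibration module that fully exploits the sample information and improves detection accuracy.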

2.2 FPN



2.3 TFA

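The two-stage fine-tuning approach (TFA)[44] is a model learning strategy that defines the categories with abundant labeled data as base classes and the categories with only a few labeled samples as novel classes. In the first stage, the network is pre-trained on base-class objects so that its backbone acquires good feature extraction ability. In the second stage, a small number of base-class and novel-class objects are fed into the model; the backbone parameters are frozen to prevent the feature extractor from overfitting to the scarce data, and only the last-layer classifier is fine-tuned so that it can distinguish the different categories.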
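The two TFA stages described above can be pictured as a choice of which parameters receive gradient updates. The sketch below is only an illustration under stated assumptions: the parameter names and the stage labels are hypothetical and do not come from the paper or from any particular detection library.

```python
# Hypothetical parameter names for a Faster R-CNN-style detector (illustration only).
PARAMETER_NAMES = [
    "backbone.conv1.weight",
    "rpn.objectness.weight",
    "roi_head.fc1.weight",
    "box_regressor.weight",
    "box_classifier.weight",
]


def trainable_parameters(stage, names=PARAMETER_NAMES):
    """Return the parameter names that are updated in the given TFA stage."""
    if stage == "base_training":
        # Stage 1: the whole network is pre-trained on base-class objects.
        return list(names)
    if stage == "few_shot_fine_tuning":
        # Stage 2: everything is frozen except the last-layer classifier.
        return [n for n in names if n.startswith("box_classifier.")]
    raise ValueError(f"unknown stage: {stage}")


if __name__ == "__main__":
    for stage in ("base_training", "few_shot_fine_tuning"):
        print(stage, "->", trainable_parameters(stage))
```

In a real framework the same decision would typically be expressed by toggling a per-parameter "requires gradient" flag; the list-based form here only makes the selection rule explicit.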

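TFA w/cos[44] uses Faster R-CNN as the detector and adopts the TFA learning strategy, fine-tuning only the last layer of the detector and fixing the remaining parameters. The results show that TFA significantly improves detection accuracy and outperforms meta-learning-based methods. After TFA w/cos[44] was proposed, subsequent mainstream few-shot detection algorithms such as TFA w/cos+Halluc[18] and Retentive R-CNN[45] adopted the TFA learning strategy. Based on experimental analysis, we consider the TFA learning strategy too uniform to guide the detector in fitting its parameters across different shot tasks. We therefore optimize the TFA w/cos[44] method and propose a hierarchical freezing learning mechanism that lets the detection model fit the network parameters better during training.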

3 Method


3.1 Overall FSOIC framework

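The improved Faster R-CNN detection framework consists of the Attention-FPN backbone, the RPN module, multiple ROI modules, the online calibration module, and the prediction heads, as shown in Fig. 2.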

Fig. 2. FSOIC network architecture


Fig. 3. Detection results based on TFA


Table 1. Hierarchical freezing mechanism



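The classification head and the regression head output the category and the position coordinates of each predicted box, respectively. During training, the loss function of the model consists of four parts: the candidate-box loss of the RPN module, $Loss_{rpn\_loc}$ (denoted $L_1$); the objectness classification loss of the region proposals, $Loss_{rpn\_cls}$ (denoted $L_2$); the regression-head loss, $Loss_{box\_reg}$ (denoted $L_3$); and the classification-head loss, $Loss_{cls}$ (denoted $L_4$). The total loss function (denoted $L_{tot}$) is defined as follows: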
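The displayed equation is not present in this copy of the text. A minimal reconstruction, assuming the four terms are simply summed with unit weights (the weighting is an assumption and is not confirmed by the surviving text), would read:

```latex
L_{tot} = L_1 + L_2 + L_3 + L_4
```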


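The Smooth L1 loss (denoted $L_s$) is used to compute the RPN candidate-box loss $Loss_{rpn\_loc}$ ($L_1$) and the regression-head loss $Loss_{box\_reg}$ ($L_3$), where $x_i$ is the position of the labeled object and $y_i$ is the position of the predicted object. The Smooth L1 loss is defined as follows: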
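As a hedged illustration of the Smooth L1 loss, the sketch below uses the commonly used form: with d = |x_i - y_i|, each term is 0.5·d² when d < 1 and d - 0.5 otherwise. The threshold of 1 and the averaging over coordinates are assumptions, since the exact variant used in the paper is not visible here.

```python
def smooth_l1(labels, predictions):
    """Mean Smooth L1 loss between label positions x_i and predicted positions y_i.

    Assumes the common form with a transition point at |x_i - y_i| = 1.
    """
    if len(labels) != len(predictions) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for x, y in zip(labels, predictions):
        d = abs(x - y)
        # Quadratic near zero (smooth gradient), linear for large errors (robust to outliers).
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(labels)


if __name__ == "__main__":
    # One small error (0.5) and one large error (2.0).
    print(smooth_l1([0.0, 0.0], [0.5, 2.0]))
```

The quadratic branch keeps gradients small for nearly correct boxes, while the linear branch stops a few badly localized boxes from dominating the regression loss.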


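The binary cross-entropy function measures the distance between two probability distributions, with the predicted object defined as $x_i$ and the label defined as $y_i$. When computing the objectness classification loss of the region proposals, $Loss_{rpn\_cls}$ ($L_2$), $x_i$ indicates whether the label contains an object and $y_i$ is the foreground object; when computing the classification loss of the prediction head, $Loss_{cls}$ ($L_4$), $x_i$ is the label category and $y_i$ is the predicted category. The binary cross-entropy loss ($L_e$) is defined as follows: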
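For reference, a minimal sketch of the standard binary cross-entropy is given below. To avoid ambiguity about which symbol is the prediction, the code uses the hypothetical names `predictions` (probabilities in [0, 1]) and `labels` (0 or 1); the averaging over samples and the clipping constant `eps` are assumptions added for numerical safety.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean binary cross-entropy: -[t*log(p) + (1 - t)*log(1 - p)] averaged over samples."""
    if len(predictions) != len(labels) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(predictions, labels):
        # Clip the probability so that log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(predictions)


if __name__ == "__main__":
    # A confident correct foreground score costs little; a confident wrong one costs a lot.
    print(binary_cross_entropy([0.9], [1]))
    print(binary_cross_entropy([0.9], [0]))
```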


3.2 Attention-FPN backbone

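Tsung-Yi Lin et al. point out in the FPN paper[43] that high-resolution feature maps have weak representational power for object recognition. Bottom-layer features carry stronger location information, but their lack of a global view leaves their semantic information weak. Top-layer features have low resolution but rich semantic information. Fusing top-layer features with bottom-layer features effectively strengthens the semantic information of the bottom-layer features. To reduce misclassification and missed detections, the RPN must be guided to extract more novel-class objects. We therefore design Attention-FPN, a top-down attention-based multi-scale feature fusion network. By strengthening the semantic information of the bottom-layer features, it gives the RPN richer novel-class knowledge and lets it extract more accurate novel-class foreground objects, as shown in Fig. 4.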

Fig. 4. Attention-FPN network architecture


Fig. 5. Channel attention module

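The feature maps output by the backbone serve as the input of the RPN module for generating region proposals. In addition, we introduce four ROI Align pooling layers that select regions from, and fix the size of, the candidate-region feature maps at different scales; the extracted candidate feature maps are then passed to the prediction heads for classification and regression.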
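To make the channel attention module of Fig. 5 more concrete, the following sketch follows the description in the overview: the feature map is compressed to one value per channel, passed through two fully connected layers, and squashed by a sigmoid to give one weight per channel. It is a simplified illustration rather than the authors' implementation. The use of global average pooling for the compression step, the ReLU between the two layers, the absence of bias terms, and the toy weight matrices are all assumptions.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Re-weight the channels of a C x H x W feature map given as nested lists.

    w1 has shape hidden x C and w2 has shape C x hidden (both hypothetical weights).
    Returns the per-channel weights and the re-weighted feature map.
    """
    # Compress: global average pooling turns every channel into a single scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # First fully connected layer followed by ReLU (the ReLU is an assumption).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Second fully connected layer followed by a sigmoid: one weight in (0, 1) per channel.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale each channel by its weight.
    scaled = [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(feature_map, weights)
    ]
    return weights, scaled


if __name__ == "__main__":
    fmap = [
        [[1.0, 3.0], [5.0, 7.0]],  # channel 0, mean 4.0
        [[0.0, 0.0], [0.0, 0.0]],  # channel 1, mean 0.0
    ]
    w1 = [[0.5, 0.5]]              # 2 channels -> 1 hidden unit
    w2 = [[1.0], [-1.0]]           # 1 hidden unit -> 2 channel weights
    weights, scaled = channel_attention(fmap, w1, w2)
    print(weights)
```

In the toy example the first channel receives the larger weight, so its activations are preserved while the second channel is suppressed, which is the selective fusion behaviour the text describes.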

3.3 Candidate box calibration




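The class-template generation module consists of the Faster R-CNN backbone feature extraction network, four ROI modules, and two fully connected layers, which together form a classifier. The network structure is shown in Fig. 6.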

Fig. 6. FSOIC algorithm class template generation module

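First, the backbone encodes the sample images and generates several feature maps at different scales. Next, the ROI modules crop the feature maps according to the label positions that come with the samples, filtering out the background in the image, and convert the multi-scale feature maps into fixed-size feature maps. Finally, two fully connected layers convert the feature maps into a feature vector of size 1×1024.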


Fig. 7. Feature metric space

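The template generation module denotes the j-th generated class vector as $y_j$ and forms the class template $x$ by weighted summation, as in Eq. (4):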


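The template matching module denotes the template of class i as $x_i$ and the feature of a candidate prediction as $p_i$, and computes the cosine similarity between the class-template vector and the compressed feature vector of the predicted object, as in Eq. (5):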
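The displayed equation is not present in this copy of the text. With the symbols defined above, and taking $p_i$ to be the compressed feature vector of the candidate (an assumption about the notation), the standard cosine similarity reads:

```latex
s_i^{cos} = \frac{x_i \cdot p_i}{\lVert x_i \rVert \, \lVert p_i \rVert}
```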


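The candidate-box score calibration module denotes the original candidate score as $s_i$ and the cosine similarity as $s_i^{cos}$. It assigns a weight α to the original score and combines it with the similarity by weighted summation to calibrate the score of the box, as in Eq. (6):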
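The three steps of this section (template generation, template matching, score calibration) can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the authors' code: the class template is taken as a uniform average when no weights are given, the calibrated score is assumed to be α·s_i + (1 - α)·s_i^cos, and the default α = 0.5 and the two-dimensional toy vectors are purely illustrative (the paper uses 1×1024 vectors).

```python
import math


def class_template(class_vectors, weights=None):
    """Weighted sum of the class vectors y_j; uniform weights are an assumption."""
    if weights is None:
        weights = [1.0 / len(class_vectors)] * len(class_vectors)
    dim = len(class_vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, class_vectors)) for k in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def calibrate_score(score, cos_sim, alpha=0.5):
    """Assumed calibration rule: alpha * original score + (1 - alpha) * similarity."""
    return alpha * score + (1.0 - alpha) * cos_sim


if __name__ == "__main__":
    template = class_template([[1.0, 0.0], [0.0, 1.0]])  # -> [0.5, 0.5]
    matching = [2.0, 2.0]      # same direction as the template
    unrelated = [1.0, -1.0]    # orthogonal to the template
    for feature in (matching, unrelated):
        sim = cosine_similarity(template, feature)
        print(calibrate_score(0.2, sim))
```

A candidate that the detector scored low (0.2) but that matches the class template is lifted above a typical score threshold, while an unrelated candidate is pushed down, which matches the stated goal of recovering missed novel-class objects and suppressing false ones.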




Fig. 8. Performance comparison of the detection results

Fig. 9. Detection results under the occlusion conditions in the 10 shot task

4 Experiments

4.1 Experimental settings

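Experiments are run on a server with eight NVIDIA GeForce RTX 3090 GPUs. On the Pascal VOC dataset, we divide few-shot object detection into five tasks by the number of training samples: 1, 2, 3, 5, and 10 shot. On the COCO dataset, we divide the detection task into 10 shot and 30 shot. The experimental parameters are listed in Table 2.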

Table 2. Experimental settings of the dataset

| Dataset | Shot | Number of categories | Initial learning rate | Batch_size | Decay ratio of learning rate | Number of attenuation | Iterations |
|---|---|---|---|---|---|---|---|


4.2 Comparison of experimental results

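To further validate the improved algorithm, we compare FSOIC with state-of-the-art few-shot object detection algorithms on the three novel-class subsets of the general-purpose PASCAL VOC dataset. The test set contains 4952 images with 14976 object instances. The results are given in Table 3.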

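Table 3. Performance analysis and comparison of few-shot object detection algorithms on the VOC novel-class partition sets (column "Set k / n" is Novel Set k, n shot)

| Method | Year | Set 1 / 1 | Set 1 / 2 | Set 1 / 3 | Set 1 / 5 | Set 1 / 10 | Set 2 / 1 | Set 2 / 2 | Set 2 / 3 | Set 2 / 5 | Set 2 / 10 | Set 3 / 1 | Set 3 / 2 | Set 3 / 3 | Set 3 / 5 | Set 3 / 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MetaDet[40] | ICCV 19 | 18.9 | 20.6 | 30.2 | 36.8 | 49.6 | 21.8 | 23.1 | 27.8 | 31.7 | 43.0 | 20.6 | 23.9 | 29.4 | 43.9 | 44.1 |
| Meta R-CNN[15] | ICCV 19 | 19.9 | 25.5 | 35.0 | 45.7 | 51.5 | 10.4 | 19.4 | 29.6 | 34.8 | 45.4 | 14.3 | 18.2 | 27.5 | 41.2 | 48.1 |
| RepMet[28] | CVPR 19 | 26.1 | 32.9 | 34.4 | 38.6 | 41.3 | 17.2 | 22.1 | 23.4 | 28.3 | 35.8 | 27.5 | 31.1 | 31.5 | 34.4 | 37.2 |
| FSRW[37] | ICCV 19 | 14.8 | 15.5 | 26.7 | 33.9 | 47.2 | 15.7 | 15.3 | 22.7 | 30.1 | 40.5 | 21.3 | 25.6 | 28.4 | 42.8 | 45.9 |
| FSDetView[42] | ECCV 20 | 24.2 | 35.3 | 42.2 | 49.1 | 57.4 | 21.6 | 24.6 | 31.9 | 37.0 | 45.7 | 21. | | | | |
| TFA w/cos[44] | ICML 20 | 39.8 | 36.1 | 44.7 | 55.7 | 56.0 | 23.5 | 26.9 | 34. | | | | | | | |
| MPSR[51] | ECCV 20 | 41.7 | - | 51.4 | 55.2 | 61.8 | 24.4 | - | 39.2 | 39.9 | 47.8 | 35.6 | - | 42.3 | 48.0 | 49.7 |
| TFA w/cos+Halluc[18] | CVPR 21 | 45.1 | 44.0 | 44.7 | 55.0 | 55.9 | 23.2 | 27.5 | 35.1 | 34.9 | 39.0 | 30.5 | 35.1 | 41.4 | 49.0 | 49.3 |
| TIP[41] | CVPR 21 | 27.7 | 36.5 | 43.3 | 50.2 | 59.6 | 22.7 | 30.1 | 33.8 | 40.9 | 46.9 | 21.7 | 30.6 | 38.1 | 44.5 | 50.9 |
| FSCE[25] | CVPR 21 | 44.2 | 43.8 | 51.4 | 61.9 | 63.4 | 27.3 | 29.5 | 43.5 | 44. | | | | | | |
| Retentive R-CNN[45] | CVPR 21 | 42.4 | 45.8 | 45.9 | 53.7 | 56.1 | 21.7 | 27.8 | 35.2 | 37.0 | 40.3 | 30.2 | 37.6 | 43.0 | 49.7 | 50.1 |
| Meta-DETR[38] | IEEE 22 | 35. | | | | | | | | | | | | | | |
| AGCM[33] | IEEE 22 | 40.3 | - | - | 58.5 | 59.9 | 27.5 | - | - | 49.3 | 50.6 | 42.1 | - | - | 54.2 | 58.2 |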


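In Table 3, the best results are marked in red and the second-best in blue. Table 3 shows that our detection algorithm outperforms current mainstream few-shot object detection algorithms. Using nAP50 (novel AP50) as the metric, on Novel Set 1 the average precision over the five tasks improves by 10.16% over TFA w/cos[44] and is 3.68% higher than FSCE, the method with the best overall performance among the others. Averaged over the five tasks on the three VOC subsets, nAP50 improves by 9.05%.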

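Results for few-shot detection on the COCO dataset are given in Table 4. Compared with state-of-the-art few-shot detectors, FSOIC again achieves the best performance. Using nAP (novel AP) as the metric, the average precision over the two tasks improves by 2.85% over the baseline TFA w/cos[44] and is 0.55% higher than FSCE.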

Table 4. Performance analysis and comparison of few-shot object detection algorithms on the COCO dataset

| Method | Year | Novel AP (10 shot) | Novel AP (30 shot) |
|---|---|---|---|
| LSTD[26] | AAAI 18 | 3.2 | 6.7 |
| FSRW[37] | ICCV 19 | 5.6 | 9.1 |
| MPSR[51] | ECCV 20 | 9.8 | 14.1 |
| TFA w/cos[44] | ICML 20 | 10.0 | 13.7 |
| Retentive R-CNN[45] | CVPR 21 | 10.5 | 13.8 |
| FSCE[25] | CVPR 21 | 11.9 | 16.4 |
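The reported gains can be cross-checked against the TFA w/cos and FSCE rows of Tables 3 and 4. The short script below is only a consistency check, under the assumption that the stated improvements are absolute differences in average precision points: it adds each reported gain to the corresponding baseline mean and compares the two implied FSOIC means. The implied means are derived values, not numbers quoted from the paper.

```python
from statistics import mean

# Novel Set 1 nAP50 for 1, 2, 3, 5, and 10 shot, copied from Table 3.
voc_tfa = [39.8, 36.1, 44.7, 55.7, 56.0]
voc_fsce = [44.2, 43.8, 51.4, 61.9, 63.4]

# COCO novel AP for 10 and 30 shot, copied from Table 4.
coco_tfa = [10.0, 13.7]
coco_fsce = [11.9, 16.4]

# Implied FSOIC means if the reported gains are absolute point differences.
voc_implied_from_tfa = mean(voc_tfa) + 10.16
voc_implied_from_fsce = mean(voc_fsce) + 3.68
coco_implied_from_tfa = mean(coco_tfa) + 2.85
coco_implied_from_fsce = mean(coco_fsce) + 0.55

if __name__ == "__main__":
    print(round(voc_implied_from_tfa, 2), round(voc_implied_from_fsce, 2))
    print(round(coco_implied_from_tfa, 2), round(coco_implied_from_fsce, 2))
```

Both pairs agree with each other, which suggests that the percentages quoted in the text are differences in precision points rather than relative improvements.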


4.3 Ablation study


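Table 5. Comparison of the ablation experimental performance

| Method | FPN+4*ROI | Finetune RPN | Online calibration | Attention of channel | Novel Set 1 (1 shot) | Novel Set 1 (3 shot) | Novel Set 1 (10 shot) |
|---|---|---|---|---|---|---|---|
| TFA w/cos[44] | - | - | - | - | 39.8 | 44.7 | 56.0 |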



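To verify the actual detection performance of each module, Fig. 10 compares detection results before and after optimization. Fig. 10(a), (b), and (c) show the results of the TFA w/cos[44] algorithm, the model with the online inferential calibration module, and the model with both the online inferential calibration module and Attention-FPN, respectively. Objects missed in Fig. 10(a) are detected in Fig. 10(b), and the scores of some objects rise substantially. This comparison shows that the online inferential calibration module effectively addresses missed detections and overly low object scores. Objects with low scores in Fig. 10(b) receive higher scores in Fig. 10(c), and the predicted boxes in Fig. 10(c) contain less background and are localized more precisely. It follows that, with Attention-FPN, the feature maps output by the backbone carry richer semantic information, so the RPN generates better novel-class foreground boxes, which indirectly guides the prediction head to select more accurate boxes with higher scores.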

Fig. 10. Detection results in the 10 shot task. (a) Detection results of the Faster R-CNN network based on TFA; (b) Detection results of the Faster R-CNN network using the online inferential calibration module; (c) Detection results of the Faster R-CNN network using the online inferential calibration module and the added Attention-FPN network

5 Conclusion

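To address false and missed detections in few-shot detection, this paper optimizes the TFA w/cos[44] algorithm. In the training stage, we introduce Attention-FPN and multiple ROI modules and use a hierarchical freezing learning strategy to guide the RPN in learning novel-class knowledge, improving the network's ability to extract novel-class features. In the prediction stage, we introduce a score calibration module that corrects the predicted scores of the candidates and filters out low-scoring candidate boxes, correcting false detections. By adjusting the RPN module to increase the number of candidate boxes, more candidate boxes are calibrated and missed detections are avoided. Experimental results show that the proposed FSOIC algorithm effectively improves detector performance on few-shot object detection. In future work, we plan to optimize the RPN module with a dual-RPN structure that extracts features for base-class and novel-class objects separately and selects different features according to the predicted category, improving recognition and regression accuracy.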


[1] Chen X, Peng D L, Gu Y. Real-time object detection for UAV images based on improved YOLOv5s[J]. Opto-Electron Eng, 2022, 49(3): 210372. https://doi.org/10.12086/oee.2022.210372.

[2] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580−587. https://doi.org/10.1109/CVPR.2014.81.

[3] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Trans Pattern Anal Mach Intell, 2015, 37(9): 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824.

[4] Girshick R. Fast R-CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015: 1440–1448. https://doi.org/10.1109/ICCV.2015.169.

[5] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems, 2015: 91–99.

[6] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779–788. https://doi.org/10.1109/CVPR.2016.91.

[7] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//14th European Conference on Computer Vision, 2016: 21–37. https://doi.org/10.1007/978-3-319-46448-0_2.

[8] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517–6525. https://doi.org/10.1109/CVPR.2017.690.

[9] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, 2017: 2999−3007. https://doi.org/10.1109/ICCV.2017.324.

[10] Redmon J, Farhadi A. YOLOv3: an incremental improvement[Z]. arXiv: 1804.02767, 2018. https://arxiv.org/abs/1804.02767.

[11] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[Z]. arXiv: 2004.10934, 2020. https://arxiv.org/abs/2004.10934.

[12] Ma L, Gou Y T, Lei T, et al. Small object detection based on multi-scale feature fusion using remote sensing images[J]. Opto-Electron Eng, 2022, 49(4): 210363. https://doi.org/10.12086/oee.2022.210363.

[13] Bennequin E. Meta-learning algorithms for few-shot computer vision[Z]. arXiv: 1909.13579, 2019. https://arxiv.org/abs/1909.13579.

[14] Behl H S, Baydin A G, Torr P H S. Alpha MAML: adaptive model-agnostic meta-learning[Z]. arXiv: 1905.07435, 2019. https://arxiv.org/abs/1905.07435.

[15] Yan X P, Chen Z L, Xu A N, et al. Meta R-CNN: towards general solver for instance-level low-shot learning[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 9576–9585. https://doi.org/10.1109/ICCV.2019.00967.

[16] Wang Y Q, Yao Q M. Few-shot learning: a survey[Z]. arXiv: 1904.05046v1, 2019. https://arxiv.org/abs/1904.05046v1.

[17] Duan Y, Andrychowicz M, Stadie B, et al. One-shot imitation learning[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 1087–1098.

[18] Zhang W L, Wang Y X. Hallucination improves few-shot object detection[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 13003–13012. https://doi.org/10.1109/CVPR46437.2021.01281.

[19] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//2017 IEEE International Conference on Computer Vision, 2017: 2242–2251. https://doi.org/10.1109/ICCV.2017.244.

[20] Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014: 2672–2680.

[21] Li K, Zhang Y L, Li K P, et al. Adversarial feature hallucination networks for few-shot learning[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 13467–13476. https://doi.org/10.1109/CVPR42600.2020.01348.

[22] Hui B Y, Zhu P F, Hu Q H, et al. Self-attention relation network for few-shot learning[C]//2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2019: 198–203. https://doi.org/10.1109/ICMEW.2019.00041.

[23] Hao F S, Cheng J, Wang L, et al. Instance-level embedding adaptation for few-shot learning[J]. IEEE Access, 2019, 7: 100501–100511. https://doi.org/10.1109/ACCESS.2019.2906665.

[24] Schönfeld E, Ebrahimi S, Sinha S, et al. Generalized zero-and few-shot learning via aligned variational autoencoders[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 8239–8247. https://doi.org/10.1109/CVPR.2019.00844.

[25] Sun B, Li B H, Cai S C, et al. FSCE: few-shot object detection via contrastive proposal encoding[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 7348–7358. https://doi.org/10.1109/CVPR46437.2021.00727.

[26] Chen H, Wang Y L, Wang G Y, et al. LSTD: a low-shot transfer detector for object detection[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 346.

[27] Hu H Z, Bai S, Li A X, et al. Dense relation distillation with context-aware aggregation for few-shot object detection[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 10180–10189. https://doi.org/10.1109/CVPR46437.2021.01005.

[28] Karlinsky L, Shtok J, Harary S, et al. RepMet: representative-based metric learning for classification and few-shot object detection[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 5197–5206. https://doi.org/10.1109/CVPR.2019.00534.

[29] Jiang W, Huang K, Geng J, et al. Multi-scale metric learning for few-shot learning[J]. IEEE Trans Circuits Syst Video Technol, 2021, 31(3): 1091–1102. https://doi.org/10.1109/TCSVT.2020.2995754.

[30] Sung F, Yang Y X, Zhang L, et al. Learning to compare: relation network for few-shot learning[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 1199–1208. https://doi.org/10.1109/CVPR.2018.00131.

[31] Tao X Y, Hong X P, Chang X Y, et al. Few-shot class-incremental learning[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 12180–12189.

[32] Wang Y, Wu X M, Li Q M, et al. Large margin few-shot learning[Z]. arXiv: 1807.02872, 2018. https://doi.org/10.48550/arXiv.1807.02872.

[33] Agarwal A, Majee A, Subramanian A, et al. Attention guided cosine margin to overcome class-imbalance in few-shot road object detection[C]//2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), 2022: 221–230. https://doi.org/10.1109/WACVW54805.2022.00028.

[34] Nichol A, Achiam J, Schulman J. On first-order meta-learning algorithms[Z]. arXiv: 1803.02999, 2018. https://arxiv.org/abs/1803.02999.

[35] Li Z G, Zhou F W, Chen F, et al. Meta-SGD: learning to learn quickly for few-shot learning[Z]. arXiv: 1707.09835, 2017. https://arxiv.org/abs/1707.09835.

[36] Ravi S, Larochelle H. Optimization as a model for few-shot learning[C]//5th International Conference on Learning Representations, 2016.

[37] Kang B Y, Liu Z, Wang X, et al. Few-shot object detection via feature reweighting[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019: 8419–8428. https://doi.org/10.1109/ICCV.2019.00851.

[38] Zhang G J, Luo Z P, Cui K W, et al. Meta-DETR: image-level few-shot detection with inter-class correlation exploitation[J]. IEEE Trans Pattern Anal Mach Intell, 2022. https://doi.org/10.1109/TPAMI.2022.3195735.

[39] Ma W, Yu J, Wang X, et al. Garbage detection and classification method based on improved faster R-CNN[J]. Comput Eng, 2021, 47(8): 294–300. https://doi.org/10.19678/j.issn.1000-3428.0058258.

[40] Wang Y X, Ramanan D, Hebert M. Meta-learning to detect rare objects[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019: 9924–9933. https://doi.org/10.1109/ICCV.2019.01002.

[41] Li A X, Li Z G. Transformation invariant few-shot object detection[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3093–3101. https://doi.org/10.1109/CVPR46437.2021.00311.

[42] Xiao Y, Marlet R. Few-shot object detection and viewpoint estimation for objects in the wild[C]//16th European Conference on Computer Vision, 2020: 192−210. https://doi.org/10.1007/978-3-030-58520-4_12.

[43] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 936−944. https://doi.org/10.1109/CVPR.2017.106.

[44] Wang X, Huang T, Gonzalez J, et al. Frustratingly simple few-shot object detection[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 9919–9928.

[45] Fan Z B, Ma Y C, Li Z M, et al. Generalized few-shot object detection without forgetting[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4525−4534. https://doi.org/10.1109/CVPR46437.2021.00450.

[46] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional siamese networks for object tracking[C]//14th European Conference on Computer Vision, 2016: 850–865. https://doi.org/10.1007/978-3-319-48881-3_56.

[47] Li B, Yan J J, Wu W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 8971−8980. https://doi.org/10.1109/CVPR.2018.00935.

[48] Zhu Z, Wang Q, Li B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 103–119. https://doi.org/10.1007/978-3-030-01240-3_7.

[49] Zhao C M, Chen Z B, Zhang J L. Research on target tracking based on convolutional networks[J]. Opto-Electron Eng, 2020, 47(1): 180668. https://doi.org/10.12086/oee.2020.180668.

[50] Zhao C M, Chen Z B, Zhang J L. Application of aircraft target tracking based on deep learning[J]. Opto-Electron Eng, 2019, 46(9): 180261. https://doi.org/10.12086/oee.2019.180261.

[51] Wu J X, Liu S T, Huang D, et al. Multi-scale positive sample refinement for few-shot object detection[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 456–472. https://doi.org/10.1007/978-3-030-58517-4_27.

Hao Peng, Wanqi Wang, Long Chen, Xianrong Peng, Jianlin Zhang, Zhiyong Xu, Yuxing Wei, Meihui Li. Few-shot object detection via online inferential calibration[J]. Opto-Electronic Engineering, 2023, 50(1): 220180.

