2018-10-12
这篇文章介绍两篇 ECCV 2018最新的 paper,一篇提出IoU-Net,用来学习来预测每个检测到的边界框与匹配的ground truth 之间的IoU。 网络获得了定位置信度,通过保留精确的定位边界框来改进NMS。 此外,提出了一种基于优化的边界框细化方法,其中将预测的IoU表示为目标;另一篇提出DetNet,这是一种专门用于物体检测的新型 backbone 网络。
《Acquisition of Localization Confidence for Accurate Object Detection》
Abstract:Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. This makes properly localized bounding boxes degenerate during iterative regression or even suppressed during NMS. In the paper we propose IoU-Net learning to predict the IoU between each detected bounding box and the matched ground-truth. The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, where the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.
摘要:现代的基于CNN的物体检测器依靠边界框回归和非最大抑制(NMS)来定位对象。 虽然类标签的概率自然反映了分类置信度(classification confidence),但缺乏定位置信度(localization confidence)。 这使得正确定位的边界框在迭代回归期间 degenerate 或甚至在NMS期间被抑制。 在本文中,我们提出了IoU-Net学习来预测每个检测到的边界框与匹配的ground truth 之间的IoU。 网络获得了定位置信度,通过保留精确的定位边界框来改进NMS。 此外,提出了一种基于优化的边界框细化方法,其中将预测的IoU表示为目标。 MS-COCO数据集上的大量实验表明了IoU-Net的有效性,以及它与几种最先进的物体探测器的兼容性和适应性。
arXiv:https://arxiv.org/abs/1807.11590
注:源码未放出
《DetNet: A Backbone network for Object Detection》
Abstract:Recent CNN based object detectors, no matter one-stage methods like YOLO, SSD, and RetinaNe or two-stage detectors like Faster R-CNN, R-FCN and FPN are usually trying to directly finetune from ImageNet pre-trained models designed for image classification. There has been little work discussing on the backbone feature extractor specifically designed for the object detection. More importantly, there are several differences between the tasks of image classification and object detection. 1. Recent object detectors like FPN and RetinaNet usually involve extra stages against the task of image classification to handle the objects with various scales. 2. Object detection not only needs to recognize the category of the object instances but also spatially locate the position. Large downsampling factor brings large valid receptive field, which is good for image classification but compromises the object location ability. Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection. Moreover, DetNet includes the extra stages against traditional backbone network for image classification, while maintains high spatial resolution in deeper layers. Without any bells and whistles, state-of-the-art results have been obtained for both object detection and instance segmentation on the MSCOCO benchmark based on our DetNet~(4.8G FLOPs) backbone. The code will be released for the reproduction.
摘要:最近的基于CNN的物体探测器,无论是像YOLO,SSD和RetinaNet这样的one-stage方法,还是像Faster R-CNN,R-FCN和FPN这样的two-stage探测器,都经常试图直接从ImageNet预先训练好的图像模型中进行微调分类。关于专门为物体检测设计的 backbone 特征提取器的讨论很少。更重要的是,图像分类和对象检测的任务之间存在若干差异。(1)最近的物体探测器如FPN和RetinaNet通常涉及额外的阶段,以防止图像分类的任务处理各种尺度的物体。 (2)目标检测不仅需要识别对象实例的类别,还需要在空间上定位位置。较大的下采样因子带来了较大的有效感受野,有利于图像分类,但会损害对象定位能力。由于图像分类和物体检测之间存在差距,本文提出了DetNet,这是一种专门用于物体检测的新型 backbone 网络。此外,DetNet还包括针对传统backbone网络的额外阶段,用于图像分类,同时在更深层中保持高空间分辨率。在没有任何其它tricks的情况下,基于我们的DetNet~(4.8G FLOP)backbone,在MSCOCO基准测试中获得了目标检测和实例分割的最优结果。
arXiv:https://arxiv.org/abs/1804.06215
注:源码未放出