
YOLOv3 - Continue on #1695 #3083

Merged
merged 83 commits into from
Aug 31, 2020

Conversation

ElectronicElephant
Contributor

@ElectronicElephant ElectronicElephant commented Jun 20, 2020

Hi all,

@WenqiangX and I are glad to help implement YOLOv3, and perhaps v4 as well if everything goes well. We are from [MVIG, SJTU](http://mvig.sjtu.edu.cn) and have spent quite a lot of time surveying all kinds of YOLOv3 implementations in PyTorch. We have also spent a lot of time working on the gluon-cv version of YOLOv3, which relies on MXNet, so we are familiar with it.

Months ago, @wuhy08 contributed a lot (#1695), but some problems remain to be solved. I have tested his implementation and found that it can be trained on a single GPU and achieves ~38 mAP at (618, 618) resolution, which is pretty good. However, it fails to train on multiple GPUs and achieves very low mAP even with find_unused_parameters=True. Still, great thanks to him and Western Digital.

Basically, we aim to complete the following list:

  • Merge @wuhy08's code with mmdetection 2.0
  • And test it!
  • Move some hyper-params to the config file
  • Refactor the backbone with the new ConvModule (as mentioned in Implementing YOLOv3 architecture #1695 (comment))
  • Refactor the backbone and neck to support other modules
  • Introduce some great ideas from gluon-cv
  • Convert the Darknet pretrained weights (may be difficult; help is needed)
  • Fix the labels +1 / -1 problem
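As a rough illustration of the "move hyper-params to the config file" item, a YOLOv3 model section in the mmdetection-2.0 dict style might look like the sketch below. The key names (`YOLOV3`, `YOLOV3Neck`, `YOLOAnchorGenerator`, etc.) and values are assumptions for illustration, not the final merged API; the anchor sizes are the familiar ones from the YOLOv3 paper.

```python
# Hypothetical mmdetection-2.0-style config fragment; all type names and
# channel numbers below are illustrative assumptions, not the merged API.
model = dict(
    type='YOLOV3',
    backbone=dict(type='Darknet', depth=53, out_indices=(3, 4, 5)),
    neck=dict(
        type='YOLOV3Neck',
        num_scales=3,
        in_channels=[1024, 512, 256],
        out_channels=[512, 256, 128]),
    bbox_head=dict(
        type='YOLOV3Head',
        num_classes=80,
        # anchor sizes per scale, moved out of the code into the config
        anchor_generator=dict(
            type='YOLOAnchorGenerator',
            base_sizes=[[(116, 90), (156, 198), (373, 326)],
                        [(30, 61), (62, 45), (59, 119)],
                        [(10, 13), (16, 30), (33, 23)]],
            strides=[32, 16, 8])))
```

With hyper-params in the config like this, anchor sizes and strides can be swapped without touching the head code.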

If you have any suggestions, feel free to comment.

Also, I hope the authors of this project can find time to review the code and catch any major problems early, so that we can save some time.

This PR is still at an early stage but is expected to be finished by the end of July.

If everything looks good, could you please close the original PR so that we can continue to work here?

@CLAassistant

CLAassistant commented Jun 20, 2020

CLA assistant check
All committers have signed the CLA.

@hellock hellock requested a review from xvjiarui June 20, 2020 15:46
@xvjiarui xvjiarui added the WIP Working in progress label Jun 21, 2020
@xvjiarui
Collaborator

Hi @ElectronicElephant
Thanks for your contribution! I will review this PR. You could remove the WIP label once finished.

Btw, @wuhy08 @ElectronicElephant may sign CLA first.


@wuhy08
Contributor

wuhy08 commented Jun 22, 2020

Hi @ElectronicElephant @xvjiarui

Sorry for the delay. I have signed the CLA. My effort for implementing YOLOv3 was related to my past project at Western Digital. Since WD has agreed to open source my contribution, I submitted a PR. Now as it is in public domain, you are free to continue to contribute on top of it. I just request you credit WD and me properly.

Best

HW

@ElectronicElephant
Contributor Author

Hi @ElectronicElephant @xvjiarui

Sorry for the delay. I have signed the CLA. My effort for implementing YOLOv3 was related to my past project at Western Digital. Since WD has agreed to open source my contribution, I submitted a PR. Now as it is in public domain, you are free to continue to contribute on top of it. I just request you credit WD and me properly.

Best

HW

Great, thanks!

Frankly, your PR works like a charm on a single GPU, achieving very high mAP. I'll try to make it run on multiple GPUs.

I'll preserve all your commit history, as well as the license.

🤝

@ZwwWayne
Collaborator

Please merge master to resolve conflicts.

@xvjiarui xvjiarui requested a review from hellock August 31, 2020 02:07
@hellock hellock merged commit dfbb6d6 into open-mmlab:master Aug 31, 2020
@LMerCy

LMerCy commented Sep 25, 2020

Hi @sudo-rm-covid19
Thanks for the reminder. I have also noticed that.
So I configured the assigner like this:

assigner=dict(
    type='MaxIoUAssigner',
    # use low_quality only
    pos_iou_thr=1.,
    neg_iou_thr=0.5,
    min_pos_iou=0.,
    gt_max_assign_all=False,
    match_low_quality=True,
    ignore_iof_thr=-1))

Should it be equivalent to the expected output?

The problem occurs when several anchors in a nearby region share the same maximum IoU with a gt. There is then a chance that the positive assignment goes to the first matched anchor, which may not be in the responsible cell. The following code is my implementation based on mmdet v1, where I created responsible flags to mask out irrelevant overlaps (by setting them to -1).

class YOLOAnchorGenerator(AnchorGenerator):
    """Generate YOLO-style anchors.

    Differs from AnchorGenerator in the following ways:
    1. Instead of generating base anchors from scales and ratios, they are
       provided directly.
    2. In addition to anchors and valid flags, it provides anchor strides.
    3. It generates the responsible grid cell given gt_bboxes, stride and
       feature map size.

    Args:
        anchor_base_sizes: list[tuple or list] in the format of (w, h),
            indicating the anchors for one scale.
        ctr: list or tuple in the format of (x, y), indicating the center
            offset for the base anchors.
    """
    def __init__(self, anchor_base_sizes, ctr):
        super(YOLOAnchorGenerator, self).__init__(
            anchor_base_sizes, [], [], ctr=ctr)


    def gen_base_anchors(self):
        self.base_size = torch.Tensor(self.base_size)
        assert len(self.base_size.shape) == 2
        w = self.base_size[:, 0].view(-1)
        h = self.base_size[:, 1].view(-1)
        assert len(self.ctr) == 2
        x_ctr, y_ctr = self.ctr

        # Note the round operation, which may cause the generated anchors'
        # width and height to differ from the input ones.
        base_anchors = torch.stack(
            [
                x_ctr - 0.5 * (w - 1), y_ctr - 0.5 * (h - 1),
                x_ctr + 0.5 * (w - 1), y_ctr + 0.5 * (h - 1)
            ],
            dim=-1).round()

        return base_anchors

    def anchors_stride(self, featmap_size, stride=16, device='cuda'):
        """Record the stride of anchors at the current scale.

        Returns:
            Tensor of shape (K*A, ): the stride of each anchor at the
            current scale.
        """
        feat_h, feat_w = featmap_size
        stride_map = torch.ones(feat_h * feat_w * self.num_base_anchors, device=device) * stride
        stride_map = stride_map.type_as(self.base_anchors)
        return stride_map

    def responsible_flags(self, featmap_size, gt_bboxes, stride, device='cuda'):
        """Generate responsible anchor flags of grid cells.

        Args:
            featmap_size: (height, width) of the feature map
            gt_bboxes: Tensor of shape (n, 4) in (xyxy) format
            stride: stride of the current scale

        Returns:
            Tensor of shape (K*A, ): flags where anchors in the grid cells
            that gt boxes fall into are set to 1.
        """
        feat_h, feat_w = featmap_size
        # it is more convenient to transfer the gt_bboxes into (cx, cy) format
        gt_bboxes_cx = ((gt_bboxes[:, 0] + gt_bboxes[:, 2]) * 0.5).to(device)
        gt_bboxes_cy = ((gt_bboxes[:, 1] + gt_bboxes[:, 3]) * 0.5).to(device)
        gt_bboxes_grid_x = torch.floor(gt_bboxes_cx / stride).long()
        gt_bboxes_grid_y = torch.floor(gt_bboxes_cy / stride).long()
        # row major indexing
        gt_bboxes_grid_idx = gt_bboxes_grid_y * feat_w + gt_bboxes_grid_x

        # Note: the following assertion may interrupt training if abnormal gt_boxes exist
        assert torch.min(gt_bboxes_grid_x) >= 0 and torch.max(gt_bboxes_grid_x) < feat_w
        assert torch.min(gt_bboxes_grid_y) >= 0 and torch.max(gt_bboxes_grid_y) < feat_h

        responsible_grid = torch.zeros(feat_h * feat_w, dtype=torch.uint8, device=device)
        responsible_grid[gt_bboxes_grid_idx] = 1

        responsible_grid = responsible_grid[:,
                      None].expand(responsible_grid.size(0),
                                   self.num_base_anchors).contiguous().view(-1)
        return responsible_grid
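The cell-index arithmetic inside `responsible_flags` can be sketched in plain Python as below; `responsible_cell_index` is a hypothetical helper for illustration, mirroring the center/stride/row-major computation above under the assumption that the gt center lies inside the feature map.

```python
# Minimal sketch of the responsible-cell computation: the GT center,
# divided by the stride and floored, picks the grid cell; row-major
# indexing flattens (gy, gx) into a single index.
import math

def responsible_cell_index(gt_bbox, stride, feat_w):
    x1, y1, x2, y2 = gt_bbox
    cx = (x1 + x2) * 0.5          # gt center x
    cy = (y1 + y2) * 0.5          # gt center y
    gx = math.floor(cx / stride)  # grid column
    gy = math.floor(cy / stride)  # grid row
    return gy * feat_w + gx       # row-major flat index

# On a 13-wide feature map at stride 32, a box centered at (50, 60)
# falls into cell (gx=1, gy=1), i.e. flat index 1 * 13 + 1 = 14.
print(responsible_cell_index((30, 40, 70, 80), stride=32, feat_w=13))  # 14
```

All anchors sharing that flat index (one per base anchor) are then marked responsible, which is what the `expand(...).view(-1)` at the end of `responsible_flags` does.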

class GridAssigner(BaseAssigner):
    """Assign a corresponding gt bbox or background to each bbox.

    Each proposal will be assigned with `-1`, `0`, or a positive integer
    indicating the ground truth index.

    - -1: don't care
    - 0: negative sample, no assigned gt
    - positive integer: positive sample, index (1-based) of assigned gt

    Args:
        pos_iou_thr (float): IoU threshold for positive bboxes.
        neg_iou_thr (float or tuple): IoU threshold for negative bboxes.
        min_pos_iou (float): Minimum IoU for a bbox to be considered a
            positive bbox. Positive samples can have an IoU smaller than
            pos_iou_thr due to the 4th step (assign the max-IoU sample to
            each gt).
        gt_max_assign_all (bool): Whether to assign all bboxes sharing the
            same highest overlap with some gt to that gt.
    """
    def __init__(self,
                 pos_iou_thr,
                 neg_iou_thr,
                 min_pos_iou=.0,
                 gt_max_assign_all=True):
        self.pos_iou_thr = pos_iou_thr
        self.neg_iou_thr = neg_iou_thr
        self.min_pos_iou = min_pos_iou
        self.gt_max_assign_all = gt_max_assign_all

        assert 0 <= pos_iou_thr <= 1 and 0 <= min_pos_iou <= 1

        assert isinstance(neg_iou_thr, (float, tuple, list))
        if isinstance(neg_iou_thr, float):
            assert 0 <= neg_iou_thr <= 1
        else:
            assert len(neg_iou_thr) == 2
            assert neg_iou_thr[0] <= neg_iou_thr[1] \
                and 0 <= min(neg_iou_thr) <= 1 \
                and 0 <= max(neg_iou_thr) <= 1

    def assign(self,
               bboxes,
               box_responsible_flags,
               gt_bboxes,
               gt_labels=None):
        """Assign gt to bboxes. The process is much like the max-IoU
        assigner, except that positive samples are constrained to the cell
        that the gt box falls into.

        This method assigns a gt bbox to every bbox (proposal/anchor); each
        bbox will be assigned -1, 0, or a positive number. -1 means don't
        care, 0 means negative sample, and a positive number is the index
        (1-based) of the assigned gt.
        The assignment is done in the following steps; the order matters.

        1. assign every bbox to -1
        2. assign proposals whose IoU with all gts <= neg_iou_thr to 0
        3. for each bbox within a cell, if its IoU with the nearest gt >
           pos_iou_thr and the center of that gt falls inside the cell,
           assign the gt to that bbox
        4. for each gt bbox, assign its nearest proposal within the cell
           the gt bbox falls into to itself.
        """

        bboxes = bboxes[:, :4]
        num_gts, num_bboxes = gt_bboxes.size(0), bboxes.size(0)

        # compute IoU between all gts and bboxes;
        # the returned tensor shape is (num_gts, num_bboxes)
        overlaps = bbox_overlaps(gt_bboxes, bboxes)

        # 1. assign -1 by default
        assigned_gt_inds = overlaps.new_full((num_bboxes, ),
                                             -1,
                                             dtype=torch.long)

        # either no ground truth in the image or no anchor boxes;
        # this is a standard operation for all anchor-based assigners
        if num_gts == 0 or num_bboxes == 0:
            # no ground truth or boxes, return empty assignment
            max_overlaps = overlaps.new_zeros((num_bboxes, ))
            if num_gts == 0:
                # no truth, assign everything to background
                assigned_gt_inds[:] = 0
            if gt_labels is None:
                assigned_labels = None
            else:
                assigned_labels = overlaps.new_zeros((num_bboxes, ),
                                                     dtype=torch.long)
            return AssignResult(
                num_gts, assigned_gt_inds, max_overlaps,
                labels=assigned_labels)

        # 2. assign negatives:
        # for each anchor, find the gt that best overlaps with it and the
        # max IoU over all gts;
        # max_overlaps and argmax_overlaps both have shape (num_bboxes, )
        max_overlaps, argmax_overlaps = overlaps.max(dim=0)

        if isinstance(self.neg_iou_thr, float):
            assigned_gt_inds[(max_overlaps >= 0) &
                             (max_overlaps <= self.neg_iou_thr)] = 0
        elif isinstance(self.neg_iou_thr, (tuple, list)):
            assigned_gt_inds[(max_overlaps > self.neg_iou_thr[0]) &
                             (max_overlaps <= self.neg_iou_thr[1])] = 0

        # 3. assign positives: anchors that fall into the responsible cell
        # and are above the positive IoU threshold; the order matters.
        # The masking below first filters out all unrelated anchors,
        # i.e. those without box_responsible_flags
        overlaps[:, ~box_responsible_flags.type(torch.bool)] = -1.
        # recompute max_overlaps, this time only considering IoUs of
        # anchors responsible for prediction
        max_overlaps, argmax_overlaps = overlaps.max(dim=0)

        # for each gt, find the anchor that best overlaps with it and the
        # max IoU over all proposals;
        # gt_max_overlaps and gt_argmax_overlaps both have shape (num_gts, )
        gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=1)

        pos_inds = (max_overlaps > self.pos_iou_thr) & \
            box_responsible_flags.type(torch.bool)
        assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1

        # 4. assign positives to the max-overlapped anchors within the
        # responsible cell
        for i in range(num_gts):
            if gt_max_overlaps[i] > self.min_pos_iou:
                if self.gt_max_assign_all:
                    max_iou_inds = (overlaps[i, :] == gt_max_overlaps[i]) & \
                        box_responsible_flags.type(torch.bool)
                    assigned_gt_inds[max_iou_inds] = i + 1
                elif box_responsible_flags[gt_argmax_overlaps[i]]:
                    assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1

        # assign labels of positive anchors
        if gt_labels is not None:
            assigned_labels = assigned_gt_inds.new_zeros((num_bboxes, ))
            pos_inds = torch.nonzero(assigned_gt_inds > 0).squeeze()
            if pos_inds.numel() > 0:
                assigned_labels[pos_inds] = gt_labels[
                    assigned_gt_inds[pos_inds] - 1]
        else:
            assigned_labels = None

        return AssignResult(
            num_gts, assigned_gt_inds, max_overlaps, labels=assigned_labels)

My AP drops a lot on my own dataset after changing @ElectronicElephant's head to the refactored one. Shouldn't it be equivalent to the old version with pos_iou_thr=1.0? And what does gt_max_assign_all mean? If none of my anchors are identical, will gt_max_assign_all have no influence? @sudo-rm-covid19 @xvjiarui

@sudo-rm-covid19
Contributor

Hi @LMerCy,
pos_iou_thr = 1 means you don't use a constant threshold to assign positive anchors. Instead, you compare the anchors within the grid cell that the GT box falls into against that GT box and choose the one with the max IoU as the positive anchor, no matter what the IoU value is (since min_pos_iou is set to 0).
The gt_max_assign_all flag means that if multiple anchors within the cell have the same max IoU with the GT box, all of them are set positive when the flag is true; otherwise only the first one is set positive.
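That tie-handling can be sketched in a few lines of plain Python; `assign_max` below is a hypothetical toy helper, not the GridAssigner code, illustrating only the flag's effect on tied IoUs.

```python
# Toy illustration of gt_max_assign_all: given the IoUs of the responsible
# anchors with one GT box, either every anchor tied at the max IoU becomes
# positive, or only the first one does.
def assign_max(overlaps, gt_max_assign_all):
    m = max(overlaps)
    if gt_max_assign_all:
        # all anchors tied at the max IoU are positive
        return [i for i, o in enumerate(overlaps) if o == m]
    # otherwise only the first anchor reaching the max is positive
    return [overlaps.index(m)]

ious = [0.30, 0.55, 0.55]       # anchors 1 and 2 tie at the max IoU
print(assign_max(ious, True))   # [1, 2]
print(assign_max(ious, False))  # [1]
```

In practice the anchors in a cell usually have distinct IoUs with a given gt, so exact ties (and hence the flag) rarely matter.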

@LMerCy

LMerCy commented Sep 25, 2020

@sudo-rm-covid19 Yes, I agree with everything you said.
But if the anchors within the cell are all different, won't their IoUs with the same gt also differ?

@sudo-rm-covid19
Contributor

@sudo-rm-covid19 Yes, I agree with everything you said.
But if the anchors within the cell are all different, won't their IoUs with the same gt also differ?

Yes, I think so, and it will choose one anchor per gt.

@LMerCy

LMerCy commented Sep 25, 2020

@sudo-rm-covid19 If two gts of similar size fall into the same cell and have the same IoU with the same anchor, then only one gt will match that anchor. But this probably has little influence.

What actually confuses me is that my AP drops with the refactored head, and I can't find out why.

liuhuiCNN pushed a commit to liuhuiCNN/mmdetection that referenced this pull request May 21, 2021
@GalSang17

Hi, how do I replace Darknet53 in YOLOv3 with MobileNetV2?

@ElectronicElephant
Contributor Author

Hi, how do I replace Darknet53 in Yolov3 with MobileNetV2?

IMHO, just changing the config file should be OK. It should work in the mmdetection way; there is nothing special.

@GalSang17

Hi, how do I replace Darknet53 in Yolov3 with MobileNetV2?

IMHO, just changing the config file is OK. It should be working in the mm-detection-way. There is nothing special.

[image: screenshot of the modified config]
Is it just like this? The neck section throws an exception.

@ElectronicElephant
Contributor Author

Hi, how do I replace Darknet53 in Yolov3 with MobileNetV2?

IMHO, just changing the config file is OK. It should be working in the mm-detection-way. There is nothing special.

[image: screenshot of the modified config]
Is it just like this? The neck section throws an exception.

emmm, there are several points:

For the backbone part, you should check the stride of each of the out_indices. The length of out_indices should be 3 if you want minimal changes to the neck or head.

For the neck part, you need to at least make sure that in_channels matches the backbone.

BTW, you can open a new issue and we can track the problem there.
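Following that advice, the kind of config change involved might look like the sketch below. The `out_indices` and channel numbers here are assumptions that must be checked against the actual MobileNetV2 implementation in use, not a tested recipe.

```python
# Hedged sketch of swapping the YOLOv3 backbone for MobileNetV2 in an
# mmdetection-style config; out_indices and all channel numbers are
# illustrative assumptions to be verified against the backbone code.
model = dict(
    backbone=dict(
        type='MobileNetV2',
        # three output stages, one per detection scale (strides 8/16/32)
        out_indices=(2, 4, 6)),
    neck=dict(
        type='YOLOV3Neck',
        num_scales=3,
        # must equal the channel counts of the chosen backbone stages,
        # listed from the deepest stage to the shallowest
        in_channels=[320, 96, 32],
        out_channels=[96, 96, 96]))
```

If the neck raises a shape exception, the first thing to verify is that `in_channels` really matches the channels of the stages selected by `out_indices`.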
