MaskRCNN model doesn't work with torch.cuda.amp.autocast RuntimeError: Unrecognized tensor type ID: Autocast #2172

Closed
WaterKnight1998 opened this issue May 2, 2020 · 7 comments

WaterKnight1998 commented May 2, 2020

🐛 Bug

I am trying to run the model in FP16 (mixed precision) using torch.cuda.amp.autocast, but the forward pass throws an error:

~/Documents/test/seg/models/archs/mask_rcnn.py in mixed_precision_one_batch(self, i, b)
    186         with autocast():
    187             self.model.train()
--> 188             loss_dict = self.model(images,targets)
    189             if not self.training:
    190                 self.model.eval()

~/anaconda3/envs/seg/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    556             result = self._slow_forward(*input, **kwargs)
    557         else:
--> 558             result = self.forward(*input, **kwargs)
    559         for hook in self._forward_hooks.values():
    560             hook_result = hook(self, input, result)

~/anaconda3/envs/seg/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     68         if isinstance(features, torch.Tensor):
     69             features = OrderedDict([('0', features)])
---> 70         proposals, proposal_losses = self.rpn(images, features, targets)
     71         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
     72         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

~/anaconda3/envs/seg/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    556             result = self._slow_forward(*input, **kwargs)
    557         else:
--> 558             result = self.forward(*input, **kwargs)
    559         for hook in self._forward_hooks.values():
    560             hook_result = hook(self, input, result)

~/anaconda3/envs/seg/lib/python3.7/site-packages/torchvision/models/detection/rpn.py in forward(self, images, features, targets)
    486         proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
    487         proposals = proposals.view(num_images, -1, 4)
--> 488         boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
    489 
    490         losses = {}

~/anaconda3/envs/seg/lib/python3.7/site-packages/torchvision/models/detection/rpn.py in filter_proposals(self, proposals, objectness, image_shapes, num_anchors_per_level)
    408             boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
    409             # non-maximum suppression, independently done per level
--> 410             keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
    411             # keep only topk scoring predictions
    412             keep = keep[:self.post_nms_top_n()]

~/anaconda3/envs/seg/lib/python3.7/site-packages/torchvision/ops/boxes.py in batched_nms(boxes, scores, idxs, iou_threshold)
     73     offsets = idxs.to(boxes) * (max_coordinate + 1)
     74     boxes_for_nms = boxes + offsets[:, None]
---> 75     keep = nms(boxes_for_nms, scores, iou_threshold)
     76     return keep
     77 

~/anaconda3/envs/seg/lib/python3.7/site-packages/torchvision/ops/boxes.py in nms(boxes, scores, iou_threshold)
     33         by NMS, sorted in decreasing order of scores
     34     """
---> 35     return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
     36 
     37 

RuntimeError: Unrecognized tensor type ID: Autocast
@WaterKnight1998 WaterKnight1998 changed the title Looks like MaskRCNN doesn't work at FP16 Looks like MaskRCNN doesn't work with torch.cuda.amp.autocast May 3, 2020
@WaterKnight1998 WaterKnight1998 changed the title Looks like MaskRCNN doesn't work with torch.cuda.amp.autocast MaskRCNN model doesn't work with torch.cuda.amp.autocast RuntimeError: Unrecognized tensor type ID: Autocast May 3, 2020
@zhangguanheng66
Contributor

Please copy/paste code snippet for reproducing.

@WaterKnight1998
Author

WaterKnight1998 commented May 4, 2020

@zhangguanheng66

with autocast():
    self.model.train()
    loss_dict = self.model(images, targets)
    if not self.training:
        self.model.eval()
        self.pred = self.model(images)
    else:
        self.pred = loss_dict
    self('after_pred')
    # Sum the individual losses (the original comment said: keep the mask loss)
    loss = sum(loss for loss in loss_dict.values())
    self.loss = loss
if not self.training: return
self.scaler.scale(self.loss).backward()
self.scaler.step(self.opt)
self.opt.zero_grad()

Where model is: torchvision.models.detection.mask_rcnn.maskrcnn_resnet50_fpn

@zhangguanheng66
Contributor

@WaterKnight1998 Can you provide a more self-contained code snippet? The code above seems to be part of your class object.

@WaterKnight1998
Author

WaterKnight1998 commented May 4, 2020

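import torch
import torchvision
from torch.cuda.amp import GradScaler, autocast

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=91)
model = model.train()
# The optimizer was not shown in the original snippet; SGD is an assumed placeholder.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = GradScaler()
for epoch in epochs:  # `epochs` and `data` are user-defined iterables
    for images, targets in data:
        # Only the forward pass and the loss computation run under autocast.
        with autocast():
            losses = model(images, targets)
            loss = sum(losses.values())
        # Scales the loss, and calls backward() on the scaled loss to create scaled gradients.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()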

@zhangguanheng66
Contributor

I got URLError like this

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

@WaterKnight1998
Author

Why?

@fmassa
Member

fmassa commented Oct 21, 2020

R-CNN models in torchvision have natively supported autocast since the 0.7.0 release; see #2384 and https://github.com/pytorch/vision/releases/tag/v0.7.0
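
To tell whether an installed torchvision is new enough for this, one option is to compare its version string against 0.7.0 before wrapping an R-CNN model in autocast. The sketch below is only an illustration: the helper names `parse_version` and `rcnn_supports_autocast` are hypothetical, not part of torchvision, and the only fact taken from this thread is the 0.7.0 threshold.

```python
import re

# Threshold taken from the comment above: R-CNN models support autocast since 0.7.0.
MIN_AUTOCAST_VERSION = (0, 7, 0)


def parse_version(version):
    """Return the leading numeric (major, minor, patch) of a version string.

    Local build suffixes such as '+cu101' are ignored; a missing patch
    component is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return tuple(int(part or 0) for part in match.groups())


def rcnn_supports_autocast(version):
    """Hypothetical helper: True if this torchvision version is at least 0.7.0."""
    return parse_version(version) >= MIN_AUTOCAST_VERSION
```

With torchvision installed, it could be called as `rcnn_supports_autocast(torchvision.__version__)`; on versions below 0.7.0, upgrading is the fix suggested in this thread.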

@fmassa fmassa closed this as completed Oct 21, 2020