
Set COCO pre-trained YOLOv5 input size to native resolution > 4K: how does this affect predictions? #6137

Closed
Michelvl92 opened this issue Dec 30, 2021 · 6 comments · Fixed by #6178
Labels: question (Further information is requested), Stale (Stale and scheduled for closing soon)

Comments

Michelvl92 commented Dec 30, 2021


Question

Hi,

I usually use your COCO pre-trained YOLOv5-x model to test on my datasets and check prediction/test performance. These datasets usually contain high-resolution images/video frames, usually > 4K and even > 8K (sometimes even 240 pixels), and they mostly contain small objects.

When I increase the input resolution to the native resolution, in some cases it visually looks (not tested on AP score) as if the model is able to detect almost all the objects I need to detect, which is good news. But I want a better understanding of whether this is a good habit, or whether I should instead tile the frames to the model's native training input resolution of 640x640.

My quick thought is that the model's multi-scale prediction grid is expanded and becomes less fine-grained, making it harder to detect small or densely packed objects, so this should not be the way to go, and I should instead tile the input images at the native training input resolution. Whether this is correct or not, could you provide a better (theoretical) explanation?
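For concreteness, this is how the prediction grids scale with input size (a minimal sketch, assuming YOLOv5's standard detection strides of 8, 16, and 32 pixels):

```python
# Grid sides per detection head (P3/P4/P5) at different input sizes,
# assuming YOLOv5's standard strides of 8, 16 and 32 pixels.
for img in (640, 3840, 7680):
    print(img, [img // s for s in (8, 16, 32)])
# 640  -> [80, 40, 20]
# 3840 -> [480, 240, 120]
# 7680 -> [960, 480, 240]
```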

I am focusing mainly on datasets that contain mostly small objects with area sizes in the range of 8-32 pixels (within the huge images). Resizing/subsampling those images to the native training resolution would almost remove the small objects. Therefore the images should be processed as much as possible at the native resolution (if you agree with this).
What would be the best training strategy?

  • Tiled training (at the native training resolution of 640×640) and tiled prediction? (See the tiling sketch after this list.)
  • Upscale the model input to the dataset image size (e.g. 4K, 8K, etc.), train a model at that resolution (eventually changing the grid settings to get a finer grid?), and predict at the full resolution?
  • Tiled training, but prediction at the full native dataset resolution?
  • Other, better suggestions?
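To be explicit about what I mean by tiling, a minimal sketch (plain NumPy, not YOLOv5 code; the 640 tile size and 64-pixel overlap are assumptions):

```python
# Minimal tiling sketch: cover a large frame with overlapping 640x640
# crops so small objects keep their native pixel size.
import numpy as np

def make_tiles(img, tile=640, overlap=64):
    """Yield (x0, y0, crop) windows covering the whole image."""
    h, w = img.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, img[y0:y0 + tile, x0:x0 + tile]

frame = np.zeros((4320, 7680, 3), dtype=np.uint8)  # e.g. an 8K frame
tiles = list(make_tiles(frame))  # each crop goes through the 640 model;
# detections are shifted back by (x0, y0) and merged with a global NMS.
```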
Michelvl92 added the question label on Dec 30, 2021
glenn-jocher (Member) commented

@Michelvl92 the image size itself is not important, merely that your objects are similarly sized during training and deployment.
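For example (a back-of-envelope check, assuming the default 640 letterbox input and an 8K-wide frame):

```python
# An 8 px object after letterbox resize from 7680 to the default 640 input:
obj_px, native_w, input_w = 8, 7680, 640
print(obj_px * input_w / native_w)  # ~0.67 px, effectively invisible
# At --img 7680 the object stays 8 px, i.e. roughly the same scale range
# the COCO-trained model saw at 640 input.
```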

Michelvl92 (Author) commented

@glenn-jocher I should be clearer: I mean not the image size but the size of the YOLOv5 input tensor, e.g. such that images are not resized but are processed by YOLOv5 at their native resolution of e.g. 4K or 8K (which of course will make inference really slow).

glenn-jocher (Member) commented

@Michelvl92 sure, you can specify 4K or 8K inference if your hardware allows it:

python detect.py --img 3840
python detect.py --img 7680
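The same is possible through the PyTorch Hub interface (a sketch; 'frame_8k.jpg' is a placeholder path):

```python
import torch

# COCO-pretrained YOLOv5x via PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5x')
results = model('frame_8k.jpg', size=7680)  # letterbox long side to 7680
results.print()
```

Note that the inference size is adjusted to a multiple of the maximum model stride (32); 3840 and 7680 already are. Expect very large GPU memory use at these sizes.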

github-actions bot commented Feb 3, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

github-actions bot added the Stale label on Feb 3, 2022
github-actions bot closed this as completed on Feb 8, 2022
Audrey528 commented

@glenn-jocher Why is max_wh added to the box coordinates when calculating NMS?

c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS

glenn-jocher (Member) commented

Those are class offsets: each box is shifted by its class index times max_wh, so boxes of different classes land in disjoint coordinate ranges and a single torchvision.ops.nms call only suppresses overlapping boxes of the same class (unless agnostic NMS is requested, in which case the offset is zero).
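A minimal demonstration of the trick (toy boxes; the max_wh value is arbitrary as long as it exceeds the largest image side):

```python
import torch
import torchvision

max_wh = 4096
boxes = torch.tensor([[10., 10., 50., 50.],
                      [10., 10., 50., 50.]])  # two identical boxes
scores = torch.tensor([0.9, 0.8])
classes = torch.tensor([[0.], [1.]])          # but different classes

offset = classes * max_wh                     # shift class-1 boxes far away
print(torchvision.ops.nms(boxes, scores, 0.45))           # tensor([0]): class-agnostic, one survives
print(torchvision.ops.nms(boxes + offset, scores, 0.45))  # tensor([0, 1]): per-class, both survive
```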
