
Set COCO pre-trained YOLOv5 input size to native resolution > 4K: how does this affect predictions? #6137

Closed
Michelvl92 opened this issue Dec 30, 2021 · 6 comments · Fixed by #6178
Labels: question (Further information is requested), Stale (Stale and scheduled for closing soon)

Comments

Michelvl92 commented Dec 30, 2021


Question

Hi,

I usually use your COCO pre-trained YOLOv5-x model to test on my datasets and check prediction/test performance. These datasets usually contain high-resolution images/video frames, usually > 4K and even > 8K (sometimes even 240 pixels), and they mostly contain small objects.

When I increase the input resolution to the native resolution, in some cases it visually looks (not tested on AP score) as if the model is able to detect almost all the objects I need to detect, which is good news. But I want a better understanding of whether this is a good habit, or whether I should instead tile the frames to the model's native training input resolution of 640x640.

My quick thought is that the model's multi-scale prediction grid is expanded and becomes less fine-grained, making it harder to detect small or densely packed objects, so this should not be the way to go, and I should instead tile the input images at the native training input resolution. Whether this is correct or not, could you provide a better (theoretical) explanation?
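For concreteness, this is how the prediction grids scale with input size (a minimal sketch, assuming YOLOv5's standard detection strides of 8, 16, and 32 pixels):

```python
# Grid sides per detection head (P3/P4/P5) at different input sizes,
# assuming YOLOv5's standard strides of 8, 16 and 32 pixels.
for img in (640, 3840, 7680):
    print(img, [img // s for s in (8, 16, 32)])
# 640  -> [80, 40, 20]
# 3840 -> [480, 240, 120]
# 7680 -> [960, 480, 240]
```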

I am focusing mainly on datasets that contain mostly small objects with area sizes in the range of 8-32 pixels (within the huge images). Resizing/subsampling those images to the native training resolution would almost remove the small objects. Therefore the images should be processed as much as possible at the native resolution (if you agree with this).
What would be the best training strategy?

  • Tiled training (at the native training resolution of 640×640) and tiled prediction? (See the tiling sketch after this list.)
  • Upscale the model input to the dataset image size (e.g. 4K, 8K, etc.), train a model at that resolution (eventually changing the grid settings to get a finer grid?), and predict at the full resolution?
  • Tiled training, but prediction at the full native dataset resolution?
  • Other, better suggestions?
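To be explicit about what I mean by tiling, a minimal sketch (plain NumPy, not YOLOv5 code; the 640 tile size and 64-pixel overlap are assumptions):

```python
# Minimal tiling sketch: cover a large frame with overlapping 640x640
# crops so small objects keep their native pixel size.
import numpy as np

def make_tiles(img, tile=640, overlap=64):
    """Yield (x0, y0, crop) windows covering the whole image."""
    h, w = img.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, img[y0:y0 + tile, x0:x0 + tile]

frame = np.zeros((4320, 7680, 3), dtype=np.uint8)  # e.g. an 8K frame
tiles = list(make_tiles(frame))  # each crop goes through the 640 model;
# detections are shifted back by (x0, y0) and merged with a global NMS.
```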
Michelvl92 added the question label on Dec 30, 2021
glenn-jocher (Member) commented

@Michelvl92 the image size itself is not important, merely that your objects are similarly sized during training and deployment.
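For example (a back-of-envelope check, assuming the default 640 letterbox input and an 8K-wide frame):

```python
# An 8 px object after letterbox resize from 7680 to the default 640 input:
obj_px, native_w, input_w = 8, 7680, 640
print(obj_px * input_w / native_w)  # ~0.67 px, effectively invisible
# At --img 7680 the object stays 8 px, i.e. roughly the same scale range
# the COCO-trained model saw at 640 input.
```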

Michelvl92 (Author) commented

@glenn-jocher I should be clearer: I mean not the image size but the size of the YOLOv5 input tensor, e.g. such that images are not resized but are processed by YOLOv5 at their native resolution of e.g. 4K or 8K (which of course will make inference really slow).

glenn-jocher (Member) commented

@Michelvl92 sure, you can specify 4K or 8K inference if your hardware allows it:

python detect.py --img 3840
python detect.py --img 7680
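The same is possible through the PyTorch Hub interface (a sketch; 'frame_8k.jpg' is a placeholder path):

```python
import torch

# COCO-pretrained YOLOv5x via PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5x')
results = model('frame_8k.jpg', size=7680)  # letterbox long side to 7680
results.print()
```

Note that the inference size is adjusted to a multiple of the maximum model stride (32); 3840 and 7680 already are. Expect very large GPU memory use at these sizes.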

github-actions bot commented Feb 3, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

github-actions bot added the Stale label on Feb 3, 2022
github-actions bot closed this as completed on Feb 8, 2022
Audrey528 commented

@glenn-jocher Why is max_wh added to the box coordinates when calculating NMS?

c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS

glenn-jocher (Member) commented

Those are class offsets: each box is shifted by its class index times max_wh, so boxes of different classes land in disjoint coordinate ranges and a single torchvision.ops.nms call only suppresses overlapping boxes of the same class (unless agnostic NMS is requested, in which case the offset is zero).
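A minimal demonstration of the trick (toy boxes; the max_wh value is arbitrary as long as it exceeds the largest image side):

```python
import torch
import torchvision

max_wh = 4096
boxes = torch.tensor([[10., 10., 50., 50.],
                      [10., 10., 50., 50.]])  # two identical boxes
scores = torch.tensor([0.9, 0.8])
classes = torch.tensor([[0.], [1.]])          # but different classes

offset = classes * max_wh                     # shift class-1 boxes far away
print(torchvision.ops.nms(boxes, scores, 0.45))           # tensor([0]): class-agnostic, one survives
print(torchvision.ops.nms(boxes + offset, scores, 0.45))  # tensor([0, 1]): per-class, both survive
```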
