Set coco pre-trained YoloV5 input size to native resolution > 4K, how does this affect predictions? #6137
Comments
@Michelvl92 the image size itself is not important, merely that your objects are similarly sized during training and deployment.
@glenn-jocher I should be more clear: not the image size, but the input size of the YOLOv5 input tensor, e.g. such that images are not resized but are processed by YOLOv5 at their native resolution of e.g. 4K or 8K (which of course will make inference really slow).
@Michelvl92 sure you can specify 4k or 8k inference if your hardware allows it:
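The exact command from this reply was not preserved here; as a minimal sketch, large-size inference can be requested through the PyTorch Hub API (the 4096 value and file path are illustrative):

```python
import torch

# Load the COCO-pretrained YOLOv5x model from PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5x')

# Run inference with a large inference size (the image is letterboxed to a
# multiple of the model stride); memory and runtime grow roughly with pixel count
results = model('path/to/4k_frame.jpg', size=4096)
results.print()
```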
@glenn-jocher Why is a coordinate offset added here?
Those are class offsets.
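The fragment above appears to refer to the class-offset trick used when running batched NMS; a minimal sketch of the idea, under that assumption (the max_wh constant and box values are illustrative):

```python
import torch
import torchvision

# Boxes in (x1, y1, x2, y2) format, one confidence score and one class id per box
boxes = torch.tensor([[10., 10., 50., 50.],
                      [12., 12., 52., 52.]])
scores = torch.tensor([0.9, 0.8])
classes = torch.tensor([0., 1.])

# Offset every box by class_id * max_wh so boxes of different classes can never
# overlap; a single nms() call then behaves like independent per-class NMS.
max_wh = 4096  # chosen larger than any image dimension (illustrative)
offsets = classes.unsqueeze(1) * max_wh
keep = torchvision.ops.nms(boxes + offsets, scores, iou_threshold=0.45)
print(keep)  # both boxes survive because they belong to different classes
```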
Question
Hi,
I usually use your COCO pre-trained YOLOv5x model to test on my datasets and check the prediction/test performance. These datasets usually consist of high-resolution images/video frames (usually > 4K and even > 8K, sometimes even 240 pixels) and usually contain small objects.
When I increase the input resolution to the native resolution, in some cases it visually looks (not verified with an AP score) as if the model detects almost all the objects I need, which is good news. But I would like to better understand whether this is a good habit, or whether I should instead tile the frames to the model's native training input resolution of 640x640.
My quick thought is that the model's multi-scale prediction grid is expanded and becomes less fine-grained, making it harder to detect small or densely packed objects, so this should not be the way to go, and I should instead tile the input images to the native trained input resolution. Whether this is correct or not, could you provide a better (theoretical) explanation?
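As a rough worked example of the scale effect (assuming YOLOv5's standard output strides of 8, 16 and 32, which is my assumption, not stated in this thread): a 16-pixel object in a 3840-pixel-wide frame spans 16 / 8 = 2 cells of the finest grid when the frame is fed in at native resolution, but if the frame is resized to 640 pixels the object shrinks to 16 x 640 / 3840 ≈ 2.7 pixels, well below a single stride-8 cell, so most of its signal is lost before the network even sees it.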
I am focusing mainly on datasets that contain mostly small objects, with sizes in the range of 8-32 pixels (in the huge images). Resizing/subsampling those images to the native training resolution will almost completely remove the small objects. Therefore the images should be processed at the native resolution as much as possible (if you agree with this).
What would be the best training strategies?
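For reference, a minimal sketch of the tiling approach discussed above, assuming simple non-overlapping 640x640 crops run through the PyTorch Hub model (tile size, overlap handling and coordinate merging are deliberately simplified, and the file path is illustrative):

```python
import cv2
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5x')
frame = cv2.cvtColor(cv2.imread('path/to/4k_frame.jpg'), cv2.COLOR_BGR2RGB)

tile = 640
h, w = frame.shape[:2]
detections = []
for y in range(0, h, tile):
    for x in range(0, w, tile):
        crop = frame[y:y + tile, x:x + tile]
        results = model(crop, size=tile)
        # Shift tile-local boxes back into full-frame coordinates
        for *xyxy, conf, cls in results.xyxy[0].tolist():
            detections.append([xyxy[0] + x, xyxy[1] + y,
                               xyxy[2] + x, xyxy[3] + y, conf, cls])
```

A real pipeline would normally add tile overlap and a final NMS pass over the merged detections so objects cut by tile borders are not lost or duplicated.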