HUB not working correctly with Multi-GPU custom agent setup #695
Comments
👋 Hello @sinchinpark, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:
If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix. If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response. We try to respond to all issues as promptly as possible. Thank you for your patience! |
Sorry, it's a duplicate of #606 |
@sinchinpark Did you use the Custom option from the Advanced Model Configuration accordion (read more here) to change the device from |
Yes, I'm using the HUB portal for all operations (from importing the dataset to training the model) |
@sinchinpark Our team will investigate this issue and I will update you as soon as possible. |
@sergiuwaxmann Thanks |
@sinchinpark Thank you! |
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help. For additional resources and information, please see the links below:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed! Thank you for your contributions to YOLO 🚀 and Vision AI ⭐ |
@sinchinpark Hey there! |
Search before asking
HUB Component
Models, Training
Bug
Description
I am experiencing issues when using the HUB portal to train on a dataset with a multi-GPU custom agent setup. Specifically, I am using 2 GPUs and have modified the default parameters as follows:
However, the HUB does not seem to process the training data correctly and gets stuck throughout the training process. This issue persists even after the training is supposedly finished, as shown in the attached screenshot.
Interestingly, using device=0 on the same machine with the same model works fine!
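The two cases being compared (single GPU versus multiple GPUs) can be sketched as below. This is a hypothetical helper for illustration only, not part of Ultralytics or HUB, and the multi-GPU value `"0,1"` is an assumed example, since the exact parameters used are not shown above.

```python
def parse_device(spec):
    """Turn a device setting such as 0, "0", or "0,1" into a list of GPU indices.

    Hypothetical helper: it only illustrates the difference between the
    single-GPU setting that works and an assumed multi-GPU setting.
    """
    if isinstance(spec, int):
        return [spec]
    if isinstance(spec, (list, tuple)):
        return [int(i) for i in spec]
    # Comma-separated string form, e.g. "0,1"
    return [int(part) for part in str(spec).split(",") if part.strip()]


def is_multi_gpu(spec):
    """Return True when the device setting names more than one GPU."""
    return len(parse_device(spec)) > 1


# device=0 is the setting reported to work; "0,1" is an assumed 2-GPU value.
print(is_multi_gpu(0))      # single GPU
print(is_multi_gpu("0,1"))  # two GPUs
```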
Logs and Errors:
Here are some potentially useful logs and errors from my custom agents:
Also, I encountered the following warnings multiple times:
Expected Behavior:
Training should proceed without getting stuck, show progress and metrics on the Dashboard, and allow deploy/export once training has finished (similar to the behavior observed when using device=0).
Custom Agent Env
Environment
Ultralytics HUB Version
v0.1.43
Client User Agent
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
Operating System
Linux x86_64
Server Timestamp
1716456982
Minimal Reproducible Example
No response
Additional
No response