Support custom trainer and backend #91
Conversation
class _CustomTorchBackend(_TorchBackend):
    share_cuda_visible_devices: bool = True

    def on_start(self, worker_group: WorkerGroup, ...
If the process group is not initialized, how about initializing it here without throwing an error?
I think we have no way to know whether the process group is initialized in the Task.run method.
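For reference, the guard suggested above would look roughly like the following. This is a minimal sketch: the helper name and the "nccl" default are assumptions, and rendezvous settings (rank, world size, master address) are expected to come from the usual environment variables.

import torch.distributed as dist

def init_process_group_if_needed(backend: str = "nccl") -> None:
    # Only create the default process group when none exists yet, so code
    # paths that have already called init_process_group (e.g. MM-based
    # repositories) do not hit "trying to initialize the default process
    # group twice!".
    if not dist.is_initialized():
        dist.init_process_group(backend=backend)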
* Bump ray from 1.9.1 to 2.1.0
* Fix deprecated warning
* Refactor
* Fix modules
* Fix requirements
* Fix test code
* Support custom trainer and backend (#91)
* Upgrade MMTask (#97)
* Fix minor (#100)
* Fix blocking issue at test_tasks.py
* Support single GPU tuning
* Bump FLAML to v1.0.14 to avoid deprecated warning
* Supplement documentations (#102)
* Support resume (#104)

Co-authored-by: Younghwan Na <100389977+yhna940@users.noreply.github.com>
Co-authored-by: Hakjin Lee <nijkah@gmail.com>
Motivation
Since MM-based repositories already call torch.distributed.init_process_group, using TorchTrainer for DDP in the Ray framework raises RuntimeError("trying to initialize the default process group twice!"). To solve this problem, I introduced a custom backend modified from here.
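For context, the custom backend from the diff above could take roughly the following shape. This is a hedged sketch, not necessarily the PR's exact implementation: _TorchBackend, TorchConfig, and WorkerGroup are Ray private APIs whose import paths are assumed from Ray 2.1 and may differ between versions.

import torch.distributed as dist
# Private Ray APIs; paths assumed from Ray 2.x and may change.
from ray.train._internal.worker_group import WorkerGroup
from ray.train.torch.config import TorchConfig, _TorchBackend

class _CustomTorchBackend(_TorchBackend):
    share_cuda_visible_devices: bool = True

    def on_start(self, worker_group: WorkerGroup, backend_config: TorchConfig):
        # Ask each worker whether torch.distributed is already initialized;
        # MM-based repositories call init_process_group themselves, so in
        # that case skip Ray's own setup to avoid the double-init RuntimeError.
        already_initialized = worker_group.execute(dist.is_initialized)
        if not any(already_initialized):
            super().on_start(worker_group, backend_config)

As the conversation above notes, Task.run cannot tell whether the process group exists, so any such check has to run on the workers themselves; that constraint is what makes a custom backend necessary in the first place.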