[comm/ccl.py] Compatible with older pytorch versions #4578
Conversation
@Liangliang-Ma what is the specific case that would cause ValueError?
The function
@Liangliang-Ma After applying this patch I see a different error with PyTorch 2.1. Can you also fix this error in this PR? I think it is due to a difference between PyTorch 2.0 and PyTorch 2.1.
It's a test case issue. No further comments on this PR.
Content duplicated with #4430. Closing this one.
In the latest PyTorch, torch.distributed.distributed_c10d.get_global_rank raises a ValueError when passed an invalid parameter. A few versions ago, it raised a RuntimeError instead. To be able to run under both behaviors, I modified comm/ccl.py to handle either exception.
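The pattern described above can be sketched as follows. This is a minimal illustration, not the actual comm/ccl.py patch: the helper name `get_rank_or_default` and the injected `get_global_rank` callable are hypothetical, chosen so the sketch runs without a specific PyTorch version installed. In real code the callable would be `torch.distributed.distributed_c10d.get_global_rank`.

```python
# Hedged sketch: handle both exception types that
# torch.distributed.distributed_c10d.get_global_rank may raise for an
# invalid rank -- RuntimeError in older PyTorch, ValueError in newer ones.

def get_rank_or_default(get_global_rank, group, group_rank, default=-1):
    """Return the global rank, or `default` if the lookup fails.

    `get_global_rank` is passed in as a callable so this sketch stays
    independent of the installed PyTorch version.
    """
    try:
        return get_global_rank(group, group_rank)
    except (ValueError, RuntimeError):
        # Catch both so the same code path works across PyTorch versions.
        return default
```

Catching the tuple `(ValueError, RuntimeError)` avoids any version check: whichever exception the installed PyTorch raises, the fallback path is taken.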