
ValueError: Default process group has not been initialized, please make sure to call init_process_group. #139

Open
nuistZPZ opened this issue May 25, 2024 · 0 comments

@nuistZPZ

Hello, I've run into a problem while using this project: if I don't use distributed training, I have to modify the source code. Setting distributed to False directly on the command line does not solve it. How should I work around the errors caused by the distributed-training code? Is commenting out the relevant code the only option?
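
One likely reason the command-line override has no effect (assuming Pretrain.py declares the flag as `parser.add_argument('--distributed', default=True, type=bool)`, a pattern common in this codebase family) is that argparse's `type=bool` converts the string `'False'` via `bool('False')`, and any non-empty string is truthy, so the flag can never actually be switched off this way. A small demonstration of the gotcha:

```python
import argparse

# Hypothetical flag definition mirroring the assumed one in Pretrain.py.
parser = argparse.ArgumentParser()
parser.add_argument('--distributed', default=True, type=bool)

args = parser.parse_args(['--distributed', 'False'])
print(args.distributed)  # True -- bool('False') is True because the string is non-empty
```

A common fix is to parse the flag with a small string-to-bool converter, or to expose it as `action='store_true'` paired with a separate `--no-distributed` switch.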

----log-----
```
Traceback (most recent call last):
  File "F:\Projects\Multi Modal\ALBEF\Pretrain.py", line 203, in <module>
    main(args, config)
  File "F:\Projects\Multi Modal\ALBEF\Pretrain.py", line 175, in main
    dist.barrier()
  File "F:\anaconda3\envs\albef\lib\site-packages\torch\distributed\c10d_logger.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "F:\anaconda3\envs\albef\lib\site-packages\torch\distributed\distributed_c10d.py", line 3428, in barrier
    opts.device = _get_pg_default_device(group)
  File "F:\anaconda3\envs\albef\lib\site-packages\torch\distributed\distributed_c10d.py", line 644, in _get_pg_default_device
    group = group or _get_default_group()
  File "F:\anaconda3\envs\albef\lib\site-packages\torch\distributed\distributed_c10d.py", line 977, in _get_default_group
    raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
```
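
As for avoiding the crash without commenting code out: the usual pattern is to make every collective call conditional on the process group actually existing. A minimal sketch (not the repo's exact code; the helper names here are made up for illustration):

```python
import torch.distributed as dist

def is_dist_avail_and_initialized() -> bool:
    # is_available() is False when the PyTorch build lacks distributed support;
    # is_initialized() is False until init_process_group() has been called.
    return dist.is_available() and dist.is_initialized()

def barrier_if_distributed() -> None:
    # No-op in a single-process run, a real synchronization point otherwise.
    if is_dist_avail_and_initialized():
        dist.barrier()
```

Replacing the bare `dist.barrier()` at Pretrain.py line 175 with a guard like this lets the same script run both with and without `init_process_group` having been called.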
