[🐛BUG] Multiple GPU error #2149

Open
Mohsin-Ul-Islam opened this issue Feb 25, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@Mohsin-Ul-Islam

Hi, I am getting the error below when running the recommendation on multiple GPUs. I have also set gpu_id in the config.

  File "/home/ubuntu/recommender/.venv/lib/python3.12/site-packages/torch/utils/data/distributed.py", line 77, in __init__
    num_replicas = dist.get_world_size()
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/recommender/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 2020, in get_world_size
    return _get_group_size(group)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/recommender/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 986, in _get_group_size
    default_pg = _get_default_group()
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/recommender/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1150, in _get_default_group
    raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
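
The traceback shows `DistributedSampler.__init__` calling `dist.get_world_size()` before any default process group exists. The following is only a minimal sketch of one way to guard against that. It assumes the script is started by a launcher such as `torchrun`, which sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` for each worker. The helper names `read_launcher_env` and `build_sampler` are hypothetical and are not part of the project in this report, whose own multi-GPU entry point may already handle initialization differently.

```python
import os
from typing import Mapping, Optional


def read_launcher_env(environ: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return rank info if the launcher variables are present, else None.

    Assumption: the job is started by a launcher (e.g. torchrun) that sets
    RANK, LOCAL_RANK and WORLD_SIZE in each worker's environment.
    """
    env = os.environ if environ is None else environ
    keys = ("RANK", "LOCAL_RANK", "WORLD_SIZE")
    if not all(k in env for k in keys):
        return None
    return {k.lower(): int(env[k]) for k in keys}


def build_sampler(dataset):
    """Hypothetical helper: create a DistributedSampler only after the
    default process group has been initialized."""
    # Imported lazily so the environment check above works without torch.
    import torch
    import torch.distributed as dist
    from torch.utils.data.distributed import DistributedSampler

    info = read_launcher_env()
    if info is None:
        # Plain single-process run: no sampler, so the DataLoader falls back
        # to its default sampling and get_world_size() is never called.
        return None

    if not dist.is_initialized():
        use_cuda = torch.cuda.is_available()
        if use_cuda:
            # Bind this worker to its own GPU before creating the group.
            torch.cuda.set_device(info["local_rank"])
        # The default env:// init method also reads MASTER_ADDR and
        # MASTER_PORT, which the launcher is assumed to have set.
        dist.init_process_group(backend="nccl" if use_cuda else "gloo")

    # With the group initialized, the sampler can query the world size.
    return DistributedSampler(dataset)
```

Under these assumptions, starting the script as a plain `python` process makes `build_sampler` return `None`, while starting it through the launcher initializes the group first, so the `ValueError` above should not be raised from the sampler.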
Mohsin-Ul-Islam added the bug (Something isn't working) label Feb 25, 2025
Mohsin-Ul-Islam changed the title from "[🐛BUG] Describe your problem in one sentence." to "[🐛BUG] Multiple GPU error" Feb 25, 2025