
Running NCCL mpi test across multiple nodes #33

Closed
sharannarang opened this issue Jun 28, 2016 · 3 comments


@sharannarang

Hi,

I've built and run mpi_test successfully on a single node with 8 TitanX GPUs, launching it with srun. However, the test fails when run across 2 nodes with 8 TitanX GPUs per node. I use the following command line:

srun -N2 -n16 --gres=gpu:8 -p TitanXx8 build/test/mpi/mpi_test 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7

The test fails with the following error:

WARN src/core.cu:225 failed to allocate 2101248 byte device buffer
WARN src/core.cu:596 rank 12 failed to allocate device buffer
WARN src/core.cu:683 rank 12 failed to allocate communicator
NCCL Init failed (10) 'cuda malloc failed'

Does NCCL run across multiple nodes?

@sjeaugey
Member

No, indeed, NCCL doesn't run across multiple nodes.

@sharannarang
Author

Are there any plans to add this support?

@sjeaugey
Member

sjeaugey commented Aug 4, 2017

Inter-node communication has been implemented in NCCL2, which is now available at https://developer.nvidia.com/nccl.
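With NCCL2, the usual multi-node pattern is to bootstrap NCCL with MPI: rank 0 creates a `ncclUniqueId`, broadcasts it, and every rank calls `ncclCommInitRank`. Below is a minimal sketch of that pattern (error checking omitted; the 8-GPUs-per-node device selection is an assumption matching the setup in this issue, not a general rule):

```cpp
// Minimal multi-node NCCL bootstrap via MPI (sketch; error checks omitted).
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

int main(int argc, char* argv[]) {
  int rank, nranks;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  // Rank 0 creates the NCCL unique id; broadcast it to all ranks.
  ncclUniqueId id;
  if (rank == 0) ncclGetUniqueId(&id);
  MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

  // One GPU per rank; assumes 8 GPUs per node as in this issue's setup.
  cudaSetDevice(rank % 8);

  // Every rank joins the same communicator identified by `id`.
  ncclComm_t comm;
  ncclCommInitRank(&comm, nranks, id, rank);

  // ... collectives such as ncclAllReduce go here ...

  ncclCommDestroy(comm);
  MPI_Finalize();
  return 0;
}
```

Launched the same way as the single-node case, e.g. `srun -N2 -n16 ...`, with one rank per GPU.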

@sjeaugey sjeaugey closed this as completed Aug 4, 2017
minsii added a commit to minsii/nccl that referenced this issue Nov 13, 2023
Summary:

When concurrent collective/p2p operations are issued via multiple NCCL communicators, the ctran mapper register/deregister/search paths can be called by multiple threads concurrently, so the global registration timer must be thread-safe.

This patch fixes the issue by guarding all accesses to the global variables with a mutex.

Differential Revision: D51083701
minsii added a commit to minsii/nccl that referenced this issue Nov 14, 2023
minsii added a commit to minsii/nccl that referenced this issue Nov 15, 2023
Summary:
Pull Request resolved: facebookresearch#33

Reviewed By: wesbland

Differential Revision: D51083701

fbshipit-source-id: ca0ba40484f9c871780fc99623e0c9d8224328e3