The program only works on 1 GPU when using nccl 2.0 #123
Comments
Please take a look at #19 and see if that might be related.
Please make sure you recompile the NCCL tests with the correct nccl.h when switching from NCCL 1 to NCCL 2. The nccl.h changed between NCCL 1 and NCCL 2, and compiling the tests with an NCCL 1 nccl.h will cause a hang when running with NCCL 2.
@cliffwoolley @sjeaugey Thanks for your response! I download Then I copy Besides the header file, is there any other possibility, such as a compile option? Thanks!
I've seen an nccl.h (from NCCL 1) in /usr/local/include take precedence over the one specified on the command line. I added a printf to the NCCL tests to display the version you compiled against. Just to double check, can you update the NCCL tests, compile, and run again?
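To check for a stray header like the one described above, a small shell sketch can list every nccl.h found in a set of include directories and report whether each defines a version macro. The function name list_nccl_headers is hypothetical, and the check assumes that an NCCL 2 header contains a line of the form "#define NCCL_MAJOR <n>" and that a header without it is likely from NCCL 1; verify both against your installed headers.

```shell
#!/bin/sh
# Hypothetical helper: report each nccl.h found in the given directories.
# Assumption: NCCL 2 headers define NCCL_MAJOR; headers lacking it are
# flagged as possibly belonging to NCCL 1.
list_nccl_headers() {
    for dir in "$@"; do
        header="$dir/nccl.h"
        # Skip directories that do not contain an nccl.h.
        [ -f "$header" ] || continue
        # Pull the value from a "#define NCCL_MAJOR <n>" line, if any.
        major=$(awk '$1 == "#define" && $2 == "NCCL_MAJOR" { print $3; exit }' "$header")
        if [ -n "$major" ]; then
            echo "$header: NCCL_MAJOR=$major"
        else
            echo "$header: no NCCL_MAJOR macro (possibly an NCCL 1 header)"
        fi
    done
}

# Example call, using the two directories mentioned in this thread:
#   list_nccl_headers /usr/local/nccl/include /usr/local/include
```

If more than one line is printed, more than one nccl.h is visible to the compiler, and it is worth confirming which one the build actually picks up.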
@sjeaugey Sorry to bother you again.
I don't install
You mean you update the Thanks!
Oh, OK, I see the problem now. You tried running the tests from NCCL 1 with NCCL 2. This is not supposed to work. Please use the NCCL tests instead (https://github.com/nvidia/nccl-tests).
@sjeaugey Thanks very much! That's the point. Because I don't know
Hi all,
I downloaded nccl 2.0 and tried to run the reduce_test.cu file using nccl 2.0 (modifying test_utilities.h to adapt to nccl 2.0):
$ nvcc -gencode=arch=compute_60,code=sm_60 -I/usr/local/nccl/include -o reduce_test reduce_test.cu /usr/local/nccl/lib/libnccl.so -lcudart -lrt -lcuda -lcurand -lnvToolsExt
I find the program only runs when specifying 1 GPU:
When I want to utilize all 4 GPUs, the program seems to hang:
Could anyone give some suggestions on this issue, or provide an example of using nccl 2.0?
P.S. nccl 1.0 works fine on my server.