all_reduce_test stop. #30
Comments
Could this be the same problem as in #19, i.e. do you need to turn off ACS?
A new driver that fixes this issue has been posted on the NVIDIA website.
I am using CentOS 7.0 and CUDA 7.5 on a server with six Tesla cards. When I run ./all_reduce_test 10000000 from the single folder, it hangs and gives no response.
My GPU topology is as follows:
CPU 0 -- GPU0
      -- GPU1
      -- GPU2
CPU 1 -- GPU3
      -- GPU4
      -- GPU5
Even when I ran ./all_reduce_test 2 0 1, it still did not run.
Do I need to install MPI even if I only use the tests in the single folder? Are the single tests valid for a multi-CPU topology like the one above?
I checked ACSCtl on the PCI devices, and all flags are negative (disabled). I don't know what else to try.
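For anyone who wants to repeat the ACSCtl check, here is a minimal sketch that scans `lspci -vvv` output for devices with any ACS control flag set to `+`. It assumes the usual lspci layout: device entries start at column 0 with a bus address, and the indented `ACSCtl:` line lists flags such as `SrcValid-`. The function names `find_enabled_acs` and `read_lspci` are hypothetical helpers, not part of NCCL or any NVIDIA tool.

```python
import re
import subprocess

# Start of a device entry in lspci output, e.g. "00:02.0 PCI bridge: ..."
# (an optional 4-digit PCI domain prefix is also accepted).
DEVICE_RE = re.compile(r"^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]\s")


def find_enabled_acs(lspci_text):
    """Return {device address: [enabled flag names]} for every device
    whose ACSCtl line has at least one flag ending in '+'."""
    enabled = {}
    device = None
    for line in lspci_text.splitlines():
        if DEVICE_RE.match(line):
            # Remember which device the following capability lines belong to.
            device = line.split()[0]
        elif "ACSCtl:" in line and device is not None:
            flags = line.split("ACSCtl:", 1)[1].split()
            on = [flag[:-1] for flag in flags if flag.endswith("+")]
            if on:
                enabled[device] = on
    return enabled


def read_lspci():
    """Hypothetical helper: capture `lspci -vvv` output.
    Assumption: run as root, otherwise capabilities may be hidden."""
    result = subprocess.run(["lspci", "-vvv"], capture_output=True, text=True)
    return result.stdout
```

Calling `find_enabled_acs(read_lspci())` returns an empty dict when every ACSCtl flag is negative, which matches the situation described above.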