all_reduce_test stop. #30

Closed
ClaireYang opened this issue Jun 17, 2016 · 3 comments

Comments

@ClaireYang

ClaireYang commented Jun 17, 2016

I am running CentOS 7.0 and CUDA 7.5 on a server with six Tesla cards. When I run ./all_reduce_test 10000000 from the single folder, it hangs and gives no response.

My GPU topo is as below

CPU 0 -- GPU0
      -- GPU1
      -- GPU2
CPU 1 -- GPU3
      -- GPU4
      -- GPU5
Even when I ran ./all_reduce_test 2 0 1, it still did not run.
Do I need to install MPI even if I only use the tests in the single folder? Are the single tests valid for a multi-CPU topology like the one above?
I checked ACSCtl and every entry is negative, so I don't know what else to try.

@sjeaugey
Member

Could this be the same problem as in #19, i.e. you need to turn off ACS?

@ClaireYang
Author

ClaireYang commented Jun 18, 2016

I used lspci -vvv | grep ACSCtl to check, and ACSCtl is disabled on every device. So I don't know what else I can do.
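For anyone repeating this check, a plain grep for ACSCtl drops the device name, so it is hard to tell which bridge a line belongs to. Below is a minimal sketch of a filter that keeps that association. The helper name acs_enabled_devices is made up for illustration, and the sketch assumes the usual lspci -vvv layout: device header lines start in column 0, and each ACSCtl flag is followed by "+" (on) or "-" (off).

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -vvv` text on stdin and prints the header
# line of every device whose ACSCtl line has at least one flag enabled.
acs_enabled_devices() {
    awk '
        /^[0-9a-fA-F]/ { dev = $0 }        # remember the latest device header line
        /ACSCtl:/ && /\+/ { print dev }    # a "+" on the ACSCtl line means a flag is on
    '
}

# Typical use (root is usually needed to see the capability details):
#   sudo lspci -vvv | acs_enabled_devices
```

If this prints nothing, no device reports an enabled ACS flag, which matches the result described above.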

@ClaireYang
Author

A new driver, which has been posted on the NVIDIA website, fixes this issue.
