Experiments on some nodes are 10x slower than others #19
Comments
Hello @h4duan. Did you try to reproduce the issue again on the same nodes? I noticed that one of the GPUs (t006-009, ID 0) remained unused during the execution you mention, but I have been able to run successfully on that same GPU.
Hi,
I just launched a 4-node experiment on mi1008x (t006-[009-010],t007-[009-010]) and found that it ran more than 10 times slower than before. I then ran the exact same experiment on another set of 4 nodes (t004-007,t006-007,t008-[007,009]), and the speed was the same as before. I haven't experienced this issue before. I'm wondering whether something is wrong with the nodes in (t006-[009-010],t007-[009-010]). Thanks!
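For anyone cross-checking the two allocations, here is a minimal sketch that expands the bracketed node lists above into individual hostnames, so the slow and fast sets can be compared node by node. It assumes the lists follow Slurm-style hostlist notation; `expand_hostlist` is a hypothetical helper written for this illustration, not part of any cluster tooling. On a Slurm cluster, `scontrol show hostnames` should give the same expansion.

```python
import re


def expand_hostlist(spec: str) -> list[str]:
    """Expand a Slurm-style host list such as 't006-[009-010],t007-009'.

    Hypothetical helper: handles one bracket group per entry, which is
    enough for the node lists quoted in this issue.
    """
    hosts = []
    # Split on commas that are not inside square brackets.
    for part in re.split(r",(?![^\[]*\])", spec):
        match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", part)
        if not match:
            # Plain hostname, nothing to expand.
            hosts.append(part)
            continue
        prefix, body, suffix = match.groups()
        for item in body.split(","):
            if "-" in item:
                lo, hi = item.split("-")
                width = len(lo)  # keep zero padding, e.g. 009
                for n in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{n:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{item}{suffix}")
    return hosts


# Node lists copied from the report above.
slow_nodes = expand_hostlist("t006-[009-010],t007-[009-010]")
fast_nodes = expand_hostlist("t004-007,t006-007,t008-[007,009]")

print("slow:", slow_nodes)
print("fast:", fast_nodes)
print("shared:", sorted(set(slow_nodes) & set(fast_nodes)))
```

The two allocations share no node, so the comparison alone cannot tell which of the four slow nodes is responsible. Rerunning on each of them individually would narrow it down.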