This repository was archived by the owner on Jun 15, 2021. It is now read-only.

Execution errors with more than 4 nodes #9

Open
e-ago opened this issue Jan 20, 2017 · 0 comments

e-ago commented Jan 20, 2017

I tested the CoMD-CUDA implementation on the Wilkes cluster (Tesla K20 GPUs) using up to 16 nodes, and I got several errors:

8 processes, all along the i direction: crash. Options: -e -i 8 -j 1 -k 1 -x 80 -y 80 -z 80
err_8proc_8x.txt

16 processes: all zeroes. Options: -e -i 4 -j 2 -k 2 -x 40 -y 40 -z 40
err_16proc_size40.txt

Two runs with -e -i 4 -j 2 -k 2 -x 80 -y 80 -z 80 return different "Final energy" values, and both are wrong (they differ from the Final energy of the 4-process run).
out1_16proc_4i4j_size80.txt
out2_16proc_4i4j_size80.txt

In general, several runs with a size < 80 return all zeroes.
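
For anyone trying to reproduce this, here is a minimal sketch that tabulates the three configurations above. It assumes that -i/-j/-k are the process-grid dimensions and -x/-y/-z are the global lattice size in unit cells, as in the reference CoMD options. The `Run` class and its methods are hypothetical helpers for illustration, not part of CoMD-CUDA.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Run:
    """One reported configuration (hypothetical helper, not CoMD code)."""
    procs: Tuple[int, int, int]  # assumed meaning: -i, -j, -k (process grid)
    cells: Tuple[int, int, int]  # assumed meaning: -x, -y, -z (unit cells)

    def ranks(self) -> int:
        # Total number of processes implied by the process grid.
        i, j, k = self.procs
        return i * j * k

    def cells_per_rank(self) -> Tuple[int, int, int]:
        # Unit cells owned by each rank along each axis.
        # Raises if the lattice does not divide evenly across the grid.
        out = []
        for c, p in zip(self.cells, self.procs):
            q, r = divmod(c, p)
            if r != 0:
                raise ValueError(f"{c} cells do not divide evenly over {p} ranks")
            out.append(q)
        return tuple(out)


# The three configurations reported in this issue.
runs = [
    Run(procs=(8, 1, 1), cells=(80, 80, 80)),  # crash
    Run(procs=(4, 2, 2), cells=(40, 40, 40)),  # all zeroes
    Run(procs=(4, 2, 2), cells=(80, 80, 80)),  # differing, wrong energies
]

for run in runs:
    print(run.procs, run.cells, run.ranks(), run.cells_per_rank())
```

Under that assumption, the first two configurations both work out to 10 unit cells per rank along i. Whether that is related to the failures is not established here.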
