On some clusters, the Slurm job sometimes fails to exit when a NaN is found: the simulation stops but the job does not terminate, which is annoying and wastes resources.
The problem is related to OpenMP, since the abort works when running with only one thread. The fix seems to be to have the master thread call the exit. However, a NaN found on any thread must still trigger the exit, not only one found by the master thread, and the thread that finds the NaN should still write out the data showing where it is.
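A minimal sketch of the pattern described above, not the actual NanCheck code. It assumes the data can be treated as a flat array; the names `count_nans` and `check_and_abort` are hypothetical, and `std::abort` stands in for whatever abort call the real code uses (an MPI code would more likely call `MPI_Abort`). Any thread can detect a NaN and report its location, but the abort itself happens only after the parallel region has ended, so a single thread calls it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper: scan `data` in parallel and return how many NaNs
// were found. Every thread that finds a NaN reports where it is, but no
// thread tries to exit from inside the parallel region.
std::size_t count_nans(const std::vector<double> &data)
{
    std::size_t num_nans = 0;
    const long n = static_cast<long>(data.size());

#pragma omp parallel for reduction(+ : num_nans)
    for (long i = 0; i < n; ++i)
    {
        if (std::isnan(data[i]))
        {
            // Serialise the report so that messages from different
            // threads do not interleave.
#pragma omp critical(nan_report)
            {
                std::fprintf(stderr, "NaN found at index %ld\n", i);
            }
            ++num_nans;
        }
    }

    // Sanity check: there cannot be more NaNs than elements.
    assert(num_nans <= data.size());
    return num_nans;
}

// Hypothetical wrapper: the abort is issued here, outside the parallel
// region, so only the initial (master) thread ever calls it.
void check_and_abort(const std::vector<double> &data)
{
    if (count_nans(data) > 0)
    {
        std::fprintf(stderr, "NaN check failed, aborting\n");
        // Assumption: std::abort is a stand-in for the real abort call.
        std::abort();
    }
}
```

The reduction combines the per-thread counts, so a NaN seen by any thread triggers the abort, while the critical section keeps the per-thread location output intact. Compiled without OpenMP support, the pragmas are ignored and the same code runs serially.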
mirenradia changed the title from "Slurm doesn't exit in Nancheck with openMP" to "Simulation sometimes fails to abort when NanCheck finds a NaN with >1 OpenMP thread." on Jul 10, 2023