Describe the bug
When running a periodic calculation with 9 twists, 724 electrons, and 44 atoms using the mixed precision version of QMCPACK with GPU offload, the calculation was aborted with the following error:
NaNguard::checkOneParticleGradients error message: TWF::calcRatioGrad at particle 687
grads[0] = (-nan,0.0418255)
grads[1] = (-nan,-0.0806002)
grads[2] = (-nan,0.0412396)
Unexpected exception thrown in threaded section
Fatal Error. Aborting at Unhandled Exception
This issue appears to be related to NaN values in the gradients of the wave function for a specific particle.
The same calculation with full precision ran smoothly without any problems.
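For triage, it may help to note which components of the reported gradients are NaN. The sketch below is only an illustration, not QMCPACK code: `nan_components` is a hypothetical helper, and the values are copied from the error message above. It suggests that only the real parts are affected, while the imaginary parts stay finite.

```python
import math


def nan_components(grads):
    """Return (index, part) pairs for every NaN real or imaginary component.

    Hypothetical helper for illustration only; it is not part of QMCPACK.
    """
    bad = []
    for i, g in enumerate(grads):
        if math.isnan(g.real):
            bad.append((i, "real"))
        if math.isnan(g.imag):
            bad.append((i, "imag"))
    return bad


# Gradient values copied from the error message for particle 687.
grads = [
    complex(float("nan"), 0.0418255),
    complex(float("nan"), -0.0806002),
    complex(float("nan"), 0.0412396),
]

print(nan_components(grads))
```

All three entries are flagged on the real part only, which may be a useful hint when comparing the mixed precision and full precision code paths.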
Expected behavior
The calculation should complete successfully without encountering NaN values in the wave function gradients, resulting in accurate and stable output data.
System:
System name: Perlmutter
Modules loaded:
module use /global/common/software/nersc/n9/llvm/modules
module load craype cray-mpich
module load llvm/17.0.6-gpu
Other systems where this is reproducible: Not tested on other systems.
Additional context
The calculation was performed using the complex version of QMCPACK with NVIDIA GPU and OpenMP offload.
No other context or error messages were in the output files.
Thanks for the report Roman. Is this the first run you have tried or are other runs either working or failing for you? Any issues with other runs? I see the full precision run of this system was fine.
I first tried it on the larger system and ended up with the same error as for this smaller system. I did not investigate any further. With full precision, I did not run into any issues, as you wrote.
To Reproduce
Input and output files below:
dmc_2x2_single_prec-test.zip