Should we add floating-point traps to our DEBUG GNU builds? -ffpe-trap=invalid,zero,overflow
#1231
Comments
I vote yes; we should enable this for GNU.
I'm in favor as well.
Yes. I don't see a problem with making this the default in the gnu entry for config_compilers.xml.
I noted that we were light on debug flags for GNU, so I added the ones mentioned above as well.
I read that -Og differs from -O0 in that -Og will try optimizations that should not affect debugging. Re-ran acme_dev. Obviously, only the DEBUG=TRUE tests of acme-dev would be affected.
Here is the code:
Hi @ndkeen, would you please try to submit the run again with the offending line commented out, to see if the model runs fine otherwise? The same quantities,
OK, I sent a quick message to the NERSC consultants regarding the other job that failed without info, and they said it looked like the job ran out of memory. Of course, I asked how they are getting this valuable information and why we can't have it... But sure enough, that job as well as this one passed when I doubled the nodes I asked for (slurm will evenly split up the MPI tasks, so I effectively get double the memory). I guess it makes sense that adding more debugging flags could use more memory, but it is good to know how close to the edge we are. I will see if it's easy to request more nodes for these problems in the default PE layouts for cori-knl. So, false alarm.
That is good to know. Thanks Noel!
Still trying to find a PE layout that makes all tests happy for cori-knl. Running acme_developer on edison now. All of the tests passed on edison (except that HOMME is now failing to link, which is unrelated to this change). I can go ahead and merge this to let it be tested on other machines. If it catches something it did not before, that's good, but it could require a little more memory and might cause a failure.
Mvertens/drv flds in
added mct/cime_config/namelist_definition_drv_flds.xml with updated schema
removed bld directory
updated schema for namelist_definition_drv_flds.xml
put in error check that there are no duplicate entries in the drv drv_flds_in that have different values - verified that this works by having CLM change the same namelists that are set by CAM in drv_flds_in
In addition to the scripts regression test - verified that the following tests were bfb with cesm2_0_alpha06f
SMS_D_Ln9.f09_f09.FWAMIP.yellowstone_intel.cam-reduced_hist3s
ERS_Ld7.f19_g16.B1850.yellowstone_intel.allactive-defaultio
ERP_Ln9.f09_f09.FC55CLUBB.yellowstone_intel.cam-outfrq9s
Test suite: scripts_regression_test
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #1217
User interface changes?: None
Code review: jedwards
Again, forgot to reference this issue with my PR.
0f241db response to comments
1007a7a cannot predetermin ndims here
99ef07d Merge pull request #1241 from NCAR/free_new_allocs
29ed162 free recently allocated vars
fbc3584 Merge pull request #1239 from NCAR/dontuse_nc_max
63dee3d Merge pull request #1240 from NCAR/limitto2GiB
64f2492 limit to 2GiB due to romio bug
29aee05 dont use NC_MAX values
d831ad3 Merge pull request #1231 from mgduda/mpi_type_fix
e996bdb Merge pull request #1222 from NCAR/ejh_autoconf_logging
426af22 Partial fix for incorrect type of 'mpi_type' in pioc_support.c
7eb724f added enable-logging option to autotools build
git-subtree-dir: src/externals/pio2
git-subtree-split: 0f241db88cfee1912a2769a052dba0d2d79f83d5
For DEBUG Intel builds, we use the -fpe0 flag, which stops the code on invalid, divide-by-zero, and overflow. However, I don't see these traps enabled for GNU DEBUG builds. Should we add -ffpe-trap=invalid,zero,overflow?
This is from the man page for GNU Fortran:
This is from the man page for Intel Fortran:
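To make the three trapped conditions concrete, here is a minimal sketch in Python rather than Fortran. It is only an analogy: the `trap` helper is hypothetical and is not how gfortran implements -ffpe-trap. It shows how overflow and invalid results can silently propagate as inf and NaN when nothing checks them, and how stopping at the first bad value points directly at the offending expression.

```python
import math


def trap(value, label):
    """Hypothetical helper: stop at the first non-finite value.

    Loosely mimics what a floating-point trap gives you in a DEBUG build;
    it is an illustration, not the compiler's mechanism.
    """
    if math.isnan(value):
        raise FloatingPointError(f"invalid result (NaN) in {label}")
    if math.isinf(value):
        raise FloatingPointError(f"overflow or divide-by-zero (inf) in {label}")
    return value


def unchecked():
    """Without any check, bad values flow on silently."""
    big = 1e308
    overflow = big * 10.0          # exceeds the double range, becomes inf
    invalid = overflow - overflow  # inf - inf is an invalid operation, becomes NaN
    return overflow, invalid


def checked():
    """With the check, execution stops at the first offending expression."""
    big = 1e308
    overflow = trap(big * 10.0, "big * 10.0")  # raises here
    return trap(overflow - overflow, "overflow - overflow")


if __name__ == "__main__":
    print(unchecked())
    try:
        checked()
    except FloatingPointError as err:
        print("stopped:", err)
```

In the unchecked path the NaN only shows up later, far from its cause, which is the debugging problem the trap flags are meant to avoid.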