lib-4412 : UNRECOVERABLE library error of Cray compiler CCE/14.0.0 on Crusher #5001
Please add the location of the error for reference.
@sarats, I could not find the source location of this error because the E3SM log file contains no location information. Pinpointing it may be difficult, since the error occurs inside a binary library.
Naive question: this is a runtime error, right?
@sarats, yes, it is a runtime error. The error message gives no indication of where the issue occurred. One thing we do know from the E3SM log file is that the error occurred after many balance-check warnings were printed. Please see the relevant part of the E3SM log file.
The IBM compiler on Ascent/Summit has a similar issue with these 3 cases. There's a workaround for IBM in NGEET/fates#824. Pinging @rgknox and @glemieux. Have you been able to get accounts on Summit (which would also enable access to Crusher)? IBM: https://my.cdash.org/viewTest.php?onlyfailed&buildid=2223558
@amametjanov I've got access to Summit, but addressing NGEET/fates#702 has been low on the priority list. The solar radiation issue is a known issue as well (NGEET/fates#794). I'll chat with Ryan and Charlie about prioritizing this soon.
@amametjanov, does the workaround you devised for IBM also work for the Cray compiler? @glemieux, getting this fix incorporated would help get things working on Crusher/Frontier. cc @rljacob
@glemieux Crusher access is included with your OLCF/Summit access, and so will Frontier access once it's available.
@amametjanov @sarats The source file (biogeochem/EDCohortDynamicsMod.F90) does not exist in the current master branch. Az, can you point me to the file I should look at?
In the master version from Jan 20, those files are under
@sarats @amametjanov Thanks for the info. FYI, I tried copying your fixes into the latest E3SM and still got a similar error at one of the deallocation statements in "PRTGenericMod.F90".
Possibly something went wrong with the copying. I just rebased my branch onto the latest E3SM (FATES submodule hash def6b3e76f9ff3043150a777f403883b3e659374).
@amametjanov @sarats Yes, after following your directions, I was able to run the three test cases without failure using the Cray compiler. However, a memory leak is detected with the AMD compiler in the following test case: SMS_Ld20.f45_f45.IELMFATES.crusher_amdclang.elm-fates_rd. Even though the memory leak with the AMD compiler remains, I think it is still better to have this fix implemented.
FYI, this should be fixed by NGEET/fates#824 the next time we update the fates submodule to point to tag sci.1.63.2_api.25.1.0.
@glemieux Thanks for the fix. I will test it on my side on Crusher once the fix is visible in the E3SM master branch.
…pi' into next (PR #5369) This pull request updates the ELM-FATES API to provide FATES with the lightning and population density data from FireMod.F90. This gives ELM-FATES users access to the additional SPITFIRE run modes. The design adapts the framework developed for CLM-FATES. The design document discussing the background and general design is available in the FATES Developer's Guide. All non-fates tests should be b4b, as this PR only adds access to additional FATES modes that are not yet covered by existing tests. This also updates the fates pointer to tag sci.1.63.2_api.25.1.0 to bring in the fix for the Cray and IBM pointer deallocation issue, resolving #5001. Fixes #5001 [B4B]
The error message is:
6: lib-4412 : UNRECOVERABLE library error
6: An argument in the DEALLOCATE statement is a disassociated pointer, an
6: unallocated array, or a pointer not allocated as a pointer.
The failing test cases are:
SMS_Ld20.f45_f45.IELMFATES.crusher_crayclang.elm-fates_eca
SMS_Ld20.f45_f45.IELMFATES.crusher_crayclang.elm-fates_rd
ERS_Ld20.f45_f45.IELMFATES.crusher_crayclang.elm-fates