Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two FATES tests that fail #1129

Open
ekluzek opened this issue Aug 31, 2020 · 13 comments · Fixed by #1179
Open

Two FATES tests that fail #1129

ekluzek opened this issue Aug 31, 2020 · 13 comments · Fixed by #1179
Assignees
Labels
bug something is working incorrectly

Comments

@ekluzek
Copy link
Collaborator

ekluzek commented Aug 31, 2020

Brief summary of bug

ERP_Ld9.f45_f45_mg37.I2000Clm45Fates.izumi_nag.clm-FatesAllVars
SMS_Lm6.f45_f45_mg37.I2000Clm45Fates.cheyenne_intel.clm-Fates

General bug information

CTSM version you are using: ctsm1.0.dev108-2-ga69b1b21 (release-cesm2.2 branch before first tag)

Does this bug cause significantly incorrect results in the model's science? N
Configurations affected: FATES

Details of bug

The AllVars test asks for some variables that don't exist. So the list in the user_nl_clm just needs to be shortened.

The other test fails because finidat is being set to the $DIN_LOC_ROOT directory, because there is an empty entry for fates. The empty entry should be removed.

@ekluzek ekluzek added the bug something is working incorrectly label Aug 31, 2020
@ekluzek ekluzek added this to the cesm2.2.0 milestone Aug 31, 2020
@ekluzek ekluzek self-assigned this Aug 31, 2020
ekluzek added a commit that referenced this issue Aug 31, 2020
… equal to /glade/p/cesmdata/cseg/inputdata, this fixes the other part of #1129
@glemieux
Copy link
Collaborator

glemieux commented Oct 1, 2020

@ekluzek I'm seeing this error on a simple f45_f45_mg37 grid case run on cheyenne. What prompted the original removal? Was this commit brought over to main?

@ekluzek
Copy link
Collaborator Author

ekluzek commented Oct 1, 2020

@glemieux I'm not exactly sure what you are asking. The setting for finidat was making it get set to a cold-start, but that contradicted some of the other settings so it wouldn't work.

And no, this hasn't been brought over to main yet, but needs too be.

@glemieux
Copy link
Collaborator

glemieux commented Oct 2, 2020

Sorry @ekluzek I was referring to the original commit that removed the filename from the finidat definition: 73c1bae#diff-189bd11300fa533cd56fbd01e1204a93L609-L610

I don't quite understand how the finidat definition works; if you define something, it's supposed to be telling the hlm to use a specific file that meets the given case setup, correct? But if you remove that file, does it default back to some generic file?

@ekluzek
Copy link
Collaborator Author

ekluzek commented Oct 2, 2020

@glemieux we should perhaps chat about this offline. But, the reason I had to remove that file was that it no longer worked because of the fsurdat file update. A problem with FATES is that we've got to have exact matches and can't use the interpolation. So we either have to constantly update finidat files because of changes that demand a new one -- or do a coldstart. What I did was to just remove the filename, but I should have removed the whole thing.

For fates if it can't find a file it'll default to a coldstart. For other configurations it can usually find some file that will match.

@glemieux
Copy link
Collaborator

glemieux commented Oct 2, 2020

Thanks @ekluzek that helped fill a gap in my knowledge.

Chatting offline, maybe at the next ctsm software meeting, about when to bring the fix in sounds good.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Mar 16, 2021

The issue was that these tests fail with the NAG compiler when DEBUG is not TRUE. So we changed our tests so that they are all DEBUG on for izumi_nag.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Mar 16, 2021

This came up again because this test was not working for @fischer-ncar

ERS_Ld5_Mmpi-serial.1x1_vancouverCAN.I1PtClm45SpRs.izumi_nag.clm-default

We need to remove it from the test list or make it with DEBUG on.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Mar 16, 2021

The error we see looks like this...

Warning: /scratch/cluster/erik/ctsm5.1.dev026/src/fates/parteh/PRTAllometricCNPMod.F90, line 2280: H_ALLOM explicitly imported into PRTALLOMETRICCNPMOD but not used
Warning: /scratch/cluster/erik/ctsm5.1.dev026/src/fates/parteh/PRTAllometricCNPMod.F90, line 2280: LEAVES_ON explicitly imported into PRTALLOMETRICCNPMOD but not used
Extension: /scratch/cluster/erik/ctsm5.1.dev026/src/fates/parteh/PRTAllometricCNPMod.F90, line 1665: Argument A2 (no. 2) data type DOUBLE PRECISION inconsistent with previous argument with data type REAL in reference to intrinsic MAX
Panic: /scratch/cluster/erik/ctsm5.1.dev026/src/fates/parteh/PRTAllometricCNPMod.F90: Unexpected expr node type 432
Internal Error -- please report this bug

@ekluzek
Copy link
Collaborator Author

ekluzek commented Mar 16, 2021

Note a related error that a user is getting with an old GNU compiler is here:

https://bb.cgd.ucar.edu/cesm/threads/error-build-fail-clm-buildlib-failed-clm5-0-porting.5509/#post-39508

@ekluzek ekluzek added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Mar 16, 2021
billsacks added a commit to ESCOMP/MOSART that referenced this issue Mar 17, 2021
These are failing due to a nag compiler issue similar to what is
reported here: ESCOMP/CTSM#1129

@ekluzek fyi
@billsacks
Copy link
Member

billsacks commented Mar 18, 2021

I looked briefly into the error reported in the forum post, which points to this line, if I understand correctly:

     ! Have the global generic pointer, point to this hypothesis' object
     prt_global => prt_global_acnp

This does look suspicious to me: prt_global is declared as:

  type(prt_global_type),pointer,public :: prt_global

whereas prt_global_acnp is declared as:

   class(prt_global_type), public, target, allocatable :: prt_global_acnp

Note the type vs. class. Class is more general than type, so I'm not surprised that some compilers don't like having you set the specific prt_global point to the more general prt_global_acnp: what would happen if this prt_global_acnp instance were actually some subclass? (I'm actually surprised that any compiler allows this.)

I think this should be changed in one of 3 ways:

  1. Declare prt_global to be class(prt_global_type) instead of type
  2. Declare prt_global_acnp to be type(prt_global_type) instead of class
  3. (probably not the right solution here, but possible) Wrap the pointer assignment in a "select type" block, only doing the the assignment if prt_global_acnp satisfies "type is prt_global_type", aborting in any other case.

@rgknox
Copy link
Collaborator

rgknox commented Mar 18, 2021

Thanks for looking into this everyone. I'm leaning towards option 1 @billsacks

@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Mar 18, 2021
@ekluzek
Copy link
Collaborator Author

ekluzek commented Dec 7, 2021

OK, this error is still happening in ctsm5.1.dev064. So the proposed fix didn't completely fix it. We've gotten around it by only running NAG tests on izumi with DEBUG on. This is a problem for CAM test lists though. We keep talking about this in CSEG meetings and aren't sure what to do.

I'm reopening to raise awareness...

@ekluzek
Copy link
Collaborator Author

ekluzek commented Aug 19, 2023

I've put our workaround of making sure on izumi_nag only DEBUG tests are done on the release-cesm2.2 branch. So that clears the cesm2.2 milestone for this issue.

The issue itself still exists on the main dev branch though...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something is working incorrectly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants