tveg causing errors in global nocomp runs #973

Closed
JessicaNeedham opened this issue Jan 12, 2023 · 5 comments · Fixed by #976

Comments

@JessicaNeedham
Contributor

The variable tveg is causing errors in global runs with nocomp on. With debug=TRUE there's a floating-point divide-by-zero error here:

hio_tveg(io_si) = hio_tveg(io_si) + &

This comes from site_area_veg being zero in some patches. It is defined here:

site_area_veg = area - sites(s)%area_pft(0)
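
To illustrate the failure mode, here is a minimal sketch of guarding the normalization against a zero vegetated area. It is not the FATES code and not the fix discussed in this thread: `area_weighted_mean`, `weighted_sum`, and `fallback` are hypothetical names, and only `site_area_veg` comes from the snippet above.

```fortran
module tveg_guard_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
contains
  ! Hypothetical helper: divide an area-weighted sum by the vegetated area.
  ! When there is no vegetated area, return the caller-supplied fallback
  ! instead of dividing by zero.
  pure function area_weighted_mean(weighted_sum, site_area_veg, fallback) &
       result(mean_val)
    real(r8), intent(in) :: weighted_sum, site_area_veg, fallback
    real(r8) :: mean_val
    if (site_area_veg > 0.0_r8) then
       mean_val = weighted_sum / site_area_veg
    else
       mean_val = fallback
    end if
  end function area_weighted_mean
end module tveg_guard_sketch

program demo_tveg_guard
  use tveg_guard_sketch
  implicit none
  ! Vegetated site: ordinary division.
  print *, area_weighted_mean(3000.0_r8, 10.0_r8, 0.0_r8)
  ! Bareground-only site: the fallback is returned.
  print *, area_weighted_mean(0.0_r8, 0.0_r8, 0.0_r8)
end program demo_tveg_guard
```

A guard like this only hides the symptom. Whether a bareground-only site should contribute to the variable at all is a separate question.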

If debug is false then the following error gets triggered:

Non blocking write for variable (FATES_TVEG, varid=260) failed (Number of subarray requests/regions=1, Size of data local to this process = 792). NetCDF: Numeric conversion not representable (err=-60).

@glemieux
Contributor

Thanks for the heads up @JessicaNeedham. I think this suggests that somehow there are patches on a bareground site that aren't having their patchno set to zero.

@glemieux
Contributor

@JessicaNeedham how quickly into the run are you seeing this? In the cases you are running are you using a restart?

@JessicaNeedham
Contributor Author

@glemieux The divide-by-zero error occurs during compilation if debug=TRUE. Otherwise, if debug=FALSE, the error occurs after the first history file is written to output. I get one month of output and then it crashes.

My simulations are from bare ground.

@glemieux
Contributor

glemieux commented Jan 19, 2023

@JessicaNeedham determined that the issue arises because patchno is only set under an hlm_use_sp check rather than a broader nocomp check here:

if(hlm_use_sp.eq.itrue)then
   patchno = 1
   currentPatch => currentSite%oldest_patch
   do while(associated(currentPatch))
      if(currentPatch%nocomp_pft_label.eq.0)then
         ! for bareground patch, we make the patch number 0
         ! we also do not count this in the veg. patch numbering scheme.
         currentPatch%patchno = 0
      else
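
As a rough illustration of what a broader check could look like, the sketch below numbers patches in a linked list and assigns 0 to the bareground patch whenever either flag is on. It is a self-contained sketch under stated assumptions, not the change made in #976: `patch_type`, `younger`, `set_patchno`, and the `use_nocomp` flag are hypothetical, and the `else` branch (which is cut off in the snippet above) is guessed to number vegetated patches sequentially.

```fortran
module patch_numbering_sketch
  implicit none
  integer, parameter :: itrue = 1, ifalse = 0

  ! Hypothetical stand-in for a patch in the site's linked list.
  type :: patch_type
     integer :: nocomp_pft_label = 0   ! 0 marks the bareground patch
     integer :: patchno = -1
     type(patch_type), pointer :: younger => null()
  end type patch_type

contains

  subroutine set_patchno(oldest_patch, use_sp, use_nocomp)
    type(patch_type), pointer, intent(in) :: oldest_patch
    integer, intent(in) :: use_sp, use_nocomp   ! use_nocomp is an assumed flag
    type(patch_type), pointer :: currentPatch
    integer :: patchno

    patchno = 1
    currentPatch => oldest_patch
    do while (associated(currentPatch))
       if ((use_sp == itrue .or. use_nocomp == itrue) .and. &
            currentPatch%nocomp_pft_label == 0) then
          ! Bareground patch: number 0, not counted among vegetated patches.
          currentPatch%patchno = 0
       else
          ! Assumed behavior: vegetated patches are numbered 1, 2, ...
          currentPatch%patchno = patchno
          patchno = patchno + 1
       end if
       currentPatch => currentPatch%younger
    end do
  end subroutine set_patchno

end module patch_numbering_sketch

program demo_patch_numbering
  use patch_numbering_sketch
  implicit none
  type(patch_type), target :: bare, veg1, veg2
  type(patch_type), pointer :: head

  bare%nocomp_pft_label = 0
  veg1%nocomp_pft_label = 3
  veg2%nocomp_pft_label = 7
  bare%younger => veg1
  veg1%younger => veg2
  head => bare

  ! nocomp on, SP mode off: the bareground patch should still get patchno 0.
  call set_patchno(head, ifalse, itrue)
  print *, bare%patchno, veg1%patchno, veg2%patchno
end program demo_patch_numbering
```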

Updating this check allows an fbg+nocomp case with debug on to run successfully on Compy. That said, when I tried replicating the fix on Cori, I ran into a new error that I'm still trying to track down. Our first guess is that it may be due to differences in the debug compiler options between the two machines, and that the issue I'm seeing may be real.

UPDATE: The new error I was seeing was actually a recurrence of #911, because I was testing off of an older fates tag that didn't include this fix. Testing this with the updates on #888 results in a passing nocomp+fbg debug mode test.

@glemieux
Contributor

I was able to replicate this issue with clm-fates on Cheyenne as well when running a test in debug mode. Testing out the fix via #976, however, resulted in a downstream issue in spitfire, again involving patchno, so this will need some more review. I suspect we simply need to add the common nocomp_pft_label check to the patch loops in SFMainMod.
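
For reference, a patch loop with a nocomp_pft_label check might look roughly like the sketch below, which accumulates a quantity over vegetated patches only. The contents of SFMainMod are not shown in this thread, so everything here is an assumption for illustration: `patch_type`, `younger`, `area`, and `vegetated_area` are hypothetical names.

```fortran
module veg_patch_loop_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)

  ! Hypothetical stand-in for a patch in the site's linked list.
  type :: patch_type
     integer  :: nocomp_pft_label = 0   ! 0 marks the bareground patch
     real(r8) :: area = 0.0_r8
     type(patch_type), pointer :: younger => null()
  end type patch_type

contains

  ! Sum the area of all patches except the bareground patch.
  function vegetated_area(oldest_patch) result(total)
    type(patch_type), pointer, intent(in) :: oldest_patch
    real(r8) :: total
    type(patch_type), pointer :: currentPatch

    total = 0.0_r8
    currentPatch => oldest_patch
    do while (associated(currentPatch))
       ! Skip the bareground patch; only vegetated patches contribute.
       if (currentPatch%nocomp_pft_label /= 0) then
          total = total + currentPatch%area
       end if
       ! Advance unconditionally so the loop always terminates.
       currentPatch => currentPatch%younger
    end do
  end function vegetated_area

end module veg_patch_loop_sketch

program demo_veg_patch_loop
  use veg_patch_loop_sketch
  implicit none
  type(patch_type), target :: bare, veg
  type(patch_type), pointer :: head

  bare%nocomp_pft_label = 0
  bare%area = 4000.0_r8
  veg%nocomp_pft_label = 5
  veg%area = 6000.0_r8
  bare%younger => veg
  head => bare

  print *, vegetated_area(head)
end program demo_veg_patch_loop
```

Using an if-block rather than `cycle` keeps the pointer advance in one place, which avoids an accidental infinite loop when a patch is skipped.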
