2-Moment Moist is non-zero-diff with MAPL develop #762
Comments
@mathomp4 Not sure what's different/changed?
|
Is it because of the MAPL update or because of all the stuff we merged into develop in GEOSgcm_GridComp? |
My test is the difference between using MAPL v2.6.3 and MAPL |
As I said above, I am building GEOS with MAPL v2.6.4 now. That can at least help us bracket where the difference occurs. ETA: I should have those results in...30 minutes? I'm building with |
Update. The difference appears to be between v2.6.4 and https://github.com/GEOS-ESM/MAPL/compare/v2.6.4..develop which is...surprising. There is not much there. I'm going to start grabbing commits and see what I can see. |
In the words of Scooby-Doo, "ruh-roh". Yeah, not an FP calculation in sight in those MAPL changes. I guess the next step is to start from 2.6.4 and cherry-pick the other things on develop to see which one breaks it. This of course assumes it's not some weird combination. But at best this must be some sort of memory corruption from what I can see. |
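The bracketing strategy described above (a known-good tag, a known-bad branch, and a search for the first commit that changes answers) can be sketched as a binary search. This is a minimal Python sketch, not anything taken from the thread: `first_bad_commit`, the `is_zero_diff` callback, and the commit labels are hypothetical, and it assumes a single commit flips the result, which is exactly the "weird combination" caveat raised in the comment.

```python
def first_bad_commit(commits, is_zero_diff):
    """Return the first commit whose build is no longer zero-diff.

    commits: list ordered oldest to newest; commits[0] is assumed known good
             and commits[-1] is assumed known bad.
    is_zero_diff: hypothetical callback that builds and runs the model at a
                  commit and returns True if answers match the baseline.
    """
    lo, hi = 0, len(commits) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_zero_diff(commits[mid]):
            lo = mid  # the change came in after mid
        else:
            hi = mid  # the change came in at or before mid
    return commits[hi]


# Illustrative labels only; "b" stands for the commit that changes answers.
history = ["v2.6.4", "a", "b", "c", "develop"]
culprit = first_bad_commit(history, lambda c: history.index(c) < 2)
```

If the difference only appears when two commits are combined, this search can point at the wrong commit, so the result should be confirmed by reverting the suspect on its own.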
First update. These are fine:
|
Yeah. This is the one that hopefully doesn't happen! |
@yvikhlya can you list what you worked something out with @bena-nasa on using
|
I have an idea that I am testing now. |
@sanAkel No. the CubeSphere change is fine. That one didn't cause the difference. But as I said above, I think I have an idea |
Huh. well, my idea didn't pan out. Hmm. I mean, I found a bug and I hoped that was the issue. But nope. Back to a careful survey! |
I never worked out anything in MAPL_CubedSphereGridFactory.F90 |
Update. Looks like if you take all the files in |
Testing seems to show that the change that matters was going from:

    if (this%stack%empty()) then
       block
         use MPI
         integer :: rank, ierror
         call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
         if (rank == 0) then
            _ASSERT(.false., "Timer "//name//' should not start when empty: ')
         end if
       end block
       return
    else
       node => this%stack%back()
       if (.not. node%has_child(name)) then
          m = this%make_meter()
          call node%add_child(name, m) !this%make_meter())
       end if
    endif

to:

    if (this%stack%empty()) this%status = INCORRECTLY_NESTED_METERS
    _ASSERT_RC(.not. this%stack%empty(),"Timer <"//name// "> should not start when empty.",INCORRECTLY_NESTED_METERS)
    node => this%stack%back()
    if (.not. node%has_child(name)) then
       m = this%make_meter()
       call node%add_child(name, m) !this%make_meter())
    end if

That change alone can cause the non-zero-diffness. 🤷🏼♂️ |
The timers form a tree of nodes. Note that the change above is just changing the structure of the logic. around the block with the There really is no evidence that there is any bug anywhere around this implementation, and the implementation it replaced is provably buggy, though a very different sort of bug. The previous version would not have printed an error if a timer mismatch was detected on a non-root process. (An unlikely problem, but ...) |
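To make the point about the two control-flow structures concrete, here is a minimal Python sketch of a timer tree driven by a stack. It is not MAPL's implementation: the class and method names, the numeric status values, and the step that pushes the child onto the stack (which the quoted Fortran does not show) are all assumptions for illustration. It only shows that both variants build the same tree when the stack is non-empty, and differ only in how an empty stack is reported on non-root ranks.

```python
SUCCESS = 0
INCORRECTLY_NESTED_METERS = 1  # name from the thread; the value is an assumption


class TimerNode:
    """One node in the timer tree; children are keyed by timer name."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def has_child(self, name):
        return name in self.children

    def add_child(self, name):
        self.children[name] = TimerNode(name)


class Profiler:
    """Tracks the currently running timers as a stack of tree nodes."""

    def __init__(self):
        self.root = TimerNode("root")
        self.stack = []
        self.status = SUCCESS

    def start_root(self):
        self.stack.append(self.root)

    def _start_child(self, name):
        # Shared by both variants: attach the child on first use.
        node = self.stack[-1]
        if not node.has_child(name):
            node.add_child(name)
        # Assumption: the unquoted remainder of the routine descends into the child.
        self.stack.append(node.children[name])
        return SUCCESS

    def start_old(self, name, rank):
        # Old structure: only rank 0 reports the empty-stack error.
        if not self.stack:
            if rank == 0:
                return INCORRECTLY_NESTED_METERS
            return SUCCESS  # every other rank returns silently
        return self._start_child(name)

    def start_new(self, name, rank):
        # New structure: every rank records and returns the error.
        if not self.stack:
            self.status = INCORRECTLY_NESTED_METERS
            return INCORRECTLY_NESTED_METERS
        return self._start_child(name)
```

In this sketch neither variant touches any floating-point model state, which is why a change of this kind would not be expected to alter answers.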
@tclune Appreciate that explanation.
In that case, and if this change is the sole reason for a different answer, I am willing to take the change, by categorizing it as "a bug fix"! (Adding) @mathomp4 seems to have shown that indeed |
It's not that simple though. (A) The bug it fixes was not happening in practice, and (B) the change should not affect results anyway. I agree that we currently have no evidence that the new results are better or worse than the old, but the situation is simply not acceptable. I would prefer to revert the change (after maybe trying more variants) rather than assume the change is OK. At least if we get the old answer, it's not the SI team's (or my!) fault. |
I wasn't aware of
Of course not! I had been planning on a new run with MOM6 (need to check some things first). Is it worth:
|
Update of extra tests running MOM6, 1 day, c90:
Uhhhh... yeah. 🤷🏼♂️ |
Ha! We have a clear direction to proceed! |
Note: I've probably never compared MAPL All I really can say is that Intel Release |
@sanAkel We don't really have any advice at the moment. Quite a distressing situation. |
This could be due to non zero difference in atmosphere stress forcing. This would have no effect in AMIP mode, since AMIP runs with data ocean. |
@mathomp4 It may seem unrelated but perhaps it does? |
Well all of my tests have that fix in it. So I am comparing like to like. I only changed the MAPL branch. That is it. |
As an extra data point, I build GEOS with |
Another update. @tclune had a thought to try both MAPL |
Since this is MOM related, I suppose my next test will be to turn off heap arrays in...MAPL? Everywhere? We know MOM was sensitive to heap-arrays in FMS, maybe this is now showing its head in MAPL |
Well, not sure what's happening. I've tried:
|
@mathomp4 what does
|
FAIL means that MAPL |
Does that mean, |
Yeah. All my tests so far have shown no change (save for all |
After many many tests, it looks like it is the 2-moment microphysics which causes this. The good news is we can see this without needing to run MOM which allows for tinier problems. Might be something to bring in @dbarahon at some point to see if maybe 2-moment has a possible memory issue somewhere? 🤷🏼♂️ |
Quoth @tclune: Bill is not surprised. Closing. |
@dbarahon I just re-opened this issue for your attention. Thanks! |
Thought it was resolved. Is this still the issue? 2-Moment Moist is non-zero-diff with MAPL develop |
@dbarahon please see ⬆️ . Essentially the No. This was closed because the problem was certainly not due to MAPL, but choosing 2-moment option. |
This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue. |
Not stale...though still unknown. |
@mathomp4 indeed status is unknown. Added |
This might be a "we expect that, Matt" thing, but my tests last night showed that if you run MOM6 (or MOM5) with MAPL develop it is non-zero-diff to MAPL v2.6.3 (the version in GEOSgcm main). This is a real-life, state-is-different change:
I am going to build GEOS with MAPL v2.6.4 and see if I can at least figure out where the change came in. (CC @sanAkel @yvikhlya )
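For readers unfamiliar with the term, "zero-diff" here means the two runs produce bit-for-bit identical state, not merely values that agree within a tolerance. The following is a minimal Python sketch of that distinction; `is_zero_diff` is a hypothetical helper and is not the comparison tool used in these tests.

```python
import struct


def is_zero_diff(a, b):
    """Return True only if two sequences of floats are bit-for-bit identical."""
    if len(a) != len(b):
        return False
    # Pack as IEEE 754 doubles and compare the raw bytes.
    return struct.pack(f"{len(a)}d", *a) == struct.pack(f"{len(b)}d", *b)


same = is_zero_diff([1.0, 2.5], [1.0, 2.5])
differs = is_zero_diff([0.1 + 0.2], [0.3])
```

A tolerance-based check would accept the second pair, since the values differ only in the last bits, but a zero-diff check rejects it.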