Use adjust_ps=false in Cxx theta-l_kokkos code #6063
This adjust_ps logical has come up in other PRs (#4717) and the comment is always "we're not going to change it for EAM because we don't want to change v2 answers". We're assembling v3 now so time to reconsider that? Also this issue E3SM-Project/scream#1343 says "E3SM should also migrate to adjust_ps=.false., but this is on hold in order to preserve buggy V2 behavior." |
, m_policy_dp3d_lim (Homme::get_default_team_policy<ExecSpace,TagDp3dLimiter>(m_num_elems))
, m_tu(m_policy_dp3d_lim)
, m_dp3d_thresh(params.dp3d_thresh)
, m_vtheta_thresh(params.vtheta_thresh)
, m_policy_dp3d_lim (Homme::get_default_team_policy<ExecSpace,TagDp3dLimiter>(m_num_elems))
, m_tu(m_policy_dp3d_lim)
This is unrelated, but gives us a warning in EAMxx that I would like to remove. Let me know if I need to revert for the PR.
To Rob's question: Yes, EAM also has this bug and EAM should adopt adjust_ps=.false. But I didn't want to suggest it since it's a small effect and we are so close to the V3 deadlines. |
Will this be answer changing for SCREAM simulations? |
When we merge for EAMxx it will be answer changing for C++, not otherwise ( |
Ok, and everyone has agreed to that? I ask because it looks like the old method is being removed from C++, so we won't be able to switch back if there are issues. |
I was about to ask whether we want climo for this change, to document it, even if we can do climo only in eam. I ran climo for this long time ago, so the best would be to rerun https://acme-climate.atlassian.net/wiki/spaces/COM/pages/1632864081/Climo+6-year+runs+for+ps+adjustment+dp3d+adjustment+T+adjustment+choices . |
Correct - (removing old code). This has been tested in SCREAM v0.1, but then the bug was accidentally turned back on in v1. so it hasn't had extensive testing in SCREAM v1, but it is low risk. |
Also, can we change the description a little? Something like
@tcclevenger did you run homme suites for this? if so, would you please post the output from chrysalis and weaver here? thanks! |
Though my comment above is suggestive, i actually think we should rerun climo for this change. If @tcclevenger does not want to, i can do it. |
@oksanaguba I have run all theta-f* tests on my workstation. I am struggling with weaver at the moment. Have you successfully run there since the drivers and modules were updated? I will continue to try, or move to summit. And I do not have access to chrysalis. |
@tcclevenger i haven't used weaver for a while. |
Waiting on @tcclevenger to test on GPU. |
An update: I now have homme building on weaver, but the F90 As for summit, I'm running into an internal compiler error whose solution doesn't seem trivial. I'll most likely continue on the weaver path. |
Is the Summit ICE for the standalone-Homme build? Does it occur with this configuration?
This configuration worked for me with the master branch of a few weeks ago; you might check the master branch in addition to your branch. You should also merge this PR into the SCREAM repo and run the CIME SCREAMv1 test suite, e.g., the specific test ERS_Ln90.ne30pg2_ne30pg2.F2010-SCREAMv1. Another point: The ICE might depend on modules. You can source the .env_mach_specific.sh file from a CIME test to get an environment to use when building standalone Homme. |
@oksanaguba @ambrad Looks like my issue on summit was the modules! Loading them through CIME has it building without error. Thanks for the help! |
Force-pushed from 71ec93c to f61f055.
Ran the
I also merged into EAMxx and ran AT test suite on weaver and mappy. Weaver passed, Mappy failed with diffs in CIME cases (expected) with normalized diffs in range [2e-5, 1]. I also ran same CIME cases on summit and diffs existed in the same fields as CPU and the same values. @mt5555 Do these numbers seem reasonable (if that is possible to know), or are there any simulations I need to run to test the output? @oksanaguba can you run the Chrysalis tests. I don't have access yet. |
Ran climo for model-vs-model for EAM with the default setup (adjust_ps = true, master branch) and with adjust_ps = false. The climo is here https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/onguba/theta/eam-ne30-dpadj.443759.0002-0006/def/viewer/ , not sure if these diffs are considered small, but they are documented. |
@oksanaguba Do you know if there are cxx vs. F90 tests using |
@oksanaguba What still needs to be looked at for this? Is there anything I can do on my end? |
@tcclevenger i want to check one more time which code line(s) actually make adjust_ps vs adjust_dp cases diverge when forcing is zero. that is something i tried to do before, but did not succeed. i plan to get back to this at the end of this week. |
Regarding nonbfb behavior for adjust_ps=true vs false with use_moisture=true, it is due to this code
that is, phydro is computed differently. when using test theta-f1-tt10-hvs1-hvst0-r2-qz10-nutopoff-GB-ne2-nu3.4e18-ndays1 these differences do not show up during the 1st time step (not sure why, it is not as if the values are represented as decimals)
but show up in the 2nd time step
I do not see anything else that would be a concern. |
i am running homme suite on chrysalis, after that this will be ready. |
@tcclevenger forcing test passed for me when running a few times by hand, but failed in test suite, with seed 1206257716 . So i hardcoded the seed into forcing_ut.cpp and it does fail, with nans (not like results differ). last time we saw it it was about init of ps and init of dp3d mismatch (dp3d did not get init-ed from A,B). you fixed that, so i am not sure what else is there. one of us would need to debug. |
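For reference, the "hardcode the failing seed" step can be sketched as below. This is a minimal illustration and not the code in forcing_ut.cpp: `pick_seed` and `random_column` are hypothetical helpers, and the override is passed in as plain text (for example read from an environment variable by the caller). The point is only that the same seed replays the same random inputs, so a failure seen in the suite can be reproduced by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Hypothetical helper: use the override text as the seed when one is given,
// otherwise draw a fresh seed. A real test should also print the seed it used.
unsigned pick_seed(const char* override_text) {
  if (override_text != nullptr && override_text[0] != '\0') {
    return static_cast<unsigned>(std::strtoul(override_text, nullptr, 10));
  }
  return std::random_device{}();
}

// Hypothetical helper: fill a column with pseudo-random values in [0, 1).
// The same seed always produces the same column on a given platform.
std::vector<double> random_column(unsigned seed, std::size_t n) {
  std::mt19937_64 engine(seed);
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  std::vector<double> column(n);
  for (std::size_t k = 0; k < n; ++k) {
    column[k] = dist(engine);
  }
  return column;
}
```

Calling `random_column(pick_seed("1206257716"), n)` twice gives identical inputs, which is what makes the NaN failure above repeatable.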
@tcclevenger i believe one issue with the new code is that this init of dp is not what we want:
See, for example, this correct code for dp from A,B:
I tried this fix:
but it also did not work (meaning it seems that there is still a mismatch between this dp and dp as computed from p(A,B,ps)). So I printed averages of Ai vs Am and for me they do not match (but they should). We can re-iterate on Tuesday in case i made mistakes. |
Thanks for looking at this, @oksanaguba. I can try to do some debugging this week! |
This resolved it for me, that is, now hydro pressure computed from dp coincides with the pressure from A,B:
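A small standalone sketch of the consistency check discussed here, assuming the usual hybrid-coordinate form p_i(k) = A_i(k)*p0 + B_i(k)*ps at interfaces. The function names are hypothetical and this is not the Homme code; it only shows that dp built from interface A,B sums back to the A,B pressure, which is the property the forcing test init was violating.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Layer thickness from interface hybrid coefficients (needs >= 2 interfaces):
// dp[k] = (hyai[k+1] - hyai[k]) * p0 + (hybi[k+1] - hybi[k]) * ps
std::vector<double> dp_from_hybrid(const std::vector<double>& hyai,
                                   const std::vector<double>& hybi,
                                   double p0, double ps) {
  std::vector<double> dp(hyai.size() - 1);
  for (std::size_t k = 0; k + 1 < hyai.size(); ++k) {
    dp[k] = (hyai[k + 1] - hyai[k]) * p0 + (hybi[k + 1] - hybi[k]) * ps;
  }
  return dp;
}

// Largest relative mismatch between interface pressure obtained by summing dp
// from the model top and interface pressure taken directly from A, B, ps.
double max_rel_mismatch(const std::vector<double>& hyai,
                        const std::vector<double>& hybi,
                        double p0, double ps) {
  const std::vector<double> dp = dp_from_hybrid(hyai, hybi, p0, ps);
  double p_sum = hyai[0] * p0 + hybi[0] * ps;  // pressure at the model top
  double worst = 0.0;
  for (std::size_t k = 0; k < dp.size(); ++k) {
    p_sum += dp[k];
    const double p_ab = hyai[k + 1] * p0 + hybi[k + 1] * ps;
    worst = std::fmax(worst, std::fabs(p_sum - p_ab) / p_ab);
  }
  return worst;
}
```

If dp is initialized any other way (for example from random values unrelated to ps), this mismatch is no longer at roundoff level, and the two pressure definitions diverge.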
Please commit the changes from above (hv init and dp calculations), run on gpu standalone forcing test (i assume there is no need to run ERS test, since the above does not touch forcing functor), and then i will re-test multiple times on chrysalis. Thanks. |
@oksanaguba Implemented your changes, test with that specific seed before and after and went from failing to passing. Tested forcing ut on weaver V100 gpu and builds and runs without errors and test passes. |
discussion: Oksana will test one more time. |
Running hommebfb on chrysalis as expected
also forcing ut passed 100 times:
This PR is ready. |
@oksanaguba you're the integrator on this. Go ahead and start merging it. |
This is an incremental change to introduce adjustment of dp3d instead of adjustment of ps. adjust_ps is still used for preqx (forcing was never rewritten), rsplit=0 (only adjust_ps is possible), or EAM (to not touch every test). Fixed bugs in forcing_ut init, testing it only with adjust_ps=false option. [non-BFB] for HOMME tests with baselines (with moist forcing and some without nontrivial forcing, because even zero forcing is processed differently for adjustment of dp3d).
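To make the two options concrete, here is a hedged sketch of the difference as I understand it from this thread, not a copy of the Homme or Hommexx forcing code. All names are hypothetical, and `dq[k]` stands for the moisture mass change of layer k over the step, in pressure units. With adjust_ps the whole column change goes into ps and every layer is rebuilt on reference levels from A,B; with adjust_dp3d each layer absorbs its own change.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Column = std::vector<double>;

// adjust_dp3d style: each layer thickness takes its own mass change.
// Assumes dp and dq have the same length.
Column adjust_dp3d_sketch(const Column& dp, const Column& dq) {
  Column out(dp.size());
  for (std::size_t k = 0; k < dp.size(); ++k) {
    out[k] = dp[k] + dq[k];
  }
  return out;
}

// adjust_ps style: the column total goes into ps, then all layers are
// recomputed from interface coefficients, so they stay on reference levels.
// Assumes hyai and hybi have dq.size() + 1 entries.
Column adjust_ps_sketch(const Column& hyai, const Column& hybi,
                        double p0, double ps, const Column& dq) {
  const double ps_new = ps + std::accumulate(dq.begin(), dq.end(), 0.0);
  Column out(dq.size());
  for (std::size_t k = 0; k < dq.size(); ++k) {
    out[k] = (hyai[k + 1] - hyai[k]) * p0 + (hybi[k + 1] - hybi[k]) * ps_new;
  }
  return out;
}

// Total column mass (sum of layer thicknesses).
double column_sum(const Column& c) {
  return std::accumulate(c.begin(), c.end(), 0.0);
}
```

Both variants change the column total by the same amount but distribute it differently across layers. Even with zero forcing the adjust_ps path recomputes dp from A,B and ps rather than passing dp3d through untouched, which is consistent with the non-BFB note above.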
cdash is still affected by chrysalis, i cannot sort out yet which tests diff-ed because of this PR.
I think you are good. This PR only caused diffs in: ERS_D.ne4pg2_oQU480.F2010.mach_comp.eam-hommexx, SMS_Ln5.ne30pg2_ne30pg2.F2010-SCREAM-LR-DYAMOND2, and SMS_Ln9_P24x1.ne4_ne4.FDPSCREAM-ARM97 |
I just realized we have a deficiency here -- we have no choice but to keep both options, adjust_ps and adjust_dp3d, therefore we need to print in the logs which one is active. i am (again) looking at the code instead of just reading a log file. |
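One possible shape for that log line, as a sketch only: `describe_adjustment` is a hypothetical name, and where it would be called from (homme or hommexx init) is left open.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the one-line message stating which mass
// adjustment is active, so it can be read from the log instead of the code.
std::string describe_adjustment(bool adjust_ps) {
  return adjust_ps
      ? "forcing adjustment: adjust_ps=true (ps adjusted, dp3d rebuilt from A,B)"
      : "forcing adjustment: adjust_ps=false (dp3d adjusted per layer)";
}
```

Printing this once at initialization, in both the F90 and C++ paths, would be enough to tell from a log which option a run used.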
#endif

#ifdef CAM
  adjust_ps=.true. ! For CAM runs, require forcing to stay on reference levels
#endif
this last change is the problem for cdash -- DYAMOND and DPSCREAM tests ( SMS_Ln5.ne30pg2_ne30pg2.F2010-SCREAM-LR-DYAMOND2.chrysalis_intel and SMS_Ln9_P24x1.ne4_ne4.FDPSCREAM-ARM97.chrysalis_intel ) both have -DCAM and -DSCREAM defined in ./cmake-bld/cmake/atm/CMakeFiles/atm.dir/flags.make . The code is written as if only one of them is active. Previously v0 scream was using adjust_dp3d, but now it is switched to adjust_ps .
Changed to only force adjust_ps=.true. if CAM is defined and SCREAM is not defined.
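The guard logic can be sketched like this. CAM and SCREAM are the macro names from the build flags mentioned above; `force_adjust_ps` and the `k...` constants are hypothetical and only illustrate the truth table, they are not names from the Homme source.

```cpp
#include <cassert>

// Hypothetical helper: adjust_ps is forced on only for CAM builds that are
// not also SCREAM builds. SCREAM v0 builds define both macros.
constexpr bool force_adjust_ps(bool cam_defined, bool scream_defined) {
  return cam_defined && !scream_defined;
}

// Translate the build macros into booleans once.
#if defined(CAM)
constexpr bool kCamDefined = true;
#else
constexpr bool kCamDefined = false;
#endif

#if defined(SCREAM)
constexpr bool kScreamDefined = true;
#else
constexpr bool kScreamDefined = false;
#endif

// True only in a CAM-only build; otherwise the configured value is kept.
constexpr bool kForceAdjustPs = force_adjust_ps(kCamDefined, kScreamDefined);
```

With both -DCAM and -DSCREAM on the compile line, as in the DYAMOND and DPSCREAM cases, `kForceAdjustPs` is false and the run keeps whatever adjust_ps value it was configured with.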
Not quite sure what to do -- DYAMOND and DPSCREAM v0 tests fail because they were accidentally switched to adjust_ps . hommexx test fails because it was switched from adjust_ps to adjust_dp3d. The latter is not an issue, but v0 tests probably need to be addressed. I won't be able to do this today, maybe @tcclevenger can address this? Separately, we need a follow-up PR that would print which option is used, in all cases -- in homme and hommexx output.
It's not a problem for E3SMv3 that SCREAMv0 answers change. Up to @PeterCaldwell I guess.
@tcclevenger please push your change to your branch and i will test now. |
@oksanaguba Done. Should be good to test now. |
re-merged to next after confirming that DY and DP tests now pass. |