-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update damping and time step parameters #2363
Update damping and time step parameters #2363
Conversation
This commit changes the damping parameters in default_vars.sh and global_control.nml.IN to reflect new optimal values that are going to be used in the global workflow.
There are some reliability issues occurring with testing on Hercules (conus13km_debug_2threads_intel was not generating files on a consistent basis). I will do more testing on Hercules and test on Hera to see if there are similar issues. I will also document changes to run times if significant. K_SPLIT variable in default_vars will need to be moved/cleaned up. |
Added quotes to fix SuperLinter issue.
could you also add corresponding changes related to NOAA-EMC/global-workflow#2751 |
@XiaqiongZhou-NOAA : There have been problems with the omega PR #2327 in other applications and it's being reverted. Take a look at PR #2362 and Issue #346. |
I assume PR#2362 will be merged soon and Issue#346 already has a fix. It has already a fix NOAA-GFDL/GFDL_atmos_cubed_sphere#349 |
@XiaqiongZhou-NOAA Yes, I can roll the omega calc update into this PR also. There are still a handful of tests that are failing with the new damping parameters that I need to look into. If I can't resolve it soon, then I'll just make a separate PR for the omega calc. |
Thanks for the correction @XiaqiongZhou-NOAA, I misunderstood the state of things. |
A quick update on this PR: The new parameters are causing multiple issues with the regression tests. I'm compiling a list of the affected runs and I'll update it later today. I've been troubleshooting and found that if I revert K_SPLIT to its original value, then the crashes are resolved. It also seems like some of the runs have also extended their runtime with the new parameter values (and using the default K_SPLIT) but I'm trying to replicate the issue just to verify. Either way, the omega calculation has been added to a separate PR (#2373) while this issue is addressed. I'll post another update in here with a Google Doc documenting the affected tests. |
New damping settings should only be applied to 127-layer global model RTs |
@yangfanglin The global tests that Dan is testing are running with L127 levels. |
Which RTs and on what platform the runs are failing ? Might be worthwhile running the global-workflow CI tests mentioned in this PR NOAA-EMC/global-workflow#2575 and crosschecking namelist settings. |
So these are the results I've compiled so far: Google Docs link K_SPLIT and DELTIM seem to be the two variables causing the most issues. The common error in the cpld cases is the following: I'm going through the other test logs to see what the issues are because some of failures are being caused by exceeding the allotted wall clock time. @yangfanglin: Yes, I agree that we should look at the input.nml files to make sure that they are identical before diving deeper into the errors. Do you happen to have a copy of the GW CI test input.nml available? These tests were all run on Hera but I'm seeing similar issues on Hercules. |
Adding parameters to default_vars.sh and changing two RTs for testing.
This commit adds a new export_ugwpv1 fucntion to default_vars that assigns the new paramter values to tests that are currently using the version 1 of ugwp.
I've added changes to default_vars.sh and am currently performing one last test that cleans up some of the variable definitions in the regression tests (there were repeated variables). I am still unable to have a stable run with the reduced value of k_split. I've generated a namelist from GW and tried to emulate it without any success. I would ask to see if there is a namelist that uses these updated values for a resolution that is not C48. For now, I've left the default k_split. From what I've read, k_split should only affect stability and isn't changing the core physics of the system. We should also discuss how many tests we want to move over to v1. There are multiple tests that are using v0, others are using new HR4 variables, etc. For the RTs, I'm thinking we should keep some older configurations as long as they are still supported by UFS. |
Clean up some variable duplicates in the GWPv1 RTs. Other RTs will also need to get cleaned up.
@yangfanglin Can your group help with this? Currently the configurations were updated in global workflow, but it failed in the UFS WM rt. |
Can you list all n_split, k_split, detlm and resolution? Generally we can not only increase K_split, have to consider detlm/(k_split*n_split)? |
Fanglin is in training. I believe @KevinViner-NOAA suggested this update. |
We can use a C96 run as an example. n_split is 4, k_split is 1, and deltim is 600. That should match what is in the GW config (https://github.com/NOAA-EMC/global-workflow/blob/develop/parm/config/gfs/config.ufs#L132) and is being set in the default_vars.sh (https://github.com/dpsarmie/ufs-weather-model/blob/fix/updateDampingParm/tests/default_vars.sh#L944). I was able to get the C48 runs converted over with the GW parameters and all other runs at C96 and finer are stable if k_split is set to its default value (ex. k_split = 2 for C96 runs). Thank you for any help or information you might be able to provide. |
If k_split=2 and n_split=4, deltim=600, this will decrease dynamic time step or acoustic time step (deltm/(k_split*n_split) too much. The new setting (k_split=1,n_split=4 and deltim=600) should be dynamically stable. The crash is probably due to other issues. It is not a good solution just decrease dynamic time step. |
Agreed. I'm also working on unifying the namelist parameters between the RTs and GW, so having two different settings wouldn't be optimal. |
I am also seeing this persistently on orion/hercules with cpld_restart_gfsv17_intel, cpld_restart_bmark_p8_intel, and cpld_restart_pdlib_p8_intel. |
It could be the bug in the generation of the rt_temp.conf where a test dependency isn't handled correctly. |
Ok I think I found the issue. I was able to fix the gfsv17 case by making the K_SPLIT/N_SPLIT consistent between the the control and the restart RT. I've pushed changes to the other RTs that were causing issues for you all. I'm currently testing the other cases and will let you all know once those pass. Still not 100% sure on why this is the case/fix but I'll worry about that later. |
@zach1221 @FernandoAndrade-NOAA Ok, all three restarts should be good to go. I made new baselines and checked them on Hera. Let me know if you run into any issues. |
@dpsarmie my cpld_mpi_pdlib_p8 intel tests are failing comparisons on Acorn and WCOSS2 |
I'm having this exact issue, failing to compare, on both hercules and orion. Can't say yet on Derecho as it's been super slow. |
It turns out this PR needs additional troubleshooting. @zach1221 @FernandoAndrade-NOAA @BrianCurtis-NOAA Can we schedule #2220 to commit today? |
I think @dpsarmie has one last adjustment to make, and #2220 is still in draft status, i don't know why you're even considering it at this point. |
I'm testing the changes that I just merged on Hercules to see if I get the same error. It should be a quick check and I'll report back once it's done. |
Ok, cpld_mpi_pdlib and the other associated files passed on Hercules. @zach1221 @BrianCurtis-NOAA
|
Ok @BrianCurtis-NOAA @jkbk2004 testing is complete. We're ready for review. |
Description:
This PR changes the damping and time step parameters to new updated values that were created for Global Workflow (NOAA-EMC/global-workflow#2575). These values should improve reliability and allow for longer timesteps to be used in the model. A new section was introduced in default_vars.sh because the parameter values are resolution dependent.
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Testing Log: