Add netcdf restart and history files using PIO (parallel netCDF) for dev/ufs-weather-model #1298

Closed
DeniseWorthen opened this issue Aug 25, 2024 · 7 comments · Fixed by ufs-community/ufs-weather-model#2445
Labels
enhancement New feature or request


@DeniseWorthen
Contributor

Is your feature request related to a problem? Please describe.

WW3 uses binary I/O for the initialization file, the history files and the restart file. The mesh cap branch includes a capability for run-time output of 'gridded mean fields' in netCDF, but that capability writes netCDF serially. Switching to PIO would allow history output to be written in parallel, with potential scalability benefits for large meshes.

The binary restart files from WW3 are difficult to debug. For large unstructured domains, writing restart files has also been found to be too slow for UFS. Implementing PIO/netCDF restarts would resolve both issues.

Describe the solution you'd like
Implement PIO+PnetCDF capability for both run-time history files as well as restart files for the dev/ufs-weather-model branch.

@DeniseWorthen DeniseWorthen added the enhancement New feature or request label Aug 25, 2024
@DeniseWorthen
Contributor Author

@mvertens @alperaltuntas I have a working branch for this capability which I've tested with both structured and unstructured meshes. I've obtained restart reproducibility as well as invariance to MPI decomposition for both types. I've added hooks for CESM and NorESM to configure PIO through the shr code. I need to do some final testing, but I should have something that you could test and modify for your use cases soon.

@DeniseWorthen DeniseWorthen changed the title Add netcdf restart and history files using PIO for parallel netCDF read/writes for dev/ufs-weather-model Add netcdf restart and history files using PIO (parallel netCDF) for dev/ufs-weather-model Aug 25, 2024
@mvertens
Collaborator

@DeniseWorthen - that sounds amazing! Thanks for this.

@DeniseWorthen
Contributor Author

@mvertens @alperaltuntas I have a clean feature branch for this work here. Please test when you get the chance.

https://github.com/DeniseWorthen/WW3/tree/feature/pio4ww3

I'm happy to have a tag-up to walk you through the changes, which are fairly substantial, primarily because I've backed out the changes we needed to make to w3wavemd and w3iorsmd for the mesh cap. Enabling netCDF restarts and history now occurs separately from any native WW3 writing or reading, so only minor modifications are needed relative to the develop branch.

In brief, there are two config parameters (use_restartnc and use_historync) which enable the PIO+netCDF capability. Both default to true for CESM. There is also a setting (restart_from_binary) which allows startup from a binary WW3 restart.
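A minimal sketch of how the three flags could combine, assuming (not confirmed in this thread) that restart_from_binary only affects the initial read and that later restarts are written in netCDF whenever use_restartnc is set. The dataclass and function names are hypothetical and do not correspond to code in the cap; the default values are illustrative, since the thread only states that the flags default to true for CESM.

```python
from dataclasses import dataclass


@dataclass
class WavIOConfig:
    # Flag names are from the thread; the defaults here are illustrative only.
    use_restartnc: bool = False
    use_historync: bool = False
    restart_from_binary: bool = False


def restart_read_format(cfg: WavIOConfig) -> str:
    """Format of the restart file read at startup (hypothetical logic)."""
    if cfg.use_restartnc and not cfg.restart_from_binary:
        return "netcdf"
    # Either netCDF restarts are off, or we start from a binary WW3 restart.
    return "binary"


def restart_write_format(cfg: WavIOConfig) -> str:
    """Format of restart files written during the run (hypothetical logic)."""
    return "netcdf" if cfg.use_restartnc else "binary"


def history_writer(cfg: WavIOConfig) -> str:
    """Which path writes run-time history output (hypothetical logic)."""
    return "pio-netcdf" if cfg.use_historync else "native-ww3"
```

Under these assumptions, a run with use_restartnc and restart_from_binary both set reads a binary restart once and writes netCDF restarts from then on.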

PIO is initialized in wav_pio_mod.F90. The new restart capability is in wav_restart_mod.F90. The netCDF history is now all in wav_history_mod.F90, which contains the old w3iogoncmd and the old wav_grdout routine.
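The invariance to MPI decomposition mentioned above follows from how PIO writes distributed data: each task describes which elements of the global array it owns (the list of 1-based global offsets passed to PIO_initdecomp), and the layout in the file depends only on those global indices. The sketch below models that idea in plain Python with 0-based indices; it does not call PIO, and all helper names are hypothetical.

```python
from typing import Dict, List

Decomp = Dict[int, List[int]]  # task rank -> global indices owned by that task


def block_decomp(nglobal: int, ntasks: int) -> Decomp:
    """Contiguous blocks of global indices, one block per task."""
    base, extra = divmod(nglobal, ntasks)
    decomp: Decomp = {}
    start = 0
    for task in range(ntasks):
        n = base + (1 if task < extra else 0)
        decomp[task] = list(range(start, start + n))
        start += n
    return decomp


def round_robin_decomp(nglobal: int, ntasks: int) -> Decomp:
    """Global indices dealt out cyclically across tasks."""
    return {task: list(range(task, nglobal, ntasks)) for task in range(ntasks)}


def scatter(field: List[float], decomp: Decomp) -> Dict[int, List[float]]:
    """Each task's local data, as it would hold it in memory."""
    return {task: [field[i] for i in idx] for task, idx in decomp.items()}


def write_global(local: Dict[int, List[float]], decomp: Decomp,
                 nglobal: int) -> List[float]:
    """Place every task's values at their global offsets, as a parallel write would."""
    out = [0.0] * nglobal
    for task, idx in decomp.items():
        for gidx, value in zip(idx, local[task]):
            out[gidx] = value
    return out


# Two very different decompositions produce the identical global field.
field = [float(i * i) for i in range(10)]
for d in (block_decomp(10, 3), round_robin_decomp(10, 4)):
    assert write_global(scatter(field, d), d, len(field)) == field
```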

@mvertens
Collaborator

@DeniseWorthen - this is a huge step forwards. I'd love a walk through. @alperaltuntas - do you want to join? I can do tonight or even Friday night my time.

@alperaltuntas
Collaborator

Thanks @DeniseWorthen, @mvertens. I can also join a call on Friday. I am available any time. In the meantime, I'll try testing the branch.

@DeniseWorthen
Contributor Author

DeniseWorthen commented Sep 11, 2024

I have one last issue to resolve before opening a PR for this work. When waves are in the slow loop, the model does not reproduce across a restart. All other tests pass in the UWM RTs.

The slow-loop waves fail to reproduce across a restart for both structured and unstructured meshes. I've also verified that with dev/ufs-weather-model, writing only mapsta and va to the binary restarts does reproduce across a restart in the slow loop. So I am not missing any needed fields in the netCDF restarts; I suspect a flag or some other setting is not being set correctly for the netCDF restarts.

@DeniseWorthen
Contributor Author

I misspoke in my earlier comment regarding slow-loop coupling. Using the current dev/ufs-weather-model branch, when waves are in the slow loop, ice is required in the restart file in addition to va and mapsta. No other fields are required, and the model passes all baselines.

Ice is required because WW3 places this field at the center of the coupling interval. So when CMEPS exports the ice fraction to WW3 (which is in fact the ice fraction averaged over the fast coupling loop), WW3 does not use that new ice fraction until half-way through its time-loop subcycling:

IF ( DTTST .LE. 0.5*DTI0 ) IDACT(13:13) = 'U'
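To make the timing concrete, here is a minimal sketch assuming that DTTST is the time remaining until the valid time of the newly received ice field, that the new field is stamped at the end of the coupling interval, and that DTI0 equals the coupling interval. These interpretations are assumptions, not taken from the WW3 source; the function name is hypothetical.

```python
def ice_activation_time(dt_couple: float, dt_sub: float) -> float:
    """Elapsed seconds into a coupling interval at which the new ice field
    would be flagged for update ('U'), under the assumptions stated above."""
    t_field = dt_couple  # assumed: new ice field valid at the end of the interval
    dti0 = dt_couple     # assumed: interval between successive ice fields
    t = 0.0
    while t <= dt_couple:
        dttst = t_field - t  # assumed meaning: time left until the field's valid time
        if dttst <= 0.5 * dti0:
            return t
        t += dt_sub
    raise ValueError("update never triggered within the interval")
```

With a 3600 s coupling interval and 600 s subcycles, the update is flagged at 1800 s. Under this reading, a run restarted at the start of an interval still needs the ice field that was in use during the first half of the interval, which is consistent with ice having to be carried in the restart.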

Whether this is what WW3 should be doing when fields are being provided from a prognostically coupled sea-ice (or ocean or atmosphere) component is another story. I don't believe WW3 should be managing field "updating" and "interpolation" between model advance times for our use case. For one thing, it requires two copies (the "previous" and "next" fields) of most global import fields in order to "update" the fields at certain intervals. None of this is required or even useful when fields are being provided through CMEPS from prognostically coupled components. At each model advance through the cap, the import fields are by definition "updated". Even worse, because these fields are required to be global on each DE, we are carrying two copies of identical global fields on each DE for no good reason.
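A rough estimate of what the duplicated global copies cost. The numbers in the example (one million mesh points, five import fields, 4-byte reals, 1000 DEs) are illustrative only and not from this thread; the function name is hypothetical.

```python
def duplicated_import_bytes(nglobal: int, nfields: int, nde: int,
                            bytes_per_value: int = 4) -> int:
    """Total bytes held across all DEs when every DE stores both the
    'previous' and 'next' copy of each global import field."""
    copies = 2  # "previous" and "next"
    return copies * nglobal * nfields * bytes_per_value * nde


# Illustrative: 40 MB per DE, 40 GB summed over 1000 DEs, half of it the
# second, redundant copy.
per_de = duplicated_import_bytes(1_000_000, 5, 1)
total = duplicated_import_bytes(1_000_000, 5, 1000)
```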

In any case, to get slow-loop restarts working with netCDF, I was forced to add ice to the netCDF restarts. This field is only added when required, as specified in ww3_shel.nml:

&output_type_nml
....  
type%restart%extra = 'ice'
/
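A minimal sketch of the resulting field selection: va and mapsta are always written, and anything listed in type%restart%extra is appended. It assumes, without confirmation from this thread, that the extra string may list several names separated by spaces or commas; the constant and function names are hypothetical.

```python
from typing import List

# Always present in the netCDF restart, per the discussion above.
REQUIRED_RESTART_FIELDS = ("va", "mapsta")


def restart_fields(extra: str) -> List[str]:
    """Fields to write given the type%restart%extra string (hypothetical parser)."""
    fields = list(REQUIRED_RESTART_FIELDS)
    for token in extra.replace(",", " ").split():
        name = token.strip("'\"").lower()
        if name and name not in fields:  # skip duplicates
            fields.append(name)
    return fields
```

For example, restart_fields("ice") yields va, mapsta and ice, while an empty string yields only va and mapsta.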
