-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ocean data regridding to Gaussian grid functionality #8
Add ocean data regridding to Gaussian grid functionality #8
Conversation
679e33a
to
7b7353f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @danielabdi-noaa for this, it looks great! I have a ton of comments, but mostly are just fairly small suggestions. I think the main things that we should come to agreement on are:
- Code structure: as mentioned in one of my comments, I think this should be a module within ufs2arco, but I'd like to hear what you think
- With that in mind, I was also thinking of the regridding as one step in the ufs2arco process. 1) pull the data 2) regrid 3) chunk and store to zarr. If that's the case, then I think we would want to generalize this a bit more, and I think we would move the regrid options to be in the same config file as everything else, and it would just be another section. I think that would make sense since the regridding involves opening a dataset, processing it, and storing a dataset, we may as well hook it up to everything else. What do you think?
- Assumptions about the data: Right now this specifically looks for dimension and variable names that are specific to MOM6. We will eventually want to figure out how to make this general to any of the model components, for instance so it could work with CICE6 data. We could also address that in the future, and just make it known that this is specific to MOM6 for now.
- xarray vs numpy: Right now this does all of the regridding with numpy arrays, since the
.values
accessor is used. I haven't used xesmf in a while, but I think we could do all of this in xarray. This would let us clean up some code regarding some of theis_3d
if statements, and we could trim down some of the coordinate variable creation. I'd be curious what you think about trying that out.
regrid/gaussian_grid.py
Outdated
|
||
|
||
@__single_arg_fast_cache | ||
def gaussian_latitudes(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really nice @danielabdi-noaa ! I'm curious, is this the code that is used to create the same grid that the FV3 is output on? I had imagined that we would just read the FV3 grid from somewhere and use that, but I think this is more elegant if they are the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize that FV3 already outputs on a Gaussian grid, and thought that the gaussian grid is something we would want to use for the AI project due to its desirable properties. Anyway, I checked the grid coordinates contained in the atmosphere files (bfg_/sfg_ files) and it seems to be exactly the same. I will add the option of reading the grid from the atmosphere files just to be safe and use that as the default target interpolation grid.
regrid/regrid.py
Outdated
from gaussian_grid import gaussian_latitudes | ||
|
||
|
||
class Regrid: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment on the structure, I was imagining that this regrid.py
module would go in the ufs2arco/
source code directory so that it could be imported with everything else. I do think that the regridding capability is general enough that it could go there, since we're working with grids that are regularly used by the UFS. Did you have something else in mind?
Regarding the class name, what do you think about calling this something like Tripolar2GaussianRegridder
or something? I'm picturing a future scenario where we may want to do a different regridding operation, so we may want others open like Gaussian2CubedSphereRegridder
or Gaussian2LatLonRegridder
... or something like that. I'm totally open to suggestions on the name, but I think something more specific would be helpful later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again regarding structure, I would then suggest moving the config-regrid.yaml
, regrid_env.yaml
into a directory specific to the data you're working with, something like examples/replay-mom6
.
Co-authored-by: Timothy Smith <smith.timothy.andrew@gmail.com>
Co-authored-by: Timothy Smith <smith.timothy.andrew@gmail.com>
86fab3c
to
f158f54
Compare
@timothyas All good points. Here is the current status
lats, lons = RegridMOM6.compute_gaussian_grid(180, 360)
rg = RegridMOM6(lats, lons, ds, config_filename = "config-replay.yaml")
ds = rg.regrid(ds) The regrid config is merged into the existing yaml file there, under its own section
ds=rg.regrid(ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @danielabdi-noaa! I think this looks great, this is the way I was thinking about it too. I agree with your choices to look for a weight file but create one if one doesn't exist, and I like that we have the Gaussian grid generation code here + the option to read a grid. I have a few more comments now:
- I don't understand the lines that make sure that there are more than 2 dimensions/coordinates, and that time is one of the variables, do you know why this is? Is this to avoid regridding coordinate (non-dimension) variables? I'm just thinking that it could be the case that we want to regrid a bunch of 2D variables at a single point in time.
- I'm now unable to run the docs because I don't have a grid rotation file. We could make this file available online, but I'd actually suggest that we make the rotation file an optional argument, and ignore vector fields if they are not present. I've made suggestions throughout the code to enable that, but let me know what you think.
- There are some issues with some of the xesmf dependencies that make it so we can't import xesmf without activating the conda environment (sometimes requiring deactivating it and reactivating it). Since I think many people will not necessarily want to do regridding, I suggest we make this import optional. What do you think? This would be one way to side step the windows build that is failing on our CI workflow, which I don't understand and don't really feel like debugging :)
- I added some small updates so that the documentation renders on readthedocs
- I have some small commits for the documentation. Would you mind if I committed these to your branch?
Co-authored-by: Timothy Smith <smith.timothy.andrew@gmail.com>
3cf3b55
to
5ca907d
Compare
5ca907d
to
d0df16a
Compare
@pjpegion the CICE6 output seems to always come with |
No, it may not be present. We should point to a seperate grid file. |
add regrid objects to API Reference documentation
Co-authored-by: Timothy Smith <smith.timothy.andrew@gmail.com>
Co-authored-by: Timothy Smith <smith.timothy.andrew@gmail.com>
…e storage capabilities
Regrid PR Suggestions
This looks great @danielabdi-noaa, thanks for all your work and for iterating back and forth with me. I don't understand why the windows build is still failing, I can check it out next week though. Otherwise I think this is good to go |
@timothyas It is failing because arguments passed to |
@danielabdi-noaa something I thought about relating to this PR is that it might be regridding more variables than we want. For instance if we only want temperature and salinity from our MOM6Dataset, this current code will still regrid everything it finds, then subset the variables. The thing is that right now we would have Regridder objects referencing items in the Dataset section of the yaml file, so I think it's probably cleanest to merge this PR, then implement that change. Do you agree? If so we should open an issue with that so we don't forget. |
Co-authored-by: Timothy Smith <smith.timothy.andrew@gmail.com>
Co-authored-by: Timothy Smith <smith.timothy.andrew@gmail.com>
@timothyas I have committed your suggestions. Lets see if it solves the CI issue! |
Hi @danielabdi-noaa, with that suggestion the CI is passing. Then after #5 was merged there was a minor conflict in ufs2arco/init.py which I hope you don't mind I just resolved (basically the importing of Timer and classes from .regrid were on the same line). So now I think this looks good, let me know if you think it's ready and @frolovsa it is ready for your eyes whenever you get the chance. |
@timothyas Looks good to me! |
I just realized this never got merged. Sorry @danielabdi-noaa ! I gave it another look and I think it's good to go. Merging now.. 🚀 |
This adds capability of re-gridding tripolar ocean dataset into a Gaussian grid.
@timothyas Let me know if this is how you want it or if you need a deeper integration with ufs2caro (same config file,
same input/output paths etc). I am assuming regrid comes first so currently it is reading from netcdf file one.
To use the class, please take the steps outlined the doc string of "Regrid" class.
I will leave as a draft