For mksurfdata_map: use maps with no source masking, applying mask separately #286

Closed
billsacks opened this issue Feb 13, 2018 · 12 comments · Fixed by #823
Labels: blocker (another issue/PR depends on this one), enhancement (new capability or improved behavior of existing capability)

Comments

@billsacks
Member

@swensosc suggested this on 2015-10-19, and it seems like a good idea to me: When mapping files from their raw data grid to CLM resolutions, we could use maps with no source masking, and then apply the mask in a separate step.

We currently have a LOT of mapping files from the mksurfdata_map raw data files to the CLM grids. Much of the reason we have so many is that we have a separate set of mapping files for each raw data mask - e.g., even if many of the raw data files are at the same 3' resolution, we need different mapping files for the different masks.

Sean pointed out that we should be able to use mapping files without masks, and then tweak the mapping algorithms to apply the source masks separately. I think we do things like that in other parts of CESM (e.g., in the coupler?). This would greatly reduce the number of mapping files we need to maintain. Furthermore, if a raw dataset is updated, and this update involves changing the mask, you wouldn't need to remake mapping files. (This was Sean's original motivation, as he is updating the lake dataset in this way.) Instead, mksurfdata_map would simply read the mask off of the (updated) raw data file.
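To make the "update the mask without remaking the mapping file" point concrete, here is a minimal sketch in Python. It is not code from mksurfdata_map: the function name `remap_with_mask`, the dense weight matrix, and all of the numbers are made up for illustration. The idea is only that one set of unmasked weights can be reused with whatever mask is read from the raw data file, as long as the weights are renormalized over the unmasked source cells.

```python
def remap_with_mask(weights, src_values, src_mask):
    """Area-average src_values onto destination cells, applying src_mask at map time.

    weights[i][j] is the unmasked overlap weight of source cell j in
    destination cell i (hypothetical dense layout, for illustration only).
    """
    dst = []
    for row in weights:
        # Zero out the weights of masked-out source cells.
        masked = [w * m for w, m in zip(row, src_mask)]
        norm = sum(masked)
        if norm == 0:
            dst.append(None)  # no valid source data overlaps this destination cell
        else:
            # Renormalize so the remaining weights sum to 1.
            dst.append(sum(w * v for w, v in zip(masked, src_values)) / norm)
    return dst


# Hypothetical unmasked weights: 2 destination cells, 4 source cells.
weights = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
lake_depth = [10.0, 30.0, 50.0, 70.0]

old_mask = [1, 1, 1, 0]
new_mask = [1, 0, 1, 1]  # an updated raw dataset changes the mask

old_result = remap_with_mask(weights, lake_depth, old_mask)
new_result = remap_with_mask(weights, lake_depth, new_mask)
```

The same `weights` are used for both calls; only the mask that accompanies the raw data changes.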

@billsacks added the enhancement label Feb 13, 2018
@ekluzek
Collaborator

ekluzek commented Feb 14, 2018

My understanding of how the ESMF regridding works is that this isn't something you can do, but I could be wrong. The mapping files have the masks inherently embedded in them, and I don't know of an easy way to extract them. You could generate the maps without a mask, but then when you run the mapping you would be averaging in data from outside the mask. That's what I think we want to ensure doesn't happen.

I don't really think the burden is that high for carrying around files at the same resolution but with multiple masks. Right now there are three half-degree, five 3x3 minute, and two 10x10 minute grids. So you'd have some speedup with this, but the 1km-merge-10min_HYDRO1K-merge-nomask grid is still the one that far and away takes the most time.
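As a rough back-of-the-envelope check on those counts, the sketch below tallies mapping files with and without per-mask source grids. The source grid counts come from the comment above; the number of destination grids (`n_dst_grids`) is a made-up placeholder, and the sketch assumes every source grid is mapped to every destination grid, which may not hold in practice.

```python
# Masked source grids per resolution, as counted in the comment above.
masked_src_grids = {"0.5 degree": 3, "3x3 minute": 5, "10x10 minute": 2}

n_dst_grids = 20  # hypothetical number of CLM destination grids

# One mapping file per (source grid, destination grid) pair.
files_with_masks = sum(masked_src_grids.values()) * n_dst_grids

# With unmasked maps, each resolution needs only one source grid.
files_without_masks = len(masked_src_grids) * n_dst_grids

print(files_with_masks, files_without_masks)
```

Whatever the real number of destination grids is, under these assumptions the ratio for these three resolutions is 3 source grids instead of 10.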

@billsacks
Member Author

@ekluzek Unless I'm overlooking something: You can have masks embedded in the grid files when creating the mapping files, but you don't have to. If you don't, then you need to do a bit more work in the mapping routine, but we actually already have code in place to do this: gridmap_areaave_srcmask in mksurfdata_map. This is used when the source mask isn't known ahead of time.
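A minimal Python sketch of that "bit more work" follows. It is not a transcription of the Fortran `gridmap_areaave_srcmask` routine; it only illustrates the general technique of applying a source mask at mapping time and renormalizing the weights. The function names, the `(row, col, weight)` triplet layout, the `nodata` fill value, and the numbers are all assumptions for illustration. The naive version is included to show the contamination concern raised above.

```python
def areaave_naive(n_dst, triplets, src):
    """Apply unmasked weights as-is; every overlapping source cell contributes."""
    dst = [0.0] * n_dst
    for row, col, wt in triplets:
        dst[row] += wt * src[col]
    return dst


def areaave_srcmask(n_dst, triplets, src, mask_src, nodata=-9999.0):
    """Apply unmasked weights, but let only unmasked source cells contribute."""
    # First pass: total masked weight reaching each destination cell.
    wtnorm = [0.0] * n_dst
    for row, col, wt in triplets:
        wtnorm[row] += wt * mask_src[col]

    # Second pass: accumulate the renormalized, masked contributions.
    dst = [0.0] * n_dst
    for row, col, wt in triplets:
        if wtnorm[row] > 0.0:
            dst[row] += wt * mask_src[col] * src[col] / wtnorm[row]

    # Destination cells with no valid source data get the fill value.
    return [d if wtnorm[i] > 0.0 else nodata for i, d in enumerate(dst)]


# Hypothetical sparse weights: (destination index, source index, weight).
triplets = [(0, 0, 0.25), (0, 1, 0.75), (1, 2, 1.0)]
src = [4.0, 1000.0, 8.0]
mask_src = [1, 0, 0]  # only source cell 0 holds valid data

naive = areaave_naive(2, triplets, src)
masked = areaave_srcmask(2, triplets, src, mask_src)
```

In this toy case the naive result for destination cell 0 is dominated by the masked-out value 1000.0, while the masked version returns the one valid source value and flags destination cell 1 as having no data.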

You may be right that the burden isn't that high for the different grid files, but the burden is higher for the combinatoric mapping files. I know we've talked about moving away from storing all of them eventually, though.

In the end, I don't have strong feelings about whether this should be done. I think it's a good idea, but I'm not sure if it gains us enough to be worth the development time.

@mvertens

mvertens commented Feb 15, 2018 via email

@billsacks
Member Author

@mvertens I don't disagree. However, I'll point out that implementing this suggestion could help at least as much if we're generating mapping files on the fly, because we'd only need to generate, say, 1/2 or 1/3 as many mapping files.

@ekluzek
Collaborator

ekluzek commented Feb 15, 2018

@mvertens as @billsacks says, yes, if that line of development (to create mapping files for mksurfdata_map on the fly) is taken up again, this change should happen along with it. It'll both shorten the time to make the mapping files and, as @billsacks pointed out, minimize how many are required. If we go to a paradigm of creating them on the fly, you want to create as few as possible as fast as possible. The time to create them needs to be sufficiently short, though, to make that the standard mechanism.

@slevis-lmwg
Contributor

slevis-lmwg commented Apr 11, 2019

Tasks that I see mentioned above...

Sean's suggestion:

  • Separate the masks from the mapping files to reduce the number of mapping files needed
  • Keep the masks in new mask files and use when needed

Mariana's suggestion:

@slevis-lmwg
Contributor

Corrected previous post to say #644

@billsacks
Member Author

@slevisconsulting - yes, Mariana's suggestion is in the scope of #644, so this issue relates to Sean's suggestions, which we thought were a good idea for the sake of both dataset management and efficiency. Note that this will require changes to mksurfdata_map as well as to the scripts and xml files related to our dataset management.

@billsacks
Member Author

billsacks commented Jun 12, 2019

Once this issue is resolved, we can more easily resolve #8.

Blocks #8.

@billsacks added the blocker label Jun 14, 2019
mariuslam pushed a commit to NordicESMhub/ctsm that referenced this issue Aug 26, 2019
@slevis-lmwg
Contributor

An update:
To avoid unnecessary work in #815, I have switched my attention to the present issue (#286).

qsub regridbatch.sh is running right now with changes in:

  • mkmapdata.sh
  • namelist_defaults_ctsm.xml
  • namelist_defaults_ctsm_tools.xml
  • namelist_definition_ctsm.xml

These changes replace the numerous SRC files containing various masks with one SRC file per SRC resolution that contains grid_imask = 1.

@slevis-lmwg
Contributor

For now, regridbatch.sh seems to be working for all the source (SRC) and destination (DST) resolutions except 1km-merge-10min_HYDRO1K-merge-nomask.

@slevis-lmwg
Contributor

I resubmitted with a couple of changes, and it seems to work for the 1km-merge-10min_HYDRO1K-merge-nomask SRC resolution.

I will open a PR soon to share my code mods to date.
