Change mksurfdata_map/mksurfdata_esmf Makefile to build single-point datasets using subset_data #1674
Comments
A question for this is what global surface dataset these should start from? Both which resolution and also should it use the surface dataset just created by mksurfdata_map or use the current one in the XML file? To do the former the dependency to the global dataset would need to be added to the Makefile. The latter would be independent of other mksurfdata_map files, but also wouldn't be using the most updated dataset. |
@ekluzek - if the move to mksurfdata_esmf is coming soon, why are we still putting changes in mksurfdata_map? |
@mvertens this is something that applies to both. It's really completely independent of the mksurfdata_map/mksurfdata_esmf code. This is something that could either go onto the ctsm5.2 branch or onto master. And I'm actually not sure right now which one it should go in. So I called it mksurfdata_map so that it would apply to either one. |
I'd like to understand something here: With mksurfdata_esmf, will it still be possible to make a single-point surface dataset without overrides? For example, could it be used to directly create the numaIA surface dataset, which I believe doesn't do any overrides? The reason I ask is: if this is still possible, then it seems like the most straightforward thing to do could be to create a single point dataset using mksurfdata_map pointing to a single-point mesh file for the output, then using modify_fsurdat to do the appropriate modifications. That sidesteps the issue of needing to choose some arbitrary global resolution to then subset for these out-of-the-box single-point datasets. Or is the capability to directly create a single point surface dataset going away? |
@billsacks - it is really inefficient to create a single point dataset with mksurfdata_esmf. As we already discovered yesterday, getting the mappings for very low resolution output grids can be very costly. Sam and I already discussed this and we feel the most straightforward way is to create global datasets and then use the subset capability to extract a single point. I'm happy to discuss this offline if that would be helpful. The bottom line is that this capability is not going away - but would be expensive to use. |
If you have a mesh file for your single point site you'll be able to use the new mksurfdata_esmf to create a single point dataset. So you could do that for numaIA for example (and 1x1 brazil). I don't see that going away -- just the ability to override after you've done that inside of mksurfdata. But, for most of the single point sites we wanted to eliminate having to create a mesh file for them. So the standard procedure we are now recommending is to use subset_data to create single point surface datasets. One of the goals for single point sites is to have a mechanism that's fairly standard. So it works for NEON and plumber sites, and isn't that much different for a user defined tower site as well. |
@billsacks I do appreciate your comment about the roles of fsurdat_modifier vs. subset_data, it's a valid question. That is what I hope to work out in our subgroup meeting and have a recommendation of how everything relates to each other. Let me know if you would like to be added to that subgroup meeting. I think we'll still have some discussion of the recommendations in CTSM software, but that's the heart of that subgroup discussion. |
@ekluzek - it is very costly to create the route handle from a 1 km high resolution data set (such as for elevation) to a very low resolution dataset (such as 10x15). That is because you cannot scale out the output mesh to many processors and the input mesh has millions of points. It is very fast on the other hand to create a very high resolution surface data set (I can generate a 7.5km surface dataset in under 7 minutes) and use that high resolution surface dataset to extract single points. I see that as a much better way to move forwards. Again - I am happy to meet to talk about this. |
@ekluzek - I would like to be a member of any subgroup that is formed to discuss these issues. |
OK, I just talked to @mvertens and as she is having trouble with 10x15, we think single point will be even worse. So we should add some logic to say "don't use mksurfdata_esmf for low grid count grids -- use subset data". This is already something we had decided in going away from PTCLM and moving towards subset_data. So you would never use mksurfdata_esmf for a single point site, you'd always use subset_data. This will mean some of our sites will have a slight change in answers, so we might want to bring this in before ctsm5.2 comes in actually. |
Okay, sounds fine. This feels weird that we can now make a high-resolution dataset with no problem but can't make a coarse-resolution or single-point dataset – naively, it feels like it should be possible to use a different decomposition / parallelization strategy that would enable parallelization across the source domain (instead of the destination domain) for the generation of the mesh & route handle in these cases to provide good performance – but I can see how this isn't a use case that is worth optimizing for. |
I'm fine with the decision to use subset data for regional and single point cases; I thought that's why we were making this tool. I agree with Bill that it's odd we can't make a coarse-resolution grid easily. Is that just because creating the mapping file is too memory intensive? @ekluzek, regarding your suggestion about answer-changing single point runs, and "we might want to bring this in before ctsm5.2 comes in actually": how critical is this, especially if we're about to upend a bunch of the underlying datasets used for surface data? Are we wanting to maintain backwards compatibility for PLUMBER2 simulations? Is it critical to understand how modifications to our mksurfdata workflow are changing answers at sites where Gordon and @olyson are regularly running single point cases? I kind of assume these will be larger changes than the particulars |
One more thing I'd like to weigh in on here is that I don't think it's critical to carry around high-resolution surface datasets for the purposes of single point simulations. The current workflow of using subset data on a 1 degree surface dataset and overwriting with site specific information if necessary seems fine. That said, subset_data should also work on those higher resolution (7.5 km) datasets, but I don't think it needs to be a standard or default way we use this tool. |
@wwieder - the current culprit for the coarse dataset creation (the only real problem is 10x15 - nothing else) - is trying to generate the route handle (i.e. online mapping file) to map a 1km dataset to a grid that only has 400 points. I believe the issue is that there are simply not enough degrees of freedom for a 10x15 to scale this out. I'm reaching out to Bob Oehmke today to verify my assumption. My assumption is that you can't scale things out if you don't have a big enough target grid. |
@wwieder yes, let's talk about this more at our next CTSM software meeting. Especially the bit about changing surface datasets. For that I'm just talking about changing our testing datasets: 1x1_brazil, 5x5_amazon, 1x1_numaIA, 1x1_vancouverCAN, 1x1_mexicocityMEX, and 1x1_urbanc_alpha. One of the reasons I want to do that now is just to show that we can get this to work before we do the big change of all datasets, where all surface datasets will change. I want to separate any possible problems with this part of the change from the general change in all surface datasets. Otherwise, the general ctsm5.2 change might hide problems in moving from mksurfdata to subset_data. |
We talked about this issue in our CTSM software meeting. Notes from that discussion (also on the wiki) are:
So I'm going to move forward with using subset data to create the single point datasets as the standard for them. |
For this task to be complete #1673 needs to be dealt with first. I can still make progress on it though, and when the other is finished this can be finalized. |
I have this working for mexicocity and the kind of differences I see are as follows:
Almost none of the above differences in the surface dataset will matter, with 100% urban coverage. I thought perhaps some soil related things might matter for pervious road: zbedrock, FMAX, ORGANIC? |
Yes, differences in zbedrock, FMAX, and ORGANIC will matter for pervious road. |
@olyson, OK good to know. Then in your opinion is it OK to use the 1-degree grid cell averages for these? Or should we adjust them to the site? We could use the values used previously, which came out of mksurfdata for a site smaller than a 1-degree grid cell and so would be "more" accurate. But unless we have local site data it's still not going to be that accurate. If you have a source of these data for the sites, we could use that. So which sounds like the way to go to you? |
Let's use the easiest (1-deg grid-cell values). I don't have site-specific data for those variables and it's probably not worth introducing additional complexity into the process to get the values from the previous file. mexicocity is mostly used in the test suite and by myself occasionally to assess differences due to model changes. |
The file is here: /glade/work/erik/ctsm_worktrees/answer_changes/tools/mksurfdata_map
lsmlat and lsmlon are different because they are integer indices (both 1) in the original and floats for the actual latitude/longitude value in the new one. LATIXY and LONGXY are identical, which is the important thing (that and the lat and lon variables). lsmlat and lsmlon aren't actually used. |
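To illustrate the kind of comparison described above, here is a minimal sketch that lists which variables differ between an old and a new single-point dataset while skipping the unused lsmlat and lsmlon. It uses plain dicts as stand-ins for the two files; the function name and all values are hypothetical and are not taken from the actual datasets.

```python
# Sketch only: plain dicts stand in for the old and new surface datasets.
# Variable values below are hypothetical placeholders, not real site data.

IGNORED = frozenset({"lsmlat", "lsmlon"})  # reported above as not actually used


def differing_variables(old, new, ignore=IGNORED):
    """Return the sorted names of shared variables whose values differ."""
    shared = (set(old) & set(new)) - set(ignore)
    return sorted(name for name in shared if old[name] != new[name])


old_ds = {"lsmlat": 1, "lsmlon": 1, "LATIXY": 12.5, "LONGXY": 250.5, "FMAX": 0.4}
new_ds = {"lsmlat": 12.5, "lsmlon": 250.5, "LATIXY": 12.5, "LONGXY": 250.5, "FMAX": 0.0}

changed = differing_variables(old_ds, new_ds)
print(changed)
```

Here lsmlat and lsmlon differ in type and value but are excluded, LATIXY and LONGXY match, and only FMAX is reported.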
Got it, thanks. I see that FMAX is exactly zero in the new dataset. Is that perhaps a new feature of single-point datasets (I seem to recall Sean arguing for something like that at some point) or just a coincidence? |
@olyson this is because I ran subset_data with "--cap-saturation" (which sets FMAX==0). That is how I set up all the single-point datasets. It is something I could remove though. I also used "--uniform-snowpack" for all the single point datasets (sets STD_ELEV==20). I think this is the way we should do things though, so I'll leave it like that. |
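As a rough illustration of what those two options do to a dataset, the sketch below applies the same overrides to a plain dict standing in for a single-point surface dataset. This is not the subset_data implementation; the function name and the input values are hypothetical, and only the FMAX and STD_ELEV effects come from the comment above.

```python
# Sketch only: mimics the described effect of the two subset_data options
# on a dict standing in for a single-point surface dataset.


def apply_single_point_options(surf, cap_saturation=False, uniform_snowpack=False):
    """Return a copy of `surf` with the requested single-point overrides."""
    out = dict(surf)
    if cap_saturation:
        out["FMAX"] = 0.0  # --cap-saturation is described as setting FMAX to 0
    if uniform_snowpack:
        out["STD_ELEV"] = 20.0  # --uniform-snowpack is described as setting STD_ELEV to 20
    return out


# Hypothetical 1-degree grid-cell values, for illustration only.
subset = {"FMAX": 0.37, "STD_ELEV": 112.0, "zbedrock": 4.2}
site = apply_single_point_options(subset, cap_saturation=True, uniform_snowpack=True)
print(site)
```

Fields that neither option touches (zbedrock here) keep their grid-cell values, which is why they can still differ from the older mksurfdata-generated files.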
Sounds good, thanks. |
The answer changing part of this is that the non-urban single point sites (smallvilleIA, numaIA, brazil) change from half degree grid-cells from mksurfdata to 1-degree for the first two, and then 2-degree to 1-degree for brazil. I think this is OK though, the sites are still similar in their characteristics, and don't drastically change from the original ones. The brazil site can't just use the f19 fsurdat file to get the same results either, as the previous case was close to the f19 gridcell, but rounded off and made to be an exact 2 degree by 2 degree grid cell. |
The answer changing part as mentioned above is that we are using --cap-saturation and --uniform-snowpack for the single point sites. |
We just discussed this in the standup, but we figure the urban datasets should use the 78pft version rather than the 16pft version because it won't matter for the urban datasets and we want to move to always using the 78pft versions rather than having to have both. |
To make sure things are working as expected, I copied the earlier values of the following variables to the new dataset (for vancouverCAN) and showed that I get identical answers to ctsm5.1.dev115: FMAX, STD_ELEV, zbedrock, ORGANIC, and SLOPE. This means things are working as we think they are, which is good to know. For urban there are other fields that are different, but they don't matter for a 100% urban case. |
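The check described above amounts to overwriting a handful of fields in the new dataset with their earlier values. A minimal sketch of that step follows, again using dicts as stand-ins for the two files; the helper name and every value are hypothetical, and the real check was presumably done on the NetCDF files themselves.

```python
# Sketch only: copy selected variables from the earlier dataset into the new one.
# Dicts stand in for the two surface datasets; all values are hypothetical.

VARS_TO_RESTORE = ("FMAX", "STD_ELEV", "zbedrock", "ORGANIC", "SLOPE")


def restore_variables(new, old, names=VARS_TO_RESTORE):
    """Return a copy of `new` with `names` overwritten by the values in `old`."""
    missing = [name for name in names if name not in old]
    if missing:
        raise KeyError(f"earlier dataset lacks: {missing}")
    out = dict(new)
    for name in names:
        out[name] = old[name]
    return out


earlier = {"FMAX": 0.41, "STD_ELEV": 35.0, "zbedrock": 3.0,
           "ORGANIC": [10.0, 8.0], "SLOPE": 0.5, "PCT_URBAN": 100.0}
subset = {"FMAX": 0.0, "STD_ELEV": 20.0, "zbedrock": 4.2,
          "ORGANIC": [12.0, 9.0], "SLOPE": 0.8, "PCT_URBAN": 100.0}

patched = restore_variables(subset, earlier)
print(patched)
```

If a run with the patched dataset reproduces the earlier answers, the remaining differences between the files are in fields that do not affect that case.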
Currently mksurfdata.pl is used to create the single point surface datasets. We are moving that capability over to the new subset_data tool. So we need to change the Makefile in the mksurfdata_map/mksurfdata_esmf tool directory to use subset_data to build the single point datasets.
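As a sketch of what each Makefile target would end up invoking, the snippet below assembles a subset_data command line for one single-point site. Only "--cap-saturation" and "--uniform-snowpack" are taken from this thread; the tool path, the "point" subcommand, and the other arguments are assumptions about the subset_data interface that should be checked against its help output. The site name and coordinates are placeholders, not real site values.

```python
# Sketch only: build (but do not run) a subset_data command for one site.
# Arguments other than --cap-saturation and --uniform-snowpack are assumed.
import shlex


def subset_data_point_command(site, lat, lon, outdir=".", tool="./subset_data"):
    """Return the argument list for creating one single-point surface dataset."""
    return [
        tool, "point",
        "--lat", str(lat),
        "--lon", str(lon),
        "--site", site,
        "--create-surface",
        "--cap-saturation",     # mentioned in this thread: sets FMAX to 0
        "--uniform-snowpack",   # mentioned in this thread: sets STD_ELEV to 20
        "--outdir", outdir,
    ]


# Placeholder site name and coordinates, for illustration only.
cmd = subset_data_point_command("1x1_examplesite", lat=10.0, lon=250.0)
print(shlex.join(cmd))
```

Keeping the shared options in one place mirrors how a Makefile variable could hold the common flags, with each site target adding only its name and location.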
Relates to:
#1664
Blockers for this are: #1665 #1673