Single point simulations #1198

Closed
4 of 5 tasks
wwieder opened this issue Oct 29, 2020 · 20 comments
Labels: enhancement (new capability or improved behavior of existing capability); science (enhancement to or bug impacting science)

Comments

@wwieder
Contributor

wwieder commented Oct 29, 2020

We are discussing deprecating PTCLM, as the toolchain is being remade. It would be replaced by the faster Python scripts from @swensosc (currently in tools/contrib) that pull the atmospheric forcing, surface dataset, and domain file, and that can be overwritten or modified as needed.

Are there features / functions that we'll lose by doing this? Specifically, @olyson how do you set up single point urban simulations?
If / when we go down this route we'll need to:

This can be addressed by the SE supported by NCAR-NEON funding

@swensosc
Contributor

swensosc commented Oct 29, 2020 via email

@olyson
Contributor

olyson commented Oct 29, 2020

For urban sites, I generally just start with an existing surface dataset/domain file and edit using NCL.

@billsacks
Member

> my understanding was that the reason ptclm takes as long as it does is due to its use of mapping files. Another way forward would be to write a script that would go to the raw data files and just grab the values at the closest point. This would be a little work, but it would preclude the need to point to an existing surface data file. It would also represent our 'best guess' for the site, rather than getting values that have been averaged over a much larger area.

That's a good point, @swensosc . @ekluzek @slevisconsulting @negin513 I wonder if it's worth considering an option to the new toolchain that just does a nearest neighbor mapping rather than a conservative mapping. If it's easy to do, we could make this the default for single-point runs, but I'm starting to wonder if it would be a useful option to have in general. My impression is that it should be easy to implement this, since I think it would just mean changing one argument to the esmf regridding routine. I guess, though, the main value in doing this would be to speed the process, so it's only worth doing if it actually does speed the process significantly. (I'm suggesting this instead of a custom script just to keep the process consistent and to make it easier for us to implement that option, under the assumption that ESMF's nearest neighbor mapping would be about as efficient as any custom script we wrote.)

I don't think this would work for the topography standard deviation which is computed from the 1 km dataset, though. There may be other cases that don't work as intended in mksurfdata_map if using nearest neighbor, but off-hand I can't think of others.
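
A minimal sketch of the nearest-neighbor idea discussed above, in plain Python and under stated assumptions: the grid is a regular lat/lon grid described by 1-D cell-centre coordinates, and the function name `nearest_gridcell` is hypothetical. This is not the ESMF regridding routine or mksurfdata_map code; it only illustrates picking the single closest raw-data cell instead of averaging over a larger area.

```python
import math


def nearest_gridcell(lats, lons, site_lat, site_lon):
    """Return (i, j): indices into lats and lons of the cell centre closest to the site.

    lats, lons: 1-D sequences of cell-centre coordinates in degrees
    (assumption for this sketch: a regular lat/lon grid).
    """

    def central_angle(lat1, lon1, lat2, lon2):
        # Haversine formula. The squared sine makes the result periodic in
        # longitude, so 0..360 and -180..180 conventions can be mixed.
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dlat = p2 - p1
        dlon = math.radians(lon2 - lon1)
        a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
        return 2 * math.asin(min(1.0, math.sqrt(a)))

    # Brute-force search over all cell centres; fine for a sketch.
    return min(
        ((i, j) for i in range(len(lats)) for j in range(len(lons))),
        key=lambda ij: central_angle(lats[ij[0]], lons[ij[1]], site_lat, site_lon),
    )


if __name__ == "__main__":
    # Illustrative coarse grid and site coordinates (not a real dataset).
    print(nearest_gridcell([-45.0, 0.0, 45.0], [0.0, 90.0, 180.0, 270.0], 44.9, -110.5))
```

As noted in the comment above, a single nearest cell cannot supply quantities such as the topography standard deviation, which are computed from many 1 km cells, so a real tool would need a separate rule for those fields.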

@ekluzek
Collaborator

ekluzek commented Oct 29, 2020

@billsacks yes I like this idea better. @swensosc creating a script to handle all dozen of the input files to mksurfdata sounds a bit problematic. As @billsacks points out it would need to be synchronized with the mksurfdata_map code as well. If we thought we wanted to replace the mksurfdata_map code with a script (which isn't a bad thing to consider) that would seem better. In the long run having a script replace mksurfdata_map would be a good goal to make it easier to work with -- but that would be a big project.

But, I really like the idea of building into the planned toolchain the ability to get nearest neighbor mapping. That should be tons faster. And yes the standard dev of the 1km datasets is the only thing I can think of that wouldn't work either. But, you really wouldn't get a good value for that for a single point anyway. Maybe that should just be assigned an arbitrary value for single point sites? @swensosc what would be a good way to set this value for a tower site?

I know soil color is handled in a strange way, but I think the nearest neighbor method should work for it.

@ekluzek
Collaborator

ekluzek commented Oct 29, 2020

@olyson we still have mksurfdata create datasets for MexicoCity, Vancouver-CAN, Camden-NJ and urbanc_alpha, and the model is set up to run these out of the box. Is there still a need to be able to do this? Or can these special cases be removed? We create these with each update of the surface datasets. If we didn't need to do that, we could remove some complexity.

We also have the capacity to create a special urban test case called asphalt-jungle (with no permeable road). I don't think we run that test anymore, so we could likely remove that.

@olyson
Contributor

olyson commented Oct 30, 2020

I think it could be useful to keep at least one of these as part of the test suite? I think you could remove all but Camden-NJ, as I've found that to be useful very recently.

@swensosc
Contributor

swensosc commented Oct 30, 2020 via email

@dlawrenncar
Contributor

dlawrenncar commented Oct 30, 2020 via email

@swensosc
Contributor

swensosc commented Oct 30, 2020 via email

@ekluzek
Collaborator

ekluzek commented Feb 19, 2021

@danicalombardozzi, @wwieder, @negin513, @jedwards4b, and I met today and talked about this in the context of the work with NEON. I proposed that NEON sites, other supported sites, and unsupported sites all be set up to use user-mod directories to get the settings right. The more highly supported sites (like NEON) would have fewer things in the user-mod directory, with most settings encoded in XML. For an unsupported tower site, the surface dataset would be set in the user_nl_clm file.

So for NEON you'd do this to run the specific Yellowstone NEON tower site...

./create_newcase --compset I1PtClm50Bgc --res CLM_USRDAT --user-mods-dir neon/YELL

For a supported PLUMBER2 site you'd have something similar, with maybe plumber2/HARV as its user-mod directory. The difference from the above is that there would likely be more settings in the user-mod directory itself rather than encapsulated in the XML files.

When a user runs the singlept script to create data for an unsupported site it would be something like this...

./create_newcase --compset I1PtClm50Bgc --res CLM_USRDAT --user-mods-dir mydirectoryIcreated

Here the singlept script outputs a user-mod directory that the user can point to when running their case.

Using a user-mod directory in each case makes the workflow more similar across all of the options. The compset and resolution above already exist, so we are just taking advantage of something that's already there.
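
To make the unsupported-site case concrete, here is a minimal sketch of how a script might write such a user-mod directory. It is not the actual singlept script: the function name `write_user_mods` and all paths are hypothetical, and the only setting written is `fsurdat`, the CLM namelist variable that points at the surface dataset.

```python
from pathlib import Path


def write_user_mods(user_mods_dir, fsurdat_path):
    """Create a user-mod directory whose user_nl_clm points at a site surface dataset.

    Both arguments are illustrative; a real tool would likely add more settings.
    """
    out = Path(user_mods_dir)
    out.mkdir(parents=True, exist_ok=True)
    user_nl_clm = out / "user_nl_clm"
    # Namelist syntax: variable = 'string value'
    user_nl_clm.write_text(f"fsurdat = '{fsurdat_path}'\n")
    return user_nl_clm


if __name__ == "__main__":
    # Hypothetical directory name and dataset path.
    path = write_user_mods("mydirectoryIcreated", "/path/to/surfdata_mysite.nc")
    print(path.read_text(), end="")
```

The resulting directory could then be passed to create_newcase with --user-mods-dir, as in the command above.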

@slevis-lmwg
Contributor

In the context of the NEON work discussed above, @negin513 mentioned to me that the question was posed whether to generate surface datasets for such site simulations on the fly each time one of these simulations is performed. Assuming I understood the issue correctly, my recommendation would be to generate the surface datasets only once to avoid the risk of slightly different copies of the same surface dataset inadvertently creeping into the project. Generating the surface datasets once will contribute to consistency in the performed simulations.
(If I misunderstood the issue, pls disregard my comment.)

@danicalombardozzi
Contributor

@slevisconsulting: Thanks for your input! We settled on making the surface datasets once rather than on the fly.

@ekluzek
Collaborator

ekluzek commented Mar 4, 2021

@jedwards4b has come up with a great idea to simplify things by eliminating domain files: overload PTS_MODE in cime so that the latitude and longitude come from the XML variables PTS_LAT and PTS_LON. Domain files can be helpful for regional grids, where they give the size, area, and mask of each gridcell. But for single point sites, none of those things are really meaningful.

The cime PR for this is here...

ESMCI/cime#3868

@ekluzek
Collaborator

ekluzek commented Mar 19, 2021

PR #1309 brings in the changes so that domain files are not needed when using the NUOPC coupler

@wwieder
Contributor Author

wwieder commented Apr 22, 2021

I'm also going to assign @glemieux and @rgknox to this issue, as bringing in FATES functionality for single point cases would also be nice with this effort (but also maybe warrant a specific issue to track).

@rgknox
Collaborator

rgknox commented May 5, 2021

Working through some tests for FATES at one of our NGEE-Tropics sites, BCI, Panama.

In my typical workflow, when running a RES=CLM_USRDAT, I also use:

./xmlchange DATM_MODE=CLM1PT

This generates a file: run/datm.streams.txt.CLM1PT.CLM_USRDAT

My build process doesn't seem to complain about any of my xml settings, but it can't find the stream file. Is this deprecated, or is there a new way to import site level met drivers?

@ekluzek
Collaborator

ekluzek commented May 5, 2021

@rgknox there's a bug in the latest cime that we have in the latest CTSM that causes this to not work. So consider this non functional until I get it to work. The cime issue is here...

ESMCI/cime#3905

@ekluzek
Collaborator

ekluzek commented May 5, 2021

@rgknox a workaround I found that seems to work for me for an individual case is to set the ATM_GRID and LND_GRID in the case by hand to CLM_USRDAT with xmlchange. Try that and see if you can get your case to work. Be sure to let me know if it does.

@rgknox
Collaborator

rgknox commented May 6, 2021

Thanks @ekluzek

The model is still trying to find an unset fatmlndfrc file:

Model ctsm missing file fatmlndfrc = '/raid1/lbleco/cesm/cesm_input_datasets//share/domains/UNSET'

@ekluzek added the "next" label (this should get some attention in the next week or two, normally at each Thursday SE meeting) on Aug 26, 2021
@ekluzek
Collaborator

ekluzek commented Aug 26, 2021

@rgknox you had a comment above about fatmlndfrc, which points out something that looks screwy, but works. I've now seen confusion about this, though, and agree that we should address it. There are also some other cases where UNSET is used in cdeps and cmeps that we should change as well.

datm:

model_maskfile/model_meshfile is UNSET

nuopc.runconfig: single_column_lnd_domainfile = UNSET

env_run.xml:

*_DOMAIN_FILE/MESH are UNSET which might be OK
MASK_MESH is UNSET which might be OK
PROXY,MPI_RUN_COMMAND,DATM_CPLHIST_* are UNSET which I think is OK
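
An audit like the list above could be sketched with a short script. This is an illustration only: it assumes the case XML stores each variable as an `<entry id="..." value="...">` element, the function name `find_unset` is hypothetical, and the sample XML is made up rather than taken from a real case.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed <entry id=... value=...> layout.
SAMPLE = """<file>
  <group id="run_domain">
    <entry id="ATM_DOMAIN_FILE" value="UNSET"/>
    <entry id="MASK_MESH" value="UNSET"/>
    <entry id="PTS_LAT" value="44.95"/>
  </group>
</file>"""


def find_unset(xml_text):
    """Return the sorted ids of all <entry> elements whose value is exactly UNSET."""
    root = ET.fromstring(xml_text)
    return sorted(
        entry.get("id")
        for entry in root.iter("entry")
        if entry.get("value") == "UNSET"
    )


if __name__ == "__main__":
    for name in find_unset(SAMPLE):
        print(name)
```

Whether a given UNSET value is a problem still has to be judged per variable, as the list above does.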

@billsacks removed the "next" label (this should get some attention in the next week or two, normally at each Thursday SE meeting) on Sep 2, 2021
@wwieder closed this as completed on Nov 15, 2022
@samsrabin added the "science" label (enhancement to or bug impacting science) on Aug 8, 2024