[CLM-only] Rework mksurfdata.pl to have a build-namelist like the main CLM build-namelist #86

ekluzek · 2017-12-16T20:30:09Z

Erik Kluzek < erik > - 2011-08-09 12:12:11 -0600
Bugzilla Id: 1388
Bugzilla CC: bugzillaMuszala, mvertens, rfisher, sacks,

The mksurfdata_map tool is complex enough that it needs a robust build-namelist tool to manage it. mksurfdata.pl should be reworked to call that tool externally (instead of queryNamelist.pl). Namelist generation would then be more reliable and could also be used outside of mksurfdata.pl.

ekluzek · 2017-12-16T20:30:15Z

Erik Kluzek < erik > - 2011-10-26 10:09:43 -0600

Add mariana to this bug.

We also need to be able to create a set of mapping files and immediately run mksurfdata.pl with them. This might mean having a script that updates the XML files with the mapping files, or something similar. Not quite sure.

ekluzek · 2017-12-16T20:30:20Z

Erik Kluzek < erik > - 2011-10-26 10:38:42 -0600

Talking to Mariana, we thought we need an option for unusual grids, where you want to explore a grid as quickly as possible without having to put files into the XML database. In this case mksurfdata.pl could have an option to run mkmapdata.sh, leave the files in place, and simply point to them when creating the namelist. The naming convention would then need to be made consistent between the two.

ekluzek · 2017-12-16T20:30:26Z

Erik Kluzek < erik > - 2011-10-28 11:43:59 -0600

Mariana would like the following optional tool creation process...

Erik - I would like to implement a tool chain in clm - that does the following

0) given a model domain atm SCRIP grid file (unit mask) and a model domain ocn SCRIP grid file
1) creates the necessary mapping files on the fly for generating the surface dataset using (0)
2) creates the surface dataset using (1)
3) creates the map file for ocn->atm (including a unit mapping if needed)
4) creates the ocn and land domain files using (1)

The above should not require adding any changes to the clm xml files.
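The ordering implied by steps 0 to 4 above can be sketched as a small dependency graph. This is only an illustration: the step names are made up for the example, and the dependency of step 3 on step 0 is an assumption, since the list does not state what the ocn->atm map is built from.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on, following the
# numbering in the list above. Step names are hypothetical labels.
TOOLCHAIN = {
    "0 scrip_grids": set(),
    "1 mapping_files": {"0 scrip_grids"},
    "2 surface_dataset": {"1 mapping_files"},
    "3 ocn_to_atm_map": {"0 scrip_grids"},   # assumption: built from the grids
    "4 domain_files": {"1 mapping_files"},   # "using (1)" as written above
}


def run_order(steps):
    """Return one valid execution order that respects the dependencies."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for step in run_order(TOOLCHAIN):
        print(step)
```

Nothing in this sketch touches the clm XML files, which matches the requirement that the chain run without changes to them.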

ekluzek · 2020-10-12T18:20:41Z

We talked about the tool chain process in a meeting today and thought this would be part of the implementation: a tool for mksurfdata_map called preview_namelists (just like the main CESM tool in cime) that creates the namelist for mksurfdata_map. The tool would also create the "namelist" needed to create mapping files, and it would allow a "user_nl_clm" file to modify the contents of the namelist. It would also allow check_input_data from cime to be run to get the needed raw datasets into the $DIN_LOC_ROOT area the user points to.

ekluzek · 2020-10-12T18:21:38Z

Something @billsacks brought up is that it may be best to run the LILAC build process first, so that the $DIN_LOC_ROOT area is defined beforehand.

billsacks · 2020-10-12T18:37:56Z

My impression of today's discussion is that we discussed two separate mechanisms for generating the namelist file:

(1) Use a user_nl approach like we do for the model run

(2) Have a separate tool that generates a default namelist, given some high-level options. Then the user would hand-edit this file and run the main tool that generates mapping files and runs mksurfdata_map.

I'm leaning towards (2): I feel like (1) is a good approach for something very complex like the whole CTSM runtime namelist, but that it may be harder for the user in this relatively simpler case. In particular, it relies on figuring out what you want to change up-front. In my mind, this would mean that we would not have a user_nl file (in contrast to @ekluzek 's note above).
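To make the difference between the two mechanisms concrete, here is a minimal sketch of both. The variable names and default values are placeholders chosen for illustration, not the real mksurfdata_map defaults, and the parser is deliberately naive (it treats any `!` as the start of a comment).

```python
# Hypothetical defaults; the real namelist has many more entries.
DEFAULTS = {
    "mksrf_fvegtyp": "'/path/to/default_pft_raw_file.nc'",
    "mksrf_fsoitex": "'/path/to/default_soil_texture_raw_file.nc'",
    "outnc_double": ".true.",
}


def parse_user_nl(text):
    """Parse 'name = value' lines, skipping blanks and anything after '!'."""
    overrides = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'name = value', got: {raw!r}")
        overrides[name.strip()] = value.strip()
    return overrides


def option1_user_nl(defaults, user_nl_text):
    """Option (1): the user supplies only the changes, decided up-front."""
    overrides = parse_user_nl(user_nl_text)
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown namelist variables: {unknown}")
    return {**defaults, **overrides}


def option2_default_file(defaults):
    """Option (2): write out every default so the user can read and hand-edit it."""
    return "\n".join(f"{name} = {value}" for name, value in defaults.items()) + "\n"
```

With option (1) the user never sees the untouched defaults; with option (2) the full file is visible, which is what makes browsing the list of raw files possible.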

Also, regarding DIN_LOC_ROOT as defined by LILAC's build_ctsm script: I wanted to add (slightly different from what I said in the meeting) that, by default, for a user-defined machine, this build_ctsm script creates a new inputdata space within your specified build location. You can change this with a command-line flag to build_ctsm in order to use a more persistent / shared location on a given machine.

Other advantages would come from running build_ctsm first, at least for a user-defined machine: this requires you to set information that can be leveraged by the mksurfdata_map build, as well as batch system-related information (and number of tasks per node on the machine) that could be leveraged when setting up the batch scripts for creating the mapping files.

If you wanted to set up this information without actually doing the build, you could run build_ctsm with the --no-build argument in order to set up all of the needed directories and machine information, then run it with the --rebuild argument when you actually want to build the model. I didn't set it up with this usage in mind, but I think it would work that way, and would just require tweaking some documentation to describe this workflow.

dlawrenncar · 2020-10-12T18:52:41Z

My intuition says option 2 as well. From a user perspective, that seems the easiest. Personally, I would like to be able to look through the list of raw files, for example, to decide what I might want to change, and even what I could potentially even think about changing.

ekluzek · 2020-10-12T20:40:04Z

@billsacks and @dlawrenncar I can see it being done either way as well (Bill's option 1 or option 2). Note that option 2 is something we already have available in mksurfdata.pl, and I recommend it when the changes you want to make are outside of what mksurfdata.pl can handle (for example, for paleo cases). It is described in the User's Guide and README files, for example.

The downside I see to this is that the namelist is long and complex. Editing the entire namelist puts it on users to understand it in its entirety, which is a big expectation. I also think that because we already have infrastructure for the model to handle user_nl_clm, it may actually be easier to implement it that way than option 2. It also gives the user the same way of working as for the main model, so they use one solution to edit the namelist in two different contexts rather than two different ones. It's always easier to teach a user one way to do something and have them apply it to different things than to teach them two different ways of doing something.

But, again I can see doing it either way here.

slevis-lmwg · 2020-10-21T23:53:17Z

In the context of the wrapper tool, here is new information about the namelist.

What we have been referring to as namelist will evolve into two files:

A control file (in a format TBD such as namelist, yaml, config-file format, json, xml) for all user modifications.
An internal namelist file read by the mksurfdata_map Fortran code that will not involve user modification.
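As a rough illustration of the two-file split, the sketch below reads a user-facing control file and renders a separate Fortran namelist from it. Everything specific here is an assumption: the INI format is just one of the candidate formats listed above, and the section name, keys, namelist group name (`mksurfdata_input`), and variable names are placeholders.

```python
import configparser

# Hypothetical user-facing control file (INI is only one candidate format).
CONTROL_FILE = """
[mksurfdata]
res = 0.9x1.25
start_year = 1850
glc_nec = 10
"""


def read_control(text):
    """Read the user-editable control file into a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["mksurfdata"])


def fortran_value(value):
    """Render a Python value as a Fortran namelist literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return ".true." if value else ".false."
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_internal_settings(control):
    """Map control-file keys to namelist variables (illustrative mapping only)."""
    return {
        "fsurdat": f"surfdata_{control['res']}_{control['start_year']}.nc",
        "nglcec": int(control["glc_nec"]),
        "outnc_double": True,
    }


def render_namelist(group, settings):
    """Write one namelist group in '&group ... /' form."""
    lines = [f"&{group}"]
    lines += [f" {name} = {fortran_value(value)}" for name, value in settings.items()]
    lines.append("/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    control = read_control(CONTROL_FILE)
    print(render_namelist("mksurfdata_input", build_internal_settings(control)), end="")
```

The point of the split is that users only ever edit the first file; the second is regenerated each time and never touched by hand.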

slevis-lmwg · 2020-10-22T00:13:57Z

Identification of the pieces of mksurfdata.pl that seem relevant and not relevant in the generation of the new control file:

Not relevant
sub check_soil
sub check_soil_col_fmx
sub check_pft
These subroutines are for override cases that do not use the corresponding mksrf_ files (though I didn't see a call to check_soil_col_fmx from within mksurfdata.pl). They seem to repeat checks done (or that should be done) by the Fortran executable mksurfdata_map. I propose leaving them out of the new script that will generate the control file.

Then we need to think about how to allow the user to modify the control file when they want to override soil and pft data. If we want to maintain the flexibility of adding pft and soil data as strings, then I could see the mksrf_ file name strings themselves being used as comma-delimited lists when so desired.

Relevant
sub write_transient_timeseries_file
sub write_namelist_file
sub trim
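The comma-delimited idea for soil and pft overrides mentioned above could look roughly like the following. This is a sketch under the assumption that an entry is either a file name or a list of numbers; the function name is hypothetical, and it deliberately does no range or sum checking, since the proposal is to leave those checks to the Fortran executable.

```python
def parse_mksrf_entry(value):
    """Classify a control-file entry as a raw-data file name or an inline override.

    Returns ("override", [floats]) when every comma-separated piece parses as
    a number, and ("file", name) otherwise.
    """
    text = value.strip()
    if not text:
        raise ValueError("empty mksrf_ entry")
    try:
        numbers = [float(piece.strip()) for piece in text.split(",")]
    except ValueError:
        return ("file", text)
    return ("override", numbers)
```

For example, `parse_mksrf_entry("60, 40")` is treated as an override list, while a path such as `/path/to/raw_pft.nc` is passed through as a file name.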

slevis-lmwg · 2020-10-22T16:44:19Z

A couple of interesting comments from today's CTSM Software meeting:

@billsacks said: The script that replaces mksurfdata.pl should have no more than 5-6 command-line options. Put remaining options in a config file. Python has good config file reading tools.
@ekluzek expressed concern about us removing existing mksurfdata.pl options for cases that are still useful, especially related to PTCLM. @dlawrenncar said not to be concerned because there may be an overhaul of the PTCLM process coming, if I understood correctly. If so, do we need to coordinate the mksurfdata tool-chain with the PTCLM overhaul?

billsacks · 2020-10-22T17:09:38Z

> @billsacks said: The script that replaces mksurfdata.pl should have no more than 5-6 command-line options. Put remaining options in a config file. Python has good config file reading tools.

Just to clarify: I was just giving this as a gut-level rough rule of thumb that I would probably use to decide whether to go with command-line options or a config file. My experience tends to be that command-line options are easier in general, because it's easy to build scripts around them and generally rerun a command multiple times, but that it can get unwieldy if there are a lot of required command-line options (optional command-line options are fine). But I would certainly welcome other opinions about the relative usability of command-line options vs. config files.
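A minimal sketch of that rule of thumb, using Python's standard `argparse` and `configparser`: a small number of command-line options, with everything else coming from an optional config file. The option names, the `[options]` section, and the default values are all hypothetical.

```python
import argparse
import configparser


def parse_args(argv):
    """Keep the command line small: one required option, a few optional ones."""
    parser = argparse.ArgumentParser(
        description="Sketch of a front end for a mksurfdata.pl replacement"
    )
    parser.add_argument("--res", required=True, help="grid resolution name")
    parser.add_argument("--config", default=None, help="path to an optional config file")
    parser.add_argument("--output-dir", default=".", help="where to write results")
    return parser.parse_args(argv)


def load_settings(args, config_text=None):
    """Merge built-in defaults, config-file settings, and command-line options.

    A real script would read config_text from the file named by args.config;
    it is passed in directly here to keep the sketch self-contained.
    """
    settings = {"start_year": "2000", "end_year": "2000"}  # hypothetical defaults
    if config_text is not None:
        cfg = configparser.ConfigParser()
        cfg.read_string(config_text)
        if cfg.has_section("options"):
            settings.update(dict(cfg["options"]))
    # Command-line options are applied last so they are easy to script around.
    settings["res"] = args.res
    settings["output_dir"] = args.output_dir
    return settings
```

Because the required options stay on the command line, rerunning the tool for several resolutions is a one-line loop, while rarely changed settings live in the file.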

ekluzek · 2020-10-22T19:03:10Z

> @ekluzek expressed concern about us removing existing mksurfdata.pl options for cases that are still useful, especially related to PTCLM. @dlawrenncar said not to be concerned because there may be an overhaul of the PTCLM process coming, if I understood correctly. If so, do we need to coordinate the mksurfdata tool-chain with the PTCLM overhaul?

The options in question are used by PTCLM, but they are also used if you want to create a single-point surface dataset for a given tower site. If we want the surface dataset tools to keep this general ability, then the options should stay around. One of the concerns that @dlawrenncar brought up is that we have several different ways to create datasets for a single-point site. Because people use different mechanisms, we can't concentrate support in one tool or way of doing it. As part of today's CLM science meeting we are going to discuss the different methods and try to come up with a cohesive plan for supporting this functionality. So it's possible the tool chain effort will need to coordinate with the single-point effort, but it's unclear right now.

ekluzek · 2022-04-14T04:13:21Z

The intent of this is being accomplished in PR #1663

ekluzek · 2022-04-27T22:38:51Z

Closing as there is a new tool for creating namelists for mksurfdata_esmf.

ekluzek added this to the future milestone Dec 16, 2017

ekluzek added the enhancement new capability or improved behavior of existing capability label Dec 16, 2017

billsacks added the priority: low Background task that doesn't need to be done right away. label Nov 5, 2018

ekluzek removed this from the future milestone Aug 26, 2019

ekluzek assigned negin513 and slevis-lmwg Oct 12, 2020

ekluzek removed the priority: low Background task that doesn't need to be done right away. label Oct 12, 2020

slevis-lmwg mentioned this issue Oct 13, 2020

mksurfdat toolchain: Wrapper tool that handles all the steps needed to create a CTSM surface dataset #644

Closed

slevis-lmwg mentioned this issue Jul 7, 2021

Toolchain part1: ./gen_mksurf_namelist.py #1419

Merged

ekluzek closed this as completed Apr 27, 2022
