[CLM-only] Rework mksurfdata.pl to have a build-namelist like the main CLM build-namelist #86

Closed
ekluzek opened this issue Dec 16, 2017 · 15 comments
Assignees
Labels
enhancement new capability or improved behavior of existing capability

Comments

@ekluzek
Collaborator

ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2011-08-09 12:12:11 -0600
Bugzilla Id: 1388
Bugzilla CC: bugzillaMuszala, mvertens, rfisher, sacks,

The mksurfdata_map tool is complex enough that it needs a robust build-namelist tool to manage it. mksurfdata.pl should be reworked so that it uses that tool externally (instead of queryNamelist.pl). The namelist generation would then be better and could be used outside of mksurfdata.pl.

@ekluzek ekluzek added this to the future milestone Dec 16, 2017
@ekluzek ekluzek added the enhancement new capability or improved behavior of existing capability label Dec 16, 2017
@ekluzek
Collaborator Author

ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2011-10-26 10:09:43 -0600

Add mariana to this bug.

We also need the ability to create a set of mapping files and immediately run mksurfdata.pl using them. This might mean having a script to update the XML files with the mapping files, or something similar. Not quite sure.

@ekluzek
Collaborator Author

ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2011-10-26 10:38:42 -0600

Talking to Mariana, we thought we need an option for unusual grids, where you just want to explore a grid as quickly as possible without having to put files into the XML database. In this case mksurfdata.pl could have an option to run mkmapdata.sh, leave the files in place, and simply point to them when creating the namelist. The naming convention then needs to be updated to be consistent between the two.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2011-10-28 11:43:59 -0600

Mariana would like the following optional tool creation process...

Erik - I would like to implement a tool chain in clm - that does the following

0) given a model domain atm SCRIP grid file (unit mask) and a model domain ocn SCRIP grid file
1) creates the necessary mapping files on the fly for generating the surface dataset using (0)
2) creates the surface dataset using (1)
3) creates the map file for ocn->atm (including a unit mapping if needed)
4) creates the ocn and land domain files using (1)    

The above should not require adding any changes to the clm xml files.
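
The steps in Mariana's list form a small dependency chain, which can be sketched as follows. This is only an illustration: the step names and the `order_steps` helper are hypothetical and not part of any CLM tool, and making step (3) depend directly on the SCRIP grid files is an assumption (the list only states the dependencies for steps 1, 2, and 4).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Step:
    name: str
    needs: tuple = ()


# Hypothetical step names mirroring items (0)-(4) of the list above.
STEPS = [
    Step("scrip_grids"),                                # (0) atm and ocn SCRIP grid files
    Step("mapping_files", needs=("scrip_grids",)),      # (1) uses (0)
    Step("surface_dataset", needs=("mapping_files",)),  # (2) uses (1)
    Step("ocn_to_atm_map", needs=("scrip_grids",)),     # (3) assumed to use (0)
    Step("domain_files", needs=("mapping_files",)),     # (4) uses (1)
]


def order_steps(steps):
    """Return step names in an order that satisfies every dependency."""
    sorter = TopologicalSorter({step.name: set(step.needs) for step in steps})
    return list(sorter.static_order())


if __name__ == "__main__":
    for name in order_steps(STEPS):
        print(name)
```

Nothing in this plan reads or writes the clm XML files, which is the constraint stated in the list.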

@billsacks billsacks added the priority: low Background task that doesn't need to be done right away. label Nov 5, 2018
@ekluzek ekluzek removed this from the future milestone Aug 26, 2019
@ekluzek ekluzek removed the priority: low Background task that doesn't need to be done right away. label Oct 12, 2020
@ekluzek
Collaborator Author

ekluzek commented Oct 12, 2020

We talked about this in a meeting today about the tool chain process, and thought it would be part of the implementation: a tool for mksurfdata_map called preview_namelists (just like the main CESM tool in cime) that creates the namelist for mksurfdata_map. The tool would also create the "namelist" needed to create mapping files. It would allow a "user_nl_clm" file to modify the contents of the namelist, and it would allow check_input_data from cime to be run to get needed raw datasets into the $DIN_LOC_ROOT area the user points to.
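
A rough sketch of how a user_nl-style file could modify a default namelist is shown below. It is not how cime implements this: `parse_user_nl`, `apply_overrides`, and the variable names in the example are hypothetical, and the parser is deliberately simplified (it treats every `!` as the start of a comment, so it would mishandle a `!` inside a quoted string).

```python
def parse_user_nl(text):
    """Parse 'name = value' lines; '!' starts a comment (simplified handling)."""
    overrides = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"expected 'name = value', got: {raw!r}")
        name, value = line.split("=", 1)
        overrides[name.strip()] = value.strip()
    return overrides


def apply_overrides(defaults, overrides):
    """Overlay user settings on the defaults, rejecting unknown variables."""
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown namelist variables: {unknown}")
    return {**defaults, **overrides}


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    defaults = {"example_raw_file": "'default.nc'", "example_count": "16"}
    user_nl = "! my changes\nexample_count = 78\n"
    print(apply_overrides(defaults, parse_user_nl(user_nl)))
```

Rejecting unknown names up front means a typo in the user file is reported before any long-running step starts.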

@ekluzek
Collaborator Author

ekluzek commented Oct 12, 2020

Something @billsacks talked about is that it may be best to run the LILAC build process first, so that the $DIN_LOC_ROOT area is defined beforehand.

@billsacks
Member

My impression of today's discussion is that we discussed two separate mechanisms for generating the namelist file:

(1) Use a user_nl approach like we do for the model run

(2) Have a separate tool that generates a default namelist, given some high-level options. Then the user would hand-edit this file and run the main tool that generates mapping files and runs mksurfdata_map.

I'm leaning towards (2): I feel like (1) is a good approach for something very complex like the whole CTSM runtime namelist, but that it may be harder for the user in this relatively simpler case. In particular, it relies on figuring out what you want to change up-front. In my mind, this would mean that we would not have a user_nl file (in contrast to @ekluzek 's note above).

Also, regarding DIN_LOC_ROOT as defined by LILAC's build_ctsm script: I wanted to add (slightly different from what I said in the meeting) that, by default, for a user-defined machine, this build_ctsm script creates a new inputdata space within your specified build location. You can change this with a command-line flag to build_ctsm in order to use a more persistent / shared location on a given machine.

Other advantages would come from running build_ctsm first, at least for a user-defined machine: this requires you to set information that can be leveraged by the mksurfdata_map build, as well as batch system-related information (and number of tasks per node on the machine) that could be leveraged when setting up the batch scripts for creating the mapping files.

If you wanted to set up this information without actually doing the build, you could run build_ctsm with the --no-build argument in order to set up all of the needed directories and machine information, then run it with the --rebuild argument when you actually want to build the model. I didn't set it up with this usage in mind, but I think it would work that way, and would just require tweaking some documentation to describe this workflow.
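
The two-step workflow described above could be scripted roughly like this. Only the `--no-build` and `--rebuild` arguments come from the comment; the script path, passing the build directory as the first argument, and the `two_phase_build_commands` helper are assumptions, so check `build_ctsm --help` before relying on it. The sketch only assembles the commands and does not run them.

```python
import shlex


def two_phase_build_commands(build_dir, setup_args=()):
    """Return (setup, build) argument lists for the two-step workflow."""
    script = "./build_ctsm"  # assumed location of the script
    # Step 1: set up directories and machine information without building.
    setup = [script, build_dir, *setup_args, "--no-build"]
    # Step 2: later, build the model using the existing setup.
    build = [script, build_dir, "--rebuild"]
    return setup, build


if __name__ == "__main__":
    setup_cmd, build_cmd = two_phase_build_commands("/path/to/build")
    print(shlex.join(setup_cmd))
    print(shlex.join(build_cmd))
```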

@dlawrenncar
Contributor

dlawrenncar commented Oct 12, 2020 via email

@ekluzek
Collaborator Author

ekluzek commented Oct 12, 2020

@billsacks and @dlawrenncar I can see it being done either way as well (Bill's option 1 or option 2). Note that option 2 is something we do have available right now in mksurfdata.pl, and I recommend it when the changes you want to make are outside of what mksurfdata.pl can handle (for example for paleo cases). So this is talked about in the User's Guide and README files, for example.

The downside I see to this is that the namelist is long and complex. Editing the entire namelist puts it on users to understand it in its entirety, which is a big expectation. I also think that because we already have infrastructure for the model to handle user_nl_clm, it actually may be easier to implement it that way than option 2. It also provides the user with a similar way of working as for the main model, so you use the same solution to edit the namelist in two different contexts, rather than a different one. It's always easier to teach a user one way to do something and have them do it for different things than to teach them two different ways of doing something.

But, again I can see doing it either way here.

@slevis-lmwg
Contributor

In the context of the wrapper tool, here is new information about the namelist.

What we have been referring to as namelist will evolve into two files:

  • A control file (in a format TBD such as namelist, yaml, config-file format, json, xml) for all user modifications.
  • An internal namelist file read by the mksurfdata_map fortran code that will not involve user modification.
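
As a rough illustration of the two-file split, the sketch below reads a user-facing control file and renders the internal namelist from it. The control-file format is still TBD per the comment above; an INI-style file read with Python's `configparser` is just one possibility, and the section name `mksurfdata`, the group name `example_group`, and all variable names are hypothetical.

```python
import configparser


def coerce(raw):
    """Turn a control-file string into a bool, int, float, or plain string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def format_value(value):
    """Format a Python value as a Fortran namelist literal."""
    if isinstance(value, bool):
        return ".true." if value else ".false."
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = str(value).replace("'", "''")  # double any embedded quotes
    return f"'{escaped}'"


def control_to_namelist(control_text, section="mksurfdata", group="example_group"):
    """Render the internal namelist text from the user-facing control file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_string(control_text)
    lines = [f"&{group}"]
    for name, raw in parser[section].items():
        lines.append(f"  {name} = {format_value(coerce(raw))}")
    lines.append("/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    control = "[mksurfdata]\nexample_file = /data/raw.nc\nexample_count = 16\n"
    print(control_to_namelist(control), end="")
```

With this split, users only ever touch the control file, and the namelist is regenerated from it each time.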

@slevis-lmwg
Contributor

Identification of the pieces of mksurfdata.pl that seem relevant and not relevant in the generation of the new control file:

Not relevant
sub check_soil
sub check_soil_col_fmx
sub check_pft
These subroutines are for override cases that do not use the corresponding mksrf_ files (though I didn't see a call to check_soil_col_fmx from within mksurfdata.pl). These subroutines seem to repeat checks done (or that should be done) by the fortran executable mksurfdata_map. I propose eliminating the subroutines here from the new script that will be generating the control file.

Then we need to think about how to allow the user to modify the control file when they want to override soil and pft data. If we want to maintain the flexibility of adding pft and soil data as strings, then I could see the mksrf_ file name strings themselves used as comma delimited lists when so desired.

Relevant
sub write_transient_timeseries_file
sub write_namelist_file
sub trim

@slevis-lmwg
Contributor

A couple of interesting comments from today's CTSM Software meeting:

  • @billsacks said: The script that replaces mksurfdata.pl should have no more than 5-6 command-line options. Put remaining options in a config file. Python has good config file reading tools.
  • @ekluzek expressed concern about us removing existing mksurfdata.pl options for cases that are still useful, especially related to PTCLM. @dlawrenncar said not to be concerned because there may be an overhaul of the PTCLM process coming, if I understood correctly. If so, do we need to coordinate the mksurfdata tool-chain with the PTCLM overhaul?

@billsacks
Member

  • @billsacks said: The script that replaces mksurfdata.pl should have no more than 5-6 command-line options. Put remaining options in a config file. Python has good config file reading tools.

Just to clarify: I was just giving this as a gut-level rough rule of thumb that I would probably use to decide whether to go with command-line options or a config file. My experience tends to be that command-line options are easier in general, because it's easy to build scripts around them and generally rerun a command multiple times, but that it can get unwieldy if there are a lot of required command-line options (optional command-line options are fine). But I would certainly welcome other opinions about the relative usability of command-line options vs. config files.

@ekluzek
Collaborator Author

ekluzek commented Oct 22, 2020

  • @ekluzek expressed concern about us removing existing mksurfdata.pl options for cases that are still useful, especially related to PTCLM. @dlawrenncar said not to be concerned because there may be an overhaul of the PTCLM process coming, if I understood correctly. If so, do we need to coordinate the mksurfdata tool-chain with the PTCLM overhaul?

The options in question are used by PTCLM, but they are also used if you want to create a single-point surface dataset for a given tower site. If we want the surface dataset tools to keep this general ability, then the options should stay around. One of the concerns that @dlawrenncar brought up is that we have several different ways to create datasets for a single-point site. Because people use different mechanisms, we can't concentrate support in one tool or way of doing it. As part of today's CLM science meeting we are going to discuss the different methods and try to have a cohesive plan on how to support this functionality. So it's possible the tool chain effort will need to coordinate with this effort for single point, but it's unclear right now.

@ekluzek
Collaborator Author

ekluzek commented Apr 14, 2022

The intent of this is being accomplished in PR #1663

@ekluzek
Collaborator Author

ekluzek commented Apr 27, 2022

Closing as there is a new tool for creating namelists for mksurfdata_esmf.

@ekluzek ekluzek closed this as completed Apr 27, 2022
samsrabin pushed a commit to samsrabin/CTSM that referenced this issue Apr 19, 2024
### Description of changes

Add in ability to handle 1PT forcing streams for a generic CLM_USRDAT site. This was in MCT, but left off of the NUOPC implementation. NEON sites were added, but not the generic case.

Also change some of the 1PT stream settings from extend to cycle. And for 1PT cases and NLDAS2 forcing, use limit in place of cycle, so you'll see if you are out of the data range.

### Specific notes

Contributors other than yourself, if any:

CMEPS Issues Fixed (include github issue #):

  Fixes ESCOMP#86

Are there dependencies on other component PRs?
 - [x] CIME cime5.8.47 #3954

Are changes expected to change answers?
 - [x] bit for bit


Any User Interface Changes (namelist or namelist defaults changes)?
 - [ ] Yes
 - [x] No

Testing performed: Ran following single point tests...

SMS_D_Ld5_Mmpi-serial.1x1_mexicocityMEX.I1PtClm50SpRs.cheyenne_intel.clm-default
SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.cheyenne_intel.clm-USUMB_mct
SMS_D_Lm1_Mmpi-serial_Vnuopc.CLM_USRDAT.I1PtClm50SpRs.cheyenne_intel.clm-USUMB_nuopc
SMS_Lm1_Mmpi-serial.1x1_brazil.IHistClm50BgcQianRs.cheyenne_gnu.clm-output_bgc_highfreq
SMS_Lm1_Mmpi-serial_Vnuopc.1x1_brazil.IHistClm50BgcQianRs.cheyenne_gnu.clm-output_bgc_highfreq

Hashes used for testing:
- [ ] CIME
  - repository to check out: https://github.com/ESCOMP/CESM.git
  - tag: cime5.8.47
- [ ] CMEPS
  - repository to check out: https://github.com/ESCOMP/CESM.git
  - tag: v0.10.0
- [ ] CTSM
  - repository to check out: https://github.com/jedwards4b/CTSM.git
  - branch: neon_compsets
  - describe: ctsm5.1.dev038-37-g8a65d2f19
  - This will be ctsm5.1.dev039
samsrabin pushed a commit to samsrabin/CTSM that referenced this issue Apr 19, 2024
### Description of changes

Fix the align year issue. Fix a total of three issues.

### Specific notes

Contributors other than yourself, if any:

CMEPS Issues Fixed (include github issue #):
 Fixes ESCOMP#86
 Fixes ESCOMP#89 
 Fixes ESCOMP#91 

Are there dependencies on other component PRs: No
 - [ ] CIME (list)
 - [ ] CMEPS (list) 

Are changes expected to change answers?
 - [x] bit for bit
 - [ ] different at roundoff level
 - [ ] more substantial 
Technically tests with the 1PT sites will be different because the start date is different.

Any User Interface Changes (namelist or namelist defaults changes)?
 - [ ] Yes
 - [x] No
 
The one caveat is that now $DATM_YR is enabled to work for the 1PT sites

Testing performed:
- [x] (required) aux_cdeps
   - machines and compilers: cheyenne_intel
   - failed tests: Since I tested in a CTSM checkout, the DLND test fails as expected, but as there are no changes to dlnd this shouldn't matter:
SMS_Vnuopc_Ld3.f09_f09_mg17.1850_SATM_DLND%SCPL_SICE_SOCN_SROF_SGLC_SWAV.cheyenne_intel

Hashes used for testing:
- [ ] CIME
  - repository to check out: https://github.com/ESCOMP/CESM.git
  - tag: cime5.8.47
- [ ] CMEPS
  - repository to check out: https://github.com/ESCOMP/CESM.git
  - tag: v0.10.0
- [ ] CTSM
  - repository to check out: https://github.com/jedwards4b/CESM.git
  - branch: neon_compsets
  - hash: ctsm5.1.dev038-43-gd43264c75

5 participants