Generating new precipitation inputs for NEON sites #1904

Closed
wwieder opened this issue Nov 16, 2022 · 23 comments · Fixed by #1954

@wwieder
Contributor

wwieder commented Nov 16, 2022

Addressing precipitation issues in NEON (grassland) sites seems like a high priority for users of the NCAR-NEON system.

I'll propose a suggested path for accomplishing this, but welcome a discussion on our final implementation.  

Here's the short version:

  • Create v2.1 datm data for CTSM
  • Replace NEON precipitation with PRISM at each NEON site - [this is mainly for consistency and ease of use]
  • Handle this on the NCAR side, creating a new directory run/inputdata/atm/cdeps/v2.1.   
  • Do this AFTER users create a case and download NEON v2 data.

My rationale for doing this on the NCAR side is that it: 

  • Maintains the integrity of the NEON measurements, 
  • Provides a framework for users to use other input data if it's available
  • Forces users to understand the choices they're making about input data sources.
  • Builds on capabilities related to how we're modifying surface datasets with NEON soil properties.

Some challenges / drawbacks include:

  • Complicates the workflow for creating and running cases with run_neon.
  • Adds additional steps for running the system, as opposed to serving up v2.1 directly from the NEON side.  
  • Requires maintaining these externally sourced datasets as new NEON data become available.

@negin513, @jedwards4b, @ekluzek, I realize you won't be directly contributing to this effort, but thought your perspectives would be helpful as consultants. Also @TeaganKing, @ddurden, your ideas here are important!

@wwieder wwieder added enhancement new capability or improved behavior of existing capability type: -discussion labels Nov 16, 2022
@wwieder wwieder self-assigned this Nov 16, 2022
@TeaganKing
Contributor

Thanks for sharing this suggested workflow and rationale!

To provide some additional information that @wwieder and I were discussing, here is a quick comparison of daily precipitation between NEON and PRISM for the MOAB site. One key difference in this particular example is that PRISM shows precipitation from around January to April 2018 while NEON data does not.
[Figure: NEON vs. PRISM daily precipitation comparison for MOAB]
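A comparison like this one can be sketched in a few lines of Python. This is only an illustration, not the script used for the figure: the 30-minute tower time step, the mm/s input units, and all numbers below are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(records, step_seconds=1800):
    """Sum sub-daily precipitation rates (mm/s) into daily totals (mm).

    records: iterable of (ISO timestamp string, rate in mm/s).
    step_seconds: length of one time step; 1800 s is an assumption.
    """
    totals = defaultdict(float)
    for stamp, rate in records:
        day = datetime.fromisoformat(stamp).date().isoformat()
        totals[day] += rate * step_seconds
    return dict(totals)


def compare_daily(tower_daily, gridded_daily):
    """Return {day: gridded - tower} for days present in both series."""
    shared = sorted(set(tower_daily) & set(gridded_daily))
    return {day: gridded_daily[day] - tower_daily[day] for day in shared}


# Made-up values for illustration only; not real NEON or PRISM data.
tower = daily_totals([
    ("2018-01-10T00:00:00", 0.0),
    ("2018-01-10T00:30:00", 0.0),
    ("2018-01-11T00:00:00", 0.001),
])
gridded = {"2018-01-10": 2.5, "2018-01-11": 1.8}
diff = compare_daily(tower, gridded)
```

Days where the gridded product reports precipitation but the tower reports none show up as large positive differences, which is the pattern described for early 2018.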

@ekluzek
Collaborator

ekluzek commented Nov 16, 2022

@wwieder and @TeaganKing thanks for the discussion on this. I need to know more about PRISM for this. Does PRISM just replace NEON precipitation or other variables as well? Is PRISM site data or more something like a reanalysis dataset?

I think you are right that the user should be aware of what data they are using and make sure it is the right data for them. As shown above, there are important differences between the two, so users will want to understand them.

I'm picturing in practice this would be by changing user_nl_datm_streams to point to the PRISM dataset. @TeaganKing could you point me to your case so I could see how this is being done?

@ekluzek ekluzek added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Nov 16, 2022
@wwieder
Contributor Author

wwieder commented Nov 16, 2022

You're right Erik, there are quite a few details on the implementation we'd need to work out here.
The first question is does NCAR or NEON do this? I'm suggesting that NCAR does.

Then, yes, PRISM is a gridded (4km) reanalysis (which may only be for CONUS) with daily data that goes up to basically present day.

  • We'd pull PRISM data from the grid cell that's closest to the NEON tower and
  • start by drizzling precipitation throughout the day.
  • I'd thought we'd have a script that replaced neon precipitation with PRISM, given the time step difference, but maybe there's an easier plan?

Currently NEON data get downloaded into run/inputdata/atm/cdeps/v2/
I was imagining we'd make a new datm directory run/inputdata/atm/cdeps/v2.1
Then users can compare the default NEONv2 data with the PRISMv2.1?
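The two steps above (picking the grid cell closest to the tower, then "drizzling" the daily total evenly through the day) could look roughly like the sketch below. The function names, the 48 steps per day, and the example coordinates are assumptions for illustration; they are not taken from any existing CTSM or NEON tool.

```python
import math


def nearest_cell(site_lat, site_lon, cells):
    """Return the (lat, lon) grid-cell center closest to the site.

    Distance is the haversine great-circle distance in km.
    """
    def haversine_km(lat1, lon1, lat2, lon2):
        r = 6371.0  # mean Earth radius in km
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dphi = p2 - p1
        dlmb = math.radians(lon2 - lon1)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
        return 2 * r * math.asin(math.sqrt(a))

    return min(cells, key=lambda c: haversine_km(site_lat, site_lon, c[0], c[1]))


def drizzle(daily_total_mm, steps_per_day=48):
    """Spread a daily total evenly over the day; returns rates in mm/s."""
    seconds_per_step = 86400 / steps_per_day
    per_step_mm = daily_total_mm / steps_per_day
    return [per_step_mm / seconds_per_step] * steps_per_day


# Illustrative coordinates only; not official site or grid coordinates.
cells = [(38.0, -109.0), (38.25, -109.4), (39.0, -110.0)]
closest = nearest_cell(38.25, -109.39, cells)
rates = drizzle(8.64)  # 8.64 mm/day spread over 48 half-hour steps
```

Uniform drizzle preserves the daily total but removes all sub-daily intensity, so it is a starting point rather than a realistic storm structure.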

@wwieder
Contributor Author

wwieder commented Nov 17, 2022

Note, I only know how to get point data for PRISM manually off of their web site.
Less than ideal if we need to repeat the activity down the road...

@ekluzek
Collaborator

ekluzek commented Nov 17, 2022

Ahh, OK 4km is a pretty good gridded resolution and reasonable to use for a tower site. But, that probably also means you'd want to run subset_data on it to just get the closest point pulled out so it would be faster.

Rather than writing it into the NEON data, I suggest we get the gridded PRISM data for CONUS and have subset_data pull out the closest point. Then you can point to the PRISM data for precipitation and use NEON for the rest. So you wouldn't modify the NEON data; you'd have a separate PRISM dataset, and you'd tell user_nl_datm_streams that you want to use PRISM for precipitation and NEON for everything else. Then the temporal frequency can be different for PRISM and NEON, and you don't have to adjust the PRISM data so it's on the same frequency as the NEON data.

By the way what is the temporal frequency of the PRISM data? I assume there must be some way to get the entire dataset, which is likely going to be easier for us to use. Although another approach would be to just get the PRISM data for each NEON site, since that's what you want in the end anyway.

@ddurden

ddurden commented Nov 17, 2022 via email

@wwieder
Contributor Author

wwieder commented Nov 17, 2022

These are good ideas, @ekluzek, but I didn't find the raw, gridded PRISM dataset very easy to work with.
Instead, they do offer a convenient interface for pulling site level data in batch. @ddurden, I think these files are adequate for what we need, so no need to provide something different.

The data are daily resolution and available for all CONUS NEON sites here.
/glade/p/cgd/tss/people/wwieder/inputdata/PRISM

These are .csv files, so it seems like it would be helpful to write a script that breaks them into individual site-level time series (as annual .nc files) that can be read into CLM. I'm less clear on what this should look like. Maybe we can discuss with @TeaganKing and @ekluzek at some point?
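As a starting point for that discussion, here is a minimal sketch of the splitting step using only the Python standard library. The column names (`Date`, `ppt (mm)`) are hypothetical, since the layout of the .csv files is not shown in this thread, and the sketch stops short of writing .nc files, which would need a netCDF library.

```python
import csv
import io
from collections import defaultdict


def split_by_year(csv_text, date_col="Date", precip_col="ppt (mm)"):
    """Group daily rows by calendar year and convert mm/day to mm/s.

    date_col and precip_col are hypothetical column names.
    Returns {year: [(date string, rate in mm/s), ...]}.
    """
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        date = row[date_col]
        rate = float(row[precip_col]) / 86400.0  # seconds per day
        by_year[int(date[:4])].append((date, rate))
    return dict(by_year)


# Made-up rows for illustration only.
sample = "Date,ppt (mm)\n2017-12-31,0.0\n2018-01-01,8.64\n2018-01-02,0.0\n"
series = split_by_year(sample)
```

Each per-year list could then be handed to whatever writer produces the annual files.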

@wwieder
Contributor Author

wwieder commented Nov 22, 2022

Not sure if it's helpful, but here's a script that I used to modify input data for single point runs
https://github.com/wwieder/BNF_MIP/blob/main/modify_DATM_inputs.ipynb

@TeaganKing
Contributor

Hi @ekluzek, I was working with @wwieder this morning on modifying the datm streams to use precipitation data from PRISM. Note that we still want to use the default NEON data streams for other variables, so we want to retain the original NEON.MOAB stream in addition to including an additional stream. In /glade/work/tking/ctsm_tking/tools/site_and_regional/MOAB.transient, I have added in a new datastream (NEON.MOAB.PRECIP) with PRECIP listed under datavars and the relevant files listed under datafiles, and also have another stream (NEON.MOAB) which specifies other non-precip variables.
When previewing the namelists, I'm running into the following error: /glade/work/tking/ctsm_tking/tools/site_and_regional/MOAB.transient/user_nl_datm_streams contains a streamname 'NEON.MOAB.PRECIP' that is not part of valid streamnames ['NEON.MOAB', 'presaero.SSP3-7.0', 'presndep.SSP3-7.0', 'preso3.SSP3-7.0', 'topo.observed', 'co2tseries.20tr']. I'm wondering if you have suggestions for how to smoothly set up an additional datm stream? Do you know where these streamnames need to be defined?
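The check behind that error can be mimicked with the sketch below. It is not the CIME or CDEPS implementation; it only shows why a stream name that is missing from the list of valid names is rejected. The `streamname:key=value` line format is an assumption about how user_nl_datm_streams entries are written, and the datavars value and file path are placeholders because the real ones are not shown in this thread.

```python
def render_stream_overrides(overrides):
    """Render {streamname: {key: value}} as 'streamname:key=value' lines.

    The line format is an assumption, not confirmed by this thread.
    """
    lines = []
    for stream, settings in overrides.items():
        for key, value in settings.items():
            lines.append(f"{stream}:{key}={value}")
    return "\n".join(lines)


def unknown_streams(overrides, valid_streamnames):
    """Return stream names in overrides that are not in the valid list."""
    return [name for name in overrides if name not in valid_streamnames]


# Valid names copied from the error message above.
valid = ["NEON.MOAB", "presaero.SSP3-7.0", "presndep.SSP3-7.0",
         "preso3.SSP3-7.0", "topo.observed", "co2tseries.20tr"]

overrides = {
    "NEON.MOAB.PRECIP": {
        "datavars": "PRECIP",                # placeholder value
        "datafiles": "/path/to/prism/file",  # hypothetical path
    },
}

rejected = unknown_streams(overrides, valid)
accepted = unknown_streams(overrides, valid + ["NEON.MOAB.PRECIP"])
text = render_stream_overrides(overrides)
```

Here `rejected` contains the new stream name until it is added to the list of valid names, which mirrors the need to register the stream before it can be used.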

@jedwards4b
Contributor

jedwards4b commented Dec 1, 2022 via email

@TeaganKing
Contributor

Thank you @jedwards4b !

We will want to use PRISM data for the following sites (only sites in the continental US): 'BART', 'HARV', 'BLAN', 'SCBI', 'SERC', 'DSNY', 'JERC', 'OSBS', 'STEI', 'TREE', 'UNDE', 'KONA', 'KONZ', 'UKFS', 'GRSM', 'MLBS', 'ORNL', 'DELA', 'LENO', 'TALL', 'DCFS', 'NOGP', 'WOOD', 'CPER', 'RMNP', 'STER', 'CLBJ', 'OAES', 'YELL', 'MOAB', 'NIWO', 'JORN', 'SRER', 'ONAQ', 'ABBY', 'WREF', 'SJER', 'SOAP', 'TEAK'
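One way to use this list in a script is to split a requested set of sites into those with PRISM coverage and those without. The sketch below is illustrative only; the helper name is hypothetical, and the two non-CONUS codes in the example request are there purely to show the behavior.

```python
# Site codes copied from the list above (continental US only).
PRISM_SITES = [
    'BART', 'HARV', 'BLAN', 'SCBI', 'SERC', 'DSNY', 'JERC', 'OSBS',
    'STEI', 'TREE', 'UNDE', 'KONA', 'KONZ', 'UKFS', 'GRSM', 'MLBS',
    'ORNL', 'DELA', 'LENO', 'TALL', 'DCFS', 'NOGP', 'WOOD', 'CPER',
    'RMNP', 'STER', 'CLBJ', 'OAES', 'YELL', 'MOAB', 'NIWO', 'JORN',
    'SRER', 'ONAQ', 'ABBY', 'WREF', 'SJER', 'SOAP', 'TEAK',
]


def partition_sites(requested, prism_sites=PRISM_SITES):
    """Split requested sites into (with PRISM data, without PRISM data)."""
    covered = set(prism_sites)
    with_prism = [s for s in requested if s in covered]
    without_prism = [s for s in requested if s not in covered]
    return with_prism, without_prism


# Example request mixing covered sites with two sites outside CONUS.
use_prism, use_default = partition_sites(['MOAB', 'BONA', 'HARV', 'PUUM'])
```

Sites in the second list would keep the default NEON precipitation.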

@jedwards4b
Contributor

jedwards4b commented Dec 1, 2022 via email

@TeaganKing
Contributor

Ok, that sounds like a good plan! Thank you, Jim!

@TeaganKing
Contributor

@ekluzek , it looks like updating the stream definition file at components/cdeps/datm/cime_config/stream_definition_datm.xml and also adding the new stream to variable streamslist in cdeps/datm/cime_config/namelist_definition_datm.xml is working! Thanks to @jedwards4b for his help on this!

@ekluzek
Collaborator

ekluzek commented Dec 1, 2022

Cool, glad that is working @TeaganKing! We need to get this into a PR for CDEPS. How should we go about doing that? Do you want to point me to your sandbox and I'll create a CDEPS branch and PR for it?

@ekluzek
Collaborator

ekluzek commented Dec 1, 2022

@TeaganKing actually the best way would be to have you create a branch of CDEPS in your sandbox and you create the PR. We could meet to go over how to do that.

@wwieder
Contributor Author

wwieder commented Dec 1, 2022

Hi Erik, I agree, but feel like the SE support for long-term application of this capability should wait until we've actually assessed the science impact (and presumably improvements) associated with this change.

@TeaganKing
Contributor

Yes, I'm happy to open up a PR if that still seems like a good plan after assessing the results!

@wwieder
Contributor Author

wwieder commented Dec 5, 2022

@TeaganKing, now that the results you generated for MOAB seem to be working OK, can you start new AD and postAD simulations that point to the PRISM data?

  • This can be done in 3 phases, but will likely take a few days to complete.
  • The example I provided below assumes you have a base case to clone from (MOAB).
  • You can make your own output-root directory, or just have the files go to your local (tools/site_and_regional/) directory.
  • Using the --neon-sites all flag should be OK, and sites outside of CONUS that don't have precipitation data will just fail.

Here are the commands I'd use for each phase of spinup.

./run_neon.py --neon-sites all --neon-version v2 --run-type ad --overwrite --output-root /glade/scratch/wwieder/NEON_prismPPT --base-case /glade/scratch/wwieder/NEON_prismPPT/MOAB

./run_neon.py --neon-sites all --neon-version v2 --run-type postad --overwrite --output-root /glade/scratch/wwieder/NEON_prismPPT --base-case /glade/scratch/wwieder/NEON_prismPPT/MOAB

./run_neon.py --neon-sites all --neon-version v2 --run-type transient --overwrite --output-root /glade/scratch/wwieder/NEON_prismPPT --base-case /glade/scratch/wwieder/NEON_prismPPT/MOAB --run-from-postad

Let me know if you have any questions or issues.
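For anyone scripting the three phases, the commands above can be assembled as in the sketch below. It only builds and prints the command lines using the flags shown in this comment; it does not run anything, and the helper name is hypothetical.

```python
import shlex

PHASES = ("ad", "postad", "transient")


def run_neon_command(run_type, output_root, base_case,
                     sites="all", version="v2"):
    """Build the run_neon.py argument list for one spinup phase."""
    if run_type not in PHASES:
        raise ValueError(f"unknown run type: {run_type}")
    cmd = ["./run_neon.py",
           "--neon-sites", sites,
           "--neon-version", version,
           "--run-type", run_type,
           "--overwrite",
           "--output-root", output_root,
           "--base-case", base_case]
    if run_type == "transient":
        # Only the transient phase starts from the postAD results.
        cmd.append("--run-from-postad")
    return cmd


root = "/glade/scratch/wwieder/NEON_prismPPT"
commands = [run_neon_command(phase, root, f"{root}/MOAB") for phase in PHASES]
for cmd in commands:
    print(shlex.join(cmd))
```

Each phase depends on the previous one finishing, so the printed commands would be run one after another rather than all at once.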

@TeaganKing
Contributor

Hi @wwieder , That sounds great! Thanks for sharing those suggestions of how to run the AD and postAD simulations!

@TeaganKing
Contributor

As a quick update on this, the results for the MOAB transient run do seem reasonable. While trying to run AD cases, we noticed that the run_neon base case must be of the same type as a requested clone, and @wwieder has filed a ticket. After generalizing the namelists for any NEON site (thanks @jedwards4b for your help on making variables readable in the stream names!), the PRISM precipitation data is now being used in AD runs. I'll be looking at the output and will share how things went once those runs are complete.

@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Jan 26, 2023
@wwieder
Contributor Author

wwieder commented Feb 21, 2023

@TeaganKing has run simulations for all NEON sites with PRISM data and we'd like to bring this in as a supported feature for NEON users. It seems like there are three parts to this work.

  • CDEPS changes

  • CTSM modifications

  • Input data (I suggest we bring in the PRISM data on NCAR machines, not through NEON).

  • The code modifications will need namelist switches so that users can define what input data we'll use

  • Other aspects of the NEON infrastructure may need to be modified so we can provide initial condition files appropriately, define the configuration we're using with run_neon, or make other usermods_dirs changes?

  • In bringing this to main, we'd like to think about ways to make this extensible to reading in other 'alternative' input data (e.g., CESM coupler history output from S2S or SMYLE-type simulations that are being done by the ESPWG).

@TeaganKing can open up a PR so we can have a more focused discussion on this, but I wonder if it would be helpful to have a broader design discussion before she opens the PR (or save this until after the PR is opened)? @jedwards4b @ekluzek

Note, we're targeting having a version of this working by May 2023 for the NCAR-NEON workshop.

@ekluzek
Collaborator

ekluzek commented Feb 21, 2023

The CDEPS issue that relates to this is here:

ESCOMP/CDEPS#212

6 participants