Script for modifying neon surface dataset #1375

Merged: 15 commits into ESCOMP:master on Jul 26, 2021

Conversation

negin513 (Contributor):

Description of changes

This script modifies the surface dataset at NEON sites using data available from the NEON server.

Specific notes

Contributors other than yourself, if any: @wwieder, @danicalombardozzi

CTSM Issues Fixed (include github issue #): #1353

Are answers expected to change (and if so in what way)? N/A

Any User Interface Changes (namelist or namelist defaults changes)? N/A

This script has the following user interface:

|------------------------------------------------------------------|
|---------------------  Instructions  -----------------------------|
|------------------------------------------------------------------|
This script is for modifying the surface dataset at NEON sites
using data available from the NEON server.

After creating a single point surface data file from a global 
surface data file using subset_data.py, use this script to 
overwrite some fields with site-specific data for neon sites.

This script will do the following:
- Download neon data for the specified site if it does not exist in the specified directory.
- Modify surface dataset with downloaded data (neon-specific).

-------------------------------------------------------------------
Instructions for running on Cheyenne/Casper:

Load the following into your local environment:
    module load python
    ncar_pylib

To remove NPL from your environment on Cheyenne/Casper:
    deactivate
-------------------------------------------------------------------
To see the available options:
    ./modify_singlept_site_neon.py --help
-------------------------------------------------------------------

optional arguments:
  -h, --help            show this help message and exit
  --neon_site SITE_NAME
                        4-letter neon site code.
  --surf_dir SURF_DIR   Directory of single point surface dataset. [default:
                        /glade/scratch/$user/single_point/]
  --out_dir OUT_DIR     Directory to write updated single point surface
                        dataset. [default:
                        /glade/scratch/$user/single_point_neon_updated/]
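For example, a typical invocation might look like the following (ABBY is one valid 4-letter NEON site code; the paths shown are the defaults listed above):

    ./modify_singlept_site_neon.py --neon_site ABBY --surf_dir /glade/scratch/$user/single_point --out_dir /glade/scratch/$user/single_point_neon_updated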

# if file not found run subset_data.py
# Clean up imports for both codes...
# Check against a list of valid names.

Contributor:

We should add zbedrock here, but note that the utilities could be extended to additional fields on the surface dataset as they become needed (canopy height) or available (LAI).

@negin513 (Contributor, Author) Jul 15, 2021:

I've added zbedrock functionality that we discussed in #1353 (comment) here:

zb_flag = False
if obs_bot.iloc[-1] < rock_thresh:
    f2['zbedrock'] = obs_bot.iloc[-1] * 100
    print(f2['zbedrock'])
    zb_flag = True

I used zb_flag to track whether or not zbedrock is updated: if zbedrock gets updated, the code updates the metadata to show that zbedrock was modified:

if zb_flag:
    nc.attrs['Updated_fields'] = ['PCT_CLAY', 'PCT_SAND', 'ORGANIC', 'zbedrock']
else:
    nc.attrs['Updated_fields'] = ['PCT_CLAY', 'PCT_SAND', 'ORGANIC']

Should we do anything special for LAI or canopy height at this point? Maybe add it as a to-do item at the top of the code?

Collaborator:

@wwieder, @negin513 has a question for you above. Since you normally want to run NEON with BGC, I don't think we care about this for now, but I'll let Will weigh in for sure.

Contributor:

We're always planning on running NEON with BGC for now. If we ever get LAI data from NEON sites we can run in SP, but I don't see that happening soon.


import numpy as np
import pandas as pd
import xarray as xr
Contributor:

More broadly, how do CTSM, CESM, or CESM-Lab handle package versions and Python environments? Is this something that gets tested for Python scripts?

Collaborator:

@wwieder these are all standard Python packages that you get with ncar_pylib on Cheyenne. There are two aspects to Python environments: one is the list of Python modules used in the scripts; the other is setting up the Python environment, which includes pinning specific versions of those modules. The first is in the script itself, and the only thing you can do there is check for a specific version of Python and/or of a Python module. It would be good to at least add a check for the Python version (this is something we do in cime, for example). If you use something version-specific in a Python module, you should do the same. I'm guessing that for this list of modules we aren't using anything version-specific.

In terms of setting up the Python environment, that's outside the script. I looked and didn't see anything specific that CESM-Lab is doing in this regard, nor are they doing version checking inside the scripts (at least not yet). But as Cheyenne is our workhorse and ncar_pylib is sufficient, I think this is fine as it is right now.

I will add an issue to have python scripts check for versions though.
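For reference, a minimal sketch of such an in-script Python version check (the minimum version here is an illustrative assumption, not a stated project requirement):

    import sys

    # Illustrative minimum; the project may require something different.
    MIN_PYTHON = (3, 6)
    if sys.version_info < MIN_PYTHON:
        sys.exit("This script requires Python %d.%d or later." % MIN_PYTHON)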

Contributor:

Thanks @ekluzek, I mostly want to make sure the script works, even as ncar_pylib gets updated.

return parser


def get_neon(neon_dir, site_name):
Contributor:

I'm assuming that you also have a wrapper script to batch process all sites with one call, @negin513?
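(For illustration, a hypothetical wrapper along those lines; the site list shown is just an example subset, not the real list:)

    import subprocess

    # Example subset of 4-letter NEON site codes; a real wrapper would
    # presumably loop over the full list of supported sites.
    neon_sites = ["ABBY", "BART", "BLAN"]

    for site in neon_sites:
        subprocess.run(["./modify_singlept_site_neon.py", "--neon_site", site],
                       check=True)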


neon_file = os.path.join(neon_dir, site_name + "_surfaceData.csv")

#-- Download the file if it does not exist
Contributor:

Is there any way to check when the file was created, to know if a file exists but needs to be updated for some reason? Can you query the creation date of the NEON file and, if it's later than the creation date of our local copy, update to the 'new' data? This relates to NEONScience/NCAR-NEON#39
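(A minimal sketch of one way to do this, assuming the NEON server returns a Last-Modified header for the CSV; the function and URL handling are hypothetical:)

    import os
    import email.utils
    import urllib.request

    def local_copy_is_stale(local_path, remote_url):
        """Return True if the remote file looks newer than our local copy."""
        if not os.path.exists(local_path):
            return True
        # HEAD request: fetch headers only, not the whole CSV.
        req = urllib.request.Request(remote_url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            last_modified = resp.headers.get("Last-Modified")
        if last_modified is None:
            return False  # server gives no date; keep the local copy
        remote_time = email.utils.parsedate_to_datetime(last_modified).timestamp()
        return remote_time > os.path.getmtime(local_path)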

nc.attrs['Updated_with'] = os.path.abspath(__file__)
nc.attrs['Updated_from'] = surf_file
nc.attrs['Updated_using'] = neon_file
nc.attrs['Updated_fields'] = ['PCT_CLAY','PCT_SAND','ORGANIC']
Contributor:

Again, add zbedrock here too if the NEON observations don't make it down to 2 m depth.

fname_out = basename[:cend] + "_c" + today_string + ".nc"
return fname_out

def sort_print_soil_layers(obs_bot, soil_bot):
Contributor:

I love this function!

negin513 (Contributor, Author):

Glad to hear it. 👍😎🎉

carbon_tot = df['carbonTot'][bin_index[soil_lev]]
#print ("carbon_tot:", carbon_tot)
layer_depth = df['biogeoBottomDepth'][bin_index[soil_lev]] - df['biogeoTopDepth'][bin_index[soil_lev]]
f2['ORGANIC'][soil_lev] = carbon_tot * bulk_den * 0.1 / layer_depth * 100 / 0.58
Contributor:

Are we missing the conversion from g to kg OM at the end?

    ORGANIC = carbonTot * bulk_den * 0.1 / layer_depth * 100 / 0.58 * 1e-3

Contributor:

Maybe plot up old and new ORGANIC values for a few sites to check that values seem to be of similar magnitude?

@wwieder (Contributor) Jul 14, 2021:

Hi @negin513, sorry this has been so confusing. Here's what I think we need to do for ORGANIC on the surface dataset. We're taking these data from NEON:

  • carbonTot (gC/kg soil) and
  • bulkDensExclCoarseFrag (g soil / cm3 soil) and need to calculate
  • ORGANIC (kg OM/m3 soil, assuming 0.58gC/gOM)

To do this I think we just need:

  • ORGANIC = carbonTot * bulkDensity / 0.58
  • I think all the unit conversions between g-kg and cm3-m3 end up dropping out (a unit check follows below this comment)
  • I also don't think we need depth, since units for ORGANIC are volumetric, not on an area basis.

After doing this calculation with the raw NEON data, you can then select the CLM soil layer that corresponds to the NEON horizon, as you did for SAND and CLAY. Let's stick with this for now, it's easier.

  • OR -

We can do a depth-weighted average for values between NEON layers, unlike for texture.
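(Editorially added for clarity, a quick unit check of the formula above:)

$$
\frac{\mathrm{g\,C}}{\mathrm{kg\,soil}} \times \frac{\mathrm{g\,soil}}{\mathrm{cm^3\,soil}}
= 10^{-3}\,\frac{\mathrm{g\,C}}{\mathrm{cm^3}}
= 1\,\frac{\mathrm{kg\,C}}{\mathrm{m^3}},
\qquad
\mathrm{ORGANIC} = \frac{\mathrm{carbonTot}\times\mathrm{bulkDens}}{0.58}\ \left[\frac{\mathrm{kg\,OM}}{\mathrm{m^3}}\right]
$$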

Contributor:

Sorry for the confusion on this, and I can provide more details, but it seems like the equation here should be:

ORGANIC = carbonTot * bulk_den / 0.58 (units would be kgOM/m3 soil)

Then we can select CLM layers that correspond to the matching NEON observations?
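(A minimal sketch, added for clarity, of how the assignment in the snippet above would change under this suggestion, reusing its variable names:)

    f2['ORGANIC'][soil_lev] = carbon_tot * bulk_den / 0.58  # kg OM / m3 soil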

layer_depth = df['biogeoBottomDepth'][bin_index[soil_lev]] - df['biogeoTopDepth'][bin_index[soil_lev]]
f2['ORGANIC'][soil_lev] = carbon_tot * bulk_den * 0.1 / layer_depth * 100 / 0.58

#TODO : max depth for neon sites from WW
Contributor:

I'll add this to #1353

@ekluzek (Collaborator) commented May 20, 2021:

Since this is specific to NEON and under contrib, this can come in anytime.

@ekluzek (Collaborator) commented Jun 29, 2021:

@negin513 where is this PR at in terms of being ready to go?

@ekluzek added labels on Jul 15, 2021: PR status: ready (this is ready to merge in, with all tests satisfactory and reviews complete); test: none (no tests required, e.g. tools/contrib); tag: support tools only; enhancement (new capability or improved behavior of existing capability)
@ekluzek ekluzek self-assigned this Jul 15, 2021
@wwieder (Contributor) commented Jul 22, 2021:

Before this gets merged in, do you want to correct the formula for ORGANIC, @negin513, and re-create the modified surface data? This also isn't critical, since we'll get to repeat the process once NEON provides the estimatedOrganicCarbon data from the megapit.

@ekluzek (Collaborator) commented Jul 22, 2021:

@wwieder since I've checked in the new surface datasets as they are, we are going to go with what we have right now. But we can make another future change, with the changes you want, in a subsequent tag.

ekluzek added 3 commits, July 23, 2021:

- I used existing infrastructure to add descriptive strings to certain history fields that I had labeled by number in ESCOMP#1340. While doing this, I applied the change to a bunch of other history fields that needed it. Some variable names for pools were also changed to use terms consistent with the new names.
- … it has to run in the directory where the script exists, and the system tools tests don't allow that right now
- …e starts with a . and is hidden, but seems to be correct; also add the two new tests to the standard test list
@ekluzek ekluzek merged commit 2e4f7d5 into ESCOMP:master Jul 26, 2021