Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output of sowing and harvest dates #1537

Closed
samsrabin opened this issue Nov 1, 2021 · 8 comments
Closed

Output of sowing and harvest dates #1537

samsrabin opened this issue Nov 1, 2021 · 8 comments
Assignees
Labels
enhancement new capability or improved behavior of existing capability

Comments

@samsrabin
Copy link
Collaborator

As part of my work on #519, I need to extract planting and harvest dates from the output of every crop in every gridcell. Initially I've been determining these by looking at when output CPHASE (i.e., member cphase_patch of crop_type) either reaches 4 (harvest date) or is reset from 4 back down to 1 (sowing date). However, this will presumably fail when harvest occurs because the maximum growing season length has been reached. In extreme cases, crop phase could be 1 throughout the year, making it useless for this purpose.

One option would be for me to simply set crop phase to 4 upon harvest, but this seems likely to eventually cause confusion. I'd like it to be obvious when a crop wasn't actually mature at "harvest."

I think the most flexible and efficient way to do this—i.e., allowing for eventual code allowing multiple growing seasons per year, while also minimizing output storage requirements—would be to create a single new crop_event variable in crop_type for each patch. This could be 1 on the sowing date, 2 on the harvest date, and 0 otherwise. We could even distinguish between harvest reasons: 2 if harvested upon reaching maturity, and 3 if harvested upon reaching maximum growing season length (or butting up against the next sowing date).

Related: Is there any way to save a variable to netCDF as type byte or short? This would help save space on small integer outputs like this.

@samsrabin
Copy link
Collaborator Author

I see now that the existing code already should set crop phase to 4 upon harvest, even when harvesting due to maximum growing season length instead of reaching maturity. This is a bit confusing to me, as I'm seeing CPHASE outputs between 3 and 4 in such cases. Maybe it's an averaging issue?

Regardless, I think my proposed crop_event would still be useful. Allowing for harvest reason to be distinguished would be useful for eventual integration with food systems modeling work (e.g., a model might say that crops harvested before maturity can't be used to feed people). In addition, it would be nice to provide an unambiguous way to determine the date of sowing and planting—I drove myself in circles for a while coming up with the logic to do so in a postprocessing script.

@ekluzek ekluzek added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Nov 3, 2021
@ekluzek
Copy link
Collaborator

ekluzek commented Nov 3, 2021

@samsrabin both byte and short are allowed on NetCDF. Typically byte is only used for character strings. In CTSM we don't allow for short, but you could add it to the allowed option (because ncdio_pio.F90.in specifies what the options are). I think byte might be more difficult to make sure you distinguish it for character strings.

Because integer's already take up less space than floating point, I'm not sure there's a really compelling reason to use short integers. We have so much more floating point data than integer. And to save memory that integer data would need to be large arrays, which we don't have that many of. A refinement that I think would be worthwhile would be to convert some of the floating point data that we have that really should be integer though.

So I'm not convinced this would be important to do. But, it looks like adding short's should'n't be hard either. But if you were to actually try to do it, I wouldn't be surprised if you ran into issues, as you find hidden assumptions.

@ekluzek ekluzek added the enhancement new capability or improved behavior of existing capability label Nov 3, 2021
@billsacks
Copy link
Member

@samsrabin I'm not clear on how you plan to use these outputs to determine the actual date of the events. Does this rely on daily output of the relevant variables? The fact that you're seeing values between 3 and 4 does seem like a time-averaging issue. I'm just wondering if it might be more straightforward – at least for post-processing / analysis, and maybe in the code as well – if you had variables like PLANTING_DATE and HARVEST_DATE and stored the actual day of year in these variables at the time when planting / harvest happened. Then you could just do annual output of these variables. Or am I missing something that makes it straightforward to extract planting & harvest date with your plan without requiring frequent output?

@danicalombardozzi can you comment on whether & how you have extracted planting & harvest date information in the past?

@samsrabin
Copy link
Collaborator Author

@billsacks Yeah, that's how I was planning to determine the dates. I figured this would be an elegant solution to deal with eventual multiple plantings or harvests in a single year, although I guess that could also be accomplished using a "number of events" dimension on separate, annual planting and harvest variables.

If you think that makes more sense, could you point me towards somewhere in the code something similar happens? I'm not familiar with any annual output variables, or any outputs with a dimension other than time.

@billsacks
Copy link
Member

You should do what you feel makes most sense in terms of post-processing / analysis, but my somewhat outsider view on this is that a set of annually-output date variables – yes, with an extra "number of events" dimension – would both be more straightforward to analyze and be more efficient in terms of output volume. But I could be wrong. At first I thought you could avoid having the "number of events" dimension for now, but now I'm realizing that for SH / tropical crops that might be needed even now (with a max size of 2) to accommodate (for example) a planting in early Jan and then another one in late Dec, which I think can happen now, at least with the prognostic planting dates (I forget the outcome of the long discussion we had about this for prescribed planting dates). So I can see how the logic wouldn't be totally straightforward for filling this diagnostic variable... probably something like this (maybe this is already clear to you, but I'm thinking out loud):

  • At the beginning of each year (which you can determine using the function is_beg_curr_year in clm_time_manager.F90), reset this diagnostic variable (to 0 or spval) for all patches
  • When a planting event happens, find the first non-filled (i.e., still 0 or spval) slot along the "number of events" dimension for this patch and fill it with the current day of year (see some other functions in clm_time_manager.F90 that you can use for this purpose); similarly for harvest, etc.
  • In the long name for the relevant history variables, say that it should only be output annually; there's no way to force that

For outputting a multi-dimension variable, this is as good an example as any:

call hist_addfld2d (fname='ALBGRD', units='proportion', type2d='numrad', &

You will first need to add this new dimension as one of the allowable choices in histFileMod.F90; search for 'numrad' in that module and follow the example (there are a handful of places where you need to add references to your new dimension).

To have annual output, I think the best thing to do is:

  • Make the field inactive by default (with the default='inactive' argument to hist_addfld2d)
  • In your user_nl_clm, set up an annual output file that includes the new fields; if you haven't done something like this before, there are some good examples in the following file; see hist_nhtfrq in particular:

https://github.com/escomp/CTSM/blob/master/cime_config/usermods_dirs/_includes/output_base/user_nl_clm#L1

I'll let you judge whether this all feels easier or harder – both for initial development and from the user perspective – than your original plan.

@samsrabin
Copy link
Collaborator Author

@billsacks I've given this a go, and I think I'm running into issues because this I made it an instantaneous variable (avgflag='I' instead of A) that gets reset during the first timestep of the year. If I set hist_nhtfrq to either -8760 or 17520, I get just -1 output (the value it gets reset to). Possible solutions:

  1. Output every 1 or 5 days, which will guarantee an output on December 31. This would presumably miss any sowings or harvests that occur on that day.
  2. Convert to avgflag='A', set hist_nhtfrq to -8760, and just multiply by 365 during postprocessing. This seems less than ideal.
  3. Have reset happen during second timestep of the year, instead of the first. Prevent sowing and harvest during the first timestep.

What do you think? Are there any ideas I'm missing?

@billsacks
Copy link
Member

Oh, good point about needing to use avgflag='I' for this: I didn't think about that crucial point, so I'm glad you figured it out. It seems to me like this should work, as long as you are using the right logic to decide on the first time step of the year. Are you using is_beg_curr_year to determine this or something else? Note that (a bit unintuitively) the first time step of the year will have the current time = dtime, not 0 (because current time gives the time at the end of the time step); this should be handled correctly by is_beg_curr_year. If you're doing that correctly, then I'm not sure why this isn't working: I think annual history files should be output in the last time step of the year, which should be a time step before the reset happens.

This may be more efficient to work out in person or over Zoom. I just invited you to our weekly "stand-up" meeting this afternoon. We could discuss it then if that time works for you, or you can send me an invite for another time - 3:30 - 4:30 today, or sometime tomorrow.

@samsrabin
Copy link
Collaborator Author

is_beg_curr_year() did the trick!

@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Dec 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement new capability or improved behavior of existing capability
Projects
None yet
Development

No branches or pull requests

3 participants