Read in crop planting and harvest dates #519

Closed
billsacks opened this issue Sep 21, 2018 · 9 comments · Fixed by #1863
Labels: enhancement (new capability or improved behavior of existing capability); science (enhancement to or bug impacting science); size: large (large project that will take a few weeks or more)

Comments

@billsacks
Member

As an alternative to predicting crop planting and harvest dates based on GDD, we want to provide the capability to read in planting and harvest dates from a dataset, for each crop in each grid cell.

There are some questions about how we would specify the timing of other phenological stages (e.g., the switch from vegetative to reproductive). @danicalombardozzi suggested that we might want to start by simply using time-based rather than GDD-based thresholds for the switch from one phase to another when using prescribed planting and harvesting dates (e.g., the switch to the reproductive phase happens when 60% of the time has elapsed between planting and harvest).
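A minimal sketch of the time-based option, assuming the switch is placed at a fixed fraction of the planting-to-harvest period. The function name and the example dates are hypothetical; only the 60% figure comes from the suggestion above.

```python
from datetime import date, timedelta


def reproductive_switch_date(planting: date, harvest: date, fraction: float = 0.6) -> date:
    """Return the date on which `fraction` of the planting-to-harvest period has elapsed."""
    if harvest <= planting:
        raise ValueError("harvest must fall after planting")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    season_days = (harvest - planting).days
    # Round to the nearest whole day, since phenology is evaluated daily here.
    return planting + timedelta(days=round(fraction * season_days))


# Hypothetical 150-day season: planting on 1 May, harvest on 28 September.
switch = reproductive_switch_date(date(2001, 5, 1), date(2001, 9, 28))
print(switch)  # 2001-07-30, i.e. 90 days after planting
```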

Alternatively, we could continue to use GDD-based triggers for other phenological stages. @barlage described to me how this is done in Noah-MP when using fixed planting and harvest dates. What they do is: for a given year, determine the total GDD accumulated between planting and harvest in that year. They do that over multiple years to determine a typical GDD between planting and harvest. Then, for this coming year, they assume that there will be about that number of GDD between planting and harvest, and determine the GDD thresholds for other phenological stages based on this projected total. This could work if planting and harvesting dates change in time, too, as long as the changes aren't too big from one year to another.
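The projection described above can be sketched roughly as follows. This is not Noah-MP code: the stage names, the stage fractions, and the past-season totals are made up for illustration, and the only idea taken from the description is "average the planting-to-harvest GDD over past years, then scale stage thresholds from that projected total".

```python
from statistics import mean


def projected_stage_thresholds(past_season_gdd, stage_fractions):
    """Scale per-stage GDD thresholds from the typical planting-to-harvest GDD total.

    past_season_gdd: GDD accumulated between planting and harvest in each past year.
    stage_fractions: hypothetical mapping of stage name to fraction of the total.
    """
    typical_total = mean(past_season_gdd)
    return {stage: frac * typical_total for stage, frac in stage_fractions.items()}


# Hypothetical totals from three past years and hypothetical stage fractions.
past_totals = [1900.0, 2100.0, 2000.0]
thresholds = projected_stage_thresholds(
    past_totals, {"reproductive": 0.5, "maturity": 1.0}
)
print(thresholds)  # {'reproductive': 1000.0, 'maturity': 2000.0}
```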

One possible issue I see with the method @barlage described is if the harvest date this year disagrees a lot with the number of GDD in this year: Under our current GDD-based modeling, we'd let the harvest get pushed earlier or later. But with the method @barlage described, harvest would happen at a fixed date, even if (say) only 1700 GDD had been accumulated in (say) a 2000 GDD-based cultivar, and so you're harvesting long before the crop has reached maturity. I'm not sure how big of an issue this would be in practice.

My original assumption was that, in a given run, all crops in all grid cells would operate the same way: either with prognostic, GDD-based planting and harvest, or with prescribed planting and harvest dates. But I'm actually not sure about that: Do we need to allow for the possibility that some crops (or even some grid cells for a given crop) have prescribed planting and harvest dates but some do not?

@billsacks added the "enhancement" label on Sep 21, 2018
@billsacks
Member Author

May want to address #75 along with this.

@billsacks added the "size: large" label on Mar 9, 2020
@samsrabin
Collaborator

This has been suggested as a good first item for me to tackle as a way to get my feet wet with CLM. Prescribed planting and harvest are something we require as part of the protocol of the Global Gridded Crop Model Intercomparison. Wim Thiery has done some GGCMI runs for the latest (CMIP6-based) phase, but because of a number of protocol violations it's unclear how useful they'll be as part of the larger GGCMI ensemble.

GGCMI specifies a method for dealing with GDD targets given time-invariant planting and harvest dates. The idea is to set each gridcell's target as the mean GDD accumulated in the crop's designated season over a certain baseline period. For GGCMI phase 3, I believe that baseline is 1980–2010. Then in the actual simulation, we don't set harvest date, but rather the GDD target. Instead of specifying the harvest date for each gridcell, it's effectively specifying the cultivar used in each gridcell.

So what I'm suggesting is adding the ability to read in planting dates and GDD targets as spatially-varying, crop-specific parameters. (It would be good to future-proof by allowing these to be temporally-varying as well—future GGCMI experiments will specify shifting growing seasons.) Reading in harvest date could also be allowed, but perhaps not scientifically supported, since as you suggest it would lead to crops being harvested before (or, indeed, long after) reaching maturity. Although, if we also let GDD targets vary over time, one could perform a run with specified planting and harvest date with a "cultivar" that reaches maturity exactly on time.

Re: timing phenological stages: GDD-based triggers, as happens in Noah-MP—with stage transitions happening at certain fractions of GDD accumulated—is how my previous model, LPJ-GUESS, handles this. At first I thought, based on the Phenology section of the Tech Note, that this was how things already happened in CLM. But looking at the code, it seems like (as you said) the actual thresholds and hui variables are in units of GDD, not fractions. Fortunately this seems like a relatively tractable problem—changing units rather than changing the actual structure of how phenology works.

Re: Some crops and/or gridcells having one or more of these read in from a file, with others not: This is something we allow in LPJ-GUESS. Every crop in every gridcell does read a value from a file, but if that value is negative, it's ignored. In that case, the crop falls back on its global CFT-specific value. This functionality isn't too difficult to build in, so I'd lean towards including it, although it doesn't have to be a priority.
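A small sketch of that fallback rule, assuming a negative value in the file means "not prescribed here". The function names and the example numbers are hypothetical, and this is neither LPJ-GUESS nor CLM code.

```python
def effective_value(value_from_file, cft_default):
    """Use the file value unless it is negative, which marks 'not prescribed'."""
    return cft_default if value_from_file < 0 else value_from_file


def resolve_prescribed(values_from_file, cft_default):
    """Apply the fallback to every gridcell value for one crop."""
    return [effective_value(v, cft_default) for v in values_from_file]


# Hypothetical sowing days of year for four gridcells; -1 means "fall back".
resolved = resolve_prescribed([120, -1, 95, -1], cft_default=110)
print(resolved)  # [120, 110, 95, 110]
```

Because the sentinel is checked per value, the same mechanism covers both "some crops are prescribed" and "some gridcells of a crop are prescribed".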

@samsrabin
Collaborator

Alright, I've reached a decision point that I'd like some advice on. First, a review of the plan so far (from discussions elsewhere).

The plan so far

The way we've decided this should work for a given crop:

  1. CLM reads in maps of prescribed sowing dates. For now, these are static in time.
  2. CLM simulates planting on the prescribed day, then 364 days of "growth" with no harvest. This allows the accumulation of growing degree days (GDDs), which are then saved as an output.
  3. For each growing season as described in (2) in a given "baseline" period, a Python script calculates the total GDDs accumulated in each gridcell between the prescribed sowing day (read in by CLM) and the GGCMI-provided harvest day (not read in by CLM).
  4. The Python script then calculates the average growing-season GDDs for each gridcell.
  5. This is saved as the GDD target ("cultivar") map, which can be combined with the aforementioned maps of prescribed sowing dates in subsequent CLM runs.
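Steps 3 to 5 can be sketched for a single gridcell as below. This is an illustration under stated assumptions, not the actual script: it assumes a no-leap 365-day calendar, and it assumes the CLM output has already been arranged so that `season[d]` is the GDD accumulated `d` days after sowing. All names and numbers are hypothetical.

```python
from statistics import mean

DAYS_PER_YEAR = 365  # assumption: no-leap calendar


def season_length_days(sowing_doy, harvest_doy):
    """Days from sowing to harvest, wrapping across the new year if needed."""
    length = (harvest_doy - sowing_doy) % DAYS_PER_YEAR
    if length == 0:
        raise ValueError("sowing and harvest fall on the same day of year")
    return length


def gdd_target(cumulative_gdd_by_season, sowing_doy, harvest_doy):
    """Mean GDD accumulated between sowing and harvest over the baseline seasons."""
    n_days = season_length_days(sowing_doy, harvest_doy)
    return mean(season[n_days] for season in cumulative_gdd_by_season)


# Two synthetic baseline seasons accumulating 10 and 12 GDD per day.
seasons = [
    [10.0 * d for d in range(DAYS_PER_YEAR)],
    [12.0 * d for d in range(DAYS_PER_YEAR)],
]
# Sowing on day 300 and harvest on day 35 of the next year: a 100-day season.
target = gdd_target(seasons, sowing_doy=300, harvest_doy=35)
print(target)  # 1100.0
```

Since the harvest day only enters in the lookup, a new harvest calendar changes `n_days` but does not require a new CLM run.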

An alternative to this would be for CLM to also read in maps of prescribed harvest dates in (1) and use that prescribed growing season in (2). We decided against this because the plan described above allows us to derive new cultivar maps when harvest inputs change without having to do a new run.

Current status

I've got sowing dates being read in reliably at this point, testing on the very coarse f10_f10_mg37 grid. I'm beginning work on the Python script to translate the output growing degree days into input cultivar files. This requires me to interpolate the prescribed sowing date files (0.5° resolution) to match the run grid, so I wanted to make sure that the interpolation method I use in Python gives the same values as whatever method CTSM uses. Unfortunately, there are some discrepancies. For example:

[Figure: sdatescheck_irrigated_temperate_corn]

(Top left is my interpolated map of prescribed sowing dates from Python. Top right is the original 0.5° map of prescribed sowing dates. Bottom left is the realized sowing date output for 2001. Bottom right is the difference between the top left and bottom left; white where no difference or not simulated.)

((You might notice that the top right map looks kinda wild. The original prescribed sowing date map only included land cells. I expected CTSM to gracefully select nearest-neighbor grid cells ignoring missing values, but alas it did not, so I did that interpolation myself using cdo -remapnn.))
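For reference, the idea of a nearest-neighbor fill that ignores missing values can be sketched in plain Python as below. This is neither what CTSM does nor what `cdo` does internally: it measures distance in grid-index space rather than on the sphere, uses `None` as the missing marker, and is brute force, so it only suits small illustrative grids.

```python
def fill_missing_nearest(grid):
    """Replace each None cell with the value of the nearest non-missing cell."""
    valid = [
        (r, c, v)
        for r, row in enumerate(grid)
        for c, v in enumerate(row)
        if v is not None
    ]
    if not valid:
        raise ValueError("grid has no valid cells to copy from")

    filled = []
    for r, row in enumerate(grid):
        new_row = []
        for c, v in enumerate(row):
            if v is None:
                # Squared index distance; ties go to the first valid cell found.
                _, _, v = min(
                    valid, key=lambda cell: (cell[0] - r) ** 2 + (cell[1] - c) ** 2
                )
            new_row.append(v)
        filled.append(new_row)
    return filled


# Hypothetical sowing days with only two "land" cells defined.
sowing = [
    [None, 100, None],
    [None, None, None],
    [200, None, None],
]
filled = fill_missing_nearest(sowing)
print(filled)  # [[100, 100, 100], [200, 100, 100], [200, 200, 200]]
```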

Weirdly, only a few gridcells in addition to the ones in that bottom right map ever have a problem, and they're all in that same Northern Hemisphere latitude band.

Possible solutions

At least what's come to mind so far…

  1. Replicate CLM's interpolation in Python. I anticipate this being extremely frustrating and ultimately unproductive.
  2. Use the alternate method described in the section above. This would remove the need to do any interpolation in Python, as the output growing degree days would already be accumulated over the correct time period. That is, I wouldn't have to read harvest dates into Python and interpolate them.
  3. Only ever generate cultivar files at 0.5° resolution, and let CLM interpolate to other grids. This would also remove the need to interpolate harvest dates in Python, as I could just use the prescribed harvest date files directly. I would still want to check that how Python reads the sowing dates matches how CLM reads them, though, and that's not guaranteed.

Any thoughts?

Oh, and also:

It's weird that CTSM didn't gracefully select nearest-neighbor grid cells ignoring missing values. I'm wondering if maybe I'm doing this whole "stream" files thing wrong somehow. For what it's worth, I'm using MCT at the moment; maybe it'd be better to switch to nuopc.

@billsacks
Member Author

@samsrabin thanks a lot for your detailed comment. I haven't gotten my head entirely around the issue, but I think that, for the workflow of generating gdd maps, I'd be inclined to do the CTSM run at the same resolution as the sowing dates - i.e., 0.5 deg. Then you don't need to worry about interpolation issues, which as you're finding can be very frustrating to work through! So I think it's worth the extra computational expense, as long as you're not going to need to redo this very frequently (which it sounds like you probably won't).

Regarding stream interpolation, and especially your last comment under "Oh, and also": @ekluzek is the expert on that, so I'll let him comment, or you can reach out to him with questions.

Let me know if you'd like to talk more about this.

@samsrabin
Collaborator

samsrabin commented Apr 15, 2022

I think I've got this working acceptably, but we should probably have a discussion about some things.

I'll get the least interesting thing out of the way first: I'm using "solution 2" from my post on Dec. 1. Since I'm running my tests on the f10_f10_mg37 grid, I'm first converting the GGCMI crop calendars from 0.5° to that resolution.

Now the fun stuff! I'll focus on irrigated spring wheat for now, just to illustrate things. v0 will refer to the CLM-native runs, v1 to my runs forced with GGCMI-prescribed sowing date and GGCMI-calendar-derived harvest threshold. All maps show mean over the 1980–2009 growing seasons, unless otherwise specified.

My prescribed inputs are obeyed:

[Figure: sdate_0vs1_wheat_spring_ir]

[Figure: harvest_thresh_0vs1_wheat_spring_ir]

But the mean HUI at harvest is often too low:
[Figure: hui_0vs1_wheat_spring_ir]

This suggests that it's getting harvested before it's mature. Indeed:
[Figure: harvest_reason_0vs1_wheat_spring_ir]

This happens so often because, in many places, the growing season length from the GGCMI crop calendars exceeds the maximum growing season length allowed (mxmat) by CLM (note the break in the colorbar at mxmat):
[Figure: seas_length_0vs1_wheat_spring_ir]

To alleviate this, I did another run with mxmat ignored. Instead, if the crop never matured, harvest would occur the day before the next prescribed sowing. v1 in subsequent figures will be from this new run; v0 will be the same as v0 before.
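The harvest rule used in that second run can be sketched as a daily check like the one below. The function and argument names are hypothetical and this is not the CLM Fortran; it assumes 1-based days of year and a 365-day calendar.

```python
def harvest_reason_today(accumulated_gdd, gdd_target, doy, next_sowing_doy,
                         days_per_year=365):
    """Return why harvest happens today, or None if the crop keeps growing.

    mxmat is deliberately not consulted: an immature crop is only forced out
    on the day before the next prescribed sowing.
    """
    if accumulated_gdd >= gdd_target:
        return "mature"
    # Day of year immediately before the next sowing, wrapping 1 -> 365.
    day_before_sowing = (next_sowing_doy - 2) % days_per_year + 1
    if doy == day_before_sowing:
        return "day before next sowing"
    return None


print(harvest_reason_today(1500.0, 1400.0, doy=200, next_sowing_doy=120))  # mature
print(harvest_reason_today(900.0, 1400.0, doy=119, next_sowing_doy=120))   # day before next sowing
print(harvest_reason_today(900.0, 1400.0, doy=200, next_sowing_doy=120))   # None
```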

This mitigates the issue, although of course some patches still never mature (which is fine):
[Figure: harvest_reason_0vs1_wheat_spring_ir 1]

In subsequent figures, I'll be excluding any patch-seasons where the crop never reached maturity.
[Figure: seas_length_ifmature_0vs1_wheat_spring_ir]

This is still not a perfect correspondence to the GGCMI crop calendar (e.g., most of New Zealand), but it's much closer.

The remaining discrepancy is within the range seen in existing models in the GGCMI ensemble (these were run at 0.5°):
[Figure: seas_length_compGGCMI_ifmature_diffExpected_wheat_spring_ir]

It's a similar story for most other crops.

So overall, it's looking pretty good! But there are some things to discuss.

Should we actually relax mxmat?

Definitely yes for GGCMI runs, where it's necessary to comply with the protocol. But for other runs, I don't know. I'm not sure how the original mxmat values were derived.

Winter wheat

I haven't done this for winter wheat yet. I don't think the process will be any different, but I might be surprised. With LPJ-GUESS, I had to also prescribe vernalization degree-days.

Sugarcane

Sugarcane is weird. The GGCMI calendar seems to assume a season length of about 364 days almost everywhere, which makes sense: Let it grow as long as you want because it's not an annual, so it's not going to fill a harvestable organ and then die.

But this presents a problem for my prescribed harvest target, which is the mean GDD accumulation (from a different run) over those 364 days. We expect that about half the seasons will be cooler than average and thus not accumulate enough GDDs to reach maturity. Indeed:
[Figure: harvest_reason_0vs1_sugarcane_rf]
[Figure: seas_length_ifmature_0vs1_sugarcane_rf]

(Why do some cells always reach maturity? I think this is due to the HUI "boost" that occurs if a crop reaches full LAI before it's reached the GDD threshold for the end of the leaf-out period, but I haven't tested this.)
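The "about half the seasons fall short" expectation for sugarcane follows directly from using the mean as the target, which a toy calculation makes concrete. The season totals below are invented and symmetric around their mean; real distributions need not be symmetric, and the HUI boost mentioned above is not represented.

```python
from statistics import mean


def fraction_reaching_target(season_gdd_totals, target):
    """Fraction of seasons whose full-season GDD total meets or exceeds the target."""
    reached = sum(1 for total in season_gdd_totals if total >= target)
    return reached / len(season_gdd_totals)


# Hypothetical GDD totals over the full ~364-day season in six years.
totals = [3800.0, 3900.0, 3950.0, 4050.0, 4100.0, 4200.0]
target = mean(totals)  # the prescribed target is the mean itself
share = fraction_reaching_target(totals, target)
print(target, share)  # 4000.0 0.5
```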

It's not obvious how to deal with this, as it depends on whether "yield" (i.e., the part of the harvestable organ that enters the food system) is being reduced to 0 in postprocessing for patch-seasons that didn't reach maturity, as I think is done for GGCMI. If that's the usual method, I would say to not do that for sugarcane, or maybe to set some harvest threshold above which it's not done.
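One way to express the two sugarcane options in postprocessing is sketched below. Whether GGCMI actually zeroes immature yields is left open above, so the rule itself is an assumption here, as are the function name, the crop labels, and the threshold values.

```python
def postprocessed_yield(crop, raw_yield, hui_at_harvest, gdd_target,
                        sugarcane_min_fraction=0.0):
    """Zero the yield of immature patch-seasons, with a looser rule for sugarcane.

    sugarcane_min_fraction=0.0 never zeroes sugarcane; a value such as 0.95
    zeroes it only when HUI at harvest is below that fraction of the target.
    """
    maturity_fraction = hui_at_harvest / gdd_target
    required = sugarcane_min_fraction if crop == "sugarcane" else 1.0
    return raw_yield if maturity_fraction >= required else 0.0


print(postprocessed_yield("wheat", 5.0, 1700.0, 2000.0))       # 0.0
print(postprocessed_yield("sugarcane", 60.0, 3600.0, 4000.0))  # 60.0
print(postprocessed_yield("sugarcane", 60.0, 3600.0, 4000.0,
                          sugarcane_min_fraction=0.95))        # 0.0
```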

I'll raise this question with GGCMI folks, as it has implications for work with the ensemble outputs, but of course we can make our own decisions for non-GGCMI analyses.

Next steps before merge

  1. Test at 0.5° resolution.
  2. See what happens if I use 0.5° resolution inputs in a run on f10_f10_mg37—does interpolation look reasonable?
  3. Decide whether to relax mxmat (for non-GGCMI runs, at least).
  4. Include winter wheat. (Waiting on Jyoti to get it stable.)
  5. Figure out how to deal with sugarcane in postprocessing.
  6. Anything else?

@danicalombardozzi
Contributor

Thanks @samsrabin, this is really exciting to see! It is great that the prescribed planting dates seem to be working so well and that they are more spatially heterogeneous than what CLM predicts. I know @pengbinpeluo had previously identified problems with HUI in CLM, and it's great to see the reason behind this (harvest before maturity). I haven't looked carefully at mxmat but I wonder how this value was identified (is it from AgroIBIS?) and if it varies by crop type in CLM. It's definitely worth revisiting this and figuring out what a reasonable growing season length is for each crop type and whether mxmat is limiting. It seems like perhaps the GGCMI data can be helpful here?

Are spring and winter wheat and sugarcane the only crop types you plan to use, or will you eventually work on other crops (e.g., soy, corn, etc.)? It might also be interesting to see the impact on yields and how the updates compare with observations. I will need to dig around, but I should be able to point you to some gridded observational datasets.

@samsrabin
Collaborator

Oh yeah, I should have been clear about that—I actually ran this for corn (temperate and tropical), soybean (likewise), sugarcane, rice, cotton, and spring wheat. That is, everything in the standard f10_f10_mg37 setup. If you want to see any figures for any of those, let me know, I've got them all.

I was also wondering where mxmat came from. I couldn't find the AgroIBIS code anywhere online, but I'm assuming it came from there. It does indeed vary by crop type, e.g. 150 for spring wheat and 300 for sugarcane.

A comparison of yields to observation data—and specifically, how performance differs from normal CLM-Crop—is something I was planning for an eventual publication. I've previously worked with country-level yields compared to data from FAOStat, but if you have something gridded that'd be nice to have as well.

@billsacks
Member Author

Thanks for this @samsrabin ! I don't have much to add right now, but will add a couple of notes:

  • As we discussed by email, it looks like mxmat did come from AgroIBIS, at least for corn, soy and wheat. Given that AgroIBIS was originally set up to target the central U.S., I would definitely question the applicability of its mxmat values in a global crop model.
  • The gridded yield dataset I'm familiar with is http://www.earthstat.org/harvested-area-yield-175-crops/ – not sure if @danicalombardozzi knows of something more recent than that.

@danicalombardozzi
Contributor

I agree that it might be good to revisit the mxmat values for regions beyond the central US.

Regarding the global gridded yield dataset, @lawrencepj1 merged the earthstat dataset that @billsacks points to with the FAO historical yield dataset (by country) so that we have gridded yields through time for comparison.

@samsrabin added the "science" label on Aug 8, 2024

3 participants