Calendar issues, leap years, data types #3

Open
davemfish opened this issue Nov 2, 2022 · 7 comments

@davemfish
Contributor

Some GCMs follow a calendar that includes leap days (Feb 29) while others do not (I assume; I haven't checked them all myself). I'm not sure of all the implications of this yet.

Presumably, our observed historical records always include leap days. So I guess the question is: what should we do when the GCM calendar does not include leap days?

@flector-co
Collaborator

Well, my concern about leap days is only about the indexing of the NetCDF time dimension, which is measured only in "days after a {reference date}" rather than as a standardized timestamp. A wrong assumption about that indexing accumulates over the simulation period (typically 2.5 centuries starting on 1-1-1850; other models even use 1-1-0001 as the reference date), producing large shifts in the calendar and inducing a drift in the seasonal patterns of the predictions. (Now that I think about it, I wonder how many local studies claiming changes in the seasonality of PR and TS are the result of this kind of error.)
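A minimal sketch of that drift, using only the Python standard library and not code from this project: the same integer offsets in "days since 1850-01-01" are decoded once assuming a 365-day ("noleap") calendar and once assuming the standard Gregorian calendar. The helper names `decode_noleap` and `decode_standard` are hypothetical.

```python
from datetime import date, timedelta

# Month lengths in a 365-day ("noleap") calendar: February always has 28 days.
NOLEAP_MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def decode_noleap(days_since: int, ref_year: int = 1850) -> tuple[int, int, int]:
    """Decode 'days since <ref_year>-01-01' assuming every year has 365 days."""
    year = ref_year + days_since // 365
    day_of_year = days_since % 365  # 0-based position within the year
    for month, length in enumerate(NOLEAP_MONTH_DAYS, start=1):
        if day_of_year < length:
            return (year, month, day_of_year + 1)
        day_of_year -= length
    raise AssertionError("unreachable: day_of_year is always < 365")


def decode_standard(days_since: int, ref: date = date(1850, 1, 1)) -> tuple[int, int, int]:
    """Decode the same offset assuming the (proleptic) Gregorian calendar."""
    d = ref + timedelta(days=days_since)
    return (d.year, d.month, d.day)


# A model writing 365-day years puts each Jan 1 at a multiple of 365.
for years_elapsed in (1, 50, 150, 250):
    offset = 365 * years_elapsed
    print(years_elapsed, decode_noleap(offset), decode_standard(offset))
# 1 (1851, 1, 1) (1851, 1, 1)
# 50 (1900, 1, 1) (1899, 12, 20)
# 150 (2000, 1, 1) (1999, 11, 26)
# 250 (2100, 1, 1) (2099, 11, 1)
```

Under these assumptions the two readings agree after one year, but by the offset a noleap model labels 2100-01-01 the Gregorian reading has slipped to 2099-11-01: a 61-day shift, one day per leap year from 1852 to 2096 (1900 is not a leap year).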

Right now, the KNN method doesn't include leap days: it operates on years of 365 Julian days and skips the leap-day data points of the GCM. So far, however, we have not encountered a use case where the lack of leap days is a problem (hydrological models like InVEST work on monthly averages, which are only marginally affected by leap days; other models, like SWAT or WEAP, have built-in interpolation methods to add the missing day).
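For illustration, a minimal sketch of what skipping the leap-day points amounts to, assuming the daily values are held as (date, value) pairs. `drop_leap_days` is a hypothetical helper, not the project's KNN code; the point is only that every year then contributes exactly 365 values.

```python
from datetime import date, timedelta


def drop_leap_days(records):
    """Return the (date, value) pairs with every Feb 29 removed."""
    return [(d, v) for d, v in records if not (d.month == 2 and d.day == 29)]


# Hypothetical daily series covering 2023-2024 (2024 is a leap year).
start = date(2023, 1, 1)
series = [(start + timedelta(days=i), float(i)) for i in range(731)]

filtered = drop_leap_days(series)

# Count how many daily values remain in each year.
per_year = {}
for d, _ in filtered:
    per_year[d.year] = per_year.get(d.year, 0) + 1
print(per_year)  # {2023: 365, 2024: 365}
```

With xarray, the same filter could be written against the time coordinate's `.dt` accessor, for example `ds.sel(time=~((ds.time.dt.month == 2) & (ds.time.dt.day == 29)))`.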

@davemfish
Contributor Author

When I open a GCM (CanESM5 in this case), I do see real calendar dates in the time dimension, but no leap days. Every year has 365 days.

```python
import xarray

# historical_ds_path, future_ds_path: paths to the historical and future CanESM5 netCDF files
mfds = xarray.open_mfdataset([historical_ds_path, future_ds_path])
mfds.time
```

```
array(['1850-01-01T12:00:00.000000000', '1850-01-02T12:00:00.000000000',
       '1850-01-03T12:00:00.000000000', ..., '2100-12-29T12:00:00.000000000',
       '2100-12-30T12:00:00.000000000', '2100-12-31T12:00:00.000000000'],
      dtype='datetime64[ns]')
```

Do some other GCMs include leap days?

Okay, so for the simulation period we pretend all years have 365 days. That's easy enough. @flector-co

@flector-co
Collaborator

Yes, other GCMs include leap days; see, for instance, the MPI* models.

@flector-co
Collaborator

And for the simulation, it's OK to work with 365-day years.

@davemfish
Contributor Author

Here's a summary of the challenges and solutions we implemented so far.

To get a calendar of 365-day years, we can use xarray.date_range(start, end, calendar='noleap'). That's convenient, but choosing that calendar means we get dates of type cftime (https://unidata.github.io/cftime/api.html) instead of numpy.datetime64, which is inconvenient.
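To see what the "noleap" calendar contains, here is a tiny standard-library sketch (the helper `noleap_date_range` is hypothetical) that enumerates (year, month, day) tuples with February fixed at 28 days. In xarray, `date_range(..., calendar='noleap')` yields cftime objects for the same sequence of days; plain tuples are used here only to keep the example dependency-free.

```python
# Month lengths in a 365-day ("noleap") calendar.
NOLEAP_MONTH_DAYS = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_date_range(start_year: int, end_year: int) -> list[tuple[int, int, int]]:
    """Every (year, month, day) from Jan 1 of start_year to Dec 31 of end_year,
    in a calendar where every year has 365 days (no Feb 29)."""
    return [
        (year, month, day)
        for year in range(start_year, end_year + 1)
        for month, n_days in enumerate(NOLEAP_MONTH_DAYS, start=1)
        for day in range(1, n_days + 1)
    ]


sim_dates = noleap_date_range(1850, 2100)
print(len(sim_dates))  # 91615, i.e. 251 years * 365 days
i = sim_dates.index((2024, 2, 28))
print(sim_dates[i + 1])  # (2024, 3, 1): Feb 28 is followed directly by Mar 1
```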

netCDF's time dimension can be read by xarray as cftime or as numpy.datetime64 (the default). Unfortunately, asking xarray to use cftime still does not produce dates that are interoperable with those from calendar='noleap' (the resulting cftime objects use the calendar recorded in the file, not "noleap"), so I chose to read netCDF time as numpy.datetime64.

We can convert to pandas.Timestamp or pandas.DatetimeIndex whenever we need to do operations (<, >, ==, addition/subtraction, etc.) or want convenient attributes like date.month and date.day.

davemfish changed the title from "Leap years" to "Calendar issues, leap years, data types" on Nov 9, 2022.
@davemfish
Contributor Author

Once I tried the original CMIP netCDFs, I discovered that xarray cannot parse the time to numpy.datetime64 for any of them; it loads time as cftime objects from the "noleap" calendar. That is actually convenient, since it's the same calendar we want to use to create dates for the simulation period, and it means we can avoid translating through pandas after all. We can simply use cftime and Python's built-in datetime.timedelta for date arithmetic.

cftime objects also have .month and .day attributes. When they are part of an xarray DataArray's time dimension, they are accessed through the .dt accessor, e.g. dataset.time.dt.month.
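To show the kind of arithmetic this relies on without requiring cftime, here is a hypothetical, date-only stand-in for cftime.DatetimeNoLeap written with the standard library. It is a sketch under the assumption that only whole-day offsets matter; the real cftime objects also carry a time of day and support more operations. With cftime itself, `cftime.DatetimeNoLeap(2024, 2, 28) + datetime.timedelta(days=1)` should likewise land on March 1.

```python
from dataclasses import dataclass
from datetime import timedelta

NOLEAP_MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
# 0-based day-of-year on which each month starts in a 365-day year.
MONTH_START = [sum(NOLEAP_MONTH_DAYS[:i]) for i in range(12)]


@dataclass(frozen=True, order=True)  # order=True compares (year, month, day)
class NoLeapDate:
    """Hypothetical stand-in for cftime.DatetimeNoLeap (date part only)."""
    year: int
    month: int
    day: int

    def ordinal(self) -> int:
        # Days since year 0, Jan 1, in a calendar where every year has 365 days.
        return self.year * 365 + MONTH_START[self.month - 1] + self.day - 1

    @classmethod
    def from_ordinal(cls, n: int) -> "NoLeapDate":
        year, doy = divmod(n, 365)
        month = 1
        while month < 12 and doy >= MONTH_START[month]:
            month += 1
        return cls(year, month, doy - MONTH_START[month - 1] + 1)

    def __add__(self, delta):
        if not isinstance(delta, timedelta):
            return NotImplemented
        # Whole days only: any seconds in the timedelta are ignored.
        return NoLeapDate.from_ordinal(self.ordinal() + delta.days)

    def __sub__(self, other):
        if isinstance(other, timedelta):
            return NoLeapDate.from_ordinal(self.ordinal() - other.days)
        if isinstance(other, NoLeapDate):
            return timedelta(days=self.ordinal() - other.ordinal())
        return NotImplemented


d = NoLeapDate(2024, 2, 28)
print(d + timedelta(days=1))  # NoLeapDate(year=2024, month=3, day=1)
print(d.month, d.day)  # 2 28
print(NoLeapDate(2025, 1, 1) - NoLeapDate(2024, 1, 1))  # 365 days, 0:00:00
```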

@davemfish
Contributor Author

In testing, it seems most of the CMIP models we have use a 365-day ("noleap") calendar, but at least one (MIROC6) uses the standard Gregorian calendar, which includes leap days.

As discussed, it should not make a difference analytically whether a given period of the GCM includes a leap day for some models and not for others.

For example, when we simulate a date such as 2024-02-25, we extract a window of 15 days before and after that date from the GCM and calculate the joint-probability matrix of precipitation transitions from that window.

Whether the window is 2024-02-10 : 2024-03-11 (if Feb 29 exists) or 2024-02-10 : 2024-03-12 (if Feb 29 does not exist) does not matter: either way it contains 31 days of GCM data (15 + 1 + 15).
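The window arithmetic can be checked with a short standard-library sketch. The helper names are hypothetical, and noleap dates are represented here as ordinary datetime.date values that simply skip Feb 29; this is not the project's windowing code.

```python
from datetime import date, timedelta


def noleap_shift(d: date, days: int) -> date:
    """Move `days` days from d, skipping Feb 29 as a 365-day calendar would.

    Assumes d itself is not Feb 29.
    """
    step = timedelta(days=1 if days >= 0 else -1)
    for _ in range(abs(days)):
        d += step
        if d.month == 2 and d.day == 29:
            d += step  # Feb 29 does not exist in a noleap calendar
    return d


def window(center: date, half_width: int = 15, noleap: bool = False) -> tuple[date, date]:
    """First and last day of a window of half_width days on each side of center."""
    if noleap:
        return noleap_shift(center, -half_width), noleap_shift(center, half_width)
    return center - timedelta(days=half_width), center + timedelta(days=half_width)


def n_days(start: date, end: date, noleap: bool) -> int:
    """Days in [start, end], not counting Feb 29 when noleap is True."""
    days = (end - start).days + 1
    if noleap:
        d = start
        while d <= end:
            if d.month == 2 and d.day == 29:
                days -= 1
            d += timedelta(days=1)
    return days


center = date(2024, 2, 25)
for noleap in (False, True):
    start, end = window(center, noleap=noleap)
    print(noleap, start, end, n_days(start, end, noleap))
# False 2024-02-10 2024-03-11 31
# True 2024-02-10 2024-03-12 31
```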
