virtual variables not available when using open_dataset #121
Yes, this is certainly related to #118. Virtual variables work by using pandas.DatetimeIndex methods, but if you're not using a standard calendar you end up with an object array of netCDF4.datetime objects instead of an array of numpy.datetime64 objects (which can be turned into a DatetimeIndex). Unfortunately, we do need a DatetimeIndex in order to use its (very fast) calculations for properties like year. The alternative is to write our own implementation, which would likely mean far slower pure-Python code. We could also write a function to cast an array of datetime objects into a DatetimeIndex, which I'm guessing would be your preferred solution, even though it has issues of its own: the dates can come out different, because DatetimeIndex objects and numpy's datetime64 always assume the standard Gregorian calendar.
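To make the contrast concrete, here is a small standalone illustration (mine, not xray internals) of the fast DatetimeIndex accessors versus the object array you get back for a non-standard calendar:

import numpy as np
import pandas as pd
from netCDF4 import num2date

# standard calendar: datetime64 values fit in a DatetimeIndex, which exposes
# fast vectorized accessors like .year, .month, .dayofyear, ...
times = pd.date_range('2001-01-01', periods=4, freq='H')
print(times.year)

# non-standard calendar: num2date hands back an object array of netCDF4
# datetime objects, with no such vectorized accessors to lean on
noleap = num2date(np.arange(4), 'hours since 2001-01-01', calendar='noleap')
print(np.asarray(noleap).dtype)  # object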
I think the simplest option would be to develop a function to cast the netCDF4.datetime objects to datetime64. Does xray rely on timedelta operations anywhere beyond decoding dates? I've run into issues like this repeatedly, and I think it would be really nice if the time coordinate were decoded to datetime64 by default.
Timedelta operations are used in exactly one place in xray: speeding up the decoding of dates from netCDF when a standard calendar is being used. Otherwise, that sort of thing is left up to the user. If dates with non-standard calendars can generally be expressed most usefully as a pandas.DatetimeIndex, then let's go ahead and default to decoding them into datetime64 arrays. The relevant function to modify is decode_cf_datetime in xray.conventions, if you'd like to make a pull request!
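For what it's worth, here is a rough sketch (my own, not xray's actual decode_cf_datetime) of what that default could look like: keep a fast vectorized pandas path for standard calendars and fall back to netCDF4's num2date otherwise.

import numpy as np
import pandas as pd
from netCDF4 import num2date

def decode_times_sketch(num_dates, units, calendar='standard'):
    # Hypothetical helper, not xray code. Fast path: parse
    # "<delta> since <reference>" and let pandas do vectorized arithmetic.
    num_dates = np.asarray(num_dates)
    if calendar in ('standard', 'gregorian', 'proleptic_gregorian'):
        delta, _, ref_date = units.partition(' since ')
        try:
            ref = pd.Timestamp(ref_date)
            return (ref + pd.to_timedelta(num_dates, unit=delta.strip())).values
        except ValueError:
            # reference dates outside the datetime64[ns] range still
            # need the slow path
            pass
    # slow path: object array of netCDF4 datetime objects
    return num2date(num_dates, units, calendar=calendar)

Note that with the units used in this thread ('days since 0001-01-01') the reference date is outside the datetime64[ns] range, so this particular sketch would still take the slow path there.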
Ok, I just spent a few minutes working through a possible (although not ideal) solution for this. It works, although it is a bit ugly and quite a bit slower than the standard-calendar option. This option returns a datetime64[ns] array.
In [1]:
import pandas as pd
from netCDF4 import num2date, date2num
import datetime
import numpy as np
from xray.conventions import decode_cf_datetime as decode
units = 'days since 0001-01-01'
# pandas time range
times = pd.date_range('2001-01-01-00', end='2001-06-30-23', freq='H')
# numpy array of numeric dates on noleap calendar
noleap_time = date2num(times.to_pydatetime(), units, calendar='noleap')
# numpy array of numeric dates on standard calendar
std_time = date2num(times.to_pydatetime(), units, calendar='standard')
# decoding function using datetime intermediary
def nctime_to_nptime(times):
new = np.empty(len(times), dtype='M8[ns]')
for i, t in enumerate(times):
new[i] = np.datetime64(datetime.datetime(*t.timetuple()[:6]))
return new
In [2]:
# decode noleap_time
%timeit nctime_to_nptime(decode(noleap_time, units, calendar='noleap'))
noleap_datetimes = nctime_to_nptime(num2date(noleap_time, units, calendar='noleap'))
print 'dtype:', noleap_datetimes.dtype
print noleap_datetimes
10 loops, best of 3: 38.8 ms per loop
dtype: datetime64[ns]
['2000-12-31T16:00:00.000000000-0800' '2000-12-31T17:00:00.000000000-0800'
'2000-12-31T18:00:00.000000000-0800' ...,
'2001-06-30T14:00:00.000000000-0700' '2001-06-30T15:00:00.000000000-0700'
'2001-06-30T16:00:00.000000000-0700']
In [3]:
# decode std_time using vectorized converter
%timeit decode(std_time, units, calendar='standard')
standard_datetimes = decode(std_time, units, calendar='standard')
print 'dtype:', standard_datetimes.dtype
print standard_datetimes
1000 loops, best of 3: 243 µs per loop
dtype: datetime64[ns]
['2000-12-31T16:00:00.000000000-0800' '2000-12-31T16:59:59.000000000-0800'
'2000-12-31T17:59:59.000000000-0800' ...,
'2001-06-30T13:59:59.000000000-0700' '2001-06-30T14:59:59.000000000-0700'
'2001-06-30T15:59:59.000000000-0700'] Two three things to notice here:
Those precision issues are unfortunate but perhaps unavoidable in this case, because you are representing dates as floating-point numbers: the units are in "days" but the spacing between time points is measured in hours.
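To put a rough number on it (my own back-of-the-envelope check, not from the thread): at around 730,485 days since year 1, float64 only resolves the fractional day to about ten microseconds, so a decoder that truncates sub-second remainders can easily land a whole second short, which is what the ...:59:59 values above look like.

import numpy as np

# roughly the number of days from 0001-01-01 to 2001-01-01
day_number = 730485.0
# spacing between adjacent float64 values at this magnitude, in days
ulp_days = np.spacing(day_number)
print(ulp_days)                # ~1.16e-10 days
print(ulp_days * 86400 * 1e6)  # ~10 microseconds of resolution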
The tutorial provides an example of how to use xray's virtual_variables. The same functionality is not available from a Dataset object created by open_dataset.
Tutorial:
However, reading a time coordinate / variable from a netCDF4 file and applying the same logic raises an error:
Is there a reason that the virtual time variables are only available if the dataset is created from a pandas date_range? Lastly, this could be related to #118.
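For reference, the pattern described above looks roughly like this; the snippet is my own reconstruction (untested against the xray API of the time), and the netCDF file name is hypothetical:

import pandas as pd
import xray

# tutorial-style Dataset built from a pandas date_range: virtual variables work
times = pd.date_range('2000-01-01', freq='H', periods=24)
ds = xray.Dataset({'time': ('time', times)})
print(ds['time.hour'])

# Dataset read from a netCDF file on a non-standard calendar: the same lookup
# fails because 'time' decodes to an object array instead of datetime64 values
ds2 = xray.open_dataset('noleap_example.nc')  # hypothetical file
print(ds2['time.hour'])                        # raises an error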