to_netcdf with decoded time can create file with inconsistent time:units and time_bounds:units #2921
Comments
It looks like The possibility of this is alluded to in a comment in #2436, which relates the issue to #1614.
produces
Thanks for the great report @klindsay28. Looks like CF recommends that
We already have special treatment for
@klindsay28 Any interest in putting together a PR that would avoid setting these attributes on
Ping @fmaussion for feedback.
+1 I didn't fully appreciate how strict the CF conventions were regarding this at the time of #2571. Where it is unambiguous I agree we should make an effort to comply (or preserve compliance).
Today @lukasbrunner and I ran into this problem. Opening an mfdataset and saving
Opening the dataset in
the first date is

Workaround

    import xarray as xr

    filenames = ['file1.nc', 'file2.nc']
    ds = xr.open_mfdataset(filenames)
    ds.load()
    # make sure the encoding is really empty
    assert not ds.time.encoding
    # assign the encoding of time_bnds to time, so that the two are equal
    ds.time.encoding.update(ds.time_bnds.encoding)
    # save
    ds.to_netcdf('~/F/test.nc')

Btw. thanks to @klindsay28 for the nice error report.
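The workaround above hard-codes the names time and time_bnds. A minimal sketch of the same idea for any coordinate that names a bounds variable could look like the following. The helper copy_bounds_encoding and the key list TIME_ENCODING_KEYS are hypothetical names chosen for illustration, not part of xarray, and the small in-memory dataset only stands in for the result of open_mfdataset.

```python
import numpy as np
import xarray as xr

# Assumption: these are the encoding keys that matter for how times are written.
TIME_ENCODING_KEYS = ("units", "calendar", "dtype")


def copy_bounds_encoding(ds):
    """Copy time-related encoding from each bounds variable to its parent.

    Hypothetical helper, not part of xarray. Modifies ``ds`` in place and
    returns the names of the variables whose encoding was updated.
    """
    updated = []
    for name in list(ds.variables):
        bounds_name = ds[name].attrs.get("bounds")
        if bounds_name is None or bounds_name not in ds.variables:
            continue
        bounds_encoding = ds[bounds_name].encoding
        shared = {k: bounds_encoding[k] for k in TIME_ENCODING_KEYS if k in bounds_encoding}
        if shared:
            ds[name].encoding.update(shared)
            updated.append(name)
    return updated


# Illustrative in-memory dataset; the dates and units are made up for the example.
times = np.array(["1850-01-16", "1850-02-15"], dtype="datetime64[ns]")
bounds = np.array(
    [["1850-01-01", "1850-02-01"], ["1850-02-01", "1850-03-01"]],
    dtype="datetime64[ns]",
)
example = xr.Dataset(
    {"time_bounds": (("time", "d2"), bounds)},
    coords={"time": ("time", times, {"bounds": "time_bounds"})},
)
example["time_bounds"].encoding.update(units="days since 1850-01-01")
updated = copy_bounds_encoding(example)
```

As in the workaround, this only helps when the bounds variable still carries an encoding after the files are combined; if it does not, the helper leaves the dataset untouched.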
Fixes pydata#2921
1. Removes certain attributes from bounds variables on encode.
2. open_mfdataset: Sets encoding on variables based on encoding in first file.

* Don't set attributes on bounds variables.
  Fixes #2921
  1. Removes certain attributes from bounds variables on encode.
  2. open_mfdataset: Sets encoding on variables based on encoding in first file.
* remove whitespace stuff.
* Make sure variable exists in first file before assigning encoding
* Make sure we iterate over coords too.
* lint fix.
* docs/comment fixes.
* mfdataset encoding test.
* time_bounds attrs test + allow for slight CF non-compliance.
* I need to deal with encoding!
* minor fixes.
* another minor fix.
* review fixes.
* lint fixes.
* Remove encoding changes and xfail test.
* Update whats-new.rst
Change output format from NETCDF4 to NETCDF4_CLASSIC (for backwards compatibility; *believe* but haven't tested that this still permits writing >2Gb output files). If variable has a time axis, make it unlimited. Set units on the time bounds variable, if present, to address issue noted in pydata/xarray#2921 (comment)
Issue pydata#2921 is about mismatching time units between a time variable and its "bounds" companion. However, pydata#2965 does more than fixing pydata#2921, it removes all double attributes from "bounds" variables which has the undesired side effect that there is currently no way to save them to netcdf with xarray. Since the mentioned link is a recommendation and not a hard requirement for CF compliance, these attributes should be left to the caller to prepare the dataset variables appropriately if required. Reduces the amount of surprise that attributes are not written to disk and fixes pydata#8368.
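If preparing the variables is left to the caller, as the last commit message suggests, one possible caller-side step is to drop attributes from a bounds variable only when they merely repeat the parent coordinate's values. The sketch below is an illustration under assumptions: drop_duplicate_bounds_attrs is a hypothetical helper, not an xarray function, and the attribute list SHARED_ATTRS is an assumed selection rather than one taken from this thread.

```python
import numpy as np
import xarray as xr

# Assumption: attributes that should agree between a coordinate and its bounds.
SHARED_ATTRS = ("units", "standard_name", "axis", "positive", "calendar")


def drop_duplicate_bounds_attrs(ds):
    """Return a copy of ``ds`` whose bounds variables omit repeated attributes.

    Hypothetical helper, not part of xarray. An attribute is removed from a
    bounds variable only if the parent coordinate has the same key and value.
    """
    out = ds.copy()
    for name in list(out.variables):
        bounds_name = out[name].attrs.get("bounds")
        if bounds_name is None or bounds_name not in out.variables:
            continue
        parent_attrs = out[name].attrs
        bounds_attrs = out[bounds_name].attrs
        for key in SHARED_ATTRS:
            if key in bounds_attrs and bounds_attrs[key] == parent_attrs.get(key):
                del bounds_attrs[key]
    return out


# Illustrative dataset with a latitude coordinate and its bounds.
lat_example = xr.Dataset(
    {
        "lat_bnds": (
            ("lat", "d2"),
            np.array([[-90.0, 0.0], [0.0, 90.0]]),
            {"units": "degrees_north", "long_name": "latitude bounds"},
        )
    },
    coords={
        "lat": ("lat", np.array([-45.0, 45.0]), {"units": "degrees_north", "bounds": "lat_bnds"})
    },
)
cleaned = drop_duplicate_bounds_attrs(lat_example)
```

Attributes that differ from the parent's, or that the parent does not have, are kept, so nothing the caller set deliberately is silently discarded.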
Code Sample, a copy-pastable example if possible
Problem description
The above code initially creates 2 netCDF files, for Jan-1850 and Feb-1850, that have the variables time and time_bounds, and time:bounds='time_bounds'. It then reads the 2 files back in as a single Dataset, using open_mfdataset, and this Dataset is written back out to a netCDF file. ncdump of this final file is

The problem is that the units attributes for time and time_bounds are different in this file, contrary to what CF conventions require.

The final call to to_netcdf is creating a file where time's units (and type) differ from what they are in the intermediate files. These transformations are not being applied to time_bounds.

While the change to time's type is not necessarily an issue, I do find it surprising.

This inconsistency goes away if either of decode_times or decode_cf is set to False in the python code above. In particular, the transformations to time's units and type do not happen.

The inconsistency also goes away if open_mfdataset opens a single file. In this scenario also, the transformations to time's units and type do not happen.

I think that the desired behavior is to either not apply the units and type transformations to time, or to also apply them to time_bounds. The first option would be consistent with the current single-file behavior.

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 1.1.5
distributed: 1.26.1
matplotlib: 3.0.3
cartopy: None
seaborn: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.4.0
sphinx: None
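The mismatch described in the problem description can also be detected programmatically rather than by reading ncdump output. The following is a minimal sketch, assuming the file has been opened without time decoding (for example with xr.open_dataset(path, decode_times=False)) so that units and calendar appear as plain attributes. The function find_bounds_mismatches is a hypothetical helper, and the attribute values in the example dataset are invented for illustration, not copied from the report.

```python
import numpy as np
import xarray as xr


def find_bounds_mismatches(ds, keys=("units", "calendar")):
    """List (coordinate, bounds, attribute) triples whose values disagree.

    Hypothetical helper, not part of xarray. Expects an undecoded dataset,
    where units and calendar are stored in ``attrs``.
    """
    mismatches = []
    for name in list(ds.variables):
        bounds_name = ds[name].attrs.get("bounds")
        if bounds_name is None or bounds_name not in ds.variables:
            continue
        for key in keys:
            bounds_value = ds[bounds_name].attrs.get(key)
            # A bounds variable without the attribute is treated as inheriting it.
            if bounds_value is not None and bounds_value != ds[name].attrs.get(key):
                mismatches.append((name, bounds_name, key))
    return mismatches


# Illustrative undecoded dataset with deliberately different units.
undecoded = xr.Dataset(
    {
        "time_bounds": (
            ("time", "d2"),
            np.array([[0.0, 31.0], [31.0, 59.0]]),
            {"units": "days since 1850-01-01"},
        )
    },
    coords={
        "time": (
            "time",
            np.array([0, 720]),
            {"units": "hours since 1850-01-16 12:00:00", "bounds": "time_bounds"},
        )
    },
)
problems = find_bounds_mismatches(undecoded)
```

An empty list means every bounds variable either agrees with its coordinate or omits the attribute entirely.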