Append along an unlimited dimension to an existing netCDF file #1672
Any updates on this? |
None that I'm aware of. I think this issue is still in the "help wanted" stage. |
I would love to have this capability. As @shoyer mentioned, adding time steps of any sort to existing netCDF files would be really beneficial. The only real alternative is to save a separate netCDF file for each additional time step...even if there are tons of time steps and each file is only a couple hundred KB (which is my situation with NASA data). I'll look into it if I get some time... |
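For reference, a minimal sketch of that workaround, assuming one file per time step with a shared naming pattern (the filenames below are made up): each step is written to its own file, and the pieces are combined lazily on read with open_mfdataset.

import xarray as xr

# Hypothetical per-step files: step_000.nc, step_001.nc, ... each holding the
# same variables for a single time step.
combined = xr.open_mfdataset("step_*.nc", combine="by_coords")

# Keep working with the lazy, combined dataset, or write it out once at the end.
combined.to_netcdf("combined.nc")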
This would be extremely helpful for our modelling of time varying renewable energy. |
I think I got a basic prototype working. That said, I think a real challenge lies in supporting the numerous backends and lazy arrays. For example, I was only able to add data in rather roundabout ways with the netCDF4 library, which may trigger expensive computations multiple times. Is this a use case that we must optimize for now? |
Small prototype, but maybe it can help kick-start the development.

import netCDF4
import xarray as xr


def _expand_variable(nc_variable, data, expanding_dim, nc_shape, added_size):
    # For time deltas, we must ensure that we use the same encoding as
    # what was previously stored.
    # We likely need to do this as well for variables that had custom
    # encodings too
    if hasattr(nc_variable, 'calendar'):
        data.encoding = {
            'units': nc_variable.units,
            'calendar': nc_variable.calendar,
        }
    data_encoded = xr.conventions.encode_cf_variable(data)  # , name=name)
    left_slices = data.dims.index(expanding_dim)
    right_slices = data.ndim - left_slices - 1
    nc_slice = (slice(None),) * left_slices + (slice(nc_shape, nc_shape + added_size),) + (slice(None),) * (right_slices)
    nc_variable[nc_slice] = data_encoded.data


def append_to_netcdf(filename, ds_to_append, unlimited_dims):
    if isinstance(unlimited_dims, str):
        unlimited_dims = [unlimited_dims]

    if len(unlimited_dims) != 1:
        # TODO: change this so it can support multiple expanding dims
        raise ValueError(
            "We only support one unlimited dim for now, "
            f"got {len(unlimited_dims)}.")

    unlimited_dims = list(set(unlimited_dims))
    expanding_dim = unlimited_dims[0]

    with netCDF4.Dataset(filename, mode='a') as nc:
        nc_dims = set(nc.dimensions.keys())

        nc_coord = nc[expanding_dim]
        nc_shape = len(nc_coord)

        added_size = len(ds_to_append[expanding_dim])
        variables, attrs = xr.conventions.encode_dataset_coordinates(ds_to_append)

        for name, data in variables.items():
            if expanding_dim not in data.dims:
                # Nothing to do, data assumed to be identical
                continue

            nc_variable = nc[name]
            _expand_variable(nc_variable, data, expanding_dim, nc_shape, added_size)


from xarray.tests.test_dataset import create_append_test_data
from xarray.testing import assert_equal

ds, ds_to_append, ds_with_new_var = create_append_test_data()

filename = 'test_dataset.nc'
ds.to_netcdf(filename, mode='w', unlimited_dims=['time'])
append_to_netcdf('test_dataset.nc', ds_to_append, unlimited_dims='time')

loaded = xr.load_dataset('test_dataset.nc')
assert_equal(xr.concat([ds, ds_to_append], dim="time"), loaded) |
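A note on what the prototype above relies on: with the netCDF4 library, assigning to a slice that extends past the current size of an unlimited dimension grows the variable in place. A minimal, self-contained illustration of just that mechanism (file and variable names are made up):

import netCDF4
import numpy as np

with netCDF4.Dataset('demo.nc', mode='w') as nc:
    nc.createDimension('time', None)               # None makes 'time' unlimited
    var = nc.createVariable('x', 'f8', ('time',))
    var[0:3] = np.arange(3.0)                      # initial write, size 3

with netCDF4.Dataset('demo.nc', mode='a') as nc:
    var = nc['x']
    n = len(nc.dimensions['time'])                 # current size along 'time'
    var[n:n + 2] = [3.0, 4.0]                      # slicing past the end appends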
Hi - I consider this extremely useful! Is your prototype already part of some library (or should we expect it in xr)? Many thanks for the code. |
It isn't really part of any library, and I don't have plans to turn it into a public one. I think the discussion is really about the xarray API and which functions to implement first. Then somebody can take the code and integrate it into the agreed-upon API. |
Any movement on this? I'd love to have this -- kinda critical for some of my work. @hmaarrfk seems to have made a start, and it doesn't look too hairy :-) |
This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps:
https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks
It should be relatively straightforward to add, too, building on the existing support for writing files with unlimited dimensions. The user-facing API would probably be a new keyword argument to to_netcdf(), e.g., extend='time' to indicate the dimension being extended.
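For concreteness, a rough sketch of how that proposed keyword might look from the user's side. The extend keyword does not exist in to_netcdf() today; it is only the name suggested above, so the final call is left as a comment. The datasets and filename are made up for illustration.

import numpy as np
import xarray as xr

# Two small example datasets sharing the same variable, split along 'time'.
ds = xr.Dataset({'x': ('time', np.arange(3.0))}, coords={'time': [0, 1, 2]})
ds_next = xr.Dataset({'x': ('time', [3.0, 4.0])}, coords={'time': [3, 4]})

# This part works today: the first write creates 'time' as an unlimited dimension.
ds.to_netcdf('simulation.nc', mode='w', unlimited_dims=['time'])

# Hypothetical follow-up call using the proposed keyword (not implemented):
# ds_next.to_netcdf('simulation.nc', extend='time')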