Enable Append/concat to existing zarr datastore #2022
Comments
This would probably make sense to think about alongside support for appending along an existing dimension in a netCDF file (#1672). I can see a few potential ways to write the syntax. Probably supplying a range of indices along a dimension to write to would make the most sense, e.g.,
My use case for this is appending Argo float data to an existing zarr store. At the moment I have 800+ netCDF files that need transforming before they can be added or read by xarray in a *.nc-type read. At the moment I read the first file, transform it, and write it to a zarr store using .to_zarr. Then I read each of the remaining files and append each variable to the zarr store using zarr's append function. This is probably not a good way to go, but it is all I could figure out at the moment. @shoyer I think it would be useful to have a straight append mode:
We may have people interested in working on this soon. I think we have some details to sort out regarding the API for appending. The most generic case looks something like this:

    ds1 = xr.open_dataset('file1.nc')
    # file2.nc already exists
    ds1.to_netcdf('file2.nc', mode='a+')

We need to figure out what should happen under different circumstances. Some cases are:
It seems like much of the logic for overlapping dimensions should be able to be handled via
I'm pretty sure the coordinates will just get overwritten, too, at least as long as the coordinate arrays have the same shape. If they have different shapes, you probably will get an error. We certainly don't do any checks for alignment currently.
This is the only case I would try to solve in the initial implementation. It's probably 20% of the work (to add a keyword argument like If we need alignment, I'm sure we could make that work in a follow-up. Certainly it would be less error prone to use.
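For illustration, the append-only case without alignment can be sketched in plain Python. This is a minimal sketch, not xarray's or zarr's implementation: `plan_append` is a hypothetical helper, and the dimension sizes are passed in as plain mappings rather than read from a real store. It assumes every dimension other than the append dimension must already match and returns the index range the new data would occupy.

```python
from typing import Mapping


def plan_append(existing: Mapping[str, int], new: Mapping[str, int], append_dim: str) -> slice:
    """Return the index range along `append_dim` that the new data would occupy.

    `existing` and `new` map dimension names to sizes (hypothetical inputs).
    """
    if append_dim not in existing or append_dim not in new:
        raise ValueError(f"dimension {append_dim!r} must exist in both datasets")
    if set(existing) != set(new):
        raise ValueError("datasets must have the same dimensions")
    for dim, size in existing.items():
        # Every dimension other than the append dimension must already match;
        # no alignment is attempted in this sketch.
        if dim != append_dim and new[dim] != size:
            raise ValueError(f"size mismatch along {dim!r}: {size} != {new[dim]}")
    start = existing[append_dim]
    return slice(start, start + new[append_dim])


# A store holding 10 time steps receives 4 more: they land at indices 10..13.
region = plan_append({"time": 10, "lat": 3}, {"time": 4, "lat": 3}, "time")
```

Raising on any mismatch rather than trying to align keeps the first version simple; an alignment step could replace the size check later.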
@NickMortimer would you have a snippet for appending xarray objects to an existing zarr dataset? It would indeed be really nice to get this built into xarray, but that is just a matter of patience I guess :) Thanks!
Patience...or action. Anyone is welcome and encouraged to submit a pull request on this topic. Xarray is a volunteer effort.
Obviously. I'm just new to Zarr so a bit early to contribute to Xarray on that topic.
Following discussion from pangeo-data/pangeo#19
How would we go about implementing a concat or append function for zarr data stores? I am imagining something like xr.concat here. It's not clear to me how this would work when using open_mfdataset.
Problem description
If you are using a cloud storage facility like GCS, ds.to_zarr can fail before the upload completes. This is a problem for multi-TB datasets, as the entire process needs to be restarted without any way to resume where you left off.
Expected Output
A new zarr dataset with the additional dataset appended along the appropriate dim
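One way an append operation could help with the restart problem is to write the data in batches and record which batches have finished. The following is a rough sketch under stated assumptions, not an existing xarray or zarr feature: `append_resumable`, the JSON manifest file, and the `append_one` callback (standing in for whatever actually appends one batch to the store) are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def append_resumable(
    sources: Iterable[str],
    manifest: Path,
    append_one: Callable[[str], None],
) -> List[str]:
    """Append each source once, recording progress so a rerun skips finished ones."""
    # The manifest is a JSON list of sources that were appended successfully.
    done = json.loads(manifest.read_text()) if manifest.exists() else []
    written = []
    for src in sources:
        if src in done:
            continue  # already appended in an earlier run
        append_one(src)  # hypothetical callback that appends one batch to the store
        done.append(src)
        manifest.write_text(json.dumps(done))  # persist progress after each batch
        written.append(src)
    return written
```

If `append_one` raises partway through the list, rerunning with the same arguments picks up at the first source not listed in the manifest. This sketch does not handle a batch that was only partially written before the failure; that would need a check against the store itself.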