What is your issue?
The docs for `Dataset.to_zarr` say that if `align_chunks=True`, then `safe_chunks` is set to `False`:

> **align_chunks** (*bool, default False*) – If True, rechunks the Dask array to align with Zarr chunks before writing. This ensures each Dask chunk maps to one or more contiguous Zarr chunks, which avoids race conditions. Internally, the process sets `safe_chunks=False` and tries to preserve the original Dask chunking as much as possible. Note: While this alignment avoids write conflicts stemming from chunk boundary misalignment, it does not protect against race conditions if multiple uncoordinated processes write to the same Zarr array concurrently.
However, if I use `align_chunks=True` without also passing `safe_chunks=False`, I get the error

```
ValueError: Specified Zarr chunks encoding['chunks']=(480, 20, 293) for variable named 'mean_age_particles_e' would overlap multiple Dask chunks. Check the chunk at position 0, which has a size of 480 on dimension 0. It is unaligned with backend chunks of size 480 in region slice(744, None, None). Writing this array in parallel with Dask could lead to corrupted data. To resolve this issue, consider one of the following options:
- Rechunk the array using `chunk()`.
- Modify or delete `encoding['chunks']`.
- Set `safe_chunks=False`.
- Enable automatic chunks alignment with `align_chunks=True`.
```
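For context, the numbers in the message line up as follows. This is just a quick arithmetic sketch using the sizes quoted in the error, not output from the failing run:

```python
# Sizes taken from the error message above.
zarr_chunk = 480   # backend (Zarr) chunk size along dimension 0 ("time")
start = 744        # the append region begins at slice(744, None, None)
dask_chunk = 480   # size of the first Dask chunk being written

stop = start + dask_chunk                             # 1224
next_boundary = -(-start // zarr_chunk) * zarr_chunk  # 960 (ceil to a boundary)

# The first Dask chunk spans the Zarr chunk boundary at 960, which is
# exactly the kind of overlap that safe_chunks guards against.
print(start < next_boundary < stop)  # True
```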
For instance, if I have two netCDF files with around 600-700 time values each, the following fails:
```python
import xarray as xr

with temp_store() as store:  # local helper yielding a temporary Zarr store
    chunks = {"time": 480, "latitude": 293, "longitude": 391, "height": 20}
    with xr.open_dataset(fp_path / fp_files[0]) as ds:
        # ds.sizes["time"] is 744, so the last chunk written will be incomplete
        ds = ds.chunk(chunks)
        ds.to_zarr(store)
    with xr.open_mfdataset([fp_path / fp_files[1]]) as ds:
        ds.to_zarr(store, mode="a", append_dim="time", align_chunks=True)
```
But it works if I replace the last line with

```python
ds.to_zarr(store, mode="a", append_dim="time", safe_chunks=False, align_chunks=True)
```
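The same pattern can be sketched with synthetic data; everything here (store path, variable name, sizes) is hypothetical and only mirrors the shapes in the report, so it may need adjusting to reproduce exactly:

```python
import numpy as np
import xarray as xr

store = "example.zarr"  # hypothetical path

# First write: 744 time steps, Dask-chunked at 480, so the store's Zarr
# chunks along "time" become 480 and the last written chunk is partial.
ds1 = xr.Dataset({"var": ("time", np.zeros(744))}).chunk({"time": 480})
ds1.to_zarr(store, mode="w")

# Append ~700 more steps. The appended Dask chunks start at offset 744,
# so the first one straddles the Zarr chunk boundary at 960.
ds2 = xr.Dataset({"var": ("time", np.zeros(700))}).chunk({"time": 480})

# Per the quoted docs this should succeed on its own, but in the report
# it raises unless safe_chunks=False is also passed:
ds2.to_zarr(store, mode="a", append_dim="time", align_chunks=True)
```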