calling to_zarr inside map_blocks function results in missing values #8703
Comments
Hello, I have the "gut feeling" that calling a method related to persisting data inside of the function passed to Is there a way to try something else, eg removing
and instead calling
I understand the issue regarding the preservation of chunk IDs. I would assume that the original order of chunks is preserved if the Dask chunks are identical to the Zarr chunks? Also, I tested repeated runs, and eventually all the ones are written. It really seems that some chunks are skipped at random with the current code.
Could this be #8371?
Unfortunately, I need to save slices from chunks into different zarr files for my real data.
Yes, it works. But it might be time-consuming. I observed that for 10 chunks I need from 3 to 14 |
This is #8371: you need to specify the correct chunk sizes when creating or writing Using
makes this succeed for me.
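For readers following along, here is a rough way to check whether a set of per-block region writes can collide. It is only a sketch under assumptions: it assumes the missing values come from two blocks partially writing to the same Zarr chunk (the situation discussed in #8371), it looks at a single dimension, and the helper names `region_slices` and `shared_zarr_chunks` are hypothetical, not part of xarray, Dask, or Zarr.

```python
from itertools import accumulate


def region_slices(block_sizes):
    """Return the (start, stop) index range each block covers along one dimension."""
    stops = list(accumulate(block_sizes))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def shared_zarr_chunks(block_sizes, zarr_chunk):
    """Return {zarr_chunk_index: [block ids]} for Zarr chunks touched by more than one block.

    An empty result means every Zarr chunk is written by exactly one block,
    so no two writers ever touch the same chunk.
    """
    touched = {}
    for block_id, (start, stop) in enumerate(region_slices(block_sizes)):
        first = start // zarr_chunk
        last = (stop - 1) // zarr_chunk
        for chunk_id in range(first, last + 1):
            touched.setdefault(chunk_id, []).append(block_id)
    return {c: ids for c, ids in touched.items() if len(ids) > 1}


# Three blocks of 10 elements written into a store with Zarr chunks of 4:
# blocks 0 and 1 both write part of Zarr chunk 2.
print(shared_zarr_chunks([10, 10, 10], 4))  # {2: [0, 1]}

# With Zarr chunks of 5 (or 10), block edges fall on chunk edges.
print(shared_zarr_chunks([10, 10, 10], 5))  # {}
```

If the result is non-empty, choosing Zarr chunk sizes that divide the block sizes evenly when the store is created should make the per-block writes independent of each other.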
Thank you. It works. I may have missed it somewhere, but it would be great if it was mentioned, for example, in https://docs.xarray.dev/en/stable/user-guide/io.html?appending-to-existing-zarr-stores=#appending-to-existing-zarr-stores
We'd happily take a PR for that
What happened?
I want to work with a huge dataset stored in hdf5 loaded in chunks. Each chunk contains part of my data that should be saved to a specific region of zarr files. I need to follow the original order of chunks.
I found it convenient to use a map_blocks function for this purpose. However, I have missing values in the final zarr file. Some chunks or parts of chunks are not stored. I used a simplified scenario for the code documenting this behavior. The initial zarr file of zeros is filled with ones. There are always some parts where there are still zeros.
What did you expect to happen?
No response
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
xarray: 2024.1.1
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: 2024.1.0
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: 0.13.1
numbagg: 0.6.8
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: 0.8.9
numpy_groupies: 0.10.2
setuptools: 69.0.2
pip: 23.3.1
conda: None
pytest: 7.4.4
mypy: None
IPython: None
sphinx: None