@jaicher jaicher commented Aug 26, 2021

This fixes #5741 by loading into memory all variables with zero size (that is, with a zero-length dimension) before saving with `Dataset.to_zarr()`.

Added a test that fails when saving to zarr a dataset with a chunked array
that has a dimension of length zero (Issue pydata#5741).

This addresses Issue pydata#5741 and allows `test_save_emptydim` to pass. We
get around `to_zarr` not liking dask arrays with zero-length dimensions
by giving it numpy arrays, which works for some reason.
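
The workaround described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the code merged in this PR: `load_empty_variables` is a hypothetical helper name, and the example dataset is an assumption modeled on the description of `test_save_emptydim`. It requires dask for `.chunk()`.

```python
import numpy as np
import xarray as xr


def load_empty_variables(ds: xr.Dataset) -> xr.Dataset:
    """Load every zero-size variable into memory, in place (hypothetical helper)."""
    for var in ds.variables.values():
        if var.size == 0:
            # Swap the lazy dask array for an in-memory numpy array.
            var.load()
    return ds


# Assumed example data: "x" has a zero-length dimension "b".
ds = xr.Dataset(
    {"x": (("a", "b"), np.empty((5, 0))), "y": ("a", [1, 2, 5, 8, 9])}
).chunk({})  # every variable becomes dask-backed with one chunk per dimension

ds = load_empty_variables(ds)
# "x" is now numpy-backed, while the non-empty "y" stays lazy.
```

After this step, a call such as `ds.to_zarr(...)` would only ever see numpy arrays for the empty variables, which is the situation the PR description says works.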

github-actions bot commented Aug 26, 2021

Unit Test Results

     6 files        6 suites     56m 5s ⏱️
16 230 tests   14 494 ✔️   1 736 💤   0 failed
90 576 runs    82 396 ✔️   8 180 💤   0 failed

Results for commit 7a4b2c3.

♻️ This comment has been updated with latest results.

@max-sixty
Collaborator

Thanks for the PR @jaicher, it looks good. I'm going to mark it as mergeable, and then we can merge in a couple of days if no one who knows more than me about dask comments.

@max-sixty max-sixty added the "plan to merge" (Final call for comments) label Sep 13, 2021
@max-sixty max-sixty merged commit 97887fd into pydata:main Oct 10, 2021
snowman2 pushed a commit to snowman2/xarray that referenced this pull request Feb 9, 2022
* Added test_save_emptydim for zarr backends, which fails when chunking

Added test that fails when saving to zarr a dataset with a chunked array
that has a dimension of length zero (Issue pydata#5741)

* Load all variables with zero entries before saving to_zarr

This addresses Issue pydata#5741 and allows `test_save_emptydim` to pass. We
get around `to_zarr` not liking dask arrays with zero length dimensions
by giving it numpy arrays, which works for some reason

* Updated whats-new.rst with information about fix for pydata#5741

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
jaicher added a commit to jaicher/xarray that referenced this pull request Mar 23, 2022
If an array has zero size (due to an empty dimension), it is saved as a
single chunk regardless of Dask chunking on other dimensions (pydata#5742).
If the `chunks` parameter is provided for other dimensions when loading
the Zarr file, xarray gives a warning about potentially degraded
performance from splitting the single chunk.

When the array has zero size, this warning seems inappropriate because:

- performance degradation on an empty array should be negligible.
- we don't always know if one of the dimensions is empty until loading.
  I would use the `chunks` parameter for dimensions that have known
  chunksize (to specify some multiple of that chunksize), but this only
  works without warning when the array is nonempty.
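
The behavior argued for above, skipping the chunk-compatibility warning when the array is empty, can be illustrated with a simplified standalone check. This is a sketch under stated assumptions, not xarray's implementation: `warn_if_chunks_split` is a hypothetical function, and "compatible" is assumed to mean that the requested chunk size is a multiple of the stored chunk size.

```python
import math
import warnings


def warn_if_chunks_split(shape, stored_chunks, requested_chunks):
    """Return True if a warning was emitted (simplified, hypothetical check)."""
    if math.prod(shape) == 0:
        # Empty array: there is no data to re-chunk, so skip the check.
        return False
    warned = False
    for axis, (stored, requested) in enumerate(zip(stored_chunks, requested_chunks)):
        # Assumption: a request is compatible when it is a multiple
        # of the stored chunk size along that axis.
        if requested % stored != 0:
            warnings.warn(
                f"requested chunk size {requested} on axis {axis} splits "
                f"stored chunks of size {stored}; performance may degrade",
                UserWarning,
                stacklevel=2,
            )
            warned = True
    return warned


# Empty array stored as a single chunk: no warning, whatever is requested.
warn_if_chunks_split(shape=(0, 100), stored_chunks=(1, 100), requested_chunks=(1, 30))
# Non-empty array with a request that is a multiple of the stored size: no warning.
warn_if_chunks_split(shape=(10, 100), stored_chunks=(10, 25), requested_chunks=(10, 50))
```

The early return is the only part that corresponds to the proposal here; the multiple-of rule is a stand-in for whatever compatibility test the library actually applies.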
dcherian added a commit that referenced this pull request Apr 9, 2022
* Test with warning when loading Zarr with empty dimension with chunks

If an array has zero size (due to an empty dimension), it is saved as a
single chunk regardless of Dask chunking on other dimensions (#5742).
If the `chunks` parameter is provided for other dimensions when loading
the Zarr file, xarray gives a warning about potentially degraded
performance from splitting the single chunk.

When the array has zero size, this warning seems inappropriate because:

- performance degradation on an empty array should be negligible.
- we don't always know if one of the dimensions is empty until loading.
  I would use the `chunks` parameter for dimensions that have known
  chunksize (to specify some multiple of that chunksize), but this only
  works without warning when the array is nonempty.

* Don't check chunk compatibility if variable is empty/has no size

* Docs describing removal of warning for `chunks` with empty array

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Dataset.to_zarr fails on dask array with zero-length dimension (ZeroDivisonError)