Fix saving chunked datasets with zero length dimensions #5742
Merged
Added test that fails when saving to zarr a dataset with a chunked array that has a dimension of length zero (Issue pydata#5741)
This addresses Issue pydata#5741 and allows `test_save_emptydim` to pass. We get around `to_zarr` not liking dask arrays with zero length dimensions by giving it numpy arrays, which works for some reason
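The workaround described in this commit message can be sketched roughly as follows. This is a minimal illustration, not necessarily the code merged in the PR: the function name `load_zero_size_variables` is hypothetical, and the sketch assumes only that the dataset exposes a `variables` mapping whose values have a `size` attribute and an in-place `load()` method, as `xarray.Variable` objects do.

```python
def load_zero_size_variables(dataset):
    """Load every zero-size variable of ``dataset`` into memory, in place.

    Assumes ``dataset.variables`` maps names to variables that expose
    ``.size`` and an in-place ``.load()`` (as ``xarray.Variable`` does).
    The function name is hypothetical and used only for illustration.
    """
    for variable in dataset.variables.values():
        if variable.size == 0:
            # Replace the lazy (dask-backed) array with an in-memory one,
            # so the writer never sees a chunked array with an empty dimension.
            variable.load()
    return dataset
```

With real xarray objects, a call might look like `load_zero_size_variables(ds).to_zarr("out.zarr")`, where the store path is a placeholder. Variables that contain data are left untouched, so they stay lazy and chunked.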
Thanks for the PR @jaicher, it looks good. I'm going to mark it as mergeable, and then we can merge in a couple of days if no one who knows more than me about dask comments.
snowman2 pushed a commit to snowman2/xarray that referenced this pull request Feb 9, 2022

* Added test_save_emptydim for zarr backends, which fails when chunking

  Added test that fails when saving to zarr a dataset with a chunked array that has a dimension of length zero (Issue pydata#5741)

* Load all variables with zero entries before saving to_zarr

  This addresses Issue pydata#5741 and allows `test_save_emptydim` to pass. We get around `to_zarr` not liking dask arrays with zero length dimensions by giving it numpy arrays, which works for some reason

* Updated whats-new.rst with information about fix for pydata#5741

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
jaicher added a commit to jaicher/xarray that referenced this pull request Mar 23, 2022

If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (pydata#5742). If the `chunks` parameter is provided for other dimensions when loading the Zarr file, xarray gives a warning about potentially degraded performance from splitting the single chunk.

When the array has zero size, this warning seems inappropriate because:

- performance degradation on an empty array should be negligible.
- we don't always know if one of the dimensions is empty until loading. I would use the `chunks` parameter for dimensions that have known chunksize (to specify some multiple of that chunksize), but this only works without warning when the array is nonempty.
dcherian added a commit that referenced this pull request Apr 9, 2022

* Test with warning when loading Zarr with empty dimension with chunks

  If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (#5742). If the `chunks` parameter is provided for other dimensions when loading the Zarr file, xarray gives a warning about potentially degraded performance from splitting the single chunk.

  When the array has zero size, this warning seems inappropriate because:

  - performance degradation on an empty array should be negligible.
  - we don't always know if one of the dimensions is empty until loading. I would use the `chunks` parameter for dimensions that have known chunksize (to specify some multiple of that chunksize), but this only works without warning when the array is nonempty.

* Don't check chunk compatibility if variable is empty/has no size

* Docs describing removal of warning for `chunks` with empty array

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
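The idea of skipping the chunk compatibility check for empty variables can be illustrated with a small standalone sketch. This is a simplified stand-in and not xarray's actual implementation: the function name `warn_on_chunk_mismatch`, its arguments, and the "multiple of the stored chunk size" rule are assumptions made for illustration.

```python
import math
import warnings


def warn_on_chunk_mismatch(shape, preferred_chunks, requested_chunks):
    """Warn when requested chunks would split stored chunks, unless the array is empty.

    Each argument is a tuple with one entry per dimension. Hypothetical,
    simplified stand-in for a chunk compatibility check.
    """
    if math.prod(shape) == 0:
        # Empty array: splitting its single stored chunk costs nothing,
        # so no warning is emitted.
        return
    for axis, (preferred, requested) in enumerate(
        zip(preferred_chunks, requested_chunks)
    ):
        if requested % preferred != 0:
            warnings.warn(
                f"requested chunk size {requested} on axis {axis} is not a "
                f"multiple of the stored chunk size {preferred}; "
                "performance may be degraded",
                stacklevel=2,
            )
```

Under these assumptions, calling the function with `shape=(0, 10)` returns silently whatever chunks are requested, while a nonempty shape such as `(4, 10)` still warns when a requested chunk size does not align with the stored one.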
This fixes #5741 by loading to memory all variables with zero length before saving with `Dataset.to_zarr()`.
- Passes `pre-commit run --all-files`
- Changes documented in `whats-new.rst`