Switch default for Zarr reading/writing to consolidated=True? #5251

Closed
shoyer opened this issue May 3, 2021 · 4 comments

Comments

@shoyer
Member

shoyer commented May 3, 2021

Consolidated metadata was a new feature in Zarr v2.3, which was released over two years ago (March 22, 2019).

Since then, I have used consolidated=True every time I've written or opened a Zarr store. As far as I can tell, this is almost always a good idea:

  • With local storage, it usually doesn't really matter. You spend a bit of time writing the consolidated metadata and have one extra file on disk, but the overhead is typically negligible.
  • With cloud object stores or network filesystems, it can matter a great deal. Without consolidated metadata, these systems can be unusably slow for opening datasets. Cloud storage is of course the main use case for Zarr. If you're using a local disk, you might as well stick with single files such as netCDF.
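
To make the cost difference concrete, here is a minimal sketch that counts store reads with and without consolidated metadata. It uses only the standard library and a dict-backed stand-in for an object store; `CountingStore`, `build_store`, and the two reader functions are hypothetical names for illustration, not part of Zarr or Xarray. The `.zmetadata` key and its `"metadata"` layout follow the Zarr v2 consolidated format, but the array metadata is abbreviated and the variable names are passed in rather than discovered by listing the store.

```python
import json


class CountingStore(dict):
    """Dict-backed stand-in for an object store that counts key reads."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        # Each read stands in for one round trip to remote storage.
        self.reads += 1
        return super().__getitem__(key)


def build_store(names):
    """Create a toy Zarr v2 style hierarchy with both kinds of metadata."""
    meta = {".zgroup": {"zarr_format": 2}, ".zattrs": {}}
    for name in names:
        meta[f"{name}/.zarray"] = {"shape": [10], "chunks": [5]}  # abbreviated
        meta[f"{name}/.zattrs"] = {}
    store = CountingStore({key: json.dumps(value) for key, value in meta.items()})
    # Consolidated metadata: every metadata document under a single key.
    store[".zmetadata"] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": meta}
    )
    return store


def read_metadata_unconsolidated(store, names):
    """One read per metadata key: 2 for the group plus 2 per variable."""
    keys = [".zgroup", ".zattrs"]
    for name in names:
        keys += [f"{name}/.zarray", f"{name}/.zattrs"]
    return {key: json.loads(store[key]) for key in keys}


def read_metadata_consolidated(store):
    """A single read, regardless of the number of variables."""
    return json.loads(store[".zmetadata"])["metadata"]
```

With three variables, the non-consolidated reader issues 8 reads and the consolidated reader issues 1. On a local disk that gap is negligible; when each read is a network request, it scales with the number of variables in the dataset.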

I wonder if consolidated metadata is mature enough now that we could consider switching the default behavior in Xarray. From my perspective, this is a big "gotcha" for getting good performance with Zarr. More than one of my colleagues has been unimpressed with the performance of Zarr until they learned to set consolidated=True.

I would suggest doing this in a way that is almost entirely backwards compatible, with only a minor performance cost for reading non-consolidated datasets:

  • to_zarr() switches the default to consolidated=True. The consolidate_metadata() call will thus happen by default.
  • open_zarr() switches the default to consolidated=None, which means "Try reading consolidated metadata, and fall back to non-consolidated if that fails." This will be slightly slower for non-consolidated metadata due to the extra file lookup, but given that opening with non-consolidated metadata already requires a moderately large number of file lookups, I doubt anyone will notice the difference.
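
The three-way `consolidated` behavior proposed above can be sketched roughly as follows. This is a standard-library illustration over a plain dict store, not Xarray's actual implementation; `load_metadata` and `METADATA_NAMES` are hypothetical names, and the only detail taken from Zarr v2 is that consolidated metadata lives under the `.zmetadata` key.

```python
import json

# Per-node metadata documents in a Zarr v2 style hierarchy.
METADATA_NAMES = (".zgroup", ".zarray", ".zattrs")


def load_metadata(store, consolidated=None):
    """Return (metadata, used_consolidated) for a dict-like store.

    consolidated=True  -> require consolidated metadata, raise KeyError if missing
    consolidated=False -> never look for consolidated metadata
    consolidated=None  -> try consolidated first, fall back if it is missing
    """
    if consolidated is not False:
        try:
            raw = store[".zmetadata"]  # the one extra lookup in the fallback case
        except KeyError:
            if consolidated:
                # Explicitly requested, so a missing key is an error.
                raise
        else:
            return json.loads(raw)["metadata"], True

    # Non-consolidated path: read every metadata document individually.
    meta = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in METADATA_NAMES
    }
    return meta, False
```

In this sketch the default costs a non-consolidated store one failed lookup before the usual per-key reads, which is the "slightly slower" case described above. Passing `consolidated=False` skips that lookup, and `consolidated=True` surfaces a missing `.zmetadata` as an error instead of silently falling back.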

CC @rabernat

shoyer changed the title from "Switch default for Zarr reading/writing consolidated=True?" to "Switch default for Zarr reading/writing to consolidated=True?" on May 3, 2021
@shoyer
Member Author

shoyer commented May 4, 2021

I see six 👍 on this issue so I'm going to go ahead and get started :)

@shoyer
Member Author

shoyer commented May 4, 2021

I pushed this change in another commit to #5252.

@hammer

hammer commented Aug 30, 2021

Should this be closed now that #5252 has gone in?

@dcherian
Contributor

Thanks @hammer
