Read performance regression via Store backed by FSMap (fsspec, GCS) #1296
Comments
@ravwojdyla I am very interested in optimizing the performance of this code path. My first recommendation would be to try your test instead with store = zarr.FSStore(...)
np.asarray(zarr.open(store)) rather than |
@rabernat thanks for a quick response. Yes that will bypass the issue the same way reading via |
Ok, thanks for confirming that the problem goes away if you pass an FSStore. We could resolve the regression by automatically promoting an FSMap to an FSStore. We discussed this a bit in #911 (comment). I believe that the only blocker to that idea was resolved in fsspec/filesystem_spec#939. |
@rabernat that sounds great! +1 |
Now is the time when I try to nerd-snipe you into making the PR yourself! 🤓 Basically, we just need to add a block in this function (and also in the equivalent V3 version): Lines 132 to 156 in e7c0eb4
which checks Is this something you think you could take on? We would be happy to help guide you through your first PR to Zarr. We are always looking for new contributors, and it seems like you understand this stuff pretty well already. |
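For readers following along, here is a rough sketch of the kind of check being proposed. It uses stand-in classes rather than the real fsspec FSMap and zarr FSStore so that it runs without either library installed. The names FakeFSMap, FakeFSStore and normalize_store_arg are hypothetical, and the actual function and constructor arguments in Zarr may well differ.

```python
class FakeFSMap(dict):
    """Hypothetical stand-in for an FSMap: a mapping that knows its filesystem and root."""

    def __init__(self, root, fs):
        super().__init__()
        self.root = root
        self.fs = fs


class FakeFSStore(dict):
    """Hypothetical stand-in for an FSStore: same mapping interface plus batched reads."""

    def __init__(self, root, fs):
        super().__init__()
        self.root = root
        self.fs = fs

    def getitems(self, keys):
        # A real store could issue all of these requests concurrently;
        # here we simply return the keys that are present.
        return {k: self[k] for k in keys if k in self}


def normalize_store_arg(store):
    """Promote an FSMap-like mapping to an FSStore-like store; pass anything else through."""
    if isinstance(store, FakeFSMap):
        promoted = FakeFSStore(store.root, store.fs)
        # Copying is only needed because these stand-ins hold their data in memory.
        promoted.update(store)
        return promoted
    return store
```

The point of the promotion is that callers who pass a plain mapping still end up on the code path that has a batched getitems method, instead of falling back to one lookup per key.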
@rabernat well played :P Sure, will give it a try. Thanks for the context ^ |
This was closed by #1304. |
Zarr version
2.11.0 and up (including current main)
Numcodecs version
0.10.2
Python Version
3.10
Operating System
Mac and Linux
Installation
conda, pip and from source
Description
It appears that since #789 (commit 5c71212), i.e. from zarr 2.11.0, there is a performance regression that affects reading zarr data via a Store backed by fsspec/FSMap.
In our test example (in practice we use xarray), we have a zarr array made of 2K files (total 1GB compressed), reading it via:
Looking at the stack traces from the different versions, it looks like 2.10.3 was fetching multiple items asynchronously, while 2.13.3 fetches synchronously, one storage item at a time?
zarr 2.13.3
zarr 2.10.3
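To illustrate why per-item synchronous fetching hurts so much on remote storage, here is a minimal, self-contained simulation using only asyncio. It is not Zarr or fsspec code: the fetch function, the LATENCY value and the key names are all assumptions made up for the illustration.

```python
import asyncio
import time

LATENCY = 0.05  # assumed per-request round trip in seconds (illustrative only)


async def fetch(key):
    """Pretend to fetch one chunk from remote storage."""
    await asyncio.sleep(LATENCY)
    return key, b"data"


async def fetch_sequential(keys):
    # One request at a time: total time grows linearly with the number of keys.
    return dict([await fetch(k) for k in keys])


async def fetch_concurrent(keys):
    # All requests in flight at once: total time is roughly one round trip.
    return dict(await asyncio.gather(*(fetch(k) for k in keys)))


def timed(coro_fn, keys):
    """Run one of the fetch strategies and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(keys))
    return result, time.perf_counter() - start


keys = [f"0.{i}" for i in range(10)]
seq_result, seq_time = timed(fetch_sequential, keys)
con_result, con_time = timed(fetch_concurrent, keys)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With around 2K chunk files, as in the array described above, the gap between the two strategies would be correspondingly larger.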
Steps to reproduce
And we need an existing zarr array to read:
Additional output
No response