Initial Sharding Prototype #1
chunking_test.py
import zarr

store = zarr.DirectoryStore("data/chunking_test.zarr")
z = zarr.zeros((20, 3), chunks=(3, 3), shards=(2, 2), store=store, overwrite=True, compressor=None)
shards is specified in units of chunks?
Yes 👍
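To make the unit explicit: with chunks=(3, 3) and shards=(2, 2), one shard spans 2 × 2 chunks, i.e. 6 × 6 array elements. A minimal sketch of that arithmetic in plain Python follows; the helper name shard_layout is hypothetical and not part of zarr.

```python
import math

def shard_layout(shape, chunks, shards):
    """Return (elements per shard, chunk grid, shard grid) per dimension.

    Hypothetical helper for illustration; `shards` counts chunks per shard.
    """
    elements_per_shard = tuple(c * s for c, s in zip(chunks, shards))
    chunk_grid = tuple(math.ceil(n / c) for n, c in zip(shape, chunks))
    shard_grid = tuple(math.ceil(g / s) for g, s in zip(chunk_grid, shards))
    return elements_per_shard, chunk_grid, shard_grid

layout = shard_layout((20, 3), chunks=(3, 3), shards=(2, 2))
print(layout)  # ((6, 6), (7, 1), (4, 1))
```

So the (20, 3) test array has 7 chunks along the first dimension, which end up in 4 shards, the last one only partially filled.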
Looks very promising to me 👍
Closing this, all further action is happening at the original zarr-developers repo.
Internal only: Please provide any feedback you have. The part below is just the description for the PR on the official zarr repo.
This PR is for an early prototype of sharding support, as described in the corresponding issue TODO. It is mainly meant to be used to discuss the overall implementation approach for sharding. This PR is not meant to be merged.
This prototype
- adds a shards argument that sets how many chunks one shard contains along each dimension (e.g. zarr.zeros((20, 3), chunks=(3, 3), shards=(2, 2), …)). One shard corresponds to one storage key, but can contain multiple chunks.
- persists this setting in the .zarray config and loads it when opening an array again,
- adds a ShardedStore class that is used to wrap the chunk-store when sharding is enabled. This store handles the grouping of multiple chunks into one shard and transparently reads and writes them via the inner store. The original store API does not need to be adapted; it just stores shards instead of chunks, which are translated back to chunks by ShardedStore.
- includes chunking_test.py for demonstration purposes; this will not be part of the final PR.

If the overall direction of this PR is pursued, the following steps (and possibly more) are missing:
- getitems, setitems & delitems on the ShardedStore (also document such optimization possibilities on the Store or BaseStore class)
- use of these methods in Array where possible (e.g. in digest & _resize_nosync)
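
The wrapping idea and the batching step can be illustrated with a toy store. The sketch below is not the prototype's ShardedStore: the class name ToyShardedStore, the "i.j" chunk key format, and the dict-per-shard layout are all assumptions made for illustration. It maps each chunk key to a shard key and, in getitems, groups the requested chunk keys so that every shard is read from the inner store only once.

```python
from collections import defaultdict

class ToyShardedStore:
    """Toy wrapper: the inner store holds shards, callers see chunks.

    Hypothetical illustration only. A shard is stored as a dict
    {chunk_key: bytes} under one key of the inner mapping.
    """

    def __init__(self, inner, shards):
        self.inner = inner      # any dict-like mapping of shard key -> shard
        self.shards = shards    # chunks per shard along each dimension
        self.reads = 0          # number of shard reads, for demonstration

    def _shard_key(self, chunk_key):
        # "5.0" with shards=(2, 2) belongs to shard "2.0".
        idx = (int(i) for i in chunk_key.split("."))
        return ".".join(str(i // s) for i, s in zip(idx, self.shards))

    def _read_shard(self, shard_key):
        self.reads += 1
        return self.inner.get(shard_key, {})

    def __setitem__(self, chunk_key, value):
        # Read-modify-write of the whole shard that contains this chunk.
        shard_key = self._shard_key(chunk_key)
        shard = dict(self._read_shard(shard_key))
        shard[chunk_key] = value
        self.inner[shard_key] = shard

    def __getitem__(self, chunk_key):
        return self._read_shard(self._shard_key(chunk_key))[chunk_key]

    def getitems(self, chunk_keys):
        # Group the requested chunks by shard so each shard is read once.
        by_shard = defaultdict(list)
        for key in chunk_keys:
            by_shard[self._shard_key(key)].append(key)
        result = {}
        for shard_key, keys in by_shard.items():
            shard = self._read_shard(shard_key)
            result.update({k: shard[k] for k in keys if k in shard})
        return result


inner = {}
store = ToyShardedStore(inner, shards=(2, 2))
for i in range(7):                      # 7 chunks along the first dimension
    store[f"{i}.0"] = bytes([i])

print(sorted(inner))                    # ['0.0', '1.0', '2.0', '3.0']
store.reads = 0
chunks = store.getitems(["0.0", "1.0", "2.0"])
print(store.reads)                      # 2
```

Fetching the three chunks one by one through __getitem__ would read a shard three times; the grouped version reads two shards. The same grouping would apply to setitems and delitems.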