Add array storage helpers #2065
@tomwhite let me know if this looks workable for you
Thanks @d-v-b this looks great! I wondered why you deprecated |
my thinking for this is twofold:
does this check out? I'm sorry if the warnings are inconvenient, but I really would like to find a proper expression of v3 semantics on the |
to expand on this: v3 introduces two kinds of chunks, read-chunks and write chunks. the number of read chunks may not equal the number of write chunks. so where we had 1 |
…ods, and they can take an origin kwarg
@d-v-b - First of all, thanks for working on this! I must say I've soured a bit on deprecating some of these interfaces for the 3.0 release unless we have something to replace them. There's nothing to keep us from deprecating these in 3.1 (for example) if we come up with a new interface. What do you think about removing the warnings for now and coming back to this with a new interface down the road?
The number of chunks in an array is something that should always be well-defined.
to expand on this: v3 introduces two kinds of chunks, read-chunks and write chunks. the number of read chunks may not equal the number of write chunks. so where we had 1 nchunks quantity in v2, v3 has two possible answers to nchunks. that's why it is not straightforward to commit to this aspect of the array API.
I understand that we're trying to incorporate sharding here. At the risk of opening up a big can of worms, I think we may be taking this too far. To me it's much easier to think of chunks as the minimal block of data. Beyond that, sharding may allow you to store many chunks in a single object.
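To make the "two possible answers to nchunks" point concrete, here is a minimal sketch. It assumes that "read chunks" are the inner chunks of a shard and "write chunks" are the shards themselves; the helper `count_blocks` and the shapes are hypothetical and are not part of zarr-python.

```python
import math


def count_blocks(shape: tuple, block_shape: tuple) -> int:
    """Number of blocks of `block_shape` needed to cover an array of `shape`.

    Hypothetical helper for illustration only; edge blocks count as whole blocks.
    """
    return math.prod(math.ceil(s / b) for s, b in zip(shape, block_shape))


array_shape = (100, 100)
read_chunk_shape = (10, 10)   # assumed: smallest unit that can be read
write_chunk_shape = (50, 50)  # assumed: smallest unit that can be written (a shard)

n_read = count_blocks(array_shape, read_chunk_shape)
n_write = count_blocks(array_shape, write_chunk_shape)

# Without sharding the two shapes coincide and there is a single answer;
# with sharding the two counts differ.
print(n_read, n_write)  # 100 4
```

Under the "chunks are the minimal block of data" view argued for above, `nchunks` would correspond to `n_read`, and the shard count would be a separate quantity.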
@@ -443,6 +449,55 @@ def basename(self) -> str | None:
            return self.name.split("/")[-1]
        return None

    @property
    @deprecated("AsyncArray.cdata_shape may be removed in an early zarr-python v3 release.")
    def cdata_shape(self) -> ChunkCoords:
I'm curious which of these helpers could migrate to the purview of the chunk grid.
That works for me!
Noted, I will pull back from the brink and make things more v2-ish again :)
…e kwarg to grid iteration; make chunk grid iterators consistent for array and async array
@jhamman take a look when you have time, I think I addressed your concerns.
I should point out that this PR also contains some changes unrelated to the array API, but I think they are useful improvements:
* v3: (21 commits)
  Default zarr.open to open_group if shape is not provided (zarr-developers#2158)
  feat: metadata-only support for storage transformers metadata (zarr-developers#2180)
  fix(async): set default concurrency to 10 tasks (zarr-developers#2256)
  chore(deps): drop support for python 3.10 and numpy 1.24 (zarr-developers#2217)
  feature(store): add LoggingStore wrapper (zarr-developers#2231)
  Apply assorted ruff/flake8-simplify rules (SIM) (zarr-developers#2259)
  Add array storage helpers (zarr-developers#2065)
  Apply ruff/flake8-annotations rule ANN204 (zarr-developers#2258)
  No need to run DeepSource any more - we use ruff (zarr-developers#2261)
  Remove unnecessary lambda expression (zarr-developers#2260)
  Enforce ruff/flake8-comprehensions rules (C4) (zarr-developers#2239)
  Use `map(str, *)` in `test_accessed_chunks` (zarr-developers#2229)
  Replace Gitter with Zulip (zarr-developers#2254)
  Enforce ruff/flake8-pytest-style rules (PT) (zarr-developers#2236)
  Fix multiple identical imports (zarr-developers#2241)
  Enforce ruff/flake8-return rules (RET) (zarr-developers#2237)
  Enforce ruff/flynt rules (FLY) (zarr-developers#2240)
  Fix fill_value handling for complex dtypes (zarr-developers#2200)
  Update V2 codec pipeline to use concrete classes (zarr-developers#2244)
  Apply and enforce more ruff rules (zarr-developers#2053)
  ...
This PR adds `nchunks`, `nbytes`, and `nchunks_initialized` functionality from 2.x.

closes #2027
depends on #2064

details

Adds the following to `array.py`:

- `(AsyncArray / Array).nchunks`: deprecated, the total number of chunks in the array. Exists for 2.xx compatibility.
- `(AsyncArray / Array).cdata_shape`: deprecated, the shape of the chunk grid. Exists for 2.xx compatibility.
- `(AsyncArray / Array).nbytes`: the total number of bytes that the array can store.
- `(AsyncArray / Array)._iter_chunk_coords`: an iterator over tuples of ints which represent positions in the chunk grid.
- `(AsyncArray / Array)._iter_chunk_regions`: an iterator over slices which represent the contiguous array region spanned by each chunk.
- `(AsyncArray / Array)._iter_chunk_keys`: an iterator over strings which represent the paths in storage for all the chunks.
- `chunks_initialized(array)`: a function that takes an array and returns a tuple of the chunk keys for that array that exist in storage. This also has tests.
- `nchunks_initialized(array)`: deprecated, a function that calls `len(chunks_initialized(array))`. This exists for 2.xx compatibility.

All of the above `_iter_chunk_*` methods should be considered private and provisional. I added them because their functionality is valuable, but eventually I think we will have a better array API that renders these methods obsolete. If we think these are cluttering the array API, I'd be happy splitting them off into stand-alone functions.

Adds `iter_grid` to `indexing.py`. This just provides lexicographic iteration over the elements of a bounded N-dimensional, positive grid (e.g., a grid of chunks).

TODO:
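For readers who want a feel for the iteration helpers described above, the following is a minimal sketch and not the code from this PR. The names `iter_grid_sketch`, `chunk_grid_shape`, `iter_chunk_regions_sketch`, and `chunk_key_sketch` are hypothetical. Clipping edge regions to the array bounds is an illustrative choice, and the key format assumes the Zarr v3 default chunk key encoding (`c` prefix, `/` separator).

```python
from __future__ import annotations

import itertools
import math
from collections.abc import Iterator


def iter_grid_sketch(
    grid_shape: tuple[int, ...], origin: tuple[int, ...] | None = None
) -> Iterator[tuple[int, ...]]:
    """Yield grid positions in lexicographic order (last axis varies fastest)."""
    if origin is None:
        origin = (0,) * len(grid_shape)
    if len(origin) != len(grid_shape):
        raise ValueError("origin and grid_shape must have the same length")
    # Along each axis, start at the origin and stop at the grid bound.
    ranges = [range(o, n) for o, n in zip(origin, grid_shape)]
    yield from itertools.product(*ranges)


def chunk_grid_shape(shape: tuple[int, ...], chunks: tuple[int, ...]) -> tuple[int, ...]:
    """Shape of the chunk grid: ceil-divide each array dimension by the chunk size."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def iter_chunk_regions_sketch(
    shape: tuple[int, ...], chunks: tuple[int, ...]
) -> Iterator[tuple[slice, ...]]:
    """Yield one tuple of slices per chunk, clipped to the array bounds (assumption)."""
    for coords in iter_grid_sketch(chunk_grid_shape(shape, chunks)):
        yield tuple(
            slice(i * c, min((i + 1) * c, s)) for i, c, s in zip(coords, chunks, shape)
        )


def chunk_key_sketch(coords: tuple[int, ...]) -> str:
    """Storage key for a chunk, assuming the v3 default encoding."""
    return "/".join(["c", *map(str, coords)])


grid = chunk_grid_shape((10, 7), (4, 4))
print(grid)                                  # (3, 2)
print(list(iter_grid_sketch((2, 3))))        # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print([chunk_key_sketch(c) for c in iter_grid_sketch((1, 2))])  # ['c/0/0', 'c/0/1']
```

In this sketch the total number of chunks is simply `math.prod(grid)`, and a `chunks_initialized`-style check would amount to filtering the generated keys by whether they exist in the store.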