API for direct block access

There are internal methods for retrieving individual blocks, but there are circumstances where addressing data one block at a time is helpful for end users. Exposing this publicly would save users from building their own pipeline of chunk size -> block indices -> slicing, only for zarr to then go from those slices back to block indices again internally.
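For reference, the user-side pipeline looks roughly like this when written out by hand. This is a minimal sketch using only the standard library. `chunk_grid_shape`, `chunk_slices`, and `iter_chunk_idxs` are hypothetical helper names, and `shape`/`chunks` would come from something like `arr.shape` and `arr.chunks` on a zarr array.

```python
import itertools
import math
from typing import Iterator, Tuple


def chunk_grid_shape(shape: Tuple[int, ...], chunks: Tuple[int, ...]) -> Tuple[int, ...]:
    # Number of chunks along each dimension; the last chunk may be partial.
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_slices(
    shape: Tuple[int, ...], chunks: Tuple[int, ...], chunk_idx: Tuple[int, ...]
) -> Tuple[slice, ...]:
    # Translate a chunk index into the array region it covers,
    # clipping the stop of edge chunks to the array bounds.
    return tuple(
        slice(i * c, min((i + 1) * c, s)) for i, c, s in zip(chunk_idx, chunks, shape)
    )


def iter_chunk_idxs(
    shape: Tuple[int, ...], chunks: Tuple[int, ...]
) -> Iterator[Tuple[int, ...]]:
    # Every chunk index in C (row-major) order.
    return itertools.product(*(range(n) for n in chunk_grid_shape(shape, chunks)))
```

For a `(10, 7)` array with `(4, 4)` chunks this gives a `(3, 2)` chunk grid, and chunk `(2, 1)` maps to the clipped edge region `(slice(8, 10), slice(4, 7))`. This is exactly the bookkeeping zarr already does internally.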

I envision something like:

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

import numpy as np


@dataclass
class ChunkWrapper:
    chunk_idx: Tuple[int, ...]
    chunk_slice: Tuple[slice, ...]  # or an offset-shape pair, or a start-stop pair
    data: np.ndarray


class Array:
    ...

    def get_chunk(self, chunk_idx: Tuple[int, ...]) -> ChunkWrapper:
        ...

    def set_chunk(self, chunk_idx: Tuple[int, ...], data: np.ndarray) -> None:
        # check data is the right shape, handling edge blocks
        ...

    def iter_chunk_idxs(self) -> Iterator[Tuple[int, ...]]:
        # yield every chunk index, suitable for passing to get_chunk/set_chunk
        ...
```
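To make the proposed semantics concrete, here is a minimal in-memory sketch, assuming a single NumPy array as the backing store. `InMemoryChunkedArray` is hypothetical and not part of zarr; a real implementation would read and write encoded chunks from the store rather than slicing a NumPy array.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Iterator, Tuple

import numpy as np


@dataclass
class ChunkWrapper:
    chunk_idx: Tuple[int, ...]
    chunk_slice: Tuple[slice, ...]
    data: np.ndarray


class InMemoryChunkedArray:
    """Hypothetical chunked array backed by one NumPy array (not zarr API)."""

    def __init__(self, data: np.ndarray, chunks: Tuple[int, ...]):
        if len(chunks) != data.ndim:
            raise ValueError("chunks must have one entry per dimension")
        self._data = data
        self.chunks = tuple(chunks)

    @property
    def shape(self) -> Tuple[int, ...]:
        return self._data.shape

    @property
    def grid_shape(self) -> Tuple[int, ...]:
        # Number of chunks per dimension; edge chunks may be partial.
        return tuple(math.ceil(s / c) for s, c in zip(self.shape, self.chunks))

    def _chunk_slice(self, chunk_idx: Tuple[int, ...]) -> Tuple[slice, ...]:
        grid = self.grid_shape
        if len(chunk_idx) != len(grid) or not all(
            0 <= i < n for i, n in zip(chunk_idx, grid)
        ):
            raise IndexError(f"chunk index {chunk_idx} out of range for grid {grid}")
        # Clip the stop of edge chunks to the array bounds.
        return tuple(
            slice(i * c, min((i + 1) * c, s))
            for i, c, s in zip(chunk_idx, self.chunks, self.shape)
        )

    def get_chunk(self, chunk_idx: Tuple[int, ...]) -> ChunkWrapper:
        sl = self._chunk_slice(chunk_idx)
        # Return a copy so edits to .data do not silently write through.
        return ChunkWrapper(tuple(chunk_idx), sl, self._data[sl].copy())

    def set_chunk(self, chunk_idx: Tuple[int, ...], data: np.ndarray) -> None:
        sl = self._chunk_slice(chunk_idx)
        expected = tuple(s.stop - s.start for s in sl)
        # Edge chunks must match their clipped shape, not the nominal chunk shape.
        if data.shape != expected:
            raise ValueError(f"chunk {chunk_idx} expects shape {expected}, got {data.shape}")
        self._data[sl] = data

    def iter_chunk_idxs(self) -> Iterator[Tuple[int, ...]]:
        # All chunk indices in C (row-major) order.
        return itertools.product(*(range(n) for n in self.grid_shape))


# The blockwise loop from the proposal, run against the stand-in.
arr = InMemoryChunkedArray(np.arange(70).reshape(10, 7), chunks=(4, 4))
for idx in arr.iter_chunk_idxs():
    chunk = arr.get_chunk(idx)
    arr.set_chunk(idx, chunk.data * 2)
```

Returning a copy from `get_chunk` is a design choice here: it keeps writes explicit through `set_chunk`, which mirrors how a store-backed array would have to behave anyway.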

Then e.g. a blockwise operation could be trivially implemented with

```python
for idx in my_array.iter_chunk_idxs():
    chunk = my_array.get_chunk(idx)
    my_array.set_chunk(idx, chunk.data * 2)
```

Obviously, in this particular case you could use dask, but the principle is useful elsewhere. My use case is an array of labels that I want to relate to point annotations: I want to get a chunk, see which point annotations fall inside it, and find the relationships, preferably without chunk-mangling boilerplate :grin:
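For the labels/points case, once a chunk's data and slice are in hand, finding the annotations inside it and the label under each takes a few lines of NumPy. This is a sketch under two assumptions: points are an `(N, ndim)` array of coordinates in the array's index space, and a point belongs to the voxel obtained by flooring its coordinates. `relate_points_to_labels` is a hypothetical name.

```python
from typing import Dict, Tuple

import numpy as np


def relate_points_to_labels(
    chunk_data: np.ndarray, chunk_slice: Tuple[slice, ...], points: np.ndarray
) -> Dict[int, int]:
    """Map the index of each point inside the chunk to the label under it."""
    lo = np.array([s.start for s in chunk_slice])
    hi = np.array([s.stop for s in chunk_slice])
    # A point is in the chunk if lo <= coord < hi in every dimension.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    point_ids = np.nonzero(inside)[0]
    # Convert global coordinates to voxel indices local to this chunk.
    local = np.floor(points[point_ids]).astype(int) - lo
    labels = chunk_data[tuple(local.T)]
    return {int(i): int(label) for i, label in zip(point_ids, labels)}


labels = np.zeros((10, 7), dtype=int)
labels[8:10, 4:7] = 5
labels[9, 5] = 7
sl = (slice(8, 10), slice(4, 7))
points = np.array([[9.2, 5.5], [1.0, 1.0], [8.0, 6.9]])
result = relate_points_to_labels(labels[sl], sl, points)
print(result)  # {0: 7, 2: 5}
```

With a public chunk API, `labels[sl]` and `sl` would come straight from `get_chunk(idx).data` and `get_chunk(idx).chunk_slice`.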

This gives tools that implement their own parallelism (dask is one example, but many others are imaginable) much easier access to the blocked nature of the underlying arrays.
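As an illustration of a tool doing its own parallelism, a thread pool can fan out over chunk indices. This sketch uses a NumPy array as a stand-in for the chunked array, and `chunk_slices`/`blockwise_parallel` are hypothetical helpers; with the proposed API each task would call `get_chunk` and `set_chunk` instead. Because every task touches only its own non-overlapping chunk, the sketch needs no locking.

```python
import itertools
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

import numpy as np


def chunk_slices(
    shape: Tuple[int, ...], chunks: Tuple[int, ...], chunk_idx: Tuple[int, ...]
) -> Tuple[slice, ...]:
    # Region covered by one chunk, clipped at the array edge.
    return tuple(
        slice(i * c, min((i + 1) * c, s)) for i, c, s in zip(chunk_idx, chunks, shape)
    )


def blockwise_parallel(
    data: np.ndarray,
    chunks: Tuple[int, ...],
    func: Callable[[np.ndarray], np.ndarray],
    max_workers: int = 4,
) -> List[Tuple[int, ...]]:
    grid = tuple(math.ceil(s / c) for s, c in zip(data.shape, chunks))
    chunk_idxs = list(itertools.product(*(range(n) for n in grid)))

    def process(chunk_idx: Tuple[int, ...]) -> Tuple[int, ...]:
        # Read one chunk, transform it, and write it back to the same region.
        sl = chunk_slices(data.shape, chunks, chunk_idx)
        data[sl] = func(data[sl])
        return chunk_idx

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results list chunks in C order.
        return list(pool.map(process, chunk_idxs))


data = np.arange(70, dtype=float).reshape(10, 7)
done = blockwise_parallel(data, (4, 4), lambda block: block * 2)
```

Threads fit this sketch because it shares one in-memory array; since a chunk index is just a small tuple, a tool could equally dispatch indices to processes or remote workers.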

API for direct block access #543

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

API for direct block access #543

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions