Description
There are internal methods for retrieving individual blocks, but in some circumstances addressing data one block at a time is helpful for end users too. A public API for this would save users from building their own pipeline of chunk size -> block indices -> slicing, only for zarr to then go slicing -> block indices etc. again.
I envision something like

    from dataclasses import dataclass
    from typing import Iterator, Tuple

    import numpy as np

    @dataclass
    class ChunkWrapper:
        chunk_idx: Tuple[int, ...]
        chunk_slice: Tuple[slice, ...]  # or an offset-shape pair, or a start-stop pair
        data: np.ndarray

    class Array:
        ...

        def get_chunk(self, chunk_idx: Tuple[int, ...]) -> ChunkWrapper:
            ...

        def set_chunk(self, chunk_idx: Tuple[int, ...], data: np.ndarray) -> None:
            # check data is the right shape, handling edge blocks
            ...

        def iter_chunk_idxs(self) -> Iterator[Tuple[int, ...]]:
            # yield the index of each chunk in the array
            ...

Then e.g. a blockwise operation could be trivially implemented with

    for idx in my_array.iter_chunk_idxs():
        chunk = my_array.get_chunk(idx)
        my_array.set_chunk(idx, chunk.data * 2)

Obviously in this particular case, you could use dask, but the principle is useful elsewhere. My use case is that I have an array of labels which I want to relate to point annotations: I want to get a chunk, see which point annotations exist inside it, and find the relationships, preferably without chunk-mangling boilerplate 😁
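For illustration, here is roughly the boilerplate that the label/point use case needs today. This is a minimal sketch, assuming only that the array exposes `shape` and a chunk shape and supports slice indexing. The helper names `chunk_slices` and `points_in_chunk` are hypothetical, and a plain NumPy array with a hand-picked chunk shape stands in for the zarr array.

```python
from typing import Tuple

import numpy as np


def chunk_slices(
    chunk_idx: Tuple[int, ...],
    shape: Tuple[int, ...],
    chunks: Tuple[int, ...],
) -> Tuple[slice, ...]:
    """Map a chunk index to array slices, clipping edge chunks to the array shape."""
    return tuple(
        slice(i * c, min((i + 1) * c, s))
        for i, c, s in zip(chunk_idx, chunks, shape)
    )


def points_in_chunk(points: np.ndarray, slices: Tuple[slice, ...]) -> np.ndarray:
    """Return a boolean mask of the points (N x ndim) that fall inside the chunk."""
    starts = np.array([s.start for s in slices])
    stops = np.array([s.stop for s in slices])
    return np.all((points >= starts) & (points < stops), axis=1)


# Stand-in data (assumed): a 5x5 label array with a chunk shape of (2, 2).
labels = np.arange(25).reshape(5, 5)
chunks = (2, 2)
points = np.array([[0.5, 0.5], [4.2, 4.9], [2.0, 3.5]])

# Chunk (2, 2) is the bottom-right edge chunk, which covers a single element.
slices = chunk_slices((2, 2), labels.shape, chunks)
mask = points_in_chunk(points, slices)
block = labels[slices]

# Look up the label under each point by shifting into chunk-local coordinates.
local = (points[mask] - [s.start for s in slices]).astype(int)
labels_at_points = block[tuple(local.T)]
```

With the proposed `get_chunk`, the slice arithmetic in `chunk_slices` would come back as `chunk_slice` on the returned wrapper, leaving only the point lookup to the user.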
A chunk-level API like this would give tools that implement their own parallelism (dask being one example, but many others are imaginable) much easier access to the blocked nature of the underlying arrays.
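As a rough illustration of that point, the sketch below walks the chunk indices and hands each chunk to a standard-library thread pool. It assumes only a `shape`, a chunk shape, and slice-based get/set on the array. `iter_chunk_idxs`, `chunk_slices` and `double_chunk` are hypothetical free functions rather than existing zarr API, and a NumPy array stands in for the zarr array.

```python
import itertools
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Tuple

import numpy as np


def iter_chunk_idxs(
    shape: Tuple[int, ...], chunks: Tuple[int, ...]
) -> Iterator[Tuple[int, ...]]:
    """Yield every chunk index, counting partial edge chunks as whole chunks."""
    counts = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    return itertools.product(*(range(n) for n in counts))


def chunk_slices(
    chunk_idx: Tuple[int, ...], shape: Tuple[int, ...], chunks: Tuple[int, ...]
) -> Tuple[slice, ...]:
    """Map a chunk index to array slices, clipping edge chunks to the array shape."""
    return tuple(
        slice(i * c, min((i + 1) * c, s))
        for i, c, s in zip(chunk_idx, chunks, shape)
    )


# Stand-in data (assumed): a 5x5 array with a chunk shape of (2, 2).
data = np.arange(25).reshape(5, 5)
chunks = (2, 2)


def double_chunk(chunk_idx: Tuple[int, ...]) -> None:
    """Read one chunk, double it, and write it back to the same region."""
    sl = chunk_slices(chunk_idx, data.shape, chunks)
    data[sl] = data[sl] * 2


with ThreadPoolExecutor(max_workers=4) as pool:
    # Consume the map so that any exception raised in a worker is re-raised here.
    list(pool.map(double_chunk, iter_chunk_idxs(data.shape, chunks)))
```

Each task touches exactly one chunk, so the written regions do not overlap. Whether a particular store tolerates concurrent writers is a separate question that this sketch does not address.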