v3 stores: Implement efficient get/set_partial_values #1106

Closed
jstriebel opened this issue Aug 3, 2022 · 3 comments

@jstriebel
Member

In #1096, get/set_partial_values methods were introduced to Zarr v3 stores. The provided implementation is a viable fallback for stores that cannot read and write partial objects. Other stores, however, such as fsspec-based stores (using read_block), should implement optimized methods. It might be useful for stores to indicate whether they have fast partial read/write methods, so that strategies such as partial decompression can be selected automatically.
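
To make the distinction concrete, here is a minimal sketch of a fallback versus an optimized partial read. Everything in it is an assumption for illustration: the class names, the `supports_efficient_partial_reads` flag, and the `(key, (start, length))` range format are hypothetical and not taken from the Zarr code base, and a plain file `seek` stands in for something like fsspec's `read_block`. Start offsets are assumed to be non-negative.

```python
import os
from typing import Dict, List, Optional, Tuple

# Hypothetical range format: (key, (start, length)); length=None reads to the end.
KeyRange = Tuple[str, Tuple[int, Optional[int]]]


class FallbackStore:
    """Toy in-memory store whose partial reads fetch the whole value first."""

    # Hypothetical capability flag that callers could inspect.
    supports_efficient_partial_reads = False

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def __setitem__(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    def __getitem__(self, key: str) -> bytes:
        return self._data[key]

    def get_partial_values(self, key_ranges: List[KeyRange]) -> List[bytes]:
        out = []
        for key, (start, length) in key_ranges:
            value = self[key]  # the full object is loaded, then sliced
            stop = None if length is None else start + length
            out.append(value[start:stop])
        return out


class FileStore(FallbackStore):
    """Toy directory-backed store that only reads the requested bytes."""

    supports_efficient_partial_reads = True

    def __init__(self, root: str) -> None:
        self._root = root

    def _path(self, key: str) -> str:
        return os.path.join(self._root, key)

    def __setitem__(self, key: str, value: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(value)

    def __getitem__(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()

    def get_partial_values(self, key_ranges: List[KeyRange]) -> List[bytes]:
        out = []
        for key, (start, length) in key_ranges:
            with open(self._path(key), "rb") as f:
                f.seek(start)  # skip directly to the requested offset
                out.append(f.read() if length is None else f.read(length))
        return out
```

A caller could then check the flag before choosing a strategy, for example only attempting partial decompression when `store.supports_efficient_partial_reads` is true.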

As a follow-up, the new get/set_partial_values methods could be used for the actual partial decompression in the PartialReadBuffer, instead of the current store-specific implementation.

Follow-up to #1096

@martindurant
Member

I believe that the most important use case for this is actually uncompressed arrays! That's a much simpler code path and needs no partial-reader (also happens to be the only one important to me for now).

How are you proposing that get_partial_buffer should be called? At the moment in (v2) Array._get_selection we iterate over the selections for each chunk, so we have the information right before handing off to the store.

@jstriebel
Member Author

> I believe that the most important use case for this is actually uncompressed arrays! That's a much simpler code path and needs no partial-reader (also happens to be the only one important to me for now).

Indeed, that's a great use-case!

> How are you proposing that get_partial_buffer should be called? At the moment in (v2) Array._get_selection we iterate over the selections for each chunk, so we have the information right before handing off to the store.

I'll try to dump my thoughts about them:

I guess there are at least two ways:

  • Simply store the partial data as-is and pass it on,
  • use something like the PartialReadBuffer, or extend it, so that the partial data has a similar interface to the whole chunk.

To solve this more holistically, the compressor (or a dummy for uncompressed arrays) should be able to tell whether it can decode partial data, and should have some interface for "demanding" data. In the uncompressed use case, the requested array indices can be translated directly to chunk offsets, but in the blosc case, or other cases with an index, the decoder might need to read data in several passes (e.g. first getting some index, then getting the actual data based on that index). For such cases, the PartialReadBuffer is a nice abstraction that allows reloading data in several passes, depending on the decoder. If the pattern is always to optionally fetch some data upfront for a chunk, after which the decoder can translate indices to offsets, this might also be a viable option.
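
For the uncompressed case, the index-to-offset translation can be sketched as follows. This is only an illustration under stated assumptions: a 2D chunk stored in C (row-major) order with a fixed item size, and a hypothetical helper `byte_ranges_2d` that is not part of Zarr. It returns `(start, length)` byte ranges that could be passed to a partial read.

```python
import struct
from typing import List, Tuple


def byte_ranges_2d(
    chunk_shape: Tuple[int, int],
    itemsize: int,
    rows: Tuple[int, int],
    cols: Tuple[int, int],
) -> List[Tuple[int, int]]:
    """Byte ranges covering chunk[rows[0]:rows[1], cols[0]:cols[1]].

    Assumes an uncompressed, C-ordered 2D chunk (hypothetical helper).
    """
    _, n_cols = chunk_shape
    r0, r1 = rows
    c0, c1 = cols
    if c0 == 0 and c1 == n_cols:
        # Full rows are contiguous in C order, so one range is enough.
        return [(r0 * n_cols * itemsize, (r1 - r0) * n_cols * itemsize)]
    # Otherwise each selected row contributes its own contiguous range.
    return [
        ((r * n_cols + c0) * itemsize, (c1 - c0) * itemsize)
        for r in range(r0, r1)
    ]


# A 3x4 chunk of little-endian int32 values 0..11, stored uncompressed.
chunk_bytes = struct.pack("<12i", *range(12))

# Select rows 1..2 and columns 1..2, reading only the bytes that are needed.
selected = []
for start, length in byte_ranges_2d((3, 4), 4, (1, 3), (1, 3)):
    part = chunk_bytes[start:start + length]
    selected.append(list(struct.unpack(f"<{length // 4}i", part)))
```

With a compressor that keeps an internal index, this single translation step would not be enough: the decoder would first have to request the index bytes and only then compute the data ranges, which is the multi-pass pattern described above.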

PS: First, we still need to implement efficient get/set_partial_values for stores where this is possible, to gain anything from it.

@jhamman
Member

jhamman commented Jul 1, 2024

This is done now on the v3 branch.

@jhamman closed this as completed on Jul 1, 2024
@github-project-automation bot moved this from Todo to Done in Zarr-Python - 3.0 on Jul 1, 2024