v3 array creation: codecs #1943

Open
d-v-b opened this issue Jun 2, 2024 · 0 comments


d-v-b commented Jun 2, 2024

One thing about the zarr v2 -> v3 transition that might surprise users is the change from the v2 compressor metadata (a single codec) + filters (an ordered collection) to the v3 codecs metadata (a single ordered collection with one required element, the array-bytes codec).

I suspect most users coming from v2 won't use array-array or bytes-bytes codecs. These users will think in terms of a single compressor for their data, if they worry about the compressor at all. For such users, the codecs keyword argument in v3 array creation will be confusing, because a) it's not called "compressor", and b) it's an iterable. Users who do use filters will wonder where the filters keyword argument went, and they will have to discover that their filters are now called "codecs", and that these codecs must be placed before the thing that used to be called the compressor.
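To make the difference concrete, here is a rough side-by-side of the two metadata shapes, written as Python dicts. The particular codecs and settings are arbitrary examples chosen for illustration, not taken from any real array.

```python
# Illustrative metadata fragments only; the codecs and settings are arbitrary.

# v2 (.zarray): a single compressor plus an ordered list of filters.
v2_fragment = {
    "filters": [{"id": "delta", "dtype": "<i4", "astype": "<i4"}],
    "compressor": {"id": "gzip", "level": 5},
}

# v3 (zarr.json): one ordered list. The array-bytes codec ("bytes" here) is the
# required element; array-array codecs come before it and bytes-bytes codecs
# come after it.
v3_fragment = {
    "codecs": [
        {"name": "transpose", "configuration": {"order": [1, 0]}},
        {"name": "bytes", "configuration": {"endian": "little"}},
        {"name": "gzip", "configuration": {"level": 5}},
    ],
}
```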

I wonder if we could smooth out some of this confusion by adding an abstraction on top of the v3 codecs metadata in our array creation routines, and returning to v2 terminology. Specifically, we could use the keyword "filters" to denote array-array codecs, "compressor" to denote the required array-bytes codec, and introduce a new, v3-only keyword "post_compressor" to denote any bytes-bytes codecs. I'm not wedded to this name; feel free to suggest something better.

It would be an error to request a v2 array with a post_compressor; otherwise, the exact same keywords would work for both v2 and v3 array creation routines. Ergonomically this feels like an improvement, and it would simplify today's chimeric AsyncArray.create function, which is burdened with supporting the mutually exclusive codecs and compressor / filters keyword arguments.

e.g.

def create(
  shape, 
  dtype, 
  filters: Iterable[ArrayArrayCodec], 
  compressor: ArrayBytesCodec, 
  post_compressor: Iterable[BytesBytesCodec], 
  zarr_format, ...) -> AsyncArray
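
A minimal sketch of how those three keywords could be flattened into the single v3 codecs sequence, including the v2 error case. The codec classes below are stand-ins defined only for this example (they are not the zarr-python classes), and resolve_codecs is a hypothetical helper name, not an existing function.

```python
from dataclasses import dataclass
from typing import Iterable


# Stand-in codec types for illustration only; NOT the zarr-python classes.
@dataclass(frozen=True)
class Codec:
    name: str


class ArrayArrayCodec(Codec): ...
class ArrayBytesCodec(Codec): ...
class BytesBytesCodec(Codec): ...


def resolve_codecs(
    compressor: ArrayBytesCodec,
    filters: Iterable[ArrayArrayCodec] = (),
    post_compressor: Iterable[BytesBytesCodec] = (),
    zarr_format: int = 3,
) -> tuple[Codec, ...]:
    """Hypothetical helper: order the three keywords as filters, compressor, post_compressor."""
    post = tuple(post_compressor)
    if zarr_format == 2 and post:
        # v2 metadata has no slot for bytes-bytes codecs after the compressor.
        raise ValueError("post_compressor is not supported for zarr_format=2")
    # For v2 a real implementation would keep filters and compressor separate;
    # this sketch only shows the ordering and the error check.
    return (*filters, compressor, *post)


codecs = resolve_codecs(
    compressor=ArrayBytesCodec("bytes"),
    filters=[ArrayArrayCodec("transpose")],
    post_compressor=[BytesBytesCodec("gzip")],
)
print([c.name for c in codecs])  # ['transpose', 'bytes', 'gzip']
```

Users who only care about a single compressor would pass just compressor and never see the iterable.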

Thoughts? Especially from people kicking the tires on the v3 array API (@rabernat).
