[v3] Elevate codec pipeline #1932

d-v-b · 2024-05-31T12:43:12Z

This PR moves the CodecPipeline data structure off the ArrayMetadata classes and instead localizes it higher in the stack.

design for array metadata classes

In zarr-python, we have ArrayMetadata classes that are expressly designed to model the contents of zarr metadata, e.g. zarr.json or .zarray (and these classes should not do anything else). Designing the ArrayMetadata classes around this goal is very deliberate, largely due to lessons we learned in the v2 codebase, where the zarr.Array API was mixed together with the a model of the .zarray JSON document, and this led to a variety of problems.

we are breaking that design in v3

With that in mind, note that the v3 spec defines that the codecs attribute of zarr.json is a JSON array of Codec objects. But in the current version of the v3 branch, ArrayV3Metadata.codecs has the type CodecPipeline, but CodecPipeline is not a collection of Codec objects; instead, it's an object with a lot of attributes and methods. Putting such an object under the ArrayV3Metadata.codecs key violates the design principle of the ArrayMetadata class.

one way to fix it

But the fix is relatively simple: we remove CodecPipeline from ArrayV3Metadata and instead push the responsibility for creating instances of that class to objects that consume ArrayMetadata. That's what this PR does.

Specifically, I make CodecPipeline an attribute of AsyncArray; that attribute is created on initialization, by calling a function on the metadata.codecs attribute. Sharding codecs also use CodecPipeline; in this PR, they create the CodecPipeline when needed instead of keeping it as an attribute. If we don't like this, we could consider adding a _codec_pipeline attribute to the sharding codecs, as long as it's clear that this attribute is not part of the JSON serialized form of the codec.

I made a few other changes in the process, including changing the signature of the Codec.validate method to explicitly take the parameters that it needs (shape, dtype, chunk_shape), instead of an ArraySpec object that includes extra fields (specifically, ArraySpec.attributes) that no codec will ever need for validation. If that's unpopular in this PR we can spin it out into its own PR or just not do it, but I think this change is an improvement.

@normanrz this touches a lot of code you authored so I will be very interested in getting your feedback here

…e it dynamically

d-v-b · 2024-05-31T12:45:52Z

the changes here would address some of the problems described in #1913

normanrz

I am not convinced that the class structure needs to match the json structure one-to-one. That is why we have the overridable from_dict and to_dict classes. We should be pragmatic, in particular, in places where we can make the user API more convenient.

I am fine with the changes to the CodecPipeline and can live with the changed validate signature. I am not happy with forcing the users to supply typesize and shuffle for the BloscCodec.

src/zarr/codecs/blosc.py

src/zarr/abc/codec.py

src/zarr/codecs/blosc.py

…erential function call

d-v-b · 2024-06-04T17:53:54Z

@normanrz i reverted the changes to the blosc codec

edit: feel free to ignore this PR until I fix the merge conflicts

…nto elevate_codec_pipeline

d-v-b · 2024-06-23T15:58:45Z

the pre-commit action is flagging a mypy error that a) I can't replicate locally and b) is in a file that this PR did not touch, so I'm going to merge this despite the pre-commit failing.

* v3: (22 commits) [v3] `Buffer` ensure correct subclass based on the `BufferPrototype` argument (zarr-developers#1974) Fix doc build (zarr-developers#1987) Fix doc build warnings (zarr-developers#1985) Automatically generate API reference docs (zarr-developers#1918) Update `RemoteStore.__str__` and add UPath tests (zarr-developers#1964) [v3] Elevate codec pipeline (zarr-developers#1932) 0 dim arrays: indexing (zarr-developers#1980) `parse_shapelike` allows 0 (zarr-developers#1979) Clean up typing and docs for indexing (zarr-developers#1961) add json indentation to config (zarr-developers#1952) chore: update pre-commit hooks (zarr-developers#1973) Bump pypa/gh-action-pypi-publish in the actions group (zarr-developers#1969) chore: update pre-commit hooks (zarr-developers#1957) Update release.rst (zarr-developers#1960) doc: update release notes for 3.0.0.alpha (zarr-developers#1959) Basic working FsspecStore (zarr-developers#1785) Feature: Top level V3 API (zarr-developers#1884) Buffer Prototype Argument (zarr-developers#1910) Create issue-metrics.yml fixes bug in transpose (zarr-developers#1949) ...

d-v-b added 2 commits May 30, 2024 18:44

initial work toward pushing codecpipeline higher in the stacK

e78df06

remove codecpipeline from metadata, add it to AsyncArray and or creat…

740af0d

…e it dynamically

d-v-b added the V3 label May 31, 2024

normanrz requested changes May 31, 2024

View reviewed changes

src/zarr/codecs/blosc.py Outdated Show resolved Hide resolved

src/zarr/abc/codec.py Outdated Show resolved Hide resolved

src/zarr/codecs/blosc.py Outdated Show resolved Hide resolved

d-v-b added 5 commits May 31, 2024 17:45

revert changes to blosc.py

e3ced40

revert changes to test_codecs.py

6cad24f

consistent expanded function signature for evolve_from_array_spec

29f907b

restore wider function signature for codec.validate to avoid self-ref…

50ecc7a

…erential function call

remove commented code block

54dc2e8

d-v-b marked this pull request as ready for review May 31, 2024 17:03

d-v-b requested a review from normanrz June 4, 2024 17:53

d-v-b added 5 commits June 12, 2024 09:52

Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…

6755488

…nto elevate_codec_pipeline

make codec_pipeline a cached property of sharding codec

2d5a84e

cached_property -> vanilla property

99797c2

Merge branch 'v3' into elevate_codec_pipeline

4a14f34

Merge branch 'v3' into elevate_codec_pipeline

5dfae4f

d-v-b requested a review from jhamman June 12, 2024 20:02

normanrz approved these changes Jun 14, 2024

View reviewed changes

Merge branch 'v3' of https://github.com/zarr-developers/zarr-python i…

a5e89f2

…nto elevate_codec_pipeline

d-v-b merged commit 8aadd15 into zarr-developers:v3 Jun 23, 2024
17 of 18 checks passed

d-v-b deleted the elevate_codec_pipeline branch June 23, 2024 15:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[v3] Elevate codec pipeline #1932

[v3] Elevate codec pipeline #1932

d-v-b commented May 31, 2024

d-v-b commented May 31, 2024

normanrz left a comment

d-v-b commented Jun 4, 2024 •

edited

Loading

d-v-b commented Jun 23, 2024

[v3] Elevate codec pipeline #1932

[v3] Elevate codec pipeline #1932

Conversation

d-v-b commented May 31, 2024

design for array metadata classes

we are breaking that design in v3

one way to fix it

d-v-b commented May 31, 2024

normanrz left a comment

Choose a reason for hiding this comment

d-v-b commented Jun 4, 2024 • edited Loading

d-v-b commented Jun 23, 2024

d-v-b commented Jun 4, 2024 •

edited

Loading