Add wrappers for zarr v3 #524

Merged
merged 62 commits into from
Nov 8, 2024

normanrz
Contributor

@normanrz normanrz commented May 6, 2024

The Zarr v3 specification only lists a few codecs that are officially supported. However, it is desirable to expose the codecs in numcodecs for use with v3 arrays as well. This PR adds wrapper classes for numcodecs support.

The codec names are prefixed with "numcodecs." to avoid naming collisions in case some numcodecs codecs get added to the Zarr spec. Also, there is a warning that numcodecs codecs are not officially supported and will likely not work in any other Zarr implementation.

Most array-to-array ("filters") and bytes-to-bytes codecs are supported. Absent are the variable-length codecs as well as json, msgpack and pickle.

Here is an example of the persisted configuration:

{
  "name": "numcodecs.fixedscaleoffset",
  "configuration": {"offset": 0, "scale": 51, "astype": "uint16"}
}

Use of numcodecs in v2 arrays is not affected.
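
As a rough illustration of how a prefixed v3 entry relates to a numcodecs-style config, here is a minimal sketch. The helper `to_numcodecs_config` and the constant `NUMCODECS_PREFIX` are hypothetical names made up for this example and are not part of numcodecs or zarr; the only assumptions are the "numcodecs." prefix described above and the numcodecs convention of storing the codec id under the "id" key ("fixedscaleoffset" is the id numcodecs uses for FixedScaleOffset).

```python
import json

# Prefix described in this PR; the constant name itself is hypothetical.
NUMCODECS_PREFIX = "numcodecs."


def to_numcodecs_config(v3_codec_metadata: dict) -> dict:
    """Translate a v3 codec entry into a numcodecs-style config dict.

    Hypothetical helper for illustration only.
    """
    name = v3_codec_metadata["name"]
    if not name.startswith(NUMCODECS_PREFIX):
        raise ValueError(f"{name!r} is not a numcodecs-prefixed codec name")
    codec_id = name[len(NUMCODECS_PREFIX):]
    # numcodecs configs carry the codec id under the "id" key.
    return {"id": codec_id, **v3_codec_metadata.get("configuration", {})}


persisted = json.loads("""
{
  "name": "numcodecs.fixedscaleoffset",
  "configuration": {"offset": 0, "scale": 51, "astype": "uint16"}
}
""")

config = to_numcodecs_config(persisted)
print(config)
```

A dict of this shape is what `numcodecs.get_codec` accepts, although a real FixedScaleOffset also needs a `dtype`, which this sketch does not supply.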

Fixes #502

@pep8speaks

pep8speaks commented May 6, 2024

Hello @normanrz! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-05-08 20:53:05 UTC

@normanrz normanrz changed the title Add wrappers for zarr v3 [DRAFT] Add wrappers for zarr v3 May 6, 2024
@MSanKeys963 MSanKeys963 requested a review from jakirkham May 8, 2024 16:20
@normanrz normanrz changed the title [DRAFT] Add wrappers for zarr v3 Add wrappers for zarr v3 May 8, 2024
@rabernat
Contributor

The name of the codecs is prefixed with https://zarr.dev/numcodecs/ to avoid naming collisions in case some codecs of numcodecs get added to the Zarr spec

I am not sure about the idea of using a URL that does not actually resolve to anything useful.

@rabernat
Contributor

pcodec is actually an "Array to Bytes" codec: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/pcodec.py

How would that fit in here?

@martindurant
Member

Any thoughts about what to do with numcodecs codecs not defined in this repo, but currently used via entrypoints?

@d-v-b
Contributor

d-v-b commented Jun 19, 2024

The name of the codecs is prefixed with https://zarr.dev/numcodecs/ to avoid naming collisions in case some codecs of numcodecs get added to the Zarr spec

I am not sure about the idea of using a URL that does not actually resolve to anything useful.

seconding this sentiment, a URL that doesn't resolve to anything is rather confusing. I think numcodecs.<codec_name> or numcodecs/<codec_name> are simpler templates for a numcodecs-qualified name.

@rabernat
Contributor

Any thoughts about what to do with numcodecs codecs not defined in this repo, but currently used via entrypoints?

Could we ask those codecs to implement Zarr codec entrypoints directly? Which codecs do you have in mind?

The challenge is that the V3 codecs are quite a bit more explicit in their typing (Array to Bytes, Bytes to Bytes, etc.) than legacy numcodecs codecs. So automatically translating an arbitrary numcodecs codec to a V3 codec is not possible.
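
One way to picture the typing problem is the sketch below. It assumes nothing about the actual zarr-python classes: `CodecKind`, `KIND_BY_CODEC_ID`, and `classify` are hypothetical names used only for illustration. The point is that the category of each legacy codec has to be written down by hand, so a codec that is not in the table cannot be wrapped automatically.

```python
from enum import Enum


class CodecKind(Enum):
    """The three codec categories that v3 distinguishes."""

    ARRAY_TO_ARRAY = "array-to-array"
    ARRAY_TO_BYTES = "array-to-bytes"
    BYTES_TO_BYTES = "bytes-to-bytes"


# Hand-maintained table (illustrative subset, hypothetical name).
# A legacy codec only exposes encode/decode, so its category cannot be
# inferred from the object and has to be recorded explicitly.
KIND_BY_CODEC_ID = {
    "delta": CodecKind.ARRAY_TO_ARRAY,
    "fixedscaleoffset": CodecKind.ARRAY_TO_ARRAY,
    "pcodec": CodecKind.ARRAY_TO_BYTES,
    "blosc": CodecKind.BYTES_TO_BYTES,
    "zstd": CodecKind.BYTES_TO_BYTES,
}


def classify(codec_id: str) -> CodecKind:
    """Return the recorded category, or fail for codecs nobody has classified."""
    try:
        return KIND_BY_CODEC_ID[codec_id]
    except KeyError:
        raise ValueError(
            f"no v3 category recorded for legacy codec {codec_id!r}"
        ) from None


print(classify("pcodec").value)
```

Third-party codecs registered via entrypoints would fall into the `ValueError` branch here, which is why asking them to declare their own category is attractive.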

@martindurant
Member

I am thinking of https://github.com/fsspec/kerchunk/blob/main/kerchunk/codecs.py and imagecodecs. There are probably others.

@normanrz
Contributor Author

The name of the codecs is prefixed with https://zarr.dev/numcodecs/ to avoid naming collisions in case some codecs of numcodecs get added to the Zarr spec

I am not sure about the idea of using a URL that does not actually resolve to anything useful.

I had asked @MSanKeys963 to set up the respective redirects to the numcodecs docs. That should solve that.

@normanrz
Contributor Author

pcodec is actually an "Array to Bytes" codec: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/pcodec.py

How would that fit in here?

Must have missed pcodec. I'll add it.

codecov bot commented Jun 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.92%. Comparing base (d8a219f) to head (153d340).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #524    +/-   ##
========================================
  Coverage   99.91%   99.92%            
========================================
  Files          59       62     +3     
  Lines        2408     2691   +283     
========================================
+ Hits         2406     2689   +283     
  Misses          2        2            
Files with missing lines Coverage Δ
numcodecs/tests/test_zarr3.py 100.00% <100.00%> (ø)
numcodecs/tests/test_zarr3_import.py 100.00% <100.00%> (ø)
numcodecs/zarr3.py 100.00% <100.00%> (ø)

@normanrz
Contributor Author

normanrz commented Nov 5, 2024

I have now added docs for this module.
Screenshot 2024-11-05 at 18 11 10

I also stripped the Codec suffix from all the wrapper codecs so that they have the same names as the original codecs. I think this is a nice way of interacting with the codecs. As part of that, I also reworked the tests.

I deactivated zfpy in macos-14 builds. zfpy for arm64 seems to require different numpy versions, which clash with the numpy requirements of zarr3.

Contributor

@dstansby dstansby left a comment

Sorry, I won't have time for a full review for a few days, but hoping this partial one is helpful.

The built docs aren't rendering properly: https://numcodecs--524.org.readthedocs.build/en/524/zarr3.html

Looking at the readthedocs build log, this is because:

  1. zarr-python v3 isn't being installed, which is required to import numcodecs.zarr3 and read the docstring.
  2. The codec names clash, so need fully specifying as e.g. numcodecs.zarr3.CodecName instead of just CodecName

I'm happy to put together a fix for the doc issues when I have time, but might not be for a couple of days.

.github/workflows/ci.yaml (outdated, resolved)
.github/workflows/ci.yaml (outdated, resolved)
normanrz and others added 3 commits November 6, 2024 11:07
Co-authored-by: David Stansby <dstansby@gmail.com>
@normanrz
Contributor Author

normanrz commented Nov 6, 2024

Thanks for checking the docs CI. I didn't catch that. I fixed that right now: https://numcodecs--524.org.readthedocs.build/en/524/zarr3.html

The zarr3 codecs link back to the original codecs. Should I also add links the other way round (e.g. numcodecs.blosc.Blosc -> numcodecs.zarr3.Blosc)?

Contributor

@dstansby dstansby left a comment

LGTM 👍 - I've left two optional comments, feel free to take them or leave them and then self merge

numcodecs/zarr3.py (outdated, resolved)
numcodecs/zarr3.py (outdated, resolved)
@dstansby
Contributor

dstansby commented Nov 7, 2024

The zarr3 codecs link back to the original codecs. Should I also add links the other way round (e.g. numcodecs.blosc.Blosc -> numcodecs.zarr3.Blosc)?

I don't think it's worth doing this

@normanrz normanrz enabled auto-merge (squash) November 8, 2024 09:19
@normanrz normanrz self-assigned this Nov 8, 2024
@normanrz normanrz merged commit 44130cd into main Nov 8, 2024
39 of 41 checks passed
@normanrz
Contributor Author

normanrz commented Nov 8, 2024

This PR got merged prematurely. Looks like we should set up auto-merge more carefully.

Anyways, this test is still failing after merging #620. Should I add https://github.com/zarr-developers/numcodecs/pull/524/files/71178b07b04099f9a5953f2513599a95ae810c0f#diff-944291df2c9c06359d37cc8833d182d705c9e8c3108e7cfe132d61a06e9133dd back @dstansby?

@dstansby dstansby mentioned this pull request Nov 9, 2024
dstansby pushed a commit that referenced this pull request Nov 9, 2024
Development

Successfully merging this pull request may close these issues.

Supporting Zarr-Python 3 Codec API
7 participants