Regression with 3.0.0rc1: reading zarr with tensorstore #2647
Comments
Thanks for the nice reproducer. The array metadata being produced here looks like:
{
"shape": [
5,
7,
8,
9
],
"chunks": [
1,
1,
8,
9
],
"fill_value": 0,
"order": "C",
"filters": null,
"dimension_separator": ".",
"compressor": {
"id": "zstd",
"level": 0,
"checksum": false
},
"zarr_format": 2,
"dtype": "<u4"
}
Based on the error, tensorstore doesn't seem to like the |
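To check whether a given Zarr v2 array carries the extra field, the `.zarray` JSON document can be inspected directly. Here is a minimal sketch using only the standard library. The helper name `has_zstd_checksum_field` is made up for illustration and is not part of zarr-python, numcodecs, or tensorstore:

```python
import json


def has_zstd_checksum_field(zarray_text: str) -> bool:
    """Return True if v2 array metadata has a zstd compressor with a `checksum` key.

    `zarray_text` is the raw JSON content of a `.zarray` file.
    Hypothetical helper, for illustration only.
    """
    meta = json.loads(zarray_text)
    # "compressor" may be null in v2 metadata, so fall back to an empty dict.
    compressor = meta.get("compressor") or {}
    return compressor.get("id") == "zstd" and "checksum" in compressor
```

It could be called on the text read from the `.zarray` file inside the array directory; the metadata shown above would be flagged because its compressor entry contains `"checksum": false`.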
Looking at TS code, it looks like they support zstd checksum for zarr3: Edit: so at the end of the day i'm not sure where this issue belongs. I guess we can use something other than zstd in the test and it should be fine? |
I'd consider this a zarr-python bug --- I proposed a zarr v3 zstd codec with a checksum parameter that is implemented in TensorStore, but the zarr v2 zstd codec in TensorStore follows the zstd codec in numcodecs (at the time it was implemented) and lacks support for a checksum parameter. |
To help us fix the napari side, can someone point to a |
zstd in numcodecs has since evolved and now also supports the checksum parameter when used as a v2 codec. See https://numcodecs.readthedocs.io/en/stable/release.html#id8 I acknowledge that a major problem with v2 is that we don't have a good spec process for codecs, apart from following what numcodecs does. |
I'm not sure there is one dict that will work for both v2 and v3 🙃! Someone please correct me here. Zarr v2 and v3 use different JSON schemas for codecs; the zarr v2 spec requires that codec dicts take the form Nonetheless, there are codecs defined "with" the v3 spec, including some compression routines used in zarr v2 (GZip and Blosc; Zstd is submitted as a PR) ... but as all the v3-flavored codec JSON documents take the form |
Is the pragmatic solution here to revert this, so that we avoid producing v2 data whose metadata (the extra checksum field) differs from what has been the status quo for a long time, and (somehow) just include the checksum field in the v3 codec output? |
Or perhaps do some special casing here in zarr-python before serialising with zstd that looks like:
if zarr_format == 2 and compressor == zstd:
assert checksum == False
compressor_dict.pop('checksum') |
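The special casing suggested above could be sketched as a small helper that runs on the compressor dict before it is written into the v2 metadata. This is only a sketch under assumptions: the name `strip_default_zstd_checksum` is hypothetical and not part of zarr-python, the compressor config is assumed to be a plain dict shaped like the metadata shown earlier in this thread, and the key is dropped only when it is false rather than asserted on, so an explicit opt-in to `checksum: true` would be preserved.

```python
from typing import Any, Optional


def strip_default_zstd_checksum(
    compressor: Optional[dict[str, Any]], zarr_format: int
) -> Optional[dict[str, Any]]:
    """Hypothetical helper: omit a false `checksum` from v2 zstd metadata."""
    # Leave v3 metadata and uncompressed arrays untouched.
    if compressor is None or zarr_format != 2:
        return compressor
    # Only the zstd codec is special-cased.
    if compressor.get("id") != "zstd":
        return compressor
    cleaned = dict(compressor)  # copy so the caller's dict is not mutated
    if cleaned.get("checksum") is False:
        del cleaned["checksum"]
    return cleaned
```

With this approach, readers that do not know the `checksum` parameter would see the same v2 metadata as before unless the writer explicitly enabled checksums.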
I can't easily check at the moment, but can zarr-python 2.x open these zarr-python 3.x zarr_format = 2 produced files? |
That should depend on the version of numcodecs. |
So numcodecs older than 0.13 means no go? 😬 |
With numcodecs 0.13.1, writing with zarr 3 and then reading with zarr 2 seems to work fine:
import zarr
if zarr.__version__[0] == "3":
print("writing array")
arr = zarr.create_array(
"test_data.zarr",
shape=(100, 100),
chunks=(10, 10),
dtype="int32",
zarr_format=2,
overwrite=True,
)
arr[:] = 0
print(zarr.__version__)
arr = zarr.open_array("test_data.zarr")
print(arr[0, 0]) |
I would say that zarr-python should definitely not include checksum=false in the zarr v2 json metadata since that breaks other implementations that don't support the checksum parameter for zarr v2, including older versions of numcodecs itself. I'm a bit wary of adding support for this new parameter in zarr v2 at all since it creates compatibility issues but it is less of an issue if it requires users to opt in explicitly. |
Fixed by setting |
Zarr version
3.0.0rc1
Numcodecs version
0.14.1
Python Version
3.12
Operating System
macOS, but also all on CI
Installation
pip
Description
With 3.0.0rc1 we observed some failures in the napari --pre tests. Juan is already addressing one, but there is also a second one:
https://github.com/napari/napari/actions/runs/12610293301/job/35144717871#step:9:369
Here is a script that reproduces the failure without napari or our pytest parameterization. The script works with zarr<3 and zarr 3.0.0b3, but fails with rc1.
I've tested tensorstore 0.1.69 and 0.1.71 and numcodecs 0.13.1 and 0.14.1 with the same behavior.
Steps to reproduce
Additional output
No response