Zarr V3 metadata fixes #248
base: main
This matches the encoding in `manifest.json`.
Type-checking failures seem unrelated.
@@ -92,8 +92,8 @@ def zarr_v3_array_metadata(zarray: ZArray, dim_names: list[str], attrs: dict) ->
         "configuration": {"chunk_shape": metadata.pop("chunks")},
     }
     metadata["chunk_key_encoding"] = {
-        "name": "default",
-        "configuration": {"separator": "/"},
+        "name": "v2",
This seems wrong? For writing v3 metadata?
In general, if we're not planning to use this format any more (see #262 (comment)), how much of this PR do you want to keep, @LDeakin?
Presumably all the rest of the fixes are still relevant?
> This seems wrong? For writing v3 metadata?
The chunk manifest example in zarr-developers/zarr-specs#287 and virtualizarr produces `"0.0"`-style chunk key encoding, which is `v2` with a `.` separator. `default` with `/` would be `"c/0/0"`.
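To make the difference concrete, here is a minimal sketch of the two chunk key encodings as described in the Zarr v3 spec. The function names `v2_chunk_key` and `default_chunk_key` are hypothetical helpers for illustration, not part of virtualizarr or zarr-python.

```python
def v2_chunk_key(coords: tuple[int, ...], separator: str = ".") -> str:
    """Chunk key under the `v2` encoding, e.g. (0, 0) -> "0.0"."""
    # A zero-dimensional array has the single chunk key "0" under v2.
    if not coords:
        return "0"
    return separator.join(str(c) for c in coords)


def default_chunk_key(coords: tuple[int, ...], separator: str = "/") -> str:
    """Chunk key under the `default` encoding, e.g. (0, 0) -> "c/0/0"."""
    # The default encoding always starts with the prefix "c";
    # a zero-dimensional array's key is just "c".
    return separator.join(["c", *(str(c) for c in coords)])


print(v2_chunk_key((0, 0)))       # 0.0
print(default_chunk_key((0, 0)))  # c/0/0
```

Only the `v2`/`.` combination yields keys like `"0.0"` that line up with the keys in a virtualizarr `manifest.json`.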
If the chunk key encodings of the array and the chunk manifest match, then the `chunk-manifest-json` storage transformer does not need to concern itself with chunk key encodings, which makes sense to me.
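A minimal sketch of what "not needing to concern itself" looks like, assuming manifest entries shaped like virtualizarr's `manifest.json` (`path`, `offset`, `length` per chunk key) and that the array path prefix has already been stripped from the requested key. `ChunkEntry`, `resolve_chunk`, and `default_key_to_v2` are hypothetical names, not part of any storage transformer spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkEntry:
    """Location of one chunk's bytes inside another file (hypothetical type)."""

    path: str
    offset: int
    length: int


def resolve_chunk(manifest: dict, chunk_key: str) -> Optional[ChunkEntry]:
    """Resolve a chunk key (array prefix already stripped) via the manifest.

    When the array's chunk key encoding matches the manifest keys, this is a
    plain dictionary lookup: no parsing or re-encoding of keys is needed.
    In this sketch a missing key is treated as an unwritten chunk, so the
    reader would fall back to the array's fill value.
    """
    entry = manifest.get(chunk_key)
    if entry is None:
        return None
    return ChunkEntry(path=entry["path"], offset=entry["offset"], length=entry["length"])


def default_key_to_v2(chunk_key: str) -> str:
    """Extra translation step needed only if the encodings differ.

    Converts a `default`-encoded key such as "c/1/2" to the v2 form "1.2".
    """
    prefix, *coords = chunk_key.split("/")
    if prefix != "c":
        raise ValueError(f"not a default-encoded chunk key: {chunk_key!r}")
    # A 0-d array's default key is just "c"; its v2 key is "0".
    return ".".join(coords) if coords else "0"
```

With matching encodings, only `resolve_chunk` is needed; `default_key_to_v2` shows the key-translation work the transformer would otherwise have to take on.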
> In general if we're not planning to use this format any more (see #262 (comment)), how much of this PR do you want to keep @LDeakin?
Not fussed; this PR was just the minimal set of changes I needed to use `chunk-manifest-json` as currently spec'd and as produced by virtualizarr. I'd hope most of these changes would be superseded by bringing in zarr-python V3 as a dependency anyway.
I haven't looked thoroughly at the spec for icechunk yet, but do you see it replacing `chunk-manifest-json` entirely? Can the time travel stuff be decoupled from the chunk manifests?
> The chunk manifest example in zarr-developers/zarr-specs#287 and virtualizarr produces `"0.0"` style chunk key encoding, which is `v2` with `.` separator. `default` with `/` would be `"c/0/0"`.
My intention was to test out writing to and reading from a v3-compatible, JSON-based chunk manifest spec. If what I actually did looks more like v2, then that's my bad for not understanding the spec properly!
> Not fussed, this PR was just the minimal changes I needed to use the `chunk-manifest-json` as currently spec'd and produced by virtualizarr. I'd hope most of these changes would be superseded by bringing in zarr-python V3 as a dependency anyway.
Okay, thanks. Maybe we get virtualizarr working fully first, then look at the updated diff? I would expect @mpiannucci's efforts on icechunk compatibility to iron out similar concerns around fill values.
> I'd hope most of these changes would be superseded by bringing in zarr-python V3 as a dependency anyway.
👍 We're close to being able to do that now that zarr-python v3 alpha (beta today actually) is out.
> I haven't looked thoroughly at the spec for icechunk yet, but do you see it replacing `chunk-manifest-json` entirely?
I think that is Earthmover's intention.
> Can the time travel stuff be decoupled from the chunk manifests?
In theory it probably could, but in practice, unless there is a strong use case for using chunk manifests where you wouldn't also want all the other features of icechunk, I'm not really sure why you would bother separating them. The features of icechunk are closely related in that they all involve adding a new layer of indirection into the store, i.e. the manifests + snapshots (which are kind of like time-stamped consolidated metadata IIUC). This question deserves discussion on that zarr spec proposal issue though.
> This question deserves discussion on that zarr spec proposal issue though.
I've asked in zarr-developers/zarr-specs#287 (comment)
This fixes several problems with `zarr.json` metadata that I noticed when implementing a chunk manifest storage transformer:

- `v2` chunk key encoding with `.` separator to match `manifest.json`
  - a `chunk-manifest-json` storage transformer should not need to be aware of the chunk key encoding
- `NaN` for integer arrays
- `bytes` codec `"endian"` to `"little"`
- `null` fill value or `nan` fill value for integer data type
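A minimal sketch of a v3 `zarr.json` array document with the fixes above applied. `sketch_v3_array_metadata`, `normalize_fill_value`, and `to_zarr_json` are hypothetical helpers (this is not virtualizarr's `zarr_v3_array_metadata`), and replacing a `null`/`NaN` integer fill value with `0` is an assumption, since the exact replacement used in this PR isn't stated here.

```python
import json
import math
from typing import Any

# Zarr v3 core integer and float data type names.
INTEGER_DTYPES = {"int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", "uint64"}
FLOAT_DTYPES = {"float32", "float64"}


def normalize_fill_value(fill_value: Any, data_type: str) -> Any:
    """Coerce a (possibly v2-style) fill value into a JSON-safe v3 value."""
    if data_type in INTEGER_DTYPES:
        # v3 forbids null, and NaN is not an integer; 0 is an assumed stand-in.
        if fill_value is None or (isinstance(fill_value, float) and math.isnan(fill_value)):
            return 0
        return int(fill_value)
    if data_type in FLOAT_DTYPES:
        if fill_value is None:
            return 0.0  # assumed stand-in for a missing float fill value
        value = float(fill_value)
        # v3 spells non-finite floats as JSON strings.
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return value
    return fill_value


def sketch_v3_array_metadata(shape, chunks, data_type: str, fill_value: Any) -> dict:
    """Build a minimal v3 zarr.json array document (sketch)."""
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(shape),
        "data_type": data_type,
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": list(chunks)}},
        # "0.0"-style keys, matching the keys used in manifest.json.
        "chunk_key_encoding": {"name": "v2", "configuration": {"separator": "."}},
        "fill_value": normalize_fill_value(fill_value, data_type),
        # Multi-byte data types need an explicit endianness in the bytes codec.
        "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
    }


def to_zarr_json(metadata: dict) -> str:
    # allow_nan=False raises if a bare NaN/Infinity slipped through unconverted.
    return json.dumps(metadata, indent=2, allow_nan=False)


print(to_zarr_json(sketch_v3_array_metadata((4, 4), (2, 2), "int32", float("nan"))))
```

Serializing with `allow_nan=False` is a cheap guard: Python's `json` module would otherwise emit bare `NaN`, which is not valid JSON.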