v3 spec: Consider removing metadata encoding #174

Closed
jstriebel opened this issue Nov 24, 2022 · 7 comments
Labels
core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec

Comments

@jstriebel
Member

In #171 I consolidated metadata_key_suffix and metadata_encoding into a single object, which can also be an extension point. @jbms raised the question of whether this explicit extension point is needed, since global extensions could define the same behavior (see the conversation here). I'll try to summarize the pros and cons of removing it:

Pros:

  • We can still later add a metadata_encoding attribute if we want, rather than adding it under "extensions" --- since any volume using a different metadata_encoding will only be usable by implementations that support it, the default behavior of failing when encountering any unknown metadata members will do the right thing.

  • The extension point is untested so far and might need further changes later on.

Cons:

@jstriebel jstriebel added the core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec label Nov 24, 2022
@jbms
Contributor

jbms commented Nov 24, 2022

Assuming we keep metadata_encoding at all, I think that metadata_key_suffix can be skipped altogether, rather than embedded as a member of metadata_encoding. The specification for each metadata encoding type would simply state what the key suffix is. An implementation that does not know about a given metadata_encoding does not benefit from knowing the extension, since it still can't decode it.

As for #37, the original request was for being able to encode metadata as hdf5 attributes which have a different data model. I'm not clear exactly how that request fits in with zarr v3, but I don't think metadata_encoding is actually helpful for that purpose.

As for #141, I also don't think metadata_encoding is a great solution if regular JSON with its limitations remains the default; in fact, I think that if the representable values vary depending on the metadata_encoding, that would be confusing and problematic.

@rabernat
Contributor

What if we made metadata encoding up to the store?

For some stores (filesystems, object stores), JSON is a natural choice. For other stores (e.g. document databases), native storage of dictionaries would be more natural.

@joshmoore
Member

> What if we made metadata encoding up to the store?

How would calling code know which store to use?

@jbms
Contributor

jbms commented Jan 18, 2023

> What if we made metadata encoding up to the store?

> How would calling code know which store to use?

I think you might have intended to ask: "How would the calling code know which encoding to use?"

Conceptually we could say that the store just decodes and re-encodes when writing, and does the reverse when reading, based on the key requested.

Implementations of zarr could choose to make this more efficient by adding to their store abstraction an interface for reading and writing in-memory JSON values directly, in order to avoid the extra encoding and decoding. That would also be necessary if the value cannot be represented as JSON (e.g. nan, infinity, specific nan values), though we might want to avoid such values precisely to ensure consistency across metadata encodings.

@rabernat
Contributor

rabernat commented Jan 18, 2023

Right, so the interface might look something like

from typing import Any


class Store:

    def store_meta(self, key: str, metadata: dict[str, Any]) -> None:
        ...

    def store_bytes(self, key: str, data: bytes) -> None:
        ...

i.e. you would pass a native dict to the store.
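
For illustration only, a minimal sketch of two hypothetical stores implementing this interface; the class names and encoding choices are assumptions for this sketch, not anything prescribed by the spec. A filesystem-style store would fall back to encoding the dict as JSON bytes, while a document-style store could keep the dict natively:

import json
from pathlib import Path
from typing import Any


class FilesystemStore:
    """Hypothetical store: metadata ends up as JSON bytes on disk."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def store_bytes(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def store_meta(self, key: str, metadata: dict[str, Any]) -> None:
        # No native dict representation here, so encode to JSON bytes.
        self.store_bytes(key, json.dumps(metadata).encode("utf-8"))


class InMemoryDocumentStore:
    """Hypothetical stand-in for a document database: dicts kept natively."""

    def __init__(self) -> None:
        self.documents: dict[str, Any] = {}

    def store_meta(self, key: str, metadata: dict[str, Any]) -> None:
        # No JSON round-trip; the dict is stored as-is.
        self.documents[key] = metadata

    def store_bytes(self, key: str, data: bytes) -> None:
        self.documents[key] = data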

@joshmoore
Member

> I think you might have intended to ask: "How would the calling code know which encoding to use?"

... Maybe. I guess, as always, concrete examples would help. For a database, I agree it's not really our business to say how the data goes in. I was more concerned about the middle ground where someone is trying to achieve zarr.json.gz or zarr.bson, etc., i.e. we're still more or less in file space but getting towards binary representations. As long as the cases where there's no JSON require code in order to configure the store, I can see a path forward, but we should be really upfront about that in the spec.

@jstriebel
Member Author

We removed the metadata encoding for now, since similar behavior can be specified via group extensions or group storage transformers. If the data is still stored in a JSON-like form, it might be handled at the level of the store, as indicated here:

> In general, a value is a sequence of bytes. Specific stores may choose more specific storage formats, which must be stated in the specification of the respective store. E.g. a database store might encode values of *.json keys with a database-native json type.
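
As a hedged illustration of that store-level behavior (SqliteStore and its schema are made up for this sketch, and an SQLite build with the JSON1 functions is assumed), a store could special-case *.json keys and keep them in a form its database can query natively:

import json
import sqlite3


class SqliteStore:
    """Hypothetical store that keeps *.json keys as queryable JSON text."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS chunks (key TEXT PRIMARY KEY, value BLOB)")
        self.db.execute("CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT)")

    def set(self, key: str, value: bytes) -> None:
        if key.endswith(".json"):
            # Validate the bytes as JSON and store them through SQLite's json()
            # function, so json_extract() etc. can query the metadata in place.
            doc = json.loads(value)
            self.db.execute(
                "INSERT OR REPLACE INTO metadata (key, value) VALUES (?, json(?))",
                (key, json.dumps(doc)),
            )
        else:
            self.db.execute(
                "INSERT OR REPLACE INTO chunks (key, value) VALUES (?, ?)",
                (key, value),
            )
        self.db.commit()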

I'm closing this issue for now; I'd propose moving to #37 to discuss other encodings further.
