Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move codecs into separate (versioned documents), update urls #187

Merged
merged 3 commits into from
Dec 2, 2022

Conversation

jstriebel
Copy link
Member

@jstriebel jstriebel commented Nov 30, 2022

This PR

Note: The text of the actual codec specifications is unchanged.

@jstriebel jstriebel added core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec codec Codec spec related issue labels Nov 30, 2022
@jstriebel jstriebel self-assigned this Nov 30, 2022
@jbms
Copy link
Contributor

jbms commented Nov 30, 2022

array-extensions/filters can be removed since that functionality is now covered by the codecs metadata field

@jbms
Copy link
Contributor

jbms commented Nov 30, 2022

As also mentioned in #183, I think there is value in keeping the metadata representation concise, because in some cases users may read and write it directly. For example, some zarr implementations (like tensorstore) may just rely on the metadata json representation itself for specifying codecs, rather than defining a separate programming API for each codec. I think we might look at HTML as a model: we have <img> not <https://purl.org/html/tags/img/v7.3> As new features are added, browser compatibility is documented:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#browser_compatibility
https://caniuse.com/mdn-html_elements_img_crossorigin

My own preference is that, for example, the gzip codec is specified with just plain "gzip". If we later need to make an incompatible change (unlikely in my opinion), we could then introduce a "gzip2" or "gzip/v2" codec as well. If a backwards-compatible change is added, such as introduction of a new optional field, then we could continue using just "gzip", and implementations that don't support the new field would either fail or ignore it, depending on how the codec is defined. I can see that there is one downside --- with an explicit version number, if you try to read "gzip/v1.3" and the implementation is only aware of "gzip/v1.2", then the error message can say "gzip/v1.3" not supported, latest version supported is "gzip/v1.2", as opposed to "Invalid field 'foobar', array may have been created with a newer version of the gzip codec.".

Only codecs defined by external organizations, and that haven't been standardized as part of the zarr specification, would use a longer identifier (e.g. a full URL, possibly without the scheme).

It still could make sense to have version numbers for the codec specification itself, so that implementations can specify in their documentation exactly which version of the codec specification they claim to support. The proposed mechanism here seems to be to duplicate the file under a new name with an incremented version number when any change is made. I wonder if we could instead just rely on git to track multiple versions; implementations could then just link to the older spec file on github. There are also various ways to manage multiple versions in the generated documentation.

@jstriebel
Copy link
Member Author

@jbms I agree with you in all points, but I still would prefer to split up the codec documents. I just reconciled the purl URLs and would like to change or update them, see #176. How about continuing the URL/id conversation there? I think it should not block this PR, this just cleans up lot's of things to have a clean overview now.

array-extensions/filters can be removed since that functionality is now covered by the codecs metadata field

Good point, removing it in this PR as well now.

@jbms Would you like to see any further changes except for an ongoing discussion about the purl urls? As explained, I'd rather defer that discussion to #176 for now.

@jstriebel jstriebel merged commit edda100 into zarr-developers:main Dec 2, 2022
@jstriebel jstriebel deleted the separate-codecs branch December 2, 2022 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codec Codec spec related issue core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Versioning codecs v3 Spec: Finalize current codecs
2 participants