Extension for attribute datatypes? #229
Comments
Hi Alex! There has been some discussion on this, yes. I'm going to transfer your issue to Related
Here's my personal idea for the best way to implement this. We define a zarr v3 extension for an external attributes file (the V3 spec puts attributes in "zarr.json") and allow this to be a binary storage format like CBOR, msgpack, etc. That would solve both the "large amount of metadata" and the "explicit attribute datatypes" issues in one go.
As Ryan noted, we've had a lot of discussions about attributes and a number of solutions have been proposed but no consensus has yet been reached. I think it may be helpful to distinguish between a number of different issues:
CBOR provides a way to encode a richer set of values (1) but does not by itself provide a way to distinguish between e.g. an int32 and a uint64. However, CBOR does provide a way to associate an arbitrary integer tag with a value, and in principle zarr could define tag values to indicate data types. I don't know how well that would work in practice, e.g. how well it would allow the zarr metadata to be read and written by other tools, but it might work reasonably well. msgpack similarly provides a way to encode a richer set of values but does not particularly help with (2). It provides an extension mechanism but I don't think the extension mechanism would work well for the purpose of indicating a data type.
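To make the tag idea concrete, here is a minimal sketch of how the same integer could be written with two different CBOR tags. The head encoding follows RFC 8949, but the tag numbers `TAG_INT32` and `TAG_UINT64` are arbitrary placeholders invented for this example; zarr defines no such tags, and the helper names are hypothetical.

```python
import struct

# Hypothetical tag numbers chosen only for this sketch; zarr defines no such tags.
TAG_INT32 = 70001
TAG_UINT64 = 70002


def cbor_head(major: int, arg: int) -> bytes:
    """Encode a CBOR initial byte plus its argument (RFC 8949, section 3)."""
    if not 0 <= arg < 2**64:
        raise ValueError("argument out of range for a CBOR head")
    if arg < 24:
        # Small arguments are stored directly in the low 5 bits.
        return bytes([(major << 5) | arg])
    for info, fmt, limit in ((24, ">B", 2**8), (25, ">H", 2**16), (26, ">I", 2**32)):
        if arg < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", arg)


def cbor_int(value: int) -> bytes:
    """Encode an integer: major type 0 if non-negative, major type 1 otherwise."""
    if value >= 0:
        return cbor_head(0, value)
    return cbor_head(1, -1 - value)


def cbor_tagged(tag: int, payload: bytes) -> bytes:
    """Wrap an already-encoded data item in a tag (major type 6)."""
    return cbor_head(6, tag) + payload


# The payload byte for the value 1 is identical; only the tag differs.
as_int32 = cbor_tagged(TAG_INT32, cbor_int(1))
as_uint64 = cbor_tagged(TAG_UINT64, cbor_int(1))
print(as_int32.hex(), as_uint64.hex())
```

A generic CBOR decoder that does not know these tags would typically still be able to recover the underlying integer, which bears on the question of how well other tools could read such metadata.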
Thank you @rabernat for routing my question to a more appropriate place. @jbms I agree with your breakdown of use cases. For the container cases like Can this discussion be combined with one of the mentioned issues, or is it better to keep it separate?
I think you will have to decide yourself whether your comments are closely related to one of the existing issues. I don't think there is an existing issue specifically about the idea of storing an explicit data type for each attribute. But I believe that is the approach taken by nczarr (netcdf zarr). In general, while I can see that there may be some value in being explicit about data types, and that it provides better compatibility with the HDF5 data model, it also seems to me that it would introduce a lot of additional complexity, and it is not clear exactly which use cases, other than HDF5 compatibility, benefit from it. In contrast, merely extending the set of values that can be represented seems more promising. But if you have a compelling proposal for how to add support for explicit data types, I'd certainly be interested.
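As a rough illustration of what "an explicit data type for each attribute" could look like while staying in plain JSON, the sketch below stores a side map from attribute name to type name. The reserved key `_attr_types` and both helper functions are assumptions made up for this example; this is not the nczarr layout or anything defined by a zarr spec.

```python
import json

# Hypothetical reserved key; not part of any zarr or nczarr specification.
TYPES_KEY = "_attr_types"


def dump_typed(attrs: dict, types: dict) -> str:
    """Serialize attributes plus a side map of attribute name -> type name."""
    doc = dict(attrs)
    doc[TYPES_KEY] = types
    return json.dumps(doc, sort_keys=True)


def load_typed(text: str) -> dict:
    """Return {name: (value, type_name_or_None)} from a document like the above."""
    doc = json.loads(text)
    types = doc.pop(TYPES_KEY, {})
    return {name: (value, types.get(name)) for name, value in doc.items()}


text = dump_typed(
    {"count": 1, "offset": 1},
    {"count": "int32", "offset": "uint64"},
)
loaded = load_typed(text)
print(loaded["count"], loaded["offset"])
```

A reader that ignores the side map still sees ordinary JSON values, but every writer now has to keep the map in sync with the attributes, which is one form of the additional complexity mentioned above.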
I'm not finding a better issue, so I'll cross-reference here an impending need for datetime from bluesky/tiled#514
See also bluesky/tiled#782
Hello!
Hope this is a good place for my question: Has there been any interest before in more explicit Zarr attribute datatypes? My understanding of the Zarr v3 draft specification is that attribute values can be of any valid JSON datatype. Is the expectation that casting, for example, an attribute scalar value of `1` into a specific software datatype like `int32` or `uint64` is to be done by Zarr readers depending on the context?

Thanks!
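For context, a small sketch of what reader-side casting amounts to today: JSON carries only the number, so the reader has to pick a target type and check that the value fits. The `cast_attr` helper and the attribute name `scale` are hypothetical, and only the two types named in the question are covered.

```python
import json

# Inclusive value ranges for the two types named in the question.
INT_RANGES = {
    "int32": (-(2**31), 2**31 - 1),
    "uint64": (0, 2**64 - 1),
}


def cast_attr(value, dtype: str) -> int:
    """Check that a decoded JSON integer fits the requested type (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected a JSON integer, got {type(value).__name__}")
    low, high = INT_RANGES[dtype]
    if not low <= value <= high:
        raise OverflowError(f"{value} does not fit in {dtype}")
    return value


attrs = json.loads('{"scale": 1}')

# The decoded value is a plain Python int; nothing in the JSON says which width was meant.
print(type(attrs["scale"]).__name__)
print(cast_attr(attrs["scale"], "int32"), cast_attr(attrs["scale"], "uint64"))
```

In a real reader the checked value would then be converted to the library's own scalar type; the range check is the part that depends on context the JSON does not carry.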