Non-JSON metadata and attributes #37
Comments
Hi Stephan,
One quick question. Currently the metadata in the v3.0 spec is *not* just a
flat set of name/value pairs, where values are simple types like string or
number. Some parts of the metadata require nesting, meaning that the value
is a JSON object or array of objects. E.g., the value of chunk_grid is an
object, and the value of chunk_codecs is an array of objects. How would you
accommodate this if using HDF5 attributes to store metadata?
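To make the question concrete, here is a minimal Python sketch of one possible workaround: JSON-encode every value into a string so that it fits a flat name/value model. The metadata contents below are illustrative placeholders rather than the exact v3.0 schema, the helper names `to_flat_attributes` and `from_flat_attributes` are hypothetical, and a plain dict stands in for HDF5 attributes.

```python
import json

# Illustrative metadata only; the field contents are assumptions, not the v3.0 schema.
metadata = {
    "shape": [10000, 1000],
    "data_type": "<f8",
    "chunk_grid": {"type": "regular", "chunk_shape": [1000, 100]},
    "chunk_codecs": [{"codec": "gzip", "level": 1}],
}


def to_flat_attributes(doc):
    """Encode each top-level value as a JSON string, giving flat name/string pairs."""
    return {name: json.dumps(value) for name, value in doc.items()}


def from_flat_attributes(flat):
    """Decode the JSON strings back into (possibly nested) values."""
    return {name: json.loads(text) for name, text in flat.items()}


flat = to_flat_attributes(metadata)
restored = from_flat_attributes(flat)
```

The round trip is lossless, but the nested values become opaque strings to any tool that reads the attributes natively, which is the trade-off the question is getting at.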
On Thu, 23 May 2019 at 22:51, Stephan Saalfeld wrote:
As briefly discussed in the group chat, I would like to propose a change
to how metadata and attributes are accessed. The current spec is specific
that this data must be readable and writable as JSON. This is compatible
with all current storage backends of Zarr and the filesystem and cloud
storage backends of N5. It is not compatible with the current HDF5 backend
of N5 where attributes and metadata are represented as HDF5 attributes.
Instead of requiring JSON, I suggest that metadata and attribute access
should be specified similarly to the group and array access protocol of the
spec, i.e. as a set of access primitives (an API). The most basic
primitives would be:
getAttribute - Retrieve the value associated with a given key and
attributeKey.
| Parameters: `key`, `attributeKey`, [`type`]
| Output: `value`
setAttribute - Store a (key, attributeKey, value) triple.
| Parameters: `key`, `attributeKey`, `value`
| Output: none
Probably also something to list attributes and maybe infer their types if
necessary.
The N5 API does it this way and I find it very straightforward to use
this across JSON and non-JSON backends:
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L214
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L271
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L43
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L59
and the default JSON implementation which is only bloated to support
version 0 with non auto-inferred compressors
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/AbstractGsonReader.java
--
Alistair Miles
|
Just to generalise a bit, I think there are two possible sets of requirements here: (1) Whether and how to support storage implementations which have some "native" mechanism for storing metadata (e.g., N5's HDF5 backend). (2) Whether and how to support alternative encodings of metadata (e.g., MessagePack instead of JSON). In terms of the v3.0 core protocol, do we try to create a framework that can accommodate either of these requirements, and if so, how? This might mean just providing the right foundation to allow protocol extensions to address them, rather than fully addressing them within the core protocol. |
My current thinking (to prevent storing a dataset of XML) was to convert OME-XML to the upcoming OME-JSON-LD and put that in the block of metadata. Either a hierarchical JSON tree would work, or a set of triples could represent the underlying RDF. Depending on allowed keys, it's conceivable that one could map the Subject and the Predicate into a single key but it won't be attractive:
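As a rough illustration of such a mapping (a sketch only; the subjects, predicates, and the `|` separator are invented here and are not actual OME terms), in Python:

```python
# Hypothetical (subject, predicate, object) triples; identifiers are made up.
triples = [
    ("Image:0", "name", "example.tif"),
    ("Image:0", "acquisitionDate", "2019-05-23"),
    ("Pixels:0", "sizeX", 512),
]

SEPARATOR = "|"  # assumed never to occur inside a subject


def triples_to_flat(items):
    """Fuse subject and predicate into one key of a flat key/value map."""
    flat = {}
    for subject, predicate, obj in items:
        key = f"{subject}{SEPARATOR}{predicate}"
        if key in flat:
            # A flat map holds one value per key, so repeated predicates
            # on the same subject cannot be represented directly.
            raise ValueError(f"duplicate key: {key}")
        flat[key] = obj
    return flat


def flat_to_triples(flat):
    """Split each fused key back into subject and predicate."""
    return [tuple(key.split(SEPARATOR, 1)) + (obj,) for key, obj in flat.items()]


flat = triples_to_flat(triples)
```

The need for a reserved separator and the one-value-per-key restriction are the unattractive parts.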
|
Hi @joshmoore, I would imagine it should be fine to include some JSON-LD within a zarr array metadata document. I have to confess I don't fully grok the JSON-LD syntax, but I'd hope something like this was OK:
Someone please correct me if this doesn't work. |
Just following up on this...
I'm currently thinking that it's not worth the trouble to try to accommodate the way the existing N5 HDF5 backend stores metadata. This is simply because the flat name/value pair model for metadata is very restrictive, and not rich enough to express some of the basic things we want to express in the core metadata, or which some applications might want to store in user metadata (like the OME example). So I'm not planning to make any spec changes to accommodate this. Please push back if anyone disagrees.
On alternative metadata encodings: this is something I can see the potential value of, or at least the value of leaving the door open for it to be explored. However, I don't want to overcomplicate the core spec, so I won't try to accommodate this currently, unless someone specifically asks for it. |
FWIW the way Zarr handles this problem today is to provide a way for users to copy from Zarr to HDF5. IMHO it seems reasonable to continue with that strategy going forward. As to using an alternative to JSON, we would be interested in this. In particular protobuf came up as an interesting option. |
Using protobuf it should certainly be possible to express all of the core metadata. One question would be how it would handle user attributes, where you cannot predefine the schema ahead of time. But maybe that can be worked around somehow. In any case, I'd be happy to figure out how to write the spec to allow for alternative metadata encodings. |
Interestingly looks like Arrow are using flatbuffers. Flatbuffers seem easier to accommodate than protobuf because of the support for unions. I'm thinking we could keep JSON as the canonical format, but could also create a flatbuffers schema for the core metadata, if only to know it was possible, i.e., to check we hadn't come up with a metadata structure that was hard to encode in something other than JSON. |
Yes, as long as there is a place to "embed" a JSON tree, I assume I can make it work. (Note: that could also be another file if that's preferable) |
Just to say I've done some work on the v3.0 core protocol spec in the development branch to provide a mechanism for alternative metadata encodings to be defined and used, more info in this comment. Note that this does not address the original request in this issue from @axtimwalde to provide a mechanism to support native storage of metadata, e.g., in an HDF5 backend. However, it would provide a mechanism to support use of encodings like flatbuffers or msgpack. Comments very welcome, just food for discussion. |
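For discussion, a minimal Python sketch of what a pluggable metadata encoding could look like from an implementation's point of view. This is not the mechanism defined in the development branch: the registry, the function names, and the storage key are all hypothetical, and binary property lists stand in for msgpack or flatbuffers only because `plistlib` ships with Python.

```python
import json
import plistlib

# Hypothetical registry: encoding name -> (encode, decode).
ENCODINGS = {
    "json": (
        lambda doc: json.dumps(doc).encode("utf-8"),
        lambda raw: json.loads(raw.decode("utf-8")),
    ),
    # Stand-in for a binary encoding such as msgpack or flatbuffers.
    "plist": (
        lambda doc: plistlib.dumps(doc, fmt=plistlib.FMT_BINARY),
        plistlib.loads,
    ),
}


def write_metadata(store, key, doc, encoding="json"):
    """Encode a metadata document and put the bytes into a key/value store."""
    encode, _ = ENCODINGS[encoding]
    store[key] = encode(doc)


def read_metadata(store, key, encoding="json"):
    """Fetch the bytes for a key and decode them with the named encoding."""
    _, decode = ENCODINGS[encoding]
    return decode(store[key])


# Illustrative document and key; neither follows the actual v3.0 layout.
doc = {"shape": [100, 100], "chunk_grid": {"type": "regular", "chunk_shape": [10, 10]}}
store = {}
write_metadata(store, "example/array/metadata", doc, encoding="plist")
decoded = read_metadata(store, "example/array/metadata", encoding="plist")
```

The store only ever sees bytes, so the same nested document survives either encoding; what the spec would still have to settle is how a reader learns which encoding was used.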
As briefly discussed in the group chat, I would like to propose a change to how metadata and attributes are accessed. The current spec is specific that this data must be readable and writable as JSON. This is compatible with all current storage backends of Zarr and the filesystem and cloud storage backends of N5. It is not compatible with the current HDF5 backend of N5 where attributes and metadata are represented as HDF5 attributes. Instead of requiring JSON, I suggest that metadata and attribute access should be specified similarly to the group and array access protocol of the spec, i.e. as a set of access primitives (an API). The most basic primitives would be:
getAttribute - Retrieve the value associated with a given key and attributeKey.
setAttribute - Store a (key, attributeKey, value) triple.
Probably also something to list attributes and maybe infer their types if necessary.
The N5 API does it this way and I find it very straightforward to use this across JSON and non-JSON backends:
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L214
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L271
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L43
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L59
and the default JSON implementation which is only bloated to support version 0 with non auto-inferred compressors
https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/AbstractGsonReader.java
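The primitives above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the N5 API and not proposed spec text: the class and method names are hypothetical, and in-memory dicts stand in for both a JSON document store and a native (e.g. HDF5) attribute store.

```python
import json
from abc import ABC, abstractmethod


class AttributeStore(ABC):
    """Hypothetical interface mirroring the proposed access primitives."""

    @abstractmethod
    def get_attribute(self, key, attribute_key):
        """Return the value stored under (key, attribute_key)."""

    @abstractmethod
    def set_attribute(self, key, attribute_key, value):
        """Store a (key, attribute_key, value) triple."""

    @abstractmethod
    def list_attributes(self, key):
        """Return the sorted attribute names stored for key."""


class JsonAttributeStore(AttributeStore):
    """Keeps one JSON document per key, as a JSON-based backend might."""

    def __init__(self):
        self._docs = {}  # key -> JSON text

    def _load(self, key):
        return json.loads(self._docs.get(key, "{}"))

    def get_attribute(self, key, attribute_key):
        return self._load(key)[attribute_key]

    def set_attribute(self, key, attribute_key, value):
        doc = self._load(key)
        doc[attribute_key] = value
        self._docs[key] = json.dumps(doc)

    def list_attributes(self, key):
        return sorted(self._load(key))


class NativeAttributeStore(AttributeStore):
    """Stores each attribute separately, standing in for native attributes."""

    def __init__(self):
        self._attrs = {}  # (key, attribute_key) -> value

    def get_attribute(self, key, attribute_key):
        return self._attrs[(key, attribute_key)]

    def set_attribute(self, key, attribute_key, value):
        self._attrs[(key, attribute_key)] = value

    def list_attributes(self, key):
        return sorted(a for (k, a) in self._attrs if k == key)
```

Calling code would depend only on `AttributeStore`, so the same `set_attribute("/volumes/raw", "resolution", [4.0, 4.0, 40.0])` call works against either backend.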