Non-JSON metadata and attributes #37

Open
axtimwalde opened this issue May 23, 2019 · 11 comments
Labels
protocol-extension Protocol extension related issue

Comments

@axtimwalde

axtimwalde commented May 23, 2019

As briefly discussed in the group chat, I would like to propose a change to how metadata and attributes are accessed. The current spec requires that this data be readable and writable as JSON. This is compatible with all current storage backends of Zarr and with the filesystem and cloud storage backends of N5. It is not compatible with the current HDF5 backend of N5, where attributes and metadata are represented as HDF5 attributes. Instead of requiring JSON, I suggest that metadata and attribute access be specified in the same way as the group and array access protocol of the spec, i.e. as access primitives (an API). The most basic primitives would be:

getAttribute - Retrieve the value associated with a given key and attributeKey.

| Parameters: `key`, `attributeKey`, [`type`]
| Output: `value`

setAttribute - Store a (key, attributeKey, value) triple.

| Parameters: `key`, `attributeKey`, `value`
| Output: none

Probably also something to list attributes and maybe infer their types if necessary.
The N5 API does it this way, and I find it very straightforward to use across JSON and non-JSON backends:

https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L214

https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L271

https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L43

https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L59

and the default JSON implementation, which is bloated only in order to support version 0 with non-auto-inferred compressors:

https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/AbstractGsonReader.java
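To make the proposal above concrete, here is a minimal Python sketch of what backend-agnostic attribute primitives could look like. It is not the N5 API and not part of any spec: the class names, the snake_case method names, and the `list_attributes` helper are all assumptions made for illustration, and the "native" store is just an in-memory stand-in for a backend such as HDF5 attributes.

```python
import json
from typing import Any, Dict, List, Protocol, Tuple


class AttributeStore(Protocol):
    """Hypothetical access primitives, named after the proposal above."""

    def get_attribute(self, key: str, attribute_key: str) -> Any: ...
    def set_attribute(self, key: str, attribute_key: str, value: Any) -> None: ...
    def list_attributes(self, key: str) -> List[str]: ...


class JsonAttributeStore:
    """Keeps one serialized JSON document per node key."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}  # node key -> JSON text

    def get_attribute(self, key: str, attribute_key: str) -> Any:
        return json.loads(self._docs.get(key, "{}"))[attribute_key]

    def set_attribute(self, key: str, attribute_key: str, value: Any) -> None:
        doc = json.loads(self._docs.get(key, "{}"))
        doc[attribute_key] = value
        self._docs[key] = json.dumps(doc)

    def list_attributes(self, key: str) -> List[str]:
        return sorted(json.loads(self._docs.get(key, "{}")))


class NativeAttributeStore:
    """Stores each value under a (key, attribute_key) pair.

    In-memory stand-in (an assumption) for a backend with native
    attributes, e.g. HDF5 attributes; no JSON is involved.
    """

    def __init__(self) -> None:
        self._attrs: Dict[Tuple[str, str], Any] = {}

    def get_attribute(self, key: str, attribute_key: str) -> Any:
        return self._attrs[(key, attribute_key)]

    def set_attribute(self, key: str, attribute_key: str, value: Any) -> None:
        self._attrs[(key, attribute_key)] = value

    def list_attributes(self, key: str) -> List[str]:
        return sorted(a for (k, a) in self._attrs if k == key)


def describe(store: AttributeStore, key: str) -> Dict[str, Any]:
    """Client code that only uses the primitives, never the encoding."""
    return {name: store.get_attribute(key, name)
            for name in store.list_attributes(key)}
```

Code like `describe` works unchanged against either store, which is the point of specifying primitives rather than a serialization format.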

@alimanfoo
Member

alimanfoo commented May 24, 2019 via email

@alimanfoo
Member

Just to generalise a bit, I think there are two possible sets of requirements here:

(1) If/how to support storage implementations which have some "native" mechanism for storing metadata (e.g., N5's HDF5 backend).

(2) If/how to support alternative encodings of metadata (e.g., MessagePack instead of JSON).

In terms of the v3.0 core protocol, do we try to create a framework that can accommodate either of these requirements, if so how? This might mean just providing the right foundation to allow protocol extensions to address them, rather than fully addressing them within the core protocol.

@alimanfoo alimanfoo added the core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec label May 24, 2019
@joshmoore
Member

My current thinking (to prevent storing a dataset of XML) was to convert OME-XML to the upcoming OME-JSON-LD and put that in the block of metadata. Either a hierarchical JSON tree would work, or a set of triples could represent the underlying RDF. Depending on allowed keys, it's conceivable that one could map the Subject and the Predicate into a single key but it won't be attractive:

    "@id" : "arc:arc0",
    "@type" : [ "ome:Arc", "ome:ManufacturerSpec" ],
    "identifier" : "LightSource:1",
    "ome:arcType" : {
      "@id" : "arcType:Xe"
    },

https://gitlab.com/openmicroscopy/incubator/ome-owl/blob/master/ontology/RDF/JSON-LD/2016-06/sample/instrument_data.json#L274

@alimanfoo
Member

Hi @joshmoore, I would imagine it should be fine to include some JSON-LD within a zarr array metadata document. I have to confess I don't fully grok the JSON-LD syntax, but I'd hope something like this was OK:

{
    "zarr_format": "http://purl.org/zarr/spec/protocol/core/3.0",
    "shape": [10000, 1000],
    "data_type": "<f8",
    "chunk_grid": {
        "type": "regular",
        "chunk_shape": [1000, 100]
    },
    "chunk_memory_layout": "C",
    "chunk_codecs": [
        {
            "codec": "http://purl.org/zarr/spec/codec/gzip",
            "level": 1
        }
    ],
    "fill_value": "NaN",
    "extensions": [],
    "attributes": {
        "foo": 42,
        "bar": "apples",
        "baz": [1, 2, 3, 4],
        "OME": {
            // some block of OME-JSON-LD
        }
    }
}

Someone please correct me if this doesn't work.
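As a quick sanity check of the idea above, the following Python sketch embeds the JSON-LD fragment quoted earlier in the thread under a hypothetical `"OME"` attribute and round-trips the whole document through the standard `json` module. The metadata is abbreviated, and nothing here validates the block as JSON-LD; it only shows that keys such as `"@id"` are ordinary JSON strings.

```python
import json

# JSON-LD fragment taken from the OME sample quoted earlier in the thread.
ome_block = {
    "@id": "arc:arc0",
    "@type": ["ome:Arc", "ome:ManufacturerSpec"],
    "identifier": "LightSource:1",
    "ome:arcType": {"@id": "arcType:Xe"},
}

# Abbreviated array metadata; "OME" is just an illustrative attribute name.
array_metadata = {
    "zarr_format": "http://purl.org/zarr/spec/protocol/core/3.0",
    "shape": [10000, 1000],
    "attributes": {"foo": 42, "bar": "apples", "OME": ome_block},
}

encoded = json.dumps(array_metadata, indent=4)  # text that would be stored
decoded = json.loads(encoded)                   # what a reader would get back
```

One caveat: the `// some block of OME-JSON-LD` placeholder in the example above is shorthand only. Strict JSON has no comment syntax, so a real document would need the actual block in its place.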

@alimanfoo
Member

Just following up on this...

> (1) If/how to support storage implementations which have some "native" mechanism for storing metadata (e.g., N5's HDF5 backend).

I'm currently thinking that it's not worth the trouble to try to accommodate the way the existing N5 HDF5 backend stores metadata. This is simply because the flat name/value pair model for metadata is very restrictive, and not rich enough to express some of the basic things we want to express in the core metadata, or which some applications might want to store in user metadata (like the OME example). So I'm not planning to make any spec changes to accommodate this. Please push back if anyone disagrees.
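To illustrate the restriction, here is a rough Python sketch of what squeezing nested attributes into flat name/value pairs might involve. This is an assumption-laden illustration, not how the N5 HDF5 backend actually stores anything: the `/` separator, the `flatten`/`unflatten` helpers, and the choice to JSON-encode leaf values are all made up for the example.

```python
import json
from typing import Any, Dict

SEP = "/"  # hypothetical path separator; breaks if a key itself contains "/"


def flatten(attrs: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Map a nested attribute tree onto flat name -> string pairs."""
    flat: Dict[str, str] = {}
    for name, value in attrs.items():
        path = f"{prefix}{SEP}{name}" if prefix else name
        if isinstance(value, dict) and value:
            # Non-empty objects become a run of path-joined keys.
            flat.update(flatten(value, path))
        else:
            # Leaves (including lists and empty objects) are JSON-encoded.
            flat[path] = json.dumps(value)
    return flat


def unflatten(flat: Dict[str, str]) -> Dict[str, Any]:
    """Rebuild the nested tree from the path-joined keys."""
    nested: Dict[str, Any] = {}
    for path, encoded in flat.items():
        *parents, leaf = path.split(SEP)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = json.loads(encoded)
    return nested
```

Even in this small sketch the workarounds pile up: a reserved separator, an extra encoding layer for lists, and a convention every reader has to know in order to reconstruct the structure.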

> (2) If/how to support alternative encodings of metadata (e.g., MessagePack instead of JSON).

This is something I can see the potential value of, at least how to leave the door open for this to be explored. However, I don't want to overcomplicate the core spec, so I won't try to accommodate this currently, unless someone specifically asks for it.

@jakirkham
Member

FWIW the way Zarr handles this problem today is to provide a way for users to copy from Zarr to HDF5. IMHO it seems reasonable to continue with that strategy going forward.

As to using an alternative to JSON, we would be interested in this. In particular protobuf came up as an interesting option.

@alimanfoo
Member

> As to using an alternative to JSON, we would be interested in this. In particular protobuf came up as an interesting option.

Using protobuf it should certainly be possible to express all of the core metadata. One question would be how it would handle user attributes, where you cannot predefine the schema ahead of time. But maybe that can be worked around somehow. In any case, I'd be happy to figure out how to write the spec to allow for alternative metadata encodings.

@alimanfoo
Member

Interestingly, it looks like Arrow is using flatbuffers. Flatbuffers seem easier to accommodate than protobuf because of their support for unions. I'm thinking we could keep JSON as the canonical format, but could also create a flatbuffers schema for the core metadata, if only to know it was possible, i.e., to check we hadn't come up with a metadata structure that was hard to encode in something other than JSON.

@joshmoore
Member

> Someone please correct me if this doesn't work.

Yes, as long as there is a place to "embed" a JSON tree, I assume I can make it work. (Note: that could also be another file if that's preferable)

@alimanfoo
Member

Just to say I've done some work on the v3.0 core protocol spec in the development branch to provide a mechanism for alternative metadata encodings to be defined and used; more info in this comment. Note that this does not address the original request in this issue from @axtimwalde to provide a mechanism to support native storage of metadata, e.g., in an HDF5 backend. However, it would provide a mechanism to support use of encodings like flatbuffers or msgpack. Comments very welcome, just food for discussion.
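For discussion, here is one way pluggable metadata encodings could look from an implementation's point of view. This is only a sketch and does not reflect the actual mechanism in the development branch: the `ENCODINGS` registry, the encoding names, and the `write_metadata`/`read_metadata` helpers are hypothetical. Because flatbuffers and msgpack need third-party packages, Python's standard-library binary plist format is used purely as a stand-in for "some binary encoding".

```python
import json
import plistlib
from typing import Any, Callable, Dict, Tuple

Document = Dict[str, Any]
Codec = Tuple[Callable[[Document], bytes], Callable[[bytes], Document]]

# Hypothetical registry: encoding name -> (encode, decode).
ENCODINGS: Dict[str, Codec] = {
    # JSON stays the canonical encoding.
    "json": (
        lambda doc: json.dumps(doc).encode("utf-8"),
        lambda raw: json.loads(raw.decode("utf-8")),
    ),
    # Stand-in for a binary encoding such as msgpack or flatbuffers.
    "binary-plist": (
        lambda doc: plistlib.dumps(doc, fmt=plistlib.FMT_BINARY),
        lambda raw: plistlib.loads(raw),
    ),
}


def write_metadata(doc: Document, encoding: str = "json") -> bytes:
    """Serialize a metadata document with the named encoding."""
    encode, _ = ENCODINGS[encoding]
    return encode(doc)


def read_metadata(raw: bytes, encoding: str = "json") -> Document:
    """Deserialize bytes that were written with the named encoding."""
    _, decode = ENCODINGS[encoding]
    return decode(raw)
```

The same metadata document should come back unchanged whichever encoding is chosen, which is roughly the property a schema in another format would need to demonstrate.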

@jstriebel
Member

see also #141 and #81

6 participants