Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand axes tests to check for correct units #144

Merged
merged 4 commits into from
May 6, 2022

Conversation

melissalinkert
Copy link
Member

Checks that units are (or aren't) written as expected.

@chris-allan noticed when reviewing #134 that the units written by bioformats2raw are valid OME units, but not valid units based on the list in https://ngff.openmicroscopy.org/latest/#axes-md. This isn't technically incorrect, but probably isn't setting a good example. Adding OME units to the spec is probably the easiest solution, but I assume we'll need to discuss.

/cc @sbesson, @joshmoore, @dgault

@melissalinkert
Copy link
Member Author

4a0d8e4 should fix the majority of vocabulary mismatches between OME schema units (https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#UnitsLength, https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#UnitsTime) and the units suggested by the NGFF spec.

There are still a few things in the OME schema which are not included in the NGFF spec:

  • decasecond
  • decameter
  • thou
  • line
  • lightyear
  • point
  • pixel
  • referenceframe

Most of those are extremely rare outside of an artificial test case, but referenceframe in particular will come from Bio-Formats occasionally.

In discussing with @sbesson, @joshmoore, and @jburel, we could choose to do one or more of the following to better address vocabulary mismatches:

  • Update bioformats2raw to remove all units and physical sizes from NGFF metadata (which includes closing this PR). Physical size information is then only in METADATA.ome.xml, and there is no risk of unit names conflicting with NGFF spec recommendations. Scaling information is kept, but only in pixels space.
  • Expand the list of recommended units in the NGFF spec to include the remainder of what is already in the OME schema (as listed above). This may be incompatible with UDUNITS-2.
  • Add a unit schema/namespace field to the NGFF spec, so that it's clear how (or with what library) the unit name should be interpreted. This would allow separate recommendations for UDUNITS-namespaced units and OME-namespaced units.
  • Add explicit mappings in bioformats2raw (and other implementations?) from the units listed above to existing NGFF units. This is easy enough for decasecond, decameter, etc. but unclear for referenceframe.
  • Update the NGFF axes spec such that if a unit is present, it MUST be one of the units defined in the spec. This doesn't solve the vocabulary differences, but reduces ambiguity in how units are to be specified.

This is somewhat related to a few other open spec issues, including ome/ngff#91 and ome/ngff#104.

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional tests and the mapping effort. It's reassuring to see that most units found in OME-XML can be mapped into the UDUNIT-2 vocabulary used in the NGFF spec via the enum name without the need for additional APIs/hardcoding. Converesely, this also means that any scale unit defined in the NGFF specification can be mapped back into the OME system of units. As a general rule, we should ensure these two systems do not start deviating in incompatible way and fill the equivalence gaps.

I think the solution proposed in this PR is the most straightforward way to allow to this tool to achieve strict compliance with the NGFF spec in the majority of the cases for the physical pixel size.

On the remaining units, I would be surprised is most of them do not have an equivalent in the UDUNITS-2 vocabulary. For pixel and/or referenceframe, it is possible the best mapping might be to remove the unit information in the NGFF space

@constantinpape any opinion on which ones of the options mentioned by @melissalinkert above should be converted as ome/ngff issues?

@constantinpape
Copy link

constantinpape commented Apr 5, 2022

just my 2 cents here:

  • Update bioformats2raw to remove all units and physical sizes from NGFF metadata (which includes closing this PR). Physical size information is then only in METADATA.ome.xml, and there is no risk of unit names conflicting with NGFF spec recommendations. Scaling information is kept, but only in pixels space.

I would be fairly unhappy with this solution, since we are recommending to use bioformats2raw to write NGFF metadata -- and with this solution a pretty integral part of this would be missing and could only be recovered from something not part of the NGFF spec at all.

I think my preferred solution would be:

On the remaining units, I would be surprised is most of them do not have an equivalent in the UDUNITS-2 vocabulary. For pixel and/or referenceframe, it is possible the best mapping might be to remove the unit information in the NGFF space

Also, what exactly are the meaning of these units: thou, line, referenceframe, point?
I have not heard the word thou in this context before; it sounds like Shakespeare ;). And for the others I don't see how these are meaningful units of measurement.

  • Update the NGFF axes spec such that if a unit is present, it MUST be one of the units defined in the spec. This doesn't solve the vocabulary differences, but reduces ambiguity in how units are to be specified.

Just for some context: one of the reasons we decided to keep this at SHOULD was to allow for other axis:type (e.g. angle) that would then have non-spatial/temporal units, to keep the spec extensible.

And I think an issue on any of this would be fine, but we can maybe narrow down the scope a bit further here.

@sbesson
Copy link
Member

sbesson commented Apr 13, 2022

Also, what exactly are the meaning of these units: thou, line, referenceframe, point?
I have not heard the word thou in this context before; it sounds like Shakespeare ;). And for the others I don't see how these are meaningful units of measurement.

thou is an imperial unit (1/1000 of an inch) and the udunits2 equivalent seems to be https://github.com/Unidata/UDUNITS-2/blob/c83da987387db1174cd2266b73dd5dd556f4476b/lib/udunits2-common.xml#L420-L426. Same for point which I think would map to https://github.com/Unidata/UDUNITS-2/blob/c83da987387db1174cd2266b73dd5dd556f4476b/lib/udunits2-common.xml#L549-L555

In general, the meaning of the OME length units should either be found in the XSD specification or in the generated documentation

Just for some context: one of the reasons we decided to keep this at SHOULD was to allow for other axis:type (e.g. angle) that would then have non-spatial/temporal units, to keep the spec extensible.

Since the 0.4 specification allows unspecified axis types, I agree trying to specify a complete list of units is not an option. For defined axis types such as space/time, I do not feel there is an issue associated with enforcing the list of available units. That's the first option I suggested in ome/ngff#91 so I will probably move this conversation there.

@chris-allan chris-allan merged commit b5f6c76 into glencoesoftware:master May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants