
Validation tests #256

Closed
will-moore opened this issue Nov 10, 2021 · 5 comments
Labels
bug (something isn't working), question (further information is requested)

Comments

@will-moore

Hi,
I read in the README https://github.com/HumanBrainProject/openMINDS_core#tests that "In the tests directory you can find JSON-LDs designed to test the validation behaviour of each schema", but I don't see a tests/ directory?

I'd like to understand how OpenMINDS does validation of JSON data against a schema.
My initial investigation is described at ome/ngff#75

Thanks,
Will

@lzehl
Member

lzehl commented Nov 10, 2021

Hi Will, thanks a lot for raising this issue! Here some comments from my side, but I hope @skoehnen & @olinux will comment as well:

The tests/ directories are indeed missing at the moment. They are not used for the actual validation; they are meant for testing the validation behaviour itself. They are missing, and partially out of date, because we currently lack the manpower. I hope we can start tackling these open issues at the beginning of the new year at the latest.

The schemas themselves are validated by the openMINDS pipeline. Here I would refer you to @olinux for details.

As for the general validation of JSON-LDs (metadata instances) against the schemas: technically, you can use any JSON-Schema validator against the formal JSON-Schemas of openMINDS (not the *.schema.tpl.json files; see note below).
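To make this concrete, here is a minimal sketch using the Python `jsonschema` package. The schema below is a hypothetical stand-in for illustration only, not an actual openMINDS JSON-Schema:

```python
# Sketch: validating a metadata instance against a minimal, hypothetical
# JSON-Schema with the `jsonschema` package (pip install jsonschema).
# This is NOT an actual openMINDS schema.
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "required": ["@type", "fullName"],
    "properties": {
        "@type": {"const": "https://openminds.ebrains.eu/core/Person"},
        "fullName": {"type": "string"},
    },
}
validator = Draft7Validator(schema)

good = {"@type": "https://openminds.ebrains.eu/core/Person", "fullName": "Jane Doe"}
bad = {"fullName": "Jane Doe"}  # missing the required @type

# A valid instance yields no errors; an invalid one yields error messages.
assert not list(validator.iter_errors(good))
errors = [e.message for e in validator.iter_errors(bad)]
print(errors)  # one error about the missing '@type' property
```

`iter_errors` collects all violations instead of stopping at the first one, which is convenient for producing a full validation report.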

@olinux you can maybe explain how the validation for instances is done in the EBRAINS KG.

If you use the openMINDS Python library, validation should be done by the package itself, but this apparently does not work / is not implemented yet (@skoehnen, please comment).

Please note that openMINDS Python is released as an alpha version; not all features are fully implemented yet.
Validation is one feature we have not yet tested/discussed in detail, so I'm grateful that you brought this issue up!

I tested it myself, and the instances are indeed not fully validated yet (specifically, the expected value type of an instance does not seem to be checked correctly). This is a bit tricky because JSON-Schema cannot handle JSON-LD linkages well (as far as I know), which might be why this validation was not implemented yet.
@skoehnen can we correct / tackle this rather soonish?

Some notes/questions:

  • you can always look at the openMINDS schema template (*.schema.tpl.json), but the instances are validated against the respective formal JSON-Schemas.
  • what version did you select for the openMINDS collection?
  • can you explain a bit ngff for me/us?
  • let us/me know if you want to set up a meeting for further discussion.

🙂 Lyuba

@will-moore
Author

Hi Lyuba, thanks for your reply.
We at OME are working on a "Next generation file format" for (bio)-imaging data, which is based on the Zarr format, with the metadata being stored in JSON (maybe moving to JSON-LD in future). It's early days, but you can read the current ngff spec at https://ngff.openmicroscopy.org/latest/.

We are starting to look at how we can validate the JSON metadata in ngff data, since we expect these files to be produced by many different users and tools and we want to be able to quickly validate the output.
We've been looking at various options, e.g. SALAD and SHACL, and also openMINDS.

For the testing on ome/ngff#75 I had openMINDS==0.0.9.
I'll look more at JSON-Schema validation...

Will.

@lzehl
Member

lzehl commented Nov 11, 2021

Ah I understand. So for openMINDS we make use of existing validators.

The current target format for openMINDS is JSON-Schema for the schemas and JSON-LD for the metadata instances. Apart from the LD part, our instances can be tested with any JSON-Schema validator. The LD part is trickier: depth 1 (parent/child) is fine and can be covered by JSON-Schema with some tweaks, but actual graph validation across multiple nodes is not covered.
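To illustrate the "depth 1" point, here is a small pure-Python sketch (the helper name and IRI pattern are mine, not openMINDS API) of the only linkage check a flat validator can perform: confirm that a linked child is an object of the form `{"@id": "<IRI>"}`, without following the IRI to verify the child's actual @type:

```python
import re

# Hypothetical helper illustrating the depth-1 link check: JSON-Schema can
# require that a linked child looks like {"@id": "<IRI>"}, but it cannot
# dereference the IRI to validate the child node itself (the "graph" part).
IRI = re.compile(r"^https?://\S+$")

def check_link(value):
    """Return True if `value` is shaped like a JSON-LD link: {"@id": "<IRI>"}."""
    return (
        isinstance(value, dict)
        and set(value) == {"@id"}
        and isinstance(value["@id"], str)
        and bool(IRI.match(value["@id"]))
    )

assert check_link({"@id": "https://example.org/instances/person-1"})
assert not check_link({"@id": 42})          # @id must be a string IRI
assert not check_link({"name": "no link"})  # not a link object at all
```

Anything beyond this shape check (e.g. "the linked node must itself be a Person") requires resolving the link, which is exactly what plain JSON-Schema validators do not do.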

The openMINDS syntax (.schema.tpl.json) does not have its own instance validator at the moment (I also do not think we aim for one, but that may change). This means openMINDS instances are always validated against the schemas after translation to the JSON-Schema target format.

In the long run, we will probably also support SHACL as a target format for the openMINDS schemas.

Closer to your approach is probably the validator of BIDS (Brain Imaging Data Structure); not sure if you have checked that out already?

Thanks for raising the issue though. We definitely need to fix the validation of value types in the Python library 😉
I hope my comments clarified a bit more how we do our validation.

@lzehl lzehl added the bug (something isn't working) and question (further information is requested) labels Nov 11, 2021
@skoehnen
Collaborator

To add some links: we use this validator for JSON-Schema files: https://pypi.org/project/jsonschema/
The validation of JSON-LDs across links is definitely more complicated.

@lzehl
Member

lzehl commented Jul 26, 2022

@skoehnen just as a reminder that we could implement this as a feature for the new version of the Python library. I suggest implementing the validation as a separate function that a user can call when ready (with warning/error reports):

  • validation of individual instance, e.g., my_instance.validate()
  • validation of collection, e.g., my_collection.validate()

Validation should include simple value constraints (including the type of a linked child) for individual instances.
In the future, this can be complemented with model validation based on model constraints (which are currently not yet formulated in the schemas).
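A minimal sketch of what such an opt-in `validate()` API could look like (entirely hypothetical; the class names, fields, and checks below are illustrative, not the actual openMINDS Python implementation):

```python
# Hypothetical sketch of the proposed validate() API: each instance and each
# collection exposes a validate() method returning an error report (a list of
# messages; empty means valid). Not the actual openMINDS Python library.
from dataclasses import dataclass


@dataclass
class Instance:
    data: dict
    required: tuple = ("@type",)  # illustrative constraint only

    def validate(self) -> list:
        """Check simple value constraints; return a list of error messages."""
        errors = []
        for key in self.required:
            if key not in self.data:
                errors.append(f"missing required property '{key}'")
        return errors


@dataclass
class Collection:
    instances: list

    def validate(self) -> list:
        """Validate every instance and aggregate the error reports."""
        errors = []
        for i, inst in enumerate(self.instances):
            errors += [f"instance {i}: {msg}" for msg in inst.validate()]
        return errors


ok = Instance({"@type": "https://openminds.ebrains.eu/core/Person"})
bad = Instance({"fullName": "Jane Doe"})  # missing @type
report = Collection([ok, bad]).validate()
print(report)
```

Returning a report instead of raising on the first problem matches the "warning / error reports" idea above: the caller decides whether any given message is fatal.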

@lzehl lzehl closed this as completed Jul 26, 2022