Create GitHub Action to validate JSON-LD files #3

Open
pbuttigieg opened this issue Aug 25, 2023 · 6 comments

@pbuttigieg
Contributor

@pieterprovoost @marc-portier would you know of any off-the-shelf validators we can deploy in a GitHub Action to make sure the JSON-LD/schema.org files MBO participants create are in good shape?

Something that essentially runs https://validator.schema.org/ on the collection as it emerges?

pbuttigieg added the "question" label (Further information is requested) on Aug 25, 2023
@pieterprovoost
Contributor

It looks like https://validator.schema.org/ is not available as a package or API, but I came across these candidates:

Can we assume that all JSON-LD documents will be added to this repository, or do we also need to validate documents embedded in web pages, etc.?

@marc-portier

my reflex on this would be to use standard python stuff for the lower levels like

  • json.tool to verify the low-level syntax (as I suspect issues will already show up there)
  • RDFLib's Graph.parse to check whether things actually represent a workable knowledge graph

and then don't be shy to manage our own SHACL application profile (AP) description for optimal control of:

  • the expressed triples / property paths we find important
  • the meaningful error messages it should produce in the SHACL report
  • the formal documentation this SHACL then represents

with that in place we can simply slam in RDFlib/pyshacl to do the validation
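
A minimal sketch of that layered approach, assuming `rdflib` (version 6 or later, which bundles a JSON-LD parser) and `pyshacl` are installed. The function names and the `shapes_path` argument are hypothetical and not taken from the repository; the third-party imports are done lazily so the JSON stage runs with the standard library alone.

```python
import json


def check_json_syntax(text):
    """Stage 1: low-level syntax check (the same parse json.tool relies on).

    Returns None when the text is valid JSON, else a short error description.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


def parse_graph(text, base_uri=None):
    """Stage 2: parse the JSON-LD into an RDF graph; raises if parsing fails."""
    from rdflib import Graph  # third-party, imported lazily

    graph = Graph()
    graph.parse(data=text, format="json-ld", publicID=base_uri)
    return graph


def check_shacl(data_graph, shapes_path):
    """Stage 3: validate the graph against our own SHACL shapes (Turtle file)."""
    from rdflib import Graph
    from pyshacl import validate

    shapes = Graph().parse(shapes_path, format="turtle")
    conforms, report_graph, report_text = validate(data_graph, shacl_graph=shapes)
    return conforms, report_graph, report_text
```

One thing worth checking between stages 2 and 3 is `len(graph) == 0`: a document without a usable `@context` can parse cleanly and still yield no triples, in which case the SHACL stage has nothing to report on.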

in fact, it also keeps things standard enough that non-Python implementations of this workflow could be considered at any time (less lock-in?)

with respect to getting hold of the rdf - per question of @pieterprovoost:

  • I would start off with the assumption of clear URIs that point to RDF (JSON-LD by default, but RDF/XML or Turtle could easily be supported too) and see how far that gets us
  • only later investigate more elaborate harvesting/crawling/scraping techniques:
    • using content negotiation
    • reading FAIR-signposting links
    • checking for embedded script tags through some HTML parsing (e.g. the Python bs4 package)
      (but even then I would keep the harvesting separate from the validation, and just make sure we keep some provenance trail so a possible error report can be linked back to the actual source?)
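
As an illustration of the last option, here is a rough sketch of pulling embedded JSON-LD out of an HTML page while keeping a provenance record. It uses the standard library's `html.parser` rather than the bs4 package mentioned above, only to stay dependency-free; the class name, function name, and record layout are assumptions made for this example.

```python
from html.parser import HTMLParser


class JsonLdScriptExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def harvest_embedded_jsonld(html, source_url):
    """Return one record per embedded JSON-LD block, tagged with its origin."""
    parser = JsonLdScriptExtractor()
    parser.feed(html)
    parser.close()
    return [
        {"source": source_url, "index": i, "text": text}
        for i, text in enumerate(parser.blocks)
    ]
```

The harvester only extracts text and records where it came from; it does not parse or validate anything, so a later validation step can report errors against `source` and `index` without knowing how the document was obtained.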

@pieterprovoost
Contributor

I have created a minimal proof of concept for validating JSON-LD documents hosted in this repository.

  • The repo now has a datasets folder for JSON-LD documents.
  • There's also a validation folder with scripts and shacl subfolders.
  • The docs folder contains a Jekyll template.
  • On every push, a script checks all JSON-LD documents in the datasets folder and performs JSON and SHACL validation. The validation results are added to the Jekyll website in a separate reports branch.
  • When there are changes in the reports branch, the Jekyll website with validation reports gets deployed at https://lab.marcobolo-project.eu/dataset-catalogue/.
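
For readers who want a feel for the per-push check, the JSON part could look roughly like the sketch below. This is not the script in the repository; the `*.jsonld` file pattern, the function name, and the row layout are assumptions made for illustration.

```python
import json
from pathlib import Path


def validate_folder(folder, pattern="*.jsonld"):
    """Run a JSON syntax check on every matching file; return one row per file."""
    root = Path(folder)
    rows = []
    for path in sorted(root.rglob(pattern)):
        name = path.relative_to(root).as_posix()
        try:
            json.loads(path.read_text(encoding="utf-8"))
            rows.append({"file": name, "valid_json": True, "error": ""})
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            # Keep the parser's message so the report can show what went wrong.
            rows.append({"file": name, "valid_json": False, "error": str(exc)})
    return rows
```

A workflow step could call `validate_folder("datasets")`, write the rows into the report page, and exit with a non-zero status when any row has `valid_json` set to `False`, so the push is flagged.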

@marc-portier

marc-portier commented Sep 8, 2023

cool work

would be great to add explicit sh:message and sh:severity -- getting those from the SHACL validation result into the generated page could help us guide people to what they actually need to change ... --> but maybe something to address in the scope of #4
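
A sketch of how that could work, assuming the report graph comes from `pyshacl` and is an `rdflib` graph. In the SHACL validation report, a shape's sh:message shows up as sh:resultMessage and its sh:severity as sh:resultSeverity, so those are the properties read here. The function names and the Markdown table layout are hypothetical.

```python
def extract_results(report_graph):
    """Pull severity, message, focus node and path out of a SHACL report graph."""
    from rdflib.namespace import RDF, SH  # third-party, imported lazily

    results = []
    for result in report_graph.subjects(RDF.type, SH.ValidationResult):
        severity = report_graph.value(result, SH.resultSeverity)
        results.append({
            # Severity IRIs end in #Violation, #Warning or #Info.
            "severity": str(severity).rsplit("#", 1)[-1] if severity else "Violation",
            "message": str(report_graph.value(result, SH.resultMessage) or ""),
            "focus": str(report_graph.value(result, SH.focusNode) or ""),
            "path": str(report_graph.value(result, SH.resultPath) or ""),
        })
    return results


SEVERITY_ORDER = {"Violation": 0, "Warning": 1, "Info": 2}


def render_table(results):
    """Render the results as a Markdown table, most severe first."""
    lines = [
        "| Severity | Message | Focus node | Path |",
        "| --- | --- | --- | --- |",
    ]
    for r in sorted(results, key=lambda r: SEVERITY_ORDER.get(r["severity"], 3)):
        lines.append(f"| {r['severity']} | {r['message']} | {r['focus']} | {r['path']} |")
    return "\n".join(lines)
```

`Graph.value` returns a single value, so a result carrying several messages (for example one per language) would need a loop over `report_graph.objects(result, SH.resultMessage)` instead.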

@pieterprovoost
Contributor

Results are presented in tabular format now, and presentation can be further improved once we get some more extensive / realistic validation results.

@kmexter
Contributor

kmexter commented Aug 22, 2024

Is this issue to be closed?
