Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$schema should point to meta-schema incorporating our changes to JSON Schema #566

Open
jpmckinney opened this issue Sep 16, 2017 · 14 comments
Labels
bug Focus - Extensions Relating to new or proposed extensions, or the governance and maintenance of extensions Focus - Packages Relating to release packages and record packages
Milestone

Comments

@jpmckinney
Copy link
Member

jpmckinney commented Sep 16, 2017

Originally reported via email by @devgateway, quoted below:

The $schema section should define the meta-schema of the current schema. This is used by validation tools that comply with the standard, to correctly identify entities that are part of the schema. Further read on the standard page:

http://json-schema.org/latest/json-schema-core.html#rfc.section.7

and another third party page linked from json-schema.org, they say it is a good read:

https://spacetelescope.github.io/understanding-json-schema/reference/schema.html

The issue is OCDS defines draft-04 as the reference meta-schema: "$schema": "http://json-schema.org/draft-04/schema#"

Because extra properties are defined in the OCDS schema files, the line above should read something like this:

"$schema": "http://standard.open-contracting.org/schema/1__1/meta-schema.json"

which should be an extension of draft-04, adding the deprecated keyword, and others that were added on top of draft-04.

This meta-schema extension should allow:

  • validating the schema itself against the meta-schema (which is self-validating). For example if someone introduces now a typo in the definition of deprecated for a new deprecated field, say deprecatedVersoin instead of deprecatedVersion, this will be spotted easily.

  • make validation tools know what rules to apply for validation, right now they correctly ignore all these extra keywords because $schema says it only has entities defined from draft-04 (which in the case of OCDS release files is false).

So for me, because I use one of these validating libraries in jOCDS, I need to hack the OCDS schema and either remove $schema altogether (which is a problem in itself), or write my own $schema and define your extra keywords there...

@jpmckinney
Copy link
Member Author

jpmckinney commented Nov 2, 2017

It may be possible to simply re-use this (reformatting that info as JSON): https://github.com/open-contracting/standard-maintenance-scripts/blob/master/tests/test_json.py#L42-L57

@kindly
Copy link
Contributor

kindly commented Nov 3, 2017

@jpmckinney thanks for the link that was helpful.

In pull request #611 in the schema directory there is an adaptation of the draft-04 schema which is more strict (uses additionalProperties: false) and covers our use. It is also configured to run in the tests so travis will pick up issues. There is still some work todo on adding title and description tags to the fields we have added which @timgdavies said he would look at. Also, $ref and format properties had to be added to make the additionalProperties: false work.

I nonetheless have concerns about using this meta-schema in the the $schema tag.
Having a non-standard $schema tags means that there is not any machine way to detect what version of schema should be used when validating without looking at external docs to saying we use draft-04. The $schema tag has more meaning than just the meta-schema you are validating against it also refers to the actual draft document and the rules for each property.

Also it could be considered a breaking change and "could" break some libraries as they would no longer have a link to an official draft schema to validate against.

As our schemas are also valid against draft-4 then I think we should keep it in our schemas. draft-4 meta-schema itself does not validate itself (if you add additionalProperties: false due to the format and $ref) so this must be expected usage.

@jpmckinney
Copy link
Member Author

Hmm, but isn't the JSON document we are extending (http://json-schema.org/draft-04/schema#) the 'official draft schema to validate against'? Our meta-schema, itself, would have a $schema of http://json-schema.org/draft-04/schema#, but its id would be different.

Are libraries just looking at the string 'http://json-schema.org/draft-04/schema#' and deciding on a bunch of rules to apply, independent of the contents of the JSON document? I know that JSON Schema has non-schema rules about how 'format' is validated, and about what words like 'maxItems' mean, but I don't think those rules and semantics have changed across versions of JSON Schema. In other words, libraries shouldn't need to have different code paths for different versions. I contributed to a Ruby JSON Schema validator, and it didn't have special logic that followed from which schema version was being used.

http://json-schema.org/latest/json-schema-core.html#rfc.section.6.4 says:

Implementations MAY define additional keywords to JSON Schema. Save for explicit agreement, schema authors SHALL NOT expect these additional keywords to be supported by peer implementations. Implementations SHOULD ignore keywords they do not support.

Authors of extensions to JSON Schema are encouraged to write their own meta-schemas, which extend the existing meta-schemas using "allOf". This extended meta-schema SHOULD be referenced using the "$schema" keyword, to allow tools to follow the correct behaviour.

@kindly
Copy link
Contributor

kindly commented Nov 3, 2017

From my understanding there are non-schema rules across drafts that do change and that we can not be assured of them not in future.
For example the usage of required changed from being a boolean in properties to a list between draft 3 and 4 and the python library we use ended up special casing how properties worked in them:
https://github.com/Julian/jsonschema/blob/master/jsonschema/_validators.py#L243
https://github.com/Julian/jsonschema/blob/master/jsonschema/_validators.py#L294

However, you are correct that libraries could look at the meta-schemas $schema to determine the official version to look for but I am not sure they would do that.
I also have reservations about using allOf though as an extension mechanism as it produces very opaque error messages (at least in the python library).

On reflection, I do not mind if we do this, but we should at least wait for minor bump in OCDS version number if we do. There is quite a bit of work around adding the meta-schema.json to the official schema locations that has to be worked out and making sure the references in the id/&schema tags are in sync.

@jpmckinney
Copy link
Member Author

I mainly just want to kick the tires on this, so that we don't do things one way, and then decide at a later date that we should have done it the way the JSON Schema draft recommends all along.

Could you describe the specific work that would have to be done in a minor bump, while it's still fresh?

@kindly
Copy link
Contributor

kindly commented Nov 4, 2017

In a minor version bump we should do the following:

Governance

  • As this is a change to our JSON schema and it arguably a semantic change I think we should go through a advisory discussion like all the changes that happened in 1.1
  • We have to decide if we are adding this only to the new versions of the schema or we also do this to a new patch releases of 1.0 or to all existing schemas.
  • Have to decide if we have a new metaschema per patch release (what we do for all other schemas) or that we maintain different meta-schemas versions as they should change less often.

Documentation

  • We should add reference to this schema somewhere in the docs and add descriptions of each extra field, with translations etc.

Technical Changes

  • Add a test to make sure all schemas point to the correct metaschema (should be fairly straight forward)
  • The process of copying over the final schemas to the official locations is a manual step (purposefully so) and this procedure may need to be changed to add the meta-schema.json to those locations. This may be a lot more work if we do anything historically.
  • Need to decide if just having our own full copy of the schema (as in the pull request) or extend draft-4 in some way. For our case using anyOf (i.e anyOf our meta-schema or draft-04) will not work because draft-04 does not have additionalProperties: False, which means any extra property would still pass validation.

@jpmckinney
Copy link
Member Author

Thanks!

I think JSON Schema recommends allOf not anyOf for extending itself. You mentioned error messages being more opaque when using allOf. But in terms of benefit, would allOf mean that our extended schema could be shorter, instead of repeating all of draft-04?

@kindly
Copy link
Contributor

kindly commented Nov 8, 2017

I think using allOf would mean that we still need to copy the whole schema. This is also due to to having additionalProperties: False as our meta-schema would have to have all the properties too (or at least the ones we use) for it to pass. So we get bad error messages (these consist of basically saying that all of the schemas did not pass without saying why) without any gains in brevity.

My suggestion would be to use JSON merge patch internally. So we would have 2 files.
One would be just the patch (the minimal difference between draft-04 and our meta-schema) and this is the only one we should touch by hand. The other would be the the full meta-schema which we publish and link to in the $schema which would be generated from the patch as a process that we have tests for to make sure they happen (like the versioned schema).

@jpmckinney
Copy link
Member Author

Ah, true. All sounds good.

@kindly
Copy link
Contributor

kindly commented Nov 9, 2017

Pull request #611 now separates draft-04 from the patch we apply to it. It has tests too make sure the patch is applied if it modified.

@jpmckinney
Copy link
Member Author

jpmckinney commented Jan 4, 2018

There is still some work todo on adding title and description tags to the fields we have added which @timgdavies said he would look at.

These could be added to meta-schema-patch.json, but JSON Schema Draft 4 itself doesn't have titles or descriptions, so it seems acceptable to document the new fields in our documentation.

Noting that, before we replace $schema in packages with our own meta-schema.json, we need to address comments in: #566 (comment)

jpmckinney added a commit that referenced this issue Jan 4, 2018
[#566] Add meta-schema.json and test using it.
@mpostelnicu
Copy link

Hi guys

we have just implemented devgateway/jocds#14
Many thanks again for adding the metaschema files.

Wanted to see if there is a plan to include the new metaschema in the $schema attribute of the ocds schema from now on. For the moment we have to do this manually in order for validation to correctly work.

Thanks much !

@jpmckinney jpmckinney added this to the 1.2 milestone May 18, 2018
@jpmckinney
Copy link
Member Author

@mpostelnicu Great to hear! This should be part of 1.2, but we don't yet have a release schedule for it.

@jpmckinney
Copy link
Member Author

jpmckinney commented Nov 29, 2018

Related #426 See #426 (comment) for why these issues are distinct.

@jpmckinney jpmckinney added the Focus - Packages Relating to release packages and record packages label Oct 24, 2020
@jpmckinney jpmckinney added the Focus - Extensions Relating to new or proposed extensions, or the governance and maintenance of extensions label Jun 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Focus - Extensions Relating to new or proposed extensions, or the governance and maintenance of extensions Focus - Packages Relating to release packages and record packages
Projects
Status: To do: Refactoring
Development

No branches or pull requests

6 participants