Strict meta-schema mode #682

Closed
jgonzalezdr opened this issue Nov 24, 2018 · 4 comments

Comments

@jgonzalezdr
Contributor

jgonzalezdr commented Nov 24, 2018

Problem description

Schema writers often use the wrong keyword instead of the right one, e.g. mistakenly using "minLength" (which only applies to strings) to limit the number of allowed properties or array items, so the intended constraint is not enforced by the schema.

Schema writers also too often introduce typos in keyword names, e.g. declaring "mininum" instead of "minimum", and again the intended constraint is not enforced.

These kinds of mistakes are hard to detect initially, because the schemas validate properly against the meta-schema, and too often the schemas are checked only with "proper" test instances (i.e. instances that already have the expected number of array items), and these test instances of course validate properly as expected.

Of course, testing schemas against "invalid" test instances as well helps detect these errors, but generating every single negative test for complex schemas is often not feasible at all.
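To make the failure mode concrete, here is a minimal sketch of how both mistakes slip through. The `validate` function is a hypothetical toy that understands only three keywords; it is not any real library's API. It only mimics the two behaviors that matter here: unknown keywords are ignored, and type-specific keywords are skipped when the instance has a different type.

```python
def validate(instance, schema):
    """Toy validator: return True if `instance` passes the keywords it knows."""
    checks = {
        # keyword: (Python types it applies to, predicate)
        "minimum": ((int, float), lambda value, limit: value >= limit),
        "minLength": (str, lambda value, limit: len(value) >= limit),
        "minItems": (list, lambda value, limit: len(value) >= limit),
    }
    for keyword, limit in schema.items():
        if keyword not in checks:
            continue  # unknown keywords, including typos, are silently ignored
        applies_to, predicate = checks[keyword]
        if isinstance(instance, bool) or not isinstance(instance, applies_to):
            continue  # keyword does not apply to this kind of instance
        if not predicate(instance, limit):
            return False
    return True


# Wrong keyword: the author meant "minItems", so an empty array is accepted.
print(validate([], {"minLength": 2}))   # True
# Typo: "mininum" is not a known keyword, so no lower bound is enforced.
print(validate(1, {"mininum": 5}))      # True
# The intended schemas reject the same instances.
print(validate([], {"minItems": 2}))    # False
print(validate(1, {"minimum": 5}))      # False
```

Only a negative test instance (an empty array, a value below the bound) reveals the problem, which is exactly the kind of test that tends to be missing.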

Solution proposal

A possible solution would be to add a strict "mode" for schema validation. The current behavior could be associated with the current meta-schema ("http://json-schema.org/draft-XX/schema#") and the strict mode with a new meta-schema ("http://json-schema.org/draft-XX/strict-schema#").

To avoid the use of "wrong" keywords, the changes at specification level would consist of two statements: in strict mode the "type" keyword is mandatory for schemas/subschemas, and keywords that only make sense when applied to a specific type must not be used in strict mode if the declared type of the subschema does not include that specific type.

To avoid typos in keywords, the specs should also state that in strict mode schemas must not contain properties that are not described in the specs.
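The three strict-mode rules above can be sketched as a small standalone check. This is only an illustration under stated assumptions: `strict_errors`, `TYPE_SPECIFIC` and `GENERAL` are hypothetical names, the keyword tables are deliberately incomplete, and the function only descends into "properties" and "items". It is not the strict meta-schema mentioned in this proposal.

```python
# Hypothetical, incomplete table: the type each type-specific keyword requires.
TYPE_SPECIFIC = {
    "minLength": "string", "maxLength": "string", "pattern": "string",
    "minimum": "number", "maximum": "number", "multipleOf": "number",
    "minItems": "array", "maxItems": "array", "uniqueItems": "array",
    "minProperties": "object", "maxProperties": "object", "required": "object",
}
# Hypothetical, incomplete set of other keywords this sketch recognizes.
GENERAL = {"type", "properties", "items", "enum", "const", "title", "description"}


def strict_errors(schema, path="#"):
    """Return a list of strict-mode violations found in `schema`."""
    if not isinstance(schema, dict):
        return []  # boolean schemas carry no keywords to check
    errors = []
    if "type" not in schema:
        errors.append(f"{path}: 'type' is mandatory in strict mode")
    declared = schema.get("type", [])
    declared = {declared} if isinstance(declared, str) else set(declared)
    if "integer" in declared:
        declared.add("number")  # numeric keywords also apply to integers
    for keyword in schema:
        if keyword in TYPE_SPECIFIC:
            needed = TYPE_SPECIFIC[keyword]
            if "type" in schema and needed not in declared:
                errors.append(f"{path}: '{keyword}' requires type '{needed}'")
        elif keyword not in GENERAL:
            errors.append(f"{path}: unknown keyword '{keyword}'")
    # Recurse into the subschemas this sketch knows about.
    properties = schema.get("properties", {})
    if isinstance(properties, dict):
        for name, subschema in properties.items():
            errors += strict_errors(subschema, f"{path}/properties/{name}")
    if isinstance(schema.get("items"), dict):
        errors += strict_errors(schema["items"], f"{path}/items")
    return errors


schema = {
    "type": "object",
    "properties": {
        "tags": {"type": "array", "minLength": 1},   # wrong keyword
        "age": {"type": "integer", "mininum": 0},    # typo
    },
}
for error in strict_errors(schema):
    print(error)
```

Both mistakes from the problem description are reported without needing a single negative test instance.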

I've created a strict meta-schema for draft-07 that I use to check schemas and that you may find here.

For the next draft, thanks to "$recursiveRef", the strict meta-schema could be just an extension of the "non-strict" one. Using "unevaluatedProperties": false (instead of "additionalProperties": false as in my example) would enable extension validation vocabularies while still avoiding keyword typos.

If there is interest in adding this feature I can prepare a PR with a proposal.

@handrews
Contributor

@jgonzalezdr thank you for the thorough write-up. An actual "strict mode" has been debated repeatedly over the past five years, under two sets of editors, and rejected every time. The JSON Schema spec should not address runtime modifiers for implementations, for the most part.

The places where implementations are allowed that sort of choice (e.g. the optional nature of format validation) are some of the most vexing aspects of working with JSON Schema. We are working to eliminate or restrict that unpredictability (although I don't think improvements for format will make it into this draft, as it's a very hard problem).

You have already identified the correct solution, which is to write your own strict meta-schema and use that instead. With the addition of $vocabulary alongside the other new keywords you mention, implementations will be able to tell that your strict meta-schema describes schemas with the same semantics as the standard meta-schema, just with different validation criteria.

This is exactly what we want, and the only piece that appears to be missing from your assessment of how an explicit strict meta-schema would be used. Without $vocabulary, implementations would have no programmatic way of realizing that the keyword semantics of valid schemas are identical to those described by the standard meta-schema.
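For illustration, a strict meta-schema of this kind might look roughly like the sketch below, written as a Python dict so it can be dumped to JSON. Several things here are assumptions rather than anything stated in this thread: the keyword shapes and the vocabulary URIs are the ones later published with draft 2019-09 and should be checked against the published meta-schema, and the `$id` is a hypothetical example URI.

```python
import json

# Assumption: base URI of the draft that eventually shipped these keywords.
BASE = "https://json-schema.org/draft/2019-09"

strict_meta_schema = {
    "$schema": f"{BASE}/schema",
    "$id": "https://example.com/strict-schema",  # hypothetical URI
    # Declaring the same vocabularies as the standard meta-schema signals that
    # keyword semantics are unchanged and only the validation criteria differ.
    "$vocabulary": {
        f"{BASE}/vocab/core": True,
        f"{BASE}/vocab/applicator": True,
        f"{BASE}/vocab/validation": True,
        f"{BASE}/vocab/meta-data": True,
        f"{BASE}/vocab/format": False,
        f"{BASE}/vocab/content": True,
    },
    # Lets the standard meta-schema's recursive references resolve back to this
    # schema, so the strict rule also applies to nested subschemas.
    "$recursiveAnchor": True,
    "allOf": [{"$ref": f"{BASE}/schema"}],
    # Reject any keyword that no referenced vocabulary meta-schema recognized.
    "unevaluatedProperties": False,
}

print(json.dumps(strict_meta_schema, indent=2))
```

A schema author would opt in by setting `$schema` to the strict meta-schema's `$id`, which keeps the choice in the author's hands.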

This puts the strictness under the control of the schema author (they can select the appropriate value for $schema) instead of under the control of whoever runs the implementation (requiring some sort of run-time switch, with an implicit replacement of the meta-schema).

If someone who is running validation wants to enforce a stricter approach, there are various ways to do that (e.g. just give your strict meta-schema the $id of the standard meta-schema and locally fake out your implementation; I do this routinely for testing purposes).
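The "fake out" trick can be sketched as a local lookup table keyed by `$id`. Everything named here (`normalize`, `build_registry`, `meta_schema_for`, and the placeholder strict meta-schema body) is hypothetical; real implementations expose their own mechanisms for preloading schemas, which this sketch does not try to reproduce.

```python
STANDARD_ID = "http://json-schema.org/draft-07/schema#"


def normalize(uri):
    """Drop a trailing empty fragment so both URI spellings match."""
    return uri[:-1] if uri.endswith("#") else uri


def build_registry(*meta_schemas):
    """Index locally stored meta-schemas by their normalized $id."""
    return {normalize(meta["$id"]): meta for meta in meta_schemas}


def meta_schema_for(schema, registry):
    """Resolve a schema's $schema locally instead of fetching it."""
    return registry[normalize(schema["$schema"])]


# Placeholder body: a real strict meta-schema would list every keyword.
strict_meta_schema = {
    "$id": STANDARD_ID,  # deliberately reuses the standard meta-schema's $id
    "additionalProperties": False,
    "properties": {"$schema": {}, "type": {}, "minimum": {}},
}

registry = build_registry(strict_meta_schema)
author_schema = {"$schema": STANDARD_ID, "type": "number", "mininum": 0}

# The author's schema still names the standard meta-schema, but the local
# registry hands back the strict one.
print(meta_schema_for(author_schema, registry) is strict_meta_schema)  # True
```

The schema under test needs no changes, which is what makes this convenient for local testing.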

If you would like to discuss the possibility of publishing strict versions of our meta-schemas on json-schema.org, and documenting them in the specifications, please file a separate ticket for that. I think that is a worthwhile discussion.

Aside from that, as long as I have not missed something major in your proposal, I plan to close this. Strict mode has literally had years of debate, and at this point we are quite sure that we do not want it in the spec. That decision directly motivated the combination of unevaluatedProperties, $recursiveRef, and $vocabulary, which together provide an alternative that better aligns with the principle that schema authors should control validation behavior.

@awwright
Member

awwright commented Nov 25, 2018

That's not to say a strict mode is a bad idea, though. I would highly encourage implementations to implement sanity checks and other features (notifying about redundant or erroneous constraints, etc.).

But I think if what we've seen in programming languages is any indication, this problem is better solved with a dedicated linter, rather than being a requirement of the media type.

And I think this is for the best: Since a linter can evolve faster than a media type can, this means the community can converge on best practices quicker too.

@handrews
Contributor

@awwright yes, linters! Great point on the separate evolution rate, too, that had not occurred to me.

@jgonzalezdr
Contributor Author

@handrews: Thanks for your detailed response. I now better understand the intended design of the specification. I guess the idea is to keep a lightweight specification that defines the minimum required grammar and delegates everything else to implementations.

And of course you're totally right, using $vocabulary will be the solution for that problem.

I'll open a new ticket to discuss publishing "alternative" schemas for vocabularies, as this may be an easy and simple way for schema writers to check that schemas do not contain unintended mistakes using already existing tools, until more sophisticated tools are available as @awwright suggests.

3 participants