Simplify the processing of "$vocabulary" #1281

Closed
handrews opened this issue Aug 22, 2022 · 4 comments

@handrews
Contributor

handrews commented Aug 22, 2022

This would not significantly change the behavior of $vocabulary. There are ambiguities in the current wording, but I will file thoughts on clarifying those as separate issues.

Currently, $vocabulary requires cumbersome distinctions between schema and meta-schema processing. This issue plus #1183 would allow removing those distinctions. $vocabulary also sits outside the existing keyword classifications, even though it does not need such special treatment. We can solve this by:

  1. calling $vocabulary an annotation, which it is
  2. stating that as an annotation, $vocabulary MUST be ignored (but still collected) if it is not from the first dynamic scope (defined as the dynamic scope whose evaluation path is the empty string "")
  3. stating that the semantics of $vocabulary are only defined when the instance is a JSON Schema, and that any other interpretation MUST NOT be considered interoperable (I'm leaving the possibility open that someone might figure out some other viable interpretation, perhaps with a media type that embeds JSON Schemas, and there's no point in forbidding it because that usage is outside the scope of JSON Schema)
  4. stating a process for static use of $vocabulary: this was one of the main reasons for the unusual current description, since we don't want to mandate meta-schema validation as a prerequisite for determining the vocabularies. Because the schema object of the first dynamic scope is predictable (it's the object directly referenced by the (meta-)schema's identifier), implementations MAY inspect $vocabulary statically and consider it to have been applied as an annotation to the instance(-schema) root, and MAY cache this result just as they would have cached the annotation resulting from meta-schema evaluation.
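A minimal Python sketch of point 4, assuming a hypothetical in-memory registry (META_SCHEMAS) and a hypothetical annotation record shape (the keyword/evaluationPath/instanceLocation/value fields are illustrative, not a defined output format). It reads $vocabulary directly from the object referenced by the meta-schema's identifier, without running meta-schema validation, and caches the result per URI:

```python
from functools import lru_cache
from types import MappingProxyType
from typing import Mapping

# Hypothetical registry of meta-schemas keyed by identifier; a real
# implementation would resolve these through its own schema loader.
META_SCHEMAS = {
    "https://example.com/meta/no-validation": {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": "https://example.com/meta/no-validation",
        "$vocabulary": {
            "https://json-schema.org/draft/2020-12/vocab/core": True,
            "https://json-schema.org/draft/2020-12/vocab/applicator": True,
        },
    },
}


@lru_cache(maxsize=None)
def static_vocabulary_annotation(meta_schema_uri: str) -> Mapping[str, object]:
    """Read $vocabulary statically from the meta-schema's root object.

    The object directly referenced by the meta-schema's identifier is the
    first dynamic scope (evaluation path ""), so its $vocabulary value is
    treated as an annotation applied to the root of the instance (the schema
    being processed). No meta-schema validation runs, and lru_cache keeps one
    result per URI, much like caching the annotation from a real evaluation.
    """
    root = META_SCHEMAS[meta_schema_uri]
    # Read-only views so callers cannot mutate the cached result.
    return MappingProxyType({
        "keyword": "$vocabulary",
        "evaluationPath": "",
        "instanceLocation": "",
        "value": MappingProxyType(dict(root.get("$vocabulary", {}))),
    })


annotation = static_vocabulary_annotation("https://example.com/meta/no-validation")
print(sorted(annotation["value"]))
```

Returning read-only mappings is a design choice here: since the same cached object is handed to every caller, nobody should be able to alter it.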

Point 4 is the only part of the above that is not already part of the JSON Schema processing model in some way, and it just explains what I'm pretty sure some implementations do already.

While there has been some discussion of removing the restriction in point 2, let's not discuss that here (if anyone feels strongly about it, feel free to file an issue).


This would involve the following changes:

  • Update the first paragraph of §8.1.2 to state that $vocabulary is an annotation
  • Replace the last paragraph of §8.1.2 (about how $vocabulary MUST be ignored for non-schema instances) with point 2 (that only the annotation from the first dynamic scope can be used) and point 3 (that the annotation semantics are only defined for schemas). Most importantly, this removes the phrase "documents that are not being processed as a meta-schema", so that we can move away from special meta-schema processing. It also removes confusing language implying that $vocabulary has some effect on the schema that contains it (given point 1, that $vocabulary is an annotation, there should no longer be any confusion on this point)
  • Add a new subsection before §8.1.2.1 on the static processing of $vocabulary (for avoiding the cost of meta-schema validation, and/or for implementations that do not support annotation collection)
  • Eliminate §8.1.2.2 "Non-inheritability of vocabularies", as it is now a clear and direct consequence of point 2; the explanation of the "first dynamic scope" should be written in such a way that this is clear.
  • Eliminate §9.3.1 "Detecting a meta-schema", assuming #1183 (Remove the notion of "canonical URIs" in favour of boundaried schema resources) is resolved in a way that no longer requires this section (if it's not yet resolved, this part can be deferred until it is).

Any objections? The only practical impact is that implementations that collect annotations should now collect $vocabulary from any and all dynamic scopes (the restriction on dynamic scope applies only to how the annotation is used, not to whether it is collected). Since we don't have annotation tests yet, there is no impact on the existing test suite.
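To illustrate the split between collection and usage, here is a sketch assuming a hypothetical Annotation record in which evaluation_path is the evaluation path of the schema object (dynamic scope) where $vocabulary appeared. Every $vocabulary annotation is collected, but only the one from the first dynamic scope is used:

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class Annotation:
    keyword: str
    evaluation_path: str  # path of the dynamic scope (schema object) that produced it
    instance_location: str
    value: Any


def effective_vocabulary(annotations: Iterable[Annotation]) -> Optional[Any]:
    """Return the $vocabulary value that may be used, or None.

    All $vocabulary annotations are collected, but only the one from the
    first dynamic scope (evaluation path "") is used; the rest are ignored.
    """
    for ann in annotations:
        if ann.keyword == "$vocabulary" and ann.evaluation_path == "":
            return ann.value
    return None


collected = [
    # Reached via allOf/$ref into another meta-schema: collected, not used.
    Annotation("$vocabulary", "/allOf/0/$ref", "",
               {"https://json-schema.org/draft/2020-12/vocab/validation": True}),
    # From the root object of the meta-schema: the one that is used.
    Annotation("$vocabulary", "", "",
               {"https://json-schema.org/draft/2020-12/vocab/core": True}),
]
print(effective_vocabulary(collected))
```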


If no one objects in the next week I'll write a PR for this.

@lud-wj

lud-wj commented Mar 3, 2024

Hello,

I am trying to make sense of $vocabulary, but it does not feel like it is only an annotation. For instance, the test suite expects that keywords from a vocabulary that is not declared in the meta-schema are not applied, even though the data being validated is not itself a schema (as it would be if we were validating a schema against a meta-schema) but mere data.

@gregsdennis
Member

@lud-wj we have a backlog item to create docs for vocabs in general. Until we get that on https://json-schema.org, please have a read through my docs on the subject. That should help.

The test suite does check that assertion keywords that are defined in unlisted vocabularies are not validated, yes, but that's not considered a meta-schema validation.

$vocabulary itself doesn't provide an assertion, and it doesn't contain subschemas (so it's not an applicator). It really is annotative only, but it doesn't create an annotation in the output. Its presence in the meta-schema tells the tooling what keywords are defined for the schema. Thus, if a meta-schema is used that doesn't list the Validation vocabulary, then none of those keywords (e.g. maximum, minLength, etc.) should be processed, and those keywords become "unknown."
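A rough sketch of how the listed vocabularies can determine which keywords get processed. VOCAB_KEYWORDS is a partial, illustrative mapping of a few 2020-12 vocabulary URIs to some of their keywords (not the full lists), and split_keywords is a hypothetical helper that assumes the implementation supports every listed vocabulary:

```python
from typing import Dict, FrozenSet, Mapping, Set, Tuple

# Partial keyword lists for a few 2020-12 vocabularies (not exhaustive).
VOCAB_KEYWORDS: Dict[str, FrozenSet[str]] = {
    "https://json-schema.org/draft/2020-12/vocab/core": frozenset(
        {"$schema", "$id", "$ref", "$defs", "$vocabulary", "$anchor",
         "$dynamicRef", "$dynamicAnchor", "$comment"}),
    "https://json-schema.org/draft/2020-12/vocab/applicator": frozenset(
        {"allOf", "anyOf", "oneOf", "not", "properties", "items", "prefixItems"}),
    "https://json-schema.org/draft/2020-12/vocab/validation": frozenset(
        {"type", "enum", "const", "maximum", "minimum", "maxLength",
         "minLength", "pattern", "required"}),
}


def split_keywords(
    schema: Mapping[str, object], vocabularies: Mapping[str, bool]
) -> Tuple[Set[str], Set[str]]:
    """Split a schema object's keywords into (known, unknown).

    A keyword is known only if a vocabulary listed in the meta-schema's
    $vocabulary defines it; everything else is treated as unknown.
    """
    defined: Set[str] = set()
    for uri in vocabularies:
        defined |= VOCAB_KEYWORDS.get(uri, frozenset())
    known = {keyword for keyword in schema if keyword in defined}
    unknown = set(schema) - known
    return known, unknown


schema = {"type": "string", "minLength": 3, "properties": {}}
listed = {
    "https://json-schema.org/draft/2020-12/vocab/core": True,
    "https://json-schema.org/draft/2020-12/vocab/applicator": True,
}
known, unknown = split_keywords(schema, listed)
print(sorted(known), sorted(unknown))  # ['properties'] ['minLength', 'type']
```

With a meta-schema that lists only Core and Applicator, type and minLength fall into the unknown set and are never evaluated as assertions.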

Note: since 2020-12 provides annotations for unknown keywords' values, you will get annotations for those. This means that annotation-only keywords behave the same whether their vocabulary is listed or not.
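A sketch of collecting unknown keywords as annotations, assuming a hypothetical unknown_keyword_annotations helper and an illustrative dict shape for annotations. An annotation-only keyword such as title yields the same annotation value whether or not its vocabulary is listed, while an unknown maxLength produces an annotation but no assertion:

```python
from typing import Any, Dict, Iterable, List, Mapping


def unknown_keyword_annotations(
    schema: Mapping[str, Any],
    known_keywords: Iterable[str],
    evaluation_path: str = "",
    instance_location: str = "",
) -> List[Dict[str, Any]]:
    """Turn each keyword not in known_keywords into an annotation.

    The annotation value is simply the keyword's value from the schema; no
    assertion is made, so an unknown maxLength never fails validation.
    """
    known = set(known_keywords)
    return [
        {
            "keyword": keyword,
            "evaluationPath": evaluation_path,
            "instanceLocation": instance_location,
            "value": value,
        }
        for keyword, value in schema.items()
        if keyword not in known
    ]


# Meta-data and Validation vocabularies not listed: title and maxLength are unknown.
schema = {"$id": "https://example.com/person-name", "title": "Name", "maxLength": 10}
for ann in unknown_keyword_annotations(schema, known_keywords={"$id"}):
    print(ann["keyword"], ann["value"])
```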

@lud-wj

lud-wj commented Mar 3, 2024

Thanks @gregsdennis, your website is really helpful. I think I now better understand where the "split" is between the capabilities a vocabulary declares and the part it provides for validating schemas that use those new keywords.

@gregsdennis
Member

These items need to be incorporated into whatever vocabularies end up being. Regardless, it's being extracted into the feature life cycle, so it'll need to be worked out there.

3 participants