Basic vocabulary support #561
Comments
A key point of this proposal is that since meta-schemas still work the same way as always (although hopefully with #558's But a validator that either just looks at Unless we decide that |
So to confirm, by default it's assumed "all vocabularies" are used? What is "all" and where does that come from? |
@philsturgeon I'll break the "by default" down a bit:
In the 2nd case, with a It is still the case that by default, all unrecognized keywords are ignored. Therefore, there is no concept of "all" vocabularies. If you had an actual blank meta-schema, it would allow everything, but would not indicate any semantics. So I wouldn't consider it a vocabulary. You don't have a vocabulary until you constrain that open set of everything into specific syntax constraints (expressed as meta-schemas) and semantics (defined in prose specifications- I have some thoughts on formalizing this but almost certainly not in draft-08, I'd like to get some feedback and understand use cases with the basic concept first). |
I have updated the initial comment to include more examples of how the current specifications would be handled with this proposal, including showing the core applicator and (most of the) hyper-schema vocabulary meta-schemas, and how the hyper-schema meta-schema (the one people reference today) would be built from the vocabularies. |
As another test case, I am looking at how to frame JSON-LD as a vocabulary, mostly to allow the two systems to be used side-by-side and ensure that JSON Schema does not conflict with it. I've filed json-ld/json-ld.org#612 asking some questions about their existing JSON Schema for JSON-LD to start with this. This is related to #309. |
Makes sense! I love this approach, as it’ll help get folks extending JSON Schema for their own needs, without dumping the discrepancies into a word doc or forcing guesswork onto implementors. |
I'm going to start integrating bits of this into PRs. I'll wait on the This will stay open for more feedback on the details as they are all new with this issue. I'll mark this as Accepted when we have agreement (or conspicuous lack of objections) on the details, and then move on to a PR. Until then, I'll just be referencing the general direction. |
Given how much I've talked this up across every project, slack channel, or other forum I can find, I think this has been open for feedback long enough. Moving to PRs now! |
I know this ticket has been open for a long time, and I'm sorry for jumping into it while @handrews is already preparing a proposal, but while thinking about #682 I thought of a drawback that most probably has already been debated and discarded, but I could not find such discussion in the issue tickets. My concern about Until now, the minimum number of documents for validating an instance was just one (the schema); for implementations the actual meta-schema document was not really needed, since the schema grammar checks could be hardcoded following specs. However, if the vocabularies that a schema is compatible with are declared in the meta-schema, the minimum number of documents for validating an instance becomes two (the schema and the meta-schema); implementations will have to get access to the meta-schema document just to check the vocabularies, and will most likely just ignore the grammar declared in the meta-schema and apply their own hardcoded one. Moreover, should implementations validate the schema against the meta-schema? What happens if the meta-schema is in fact incompatible with a declared vocabulary (e.g. declaring that the keyword I wonder whether the possibility of declaring the vocabularies in the schema instead of in the meta-schema (maybe optionally in addition to doing it in the meta-schema) has been taken into account. |
Technically, they've always required the schema and the meta-schema. It just happens that the meta-schema has been hard-coded. This change will simply require a "softer coding" of the meta-schemas in that they'll need to be extensible to accept other sets of keywords.
Just like with any other schema, it remains the author's responsibility to require that a schema is valid, whether through compatible keywords or compatible vocabularies. For example, there's nothing currently stopping an author from writing this schema:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "integer",
  "minimum": 10,
  "maximum": 9
}
Instances will always fail against this schema. As such, you may say that it's "invalid." These are semantics that JSON Schema does not protect itself against. It assumes that the authors can identify such conflicts and avoid (or correct) them. The same goes for vocabularies. Additionally, I believe @handrews's PR does indicate that implementations SHOULD compare declared vocabularies and attempt to identify compatibility issues. |
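As a rough illustration of the point that this schema is well-formed but unsatisfiable, here is a minimal Python sketch. The `satisfies` helper is hypothetical and hand-rolled for this example (it is not part of any JSON Schema library) and it only understands `type: integer`, `minimum`, and `maximum`.

```python
# Hypothetical helper for illustration: handles only "type": "integer",
# "minimum", and "maximum". It is not a real JSON Schema validator.
def satisfies(schema: dict, instance) -> bool:
    if schema.get("type") == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(instance, int) or isinstance(instance, bool):
            return False
    if "minimum" in schema and instance < schema["minimum"]:
        return False
    if "maximum" in schema and instance > schema["maximum"]:
        return False
    return True


schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "integer",
    "minimum": 10,
    "maximum": 9,
}

# The schema is syntactically fine, yet nothing in the sampled range passes,
# because no number can be both >= 10 and <= 9.
matches = [n for n in range(-100, 101) if satisfies(schema, n)]
print(matches)
```

Nothing in the schema's syntax is wrong here; the conflict only shows up in the keyword semantics, which is why it is left to the author to avoid.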
@jgonzalezdr I am thrilled to see more people engaging the topic of vocabularies! It is a major complex change, and while folks such as @gregsdennis have provided valuable feedback and ideas I have been worried that it has been too poorly communicated (by me) to attract sufficient feedback. Also, we have tickets that took five years to resolve: the concept of "a long time" is relative, and I'm more worried that I'm rushing this concept. I'll add a few more things to @gregsdennis's excellent reply:
Recognizing a URI is enough
It is a key (and perhaps poorly explained) principle of JSON Schema that implementations may determine behaviors simply based on the URI of a schema or meta-schema. There are many ways that this can work, the two most common simply being that the behavior for a given URI is hard-coded, or that the document associated with the URI is available from some sort of local cache. With meta-schemas, most implementations either let you choose the draft version (the only interoperable use for meta-schemas until now) when you instantiate or call the code, and then completely ignore
Side note: we've caused problems with pre-packaged meta-schemas by bug-fixing the meta-schemas in place (under the same URI), as @gregsdennis has reminded me from time to time. 😄 While I've been unapologetic about this in the past, I think #612 will provide a better way to manage that in the future, which I need to write up in that issue. From the point of view of how schemas and meta-schemas currently are (or aren't) processed, vocabularies and
I'd be interested in any suggestions on how to make this more clear. Preferably using fewer words, which is not my strong point 😛
Meta-schemas that are more restrictive than their vocabularies are a feature
Regarding:
While declaring that a keyword has a different type than that given in its specification would be a problem, a restriction such as not using a particular keyword is fine. Restricting However, forbidding a keyword like Another use case might be restricting keyword combinations, like requiring that These use cases explain why vocabularies and meta-schemas are separate. The above restrictions do not change the semantics of the keywords at all. Wherever the meta-schema allows the keywords, their semantics are exactly as identified by the vocabulary. But the meta-schema can be used to restrict cases that are problematic for an application.
Combining the above two concepts
In order to support an open-ended set of meta-schemas that implement custom restrictions, implementations will need to actually perform meta-schema validation. This whole concept encourages people to customize meta-schemas for their applications. Validation against the meta-schema handles this correctly, but implementing keyword semantics still requires a human to read a specification and write code that implements it. So we expect implementations to simply recognize the URIs in This is another reason that I have hesitated to come up with a file format for vocabularies that would live at the vocabulary's URI. At least to start, I would prefer that implementations just use vocabulary URIs as actual identifiers, not as locators. Once we have some feedback on how this works in practice, we can talk about what information would be useful to put in a vocabulary file, if any, and how to go about that. So, to summarize this section, while outright conflicts between a meta-schema and its vocabulary are bad (and produce undefined behavior, at least with respect to whatever the meta-schema author was trying to do), "conflicts" that are syntactically compatible are expected and even encouraged.
Regarding conflicting vocabularies
The PR states:
@gregsdennis I think this is the part you were referring to about detecting conflicting vocabulary semantics? My intention here is that meta-schema authors, who chose what to put in that meta-schema's I don't think there is any feasible way to expect implementations to do so. Defining vocabularies with compatible semantics for the same keyword (most obviously a vocabulary that adds more values for This definitely falls under the sort of "yeah, you can write invalid things, but don't" category as your |
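To make the "identifiers, not locators" idea from the comment above more concrete, here is a minimal Python sketch of an implementation that recognizes vocabulary URIs from a hard-coded table and never fetches them. Everything in it is an assumption for illustration: the `example.com` URIs, the keyword sets, the `supported_keywords` helper, and the policy of rejecting unrecognized vocabularies are all made up, and `$vocabularies` is taken to be a list of URIs as in the proposal in this issue.

```python
# All URIs and keyword sets below are made-up identifiers for illustration.
KNOWN_VOCABULARIES = {
    "https://example.com/vocab/core-applicators": {"allOf", "anyOf", "properties"},
    "https://example.com/vocab/validation-assertions": {"type", "minimum", "maximum"},
}


def supported_keywords(meta_schema: dict) -> set:
    """Collect the keywords this implementation can process for a meta-schema.

    Vocabulary URIs are looked up in a local table; nothing is downloaded.
    Rejecting unknown vocabularies is one possible policy, assumed here.
    """
    keywords = set()
    for uri in meta_schema.get("$vocabularies", []):
        if uri not in KNOWN_VOCABULARIES:
            raise ValueError(f"unrecognized vocabulary: {uri}")
        keywords |= KNOWN_VOCABULARIES[uri]
    return keywords


meta_schema = {
    "$vocabularies": [
        "https://example.com/vocab/core-applicators",
        "https://example.com/vocab/validation-assertions",
    ]
}
print(sorted(supported_keywords(meta_schema)))
```

The table stands in for the code a human wrote after reading each vocabulary's specification; the meta-schema document itself would still be used separately to validate schema syntax.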
Technically you're right. The problem that I wanted to illustrate is that vocabularies implicitly have a meta-schema associated with them. Therefore, schemas for vocabularies must be compliant with the implicit vocabulary meta-schema, independently of what a "custom" meta-schema defined by Of course, custom meta-schema should also be compliant with the vocabulary meta-schema, but what is a bit tricky here is that since schemas must be compliant with the vocabulary's implicit meta-schema but it is not mandatory for implementations to check that the schema validates against its "custom" meta-schema, at the end in most cases the "custom" meta-schema is totally bypassed.
It's not exactly the same case. The example that you propose is semantically incorrect (it makes no sense and has no practical use), but it is grammatically correct and is confined to the schema level, so implementations will have no problem processing that schema. The problem that I described is more like having incompatible grammars at the meta-schema level (a bit like expecting a sentence to be grammatically correct in two different languages), and I think that it should be very clear how implementations should proceed when they detect such conflicts, instead of delegating to meta-schema writers the responsibility of writing proper meta-schemas. |
You're right. It's not exactly the same. And you properly described why they're not exactly the same. However the responsibility of ensuring a custom meta-schema uses compatible vocabularies still lies on the author. The author must understand the vocabularies they're referencing and how they may interact.
A requirement to validate against the custom meta-schema (including all of the vocabularies that it uses) is one of the changes being proposed. Implementations will need to update to begin performing this validation. It should be noted, though, that once a schema has been validated it need not be validated again; this is a one-time operation, so performance impact to validation processes is negligible. |
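The "one-time operation" point can be sketched as a small cache in Python. This is only an assumption about how an implementation might do it: `SchemaCache` and `toy_meta_validate` are hypothetical names, and the meta-schema check is a stand-in callable rather than a real validator.

```python
import json
from typing import Callable


class SchemaCache:
    """Remembers which schemas were already checked against their meta-schema."""

    def __init__(self, meta_validate: Callable[[dict], bool]):
        self._meta_validate = meta_validate
        self._seen = {}       # canonical JSON text -> validation result
        self.meta_checks = 0  # how many times the expensive check actually ran

    def is_valid_schema(self, schema: dict) -> bool:
        key = json.dumps(schema, sort_keys=True)
        if key not in self._seen:
            self.meta_checks += 1
            self._seen[key] = self._meta_validate(schema)
        return self._seen[key]


def toy_meta_validate(schema: dict) -> bool:
    # Stand-in for real meta-schema validation: only checks that "type",
    # if present, is a string.
    return isinstance(schema.get("type", "object"), str)


cache = SchemaCache(toy_meta_validate)
schema = {"type": "integer", "minimum": 0}
for _ in range(1000):
    cache.is_valid_schema(schema)  # instance validation would follow here
print(cache.meta_checks)
```

However many instances are validated, the meta-schema check runs once per distinct schema in this sketch.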
The problem that I see here is that historically the vocabulary and meta-schema concepts were entangled, but it has proven necessary to separate both concepts. I suppose that initially the Maybe the problem is aggravated because we are using the meta-schema term to really talk about a different and much narrower concept: vocabulary dialects. Let me define some nomenclature that may help establish some common ground:
I think that we all agree that implementations should only consider the vocabulary of a schema to decide if and how to process it. Dialects that only change the grammar would be transparent for the implementation, since they only impact the schema at the time of writing it, not at the time of using it. Implementations that "know" dialects that add additional keywords can process the additional keywords with their associated semantics (i.e. application-specific implementation), but other implementations should safely just ignore them. I think that I haven't said anything really new here, and the idea behind the work in progress by @handrews is aligned with that. But I see a "design smell" in it: it makes meta-schemas a "first-class citizen", and we now have 3 levels: meta-schema, schema, instance. Moreover, as the vocabularies that have to be used by a schema are declared in the meta-schema, a meta-schema may not be able to validate itself, so we can end up with a meta-meta-schema or even a (meta)^n-schema. However, if the vocabularies that should be used to process a schema are defined in the schema itself, this infinite regression problem is solved. Additionally, implementations will have at hand the information needed to determine if and how to process the schema, and in the end, the meta-schema is not strictly necessary, since the schemas will anyway have to be compatible with the implicit vocabularies' meta-schemas (i.e. not ill-formed). In conclusion:
|
@jgonzalezdr This is a great summary, thanks. There is no infinite regression- you always end up at a self-validating meta-schema, generally speaking, it will be the one that is analogous to the current Your conclusion is proposing moving That is actually how a much earlier version of this worked (I'm not sure I actually ever wrote that version up except partially on my laptop- this concept went through a lot of unsuccessful iterations before I even bothered posting anything). The problem with putting That is far too high of a burden on schema authors, who are a very large population (among people who care about JSON Schema at all :-D ). Schema authors will have a wide variety of skill level and experience, and it needs to be very easy to write a basic schema. The set of people who will write meta-schemas is much smaller, although probably moderate-sized now that dialects (great term, btw, very helpful) will be easy to create. But I think it is reasonable to ask people who want to create dialects to understand the constituent vocabularies, and how vocabularies in general work. To me, it is a requirement that writing a schema is not more complicated with vocabularies than it currently is without. For most schema authors, vocabulary composition is an implementation detail. Most will probably just go on using the new versions of the regular meta-schemas we have right now and just start using new features like There is a vexing problem where, as you say, vocabularies kind of have an implicit meta-schema, except that we actually need to make it explicit, which runs the risk of annoying duplication. This is, as far as I can tell, a fairly intractable problem, and the main reason why I've gone through several versions of the proposal before posting. This is why I have avoided defining the resource at the vocabulary URI, a.k.a. the vocabulary description file. 
I am hoping that putting this out there in the real world will bring us feedback that will tell us what we need from that file. Originally, I was going to have them be the meta-schemas, maybe with some more extra keywords, but that got very complicated very quickly. If you have an alternative that preserves the simplicity for schema authors, I would love to hear it. Otherwise, I think the current approach of leaving the vocabulary file undefined allows us to avoid actually duplicating things between the meta-schema and vocabulary file (because there is no vocabulary file) and spend more time coming up with a practical solution to the problem. |
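The claim earlier in this comment that the meta-schema chain always ends at a self-validating meta-schema can be sketched in Python as follows. The in-memory `REGISTRY`, the `example.com` URIs, and the `meta_schema_chain` helper are all hypothetical; a real implementation might hard-code or cache these documents differently.

```python
# Hypothetical in-memory registry mapping URI -> document. All URIs are made up.
REGISTRY = {
    "https://example.com/meta/base": {
        "$id": "https://example.com/meta/base",
        "$schema": "https://example.com/meta/base",  # validates itself
    },
    "https://example.com/meta/dialect": {
        "$id": "https://example.com/meta/dialect",
        "$schema": "https://example.com/meta/base",
    },
}


def meta_schema_chain(schema: dict) -> list:
    """Follow "$schema" links until a URI repeats.

    A self-validating meta-schema points at itself, so the walk stops there.
    (The same loop condition also stops on a longer cycle.)
    """
    chain = []
    uri = schema["$schema"]
    while uri not in chain:
        chain.append(uri)
        uri = REGISTRY[uri]["$schema"]
    return chain


schema = {"$schema": "https://example.com/meta/dialect", "type": "object"}
print(meta_schema_chain(schema))
```

In this sketch a schema written against the custom dialect needs two meta-schema hops, and a schema written directly against the base meta-schema needs one.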
Meh, It took almost all day reading all-those-related-mega-threads regarding schema and/or data evolution. 😕 Still have no idea about best practice and final decision or even what proposals were rejected (time to create FAQ?). Practical Scenario 1:
|
@handrews: I see your point and I buy into that. My concern is about complex schemas "$ref-ing" other ones in a deep structure, with different vocabularies involved. But anyway, most probably the meta-schema "chain" will finish with the validation vocabulary meta-schema, won't it? I'll give some thought to practical use cases to see if I can find any caveats.
It may seem unnecessary and superfluous, but to ensure defined behavior of implementations against "ill" meta-schemas, my opinion is that specs should state, in addition to the current rule that a schema MUST be valid against its declared meta-schema, that a schema MUST be valid according to the rules for all vocabularies declared in the meta-schema. The temptation to rule that a schema must also be valid against vocabularies' implicit meta-schemas should be avoided, since actually the vocabulary rules can be fairly complex, and some may not be formalized as a meta-schema (at least with the current validation vocabulary). For example, an invented vocabulary could make mandatory that if both the |
Don't borrow trouble worrying about incredibly complicated meta-schema chains. I do not think those will be common, and in any event this is why we publish drafts. This is not a set specification. The worst case scenario here is that we get feedback that it's confusing and slow, and then we improve it.
This is already done. From section 7 of the current published spec (referring to the
There is no way to enforce or even detect this. The meta-schemas take care of the validation that is possible.
No one is proposing anything at all involving implicit meta-schemas. |
@xferra great questions! We will definitely not be implementing any sort of packaging/versioning system! From the specification perspective, the place to put any versioning string is in the meta-schema URIs, and now also in the vocabulary URIs. You can put semver or whatever else in those. These are identifiers, so it is not necessary to serve the meta-schema at a retrievable URL (although it is possible, of course). It is (for this one draft, at least) expressly forbidden to serve any document at the vocabulary URI, which makes those purely identifiers, so there's not really anything to package and distribute for vocabularies. If you implement a vocabulary with a plugin for a validator (or other tool), you should package and distribute it however the tool is packaged and distributed. If you want to distribute meta-schemas (particularly if you use a URN or have any other situation where the meta-schema cannot be served from its URIs), I would treat that basically like a configuration file, and package it either on its own (in which case your versioning may or may not match the URIs, I could see use cases either way), or package it with whatever uses it (the way many validators actually package the standard meta-schemas). I'm sure I'm missing some things here, but I think a key point is that right now there's nothing to distribute for vocabularies. You just document them and identify them with a URI. By the time we know enough to design some sort of vocabulary description file, we should know a lot more about use cases and real-world practices. Also, thanks for slogging through it all. Believe it or not, this is nowhere near the longest topic we've had! But I know it's a lot of work. |
In fact, implementations for a vocabulary do enforce / detect that the schema they are processing is valid. My point is not that a tool without prior knowledge of the vocabulary should be able to detect a mismatch between the meta-schema and the declared vocabulary automatically. Let me elaborate a bit more on the problem that I think should be addressed: Suppose that I write a draft-07 schema that declares Suppose now that I write a meta-schema based on draft-07's, that allows Suppose now that I write a meta-schema based on draft-08's, that allows This is the "weak" point I see in the PR right now. I may be wrong, but I think that it allows defining meta-schemas that are not valid for a vocabulary. This didn't happen with previous drafts because the vocabulary was tightly associated with a single meta-schema. |
To address this issue, a new paragraph could be added to the "Best Practices for Vocabulary and Meta-Schema Authors" along the lines of:
This requirement is similar to the "combining conflicting vocabularies" already present in the PR, but complementing it to address conflicts between the meta-schema and the vocabularies. |
@jgonzalezdr OK, the best practices idea makes sense, thanks! I'm not sure your example quite works out that way, but since we have a more concise recommendation to go with here I'm not going to sort that out. |
Thanks all! Merged #671. There will no doubt be more work on vocabularies but I'm calling this and the other closely related issues targeted for draft-08 done! If there was anything left unresolved from the discussion here, please file a new issue. |
Proposal
Vocabularies
base and links, and the keywords in the LDOs
Meta-Schemas
$vocabularies takes a list of URIs identifying the vocabularies described by the meta-schema
$schema in a schema, $vocabularies must be in the root object of the meta-schema
$recurse (Recursive schema composition #558)
Examples:
NOTE: "core-applicators" (stuff moved by #513) and "validation-assertions" (stuff left behind by #513) are not final names or vocabulary boundaries, I literally made them up while typing, please do not complain about whether they are "correct".
The applicators (per #513) as a vocabulary
This is where $recurse (#558) would primarily be used. This assumes that dependencies has been split per #528, and the applicator version is still called dependencies, while the string form is re-named and left in the validation vocabulary.
Hyper-Schema as a vocabulary
This only shows some of the LDO fields, and ignores that we actually distribute the links schema as a separate file
Hyper-Schema meta-schema with vocabularies
This assumes a "validation-assertions" vocabulary for the vocabulary spec, and assumes the core keywords do not need to be declared as a vocabulary (although maybe they should be, I'm not sure). Also, I'm waving my hands when it comes to where the basic annotations (title, default, etc.) live, just pretend that's settled somehow please, as sorting that out is not the point of this issue.
This also assumes that the draft-08 regular schema properly assembles everything except for the hyper-schema vocabulary. So while we declare all of the vocabularies explicitly, to get the meta-schema behavior, we just combine the regular meta-schema and the hyper-schema-vocabulary-only meta-schema (shown above).
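The example that originally accompanied this section is not reproduced here, so the following Python sketch only illustrates the shape described in the text: every vocabulary is declared explicitly in a root-level `$vocabularies` list, and the meta-schema behavior comes from an `allOf` of the regular meta-schema and the hyper-schema-vocabulary-only meta-schema. All `example.com` URIs and the `declared_vocabularies` helper are assumptions, not the actual identifiers from the proposal.

```python
import json

# Illustrative only: every URI below is a made-up placeholder.
hyper_meta_schema = {
    "$schema": "https://example.com/draft-08/hyper-schema",
    "$id": "https://example.com/draft-08/hyper-schema",
    # Declared explicitly, in the root object of the meta-schema.
    "$vocabularies": [
        "https://example.com/draft-08/vocab/core-applicators",
        "https://example.com/draft-08/vocab/validation-assertions",
        "https://example.com/draft-08/vocab/hyper-schema",
    ],
    # Combine the regular meta-schema with the hyper-schema-only one.
    "allOf": [
        {"$ref": "https://example.com/draft-08/schema"},
        {"$ref": "https://example.com/draft-08/hyper-schema-vocabulary-only"},
    ],
}


def declared_vocabularies(meta_schema: dict) -> list:
    # Only the root object is consulted; nested subschemas are not searched.
    return list(meta_schema.get("$vocabularies", []))


print(json.dumps(hyper_meta_schema, indent=2))
print(declared_vocabularies(hyper_meta_schema))
```

Reading `$vocabularies` only from the root mirrors the rule that it must appear in the root object of the meta-schema.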
OpenAPI 3.0's superset/subset problem
Using a meta-schema to constrain or add lightweight extensions helps discourage creating many similar vocabularies. For example, consider a meta-schema for OpenAPI's schema object, which does not allow the "null" type and instead has a boolean "nullable" keyword, and also does not allow patternProperties. @philsturgeon has referred to this mismatch as a "superset/subset".
Also, they require extension keywords to begin with "x-" and forbid other keywords that are not defined in the spec. Note the use of unevaluatedProperties (#556) for this.
This example explicitly allOfs the vocabulary schemas. A variation on the proposal is for $vocabularies to also do that implicitly. Needs a bit more thought on whether you'd ever not allOf them, and why. See #558 for why just allOf works without redefining recursive keywords (the core-applicators vocabulary would be written with "$recurse": true instead of "$ref": "#").
NOTE: "core-applicators" (stuff moved by #513) and "validation-assertions" (stuff left behind by #513) are not final names or vocabulary boundaries, I literally made them up while typing, please do not complain about whether they are "correct".
What's going on here is:
$vocabularies makes that clear while just having an allOf is ambiguous.
"null" value for type, the meta-schema prevents that value from appearing
type – in the normal meta-schema, the type of type is "type": ["string", "array"]
patternProperties, but the meta-schema prevents that keyword from being used
nullable is an extension keyword
^x- is an extension keyword pattern
$vocabularies
There's more to work out but I think this is enough to start the conversation and find out which parts are particularly confusing.
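As a rough way to see the OpenAPI-style restrictions described above in action, here is a hand-written Python check. It is a sketch under stated assumptions, not the meta-schema itself: `ALLOWED_KEYWORDS` is a made-up, incomplete stand-in for "keywords defined in the spec", `openapi_violations` is a hypothetical helper, and only the top level of a schema object is inspected.

```python
# Assumed, incomplete stand-in for "keywords defined in the spec".
ALLOWED_KEYWORDS = {"type", "nullable", "properties", "items", "required"}
# Keywords the dialect forbids even though the vocabulary defines them.
FORBIDDEN_KEYWORDS = {"patternProperties"}


def openapi_violations(schema: dict) -> list:
    """List the ways a schema object breaks the restrictions (top level only)."""
    problems = []
    types = schema.get("type", [])
    if isinstance(types, str):
        types = [types]
    if "null" in types:
        problems.append('"null" type is not allowed; use "nullable"')
    if "nullable" in schema and not isinstance(schema["nullable"], bool):
        problems.append('"nullable" must be a boolean')
    for keyword in schema:
        if keyword in FORBIDDEN_KEYWORDS:
            problems.append(f'"{keyword}" is not allowed')
        elif keyword not in ALLOWED_KEYWORDS and not keyword.startswith("x-"):
            problems.append(f'extension keyword "{keyword}" must start with "x-"')
    return problems


ok = {"type": "string", "nullable": True, "x-internal-id": 7}
bad = {"type": ["string", "null"], "patternProperties": {}, "internalId": 7}
print(openapi_violations(ok))
print(openapi_violations(bad))
```

None of these checks change what `type` or `patternProperties` mean; they only narrow where the keywords may appear, which is the role the proposal gives to the meta-schema rather than to the vocabulary.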