The role of meta-schemas for the stable dialect #384
Replies: 8 comments 53 replies
-
The way I see it, the problem we're trying to solve with the stable dialect is to stop the proliferation of JSON Schema dialects while still being able to evolve the spec. Having so many dialects is bad for users and implementers. Users should be able to just write schemas and not worry about dialects. Implementers shouldn't need to include explicit code and meta-schemas for every release we do. I expect the URI schema authors use in …

From there, we can go two directions. One is that the dialect URI is the meta-schema identifier. In that case, the meta-schema needs to describe the dialect and be inclusive of all the releases that came before it and may come after it. The other direction is that meta-schemas describe a release. In that case, users would still use the dialect URI in their schemas, but implementations would map that request to the latest release meta-schema they support (similar to an HTTP redirect).

However, a meta-schema that describes a release isn't always what you would want. We're modeling our releases on ECMAScript. In that model, releases are just a description that implementations use to advertise which features they support. For example, if ES2024 includes three new features, you're expected to support whatever ES2024 features you can even if you don't fully support ES2024. You just can't claim ES2024 compliance until you support all three new features; you aren't forbidden from supporting some ES2024 features before you support them all. Let's assume you have an implementation that has full support for 2024 plus partial support for 2025. A 2024 release meta-schema that forbids unknown keywords would fail for a schema that uses a supported 2025 feature. That's where implementation-specific meta-schemas start to make sense.

I think it doesn't matter much whether we provide a dialect meta-schema or a release meta-schema as long as we're clear about which it is. What matters is that the dialect URI and vocabulary URIs don't change.
Implementations can choose to use an implementation-specific meta-schema if they have needs that don't match whatever we choose to provide.
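A minimal sketch of the second direction described above, where $schema carries the dialect URI and an implementation resolves it to the newest release meta-schema it ships (the "redirect"). All URIs below are hypothetical placeholders, since the thread doesn't define the stable dialect's or any release's identifiers, and treating list order as release order is an assumption of the sketch.

```python
# Hypothetical identifiers: the stable dialect's real URIs are not defined here.
DIALECT_URI = "https://example.com/schema/stable"

# Release meta-schemas this (hypothetical) implementation ships, oldest first.
SUPPORTED_RELEASES: dict[str, list[str]] = {
    DIALECT_URI: [
        "https://example.com/schema/stable/2024",
        "https://example.com/schema/stable/2025",
    ],
}


def resolve_meta_schema(schema_uri: str) -> str:
    """Map the $schema value to the meta-schema to validate against.

    A dialect URI is "redirected" to the newest supported release
    meta-schema; any other URI (e.g. an explicit release) is used as-is.
    """
    releases = SUPPORTED_RELEASES.get(schema_uri)
    if not releases:
        return schema_uri
    return releases[-1]


print(resolve_meta_schema(DIALECT_URI))
# prints https://example.com/schema/stable/2025
```

If the implementation only partially supports 2025 and therefore lists only the 2024 meta-schema, a 2024 meta-schema that forbids unknown keywords would reject schemas using the 2025 keywords it does support; that is the gap an implementation-specific meta-schema would fill.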
-
After typing up a long-winded argument that leans heavily into why we should include … I think this renders the debate moot while also giving users what they're asking for: a meta-schema that checks for mistyped keywords.
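For illustration, a meta-schema that rejects unknown keywords catches a typo like "minLenght", but it only reports it as an unknown keyword. The sketch below is not from the thread; it shows how a tool might go one step further and suggest the intended keyword using Python's difflib. The keyword set is a small illustrative subset, not any release's full list.

```python
import difflib

# Illustrative subset only; a real tool would use the full keyword list
# of the dialect or release it is checking against.
KNOWN_KEYWORDS = sorted({
    "$schema", "$id", "$ref", "$defs", "type", "properties", "required",
    "items", "minLength", "maxLength", "minimum", "maximum", "pattern",
    "enum", "const", "additionalProperties", "unevaluatedProperties",
})


def unknown_keywords(schema: dict) -> dict[str, list[str]]:
    """Map each unrecognized top-level keyword to close-match suggestions."""
    report: dict[str, list[str]] = {}
    for key in schema:
        if key not in KNOWN_KEYWORDS:
            # n=1: offer at most one suggestion; [] if nothing is close enough.
            report[key] = difflib.get_close_matches(key, KNOWN_KEYWORDS, n=1)
    return report


print(unknown_keywords({"type": "string", "minLenght": 3}))
# prints {'minLenght': ['minLength']}
```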
-
As I've mentioned before, I don't personally believe it is/will be possible to have a single permanent release that never has a single future breaking change, so it's hard not to think about this specific question with that in mind. I'm going to use this comment for two seemingly "off-topic" parts of what I get from reading the post, and leave responding to (and forming an opinion on) the core issues of what we do with meta-schemas until after I've read your second comment and formed concrete opinions on the questions you called out. (Thanks for defining the terms you're using!)
This is quite odd to me. I'm assuming you have some reasoning beyond what's in the comment, but I am pretty confident this kind of definition of backwards compatibility is not what users would generally expect. I'm also not sure whether it's specifically relevant to the meta-schema discussion; it seems likely not to be? But perhaps you can elaborate, here (if it is somehow relevant) or elsewhere, on why you're including this in your definition of compatible (after all, here's a cheap way to use it to make any change backwards compatible: rather than inverting the validation result, simply declare it to be indeterminate!).
A minor, pedantic point (that I'm sure you agree with): historically, a meta-schema's role is to help validate a schema, not to validate a schema entirely. After all, a schema valid under the 2020-12 meta-schema may still not be a valid 2020-12 schema. But closing the gap between the two is so difficult that, IME, essentially no implementations do much about the latter, so the reason meta-schemas are so useful is that they give implementers a way to do something without additional effort once they've written their implementation. I'm just pointing that out explicitly because I think it's relevant for future design: even today, meta-schemas are more "catch as many issues as possible" than "be the gold standard".
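A concrete instance of that gap, sketched in Python: in the 2020-12 meta-schema, "pattern" is only required to be a string (its "format": "regex" is an annotation by default), so a schema with an uncompilable regex passes the meta-schema while still not being a usable schema. Both functions are hypothetical stand-ins, and Python's re is only an approximation of the ECMA-262 regular expressions JSON Schema actually refers to.

```python
import re


def passes_meta_schema_check(schema: dict) -> bool:
    """Stand-in for what the meta-schema verifies about "pattern":
    only that, if present, it is a string."""
    return isinstance(schema.get("pattern", ""), str)


def pattern_compiles(schema: dict) -> bool:
    """A deeper check the meta-schema alone cannot express: does the
    regex actually compile? (Python's re approximates ECMA-262 here.)"""
    if "pattern" not in schema:
        return True
    try:
        re.compile(schema["pattern"])
    except re.error:
        return False
    return True


schema = {"type": "string", "pattern": "[unclosed"}
print(passes_meta_schema_check(schema), pattern_compiles(schema))
# prints True False
```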
-
When would someone use a meta-schema separately from an implementation? How would they do that?
-
I have some serious reservations about this, if I'm understanding correctly. I haven't provided feedback as quickly as I'd have liked, but this discussion isn't that old either. Critically, I think I disagree with the definition you are using for "compatible" releases.
-
The discussion here is good and useful! First, I feel the need to define what I understand by "compatible" in a general sense. To me, it means "can be used without problem". If I call a function with a given input, I should get the same output. Going back to the opening post here, @jdesrosiers defines...
Full agreement! However, you continue...
If I cannot get the exact same answer, I see this as a breaking change, and therefore not compatible. Based on some discussion on one of our recent calls, I believe I'm not the only one to struggle with the definition of compatible in the opening post. (Granted, @jdesrosiers wasn't there to defend it.) There was some further discussion where the position was defended... (I'm going to take this in two parts.)
(pt 1)
Removal is not the same as deprecation. As you said...
(pt 2)
Removal as opposed to just deprecation? Maybe we were confusing terms at the time? If we assume deprecation and not removal, we no longer need the "indeterminate" result in the 2025 column of your example for it to fit a definition of compatibility.

What about the inverse? Forwards compatibility. (Given I'm not convinced we agree on the definition of dialect, I'll use "release" here.) Let's say an implementation supports only up to 2024, and not yet 2025. Today, many schemas are published without the use of …

Prong 1: The user has a schema from pre-compatible-era JSON Schema. They have no way to determine this fact. The schema uses different definitions/understandings of keywords from previous versions of JSON Schema. Surprise! Sometimes the validation result is not what you expect.

Prong 2: The user creates a schema using the "compatible"-era standard. The schema is thrown onto a public registry without any declaration of the release/dialect used. The schema is then used by other consumers with tools from the pre-compatible era. Surprise! Sometimes the validation result is not what you expect.

While thinking about compatible-era JSON Schema, we are occasionally choosing "not to see" previous-era JSON Schema, but this is an area where we can't do that fully. Critically, the use of …

Let's recap the earlier scenario. Implementations that support "compatible"-era validation can do a kind of "gateway" check on a "compatible"-era schema, and then run it through the same code regardless of which version the schema identifies as the author's intent. This is true because the implementation would be using the meta-schemas to determine whether it knows all the keywords it needs to, based on the author's intent (and to double-check that the author's intent and what they have produced don't mismatch).
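The "gateway" check in the recap above could look roughly like the following sketch. It compares a schema's top-level keywords against the release the author declared (however that ends up being identified) and the release the implementation supports. The per-release keyword sets, including the placeholder "newKeyword2025", are hypothetical; a real check would walk subschemas and use the published meta-schemas rather than hard-coded sets.

```python
# Hypothetical keyword sets per release; not taken from any published release.
RELEASE_KEYWORDS: dict[str, set[str]] = {
    "2024": {"$schema", "type", "properties", "required", "minLength"},
    "2025": {"$schema", "type", "properties", "required", "minLength",
             "newKeyword2025"},
}


def gateway_check(schema: dict, declared: str, supported: str) -> dict[str, list[str]]:
    """Report problems before evaluation (top-level keywords only).

    not_in_declared: keywords the author's declared release doesn't define,
        i.e. what the author produced doesn't match their stated intent.
    unsupported_here: keywords the declared release defines but the
        implementation's supported release does not.
    """
    used = set(schema)
    declared_kw = RELEASE_KEYWORDS[declared]
    supported_kw = RELEASE_KEYWORDS[supported]
    return {
        "not_in_declared": sorted(used - declared_kw),
        "unsupported_here": sorted((used & declared_kw) - supported_kw),
    }


schema = {"type": "string", "newKeyword2025": True}
print(gateway_check(schema, declared="2025", supported="2024"))
# prints {'not_in_declared': [], 'unsupported_here': ['newKeyword2025']}
```

Only when both lists are empty would the schema proceed to the shared evaluation code path described in the comment.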
I appreciate the above doesn't directly address a number of other things in this discussion, such as defining a "dialect" or "release", but I feel that understanding what "compatible" means underpins a few of the other discussions. I'll come back and try to pick out other specific questions or problems I see, but this is all I can manage for today.
-
I'm putting this response in another thread because I again see no connection to this ticket, so I'm still confused why it's here, but given it's discussed above:
To be clear on my own opinion, which of course relates to my not believing we will have one single compatible release forever: I want us to deprecate things (and not remove them without deprecation), and then, after some defined period, indeed remove deprecated things (otherwise this effectively happens anyhow, since no one will implement a feature deprecated X years ago in a new implementation). That release is indeed one with backwards-incompatible changes, but it must only contain removal of already deprecated things, and in general no release may change things that are not deprecated.
I was trying to avoid discussing it here since it really doesn't seem related to this issue, but personally I think that's silly: I don't think a definition of "compatible release" that no one would use (i.e. that no one would understand without context, because we're using the word in a way no one uses) is the right way to say we want to be able to make incompatible changes. I think we just need to list the kinds of incompatible changes we want to make and under what circumstances.
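The policy described above (only previously deprecated keywords may be removed, and only after some defined period) is mechanical enough to check. A minimal sketch, assuming keywords are tracked with the year they were deprecated; the Release type, the year-based versioning, "oldKeyword", and the grace_years parameter are all hypothetical, since the comment deliberately leaves the period undefined.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    year: int
    keywords: set[str]
    # keyword -> year it was deprecated (assumed carried forward until removal)
    deprecated_since: dict[str, int] = field(default_factory=dict)


def removal_violations(previous: Release, nxt: Release, grace_years: int) -> set[str]:
    """Keywords removed in `nxt` that break the policy: either they were
    never deprecated, or they were removed before the grace period ended."""
    violations = set()
    for kw in previous.keywords - nxt.keywords:
        since = previous.deprecated_since.get(kw)
        if since is None or nxt.year - since < grace_years:
            violations.add(kw)
    return violations


r2025 = Release(2025, {"type", "pattern", "oldKeyword"}, {"oldKeyword": 2025})
print(removal_violations(r2025, Release(2026, {"type", "pattern"}), grace_years=2))
# prints {'oldKeyword'}
```

Removing "oldKeyword" in a 2027 release would pass with grace_years=2, while removing "pattern" at any point would be flagged because it was never deprecated.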
-
Trying to be more on topic: I agree with your updated definitions of "dialect" and "release". I think it's worth adding a comment to @Julian's earlier post...
I have to agree here. I believe we want to have the ability to remove things in extreme situations, with the intent to never want to do so. Redefining "compatibility" to allow that doesn't make sense. Let's just call a spade a spade (say what we actually mean), and be really clear in communicating exactly the intent. I think what Julian suggests here is very strong. @mwadams also shares the concern that promising 100% compatibility forever is a nice wish, but probably naive. I get the sense Matthew and Julian are speaking from first-hand experience. "We won't break compatibility without a two-year deprecation notice, not everything deprecated will be removed, and removal, and therefore breaking a small fraction of compatibility, will only ever happen in extreme circumstances" doesn't sound unreasonable. (I'm not advocating for that exact wording or timeline at all here.) On to the direct questions from this discussion...
I think dialect and/or release is fine. The implication is, people can be specific if they so wish, as required.
I think yes. We can publish both "open" and "closed" versions of the living dialect schema and each snapshot. We can be clear about the implications of picking which meta-schema to use. The author should be able to make their intent clear.
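One way the "open" and "closed" variants could relate, sketched as a Python dict serialized to JSON: the closed meta-schema references the open one and adds "unevaluatedProperties": false, so any keyword the open meta-schema doesn't evaluate is rejected. The URIs are hypothetical, and using the 2020-12 meta-schema for the wrapper is an assumption made only so the example relies on keywords with well-defined semantics today.

```python
import json

# Hypothetical identifiers for the two published variants.
OPEN_URI = "https://example.com/meta/stable"
CLOSED_URI = "https://example.com/meta/stable-closed"


def closed_variant(open_uri: str, closed_uri: str) -> dict:
    """Build a meta-schema that accepts what `open_uri` accepts, except that
    it also rejects any keyword the open meta-schema does not evaluate.

    Assumes the open meta-schema declares its keywords via "properties",
    so they count as evaluated for unevaluatedProperties.
    """
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": closed_uri,
        "$ref": open_uri,
        "unevaluatedProperties": False,
    }


print(json.dumps(closed_variant(OPEN_URI, CLOSED_URI), indent=2))
```

A schema author would then pick a variant through $schema: the open URI tolerates unknown (for example newer or vendor) keywords, while the closed URI flags them, typos included.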
If I understand correctly, then, already answered, yes.
As a schema author, I want to be able to apply constraints. It reduces "surprises" and creates a better developer experience (at least for the near future; catching tooling up may take years).
I think being able to do either is fine. A "release" would imply the dialect.
I can't think of one. As per above, a "release" would imply a dialect, would it not? @jdesrosiers (or indeed @gregsdennis), my suggestion here is to attempt another ADR PR as a draft, drawing a line under at least what we seem to have agreed so far. We can then create a further Discussion for outstanding concerns if required, and look to finish and merge the ADR. (I think I objected previously because it looked like it had been decided that there would be no "release" identifiers one could use for …
-
I think we've been disagreeing on recent meta-schema discussions because we have differing mental models of what a stable JSON Schema dialect means. I think ambiguity of terms has been a cause of misunderstandings, so I'll start by defining the important terms I'm using.
Definitions
Compatible releases -- The validation result of any instance against a given schema must have the same true/false result according to every release in the compatible set including past and future releases. A validation result may be indeterminate (not true or false). If an instance is valid against one release and indeterminate against another, the releases are considered compatible.
Dialect -- A set of compatible releases. In the past, dialects have been a set of one release, maybe two if there was a patch release. Through 2020-12, official dialects have been called "drafts". A dialect may be defined using the Vocabulary System (OpenAPI 3.1) or not (MongoDB). Some dialects may be compatible with an official JSON Schema dialect (OpenAPI 3.1 <--> 2020-12) while others are not (OpenAPI 3.0 <-/-> draft-04).
Release -- Any published snapshot of the specification. There may be multiple releases for a single dialect. In the past, the only time we've had multiple releases for a dialect was for a patch release. The next release would introduce the "stable" dialect and releases after that would be updates to the "stable" dialect.
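The "compatible releases" definition can be stated as a small check, which also makes the objections raised in the replies easy to see. This is a sketch under stated modeling assumptions: each release is represented by its results on the same list of (schema, instance) cases, with None standing for "indeterminate"; none of these names come from the proposal itself.

```python
from itertools import combinations
from typing import Optional

# True = valid, False = invalid, None = indeterminate
Result = Optional[bool]


def results_compatible(a: Result, b: Result) -> bool:
    """Compatible unless one release says valid and the other says invalid."""
    return a is None or b is None or a == b


def is_dialect(results_by_release: dict[str, list[Result]]) -> bool:
    """Releases form a dialect if every pair is compatible on every case.

    Pairwise checking matters: True/None and None/False are each
    compatible, but that doesn't make True/False compatible.
    """
    for r1, r2 in combinations(results_by_release.values(), 2):
        if not all(results_compatible(a, b) for a, b in zip(r1, r2)):
            return False
    return True


print(is_dialect({"2024": [True, False], "2025": [True, None]}))
# prints True
```

The example call also shows the loophole pointed out in one of the replies: turning a False into None never breaks compatibility under this definition.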
Roles of meta-schemas
Meta-schemas have historically served several roles in JSON Schema. As we move to a compatible-releases process, we need to ask how that change affects the role of meta-schemas.
Validate a JSON Schema
The most longstanding role of meta-schemas is to validate a schema. But what does it mean for a schema to be valid? A schema could be valid according to a dialect, a release, or an implementation. In the past, these three validities converged, but now that releases can make compatible updates to a dialect, they can diverge, and we need to decide what a meta-schema should be describing.
Questions:
Identifying a Dialect
Schema authors use the meta-schema URI with $schema in their schemas to identify the dialect of JSON Schema they expect the schema to be evaluated with. There has never been a way to identify the release of JSON Schema.
Questions:
Defining a Dialect (Vocabulary System)
The third role of meta-schemas is to define a dialect using $vocabulary. When we release a new dialect, the vocabulary identifiers change to match the dialect. When we do an additional release for the same dialect, those identifiers don't change.
Questions:
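For readers less familiar with the Vocabulary System, this sketch shows the check an implementation performs on a meta-schema's $vocabulary under the 2020-12 rules: a vocabulary marked true must be recognized or the schema must not be processed, while one marked false may be skipped. The 2020-12 vocabulary URIs are real but used only as examples (the stable dialect's vocabulary URIs aren't given here), and the function name is hypothetical.

```python
# 2020-12 vocabulary URIs, used only as examples.
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
FORMAT_ANNOTATION = "https://json-schema.org/draft/2020-12/vocab/format-annotation"


def missing_required_vocabularies(meta_schema: dict, supported: set[str]) -> list[str]:
    """Return required (true) vocabularies this implementation doesn't know.

    A non-empty result means schemas using this meta-schema must be
    refused; unknown optional (false) vocabularies can be skipped.
    """
    vocabularies = meta_schema.get("$vocabulary", {})
    return sorted(uri for uri, required in vocabularies.items()
                  if required and uri not in supported)


meta = {"$vocabulary": {CORE: True, APPLICATOR: True, FORMAT_ANNOTATION: False}}
print(missing_required_vocabularies(meta, {CORE, APPLICATOR}))
# prints []
```

Because an additional release of the same dialect keeps the same vocabulary identifiers, this check gives the same answer for every release of that dialect, which is consistent with $vocabulary identifying a dialect rather than a release.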