Standardize implementation-defined behavior #190
Replies: 52 comments 14 replies
-
I think the intent in that specific paragraph is to say that an implementation can choose which draft it will use to process the schema. I expect most will choose the latest that they support.
-
My preference would be to change what's defined here. However, I think we COULD say that the implementation MUST select a dialect it knows about as default, which SHOULD be a JSON Schema org defined dialect. Additionally I'd like to add that if the dialect is chosen by the implementation (And not via This would have more impact on tools used to write schemas with immediate feedback. I'm not fully aware of the effects this could have for general purpose applications.
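As a rough illustration of the proposal above (a default that MUST be a dialect the implementation knows), here is a minimal sketch. The function name `select_dialect`, the `KNOWN_DIALECTS` set, and the choice of 2020-12 as the default are assumptions made for illustration; they are not taken from any implementation mentioned in this thread.

```python
# Hypothetical sketch: the URIs are the published meta-schema identifiers,
# but the names and the choice of default are assumptions.
KNOWN_DIALECTS = {
    "http://json-schema.org/draft-07/schema#",
    "https://json-schema.org/draft/2019-09/schema",
    "https://json-schema.org/draft/2020-12/schema",
}
DEFAULT_DIALECT = "https://json-schema.org/draft/2020-12/schema"


def select_dialect(schema, default=DEFAULT_DIALECT):
    """Return the dialect URI that should be used to process `schema`."""
    # The default itself must be a dialect the implementation knows about.
    if default not in KNOWN_DIALECTS:
        raise ValueError("default dialect must be a known dialect")
    # An explicit "$schema" wins, but only if it is supported.
    if isinstance(schema, dict) and "$schema" in schema:
        declared = schema["$schema"]
        if declared not in KNOWN_DIALECTS:
            raise ValueError(f"unsupported dialect: {declared}")
        return declared
    # No declaration: fall back to the implementation's default.
    return default
```

With this sketch, `select_dialect({"type": "string"})` falls back to the default, while a schema declaring an unknown `$schema` is rejected rather than silently reinterpreted.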
-
I added an abstract in the OP, but I'm not sure if that actually clarified what I'm looking for. I'll try to revise it as I better understand the problem space we're dealing with. (Edit: I ended up rewriting most of the OP.)
Yes, the behavior should be better specified, and not left undefined. However I'm not sure an error is reasonable. There's no situation where
This is more reasonable. Implementations shouldn't be allowed to pick any arbitrary behavior on their whim, the selected behavior should be unsurprising. However I don't think "dialects" are a good way to reason about this. Media types are a way to version. We're currently having this debate over the |
-
Be that as it may, JSON Schema has MANY uses outside of HTTP requests.
-
Sure, though this doesn't necessarily preclude using HTTP and Internet features. I wasn't seriously suggesting this as a solution, but it's interesting to think about.
I think I understand your position now, but I still have questions and problems that need addressing:
-
The example that So, an implementation needs to know or assume a dialect somehow to correctly evaluate even the simplest schema. The vocabulary system makes just about anything possible. In my implementation, I require that a dialect is declared somehow. It doesn't have to be with |
-
@jdesrosiers But I didn't say anything to the effect of "no matter which dialect is used"; the example |
-
@awwright I'm not sure what distinction you're trying to make. Every schema is written in some dialect whether that dialect is declared or not. If the dialect is not declared, an implementation may choose one to use to interpret the schema. That dialect does not need to be an official JSON Schema dialect. It could be a dialect where
This is a false assumption. We've made backwards incompatible changes in almost every release. We've never made any guarantees that any keyword will always work a certain way in every dialect past, current, or future. The vocabulary system allows users to create dialects that do almost anything and they are not required to be backwards compatible with official JSON Schema releases. |
-
The idea of "dialect" was comparatively recently introduced. E.g.
Ok, but
I've addressed this; we make very few changes that actually force implementations to change behavior in a breaking way; and only in very well-researched situations (like $ref). If you mean how we occasionally remove behavior: HTTP, email, etc, also remove behavior in every new release, that's not the same thing as "backwards incompatible" (removing something from a tech spec usually doesn't force implementations to change their behavior).
Ok, but I'm not talking about custom dialects/vocabularies/meta-schemas. The question is: What happens in the default case? |
-
The name "dialect" is relatively new (if you consider three years new). The concept is at least as old as the
I don't understand what you're trying to say. I can create a dialect that does something weird with
You say this as if we're considering a change to the specification that might break that schema. This problem already exists. Custom dialects that can do almost anything are already a reality. You can't put that genie back in the bottle. I'm not defending it. I'm just saying we have to accept what already exists in the wild.
We don't have one source of truth for what the default behavior is. Right now, every dialect declares its own rules and historically, those rules have been very permissive. Even if in a future release we constrain what can be done in the default case, implementations that were written for previous drafts would not be affected. They could still choose whatever dialect they want (or error, or whatever) in the default case. If the default behavior is different between dialects, there's no way for an implementation to know which default to follow.
-
The ability to use a keyword to change the dialect so that any keyword can mean anything does not go back that far. The concept of "$schema" was introduced in draft-03, as a way to provide a hint to validators. Its use by both authors and validators was completely optional:
This is consistent with "$schema" being a meta-schema reference, that could optionally be used as a versioning heuristic. At the very earliest, the idea that you could use "$schema" to switch behaviors was draft-04, when Hyper-Schema was published as a separate specification. Even at this point, I don't believe $schema was required in order to switch behavior. If I used a JSON Schema validator that was hypermedia enabled, I would expect the hypermedia functions to work even in the absence of the "$schema" keyword. In draft-05 a.k.a. draft-wright-json-schema-00, I removed specific references to older values of "$schema", replacing it with a generic paragraph about how it's OK to implement values found in other publications. This is where the idea that "$schema" can switch behaviors properly comes from; and it was very carefully worded to maximize forward compatibility.
You seem to be caught up on the idea that I can define a custom dialect and give "type" whatever semantics I want. I am specifically saying that I am not using this functionality. When someone publishes a post on Stack Overflow about why We don't need to ask and it doesn't matter. We know we're talking about the validation keywords defined in the latest draft. But how does a validator know this? Where is this written? It appears that it's not. (Now if you want to adjust the settings, or write an implementation that does something special ($data), or create a new dialect that does something unexpected, go for it; that's not what I'm objecting to here.) |
-
I get that you two are trying to come to a common understanding, but it seems to me like you're both splitting hairs. This conversation has veered away from its original purpose, which was to unify implementations' behaviors when This isn't an error state like the other cases where we say "implementation-defined" or "undefined" behavior. In this case, we have a schema that can be processed. Implementations should do it the same way. I don't really have stake in which way this goes, except that I need to know what to implement. |
-
Yes, I agree with this.
-
Actually, I'm not sure we can answer this question without coming to an agreement on this question about the fundamental nature of how JSON Schema works. But, this discussion isn't making progress and we need to stop until we can come up with a more effective way to have this discussion.
My perspective on the question of what the default behavior should be is that there is no safe choice other than throwing an error. The problem with defaulting to a specific draft is that an implementation can't change that default without possibly breaking their users' code. People will depend on that default not changing, so you can't just change the default to the current version every time there is a new release. It would be fine if each release was backwards compatible, but that's not the case. The only safe thing to do if no dialect is known is to refuse to process the schema.
The other problem is that defining a default behavior would transcend dialects and would affect all implementations in existence regardless of what dialect they were written for. That's why I think the right place for this to be defined is the media-type specification that is currently in progress rather than in our next release. But, globally defining a default behavior would break many existing dialects. For example, OpenAPI 2.0 and 3.0 and MongoDB depend on assuming their dialect is the default. If we say the default is, for example, draft-07, then suddenly those OpenAPI and MongoDB schemas that don't declare a
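The concern above about changing a default can be made concrete with one well-known incompatibility: in draft-04 `exclusiveMinimum` is a boolean that modifies `minimum`, while from draft-06 onward it is a number in its own right. The sketch below is an assumption-laden illustration, not a real validator: the two function names are hypothetical, only these two keywords are handled, and the draft-06 version simply ignores a boolean `exclusiveMinimum` (a stricter implementation might reject the schema instead).

```python
def check_minimum_draft04(schema, value):
    """draft-04 reading: `exclusiveMinimum` is a boolean modifier of `minimum`."""
    minimum = schema.get("minimum")
    if minimum is None:
        return True
    if schema.get("exclusiveMinimum", False) is True:
        return value > minimum
    return value >= minimum


def check_minimum_draft06(schema, value):
    """draft-06+ reading: `exclusiveMinimum` is itself a numeric bound."""
    ok = True
    if "minimum" in schema:
        ok = ok and value >= schema["minimum"]
    exclusive = schema.get("exclusiveMinimum")
    # Assumption: a boolean value is ignored here rather than treated as an error.
    if isinstance(exclusive, (int, float)) and not isinstance(exclusive, bool):
        ok = ok and value > exclusive
    return ok


# A schema that does not declare "$schema"; its meaning depends on the default.
undeclared = {"minimum": 5, "exclusiveMinimum": True}

result_under_draft04 = check_minimum_draft04(undeclared, 5)
result_under_draft06 = check_minimum_draft06(undeclared, 5)
```

Under these assumptions the instance `5` is rejected with the draft-04 reading and accepted with the draft-06 reading, so an implementation that silently moved its default from one to the other would change the outcome for existing users.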
-
Except for some standard assumptions, the fundamental nature of how JSON Schema works ought to be defined in the specification. These assumptions would be:
@jdesrosiers Perhaps you can confirm which assumptions you're making, or which of these you disagree with, and add to this list anything you think is relevant. For example: You've mentioned how $schema can be used to switch behaviors for $ref, a core keyword. But this is not clear. Section 3 defines a dialect as a "set of vocabularies"; Section 8 says that "$schema" sets the dialect; that the core vocabulary is required; and no keyword (core or otherwise) is defined in terms of the dialect or value of "$schema"; therefore, the core keywords must exhibit the same behavior regardless of dialect, or value of "$schema". Is this an assumption you're making that ought to be self-evident, is this a proposal you're making, is this a contradiction in the spec to be fixed; or is this a faulty reading on my part? |
-
Can you be more specific?
Through draft-05 at least, a validator would not be able to reject a schema just because it didn't understand the meta-schema. Or at least, this was not required. It wasn't even suggested. Some amount of reverse compatibility was the default. It didn't even exist until draft-03, before which, new drafts introduced keywords like "divisibleBy" and "uniqueItems". Now, maybe the behavior was under-specified, and it deserved a stronger definition. But this has reduced backwards compatibility, and this should be closely examined. |
-
I'm responding to comments like:
or
or implicit claims in
that because something worked a certain way until a certain point that it works that way forever. We have a specification -- what you or I think is right or wrong based on external knowledge isn't relevant -- you need to cite where in the specifications the things you're claiming are written. |
-
Without that, your (or my, or anyone's) opinions are of course valuable to inform what we do going forward, but they have no bearing whatsoever on what's already written down clearly.
-
The context of those comments is what I would expect if interoperability is a goal: one party can rely on other (compliant) implementations to have a predictable behavior.
If I send out a schema to two clients, Alice only accepts 2019-09, Bob only accepts 2020-12, then JSON Schema doesn't seem very interoperable! Now I don't expect everyone to be on the latest version of software, but I do expect to be able to fall back onto a version that everyone supports. But since 2019-09, JSON Schema doesn't prescribe backwards compatibility, at all. Should interoperability (other implementations having predictable behavior) be a goal? Do you agree with the conclusions I'm drawing from this definition?
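The Alice and Bob scenario can be sketched as a search for a draft that every recipient supports. This is only an illustration: `DRAFT_ORDER` and `latest_common_draft` are hypothetical names, and the short labels stand in for the real meta-schema URIs.

```python
# Drafts listed oldest to newest; labels are shorthand, not real "$schema" values.
DRAFT_ORDER = ["draft-04", "draft-06", "draft-07", "2019-09", "2020-12"]


def latest_common_draft(*supported_sets):
    """Return the newest draft supported by every party, or None if there is none."""
    common = set(DRAFT_ORDER).intersection(*supported_sets)
    for draft in reversed(DRAFT_ORDER):
        if draft in common:
            return draft
    return None


alice = {"2019-09"}
bob = {"2020-12"}
fallback = latest_common_draft(alice, bob)
```

In this sketch `fallback` is `None`: with no overlap in supported drafts there is nothing to fall back onto, which is the situation the comment describes.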
-
I don't know what you mean by "conclusions". Draft 2019 and 2020 work the way they say they work. That's not mutable. No logical argument, or definition, or contortion can change what they say. You can work to change how draft 2022 works by trying to convince others to have the same definition of interoperability that you do, or to value interoperability more than it has been, either of which may explain why the 2 drafts don't work the way you expected them to. You cannot change how the existing drafts work, and you seem to continue to try and use a logical argument to affect how 2 clear specifications work. I'm trying to point this out simply to try and make it plain how futile that line of thought is, it's just not how specifications work. If or once we get past that we can worry about more useful things like what we want to be, and stop trying to push walls for things that already are the way they are. You also continue to make plainly incorrect statements, which is a frustrating way to have discussions:
No draft prior provided backwards compatibility guarantees. You are aware of this. There are at least 2 examples which I thought of very immediately of this when prompted. Please stop painting these two drafts as the first deviation. They deviate in a specific way you disagree with. I disagree with the |
-
@Julian There's a serious misunderstanding here. The theme of my posts today has been: Is interoperability a goal, do the drafts satisfy that goal, and what improvements can we make?
This is worded like I disagree, but I'm not sure what you're arguing against.
I did not mean to say that e.g. draft-03 prescribed reverse compatibility. Backwards compatibility was nonetheless supported, as a consequence of how it's written; and if we want to support the same kind of backward compatibility in the future, we will have to prescribe additional behavior. For example: Except for some keywords that were removed, a draft-03 validator could be fed a draft-02 schema and it would be guaranteed to work; $schema was not a keyword then. In contrast, you cannot take a 2019-09 schema and reliably expect it to work in validators compliant with subsequent drafts. Such a validator could reject all older schemas and this would be legal (that is the topic of this issue). This is what I mean when I say our support for backwards compatibility has been falling, and should be re-examined in the light of $schema and meta-schemas.
-
@awwright OK, after going for a walk and coming back to see these last couple of comments, I decided to do some serious digging. You mentioned draft-05 after my last comment, and I had not looked at that or draft-06 when I was looking at Imagine my surprise when I discovered that there was in fact normative wording regarding past drafts in draft-05 through draft-07. From draft-05 (the relevant wording is carried through draft-07 unchanged):
First of all, why did you not simply quote and link this specification text? I asked repeatedly for any sort of written evidence of compatibility requirements. All you needed to do was show this to me, and we could have saved a tremendous amount of effort and frustration. I had been working from the draft-03 and draft-04 text, which has no such requirements, and since you were referencing those as well, I assumed I was looking at the right thing. Anyway, I was wrong in my assertion that there was never any language around this. There was, for four drafts (wright-00 and -01, handrews-00 and -01). Having discovered that, I went and hunted down the PRs where you added that language, and where I removed that language. @awwright, you added that language in PR json-schema-org/json-schema-spec#50 "Fix a lot of id/ref/dereferencing problems", which you posted on Sept. 15, 2016 at 1:31PM PDT, and merged less than two days later with no reviews and no comments at 8:50 AM PDT. I went back through my email from around that time and could not find any relevant discussion on the old mailing list, although I did not spend too much time digging. But it's pretty clear that even if that then-new backwards-compatibility SHOULD was discussed at some point, it was definitely not subject to a typical PR review and approval process. That doesn't invalidate it – we had not yet agreed on that sort of process question. But it's relevant in terms of how much you can claim to have gotten buy-in for it. Furthermore, it's a SHOULD rather than a MUST, and comes with a "deemed reasonable" qualifier. That is hardly an ironclad guarantee of compatibility-interoperability. And the language does not give any guidance on what an implementation ought to do if it encounters a Looking at that text, I don't see any reason that an implementation couldn't refuse to process such a schema. 
If it did not refuse to process it, then it would obviously be processing it by mismatched rules, and certainly by draft-07 that was pretty problematic. I know you don't think so, but I am not aware of any draft-07 implementation that would correctly handle draft-04's The case of not having a Having dug that up, I went to find out where I'd removed that language, which was in PR json-schema-org/json-schema-spec#671 '"$vocabulary" and basic vocabulary support.' opened on Nov. 10, 2018. Initially, this PR was structured as a few commits that were each logical steps in the larger change, with the recommendation that each commit be reviewed separately. The change to The PR was open for over a month before being merged on Dec. 17, 2018. There were a lot of review comments and updated commits. And, of course, it was nearly another year before 2019-09 actually went out. The initial PR did not remove the SHOULD regarding past drafts. I went through every single resolved comment, and eventually found the discussion with Jesús González where I decided to remove it (you'll have to click "show resolved" to see it). On Nov. 29th at 6:16 PM PST I wrote:
A few hours later, at 3:34 AM PDT on Nov. 30th, you (Austin) commented on the PR. The timing is unfortunate, as I assume you reviewed it before I made that change. However, you would have just gotten an email notification about the updated commit, which was pushed on Nov. 29th (GitHub's UI says Dec. 16th due to some rebase weirdness, but the git log shows the correct timestamp), and my comment on it. I replied to your comment later on the 30th, but you never replied or otherwise interacted with the PR after that, so I don't know what you did or didn't see. But you were definitely aware that there was a major change up for review in this area, with active discussions and updates in response to feedback. And it stayed up for another two weeks or so. Ben approved the PR on Dec. 4th, Greg on Dec. 16th, and I merged it on Dec. 17th. Of course, I've missed stuff in reviews that I later realized were important. Missing the change in the PR does not automatically invalidate your concerns. But please, stop treating this change in 2019-09 like a mistake that just slipped in. The PR was open for a long time, and had a lot of eyes on it, not just from the JSON Schema core team, but from other community members including one from the OpenAPI core team. Regardless, I continue to assert that 2019-09 did not actually break anything. Your compatibility requirement was not strong enough for true interoperability, as it offered no guidance for handling unsupported drafts. Even though it's not at all clear what an implementation ought to do in such a case, and it's quite likely that without proper support for a draft, the result of evaluating it will often be wrong. It doesn't matter at all that I'll also note that 2019-09 was the first draft to provide explicit guidance on cross-draft compatibility, albeit only for I am happy to discuss what sort of compatibility you think should be present in the spec in the future. 
What we need to debate now is whether your concept of compatibility is the right one. I do not think it is, because even when your SHOULD was in the spec, as far as I can tell no one implemented the sort of cross-draft interoperability you think was implied. Please do feel free to demonstrate otherwise by linking to such an implementation. I also don't think that a directive to implement versions back to draft-04 or whatever would go over well with the larger community, particularly folks who wrote their implementations recently enough that they only support 2019-09 and later. I have thoughts on what sort of compatibility is needed, but I'm exhausted by this discussion right now and don't expect to get back to it in the necessary depth to explore that until next week sometime. |
-
Yes, this. I don't see any value in discussing previous or even the current draft in terms of draft version compatibility. (Honestly, I thought the "compatibility" discussion was clamping down on different implementations of the same drafts of JSON Schema, not cross-draft support for when schemas don't specify the draft). The previous and current versions of JSON Schema are AS IS. There's no changing them.
I know others have raised this as it seems like that's what @awwright you were proposing, but then you said you were not. Given multiple people think this is what you were proposing, whatever you're trying to say maybe doesn't have enough context. It's not clear. It feels to me like the majority of this discussion is off-topic (if we even agree on the purpose of the discussion, which I'm not sure we do). Cross version compatibility moving forward is something we have said we want to look at, but including previous drafts in that discussion feels... pointless.
-
Going back to the beginning, @awwright said:
This is trivial to fix:
That prevents the pathological behavior, while preserving the intent of the current specification (allowing a refusal to process). Changes to that intent (e.g. forcing implementations to always attempt to process) would require a much more in-depth discussion. |
-
Yeah you're right that wording needs some tweaks. And since we're discussing specific wording I don't think it's a retread. I think what we want is:
Options 1, 2, and 3 are straightforward. Options 2 and/or 3 are what nearly everyone does AFAICT, and the test suite up until now has relied on option 3 to correlate directory names and processing rules.
I'm not sure if anyone does option 1, but I can see an implementation that only supports the core and applicator vocabularies plus some specialized annotation vocabularies using option 1. And yes, I've implemented a system in JSON Schema that did not use assertions at all. The core vocabulary is a very useful way to manage a large set of JSON resources that can be combined in different ways. We can debate whether such things are really "schemas", but the system implemented the core vocabulary and the JSON Schema way of loading and walking the resources. But it wouldn't make sense to feed it a validation schema.
Option 4 is an attempt to accommodate @awwright, based on what he says his intent was with his draft-05 wording. I would prefer to drop it, as I think it leads to horribly unpredictable behavior that may technically involve "compatibility" but won't be at all reliable and therefore not what I would consider interoperable. But I was trying for minimal disruption. Option 4 was not forbidden by the 2019-09+ language at all, and seems to be what Austin wants, so I preserved it for this minimal fix.
I'm not willing to restore the SHOULD regarding older versions of JSON Schema without debating it as a separate point. Regardless of how potentially cross-draft
-
In this issue, and in others that I've filed, I'm trying my best to identify specific problems and their solutions; but in the course of discussion I'm not getting a good sense of why each problem isn't actually a problem (e.g. "actually, that problem can be solved by doing ..." either inside or outside JSON Schema) or why a solution would be insufficient ("it's not possible/practical to solve this problem because ..."). So far, for every specific problem that someone has identified, I agree it can and should be solved. E.g. a rule written one place should work the same way everywhere. There shouldn't be silent failures. A schema should be useful for non-validation purposes. I'm not sure I've done a good job keeping this clear. I'd like to schedule time with everyone individually, and gather everyone's list of priorities. Including current unsolved problems, and identifying existing features that we shouldn't inadvertently break. I'll try to collect specific examples/tests that we can judge solutions by (for example: adding if/then/else to a schema shouldn't silently fail to do anything). If you don't agree with anything I suggested above, I want to hear that too. |
-
It's not obvious to me from that wording that it means that processing a schema without |
-
This is a very long thread already and I don't want to make it worse, but I'd like to provide my perspective on the whole backwards-compatibility problem from the point of view of somebody building non-validation-related JSON Schema tooling. If a JSON Schema definition lacks the The I agree with @awwright that many protocols have declared an explicit intention of preserving backwards-compatibility from the start and because of that, they can get away with not having version specifiers and having things just work (or at least most of the time). However, while a strong desire for backwards-compatibility helps with adoption, it also hurts innovation. Even in the context of networking, we know that there are things that are not great, or could be done better, but we are still stuck with them because of the backwards-compatibility baggage.
As I see it, the current philosophy of JSON Schema is to NOT play the strong backwards-compatibility game. Instead, it embraces the vocabulary system to make breaking changes when needed, and it has done so several times, as seen in the "Migration" sections in https://json-schema.org/specification.html. The presence of
Here is my position based on my work on JSON BinPack and several discussions with both @Relequestual and @Julian:
I think a middle ground that would help with the above strategy through automated software like |
-
So it turns out the In draft-02 and earlier, the value of |
-
Just leaving this here as a note. JSON Path is also looking at how to define and indicate this idea.
-
The specification currently specifies that schemas without any "$schema" keyword are "implementation defined" (emphasis added):
This under-constrains the validation behavior and permits behavior that I would never expect to be possible (it wouldn't be wrong to say that {"type":"string"} permits null). And it provides no guarantees about reverse-compatibility; updates to the meta-schema that are reverse-incompatible will have broad effects. (That is to say, dialects don't actually achieve the intended goal of forward compatibility; they just push this responsibility onto implementations, probably in platform-specific ways.)
Further, there are at least a few implementations that do not read "$schema", and even if they should, requiring this keyword would be impractical. It's perfectly clear what is meant when someone writes {"type":"string"} without any context.
It's understood that implementations aren't always up-to-date on a spec; that some implementations only follow older versions of a spec normally goes without saying. When we say that documents without "$schema" have implementation-defined behavior, we're either being redundant, or we're saying something more than that.
The "$schema" keyword is, of course, useful. It can declare a subset of the JSON Schema vocabulary (e.g. some document databases might not be able to implement the full vocabulary); or user or implementation-specific keywords (vocabularies). And it can be used as a heuristic to decide if older behavior for a keyword should be used.
But it can't solve all versioning problems. A custom meta-schema might not define compatibility with any draft/release of JSON Schema. We still need to define the behavior for these situations.
By comparison, text/html and application/xhtml+xml have a single document that defines how to interpret all documents, even ones marked with an older version number. Some HTML versions allow you to specify a DTD, but these restrict the elements you're allowed to use, they aren't required, they don't actually change the semantics of the elements, and their omission has a standardized behavior.
A simple test for potential solutions is: null (and other values) cannot be valid against {"type":"string"}. Currently the behavior is under-constrained (under-specified) and there's nothing to say that this would be wrong.
Potential solutions:
Error on a missing $schema keyword: This would break a very large number of existing implementations and documents. null wouldn't be valid, but it wouldn't necessarily be invalid either. (And even if we were authoring from scratch, requiring version identifiers in documents is strongly discouraged in Internet media types, and very unpopular among document authors.)
Implementations assume the latest known, standard $schema: This would give stronger guarantees about cross-platform compatibility (presumably, "type" would always mean the same thing). However, this would defeat the intention of "$schema" as a dialect identifier. If an implementation updates its default "$schema", this would break reverse compatibility if the new dialect is not reverse-compatible. (This is also true wherever the behavior is undefined or implementation-defined.)
Single media type specification: Newer drafts replace older drafts in their entirety, though implementations may choose to implement older behavior for reverse compatibility reasons. Because the URI of the meta-schema changes with every draft, "$schema" can be used as a heuristic to determine if a schema is expecting an older, superseded behavior. All releases would have to be reverse compatible, or at least, changes would have to be carefully weighed, especially if there are orthogonal implementations. This was the behavior through draft-07.
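To make the simple test and the first two potential solutions concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: `MissingSchemaPolicy`, `resolve_dialect`, and `validate_type` are hypothetical names, only the "type" keyword is handled, and the integer check is simplified. It is not how any particular implementation works.

```python
from enum import Enum

# Assumed "latest known" dialect for this sketch.
LATEST_KNOWN = "https://json-schema.org/draft/2020-12/schema"


class MissingSchemaPolicy(Enum):
    ERROR = "error"                  # refuse schemas without "$schema"
    ASSUME_LATEST = "assume-latest"  # treat them as the latest known dialect


# Simplified type checks; bool is excluded from the numeric types on purpose.
JSON_TYPES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def resolve_dialect(schema, policy):
    """Pick the dialect for `schema`, applying `policy` when "$schema" is absent."""
    declared = schema.get("$schema")
    if declared is not None:
        return declared
    if policy is MissingSchemaPolicy.ERROR:
        raise ValueError('schema does not declare "$schema"')
    return LATEST_KNOWN


def validate_type(schema, instance, policy):
    """Evaluate only the "type" keyword, after the dialect has been resolved."""
    resolve_dialect(schema, policy)  # may raise under the ERROR policy
    expected = schema.get("type")
    if expected is None:
        return True
    names = [expected] if isinstance(expected, str) else expected
    return any(JSON_TYPES[name](instance) for name in names)
```

Under the assume-latest policy, `validate_type({"type": "string"}, None, MissingSchemaPolicy.ASSUME_LATEST)` returns `False`, so the simple test passes. Under the error policy the same call raises, which matches the observation that null "wouldn't be valid, but it wouldn't necessarily be invalid either".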