Why is $schema restricted to root schemas? #431

Closed
handrews opened this issue Oct 2, 2017 · 14 comments

@handrews
Contributor

handrews commented Oct 2, 2017

In PR #248 we forbade the use of $schema in subschemas. I can't remember why I approved this.

One use case we've noted over and over is that of "packing" multiple schema files into one. It's the main justification for the base-uri-changing functionality of "$id". What that means is that the only time something being a root schema matters is if it is the root schema of the entry point file for processing.

Once you are using multiple files, then whether you $ref to another file (with no fragment or "#" as the fragment), or whether you pre-process and "pack" that file into the original file, the result is the same. But in the former, the referenced schema is a root schema. In the latter case, it is a subschema.

That means that using $ref, you can reference a draft-04 schema from a draft-06 schema. But if you pack it, suddenly that is illegal, because you can't use $schema in the packed subschema to switch the processing rules.
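To make the contrast concrete, here is a minimal Python sketch of the two forms. The `example.com` URIs, the `address` definition name, and the helper `find_nested_schema_keywords` are hypothetical and exist only for illustration; the two meta-schema URIs are the draft-04 and draft-06 identifiers. The walk is deliberately naive: it treats every nested object as a potential schema.

```python
# Hypothetical draft-04 schema that normally lives in its own file.
ADDRESS_DRAFT04 = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "id": "http://example.com/schemas/address.json",
    "type": "object",
    "required": ["street"],
}

# Form 1: the draft-06 entry point references the other file.
# The $ref target is the root schema of its own document.
BY_REFERENCE = {
    "$schema": "http://json-schema.org/draft-06/schema#",
    "$id": "http://example.com/schemas/person.json",
    "properties": {"address": {"$ref": "address.json"}},
}

# Form 2: the same schema "packed" into the entry point file.
# The draft-04 schema is now a subschema that still carries "$schema".
PACKED = {
    "$schema": "http://json-schema.org/draft-06/schema#",
    "$id": "http://example.com/schemas/person.json",
    "properties": {"address": {"$ref": "address.json"}},
    "definitions": {"address": ADDRESS_DRAFT04},
}


def find_nested_schema_keywords(node, path="#", is_root=True):
    """Return pointer-like paths of non-root objects that contain "$schema".

    Simplification: every nested object is treated as a possible schema,
    so a property literally named "$schema" would be a false positive.
    """
    found = []
    if isinstance(node, dict):
        if "$schema" in node and not is_root:
            found.append(path)
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        return found
    for key, value in children:
        found.extend(find_nested_schema_keywords(value, f"{path}/{key}", False))
    return found
```

Under the restriction from PR #248, `BY_REFERENCE` has nothing to flag, while `PACKED` contains a `$schema` in a subschema position even though both describe the same set of schemas.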

This seems very wrong.

@awwright you wrote the PR- do you remember why it seemed correct? What am I missing?

I know @epoberezkin had some concerns about implementation, but I don't recall why that was compelling- the first thing you do on processing a schema is check $schema to set the rules for processing the rest of the schema (this is how my embryonic implementation handled things before I decided it was probably best to leave that to other libraries). Perhaps @Julian has thoughts on this as well?

@handrews handrews added this to the draft-07 (wright-*-02) milestone Oct 2, 2017
@handrews
Contributor Author

handrews commented Oct 2, 2017

I suppose we could redefine "root" schema as "a schema that sets a new base URI" or something, but that seems confusing.

@epoberezkin
Member

epoberezkin commented Oct 2, 2017

@handrews we had a very long conversation on this issue, covering both the point that $ref and inclusion are not the same and the point that the meta-schema is itself a JSON Schema that cannot be changed during validation.

I can point you to the relevant comments somewhat later but let's please keep it as it is.

Referencing draft-04 schema from draft-06 schema is fine, because $ref is not schema inclusion. Having $schema in the middle of the schema is not fine, as there is no validation process defined that allows to change schema (in this case, meta-schema) on the fly.

@epoberezkin
Member

Root schema is the top level of a separate JSON instance, not any inner schema that changes base URI. So we can't redefine what root schema means.

@epoberezkin
Member

@handrews I am happy to have this discussion again, as long as it stays the same for draft-07, it's not seen as either critical or a bug, and you re-read our previous conversation on the subject.

Let's get draft-07 out as is and then we can talk again about it.

@epoberezkin
Member

With the current spec, you can pack multiple schemas into a single file that is a collection of schemas but not a JSON schema, in the general case.

@handrews
Contributor Author

handrews commented Oct 2, 2017

> Having $schema in the middle of the schema is not fine, as there is no validation process defined that allows to change schema (in this case, meta-schema) on the fly.

That doesn't make any sense. Why would you need a special process? The process for handling a schema is:

  1. Check for $schema, use it to set further rules
  2. Check for a change in base URI (either id or $id depending on $schema)
  3. Process the other keywords

recurse and repeat as needed. It doesn't matter whether the schema is at the top of the file, inside the file, the target of a $ref, or anything else. A schema is a schema, wherever it is. It "inherits" the parent's $schema value and base URI if those are not changed, but otherwise it's all processed the same way no matter how you got there.
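The three steps above, with inheritance of `$schema` and the base URI, can be sketched roughly as follows. This is an illustration under stated assumptions, not code from any implementation mentioned in this thread: the function names are hypothetical, only `properties` and `definitions` are traversed, the default dialect is assumed to be draft-06, and "processing" is reduced to recording the effective settings per schema object.

```python
from urllib.parse import urljoin

DRAFT04 = "http://json-schema.org/draft-04/schema#"
DRAFT06 = "http://json-schema.org/draft-06/schema#"


def base_uri_keyword(schema_uri):
    """draft-04 spells the base-URI keyword "id"; draft-06 spells it "$id"."""
    return "id" if schema_uri == DRAFT04 else "$id"


def walk(schema, schema_uri=DRAFT06, base_uri="", path="#", out=None):
    """Record the effective ($schema, base URI) pair for each schema object."""
    if out is None:
        out = {}
    if not isinstance(schema, dict):  # e.g. draft-06 boolean schemas
        return out
    # Step 1: check for $schema; otherwise inherit the parent's value.
    schema_uri = schema.get("$schema", schema_uri)
    # Step 2: check for a base URI change, using the keyword the dialect defines.
    new_id = schema.get(base_uri_keyword(schema_uri))
    if isinstance(new_id, str):
        base_uri = urljoin(base_uri, new_id)
    out[path] = (schema_uri, base_uri)
    # Step 3: process the other keywords; here that only means recursing
    # into two subschema-bearing keywords.
    for keyword in ("properties", "definitions"):
        for name, sub in schema.get(keyword, {}).items():
            walk(sub, schema_uri, base_uri, f"{path}/{keyword}/{name}", out)
    return out


# Hypothetical packed document mixing two dialects.
doc = {
    "$schema": DRAFT06,
    "$id": "http://example.com/root.json",
    "definitions": {
        "old": {
            "$schema": DRAFT04,
            "id": "old.json",
            "properties": {"x": {"type": "string"}},
        },
        "new": {"type": "integer"},
    },
}
result = walk(doc)
```

In this sketch the same function handles the root and every subschema; the only difference between them is the inherited arguments passed down the recursion.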

What am I missing?

@handrews
Contributor Author

handrews commented Oct 2, 2017

> Let's get draft-07 out as is and then we can talk again about it.

I likely need this resolved for my usage, so I'm not interested in deferring it.

@handrews
Contributor Author

handrews commented Oct 2, 2017

Plus, everything about draft-07 is blocked on other people anyway, so it's not like it's being held up for this.

@handrews
Contributor Author

handrews commented Oct 2, 2017

@epoberezkin also, while I want it resolved for draft-07, that just literally means resolved. It doesn't mean any specific resolution. If you or @awwright can explain how/why this is supposed to work, that's fine. Right now I see the requirements as contradictory, and your statements so far have not cleared that up.

@handrews
Contributor Author

handrews commented Oct 3, 2017

@epoberezkin OK I went back and read the whole thing (issue #244, not the PR) again. I didn't slog through it before b/c I assumed this was a simple error, which was incorrect. Anyway... I think I figured it out, including why we see this so differently (both views actually make sense).

The whole discussion mostly just reminds me how much I hate $schema, which is not new.

The issue is actually pretty inconclusive, and there is a CREF explaining that the behavior might change, because I was never entirely sold and @awwright also had some concerns (I think; he proposed including the CREF, anyway).

Anyway, you pushed off further discussion in the name of shipping a draft last time. Which was totally reasonable, I'm not complaining! But I am putting my foot down this time (I'm doing the vast majority of the non-PR-review work on this draft so I feel entitled).


Our opposing views are easily explained by the two totally different purposes assigned to $schema: indicating which schema to use to perform validation on the containing schema as an instance, and declaring the vocabulary within the local schema object. The former is, by nature, an assertion across the entire file- no more, no less. The latter is local for each schema object, but inherited from parent schemas when no $schema is present.

You are primarily concerned with its impact on validating the schema as an instance, because you wrote and maintain a validator. Sensible!

I am primarily concerned with declaring vocabulary, because I look at JSON Schema as a system for defining and using numerous vocabularies. My main interest in validation is as a hook for applying other vocabularies. That includes using several vocabularies in the same file, and even using multiple vocabularies concurrently in one schema object (hyper-schema plus UI generation, for instance). See also #314 for more details on how difficult this is right now.


So what do we really need from this keyword? I'm going to argue that

  1. We actually don't need it, although we should keep it in draft-06 form (or very close to it) for compatibility at least for now
  2. We do need something else that behaves differently

Validating schemas as instances

We don't really need $schema to declare how to validate the schema as an instance. There are already mechanisms for doing that, and they are the only mechanisms available to other instances. And somehow non-schema instances get validated just fine :-)

Declaring vocabulary on a schema object-by-schema object basis

This is separate from declaring what to use to validate the whole document. If a schema needs to mix vocabularies, then the validating meta-schema must support all vocabularies. This avoids the whole "validating a schema becomes a special case" problem.

So if for some reason I have a frankenstein schema that switches back and forth between draft-04 and draft-06 (say, because different teams maintain different schemas, but they all need to be shipped in one file), I would need to validate that against a meta-schema that recursively anyOf'd the draft-04 and draft-06 meta-schemas.

And then I'd need to use $schema all over, except that actually that doesn't work very well for the reasons explained in #314. So clearly a different solution is called for. Specifically, I'll propose $vocabularies, which takes an array of schema URIs. It declares the vocabularies that are in use by the local schema object, with the semantics being that an implementation that recognizes a vocabulary can make use of that schema as that one vocabulary defines, and can ignore keywords from unrecognized vocabularies.

So I might have:

{
    "$vocabularies": [
        "http://json-schema.org/draft-07/hyper-schema",
        "http://example.com/custom-ui-generation-schema"
    ],
    "links": [...],
    "someUiGenKeyword": {...}
}

A hyperschema implementation can use this as a hyperschema. An implementation of the custom UI vocabulary could use it for that. An implementation understanding both could in theory use them together in some way (but that would probably not be good vocab design).
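A rough Python sketch of the "use what you recognize, ignore the rest" semantics described above. Note that `$vocabularies` is only the keyword proposed in this comment, and everything else here is an assumption made for illustration: the `VOCABULARY_KEYWORDS` table, the `usable_keywords` helper, and the idea that each vocabulary URI maps to a fixed keyword set.

```python
HYPER_SCHEMA = "http://json-schema.org/draft-07/hyper-schema"
CUSTOM_UI = "http://example.com/custom-ui-generation-schema"

# Hypothetical table of which keywords each vocabulary defines.
VOCABULARY_KEYWORDS = {
    HYPER_SCHEMA: {"links"},
    CUSTOM_UI: {"someUiGenKeyword"},
}


def usable_keywords(schema, supported):
    """Return the keywords of `schema` that an implementation supporting
    the vocabulary URIs in `supported` would act on.

    Keywords belonging to declared but unrecognized vocabularies are ignored.
    """
    declared = schema.get("$vocabularies", [])
    allowed = set()
    for uri in declared:
        if uri in supported:
            allowed |= VOCABULARY_KEYWORDS.get(uri, set())
    return {key: value for key, value in schema.items() if key in allowed}


# The example schema object, with the elided values replaced by empty
# placeholders so that the sketch runs.
schema = {
    "$vocabularies": [HYPER_SCHEMA, CUSTOM_UI],
    "links": [],
    "someUiGenKeyword": {},
}

hyper_only = usable_keywords(schema, {HYPER_SCHEMA})
ui_only = usable_keywords(schema, {CUSTOM_UI})
both = usable_keywords(schema, {HYPER_SCHEMA, CUSTOM_UI})
```

Each implementation sees only the keywords of the vocabularies it supports, which is the behavior the proposal asks for; how two vocabularies should interact when both are understood is left open here, as it is in the proposal.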


So for draft-07 I recommend that we:

  • Leave $schema specified as it is (which, as of draft-06, is restricted to declaring a single validating meta-schema), but flag it for deprecation and document that the best practice is to associate or choose the validating schema the same way you would for any other instance document
  • Introduce $vocabularies, probably after some more debate (here or in #314, "Understanding extended meta-schemas")

@epoberezkin
Member

epoberezkin commented Oct 3, 2017

There is no way, in the general case, to define which schema should be used with a JSON instance. In most cases it is defined at the application level. Am I missing something? For JSON Schema it is convenient to have the meta-schema defined at the top level of the JSON instance.

I like the idea of $vocabularies because of extension of meta-schema. It should complement rather than replace $schema. You also need to review the conversation we had in email.

Given that different vocabularies usually require different meta-schemas to validate a schema (as a JSON instance), allowing vocabulary change in the middle of JSON instance will make validating a schema, in a standard way, impossible.

My concern has little to do with my implementation. The change you propose, that would only solve YOUR narrow problem, would destroy a fundamental principle of the JSON schema specification - that we can validate JSON schema as JSON instance against meta-schema.

You have a particular problem - to be able to ship all your schemas as a single file. Why do you need this file to be a schema? Why is it not good enough to have a collection of schemas as an array (you can even define such a collection in the spec)? Will there be any library that is able to correctly process ALL vocabularies in a single schema file?

@handrews
Contributor Author

handrews commented Oct 3, 2017

> The change you propose, that would only solve YOUR narrow problem, would destroy a fundamental principle of the JSON schema specification - that we can validate JSON schema as JSON instance against meta-schema.

Did you miss the part where I said:

> This avoids the whole "validating a schema becomes a special case" problem.

I specifically said in my proposal to leave $schema as it is in draft-06. I just said to warn of possible future deprecation (I realize "flag it" is not clear, but it should be clear enough that it does not mean "rip your implementation out and turn it sideways").

> Given that different vocabularies usually require different meta-schemas to validate a schema (as a JSON instance), allowing vocabulary change in the middle of JSON instance will make validating a schema, in a standard way, impossible.

To use a schema that declares multiple vocabularies, I would need to declare a meta-schema that supports all of those vocabularies. See again: this avoids the whole "validating a schema becomes a special case" problem.

It is the problem of the meta-schema author to come up with one that works. Implementations don't care.

> Will there be any library that is able to correctly process ALL vocabularies in a single schema file?

This is meaningless. What is "all"? Why would you even want to use them all at once? What even are they? The point of vocabularies is that anyone can make one. We're standardizing a few, but I would expect many non-standard / extended ones.

@handrews
Contributor Author

handrews commented Oct 4, 2017

@epoberezkin: PR #432 implements what I wrote up last night. I went ahead with a PR in the hopes of proving to you that I am not destroying your entire implementation philosophy. Hopefully this is more clear in the PR.

@handrews
Contributor Author

handrews commented Oct 8, 2017

We have a general agreement that $schema is correctly specified, even if the rationale is not clear in the spec. The vocabulary issue is already tracked by #314, so I am closing this issue.
