Replace non-root "id" (or "$id") with "anchor" (or "$anchor") #149

Closed
handrews opened this issue Nov 17, 2016 · 14 comments

Comments

@handrews
Contributor

While working on specifying fragment identifier rules (see #144), I've been trying to bring JSON Schema's language around base URIs and fragments in line with other media types. I've noticed that while XML and HTML both allow for changing a document's base URI, they both only allow one base URI per document. This is much less confusing than the current JSON Schema approach of allowing each schema level to set its own base URI.

What are the use cases that motivated this? I can come up with:

  • Simple identifiers for internal $ref.
  • Bundling multiple schemas with base URIs that differ by more than just the fragment together into one JSON document.

The internal reference use case could be handled by introducing an anchor (or $anchor) keyword with exactly the same functionality as "anchor" in HTML or other hypermedia systems. The anchor would be usable as a plain word fragment identifier.

I don't understand the bundling use case very well. What problem is it solving? I looked on the old wiki and could not find any discussion of usage, but I vaguely recall someone bringing up the bundling idea.

For me, I "bundle" schemas, for instance for an API, by putting them in the definitions section and addressing them with JSON Pointer fragments. Even if I need to address them across different files, that works fine. Likewise for setting a profile URI.
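As a minimal sketch of the packaging approach described above (not taken from any JSON Schema implementation), the Python below resolves a JSON Pointer fragment against a single bundled document. The `resolve_pointer` helper and the contents of `api_schema` are hypothetical names chosen for illustration.

```python
from urllib.parse import unquote


def resolve_pointer(document, fragment):
    """Resolve a JSON Pointer URI fragment such as '#/definitions/user'."""
    pointer = fragment[1:] if fragment.startswith("#") else fragment
    pointer = unquote(pointer)
    if pointer == "":
        # An empty pointer refers to the whole document.
        return document
    if not pointer.startswith("/"):
        raise ValueError("not a JSON Pointer fragment: %r" % fragment)
    node = document
    for token in pointer.split("/")[1:]:
        # Undo JSON Pointer escaping: "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# Hypothetical API bundle: every resource schema lives under "definitions".
api_schema = {
    "definitions": {
        "user": {"type": "object"},
        "order": {"properties": {"buyer": {"$ref": "#/definitions/user"}}},
    }
}

buyer_ref = api_schema["definitions"]["order"]["properties"]["buyer"]["$ref"]
resolved = resolve_pointer(api_schema, buyer_ref)
```

No "id" is involved here: the reference is addressed purely by its position in the document.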

Using id for anything other than an initial base URI at the root schema or a simple plain word anchor is horribly confusing. What does it give us? What am I missing?

@awwright
Member

XML lets you set a custom URI base using xml:base.

But I would compare what JSON Schema does to iframes, which, for rendering purposes, really are like a document inside a document.

@handrews
Contributor Author

While I agree that that is how it is functioning, why are iframes a good model for JSON Schema? Why does JSON Schema need documents inside of documents?

@handrews
Contributor Author

handrews commented Nov 18, 2016

@awwright The "iframes" comparison doesn't really hold up: iframes are special elements that make it clear that one document is being embedded in another, while our id can show up anywhere. A possible change that would bring JSON Schema more in line with that is to allow id only at a root schema and at any schema directly under "definitions".

Basically, schemas under "definitions" are more like root schemas than subschemas. Applying a schema document to an instance for validation does not necessarily involve the definition schemas, any more than it necessarily involves schemas in other documents. Schemas in definitions and other documents are only involved in validation if they are $ref'd in by the root schema or its non-definition subschemas.

So I can see an analogy of schemas in "definitions" (either the root schema's definitions or subschema definitions) as being documents-within-documents, and therefore things that could set their own URI. It's still not clear to me why this is needed- you have not addressed that at all, and it needs to be addressed. But I am trying to find some common ground for discussion here.

[EDIT: Actually I was misreading xml:base as once per document, rather than once per element, so never mind the following paragraph]
Regarding XML: Yes, I know about xml:base, that is why I wrote that XML allows you to change the base URI, once. So I'm not sure what your point was in bringing that up.

@handrews
Contributor Author

handrews commented Nov 20, 2016

After doing even more research...

I still think that JSON Schema should align with the HTML and the "iframe" usage by allowing it only once per document, where schemas in "definitions" take the place of "iframe" as explicitly nested documents. This is conceptually so much simpler than letting any arbitrary subschema be considered a document.

I've seen an objection that we can't restrict "$id" from appearing in subschemas because JSON Schema is context-free. This is not correct. Validation is context-free. But "$schema" and "$id" establish and/or rely on context already.

The closest thing to a use case that I've seen is in json-schema/json-schema#77, and even there I still do not see a use case for identifying anything other than root schemas and definitions schemas. Can anyone provide a single realistic use case that could not be handled by putting the schema that needs to be identified under "definitions"?

Anyone? Paging everyone else I can find discussing "id" in other issues who hasn't explicitly left the project: @epoberezkin? @Relequestual? @JanTvrdik? @jdesrosiers? @seagreen? @sam-at-github? @gazpachoking? @ericgj? @scranen? @pbryan?

@awwright
Member

xml:base can be indefinitely nested. The specification for xml:base shows an example with this kind of nesting.

Unfortunately I keep re-reading this and I'm not sure which problem this is trying to solve. Can you start with a problem statement please?

@handrews
Contributor Author

Unfortunately I keep re-reading this and I'm not sure which problem this is trying to solve. Can you start with a problem statement please?

Really? Over the course of the last four years there have been around six to eight issues complaining about how confusing and difficult "id" is and you can't figure out why I am trying to make it less confusing?

The problem I am trying to solve is that "id" is confusing and does not appear to have a use case for its most confusing usage.

I'll ask again: What is one use case, any use case, for using "id" with a not-just-fragment URI reference in a schema other than the root schema or schemas in "definitions"? Just give me one use case.

I just read through every single issue on "id" in this repo or the old one, including many with contributions from you, @awwright. While you give some use cases involving "definitions", there are absolutely no use cases for using "id" with arbitrary subschemas. Examples, yes, but no use cases- they are just contrived examples to illustrate various points. I also went through the old wiki. Lots of complaints, still no use cases.

Explain to me why we have this. Anyone.

@awwright
Member

Speaking with my implementor hat on, differentiating between root schemas and sub schemas makes parsing and validation a lot more problematic. It creates two types of "schemas" that we have to differentiate between in code.

If you don't like the idea of schemas using some of those features, well, then don't write schemas in that form.

The vast majority of complaints about "id" have come from one person, who made a lot of valid points about how needlessly complex the definition was, and so we fixed almost all of the problems that I can see.

If there are lingering problems, I would like to entertain that. But I need to understand the situation first. I'd like to hear that in your own words.

What are some specific shortcomings from the current I-D, and what are some solutions you can think of, if any?

@handrews
Contributor Author

@awwright, I still have a few questions after our IRC session on this, especially because @seagreen just brought up a very similar idea in #14 a few hours ago:

Another option would be to keep either the scope change or the change to how the schema can be referenced, but not both.

Going back to RFC 3986, we have:

For example, defining a base URI for later use by relative references calls for an absolute-URI syntax rule that does not allow a fragment.

and also:

If the base URI is obtained from a URI reference, then that reference must be converted to absolute form and stripped of any fragment component prior to its use as a base URI.

This means that the two current functions of "id" (changing the base and providing an identifier for plain-name fragment use) could be separated, which I think would be much less confusing. Instead of:

{
    "definitions": {
        "foo": {
            "id": "http://example.com/schemas#shortname",
            ...
        }
    }
}

we could have something like the following, where "anchor" would be processed after "base" so that this still results in a URI for the "#/definitions/foo" schema of "http://example.com/schemas#shortname":

{
    "definitions": {
        "foo": {
            "base": "http://example.com/schemas",
            "anchor": "shortname",
            ...
        }
    }
}

The rationale for this is just to clarify the intent of schema authors and make the whole thing easier to reason about. A fragment in "base" would always be ignored, as fragments cannot affect the base. Schema authors SHOULD NOT use a fragment in a base URI. The "anchor" is just a name, similar to the <a name="..."> construct in HTML (and various anchor syntaxes in other media types).
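A minimal sketch of that processing order, assuming the proposed "base" and "anchor" keywords behave as described in this comment (neither is an existing JSON Schema keyword, and `schema_uri` is a hypothetical helper): resolve "base" against the inherited base, strip any fragment as RFC 3986 requires for a base URI, then append the anchor as a plain-name fragment.

```python
from urllib.parse import urldefrag, urljoin


def schema_uri(subschema, inherited_base):
    """Compute a subschema's URI from the proposed "base" and "anchor" keywords."""
    # Step 1: "base" (if present) is resolved against the inherited base URI.
    # urljoin returns inherited_base unchanged when the reference is empty.
    base = urljoin(inherited_base, subschema.get("base", ""))
    # Step 2: a base URI never carries a fragment, so drop one if present.
    base, _ignored_fragment = urldefrag(base)
    # Step 3: "anchor" is processed after "base" and becomes the fragment.
    anchor = subschema.get("anchor")
    return base if anchor is None else base + "#" + anchor


# Hypothetical root document location, used only for illustration.
ROOT = "http://example.org/root.json"

foo = {"base": "http://example.com/schemas", "anchor": "shortname"}
foo_uri = schema_uri(foo, ROOT)

# A fragment placed in "base" has no effect on the result.
sloppy = {"base": "http://example.com/schemas#ignored", "anchor": "shortname"}
sloppy_uri = schema_uri(sloppy, ROOT)

# An anchor alone names the subschema relative to the inherited base.
local_uri = schema_uri({"anchor": "shortname"}, ROOT)
```

Under these assumptions both `foo` and `sloppy` end up with the same URI as the single "id" in the first example.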

Naming with an anchor and setting a new base are sufficiently different use cases of "id" that they are called out separately and described with different rules (i.e. "This form of "id" keyword MUST begin with a hash ("#")..."). It would be a lot simpler just to separate these two cases into separate keywords, each of which would have a description which would be much more clear.

@seagreen , is this what you had in mind? I know you asked about dropping one or the other, but would separating them be sufficient?

@awwright, what do you think about this approach? Unlike what I initially filed in this issue, it does not remove any functionality.

@handrews
Contributor Author

Just posted #155 not because I think there will necessarily be a consensus on it, just to show what it would look like. I think the clarity in the updated spec is compelling (but that might just be me :-)

@epoberezkin
Member

@awwright I think many people find the current id usage confusing, not just one person

I agree with @handrews, I don't see a real use case for base-URI change. Even the ability to use a short id inside definitions is not strictly necessary. Using $anchor (which is not a URI) would simplify things. Or we can further extend the meaning of definitions and simply state that #name is the same as #/definitions/name (in this case you don't even need to use id at all).

On the other hand, I see no problem differentiating between the root schema and everything inside it. I think it is an important distinction to have.

@scranen

scranen commented Nov 21, 2016

From a practical point of view, I don't think you can really avoid having "id" fields in arbitrary subschemas, because you can always $ref another schema from somewhere. Since as an implementor I would not want to deal with referenced schemas in another way than the schema currently being processed, I would have to decide what to do with that "id" field anyway. I therefore don't see any harm in allowing "id" fields in arbitrary subschemas -- it doesn't make it worse.

The bundling argument goes something like this. In some corporate environments, you cannot afford to have dependencies on external URLs, even though you will want to reference all kinds of externally defined JSON schemas. In such cases, you might bundle the external schemas in some local JSON file, and use the "id" field to give them the proper namespace. Of course, in such cases you could also just not bundle them together in one file, but this might unnecessarily restrict the user: they would need a parser that allows adding multiple documents to some kind of parsing context, while if you allow bundling in a single document, you just need a parser that can parse a single, self-contained document.
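A minimal sketch of that bundling scenario, assuming the external schemas are copied under "definitions" of one local file and keep their original "id" values. The `index_bundle` helper, the bundle location, and the schema URIs are all hypothetical; the point is only that references to the external URIs can be answered from the local file without any network access.

```python
from urllib.parse import urldefrag, urljoin


def index_bundle(bundle, bundle_uri):
    """Map each bundled schema's absolute URI to the schema object."""
    index = {bundle_uri: bundle}
    for schema in bundle.get("definitions", {}).values():
        if isinstance(schema, dict) and isinstance(schema.get("id"), str):
            # Resolve the id against the bundle's own URI, then drop any
            # fragment so the key is usable as a base URI.
            uri, _fragment = urldefrag(urljoin(bundle_uri, schema["id"]))
            index[uri] = schema
    return index


# Hypothetical local bundle of two externally defined schemas.
BUNDLE_URI = "file:///opt/schemas/bundle.json"
bundle = {
    "definitions": {
        "address": {
            "id": "http://example.com/schemas/address.json",
            "type": "object",
        },
        "money": {
            "id": "http://example.net/types/money.json",
            "type": "number",
        },
    }
}

index = index_bundle(bundle, BUNDLE_URI)

# A "$ref" to the external URI is now served from the local file.
address = index["http://example.com/schemas/address.json"]
```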

One issue I do have with the "id" field is that it is not clear to me when a parser should be able to resolve a certain URL based on an encountered "id". For each subdocument of a JSON schema, you have three options: (1) it is required by the spec to be another JSON schema, (2) it is required by the spec that it is not a JSON schema, or (3) the spec doesn't say anything about it. We cannot expect a parser to treat the part the spec does not say anything about (case 3) in any semantic way. In particular, if there are JSON schemas defined as subdocuments of some property "foo" of the root schema, the parser should not interpret any "id" property in such documents as a schema id. In case 1, on the other hand, it is unclear if the parser must always be able to resolve any ids found in such subschemas.
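One way to picture the three cases is a walker that only descends into locations known to hold schemas (case 1) and skips everything else (cases 2 and 3). This is a sketch under stated assumptions, not how any particular implementation works: the keyword sets below are a deliberately abbreviated, illustrative selection, and `collect_ids` is a hypothetical helper.

```python
# Keywords whose value is a schema (or, for "items", possibly a list of schemas).
SCHEMA_VALUED = {"not", "items", "additionalProperties"}
# Keywords whose value maps arbitrary names to schemas.
SCHEMA_MAP_VALUED = {"definitions", "properties", "patternProperties"}
# Keywords whose value is a list of schemas.
SCHEMA_LIST_VALUED = {"allOf", "anyOf", "oneOf"}


def collect_ids(schema, found=None):
    """Collect "id" values only from locations known to be schemas."""
    if found is None:
        found = []
    if not isinstance(schema, dict):
        return found
    if isinstance(schema.get("id"), str):
        found.append(schema["id"])
    for key, value in schema.items():
        if key in SCHEMA_VALUED:
            for sub in value if isinstance(value, list) else [value]:
                collect_ids(sub, found)
        elif key in SCHEMA_MAP_VALUED and isinstance(value, dict):
            # The map itself is not a schema, so a property *named* "id"
            # is never mistaken for the "id" keyword.
            for sub in value.values():
                collect_ids(sub, found)
        elif key in SCHEMA_LIST_VALUED and isinstance(value, list):
            for sub in value:
                collect_ids(sub, found)
        # Any other keyword (e.g. "enum", or an unknown "foo") is not descended into.
    return found


root = {
    "id": "http://example.com/root.json",
    "properties": {
        "id": {"type": "string"},
        "a": {"id": "#a"},
    },
    "foo": {"id": "#unknown-location"},
    "enum": [{"id": "#just-data"}],
}

ids = collect_ids(root)
```

The open question raised above remains: whether a parser is obliged to make every id found this way resolvable.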

@handrews
Contributor Author

@epoberezkin said:

I agree with @handrews, I don't see a real use case for base-URI change.

@awwright has since convinced me of the utility of identifying subschemas with totally different schemes such as urn:uuid. That makes more sense than shifting the HTTP URI base around, and (I think) should be highlighted more in the spec.

Or we can further extend the meaning of definitions and simply state that #name is the same as #/definitions/name (in this case you don't even need to use id at all).

That wouldn't work well in the case of nested definitions, which I use a fair amount. To package up all the schemas for an API, I make the root schema the entry point resource, and the other resources show up in the top-level definitions. But complex resources might have their own definitions to avoid polluting the API definitions namespace.
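A small sketch of why the "#name means #/definitions/name" shorthand gets awkward with nested definitions. The `api` document and the `definition_pointers` helper are hypothetical, and names are assumed not to need JSON Pointer escaping.

```python
from collections import defaultdict


def definition_pointers(schema, prefix=""):
    """Yield (name, JSON Pointer) for each "definitions" entry, including nested ones."""
    if not isinstance(schema, dict):
        return
    for name, sub in schema.get("definitions", {}).items():
        pointer = prefix + "/definitions/" + name
        yield name, pointer
        # Resources may carry their own private "definitions".
        yield from definition_pointers(sub, pointer)


# Hypothetical API package: two resources, each with its own nested "address".
api = {
    "definitions": {
        "user": {"definitions": {"address": {"type": "object"}}},
        "order": {"definitions": {"address": {"type": "string"}}},
    }
}

by_name = defaultdict(list)
for name, pointer in definition_pointers(api):
    by_name[name].append(pointer)
```

Here `by_name["address"]` holds two different pointers, so a bare "#address" would either be ambiguous or could only ever reach top-level definitions.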

@handrews
Contributor Author

@scranen wrote:

In particular, if there are JSON schemas defined as subdocuments of some property "foo" of the root schema, the parser should not interpret any "id" property in such documents as a schema id. In case 1 on the other hand, it is unclear if the parser must always be able to resolve any ids found in such subschemas.

When I was trying to get rid of "id" for subschemas, this is the sort of thing that I felt was confusing enough to be avoided- you have articulated it much more clearly than I did :-)

BTW, I do the packaging thing a lot, I just don't also make each schema available separately as well. Which is why I don't need "id" for my packaging, but I do see your point. That's one reason why (at some point in this increasingly complicated issue) I was saying that "definitions" schemas could have ids but other types of subschemas (e.g. object property schemas) should not.

Thinking through the object property scenario, though, while I would say that a schema author SHOULD NOT use an id (or "$base") when such a subschema is present inline, it's quite plausible for it to have an id (or "$base") if it is "$ref"'d in. Since I think we should not have a situation where a "$ref"'d schema has different rules than an inline subschema, that means that "id" or "$base" needs to be legal in property subschemas and other similar places. I think.

@handrews
Contributor Author

See the 2nd-to-last comment in PR #155 for why I've dropped this.

4 participants