Handling the base URI while evaluating against an instance #868

handrews · 2020-03-03T04:00:11Z

This issue is about the architectural principles that:

You can resolve all uses of the base URI as a pre-processing step
Once you do that, evaluating a schema object is the same regardless of its parent

Technically, draft 2019-09 violates that, although in practice we can wiggle around it. But we should decide whether these principles hold, and make sure that our spec meets our own principles! Options include:

We change how we describe the behavior of a few keywords, and keep both principles (I'm leaning towards this as explained at the end)
We decide that you can resolve $id and $ref as a pre-processing step, but in general you might need to keep track of base URIs during evaluation with an instance
We decide not to promote the idea of a pre-processing step at all (although you do need one to at least discover all of the static URIs that can be reference targets, so you never entirely get rid of it)

$id and $ref (the only draft-07 keywords that rely on the base URI) can be pre-processed during schema loading by simply setting their values to the full URIs at the same time that you find the various schemas and cache them in some sort of URI-lookup thingy.
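
As a rough illustration of that load-time pass, here is a minimal Python sketch. The function name `preprocess` and the `cache` dictionary are hypothetical, and the walk naively treats every nested object as a schema (so it would also descend into `enum` or `const` values), which a real implementation would not do. It also assumes that an `$id` in an object sets the base URI before a sibling `$ref` is resolved.

```python
from urllib.parse import urljoin, urldefrag


def preprocess(schema, base_uri, cache):
    """Rewrite $id and $ref to full URIs in place and register each resource.

    `cache` maps an absolute URI (without fragment) to the schema object
    that declared it via $id. All names here are illustrative assumptions.
    """
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            # $id changes the base URI for this object and everything below it.
            base_uri = urljoin(base_uri, schema["$id"])
            schema["$id"] = base_uri
            cache[urldefrag(base_uri).url] = schema
        if isinstance(schema.get("$ref"), str):
            # After this, $ref no longer depends on any parent object.
            schema["$ref"] = urljoin(base_uri, schema["$ref"])
        for key, value in schema.items():
            if key not in ("$id", "$ref"):
                preprocess(value, base_uri, cache)
    elif isinstance(schema, list):
        for item in schema:
            preprocess(item, base_uri, cache)
    return cache
```

Once this has run, evaluating any subschema needs only the cache, not the chain of parents it was loaded from.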

However, in the latest draft, $anchor, $recursiveAnchor, and $recursiveRef rely on the base URI as well.

For $anchor, since it only adds a URI to associate with that schema in your cache, there's no further processing with the base URI to do there. So it's not a problem in practice.

But $recursiveAnchor and $recursiveRef are problems. At least in theory. In practice, because we restrict $recursiveAnchor to resource root schemas and $recursiveRef to only have a value of "#", you can handle these without needing to know the base URI. That is, in fact, why those restrictions exist.

So we can kind-of get away with ignoring this problem for now, and we could change how we talk about these keywords to remove the base URI stuff.

In general, though, these keywords as we currently describe them work by dynamically re-calculating the base URI of the URI-reference in $recursiveRef depending on $recursiveAnchor. This was done so that we could lift those restrictions on the value of $recursiveRef, or replace these keywords with a more general $dynamicAnchor and $dynamicRef, since the name "recursive" wouldn't be entirely accurate anymore.

This works as follows:

Resolve the URI-reference in $recursiveRef just as you would for $ref to get the initial target
If the initial target contains "$recursiveAnchor": true, walk back up the dynamic scope to find the outermost scope that also has "$recursiveAnchor": true, to get the intermediate target
Re-resolve the URI-reference in $recursiveRef against the base URI from the intermediate target, to produce the final target
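
The three steps above can be sketched in Python roughly as follows. This is an illustration under simplifying assumptions, not the specification's algorithm: the function name, the `resources` lookup table, and the representation of the dynamic scope as a list of base URIs (outermost first) are all hypothetical, and fragments are dropped so that only resource roots can be targets.

```python
from urllib.parse import urljoin, urldefrag


def resolve_recursive_ref(ref, current_base, dynamic_scope, resources):
    """Return the absolute URI of the final target of a $recursiveRef.

    ref           -- the original, unresolved URI-reference
    current_base  -- base URI of the schema object containing $recursiveRef
    dynamic_scope -- base URIs of the resources entered so far, outermost first
    resources     -- maps absolute URI (no fragment) to a resource root schema
    """
    # Step 1: resolve exactly as for $ref to get the initial target.
    initial_uri = urldefrag(urljoin(current_base, ref)).url
    if resources[initial_uri].get("$recursiveAnchor") is not True:
        return initial_uri

    # Step 2: find the outermost dynamic scope that also has
    # "$recursiveAnchor": true (the intermediate target).
    intermediate_base = next(
        (base for base in dynamic_scope
         if resources.get(base, {}).get("$recursiveAnchor") is True),
        initial_uri,
    )

    # Step 3: re-resolve the ORIGINAL reference against the intermediate
    # target's base URI to get the final target.
    return urldefrag(urljoin(intermediate_base, ref)).url
```

Note that step 3 needs the unresolved `ref`, which is why a `$recursiveRef` described this way cannot simply be rewritten to a full URI at load time.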

The nice thing about this is that it works with any URI-reference in $recursiveRef (although some sorts of URI-references don't make much sense, which is the other reason we restricted it). And in practice, the "#" restriction means that the final target is always the same as the intermediate target, so you never actually need to re-resolve the URI-reference. Once you find the intermediate target, you're done.

But in the general case, where the intermediate and final targets could be different, you need to know, at runtime, the base URI for both the initial and intermediate targets. You can't even resolve the $recursiveRef to a full URI because you need to know which part was the original reference in order to re-resolve it against the intermediate target's base URI.

If we want to keep the ability to preprocess the base URI to the point where we never need to worry about parent schema objects, the best way to do that would be to reserve a keyword ($base? $_base?) where an implementation could safely store the base URI during preprocessing if there's a keyword in that object that would need it. Then, when $recursiveRef or $recursiveAnchor is encountered, an implementation could just look at that reserved location.
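
A minimal Python sketch of that idea, assuming the reserved name is `$base` (one of the two names floated above; nothing reserves it today). The function `stash_base` is hypothetical, and like any naive walk it would be confused by a property that happens to be named `$recursiveRef`.

```python
from urllib.parse import urljoin

RESERVED = "$base"  # hypothetical reserved keyword, not part of any draft
NEEDS_BASE = ("$recursiveRef", "$recursiveAnchor")


def stash_base(schema, base_uri=""):
    """Record the in-scope base URI on every object that will need it later."""
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            base_uri = urljoin(base_uri, schema["$id"])
        if any(keyword in schema for keyword in NEEDS_BASE):
            # Stored locally, so evaluation never has to consult a parent.
            schema[RESERVED] = base_uri
        for key, value in list(schema.items()):
            if key != RESERVED:
                stash_base(value, base_uri)
    elif isinstance(schema, list):
        for item in schema:
            stash_base(item, base_uri)
    return schema
```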

There are subtleties there like what to do if someone actually does try to use it as a keyword, etc. But that's what comes to mind for me.

Thoughts? At the moment, I'm leaning towards changing how we talk about $recursiveAnchor and $recursiveRef and saying something like "these keywords adjust the reference target within the dynamic scope" instead of "these keywords change the base URI of the reference." That gets a bit messy with another architectural principle of "always identify things with URIs", but I feel like that is more easily finessed. Besides, we at least start with a URI, and we can figure out the URI of the final target if we wanted to.

Relequestual · 2020-03-06T12:55:21Z

I'm not sure I fully understand all of the above, but after a first pass I have a few comments...

I'm not for adding more complexity to keywords which do referencing without a clear use case.

If we decide not to modify their behaviour because no use case has been presented, and a use case is presented later, it can form part of a later draft.

I hold these opinions for a number of good reasons which I won't go into, but they include the lack of educational material (which we will get to).

"these keywords adjust the reference target within the dynamic scope" instead of "these keywords change the base URI of the reference."

It takes me a little while to spin up to properly understanding dynamic scope.

The suggested re-phrase doesn't change what actually happens, right?

As an aside... what led to the creation of this issue? I can't see an explained use case so I can only assume there's something else you're doing that led you to think about this?

handrews · 2020-03-07T18:28:42Z

@Relequestual I'm trying to write up the description of how, in general, keywords are evaluated. Specifically, what needs to be available in an interface for a vocabulary plugin to be able to implement a reasonable set of keywords? This is important to avoid 3rd parties defining keywords like OAS's discriminator that need to know things outside of our intended processing model. The way to avoid that is to define what is part of our processing model.

BTW there is a section on dynamic scope, please feel free to suggest ways to make it more concise or clear.

Regarding the general keyword interface:

An annotation like title just needs to know the keyword name and value, and the instance and schema locations
An assertion like type needs the above plus the instance value
An applicator needs to be able to pass subschemas and relative instance locations back to the implementation for evaluation, before performing its own logic to combine results
A reference like $ref needs to know the base URI, but only needs to know it at load time, not at evaluation time
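
One way to picture the list above is as a single context object handed to each keyword. The Python sketch below is only an illustration: `KeywordContext`, its field names, and the three handler functions are hypothetical, and the `type` check covers just a few JSON types. The base URI is deliberately absent, since a reference like $ref would consume it at load time.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class KeywordContext:
    keyword: str                 # needed by every keyword
    value: Any
    schema_location: str
    instance_location: str
    instance: Any = None         # additionally needed by assertions
    # Additionally needed by applicators: a callback supplied by the
    # implementation that evaluates (subschema, relative instance location).
    evaluate: Optional[Callable[[Any, str], bool]] = None


def annotate(ctx):
    """Annotation (e.g. title): name, value, and the two locations."""
    return {
        "keyword": ctx.keyword,
        "value": ctx.value,
        "schemaLocation": ctx.schema_location,
        "instanceLocation": ctx.instance_location,
    }


JSON_TYPES = {"string": str, "array": list, "object": dict, "null": type(None)}


def assert_type(ctx):
    """Assertion (e.g. type): the above plus the instance value."""
    return isinstance(ctx.instance, JSON_TYPES[ctx.value])


def apply_all_of(ctx):
    """Applicator (e.g. allOf): hand subschemas back, then combine results."""
    return all(ctx.evaluate(subschema, "") for subschema in ctx.value)
```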

That last point is important because base URI information flows down from parent to child, but everything else can be handled with subschemas only. If the $recursive* keywords are described as:

"these keywords change the base URI of the reference."

then you also need to keep track of the base URI while evaluating against an instance, even if you pre-processed it out during schema load for all $id and $ref occurrences. You can't preprocess it out for $recursive*.

So we can either require that plugin interfaces support that information, or we can get a bit more hand-wavey about how it works to produce the same effect without needing to keep track of the base URI. An implementation already keeps track of the dynamic scope, that's how you walk the schema+instance at all.
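
To illustrate the second option under the current "#" restriction, here is a minimal Python sketch that picks the target of a `$recursiveRef` purely from the dynamic scope, with no base URI involved. The function name and the representation of the dynamic scope (a list of resource root schema objects, outermost first, ending with the resource that contains the `$recursiveRef`) are assumptions made for the example.

```python
def recursive_target(dynamic_scope):
    """Return the schema object that "$recursiveRef": "#" should evaluate."""
    # With "#", the initial target is the root of the current resource.
    initial = dynamic_scope[-1]
    if initial.get("$recursiveAnchor") is not True:
        return initial
    # Otherwise adjust the target to the outermost resource root in the
    # dynamic scope that also has "$recursiveAnchor": true.
    for root in dynamic_scope:
        if root.get("$recursiveAnchor") is True:
            return root
    return initial
```

Because the result is a schema object the evaluator already holds, a plugin interface built this way would not need to expose the base URI at evaluation time.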

handrews · 2020-05-01T21:14:29Z

OK, to further confuse things, we (by which I mean I) don't even use $recursiveRef correctly in our meta-schemas. The links.json schema for LDOs, as used in Hyper-Schema, has this in a bunch of places:

"$recursiveRef": "https://json-schema.org/draft/2019-09/hyper-schema"

That is, um... not "#". Also, it should really be referencing the hyper-schema vocabulary meta-schema, not the hyper-schema dialect meta-schema, but that's a separate issue.

I think I have a solution for this, but I'm going to make it its own issue. I just wanted to note here that there's a real problem because the above comments state that this is all working for now because of our "#" restriction, but we're not actually following our own restriction!

🤦

handrews · 2020-05-18T16:56:00Z

The approach in #909 solves this problem, and I'm writing a PR for it. The alternatives in #907 or #911 would also solve it. I'm assigning this to myself since I'm already on the PR.

handrews added the core label Mar 3, 2020

handrews added this to the draft-08-patch1 milestone Mar 3, 2020

handrews assigned awwright, Relequestual, gregsdennis and johandorland Mar 3, 2020

handrews mentioned this issue Apr 25, 2020

First pass a list of principles. #856

Closed

handrews assigned handrews and unassigned awwright, Relequestual, gregsdennis and johandorland May 18, 2020

handrews mentioned this issue May 20, 2020

Rename $recursive* to $dynamic*, make it work with normal anchors / plain name fragments instead of base URI switching #930

Merged

handrews closed this as completed in #930 Jul 17, 2020
