Handling the base URI while evaluating against an instance #868

Closed
handrews opened this issue Mar 3, 2020 · 4 comments · Fixed by #930

@handrews
Contributor

handrews commented Mar 3, 2020

This issue is about the architectural principles that:

  • You can resolve all uses of the base URI as a pre-processing step
  • Once you do that, evaluating a schema object is the same regardless of its parent

Technically, draft 2019-09 violates that, although in practice we can wiggle around it. But we should decide whether these principles hold, and make sure that our spec meets our own principles! Options include:

  • We change how we describe the behavior of a few keywords, and keep both principles (I'm leaning towards this as explained at the end)
  • We decide that you can resolve $id and $ref as a pre-processing step, but in general you might need to keep track of base URIs during evaluation with an instance
  • We decide not to promote the idea of a pre-processing step at all (although you do need one to at least discover all of the static URIs that can be reference targets, so you never entirely get rid of it)

$id and $ref (the only draft-07 keywords to rely on the base URI) can be pre-processed during schema loading by simply setting their values to the full URIs at the same time that you find the various schemas and cache them in some sort of URI-lookup thingy.
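As a rough illustration of that pre-processing step, here is a minimal Python sketch. The function name `preprocess`, the plain-dict cache, and the example URIs are all hypothetical, and the walk is deliberately naive: it descends into every value (including non-schema ones such as `enum`) and assumes `$id` values carry no fragment.

```python
from urllib.parse import urljoin


def preprocess(schema, base_uri, cache):
    """Rewrite $id and $ref to absolute URIs and index schemas by $id."""
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            # A new $id resolves against the inherited base and becomes the new base.
            base_uri = urljoin(base_uri, schema["$id"])
            schema["$id"] = base_uri
            cache[base_uri] = schema
        if isinstance(schema.get("$ref"), str):
            # After this, the reference no longer depends on any parent schema.
            schema["$ref"] = urljoin(base_uri, schema["$ref"])
        for key, value in schema.items():
            if key not in ("$id", "$ref"):
                preprocess(value, base_uri, cache)
    elif isinstance(schema, list):
        for item in schema:
            preprocess(item, base_uri, cache)
    return cache


# Hypothetical schema used only to exercise the sketch.
root = {
    "$id": "https://example.com/schemas/root.json",
    "properties": {
        "a": {"$ref": "defs.json#/definitions/x"},
        "b": {"$id": "nested/b.json", "items": {"$ref": "#/definitions/y"}},
    },
}
cache = preprocess(root, "https://example.com/", {})
```

Once this has run, every `$ref` holds a full URI, so a subschema can be evaluated without consulting its parent for a base URI.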

However, in the latest draft, $anchor, $recursiveAnchor, and $recursiveRef rely on the base URI as well.

For $anchor, since it only adds a URI to associate with that schema in your cache, there's no further processing with the base URI to do there. So it's not a problem in practice.

But $recursiveAnchor and $recursiveRef are problems. At least in theory. In practice, because we restrict $recursiveAnchor to resource root schemas and $recursiveRef to only have a value of "#", you can handle these without needing to know the base URI. That is, in fact, why those restrictions exist.

So we can kind-of get away with ignoring this problem for now, and we could change how we talk about these keywords to remove the base URI stuff.


In general, though, these keywords as we currently describe them work by dynamically re-calculating the base URI of the URI-reference in $recursiveRef depending on $recursiveAnchor. This was done so that we could later lift those restrictions on the value of $recursiveRef, or replace these keywords with a more general $dynamicAnchor and $dynamicRef, since the name "recursive" wouldn't be entirely accurate anymore.

This works as follows:

  • Resolve the URI-reference in $recursiveRef just as you would for $ref to get the initial target
  • If the initial target contains "$recursiveAnchor": true, walk back up the dynamic scope to find the outermost scope that also has "$recursiveAnchor": true, to get the intermediate target
  • Re-resolve the URI-reference in $recursiveRef against the base URI from the intermediate target, to produce the final target
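The three steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the dynamic scope is modelled as a list of `(base_uri, resource_root)` pairs ordered outermost first, `lookup` is a hypothetical dict from absolute URI to schema, and the helper names are invented here. The step 3 base is taken from the intermediate target found in step 2.

```python
from urllib.parse import urldefrag, urljoin


def resolve(base_uri, reference):
    """Resolve a URI-reference, normalising an empty fragment ("#") away."""
    url, fragment = urldefrag(urljoin(base_uri, reference))
    return url + "#" + fragment if fragment else url


def resolve_recursive_ref(reference, current_base, dynamic_scope, lookup):
    """Return the URI of the final target of a $recursiveRef."""
    # Step 1: resolve exactly as $ref would, giving the initial target.
    initial_uri = resolve(current_base, reference)
    if lookup[initial_uri].get("$recursiveAnchor") is not True:
        return initial_uri
    # Step 2: find the outermost dynamic scope that also has
    # "$recursiveAnchor": true (the intermediate target).
    for base_uri, resource_root in dynamic_scope:
        if resource_root.get("$recursiveAnchor") is True:
            # Step 3: re-resolve the original reference against that base URI.
            return resolve(base_uri, reference)
    return initial_uri


# Hypothetical schemas: a strict tree that extends a plain tree.
tree = {"$id": "https://example.com/tree", "$recursiveAnchor": True}
strict = {"$id": "https://example.com/strict-tree", "$recursiveAnchor": True}
lookup = {tree["$id"]: tree, strict["$id"]: strict}
scope = [(strict["$id"], strict), (tree["$id"], tree)]
final = resolve_recursive_ref("#", tree["$id"], scope, lookup)
```

Note that the sketch takes the raw reference and the base URIs of the schemas in scope as inputs at evaluation time, rather than a single pre-resolved URI.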

The nice thing about this is that it works with any URI-reference in $recursiveRef (although some sorts of URI-references don't make much sense, which is the other reason we restricted it). And in practice, the "#" restriction means that the final target is always the same as the intermediate target, so you never actually need to re-resolve the URI reference. Once you find the intermediate target, you're done.

But in the general case, where the intermediate and final targets could be different, you need to know, at runtime, the base URI for both the initial and intermediate targets. You can't even resolve the $recursiveRef to a full URI because you need to know which part was the original reference in order to re-resolve it against the intermediate target's base URI.


If we want to keep the ability to preprocess the base URI to the point where we never need to worry about parent schema objects, the best way to do that would be to reserve a keyword ($base? $_base?) where an implementation could safely store the base URI during preprocessing if there's a keyword in that object that would need it. Then, when $recursiveRef or $recursiveAnchor is encountered, an implementation could just look at that reserved location.
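To make the reserved-keyword idea concrete, here is a small Python sketch. It assumes the name `$_base` (one of the candidates floated above, not anything defined by a draft), and the function name `annotate_base` is hypothetical. The walk is naive: it treats every nested object as a schema, so a property that happened to be named `$recursiveRef` would be misdetected.

```python
from urllib.parse import urljoin

RESERVED = "$_base"  # hypothetical reserved keyword, not part of any draft
NEEDS_BASE = ("$recursiveRef", "$recursiveAnchor")


def annotate_base(schema, base_uri):
    """Store the in-scope base URI on objects whose keywords need it later."""
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            base_uri = urljoin(base_uri, schema["$id"])
        if any(keyword in schema for keyword in NEEDS_BASE):
            schema[RESERVED] = base_uri
        # Copy the values first because the dict may just have gained a key.
        for value in list(schema.values()):
            annotate_base(value, base_uri)
    elif isinstance(schema, list):
        for item in schema:
            annotate_base(item, base_uri)
    return schema


# Hypothetical schema used only to exercise the sketch.
tree = annotate_base(
    {
        "$id": "https://example.com/tree",
        "$recursiveAnchor": True,
        "properties": {"children": {"items": {"$recursiveRef": "#"}}},
    },
    "https://example.com/",
)
```

At evaluation time, an implementation following this approach would read `$_base` from the schema object itself instead of tracking base URIs down from the parent.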

There are subtleties there like what to do if someone actually does try to use it as a keyword, etc. But that's what comes to mind for me.

Thoughts? At the moment, I'm leaning towards changing how we talk about $recursiveAnchor and $recursiveRef and saying something like "these keywords adjust the reference target within the dynamic scope" instead of "these keywords change the base URI of the reference." That gets a bit messy with another architectural principle of "always identify things with URIs", but I feel like that is more easily finessed. Besides, we at least start with a URI, and we can figure out the URI of the final target if we wanted to.

@Relequestual
Member

I'm not sure I fully understand all of the above, but after a first pass I have a few comments...

I'm not for adding more complexity to keywords which do referencing without a clear use case.

If we decide not to modify their behaviour because no use case has been presented, and a use case is presented later, it can form part of a later draft.

I hold these opinions for a number of good reasons which I won't go into, including the lack of educational material (which we will get to).

"these keywords adjust the reference target within the dynamic scope" instead of "these keywords change the base URI of the reference."

It takes me a little while to spin up to properly understanding dynamic scope.

The suggested re-phrase doesn't change what actually happens, right?

As an aside... what led to the creation of this issue? I can't see an explained use case so I can only assume there's something else you're doing that led you to think about this?

@handrews
Contributor Author

handrews commented Mar 7, 2020

@Relequestual I'm trying to write up the description of how, in general, keywords are evaluated. Specifically, what needs to be available in an interface for a vocabulary plugin to be able to implement a reasonable set of keywords? This is important to avoid 3rd parties defining keywords like OAS's discriminator that need to know things outside of our intended processing model. The way to avoid that is to define what is part of our processing model.

BTW there is a section on dynamic scope, please feel free to suggest ways to make it more concise or clear.

Regarding the general keyword interface:

  • An annotation like title just needs to know the keyword name and value, and the instance and schema locations
  • An assertion like type needs the above plus the instance value
  • An applicator needs to be able to pass subschemas and relative instance locations back to the implementation for evaluation, before performing its own logic to combine results
  • A reference like $ref needs to know the base URI, but only needs to know it at load time, not at evaluation time

That last point is important because base URI information flows down from parent to child, but everything else can be handled with subschemas only. If the $recursive* keywords are described as:

these keywords change the base URI of the reference.

then you also need to keep track of the base URI while evaluating against an instance, even if you pre-processed it out during schema load for all $id and $ref occurrences. You can't preprocess it out for $recursive*.

So we can either require that plugin interfaces support that information, or we can get a bit more hand-wavey about how it works to produce the same effect without needing to keep track of the base URI. An implementation already keeps track of the dynamic scope; that's how you walk the schema+instance at all.
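The keyword categories listed earlier in this comment could be sketched as a plugin interface along these lines. This is a hedged illustration only: `KeywordContext`, the function names, and the `evaluate` callback signature are all hypothetical, the type check covers just three JSON types, and JSON Pointer escaping is ignored.

```python
from dataclasses import dataclass
from typing import Any, Callable
from urllib.parse import urljoin


@dataclass(frozen=True)
class KeywordContext:
    """What every keyword gets: its name, value, and both locations."""
    keyword: str
    value: Any
    instance_location: str
    schema_location: str


def title_annotation(ctx: KeywordContext) -> Any:
    # An annotation needs nothing beyond the context itself.
    return ctx.value


JSON_TYPES = {"string": str, "array": list, "object": dict}  # partial on purpose


def type_assertion(ctx: KeywordContext, instance: Any) -> bool:
    # An assertion needs the context plus the instance value.
    return isinstance(instance, JSON_TYPES[ctx.value])


def properties_applicator(
    ctx: KeywordContext, instance: Any, evaluate: Callable[[Any, str], bool]
) -> bool:
    # An applicator hands (subschema, relative instance location) pairs back
    # to the implementation, then combines the results with its own logic.
    if not isinstance(instance, dict):
        return True
    results = [
        evaluate(subschema, "/" + name)
        for name, subschema in ctx.value.items()
        if name in instance
    ]
    return all(results)


def load_time_ref(value: str, base_uri: str) -> str:
    # A reference needs the base URI, but only here, at load time.
    return urljoin(base_uri, value)
```

Only `load_time_ref` ever sees a base URI in this sketch; none of the evaluation-time functions take one, which is the property a "change the base URI" description of the $recursive* keywords would break.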

@handrews
Contributor Author

handrews commented May 1, 2020

OK, to further confuse things, we (by which I mean I) don't even use $recursiveRef correctly in our meta-schemas. The links.json schema for LDOs, as used in Hyper-Schema, has this in a bunch of places:

"$recursiveRef": "https://json-schema.org/draft/2019-09/hyper-schema"

That is, um... not "#". Also, it should really be referencing the hyper-schema vocabulary meta-schema, not the hyper-schema dialect meta-schema, but that's a separate issue.

I think I have a solution for this, but I'm going to make it its own issue. I just wanted to note here that there's a real problem because the above comments state that this is all working for now because of our "#" restriction, but we're not actually following our own restriction!

🤦

@handrews
Contributor Author

The approach in #909 solves this problem, and I'm writing a PR for it. The alternatives in #907 or #911 would also solve it. I'm assigning this to myself since I'm already on the PR.
