Description
Looking at various principles that we've gotten better at articulating, and leveraging vocabularies, I think we can offer some guidance in this area, which is the most controversial remaining feature request.
`$data` itself is not what we want:
- It, like `$merge`, is a process separate from the rest of JSON Schema evaluation, which can be dropped into pretty much anywhere. This makes the semantics and cost of all keywords more complex.
- Because it is a global behavior, it would have to go into the Core vocabulary, significantly increasing the burden of implementing Core.
However, we do have Hyper-Schema's `links` and `base`, both of which can incorporate instance data via templating. The mapping of the instance into the template can pull in data from anywhere and everywhere, which (as @awwright noted in "Scope of JSON Schema Validation") is a problem for streaming implementations.
So why are `links` and `base` OK, while `$data` is not? In part, it's because they are essentially annotations, which haven't been formally processed until now. But also, they are specific-use keywords. Now that we have `$vocabulary`, a streaming validator could refuse to parse a JSON Hyper-Schema document. That would not be possible with `$data` in the core, because everything MUST support Core.
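To make the `$vocabulary` point concrete, here is a rough sketch of how a streaming validator might refuse a Hyper-Schema document. The function name, the exception class, and the `SUPPORTED` set are hypothetical, and the vocabulary URIs are only illustrative of the draft 2019-09 naming; this is not how any particular implementation does it.

```python
class UnsupportedVocabulary(Exception):
    """Raised when a meta-schema requires a vocabulary we do not implement."""


# Vocabularies this hypothetical streaming validator implements
# (illustrative URIs; note that Hyper-Schema is deliberately absent).
SUPPORTED = {
    "https://json-schema.org/draft/2019-09/vocab/core",
    "https://json-schema.org/draft/2019-09/vocab/validation",
}


def check_vocabularies(metaschema, supported=SUPPORTED):
    """Return the supported vocabularies in use, refusing unsupported required ones."""
    declared = metaschema.get("$vocabulary", {})
    for uri, required in declared.items():
        # A value of true means the vocabulary is required to process the schema.
        if required and uri not in supported:
            raise UnsupportedVocabulary(uri)
    # Optional (false) vocabularies that we do not know are simply ignored.
    return {uri for uri in declared if uri in supported}
```

The refusal happens before any keyword is evaluated, which is exactly what would be impossible if the cross-instance behavior lived in Core.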
I think we should note that extension keywords MAY produce annotations or evaluate assertions based on data from elsewhere in the instance, with the following restrictions:
- Only URI-references, or [Relative] JSON Pointers may be used to address data (this is already needed for existing features)
- JSON Pointer fragment syntax is required over the instances, as the instance is processed in the data model and is no longer encoded in any particular media type; JSON Schema implementations are not responsible for understanding arbitrary media types' fragment syntaxes or semantics
- JSON Schema does not handle fetching any URIs. Just as with `$ref`, a conforming implementation is only responsible for resolving URIs of which it has been informed
None of this would really change the requirements for implementing Core, Validation, or Hyper-Schema. It's just a description of how Hyper-Schema already works.
Keywords that load data still need to otherwise fit in the keyword taxonomy if a generic JSON Schema processor is expected to handle them. You can't add `$data` as an extension vocabulary, because it is not a schema keyword. It's a kinda-sorta just-in-time preprocessing step that makes the entire schema document into a template.
On the other hand, `links` (specifically `href`) and `base` are specific template keywords. The cost of template resolution is limited to them. 3rd parties could write other data-templated keywords, such as a `greaterThanData` keyword which takes a JSON Pointer (relative or absolute form) and asserts the condition in the name. Or they could write an `isMemberOf` keyword that takes a URI that identifies some dynamic set of which the data is expected to be a member.
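As a rough illustration of what a `greaterThanData` assertion could look like, here is a minimal sketch. Everything in it is an assumption on my part: the function names, passing the current location as a list of keys, and treating non-numeric values as "keyword does not apply" (mirroring how `minimum` ignores non-numbers). The relative form only supports the "go up N levels, then follow a pointer" case.

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against already-parsed instance data."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = doc
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not token.isdigit():
                raise KeyError(token)
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(token)
    return current


def resolve_data_pointer(root, location, pointer):
    """Resolve an absolute or relative pointer.

    `location` is the list of keys/indexes from the instance root to the
    value being validated (a hypothetical representation).
    """
    if pointer == "" or pointer.startswith("/"):
        return resolve_pointer(root, pointer)
    # Relative form: a non-negative integer prefix, then a JSON Pointer.
    i = 0
    while i < len(pointer) and pointer[i] in "0123456789":
        i += 1
    if i == 0:
        raise ValueError(f"not a Relative JSON Pointer: {pointer!r}")
    up, rest = int(pointer[:i]), pointer[i:]
    if up > len(location):
        raise ValueError("relative pointer climbs above the instance root")
    base = root
    for key in location[: len(location) - up]:
        base = base[key]
    # The "#" (key lookup) form is not supported here and raises ValueError.
    return resolve_pointer(base, rest)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def greater_than_data(root, location, pointer):
    """Assert that the value at `location` is greater than the referenced data."""
    value = root
    for key in location:
        value = value[key]
    other = resolve_data_pointer(root, location, pointer)
    if not (_is_number(value) and _is_number(other)):
        return True  # assumption: the keyword does not apply to non-numbers
    return value > other
```

For example, with the instance `{"min": 3, "max": 10}`, a schema for `max` could use either `"/min"` or `"1/min"` to say the same thing. Note that the pointer lookup is the only place the rest of the instance is touched, so the cost stays confined to this one keyword.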
Using a URI, as the `isMemberOf` keyword does, would require some way for an implementation to associate URIs with data, similar to how implementations associate URIs with schemas. Maybe we don't want to encourage that. But as with URIs and schemas, we would state that fetching the resource identified by the URI is not automatic, and at most the implementation can store data associated with URIs and resolve from that storage, just as it stores schemas under URIs and resolves `$ref`s from that.
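Something like the following is what I have in mind for the storage side. This is only a sketch: `DataRegistry`, `is_member_of`, and the example URI are made up for illustration, and membership uses Python equality rather than a proper JSON equality check.

```python
class DataRegistry:
    """Stores data under URIs, the way schemas are stored for `$ref`. Never fetches."""

    def __init__(self):
        self._data = {}

    def register(self, uri, data):
        """Inform the implementation of the data identified by `uri`."""
        self._data[uri] = data

    def resolve(self, uri):
        """Return previously registered data; unknown URIs are an error."""
        try:
            return self._data[uri]
        except KeyError:
            raise LookupError(
                f"no data registered for {uri!r}; refusing to fetch"
            ) from None


def is_member_of(instance, uri, registry):
    """Assert that `instance` belongs to the set registered under `uri`."""
    # Note: `in` uses Python equality, so 1 and True would compare equal here.
    return instance in registry.resolve(uri)


# Usage: the caller supplies the set up front; nothing touches the network.
registry = DataRegistry()
registry.register("https://example.com/sets/colors", ["red", "green"])
print(is_member_of("red", "https://example.com/sets/colors", registry))
```

The caller decides what data lives under which URI and when it is refreshed, which keeps the "dynamic set" part outside of JSON Schema itself.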
But these relatively complex things are confined to extension vocabularies, and no one is forced to support them. Most people don't support Hyper-Schema, after all :-)
Thoughts?