Keywords that load instance data #855

Closed
handrews opened this issue Feb 10, 2020 · 9 comments · Fixed by #1034

@handrews
Contributor

Looking at various principles that we've gotten better at articulating, and leveraging vocabularies, I think we can offer some guidance in this area, which is the most controversial remaining feature request.

$data itself is not what we want:

  • it, like $merge, is a process separate from the rest of JSON Schema evaluation, which can be dropped into pretty much anywhere. This makes the semantics and cost of all keywords more complex.
  • Because it is a global behavior, it would have to go into the Core vocabulary, significantly increasing the burden of implementing Core.

However, we do have Hyper-Schema's links and base, both of which can incorporate instance data via templating. The mapping of the instance into the template can pull in data from anywhere and everywhere, which (as @awwright noted in Scope of JSON Schema Validation) is a problem for streaming implementations.

So why are links and base OK, while $data is not? In part, it's because they are essentially annotations, which haven't been formally processed until now. But also, they are specific-use keywords. Now that we have $vocabulary, a streaming validator could refuse to parse a JSON Hyper-Schema document. That would not be possible with $data in the core, because everything MUST support Core.
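As a rough sketch of that refusal (not taken from any real implementation), a validator could inspect the meta-schema's $vocabulary object and reject any vocabulary marked as required that it does not support. The function name `check_vocabularies` and the supported set are hypothetical; the vocabulary URIs are assumed to be the draft 2019-09 ones.

```python
# Vocabularies this hypothetical streaming validator is willing to support.
# The URIs are assumed to be the draft 2019-09 vocabulary identifiers.
SUPPORTED_VOCABULARIES = {
    "https://json-schema.org/draft/2019-09/vocab/core",
    "https://json-schema.org/draft/2019-09/vocab/validation",
}


def check_vocabularies(metaschema: dict) -> None:
    """Raise ValueError if the meta-schema requires an unsupported vocabulary.

    In $vocabulary, a value of true means the vocabulary must be understood;
    false means it is optional and may be ignored.
    """
    for uri, required in metaschema.get("$vocabulary", {}).items():
        if required and uri not in SUPPORTED_VOCABULARIES:
            raise ValueError(f"unsupported required vocabulary: {uri}")


# A meta-schema fragment that requires the Hyper-Schema vocabulary.
hyper_metaschema = {
    "$vocabulary": {
        "https://json-schema.org/draft/2019-09/vocab/core": True,
        "https://json-schema.org/draft/2019-09/vocab/hyper-schema": True,
    }
}

# A meta-schema fragment that only requires Core and Validation.
plain_metaschema = {
    "$vocabulary": {
        "https://json-schema.org/draft/2019-09/vocab/core": True,
        "https://json-schema.org/draft/2019-09/vocab/validation": True,
    }
}
```

Here `check_vocabularies(plain_metaschema)` returns normally, while `check_vocabularies(hyper_metaschema)` raises, which is the opt-out that a Core-level $data would not allow.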

I think we should note that extension keywords MAY produce annotations or evaluate assertions based on data from elsewhere in the instance, with the following restrictions:

  • Only URI-references, or [Relative] JSON Pointers may be used to address data (this is already needed for existing features)
  • JSON Pointer fragment syntax is required over the instances, as the instance is processed in the data model and is no longer encoded in any particular media type; JSON Schema implementations are not responsible for understanding arbitrary media types' fragment syntaxes or semantics
  • JSON Schema does not handle fetching any URIs. Just as with $ref, a conforming implementation is only responsible for resolving URIs of which it has been informed

None of this would really change the requirements for implementing Core, Validation, or Hyper-Schema. It's just a description of how Hyper-Schema already works.

Keywords that load data still need to otherwise fit in the keyword taxonomy if a generic JSON Schema processor is expected to handle them. You can't add $data as an extension vocabulary, because it is not a schema keyword. It's a kinda-sorta just-in-time preprocessing step that makes the entire schema document into a template.

On the other hand, links (specifically href) and base are specific template keywords. The cost of template resolution is limited to them. Third parties could write other data-templated keywords, such as a greaterThanData keyword which takes a JSON Pointer (relative or absolute form) and asserts the condition in the name. Or they could write an isMemberOf keyword that takes a URI that identifies some dynamic set of which the data is expected to be a member.
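To make the greaterThanData idea concrete, here is a minimal sketch that handles only the absolute JSON Pointer form (RFC 6901). The keyword does not exist in any vocabulary; the function names `resolve_pointer` and `greater_than_data` are made up for illustration.

```python
def resolve_pointer(document, pointer: str):
    """Resolve an absolute JSON Pointer (RFC 6901) against a parsed instance."""
    if pointer == "":
        return document  # the empty pointer addresses the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def greater_than_data(value, pointer: str, root) -> bool:
    """Hypothetical assertion: value must exceed the data at `pointer`."""
    return value > resolve_pointer(root, pointer)


instance = {"limits": {"min": 3}, "value": 5}
print(greater_than_data(instance["value"], "/limits/min", instance))  # True
```

The cost of looking elsewhere in the instance is paid only when this keyword is evaluated, which is the contrast with $data being allowed anywhere.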

Using a URI like the isMemberOf keyword would require some way for an implementation to associate URIs with data, similar to how implementations associate URIs with schemas. Maybe we don't want to encourage that. But as with URIs and schemas, we would state that fetching the resource identified by the URI is not automatic, and at most the implementation can store data associated with URIs and resolve from that storage, just as it stores schemas under URIs and resolves $refs from that.
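A minimal sketch of that storage-only model, assuming a hypothetical `DataRegistry` class and `is_member_of` function (neither exists in any implementation I know of), and a made-up example URI. Nothing is ever fetched; an unregistered URI is simply an error.

```python
class DataRegistry:
    """Maps URIs to data the application has explicitly registered.

    Mirrors how implementations store schemas under URIs for $ref:
    resolution only consults this storage and never touches the network.
    """

    def __init__(self):
        self._data = {}

    def register(self, uri: str, data) -> None:
        self._data[uri] = data

    def resolve(self, uri: str):
        if uri not in self._data:
            raise LookupError(f"no data registered for {uri}")
        return self._data[uri]


def is_member_of(value, uri: str, registry: DataRegistry) -> bool:
    """Hypothetical assertion: value must belong to the set identified by uri."""
    return value in registry.resolve(uri)


registry = DataRegistry()
# The URI below is an illustrative placeholder, not a real resource.
registry.register("https://example.com/sets/emotions", {"happy", "sad"})

print(is_member_of("sad", "https://example.com/sets/emotions", registry))    # True
print(is_member_of("angry", "https://example.com/sets/emotions", registry))  # False
```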

But these relatively complex things are confined to extension vocabularies, and no one is forced to support them. Most people don't support Hyper-Schema, after all :-)

Thoughts?

@handrews
Contributor Author

I think that the annotation-based approach for deferring such validation to the application also remains viable and is probably preferred, but since we already do this with links, it seems like formalizing the reach of that is a good idea and will let us redirect $data-ish demands.

@handrews
Contributor Author

Forgot to address @awwright's concerns from the wiki page. One was about streaming implementations, which I did address: we keep these keywords out of Core and Validation, and a streaming implementation would presumably refuse to support a vocabulary that is overly burdensome. I don't think it's necessary to ensure that all vocabularies are streaming-friendly, as long as the vocabularies defined by the Core and Validation specs are.

The next was about different ways people might want to do same-document references (or references in general). I think sticking to URIs and JSON Pointers takes care of this, as these are technologies that are already involved. Anything outside of that scope and you're on your own.

The last concern was "JSON Schema should not express a preference on the best way to validate references to data." I think punting this out to vocabularies and only specifying the general boundaries (URIs, JSON Pointers, no automatic fetching of data, etc.) avoids putting the project as a whole on record as endorsing any one approach. But it does enable approaches beyond "put some annotations on it and deal with it in your application", so maybe there are still concerns to work out here.

@handrews
Contributor Author

Note that it's probably worth considering the extension to Relative JSON Pointer proposed in #115, which allows moving left/right along arrays. There are several levels of complexity there, and my current thinking is to address only the originally proposed left-right option.

@handrews
Contributor Author

Right now I am looking at adding something vaguely like the following to a new section on the general process for evaluating keywords.

  • Keywords MAY be defined to use JSON Pointers or Relative JSON Pointers to examine other parts of an instance than the current location
  • Keywords that allow adjusting the location with [Relative] JSON Pointer(s) SHOULD default to using the current location, if a default is desirable.

This matches how various parts of the links object in HyperSchema work.
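A rough sketch of what those two points could look like in practice, assuming the current location is tracked as a list of keys and indices from the instance root. The function name `resolve_relative` is hypothetical, and the default of "0" (the current location) reflects the second point. The array left/right extension is not covered here.

```python
import re


def _walk(document, tokens):
    """Follow a sequence of object keys / array indices from `document`."""
    current = document
    for token in tokens:
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def resolve_relative(root, location, relative: str = "0"):
    """Resolve a Relative JSON Pointer from `location` (a path from the root).

    The pointer is a non-negative integer (levels to go up) followed by
    either "#" (return the key or index) or an optional JSON Pointer.
    """
    match = re.fullmatch(r"(0|[1-9][0-9]*)(#|/.*|)", relative, re.DOTALL)
    if match is None:
        raise ValueError(f"not a Relative JSON Pointer: {relative!r}")
    up, rest = int(match.group(1)), match.group(2)
    if up > len(location):
        raise ValueError("pointer climbs above the instance root")
    base = list(location[: len(location) - up])
    if rest == "#":
        if not base:
            raise ValueError("the root has no key or index")
        return base[-1]
    tokens = []
    if rest:
        tokens = [t.replace("~1", "/").replace("~0", "~") for t in rest[1:].split("/")]
    return _walk(root, base + tokens)


instance = {"range": {"min": 1, "max": 10}}
location = ["range", "max"]  # the keyword is being evaluated at /range/max

print(resolve_relative(instance, location))           # 10 (defaults to current location)
print(resolve_relative(instance, location, "1/min"))  # 1
print(resolve_relative(instance, location, "0#"))     # max
```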

@awwright
Member

awwright commented Feb 24, 2020

See also json-schema-org/json-schema-vocabularies#26 "Feature for defining data sources/relationships"

@Relequestual
Member

Your suggestion for "general process for evaluating keywords" sounds reasonable. It's in line with current behaviour and enables similar behaviour for potential new vocabularies.

@Relequestual
Member

> Right now I am looking at adding something vaguely like the following to a new section on the general process for evaluating keywords.
>
> * Keywords MAY be defined to use JSON Pointers or Relative JSON Pointers to examine other parts of an instance than the current location
> * Keywords that allow adjusting the location with [Relative] JSON Pointer(s) SHOULD default to using the current location, if a default is desirable.
>
> This matches how various parts of the links object in HyperSchema work.

@handrews It looks like this issue could be resolved with a very small amount of work. Would adding two simple paragraphs as you suggest be enough to resolve this?

@jackpap

jackpap commented Nov 6, 2020

Hi, I'm new to JSON Schema, but was wondering if there is currently a way to have a dynamically populated enum for validation? It seems like the plans for $data were scrapped. Is there currently an alternative?

{ "emotions" : ["happy", "sad"], "myEmotion" : "sad" } ---> I just want to check that myEmotion is one of the entries of emotions.

@Relequestual
Copy link
Member

Relequestual commented Nov 7, 2020

The alternative is to allow a new vocabulary to provide the $data keyword (that's what this issue wants to address).
First, though, we need to focus on our current release and on documentation for how to create a new vocabulary and dialect. Then people can start to make them!

Some implementations already have $data so you have to decide if you want your schemas to be portable or only work within your own system.
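If portability matters more, one option in the meantime is to keep the schema standard and do the cross-field check in the application after validation. A minimal sketch for the emotions example, where the function name `my_emotion_is_listed` is made up and the structural validation by an ordinary schema is assumed to have happened already:

```python
def my_emotion_is_listed(instance: dict) -> bool:
    """Application-level check: myEmotion must be one of the entries in emotions.

    Assumes the instance has already passed ordinary schema validation,
    so "emotions" is an array and "myEmotion" is present.
    """
    return instance["myEmotion"] in instance["emotions"]


print(my_emotion_is_listed({"emotions": ["happy", "sad"], "myEmotion": "sad"}))    # True
print(my_emotion_is_listed({"emotions": ["happy", "sad"], "myEmotion": "angry"}))  # False
```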
