
Relative constraints #250

Closed
stuartpb opened this issue Feb 13, 2017 · 10 comments

@stuartpb

stuartpb commented Feb 13, 2017

Quick example: my schema includes a length object that has two properties, min and max. I'd like a way to specify that a document like this, where max is less than min, is invalid:

{"length": {"min": 42, "max": 7}}

Other relative property/item relationships like "this array must include the same values as this property", "the third item in this tuple must not be the same as the first item", or "all the values in this array must be sorted in lexical order" would be useful, too.
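
For reference, a plain schema for this shape (property names taken from the example instance) can constrain min and max individually, but none of the standard keywords can relate one to the other:

{
  "type": "object",
  "properties": {
    "length": {
      "type": "object",
      "properties": {
        "min": { "type": "number" },
        "max": { "type": "number" }
      }
    }
  }
}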

@stuartpb stuartpb changed the title Relative property relationships Relative constraints Feb 13, 2017
@epoberezkin
Member

There is a proposal for a $data keyword to create such references from the schema to the data being validated: #51

You can add your support to that issue... As far as I know, at the moment only Ajv implements it.
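
For illustration, a rough sketch of how $data is used in Ajv (it must be enabled via Ajv's $data option; the relative JSON Pointer "1/min" refers to the sibling min property). With this schema, the instance above fails because max must be at least min:

{
  "type": "object",
  "properties": {
    "length": {
      "type": "object",
      "properties": {
        "min": { "type": "number" },
        "max": {
          "type": "number",
          "minimum": { "$data": "1/min" }
        }
      }
    }
  }
}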

@awwright
Member

@stuartpb JSON Schema is a vocabulary for structural validation, description, and hypermedia; so I consider data verification, even intra-document constraints, to be out-of-scope. The provided way to make relationships between data is hyperlinks via Hyper-schema.

I think the biggest reason for this is limiting the computational complexity of JSON Schema, and the Principle of Least Power. When the vocabulary is limited to a set of instructions with well-defined semantics, you can learn more about the document being described.

@stuartpb
Author

stuartpb commented Feb 13, 2017

It sounds like you're drawing a pretty arbitrary line in the sand here. What about dependencies? Isn't that an intra-document constraint? How are direct numeric relational constraints any less "a set of instructions with well-defined semantics" than an anyOf with several different format properties that may or may not cause a different part of the subschema to match, depending on how the operating environment regards the format RFCs' edge cases?
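
For context, the property form of dependencies already expresses an intra-document constraint; a minimal sketch (property names are only illustrative), where an instance containing credit_card is valid only if it also contains billing_address:

{
  "type": "object",
  "properties": {
    "credit_card": { "type": "string" },
    "billing_address": { "type": "string" }
  },
  "dependencies": {
    "credit_card": ["billing_address"]
  }
}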

@handrews
Contributor

@stuartpb I'm just going to point out that while @awwright is the spec editor, he does not speak for everyone involved with the project on "$data". The issue is still open because there is not yet consensus either way.

@awwright I don't see any way to address this use case with hyperlinks, could you elaborate?

@awwright
Member

@stuartpb Some of it is arguable, yeah.

There are mechanisms that have the effect of "if $this then $that," but that's not quite the same as saying "value of X must be present in Y" or similar things that could essentially turn JSON Schema into a relational database or scripting language, which isn't really the goal.

But if we can think of a particular way to do it that fits within the principles of JSON Schema, it could have some interesting applications. This is what I'm referring to when I say principle of least power:

Mostly I don't want to encourage people to dump all their data into a single file, instead of talking about multiple distinct resources by their URI.

Also, ideas for specific keywords like enforcing an array order seem reasonable, if there's a good use case for it. There's already a keyword for asserting that items be unique (turning the array into a set).
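
For example, the existing uniqueItems keyword already makes an assertion about the relationship between an array's own elements; a minimal sketch:

{
  "type": "array",
  "items": { "type": "string" },
  "uniqueItems": true
}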

@handrews
Contributor

There was a proposal for "ordered" as a keyword on the old wiki; it was one that I didn't port over, as no one seemed to be asking for it. Worth filing separately if there's enough interest to discuss it.

@awwright I think you need to get specific about exactly how you would handle the max-less-than-min case brought up by @stuartpb, or be clear that you consider that use case to be outside the scope of JSON Schema. What you've stated is pretty abstract and does not help us understand whether your approach covers the stated needs.

I know that I do not want JSON Schema to be a scripting language (this has come up before), or a relational database for that matter. But I don't see how we are in danger of that with "$data" so you are going to need to make your case in more detail either here or in #51. Talking about abstract principles is all well and good but it's not a controversial point. We need specifics of how those abstract principles do or do not lead to concrete solutions in order to evaluate them against the very concrete "$data" proposal.

@PeterJGeraghty

After working with many “schema”, modelling, and messaging technologies for many years, I’m just having my first detailed look at JSON Schema. Regarding the idea of allowing $data in validation schemas, I think it is a mistake to rush into that, for the following reasons.

JSON Schema is positioned as a formal standard for validation of JSON instances, but in practice a major use of schemas (by people and by software tooling) is as data model definitions. There are many needs for visualisation, analysis, code generation etc. which are met by the data-model-definition aspect of schemas, and of course models can exist in isolation from any particular instances.

W3C XML Schema did not formally recognise this distinction between data modelling and validation, and one consequence was that certain features specified by its authors, which were not supportable by tooling, were largely unused. There were plenty of examples of schema authors initially writing schemas that used constructs which led to those schemas being badly received by their intended audience, so that they had to amend and re-issue them; all of which is delay and wasted effort. JSON Schema should try to avoid similar complications.

So I think it would be helpful to the industry at large if the JSON Schema authors bear in mind this dual purpose of a schema, and consider carefully how each feature they introduce affects users who want a data model or some kind of tooling support versus users who want a formal validation grammar. I think allowing $data within constructs which are relevant to data-model consumers is a bad idea.

A related point is where a boundary can be drawn between data model definition and constraint definition, and I appreciate that this is not unambiguous. UML draws a line between what can be specified per attribute and what must be expressed as constraints, allowing cardinality to be specified per attribute. Treating mandatoriness as part of the data model also goes back to “not null” columns in relational databases. There is also a very long tradition of putting constraints which are of a standard type and relate to one individual data item into a “data dictionary”, leading naturally to things such as facet constraints in XML Schema and similar keywords like maxLength in JSON Schema. A very important aspect of these simple constraints is that violations of them can be reported meaningfully through generic mechanisms: to explain the violation, all the validation engine has to do is identify which field and which facet.

When it comes to more complex constraints involving cross-dependencies, there is a need for human intelligibility at two points in time. First, people need to be able to understand what the constraint is, so they can avoid creating data which will violate it. Second, if an instance does violate it, the explanation needs to identify both which constraint was violated and which data items were involved in its evaluation. There are constraint formalisms which have evolved over time (in areas away from JSON) which try to achieve these goals, generally involving an expression language (XPath, OCL) and also a system of error codes, error messages (possibly multilingual), etc. JSON Pointer may be an embryonic expression language?

JSON Schema already has enough power and flexibility to create a risk of writing schemas embodying validation criteria which are unintelligible to a person having to enter the data through a form, or to a business analyst who had to specify a program to produce data which complied with the schema. Other people on this list have pointed out that when an instance fails validation against a JSON Schema using combinations of “allOf”, “oneOf”, etc., the message produced by a validation engine may be no help to a human trying to understand what was wrong with the data.

So, basically, I’m suggesting that adopting $data in validation as a piecemeal enhancement is a bad idea, and that it would be better to take a more strategic view of how a constraint mechanism of arbitrary complexity, separate from but complementary to a data model, could be envisaged for JSON, in a way which gives intelligibility or transparency.

@awwright
Member

awwright commented Mar 2, 2017 via email

@handrews handrews added the $data label May 16, 2017
@handrews handrews added this to the draft-07 (wright-*-02) milestone May 16, 2017
@handrews
Contributor

@PeterJGeraghty we now have the json-schema-org/json-schema-vocabularies project for additional vocabularies. One of the three initial topics for proposals is data model / code generation vocabulary. Elsewhere we've discussed that that vocabulary and the UI generation vocabulary would likely make use of only a subset of the validation vocabulary.

This will allow well-defined behavior for those systems without limiting the expressive power of validation. Various logical operators and conditionals are essential validation tools but somewhere between difficult and useless for things like code generation. There is already precedent for this: Hyper-Schema keywords in subschemas that do not validate because of conditionals or negation MUST be ignored.

So I would not consider that a reason to avoid "$data", although it may impact whether it goes in core or validation, as it's probably not a good idea to impose restrictions on core features. It's fine to impose restrictions on validation because a.) it doesn't change how validation works, it just changes whether the additional vocabulary is relevant, and b.) having a vocabulary that did not build on the validation vocabulary at all would be a legal use of JSON Schema. Although no one has come up with anything practical like that :-P

@handrews
Contributor

Since there is no proposal here under discussion except $data, I am closing this to merge the discussion into #51. We don't need two independent threads.
