Relative constraints #250
@stuartpb JSON Schema is a vocabulary for structural validation, description, and hypermedia; so I consider data verification, even intra-document constraints, to be out of scope. The provided way to make relationships between data is hyperlinks via Hyper-Schema. I think the biggest reason for this is limiting the computational complexity of JSON Schema, and the Principle of Least Power. When the vocabulary is limited to a set of instructions with well-defined semantics, you can learn more about the document being described.
It sounds like you're drawing a pretty arbitrary line in the sand here. What about […]
@stuartpb I'm just going to point out that while @awwright is the spec editor, he does not speak for everyone involved with the project.

@awwright I don't see any way to address this use case with hyperlinks; could you elaborate?
@stuartpb Some of it is arguable, yeah. There are mechanisms that have the effect of "if $this then $that," but that's not quite the same as saying "value of X must be present in Y" or similar things that could essentially turn JSON Schema into a relational database or scripting language, which isn't really the goal. But if we can think of a particular way to do it that fits within the principles of JSON Schema, it could have some interesting applications.

This is what I'm referring to when I say Principle of Least Power: mostly I don't want to encourage people to dump all their data into a single file, instead of talking about multiple distinct resources by their URI.

Also, ideas for specific keywords like enforcing an array order seem reasonable, if there's a good use case. There's already a keyword for asserting items be unique (turning the array into a set); see the sketch below.
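As a concrete sketch of that last point (the `ordered` keyword shown here is hypothetical: it is not in any draft, and its name and value are illustrative only):

```json
{
  "type": "array",
  "items": { "type": "string" },
  "uniqueItems": true,
  "ordered": "lexical"
}
```

`uniqueItems` is the existing set-like assertion; a hypothetical `ordered` assertion could require items to appear in lexical order, in the same declarative style.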
There was a proposal for "ordered" as a keyword on the old wiki; it was one that I didn't port over, as no one seemed to be asking for it. Worth filing separately if there's enough interest to discuss it.

@awwright I think you need to get specific about exactly how you would handle the max-less-than-min case brought up by @stuartpb, or be clear that you see that use case as outside of JSON Schema. What you've stated is pretty abstract and does not help us understand whether your approach covers the stated needs. I know that I do not want JSON Schema to be a scripting language (this has come up before), or a relational database for that matter. But I don't see how we are in danger of that with `$data`.
After working with many “schema”, modelling, and messaging technologies for many years, I'm just having my first detailed look at JSON Schema. Regarding the idea of allowing $data in validation schemas, I think it is a mistake to rush into that, for the following reasons.

JSON Schema is positioned as a formal standard for validation of JSON instances, but in practice a major use of schemas (by people and by software tooling) is as data model definitions. There are many needs for visualisation, analysis, code generation, etc. which are met by the data-model aspect of schemas, and of course models can exist in isolation from any particular instances. W3C XML Schema did not formally recognise this distinction between data modelling and validation, and one consequence was that certain features specified by its authors, which were not supportable by tooling, were largely unused. There were plenty of examples of schema authors initially writing schemas that used constructs which led to those schemas being badly received by their intended audience, so that they had to amend and re-issue them, which is all delay and wasted effort. JSON Schema should try to avoid similar complications.

So I think it would be helpful to the industry at large if the JSON Schema authors bear in mind this dual purpose of a schema, and consider carefully how each feature they introduce impacts users who want a data model or some kind of tooling support versus users who want a formal validation grammar. I think allowing $data within constructs which are relevant to data model consumers is a bad idea.

A related point is where a boundary can be drawn between data model definition and constraint definition, and I appreciate that this is not unambiguous. UML draws a line between what can be specified per attribute and what must be expressed as constraints, while allowing cardinality to be specified per attribute. Treating mandatoriness as part of the data model goes back to "not null" columns in relational databases. There is also a very long tradition of putting constraints which are of a standard type and relate to one individual data item into a "data dictionary", leading naturally to things such as facet constraints in XML Schema and similar keywords like maxLength in JSON Schema. A very important aspect of these simple constraints is that violations of them can be reported meaningfully through generic mechanisms: to explain the violation, all the validation engine has to do is identify which field and which facet.

When it comes to more complex constraints involving cross-dependencies, there is a need for human intelligibility at two points in time. First, people need to be able to understand what the constraint is, so they can avoid creating data which will violate it. Second, if an instance does violate it, the explanation needs to identify both which constraint was violated and which data items were involved in its evaluation. There are constraint formalisms which have evolved over time (in areas away from JSON) which try to achieve these goals, generally involving an expression language (XPath, OCL) and also a system of error codes, error messages (possibly multilingual), etc. JSON Pointer may be an embryonic expression language?

JSON Schema already has enough power and flexibility to create a risk of writing schemas embodying validation criteria which are unintelligible to a person having to enter the data through a form, or to a business analyst who has to specify a program to produce data which complies with the schema. Others on this list have pointed out that when an instance fails validation against a JSON Schema using combinations of "allOf", "oneOf", etc., the message produced by a validation engine may be no help to a human trying to understand what was wrong with the data.

So, basically, I'm suggesting that adopting $data in validation as a piecemeal enhancement is a bad idea, and it would be better to take a more strategic view of how a constraint mechanism of arbitrary complexity, separate from but complementary to a data model, could be envisaged for JSON, in a way which gives intelligibility and transparency.
This is some great insight, thanks!

It sounds like the Principle of Least Power is applicable here -- the broader/more powerful something is, the less useful it frequently is.

I know error messages for things like "not" and "oneOf" have been something of a concern, since it's difficult to produce useful error messages for them.
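To illustrate the error-reporting problem (an illustrative schema, not one from this thread): when an instance fails every branch of a `oneOf`, a generic validator can only report that no branch matched, along with the low-level errors from each branch.

```json
{
  "oneOf": [
    { "type": "object", "required": ["min", "max"] },
    { "type": "array", "minItems": 2 }
  ]
}
```

For the instance `42`, both branches fail, so the combined output amounts to "not an object; not an array" -- which says nothing about which branch the author intended or how to fix the data.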
@PeterJGeraghty we now have the json-schema-org/json-schema-vocabularies project for additional vocabularies. One of the three initial topics for proposals is a data model / code generation vocabulary. Elsewhere we've discussed that that vocabulary and the UI generation vocabulary would likely make use of only a subset of the validation vocabulary. This will allow well-defined behavior for those systems without limiting the expressive power of validation. Various logical operators and conditionals are essential validation tools, but somewhere between difficult and useless for things like code generation.

There is already precedent for this: Hyper-Schema keywords in subschemas that do not validate because of conditionals or negation MUST be ignored. So I would not consider that a reason to avoid `$data`.
Since there is no proposal here under discussion except `$data` […]
Quick example: my schema includes a `length` object that has two properties, `min` and `max`. I'd like a way to specify that a document like this, where `max` is less than `min`, is invalid:
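```json
{
  "length": {
    "min": 10,
    "max": 5
  }
}
```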
Other relative property/item relationships like "this array must include the same values as this property", "the third item in this tuple must not be the same as the first item", or "all the values in this array must be sorted in lexical order" would be useful, too.
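For reference, the `$data` idea discussed above could express the min/max constraint roughly like this (a sketch following the syntax of the Ajv `$data` extension, which uses Relative JSON Pointers; this is not standard JSON Schema):

```json
{
  "type": "object",
  "properties": {
    "length": {
      "type": "object",
      "properties": {
        "min": { "type": "number" },
        "max": {
          "type": "number",
          "minimum": { "$data": "1/min" }
        }
      }
    }
  }
}
```

Here `"1/min"` is a Relative JSON Pointer: starting from the value of `max`, go up one level and take `min`, so the effective constraint is that `max` must be greater than or equal to `min`.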