-
-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify "format" requirements in vocabulary terms #764
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a few suggestes and a few comments.
<t> | ||
The value of this keyword is called a format attribute. It MUST be a string. A | ||
format attribute can generally only validate a given set of instance types. If | ||
the type of the instance to validate is not in this set, validation for this | ||
format attribute and instance SHOULD succeed. | ||
format attribute and instance SHOULD succeed. All format attributes defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the type of the instance to validate is not in this set, validation for this format attribute and instance SHOULD succeed.
I've never noticed this before, but it sounds not what you want...
Say you have a format "smallint". If a string is provided at the applicable instance, validation will assert TRUE if the format says it is only applicable to numeric types.
I guess this makes sense, because it's similar to how other keywords work, in that keywords are often only applicable if they target an instance of the same type.
For example, maxLength: 5
fails validation for a string of longer than 5 characters, but does not fail validation for a numeric value 123456789
because it is not the correct applicable type.
This is covered explicitly in draft-8 core (http://json-schema.org/work-in-progress/WIP-jsonschema-core.html#rfc.section.7.4.1)
7.4.1. Assertions and Instance Primitive Types
Most assertions only constrain values within a certain primitive type. When the type of the instance is not of the type targeted by the keyword, the instance is considered to conform to the assertion.
For example, the "maxLength" keyword from the companion validation vocabulary will only restrict certain strings (that are too long) from being valid. If the instance is a number, boolean, null, array, or object, then it is valid against this assertion.
But, I feel we should maybe re-reference this section here for clarity...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather just take the "SHOULD succeed" language out entirely. We used to have that phrasing everywhere and this is just left over from that. As you note, this is typical behavior, so it's better not to call it out- calling things out should be done for atypical behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
Is there some text about "if a keyword isn't applicable to the types it may be applied to, then it is not applied and therefore is the same as a true assertion" generally?
Take minimum
:
The value of "minimum" MUST be a number, representing an inclusive lower limit for a numeric instance.
If the instance is a number, then this keyword validates only if the instance is greater than or exactly equal to "minimum".
The intent is clear, but the phrasing and wording is... not very spec like.
We don't explicitly specify what happens if the instance is not a number...
The behaviour is, in essense, undefined (unless I've missed something).
If that's the case... maybe a spec cleanup is a candidate for draft-9.
Sorry, I've got way off topic here... but it's relevant.
jsonschema-validation.xml
Outdated
Due to the complexity involved in fully validating some format attributes | ||
defined in this specification, implementations MAY provide only limited | ||
validation support for some format attributes. Implementations SHOULD | ||
document any such intentional limitations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear you on this.
One of the reasons I was going to shunt this issue to draft-9 was I wanted to provide a MINIMUM requirement for the MUST defined above in line 559.
Otherwise, the MUST on 559 holds no actual requirement, because it's then defined that meeting said MUST is actually meeting two SHOULD requirements (and we're back to... whatever the implementation wants to do).
The point of the issues around format are providing AT LEAST SOMEthing when format is used if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[EDIT: See the next comment down instead]
This does require at least something- it just does not define what that something is, and there is NO WAY IN HELL we will get any consensus from implementors on defining a minimum for each format attribute within the next couple weeks.
What this does is ensures that the keyword produces a validation result, and gives both the implementation and schema author some level of control. Previously, the schema author had no control over how things were processed.
This does not solve everything but it's a big improvement and I'm not willing to let the perfect be the enemy of the good here. For me personally, format
is unsalvageable as any sort of interoperable feature and should be discouraged (if not formally deprecated or removed- it's far too heavily used for that right now) in favor of new vocabularies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Relequestual OK now that I'm more awake and have had a chance to think about it more (I really shouldn't- SHOULD NOT? - respond to comments right when I first wake up) I think that having the vocabulary true/false option means that we can strengthen these requirements without having a huge fight over the minimal requirements. Which can be pretty minimal with a warning that they may be strengthened in the future.
We can do this because the default usage of the vocabulary is still false
, which is essentially the same as what it has always been. As is noted in the text, even when the vocabulary is false
, an implementation MAY try to validate it anyway.
So let's do this:
- If the vocabulary is
true
, then all of the format attributes MUST be supported to some minimum level along the lines of syntactic fallback validation for "format" #54. Otherwise the implementation must raise an unsupported vocabulary error. - If the vocabulary is
false
, then it's all MAY. Which includes the SHOULD level that is currently specified (and, as far as I can tell, very frequently ignored to some degree or another).
Again, since the default is false
, this is an entirely backwards-compatible change.
I will revise the PR. I think I may move a bit of this text and quite a bit more commentary into an appendix of guidance on implementing and (if necessary e.g. because OpenAPI has a huge format registry already) extendingformat
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a totally reasonable and proportionate way forward.
I'm happy with a super bare minimum level of required validation in the instance where the format vocabulary is true
: for example, email matches ^.+@.+\..+$
. Something that catches the equivilence of a glancing "huh... well that's clearly wrong" type issue.
Ultimatly, JSON Schema is (normally) first pass on validation, apart from the instances where it's baked into a database or the like.
jsonschema-validation.xml
Outdated
why the requirement is a SHOULD. Implementations MAY ignore "format" entirely | ||
as is allowed by false vocabulary declarations. However, due to the long history | ||
of this keyword, treating it as something of a special case seems reasonable. | ||
This may be revised in future drafts based on feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's been some discussion around "should all unknown keywords be collected as annotations for application use?". Doing so would prevent the need to define this explicit behaviour, which as you note, is different to when other vocabs are given a false value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I decided to separate that conversation from this one. This will work fine now, and does not prevent us from making annotation collection the default behavior later. Either before or after publishing this draft, but we really do need to wrap this up soon, and changing the overall default behavior may be pretty contentious.
Clarifying format
is fine to do because it is such an enormous source of complaints, and none of this is incompatible with existing behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this is fine.
However, it does lead me to consider that this is placing an extra implementation SHOULD based on a vocab being false. Do you think it's worth saying something like " generally it is not recomend that behaviour is derived from a vocabulary having a false value"? or do you think that's overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered more clearly in the new text I had written up on Sunday. I'm just finishing up the minimum validation requirements now and will post an update by tonight.
This looks good to me |
@Relequestual @gregsdennis @Julian @johandorland OK I have reworked this a bit more to add clarity (including clarity on where schema authors cannot expect consistent or thorough validation) and ensured that there are consistent and reasonably predictable default behaviors. There are two ways to control whether and to what degree
This is still a complicated mess but it's a more clear complicated mess and I think it is important to establish that the overall default behavior is to not validated at all, because that is the only absolutely consistent option. |
Use the true vs false values of vocabulary declaration to indicate whether a schema author requires assertion behavior (such as because a keyword such as "oneOf" depends on such validation functioning correctly). Use a false value in the standard core+validation vocabulary, reflecting the historical lack of requirement for this keyword to be implemented. Define annotation behavior when the vocabulary is declared with false, to facilitate the recommended best practice of performing semantic validation in your application. While not the typical false vocabulary behavior, this seems like the best way to reframe the historically unpredictable behavior.
By default, a false vocabulary prevents "format" from being validated. By default, a true vocabulary requires "format" to be validated, although the degree of validation required remains somewhat vague at least for this draft. In both the true and false cases, validation can be toggled on or off when passing schemas and instances to the implementation (although in the false case, there is no guarantee at all that turning on valdiation will produce any validation behavior; this matches the previous draft's "format" specification).
This is the weird one. It's basically this conversation between client and implementation: Client: "Oh, so, I don't want you to ever validate |
custom format attributes. An implementation MUST NOT fail | ||
validation or cease processing due to an unknown format attribute. | ||
When treating "format" as an annotation, implementations SHOULD collect both | ||
known and unknown format attribute values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"We" should build some tests around collecting format values as annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. And really around annotation collection in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an issue to track this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philsturgeon no, but feel free to file one 😄
I'm happy with the current state. |
@gregsdennis LOL "did I stutter?" 😆 The point of that option is that there are always people who do not want to run the validation because it is expensive. I agree that that is a weird case, and I'd be fine with saying that it's an invalid combination. Let's see what other folks think on that. |
@Relequestual are you OK with this? If so I'd like to merge it. Greg OK'd it since the last update and you are the other person who made significant comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, clear, ship it!
review was ages ago and seems like concerns were addressed
Fixes #732, fixes #759, addresses the
format
part of #646.The
content*
of #646 will be done in a separate PR.Use the true vs false values of vocabulary declaration to
indicate whether a schema author requires assertion behavior
(such as because a keyword such as "oneOf" depends on such
validation functioning correctly).
Use a false value in the standard core+validation vocabulary,
reflecting the historical lack of requirement for this
keyword to be implemented.
Define annotation behavior when the vocabulary is declared with
false, to facilitate the recommended best practice of performing
semantic validation in your application. While not the typical
false vocabulary behavior, this seems like the best way to
reframe the historically unpredictable behavior.