Specify "format" requirements in vocabulary terms #764

handrews · 2019-07-19T02:11:22Z

Fixes #732, fixes #759, addresses the format part of #646.
The content* keywords part of #646 will be done in a separate PR.

Use the true vs false values of vocabulary declaration to
indicate whether a schema author requires assertion behavior
(for example, because a keyword such as "oneOf" depends on that
validation functioning correctly).

Use a false value in the standard core+validation vocabulary,
reflecting the historical lack of requirement for this
keyword to be implemented.

Define annotation behavior when the vocabulary is declared with
false, to facilitate the recommended best practice of performing
semantic validation in your application. While not the typical
false vocabulary behavior, this seems like the best way to
reframe the historically unpredictable behavior.
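A minimal sketch of how an implementation might read the vocabulary declaration to decide how to treat "format". The vocabulary URI and the function name are hypothetical placeholders, not taken from this PR; only the true/false meaning comes from the description above.

```python
# Hypothetical placeholder URI for the format vocabulary; the real URI is
# defined by the published meta-schemas, not by this sketch.
FORMAT_VOCAB = "https://example.com/vocab/format"


def format_behavior(meta_schema: dict) -> str:
    """Classify how "format" should be treated, based on "$vocabulary".

    true  -> the schema author requires assertion behavior
    false -> "format" is treated as an annotation by default
    """
    vocabularies = meta_schema.get("$vocabulary", {})
    if FORMAT_VOCAB not in vocabularies:
        return "not declared"
    return "assertion" if vocabularies[FORMAT_VOCAB] else "annotation"


# The standard core+validation meta-schema declares the vocabulary with false.
standard_meta_schema = {"$vocabulary": {FORMAT_VOCAB: False}}
strict_meta_schema = {"$vocabulary": {FORMAT_VOCAB: True}}
print(format_behavior(standard_meta_schema))  # annotation
print(format_behavior(strict_meta_schema))    # assertion
```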

Relequestual

I've made a few suggestions and a few comments.

hyper-schema.json

Relequestual · 2019-07-19T11:41:18Z

jsonschema-validation.xml

                <t>
                    The value of this keyword is called a format attribute. It MUST be a string. A
                    format attribute can generally only validate a given set of instance types. If
                    the type of the instance to validate is not in this set, validation for this
-                    format attribute and instance SHOULD succeed.
+                    format attribute and instance SHOULD succeed.  All format attributes defined


If the type of the instance to validate is not in this set, validation for this format attribute and instance SHOULD succeed.

I've never noticed this before, but it doesn't sound like what you want...
Say you have a format "smallint" that only applies to numeric types. If a string is provided at the applicable instance location, validation of that format will succeed.
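To make the "smallint" case concrete, here is a sketch of a check for that format. "smallint" is not a format defined by the specification, and the signed 16-bit range used here is an assumption for illustration only.

```python
def check_smallint(instance) -> bool:
    """Hypothetical "smallint" format check (assumed signed 16-bit range)."""
    # Python's bool is a subclass of int, but JSON booleans are not numbers.
    if isinstance(instance, bool) or not isinstance(instance, (int, float)):
        return True  # not a number: the format does not apply, so it passes
    return float(instance).is_integer() and -32768 <= instance <= 32767


print(check_smallint("hello"))  # True: a string is outside the format's target type
print(check_smallint(40000))    # False: a number, but out of range
```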

I guess this makes sense, because it's similar to how other keywords work: keywords generally only apply to instances of the type they target.

For example, maxLength: 5 fails validation for a string longer than 5 characters, but does not fail validation for the numeric value 123456789, because that is not the applicable type.
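The maxLength example above can be sketched as a type-guarded check. This is an illustrative function, not any particular implementation's API; Python's len() counts code points for str.

```python
def check_max_length(instance, max_length: int) -> bool:
    """Sketch of "maxLength": only strings are constrained."""
    if not isinstance(instance, str):
        return True  # numbers, booleans, null, arrays, objects all conform
    return len(instance) <= max_length


print(check_max_length("abcdef", 5))   # False: string longer than 5
print(check_max_length(123456789, 5))  # True: not a string, so not constrained
```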

This is covered explicitly in draft-8 core (http://json-schema.org/work-in-progress/WIP-jsonschema-core.html#rfc.section.7.4.1)

7.4.1. Assertions and Instance Primitive Types

Most assertions only constrain values within a certain primitive type. When the type of the instance is not of the type targeted by the keyword, the instance is considered to conform to the assertion.

For example, the "maxLength" keyword from the companion validation vocabulary will only restrict certain strings (that are too long) from being valid. If the instance is a number, boolean, null, array, or object, then it is valid against this assertion.

But I feel we should maybe reference that section here for clarity...

I'd rather just take the "SHOULD succeed" language out entirely. We used to have that phrasing everywhere, and this is just left over from that. As you note, this is typical behavior, so it's better not to call it out; calling things out should be reserved for atypical behavior.

Agreed.
Is there some text about "if a keyword isn't applicable to the types it may be applied to, then it is not applied and therefore is the same as a true assertion" generally?

Take minimum:

The value of "minimum" MUST be a number, representing an inclusive lower limit for a numeric instance.

If the instance is a number, then this keyword validates only if the instance is greater than or exactly equal to "minimum".

The intent is clear, but the phrasing and wording is... not very spec-like.
We don't explicitly specify what happens if the instance is not a number...
The behaviour is, in essence, undefined (unless I've missed something).
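One way to make the non-number case explicit is to state it in the check itself. A sketch of "minimum" that treats any non-number as conforming, following the core section on assertions and instance primitive types quoted above; the function name is illustrative.

```python
def check_minimum(instance, minimum: float) -> bool:
    """Sketch of "minimum": an inclusive lower limit for numeric instances."""
    # JSON booleans are not numbers; Python's bool subclasses int, so exclude it.
    if isinstance(instance, bool) or not isinstance(instance, (int, float)):
        return True  # non-numbers are not constrained by "minimum"
    return instance >= minimum


print(check_minimum(3, 3))    # True: exactly equal is allowed
print(check_minimum("2", 3))  # True: a string is not constrained
```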

If that's the case... maybe a spec cleanup is a candidate for draft-9.

Sorry, I've gone way off topic here... but it's relevant.

Relequestual · 2019-07-19T11:46:13Z

jsonschema-validation.xml

+                    Due to the complexity involved in fully validating some format attributes
+                    defined in this specification, implementations MAY provide only limited
+                    validation support for some format attributes.  Implementations SHOULD
+                    document any such intentional limitations.


I hear you on this.
One of the reasons I was going to shunt this issue to draft-9 was I wanted to provide a MINIMUM requirement for the MUST defined above in line 559.

Otherwise, the MUST on 559 holds no actual requirement, because it's then defined that meeting said MUST is actually meeting two SHOULD requirements (and we're back to... whatever the implementation wants to do).

The point of the issues around format is to provide AT LEAST SOMEthing when format is used, if required.

[EDIT: See the next comment down instead]

This does require at least something; it just does not define what that something is, and there is NO WAY IN HELL we will get any consensus from implementors on defining a minimum for each format attribute within the next couple of weeks.

What this does is ensure that the keyword produces a validation result, and it gives both the implementation and the schema author some level of control. Previously, the schema author had no control over how things were processed.

This does not solve everything, but it's a big improvement, and I'm not willing to let the perfect be the enemy of the good here. For me personally, format is unsalvageable as any sort of interoperable feature and should be discouraged (if not formally deprecated or removed, though it's far too heavily used for that right now) in favor of new vocabularies.

@Relequestual OK, now that I'm more awake and have had a chance to think about it more (I really shouldn't (SHOULD NOT?) respond to comments right when I first wake up), I think that having the vocabulary true/false option means that we can strengthen these requirements without having a huge fight over the minimal requirements, which can be pretty minimal, with a warning that they may be strengthened in the future.

We can do this because the default usage of the vocabulary is still false, which is essentially the same as what it has always been. As is noted in the text, even when the vocabulary is false, an implementation MAY try to validate it anyway.

So let's do this:

If the vocabulary is true, then all of the format attributes MUST be supported to some minimum level along the lines of syntactic fallback validation for "format" #54. Otherwise the implementation must raise an unsupported vocabulary error.

If the vocabulary is false, then it's all MAY. Which includes the SHOULD level that is currently specified (and, as far as I can tell, very frequently ignored to some degree or another).

Again, since the default is false, this is an entirely backwards-compatible change.

I will revise the PR. I think I may move a bit of this text, and quite a bit more commentary, into an appendix of guidance on implementing and (if necessary, e.g. because OpenAPI already has a huge format registry) extending format.

I think this is a totally reasonable and proportionate way forward.
I'm happy with a super bare minimum level of required validation in the case where the format vocabulary is true: for example, email matches ^.+@.+\..+$. Something that catches the equivalent of a glancing "huh... well, that's clearly wrong" type of issue.
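The bare-minimum email check suggested above, sketched with Python's re module. The pattern is the one quoted in the comment; using re.fullmatch (rather than re.match) is a choice made here so that a trailing newline cannot slip past the `$` anchor. The function name is illustrative.

```python
import re

# Deliberately loose: something@something.something
MINIMAL_EMAIL = re.compile(r"^.+@.+\..+$")


def minimal_email_check(instance) -> bool:
    """Catch only obviously wrong values; this is not full RFC 5321 validation."""
    if not isinstance(instance, str):
        return True  # format attributes only constrain their target types
    return MINIMAL_EMAIL.fullmatch(instance) is not None


print(minimal_email_check("user@example.com"))  # True
print(minimal_email_check("not-an-email"))      # False
```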

Ultimately, JSON Schema is (normally) the first pass of validation, apart from the cases where it's baked into a database or the like.

Relequestual · 2019-07-19T11:49:50Z

jsonschema-validation.xml

+                        why the requirement is a SHOULD.  Implementations MAY ignore "format" entirely
+                        as is allowed by false vocabulary declarations.  However, due to the long history
+                        of this keyword, treating it as something of a special case seems reasonable.
+                        This may be revised in future drafts based on feedback.


There's been some discussion around "should all unknown keywords be collected as annotations for application use?". Doing so would prevent the need to define this explicit behaviour, which as you note, is different to when other vocabs are given a false value.

Yeah, I decided to separate that conversation from this one. This will work fine now, and it does not prevent us from making annotation collection the default behavior later, either before or after publishing this draft. But we really do need to wrap this up soon, and changing the overall default behavior may be pretty contentious.

Clarifying format is fine to do because it is such an enormous source of complaints, and none of this is incompatible with existing behavior.

Overall I think this is fine.
However, it does lead me to consider that this places an extra implementation SHOULD on a vocab being false. Do you think it's worth saying something like "it is generally not recommended that behaviour be derived from a vocabulary having a false value"? Or do you think that's overkill?

This is covered more clearly in the new text I had written up on Sunday. I'm just finishing up the minimum validation requirements now and will post an update by tonight.

gregsdennis · 2019-07-22T22:57:40Z

This looks good to me

handrews · 2019-07-24T00:27:23Z

@Relequestual @gregsdennis @Julian @johandorland OK I have reworked this a bit more to add clarity (including clarity on where schema authors cannot expect consistent or thorough validation) and ensured that there are consistent and reasonably predictable default behaviors.

There are two ways to control whether and to what degree format is validated:

- The schema author can declare the format vocabulary with either false (the default) or true.
- The code that runs the validation can be configured to never perform format validation, always perform it, or (by default) behave based on the vocabulary value.

- false + default configuration: format MUST NOT be validated (this is the overall default)
- false + fmt validation configured off: format MUST NOT be validated
- false + fmt validation configured on: format MUST be validated to whatever extent is available; however, an implementation need not provide any useful validation, so it may not be possible to fail validation (a.k.a. what we have now)
- true + default configuration: format MUST be validated, and it MUST be possible to fail validation
- true + fmt validation configured off: format MUST NOT be validated
- true + fmt validation configured on: format MUST be validated, and it MUST be possible to fail validation
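The six combinations listed above can be sketched as a small lookup. The enum, function, and outcome names are hypothetical and chosen only for illustration; the outcomes themselves follow the list.

```python
from enum import Enum


class FormatConfig(Enum):
    DEFAULT = "default"  # behave based on the vocabulary value
    OFF = "off"          # never perform format validation
    ON = "on"            # always perform format validation


def format_outcome(vocab_value: bool, config: FormatConfig) -> str:
    """Map (vocabulary value, configuration) to one of three outcomes:

    "none"        -> format MUST NOT be validated
    "best-effort" -> validate to whatever extent is available; failing may be impossible
    "required"    -> validate, and it MUST be possible to fail validation
    """
    if config is FormatConfig.OFF:
        return "none"         # false + off, true + off
    if vocab_value:
        return "required"     # true + default, true + on
    if config is FormatConfig.ON:
        return "best-effort"  # false + on
    return "none"             # false + default: the overall default


for vocab in (False, True):
    for cfg in FormatConfig:
        print(vocab, cfg.value, format_outcome(vocab, cfg))
```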

This is still a complicated mess, but it's a clearer complicated mess, and I think it is important to establish that the overall default behavior is to not validate at all, because that is the only absolutely consistent option.


By default, a false vocabulary prevents "format" from being validated. By default, a true vocabulary requires "format" to be validated, although the degree of validation required remains somewhat vague, at least for this draft. In both the true and false cases, validation can be toggled on or off when passing schemas and instances to the implementation (although in the false case, there is no guarantee at all that turning on validation will produce any validation behavior; this matches the previous draft's "format" specification).

gregsdennis · 2019-07-26T08:43:13Z

true + fmt validation configured off

This is the weird one. It's basically this conversation between client and implementation:

Client: "Oh, so, I don't want you to ever validate format."
Implementation: "But... the meta-schema says I have to."
Client: "Did I stutter?"
Implementation: "..."

gregsdennis · 2019-07-26T08:51:22Z

jsonschema-validation.xml

+                        custom format attributes.  An implementation MUST NOT fail
+                        validation or cease processing due to an unknown format attribute.
+                        When treating "format" as an annotation, implementations SHOULD collect both
+                        known and unknown format attribute values.
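A rough sketch of what collecting "format" values as annotations could look like, including unknown format attributes, which are collected and never cause a failure. Simplifying assumptions: only "properties" is traversed, annotations are collected without first checking that the subschema validated (a real implementation drops annotations from failed subschemas), and the "known" flag and KNOWN_FORMATS subset are illustrative additions, not spec output fields.

```python
# Illustrative subset of known format attributes.
KNOWN_FORMATS = {"date-time", "email", "ipv4", "uri"}


def escape_pointer_token(name: str) -> str:
    # JSON Pointer escaping: "~" becomes "~0", then "/" becomes "~1".
    return name.replace("~", "~0").replace("/", "~1")


def collect_format_annotations(schema, instance, location=""):
    """Collect "format" values as annotations, whether known or unknown."""
    annotations = []
    if not isinstance(schema, dict):
        return annotations
    if "format" in schema:
        value = schema["format"]
        annotations.append({
            "instanceLocation": location,
            "keyword": "format",
            "value": value,
            "known": value in KNOWN_FORMATS,  # unknown values are kept, not rejected
        })
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                child = location + "/" + escape_pointer_token(name)
                annotations.extend(
                    collect_format_annotations(subschema, instance[name], child))
    return annotations


schema = {"properties": {"when": {"format": "date-time"},
                         "id": {"format": "x-custom-id"}}}
instance = {"when": "2019-07-26T08:51:22Z", "id": "abc"}
for a in collect_format_annotations(schema, instance):
    print(a["instanceLocation"], a["value"], a["known"])
```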


"We" should build some tests around collecting format values as annotations.

Yeah. And really around annotation collection in general.

Do we have an issue to track this?

@philsturgeon no, but feel free to file one 😄

gregsdennis · 2019-07-26T08:52:03Z

I'm happy with the current state.

handrews · 2019-07-26T16:29:54Z

@gregsdennis LOL "did I stutter?" 😆

The point of that option is that there are always people who do not want to run the validation because it is expensive. I agree that that is a weird case, and I'd be fine with saying that it's an invalid combination. Let's see what other folks think on that.

handrews · 2019-08-09T19:18:31Z

@Relequestual are you OK with this? If so I'd like to merge it. Greg OK'd it since the last update and you are the other person who made significant comments.

philsturgeon

Awesome, clear, ship it!

review was ages ago and seems like concerns were addressed

handrews requested review from Relequestual, awwright, gregsdennis and philsturgeon July 19, 2019 02:11

handrews added format Type: Enhancement labels Jul 19, 2019

handrews added this to the draft-08 milestone Jul 19, 2019

Relequestual previously requested changes Jul 19, 2019


handrews changed the base branch from handrews-base to master July 19, 2019 15:42

handrews and others added 2 commits July 23, 2019 19:19

handrews force-pushed the fmt-vocab branch from 39dd6ae to 3ea9e82 Compare July 24, 2019 02:19

gregsdennis reviewed Jul 26, 2019


gregsdennis approved these changes Jul 26, 2019


Julian approved these changes Jul 26, 2019


philsturgeon approved these changes Aug 10, 2019


philsturgeon requested a review from Relequestual August 10, 2019 06:23

handrews merged commit 39af765 into json-schema-org:master Aug 10, 2019

handrews deleted the fmt-vocab branch August 10, 2019 18:33

handrews mentioned this pull request Aug 17, 2019

Explain format (and content*) more clearly #646 (closed)

handrews mentioned this pull request Jan 15, 2020

syntactic fallback validation for "format" #54 (closed)
