Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify "format" requirements in vocabulary terms #764

Merged
merged 2 commits into from
Aug 10, 2019

Conversation

handrews
Copy link
Contributor

Fixes #732, fixes #759, addresses the format part of #646.
The content* of #646 will be done in a separate PR.

Use the true vs false values of vocabulary declaration to
indicate whether a schema author requires assertion behavior
(such as because a keyword such as "oneOf" depends on such
validation functioning correctly).

Use a false value in the standard core+validation vocabulary,
reflecting the historical lack of requirement for this
keyword to be implemented.

Define annotation behavior when the vocabulary is declared with
false, to facilitate the recommended best practice of performing
semantic validation in your application. While not the typical
false vocabulary behavior, this seems like the best way to
reframe the historically unpredictable behavior.

Copy link
Member

@Relequestual Relequestual left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a few suggestes and a few comments.

<t>
The value of this keyword is called a format attribute. It MUST be a string. A
format attribute can generally only validate a given set of instance types. If
the type of the instance to validate is not in this set, validation for this
format attribute and instance SHOULD succeed.
format attribute and instance SHOULD succeed. All format attributes defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the type of the instance to validate is not in this set, validation for this format attribute and instance SHOULD succeed.

I've never noticed this before, but it sounds not what you want...
Say you have a format "smallint". If a string is provided at the applicable instance, validation will assert TRUE if the format says it is only applicable to numeric types.

I guess this makes sense, because it's similar to how other keywords work, in that keywords are often only applicable if they target an instance of the same type.

For example, maxLength: 5 fails validation for a string of longer than 5 characters, but does not fail validation for a numeric value 123456789 because it is not the correct applicable type.

This is covered explicitly in draft-8 core (http://json-schema.org/work-in-progress/WIP-jsonschema-core.html#rfc.section.7.4.1)

7.4.1. Assertions and Instance Primitive Types

Most assertions only constrain values within a certain primitive type. When the type of the instance is not of the type targeted by the keyword, the instance is considered to conform to the assertion.

For example, the "maxLength" keyword from the companion validation vocabulary will only restrict certain strings (that are too long) from being valid. If the instance is a number, boolean, null, array, or object, then it is valid against this assertion.

But, I feel we should maybe re-reference this section here for clarity...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather just take the "SHOULD succeed" language out entirely. We used to have that phrasing everywhere and this is just left over from that. As you note, this is typical behavior, so it's better not to call it out- calling things out should be done for atypical behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.
Is there some text about "if a keyword isn't applicable to the types it may be applied to, then it is not applied and therefore is the same as a true assertion" generally?

Take minimum:

The value of "minimum" MUST be a number, representing an inclusive lower limit for a numeric instance.

If the instance is a number, then this keyword validates only if the instance is greater than or exactly equal to "minimum".

The intent is clear, but the phrasing and wording is... not very spec like.
We don't explicitly specify what happens if the instance is not a number...
The behaviour is, in essense, undefined (unless I've missed something).

If that's the case... maybe a spec cleanup is a candidate for draft-9.

Sorry, I've got way off topic here... but it's relevant.

Due to the complexity involved in fully validating some format attributes
defined in this specification, implementations MAY provide only limited
validation support for some format attributes. Implementations SHOULD
document any such intentional limitations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you on this.
One of the reasons I was going to shunt this issue to draft-9 was I wanted to provide a MINIMUM requirement for the MUST defined above in line 559.

Otherwise, the MUST on 559 holds no actual requirement, because it's then defined that meeting said MUST is actually meeting two SHOULD requirements (and we're back to... whatever the implementation wants to do).

The point of the issues around format are providing AT LEAST SOMEthing when format is used if required.

Copy link
Contributor Author

@handrews handrews Jul 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[EDIT: See the next comment down instead]

This does require at least something- it just does not define what that something is, and there is NO WAY IN HELL we will get any consensus from implementors on defining a minimum for each format attribute within the next couple weeks.

What this does is ensures that the keyword produces a validation result, and gives both the implementation and schema author some level of control. Previously, the schema author had no control over how things were processed.

This does not solve everything but it's a big improvement and I'm not willing to let the perfect be the enemy of the good here. For me personally, format is unsalvageable as any sort of interoperable feature and should be discouraged (if not formally deprecated or removed- it's far too heavily used for that right now) in favor of new vocabularies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Relequestual OK now that I'm more awake and have had a chance to think about it more (I really shouldn't- SHOULD NOT? - respond to comments right when I first wake up) I think that having the vocabulary true/false option means that we can strengthen these requirements without having a huge fight over the minimal requirements. Which can be pretty minimal with a warning that they may be strengthened in the future.

We can do this because the default usage of the vocabulary is still false, which is essentially the same as what it has always been. As is noted in the text, even when the vocabulary is false, an implementation MAY try to validate it anyway.

So let's do this:

  • If the vocabulary is true, then all of the format attributes MUST be supported to some minimum level along the lines of syntactic fallback validation for "format" #54. Otherwise the implementation must raise an unsupported vocabulary error.
  • If the vocabulary is false, then it's all MAY. Which includes the SHOULD level that is currently specified (and, as far as I can tell, very frequently ignored to some degree or another).

Again, since the default is false, this is an entirely backwards-compatible change.

I will revise the PR. I think I may move a bit of this text and quite a bit more commentary into an appendix of guidance on implementing and (if necessary e.g. because OpenAPI has a huge format registry already) extendingformat.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a totally reasonable and proportionate way forward.
I'm happy with a super bare minimum level of required validation in the instance where the format vocabulary is true: for example, email matches ^.+@.+\..+$. Something that catches the equivilence of a glancing "huh... well that's clearly wrong" type issue.

Ultimatly, JSON Schema is (normally) first pass on validation, apart from the instances where it's baked into a database or the like.

why the requirement is a SHOULD. Implementations MAY ignore "format" entirely
as is allowed by false vocabulary declarations. However, due to the long history
of this keyword, treating it as something of a special case seems reasonable.
This may be revised in future drafts based on feedback.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's been some discussion around "should all unknown keywords be collected as annotations for application use?". Doing so would prevent the need to define this explicit behaviour, which as you note, is different to when other vocabs are given a false value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I decided to separate that conversation from this one. This will work fine now, and does not prevent us from making annotation collection the default behavior later. Either before or after publishing this draft, but we really do need to wrap this up soon, and changing the overall default behavior may be pretty contentious.

Clarifying format is fine to do because it is such an enormous source of complaints, and none of this is incompatible with existing behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is fine.
However, it does lead me to consider that this is placing an extra implementation SHOULD based on a vocab being false. Do you think it's worth saying something like " generally it is not recomend that behaviour is derived from a vocabulary having a false value"? or do you think that's overkill?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered more clearly in the new text I had written up on Sunday. I'm just finishing up the minimum validation requirements now and will post an update by tonight.

@handrews handrews changed the base branch from handrews-base to master July 19, 2019 15:42
@gregsdennis
Copy link
Member

This looks good to me

@handrews
Copy link
Contributor Author

handrews commented Jul 24, 2019

@Relequestual @gregsdennis @Julian @johandorland OK I have reworked this a bit more to add clarity (including clarity on where schema authors cannot expect consistent or thorough validation) and ensured that there are consistent and reasonably predictable default behaviors.

There are two ways to control whether and to what degree format is validated:

  1. The schema author can declare the format vocabulary with either false (the default) or true
  2. The code that runs the validation can be configured to never perform format validation, always perform it, or (by default) behave based on the vocabulary value.
  • false + default configuration: format MUST NOT be validated (this is the overall default)
  • false + fmt validation configured off: format MUST NOT be validated
  • false + fmt validation configured on: format MUST be validated to whatever extent is available, however an implementation need not provide any useful validation, so it may not be possible to fail validation (a.k.a. what we have now)
  • true + default configuration: format MUST be validated, and it MUST be possible to fail validation
  • true + fmt validation configured off: format MUST NOT be validated
  • true + fmt validation configured on: format MUST be validated, and it MSUT be possible to fail validation

This is still a complicated mess but it's a more clear complicated mess and I think it is important to establish that the overall default behavior is to not validated at all, because that is the only absolutely consistent option.

handrews and others added 2 commits July 23, 2019 19:19
Use the true vs false values of vocabulary declaration to
indicate whether a schema author requires assertion behavior
(such as because a keyword such as "oneOf" depends on such
validation functioning correctly).

Use a false value in the standard core+validation vocabulary,
reflecting the historical lack of requirement for this
keyword to be implemented.

Define annotation behavior when the vocabulary is declared with
false, to facilitate the recommended best practice of performing
semantic validation in your application.  While not the typical
false vocabulary behavior, this seems like the best way to
reframe the historically unpredictable behavior.
By default, a false vocabulary prevents "format" from being validated.
By default, a true vocabulary requires "format" to be validated,
although the degree of validation required remains somewhat vague
at least for this draft.

In both the true and false cases, validation can be toggled on
or off when passing schemas and instances to the implementation
(although in the false case, there is no guarantee at
all that turning on valdiation will produce any validation behavior;
this matches the previous draft's "format" specification).
@gregsdennis
Copy link
Member

true + fmt validation configured off

This is the weird one. It's basically this conversation between client and implementation:

Client: "Oh, so, I don't want you to ever validate format."
Implementation: "But... the meta-schema says I have to."
Client: "Did I stutter?"
Implementation: "..."

custom format attributes. An implementation MUST NOT fail
validation or cease processing due to an unknown format attribute.
When treating "format" as an annotation, implementations SHOULD collect both
known and unknown format attribute values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"We" should build some tests around collecting format values as annotations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. And really around annotation collection in general.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue to track this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philsturgeon no, but feel free to file one 😄

@gregsdennis
Copy link
Member

I'm happy with the current state.

@handrews
Copy link
Contributor Author

@gregsdennis LOL "did I stutter?" 😆

The point of that option is that there are always people who do not want to run the validation because it is expensive. I agree that that is a weird case, and I'd be fine with saying that it's an invalid combination. Let's see what other folks think on that.

@handrews
Copy link
Contributor Author

handrews commented Aug 9, 2019

@Relequestual are you OK with this? If so I'd like to merge it. Greg OK'd it since the last update and you are the other person who made significant comments.

Copy link
Collaborator

@philsturgeon philsturgeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, clear, ship it!

@philsturgeon philsturgeon dismissed Relequestual’s stale review August 10, 2019 06:24

review was ages ago and seems like concerns were addressed

@handrews handrews merged commit 39af765 into json-schema-org:master Aug 10, 2019
@handrews handrews deleted the fmt-vocab branch August 10, 2019 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

format for non-string data types (complex validation rules) Format attributes
5 participants