Specify "format" requirements in vocabulary terms #764

Merged
merged 2 commits into from
Aug 10, 2019
2 changes: 1 addition & 1 deletion hyper-schema.json
@@ -6,7 +6,7 @@
"https://json-schema.org/draft/2019-08/vocab/applicator": true,
"https://json-schema.org/draft/2019-08/vocab/validation": true,
"https://json-schema.org/draft/2019-08/vocab/meta-data": true,
"https://json-schema.org/draft/2019-08/vocab/format": true,
"https://json-schema.org/draft/2019-08/vocab/format": false,
"https://json-schema.org/draft/2019-08/vocab/content": true,
"https://json-schema.org/draft/2019-08/vocab/hyper-schema": true
},
188 changes: 162 additions & 26 deletions jsonschema-validation.xml
@@ -174,14 +174,15 @@

</section>

<section title="Meta-Schema">
<section title="Meta-Schema" anchor="meta-schema">
<t>
The current URI for the JSON Schema Validation meta-schema is
<eref target="http://json-schema.org/draft/2019-08/schema#"/>.
For schema author convenience, this meta-schema describes all vocabularies
defined in this specification and the JSON Schema Core specification.
Individual vocabulary and vocabulary meta-schema URIs are given for
each section below.
each section below. Certain vocabularies are optional to support, which
is explained in detail in the relevant sections.
</t>
<t>
Updated vocabulary and meta-schema URIs MAY be published between
@@ -503,28 +504,45 @@
</section>
</section>

<section title='A Vocabulary for Semantic Validation With "format"' anchor="format">
<section title='A Vocabulary for Semantic Content With "format"' anchor="format">

<section title="Foreword">
<t>
Structural validation alone may be insufficient to validate that an instance
meets all the requirements of an application. The "format" keyword is defined to
allow interoperable semantic validation for a fixed subset of values which are
Structural validation alone may be insufficient to allow an application to correctly
utilize certain values. The "format" annotation keyword is defined to allow schema
authors to convey semantic information for a fixed subset of values which are
accurately described by authoritative resources, be they RFCs or other external
specifications.
</t>

<t>
Implementations MAY treat "format" as an assertion in addition to an annotation,
and attempt to validate the value's conformance to the specified semantics.
See the Implementation Requirements below for details.
</t>

<t>
The value of this keyword is called a format attribute. It MUST be a string. A
format attribute can generally only validate a given set of instance types. If
the type of the instance to validate is not in this set, validation for this
format attribute and instance SHOULD succeed.
format attribute and instance SHOULD succeed. All format attributes defined
Member

If the type of the instance to validate is not in this set, validation for this format attribute and instance SHOULD succeed.

I've never noticed this before, but it doesn't sound like what you want...
Say you have a format "smallint". If a string is provided at the applicable instance, validation will assert TRUE if the format says it is only applicable to numeric types.

I guess this makes sense, because it's similar to how other keywords work, in that keywords are often only applicable if they target an instance of the same type.

For example, maxLength: 5 fails validation for a string of longer than 5 characters, but does not fail validation for a numeric value 123456789 because it is not the correct applicable type.

This is covered explicitly in draft-8 core (http://json-schema.org/work-in-progress/WIP-jsonschema-core.html#rfc.section.7.4.1)

7.4.1. Assertions and Instance Primitive Types

Most assertions only constrain values within a certain primitive type. When the type of the instance is not of the type targeted by the keyword, the instance is considered to conform to the assertion.

For example, the "maxLength" keyword from the companion validation vocabulary will only restrict certain strings (that are too long) from being valid. If the instance is a number, boolean, null, array, or object, then it is valid against this assertion.

But, I feel we should maybe re-reference this section here for clarity...

Contributor Author

I'd rather just take the "SHOULD succeed" language out entirely. We used to have that phrasing everywhere and this is just left over from that. As you note, this is typical behavior, so it's better not to call it out; calling things out should be reserved for atypical behavior.

Member

Agreed.
Is there some text about "if a keyword isn't applicable to the types it may be applied to, then it is not applied and therefore is the same as a true assertion" generally?

Take minimum:

The value of "minimum" MUST be a number, representing an inclusive lower limit for a numeric instance.

If the instance is a number, then this keyword validates only if the instance is greater than or exactly equal to "minimum".

The intent is clear, but the phrasing and wording are... not very spec-like.
We don't explicitly specify what happens if the instance is not a number...
The behaviour is, in essence, undefined (unless I've missed something).

If that's the case... maybe a spec cleanup is a candidate for draft-9.

Sorry, I've got way off topic here... but it's relevant.
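
To make the type-gating behavior discussed in this thread concrete, here is a minimal Python sketch. It is not taken from any implementation; the helper names `max_length` and `minimum` are hypothetical, and the only behavior it assumes is the one quoted above from core section 7.4.1: an assertion that targets one primitive type treats instances of any other type as conforming.

```python
from typing import Any, Callable


def max_length(limit: int) -> Callable[[Any], bool]:
    """Hypothetical assertion for "maxLength": only strings can fail it."""
    def check(instance: Any) -> bool:
        if not isinstance(instance, str):
            return True  # not the targeted type, so the instance conforms
        return len(instance) <= limit
    return check


def minimum(limit: float) -> Callable[[Any], bool]:
    """Hypothetical assertion for "minimum": only numbers can fail it."""
    def check(instance: Any) -> bool:
        # bool is a subclass of int in Python, but a JSON boolean is not a number
        if isinstance(instance, bool) or not isinstance(instance, (int, float)):
            return True
        return instance >= limit
    return check


print(max_length(5)("abcdef"))   # False: a string longer than 5 characters
print(max_length(5)(123456789))  # True: a number is outside the keyword's scope
print(minimum(10)("3"))          # True: a string is outside the keyword's scope
```

A format attribute that only applies to strings would follow the same pattern, which is why the "SHOULD succeed" sentence describes typical rather than special behavior.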

in this section apply to strings, but a format attribute can be specified
to apply to any instance types defined in the data model defined in the
<xref target="json-schema">core JSON Schema.</xref>
<cref>
Note that the "type" keyword in this specification defines an "integer" type
which is not part of the data model. Therefore a format attribute can be
limited to numbers, but not specifically to integers. However, a numeric
format can be used alongside the "type" keyword with a value of "integer",
or could be explicitly defined to always pass if the number is not an integer,
which produces essentially the same behavior as only applying to integers.
</cref>
</t>

<t>
Meta-schemas that do not use "$vocabulary" SHOULD be considered to
require this vocabulary as if its URI were present with a value of true,
although see the Implementation Requirements below for details.
utilize this vocabulary as if its URI were present with a value of false.
See the Implementation Requirements below for details.
</t>
<t>
The current URI for this vocabulary, known as the Format vocabulary, is:
@@ -539,27 +557,141 @@

<section title="Implementation Requirements">
<t>
The "format" keyword functions as both an annotation
and as an assertion. While no special effort is required to
implement it as an annotation conveying semantic meaning, implementing
validation is non-trivial.
The "format" keyword functions as an annotation, and optionally as an assertion.
<cref>This is due to the keyword's history, and is not in line with current
keyword design principles.</cref> In order to manage this ambiguity, the
"format" keyword is defined in its own separate vocabulary, as noted above.
The true or false value of the vocabulary declaration governs the implementation
requirements necessary to process a schema that uses "format", and the
behaviors on which schema authors can rely.
</t>
<t>
Implementations MAY support the "format" keyword as a validation assertion.
Should they choose to do so:

<list>
<t>they SHOULD implement validation for attributes defined below;</t>
<t>they SHOULD offer an option to disable validation for this keyword.</t>
</list>
<section title="As an annotation">
<t>
The value of format MUST be collected as an annotation, if the implementation
supports annotation collection. This enables application-level validation when
schema validation is unavailable or inadequate.
</t>
<t>
This requirement is not affected by the boolean value of the vocabulary
declaration, nor by the configuration of "format"'s assertion
behavior described in the next section.
<cref>
Requiring annotation collection even when the vocabulary is declared with
a value of false is atypical, but necessary to ensure that the best
practice of performing application-level validation is possible even when
assertion evaluation is not implemented. Since "format" has always been
a part of this specification, requiring implementations to be aware of it
even with a false vocabulary declaration is deemed to not be a burden.
</cref>
</t>
</section>

<section title="As an assertion">
<t>
Regardless of the boolean value of the vocabulary declaration,
an implementation that can evaluate "format" as an assertion MUST provide
options to enable and disable such evaluation. The assertion evaluation
behavior when the option is not explicitly specified depends on
the vocabulary declaration's boolean value.
</t>

</t>
<t>
When implementing this entire specification, this vocabulary MUST
be supported with a value of false (but see details below),
and MAY be supported with a value of true.
</t>

<t>
Implementations MAY add custom format attributes. Save for agreement between
parties, schema authors SHALL NOT expect a peer implementation to support this
keyword and/or custom format attributes.
</t>
<t>
When the vocabulary is declared with a value of false, an implementation:
<list>
<t>
MUST NOT evaluate "format" as an assertion unless it is explicitly
configured to do so;
</t>
<t>
SHOULD provide an implementation-specific best effort validation
for each format attribute defined below;
</t>
<t>
MAY choose to implement validation of any or all format attributes
as a no-op by always producing a validation result of true;
</t>
<t>
SHOULD document its level of support for validation.
</t>
</list>
<cref>
This matches the current reality of implementations, which provide
widely varying levels of validation, including no validation at all,
for some or all format attributes. It is also designed to encourage
relying only on the annotation behavior and performing semantic
validation in the application, which is the recommended best practice.
</cref>
</t>

<t>
When the vocabulary is declared with a value of true, an implementation
that supports this form of the vocabulary:
<list>
<t>
MUST evaluate "format" as an assertion unless it is explicitly
configured not to do so;
</t>
<t>
MUST implement syntactic validation for all format attributes defined
in this specification, and for any additional format attributes that
it recognizes, such that there exist possible instance values
of the correct type that will fail validation.
</t>
</list>
The requirement for minimal validation of format attributes is intentionally
vague and permissive, due to the complexity involved in many of the attributes.
Note in particular that the requirement is limited to syntactic checking; it is
not to be expected that an implementation would send an email, attempt to connect
to a URL, or otherwise check the existence of an entity identified by a format
instance.
<cref>
The expectation is that for simple formats such as date-time, syntactic
validation will be thorough. For a complex format such as email addresses,
which are the amalgamation of various standards and numerous adjustments
over time, with obscure and/or obsolete rules that may or may not be
restricted by other applications making use of the value, a minimal validation
is sufficient. For example, an instance string that does not contain
an "@" is clearly not a valid email address, and an "email" or "hostname"
containing characters outside of 7-bit ASCII is likewise clearly invalid.
</cref>
</t>
<t>
It is RECOMMENDED that implementations use a common parsing library for each format,
or a well-known regular expression. Implementations SHOULD clearly document
how and to what degree each format attribute is validated.
</t>
<t>
The <xref target="meta-schema">standard core and validation meta-schema</xref>
includes this vocabulary in its "$vocabulary" keyword with a value of false,
since by default implementations are not required to support this keyword
as an assertion. Supporting the format vocabulary with a value of true is
understood to greatly increase code size and in some cases execution time,
and will not be appropriate for all implementations.
</t>
</section>
<section title="Custom format attributes">
<t>
Implementations MAY support custom format attributes. Save for agreement between
parties, schema authors SHALL NOT expect a peer implementation to support such
custom format attributes. An implementation MUST NOT fail
validation or cease processing due to an unknown format attribute.
When treating "format" as an annotation, implementations SHOULD collect both
known and unknown format attribute values.
Member

"We" should build some tests around collecting format values as annotations.

Contributor Author

Yeah. And really around annotation collection in general.

Collaborator

Do we have an issue to track this?

Contributor Author

@philsturgeon no, but feel free to file one 😄

</t>
<t>
Vocabularies do not support specifically declaring different value sets for keywords.
Due to this limitation, and the historically uneven implementation of this keyword,
it is RECOMMENDED to define additional keywords in a custom vocabulary rather than
additional format attributes if interoperability is desired.
</t>
</section>
</section>
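
The implementation requirements added in this hunk can be read as a small decision procedure. The following Python sketch is one possible interpretation, not code from any validator: the names `CHECKERS`, `should_assert_format`, and `evaluate_format` are hypothetical, and the "email" check is deliberately the minimal one the text uses as an example (an "@" must be present and all characters must be 7-bit ASCII).

```python
from typing import Any, Callable, Dict, Optional, Tuple


def _check_email(value: str) -> bool:
    # Minimal syntactic check only; no attempt at full address validation.
    return "@" in value and value.isascii()


# Hypothetical registry of checkers, keyed by format attribute.
CHECKERS: Dict[str, Callable[[str], bool]] = {"email": _check_email}


def should_assert_format(vocab_value: bool, configured: Optional[bool] = None) -> bool:
    """Explicit configuration wins; otherwise the vocabulary declaration decides."""
    return vocab_value if configured is None else configured


def evaluate_format(attribute: str, instance: Any, vocab_value: bool,
                    configured: Optional[bool] = None) -> Tuple[bool, str]:
    """Return (valid, annotation); the annotation is collected in every case."""
    annotation = attribute
    if not should_assert_format(vocab_value, configured):
        return True, annotation
    checker = CHECKERS.get(attribute)
    if checker is None or not isinstance(instance, str):
        # An unknown attribute or a non-applicable instance type never fails.
        return True, annotation
    return checker(instance), annotation


print(evaluate_format("email", "no-at-sign", vocab_value=False))  # (True, 'email')
print(evaluate_format("email", "no-at-sign", vocab_value=True))   # (False, 'email')
print(evaluate_format("x-custom", "anything", vocab_value=True))  # (True, 'x-custom')
```

Passing `configured=True` with a vocabulary value of false models an implementation that was explicitly asked to validate, and `configured=False` with a value of true models one that was explicitly asked not to.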

<section title="Defined Formats">
@@ -1273,6 +1405,10 @@
<list style="hanging">
<t hangText="draft-handrews-json-schema-validation-02">
<list style="symbols">
<t>Grouped keywords into formal vocabularies</t>
<t>Update "format" implementation requirements in terms of vocabularies</t>
<t>By default, "format" MUST NOT be validated, although validation can be enabled</t>
<t>A vocabulary declaration can be used to require "format" validation</t>
<t>Moved "definitions" to the core spec as "$defs"</t>
<t>Moved applicator keywords to the core spec</t>
<t>Renamed the array form of "dependencies" to "dependentRequired", moved the schema form to the core spec</t>
2 changes: 1 addition & 1 deletion schema.json
@@ -6,7 +6,7 @@
"https://json-schema.org/draft/2019-08/vocab/applicator": true,
"https://json-schema.org/draft/2019-08/vocab/validation": true,
"https://json-schema.org/draft/2019-08/vocab/meta-data": true,
"https://json-schema.org/draft/2019-08/vocab/format": true,
"https://json-schema.org/draft/2019-08/vocab/format": false,
"https://json-schema.org/draft/2019-08/vocab/content": true
},
"$recursiveAnchor": true,
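
As a rough illustration of how an implementation might read the flag flipped in these two meta-schemas, the Python sketch below looks up the Format vocabulary in a meta-schema's "$vocabulary" object. The function name `format_vocab_value` is hypothetical. The fallback to false for meta-schemas without "$vocabulary" follows the revised wording in this pull request; returning `None` when the URI is simply absent is an assumption of this sketch.

```python
from typing import Any, Dict, Optional

FORMAT_VOCAB = "https://json-schema.org/draft/2019-08/vocab/format"


def format_vocab_value(meta_schema: Dict[str, Any]) -> Optional[bool]:
    """Return the declared value of the Format vocabulary for a meta-schema."""
    vocab = meta_schema.get("$vocabulary")
    if vocab is None:
        # No "$vocabulary" at all: treat as if declared with a value of false.
        return False
    # None means the vocabulary is not declared (an assumption of this sketch).
    return vocab.get(FORMAT_VOCAB)


standard = {"$vocabulary": {FORMAT_VOCAB: False}}
strict = {"$vocabulary": {FORMAT_VOCAB: True}}
print(format_vocab_value(standard))  # False
print(format_vocab_value(strict))    # True
print(format_vocab_value({}))        # False
```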