From d3012200355d126a33edb56adcbdccedc25cc0fe Mon Sep 17 00:00:00 2001
From: Henry Andrews
Date: Thu, 13 May 2021 17:18:58 -0700
Subject: [PATCH 1/2] Clarify various things about canonical URIs

Fixes issue #937, clarifying a number of other things along the way. While it touches a fair number of lines, I'm fairly sure that it doesn't change anything about conformance.

After spending more time reading various writings on the concept of the "canonical" URI for a resource, and reviewing our language, I came to the following conclusions:

* Canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that identifier keywords set canonical URIs. Since there is only one canonical URI and a single schema object could contain three ($id, $anchor, $dynamicAnchor) or more identifier keywords, this statement is clearly a bug. These keywords assign URIs, but only $id assigns a canonical one.

I revamped a lot of wording in descriptions and examples in the hope of making it more precise.

I separated the discussion of the empty fragment in $id from the main paragraph describing its functionality, and clarified that this is talking about a media-type-specific semantic equivalence, and is not asserting that RFC 3986 normalization applies to fragments (this has been a point of confusion).
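To make the empty-fragment point concrete, here is a minimal Python sketch of deriving the canonical URI from a "$id" value, assuming the value is resolved against an already-known absolute base URI. `canonical_uri_from_id` is a hypothetical helper, not taken from any implementation. It splits on "#" itself instead of trusting a generic URI library, because RFC 3986 normalization does not equate "https://example.com/root.json#" with "https://example.com/root.json"; that equivalence comes only from the application/schema+json media type.

```python
from urllib.parse import urljoin, urlsplit


def canonical_uri_from_id(id_value: str, base_uri: str) -> str:
    """Hypothetical helper: resolve a "$id" value against the current base URI
    and return the canonical (fragment-free) absolute URI it establishes."""
    resolved = urljoin(base_uri, id_value)
    # Split on "#" directly; the first "#" always starts the fragment.
    absolute, _, fragment = resolved.partition("#")
    if fragment:
        # A non-empty fragment is never allowed in "$id".
        raise ValueError(f'"$id" must not contain a non-empty fragment: {id_value!r}')
    if not urlsplit(absolute).scheme:
        raise ValueError(f'"$id" must resolve to an absolute-URI: {id_value!r}')
    # An empty fragment is tolerated for backwards compatibility, but the
    # canonical URI is always the form without it.
    return absolute
```

For example, "https://example.com/root.json#" yields "https://example.com/root.json", "t/inner.json" resolved against "https://example.com/root.json" yields "https://example.com/t/inner.json", and "other.json#bar" raises ValueError.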
--- jsonschema-core.xml | 144 ++++++++++++++++++++++++-------------------- 1 file changed, 79 insertions(+), 65 deletions(-) diff --git a/jsonschema-core.xml b/jsonschema-core.xml index 07b9436e..058219aa 100644 --- a/jsonschema-core.xml +++ b/jsonschema-core.xml @@ -315,8 +315,8 @@ of five categories: - control schema identification through setting the schema's - canonical URI and/or changing how the base URI is determined + control schema identification through setting a URI + for the schema and/or changing how the base URI is determined produce a boolean result when applied to an instance @@ -426,7 +426,9 @@ A JSON Schema resource is a schema which is canonically identified by an - absolute URI. + absolute URI. Schema resources MAY + also be identified by URIs including fragments. Any such URIs + are considered to be non-canonical. The root schema is the schema that comprises the entire JSON document @@ -730,9 +732,9 @@ be able to support those keywords or vocabularies that contain them. -
+
- Identifiers set the canonical URI of a schema, or affect how such URIs are + Identifiers define URIs for a schema, or affect how such URIs are resolved in references, or both. The Core vocabulary defined in this document defines several identifying keywords, most notably "$id". @@ -1340,26 +1342,31 @@ If present, the value for this keyword MUST be a string, and MUST represent a valid URI-reference. This URI-reference - SHOULD be normalized, and MUST resolve to an - absolute-URI (without a fragment). Therefore, - "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT contain an - empty fragment. + SHOULD be normalized, and MUST be semantically equivalent to an + absolute-URI (without a fragment). - Since an empty fragment in the context of the application/schema+json media - type refers to the same resource as the base URI without a fragment, - an implementation MAY normalize a URI ending with an empty fragment by removing - the fragment. However, schema authors SHOULD NOT rely on this behavior - across implementations. + The application/schema+json media type defines that an absolute-URI + identifying a resource and the same URI with an empty fragment + appended (which identifies the resource's root schema object) are + semantically equivalent. Since this semantic equivalence is not part + of the RFC 3986 normalization process, + implementors and schema authors cannot rely on generic URI libraries + understanding the equivalence. + + + Therefore, "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT + contain an empty fragment. The absolute-URI form MUST be considered + the canonical URI, regardless of the presence or absence of an empty fragment. - This is primarily allowed because older meta-schemas have an empty - fragment in their $id (or previously, id). A future draft may outright - forbid even empty fragments in "$id". + An empty fragment is currently allowed because older meta-schemas have + an empty fragment in their $id (or previously, id). 
+ A future draft may outright forbid even empty fragments in "$id". - This URI also serves as the base URI for relative URI-references in keywords - within the schema resource, in accordance with + The absolute-URI also serves as the base URI for relative URI-references + in keywords within the schema resource, in accordance with RFC 3986 section 5.1.1 regarding base URIs embedded in content. @@ -1623,7 +1630,7 @@ media type. - Unless the "$id" keyword described in the next section is present in the + Unless the "$id" keyword described in an earlier section is present in the root schema, this base URI SHOULD be considered the canonical URI of the schema document's root schema resource. @@ -1750,7 +1757,7 @@ Since JSON Pointer URI fragments are constructed based on the structure of the schema document, an embedded schema resource and its subschemas can be identified by JSON Pointer fragments relative to either its own - canonical URI, or relative to the containing resource's URI. + canonical URI, or relative to a containing resource's URI. Conceptually, a set of linked schema resources should behave @@ -1782,13 +1789,18 @@ } ]]> - - The URI "https://example.com/foo#/items/additionalProperties" - points to the schema of the "additionalProperties" keyword in - the embedded resource. The canonical URI of that schema, however, - is "https://example.com/bar#/additionalProperties". - + + The URI "https://example.com/foo#/items" points to the "items" schema, + which is an embedded resource. The canonical URI of that schema + resource, however, is "https://example.com/bar". + + + For the "additionalProperties" schema within that embedded resource, + the URI "https://example.com/foo#/items/additionalProperties" points + to the correct object, but that object's URI relative to its resource's + canonical URI is "https://example.com/bar#/additionalProperties". +
Now consider the following two schema resources linked by reference @@ -1810,29 +1822,31 @@ ]]> - Here we see that the canonical URI for that "additionalProperties" - subschema is still valid, while the non-canonical URI with the fragment - beginning with "#/items/$ref" now resolves to nothing. + Here we see that the URI for the "additionalProperties" schema object + that is relative to its resource's canonical URI is still valid, + while the URI relative to the "items" schema object's URI no longer + resolves to anything.
Note also that "https://example.com/foo#/items" is valid in both arrangements, but resolves to a different value. This URI ends up - functioning similarly to a retrieval URI for a resource. While valid, - examining the resolved value and either using the "$id" (if the value - is a subschema), or resolving the reference and using the "$id" of the - reference target, is preferable. + functioning similarly to a retrieval URI for a resource. While this URI + is valid, it is more robust to use the "$id" of the embedded or referenced + resource unless it is specifically desired to identify the object containing + the "$ref" in the second (non-embedded) arrangement. - An implementation MAY choose not to support addressing schema resources - (and their subschemas) by non-canonical URIs. - As such, it is RECOMMENDED that schema authors only use canonical URIs, - as using non-canonical URIs may reduce schema interoperability. + An implementation MAY choose not to support addressing schema resource + contents by URIs using a base other than the resource's canonical URI, + plus a JSON Pointer fragment relative to that base. Therefore, schema + authors SHOULD NOT rely on such URIs, as using them may reduce interoperability. This is to avoid requiring implementations to keep track of a whole stack of possible base URIs and JSON Pointer fragments for each, given that all but one will be fragile if the schema resources - are reorganized. Some have argued that this is easy so there is + are reorganized. Some + have argued that this is easy so there is no point in forbidding it, while others have argued that it complicates schema identification and should be forbidden. Feedback on this topic is encouraged. @@ -1844,9 +1858,9 @@ - Further examples of such non-canonical URIs, as well as the appropriate - canonical URIs to use instead, are provided in appendix - . 
+ Further examples of such non-canonical URI construction, as well as + the appropriate canonical URI-based fragments to use instead, + are provided in appendix .
@@ -2709,8 +2723,8 @@
The absolute, dereferenced location of the validating keyword. The value MUST - be expressed as a full URI using the canonical URI of the relevant - schema object, and it MUST NOT include by-reference applicators + be expressed as a full URI using the canonical URI of the relevant schema resource + with a JSON Pointer fragment, and it MUST NOT include by-reference applicators such as "$ref" or "$dynamicRef" as non-terminal path components. It MAY end in such keywords if the error or annotation is for that keyword, such as an unresolvable reference. @@ -3319,10 +3333,10 @@ https://example.com/schemas/common#/$defs/count/minimum - + https://example.com/root.json - + https://example.com/root.json# @@ -3330,21 +3344,21 @@ https://example.com/schemas/common#/$defs/count/minimum https://example.com/root.json - + https://example.com/root.json#foo - + https://example.com/root.json#/$defs/A - https://example.com/other.json - + https://example.com/other.json + https://example.com/other.json# - + https://example.com/root.json#/$defs/B @@ -3352,43 +3366,43 @@ https://example.com/schemas/common#/$defs/count/minimum https://example.com/other.json - + https://example.com/other.json#bar - + https://example.com/other.json#/$defs/X - + https://example.com/root.json#/$defs/B/$defs/X - https://example.com/t/inner.json - + https://example.com/t/inner.json + https://example.com/t/inner.json#bar - + https://example.com/t/inner.json# - + https://example.com/other.json#/$defs/Y - + https://example.com/root.json#/$defs/B/$defs/Y - + urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f - + urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f# - + https://example.com/root.json#/$defs/C @@ -3432,8 +3446,8 @@ https://example.com/schemas/common#/$defs/count/minimum This transformation can be safely and reversibly done as long as all static references (e.g. "$ref") use URI-references that resolve - to canonical URIs, and all schema resources have an absolute-URI - as the "$id" in their root schema. 
+ to URIs using the canonical resource URI as the base, and all schema + resources have an absolute-URI as the "$id" in their root schema. With these conditions met, each external resource can be copied @@ -3441,7 +3455,7 @@ https://example.com/schemas/common#/$defs/count/minimum schema objects, and without changing any aspect of validation or annotation results. The names of the schemas under "$defs" do not affect behavior, assuming they are each unique, as they - do not appear in canonical URIs for the embedded resources. + do not appear in the canonical URIs for the embedded resources.
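The bundling condition described in that appendix text can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `bundle` and `canonical_uris` are hypothetical helpers, every external resource is assumed to carry an absolute-URI "$id" in its root schema, and the "$defs" key names are arbitrary. It shows that canonical URIs come only from "$id", so renaming the "$defs" entries leaves them, and any "$ref" that resolves against them, unchanged.

```python
import copy
from urllib.parse import urljoin, urlsplit


def bundle(root: dict, externals: dict) -> dict:
    """Hypothetical sketch: copy each external schema resource under the
    root schema's "$defs", keyed by an arbitrary (but unique) name."""
    bundled = copy.deepcopy(root)
    defs = bundled.setdefault("$defs", {})
    for name, resource in externals.items():
        parts = urlsplit(resource.get("$id", ""))
        # Only safe if each resource root has an absolute-URI "$id":
        # a scheme, and no non-empty fragment.
        if not parts.scheme or parts.fragment:
            raise ValueError(f"resource {name!r} lacks an absolute-URI $id")
        if name in defs:
            raise ValueError(f"$defs key {name!r} is already in use")
        defs[name] = copy.deepcopy(resource)
    return bundled


def canonical_uris(schema, base_uri: str, found=None) -> set:
    """Collect the canonical URI established by every "$id" in a document.
    Simplification: walks every JSON value, so a "$id" inside "const" or
    "enum" data would be misread as an identifier."""
    if found is None:
        found = set()
    if isinstance(schema, dict):
        if "$id" in schema:
            base_uri = urljoin(base_uri, schema["$id"]).partition("#")[0]
            found.add(base_uri)
        for value in schema.values():
            canonical_uris(value, base_uri, found)
    elif isinstance(schema, list):
        for item in schema:
            canonical_uris(item, base_uri, found)
    return found


root = {"$id": "https://example.com/root.json", "$ref": "other.json"}
other = {"$id": "https://example.com/other.json", "type": "integer"}

# Two bundles that differ only in the name used under "$defs".
first = bundle(root, {"other": other})
second = bundle(root, {"renamed": other})
```

Both bundles contain the same two canonical URIs, and the "$ref" value "other.json", resolved against "https://example.com/root.json", still lands on one of them.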
From 1e4dbab3625c84d0bb9b6704d521d2d87eff0d79 Mon Sep 17 00:00:00 2001 From: Henry Andrews Date: Tue, 13 Jul 2021 16:47:56 -0700 Subject: [PATCH 2/2] Update based on review feedback. --- jsonschema-core.xml | 46 +++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/jsonschema-core.xml b/jsonschema-core.xml index 058219aa..00738358 100644 --- a/jsonschema-core.xml +++ b/jsonschema-core.xml @@ -427,14 +427,24 @@ A JSON Schema resource is a schema which is canonically identified by an absolute URI. Schema resources MAY - also be identified by URIs including fragments. Any such URIs - are considered to be non-canonical. + also be identified by URIs, including URIs with fragments, + if the resulting secondary resource (as defined by + section 3.5 of RFC 3986) is identical + to the primary resource. This can occur with the empty fragment, + or when one schema resource is embedded in another. Any such URIs + with fragments are considered to be non-canonical. The root schema is the schema that comprises the entire JSON document in question. The root schema is always a schema resource, where the URI is determined as described in section . + + Note that documents that embed schemas in another format will not + have a root schema resource in this sense. Exactly how such usages + fit with the JSON Schema document and resource concepts will be + clarified in a future draft. + Some keywords take schemas themselves, allowing JSON Schemas to be nested: @@ -1342,17 +1352,19 @@ If present, the value for this keyword MUST be a string, and MUST represent a valid URI-reference. This URI-reference - SHOULD be normalized, and MUST be semantically equivalent to an - absolute-URI (without a fragment). + SHOULD be normalized, and MUST resolve to an + absolute-URI (without a fragment), + or to a URI with an empty fragment. 
- The application/schema+json media type defines that an absolute-URI - identifying a resource and the same URI with an empty fragment - appended (which identifies the resource's root schema object) are - semantically equivalent. Since this semantic equivalence is not part - of the RFC 3986 normalization process, - implementors and schema authors cannot rely on generic URI libraries - understanding the equivalence. + The empty fragment form is NOT RECOMMENDED and is retained only + for backwards compatibility, and because the + application/schema+json media type defines that a URI with an + empty fragment identifies the same resource as the same URI + with the fragment removed. However, since this equivalence is not + part of the RFC 3986 normalization process, + implementers and schema authors cannot rely on generic URI libraries + understanding it. Therefore, "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT @@ -1757,7 +1769,7 @@ Since JSON Pointer URI fragments are constructed based on the structure of the schema document, an embedded schema resource and its subschemas can be identified by JSON Pointer fragments relative to either its own - canonical URI, or relative to a containing resource's URI. + canonical URI, or relative to any containing resource's URI. Conceptually, a set of linked schema resources should behave @@ -1822,10 +1834,12 @@ ]]> - Here we see that the URI for the "additionalProperties" schema object - that is relative to its resource's canonical URI is still valid, - while the URI relative to the "items" schema object's URI no longer - resolves to anything. + Here we see that "https://example.com/bar#/additionalProperties", + using a JSON Pointer fragment appended to the canonical URI of + the "bar" schema resource, is still valid, while + "https://example.com/foo#/items/additionalProperties", which relied + on a JSON Pointer fragment appended to the canonical URI of the + "foo" schema resource, no longer resolves to anything.