Resolving $refs defined in unknown keywords #687

Closed
aravindanve opened this issue Dec 1, 2018 · 27 comments
Labels
clarification Items that need to be clarified in the specification core

Comments

@aravindanve

aravindanve commented Dec 1, 2018

How should a json-schema (draft7) implementation resolve $refs defined in unknown keywords?

JSON-Schema-Test-Suite defines schemas such as this, and I assume they are valid:

{
    "tilda~field": {"type": "integer"},
    "slash/field": {"type": "integer"},
    "percent%field": {"type": "integer"},
    "properties": {
        "tilda": {"$ref": "#/tilda~0field"},
        "slash": {"$ref": "#/slash~1field"},
        "percent": {"$ref": "#/percent%25field"}
    }
}
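For illustration, here is a minimal Python sketch of how the fragments in that test schema map back to property names. It assumes the fragment is a percent-encoded JSON Pointer (RFC 6901 escaping, with `~1` for `/` and `~0` for `~`); the helper name `resolve_fragment` is hypothetical and not part of any library.

```python
from urllib.parse import unquote


def resolve_fragment(document, fragment):
    """Look up a URI-fragment JSON Pointer such as '#/slash~1field'."""
    # Drop the leading '#', then undo percent-encoding ('%25' -> '%').
    pointer = unquote(fragment[1:] if fragment.startswith("#") else fragment)
    if pointer == "":
        return document  # '#' alone names the whole document
    node = document
    for token in pointer.split("/")[1:]:
        # RFC 6901: unescape '~1' first, then '~0'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


schema = {
    "tilda~field": {"type": "integer"},
    "slash/field": {"type": "integer"},
    "percent%field": {"type": "integer"},
}

for ref in ("#/tilda~0field", "#/slash~1field", "#/percent%25field"):
    print(ref, "->", resolve_fragment(schema, ref))
```

Note that this lookup is purely location based: it never asks whether the property it lands on is a known keyword.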

Take this example below:

{
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": { "type": "integer" }
    },
    "properties": {
        "$id": {
            "type": "string"
        },
        "attributes": {
            "$ref": "#/tilda~0field/slash~1field/$id"
        }
    },
    "tilda~field": {
        "$id": "t/inner.json",
        "slash/field": {
            "$id": {
                "$id": "test/b",
                "$ref": "document.json"
            }
        }
    }
}

Which of the following is the $ref at #/tilda~0field/slash~1field/$id/$ref resolved to?

Which of the $ids in #/tilda~0field must be considered as baseURI for the $ref in question and why.

PS I'm not sure if this is the right place to post this. If I need to clarify further or post this elsewhere, please let me know.

@gregsdennis
Member

Which of the following is the $ref at #/tilda~0field/slash~1field/$id/$ref resolved to?

I think Section 8.2 gives the answer: "A subschema's "$id" is resolved against the base URI of its parent schema." This means that it will be http://example.com/t/test/document.json.

Which of the $ids in #/tilda~0field must be considered as baseURI for the $ref in question and why.

The base URI for this is the $id at the root, re-routed by the $id in tilda~field.

  1. Start with http://example.com/root.json.
  2. Change folders to t and use file inner.json.
  3. Change folders to test (inside t) and use file document.json.
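The three steps above can be sketched with Python's standard `urljoin`, which implements RFC 3986 relative reference resolution. This is only a sketch of the reading given in this comment, where every `$id` on the path is honored; later comments in the thread revisit which of those `$id` values actually apply.

```python
from urllib.parse import urljoin

# Step 1: the root $id is the starting base URI.
base = "http://example.com/root.json"

# Steps 2 and 3: each nested $id is resolved against the base so far.
for nested_id in ("t/inner.json", "test/b"):
    base = urljoin(base, nested_id)

# Finally the $ref is resolved against the resulting base.
resolved = urljoin(base, "document.json")
print(resolved)  # http://example.com/t/test/document.json
```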

I'm not sure if this is the right place to post this. If I need to clarify further or post this elsewhere, please let me know.

This is a perfect place to post questions. It's watched and it will be available for others to read. If you want to carry on a conversation "offline" about something, then feel free to join the Slack workspace.

@gregsdennis
Member

I think this answer is better than what I said, though the gist of it is the same.

@aravindanve
Author

aravindanve commented Dec 2, 2018

Thanks for the prompt response @gregsdennis!

I think Section 8.2 gives the answer: "A subschema's "$id" is resolved against the base URI of its parent schema." This means that it will be http://example.com/t/test/document.json.

The problem is, I'm not clear on how to identify the last parent schema. The spec says:

A JSON Schema MAY contain properties which are not schema keywords. Unknown keywords SHOULD be ignored.

So if a schema defines an unknown keyword mappings like this:

{
    "$id": "http://example.com/root.json",
    "properties": {
        "required": ["$id"],
        "$id": { "type": "string" },
        "$name": { "type": "string" },
        "$email": { "type": "string", "format": "email" },
        "$title": {
            "$ref": "#/mappings/$title"
        },
        "$profile": {
            "$ref": "#/mappings/$profile"
        }
    },
    "mappings": {
        "$id": "_id",
        "$name": "name",
        "$title": {
            "$ref": "title.json"
        },
        "$profile": {
            "$ref": "profile.json"
        }
    }
}

Now, when my validator encounters "$ref": "#/mappings/$profile", should it try and resolve the $ref found at #/mappings/$profile further or throw?

Resolving the $ref

If it has to resolve the $ref further, then it has to figure out the current baseURI at the location #/mappings/$profile. Which will be done by looking at the parent, which has the $id "_id".

But #/mappings is not a valid schema, and $id here probably describes how a property called $id must be mapped to the underlying database or whatever. Another hint is that the keyword #/mappings/$title is being used in a custom sense here. Now how must the validator proceed?

Should it use http://example.com/_id to resolve profile.json against, even though the parent is clearly not a valid schema, or use the baseURI at the root to resolve the ref?

PS I understand in both cases the ref resolves to http://example.com/profile.json but that is just a coincidence here, it may not be the case always.
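A quick way to see why this is a coincidence, sketched with Python's `urljoin`. The first two results use the values from the schema above; the `db/_id` value is a hypothetical variant added only to show a case where the two candidate base URIs disagree.

```python
from urllib.parse import urljoin

root_base = "http://example.com/root.json"

# Candidate 1: resolve against the root base URI.
from_root = urljoin(root_base, "profile.json")

# Candidate 2: treat "$id": "_id" in #/mappings as a real $id first.
from_mappings = urljoin(urljoin(root_base, "_id"), "profile.json")

# Hypothetical variant: if the mapping value were "db/_id" instead.
from_nested = urljoin(urljoin(root_base, "db/_id"), "profile.json")

print(from_root)      # http://example.com/profile.json
print(from_mappings)  # http://example.com/profile.json
print(from_nested)    # http://example.com/db/profile.json
```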

@jdesrosiers
Member

@aravindanve, that's a really good question. I doubt anyone considered a case like that when this was defined. I certainly haven't.

#/mappings/$profile and #/mappings/$id are not schemas because "#/mappings" is not a JSON Schema keyword. Therefore, you can not $ref those paths because $ref is only allowed to reference a schema.

If you could $ref a non-schema, the $id at #/mappings/$id wouldn't apply because #/mappings is not a schema and thus $id is just a property, not a keyword.

@jdesrosiers
Member

The interesting question here is, how does this affect custom keywords? How does it affect vocabularies?

@gregsdennis
Member

gregsdennis commented Dec 2, 2018

The key lies in the "SHOULD be ignored" portion. Just ignore it as miscellaneous data. It's not invalid to have it there, but neither should your implementation do anything with it, except maybe store it for serialization back into JSON.

Don't throw; don't resolve the references. It's just JSON data.

In regard to vocabularies, a draft-08 concept, supposing a vocabulary was declared that defines mappings, you would then process in accordance with the vocabulary documentation. This means that someone would need to provide implementation for those keywords via a plug-in or other means to your library so that the keywords can be consumed and interpreted. If no such implementation is provided, then your library can recognize the vocabulary and should throw, but only if the vocabulary is listed as required in the meta-schema. If the vocabulary is not required or no vocabulary defines that particular keyword, it gets ignored as before.
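As a rough sketch of that rule in Python: the shape of `declared` (vocabulary URI mapped to a required flag) is an assumption about how the meta-schema would list vocabularies, and the URIs, the function name, and the exception class are all hypothetical.

```python
class UnsupportedVocabulary(Exception):
    """Raised when a required vocabulary has no implementation."""


def usable_vocabularies(declared, implemented):
    """Return the vocabularies whose keywords should be processed.

    declared:    dict of vocabulary URI -> True if required (assumed shape)
    implemented: set of vocabulary URIs this library has code for
    """
    active = set()
    for uri, required in declared.items():
        if uri in implemented:
            active.add(uri)
        elif required:
            # Required but not implemented: refuse to process the schema.
            raise UnsupportedVocabulary(uri)
        # Optional and not implemented: its keywords are ignored as before.
    return active


declared = {
    "https://example.com/vocab/core": True,
    "https://example.com/vocab/mappings": False,
}
print(usable_vocabularies(declared, {"https://example.com/vocab/core"}))
```

A keyword that no declared vocabulary defines never reaches this check; it is simply treated as unknown data.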

@aravindanve
Author

@jdesrosiers @gregsdennis This clears things up.

But you see, my first example:

{
    "tilda~field": {"type": "integer"},
    "slash/field": {"type": "integer"},
    "percent%field": {"type": "integer"},
    "properties": {
        "tilda": {"$ref": "#/tilda~0field"},
        "slash": {"$ref": "#/slash~1field"},
        "percent": {"$ref": "#/percent%25field"}
    }
}

Is lifted directly from JSON-Schema-Test-Suite. I think there are a bunch of tests that routinely reference schemas nested inside unknown keywords.

For now, in my implementation I'll disallow $refs pointing to unknown parts of the schema. But it means it will fail a bunch of tests defined by the suite.

@aravindanve
Author

@jdesrosiers One other thing

#/mappings/$profile and #/mappings/$id are not schemas because "#/mappings" is not a JSON Schema keyword. Therefore, you can not $ref those paths because $ref is only allowed to reference a schema.

Technically #/mappings may not be a valid schema, but #/mappings/$profile is.

So when you say "you can not $ref those paths because $ref is only allowed to reference a schema", it is referencing a schema :)

@aravindanve
Author

aravindanve commented Dec 2, 2018

@gregsdennis What do you mean by:

Don't throw; don't resolve the references. It's just JSON data.

If I don't resolve the reference in #/properties/$profile, it is automatically treated as an empty schema by the validator, i.e. any data that is passed will pass validation. Is that the expected behaviour?

@gregsdennis
Member

Every time I look at this my brain ~~hurts~~ sees different things. References are hard.

First, since the original reference (which is processed by the implementation as it's under the properties keyword) points to this location, the value at this location is interpreted as a schema (this part I got wrong before). However, the value at #/tilda~0field (including its $id) is not interpreted as a schema because tilda~field is not a known keyword. It is just a container for JSON data.

Second, the schema at #/tilda~0field/slash~1field/$id has both an $id and a $ref. The draft-07 spec says that keywords alongside $ref should be ignored (this is changing in draft-08), which means that the $id of test/b should have no effect. That being the case, and given the first point, I would expect that http://example.com/document.json is resolved.

To confuse things even more, if you were to wrap the tilda~field inside a definitions keyword, then the $id would be interpreted and the resolution would be http://example.com/t/document.json. This is because definitions is a known keyword and its contents are interpreted specifically to be schemas prior to any $refs evaluating to those locations.


By "Don't throw; don't resolve the references," I was just saying that the implementation should ignore the contents of the data. On its own, the contents of unknown keywords shouldn't be validated and references contained within the data shouldn't be resolved. But in your case, the resolution journey begins within a known keyword, so continued resolution is expected.
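The two outcomes described above can be sketched in Python. This assumes draft-07 semantics, where keywords next to `$ref` (here the `$id` of `test/b`) are ignored; `resolve_ref_draft07` is a hypothetical helper, not an existing API.

```python
from urllib.parse import urljoin


def resolve_ref_draft07(base_uri, schema_object):
    """Resolve '$ref' against base_uri, ignoring sibling keywords such as '$id'."""
    return urljoin(base_uri, schema_object["$ref"])


target = {"$id": "test/b", "$ref": "document.json"}
root = "http://example.com/root.json"

# tilda~field is an unknown keyword, so its "$id": "t/inner.json" is not applied.
under_unknown_keyword = resolve_ref_draft07(root, target)

# If tilda~field sat inside "definitions", its $id would change the base URI.
under_definitions = resolve_ref_draft07(urljoin(root, "t/inner.json"), target)

print(under_unknown_keyword)  # http://example.com/document.json
print(under_definitions)      # http://example.com/t/document.json
```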

@handrews
Contributor

handrews commented Dec 2, 2018

@aravindanve @gregsdennis 's explanations are correct.

  • If the $ref keyword itself is under an unknown keyword, it is ignored because unknown keywords and their entire contents are ignored.
  • If the target of a $ref is nested under a property name that is not a known keyword, that is fine, because it is just being used as a location. definitions (draft-07 and earlier) and $defs (draft-08) are the recommended locations to put schemas, but that just ensures that they aren't used for anything else. Technically, you can use any name you want to group schemas.

So that test suite example is fine. It would be nice to make it a bit less confusing but if you want to argue for that please file that on the test suite repository.

@aravindanve
Author

@gregsdennis @handrews I can see what the tests are getting at. I can see the utility in allowing such purely location based references. But my point is, it introduces a certain level of ambiguity as those unknown keyword object references go deeper and deeper.

Okay, so for now I'll stick to this:

First, since the original reference (which is processed by the implementation as it's under the properties keyword) points to this location, the value at this location is interpreted as a schema (this part I got wrong before). However, the value at #/tilda~0field (including its $id) is not interpreted as a schema because tilda~field is not a known keyword. It is just a container for JSON data.

I'll resolve all $refs in nested schemas against the document root $id (Only for unknown keywords of course). Which is what I have done to get the tests in the suite to pass.

Because the alternative is to determine the nearest parent that "fits" the definition of a schema and use its $id which is non-deterministic at best and may produce random outcomes based on factors such as levels of nesting, names of the properties in those objects etc. Anyways, I'll try and come up with a better example that encompasses all the edge cases I'm considering.

Thanks & Cheers!

@awwright
Member

awwright commented Dec 3, 2018

Which of the following is the $ref at #/tilda~0field/slash~1field/$id/$ref resolved to?

In my interpretation, you don't, because {$ref} is only parsed where a schema is expected, and that is not a place where a schema is expected.

@gregsdennis
Member

@awwright on its own, correct, this is not a place where a schema is expected. However, the example shows a $ref pointing to this location, so the data is interpreted as a schema upon resolution of this first one, at which point the secondary $ref (in the data) is then resolved.

@awwright
Member

awwright commented Dec 3, 2018

@gregsdennis Got it. The conclusion of my point is don't author schemas like that, instead put schema definitions under a "definitions" block where the schema is expected.

But for schema authors who insist, I think implementations should use whatever URI was used to look up the schema. (This is the standard RFC3986 behavior). This suggests ignoring parent schemas.

@gregsdennis
Member

don't author schemas like that

This seems important enough to say again.

@handrews
Contributor

handrews commented Dec 3, 2018

@awwright hmm... maybe there's something worth clarifying here. I think the confusion can be summarized with:

{
  "$schema": "https://json-schema.org/draft-08",
  "$id": "https://example.com/whatever",
  "properties": {
    "a": {"$ref": "#/$defs/foo/thisKeywordIsUnknown/alsoUnknown/aDef"}
  },
  "$defs": {
    "foo": {
      "$id": "foo",
      "thisKeywordIsUnknown": {
        "$id": "bar",
        "alsoUnknown": {
          "aDef": {"$ref": "#target"}
        }
      }
    }
  }
}

What is the resolved URI that ends with #target?

Since $defs is a known keyword in draft-08, we should respect the "$id" in #/$defs/foo, because we know that is a schema.

But we don't know how to interpret the "$id" in the object at #/$defs/foo/thisKeywordIsUnknown, so my expectation is that we assume it is not a schema and therefore do not pay any attention to that "$id".

So I think the reference ultimately resolves to https://example.com/foo#target and not https://example.com/bar#target. I also think that https://example.com/whatever#target would be wrong, because #/$defs/foo is unquestionably a schema.
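One way to express that expectation is a walk from the document root that applies `$id` only while the path stays inside lexically recognized schemas. The Python below is a minimal sketch under that assumption; the keyword sets are deliberately incomplete, and `base_uri_at` is a hypothetical helper rather than anything the spec or a library defines.

```python
from urllib.parse import urljoin

# Keywords whose value maps names to subschemas, or is a single subschema.
# Incomplete on purpose; a real implementation would list every applicator.
SCHEMA_MAPS = {"properties", "patternProperties", "definitions", "$defs"}
SINGLE_SCHEMAS = {"not", "additionalProperties", "contains", "propertyNames"}


def base_uri_at(document, tokens, retrieval_uri):
    """Return the base URI in effect for the location reached by tokens.

    An $id on the final target itself is left to the caller.
    """
    base, node, state = retrieval_uri, document, "schema"
    for token in tokens:
        if state == "schema":
            # Only a recognized schema object may change the base URI.
            if isinstance(node, dict) and isinstance(node.get("$id"), str):
                base = urljoin(base, node["$id"])
            if token in SCHEMA_MAPS:
                state = "map"
            elif token in SINGLE_SCHEMAS:
                state = "schema"
            else:
                state = "unknown"  # everything below is plain JSON data
        elif state == "map":
            state = "schema"  # each member of a schema map is a schema
        node = node[token]
    return base


document = {
    "$id": "https://example.com/whatever",
    "$defs": {
        "foo": {
            "$id": "foo",
            "thisKeywordIsUnknown": {
                "$id": "bar",
                "alsoUnknown": {"aDef": {"$ref": "#target"}},
            },
        }
    },
}

path = ["$defs", "foo", "thisKeywordIsUnknown", "alsoUnknown", "aDef"]
base = base_uri_at(document, path, "https://example.com/whatever")
print(urljoin(base, "#target"))  # https://example.com/foo#target
```

The `$id` of `bar` is skipped because the walk has already left recognized schema territory by the time it reaches that object.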

@awwright
Member

awwright commented Dec 3, 2018

@handrews Good example, if I'm fixing the typo in your $ref correctly.

Is this significantly different than this one?

{
  "$schema": "https://json-schema.org/draft-08",
  "$id": "https://example.com/whatever",
  "properties": {
    "a": {"$ref": "#/$defs/foo/$defs/aDef"}
  },
  "$defs": {
    "foo": {
      "$id": "foo",
      "$defs": {
        "aDef": {"$ref": "#target"}
      }
    }
  }
}

That is, if we're using property paths in fragments to identify a schema, I think the same behavior I'm talking about still applies, even when we are pointing to a schema where a schema is expected.

@jdesrosiers
Member

@aravindanve

#/mappings/$profile and #/mappings/$id are not schemas because "#/mappings" is not a JSON Schema keyword. Therefore, you can not $ref those paths because $ref is only allowed to reference a schema.

Technically #/mappings may not be a valid schema, but #/mappings/$profile is.

So when you say "you can not $ref those paths because $ref is only allowed to reference a schema", it is referencing a schema :)

You're missing the point. #/mappings/$profile might look like a schema, but it's not. It's just arbitrary JSON whose properties happen to match JSON Schema keywords.

@handrews

If the target of a $ref is nested under a property name that is not a known keyword, that is fine, because it is just being used as a location.

That's the way I've always thought about it, but this issue has made me realize that that's not what the spec says. When the spec was changed to only allow $ref to reference schemas, it had the unintended side effect of making it not possible to reference something that looks like a schema, but isn't a schema. If we want to keep the behavior you describe, the spec needs an update.

@awwright

The conclusion of my point is don't author schemas like that, instead put schema definitions under a "definitions" block where the schema is expected.

As true as this is, it's skirting the issue. Validator implementers need clear and consistent rules to build their tools. I feel like people keep bringing up real issues with using $id as it's defined and we keep telling them, just don't use it like that. It's good advice for schema authors, but unhelpful for validator implementers.

@Relequestual
Member

Relequestual commented Dec 3, 2018

When the spec was changed to only allow $ref to reference schemas, it had the unintended side effect of making it not possible to reference something that looks like a schema, but isn't a schema. If we want to keep the behavior you describe, the spec needs an update.

@jdesrosiers

Which part of the spec says this? draft-7 8.3 says "The "$ref" keyword is used to reference a schema".
Combined with 4.3.1 "A JSON Schema MUST be an object or a boolean."
You could read "A JSON Schema MAY contain properties which are not schema keywords. Unknown keywords SHOULD be ignored." to include when determining valid subschemas.

Changelog for draft-wright-json-schema-00 does say

Limited use of "$ref" to wherever a schema is expected

But I don't see how, or at least, it's not explicit enough if that was the intent.

I think I agree that it's currently ambiguous.
It would be interesting to survey the current major implementations to see how they behave.

@awwright
Member

awwright commented Dec 3, 2018

But I don't see how, or at least, it's not explicit enough if that was the intent.

While it's not very explicit, there's this requirement:

An object schema with a "$ref" property MUST be interpreted as a "$ref" reference.

and there's no other provision for a "$ref" property to be interpreted as a reference outside of a schema.

@Relequestual
Member

@awwright I think you've missed the point here!

Say I have a JSON instance that has $schema and $id at the top level, and no other schema keywords... but nested in non-schema keywords is a valid JSON Schema.

If another file references the $id in the first schema, and includes the path to the child object which is a valid schema, should that referenced JSON object then be considered a valid schema, even though in the first schema it has no implications (as it isn't in a place you'd expect to find a schema)?

If THAT schema then has a $ref... when it's in the first schema file, it is never looked at, because it's nested under unrecognised keywords, however when evaluated by the second schema file, assuming the JSON object is now treated as a valid JSON Schema, the $ref is now in a location where you'd expect to find a schema.

@jdesrosiers
Member

Which part of the spec says this? draft-7 8.3 says "The "$ref" keyword is used to reference a schema".

I think that's pretty clear. $ref is used to reference a schema. At best it's undefined what the behavior should be if referencing something that isn't a schema. I think we all agreed that the value of an unknown keyword is not interpreted as a schema, even if it has schema keywords.

Changelog for draft-wright-json-schema-00 does say

Limited use of "$ref" to wherever a schema is expected

This on the other hand, says something different. If this was the way it was defined in the spec, it would be more consistent with the intended behavior.

@handrews
Contributor

handrews commented Dec 5, 2018

@jdesrosiers

At best it's undefined what the behavior should be if referencing something that isn't a schema.

Agreed. Having a $ref point to a non-schema is conceptually the same as inlining the non-schema, the main difference being that the meta-schema cannot validate this. Which is why it is good practice to put $ref targets under $defs where the meta-schema can validate them.

But if you skip validation against the meta-schema, $ref-ing a non-schema has the same effect as inlining that non-schema, which is undefined in the standards-ese "probably random garbage" sense.

This on the other hand, says something different. If this was the way it was defined in the spec, it would be more consistent with the intended behavior.

The current spec presents $ref as just another schema object keyword, which is what that changelog from draft-wright-json-schema-00 is referring to. $ref outside of a schema object has no meaning, any more than any other schema keyword does outside of a schema object.

So while the current draft does not explicitly say "you cannot use $ref except where a schema is expected", that is ensured by the fact that you can only use $ref in a schema object.

@jdesrosiers
Member

@handrews

Having a $ref point to a non-schema is conceptually the same as inlining the non-schema, the main difference being that the meta-schema cannot validate this.

But, didn't this issue bring to our attention that that isn't true?

$ref is not conceptually the same as inlining whatever is pointed to. This becomes clear when referencing an external document or a place where $id is used. If you just inline what is pointed to, you lose the context document and its URI, which are needed to properly resolve any $refs within the inlined value. When referencing a schema under definitions/$defs, an $id keyword can change the context document and its URI. When referencing the same schema under an unknown keyword, an $id keyword would not change the context document or its URI, resulting in different validation behavior. It's not just a matter of meta-validation.
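A small Python sketch of the lost-context problem. All URIs here are hypothetical; the point is only that the same relative `$ref` resolves differently depending on which document's URI serves as the base.

```python
from urllib.parse import urljoin

# Hypothetical documents: root.json references a schema in address.json.
referencing_doc = "http://example.com/root.json"
referenced_doc = "http://example.com/schemas/address.json"

# The referenced schema itself contains a relative reference.
target = {"$ref": "country.json"}

# Followed through $ref, the target keeps its own document as context.
via_ref = urljoin(referenced_doc, target["$ref"])

# Copied inline into root.json, the same $ref picks up the wrong base.
inlined = urljoin(referencing_doc, target["$ref"])

print(via_ref)  # http://example.com/schemas/country.json
print(inlined)  # http://example.com/country.json
```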

The current spec presents $ref as just another schema object keyword, which is what that changelog from draft-wright-json-schema-00 is referring to.

I'm pretty sure that was referring to no longer allowing something like the following.

{
  "type": "string",
  "minLength": 3,
  "maxLength": { "$ref": "#/minLength" }
}

We made the decision to limit JSON Reference to only be allowed where a schema is expected. maxLength expects an integer, so $ref would not be allowed here.
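As a sketch of that restriction, an implementation could flag a `$ref` that appears where a number is expected. The keyword set below is a hypothetical, incomplete sample of numeric keywords, and `misplaced_refs` is an illustrative name, not an existing function.

```python
# Incomplete sample of keywords whose value is a number, not a schema.
NUMERIC_KEYWORDS = {
    "minLength", "maxLength", "minimum", "maximum", "minItems", "maxItems",
}


def misplaced_refs(schema):
    """List numeric keywords whose value is an object carrying '$ref'."""
    return sorted(
        keyword
        for keyword, value in schema.items()
        if keyword in NUMERIC_KEYWORDS
        and isinstance(value, dict)
        and "$ref" in value
    )


schema = {
    "type": "string",
    "minLength": 3,
    "maxLength": {"$ref": "#/minLength"},
}
print(misplaced_refs(schema))  # ['maxLength']
```

This limits where a `$ref` may appear (the source), which is a different constraint from limiting what a `$ref` may point at (the target).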

The wording in the changelog describes the intended behavior accurately while the spec wording does not, because the changelog limits the source while the spec limits the target.

@handrews
Contributor

@jdesrosiers I guess I failed to explain the difference between "conceptually the same" which was intended to capture the original intuition, and "mechanically the same" which would address how things actually work (or don't work).

Regarding:

The current spec presents $ref as just another schema object keyword, which is what that changelog from draft-wright-json-schema-00 is referring to.

that was a bad edit/sentence construction on my part. It was supposed to be something like:

The current spec presents $ref as just another schema object keyword. Previously, it was constrained to be a special case schema object keyword, which is what that changelog from draft-wright-json-schema-00 is referring to.

I tend to rewrite these things a lot and sometimes splice ideas by accident.

Regardless, the current spec says what it intends to say, which is different from that changelog from four drafts ago.

@handrews
Contributor

handrews commented Dec 18, 2018

@aravindanve @jdesrosiers @gregsdennis my inclination is to state something like this:

A JSON object or boolean within a schema document can be recognized as a subschema through either its lexical scope or its dynamic scope. Using a by-reference applicator indicates that the target location is a subschema. However, because it is necessary to recognize schemas through their lexical scope in order to properly process $id, referencing a subschema that cannot also be recognized as a subschema lexically produces undefined behavior if $id is in the unrecognized lexical scope, and further by-reference keywords using relative URI references are present.

That's a little awkward, but perhaps we can figure out a way to be more clear. It's basically what @awwright meant with "don't do that", but written up in standards-ese 😁 ("undefined behavior" is standards-ese for "don't do that")

Would this work, at least to get draft-08 out the door?

@handrews handrews added this to the draft-08 milestone Dec 18, 2018