Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema.json mandates infinite recursion via $ref #263

Closed
erayd opened this issue Mar 1, 2017 · 10 comments
Closed

schema.json mandates infinite recursion via $ref #263

erayd opened this issue Mar 1, 2017 · 10 comments

Comments

@erayd
Copy link

erayd commented Mar 1, 2017

If the following three points are all true:

  • items is not set; and
  • Setting default values during validation is enabled; and
  • Validating default values is enabled;

Then the result is infinite recursion:

  1. Validating some object against the root schema
  2. items is not set
  3. Set items to an empty object
  4. Validate the new empty object against the root schema (i.e. goto (1))

schema.json excerpt:

        "items": {
            "anyOf": [
                { "$ref": "#" },
                { "$ref": "#/definitions/schemaArray" }
            ],
            "default": {}
        },

The official spec document here says:

7. Schema references with $ref
A schema MUST NOT be run into an infinite loop against a schema. For example, if two schemas "#alice" and "#bob" both have an "allOf" property that refers to the other, a naive validator might get stuck in an infinite recursive loop trying to validate the instance. Schemas SHOULD NOT make use of infinite recursive nesting like this, the behavior is undefined.

The validation document here says:

6.2. "default"
There are no restrictions placed on the value of this keyword.

This keyword can be used to supply a default JSON value associated with a particular schema. It is RECOMMENDED that a default value be valid against the associated schema.

This keyword MAY be used in root schemas, and in any subschemas.

My understanding from reading the above is that if default is used in accordance with the spec, and validated as recommended, then schema.json violates the no-infinite-recursion-via-$ref rule.

I am currently trying to resolve this issue in a validator library (see here) - have I badly misunderstood things here, and / or is there a recommended way of dealing with the situation?

@epoberezkin
Copy link
Member

epoberezkin commented Mar 1, 2017

The spec says nothing about actually adding defaults into data, so there is no contradiction in the spec itself. If you do add defaults, you can indeed cause infinite recursion in this case. The way I am resolving this issue in ajv is that even if validator instance has the option that enables insertion of defaults, this option does not apply to meta-schemas (that are usually recursive) as you probably do not want to mutate your schemas anyway.

@mathroc
Copy link

mathroc commented Mar 1, 2017

I've read a bit more, and found out that default is known to be under specified. and people use it in different valid ways. there is ongoing discussion about fixing this (maybe by splitting it into more keywords for each use)

see #204, #217 & #136

so this is probably something that will need to be resolved if/when a "apply_default_on_validation" keyword is added

@erayd
Copy link
Author

erayd commented Mar 5, 2017

Thanks all - sounds like it's woolly enough we'll have to come up with our own sane middle ground for how to deal with them until the spec nails down the intended behavior more precisely.

@epoberezkin:
I know the spec says nothing about inserting the defaults into data... but it does recommend that they be valid, and it's the validation that causes the recursion. Unless the spec intends the user to blindly trust that the default is valid without checking it, isn't this still a contradiction?

I like your meta-schema idea, but how do you identify whether something is a meta-schema? As far as I'm aware, there's no attribute that says whether a schema is intended for validating other schemas. How do you deal with non-meta schemas that still define infinite recursion?

@handrews
Copy link
Contributor

handrews commented Mar 6, 2017

@erayd I don't see how you end up in an infinite loop here, even by adding an validating defaults.

If I understand you correctly, you are filling in the default value for "items" as an empty schema, then filling in the default within that empty schema, which produces another "items" with empty schema and so on.

But that's not how validation works. Validation is also a function of the instance. If you are validating an array, and "items" is not set, then you can treat it as having an empty schema and validate the elements of your instance array against that empty schema.

But then you stop. You don't just keep stuffing infinitely many layers of "items" in there. At most, if your instance is nested arrays, you will only recurse the number of levels that are present in your instance. Once you're no longer validating an array, there is no reason to fill in the default value of "items".

Plus, an implementation should notice that a schema of {} always successfully validates and does not need to recurse. That is part of the specification directing that a validator MUST NOT get into an infinite loop. Sooner or later you end up with defaults that are {} or true and you can just pass validation and stop.

@erayd
Copy link
Author

erayd commented Mar 6, 2017

@handrews In the case I'm talking about, and as per my example code above, items has absolutely nothing to do with arrays. It's an optional property of an empty object, has the default value of an empty object, and specifies that it should be validated using the same rules as its parent. What the property is named has no relevance; it could just as easily be called wizardOfOz and the effect would be the same. As such, if you follow the spec recommendation that default values be valid, and you don't just blindly trust the default to be valid, it's infinitely recursive, by definition, the second you try and validate it using anything that consistently expands defaults.

It's quite possible that I'm missing something fundamental (in which case, I'd really appreciate it if someone explained it a bit differently, because I'm not grasping it so far) - but unless I'm missing something, the only way to avoid infinite recursion here is to be inconsistent about how default values are handled, or to not validate default values. I get that the spec is not particularly well defined in this area, but surely if it deliberately intended that default be treated inconsistently, then it would say so explicitly, as the whole point of a spec is to avoid ambiguity and make sure everyone is on the same page.

But that's not how validation works. Validation is also a function of the instance. If you are validating an array, and "items" is not set, then you can treat it as having an empty schema and validate the elements of your instance array against that empty schema.

You're assuming that I don't want to validate whatever default value I just used in place of items. What I actually want to do is enforce the spec recommendation that the default be valid, by running the validator against whatever value it defines.

But then you stop. You don't just keep stuffing infinitely many layers of "items" in there.

Why do you stop? The spec certainly doesn't say to stop. Shouldn't it be the case that, when deciding whether or not to validate defaults, that decision is universal and consistent? Otherwise you could easily end up in the situation where your default value is a template, which itself contains other defaults, and those defaults are never expanded.

Yes, there are a number of ways to break such a loop - but shouldn't it be reasonable to assume the following?

  • That if a value is missing, but the schema defines a default for it, the value can be set to that default without causing problems; and
  • That I should be able to validate a default value using exactly the same logic that I would use to validate anything else.

At the end of the day, it's the combination of those two assumptions with the way the spec defines items that is triggering the problem. Changing either of those assumption or changing the spec would resolve the issue. I'm not terribly fussed with which of those things changes, or even if none of them do - I'd just like to make sure that whatever position I adopt is sane, and conforms as best as possible to the intent of the spec.

@handrews
Copy link
Contributor

handrews commented Mar 6, 2017

You stop because the validation outcome of {} is always success.

If you blindly expand every possible validation keyword in every empty object, of course it is infinite recursion. It's a recursive schema. But that's ignoring that you do not need to do any such thing to produce a validation outcome.

The specification directive:

A schema MUST NOT be run into an infinite loop against a schema.

is telling you that if it is possible for you to detect a terminating condition for recursion, then you MUST do so in order to be a conforming implementation. There are many possible schemas that are clear terminating conditions, but {} and {"not": {}} are the most obvious (and we make it even more obvious in the next draft by allowing true and false in their place, respectively).

So if you automatically write in every default to infinite levels, you are in violation of the spec. Not because of the specification of "default", but because the specification requires that you detect infinite recursion and avoid it. And in this case, it is extremely easy to do so. You do not need to write in the defaults to answer the validation question, so don't.

@erayd
Copy link
Author

erayd commented Mar 6, 2017

@handrews
Thanks - that's really helpful. I'd misunderstood that bit of the spec that you quoted to be talking about the schema, not about the validator. Noting that, it's clear that the onus is on me to detect and avoid such recursive behavior, so the question is now how to do that.

Does the spec violate the "Schemas SHOULD NOT make use of infinite recursive nesting like this, the behavior is undefined" bit of that same paragraph, or is it considered to be fine because the behavior of default is also undefined?

You stop because the validation outcome of {} is always success.

It's not. For example, {} would be in violation of {"required":["someProperty"]}. {} is a universally valid draft-04 JSON schema, but is not a universally valid JSON document, or fragment of one. Stopping on {} in such a situation would mean the default was might be invalid.

You do not need to write in the defaults to answer the validation question, so don't.

There are many, many use-cases where clamping an input to be valid against a schema, including filling of missing values, is useful (e.g. part of a webservice that accepts user input, validates it and adds missing values, then takes some action based on the resulting document). Simply saying "you don't need to do this, so don't" is a cop-out. In addition, the act of validating the default is where the issue lies, not whether or not that value is applied to the input - you cannot answer the validation question completely without also determining whether the defaults are valid.

Is there a recommended position to take on how deeply defaults should be applied, when default-applying is happening? It sounds like inconsistent behavior is mandated by the schema, based on the bit you quoted above - but it doesn't take any position on what that inconsistent behavior should be. Is applying defaults, but not defaults-on-defaults, a reasonable position to take? It has the upside of being simple to implement, but the downside of causing unexpanded templates.

@handrews
Copy link
Contributor

handrews commented Mar 6, 2017

My point with a schema of {} is that it all documents are valid against it. So if your schema is {} there is never a reason to fill any defaults into it in order to answer the validation question. In face, the defaults for the meta-schema are intentionally chosen such that everything validates against them. Otherwise, the spec wouldn't make sense. If you could fill defaults into a schema of {} and produce a possible validation failure, then that would be inconsistent.

So whenever you see a schema of {}, {"not": {}}, true, or false, you can stop. On the instance side, infinite recursion is never a problem. "$ref" is part of schemas, not instances. Even when you are validating a schema (which would be an instance involving a "$ref", in the context of an instance, "$ref" is not important. This is why it the spec does not violate the SHOULD NOT of recursion. Look closely at the mutual recursion example- that example is not possible to resolve not matter what instance is used. It's flatly impossible to even figure out the schema in code, much less apply it to anything. If you construct such a situation with a default, that's just as bad and to be avoided as if you do so without a default. Again, none of this has anything to do with "default" itself.

As for applying defaults, that is not a concern of the spec as it is written. If you want to make use of a default value in an application, it is up to you to do so in a way that makes sense.

If you want a well-specified mechanism for applying default values in APIs, well, I'd tell you to comment on the issue for it but I got fed up with people dragging it horribly off-topic in the replies and closed it. I'll file it again eventually after draft 06 is published and I'll try to remember to mention you in it.

Please don't try to further convince us that "default" is problematic to use. We know. There are multiple issues filed. There will be more. If you want more background, you can find my email in the spec document, feel free to email me and I'll go into more detail.

@handrews
Copy link
Contributor

handrews commented Mar 6, 2017

@erayd And I meant look at the spec document on master to find my email, not the published one :-)

I'm going to go ahead and close this because we already have #204 to cover the general problem with "default" and I think this all falls under that and its related issues. Do feel free to re-open if I'm missing something, or a file a new issue about a specific use case.

To be clear, you are identifying real concerns here, it's just a very messy area and we're more focused on getting Draft 06 out the door right now than diving into big new topics. That's why #204 is holding where it is. Hopefully we'll have Draft 06 out in the next few weeks and can start looking at bigger concerns again.

@handrews handrews closed this as completed Mar 6, 2017
@erayd
Copy link
Author

erayd commented Mar 6, 2017

Gotcha - that sounds essentially like your position agrees with the 'don't apply defaults inside defaults' solution I asked about above, so I will implement it in that manner for now. Thanks for clarifying :-).

More background would certainly be useful, but I suspect that my wanting to know more background to the issue is more of a curiosity thing than my genuinely needing to know it - I'll ask, but if you're busy, please feel free to decline further explanation.

I'll keep an eye on #204 and see where it goes - have subscribed to updates on that issue. The only case I can think of that doesn't seem to immediately cover is that it would be useful to be able to use nested defaults values to achieve essentially the schema equivalent of inheritance - but that's definitely not essential.

I appreciate the time you spent on this issue - thank you :-).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants