Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$combine: Re-use with "additionalProperties": false #119

Closed
handrews opened this issue Oct 29, 2016 · 10 comments
Closed

$combine: Re-use with "additionalProperties": false #119

handrews opened this issue Oct 29, 2016 · 10 comments

Comments

@handrews
Copy link
Contributor

[Filing at the request of @epoberezkin in #15]

Background

This is a proposed solution for one of the ways people want to re-use schemas that declare "additionalProperties": false. It could be made part of the JSON Schema specification, either as a required or optional feature, or it could simply be published to illustrate how a completely separate tool could be built to work around the limitation. This separability hinges on the proposal being in the form of a preprocessing step.

The reason to have the separability is that there is no other way to address this use case without violating at least one desirable property of JSON Schema validation, or introducing arbitrary JSON transformation capabilities into a standard that should be focused on schemas.

Allowing for partial implementation may also be a good idea, as any full solution will be complex to implement. A useful but limited solution is quite easy.

Keywords $combine and $combinable

There are two keywords in this proposal, both marked with a $ to indicate that they (like $ref) are structural elements outside of validation proper, and can be preprocessed out (either entirely, or lazily as needed in the case of $ref cycles).

$combine implements the oft-requested re-use pattern of applying additionalProperties only after inventorying all properties and patternProperties in all constituent schemas.

$combinable is a boolean keyword. If false, then $combine has no special effect on the schema- it behaves like allOf (explained in detail below). This preserves an the use case of using additionalProperties to intentionally close a schema off to this sort of "consider all properties" extension.

If $combinable defaults to true, then current schemas must opt-out of being usable with $combine. If it defaults to false, then current schemas must opt-in. Opting in would be compatible with the current draft, so I recommend that $combinable default to false to avoid automatically circumventing any intent to close a schema to re-use.

Formal description of the limitation of additionalProperties

Note: This assumes that #101 (allow true and false for all child schemas) is implemented, as it makes this whole thing about ten times more clear).

For nearly all validation keywords k1, k2, k3... the following (which @awwright referred to as "linearity" in issue #65 ) holds true:

{
    "k1": "something",
    "k2": {},
    "k3": ["etc."]
}

is equivalent to

{
    "allOf": [
        {"k1": "something"},
        {"k2": {}},
        {"k3": ["etc."]}
    ]
}

There are only a few exceptions (see also issue #77 ), and they tend to be vexing. additionalProperties is the one that trips people up the most. Here are its properties in terms of "schema algebra" (for lack of a better term):

{
    "required": ["bar", "x-awesomeness"],
    "minProperties": 5,
    "maxProperties": 10,
    "properties": {
       "foo": {"type": "number"},
       "bar": {"type": "boolean"}
    },  
    "patternProperties": {
        "^x-": {"type": "string"},
        "maybe$": {"type": ["boolean", "null"]}
    },  
    "additionalProperties": {
        "type": "object"
    }   
}

is equivalent to

{
    "allOf": [
        {"required": ["bar"]},
        {"required": ["x-awesomeness"]},
        {"minProperties": 5},
        {"maxProperties": 10},
        {"properties": {"foo": {"type": "number"}}},
        {"properties": {"bar": {"type": "boolean"}}},
        {"patternProperties": {"^x-": {"type": "string"}}},
        {"patternProperties": {"maybe$": {"type": ["boolean", "null"]}}},
        {   
            "additionalProperties": {"type": "object"},
            "properties": {"foo": true, "bar": true},
            "patternProperties": {"^x-": true, "maybe$": true} 
        }   
    ]   
}

You can separate each entry in properties and patternProperties, and handle the child validation in each of those separate entries, but the only simplification that can be done to a factored-out additionalProperties is to reduce the properties and patternProperties down to just validating the instance object property names (by giving blank or true child schemas) and leaving the child validation to the separated schemas.

This is because additionalProperties determines how to behave by taking the set of instance properties, removing all properties that match either properties or patternProperties, and applying its child schema to the value of any properties that remain. So the child schemas in properties and patternProperties have no effect on additionalProperties and can be factored out, but the property names and patterns do.

This is fundamentally why re-using schemas that use additionalProperties is so problematic for many people. It comes up most when it's set to false, but the underlying problem affects any use of a non-default additionalProperties.

Note that required, minProperties, and maxProperties do not have to be carried into the factored-out additionalProperties schema, as they do not impact the behavior of additionalProperties. Issue #65 proposes also removing any property named in required from the set of properties affected by additionalProperties, which would mean that required would also need to be carried along in the factored-out schema. This is why I oppose #65.

Re-use and additionalProperties

In the thread about use cases for JSON Schema re-use, we agreed that there have been several use cases motivating various attempts to get around this algebraic limitation. A simple example of what's desired is that people want this:

{
    "type": "object",
    "allOf": [
        {"properties": {"foo": {"type": "boolean"}}, "additionalProperties": false},
        {"properties": {"bar": {"type": "number"}}, "required": ["bar"]}
    ]
}

to be equivalent to

{
    "type": "object",
    "properties": {"foo": {"type": "boolean"}, "bar": {"type": "number"}},
    "additionalProperties": false,
    "required": ["bar"]
}

Instead, that allOf combination can never validate- the foo schema forbids any property other than "foo", while the bar schema requires the property "bar".

Note that an analogous problem occurs if additionalProperties is set to any non-blank schema. For instance, if it was set to {"type": "string"} it would still be an impossible schema as that would be applied to "bar" which already required its value to be a number.

patternProperties behaves in an analogous way, so I mostly will ignore it to keep things (relatively) simple.

Solving the problem with a pre-processing step

I propose to solve this with a keyword, $combine, indicating a pre-processing step:

  • It should start with a $ to indicate that, like $ref, there is something structural going on
  • The keyword simply applies the sort of factoring-out of additionalProperties shown at the beginning of this proposal
  • As a pre-processing step, it can easily be published separately, outside of the JSON Schema spec proper
  • It preserves the context-free validation property by ensuring that all schemas can be programmatically converted to schemas that validate in a context-free manner

Example conversion

Here is the conversion, replacing allOf with $combine. After preprocessing, this:

{
    "type": "object",
    "$combine": [
        {"properties": {"foo": {"type": "boolean"}}, "additionalProperties": false},
        {"properties": {"bar": {"type": "number"}}, "required": ["bar"]}
    ]
}

becomes this:

{
    "type": "object",
    "allOf": [
        {"properties": {"foo": {"type": "boolean"}}},
        {"properties": {"bar": {"type": "number"}}, "required": ["bar"]},
        {"properties": {"foo": true, "bar": true}, "additionalProperties": false}
    ]
}

The only change to the first two entries is removing additionalProperties. The new third entry just collects the various properties (and patternProperties if they were present) with blank schemas, and includes them with the factored-out "additionalProperties": false.

The same thing would happen when considering our alternate example with "additionalProperties": {"type": "string"}. The first two would look exactly as they do above, and the third would look the same except for {"type": "string"} in place of false.

Difficulty and optional-ness

Making these conversions is easy in the simple case. Even when there are child schemas, it's pretty easy to implement:

{
    "$combine": [
        {"properties": {
            "foo": {"properties": {"bar": X}}
        }},
        {"properties": {
            "foo": {"properties": {"bar": Y}}
        }}
    ]
}

is equivalent to:

{
    "properties": {
        "foo": {"properties": {"bar": {"$combine": [X, Y]}}}
    }
}

$combine gets challenging when combining complex schemas that themselves involve boolean keywords. Implementing $combine in such a situation requires implementing a kind of "schema algebra". Requiring this of all implementations seems excessively burdensome. As with format, I'd recommend specifying this but leaving it up to implementations whether to support all or part of the feature. In particular, an implementation may want to implement $combine except not support combining schemas that contain boolean keywords.

Here are the challenges for the boolean keywords:

allOf

Since we are converting to an allOf already, and boolean AND is associative, this is straightforward. It's just more branches to combine but the mechanism is the same. So {"$combine": [A, "allOf": [B, C]]} is the same as {"$combine": [A, B, C]}

anyOf

An anyOf needs to be factored out, as the collection of properties and patternProperties should not draw from all branches, but only from one at a time. So {"$combine": [A, "anyOf": [B, C]]} becomes {"anyOf": [{"$combine": [A, B]}, {"$combine": [A, C]}]}, which reduces the problem to our existing rules.

not

This gets a little tricky. The fundamental principle with not is that:

{
    "not": {
        "properties": {"foo": {"type": "number"}},
        "patternProperties": {"^x-": {"type": "string"}},
        "additionalProperties": {"type": "boolean"}}
    }
}

is equivalent to

{
    "not": {
        "allOf": [
            {"properties": {"foo": {}}},
            {"patternProperties": {"^x-": {}}},
            {
                "properties": {"foo": {}},
                "patternProperties": {"^x-": {}},
                "additionalProperties": {"type": "boolean"}
            }
        ]
    }
}

which, by DeMorgan's law, is equivalent to:

{
    "anyOf": [
        {"properties": {"foo": {"not": {"type": "number"}}}},
        {"patternProperties": {"^x-": {"not": {"type": "string"}}}},
        {
            "properties": {"foo": true},
            "patternProperties": {"^x-": true},
            "additionalProperties": {"not": {"type": "boolean"}}
        }
    ]
}

The not doesn't affect the validation of the presence or absence of the properties that are named or matched by a pattern. It only affects the validation of child values.

NOTE: Fully covering all of the things that can happen with not is complicated, and may require adding some new keywords like minAdditionalProperties to make possible in all cases. I'll elaborate on these issues if there is ever momentum behind this proposal. They are all solvable.

oneOf

As {"oneOf": [A, B, C]} is equivalent to:

{
    "anyOf": [
        {"allOf": [A, {"not": B}, {"not": C}]},
        {"allOf": [{"not": A}, B, {"not": C}]},
        {"allOf": [{"not": A}, {"not": B}, C]}
    ]
}

It can be handled from this transformed point with the already established rules.

dependencies

Name dependencies are unaffected, but when dealing with schema dependencies we have to handle the conditional nature of how the dependent schemas are applied. This involves observing that:

{
    "dependencies": {"foo": X},
    Y
}

Where Y is the rest of the schema containing dependencies, is equivalent to:

{
    "oneOf": [
        {
            "required": ["foo"],
            "allOf": [X, Y]
        },
        {
            "properties": {"foo": false},
            X
        }
    ]
}

This reduces the problem to a combination of oneOf and allOf, which can therefore be reduced to a combination of allOf, anyOf, and not, which gets us back to our known rules.

Conclusion

This is, as far as I can tell, the only way to get this behavior while staying within the philosophical boundaries of JSON Schema.

I am not personally convinced that JSON Schema should directly incorporate any of the proposed "solutions" for the "additionalProperties": false "problem", but @epoberezkin requested in #15 that this be filed for the record.

I strongly prefer this (plus #98 ) over introducing arbitrary JSON transformations as is done by #15 , but I would also be fine with rejecting both proposals (or even all three proposals including #98 ). All three proposals could be implemented separately and used either as a build step, or as an extended vocabulary indicated by another meta-schema.

@epoberezkin
Copy link
Member

epoberezkin commented Oct 31, 2016

@handrews thank you for that, it makes your proposal quite clear.

I do like your idea that "schema algebra" can be outside of the JSON-Schema standard.

As you noted, the complexity of consistent implementation is a concern, and I believe that consistency is very important. I'd rather it's very strictly specified in a very detailed way how it is supposed to work than leaving it to implementations. Inconsistency leads to a limited usability and in difficulties with migration between implementations.

My bigger concern about "schema algebra", leaving complexity aside, is that it leads to more performance-expensive schemas in many cases. See your anyOf example, e.g., where A will have to be validated twice, combined with B and combined with C. For many real life applications it is a critical concern.

The argument "too powerful" applies here as well I believe. I'd say it's more powerful than $merge, judging by the extent of changes it inflicts on schemas.

Also, it is much more complex than generic extension mechanism that would be introduced by $merge and even by $use (which is as I wrote has a more complex definition than $merge).

I am cool with $merge and $use staying out of JSON-Schema standard as well. Maybe @fge would be interested to collaborate on a separate I-D for $merge/$patch (that can be defined independent of JSON schema and instead rely on RFC6901 (JSON pointer), RFC3986 (URI) and JSON reference (draft 3?) - is there RFC already? ). And you can make a separate I-D(s) for $use and/or for "schema algebra".

Then validator authors can decide what to implement and after some time we can decide if any of them should be merged (or used :) in the standard based on the number of users that use the features.

How does it sound?

@handrews
Copy link
Contributor Author

As I said, I'm fine with none of these things making it into the standard, so that's fine with me. All I really care about is keeping both $merge/$patch and the not-visible-in-the-schema "ban additional properties" mode out of the standard.

I don't care if $combine or $use make it in or not. $use would be very helpful to me, but it's also the easiest of all of the options to implement as a preprocessor step. And if you do put together a "patchable JSON" media type or something I may well use it.

JSON reference (draft 3?) - is there RFC already?

@awwright killed JSON Reference as a separate RFC. I'm still not sure I agree, but it's very low on my list of concerns. Although if you do make a $merge/$patch media type, it will need $ref so that might be an argument for separating it back out and just having JSON Schema specify the circumstances under which it allows for a JSON Reference. There is nothing about the current way $ref is specified that ties it to JSON Schema, really.

@handrews
Copy link
Contributor Author

It might be fun and useful to write out the schema algebra rules and put them on the JSON Schema web site. They don't need to be an I-D of their own, they all follow directly from the JSON Schema specification. If there's interest in that I can track it with a bug against the web site project.

@DimDragon
Copy link

I'm working with large standardization group and we have quite some JSON Schema code base, where code reuse is top goal, so problems mentioned in google group post I have encountered multiple times. While I'm totally convinced JSON Schema just needs some extension keywords in order to allow re-use and extension, I can offer slightly different approach:

  • it is true that merge / patch / combine are all very powerful constructs and while they offer extension options, they also can break original schema restrictions
  • probably instead of focusing on additionalProperties, you can consider another keyword with sole purpose of restricting modification. Something like sealed comes in mind.

I slightly disagree on sample given:

that people want this:

{
    "type": "object",
    "allOf": [
        {"properties": {"foo": {"type": "boolean"}}, "additionalProperties": false},
        {"properties": {"bar": {"type": "number"}}, "required": ["bar"]}
    ]
}

to be equivalent to

{
    "type": "object",
    "properties": {"foo": {"type": "boolean"}, "bar": {"type": "number"}},
    "additionalProperties": false,
    "required": ["bar"]
}

Here obviously 'classical' merge/patch/combine is attempted and allOf is used only cause there is no other way to represent this (and yes I NEED this as well). However what I would expect as result is to reuse all properties of first object with their restrictions, resulting new schema may or may not allow additional properties, this is up to schema author. In case allOf is NOT used for that case, problem goes away since you no longer require new schema to be valid against combination schemes....if you wish to enforce that...we are in another issue...

@handrews
Copy link
Contributor Author

@DimDragon how would your sealed proposal work? If we use sealed to indicate that the set of properties cannot be expanded in any re-use scenario, then how should additionalProperties behave?

@DimDragon
Copy link

In my mind sealed should not be directed against object properties in general, but only against merge/patch/combine semantics. Let me try to put small sample:

{
    "title": "PeriodType",
    "type": "object",
    "properties": {
        "start": { "type": "integer" },
        "end": { "type": "integer" }
    },
    "additionalProperties": false
}

Now suppose I want to use this schema as base and just add name property to it. Using merge/patch on schema level I can simply 'override' additionalProperties and add new property. Object instance valid against new schema is obviously not valid against staring schema, but this was also not the goal. In case base schema has "sealed": true, then merge/patch pre-processor should emit error for trying to modify sealed object.

In our library we had to deal with exact issue you sampled and while we keep ```"additionalProperties": false" appearances very low, we created something like a 'convention' to deal with this problem:

  • we create schemes with SOLE purpose of code reuse, those never have ```"additionalProperties": false" and use different naming rules.
  • those schemes are ONLY referenced inside allOf, oneOf```, ```anyOf
  • those schemes are NEVER used as $ref

I'm not advocating strongly for such extension. I only wanted to point out that for me the core of the problem lies in using allOf as 'combination'/'inheritance' keyword, while it is not intended as such. Although I had to admit I have done the same and have written processor, which basically threats 'allOf' as combine to big extend and also combines restrictions. Here is example:

go to https://jschema.com and try this schema:

{
    "title": "CombinedSchema",
    "allOf": [
        {
            "type": "integer",
            "minimum": 3,
            "maximum": 100,
            "multipleOf": 3
        },
        {
            "type": "integer",
            "minimum": 12,
            "maximum": 240,
            "multipleOf": 5
        }
    ]
}

you will see kind of 'combine' in action, if you however attempt:

{
    "title": "CombinedSchema",
    "allOf": [
        {
            "title": "BaseOne",
            "type": "object",
            "properties": {
                "a": { "type": "integer" },
                "b": { "type": "string" }
            },
            "additionalProperties": false
        },
        {
            "title": "BaseTwo",
            "type": "object",
            "properties": {
                "a": { "type": "number" },
                "c": { "type": "number" }
            },
            "required": [ "a" ]
        }
    ]
}

you will see the 'incompatibility' error.

Hope this is helpful to demonstrate the 'abuse' of allOf

@handrews
Copy link
Contributor Author

handrews commented Aug 20, 2017

VOTE-A-RAMA!!!

It's time to gauge community support for all re-use/extension/additionalProperties proposals and actually make some decisions for the next draft.

Please use emoji reactions ON THIS COMMENT to indicate your support.

  • You do not need to vote on every proposal
  • If you have no opinion, don't vote- that is also useful data
  • If you've already commented on this issue, please still vote so we know your current thoughts
  • Not all proposals solve exactly the same problem, so we may end up implementing more than one

This is not a binding majority-rule vote, but it will be a very significant input into discussions.

Here are the meanings for the emojis:

  • Celebration / Hooray / Tada!: I support this so strongly that I want to be the primary advocate for it
  • Heart: I think this is an ideal solution
  • Smiley face: I'd be happy with this solution
  • Thumbs up: I'd tolerate this solution
  • Thumbs down: I'd rather we not do this, but it wouldn't be the end of the world
  • Frowny face: I'd be actively unhappy, and may even consider other technologies instead

If you want to explain in more detail, feel free to add another comment, but please also vote on this comment.

The vote should stay open for several weeks- I'll update this comment with specifics once we see how much feedback we are getting and whether any really obvious patterns emerge.

@handrews
Copy link
Contributor Author

With six against and two for (one person voted twice :-P ) I'm pretty comfortable closing this as non-viable even though it's only been a week and a halr. No one was close to volunteering to advocate for this, so unless someone comes along to step into that role, this is not going to be accepted.

Also, it's my idea, and I think it's a mess :-)

But if someone wants to champion this, please comment and ask to re-open it.

@ljharb
Copy link

ljharb commented Aug 31, 2017

fwiw, I'm not confident enough to advocate for the specific proposal, but this problem absolutely needs to be solved, and I'm happy to advocate for any solution that solves it.

@handrews
Copy link
Contributor Author

handrews commented Sep 1, 2017

@ljharb see #388 for the latest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants