Should we redefine our concept of "keyword independence"? #204
Replies: 3 comments 15 replies
-
This then requires the keyword to have knowledge of the schema in which it resides. While your concept grants that
While you may not be fully processing the dependent keywords in your static analysis, you still need to consider them, at least minimally, multiple times, which can undercut the gains you get from parallel processing. My opinion is that keyword independence should be a goal, but not a hard-and-fast requirement. If a keyword can be designed to be independent of other keywords, it should be. However, we should recognize that sometimes keywords need the results from other keywords, and we need to be okay with that. Additionally, you still get the benefit of parallel processing if you group keywords by dependency depth. For example,
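A minimal Python sketch of the "group by dependency depth" idea. The dependency table and the function name `group_by_depth` are hypothetical and deliberately simplified (the spec defines the full set of keywords whose annotations `unevaluatedProperties` depends on, and a real implementation would also have to account for subschemas). Keywords in the same batch do not depend on each other and can run in parallel; batches run in order.

```python
from collections import defaultdict

# Hypothetical, simplified dependency table: for each keyword, the sibling
# keywords whose *evaluation results* it needs. NOT the spec's full list;
# keywords missing from the table are treated as independent.
DEPENDS_ON = {
    "then": {"if"},
    "else": {"if"},
    "unevaluatedProperties": {
        "properties", "patternProperties", "additionalProperties",
        "allOf", "anyOf", "oneOf", "if", "then", "else", "$ref",
    },
}


def group_by_depth(keywords, depends_on=DEPENDS_ON):
    """Split keywords into batches by dependency depth.

    Keywords in the same batch never depend on each other, so each batch can
    be evaluated in parallel once every earlier batch has finished.
    """
    present = set(keywords)
    depth = {}

    def visit(kw, path=()):
        if kw in depth:
            return depth[kw]
        if kw in path:
            raise ValueError(f"dependency cycle involving {kw!r}")
        # Only dependencies that actually appear in this schema matter.
        deps = [d for d in depends_on.get(kw, ()) if d in present]
        depth[kw] = 1 + max((visit(d, path + (kw,)) for d in deps), default=-1)
        return depth[kw]

    batches = defaultdict(list)
    for kw in keywords:
        batches[visit(kw)].append(kw)
    return [sorted(batches[d]) for d in sorted(batches)]


schema_keywords = ["type", "minimum", "if", "then", "else",
                   "properties", "unevaluatedProperties"]
print(group_by_depth(schema_keywords))
# [['if', 'minimum', 'properties', 'type'], ['else', 'then'], ['unevaluatedProperties']]
```

Only three rounds of serial work are needed here, and when a schema uses no dependent keywords at all, everything lands in a single parallel batch.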
-
Related to json-schema-org/json-schema-spec#701
-
@jdesrosiers pulling this out to a top-level comment b/c the other thread has a good discussion going in a slightly different direction:
But what does that "exceptional" vs "general" really mean in practice? From a framework+modular keywords perspective, dynamic behaviors are either supported or not. There are two main dynamic behaviors: looking up an identifier in parent dynamic scopes, and reading annotations from child dynamic scopes. When it comes to cost, there are three areas where cost is incurred:
[side note: I'm going to ignore

Framework costs

Framework support for these behaviors is a one-time cost. Unless we eliminate them entirely (and

We can and should work to reduce this cost, but the only way to eliminate it is to eliminate those keywords, and I don't think you or anyone else is proposing that we do that (please correct me if I am wrong). We should also work to understand how to make these keywords as parallelization-friendly as possible, while acknowledging that they do impose some serial execution requirements when they are used. We should work to ensure that they do not reduce parallelism when they are not in use.

Standard keyword costs

The area of biggest concern would be standard keyword costs, because most implementations will (correctly) feel obligated to support these, and each such keyword can add cost and impose additional serial execution requirements. Part of why I filed #236 is to ensure that, as much as possible, standard keywords with dynamic behavior only incur (most of) their cost when they are used. While there are coarse-grained optimizations that can be done now (e.g. don't collect annotations if

But as far as I know, there aren't any more dynamic keywords in high demand. I certainly have no plans to propose any, and the ones we have all met much higher bars of demand and/or discussion before they were accepted than most anything else in JSON Schema. I would hold other dynamic keywords to similar standards, and would support (and write, if folks would like) an ADR making that high bar very clear. So if your concern about the exceptional vs. general case is primarily about standard keywords, I think I agree, and I would like to help codify that.

3rd-party extension keyword costs

This is where we should have a general discussion about what our obligation is regarding what people could do with extension keywords.
My keyword behaviors proposal is intended to put some boundaries on that, so that extensions have only a limited ability to undermine JSON Schema's understandability and performance and damage the overall reputation of the project. Your counter-idea of doing this at a different level of abstraction, as principles, would presumably accomplish the same goal (and I haven't tried to move the keyword behaviors work toward anything concrete because I honestly don't know which level will work out better). Whether we're talking about behaviors or principles, separating "framework costs" (whether or not there's a literal framework), which are tightly constrained by the spec, from "keyword costs", which might vary substantially, limits some of the possible costs of 3rd-party keywords. That is obviously a more general concern than just dynamic behaviors, although dynamic behaviors are among the costs we most want to contain.

Beyond these containment strategies, I don't think there's much we can or should do. We could exclude dynamic behaviors from the modular keyword "API" (in quotes because I don't mean literally specifying an API, more of a conceptual thing). But that would be hard to enforce and might actually make people think about those behaviors more. Plus, it would preclude valuable explorations of the tradeoffs.

To summarize...
-
The various discussions lately about the architecture of JSON Schema got me thinking about our principle of "keyword independence", why we value it, and why we seem to constantly violate it.
For those who aren't familiar, the keyword independence principle is the idea that keyword behaviors should be self-contained: you shouldn't need information from other keywords in order to evaluate a keyword. For example, `minimum` is OK, but `additionalProperties` is a violation because it depends on `properties` and `patternProperties`.

Keywords that break this principle can have negative consequences for schema design, but the overwhelming consensus is that, in most cases, the downsides are minor and preferable to the alternatives, if there are any.
For me, the big win from keyword independence is the simplicity of the processing model it allows. Keywords can be evaluated in any order. No state needs to be maintained and passed around. This means you can evaluate keywords in parallel if the implementation's programming language supports it, which could mean big performance improvements in many cases.
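To make the "any order, no shared state" point concrete, here is a minimal Python sketch. The keyword functions and `evaluate` are hypothetical, and the `type` check is simplified (it ignores array-valued `type` and treats `1.0` as a non-integer). Because each check receives only its own schema value and the instance, every keyword can be submitted to a thread pool at once, and the result does not depend on iteration order.

```python
from concurrent.futures import ThreadPoolExecutor


def _is_number(x):
    # JSON booleans are not numbers, but Python's bool subclasses int.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


# Hypothetical self-contained keyword checks: each receives only its own
# schema value and the instance, never another keyword's value or result.
def check_minimum(value, instance):
    return not _is_number(instance) or instance >= value


def check_maximum(value, instance):
    return not _is_number(instance) or instance <= value


def check_type(value, instance):
    # Simplified: ignores array-valued "type" and treats 1.0 as non-integer.
    tests = {
        "null": lambda x: x is None,
        "boolean": lambda x: isinstance(x, bool),
        "integer": lambda x: isinstance(x, int) and not isinstance(x, bool),
        "number": _is_number,
        "string": lambda x: isinstance(x, str),
        "array": lambda x: isinstance(x, list),
        "object": lambda x: isinstance(x, dict),
    }
    return tests[value](instance)


KEYWORDS = {"minimum": check_minimum, "maximum": check_maximum, "type": check_type}


def evaluate(schema, instance):
    """Submit every known keyword at once; no ordering or shared state needed."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(KEYWORDS[kw], value, instance)
            for kw, value in schema.items()
            if kw in KEYWORDS  # unknown keywords are ignored in this sketch
        ]
        return all(f.result() for f in futures)


schema = {"type": "number", "minimum": 0, "maximum": 10}
print(evaluate(schema, 5))                                  # True
print(evaluate(schema, 11))                                 # False
print(evaluate(dict(reversed(list(schema.items()))), 11))   # False
```

In standard CPython builds the GIL means pure-Python checks like these won't actually run faster on threads; the point of the sketch is that no ordering or shared state is required, which is what makes real parallelism possible where the language or runtime supports it.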
It turns out that as long as the information we need from the depended-on keywords can be determined statically (without evaluating those keywords), we can have keyword dependence without losing any of the nice properties described in the previous paragraph. For example, `additionalProperties` can look at the keys in the `properties` and `patternProperties` objects and get all the information it needs to evaluate an instance independently. This can even be done in a compile step.

So, if we define keyword independence as whether a keyword can be evaluated independently of the evaluation of other keywords, there are really only a few problematic keywords. These include the `then` and `else` keywords, which depend on the evaluation result of `if`, and the `unevaluatedProperties` and `unevaluatedItems` keywords, which depend on the evaluation of subschemas. Technically, these violating keywords can be implemented to be independent, but in some cases that would mean evaluating parts of the schema more than once.

I think switching to this definition of keyword independence gives us a clearer and more compelling reason for the principle than we had before. It also lets us stop considering certain keywords to be in violation when they haven't really been problematic. With a clearly defined reason for the principle, we can focus on improving the few keywords that are in violation, or on explaining why we think the violation is worth it.
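The distinction between static and evaluation-time dependence can be sketched in Python. All names here (`toy_validate`, `compile_additional_properties`, `evaluate_conditional`) are hypothetical, `toy_validate` is a stand-in that supports only boolean schemas, `const`, and a few `type` values, and Python's `re` only approximates the ECMA-262 regexes JSON Schema uses. The compiled `additionalProperties` check reads its siblings once, up front, while `then`/`else` genuinely need the evaluation result of `if` for each instance.

```python
import re


def toy_validate(schema, instance):
    """Hypothetical stand-in for a real validator. Supports only boolean
    schemas, "const", and a few "type" values, which is enough for the demo."""
    if isinstance(schema, bool):
        return schema
    if "const" in schema and instance != schema["const"]:
        return False
    types = {"string": str, "object": dict, "array": list}
    if "type" in schema and not isinstance(instance, types[schema["type"]]):
        return False
    return True


def compile_additional_properties(schema):
    """Compile step: read the sibling keywords once, statically.

    The returned check never looks at "properties" or "patternProperties"
    again and never needs their evaluation results, so at evaluation time it
    is as independent as "minimum".
    """
    known = set(schema.get("properties", {}))
    # Python's re only approximates the ECMA-262 regexes JSON Schema uses.
    patterns = [re.compile(p) for p in schema.get("patternProperties", {})]
    extra = schema.get("additionalProperties", True)

    def check(instance, validate=toy_validate):
        if not isinstance(instance, dict):
            return True
        return all(
            validate(extra, value)
            for name, value in instance.items()
            if name not in known and not any(p.search(name) for p in patterns)
        )

    return check


def evaluate_conditional(schema, instance, validate=toy_validate):
    """By contrast, "then"/"else" need the *evaluation result* of "if" for
    this particular instance, which no compile step can supply."""
    if "if" not in schema:
        return True
    branch = "then" if validate(schema["if"], instance) else "else"
    return validate(schema.get(branch, True), instance)


check = compile_additional_properties(
    {"properties": {"name": {}}, "patternProperties": {"^x-": {}},
     "additionalProperties": False}
)
print(check({"name": "a", "x-debug": 1}))  # True
print(check({"name": "a", "age": 3}))      # False

conditional = {"if": {"type": "string"}, "then": {"const": "yes"},
               "else": {"type": "object"}}
print([evaluate_conditional(conditional, i) for i in ["yes", "no", {}, []]])
# [True, False, True, False]
```

Under the proposed definition, the first function is no longer a violation, since all of its sibling information is available before evaluation, while the second remains one.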