Standardized schema output (validation result, annotations, and errors) (cont.) #643
Comments
Upon further consideration, I think it might be OK to have a standardized schema for reporting validation errors, especially for Web services, e.g. if you try to submit a document to a server, how the server would report back which problems should be corrected. RDFa has a similar standard, it defines a vocabulary for the purpose of Web services to report errors while parsing RDFa-enabled documents. |
Great point @awwright, and it might be interesting to think of how such a format might fit with RFC 7807 Problem Details for HTTP APIs (the |
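One way the two could fit together, as a rough sketch only: RFC 7807 defines the members `type`, `title`, `status`, `detail` and `instance`, and allows extension members. The `errors` extension member, the problem type URI, and the error field names below are assumptions made for illustration, not something agreed in this thread.

```python
import json


def to_problem_details(errors, status=422):
    """Wrap validator errors in an RFC 7807 style problem details object."""
    return {
        # Standard RFC 7807 members.
        "type": "https://example.com/problems/schema-validation",  # hypothetical URI
        "title": "Request body failed schema validation",
        "status": status,
        "detail": f"{len(errors)} validation error(s) found",
        # Extension member (assumed name) carrying the validator output.
        "errors": errors,
    }


# Hypothetical validator output; field names are illustrative assumptions.
errors = [{"instanceLocation": "#/age", "message": "must be an integer"}]
problem = to_problem_details(errors)

# A server would send this with the media type application/problem+json.
print(json.dumps(problem, indent=2))
```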
I just recently had a go at implementing this. I found that implementing the verbose hierarchy was actually the easiest thing to do. I then have logic that condenses/flattens it for the other formats. (See the PR linked above for examples of generated output.) Edit Just released this functionality in a preview version. |
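The "build the verbose hierarchy first, then condense" approach described above might look roughly like the following. This is a minimal sketch, not the code from the linked PR; the node shape and the field names (`valid`, `errors`, `keywordLocation`, `message`) are assumptions.

```python
def flatten(node):
    """Depth-first walk that collects the leaf nodes of a hierarchical result."""
    children = node.get("errors", [])
    if not children:
        # Leaf node: keep its reporting fields, drop the (empty) child list.
        return [{k: v for k, v in node.items() if k != "errors"}]
    flat = []
    for child in children:
        flat.extend(flatten(child))
    return flat


# Hypothetical verbose (hierarchical) output for a failing instance.
verbose = {
    "valid": False,
    "keywordLocation": "#",
    "errors": [
        {
            "valid": False,
            "keywordLocation": "#/oneOf",
            "errors": [
                {"valid": False, "keywordLocation": "#/oneOf/0/type", "message": "expected a string"},
                {"valid": False, "keywordLocation": "#/oneOf/1/minimum", "message": "below minimum"},
            ],
        }
    ],
}

# The flat form keeps the overall outcome and lists only the leaves.
flat_result = {"valid": verbose["valid"], "errors": flatten(verbose)}
print(flat_result)
```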
@gregsdennis great! Back to the spec topic, I thought of one more option. :) We can improve ease of adoption if the response will operate on links as standard JSON schema:
This piece of JSON I can open in for example PhpStorm and click through all relevant |
Interesting idea. I like where you're going with it. There was some resistance before to including the instance and schema data in the results, but I think it was more against the idea of including full chunks with each error. That's when using URIs & pointers was proposed (I think by @handrews). As you mentioned, though, I wonder if there's more widespread editor support for this kind of thing. I hesitate to introduce a feature just because it helps support a specific tool. |
@gregsdennis Thanks for your work on this.
Should we specify A further thought: I think @handrews has nominated me to try and push this effort / issue for draft-8... =D Bout time I did some more actual document writing! |
@Relequestual I'd say it goes in core, because we're defining the general output format for any and all vocabularies. So it should be described in terms of keyword classifications (assertions, annotations, etc.) rather than specific keywords. You may want to include some commentary in the validation spec about how it works for that spec's assertions and especially annotations, and show some concrete examples, but the core spec should technically be sufficient. We'll need to update hyper-schema as well as it currently makes its own recommendation about output, but I can do that after you've done core and validation if you'd like. |
@Relequestual I like the requirement keywords, but I'm not a fan of the term "causality," especially when passing schema with annotations are considered. That's why I just used "flat" and "hierarchical." They describe the shape rather than imbue meaning into the contents. Regarding changing "RECOMMENDED" to "MUST," I have no real feels either way. I can make arguments for both. |
What about adding an optional property |
@anatoli26 that's a good idea, but it seems like an application-level concern. I think it's something that the server and client designers would have to negotiate in their contract. |
@gregsdennis @Relequestual I'm in no way set on causality; it was just a term to indicate that it included @vearutop's cause context. I don't feel that flat, hybrid(?) and hierarchy are the right labels to give for the output. basic, detailed(?) and verbose may be more suitable: if there were ever more output to add, then naming them on their shape loses appeal if you have more than one variation of flat/hierarchy hybrids. Using a more amenable designation could let the description change without a need to touch the name as well. I agree with @handrews that they should be in core, or if not, then their own spec, at least for the Annotation Description Object and Error Description Object. Not sure about basic, detailed and verbose definitions, but I have no opinion on where they go. |
I'm happy with basic, detailed, and verbose so long as there's a clear definition of what each entails. |
I've been away at conference. I'll aim to look at these comments over the next week or so. Thanks. |
Is the Having a |
In my implementation, I had to explicitly set the My implementation aside, your point is taken. I still think it should be included for convenience of the consumer. Maybe the best option is to suggest its inclusion as a recommendation. |
Not at all. I just think the debugging support is in the domain of "applications that might be built on a validator" rather than what the RFC tells me the validator must support. Our use of a validator doesn't produce any output in any case other than an error, and we only output the first error. We don't provide validation as a service, and we don't care about debugging schemas and the object representations -- that's left to other applications. If in the case that we do provide an error, we produce it in a standard format then hopefully that makes it easier to build front ends on our system. I'd just like the spec to stop at that point. |
@KayEss Then your implementation only need provide a basic level of support for output and error reporting, which still gives it a compliance stamp. Users are frequently asking "Why don't I see all the errors", specifically when This isn't functionality we are defining to be difficult or make it harder for implementations to be compliant. It's not just about debugging. If you don't provide all the errors back to the user, their application can't make an informed decision about how to translate errors to their front end users, which is what people are asking for sometimes. It's not just about debugging JSON Schemas workflow here. The only way to get real interoperable tooling to support error collection that can be returned to front end users of an application is to standardise multi-level, multi-error reporting formats. |
Just as a side note, I discovered today that GraphQL actually returns paths in array form as @KayEss suggests. I still think that Pointers are correct for JSON Schema since they're in use anyway, though. Many platforms don't care about strong typing, and we're trying to be platform-agnostic (this coming from a .Net developer). |
There is one advantage to them: the path "#/foo/1/bar" can be either |
On the other hand, pointers are more versatile. The same path is compatible with Many languages and platforms are not strongly-typed, and we're trying to remain platform-agnostic with JSON Schema (this coming from a .Net dev). |
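To make the pointer discussion concrete, here is a small sketch. It assumes the point being made is that a segment such as `1` can address either an array index or an object key, so one pointer resolves against either shape, whereas a typed array path has to commit to one. The resolver is a simplified RFC 6901 implementation in URI fragment form; the sample documents are invented.

```python
from urllib.parse import unquote


def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer given as a URI fragment, e.g. "#/foo/1/bar"."""
    if pointer.startswith("#"):
        pointer = unquote(pointer[1:])
    if pointer == "":
        return document
    current = document
    for token in pointer.split("/")[1:]:
        # RFC 6901 unescaping: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # segment used as an array index
        else:
            current = current[token]  # segment used as an object key
    return current


with_array = {"foo": ["zero", {"bar": "from array"}]}
with_object = {"foo": {"1": {"bar": "from object"}}}

print(resolve_pointer(with_array, "#/foo/1/bar"))
print(resolve_pointer(with_object, "#/foo/1/bar"))
```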
Sure. I wasn't advocating for arrays. If I mentioned them at all it was just because that's what we have as our implementation is built on them. |
Is there a current advised way of setting custom errors for validations? |
@bovas85 this issue is discussing the shape of the response. If you want to set custom error messaging, you should contact the owner of the implementation you're using. We're not covering that here. |
@bovas85 @gregsdennis In theory, if you get the path to the schema section that failed (which is what this issue IS covering), one could add (unspecified) annotation fields to contain your custom errors. It wouldn't be baked into your validator, but you could write application code on top to make it work. |
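A sketch of what such application code could look like, under stated assumptions: the validator reports the failing keyword as a fragment pointer into the schema, and the schema author has added a custom field, here called `x-errorMessages`, which is a made-up name and not part of any specification. Pointer escaping is ignored for brevity.

```python
def lookup(schema, location):
    """Follow a "#/a/b" style pointer into the schema (no escaping handled)."""
    node = schema
    for token in location.lstrip("#").split("/"):
        if token == "":
            continue
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def custom_message(schema, keyword_location, default):
    """Return the author's message for a failing keyword, else the default."""
    # The parent of the failing keyword is the subschema that owns it.
    parent_location, keyword = keyword_location.rsplit("/", 1)
    subschema = lookup(schema, parent_location)
    messages = subschema.get("x-errorMessages", {})  # hypothetical annotation field
    return messages.get(keyword, default)


schema = {
    "properties": {
        "zip": {
            "pattern": "^[0-9]{5}$",
            "x-errorMessages": {"pattern": "Please enter a 5-digit ZIP code."},
        }
    }
}

msg = custom_message(schema, "#/properties/zip/pattern", "does not match pattern")
print(msg)
```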
I thought the schema itself would allow specifying custom errors as the ones provided by the pattern or filter are not adequate, but I guess this is taken care of using libs like AJV |
@bovas85 There is no defined standard error message that's required. That's the point of this issue. Only current requirement is an assertion of valid or not. Anything else is currently up to each implementation. |
I took my first pass at reading this and I've got some of the following comments: Does this have to be in JSON Schema Core, or any of the specifications? Or can it just be a schema and/or document on its own? I'm wary of specifying different verbosity levels, that suggests that both implementations and programs must be prepared to process any of them. That sounds wasteful. We should survey implementations and see how they produce output. I can do that with a few weeks of time. "error" to me includes any problems during computation, for example an invalid schema. We should consider having two classes of errors to distinguish between "server-side" and "client-side" errors. Maybe call the latter "Exceptions" because they describe an exception to the assertions that the schema is making about the instance. All in all it sounds like we're getting dangerously close to just wanting to define a WebIDL interface for results (which can be implemented in all sorts of languages in a single way). Question for everyone: Who is going to be the target audience(s) for reading the output of this standard output format (who would prefer reading a JSON document over my hypothetical WebIDL interface)? |
@bovas85 the thing you're asking about has been discussed before in #148. After this issue is resolved, it may be worth revisiting that idea, in combination with the concept of annotation keywords. Annotations are much more well-developed now than when #148 was discussed and closed, and the output format being worked on in this issue would provide a clear way to convey such error message annotations back to the implementation. My preference would be to let folks experiment with extension keywords, now that extensions will be easier with |
This has been concerning me since seeing the three levels (but I was basically on my way out of town at that point, and then got sick as soon as I got back, so I haven't had a chance to think on it more until now). On the other hand, the output format is RECOMMENDED (or SHOULD), not MUST, so we just need to figure out where we want to balance cost vs interoperability. For the most part, the person running the validator can choose which validator to use, and write their code appropriately. It is not like the schema keywords themselves which the schema author and implementation must agree upon in order for anything to work.
This is a good question, and in Hyper-Schema draft-07 we just provide a recommended output schema, and briefly reference it in the specification text. And use it in examples. But it was pretty straightforward to come up with that format- it's basically the resolved URI Templates and resolved relative pointers, IIRC. This output format is substantially more complex, and was substantially more difficult to design. It has also attracted a broader audience commenting on it than pretty much anything else in draft-08 ( Finally, the output format includes annotation values, and is the last step needed to be able to build interoperable tools that rely on annotations. Since annotation usage is application-specific, we don't need to make the output a MUST, but setting a clear expectation of what annotation collection SHOULD provide will make building an ecosystem of annotation-based tools much easier. So I would prefer to have it in the spec- I've only skimmed the PR at this point, but I particularly like the explanation of the rules governing the structure. That will help authors of extension keywords understand how their extensions will affect the output, beyond just validation results. |
Oops- forgot a bit. Where I was going with the "this has attracted more input" part is that it demonstrates a lot of interest in having this. I suppose that does not necessarily mean that this needs to be solved in the spec, but at least a certain level of officialness seems appropriate. |
Answers to Austin's questions
This was longer than I intended...
We were stuck in indecision hell for months regarding the detailed vs verbose output options; this was a compromise that everyone appeared to be happy to accept. The goal is that a library can then claim "We support output levels up to detailed", meaning support for "flag, basic and detailed", so you know if it will work in your app or not without re-work. If we don't have levels there will be apps that have output that in no way follows any guidance, which reduces the overall benefit. I suspect that perhaps we need to make it super clear that if you support "detailed" or "verbose" you should support all the variations below it to claim support by your library of the output spec.
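The "supports levels up to X" claim can be modelled as an ordered enumeration. This is only a sketch of the idea described above; the level names come from this thread, while the class and function names are invented for illustration.

```python
from enum import IntEnum


class OutputLevel(IntEnum):
    """Output levels in increasing order of detail."""
    FLAG = 0
    BASIC = 1
    DETAILED = 2
    VERBOSE = 3


def supported_levels(highest):
    """A library claiming `highest` is assumed to support every level below it."""
    return [level for level in OutputLevel if level <= highest]


def can_serve(library_max, requested):
    """True if a library with the given maximum can produce the requested level."""
    return requested <= library_max


print([level.name.lower() for level in supported_levels(OutputLevel.DETAILED)])
print(can_serve(OutputLevel.BASIC, OutputLevel.VERBOSE))
```

An application can then check a library's advertised maximum once, instead of probing each format separately.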
I already went through about 40 implementations and while the bulk supported "flag" and array based "basic", there was enough supporting "detailed" and "verbose" hierarchy styles that I no longer felt I could ignore its value to libraries that use it and their users.
I like the idea of error codes, like tv4 has; then internal errors would just have a code range while still adhering to the spec. I think we started that discussion, but it was determined that we should get something agreed on into a draft so it can be improved and evolve in later drafts based on real-world feedback as to whether codes should be included in the spec.
For most libraries the output is used by an application to then display the error within the GUI. There was mention of making it more human readable at one point, but when the main reason for that was debugging, then that would be better handled by using a debugging script to translate when needed. The basic and detailed output are designed for easy machine looping. Does that answer your question the way it was intended? I feel like it doesn't. |
@Anthropic thanks for catching us up on the work to date. I know it's always a bit difficult when someone parachutes in at the end of a long discussion 😛 Anyway, it sounds like the multi-level thing has gotten a lot of vetting and comparison to the existing real world already, so I'm comfortable putting it into a draft. If there's a mass revolt we'll change it, that's what drafts are for and we know we'll be doing at least one more before moving further into a standards process.
I agree with @Anthropic that while this is an interesting topic, it expands the scope of this work and should be separated out. Most users think of instance-against-schema errors as the real "errors" of interest. I feel like the other class of errors reduces to that in the form of schema-against-meta-schema errors. Although perhaps I misunderstand. In any event, unless you see something here that will prevent us from addressing this at all in the future, I would like to stay within the proposal's existing scope. Regarding output, my impression is that we are defining the output in terms of the JSON data model (as everything else in JSON Schema is defined), and showing examples of it in JSON (because... JSON Schema 😁 ). So if someone wants to build some other interface to that data, that is fine, and not relevant to specification conformance. Obviously if you are writing output to a file or pipe or something, you need a serialization format and the obvious one is JSON, but I don't think it's mandatory any more than having all schemas and instances be JSON- if you can parse it into or out of the data model, that's sufficient. |
@awwright
It will no doubt be looked at again after feedback, to see if providing a defined set of standard error codes somewhere adds as much value as I suspect it would for some implementation scenarios. |
This issue is to summarize the discussions and progress of #396, which sought to develop a standard output format for JSON Schema.
This issue does not cover standardized error message wording. This issue is for output formatting only.
Data Requirements
Validation outcome
This will be a simple boolean value indicating whether the instance passed validation.
Annotations/Errors
Since annotations are only collected when validation passes and errors only occur when validation fails, these two sets are mutually exclusive and will never appear in the result set together. However, they do share a similar (if not the same) structure.
$ref segments in the path
$refs (otherwise, the two locations will be the same).
Instance data
We had discussed including the instance data in the response, but we determined that it was sufficient to provide a pointer (see Instance location above).
Copying the data into the output can still be an option for implementations, however.
Format
Both proposed formats return an object containing the validation outcome and a list of annotations or errors. The unresolved bit is how to represent annotations and errors generated from subschemas.
NOTE I will be using errors for these examples, but annotations could be used in their place. As mentioned before their structure is similar.
For these examples, we will use the following schema and (invalid) instances
Flat
This proposal outputs all annotations and errors in a flat list. Because it's a flat list, it's easy to both build and consume.
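As an illustration of why a flat list is easy to consume, here is a minimal sketch. The result shape and field names (`valid`, `errors`, `keywordLocation`, `instanceLocation`, `message`) are assumptions for the example, and the sample errors are invented.

```python
from collections import defaultdict

# Hypothetical flat result; field names are assumptions for illustration.
flat_result = {
    "valid": False,
    "errors": [
        {"keywordLocation": "#/properties/name/type", "instanceLocation": "#/name", "message": "expected a string"},
        {"keywordLocation": "#/properties/age/minimum", "instanceLocation": "#/age", "message": "must be >= 0"},
        {"keywordLocation": "#/properties/age/type", "instanceLocation": "#/age", "message": "expected an integer"},
    ],
}


def errors_by_instance(result):
    """Group messages by the instance location they refer to, in one pass."""
    grouped = defaultdict(list)
    for error in result["errors"]:
        grouped[error["instanceLocation"]].append(error["message"])
    return dict(grouped)


for location, messages in errors_by_instance(flat_result).items():
    print(location, messages)
```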
Hierarchical
This proposal arranges the errors in a hierarchical format based on the schema.
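A sketch of what a consumer gains from the nesting, assuming a node shape with a child list under `errors` (the field names and sample data are invented): for a oneOf with two failing branches, the errors belonging to each branch can be read straight off the tree.

```python
def count_leaves(node):
    """Count the leaf errors underneath a node of a hierarchical result."""
    children = node.get("errors", [])
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)


# Hypothetical hierarchical node for a oneOf whose two subschemas both failed.
one_of_node = {
    "keywordLocation": "#/oneOf",
    "errors": [
        {
            "keywordLocation": "#/oneOf/0",
            "errors": [
                {"keywordLocation": "#/oneOf/0/type", "message": "expected an object"},
                {"keywordLocation": "#/oneOf/0/required", "message": "missing property"},
                {"keywordLocation": "#/oneOf/0/minProperties", "message": "too few properties"},
            ],
        },
        {
            "keywordLocation": "#/oneOf/1",
            "errors": [
                {"keywordLocation": "#/oneOf/1/type", "message": "expected a number"},
                {"keywordLocation": "#/oneOf/1/minimum", "message": "below minimum"},
            ],
        },
    ],
}

# Each child of the oneOf node is one branch; fixing any single branch is enough.
per_branch = {branch["keywordLocation"]: count_leaves(branch) for branch in one_of_node["errors"]}
print(per_branch)
```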
The primary argument against the flat structure is that it's difficult (both for machines and humans) to see any association between the errors. For example, in instance #1 results above, there is no immediate indication that the first three errors pertain to the first subschema of the oneOf and the last two errors pertain to the second subschema. Moreover it becomes more difficult to understand that either the first three or the last two must be resolved to pass the instance, but not all five. The hierarchical format aims to make that association more apparent.
The construction of this object requires rules that would need to be included in the specification, particularly the conditions that are required for a node to be present.
*Of, $ref, if/then/else, etc.) require a node.
An (unoptimized) algorithm for this may be
Other considerations
Though the crux of this issue is the above, some other proposals have been made relating to this topic.
Implementation domain
The specification could allow for both of these formats, allowing the implementation to choose.
Configurable output levels
The specification could define different output settings.