Report schema paths in validation errors #199

Stranger6667 · 2021-04-29T17:46:12Z

When a validation error is reported, it contains only instance_path that points to the erroneous part of the input, but there is no information about what part of the schema caused this error.

The gist of the implementation is to keep a stack of strings while the input schema is compiled. The stack should be convertible to paths::JSONPointer.

The process for regular, non-composite validators (more on this below):

When the compilation process enters a validator, push its relevant keyword name to the stack (e.g. properties);
When the validator struct is created, clone this stack and store it within this struct;
When the validator is compiled - pop from the stack

I think that pushing/popping could be implemented similarly to instance_path - a linked list without heap allocations. This way, the actual popping step is not needed at all.

Composite validators are slightly more complicated - they are single structs that implement the logic for multiple different keywords or their specific variants (e.g. this one handles patternProperties and additionalProperties: false) which is done due to performance reasons.

To handle them, the stack should not be pushed during compilation, but rather during validation - if a validation error happens, the stored stack should be cloned, and the exact keyword variant should be pushed to it. And finally, it is moved to ValidationError.

Validators live in keywords.

A lot of work in this direction was done by @gavadinov in gavadinov@4014932, and I think that it could be a great base for implementing this feature.

@gavadinov, let me know if you are OK if somebody can base their work on your commit, or if you want to continue it yourself :)

The text was updated successfully, but these errors were encountered:

DzikiChrzan · 2021-04-30T12:08:04Z

May I take a swing on this?

Stranger6667 · 2021-04-30T12:10:14Z

Hi @DzikiChrzan!

Absolutely! Let me know if you have any questions on it :)

DzikiChrzan · 2021-05-01T12:01:16Z

I've been studying the code and I can't fully understand the flow of the data.
First of all, how does PathChunk looks like and what Index in it is?
Do you recommend some end-to-end tests that I could run with gdb to actually see the flow?

Instance is a path to json file, which is read via serde. At start there is called validate on Schema, which then calls crate::validate(self, instance) underneath. This is method from trait and I see that all keywords have those implementations, but I can't find out how validation by keyword is run on whole instance_json.

Stranger6667 · 2021-05-01T13:02:00Z

First of all, how does PathChunk looks like and what Index in it is?

PathChunk is an enum that represents either a key within a JSON object or an index within a JSON array. Then a sequence of such chunks represents any path within a JSON value. For example, let's take the following JSON data that represents a stored "ls -lh /home" command:

{
   "cmd": ["ls", "-lh", "/home"]
}

If we'd like to extract the "/home" string, then we need to take the following steps when navigating from the document root:

Go to the cmd property - it corresponds to PathChunk::Name("cmd".to_string())
Within the cmd's value, we have an array - ["ls", "-lh", "/home"]. Here, we need to take the element with index 2. It corresponds to PathChunk::Index(2)

When ValidationError is constructed we convert a sequence of such PathChunk that we accumulated during the validation into Vec<PathChunk>. It is done within InstancePath::to_vec and used to create JSONPointer. That JSONPointer struct is stored in ValidationError.instance_path attribute and then later can be converted to a string (which is a valid RFC 6901 JSON Pointer) or to Vec<String>.

The main point of this two-variant enum is to avoid creating heap-allocated strings for indexes within JSON arrays as usize is a Copy type.

I'll update the docs for these structs, and maybe PathChunk and its variants could be renamed to reflect better what they are? E.g. PathComponent with PropertyName and ArrayIndex variants ... what do you think?

Do you recommend some end-to-end tests that I could run with gdb to actually see the flow?

I think that is_valid_another_type or is_valid tests from keywords/mod.rs are relatively small & might be convenient for this purpose.

Instance is a path to json file, which is read via serde. At start there is called validate on Schema, which then calls crate::validate(self, instance) underneath. This is method from trait and I see that all keywords have those implementations, but I can't find out how validation by keyword is run on whole instance_json.

I'll try to describe the main blocks of the compilation process first, as the validation part uses its outcomes.

The schema compilation starts in compilation::options::CompilationOptions::compile, and its goal is to traverse the input schema and build a tree of validators.

This tree is actually a vector of pointers (then it is a forest, I believe) that reference validators that implement the Validate trait. Some of them may have their own vectors of pointers to deeper validators and so on.

Each validator there knows how to process their inputs. For example - if the input is Value::Array, the ItemsObjectValidator validator will iterate over array items and delegate validation of each item to the next level in the validation tree (which is stored in its validators attribute). I.e., validators know how to descend deeper into their inputs.

When the validation process starts inside JSONSchema::validate, it iterates over the top validators level and delegates validation to them. As they know how to process their respective inputs, the instance object is processed recursively in the same way as the tree was build during the compilation step. I.e., all the delegation goes through the validate method, which may call validate methods on "deeper" validators and so on.

I hope that it helps :) If not - I'd be happy to elaborate.

DzikiChrzan · 2021-05-01T16:11:29Z

E.g. PathComponent with PropertyName and ArrayIndex variants

I think that one's one understands point of PathChunk enum, then naming seems fine. Updating the docs is a good point.

Thank you for thorough explanation. I'll get back at you when I'll have design in mind, probably tommorow.

DzikiChrzan · 2021-05-03T15:05:40Z

I just noticed, that gavadinov@4014932 is not merged yet.
Should I work on top of his fork or is it going to be one pull request (his changes + this issue)?

Stranger6667 · 2021-05-03T15:24:01Z

Should I work on top of his fork or is it going to be one pull request (his changes + this issue)?

The best way is to use that commit as inspiration & implement this issue separately. It changes most of the places required to implement this issue, and that approach works.

The original goal of that commit was to track paths within instances, but the implementation was then re-worked in #184. Instances can't be unambiguously tracked by having data only from the input schema. However, the initial take in that commit could be extended to track paths within schemas.

After that point, I changed the way instance paths are tracked - it uses a linked list without allocating a vector for the path components. Probably something similar can be done for paths in schemas as well. Still, I'm happy with a vector-based stack as the compilation step performance is not that important compared to validation.

DzikiChrzan · 2021-05-10T18:12:56Z

Sorry for long no-response period, but I got some time off and was resting from computer.
I've prepared a draft, so far I've added schema_path as a vector of Strings which are pushed before compilation and poped right after (just for couple of validators yet).
I just wanted to confirm with you that's more or less design you had in mind.
Also, I don't like idea of changing mutability of context but currently I see no other way...

Stranger6667 · 2021-05-10T19:31:25Z

Hi @DzikiChrzan

Thank you, I'll check it tomorrow :)

Ref: Stranger6667#199

Ref: #199 Co-authored-by: ueco <ueco@libertymail.net> Co-authored-by: Dmitry Dygalo <dadygalo@gmail.com>

Stranger6667 · 2021-06-19T15:01:45Z

Resolved via #226

Thank you, @DzikiChrzan, for taking a step on this :) I added few more things on top in that PR

Stranger6667 added help wanted Extra attention is needed good first issue Good for newcomers Priority: High Type: Feature Difficulty: Medium labels Apr 29, 2021

Stranger6667 mentioned this issue Apr 29, 2021

[FEATURE] User-defined validation for the format keyword #158

Closed

Stranger6667 mentioned this issue May 5, 2021

Include path/key in error message ? #144

Closed

DzikiChrzan mentioned this issue May 10, 2021

Add schema paths to compilation context; this change requires changin… #226

Merged

Stranger6667 mentioned this issue Jun 17, 2021

Use meta-schemas to validate input schemas #225

Merged

Stranger6667 added a commit to DzikiChrzan/jsonschema-rs that referenced this issue Jun 19, 2021

feat: Report schema paths in validation errors

d279ceb

Ref: Stranger6667#199

Stranger6667 added a commit that referenced this issue Jun 19, 2021

feat: Report schema paths in validation errors

561f9cd

Ref: #199 Co-authored-by: ueco <ueco@libertymail.net> Co-authored-by: Dmitry Dygalo <dadygalo@gmail.com>

Stranger6667 closed this as completed Jun 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report schema paths in validation errors #199

Report schema paths in validation errors #199

Stranger6667 commented Apr 29, 2021 •

edited

Loading

DzikiChrzan commented Apr 30, 2021

Stranger6667 commented Apr 30, 2021

DzikiChrzan commented May 1, 2021

Stranger6667 commented May 1, 2021

DzikiChrzan commented May 1, 2021 •

edited

Loading

DzikiChrzan commented May 3, 2021

Stranger6667 commented May 3, 2021

DzikiChrzan commented May 10, 2021

Stranger6667 commented May 10, 2021

Stranger6667 commented Jun 19, 2021

Report schema paths in validation errors #199

Report schema paths in validation errors #199

Comments

Stranger6667 commented Apr 29, 2021 • edited Loading

DzikiChrzan commented Apr 30, 2021

Stranger6667 commented Apr 30, 2021

DzikiChrzan commented May 1, 2021

Stranger6667 commented May 1, 2021

DzikiChrzan commented May 1, 2021 • edited Loading

DzikiChrzan commented May 3, 2021

Stranger6667 commented May 3, 2021

DzikiChrzan commented May 10, 2021

Stranger6667 commented May 10, 2021

Stranger6667 commented Jun 19, 2021

Stranger6667 commented Apr 29, 2021 •

edited

Loading

DzikiChrzan commented May 1, 2021 •

edited

Loading