Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct reporting of oneOf (anyOf, allOf) errors #895

Closed
jpmckinney opened this issue Nov 19, 2017 · 24 comments
Closed

Correct reporting of oneOf (anyOf, allOf) errors #895

jpmckinney opened this issue Nov 19, 2017 · 24 comments
Assignees
Labels

Comments

@jpmckinney
Copy link

e.g. http://standard.open-contracting.org/validator/data/efa21228-0d98-4080-8c4f-ab63db58e1f6

"[very long JSON] is not valid under any of the given schemas"

The hundreds of rows of oneOf errors should be collected into a single error, not presented as distinct errors.

If that's not possible, then if we can collapse the long JSON (e.g. show a snippet and require the user to click on something to see the full JSON), it will make reviewing errors easier.

@robredpath
Copy link
Member

@duncandewhurst this is the issue you were looking for

@duncandewhurst
Copy link
Contributor

Looking at this example my understanding is that when validating a record, any errors found in the compiledRelease are reported individually but if there is an error in the array of embedded releases, the individual error isn't reported and instead you get "[very long JSON] is not valid under any of the given schemas"

How much work would it be to get individual reports of errors in the embedded releases? I think that would make the most sense for users

@jpmckinney
Copy link
Author

@robredpath mentioned having a solution from another standard (BODS, I think).

@jpmckinney
Copy link
Author

@robredpath Was this completed as part of the April sprint? (The sprint doc hadn't been updated to document the outputs.)

@robredpath
Copy link
Member

@jpmckinney oops, sorry - I've updated the doc now. This wasn't done as part of the April sprint.

@duncandewhurst
Copy link
Contributor

I came across another variant of this issue today when validating a record package from Uganda, the path to the error was records:

(giant block of JSON) has non-unique elements.

There is also something weird going on because OrderedDict is appearing in the big block of JSON, when it doesn't appear in the raw data:

image

@duncandewhurst
Copy link
Contributor

@Bjwebb
Copy link
Member

Bjwebb commented Aug 29, 2019

How much work would it be to get individual reports of errors in the embedded releases? I think that would make the most sense for users

Yes, we can do this in a few days. It's what we did for BODS, see these screenshots for an example. We replace the oneOf validation function from jsonschema with our own custom one, so we can modify this further to handle the OCDS case better.

The reason we have to do this on a case by case basis, rather than improving the generic oneOf function, is that in general there's no way of telling which sub-schema you intended to use.

For BODS we can tell which subschema is meant because they each have a different statementType.

In the OCDS case we need to tell whether you intended Linked releases or Embedded releases. The easiest way to do this is to look for the url field, as we only expect this for Linked releases.

Additionally, for BODS a single statement is nested within the oneOf, whereas for OCDS, the whole list of releases is. This means we need to handle the case where some releases look like Linked releases and others like Embedded releases.

(giant block of JSON) has non-unique elements.

This is a different case, so won't be affected by the above work. It is easy to just remove the large block. I suggest creating a new issue for doing any more than this.

There is also something weird going on because OrderedDict is appearing in the big block of JSON, when it doesn't appear in the raw data:

This is because the jsonschema library displays the Python representation of the data, rather than the JSON representation. (Whereas in our custom code for oneOf we use the JSON representation).

@robredpath
Copy link
Member

Thanks, @Bjwebb

In the OCDS case we need to tell whether you intended Linked releases or Embedded releases. The easiest way to do this is to look for the url field, as we only expect this for Linked releases.

Presumably this only works if the error the user has made isn't to do with that field? This certainly sounds sufficient for a dramatic improvement over current behaviour, but am I right in thinking this algorithm could be a good candidate for future improvement to be smarter at detecting what the user was trying to do?

Also - I'd like to document the very helpful rationale here - "we are assuming that this is a Linked Release becuase the url field is only found in Linked Releases" - both in the code and the interface.

I've created #1220 for the 'giant block of JSON' issue.

Presumably when we stop sending large blocks of JSON to the user we'll also stop sending representations of the Python objects that contain them?

@Bjwebb
Copy link
Member

Bjwebb commented Sep 10, 2019

Presumably this only works if the error the user has made isn't to do with that field? This certainly sounds sufficient for a dramatic improvement over current behaviour, but am I right in thinking this algorithm could be a good candidate for future improvement to be smarter at detecting what the user was trying to do?

That's right.

Separately, me and @duncandewhurst discussed that it's better to look for id, which only appears in an Embedded release, because there's a proposal to add url to the Embedded release also.

Also - I'd like to document the very helpful rationale here - "we are assuming that this is a Linked Release becuase the url field is only found in Linked Releases" - both in the code and the interface.

I agree. The interface could be a bit fiddly, but its worth doing.

Presumably when we stop sending large blocks of JSON to the user we'll also stop sending representations of the Python objects that contain them?

Yes, we'll stop showing the Python representations. Note that they could appear again for different error messages, if we start using certain json schema features (e.g. anyOf instead of oneOf).

Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Sep 19, 2019
Make an educated guess at the correct subschema, and use the validation
messages from that.
Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Sep 24, 2019
@Bjwebb
Copy link
Member

Bjwebb commented Sep 24, 2019

@jpmckinney
Copy link
Author

jpmckinney commented Sep 24, 2019

What specific changes to the results should I be looking for?

This array should contain either entirely Embedded releases or Linked releases. Embedded releases contain an id whereas Linkedreleases do not. Your releases contain a mixture.

Please use sentence case and spaces between 'Linkedreleases'.

I think it'd be helpful to link to documentation, rather than to describe that one has id and the other doesn't (a user might interpret that as simply deleting or adding id as a quick fix).

@Bjwebb
Copy link
Member

Bjwebb commented Sep 24, 2019

What specific changes to the results should I be looking for?

Instead of "valid under any of the given schemas" (current behaviour) there are now more specific validation messages from one of the subschemas.

@jpmckinney
Copy link
Author

Aha! It's much better!

Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Sep 26, 2019
Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Sep 26, 2019
Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Sep 26, 2019
Bjwebb added a commit that referenced this issue Sep 26, 2019
Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Sep 26, 2019
Bjwebb added a commit that referenced this issue Sep 26, 2019
Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Oct 1, 2019
@Bjwebb
Copy link
Member

Bjwebb commented Oct 1, 2019

I've added text about assumptions of linked/embedded releases to the web UI: [demo]

Screenshot from 2019-10-01 15-17-19

Currently these appear for every relevant validation error message, because that was the easiest thing to do. Does this need changing before this work is merged?

@jpmckinney
Copy link
Author

jpmckinney commented Oct 1, 2019

date is required whether it is an embedded or linked release, which produces double the number of error reports. However, only date and tag have this issue, so I think that's acceptable for now.

Please create an issue to remove this repetition at a later date.

If there are very many errors, the texts about the assumptions will be very repetitive. In that case, it might be preferable to group the errors by assumption/subschema. However, this is already an improvement, so can be deployed as-is.

Please create an issue about grouping errors by subschema, so that the texts about the assumptions are not repeated.

@Bjwebb
Copy link
Member

Bjwebb commented Oct 2, 2019

New issues:
Remove repeated error reports - https://github.com/OpenDataServices/cove/issues/1234
Grouping errors by subschema - https://github.com/OpenDataServices/cove/issues/1233

Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Oct 8, 2019
Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Oct 8, 2019
Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Oct 8, 2019
Bjwebb added a commit to OpenDataServices/lib-cove that referenced this issue Oct 8, 2019
Bjwebb added a commit that referenced this issue Oct 11, 2019
Bjwebb added a commit that referenced this issue Oct 22, 2019
[#895] Better oneOf validation messages for records
@Bjwebb Bjwebb closed this as completed Nov 5, 2019
@duncandewhurst
Copy link
Contributor

This issue still seems to be present with Zambia's data: https://standard.open-contracting.org/review/data/d7b728ba-e43b-462e-9cac-6e24e5b12325

To reproduce, download the July 2019 record package from ZPPA's download page and submit it to CoVE.

@duncandewhurst duncandewhurst reopened this Nov 8, 2019
@Bjwebb
Copy link
Member

Bjwebb commented Nov 14, 2019

Looks as I'd expect. Could you check again and screenshot the problem?

@duncandewhurst
Copy link
Contributor

Hmm on clicking the link above it seems to be resolved, but on uploading the file afresh the issue reappears:

image

Zambia's portal seems to be down at the moment, so I uploaded the file here

@Bjwebb
Copy link
Member

Bjwebb commented Nov 26, 2019

I've identified the source of the problem here, which is my code doesn't work properly with 1.0. This is because it looks for the title of the releases array, which isn't there in 1.0. I'm guessing the reason it sometimes works is that you've selected 1.1 from the dropdown.

Today I've fixed a bug with arrays being falsely flagged as non-unique #1246. There's now no validation errors with that Zambia file, including none to do with oneOfs. The underlying problem is still there though, and can be reproduced by changing the version in any of the fixtures.

@Bjwebb
Copy link
Member

Bjwebb commented Dec 19, 2019

The lib-cove commits haven't made their way into a release yet, so not into CoVE.

@Bjwebb
Copy link
Member

Bjwebb commented Jan 14, 2020

The updated lib-cove is now live on https://standard.open-contracting.org/review/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants