compile-releases: Store warnings (in collection_note table) #222
Comments
Compilation doesn't presently check for this, but new warnings can be implemented. These would still have to be captured and reported by Kingfisher. Can you share a specific release that has this issue, to be sure that I understand the problem?
Sure. In collection 606, the release with ocid = 'ocds-b5fd17-2f6874ab-adc8-11e6-9901-0019b9f3037b' has 37 suppliers with distinct names, all with id = 0. There is no parties array, so this is the deprecated (1.0) use of suppliers anyway (the suppliers have contactPoint and address fields). The identifier fields are empty or null. There is just one release with that OCID. Truncated data:
The same OCID in compiled_release collection 608 has one supplier (truncated):
This supplier is the final object in the award.suppliers array from collection 606. Same can be seen from running
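To illustrate why only the final supplier survives, here is a minimal sketch of an id-keyed array merge. It is a simplified stand-in, not ocdsmerge's actual implementation, and the supplier names are made up for the example.

```python
def merge_array_by_id(items):
    """Simplified illustration: objects sharing an id are treated as the
    same object, so fields of later objects overwrite earlier ones."""
    merged = {}
    for item in items:
        # Objects with an id already seen update the existing entry
        # instead of adding a new one.
        merged.setdefault(item["id"], {}).update(item)
    return list(merged.values())


# Hypothetical data mirroring the report: distinct names, identical ids.
suppliers = [
    {"id": 0, "name": "Supplier A"},
    {"id": 0, "name": "Supplier B"},
    {"id": 0, "name": "Supplier C"},
]

compiled_suppliers = merge_array_by_id(suppliers)
```

With three suppliers that all have id 0, the result contains a single object carrying the last supplier's name, which matches what is seen in collection 608.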
I've released a new version of ocdsmerge that will issue a Python warning if there are duplicate IDs in a single release.
Warnings are of the class
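For reference, Python warnings raised during compilation could be captured with the standard `warnings` module and turned into notes. The sketch below is only an assumption about how that might look: `DuplicateIdWarning`, `compile_release` and `compile_with_notes` are hypothetical stand-ins, not the names used by ocdsmerge or Kingfisher.

```python
import warnings


class DuplicateIdWarning(UserWarning):
    """Hypothetical stand-in for the warning class issued by the merge library."""


def compile_release(releases):
    """Hypothetical stand-in for compilation: warns on repeated supplier ids
    and returns the last release unchanged."""
    for release in releases:
        seen = set()
        for supplier in release.get("suppliers", []):
            if supplier["id"] in seen:
                message = f"Multiple objects have the id {supplier['id']!r} in suppliers"
                warnings.warn(message, DuplicateIdWarning)
            seen.add(supplier["id"])
    return releases[-1]


def compile_with_notes(releases):
    """Run compilation and return (compiled_release, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" prevents repeated warnings from the same line being suppressed.
        warnings.simplefilter("always", category=DuplicateIdWarning)
        compiled = compile_release(releases)
    notes = [str(w.message) for w in caught if issubclass(w.category, DuplicateIdWarning)]
    return compiled, notes


compiled, notes = compile_with_notes([{"suppliers": [{"id": 0}, {"id": 0}, {"id": 0}]}])
```

Each entry in `notes` could then be written as one row alongside the compiled collection, so an analyst can see which releases triggered a warning.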
I released a new version of ocdsmerge that now offers some alternatives to the default merge routine. See the documentation: https://ocds-merge.readthedocs.io/en/latest/handle-bad-data.html However, since Kingfisher doesn't implement a routing slip pattern (discussed here) and instead sets global behaviors for all collections, it's not possible to, e.g., have only the Contracts Finder collection use a different merge routine. So, for now, Kingfisher can just use the warnings above.
Contracts Finder seems to have fixed their issue. For now, I added a commit, so that these warnings are stored in […]
OCDS Merge's rules are expressed as […]
rule_overrides = tuple(collection.options.get("merge_rules", {}).items())
since arguments to […]
rule_overrides = {tuple(path.split("/")): getattr(ocdsmerge, attribute) for path, attribute in rule_overrides}
return ocdsmerge.Merger(patched_schema, rule_overrides=rule_overrides)
As an analyst I need to know whether the compilation script encounters data quality issues in the source data in order to judge whether the compiled collection may be missing data.
Example - in uk_contracts_finder collection 606, all award/suppliers entries have the same id (and there is no parties array). Compiling into collection 608 means all but one supplier is removed.
It would be helpful for compile warnings to be stored in the compiled release collection.
This should include warning about any array with non-unique local ids.
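As a rough sketch of what a check for non-unique local ids could look like, the function below walks a release and reports every array in which an id occurs more than once. The function name and the path format are assumptions made for this example, not part of Kingfisher or ocdsmerge.

```python
from collections import Counter


def find_duplicate_ids(data, path=""):
    """Yield (array_path, id_value, count) for each id that occurs more than
    once among the objects of a single array."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from find_duplicate_ids(value, f"{path}/{key}")
    elif isinstance(data, list):
        # Count ids only among objects that actually have an "id" field.
        ids = Counter(
            item["id"] for item in data if isinstance(item, dict) and "id" in item
        )
        for id_value, count in ids.items():
            if count > 1:
                yield path, id_value, count
        # Descend into each element to check nested arrays as well.
        for index, item in enumerate(data):
            yield from find_duplicate_ids(item, f"{path}/{index}")


# Hypothetical release shaped like the example: suppliers nested under an award.
release = {
    "ocid": "example-ocid",
    "awards": [
        {
            "id": "a1",
            "suppliers": [
                {"id": 0, "name": "Supplier A"},
                {"id": 0, "name": "Supplier B"},
                {"id": 0, "name": "Supplier C"},
            ],
        }
    ],
}

duplicates = list(find_duplicate_ids(release))
```

Running this on the sample release reports the suppliers array of the first award, with id 0 occurring three times.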