Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to check for common model issues #1124

Merged
merged 3 commits into from
Jun 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,9 @@ jobs:
sudo apt-get install -y -qq libicu-dev
pip install wheel pyicu
pip install -e ".[dev]"
- name: Run checks for default model
run: |
python contrib/check_model.py
- name: Run the tests
run: |
make test
Expand Down
131 changes: 131 additions & 0 deletions contrib/check_model.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
import sys
from collections import defaultdict
from followthemoney import model


IGNORE_DIVERGENT_TYPES = [
"author",
"organization",
"gender",
"number",
"authority",
"duration",
"area",
"subject",
"sender",
]

IGNORE_DIVERGENT_LABELS = [
"parent",
"holder",
"number",
"authority",
"criteria",
"procedure",
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question. Should we be concerned that there is both an authority type, and an authority label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What "divergent types" means: There are two properties with the name "authority" that have different types (haven’t checked it, but probably one is has the entity and the other name or something like that).

What "divergent labels" means: There are two properties with the same name, but they use different labels in the UI (e.g. CallForTenders:authority has the label "Name of contracting authority" while Sanction:authority has the label "Authority").

We should be concerned about all of these issues. However, there are some issues that can only be resolved with breaking changes, so I’ve added them to the ignore list for now (otherwise it would break CI).

IGNORE_LABEL_COLLISIONS = [
"Address",
"Notes",
"Customs declarations",
"Country of origin",
"Payments received",
"Payments made",
"Entity",
"The language of the translated text",
"Responding to",
"ISIN",
"Document number",
]


def test_divergent_types(by_name):
divergent = {}

for name, props in by_name.items():
if len(props) == 1 or name in IGNORE_DIVERGENT_TYPES:
continue

types = set([p.type for p in props])
if len(types) > 1:
divergent[name] = props

return divergent


def test_divergent_labels(by_name):
divergent = {}

for name, props in by_name.items():
if len(props) == 1 or name in IGNORE_DIVERGENT_LABELS:
continue

labels = set([p.label for p in props])
if len(labels) > 1:
divergent[name] = props

return divergent


def test_label_collisions(by_label):
collisions = {}

for label, props in by_label.items():
if len(props) == 1 or label in IGNORE_LABEL_COLLISIONS:
continue

names = set([p.name for p in props])
if len(names) > 1:
collisions[label] = props

return collisions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three functions above: test_divergent_types, _labels, _collisions all use the same basic pattern. Would it be worth pulling this out into a generic function to reduce duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While they use the same structure, they do different things, and abstracting the generic structure would probably require passing a bunch of parameters or predicates as lambdas. I’m not sure this would make it easier to understand/maintain tbh


if __name__ == '__main__':
by_name = defaultdict(set)
by_label = defaultdict(set)

for schema in model:
for prop in schema.properties.values():
by_name[prop.name].add(prop)
by_label[prop.label].add(prop)

divergent_types = test_divergent_types(by_name)
divergent_labels = test_divergent_labels(by_name)
label_collisions = test_label_collisions(by_label)

failed = False

if divergent_types:
failed = True
print("DIVERGENT TYPES\n")
for name, props in divergent_types.items():
print(f" {name}:")
for prop in props:
print(f" * {prop.qname} - {prop.type.name}")
print()
print()

if divergent_labels:
failed = True
print("DIVERGENT LABELS\n")
for name, props in divergent_labels.items():
print(f" {name}:")
for prop in props:
print(f" * {prop.qname} - {prop.label}")
print()
print()

if label_collisions:
failed = True
print("COLLIDING LABELS\n")
for label, props in label_collisions.items():
print(f" {label}:")
for prop in props:
print(f" * {prop.qname}")
print()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth extracting this into a function to reduce replication?

if failed:
sys.exit(1)

print("No issues found.")
21 changes: 0 additions & 21 deletions contrib/collisions.py

This file was deleted.

4 changes: 2 additions & 2 deletions followthemoney/schema/CallForTenders.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -54,14 +54,14 @@ CallForTenders:
label: "Call for tenders result"
description: "The nature of the contractual agreement that will result from this CfT"
cpvCode:
label: "CPV codes"
label: "CPV code"
description: "Common Procurement Vocabulary (CPV)"
type: identifier
reverseAuctionsIncluded:
label: "Inclusion of e-Auctions"
# cf. https://mita.gov.mt/2021/10/05/reverse-bidding-for-data-connectivity-across-the-public-service-a-first-for-malta/
nutsCode:
label: "NUTS codes"
label: "NUTS code"
description: "Nomenclature of Territorial Units for Statistics (NUTS)"
type: identifier
relationToThreshold:
Expand Down
2 changes: 1 addition & 1 deletion followthemoney/schema/Contract.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ Contract:
- contractDate
properties:
title:
label: "Contract title"
label: "Title"
type: string
authority:
label: "Contract authority"
Expand Down
4 changes: 3 additions & 1 deletion followthemoney/schema/ContractAward.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ ContractAward:
type: entity
range: Contract
callForTenders:
label: "Call For Tender"
label: "Call For Tenders"
type: entity
reverse:
name: contractAwards
Expand All @@ -58,9 +58,11 @@ ContractAward:
cpvCode:
label: "CPV code"
description: "Contract Procurement Vocabulary (what type of goods/services, EU)"
type: identifier
nutsCode:
label: "NUTS code"
description: "Nomencalture of Territorial Units for Statistics (NUTS)"
type: identifier
amended:
label: "Amended"
description: "Was this award amended, modified or updated by a subsequent document?"
2 changes: 1 addition & 1 deletion followthemoney/schema/Post.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,6 @@ Post:
label: "Organization"
type: string
wikidataId:
label: "Wikidata ID of the post"
label: "Wikidata ID"
hidden: true
type: identifier
2 changes: 1 addition & 1 deletion followthemoney/schema/Security.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ Security:
label: Registration number
type: identifier
ticker:
label: Ticker
label: Stock ticker symbol
type: identifier
issuer:
label: "Issuer"
Expand Down