Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to check for common model issues #1124

Merged
merged 3 commits into from
Jun 22, 2023
Merged

Conversation

tillprochaska
Copy link
Contributor

@tillprochaska tillprochaska commented Jun 9, 2023

This change adds a script to test for common issues with the default FtM model:

  • Divergent types: Multiple properties with the same name, but different types
  • Divergent labels: Multiple properties with the same name, but different labels
  • Label collisions: Multiple properties with different names, but using the same label

These issues can cause problems for example in Aleph. For example, divergent types can cause errors when querying multiple Elasticsearch indexes. Divergent labels result in a confusing user experience.

I have also updated some schema definitions to use consistent labels and types (where types are compatible and the change is not a breaking change).

Current issues:

 DIVERGENT TYPES

  author:
  * Document:author - string
  * Assessment:author - entity

  organization:
  * Directorship:organization - entity
  * Membership:organization - entity
  * Post:organization - string

  area:
  * License:area - string
  * RealEstate:area - number

  authority:
  * Identification:authority - string
  * Contract:authority - entity
  * CallForTenders:authority - entity
  * Sanction:authority - string

  duration:
  * Video:duration - number
  * Sanction:duration - string
  * Audio:duration - number
  * Call:duration - number

  gender:
  * Person:gender - gender
  * Passport:gender - string

  subject:
  * Message:subject - string
  * UnknownLink:subject - entity
  * Email:subject - string

  sender:
  * Email:sender - string
  * EconomicActivity:sender - entity
  * Message:sender - entity

  number:
  * Identification:number - identifier
  * UserAccount:number - phone


DIVERGENT LABELS

  parent:
  * LegalEntity:parent - Parent company
  * Document:parent - Folder

  holder:
  * CryptoWallet:holder - Wallet holder
  * Post:holder - Holder
  * Identification:holder - Identification holder

  authority:
  * Identification:authority - Authority
  * Contract:authority - Contract authority
  * CallForTenders:authority - Name of contracting authority
  * Sanction:authority - Authority

  procedure:
  * Contract:procedure - Contract procedure
  * CallForTenders:procedure - Procedure

  criteria:
  * Similar:criteria - Matching criteria
  * Contract:criteria - Contract award criteria

  number:
  * Identification:number - Document number
  * UserAccount:number - Phone Number


COLLIDING LABELS

  The language of the translated text:
  * Page:translatedTextLanguage
  * Document:translatedLanguage

  Address:
  * Thing:address
  * Thing:addressEntity
  * CryptoWallet:publicKey

  Notes:
  * Thing:noteEntities
  * Thing:notes

  Document number:
  * Identification:number
  * ContractAward:documentNumber

  Customs declarations:
  * Vehicle:declaredCustoms
  * LegalEntity:economicActivityDeclarant
  * BankAccount:contractBankAccount

  ISIN:
  * Company:isinCode
  * Security:isin

  Country of origin:
  * LegalEntity:mainCountry
  * EconomicActivity:originCountry

  Payments received:
  * LegalEntity:paymentBeneficiary
  * BankAccount:paymentBeneficiaryAccount

  Payments made:
  * BankAccount:paymentPayerAccount
  * LegalEntity:paymentPayer

  Responding to:
  * Email:inReplyToEmail
  * Message:inReplyToMessage

  Entity:
  * Sanction:entity
  * Note:entity
  * Mention:resolved
  * Documentation:entity

@tillprochaska tillprochaska force-pushed the feature/model-checks branch 3 times, most recently from cf3efbd to 28e8d6a Compare June 9, 2023 07:53
@tillprochaska tillprochaska changed the title WIP Add lint script for default model WIP Add lint script to check for common model issues Jun 9, 2023
@tillprochaska tillprochaska changed the title WIP Add lint script to check for common model issues WIP Add script to check for common model issues Jun 9, 2023
@tillprochaska tillprochaska force-pushed the feature/model-checks branch from e373f1b to 610e1c9 Compare June 9, 2023 09:06
This change adds a script to test for common issues with the default FtM model:

* Divergent types: Multiple properties with the same name, but different types
* Divergent labels: Multiple properties with the same name, but different labels
* Label collisions: Multiple properties with different names, but using the same label

These issues can cause problems for example in Aleph. For example, divergent types can cause errors when querying multiple Elasticsearch indexes. Divergent labels result in a confusing user experience.
@tillprochaska tillprochaska force-pushed the feature/model-checks branch from 610e1c9 to 23dac13 Compare June 9, 2023 09:11
@tillprochaska tillprochaska marked this pull request as ready for review June 9, 2023 09:31
@tillprochaska tillprochaska changed the title WIP Add script to check for common model issues Add script to check for common model issues Jun 9, 2023
"criteria",
"procedure",
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question. Should we be concerned that there is both an authority type, and an authority label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What "divergent types" means: There are two properties with the name "authority" that have different types (haven’t checked it, but probably one is has the entity and the other name or something like that).

What "divergent labels" means: There are two properties with the same name, but they use different labels in the UI (e.g. CallForTenders:authority has the label "Name of contracting authority" while Sanction:authority has the label "Authority").

We should be concerned about all of these issues. However, there are some issues that can only be resolved with breaking changes, so I’ve added them to the ignore list for now (otherwise it would break CI).

collisions[label] = props

return collisions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three functions above: test_divergent_types, _labels, _collisions all use the same basic pattern. Would it be worth pulling this out into a generic function to reduce duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While they use the same structure, they do different things, and abstracting the generic structure would probably require passing a bunch of parameters or predicates as lambdas. I’m not sure this would make it easier to understand/maintain tbh

for prop in props:
print(f" * {prop.qname}")
print()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth extracting this into a function to reduce replication?

@tillprochaska tillprochaska merged commit 7a9de99 into main Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants