Add script to check for common model issues #1124

tillprochaska · 2023-06-09T07:43:00Z

This change adds a script to test for common issues with the default FtM model:

* Divergent types: Multiple properties with the same name, but different types
* Divergent labels: Multiple properties with the same name, but different labels
* Label collisions: Multiple properties with different names, but using the same label

These issues can cause problems in applications built on the model, such as Aleph. For example, divergent types can cause errors when querying multiple Elasticsearch indexes, and divergent labels result in a confusing user experience.

I have also updated some schema definitions to use consistent labels and types (where types are compatible and the change is not a breaking change).
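To make the checks described above concrete, here is a minimal sketch of two of them. It is not the code in `contrib/check_model.py`: `Prop` is a hypothetical stand-in for a model property, and any types or labels in the sample data that do not appear in the report below are placeholders. The divergent-labels check would have the same shape as the divergent-types check, comparing `label` instead of `type`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Prop:
    """Hypothetical stand-in for a schema property (not the FtM class)."""

    schema: str
    name: str
    type: str
    label: str

    @property
    def qname(self) -> str:
        return f"{self.schema}:{self.name}"


def divergent_types(props):
    """Return {name: props} for names that are used with more than one type."""
    by_name = defaultdict(list)
    for prop in props:
        by_name[prop.name].append(prop)
    return {
        name: group
        for name, group in by_name.items()
        if len({p.type for p in group}) > 1
    }


def label_collisions(props):
    """Return {label: props} for labels shared by differently named props."""
    by_label = defaultdict(list)
    for prop in props:
        by_label[prop.label].append(prop)
    return {
        label: group
        for label, group in by_label.items()
        if len({p.name for p in group}) > 1
    }


# Sample data; the "Area" labels and the "identifier" types are placeholders.
PROPS = [
    Prop("License", "area", "string", "Area"),
    Prop("RealEstate", "area", "number", "Area"),
    Prop("Company", "isinCode", "identifier", "ISIN"),
    Prop("Security", "isin", "identifier", "ISIN"),
]

type_issues = divergent_types(PROPS)
collision_issues = label_collisions(PROPS)
```

With this sample data, `type_issues` contains only `area` and `collision_issues` contains only `ISIN`.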

Current issues:

DIVERGENT TYPES

  author:
  * Document:author - string
  * Assessment:author - entity

  organization:
  * Directorship:organization - entity
  * Membership:organization - entity
  * Post:organization - string

  area:
  * License:area - string
  * RealEstate:area - number

  authority:
  * Identification:authority - string
  * Contract:authority - entity
  * CallForTenders:authority - entity
  * Sanction:authority - string

  duration:
  * Video:duration - number
  * Sanction:duration - string
  * Audio:duration - number
  * Call:duration - number

  gender:
  * Person:gender - gender
  * Passport:gender - string

  subject:
  * Message:subject - string
  * UnknownLink:subject - entity
  * Email:subject - string

  sender:
  * Email:sender - string
  * EconomicActivity:sender - entity
  * Message:sender - entity

  number:
  * Identification:number - identifier
  * UserAccount:number - phone


DIVERGENT LABELS

  parent:
  * LegalEntity:parent - Parent company
  * Document:parent - Folder

  holder:
  * CryptoWallet:holder - Wallet holder
  * Post:holder - Holder
  * Identification:holder - Identification holder

  authority:
  * Identification:authority - Authority
  * Contract:authority - Contract authority
  * CallForTenders:authority - Name of contracting authority
  * Sanction:authority - Authority

  procedure:
  * Contract:procedure - Contract procedure
  * CallForTenders:procedure - Procedure

  criteria:
  * Similar:criteria - Matching criteria
  * Contract:criteria - Contract award criteria

  number:
  * Identification:number - Document number
  * UserAccount:number - Phone Number


COLLIDING LABELS

  The language of the translated text:
  * Page:translatedTextLanguage
  * Document:translatedLanguage

  Address:
  * Thing:address
  * Thing:addressEntity
  * CryptoWallet:publicKey

  Notes:
  * Thing:noteEntities
  * Thing:notes

  Document number:
  * Identification:number
  * ContractAward:documentNumber

  Customs declarations:
  * Vehicle:declaredCustoms
  * LegalEntity:economicActivityDeclarant
  * BankAccount:contractBankAccount

  ISIN:
  * Company:isinCode
  * Security:isin

  Country of origin:
  * LegalEntity:mainCountry
  * EconomicActivity:originCountry

  Payments received:
  * LegalEntity:paymentBeneficiary
  * BankAccount:paymentBeneficiaryAccount

  Payments made:
  * BankAccount:paymentPayerAccount
  * LegalEntity:paymentPayer

  Responding to:
  * Email:inReplyToEmail
  * Message:inReplyToMessage

  Entity:
  * Sanction:entity
  * Note:entity
  * Mention:resolved
  * Documentation:entity


Rosencrantz · 2023-06-19T07:56:09Z

contrib/check_model.py

+    "criteria",
+    "procedure",
+]
+


Question. Should we be concerned that there is both an authority type, and an authority label?

What "divergent types" means: There are two properties with the name "authority" that have different types (I haven’t checked, but probably one has the type entity and the other name, or something like that).

What "divergent labels" means: There are two properties with the same name, but they use different labels in the UI (e.g. CallForTenders:authority has the label "Name of contracting authority" while Sanction:authority has the label "Authority").

We should be concerned about all of these issues. However, there are some issues that can only be resolved with breaking changes, so I’ve added them to the ignore list for now (otherwise it would break CI).
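As an illustration of how such an ignore list could keep CI green while still failing on new issues, here is a rough sketch. The names `IGNORED_NAMES`, `drop_ignored` and `exit_code` are hypothetical, and only the two entries visible in the diff above are listed; the real script may be organized differently.

```python
# Hypothetical ignore list; only the two entries visible in the diff are known.
IGNORED_NAMES = ["criteria", "procedure"]


def drop_ignored(findings, ignored):
    """Remove findings whose key is on the ignore list."""
    return {key: props for key, props in findings.items() if key not in ignored}


def exit_code(*finding_groups):
    """Return 1 if any group still has findings, so a CI job would fail."""
    return 1 if any(finding_groups) else 0


# Findings keyed by property name; values taken from the report above.
found = {
    "criteria": ["Similar:criteria", "Contract:criteria"],
    "holder": ["CryptoWallet:holder", "Post:holder", "Identification:holder"],
}

remaining = drop_ignored(found, IGNORED_NAMES)
status = exit_code(remaining)
```

Here `criteria` is filtered out, `holder` remains, and `status` is 1. A script could pass that value to `sys.exit` so that only issues not on the ignore list break the build.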

Rosencrantz · 2023-06-19T08:01:08Z

contrib/check_model.py

+            collisions[label] = props
+
+    return collisions
+


The three functions above: test_divergent_types, _labels, _collisions all use the same basic pattern. Would it be worth pulling this out into a generic function to reduce duplication?

While they use the same structure, they do different things, and abstracting the generic structure would probably require passing a bunch of parameters or predicates as lambdas. I’m not sure this would make it easier to understand/maintain tbh
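For reference, a generic version along the lines suggested might look like the sketch below. It is only an illustration of the trade-off, not code from this PR; `find_conflicts` and the `Prop` tuple are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Prop(NamedTuple):
    """Hypothetical stand-in for a model property."""

    qname: str
    name: str
    type: str
    label: str


def find_conflicts(props, group_key, compare_key):
    """Group props by group_key; keep groups whose compare_key values differ."""
    groups = defaultdict(list)
    for prop in props:
        groups[group_key(prop)].append(prop)
    return {
        key: group
        for key, group in groups.items()
        if len({compare_key(p) for p in group}) > 1
    }


# Types for "number" come from the report above; the others are placeholders.
props = [
    Prop("Identification:number", "number", "identifier", "Document number"),
    Prop("UserAccount:number", "number", "phone", "Phone Number"),
    Prop("Company:isinCode", "isinCode", "identifier", "ISIN"),
    Prop("Security:isin", "isin", "identifier", "ISIN"),
]

types = find_conflicts(props, lambda p: p.name, lambda p: p.type)
labels = find_conflicts(props, lambda p: p.name, lambda p: p.label)
collisions = find_conflicts(props, lambda p: p.label, lambda p: p.name)
```

Each check becomes a single call, but the meaning of each check now lives in a pair of lambdas at the call site, which is the readability cost mentioned above.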

Rosencrantz · 2023-06-19T08:02:01Z

contrib/check_model.py

+            for prop in props:
+                print(f"  * {prop.qname}")
+            print()
+


Is it worth extracting this into a function to reduce replication?
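One possible shape for such a helper is sketched below. It is hypothetical (`format_findings` is not in the PR) and handles only the case shown in the snippet, where each property is listed by its `qname`; the divergent-type and divergent-label reports also print a value after the name and would need an extra parameter.

```python
from types import SimpleNamespace


def format_findings(title, findings):
    """Build report lines for a {key: [props]} mapping, one block per key."""
    lines = [title, ""]
    for key, props in findings.items():
        lines.append(f"  {key}:")
        for prop in props:
            lines.append(f"  * {prop.qname}")
        lines.append("")
    return lines


# SimpleNamespace stands in for real property objects with a qname attribute.
findings = {
    "ISIN": [
        SimpleNamespace(qname="Company:isinCode"),
        SimpleNamespace(qname="Security:isin"),
    ]
}

lines = format_findings("COLLIDING LABELS", findings)
print("\n".join(lines))
```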

tillprochaska force-pushed the feature/model-checks branch 3 times, most recently from cf3efbd to 28e8d6a Compare June 9, 2023 07:53

tillprochaska changed the title ~~WIP Add lint script for default model~~ WIP Add lint script to check for common model issues Jun 9, 2023

tillprochaska mentioned this pull request Jun 9, 2023

BUG: When expanding facets that include properties of type text an handled error occurs. alephdata/aleph#3065

Closed

tillprochaska changed the title ~~WIP Add lint script to check for common model issues~~ WIP Add script to check for common model issues Jun 9, 2023

tillprochaska force-pushed the feature/model-checks branch from e373f1b to 610e1c9 Compare June 9, 2023 09:06

tillprochaska added 3 commits June 9, 2023 11:11

Use consistent property labels

54359bb

Use identifier property type for ContractAward:cpvCode and `Contrac…

23dac13

…tAward:nutsCode`

tillprochaska force-pushed the feature/model-checks branch from 610e1c9 to 23dac13 Compare June 9, 2023 09:11

tillprochaska marked this pull request as ready for review June 9, 2023 09:31

tillprochaska changed the title ~~WIP Add script to check for common model issues~~ Add script to check for common model issues Jun 9, 2023

Rosencrantz approved these changes Jun 19, 2023


tillprochaska merged commit 7a9de99 into main Jun 22, 2023
