Guidance on how to set a party/id when the source data doesn't provide one #1635

Open
yolile opened this issue Aug 29, 2023 · 4 comments
Labels
Focus - Documentation: Includes corrections, clarifications, new guidance, and UI/UX issues
Focus - Examples: Relating to examples in the guidance

Comments

@yolile
Member

yolile commented Aug 29, 2023

This question has been raised by at least three partners this year (India, Canada and Guatemala), so we need to add some guidance on the best options for filling in this field (and on what to do with Organization/identifier).

@yolile yolile added this to the Iterative improvements milestone Aug 29, 2023
@yolile yolile added the Focus - Documentation Includes corrections, clarifications, new guidance, and UI/UX issues label Aug 29, 2023
@jpmckinney
Member

jpmckinney commented Aug 29, 2023

We can perhaps have two streams:

  • Input: Constrain user input, i.e. come up with an authoritative list of buyers, and ideally assign them identifiers, so that if a name is changed (which occurs frequently in Canada, for example) the identifier remains the same. (Canada actually has a Federal Identity Program, so that the civil service can use one consistent name internally, independent of whatever name the current governing party prefers.)

  • Mapping: If input can't be constrained, apply normalizations so that small differences in names don't result in new identifiers, e.g. lowercase, normalize spaces, remove articles, prepositions and conjunctions (I've observed lots of variations around these), substitute common abbreviations and typos (min. -> ministry, etc.). Recommend that publishers produce a full list of the names in their data, so that they can identify other things to normalize. It won't be perfect, but it's better than no normalization.

When I worked with Quebec data many years ago, one ministry had something like 34 variations.
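
The Mapping stream above could be sketched roughly as follows. This is only an illustration, not agreed guidance: the function names `normalize_name` and `group_variants`, the abbreviation table and the stop-word list are all hypothetical, and stripping punctuation is an added assumption (needed so that "min." matches "min"). A real publisher would build these lists from its own data.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; extend it after reviewing the full list of names.
ABBREVIATIONS = {"min": "ministry", "dept": "department", "govt": "government"}
# Hypothetical English articles, prepositions and conjunctions to drop.
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "in", "on", "to"}


def normalize_name(name):
    """Return a comparison key, so that small variations map to the same value."""
    # Lowercase, then replace punctuation with spaces (assumption: punctuation is noise).
    text = re.sub(r"[^\w\s]", " ", name.lower())
    # split() with no arguments also collapses runs of whitespace.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(token for token in tokens if token not in STOP_WORDS)


def group_variants(names):
    """Group the observed names by key, to review what else needs normalizing."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_name(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items()}
```

Running `group_variants` over every name in the dataset gives the "full list of names" to review: any key with a single variant that looks like a near-duplicate of another key points to a rule that is still missing.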

@yolile
Member Author

yolile commented Aug 29, 2023

Agreed, but note that the first stream will only work for buyers, not for suppliers.
For example, in Guatemala, there are some cases where they don't have an identifier for foreign companies, and only the company names are recorded.

@jpmckinney
Member

Ah, yeah, since there are different options for each case, I suppose the guidance could be organized by type of organization:

  1. buyer or procuring entity
  2. registered supplier
  3. unregistered supplier

And then within each:

  1. Control user input (codelist for buyers, company registry or other supplier database for registered suppliers)
  2. Normalize available data
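
For the "control user input" option for buyers, a minimal sketch might look like the one below. The codelist contents, the codes and the `buyer_party` helper are invented for illustration; the only parts taken from OCDS are the `id`, `name` and `roles` fields of a party.

```python
# Hypothetical codelist: codes and names are invented for illustration.
BUYER_CODELIST = {
    "B-001": "Ministry of Health",
    "B-002": "Ministry of Finance",
}


def buyer_party(code):
    """Build a party entry for a buyer selected from the codelist."""
    if code not in BUYER_CODELIST:
        # Reject free text: the user has to pick an existing code.
        raise ValueError(f"unknown buyer code: {code!r}")
    return {"id": code, "name": BUYER_CODELIST[code], "roles": ["buyer"]}
```

Because the party `id` comes from the code and not from the name, renaming a ministry only means updating the codelist entry; the identifier stays the same.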

For suppliers, normalization is trickier, because small differences in names can actually indicate two different companies. In some cases, the best available option might just be to assign a local ID, and not attempt to make it a global ID.
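
One way to assign such a local ID is sketched below, as an assumption and not a recommendation. The `LocalIdRegistry` class and the `local-supplier` prefix are hypothetical. The key is deliberately conservative (whitespace and case only), since anything more aggressive risks merging two distinct companies.

```python
class LocalIdRegistry:
    """Assign publisher-local IDs to supplier names that have no identifier."""

    def __init__(self, prefix="local-supplier"):
        self.prefix = prefix  # hypothetical prefix, marks the ID as local
        self._ids = {}

    def get_id(self, name):
        # Only collapse whitespace and ignore case; do not drop or expand words.
        key = " ".join(name.split()).casefold()
        if key not in self._ids:
            self._ids[key] = f"{self.prefix}-{len(self._ids) + 1}"
        return self._ids[key]
```

In practice the mapping would need to be persisted between publications, so that the same supplier name keeps the same local ID over time.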

@yolile
Member Author

yolile commented Aug 29, 2023

Yep, that is what I thought, too. I marked this issue as Documentation, but I'm not sure whether it should be a worked example under the "Deal with the hard cases" section instead.

@jpmckinney jpmckinney added the Focus - Examples Relating to examples in the guidance label Aug 29, 2023
2 participants