Sanitize languages controlled vocabulary values #10197
This PR's content can serve as a basis for discussing the following issue (which has been taken into account in the PR):
@@ -284,12 +284,12 @@
language Shona 143 sna sn
language Sinhala, Sinhalese 144 sin si
language Slovak 145 slk slo sk
language Slovene 146 slv sl
What's the rationale for dropping "Slovene" altogether? ISO 639-2 lists both, and it looks like "Slovene" may still be the preferred name scientifically (the Wikipedia article lists it first, for example: https://en.wikipedia.org/wiki/Slovene_language). I'll just keep "Slovene" as one of the alternate forms.
We proposed replacing the main language name, but yes, it is possible to add an alternative name instead.
These documents also list "Slovene" as a secondary name:
https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes
https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
Any chance we could leave "Slovene" as the main name, and simply add "Slovenian" as an alternate? I.e., have this in citation.tsv:
language Slovene 146 slv sl Slovenian
The end result will be the same: both names will be valid and acceptable. It's just that changing the main name makes the block update so much more complicated (the block update API gets confused, so a Flyway database update becomes necessary).
I have a couple of similar questions about the other fields.
To be clear, if there is an objective reason to want to change the main entry for a specific language, it's not that big of a problem to add a Flyway script to the release.
@@ -220,7 +220,7 @@
language Khmer 79 khm km
language Kikuyu, Gikuyu 80 kik ki
language Kinyarwanda 81 kin rw
language Kyrgyz 82
language Kirghiz, Kyrgyz 82 kir ky
I'm almost certain that this is not what we want to do if the goal is to have both the "Kirghiz" and "Kyrgyz" spellings accepted as valid. The above would mean that either spelling is invalid by itself, and only the full literal string "Kirghiz, Kyrgyz" is acceptable. So we should make the other spelling an alternate here as well, as in:
language Kyrgyz 82 kir ky Kirghiz
Just as I typed this, I realized that we have quite a few such comma-separated entries in the block already ("Navajo, Navaho", etc.)!
We will need to fix them all. And there is no way to do that, other than via a database update.
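The concern above can be illustrated with a minimal Python sketch. It assumes, as the comment describes, that an imported value is accepted only when it exactly matches an entry's main name or one of its alternates. The `VocabEntry` class and the `is_valid` function are hypothetical names used for illustration; this is not Dataverse's actual matching code, and the ISO codes are left out of the alternates for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class VocabEntry:
    """Hypothetical model of one controlled vocabulary row."""
    main_name: str
    alternates: list[str] = field(default_factory=list)

    def accepted_values(self) -> set[str]:
        # Assumption: matching is an exact string comparison against the
        # main name and each alternate; a comma is NOT treated as a separator.
        return {self.main_name, *self.alternates}


def is_valid(value: str, entry: VocabEntry) -> bool:
    """Return True if the value matches the main name or an alternate."""
    return value in entry.accepted_values()


# Main name as proposed in the diff: one comma-separated string.
combined = VocabEntry("Kirghiz, Kyrgyz")
# Layout suggested in the review: one main name plus one alternate.
separate = VocabEntry("Kyrgyz", ["Kirghiz"])

print(is_valid("Kyrgyz", combined))           # False
print(is_valid("Kirghiz, Kyrgyz", combined))  # True
print(is_valid("Kyrgyz", separate))           # True
print(is_valid("Kirghiz", separate))          # True
```

Under this assumption, only the second layout accepts each spelling on its own.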
(To correct myself, there is definitely a way to address this without a Flyway update. For example, we can keep "Navajo, Navaho" as the main name, but add each of the 2 forms as a separate alternate, as in:
language Navajo, Navaho 109 nav nv Navajo Navaho
)
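As a rough sketch of how the remaining comma-separated entries could be found and given per-form alternates, the snippet below splits each main name on commas and reports the forms not yet listed as alternates. The `missing_alternates` helper and the sample `entries` list are hypothetical and purely illustrative; a real script would read the rows from citation.tsv, whose exact column layout is not assumed here.

```python
def missing_alternates(main_name: str, alternates: list[str]) -> list[str]:
    """Return the individual forms of a comma-separated main name
    that are not yet listed as alternates."""
    if "," not in main_name:
        return []
    forms = [part.strip() for part in main_name.split(",")]
    return [form for form in forms if form and form not in alternates]


# Illustrative sample data only: (main name, existing alternates).
entries = [
    ("Navajo, Navaho", []),
    ("Kikuyu, Gikuyu", ["Kikuyu"]),
    ("Khmer", []),
]

for main_name, alternates in entries:
    to_add = missing_alternates(main_name, alternates)
    if to_add:
        print(f"{main_name!r}: add alternates {to_add}")
```

Because the main name itself is left untouched, a check like this only ever proposes additional alternates, which matches the approach of avoiding a Flyway update.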
And to further state the obvious, I was focusing on how these changes may affect metadata imports. I'm assuming that the intent behind the proposed changes to the main language names ("Slovenian", "Swahili (macrolanguage)" etc.) was about how they appear in the UI menus (?). Both are important concerns, and it should be possible to reconcile them.
@setevenferey I was waiting for some feedback, but then got distracted by working on other things, so I never finished looking into this (apologies). I still would like to know if it is really necessary to change the main controlled vocabulary value, such as changing
Hello @landreev, we have no real need to modify the Swahili, Nepali and Slovenian language names. The goal is to agree with the ISO standard, but the sources of information sometimes differ, as with the proposal for the Slovenian language: Thanks a lot
As I mentioned earlier, in place of this PR, I created my own branch and made a new PR: #10481.
@landreev as the new PR has been reviewed, should we close this one already :) ?
Thank you for your feedback. The reviews of this PR are reflected in PR #10481.
What this PR does / why we need it:
This is a first proposal, open to suggestions, intended to settle the desired modifications before working on the Flyway script.
Which issue(s) this PR closes:
Closes #8243
Special notes for your reviewer:
Provide your suggestions for modifications directly in the PR review
Additional documentation:
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Languages/List_of_ISO_639-3_language_codes_(2019)
https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes