Can't harvest when Dublin core field language is set #8139
Comments
Same problem here with oai_dc and language "en":
In citation.tsv I see lines like this:
I assume that what's needed is some way to map "en" to "eng" and "fr" to "fra". It looks like the 3-letter ISO 639-3 codes were added in pull request #7690 because of an inability to harvest "eng" datasets from Zenodo in #7638. This issue seems to be related:
@pdurbin: Yes, this seems to be the case. I tried to identify a place in the code where such a mapping could take place, but wasn't successful (perhaps adding and handling a new further-processing column in the foreignmetadatafieldmapping table?). I thought about just removing the language entry from the foreignmetadatafieldmapping table as an ugly hack (as language is not really an important field for the harvested datasets), but I am also unsure about the side effects of this.
#7638 (comment) indicates that we can have multiple alternates. Could "en" be added to the tsv without removing "eng", etc.?
Yes, this is just a matter of adding more alternative variants to the list of controlled vocabulary values in citation.tsv.
to
and update the block ( But yes, we should add all these standard 2-letter codes to the block in the next release.
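As a rough illustration of the alternates idea discussed above, here is a minimal Python sketch of a lookup that accepts several codes for one controlled vocabulary value. It is not Dataverse's actual implementation: the `CONTROLLED_VOCABULARY` table, `build_lookup`, and `resolve_language` are hypothetical names, and the entries are illustrative rather than copied from citation.tsv.

```python
# Hypothetical in-memory model of a controlled vocabulary with alternates.
# Keys are display values; lists hold the alternate codes accepted on import.
# These entries are illustrative and NOT copied from citation.tsv.
CONTROLLED_VOCABULARY = {
    "English": ["eng", "en"],
    "French": ["fra", "fre", "fr"],
}


def build_lookup(vocabulary):
    """Map every display value and alternate (lowercased) to its display value."""
    lookup = {}
    for display_value, alternates in vocabulary.items():
        lookup[display_value.lower()] = display_value
        for alternate in alternates:
            lookup[alternate.lower()] = display_value
    return lookup


def resolve_language(raw_value, lookup):
    """Return the display value for a harvested code, or None if unknown."""
    if raw_value is None:
        return None
    return lookup.get(raw_value.strip().lower())


LOOKUP = build_lookup(CONTROLLED_VOCABULARY)
print(resolve_language("fr", LOOKUP))
print(resolve_language("eng", LOOKUP))
```

With this shape, adding a 2-letter code is only a matter of appending it to the list of alternates; the existing 3-letter codes keep resolving to the same value.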
Fix #8139 : add iso-639-1 code for language as oai_dc specification
I am trying to harvest a record from an OAI-PMH server. The record is formatted in the oai_dc schema and has the field language set to the value "fr" (oai_dc specifies that language must be a 2-letter ISO 639-1 code). But the harvest fails with the following error:
Language is a controlled vocabulary field and values are human readable: see https://github.com/IQSS/dataverse/blob/develop/scripts/api/data/metadatablocks/citation.tsv#L186
I think that the controlled vocabulary should refer to ISO 639-1 codes, and the human-readable display values should be set with translation files.
Removing the language field from the record fixes the harvesting.
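For anyone who wants to try that workaround on a copy of a record, the following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the record payload is available as an oai_dc XML string; the `strip_language` helper and the `SAMPLE` record are hypothetical and are not part of Dataverse or of any OAI-PMH server.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record, for illustration only.
SAMPLE = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example</dc:title>"
    "<dc:language>fr</dc:language>"
    "</oai_dc:dc>"
)


def strip_language(oai_dc_xml):
    """Return the record as a string with every dc:language element removed."""
    root = ET.fromstring(oai_dc_xml)
    language_tag = f"{{{DC_NS}}}language"
    # Snapshot the elements first so removal does not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == language_tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


cleaned = strip_language(SAMPLE)
print(cleaned)
```

This only hides the problem, since the language information is lost; it may still be useful to confirm that the language field is the sole cause of a failed harvest.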