Affiliations entered in affiliation fields are parenthesized in "Datacite" and Schema.org exports #9330
Comments
This bug with the parentheses exists in the Schema.org exports of older dataset versions but not in the Schema.org exports of more recently published dataset versions:
As far as I can tell, in the Schema.org exports of all datasets published more recently, the author affiliations don't have parentheses. I think this problem might be related to the discussion in #5144, where we talked about how, when we change the way Dataverse adds metadata to the DataCite metadata export, we can ensure that datasets published before those changes have their exports updated. The same should be true for the Schema.org export and other exports. In the Schema.org export of a dataset published today, we can see changes that were made when v5.13 was applied to Harvard Dataverse. Those changes don't show up in the two dataset exports I mentioned earlier, and probably don't show up for many other datasets in Harvard Dataverse whose latest versions were published before v5.13 was applied. |
This bug still exists in v6.2. Is it possible to fix it? As a result of this bug, the metadata of all DOIs registered with DataCite is also incorrect. |
Hi @lmaylein. Thanks for asking! I think that the more recent work described in the GitHub issue at #5889 will fix this bug. Specifically, the OpenAIRE export doesn't include these parentheses, so in a comment in that GitHub issue I proposed that the merged export also wouldn't include the parentheses around the affiliations of the Author metadata field. And I imagine that parentheses will not be included around the affiliations of the other fields that describe people or organizations, too, such as Point of Contact, Contributor, Producer, and Distributor. |
Is the fix to re-export datasets? https://guides.dataverse.org/en/6.3/admin/metadataexport.html#batch-exports-through-the-api Do we know which PR fixed it, by removing the parentheses (if it is indeed fixed)? |
Schema.org was fixed in #9089. The problem for DataCite is that the displayValue for affiliation is sent to DataCite: see line 163 of dataverse/src/main/java/edu/harvard/iq/dataverse/pidproviders/doi/XmlMetadataTemplate.java (at commit a466c97).
|
Oh, the displayValue. Thanks. Hmm, I assume the parens are there in the displayValue for a reason. That is, we probably shouldn't remove them. @qqmyers I'm fine with waiting for one of your PRs above. If you address this bug in one of them, please use the normal "closes #9330" syntax so this issue goes through QA. |
Probably - in general, @jggautier is looking at which of the DataCite related issues can close and which either need to be rescoped after #10632. |
Great, we'll see what Julian says when he gets to this one then 😀 |
I had just stumbled upon the issue on our board and wondered if it had been forgotten. |
Hey all. Affiliations entered in affiliation fields are still parenthesized in the DataCite and Schema.org exports of datasets published by some Dataverse repositories. The DataCite and Schema.org exports of the dataset that I included as an example in this GitHub issue's first comment have affiliations that are wrapped in parentheses. I checked the oldest dataset in each of the other 17 known Dataverse installations that are using v6.4 as of this writing. As far as I can tell, the affiliations in their datasets' Schema.org exports don't have parentheses. But two of those installations also have datasets whose DataCite exports have Author and Point of Contact Affiliations that are wrapped in parentheses:
In Repositorio de Datos Abiertos de Investigación (Redata), the most recently created or updated datasets, and maybe all datasets in Redata, have DataCite exports that have Author and Point of Contact Affiliations that are wrapped in parentheses. The oldest datasets in the other 15 installations that are on v6.4 either have affiliations that are not wrapped in parentheses or don't have Author Affiliation metadata. I haven't checked for Point of Contact and Producer Affiliations. |
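The kind of spot check described above could be scripted. Below is a minimal Python sketch, not part of Dataverse, that scans an already-downloaded Schema.org export for affiliations that start with "(" and end with ")". The function names are hypothetical, and the export shape is an assumption: the sketch looks at the "author" and "creator" properties and accepts an "affiliation" that is either a plain string or an object with a "name" key.

```python
def is_parenthesized(value):
    """Return True if the trimmed string starts with '(' and ends with ')'."""
    text = value.strip()
    return len(text) >= 2 and text.startswith("(") and text.endswith(")")


def affiliation_names(person):
    """Collect affiliation strings from one author/creator entry.

    Assumption: "affiliation" is either a plain string or an object
    with a "name" key. Any other shape is ignored.
    """
    affiliation = person.get("affiliation")
    if isinstance(affiliation, str):
        return [affiliation]
    if isinstance(affiliation, dict) and isinstance(affiliation.get("name"), str):
        return [affiliation["name"]]
    return []


def find_parenthesized_affiliations(schema_org_export):
    """Return (property, person name, affiliation) for each wrapped affiliation."""
    hits = []
    for prop in ("author", "creator"):
        people = schema_org_export.get(prop, [])
        if isinstance(people, dict):  # tolerate a single object instead of a list
            people = [people]
        for person in people:
            for name in affiliation_names(person):
                if is_parenthesized(name):
                    hits.append((prop, person.get("name"), name))
    return hits


# Made-up sample data, not taken from a real dataset export.
sample_export = {
    "author": [{"name": "Doe, Jane", "affiliation": "(Example University)"}],
    "creator": [
        {
            "name": "Doe, Jane",
            "affiliation": {"@type": "Organization", "name": "Example University"},
        }
    ],
}

for prop, person, affiliation in find_parenthesized_affiliations(sample_export):
    print(f"{prop}: {person} has affiliation {affiliation}")
```

The export itself would first need to be fetched and parsed as JSON, for example from an installation's `/api/datasets/export?exporter=schema.org&persistentId=...` endpoint, as in the QDR link in the issue description.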
When does this issue occur?
When Dataverse creates DataCite exports ("DataCite" from the dataset page's Export Metadata dropdown) and Schema.org metadata exports for datasets that have values in a few Affiliation fields in the Citation metadatablock
Which page(s) does it occur on?
Metadata exports and OAI-PMH feed
What happens?
The affiliation metadata that depositors add to their datasets, e.g. Author Affiliation, Point of Contact Affiliation, Producer Affiliation, appears in the DataCite and Schema.org exports wrapped in parentheses.
The DataCite export has these affiliation fields:
The Schema.org export has this affiliation field:
To whom does it occur (all users, curators, superusers)?
All users. It probably affects search, such as when using facets to narrow search results
What did you expect to happen?
The affiliation metadata would appear in the exports without the added parentheses
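To make the expected output concrete, here is a minimal Python sketch of the transformation. It is an illustration only and not how Dataverse produces its exports. The function name `strip_wrapping_parentheses` is hypothetical, and the sketch assumes the added parentheses are a single pair around the whole value.

```python
def strip_wrapping_parentheses(affiliation):
    """Remove one pair of parentheses that wraps the entire value.

    Parentheses that are part of the name itself, such as
    "Example University (Main Campus)", are left untouched.
    """
    text = affiliation.strip()
    if len(text) < 2 or text[0] != "(" or text[-1] != ")":
        return text

    # Make sure the opening parenthesis is closed by the final character,
    # so that values like "(A) and (B)" are not mangled.
    depth = 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0 and index != len(text) - 1:
                return text  # the first "(" closes before the end
    if depth != 0:
        return text  # unbalanced parentheses: leave the value alone

    return text[1:-1].strip()


# Made-up sample values for illustration.
for value in ["(Example University)", "Example University (Main Campus)", "(A) and (B)"]:
    print(repr(value), "->", repr(strip_wrapping_parentheses(value)))
```

Post-processing strings like this is only a stopgap. An exporter that writes the entered affiliation value directly, without the parentheses added for display, would not need it.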
Which version of Dataverse are you using?
5.12.1
Any related open or closed issues to this bug report?
The issues related to using an algorithm to guess if the names entered in the author metadata field are people or organizations: #7349 and #5029. Will the PR to address those issues, #9089, remove the parentheses? I think it might, since the Schema.org exports that QDR's Dataverse fork creates already use the algorithm, and in their Schema.org exports, author affiliations aren't wrapped in parentheses, e.g. their Schema.org export at https://data.qdr.syr.edu/api/datasets/export?exporter=schema.org&persistentId=doi:10.5064/F6G3T1PF
Screenshots:
How the affiliations of the Author, Point of Contact, and Producer fields appear in the DataCite export of the dataset at https://doi.org/10.7910/DVN/MUJHGR (published in Harvard Dataverse):
How the affiliations of the Author field appear in the Schema.org export of the dataset at https://doi.org/10.7910/DVN/MUJHGR (published in Harvard Dataverse):
(Both the "author" and "creator" properties are used to repeat author metadata in the Schema.org export because of an experiment unrelated to this issue about parentheses. See Improving Dataverse's Schema.org JSON-LD schema to enable author names display in Google Dataset Search's #5029 (comment))
Definition of done:
The affiliation metadata is not wrapped in parentheses when it appears in metadata exports