Affiliations entered in affiliation fields are parenthesized in "Datacite" and Schema.org exports #9330

Open
jggautier opened this issue Jan 26, 2023 · 11 comments

Comments

@jggautier
Contributor

jggautier commented Jan 26, 2023

When does this issue occur?
When Dataverse creates DataCite exports ("DataCite" from the dataset page's Export Metadata dropdown) and Schema.org metadata exports for datasets that have values in a few Affiliation fields in the Citation metadatablock

Which page(s) does it occur on?
Metadata exports and OAI-PMH feed

What happens?
The affiliation metadata that depositors add to their datasets, e.g. Author Affiliation, Point of Contact Affiliation, Producer Affiliation, appears in the DataCite and Schema.org exports wrapped in parentheses.

The DataCite export has these affiliation fields:

  • Author Affiliation
  • Point of Contact Affiliation
  • Producer Affiliation

The Schema.org export has this affiliation field:

  • Author Affiliation

To whom does it occur (all users, curators, superusers)?
All users. It probably affects search, such as when using facets to narrow search results

What did you expect to happen?
The affiliation metadata would appear in the exports without the added parentheses

Which version of Dataverse are you using?
5.12.1

Any related open or closed issues to this bug report?
The issues related to using an algorithm to guess if the names entered in the author metadata field are people or organizations: #7349 and #5029. Will the PR to address those issues, #9089, remove the parentheses? I think it might, since the Schema.org exports that QDR's Dataverse fork creates already use the algorithm, and in their Schema.org exports, author affiliations aren't wrapped in parentheses, e.g. their Schema.org export at https://data.qdr.syr.edu/api/datasets/export?exporter=schema.org&persistentId=doi:10.5064/F6G3T1PF

Screenshots:

How the affiliations of the Author, Point of Contact, and Producer fields appear in the DataCite export of the dataset at https://doi.org/10.7910/DVN/MUJHGR (published in Harvard Dataverse):

  • Screen Shot 2023-01-26 at 11 10 22 AM
  • Screen Shot 2023-01-26 at 11 06 42 AM

How the affiliations of the Author field appear in the Schema.org export of the dataset at https://doi.org/10.7910/DVN/MUJHGR (published in Harvard Dataverse):

Definition of done:
The affiliation metadata is not wrapped in parentheses when it appears in metadata exports

@jggautier
Contributor Author

jggautier commented Apr 4, 2024

This bug with the parentheses exists in the Schema.org exports of older dataset versions but not in the Schema.org exports of more recently published dataset versions:

As far as I can tell, in the Schema.org exports of all datasets published more recently, the author affiliations don't have parentheses.

I think this problem might be related to the discussion in #5144, where we talked about how to make sure that when we make changes to how Dataverse adds metadata to the DataCite metadata export, we ensure that the datasets published before those changes were made have their exports updated.

The same should be true for the Schema.org export and other exports. In the Schema.org export of a dataset published today, we can see changes that were made when v5.13 was applied to Harvard Dataverse. Those changes don't show up in those two dataset exports I mentioned earlier and probably many datasets in Harvard Dataverse whose latest versions were published before v5.13 was applied to Harvard Dataverse.

@lmaylein
Contributor

lmaylein commented Jul 4, 2024

This bug still exists in v6.2. Is it possible to fix it? As a result of this bug, the metadata of all DOIs registered with DataCite is also incorrect.

@jggautier
Contributor Author

Hi @lmaylein. Thanks for asking! I think that the more recent work described in the GitHub issue at #5889 will fix this bug. Specifically, the OpenAIRE export doesn't include these parentheses, so in a comment in that GitHub issue I proposed that the merged export also wouldn't include the parentheses around the affiliations of the Author metadata field. And I imagine that parentheses will not be included around the affiliations of the other fields that describe people or organizations, too, such as Point of Contact, Contributor, Producer, and Distributor.

@pdurbin
Member

pdurbin commented Jul 8, 2024

As far as I can tell, in the Schema.org exports of all datasets published more recently, the author affiliations don't have parentheses.

Is the fix to re-export datasets? https://guides.dataverse.org/en/6.3/admin/metadataexport.html#batch-exports-through-the-api

Do we know which PR fixed it, by removing the parentheses (if it is indeed fixed)?

@qqmyers
Member

qqmyers commented Jul 8, 2024

Schema.org was fixed in #9089. The problem for DataCite is that the displayValue for affiliation is sent to DataCite. See this line:

.append("<affiliation>" + author.getAffiliation().getDisplayValue() + "</affiliation>");

I'm addressing it in #10615 and #10632 (which need updates), but it could be addressed separately, or ~worked around by removing the parens in the formatting at this metadata block entry:

authorAffiliation Affiliation The name of the entity affiliated with the author, e.g. an organization's name Organization XYZ text 9 (#VALUE) TRUE FALSE FALSE TRUE TRUE FALSE author citation

and resending the metadata to DataCite using the API (and assuming display without parens is OK).
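
To make the mechanism concrete, here is a minimal Python sketch (not Dataverse's actual Java code) of how a display format such as (#VALUE) would turn a stored affiliation into a parenthesized display value, and how using the raw value instead would avoid that. The function names display_value and affiliation_element are hypothetical, and the substitution logic is an assumption; only the (#VALUE) format and the <affiliation> element come from the snippets above.

```python
from xml.sax.saxutils import escape


def display_value(raw_value: str, display_format: str = "(#VALUE)") -> str:
    """Substitute the raw field value into the display format (assumed behavior)."""
    return display_format.replace("#VALUE", raw_value)


def affiliation_element(raw_value: str, use_display_value: bool) -> str:
    """Build an <affiliation> element from the display value or the raw value."""
    text = display_value(raw_value) if use_display_value else raw_value
    return "<affiliation>" + escape(text) + "</affiliation>"


# Sending the display value reproduces the parentheses; the raw value does not.
print(affiliation_element("Organization XYZ", use_display_value=True))
print(affiliation_element("Organization XYZ", use_display_value=False))
```

Under these assumptions, the first call yields the parenthesized affiliation seen in the affected exports, and the second yields the plain name.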

@pdurbin
Member

pdurbin commented Jul 8, 2024

Oh, the displayValue. Thanks.

Hmm, I assume the parens are there in the displayValue for a reason. That is, we probably shouldn't remove them.

@qqmyers I'm fine with waiting for one of your PRs above. If you address this bug in one of them, please use the normal "closes #9330" syntax so this issue goes through QA.

@DS-INRAE
Member

DS-INRAE commented Oct 23, 2024

@qqmyers now that #10632 is merged, should this issue be closed 🙂 ?

@qqmyers
Member

qqmyers commented Oct 23, 2024

Probably - in general, @jggautier is looking at which of the DataCite-related issues can close and which need to be rescoped after #10632.

@DS-INRAE
Member

Great, we'll see what Julian says when he gets to this one then 😀

@DS-INRAE
Member

I had just stumbled upon the issue on our board and wondered if it had been forgotten

@jggautier
Contributor Author

jggautier commented Dec 16, 2024

Hey all. Affiliations entered in affiliation fields are still parenthesized in the DataCite and Schema.org exports of datasets published by some Dataverse repositories.

The DataCite and Schema.org exports of the dataset that I included as an example in this GitHub issue's first comment have affiliations that are wrapped in parentheses.

I checked the oldest dataset in each of the other 17 known Dataverse installations that are using v6.4 as of this writing. As far as I can tell, the affiliations in their datasets' Schema.org exports don't have parentheses. But two of those installations also have datasets whose DataCite exports have Author and Point of Contact Affiliations that are wrapped in parentheses:

In Repositorio de Datos Abiertos de Investigación (Redata), the most recently created or updated datasets, and maybe all datasets in Redata, have DataCite exports that have Author and Point of Contact Affiliations that are wrapped in parentheses.

The oldest datasets in the other 15 installations that are on v6.4 either have affiliations that are not wrapped in parentheses or don't have Author Affiliation metadata. I haven't checked for Point of Contact and Producer Affiliations.
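
A check like the one described above could be partly automated. The following Python sketch flags author affiliations that look parenthesized in a Schema.org export that has already been parsed from JSON. It assumes the export has an "author" list whose entries carry "affiliation" either as a plain string or as an object with a "name" key; that structure, the sample data, and the function names are assumptions for illustration, not a documented contract.

```python
import json


def is_parenthesized(value: str) -> bool:
    """Heuristic: the trimmed string starts with '(' and ends with ')'."""
    text = value.strip()
    return len(text) >= 2 and text.startswith("(") and text.endswith(")")


def parenthesized_author_affiliations(export: dict) -> list[str]:
    """Return author affiliations in the export that look parenthesized."""
    flagged = []
    for author in export.get("author", []):
        if not isinstance(author, dict):
            continue
        affiliation = author.get("affiliation")
        # Assumed: some exports nest the name inside an object.
        if isinstance(affiliation, dict):
            affiliation = affiliation.get("name")
        if isinstance(affiliation, str) and is_parenthesized(affiliation):
            flagged.append(affiliation)
    return flagged


# Hypothetical sample data, not taken from a real export.
sample = json.loads("""
{"author": [
  {"name": "Doe, Jane", "affiliation": "(Organization XYZ)"},
  {"name": "Roe, Richard",
   "affiliation": {"@type": "Organization", "name": "Organization XYZ"}}
]}
""")
print(parenthesized_author_affiliations(sample))
```

To apply this to a live installation, the JSON could be fetched from an export URL of the form used earlier in this thread (/api/datasets/export?exporter=schema.org&persistentId=...) and passed to the function after parsing.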

Projects
Status: No status
Status: ⚠️ Needed/Important
Development

No branches or pull requests

5 participants