Escape description for use in datacite xml (file and datacite api call) #7798

@qqmyers qqmyers commented Apr 13, 2021

What this PR does / why we need it: Adds XML-appropriate escaping to the description metadata in the DataCite metadata export. This is the same XML metadata that is sent to the DataCite server via the API during publication.
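To illustrate the kind of escaping involved, here is a minimal sketch in Python using the standard library. It is not the code in this PR (Dataverse is a Java application); the helper name `description_element` is hypothetical, and the element shape is only assumed to resemble a DataCite description.

```python
from xml.sax.saxutils import escape


def description_element(description: str) -> str:
    """Wrap a free-text description in an XML element (hypothetical helper).

    escape() replaces &, < and > with entity references so the text
    cannot break the surrounding markup.
    """
    return (
        '<description descriptionType="Abstract">'
        + escape(description)
        + "</description>"
    )


if __name__ == "__main__":
    # Special characters of the kind that caused publication to fail.
    print(description_element("Temperature & humidity where T < 30"))
    # An HTML entity such as &nbsp; is not defined in plain XML, so its
    # ampersand is escaped as well.
    print(description_element("10&nbsp;km"))
```

Note that an entity already typed into the description (such as `&nbsp;`) comes out as `&amp;nbsp;`, which keeps the document well-formed but displays the literal entity text.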

Which issue(s) this PR closes:

Closes #3328

Special notes for your reviewer: If I recall correctly, some fields such as title are escaped when created, so additional escaping at this stage isn't needed for them. I haven't checked the other fields that get included (creator, publisher, contributor); if they aren't already escaped, special characters there could still cause problems. That said, the description field is the key one to cover, since most practical examples of special characters have shown up there. I think this is a useful step, although it won't prevent problems from special characters in other fields if those aren't already handled.

Suggestions on how to test this: The best test would be to have a test server configured with a test DataCite account and verify that a description with & < > * etc. fails before the PR and works afterwards. However, as noted in the issue, demo.dataverse.org seems to allow publication without this even though it appears to use the DataCite test server now.

A simpler test would be to verify by inspection that the DataCite metadata export file is valid XML, either by opening it in a browser that can display XML (and not seeing an error like the one shown in the issue) or by viewing the page source and confirming that characters such as & are escaped in the source (i.e. as &amp; in this case).
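The inspection can also be done with a parser instead of a browser. The sketch below is a rough illustration in Python, not part of this PR; `is_well_formed` is a hypothetical helper, and the sample strings stand in for a saved copy of the export.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    samples = [
        "<description>a & b</description>",       # raw ampersand
        "<description>a&nbsp;b</description>",    # entity undefined in XML
        "<description>a &amp; b</description>",   # escaped
    ]
    for sample in samples:
        print(is_well_formed(sample), sample)
```

In practice the string passed to the check would be the contents of the exported DataCite file; a raw `&` or an undefined entity such as `&nbsp;` makes the parse fail, while the escaped form parses.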

Does this PR introduce a user interface change? If mockups are available, please link/include them here: no

Is there a release notes update needed for this change?: could note the fix

Additional documentation:

@kcondon kcondon self-assigned this Apr 23, 2021
@kcondon kcondon merged commit 98579a6 into IQSS:develop Apr 23, 2021
@djbrooke djbrooke added this to the 5.5 milestone Apr 27, 2021
@qqmyers qqmyers deleted the IQSS/3328-escape_description_for_datacite_xml branch May 17, 2024 18:40
Successfully merging this pull request may close these issues.

Publish Dataset - Fails when metadata contains HTML entities w/special characters such as &nbsp;