Dataset Metadata - clean up titles and other metadata that contain special characters #2133
Comments
@raprasad can we close this? It sounds like a data curation issue rather than a code issue.
Discussed with @raprasad and we agree it's a data curation issue, not a code issue. @jggautier or @sbarbosadataverse can you please take a look?
Just came across this dataset with an ampersand in the title: http://dx.doi.org/10.7910/DVN/XJVVQX In the citation box (and the breadcrumb), I imagined solving this issue could involve querying the database to identify datasets that have unsupported characters in certain metadata fields (searching for those datasets isn't possible now; see #2702). We could then curate those datasets more thoroughly, and with a better sense of how often unsupported characters are entered in metadata fields, we could make a case for development work that helps with curation, e.g. something that allows or transforms those unsupported characters, or validation on those fields so the depositor is warned about unsupported characters before creating datasets. I agree this is an ongoing issue with no clear definition of done.
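The "identify datasets with unsupported characters" idea could be prototyped roughly as below. This is only a sketch in Python: the function name `find_suspect_values`, the sample records and identifiers, and the two patterns (HTML-like tags and character entities) are assumptions for illustration, since the thread does not define which characters count as unsupported, and none of this is Dataverse code.

```python
import re

# Hypothetical patterns: HTML-like tags and character entities.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);")

def find_suspect_values(records):
    """Return (id, field, value) triples whose value contains a tag or entity."""
    hits = []
    for rec_id, fields in records.items():
        for field, value in fields.items():
            if TAG_RE.search(value) or ENTITY_RE.search(value):
                hits.append((rec_id, field, value))
    return hits

# Made-up sample records standing in for rows pulled from the database.
sample = {
    "doi:10.0000/EXAMPLE1": {"title": "Income &amp; Wealth Survey"},
    "doi:10.0000/EXAMPLE2": {"title": "Survey of <b>Households</b>"},
    "doi:10.0000/EXAMPLE3": {"title": "Plain title, R&D included"},
}
suspects = find_suspect_values(sample)
```

A bare ampersand such as the one in "R&D" is not flagged here; only escaped entities and tag-like markup are. Counting the hits per field would give the "how often does this happen" number mentioned above.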
@jggautier thanks for bringing this issue to our attention. It's certainly related to #3845 so I grabbed it to at least leave a comment when I move a pull request into code review.
One suggestion I mentioned to @jggautier is to run a validator on the exported metadata file, rather than querying the database for known problematic strings (with the principle of "asserting goodness" rather than "enumerating badness").
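To make the "asserting goodness" idea concrete, here is a minimal Python sketch of such a validator. It assumes the exported metadata is XML with a `title` element; the function name `validate_export`, the `field_tags` parameter, the allowlist pattern, and the sample documents are all hypothetical and would need to be adapted to the real export format. It first requires the file to be well-formed XML, then requires each checked value to consist only of explicitly allowed characters.

```python
import re
import xml.etree.ElementTree as ET

# Assumed allowlist: letters, digits, whitespace, and common punctuation.
ALLOWED_RE = re.compile(r"[\w\s.,;:!?'()/&-]+")

def validate_export(xml_text, field_tags=("title",)):
    """Return a list of problems; an empty list means the export passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    for elem in root.iter():
        # Strip any namespace prefix such as "{http://...}title".
        local = elem.tag.rsplit("}", 1)[-1]
        if local in field_tags:
            text = (elem.text or "").strip()
            # Empty values also fail, because the pattern needs one character.
            if not ALLOWED_RE.fullmatch(text):
                problems.append(f"{local}: value not in allowed set: {text!r}")
    return problems

good = "<record><title>Income &amp; Wealth Survey</title></record>"
bad = "<record><title>Survey of &lt;b&gt;Households&lt;/b&gt;</title></record>"
broken = "<record><title>Income & Wealth</title></record>"
```

With these samples, `validate_export(good)` returns an empty list, `bad` is reported because the decoded title contains `<` and `>`, and `broken` is rejected at the parsing step because of the unescaped ampersand. The point of the allowlist is that new kinds of bad input are caught without anyone having to list them in advance.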
Closing this, since it's being tracked in Harvard Dataverse's curation GitHub.
(in process)
Review <, and > tags:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl%3A1902.1/20291&version=1.0
https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl%3A1902.1/20452&version=1.0