Dataset Metadata - clean up titles and other metadata that contain special characters #2133

Closed
raprasad opened this issue May 5, 2015 · 6 comments
Labels
Feature: Metadata Type: Suggestion an idea User Role: Curator Curates and reviews datasets, manages permissions UX & UI: Design This issue needs input on the design of the UI and from the product owner

Comments

@raprasad
Contributor

raprasad commented May 5, 2015

(in process)
Review datasets whose titles contain < and > characters, for example:

https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl%3A1902.1/20291&version=1.0
https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl%3A1902.1/20452&version=1.0

@raprasad raprasad added UX & UI: Design This issue needs input on the design of the UI and from the product owner Priority: Medium Type: Suggestion an idea labels May 5, 2015
@raprasad raprasad added this to the In Review - Short Term milestone May 5, 2015
@mheppler mheppler changed the title clean up titles and other attibutes that resemble html tags Dataset Metadata - clean up titles and other metadate that contain special characters Jan 27, 2016
@scolapasta scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
@pdurbin
Member

pdurbin commented Jun 27, 2017

@raprasad can we close this? It sounds like a data curation issue rather than a code issue.

@jggautier jggautier changed the title Dataset Metadata - clean up titles and other metadate that contain special characters Dataset Metadata - clean up titles and other metadata that contain special characters Jun 27, 2017
@pdurbin
Member

pdurbin commented Jun 28, 2017

Discussed with @raprasad and we agree it's a data curation issue, not a code issue. @jggautier or @sbarbosadataverse can you please take a look?

@jggautier
Contributor

jggautier commented Aug 10, 2017

Just came across this dataset with an ampersand in the title: http://dx.doi.org/10.7910/DVN/XJVVQX

In the citation box (and the breadcrumb), &amp; winds up in the dataset's title. (Not sure how related #3845 is.)

I imagined solving this issue could involve querying the database to identify datasets that have unsupported characters in certain metadata fields; searching for those datasets isn't possible now (#2702). We could then curate those datasets more thoroughly. With a better sense of how often unsupported characters are entered in metadata fields, we could also make a case for development work that helps with curation, e.g. something that allows or transforms those characters, or validation on those fields so the depositor is warned about unsupported characters before creating a dataset.

I agree this is an ongoing issue with no clear definition of done.
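
As a rough illustration of the scan described above, here is a minimal Python sketch that flags metadata values containing leftover HTML entities (such as &amp;) or tag-like fragments. It does not use Dataverse's database schema or API: the record is assumed to be a plain dict of field name to string, and `find_suspect_fields`, `ENTITY_RE`, and `TAG_RE` are hypothetical names chosen for this example.

```python
import re

# Hypothetical patterns, not taken from Dataverse:
# a leftover HTML entity such as "&amp;" or "&#38;" ...
ENTITY_RE = re.compile(r"&(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#x[0-9a-fA-F]+);")
# ... and a fragment that resembles an HTML tag such as "<b>" or "</i>".
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def find_suspect_fields(record):
    """Return {field: [reasons]} for values that look escaped or tag-like.

    `record` is assumed to be a dict mapping field names to string values.
    """
    suspects = {}
    for field, value in record.items():
        reasons = []
        if ENTITY_RE.search(value):
            reasons.append("entity")
        if TAG_RE.search(value):
            reasons.append("tag")
        if reasons:
            suspects[field] = reasons
    return suspects


if __name__ == "__main__":
    example = {
        "title": "Cats &amp; Dogs",   # leftover entity
        "author": "Smith <b>J</b>",   # tag-like fragment
        "subject": "R&D costs",       # bare ampersand, not flagged
    }
    print(find_suspect_fields(example))
```

Counting how many records come back non-empty would give the "how often" figure mentioned above, though the patterns here only cover two kinds of problem and would need tuning against real data.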

@pdurbin pdurbin self-assigned this Aug 10, 2017
@pdurbin
Member

pdurbin commented Aug 10, 2017

@jggautier thanks for bringing this issue to our attention. It's certainly related to #3845 so I grabbed it to at least leave a comment when I move a pull request into code review.

@pameyer
Contributor

pameyer commented Aug 10, 2017

One suggestion I mentioned to @jggautier is to run a validator on the exported metadata file, rather than querying the database for known problematic strings (with the principle of "asserting goodness" rather than "enumerating badness").
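
A minimal Python sketch of that "asserting goodness" idea, under several assumptions: the export is taken to be an XML string, the allowlist of acceptable characters is invented for illustration, and `validate_export` and `ALLOWED_RE` are hypothetical names rather than anything in Dataverse. Instead of listing bad strings, it accepts only text that fully matches the allowlist and reports everything else.

```python
import re
import xml.etree.ElementTree as ET

# Assumed allowlist: word characters, whitespace, some common punctuation,
# and "&" only when it does not start an entity-like sequence such as "&amp;".
ALLOWED_RE = re.compile(r"(?:[\w\s.,:;'()\-/?!]|&(?![#\w]+;))*")


def validate_export(xml_text):
    """Return a list of (tag, text) pairs that fail the allowlist.

    A document that is not well-formed XML yields a single problem entry.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [("<document>", f"not well-formed: {exc}")]

    problems = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        # Anything not fully covered by the allowlist is reported.
        if text and not ALLOWED_RE.fullmatch(text):
            problems.append((elem.tag, text))
    return problems


if __name__ == "__main__":
    sample = (
        "<record>"
        "<title>Cats &amp;amp; Dogs</title>"  # double-escaped ampersand
        "<author>Smith, J.</author>"
        "<note>a &lt;b&gt; tag</note>"        # decodes to tag-like text
        "</record>"
    )
    for tag, text in validate_export(sample):
        print(tag, repr(text))
```

The trade-off is that an allowlist will also flag legitimate but unanticipated characters, so the reported items are candidates for curator review rather than confirmed errors.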

@pdurbin pdurbin added User Role: Curator Curates and reviews datasets, manages permissions and removed Vote to Close: pdurbin labels Aug 10, 2017
@jggautier
Contributor

Closing this, since it's being tracked in Harvard Dataverse's curation github.

@pdurbin pdurbin removed their assignment Aug 14, 2017