Multiple values being entered into metadata fields that expect only one value (e.g. authors and keywords) #4035
Labels
Feature: Metadata
User Role: Depositor
Creates datasets, uploads data, etc.
UX & UI: Design
This issue needs input on the design of the UI and from the product owner
This has been brought up in other channels, but I haven't seen it reported in a GitHub issue.
Some dataset depositors enter a string of keywords, separated by semicolons or commas, into a single keyword metadata field. I suspect that most of the time those keywords are copied from the associated articles' keywords and pasted into the dataset keyword field. Depositors doing this don't realize that to enter multiple terms they should click the + sign to add another set of keyword fields (and of course they might expect the metadata field to parse the string, since they've seen other applications do it).
This is a problem because the fields don't split the string on the common semicolon or comma characters and treat each keyword as a separate term; they store the whole string as one term, which hurts discoverability.
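To make the expected behavior concrete, here is a minimal sketch of the kind of parsing depositors seem to assume. It is not Dataverse code: `split_keywords` and its `delimiters` parameter are hypothetical names chosen for illustration. The default splits only on semicolons, on the assumption that a comma is more likely to be part of a single legitimate term.

```python
import re


def split_keywords(raw, delimiters=";"):
    """Split a pasted keyword string into trimmed, de-duplicated terms.

    Hypothetical helper, not part of Dataverse. `delimiters` is a string
    of single characters to split on; only ';' is used by default because
    commas often occur inside one legitimate term.
    """
    # Build a character class such as "[;]" or "[;,]" from the delimiters.
    pattern = "[" + re.escape(delimiters) + "]"
    seen = set()
    terms = []
    for part in re.split(pattern, raw):
        term = part.strip()
        key = term.casefold()  # compare case-insensitively when de-duplicating
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


# A pasted, semicolon-separated string becomes separate terms.
pasted = split_keywords("climate; sea ice ;Arctic;")

# A single term that happens to contain commas is left intact by default.
single = split_keywords("Firm Objectives, Organization, and Behavior")
```

Passing `delimiters=";,"` would also split on commas, which is where false splits of single terms become likely, so that choice probably needs input from the product owner.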
To get a sense of how often this is done in Harvard Dataverse and how much of an issue it is, we could query Harvard Dataverse for all of the keywords of non-harvested datasets, then see how many contain semi-colons and/or commas.
I think we could also do the same query for harvested datasets in Harvard Dataverse, which might help indicate the size of the problem with the way keyword metadata is being harvested.
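Once the keywords have been exported, the counting step could look something like the sketch below. It assumes the export has already been loaded into a dict that maps a dataset identifier to its list of keyword strings; that shape, the function name `summarize_delimiters`, and the sample data are all invented for illustration and do not reflect the actual Dataverse schema or any real query results.

```python
def summarize_delimiters(keywords_by_dataset):
    """Count datasets whose keywords contain semicolons or only commas.

    `keywords_by_dataset` is an assumed structure: a dict mapping a
    dataset identifier to a list of keyword strings.
    """
    with_semicolon = 0
    comma_only = 0
    for keywords in keywords_by_dataset.values():
        if any(";" in keyword for keyword in keywords):
            with_semicolon += 1
        elif any("," in keyword for keyword in keywords):
            # Tracked separately: a comma may belong to one valid term.
            comma_only += 1
    total = len(keywords_by_dataset)
    return {
        "total": total,
        "with_semicolon": with_semicolon,
        "comma_only": comma_only,
        "semicolon_share": with_semicolon / total if total else 0.0,
    }


# Made-up sample data, for illustration only.
sample = {
    "doi:10.0000/EXAMPLE1": ["climate; sea ice; Arctic"],
    "doi:10.0000/EXAMPLE2": ["Firm Objectives, Organization, and Behavior"],
    "doi:10.0000/EXAMPLE3": ["survey", "panel data"],
    "doi:10.0000/EXAMPLE4": ["trade, tariffs; exports"],
}
summary = summarize_delimiters(sample)
```

Running the same function separately over harvested and non-harvested datasets would give the two figures to compare.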
Both harvested and non-harvested datasets have keywords like this. Last November, Leonid sent me the results of a query that included harvested and non-harvested datasets with keyword metadata. Of the approximately 20,000 unique datasets in those results, about 2,500 (12 percent) have keywords containing one or more semicolons. (I'm not including keywords that have only commas, because a greater portion of those are likely to be single-term keywords that happen to contain a comma, e.g. "Firm Objectives, Organization, and Behavior" from JEL codes or lots of terms from LCSH.)