Unresolved feedback from community review of Citation metadata fields #8467

Open
jggautier opened this issue Mar 8, 2022 · 0 comments

Members of the community who reviewed the first round of improvements to the Citation metadata fields (#8127) left a lot of feedback, and a good amount of it could not be addressed in those changes. To keep the momentum going, we're posting a summary of that unaddressed feedback in this GitHub issue. Some of it can probably be tackled separately, but addressing much of it should involve research with the community about the intended, actual and planned use of certain metadata fields, and research into the use of UI components for gathering better metadata.

  • Should all fields have tooltips? What are the merits for always using tooltips? What are the merits for using tooltips more sparingly?
    Technically, all field names are given tooltips, even when there is no tooltip text to show. This has led to some tooltips containing only the name of the field itself, if only so that something appears when a user hovers over or clicks on a tooltip. We should consider the form design principle that using tooltips sparingly makes the remaining tooltips more prominent and more likely to be used.

  • Should all fields always be named in the UI (have bold titles/field labels)?
    We proposed removing the Description's "Text" child field (in the Citation metadatablock), since it seems redundant. Is it necessary? Is the Description field the only compound field where we’d consider removing a child field or having the UI not display its name? The Description field is a compound field only because the DDI element that it's mapped to includes a “Date” attribute, which is meant to distinguish between descriptions written by data producers and by archives. So the field used in the Dataverse software's Citation metadatablock includes a Date field. But it doesn't include any field that maps to the DDI element's "source" attribute, and without it, it's not possible to tell which dated description was written by a data producer and which was written by the "archive." Have Dataverse repositories been using the Date field in this way? For repositories that are migrating or plan to migrate datasets described with DDI Codebook into Dataverse repositories, does their metadata include the use of these date and source attributes?

    Or can the Description's Date child field be removed from the Citation metadatablock and from every repository's existing datasets? If the date field can be removed, then the Description field would become a primitive field, there would be no child field to name, and there may be no other compound fields whose child fields have redundant names.

    If child fields didn’t have names in the UI, e.g. the Description field’s “Text” child field, how would documentation reference those fields? For example, if someone read in a user guide that they should fill in the Description field, would it be obvious to that person that they should fill in the nameless textbox on the dataset create page?

  • Can we improve how the names of child fields are displayed on certain pages?
    In the guidelines document, we write that child fields should not include the names of their parent fields. This is because when the child field is referenced alone on other pages, e.g. the search facet categories and the advanced search page, the parent name is added to the front. For example, the "Grant Agency" child field, whose parent's name is "Grant Information", is displayed on the Advanced Search page as “Grant Information Grant Agency”.

    We also write in the guidelines document that the displayed names of parent fields should remain singular. We wanted to pluralize the names of parent fields that take multiple values, e.g. Authors, as another way to indicate to depositors that those fields accept multiple values, e.g. depositors can add multiple authors. But we couldn’t because of how the names of child fields are constructed on certain pages. For example, we want to avoid having the UI display "Authors Name" or "Keywords Term" when those child field names are displayed as search facet categories and on the Advanced Search page.

    When these field names are translated into other languages, this convention is even more of a pain point. The GitHub issue at Internationalization - Compound names in Facet Category/Facet Label #6573 is about this.
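
The naming constraint described above comes down to simple string concatenation. The following is a minimal sketch, assuming the parent title is prefixed to the child title with a single space. The function name `compose_display_name` is hypothetical and is not taken from the Dataverse codebase.

```python
def compose_display_name(parent_title: str, child_title: str) -> str:
    """Prefix a child field's title with its parent's title, separated by a space."""
    return f"{parent_title} {child_title}"


# Titles come from the examples in this issue; the composition rule itself is an assumption.
for parent, child in [
    ("Grant Information", "Grant Agency"),  # child title repeats part of the parent title
    ("Authors", "Name"),                    # pluralized parent title
    ("Author", "Name"),                     # singular parent title
]:
    print(compose_display_name(parent, child))
```

Under that assumption, a pluralized parent yields "Authors Name", which is why the guidelines keep parent names singular and keep the parent's name out of child names.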

  • What guidance can we include in the metadata field guidelines about how to reference the names of child fields?
    How should the names of child fields be referenced in written guidance, e.g. repository’s own user guides and metadata guides?

  • How might we align appropriate fields (e.g. Producer, Distributor, Funding Information) to terms in the “CRediT” controlled vocabulary?
    Conversation from reviewers about this point is below. It's also important to note the related issue at Feature Request/Idea: Incorporate CRediT vocabulary for author/contributor roles #8213, and that the group that works on the DataCite metadata schema is having similar conversations. (See their roadmap page and public discussion in the forums.)

    • Steve McEachern: For each field in metadatablocks that ship with the Dataverse software, Steve suggests that we consider how other standards describe equivalent fields. Steve has specifically suggested looking at how terms from the contributor roles vocabulary “CRediT” are defined. Some of the CRediT terms could be mapped to some DDI fields like Producer and Distributor.

    • Steve McEachern: This (the Producer field) would be better if it is aligned with an external vocabulary (DDI-C predates the roles discussion internationally - but we have started to confront it as well). Suggest using CRediT (https://casrai.org/credit/) or a similar equivalent for comparability purposes - e.g. could be "Equivalent to CRediT's 'project administration' role". Note that there are backwards compatibility questions here for some installations (who would have to figure out if any of the definitions align with their own current usage).

      CRediT definitions:
      Project administration: "Management and coordination responsibility for the research activity planning and execution."
      Funding acquisition: "Acquisition of the financial support for the project leading to this publication."
      Supervision: "Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team."

      So this role is the RECIPIENT of any grant, not the Granting Agency.

    • Sebastian Karcher: But CRediT would mostly imagine individuals here? I've always thought of the Producer field principally for orgs.

  • Which fields in the metadatablocks that ship with the Dataverse software could use example values (or more example values) in their tooltips and what should those examples be?
    Several community members who reviewed the changes to the Citation metadatablock fields wrote that some tooltips need more examples of what depositors should enter in the corresponding metadata fields.

  • What are the differences between what's expected in the Citation metadatablock's Keyword field and what’s expected in its Topic Classification field?
    If there are differences, how can these differences be made clear? Both tooltips, and the DDI Codebook elements that both fields map to, reference LCSH. Are any Dataverse repositories, or repositories planning to migrate to a Dataverse repository, enforcing any distinction between these two fields? Maybe more expertly curated repositories that have migrated data from Nesstar? If there's no difference, should the two fields be merged and should only one be used going forward?

  • For the Citation metadatablock's "Characteristic of Sources Noted" field, should we include an example in its tooltip? What example?
    See the DDI-C definition. We can also look at what people have entered in this field and in the related “Origin of Sources” and “Documentation and Access to Sources” fields.

  • The tooltips of the Citation metadatablock's "Date of Collection" and "Time Period Covered" fields were edited. Do the changes help clarify to users how the two fields can be different? If not, how can the differences between these two fields be made clearer?

  • Review the use of the Citation metadatablock's "Deposit Date" field and how DDI Codebook defines it.
    In particular, how has the field been used for datasets that were published someplace else before being moved to a Dataverse repository?

    This has been brought up before, but recently Michael Steeleworthy wrote:
    "The Portage DV Metadata group spent a lot of time trying to understand the different dates in the citation block, especially the distro date and deposit date. Of note here, the DDI element [for Deposit Date] is defined as "The date that the work was deposited with the archive that originally received it. ..." (my emphasis.) This always gets me thinking about how <distDate> and <depDate> is used in Dataverse repositories vs how it is used in DDI, and if it has an effect on interpretation of the field's content when indexed by other platforms."

  • Review the use of the Citation metadatablock's "Distribution Date" field for embargo end dates
    DataverseNO uses this date field for embargo end dates (see discussion in Embargo: I want to set an embargo period to control when my data will be accessible. #4052). Are other repositories doing or encouraging the same with this or other date metadata fields? With embargo support in the Dataverse software, should Distribution Date be used for the embargo end date? If not, what should Distribution Date be for a dataset version that's been embargoed?

  • Character limits for the displayed field names/bold titles
    Is there a technical limit for the number of characters allowed for each displayed field name/bold title and should that limit be mentioned in the guidelines? Some of the bold titles in some repositories have over 200 characters. (See the bold titles in the “Alliance for Research on Corporate Sustainability Metadata” metadatablock in the dataset at https://doi.org/10.7910/DVN/25840)

  • How might the deposit forms make it easier for depositors to enter metadata?
    What improvements can be made to the deposit forms so that depositors add metadata that improves the discoverability, reuse and preservation of their data? The following is a summary of some discussion about the UI elements used to give guidance to depositors and data searchers and issues with how they’re used.

    • Because the tooltips are reused across different pages (e.g. when creating/editing datasets, when viewing datasets, when using the advanced search page), we’ve removed any instructions in the Citation field tooltips for how depositors should enter metadata. Otherwise people in other contexts, e.g. people who are looking for data, would see deposit instructions that don’t apply to them. Can we use tooltips to give instructions to depositors in ways that make sense for people who are looking for data and may also see the tooltips?
    • If metadata must be entered in a specific unit of measurement, e.g. feet or milligrams, how should depositors be made aware of this requirement? Should that be specified only in the title? Some bold titles indicate a unit, e.g. “Maximum (m)” where m is meters. This is inconsistently applied in the Astrophysics metadatablock. Or would adding the requirement in a watermark make it more noticeable to depositors?
    • Some fields should be filled when other fields are filled, but the forms don’t always give depositors guidance about this. It’s technically possible to force one child field to be filled if any other child fields are filled, e.g. the Producer field (see Custom Metadata: Allow Dataverse Installations to Define Conditionally Required Fields for Compound Fields #7606). This feature may work for some other fields, like the “Grant Information” field where the Grant Agency field should be filled if the Grant Number field is filled. But this feature doesn’t work for fields like the Author field, where the Author ID Scheme should be filled if the Author ID Number field is filled. This is because the Author field has other child fields that don’t need to be filled for the Author ID field to make sense, e.g. the Author Affiliation field.
    • Many fields that ask for an ID ask depositors for both an ID Scheme and an ID Number. This isn’t always an intuitive way to ask for this information, particularly for DOIs and ORCIDs where it’s easier for depositors to get and enter the ID URLs. But for other types of IDs, like ISBNs, the URLs may not exist or may not be easy for depositors to get.
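
The conditional-requirement point above (Grant Information versus Author) can be sketched as two different rules. This is only an illustration under stated assumptions: the function names, the dictionary layout and the sample values are hypothetical, the first function is a reading of the behavior described for #7606 rather than its actual implementation, and the pairwise rule is not an existing Dataverse feature.

```python
from typing import Dict


def is_filled(values: Dict[str, str], child: str) -> bool:
    """Return True if the child field has a non-blank value."""
    return bool(values.get(child, "").strip())


def missing_under_any_sibling_rule(values: Dict[str, str], required_child: str) -> bool:
    """Rule as described above: required_child must be filled if any other child is filled."""
    any_other_filled = any(is_filled(values, c) for c in values if c != required_child)
    return any_other_filled and not is_filled(values, required_child)


def missing_under_pairwise_rule(
    values: Dict[str, str], trigger_child: str, required_child: str
) -> bool:
    """Hypothetical rule: required_child must be filled only if trigger_child is filled."""
    return is_filled(values, trigger_child) and not is_filled(values, required_child)


# Illustrative values only; child field names follow the wording used in this issue.
grant = {"Grant Agency": "", "Grant Number": "ABC-123"}
author = {
    "Author Name": "Doe, Jane",
    "Author Affiliation": "Example University",
    "Author ID Scheme": "",
    "Author ID Number": "",
}

# Grant Information: the any-sibling rule flags the missing Grant Agency, as intended.
print(missing_under_any_sibling_rule(grant, "Grant Agency"))

# Author: the any-sibling rule demands an ID Scheme even though no ID Number was entered.
print(missing_under_any_sibling_rule(author, "Author ID Scheme"))

# Author: a pairwise rule demands an ID Scheme only once an ID Number is present.
print(missing_under_pairwise_rule(author, "Author ID Number", "Author ID Scheme"))
```

The second call shows why the existing approach does not fit the Author field: filling in a name or affiliation alone would trigger a requirement that only makes sense when an ID Number is present.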
  • Left justify the tooltip text for better readability
    The tooltip text is centered, but left-aligning it would make longer tooltip text easier to read.

  • Improving support for letting depositors both choose controlled vocabulary terms and type in their own
    In the Dataverse software, fields with controlled vocabularies let depositors choose only terms from that vocabulary. But there are cases where people managing data collections want people to be able to both choose terms from a controlled vocabulary and add their own terms if they feel no existing terms fit. A workaround sometimes used to allow for this is to include an “Other” term in the vocabulary list and include an “Other” field where depositors can add their own terms. But it’s not possible to mandate that depositors fill in that “Other” field if they choose “Other” from the controlled vocabulary field.

    Letting people enter their own terms into a field that includes a controlled vocabulary is a common way to improve controlled vocabularies. The terms that people write in can be reviewed to see how the controlled vocabulary can be improved. For example, are there missing concepts? Are concepts named in ways that are different from how most people think of them?
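
Both halves of the “Other” workaround described above, the check that is currently not possible to enforce and the review of write-in terms, can be sketched in a few lines. This is a minimal illustration, not Dataverse code: the function names, the record layout and the vocabulary terms are all made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple

OTHER = "Other"


def other_text_missing(selected_terms: List[str], other_text: str) -> bool:
    """Return True if "Other" was selected but the companion free-text field is blank."""
    return OTHER in selected_terms and not other_text.strip()


def tally_write_ins(records: Iterable[Tuple[List[str], str]]) -> Counter:
    """Count write-in terms, case-insensitively, across records where "Other" was selected."""
    counts: Counter = Counter()
    for selected_terms, other_text in records:
        if OTHER in selected_terms and other_text.strip():
            counts[other_text.strip().lower()] += 1
    return counts


# Each record pairs the chosen vocabulary terms with the text of the "Other" field.
records = [
    (["Survey", "Other"], "Focus group"),
    (["Other"], "focus group"),
    (["Other"], ""),       # "Other" chosen, nothing written in
    (["Interview"], ""),
]

print([other_text_missing(terms, text) for terms, text in records])
print(tally_write_ins(records).most_common())
```

A tally like this is one way curators could spot write-in terms that recur often enough to be considered for the controlled vocabulary.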
