Metadata Schema Reference sheet #272

Open
fenekku opened this issue Feb 22, 2019 · 4 comments

Comments

@fenekku
Collaborator

fenekku commented Feb 22, 2019

[EDIT] For metadata reference, see now:

Core metadata model: inveniosoftware/invenio-rdm-records#1
Our extension: inveniosoftware/invenio-rdm-records#2

--

This is a reference sheet for what metadata the record must store or be able to derive:

| Field or equivalent | Notes | Why | Task | Implemented |
|---|---|---|---|---|
| Title | Required input | Citation, DOI minting | #10 | ✔️ |
| Authors (Creator) | Required input: first name (req), middle name, last name (req). For unknown authors: enter Unknown in required fields | Citation, DOI minting | #10 | ✔️ |
| Description | Required free text input | To describe record | -- | ✔️ |
| Resource Type | Required input | | #10 | ✔️ |
| DOI | Generated by DataCite | Persistent, unique identifier embedded in a larger identifier collection | #10 | ✔️ |
| Contributor | Can be placed in imagined People section as a role | Attribution, Optional DOI entry | TODO | |
| Date Created | Autogenerated | | -- | ✔️ |
| Grants and Funding | | For impact assessment, funding sources | TODO | |
| Keywords | Subjects are from controlled lists. Keywords are from crowdsourcing | | | |
| Language | Language of content. Optional input | General metadata | TODO | |
| Location | Location of presentation (location used in citation, this should be the one) | | | |
| Original Bibliographic Citation | Can this be: "Would like to be cited as"? | | | |
| Original Identifier | Pre-existing DOI if any; merge with DOI | | | |
| Page Number | Optional. Only for Book, Text Resources and Articles | | | |
| Private Note | SUPER_USER, librarian, owner, proxy can see it | | | |
| Publisher | Auto-generate menRva, but allow override. Multiple publishers should be allowed but to be seen how realistic this is | | | |
| Publication Year | Auto-generate | | | |
| Related URL | Optional input | | | |
| License (Rights) | Required input | | | ✔️ |
| Subject: Geographic Name | Location of subject matter - Feed from MeSH | | | |
| Subject: MeSH, Subject: LCSH | Optional input | | | ✔️ |
| Subject: Name | Optional. Name of person/organization referred to in content (e.g. book about someone) | | | |
| Visibility | Who can access the record: Public, Restricted, Private (+shared with) | | | missing shared with |
| Acknowledgements | | Attribution | | |
| Abstract | Optional. Actual abstract of document if any | | | |
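
To make the required/optional split in the table concrete, here is a minimal Python sketch of a record with a check for missing required inputs. The names `Author`, `RecordMetadata` and `missing_required` are hypothetical and are not taken from menRva or Invenio; only the choice of which fields are required comes from the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Author:
    # First and last name are required; middle name is optional.
    # For unknown authors the table says to enter "Unknown" in the required fields.
    first_name: str
    last_name: str
    middle_name: Optional[str] = None


@dataclass
class RecordMetadata:
    # Required inputs according to the table.
    title: str = ""
    authors: List[Author] = field(default_factory=list)
    description: str = ""
    resource_type: str = ""
    license: str = ""
    # A couple of the optional inputs (not exhaustive).
    language: Optional[str] = None
    related_urls: List[str] = field(default_factory=list)


def missing_required(record: RecordMetadata) -> List[str]:
    """Return the names of required fields that are empty (hypothetical helper)."""
    missing = []
    for name in ("title", "description", "resource_type", "license"):
        if not getattr(record, name).strip():
            missing.append(name)
    if not record.authors:
        missing.append("authors")
    for author in record.authors:
        # One incomplete author is enough to flag the field once.
        if not author.first_name.strip() or not author.last_name.strip():
            missing.append("authors")
            break
    return missing
```

An empty list from `missing_required` would mean the record carries every required input; autogenerated fields such as DOI and Date Created are deliberately left out because the depositor does not supply them.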

Related links:

@saragon02
Collaborator

Working menRva metadata schema, based on DigitalHub and removing ARK:

Abstract
Acknowledgements
Contributor
Creator
Date Created
Description
DOI
Grants and Funding
Keywords
Language
Location
Original Bibliographic Citation
Original Identifier
Page Number
Private Note
Publisher
Related URL
Resource Type
Rights
Subject: Geographic Name
Subject: MeSH, Subject: LCSH
Subject: Name
Title
Visibility
 
Extra fields available for resource type groups: Dataset, Articles, Study Documentation, Theses & Dissertations and Text Resources
Data Access
Data Collection Method
Tools & Measures
Study Type
Research Design
Sample Size
Subject of Study
Population Gender
Population Age
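
One way to picture the rule above is a lookup that only offers the extra fields for the listed resource type groups. This is a sketch under assumptions: the helper name `extra_fields_for` and the constant names are hypothetical, and the group and field labels are copied from the lists above.

```python
from typing import List

# Resource type groups that get the extra fields, as listed above.
GROUPS_WITH_EXTRA_FIELDS = {
    "Dataset",
    "Articles",
    "Study Documentation",
    "Theses & Dissertations",
    "Text Resources",
}

EXTRA_FIELDS = [
    "Data Access",
    "Data Collection Method",
    "Tools & Measures",
    "Study Type",
    "Research Design",
    "Sample Size",
    "Subject of Study",
    "Population Gender",
    "Population Age",
]


def extra_fields_for(resource_type_group: str) -> List[str]:
    """Return the extra fields to offer for a resource type group (hypothetical helper)."""
    if resource_type_group in GROUPS_WITH_EXTRA_FIELDS:
        # Return a copy so callers cannot mutate the shared list.
        return list(EXTRA_FIELDS)
    return []
```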

@fenekku fenekku added this to the Phase 2 milestone Jun 10, 2019
@saragon02
Collaborator

Some questions from the table may need to be answered based on further metadata subcommittee meetings. My recommendations to date are below:

Contributor: Great idea for a People section, and to have Contributor as a role
Keywords: Can ‘live’ near the Subject section, but these are the user-generated keywords that might eventually be saved into menRva’s own crowdsourced dictionary
Location: This is location of publisher or where the work first went public (e.g., city where a conference presentation was first given). Can be auto-generated if that information is easily harvestable
Page Number: Make this field available for resource type categories: Book, Text Resources and Articles
Private Note: Only visible for the record owner
Publisher: Allow multiple to display, but menRva always displays first
Subject Geographic Name: Different from Location, more about a distinct space as the subject matter of the thing deposited. Fed from: http://id.loc.gov/authorities/subjects/sh85089606.html
Subject Name: Refers to a person being the subject matter of a deposit (e.g., a book written about someone). Fed from: http://id.loc.gov/authorities/names.html. Format of display can be in LOC Name Authority format: Last Name, First Name, Middle Name or Initial
Abstract: Different from Description and still needed. People have used them interchangeably in DH, but one use case is for archival items, where a description of the physical thing is often entered in Description, and other descriptive information is entered in Abstract. Some also use Abstract as it would be used in a scholarly publication.
Original Bibliographic Citation: Will follow up with format preference.
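
The Publisher recommendation (allow several, but menRva always displays first) could be sketched as below. The function name `display_publishers` is hypothetical, and skipping blanks and case-insensitive duplicates is an assumption that the recommendation itself does not state.

```python
from typing import Iterable, List

DEFAULT_PUBLISHER = "menRva"


def display_publishers(supplied: Iterable[str]) -> List[str]:
    """Return publishers in display order: menRva first, then the others as supplied."""
    ordered = [DEFAULT_PUBLISHER]
    for name in supplied:
        cleaned = name.strip()
        # Assumption: skip blanks and case-insensitive duplicates, including a repeated "menRva".
        if cleaned and cleaned.lower() not in (p.lower() for p in ordered):
            ordered.append(cleaned)
    return ordered
```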

@fenekku fenekku modified the milestones: Phase 2, DigitalHub Parity Jul 22, 2019
@fenekku
Collaborator Author

fenekku commented Aug 14, 2019

Raw notes by @LisaOKeefe1 from the discussion on metadata with @lnielsen:

  • Metadata: Galter’s fields
    • Title - for RDM need a single title (will not be repeatable)

      • Zenodo uses DataCite values for Unknown
    • Author - usually split in two: given (includes middle) and family names (W3C)

      • Nielsen, Lars Holm
        • Any research you’re dealing with will involve non-Western names.
      • Incorporating ORCID. Invenio will clean up ORCID # vs. URL
        • Think strategically about partnering with ORCID (KH)
        • Authority records could be used/helpful (maybe LC Name Authority File, VIAF)
      • For future: enabling author paste from PDF articles. Will facilitate easy formatting from paste
      • Organizations as author - leveraging ROR (Research Organization Registry)
    • Description - should be a single description field

      • Plus multiple “Additional notes” denoting other metadata like a description (e.g., Abstract)
      • Side discussion: Private notes: what is a proxy? How is that different than submitter? Why is it not seen? What is the intent?
    • DOI - at Zenodo we generate internally. Galter asks DataCite to provide.

    • Resource type - should generally be two levels (type + subtype).
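
For the "ORCID # vs. URL" clean-up mentioned under Author, here is a minimal sketch of what such a normalization could look like. It is not Invenio's implementation; the function names are hypothetical. The 16-character format and the ISO 7064 MOD 11-2 check character follow ORCID's published identifier structure.

```python
import re

# Four groups of four characters; hyphens optional, last character may be X.
ORCID_PATTERN = re.compile(r"(\d{4})-?(\d{4})-?(\d{4})-?(\d{3}[\dX])")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(value: str) -> str:
    """Accept a bare ORCID iD or an orcid.org URL and return the bare, hyphenated iD."""
    cleaned = value.strip().upper()
    # Drop a leading URL prefix such as https://orcid.org/ if present.
    cleaned = re.sub(r"^(HTTPS?://)?(WWW\.)?ORCID\.ORG/", "", cleaned)
    match = ORCID_PATTERN.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"not an ORCID iD: {value!r}")
    digits = "".join(match.groups())
    if orcid_check_digit(digits[:15]) != digits[15]:
        raise ValueError(f"bad ORCID check digit: {value!r}")
    return "-".join(match.groups())
```

Storing the bare iD and rendering the URL only on display keeps one canonical form per author, whichever way the depositor typed it.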

more editing needed
      • Galter’s COAR-based resource types were generated hierarchically and this hierarchy is mapped and indexed on the back-end. If this is complicated for indexing, we can take it out.
    • Contributor - controlled list of contributor types should be customizable
      • Zenodo’s pulls from DataCite
      • Galter’s could pull from the CRO
      • In Galter’s UI we are thinking of a “people” group of fields, which can be tagged with roles. Only those tagged with “Author” populate the citation
      • Kristi shared CRO on the Ontology Lookup Service website
      • Lars wondering if roles are done. Yes, just getting input.
      • KH - we can take this, sit down with Martin, map their contributor roles to ours.
      • Lars - to add it we need to determine if it is a new thing to add. If so, they add it.
    • Grants funding
      • Optimization to make it easier. Registry of grant numbers
      • OpenAIRE has the biggest database, some US funders. Crossref has open funder registry.
      • Kristi - we can get the federal stuff. There’s an API to NSF, DARPA, a bunch of them.
      • Sara - we wonder if we need three fields, grant name, number and link (provided in the DataCite XML schema)
      • Lars - problem comes down to grant number if funder doesn’t have grant identifier or if it’s not persistent. In EU there’s acronym, too.
      • Can leverage FundRef for the direct link to grant pages. Would run into problems for any grants that don’t have a webpage
    • Keywords: free text
    • Language
      • Will we support multiple?
      • Internationalization support when?
      • Recommend asking user to select a primary language of the deposit/record
    • Location of presentation
      • About where original material was presented
      • DataCite doesn’t have fields for this. In Galter’s instance Location would need to be a custom field. See if others have the same requirements
      • Also, see: exhibit. Where does it take place?
      • Different from GeoLocation, which refers to a place as part of the subject matter of the deposit/record
    • Original identifier - can’t accept those
    • Page number - will be updated by Galter to something like number in sequence.
    • Publisher - required by DataCite. Will auto-generate “Invenio RDM”
      • Can supply others as in DH. We’re thinking of restricting to only one
    • Publication - publication date vs. submission date. Submission is auto-generated and refers to publication of the record. Publication can be a supplied publication date, can be much earlier
    • Related URL - we’re going to end up with a collection of 3 fields to describe this. We will most likely customize choices to our needs
      • For relationType Galter wants to keep:
        • Related
        • Cites
        • Part
        • an Alternate Identifier of
        • Is Related To (catchall when no other relationType applies)
    • License: Could really use a wizard or some sort of definitions/guidance for users as in CC-Australia
    • Subjects (GeoLocation, MeSH, LCSH, etc.)
      • Show all the different types in a Subjects group, leverage and make explicit the controlled vocabs we’re using
    • Acknowledgements - acknowledge some on campus institute
      • This is a type of “Description” - application of the “Additional Notes” to denote the field for Acknowledgements
      • Could hold a text blurb. Otherwise, if it’s about acknowledging ppl who contributed in different roles to the deposit, wouldn’t Contributors with a fully deployed CRO cover this?
        • Almost, but Kristi mentioned there will always be some who need attribution who can’t be linked to any other way.
      • “Parking lot” for now. Could leverage JATS and ICMJE in the future
  • Metadata: Zenodo
    • References: Free text box
    • Journal fields (for recording journal information for citations. Comes from an earlier version of Zenodo)
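
The “people” group idea from the Contributor notes above (everyone is a person tagged with roles, and only those tagged “Author” populate the citation) could be sketched roughly as follows. The `Person` class, the `citation_authors` helper and the `"; "` separator are assumptions for illustration; the given/family split follows the Author notes.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Person:
    family_name: str
    given_name: str  # includes any middle name, as in the Author notes
    roles: Set[str] = field(default_factory=set)


def citation_authors(people: List[Person]) -> str:
    """Join the people tagged "Author" as "Family, Given", in input order."""
    names = [
        f"{p.family_name}, {p.given_name}"
        for p in people
        if "Author" in p.roles
    ]
    # Assumption: authors are separated by "; " in the citation string.
    return "; ".join(names)
```

A person can carry several roles at once, so someone tagged both “Author” and a contributor role from the controlled list would still appear in the citation exactly once.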

@saragon02
Collaborator

LinkedDataResources_MeSH_FAST_Others.pdf
If it is helpful, here is a PDF with links to linked data resources for various controlled vocabularies.
