Support flexible DataCite resourceType metadata in a Dataset #7077
Comments
Are you suggesting to introduce a general resource type "Software" (or similar)? In the SCID WG report you refer to, they are also talking about a more granular classification of "software artifacts", e.g. code fragment, file, directory, commit, release, ... And at the end of the report, the authors conclude that "[t]he next step would be to produce a set of recommendations based on these findings". Maybe the implementation in Dataverse should be based on the forthcoming recommendations? As for the resource type of files within a dataset, you say:
I'd rather not replace "Dataset" with "Collection" as resource type to classify datasets. Although datasets may be seen as collections of files, I think we should reserve the term "Collection" for collections of datasets and other research outputs (e.g. software). That's at least how I interpreted "Collection" when I applied it to a sub-dataverse within DataverseNO upon request from a research group who wanted to have a DOI for their whole sub-dataverse / collection; cf. https://doi.org/10.18710/AJ4S-X394. |
@poikilotherm this is something we have discussed and there are many related topics/issues. I am pinging @jggautier and @djbrooke, as they are involved in considering this and related questions. |
@TaniaSchlatter thank you! @philippconzett About my suggestions: I would like to see a new metadata field in
As far as I understood the DataCite schema, they are open for more detailed/descriptive types in the XML value of |
While working on a GitHub integration, it was inherent that we wanted Dataverse to support software publication, so we spoke about how the current metadata exports label what's being published as "datasets" and how software would need to be represented in the metadata. But the DataCite and Dublin Core vocabs for resource types mention more than software. @poikilotherm, is this GitHub issue, and in particular your comment above this one, recommending that Dataverse support the publication of all of those types of objects or just software/software artifacts? |
The software publication is just my particular use case. The issue and implementation would, if you think that scope is fine, be about complete flexibility, as IMHO it doesn't make much sense to artificially limit this to "DataSet" and "Software". Instead, we should allow the complete controlled vocabulary of terms as mentioned above. In terms of metadata blocks it's easy to do because it is a CV, can have a sensible default ("DataSet") and can stay hidden from the user if it is not important for a given Dataverse. In terms of creating the XML for DataCite it's easy to do, because it's about inserting two values in a simple The metadata field in
The controlled vocabulary would be as follows:
Note: I left out Again: |
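A minimal sketch of the "two values" idea from the comment above, assuming the two values are the `resourceTypeGeneral` attribute and the free-text content of DataCite's `resourceType` element. The vocabulary below is an illustrative subset, not the list proposed in this thread, and the function name and default handling are hypothetical. Note that DataCite spells the default value "Dataset".

```python
import xml.etree.ElementTree as ET

# Illustrative subset of DataCite resourceTypeGeneral values (assumption:
# a real metadata block would carry the complete controlled vocabulary).
RESOURCE_TYPES_GENERAL = {"Dataset", "Software", "Collection", "Workflow", "Text", "Other"}
DEFAULT_TYPE = "Dataset"


def resource_type_element(general=None, description=""):
    """Build a <resourceType> element; fall back to the default type if none is given."""
    general = general or DEFAULT_TYPE
    if general not in RESOURCE_TYPES_GENERAL:
        raise ValueError(f"not in controlled vocabulary: {general!r}")
    element = ET.Element("resourceType", {"resourceTypeGeneral": general})
    # Free-text description; reuse the general type when nothing more specific is known.
    element.text = description or general
    return element


def to_xml(element):
    """Serialize the element to a string."""
    return ET.tostring(element, encoding="unicode")
```

For example, `to_xml(resource_type_element("Software", "Source code"))` yields a `resourceType` element with `resourceTypeGeneral="Software"`, while calling `resource_type_element()` with no arguments falls back to the default, so the field could stay hidden in installations that do not need it.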
Thx @pdurbin for initiating a call appointment for this. @qqmyers as this is metadata related, would you like to be included in the poll for date and time? |
Thanks. I'd suggest @adam3smith - he's very interested in/knowledgeable about best practices in reporting to DataCite. I'd be happy to join in but it looks like this would mostly involve using metadata blocks as designed versus requiring design changes (where I think my metadata focus is). |
Thank you @jggautier and @djbrooke for our video call earlier today. Let me summarize our findings for future reference and being SLOPI. We discussed the topic and concluded that you would rather see this as part of a bigger solution towards finally solving #2739. @jggautier kindly provided a list of work items to properly support software deposition in Dataverse at https://docs.google.com/document/d/1cDzVyc70SXYnbdRolYfY9tSwu9NMzHaD-FcNGW_sNyU. A short summary:
For now, other key features are in focus, so we agreed this will happen in our fork for Jülich DATA right now. If we work on more items from that list, we will keep each other posted via issues, screenshots etc. Our work would serve as a starting point for upstream support of this feature, once this gets more traction on your side again. As this is still of interest for you, I will leave this issue open. @djbrooke if you feel we should shorten the list of open issues, feel free to close. |
For a recent discussion happening in #8536 I looked at the DataCite Schema 4.4 for
|
Recent discussion here: |
sizing:
|
Leaving a note here that we might want to use https://vocabularies.coar-repositories.org/resource_types/ |
In this pull request... ... at 8593d32 I'm sending "Dataset", "Software" or "Workflow" for resourceTypeGeneral to DataCite. (Previously this was hard-coded to "Dataset".) Here's an example of "Software" (next to the name, pyDataverse) in the DataCite test environment: |
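The behaviour described in the comment above can be sketched roughly as follows. This is not the code from the pull request: the lowercase dataset type names and the function name are assumptions made for illustration, and only the three outgoing values ("Dataset", "Software", "Workflow") come from the comment.

```python
# Hypothetical mapping from an internal dataset type name to the
# resourceTypeGeneral value sent to DataCite. The keys are assumptions.
DATASET_TYPE_TO_RESOURCE_TYPE_GENERAL = {
    "dataset": "Dataset",
    "software": "Software",
    "workflow": "Workflow",
}


def resource_type_general(dataset_type):
    """Return the DataCite value, falling back to the former hard-coded "Dataset"."""
    key = (dataset_type or "").strip().lower()
    return DATASET_TYPE_TO_RESOURCE_TYPE_GENERAL.get(key, "Dataset")
```

Falling back to "Dataset" for unknown or missing types keeps the previous behaviour for every dataset that has no explicit type.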
tl;dr: Dataverse should offer a UI component to select the general dataset type with a CV based on DataCite/Dublin Core. The selected type should be used for metadata registration at DataCite.
Related issues:
Jülich DATA has an open request by @sciapp to publish software in our repo and get a DOI for releases (in line with the FORCE11 and DataCite recommendations).
Recently, the RDA WG published a paper, open for community comments, about the different PID options for software publications. The interesting part for Dataverse is the diagram for registered software datasets at DataCite.
Currently, software that gets published via Dataverse will have resourceTypeGeneral="DataSet" attached to it, as the metadata template does not allow for customization. (This is also true for #5086, where we might think about using type "Collection" for the dataset automatically and specific types for the files.)
Having software counted as "DataSet" makes things less discoverable and does not push research software engineering forward. See DataCite Schema Docs for a complete list of types.
A full example of software metadata can be found at DataCite.
Things to do for an implementation:
citation.tsv, using a controlled vocabulary based on resourceTypeGeneral. Should be mandatory, but happy to discuss. This will definitely have to be present onCreate, but could use "DataSet" as a default!
As this is a request to our ZB services and will become more important for Software Citations, we offer implementing it at the dataset level. Maybe @philippconzett can collaborate to provide it for file level in the same go?
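To make the citation.tsv item above more concrete, here is a rough sketch that generates controlled vocabulary rows for a metadata block file. It assumes the tab-separated `#controlledVocabulary` section layout (DatasetField, Value, identifier, displayOrder) used by Dataverse metadata block TSV files. The field name `resourceTypeGeneral` and the helper function are hypothetical, and the values passed in are up to the installation.

```python
def controlled_vocabulary_rows(field_name, values):
    """Render a #controlledVocabulary section as tab-separated text.

    Assumption: data rows start with an empty first column, the identifier
    column is left empty, and displayOrder counts up from 0.
    """
    header = "#controlledVocabulary\tDatasetField\tValue\tidentifier\tdisplayOrder"
    rows = [header]
    for order, value in enumerate(values):
        rows.append("\t".join(["", field_name, value, "", str(order)]))
    return "\n".join(rows)
```

Calling `controlled_vocabulary_rows("resourceTypeGeneral", ["Dataset", "Software"])` produces a header line followed by one row per term, which could then be appended to the block definition once the field itself has been declared.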
Comments please! 🚀
Pinging @IngoHeimbach @doigl @bronger @mfenner @TaniaSchlatter