Semantic Metadata API calls #6497
Comments
Thanks @qqmyers for the detailed description here. Sounds cool. If the grant proposal is shareable can you include a link? Any UI impacts or just new APIs and backend changes?
The proposal body is at https://docs.google.com/document/d/1L24GFIpp75Mc6PNodAk0619eLFAzhmFxv4zuX8Ihzd0/edit?usp=sharing. It promises a round-trip export/import of a dataset leveraging the OAI_ORE/BagIt mechanism already ~implemented for export, but doesn't get too specific about how. Initially I expect to make the DVUploader handle import, but that could turn into a button in Dataverse, etc. The intent/hope was to make this something that the Dataverse community could adopt, and getting community input to refine the design and final deliverables is part of the plan. We're also expecting this to be an exemplar for how GDCC can coordinate on community-requested developments going forward, so feedback on how to use issues, Google docs, the community mailing list, meetings, etc. is all welcome.
@qqmyers overall, I love the idea of being able to create a dataset in Dataverse using standards such as an ORE map (not that I'm very familiar with this specific standard). Thanks also for linking to the grant proposal. I see BagPack is mentioned. Great. Do you plan to follow up on the conversation at https://groups.google.com/d/msg/dataverse-community/YlUErmoIl30/gkMXSmQPBQAJ with a link to this issue? Or I can if you want. With regard to the JSON above, a few questions come to mind:
@pdurbin - thanks - feel free to crosslink where appropriate. I've tried to look through issues to see how/where things might connect and I/we'll send community emails going forward, but any help in connecting the dots is appreciated.
Per discussion, I'm going to move forward to implement something here and will use API versioning in doing so.
@qqmyers sounds good. Since there are purl URLs in this issue I'd like to point people to http://bitly.com/purl-crisis which is a Google doc ( https://docs.google.com/document/d/17TBUja8z8EJGx5ZEyknP3gWuPITf6lt-XkXDHU_CKaY/edit?usp=sharing ) I heard about last week in a lightning talk by @paulwalk at PIDapalooza 2020. From the doc, here are the issues he identified: "Issues with PURL:
In short, he's calling anyone who wants to help with this to add themselves to the Google doc above and get involved. Here's the tweet from https://twitter.com/pidapalooza/status/1222814910209495041 @qqmyers to be clear, I'm not asking you to do anything. Again, I just wanted to mention it.
Interesting. FWIW: The only PURLs here are ones that are assigned by orgs like Dublin Core, so presumably any action would be from those guys. Other than updating Dataverse if/when they switch, I don't see anything we can do in design to help for this API. In the larger work to be able to reimport Bags, we should expect that we might have to translate from one term to another for reasons like this and add support for that to the design.
A flatter API JSON structure would definitely make sense, as would creating mappings from/to different data sources. I will do that at least for pyDataverse in the next 2 months, shifting this from the source-code level to an XML or JSON file level, which is easier to maintain and change. So I guess we have some similar issues here.
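Moving such term mappings out of source code and into a standalone JSON file, as suggested above, might look roughly like the following. This is a minimal sketch; the file layout, the `dataverse_to_dublincore` key, and the field names are all hypothetical, not pyDataverse's actual format.

```python
import json

# Hypothetical mapping file contents: each direction maps an internal
# field name to an external term, so changing a mapping needs only a
# file edit, not a source-code change.
MAPPING_JSON = """
{
  "dataverse_to_dublincore": {
    "title": "dcterms:title",
    "dsDescription": "dcterms:description",
    "author": "dcterms:creator"
  }
}
"""

def load_mapping(text, direction):
    """Load one direction's term mapping from a JSON mapping file."""
    return json.loads(text)[direction]

mapping = load_mapping(MAPPING_JSON, "dataverse_to_dublincore")
print(mapping["title"])  # -> dcterms:title
```

Reversing the loaded dictionary gives the opposite direction for free, which is one reason a flat key/value mapping file is easy to maintain.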
The current API for getting, editing, and deleting dataset metadata requires understanding Dataverse's internal metadata representation and does not map local terms to global identifiers, as is now supported in metadata blocks. In contrast, the OAI_ORE export exposes metadata in a way that is more repository-agnostic / globally interpretable (see excerpt below).
The RDA grant to provide a standards-based archival format that can be exported/imported will need to import based on the OAI_ORE information. To do that, having a way to add/edit metadata, just given the information in the OAI_ORE file would be useful. (Nominally adding information about Dataverse's internal representation to the OAI_ORE would make it possible to format things as required by the current API, but part of the goal is to be interoperable with Bags/OAI_ORE files produced by other repositories which would not have this info.)
I propose to add new add/edit/delete API calls for dataset metadata that would have relatively flat JSON-LD payloads to support this work that I hope will be generally useful/easier to use than the current calls and that I hope can be retained as further work to update Dataverse's metadata capabilities (which I understand may replace/supplement metadatablocks) moves forward.
The general concept would be to allow upload in a format similar to the following, which is just a JSON-LD excerpt from an example dataset's ORE map. FWIW - I haven't fully scrubbed this example to remove things like the dataset version number, which is ~read-only, or the includedInDataCatalog entry, which represents the dataset's location in a dataverse rather than metadata, but I wanted to show the general concept that there is just a set of key/value pairs where the keys are then defined in the
@context
in terms of a global identifier that can be used by Dataverse or another repository to map the item to the correct internal field. (I.e., the initial implementation of these API calls would look at the global term and then look it up in the metadata block definitions to identify how to store it, essentially reversing the logic used to produce the ORE map in the first place: scanning through metadata fields and looking at the block definition to find the global term for each.)

Unless/until further work is done, e.g. to allow Dataverse to accept new metadata terms dynamically, these API calls could return an error if the metadata terms/values do not match the configured metadata blocks. Similarly, they could co-exist with the existing APIs since they don't change anything w.r.t. the internal representation.

Given the relatively small size and short timescale of the RDA grant work, I'd appreciate any comments/concerns/suggested alternative approaches, etc. as soon as possible. (FWIW: I plan to start by working on some of the logic to match a globally-defined term to an internal Dataverse field - this seems to be needed in some form to handle externally produced Bags, regardless of whether the metadata is eventually submitted through the existing APIs or the ones proposed here.)
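The term-resolution step described above could be sketched as follows. This is only an illustration of the idea, not Dataverse's implementation: the block definition, field names, and example payload are hypothetical, and the keys are resolved through the payload's @context exactly as proposed, with a lookup failure signalling a term no configured metadata block defines.

```python
# Sketch: resolve flat JSON-LD keys to internal metadata fields by
# reversing a metadata-block definition (all names here are illustrative).

# Hypothetical metadata-block definition: internal field name -> global URI,
# mirroring the per-field URIs now supported in metadata blocks.
BLOCK_DEFINITION = {
    "title": "http://purl.org/dc/terms/title",
    "dsDescription": "http://purl.org/dc/terms/description",
    "author": "http://purl.org/dc/terms/creator",
}

# Reverse the map: global URI -> internal field.
URI_TO_FIELD = {uri: field for field, uri in BLOCK_DEFINITION.items()}

def resolve(payload):
    """Map a flat JSON-LD payload's keys to internal fields via its @context.

    Returns internal field -> value; raises KeyError for any term that
    matches no configured metadata block (the proposed error behavior).
    """
    context = payload.get("@context", {})
    resolved = {}
    for key, value in payload.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context
        uri = context.get(key, key)  # keys may already be full URIs
        if uri not in URI_TO_FIELD:
            raise KeyError(f"No metadata block defines term: {uri}")
        resolved[URI_TO_FIELD[uri]] = value
    return resolved

# A hypothetical flat payload of the kind the proposal describes.
example = {
    "@context": {
        "Title": "http://purl.org/dc/terms/title",
        "Description": "http://purl.org/dc/terms/description",
    },
    "Title": "An Example Dataset",
    "Description": "Metadata expressed as flat key/value pairs.",
}

print(resolve(example))
# -> {'title': 'An Example Dataset', 'dsDescription': 'Metadata expressed as flat key/value pairs.'}
```

Producing the ORE map would use the forward `BLOCK_DEFINITION` map; importing reverses it, which is why keeping a single definition per block avoids the two directions drifting apart.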