As a researcher, I want my provenance to be made available to other researchers at publish time so that they can make better use of my data #4381
Comments
ping @MKLau |
On the upload page, can we refer to the provenance as "Documents" rather than "Bundles"? We're storing the data as bundles on the backend, but they're gonna be uploaded as standard PROV-JSON docs. The current wording may confuse people. |
@jacksonokuhn Your suggested revisions for the label/help text in the UI have been committed by @matthew-a-dunlap as part of issue #4343 (b058423). @dlmurphy was going to address the User Guide revisions in that branch as well. |
Update: instead of "bundle file", we're using "provenance file" as our terminology to describe the JSON file that users upload which contains provenance information about their data. |
Updated the mockups link in the first comment from @djbrooke to point to the mockups that include the Provenance tab on the file page, as opposed to the file upload workflow, which is covered in issue #4343. Included in that 4343 issue is a comment of mine that outlines "what we are building". Here is an outline for this preview issue.
What we are building
File Pg
(See issue #4345 for pre-publish provenance preview.) |
so here's basically what needs to happen on publish as a series of api calls:
for each datafile:
|
to connect the prov generated by publish to user uploaded prov, we do the following:
if the datafile was changed on ingest
|
During our weekly technical meeting we identified an additional need for Dataverse communicating with the provenance system (CPL). Specifically, CPL needs us to combine our uploads of provenance JSON into one call in cases where the provenance JSON is the same. To do this, we will need to look at all the provenance JSON being sent to CPL when we publish and see if any of the bundles match. Where they do match, we will need to tell CPL that the bundle points to multiple DataFiles and provide those files in a list. (Note: this description is a tad vague as I do not understand PROV syntax and the CPL APIs well enough to give better details.) Someday this could be improved with a UI enhancement that allows the user to select, for one DataFile, a prov.json file already uploaded for another file.
Open questions:
|
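The matching step described above could be sketched roughly as follows. This is only an illustration, not Dataverse or CPL code: the function name `group_files_by_provenance`, the integer DataFile ids, and the definition of "match" (structural equality of the parsed JSON, ignoring key order and whitespace) are all assumptions, and the actual CPL call is left out because its API is not described in this thread.

```python
import json
from collections import defaultdict


def group_files_by_provenance(prov_by_file):
    """Group DataFile ids by identical provenance documents.

    prov_by_file: dict mapping a (hypothetical) DataFile id to the
    provenance JSON string the user uploaded for that file.
    Returns a list of (parsed_document, [file_ids]) pairs, one per
    distinct document, in first-seen order.
    """
    groups = defaultdict(list)
    documents = {}
    for file_id, prov_json in prov_by_file.items():
        parsed = json.loads(prov_json)
        # Assumption: two documents "match" when their parsed JSON is equal.
        # Re-serializing with sorted keys and fixed separators gives a
        # canonical string, so key order and whitespace do not matter.
        key = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
        groups[key].append(file_id)
        documents.setdefault(key, parsed)
    return [(documents[key], sorted(ids)) for key, ids in groups.items()]
```

Each returned pair would then correspond to a single upload that lists every DataFile sharing that document. Note that JSON arrays are still compared in order, so two documents that differ only in array ordering would be treated as different under this sketch.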
Hopefully b23c396 gives a sense of what I've been working on lately, which is the "automatic" creation of prov data based on normal, boring interactions that Dataverse has supported since the beginning of time. That is to say, all I'm doing is creating a dataset, uploading a file, and publishing the dataset (all in

After looking at that commit and seeing that I was starting to implement something that's different than what @jacksonokuhn had in mind, he updated his comment at #4381 (comment) to put in more detail (thanks!) of the REST API endpoints that should be called into from the

To work on this issue, one must have the prov system running, which @matthew-a-dunlap and @sekmiller have done following instructions in pull request #4461. In #4364 @jacksonokuhn is working on moving the config from the Dataverse git repo to the cpl-prov repo but for now those configs in the pull request work fine. You download a couple files and spin up an Ubuntu VM in Vagrant running prov.

I'm on vacation next week so I'll take myself off this issue. |
So @matthew-a-dunlap and I were talking and it looks like there was a bit of confusion around order of operations. We should generate the automatic publish PROV and THEN upload the PROV bundle. I can talk about it more with whoever is working on this. |
Will close for now, will reopen if we decide to take on a similar approach with further provenance work. |
In #4343, we've added the ability to add Provenance files/freetext to Dataverse.
When this story is done, we'd expect those files to be sent to the provenance system and saved.