Embargo: I want to set an embargo period to control when my data will be accessible. #4052
Comments
This is related to, or a duplicate of, "Ability to Set a Dataset Availability Schedule" (#934).
@aschenbach - I closed the previous issue you commented on (#934) in favor of this. |
I'd like to draw your attention to the article Overly Honest Data Repository Development. Under the heading Delayed Release, a.k.a. Embargoes there is an interesting discussion on how embargo is handled in the Illinois Data Bank. |
@philippconzett this is a fantastic resource. Thanks! I'm glad to see they did some user research before hacking on the code. 😄
@pdurbin We're considering putting embargo functionality on our development roadmap... it is something QDR really wants to implement in the next year ... Are you all still working on this feature? ... If not, we will do some scoping on our end and get in touch.
@nniiicc hi! I'm not aware of any immediate effort we're planning and your comment is enough for me to tag you in the "Dev Efforts by the Dataverse Community" spreadsheet I mentioned at https://groups.google.com/d/msg/dataverse-community/X2diSWYll0w/ikp1TGcfBgAJ . For now I made the status "scoping the project" and I'm sure that people would be interested in whatever you write up! |
@nniiicc - it's awesome to hear that QDR may take this on. We're happy to work with you and other community members if you'd like to lead on this. |
When we're approaching this, we should consider three user stories:
Also to consider: when the data is no longer available, what happens? Deaccessioning? Or are we introducing a new concept? Maybe the first two would make sense to handle first as they are more similar. |
+1 for embargo feature request. @ubc, we have been requested many times to embargo specific datasets, it would be brilliant to allow a feature to do just that! |
The most typical use-case for dataset embargo is where the embargo is lifted only when the related research article is published. The author and their chosen journal don’t want to make the data public until the manuscript is published. However, since the publication date is not known when the data are deposited, setting a specific embargo date is not possible. (Peer review takes a variable length of time, and there is no guarantee of journal acceptance). The simplest and most pragmatic way to handle this would be to allow the author to manually lift the embargo once they are notified of the publication date by the journal. A more automated solution would be to support an API call from publisher workflow systems (such as Editorial Manager www.ariessys.com ) to automatically lift the embargo. |
@Richardcwynne good points and thanks for contributing to the discussion on this during this week's Dataverse Community Meeting. It looks like there were some light notes taken in the "UX/UI Collaborative Session Notes" doc at https://docs.google.com/document/d/1JlcwuQ8WIiVzG5n9ejfIlbZFX_oekvvWU7vkhtITCqc/edit?usp=sharing (people should feel free to add more!):
@nniiicc are you still interested in helping with this effort? |
@Richardcwynne I was hoping to catch you during one of the breaks, but can also leave my question / comment here. Is there any difference between a dataset that is "embargoed indefinitely" versus just keeping the dataset unpublished? The way I/we had been thinking is that embargo was a way to have Dataverse publish a dataset at a future time in an automated way. If it requires human (or external) interaction, wouldn't indefinite embargo just be the same as leaving the dataset unpublished? (and then having the researcher manually publish, rather than manually lift the embargo; or similarly, having the external system call existing publish API) |
I actually disagree with @Richardcwynne about his interpretation on embargo: Embargoed data is a published dataset with active DOI where data files are restricted until the end of the embargo period. This way, the data are discoverable even when the data files aren't available yet. @scolapasta we're still working on our roadmap for the next months -- given the current status, it's not a particularly high priority, but it's definitely still on there, yes. |
@scolapasta - sorry I missed you (I cannot be there today). Agreed, an indefinite embargo and a dataset remaining unpublished indefinitely are functionally equivalent. I don't have detailed knowledge of the Dataverse system, so the best I can offer is to describe the use case from a journal point of view. Typically the journal wants access to the dataset to be embargoed until the publication date of the related manuscript. (However, deposit prior to manuscript publication is very beneficial since it generates a DOI that can be incorporated into the body of the yet-to-be-published manuscript.) To help distinguish the use case in the UI, maybe you just add an embargo option called something like "Embargo until manuscript publication". Even if this is functionally equivalent to indefinite embargo, it might help reassure the researcher and communicate the workflow. Hope this helps. Drop me a note on LinkedIn if you'd like to set up a call to discuss in more detail.
After internal discussion and taking community input into account (thanks for the discussion!), we are defining embargo as the system publishing a draft dataset version on a date, and plan on moving forward with the following as an initial step:
To do:
I don't quite understand how regular users would be able to see the embargoed data (landing page) if the embargo is implemented this way. Am I missing something? In my understanding (and I'm pretty sure that's how this is commonly understood in the data community), embargoed data has visible metadata & landing pages and a published (not reserved) DOI with |
Hi @adam3smith. Thanks for adding this! Could you include some reasons why it's important that the metadata is published (even though file publication is delayed)? I've read that it's helpful for the public to know that the dataset exists, even though the files won't be available. But @TaniaSchlatter, would some concrete examples of why help? Perhaps it would have an effect similar to pre-registration?
@adam3smith For your use case, the steps would be:
Thoughts?
Thanks @scolapasta -- I think that should work. Do I understand correctly that v. 1.0 would then clearly indicate the existing embargo, as part of @TaniaSchlatter's to dos? Having the date the embargo runs out clearly visible to the public is crucial (see below). @jggautier The most obvious case is replication data with a first-use period of 2 years after publication. In that case you would want to be able to see that replication data was deposited, and CrossRef should be able to link to the data when the journal article is published. People would then be able to see that such data exists, roughly what it contains (i.e. metadata) and when they'll be able to access it.
@adam3smith is there a use case for supporting "Request Access" to these restricted files in a dataset with the embargo set or would that feature be disabled? |
@scolapasta In that case it is also possible to publish some documentation files with v1.0, which is very nice. The only problem I see is that this workflow isn't explained through the GUI (right?) |
@mheppler -- I don't know. I'd say typically no -- embargo is embargo, not conditional access -- but I'm sure there's some use case. I wouldn't think it'd be important enough to put much effort into fulfilling it, but I wouldn't rule it out if it's easy, I guess? |
Sorry for my late reply. Embargo should definitely be implemented as a functionality for PUBLISHED datasets, cf. my original request: "In quite a few cases, researchers would like to deposit and publish metadata about their project data, but also be able to specify an embargo period ..." |
I'm not sure how far the implementation of the embargo feature has come, but I'd like to add some more thoughts about the definition of metadata about embargoed datasets. According to the current version of the DataCite Metadata Schema (4.1), the following applies for the DataCite metadata field PublicationYear:
@jggautier pointed out the following in the Dataverse Users Community Google Group:
As far as DataCite is concerned, I think we could solve this problem using file level DOIs that are now available in Dataverse in the following way:
Best, |
I learned in #5920 that OpenAIRE has an "embargoedAccess" access level: https://guidelines.openaire.eu/en/latest/literature/field_accesslevel.html
The definition of embargoedAccess is the following, from https://wiki.surfnet.nl/display/standards/info-eu-repo/#info-eu-repo-AccessRights:
"This type of access means the resource is Closed Access, until released for Open Access on a certain date. (this date can be specified elsewhere in the metadata) The incentives to embargo a resource has often to do with the fairness towards publisher to provide them an advantage as the exclusive first deliverers of knowledge. Also embargoes are used in sections of a Ph.D. thesis that are concerned with classified corporate information."
Hi Thomas |
@tjouneau hi! This issue is linked from the "Future" section of our roadmap, which you can find at https://www.iq.harvard.edu/roadmap-dataverse-project Here's a screenshot: |
Embargo support has been implemented by pull request #8020 which was just merged. |
This issue is connected to/replacing issue #343.
This issue is also discussed in this Google Group posting: https://groups.google.com/forum/#!topic/dataverse-community/tw3qLvS6POo.
I would like to suggest an embargo function to be integrated in Dataverse.
In quite a few cases, researchers would like to deposit and publish metadata about their project data, but also be able to specify an embargo period, i.e. a period of time during which the data are not openly available. Thus, there should be a field "Embargo date", where one can enter the date when the data are going to be released/unrestricted.
I'm not sure whether the embargo function should be available at both dataset level and individual file level. If so, the system should show a warning message when the two settings conflict (= wrong setting). Alternatively, embargo could be specified only at file level, with the embargo information at dataset level reflecting the embargo setting(s) at file level.
It would also be useful if the system could send a message to the depositor a couple of days before the embargo date to remind him/her about it.
As for the access information in the metadata, it should be compliant with the OpenAIRE Guidelines, as discussed in this Google Group thread and in this GitHub issue. On file level, the choice between the different labels should be straightforward (closedAccess, embargoedAccess, restrictedAccess, openAccess). But I'm not sure how the labels apply on dataset level in case different restrictions are chosen for different files. If, e.g., one file is embargoed and all others are open, the access information on dataset level should probably reflect the most permissive access setting, i.e. openAccess?
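To make the dataset-level question concrete, here is a minimal sketch of the "most permissive setting wins" rule. It assumes the four labels can be ranked from least to most permissive in the order they are listed above; that ranking, and the names `ACCESS_LEVELS` and `dataset_access_level`, are illustrative assumptions and not part of Dataverse or the OpenAIRE Guidelines.

```python
# Hypothetical ranking from least to most permissive. The order follows the
# list in the text and is an assumption, not something OpenAIRE prescribes.
ACCESS_LEVELS = ["closedAccess", "embargoedAccess", "restrictedAccess", "openAccess"]


def dataset_access_level(file_levels):
    """Return the most permissive access label among a dataset's files."""
    if not file_levels:
        raise ValueError("a dataset needs at least one file-level label")
    unknown = set(file_levels) - set(ACCESS_LEVELS)
    if unknown:
        raise ValueError(f"unknown access labels: {sorted(unknown)}")
    # The label with the highest position in ACCESS_LEVELS wins.
    return max(file_levels, key=ACCESS_LEVELS.index)


print(dataset_access_level(["embargoedAccess", "openAccess", "openAccess"]))
# prints: openAccess
```

Whether "most permissive" or "most restrictive" is the better dataset-level summary is exactly the open question; switching `max` to `min` gives the opposite rule.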
When the embargo period has passed and the data are released, the access information has to be updated accordingly.
As a workaround until an embargo function is in place, we are considering using the metadata field Distribution Date (= Date that the work was made for distribution/presentation) for specifying the embargo date. We could then run a daily script to check whether any distribution date matches the current date. If yes, the curator would then unrestrict the files. (Alternatively, we could make the script also lift the restriction.) Does this sound like a feasible solution?
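The daily check could look roughly like the sketch below. It only covers the date comparison; how the dataset records are fetched and how files are unrestricted is left out. The record fields (`id`, `distribution_date`, `restricted`) and the function name are hypothetical and do not correspond to an actual Dataverse API.

```python
from datetime import date


def datasets_due_for_release(datasets, today=None):
    """Return ids of still-restricted datasets whose distribution date has arrived."""
    today = today or date.today()
    due = []
    for ds in datasets:
        raw = ds.get("distribution_date")  # hypothetical field, ISO format YYYY-MM-DD
        if not raw or not ds.get("restricted", False):
            continue  # no embargo date set, or already released
        if date.fromisoformat(raw) <= today:
            due.append(ds["id"])
    return due


# Example with made-up records and a fixed "today".
records = [
    {"id": "ds-1", "distribution_date": "2024-01-15", "restricted": True},
    {"id": "ds-2", "distribution_date": "2024-06-01", "restricted": True},
    {"id": "ds-3", "distribution_date": "2023-12-01", "restricted": False},
]
print(datasets_due_for_release(records, today=date(2024, 1, 15)))
# prints: ['ds-1']
```

The sketch compares with "on or before today" rather than an exact match, so a dataset is still picked up if the script misses a day. The resulting list could be mailed to the curator or passed to whatever step lifts the restriction.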
Best,
Philipp