Embargo: I want to set an embargo period to control when my data will be accessible. #4052

philippconzett · 2017-08-09T08:09:35Z

This issue is connected to/replacing issue #343.
This issue is also discussed in this Google Group posting: https://groups.google.com/forum/#!topic/dataverse-community/tw3qLvS6POo.

I would like to suggest an embargo function to be integrated in Dataverse.

In quite a few cases, researchers would like to deposit and publish metadata about their project data, but also be able to specify an embargo period, i.e. a period of time during which the data are not openly available. Thus, there should be a field "Embargo date", where one can enter the date when the data are going to be released/unrestricted.

I'm not sure whether the embargo function should be available at both dataset level and individual file level. If so, the function should also show a warning message in case of a conflict (i.e., an inconsistent setting). Alternatively, embargoes could be specified only at file level, with the embargo information at dataset level reflecting the file-level setting(s).

It would also be useful if the system could send a message to the depositor a couple of days before the embargo date to remind him/her about it.
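A reminder like this could come from a small job run once a day. The sketch below is only an illustration. It assumes embargoes are available as (identifier, end date) pairs, and the three-day lead time and the `reminders_due` name are hypothetical. Sending the message itself is left out.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# Hypothetical lead time: remind depositors this many days before the embargo ends.
REMINDER_LEAD_DAYS = 3


def reminders_due(embargoes: Iterable[Tuple[str, date]], today: date,
                  lead_days: int = REMINDER_LEAD_DAYS) -> List[str]:
    """Return identifiers whose embargo ends exactly `lead_days` after `today`.

    `embargoes` holds (identifier, embargo end date) pairs; where they come
    from (database query, API call) is left open in this sketch.
    """
    target = today + timedelta(days=lead_days)
    return [identifier for identifier, end in embargoes if end == target]


if __name__ == "__main__":
    sample = [("dataset-a", date(2018, 1, 12)), ("dataset-b", date(2018, 2, 1))]
    # Run once per day; sending the actual e-mail is out of scope here.
    print(reminders_due(sample, today=date(2018, 1, 9)))  # ['dataset-a']
```

Matching the date exactly keeps the logic simple, but a day on which the job does not run skips that reminder. Recording which reminders were sent and comparing with `<=` would be more robust.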

As for the access information in the metadata, it should comply with the OpenAIRE Guidelines, as discussed in this Google Group thread and in this GitHub issue. At file level, the choice between the different labels (closedAccess, embargoedAccess, restrictedAccess, openAccess) should be straightforward. But I'm not sure how the labels apply at dataset level when different restrictions are chosen for different files. If, e.g., one file is embargoed and all others are open, the access information at dataset level should probably reflect the most permissive access setting, i.e. openAccess?

When the embargo period has passed and the data are released, the access information has to be updated accordingly.
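One way to keep the labels up to date is to derive them from each file's settings whenever they are needed, rather than storing them. The sketch assumes a file is either closed, embargoed until a date, restricted, or open. The numeric ordering used to pick the "most permissive" dataset-level label, and the CLOSED default for a dataset without files, are assumptions of this sketch, not rules from the OpenAIRE Guidelines.

```python
from datetime import date
from enum import IntEnum
from typing import Iterable, Optional


class AccessLevel(IntEnum):
    """OpenAIRE access labels, ordered from least to most permissive.

    ASSUMPTION: this ordering is chosen for the sketch; it is not defined
    by the OpenAIRE Guidelines.
    """
    CLOSED = 0
    EMBARGOED = 1
    RESTRICTED = 2
    OPEN = 3


LABELS = {
    AccessLevel.CLOSED: "closedAccess",
    AccessLevel.EMBARGOED: "embargoedAccess",
    AccessLevel.RESTRICTED: "restrictedAccess",
    AccessLevel.OPEN: "openAccess",
}


def file_access(today: date, embargo_end: Optional[date] = None,
                restricted: bool = False, closed: bool = False) -> AccessLevel:
    """Access level of one file on `today`; re-evaluating it each day lets the
    label change automatically once the embargo has passed."""
    if closed:
        return AccessLevel.CLOSED
    if embargo_end is not None and today < embargo_end:
        return AccessLevel.EMBARGOED
    return AccessLevel.RESTRICTED if restricted else AccessLevel.OPEN


def dataset_access(file_levels: Iterable[AccessLevel]) -> AccessLevel:
    """Dataset level = most permissive file level (CLOSED if there are no
    files, another assumption of this sketch)."""
    return max(file_levels, default=AccessLevel.CLOSED)


if __name__ == "__main__":
    end = date(2020, 1, 1)
    for day in (date(2019, 12, 31), date(2020, 1, 1)):
        files = [file_access(day, embargo_end=end), file_access(day, restricted=True)]
        print(day, [LABELS[f] for f in files], LABELS[dataset_access(files)])
```

Run on 2019-12-31, the first file is labeled embargoedAccess and the dataset restrictedAccess; from 2020-01-01 the first file, and therefore the dataset, becomes openAccess without a manual update.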

As a workaround until an embargo function is in place, we are considering using the metadata field Distribution Date (= Date that the work was made for distribution/presentation) to specify the embargo date. We could then run a daily script to check whether any distribution date matches the current date. If so, the curator would unrestrict the files. (Alternatively, the script itself could unrestrict the files.) Does this sound like a feasible solution?
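The daily check described above could look roughly like this. It assumes the Distribution Date and the restriction status of each dataset have already been retrieved (how is left open), and `release` is a hypothetical hook standing in for either notifying the curator or unrestricting the files. One deliberate change from "matches the current date": the sketch releases anything whose date is on or before today, so a missed run does not leave files restricted.

```python
from datetime import date
from typing import Callable, Iterable, List, NamedTuple, Optional


class Record(NamedTuple):
    identifier: str
    distribution_date: Optional[date]  # Distribution Date, reused as the embargo date
    restricted: bool                   # are the files still restricted?


def due_for_release(records: Iterable[Record], today: date) -> List[str]:
    """Identifiers of still-restricted datasets whose Distribution Date has arrived.

    Uses "on or before today" instead of an exact match, so a day on which
    the script did not run is picked up on the next run.
    """
    return [
        r.identifier
        for r in records
        if r.restricted and r.distribution_date is not None and r.distribution_date <= today
    ]


def run_daily(records: Iterable[Record], today: date,
              release: Callable[[str], None]) -> List[str]:
    """Hand each due dataset to `release`, a hypothetical hook that either
    notifies a curator or unrestricts the files itself."""
    due = due_for_release(records, today)
    for identifier in due:
        release(identifier)
    return due


if __name__ == "__main__":
    records = [
        Record("dataset-a", date(2017, 9, 1), restricted=True),
        Record("dataset-b", date(2017, 9, 1), restricted=False),  # already released
        Record("dataset-c", date(2018, 3, 1), restricted=True),   # still embargoed
        Record("dataset-d", None, restricted=True),               # no date set
    ]
    run_daily(records, date(2017, 9, 2), release=lambda i: print("release", i))
```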

Best,
Philipp

mheppler · 2017-12-02T02:39:27Z

This is related to/duplicate of Ability to Set a Dataset Availability Schedule #934.

djbrooke · 2017-12-03T23:09:23Z

@aschenbach - I closed the previous issue you commented on (#934) in favor of this.

philippconzett · 2017-12-29T07:59:30Z

I'd like to draw your attention to the article "Overly Honest Data Repository Development". Under the heading "Delayed Release, a.k.a. Embargoes", there is an interesting discussion of how embargoes are handled in the Illinois Data Bank.

pdurbin · 2018-01-02T20:57:00Z

@philippconzett this is a fantastic resource. Thanks! I'm glad to see they did some user research before hacking on the code. 😄

nniiicc · 2018-01-09T17:48:15Z

@pdurbin We're considering putting embargo functionality on our development roadmap... it is something QDR really wants to implement in the next year ... Are you all still working on this feature? ... If not, we will do some scoping on our end and get in touch.

pdurbin · 2018-01-09T18:00:31Z

@nniiicc hi! I'm not aware of any immediate effort we're planning, and your comment is enough for me to tag you in the "Dev Efforts by the Dataverse Community" spreadsheet I mentioned at https://groups.google.com/d/msg/dataverse-community/X2diSWYll0w/ikp1TGcfBgAJ . For now I set the status to "scoping the project", and I'm sure people would be interested in whatever you write up!

djbrooke · 2018-01-09T20:17:38Z

@nniiicc - it's awesome to hear that QDR may take this on. We're happy to work with you and other community members if you'd like to lead on this.

djbrooke · 2018-01-16T23:52:05Z

When we're approaching this, we should consider three user stories:

- As a researcher, I want to automatically make my dataset available when my book is published so that I don't need to remember to go into Dataverse on publication day.
- As a publisher, I want to set when the data for a journal article will be available so that it can be part of my curation workflow.
- As a licensed data holder, I want data that should no longer be available due to an expiring license to have its access restricted automatically so that I am not in violation of the terms.

Also to consider: when the data is no longer available, what happens? Deaccessioning? Or are we introducing a new concept? Maybe the first two would make sense to handle first as they are more similar.

eugene-barsky · 2018-04-13T19:40:44Z

+1 for embargo feature request.

@ubc, we have been asked many times to embargo specific datasets; a feature that does just that would be brilliant!

Richardcwynne · 2018-06-15T11:36:57Z

The most typical use-case for dataset embargo is where the embargo is lifted only when the related research article is published. The author and their chosen journal don’t want to make the data public until the manuscript is published.

However, since the publication date is not known when the data are deposited, setting a specific embargo date is not possible. (Peer review takes a variable length of time, and there is no guarantee of journal acceptance).

The simplest and most pragmatic way to handle this would be to allow the author to manually lift the embargo once they are notified of the publication date by the journal.

A more automated solution would be to support an API call from publisher workflow systems (such as Editorial Manager, www.ariessys.com) to lift the embargo automatically.

pdurbin · 2018-06-15T13:37:59Z

@Richardcwynne good points and thanks for contributing to the discussion on this during this week's Dataverse Community Meeting. It looks like there were some light notes taken in the "UX/UI Collaborative Session Notes" doc at https://docs.google.com/document/d/1JlcwuQ8WIiVzG5n9ejfIlbZFX_oekvvWU7vkhtITCqc/edit?usp=sharing (people should feel free to add more!):

Embargo - under “Edit” option on Dataset page
- Set embargo “expiration date” for when data becomes available
  - Feedback: Often not a specific date. Might depend on when a paper is published and that might be unknown. Suggest a duration instead.
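If the embargo is given as a duration, the end date can only be computed once the starting event (for example, the paper's publication) is known. A minimal sketch, assuming durations in whole months and clamping to the end of shorter months; the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Move `start` forward by whole months, clamping to the month's last day
    (for example, 31 January + 1 month -> 28 or 29 February)."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_end(trigger_date: date, duration_months: int) -> date:
    """End of an embargo given as a duration counted from `trigger_date`,
    e.g. the day the related paper is published."""
    if duration_months < 0:
        raise ValueError("duration_months must not be negative")
    return add_months(trigger_date, duration_months)


if __name__ == "__main__":
    # A 24-month embargo counted from a (hypothetical) publication date.
    print(embargo_end(date(2018, 6, 15), 24))  # 2020-06-15
```

Storing the duration and computing the date only when the trigger event is recorded lets the depositor set the embargo at deposit time, even though the actual end date is still unknown.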

@nniiicc are you still interested in helping with this effort?

scolapasta · 2018-06-15T14:09:11Z

@Richardcwynne I was hoping to catch you during one of the breaks, but can also leave my question / comment here.

Is there any difference between a dataset that is "embargoed indefinitely" versus just keeping the dataset unpublished?

The way I/we had been thinking is that embargo was a way to have Dataverse publish a dataset at a future time in an automated way. If it requires human (or external) interaction, wouldn't indefinite embargo just be the same as leaving the dataset unpublished? (and then having the researcher manually publish, rather than manually lift the embargo; or similarly, having the external system call existing publish API)
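For the "external system calls the publish API" route, the request could be built as below. The endpoint, the `persistentId` and `type` query parameters, and the `X-Dataverse-key` header reflect our reading of the publish-dataset call in the Dataverse native API guide and should be checked against the guide for the installed version. The server URL, DOI, and token are placeholders.

```python
import urllib.parse
import urllib.request


def build_publish_request(server_url: str, persistent_id: str, api_token: str,
                          version_type: str = "major") -> urllib.request.Request:
    """Build, but do not send, a POST that publishes the draft of a dataset.

    Path, query parameters and the X-Dataverse-key header follow the
    "publish a dataset" call in the Dataverse native API guide; verify them
    against the guide for your installation's version.
    """
    if version_type not in ("major", "minor"):  # this sketch only allows these two
        raise ValueError("version_type must be 'major' or 'minor'")
    query = urllib.parse.urlencode({"persistentId": persistent_id, "type": version_type})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/actions/:publish?{query}"
    return urllib.request.Request(url, method="POST",
                                  headers={"X-Dataverse-key": api_token})


if __name__ == "__main__":
    req = build_publish_request("https://dataverse.example.edu",
                                "doi:10.5072/FK2/EXAMPLE", "YOUR-API-TOKEN")
    print(req.get_method(), req.full_url)
    # An external system would then send it, e.g. urllib.request.urlopen(req).
```

A publisher's workflow system would send this request once the article goes live, which is the "manually publish" step done by a machine instead of a person.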

adam3smith · 2018-06-15T14:16:55Z

I actually disagree with @Richardcwynne about his interpretation of embargo:
Data associated with a publication remains unpublished until the article goes live. That's already possible to do manually, and building it into automated workflows with more publishers and CMSs would be cool. But that's not an embargo as I understand it (and as I think it is commonly understood).

Embargoed data is a published dataset with an active DOI whose data files are restricted until the end of the embargo period. This way, the data are discoverable even when the data files aren't available yet.

@scolapasta we're still working on our roadmap for the next months -- given the current status, it's not a particularly high priority, but it's definitely still on there, yes.

Richardcwynne · 2018-06-15T14:42:18Z

@scolapasta - sorry I missed you (I cannot be there today). Agreed, an indefinite embargo and a dataset remaining unpublished indefinitely are functionally equivalent.

I don't have detailed knowledge of the Dataverse system, so the best I can offer is to describe the use case from a journal's point of view. Typically the journal wants access to the dataset to be embargoed until the publication date of the related manuscript. (However, depositing before the manuscript is published is very beneficial, since it generates a DOI that can be incorporated into the body of the yet-to-be-published manuscript.)

To help distinguish the use case in the UI, maybe you could just add an embargo option called something like "Embargo until manuscript publication". Even if this is functionally equivalent to an indefinite embargo, it might help reassure the researcher and communicate the workflow.

Hope this helps. Drop me a note on LinkedIn if you'd like to set up a call to discuss in more detail.

TaniaSchlatter · 2018-06-20T18:01:33Z

After internal discussion and taking community input into account (thanks for the discussion!), we are defining embargo as the system publishing a draft dataset version on a date, and plan on moving forward with the following as an initial step:

- Unpublished (draft) until a date (of publication)
- DOI is reserved at the time of draft creation
- We don’t plan to support the indefinite embargo now because it is handled by our existing publishing workflow
- Draft can be created with or without files and still create the embargo (as now – no change)
- The embargo ends, a version is published (as now – no change)

To do:

- Define how the date of embargo end/state is displayed (banner at top is current convention)
- Define and provide language and ability to change the embargo date

adam3smith · 2018-06-20T20:21:36Z

I don't quite understand how regular users would be able to see the embargoed data (landing page) if the embargo is implemented this way. Am I missing something?

In my understanding (and I'm pretty sure that's how this is commonly understood in the data community), embargoed data has visible metadata and landing pages and a published (not reserved) DOI with `<date dateType="Available">` pointing to the end date of the embargo.
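As an illustration of the DataCite side, here is a sketch that emits a `<dates>` element with an `Issued` date for the published record and an `Available` date for the end of the embargo. It uses only Python's standard `xml.etree.ElementTree`; the DataCite namespace and the rest of the record are omitted, and pairing `Issued` with `Available` this way is our assumption rather than something prescribed in this thread.

```python
import xml.etree.ElementTree as ET
from datetime import date


def datacite_dates(issued: date, available: date) -> ET.Element:
    """Build a DataCite <dates> fragment for an embargoed dataset.

    "Issued" records when the (metadata) record was published and
    "Available" when the embargo ends. The namespace and the rest of the
    DataCite record are omitted to keep the sketch short.
    """
    dates = ET.Element("dates")
    for date_type, value in (("Issued", issued), ("Available", available)):
        ET.SubElement(dates, "date", {"dateType": date_type}).text = value.isoformat()
    return dates


if __name__ == "__main__":
    xml = ET.tostring(datacite_dates(date(2018, 6, 20), date(2020, 6, 20)), encoding="unicode")
    print(xml)
```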

jggautier · 2018-06-20T20:32:24Z

Hi @adam3smith. Thanks for adding this! Could you include some reasons why it's important that the metadata is published (even though file publication is delayed)? I've read that it's helpful for the public to know that the dataset exists, even though the files won't be available yet. But @TaniaSchlatter, would some concrete examples of why help? Perhaps it would have an effect similar to pre-registration?

scolapasta · 2018-06-20T20:33:54Z

@adam3smith For your use case, the steps would be:

- Create Dataset and restrict all files
- Publish as v1.0 and the metadata and landing pages would be public.
- Create a new draft and unrestrict the files
- Embargo this version until the end date.
- That version (either 1.1 or 2.0) is automatically published on that date and the files are now also public.
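A toy model of these steps, to make the version states explicit. It is not how Dataverse stores versions: the class and field names are hypothetical, and "publishing" is reduced to flipping a flag when the scheduled date arrives.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Version:
    number: str                        # version label, e.g. "1.0" or "2.0"
    files_restricted: bool
    publish_on: Optional[date] = None  # scheduled ("embargoed") publication date
    published: bool = False


@dataclass
class Dataset:
    versions: List[Version] = field(default_factory=list)

    def run_scheduler(self, today: date) -> None:
        """Publish every unpublished version whose scheduled date has arrived."""
        for v in self.versions:
            if not v.published and v.publish_on is not None and v.publish_on <= today:
                v.published = True

    def public_version(self) -> Optional[Version]:
        """The version the public sees: the most recent published one."""
        published = [v for v in self.versions if v.published]
        return published[-1] if published else None


if __name__ == "__main__":
    ds = Dataset()
    # Steps 1-2: restrict all files and publish v1.0; metadata is now public.
    ds.versions.append(Version("1.0", files_restricted=True, published=True))
    # Steps 3-4: new draft with unrestricted files, scheduled for the end date.
    ds.versions.append(Version("2.0", files_restricted=False, publish_on=date(2020, 6, 20)))
    for day in (date(2019, 1, 1), date(2020, 6, 20)):
        ds.run_scheduler(day)
        v = ds.public_version()
        print(day, v.number, "restricted" if v.files_restricted else "open")
```

Before the end date the public version is 1.0 with restricted files; from the end date on it is 2.0 with open files, while the metadata stays visible throughout.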

Thoughts?

adam3smith · 2018-06-20T20:43:20Z

Thanks @scolapasta -- I think that should work. Do I understand correctly that v1.0 would then clearly indicate the existing embargo, as part of @TaniaSchlatter's to-dos? Having the date the embargo runs out clearly visible to the public is crucial (see below).

@jggautier The most obvious case is replication data with a two-year first-use period after publication. In that case you would want to be able to see that replication data was deposited, and Crossref should be able to link to the data when the journal article is published. People would then be able to see that such data exists, roughly what it contains (i.e. metadata), and when they'll be able to access it.
Similar use cases apply where embargoes are chosen because of, e.g., human participant concerns, for example until a given government leaves office.

mheppler · 2018-06-20T20:52:24Z

@adam3smith is there a use case for supporting "Request Access" to these restricted files in a dataset with the embargo set or would that feature be disabled?

RightInTwo · 2018-06-20T20:52:37Z

@scolapasta In that case it is also possible to publish some documentation files with v1.0, which is very nice. The only problem I see is that this workflow isn't explained through the GUI (right?)

adam3smith · 2018-06-20T20:56:11Z

@mheppler -- I don't know. I'd say typically no -- embargo is embargo, not conditional access -- but I'm sure there's some use case. I wouldn't think it'd be important enough to put much effort into fulfilling it, but I wouldn't rule it out if it's easy, I guess?

philippconzett · 2018-06-21T08:40:12Z

Sorry for my late reply. Embargo should definitely be implemented as a functionality for PUBLISHED datasets, cf. my original request: "In quite a few cases, researchers would like to deposit and publish metadata about their project data, but also be able to specify an embargo period ..."

philippconzett · 2018-10-06T13:49:16Z

I'm not sure how far the implementation of the embargo feature has come, but I'd like to add some more thoughts about the definition of metadata about embargoed datasets.

According to the current version of the DataCite Metadata Schema (4.1), the following applies to the DataCite metadata field PublicationYear:

> If an embargo period has been in effect, use the date when the embargo period ends. In the case of datasets, "publish" is understood to mean making the data available on a specific date to the community of researchers.

@jggautier pointed out the following in the Dataverse Users Community Google Group:

> But if the embargo release date is the day when the files become unrestricted, then why doesn’t Dataverse do this now? That is, why doesn’t Dataverse use the year in which the files become unrestricted as the publicationyear? It’s because when depositors hit publish, Dataverse has to send DataCite a publicationyear, and depositors have no way to indicate when the files will become unrestricted (until there’s an embargo feature).
>
> For datasets where an embargo is set, if Dataverse sends the embargo release date to DataCite as the publicationyear, then in some cases the publicationyear that DataCite has will be different than the publication year in Dataverse’s dataset citation… unless the publication date in the citation changes to the embargo release date.

As far as DataCite is concerned, I think we could solve this problem using file level DOIs that are now available in Dataverse in the following way:

- PublicationYear at dataset level should be based on the date when the dataset (that is the metadata record) was published, independent of whether some or all of its files are embargoed.
- PublicationYear at file level should be based on the embargo release date of the file at stake, even if this implies a future year. If a future year at file level means that the metadata of this file won't be harvested by DataCite, I think this is OK, or even intended, as long as the dataset (metadata record) is harvested (resourceType for dataset files #5086 might be of interest here).
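The proposed rule can be stated as two small functions. This is a sketch of the proposal only, not of what Dataverse actually sends to DataCite; the names and the example file list are hypothetical.

```python
from datetime import date
from typing import Dict, Optional


def dataset_publication_year(metadata_published: date) -> int:
    """Dataset level: the year the metadata record was published,
    whether or not any of its files are embargoed."""
    return metadata_published.year


def file_publication_year(metadata_published: date, embargo_end: Optional[date]) -> int:
    """File level: the year the file's embargo ends, even if that is in the
    future; files without an embargo share the dataset's year."""
    if embargo_end is not None and embargo_end > metadata_published:
        return embargo_end.year
    return metadata_published.year


if __name__ == "__main__":
    published = date(2018, 10, 6)
    files: Dict[str, Optional[date]] = {"data.csv": date(2020, 10, 6), "codebook.pdf": None}
    print("dataset", dataset_publication_year(published))
    for name, end in files.items():
        print(name, file_publication_year(published, end))
```

Keeping the dataset-level year tied to the metadata record means the citation year and the DataCite PublicationYear agree for the dataset, while the future year appears only on the embargoed file records.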

Best,
Philipp

pdurbin · 2019-08-06T17:44:16Z

I learned in #5920 that OpenAIRE has an "embargoedAccess" access level: https://guidelines.openaire.eu/en/latest/literature/field_accesslevel.html

The definition of embargoedAccess is the following from https://wiki.surfnet.nl/display/standards/info-eu-repo/#info-eu-repo-AccessRights

"This type of access means the resource is Closed Access, until released for Open Access on a certain date. (this date can be specified elsewhere in the metadata) The incentives to embargo a resource has often to do with the fairness towards publisher to provide them an advantage as the exclusive first deliverers of knowledge. Also embargoes are used in sections of a Ph.D. thesis that are concerned with classified corporate information."

tjouneau · 2019-12-11T11:38:50Z

Hi
Is there any update on this requested feature?
We're about to benchmark different repository solutions, including of course Dataverse. Embargoes are a feature we would list as "required", given the numerous datasets that would be concerned.
Any developments?
Thanks!

Thomas

pdurbin · 2019-12-11T12:19:51Z

@tjouneau hi! This issue is linked from the "Future" section of our roadmap, which you can find at https://www.iq.harvard.edu/roadmap-dataverse-project

Here's a screenshot:

pdurbin · 2021-10-26T19:01:58Z

Embargo support has been implemented by pull request #8020 which was just merged.

philippconzett mentioned this issue Aug 9, 2017

Embargo Period for Releasing Datasets & Dataverses #343 (Closed)

djbrooke added the Status: Backlog label Aug 30, 2017

pdurbin mentioned this issue Dec 4, 2017

Ability to Set a Dataset Availability Schedule #934 (Closed)

pdurbin added the UX & UI: Design label Jan 2, 2018

dlmurphy changed the title ~~Embargo function~~ Ability to set an embargo period on data Jan 11, 2018

dlmurphy changed the title ~~Ability to set an embargo period on data~~ Ability to set an embargo period for releasing data Jan 11, 2018

dlmurphy changed the title ~~Ability to set an embargo period for releasing data~~ Embargo: I want to set an embargo period to control when my data will be accessible. Jan 11, 2018

djbrooke added the ready for estimation label Aug 6, 2018

dlmurphy removed their assignment Aug 14, 2018

djbrooke removed the ready for estimation label Aug 15, 2018

jggautier added the Feature: Publishing & Versions, Feature: Metadata, and Feature: File Upload & Handling labels Aug 31, 2018

djbrooke added Design Team and removed Status: Backlog labels Sep 14, 2018

mheppler mentioned this issue Oct 11, 2018

Dataset - File UI Improvements #3404 (Closed)

pdurbin added Design Team and removed the Design Team, Feature: File Upload & Handling, Feature: Metadata, and UX & UI: Design labels Oct 13, 2018

djbrooke removed the Status: Design Team label May 22, 2019

TaniaSchlatter mentioned this issue Mar 30, 2021

Enable data depositors to set an embargo for their dataset #7743 (Closed)

djbrooke closed this as completed Oct 26, 2021

jggautier mentioned this issue Mar 8, 2022

Unresolved feedback from community review of Citation metadata fields #8467 (Open)

philippconzett added this to DataverseNO Jul 10, 2024

philippconzett moved this to Closed in DataverseNO Jul 10, 2024
