Embargo: I want to set an embargo period to control when my data will be accessible. #4052

Closed
philippconzett opened this issue Aug 9, 2017 · 28 comments

Comments

@philippconzett
Contributor

philippconzett commented Aug 9, 2017

This issue is connected to/replacing issue #343.
This issue is also discussed in this Google Group posting: https://groups.google.com/forum/#!topic/dataverse-community/tw3qLvS6POo.

I would like to suggest an embargo function to be integrated in Dataverse.

In quite a few cases, researchers would like to deposit and publish metadata about their project data, but also be able to specify an embargo period, i.e. a period of time during which the data are not openly available. Thus, there should be a field "Embargo date", where one can enter the date when the data are going to be released/unrestricted.

I'm not sure whether the embargo function should be available at both dataset level and individual file level. If so, the system should show a warning message when the two settings conflict (= wrong setting). Alternatively, embargo could be specified only at file level, with the embargo information at dataset level reflecting the embargo setting(s) at file level.

It would also be useful if the system could send a message to the depositor a couple of days before the embargo date to remind him/her about it.

As for the access information in the metadata, it should be compliant with the OpenAIRE Guidelines; this is discussed in this Google Group thread and in this GitHub issue. At file level, the choice between the different labels should be straightforward (closedAccess, embargoedAccess, restrictedAccess, openAccess). But I'm not sure how the labels apply at dataset level when different restrictions are chosen for different files. If, e.g., one file is embargoed and all others are open, the access information at dataset level should probably reflect the most permissive access setting, i.e. openAccess?
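
A minimal sketch of the "most permissive" rule floated above, in Python. The function name and the ordering of the four labels (in particular, ranking restrictedAccess above embargoedAccess) are assumptions made for illustration; neither comes from the OpenAIRE Guidelines nor from Dataverse.

```python
# Labels ordered from least to most permissive.
# This ordering is an assumption for the sketch, not part of any guideline.
ACCESS_ORDER = ["closedAccess", "embargoedAccess", "restrictedAccess", "openAccess"]


def dataset_access_label(file_labels):
    """Return the most permissive access label found among a dataset's files."""
    if not file_labels:
        raise ValueError("dataset has no files")
    unknown = set(file_labels) - set(ACCESS_ORDER)
    if unknown:
        raise ValueError(f"unknown access labels: {sorted(unknown)}")
    # The label with the highest position in ACCESS_ORDER wins.
    return max(file_labels, key=ACCESS_ORDER.index)


# One embargoed file, all others open: the dataset is reported as open.
print(dataset_access_label(["embargoedAccess", "openAccess", "openAccess"]))
```

Swapping `max` for `min` would give the opposite policy (dataset level reflects the most restrictive file), which may be the safer default if the label is meant to warn users.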

When the embargo period has passed and the data are released, the access information has to be updated accordingly.

As a workaround until an embargo function is in place, we are considering using the metadata field Distribution Date (= Date that the work was made for distribution/presentation) to specify the embargo date. We could then run a daily script to check whether any distribution date matches the current date. If so, the curator would then unrestrict the files. (Alternatively, we could have the script unrestrict the files as well.) Does this sound like a feasible solution?
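
The daily check could look roughly like the sketch below. It assumes the dataset records have already been fetched into plain dictionaries; the keys `id`, `distribution_date` and `restricted` are hypothetical, and no Dataverse API call is shown. One deliberate deviation from the description: it selects dates on or before today rather than exactly today, so a missed run does not leave a dataset restricted forever.

```python
from datetime import date


def datasets_due_for_release(records, today=None):
    """Return the IDs of still-restricted datasets whose distribution date has arrived."""
    if today is None:
        today = date.today()
    due = []
    for record in records:
        release_on = date.fromisoformat(record["distribution_date"])
        # "<=" instead of "==" so that a skipped daily run is caught the next day.
        if record["restricted"] and release_on <= today:
            due.append(record["id"])
    return due


# Hypothetical records; in practice these would come from the repository.
records = [
    {"id": "ds-1", "distribution_date": "2018-01-15", "restricted": True},
    {"id": "ds-2", "distribution_date": "2018-03-01", "restricted": True},
    {"id": "ds-3", "distribution_date": "2018-01-10", "restricted": False},
]
print(datasets_due_for_release(records, today=date(2018, 1, 15)))  # ['ds-1']
```

The returned list could either be mailed to the curator or fed to a step that unrestricts the files.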

Best,
Philipp

@mheppler
Contributor

mheppler commented Dec 2, 2017

This is related to/duplicate of Ability to Set a Dataset Availability Schedule #934.

@djbrooke
Contributor

djbrooke commented Dec 3, 2017

@aschenbach - I closed the previous issue you commented on (#934) in favor of this.

@philippconzett
Contributor Author

I'd like to draw your attention to the article Overly Honest Data Repository Development. Under the heading Delayed Release, a.k.a. Embargoes there is an interesting discussion on how embargo is handled in the Illinois Data Bank.

@pdurbin
Member

pdurbin commented Jan 2, 2018

@philippconzett this is a fantastic resource. Thanks! I'm glad to see they did some user research before hacking on the code. 😄

@pdurbin added the "UX & UI: Design" label (this issue needs input on the design of the UI and from the product owner) Jan 2, 2018
@nniiicc

nniiicc commented Jan 9, 2018

@pdurbin We're considering putting embargo functionality on our development roadmap... it is something QDR really wants to implement in the next year ... Are you all still working on this feature? ... If not, we will do some scoping on our end and get in touch.

@pdurbin
Member

pdurbin commented Jan 9, 2018

@nniiicc hi! I'm not aware of any immediate effort we're planning and your comment is enough for me to tag you in the "Dev Efforts by the Dataverse Community" spreadsheet I mentioned at https://groups.google.com/d/msg/dataverse-community/X2diSWYll0w/ikp1TGcfBgAJ . For now I made the status "scoping the project" and I'm sure that people would be interested in whatever you write up!

@djbrooke
Contributor

djbrooke commented Jan 9, 2018

@nniiicc - it's awesome to hear that QDR may take this on. We're happy to work with you and other community members if you'd like to lead on this.

@dlmurphy changed the title from "Embargo function" to "Ability to set an embargo period on data" Jan 11, 2018
@dlmurphy changed the title from "Ability to set an embargo period on data" to "Ability to set an embargo period for releasing data" Jan 11, 2018
@dlmurphy changed the title from "Ability to set an embargo period for releasing data" to "Embargo: I want to set an embargo period to control when my data will be accessible." Jan 11, 2018
@djbrooke
Contributor

When we're approaching this, we should consider three user stories:

  • As a researcher, I want to automatically make my dataset available when my book is published so that I don't need to remember to go into Dataverse on publication day.
  • As a publisher, I want to set when the data for a journal article will be available so that it can be part of my curation workflow.
  • As a licensed data holder, I want data that should no longer be available due to an expiring license to have its access restricted automatically so that I am not in violation of the terms.

Also to consider: when the data is no longer available, what happens? Deaccessioning? Or are we introducing a new concept? Maybe the first two would make sense to handle first as they are more similar.

@eugene-barsky

+1 for embargo feature request.

@ubc, we have been asked many times to embargo specific datasets; it would be brilliant to have a feature that does just that!

@Richardcwynne

The most typical use-case for dataset embargo is where the embargo is lifted only when the related research article is published. The author and their chosen journal don’t want to make the data public until the manuscript is published.

However, since the publication date is not known when the data are deposited, setting a specific embargo date is not possible. (Peer review takes a variable length of time, and there is no guarantee of journal acceptance).

The simplest and most pragmatic way to handle this would be to allow the author to manually lift the embargo once they are notified of the publication date by the journal.

A more automated solution would be to support an API call from publisher workflow systems (such as Editorial Manager www.ariessys.com ) to automatically lift the embargo.

@pdurbin
Member

pdurbin commented Jun 15, 2018

@Richardcwynne good points and thanks for contributing to the discussion on this during this week's Dataverse Community Meeting. It looks like there were some light notes taken in the "UX/UI Collaborative Session Notes" doc at https://docs.google.com/document/d/1JlcwuQ8WIiVzG5n9ejfIlbZFX_oekvvWU7vkhtITCqc/edit?usp=sharing (people should feel free to add more!):

  • Embargo - under “Edit” option on Dataset page
    • Set embargo “expiration date” for when data becomes available
      • Feedback: Often not a specific date. Might depend on when a paper is published and that might be unknown. Suggest a duration instead.
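
The duration suggestion in the notes could be modelled as below. This is only a sketch: the function name is hypothetical, and it assumes the duration is counted in calendar months from some known start date (for example, the day the related paper is published).

```python
import calendar
from datetime import date


def embargo_end(start, months):
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day (e.g. 31 Aug + 6 months),
    the day is clamped to the last day of the target month.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# A 24-month embargo counted from an assumed publication date.
print(embargo_end(date(2018, 6, 15), 24))  # 2020-06-15
```

Storing the duration rather than the end date means the end date can be computed only once the start event actually happens.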

@nniiicc are you still interested in helping with this effort?

@scolapasta
Contributor

@Richardcwynne I was hoping to catch you during one of the breaks, but can also leave my question / comment here.

Is there any difference between a dataset that is "embargoed indefinitely" versus just keeping the dataset unpublished?

The way I/we had been thinking is that embargo was a way to have Dataverse publish a dataset at a future time in an automated way. If it requires human (or external) interaction, wouldn't indefinite embargo just be the same as leaving the dataset unpublished? (and then having the researcher manually publish, rather than manually lift the embargo; or similarly, having the external system call the existing publish API)

@adam3smith
Contributor

I actually disagree with @Richardcwynne about his interpretation of embargo:
Data associated with a publication remains unpublished until the article goes live. That's already possible to do manually, and building it into automation workflows with more publishers, CMS's would be cool. That's not an embargo the way I understand it (and the way I think this is commonly understood).

Embargoed data is a published dataset with active DOI where data files are restricted until the end of the embargo period. This way, the data are discoverable even when the data files aren't available yet.

@scolapasta we're still working on our roadmap for the next months -- given the current status, it's not a particularly high priority, but it's definitely still on there, yes.

@Richardcwynne

@scolapasta - sorry I missed you (I cannot be there today). Agreed, an indefinite embargo and a dataset remaining unpublished indefinitely are functionally equivalent.

I don't have detailed knowledge of the Dataverse system, so the best I can offer is to describe the use case from a journal point of view. Typically the journal wants access to the dataset to be embargoed until the publication date of the related manuscript. (However, deposit prior to manuscript publication is very beneficial since it generates a DOI that can be incorporated into the body of the yet-to-be-published manuscript.)

To help distinguish the use case in the UI, maybe you just add an embargo option called something like "Embargo until manuscript publication". Even if this is functionally equivalent to indefinite embargo it might help reassure and communicate the workflow to the researcher.

Hope this helps. Drop me a note on LinkedIn if you'd like to set up a call to discuss in more detail.

@TaniaSchlatter
Member

TaniaSchlatter commented Jun 20, 2018

After internal discussion and taking community input into account (thanks for the discussion!), we are defining embargo as the system publishing a draft dataset version on a date, and plan on moving forward with the following as an initial step:

  • Unpublished (draft) until a date (of publication)
  • DOI is reserved at the time of draft creation
  • We don’t plan to support the indefinite embargo now because it is handled by our existing publishing workflow
  • Draft can be created with or without files and still create the embargo (as now – no change)
  • The embargo ends, a version is published (as now – no change)

To do:

  • Define how the date of embargo end/state is displayed (banner at top is current convention)
  • Define and provide language and ability to change the embargo date

@adam3smith
Contributor

I don't quite understand how regular users would be able to see the embargoed data (landing page) if the embargo is implemented this way. Am I missing something?

In my understanding (and I'm pretty sure that's how this is commonly understood in the data community), embargoed data has visible metadata & landing pages and a published (not reserved) DOI with <date dateType="available"> pointing to the end date of the embargo.
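
For illustration, the date element mentioned above could be generated like this with Python's standard library. The function name is hypothetical, and the snippet leaves out the XML namespace and the rest of the DataCite record; note that the DataCite schema's controlled list spells the value `Available`.

```python
import xml.etree.ElementTree as ET
from datetime import date


def available_date_element(embargo_end):
    """Build a <dates> fragment whose 'Available' date is the embargo end date."""
    dates = ET.Element("dates")
    entry = ET.SubElement(dates, "date", {"dateType": "Available"})
    entry.text = embargo_end.isoformat()
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(dates, encoding="unicode")


print(available_date_element(date(2020, 6, 15)))
```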

@jggautier
Contributor

jggautier commented Jun 20, 2018

Hi @adam3smith. Thanks for adding this! Could you include some reasons why it's important that the metadata is published (even though file publication is delayed)? I've read that it's helpful for the public to know that the dataset exists, even though the files won't be available. But @TaniaSchlatter, would some concrete examples of why help? Perhaps it would have an effect similar to pre-registration?

@scolapasta
Contributor

scolapasta commented Jun 20, 2018

@adam3smith For your use case, the steps would be:

  1. Create Dataset and restrict all files
  2. Publish as v1.0 and the metadata and landing pages would be public.
  3. Create a new draft and unrestrict the files
  4. Embargo this version until the end date.
  5. That version (either 1.1 or 2.0) is automatically published on that date and the files are now also public.

Thoughts?
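
The five steps above can be walked through with a small in-memory model. This is a sketch only: the `Version` and `Dataset` classes and their fields are hypothetical and do not correspond to Dataverse's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Version:
    number: str
    files_restricted: bool
    published: bool = False
    embargo_until: Optional[date] = None  # set only on embargoed drafts


@dataclass
class Dataset:
    versions: List[Version] = field(default_factory=list)

    def publish_due_versions(self, today):
        """Publish every embargoed draft whose embargo date has been reached."""
        for version in self.versions:
            due = version.embargo_until is not None and version.embargo_until <= today
            if not version.published and due:
                version.published = True

    def files_public(self):
        """Files are public if the latest published version leaves them unrestricted."""
        published = [v for v in self.versions if v.published]
        return bool(published) and not published[-1].files_restricted


ds = Dataset()
# Steps 1-2: publish v1.0 with all files restricted (metadata is public).
ds.versions.append(Version("1.0", files_restricted=True, published=True))
# Steps 3-4: new draft with unrestricted files, embargoed until the end date.
ds.versions.append(Version("2.0", files_restricted=False, embargo_until=date(2020, 1, 1)))

ds.publish_due_versions(date(2019, 12, 31))
print(ds.files_public())  # False: the embargo has not ended yet
# Step 5: on the end date the draft is published automatically.
ds.publish_due_versions(date(2020, 1, 1))
print(ds.files_public())  # True
```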

@adam3smith
Contributor

Thanks @scolapasta -- I think that should work. Do I understand correctly that v. 1.0 would then clearly indicate the existing embargo, as part of @TaniaSchlatter's to-dos? Having the date the embargo runs out clearly visible to the public is crucial (see below).

@jggautier The most obvious case is replication data with a two-year first-use period after publication. In that case you would want to be able to see that replication data was deposited, and CrossRef should be able to link to the data when the journal article is published. People would then be able to see that such data exists, roughly what it contains (i.e. metadata), and when they'll be able to access it.
Similar use cases apply where embargoes are chosen because of, e.g., human participant concerns, for instance until a given government leaves office.

@mheppler
Contributor

@adam3smith is there a use case for supporting "Request Access" to these restricted files in a dataset with the embargo set or would that feature be disabled?

@RightInTwo
Contributor

@scolapasta In that case it is also possible to publish some documentation files with v1.0, which is very nice. The only problem I see is that this workflow isn't explained through the GUI (right?)

@adam3smith
Contributor

@mheppler -- I don't know. I'd say typically no -- embargo is embargo, not conditional access -- but I'm sure there's some use case. I wouldn't think it'd be important enough to put much effort into fulfilling it, but I wouldn't rule it out if it's easy, I guess?

@philippconzett
Contributor Author

Sorry for my late reply. Embargo should definitely be implemented as a functionality for PUBLISHED datasets, cf. my original request: "In quite a few cases, researchers would like to deposit and publish metadata about their project data, but also be able to specify an embargo period ..."

@philippconzett
Contributor Author

I'm not sure how far the implementation of the embargo feature has come, but I'd like to add some more thoughts about the definition of metadata about embargoed datasets.

According to the current version of the DataCite Metadata Schema (4.1), the following applies for the DataCite metadata field PublicationYear:

If an embargo period has been in effect, use the date when the embargo period ends. In the case of datasets, "publish" is understood to mean making the data available on a specific date to the community of researchers.

@jggautier pointed out the following in the Dataverse Users Community Google Group:

But if the embargo release date is the day when the files become unrestricted, then why doesn’t Dataverse do this now? That is, why doesn’t Dataverse use the year in which the files become unrestricted as the publicationyear? It’s because when depositors hit publish, Dataverse has to send DataCite a publicationyear, and depositors have no way to indicate when the files will become unrestricted (until there’s an embargo feature).
For datasets where an embargo is set, if Dataverse sends the embargo release date to DataCite as the publicationyear, then in some cases the publicationyear that DataCite has will be different than the publication year in Dataverse’s dataset citation… unless the publication date in the citation changes to the embargo release date.

As far as DataCite is concerned, I think we could solve this problem using file level DOIs that are now available in Dataverse in the following way:

  • PublicationYear at dataset level should be based on the date when the dataset (that is the metadata record) was published, independent of whether some or all of its files are embargoed.
  • PublicationYear at file level should be based on the embargo release date of the file at stake, even if this implies a future year. If a future year at file level means that the metadata of this file won't be harvested by DataCite, I think this is OK, or even intended, as long as the dataset (metadata record) is harvested (resourceType for dataset files #5086 might be of interest here).
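
The two rules above could be expressed as follows. This is a sketch with hypothetical function names; taking the later of the two dates for a file is an added assumption, meant to cover an embargo date that lies before the dataset's own publication.

```python
from datetime import date


def dataset_publication_year(metadata_published_on):
    """Dataset level: the year the metadata record was published."""
    return metadata_published_on.year


def file_publication_year(metadata_published_on, embargo_release_on=None):
    """File level: the embargo release year, or the dataset year if not embargoed."""
    if embargo_release_on is None:
        return metadata_published_on.year
    # Assumption: a file cannot become available before the dataset is published.
    return max(metadata_published_on, embargo_release_on).year


published = date(2018, 11, 5)
print(dataset_publication_year(published))                  # 2018
print(file_publication_year(published, date(2020, 11, 5)))  # 2020
```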

Best,
Philipp

@pdurbin
Member

pdurbin commented Aug 6, 2019

I learned in #5920 that OpenAIRE has an "embargoedAccess" access level: https://guidelines.openaire.eu/en/latest/literature/field_accesslevel.html

The definition of embargoedAccess is the following from https://wiki.surfnet.nl/display/standards/info-eu-repo/#info-eu-repo-AccessRights

"This type of access means the resource is Closed Access, until released for Open Access on a certain date. (this date can be specified elsewhere in the metadata) The incentives to embargo a resource has often to do with the fairness towards publisher to provide them an advantage as the exclusive first deliverers of knowledge. Also embargoes are used in sections of a Ph.D. thesis that are concerned with classified corporate information."

@tjouneau

Hi
Is there any update on this requested feature?
We're about to set up a benchmark of different repository solutions, including of course Dataverse. Embargoes would be listed as a "required" feature, given the many datasets that would be affected.
Any developments?
Thanks!

Thomas

@pdurbin
Member

pdurbin commented Dec 11, 2019

@tjouneau hi! This issue is linked from the "Future" section of our roadmap, which you can find at https://www.iq.harvard.edu/roadmap-dataverse-project

Here's a screenshot:

[Screenshot of the roadmap, taken 2019-12-11]

@pdurbin
Member

pdurbin commented Oct 26, 2021

Embargo support has been implemented by pull request #8020 which was just merged.
