This repository has been archived by the owner on May 10, 2022. It is now read-only.

Should we include data from journal supplementary files? #2

Open
noamross opened this issue Jan 3, 2018 · 6 comments

Comments

@noamross
Collaborator

noamross commented Jan 3, 2018

@sckott mentioned this in datacite/freya#2

Pros:

  • A lot of data is stored this way. Including it would greatly expand the range of data made available to users and the degree to which the package could improve data linkage.

Cons:

  • We may want to discourage using journal supplementary files as a place to store data. OTOH, we really don't have much influence through this tool.
  • It would be a lot harder to do this client-side than it would be for data repositories. Repositories are limited in number, so client-side mapping of DOI to resource would require only a limited amount of custom coding. For most of them we can identify the repository, and thus the mapping, from the DOI. There are many more journals, and journals themselves aren't the relevant unit: we need to understand how DOI --> file URL maps for each publisher's platform.
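For the first hop (DOI to landing page), something generic is possible client-side: the doi.org proxy exposes a JSON Handle API that returns the URL a DOI is registered to. Below is a minimal Python sketch, assuming the `https://doi.org/api/handles/<doi>` endpoint and a response whose `values` list contains an entry with `type == "URL"`; the helper names are hypothetical. It only reaches the landing page; the publisher-specific step from landing page to file URL is still the hard part described above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# doi.org Handle proxy REST endpoint (assumed here to return JSON)
HANDLE_API = "https://doi.org/api/handles/"


def landing_url_from_record(record: dict) -> Optional[str]:
    """Return the registered URL value from a Handle API response, if any."""
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value.get("data", {}).get("value")
    return None


def resolve_doi(doi: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the Handle record for a DOI and return its landing-page URL (network call)."""
    # Keep "/" literal: the DOI suffix becomes part of the request path.
    url = HANDLE_API + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return landing_url_from_record(json.load(resp))
```

Usage would be something like `resolve_doi("10.1371/journal.pone.0198684")`, which should give the article's landing page rather than any data file.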
@sckott

sckott commented Jan 3, 2018

(P.S. fulltext has https://github.com/ropensci/fulltext/#supplementary-materials via Will Pearse, but an argument can be made for pulling that functionality out of the package into another one, here or elsewhere.)

@sckott

sckott commented Jan 3, 2018

journals themselves aren't the relevant unit

I'd think DOI prefix owners (often == publisher) are the relevant units
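Dispatching on the prefix is cheap to do client-side. A minimal sketch, assuming a hypothetical `PREFIX_OWNERS` table seeded only with the three prefixes that come up in this thread (PLOS 10.1371, figshare 10.6084, Karger 10.1159); each owner would still need its own DOI-to-file handler. Prefix and publisher are not always one-to-one (a publisher can hold several prefixes), so the table maps prefixes, not publishers.

```python
from typing import Dict, Optional

# Hypothetical lookup table; only prefixes mentioned in this thread.
PREFIX_OWNERS: Dict[str, str] = {
    "10.1371": "PLOS",
    "10.6084": "figshare",
    "10.1159": "Karger",
}


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix, i.e. everything before the first '/'."""
    doi = doi.strip()
    # Accept common DOI spellings: resolver URLs and the "doi:" scheme.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, sep, _suffix = doi.partition("/")
    if not sep or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix


def prefix_owner(doi: str) -> Optional[str]:
    """Name the owner whose handler should map this DOI to files, or None if unknown."""
    return PREFIX_OWNERS.get(doi_prefix(doi))
```

A `None` result is the signal that no custom mapping exists yet for that prefix.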

@mfenner

mfenner commented Jan 3, 2018

Figshare is hosting many (> 100k) supplementary files for publishers, so there are a lot of DataCite DOIs and metadata available for them. To take one example from today: https://doi.org/10.6084/m9.figshare.5752965.v1 is the DOI for a supplementary file to https://doi.org/10.1159/000485227 (a Karger prefix, DOI not live yet).
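Where the supplementary file has a DataCite DOI, its metadata may carry the link back to the article. A minimal sketch, assuming the DataCite REST API record at `https://api.datacite.org/dois/<doi>` with `data.attributes.relatedIdentifiers` entries, and that the depositor recorded a relation such as `IsSupplementTo`; whether a given figshare record actually includes that link is not confirmed here, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

DATACITE_API = "https://api.datacite.org/dois/"


def supplement_targets(record: dict, relation: str = "IsSupplementTo") -> List[str]:
    """List the DOIs a DataCite record says it is related to via `relation`."""
    attrs = record.get("data", {}).get("attributes", {})
    return [
        rel.get("relatedIdentifier", "")
        for rel in attrs.get("relatedIdentifiers") or []
        if rel.get("relationType") == relation
        and rel.get("relatedIdentifierType") == "DOI"
    ]


def fetch_datacite(doi: str, timeout: float = 10.0) -> dict:
    """Fetch the JSON:API record for a DataCite DOI (network call)."""
    url = DATACITE_API + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For the example above, `supplement_targets(fetch_datacite("10.6084/m9.figshare.5752965.v1"))` would list the Karger article DOI only if figshare deposited that relation.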

@charliejhadley

Hello folks!

I've got a comment about an oddity with publishers, such as PLOS ONE, that use "Figshare for publishers".

Take this article for instance: https://doi.org/10.1371/journal.pone.0198684

  1. Query the Figshare API for the collection ID: (4126502)

https://api.figshare.com/v2/collections?doi=10.1371%2Fjournal.pone.0198684

  2. Return all assets from the collection:

https://api.figshare.com/v2/collections/4126502
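The two requests can be chained. A minimal Python sketch built only on the two URLs above; it assumes the first response is a JSON list of collections whose first entry (with an `id` field) is the one for the article, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, List

FIGSHARE_API = "https://api.figshare.com/v2"


def collections_query_url(article_doi: str) -> str:
    """Step 1 URL: find Figshare collections by article DOI ('/' encoded as %2F)."""
    return f"{FIGSHARE_API}/collections?" + urllib.parse.urlencode({"doi": article_doi})


def collection_url(collection_id: int) -> str:
    """Step 2 URL: the record for one collection."""
    return f"{FIGSHARE_API}/collections/{collection_id}"


def _get_json(url: str, timeout: float = 10.0) -> Any:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def figshare_collection_for(article_doi: str) -> Any:
    """Run both steps (network calls); takes the first matching collection."""
    hits: List[dict] = _get_json(collections_query_url(article_doi))
    if not hits:
        raise LookupError(f"no Figshare collection for {article_doi}")
    return _get_json(collection_url(hits[0]["id"]))
```

Building the URLs separately from fetching keeps the mapping testable offline.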

These assets include the actual paper itself, and all figures and tables included in the paper. This is tremendously useful!

BUT

This does not return the "supporting information" file https://doi.org/10.1371/journal.pone.0198684.s001

Summary

As a user of the doidata package, I would appreciate a method for accessing ALL of these assets from a paper when the publisher uses Figshare behind the scenes.
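As a stopgap for the missing file: in this example the supporting-information DOI is the article DOI plus `.s001`. A minimal sketch, assuming (from this single example only) that PLOS numbers supporting files `.s001`, `.s002`, and so on; the `count` argument is a guess the caller supplies, and each candidate would still need to be checked by resolving it.

```python
from typing import List


def candidate_si_dois(article_doi: str, count: int) -> List[str]:
    """Guess supporting-information DOIs by appending .s001, .s002, ... (assumed pattern)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [f"{article_doi}.s{i:03d}" for i in range(1, count + 1)]


def doi_url(doi: str) -> str:
    """Turn a DOI into a resolvable doi.org link."""
    return f"https://doi.org/{doi}"
```

For example, `[doi_url(d) for d in candidate_si_dois("10.1371/journal.pone.0198684", 1)]` yields the same link as the "supporting information" URL above.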

@nuest

nuest commented Feb 25, 2019

@martinjhnhadley Do you know the package suppdata?

The suggestion by @sckott (#2 (comment)) is realised in that package, i.e. the DOI-based download from the fulltext package is its own package now: https://github.com/ropensci/suppdata

We're planning to have a hackathon around the suppdata package as part of the Mozilla Global Sprint (ropensci/suppdata#35). Maybe that is a good occasion to revive doidata?

@charliejhadley

Thanks @nuest! I wasn't aware of the suppdata package; it looks like it will definitely meet some of my requirements.
I'm not sure if my skills are developed enough to participate usefully in the development of doidata, but I'll register and give it a go.


5 participants