This repository has been archived by the owner on May 10, 2022. It is now read-only.
While a single command to read in data is appealing, it is also true that (a) the form of data may vary a great deal, and (b) people often want to have the downloaded file on hand for other reasons. read_csv(doidata()) or doidata() %>% read_csv() are still fairly minimal and intuitive. Drawing ideas from fulltext::ft_get_si(), here's a scheme for default behavior that I think is still intuitive and drives users towards best practice in maintaining data provenance and credit while having access to downloaded files:
doidata("doi/filename") downloads data and always returns a file path to the downloaded data
The default download location (destfile) is WORKDIR/data/dai_10.123_figshare123/filename. People like to inspect data and use it for purposes beyond a single script, so hiding it away in some cache doesn't make sense.
doidata maintains an internal database of DOIs, file paths, and file hashes. If there is already a file at the location with the same name, it checks against the hash and just returns the path if they match.
If the hashes do not match, it returns an error unless overwrite=TRUE
The returned path has attributes with citation and version information, which are used to print an informative message with a citation if verbose=TRUE (default)
The internal database also keeps the citation/version/origin information, so doidata_cite(file) can return the citation/version information of a file previously downloaded by checking the file hash. doidata_cite(url), doidata_cite(doi), or doidata_cite(doi/filename) all work, too, and work offline for previously-downloaded data.
doidata_url("doi/filename") returns the download url for the data
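The download-and-check behavior proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing API: `doidata_sketch`, `registry`, the `fetch` argument (a stand-in for the real download step, so the example runs offline), and the directory naming are all hypothetical, and MD5 via `tools::md5sum()` is just one possible hash. Citation attributes and the verbose message are left out.

```r
# Hypothetical sketch only; none of these names exist in a released package.
registry <- new.env()  # stand-in for the internal database, keyed by file path

doidata_sketch <- function(id, fetch, root = "data", overwrite = FALSE) {
  # Split "doi/filename" into the DOI and the trailing file name
  parts    <- strsplit(id, "/", fixed = TRUE)[[1]]
  filename <- parts[length(parts)]
  doi      <- paste(parts[-length(parts)], collapse = "/")
  # Assumed layout: one directory per DOI, with "/" replaced by "_"
  dest     <- file.path(root, gsub("/", "_", doi, fixed = TRUE), filename)

  if (file.exists(dest)) {
    recorded <- get0(dest, envir = registry, inherits = FALSE)
    on_disk  <- unname(tools::md5sum(dest))
    # Same file as recorded: return the path without downloading again
    if (!is.null(recorded) && identical(recorded$hash, on_disk)) {
      return(dest)
    }
    # Assumption: a file with no record is treated like a hash mismatch
    if (!overwrite) {
      stop("File at ", dest, " does not match the recorded hash; ",
           "use overwrite = TRUE to replace it.")
    }
  }

  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  fetch(dest)  # the real function would download the file here
  assign(dest, list(doi = doi, hash = unname(tools::md5sum(dest))),
         envir = registry)
  dest
}

# Usage with a fake download step and a made-up DOI
fake_fetch <- function(path) writeLines(c("x,y", "1,2"), path)
path <- doidata_sketch("10.123/figshare123/example.csv", fake_fetch,
                       root = file.path(tempdir(), "data"))
```

Keying the registry by file path keeps the lookup for an existing file trivial; a real implementation would presumably also index by DOI and hash so that doidata_cite() can work from any of them.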
Another question is what the appropriate behavior should be for versioned data. For instance, Zenodo and Figshare have DOIs that always point to the latest version of the data, and separate DOIs for each version. One possibility:
When a latest-version DOI is provided, print a message/warning that includes the fixed-version DOI and suggests that the user use the versioned DOI for reproducibility.
Store the fixed-version DOI in the internal database
When the latest-version DOI no longer matches the fixed version stored in the internal database, doidata() will error unless overwrite=TRUE.
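A rough sketch of that version check, again under stated assumptions: `check_version_sketch`, the `pinned` environment (standing in for the internal database), and `resolve_version` (a caller-supplied function mapping a DOI to its fixed-version DOI) are hypothetical names, and the DOIs are made up. How a real resolver would query Zenodo or Figshare is not covered here.

```r
# Hypothetical sketch only; the resolver is supplied by the caller.
check_version_sketch <- function(doi, resolve_version, pinned,
                                 overwrite = FALSE) {
  fixed <- resolve_version(doi)
  # A fixed-version DOI resolves to itself, so there is nothing to check
  if (identical(fixed, doi)) {
    return(fixed)
  }

  message("'", doi, "' points to the latest version; consider using '",
          fixed, "' for reproducibility.")

  previous <- get0(doi, envir = pinned, inherits = FALSE)
  if (!is.null(previous) && !identical(previous, fixed) && !overwrite) {
    stop("'", doi, "' now resolves to '", fixed, "' but '", previous,
         "' was recorded earlier; use overwrite = TRUE to update.")
  }

  # Store the fixed-version DOI for later comparisons
  assign(doi, fixed, envir = pinned)
  fixed
}

# Usage with a fake resolver and a made-up DOI
pinned      <- new.env()
resolver_v1 <- function(doi) "10.123/figshare123.v1"
check_version_sketch("10.123/figshare123", resolver_v1, pinned)
```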