-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DataCite API #28
Add DataCite API #28
Conversation
I don't think it is actually possible to get a download URL out of datacite. Take a look at datacite/freya#2 Right now I think our go is to provide 95% of the registration block, Editting the generated code is already part of our normal usage anyway, as they likely want to change the datadep name and probably edit the message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really cool.
Do you think we should check the schema at the top of the file?
I'm not sure.
I think maybe we should check it and issue a warning.
this is for v4,
it would work for v3 (which fig share used), mostly, but not perfectly.
So maybe issuing a warning if not match would be the right thing to do.
Since it will make future debugging easier.
I am excited to get zenodo support;
Supporting and more generally support for anything with a DOI issues by DataCite (and since DataCite is one of the largest DOI issuers, particularly for Data that is great)
Though as DataCite can't do URLs this becomes a API of last resort.
Still pretty exciting.
Sadly it doesn't work for Figshare
Try e.g 10.1371/journal.pone.0047999.t004
Figshare does support a slightly older version of the DataCite scheme (e.g. https://figshare.com/articles/225779/1/citations/datacite)
but it's DOI's do not hit the anything on the api of datacite, so I guess it uses a different issuer.
I think making figshare work (along with or prior to making DataVerse work) is the next stage.
Maybe OAI-PMH will help us for that. Maybe not.
src/DataCite.jl
Outdated
|
||
function mainpage_url(repo::DataCite, dataname) | ||
try | ||
identifier = match(r"\b(10[.][0-9]{4,}(?:[.][0-9]+)*\/(?:(?![\"&\'<>])\S)+)\b", dataname).match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am guessing this is the regex for as DOI?
Do we use that somewhere else too? (DataDryad?)
I think it would be good to make a function match_doi
or similar that does this line
for code understandability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a regex for DOI. I included this to give flexibility to the user to put any url/text with DOI as DataCite supports them all. We use a regex in DataDryad, but that is very specific for a DOI for DataDryad only. Should I still make it into one function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so yes. I suspect infastructurally DataDryad is not bound to always use DOI numbers with their prefix (I think they probably always will, but maybe not),
and if someone has said it is a DataDryad repo, and given a nondatadryad DOI then that is kinda on them.
We should at some point check the "Unhappy Path" when people do that, and give good error messages. Right now I don't think we do so anyway, even with distinguishing Dryad DOIs from DataCite DOIs.
In anycase, see my point below that maybe this method can be deleted anyway once you have base_url
?
src/DataCite.jl
Outdated
author = format_authors(authors) | ||
license = attributes["license"] | ||
date = attributes["published"] | ||
paper = format_papers(authors, date, attributes["title"] * " [Data set]. " * attributes["container-title"] * ".", mainpage["id"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break this into two lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to break this in two
src/DataCite.jl
Outdated
date = attributes["published"] | ||
paper = format_papers(authors, date, attributes["title"] * " [Data set]. " * attributes["container-title"] * ".", mainpage["id"]) | ||
|
||
final = escape_multiline_string(""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to assign this to a variable
src/DataCite.jl
Outdated
end | ||
|
||
function website(repo::DataCite, mainpage_url) | ||
replace(mainpage_url, "https://api.datacite.org/works/", "https://doi.org/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want to do
base_url(::DataCite) = "https://api.datacite.org/works/"
And use that both in website
and mainpage_url
.
Though don't we have a fallback for mainpage_url
that basically does what your are doing below,
minus the validation that it is a DOI.
And doesn't that fallback use base_url
?
In which case we can define base_url
then remove the definition of mainpage_url
entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, what about the validation then? Should I remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so yes.
I'm not sure if it is actually useful right now.
Is it?
What kind of mistakes is it catching I guess is the question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I implemented it so that we can provide anything, maybe the search url, or datacite api url, or just the DOI. In any case, it should be able to work.
Pierre Laurent, Florent Mouillot, Chao Yue, Maria Vanesa Moreno Dominguez, Philippe Ciais, Joana M.P. Nogueira (2018). List of fire patch properties computed and associated NetCDF maps from the MCD64A1 Collection 6 (2000-2016) and the MERIS fire_cci v4.1 (2005-2011) BA products [Data set]. OSU OREME. https://doi.org/10.15148/0e999ffc-e220-41ac-ac85-76e92ecd0320 | ||
if you use this in your research. | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add at the top level generate function for all datadeps a strip
to remove whitespace from ends of description
src/DataCite.jl
Outdated
author = format_authors(authors) | ||
license = attributes["license"] | ||
date = attributes["published"] | ||
paper = format_papers(authors, date, attributes["title"] * " [Data set]. " * attributes["container-title"] * ".", mainpage["id"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anytime we have a DOI and want a citation we should do it via
method discussed in #22 (comment)
This is fine for now,
but maybe after this PR make another one that does through and replaces all paper formatting with something based on that kind of idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the reason I introduced the format_papers()
method. I subsequently can add whatever format is required.
Hmm And the following works: What does not is: Which is associated with the doi: 10.1371/journal.pone.0047999.t004 So I am guessing figshare rehosted that existing data, with its existing DOI. CrossRef issued that DOI: I don't think we can content negotiate anything better It might be nice to support DOIs in general via the content-negotiation method. |
So writing down whatever I understood and plan to implement based on your points. Please correct me if I am wrong:
I faced an issue though, I tried doing content negotiation as described above. But unfortunately the content-negotiation results which came for DataCite didn't contain the |
Good idea checking. I seem to have mislead you. The goal of this PR #28 is to add DataCite support, it has done that successfully (well no URL, I suspected that wasn't going to be possible). #29 may or may not be the best next issue to pursue after this one. BTW: cross-negotiate isn't a term that I am familiar with. I think you mean content negotiate |
For This PR. Something I think I missed in the code-review before: it should displace some kind of Looks like the test failure are something to do with the Github generator breaking. |
Its good that I asked before implementing it in the PR, saved some work of removing it. |
Need to merge #31 before this. |
src/DataDepsGenerators.jl
Outdated
@@ -114,6 +114,11 @@ function format_papers(authors::Vector, year::String, name::String, link::String | |||
join(authors, ", ") * " ($year). " * name * " " * link | |||
end | |||
|
|||
function check_dois(uri::String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doi
singlular, as it is only one.
and check
isn't right.
Check implies that this returns true
if something is a DOI, and false if not.
extract
, try_extract
or match
(since this corresponds to the regex), are better.
Maybe get_match
src/DataCite.jl
Outdated
@@ -38,13 +40,13 @@ function data_fullname(::DataCite, mainpage) | |||
end | |||
|
|||
function website(repo::DataCite, mainpage_url) | |||
replace(mainpage_url, "https://api.datacite.org/works/", "https://doi.org/") | |||
replace(mainpage_url, base_url(repo), "https://doi.org/") | |||
end | |||
|
|||
function mainpage_url(repo::DataCite, dataname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own future reference,
this is infact different in functionality to
https://github.com/oxinabox/DataDepsGenerators.jl/blob/master/src/DataDepsGenerators.jl#L113-L122
This one takes anything that countains a DOI, e.g. some other URL, and gets the DOI from it.
Where are that one only works if the correct full URL, or just a DOI is passed.
@oxinabox Ready to merged if you don't have any reviews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some things brought up earlier got forgotten in data discussions and in sorting out bugs.
Just small things so should be trivial to fix
src/DataCite.jl
Outdated
author = format_authors(authors) | ||
license = attributes["license"] | ||
date = attributes["published"] | ||
paper = format_papers(authors, date, attributes["title"] * " [Data set]. " * attributes["container-title"] * ".", mainpage["id"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to break this in two
src/DataCite.jl
Outdated
end | ||
|
||
function get_urls(repo::DataCite, page) | ||
urls = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Produce a info
here saying something about DataCite API does not support URL, URL must be added after manually.
And add a dummy output like "URL HERE", maybe
src/DataCite.jl
Outdated
Please cite this paper: | ||
$(paper) | ||
if you use this in your research. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete blank line
Codecov Report
@@ Coverage Diff @@
## master #28 +/- ##
==========================================
+ Coverage 93.93% 94.09% +0.15%
==========================================
Files 13 14 +1
Lines 231 254 +23
==========================================
+ Hits 217 239 +22
- Misses 14 15 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tests pass then all good.
Nice work
Integration Tests to be added after your first review. Secondly, I am not able to get the urls. Do you have an idea how to get it? There are some hints regarding
resource-type
etc.