
Adding Crossref as a DOI provider #8581

Closed
JacekChudzik opened this issue Apr 6, 2022 · 17 comments · Fixed by #10235 or #10806
Labels
Feature: DOI & Handle · Type: Suggestion (an idea) · User Role: Sysadmin (installs, upgrades, and configures the system, connects via ssh)
Milestone

Comments

@JacekChudzik

Hello,

I would like you to consider adding Crossref as a DOI provider in Dataverse.

We are trying to set up a Dataverse installation for several local institutions, and some of them already get their DOIs from Crossref and are not interested in purchasing another DOI pool.

Is it possible for Dataverse to gain support for Crossref DOIs? If so, how long might it take to develop such a feature?

Best regards,
Jacek

@pdurbin
Member

pdurbin commented Apr 15, 2022

@JacekChudzik thanks for participating in the thread ( https://groups.google.com/g/dataverse-community/c/WbqNVz7m4Ts/m/GMxzUoJZAwAJ ) and answering my call to create an issue!

So far what we have is some code that Patrick Vranckx shared on that thread which I'll upload here as well, since it's small (I renamed it to .tar.gz so GitHub would let me upload it): doi.tar.gz

Here are the contents of the code:

  • doi.py
  • doi.service
  • templates/crossref.j2
  • var/doi.json
  • var/doi.log

And here's how it works (from the Python script):

This script requests all PUBLISHED datasets from the Dataverse instance.
(DOIs associated with draft datasets are not taken into account.)
Logs are written to var/doi.log.

The script is started by systemd and runs as user 'doi' (see the 'doi.service' file).

The script stores all the requested DOIs in var/doi.json
and loads them into dois{}.

The main loop runs as follows:

   Step 1. Requests all the published datasets using
           the Dataverse API and iterates over them.
   Step 2. If the DOI associated with the dataset is not
           found in dois{}, a new DOI is requested
           from Crossref. The state of the new DOI stored in dois{}
           is set to 'requested'. Verification will be done
           in the next iteration of the main loop.
           A copy of the XML request file is stored in
           var/requests.
   Step 3. If the DOI already exists in dois{} with the state
           'requested', it is verified against handle.net.
           If the result points to the Dataverse website,
           the state is set to 'verified'.

   When all the published datasets have been processed, it sleeps for 10 minutes.
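The loop described above can be sketched in Python as follows. This is not Patrick's doi.py; it is a minimal illustration assuming the DOI-to-state map in var/doi.json is a plain JSON object. The Crossref deposit (`request_doi`), the handle lookup (`lookup_handle`) and the list of published DOIs (`fetch_published`) are hypothetical callables, passed in so the state transitions can be read and tested without network access. `target_url` assumes the record has the shape returned by the handle.net proxy REST API (`/api/handles/<handle>`), where each entry carries a `type` such as `"URL"` and a `data.value`.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional

# States used by the daemon, as described in the script's docstring.
REQUESTED = "requested"
VERIFIED = "verified"


def load_dois(path: Path) -> Dict[str, str]:
    """Load the DOI -> state map (an empty map if the file does not exist yet)."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def save_dois(path: Path, dois: Dict[str, str]) -> None:
    """Persist the DOI -> state map back to disk."""
    path.write_text(json.dumps(dois, indent=2, sort_keys=True))


def target_url(handle_record: dict) -> Optional[str]:
    """Return the first URL value in a handle record (assumed REST API shape)."""
    for value in handle_record.get("values", []):
        if value.get("type") == "URL":
            return value.get("data", {}).get("value")
    return None


def run_once(
    published_dois: Iterable[str],
    dois: Dict[str, str],
    request_doi: Callable[[str], None],     # hypothetical Crossref deposit
    lookup_handle: Callable[[str], dict],   # hypothetical handle.net lookup
    site_url: str,
) -> None:
    """One pass over the published datasets (steps 1 to 3)."""
    for doi in published_dois:              # Step 1: iterate published DOIs
        state = dois.get(doi)
        if state is None:                   # Step 2: unknown DOI -> deposit it
            request_doi(doi)
            dois[doi] = REQUESTED           # verified on the next pass
        elif state == REQUESTED:            # Step 3: check where it resolves
            url = target_url(lookup_handle(doi))
            if url is not None and url.startswith(site_url):
                dois[doi] = VERIFIED
        # DOIs already 'verified' are left untouched.


def main_loop(path, fetch_published, request_doi, lookup_handle, site_url,
              interval_seconds=600):
    """Run forever, sleeping 10 minutes between passes like the daemon."""
    dois = load_dois(path)
    while True:
        run_once(fetch_published(), dois, request_doi, lookup_handle, site_url)
        save_dois(path, dois)
        time.sleep(interval_seconds)
```

A DOI therefore needs two passes to reach 'verified': one to deposit it and one, after Crossref has processed the deposit, to confirm that it resolves to the Dataverse site.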

And here are Patrick's instructions (from https://groups.google.com/g/dataverse-community/c/WbqNVz7m4Ts/m/J1wE1zV0AwAJ ):

You have to bypass the DataCite DOI registration by using the "FAKE" DOI provider, as explained at https://guides.dataverse.org/en/latest/installation/config.html#doiprovider
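For Dataverse versions that use the `:DoiProvider` database setting described in that guide, the switch is made through the admin settings API. The Python sketch below only builds the PUT request that sets the value to `FAKE`. The base URL `http://localhost:8080` is an assumption, the admin API is normally reachable only from the server itself, and newer releases may configure PID providers differently, so check the guide for your version before relying on this.

```python
import urllib.request


def doi_provider_request(base_url: str, provider: str = "FAKE") -> urllib.request.Request:
    """Build (but do not send) a PUT to the admin settings API for :DoiProvider."""
    url = f"{base_url.rstrip('/')}/api/admin/settings/:DoiProvider"
    # The setting's value is sent as the raw request body.
    return urllib.request.Request(url, data=provider.encode("utf-8"), method="PUT")


req = doi_provider_request("http://localhost:8080")
# urllib.request.urlopen(req)  # uncomment to actually apply the setting
```

Sending the request with `urllib.request.urlopen(req)` on the server would apply the setting.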

We are using CentOS 7 as the server OS.

To run the daemon, you need to install the crossrefapi Python module (pip3 install crossrefapi).

Of course, you need to put your own information and credentials in the doi.py code (see # global variables). The way the daemon registers the DOIs is explained in the source code.

According to the doi.service file, it will run under the user 'doi'. Change the ownership of all the files to that user.

Are you interested in trying this out? Or are you asking for more official support for Crossref as a DOI provider? This would involve someone from the Dataverse community writing Java code as when support for DataCite (PR #2964) or Handle (PR #3826) was added.

@konradperlowski
Contributor

Hi @pdurbin, speaking on behalf of @JacekChudzik: we are interested in a more official solution for Crossref.
Can we count on receiving such a solution, or on any help in trying to create something like this ourselves?

@pdurbin
Member

pdurbin commented Apr 27, 2022

@konradperlowski we love helping contributors get their Dataverse development environments set up so they can start writing code and making pull requests! Would you, @JacekChudzik, or someone else be the developer in this case? It probably makes sense to make a smaller pull request first, maybe for a small bug, before moving on to Crossref. I'd be happy to suggest a bug to work on if you like. That way, the developer can get to know our process and get a sense of what it's like to do development on Dataverse.

@qqmyers
Member

qqmyers commented Apr 27, 2022

FWIW: With support from DANS, I'm currently (and a bit slowly) looking into refactoring the global ID handling in Dataverse which would make it easier to add CrossRef. Right now, most of what you need to do is create a class overriding the methods in AbstractGlobalIdServiceBean, but there are other places in the code where there is global id related functionality or where the code determines which identifier type you're using - I'm hoping to refactor that so you'll only have to implement the one class. I'll try to link here as that work progresses.

@konradperlowski
Contributor

@pdurbin I could do a smaller pull request first. Do you have a small, quick-to-fix issue that I can work on? We want to add Crossref functionality sooner rather than later, so I do not want to spend much time on it. Maybe in the meantime @qqmyers will finish refactoring the global ID handling. By the way, how much time do you think you need for that?

@pdurbin
Member

pdurbin commented Apr 27, 2022

@konradperlowski please let me know what you think of this one:

It should just be a one line change to a properties file but to test it you'll need to get Dataverse running. Also, a screenshot in your pull request would be appreciated.

If you'd rather do a non-code pull request, this one is about adding a table of contents directive to one of the pages in the dev guide:

Or I can keep looking for a suitable small issue. This was just a quick search. 😄

If you'd like to chat in real time, https://chat.dataverse.org is a good place.

Awesome news that @qqmyers is refactoring those classes! They need it. 😄

@pdurbin
Member

pdurbin commented Apr 29, 2022

@konradperlowski thanks for PR #8666! Merged!

At this point, for the Crossref work, it sounds like you and @qqmyers should coordinate on who is in the code when.

Meanwhile, if there are any other open issues you'd like to work on, please let us know! Have a good weekend!

@konradperlowski
Contributor

@qqmyers, how's your refactoring going? Can you tell when it will be done? I will need to start working on this Crossref thing sooner rather than later, so I am wondering whether I should wait for your improvements.

@qqmyers
Member

qqmyers commented May 4, 2022

See #8674 for work in progress. I'm not sure how quickly I can get to real refactoring. To get started, I'd suggest using the existing DataCite and DOIEZIDServiceBean classes, along with the info in this PR about where you need to add CrossRef as a new provider. Note that the existing DataCite classes have obsolete code maintaining an internal cache, so the EZID class might be a better model (although the code for generating metadata and calling the API in the DataCite classes might be closer to what CrossRef uses?)

@pdurbin added the labels Type: Suggestion, Feature: DOI & Handle, and User Role: Sysadmin on Oct 10, 2022
@konradperlowski
Contributor

Coming back after a long time 😅
I wrote some code and it kinda works, but I have some issues. @pdurbin, are you still working on Dataverse, and can I bother you with some problems?

@pdurbin
Member

pdurbin commented Oct 9, 2023

@konradperlowski absolutely! Great news! It's a holiday here today, but please feel free to go ahead and pop in https://chat.dataverse.org to create a new thread under #dev about this. Write as much as you want and I'll catch up soon. Thanks!

@pdurbin
Member

pdurbin commented Oct 10, 2023

@konradperlowski thanks for creating this topic: https://dataverse.zulipchat.com/#narrow/stream/379673-dev/topic/CrossRef.20DOI.20provider/near/395835920 (I'll link to this comment from there.)

Like @poikilotherm said, there's been some recent refactoring by @qqmyers ("This PR takes significant steps toward making PID Providers plugable and allowing multiple DOI accounts to manage different authority/shoulder combinations." in this PR...

... and the idea is to make PID providers more pluggable. That's this issue (unless there's a newer one with smaller scope that I can't find):

@qqmyers can you please advise @konradperlowski on how to proceed with a CrossRef PID provider? Have you created a branch with further refactoring he can look at?

@qqmyers
Member

qqmyers commented Oct 10, 2023

The info in #8674 is current afaik and it lists some of the changes that are still tbd. W.r.t. reserving PIDs - if CrossRef doesn't support that, your provider should return GlobalIdServiceBean.registerWhenPublished = true. That is supposed to avoid calls out to the remote service when you create a dataset and its PID.
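To make the effect of `registerWhenPublished` concrete, here is a toy model in Python. It is a hypothetical stand-in written for illustration, not Dataverse's Java `GlobalIdServiceBean` API: the class, its methods, and the two hook functions are invented names. It shows only the decision described above: a provider that cannot reserve PIDs makes no remote call when the dataset and its PID are created, and registers the DOI only at publication.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPidProvider:
    """Hypothetical provider model; NOT Dataverse's actual Java classes."""
    register_when_published: bool
    # Records what would have been sent to the remote service (e.g. CrossRef).
    remote_calls: List[str] = field(default_factory=list)

    def reserve(self, pid: str) -> None:
        self.remote_calls.append(f"reserve {pid}")

    def register(self, pid: str) -> None:
        self.remote_calls.append(f"register {pid}")


def on_dataset_created(provider: ToyPidProvider, pid: str) -> None:
    # Providers that register on publish skip the remote call at creation time.
    if not provider.register_when_published:
        provider.reserve(pid)


def on_dataset_published(provider: ToyPidProvider, pid: str) -> None:
    # In this toy model, every provider registers the PID at publication.
    provider.register(pid)
```

With `register_when_published=True`, creating a dataset leaves `remote_calls` empty, and the first remote call happens on publication.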

@cmbz

cmbz commented Jul 24, 2024

Please note, there is a pull request in process (#10235) that will close this issue. Please watch the linked issue for details.

Update: new pull request to close this issue: #10806

@stevenwinship
Contributor

stevenwinship commented Aug 29, 2024

@JacekChudzik @konradperlowski As we move forward to integrate CrossRef as a PID provider for Dataverse, I would like to ask for a test account in order to verify this feature. I spoke with Shayn Smulyan of CrossRef, who was unable to set up such an account for me. His suggestion was to get the test account credentials from the client needing this functionality. Could either of you please share with me a set of credentials, the DOI prefix, and the URLs used for testing?
Thank you

@JacekChudzik
Author

We were also unable to get a test account from CrossRef, and we do not have any test accounts. We developed and tested everything using the client's production account, so we are unable to provide a CrossRef account for your testing. I don't know how we can solve this problem without help from CrossRef itself. I've heard about the CrossRef sandbox; I don't know if it might help.

@stevenwinship
Contributor

Unfortunately, the sandbox environment uses the same production credentials, which I do not have.

@pdurbin added this to the 6.4 milestone on Sep 17, 2024
6 participants