[ENG-6248] Bulk resync for CrossRef and DataCite #10988

Open
wants to merge 3 commits into
base: feature/b-and-i-25-01
ihorsokhanexoft
Collaborator

brianjgeiger left a comment

There is a management command that already does this work and is a bit more fully-featured. Check out osf/management/commands/sync_doi_metadata.py and modify that for rate limiting rather than making a new management command.

Comment on lines 18 to 19
for registration in Registration.objects.exclude(article_doi=None):
    registration.request_identifier_update('doi', create=True)

There are also projects that have DOIs from DataCite. We should find the ones that have a DOI and sync those here as well.

@celery_app.task()
def resync_crossref():
    for preprint in Preprint.objects.exclude(article_doi=None):
        preprint.request_identifier_update('doi', create=True)

Matt had suggested adding some rate limiting to this. According to the DataCite docs:

Our firewall imposes a rate limit of 3,000 requests per 5 minute window per 
IP address for all of our APIs.

On our test system, please do not exceed 750 requests per 5 minute window. 
See the [Test Accounts Policy](https://support.datacite.org/docs/test-accounts-policy) 
for more information.

According to the Crossref docs:

We generally limit the number of concurrent requests to 5, though we 
reserve the right to adjust this in order to keep the API operational 
for all users.

When rate limits are exceeded, requests will receive a 429 HTTP 
response code and the offending user will be blocked for 10 seconds. 
If you continue to make requests while blocked, you will be blocked 
for another 10 seconds. Please monitor HTTP response codes and 
utilize automated backoff and retry logic in your application.
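
The backoff and retry behaviour that the Crossref docs ask for could look roughly like the sketch below. It is illustrative only, not code from this PR: `send` is a hypothetical zero-argument callable that performs one request and returns its HTTP status code, and the doubling schedule starting at the documented 10-second block is an assumption.

```python
import time


def call_with_backoff(send, max_attempts=5, block_seconds=10.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning an HTTP status code;
    it stands in for whatever actually issues the request.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Wait out the 10-second block, doubling after each consecutive 429
            # so we never send a request while still blocked.
            sleep(block_seconds * (2 ** attempt))
    # Still rate limited after max_attempts; let the caller decide what to do.
    return status
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in a Celery task the same idea might instead be expressed through the task's own retry mechanism.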

So we'll probably need more logic there. Don't try to get too close to the rate limits, because our system will be making other requests while this is going on, and we don't want those to be limited.
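
One way to stay well under a limit is to space requests by a fixed interval derived from a fraction of the published budget. The sketch below is a minimal illustration under assumptions: the 25% fraction is an arbitrary choice, and `min_interval` and `paced` are hypothetical helper names, not existing OSF functions. With DataCite's 3,000 requests per 5 minutes, using 25% of the budget works out to one request every 0.4 seconds.

```python
import time


def min_interval(limit, window_seconds, fraction):
    """Seconds between requests so that only `fraction` of the limit is used."""
    return window_seconds / (limit * fraction)


def paced(items, interval, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than one per `interval` seconds."""
    last = None
    for item in items:
        if last is not None:
            # Sleep only for whatever part of the interval has not already elapsed.
            remaining = interval - (clock() - last)
            if remaining > 0:
                sleep(remaining)
        last = clock()
        yield item


# Assumed budget: 25% of DataCite's 3,000 requests per 300 seconds.
DATACITE_INTERVAL = min_interval(limit=3000, window_seconds=300, fraction=0.25)
```

The resync loop could then wrap its queryset, for example `for preprint in paced(queryset, DATACITE_INTERVAL): ...`, leaving the remaining budget for the rest of the system's traffic.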
