[ENG-6248] Bulk resync for CrossRef and DataCite #10988
base: feature/b-and-i-25-01
There is already a management command that does this work and is more fully featured. Check out osf/management/commands/sync_doi_metadata.py and add rate limiting to it rather than creating a new management command.
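The simplest form of rate limiting to add to an existing command's loop is a fixed minimum gap between updates. Below is a minimal sketch, assuming the command walks its objects in a plain `for` loop. `throttled` is a hypothetical helper and is not part of `sync_doi_metadata.py`. The clock and sleep functions are injectable so the helper can be tested without real waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Hypothetical helper: yield items no faster than one per `min_interval` seconds."""
    last: Optional[float] = None
    for item in items:
        if last is not None:
            # Only sleep for whatever part of the interval the caller's
            # own work (e.g. the DOI update) has not already used up.
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


# Hypothetical use inside the existing command (Django models not imported here):
# for registration in throttled(Registration.objects.exclude(article_doi=None), 0.5):
#     registration.request_identifier_update('doi', create=True)
```

The 0.5-second interval in the comment is a placeholder. The real value should come from the provider limits discussed below.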
admin/management/tasks.py
for registration in Registration.objects.exclude(article_doi=None):
    registration.request_identifier_update('doi', create=True)
There are also projects that have DOIs from DataCite. We should find the ones that have a DOI and sync them here as well.
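One way to cover projects alongside registrations and preprints is to treat each DOI-bearing collection as a named source and run the same update over all of them in one pass. This is a minimal sketch. `FakeReferent` and `resync_sources` are hypothetical stand-ins, and the diff does not show how project DOIs are actually queried in OSF. Returning per-source counts is a design choice that makes the run easy to log.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class FakeReferent:
    """Hypothetical stand-in for a Registration, Preprint, or project."""

    guid: str
    doi: Optional[str]


def resync_sources(
    sources: Dict[str, Iterable[FakeReferent]],
    update: Callable[[FakeReferent], None],
) -> Dict[str, int]:
    """Run `update` on every object that has a DOI; return how many per source."""
    counts: Dict[str, int] = {}
    for name, objects in sources.items():
        counts[name] = 0
        for obj in objects:
            if obj.doi is None:
                continue  # no DOI minted, so nothing to resync
            update(obj)
            counts[name] += 1
    return counts


# Hypothetical OSF wiring; the project query is an assumption:
# resync_sources(
#     {"registrations": ..., "preprints": ..., "projects": ...},
#     lambda obj: obj.request_identifier_update('doi', create=True),
# )
```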
admin/management/tasks.py
@celery_app.task()
def resync_crossref():
    for preprint in Preprint.objects.exclude(article_doi=None):
        preprint.request_identifier_update('doi', create=True)
Matt had suggested adding some rate limiting to this. According to the DataCite docs:
> Our firewall imposes a rate limit of 3,000 requests per 5 minute window per IP address for all of our APIs.
>
> On our test system, please do not exceed 750 requests per 5 minute window. See the [Test Accounts Policy](https://support.datacite.org/docs/test-accounts-policy) for more information.
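Per second, the quoted DataCite budgets work out to 3000 / 300 = 10 requests/s in production and 750 / 300 = 2.5 requests/s on the test system. The sketch below turns a published limit into a minimum gap between requests while using only a fraction of the budget. `min_interval_seconds` is a hypothetical helper. The default `headroom` of 0.5 is an assumed value, not a number from DataCite.

```python
def min_interval_seconds(limit: int, window_seconds: float, headroom: float = 0.5) -> float:
    """Seconds between requests so that we use only `headroom` of a rate limit.

    `headroom` is the fraction of the published limit we allow ourselves;
    0.5 (an assumed value, not from DataCite) means "at most half".
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return window_seconds / (limit * headroom)


# DataCite limits quoted above: 3,000 (production) and 750 (test) per 5 minutes.
PRODUCTION_INTERVAL = min_interval_seconds(3000, 5 * 60)  # 0.2 s between requests
TEST_INTERVAL = min_interval_seconds(750, 5 * 60)  # 0.8 s between requests
```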
According to the Crossref docs:
> We generally limit the number of concurrent requests to 5, though we reserve the right to adjust this in order to keep the API operational for all users.
>
> When rate limits are exceeded, requests will receive a 429 HTTP response code and the offending user will be blocked for 10 seconds. If you continue to make requests while blocked, you will be blocked for another 10 seconds. Please monitor HTTP response codes and utilize automated backoff and retry logic in your application.
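A single sequential loop only ever has one request in flight, so the concurrency cap of 5 matters less than handling 429s correctly. Below is a minimal sketch of the backoff-and-retry the Crossref docs ask for. `call_with_backoff` is a hypothetical helper. It assumes the request can be wrapped in a zero-argument callable that returns the HTTP status code. The diff does not show this, because the HTTP call happens inside `request_identifier_update`. The first wait is 10 seconds to match the quoted block duration. It doubles after each further 429 so retries do not land inside a renewed block.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_retries: int = 5,
    base_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> int:
    """Hypothetical helper: call `send` (returns an HTTP status), retrying on 429.

    Waits base_delay * 2**attempt seconds plus up to 1 s of random jitter
    between attempts, and returns the last status code seen.
    """
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            return status
        sleep(base_delay * 2 ** attempt + jitter())
        status = send()
    return status
```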
So we'll probably need more logic there. Don't get too close to the rate limits: our system will be making other requests while this runs, and we don't want those to be rate-limited.
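One way to stay well under a windowed limit like DataCite's is a sliding-window limiter capped at a fraction of the published budget. This is a minimal sketch. `SlidingWindowLimiter` is hypothetical, and the default `headroom=0.5` is an assumption rather than a provider number. The limiter only tracks calls made in its own process. If the resync were split across several Celery workers, they would need a shared counter instead.

```python
import collections
import time
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `limit * headroom` calls in any `window` seconds."""

    def __init__(
        self,
        limit: int,
        window: float,
        headroom: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max(1, int(limit * headroom))
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = collections.deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have slid out of the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: wait until the oldest call expires, then re-check.
            self.sleep(self.window - (now - self.calls[0]))


# Hypothetical use with the DataCite production limit quoted above:
# limiter = SlidingWindowLimiter(limit=3000, window=300)  # at most 1500 per 5 min
# for registration in Registration.objects.exclude(article_doi=None):
#     limiter.acquire()
#     registration.request_identifier_update('doi', create=True)
```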
Purpose
Admins should be able to resync DOI metadata with CrossRef and DataCite.
Changes
Edited the template and added Celery tasks.
Ticket
https://openscience.atlassian.net/jira/software/c/projects/ENG/boards/145?assignee=712020%3A7c7368dc-40cb-475f-bae8-b07a8bd2dd6c&assignee=557058%3A5fafb22b-df22-41bd-8ed5-d7701be68048&selectedIssue=ENG-6248