
Support for the Research Organization Registry (ROR) #4483

Open: wants to merge 10 commits into base: master

joemull (Member) commented Nov 8, 2024

Closes #3168.

Overview

This piece of work adds a ROR data model, including affiliations, organizations, organization names, and locations. It includes an import routine to fetch and process ROR's full database. I've included as much backwards compatibility as possible for importers that do not know about ROR. It also adds a user interface for editing one's own affiliations, and an admin interface for all of the relevant models.

Command line interface

❯ python src/manage.py import_ror_data
DEBUG 2024-12-09 18:41:33,666 connectionpool P:131157 T:135176669522816 Starting new HTTPS connection (1): zenodo.org:443
DEBUG 2024-12-09 18:41:34,697 connectionpool P:131157 T:135176669522816 https://zenodo.org:443 "GET /api/communities/ror-data/records?sort=newest HTTP/1.1" 200 None
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 101.06it/s]
INFO 2024-12-09 18:41:38,658 import_ror_data P:131157 T:135176669522816 successful

User interface

[Screenshots of the user interface (7 images)]

Some functionality is still to do:

  • As a system admin, I can import the organizations more quickly (currently the process is slow due to the need to check existing data while creating new records to avoid duplicates)
  • A cron job is created to run ROR imports when I install Janeway in production
  • As the author of an article submission, I can add an affiliation for myself and my coauthors (see Update the author submission screen #4519)
  • As the author of a repository submission, I can add an affiliation for myself and my coauthors
  • As an editor, I can edit the affiliations of frozen author records on articles in the pipeline
  • As a publishing librarian or research impact librarian, I can see ROR data in Crossref deposits, OAI-PMH feeds, and JATS files exposed to web crawlers so that I can track research impact
  • As a publisher importing metadata, I can import RORs using the import / export / update interface
  • As an editor/publisher, I can pull reports using ROR as a filter
  • As a marketer/publisher I can pull reports that compare authors against the consortial billing plugin
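The slowness noted in the first item comes from checking for existing records while creating new ones. One common way to speed this up, offered here as a sketch under assumptions and not as the PR's code, is to load every known ROR ID with a single query up front and partition the incoming records in memory, so new organizations can then be written in batches. The record shape (a dict whose `"id"` key holds the ROR ID) and the helper name `partition_ror_records` are hypothetical. In Django, `existing_ids` might come from something like `Organization.objects.values_list("ror", flat=True)` (model and field names are assumptions), and `to_create` could feed `bulk_create(..., batch_size=...)`.

```python
from typing import Iterable, List, Set, Tuple


def partition_ror_records(
    incoming: Iterable[dict], existing_ids: Set[str]
) -> Tuple[List[dict], List[dict]]:
    """Split parsed ROR records into (to_create, to_update).

    Hypothetical helper: each record is assumed to be a dict whose "id" key
    holds the ROR identifier, and existing_ids is the set of ROR IDs already
    stored, loaded once before the loop instead of queried per record.
    """
    to_create: List[dict] = []
    to_update: List[dict] = []
    seen: Set[str] = set()
    for record in incoming:
        ror_id = record["id"]
        if ror_id in seen:
            # Ignore repeats within the same dump.
            continue
        seen.add(ror_id)
        if ror_id in existing_ids:
            to_update.append(record)
        else:
            to_create.append(record)
    return to_create, to_update


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    existing = {"https://ror.org/example-a"}
    dump = [{"id": "https://ror.org/example-a"}, {"id": "https://ror.org/example-b"}]
    new, known = partition_ror_records(dump, existing)
    print(len(new), len(known))
```

Membership tests against a Python set take constant time on average, so the duplicate check no longer costs a database round trip per record.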

Some things still need design / feature development:

  • As any user, I can enter location metadata for my account that is separate from an affiliation (do we want this?)
  • As any user, when I enter a custom organization, I can add a custom location to the organization (do we want this?)
  • When I search for and add an author in the author submission interface (Update the author submission screen #4519), the ROR affiliations are pulled in from ORCID API search results
  • As a staff member, I can edit the affiliations of user accounts with a role in my journal

joemull linked an issue on Nov 15, 2024 that may be closed by this pull request
joemull force-pushed the 3168-ror branch 2 times, most recently from d314d51 to cc46446 on December 9, 2024 18:20
joemull requested a review from ajrbyers on December 9, 2024 18:43
joemull requested a review from MartinPaulEve on December 9, 2024 18:44
joemull marked this pull request as ready for review on December 9, 2024 18:44
MartinPaulEve (Contributor) commented:

If you enter an organization name that isn't found (e.g., "Birkbeck" in the test import) and then click one of the pagination links at the bottom, you are redirected to a page that errors, e.g.:

http://127.0.0.1:8000/JRNL/profile/organization/search/?q=birkbeck&page=3&paginate_by=25

This errors because no matching organization is found, but the search query is still inserted into the pagination link.

MartinPaulEve (Contributor) left a comment

This mostly looks really good. Just a few small things.

  • Pagination in ROR search currently redirects to page not found error
  • List of affiliations should show which is primary?
  • RORImportStatus should not be duplicated?

@@ -868,7 +868,7 @@ def check_for_bad_login_attempts(request):
time = timezone.now() - timedelta(minutes=10)

attempts = models.LoginAttempt.objects.filter(user_agent=user_agent, ip_address=ip_address, timestamp__gte=time)
print(time, attempts.count())
logger.info(time, attempts.count())

Contributor

Should this message be more explicit?

Member Author

Yes, probably, though on this branch I'm only removing the noisy print statement, not changing this feature.
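On the reviewer's question: as written, `logger.info(time, attempts.count())` passes the timestamp as the log message and the count as a formatting argument. Because the timestamp's string form contains no `%` placeholder, the standard `logging` module cannot merge the two when the record is emitted and prints a "--- Logging error ---" traceback instead of the message. A more explicit message might look like the sketch below. The logger name and the helper `log_recent_login_attempts` are hypothetical; the values are passed as separate arguments so each one is labelled and formatting is deferred until a handler actually emits the record.

```python
import logging
from datetime import datetime, timedelta, timezone

# Hypothetical logger name; Janeway's actual logger setup may differ.
logger = logging.getLogger("security.login_attempts")


def log_recent_login_attempts(since: datetime, count: int) -> None:
    """Log how many login attempts were found in the lookback window."""
    # %-style placeholders with separate arguments: each value is labelled,
    # and the string is only built if the record is actually emitted.
    logger.info(
        "Found %d login attempt(s) since %s "
        "for this user agent and IP address",
        count,
        since,
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    window_start = datetime.now(timezone.utc) - timedelta(minutes=10)
    log_recent_login_attempts(window_start, 3)
```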

'is_active': True,
'password': 'this_is_a_password',
'salutation': 'Prof.',
'first_name': 'Martin',
'middle_name': '',
'last_name': 'Eve',

Contributor

Top test author

Member Author

Without a doubt!

('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('started', models.DateTimeField(auto_now_add=True)),
('stopped', models.DateTimeField(blank=True, null=True)),
('status', models.CharField(choices=[('ongoing', 'Ongoing'), ('unnecessary', 'Unnecessary'), ('successful', 'Successful'), ('failed', 'Failed')], default='ongoing')),

Contributor

Should these choices reference RORImportStatus rather than duplicating?

Member Author

I believe this is the standard way Django records choices in migrations, even when the choices are expressed in a class. If the migration referenced a variable in the models file instead, it might stop working whenever the choices list there changed.

Example:

choices=[('Issue', 'Issue'), ('Collection', 'Collection')],

MartinPaulEve removed their assignment on Dec 13, 2024

joemull (Member Author) commented Dec 13, 2024

  • Pagination in ROR search currently redirects to page not found error

I believe this turned out to be a Django bug. It's obscure because most people never encounter it: they submit their search query to update the list of items before selecting a page.

But I've gone ahead and fixed it for this view.
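One general way to avoid this kind of 404, sketched here as an assumption and not necessarily the change made in this PR, is to clamp the requested page into the valid range for the current result set instead of raising. Django's `Paginator.get_page()` takes a similar approach: a non-numeric page falls back to the first page and an out-of-range one to the last page. The helper name `clamp_page` is hypothetical, and for simplicity it sends any page below 1 to the first page.

```python
import math


def clamp_page(requested, total_items: int, per_page: int) -> int:
    """Return a page number that exists for the current result set.

    An empty result set still gets a single (empty) page, so a stale link
    such as ?q=birkbeck&page=3 lands on page 1 rather than a 404.
    """
    num_pages = max(1, math.ceil(total_items / per_page))
    try:
        page = int(requested)
    except (TypeError, ValueError):
        # Missing or non-numeric page parameter.
        return 1
    if page < 1:
        return 1
    return min(page, num_pages)
```

In a view, `requested` would typically be `request.GET.get("page")`, and `total_items` the count of the filtered queryset.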

  • List of affiliations should show which is primary?

Yes, good call. I added a label.

I also updated the save method to make the affiliation primary if it's the first one.

[Screenshot]
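The rule described above (the first affiliation saved for an account becomes primary) can be sketched without a database as follows. `Affiliation`, `AffiliationStore`, and the `is_primary` flag are hypothetical stand-ins for the PR's models. In a Django model the same check would sit in `save()`, for example by testing whether any other affiliation already exists for the account before calling `super().save()`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Affiliation:
    account_id: int
    organization: str
    is_primary: bool = False


class AffiliationStore:
    """In-memory stand-in for the affiliation table."""

    def __init__(self) -> None:
        self.rows: List[Affiliation] = []

    def save(self, affiliation: Affiliation) -> Affiliation:
        """Insert a new affiliation (this sketch does not handle updates)."""
        # The first affiliation recorded for an account becomes primary;
        # later ones keep whatever is_primary value they were given.
        has_existing = any(
            row.account_id == affiliation.account_id for row in self.rows
        )
        if not has_existing:
            affiliation.is_primary = True
        self.rows.append(affiliation)
        return affiliation


if __name__ == "__main__":
    store = AffiliationStore()
    first = store.save(Affiliation(account_id=1, organization="Org A"))
    second = store.save(Affiliation(account_id=1, organization="Org B"))
    print(first.is_primary, second.is_primary)
```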

  • RORImportStatus should not be duplicated?

I'm fairly sure that's standard. See the inline comment.

joemull assigned MartinPaulEve and unassigned ajrbyers on Dec 13, 2024
Development

Successfully merging this pull request may close these issues.

Add support for Research Organization Registry (ROR)
3 participants