Increasing RT2 timeouts [WAS: Repotool2 times out on harvest of MCOs] #163

Open
tomwrobel opened this issue Dec 23, 2019 · 5 comments

@tomwrobel
Collaborator

Many-contributor objects (MCOs), i.e. objects with more than N contributors, are failing harvest. We don't know what N is, only that objects with thousands of contributors certainly fail.

Repotool2 allows 100 seconds for a Sword2 response, so Sword2 has to respond within that time.

Sword2 has been updated (currently only on QA) to truncate the contributor list at 100 and to add an et_al name element for the authors role. This should speed up the read, at the cost of truncating the list of contributors.
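The truncation rule can be sketched as follows. This is a minimal Python illustration, not the Willow sword code: the `Contributor` type, the `truncate_contributors` function, representing the et_al element as an extra author entry named `et_al`, and adding it only when something was actually dropped are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

MAX_CONTRIBUTORS = 100  # the limit configured for Sword2 on QA


@dataclass
class Contributor:
    name: str
    role: str  # e.g. "author"


def truncate_contributors(contributors: List[Contributor],
                          limit: int = MAX_CONTRIBUTORS) -> List[Contributor]:
    """Keep the first `limit` contributors. If any were dropped, append a
    hypothetical et_al entry in the author role to signal the truncation."""
    if len(contributors) <= limit:
        return list(contributors)
    kept = list(contributors[:limit])
    kept.append(Contributor(name="et_al", role="author"))
    return kept


if __name__ == "__main__":
    people = [Contributor(f"Author {i}", "author") for i in range(2800)]
    result = truncate_contributors(people)
    print(len(result), result[-1].name)  # 101 et_al
```

A 2,800-author record comes back as 100 real contributors plus the et_al marker, so the payload no longer grows with the size of the author list.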

Related commits:

  • Willow sword [feature/ora_customizations 3bc9745] Add maximum contributors in response
  • Ansible.ora4 [TW-add-max-contributors-config-option 706b736] Configure Sword2 to return N contributors only
@tomwrobel
Collaborator Author

Ancillary question: does Repotool2 pick up the last item in an OAI-PMH list?

@tomwrobel
Collaborator Author

Note: we need to increase the timeout value in Repotool2; there is no easy way around this.

@tomwrobel
Collaborator Author

Note: a fix that increases the timeouts is planned for 5.20 and should be backported to 5.18. We will revisit this issue once it is deployed.

@tomwrobel tomwrobel changed the title Repotool2 times out on harvest of MCOs Increasing RT2 timeouts [WAS: Repotool2 times out on harvest of MCOs] Jan 29, 2020
@mrdsaunders
Collaborator

mrdsaunders commented Feb 12, 2020

From Andrew Bennett:
In 5.20, improvements to the crosswalk engine will solve the issue. For a paper with 2,800 authors:

  • 16,234 ms when limited to 25 authors
  • 28,498 ms when limited to 50 authors
  • 48,802 ms when limited to 100 authors

After 5.20, all 2,800 authors will be processed in 314 ms.
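These timings can be checked against the 100-second budget. The sketch below fits a straight line through the three pre-5.20 measurements and extrapolates. The linear-growth assumption is ours: there are only three points, and the marginal cost per author seems to fall slightly as the limit grows, so treat the results as rough estimates.

```python
from typing import List, Tuple

# Timings quoted above for a 2,800-author paper (pre-5.20 crosswalk).
TIMINGS_MS: List[Tuple[int, int]] = [(25, 16_234), (50, 28_498), (100, 48_802)]
TIMEOUT_MS = 100_000  # Repotool2's 100-second Sword2 timeout


def fit_line(points: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


intercept, per_author = fit_line(TIMINGS_MS)
full_list_ms = intercept + per_author * 2_800  # extrapolated cost of all authors
max_authors = int((TIMEOUT_MS - intercept) // per_author)  # largest limit inside the timeout

if __name__ == "__main__":
    print(f"~{per_author:.0f} ms per author, ~{intercept:.0f} ms fixed")
    print(f"all 2,800 authors: ~{full_list_ms / 1000:.0f} s")
    print(f"authors that fit in 100 s: ~{max_authors}")
```

On these figures the fit gives roughly 430 ms per author plus about 6 s of fixed cost. That puts the full 2,800-author list at around 20 minutes and suggests that only about 218 authors fit inside the 100 s timeout, which is consistent with choosing a limit of 50 or 100.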

For 5.18/5.19 (following patch) we can limit the number of authors via the crosswalk:
<xwalk:parameter name="person-list-entry-read-limit" value="50" />

I will test (once on 5.19) and then update the crosswalk.

@tomwrobel
Collaborator Author

@tomwrobel, to note before we discuss this ticket tomorrow: Jason has explicitly requested that the person list can be limited, BUT:

  • we need all depositor and impersonator information
  • we need all known Oxford Authors
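One way to honour both requirements is to treat depositors, impersonators and known Oxford authors as protected entries that are always kept, then fill any remaining slots with other people in their original order. The Python sketch below assumes a hypothetical `Person` record with `roles` and `is_oxford` fields and a hypothetical `limit_people` function. It is not based on the crosswalk's `person-list-entry-read-limit` parameter, whose exact behaviour is not described in this thread.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Roles that must never be dropped (from the requirements above).
MUST_KEEP_ROLES: FrozenSet[str] = frozenset({"depositor", "impersonator"})


@dataclass
class Person:
    name: str
    roles: FrozenSet[str] = frozenset()  # hypothetical role labels
    is_oxford: bool = False  # hypothetical "known Oxford author" flag


def is_protected(person: Person) -> bool:
    """True if the person must survive any limit."""
    return person.is_oxford or bool(person.roles & MUST_KEEP_ROLES)


def limit_people(people: List[Person], limit: int) -> List[Person]:
    """Keep every protected person, then fill the remaining slots up to
    `limit` with other people, preserving the original order."""
    protected_count = sum(1 for p in people if is_protected(p))
    budget = max(limit - protected_count, 0)
    result: List[Person] = []
    for person in people:
        if is_protected(person):
            result.append(person)
        elif budget > 0:
            result.append(person)
            budget -= 1
    return result


# Example: 2,800 plain authors, one Oxford author deep in the list, one depositor.
people = [Person(f"Author {i}", frozenset({"author"})) for i in range(2800)]
people[1500].is_oxford = True
people.append(Person("Depositor", frozenset({"depositor"})))
trimmed = limit_people(people, 50)
```

Here `trimmed` holds the first 48 ordinary authors, the Oxford author at position 1,500 and the depositor. If there are more protected people than `limit`, the result exceeds the limit; this sketch deliberately favours the requirements over the cap.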

mrdsaunders added a commit that referenced this issue May 4, 2020
28/04/20: Updated file-versions value map to map null to null instead of NA.
24/04/20: Updated for v5.20 to add parameter to limit author list to x authors (GitHub issue #163, S Ltd ticket #248705)