Implement retries for spurious connection issues #758

Merged
merged 4 commits into from
Oct 4, 2023

Conversation

bennybp
Contributor

@bennybp bennybp commented Sep 21, 2023

Description

Requests that result in certain errors due to spurious connection/networking issues should now be automatically retried.

The types of exceptions handled should cover most cases, but I am open to expanding the list if people find ones that I missed.

The request will be tried again after waiting 0.5, 1.0, 2.0, and 4.0 seconds (each wait measured from the previous request).
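As a rough illustration of the schedule described above (not the code merged in this PR), a hand-rolled retry loop could look like the sketch below. The names `retry_delays` and `call_with_retries` are hypothetical, and the built-in `ConnectionError` and `TimeoutError` stand in for whichever exception types the client actually treats as retryable.

```python
import time


def retry_delays(initial=0.5, factor=2.0, retries=4):
    """Yield the wait before each retry; the defaults give 0.5, 1.0, 2.0, 4.0."""
    delay = initial
    for _ in range(retries):
        yield delay
        delay *= factor


def call_with_retries(func, retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again.

    The exception types here are placeholders (an assumption), not the
    list handled by the actual client.
    """
    for delay in retry_delays():
        try:
            return func()
        except retryable:
            sleep(delay)
    # Final attempt: any exception now propagates to the caller.
    return func()
```

With the defaults this makes at most five attempts in total, which matches four waits between requests.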

Seem reasonable @dotsdl? Closes #741

Changelog description

Requests are now automatically retried in case of connection or networking issues.

Status

  • Code base linted
  • Ready to go

@codecov

codecov bot commented Sep 21, 2023

Codecov Report

Merging #758 (3eef76e) into main (e8d9cba) will decrease coverage by 0.02%.
Report is 6 commits behind head on main.
The diff coverage is 96.87%.


@dotsdl
Collaborator

dotsdl commented Sep 26, 2023

Reviewing now! Thanks for this @bennybp!

@dotsdl
Collaborator

dotsdl commented Sep 27, 2023

retry looks kinda dead; looks like backoff may be a better bet for avoiding bitrot?

Collaborator

@dotsdl dotsdl left a comment

A few comments, and I suggest making the parameters for retries configurable at runtime. I also think that retry as a library looks a bit dead, so it may be a better bet to use something else, like backoff. You can probably still use backoff's decorator directly with configurable parameters, e.g.:

backoff.on_predicate(backoff.expo, ...)(func_to_retry)(*args, **kwargs)

If adding another dependency seems overkill for this, it's not too hard to implement your own solution with the behavior you want. As an example, we do this in a simple way in alchemiscale here.

qcportal/qcportal/manager_client.py
Comment on lines 318 to 321
tries=5,
delay=0.5,
max_delay=5,
backoff=2,
Collaborator

Instead of hardcoding these, perhaps use attributes on PortalClientBase configurable on init? Probably really only need to expose tries, max_delay, and backoff, but could expose all of these if you want to allow for customizability at runtime.

Also, probably need to set jitter as well to avoid many workers retrying all in-sync during a network hiccup.
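To make the jitter point concrete, here is a minimal sketch, assuming a simple additive scheme. The function name `jittered_delay` and the `jitter_fraction` parameter are hypothetical and do not come from retry, backoff, or qcportal.

```python
import random


def jittered_delay(base_delay, jitter_fraction=0.25, rng=random):
    """Return base_delay plus a random extra of up to jitter_fraction * base_delay.

    Workers that fail at the same moment then wake up at slightly
    different times instead of all retrying in sync.
    """
    return base_delay + rng.uniform(0.0, jitter_fraction * base_delay)
```

Passing a seeded `random.Random` instance as `rng` keeps the behavior reproducible in tests.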

Contributor Author

I'm fairly opposed to adding these to the constructor, since a vast, vast majority of people would use the defaults, and this would almost double the number of arguments available. I can do something similar to the encoding though and just allow those attributes to be set for a client object.
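A sketch of what that could look like, assuming class-level defaults that can be overridden on an individual client object. `ExampleClient` and the attribute names `retry_max`, `retry_delay`, and `retry_backoff` are hypothetical and chosen only for illustration; they are not taken from PortalClientBase.

```python
class ExampleClient:
    # Class-level defaults shared by every instance (hypothetical names).
    retry_max = 5
    retry_delay = 0.5
    retry_backoff = 2.0

    def __init__(self, address):
        # The constructor signature stays small: no retry arguments here.
        self.address = address


client = ExampleClient("https://example.invalid")
client.retry_max = 10  # overrides the default for this object only
```

Other instances keep the class defaults, so the common case needs no extra configuration.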

@bennybp
Contributor Author

bennybp commented Sep 27, 2023

> If adding another dependency seems overkill for this, it's not too hard to implement your own solution

Let me think about just hardcoding a solution. We are only using it in one place, so it shouldn't be too bad to implement it myself (using the aforementioned class attributes).

@bennybp
Contributor Author

bennybp commented Sep 29, 2023

Ok implemented by hand now. It's pretty simple :)

There is only one other place I would consider adding this kind of logic (when dealing with certain database operations). But it will end up looking a lot different, so I'm not worried about duplicating code.

@dotsdl
Collaborator

dotsdl commented Oct 3, 2023

Giving this another review today!

Collaborator

@dotsdl dotsdl left a comment

Looks great @bennybp! This should help users quite a bit!

@bennybp bennybp merged commit 5818c49 into main Oct 4, 2023
16 checks passed
@bennybp bennybp deleted the qcportal_retry branch October 4, 2023 01:11

Successfully merging this pull request may close these issues.

Add use of retries with exponential backoff to PortalClient requests
2 participants