Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Edge API retries configurable #44536

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jscheffl
Copy link
Contributor

@jscheffl jscheffl commented Dec 1, 2024

In #44311 (comment) @kaxil, @potiuk and me had a bit of discussion. As promised to come back with this, this PR implements (as promised) a way to make the retries in Edge worker configurable.

But it is also opening the box of debates because:

  • Do we want to add a new config? (Some people start screaming?)
    • (My position: Sensible default and make it configurable)
  • Should we really retry 10 times?
    • (My position: 10 attempts was the former default in internal API, in a small prod outage I can say at least this is good such that tasks do not fail in small webserver outages or connection interrupts. We see daily flakiness on our WAN. As Zombie threshold is at 300s per default retrying more than 5min might not be needed. But also we should faster on small glitches... so the exponential back-off is good. I tested with the parameters below and I think for waiting 5min max it is reasonable to test 10 times in between before fail.
  • Oh, why specific in Edge? (I saw multiple occasions in retries in different places in the repo. But also moving to TaskSDK I think we also should consider making this more common - at least Edge API retries from far away should be matching to the calls that are made to TaskSDK!)
    • (My position: I'd favor to make such setting common in Edge API and at least TaskSDK calls)

@ashb would also call for your opinion.

And if needed - but I assume it can be made within this PR - we could also call for an open [DISCUSS] or loop-in other stakeholders. Le me know.

@jscheffl jscheffl added AIP-69 Edge Executor area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK provider:edge Edge Executor / Worker (AIP-69) labels Dec 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-69 Edge Executor area:providers area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK provider:edge Edge Executor / Worker (AIP-69)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant