Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Be smarter about retries #409

Closed
embano1 opened this issue Sep 8, 2021 · 13 comments · Fixed by #486
Closed

Be smarter about retries #409

embano1 opened this issue Sep 8, 2021 · 13 comments · Fixed by #486
Milestone

Comments

@embano1
Copy link
Contributor

embano1 commented Sep 8, 2021

The dispatcher currently considers all !2xx || !3xx status codes as retryable, which is somewhat against the HTTP protocol spec:

https://github.com/knative-sandbox/eventing-rabbitmq/blob/928976461374266e0c029bc61719c68b3dffad8b/pkg/dispatcher/dispatcher.go#L160

I'm happy to file a PR if we get consensus on which codes we want to retry. Related work:

@xtreme-sameer-vohra
Copy link
Contributor

Thanks for filing this @embano1
Seems pretty reasonable to be more specific. At minimum we should cover the explicit http status checks.

@embano1
Copy link
Contributor Author

embano1 commented Sep 23, 2021

I wonder why this is overwritten at all and not the default option used? https://github.com/cloudevents/sdk-go/blob/c6de9589bb3578364689adc64414000dabe93c01/v2/protocol/http/protocol.go#L40

This looks like it does what we need (although retrying on 500 would be nice too, I may want to file a PR for the SDK)? cloudevents/sdk-go#718

@benmoss
Copy link
Contributor

benmoss commented Sep 23, 2021

Yeah, it definitely seems like something the sdk should be handling

@lionelvillard
Copy link
Contributor

@embano1
Copy link
Contributor Author

embano1 commented Sep 23, 2021

@lionelvillard are you suggesting we should be using kncloudevents as client?

@lionelvillard
Copy link
Contributor

I'm just raising awareness, not suggesting anything :-) kncloudevents is meant to be Knative eventing compliant so maybe it makes sense for you to use it.

@embano1
Copy link
Contributor Author

embano1 commented Sep 23, 2021

@benmoss thoughts?

@vaikas
Copy link
Contributor

vaikas commented Sep 24, 2021

@embano1 for the context on why the overrides:
#113

And:
cloudevents/sdk-go#595

Not saying it's correct the way it is, just answering the why :)

@embano1
Copy link
Contributor Author

embano1 commented Sep 24, 2021

Not saying it's correct the way it is, just answering the why :)

Gotcha! So IMHO we have couple of options here (most, if not all of them breaking):

  1. adopt kncloudevents client
  2. drop custom retryer in favor of default in ce-client
  3. enhance custom retryer as proposed above

@benmoss
Copy link
Contributor

benmoss commented Oct 12, 2021

I'm confused what the desired behavior even is. We have a bunch of different definitions of when to retry and not, but not much clarity on which one is correct. I think I'd be most comfortable for us to follow the the eventing spec, but happy to hear arguments for alternative views.

@embano1
Copy link
Contributor Author

embano1 commented Oct 13, 2021

I agree Ben, we should align with the Knative Eventing spec then - even though I'm not 100% on the same page with some decisions here.

But this also means we have to implement our own retry code mapping [1] since the CloudEvents SDK has a diverging (from the above spec) set of defaults:

var defaultRetriableErrors = map[int]string{
	404: "Not Found",
	413: "Payload Too Large",
	425: "Too Early",
	429: "Too Many Requests",
	502: "Bad Gateway",
	503: "Service Unavailable",
	504: "Gateway Timeout",
}

[1]
Not sure if there's already a pkg in Knative land which provides these mappings in compliance with the SPEC?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants