[Experimental] DeliverySpec RetryAfter #5811
Comments
/assign
Do we need the …? Other questions: …

(I'm 👍 on this feature, by the way)
Thanks for the 👀's on this, Evan - much appreciated! I'll do my best to respond; forgive me if I miss the intent - I don't have as clear an understanding of the larger eventing ecosystem as you do (too narrowly focused on the KafkaChannel ; ) Also, I was holding off on creating the PR (I probably got ahead of myself by doing the work before the issue was reviewed) until the e2e tests were done... but I've gone ahead and created it so that folks can see the intended changes. I am still happy to refactor as the community deems necessary ; )
I would vote to keep the separation, but am certainly open to that approach if the community prefers it. I only added …
I'm not sure what you mean by "handling in the storage layer"? My intent (and what I've implemented thus far) is to handle this exclusively "in-process" on a per-event basis. Meaning, when a dispatcher sends an event and receives back a 429/Retry-After, it would apply that backoff before resending that event only.
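For illustration, a minimal Go sketch of that per-event handling, assuming the dispatcher already has the 429/503 response in hand; `parseRetryAfter` here is a hypothetical helper, not existing Knative code, and per the RFC the header may carry either delay-seconds or an HTTP-date:

```go
package retryexample

import (
	"net/http"
	"strconv"
	"time"
)

// parseRetryAfter returns the backoff implied by a Retry-After header,
// which may be either delay-seconds or an HTTP-date. A zero duration
// means the header was absent or unusable.
func parseRetryAfter(resp *http.Response) time.Duration {
	h := resp.Header.Get("Retry-After")
	if h == "" {
		return 0
	}
	// Numeric form: number of seconds to wait before retrying.
	if seconds, err := strconv.Atoi(h); err == nil && seconds > 0 {
		return time.Duration(seconds) * time.Second
	}
	// Date form: wait until the specified timestamp.
	if at, err := http.ParseTime(h); err == nil {
		if wait := time.Until(at); wait > 0 {
			return wait
		}
	}
	return 0
}
```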
As I mentioned in …

I haven't seen any precedent for similar DeliverySpec changes (e.g. the Timeout field) and wouldn't know where such a thing would live? I would rather wait and see if there is a request for global config and add it in later if necessary.
If I set `retry: 3` (or anything larger than 0) I do indicate that I am interested in "retrying". So, for me I'd think that I am also interested in doing a retry-after recognition, for a good reason. Therefore I'm not really sure if we need to tweak it and add extra knobs, like enabling/disabling it. However, I can see that if the server indicates 600 seconds, that this might be too long for me...
(I'm 👍 on this feature too, by the way)
Good point @matzew - I think the question you're really asking is... "If/when this feature graduates to Stable/GA, would we want to allow users of the retry capability an option to respect/ignore Retry-After headers?" If we are comfortable answering "No, if you use retry then Retry-After headers WILL be respected", then the DeliverySpec configuration can be reduced to a single optional field as follows...

```yaml
delivery:
  backoffDelay: PT0.5S
  backoffPolicy: exponential
  retry: 3
  retryAfterMax: PT120S
```

...and during the experimental-features Alpha/Beta stages the feature flag would simply control whether or not to respect the Retry-After headers, instead of gating the DeliverySpec.RetryAfter configuration during validation.

Personally, I'm fine with this approach as it seems to better adhere to the intentions of the HTTP Spec, but I thought this was undesirable from a "don't change the existing behavior" perspective. It's unclear how many users would actually try out the experimental-feature between Alpha/Stable, so it's possible the switchover might catch them out?
On Thu 14. Oct 2021 at 21:22, Travis Minke wrote:

> ...then the DeliverySpec configuration can be reduced to a single optional field as follows...
>
> ```yaml
> delivery:
>   backoffDelay: PT0.5S
>   backoffPolicy: exponential
>   retry: 3
>   retryAfterMax: PT120S
> ```

more compact and reasonable to express it like this

> ...and during the experimental-features Alpha/Beta stages the feature flag would simply control whether or not to respect the *Retry-After* headers, instead of gating the DeliverySpec.RetryAfter configuration during validation. Personally, I'm fine with this approach as it seems to better adhere to the intentions of the HTTP Spec,

ACK, that's right. Especially if I set retry to "n" (larger than 0).

> but I thought this was undesirable from a *"don't change the existing behavior"* perspective. It's unclear how many users would actually try out the experimental-feature between Alpha/Stable, so it's possible the switchover might catch them out?

I'd use it: if I can honor the retry-after header I'd like to do so. IMO this was an oversight when designing retry.
I like it! I can re-tool to that implementation if folks agree (or if no one disagrees after a week and a summary at next week's WG)?
+1 for Setting …

+1.

You mean when this goes GA, right?

When this feature is enabled.
Hey everyone - thanks for the feedback - great discussion and suggestions! I'm going to summarize what I heard as the preferred approach, along with a related complication and a proposal for how to resolve it. Sorry it got a bit long-winded - trying to be clear / precise...

New approach...
Complication... The nice thing about having the … If we eliminate the …

Proposed Solution... We could REQUIRE the use of the … If / when the experimental-feature graduates to "Stable / GA", we will remove the requirement for the … The following table attempts to summarize the operation for each stage/use-case...
Thanks again for the review - let me know your thoughts; otherwise I'll proceed with this approach (will give it a few days to become lazily accepted ; )
I'm ok with the proposed solution. Or, if we're worried about making changes in non-webhook code when this goes GA, we can make this field always required but defaulted by the webhook when the feature is enabled (or GA).

```go
type DeliverySpec struct {
	// ...
	RetryAfterMax *string // JSON stuff
}
```

Defaulting webhook in alpha and beta phase:

```go
if featureEnabled && ds.RetryAfterMax == nil {
	// Take globalDefault from a CM
	ds.RetryAfterMax = globalDefault
}
```

Validation webhook in alpha and beta phase:

```go
if featureDisabled && ds.RetryAfterMax != nil {
	// fail webhook
}
```

Defaulting webhook in GA phase:

```go
if ds.RetryAfterMax == nil {
	// Take globalDefault from a CM
	ds.RetryAfterMax = globalDefault
}
```

Validation webhook in GA:

```go
// Left only the validation related to the `ds.RetryAfterMax` format
```

In the dispatcher code, by checking if … The problem is that we need to choose a good global default that is good for most use cases (…).
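As a rough illustration of that dispatcher-side check, here is a sketch that assumes `RetryAfterMax` (or the global default) has already been parsed into a `time.Duration`; the function name, parameters, and the choice to treat Retry-After as a floor on the computed backoff are assumptions, not the actual implementation:

```go
package retryexample

import "time"

// selectBackoff picks the wait before retrying a single event.
// standardBackoff is what backoffPolicy/backoffDelay would produce,
// retryAfter is the duration taken from the Retry-After header (0 if absent),
// and retryAfterMax is the configured or defaulted cap (0 meaning no cap).
func selectBackoff(standardBackoff, retryAfter, retryAfterMax time.Duration) time.Duration {
	if retryAfter <= 0 {
		return standardBackoff
	}
	if retryAfterMax > 0 && retryAfter > retryAfterMax {
		// Cap excessive server-requested delays (e.g. the 600s case above).
		retryAfter = retryAfterMax
	}
	if retryAfter > standardBackoff {
		return retryAfter
	}
	return standardBackoff
}
```

In this sketch the global default mentioned above would simply feed into retryAfterMax whenever the field is left unset.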
There's been a good discussion here and in a Slack thread, followed by a quiet period, so I thought I'd summarize the current approach for another review... (thanks to everyone who participated!) Essentially the proposal above is the plan. The TL;DR is...

Discussion Highlights
Description
The Retry-After header is a standard part of the HTTP spec which can be returned with 429 and 503 responses in order to specify a duration (or timestamp) that the sender should wait before attempting to resend. It provides downstream event recipients with a mechanism to apply back-pressure when requests are arriving too frequently. The event retry mechanism in Knative Eventing currently does not respect the Retry-After header. The intent of this new experimental-feature is to expose the ability for users to opt in to respecting the Retry-After header.
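For example, a rate-limited recipient applying back-pressure might respond with something like the following, asking the sender to wait 120 seconds before retrying (the header may alternatively carry an HTTP-date):

```
HTTP/1.1 429 Too Many Requests
Retry-After: 120
```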
Design Overview
Following the pattern established in the experimental-features process, and closely mirroring the implementation in the similar Delivery Timeout experimental-feature, the plan is to enhance the `DeliverySpec` to include a new optional `retryAfter` component. Use of this new component will be gated by the `delivery-retryafter` experimental feature flag in the `config-features` ConfigMap and enforced via webhook validation.

Example `DeliverySpec` with the new `retryAfter` component...
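For illustration only, such a `DeliverySpec` might look roughly like the following; the nesting and the `enabled` knob are a sketch based on the fields discussed in this thread, not the final API:

```yaml
delivery:
  backoffDelay: PT0.5S
  backoffPolicy: exponential
  retry: 3
  retryAfter:
    enabled: true        # sketch: explicit opt-in knob debated above
    maxDuration: PT120S  # sketch: optional cap on Retry-After backoff
```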
The new `retryAfter` component will only take effect if the `retry` value is specified and is at least 1. The optional `maxDuration` field provides an override to prevent excessive backoff durations, as might be desirable in certain use cases.

Exit Criteria
`DeliverySpec` allows optional configuration of retry behavior for 429 and 503 Retry-After headers.

Experimental Feature Flag Name
`delivery-retryafter`
Experimental Feature Stages
The following is the proposed plan for moving through the stages of the experimental feature process...

- … `delivery-retryafter` feature flag.
- … `DeliverySpec.RetryAfter` to knative/specs repo

Affected WG
Additional Context
Resources
- Experimental Features Process
- Retry-After HTTP 1.1 RFC
- Retry-After Mozilla Docs
- CloudEvent Webhook Spec
History
An initial attempt at supporting Retry-After headers was made in March 2021 and is mostly documented in Knative Eventing Discussion #5011. This second attempt has been briefly discussed at the Eventing WG in the past few weeks.