-
Notifications
You must be signed in to change notification settings - Fork 600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Experimental-Feature] DeliverySpec.RetryAfter #5813
[Experimental-Feature] DeliverySpec.RetryAfter #5813
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5813 +/- ##
==========================================
+ Coverage 82.02% 82.23% +0.21%
==========================================
Files 220 220
Lines 7527 7572 +45
==========================================
+ Hits 6174 6227 +53
+ Misses 918 911 -7
+ Partials 435 434 -1
Continue to review full report at Codecov.
|
pkg/apis/duck/v1/delivery_types.go
Outdated
// header value will be used when calculating the next backoff duration. This will | ||
// only be considered when a 429 (Too Many Requests) or 503 (Service Unavailable) | ||
// response code is received and Retry is greater than 0. | ||
Enabled bool `json:"enabled"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we always respect the Retry-After
?
I am not really sure I like the toggling here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is stale - based on the original design which has evolved (is evolving in the associated Issue). Yes, everyone was against the "enabled" flag and in the latest design it is gone. I'm waiting to update this PR for the new design to settle - i need to summarize and push for agreement again...
pkg/apis/duck/v1/delivery_types.go
Outdated
// - https://www.iso.org/iso-8601-date-and-time-format.html | ||
// - https://en.wikipedia.org/wiki/ISO_8601 | ||
// +optional | ||
MaxDuration *string `json:"maxDuration,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we do not toggle (just thinking out loud) and have respecting Retry-after
as default.
Setting MaxDuration: -1
could disable it.
If nothing (or a value greater than 0) is provided we do as state on the comments
🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the idea is that at least in GA it would be always respected and you could opt out by specifying "PT0S" or something similar.
/assign @matzew |
PR has been refactored to align with current plan and is ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome piece of code.
added minor comments and learned some stuff about testing at eventing, thanks for that.
Could we add to the docs PR (not sure if also to the API) that in case the backoff + retry-after competing for setting the duration, the largest is honored?
eventing/pkg/kncloudevents/message_sender.go
Lines 143 to 148 in 6bb059a
// Return The Larger Of The Two Backoff Durations | |
if retryAfterDuration > backoffDuration { | |
return retryAfterDuration | |
} else { | |
return backoffDuration | |
} |
@@ -112,6 +119,15 @@ func RetryConfigFromDeliverySpec(spec v1.DeliverySpec) (RetryConfig, error) { | |||
retryConfig.RequestTimeout, _ = timeout.Duration() | |||
} | |||
|
|||
if spec.RetryAfterMax != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is too picky but I think it would be a nice policy to re-check the feature enablement here. If an admin enables a feature, then disable it and restart the controller, I think it would be expected that this value is not used.
This would be important for security related features, probably not for this one. Leaving the comment here for your consideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this might be nice to do it is a bit problematic. The goal was to do all validation of the experimental-feature in the Webhook in order to avoid requiring downstream implementations from having to load/watch the config-features
ConfigMap into their context and then plumb that context through. I suppose we could build a separate ConfigMap lookup inline here, but that feels heavyweight and non-standard with other ConfigMap tracking logic. I'm inclined to not recheck the feature flags here but we can let others weigh in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right, and I didn't thought of other implementations.
On my side, this can be resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this might be nice to do it is a bit problematic. The goal was to do all validation of the experimental-feature in the Webhook in order to avoid requiring downstream implementations from having to load/watch the
config-features
ConfigMap into their context and then plumb that context through. I suppose we could build a separate ConfigMap lookup inline here, but that feels heavyweight and non-standard with other ConfigMap tracking logic. I'm inclined to not recheck the feature flags here but we can let others weigh in?
I think we actually must check if the feature is enabled. New experimental API fields will be stored as unknown fields if the feature is disabled and the webhook will ignore them. This means in a "enable-disable" scenario, this code will find a valid value for the RetryAfterMax
and will honor it, even thought the feature is disabled.
I believe experimental features are designed to be part of eventing core as:
- API changes that're optionally implemented in alternative implementations.
- A change in the reference implementation. This is to validate the feature, as well as provide a reference implementation for other alternative implementations.
That means, that by design, if an alternative implementation will adopt an experimental feature, it needs to watch the features CM and provide its own implementation. Otherwise, it can wait until the graduation of the feature and becomes "on by default".
Well, that's my opinion, if folks feel confident in reviewing big PRs,
go for it.
Lines of code are not a problem, it's the scope.
…On Thu, Nov 18, 2021 at 7:13 PM Ben Moss ***@***.***> wrote:
I don't think we need to split this up
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub, or unsubscribe.
--
Pierangelo Di Pilato
Software Engineer
Red Hat, Inc
https://www.redhat.com/
|
@lionelvillard - are you still planning on reviewing? I realize it's a holiday week so it can wait till next week if you're out - just checking ; ) |
Can you replace /unhold |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lionelvillard, odacremolbap, travis-minke-sap The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
yep - good call - thanks! |
The following is the coverage report on the affected files.
|
/retest |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work for a Channel-based Broker since the filter handler doesn't propagate response headers back to the channel.
eventing/pkg/broker/filter/filter_handler.go
Lines 201 to 307 in 820db20
func (h *Handler) send(ctx context.Context, writer http.ResponseWriter, headers http.Header, target string, reportArgs *ReportArgs, event *cloudevents.Event, ttl int32) { | |
// send the event to trigger's subscriber | |
response, err := h.sendEvent(ctx, headers, target, event, reportArgs) | |
if err != nil { | |
h.logger.Error("failed to send event", zap.Error(err)) | |
writer.WriteHeader(http.StatusInternalServerError) | |
_ = h.reporter.ReportEventCount(reportArgs, http.StatusInternalServerError) | |
return | |
} | |
h.logger.Debug("Successfully dispatched message", zap.Any("target", target)) | |
// If there is an event in the response write it to the response | |
statusCode, err := h.writeResponse(ctx, writer, response, ttl, target) | |
if err != nil { | |
h.logger.Error("failed to write response", zap.Error(err)) | |
} | |
_ = h.reporter.ReportEventCount(reportArgs, statusCode) | |
} | |
func (h *Handler) sendEvent(ctx context.Context, headers http.Header, target string, event *cloudevents.Event, reporterArgs *ReportArgs) (*http.Response, error) { | |
// Send the event to the subscriber | |
req, err := h.sender.NewCloudEventRequestWithTarget(ctx, target) | |
if err != nil { | |
return nil, fmt.Errorf("failed to create the request: %w", err) | |
} | |
message := binding.ToMessage(event) | |
defer message.Finish(nil) | |
additionalHeaders := utils.PassThroughHeaders(headers) | |
// Following the spec https://github.com/knative/specs/blob/main/specs/eventing/data-plane.md#derived-reply-events | |
additionalHeaders.Set("prefer", "reply") | |
err = kncloudevents.WriteHTTPRequestWithAdditionalHeaders(ctx, message, req, additionalHeaders) | |
if err != nil { | |
return nil, fmt.Errorf("failed to write request: %w", err) | |
} | |
start := time.Now() | |
resp, err := h.sender.Send(req) | |
dispatchTime := time.Since(start) | |
if err != nil { | |
err = fmt.Errorf("failed to dispatch message: %w", err) | |
} | |
sc := 0 | |
if resp != nil { | |
sc = resp.StatusCode | |
} | |
_ = h.reporter.ReportEventDispatchTime(reporterArgs, sc, dispatchTime) | |
return resp, err | |
} | |
// The return values are the status | |
func (h *Handler) writeResponse(ctx context.Context, writer http.ResponseWriter, resp *http.Response, ttl int32, target string) (int, error) { | |
response := cehttp.NewMessageFromHttpResponse(resp) | |
defer response.Finish(nil) | |
if response.ReadEncoding() == binding.EncodingUnknown { | |
// Response doesn't have a ce-specversion header nor a content-type matching a cloudevent event format | |
// Just read a byte out of the reader to see if it's non-empty, we don't care what it is, | |
// just that it is not empty. This means there was a response and it's not valid, so treat | |
// as delivery failure. | |
body := make([]byte, 1) | |
n, _ := response.BodyReader.Read(body) | |
response.BodyReader.Close() | |
if n != 0 { | |
// Note that we could just use StatusInternalServerError, but to distinguish | |
// between the failure cases, we use a different code here. | |
writer.WriteHeader(http.StatusBadGateway) | |
return http.StatusBadGateway, errors.New("received a non-empty response not recognized as CloudEvent. The response MUST be either empty or a valid CloudEvent") | |
} | |
h.logger.Debug("Response doesn't contain a CloudEvent, replying with an empty response", zap.Any("target", target)) | |
writer.WriteHeader(resp.StatusCode) | |
return resp.StatusCode, nil | |
} | |
event, err := binding.ToEvent(ctx, response) | |
if err != nil { | |
// Like in the above case, we could just use StatusInternalServerError, but to distinguish | |
// between the failure cases, we use a different code here. | |
writer.WriteHeader(http.StatusBadGateway) | |
// Malformed event, reply with err | |
return http.StatusBadGateway, err | |
} | |
// Reattach the TTL (with the same value) to the response event before sending it to the Broker. | |
if err := broker.SetTTL(event.Context, ttl); err != nil { | |
writer.WriteHeader(http.StatusInternalServerError) | |
return http.StatusInternalServerError, fmt.Errorf("failed to reset TTL: %w", err) | |
} | |
eventResponse := binding.ToMessage(event) | |
defer eventResponse.Finish(nil) | |
if err := cehttp.WriteResponseWriter(ctx, eventResponse, resp.StatusCode, writer); err != nil { | |
return http.StatusInternalServerError, fmt.Errorf("failed to write response event: %w", err) | |
} | |
h.logger.Debug("Replied with a CloudEvent response", zap.Any("target", target)) | |
return resp.StatusCode, nil | |
} |
@pierDipi can you open an issue? |
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we need a followup to check if the feature is enabled. This current behavior will be problematic in case of enabling then disabling the feature.
@@ -112,6 +119,15 @@ func RetryConfigFromDeliverySpec(spec v1.DeliverySpec) (RetryConfig, error) { | |||
retryConfig.RequestTimeout, _ = timeout.Duration() | |||
} | |||
|
|||
if spec.RetryAfterMax != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this might be nice to do it is a bit problematic. The goal was to do all validation of the experimental-feature in the Webhook in order to avoid requiring downstream implementations from having to load/watch the
config-features
ConfigMap into their context and then plumb that context through. I suppose we could build a separate ConfigMap lookup inline here, but that feels heavyweight and non-standard with other ConfigMap tracking logic. I'm inclined to not recheck the feature flags here but we can let others weigh in?
I think we actually must check if the feature is enabled. New experimental API fields will be stored as unknown fields if the feature is disabled and the webhook will ignore them. This means in a "enable-disable" scenario, this code will find a valid value for the RetryAfterMax
and will honor it, even thought the feature is disabled.
I believe experimental features are designed to be part of eventing core as:
- API changes that're optionally implemented in alternative implementations.
- A change in the reference implementation. This is to validate the feature, as well as provide a reference implementation for other alternative implementations.
That means, that by design, if an alternative implementation will adopt an experimental feature, it needs to watch the features CM and provide its own implementation. Otherwise, it can wait until the graduation of the feature and becomes "on by default".
This PR addresses parts of #5811 by creating the experimental-feature in the Alpha stage.
TLDR = New experimental feature allowing users to opt-in to respecting the Retry-After header when calculating the appropriate backoff duration for 429 and 503 responses.
Proposed Changes
delivery-retryafter
feature flag.DeliverySpec
with new optionalRetryAfterMax
componentDeliverySpec.RetryAfterMax
usage against feature flag in WebHook validationDeliverySpec.RetryAfterMax
to Subscription Controller/Reconciler updating ofChannel.Spec.Subscribers
RetryConfig
struct to include RetryAfterMax configuration.RetryConfigFromDeliverySpec()
to parseDeliverySpec.RetryAfterMax
intoRetryConfig
.SendWithRetries()
to consider Retry-After headers andRetryConfig
settings when calculating backoff durations.Open Questions
pkg/reconciler/subscription/subscription_test.go
to cover the changes in the Subscription controller. This introduced the need to have the test be aware of experimental-features. I think this a win but wanted to confirm it was acceptable (ie - no need to keep the controller test "pure").v1beta1
version of the DeliverySpec without any experimental-feature gates. I wasn't sure of the rationale for doing so and have NOT done so in this PR. If someone could explain why we'd wan to do this I'll be happy to include it.Pre-review Checklist
Release Note
Docs
knative/docs#4361
Holding for
Experimental-Features review process & refactoring approachreview/hold