Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

admission.Webhook always returns HTTP 200, making failurePolicy=Ignore moot #2662

Closed
cbroglie opened this issue Jan 22, 2024 · 15 comments
Closed
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@cbroglie
Copy link

If a handler encounters an unrecoverable error and returns:

return admission.Errored(http.StatusInternalServerError, err)

The ultimate response from the webhook server is a HTTP 200 with the 500 status code to return from the apiserver packaged in the response body.

The problem with this approach is that 500 error the handler attempts to return is not ignored by failurePolicy=Ignore, since the apiserver sees it as a valid rejection response.

When initially rolling out new webhooks we want them to be functional no-ops, in a shadow deny/log-only mode at first. And we start with failurePolicy=Ignore with the expectation that any server 500s will be ignored, which should ensure the initial rollout has no user-facing impact. But since controller-runtime's admission.Webhook implementation always returns a HTTP 200, the apiserver's ignore case is never triggered. Is there a recommended way for handlers to signal they want to return a non-successful HTTP code?

@troy0820
Copy link
Member

/kind support

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Jan 22, 2024
@troy0820
Copy link
Member

This doesn't seem to be explicit to controller-runtime but webhooks in general. Webhooks always return 200.
https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#response

Are you expecting a different response or the failure policy is not adhering to the webhook config?

@troy0820
Copy link
Member

This issue highlights the gardener/gardener#7013 issue you may be facing around a discussion where this took place.

@cbroglie
Copy link
Author

cbroglie commented Jan 23, 2024

Yes, that's highlighting the same issue:

However in https://kubernetes.slack.com/archives/C0EG7JC6T/p1650375336872579 it is explained that failurePolicy: Ignore does not have effect if the webhook is working and denying the request with a 2XX response code:

failure policy controls what happens when the webhook fails (times out, or the webhook returns a non-2xx response, or returns invalid content that can't be interpreted as an AdmissionReview, or returns an invalid AdmissionReview response) (edited)
not when the webhook successfully returns an AdmissionReview rejecting the API request

What I'd like is a way to signal that the webhook is not working, e.g. by returning a 500. Then it's left up to the failurePolicy of the webhook to determine if the request is allowed or not.

Here's a concrete, albeit contrived example: let's say I'm introducing a webhook that reaches out to another service as part of its logic. The webhook pods require a network policy to allow ingress from the apiserver, and another one to allow egress to the other service. And let's say I get the port wrong on the ingress policy so the pods are unreachable. If the webhook is configured with failurePolicy=Ignore, the request is allowed. But if I messed up the port on the egress policy instead, the request to the external service would fail, and the webhook would attempt to return a 500. To me these are the same class of misconfiguration that failurePolicy=Ignore should handle, but in the latter case the request is ultimately rejected by the apiserver.

@troy0820
Copy link
Member

What I'd like is a way to signal that the webhook is not working, e.g. by returning a 500. Then it's left up to the failurePolicy of the webhook to determine if the request is allowed or not

That’s not what FailurePolicy does. This request is an upstream issue and not indicative of how controller runtime webhooks work.

In your example, your accept/reject criteria is defined in the logic that you use to implement the interface. You can write a custom validator and utilize logic from the external service to use errors you defined from the response type and reject/accept from there. If you need to ignore those errors you will not get that from the failure policy.

If you wish to submit an issue around this upstream, that API would need to be changed to accommodate your request.

The 200 response is showing that webhook indeed succeeded in handling the request as described in the issue linked, discussion in slack and api design of admission review requests.

@cbroglie
Copy link
Author

That’s not what FailurePolicy does. This request is an upstream issue and not indicative of how controller runtime webhooks work.

Why is there a distinction between a failure which occurs before the request gets to the handler, and a failure in the handler itself? That's what I was trying to illustrate in my example. In either situation it is a logical failure of the webhook to generate an admission decision.

Why do you feel it is an upstream issue? If controller runtime provided a hook to return a HTTP 500 when the webhook is failing and unable to make a decision, then it would follow the same apiserver code path as a timeout/failure to reach the webhook, and the ultimate decision would be based on the webhook's failure policy.

@troy0820
Copy link
Member

Why is there a distinction between a failure which occurs before the request gets to the handler, and a failure in the handler itself? That's what I was trying to illustrate in my example.

Returning an error rejects the request. If you wanted the request to not get rejected, you would return nil. If you want to do some error logic around those parameters within your webhook, you can do errors.Is(....) to decide whether to accept/reject the request.

  1. 200 OK means that the request to the webhook succeeded and the code it carries would be the logic implemented in the interface
  2. Returning !200 would mean that the response from the webhook itself was bad and not the rejection decision from what logic you provide from that implementation.

That's where failurePolicy comes in.

In either situation it is a logical failure of the webhook to generate an admission decision.

Handling the request and what failurePolicy does is what to do with the request if it can't reach the webhook (which is a mechanism of decision beyond the validation that occurs on the non-reachable webhook). You are asking to have that behavior changed by using the field that does this to do what can be done within the logic of the validator.

Why do you feel it is an upstream issue? If controller runtime provided a hook to return a HTTP 500 when the webhook is failing and unable to make a decision, then it would follow the same apiserver code path as a timeout/failure to reach the webhook, and the ultimate decision would be based on the webhook's failure policy.

This is an upstream issue because webhooks that function with core are inherited within the scope to behave just as upstream intends to and abide by the same dependency we carry implementing within controller-runtime. Implementing new logic to utilize the field failurePolicy to do what you are asking is in fact changing the behavior of what we understand the field to do with the unreachable webhook.

There is a path forward in your case. Controller-runtime provides the interface you can implement to reject/accept requests and how that happens is up to you and not derived from the http status code from the "error" that is handed back where failurePolicy wouldn't do anything with that.

If your issue is with failurePolicy and not the implementation of the webhook strategy provided by controller-runtime, this is an issue that upstream would have to resolve as that behavior is what controller-runtime provides.

@troy0820
Copy link
Member

You can also take advantage of admission.Warnings to try out webhook rejections when request fail.
https://kubernetes.io/blog/2020/09/03/warnings/

@cbroglie
Copy link
Author

Just to make sure we're on the same page, I'm trying to achieve (2):

  1. Returning !200 would mean that the response from the webhook itself was bad and not the rejection decision from what logic you provide from that implementation.

An example patch might look something like:

diff --git a/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go b/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go
index 57e465ab..38df9c29 100644
--- a/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go
+++ b/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go
@@ -96,6 +96,12 @@ func (wh *Webhook) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 	wh.getLogger(&req).V(5).Info("received request")
 
 	reviewResponse = wh.Handle(ctx, req)
+	if reviewResponse.internalServerError != nil {
+		wh.getLogger(nil).Error(err, "internal server error handling request")
+		http.Error(w, reviewResponse.internalServerError.Error(), http.StatusInternalServerError)
+		return
+	}
+
 	wh.writeResponseTyped(w, reviewResponse, actualAdmRevGVK)
 }
 
diff --git a/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/response.go b/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/response.go
index ec1c88c9..d3758550 100644
--- a/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/response.go
+++ b/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/response.go
@@ -59,6 +59,16 @@ func Errored(code int32, err error) Response {
 	}
 }
 
+// InternalServerError constructs a response indicating that the an internal
+// given error occurred while handling the request, and an admission decision
+// could not be generated. The API server will treat it as a failure to call
+// the webhook.
+func InternalServerError(err error) Response {
+	return Response{
+		internalServerError: err,
+	}
+}
+
 // ValidationResponse returns a response for admitting a request.
 func ValidationResponse(allowed bool, message string) Response {
 	code := http.StatusForbidden
diff --git a/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go b/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go
index f1767f31..3d2a7d4d 100644
--- a/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go
+++ b/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go
@@ -62,6 +62,11 @@ type Response struct {
 	// AdmissionResponse is the raw admission response.
 	// The Patch field in it will be overwritten by the listed patches.
 	admissionv1.AdmissionResponse
+
+	// internalServerError indicates an internal error occurred while processing
+	// request, and an admission decision could not be generated. If set, a HTTP
+	// 500 will be returned instead of an admission decision.
+	internalServerError error
 }
 
 // Complete populates any fields that are yet to be set in

Then in a situation where my handler is unable to process the request it would return:

allow, err := doWebhookLogic(ctx, req)
if err != nil {
	return admission.InternalServerError(err)
}
if !allow {
	if shadowDenyMode {
		return admission.Allowed("").WithWarnings("some warning about being blocked in the future")
	}
	return admission.Denied("denied")
}
return admission.Allowed("")

The handler itself isn't trying to consider failurePolicy or anything like that, it's just trying to communicate that it's failing (for whatever reason) and cannot make an admission decision.

@troy0820
Copy link
Member

With your example, this now returns a non 200 ok response. There is no other non 200 response in this implementation of the webhook.

That is because any non 200 response is treated as a failure to serve the request and the apiserver will see that request as failed and allow the failurepolicy to take place. That ignore or failed will act on this with the intention of it not being reachable and the rejection will abide by that.

I contend that this strategy obfuscates the intention behind the failure policy and conflates the non 200 ok failure request as a unreachable request when it was served and left to abide by a failure policy to handle the decision of the logic of the unreachable request.

@alvaroaleman @sbueringer @vincepri what do you think?

@alvaroaleman
Copy link
Member

The intention behind the failurePolicy is to deal with webhook unavailability or not working for other reasons. If you want a new setting to ignore the decision of the webhook, that is an upream request and unrelated to controller-runtime as troy pointed out in #2662 (comment).

The http/200 indicates only if the webhook was able to successfully process the request and doesn't tell you anything about its decision. So yes, if you want to both enable the webhook but it not actually block anything, you will have to write code similiar to what you suggested in #2662 (comment)

@cbroglie
Copy link
Author

The intention behind the failurePolicy is to deal with webhook unavailability or not working for other reasons. If you want a new setting to ignore the decision of the webhook

I don't want a new setting to ignore the decision of the webhook, I want a way for the webhook to express it is failing and that it can't make a decision by returning a 500 error. To me that aligns with "webhook unavailability or not working for other reasons."

But I understand you guys feel there should be a distinction between failures to get to the webhook and failures that occur in the webhook itself, and that failurePolicy should only apply to the former. If that's settled, then there's nothing to do and we can close this out. The escape hatch we are using is a forked copy of admission.Webhook.ServeHTTP, since it's not much code.

@alvaroaleman
Copy link
Member

There is admisison.Errored for "my handling code encountered an error":

// Errored creates a new Response for error-handling a request.
func Errored(code int32, err error) Response {
return Response{
AdmissionResponse: admissionv1.AdmissionResponse{
Allowed: false,
Result: &metav1.Status{
Code: code,
Message: err.Error(),
},
},
}
}

@cbroglie
Copy link
Author

cbroglie commented Jan 23, 2024

But that doesn't trigger the apiserver's ignoreClientCallFailures code path b/c the 200 response means it's not an ErrCallingWebhook, which is what ticket is about:

ignoreClientCallFailures := hook.FailurePolicy != nil && *hook.FailurePolicy == v1.Ignore
rejected := false
if err != nil {
	switch err := err.(type) {
	case *webhookutil.ErrCallingWebhook:
		if !ignoreClientCallFailures {
			rejected = true
			admissionmetrics.Metrics.ObserveWebhookRejection(ctx, hook.Name, "validating", string(versionedAttr.Attributes.GetOperation()), admissionmetrics.WebhookRejectionCallingWebhookError, int(err.Status.ErrStatus.Code))
		}
		admissionmetrics.Metrics.ObserveWebhook(ctx, hook.Name, time.Since(t), rejected, versionedAttr.Attributes, "validating", int(err.Status.ErrStatus.Code))
	case *webhookutil.ErrWebhookRejection:
		rejected = true
		admissionmetrics.Metrics.ObserveWebhookRejection(ctx, hook.Name, "validating", string(versionedAttr.Attributes.GetOperation()), admissionmetrics.WebhookRejectionNoError, int(err.Status.ErrStatus.Code))
		admissionmetrics.Metrics.ObserveWebhook(ctx, hook.Name, time.Since(t), rejected, versionedAttr.Attributes, "validating", int(err.Status.ErrStatus.Code))
	default:
		rejected = true
		admissionmetrics.Metrics.ObserveWebhookRejection(ctx, hook.Name, "validating", string(versionedAttr.Attributes.GetOperation()), admissionmetrics.WebhookRejectionAPIServerInternalError, 0)
		admissionmetrics.Metrics.ObserveWebhook(ctx, hook.Name, time.Since(t), rejected, versionedAttr.Attributes, "validating", 0)
	}
} else {
	admissionmetrics.Metrics.ObserveWebhook(ctx, hook.Name, time.Since(t), rejected, versionedAttr.Attributes, "validating", 200)
	return
}

@alvaroaleman
Copy link
Member

But that doesn't trigger the apiserver's ignoreClientCallFailures code path b/c the 200 response means it's not an ErrCallingWebhook, which is what ticket is about:

Yes, because it isn't. The response code from the webhook indicates if it was able to construct a valid admisisonResponse. If the check itself within the webhook failed is indicated by the status code inside the Result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/support Categorizes issue or PR as a support question.
Projects
None yet
Development

No branches or pull requests

4 participants