-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent creation of KSVC if its namespace is not in the Mesh #181
Prevent creation of KSVC if its namespace is not in the Mesh #181
Conversation
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With rejecting creation you kind of follow eventual consistency, so I think that is ok per se, especially considering the fact that those two controllers are independent of each other.
I have one thing for you to validate, and a few nits. Other than that the code looks good. Big kudos for spending time on writing tests!
Important
Keep in mind that at some point we will switch from creating SMM (or directly manipulating SMMR which we shouldn't be doing in the first place) to label selectors and at this point we will need to make sure we can handle this case as well. I do not know if SMMR is internally updated if that's the case, but I will figure it out.
main.go
Outdated
os.Exit(1) | ||
} | ||
} else { | ||
setupLog.Info("Skipping setup of Knative Service validating Webhook.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you really have to log such info perhaps it's good to add "why" it is skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'll add a reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed... That said, I'm no longer sure how much we can skip this.
The conditional is in prevention of RawDeployment setup, where Knative is missing. However, AFAIK, there is no way to introduce equivalent conditionals around the ValidatingWebhookConfiguration
resource (i.e. the operator manifests).
So, if the user has Knative for other uses, the ValidatingWebhookConfiguration
would take effect and would break the cluster, given the controller would have the logic turned off.
I'll tag @VaishnaviHire and @zdtsw for their advice.
log.Info("Rejecting Knative service because its namespace is not a member of the service mesh") | ||
return fmt.Errorf("rejecting creation of Knative service %s on namespace %s because the namespace is not a configured member of the mesh", ksvc.Name, ksvc.Namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure you really need a log here - the error you are returning seems to be sufficient and I assume it will be logged somehow?
In addition the info message is misleading. It can be pending or failing if I'm reading this right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is not written to the controller logs. The error is returned to the client that tried to create the resource. So, the client chooses what to do with the message. Because of this, I chose to add the log message to make it clear what is happening in odh-model-controller side.
About pending
or failing
, I'm not sure if it is worth to go that deep here :-) Stating it is not configured is enough for me to make the user go and diagnose it, in case it stays on one of those states for longer than expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I thought this is handled by operator-sdk, but if that is not the case logging is helpful here.
About pending or failing, I'm not sure if it is worth to go that deep here :-) Stating it is not configured is enough for me to make the user go and diagnose it, in case it stays on one of those states for longer than expected.
I was coming from the angle that "not configured" is a bit ambiguous here. Does it mean that someone forgot to put the proper label on the namespace so it is not really seen by the mesh OR it cannot be configured by the mesh because... other errors? But I agree it might be going to deep.
When a KServe InferenceService is created, odh-model-controller enrolls the ISVC namespace to the mesh, if needed. It was found that sometimes the namespace enrollment process may take a little time to be processed. Since KServe and odh-model-controllers are (to certain extent) independent of each other, sometimes KServe controller and Knative serving controller are faster and the pod from the KSVC may be created faster than the mesh enrollment process. This would lead to the KSVC pod not having an Istio sidecar. Since the Istio authorization rules are evaluated on the sidecar of target service, the missing sidecar on the model/ksvc pod would mean that the traffic would bypass the authorization rules and this impacts ODH KServe authorization (ref: https://istio.io/latest/docs/ops/best-practices/security/#server-first-tcp-protocols-are-not-supported). In an effort to prevent that situation, this is adding a new validating webhook that would block creation of the Knative Service until the namespace is acknowledged to be a member of the service mesh. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
623cc58
to
6a1dd19
Compare
I just squashed the commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bartoszmajsak, israel-hdez, VedantMahabaleshwarkar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This change hanleds "soft opt-in" for authorization capability handled by Authorino. The logic consists of following steps: * during the controller bootstrap it will check if GroupVersion `authorino.kuadrant.io/v1beta2` is present in the cluster and conditionally add it to scheme. * when defining watches (using builder instance for OpenshiftReconciler) it will conditionally set a watch for `AuthConfig` according to the rules defined in opendatahub-io#181 but additionally if Authorino GroupVersion is present in the schema * it guards AuthConfig reconciler from executing based on the "schema rule". This prevents from constant failures due to AuthConfig not being defined for the cluster. NOTE: As it is the case with original logic brought in opendatahub-io#181, it will require restart of the controller pod to set watch on AuthConfig resource when Authorino is installed in the cluster.
This change handles "soft opt-in" for authorization capability handled by Authorino. The logic consists of following steps: * during the controller bootstrap it will check if GroupVersion `authorino.kuadrant.io/v1beta2` is present in the cluster and conditionally add it to scheme. * when defining watches (using builder instance for OpenshiftReconciler) it will conditionally set a watch for `AuthConfig` according to the rules defined in opendatahub-io#181 but additionally if Authorino GroupVersion is present in the schema * it guards AuthConfig reconciler from executing based on the "schema rule". This prevents from constant failures due to AuthConfig not being defined for the cluster. NOTE: As it is the case with original logic brought in opendatahub-io#181, it will require restart of the controller pod to set watch on AuthConfig resource when Authorino is installed in the cluster. Custom image: quay.io/bmajsak/odh-model-controller:latest
This change handles "soft opt-in" for authorization capability handled by Authorino. The logic consists of following steps: * during the controller bootstrap it will check if GroupVersion `authorino.kuadrant.io/v1beta2` is present in the cluster and conditionally add it to scheme. * when defining watches (using builder instance for OpenshiftReconciler) it will conditionally set a watch for `AuthConfig` according to the rules defined in opendatahub-io#181 but additionally if Authorino GroupVersion is present in the schema * it guards AuthConfig reconciler from executing based on the "schema rule". This prevents from constant failures due to AuthConfig not being defined for the cluster. NOTE: As it is the case with original logic brought in opendatahub-io#181, it will require restart of the controller pod to set watch on AuthConfig resource when Authorino is installed in the cluster. Custom image: quay.io/bmajsak/odh-model-controller:latest Signed-off-by: bartoszmajsak <bartosz.majsak@gmail.com>
This change handles "soft opt-in" for authorization capability handled by Authorino. The logic consists of following steps: * during the controller bootstrap it will check if GroupVersion `authorino.kuadrant.io/v1beta2` is present in the cluster and conditionally add it to scheme. * when defining watches (using builder instance for OpenshiftReconciler) it will conditionally set a watch for `AuthConfig` according to the rules defined in opendatahub-io#181 but additionally if Authorino GroupVersion is present in the schema * it guards AuthConfig reconciler from executing based on the "schema rule". This prevents from constant failures due to AuthConfig not being defined for the cluster. NOTE: As it is the case with original logic brought in opendatahub-io#181, it will require restart of the controller pod to set watch on AuthConfig resource when Authorino is installed in the cluster. Custom image: quay.io/bmajsak/odh-model-controller:latest Signed-off-by: bartoszmajsak <bartosz.majsak@gmail.com>
This change handles "soft opt-in" for authorization capability handled by Authorino. The logic consists of following steps: * during the controller bootstrap it will check if GroupVersion `authorino.kuadrant.io/v1beta2` is present in the cluster and conditionally add it to scheme. * when defining watches (using builder instance for OpenshiftReconciler) it will conditionally set a watch for `AuthConfig` according to the rules defined in opendatahub-io#181 but additionally if Authorino GroupVersion is present in the schema * it guards AuthConfig reconciler from executing based on the "schema rule". This prevents from constant failures due to AuthConfig not being defined for the cluster. NOTE: As it is the case with original logic brought in opendatahub-io#181, it will require restart of the controller pod to set watch on AuthConfig resource when Authorino is installed in the cluster. Custom image: quay.io/bmajsak/odh-model-controller:latest Signed-off-by: bartoszmajsak <bartosz.majsak@gmail.com>
This change handles "soft opt-in" for authorization capability handled by Authorino. The logic consists of following steps: * during the controller bootstrap it will check if GroupVersion `authorino.kuadrant.io/v1beta2` is present in the cluster and conditionally add it to scheme. * when defining watches (using builder instance for OpenshiftReconciler) it will conditionally set a watch for `AuthConfig` according to the rules defined in opendatahub-io#181 but additionally if Authorino GroupVersion is present in the schema * it guards AuthConfig reconciler from executing based on the "schema rule". This prevents from constant failures due to AuthConfig not being defined for the cluster. NOTE: As it is the case with original logic brought in opendatahub-io#181, it will require restart of the controller pod to set watch on AuthConfig resource when Authorino is installed in the cluster. Custom image: quay.io/bmajsak/odh-model-controller:latest Signed-off-by: bartoszmajsak <bartosz.majsak@gmail.com>
This change handles "soft opt-in" for authorization capability handled by Authorino. The solution is based on Conditions exposed in DSCInitialization resource, as provided in opendatahub-io/opendatahub-operator#949. When the condition of type `CapabilityServiceMeshAuthorization` has a reason `MissingOperator` it will assume authorino/authorization module is not present and therefore does not handle AuthConfigs - leaving models not secured as it was for the previous versions. This approach allows to softly opt-in for Authorization without blocking the upgrade. Any other reason means that ODH Operator is in not happy state and will eventually reconcile so that Authorino will start working. The logic consists of the following steps: * during the controller bootstrap it will check separately if ServiceMesh is enabled and Authorino is present (latter by inspecting the status.condition) * when defining watches (using builder instance for OpenshiftReconciler) it will conditionally set a watch for `AuthConfig` * it guards AuthConfig reconciler from executing based on the "authorization capability". This prevents constant failures due to `AuthConfig` not being defined for the cluster. > [!NOTE] > As it is the case with original logic brought in opendatahub-io#181, it will require restart of the controller pod to set a watch on AuthConfig resource when Authorino is installed in the cluster. Signed-off-by: bartoszmajsak <bartosz.majsak@gmail.com>
Description
When a KServe InferenceService is created, odh-model-controller enrolls the ISVC namespace to the mesh, if needed. It was found that sometimes the namespace enrollment process may take a little time to be processed.
Since KServe and odh-model-controllers are (to certain extent) independent of each other, sometimes KServe controller and Knative serving controller are faster and the pod from the KSVC may be created faster than the mesh enrollment process. This would lead to the KSVC pod not having an Istio sidecar.
Since the Istio authorization rules are evaluated on the sidecar of target service, the missing sidecar on the model/ksvc pod would mean that the traffic would bypass the authorization rules and this impacts ODH KServe authorization (ref: https://istio.io/latest/docs/ops/best-practices/security/#server-first-tcp-protocols-are-not-supported).
In an effort to prevent that situation, this is adding a new validating webhook that would block creation of the Knative Service until the namespace is acknowledged to be a member of the service mesh.
Fixes https://issues.redhat.com/browse/RHOAIENG-2542
How Has This Been Tested?
Simple sanity check: ensuring a model can be deployed.
Merge criteria: