
Feature Request: Policy-Server should always run fault tolerant and PDBs should be configured #564

Closed
Martin-Weiss opened this issue Oct 26, 2023 · 4 comments

Comments

@Martin-Weiss

Is your feature request related to a problem?

When we deploy the kubewarden-defaults chart, we end up with a single-replica policy-server.

This is a problem for fault tolerance (e.g. during node drains), and when increasing the number of replicas we would also need anti-affinity rules and pod disruption budgets.

Solution you'd like

Not sure whether this should live in the operator or in the policy server, but we should have settings that allow configuring PDBs and anti-affinity, as well as the number of replicas (as a default for all policy servers and per individual policy server).

Alternatives you've considered

No response

Anything else?

No response

@flavio flavio transferred this issue from kubewarden/policy-server Oct 26, 2023
@flavio
Member

flavio commented Oct 26, 2023

I've moved the issue to the controller repository because this is an epic.

Anti-affinity rules

I propose extending the PolicyServer CRD by adding a new attribute called affinity of type v1/Affinity.

This object, when set, is then copied into the PodSpec of the Policy Server Deployment by our controller.

The default helm chart should then allow this affinity value to be set for the default Policy Server it creates.
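
To make the idea concrete, here is a minimal Go sketch of what this could look like; the package, type, and function names are illustrative assumptions, not the actual kubewarden-controller code:

```go
// Sketch only: names are assumptions for illustration, not the real controller code.
package controller

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// PolicyServerSpec excerpt showing the proposed optional affinity attribute.
type PolicyServerSpec struct {
	// Affinity, when set, is copied verbatim into the PodSpec of the
	// Deployment the controller creates for this Policy Server.
	// +optional
	Affinity *corev1.Affinity `json:"affinity,omitempty"`
}

// applyAffinity propagates the CRD-level affinity into the generated Deployment.
func applyAffinity(spec PolicyServerSpec, deployment *appsv1.Deployment) {
	if spec.Affinity != nil {
		deployment.Spec.Template.Spec.Affinity = spec.Affinity
	}
}
```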

Pod disruption budget

We should extend the PolicyServer CRD by adding the minAvailable and/or the maxUnavailable fields.

Question for @Martin-Weiss: do you think we should expose both attributes or just one of them?

The controller will then take care of creating the PodDisruptionBudget object that targets the specific Policy Server pods.

Here too, we should change the default helm chart to allow this value to be set for the default Policy Server.
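
A rough Go sketch of how the controller could build such a PodDisruptionBudget, assuming exactly one of the two fields is set; the function name and the pod label are assumptions, not the actual implementation:

```go
// Sketch only: helper name and labels are illustrative assumptions.
package controller

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// buildPDB returns a PodDisruptionBudget targeting the pods of one Policy Server.
// Exactly one of minAvailable / maxUnavailable is expected to be non-nil.
func buildPDB(policyServerName, namespace string, minAvailable, maxUnavailable *intstr.IntOrString) *policyv1.PodDisruptionBudget {
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "policy-server-" + policyServerName,
			Namespace: namespace,
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable:   minAvailable,
			MaxUnavailable: maxUnavailable,
			Selector: &metav1.LabelSelector{
				// Assumed label; the real selector must match the labels the
				// controller puts on the Policy Server pods.
				MatchLabels: map[string]string{
					"kubewarden/policy-server": policyServerName,
				},
			},
		},
	}
}
```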

Replica size

The defaults helm chart already allows the replica number to be set. I would leave the default value at 1, because I think the default values should not be the ones aimed at a production deployment.

@flavio flavio transferred this issue from kubewarden/helm-charts Oct 26, 2023
@mpepping

Policy-servers should match the criticality of the Kubernetes API-server replicas within a cluster, e.g. by running as a DaemonSet on control plane nodes, or something along those lines.

@viccuad
Member

viccuad commented Feb 1, 2024

Configurable/autoscalable resources

Since the newly released Kubewarden 1.10, policy-servers have a different architecture that is both more efficient when scaling horizontally and more performant (see https://www.kubewarden.io/blog/2024/kubewarden-1-10-release/). This ameliorates the autoscaling problems.
Before adding something like a horizontalPodAutoscaler, one must be aware that a policy-server deployment only serves the scheduled policies as active once a rollout has happened. During the rollout, the old policies remain reachable via the old policy-server pod because the webhooks still point to it; once the new policy-server pod is ready, the webhooks are updated to point to it. Hence triggering autoscaling may be counterproductive because of the number of rollouts it causes. This may change in the future, though.

Configurable system-cluster-critical priorityClass

Given the notes listed in https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#notes-about-podpriority-and-existing-clusters, this feels like a one-way street. Setting it in the CRD should be optional, and once set, maybe it shouldn't be removable.
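
As a hedged illustration (not the actual controller code), propagating an optional priority class into the Policy Server PodSpec could look like this:

```go
// Sketch only: function name and wiring are assumptions for illustration.
package controller

import (
	corev1 "k8s.io/api/core/v1"
)

// applyPriorityClass sets the optional priority class on the Policy Server
// PodSpec. Once pods run with e.g. "system-cluster-critical", removing the
// class later changes scheduling/preemption behaviour, hence the
// "one-way street" concern above.
func applyPriorityClass(podSpec *corev1.PodSpec, priorityClassName string) {
	if priorityClassName != "" {
		podSpec.PriorityClassName = priorityClassName
	}
}
```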

@flavio
Member

flavio commented Apr 8, 2024

Marking as done. This is going to be part of 1.12 once it is tagged.
