Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gateway2: Add readiness and liveness probes to the kube gloo gateway pod #10229

Merged
merged 29 commits into from
Nov 18, 2024

Conversation

davidjumani
Copy link

@davidjumani davidjumani commented Nov 14, 2024

Description

This update allows users to enable and add custom readiness and liveness probes to the Kubernetes Gloo Gateway.
This allows zero downtime when upgrading gloo or when rolling out new pods

API changes

The following helm changes have been made :

  • kubeGateway.gatewayParameters.glooGateway.podTemplate.terminationGracePeriodSeconds to specify the terminationGracePeriodSeconds.
  • kubeGateway.gatewayParameters.glooGateway.podTemplate.gracefulShutdown to configure the graceful shutdown config for the envoy container.
  • kubeGateway.gatewayParameters.glooGateway.podTemplate.customLivenessProbe to specify a custom liveness probe for the envoy container.
  • kubeGateway.gatewayParameters.glooGateway.podTemplate.livenessProbeEnabled to enable the liveness probe. If the customLivenessProbe is not specified, no default liveness probe is set.
  • kubeGateway.gatewayParameters.glooGateway.podTemplate.customReadinessProbe to specify a custom readiness probe for the envoy container.
  • kubeGateway.gatewayParameters.glooGateway.podTemplate.probes to enable the readiness probe. If the customReadinessProbe is not specified, a default readiness probe is set. No default liveness probe is set

The following fields have been added to the GatewayParameters.Kube.PodTemplate field :

  • GracefulShutdown
  • TerminationGracePeriodSeconds
  • ReadinessProbe
  • LivenessProbe

Code changes

Updated the deployer to read the new values from the gateway params and generate the appropriate helm values for the gloo gateway pod deployment

CI changes

Added the zero downtime config to all Kubernetes tests to ensure this does not break an existing feature

Docs changes

TODO in a followup with @solo-io/solo-docs

Context

https://github.com/solo-io/solo-projects/issues/7084

Interesting decisions

The helm values were created to mimic the existing values to setup the probes and zero downtime config for edge classic. Ref: https://github.com/solo-io/solo-projects/issues/7084

Testing steps

Added unit tests

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

@davidjumani davidjumani requested a review from a team as a code owner November 14, 2024 15:39
@nfuden nfuden changed the title gateway2: Add readiness and liveness probes to the kube gloo gateway pod [Migrated] add stopwatches for translator and envoy syncer Nov 14, 2024
@nfuden nfuden closed this Nov 14, 2024
@davidjumani davidjumani changed the title [Migrated] add stopwatches for translator and envoy syncer [Migrated] gateway2: Add readiness and liveness probes to the kube gloo gateway pod Nov 14, 2024
@davidjumani davidjumani reopened this Nov 14, 2024
@davidjumani davidjumani changed the title [Migrated] gateway2: Add readiness and liveness probes to the kube gloo gateway pod gateway2: Add readiness and liveness probes to the kube gloo gateway pod Nov 14, 2024
@solo-changelog-bot
Copy link

Issues linked to changelog:
https://github.com/solo-io/solo-projects/issues/7084

- `kubeGateway.gatewayParameters.glooGateway.podTemplate.terminationGracePeriodSeconds` to specify the terminationGracePeriodSeconds.
- `kubeGateway.gatewayParameters.glooGateway.podTemplate.gracefulShutdown` to configure the graceful shutdown config for the envoy container.
- `kubeGateway.gatewayParameters.glooGateway.podTemplate.customLivenessProbe` to specify a custom liveness probe for the envoy container.
- `kubeGateway.gatewayParameters.glooGateway.podTemplate.livenessProbeEnabled` to enable the liveness probe. If the customLivenessProbe is not specified, no liveness probe is set.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `kubeGateway.gatewayParameters.glooGateway.podTemplate.livenessProbeEnabled` to enable the liveness probe. If the customLivenessProbe is not specified, no liveness probe is set.

should we also remove the option in other places from the helm values?

@@ -339,9 +339,19 @@ type GatewayParameters struct {
Stats *GatewayParamsStatsConfig `json:"stats,omitempty" desc:"Config used to manage the stats endpoints exposed on the deployed proxies"`
AIExtension *GatewayParamsAIExtension `json:"aiExtension,omitempty" desc:"Config used to manage the Gloo Gateway AI extension."`
FloatingUserId *bool `json:"floatingUserId,omitempty" desc:"If true, allows the cluster to dynamically assign a user ID for the processes running in the container. Default is false."`
PodTemplate *GatewayParamsPodTemplate `json:"podTemplate,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add description

type GatewayParamsPodTemplate struct {
GracefulShutdown *GracefulShutdownSpec `json:"gracefulShutdown,omitempty"`
TerminationGracePeriodSeconds *int `json:"terminationGracePeriodSeconds,omitempty" desc:"Time in seconds to wait for the pod to terminate gracefully. See [kubernetes docs](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#istio-lifecycle) for more info."`
Probes *bool `json:"probes,omitempty" desc:"Set to true to enable a readiness probe (default is false). Then, you can also enable a liveness probe."`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a bool flag can we call it enableProbes?

also, a little confused about the description:

Set to true to enable a readiness probe (default is false). Then, you can also enable a liveness probe.

should there be 2 different flags, enableLivenessProbe and enableReadinessProbe? or does this one flag enable both?

Copy link
Author

@davidjumani davidjumani Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to mimic the edge classic gateway proxy settings where probes controls the readiness probe.
Since there's no default liveness probe, a custom one can be set with the customLivenessProbe without the need for a bool to enable it

Probes *bool `json:"probes,omitempty" desc:"Set to true to enable a readiness probe (default is false). Then, you can also enable a liveness probe."`

// TODO(npolshak): Add support for GlooMtls
}

// GatewayProxyPodTemplate contains the Helm API available to configure the PodTemplate on the gateway Deployment
type GatewayParamsPodTemplate struct {
GracefulShutdown *GracefulShutdownSpec `json:"gracefulShutdown,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add descriptions to all fields

GracefulShutdown *GracefulShutdownSpec `json:"gracefulShutdown,omitempty"`
TerminationGracePeriodSeconds *int `json:"terminationGracePeriodSeconds,omitempty" desc:"Time in seconds to wait for the pod to terminate gracefully. See [kubernetes docs](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#istio-lifecycle) for more info."`
Probes *bool `json:"probes,omitempty" desc:"Set to true to enable a readiness probe (default is false). Then, you can also enable a liveness probe."`
CustomReadinessProbe *corev1.Probe `json:"customReadinessProbe,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the description here, we can say that if enableProbes=true and no customReadinessProbe is provided, then a default one will be created

}

// Don't merge the command string as that can break the entire probe
dst.Command = overrideSlices(dst.Command, src.Command)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add a test to merge_test.go that tests this override

command:
- /bin/sh
- -c
- wget --post-data "" -O /dev/null 127.0.0.1:19000/healthcheck/fail; sleep {{ $gateway.gracefulShutdown.sleepTimeSeconds | default "10" }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this work ok if sleepTimeSeconds is nil?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes since a default is set

@Nadine2016 Nadine2016 added documentation Improvements or additions to documentation and removed documentation Improvements or additions to documentation labels Nov 18, 2024
@jenshu
Copy link

jenshu commented Nov 18, 2024

known flake #10365

@nfuden nfuden merged commit b31586c into main Nov 18, 2024
17 of 18 checks passed
@nfuden nfuden deleted the add-readiness-liveness-probe branch November 18, 2024 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants