Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CDAP-21096] Write appfabric server port to tmp file for k8s readiness probe #15811

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

vsethi09
Copy link
Contributor

CDAP-21096 Add support for startupProbe in CDAP master

Context

In addition to the rolling upgrade strategy, for high availability of APIs, startup probe need to be implemented to ensure that the CDAP service pods (eg. Appfabric) accepts requests only when the server has started.

CDAP services (eg. Appfabric) run on dynamic port and announces the port after the server has started. For this command probe can be configured to check the status of system service using the URL ‘/v3/system/services//status’.

CDAP services will write the port to a temp file in the container after startup. Here is a code snippet for startup probe using the port written to the temp file:

startupProbe:
          exec:
            command:
            - sh
            - -c
            - curl -s -f -k https://localhost:$(cat /tmp/appfabric.port)/v3/system/services/appfabric/status
          failureThreshold: 60
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1

NOTE: Corresponding cdapio/cdap-operator PR: cdapio/cdap-operator#127

Change Description

  • Introduced StartupProbe as an optional field in the CDAPServiceSpec.
  • The StartupProbe config of type corev1.Probe will be set to the container spec.
  • Auto-generated:
    • api/v1alpha1/zz_generated.deepcopy.go
    • config/crd/bases/cdap.cdap.io_cdapmasters.yaml
  • Makefile: To fix unit tests, the max request size of etcd database used by controller-runtime/tools/setup-envtest had to be increased from default 1.5 MB to 2.5 MB. This is due to the increase in size of generated CRD.

Verification

  • Unit tests
  • CDAP Sandbox
  • Docker image: deployed locally built docker image and edited the cdapmaster manifest to add startupProbe to appfabric spec.

@vsethi09 vsethi09 added the build Triggers github actions build label Jan 20, 2025
@vsethi09 vsethi09 marked this pull request as ready for review January 20, 2025 18:57
@vsethi09 vsethi09 force-pushed the feature/CDAP-21096_appfabric_readiness_probe branch from 3405938 to 90bc0ab Compare January 21, 2025 08:31
Copy link
Contributor

@tivv tivv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify why the current way of doing things (e.g. announceing the service in k8s) is not enough and we need an additional communication channel?

@vsethi09
Copy link
Contributor Author

Can you clarify why the current way of doing things (e.g. announceing the service in k8s) is not enough and we need an additional communication channel?

When appfabric k8s deployment is updated, a new Pod creation is Triggered. Due to rolling upgrade strategy (maxUnavailable: 25%), the old Pod terminates only when the new Pod is in Running state.

Even when the new Pod is in Running state, it may not be ready to accept requests. Appfabric service is not available when the new Pod is still not ready and the old pod is terminating. Hence, there is a need for startupProbe which ensures that old Pod is terminated when the new Pod is ready to accept traffic.

@tivv
Copy link
Contributor

tivv commented Jan 22, 2025

Can you clarify why the current way of doing things (e.g. announceing the service in k8s) is not enough and we need an additional communication channel?

When appfabric k8s deployment is updated, a new Pod creation is Triggered. Due to rolling upgrade strategy (maxUnavailable: 25%), the old Pod terminates only when the new Pod is in Running state.

Even when the new Pod is in Running state, it may not be ready to accept requests. Appfabric service is not available when the new Pod is still not ready and the old pod is terminating. Hence, there is a need for startupProbe which ensures that old Pod is terminated when the new Pod is ready to accept traffic.

What I mean is that as soon as Service is available, it registers itself with K8s using io.cdap.cdap.k8s.discovery.KubeDiscoveryService#register. If at all possible, we should use this information in the probe instead of creating a separate information channel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Triggers github actions build
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants