
add startup probes into the health trait #4182

Closed

wants to merge 8 commits into from

Conversation

mertdotcc
Contributor

Completes (?) #4146

@mertdotcc
Contributor Author

@squakez open to any feedback!

@squakez
Contributor

squakez commented Mar 28, 2023

Nice! The structure is there and looks good. Now you should use those parameters to configure the probe on Kubernetes. I think you can mimic how the readiness and liveness probes are configured to get it working.
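For illustration, wiring the startup probe onto the container could look roughly like the sketch below. This is a minimal sketch only: the configureStartupProbe helper and the Startup* trait fields are named by analogy with the existing Readiness* fields and are assumptions here; only StartupProbeEnabled appears in the PR diff.

package trait

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// configureStartupProbe mirrors how the readiness and liveness probes are
// wired onto the integration container. All Startup* fields below except
// StartupProbeEnabled are assumptions for this sketch.
func (t *HealthTrait) configureStartupProbe(container *corev1.Container, port int, path string) {
	if t.StartupProbeEnabled == nil || !*t.StartupProbeEnabled {
		return
	}
	container.StartupProbe = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: path,
				Port: intstr.FromInt(port),
			},
		},
		InitialDelaySeconds: t.StartupInitialDelay, // assumed field
		TimeoutSeconds:      t.StartupTimeout,      // assumed field
		PeriodSeconds:       t.StartupPeriod,       // assumed field
		SuccessThreshold:    1,                     // Kubernetes requires 1 for startup probes
		FailureThreshold:    t.StartupFailureThreshold, // assumed field
	}
}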

@mertdotcc
Contributor Author

@squakez Could you please check again?

@squakez (Contributor) left a comment

We should consider including an E2E test to validate the feature. Something like:

t.Run("Readiness condition with never ready route", func(t *testing.T) {

@@ -54,4 +54,19 @@ type HealthTrait struct {
ReadinessSuccessThreshold int32 `property:"readiness-success-threshold" json:"readinessSuccessThreshold,omitempty"`
// Minimum consecutive failures for the readiness probe to be considered failed after having succeeded.
ReadinessFailureThreshold int32 `property:"readiness-failure-threshold" json:"readinessFailureThreshold,omitempty"`

// Configures the startup probe for the integration container (default `true`).
StartupProbeEnabled *bool `property:"startup-probe-enabled" json:"startupProbeEnabled,omitempty"`
Contributor

I'd be inclined to think this one should default to false.

Contributor Author

Sure, I was confused by this anyway. Could you explain why the liveness probe defaults to false but the readiness one defaults to true?

Contributor

I think the Kubernetes docs page [1] has the best explanation:

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.

I think that, since in the integration space we're typically dealing with backend services, we're mostly interested in readiness (by default). However, the presence of the parameters will let the end user configure them according to their requirements.

[1] https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
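As a concrete illustration of that last quoted paragraph (not taken from the PR): with periodSeconds: 10 and failureThreshold: 30, a startup probe gives a slow-starting container up to 30 × 10 = 300 seconds to come up before the kubelet starts enforcing the liveness probe. Against the core/v1 API in Go, with a hypothetical health endpoint:

package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// slowStartProbe is illustrative only: the container gets up to
// failureThreshold * periodSeconds = 300s to start answering the probe
// before the kubelet begins enforcing the liveness probe.
func slowStartProbe() *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: "/q/health/started", // hypothetical health endpoint
				Port: intstr.FromInt(8080),
			},
		},
		PeriodSeconds:    10,
		FailureThreshold: 30,
	}
}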

Contributor Author

Perfect explanation, thank you! Puts things into perspective.

Contributor

It could also be linked to what is written at the beginning of PR #2719: it is preferable to have it enabled so it can be used to reconcile the Integration phase and the readiness condition.

@mertdotcc
Contributor Author

@squakez Could you check it one more time? Not sure about the test tbh.

@mertdotcc requested review from gansheer and squakez and removed the request for gansheer and squakez on March 28, 2023 16:18
@squakez (Contributor) left a comment

I think a check is missing in pkg/trait/health.go at line 79.
About the test, I think you need to simulate the usage of -t health.startupProbeEnabled. The idea is to simulate the situation you describe in #4146 and validate the success with this new probe. I suggest you test this locally first; you can refer to the documentation at https://camel.apache.org/camel-k/next/contributing/e2e.html

In general, I build the images and then run just the test suite I need to validate, to avoid waiting too long for the entire test suite to run. In this case you probably need to temporarily edit your Makefile where you have:

test-common: do-build
	FAILED=0; STAGING_RUNTIME_REPO="$(STAGING_RUNTIME_REPO)"; \
	go test -timeout 30m -v ./e2e/common/support/startup_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	go test -timeout 30m -v ./e2e/common/languages -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	go test -timeout 30m -v ./e2e/common/cli -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	go test -timeout 30m -v ./e2e/common/config -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	go test -timeout 30m -v ./e2e/common/misc -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	go test -timeout 30m -v ./e2e/common/traits -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	go test -timeout 30m -v ./e2e/common/support/teardown_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	exit $${FAILED}

and, just for your local execution, replace it with:

test-common: do-build
	FAILED=0; STAGING_RUNTIME_REPO="$(STAGING_RUNTIME_REPO)"; \
	go test -timeout 30m -v ./e2e/common/support/startup_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	go test -timeout 30m -v ./e2e/common/traits/health_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	go test -timeout 30m -v ./e2e/common/support/teardown_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
	exit $${FAILED}

@@ -29,11 +29,8 @@ import (
"testing"
"time"

. "github.com/onsi/gomega"
Contributor

I think these removals might be the source of the integration test failures. You need to add them back.

@mertdotcc closed this Mar 29, 2023
@mertdotcc deleted the issue4146_startup_probes branch March 29, 2023 14:42