-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add startup probes into the health trait #4182
Conversation
@squakez open to any feedback! |
Nice! The structure is there and looks good. Now you should use those parameters to configure the probe on Kubernetes. I think you can mimic the usage of the readiness and liveness probes to have it working. |
@squakez Could you please check again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider to include an E2E test to validate the feature. Something like:
camel-k/e2e/common/traits/health_test.go
Line 279 in f726675
t.Run("Readiness condition with never ready route", func(t *testing.T) { |
@@ -54,4 +54,19 @@ type HealthTrait struct { | |||
ReadinessSuccessThreshold int32 `property:"readiness-success-threshold" json:"readinessSuccessThreshold,omitempty"` | |||
// Minimum consecutive failures for the readiness probe to be considered failed after having succeeded. | |||
ReadinessFailureThreshold int32 `property:"readiness-failure-threshold" json:"readinessFailureThreshold,omitempty"` | |||
|
|||
// Configures the startup probe for the integration container (default `true`). | |||
StartupProbeEnabled *bool `property:"startup-probe-enabled" json:"startupProbeEnabled,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd inclined to think this one should default to false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was confused with this anyway. Could you explain why the liveness probe defaults to false but the readiness one defaults to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Kubernetes doc page [1] has the better explaination:
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running
I think since in the integration space we're typically dealing with backend services we're mostly interested in readiness (by default). However, the presence of the parameters will let the final user to configure according to her requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect explanation, thank you! Puts things into perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could also be linked to what is written at the beginning of this PR #2719, it is preferable to have it enabled to use it to reconcile the Integration phase and readiness condition.
@squakez Could you check it one more time? Not sure about the test tbh. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it misses a check in pkg/trait/health.go at line 79.
About the test, I think you need to simulate the usage of -t health.startupProbeEnabled
. I think the idea is to simulate the situation you describe in #4146 and validate the success with this new probe. I suggest you to test this locally before, you can refer the documentation in https://camel.apache.org/camel-k/next/contributing/e2e.html
In general, I use to make images
and then run just the test suite I need to validate to avoid wating to much time to run the entire test suite. In this case you probably need to temporary edit your makefile where you have:
test-common: do-build
FAILED=0; STAGING_RUNTIME_REPO="$(STAGING_RUNTIME_REPO)"; \
go test -timeout 30m -v ./e2e/common/support/startup_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
go test -timeout 30m -v ./e2e/common/languages -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
go test -timeout 30m -v ./e2e/common/cli -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
go test -timeout 30m -v ./e2e/common/config -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
go test -timeout 30m -v ./e2e/common/misc -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
go test -timeout 30m -v ./e2e/common/traits -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
go test -timeout 30m -v ./e2e/common/support/teardown_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
exit $${FAILED}
and replace just for your local execution with:
test-common: do-build
FAILED=0; STAGING_RUNTIME_REPO="$(STAGING_RUNTIME_REPO)"; \
go test -timeout 30m -v ./e2e/common/support/startup_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
go test -timeout 30m -v ./e2e/common/traits/health_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
go test -timeout 30m -v ./e2e/common/support/teardown_test.go -tags=integration $(TEST_INTEGRATION_COMMON_LANG_RUN) $(GOTESTFMT) || FAILED=1; \
exit $${FAILED}
@@ -29,11 +29,8 @@ import ( | |||
"testing" | |||
"time" | |||
|
|||
. "github.com/onsi/gomega" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these removals might be the source of the IT's failure. You need to add them again.
Completes (?) #4146