add initial delay seconds to brig liveness and readiness probes #2878

amitsagtani97 · 2022-11-29T08:18:03Z

Ticket - https://wearezeta.atlassian.net/browse/SQPIT-491

Brig pods restart continuously when the liveness and readiness probes fail during startup in an environment with low compute resources.

To reproduce -

Reduce the memory and CPU resource limits in charts/brig/values.yaml and deploy the chart.
The brig components then take longer to start than the wait time before the first liveness probe runs, so the pod is marked as failed and restarted again and again.
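
The reproduction step can be sketched as a values override. This is only an illustration: the `resources` key layout and the numbers below are assumptions, not values taken from charts/brig/values.yaml, so check the chart for the actual keys before using it.

```yaml
# Hypothetical override to starve brig of resources and trigger the restart loop.
# Key names and numbers are illustrative assumptions, not the chart's defaults.
resources:
  requests:
    memory: "64Mi"
    cpu: "50m"
  limits:
    memory: "128Mi"
    cpu: "100m"
```

With limits this low, brig would be expected to start slowly enough that the first probes fail.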

Added an initial delay before the first liveness and readiness probes run, which allows all the brig containers to start in a low-resource environment before probing begins.

Checklist

Add a new entry in an appropriate subdirectory of changelog.d
Read and follow the PR guidelines

charts/brig/templates/deployment.yaml

jschaul · 2022-12-06T16:15:51Z

@@ -138,11 +138,13 @@ spec:
              scheme: HTTP
              path: /i/status
              port: {{ .Values.service.internalPort }}
+            initialDelaySeconds: 30


I think your problem would be better solved with a startup probe, see also https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

startupProbe:
  httpGet:
    ...
  failureThreshold: 6
  periodSeconds: 5

That would give brig up to 30 seconds (6 failures × 5 seconds) to start. The liveness probe, which restarts brig, only takes over once the startup probe has succeeded.
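
A fuller sketch of such a startup probe follows. It assumes the probe reuses the `httpGet` settings visible in the diff above (scheme, path, and port are copied from the existing probe). Its placement next to the existing probes in charts/brig/templates/deployment.yaml is an assumption and not necessarily the change that was merged.

```yaml
# Sketch only: startup probe for the brig container (Helm template fragment).
# The httpGet block is assumed to match the existing probe shown in the diff.
startupProbe:
  httpGet:
    scheme: HTTP
    path: /i/status
    port: {{ .Values.service.internalPort }}
  # Up to 6 failed attempts, 5 seconds apart: roughly 30 seconds of startup budget.
  failureThreshold: 6
  periodSeconds: 5
```

Kubernetes holds back the liveness and readiness probes until the startup probe succeeds, so a slow start does not trigger a restart as long as it fits within failureThreshold × periodSeconds.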

Overall, however, if /i/status fails, this raises the question of whether the installation works correctly otherwise. If brig doesn't have enough resources, users will likely not see adequate latencies either.

Also, could it be that something else regarding networking is not working as it should, rather than this just being a resource problem?

amitsagtani97 · 2022-12-07

Thank you. Yes, the startupProbe makes more sense here; I have added it.

Yes, there can also be issues due to high network latency. The probes take a "timeoutSeconds" argument, which defaults to 1 second. Maybe we can increase that for these probes as well.
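
As a sketch of that suggestion, the timeout could be raised on the liveness probe as shown below. The value 5 is a hypothetical choice for illustration, and the `httpGet` block is assumed to match the one in the diff above. The same field is available on readiness and startup probes.

```yaml
# Sketch only: liveness probe with a longer per-request timeout.
livenessProbe:
  httpGet:
    scheme: HTTP
    path: /i/status
    port: {{ .Values.service.internalPort }}
  # Kubernetes default is 1 second; 5 is an illustrative value, not a recommendation.
  timeoutSeconds: 5
```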

jschaul · 2022-12-13T11:31:48Z

@amitsagtani97 PR looks good now. Could you add one line to a changelog file (maybe under internal)? Then this PR is good to be (squash-)merged.

add initial delay secondes to brig liveness and readiness probes

3048d75

amitsagtani97 requested review from flokli, jschaul and julialongtin November 29, 2022 08:18

amitsagtani97 temporarily deployed to cachix November 29, 2022 08:18 Inactive

zebot added the ok-to-test Approved for running tests in CI, overrides not-ok-to-test if both labels exist label Nov 29, 2022

flokli approved these changes Nov 29, 2022

jschaul requested changes Dec 6, 2022

replace initial delays with startupProbe

6c616de

amitsagtani97 changed the title ~~add initial delay secondes to brig liveness and readiness probes~~ add initial delay seconds to brig liveness and readiness probes Dec 7, 2022

amitsagtani97 temporarily deployed to cachix December 7, 2022 08:26 — with GitHub Actions Inactive

jschaul approved these changes Dec 13, 2022

add changelog message

b4a5efd

amitsagtani97 temporarily deployed to cachix December 13, 2022 11:39 — with GitHub Actions Inactive

amitsagtani97 merged commit b025d13 into develop Dec 13, 2022

amitsagtani97 deleted the update_brig_helm_chart branch December 13, 2022 13:01

zebot mentioned this pull request Jan 12, 2023

Release 2023-01-12 - (expected chart version 4.30.0) #2977

Merged
