Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for lifecycle hooks and health probe for sidecars #1482

Conversation

narendrapatel
Copy link

@narendrapatel narendrapatel commented Sep 6, 2022

Changes proposed in this PR:

One impact due to this:
On adding sidecar-hold-app-until-proxy-starts, the sidecar becomes the default container when using kubectl exec

These features are added as opt in via annotations

How I've tested this PR:

  • Added the annotations to a k8s deployment.
  • Post adding could see postStart, preStop and liveness gets added to the deployment manifest
  • Tested holding app container startup by running continuous calls to sidecar on startup from app. Working as expected. Without the annotation the call to sidecar fails as the app container starts up first.
  • Tested graceful shutdown by adding a preStop to the app container. Works as expected. Without the annotation the envoy sidecar gets terminated immediately while the app is still completing its preStop and all calls to the sidecar start failing. This is running in our environment perfectly from quite some time. Graceful shutdown with injected envoy-sidecar #536 (comment)
  • On adding annotation for liveness it comes up correctly in the pod manifest. This is useful for mesh deployments that do not yet use transparent proxies
  • The annotation expects only boolean value. Any other value would skip adding the annotation functionality.

How I expect reviewers to test this PR:

Checklist:

  • Tests added
  • CHANGELOG entry added

    HashiCorp engineers only, community PRs should not add a changelog entry.
    Entries should use present tense (e.g. Add support for...)

@hashicorp-cla
Copy link

hashicorp-cla commented Sep 6, 2022

CLA assistant check
All committers have signed the CLA.

@david-yu
Copy link
Contributor

david-yu commented Sep 6, 2022

Hi @narendrapatel have you been able to test and see that this works on Consul K8s?

@narendrapatel narendrapatel changed the title configuring probes for envoy sidecars Adding support for lifecycle hooks and probe for sidecars Sep 9, 2022
@narendrapatel narendrapatel changed the title Adding support for lifecycle hooks and probe for sidecars Adding support for lifecycle hooks and health probe for sidecars Sep 9, 2022
@narendrapatel
Copy link
Author

Hi @narendrapatel have you been able to test and see that this works on Consul K8s?

@david-yu yes I have tested this functionality in my local K8s cluster.

@narendrapatel narendrapatel marked this pull request as ready for review September 9, 2022 12:00
@david-yu
Copy link
Contributor

david-yu commented Sep 9, 2022

Hi @narendrapatel thanks for the PR, I initially thought this was going to be a PR to address health probes for sidecars but I am as able to see the whole picture here a bit better now that the whole PR is put up. It looks like the use case is to address a few different proxy lifecycle management use cases.

The Consul K8s team is intending to support a few different proxy lifecycle management use cases after our 1.14 release, however we can't merge this PR as is since we will be making significant architecture changes in how the proxy is deployed and configured as part of that release. I'll leave this PR open for the time being so others can use this for their own purposes as well.

@junjie-landing
Copy link

@david-yu thanks for the sharing.

I don't it was mentioned in the 1.14 release doc. Maybe I am overlooking something?

https://github.com/hashicorp/consul/releases/tag/v1.14.0

@narendrapatel narendrapatel force-pushed the envoy-sidecar-configure-probes branch from be43bc8 to 68d7cbe Compare December 6, 2022 17:32
@narendrapatel
Copy link
Author

narendrapatel commented Dec 7, 2022

I've updated the PR to be compatible with consul version v1.14 and consul control plane 1.0.0. But with the consul dataplane now using distroless base container, we'll need to add couple of binaries like sh, wget, sleep, netstat, grep and wc. The docker file for consul dataplane is available here. Also dropped use of annotation consul.hashicorp.com/sidecar-configure-probes from the PR since with consul control plane 1.0.0 readiness and liveness checks are added by default. So the PR now only covers graceful start up and shut down for the sidecar.

Change required for consul version < 1.14 and consul k8s control plane < 1.0.0 is here

@oliver-buckley-salmon-db

Hi,
Is this ever going to get merged? This functionality is required by many consumers, due to the non-graceful shutdown of services in the mesh.
Thanks

@shanilempel
Copy link

shanilempel commented May 11, 2023

Any news on that? We could highly benefit from such functionality.

@david-yu
Copy link
Contributor

Closing as this is now resolved by #2233 and will be released in upcoming patch releases for 1.0.x, 1.1.x and 1.2.x.

@david-yu david-yu closed this Jun 27, 2023
@oliver-buckley-salmon-db
Copy link

oliver-buckley-salmon-db commented Jun 28, 2023

Thanks everyone, this was a critical fix for us.

@psypuff
Copy link

psypuff commented Jun 29, 2023

@david-yu it seems #2233 doesn't address the startup issue fixed by this PR (Envoy proxy sidecar goes up only after the app container). Without a fix Consul Connect is pretty much unusable for us, we wished a fix would make its way to the just-released 1.2 version. What's the reason to not fix both lifecycle issues at once? Since this PR is now closed is there any other place to track the startup issue progress?

@david-yu
Copy link
Contributor

Hi @psypuff We really wanted to get all proxy lifecycle features in but needed to stagger them out due to the amount of work that was scheduled for Consul 1.16.0 and Consul K8s 1.2. Just as a clarification, the fix for pod shut down will come in patch releases of Consul K8s 1.1.x and 1.0.x that are releasing today, but for Consul K8s 1.2.x we will deliver them in 1.2.1.

The official issue tracking application startup scenarios is #1397. I'll also link this closed PR there as well for folks to reference if they need a workaround.

Command: []string{
"/bin/sh",
"-c",
`total_time=0; until wget --spider localhost:19000;` +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /ready endpoint should be more appropriate for this check IMHO:

#!/bin/sh

while [[ "$(curl -s -o /dev/null -w ''%{http_code}'' localhost:19000/ready)" != "200" ]]; do
  sleep 1
done

Copy link

@komapa komapa Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do while [[ "$(curl -sf http://127.0.0.1:19000/ready)" != "LIVE" ]]; do sleep 1; done with great success :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants