fix signal handling for lifecycle sidecar #409
I really like the updated pattern for signal handling and passing a context into exec.CommandContext().
I might be out of the loop here, but does this address a specific problem we are running into? Also, given that we don't really follow a pattern of logging the time taken by a command, I'm unsure whether this is something we would want to merge, versus using it to identify the root cause and fixing that independently.
Sorry about that, I feel silly for not expanding in the PR. This addresses the issues that we've been seeing with services showing up in the UI after the application has been deleted. What was happening was that an application would get deleted, the pods would terminate, and their preStop hooks would deregister the service (normal); however, due to the time between when we're running the […]

This fixes the race condition between lifecycle-sidecar getting a term signal and pod de/re-registration. There is also a past known issue where we suspected pods were not executing their […]

Let me know if that doesn't make sense and I can demo it to you, and I'll also add steps to repro to the PR.
This is looking good. Like you mentioned, we also need a test case. We should be able to start the command, send the signal and see that the service isn't re-registered in consul.
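One possible shape for such a test, as a rough sketch rather than the test that was actually added: start the loop, deliver SIGTERM to the current process, wait for the loop to exit, then check that the registration count no longer changes. A counter stands in for Consul here, and `registerLoop` and `registrationsAfterSignal` are hypothetical names. The sketch assumes a Unix-like system (it uses `syscall.Kill`) and Go 1.16 or later (for `signal.NotifyContext`).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// registerLoop (hypothetical) calls register every interval until ctx is
// cancelled, checking the context before each call.
func registerLoop(ctx context.Context, interval time.Duration, register func()) {
	for {
		if ctx.Err() != nil {
			return
		}
		register()
		select {
		case <-ctx.Done():
			return
		case <-time.After(interval):
		}
	}
}

// registrationsAfterSignal starts the loop, sends SIGTERM to this process,
// and reports the registration count when the loop exited (atExit) and again
// a little later (later). The two should be equal.
func registrationsAfterSignal() (atExit, later int64, err error) {
	// The signal is caught and turned into cancellation, so the process
	// itself keeps running.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	var count int64 // stand-in for registrations made against Consul
	done := make(chan struct{})
	go func() {
		defer close(done)
		registerLoop(ctx, 5*time.Millisecond, func() { atomic.AddInt64(&count, 1) })
	}()

	// Let a few registrations happen, then signal ourselves.
	time.Sleep(30 * time.Millisecond)
	if err := syscall.Kill(os.Getpid(), syscall.SIGTERM); err != nil {
		return 0, 0, err
	}

	select {
	case <-done:
	case <-time.After(2 * time.Second):
		return 0, 0, errors.New("loop did not stop after SIGTERM")
	}

	atExit = atomic.LoadInt64(&count)
	time.Sleep(30 * time.Millisecond) // give a stray re-registration time to show up
	return atExit, atomic.LoadInt64(&count), nil
}

func main() {
	atExit, later, err := registrationsAfterSignal()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("registrations at exit: %d, after waiting: %d\n", atExit, later)
}
```

A test against a real agent would replace the counter with a query for the service and assert that it stays deregistered after the signal.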
Nice work!
🦊 ! See comment about test, also needs changelog entry.
This fixes issues related to stale service registrations created when pods are terminated, which left services registered with no pods backing them. The cause was a race condition in the shutdown logic of the lifecycle-sidecar, so the fix addresses its shutdown path.
Changes proposed in this PR:
How I've tested this PR:
existing unit tests + manual tests
How I expect reviewers to test this PR:
Spin up the latest release of consul-k8s and deploy a sample app that is connect-injected. It is helpful to see the issue in the UI, so enable that as well. Once the app is online, delete the application. You can watch the logs of the lifecycle-sidecar for the pod side by side with the agent logs and see that the lifecycle-sidecar runs a register after the pod deregisters its service, or see in the UI that the service never goes away.
Checklist: