Ensure stateOK is reported only when all components have sent updates #11249
Thanks! I initially tried this but |
lib/service/cfg.go
    if proxyConfig.Enabled {
        componentCount++
    }
    if proxyConfig.Kube.Enabled && !proxyConfig.Kube.ListenAddr.IsEmpty() && !proxyConfig.DisableReverseTunnel {
How does proxyConfig.DisableReverseTunnel impact kubernetes_service?
Also, I think there are some cases where proxyConfig.Kube.ListenAddr is empty but the kube service should still be started, for example when Teleport is started as a kube agent:
    teleport:
    kubernetes_service:
      enabled: true
      kubeconfig_file: /path/to/kubeconfig
    auth_service:
      enabled: false
    ssh_service:
      enabled: false
    proxy_service:
      enabled: false
    app_service:
      enabled: false
    db_service:
      enabled: false
> How does the proxyConfig.DisableReverseTunnel impact kubernetes_service?

I'm not familiar enough with Teleport to be able to answer this.
I came up with `if proxyConfig.Kube.Enabled && !proxyConfig.Kube.ListenAddr.IsEmpty() && !proxyConfig.DisableReverseTunnel {` by putting the following two conditions together:
teleport/lib/service/service.go, line 3043 in b70c1d6:
    if listeners.kube != nil && !process.Config.Proxy.DisableReverseTunnel {
teleport/lib/service/service.go, lines 2581 to 2588 in b70c1d6:
    if cfg.Proxy.Kube.Enabled && !cfg.Proxy.Kube.ListenAddr.IsEmpty() {
        process.log.Debugf("Setup Proxy: turning on Kubernetes proxy.")
        listener, err := process.importOrCreateListener(listenerProxyKube, cfg.Proxy.Kube.ListenAddr.Addr)
        if err != nil {
            return nil, trace.Wrap(err)
        }
        listeners.kube = listener
    }
ComponentCount counts the number of components that send heartbeats. When the conditions above hold, it seems that the proxy kube component will be started and will send heartbeats.
> Also I think that there are some cases where proxyConfig.Kube.ListenAddr is empty but kube service should be started.
> For example when teleport is started as kube agent:
>
>     teleport:
>     kubernetes_service:
>       enabled: true
>       kubeconfig_file: /path/to/kubeconfig
>     auth_service:
>       enabled: false
>     ssh_service:
>       enabled: false
>     proxy_service:
>       enabled: false
>     app_service:
>       enabled: false
>     db_service:
>       enabled: false

I think that in this case this function will correctly return 1 due to this piece of code below:
    if cfg.Kube.Enabled {
        componentCount++
    }
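To make the counting discussed in this thread concrete, here is a minimal sketch. The `serviceFlags` type and its field names are hypothetical stand-ins, not Teleport's actual `Config`; only the kube-related conditions are taken from the snippets quoted above. It shows that a kube agent config (only `kubernetes_service` enabled) yields a count of 1 even though the proxy's kube listen address is empty.

```go
package main

import "fmt"

// serviceFlags is a hypothetical, simplified stand-in for the service
// configuration; it is NOT Teleport's real Config type.
type serviceFlags struct {
	AuthEnabled          bool
	SSHEnabled           bool
	ProxyEnabled         bool
	KubeEnabled          bool   // kubernetes_service.enabled
	ProxyKubeEnabled     bool   // Kubernetes listener of the proxy service
	ProxyKubeListenAddr  string // empty means "no listener configured"
	DisableReverseTunnel bool
}

// componentCount returns how many components are expected to send heartbeats.
func componentCount(f serviceFlags) int {
	n := 0
	if f.AuthEnabled {
		n++
	}
	if f.SSHEnabled {
		n++
	}
	if f.ProxyEnabled {
		n++
	}
	// The standalone Kubernetes service is counted on its own flag.
	if f.KubeEnabled {
		n++
	}
	// The proxy's kube component mirrors the condition proposed in the diff.
	if f.ProxyKubeEnabled && f.ProxyKubeListenAddr != "" && !f.DisableReverseTunnel {
		n++
	}
	return n
}

func main() {
	// Kube agent: only kubernetes_service is enabled.
	agent := serviceFlags{KubeEnabled: true}
	fmt.Println(componentCount(agent)) // prints 1
}
```

Keeping the two kube conditions separate is what lets the agent case and the proxy case be counted independently.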
lib/service/state.go
    process        *TeleportProcess
    mu             sync.Mutex
    states         map[string]*componentState
    componentCount int
Can we emphasise in the variable name the difference between the current component count, calculated from len(states), and the desired componentCount required for stateOK?
Tried to improve this in 4c0fca2.
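The distinction raised in this thread can be sketched as follows. This is a simplified illustration, not the code from the PR: the `expectedComponents` field name, the `state` constants, and the aggregation rule (any non-OK component makes the process degraded) are assumptions chosen for brevity. The point it shows is that `len(states)` is the number of components that have reported so far, while the expected count is the number that must report before `stateOK` can be returned.

```go
package main

import (
	"fmt"
	"sync"
)

type state int

const (
	stateStarting state = iota
	stateOK
	stateDegraded
)

// processState is a simplified, hypothetical version of the struct under review.
type processState struct {
	mu                 sync.Mutex
	states             map[string]state // components that have reported so far
	expectedComponents int              // components that must report before stateOK
}

func newProcessState(expected int) *processState {
	return &processState{states: make(map[string]state), expectedComponents: expected}
}

// update records the latest state reported by a component.
func (p *processState) update(component string, s state) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.states[component] = s
}

// getState reports stateOK only once every expected component has reported OK.
func (p *processState) getState() state {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.states) < p.expectedComponents {
		return stateStarting
	}
	for _, s := range p.states {
		if s != stateOK {
			return stateDegraded // simplification: any non-OK component degrades the process
		}
	}
	return stateOK
}

func main() {
	ps := newProcessState(2)
	ps.update("auth", stateOK)
	fmt.Println(ps.getState() == stateOK) // false: "proxy" has not reported yet
	ps.update("proxy", stateOK)
	fmt.Println(ps.getState() == stateOK) // true
}
```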
839b1fc to 4c0fca2
da12339 to 734795f
With #11725 merged, I pushed 734795f so that this PR now implements both 1. and 2. @zmb3 @smallinsky
lib/service/service.go
    @@ -3440,6 +3451,9 @@ func (process *TeleportProcess) waitForAppDepend() {

    // registerTeleportReadyEvent ensures that a TeleportReadyEvent is produced
    // when all components have started.
    // Note that this function should be kept in sync with the Config.ComponentCount function so that
If the mapping is 1:1, can't we just calculate the value here and store it in the process?
Oh, I think this is a good idea. Implemented something similar in 16b4228.
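A rough sketch of the suggestion in this thread: derive the count in the same place where the ready-event mapping is built, so the two cannot drift apart. The `process` type, its fields, and the event names (`AuthReady`, `NodeReady`, `ProxyReady`) are hypothetical and chosen only for illustration; this is not the code from 16b4228.

```go
package main

import "fmt"

// process is a hypothetical stand-in for the Teleport process; only the
// fields needed for this illustration are included.
type process struct {
	authEnabled  bool
	sshEnabled   bool
	proxyEnabled bool

	readyEvents    []string // events that must all fire before the ready event
	componentCount int      // number of components expected to send heartbeats
}

// registerReadyEvent builds the list of events required for readiness and
// returns how many components contribute to it, so both come from one source.
func (p *process) registerReadyEvent() int {
	var events []string
	if p.authEnabled {
		events = append(events, "AuthReady")
	}
	if p.sshEnabled {
		events = append(events, "NodeReady")
	}
	if p.proxyEnabled {
		events = append(events, "ProxyReady")
	}
	p.readyEvents = events
	return len(events)
}

func main() {
	p := &process{authEnabled: true, proxyEnabled: true}
	p.componentCount = p.registerReadyEvent()
	fmt.Println(p.componentCount, p.readyEvents) // prints 2 [AuthReady ProxyReady]
}
```

With this shape there is no separate counting function to keep in sync with the event mapping.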
Updated the top-level comment to reflect the latest changes.
…#11249)

Fixes #11065.

This commit:
- ensures that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
- changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
- uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, with the goal of quickly detecting a bug when it's introduced we have that:
1. if `componentCount` is lower than it should, then the service fails to start (due to #11725)
2. if `componentCount` is higher than it should, then an error is logged in function `processState.getStateLocked`.
* Throw startup error if `TeleportReadyEvent` is not emitted (#11725)

* Throw startup error if `TeleportReadyEvent` is not emitted

  Before this commit, the `TeleportReadyEvent` was only waited for when a process reload occurred. Thus, if a bug exists in the code that emits this event (as it's currently the case since the `MetricsReady` and `WindowsDesktopReady` events are never emitted), such a bug may go unnoticed for a while.

  This commit ensures that the `TeleportReadyEvent` is always waited for on startup, and throws an error if the event is not emitted (after some timeout).

  This commit also:
  - removes the `MetricsReady` event (as this is not produced by a component that sends heartbeats, which is the case of every other event required by the `TeleportReadyEvent` event mapping)
  - ensures that `WindowsDesktopReady` event is emitted
  - refactors some of the code in `lib/service/supervisor.go`
  - moves the event mapping registration to a new `registerTeleportReadyEvent` function

* Ensure stateOK is reported only when all components have sent updates (#11249)

  Fixes #11065.

  This commit:
  - ensures that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call [`process.onHeartbeat`](https://github.com/gravitational/teleport/blob/16bf416556f337b045b66dc9c3f5a3e16f8cc988/lib/service/service.go#L358-L366)) are ready
  - changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
  - uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065

  Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, with the goal of quickly detecting a bug when it's introduced we have that:
  1. if `componentCount` is lower than it should, then the service fails to start (due to #11725)
  2. if `componentCount` is higher than it should, then an error is logged in function `processState.getStateLocked`.

* Make `PortList.Pop()` thread-safe (#11799)
* Revert "Make `PortList.Pop()` thread-safe (#11799)"
  This reverts commit a17337d.
* Revert "Ensure stateOK is reported only when all components have sent updates (#11249)"
  This reverts commit b749302.
* Revert "Throw startup error if `TeleportReadyEvent` is not emitted (#11725)"
  This reverts commit 933e247.
* Revert "Fix ProxyKube not reporting its readiness (#12150)"
  This reverts commit 6cdcfe7.
Fixes #11065.

This PR:
- ensures that `TeleportReadyEvent` is only produced when all components that send heartbeats (i.e. call `process.onHeartbeat`) are ready
- changes `TeleportProcess.registerTeleportReadyEvent` so that it returns a count of these components (let's call it `componentCount`)
- uses `componentCount` to also ensure that `stateOK` is only reported when all the components have sent their heartbeat, thus fixing #11065 (`stateOK` reported in `processState.getState()` before all components have sent updates)

Since it seems difficult to know when `TeleportProcess.registerTeleportReadyEvent` should be updated, with the goal of quickly detecting a bug when it's introduced we have that:
1. if `componentCount` is lower than it should be, then the service fails to start (due to #11725, "Throw startup error if `TeleportReadyEvent` is not emitted")
2. if `componentCount` is higher than it should be, then an error is logged in function `processState.getStateLocked`.

Testing

None.