Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beats managed by agent should not restart unless the output configuration has actually changed #34178

Closed
cmacknz opened this issue Jan 4, 2023 · 2 comments · Fixed by #34346
Assignees
Labels

Comments

@cmacknz
Copy link
Member

cmacknz commented Jan 4, 2023

As part of #34066 we moved the logic to restart the beats when the output configuration pushed by the agent changes into the Beat itself. See elastic/elastic-agent#1913 for more details on why this is necessary.

The changes in #34066 also made it unnecessary to restart the Beat when the log level changes, however we still restart the Beat when the log level changes because the log level in V2 changes per unit and we detect it as an output unit change without looking more closely at exactly what the change was.

We should be able to use logic similar to what is for detecting that the Unit has changed at all (https://github.com/elastic/elastic-agent-client/blob/4477e3ace394ef1abfec55afa5cc5e1f6f87980c/pkg/client/unit.go#L253-L274) to detect that only the log level has changed.

@cmacknz cmacknz added Agent Team:Elastic-Agent Label for the Agent team v8.6.0 labels Jan 4, 2023
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz cmacknz changed the title Beats managed by agent unnecessarily restart when only the output log level has changed Beats managed by agent should not restart unless the output configuration has actually changed Jan 16, 2023
@cmacknz
Copy link
Member Author

cmacknz commented Jan 16, 2023

I renamed this issue to increase the scope of this issue to prevent unnecessary restarts for more than just log level changes. I believe any unit change (input or otherwise) causes the Beats to restart.

The logic to debounce/group unit changes together is causing us to reload the outputs and restart on any unit change, causing us to always restart unnecessarily even if the output unit is not changing at all:

case <-t.C:
// a copy of the units is used for reload to prevent the holding of the `cm.mx`.
// it could be possible that sending the configuration to reload could cause the `UpdateStatus`
// to be called on the manager causing it to try and grab the `cm.mx` lock, causing a deadlock.
cm.mx.Lock()
units := make(map[unitKey]*client.Unit, len(cm.units))
for k, u := range cm.units {
units[k] = u
}
cm.mx.Unlock()
cm.reload(units)

If you edit the input unit configuration of an input in a policy, the Beat will still restart because that reload call is always passed the full set of units and there is no logic to check if the output unit actually changed:

if cm.stopOnOutputReload && cm.outputIsConfigured {
cm.logger.Info("beat is restarting because output changed")
_ = unit.UpdateState(client.UnitStateStopping, "Restarting", nil)
cm.Stop()
return nil
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants