
auditbeat: auditd error messages after update to 7.13.2 #26668

Closed
ynirk opened this issue Jul 1, 2021 · 5 comments

Comments

@ynirk

ynirk commented Jul 1, 2021

After updating auditbeat to 7.13.2 (from 7.12.0) we started seeing the following error message:

2021-06-29T12:48:37.782Z#011ERROR#011[auditd]#011auditd/audit_linux.go:204#011get status request failed:failed to get audit status reply: no reply received

It only happens on a small proportion of deployed servers after an auditbeat restart. Example from a specific instance:
[screenshot attached: Screenshot 2021-06-29 at 15 15 31]

The log messages started right after the update, and we see more after auditbeat restarts the next day.

The update was deployed to fix the kauditd deadlock issue (#26031) we were experiencing on some hosts. I'm wondering if it could have the same root cause?
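
For context, the error above comes from Auditbeat's netlink status request to the kernel audit subsystem. Below is a minimal sketch, assuming the github.com/elastic/go-libaudit/v2 API that Auditbeat builds on, of the kind of request that is failing; it is an illustration, not the Auditbeat code itself, and typically needs root to run.

package main

import (
	"fmt"
	"log"

	"github.com/elastic/go-libaudit/v2"
)

func main() {
	// Open a netlink connection to the kernel audit subsystem.
	client, err := libaudit.NewAuditClient(nil)
	if err != nil {
		log.Fatalf("failed to create audit client: %v", err)
	}
	defer client.Close()

	// Ask the kernel for its audit status. On the affected hosts this is
	// where an error like "no reply received" would surface if the kernel
	// does not answer within the client's timeout.
	status, err := client.GetStatus()
	if err != nil {
		log.Fatalf("get status request failed: %v", err)
	}
	fmt.Printf("audit status: enabled=%d pid=%d backlog=%d lost=%d\n",
		status.Enabled, status.PID, status.Backlog, status.Lost)
}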

@ynirk ynirk added the Auditbeat label Jul 1, 2021
@botelastic botelastic bot added the needs_team label Jul 1, 2021
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team label Jul 2, 2021
@efd6
Contributor

efd6 commented Sep 29, 2021

@ynirk Can I check whether it is consistently the same hosts, and whether those hosts are under load during this start-up?

Looking at the relevant changes between 7.12.0 and 7.13.2, there is no change that would introduce this behaviour, but the previous deadlock behaviour may have hidden it by failing out.

Are you able to test increasing the number of retries or increasing the backoff between retries?
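
To make that suggestion concrete, here is a rough sketch of a retry wrapper with exponential backoff around the status request. The getStatusWithRetry helper, the attempt count, and the backoff values are all hypothetical, chosen for experimentation rather than taken from audit_linux.go.

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/elastic/go-libaudit/v2"
)

// getStatusWithRetry is a hypothetical helper: it repeats the status request,
// doubling the wait between attempts, instead of giving up after a fixed
// small number of retries.
func getStatusWithRetry(client *libaudit.AuditClient, attempts int, backoff time.Duration) (*libaudit.AuditStatus, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		status, err := client.GetStatus()
		if err == nil {
			return status, nil
		}
		lastErr = err
		time.Sleep(backoff)
		backoff *= 2 // relax: wait longer before each subsequent attempt
	}
	return nil, fmt.Errorf("no reply after %d attempts: %w", attempts, lastErr)
}

func main() {
	client, err := libaudit.NewAuditClient(nil)
	if err != nil {
		log.Fatalf("failed to create audit client: %v", err)
	}
	defer client.Close()

	// Example values only: 10 attempts starting at a 200ms backoff.
	status, err := getStatusWithRetry(client, 10, 200*time.Millisecond)
	if err != nil {
		log.Fatalf("get status request failed: %v", err)
	}
	fmt.Printf("audit status after retries: enabled=%d pid=%d\n", status.Enabled, status.PID)
}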

@ynirk
Author

ynirk commented Jan 10, 2022

@efd6 Sorry for the delay, I totally missed the ping.
I see this behavior on lots of hosts (~2k hosts in the last 7 days), so it's not easy to tell whether the hosts are under pressure when it occurs.

@efd6
Contributor

efd6 commented Jan 11, 2022

Thanks, @ynirk. Are you able to test whether this behaviour persists with a version built with the retry relaxations I mentioned above?

Also, are you able to provide the log lines that follow that error? The loop that handles this retries until there is a catastrophic failure and no audit monitoring client can be started. It would be helpful to know how many loop iterations fail to obtain a response, and interesting to know how many events are lost (log lines corresponding to this func). This latter query will be easier to address than the former.

@botelastic

botelastic bot commented Jan 11, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jan 11, 2023
@botelastic botelastic bot closed this as completed Jul 10, 2023