auditbeat: auditd error messages after update to 7.13.2 #26668

ynirk · 2021-07-01T14:16:15Z

After updating auditbeat to 7.13.2 (from 7.12.0) we started seeing the following error message:

2021-06-29T12:48:37.782Z#011ERROR#011[auditd]#011auditd/audit_linux.go:204#011get status request failed:failed to get audit status reply: no reply received

It only happens on a small proportion of deployed servers, after an auditbeat restart. Example on a specific instance:

The error logs started right after the update, and we saw more after an auditbeat restart the next day.

The update was deployed to fix the kauditd deadlock issue (#26031) we were experiencing on some hosts. I'm wondering if it could be the same root cause?


elasticmachine · 2021-07-02T05:48:28Z

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

efd6 · 2021-09-29T06:06:28Z

@ynirk Can I check whether it is consistently the same hosts, and whether the hosts are under load during this start up?

Looking at the relevant changes between 7.12.0 and 7.13.2 there is no change that would introduce this behaviour, but the previous deadlock behaviour may have hidden it by failing out.

Are you able to test increasing the number of retries or increasing the backoff on the retries?

ynirk · 2022-01-10T13:55:03Z

@efd6 Sorry for the delay, I totally missed the ping.
I see this behavior on lots of hosts (~2k hosts in the last 7 days), so it's not easy to tell whether hosts are under pressure when it occurs.

efd6 · 2022-01-11T00:42:43Z

Thanks, @ynirk. Are you able to test whether this behaviour persists with a version built with the retry relaxations I mentioned above?

Also, are you able to provide the log lines that follow that error? The loop that handles this retries until there is catastrophic failure and no audit monitoring client can be started. It would be helpful to know how many loop iterations fail to obtain a response and interesting to know how many events are lost (log lines corresponding to this func). This latter query will be easier to address than the first.

botelastic · 2023-01-11T01:11:20Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

ynirk added the Auditbeat label Jul 1, 2021

botelastic bot added the needs_team label Jul 1, 2021

adriansr added the Team:Security-External Integrations label Jul 2, 2021

botelastic bot removed the needs_team label Jul 2, 2021

botelastic bot added the Stalled label Jan 11, 2023

botelastic bot closed this as completed Jul 10, 2023
