-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make o365audit input cancellable #21647
Conversation
PR elastic#21258 introduced a restart mechanism for o365input so that it didn't stop working once a fatal error was found. This updates the restart delay to use a cancellation-context-aware method so that the input doesn't block Filebeat termination.
Pinging @elastic/siem (Team:SIEM) |
ctx.Logger.Infof("Restarting in %v", failureRetryInterval) | ||
time.Sleep(failureRetryInterval) | ||
ctx.Logger.Infof("Restarting in %v", inp.config.API.ErrorRetryInterval) | ||
timed.Wait(ctx.Cancelation, inp.config.API.ErrorRetryInterval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: inputs might have recoverable and non-recoverable errors. Only non-recoverable errors must the input return. The later case is quite unlikely for most inputs I guess. In case we have this pattern more often we might want to unify it by providing some helpers, input manager wrapper, or simply add a 'setting' to the cursor.InputManager
to always rerun on error.
Change LGTM. +1 for making the interval configurable. Does the poller error on every internal intermediate error? If not, how about naming it |
No, it only errors for authentication errors. Originally this was a fatal error so that the Beat wouldn't start with bad configuration, now it retries because the auth server can be occasionally down and there's no way to tell a transient error apart from a permanent authentication error. |
PR elastic#21258 introduced a restart mechanism for o365input so that it didn't stop working once a fatal error was found. This updates the restart delay to use a cancellation-context-aware method so that the input doesn't block Filebeat termination. (cherry picked from commit 1abe97b)
PR elastic#21258 introduced a restart mechanism for o365input so that it didn't stop working once a fatal error was found. This updates the restart delay to use a cancellation-context-aware method so that the input doesn't block Filebeat termination. (cherry picked from commit 1abe97b)
PR elastic#21258 introduced a restart mechanism for o365input so that it didn't stop working once a fatal error was found. This updates the restart delay to use a cancellation-context-aware method so that the input doesn't block Filebeat termination. (cherry picked from commit 1abe97b)
* upstream/master: (127 commits) Update obs app links (elastic#21682) fix: update fleet test suite name (elastic#21738) Remove dot from file.extension value in Auditbeat FIM (elastic#21644) Fix leaks with metadata processors (elastic#16349) Add istiod metricset (elastic#21519) [Ingest Manager] Change Sync/Close call order (elastic#21735) [Ingest Manager] Syncing unpacked files (elastic#21706) Fix concurrent map read and write in socket dataset (elastic#21690) Fix conditional coding to remove seccomp info from Winlogbeat (elastic#21652) [Elastic Agent] Fix issue where inputs without processors defined would panic (elastic#21628) Add configuration of filestream input (elastic#21565) libbeat/logp: introduce Logger.WithOptions (elastic#21671) Make o365audit input cancellable (elastic#21647) fix: remove extra curly brace in script (elastic#21692) [Winlogbeat] Remove brittle configuration validation from wineventlog (elastic#21593) Fix function that parses from/to/contact headers (elastic#21672) [CI] Support Windows-2016 in pipeline 2.0 (elastic#21337) Skip publisher flaky tests (elastic#21657) backport: add 7.10 branch (elastic#21635) [CI: Packaging] fix: push ubi8 images too (elastic#21621) ...
PR elastic#21258 introduced a restart mechanism for o365input so that it didn't stop working once a fatal error was found. This updates the restart delay to use a cancellation-context-aware method so that the input doesn't block Filebeat termination. (cherry picked from commit f2ab428)
What does this PR do?
o365input
to perform a cancellable wait when an error causes it to restart.error_retry_interval
for the delay between restarts instead of a hardcoded5m
.Why is it important?
Using
time.Sleep
prevents Filebeat to terminate until the timeout is elapsed.Checklist
[ ] I have made corresponding changes to the documentation[ ] I have made corresponding change to the default configuration files[ ] I have added tests that prove my fix is effective or that my feature works[ ] I have added an entry inCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Related issues
Relates #21258