Fix task manager polling flow controls #153491
@elasticmachine merge upstream
… into task-manager/fix-polling
…-ref HEAD~1..HEAD --fix'
… into task-manager/fix-polling
Pinging @elastic/response-ops (Team:ResponseOps)
@@ -400,7 +400,6 @@ kibana_vars=(
 xpack.securitySolution.prebuiltRulesPackageVersion
 xpack.spaces.maxSpaces
 xpack.task_manager.max_attempts
-xpack.task_manager.max_poll_inactivity_cycles
Does this need to go through the deprecation / breaking change process?
The setting isn't documented so I figured we could simply remove it. I could change it to a no-op so it doesn't become breaking for anyone. Thoughts?
@jbudz ^^
I'll leave it up to the team - not familiar with the setting and whether its use has been advised in any production scenarios.
💚 Build Succeeded
To update your PR or re-run it, just comment with: cc @mikecote
LGTM! Created a bunch of rules and saw them running as expected. Verified the ES timeouts as per PR instructions as well.
Fixes #151938
In this PR, I'm rewriting the Task Manager poller so it doesn't run concurrently when timeouts occur, while also fixing the issue where polling requests would pile up when polling takes time. To support this, I've also made the following changes:
- Removed the xpack.task_manager.max_poll_inactivity_cycles setting.
- The search and updateByQuery functions have no retries. This prevents the request from retrying 5x whenever a timeout occurs, which caused each call to take up to 2 1/2 minutes before Kibana saw the error (now down to 30s each). We have polling to manage retries in these situations.
- sinon for faking timers
- assertStillInSetup checks on plugin setup. Felt like a maintenance burden that wasn't necessary to fix with my code changes.

The main code changes are within these files (to review thoroughly so the polling cycle doesn't suddenly stop):
?w=1)

To verify
Tips:
How to slow down search for claimed task queries
How to slow down update by queries
Not the cleanest way, but you'll see occasional request timeouts from the updateByQuery calls. I had more luck creating rules running every 1s.
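
For readers who want a picture of the behavior described above, here is a minimal sketch of a poller that never overlaps its own cycles: each cycle awaits the work (including a rejected, timed-out request) before the next one is scheduled, so slow polls cannot pile up. This is not the Task Manager code from this PR. `createSequentialPoller`, `delayUntilNextPoll`, `PollWork`, and the other names are hypothetical, and `onError` is assumed not to throw.

```typescript
// Hypothetical names throughout; an illustration, not the code in this PR.
type PollWork = () => Promise<void>;

interface SequentialPoller {
  start(): void;
  stop(): void;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Time left in the current interval once a cycle has finished (never negative).
function delayUntilNextPoll(intervalMs: number, elapsedMs: number): number {
  return Math.max(intervalMs - elapsedMs, 0);
}

function createSequentialPoller(
  work: PollWork,
  intervalMs: number,
  onError: (error: unknown) => void
): SequentialPoller {
  // A fresh token per start() lets a stopped loop notice it is no longer current.
  let activeToken: object | undefined;
  // Chaining on the previous loop keeps a restart from overlapping in-flight work.
  let previousLoop: Promise<void> = Promise.resolve();

  async function loop(token: object): Promise<void> {
    while (activeToken === token) {
      const startedAt = Date.now();
      try {
        // The next cycle is only scheduled after this promise settles.
        await work();
      } catch (error) {
        // A failed or timed-out cycle is reported; the next cycle is the retry.
        onError(error);
      }
      if (activeToken !== token) break;
      await sleep(delayUntilNextPoll(intervalMs, Date.now() - startedAt));
    }
  }

  return {
    start() {
      if (activeToken) return; // already running
      const token = {};
      activeToken = token;
      previousLoop = previousLoop.then(() => loop(token));
    },
    stop() {
      activeToken = undefined;
    },
  };
}
```

When a cycle takes longer than the interval, the computed delay is zero and the next cycle starts right after the previous one settles, rather than alongside it.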