[8.6] [Fleet] cancel tasks when 3rd retry failed (elastic#147190) (elastic#147230)

# Backport

This will backport the following commits from `main` to `8.6`:
- [[Fleet] cancel tasks when 3rd retry failed
(elastic#147190)](elastic#147190)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

### Source commit

- Author: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
- Committed: 2022-12-08T08:14:33Z
- SHA: 431c32b894077fc5910380252086442083734fce
- Source PR: https://github.com/elastic/kibana/pull/147190 (merged to `main`, label `v8.7.0`)
- Labels: `release_note:skip`, `Team:Fleet`, `v8.7.0`, `v8.6.1`
- Target branch: `8.6` (label `v8.6.1`, backport PR state `NOT_CREATED`)

### Summary

Related to https://github.com/elastic/kibana/issues/144161

Found that when a bulk update tags task fails, the task didn't stop after 3 retries (which should be over in less than a minute); the retries kept happening for 2 hours.
This change removes the retry task once 3 retries are reached.

Also testing in a cloud deployment to see if the tags error can be reproduced with this fix.
I could reproduce the reported error locally, and it goes away with this fix.

To verify:
- Add at least 50k agents with the `create_agents` script in the kibana repo
- Open Kibana, select the 50k agents, and open Actions / Add tags
- Within a few seconds, add 2 new tags and remove one of them
- Wait about 30s; the agents should reflect the changes
- Check the logs to confirm that the tasks are removed once the 3rd retry is reached or the action succeeds
- Check that there are no more running tasks. Any running task can be found in Kibana Console by running this query: `GET .kibana_task_manager/_search?q=task.taskType:"fleet:update_agent_tags:retry"`

Locally simulated an error to test that the retry (and check) task is removed:

```
[2022-12-07T15:52:16.415+01:00][ERROR][plugins.fleet] Retry #3 of task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b failed: failing task
[2022-12-07T15:52:16.416+01:00][WARN ][plugins.fleet] Stopping after 3rd retry. Error: failing task
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:check:848984ab-c11d-4ebe-8d1f-606143dd656b
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b
```
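
The "no more running tasks" check from the verification steps could be scripted roughly as follows. This is a minimal sketch, not part of the commit: `buildTaskSearchPath`, `remainingTaskCount`, and the `SearchResponse` type are hypothetical names, and the sketch only assumes the usual Elasticsearch search response shape, where `hits.total` is either a number or an object with a `value` field.

```typescript
// Hypothetical helper types and functions; none of these names come from the commit.

// The only part of an Elasticsearch search response this sketch reads.
interface SearchResponse {
  hits: { total: number | { value: number; relation: string } };
}

// Builds the request path for the query given in the verification steps,
// URL-encoding the query string so the quotes and colons survive transport.
function buildTaskSearchPath(taskType: string): string {
  const query = `task.taskType:"${taskType}"`;
  return `.kibana_task_manager/_search?q=${encodeURIComponent(query)}`;
}

// Returns how many matching task documents the search reported.
function remainingTaskCount(response: SearchResponse): number {
  const total = response.hits.total;
  return typeof total === 'number' ? total : total.value;
}
```

After the fix, a response for `fleet:update_agent_tags:retry` would be expected to report zero matching tasks once the action has either succeeded or hit its third retry.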

Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
kibanamachine and juliaElastic authored Dec 8, 2022
1 parent c8c3f5f commit fd4a6fc
Showing 1 changed file with 7 additions and 0 deletions.
x-pack/plugins/fleet/server/services/agents/action_runner.ts (7 additions, 0 deletions)

@@ -113,6 +113,13 @@ export abstract class ActionRunner {
         if (this.retryParams.retryCount === 3) {
           const errorMessage = 'Stopping after 3rd retry. Error: ' + error.message;
           appContextService.getLogger().warn(errorMessage);
+
+          // clean up tasks after 3rd retry reached
+          await Promise.all([
+            this.bulkActionsResolver!.removeIfExists(this.checkTaskId!),
+            this.bulkActionsResolver!.removeIfExists(this.retryParams.taskId!),
+          ]);
+
           return;
         }
       } else {
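
The cleanup added in this diff can be sketched in isolation to show the intended behavior: once the retry count reaches 3, both the retry task and its companion check task are removed, and removing a task that is already gone is harmless. This is a simplified illustration, not the actual Fleet code. `TaskRemover`, `InMemoryTaskStore`, and `stopIfRetriesExhausted` are hypothetical names; only `removeIfExists` and the `retryCount === 3` condition are taken from the diff.

```typescript
// Hypothetical stand-in for the resolver in the diff; only removeIfExists is modeled.
interface TaskRemover {
  removeIfExists(taskId: string): Promise<void>;
}

// In-memory fake so the sketch runs without Kibana's task manager (an assumption).
class InMemoryTaskStore implements TaskRemover {
  private readonly tasks = new Set<string>();
  readonly removed: string[] = [];

  schedule(taskId: string): void {
    this.tasks.add(taskId);
  }

  has(taskId: string): boolean {
    return this.tasks.has(taskId);
  }

  async removeIfExists(taskId: string): Promise<void> {
    // Removing a missing task is a no-op, which keeps the cleanup idempotent.
    if (this.tasks.delete(taskId)) {
      this.removed.push(taskId);
    }
  }
}

const MAX_RETRIES = 3; // mirrors the `retryCount === 3` check in the diff

// Returns true when the retry limit was reached and both tasks were cleaned up.
async function stopIfRetriesExhausted(
  store: TaskRemover,
  retryCount: number,
  retryTaskId: string,
  checkTaskId: string
): Promise<boolean> {
  if (retryCount !== MAX_RETRIES) {
    return false; // keep retrying; nothing is removed yet
  }
  // Remove the check task and the retry task concurrently, as the diff does.
  await Promise.all([
    store.removeIfExists(checkTaskId),
    store.removeIfExists(retryTaskId),
  ]);
  return true;
}
```

Removing both tasks in one `Promise.all` means the function returns only after neither task can fire again, which is the property the 2-hour retry loop described in the summary was missing.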