[8.6] [Fleet] cancel tasks when 3rd retry failed (elastic#147190) (elastic#147230)

# Backport

This will backport the following commits from `main` to `8.6`:
- [[Fleet] cancel tasks when 3rd retry failed (elastic#147190)](elastic#147190)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Source commit: `431c32b894077fc5910380252086442083734fce` by Julia Bardi, committed 2022-12-08T08:14:33Z and merged to `main` in https://github.com/elastic/kibana/pull/147190 (labels: `release_note:skip`, `Team:Fleet`, `v8.7.0`, `v8.6.1`). Backport target `8.6` (label `v8.6.1`): state `NOT_CREATED`.

## Summary

Related to https://github.com/elastic/kibana/issues/144161

Found that when a bulk update tags task failed, the task didn't stop after 3 retries (which should be over in less than a minute); the retries kept happening for 2 hours.
This change removes the retry task once 3 retries are reached.

Also testing in a cloud deployment to see if the tags error can be reproduced with this fix.
I could reproduce the reported error locally, and I see it go away with this fix.

To verify:
- Add at least 50k agents with the `create_agents` script in the kibana repo
- Open Kibana, select the 50k agents, and open Actions / Add tags
- Within a few seconds, add 2 new tags and remove one of them
- Wait about 30s; the agents should reflect the changes
- Check the logs to see that the tasks are removed after the 3rd retry is reached or the task succeeds.
- Check that there are no more running tasks. Any running task can be found in Kibana Console by running this query: `GET .kibana_task_manager/_search?q=task.taskType:"fleet:update_agent_tags:retry"`

Locally simulated an error to test that the retry (and check) task is removed:

```
[2022-12-07T15:52:16.415+01:00][ERROR][plugins.fleet] Retry #3 of task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b failed: failing task
[2022-12-07T15:52:16.416+01:00][WARN ][plugins.fleet] Stopping after 3rd retry. Error: failing task
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:check:848984ab-c11d-4ebe-8d1f-606143dd656b
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b
```

Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
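
The behavior described in the summary (stop after the 3rd failed retry and remove both the retry task and its check task) can be illustrated with a minimal sketch. This is not the Fleet plugin's code: `runWithRetryCap`, `TaskStore`, `RetryResult`, and `MAX_RETRIES` are hypothetical names, and the in-process loop is an assumption standing in for retries that the real change schedules through Kibana's task manager.

```typescript
// Hypothetical sketch: none of these names come from the Fleet plugin.
const MAX_RETRIES = 3;

// Stand-in for whatever stores scheduled tasks (assumed interface).
interface TaskStore {
  remove(taskId: string): Promise<void>;
}

interface RetryResult {
  attempts: number;
  succeeded: boolean;
  removed: string[];
}

async function runWithRetryCap(
  taskId: string,
  checkTaskId: string,
  action: () => Promise<void>,
  store: TaskStore,
  log: (msg: string) => void = () => {}
): Promise<RetryResult> {
  const removed: string[] = [];

  // Remove the check task and the retry task so nothing keeps rescheduling.
  const removeBoth = async (): Promise<void> => {
    for (const id of [checkTaskId, taskId]) {
      log(`Removing task ${id}`);
      await store.remove(id);
      removed.push(id);
    }
  };

  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      await action();
      // Success: clean up and stop retrying.
      await removeBoth();
      return { attempts: attempt, succeeded: true, removed };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      log(`Retry #${attempt} of task ${taskId} failed: ${message}`);
      if (attempt === MAX_RETRIES) {
        // Cap reached: clean up instead of retrying again.
        log(`Stopping after retry #${attempt}. Error: ${message}`);
        await removeBoth();
      }
    }
  }
  return { attempts: MAX_RETRIES, succeeded: false, removed };
}
```

The point of the sketch is that cleanup runs on both exits (success or the final failed attempt), so no retry task is left behind to keep firing.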