[task manager] do not sort tasks to be claimed by score if no pinned tasks #80692

Merged
merged 2 commits into from
Oct 22, 2020

pmuellr
Member

@pmuellr pmuellr commented Oct 15, 2020

resolves: #80371

Previously, when claiming tasks, we were always sorting the tasks to claim by
the score and then by the time they should be run. We sort by score to
capture runNow() tasks, also referred to internally as "pinned" tasks
in the update by query.

The change in this PR is to only sort by score if there are pinned tasks, and
to not sort by score at all if there aren't any.
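
A minimal sketch of the conditional sort described above. The function name, the `SortClause` type, and the `task.runAt` field are illustrative assumptions, not the actual `TaskStore` code; the only thing taken from the description is that `_score` leads the sort when pinned tasks exist and is omitted otherwise.

```typescript
// Hypothetical shape of a sort clause: either the literal "_score"
// or a single-field ordering such as { "task.runAt": "asc" }.
type SortClause = '_score' | { [field: string]: 'asc' | 'desc' };

// Build the sort used when claiming tasks (illustrative only).
// `_score` is included only when there are pinned (runNow) task ids;
// with no pinned tasks, candidates are ordered purely by run time.
function buildClaimSort(pinnedTaskIds: string[]): SortClause[] {
  const byRunTime: SortClause = { 'task.runAt': 'asc' }; // assumed field name
  return pinnedTaskIds.length > 0 ? ['_score', byRunTime] : [byRunTime];
}
```

With an empty list of pinned ids the score never enters the ordering, so the oldest due tasks should come first.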

@pmuellr
Member Author

pmuellr commented Oct 15, 2020

It would be nice to add some kind of "load" functional test for this. That could test 2 things:

  • that we don't end up with "zombie" tasks
  • that runNow() tasks still run as soon as possible

The first is tough, as I've only managed to see this in one case with ~100K tasks queued to run. And if we use the default poll interval and worker size, we can only run about 10 workers every 3 seconds (can't remember if we poll earlier if there are workers finishing before the poll interval goes off). Perhaps we need a script to do an ad-hoc test we can run over the course of a few hours.
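
A rough back-of-the-envelope for the numbers above, assuming at most 10 tasks are claimed per 3-second poll and that nothing is claimed between polls (which the comment itself is unsure about). The helper name is hypothetical.

```typescript
// Estimate how long a backlog takes to drain if each poll claims at most
// `workers` tasks and polls happen every `pollIntervalSec` seconds.
// This ignores early polling and task run time, so it is only a coarse bound.
function estimateDrainSeconds(queued: number, workers: number, pollIntervalSec: number): number {
  const polls = Math.ceil(queued / workers);
  return polls * pollIntervalSec;
}

const seconds = estimateDrainSeconds(100000, 10, 3);
console.log(seconds);                     // 30000
console.log((seconds / 3600).toFixed(1)); // 8.3 (hours)
```

Under those assumptions a 100K-task backlog takes on the order of eight hours to work through, which is why an ad-hoc, long-running script seems more practical than a regular functional test.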

The second should be easier, I think. Queue up a bunch of tasks (say 100), then do a runNow(), and it should fire within 2 * the next poll interval (I think). [edit: we already have some here]

@@ -278,6 +279,12 @@ export class TaskStore {
)
);

// the sort should use score first, but only if there are pinned tasks
Contributor

Can we add a comment noting that the score is lower for older tasks, which is the opposite of what TM actually needs? Whenever score is used, TM is likely to pick up newer expired tasks rather than the older ones.

Member Author

I'll add more explanation - but it's not entirely clear to me that the score IS strictly lower for older tasks - I didn't actually go down the rabbit hole to figure out how the scores were calculated :-). That would be my guess as well - that older docs score lower - but sometimes things actually worked out fairly well, so I think there may be more to it than just that. I've been assuming the score is basically a random number at this point, except for the pinned tasks that will score better ...
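
To illustrate the concern, here is a toy model of claiming with and without a score-first ordering. The scores are made up for the example (how Elasticsearch actually scores these documents was not investigated in this thread), and the `Candidate` type and `claim` function are hypothetical.

```typescript
// Toy candidate task: `score` stands in for the search score,
// `runAt` for the time the task became due (smaller = older).
interface Candidate {
  id: string;
  score: number;
  runAt: number;
}

// Return the ids of the first `limit` tasks under the chosen ordering.
// With `useScore`, higher scores win and runAt only breaks ties.
function claim(tasks: Candidate[], limit: number, useScore: boolean): string[] {
  const sorted = [...tasks].sort((a, b) =>
    useScore && a.score !== b.score ? b.score - a.score : a.runAt - b.runAt
  );
  return sorted.slice(0, limit).map((t) => t.id);
}

// Invented scores: the oldest task happens to score lowest.
const backlog: Candidate[] = [
  { id: 'oldest', score: 0.1, runAt: 1 },
  { id: 'newer', score: 0.9, runAt: 2 },
  { id: 'newest', score: 0.5, runAt: 3 },
];

console.log(claim(backlog, 2, true));  // [ 'newer', 'newest' ]
console.log(claim(backlog, 2, false)); // [ 'oldest', 'newer' ]
```

If the backlog keeps refilling with tasks that outscore it, the `oldest` task in this model is never claimed while score leads the sort, which matches the "zombie" behaviour described in the linked issue.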

@gmmorris
Contributor

LG but we should definitely test this, at least at unit level.

@pmuellr
Member Author

pmuellr commented Oct 20, 2020

> LG but we should definitely test this, at least at unit level.

Yup, was planning a jest test. Not quite sure how to build a test for this that can run as a functional test, though.

@pmuellr pmuellr force-pushed the task-manager/old-zombie-tasks branch from d2930d4 to d641462 Compare October 20, 2020 20:34
@kibanamachine
Contributor

💚 Build Succeeded

@pmuellr pmuellr added the v7.11.0, v8.0.0, Feature:Task Manager, Team:ResponseOps, and release_note:skip labels Oct 21, 2020
@pmuellr pmuellr marked this pull request as ready for review October 21, 2020 03:42
@pmuellr pmuellr requested a review from a team as a code owner October 21, 2020 03:42
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

Contributor

@ymao1 ymao1 left a comment

Makes sense. LGTM!

Contributor

@gmmorris gmmorris left a comment

LGTM 👍

@pmuellr pmuellr merged commit 069e842 into elastic:master Oct 22, 2020
pmuellr added a commit to pmuellr/kibana that referenced this pull request Oct 22, 2020
…tasks (elastic#80692)

pmuellr added a commit to pmuellr/kibana that referenced this pull request Oct 22, 2020
…tasks (elastic#80692)

# Conflicts:
#	x-pack/plugins/task_manager/server/task_store.test.ts
pmuellr added a commit that referenced this pull request Oct 22, 2020
…tasks (#80692) (#81495)

pmuellr added a commit that referenced this pull request Oct 22, 2020
…pinned tasks (#80692) (#81497)

Also had to fix a type check error after fixing a merge conflict during the backport:
Labels
backported, Feature:Task Manager, release_note:skip, Team:ResponseOps, v7.10.0, v7.11.0, v8.0.0
Development

Successfully merging this pull request may close these issues.

[Task Manager] old idle task not run when there's a large backlog of idle tasks
5 participants