Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task_pool: avoid listing tasks where not necessary #5436

Merged
merged 1 commit into from
Mar 31, 2023

Conversation

oliver-sanders
Copy link
Member

@oliver-sanders oliver-sanders commented Mar 28, 2023

Some low-hanging fruit spotted whilst investigating #5435

The task_pool code was repeatedly requesting the same sorted list of tasks which was starting to bite for a 7001 task workflow.

  • Re-use the cached task list where appropriate.
  • Switch to using the task dictionary for "in" comparisons for faster performance.

I profiled this using the example in #5435 and the [flawed] patch from #5435 (comment). I pressed ctrl+c once the first b<x> task has submitted.

This diff brings the time taken by the can_spawn function down from 16.93s to 0.2046s which is a saving of ~5% of the overall runtime!

Question: 8.1.3 or 8.2.0?

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@oliver-sanders oliver-sanders added the efficiency For notable efficiency improvements label Mar 28, 2023
@oliver-sanders oliver-sanders self-assigned this Mar 28, 2023
@oliver-sanders
Copy link
Member Author

Before

Screenshot from 2023-03-28 16-50-03

After

Screenshot from 2023-03-28 16-50-17

@hjoliver
Copy link
Member

Nice.

Shows there's plenty of scope for optimizing the new system yet.

Test fails are just codecov upload.

Question: 8.1.3 or 8.2.0?

I'd vote for getting these kinds of changes in as quickly as possible, so 8.1.3 if we do it.

This is not strictly a bug fix, but then it's not really a user-facing enhancement or new feature either unless you're running certain kinds of nasty workflow.

* Re-use the cached task list where appropriate.
* Switch to using the task dictionary for "in" comparisons
  for faster performance.
@oliver-sanders oliver-sanders changed the base branch from master to 8.1.x March 29, 2023 09:19
@oliver-sanders
Copy link
Member Author

Agreed, rebased onto 8.1.x.

@oliver-sanders oliver-sanders added this to the cylc-8.1.3 milestone Mar 29, 2023
@oliver-sanders
Copy link
Member Author

@wxtim please check you're happy that this will work correctly post-reload.

@@ -128,7 +128,7 @@ def __init__(
self.task_name_list = self.config.get_task_name_list()
self.task_queue_mgr = IndepQueueManager(
self.config.cfg['scheduling']['queues'],
self.config.get_task_name_list(),
self.task_name_list,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between this and self.config.taskdefs - is this the cached version?

How do you know when to use each?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what that function is doing:

cylc-flow/cylc/flow/config.py

Lines 1599 to 1605 in 3ad0ca5

def get_task_name_list(self):
"""Return a sorted list of all tasks used in the dependency graph.
Note: the sort order may effect get_graph_raw ouput.
"""
return sorted(self.taskdefs)

The self.task_name_list is a cached version to avoid having to rebuild and sort the list every time we want to iterate through it.

When we reload a workflow the list of tasks changes so the cached list will need to be updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. That's fine then. I thought there was more magic than that!

@wxtim wxtim merged commit 3d8caa1 into cylc:8.1.x Mar 31, 2023
@oliver-sanders oliver-sanders deleted the can_spawn_optimisation branch March 31, 2023 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
efficiency For notable efficiency improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants