-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
task_pool: avoid listing tasks where not necessary #5436
Conversation
Nice. Shows there's plenty of scope for optimizing the new system yet. Test fails are just codecov upload.
I'd vote for getting these kinds of changes in as quickly as possible, so 8.1.3 if we do it. This is not strictly a bug fix, but then it's not really a user-facing enhancement or new feature either unless you're running certain kinds of nasty workflow. |
* Re-use the cached task list where appropriate. * Switch to using the task dictionary for "in" comparisons for faster performance.
e8dd7b1
to
caa9045
Compare
Agreed, rebased onto 8.1.x. |
@wxtim please check you're happy that this will work correctly post-reload. |
@@ -128,7 +128,7 @@ def __init__( | |||
self.task_name_list = self.config.get_task_name_list() | |||
self.task_queue_mgr = IndepQueueManager( | |||
self.config.cfg['scheduling']['queues'], | |||
self.config.get_task_name_list(), | |||
self.task_name_list, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between this and self.config.taskdefs
- is this the cached version?
How do you know when to use each?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's what that function is doing:
Lines 1599 to 1605 in 3ad0ca5
def get_task_name_list(self): | |
"""Return a sorted list of all tasks used in the dependency graph. | |
Note: the sort order may effect get_graph_raw ouput. | |
""" | |
return sorted(self.taskdefs) |
The self.task_name_list
is a cached version to avoid having to rebuild and sort
the list every time we want to iterate through it.
When we reload a workflow the list of tasks changes so the cached list will need to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. That's fine then. I thought there was more magic than that!
Some low-hanging fruit spotted whilst investigating #5435
The task_pool code was repeatedly requesting the same sorted list of tasks which was starting to bite for a 7001 task workflow.
I profiled this using the example in #5435 and the [flawed] patch from #5435 (comment). I pressed ctrl+c once the first
b<x>
task has submitted.This diff brings the time taken by the
can_spawn
function down from 16.93s to 0.2046s which is a saving of ~5% of the overall runtime!Question: 8.1.3 or 8.2.0?Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
(andconda-environment.yml
if present).CHANGES.md
entry included if this is a change that can affect users?.?.x
branch.