efficiency: increment_graph_window #5315
It appears what's happening is that for an example like … the graph expansion is running … But even after that there's another factor of two in there somewhere. For my example N=50, so I'd expect … which brings the total to …
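For what it's worth, a back-of-envelope reading of those numbers, assuming the benchmark graph is an all-to-all coupling between two parameterised families of size N (that shape is an assumption here, not quoted from the example):

```latex
% Hedged arithmetic sketch -- assumes an all-to-all graph a<i> => b<j>, i, j = 1..N
\[
  \text{edges} = N^2 = 50^2 = 2500
\]
\[
  \text{window expansions} \approx 2 \times N^2 = 5000
  \qquad \text{(one walk from each end of every edge)}
\]
```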
* There is a significant overhead to instantiating a `TaskProxy` object.
* Avoid doing this where it is not necessary.
* In my test with the example in cylc#5315 with `-s TASKS=15 -s CYCLES=1`, this reduced the time taken by `expand_graph_window` by ~38%, from 12.2s to 7.54s.
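A minimal sketch of the pattern behind that change, with illustrative names rather than the real cylc-flow internals: identify graph-window nodes by a cheap string ID and only hold a heavyweight proxy for tasks that are actually in the pool.

```python
# Illustrative sketch only -- "proxy" here stands in for a TaskProxy-like
# object whose construction is costly; none of these names are cylc-flow APIs.


def node_id(name: str, point: str) -> str:
    """Cheap string identifier -- no proxy object required to compute it."""
    return f"{point}/{name}"


def register_nodes(nodes, data_store, pool):
    """Register graph-window nodes without building proxies for window-only tasks."""
    for name, point in nodes:
        nid = node_id(name, point)
        if nid in data_store:
            continue  # already registered: skip the expensive work entirely
        if nid in pool:
            # An active task: reuse the proxy the pool already holds.
            data_store[nid] = pool[nid]
        else:
            # A window-only ("ghost") node: a lightweight record is enough,
            # so no expensive proxy is ever constructed for it.
            data_store[nid] = {"id": nid, "name": name, "point": point}
```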
I've had a go at writing a graph walker thingy which expands the graph around multiple tasks. This way we can add a batch of tasks, then expand the window around the batch, which would save walking over the same graph edges: https://github.com/oliver-sanders/cylc-flow/pull/new/increment-graph-window

I think I've got vaguely the right edges coming out of it, but I'm having real trouble hooking that into the data store; I'm not really sure where it's going wrong. I think having one function which walks the graph and another which registers nodes/edges should be easier to understand/optimise than two functions which call each other recursively, as I keep getting lost in that logic.

In the meantime we're trying to make the platforms/tokens interfaces more efficient in order to reduce the impact of the large number of function calls.
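A rough sketch of that "expand around a batch" idea with generic graph code (the adjacency mappings and function shape are assumptions, not the linked branch): a single shared visited set means edges common to several batch members are only walked once.

```python
# Generic sketch of a batched, depth-limited graph-window walk.
# `children` and `parents` are assumed adjacency maps: node id -> set of ids.
from collections import deque


def expand_window(batch, children, parents, n_edge_dist=1):
    """Return the (nodes, edges) within n_edge_dist of any task in `batch`."""
    nodes = set(batch)
    edges = set()
    queue = deque((node, 0) for node in batch)  # every batch member is a seed
    while queue:
        node, dist = queue.popleft()
        if dist >= n_edge_dist:
            continue  # window boundary reached along this path
        for neighbours, forwards in ((children, True), (parents, False)):
            for other in neighbours.get(node, ()):
                # Record the edge once, however many batch members touch it.
                edges.add((node, other) if forwards else (other, node))
                if other not in nodes:
                    nodes.add(other)
                    queue.append((other, dist + 1))
    return nodes, edges
```

Adding every task spawned in one scheduler iteration as a single batch would then expand the window once per batch instead of once per task.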
The pool hands over to `cylc-flow/cylc/flow/data_store_mgr.py`, line 1162 (commit `fb63772`).
I think I can refactor the code to put the recursion (graph walk) in the same function; however, the problem is a recursive one, especially with variable window sizes. There are two things I will look into to reduce the CPU cost significantly: …
Great 👍 In fact, we don't even need to run …
Thanks for this info, really helpful.
MB. We used to have a profile-battery to help us spot these things, but sadly we never got around to porting it to Cylc 8. It's been compounded by the platforms/tokens interfaces, which aren't the fastest.
This is definitely the mid/long-term solution. We have to be careful with the cache, though, as it could get quite large if not correctly housekept. It might be worth caching the edge-id with this value, as that re-computation gets expensive. We are also looking into caching in the `Tokens` objects to see if there are any passive gains to be made there.
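As a sketch of the caching idea (the real edge-id and `Tokens` code isn't shown here), a bounded memo covers both points: it avoids the recomputation, and the `maxsize` limit keeps the cache housekept automatically.

```python
# Illustrative only: memoise a derived identifier that is expensive to rebuild.
from functools import lru_cache


@lru_cache(maxsize=100_000)
def edge_id(source_id: str, target_id: str) -> str:
    """Hypothetical stand-in for the costly edge-id string construction."""
    return f"{source_id}=>{target_id}"
```

An LRU bound trades a little recomputation at the margins for a hard memory ceiling, which addresses the housekeeping concern.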
I don't know why we set the …

I think [/hope] if a task isn't in the pool then the additional information the …
Could you give me a quick rundown of what those dicts are doing at some point? I need to understand this stuff better.
An overview of the pruning would help; consider: …
Hope that helps.
Closing this issue as fixed by #5319 (and improved further by #5321, #5325, ...). Thanks all for your work on this one! I think we should be able to get these fixes deployed before anyone runs into the issue for real. Tracking of further enhancements is covered by cylc/cylc-admin#38.
I've just tried out an old Cylc 7 scaling example and found out that Cylc 8 scales disastrously against this benchmark.
The issue occurs when pushing the number of edges in a workflow. This causes the `increment_graph_window` function to do a lot of work, which causes a vast number of non-functional `TaskProxy` objects to be created (and destroyed).

Here's some `--profile` results showing the `increment_graph_window` call stack soaking up CPU:

And here's the example workflow that generated it (I used `-s TASKS=50`):

The underlying culprits are the platforms and tokens methods:
There are things we can do to reduce the overheads of these:
However, the best solution is to avoid calling them in the first place.
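One way to "avoid calling them in the first place" is to resolve expensive, definition-level data once per task name and reuse it across cycle points, rather than once per graph-window node. A hedged sketch with hypothetical names (whether the real platform lookup is genuinely point-independent is an assumption here):

```python
# Hypothetical illustration: hoist an expensive per-name lookup out of the
# per-node loop so it runs at most once per task definition.


def resolve_platform(task_name: str) -> str:
    """Stand-in for a costly platform/config resolution."""
    return "localhost"


def build_window_nodes(names, points):
    platform_by_name = {}  # resolved at most once per task name
    nodes = {}
    for name in names:
        if name not in platform_by_name:
            platform_by_name[name] = resolve_platform(name)
        for point in points:
            nodes[f"{point}/{name}"] = {"platform": platform_by_name[name]}
    return nodes
```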
Pull requests welcome!
This is an Open Source project - please consider contributing a bug fix yourself (please read `CONTRIBUTING.md` before starting any work though).