efficiency: increment_graph_window #5315
It appears what's happening is that for an example like … the graph expansion is running … But even after that there's another factor of two in there somewhere. For my example N=50, so I'd expect … which brings the total to …
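For what it's worth, a back-of-envelope reading of those numbers, assuming the benchmark graph is an all-to-all coupling between two parameterised families of size N (that shape is an assumption here, not quoted from the example):

```latex
% Hedged arithmetic sketch -- assumes an all-to-all graph a<i> => b<j>, i, j = 1..N
\[
  \text{edges} = N^2 = 50^2 = 2500
\]
\[
  \text{window expansions} \approx 2 \times N^2 = 5000
  \qquad \text{(one walk from each end of every edge)}
\]
```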
* There is a significant overhead to instantiating a `TaskProxy` object.
* Avoid doing this where it is not necessary.
* In my test with the example in cylc#5315 with `-s TASKS=15 -s CYCLES=1`, this reduced the time taken by `expand_graph_window` by ~38%, from 12.2s to 7.54s.
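A minimal sketch of the pattern behind that change, with illustrative names rather than the real cylc-flow internals: identify graph-window nodes by a cheap string ID and only hold a heavyweight proxy for tasks that are actually in the pool.

```python
# Illustrative sketch only -- "proxy" here stands in for a TaskProxy-like
# object whose construction is costly; none of these names are cylc-flow APIs.


def node_id(name: str, point: str) -> str:
    """Cheap string identifier -- no proxy object required to compute it."""
    return f"{point}/{name}"


def register_nodes(nodes, data_store, pool):
    """Register graph-window nodes without building proxies for window-only tasks."""
    for name, point in nodes:
        nid = node_id(name, point)
        if nid in data_store:
            continue  # already registered: skip the expensive work entirely
        if nid in pool:
            # An active task: reuse the proxy the pool already holds.
            data_store[nid] = pool[nid]
        else:
            # A window-only ("ghost") node: a lightweight record is enough,
            # so no expensive proxy is ever constructed for it.
            data_store[nid] = {"id": nid, "name": name, "point": point}
```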
I've had a go at writing a graph walker thingy which expands the graph around multiple tasks. This way we can add a batch of tasks, then expand the window around the batch, which would save walking over the same graph edges: https://github.com/oliver-sanders/cylc-flow/pull/new/increment-graph-window

I think I've got vaguely the right edges coming out of it, but I'm having real trouble hooking that into the data store; I'm not really sure where it's going wrong. I think having one function which walks the graph and another which registers nodes/edges should be easier to understand/optimise than two functions which call each other recursively, as I keep getting lost in that logic.

In the meantime we're trying to make the platforms/tokens interfaces more efficient in order to reduce the impact of the large number of function calls.
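A rough sketch of that "expand around a batch" idea with generic graph code (the adjacency mappings and function shape are assumptions, not the linked branch): a single shared visited set means edges common to several batch members are only walked once.

```python
# Generic sketch of a batched, depth-limited graph-window walk.
# `children` and `parents` are assumed adjacency maps: node id -> set of ids.
from collections import deque


def expand_window(batch, children, parents, n_edge_dist=1):
    """Return the (nodes, edges) within n_edge_dist of any task in `batch`."""
    nodes = set(batch)
    edges = set()
    queue = deque((node, 0) for node in batch)  # every batch member is a seed
    while queue:
        node, dist = queue.popleft()
        if dist >= n_edge_dist:
            continue  # window boundary reached along this path
        for neighbours, forwards in ((children, True), (parents, False)):
            for other in neighbours.get(node, ()):
                # Record the edge once, however many batch members touch it.
                edges.add((node, other) if forwards else (other, node))
                if other not in nodes:
                    nodes.add(other)
                    queue.append((other, dist + 1))
    return nodes, edges
```

Adding every task spawned in one scheduler iteration as a single batch would then expand the window once per batch instead of once per task.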
The pool hands over to `cylc-flow/cylc/flow/data_store_mgr.py`, line 1162 (commit `fb63772`).
I think I can refactor the code to put the recursion (graph walk) in the same function; however, the problem is a recursive one, especially with variable window sizes. There are two things I will look into to reduce the CPU cost significantly: …
Great 👍 In fact, we don't even need to run …
Thanks for this info, really helpful.
MB. We used to have a profile-battery to help us spot these things, but sadly we never got around to porting it to Cylc 8. It's been compounded by the platforms/tokens interfaces, which aren't the fastest.
This is definitely the mid/long-term solution. We have to be careful with the cache, though, as it could get quite large if not correctly housekept. It might be worth caching the edge-id with this value, as that re-computation gets expensive. We are also looking into caching in the `Tokens` objects to see if there are any passive gains to be made there.
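As a sketch of the caching idea (the real edge-id and `Tokens` code isn't shown here), a bounded memo covers both points: it avoids the recomputation, and the `maxsize` limit keeps the cache housekept automatically.

```python
# Illustrative only: memoise a derived identifier that is expensive to rebuild.
from functools import lru_cache


@lru_cache(maxsize=100_000)
def edge_id(source_id: str, target_id: str) -> str:
    """Hypothetical stand-in for the costly edge-id string construction."""
    return f"{source_id}=>{target_id}"
```

An LRU bound trades a little recomputation at the margins for a hard memory ceiling, which addresses the housekeeping concern.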
I don't know why we set the …

I think [/hope] if a task isn't in the pool then the additional information the …
Could you give me a quick rundown of what those dicts are doing at some point? I need to understand this stuff better.
An overview of the pruning would help; consider: …
Hope that helps.
Closing this issue as fixed by #5319 (and improved further by #5321, #5325, ...). Thanks all for your work on this one! I think we should be able to get these fixes deployed before anyone runs into the issue for real. Tracking of further enhancements is covered by cylc/cylc-admin#38.
I've just tried out an old Cylc 7 scaling example and found out that Cylc 8 scales disastrously against this benchmark.
The issue occurs when pushing the number of edges in a workflow. This causes the `increment_graph_window` function to do a lot of work, which causes a vast number of non-functional `TaskProxy` objects to be created (and destroyed).

Here's some `--profile` results showing the `increment_graph_window` call stack soaking up CPU:

And here's the example workflow that generated it (I used `-s TASKS=50`):

The underlying culprits are the platforms and tokens methods:
There are things we can do to reduce the overheads of these:
However, the best solution is to avoid calling them in the first place.
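One way to "avoid calling them in the first place" is to resolve expensive, definition-level data once per task name and reuse it across cycle points, rather than once per graph-window node. A hedged sketch with hypothetical names (whether the real platform lookup is genuinely point-independent is an assumption here):

```python
# Hypothetical illustration: hoist an expensive per-name lookup out of the
# per-node loop so it runs at most once per task definition.


def resolve_platform(task_name: str) -> str:
    """Stand-in for a costly platform/config resolution."""
    return "localhost"


def build_window_nodes(names, points):
    platform_by_name = {}  # resolved at most once per task name
    nodes = {}
    for name in names:
        if name not in platform_by_name:
            platform_by_name[name] = resolve_platform(name)
        for point in points:
            nodes[f"{point}/{name}"] = {"platform": platform_by_name[name]}
    return nodes
```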
Pull requests welcome!
This is an Open Source project - please consider contributing a bug fix yourself (please read `CONTRIBUTING.md` before starting any work though).