Not strictly a bug, but some extreme optimization is required in Cylc 8 for pathological workflows where a huge number of tasks hit the active window at once.
Here the scheduler has to spawn 7000 tasks all at once, off of a:succeed, which takes ~5 minutes on my fairly powerful laptop, during which time the scheduler is unresponsive.
Initial profiling results from @oliver-sanders show:
- primarily, the datastore n-window computation is responsible
- secondarily (much less time), each spawned task needs a database read to get its submit number and flow info
OK, I got bored after 20 minutes or so and cut the run off at that point. FYI, if you Ctrl+C your workflow, the profile.prof file still gets generated.
The spawn_on_output function itself took 0.1562s.
The increment_graph_window function in the data store took 1139s (including its resulting calls).
So it's the data store, not the task pool. The increment_graph_window function was called 4,325 times, but called itself recursively 30,276,650 times, which is where the CPU gets soaked up.
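For anyone wanting to pull the same two numbers (top-level calls versus recursive calls) out of a profile file, here is a minimal sketch using the standard library `pstats` module. The `expand` function is a toy stand-in for a recursive graph walk, not Cylc code, and `profile_to_file` / `call_counts` are hypothetical helper names. In `pstats` text output a recursive function is listed as `total/primitive` calls, and the same pair is available programmatically.

```python
import cProfile
import pstats
from pathlib import Path


def expand(depth: int) -> int:
    """Toy recursive walk (hypothetical stand-in, not Cylc code)."""
    if depth == 0:
        return 1
    return expand(depth - 1) + expand(depth - 1)


def profile_to_file(path: Path, top_level_calls: int = 5) -> None:
    """Profile a few top-level calls and dump the stats to a file."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(top_level_calls):
        expand(10)
    profiler.disable()
    profiler.dump_stats(str(path))


def call_counts(path: Path, func_name: str) -> tuple[int, int, float]:
    """Return (primitive_calls, total_calls, cumulative_seconds).

    Primitive calls exclude recursion, so they correspond to the
    top-level calls; total calls include every recursive call.
    """
    stats = pstats.Stats(str(path))
    for (_file, _line, name), (cc, nc, _tt, ct, _callers) in stats.stats.items():
        if name == func_name:
            return cc, nc, ct
    raise KeyError(func_name)
```

Calling `call_counts(Path("profile.prof"), "increment_graph_window")` on a real profile should give the primitive and total call counts quoted above, assuming the function name appears unchanged in the profile.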
1. If possible, batch together the increment_graph_window / task spawning to reduce the number of top-level calls to increment_graph_window.
   - Would require heavy refactoring; the function is designed to expand the graph around one task at a time.
   - Potential savings ~4000x.
2. Come up with a more efficient approach to increment_graph_window.
   - I.e. remember which nodes we have already visited to avoid repeat visits.
   - Potential savings somewhere between 750x and 43,000x depending on the impact of batching.
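To illustrate both ideas together, here is a minimal sketch on a toy graph. It is not the data store implementation: the graph is a plain adjacency dict, and `naive_window` and `batched_window` are hypothetical names. The naive version expands around one task at a time with no memory of visited nodes; the batched version does a single breadth-first walk from all newly spawned tasks at once and expands each node at most once.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Undirected adjacency: each node lists its parents and children.
Graph = Dict[str, List[str]]


def naive_window(graph: Graph, node: str, n: int, counter: List[int]) -> Set[str]:
    """Recursively collect nodes within n edges of node, revisiting freely."""
    counter[0] += 1  # count every call, including recursive ones
    found = {node}
    if n > 0:
        for nbr in graph.get(node, ()):
            found |= naive_window(graph, nbr, n - 1, counter)
    return found


def batched_window(graph: Graph, sources: Iterable[str], n: int) -> Tuple[Set[str], int]:
    """Multi-source BFS: union of the n-windows of all sources.

    Returns the window and the number of node expansions performed.
    """
    dist: Dict[str, int] = {}
    queue: deque = deque()
    for src in sources:
        if src not in dist:
            dist[src] = 0
            queue.append(src)
    visits = 0
    while queue:
        node = queue.popleft()
        visits += 1
        if dist[node] == n:
            continue  # at the window edge, do not expand further
        for nbr in graph.get(node, ()):
            if nbr not in dist:  # remember visited nodes
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist), visits


def star_graph(size: int) -> Graph:
    """One parent 'a' with `size` children, like tasks triggered off a:succeed."""
    tasks = [f"t{i}" for i in range(size)]
    graph: Graph = {"a": list(tasks)}
    for task in tasks:
        graph[task] = ["a"]
    return graph
```

On a star graph with N children and n=2, the naive approach makes N(N+2) calls across all N tasks, while the batched walk expands N+1 nodes. The real function also does per-node bookkeeping that this sketch ignores, so the actual saving would depend on how much of that work can be shared.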
The end result of these increment_graph_window calls is 30,392,831 detokenise calls, but that's not really the culprit here. There are 7000 tasks and 7000 dependencies, so there should only be a need for about 14,000 detokenise calls, which means we're calling the interface ~2000 times more often than we should be. If we can make detokenise faster, great, but reducing the number of calls is where the order-of-magnitude improvements we need will come from.
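As a quick check of the ratios, using only the counts quoted in this thread (the variable names are just for illustration):

```python
# Counts quoted from the profiling run above.
top_level_calls = 4_325
recursive_calls = 30_276_650
detokenise_calls = 30_392_831
expected_detokenise = 7_000 + 7_000  # one per task plus one per dependency

# Average recursive calls triggered by each top-level call.
recursion_per_call = recursive_calls / top_level_calls

# How many times more detokenise calls were made than the estimate.
detokenise_excess = detokenise_calls / expected_detokenise

print(f"recursive calls per top-level call: {recursion_per_call:.0f}")
print(f"detokenise calls vs expected: {detokenise_excess:.0f}x")
```

So each top-level call fans out into roughly 7000 recursive calls, about one per task, and the detokenise count is roughly 2000 times the estimate.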