
Ideas for more efficient scheduling of very large suites #108

Closed
cylc opened this issue Sep 4, 2012 · 5 comments

@cylc
Collaborator

cylc commented Sep 4, 2012

First, see Issue #107 (more efficient scheduling of large ensembles).

Currently the scheduling algorithm works like this: every task in the suite is represented by its own task proxy. Every time a task proxy changes state (usually after receiving a progress message from the real task), each task proxy registers its completed outputs with a broker, and then each task proxy asks the broker to satisfy its prerequisites, by literal string matching of unsatisfied prerequisites against completed outputs. The broker simply dumps its outputs and starts again each time through the loop. To prevent the task pool from growing indefinitely, spent tasks are removed from the pool as soon as cylc decides they are no longer needed to satisfy any other task's prerequisites.
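For illustration, here is a minimal, hypothetical sketch of that negotiation loop (the class, attribute, and function names are invented, not Cylc's actual code):

```python
# Hypothetical sketch of the current negotiation: the broker is rebuilt on
# every pass through the scheduling loop, and prerequisites are satisfied by
# literal string matching against completed outputs.

class Broker:
    def __init__(self):
        self.completed_outputs = []

    def reset(self):
        # the broker dumps its outputs and starts again each loop pass
        self.completed_outputs = []

    def register(self, task):
        self.completed_outputs.extend(task.completed_outputs)

    def negotiate(self, task):
        # literal string comparison of each unsatisfied prerequisite
        # against every completed output
        task.unsatisfied_prerequisites = [
            p for p in task.unsatisfied_prerequisites
            if not any(p == out for out in self.completed_outputs)
        ]


def scheduling_loop_pass(broker, task_pool):
    broker.reset()
    for task in task_pool:
        broker.register(task)
    for task in task_pool:
        broker.negotiate(task)
```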

@hjoliver
Member

1/ Is literal string matching (prerequisites <-> outputs) very inefficient? We could replace it with numerical hashes. This is probably not important, but I don't know.

2/ Make the broker state persistent (from one scheduling loop cycle to the next, and across restarts) and thereby remove all "succeeded" task proxies from the pool as soon as they have registered their completed outputs (see the sketch after point 4/ below). This would greatly reduce the size of the task pool in large suites that span many cycles, and would let us drop the rather complicated spent-task clean-up algorithm. To allow users to view and interact with finished tasks we could move them from the main task pool to another pool that does not participate in dependency negotiation.

3/ Inner-loop optimization by use of more efficient Python data structures and techniques. I've already done this once, to great effect, but there may be more to do.

4/ If necessary, re-code any inner-loop bottleneck in C - Python has an excellent C API, and a tight loop rewritten in C can be an order of magnitude faster than the pure-Python equivalent.
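As referenced under point 2/, here is a rough, hypothetical sketch of a persistent broker plus a separate "finished" pool (all names invented; this is not Cylc's implementation):

```python
# Hypothetical sketch of point 2/: the broker keeps its completed outputs from
# one loop pass to the next (and could dump them to disk for restarts), so
# "succeeded" proxies can leave the negotiating pool as soon as their outputs
# are registered, while staying viewable in a separate pool.

class PersistentBroker:
    def __init__(self):
        # survives across scheduling loop passes; never reset wholesale
        self.completed_outputs = set()

    def register(self, task):
        self.completed_outputs.update(task.completed_outputs)

    def negotiate(self, task):
        task.unsatisfied_prerequisites -= self.completed_outputs


def loop_pass(broker, main_pool, finished_pool):
    for task in list(main_pool):
        if task.state == "succeeded":
            broker.register(task)
            # keep the proxy around for viewing, but out of negotiation
            main_pool.remove(task)
            finished_pool.append(task)
    for task in main_pool:
        broker.negotiate(task)
```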

@matthewrmshin
Contributor

Literal string matching should already be well optimised in Python: dict and set are implemented as hash tables, so membership tests on strings are hashed lookups rather than linear scans. Given that, it should be quite efficient to implement the persistent broker state as a Python dict or set.
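A tiny illustration of that point, with made-up output messages (not real Cylc message strings):

```python
# persistent broker state as a plain set of completed-output strings
completed_outputs = {
    "foo.2012090400 succeeded",
    "bar.2012090400 finished processing",
}

prerequisite = "foo.2012090400 succeeded"

# hashed lookup: average O(1) however many outputs are stored
if prerequisite in completed_outputs:
    print("prerequisite satisfied")
```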

@hjoliver
Member

hjoliver commented Oct 2, 2012

5/ Now that cylc has a dependency graph specified up front, we could take a step back from completely indiscriminate dependency matching (where each task looks at the outputs of all other tasks): each task need only check the outputs of the tasks that, according to the graph, it depends on.

[UPDATE] - done by #1688
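A rough sketch of what point 5/ could look like (task names, output messages, and the upstream map are invented for illustration):

```python
# upstream dependencies, built once from the parsed suite graph
upstream = {
    "model": {"prep"},
    "post": {"model"},
}

# completed outputs registered so far, keyed by task name
completed_outputs = {
    "prep": {"prep succeeded"},
    "model": set(),
}

def unsatisfied(task_name, prerequisites):
    """Check prerequisites only against outputs of known upstream tasks."""
    relevant = set()
    for parent in upstream.get(task_name, ()):
        relevant |= completed_outputs.get(parent, set())
    return prerequisites - relevant

print(unsatisfied("model", {"prep succeeded"}))   # set(): all satisfied
print(unsatisfied("post", {"model succeeded"}))   # {'model succeeded'}
```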

@ghost assigned hjoliver Oct 5, 2012
@hjoliver
Member

hjoliver commented Nov 19, 2012

6/ Here's an idea that could make cylc really lean and fast. The bulk of memory use is, I suspect, for storing the complete set of [runtime] configuration data (after inheritance expansion) for every task proxy in the suite. Most of this is, by definition, only needed by the individual tasks at runtime and has no bearing on the scheduling at all - but it is potentially a lot to hold in memory for the duration of the suite run, and loading the suite definition at start-up may be slow if we have to fully load all task proxies. So, instead of storing [runtime] data in each task proxy, we could defer loading it (and doing the inheritance expansion) until just prior to job submission, for each task. Runtime parsing and inheritance would then be repeated for every new task instance (or done once, with the result cached on disk), but it would happen for individual tasks in the (soon to be) background job submission worker thread, instead of all up front with everything held in memory. Each task proxy object would then hold, roughly speaking, only the data relevant to the scheduling algorithm - prerequisites and outputs - which is very light. This is also relevant to #170.

[UPDATE] this is superseded by #1689
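For illustration only, a minimal sketch of the deferred-loading idea in point 6/, assuming a simplified linear inheritance chain and an invented on-disk cache location (not Cylc's actual config handling):

```python
import json
from pathlib import Path

CACHE_DIR = Path("runtime_cache")   # hypothetical on-disk cache location

def expand_runtime(suite_config, task_name):
    """Merge [runtime] sections down a (simplified, linear) inheritance chain."""
    merged = {}
    for namespace in suite_config["inheritance"][task_name]:  # root ... task
        merged.update(suite_config["runtime"].get(namespace, {}))
    return merged

def runtime_for_submission(suite_config, task_name):
    """Expand a task's runtime config lazily, just before job submission."""
    CACHE_DIR.mkdir(exist_ok=True)
    cache_file = CACHE_DIR / f"{task_name}.json"
    if cache_file.exists():
        # reuse the previously expanded config for repeat instances of the task
        return json.loads(cache_file.read_text())
    expanded = expand_runtime(suite_config, task_name)
    cache_file.write_text(json.dumps(expanded))
    return expanded
```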

@hjoliver
Member

I think we can close this Issue as superseded. For the points above:

  1. doesn't matter
  2. was investigated in #1428 (Use the run-db to satisfy task prerequisites), with follow-up #1902 (Follow up on persistent broker state)
  3. and 4. obvious things to do if profiling shows a bottleneck
  5. done by #1688 (Large suites: more efficient dependency matching)
  6. superseded by #1689 (Large suites: light weight task proxies)

@matthewrmshin removed this from the later milestone Jun 22, 2016