future task globbing #5763

hjoliver · 2023-10-06T03:18:45Z

On current master, task commands (such as hold and trigger):

can match in the task pool by globbing on point and name (task and family)
can only target "future" tasks (i.e., not in the task pool) by specific task ID (not even family names)

We don't need to support future task point globs: that would be dangerous, and rarely if ever useful (cylc trigger "*/*" - yikes! ... although if it turns out there is a valid case for this, I guess we could require explicit opt-in and stop at the runahead limit).

But we do need to support family names, and name globbing, for future tasks. If I want to trigger or hold a bunch of upcoming tasks, having to target each one individually is painful and unnecessary. Easy to do:

match the complete runtime name hierarchy against the given task/family name glob pattern
discard matching task names that aren't valid for the given cycle point
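
A minimal sketch of the two steps above, for illustration only. It is not Cylc's implementation: RUNTIME_PARENTS, TASKS_AT_POINT and match_future_tasks are hypothetical names, and the data is made up. It assumes shell-style globbing as provided by Python's fnmatch module.

```python
from fnmatch import fnmatchcase

# Hypothetical data: task name -> its family ancestors in the runtime hierarchy.
RUNTIME_PARENTS = {
    "foo": ["FAM", "root"],
    "bar": ["FAM", "root"],
    "baz": ["root"],
}

# Hypothetical data: cycle point -> task names the graph defines at that point.
TASKS_AT_POINT = {
    "1": {"foo", "baz"},
    "2": {"foo", "bar", "baz"},
}


def match_future_tasks(point, pattern):
    """Return sorted task names valid at `point` that match `pattern`.

    A task matches if its own name, or the name of any of its family
    ancestors, matches the glob pattern.
    """
    # Step 1: match the whole runtime name hierarchy against the pattern.
    matched = {
        name
        for name, families in RUNTIME_PARENTS.items()
        if any(fnmatchcase(n, pattern) for n in [name, *families])
    }
    # Step 2: discard names that are not valid at the given cycle point.
    return sorted(matched & TASKS_AT_POINT.get(point, set()))
```

With this toy data, match_future_tasks("1", "FAM") selects only foo, because bar is a member of FAM but does not exist at cycle point 1.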

Ping @oliver-sanders - I can't see any good reason not to do this, but I have a vague recollection that you have reservations?

The reason might have been (?) that in the cylc hold case, the future "tasks-to-hold" list can potentially cause a memory leak (e.g. you might hold "future" tasks that are actually in the past, so they stay in the list forever)? But even so, adding multiple tasks-to-hold at once, by globbing, is no different in principle than doing the same thing one task at a time. And with #5750 we can release them all again just as easily.


oliver-sanders · 2023-10-06T13:54:07Z

Not reservations so much as complications and a lack of mental bandwidth to flesh out the details.

The history:

Cylc 7 globbed over the SoS task pool.
This actually worked rather well, as the task pool typically contained the set of tasks you wanted to operate on for most operations.
The SoD task pool of Cylc 8 is much smaller so less useful.
We discussed this at the time, globbing over the task pool was the only simple solution.

Globbing over future tasks can probably be done, but it gets tricky as:

Not all tasks are guaranteed to be run in Cylc 8 (graph branching no longer requires explicit removal).
There may be an infinite number of future tasks to glob over.
Flows might influence the tasks you want to glob over (e.g. you might not want to select tasks which would cause flow merging when the operation is performed).

If we're not careful then cylc trigger * would cause an infinite number of simultaneous submissions and cylc remove * would remove all tasks in the database, essentially deleting the workflow. This will require a bit of thought which we simply do not have the time to fit in on the 8.3.0 timescale so bumped this to 8.x.

There are multiple duplicates of this issue which I think should be closed as superseded e.g:

family hold beyond n=0 #5695
cylc show for future tasks #5677
Improved cylc release task matching. #5752 (potentially)
Probably others...

I've argued on these issues for a consistent approach employed in the central globbing interface rather than trying to implement this on a command by command basis (which you may have mistaken for objection?).

Note, some commands (e.g. trigger) auto-insert tasks if not found in the pool. I think hold maintains a list of tasks which were requested to be held.

See also:

Clearer message for task not found in n=0 #5678

hjoliver · 2023-10-07T03:34:59Z

Not all tasks are guaranteed to be run in Cylc 8 (graph branching no longer requires explicit removal).

Yes, this creates a potential (but likely small) memory leak for cylc hold (but not for cylc trigger). I think the solution to this is just to make it easy to see what the future-hold list contains, and easy to remove tasks from it.

There may be an infinite number of future tasks to glob over.

That's why I said above we don't need to (and probably should not!) support cycle point globbing. But globbing over task and family names at particular cycle points will be needed.
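
As a rough illustration of that restriction (a sketch, not Cylc's ID parser: split_future_task_id and GLOB_CHARS are hypothetical names, and the simple "<cycle>/<name>" form is an assumption), a future-task ID could be split and rejected up front if the cycle point part contains glob characters:

```python
# Characters that start a shell-style glob construct (assumed glob syntax).
GLOB_CHARS = set("*?[")


def split_future_task_id(task_id):
    """Split '<cycle>/<name-pattern>' and refuse a globbed cycle point.

    The name part may contain globs; the cycle point part may not, so the
    number of candidate future tasks stays finite.
    """
    point, sep, name_pattern = task_id.partition("/")
    if not sep or not point or not name_pattern:
        raise ValueError(f"expected <cycle>/<name>, got {task_id!r}")
    if GLOB_CHARS & set(point):
        raise ValueError(
            f"cycle point globs are not allowed for future tasks: {task_id!r}"
        )
    return point, name_pattern
```

Here "2/FAM*" is accepted and split into its point and name pattern, while "*/*" raises ValueError before any matching is attempted.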

Flows might influence the tasks you want to glob over

Not sure this is a problem anymore. #5698 already makes hold/release flow-specific.

For triggering you can assign flows (or use the default).

If we're not careful then cylc trigger * would cause an infinite number of simultaneous submissions and cylc remove * would remove all tasks in the database, essentially deleting the workflow.

Agreed, gotta be careful with this, but as above I'm suggesting we do NOT support future cycle-point glob - so infinity ceases to be a problem!

I've argued on these issues for a consistent approach employed in the central globbing interface rather than trying to implement this on a command by command basis (which you may have mistaken for objection?).

Yes I think that's what I was vaguely recalling, makes sense.

I do agree with that point in general, of course, but only so far as the central globbing interface does actually apply to different commands. See for instance #5752 - cylc release <future-tasks> does not need to consider all possible future tasks, it only needs to consider those that have already been held.
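
To make the release case concrete, here is a hedged sketch (match_held and the held set are hypothetical, not Cylc internals, and family names are ignored for brevity). Because the set of held future tasks is finite, matching against it is bounded even if the cycle point is globbed as well:

```python
from fnmatch import fnmatchcase


def match_held(held, point_pattern, name_pattern):
    """Return the held (point, name) pairs matching both glob patterns.

    Only tasks already recorded as held are considered, so the search
    never has to enumerate possible future tasks.
    """
    return sorted(
        (point, name)
        for point, name in held
        if fnmatchcase(point, point_pattern) and fnmatchcase(name, name_pattern)
    )


# Hypothetical list of future tasks that were previously held.
held = {("3", "foo"), ("3", "bar"), ("4", "foo")}
```

For example, match_held(held, "3", "*") returns the two held tasks at cycle point 3, and match_held(held, "*", "foo") returns the held foo tasks at points 3 and 4.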

This will require a bit of thought which we simply do not have the time to fit in on the 8.3.0 timescale so bumped this to 8.x.

Yes, we're pretty strapped at the moment. But the reason I have raised this and related issues is that they are already coming up as support problems, and in important contexts.

One of Tom C's problems on the forum, for example: if a family has internal dependencies it is currently very difficult to hold all members at once. cylc hold FAM holds current members in the pool (but if they're running already, that doesn't stop child members from submitting - another issue raised recently); and all the other members have to be held individually as future tasks (not even by family name).

hjoliver · 2023-10-07T03:38:48Z

There are multiple duplicates of this issue which I think should be closed as superseded e.g:

Yes sorry about that, I wanted to get this one up and didn't have time to find all related ones (they're not exact duplicates, this is more general)... I'll take a look ASAP and close them if possible.

oliver-sanders · 2023-10-16T14:05:35Z

See also the closely related #5416 which is about selecting the start-tasks of a cycle which is essentially a subset of future task selection.

Kinds of task selection:

Select a single task.
- E.G. cylc hold <cycle>/<task>
Glob over task pool.
- E.G. cylc trigger *:failed.
Select cycle start tasks (warm start like logic)
- E.G. cylc play --start-tasks='<start-cycle>/*'
Select future tasks
- E.G. cylc hold '<cycle>/<family>'.

Also consider the possibility of historical task selection (e.g. cylc remove). Note that historical and future don't have a simple meaning in Cylc 8 as the past of one flow may be the future of another and vice versa.

Now to try and work out a way to convey that in a consistent way that makes sense and doesn't break existing interfaces...

retro486 · 2023-11-09T21:39:24Z

I'm not familiar with SoD (spawn on demand) or SoS (spawn on submit) and I'm still reading through the Spawn on Demand Proposal, so I apologize if this is totally out of left field, but can spawned tasks inherit certain attributes from their parents, "parent" here being either the cycle point or family used in the selection? Something like an overall state (active, held, killed, etc.), so that when new tasks are spawned they inherit that attribute, and some init code checks that state to determine what should be done (i.e., nothing if active, hold if held, or despawn if killed).

hjoliver · 2023-11-10T00:36:26Z

@retro486 - good for you, reading the SoD proposal!! It was really not aimed at Cylc users, so don't feel bad if it was hard to follow! [Also, ~2 years post implementation it is now somewhat out of date].

so I apologize if this is totally out of left field, but can spawned tasks inherit certain attributes from their parents, "parent" here being either the cycle point or family used in the selection?

Unfortunately the word "parent" is now "overloaded" in coding parlance. It can mean one of two entirely orthogonal concepts:

parent family, in the [runtime] hierarchy (for inheritance of task config settings)
upstream tasks in the graph, e.g. in foo => bar we say foo and bar have a parent and child relationship (no inheritance here, except of flow number)

When talking about anything to do with the scheduling algorithm, it'll be the latter concept.

[downstream tasks] hold if held,

We have an as-yet unresolved discussion in the team on whether or not held tasks should spawn held children. At the moment, they do not. (In Cylc 8 a running task can be "held", which means it will not submit another job even if it fails and has retries lined up - but if it generates any outputs while running, they may still spawn tasks that are not themselves held).

[downstream tasks] or despawn if killed

SoD means "spawn on demand" - if you kill a task, there's no need to "despawn" downstream tasks that depend on its success, because they will not have been spawned in the first place.

retro486 · 2023-11-10T02:47:40Z

@retro486 - good for you, reading the SoD proposal!! It was really not aimed at Cylc users, so don't feel bad if it was hard to follow! [Also, ~2 years post implementation it is now somewhat out of date].

I think I got the gist; it was clear enough about the differences in how tasks spawn in Cylc 8, and now I have a better understanding of the behavior I'm seeing.

Unfortunately the word "parent" is now "overloaded" in coding parlance. It can mean one of two entirely orthogonal concepts:

parent family, in the [runtime] hierarchy (for inheritance of task config settings)

upstream tasks in the graph, e.g. in foo => bar we say foo and bar have a parent and child relationship (no inheritance here, except of flow number)

When talking about anything to do with the scheduling algorithm, it'll be the latter concept.

Got it.

We have an as-yet unresolved discussion in the team on whether or not held tasks should spawn held children. At the moment, they do not. (In Cylc 8 a running task can be "held", which means it will not submit another job even if it fails and has retries lined up - but if it generates any outputs while running, they may still spawn tasks that are not themselves held).

Ok so it sounds like this isn't an issue of "how can we" but "should we". I would say from my own use that in testing suites, troubleshooting runs, etc, it would be very helpful to be able to re-run tasks that have previously succeeded without spawning a whole new flow. From what I could tell, there isn't a way to do that right now.

So at this point I think I see where the issue is: if tasks that were paused spawn paused children, that could cause the suite to stall, which may not be what we want. I think a better description of what I'm looking for would be the ability to re-run successful tasks (in a new flow?) without spawning the rest of the downstream tasks.

Something like:

cylc trigger --flow=test_flow --no-spawn <task pattern>

This might be too specific for this particular issue, sorry about that...

hjoliver · 2023-11-10T02:52:37Z

No, all good, and we can do what you need already:

future tasks (that have never run): use cylc trigger --flow=none (the task will run, but it won't spawn any downstream activity, and it will have no effect on upcoming flows)
past tasks (that already ran): use cylc trigger (the default is to assign to current flows, which means the task will run because you told it to, but it won't spawn downstream activity because current flows already passed by there)

How's that?

(we should switch this back to discourse - it's a bit off topic now for this issue)

hjoliver · 2023-11-18T21:53:41Z

Superseded by #5827

hjoliver added this to the cylc-8.3.0 milestone Oct 6, 2023

hjoliver added the could be better Not exactly a bug, but not ideal. label Oct 6, 2023

oliver-sanders added sod-follow-up question Flag this as a question for the next Cylc project meeting. labels Oct 6, 2023

oliver-sanders modified the milestones: cylc-8.3.0, cylc-8.x Oct 6, 2023

oliver-sanders mentioned this issue Oct 16, 2023

easier way to select cycle start tasks #5416

Open

hjoliver added the superseded label Nov 18, 2023

hjoliver mentioned this issue Nov 18, 2023

Command task matching requirements #5827

Open

9 tasks

hjoliver closed this as completed Nov 18, 2023

hjoliver removed this from the cylc-8.x milestone Nov 18, 2023
