easier way to select cycle start tasks #5416

Open
Tracked by #5474
oliver-sanders opened this issue Mar 17, 2023 · 10 comments

@oliver-sanders
Member

oliver-sanders commented Mar 17, 2023

Meeting 2023-03-16

When a workflow is first run Cylc uses the following logic to determine the tasks which are initially added to the pool:

  • Tasks with no parents.
  • With cycle points on or after the initial cycle point.
  • Subject to the runahead limit.
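As a rough illustration of the three rules above, here is a minimal Python sketch. It is not Cylc's actual implementation: `TaskDef`, `initial_pool`, the integer cycle points and the runahead limit expressed as a simple count of cycles are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDef:
    """Hypothetical stand-in for a task definition (not Cylc's real class)."""
    name: str
    parents: frozenset = frozenset()  # names of upstream tasks


def initial_pool(taskdefs, cycle_points, initial_cycle_point, runahead_limit):
    """Return (cycle, name) pairs selected by the three startup rules."""
    # Rule 2: only cycle points on or after the initial cycle point.
    valid_points = sorted(p for p in cycle_points if p >= initial_cycle_point)
    # Rule 3: runahead limit, simplified here to "this many cycles ahead
    # of the first valid cycle".
    active_points = valid_points[: runahead_limit + 1]
    # Rule 1: only tasks with no parents.
    return [
        (point, td.name)
        for point in active_points
        for td in taskdefs
        if not td.parents
    ]


taskdefs = [TaskDef("a"), TaskDef("b", frozenset({"a"})), TaskDef("x")]
print(initial_pool(taskdefs, [1, 2, 3, 4, 5], 2, 1))
# [(2, 'a'), (2, 'x'), (3, 'a'), (3, 'x')]
```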

If we want to start from the beginning in a new flow, say because the workflow has completed and we want to re-run it, Cylc does not currently possess any switches that let the user replicate this initial startup logic. Instead, the user must work out which tasks they want to (re)start the workflow with and manually specify them as start-tasks.

What we could do with is something with functionality along the lines of:

$ cylc play --start-tasks='*' --start-cycle='<cycle>' --flow=new
@oliver-sanders oliver-sanders added this to the cylc-8.2.0 milestone Mar 17, 2023
@hjoliver
Member

> When a workflow is first run Cylc uses the following logic to determine the tasks which are initially added to the pool:

We can also start from a specified cycle point, or from specified tasks in the graph. But only for a first run from scratch, not a restart.

@hjoliver
Member

hjoliver commented Mar 17, 2023

I have another possible use case, for restarting the same workflow run from a different point (although not the start of the graph).

Consider a DR (disaster recovery) situation with some workflows that need to fail over and fail back between two platforms. To avoid housekeeping, or adapting the system to new run directories each time, it may be convenient to restart the same workflow runs each time you fail over, but from a new point in the graph corresponding to the latest sync point from the other platform (not from wherever the workflow got to the last time it ran on this platform).

(On the other hand, housekeeping with cylc clean '*' is pretty damned easy, so maybe a start from scratch would be sensible).

@oliver-sanders
Member Author

This could also apply when triggering a new flow e.g:

# start a new flow at a specified start cycle e.g:
# * I changed the configuration and want to re-run everything from a specified cycle
#   onwards...
# * I fixed the data and want to rerun the workflow from that cycle on...
#
# ...but I don't want to specify all the cycle-start tasks because the graph is complicated
# or I didn't write it, or I don't want to manually specify all of the start tasks.
cylc trigger --flow=new --start-cp=<cycle> <workflow>

I've just received a use case for something like:

# start a new flow at a specified cycle which stops at a specified cycle e.g:
# * I changed the configuration and want to re-run a specified cycle range.
# * I changed the data and want to re-run a specified cycle range.
cylc trigger --flow=new --start-cp=<cycle1> --stop-cp=<cycle2> <workflow>

@oliver-sanders oliver-sanders changed the title from "play: easier way to warm-restart" to "play: easier way to select cycle start tasks" Jun 19, 2023
@oliver-sanders
Member Author

oliver-sanders commented Jun 19, 2023

Updated the issue title to reflect the more general problem of selecting the cycle start tasks (i.e. the default cold-start logic).

This functionality could relate to:

  • cylc play - e.g. restart a completed workflow from cycle start tasks
  • cylc trigger - e.g. trigger cycle start tasks [in a new flow]
  • cylc set - e.g. trigger cycle start tasks [in a new flow] or set cycle prereqs [in a new flow]

At a push it could also relate to:

  • cylc hold/resume - i.e. an equivalent to "hold after cycle point".

@oliver-sanders oliver-sanders changed the title from "play: easier way to select cycle start tasks" to "easier way to select cycle start tasks" Jun 21, 2023
@oliver-sanders
Member Author

oliver-sanders commented Jul 5, 2023

Along with selecting cycle-start tasks, there are some use cases for the similar problem of selecting family-start tasks.

E.G. I want to trigger a new flow starting from this family.

At present we can do cylc trigger <workflow>//<cycle>/<family> --flow=new but this will only operate on tasks already in the pool so doesn't really work. Similar to selecting cycle-start tasks, we would only want to trigger the family-start tasks in this case.

It's exactly the same logic: cycles and families are essentially both groupings of tasks, and the solution is the same in both cases, i.e. iterate over the tasks in the group to find the parentless ones, applying the pre-initial condition (i.e. ignoring any edges entering the group from outside*).
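A minimal sketch of that group-start selection, assuming the graph is available as plain `(parent, child)` pairs of task IDs. The function name `group_start_tasks` and the data layout are hypothetical, not anything that exists in Cylc:

```python
def group_start_tasks(group, edges):
    """Return the members of `group` that have no parent inside `group`.

    `edges` is an iterable of (parent, child) task-ID pairs. Edges that
    enter the group from outside are ignored, which is the "pre-initial"
    treatment described above.
    """
    members = set(group)
    has_internal_parent = {
        child
        for parent, child in edges
        if parent in members and child in members
    }
    return sorted(members - has_internal_parent)


# Cycle 2 as the group: the inter-cycle edge 1/foo => 2/foo is ignored,
# so 2/foo counts as a start task while 2/bar does not.
edges = [("1/foo", "2/foo"), ("2/foo", "2/bar")]
print(group_start_tasks({"2/foo", "2/bar"}, edges))
# ['2/foo']
```

The same function works for a family if `group` is the set of family members rather than the tasks of one cycle.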

I think this might be especially useful for rose-stem use cases where users may want to trigger a rebuild for a certain platform, and re-run everything downstream. This is not as simple as triggering a single task.

Generalising this to the extreme, we could also consider applying the same logic to arbitrary groups i.e. globs e.g. *build.
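For the glob case, a sketch along the same lines could first expand the pattern into a group and then keep only the members with no parent inside that group. This uses Python's standard `fnmatch` module for matching; the function name, task names and edge list are made up for illustration, and Cylc's own ID matching may behave differently.

```python
from fnmatch import fnmatchcase


def glob_start_tasks(pattern, task_names, edges):
    """Return start tasks of the group of task names matching `pattern`."""
    # Expand the glob into a concrete group of task names.
    members = {name for name in task_names if fnmatchcase(name, pattern)}
    # Drop any member with a parent that is also in the group; edges
    # coming from outside the group are ignored.
    with_parent_in_group = {
        child
        for parent, child in edges
        if parent in members and child in members
    }
    return sorted(members - with_parent_in_group)


names = ["install", "fcm_build", "exe_build", "test"]
edges = [
    ("install", "fcm_build"),
    ("fcm_build", "exe_build"),
    ("exe_build", "test"),
]
print(glob_start_tasks("*build", names, edges))
# ['fcm_build']
```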

@oliver-sanders oliver-sanders changed the milestone from cylc-8.2.0 to cylc-8.3.0 Jul 11, 2023
@hjoliver
Member

hjoliver commented Jul 12, 2023

> At present we can do cylc trigger <workflow>//<cycle>/<family> --flow=new but this will only operate on tasks already in the pool so doesn't really work.

Huh, definitely not the way it should work.

@oliver-sanders
Member Author

To proceed we need a proposal for how this functionality is going to be exposed on the CLI / in the schema which considers the use cases mentioned above.

See also this cylc-admin proposal point (9): https://github.com/cylc/cylc-admin/blob/master/docs/proposal-interventions.md#9-i-want-to-run-a-cycle-of-tasks-ahead-or-behind-of-the-flow

@MetRonnie
Member

MetRonnie commented Aug 10, 2023

We could look at consolidating the --startcp and --start-task options for cylc play, seeing as the latter needs a relative ID that contains the cycle point anyway.

Possibly add an alias to the --start-task option called --start and let it accept --start=<cycle> or --start='<cycle>/*'?
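To make the idea concrete, here is a small sketch of how a combined value might be parsed, treating a bare `<cycle>` as shorthand for `<cycle>/*`. The `--start` option is only a suggestion in this thread, and `parse_start` is a hypothetical helper, not existing Cylc code:

```python
def parse_start(value):
    """Split '<cycle>' or '<cycle>/<task-glob>' into (cycle, task_glob)."""
    cycle, sep, task = value.partition("/")
    if not cycle:
        raise ValueError(f"missing cycle point in {value!r}")
    # A bare cycle (or a trailing slash) means "all start tasks in the cycle".
    return cycle, (task if sep and task else "*")


print(parse_start("20230317T00Z"))      # ('20230317T00Z', '*')
print(parse_start("20230317T00Z/*"))    # ('20230317T00Z', '*')
print(parse_start("20230317T00Z/foo"))  # ('20230317T00Z', 'foo')
```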

@oliver-sanders
Member Author

Note, possible conflict with #5763 which would like <cycle>/* to mean all tasks in cycle, not just the start tasks of cycle.

@oliver-sanders
Member Author

In Cylc 8 we no longer support the behaviour where restart deleted the previous database, effectively running an implicit cylc remove '*'. So (contrary to my OP) I'm not sure whether it makes the best sense to try shimming this into cylc play.

E.G. if this command is run on a workflow which has already run and been stopped:

$ cylc play --start-tasks='<cycle>/*'

It would actually:

  • Continue the original "flow-front" evolved from the original task pool.
  • And add a new "flow-front" starting from the selected cycle.

So one command, but two things. IMO it's simpler if play just means "continue" as originally intended and doesn't imply any additional task pool mangling. We probably don't want to encourage the notion that this functionality requires a restart, which caused so many problems with Cylc 7.

It might be conceptually simpler to expose this functionality through cylc set alone:

$ # insert the start tasks from <cycle> but don't trigger them
$ # i.e. allow xtriggers to hold them back as appropriate
$ cylc set <cycle> --pre=all [--flow=new]  # add --flow=new if re-running tasks

Which is kinda similar to cylc reset --state=waiting, only it creates a new task instance rather than recycling an old one, and won't interfere with n=0 tasks.
