Support nested subworkflows. #4822
Conversation
Example

$ tree hydro/
hydro/
├── flow.cylc
└── sub-wf
    └── flow.cylc

Main workflow:

[scheduler]
    cycle point format = %Y
[scheduling]
    initial cycle point = 2022
    [[graph]]
        P1Y = "foo => sub => bar"
[runtime]
    [[foo, bar]]
        script = sleep 10
    [[sub]]
        script = cylc__job__subworkflow sub-wf 1/post  # EASY!!

Sub-workflow:

[scheduling]
    [[graph]]
        R1 = "pre => proc => post"
[runtime]
    [[pre, post, proc]]
        script = sleep 10

Install

~/cylc-src/hydro $ cylc install
~/cylc-src/hydro $ tree -L 3 ~/cylc-run/hydro/
/home/oliverh/cylc-run/hydro/
├── _cylc-install
│ └── source -> /home/oliverh/cylc-src/hydro
├── run1
│ ├── flow.cylc
│ ├── log
│ │ └── install
│ └── sub-wf
│ └── flow.cylc
└── runN -> run1

Run

$ cylc play hydro
...
$ cylc scan
hydro/run1 niwa-1007823l:43088
hydro/run1/sub-wf/2026 niwa-1007823l:43079
hydro/run1/sub-wf/2024 niwa-1007823l:43041
hydro/run1/sub-wf/2022 niwa-1007823l:43059
hydro/run1/sub-wf/2025 niwa-1007823l:43020
hydro/run1/sub-wf/2023 niwa-1007823l:43046
$ tree -d -I 'work|log|share' -L 3 ~/cylc-run/hydro/
/home/oliverh/cylc-run/hydro/
├── _cylc-install
│ └── source -> /home/oliverh/cylc-src/hydro
├── run1
│ └── sub-wf # <----- subworkflow source directory (installed with hydro/run1)
│ ├── 2022 # <----- subworkflow run directories (one for each main cycle point)
│ ├── 2023
│ ├── 2024
│ ├── 2025
│ └── 2026
└── runN -> run1
Easy management and housekeeping, e.g.:
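For instance, because every sub-workflow instance sits under the main workflow's run directory and shares its ID prefix, the whole set can plausibly be targeted at once. The commands below are an illustrative sketch only (the IDs come from the install tree above; exact glob support depends on the Cylc version):

# stop all sub-workflow instances of this run (glob usage is illustrative)
cylc stop 'hydro/run1/sub-wf/*'

# house-keep a finished instance
cylc clean hydro/run1/sub-wf/2022

# see what is still running
cylc scan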
Limitations

Subworkflow instances are not installed with
Sub-workflows are a tricky problem. I completely understand (and support) the desire (I have use cases too); however, I don't think we can "neatly" package them as a solution at this point in time. We would need to develop much more advanced integration; until then, users will need to be aware of the implementation details in order to understand how to work with them. For example: what do the task statuses mean, how do you shut the workflow(s) down, how do you identify and handle graph execution issues in sub-workflows, etc.
So I would be inclined to put this in the docs rather than implement it in the job script until we're ready to make this a formal and supported Cylc feature. Anyone who uses this approach really needs to understand what it is doing. The long-term solution will definitely require a different interface, so this isn't an approach we can iterate to perfection.
The difference between parent, parent/sub and parent/controller, parent/sub is marginal. With UID globbing, 'parent/*' allows them to be controlled as a block either way around, so there isn't really any functional difference.
On the implementation side, here are some pros and cons of the detach/no-detach implementation detail:
No-Detach
pros:
- task:running means sub-workflow running.
- task:succeed means sub-workflow:succeeded, which provides a handy shortcut for sub-workflow:finish-all, which is otherwise hard to do (although if "all" tasks are expected to succeed, root:succeed-all => fin is a valid approach).
- cylc stop --kill and cylc kill will work on the sub-workflows, kinda (workflow [events] could potentially niceify this sort of thing).
- Potentially compatible-ish with auto retries and cylc trigger (which could resume the workflow run).
cons:
- Workflow exit codes don't necessarily match data outcome expectations.
- Not great for branched workflows.
- Blocks workflow migration.
Detach:
pros:
- Permits workflow migration.
- Helps prevent workflow servers getting inundated with sub-workflows by allowing load balancing to kick in.
cons:
- task:running means the targeted sub-workflow task(s) have not yet reached the desired state; however, it does not convey whether the sub-workflow is able to reach that state at all (e.g. it could have stalled, or the task failed; this state will not make it back to the parent workflow even if the sub-workflow is configured to shut down in this eventuality).
- cylc stop, cylc stop --kill and cylc kill don't really do what the user wants/expects.
Some problems to consider:
- How to differentiate "restarting" a sub-workflow from "re-running" a sub-workflow.
- How to feed sub-workflow info back to the UI (short to mid term); if we use xtriggers then #4582 (xtriggers: use foreign task id for workflow_state xtriggers) would work (see the sketch after this list), but we can't achieve this with a standalone task (task metadata would be an option, but it is static).
- How to differentiate a sub-workflow that is not running / stalled / whatever from targeted task(s) that have not yet reached the desired state.
- How to integrate with cylc install features; perhaps the sub-workflow installation should be implemented within cylc install (special logic for sub-workflows):
  - symlink-dirs
  - run numbers
  - easy reinstallation?
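For illustration only, the xtrigger route mentioned in the first bullet might look roughly like this in the main workflow, assuming one sub-workflow run per main cycle point, named after that point as in the example at the top of this PR. The ID template, the xtrigger name and the wiring to bar are assumptions for the sketch, not something this PR implements:

[scheduling]
    cycle point format = %Y
    [[xtriggers]]
        # hypothetical: watch task "post" (integer point 1) in the sub-workflow
        # run named after the current main cycle point; quoting and the exact
        # ID format may need adjusting for a given Cylc version
        sub_done = workflow_state(workflow="hydro/run1/sub-wf/%(point)s", task=post, point=1)
    [[graph]]
        P1Y = """
            foo => sub
            @sub_done => bar
        """

Whether the xtrigger copes gracefully with a run database that does not exist yet (i.e. before the sub task has launched that instance) would need checking.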
cylc/flow/etc/job.sh (outdated)

# Symlink to the installed flow.cylc after renaming it to avoid
# detection of the template as a workflow (if not already renamed).
Is this still needed? Otherwise could we build in sub-
awareness to the check causing the issue?
# Set subworkflow paths and ID (includes main cycle point):
local SRC_DIR="${CYLC_WORKFLOW_RUN_DIR}/${NAME}"
local RUN_DIR="${SRC_DIR}/${CYLC_TASK_CYCLE_POINT}"
local ID="${RUN_DIR#*/cylc-run/}"
CYLC_WORKFLOW_ID?
cylc/flow/etc/job.sh (outdated)
cylc workflow-state \
    --max-polls=10 --interval=10 \
    -p "${DONE%/*}" -t "${DONE#*/}" --status succeeded "$ID"
I'm not sure how this will behave with "failed" and "submit-failed" outcomes.
Note my final commit overlapped your review, and tweaked the model a bit to report not-running + not-finished as failed.
Some of the concerns above are arguably not important if sub-workflows are (as they really should be) treated as a monolithic task in the main workflow (and of course are appropriate for use in that way) ... in which case, if the sub-workflow fails or stops without finishing for any reason, then it is "failed", and retriggering it in the main workflow should run it again from scratch (although I was experimenting with restarts too, without commenting on that).
However, I generally agree, particularly that anyone using this approach must understand exactly what the implications are, and I had also been thinking about the various points above. Plus: #4821 (comment)
And: this shell function could just as well be a workflow bin script, so I'll go that way for now, for my current sub-workflow users. So, closing this PR!!!!
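For the record, a rough sketch of what that bin-script version could look like, pieced together from the job.sh fragments quoted above. The script name, the plain copy of flow.cylc, the polling numbers and the detach-then-poll behaviour are assumptions for illustration, not necessarily what the eventual script does:

#!/usr/bin/env bash
# <workflow>/bin/run-subworkflow (hypothetical name)
# Usage from a task: run-subworkflow <sub-dir> <cycle>/<task>
set -euo pipefail

NAME="$1"    # sub-workflow source sub-directory, e.g. sub-wf
DONE="$2"    # target task to wait for, e.g. 1/post

# One sub-workflow run directory per main cycle point, nested under the
# installed source (mirrors the paths quoted from job.sh above).
SRC_DIR="${CYLC_WORKFLOW_RUN_DIR}/${NAME}"
RUN_DIR="${SRC_DIR}/${CYLC_TASK_CYCLE_POINT}"
ID="${RUN_DIR#*/cylc-run/}"

# Put the sub-workflow definition in place. The PR symlinked a renamed
# flow.cylc to stop the template being detected as a workflow itself;
# a plain copy is used here for simplicity.
mkdir -p "${RUN_DIR}"
cp "${SRC_DIR}/flow.cylc" "${RUN_DIR}/flow.cylc"

# Start the sub-workflow (detached) and poll for the target task;
# the poll count and interval here are illustrative only.
cylc play "${ID}"
cylc workflow-state \
    --max-polls=10 --interval=10 \
    -p "${DONE%/*}" -t "${DONE#*/}" --status succeeded "${ID}"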
REQUIRES #4821
Supersedes #4477
A new job shell function to make it easy for users to run and manage subworkflows defined in sub-directories of the main workflow's source directory.
NOTE:
This is a small change with no associated Issue.
Requirements check-list
- I have read CONTRIBUTING.md and added my name as a Code Contributor.
- Applied any dependency changes to both setup.cfg and conda-environment.yml.