Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nested subworkflows. #4822

Closed
wants to merge 2 commits into from

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Apr 13, 2022

REQUIRES #4821
Supersedes #4477

A new job shell function to make it easy for users to run and manage subworkflows defined in sub-directories of the main workflow's source directory.

  • these get installed with the main workflow (just like any main workflow source file)
  • the installed subworkflow source is a template to create each subworkflow instance on the fly (one for each main workflow cycle point)
  • the nesting is convenient for:
    • installing the subworkflow template automatically as part of the main workflow
    • intuitive grouping under the main workflow
    • targeting main and subworkflows together, with cylc commands

NOTE:

This is a small change with no associated Issue.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Already covered by existing tests.
  • Does not need tests (why?).
  • Appropriate change log entry included.
  • No change log entry required (why? e.g. invisible to users).
  • (master branch) I have opened a documentation PR at cylc/cylc-doc/pull/XXXX.
  • (7.8.x branch) I have updated the documentation in this PR branch.
  • No documentation update required.

@hjoliver hjoliver added the small label Apr 13, 2022
@hjoliver hjoliver self-assigned this Apr 13, 2022
@hjoliver
Copy link
Member Author

hjoliver commented Apr 13, 2022

Example

$ tree hydro/
hydro/
├── flow.cylc
└── sub-wf
    └── flow.cylc

Main workflow:

[scheduler]
    cycle point format = %Y
[scheduling]
    initial cycle point = 2022
    [[graph]]
        P1Y = "foo => sub => bar"
[runtime]
    [[foo, bar]]
        script = sleep 10
    [[sub]]
        script = cylc__job__subworkflow sub-wf 1/post  # EASY!!

Sub-workflow:

[scheduling]
   [[graph]]
       R1 = "pre => proc => post"
[runtime]
   [[pre, post, proc]]
       script = sleep 10

Install

~/cylc-src/hydro $ cylc install
~/cylc-src/hydro $ tree -L 3 ~/cylc-run/hydro/
/home/oliverh/cylc-run/hydro/
├── _cylc-install
│   └── source -> /home/oliverh/cylc-src/hydro
├── run1
│   ├── flow.cylc
│   ├── log
│   │   └── install
│   └── sub-wf
│       └── flow.cylc
└── runN -> run1

Run

$ cylc play hydro
...
$ cylc scan
hydro/run1 niwa-1007823l:43088
hydro/run1/sub-wf/2026 niwa-1007823l:43079
hydro/run1/sub-wf/2024 niwa-1007823l:43041
hydro/run1/sub-wf/2022 niwa-1007823l:43059
hydro/run1/sub-wf/2025 niwa-1007823l:43020
hydro/run1/sub-wf/2023 niwa-1007823l:43046

$ tree -d -I 'work|log|share' -L 3 ~/cylc-run/hydro/
/home/oliverh/cylc-run/hydro/
├── _cylc-install
│   └── source -> /home/oliverh/cylc-src/hydro
├── run1
│   └── sub-wf   # <----- subworkflow source directory (installed with hydro/run1)
│       ├── 2022   # <----- subworkflow run directories (one for each main cycle point)
│       ├── 2023
│       ├── 2024
│       ├── 2025
│       └── 2026
└── runN -> run1

@hjoliver
Copy link
Member Author

Easy management and housekeeping, e.g.:

$ cylc stop hydro/run1/*  # stop hydro/run1 and any subworkflows
$ cylc clean hydro/run1/*  # clean up subworkflows of hydro/run1

@hjoliver
Copy link
Member Author

Limitations

Subworkflow instances are not installed with cylc install (the template they are created from is, with the main workflow), so:

  • their locations aren't individually configurable via symlink-dirs (however, the whole run-directory will be wherever the main run directory is symlinked to)
  • installation hooks are bypassed, so subworkflows can't use rose-suite.conf...
  • other?

@hjoliver hjoliver added this to the cylc-8.0rc4 milestone Apr 13, 2022
@hjoliver hjoliver mentioned this pull request May 4, 2022
7 tasks
@hjoliver hjoliver marked this pull request as draft May 5, 2022 04:36
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sub workflows are a tricky problem, I completely understand (and support) the desire (I have use cases too), however, I don't think we can "neatly" package them as a solution at this point in time. We would need to develop much more advanced integration until which time users will need to be aware of the implementation details in order to understand how to work with them. For example what do the task statuses mean, how do you shut the workflow(s) down, how to identify/handle graph execution issues in sub-workflows, etc.

So I would be inclined to put this in the docs rather than implement it in the job script until we're ready to make this a formal & supported Cylc feature. Anyone who uses this approach really needs to understand what it is doing. The long-term solution will definitely require a different interface so this isn't an approach we can iterate to perfection.

The difference between parent, parent/sub and parent/controller, parent/sub is marginal. With UID 'parent/*' allows them to be controlled as a block either way around so there isn't really any functional difference.

On the implementation side here are some pros and cons for the detatch/no-detach implementation detail:

No-Detach

pros:

  • task:running means sub-workflow running.
  • task:succeed means sub-workflow:succeeded which provides a handy shortcut for sub-workflow:finish-all which is otherwise hard to do (although if "all" tasks are expected to succeed root:succeed-all => fin is a valid approach)
  • cylc stop --kill and cylc kill will work on the sub-workflows, kinda (workflow [events] could potentially niceify this sort of thing).
  • Potentially compatible-ish with auto retries and cylc trigger (which could resume the workflow run).

cons:

  • Workflow exit codes don't necessarily match data outcome expectations.
  • Not great for branched workflows.
  • Blocks workflow migration.

Detach:

pros:

  • Permits workflow migration.
  • Helps prevent workflow servers getting inundated with sub-workflows by allowing load balancing to kick in.

cons:

  • task:running means sub-workflow task has not yet reached the desired state, however, does not convey whether the sub-workflow is able to reach that state at all (e.g. it could have stalled, or the task failed, this state will not make it back to the parent workflow even if the sub-workflow is configured to shutdown in this eventuality).
  • cylc stop, cylc stop --kill and cylc kill don't really do what the user wants/expects.

Some problems to consider:

  1. How to differentiate "restarting" a sub-workflow from "re-running" a sub-workflow.
  2. How to feed sub-workflow info back to the UI (short to mid term), if we use xtriggers then xtriggers: use foreign task id for workflow_state xtriggers #4582 would work but we can't achieve this with a standalone task (task metadata would be an option but it is static).
  3. How to differentiate sub-workflow not is running / stalled / whatever from targetted task(s) have not reached the desired state yet.
  4. How to integrate with cylc install features, perhaps the sub-workflow installation should be implemented within cylc install (special logic for sub-workflows)
    • symlink-dirs
    • run numbers
    • easy reinstallation?

Comment on lines 379 to 380
# Symlink to the installed flow.cylc after renaming it to avoid
# detection of the template as a workflow (if not already renamed).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed? Otherwise could we build in sub- awareness to the check causing the issue?

# Set subworkflow paths and ID (includes main cycle point):
local SRC_DIR="${CYLC_WORKFLOW_RUN_DIR}/${NAME}"
local RUN_DIR="${SRC_DIR}/${CYLC_TASK_CYCLE_POINT}"
local ID="${RUN_DIR#*/cylc-run/}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CYLC_WORKFLOW_ID?

Comment on lines 391 to 393
cylc workflow-state \
--max-polls=10 --interval=10 \
-p "${DONE%/*}" -t "${DONE#*/}" --status succeeded "$ID"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this will behave with "failed" and "submit-failed" outcomes.

@hjoliver
Copy link
Member Author

hjoliver commented May 5, 2022

Note my final commit overlapped your review, and tweaked the model a bit to report not-running+not-finished as failed.

Some of the concerns above are arguably not important if sub-workflows are (as they really should be) treated as a monolithic task in the main workflow (and of course are appropriate for use in that way) ... in which case, if the sub-workflow fails or stops without finishing for any reason, then it is "failed" and retriggering it in the main workflow should run it again from scratch (although I was experimenting with restarts too without commenting on that).

However, I generally agree. Particularly that anyone using this approach must understand exactly what the implications are) and had also been thinking about the various points above.

Plus: #4821 (comment)

And: this shell function could just as well be a workflow bin script, so I'll go that way for now, for my current sub-workflow users.

So, closing this PR !!!!

@hjoliver hjoliver closed this May 5, 2022
@hjoliver hjoliver removed this from the cylc-8.0rc4 milestone May 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants