Fix flow merge into no-flow task. #4645
This wasn't difficult to fix once I understood the problem, so bumping up to 8.0rc1.
(Note: this is not currently compatible with #4640.) Edit: now resolved.
- unheld -> not-held in cylc dump
- spawn_successor -> spawn_successor_if_parentless
    if completed_only:
        c_task.state.satisfy_me({
            (str(itask.point), itask.tdef.name, output)
        })
        self.data_store_mgr.delta_task_prerequisite(c_task)
This branch isn't covered, need to manually test...
I've run the test from #4651 against both flow.cylc and suite.rc variants and got the same result, happy.
Something is wrong with code coverage here. The new functional test here should hit this...
... brute force test: I put a print statement in there and it does get executed. 🤔
Any idea what could cause coverage to miss this?
I think the coverage is accurate in this case. I put a LOG.critical("Here") at L1273 and it didn't show up in the workflow log for tests/functional/spawn-on-demand/14-trigger-flow-blocker.t
No, completed_only is True for retroactive spawning on completed outputs (when an ongoing flow merges with a one-off manually-triggered task, thus making it part of the ongoing flow ... i.e. what this PR is about). In back-compat mode, completed_only is False and we spawn on all outputs ahead of time (before they are completed) to approximate Cylc 7 spawning of tasks before they are needed.
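To make the two modes described above concrete, here is a minimal toy sketch in Python. It is not the Cylc implementation: `ToyTask`, `spawn_children` and the `children` mapping are hypothetical names invented for illustration. Only the `completed_only` flag, the `satisfy_me` call and the `(point, name, output)` triple are taken from the diff under review.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical stand-in for a child task proxy (not the Cylc class)."""
    name: str
    # Prerequisites are (point, parent_name, output) triples, as in the diff.
    prerequisites: set
    satisfied: set = field(default_factory=set)

    def satisfy_me(self, outputs):
        """Mark any prerequisites matched by the given outputs as satisfied."""
        self.satisfied |= self.prerequisites & set(outputs)

    def is_ready(self):
        """True once every prerequisite has been satisfied."""
        return self.satisfied == self.prerequisites


def spawn_children(point, parent_name, completed, all_outputs, children,
                   completed_only=True):
    """Toy spawning of the children of one parent task.

    completed_only=True: retroactive spawning on completed outputs only,
    satisfying the matching prerequisite straight away.
    completed_only=False: back-compat style, spawn on every output ahead
    of time without satisfying anything.
    """
    spawned = []
    outputs = completed if completed_only else all_outputs
    for output in outputs:
        for child in children.get(output, []):
            if completed_only:
                child.satisfy_me({(str(point), parent_name, output)})
            spawned.append(child)
    return spawned
```

In this toy model, a parent that has completed only its `succeeded` output spawns just the child waiting on that output when `completed_only=True`, and that child is immediately ready to run. With `completed_only=False`, children of all outputs are spawned but none has its prerequisite satisfied yet.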
> I think the coverage is accurate in this case. I put a LOG.critical("Here") at L1273 and it didn't show up in the workflow log for tests/functional/spawn-on-demand/14-trigger-flow-blocker.t
Strange. I just tried the same, to double check, and it does show up in my workflow log. (I did have to disable the purge in the reftest shell function, to keep the log for inspection).
(Maybe the test is not actually working correctly in some environments, namely yours and GitHub actions ... I'll investigate later...)
Had a play but I can't get this branch to activate either.
Test fixed. It was flaky.
Fixes the reported bug; however, it is missing a DB check, which causes case (4) to fail in the #4651 tests.
Rebased and deconflicted, post merge of #4640.
Not needed - see #4651 (comment)
OK, with my linked explanation of the controversial (Note also: |
I think this is needed to maintain some consistency / logic; see #4651 (comment)
This does not work across a restart:
[scheduling]
    initial cycle point = 1
    final cycle point = 5
    cycling mode = integer
    runahead limit = P1
    [[graph]]
        P1 = x
[runtime]
    [[x]]
        script = """
            if (( CYLC_TASK_CYCLE_POINT == 1 )); then
                cylc trigger "${CYLC_WORKFLOW_ID}//4/x"
            elif (( CYLC_TASK_CYCLE_POINT == 3 )); then
                cylc stop "${CYLC_WORKFLOW_ID}" --now --now
                contact="${HOME}/cylc-run/${CYLC_WORKFLOW_ID}/.service/contact"
                while [[ -f "${contact}" ]]; do
                    sleep 0.1
                done
                cylc play "${CYLC_WORKFLOW_ID}"
            fi
        """
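The reproducer above waits for the scheduler to shut down by polling until the contact file disappears, with no upper bound on the wait. For anyone scripting the same check from Python, a minimal sketch might look like the following. The function name, parameters and timeout are assumptions for illustration and are not part of Cylc; only the polling idea and the 0.1 s interval come from the shell loop.

```python
import time
from pathlib import Path


def wait_until_removed(path, poll_interval=0.1, timeout=60.0):
    """Poll until `path` no longer exists.

    Returns True if the file disappeared, False if `timeout` seconds
    elapsed first. Hypothetical helper, not a Cylc API.
    """
    deadline = time.monotonic() + timeout
    while Path(path).exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Unlike the bare `while` loop, the timeout means a scheduler that never stops makes the check fail rather than hang.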
Log from the second run:
2022-02-04T12:20:38Z INFO - Scheduler: url=tcp://...:43024/ pid=50836
2022-02-04T12:20:38Z INFO - Workflow publisher: url=tcp://vld601.cmpd1.metoffice.gov.uk:43020
2022-02-04T12:20:38Z INFO - Run: (re)start=1 log=1
2022-02-04T12:20:38Z INFO - Cylc version: 8.0rc1.dev
2022-02-04T12:20:38Z INFO - Run mode: live
2022-02-04T12:20:38Z INFO - Initial point: 1
2022-02-04T12:20:38Z INFO - Final point: 5
2022-02-04T12:20:38Z INFO - [3/x running(runahead) job:01 flows:1] => running
2022-02-04T12:20:40Z INFO - [3/x running job:01 flows:1] (polled)started at 2022-02-04T12:20:35Z
2022-02-04T12:20:40Z INFO - [3/x running job:01 flows:1] health: execution timeout=None, polling intervals=PT15M,...
2022-02-04T12:20:40Z INFO - [4/x waiting(runahead) job:01 flows:1] (polled)succeeded at 2022-02-04T12:20:28Z
2022-02-04T12:20:40Z INFO - [4/x waiting(runahead) job:01 flows:1] => succeeded(runahead)
2022-02-04T12:20:41Z INFO - [3/x running job:01 flows:1] (received)succeeded at 2022-02-04T12:20:40Z
2022-02-04T12:20:41Z INFO - [3/x running job:01 flows:1] => succeeded
2022-02-04T12:20:41Z INFO - Workflow shutting down - AUTOMATIC
2022-02-04T12:20:41Z INFO - DONE
That's a bug, but it's not in scope for this PR (and not really specific to reflow either).
If I subtract one from the trigger point I get the same result:
[scheduling]
    initial cycle point = 1
    final cycle point = 5
    cycling mode = integer
    runahead limit = P1
    [[graph]]
        P1 = x
[runtime]
    [[x]]
        script = """
            if (( CYLC_TASK_CYCLE_POINT == 1 )); then
                cylc trigger "${CYLC_WORKFLOW_ID}//3/x"
            elif (( CYLC_TASK_CYCLE_POINT == 2 )); then
                cylc stop "${CYLC_WORKFLOW_ID}" --now --now
                contact="${HOME}/cylc-run/${CYLC_WORKFLOW_ID}/.service/contact"
                while [[ -f "${contact}" ]]; do
                    sleep 0.1
                done
                cylc play "${CYLC_WORKFLOW_ID}"
            fi
        """
Whereas with this diff, the workflow runs on to cycle 5 correctly:
 elif (( CYLC_TASK_CYCLE_POINT == 2 )); then
+    return
     cylc stop "${CYLC_WORKFLOW_ID}" --now --now
So: ... see #4658
Both variants run to completion with that change in.
Full credit, you're hammering this issue so hard that we're flushing other bugs out of the system, which is great 😁
(Added some logging and rebased again)
aafb0e3 to 4232627
(Rebased again to pick up unit test fixes).
So much so that I think I found a new one: #4665
I'm OK with this approach; given the number of trigger-related bugs uncovered recently, I'd like to test/understand more tomorrow (still need to get other PRs merged, so not blocking). It would be good to generalise the trigger tests to make sure we are covering all cases correctly.
Fixed! (And not really spawn-on-demand related, for the record).
One remaining issue with the coverage thing: #4645 (comment)
ed7d780 to 7e5f7ca
The test was flaky, so sometimes the merge didn't occur. Should be good now.
Coverage 100% 🎉
These changes close #4644
Requirements check-list
- CONTRIBUTING.md and added my name as a Code Contributor.
- setup.cfg and conda-environment.yml.