-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parentless sequential xtrigger spawning #5738
parentless sequential xtrigger spawning #5738
Conversation
d9fc5a5
to
209e621
Compare
Note, the Cylc 7 spawning behaviour can be achieved by adding submit dependencies between the parentless tasks e.g:
This forces the sequentiality of |
IMO that introduces clutter and false dependencies into the workflow definition.. I would have to do that roughly thousands of times across all 50 workflows.. (a number of these workflows query all other workflows' tasks via Also some tasks might be in queues or tight job scheduler queues (and require submission retries).. I have a change inbound, like what we messaged about in Element, where this will be solved without effecting the current behavior.. Using one line of workflow definition .. |
2f57d84
to
ba46897
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense but haven't thought through the details as yet. Because this alters the spawning logic it does require a good amount of thought to ensure that it doesn't interact with other features. Here are some things this change could potentially interact with off the top of my head:
- Trigger - What happens if you manually trigger a task before the xtrigger is satisfied? If we're not careful, could this result in the task being removed from the workflow Cylc 7 style?
- Shutdown - We would also need to make sure that the task pool cant become empty as a result of delayed spawning as this could interact with shutdown logic.
- Remove - Could removing one instance of the task result in the task being removed from the workflow Cylc 7 style? Note
cylc remove
will work on active tasks soon. - Flows - Could the delayed spawning change the behaviour of flow merging in unexpected ways?
0f71185
to
2251aef
Compare
2251aef
to
0c80713
Compare
Restart/reload and remove are now handled.. Remove will cause the next occurrence to spawn, the argument being:
|
89957d6
to
bf54c77
Compare
I assume the docs are auto-generated from the config definition.. but I suppose I should build it with this branch to find out. |
7262f4c
to
116f297
Compare
68b1212
to
44fbc89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried a couple of functional tests, works nicely.
When a workflow is rapidly catching up, clock-triggered tasks get pushed through in batches of the runahead limit which is a slight quirk, but nothing we can't live with.
This logic needs more tests, particularly covering interactions with this logic, e.g. trigger, remove, reload, etc. These interactions are easy to get wrong and really easy to break by accident.
I've made a start at some integration tests and commented a couple of suggestions for new ones. Two of my tests currently fail and will need a look in:
# THIS FILE IS PART OF THE CYLC WORKFLOW ENGINE.
# Copyright (C) NIWA & British Crown (Met Office) & Contributors.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
"""Test interactions with sequential xtriggers."""
import pytest
from cylc.flow.cycling.iso8601 import ISO8601Point
@pytest.fixture()
def sequential(flow, scheduler):
id_ = flow({
'scheduler': {
'allow implicit tasks': 'True',
'cycle point format': 'CCYY',
},
'scheduling': {
'runahead limit': 'P2',
'initial cycle point': '2000',
'graph': {
'P1Y': '@wall_clock => foo',
}
}
})
sequential = scheduler(id_)
def list_tasks():
"""List the task instance cycle points present in the pool."""
nonlocal sequential
return sorted(itask.tokens['cycle'] for itask in sequential.pool.get_all_tasks())
sequential.list_tasks = list_tasks
return sequential
async def test_remove(sequential, start):
"""It should spawn the next instance when a task is removed.
Ensure that removing a task with a sequential xtrigger does not break the
chain causing future instances to be removed from the workflow.
"""
async with start(sequential):
# the scheduler starts with one task in the pool
assert sequential.list_tasks() == ['2000']
# it sequentially spawns out to the runahead limit
for year in range(2000, 2010):
foo = sequential.pool.get_task(ISO8601Point(f'{year}'), 'foo')
if foo.state(is_runahead=True):
break
sequential.xtrigger_mgr.call_xtriggers_async(foo)
sequential.pool.spawn_parentless_sequential_xtriggers()
assert sequential.list_tasks() == [
'2000',
'2001',
'2002',
'2003',
]
# remove all tasks in the pool
sequential.pool.remove_tasks(['*'])
# the next cycle should be automatically spawned
assert sequential.list_tasks() == ['2004']
# TODO: This usually fails because each task that is removed spawns its
# next instance, so depending on the order they are removed in, some
# tasks get removed then re-spawned again.
# NOTE: You won't spot this issue in a functional test because the
# re-spawned tasks are detected as completed and automatically removed.
# So ATM not dangerous, but potentially inefficient.
async def test_trigger(sequential, start):
"""It should spawn its next instance if triggered ahead of time.
If you manually trigger a sequentially spawned task before its xtriggers
have become satisfied, then the sequential spawning chain is broken.
The task pool should defend against this to ensure that triggering a task
doesn't cancel it's future instances.
"""
async with start(sequential):
assert sequential.list_tasks() == ['2000']
foo = sequential.pool.get_task(ISO8601Point('2000'), 'foo')
sequential.pool.force_trigger_tasks([foo.identity], {1})
foo.state_reset('succeeded')
sequential.pool.spawn_on_output(foo, 'succeeded')
assert sequential.list_tasks() == ['2001']
# TODO: This test fails with an empty pool suggesting that triggering
# could result in premature shutdown?
async def test_reload(sequential, start):
"""It should set the is_xtrigger_sequential flag on reload.
TODO: test that changes to the sequential status in the config get picked
up on reload
"""
async with start(sequential):
# the task should be marked as sequential
pre_reload = sequential.pool.get_task(ISO8601Point('2000'), 'foo')
assert pre_reload.is_xtrigger_sequential is True
# reload the workflow
sequential.pool.reload_taskdefs(sequential.config)
# the original task proxy should have been replaced
post_reload = sequential.pool.get_task(ISO8601Point('2000'), 'foo')
assert id(pre_reload) != id(post_reload)
# the new task should be marked as sequential
assert post_reload.is_xtrigger_sequential is True
# TODO: test that a task is marked as sequential if any of its xtriggers are
# sequential (as opposed to all)?
# TODO: test setting the sequential argument in [scheduling][xtrigger] items
# changes the behaviour
# TODO: test the interaction between "sequential xtriggers default" and the
# sequential argument to [scheduling][xtrigger]
# * Should we be able to override the default by setting sequential=False?
# * Or should that result in a validation error?
44fbc89
to
12db81d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just spotted 1 mistake in the conflict resolution but otherwise looks good to me
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
e9ce3ff
to
3da7348
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part-way through... a few minor comments so far.
3da7348
to
f09ea62
Compare
f09ea62
to
63724c9
Compare
Testing ... looks good so far, nice change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks @dwsutherland
@MetRonnie - the only code changes after my review were a type annotation and config item name (which wasn't formally agreed on anyway), so no need for you to re-review. |
supersedes #5732
Spawning parentless xtrigger tasks out to the run ahead limit is unnecessary in the majority of cases, and creates a lot of UI clutter.
This PR makes sequential spawning on xtrigger satisfaction for parentless tasks optional, with non-sequential the default (no change). And the default for clock triggers.
Justification:
workflow_state
).~1/runahead-limit
times less xtriggers to check and tasks to show.At NIWA we have a modular approach to running our workflows, both in operations (around ~50 workflows, most interconnected) and in research (commonly downstream of operational workflows). So
wall_clock
andworkflow_state
are by far the most common xtriggers. We do have some requirement for non-sequentialworkflow_state
xtriggers (i.e. satellite checking, where data may or may not come in for a cycle), however, nearly all workflows and downstream workflows are sequential.. And many of these require very large runahead limits.Take the following simple example;
Currently that produces:
With this default behavior changed;
it is reduced to:
Real workflows commonly have way more tasks (branches length and/or parallel) and way larger runahead limits, so the efficiency and clutter-less gains are massive.
And if for some reason users want some xtriggers to be checked/run out into the future/RH-limit in a non-sequential manner, then they can override the default with an argument (i.e.
sequential=True/False
):(if there's multiple xtriggers on a single task it upholds the sequential setting if any)
This
sequential
argument name is reserved, and not passed onto the xtrigger module/function.It can be defined in the xtrigger module/function (as in workflow declaration), but must have a Boolean default:
(i.e. from modifying
cylc.flow.xtriggers.workflow_state
to include settings arg in function definition)cylc.flow.xtriggers.wall_clock.wall_clock
hassequential=True
setOn xtrigger satisfaction the corresponding parentless sequential xtriggered task (PSXT) will spawn out to the RH limit, or next occurrence of a PSXT, or non-parentless task occurrence.
Note:
Tui sometimes shows no tasks before a corresponding xtrigger is satisfied. This problem is not reflected in the data-store or WUI.
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
(andconda-environment.yml
if present).CHANGES.md
entry included if this is a change that can affect users