
Allow cylc hold to hold future tasks that haven't spawned yet #4238

Merged
13 commits merged into cylc:master on Jun 22, 2021

Conversation

@MetRonnie (Member) commented Jun 2, 2021

These changes close #3743

You can now make Cylc hold future tasks when they spawn (without having to use the --after option).

Note this only works for "exact" task identifiers, e.g. cylc hold my_flow mytask.1234; using globs (e.g. 'mytask.*') or status qualifiers (e.g. mytask.1234:failed) will only hold active tasks, not unspawned ones.
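As a rough illustration (a hedged sketch, not the actual cylc-flow implementation), the distinction above can be thought of as: only an exact "name.point" identifier, with no glob characters and no status qualifier, is eligible to target a future, unspawned task.

    # Minimal sketch only -- hypothetical helper, not part of cylc-flow.
    def is_exact_task_id(item: str) -> bool:
        """Return True if `item` is an exact task id like 'mytask.1234'."""
        if ':' in item:                       # status qualifier, e.g. mytask.1234:failed
            return False
        if any(c in item for c in '*?['):     # glob pattern, e.g. 'mytask.*'
            return False
        name, _, point = item.partition('.')
        return bool(name) and bool(point)     # explicit cycle point required

    is_exact_task_id('mytask.1234')   # True  -> could hold the future task
    is_exact_task_id('mytask.*')      # False -> only matches active tasks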

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.py and
    conda-environment.yml.
  • Appropriate tests are included (unit and functional).
  • Appropriate change log entry included.
  • (master branch) I have opened a documentation PR at Update docs for cylc hold cylc-doc#249.

@MetRonnie added this to the cylc-8.0b2 milestone Jun 2, 2021
@MetRonnie self-assigned this Jun 2, 2021
@hjoliver (Member) left a comment

You missed one call to TaskPool.queue_tasks() that actually queues multiple tasks at once - at the end of config reload - hence the broken functional tests. It's probably fine to queue those tasks one at a time rather than revert your change.

@hjoliver (Member) commented Jun 3, 2021

Generally looks great and works well though 🎉

We probably need a way to show what future tasks are in the hold list. Maybe just add the info to cylc dump output. Or cylc hold --show (but that would make hold an instant-gratification query command as well as an asynchronous mutation...).

Not just active tasks
- Ensure globs & status can only be used for holding active tasks; to hold future tasks, must use explicit task id
- Add unit/integration tests
@MetRonnie (Member Author):
You missed one call to TaskPool.queue_tasks() that actually queues multiple tasks at once - at the end of config reload - hence the broken functional tests. It's probably fine to queue those tasks one at a time rather than revert your change.

Agh, wasn't picked up by mypy because the method calling it wasn't type annotated. Fixed and rebased

cylc/flow/task_pool.py
Comment on lines 511 to 522
def put_tasks_to_hold(
    self, tasks: Set[Tuple[str, 'PointBase']]
) -> None:
    """Replace the tasks in the tasks_to_hold table."""
    # As we are replacing the maps each time this is called, there
    # shouldn't be much cost to calling this multiple times between
    # processing of the db queue
    self.db_deletes_map[self.TABLE_TASKS_TO_HOLD] = [{}]
    self.db_inserts_map[self.TABLE_TASKS_TO_HOLD] = [
        {"name": name, "cycle": str(point)}
        for name, point in tasks
    ]
@MetRonnie (Member Author):
Hmm, not sure why I did it this way. There may have been a reason why I didn't only insert/delete the differing tasks, but I can't see it

Member:
I think this might be the approach used for other tables?

For the moment I've got my fingers in my ears going lalala; the DB usage needs some major rationalisation:

  • Repeated wiping and rebuilding of tables.
  • Repeated creation and destruction of the DB connection.
  • Un-normalised tables.

See #3872, #3666.

At the moment it's messy but working, and we have bigger things to worry about, so I think this work will slide back for a while to come.
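For illustration, the "only insert/delete the differing tasks" alternative mentioned above might look roughly like the following sketch. This is hypothetical: it assumes the db_deletes_map/db_inserts_map values are lists of per-row argument dicts (as in the snippet above) and that a non-empty delete dict is treated as a per-row WHERE condition, which is not verified here.

    # Hypothetical alternative -- NOT what this PR does.
    from typing import Dict, List, Set, Tuple

    def queue_tasks_to_hold_diff(
        db_deletes: Dict[str, List[dict]],
        db_inserts: Dict[str, List[dict]],
        table: str,
        added: Set[Tuple[str, str]],
        removed: Set[Tuple[str, str]],
    ) -> None:
        """Queue only the changed rows instead of wiping and rebuilding the table."""
        db_deletes.setdefault(table, []).extend(
            {"name": name, "cycle": point} for name, point in removed
        )
        db_inserts.setdefault(table, []).extend(
            {"name": name, "cycle": point} for name, point in added
        )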

@oliver-sanders (Member) left a comment
Looks great.

cylc/flow/task_pool.py
Comment on lines +1090 to +1093
if point_str is None:
    LOG.warning(f"No active instances of task: {item}")
    n_warnings += 1
    continue
Member:
This function is matching future tasks, not active ones?

@MetRonnie (Member Author):

It's a bit confusing, I know. This function relies on you having previously filtered out the matching active tasks using self.filter_task_proxies(). So if the point is not specified (e.g. mytask, which is equivalent to mytask.*) and there is no active instance of mytask, then I'd say that is the correct warning to give.
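As a rough illustration of that two-step matching (hypothetical helper names, not the real cylc API): active tasks are matched first, with glob support, and only the leftover items are considered as future tasks, so a bare name that matched nothing warns about active instances.

    # Illustrative sketch only -- not cylc-flow code.
    import fnmatch
    from typing import Iterable, List, Tuple

    def split_active_matches(
        items: Iterable[str], active_ids: List[str]
    ) -> Tuple[List[str], List[str]]:
        """Split requested items into (matched active task ids, unmatched items)."""
        matched, unmatched = [], []
        for item in items:
            # A bare name like 'mytask' behaves as 'mytask.*', so it can only
            # ever match active tasks -- hence the warning discussed above.
            pattern = item if '.' in item else f'{item}.*'
            hits = fnmatch.filter(active_ids, pattern)
            if hits:
                matched.extend(hits)
            else:
                unmatched.append(item)
        return matched, unmatched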

def hold_active_task(self, itask: TaskProxy) -> None:
    if itask.state.reset(is_held=True):
        self.data_store_mgr.delta_task_held(itask)
    self.tasks_to_hold.add((itask.tdef.name, itask.point))
Member:
Active tasks don't need to go into the tasks_to_hold set; if they were in the set before, I think they should come out?

@MetRonnie (Member Author):

They don't need to, but I think it's easier to deal with if everything that is held or is going to be held is in the set. They should only need to come out when released.

@hjoliver (Member) left a comment
Nice, this one is quite important for users 👍

@oliver-sanders (Member):
Testing is going well. I've hit one issue with the n=1 data store, which I had a poke at as I wasn't sure how easy/possible it was to shim; it turns out to be quite easy. Here's a PR, feel free to use it or re-work it: MetRonnie#6.

oliver-sanders and others added 2 commits June 4, 2021 12:04
@oliver-sanders (Member):
I think some tests have been angered by the recent conftest.py change?

@oliver-sanders (Member) left a comment
Code looks good, testing went fine, DB housekeeping ok.

Found the following quirks:

  • Completed tasks remain in tasks_to_hold.

    E.g. if you hold a task then trigger it, it will remain in tasks_to_hold even after it has passed out of the n=1 window. These stuck tasks can be removed by cylc release <flow> --all.

    It's a quirk, not a bug; I don't think it's a problem or worth the fuss of handling differently.

  • Behaviour with reflow.

    I think we will want to add the "flow label" into the tasks_to_hold set in order to handle the situation where we have the same task from different flows. E.g. the user might want to hold one flow but not the other. @hjoliver

    Easy change to make, happy to do now or later.

  • Need to bodge TUI and GUI to handle future families

    Follow-on not for this PR

    Because globbing/matching is not allowed in the n>0 window, holding/releasing tasks in the n>0 window from Tui/Gui doesn't work. Easy fix: we should get Tui/Gui to expand the cycle/family to the full list of tasks in the selection.

    Opened:

@oliver-sanders (Member):
(will leave @hjoliver with the reflow comment above)

@MetRonnie (Member Author):
I have also just noticed:

  • If you start a simple workflow (P1 = foo) paused, then hold all active tasks, then resume, foo.1 will apparently ignore the held state and will run:
INFO - Processing 1 queued command(s)
INFO - Command succeeded: hold(['foo.*'])
INFO - Processing 1 queued command(s)
INFO - RESUMING the workflow now
INFO - Command succeeded: resume()
INFO - Queue released: foo.1
INFO - [foo.1] -submit-num=01, host=hostymchostface
INFO - [foo.1] -triggered off []
INFO - [foo.1] status=preparing (held): (internal)submitted at 2021-06-07  for job(01) flow(E)
INFO - [foo.1] -health check settings: submission timeout=None, polling intervals=PT15M,...
INFO - [foo.1] status=submitted (held): (received)started at 2021-06-07  for job(01) flow(E)
INFO - [foo.1] -health check settings: execution timeout=None, polling intervals=PT15M,...
INFO - [foo.1] status=running (held): (received)succeeded at 2021-06-07  for job(01) flow(E)

I think this can be fixed in a separate bugfix PR

@MetRonnie (Member Author):
E.g. if you hold a task then trigger it, it will remain in tasks_to_hold even after it has passed out of the n=1 window. These stuck tasks can be removed by cylc release <flow> --all.

Hmm, perhaps it should be that triggering a task also releases it?

@oliver-sanders (Member):
I think we don't do that so that tasks can be triggered in a way that prevents retries from occurring?

There are other ways we could end up with held tasks drifting outside of the n=1 window (e.g. suicide triggers). I think the solution would be to remove the task from tasks_to_hold when the task is removed from the pool.
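A minimal sketch of that suggestion (hypothetical method name; the surrounding pool-removal logic is omitted): discard the task from tasks_to_hold at the point it leaves the task pool, so held entries cannot drift outside the n=1 window.

    # Minimal sketch only -- hypothetical method, not the code in this PR.
    def remove_from_pool(self, itask) -> None:
        """Remove itask from the task pool, forgetting any pending hold for it."""
        self.tasks_to_hold.discard((itask.tdef.name, itask.point))
        # (the tasks_to_hold DB table would also need updating, e.g. via the
        # put_tasks_to_hold() method shown earlier)
        ...  # existing pool-removal logic would follow here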

@MetRonnie (Member Author) commented Jun 7, 2021

Another issue, which I realised from Oliver's comment #4246 (comment):

  • If you have a family A that, at cycle point 1, contains:

    • a (n=0)
    • b (n=1)

    then doing cylc hold myflow A.1 will only hold a.1, and not set b.1 to be held. (If a.1 were n=1 instead, then holding A.1 would set both a.1 and b.1 to be held.)

Update: we have decided to only expand families in the n=0 window. Resolved by 3b8cc40.
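A rough illustration of the behaviour settled on above (hypothetical data structures, not the exact code in 3b8cc40): a family is expanded to its member tasks only within the n=0 window, so future members must be named explicitly.

    # Illustrative sketch only -- not cylc-flow code.
    from typing import Dict, List, Set

    def expand_family_to_hold(
        family: str,
        point: str,
        family_members: Dict[str, Set[str]],
        active_task_ids: Set[str],
    ) -> List[str]:
        """Expand FAMILY.point to member task ids, but only for active (n=0) tasks."""
        return [
            f"{name}.{point}"
            for name in sorted(family_members.get(family, set()))
            if f"{name}.{point}" in active_task_ids
        ]

    # e.g. with family A = {a, b}, a.1 active and b.1 unspawned:
    # expand_family_to_hold('A', '1', {'A': {'a', 'b'}}, {'a.1'}) -> ['a.1']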

When holding e.g. FAM.2, only expand into tasks for active tasks, not future ones
@MetRonnie (Member Author):
(Bash test failure is due to Docker container initialisation stalling, don't think it's worth re-running)

@hjoliver (Member):
Behaviour with reflow.

I think we will want to add the "flow label" into the tasks_to_hold set in order to handle the situation where we have the same
task from different flows. E.G. the user might want to hold one flow but the another @hjoliver

Yeah, I'll post a follow-up issue .... #4277

@hjoliver (Member) left a comment
Looks great, no problems found 🎉

@hjoliver merged commit f906a0f into cylc:master Jun 22, 2021
@MetRonnie deleted the hold branch June 22, 2021 08:14
@MetRonnie added the 'db change' label (Change to the workflow database structure) Jan 21, 2022
Successfully merging this pull request may close these issues: Post-SoD task hold semantics (#3743).